$Cambridge: hermes/src/prayer/docs/DESIGN,v 1.1.1.1 2003/04/15 13:00:03 dpc22 Exp $

Overview
========

Prayer consists of two master daemons named "prayer" and "prayer-session", plus a small auxiliary server named "accountd". accountd is used for operations on user accounts, such as quota checking and mail redirection, which are not normally provided by the IMAP server.

"prayer" and "prayer-session" both run on the same gateway system. In fact a "prayer-session" daemon is normally started automatically by the "prayer" frontend server, though it is possible to run the two daemons independently, typically when debugging.

The prayer frontend server is a very simple HTTP/1.0 and HTTP/1.1 server (serving icons and login pages only) and HTTP proxy. The frontend server normally preforks a number of child processes which accept incoming HTTP connections: child processes are started and retired as required, according to the load on the system and the prayer configuration. Each child process handles a defined number of HTTP connections (where each connection in turn handles a defined number of HTTP requests) and then exits. The total number of child processes is capped to prevent denial of service attacks. This cap should not be reached during normal operation.

Incoming HTTP connections to each child process time out after a defined amount of time. The timeout period differs depending on whether the last incoming HTTP request was for a session or an icon URL (terms defined below). The timeouts for a connection last used for icons are typically quite low (e.g. 10 seconds), reflecting the fact that there are typically a large number of concurrent requests for new icons when a user first logs in, but few new icons appear after the initial surge.

A fresh prayer-session process is forked off whenever a user logs in.
This process contains all of the permanent state associated with that login session, including one or more connections to an IMAP server and possibly connections to accountd servers. This backend server communicates with the user using HTML over HTTP connections, either directly or using the prayer frontend server as a simple HTTP proxy. This arrangement is described in much greater detail in the following sections.

Backend server processes move into a dormant state after a certain period of inactivity, shutting down IMAP and accountd connections, which can easily be resuscitated when the session wakes up. After a long period of inactivity, typically several hours, the session process shuts down entirely. The user then has to log back in.

This arrangement is rather unorthodox from a Web interface design point of view. However, a Webmail interface is an extremely specialised application which contains much more permanent state than most interactive Web pages. This design has the advantage of being extremely efficient, at least compared to an Apache CGI or mod_php interface which maintains no internal session state and which has to reconnect to a database and the IMAP server for each HTTP request.

Wing (see ./README) is an Apache/mod_perl application which uses an intermediate maild daemon, written in Perl, as a glue module. maild maintains a fair amount of permanent state, including a single connection to the IMAP server and an SQL database. One drawback of this design is that the intelligence is split evenly between the frontend Apache/mod_perl and backend maild servers. Wing uses a rich but rather ad hoc protocol between the frontend and backend servers, and a single HTTP request typically generates a number of separate protocol requests, and consequently context switches, between the two halves. In contrast, Prayer can often proxy the HTTP request through unchanged with a single context switch, and can proxy back the HTTP response with a single context switch.
The proxy is eliminated entirely in the direct connection mode of operation.

Session Security
================

Prayer starts off with a username and password when a user logs in. A successful login attempt translates these into a session URL, which consists of a hostname, a username and a cryptographically secure session identifier.

The session identifier is the key to security in Prayer. Anyone who can derive the session URL, including the session identifier, may be able to break into someone else's login session. Prayer provides the following counter-measures:

1) The session-ID is moved from the session URL to an HTTP cookie if at all possible, to stop people from simply reading session identifiers over each other's shoulders.

2) People are strongly encouraged to use HTTPS rather than HTTP, to prevent sniffers from picking up valid passwords and active session identifiers. The plaintext version of the login screen has a big warning banner.

3) The login session will only accept incoming HTTP(S) connections from the IP address which originated the login request. This doesn't protect people on shared Unix systems, and it is a little harsh on people with private systems on dialup links which intermittently disconnect. This behaviour is now controlled via the fix_client_ipaddr config option.

The Prayer URL address space
============================

There are three basic forms of URL in use in Prayer:

  Login URLs. Examples:
    http://webmail1.hermes.cam.ac.uk/
    http://webmail1.hermes.cam.ac.uk/login/dpc22

  Icon URLs. Example:
    http://webmail1.hermes.cam.ac.uk/icons/left.gif

  Session URLs. Example:
    https://webmail1.hermes.cam.ac.uk/session/dpc22//list/last

Login and icon URLs should hopefully be fairly self-explanatory. There are a number of modifiers and options which can be provided to either form of URL to alter the behaviour of the user interface. These are documented separately in the URL_OPTIONS file in this directory.
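
The three URL forms can be told apart by their path prefix. The following is a minimal Python sketch, not Prayer's own code, of how a frontend child process might classify a request path and then choose the idle timeout for the connection, as described in the Overview. The 10 second icon timeout follows the example given there; the 300 second session value, and every name in the sketch, are assumptions made for illustration.

```python
from enum import Enum


class UrlKind(Enum):
    LOGIN = "login"
    ICON = "icon"
    SESSION = "session"
    UNKNOWN = "unknown"


# Hypothetical values: 10 seconds matches the icon example in the Overview;
# the session timeout is an invented placeholder, not a Prayer default.
ICON_IDLE_TIMEOUT = 10
SESSION_IDLE_TIMEOUT = 300


def classify_path(path: str) -> UrlKind:
    """Map a request path onto one of the three Prayer URL forms."""
    path = path.split("?", 1)[0]  # ignore any query string
    if path.startswith("/icons/"):
        return UrlKind.ICON
    if path.startswith("/session/"):
        return UrlKind.SESSION
    if path in ("/", "/login") or path.startswith("/login/"):
        return UrlKind.LOGIN
    return UrlKind.UNKNOWN


def idle_timeout(last_request: UrlKind) -> int:
    """Idle timeout in seconds, chosen by the kind of the last request."""
    if last_request is UrlKind.ICON:
        return ICON_IDLE_TIMEOUT
    return SESSION_IDLE_TIMEOUT
```

Keying the timeout on the last request, rather than the first, means a connection that served the initial icon surge and then goes quiet is released quickly.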

Session URLs require a little further explanation:

    .../session/dpc22//display/15/452
        ^       ^    ^ ^       ^
        |       |    | |       +- Arguments
        |       |    | +--------- Command
        |       |    +----------- SessionID
        |       +---------------- Username
        +------------------------ Session URL (i.e. not an icon URL)

The session-ID is either an 18-character (URL-friendly) BASE64-encoded value, which is a cryptographically secure random number, or an empty string. An empty string indicates that the session-ID is stored in an HTTP cookie whose name is based on the login name and possibly the port used by the session process (example: "dpc22:5000=FhsFG6754bncdsfhd").

Prayer will attempt to store the session-ID in a cookie automatically, unless the use of cookies is disabled in the User Preferences or explicitly by options on the login URL. However, the use of cookies is also negotiated with the browser, using a number of round trips to set a cookie and then test whether the cookie value has been accepted correctly by the browser.

Proxy operation
===============

In the proxy mode of operation, each prayer-session process binds to a unique Unix domain socket (typically in ${var_prefix}/sockets). HTTP and HTTPS requests to the prayer frontend server which involve session URLs are transparently redirected to the correct session process: the prayer frontend process acts as a simple HTTP proxy. Something like:

                            Port          prayer (prefork master server)
                                            |
                              80  prayer }  | [fork()]
            HTTP(S)           80  prayer }  v
User Agent ---------+-->      80  prayer }  (slave servers)
                    +-->     443  prayer }
                    +-->     443  prayer } <--+
                                              |  (Unix domain socket)
             +--------------------------------+
             |
             |   HTTP or HTTPS                     IMAP(S)
             +---------------> prayer-session ---+----------> imapd
                                                 |
                                                 +----------> imapd
                                                 |
                                                 +----------> accountd

The advantage:

This model removes some complexity from the prayer-session process: each prayer-session can be a simple single-threaded process which handles a single (single-shot) HTTP connection at a time.
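
The routing step that a frontend slave server performs in the proxy arrangement can be sketched in Python as follows. This is an illustration under stated assumptions rather than Prayer's implementation: it splits a session URL into the fields shown in the session URL diagram, falls back to the cookie when the session-ID field is empty, and maps the result onto a socket path. SOCKET_DIR, the socket file naming scheme (a hash of the session-ID, so the secret never appears in a directory listing) and all function names are hypothetical.

```python
import hashlib
import os
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical: stands in for ${var_prefix}/sockets; the socket file naming
# used below is invented for this sketch.
SOCKET_DIR = "/var/spool/prayer/sockets"


@dataclass
class SessionRequest:
    username: str
    session_id: str            # "" means the ID should be in a cookie
    command: str
    args: List[str] = field(default_factory=list)


def parse_session_path(path: str) -> SessionRequest:
    """Split /session/<username>/<session-id>/<command>/<args...>."""
    parts = path.split("/")
    # "/session/dpc22//display/15/452"
    #   -> ["", "session", "dpc22", "", "display", "15", "452"]
    if len(parts) < 5 or parts[0] != "" or parts[1] != "session" or not parts[2]:
        raise ValueError(f"not a session URL: {path!r}")
    return SessionRequest(parts[2], parts[3], parts[4], parts[5:])


def cookie_session_id(cookie_header: str, username: str,
                      port: Optional[int] = None) -> Optional[str]:
    """Look up a cookie named "<username>" or "<username>:<port>"."""
    wanted = username if port is None else f"{username}:{port}"
    for item in cookie_header.split(";"):
        name, sep, value = item.strip().partition("=")
        if sep and name == wanted:
            return value
    return None


def socket_path_for(req: SessionRequest, cookie_header: str = "") -> str:
    """Choose the Unix domain socket to proxy this request to."""
    sid = req.session_id or cookie_session_id(cookie_header, req.username)
    if not sid:
        raise PermissionError("no session-ID in URL or cookie")
    # Hash the ID so the raw secret never appears in a directory listing.
    # The session process itself should still verify the full ID.
    digest = hashlib.sha256(sid.encode()).hexdigest()[:16]
    return os.path.join(SOCKET_DIR, f"{req.username}-{digest}")
```

A real slave would then connect() to that socket and relay the request bytes unchanged; if the socket no longer exists, that is the point at which it can return the "session timed out" page discussed under Session Timeouts.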

A further advantage is that the large number of independent prayer frontend processes act as a buffer for concurrent input and output requests.

The disadvantage:

This model involves an extra stage, as data is proxied between the prayer frontend and session processes. However, this proxy connection runs over a fast local connection (a Unix domain socket).

The backend server takes advantage of the fact that all incoming requests arrive through the frontend server: it can use the various URL shortcut schemes for icons and session URLs described above without introducing concurrent access to the prayer-session server.

Conclusion:

The proxy model represents a more conservative and stable approach than the direct connection mode described in the following section. It is currently the recommended mode of operation.

Direct operation
================

In the direct connection model, each prayer-session process binds to an Internet domain socket on a unique port. The user agent is redirected to this port for the login session. The session process effectively becomes a Web server, talking directly to the user agent in question. Something like the following:

                            Port          prayer (prefork master server)
                                            |
                              80  prayer }  | [fork()]
            HTTP(S)           80  prayer }  v
User Agent ---------+-->      80  prayer }  (slave servers)
                    +-->     443  prayer }
                    +-->     443  prayer }
                    |
                    |   HTTP or HTTPS                 IMAP(S)
                    +-----> 5000  prayer-session ---+----------> imapd
                                                    |
                                                    +----------> imapd
                                                    |
                                                    +----------> accountd
                           (5001)
Other                      (5002)
User Agent ---------------> 5004

The advantage:

This should clearly be a little more efficient, as the internal proxy is eliminated. However, the direct connection model does have a few issues, which is why this mode of operation is still experimental.
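
In the direct model each session needs its own listening port, and the user agent is then redirected to it. A minimal Python sketch of those two steps follows, assuming a configured range of candidate ports and a redirect URL that simply adds the port to the session URL (both are assumptions; the text above specifies neither). The function names are hypothetical.

```python
import socket
from typing import Iterable, Tuple


def bind_session_port(ports: Iterable[int],
                      host: str = "127.0.0.1") -> Tuple[socket.socket, int]:
    """Bind a listening TCP socket on the first free port in `ports`."""
    for port in ports:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
        except OSError:
            sock.close()        # port already taken, e.g. by another session
            continue
        sock.listen(16)
        return sock, port
    raise RuntimeError("no free port left in the session port range")


def redirect_location(hostname: str, port: int, username: str,
                      use_ssl: bool) -> str:
    """Hypothetical Location: header value sending the browser to its session."""
    scheme = "https" if use_ssl else "http"
    # Empty session-ID field: the ID is assumed to travel in a cookie.
    return f"{scheme}://{hostname}:{port}/session/{username}//list/last"
```

Binding from an explicit range, rather than letting the kernel pick an ephemeral port, keeps sessions on a predictable set of ports, which matters for the SSL/plain-text split and the packet filter concerns discussed below.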

Session Timeouts:

The proxy approach has the significant advantage that the prayer frontend server can spot sessions that have timed out (because it tries to connect to a Unix domain socket which no longer exists) and put up an error page which tells the user that their session has timed out and gives them a link back to the initial login page.

In contrast, when a session process times out in the direct connection model, the socket that it is associated with becomes an orphan. If this socket is shut down then the user agent will be unable to connect at a later time (unless that port is now in use by another login session). Consequently, if we want to give some kind of friendly feedback to users who come back to abandoned sessions, we need some mechanism for managing lists of former, now idle, Internet domain sockets. The master prayer-session daemon clearly needs to be involved in this process, as the parent of all the other session processes.

At the moment, the master prayer-session daemon simply collects an arbitrary list of idle Internet domain sockets that are then recycled for new login sessions. This works well for small numbers of idle sockets. However, it involves a large n-way select() in the main accept() loop, which looks for incoming TCP connections to one of the abandoned Internet domain sockets or, alternatively, an incoming stream connection indicating a new login request. select() doesn't cope very well with large n-way selects.

If we wanted to use the direct connection model on a live system, we would probably need some way to hand these long lists of Internet domain ports off to subsidiary processes whose only function in life would be to listen for incoming connections to abandoned sockets and send out helpful HTML pages explaining that the session has timed out. The master daemon could reclaim sockets from these processes as it needs them, or send out permanent shutdown requests after a suitably long time.
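
The socket recycling described above, combined with the separate SSL and plain-text port ranges discussed in the paragraphs that follow, can be sketched as a small pool. This Python sketch only tracks port numbers; it does not perform the select() over abandoned sockets. The class name and the port ranges are hypothetical.

```python
from collections import deque

# Hypothetical, non-overlapping port ranges; real values would come from
# the prayer configuration.
PLAIN_PORTS = range(5000, 5100)
SSL_PORTS = range(5100, 5200)


class IdlePortPool:
    """Recycle ports left behind by timed-out sessions.

    SSL and plain-text ports live in separate ranges, so a recycled port
    never switches protocol.
    """

    def __init__(self, plain: range = PLAIN_PORTS, ssl: range = SSL_PORTS):
        if max(plain.start, ssl.start) < min(plain.stop, ssl.stop):
            raise ValueError("SSL and plain-text port ranges overlap")
        self.ranges = {False: plain, True: ssl}
        self.idle = {False: deque(), True: deque()}
        self.next_fresh = {False: plain.start, True: ssl.start}

    def is_ssl_port(self, port: int) -> bool:
        for use_ssl, ports in self.ranges.items():
            if port in ports:
                return use_ssl
        raise ValueError(f"port {port} is outside both session ranges")

    def acquire(self, use_ssl: bool) -> int:
        """Prefer the longest-idle recycled port, else a never-used one."""
        if self.idle[use_ssl]:
            return self.idle[use_ssl].popleft()
        port = self.next_fresh[use_ssl]
        if port >= self.ranges[use_ssl].stop:
            raise RuntimeError("session port range exhausted")
        self.next_fresh[use_ssl] = port + 1
        return port

    def release(self, port: int) -> None:
        """A session timed out: park its port with others of the same protocol."""
        self.idle[self.is_ssl_port(port)].append(port)
```

Handing out the longest-idle port first means a recently abandoned session keeps its port, and hence its "timed out" page, for as long as possible before the port is reused.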

The prayer-session daemon would clearly want to do some kind of sorting of active ports, with a small local cache of idle sockets, in order to minimise the amount of thrashing which could take place between different processes.

It should also be noted that Prayer itself has no useful means of knowing whether a given idle port was last used with plain-text or SSL connections. The solution adopted at the moment is simply to assign two separate, non-overlapping ranges of port numbers to SSL and plain-text sessions. This isn't terribly elegant, but it is simple, and probably the best that can be done without some ghastly and probably not very portable hackery to distinguish HTTP from HTTPS sessions by looking for patterns of text in the first few bytes of the connection.

Packet filters:

The direct connection model does have a significant disadvantage in that it requires a large number of TCP ports to be reachable from external hosts. In contrast, the proxy model only needs a few ports (typically just ports 80 and 443) to be visible to external systems. This may not matter if Prayer is running on a dedicated system which provides nothing other than Webmail services. However, packet filters and other forms of firewall typically have lists of rules for access to each port, and this may not map well onto a system that listens on large ranges of ports.

Concurrency:

prayer-session processes are single-threaded and very stateful: they can only process a single _complete_ HTTP request at a time. The servers can, however, cope with an arbitrary number of incoming connections and partially complete HTTP requests at the same time. This should handle browsers that open several concurrent connections to the same server, and also prevent an obvious denial of service attack in which an attacker connects to the Internet domain socket associated with an active login session, preventing the genuine user from communicating with their login session.
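
The concurrency rule above, many partial requests but only one complete request processed at a time, can be illustrated with a small Python sketch. Bytes from each connection accumulate in their own buffer, and a request is handed to the session logic only once its headers (and any Content-Length body) are complete. The class and method names are hypothetical, and the sketch ignores chunked encoding and socket I/O; a real server would drive feed() from its select() loop.

```python
from collections import deque
from collections.abc import Hashable
from typing import Deque, Dict, Optional, Tuple


def _content_length(head: bytes) -> int:
    """Return the Content-Length from a header block, or 0 if absent."""
    for line in head.split(b"\r\n")[1:]:
        name, sep, value = line.partition(b":")
        if sep and name.strip().lower() == b"content-length":
            return int(value.strip())
    return 0


class RequestAssembler:
    """Buffer partial requests per connection; queue only complete ones.

    A connection that sends half a request and stalls just keeps its own
    buffer; it never blocks requests arriving on other connections.
    """

    def __init__(self) -> None:
        self.partial: Dict[Hashable, bytearray] = {}
        self.ready: Deque[Tuple[Hashable, bytes]] = deque()

    def feed(self, conn: Hashable, data: bytes) -> None:
        buf = self.partial.setdefault(conn, bytearray())
        buf += data
        while True:
            end = buf.find(b"\r\n\r\n")
            if end == -1:
                return                      # headers not complete yet
            total = end + 4 + _content_length(bytes(buf[:end]))
            if len(buf) < total:
                return                      # body not complete yet
            self.ready.append((conn, bytes(buf[:total])))
            del buf[:total]                 # keep any pipelined remainder

    def close(self, conn: Hashable) -> None:
        self.partial.pop(conn, None)

    def next_request(self) -> Optional[Tuple[Hashable, bytes]]:
        """The single-threaded session handles one complete request at a time."""
        return self.ready.popleft() if self.ready else None
```

An attacker who opens a connection and never finishes the request only ties up a buffer, while the genuine user's complete requests still reach the queue.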

In normal use there should only be a single active session URL at a time: prayer-session manages an interactive interface. The one obvious exception is during the download (and conceivably upload) of large attachments and mail folders. It would be rather useful if mail folders and attachments could be downloaded asynchronously while the user proceeded to other parts of the user interface. This is possible in the proxy case if the prayer frontend servers buffer download requests. It would also be possible in the direct connection case if a separate process was forked off for the download.

One complication: this only becomes asynchronous once the folder or attachment has been downloaded from the IMAP server. Unfortunately it is rather hard to make IMAP downloads asynchronous without opening concurrent IMAP connections to the same mail folder, which is only practical if the IMAP server, and the mail folder format used on that IMAP server, both support safe concurrent access to mail folders.

Icon URLs can reference either the frontend or the session servers. References to the frontend server have the advantage that the data is much more likely to be cached on the local computer's hard disk, as these URLs are global and constant. However, they are much longer than the URLs available when short URLs are allowed. Compare:

    http://webmail.hermes.cam.ac.uk/icons/left.gif

to "/icons/left.gif" or "icons@left.gif".

Short icon URLs do, however, add a great deal of concurrent access to the session process, especially when it first starts up. Experiments reveal that the prayer engine copes just fine. However, there appear to be a number of issues with the OpenSSL library and some browsers involving concurrent SSL connections to a single process, even if only a single SSL process is active at any time. Further investigation is required.
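
Returning to the asynchronous download idea above: in the proxy case, a frontend slave could read the session's complete response as fast as the local socket allows, release the session process, and only then trickle the data out to a slow client. The following is a minimal Python sketch of that buffering step, assuming the session closes its end of the single-shot connection once the response has been sent; the function name and spool limit are hypothetical. As noted above, this does not make the IMAP fetch itself asynchronous: the session must still pull the data from the IMAP server first.

```python
import socket
import tempfile


def buffer_then_relay(upstream: socket.socket, downstream: socket.socket,
                      spool_limit: int = 1 << 20) -> int:
    """Drain the whole response from the session, then feed it to the client.

    Data is held in memory up to `spool_limit` bytes and spills to a
    temporary file beyond that. Returns the number of bytes relayed.
    """
    relayed = 0
    with tempfile.SpooledTemporaryFile(max_size=spool_limit) as spool:
        while True:
            chunk = upstream.recv(65536)
            if not chunk:
                break                   # session closed: response complete
            spool.write(chunk)
        upstream.close()    # the session process is now free for new requests
        spool.seek(0)
        while True:
            chunk = spool.read(65536)
            if not chunk:
                break
            downstream.sendall(chunk)   # may be slow: only this slave waits
            relayed += len(chunk)
    return relayed
```

The cost is temporary memory or disk space in the frontend for the duration of the download, which is why the sketch spills large responses to a temporary file.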