Status:
  This HOWTO is far from complete. Any help completing it would be
  greatly appreciated.

Overview:
  This is an attempt at a rather complicated piece of documentation which
  will hopefully do a good job of explaining email messages and how EPS
  interfaces with them. If anything in this document is wrong, please
  email vol@inter7.com with the changes to be made.

Contents:
  1.0 Structure of an email message
    1.1 Headers
    1.2 Structured headers
    1.3 Message body
    1.4 MIME structures
  2.0 The EPS interface
    2.1 The four basic steps
    2.2 Sample code

[1.0] Structure of an email message

An email message is broken up into several parts which are separated by
'line breaks'. In this document, a line break means a blank line. EPS
considers both blank lines and MIME boundaries (which are explained later)
to be breaks in the message. A simple message looks like this:

    Headers

    Message Body

Using the above structure, we arrive at a sample email message:

    Date: Tue, 03 Oct 2000 14:01:44 -0500
    Return-Path:
    From:
    To: Joe
    Subject: Test

    This is just a test.

As you can see, all headers appear one after the other until a line break
is encountered. Everything after that is considered to be the 'body' of
the message.

[1.1] Headers

Headers define where the message came from, how to display it, and various
other important pieces of information. Headers can be generated by the
mail client and by the servers the message passes through.

    From:

The format for a header is the header name, followed by a colon, followed
by the header data.

RFC 2822 says that each line of an email message must be no more than 998
characters long, and should be no more than 78 characters long, excluding
the CRLF. A problem arises when a header, such as the Received header,
contains more data than fits in the recommended display length. This is
where 'rolling' (RFC 2822 calls it 'folding') takes place. Also, many
clients, such as IE and Netscape, allow headers to exceed the maximum line
length. EPS has been updated to allow for any line length.

    Received: from securityfocus.com (mail.securityfocus.com [66.38.151.9])
            by lists.securityfocus.com (Postfix) with SMTP id DEF3F24C3FE
            for ; Tue, 8 May 2001 09:33:19 -0600 (MDT)

The format for rolling is to break the header between words and to start
each continuation line with at least one space or tab. A line that begins
with whitespace is treated as a continuation of the header above it.

[1.2] Structured headers

A structured header is a header which consists of more than one piece of
data. These pieces are usually referred to as 'atoms'. Atoms (as recognized
by EPS) are separated by semicolons, and can contain any number of
information types.

    Content-type: multipart/mixed; boundary = "------------1DC210F4223B21A1894542BF"

This very important header has two parts:

    multipart/mixed
    boundary = "------------1DC210F4223B21A1894542BF"

This header specifies that the following email body will contain MIME
attachments. It also tells the receiving end where the attachments begin
and end by specifying a boundary.

[1.3] Message body

The message body is generally where the text of a message appears. If the
message is multipart and contains an inline, displayable attachment type,
then the message body (the text before the first boundary, known as the
preamble) is ignored and any inline attachments should be displayed.

[1.4] MIME structures

MIME structures are basically in the same format as a simple email message.
They contain headers, followed by a break, and finally the contents of the
MIME part. The difference is that each MIME part begins after a boundary
line and ends at the next boundary. The boundary is defined in the
Content-type header of the email message:

    Content-type: multipart/mixed; boundary = "------------1DC210F4223B21A1894542BF"

Again, this tells the client that the message has multiple parts, and where
the parts begin and end. Inside the message, each boundary line is the
boundary string prefixed with two dashes (--). The closing boundary after
the last part also has two dashes appended. The second example in section
2.2 shows both.

[2.0] The EPS interface

EPS is a set of API calls which make it easier to understand the contents
of an email message. Because email messages are rather complex, EPS cannot
jump around the email randomly and be expected to understand its content.
In practice, you must run through the email line by line with EPS, or it
won't understand specific things like MIME attachments.

[2.1] The four basic EPS steps

There are basically four steps you must take when going through the
contents of an email message with the EPS API.
Keep in mind that, depending on the processing you are doing, you may not
need all of these calls.

eps_begin():
  Syntax:    eps_begin(Interface, Arguments);
  Returns:   pointer to an eps_t structure on success, NULL on failure

  Interface: INTERFACE_STREAM
  Arguments: Pointer to a file descriptor

  Interface: INTERFACE_BUFFER
  Arguments: Pointer to a line_t structure holding the buffer to parse as
             an email

  This function initializes EPS and tells it where to read the message
  from.

eps_next_header():
  Syntax:    eps_next_header(eps);
  Returns:   struct header_t *

  This function returns the next available header. If the header it read
  was invalid, it returns an empty structure. If no more headers are
  available, it returns NULL.

eps_next_line():
  Syntax:    eps_next_line(eps);
  Returns:   unsigned char *

  This function returns the next available line, or NULL when no more
  lines are available.

eps_end():
  Syntax:    eps_end(eps);

  Cleans up everything EPS allocated or created. If the INTERFACE_BUFFER
  interface is in use, EPS will not free the buffer passed to it via the
  line_t structure.

[2.2] Sample code

The following piece of code uses the above four steps (plus some MIME
handling code not yet discussed) to run through any email provided via
stdin.
    #include <stdio.h>
    #include <eps.h>

    int main(int argc, char *argv[])
    {
      int fd = 0, ret = 0;
      unsigned char *l = NULL;
      struct eps_t *eps = NULL;
      struct header_t *h = NULL;

      /* Step 1: read the message from standard input (descriptor 0) */
      fd = 0;
      eps = eps_begin(INTERFACE_STREAM, &fd);
      if (!eps)
        return 1;

      /* Step 2: walk the top-level headers */
      for (h = eps_next_header(eps); h; h = eps_next_header(eps)) {
        if ((h->name) && (h->data))
          printf("[%s] = [%s]\n", h->name, h->data);

        eps_header_free(eps);
      }

      printf("\n");

      /* Step 3: walk the message body */
      for (l = eps_next_line(eps); l; l = eps_next_line(eps))
        printf("%s\n", l);

      printf("\n");

      /* MIME handling: while input remains and the message is
         multipart, process each part's headers and body in turn */
      while ((!(eps->u->b->eof)) && (eps->content_type & CON_MULTI)) {
        ret = mime_init_stream(eps);
        if (!ret)
          break;

        for (h = mime_next_header(eps); h; h = mime_next_header(eps)) {
          if ((h->name) && (h->data))
            printf("[%s]=[%s]\n", h->name, h->data);

          header_kill(h);
        }

        printf("\n");

        for (l = mime_next_line(eps); l; l = mime_next_line(eps))
          printf("%s\n", l);
      }

      /* Step 4: release everything EPS allocated */
      eps_end(eps);
      return 0;
    }

Given the following email:

    From:
    To:
    Subject: test

    This is a test.

..and the following command line:

    # cat email | ./sample

..you should see the following:

    [From] = []
    [To] = []
    [Subject] = [test]

    This is a test.

Given the following email:

    From:
    To:
    Content-type: multipart/alternative; boundary=x
    Subject: test

    This is a MIME message
    --x
    Content-type: text/html

    HTML version
    --x
    Content-type: text/plain

    Text version
    --x--

..and the command line:

    # cat email | ./sample

..you should see:

    [From] = []
    [To] = []
    [Content-type] = [multipart/alternative; boundary=x]
    [Subject] = [test]

    This is a MIME message

    [Content-type]=[text/html]

    HTML version
    [Content-type]=[text/plain]

    Text version
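The header and MIME rules from section 1 can also be illustrated without EPS. What follows is a minimal, standalone C sketch and is not part of the EPS API. The helper names find_body(), unfold_header(), get_boundary() and boundary_line() are hypothetical. For simplicity, the sketch assumes LF line endings rather than the CRLF used on the wire. It shows four things:

- the blank line that separates headers from the body;
- how a rolled header is unrolled;
- how the boundary is pulled out of a structured Content-type header;
- how boundary lines are recognized.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of EPS): return a pointer just past the
 * first blank line, i.e. the start of the body, or NULL if there is no
 * blank line. Assumes LF line endings. */
const char *find_body(const char *msg)
{
    const char *p = strstr(msg, "\n\n");
    return p ? p + 2 : NULL;
}

/* Hypothetical helper: unroll a folded header. A newline followed by a
 * space or tab marks a continuation line, so the newline is dropped and
 * the whitespace kept. Writes at most outlen-1 characters plus a NUL and
 * returns the number of characters written. */
size_t unfold_header(const char *in, char *out, size_t outlen)
{
    size_t n = 0;

    assert(out != NULL && outlen > 0);
    for (; *in != '\0' && n + 1 < outlen; in++) {
        if (*in == '\n' && (in[1] == ' ' || in[1] == '\t'))
            continue;               /* skip the newline of a rolled line */
        out[n++] = *in;
    }
    out[n] = '\0';
    return n;
}

/* Hypothetical helper: copy the boundary parameter of a Content-type value
 * into out. Allows spaces around '=' and an optionally quoted value.
 * Returns 1 if a non-empty boundary was found, 0 otherwise. */
int get_boundary(const char *ctype, char *out, size_t outlen)
{
    static const char key[] = "boundary";
    const char *p;
    size_t n = 0;

    assert(out != NULL && outlen > 0);
    for (p = ctype; *p != '\0'; p++) {
        size_t i = 0;
        const char *q;

        /* case-insensitive match of "boundary" starting at p */
        while (key[i] != '\0' && tolower((unsigned char)p[i]) == key[i])
            i++;
        if (key[i] != '\0')
            continue;

        q = p + i;
        while (*q == ' ' || *q == '\t')
            q++;
        if (*q != '=')
            continue;
        q++;
        while (*q == ' ' || *q == '\t')
            q++;

        if (*q == '"') {            /* quoted value: copy up to the quote */
            q++;
            while (*q != '\0' && *q != '"' && n + 1 < outlen)
                out[n++] = *q++;
        } else {                    /* token: stop at ';' or whitespace */
            while (*q != '\0' && *q != ';' && !isspace((unsigned char)*q)
                   && n + 1 < outlen)
                out[n++] = *q++;
        }
        out[n] = '\0';
        return n > 0;
    }
    out[0] = '\0';
    return 0;
}

/* Hypothetical helper: return 1 if line is "--boundary", 2 if it is the
 * closing "--boundary--", and 0 otherwise. */
int boundary_line(const char *line, const char *boundary)
{
    size_t len = strlen(boundary);

    if (strncmp(line, "--", 2) != 0 || strncmp(line + 2, boundary, len) != 0)
        return 0;
    if (strcmp(line + 2 + len, "--") == 0)
        return 2;
    return line[2 + len] == '\0' ? 1 : 0;
}
```

For example, get_boundary() applied to the Content-type value from the second sample email in section 2.2 yields x. boundary_line() then reports --x as a part boundary and --x-- as the closing boundary. A real parser such as EPS also has to handle CRLF line endings, trailing whitespace after boundary lines, and nested multipart messages, all of which this sketch ignores.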