Email Basics
What is email exactly? It is quite different from the view we see in our MUA (mail user agent: Thunderbird, Gmail, mutt, pine, etc.). Let’s review the basics of email mechanics, shall we?
Email consists of an Envelope and the Message Data. The envelope contains:
- Return Path: The email address to return the message if undeliverable
- Recipients: The email address(es) of the recipients of the message
The Message Data is the meat of the mail:
- RFC 822 headers such as Subject, From, To, Date, etc. Headers can span multiple lines by starting each subsequent line with a whitespace character. A blank line signals the end of the headers; the body follows. Non-standard header names conventionally begin with the “X-” prefix, used to denote experimental features. These usually have meaning only to the software that placed them there.
- Body: text, HTML, etc. Using the MIME standard, the body can include alternate versions of the message based on markup, and files can be attached to the message.
- File Attachments. Binary files are usually encoded in Base64 so that the message contains no non-printable characters. This increases the file size by about 33% when encoded in the message.
Note that the From and To headers in the message data are really informational. It is the envelope, not the headers, that controls delivery: the envelope recipient list determines who receives the message, and the envelope return path determines where a bounce is sent.
Also, since the From header can be virtually anything, it is easy to spoof the sender or even the full message. Even the envelope return path can be bogus. Therefore, email is not secure as such (unless it is signed, for example with PGP/GPG), and should not be trusted outright.
Since each email server that passes the message along prepends a Received header to the top of the message, you can generally trace its path to see if it came from a source server that is a proper MTA for the sender. Of course, you can only rely on the Received headers added by servers you trust; anything below those may have been forged or tampered with.
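As a rough illustration of tracing a path, the Ruby sketch below pulls the Received headers out of a raw message and prints them from the originating hop to the final one. The host names, IP addresses, and the `received_hops` helper are made up for this example, and nothing here checks whether a hop is genuine.

```ruby
# Hypothetical helper: return the Received header values in the order they
# appear in the message (newest hop first), with folded lines joined.
def received_hops(raw)
  head = raw.split(/\r?\n\r?\n/, 2).first
  # A Received field is its first line plus any continuation lines
  # that start with a space or tab.
  fields = head.scan(/^Received:[^\r\n]*(?:\r?\n[ \t][^\r\n]*)*/i)
  fields.map do |field|
    field.sub(/\AReceived:\s*/i, "").gsub(/\r?\n[ \t]+/, " ")
  end
end

# Sample message; all hosts and addresses are illustrative only.
raw = <<~MSG
  Received: from mx.example.org (mx.example.org [192.0.2.10])
   by mail.example.com with ESMTP; Fri, 20 Jul 2007 09:21:55 -0700
  Received: from laptop.example.org (laptop.example.org [192.0.2.77])
   by mx.example.org with ESMTP; Fri, 20 Jul 2007 09:21:52 -0700
  From: me@example.org
  To: you@example.com
  Subject: Tracing hops

  Hello.
MSG

hops = received_hops(raw)
# Servers prepend, so reversing the list gives the path from origin to destination.
hops.reverse.each_with_index do |hop, i|
  puts "hop #{i + 1}: #{hop}"
end
```

Because each server prepends its own header, the last Received header in the message is the oldest one, and it is also the one that is easiest to forge.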
    To: you@example.com
    From: me@example.com
    Date: 20 Jul 2007 09:21:51 -0700
    Message-ID: <521.1090340511@example.com>
    Subject: Simple Email Message

    Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Maecenas
    ultrices sem sed urna accumsan cursus.
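To make the header and body rules concrete, here is a minimal Ruby sketch that splits a raw message at the first blank line and joins folded header lines back together. The `parse_message` helper and the sample message are assumptions for illustration; replacing each fold with a single space is a simplification of what a full parser does.

```ruby
# Hypothetical helper: split a raw message into header pairs and a body.
def parse_message(raw)
  # The first blank line separates the headers from the body.
  head, body = raw.split(/\r?\n\r?\n/, 2)
  # A line starting with whitespace continues the previous header.
  unfolded = head.gsub(/\r?\n[ \t]+/, " ")
  headers = unfolded.lines.map do |line|
    name, value = line.chomp.split(":", 2)
    [name, value.to_s.strip]
  end
  # Headers stay an array of pairs because names such as Received can repeat.
  { headers: headers, body: body.to_s }
end

raw = <<~MSG
  To: you@example.com
  From: me@example.com
  Subject: Simple Email Message
   with a folded subject line
  X-Example-Tool: hypothetical

  Lorem ipsum dolor sit amet.
MSG

msg = parse_message(raw)
msg[:headers].each { |name, value| puts "#{name} => #{value}" }
puts "body: #{msg[:body]}"
```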
Content-Type and MIME Types
When you want to create more complex email messages, say with alternative content or attachments, you need to construct your message using MIME containers and body parts.
- Content-Type: Use this header in your email message to identify the MIME type of your content
- text/plain is the default MIME type of email. This is viewable by all mail clients.
- text/html denotes an HTML formatted body or part. This is only viewable in GUI-based clients that support HTML.
- multipart/alternative is a MIME container that holds the text, HTML or other versions of the main message. Only one of these (the best one it can show) is viewable by the user’s mail client.
- multipart/mixed is a MIME container used for attaching files to the message body. The first part is the body part (which can also be another container), and the rest are attachments.
- multipart/related is a MIME container that wraps included graphics referenced from an HTML body. These graphics are shown “in place”, such as a logo in the letterhead, instead of being seen as attachments.
- The boundary parameter of the Content-Type header provides a unique identifier that marks the start and end of the body parts within a MIME container. A line consisting of two hyphens followed by the boundary value is a split point in the message. The final boundary line is two hyphens, the boundary value, and two more hyphens.
- image/png is a PNG graphic file; image/gif or image/jpeg could be used as well.
- application/pdf is a PDF attachment; other application/* types cover any application-defined file.
- Stir to combine…
Each MIME body part (attachment, container, or message version) itself has a small MIME header set to indicate its content-type, encoding, and other information.
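The container and boundary rules above can be sketched in a few lines of Ruby. The `build_alternative` helper below is hypothetical and deliberately small: it assembles a multipart/alternative message by hand, with each part carrying its own Content-Type header, and it does no encoding or line wrapping of the content.

```ruby
require "securerandom"

# Hypothetical helper: build a two-part multipart/alternative message.
# The boundary must not occur inside any part, so a random one is the default.
def build_alternative(text, html, boundary: "alt-#{SecureRandom.hex(8)}")
  [
    "MIME-Version: 1.0",
    %(Content-Type: multipart/alternative; boundary="#{boundary}"),
    "",
    "--#{boundary}",             # start of the first part
    "Content-Type: text/plain",
    "",
    text,
    "--#{boundary}",             # start of the second part
    "Content-Type: text/html",
    "",
    html,
    "--#{boundary}--",           # final boundary: two extra hyphens
    ""
  ].join("\r\n")
end

message = build_alternative("Hello, world", "<p>Hello, <em>world</em></p>", boundary: "ma001")
puts message
```

The plain text part comes first and the HTML part last, so a client that picks the last version it can display ends up showing the HTML.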
Here is an example message that is composed of text and HTML body alternatives, with an image attachment called out from the HTML version, plus another image as a regular attachment. The structure of the MIME parts is:

    multipart/mixed (holds the body part plus attachments)
        multipart/alternative (groups the different versions of the message body)
            text/plain
            multipart/related (groups the HTML part with the images it references)
                text/html
                image/jpeg
        image/png (attachment)
Here is how this looks in the email message.
    To: you@example.com
    From: me@example.com
    Date: 20 Jul 2007 09:21:51 -0700
    Message-ID: <521.1090340511@example.com>
    Subject: Complex Email Message
    MIME-Version: 1.0
    Content-Type: multipart/mixed; boundary="mm001"

    This is a multi-part message in MIME format.
    --mm001
    Content-Type: multipart/alternative; boundary="ma001"

    --ma001
    Content-Type: text/plain

    This is the plain text body
    --ma001
    Content-Type: multipart/related; boundary="mr001"

    --mr001
    Content-Type: text/html
    Content-Transfer-Encoding: quoted-printable

    This is the <em>HTML</em> body
    <IMG=20SRC=3D"No%20AttachName"=20alt=3D"Picture=20(Metafile)">
    --mr001
    Content-Type: image/jpeg; name="logo.jpg"
    Content-Transfer-Encoding: base64
    Content-Description: Picture (Metafile)
    Content-Location: No%20AttachName

    Qk2ewgIAAAAAADYAAAAoAAAAJAIAAG4AAAABABgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA////....
    --mr001--
    --ma001--
    --mm001
    Content-Type: image/png; name="elroy-jetson.png"
    Content-Transfer-Encoding: base64
    Content-Disposition: attachment; filename="elroy-jetson.png"

    R0lGODlhMgAvAPcAAAAAAJQAAPfOjP//////////////////////////////////////////////.....
    --mm001--
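Reading such a message is the reverse operation: cut the container body at its boundary lines, then read each part's small header set. The Ruby sketch below handles a single container level only; the `split_parts` helper and the sample body are assumptions for illustration, and a nested container would need the same function applied again to that part's body with its own boundary.

```ruby
# Hypothetical helper: split one MIME container body into its parts.
def split_parts(body, boundary)
  delimiter = "--#{boundary}"
  # Anything after the final boundary line is ignored.
  content = body.split("#{delimiter}--", 2).first
  # Text before the first boundary line is the preamble, so drop it.
  chunks = content.split(/^#{Regexp.escape(delimiter)}[ \t]*\r?\n/)
  chunks.drop(1).map do |chunk|
    head, part_body = chunk.split(/\r?\n\r?\n/, 2)
    type = head[/^Content-Type:\s*([^;\r\n]+)/i, 1]
    {
      # text/plain is the default when a part names no type.
      type: type ? type.strip : "text/plain",
      body: part_body.to_s.chomp
    }
  end
end

body = <<~MSG
  This is a multi-part message in MIME format.
  --ma001
  Content-Type: text/plain

  This is the plain text body
  --ma001
  Content-Type: text/html

  This is the <em>HTML</em> body
  --ma001--
MSG

parts = split_parts(body, "ma001")
parts.each { |part| puts "#{part[:type]}: #{part[:body]}" }
```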
Content-Transfer-Encoding
Note the use of quoted-printable in the above HTML segment. Quoted-printable encoding escapes special characters with an equal symbol (=) followed by the 2-character hexadecimal ASCII representation of the character value. For example, any equal symbols in the body are replaced with “=3D”, where 3D is the hexadecimal representation of the equal symbol in the ASCII collating sequence.
Web browsers do something similar when sending special characters in the URL, but using a percent (%) symbol as the escape symbol.
Quoted-printable also wraps text so lines do not become too long. An equal symbol at the end of the line (=\n) indicates a soft line break: the line was wrapped by the encoder and is rejoined when decoded. The MIME standard (RFC 2045) limits quoted-printable lines to 76 characters. For email in general, RFC 5322 recommends lines of no more than 78 characters and sets a hard limit of 998, and most email software is flexible about the recommended limit.
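Ruby can produce quoted-printable with the `"M"` directive of `Array#pack` and decode it with `String#unpack1`. The sketch below shows the escaping, the soft line breaks on a long line (RFC 2045 allows at most 76 characters per encoded line), and, for comparison, percent-escaping with `ERB::Util.url_encode`. The sample strings are made up for illustration.

```ruby
require "erb"

# Escaping: "=" becomes "=3D", and the encoder ends the output with a soft break.
encoded = ["total = a + b"].pack("M")
puts encoded

# Decoding returns a binary string, so tag it as UTF-8 before comparing.
decoded = encoded.unpack1("M").force_encoding("UTF-8")
puts decoded

# Wrapping: a long line is split with "=" soft line breaks.
long_line = "Lorem ipsum dolor sit amet, " * 8
wrapped = [long_line].pack("M")
longest = wrapped.lines.map { |line| line.chomp.length }.max
puts "lines: #{wrapped.lines.length}, longest: #{longest} characters"

# URLs use the same idea with "%" as the escape symbol.
puts ERB::Util.url_encode("No AttachName")   # -> "No%20AttachName"
```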
Binary files are usually encoded in Base64. The Base64 method maps every 6 bits to a printable character. Ruby has a Base64 helper module:

    require "base64"

    enc = Base64.encode64('Send reinforcements')
    # -> "U2VuZCByZWluZm9yY2VtZW50cw==\n"
    plain = Base64.decode64(enc)
    # -> "Send reinforcements"
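The size increase of about 33% follows directly from the 6-bit mapping: every 3 input bytes (24 bits) become 4 output characters. A small sketch to check this, using random bytes as a stand-in for a binary attachment:

```ruby
require "base64"
require "securerandom"

raw = SecureRandom.random_bytes(3000)   # stand-in for a binary file
strict = Base64.strict_encode64(raw)    # one long line, no line breaks
wrapped = Base64.encode64(raw)          # same data, broken into 60-character lines

puts "raw:     #{raw.bytesize} bytes"
puts "strict:  #{strict.bytesize} bytes"
puts "wrapped: #{wrapped.bytesize} bytes"

overhead = (strict.bytesize - raw.bytesize) * 100.0 / raw.bytesize
puts format("overhead without line breaks: %.1f%%", overhead)
```

The line breaks that `encode64` inserts add slightly more on top of the 4/3 ratio, which is why the size of an attachment inside a message is a little over 133% of the original.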
SMTP: How Email is Transferred
Email is delivered via SMTP, the Simple Mail Transfer Protocol. This is a simple state machine that accepts email through a “command line” interface, usually over port 25. Open a telnet connection to any MX (mail exchanger) host on port 25 to try your hand at delivering a message manually.
Here you can really see how powerful the email envelope is. SMTP requires three parts to deliver a message:
- Return Path: (MAIL FROM)
- Recipients: (RCPT TO)
- Message: (DATA), which is ended by a single period on a line by itself.
    220 example.com mailfront ESMTP
    MAIL FROM: <me@yahoo.com>
    250 2.1.0 Sender accepted.
    RCPT TO: <you@example.org>
    250 OK
    DATA
    354 End your message with a period on a line by itself.
    Subject: Hello there
    From: Me <me@yahoo.com>

    I just love SMTP!
    .
    250 2.6.0 Accepted message qp 16590 bytes 226
    QUIT
    221 2.0.0 Good bye.
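The client side of that exchange can be sketched without touching the network. The hypothetical `smtp_client_lines` helper below only builds the lines a client would send for a given envelope and message; it mirrors the transcript above, so it leaves out the HELO/EHLO greeting that a real session normally starts with, and it does not read or check server replies. It also shows one detail the transcript hides: since a lone period ends the DATA phase, a message line that begins with a period gets an extra period prepended (RFC 5321 calls this transparency; it is often called dot-stuffing).

```ruby
# Hypothetical helper: list the lines an SMTP client sends for one message.
def smtp_client_lines(return_path, recipients, message)
  # RFC 5321 writes these commands with no space after the colon;
  # many servers also accept the spaced form seen in the transcript.
  lines = ["MAIL FROM:<#{return_path}>"]
  recipients.each { |rcpt| lines << "RCPT TO:<#{rcpt}>" }
  lines << "DATA"
  message.lines.each do |line|
    line = line.chomp
    # Dot-stuffing: protect message lines that start with a period.
    lines << (line.start_with?(".") ? ".#{line}" : line)
  end
  # A single period on its own line ends the message data.
  lines << "." << "QUIT"
end

message = "Subject: Hello there\nFrom: Me <me@example.com>\n\nI just love SMTP!\n.signature line\n"
cmds = smtp_client_lines("me@example.com", ["you@example.org"], message)
puts cmds
```

Note that the envelope addresses are passed separately from the message text, so nothing forces them to match the From and To headers inside the message.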