Girders Blog
Notes on building internet applications

Email Basics

Aug 4, 2007

What is email exactly? It is quite different from the view we see in our MUA (Thunderbird, Gmail, mutt, pine, etc.). Let’s review the basics of email mechanics, shall we?

Email consists of an Envelope and the Message Data. The envelope is

The Message Data is the meat of the mail,

Note that the From and To headers in the message are purely informational. It is the envelope sender and the envelope recipient list that determine who the message is from and where it gets delivered.

Also, since the From header can be virtually anything, it is easy to spoof the sender or even the full message. Even the envelope return path can be bogus. Therefore, email by itself is not secure (unless the message is signed, for example with PGP/GPG), and should not be trusted outright.

Since each email server that passes the message along prepends a Received header to the top of the message, you can generally trace its path to see if it came from a source server that is a proper MTA for the sender. Of course, you can only rely on the Received headers added by servers you trust; anything recorded before that point can be forged as well.
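As a rough sketch of that tracing idea, the snippet below pulls the "from" and "by" hosts out of each Received header and lists the hops oldest first. The hostnames, the sample headers, and the `received_path` helper are made up for illustration, and the matching is deliberately naive (real Received headers are often folded across several lines and carry more detail).

```ruby
# Hypothetical sample headers: hostnames are invented for illustration.
# Each relay prepends its own Received line, so the newest hop is on top.
SAMPLE_HEADERS = <<~HDRS
  Received: from relay.example.net by mx.example.com with ESMTP; Fri, 20 Jul 2007 09:21:55 -0700
  Received: from mail.example.org by relay.example.net with ESMTP; Fri, 20 Jul 2007 09:21:53 -0700
  To: you@example.com
  From: me@example.org
  Subject: Simple Email Message
HDRS

# Returns [from_host, by_host] pairs ordered from the first hop to the last.
# Simplification: assumes every Received header fits on a single line.
def received_path(header_text)
  hops = header_text.lines.grep(/\AReceived:/i).map do |line|
    from = line[/\bfrom\s+(\S+)/i, 1]
    by   = line[/\bby\s+(\S+)/i, 1]
    [from, by]
  end
  hops.reverse # headers are newest-first, so reverse to read the path in order
end

received_path(SAMPLE_HEADERS).each { |from, by| puts "#{from} -> #{by}" }
```

Remember that only the hops recorded by servers you control or trust are reliable; the earlier lines are just text that the sender could have typed in.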

    To: you@example.com
    From: me@example.com
    Date: 20 Jul 2007 09:21:51 -0700
    Message-ID: <521.1090340511@example.com>
    Subject: Simple Email Message

    Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Maecenas
    ultrices sem sed urna accumsan cursus.
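In a message like the one above, the headers and the body are separated by the first blank line. Here is a minimal Ruby sketch that splits a raw message that way. The `parse_message` helper is hypothetical and much simpler than a real mail library: it only unfolds continuation lines and splits each header at the first colon.

```ruby
RAW_MESSAGE = <<~MSG
  To: you@example.com
  From: me@example.com
  Date: 20 Jul 2007 09:21:51 -0700
  Message-ID: <521.1090340511@example.com>
  Subject: Simple Email Message

  Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Maecenas
  ultrices sem sed urna accumsan cursus.
MSG

# Returns [headers, body]. Headers are kept as an array of [name, value]
# pairs so that repeated headers (such as Received) are not lost.
def parse_message(raw)
  # The first blank line ends the header section.
  head, body = raw.split(/\r?\n\r?\n/, 2)
  # A header line that starts with whitespace continues the previous one.
  unfolded = head.to_s.gsub(/\r?\n[ \t]+/, " ")
  headers = unfolded.split(/\r?\n/).map do |line|
    name, value = line.split(":", 2)
    [name.strip, value.to_s.strip]
  end
  [headers, body.to_s]
end

headers, body = parse_message(RAW_MESSAGE)
puts headers.assoc("Subject").last
puts body
```

Splitting at the first colon only matters for headers like Date, whose value contains colons of its own.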

Content-Type and MIME Types

When you want to create more complex email messages, say with alternative content or attachments, you need to construct your message using MIME containers and body parts.

Each MIME body part (attachment, container, or message version) itself has a small MIME header set to indicate its content-type, encoding, and other information.

Here is an example message composed of text and HTML body alternatives, with one image referenced from the HTML version and another image as a regular attachment. The structure of the MIME parts is:

    multipart/mixed (holds the body part plus attachments)
        multipart/alternative (groups the different versions of the message body)
            text/plain
            multipart/related (groups the HTML part with images it references)
                text/html
                image/jpeg
        image/png (attachment)

Here is how this looks in the email message.

    To: you@example.com
    From: me@example.com
    Date: 20 Jul 2007 09:21:51 -0700
    Message-ID: <521.1090340511@example.com>
    Subject: Complex Email Message
    MIME-Version: 1.0
    Content-Type: multipart/mixed;
            boundary="mm001"

    This is a multi-part message in MIME format.

    --mm001
    Content-Type: multipart/alternative; boundary="ma001"

    --ma001
    Content-Type: text/plain

    This is the plain text body

    --ma001
    Content-Type: multipart/related; boundary="mr001"

    --mr001
    Content-Type: text/html
    Content-Transfer-Encoding: quoted-printable

    This is the <em>HTML</em> body
    <IMG=20SRC=3D"No%20AttachName"=20alt=3D"Picture=20(Metafile)">

    --mr001
    Content-Type: image/jpeg; name="logo.jpg"
    Content-Transfer-Encoding: base64
    Content-Description: Picture (Metafile)
    Content-Location: No%20AttachName

    Qk2ewgIAAAAAADYAAAAoAAAAJAIAAG4AAAABABgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA////....

    --mr001--

    --ma001--

    --mm001
    Content-Type: image/png;
        name="elroy-jetson.png"
    Content-Transfer-Encoding: base64
    Content-Disposition: attachment;
        filename="elroy-jetson.png"

    R0lGODlhMgAvAPcAAAAAAJQAAPfOjP//////////////////////////////////////////////.....

    --mm001--
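To get a feel for how the boundaries nest, here is a small Ruby sketch that walks a multipart message and prints the content type of each part, indented by depth. The `mime_tree` helper and the shortened sample message are hypothetical, and the parsing is intentionally naive: a real parser has to deal with CRLF line endings, parts without headers, encoded parameters, and plenty of other corner cases.

```ruby
# Shortened, made-up sample with two levels of nesting.
SAMPLE = <<~MSG
  MIME-Version: 1.0
  Content-Type: multipart/mixed; boundary="mm001"

  This is a multi-part message in MIME format.
  --mm001
  Content-Type: multipart/alternative; boundary="ma001"

  --ma001
  Content-Type: text/plain

  This is the plain text body
  --ma001
  Content-Type: text/html

  This is the <em>HTML</em> body
  --ma001--
  --mm001
  Content-Type: image/png; name="elroy-jetson.png"
  Content-Transfer-Encoding: base64

  R0lGODlhMgAvAPcAAAAAAJQAAPfOjP//
  --mm001--
MSG

# Returns an array of indented content types, one entry per MIME entity.
def mime_tree(entity, depth = 0, out = [])
  head, body = entity.split(/\r?\n\r?\n/, 2)        # headers end at the first blank line
  head = head.to_s.gsub(/\r?\n[ \t]+/, " ")          # unfold continuation lines
  ctype = head[/^Content-Type:\s*(.*)$/i, 1] || "text/plain"
  type = ctype[/\A[^;\s]+/].to_s.downcase
  out << ("  " * depth) + type

  boundary = ctype[/boundary="?([^";\s]+)"?/i, 1]
  if type.start_with?("multipart/") && boundary
    b = Regexp.escape(boundary)
    # Drop the closing delimiter ("--boundary--") and anything after it.
    inner = body.to_s.split(/^--#{b}--[ \t]*\r?$/, 2).first.to_s
    # Split on delimiter lines; the first chunk is the preamble, so skip it.
    parts = inner.split(/^--#{b}[ \t]*\r?\n/).drop(1)
    parts.each { |part| mime_tree(part, depth + 1, out) }
  end
  out
end

puts mime_tree(SAMPLE)
```

Each container is handled by the same function as its children, which mirrors how every MIME body part carries its own small header set.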

Content-Transfer-Encoding

Note the use of quoted-printable in the above HTML segment. Quoted-printable encoding escapes special characters with an equals sign (=) followed by the two-digit hexadecimal value of the character. For example, any equals sign in the body is replaced with “=3D”, where 3D is the hexadecimal code of the equals sign in ASCII.

Web browsers do something similar when sending special characters in the URL, but using a percent (%) symbol as the escape symbol.

Quoted-printable also wraps text so lines do not become too long. An equals sign at the end of a line (=\n) marks a soft line break that is removed again on decoding. Quoted-printable limits encoded lines to 76 characters; ordinary message lines should stay within 78 characters and must not exceed 998, and most email software is flexible about the shorter limit.
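Ruby can do quoted-printable without any extra library: the "M" directive of Array#pack encodes and String#unpack1 decodes. The sketch below wraps them in two hypothetical helpers, `qp_encode` and `qp_decode`, and runs a deliberately long line through them so you can see both the =3D escape and the soft line breaks.

```ruby
# Encode a string as quoted-printable using the "M" pack directive.
def qp_encode(text)
  [text].pack("M")
end

# Decode quoted-printable text back to the original bytes.
def qp_decode(encoded)
  encoded.unpack1("M")
end

# A made-up line long enough to force at least one soft line break.
LONG_LINE = "total=" + ("x" * 100)

encoded = qp_encode(LONG_LINE)
puts encoded                          # the "=" becomes "=3D"; wrapped lines end in "="
puts qp_decode(encoded) == LONG_LINE  # round trip restores the original text
```

Note that the decoder returns a binary string, so text in another character set still needs its encoding applied afterwards.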

Binary files are usually encoded in Base64. The Base64 method maps every 6 bits to a printable character, so every 3 bytes of input become 4 characters of output. Ruby has a Base64 helper module:

    require "base64"

    enc   = Base64.encode64('Send reinforcements') # -> "U2VuZCByZWluZm9yY2VtZW50cw==\n"
    plain = Base64.decode64(enc)                   # -> "Send reinforcements"
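To make the "every 6 bits" mapping concrete, here is a hand-rolled encoder. It is only a teaching sketch (the `base64_by_hand` name is made up, and it does no line wrapping); in real code you would stick with the Base64 module or the "m" pack directive.

```ruby
# The 64 printable characters, indexed by their 6-bit value.
ALPHABET = (("A".."Z").to_a + ("a".."z").to_a + ("0".."9").to_a + ["+", "/"]).freeze

def base64_by_hand(data)
  bits = data.unpack1("B*")                   # the input as a string of 0s and 1s
  bits += "0" * ((6 - bits.length % 6) % 6)   # pad the last group up to 6 bits
  chars = bits.scan(/.{6}/).map { |six| ALPHABET[six.to_i(2)] }.join
  chars + "=" * ((4 - chars.length % 4) % 4)  # pad the output to a multiple of 4
end

puts base64_by_hand("Send reinforcements")
# Compare with Ruby's built-in encoder ("m0" means Base64 without line feeds).
puts base64_by_hand("Send reinforcements") == ["Send reinforcements"].pack("m0")
```

The trailing = characters tell the decoder how many padding bits were added to fill the final group.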

SMTP: How Email is Transferred

Email is delivered via SMTP, the Simple Mail Transfer Protocol. This is a simple state machine which accepts email through a “command line” interface, usually over port 25. Open a telnet connection to any MX (mail exchanger) host on port 25 to try your hand at delivering a mail manually.

Here you can really see the email envelope at work. The session has three parts: the envelope sender (MAIL FROM), the envelope recipients (RCPT TO), and the message data (DATA):

    220 example.com mailfront ESMTP
    MAIL FROM: <me@yahoo.com>
    250 2.1.0 Sender accepted.
    RCPT TO: <you@example.org>
    250 OK
    DATA
    354 End your message with a period on a line by itself.
    Subject: Hello there
    From: Me <me@yahoo.com>

    I just love SMTP!
    .
    250 2.6.0 Accepted message qp 16590 bytes 226
    QUIT
    221 2.0.0 Good bye.
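To illustrate the state-machine idea, here is a toy model of the receiving side of such a session. It is a sketch, not a server: the `ToySmtpSession` class is hypothetical, there is no networking, the reply texts are only illustrative, and the greeting step (HELO/EHLO) is left out just as in the session above.

```ruby
class ToySmtpSession
  attr_reader :sender, :recipients, :data, :state

  def initialize
    @state = :start
    @recipients = []
    @data = +""
  end

  # Feed one line from the client; returns the reply, or nil while reading DATA.
  def receive(line)
    return receive_data_line(line) if @state == :data

    case line
    when /\AMAIL FROM:\s*<(.*)>\z/i
      return "503 Bad sequence of commands" unless @state == :start
      @sender = $1               # envelope sender
      @state = :mail
      "250 Sender accepted"
    when /\ARCPT TO:\s*<(.+)>\z/i
      return "503 Bad sequence of commands" unless %i[mail rcpt].include?(@state)
      @recipients << $1          # envelope recipient list
      @state = :rcpt
      "250 OK"
    when /\ADATA\z/i
      return "503 Bad sequence of commands" unless @state == :rcpt
      @state = :data
      "354 End your message with a period on a line by itself."
    when /\AQUIT\z/i
      @state = :closed
      "221 Good bye."
    else
      "500 Command not recognized"
    end
  end

  private

  def receive_data_line(line)
    if line == "."               # a lone period ends the message data
      @state = :start
      "250 Accepted message"
    else
      # Clients double a leading period ("dot-stuffing"); undo that here.
      @data << line.sub(/\A\./, "") << "\n"
      nil
    end
  end
end

session = ToySmtpSession.new
[
  "MAIL FROM: <me@yahoo.com>",
  "RCPT TO: <you@example.org>",
  "DATA",
  "Subject: Hello there",
  "From: Me <me@yahoo.com>",
  "",
  "I just love SMTP!",
  ".",
  "QUIT"
].each do |line|
  reply = session.receive(line)
  puts "C: #{line}"
  puts "S: #{reply}" if reply
end
```

Notice that the sender and recipients stored by the session come only from MAIL FROM and RCPT TO; the From header inside the data is never consulted.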