Introduction
Some time ago I was asked to develop software that extracted bodies and subject lines from emails. "Hmm ...", I thought, "connect to the mail server on port 110, send POP3 commands, receive data, sorted!". Indeed, my first attempt was a piece of cake: reading emails was no problem. Colleagues at my company were soon evangelizing about what we could do: "Yeah mate, we can automatically process emails, no sweat".
Clients would then ask more questions: "Can we send it in rich text or HTML?". "Yeah, sure we can!!". "What about processing them automatically?". "Hey - you're talking to the email kings!!". "What about processing multiple attachments: WAVs, MP3s, JPEGs?". "Ermmm ... can I get back to you on that ...". It wasn't as easy as I'd thought ...
I found it quite difficult to code initially, mainly because of how MIME is laid out and how extremely ugly it can look at first glance. Here's a sample, which contains two multipart blocks (I'll explain this later):
Received: by Mailserver
    id <01C3EFF7.990BBDF0@TEST>; Tue, 11 Feb 2003 17:02:00 -0000
Message-ID: <2CB86919E23ED51191840060080C3DAE02320B76@MAILSERVER>
From: Desmond McCarter
To: testemail
Subject: FW: my subject
Date: Tue, 11 May 2003 17:01:59 -0000
MIME-Version: 1.0
Content-Type: multipart/mixed;
    boundary="----_=_NextPart_000_01C3EFF7.990BD65A"

This message is in MIME format. Since your mail reader does not understand
this format, some or all of this message may not be legible.

------_=_NextPart_000_01C3EFF7.990BD65A
Content-Type: text/plain;
    charset="iso-8859-1"

-----Original Message-----
From: Lisa Cleary [mailto:lisa@cleary.com]
Sent: 11 May 2003 16:17
To: 'Desmond McCarter'
Subject: RE: Test
------_=_NextPart_000_01C3EFF7.990BD65A
Content-Type: application/vnd.ms-excel;
    name="test.xls"
Content-Transfer-Encoding: base64

0M8R4KGxGuEAAAAAAAAAAAAAAAAAAAAAPgADAP7/CQAGAAAAAAAAAAAAAAAEAAAAwQEAAAAAAAAA
EAAA/v///wAAAAD+////AAAAAL0BAAC+AQAAvwEAAMABAAD/////////////////////////////
////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////
//////////////////////////////////////////////////////////////////////////8J
CBAAAAYFAP4czQfJQAAABgEAAOEAAgCwBMEAAgAAAOIAAABcAHAADQAAV0ggU21pdGggTmV3cyAg
ICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAg
ICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIEIAAgCwBGEBAgAAAMABAAA9AQQA
AQD8AJwAAgAOABkAAgAAABIAAgAAABMAAgAAAK8BAgAAALwBAgAAAD0AEgAXB6b/WC+gIzgAAQAA
AAEAWAJAAAIAAACNAAIAAAAiAAIAAAAOAAIAAQC3AQIAAADaAAIAAAAxABoAyAAAAP9/kAEAAAAA
.
MIME (Multipurpose Internet Mail Extensions): A quick and dirty guide
Email was originally designed to carry plain 7-bit ASCII text, yet the data we want to send includes text files, CSVs and even JPEGs or movies. "Hey", you might say, "you can't send binary data through a text-only channel!". Yes you can, with a suitable encoding scheme: the base64 algorithm, for example (check out the System.Convert.ToBase64String method in your .NET framework), turns arbitrary bytes into plain text. In an email context this information also includes the subject, body and forwarded items. For the client sending the email and the client reading it to understand each other, both must send and receive data in MIME format.
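To make the base64 idea concrete, here is a minimal sketch of a round trip using the framework's Convert class. The class and method names (Base64Demo, Encode, EncodeForMail, Decode) are my own for illustration and are not part of the library described in this article.

```csharp
using System;

// Hypothetical helper for illustration; not part of the Pop3Client library.
public static class Base64Demo
{
    // Turns arbitrary bytes into text that uses only A-Z, a-z, 0-9, '+', '/' and '='.
    public static string Encode(byte[] data)
    {
        return Convert.ToBase64String(data);
    }

    // Same encoding, but with a line break after every 76 characters,
    // which is the line length you see in the attachment sample.
    public static string EncodeForMail(byte[] data)
    {
        return Convert.ToBase64String(data, Base64FormattingOptions.InsertLineBreaks);
    }

    // Reverses the encoding; line breaks in the input are ignored.
    public static byte[] Decode(string text)
    {
        return Convert.FromBase64String(text);
    }
}
```

Feeding Encode the eight bytes D0 CF 11 E0 A1 B1 1A E1 (the signature at the start of an old-style Excel file) gives "0M8R4KGxGuE=", which is how the attachment in the sample begins.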
In the snipped MIME example (above), you can see and easily understand the basic fields:
"From:" - who sent the email, "To:" - who is receiving the email, "Subject:" - the subject of the email and "Date:" - the date/time the email was sent.
The "Content-Type:" header determines what type of content the email contains. In a simple text email (i.e. with no attachments) this is normally "text/plain". You can see however (I hope you can anyway) that this email actually contains an attachment: test.xls. Emails that contain attachments typically have a MIME content type of "multipart/mixed". This means that the email contains data sectioned into multiple parts: the body and attachments (or in this case "attachment") etc. The boundary (boundary="----_=_NextPart_000_01C3EFF7.990BD65A") identifies where these parts start and stop: each part begins with a line consisting of two hyphens followed by the boundary string. The body in my example (and in most emails) is the first part of this multipart email. The start of the body is identified at the first boundary declaration:
------_=_NextPart_000_01C3EFF7.990BD65A
Content-Type: text/plain;
    charset="iso-8859-1"

-----Original Message-----
From: Lisa Cleary [mailto:lisa@cleary.com]
Sent: 11 May 2003 16:17
To: 'Desmond McCarter'
Subject: RE: Test
You can also see from the above MIME text that this part also contains its content type, i.e. the format of the body: text/plain. All parts of a multipart email have their header definitions first, then an empty line, then the actual body.
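The "headers, empty line, body" layout can be sketched in a few lines of C#. This is only an illustration under simple assumptions (well-formed input, one value per header name); MimePartParser and Parse are hypothetical names of mine, not the library's API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of the Pop3Client library.
public static class MimePartParser
{
    // Splits one MIME part into its headers and its body.
    // Headers end at the first empty line; everything after it is the body.
    public static (Dictionary<string, string> Headers, string Body) Parse(string part)
    {
        var headers = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        string[] lines = part.Replace("\r\n", "\n").Split('\n');
        string lastName = "";
        int i = 0;

        for (; i < lines.Length && lines[i].Length > 0; i++)
        {
            string line = lines[i];
            if ((line[0] == ' ' || line[0] == '\t') && lastName.Length > 0)
            {
                // A line starting with whitespace continues the previous header.
                headers[lastName] += " " + line.Trim();
                continue;
            }
            int colon = line.IndexOf(':');
            if (colon <= 0) continue; // not a header line; ignore it
            lastName = line.Substring(0, colon).Trim();
            headers[lastName] = line.Substring(colon + 1).Trim();
        }

        // Skip the empty separator line (if present) and rejoin the body lines.
        int bodyStart = Math.Min(i + 1, lines.Length);
        string body = string.Join("\n", lines, bodyStart, lines.Length - bodyStart);
        return (headers, body);
    }
}
```

Note how the indented charset line is folded back into the Content-Type header: a header line that starts with a space or tab is a continuation of the line before it.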
The next part of this multipart email is the attachment:
------_=_NextPart_000_01C3EFF7.990BD65A
Content-Type: application/vnd.ms-excel;
    name="test.xls"
Content-Transfer-Encoding: base64

0M8R4KGxGuEAAAAAAAAAAAAAAAAAAAAAPgADAP7/CQAGAAAAAAAAAAAAAAAEAAAAwQEAAAAAAAAA
EAAA/v///wAAAAD+////AAAAAL0BAAC+AQAAvwEAAMABAAD/////////////////////////////
////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////
//////////////////////////////////////////////////////////////////////////8J
CBAAAAYFAP4czQfJQAAABgEAAOEAAgCwBMEAAgAAAOIAAABcAHAADQAAV0ggU21pdGggTmV3cyAg
ICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAg
ICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIEIAAgCwBGEBAgAAAMABAAA9AQQA
AQD8AJwAAgAOABkAAgAAABIAAgAAABMAAgAAAK8BAgAAALwBAgAAAD0AEgAXB6b/WC+gIzgAAQAA
AAEAWAJAAAIAAACNAAIAAAAiAAIAAAAOAAIAAQC3AQIAAADaAAIAAAAxABoAyAAAAP9/kAEAAAAA
.
Note again that the second and final "multipart part" (i.e. the attachment) starts off with the boundary declaration. Also note that the content type is defined, as well as the name and the encoding scheme used to convert the attachment into text so that it can be sent over the internet. Had this email had another attachment, the second attachment (the third "multipart part") would start off with the same boundary declaration, and so on.
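Splitting a message into its parts then comes down to finding the delimiter lines. The sketch below is a simplified illustration, not the code from my library: MultipartSplitter, GetBoundary and Split are hypothetical names, and it assumes the boundary value contains no semicolon. In MIME each delimiter line is "--" plus the boundary, and the last one is followed by a further "--".

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of the Pop3Client library.
public static class MultipartSplitter
{
    // Pulls the boundary parameter out of a Content-Type header value,
    // e.g. multipart/mixed; boundary="abc" gives abc.
    public static string GetBoundary(string contentType)
    {
        const string key = "boundary=";
        int start = contentType.IndexOf(key, StringComparison.OrdinalIgnoreCase);
        if (start < 0) return "";
        string value = contentType.Substring(start + key.Length);
        int semicolon = value.IndexOf(';');
        if (semicolon >= 0) value = value.Substring(0, semicolon);
        return value.Trim().Trim('"');
    }

    // Returns the raw text (headers + empty line + body) of each part.
    public static List<string> Split(string body, string boundary)
    {
        string delimiter = "--" + boundary;    // starts every part
        string terminator = delimiter + "--";  // closes the last part
        var parts = new List<string>();
        var current = new List<string>();
        bool inPart = false;                   // false until the first delimiter

        foreach (string raw in body.Replace("\r\n", "\n").Split('\n'))
        {
            string line = raw.TrimEnd();
            if (line == delimiter || line == terminator)
            {
                if (inPart) parts.Add(string.Join("\n", current));
                if (line == terminator) return parts;
                current = new List<string>();
                inPart = true;
            }
            else if (inPart)
            {
                current.Add(raw);
            }
            // Lines before the first delimiter (the preamble) are skipped.
        }

        // Tolerate a missing terminator, as in a truncated message.
        if (inPart) parts.Add(string.Join("\n", current));
        return parts;
    }
}
```

The "This message is in MIME format ..." text in the sample sits before the first delimiter, so a splitter like this one simply ignores it.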
MIME is in fact object oriented
The first mistake I made when building a POP3 library was to develop it in an unsuitable language: C. It took too long to write and did indeed get very dirty. It took about 3 weeks to develop and test my library in C, whereas in C# it took a day and a half!! The reason for this is that MIME, you could say, is an object oriented format: each part of a multipart email (even the body of a simple text/plain mail, the main headers etc.) can be thought of as an object. This is one of the main reasons why I wrote it in C# (could have used Java or even J2EE but ...).
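As a rough sketch of what "each part is an object" might look like, consider a single class that can represent both a simple mail and a multipart one. MimeEntity and its members are hypothetical names chosen for this illustration; they are not the classes in my library.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type for illustration; not one of the library's classes.
public class MimeEntity
{
    public string ContentType { get; set; } = "text/plain";
    public string Content { get; set; } = "";

    // Child parts; empty for a simple text/plain mail.
    public List<MimeEntity> Parts { get; } = new List<MimeEntity>();

    public bool IsMultipart
    {
        get { return ContentType.StartsWith("multipart/", StringComparison.OrdinalIgnoreCase); }
    }

    // Visits this entity and then all nested parts, depth first.
    public IEnumerable<MimeEntity> Flatten()
    {
        yield return this;
        foreach (MimeEntity part in Parts)
        {
            foreach (MimeEntity nested in part.Flatten())
            {
                yield return nested;
            }
        }
    }
}
```

Because a part can itself hold parts, the same class covers the whole message, its body and each attachment, which is awkward to express in plain C but natural in an object oriented language.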
Code
The code I have written starts off with a class called Pop3Client. This class is used to instantiate a connection to a POP3 server:
Pop3Client email = new Pop3Client("user", "password", "mail.server.com");
You then open the Inbox as follows:
email.OpenInbox();
To go to the first email, you call the NextEmail() method, which returns true if there is a "next email" or false if no such email exists. There is also an IsMultipart property, which you can use to check whether the email has multiple parts (i.e. attachments). Here's an example of how the code might look:
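using System;
using System.Collections;
// Also import the namespace that contains Pop3Client and Pop3Component.

try
{
    // Connect and log in to the POP3 server.
    Pop3Client email = new Pop3Client("user", "password", "mail.server.com");
    email.OpenInbox();

    // NextEmail() moves to the next message; false means there are no more.
    while (email.NextEmail())
    {
        if (email.IsMultipart)
        {
            // Walk over every part (body and attachments) of the email.
            IEnumerator enumerator = email.MultipartEnumerator;
            while (enumerator.MoveNext())
            {
                Pop3Component multipart = (Pop3Component)enumerator.Current;
                if (multipart.IsBody)
                {
                    Console.WriteLine("Multipart body:" + multipart.Body);
                }
                else
                {
                    Console.WriteLine("Attachment name=" + multipart.Name);
                }
            }
        }
    }

    email.CloseConnection();
}
catch (Pop3LoginException)
{
    Console.WriteLine("You seem to have a problem logging in!");
}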
I have also implemented other functionality within this class library, including saving attachments in their original format (currently done automatically) and getters for the filename, extension etc.
Have a look and see what you think: I definitely found it fun to write and explore!!
Des is a Technical Architect working for a private telecoms based company in the United Kingdom. He has been involved in programming for over 14 years and has worked on many platforms including UNIX, Linux and Windows.
Language specialities are C, C++, C#.NET, Java & J2EE and shell scripting (especially on UNIX/Linux). Also enjoys writing and optimising SQL scripts.
Des is engaged to a lovely girl called Lisa!