PHP Mail Injection Protection and E-Mail Validation - A Beginner's Tour

M-Badger

Jul 27, 2012

CPOL

A tour of various methods for protecting against PHP mail header injection and e-mail validation

Download example source - contact.zip - 3.7 KB

Contact Form Example

Introduction

There's a lot of advice available on protecting a PHP-based contact form from injection attacks: (slightly) different approaches, plus various ready-made functions and classes to help you. Since I am new to this (the article is for beginners, and I am in that category myself), I decided to take a more in-depth look at as much of this advice as I could, and in particular at the source code for a few of the ready-made solutions.

Having done so, I am confident that the solution I have chosen will work well for me and, more importantly, I know why I chose that option and what its benefits and limitations are. Hopefully you can also get to that stage by reading this. I'm happy to receive as many improvements as you can throw at me; this is about learning and understanding, and I'm no expert. Just don't tell me it's wrong and I'm dumb, because that won't help improve the article or the advice; please try to explain it in 'for dummies' mode so I can adapt the article appropriately.

This is also PHP focussed; no doubt other languages and mixes of languages provide other options. PHP is what I was using / learning when I got into this mess in the first place.

Summary
Terminology
Off The Shelf Solutions
Sanitisation
Validation
PEAR QuickForm2
Wrapping it Up
- When it’s Not Critical
- When it is Critical
Points of Interest
History

Summary

If you don't want to read all of this (it is quite long) and your situation is not critical (e.g. it's not an e-commerce site), then I would recommend using PEAR Mail and PEAR HTML_QuickForm2. You'll get good protection from injection attacks and decent validation that user-supplied e-mail addresses are well formatted (i.e. the user has provided something that at least looks like a real address rather than random characters), plus you can easily achieve a good feedback mechanism for typos etc.

I've provided an example of how you might set up a contact form using PEAR Mail and PEAR HTML_QuickForm2 that should be good for non-critical situations. I have no doubt it can be improved upon, refactored etc. (since I'm not a professional developer), but it definitely meets the 'it works' criterion. It's also a bit of an ugly duckling: I've added some styling, but not much, and matching it to the rest of your site is up to you.

The one other thing to note about PEAR Mail is that it was written for PHP 4. With PHP 5 you will get E_STRICT notices, because in several places the PEAR Mail Mail class makes static calls to non-static methods. To suppress them you need an error-reporting level that excludes E_STRICT: in PHP 5.3 and earlier plain E_ALL already excludes it (which is what I've used in the example), whereas from PHP 5.4 E_ALL includes E_STRICT, so you would need E_ALL & ~E_STRICT instead. If you need strict error reporting elsewhere on your site, remember to switch it back on (E_ALL | E_STRICT). Read more on that bug here.

Terminology

These two terms crop up regularly, and everyone seems implicitly to agree on what they mean.

Sanitise: the process by which you clean up or reject user-supplied data (e-mail address, subject etc.) before attempting to send e-mails (via PHP) based on that data.

Validation: the process by which you confirm that user-supplied data (e.g. an e-mail address) actually is an e-mail address and not a random string.

Sanitising is the part that protects you from miscreants who would abuse your form to send spam; this is the critical step. Validation ensures your users give you sensible data and may not be necessary in your circumstances (do you care if they tell you their e-mail address is flobble rather than bobby.jones@locker.com?).

Off The Shelf Solutions

Why not just use PEAR Mail or PHP Mailer? There are almost certainly other open source PHP classes that do the same job; I didn’t look beyond those two simply because they seem the most popular. Or use PEAR QuickForm2, which has built-in validation for the form fields before they get anywhere near your mail sending PHP code.

Why not just remove line endings from the header array? For example, preg_replace('/[\r\n]+/', '', $subject).

Well, that's what got me confused. Are they all equally good? Are they completely secure? Are some better than others? I had a look at the code and that didn't help, since they all implement sanitisation and validation in different ways. What, exactly, is the best approach?

As far as I can gather, the simplest of those, stripping line endings from the header array, would work just fine and dandy. However, you would need to accept that odd user input could still trip up your beautifully simple code, or else add extra fail-safe handling so your users aren't left staring at a garbled PHP error message. Doing that is smart anyway, but it would probably be triggered more often than you'd like!
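As a rough illustration of that fail-safe idea, here is a minimal sketch that rejects any header value containing a carriage return or line feed and returns a friendly message instead of sending. The function names contains_line_break and check_header_fields are hypothetical, and rejecting rather than stripping is a design choice, not necessarily what any of the libraries discussed here do.

```php
<?php
// Hypothetical helper: does a header value contain a CR or LF character?
function contains_line_break(string $value): bool
{
    return strpbrk($value, "\r\n") !== false;
}

// Hypothetical helper: check every user-supplied header value before mail().
// Returns a user-friendly error message, or null if everything looks safe.
function check_header_fields(array $fields): ?string
{
    foreach ($fields as $label => $value) {
        if (contains_line_break($value)) {
            return "Sorry, the {$label} field contains characters we cannot accept. Please check it and try again.";
        }
    }
    return null;
}

$error = check_header_fields([
    'Name'    => 'Bobby Jones',
    'E-mail'  => "bobby.jones@locker.com\r\nBcc: victim@example.com",
    'Subject' => 'Hello',
]);

if ($error !== null) {
    echo $error, PHP_EOL; // redisplay the form instead of sending anything
}
```

Rejecting has the advantage that nothing is silently altered; the user simply sees the form again with a message.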

Sanitisation

The sub-sections below look at the different approaches PEAR Mail and PHP Mailer take to sanitising user input, and also at the suggestions from a couple of websites that are fairly representative of much of the advice out there. The key message, if you don't want to read the detail, is that you should never trust user input and, as an absolute minimum, should remove any line endings present in the header information passed to PHP's mail function. Personally, I am likely to use PEAR Mail to do this for me rather than doing it myself; it looks like the most robust implementation. Read on if you want to know why.

PEAR Mail

Let’s look at sanitising first, since that’s the most critical one. And let’s see what the big open-source boys and girls do, just to get us started. First PEAR Mail’s Mail.php.

function _sanitizeHeaders(&$headers)
{
    foreach ($headers as $key => $value) {
        $headers[$key] = preg_replace('=((<CR>|<LF>|0x0A/%0A|0x0D/%0D|\\n|\\r)\S).*=i', null, $value);
    }
}

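OK, so PEAR Mail relies on stripping line endings (strictly, it removes a line ending plus the rest of that line) and has a number of different options for how a line ending is identified. Read literally, though, some of those options are pieces of text rather than characters: <CR> and <LF> match those four-character strings, and 0x0A/%0A (likewise 0x0D/%0D) is a single alternative that matches that exact text, slash included. The \S after the group also means a line ending is only matched when the next character is not whitespace.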

The main PEAR Mail Mail class and the various extra files PEAR Mail uses (which contain additional functionality pulled in via include) all have a comment from 2006 that says 'Guard against email injection exploits by sanitizing user-supplied headers. This is done by stripping everything in a header value which might be interpreted as a header separator.' Each function that implements this then refers back to the _sanitizeHeaders function shown above.
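To see what that pattern actually does, here is a stand-alone sketch that applies the same regex to a plain array, without PEAR installed. The wrapper name pear_style_sanitize is hypothetical, and the replacement is an empty string rather than PEAR's null, because newer PHP versions flag passing null there as deprecated.

```php
<?php
// Stand-alone copy of the pattern from PEAR Mail's _sanitizeHeaders(),
// applied to an ordinary array so its behaviour can be tried without PEAR.
function pear_style_sanitize(array $headers): array
{
    foreach ($headers as $key => $value) {
        $headers[$key] = preg_replace(
            '=((<CR>|<LF>|0x0A/%0A|0x0D/%0D|\\n|\\r)\S).*=i',
            '',
            $value
        );
    }
    return $headers;
}

$clean = pear_style_sanitize([
    'From'    => "bobby.jones@locker.com\nBcc: victim@example.com",
    'Subject' => "Hi\r\nBcc: victim@example.com",
]);

var_dump($clean['From']);    // the injected Bcc line is gone
var_dump($clean['Subject']); // the Bcc line is gone, but a lone "\r" survives
```

The second input shows a side effect of the \S in the pattern: in \r\n only the \n is followed by a non-whitespace character, so the injected line is removed but the \r is left behind.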

PHP Mailer

Then let’s see what PHP Mailer’s class.phpmailer.php does.

protected function AddAnAddress($kind, $address, $name = '') {
    ...
    ...
    $address = trim($address);
    $name = trim(preg_replace('/[\r\n]+/', '', $name)); //Strip breaks and trim
    ...
    ...
}

This protected method is used to add To, Cc, Bcc and Reply-To header items, identified via the $kind variable. So PHP Mailer trims leading and trailing spaces from the address and, for the display name, removes line endings before trimming, in a slightly different way to PEAR Mail. It's worth noting that PHP Mailer also has another function called SecureHeader, which also trims leading and trailing spaces and removes line endings, but using str_replace rather than preg_replace. It is called to make the Subject header safe, but what about the From address, I hear you say? Yup, that uses the same code as the AddAnAddress method shown above, but in a different method. Genius; some refactoring is in order, I do believe (and I'm a beginner!)

public function SecureHeader($str) {
    return trim(str_replace(array("\r", "\n"), '', $str));
}

PEAR Mail vs. PHP Mailer

The key difference appears to be exactly how they go about removing line endings. Dangerous assumption alert: let's assume, for now, that they're equivalent. Yes, I'm ignoring the fact that PHP Mailer is a bit of a mixed-up kid; for the purposes of this discussion those various line-ending replacements can be considered equivalent.
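That assumption is easy to spot-check. The sketch below wraps the two line-stripping expressions from PHP Mailer in hypothetical functions and compares their output on a few sample inputs; agreement on a handful of inputs is evidence, not proof.

```php
<?php
// The two PHP Mailer line-ending strippers quoted above, wrapped as functions.
function mailer_name_style(string $s): string
{
    return trim(preg_replace('/[\r\n]+/', '', $s)); // as in AddAnAddress
}

function mailer_secure_header(string $s): string
{
    return trim(str_replace(array("\r", "\n"), '', $s)); // as in SecureHeader
}

// A few sample inputs, including a typical injection attempt.
$samples = [
    "Bobby Jones",
    "Bobby\r\nBcc: victim@example.com",
    "  Hello\n\n\nWorld  ",
    "Lone\rcarriage return",
];

foreach ($samples as $s) {
    $same = mailer_name_style($s) === mailer_secure_header($s);
    echo json_encode($s), ' => ', $same ? 'same' : 'DIFFERENT', PHP_EOL;
}
```

Both remove every \r and \n and then trim, so for this purpose they should behave the same; the regex version just removes runs of line endings in one go.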

The other difference is that PEAR Mail looks at the entire header array in one go and PHP Mailer picks individual elements that will eventually contribute to the header array and checks those.

New York PHP – Phundamentals

So let’s look at another source of advice, the New York PHP website and its Phundamentals series where they make their recommendation.

Their first approach is to use a regex with PHP's preg_match function. It is suggested as a pre-submit check, i.e. checking the user-provided data before starting the sending functions. They provide one regex for filtering e-mail addresses and another for the remaining header fields, as follows:

Pattern for filtering fields such as names
'/^[a-z0-9()\/\'":\*+|,.; \-!?&#$@]{2,75}$/i';
Pattern for filtering email addresses
'/^[^@\s]+@([-a-z0-9]+\.)+[a-z]{2,}$/i';

For example, using the pattern for email addresses, you might do the following:
$emailPattern = '/^[^@\s]+@([-a-z0-9]+\.)+[a-z]{2,}$/i';
if (!preg_match($emailPattern, $emailFieldToTest)) {
    print 'Please review the email address you entered. There seems to be a problem';
}

Since the suggested code uses 'does not match' (the exclamation mark), I can only assume that what they are saying is that these are the only valid characters for an e-mail address, and since newline characters are not included you can't be subject to an injection attack. (One catch: without the D modifier, $ in a PHP regex also matches just before a trailing newline, so a single line feed at the very end of the input would still get through.) This seems a bit outdated to me, since the character set for acceptable domain names etc. is exploding; that regex would have to change as time moves on and, for example, Chinese characters become permissible parts of domain names. However, the article appears to have been written circa 2005, so that's not a criticism, and I would hazard a guess that the author would take a different approach today.
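A quick way to see the trailing-newline behaviour: PCRE's $ matches at the end of the subject or just before a final line feed, unless the D (dollar end-only) modifier is added. This sketch runs New York PHP's e-mail pattern with and without D; adding D is my suggestion, not part of their advice.

```php
<?php
// New York PHP's e-mail pattern, as quoted above.
$pattern = '/^[^@\s]+@([-a-z0-9]+\.)+[a-z]{2,}$/i';

// Same pattern with the D modifier, so $ only matches at the very end.
$strict = '/^[^@\s]+@([-a-z0-9]+\.)+[a-z]{2,}$/iD';

$input = "bobby.jones@locker.com\n"; // note the trailing line feed

var_dump(preg_match($pattern, $input)); // int(1): the newline slips through
var_dump(preg_match($strict, $input));  // int(0): rejected
```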

Their second approach looks more familiar and is designed to work post submission.

function safe( $name ) {
    return( str_ireplace(array( "\r", "\n", "%0a", "%0d", "Content-Type:", "bcc:","to:","cc:" ), "", $name ) );
}

It's good old str_replace again, or rather its case-insensitive PHP 5 variant, str_ireplace. It looks for only some of the line-ending characters that PEAR Mail does, but more than PHP Mailer does, and it also strips the header names Content-Type:, bcc:, to: and cc: outright.

Dream.in.code, tutorial by codeprada

Let’s look at one more option, an article by codeprada on Dream.in.code.

This one looks similar to the second suggestion by New York PHP with a slight variation.

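function sanitize(&$array) {
    // Remove CR/LF characters (and their URL-encoded forms) from every element.
    foreach ($array as &$data) {
        $data = str_replace(array("\r", "\n", "%0a", "%0d"), '', stripslashes($data));
    }
    unset($data); // break the reference left behind by foreach
}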

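I'm not certain why it includes the stripslashes function; my best guess is that it undoes PHP's old magic_quotes_gpc setting (removed in PHP 5.4), which added backslashes to incoming form data. Either way, the essence is similar to New York PHP's recommendation, and as with PEAR Mail it cycles through an entire array rather than picking specific header elements.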

PHP filter_var and FILTER_SANITIZE_EMAIL

One of the things I stumbled across whilst looking into validating e-mail addresses was PHP's filter_var function. There are a whole load of constants you can use with it to filter variables auto-magically, and one of those predefined constants is FILTER_SANITIZE_EMAIL. This isn't going to be much use for the other header items, but what about the e-mail addresses? Maybe that's all you allow your users to provide, and hence all you need to test?

The manual page says it will 'Remove all characters except letters, digits and !#$%&'*+-/=?^_`{|}~@.[]'. I'm not sure I like that, or at least I'm not confident in it, for similar reasons to the first approach offered by New York PHP. There's also very little explanation: does 'all characters' cover all current and future character sets, and is the second part applied as a regex? I couldn't find the relevant parts of the PHP source to try to understand it. There's another relevant point here: FILTER_SANITIZE_EMAIL makes filter_var return an altered version of the value you pass in, so if the user made a typo (rather than an injection attack) and the filter 'corrects' it, who knows whether the corrected version is what the user intended? I think it's far better in this case to give the user feedback and let them correct it.
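If you do want to use FILTER_SANITIZE_EMAIL, one option is to use it only as a detector: compare its output with the original and, if anything was removed, ask the user to check the address rather than quietly using the altered version. A minimal sketch; email_needs_review is a hypothetical name.

```php
<?php
// Use FILTER_SANITIZE_EMAIL only to *detect* unexpected characters,
// never to silently "fix" the address; ask the user to correct it instead.
function email_needs_review(string $input): bool
{
    $sanitised = filter_var($input, FILTER_SANITIZE_EMAIL);
    return $sanitised !== $input; // something was removed
}

$typed = 'bobby jones@locker.com'; // a simple typo: a stray space

echo filter_var($typed, FILTER_SANITIZE_EMAIL), PHP_EOL; // bobbyjones@locker.com
var_dump(email_needs_review($typed));                    // bool(true)
var_dump(email_needs_review('bobby.jones@locker.com'));  // bool(false)
```

Here the stray space is removed, giving bobbyjones@locker.com when the user probably meant bobby.jones@locker.com, which is exactly why feedback beats silent correction.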

Sanitisation Summary

So, removing line endings seems like the key, as we noted above, but there are different ways to do it and different extents to which you can go searching for line ending characters. I’m all for testing as many as might ever be possible, it doesn’t seem likely to add much overhead to processing time and as long as it doesn’t cause unexpected failures or false positives then it sounds smart to me.

The other thing I note from looking over all this code is that the PEAR Mail approach (replicated by the Dream.in.code article) of sanitising all headers in a foreach loop looks easier to write and review, makes it easier to see what is actually happening, and should be easier to maintain.
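A sketch of that foreach style, assuming you want to cover the union of the line-ending forms the approaches above look for: literal CR and LF characters plus the URL-encoded %0A and %0D (matched case-insensitively). sanitize_headers is a hypothetical name, not PEAR's function.

```php
<?php
// Hypothetical helper: clean every header value in one pass by removing
// CR/LF characters and the literal text %0A / %0D, then trimming spaces.
function sanitize_headers(array $headers): array
{
    $lineEndings = array("\r", "\n", '%0a', '%0d');
    foreach ($headers as $key => $value) {
        $headers[$key] = trim(str_ireplace($lineEndings, '', (string) $value));
    }
    return $headers;
}

$headers = sanitize_headers([
    'From'     => "bobby.jones@locker.com\r\nBcc: victim@example.com",
    'Reply-To' => 'bobby.jones@locker.com%0ABcc:victim@example.com',
    'Subject'  => ' Question about my order ',
]);
print_r($headers);
```

Unlike PEAR's pattern, this keeps the text that followed the line break, so an injection attempt ends up as one long value that is harmless as a header but malformed as an address; combining it with validation would catch that.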

Summarising which line endings each approach tries to take care of:

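Approach	\n	\r	<CR>	<LF>	0x0A/%0A	0x0D/%0D	%0A	%0D	Trims spaces
PEAR	Yes	Yes	Yes	Yes	Yes	Yes	-	-	-
PHP Mailer	Yes	Yes	-	-	-	-	-	-	Yes
NY PHP	Yes	Yes	-	-	-	-	Yes	Yes	-
Dream.in.code	Yes	Yes	-	-	-	-	Yes	Yes	-

(<CR>, <LF>, 0x0A/%0A and 0x0D/%0D are the literal pieces of text in PEAR's pattern; %0A and %0D on their own are the URL-encoded line feed and carriage return.)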

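I have no idea whether trimming leading and trailing spaces is a smart idea, or even whether it is useful.

Given that my understanding of PHP at any depth is essentially nil, it's entirely possible that some PHP setting means checking for just one set of line-ending characters is quite enough, perhaps because they are all converted to e.g. \r\n, or because str_replace, str_ireplace, preg_match and preg_replace can be given one line-ending type and then treat all variations of it the same way. As far as I can tell, neither is the case: those functions match only exactly what you give them, and PHP does not normalise line endings in user input. What PHP does do is URL-decode form data before it reaches $_POST or $_GET, so an encoded %0A sent by an attacker arrives as a real line feed, which is why the \r and \n checks are the ones that really matter.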

Any which way, I’d say that’s a win for PEAR Mail. It would make me feel more comfortable anyway.

Validation

It’s one thing to know that your website is protected against injection attacks but what about failing gracefully if the user tells you their e-mail address is not_telling_you? You’d have no address to reply to. Same goes if, for whatever reason, the user is allowed to provide the To address (or cc and bcc), you may well want to validate that it is in fact a valid e-mail address before you try sending mail there. Once again there are differences.

We'll discount the pseudo-validation provided by the New York PHP website, the pattern that only checked for valid characters (it was essentially aimed at sanitisation anyway), since it gives no way of knowing whether an address is properly formed. So let's look at PEAR Mail, PHP Mailer and filter_var (this time with FILTER_VALIDATE_EMAIL).

It’s also worth noting that validate in this context means check that it’s a properly formatted e-mail address, it does not mean check that the e-mail address is a real one, exists and has a human body somewhere that looks at it.

The summary, if you don't want to read the detail, is that it's hard to make a single (and simple) choice that is in some way 'right'. It's even harder if validating the address as a real, properly formatted one is critical to your website; in that case, taking professional advice is advisable. If it's not critical, then a combination of PEAR HTML_QuickForm2 and PEAR Mail is a good choice that will give you the minimum of headaches and work in the vast majority of circumstances.

Health Warning

This is a pretty murky area – not murky in a bad sense, just really tough to grasp even the basics since the moment you start digging you uncover all sorts of things that lead you into pretty complex places. For example, which standard defines what is and isn’t an acceptable e-mail address? PEAR Mail has a class called Mail_RFC822 that describes itself as an ‘RFC 822 Email address list validation Utility’. OK, so that’s the standard then - RFC 822, or is it? According to Dominic Sayers, it’s RFC 5321, see his About page on the isemail site. Isemail is a PHP function, hosted on code.google.com, which purports to validate e-mail addresses against RFC 5321.

That class (RFC 822 in PEAR MAIL) and that function (RFC 5321 in IsEmail) are both far too complex for me to even consider commenting on their respective validity or correctness. I have no idea whether RFC 822 or RFC 5321 is what I should be worried about let alone if those two options actually validate exactly what the standard describes or even if those standards are written in a way that makes it possible to understand definitively what is and isn’t valid.

Then if you look at PHP's filter_var function with the FILTER_VALIDATE_EMAIL constant, it gets even more confusing. It uses a regex to validate the e-mail address rather than the 1,000-plus lines of code that PEAR Mail and IsEmail use. Does anyone fancy determining whether a regex of over 1,000 characters does exactly what one of those standards requires, or whether that regex is better or worse than 1,000-plus lines of code? And even that regex is a variant of one provided by someone else.

Many of the arguments seem to be about whether or not e-mail addresses with just a TLD should validate, e.g. bob@au: is that OK or not? What if it's a local e-mail rather than one sent across domains? Then there's a bunch of other discussions that I have no intention of trying to understand too deeply, any more than I intend to try to understand the RFC standards. Just know that this is not simple and you should choose carefully, especially for a registration form on an e-commerce site (see the section on validation and Gmail-style addresses below for an example), since you really don't want to reject valid e-mail addresses provided by potential customers.

PEAR Mail

As noted above, PEAR Mail implements a class called Mail_RFC822 that validates e-mail addresses in the header as being consistent (or not) with whatever RFC 822 defines. It does not rely on any of PHP's built-in functionality such as filter_var. It does, however, use preg_replace on line endings; why this is necessary I don't know, since the PEAR Mail Mail class itself already deals with them.

// Unfold any long lines in $this->address.
$this->address = preg_replace('/\r?\n/', "\r\n", $this->address);
$this->address = preg_replace('/\r\n(\t|)+/', ' ', $this->address);

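The code comment probably gives the answer: RFC 822 allows a long header line to be 'folded' onto several lines by inserting a line break followed by a space or tab, and this code 'unfolds' such lines back into one before the address list is parsed. So it looks like part of parsing rather than a duplicate of the sanitisation step, though it may also act as a minimal sanity check if the class is ever used without PEAR Mail.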
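Here is what those two unfolding lines do to a folded address list, as a stand-alone sketch; the function name unfold and the sample addresses are mine, while the two preg_replace calls are copied from the Mail_RFC822 code above.

```php
<?php
// The two unfolding lines from Mail_RFC822, applied to a folded header value.
function unfold(string $address): string
{
    $address = preg_replace('/\r?\n/', "\r\n", $address); // normalise LF to CRLF
    return preg_replace('/\r\n(\t|)+/', ' ', $address);    // CRLF plus any tabs -> one space
}

// A long address list folded onto a second line that starts with a tab.
$folded = "Bob <bob@example.com>,\r\n\tJane <jane@example.org>";
echo unfold($folded), PHP_EOL;
// Bob <bob@example.com>, Jane <jane@example.org>
```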

I cannot comment on how effectively it matches the requirements of RFC 822 nor if validating against that standard rather than RFC 5321 is correct, wrong or if it makes any real world difference.

What we could possibly say is that, since PEAR Mail is very widely used, if it were rejecting a lot of valid e-mail addresses or letting invalid ones through it would probably be heavily reported as a bug. The only directly relevant bug I could find is this one, requesting the ability to suppress validation for specific cases such as testing environments. One comment in there says 'The validation function is indeed very unreliable'; I have no idea how valid that critique is, and it might be utter nonsense.

PHP Mailer

PHP Mailer has a single, and simple, e-mail validation function as follows.

/**
* Check that a string looks roughly like an email address should
* Static so it can be used without instantiation
* Tries to use PHP built-in validator in the filter extension (from PHP 5.2), falls back to a reasonably competent regex validator
* Conforms approximately to RFC2822
* @link http://www.hexillion.com/samples/#Regex Original pattern found here
* @param string $address The email address to check
* @return boolean
* @static
* @access public
*/
public static function ValidateAddress($address) {
    if (function_exists('filter_var')) { //Introduced in PHP 5.2
        if(filter_var($address, FILTER_VALIDATE_EMAIL) === FALSE) {
            return false;
        } else {
            return true;
        }
    } else {
        return preg_match('/^(?:[\w\!\#\$\%\&\'\*\+\-\/\=\?\^\`\{\|\}\~]+\.)*[\w\!\#\$\%\&\'\*\+\-\/\=\?\^\`\{\|\}\~]+@(?:(?:(?:[a-zA-Z0-9_](?:[a-zA-Z0-9_\-](?!\.)){0,61}[a-zA-Z0-9_-]?\.)+[a-zA-Z0-9_](?:[a-zA-Z0-9_\-](?!$)){0,61}[a-zA-Z0-9_]?)|(?:\[(?:(?:[01]?\d{1,2}|2[0-4]\d|25[0-5])\.){3}(?:[01]?\d{1,2}|2[0-4]\d|25[0-5])\]))$/', $address);
    }
}

I think the in-code comments for the function say everything I want to say and we’ll leave it there, other than to note my comments on FILTER_VALIDATE_EMAIL below.

I'm not at all sure I want to rely on this: it uses a potentially buggy PHP function (but see the discussion below, since it might not be that buggy any more) and falls back to something that is 'reasonably competent' and 'conforms approximately'. If you're happy with that, go for your life; but if valid e-mails aren't critical I'd be tempted not to bother with validation, and if they are I'd look elsewhere.

Filter_var and FILTER_VALIDATE_EMAIL

Beyond what I noted in the Health Warning section above, the only thing I want to add about FILTER_VALIDATE_EMAIL is that there appears to be a strong dependency on which version of PHP you are using. The regex that sits behind that constant, and is used by the filter_var function, has been through some changes in recent versions of PHP to fix bugs, though some folk appear to argue that they weren't bugs at all. As noted for PHP Mailer, if valid e-mails aren't critical I'd be tempted not to bother with validation at all, and if they are I'd look elsewhere. However...

There are two counterpoints to that:

In the comments section of the PHP source code it refers to RFC 5321, so if Dominic is right then at least it’s targeting the correct standard and
In the comments for IsEmail someone has used the IsEmail xml based test addresses (which purport to cover all of the RFC 5321 acceptable and not acceptable variants) against FILTER_VALIDATE_EMAIL and it passed all but the one with only a TLD; and some would argue that RFC 5321 does not allow such addresses and some would argue it does.

So maybe it is a good choice (if you have PHP 5.2+ on your server)? Not easy, is it!
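For reference, basic use of filter_var with FILTER_VALIDATE_EMAIL looks like the sketch below (PHP 5.2+). looks_like_email is a hypothetical wrapper; remember it checks format only, and results for edge cases such as TLD-only addresses may differ between PHP versions.

```php
<?php
// Format check only: says nothing about whether the mailbox exists.
function looks_like_email(string $address): bool
{
    return filter_var($address, FILTER_VALIDATE_EMAIL) !== false;
}

foreach (['bobby.jones@locker.com', 'flobble', 'not_telling_you'] as $candidate) {
    printf("%-25s %s\n", $candidate, looks_like_email($candidate) ? 'ok' : 'please check');
}
```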

IsEmail

We mentioned it earlier, so it probably deserves its own section. If you decide that you really want validation against RFC 5321 rather than the apparently more popular RFC 822 (regardless of which happens to be the right choice), I haven't come across anything else that claims to do that without resorting to an impenetrable regex (as FILTER_VALIDATE_EMAIL does). Then again, since I can't penetrate IsEmail's code either, it's essentially an arbitrary difference anyway.

My only other comment is that this isn't a maintained open-source project, so as it gets older it risks falling out of date. It might still be a good choice in some circumstances. It would be even better if it were incorporated into PEAR Mail as an alternative to Mail_RFC822, since you could then choose which standard to validate against and the code would be maintained. Or it could even be incorporated into PHP itself, perhaps via filter_var: you could have FILTER_VALIDATE_RFC822_EMAIL, FILTER_VALIDATE_RFC5321_EMAIL etc.

Gmail Style

The blurb for IsEmail has an interesting comment about Gmail: apparently it allows bobby.jones@gmail.com to give a website that requires registration the address bobby.jones+codeproject@gmail.com. Exciting! The message would be delivered to the standard mailbox for bobby.jones@gmail.com, but you could readily see which website it came from. So if you got an advert for Viagra sent to the +codeproject version, you'd know that they had either sold your e-mail address (surely not!) or had their database compromised, and you could block (or filter) further messages to that variant of your Gmail address.

So, which of the e-mail validation techniques discussed would accept the Gmail-style addresses described? I'm not sure that's easy to answer at my level of knowledge: even if you know such an address is valid under RFC 5321, how do you know whether an RFC 5321 validator actually handles it (and all the other non-obvious variants) correctly? As said earlier, you don't want to reject valid e-mail addresses if you have an e-commerce site or one that relies on a large user base to derive income from advertising. Tread carefully.
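One small, concrete data point: the sketch below checks bobby.jones+codeproject@gmail.com against New York PHP's e-mail pattern and against filter_var with FILTER_VALIDATE_EMAIL. Both accept it, but passing one example says nothing about the many other non-obvious variants.

```php
<?php
// A Gmail-style "plus" address.
$address = 'bobby.jones+codeproject@gmail.com';

// New York PHP's e-mail pattern, as quoted in the sanitisation section.
$nyPhpPattern = '/^[^@\s]+@([-a-z0-9]+\.)+[a-z]{2,}$/i';

$nyPhp     = preg_match($nyPhpPattern, $address) === 1;
$filterVar = filter_var($address, FILTER_VALIDATE_EMAIL) !== false;

var_dump($nyPhp, $filterVar);
```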

Other Notes

Looking at the IsEmail blurb (again), this time in the comments section (but by the author) it says this, ‘The latest shift is that browser makers have decided they can ignore the RFCs altogether and redefine what is a valid email themselves. In creating INPUT TYPE="email" they have decided to allow only a simply-validated reduced subset of email address formats. This ad hoc standard effectively redefines what is a valid address in the real world’.

It is not getting any easier around here.

Validation Summary

It's hard to know what to write, so let's start simple. One option is to require users to respond to an e-mail sent to the address they provide in order to complete their registration. This happens a lot and users are used to it, so it seems a reasonable approach: as long as you have sanitised your headers you don't need to validate the address, because users do it for you by responding to the message. However:

Users are an impatient lot, and I suspect they'll be less and less willing to put up with this as time goes by, particularly on e-commerce sites. They just want to complete a transaction, and anything you put in the way can potentially cost you a customer. It's perhaps less of a concern for sites like CodeProject, where the motivation for signing up is different (although from memory I don't think I had to confirm my address with CodeProject);
Helping users identify when they’ve mis-typed their address is a good thing generally, particularly now they’re used to instant feedback on website forms, highlighting omissions and errors, sometimes for security reasons (e.g. a banking site) and sometimes just to be user-friendly.
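The confirm-by-reply approach mentioned above might be sketched like this, under several assumptions: PHP 7 or later for random_bytes, a hypothetical confirm.php page at a placeholder URL, and some storage of your own that records the token against the address. The function names are hypothetical too.

```php
<?php
// Generate an unguessable token to send in the confirmation e-mail.
function make_confirmation_token(): string
{
    return bin2hex(random_bytes(16)); // 32 hexadecimal characters
}

// Build the link the user must visit; confirm.php is a hypothetical page.
function confirmation_link(string $baseUrl, string $token): string
{
    return $baseUrl . '?' . http_build_query(['token' => $token]);
}

$token = make_confirmation_token();
// Store $token alongside the (sanitised) address, marked as unconfirmed.
$link = confirmation_link('https://www.example.com/confirm.php', $token);

$body = "Please confirm your address by visiting:\n" . $link . "\n";
echo $body;
// mail($address, 'Please confirm your e-mail address', $body) would send it.
```

The token is random rather than derived from the address, so it can't be guessed, and the actual mail() call is left as a comment.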

I would not use double entry as your validation method: is there anyone in existence who doesn't use cut and paste for this? And what about those sites that prevent you using cut and paste? I hate them, I really do; they annoy the hell out of me. Don't do it.

But what if, for whatever reason, e-mail validation is critical to you? How do you choose which tool or class to help you? I can't imagine you want to roll your own class with thousands of lines of PHP! Based on my limited experience and this investigation, I'd be tempted to say you could implement your own function based on the one used by PHP Mailer. Yes, I know I criticised it earlier, but stay with me.

If we assume that RFC 5321 is what you should be validating against, and that the comments regarding the improved capability of filter_var with FILTER_VALIDATE_EMAIL are correct (i.e. it does a good job of validating e-mail addresses against RFC 5321, ignoring the disagreements about TLDs), then on a site with a recent version of PHP 5 installed (5.2 or higher) you have a good validation mechanism. The change I'd make is to its fall-back option, a 'reasonably competent' regex that, according to its own comments, 'conforms approximately to RFC2822'. Why not replace that fall-back with a manual implementation of what filter_var with FILTER_VALIDATE_EMAIL does? The regex is defined in the PHP source code, so it shouldn't be that hard to adopt (though I'm no C or PCRE expert and certainly not capable of converting it to PHP code myself). Then you have RFC 5321-style validation whichever version of PHP you're using.

I have assumed in this summary that RFC 5321 is the right option; PHP's core code has gone that way with FILTER_VALIDATE_EMAIL, for example. It does feel odd that PEAR Mail is using RFC 822, but if I had to make an assumption (and beginners often have to make many) then that's the one I'd make. That said, I doubt you'd go horribly wrong by using PEAR Mail, and I come round to that view when I wrap this discussion up :-)

PEAR QuickForm2

I’ve been using this and I began to wonder whether the rules this allows you to apply to form fields, such as the ones shown below, provide some or all of the sanitisation and validation required and how it compares to what we’ve already discussed.

$e_mail->addRule('required', 'Your E-Mail is required', null, HTML_QuickForm2_Rule::ONBLUR_CLIENT_SERVER);
$e_mail->addRule('email', 'Email address is invalid', null, HTML_QuickForm2_Rule::ONBLUR_CLIENT_SERVER);

The immediate point to note is that I don't know how well this would work against a script-based attack on your site, i.e. if someone sent a forged HTTP request with manually (and maliciously) crafted POST or GET data, would the QuickForm2 rules still apply? Or do they only work against a miscreant typing into the form? Going by its name, ONBLUR_CLIENT_SERVER asks for the rule to be checked on the server as well as in the browser, so provided your script calls the form's validate() method before sending anything, a hand-crafted request should still be checked; I haven't tested that, though.

A cursory test, adding %0ABcc:myemail@emailhost.com to a From field on a PHP form with the above $e_mail validation rules, confirmed that it rejects that way of sending e-mails to unintended recipients. But there's no formal sanitisation, which leaves me a bit uneasy. Looking at the docs for QuickForm2 rules, particularly the 'email' rule, also leaves me feeling uneasy; it looks like a very partial implementation of what we have been discussing.

It doesn't look like a quick and easy solution, particularly as I don't know how you'd apply sanitisation rules to other fields such as Subject: or To: without rolling your own rule. QuickForm2 does allow that, but it's not necessarily my idea of fun.

Wrapping it Up

Steer clear of FILTER_SANITIZE_EMAIL in most cases, unless you are happy for it to modify your user input and ignore the possibility of simple typos.

Use PHP Mailer (in its current form) with caution.

If you are going to roll your own, then know what the heck you are doing, since it seems very easy to leave a gaping hole in your armoury that some miscreant might well find a way through. Bingo: you are no one's friend, potentially facing a big bandwidth bill, being blacklisted, being blocked by your own hosting provider etc. etc.

When it’s Not Critical

That describes 99.999% of the scenarios I’m ever likely to end up in (and that’s a conservative estimate). In this case I think the PEAR Mail sanitisation would be my preference and I’ll live with the validation step being RFC 822 (and a lack of knowledge on my part of how effectively that validation has been implemented). I don't imagine the difference between RFC 822 validation and RFC 5321 validation is going to cause me major headaches. I may of course be wrong but since PEAR Mail is widely used I think I'm probably safe with that assumption.

I may also use the QuickForm2 rules since I can cope with the odd false positive and I like the way it gives user feedback – though if you're not using QuickForm2 you could probably dispense with this step or cook up your own JavaScript to do the user feedback for you, maybe jQuery does it already (but why bother, to be using PEAR Mail you must have PEAR installed, so just install QuickForm2 as well).

But please do use something to sanitise your headers!

When it IS Critical

This is a bit tougher. If PHP Mailer sorted out its inefficient and clunky sanitisation implementation and you’re using a recent version of PHP then it would look pretty tempting. If you have an older PHP installation then if PHP Mailer replaced the validation fall-back with one designed for a robust implementation of RFC 5321 then it would seem a pretty good option. You’d be pretty safe from an injection attack and would minimise the risk of false positive validation failures.

Given that PHP Mailer doesn’t do either of those things then I’d be looking for much more knowledgeable advice than I can provide!

Points of Interest

I love open source. I hate trying to understand code written by n different people.

History

Version 5: Minor Update
- Updated Introduction, improve focus and clarity
Version 4: First Release of Original Article
Versions 1 - 3: Pre-Release editing
