
Did You Mean Advanced Email Validation in Node.js and PHP - E-mail validation package blog


Author: Manuel Lemos

Posted on:

Package: E-mail validation

When you collect users' email addresses, for instance in a site sign-up form, there is a good chance that some addresses are incorrect because of a typing mistake, or that messages cannot be delivered to the specified address for some other reason.

This e-mail validation package can detect incorrect addresses and prevent users from entering them, even before you accept them.

Read this article to learn how to use this package to detect incorrect email addresses and get suggestions for the correct addresses so the users can fix them with a single click.

This article shows how to do it with Node.js, but it also points to a similar solution that achieves the same in PHP.




Contents

The Importance of Valid E-mail Addresses

6 Forms of Invalid E-mail Addresses

Using the Node.js Email Validation to Prevent Accepting Invalid Email Addresses

Did You Mean Email Correction Suggestions

Dealing with Late Invalid Email Bounces

Doing Email Validation in PHP

Simplifying Getting User Email Addresses Using OAuth

Conclusion


The Importance of Valid E-mail Addresses

Many Web sites that provide services to their users start the relationship by asking them to fill in a registration form.

Even the simplest registration form asks users for their e-mail address because it serves as a means to contact them when there is something relevant to tell them.

If for some reason the email address the user provides is incorrect, the site will not be able to contact the user.

That is why every site should validate the email addresses that users enter. Otherwise the site may be unable to reach users for important things, like sending email notices or letting them recover forgotten passwords.

6 Forms of Invalid E-mail Addresses

There are many reasons why users enter invalid email addresses. The Email Validation package can help prevent your site from accepting such addresses.

1. Typing Mistakes and Fake Domains

Users are human, therefore they make mistakes. Some invalid email addresses that result from typing mistakes can be easily detected using a simple regular expression.

However, there are common typing mistakes that are impossible to detect using a regular expression because the incorrect address that the user typed is still syntactically valid. That is the case, for instance, when users switch the order of letters in the domain. The resulting domain name is incorrect but it may still exist.

Some users also enter fake domains in the hope that they will not be required to provide a valid email address, so they type something random on their keyboards. The problem is that some of those random domains actually exist, although they are not in use.

This package can detect typing mistakes in common email domains such as gamail.com, hotmali.com, yaho.com, reddiffmail.com, etc. It can also detect common fake domains such as foo.org, asdf.com, etc.
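The idea of mapping a mistyped domain to its likely correction can be sketched in a few lines. This is only an illustration, not the package's own code: the typoDomains table and the suggestAddress function are hypothetical names, and apart from gamail.com the corrections listed are assumptions about what the user probably meant.

```javascript
// Hypothetical table of mistyped domains and their likely corrections.
// Only the gamail.com entry comes from the article; the rest are assumed.
const typoDomains = {
  'gamail.com': 'gmail.com',
  'hotmali.com': 'hotmail.com',
  'yaho.com': 'yahoo.com',
  'reddiffmail.com': 'rediffmail.com'
};

// Returns a corrected address when the domain is a known typo, or null otherwise.
function suggestAddress(email) {
  const at = email.lastIndexOf('@');
  if (at < 1) {
    return null; // no @ sign, or nothing before it
  }
  const user = email.slice(0, at);
  const domain = email.slice(at + 1).toLowerCase();
  if (Object.prototype.hasOwnProperty.call(typoDomains, domain)) {
    return user + '@' + typoDomains[domain];
  }
  return null;
}

console.log(suggestAddress('john@gamail.com'));
```

A lookup table like this only catches mistakes that somebody has already listed, which is why the quality of the list matters more than the code around it.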

2. Temporary Domains

Some users receive email directly on their own machines by signing up for services such as dyndns.com or no-ip.com, which create temporary domains associated with the IP address that their machines get from their Internet service provider.

The problem with these domains is that they are only valid while the user is logged in to those services. If the user logs out or stops using those services, you may not be able to contact them again via the email addresses they created with the temporary domains.

Messages sent to temporary addresses will eventually bounce once the user stops using them. Since the user may become unreachable after a while, it is better not to accept email addresses at temporary domains from the start.

3. Disposable Mailboxes

Sometimes users need to register on sites that they do not trust for some reason. Instead of giving those sites their real email addresses, they give addresses from services that create temporary mailboxes, where they can check incoming email without having to create a new account or a new password.

While this attitude is understandable, sites that accept registrations with disposable email addresses face several problems. One of them is that they will not be able to contact the users via email to send important notices in the future.

Another problem is that disposable email services usually do not bounce messages after the users stop checking them. This wastes bandwidth and CPU resources for sites that send periodic newsletters to users who sign up.

If messages sent to disposable mailboxes bounced after a while, the sites could catch the bounces and stop wasting resources sending newsletters to users who are not reading them anyway.

So it is better not to accept disposable email addresses from the start to avoid these problems.

4. Spam Traps

Many spammers get their victims' email addresses by scraping Web sites and other Internet resources.

Spam traps are fake email addresses created to feed email scrapers in order to detect spammers that are harvesting email addresses from Web sites. These fake addresses are also known as honey pots, as they are meant to bait scrapers that collect addresses to send spam to users without their permission.

When messages are sent to scraped spam trap addresses, the source mail servers become blacklisted as spam sources.

Although the intention is good, malicious users can also submit spam trap addresses to innocent sites in order to get those sites blacklisted. This is why most mail services do not trust blacklists supplied by spam trap services.

There are many thousands of spam trap domains. Although it is impossible to find all of them, or even a good part of them, it is always better to reject as many email addresses associated with spam traps as possible in order to discourage malicious users.

5. Rejecting Servers

If an email address is invalid, messages sent to that address will eventually bounce. A message may be bounced immediately, or later, after it was apparently accepted by the destination email server.

When a message is rejected immediately, it is possible to determine right away that an email address is invalid by simulating the delivery of a message to the destination SMTP server.

The Email Validation package can do this. It simulates a message delivery attempt by performing all the steps of the SMTP protocol needed to deliver a message, except the last one, which is to actually transmit the message body.

If the SMTP server rejects the step in which the client states the recipient email address, then we know that either the email address is invalid or, for some other reason, the destination server is not accepting messages from the origin computer. In either case, the email address should be considered invalid.

6. Full Mailboxes and Grey lists

Sometimes users' mailboxes are so full of messages that they exceed their size limits. This may temporarily prevent the destination servers from accepting messages for those mailboxes.

Another situation that may cause temporary rejection is the use of grey lists. This is a method used by many servers to defend against spam. It works by refusing a message on the first delivery attempt and accepting it on a later attempt made after a while.

This helps SMTP servers avoid spam because some types of spamming software only try once to deliver a message to each address.

In any case, email addresses that cause temporary rejections should not be considered invalid. This Email Validation package returns a special validation response to deal with this situation. Applications should handle this response as if the addresses were valid, because they often are.
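The difference between permanent and temporary rejections can be sketched using the standard SMTP reply code classes: 2xx means the step was accepted, 4xx signals a temporary failure and 5xx a permanent one. The functions below are a hypothetical illustration of that distinction, not the package's internal logic; classifyRecipientReply and shouldAcceptAddress are names made up for this example.

```javascript
// Maps the numeric SMTP reply to the recipient step to a three-state result:
// true (accepted), false (rejected), null (could not be determined).
function classifyRecipientReply(code) {
  const replyClass = Math.floor(code / 100);
  if (replyClass === 2) {
    return true;  // the server accepted the recipient
  }
  if (replyClass === 5) {
    return false; // permanent rejection: treat the address as invalid
  }
  // 4xx covers temporary conditions such as full mailboxes and grey listing;
  // any other unexpected reply is also left undetermined.
  return null;
}

// Assumed application policy: only a definite rejection blocks the address.
function shouldAcceptAddress(valid) {
  return valid !== false;
}

console.log(classifyRecipientReply(451), shouldAcceptAddress(null));
```

Treating the undetermined case as acceptable follows the advice above: a grey-listed or full mailbox usually belongs to a real user.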

Using the Node.js Email Validation to Prevent Accepting Invalid Email Addresses

Using this Node.js module to determine whether an email address may be valid is very simple. After requiring the emailValidation module and telling it where to find the sockets module, you just create a validation object.

var emailValidation, validation;

emailValidation = require('./emailValidation');

emailValidation.socketsModule = './sockets';

validation = new emailValidation.validation();

Several details may be configured. The localAddress variable defines the sender email address for simulations of message deliveries to the destination SMTP server of the email address to validate.

validation.localAddress = 'localuser@localhost';

If you want to understand what is going on, set the debug variable to true. If you also want to see the network communication with the destination SMTP servers, set the debugSockets variable to true. You can customize the way the debugging output is processed by setting the debugOutput variable to a function that receives the debug messages.

validation.debug = true;

validation.debugSockets = false;

validation.debugOutput = console.log;

Some of the SMTP servers being tested may not respond at all. Set the timeout variable to the number of seconds that the Email Validation module should wait for an SMTP server response before giving up.

validation.timeout = 15;

The types of checks the Email Validation module performs can be configured by means of separate blacklist and whitelist files.

These are files in the CSV (Comma Separated Values) format. In general they follow the same conventions. Lines that start with a comma are considered to be comments and are ignored. The other lines should have a given number of columns, which depends on the purpose of each file.

Currently 4 types of blacklist and whitelist files are supported. This package comes with some sample configuration files for you to look at, but more details follow in case you want to edit them.

emailDomainsWhitelistFile - This is the list of email domains always considered to be valid. Email addresses with domains in this file skip the tests for email domains and mail servers. This file has only one column per line, which is the name of the whitelisted domain.

invalidEmailUsersFile - This is a list of words that make an email address invalid when they are found in its user part, regardless of the domain part. This file has only one column per line, which is the banned word.

invalidEmailDomainsFile - This is the list of domains to always be considered invalid. This file must have three or four columns. The first column is for the domain. E-mail addresses with domain or sub-domain ending in the values specified in this file are considered invalid.

The second column is the type of invalid domain. The types can be fake for fake domains, typo for domains that are typing mistakes, disposable for disposable email domains, temporary for temporary domains, or spam trap for domains used by spam traps.

The third column is the type of check to perform. If it is empty, the Email Validation module tries to match the specified value with the end of the email domain. If the type is part, it tries to match the specified value in any part of the email domain.

The fourth column is only considered when the type of invalid domain is typo. It contains a suggestion to fix the domain that the user may have entered by mistake. For instance, the fix suggestion for gamail.com is gmail.com.

invalidEmailServersFile - This is the list of addresses of email servers to be considered invalid. Each email domain may have one or more domains or IP addresses that have SMTP servers running to receive the email messages.

This file should have three columns. The first column is the domain to match. The second is the type of invalid domain. It can be any of the types for the invalidEmailDomainsFile.

The third column is the type of check. If it is empty, the Email Validation module tries to match the specified value with the end of the email domain.

If it is ip, it will try to match the IP address of the email server. If it is resolve it will try to match the host name to which the reverse IP address resolves. If the type is part, it tries to match the specified text in any part of the email domain.

validation.emailDomainsWhitelistFile = 'emaildomainswhitelist.csv';
validation.invalidEmailUsersFile = 'invalidemailusers.csv';
validation.invalidEmailDomainsFile = 'invalidemaildomains.csv';
validation.invalidEmailServersFile = 'invalidemailservers.csv';
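To make the file layout more concrete, here is a minimal sketch of how a file in the invalidEmailDomainsFile format could be read and matched. It is not the package's parser: parseInvalidDomains, findInvalidDomain and the sample rows are hypothetical, the naive comma split does not handle quoted values, and matching the end of the domain on a dot boundary is an assumption about how sub-domains are meant to be treated.

```javascript
// Parses text in the layout described above: domain, type, check, suggestion.
function parseInvalidDomains(text) {
  const entries = [];
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === '' || line.startsWith(',')) {
      continue; // blank line, or comment line starting with a comma
    }
    const columns = line.split(',').map((column) => column.trim());
    entries.push({
      domain: columns[0].toLowerCase(),
      type: columns[1] || '',
      check: columns[2] || '',      // empty means "match the end of the domain"
      suggestion: columns[3] || ''  // only meaningful for the typo type
    });
  }
  return entries;
}

// Returns the first entry that matches the given email domain, or null.
function findInvalidDomain(entries, emailDomain) {
  const domain = emailDomain.toLowerCase();
  for (const entry of entries) {
    if (entry.check === 'part') {
      if (domain.includes(entry.domain)) {
        return entry;
      }
    } else if (domain === entry.domain || domain.endsWith('.' + entry.domain)) {
      return entry;
    }
  }
  return null;
}

// Made-up sample rows, for illustration only.
const sample = [
  ',domain,type,check,suggestion',
  'gamail.com,typo,,gmail.com',
  'asdf.com,fake,',
  'dyndns,temporary,part'
].join('\n');

const entries = parseInvalidDomains(sample);
console.log(findInvalidDomain(entries, 'gamail.com'));
```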

The actual email validation is performed by calling the validate function. Since it may perform asynchronous I/O operations, the result is returned by passing it as a parameter to a given callback function.

The result parameter of the callback function is an object whose valid property may be set to three possible values: true if the email is valid, false if the email is invalid, and null if it was not possible to determine whether the email is valid.

If an error prevented the validation from running, the valid property is not set and the error property of the result object is set to a message explaining the error that occurred.

// Example address to check; replace it with the address the user entered.
var email = 'john@gamail.com';

validation.validate(email, function (result)
{
  if(result.valid === undefined)
  {
    // The validation could not run at all.
    console.log('Error: ' + result.error);
  }
  else
  {
    if(result.valid === null)
    {
      // Temporary rejection or connection problem: validity is unknown.
      console.log('It was not possible to determine whether the address '
      + email + ' is valid' + (result.error ? ': ' + result.error : '.'));
    }
    else
    {
      console.log('The address ' + email + ' is ' +
      (result.valid ? 'valid' : 'invalid') + '.');
      if(!result.valid && result.status ===
        validation.EMAIL_VALIDATION_STATUS_TYPO_IN_DOMAIN)
      {
        // The domain looks like a typing mistake: show the suggested fix.
        console.log('It may be a typing mistake. ' +
        'The correct email address may be ' + result.suggestions[0] + '.');
      }
    }
  }
});
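If the rest of your application uses promises, the callback style shown above can be wrapped in one. The following is a sketch under stated assumptions: validateAsync is a hypothetical helper that is not part of the package, and fakeValidation is a stand-in object used here only so the example runs without the real module or any network access.

```javascript
// Wraps a callback-based validate function in a Promise.
// Rejects when the valid property is not set, which signals an error.
function validateAsync(validation, email) {
  return new Promise(function (resolve, reject) {
    validation.validate(email, function (result) {
      if (result.valid === undefined) {
        reject(new Error(result.error));
      } else {
        resolve(result);
      }
    });
  });
}

// Stand-in for the real validation object, for demonstration only.
const fakeValidation = {
  validate: function (email, callback) {
    setImmediate(function () {
      callback({ valid: true, status: 0 });
    });
  }
};

validateAsync(fakeValidation, 'user@example.com').then(function (result) {
  console.log('valid:', result.valid);
});
```

With the real module, you would pass the validation object created earlier instead of fakeValidation.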

Did You Mean Email Correction Suggestions

The result parameter passed to the callback function may include an additional property named status. That property is set to one of the constant values defined in the Email Validation package.

Most of those constants have self-explanatory names. EMAIL_VALIDATION_STATUS_OK (0) is returned when the email is valid. Positive values are returned when the email is invalid. Negative values are returned when it was not possible to determine whether the email is valid.

When the status is EMAIL_VALIDATION_STATUS_TYPO_IN_DOMAIN, the suggestions property of the result parameter is set to an array of possible email addresses that seem to be the correct address that the user meant to enter.

Currently the suggestions property contains only one fix suggestion, but in the future it may return more than one if that makes sense.

Here is the complete list of possible status values to be returned:

EMAIL_VALIDATION_STATUS_OK                       =  0;

EMAIL_VALIDATION_STATUS_TEMPORARY_SMTP_REJECTION = -1;
EMAIL_VALIDATION_STATUS_SMTP_DIALOG_REJECTION    = -2;
EMAIL_VALIDATION_STATUS_SMTP_CONNECTION_FAILED   = -3;

EMAIL_VALIDATION_STATUS_BANNED_WORDS_IN_USER     =  1;
EMAIL_VALIDATION_STATUS_BANNED_DOMAIN            =  2;
EMAIL_VALIDATION_STATUS_FAKE_DOMAIN              =  3;
EMAIL_VALIDATION_STATUS_TYPO_IN_DOMAIN           =  4;
EMAIL_VALIDATION_STATUS_DISPOSABLE_ADDRESS       =  5;
EMAIL_VALIDATION_STATUS_TEMPORARY_DOMAIN         =  6;
EMAIL_VALIDATION_STATUS_SPAM_TRAP_ADDRESS        =  7;
EMAIL_VALIDATION_STATUS_BANNED_SERVER_DOMAIN     =  8;
EMAIL_VALIDATION_STATUS_BANNED_SERVER_IP         =  9;
EMAIL_VALIDATION_STATUS_BANNED_SERVER_REVERSE_IP = 10;
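The sign convention of the status values and the "did you mean" prompt can be illustrated as follows. This is a hypothetical sketch: validityFromStatus and didYouMean are not functions of the package, and the constant is redeclared locally with the value from the list above so the example is self-contained.

```javascript
// Value copied from the status list; redeclared here for a standalone example.
const EMAIL_VALIDATION_STATUS_TYPO_IN_DOMAIN = 4;

// Zero means valid, positive means invalid, negative means undetermined.
function validityFromStatus(status) {
  if (status === 0) {
    return true;
  }
  return status > 0 ? false : null;
}

// Builds a prompt the site could show next to the email field, or null
// when the result carries no typing mistake suggestion.
function didYouMean(result) {
  if (result.status === EMAIL_VALIDATION_STATUS_TYPO_IN_DOMAIN &&
      Array.isArray(result.suggestions) && result.suggestions.length > 0) {
    return 'Did you mean ' + result.suggestions[0] + '?';
  }
  return null;
}

console.log(didYouMean({
  valid: false,
  status: EMAIL_VALIDATION_STATUS_TYPO_IN_DOMAIN,
  suggestions: ['john@gmail.com']
}));
```

On a Web page the returned prompt could be rendered as a link that replaces the field value with the suggestion, so the user can accept the fix with a single click.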

Dealing with Late Invalid Email Bounces

Even when an email address is invalid, many SMTP servers still accept messages sent to it. Later they may bounce the message by sending it to the return path address.

This case is not handled by this package. However, if your site needs to know whether an address is invalid, it may send a validation e-mail message with the return path set to an address that is delivered to a POP3 or IMAP mailbox.

A possible PHP solution for processing messages sent to such mailboxes programmatically is described in this article about receiving and parsing incoming e-mail messages.

Doing Email Validation in PHP

The Node.js Email Validation package is inspired by the PHP Email Validation class. A similar article was written to describe how to use that PHP class for email validation.

Simplifying Getting User Email Addresses Using OAuth

One of the reasons why users enter invalid email addresses on Web sites is that they do not want to go through the process of entering their real email address and waiting for the confirmation message that sites usually send. Often the confirmation message does not arrive soon enough, leaving the user very frustrated.

A faster way to get the user's email address without the usual painful confirmation process is to use what is called the social login approach.

This approach consists of allowing users to identify themselves with their accounts on social networks or other popular sites like Facebook, Google, Hotmail, etc. The APIs of these sites can provide the user's email address to your site with the user's permission.

Usually the user has already validated the email address with those sites, so there is no need to put the user through the pain of validating it again with a confirmation message.

This approach is used to register or log in very quickly on the JSClasses and PHPClasses sites. They use a PHP OAuth client class for this purpose.

Read this article to learn more about how to implement a social login on your sites with this class, as an alternative to requesting the user's email address explicitly. The class itself comes with examples of implementing a social login with many popular sites.

Conclusion

As you have read, advanced email validation can be a complicated process, but fortunately this package simplifies it a lot.

How effective the process is depends on how complete the blacklist and whitelist files provided with this package are. If you find more entries to add to those files, please share them with us by posting a comment on this article.

If you have other questions or comments about this package, post a comment now so they can be clarified.

