Fighting for a Lost Cause.net

Thursday, October 19th, 2006 :: 12:27 PM

Comparing E-mail Address Validating Regular Expressions

Comments:

1) mdog »» October 19th, 2006 @ 2:25 pm

ow. my brain hurts.

2) Ian »» October 19th, 2006 @ 2:33 pm

Yeah, sorry, I don’t normally post geek stuff but I put a decent amount of effort into that and I know a few other web developers read the blog, so I wanted to share. I know the blog has the spirituality theme and I thought about starting a tech-specific blog for this kind of stuff, but I figured I’d just keep it here since it still relates to my life.

3) mdog »» October 19th, 2006 @ 6:19 pm

the funny part is that i scrolled all the way down, noting all the valid/invalid entries, even though i haven’t the faintest idea about how it all works.

perhaps you could add a tech sidebar.

4) Ian »» October 19th, 2006 @ 6:28 pm

hehe. Basically, regular expressions are a way to match patterns in strings of text. A valid e-mail address has a specific format, like someone@somewhere.com. So, something like ###@.net isn’t valid. The regular expression says that there has to be a name part at the beginning, and there are certain characters (like a through z and 0 through 9) that are valid and others (like #) that aren’t; and then there’s a @, and then a domain with similar restrictions to the username, then a dot, then the top level domain.

5) mdog »» October 19th, 2006 @ 7:48 pm

ah! and here i thought your expression was just a random string of symbols. it actually sort of vaguely in a small way makes a teeny bit of sense now. :)

6) josh »» October 19th, 2006 @ 8:03 pm

mmmwhhaaattt? (@confused.com)

7) Ian »» October 19th, 2006 @ 8:50 pm

It helps a lot to break them down. For instance:

^ means that the string you’re looking at should start with whatever follows this symbol

([a-zA-Z0-9_'+*$%\^&!\.\-]) this is checking the username part of the address. it means that the characters a-z, A-Z, 0-9, _, ', +, *, $, %, ^, &, !, ., and - are valid

+ means the previous block can be repeated one or more times (like aaa instead of just a, or b*+ instead of just b)

\@ means there should be an @ symbol at this point in the string

(([a-zA-Z0-9\-])+\.)+ we’re checking the domain name at this point in the string. this is similar to before, but with fewer allowed characters, because domain names are more restricted than usernames

([a-zA-Z0-9:]{2,4})+ is similar to before, only with the top level domain (.com, .org, etc)

$ means that the string should end with whatever came before this symbol
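
Putting those pieces back together in order gives one pattern. Here is a rough sketch in Python rather than the PHP of the comparison page; the `re` module usage and the names `EMAIL_PATTERN` and `looks_like_email` are only for illustration, and the apostrophe and hyphen are written as plain ASCII characters.

```python
import re

# The pieces from the breakdown above, joined in the same order.
EMAIL_PATTERN = re.compile(
    r"^"                              # string starts here
    r"([a-zA-Z0-9_'+*$%\^&!\.\-])+"   # username: one or more allowed characters
    r"\@"                             # a literal @
    r"(([a-zA-Z0-9\-])+\.)+"          # one or more domain labels, each ending in a dot
    r"([a-zA-Z0-9:]{2,4})+"           # top level domain
    r"$"                              # string ends here
)

def looks_like_email(address):
    """Return True if the address matches the pattern above (hypothetical helper)."""
    return EMAIL_PATTERN.match(address) is not None

print(looks_like_email("someone@somewhere.com"))  # True
print(looks_like_email("###@.net"))               # False: # is not an allowed character
```

This only checks the shape of the string; it says nothing about whether the mailbox exists.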

8) mdog »» October 20th, 2006 @ 2:36 pm

what’s the {2,4}?

i can’t believe i’m asking this, much less following this whole thing.

9) Ian »» October 20th, 2006 @ 2:56 pm

It means that each chunk of the top level domain should be between 2 and 4 characters. (.museum still works because the whole group is followed by a +, so it can repeat: muse and then um.)
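
A quick way to see how {2,4} and the trailing + interact is to try the top level domain piece with and without the +. This is a small Python sketch, and the names `TLD_ONCE` and `TLD_REPEATED` are made up for the example.

```python
import re

# {2,4} on its own: between 2 and 4 characters in total.
TLD_ONCE = re.compile(r"^[a-zA-Z0-9:]{2,4}$")

# The same group followed by +, as in the full expression:
# the 2-to-4 character chunk may repeat, so longer strings also pass.
TLD_REPEATED = re.compile(r"^([a-zA-Z0-9:]{2,4})+$")

for tld in ["com", "info", "museum", "x"]:
    print(tld, bool(TLD_ONCE.match(tld)), bool(TLD_REPEATED.match(tld)))
# com True True
# info True True
# museum False True
# x False False
```

So {2,4} counts characters, and the + after the group is what lets anything of 2 or more characters through.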

This site also gives a nice breakdown and explanation of an e-mail address validating regular expression

10) Wingi »» May 30th, 2010 @ 4:40 pm

I could not find the comment field on the email regex site, so I’ll try it here: Try the Perl module with the largest regex ever seen: http://www.ex-parrot.com/pdw/Mail-RFC822-Address.html

http://fightingforalostcause.net/misc/2006/compare-email-regex.php

Thanx

11) Stefan Wallin »» June 4th, 2010 @ 3:00 am

You might want to look at this project, which is actively maintained. It follows the spec to the letter and has an immense test suite:
http://code.iamcal.com/php/rfc822/
