1) mdog »» October 19th, 2006 @ 2:25 pm
ow. my brain hurts.
Yeah, sorry, I don’t normally post geek stuff but I put a decent amount of effort into that and I know a few other web developers read the blog, so I wanted to share. I know the blog has the spirituality theme and I thought about starting a tech-specific blog for this kind of stuff, but I figured I’d just keep it here since it still relates to my life.
the funny part is that i scrolled all the way down, noting all the valid/invalid entries, even though i haven’t the faintest idea about how it all works.
perhaps you could add a tech sidebar.
hehe. Basically, regular expressions are a way to match patterns in strings of text. A valid e-mail address has a specific format, like someone@somewhere.com. So, something like ###@.net isn’t valid. The regular expression says that there has to be a name part at the beginning, and there are certain characters (like a through z and 0 through 9) that are valid and others (like #) that aren’t; then there’s an @, then a domain with similar restrictions to the username, then a dot, then the top level domain.
ah! and here i thought your expression was just a random string of symbols. it actually sort of vaguely in a small way makes a teeny bit of sense now. :)
mmmwhhaaattt? (@confused.com)
It helps a lot to break them down. For instance:
^ means that the string you’re looking at should start with whatever follows this symbol
([a-zA-Z0-9_'+*$%\^&!\.\-]) this is checking the username part of the address. it means that the characters a-z, A-Z, 0-9, _, ', +, *, $, %, ^, &, !, ., and - are valid
+ means the previous block can be repeated one or more times (like, aaa instead of just a, or b*+ instead of just b, since * and + are both in the valid set)
\@ means there should be an @ symbol at this point in the string
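(([a-zA-Z0-9\-])+\.)+ we’re checking the domain name at this point in the string. this is similar to before, but with fewer valid characters (only letters, digits, and -) because domain names are more restricted than usernames. each part has to end with a dot, and the outer + lets there be more than one part (like mail.somewhere.)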
([a-zA-Z0-9:]{2,4})+ is similar to before, only with the top level domain (.com, .org, etc)
$ means that the string should end with whatever came before this symbol
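To see the pieces working together, here is a rough Python sketch. It assumes the pieces are simply joined in the order listed and that the quote in the username set is a plain apostrophe; the full expression in the post may differ. The looks_valid helper is a made-up name for illustration.

```python
import re

# Pieces from the breakdown, joined in order (an assumption about how
# the full expression is assembled).
EMAIL_PATTERN = re.compile(
    r"^"                               # start of string
    r"([a-zA-Z0-9_'+*$%\^&!\.\-])+"    # username: one or more valid characters
    r"\@"                              # a literal @
    r"(([a-zA-Z0-9\-])+\.)+"           # one or more domain parts, each ending in a dot
    r"([a-zA-Z0-9:]{2,4})+"            # top level domain: blocks of 2 to 4 characters
    r"$"                               # end of string
)


def looks_valid(address: str) -> bool:
    """Hypothetical helper: True if the address matches the pattern."""
    return EMAIL_PATTERN.match(address) is not None


for address in ["someone@somewhere.com", "b*+@somewhere.com", "###@.net"]:
    print(address, looks_valid(address))
```

The first two addresses match and ###@.net doesn't, because # isn't in the username set and there is nothing between the @ and the dot.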
what’s the {2,4}?
i can’t believe i’m asking this, much less following this whole thing.
:)
It means that the character set just before it has to match between 2 and 4 times in a row, so each block of the top level domain is 2 to 4 characters long. (.museum still works because the whole block is followed by a +, so it can repeat: muse, then um.)
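If you want to poke at {2,4} on its own, this small Python sketch pulls out just the top level domain piece and tries it with and without the trailing +. The variable names are made up for the example.

```python
import re

# The top level domain piece on its own, anchored to the whole string.
one_block = re.compile(r"^[a-zA-Z0-9:]{2,4}$")     # exactly one block of 2 to 4 characters
repeated = re.compile(r"^([a-zA-Z0-9:]{2,4})+$")   # that block repeated one or more times

for tld in ["c", "com", "info", "museum"]:
    print(tld, bool(one_block.match(tld)), bool(repeated.match(tld)))
```

A single letter fails both versions, com and info pass both, and museum only passes the version with the + on the end.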
This site also gives a nice breakdown and explanation of an e-mail address validating regular expression
I could not find the comment field on the email regex site, so I'll try it here: try the Perl module with the largest regex ever seen: http://www.ex-parrot.com/pdw/Mail-RFC822-Address.html
http://fightingforalostcause.net/misc/2006/compare-email-regex.php
Thanx
You might want to look at this project, which is actively maintained. It follows the spec to the letter, and it has an immense test suite
http://code.iamcal.com/php/rfc822/