5 Regular Expressions Every Web Programmer Should Know
April 6th, 2007
I’m going to assume you have a basic understanding of regular expressions at this point. If you’re a regex n00b (or /n0{2}b/, as I like to call them), or if you need a quick refresher, check out my previous post on the absolute bare minimum that every programmer should know about regular expressions. You won’t be disappointed.
So, without further ado, here are the five regular expressions that I have found the most useful for day-to-day web programming tasks.
Matching a username
This one’s quite easy, but it’s really invaluable if you’re trying to build a user registration system for a website. We typically want to limit usernames to a restricted set of characters in order to make development easier, and to keep malicious users from spoofing someone else’s name (e.g. replacing a space with multiple spaces or a newline character, which are all displayed the same by a web browser).
Without regular expressions, this would be a tedious exercise that would involve splitting the string into its component characters and examining each one individually. With regular expressions, it’s a breeze. First, let’s define what we want to accept. We’ll keep it simple and limit the example to the following characters:
- Alphanumeric characters (letters and numbers)
- The underscore character (_)
We’ll also want to enforce a 3 character minimum and a 16 character maximum length. Here’s the regular expression that matches this fairly standard set of criteria:
/[a-zA-Z0-9_]{3,16}/
If you’re familiar with regular expressions you may have noticed something missing at this point - don’t worry, I’ll get to it.
If you’ve read my introduction to regular expressions you should already know how this regex works. First we’re defining a character class that will match any letters (a through z, and A through Z) and any numbers (0 through 9), as well as the _ (underscore) character. Next comes an interval quantifier that tells the regex engine we’ll only match sequences of between 3 and 16 characters. Because the quantifier follows a character class rather than a single character it attaches itself to the entire class, and will match every sequence between 3 and 16 characters so long as each character falls within our restricted character set.
So what’s missing? As it stands our regex will match anywhere within a string. It won’t just match ‘mike_84’, it will also match part of ‘%! mike_84&’, which contains several characters we don’t want. What we need are anchors: the ^ (caret) and $ (dollar) characters will anchor our regex to the beginning and end of the string, ensuring that the whole username meets our requirements and not just a portion of it.
So our revised regex will look like this:
/^[a-zA-Z0-9_]{3,16}$/
Here’s a quick PHP code snippet that shows how we can use this regex in production (we could just as easily use Perl, Java, Ruby, or even JavaScript to do this validation).
function validate_username( $username ) {
    // Anchored, so the whole string (not just part of it) must match
    if(preg_match('/^[a-zA-Z0-9_]{3,16}$/', $username)) {
        return true;
    }
    return false;
}
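To make the effect of the anchors concrete, here is a small sketch that runs the unanchored and anchored patterns against the same inputs. The variable names and sample strings are only illustrations. The last variant adds PCRE’s D modifier, which stops $ from matching just before a trailing newline; whether you need it depends on whether you trim() input first.

```php
<?php
// Three variants of the username pattern from the text.
$unanchored = '/[a-zA-Z0-9_]{3,16}/';
$anchored   = '/^[a-zA-Z0-9_]{3,16}$/';
$strict     = '/^[a-zA-Z0-9_]{3,16}$/D'; // D: $ matches only at the very end

$results = array(
    // The unanchored pattern finds "mike_84" inside the junk and reports a match.
    'unanchored_junk'  => preg_match($unanchored, '%! mike_84&'),
    // The anchored pattern rejects the same string.
    'anchored_junk'    => preg_match($anchored, '%! mike_84&'),
    'anchored_ok'      => preg_match($anchored, 'mike_84'),
    // Without D, $ still matches before a single trailing newline.
    'anchored_newline' => preg_match($anchored, "mike_84\n"),
    'strict_newline'   => preg_match($strict, "mike_84\n"),
);

print_r($results);
```

If the input is always passed through trim() first, the anchored and strict variants should behave the same way.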
Matching an XHTML/XML tag
Matching an XML or XHTML tag can be extremely useful if you’re scraping a website for data, or trying to quickly extract information from an XML document. A simple regex to accomplish this sort of extraction follows this form (the word ‘tag’ should be replaced with whatever tag you are looking for):
{<tag[^>]*>(.*?)</tag>}
The question mark following the star turns the star into a lazy quantifier. By default, quantifiers are greedy, meaning they’ll consume as much of the input text as they can. Lazy quantifiers, by contrast, will match as little of the input text as they can. If we used a greedy quantifier in this case, our regex would not work as advertised on an input document like
<tag>item 1</tag><tag>item 2</tag>
Instead of matching a single tag, a greedy quantifier would match up to the final closing tag in the input text.
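The difference is easy to see by running both forms against the sample input above. This is just a sketch; the variable names are mine.

```php
<?php
$xml = '<tag>item 1</tag><tag>item 2</tag>';

// Greedy: .* runs to the end and backtracks only as far as the LAST </tag>.
preg_match_all('{<tag[^>]*>(.*)</tag>}', $xml, $greedy);

// Lazy: .*? stops at the FIRST </tag> it can, so each tag matches separately.
preg_match_all('{<tag[^>]*>(.*?)</tag>}', $xml, $lazy);

print_r($greedy[1]); // one element: "item 1</tag><tag>item 2"
print_r($lazy[1]);   // two elements: "item 1" and "item 2"
```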
Here’s a simple PHP function to extract the contents of each matching XML or XHTML tag as an array:
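function get_tag( $tag, $xml ) {
    // Escape any regex metacharacters in the tag name
    $tag = preg_quote($tag);
    preg_match_all('{<'.$tag.'[^>]*>(.*?)</'.$tag.'>}',
                   $xml,
                   $matches,
                   PREG_PATTERN_ORDER);
    // $matches[1] holds whatever the first capture group matched
    return $matches[1];
}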
Matching an XHTML/XML tag with a certain attribute value (e.g. class or tag)
This regex is very similar to the last example, except we only want tags with a certain attribute value. This comes in handy when you want to extract a tag with a particular class or ID value, for example. The regex is just slightly more complicated than our previous example (again, replace tag, attribute, and value with whatever you’re looking for):
{<tag[^>]*attribute\s*=\s*(["'])value\1[^>]*>(.*?)</tag>}
We use a character class to allow either single or double quotes around our value. The \1 following the value is called a backreference. It matches whatever was captured by the first set of parentheses in the expression (either a single quote or double quote). That way we can be sure that the opening and closing quotes match.
Here’s a PHP function that shows how you can extract information from an XHTML document with this regex. The function takes an attribute, value, input text, and an optional tag name as arguments. If no tag name is specified it will match any tag with the specified attribute and attribute value.
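As a quick illustration of the backreference, the sketch below fills in the pattern with a made-up tag (a), attribute (class), and value (nav), then tries it on three made-up inputs. Only the inputs whose opening and closing quotes agree should match.

```php
<?php
// In a single-quoted PHP string, \' is a literal quote and \1 reaches the
// regex engine unchanged as a backreference to group 1.
$pattern = '{<a[^>]*class\s*=\s*(["\'])nav\1[^>]*>(.*?)</a>}';

$samples = array(
    'double'     => '<a class="nav">Home</a>',
    'single'     => "<a class='nav'>Home</a>",
    'mismatched' => '<a class="nav\'>Home</a>', // opens with " but closes with '
);

$matched = array();
foreach ($samples as $name => $html) {
    $matched[$name] = preg_match($pattern, $html);
}

print_r($matched);
```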
function get_tag( $attr, $value, $xml, $tag=null ) {
    // With no tag name given, match any tag name
    if( is_null($tag) )
        $tag = '\\w+';
    else
        $tag = preg_quote($tag, '/');
    $attr = preg_quote($attr, '/');
    $value = preg_quote($value, '/');
    // Group 1: tag name, group 2: quote character, group 3: tag contents
    $tag_regex = "/<(".$tag.")[^>]*$attr\\s*=\\s*".
                 "(['\"])$value\\2[^>]*>(.*?)<\\/\\1>/";
    preg_match_all($tag_regex,
                   $xml,
                   $matches,
                   PREG_PATTERN_ORDER);
    return $matches[3];
}
Matching and parsing an email address
This one comes courtesy of Cal Henderson, the programmer behind Flickr and author of Building Scalable Web Sites (a great read). For more information check out Cal’s article on parsing email addresses.
This one’s such a behemoth that it’s easier to digest when broken into its component parts. Constructing a regex like this is a bit like describing a grammar in Backus-Naur form (BNF), which is convenient because many of the things we’re trying to match are already described using BNF in their specifications. This is the case for email addresses, which are described in RFC 822. Anyways, here’s a PHP function that will check the validity of an e-mail address:
function is_valid_email_address($email){
$qtext = '[^\\x0d\\x22\\x5c\\x80-\\xff]';
$dtext = '[^\\x0d\\x5b-\\x5d\\x80-\\xff]';
$atom = '[^\\x00-\\x20\\x22\\x28\\x29\\x2c\\x2e\\x3a-\\x3c'.
'\\x3e\\x40\\x5b-\\x5d\\x7f-\\xff]+';
$quoted_pair = '\\x5c[\\x00-\\x7f]';
$domain_literal = "\\x5b($dtext|$quoted_pair)*\\x5d";
$quoted_string = "\\x22($qtext|$quoted_pair)*\\x22";
$domain_ref = $atom;
$sub_domain = "($domain_ref|$domain_literal)";
$word = "($atom|$quoted_string)";
$domain = "$sub_domain(\\x2e$sub_domain)*";
$local_part = "$word(\\x2e$word)*";
$addr_spec = "$local_part\\x40$domain";
return preg_match("!^$addr_spec$!", $email) ? 1 : 0;
}
The ‘\x##’ sequences are hexadecimal character references. It’s just a fancy way of specifying a character using its underlying code point (the numerical representation of a particular symbol). Otherwise this is a fairly straightforward, albeit incredibly complex regular expression. I’ll refrain from any further analysis since it’s been done elsewhere.
Tim Fletcher has ported Cal’s original PHP function into Ruby and Perl, if that’s what you’re into.
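If the hexadecimal references look opaque, this short sketch shows two of the ones used above doing their job. The variable names are only for illustration.

```php
<?php
// \x40 is the code point for "@" and \x2e is the code point for ".".
$at  = preg_match('/^\x40$/', '@');
$dot = preg_match('/^\x2e$/', '.');

// Unlike a bare "." in a regex, \x2e is a literal dot, not "any character".
$not_dot = preg_match('/^\x2e$/', 'a');

// chr() goes the other way: from a code point to the character itself.
$char = chr(0x40);

var_dump($at, $dot, $not_dot, $char);
```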
Matching a URL
Matching a URL is a lot like matching an e-mail address, except that you tend to do it in more controlled, and less critical situations where you can tolerate a few false positives. I use this regex frequently in projects when I need to automatically generate links when a URL is typed in a comment field, for example. Like the email regex, this one’s a doozy, but it’s pretty easy to understand.
I’ve taken advantage of the ‘x’ and ‘i’ pattern modifiers for this regex. Pattern modifiers are tacked onto the end of a regex and change the way the regex engine interprets the expression. The ‘x’ modifier tells the engine to ignore whitespace, except when escaped or used inside of a character class. It also tells the engine to interpret any text following a ‘#’ character outside of a character class as a comment (i.e. ignore it). The ‘i’ modifier makes the regex case insensitive. This can drastically simplify a complicated regex like this one when case doesn’t matter. This regex is derived from one developed by Jeffrey Friedl in his book Mastering Regular Expressions.
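Before tackling the full URL pattern, here is a much smaller sketch of the two modifiers at work. The hex-color pattern is my own example and is not part of the URL regex. Note that in ‘x’ mode a literal # has to be escaped, otherwise it starts a comment.

```php
<?php
// Free-spacing (x) lets us lay the pattern out with comments;
// case-insensitive (i) lets [0-9a-f] also match A-F.
$color_regex = '/
    ^ \#                # a literal hash sign, escaped so it is not a comment
    (?: [0-9a-f]{3}     # short form such as fc0
      | [0-9a-f]{6}     # long form such as ffcc00
    )
    $
/ix';

$short = preg_match($color_regex, '#FC0');
$long  = preg_match($color_regex, '#ffcc00');
$bad   = preg_match($color_regex, '#ffcc0'); // five digits: neither form

var_dump($short, $long, $bad);
```

One design note: the comments avoid the / character, since PHP treats the first unescaped delimiter it sees as the end of the pattern, even inside a comment.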
{
  \b
  # Match the leading part (proto://hostname, or just hostname)
  (
    # http://, or https:// leading part
    (https?)://[-\w]+(\.\w[-\w]*)+
  |
    # or, try to find a hostname with more specific sub-expression
    (?i: [a-z0-9] (?:[-a-z0-9]*[a-z0-9])? \. )+ # sub domains
    # Now ending .com, etc. For these, require lowercase
    (?-i: com\b
        | edu\b
        | biz\b
        | gov\b
        | in(?:t|fo)\b # .int or .info
        | mil\b
        | net\b
        | org\b
        | [a-z][a-z]\.[a-z][a-z]\b # two-letter country code
    )
  )
  # Allow an optional port number
  ( : \d+ )?
  # The rest of the URL is optional, and begins with /
  (
    /
    # The rest are heuristics for what seems to work well
    [^.!,?;"\'<>()\[\]\{\}\s\x7F-\xFF]*
    (
      [.!,?]+ [^.!,?;"\'<>()\[\]\{\}\s\x7F-\xFF]+
    )*
  )?
}ix
The comments in this expression are fairly self-explanatory, so I don’t think it needs a whole lot of explanation. There are a few things to watch out for though. First, this regex will match some things that are not valid URLs. The regex assumes that any two-letter combination is a valid top-level domain (TLD), which is not the case. It also misses TLDs that were recently added to the IANA list like .travel, .name, and .museum. You can fix this by downloading the latest IANA TLD list and adding any missing TLDs in the list of alternatives mid-way through the expression.
That being said, this regex works great 99.9% of the time. Here’s a quick PHP function that will parse a section of text, replacing any URLs it finds with links. I’m going to assume you’ve set the variable $url_regex to the above pattern so I don’t have to repeat it here.
function auto_link( $text ) {
    $url_regex = ...
    return preg_replace( $url_regex,
                         '<a href="$0">$0</a>',
                         $text );
}
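One way to keep the TLD alternatives maintainable is to build them from a list instead of typing them into the pattern by hand. The sketch below assumes a short, hand-picked $tlds array as a stand-in for the real IANA list, and it only tests the TLD portion, not the full URL regex.

```php
<?php
// Hypothetical, deliberately short list; a real one would be loaded
// from the current IANA TLD list.
$tlds = array('com', 'edu', 'net', 'org', 'travel', 'name', 'museum');

// Quote each entry and join them into a single alternation.
$tld_alternatives = implode('|', array_map('preg_quote', $tlds));

// A dot, one of the known TLDs, then a word boundary.
$tld_regex = '{\.(?:' . $tld_alternatives . ')\b}';

$known   = preg_match($tld_regex, 'www.example.museum');
$unknown = preg_match($tld_regex, 'www.example.fake');
$prefix  = preg_match($tld_regex, 'www.commerce.fake'); // ".com" is only a prefix here

var_dump($known, $unknown, $prefix);
```

The same $tld_alternatives string could then be spliced into the larger expression in place of the hard-coded list.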
So that’s it. If you think I left something off the list that deserves mention, or if you have any suggestions for improvements, post a comment and let me know.
May 6th, 2007 at 1:27 pm
I think you wrote an excellent article. You didn’t hold back, instead you presented some very good regular expressions not simple ones like zip codes and phone numbers that don’t require much thought. You also posted a number of links that were alright as well (Cal’s and Tim’s websites).
I also smiled when I noticed that you used different delimiters in the regular expressions for matching tags and urls, however I think a number of your readers won’t understand that small detail. So I’ll briefly mention it:
Regular expression delimiters denote when the regular expression starts and ends. Traditionally you will see them as the forward slash, and the contents within are the regex: /example/. However the delimiter is not required to be the ‘/’ character, instead it could be a number of different characters for instance { and }, |, #, , and even some others. The idea is that you choose a delimiter that won’t be needed to match anything inside the regex, therefore you won’t need to escape the delimiter within the expression, adding unnecessary complexity. Briefly lets match a date in MM/DD/YYYY format, you could use:
/^\d\d\/\d\d\/\d{4}$/ …or just… #^\d\d/\d\d/\d{4}$#
As mentioned in the article, you are allowed to place the modifiers (like ‘i’ for case insensitive, and ‘x’ for free-form) after the ending delimiter.
I don’t have any links to resources but I believe that the full list of delimiters is any of the character pairs <>, (), {}, [], or any non-word character.
You’ve got some really great articles here. I am going to keep on reading, I hope you keep on writing!
Joe P
On a sidenote (you can delete once fixed):
It looks like your ‘\s’ metacharacters (for whitespace) were shortened to just ’s’, therefore messing up a few of the above regexes for tags with attributes and URLs.
May 7th, 2007 at 10:21 am
Joe:
Thanks for the explanation. You’re right, that probably could have been clearer. I believe any character can be used as a delimiter as long as it’s not alphanumeric or a backslash. You can optionally use matching delimiters for characters that have a match: (), {}, [], and <>.
Thanks for pointing out the escaping issue, I somehow overlooked that. Wordpress is a big PITA when it comes to escaping backslashes. I haven’t quite figured out when it does and doesn’t escape them. Please let me know if you notice any missing backslashes, or if one of the regexps doesn’t work (I tested them all, so they all should work as written).
May 15th, 2007 at 5:51 pm
Nice article, but check your examples with this one:
http://blog.php-security.org/archives/76-Holes-in-most-preg_match-filters.html
May 15th, 2007 at 6:18 pm
Good find Soenke.
The username and email validation examples would (sort of) be vulnerable to this kind of attack. A single newline _probably_ wouldn’t harm anything, but it could be a problem if you don’t expect it.
I always trim() user input, so I never really bother to use the /D pattern modifier. Contrary to what some other people were saying in the comments you linked to, I think trim()ing user input (from input boxes, for example) is the proper thing to do about 97% of the time. Particularly for usernames & email addresses, since not doing so would allow users to spoof another person’s name.
Matching before a newline is actually standard behavior for the $ line anchor, since historically it was used by line filters like egrep where it makes sense. In that environment, you want to be able to specify something like /^[a-z0-9]+$/ to match a line with all lower-case alphanumeric characters. It would be less intuitive to do /^[a-z0-9]+\n$/.
May 16th, 2007 at 4:19 am
Hi Mike,
yeah if you use trim() you don’t run into it. The single newline is a problem if it goes into subsystems (shell, sql, header …) and isn’t trimmed before. I found these holes in my projects, too. I guess it’s good advice for your examples to have the D modifier, because many people (esp. in the PHP world) are copy’n pasting and you don’t know if they use trim() :)
May 18th, 2007 at 5:41 pm
Hello Mike,
nice work. WRT ‘Matching a URL’, shouldn’t you also take into account a .co.uk domain like http://www.bbc.co.uk/
Cheers!
May 18th, 2007 at 5:47 pm
Hi Christopher,
Thanks for your comment. The URL regex actually does find country code domains like .co.uk. In fact, it’s a bit too loose. It will find anything that looks like a domain and ends in .xx.xx (where x is any letter, a to z). So it would find http://www.bbc.fa.ke too, which isn’t actually a valid domain (I don’t think). It’s good enough 99% of the time though, and it’s easy to make the modifications necessary for it to be more accurate (it’s just maddeningly boring).
May 24th, 2007 at 11:02 am
I mostly agree but it should be said that for serious parsing of HTML/XHTML tags you should really use a parser. Especially if it’s for input validation. The regex relies on a consistent pattern and HTML can be written inconsistently. In input validation this means there’s a good chance someone can find a way to defeat the regex and input arbitrary code. (XSS attacks and all that!)
Defeating a real parser is much more difficult because it will go character by character. A good parser is also probably much faster than the regex method.
Not saying your method is bad, just that it’s more suited for smaller scale input that isn’t coming from a user.
May 24th, 2007 at 12:12 pm
Yes, I’d almost always prefer an XML parser over using a regex. I was actually thinking that the tag-related regexes would be useful for screen scraping, where you often don’t have valid XML to run through a parser.
May 26th, 2007 at 3:39 pm
Any reason in your validate_username code why you just don’t
replace the
if(preg_match(…){
return true;
}
return false;
with
return (preg_match(…));
It’s both shorter and clearer.
Dave
May 26th, 2007 at 7:20 pm
It’s been a little while, so I’m not sure why I did it that way. The only reason I can think of for the if … else … block is that it allows you to do some more work before returning.
June 22nd, 2007 at 4:43 pm
“Tim Fletcher has ported Cal’s original PHP function into Ruby and Perl, if that’s what you’re into..”
That’s Python, not Perl.
June 23rd, 2007 at 9:06 am
Doesn’t a* match axx, a2d, abc,….. Anything starting with a ?
June 24th, 2007 at 7:15 pm
@Nitin: nope. You’re thinking of filename globs. a* in a regex matches 0 or more ‘a’ characters.
July 20th, 2007 at 2:10 am
Hi, Thanks for such excellent article. Kindly tell me the RegX to validate single quotes. i.e it should validate the names like D’souza and single quote should be taken only once in the word.
Ex : Genelia D’Souza , Aarthi P’son .
July 20th, 2007 at 9:45 am
Great article for my collection. I also would recommend the good list of resources for web masters and programmers that I often use:
http://www.800-webdesign.com/web-master-links.html
(SQL, PHP, Java, DHTML, etc)
July 20th, 2007 at 9:55 am
As other people have noted, there are quite a few bugs in these expressions. Here are a few more:
The [^>]* is going to cause you some grief.
<foo[^>]*>(.*?)</foo>
This will match any opening tag that STARTS with your tag (e.g. foo, food, fool, etc.) up to a closing tag that matches your tag exactly.
<foo[^>]*bar\s*=\s*(["'])value\1[^>]*>(.*?)</foo>
This will match any tags (including the bug listed above) with any attributes that END with the text you specify. (e.g. bar, rebar, open_bar, etc.)
July 20th, 2007 at 11:26 am
@bryce
Good catch, I didn’t notice that little bug. It’d probably be pretty rare for it to be a problem, but you’re right. In many cases you could resolve the issue by requiring space (or a close tag) after the tag name (in the first case) or before the attribute name (in the second).
What other bugs are there?
July 20th, 2007 at 11:51 am
More info at http://www.regular-expressions.info
Regards,
Hector Fidel
July 20th, 2007 at 12:35 pm
Without digging too deeply into them:
They don’t look for some optional white-space in the tag names.
It will not work for nested tags of the same type, e.g. <div class=’foo’> … <div class=’bar’> … </div> … </div>
They rely on well-formed HTML or XML.
In general, parsing HTML and XML is pretty hard. Your best bet is to not reinvent the wheel and use established libraries if possible.
August 7th, 2007 at 9:02 pm
it took me quite a while to learn this one..
anyways thank you for this great post, as always
August 23rd, 2007 at 1:12 pm
Life saver. Thanks SO much.
August 27th, 2007 at 11:30 am
not to be TOO pedantic about it…
It’s ‘ado’, not ‘adu’.
Unless there’s some subtle joke I’m missing.
September 2nd, 2007 at 5:26 pm
What about german “umlaut domains” like öko.de?
http://www.denic.de/en/domains/idns/liste.html
Basti
September 2nd, 2007 at 11:30 pm
Hi.
I have a question…
How do you allow only one consecutive underscore for username?
I mean, with your regex, username can be “____” for example. As you see, it’s not a good username, so, I would want to allow usernames such as “hello_world_hi”, where there are some characters between underscores…
thnx
September 7th, 2007 at 12:38 pm
Thanks for writing this article. It helped me solve a problem.
September 23rd, 2007 at 3:58 pm
The function for “Matching an XHTML/XML tag” should look like this:
function get_tag( $tag, $xml ) {
$tag = preg_quote($tag);
preg_match_all('{]*>(.*?)}', $xml, $matches, PREG_PATTERN_ORDER);
{]*>(.*?)}
return $matches[1];
}
The above version doesn’t work…doh!
September 23rd, 2007 at 4:00 pm
Sorry, let’s try that again!
function get_tag( $tag, $xml ) {
$tag = preg_quote($tag);
preg_match_all('{<'.$tag.'[^>]*>(.*?)</'.$tag.'>}',
$xml,
$matches,
PREG_PATTERN_ORDER);
return $matches[1];
}
March 13th, 2008 at 7:21 am
Hi Mike,
It’s an extraordinary article .
Will you please tell me how to represent / accept below given URL/Domain Name in XML
“fmc.lab.ip.qwest.net”
Vaibhav.
March 14th, 2008 at 7:32 am
Hi Mike,
How to not accept a URL with multiple consecutive dots ,
e.g. http://www………..yahoo.com
Vaibhav.
March 27th, 2008 at 5:32 am
Hi,
Is there a way to ignore a item in a collection starting with a specific character?
for example i need,
only first hello.. it should ignore the dot before http://www.hello hello “hello http://www.hellos.com “
June 27th, 2008 at 3:06 pm
The first PHP example function is so bad. Try:
function validate_username( $username ) {
return preg_match('/^[a-zA-Z0-9_]{3,16}$/', $username);
}
June 28th, 2008 at 2:37 pm
Hi I’m also really lame and rather than improve the article with my comment I will instead beg for help on a feature of regular expressions that is really fundamental, and obvious and trivial to google for. Because I’m also lame at using Google. True, I could have emailed the author, but actually I thought I’d humiliate myself in front of the whole Internet. And obviously everyone wants to read my comment! And I somehow got this programming job for this flaky web2 product even though I have no relevant skills.
Regards,
Max Howell
ps I wrote my name in full because I want to look like I’m important enough that people will have heard of my name.
June 30th, 2008 at 9:00 am
Nice article! In your validate_username() function, you pass in a $username argument, but you actually check the $_GET['username'] var. Validating the $username you pass in is more generally useful, and less confusing. :)
September 4th, 2008 at 4:19 pm
Your XML tag expression will fail on nested tags with the same name, as are often used in XSL stylesheets (ie …).