Scary Spookvember: Regex!
This is the first post in a new series where I learn and blog about things that scare developers. I came up with this idea right after Halloween ended, so unfortunately this has to be the Scary Spookvember series instead of Spooktober, which would've made more sense 🤦.
This first article is about Regex. Many developers think of it as a template that can match strings, but actually, Regex is a simple programming language! To show this, let's take a look at a simple Regex that matches a lowercase UUID, such as e81dd3ba-44bb-11ec-81d3-0242ac130003. The Regex that I found on Stack Overflow is:
\b([0-9a-f]){8}-([0-9a-f]){4}-([0-9a-f]){4}-([0-9a-f]){4}-([0-9a-f]){12}\b
At first glance, this looks quite confusing and scary, but once we reformat it a bit, things start taking shape:
\b # match the beginning or end of a word
(
[0-9a-f] # match any character from 0-9 or a-f
) {8} # loop 8 times
- # match the - character
(
[0-9a-f] # match any character from 0-9 or a-f
) {4} # loop 4 times
- # match the - character
(
[0-9a-f] # match any character from 0-9 or a-f
) {4} # loop 4 times
- # match the - character
(
[0-9a-f] # match any character from 0-9 or a-f
) {4} # loop 4 times
- # match the - character
(
[0-9a-f] # match any character from 0-9 or a-f
) {12} # loop 12 times
\b # match the beginning or end of a word
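The commented layout above is more than a reading aid. As a minimal sketch, assuming Python and its built-in re module (my choice for illustration, the pattern itself isn't tied to any language), the re.VERBOSE flag lets you compile the commented form directly. The names COMPACT, COMMENTED, and find_uuid are made up for this example.

```python
import re

# The one-line pattern quoted in the post.
COMPACT = r"\b([0-9a-f]){8}-([0-9a-f]){4}-([0-9a-f]){4}-([0-9a-f]){4}-([0-9a-f]){12}\b"

# The same pattern in commented layout. With re.VERBOSE, whitespace and
# "# ..." comments inside the pattern are ignored by the regex engine.
COMMENTED = r"""
    \b               # match the beginning or end of a word
    ([0-9a-f]){8}    # 8 characters from 0-9 or a-f
    -                # match the - character
    ([0-9a-f]){4}    # 4 characters from 0-9 or a-f
    -
    ([0-9a-f]){4}
    -
    ([0-9a-f]){4}
    -
    ([0-9a-f]){12}   # 12 characters from 0-9 or a-f
    \b               # match the beginning or end of a word
"""

compact_re = re.compile(COMPACT)
commented_re = re.compile(COMMENTED, re.VERBOSE)


def find_uuid(pattern, text):
    """Return the first lowercase UUID that `pattern` finds in `text`, or None."""
    match = pattern.search(text)
    return match.group(0) if match else None


if __name__ == "__main__":
    sample = "id=e81dd3ba-44bb-11ec-81d3-0242ac130003."
    print(find_uuid(compact_re, sample))
    print(find_uuid(commented_re, sample))
```

Both compiled patterns should find the same UUID in the sample string, and neither should match an uppercase UUID, since the character class only lists a-f.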
Now, the regex is trivial to understand! We can even use this type of reasoning to build much more complicated Regex patterns. For example, I was signing up for an Epic Games account a few months back to get some of their free games, and when setting a password, they gave me the following requirements: the password must contain a number, contain a letter, be at least seven characters long, and contain no whitespace.
How can we build a way to validate Epic Games passwords? Let's start by abstracting away some details into "functions".
has_number()
has_letter()
has_greater_than_or_equal_to_seven_characters()
not has_whitespace()
Now, we can think about solving each one of these functions on its own. Let's try has_number first:
. # match any character
*? # loop 0 times and try the rest of the regex, loop 1 time and try the rest of the regex, etc. until match
[0-9] # match any digit
Then, let's do has_letter, which is pretty similar to has_number:
. # match any character
*? # loop 0 times and try the rest of the regex, loop 1 time and try the rest of the regex, etc. until match
[a-zA-Z] # match any letter
Then, let's do has_greater_than_or_equal_to_seven_characters:
. # match any character
{7,} # loop at least 7 times and try the rest of the regex
Another way to do this is something like:
. # match any character
{6} # loop 6 times
. # match any character
And finally, let's do has_whitespace:
. # match any character
*? # loop 0 times and try the rest of the regex, loop 1 time and try the rest of the regex, etc. until match
\s # match a whitespace character
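To check the four building blocks one at a time, here's a rough Python sketch (Python and its re module are my assumption, not something the original patterns require). The function names mirror the "functions" listed earlier, with the length check shortened to has_at_least_seven_characters, and is_valid_password is a made-up helper that combines them the slow way, without lookaheads. I use re.match so each pattern starts at the beginning of the string.

```python
import re


def has_number(password):
    # .*? skips as few characters as possible, then [0-9] must match a digit.
    return re.match(r".*?[0-9]", password) is not None


def has_letter(password):
    # Same idea as has_number, but looking for an ASCII letter.
    return re.match(r".*?[a-zA-Z]", password) is not None


def has_at_least_seven_characters(password):
    # .{7,} needs seven or more characters in a row.
    return re.match(r".{7,}", password) is not None


def has_at_least_seven_characters_alt(password):
    # The alternative form from the post: six characters, then one more.
    return re.match(r".{6}.", password) is not None


def has_whitespace(password):
    # .*? skips ahead until \s can match a whitespace character.
    return re.match(r".*?\s", password) is not None


def is_valid_password(password):
    # Combine the checks in plain Python instead of in one regex.
    return (
        has_number(password)
        and has_letter(password)
        and has_at_least_seven_characters(password)
        and not has_whitespace(password)
    )


if __name__ == "__main__":
    for candidate in ["hunter2", "hunter", "pass word1", "1234567"]:
        print(repr(candidate), is_valid_password(candidate))
```

Calling four functions works fine in application code, but it takes four passes over the string and can't be dropped into a place that only accepts a single pattern, which is the motivation for combining them.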
Now, we can combine these to form a full password validator. To do this, we will use positive and negative lookaheads. A lookahead attaches an extra condition to a match: it checks the text that follows the current position without consuming it. Let's say I were to write code to match the "x" character at position p; I might write "x" == word[p]. Attaching a lookahead to this "x" match looks like x(?=some_function) for a positive lookahead and x(?!some_function) for a negative lookahead. The pseudocode would be "x" == word[p] and some_function(word[p+1:]) for the positive lookahead and "x" == word[p] and not some_function(word[p+1:]) for the negative lookahead. With this extra bit of knowledge, we can now get to the Regex for our password validator:
^ # match the start of the line
(?= # positive lookahead for has_number function
.
*?
[0-9]
)
(?= # positive lookahead for has_letter function
.
*?
[a-zA-Z]
)
(?= # positive lookahead for has_greater_than_or_equal_to_seven_characters function
.
{7,}
)
(?! # negative lookahead for has_whitespace function
.
*?
\s
)
. # match any character
* # loop to the end
$ # match the end of the line
All in all, this Regex looks like: ^(?=.*?[0-9])(?=.*?[a-zA-Z])(?=.{7,})(?!.*?\s).*$ and it works!
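If you'd like to try it yourself, here's a small Python sketch (again, Python's re module is my assumption, and PASSWORD_RE and is_valid_password are names invented for this example). It first pokes at lookaheads in isolation, then runs the combined pattern against a few made-up passwords.

```python
import re

# Lookaheads check what follows without consuming it:
# x(?=y) matches only the "x", and only when a "y" comes right after it.
positive = re.search(r"x(?=y)", "xyz")
# x(?!y) matches an "x" that is NOT followed by "y".
negative_blocked = re.search(r"x(?!y)", "xy")
negative_allowed = re.search(r"x(?!y)", "xz")

# The combined pattern from the post, unchanged.
PASSWORD_RE = re.compile(r"^(?=.*?[0-9])(?=.*?[a-zA-Z])(?=.{7,})(?!.*?\s).*$")


def is_valid_password(password):
    """Return True when the password satisfies all four requirements."""
    return PASSWORD_RE.search(password) is not None


if __name__ == "__main__":
    print(positive.group(0))         # the matched text excludes the "y"
    print(negative_blocked)          # no match, because "y" follows "x"
    print(negative_allowed.group(0))
    for candidate in ["hunter2", "hunter", "pass word1", "1234567", "a1b2c3"]:
        print(repr(candidate), is_valid_password(candidate))
```

All three positive lookaheads and the negative lookahead sit right after ^, so each one inspects the whole password from the first character, and the trailing .*$ is what actually consumes the text.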
That's all for this post about Regex. For more information, check out this great conference talk: Understanding and Using Regular Expressions. For some extra practice, try implementing a Regex display name validator for Epic Games!