The Power of Regular Expressions

Tom Cantwell
2 min read · Jun 10, 2020

Knowing how to read and use regex will make you a more powerful developer!

Regular expressions, commonly known as regex, look at first like a confusing nightmare of odd characters, yet they can seemingly work magic, turning otherwise lengthy code with multiple method calls into a single neat line. They are a powerful tool worth adding to your developer’s toolkit.

You might have seen something that looks like this:

/^\$2[aby]?\$\d{1,2}\$[.\/A-Za-z0-9]{53}$/

This looks like nonsense to the untrained eye, but it is a pattern you can use to find strings that share its shape, and you only need a few special characters and rules to understand what’s going on. This particular pattern checks that a string has the shape of a bcrypt password hash: a version prefix such as $2a$, a one- or two-digit cost followed by $, and then 53 more characters. The special characters for regex are:

/.../ denotes a regular expression literal.
^...$ anchors the start and end, meaning the string must match exactly what's between these characters. This prevents matching strings that have extra characters before or after. (In Ruby, ^ and $ anchor to a line; \A and \z anchor to the whole string.)
\ escapes a special character so that it is treated as a normal character. In the example above, \$ matches a literal $.
[abc] matches exactly one character: a, b or c. In the example above, the 2 is a literal 2, and the [aby] after it matches one character that is a, b or y.
[^abc] the ^ character means something different inside []. Here it means any character that is not a, b or c.
? makes the preceding character or group optional. In the example above, [aby]? matches zero or one of a, b or y.
\d here the \ defines a special character instead of escaping one. \d matches any single digit.
\D any non-digit character
\w any alphanumeric character or underscore
\W any character that is not alphanumeric or an underscore
\s any whitespace character
\S any non-whitespace character
. any character (except a newline, by default)
{} quantifier that specifies repetitions. In the example above, \d{1,2} asks for one or two digits, and {53} asks for exactly 53 characters from the preceding set.
[a-z] characters a through z. Any range of letters or digits can be specified. Case sensitive.
* quantifier for zero or more repetitions
+ quantifier for one or more repetitions
() denotes a capture group. You can define a group of characters you want to match and extract. For example, to find image files whose names start with "IMG" and end in ".jpg", you can use the regex /^(IMG.+)\.jpg$/. The .+ represents at least one of any character, with no upper limit, between "IMG" and ".jpg".
(abc|xyz) matches abc or xyz.
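To see a few of these rules in action, here is a minimal Ruby sketch. The constant name HASH_PATTERN, the sample strings and the file name are all made up for illustration, and the sample is only shaped like a hash; it is not a real hash of any password.

```ruby
# The pattern from the top of the article (the constant name is illustrative).
HASH_PATTERN = /^\$2[aby]?\$\d{1,2}\$[.\/A-Za-z0-9]{53}$/

# A made-up string with the right shape: "$2b$", two digits, "$",
# then 53 characters from the allowed set.
sample = '$2b$12$' + 'a' * 53

puts HASH_PATTERN.match?(sample)                  # prints true
puts HASH_PATTERN.match?('$2$5$' + 'a' * 53)      # prints true: [aby]? is optional, \d{1,2} accepts one digit
puts HASH_PATTERN.match?(sample + 'x')            # prints false: an extra character sits before the end anchor
puts HASH_PATTERN.match?('$2ab$12$' + 'a' * 53)   # prints false: [aby] matches only one character

# Capture group: everything inside () is available as group 1.
match = /^(IMG.+)\.jpg$/.match('IMG_0042.jpg')
puts match[1]                                     # prints IMG_0042

# .+ needs at least one character between "IMG" and ".jpg".
puts /^(IMG.+)\.jpg$/.match?('IMG.jpg')           # prints false
```

Using match? returns only true or false, which is enough for a yes/no check; match returns a MatchData object when you need the captured groups.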

Now that we know some regex, let’s try one out:

Remove the 👏 from this string:

sentence = "Regular 👏 Expressions 👏 are 👏 great."

We can use this line of code:

puts sentence.split.select { |word| /[^\u{1F44F}]/.match?(word) }.join(" ")

Which gives us:

Regular Expressions are great.

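\u{1F44F} is the Unicode escape for 👏. The character class [^\u{1F44F}] matches any character that is not 👏, so select keeps only the words that contain at least one such character, and the words that consist solely of 👏 are dropped.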
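As an alternative sketch, not the approach used above, the same result can be reached without splitting the sentence, by replacing each 👏 and the whitespace before it with an empty string. The variable name cleaned is just for illustration, and this version assumes each 👏 is preceded by the space you want removed.

```ruby
sentence = "Regular 👏 Expressions 👏 are 👏 great."

# \s* matches the whitespace (if any) before each 👏,
# and gsub replaces every such match with an empty string.
cleaned = sentence.gsub(/\s*\u{1F44F}/, "")

puts cleaned   # prints Regular Expressions are great.
```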
