Regular expressions (regex) are powerful patterns for searching, matching, and manipulating text. They might look intimidating at first -- a string like ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$ can seem like an alien language. But the basics are surprisingly simple, and once you understand the building blocks, you can read and write regex patterns with confidence. This guide will take you from zero to practical competence.
A regular expression is a sequence of characters that defines a search pattern. Think of it as a mini programming language designed specifically for finding and manipulating text. Regex is used in virtually every programming language (JavaScript, Python, Java, Go, PHP, Ruby, C#), text editors (VS Code, Sublime Text, Vim), command-line tools (grep, sed, awk), databases (MySQL, PostgreSQL), and even spreadsheet applications.
When you search for a literal string like "hello" in a document, you find exact matches. Regex lets you search for patterns -- "any word that starts with h and ends with o," or "any sequence of digits that looks like a phone number," or "any email address in this document." That is the fundamental power of regular expressions: they describe categories of text, not just specific strings.
Before diving into syntax, it is worth understanding why regex is such a valuable skill. Once you know regex, you will find uses for it constantly. It is one of those skills that pays dividends across your entire career.
These are the fundamental building blocks of regex. Most match a specific type of character; ^ and $ match a position in the string instead:
\d matches any digit (0-9)
\D matches any non-digit character
\w matches any word character (letters, digits, underscore)
\W matches any non-word character
\s matches any whitespace (space, tab, newline)
\S matches any non-whitespace character
. matches any character except newline
^ matches the start of a string
$ matches the end of a string
Example: The pattern \d\d\d matches any three consecutive digits -- "123", "456", "789", but not "12a" or "ab3".
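The building blocks above can be tried out in a few lines. This is a minimal sketch that assumes Python's standard re module (the guide itself is not tied to any one language); the sample strings and variable names are illustrative only.

```python
import re

# \d\d\d looks for three consecutive digits anywhere in the text.
three_digits = re.compile(r"\d\d\d")
has_three = [bool(three_digits.search(s)) for s in ["123", "12a", "ab3"]]
print(has_three)  # [True, False, False]

# \w+ collects runs of word characters (letters, digits, underscore).
words = re.findall(r"\w+", "hello, big world_42!")
print(words)  # ['hello', 'big', 'world_42']

# ^ and $ pin the pattern to the start and end of the string,
# so a fourth digit makes the match fail.
only_three = [bool(re.search(r"^\d\d\d$", s)) for s in ["123", "1234"]]
print(only_three)  # [True, False]
```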
Character classes let you define custom sets of characters to match:
[abc] matches any single character that is a, b, or c
[a-z] matches any lowercase letter
[A-Z] matches any uppercase letter
[0-9] matches any digit (same as \d)
[a-zA-Z0-9] matches any letter or digit
[^abc] matches any character that is NOT a, b, or c (the ^ inside brackets means "not")
Example: The pattern [aeiou] matches any single vowel. The pattern [^aeiou] matches any single character that is not a vowel.
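A short sketch of character classes in action, again assuming Python's re module; the sample text is made up for illustration. Note that a negated class such as [^aeiou] also matches spaces and uppercase letters, because it accepts anything outside the listed set.

```python
import re

text = "Regex Rocks"

# [aeiou] matches one lowercase vowel at a time.
vowels = re.findall(r"[aeiou]", text)
print(vowels)  # ['e', 'e', 'o']

# [^aeiou] matches every character that is not a lowercase vowel,
# including the space and the uppercase R.
non_vowels = "".join(re.findall(r"[^aeiou]", text))
print(non_vowels)  # Rgx Rcks

# [a-zA-Z0-9]+ matches runs of letters and digits; "_" and "!" break the runs.
alnum_runs = re.findall(r"[a-zA-Z0-9]+", "user_42!")
print(alnum_runs)  # ['user', '42']
```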
Quantifiers specify how many times a pattern should repeat:
* matches 0 or more times (greedy)
+ matches 1 or more times (greedy)
? matches 0 or 1 time (makes something optional)
{3} matches exactly 3 times
{2,5} matches 2 to 5 times
{3,} matches 3 or more times
Example: The pattern \d{3}-\d{4} matches a three-digit number, a hyphen, and a four-digit number -- like "555-1234".
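The quantifiers above can be checked with a small experiment. This sketch assumes Python's re module, and the input strings are hypothetical examples chosen to show where each quantifier stops matching.

```python
import re

# \d{3}-\d{4} needs exactly three digits, a hyphen, and four digits.
phones = re.findall(r"\d{3}-\d{4}", "Call 555-1234 or 555-987")
print(phones)  # ['555-1234']

# u? makes the "u" optional, but allows at most one.
spellings = re.findall(r"colou?r", "color colour colouur")
print(spellings)  # ['color', 'colour']

# b* accepts zero b's; b+ insists on at least one.
star_ok = bool(re.fullmatch(r"ab*", "a"))
plus_ok = bool(re.fullmatch(r"ab+", "a"))
print(star_ok, plus_ok)  # True False

# {2,5} takes between two and five digits, as many as it can.
runs = re.findall(r"\d{2,5}", "7 42 123456")
print(runs)  # ['42', '12345']
```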
Greedy vs Lazy: By default, quantifiers are greedy -- they match as much text as possible. Adding a ? after a quantifier makes it lazy, matching as little as possible. For example, given the text <b>hello</b> world <b>goodbye</b>, the greedy pattern <b>.*</b> matches everything from the first <b> to the last </b>. The lazy pattern <b>.*?</b> matches only <b>hello</b>.
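The difference is easy to see by running both patterns on the sample text from the paragraph above. This is a minimal sketch assuming Python's re module; re.search returns only the first match, while re.findall shows every match the lazy pattern produces.

```python
import re

text = "<b>hello</b> world <b>goodbye</b>"

# Greedy: .* runs to the last </b> it can reach.
greedy = re.search(r"<b>.*</b>", text).group()
print(greedy)  # <b>hello</b> world <b>goodbye</b>

# Lazy: .*? stops at the first </b> it can reach.
lazy_first = re.search(r"<b>.*?</b>", text).group()
print(lazy_first)  # <b>hello</b>

# Scanning the whole text, the lazy pattern finds each tag pair separately.
lazy_all = re.findall(r"<b>.*?</b>", text)
print(lazy_all)  # ['<b>hello</b>', '<b>goodbye</b>']
```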
Parentheses create groups, and the pipe character creates alternation (logical OR):
(abc) captures the group "abc" -- useful for extracting specific parts of a match
(a|b|c) matches a, b, or c (alternation)
(?:abc) is a non-capturing group -- matches "abc" but does not capture it for later use
Example: The pattern (cat|dog|bird) matches "cat", "dog", or "bird". The pattern (\d{3})-(\d{4}) matches "555-1234" and captures "555" in group 1 and "1234" in group 2.
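Here is a brief sketch of how captured groups can be read back, assuming Python's re module (other languages expose groups through similar but differently named methods). The input strings are illustrative.

```python
import re

# Two capturing groups: group(0) is the whole match, 1 and 2 are the parts.
m = re.search(r"(\d{3})-(\d{4})", "Call 555-1234")
whole, prefix, line = m.group(0), m.group(1), m.group(2)
print(whole, prefix, line)  # 555-1234 555 1234

# Alternation: any one of the listed words.
animals = re.findall(r"(cat|dog|bird)", "a cat chased a bird")
print(animals)  # ['cat', 'bird']

# (?:...) groups without capturing, so only the second part is kept.
captured = re.search(r"(?:\d{3})-(\d{4})", "555-1234").groups()
print(captured)  # ('1234',)
```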
Here are real-world regex patterns you can use today:
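Email address: [a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}
Phone number (10 digits, optional - or . separators): \d{3}[-.]?\d{3}[-.]?\d{4}
URL (http or https): https?://[\w.-]+(?:\.[\w.-]+)+[\w.,@?^=%&:/~+#-]*
IPv4 address (format only; does not check that each number is 0-255): \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}
Date (YYYY-MM-DD): \d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])
Paired HTML tag: <([a-z]+)[^>]*>.*?</\1>
Hex color: #(?:[0-9a-fA-F]{3}){1,2}
Password (at least 8 characters, with a lowercase letter, an uppercase letter, and a digit): ^(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{8,}$

Flags modify how the regex engine interprets your pattern: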
g (global): Find all matches, not just the first
i (case insensitive): Ignore case when matching, so hello matches "Hello", "HELLO", etc.
m (multiline): ^ and $ also match at line boundaries, not just at the start and end of the string
s (dotAll): . matches newline characters too
Example: The pattern /hello/gi matches "hello", "Hello", "HELLO", and every other case variation, and finds all occurrences in the text (not just the first).
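The flag letters above follow JavaScript-style notation. As a rough sketch of the same ideas in Python's re module (an assumption, since the guide is language-neutral): re.findall plays the role of g, and re.IGNORECASE, re.MULTILINE, and re.DOTALL correspond to i, m, and s.

```python
import re

# "g" + "i": find every occurrence, ignoring case.
hellos = re.findall(r"hello", "hello Hello HELLO", flags=re.IGNORECASE)
print(hellos)  # ['hello', 'Hello', 'HELLO']

# "m": without it, ^ and $ only anchor to the whole string.
lines = "first\nsecond\nthird"
whole_string = re.findall(r"^\w+$", lines)
per_line = re.findall(r"^\w+$", lines, flags=re.MULTILINE)
print(whole_string)  # []
print(per_line)      # ['first', 'second', 'third']

# "s": by default . refuses to match a newline.
dot_default = bool(re.search(r"a.b", "a\nb"))
dot_all = bool(re.search(r"a.b", "a\nb", flags=re.DOTALL))
print(dot_default, dot_all)  # False True
```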
Watch out for these frequent mistakes when writing regex:
The characters ., *, +, ?, (, ), [, ], {, }, ^, $, and | have special meaning. To match them literally, escape with a backslash: \. matches an actual period.
Quantifiers are greedy by default. Use lazy quantifiers (*?, +?) when you need the shortest possible match.
Without ^ and $, your pattern might match substrings you did not intend. For validation, always anchor both ends.
Nested quantifiers such as (a+)+ can cause the regex engine to hang on certain inputs. Avoid nested repetition.

Regular expressions are one of the most universally useful skills in programming and data work. The investment you make in learning them will pay off every single week of your career. Start experimenting with the free Regex Tester at Vaxtim Yoxdu and build your pattern-matching skills one step at a time.