Regular Expressions (Regex): A Beginner's Complete Guide
Regular expressions let you search, validate, and transform text with precise pattern matching. This beginner guide covers every core concept with real examples.
A regular expression (regex) is a sequence of characters that defines a search pattern. Regexes are supported in virtually every mainstream programming language and are used for finding text, validating input, parsing data, and making replacements. The syntax looks cryptic at first, but once you understand the building blocks, regex becomes one of the most powerful tools in a developer's arsenal.
What Is a Regular Expression?
A regex like /^[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}$/i is an email validator — but let's start much simpler. The most basic regex is just a literal string: /cat/ matches the letters 'c', 'a', 't' appearing in sequence in any string. From this foundation, special characters let you describe patterns rather than fixed text.
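As a minimal sketch of literal matching, assuming a JavaScript runtime such as Node.js or a browser console, the snippet below tries /cat/ against a few strings. The variable names are illustrative only.

```javascript
// A literal pattern matches its characters wherever they appear in the input.
const catPattern = /cat/;

console.log(catPattern.test("cat"));         // true
console.log(catPattern.test("concatenate")); // true: "cat" appears inside the word
console.log(catPattern.test("Cat"));         // false: matching is case-sensitive by default
console.log(catPattern.test("c a t"));       // false: the letters must be adjacent

// match() also reports where the first match starts.
const literalMatch = "concatenate".match(catPattern);
console.log(literalMatch.index);             // 3
```

Note that a literal pattern is a substring search, not a whole-word search; restricting it to whole words needs the anchors covered later.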
Character Classes and Metacharacters
| Pattern | Matches | Example |
|---|---|---|
| . | Any character except newline | /c.t/ matches cat, cut, c3t |
| [abc] | Any one of a, b, or c | /[aeiou]/ matches any vowel |
| [a-z] | Any lowercase letter | /[a-z]+/ matches one or more lowercase letters |
| [^abc] | Any character NOT a, b, or c | /[^0-9]/ matches any non-digit |
| \d | Any digit (0-9) | /\d{4}/ matches exactly 4 digits |
| \w | Word character (a-z, A-Z, 0-9, _) | /\w+/ matches a run of word characters, e.g. user_42 |
| \s | Any whitespace (space, tab, newline) | /\s+/ matches whitespace runs |
| \D, \W, \S | Negated versions | /\D/ matches any non-digit |
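The rows above can be tried out with a short sketch like the following, assuming JavaScript. The sample strings and variable names are made up for illustration.

```javascript
// . matches any single character except a line break.
const dotMatches = "cat cut c3t ct".match(/c.t/g);
console.log(dotMatches);     // ["cat", "cut", "c3t"]

// [aeiou] matches one character from the set.
const vowels = "regex".match(/[aeiou]/g);
console.log(vowels);         // ["e", "e"]

// [^0-9] matches any single character that is not a digit.
const nonDigits = "a1b2".match(/[^0-9]/g);
console.log(nonDigits);      // ["a", "b"]

// \d, \w and \s are shorthand classes; \D, \W and \S negate them.
const numbers = "Room 101, Floor 3".match(/\d+/g);
console.log(numbers);        // ["101", "3"]

const wordRuns = "user_42 logged-in".match(/\w+/g);
console.log(wordRuns);       // ["user_42", "logged", "in"]

const fields = "a  b\tc".split(/\s+/);
console.log(fields);         // ["a", "b", "c"]

const lettersOnly = "a1b2".replace(/\D/g, "");
console.log(lettersOnly);    // "12"
```

The hyphen in "logged-in" is not a word character, which is why \w+ splits it into two matches.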
Quantifiers: How Much to Match
| Quantifier | Meaning | Example |
|---|---|---|
| * | 0 or more | /ab*c/ matches ac, abc, abbc |
| + | 1 or more | /ab+c/ matches abc, abbc — not ac |
| ? | 0 or 1 (optional) | /colou?r/ matches color and colour |
| {n} | Exactly n times | /\d{4}/ matches exactly 4 digits |
| {n,} | n or more times | /\d{3,}/ matches 3+ digits |
| {n,m} | Between n and m times (inclusive) | /\d{2,4}/ matches 2, 3, or 4 digits |
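A rough way to see the quantifiers in action, assuming JavaScript, is to filter a few candidate strings. The ^ and $ anchors (explained in the next section) are added here so that each test covers the whole string; the word lists are illustrative.

```javascript
const candidates = ["ac", "abc", "abbc"];

// * allows zero or more b's, + requires at least one.
const starHits = candidates.filter((w) => /^ab*c$/.test(w));
console.log(starHits);      // ["ac", "abc", "abbc"]

const plusHits = candidates.filter((w) => /^ab+c$/.test(w));
console.log(plusHits);      // ["abc", "abbc"]

// ? makes the preceding character optional.
const spellings = ["color", "colour", "colouur"].filter((w) => /^colou?r$/.test(w));
console.log(spellings);     // ["color", "colour"]

// {2,4} accepts runs of 2 to 4 digits; \b keeps longer runs from matching in part.
const shortNumbers = "7 42 2025 123456".match(/\b\d{2,4}\b/g);
console.log(shortNumbers);  // ["42", "2025"]

// Without anchors or boundaries, {4} only means "4 digits in a row somewhere".
console.log(/\d{4}/.test("123456"));   // true
console.log(/^\d{4}$/.test("123456")); // false
```

The last two lines show why counted quantifiers are usually paired with anchors when validating input.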
Anchors: Position Matching
- ^ — matches at the start of the string (or line in multiline mode)
- $ — matches at the end of the string (or line in multiline mode)
- \b — word boundary (a position with \w on one side and \W, or the start or end of the string, on the other)
- \B — not a word boundary
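The anchors above can be sketched as follows, assuming JavaScript. The sample strings are invented for the example.

```javascript
// ^ and $ pin the match to the start or end of the string.
console.log(/^cat/.test("catalog")); // true
console.log(/^cat/.test("bobcat"));  // false
console.log(/cat$/.test("bobcat"));  // true

// With the m flag, ^ also matches at the start of each line.
const lines = "one\ntwo\nthree";
const withoutMultiline = lines.match(/^t\w+/g);
const withMultiline = lines.match(/^t\w+/gm);
console.log(withoutMultiline);       // null
console.log(withMultiline);          // ["two", "three"]

// \b requires a word boundary on that side; \B requires that there is none.
const phrase = "cat concat cats";
const wholeWord = phrase.match(/\bcat\b/g);
console.log(wholeWord);              // ["cat"]

const suffixIndex = phrase.search(/\Bcat\b/);
console.log(suffixIndex);            // 7, the "cat" at the end of "concat"
```

Anchors consume no characters: they only assert something about the position, so the matched text is the same with or without them.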
Groups and Capturing
Parentheses create groups that can capture matched text for later use. /(\d{4})-(\d{2})-(\d{2})/ matches a date like 2025-06-15 and captures year, month, day in groups 1, 2, and 3. Non-capturing groups (?:...) group without capturing — useful for applying quantifiers to a pattern without capturing the match.
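As a small illustration, assuming JavaScript, the date pattern above can be used to read the captured groups and to reorder them in a replacement. The input strings and variable names are made up for the example.

```javascript
const datePattern = /(\d{4})-(\d{2})-(\d{2})/;

// match() returns the full match at index 0 and the groups at 1, 2, 3.
const dateMatch = "Released on 2025-06-15.".match(datePattern);
console.log(dateMatch[0]); // "2025-06-15"
console.log(dateMatch[1]); // "2025"
console.log(dateMatch[2]); // "06"
console.log(dateMatch[3]); // "15"

// In a replacement string, $1, $2 and $3 refer back to the captured groups.
const reordered = "2025-06-15".replace(datePattern, "$3/$2/$1");
console.log(reordered);    // "15/06/2025"

// A non-capturing group lets + apply to "ha" as a unit without storing it.
const laugh = "hahaha!".match(/(?:ha)+/);
console.log(laugh[0]);     // "hahaha"
console.log(laugh.length); // 1, only the full match and no captured groups
```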
Practical Patterns
- Email (basic): /^[^\s@]+@[^\s@]+\.[^\s@]+$/
- URL: /https?:\/\/(www\.)?[-a-zA-Z0-9@:%._+~#=]{2,256}\.[a-z]{2,6}/
- Indian mobile: /^[6-9]\d{9}$/
- Pincode (India): /^[1-9][0-9]{5}$/
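- Date (YYYY-MM-DD, format only; impossible dates such as 2025-02-31 still pass): /^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$/
- IPv4 address (format only; octets above 255 still pass): /^(\d{1,3}\.){3}\d{1,3}$/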
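The sketch below, assuming JavaScript, runs a few of the patterns from the list against sample inputs and shows where a pattern checks only the shape of the text. The helper isValidIPv4 is a hypothetical name introduced here; it pairs the regex with a numeric range check.

```javascript
const mobilePattern = /^[6-9]\d{9}$/;
const pincodePattern = /^[1-9][0-9]{5}$/;
const isoDatePattern = /^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$/;
const ipv4Pattern = /^(\d{1,3}\.){3}\d{1,3}$/;

console.log(mobilePattern.test("9876543210"));  // true
console.log(mobilePattern.test("5876543210"));  // false: must start with 6-9
console.log(pincodePattern.test("560001"));     // true
console.log(pincodePattern.test("060001"));     // false: cannot start with 0

console.log(isoDatePattern.test("2025-06-15")); // true
console.log(isoDatePattern.test("2025-13-01")); // false: month 13 is rejected
console.log(isoDatePattern.test("2025-02-31")); // true: days per month are not checked

console.log(ipv4Pattern.test("192.168.1.1"));   // true
console.log(ipv4Pattern.test("999.1.1.1"));     // true: the pattern checks digits, not range

// Hypothetical helper: regex for the shape, plain arithmetic for the 0-255 range.
function isValidIPv4(text) {
  if (!ipv4Pattern.test(text)) return false;
  return text.split(".").every((octet) => Number(octet) <= 255);
}

console.log(isValidIPv4("192.168.1.1"));        // true
console.log(isValidIPv4("999.1.1.1"));          // false
```

Splitting the work this way keeps the pattern readable; encoding the 0-255 range inside the regex itself is possible but much harder to maintain.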
Regex Flags
| Flag | Meaning |
|---|---|
| g | Global — find all matches, not just the first |
| i | Case-insensitive — /cat/i matches Cat, CAT, cat |
| m | Multiline — ^ and $ match at line boundaries |
| s | DotAll — . matches newlines too |
| u | Unicode — treat pattern as sequence of Unicode code points |
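A brief sketch of each flag, assuming JavaScript (where flags follow the closing slash of the literal), might look like this. The sample strings are illustrative.

```javascript
const flagSample = "Cat cat CAT";

// Without g, match() stops at the first match.
const firstOnly = flagSample.match(/cat/);
console.log(firstOnly[0], firstOnly.index); // "cat" 4

// g finds every match; adding i ignores case.
const allExact = flagSample.match(/cat/g);
const allAnyCase = flagSample.match(/cat/gi);
console.log(allExact);                      // ["cat"]
console.log(allAnyCase);                    // ["Cat", "cat", "CAT"]

// m lets ^ match at the start of every line.
const lineStarts = "a1\nb2".match(/^\w/gm);
console.log(lineStarts);                    // ["a", "b"]

// s lets . match a newline as well.
console.log(/a.b/.test("a\nb"));            // false
console.log(/a.b/s.test("a\nb"));           // true

// u makes . consume a whole code point instead of one UTF-16 code unit.
const emoji = "\u{1F600}";
const unitsWithoutU = emoji.match(/./g).length;
const unitsWithU = emoji.match(/./gu).length;
console.log(unitsWithoutU);                 // 2
console.log(unitsWithU);                    // 1
```

Flags can be combined in any order, as in /cat/gi above.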
Frequently Asked Questions
Are regex patterns the same across all programming languages?
The core syntax (character classes, quantifiers, anchors) is consistent across most languages, but advanced features vary. Lookbehind assertions, named captures, possessive quantifiers, and atomic groups are not universally supported. JavaScript, Python, Java, PHP, and Ruby all have slightly different feature sets and edge-case behaviours.
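To make the differences concrete for one engine, the sketch below assumes a modern JavaScript runtime (ES2018 or later), where named captures and lookbehind are available but possessive quantifiers and atomic groups are not. The helper compiles is a hypothetical name used only to probe whether a pattern is accepted.

```javascript
// Named captures: groups are read by name instead of by position.
const named = "2025-06-15".match(/(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/);
console.log(named.groups.year);  // "2025"
console.log(named.groups.month); // "06"

// Lookbehind: match digits only when a dollar sign comes right before them.
const amount = "Price: $42".match(/(?<=\$)\d+/);
console.log(amount[0]);          // "42"

// Hypothetical helper: true if this engine accepts the pattern source.
function compiles(source) {
  try {
    new RegExp(source);
    return true;
  } catch (err) {
    return false;
  }
}

console.log(compiles("a+?"));    // true: lazy quantifier
console.log(compiles("a++"));    // false: possessive quantifier is a syntax error here
console.log(compiles("(?>a)"));  // false: atomic group is a syntax error here
```

The same probes would give different answers in other engines, which is why a pattern copied between languages should always be re-tested.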
What are greedy vs lazy quantifiers?
Greedy quantifiers (*, +, {n,m}) match as many characters as possible. Lazy quantifiers (*?, +?, {n,m}?) match as few as possible. For example, with <.*> on '<b>text</b>', greedy matches the entire string; lazy <.*?> matches just <b> then </b> separately.
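The example in the answer above can be reproduced with a couple of lines, assuming JavaScript:

```javascript
const html = "<b>text</b>";

// Greedy: .* runs to the end, then backs off only as far as the last ">".
const greedy = html.match(/<.*>/g);
console.log(greedy); // ["<b>text</b>"]

// Lazy: .*? stops at the first ">" it can reach, so each tag matches on its own.
const lazy = html.match(/<.*?>/g);
console.log(lazy);   // ["<b>", "</b>"]
```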
Can regex validate an email address completely?
Not in any practical sense. A fully RFC 5321-compliant email regex is extraordinarily complex (RFC 5321's ABNF grammar produces a regex spanning multiple pages). For practical use, a simple check for an @, a dot somewhere after it, and no spaces is sufficient. Always verify emails by sending a confirmation message, not by regex alone.
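A minimal sketch of that simple check, assuming JavaScript and reusing the basic email pattern from the Practical Patterns list, could look like this. The function name looksLikeEmail is hypothetical.

```javascript
// Basic shape check: something, an @, something, a dot, something, and no spaces.
const basicEmailPattern = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

// Hypothetical helper: trims surrounding whitespace, then tests the shape.
function looksLikeEmail(input) {
  return basicEmailPattern.test(input.trim());
}

console.log(looksLikeEmail("user@example.com"));      // true
console.log(looksLikeEmail("  user@example.com  "));  // true: surrounding spaces are trimmed
console.log(looksLikeEmail("user@example"));          // false: no dot after the @
console.log(looksLikeEmail("user name@example.com")); // false: contains a space
console.log(looksLikeEmail("a@b.c"));                 // true: plausible shape, may not be deliverable
```

The last case shows the limit of this approach: the pattern can only say an address looks plausible, so a confirmation message is still needed to prove it works.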