Regular Expressions (Regex): A Beginner's Complete Guide

Regular expressions let you search, validate, and transform text with precise pattern matching. This beginner guide covers every core concept with real examples.

Nitin Kaushik · Published 10 June 2025 · 10 min read

A regular expression (regex) is a sequence of characters that defines a search pattern. Regexes are supported in nearly every mainstream programming language and are used for finding text, validating input, parsing data, and making replacements. The syntax looks cryptic at first, but once you understand the building blocks, regex becomes one of the most powerful tools in a developer's arsenal.

What Is a Regular Expression?

A regex like /^[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}$/i is an email validator — but let's start much simpler. The most basic regex is just a literal string: /cat/ matches the letters 'c', 'a', 't' appearing in sequence in any string. From this foundation, special characters let you describe patterns rather than fixed text.
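
As a minimal sketch of the literal case, the snippet below runs /cat/ against a few strings in JavaScript. The sample inputs are illustrative and not from any particular dataset.

```javascript
// Literal pattern: matches the characters c, a, t in sequence, anywhere in the input.
const literal = /cat/;

// Illustrative inputs (assumed for this example).
const samples = ["cat", "concatenate", "Cat", "c a t"];

for (const s of samples) {
  console.log(`${JSON.stringify(s)} -> ${literal.test(s)}`);
}
// "cat" -> true
// "concatenate" -> true   (the letters appear inside a longer word)
// "Cat" -> false          (matching is case-sensitive by default)
// "c a t" -> false        (the letters must be adjacent)
```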

Character Classes and Metacharacters

| Pattern | Matches | Example |
| --- | --- | --- |
| . | Any character except newline | /c.t/ matches cat, cut, c3t |
| [abc] | Any one of a, b, or c | /[aeiou]/ matches any vowel |
| [a-z] | Any lowercase letter | /[a-z]+/ matches one or more lowercase letters |
| [^abc] | Any character NOT a, b, or c | /[^0-9]/ matches any non-digit |
| \d | Any digit (0-9) | /\d{4}/ matches four consecutive digits |
| \w | Word character (a-z, A-Z, 0-9, _) | /\w+/ matches a run of word characters |
| \s | Any whitespace (space, tab, newline) | /\s+/ matches whitespace runs |
| \D, \W, \S | Negated versions of \d, \w, \s | /\D/ matches any non-digit |
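
The patterns in the table can be tried directly. The sketch below uses a small helper, firstMatch, which is a hypothetical name introduced only for this example, and the input strings are likewise made up.

```javascript
// Hypothetical helper: returns the first matched substring, or null if nothing matches.
function firstMatch(pattern, text) {
  const m = text.match(pattern);
  return m ? m[0] : null;
}

console.log(/c.t/.test("c3t"));                  // true: the dot stands in for "3"
console.log(/c.t/.test("ct"));                   // false: the dot must consume one character
console.log(firstMatch(/[aeiou]/, "regex"));     // e (the first vowel)
console.log(firstMatch(/[a-z]+/, "abc123"));     // abc
console.log(firstMatch(/[^0-9]/, "2025x"));      // x
console.log(firstMatch(/\d{4}/, "id: 12345"));   // 1234 (the first four consecutive digits)
console.log(firstMatch(/\w+/, "hello world"));   // hello
console.log(firstMatch(/\D/, "42a"));            // a
console.log("a \t\n b".split(/\s+/));            // two items: a and b
```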

Quantifiers: How Much to Match

| Quantifier | Meaning | Example |
| --- | --- | --- |
| * | 0 or more | /ab*c/ matches ac, abc, abbc |
| + | 1 or more | /ab+c/ matches abc, abbc, but not ac |
| ? | 0 or 1 (optional) | /colou?r/ matches color and colour |
| {n} | Exactly n times | /\d{4}/ matches four consecutive digits |
| {n,} | n or more times | /\d{3,}/ matches a run of 3 or more digits |
| {n,m} | Between n and m times | /\d{2,4}/ matches a run of 2, 3, or 4 digits |
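
A short sketch of the quantifiers above in JavaScript. The variable names and input strings are assumptions made for illustration.

```javascript
const inputs = ["ac", "abc", "abbc"];

// * allows zero or more b's, so all three inputs match.
const star = inputs.filter((s) => /ab*c/.test(s));
// + requires at least one b, so "ac" drops out.
const plus = inputs.filter((s) => /ab+c/.test(s));
// ? makes the u optional; "colouur" has one u too many.
const optional = ["color", "colour", "colouur"].filter((s) => /colou?r/.test(s));

console.log(star);      // ac, abc, abbc
console.log(plus);      // abc, abbc
console.log(optional);  // color, colour

// {n,} needs at least three digits per match.
const threePlus = "1 22 333 4444".match(/\d{3,}/g);
console.log(threePlus); // 333, 4444

// {n,m} is capped at m: a run of five digits yields one four-digit match,
// and the leftover single digit is too short to match again.
const capped = "55555".match(/\d{2,4}/g);
console.log(capped);    // 5555
```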

Anchors: Position Matching

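  • ^ matches at the start of the string (or at the start of each line in multiline mode)
  • $ matches at the end of the string (or at the end of each line in multiline mode)
  • \b matches a word boundary: the position between a \w character and a \W character, or between a \w character and the start or end of the string
  • \B matches any position that is not a word boundary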
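
The anchors listed above match positions rather than characters. A minimal JavaScript sketch, with made-up inputs, shows the difference each one makes:

```javascript
// ^ and $ pin the match to the start or end of the string.
const startsWithCat = /^cat/;
const endsWithCat = /cat$/;
console.log(startsWithCat.test("catalog")); // true
console.log(startsWithCat.test("bobcat"));  // false
console.log(endsWithCat.test("bobcat"));    // true

// \b requires a word boundary on each side, so only the standalone word matches.
const wholeWord = /\bcat\b/;
console.log(wholeWord.test("a cat sat"));   // true
console.log(wholeWord.test("concatenate")); // false

// \B is the opposite: "cat" must sit inside a longer word.
const insideWord = /\Bcat\B/;
console.log(insideWord.test("concatenate")); // true
console.log(insideWord.test("a cat sat"));   // false

// With the m flag, ^ and $ also match at line breaks.
const text = "one\ntwo";
const perLine = text.match(/^\w+$/gm);      // one match per line
const wholeString = text.match(/^\w+$/g);   // null: the newline is not a word character
console.log(perLine, wholeString);
```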

Groups and Capturing

Parentheses create groups that capture matched text for later use. /(\d{4})-(\d{2})-(\d{2})/ matches a date like 2025-06-15 and captures the year, month, and day in groups 1, 2, and 3. A non-capturing group, (?:...), groups without capturing, which is useful when you only need to apply a quantifier or alternation to part of a pattern.
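
The date pattern above can be sketched in JavaScript as follows. The surrounding sentence, the variable names, and the day/month/year output format are assumptions chosen for illustration.

```javascript
const datePattern = /(\d{4})-(\d{2})-(\d{2})/;

// match() returns the full match at index 0 and each captured group after it.
const found = "Released on 2025-06-15.".match(datePattern);
console.log(found[0]); // 2025-06-15
console.log(found[1]); // 2025 (group 1: year)
console.log(found[2]); // 06   (group 2: month)
console.log(found[3]); // 15   (group 3: day)

// Captured groups can be reused in a replacement string as $1, $2, $3.
const reordered = "2025-06-15".replace(datePattern, "$3/$2/$1");
console.log(reordered); // 15/06/2025

// A non-capturing group lets + apply to "ha" as a unit without adding a capture slot.
const laugh = "hahaha!".match(/(?:ha)+/);
console.log(laugh[0]);     // hahaha
console.log(laugh.length); // 1: only the full match, no captured groups
```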

Practical Patterns

  • Email (basic): /^[^\s@]+@[^\s@]+\.[^\s@]+$/
  • URL: /https?:\/\/(www\.)?[-a-zA-Z0-9@:%._+~#=]{2,256}\.[a-z]{2,6}/
  • Indian mobile: /^[6-9]\d{9}$/
  • Pincode (India): /^[1-9][0-9]{5}$/
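  • Date (YYYY-MM-DD): /^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$/ (checks the format only; impossible dates such as 2025-02-31 still pass)
  • IPv4 address: /^(\d{1,3}\.){3}\d{1,3}$/ (checks the shape only; octets above 255 such as 999 still pass)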
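
Patterns like these check the shape of the input, not its meaning. One possible way to close the gap for IPv4, sketched here under the assumption that a numeric range check in ordinary code is acceptable, is to pair the regex with a second step. The function name isValidIPv4 is hypothetical.

```javascript
// Shape check taken from the list above: four dot-separated groups of 1 to 3 digits.
const ipv4Shape = /^(\d{1,3}\.){3}\d{1,3}$/;

// Hypothetical helper: regex for the shape, plain arithmetic for the 0-255 range.
// Note that leading zeros (for example "01.2.3.4") are still accepted by this sketch.
function isValidIPv4(text) {
  if (!ipv4Shape.test(text)) return false;
  return text.split(".").every((octet) => Number(octet) <= 255);
}

console.log(ipv4Shape.test("999.1.1.1"));  // true: the regex alone accepts it
console.log(isValidIPv4("999.1.1.1"));     // false: 999 is out of range
console.log(isValidIPv4("192.168.1.1"));   // true
console.log(isValidIPv4("1.2.3"));         // false: only three groups

// The mobile-number pattern from the list, used as a plain format check.
const indianMobile = /^[6-9]\d{9}$/;
console.log(indianMobile.test("9876543210")); // true
console.log(indianMobile.test("1234567890")); // false: must start with 6-9
```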

Regex Flags

| Flag | Meaning |
| --- | --- |
| g | Global: find all matches, not just the first |
| i | Case-insensitive: /cat/i matches Cat, CAT, cat |
| m | Multiline: ^ and $ match at line boundaries |
| s | DotAll: . matches newlines too |
| u | Unicode: treat pattern as sequence of Unicode code points |
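
Each flag in the table changes the result of an otherwise identical pattern. The JavaScript sketch below compares a pattern with and without each flag; the inputs and variable names are illustrative.

```javascript
const words = "cat Cat CAT";

// Without flags: only the first, case-sensitive match is returned.
const firstOnly = words.match(/cat/)[0];
// g finds every match, i ignores case.
const allAnyCase = words.match(/cat/gi);
console.log(firstOnly);  // cat
console.log(allAnyCase); // cat, Cat, CAT

// m: ^ and $ also match at line breaks.
const lines = "first\nsecond";
const noMultiline = /^second$/.test(lines);    // false
const withMultiline = /^second$/m.test(lines); // true

// s: the dot also matches a newline.
const noDotAll = /a.b/.test("a\nb");    // false
const withDotAll = /a.b/s.test("a\nb"); // true

// u: the dot consumes a whole code point instead of one UTF-16 code unit.
const emoji = "\u{1F600}"; // a single emoji outside the Basic Multilingual Plane
const noUnicode = /^.$/.test(emoji);    // false: the emoji is two code units
const withUnicode = /^.$/u.test(emoji); // true

console.log(noMultiline, withMultiline, noDotAll, withDotAll, noUnicode, withUnicode);
```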

Frequently Asked Questions

Are regex patterns the same across all programming languages?

The core syntax (character classes, quantifiers, anchors) is consistent across most languages, but advanced features vary. Lookbehind assertions, named captures, possessive quantifiers, and atomic groups are not universally supported. JavaScript, Python, Java, PHP, and Ruby all have slightly different feature sets and edge-case behaviours.
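
One way to see these differences in practice is to ask the engine whether it accepts a given syntax. The sketch below does this for JavaScript by compiling each pattern and catching the error; isSupported is a hypothetical helper, and the results in the comments reflect current JavaScript engines at the time of writing and may change.

```javascript
// Hypothetical helper: true if this JavaScript engine can compile the pattern.
function isSupported(source, flags = "") {
  try {
    new RegExp(source, flags);
    return true;
  } catch {
    return false;
  }
}

// Named capture group: accepted in modern JavaScript.
// (Python spells the same feature (?P<year>...) instead.)
console.log(isSupported("(?<year>\\d{4})")); // true

// Lookbehind assertion: accepted in modern JavaScript engines.
console.log(isSupported("(?<=\\$)\\d+"));

// Possessive quantifier (Java and PCRE syntax): a syntax error in JavaScript.
console.log(isSupported("a++"));             // false

// Atomic group (Java and PCRE syntax): a syntax error in JavaScript.
console.log(isSupported("(?>a)"));           // false
```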

What are greedy vs lazy quantifiers?

Greedy quantifiers (*, +, {n,m}) match as many characters as possible. Lazy quantifiers (*?, +?, {n,m}?) match as few as possible. For example, with <.*> on '<b>text</b>', greedy matches the entire string; lazy <.*?> matches just <b> then </b> separately.
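
The example from this answer, written out as a small JavaScript sketch (variable names are assumptions):

```javascript
const html = "<b>text</b>";

// Greedy: .* grabs as much as it can, then backs off only until the last > fits.
const greedy = html.match(/<.*>/)[0];
console.log(greedy); // <b>text</b>

// Lazy: .*? stops at the first > it can reach, so each tag is matched separately.
const lazy = html.match(/<.*?>/g);
console.log(lazy);   // <b>, </b>
```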

Can regex validate an email address completely?

Technically yes, but a regex that implements the full address grammar of the email RFCs (RFC 5321, and RFC 5322 for message headers) is extraordinarily complex and hard to maintain. For practical use, a simple check for an @, a dot somewhere after it, and no spaces is usually sufficient. Always verify an address by sending a confirmation message, not by regex alone.
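
A minimal sketch of that simple check, reusing the basic email pattern from the Practical Patterns list. The function name looksLikeEmail is hypothetical, and a true result only means the text has a plausible shape, not that the mailbox exists.

```javascript
// Basic shape: something, an @, something, a dot, something; no spaces or extra @ signs.
const basicEmail = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

// Hypothetical helper: a first-pass filter before sending a confirmation message.
function looksLikeEmail(text) {
  return basicEmail.test(text);
}

console.log(looksLikeEmail("user@example.com"));      // true
console.log(looksLikeEmail("user@example"));          // false: no dot after the @
console.log(looksLikeEmail("user name@example.com")); // false: contains a space
console.log(looksLikeEmail("a@b@example.com"));       // false: more than one @
```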
