Python RegEx Tutorial: Mastering Regular Expressions

Last updated 5 months, 1 week ago

Tags: Python

Regular expressions (RegEx or regex) are a compact language for pattern matching and text manipulation. Python’s built-in re module lets you search, match, replace, and split strings with them.

Whether you're validating emails, parsing logs, or scraping data, regex helps you process strings efficiently and flexibly.


What Is a Regular Expression?

A regular expression is a string that describes a search pattern. You can use it to match, find, replace, or split text based on rules more complex than plain substring comparison.

Example:

import re

pattern = r"\d+"
text = "There are 123 apples"
result = re.findall(pattern, text)
print(result)  # Output: ['123']

Python re Module Functions

Function       Description
re.match()     Checks for a match at the beginning of a string
re.search()    Searches the entire string for a match
re.findall()   Returns all non-overlapping matches
re.finditer()  Returns an iterator over all matches
re.sub()       Replaces matches
re.split()     Splits a string using a pattern
re.compile()   Compiles a regex pattern for reuse
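
Two of the functions listed above, re.finditer() and re.split(), do not get their own section later on. The following is a minimal sketch of both; the sample string is made up purely for illustration.

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = "alpha=1, beta=22; gamma=333"

# re.finditer() yields match objects one at a time,
# so each match's position is available as well as its text.
numbers = [(m.group(), m.start()) for m in re.finditer(r"\d+", text)]
print(numbers)

# re.split() cuts the string wherever the pattern matches
# (here: a comma or semicolon followed by optional whitespace).
fields = re.split(r"[,;]\s*", text)
print(fields)
```

Because finditer() hands back match objects rather than strings, it is the natural choice when you need offsets or groups for every hit.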

Regex Syntax Basics

Pattern  Meaning                       Example
.        Any character except newline  a.c → matches abc, axc
^        Start of string               ^Hello
$        End of string                 world$
*        0 or more                     ab*c → matches ac, abc, abbc
+        1 or more                     ab+c → matches abc, abbc
?        0 or 1                        colou?r → matches color, colour
[]       Character set                 [aeiou]
|        OR (alternation)              a|b → matches a or b
\d       Digit                         [0-9] for ASCII text
\w       Word character                [a-zA-Z0-9_] for ASCII text
\s       Whitespace                    space, tab, newline
{n}      Exactly n times               \d{3}
(…)      Group                         (\d{3})-(\d{2})
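
To see several of these syntax elements in action, here is a small sketch that checks a handful of sample strings against a few patterns from the table. The sample strings and the cat|dog pattern are illustrative choices, and re.fullmatch() is used so that a string counts only if the pattern covers all of it.

```python
import re

# Each tuple: (pattern, candidate strings). All samples are illustrative.
checks = [
    (r"ab*c", ["ac", "abc", "abbc", "adc"]),
    (r"ab+c", ["ac", "abc", "abbc"]),
    (r"colou?r", ["color", "colour", "colouur"]),
    (r"cat|dog", ["cat", "dog", "cow"]),
    (r"\d{3}", ["123", "12"]),
]

results = {}
for pattern, samples in checks:
    # re.fullmatch() succeeds only if the whole string matches the pattern.
    results[pattern] = [s for s in samples if re.fullmatch(pattern, s)]
    print(pattern, "->", results[pattern])
```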

Using re.match()

re.match() succeeds only if the pattern matches at the very start of the string; otherwise it returns None.

import re

result = re.match(r"Hello", "Hello World")
print(result.group())  # Output: Hello
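
Since re.match() returns None when the pattern is not found at the start, calling .group() directly can raise an AttributeError. A minimal sketch of the failing case and a guarded call, reusing the same sample string:

```python
import re

# re.match() anchors at position 0, so "World" is not found here.
at_start = re.match(r"World", "Hello World")
print(at_start)  # None

# Check the result before calling .group() to avoid an AttributeError on None.
greeting = re.match(r"Hello", "Hello World")
if greeting:
    print(greeting.group())
```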

Using re.search()

re.search() scans the string and returns the first place where the pattern matches, or None if it matches nowhere.

result = re.search(r"World", "Hello World")
print(result.group())  # Output: World
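
The match object returned by re.search() also records where the match was found. A short sketch using the same sample string:

```python
import re

found = re.search(r"World", "Hello World")
if found:
    # start() and end() give the slice boundaries of the match; span() gives both.
    print(found.start(), found.end(), found.span())

# When nothing matches, search() returns None rather than raising.
missing = re.search(r"Mars", "Hello World")
print(missing)
```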

Using re.findall()

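re.findall() returns a list of all non-overlapping matches. When the pattern has no capturing groups, each item is the matched string.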

text = "Contact: 123-456-7890 or 987-654-3210"
phones = re.findall(r"\d{3}-\d{3}-\d{4}", text)
print(phones)

Output:

['123-456-7890', '987-654-3210']
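
One detail worth knowing: if the pattern contains capturing groups, re.findall() returns the groups instead of the whole match. The sketch below contrasts capturing groups with a non-capturing group, (?:…), on the same phone-number text.

```python
import re

text = "Contact: 123-456-7890 or 987-654-3210"

# With capturing groups, findall() returns one tuple of groups per match.
parts = re.findall(r"(\d{3})-(\d{3})-(\d{4})", text)
print(parts)

# A non-capturing group (?:...) groups without capturing,
# so findall() returns the whole matched strings again.
whole = re.findall(r"(?:\d{3}-){2}\d{4}", text)
print(whole)
```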

Using re.sub() for Replacements

text = "The price is $100"
new_text = re.sub(r"\$\d+", "$XXX", text)
print(new_text)  # Output: The price is $XXX
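
re.sub() also accepts a function as the replacement and a count argument that limits how many matches are replaced. A minimal sketch, with a made-up sentence and a hypothetical helper named double_price:

```python
import re

# Illustrative text with two prices.
text = "The price is $100 and the deposit is $25"

def double_price(match):
    """Hypothetical helper: receives a match object, returns the replacement text."""
    amount = int(match.group(1))
    return f"${amount * 2}"

# A callable replacement is invoked once per match.
doubled = re.sub(r"\$(\d+)", double_price, text)
print(doubled)

# count=1 replaces only the first match and leaves the rest untouched.
first_only = re.sub(r"\$\d+", "$XXX", text, count=1)
print(first_only)
```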

Grouping with () and Accessing with .group()

text = "Name: John, Age: 30"
match = re.search(r"Name: (\w+), Age: (\d+)", text)
if match:
    print(match.group(1))  # Output: John
    print(match.group(2))  # Output: 30
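
Numbered groups get hard to track as patterns grow. As a sketch of one alternative, the same pattern can use named groups, written (?P<name>…), which are then read by name:

```python
import re

text = "Name: John, Age: 30"
match = re.search(r"Name: (?P<name>\w+), Age: (?P<age>\d+)", text)
if match:
    print(match.group("name"))  # access a single group by name
    print(match.groupdict())    # all named groups as a dict
    print(match.groups())       # all groups as a tuple, in order
```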

⚡ Using re.compile() for Reusability

pattern = re.compile(r"\d{4}-\d{2}-\d{2}")
dates = ["2024-01-01", "Date: 2025-05-07"]
for date in dates:
    match = pattern.search(date)
    if match:
        print(match.group())
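
A compiled pattern can also carry flags, so they only need to be stated once. The sketch below assumes a case-insensitive search is wanted and uses re.IGNORECASE; the sample lines are invented for illustration.

```python
import re

# Flags are passed once at compile time; re.IGNORECASE ignores letter case.
word = re.compile(r"python", re.IGNORECASE)

# Illustrative input lines.
lines = ["Python 3", "I like PYTHON", "Java"]
hits = [line for line in lines if word.search(line)]
print(hits)
```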

Real-World Example: Validate Email Address

import re

def is_valid_email(email):
    pattern = r"^[\w\.-]+@[\w\.-]+\.\w{2,}$"
    return re.match(pattern, email) is not None

print(is_valid_email("user@example.com"))  # True
print(is_valid_email("bad@email"))         # False
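
One subtlety of the ^…$ approach: $ also matches just before a trailing newline, so an address followed by "\n" still passes. A possible stricter variant, sketched here under the hypothetical name is_valid_email_strict, uses fullmatch() so the entire string has to match. Both patterns remain simplified and do not cover every address that is valid in practice.

```python
import re

# Same character classes as the pattern above, without ^ and $:
# fullmatch() already requires the match to span the whole string.
EMAIL_PATTERN = re.compile(r"[\w.-]+@[\w.-]+\.\w{2,}")

def is_valid_email_strict(email):
    """Hypothetical stricter variant: the whole string must match."""
    return EMAIL_PATTERN.fullmatch(email) is not None

# "$" matches before a trailing newline, so match() with ^...$ accepts this input.
loose = re.match(r"^[\w\.-]+@[\w\.-]+\.\w{2,}$", "user@example.com\n") is not None
strict = is_valid_email_strict("user@example.com\n")
print(loose, strict)
```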

⚠️ Common Pitfalls

Pitfall                            Issue                                                       Solution
Using match() instead of search()  match() only checks start of string                         Use search() for full search
Forgetting raw strings (r"…")      Backslashes may be interpreted by Python                    Always use raw strings
Greedy matches                     .* consumes too much                                        Use .*? for non-greedy
Misusing character sets            [abc] matches one character; (abc) matches the sequence abc  Use () for grouping, [] for sets
Overusing regex                    For simple substring search, use in                         Use regex only when needed
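
Two of these pitfalls are easy to reproduce. The sketch below, using an invented HTML-like snippet, shows how a greedy .* differs from the non-greedy .*?, and what happens to \b when the raw-string prefix is left off.

```python
import re

# Illustrative snippet; regex is used here only to show greediness.
html = "<b>bold</b> and <i>italic</i>"

# Greedy: .* runs to the LAST ">" in the string, giving one long match.
greedy = re.findall(r"<.*>", html)
print(greedy)

# Non-greedy: .*? stops at the FIRST ">" it can, giving one match per tag.
lazy = re.findall(r"<.*?>", html)
print(lazy)

# Without r"...", Python turns "\b" into a backspace character
# before the regex engine ever sees it, so the word boundary is lost.
without_raw = re.findall("\bcat\b", "cat")
with_raw = re.findall(r"\bcat\b", "cat")
print(without_raw, with_raw)
```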

Tips for Working with RegEx in Python

  • ✅ Always prefix patterns with r"…", e.g., r"\d+"

  • ✅ Use re.compile() when the same pattern is applied many times (the re module also caches recently used patterns, so the gain is partly readability)

  • ✅ Break long patterns using verbose mode (re.VERBOSE)

  • ✅ Use finditer() for large texts (returns match objects lazily)

  • ✅ Test patterns online (e.g., regex101.com)
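
As a sketch of the verbose-mode and finditer() tips combined, the date pattern used earlier can be spread over several commented lines with re.VERBOSE and then applied lazily. The group names and the sample log line are assumptions made for illustration.

```python
import re

# In re.VERBOSE mode, whitespace in the pattern is ignored
# and "#" starts a comment, so the pattern can document itself.
date_pattern = re.compile(
    r"""
    (?P<year>\d{4})    # four-digit year
    -
    (?P<month>\d{2})   # two-digit month
    -
    (?P<day>\d{2})     # two-digit day
    """,
    re.VERBOSE,
)

# Hypothetical log line for illustration.
log = "Started 2024-01-01, finished 2025-05-07"

# finditer() produces match objects one by one instead of building a full list.
found_dates = []
for m in date_pattern.finditer(log):
    found_dates.append(m.groupdict())
    print(m.group("year"), m.group("month"), m.group("day"))
```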


Complete Working Code Example

import re

def extract_emails(text):
    pattern = r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+"
    return re.findall(pattern, text)

def anonymize_emails(text):
    return re.sub(r"([a-zA-Z0-9_.+-]+)@([a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+)", r"\1@***.com", text)

# Example usage
text = "Contact us at support@example.com or sales@example.org"
emails = extract_emails(text)
print("Found emails:", emails)

anonymized = anonymize_emails(text)
print("Anonymized text:", anonymized)

Summary Table

Method        Use
re.match()    Match from the beginning
re.search()   Search anywhere in string
re.findall()  Find all non-overlapping matches
re.sub()      Replace matched patterns
re.split()    Split by pattern
re.compile()  Compile regex for reuse

Conclusion

Python’s re module gives you powerful pattern matching capabilities. With regex, you can automate everything from simple string validation to complex data extraction.

Start small, build your patterns step by step, and always test thoroughly!