Python RegEx Tutorial: Mastering Regular Expressions
Last updated 5 months, 1 week ago

Regular Expressions (RegEx or regex) are powerful tools for pattern matching and text manipulation. Python’s built-in re module lets you work with regex seamlessly.
Whether you're validating emails, parsing logs, or scraping data, regex helps you process strings efficiently and flexibly.
What Is a Regular Expression?
A regular expression is a string that describes a search pattern in text. Used with the re module's functions, it lets you match, find, replace, or split strings based on complex rules.
Example:
import re
pattern = r"\d+"
text = "There are 123 apples"
result = re.findall(pattern, text)
print(result) # Output: ['123']
Python re Module Functions
Function | Description |
---|---|
re.match() | Checks for a match at the beginning of a string |
re.search() | Scans the string and returns the first match found anywhere |
re.findall() | Returns a list of all non-overlapping matches |
re.finditer() | Returns an iterator of match objects for all matches |
re.sub() | Replaces matches with a string (or the result of a function) |
re.split() | Splits a string wherever the pattern matches |
re.compile() | Compiles a pattern into a reusable pattern object |
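The table lists re.split() and re.finditer(), but the tutorial doesn't show either of them in use. Here is a minimal sketch of both. The sample strings and variable names are made up for this example.

```python
import re

# re.split(): break the string wherever a comma plus optional whitespace occurs
parts = re.split(r",\s*", "apples, oranges,pears,  plums")
print(parts)  # ['apples', 'oranges', 'pears', 'plums']

# re.finditer(): yields match objects, so you get positions as well as text
text = "Order 12 shipped, order 345 pending"
spans = [(m.group(), m.span()) for m in re.finditer(r"\d+", text)]
print(spans)  # [('12', (6, 8)), ('345', (24, 27))]
```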
Regex Syntax Basics
Pattern | Meaning | Example |
---|---|---|
. | Any character except newline | a.c → matches abc, axc |
^ | Start of string | ^Hello |
$ | End of string | world$ |
* | 0 or more | ab*c → ac, abc, abbc |
+ | 1 or more | ab+c → abc, abbc |
? | 0 or 1 | colou?r → color, colour |
[] | Character set | [aeiou] |
\| | OR (alternation) | cat\|dog → cat, dog |
\d | Digit | [0-9] (str patterns also match other Unicode digits unless re.ASCII is set) |
\w | Word character | [a-zA-Z0-9_] (str patterns also match other Unicode letters and digits unless re.ASCII is set) |
\s | Whitespace | space, tab, newline |
{n} | Exactly n times | \d{3} |
(…) | Group | (\d{3})-(\d{2}) |
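The next sketch runs several rows of the table above. It also uses re.fullmatch(), a standard re function not covered elsewhere in this tutorial, which succeeds only when the entire string matches. The sample strings are invented.

```python
import re

# [] character set: each vowel is a separate one-character match
vowels = re.findall(r"[aeiou]", "regex")
print(vowels)  # ['e', 'e']

# | alternation: match either alternative
pets = re.findall(r"cat|dog", "cat, dog, bird")
print(pets)  # ['cat', 'dog']

# ? and + quantifiers, checked against the whole string with fullmatch()
print(bool(re.fullmatch(r"colou?r", "colour")))  # True
print(bool(re.fullmatch(r"ab+c", "ac")))         # False: + needs at least one "b"

# ^ and $ anchors pin the pattern to the start or end of the string
print(bool(re.search(r"^Hello", "Hello world")))   # True
print(bool(re.search(r"world$", "world, hello")))  # False

# \s matches spaces, tabs, and newlines alike
words = re.split(r"\s+", "one\ttwo  three\nfour")
print(words)  # ['one', 'two', 'three', 'four']

# (...) groups combined with {n} exact repeats
m = re.search(r"(\d{3})-(\d{2})", "ID 123-45")
print(m.groups())  # ('123', '45')
```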
Using re.match()
Matches the pattern only at the start of the string.
import re
result = re.match(r"Hello", "Hello World")
print(result.group()) # Output: Hello
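If the string does not start with the pattern, re.match() returns None. Calling .group() on that result raises AttributeError. The sketch below shows the failing case and the usual guard. The variable names are illustrative.

```python
import re

# match() only tries position 0, so "World" is not found here
no_match = re.match(r"World", "Hello World")
print(no_match)  # None

# Test the result before calling .group(); None has no .group() method
m = re.match(r"Hello", "Hello World")
if m is not None:
    print(m.group())  # Hello
```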
Using re.search()
Searches for the pattern anywhere in the string.
result = re.search(r"World", "Hello World")
print(result.group()) # Output: World
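Both re.match() and re.search() return a match object, which also records where in the string the match occurred. Here is a brief illustration using the standard start(), end(), and span() methods.

```python
import re

text = "Hello World"
m = re.search(r"World", text)
if m:
    print(m.start(), m.end())       # 6 11
    print(m.span())                 # (6, 11)
    print(text[m.start():m.end()])  # World
```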
Using re.findall()
Returns a list of all matches.
text = "Contact: 123-456-7890 or 987-654-3210"
phones = re.findall(r"\d{3}-\d{3}-\d{4}", text)
print(phones)
Output:
['123-456-7890', '987-654-3210']
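When the pattern contains capturing groups, re.findall() returns the groups rather than the whole match. This sketch reuses the phone-number text. The (?:...) non-capturing group it uses is standard syntax that the tutorial does not cover elsewhere.

```python
import re

text = "Contact: 123-456-7890 or 987-654-3210"

# Several groups: one tuple of group texts per match
parts = re.findall(r"(\d{3})-(\d{3})-(\d{4})", text)
print(parts)  # [('123', '456', '7890'), ('987', '654', '3210')]

# A single group: a flat list of that group's text
area_codes = re.findall(r"(\d{3})-\d{3}-\d{4}", text)
print(area_codes)  # ['123', '987']

# (?:...) groups without capturing, so findall() returns full matches again
full = re.findall(r"(?:\d{3}-){2}\d{4}", text)
print(full)  # ['123-456-7890', '987-654-3210']
```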
Using re.sub() for Replacements
text = "The price is $100"
new_text = re.sub(r"\$\d+", "$XXX", text)
print(new_text) # Output: The price is $XXX
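The replacement argument of re.sub() can also be a function. It receives each match object and returns the replacement text. The optional count argument caps the number of substitutions. This is a small sketch, and double_price is a hypothetical helper written for this example.

```python
import re

text = "Items cost $100 and $250"

def double_price(m):
    # Hypothetical helper: m.group(1) holds the digits captured after "$"
    return f"${int(m.group(1)) * 2}"

doubled = re.sub(r"\$(\d+)", double_price, text)
print(doubled)  # Items cost $200 and $500

# count=1 stops after the first replacement
first_only = re.sub(r"\$\d+", "$XXX", text, count=1)
print(first_only)  # Items cost $XXX and $250
```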
Grouping with () and Accessing with .group()
text = "Name: John, Age: 30"
match = re.search(r"Name: (\w+), Age: (\d+)", text)
if match:
    print(match.group(1))  # Output: John
    print(match.group(2))  # Output: 30
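In longer patterns, numbered groups are hard to keep track of. Python also supports named groups with the (?P<name>...) syntax. Here is a short sketch that adapts the example above. The group names are arbitrary.

```python
import re

text = "Name: John, Age: 30"
# (?P<name>...) labels a group so it can be read by name instead of number
m = re.search(r"Name: (?P<name>\w+), Age: (?P<age>\d+)", text)
if m:
    print(m.group("name"))  # John
    print(m["age"])         # 30
    print(m.groupdict())    # {'name': 'John', 'age': '30'}
```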
⚡ Using re.compile() for Reusability
pattern = re.compile(r"\d{4}-\d{2}-\d{2}")
dates = ["2024-01-01", "Date: 2025-05-07"]
for date in dates:
    match = pattern.search(date)
    if match:
        print(match.group())  # prints 2024-01-01, then 2025-05-07
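A compiled pattern can carry flags such as re.IGNORECASE. It also offers the same methods as the module-level functions, such as findall() and sub(). Here is a minimal illustration. The variable names are arbitrary.

```python
import re

# The flag is set once, at compile time, and applies to every call
word = re.compile(r"python", re.IGNORECASE)

matches = word.findall("Python, PYTHON and python")
print(matches)  # ['Python', 'PYTHON', 'python']

replaced = word.sub("Py", "I like Python")
print(replaced)  # I like Py
```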
Real-World Example: Validate Email Address
import re
def is_valid_email(email):
    pattern = r"^[\w\.-]+@[\w\.-]+\.\w{2,}$"
    return re.match(pattern, email) is not None
print(is_valid_email("user@example.com"))  # True
print(is_valid_email("bad@email"))  # False
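The validator above has one quirk. With re.match() and a trailing $, the pattern also accepts a string that ends in a newline, because $ can match just before a final "\n". The sketch below shows a stricter variant that uses re.fullmatch(). The name is_valid_email_strict is hypothetical, and the pattern is still only a rough check, not full email validation.

```python
import re

PATTERN = r"[\w\.-]+@[\w\.-]+\.\w{2,}"

def is_valid_email_strict(email):
    # Hypothetical helper: fullmatch() must consume the whole string,
    # so a trailing newline makes it fail
    return re.fullmatch(PATTERN, email) is not None

# With ^...$ and re.match(), "$" also matches just before a final "\n"
print(re.match(r"^" + PATTERN + r"$", "user@example.com\n") is not None)  # True
print(is_valid_email_strict("user@example.com\n"))  # False
print(is_valid_email_strict("user@example.com"))    # True
```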
⚠️ Common Pitfalls
Pitfall | Issue | Solution |
---|---|---|
Using match() instead of search() | match() only checks the start of the string | Use search() to scan the whole string |
Forgetting raw strings (r"…") | Python may interpret backslashes before re sees them | Always use raw strings |
Greedy matches | .* consumes as much text as it can | Use .*? for a non-greedy match |
Misusing character sets | [abc] ≠ (abc): a set matches one character, a group matches the whole sequence | Use () for grouping, [] for sets |
Overusing regex | Regex is overkill for a simple substring check | Use the in operator; reserve regex for real patterns |
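The first four pitfalls are easier to see side by side. This sketch uses made-up sample strings.

```python
import re

# match() vs search(): "cat" is present but not at position 0
print(re.match(r"cat", "a cat"))           # None
print(re.search(r"cat", "a cat").group())  # cat

# Raw strings: without r"", "\b" is a backspace character, not a word boundary
no_raw = re.findall("\bcat\b", "cat scatter")
raw = re.findall(r"\bcat\b", "cat scatter")
print(no_raw, raw)  # [] ['cat']

# Greedy vs non-greedy
html = "<b>bold</b>"
greedy = re.findall(r"<.*>", html)
lazy = re.findall(r"<.*?>", html)
print(greedy)  # ['<b>bold</b>']  one long match
print(lazy)    # ['<b>', '</b>']  shortest possible matches

# Character set vs group
as_set = re.findall(r"[abc]", "cab")
as_group = re.findall(r"(abc)", "cab")
print(as_set, as_group)  # ['c', 'a', 'b'] []
```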
Tips for Working with RegEx in Python
- ✅ Always prefix patterns with r"…", e.g., r"\d+"
- ✅ Use re.compile() for patterns you reuse many times (re also caches recently used patterns, so small scripts rarely need it)
- ✅ Break long patterns into commented lines using verbose mode (re.VERBOSE)
- ✅ Use finditer() for large texts (it yields match objects lazily instead of building a list)
- ✅ Test patterns online (e.g., regex101.com)
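To illustrate the verbose-mode tip, here is a grouped version of the date pattern from the re.compile() section, split into commented pieces. In re.VERBOSE mode, unescaped whitespace in the pattern is ignored and # starts a comment. The sample text is invented.

```python
import re

date = re.compile(r"""
    (\d{4})   # year
    -
    (\d{2})   # month
    -
    (\d{2})   # day
""", re.VERBOSE)

m = date.search("Released on 2025-05-07")
print(m.groups())  # ('2025', '05', '07')
```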
Complete Working Code Example
import re
def extract_emails(text):
    pattern = r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+"
    return re.findall(pattern, text)
def anonymize_emails(text):
    # Keep the local part (group 1) and mask the domain
    return re.sub(r"([a-zA-Z0-9_.+-]+)@([a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+)", r"\1@***.com", text)
# Example usage
text = "Contact us at support@example.com or sales@example.org"
emails = extract_emails(text)
print("Found emails:", emails)  # Found emails: ['support@example.com', 'sales@example.org']
anonymized = anonymize_emails(text)
print("Anonymized text:", anonymized)  # Anonymized text: Contact us at support@***.com or sales@***.com
Summary Table
Method | Use |
---|---|
re.match() | Match from the beginning |
re.search() | Search anywhere in string |
re.findall() | Find all non-overlapping matches |
re.sub() | Replace matched patterns |
re.split() | Split by pattern |
re.compile() | Compile regex for reuse |
Conclusion
Python’s re module gives you powerful pattern-matching capabilities. With regex, you can automate everything from simple string validation to complex data extraction.
Start small, build your patterns step by step, and always test thoroughly!