
Python RegEx Tutorial: Mastering Regular Expressions
Last updated 2 weeks, 2 days ago

Regular Expressions (RegEx or regex) are powerful tools for pattern matching and text manipulation. Python’s built-in re module enables you to work with regex seamlessly.
Whether you're validating emails, parsing logs, or scraping data, regex helps you process strings efficiently and flexibly.
What Is a Regular Expression?
A regular expression is a string that describes a search pattern. You can use it to match, find, replace, or split text based on complex rules.
Example:
import re
pattern = r"\d+"
text = "There are 123 apples"
result = re.findall(pattern, text)
print(result) # Output: ['123']
Python re Module Functions
Function | Description |
---|---|
re.match() | Checks for a match at the beginning of a string |
re.search() | Searches the entire string for a match |
re.findall() | Returns all non-overlapping matches |
re.finditer() | Returns an iterator over all matches |
re.sub() | Replaces matches |
re.split() | Splits a string using a pattern |
re.compile() | Compiles a regex pattern for reuse |
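The table lists re.split(), which is easy to overlook next to str.split(). A minimal sketch of how it behaves, using a made-up sample string:

```python
import re

# Split on a comma or semicolon, together with any surrounding whitespace.
# The sample string is invented for illustration.
fields = re.split(r"\s*[,;]\s*", "apple, banana;cherry ,  date")
print(fields)  # ['apple', 'banana', 'cherry', 'date']

# maxsplit limits how many splits are performed.
head, rest = re.split(r"\s*[,;]\s*", "apple, banana;cherry", maxsplit=1)
print(head, "|", rest)  # apple | banana;cherry
```

Unlike str.split(), the separator here is a pattern, so one call handles several delimiters at once.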
Regex Syntax Basics
Pattern | Meaning | Example |
---|---|---|
. | Any character except newline | a.c → matches abc , axc |
^ | Start of string | ^Hello |
$ | End of string | world$ |
* | 0 or more | ab*c → ac , abc , abbc |
+ | 1 or more | ab+c → abc , abbc |
? | 0 or 1 | colou?r → color , colour |
[] | Character set | [aeiou] |
`\|` | OR | cat\|dog → cat , dog |
\d | Digit | [0-9] |
\w | Word character | [a-zA-Z0-9_] |
\s | Whitespace | space, tab, newline |
{n} | Exactly n times | \d{3} |
(…) | Group | (\d{3})-(\d{2}) |
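One way to check your reading of the table is to run a few of its patterns through re.fullmatch(), which succeeds only if the whole string matches. The sketch below does that; the test strings and the `checks` list are illustrative choices, not part of the original tutorial.

```python
import re

# Each entry: (pattern, candidate string, whether a full match is expected)
checks = [
    (r"a.c", "abc", True),
    (r"a.c", "ac", False),       # '.' needs exactly one character
    (r"ab*c", "ac", True),       # '*' allows zero 'b's
    (r"ab+c", "ac", False),      # '+' needs at least one 'b'
    (r"colou?r", "color", True), # '?' makes the 'u' optional
    (r"\d{3}", "1234", False),   # exactly three digits, not four
    (r"cat|dog", "dog", True),   # '|' matches either alternative
]

results = [re.fullmatch(p, s) is not None for p, s, _ in checks]
for (p, s, expected), got in zip(checks, results):
    print(f"{p!r:12} {s!r:8} -> {got}")
```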
Using re.match()
Matches pattern only at the start of the string.
import re
result = re.match(r"Hello", "Hello World")
print(result.group()) # Output: Hello
Using re.search()
Searches for the pattern anywhere in the string.
result = re.search(r"World", "Hello World")
print(result.group()) # Output: World
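Both functions return None when nothing matches, so calling .group() directly on the result can raise AttributeError. The following sketch contrasts the two on the same string and shows a guarded access; re.fullmatch() is included for comparison.

```python
import re

text = "Hello World"

print(re.match(r"World", text))   # None: "World" is not at the start
print(re.search(r"World", text))  # a Match object covering "World"

# Check the result before calling .group() to avoid AttributeError.
m = re.match(r"World", text)
word = m.group() if m else None
print(word)  # None

# re.fullmatch() succeeds only if the entire string matches the pattern.
whole_ok = re.fullmatch(r"Hello \w+", text) is not None
partial_ok = re.fullmatch(r"Hello", text) is not None
print(whole_ok, partial_ok)  # True False
```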
Using re.findall()
Returns a list of all matches.
text = "Contact: 123-456-7890 or 987-654-3210"
phones = re.findall(r"\d{3}-\d{3}-\d{4}", text)
print(phones)
Output:
['123-456-7890', '987-654-3210']
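One detail worth knowing: when the pattern contains capturing groups, findall() returns the group contents instead of the full match. A short sketch using the same phone-number text:

```python
import re

text = "Contact: 123-456-7890 or 987-654-3210"

# One capturing group: the list holds only that group's text.
area_codes = re.findall(r"(\d{3})-\d{3}-\d{4}", text)
print(area_codes)  # ['123', '987']

# Several groups: the list holds one tuple per match.
parts = re.findall(r"(\d{3})-(\d{3})-(\d{4})", text)
print(parts)  # [('123', '456', '7890'), ('987', '654', '3210')]

# A non-capturing group (?:...) groups without changing what findall returns.
full = re.findall(r"(?:\d{3}-){2}\d{4}", text)
print(full)  # ['123-456-7890', '987-654-3210']
```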
Using re.sub() for Replacements
text = "The price is $100"
new_text = re.sub(r"\$\d+", "$XXX", text)
print(new_text) # Output: The price is $XXX
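re.sub() also accepts backreferences and a function as the replacement. The sketch below shows both, plus the count argument; the helper name `double_price` and the sample strings are hypothetical.

```python
import re

# Backreferences \1 and \2 reuse captured groups in the replacement.
swapped = re.sub(r"(\w+), (\w+)", r"\2 \1", "Doe, John")
print(swapped)  # John Doe

# A callable replacement receives each Match object and returns a string.
def double_price(m):
    return "$" + str(int(m.group(1)) * 2)

doubled = re.sub(r"\$(\d+)", double_price, "The price is $100")
print(doubled)  # The price is $200

# count limits how many matches are replaced.
masked = re.sub(r"\d", "#", "a1b2c3", count=2)
print(masked)  # a#b#c3
```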
Grouping with () and Accessing with .group()
text = "Name: John, Age: 30"
match = re.search(r"Name: (\w+), Age: (\d+)", text)
if match:
    print(match.group(1)) # Output: John
    print(match.group(2)) # Output: 30
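Numbered groups get hard to track as patterns grow. Named groups, written (?P<name>…), are one alternative; here is a sketch on the same text, with the group names `name` and `age` chosen for illustration.

```python
import re

text = "Name: John, Age: 30"
m = re.search(r"Name: (?P<name>\w+), Age: (?P<age>\d+)", text)

# Fall back to empty results if the pattern does not match.
info = m.groupdict() if m else {}
values = m.groups() if m else ()

print(info)    # {'name': 'John', 'age': '30'}
print(values)  # ('John', '30')
if m:
    print(m.group("name"))  # John
```

Note that captured values are always strings; convert with int() where a number is needed.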
⚡ Using re.compile() for Reusability
pattern = re.compile(r"\d{4}-\d{2}-\d{2}")
dates = ["2024-01-01", "Date: 2025-05-07"]
for date in dates:
    match = pattern.search(date)
    if match:
        print(match.group())
Real-World Example: Validate Email Address
import re
def is_valid_email(email):
    pattern = r"^[\w\.-]+@[\w\.-]+\.\w{2,}$"
    return re.match(pattern, email) is not None
print(is_valid_email("user@example.com")) # True (placeholder address)
print(is_valid_email("bad@email")) # False
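One subtlety with the ^…$ approach: $ also matches just before a trailing newline, so a string ending in "\n" can pass validation. A hedged sketch of a stricter variant using re.fullmatch() follows; the names `EMAIL_RE` and `is_valid_email_strict` are hypothetical, and the pattern is still a rough check rather than a full email validator.

```python
import re

# Same character classes as the pattern above, without the ^ and $ anchors.
EMAIL_RE = re.compile(r"[\w.-]+@[\w.-]+\.\w{2,}")

def is_valid_email_strict(email):
    # fullmatch() requires the entire string, including any newline, to match.
    return EMAIL_RE.fullmatch(email) is not None

with_newline = "user@example.com\n"  # placeholder address

# The anchored match() version accepts the trailing newline...
loose = re.match(r"^[\w\.-]+@[\w\.-]+\.\w{2,}$", with_newline) is not None
# ...while fullmatch() rejects it.
strict = is_valid_email_strict(with_newline)

print(loose, strict)  # True False
print(is_valid_email_strict("user@example.com"))  # True
```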
⚠️ Common Pitfalls
Pitfall | Issue | Solution |
---|---|---|
Using match() instead of search() | match() only checks start of string | Use search() for full search |
Forgetting raw strings (r"…") | Backslashes may be interpreted by Python | Always use raw strings |
Greedy matches | .* consumes too much | Use .*? for non-greedy |
Misusing character sets | [abc] ≠ (abc) | Use () for grouping, [] for sets |
Overusing regex | For simple substring search, use in | Use regex only when needed |
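Two of these pitfalls are easiest to see by running them. The sketch below compares greedy and non-greedy matching, then shows what happens when the r prefix is left off; the sample strings are made up.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .* runs to the last '>' in the string.
greedy = re.findall(r"<.*>", html)
print(greedy)  # ['<b>bold</b> and <i>italic</i>']

# Non-greedy: .*? stops at the first '>' it can.
lazy = re.findall(r"<.*?>", html)
print(lazy)  # ['<b>', '</b>', '<i>', '</i>']

# Without r"", Python turns "\b" into a backspace character,
# so the pattern no longer means "word boundary".
no_raw = re.findall("\bcat\b", "cat concat")
raw = re.findall(r"\bcat\b", "cat concat")
print(no_raw, raw)  # [] ['cat']
```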
Tips for Working with RegEx in Python
- ✅ Always prefix patterns with r"…", e.g., r"\d+"
- ✅ Use re.compile() for performance in repeated matches
- ✅ Break long patterns using verbose mode (re.VERBOSE)
- ✅ Use finditer() for large texts (returns match objects lazily)
- ✅ Test patterns online (e.g., regex101.com)
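The verbose-mode and finditer() tips combine naturally. Here is a minimal sketch, assuming a date format of YYYY-MM-DD; the name `DATE_RE` and the sample log line are illustrative.

```python
import re

# In verbose mode, whitespace in the pattern is ignored and '#' starts a comment.
DATE_RE = re.compile(
    r"""
    (?P<year>\d{4})   # four-digit year
    -
    (?P<month>\d{2})  # two-digit month
    -
    (?P<day>\d{2})    # two-digit day
    """,
    re.VERBOSE,
)

log = "start 2024-01-01, end 2025-05-07"

# finditer() yields Match objects one at a time instead of building a list.
found = [(m.start(), m.group("year"), m.group("month"), m.group("day"))
         for m in DATE_RE.finditer(log)]
print(found)  # [(6, '2024', '01', '01'), (22, '2025', '05', '07')]
```

Because each item is a Match object, positions (m.start(), m.span()) are available as well as the text, which findall() does not provide.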
Complete Working Code Example
import re
def extract_emails(text):
    pattern = r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+"
    return re.findall(pattern, text)
def anonymize_emails(text):
    return re.sub(r"([a-zA-Z0-9_.+-]+)@([a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+)", r"\1@***.com", text)
# Example usage (placeholder addresses)
text = "Contact us at support@example.com or sales@example.org"
emails = extract_emails(text)
print("Found emails:", emails)
anonymized = anonymize_emails(text)
print("Anonymized text:", anonymized)
Summary Table
Method | Use |
---|---|
re.match() | Match from the beginning |
re.search() | Search anywhere in string |
re.findall() | Find all non-overlapping matches |
re.sub() | Replace matched patterns |
re.split() | Split by pattern |
re.compile() | Compile regex for reuse |
Conclusion
Python’s re module gives you powerful pattern matching capabilities. With regex, you can automate everything from simple string validation to complex data extraction.
Start small, build your patterns step by step, and always test thoroughly!
Tips and Tricks
What is pass in Python?
Python | Pass Statement
The pass statement is a placeholder for future code. It is a null operation: nothing happens when it executes. It is generally used to fill blocks that are syntactically required but whose code has yet to be written.
def myfunction():
    pass
How can you generate random numbers?
Python | Generate random numbers
Python provides the random module for generating random numbers. Import it and call random.random() as shown below:
import random
print(random.random())
The random() function returns a float in the half-open range [0.0, 1.0).
To generate random integers within a specified range, use the randrange() function. The end value is excluded.
Syntax: randrange(beginning, end, step)
import random
print(random.randrange(5,100,2))
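For tests or demos it often helps to make the output repeatable. The sketch below seeds the generator and checks the bounds of randrange(5, 100, 2); the seed value 42 is an arbitrary choice.

```python
import random

# Seeding with the same value reproduces the same sequence.
random.seed(42)
first = [random.random() for _ in range(3)]
random.seed(42)
second = [random.random() for _ in range(3)]
print(first == second)  # True

# randrange(5, 100, 2) yields 5, 7, 9, ..., 99: odd numbers, 100 excluded.
samples = [random.randrange(5, 100, 2) for _ in range(1000)]
in_range = all(5 <= n < 100 and n % 2 == 1 for n in samples)
print(in_range)  # True

# randint() includes both endpoints, unlike randrange().
die = random.randint(1, 6)
print(1 <= die <= 6)  # True
```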
What is lambda in Python?
Python | Lambda function
A lambda function is a small anonymous function. It can take any number of parameters, but its body must be a single expression.
Syntax:
lambda arguments : expression
a = lambda x,y : x+y
print(a(5, 6))
It also provides a nice way to write closures. With that power, you can do things like this.
def adder(x):
    return lambda y: x + y
add5 = adder(5)
add5(1) #6
As you can see from the snippet of Python, the function adder takes in an argument x and returns an anonymous function, or lambda, that takes another argument y. That anonymous function allows you to create functions from functions. This is a simple example, but it should convey the power lambdas and closures have.
What is the swapcase() function in Python?
Python | swapcase() Function
It is a string method that converts all uppercase characters to lowercase and vice versa. Non-alphabetic characters are left unchanged.
string = "IT IS IN LOWERCASE."
print(string.swapcase())
How to remove whitespaces from a string in Python?
Python | strip() Function | Remove whitespaces from a string
To remove leading and trailing whitespace from a string, Python provides the strip([chars]) string method. It returns a copy of the string with that whitespace removed; if there is nothing to strip, the result equals the original string.
string = " Python "
print(string.strip())
What is the usage of enumerate() function in Python?
Python | enumerate() Function
The enumerate() function is used to iterate through the sequence and retrieve the index position and its corresponding value at the same time.
lst = ["A","B","C"]
print (list(enumerate(lst)))
#[(0, 'A'), (1, 'B'), (2, 'C')]
Can you explain the filter(), map(), and reduce() functions?
Python | filter(), map(), and reduce() Functions
- filter() accepts two arguments, a function and an iterable, and keeps only the items for which the function returns a true value.
>>> set(filter(lambda x:x>4, range(7))) # {5, 6}
- map() calls the specified function on each item of an iterable. In Python 3 it returns a lazy iterator rather than a list.
>>> set(map(lambda x:x**3, range(7))) # {0, 1, 64, 8, 216, 27, 125}
- reduce() applies a two-argument function to the items of a sequence pair-wise, repeatedly, until a single value remains. In Python 3 it must be imported from functools.
>>> from functools import reduce
>>> reduce(lambda x,y:y-x, [1,2,3,4,5]) # 3
Let’s understand this:
2-1=1
3-1=2
4-2=2
5-2=3
Hence, 3.
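Two behaviours of these functions are worth seeing in code: map() and filter() produce one-shot iterators in Python 3, and reduce() accepts an optional starting value. A short sketch with made-up inputs:

```python
from functools import reduce

# map() returns an iterator; it can be consumed only once.
cubes = map(lambda x: x ** 3, range(4))
first_pass = list(cubes)
second_pass = list(cubes)
print(first_pass, second_pass)  # [0, 1, 8, 27] []

# filter() is lazy in the same way; wrap it in list() to see the items.
evens = list(filter(lambda x: x % 2 == 0, range(7)))
print(evens)  # [0, 2, 4, 6]

# The third argument to reduce() is the initial accumulator value,
# which also makes reduce() safe on an empty sequence.
total = reduce(lambda acc, x: acc + x, [1, 2, 3, 4, 5], 0)
empty_total = reduce(lambda acc, x: acc + x, [], 0)
print(total, empty_total)  # 15 0
```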
What is a namedtuple?
Python | namedtuple
A namedtuple will let us access a tuple’s elements using a name/label. We use the function namedtuple() for this, and import it from collections.
>>> from collections import namedtuple
#format
>>> result=namedtuple('result','Physics Chemistry Maths')
#declaring the tuple
>>> Chris=result(Physics=86,Chemistry=92,Maths=80)
>>> Chris.Chemistry
# 92
Write a code to add the values of same keys in two different dictionaries and return a new dictionary.
We can use the Counter class from the collections module. Adding two Counter objects sums the values of matching keys.
from collections import Counter
dict1 = {'a': 5, 'b': 3, 'c': 2}
dict2 = {'a': 2, 'b': 4, 'c': 3}
new_dict = Counter(dict1) + Counter(dict2)
print(new_dict)
# Print: Counter({'a': 7, 'b': 7, 'c': 5})
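A caveat with this approach: adding Counter objects drops any key whose summed value is zero or negative. If those keys must be kept, a dict comprehension is one alternative. The sketch below uses made-up dictionaries to show the difference.

```python
from collections import Counter

dict1 = {'a': 5, 'b': -3}
dict2 = {'a': 2, 'b': 3, 'c': 4}

# 'b' sums to 0, so Counter addition leaves it out of the result.
with_counter = Counter(dict1) + Counter(dict2)
print(dict(with_counter))

# A plain dict comprehension keeps every key, whatever the sum.
merged = {k: dict1.get(k, 0) + dict2.get(k, 0)
          for k in dict1.keys() | dict2.keys()}
print(merged)
```

Here `with_counter` contains only 'a' and 'c', while `merged` also holds 'b' with a value of 0.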
Python In-place swapping of two numbers
Python | In-place swapping of two numbers
>>> a, b = 10, 20
>>> print(a, b)
10 20
>>> a, b = b, a
>>> print(a, b)
20 10
Reversing a String in Python
Python | Reversing a String
>>> x = 'PythonWorld'
>>> print(x[::-1])
dlroWnohtyP
Python join all items of a list to convert into a single string
Python | Join all items of a list to convert into a single string
>>> x = ["Python", "Online", "Training"]
>>> print(" ".join(x))
Python Online Training
Python return multiple values from functions
Python | Return multiple values from functions
>>> def A():
...     return 2, 3, 4
...
>>> a, b, c = A()
>>> print(a, b, c)
2 3 4
Python Print String N times
Python | Print String N times
>>> s = 'Python'
>>> n = 5
>>> print(s * n)
PythonPythonPythonPythonPython
Python check the memory usage of an object
Python | Check the memory usage of an object
>>> import sys
>>> x = 100
>>> print(sys.getsizeof(x))
28