Regular Expressions in Python - Lecture Notes
Introduction
- Topic: Working with regular expressions (regex) using Python's re module
- Definition: A regex is a compact pattern language for describing the text you want to search for and match
- Usage: Can be used across various programming languages, text editors, etc.
- Importance: Useful for searching, modifying, and manipulating text based on patterns
Setting Up
- Importing the re module: import re
- Text to Search: Can be a multi-line string containing various text patterns (e.g., lower and upper case letters, digits, URLs, phone numbers, etc.)
- Raw Strings: Strings prefixed with r (e.g., r'string') keep backslashes as written, avoiding Python's special treatment of escape sequences
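A minimal sketch of why raw strings matter for regex. In an ordinary string literal, '\b' becomes a backspace character before re ever sees it, while r'\b' keeps the backslash so re reads it as a word boundary. The sample text is made up for illustration.

```python
import re

# In an ordinary literal, '\b' is the backspace character (one character).
normal = '\b'
# In a raw string, the backslash survives, so re receives '\b' (two characters).
raw = r'\b'

print(len(normal), len(raw))  # 1 2

# Raw string: \b reaches re as a word-boundary anchor.
print(re.findall(r'\bcat\b', 'cat concatenate cat'))  # ['cat', 'cat']
# Ordinary string: the pattern contains literal backspaces, so nothing matches.
print(re.findall('\bcat\b', 'cat concatenate cat'))   # []
```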
Basic Regex Functions
- Compiling Patterns: re.compile(pattern) returns a pattern object that can be reused
- Finding Matches: pattern.finditer(text) returns an iterator of match objects
- Match Objects: Each match object provides span() (start and end indices) and group() (the matched text)
- String Slicing: text[start:end], using the indices from span(), returns exactly the matched text
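The compile, finditer, and slicing steps might look like this in practice. The sample text and the literal pattern abc are placeholders chosen for this sketch.

```python
import re

text_to_search = "abc 123 abc"

# Compile once, then reuse the pattern object.
pattern = re.compile(r'abc')

for match in pattern.finditer(text_to_search):
    start, end = match.span()            # start and end indices of this match
    print(match.span(), match.group())   # (0, 3) abc, then (8, 11) abc
    # Slicing with the span gives the same text as match.group()
    print(text_to_search[start:end] == match.group())  # True
```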
Special Characters & Meta Characters
- Literal Matches: Ordinary characters match themselves (e.g., abc matches the text "abc")
- Escaping Characters: Use a backslash (\) to escape special characters so they match literally (e.g., . becomes \.)
- Dot (.): Matches any character except newline
- Digit (\d): Matches a digit (0-9)
- Non-digit (\D): Matches any character except a digit
- Word Character (\w): Matches alphanumeric characters and underscore
- Non-word Character (\W): Matches any character except alphanumeric and underscore
- Whitespace (\s): Matches any whitespace character (space, tab, newline)
- Non-whitespace (\S): Matches any non-whitespace character
- Anchors: ^ (start of string) and $ (end of string)
- Word Boundaries: \b (word boundary), \B (non-word boundary)
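A sketch exercising several of the meta characters above on a small made-up string; the results shown in the comments follow from that specific text.

```python
import re

# Made-up sample text for this sketch.
text = "Ha HaHa\nsite.com 555-1234"

# Escaping: \. matches only a literal period.
print(re.findall(r'\.', text))               # ['.']

# . matches any character EXCEPT newline, so this finds nothing.
print(re.findall(r'Ha.site', text))          # []

# \d matches one digit; chaining them matches a fixed-length number.
print(re.findall(r'\d\d\d-\d\d\d\d', text))  # ['555-1234']

# \s matches whitespace, including the newline.
print(re.findall(r'\s', text))               # [' ', '\n', ' ']

# \b: "Ha" at the start of a word (2 matches).
print(re.findall(r'\bHa', text))             # ['Ha', 'Ha']
# \B: "Ha" not at a word boundary (the second Ha in HaHa).
print(re.findall(r'\BHa', text))             # ['Ha']

# ^ anchors to the start of the string, $ to the end.
print(bool(re.search(r'^Ha', text)))         # True
print(bool(re.search(r'1234$', text)))       # True
print(bool(re.search(r'^site', text)))       # False: "site" starts line 2, not the string
```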
Character Sets
- Syntax: Square brackets [ ] match any one character listed inside
- Example: [a-z] matches any lowercase letter
- Negation: [^...] matches any character not listed inside the brackets
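Character sets in action, on made-up text. The last pattern also shows that inside brackets a period is literal and a leading hyphen is not treated as a range, which is handy for phone-number separators.

```python
import re

# Made-up sample text for this sketch.
text = "cat bat mat rat 8at"

# [cbm] matches exactly one of c, b, or m.
print(re.findall(r'[cbm]at', text))   # ['cat', 'bat', 'mat']

# [^b] matches any single character except b.
print(re.findall(r'[^b]at', text))    # ['cat', 'mat', 'rat', '8at']

# Ranges: [a-z] is one lowercase letter, [0-9] is one digit.
print(re.findall(r'[a-z]at', text))   # ['cat', 'bat', 'mat', 'rat']
print(re.findall(r'[0-9]at', text))   # ['8at']

# Inside a set, . is literal and a leading - is literal.
print(re.findall(r'\d\d\d[-.]\d\d\d\d', '555-1234 555.9876 555*0000'))
# ['555-1234', '555.9876']
```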
Quantifiers
- Asterisk (*): Zero or more
- Plus (+): One or more
- Question Mark (?): Zero or one
- Curly Braces {n}: Exactly n repetitions
- Curly Braces Range {min,max}: Between min and max repetitions (inclusive)
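A sketch combining the quantifiers with earlier meta characters. The names and numbers are invented examples; each quantifier applies to the single character, set, or escape immediately before it.

```python
import re

# Invented sample text for this sketch.
names = "Mr Smith, Mr. Lee, Mrs Chan, Mr. T"

# \.? optional period, \s one whitespace, [A-Z] capital, \w* zero or more word chars
print(re.findall(r'Mr\.?\s[A-Z]\w*', names))   # ['Mr Smith', 'Mr. Lee', 'Mr. T']

# + : one or more digits
print(re.findall(r'\d+', 'a1b22c333'))          # ['1', '22', '333']

# {n} : exactly n digits
print(re.findall(r'\d{3}-\d{4}', '555-1234 55-123'))      # ['555-1234']

# {min,max} : 2 to 4 digits, bounded by word boundaries
print(re.findall(r'\b\d{2,4}\b', '7 42 123 2024 12345'))  # ['42', '123', '2024']
```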
Practical Examples
- Matching phone numbers using meta characters, character sets, and quantifiers
- Parsing data from text files using regex patterns
- Validation and pattern matching (e.g., URLs, email addresses)
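A rough sketch of the practical uses listed above. The phone and email patterns are deliberately simplified assumptions, not complete validators (real email syntax is far more permissive), and the sample data is made up. Pattern.fullmatch, which these notes do not mention, is used for the validation case because it requires the whole string to match.

```python
import re

# Made-up sample data for this sketch.
text = """Call 321-555-4321 or 123.555.1234.
Contact: jane.doe@example.com, JOHN_99@mail.example.org
Bad: 12-34-5678, not@valid"""

# Simplified phone pattern: 3 digits, - or ., 3 digits, - or ., 4 digits.
phone = re.compile(r'\d{3}[-.]\d{3}[-.]\d{4}')

# Simplified email pattern (an assumption, not a standards-compliant validator).
email = re.compile(r'[a-zA-Z0-9._+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}')

print(phone.findall(text))  # ['321-555-4321', '123.555.1234']
print(email.findall(text))  # ['jane.doe@example.com', 'JOHN_99@mail.example.org']

# Validation: fullmatch succeeds only if the entire string fits the pattern.
for candidate in ["jane.doe@example.com", "not@valid", "x jane@example.com"]:
    print(candidate, bool(email.fullmatch(candidate)))  # True, False, False
```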
Groups
- Grouping Patterns: Use parentheses ( ) to create groups
- Capture and Reference: Capture parts of the pattern and reference them using backreferences
- Example: Capturing domain and top-level domain in URLs
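A sketch of capture groups on made-up URLs: group 1 is an optional www., group 2 the domain name, and group 3 the top-level domain, while group(0) is always the entire match. The second part shows a backreference inside a pattern, where \1 must repeat whatever group 1 captured.

```python
import re

# Made-up URLs for this sketch.
urls = """https://www.google.com
http://example.com
https://www.nasa.gov"""

# (www\.)? optional group 1, (\w+) group 2 = domain, (\.\w+) group 3 = top-level domain
pattern = re.compile(r'https?://(www\.)?(\w+)(\.\w+)')

for match in pattern.finditer(urls):
    # group(0) is the whole match; numbered groups are the parenthesized parts
    print(match.group(0), '->', match.group(2), match.group(3))
# https://www.google.com -> google .com
# http://example.com -> example .com
# https://www.nasa.gov -> nasa .gov

# Backreference inside the pattern: \1 must repeat what group 1 captured.
print(re.findall(r'\b(\w+) \1\b', 'this is is a test test'))  # ['is', 'test']
```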
Substitution
- Substitution Method: pattern.sub(replacement, text) returns a new string with every match replaced
- Backreferences in Substitution: Use \num in the replacement string (e.g., \2), where num is the group number
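Using a hypothetical pair of URLs, sub can rebuild each match from its groups. Here the replacement \2\3 keeps only the domain and top-level domain, dropping the scheme and any www.

```python
import re

# Hypothetical URLs for this sketch.
urls = "https://www.google.com\nhttp://example.com"

pattern = re.compile(r'https?://(www\.)?(\w+)(\.\w+)')

# \2 and \3 in the replacement refer to groups 2 and 3 of each match.
subbed_urls = pattern.sub(r'\2\3', urls)
print(subbed_urls)
# google.com
# example.com
```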
Additional Methods
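- findall: Returns a list of all matches as strings; if the pattern contains groups, it returns only the group contents (a tuple per match when there are several groups)
- match: Checks for a match only at the beginning of the string; returns a match object or None
- search: Scans the entire string and returns the first match, or None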
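A small comparison of match, search, and findall on made-up strings, including how groups change what findall returns.

```python
import re

sentence = "Start a sentence and then bring it to an end"
pattern = re.compile(r'sentence')

# match: only looks at the very beginning of the string
print(pattern.match(sentence))                    # None
print(re.match(r'Start', sentence) is not None)   # True

# search: scans the whole string and returns the first match
print(pattern.search(sentence).span())            # (8, 16)

# findall: whole matches without groups, group contents with groups
numbers = "555-1234 555-9876"
print(re.findall(r'\d{3}-\d{4}', numbers))        # ['555-1234', '555-9876']
print(re.findall(r'(\d{3})-\d{4}', numbers))      # ['555', '555']
print(re.findall(r'(\d{3})-(\d{4})', numbers))    # [('555', '1234'), ('555', '9876')]
```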
Flags
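- Ignore Case: re.IGNORECASE or re.I
- Multi-line Mode: re.MULTILINE or re.M, which makes ^ and $ match at the start and end of each line
- Verbose Mode: re.VERBOSE or re.X, which ignores whitespace in the pattern and allows # comments, for readable regex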
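A sketch of the three flags on a made-up three-line string. Flags can be combined with the | operator; the verbose pattern is a simple phone pattern written across several lines with comments.

```python
import re

text = "First line\nsecond line\nTHIRD LINE"

# IGNORECASE: letters match regardless of case
print(re.findall(r'line', text, re.IGNORECASE))   # ['line', 'line', 'LINE']

# Without MULTILINE, ^ only matches at the start of the whole string
print(re.findall(r'^\w+', text))                  # ['First']
# With MULTILINE, ^ matches at the start of every line
print(re.findall(r'^\w+', text, re.M))            # ['First', 'second', 'THIRD']

# Flags combine with |
print(re.findall(r'^third', text, re.I | re.M))   # ['THIRD']

# VERBOSE: whitespace in the pattern is ignored and # starts a comment
phone = re.compile(r"""
    \d{3}   # first three digits
    [-.]    # separator
    \d{3}   # next three digits
    [-.]    # separator
    \d{4}   # last four digits
""", re.VERBOSE)
print(phone.findall("Call 321-555-4321"))         # ['321-555-4321']
```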
Conclusion
- Regular expressions are powerful tools for text pattern matching and manipulation
- Practice and familiarity are key to mastering regex
- Future videos will cover advanced topics in regex for deeper understanding
Questions: Feel free to ask questions or request further explanations in the comments or discussion forums.