Regular Expressions in Python - Lecture Notes

Jul 12, 2024

Introduction

  • Topic: Working with regular expressions (regex) using Python's re module
  • Definition: Regex provides a method to search and match text patterns
  • Usage: Can be used across various programming languages, text editors, etc.
  • Importance: Useful for searching, modifying, and manipulating text based on patterns

Setting Up

  • Importing re module: import re
  • Text to Search: Can be a multi-line string containing various text patterns (e.g., lower and upper case letters, digits, URLs, phone numbers, etc.)
  • Raw Strings: Strings prefixed with r (e.g., r'string') so that backslashes reach the regex engine unchanged instead of being interpreted as Python escape sequences such as \t or \n
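
The setup above can be sketched as follows. The sample text and variable names are made up for illustration and are not taken from the lecture.

```python
import re

# Hypothetical multi-line text to search; any string would do.
text_to_search = """abcdefghijklmnopqrstuvwxyz
ABCDEFGHIJKLMNOPQRSTUVWXYZ
1234567890
321-555-4321
https://www.example.com
"""

# In a normal string, "\t" is a single tab character.
normal = "\tTab"
# In a raw string, the backslash and the "t" stay as two separate characters.
raw = r"\tTab"

print(len(normal))  # 4
print(len(raw))     # 5
```

Writing patterns as raw strings means a sequence like \d or \b is passed to re exactly as typed.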

Basic Regex Functions

  • Compiling Patterns: re.compile(pattern)
  • Finding Matches: pattern.finditer(text) returns an iterator of match objects
  • Match Objects: Contain information such as the span (start and end indices, via match.span()) and the matched text (via match.group())
  • String Slicing: text[span[0]:span[1]] returns exactly the matched text, the same string as match.group()
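
A minimal sketch of the compile / finditer workflow described above. The sample text is an assumption chosen to keep the output short.

```python
import re

text = "abc123abc"  # hypothetical sample text
pattern = re.compile(r"abc")

# finditer yields one match object per non-overlapping match.
matches = list(pattern.finditer(text))

for m in matches:
    start, end = m.span()
    # Slicing the text with the span gives the same string as m.group().
    print(m.span(), m.group(), text[start:end])

spans = [m.span() for m in matches]
print(spans)  # [(0, 3), (6, 9)]
```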

Special Characters & Meta Characters

  • Literal Matches: Direct text matches
  • Escaping Characters: Use a backslash (\) to escape special characters (e.g., . becomes \.)
  • Dot (.): Matches any character except newline
  • Digit (\d): Matches a digit (0-9; for str patterns this also includes other Unicode decimal digits unless re.ASCII is set)
  • Non-digit (\D): Matches any character except a digit
  • Word Character (\w): Matches alphanumeric characters and underscore
  • Non-word Character (\W): Matches any character except alphanumeric and underscore
  • Whitespace (\s): Matches any whitespace character (space, tab, newline)
  • Non-whitespace (\S): Matches any non-whitespace character
  • Anchors: ^ (start of string) and $ (end of string); with re.MULTILINE they also match at the start and end of each line
  • Word Boundaries: \b (word boundary), \B (non-word boundary)
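
The meta characters above can be tried out one at a time with re.findall. The short sample strings below are illustrative assumptions, not the lecture's own text.

```python
import re

# An unescaped dot matches any character except newline.
any_char = re.findall(r".", "a\nb")          # ['a', 'b']
# An escaped dot matches only a literal period.
literal_dot = re.findall(r"\.", "cat.com")   # ['.']

digits = re.findall(r"\d", "a1b2")           # ['1', '2']
non_digits = re.findall(r"\D", "a1b2")       # ['a', 'b']
non_word = re.findall(r"\W", "a_b c!")       # [' ', '!']
whitespace = re.findall(r"\s", "a b\tc\n")   # [' ', '\t', '\n']

sample = "Ha HaHa"
# \b requires a word boundary before "Ha": the first two occurrences qualify.
at_boundary = re.findall(r"\bHa", sample)    # ['Ha', 'Ha']
# \B requires that there is no boundary: only the last "Ha" qualifies.
not_boundary = re.findall(r"\BHa", sample)   # ['Ha']

# ^ and $ anchor the pattern to the start and end of the string.
at_start = re.findall(r"^Ha", sample)        # ['Ha']
at_end = re.findall(r"Ha$", sample)          # ['Ha']

print(any_char, literal_dot, digits, non_digits, non_word, whitespace)
print(at_boundary, not_boundary, at_start, at_end)
```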

Character Sets

  • Syntax: Square brackets [ ] to match any one character inside
  • Example: [a-z] matches any lowercase letter
  • Negation: [^...] matches any character not inside the brackets
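
A short sketch of character sets, using a made-up sample string:

```python
import re

text = "cat mat pat bat"  # hypothetical sample text

# [cm] matches exactly one character that is either "c" or "m".
c_or_m = re.findall(r"[cm]at", text)      # ['cat', 'mat']

# [^b] matches exactly one character that is anything except "b".
not_b = re.findall(r"[^b]at", text)       # ['cat', 'mat', 'pat']

# A dash between two characters defines a range.
lowercase = re.findall(r"[a-z]", "aB3c")  # ['a', 'c']

print(c_or_m, not_b, lowercase)
```

Note that a character set always consumes exactly one character, no matter how many characters are listed inside the brackets.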

Quantifiers

  • Asterisk (*): Zero or more
  • Plus (+): One or more
  • Question Mark (?): Zero or one
  • Curly Braces {n}: Exactly n repetitions
  • Curly Braces Range {min,max}: Between min and max repetitions (inclusive)
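
Each quantifier applies to the item immediately before it. A small sketch with assumed sample strings:

```python
import re

text = "a ab abbb"  # hypothetical sample text

zero_or_more = re.findall(r"ab*", text)   # ['a', 'ab', 'abbb']
one_or_more = re.findall(r"ab+", text)    # ['ab', 'abbb']

# The period after "Mr" is optional.
optional = re.findall(r"Mr\.?", "Mr Smith, Mr. Jones")  # ['Mr', 'Mr.']

# Exactly three digits; "6789" yields "678" and the leftover "9" is too short.
exactly_three = re.findall(r"\d{3}", "12 345 6789")     # ['345', '678']

# Two to three digits, taking as many as possible (greedy).
two_to_three = re.findall(r"\d{2,3}", "1 22 333 4444")  # ['22', '333', '444']

print(zero_or_more, one_or_more, optional, exactly_three, two_to_three)
```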

Practical Examples

  • Matching phone numbers using meta characters, character sets, and quantifiers
  • Parsing data from text files using regex patterns
  • Validation and pattern matching (e.g., URLs, email addresses)
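
As one possible illustration of the phone-number example, the sketch below combines \d, a character set, and {n} quantifiers. The sample numbers and the two patterns are assumptions; real-world phone formats vary far more than this.

```python
import re

# Hypothetical sample data.
text = """
321-555-4321
123.555.1234
800-555-1234
900-555-1234
not a number: 12-34-56
"""

# Three digits, a dash or dot, three digits, a dash or dot, four digits.
phone = re.compile(r"\d{3}[-.]\d{3}[-.]\d{4}")
numbers = phone.findall(text)
print(numbers)

# Narrower pattern: only numbers starting with 800 or 900.
toll = re.compile(r"[89]00[-.]\d{3}[-.]\d{4}")
toll_numbers = toll.findall(text)
print(toll_numbers)  # ['800-555-1234', '900-555-1234']
```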

Groups

  • Grouping Patterns: Use parentheses ( ) to create groups
  • Capture and Reference: Capture parts of the pattern and reference them using backreferences
  • Example: Capturing domain and top-level domain in URLs
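
A sketch of the URL example, assuming a simple pattern and made-up URLs. The pattern only handles plain http/https URLs with a single domain label and is not a general URL parser.

```python
import re

# Hypothetical sample URLs.
urls = """
https://www.google.com
http://example.org
https://youtube.com
https://www.nasa.gov
"""

# Group 1: optional "www.", group 2: domain name, group 3: top-level domain.
pattern = re.compile(r"https?://(www\.)?(\w+)(\.\w+)")

domains = []
for m in pattern.finditer(urls):
    # group(0) is the whole match; group(2) and group(3) are captured parts.
    print(m.group(0), m.group(2), m.group(3))
    domains.append((m.group(2), m.group(3)))
```

Group numbers follow the order of the opening parentheses, so the optional "www." is group 1 even when it does not participate in the match (in which case m.group(1) is None).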

Substitution

  • Substitution Method: pattern.sub(replacement, text)
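  • Backreferences in Substitution: Use \num in the replacement string, where num is the group number (e.g., \1); write the replacement as a raw string so the backslash is preserved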
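
A minimal sketch of substitution with backreferences. The URL pattern and sample text are assumptions carried over for illustration.

```python
import re

urls = "https://www.google.com http://example.org"  # hypothetical sample text

# Group 2 captures the domain name, group 3 the top-level domain.
pattern = re.compile(r"https?://(www\.)?(\w+)(\.\w+)")

# Replace each full URL with just "domain.tld".
# The raw string keeps \2 and \3 as backreferences.
subbed = pattern.sub(r"\2\3", urls)
print(subbed)  # google.com example.org
```

sub returns a new string and leaves the original text unchanged.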

Additional Methods

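  • findall: Returns a list of all matches as strings; if the pattern contains capture groups, it returns the groups instead (tuples when there is more than one group)
  • match: Checks for a match only at the beginning of the string; returns a match object or None
  • search: Scans the entire string and returns the first match found, or None if there is none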
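
The difference between these three methods can be seen in a short sketch. The sentence used here is an assumed example.

```python
import re

sentence = "Start a sentence and then bring it to an end"  # hypothetical text

pattern = re.compile(r"sentence")

# match only looks at the beginning of the string, so this is None.
at_beginning = pattern.match(sentence)
# search scans forward and returns the first match object.
anywhere = pattern.search(sentence)

print(at_beginning)     # None
print(anywhere.span())  # (8, 16)

# match succeeds when the pattern is found at position 0.
start = re.compile(r"Start").match(sentence)
print(start.group())    # Start

# findall without groups returns the matched strings.
plain = re.findall(r"\d+", "a1 b22")  # ['1', '22']
# findall with several groups returns tuples of the groups.
grouped = re.findall(r"(Mr|Ms)\. (\w+)", "Mr. Smith and Ms. Davis")
print(plain, grouped)
```

Because match and search can return None, it is worth checking the result before calling group() or span() on it.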

Flags

  • Ignore Case: re.IGNORECASE or re.I
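  • Multi-line Mode: re.MULTILINE or re.M makes ^ and $ match at the start and end of every line, not only the whole string
  • Verbose Mode: re.VERBOSE or re.X allows whitespace and # comments inside the pattern for more readable regex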
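
A small sketch of the three flags, using assumed sample strings and a phone-number pattern chosen for illustration:

```python
import re

# IGNORECASE: letters match regardless of case.
any_case = re.findall(r"start", "Start START start", re.IGNORECASE)
print(any_case)  # ['Start', 'START', 'start']

lines = "one two\nthree four"
# Without MULTILINE, ^ matches only at the very start of the string.
first_only = re.findall(r"^\w+", lines)                # ['one']
# With MULTILINE, ^ also matches right after each newline.
each_line = re.findall(r"^\w+", lines, re.MULTILINE)   # ['one', 'three']
print(first_only, each_line)

# VERBOSE: whitespace in the pattern is ignored and # starts a comment.
phone = re.compile(r"""
    \d{3}   # first three digits
    [-.]    # separator: dash or dot
    \d{3}   # next three digits
    [-.]    # separator: dash or dot
    \d{4}   # last four digits
""", re.VERBOSE)
verbose_result = phone.findall("call 321-555-4321 today")
print(verbose_result)  # ['321-555-4321']
```

Flags can be combined with the | operator, for example re.IGNORECASE | re.MULTILINE.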

Conclusion

  • Regular expressions are powerful tools for text pattern matching and manipulation
  • Practice and familiarity are key to mastering regex
  • Future videos will cover advanced topics in regex for deeper understanding

Questions: Feel free to ask questions or request further explanations in the comments or discussion forums.