Basics of Filtering Data in Pandas

Jul 10, 2024

Introduction

  • Focus on filtering data from DataFrames and Series in Pandas.
  • Examples include filtering by knowledge of Python, country, or salary range.
  • Filtering is highlighted as an important first step in most data projects.

Getting Started

  • Using basic comparisons to create filters.
  • Filters return Series objects of True/False values corresponding to rows.
    • df['last_name'] == 'Doe' returns a boolean Series.
  • Applying filters to DataFrames helps extract rows meeting specific criteria.
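
A minimal sketch of the idea above. The sample DataFrame and its values are made up for illustration; only the last_name comparison comes from the notes.

```python
import pandas as pd

# Hypothetical sample data, invented for this example.
df = pd.DataFrame({
    "first_name": ["John", "Jane", "Corey"],
    "last_name": ["Doe", "Doe", "Schaefer"],
    "email": ["john@example.com", "jane@example.com", "corey@example.com"],
})

# Comparing a column to a value gives a boolean Series, one entry per row.
mask = df["last_name"] == "Doe"
print(mask.tolist())  # [True, True, False]

# Indexing the DataFrame with that Series keeps only the rows marked True.
print(df[mask])
```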

Assignment and Usage

  • Assigning filters to variables (avoid naming the variable filter, which is a Python built-in function rather than a keyword, so reusing the name shadows it).
  • Best practice: wrap filter conditions in parentheses for readability.
  • Applying the filter:
    • Direct indexing with brackets (df[filt]) vs. df.loc[filt].
    • df.loc[filt] allows specifying columns.
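
The two ways of applying a filter might look like the sketch below. The variable name filt follows the notes; the sample data and the email column are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample data, invented for this example.
df = pd.DataFrame({
    "first_name": ["John", "Jane", "Corey"],
    "last_name": ["Doe", "Doe", "Schaefer"],
    "email": ["john@example.com", "jane@example.com", "corey@example.com"],
})

# Store the condition in a variable; "filt" avoids shadowing the built-in filter().
filt = (df["last_name"] == "Doe")

# Bracket indexing returns every column for the matching rows.
all_columns = df[filt]

# .loc with only a row filter returns the same rows.
same_rows = df.loc[filt]

# .loc also accepts a column label (or list of labels) as a second argument.
emails = df.loc[filt, "email"]
print(emails.tolist())  # ['john@example.com', 'jane@example.com']
```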

Logical Operators

  • AND Operator: &
    • Example: (df['last_name'] == 'Doe') & (df['first_name'] == 'John')
  • OR Operator: |
    • Example: (df['last_name'] == 'Schaefer') | (df['first_name'] == 'John')
  • NOT Operator: ~
    • Example: ~((df['last_name'] == 'Schaefer') | (df['first_name'] == 'John'))
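
The three operators can be tried side by side, as in the sketch below. The sample rows are made up for illustration. Each comparison is wrapped in parentheses because & and | bind more tightly than ==, and Python's own and/or/not do not work element-wise on a Series.

```python
import pandas as pd

# Hypothetical sample data, invented for this example.
df = pd.DataFrame({
    "first_name": ["John", "Jane", "Corey", "John"],
    "last_name": ["Doe", "Doe", "Schaefer", "Smith"],
})

# AND: both conditions must hold for a row.
both = (df["last_name"] == "Doe") & (df["first_name"] == "John")

# OR: at least one condition must hold.
either = (df["last_name"] == "Schaefer") | (df["first_name"] == "John")

# NOT: invert a filter to get every row it would have excluded.
neither = ~either

print(both.tolist())     # [True, False, False, False]
print(either.tolist())   # [True, False, True, True]
print(neither.tolist())  # [False, True, False, False]
```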

Real-World Applications

High Salary Filter

  • Filtering DataFrame for salaries over a certain amount.
    • high_salary = df['ConvertedComp'] > 70000
    • Applying df.loc[high_salary, ['Country', 'LanguageWorkedWith', 'ConvertedComp']]
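
A small self-contained version of the salary filter, for experimenting without the full dataset. The column names come from the notes; the rows and the semicolon-separated language strings are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical rows standing in for the real survey data.
df = pd.DataFrame({
    "Country": ["United States", "India", "Germany", "Canada"],
    "LanguageWorkedWith": ["Python;SQL", "Java", "Python;JavaScript", None],
    "ConvertedComp": [120000.0, 15000.0, 70000.0, 85000.0],
})

# True for rows whose salary is strictly greater than 70000.
high_salary = df["ConvertedComp"] > 70000

# Keep the matching rows and only the three columns of interest.
result = df.loc[high_salary, ["Country", "LanguageWorkedWith", "ConvertedComp"]]
print(result)
```

Note that the row with exactly 70000 is excluded, since the comparison uses > rather than >=.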

Filter by Multiple Values

  • Filtering based on multiple countries.
    • Create a list of countries.
    • Use df['Country'].isin(countries) to filter.
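
A sketch of the isin approach, assuming a made-up country list and sample rows. It replaces what would otherwise be a long chain of == comparisons joined with |.

```python
import pandas as pd

# Hypothetical rows standing in for the real survey data.
df = pd.DataFrame({
    "Country": ["United States", "India", "Germany", "Canada"],
    "ConvertedComp": [120000.0, 15000.0, 70000.0, 85000.0],
})

# The countries of interest (an arbitrary choice for this example).
countries = ["United States", "India", "Canada"]

# True for rows whose Country appears anywhere in the list.
filt = df["Country"].isin(countries)

selected = df.loc[filt, "Country"]
print(selected.tolist())  # ['United States', 'India', 'Canada']
```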

String Methods in Filters

  • Using string methods to filter data.
  • Example: Check if languages contain 'Python'.
    • df['LanguageWorkedWith'].str.contains('Python', na=False)
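
The string filter can be sketched as follows. The sample values, including the missing entry and the semicolon-separated format, are assumptions made for illustration. Passing na=False makes rows with a missing value count as non-matches, so the result is a clean True/False Series that is safe to use as a filter.

```python
import pandas as pd

# Hypothetical rows; the last respondent left the language question blank.
df = pd.DataFrame({
    "Country": ["United States", "India", "Germany", "Canada"],
    "LanguageWorkedWith": ["Python;SQL", "Java", "Python;JavaScript", None],
})

# True where the text contains "Python"; missing values are treated as False.
filt = df["LanguageWorkedWith"].str.contains("Python", na=False)
print(filt.tolist())  # [True, False, True, False]

# Apply the filter to see which rows mention Python.
python_users = df.loc[filt, "Country"]
print(python_users.tolist())  # ['United States', 'Germany']
```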

Summary

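  • A filter is a Series of True/False values; applying it to a DataFrame keeps the rows marked True.
  • Combine conditions with the logical operators &, |, and ~ to build more complex filters.
  • Apply a filter with df[filt] or df.loc[filt], passing either a saved variable or the condition written inline.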
  • Importance of filters in preprocessing data before further analysis.

Additional Comments

  • Mention of the sponsor, Brilliant.org.
  • Example of use cases for Brilliant in learning data science basics and Python.

Upcoming Content

  • Next video will cover altering data in DataFrames.
  • Examples include making email addresses lowercase, removing spaces in column names, etc.