Pandas Library in Python

Jul 12, 2024

Introduction

  • Pandas Library: Essential for data science in Python
  • Video Scope: From beginner to comfortable with Pandas
  • Two Audiences: Newcomers and those looking for specific tips
  • Flexibility: More flexible than Excel and better suited to large datasets
  • Install Pandas: pip install pandas

Loading Data

Steps

  1. CSV Data: Download from GitHub (Example: Pokemon data)
    • Save as .csv
  2. Editor: Any Python editor works; Jupyter Notebook is recommended
  3. Import Pandas: import pandas as pd
  4. Load Data: df = pd.read_csv('pokemon_data.csv')

Data Inspection

  • Print Head/Tail: df.head(3), df.tail(3)
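The loading and inspection steps above can be sketched as follows. To keep the example runnable without downloading anything, it reads from an in-memory string instead of pokemon_data.csv; the four rows are illustrative sample values, not the real dataset.

```python
import io

import pandas as pd

# Illustrative stand-in for pokemon_data.csv (sample values only).
csv_text = """Name,Type 1,HP
Bulbasaur,Grass,45
Charmander,Fire,39
Squirtle,Water,44
Pikachu,Electric,35
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.head(3))  # first three rows
print(df.tail(3))  # last three rows
```

With a real file on disk, the io.StringIO(...) argument would simply be replaced by the file path.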

Loading Different File Types

  • Excel: df = pd.read_excel('pokemon_data.xlsx') (needs an Excel engine such as openpyxl installed)
  • Tab-Separated Values (TSV): df = pd.read_csv('pokemon_data.txt', delimiter='\t')
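A minimal sketch of the tab-separated case, again using an in-memory string with illustrative values rather than a real pokemon_data.txt. The sep argument is the same option as delimiter.

```python
import io

import pandas as pd

# Illustrative tab-separated text; "\t" is the tab character.
tsv_text = "Name\tType 1\tHP\nBulbasaur\tGrass\t45\nCharmander\tFire\t39\n"

# Tell read_csv that fields are separated by tabs instead of commas.
df_tsv = pd.read_csv(io.StringIO(tsv_text), sep="\t")

print(df_tsv)
```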

Data Manipulation Basics

Columns and Rows

  • View Columns: df.columns
  • Access Column: df['Name'] or df.Name (dot access only works for names that are valid Python identifiers, so 'Type 1' needs brackets)
  • Multiple Columns: df[['Name', 'Type 1', 'HP']]
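The column operations above, shown on a small hand-built DataFrame (the rows are illustrative sample values):

```python
import pandas as pd

# Small illustrative frame with the same column names as the notes.
df = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander", "Squirtle"],
    "Type 1": ["Grass", "Fire", "Water"],
    "HP": [45, 39, 44],
})

print(list(df.columns))        # all column labels

names = df["Name"]             # one column -> Series
same_names = df.Name           # attribute access, identifier-like names only
types = df["Type 1"]           # a space in the name requires brackets
subset = df[["Name", "HP"]]    # list of labels -> DataFrame

print(subset)
```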

Specific Rows and Values

  • Single Row: df.iloc[1]
  • Multiple Rows: df.iloc[1:4]
  • Specific Value: df.iloc[2, 1]
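A short sketch of position-based access with iloc. Positions are zero-based, and slices exclude the end position, so 1:4 returns rows 1, 2 and 3. The data is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander", "Squirtle", "Pikachu", "Charizard"],
    "Type 1": ["Grass", "Fire", "Water", "Electric", "Fire"],
    "HP": [45, 39, 44, 35, 78],
})

row = df.iloc[1]        # second row, returned as a Series
rows = df.iloc[1:4]     # rows at positions 1, 2, 3
value = df.iloc[2, 1]   # row position 2, column position 1

print(row["Name"], len(rows), value)
```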

Iterating Through Rows

  • Iterate: for index, row in df.iterrows():
  • Access Data: row['Name']
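Row iteration as described above, on illustrative data. iterrows() yields an (index, row) pair per row, where row is a Series. It is convenient for small frames, though usually much slower than column-wise operations on large ones.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander"],
    "HP": [45, 39],
})

pairs = []
for index, row in df.iterrows():
    # row behaves like a dict keyed by column label
    pairs.append((index, row["Name"]))
    print(index, row["Name"])
```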

Conditional Selection

  • Single Condition: df.loc[df['Type 1'] == 'Fire']
  • Multiple Conditions: df.loc[(df['Type 1'] == 'Fire') & (df['HP'] > 70)]
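A sketch of both filters on illustrative data. Each condition needs its own parentheses, and conditions are combined with & (and), | (or) and ~ (not) rather than the Python keywords.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Charmander", "Charizard", "Squirtle", "Arcanine"],
    "Type 1": ["Fire", "Fire", "Water", "Fire"],
    "HP": [39, 78, 44, 90],
})

# Single condition: boolean mask selects matching rows.
fire = df.loc[df["Type 1"] == "Fire"]

# Multiple conditions: wrap each one in parentheses.
strong_fire = df.loc[(df["Type 1"] == "Fire") & (df["HP"] > 70)]

print(strong_fire)
```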

Descriptive Statistics

  • Describe Method: df.describe()
  • Sort Values: df.sort_values('Name', ascending=False)
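The two calls above on a small illustrative frame. describe() summarizes numeric columns by default, and sort_values() returns a new sorted DataFrame rather than changing df.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander", "Squirtle", "Pikachu"],
    "HP": [45, 39, 44, 35],
})

stats = df.describe()                               # count, mean, std, min, quartiles, max
by_name = df.sort_values("Name", ascending=False)   # Z to A

print(stats)
print(by_name)
```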

Advanced Data Manipulation

Adding/Dropping Columns

  • Add Column: df['Total'] = df.iloc[:, 4:10].sum(axis=1)
  • Drop Column: df.drop(columns=['Total'], inplace=True)
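A sketch of adding and dropping a column. Note one deliberate difference from the notes: it selects the stat columns by name instead of by position (iloc[:, 4:10]), which keeps working if columns are later reordered. The stat_cols list and the data are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Charmander", "Squirtle"],
    "HP": [39, 44],
    "Attack": [52, 48],
    "Defense": [43, 65],
})

# Sum across columns (axis=1) to get one total per row.
stat_cols = ["HP", "Attack", "Defense"]
df["Total"] = df[stat_cols].sum(axis=1)
totals = df["Total"].tolist()
print(df)

# drop() returns a new frame; reassigning is an alternative to inplace=True.
df = df.drop(columns=["Total"])
```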

Saving Data

  • To CSV: df.to_csv('modified.csv', index=False)
  • To Excel: df.to_excel('modified.xlsx', index=False)
  • To TSV: df.to_csv('modified.txt', sep='\t', index=False)
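A sketch of the CSV and TSV save calls, writing into a temporary directory so nothing is left behind. The Excel call is left out here only because it needs an extra engine package installed. The data is illustrative.

```python
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame({"Name": ["Charmander", "Squirtle"], "HP": [39, 44]})

with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "modified.csv"
    tsv_path = Path(tmp) / "modified.txt"

    # index=False leaves the row index out of the file.
    df.to_csv(csv_path, index=False)
    df.to_csv(tsv_path, sep="\t", index=False)

    # Read back to confirm what was written.
    round_trip = pd.read_csv(csv_path)
    tsv_header = tsv_path.read_text().splitlines()[0]

print(round_trip)
```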

Advanced Filtering

  • Regular Expressions: df.loc[df['Name'].str.contains('^Pi[a-z]*', flags=re.I, regex=True)] (requires import re)
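The regular-expression filter above, run on a few illustrative names. The ^ anchors the match to the start of the string, and re.I makes it case-insensitive, so 'Caterpie' is excluded even though it contains 'pi'.

```python
import re

import pandas as pd

df = pd.DataFrame({"Name": ["Pikachu", "pidgey", "Charizard", "Caterpie"]})

# Boolean mask: True where the name starts with "pi" in any letter case.
mask = df["Name"].str.contains("^pi[a-z]*", flags=re.I, regex=True)
pi_names = df.loc[mask, "Name"].tolist()

print(pi_names)
```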

Modifying Data Based on Conditions

  • Change Values: df.loc[df['Type 1'] == 'Fire', 'Type 1'] = 'Flamer'
  • Set Multiple Columns: df.loc[df['Total'] > 500, ['Generation', 'Legendary']] = ['Test 1', 'Test 2']
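A sketch of both conditional updates on illustrative data. One assumption is added here: the 'Generation' and 'Legendary' columns are first converted to object dtype, because writing strings into integer or boolean columns may trigger dtype warnings or errors depending on the pandas version.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Charmander", "Squirtle", "Charizard"],
    "Type 1": ["Fire", "Water", "Fire"],
    "Total": [309, 314, 534],
    "Generation": [1, 1, 1],
    "Legendary": [False, False, False],
})

# loc[row_mask, column] = value updates only the matching rows.
df.loc[df["Type 1"] == "Fire", "Type 1"] = "Flamer"

# Assumption: allow mixed values in these two columns before writing strings.
df = df.astype({"Generation": object, "Legendary": object})

# One value per listed column, applied to every matching row.
df.loc[df["Total"] > 500, ["Generation", "Legendary"]] = ["Test 1", "Test 2"]

print(df)
```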

Group By and Aggregate Functions

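  • Group By: df.groupby(['Type 1']).mean(numeric_only=True).sort_values('Defense', ascending=False) (numeric_only=True skips text columns such as Name, which pandas 2.x no longer drops automatically)
  • Sum and Count: df.groupby(['Type 1']).sum(numeric_only=True), df.groupby(['Type 1']).count()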
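A sketch of the group-by aggregations on illustrative data. numeric_only=True is passed to mean() so the text column 'Name' is skipped instead of causing an error on recent pandas versions.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Charmander", "Charizard", "Squirtle", "Blastoise"],
    "Type 1": ["Fire", "Fire", "Water", "Water"],
    "HP": [39, 78, 44, 79],
    "Defense": [43, 78, 65, 100],
})

# Average of each numeric column per type, highest Defense first.
means = (
    df.groupby("Type 1")
    .mean(numeric_only=True)
    .sort_values("Defense", ascending=False)
)

# Number of non-null entries per column within each type.
counts = df.groupby("Type 1").count()

print(means)
print(counts)
```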

Handling Large Datasets

  • Read in Chunks: for chunk in pd.read_csv('modified.csv', chunksize=10000): (each chunk is a DataFrame of up to 10000 rows)
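A sketch of one common way to use chunked reading: aggregate each chunk, then combine the partial results. The data is synthetic (ten generated rows) and the chunk size is set to 4 only so that several chunks are produced; a real file would use a much larger value.

```python
import io

import pandas as pd

# Synthetic CSV: ten rows, odd HP values are Fire, even are Water.
rows = [f"Mon{i},{'Fire' if i % 2 else 'Water'},{i}" for i in range(10)]
csv_text = "Name,Type 1,HP\n" + "\n".join(rows)

partials = []
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    # Each chunk is an ordinary DataFrame holding at most 4 rows.
    partials.append(chunk.groupby("Type 1")["HP"].sum())

# Combine per-chunk sums into one total per type.
totals = pd.concat(partials).groupby(level=0).sum()

print(totals)
```

Only one chunk is held in memory at a time, which is the point of the technique for files too large to load at once.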

Conclusion

  • Subscribe: For more tutorials and advanced topics
  • Feedback: Leave comments for clarity and additional features