What method allows you to convert a DataFrame's column to a different data type, such as int or float?
The `astype()` method.
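A minimal sketch of `astype()` in use. The DataFrame and its 'Rookie Year' values are made up for illustration.

```python
import pandas as pd

# Hypothetical data: years stored as strings
df = pd.DataFrame({"Rookie Year": ["1947", "1952", "1960"]})

# Convert the column from strings to integers and assign it back
df["Rookie Year"] = df["Rookie Year"].astype(int)
print(df["Rookie Year"].dtype)
```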
What Python library is essential for data manipulation and analysis, particularly when cleaning data?
Pandas
What function is used to calculate the average value of a DataFrame column?
`df['ColumnName'].mean()`
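For example, with a hypothetical 'Points' column (the column name and values are assumptions):

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({"Points": [10, 20, 30]})

# Average of a single column
average = df["Points"].mean()
print(average)  # 20.0
```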
How can you split a 'Span' column into separate 'Rookie Year' and 'Final Year' columns in a DataFrame?
`df['Span'].str.split('-').str[0]` for 'Rookie Year', and `df['Span'].str.split('-').str[1]` for 'Final Year'.
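A short sketch of that split, assuming 'Span' holds values such as '1947-1956' (the sample rows are invented). Note that the resulting columns still contain strings.

```python
import pandas as pd

# Hypothetical data: career span as "start-end"
df = pd.DataFrame({"Span": ["1947-1956", "1952-1965"]})

# Split once, then take the first and second pieces
parts = df["Span"].str.split("-")
df["Rookie Year"] = parts.str[0]
df["Final Year"] = parts.str[1]
print(df)
```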
How do you drop duplicate rows from a DataFrame in Pandas?
By using `df.drop_duplicates()`.
Describe a method to extract the player names and countries by splitting a 'Player' column.
Call `str.split()` on the 'Player' column with `expand=True` so each piece becomes its own column, then assign the pieces to new 'Name' and 'Country' columns.
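One way this might look, assuming each 'Player' value has the form 'Name, Country'. The delimiter and the sample rows are assumptions; the actual separator depends on the dataset.

```python
import pandas as pd

# Hypothetical data: the ", " separator is an assumption
df = pd.DataFrame({"Player": ["Jane Doe, Canada", "John Roe, Australia"]})

# expand=True returns one column per piece; n=1 splits only at the first separator
df[["Name", "Country"]] = df["Player"].str.split(", ", n=1, expand=True)
print(df)
```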
How does one identify duplicate rows in a DataFrame?
By using the `df.duplicated()` method.
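Identifying and dropping duplicates can be sketched together. The sample rows are invented; by default both methods treat the first occurrence as the one to keep.

```python
import pandas as pd

# Hypothetical data: the third row repeats the first
df = pd.DataFrame({"Name": ["A", "B", "A"], "Country": ["X", "Y", "X"]})

# Boolean Series: True for rows that repeat an earlier row
dupes = df.duplicated()
print(dupes.tolist())  # [False, False, True]

# New DataFrame with the repeated rows removed
deduped = df.drop_duplicates()
print(len(deduped))  # 2
```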
What method in Pandas is used to rename DataFrame columns for clarity?
The `df.rename()` method.
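A minimal sketch, using made-up column names:

```python
import pandas as pd

# Hypothetical data with terse column names
df = pd.DataFrame({"Yrs": [10], "PTS": [2000]})

# Map old names to new ones; rename returns a new DataFrame
df = df.rename(columns={"Yrs": "Years Played", "PTS": "Points"})
print(list(df.columns))  # ['Years Played', 'Points']
```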
How can you get web data into a CSV file when a direct CSV download is not available?
Use Excel's 'From Web' feature to pull in the data and then save it as a CSV file.
What steps are involved in preparing a CSV file imported via Excel for analysis in Jupyter Notebook?
Import Pandas, upload the CSV to Jupyter Notebook, and use `pd.read_csv('file_path')` to read it into a DataFrame.
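A sketch of the reading step. To keep the example self-contained, an in-memory string stands in for the uploaded file; in a notebook you would pass the file path instead. The CSV contents are invented.

```python
import io
import pandas as pd

# Stand-in for a CSV file on disk (hypothetical contents)
csv_text = "Player,Span\nJane Doe,1947-1956\n"

# pd.read_csv accepts a path or a file-like object
df = pd.read_csv(io.StringIO(csv_text))
print(df.head())
```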
How can you fill missing values with zeros for specific columns in a Pandas DataFrame?
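Assign the filled column back: `df['ColumnName'] = df['ColumnName'].fillna(0)`. Calling `fillna(0, inplace=True)` on a selected column is unreliable in recent pandas versions because it may act on a copy.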
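A minimal sketch that fills several columns at once and assigns the result back to the DataFrame. The column names and values are assumptions.

```python
import pandas as pd

# Hypothetical data with missing values
df = pd.DataFrame({"Points": [10.0, None, 30.0], "Assists": [None, 5.0, 7.0]})

# Fill only the chosen columns and assign the result back
cols = ["Points", "Assists"]
df[cols] = df[cols].fillna(0)
print(df)
```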
Which method checks for any null values in a DataFrame?
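`df.isnull().any()` returns True or False for each column; `df.isnull().any().any()` gives a single True/False for the whole DataFrame.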
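A short sketch of the null checks on invented data, including a per-column count with `sum()`:

```python
import pandas as pd

# Hypothetical data: one missing value in "Points"
df = pd.DataFrame({"Points": [10.0, None], "Name": ["A", "B"]})

per_column = df.isnull().any()       # one True/False per column
any_null = df.isnull().any().any()   # single True/False for the whole frame
counts = df.isnull().sum()           # number of nulls per column
print(per_column)
print(any_null)
print(counts)
```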
How can the career length of a player be calculated using column values in Pandas?
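Subtract 'Rookie Year' from 'Final Year' and assign the result to a new column: `df['Career Length'] = df['Final Year'] - df['Rookie Year']`. Both columns must be numeric first; if they came from `str.split()`, convert them with `astype(int)`.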
What method would you use to find players who debuted before a certain year in your dataset?
Use a filter condition like `df[df['Rookie Year'] < 1960]`.
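The career-length calculation and the year filter can be sketched together. The rows are invented, and the year columns start as strings to mimic the output of `str.split()`.

```python
import pandas as pd

# Hypothetical data: years stored as strings
df = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Rookie Year": ["1947", "1958", "1962"],
    "Final Year": ["1956", "1970", "1975"],
})

# Convert to integers so subtraction and comparison work numerically
df["Rookie Year"] = df["Rookie Year"].astype(int)
df["Final Year"] = df["Final Year"].astype(int)

# New column from the difference of two columns
df["Career Length"] = df["Final Year"] - df["Rookie Year"]

# Boolean mask keeps only players who debuted before 1960
early = df[df["Rookie Year"] < 1960]
print(early)
```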
How can you group a DataFrame by a specific column and calculate the maximum for another column within those groups?
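Using `df.groupby('GroupColumn')['ValueColumn'].max()`. Calling `.max()` directly on the groupby returns the maximum of every other column instead of just one.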
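A minimal sketch, assuming hypothetical 'Country' and 'Career Length' columns:

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    "Country": ["X", "X", "Y"],
    "Career Length": [9, 12, 13],
})

# Select the value column after grouping, then take the maximum per group
longest = df.groupby("Country")["Career Length"].max()
print(longest)
```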