Back to notes
What methods can be used to find and handle duplicate data in a DataFrame?
Press to flip
The methods `duplicated()` to identify duplicates and `drop_duplicates()` to remove them can be used.
List the steps to install Pandas and verify the installation.
1. Install Python and Pip. 2. Install PyCharm Community Edition. 3. Connect Python with PyCharm. 4. Install Numpy and Pandas using PyCharm. 5. Verify installation using command prompt, e.g., `python --version`.
What types of file formats does Pandas support for data import/export?
Pandas supports various file formats like Excel, CSV, JSON, and XML for data import and export.
List and explain two types of plots that can be created using Pandas plotting capabilities.
1. Histogram: Represents frequency distributions. 2. Scatter Plot: Shows relationships between two continuous variables.
What is the origin of the Pandas library and who developed it?
Pandas was developed by Wes McKinney in 2008.
What are some common attributes of a DataFrame?
Common attributes of a DataFrame include `.dtype`, `.ndim`, `.size`, `.shape`, `.index`, and `.T`.
What method would you use to remove duplicate entries in a DataFrame?
Use the `drop_duplicates()` method to remove duplicate entries.
How can you handle missing data in a Pandas DataFrame?
Missing data can be handled using methods like `.isnull()`, `.notnull()`, `.dropna()`, and `.fillna()` to find, drop, or fill missing values.
Provide an example of creating a DataFrame in Pandas.
1. Import pandas. 2. Create a dictionary or list of data. 3. Load the data into a DataFrame using `pd.DataFrame(data)`. Example: `df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})`.
How can you clean data using Pandas, specifically for replacing missing values?
You can clean data by replacing missing values using the `fillna()` method, specifying a value or method for filling missing data.
Describe how you can group data in a DataFrame and apply aggregations.
You can group data using the `groupby()` method and apply aggregations like `mean` and `size` to these groups.
How do you access rows and columns in a DataFrame by integer positions?
You can access rows and columns by integer positions using the `.iloc[]` method.
Which library is essential for plotting with Pandas?
The Matplotlib library is essential for plotting with Pandas.
How can you iterate over rows in a DataFrame?
You can iterate over rows in a DataFrame using loops, especially with methods like `.iterrows()`.
Explain the difference between a Pandas DataFrame and a Series.
A DataFrame is a two-dimensional tabular data structure with rows and columns, while a Series is a one-dimensional array, similar to a single column of a DataFrame.
Previous
Next