Aug 1, 2024
kaggle.json
file.C:/Users/YourUsername/.kaggle/
pip install kaggle
kaggle datasets download -d <DATASET_LINK>
import pandas as pd
df = pd.read_csv('orders.csv')
df.head(20)
to view the first 20 records.None
for ship_mode
.df['ship_mode'].replace(['Not Available', 'Unknown'], None, inplace=True)
df.rename()
or update with lower case and underscores.
df.columns = df.columns.str.lower().str.replace(' ', '_')
df['discount'] = df['list_price'] * (df['discount_percent'] / 100)
df['sale_price'] = df['list_price'] - df['discount']
df['profit'] = df['cost_price'] - df['sale_price']
df.drop(['list_price', 'cost_price', 'discount_percent'], axis=1, inplace=True)
df['order_date'] = pd.to_datetime(df['order_date'], format='%Y-%m-%d')
from sqlalchemy import create_engine
engine = create_engine('mssql+pyodbc://<username>:<password>@<server_name>/<database_name>?driver=ODBC+Driver+17+for+SQL+Server')
df.to_sql('df_orders', con=engine, index=False, if_exists='replace')
Find Top 10 Highest Revenue Generating Products:
SELECT product_id, SUM(sale_price) AS sales
FROM df_orders
GROUP BY product_id
ORDER BY sales DESC
LIMIT 10;
Top 5 Highest Selling Products in Each Region:
Month-over-Month Growth Comparison for 2022 and 2023:
Highest Sales Month for Each Category:
Subcategory with Highest Profit Growth in 2023 Compared to 2022: