Simple Linear Regression Example in Python

Jul 11, 2024

Introduction

Tools needed: scikit-learn, Pandas, Quandl
Installation: Use pip install scikit-learn, pip install quandl, pip install pandas

Basic Concepts

Regression: Used to model continuous data and find the best-fit line.
Equation of Line: y = mx + b, where m (the slope) and b (the y-intercept) are the values the regression determines.
Application Example: Stock prices.
Data Type: Continuous data (e.g., stock prices over months).
Difference from Classification: Classification assigns discrete labels to groups of data; regression predicts a continuous value.
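
As a minimal sketch of what "determining m and b" means, the ordinary least-squares formulas for a single input can be written in plain Python. The function name fit_line and the sample points are illustrative assumptions, not part of the original walkthrough; in practice scikit-learn would normally do this work.

```python
def fit_line(xs, ys):
    """Return slope m and intercept b of the least-squares line y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # Intercept: the best-fit line passes through the point of means
    b = mean_y - m * mean_x
    return m, b


# Hypothetical points that lie exactly on y = 2x + 1
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
m, b = fit_line(xs, ys)
print(m, b)
```

With real stock prices the points will not sit exactly on a line, so m and b describe only the best overall fit.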

Features and Labels

Supervised Machine Learning: Involves features (attributes) and labels (outcomes).
Meaningful Features: The model learns only from the attributes it is given, so features should actually relate to the label.
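
To make the split concrete, here is a small sketch that separates features from labels. The rows, the key names (adj_close, hl_pct, next_close), and the values are all made up for illustration.

```python
# Hypothetical observations: two attributes plus the outcome we want to predict
rows = [
    {"adj_close": 100.0, "hl_pct": 1.5, "next_close": 101.0},
    {"adj_close": 101.0, "hl_pct": 0.8, "next_close": 99.5},
    {"adj_close": 99.5, "hl_pct": 2.1, "next_close": 102.0},
]

# Features: one list of attribute values per observation
features = [[r["adj_close"], r["hl_pct"]] for r in rows]
# Labels: the outcome paired with each feature row
labels = [r["next_close"] for r in rows]

print(features)
print(labels)
```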

Implementation Steps

Import Required Libraries

import pandas as pd
import quandl

Fetching Data

df = quandl.get("WIKI/GOOGL")
print(df.head())

Quandl Dataset: Use the wiki dataset for Google stock (WIKI/GOOGL).
Account: Not mandatory, but allows more requests.

Understanding Features

Columns: Open, High, Low, Close, and Volume, plus an adjusted version of each (Adj. Open, Adj. High, and so on).
Adjusted: Accounts for stock splits.
Feature Relationships: Consider relationships (e.g., High-Low for volatility).

Selecting Meaningful Features

columns = ['Adj. Open', 'Adj. High', 'Adj. Low', 'Adj. Close', 'Adj. Volume']
df = df[columns]

Selected Columns: Adjusted Open, High, Low, Close, and Volume.

Creating New Features

df['HL_PCT'] = (df['Adj. High'] - df['Adj. Low']) / df['Adj. Low'] * 100.0
df['PCT_change'] = (df['Adj. Close'] - df['Adj. Open']) / df['Adj. Open'] * 100.0

HL_PCT: Measures daily volatility, (High - Low) / Low * 100.
PCT_change: Measures daily movement, (Close - Open) / Open * 100.
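
The two definitions can be checked by hand on a tiny DataFrame. This is a sketch only: the two trading days below are made-up numbers, not real GOOGL prices, and HL_PCT follows the High-Low definition given in the description above.

```python
import pandas as pd

# Two made-up trading days (illustrative values only)
df = pd.DataFrame({
    'Adj. Open': [100.0, 50.0],
    'Adj. High': [110.0, 52.0],
    'Adj. Low': [100.0, 40.0],
    'Adj. Close': [105.0, 45.0],
})

# Parentheses matter: subtract first, then divide by the base price
df['HL_PCT'] = (df['Adj. High'] - df['Adj. Low']) / df['Adj. Low'] * 100.0
df['PCT_change'] = (df['Adj. Close'] - df['Adj. Open']) / df['Adj. Open'] * 100.0

print(df[['HL_PCT', 'PCT_change']])
```

On the first day the range is 10 on a low of 100, so HL_PCT is about 10 percent, and the close is 5 above the open of 100, so PCT_change is about 5 percent. The second day closes below its open, which gives a negative PCT_change.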

Final Dataframe

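import pandas as pd
import quandl

# Fetch the data and keep only the adjusted columns
df = quandl.get("WIKI/GOOGL")
df = df[['Adj. Open', 'Adj. High', 'Adj. Low', 'Adj. Close', 'Adj. Volume']]

# daily high-low range as a percentage of the low
df['HL_PCT'] = (df['Adj. High'] - df['Adj. Low']) / df['Adj. Low'] * 100.0
# daily percent change - you can use a different logic
df['PCT_change'] = (df['Adj. Close'] - df['Adj. Open']) / df['Adj. Open'] * 100.0

final_df = df[['Adj. Close', 'HL_PCT', 'PCT_change', 'Adj. Volume']]
print(final_df.head())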

Considerations for Next Steps

Features vs. Labels: Features help predict labels.
Future Prediction: Decide if Adj. Close will be a feature or a label.
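
One possible way to use Adj. Close as both a feature and a label is to shift the column so that each row's label is the price some rows in the future. This is an assumption about the next step, not something these notes spell out; the column name label, the variable forecast_out, and the prices are hypothetical.

```python
import pandas as pd

# Made-up closing prices for five consecutive rows
df = pd.DataFrame({'Adj. Close': [10.0, 11.0, 12.0, 13.0, 14.0]})

forecast_out = 2  # hypothetical horizon: predict the price two rows ahead

# Each row's label is the Adj. Close from forecast_out rows later
df['label'] = df['Adj. Close'].shift(-forecast_out)

# The last forecast_out rows have no future price yet, so drop them
labeled = df.dropna()
print(labeled)
```

After the shift, Adj. Close on a given row serves as a feature while label holds the value to be predicted.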

Conclusion

Think about the relationship between features and labels.
Next steps involve making predictions with the clean data.
For questions or comments, refer to the accompanying video instructions.
