Simple Linear Regression Example in Python
Introduction
- Tools needed: scikit-learn, Pandas, Quandl
- Installation: Use pip install scikit-learn, pip install quandl, pip install pandas
Basic Concepts
- Regression: Used to model continuous data and find the best-fit line.
- Equation of Line: y = mx + b, where m and b need to be determined.
- Application Example: Stock prices.
- Data Type: Continuous data (e.g., stock prices over months).
- Difference from Classification: Classification assigns discrete labels to groups of data, whereas regression predicts a continuous value.
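To make "m and b need to be determined" concrete, here is a minimal sketch of an ordinary least-squares fit written in plain Python. The function name fit_line and the month/price values are made up for illustration; the tutorial itself relies on scikit-learn for the actual fitting.

```python
# Closed-form least-squares fit of y = m*x + b.
# fit_line and the sample data are illustrative assumptions, not part of the tutorial.
def fit_line(xs, ys):
    """Return the slope m and intercept b that minimize squared error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # the best-fit line passes through the point of means
    b = mean_y - m * mean_x
    return m, b


months = [1, 2, 3, 4, 5]                  # hypothetical month index
prices = [12.0, 14.0, 16.0, 18.0, 20.0]   # hypothetical prices lying on an exact line

m, b = fit_line(months, prices)
print(f"m = {m:.2f}, b = {b:.2f}")
```

Because the sample points lie exactly on a line, the fit recovers its slope and intercept; with real stock data the points scatter around the line and m and b describe the overall trend.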
Features and Labels
- Supervised Machine Learning: Involves features (attributes) and labels (outcomes).
- Meaningful Features: Important for effective modeling.
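As a rough illustration of the split between features and labels, the sketch below builds a tiny table and separates the two. The column names (daily_range_pct, volume, future_price) and all values are hypothetical and chosen only for this example.

```python
import pandas as pd

# Hypothetical rows; the column names and numbers are made up for illustration.
data = pd.DataFrame({
    "daily_range_pct": [1.2, 0.8, 2.5],   # attribute describing each day
    "volume": [1000, 1500, 900],          # attribute describing each day
    "future_price": [101.0, 100.5, 103.2],  # outcome we would like to predict
})

X = data.drop(columns=["future_price"])  # features: everything except the outcome
y = data["future_price"]                 # label: the outcome column

print(X.columns.tolist())
print(y.tolist())
```

By convention the feature table is often called X and the label column y, which is the naming most scikit-learn examples use.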
Implementation Steps
Import Required Libraries
import pandas as pd
import quandl
Fetching Data
df = quandl.get("WIKI/GOOGL")
print(df.head())
- Quandl Dataset: Use the wiki dataset for Google stock (WIKI/GOOGL).
- Account: Not mandatory, but allows more requests.
Understanding Features
- Columns: Open, High, Low, Close, Volume, and Adjusted prices.
- Adjusted: Accounts for stock splits.
- Feature Relationships: Consider relationships (e.g., High-Low for volatility).
Selecting Meaningful Features
columns = ['Adj. Open', 'Adj. High', 'Adj. Low', 'Adj. Close', 'Adj. Volume']
df = df[columns]
- Selected Columns: Adjusted Open, High, Low, Close, and Volume.
Creating New Features
df['HL_PCT'] = (df['Adj. High'] - df['Adj. Low']) / df['Adj. Low'] * 100.0
df['PCT_change'] = (df['Adj. Close'] - df['Adj. Open']) / df['Adj. Open'] * 100.0
- HL_PCT: Measures daily volatility ((High - Low) / Low * 100).
- PCT_change: Measures daily movement ((Close - Open) / Open * 100).
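A quick worked example can help check the two formulas. The sketch below applies them to a single made-up trading day (the prices are invented, not taken from WIKI/GOOGL), using the same column names as the dataset.

```python
import pandas as pd

# One hypothetical trading day; the prices are invented for illustration.
sample = pd.DataFrame({
    "Adj. Open": [102.0],
    "Adj. High": [110.0],
    "Adj. Low": [100.0],
    "Adj. Close": [105.0],
})

# High-low spread as a percentage of the low
sample["HL_PCT"] = (sample["Adj. High"] - sample["Adj. Low"]) / sample["Adj. Low"] * 100.0
# Open-to-close move as a percentage of the open
sample["PCT_change"] = (sample["Adj. Close"] - sample["Adj. Open"]) / sample["Adj. Open"] * 100.0

print(sample[["HL_PCT", "PCT_change"]].round(2))
```

Here the day's range is 10 on a low of 100, so HL_PCT is 10 percent, and the close is 3 above an open of 102, so PCT_change is a little under 3 percent. Note the parentheses around each subtraction: without them, Python performs the division first and the result is no longer a percentage.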
Final Dataframe
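import pandas as pd
import quandl
# fetch the data and keep only the adjusted columns
df = quandl.get("WIKI/GOOGL")
df = df[['Adj. Open', 'Adj. High', 'Adj. Low', 'Adj. Close', 'Adj. Volume']]
# daily high-low spread as a percentage of the low
df['HL_PCT'] = (df['Adj. High'] - df['Adj. Low']) / df['Adj. Low'] * 100.0
# daily percent change - you can use a different logic
df['PCT_change'] = (df['Adj. Close'] - df['Adj. Open']) / df['Adj. Open'] * 100.0
final_df = df[['Adj. Close', 'HL_PCT', 'PCT_change', 'Adj. Volume']]
print(final_df.head())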
Considerations for Next Steps
- Features vs. Labels: Features help predict labels.
- Future Prediction: Decide if Adj. Close will be a feature or a label.
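One possible way to treat Adj. Close as a label, sketched here as an assumption rather than as the method this tutorial settles on, is to pair each row with the closing price some number of days ahead. The forecast_out horizon, the label column name, and the prices below are all hypothetical.

```python
import pandas as pd

# Hypothetical closing prices; values are invented for illustration.
prices = pd.DataFrame({"Adj. Close": [100.0, 101.0, 103.0, 102.0, 105.0]})

forecast_out = 1  # hypothetical horizon: predict one row (day) ahead

# shift(-n) moves values up by n rows, so each row's label is the close n days later
prices["label"] = prices["Adj. Close"].shift(-forecast_out)

# the last forecast_out rows have no future value yet, so drop them
labeled = prices.dropna()

print(labeled)
```

With this layout, today's Adj. Close can stay among the features while the shifted copy acts as the label, which is one answer to the feature-or-label question above.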
Conclusion
- Think about the relationship between features and labels.
- Next steps involve making predictions with the clean data.
- For questions or comments, further video instructions are available.