Lecture on Machine Learning Models
Introduction
- Prior videos laid the foundation: what machine learning is and why it is useful
- Key term: model
What is a Model?
- A model is a representation of reality
- Data set: a large collection of data representing reality
- More data brings the representation closer to reality
- Sometimes a data set alone is not enough: it may not cover every possible case
Splitting Data Sets
- Reality is represented by splitting the data set into sections (individual records)
- Example: predicting diabetes
- The data set includes health information, age, family history, and so on
- Each record represents one instance of reality, e.g., one person
Simplified Data Representation
- The data is simplified for ease of understanding
- E.g., binary data (yes/no, male/female)
- The simplified data is arranged in a table
- Columns: attributes such as sex, age (simplified to under 50 or not), and family history
- Extra column: whether the person had diabetes
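A minimal Python sketch of such a table, assuming three binary attributes (sex, age under 50, family history) plus the diabetes column. The field names and all row values are hypothetical, made up for illustration; they are not real patient data.

```python
from typing import NamedTuple


class Record(NamedTuple):
    """One instance of reality: a single (hypothetical) person."""
    male: bool            # sex: True = male, False = female
    under_50: bool        # age simplified to a yes/no attribute
    family_history: bool  # family history of diabetes: yes/no
    diabetes: bool        # extra column: the outcome to predict


# Hypothetical historical data, invented purely for illustration.
DATASET = [
    Record(male=True, under_50=True, family_history=True, diabetes=False),
    Record(male=True, under_50=False, family_history=True, diabetes=True),
    Record(male=False, under_50=True, family_history=False, diabetes=False),
    Record(male=False, under_50=False, family_history=True, diabetes=True),
]

ATTRIBUTES = ("male", "under_50", "family_history")

if __name__ == "__main__":
    # Header row: the attribute columns, then the extra diabetes column.
    print(" | ".join(ATTRIBUTES + ("diabetes",)))
    # Each record prints as one row of yes/no values.
    for row in DATASET:
        print(" | ".join("yes" if value else "no" for value in row))
```

Using a `NamedTuple` keeps each row immutable and lets columns be read by name (`row.diabetes`) or by position.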
Using Historical Data
- Historical data (past cases with known outcomes) helps in building accurate models
- Reality can be modeled with if statements
- Example:
If sex is male, age < 50, and family history = true, then diabetes = false
- Easy to understand, but writing such rules by hand is impractical for large data sets
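A short Python sketch of the if-statement idea. `if_statement_model` encodes only the single rule from the example above; `build_lookup_model` is a hypothetical helper showing that "one if statement per historical row" amounts to a lookup table built from the data. The rows are invented for illustration.

```python
from typing import Dict, List, Optional, Tuple

Row = Tuple[bool, bool, bool, bool]  # (male, under_50, family_history, diabetes)
Key = Tuple[bool, bool, bool]        # (male, under_50, family_history)


def if_statement_model(male: bool, under_50: bool, family_history: bool) -> Optional[bool]:
    """The single rule from the notes, written as an if statement."""
    if male and under_50 and family_history:
        return False  # diabetes = false
    return None  # no rule has been written for this combination


def build_lookup_model(rows: List[Row]) -> Dict[Key, bool]:
    """Equivalent to one if statement per historical row, built automatically.

    Later rows overwrite earlier rows that share the same attribute values.
    """
    return {(m, u, h): d for m, u, h, d in rows}


# Hypothetical historical rows, invented for illustration.
HISTORY: List[Row] = [
    (True, True, True, False),
    (True, False, True, True),
    (False, True, False, False),
]

lookup = build_lookup_model(HISTORY)
print(if_statement_model(True, True, True))  # False
print(lookup[(True, False, True)])           # True
```

Both versions only reproduce what is already in the table; neither can answer for an attribute combination that never appears in it.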
Predictive Data Analytics vs. Machine Learning
- Predictive data analytics: simple if statements (a rule-based model) can be fine
- Machine learning: deals with data that represents reality only incompletely
- Goal: find the best model for predicting the sections of reality the data does not cover
- The model is the choice the learning algorithm makes
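To make "unknown data sections" concrete, here is a minimal Python sketch, assuming three binary attributes and a hypothetical data set in which six of the eight possible attribute combinations were observed. `missing_combinations` is an illustrative helper, not part of any library; it lists the combinations a lookup-style model would have no answer for.

```python
from itertools import product
from typing import Dict, List, Tuple

Key = Tuple[bool, ...]  # (male, under_50, family_history)

# Hypothetical data: six of the eight possible combinations were observed.
KNOWN: Dict[Key, bool] = {
    (True, True, True): False,
    (True, True, False): False,
    (True, False, True): True,
    (True, False, False): False,
    (False, True, True): False,
    (False, False, True): True,
}


def missing_combinations(known: Dict[Key, bool], n_attributes: int = 3) -> List[Key]:
    """List the attribute combinations that never appear in the data set."""
    return [
        combo
        for combo in product([False, True], repeat=n_attributes)
        if combo not in known
    ]


missing = missing_combinations(KNOWN)
print(missing)  # [(False, False, False), (False, True, False)]
```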
Possibilities and Model Selection
- How many models are possible?
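- Example situation: four possible models
- Only the rows with no data (attribute combinations absent from the data set) need to be considered
- Each model assigns a different combination of outcomes to those rows (e.g., both true, both false)
- The machine learning algorithm chooses the best of these possible models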
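The count can be sketched as follows, assuming binary attributes and a binary outcome. With d attributes there are 2^d attribute combinations, so 2^(2^d) possible models (each one a yes/no answer for every combination). The rows in the data fix the answers for the combinations they cover, so if m combinations are missing, 2^m models remain consistent with the data. Two missing rows give 2^2 = 4, which matches the four-model example. The Python below reuses a hypothetical six-row data set; `consistent_models` is an illustrative helper name.

```python
from itertools import product
from typing import Dict, List, Tuple

Key = Tuple[bool, ...]  # (male, under_50, family_history)

# Hypothetical data: six of the eight combinations observed, two missing.
KNOWN: Dict[Key, bool] = {
    (True, True, True): False,
    (True, True, False): False,
    (True, False, True): True,
    (True, False, False): False,
    (False, True, True): False,
    (False, False, True): True,
}


def consistent_models(known: Dict[Key, bool], n_attributes: int = 3) -> List[Dict[Key, bool]]:
    """Enumerate every full model that agrees with all known rows."""
    all_keys = list(product([False, True], repeat=n_attributes))
    missing = [k for k in all_keys if k not in known]
    models = []
    # Each model assigns one outcome to every missing row; known rows stay fixed.
    for outcomes in product([False, True], repeat=len(missing)):
        model = dict(known)
        model.update(zip(missing, outcomes))
        models.append(model)
    return models


models = consistent_models(KNOWN)
total_models = 2 ** (2 ** 3)  # all possible models before looking at any data
print(len(models))    # 4
print(total_models)   # 256
```

The sketch only enumerates the candidates; it does not implement how a learning algorithm chooses among them.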
Conclusion
- Machine learning models: most likely representations of reality from incomplete data
- Aim of the video: a better understanding of what a model is in machine learning
End note: the importance of understanding models and how machine learning selects the best model.