Introduction to Machine Learning Concepts

Sep 26, 2024

Machine Learning for Everyone - Lecture Notes

Introduction to Kylie Ying

  • Physicist and engineer who has worked at notable places such as MIT, CERN, and freeCodeCamp.
  • The course is aimed at teaching machine learning to beginners.

Overview of Machine Learning

  • Machine learning (ML) is a sub-domain of computer science that focuses on algorithms allowing computers to learn from data without explicit programming.
  • Difference between AI, ML, and Data Science:
    • AI: Enabling computers to perform human-like tasks.
    • ML: Subset of AI focused on making predictions using data.
    • Data Science: Field that finds patterns and insights in data.

Types of Machine Learning

  1. Supervised Learning:
    • Uses labeled inputs to train models.
    • Example: Predicting classes based on features (e.g., pictures of animals).
  2. Unsupervised Learning:
    • Uses unlabeled data to find patterns or groupings.
    • Example: Clustering data points based on similarities.
  3. Reinforcement Learning:
    • An agent learns in an interactive environment through rewards and penalties.

Supervised Learning

  • Key Concepts:
    • Features: Inputs used by the model to make predictions.
    • Labels: Output to predict based on features.
  • Types of Supervised Tasks:
    • Classification: Predicting discrete classes (e.g., spam/not spam).
      • Binary Classification: Two classes (e.g., cat vs not cat).
      • Multi-Class Classification: More than two classes (e.g., cat, dog, lizard).
    • Regression: Predicting continuous values (e.g., price of a house).

Dataset Example: MAGIC Gamma Telescope Dataset

  • Uses properties of the light recorded by a gamma-ray telescope to predict which type of particle (gamma or hadron) produced it.
  • Process:
    1. Download the dataset from the UCI Machine Learning Repository.
    2. Set up a Google Colab notebook.
    3. Import libraries: NumPy, Pandas, Matplotlib, etc.
    4. Read and prepare the dataset (e.g., assign column names).
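
A minimal loading sketch (assuming the UCI file is saved as magic04.data; the column names are taken from the dataset description):

```python
import pandas as pd

# Feature names from the UCI MAGIC Gamma Telescope dataset description
cols = ["fLength", "fWidth", "fSize", "fConc", "fConc1",
        "fAsym", "fM3Long", "fM3Trans", "fAlpha", "fDist", "class"]

# The raw file has no header row, so the column names are supplied here
df = pd.read_csv("magic04.data", names=cols)
```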

Data Preparation and Exploration

  • Data Exploration Techniques:
    • Viewing the first few entries of the dataset (using .head()).
    • Checking unique labels in the dataset.
    • Converting the gamma/hadron class labels from letters to numbers.
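
A sketch of these exploration steps, assuming the DataFrame df from above and that the class column holds 'g' (gamma) and 'h' (hadron):

```python
# Inspect the first few rows and the distinct class labels
print(df.head())
print(df["class"].unique())   # expected: ['g' 'h']

# Convert the letter labels to numbers: gamma -> 1, hadron -> 0
df["class"] = (df["class"] == "g").astype(int)
```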

Supervised Learning Implementation Steps

  1. Data Loading:
    • Load the data into a DataFrame.
  2. Data Cleaning:
    • Assign column names.
    • Convert categorical labels into numeric values.
  3. Feature Selection:
    • Select relevant features for the model.
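
The notes don't show the code for this step; one way to separate the features from the label (and create the train/test split implied by the later evaluation step) is:

```python
from sklearn.model_selection import train_test_split

# Features are every column except "class"; the label is the "class" column
X = df.drop("class", axis=1).values
y = df["class"].values

# Hold out a test set for evaluation (the split ratio is illustrative)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
```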

Example Implementation in Google Colab

  • Import necessary libraries.
  • Load the data using Pandas.
  • Clean the data and handle missing values.
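
A small sketch of the missing-value check mentioned above (the UCI description lists no missing values for this dataset, so dropna() is defensive here):

```python
# Count missing values per column, then drop any incomplete rows
print(df.isnull().sum())
df = df.dropna()
```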

Machine Learning Models Implemented

  1. K-Nearest Neighbors (KNN):
    • Classifies a point based on the majority class of its nearest neighbors.
    • Distance function used: Euclidean distance.
  2. Naive Bayes:
    • Classifies data using Bayes' rule, combining prior probabilities with feature likelihoods.
    • Assumes independence of features.
  3. Logistic Regression:
    • Predicts the probability of class membership using a logistic (sigmoid) function.
    • Models the relationship between dependent and independent variables.
  4. Support Vector Machines (SVM):
    • Finds the hyperplane that best separates the classes in high-dimensional space.
  5. Neural Networks:
    • Models with layers of interconnected nodes (neurons).
    • Train on labeled data to predict outputs.
    • Use backpropagation to adjust weights based on the loss.
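
The notes list these models without code; a minimal sketch of fitting the first four with their scikit-learn implementations (using the X_train/X_test split from earlier) might look like:

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

models = {
    "KNN": KNeighborsClassifier(n_neighbors=5),          # Euclidean distance by default
    "Naive Bayes": GaussianNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, accuracy_score(y_test, model.predict(X_test)))
```

For the neural network, a small fully connected Keras model is one option (layer sizes and epochs are illustrative, not taken from the notes):

```python
import tensorflow as tf

nn = tf.keras.Sequential([
    tf.keras.Input(shape=(X_train.shape[1],)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),   # probability of the gamma class
])
nn.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
nn.fit(X_train, y_train, epochs=10, batch_size=32, validation_split=0.2, verbose=0)
```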

Evaluation of Machine Learning Models

  • Performance metrics include:
    • For classification: Accuracy, Precision, Recall, F1 Score.
    • For regression: Mean Absolute Error (MAE), Mean Squared Error (MSE), R-Squared (R²).
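
For the classification metrics, scikit-learn's classification_report prints precision, recall, F1, and accuracy in one call (using the KNN model from the sketch above):

```python
from sklearn.metrics import classification_report

y_pred = models["KNN"].predict(X_test)
print(classification_report(y_test, y_pred))
```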

Unsupervised Learning - K-Means Clustering

  • K-means clustering algorithm:
    1. Choose number of clusters (k).
    2. Randomly assign initial centroids.
    3. Assign points to nearest centroid.
    4. Recalculate centroids based on assignments.
    5. Repeat steps 3–4 until convergence (assignments stop changing).
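
A minimal sketch with scikit-learn's KMeans, assuming an unlabeled feature matrix X:

```python
from sklearn.cluster import KMeans

# Choose k, then let the algorithm iterate assignment/recalculation until convergence
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(X)    # cluster index for each point
centroids = kmeans.cluster_centers_       # final centroid coordinates
```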

Unsupervised Learning - Principal Component Analysis (PCA)

  • Reduces the dimensionality of a dataset while preserving as much variance as possible:
    • Projects the data into a lower-dimensional space.
    • Finds the directions (principal components) with the greatest variance.
    • Enables visualization of high-dimensional data.
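
A sketch of PCA with scikit-learn, projecting the features onto the two directions of greatest variance for plotting:

```python
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)
print(pca.explained_variance_ratio_)   # fraction of variance kept by each component

plt.scatter(X_2d[:, 0], X_2d[:, 1], s=5)
plt.xlabel("PC 1")
plt.ylabel("PC 2")
plt.show()
```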

Conclusion

  • Summarized the key concepts and implementations in supervised and unsupervised learning.
  • Encouraged community engagement for further learning.