CS231n Lecture Notes

Introduction

Course Introduction:
- CS231n is a computer vision class at Stanford whose enrollment has grown rapidly, roughly doubling each year:
- Initial enrollment: 150 students.
- Last year: 350 students.
- This year: 730 students.
- Apologies to those who could not fit into the lecture hall; lecture videos will be available on the SCPD website.

Definition: Computer vision is the study of visual data.
Importance: The amount of visual data has exploded, driven largely by smartphones, many of which have multiple cameras.
- A 2015 Cisco study estimated that by 2017, about 80% of all internet traffic would be video.
- Visual data is hard for algorithms to understand, which is why it has been called the "dark matter" of the internet.
Example: About 5 hours of video are uploaded to YouTube every second.

Fields connected to Computer Vision:
- Physics (optics and image formation).
- Biology and psychology (animal visual processing).
- Computer science, mathematics, and engineering.

Instructors:
- Course taught by PhD students Justin Johnson and Serena Yeung from the Stanford Vision Lab.
- The lab is led by Professor Fei-Fei Li and focuses on machine learning and computer vision.

Related Stanford Courses:
- CS131: Introductory computer vision course.
- Deep Learning and NLP course by Chris Manning and Richard Socher.
- CS231a: More comprehensive computer vision course by Silvio Savarese.
- CS231n focuses on neural networks and convolutional neural networks (CNNs) for visual recognition tasks.

Origin of Vision: About 540 million years ago, the number of animal species exploded (the Cambrian explosion); a leading theory attributes this to the development of vision.
Importance: Almost 50% of the neurons in the human cortex are involved in visual processing.

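Early Cameras: One of the earliest cameras, the camera obscura, dates to the 1600s and is based on pinhole optics.
Biological Studies: Hubel and Wiesel (1950s-60s) recorded from neurons in the cat primary visual cortex and found that they respond to simple structures such as oriented edges, suggesting that visual processing starts from simple features.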

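Block World (1960s): Larry Roberts simplified the visual world into geometric shapes (blocks) in order to recognize and reconstruct them.
MIT Summer Vision Project (1966): An attempt to build a significant part of a visual system in a single summer.
David Marr: Influential work in the late 1970s proposing that vision proceeds hierarchically, from a primal sketch (edges and other simple features) to a 2.5D sketch (surfaces and depth) to a full 3D model representation.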

Image Segmentation: Grouping (clustering) pixels into perceptually meaningful regions.
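Segmentation by clustering can be illustrated with a deliberately simple stand-in: k-means on pixel intensities. This is a minimal sketch under our own assumptions (a grayscale image given as a list of rows, clustering on intensity only, evenly spaced initial centers); it is not any specific historical segmentation algorithm, and the function name is hypothetical.

```python
def kmeans_segment(image, k, iters=10):
    """Label each pixel of a grayscale image (list of rows) with one of k intensity clusters."""
    pixels = [p for row in image for p in row]
    # Start with k centers evenly spaced between the darkest and brightest pixel.
    lo, hi = min(pixels), max(pixels)
    centers = [lo + (hi - lo) * i / max(k - 1, 1) for i in range(k)]

    def assign():
        # Assignment step: index of the nearest center for every pixel.
        return [min(range(k), key=lambda c: abs(p - centers[c])) for p in pixels]

    for _ in range(iters):
        labels = assign()
        # Update step: move each center to the mean intensity of its pixels.
        for c in range(k):
            members = [p for p, lab in zip(pixels, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    labels = assign()
    # Reshape the flat label list back into the image's rows.
    width = len(image[0])
    return [labels[r * width:(r + 1) * width] for r in range(len(image))]
```

For example, `kmeans_segment([[10, 12, 200], [11, 205, 210]], 2)` separates the dark pixels from the bright ones and returns `[[0, 0, 1], [0, 1, 1]]`. Practical segmentation methods also use spatial proximity and texture, not intensity alone.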
Face Detection: Real-time face detection by Paul Viola and Michael Jones (2001).
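One ingredient that made the Viola-Jones detector fast is the integral image, which lets the sum over any rectangle of pixels, and hence a Haar-like rectangle feature, be computed with four lookups regardless of the rectangle's size. The sketch below shows only that trick (the full detector also selects features with AdaBoost and chains classifiers into a cascade); it assumes a grayscale image as a list of rows, and the function names are hypothetical.

```python
def integral_image(img):
    """Return ii with an extra zero row/column, where ii[r][c] = sum of img rows < r, cols < c."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for r in range(h):
        row_sum = 0  # running sum of the current row up to column c
        for c in range(w):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of the pixels in a rectangle using four lookups, independent of its size."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def two_rect_feature(ii, top, left, height, width):
    """Horizontal two-rectangle Haar-like feature: left half sum minus right half sum."""
    half = width // 2
    return (rect_sum(ii, top, left, height, half)
            - rect_sum(ii, top, left + half, height, half))
```

On the image `[[9, 9, 1, 1], [9, 9, 1, 1]]`, `two_rect_feature(integral_image(img), 0, 0, 2, 4)` returns 36 - 4 = 32, a strong response to the vertical edge between the bright and dark halves.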
Feature-based Object Recognition: David Lowe (1999-2000) matched objects across images using distinctive local features such as SIFT, which are robust to changes in scale, rotation, and lighting.
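Once local features such as SIFT have been extracted, matching objects comes down to comparing descriptor vectors. The sketch below assumes the descriptors are already computed (real SIFT descriptors are 128-dimensional; short tuples are used here for readability) and pairs each descriptor with its nearest neighbor, keeping a match only when that neighbor is clearly closer than the second-nearest candidate. This ratio test follows Lowe's later SIFT work; the 0.8 threshold and the function names are assumptions of this sketch.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Return (i, j) pairs where desc_b[j] is desc_a[i]'s nearest neighbor
    and is clearly closer than the second-nearest candidate (ratio test)."""
    matches = []
    if len(desc_b) < 2:
        return matches  # the ratio test needs at least two candidates
    for i, d in enumerate(desc_a):
        # Distances from this descriptor to every candidate, nearest first.
        dists = sorted((euclidean(d, e), j) for j, e in enumerate(desc_b))
        (best, j), (second, _) = dists[0], dists[1]
        # Keep the match only if it is clearly better than the runner-up.
        if best < ratio * second:
            matches.append((i, j))
    return matches
```

`match_descriptors([(0, 0), (5, 5)], [(0, 1), (10, 10), (5, 4)])` returns `[(0, 0), (1, 2)]`, while a descriptor equidistant from two candidates is rejected as ambiguous.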

PASCAL Visual Object Classes (VOC) Challenge: A benchmark for object recognition.
ImageNet Project: Massive dataset collected from the internet for object recognition.
ImageNet Challenge (ILSVRC): An international image classification competition on about 1.4 million images across 1,000 object classes.
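ILSVRC classification entries are commonly compared by top-5 error: an image counts as correctly classified if its true label is among the model's five highest-scoring classes out of the 1,000. A minimal sketch of the metric, assuming scores arrive as one list of per-class scores per image (the function name is hypothetical):

```python
def top_k_error(scores, labels, k=5):
    """Fraction of images whose true label is not among the k highest-scoring classes.

    scores: one list of per-class scores per image; labels: true class index per image.
    """
    wrong = 0
    for s, y in zip(scores, labels):
        # Class indices sorted from highest to lowest score, truncated to k.
        top_k = sorted(range(len(s)), key=lambda c: s[c], reverse=True)[:k]
        if y not in top_k:
            wrong += 1
    return wrong / len(labels)
```

Passing `k=1` gives the stricter top-1 error.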

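Breakthrough in 2012: Geoffrey Hinton's group won the challenge with a convolutional neural network (CNN) model, AlexNet, sharply reducing the error rate.
Key Enablers: CNNs themselves were not new (LeCun et al. applied them to digit recognition in the 1990s); what changed was the increase in computational power (GPUs) and the availability of large labeled datasets.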

Applications: Medical diagnosis, self-driving cars, robotics, augmented reality.
Open Challenges: Semantic segmentation, 3D understanding, activity recognition, VR/AR.
Visual Genome Project: Capturing rich visual relationships in images.

Instructors and Staff:
- Fei-Fei Li, Justin Johnson, Serena Yeung.
- 18 TAs for course support.
Communication: Piazza for Q&A and course interactions.
Textbook: An optional textbook on deep learning.
Assignments and Exams:
- Three problem sets.
- Written midterm exam.
- Final course project in teams.
- Late policy: 7 late days allocated across assignments.
- Collaboration policy: Adherence to the Stanford honor code.
Prerequisites:
- Proficiency in Python and familiarity with C/C++.
- Knowledge of calculus and linear algebra.
- Basic understanding of computer vision and machine learning principles.

Deep Understanding: Implementing CNNs from scratch in Python.
Practical Skills: Hands-on experience with state-of-the-art tools such as TensorFlow and PyTorch.
State of the Art: Covering the latest content and research in computer vision.
Fun: Exploring interesting applications like image captioning, DeepDream, and style transfer.
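To give a flavor of implementing CNNs from scratch, here is a minimal sketch of the forward pass of one convolutional filter over a single-channel image, with zero padding and stride. It is our own simplification in pure Python (real layers handle many input channels and filters, and course code typically uses NumPy); the function name and argument layout are hypothetical. As in most CNN libraries, the operation is technically a cross-correlation (the filter is not flipped), and each output dimension has size (W - F + 2P) / S + 1 for input size W, filter size F, padding P, and stride S.

```python
def conv2d_forward(x, w, b=0.0, stride=1, pad=0):
    """Naive forward pass of one conv filter over a single-channel image.

    x: input image as a list of rows; w: square f x f filter as a list of rows;
    b: scalar bias; stride and pad are the step size and zero-padding width.
    """
    f = len(w)
    in_h, in_w = len(x), len(x[0])
    padded_w = in_w + 2 * pad
    # Zero-pad the input with `pad` rows/columns of zeros on every side.
    padded = [[0.0] * padded_w for _ in range(pad)]
    padded += [[0.0] * pad + list(row) + [0.0] * pad for row in x]
    padded += [[0.0] * padded_w for _ in range(pad)]
    # Output size: (W - F + 2P) // S + 1 in each spatial dimension.
    out_h = (in_h - f + 2 * pad) // stride + 1
    out_w = (in_w - f + 2 * pad) // stride + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            r0, c0 = i * stride, j * stride
            # Dot product of the filter with the input patch it currently covers.
            total = sum(w[a][c] * padded[r0 + a][c0 + c]
                        for a in range(f) for c in range(f))
            row.append(total + b)
        out.append(row)
    return out
```

For instance, a 3x3 filter that is 1 at the center and 0 elsewhere, applied with `pad=1` and `stride=1`, returns the input unchanged, which is a quick sanity check for the indexing.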