CS231n Introduction Lecture

Jul 2, 2024

CS231n Lecture Notes

Introduction

  • Course Introduction:
    • CS231n is a computer vision course at Stanford whose enrollment has roughly doubled each year:
    • Initial enrollment: 150 students.
    • Last year: 350 students.
    • This year: 730 students.
    • Apologies to students who could not fit into the lecture hall; lecture videos will be available on the SCPD website.

What is Computer Vision?

  • Definition: Study of visual data.
  • Importance: Visual data is exploding, driven largely by smartphones, many of which carry multiple cameras.
    • A 2015 Cisco study estimated that by 2017, 80% of all internet traffic would be video.
    • Visual data is hard for algorithms to understand, which is why it has been called the "dark matter" of the internet.
  • Example: Roughly 5 hours of video are uploaded to YouTube every second.

Interdisciplinary Nature

  • Fields connected to Computer Vision:
    • Physics (optics and image formation).
    • Biology and psychology (animal visual processing).
    • Computer science, mathematics, and engineering.

Teaching Staff

  • Instructors:
    • Course taught by PhD students Justin Johnson and Serena Yeung from the Stanford Vision Lab.
    • The lab is led by Professor Fei-Fei Li and focuses on machine learning and computer vision.

Course Relations

  • Related Stanford Courses:
    • CS131: Introductory computer vision course.
    • CS224n: Deep learning for natural language processing, taught by Chris Manning and Richard Socher.
    • CS231a: More comprehensive computer vision course by Silvio Savarese.
    • CS231n focuses on neural networks and convolutional neural networks (CNNs) for visual recognition tasks.

History of Computer Vision

Biological Vision

  • Origin: Around 540 million years ago, the number of animal species exploded; one leading theory, from zoologist Andrew Parker, attributes this explosion to the emergence of the first eyes.
  • Importance: Almost 50% of neurons in the human cortex are involved in visual processing.

Human-made Vision Systems

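  • Early Cameras: Cameras such as the camera obscura were built in the 1600s, based on the pinhole principle.
  • Biological Studies: Hubel and Wiesel (1950s-60s) recorded from neurons in the primary visual cortex of cats and found that visual processing begins with simple structures such as oriented edges.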

Early Computer Vision

  • Block World (1960s): Work by Larry Roberts using geometric shapes for object recognition.
  • MIT Summer Vision Project (1966): Attempted to construct a visual system in one summer.
  • David Marr: Influential late-1970s work proposing that vision proceeds in stages: a primal sketch (edges), a 2.5-D sketch (surfaces and depth), and finally a full 3D model.

Challenges and Advances

  • Image Segmentation: Grouping pixels into perceptually meaningful regions.
  • Face Detection: Real-time face detection by Paul Viola and Michael Jones (2001).
  • Feature-based Object Recognition: Matching objects across images using distinctive, invariant local features such as David Lowe's SIFT (1999).
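The idea of segmentation as clustering pixels into regions can be sketched with plain k-means on grayscale intensities. This is a minimal illustration assuming NumPy; `kmeans_segment` is a hypothetical helper, and k-means on raw intensity is a deliberately simple stand-in, not one of the specific segmentation methods from this period of research.

```python
import numpy as np

def kmeans_segment(image, k=2, iters=10):
    """Hypothetical helper: cluster the pixels of a grayscale image into k groups by intensity."""
    pixels = image.reshape(-1).astype(np.float64)
    # Deterministic initialization: spread the k centers evenly across the intensity range.
    centers = np.quantile(pixels, np.linspace(0.0, 1.0, k))
    labels = np.zeros(pixels.shape[0], dtype=int)
    for _ in range(iters):
        # Assignment step: each pixel joins the cluster whose center is nearest.
        dists = np.abs(pixels[:, None] - centers[None, :])  # shape (num_pixels, k)
        labels = dists.argmin(axis=1)
        # Update step: move each center to the mean intensity of its member pixels.
        for j in range(k):
            members = pixels[labels == j]
            if members.size > 0:
                centers[j] = members.mean()
    return labels.reshape(image.shape), centers

# A toy 2x4 image: a dark region on the left, a bright region on the right.
image = np.array([[10, 12, 200, 205],
                  [11, 9, 198, 210]])
labels, centers = kmeans_segment(image, k=2)
print(labels)   # dark pixels -> cluster 0, bright pixels -> cluster 1
print(centers)  # cluster means: 10.5 and 203.25
```

This sketch ignores pixel position entirely; practical segmentation methods also take spatial proximity, color, and texture into account.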

Larger Datasets and Benchmarks

  • PASCAL Visual Object Classes (VOC) Challenge: Benchmark for object recognition across 20 object categories.
  • ImageNet Project: Massive dataset collected from the internet for object recognition.
  • ImageNet Challenge (ILSVRC): International image classification competition with roughly 1.4 million images across 1,000 object classes.

Rise of Convolutional Neural Networks (CNNs)

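  • Breakthrough in 2012: A CNN from Geoffrey Hinton's group at the University of Toronto (AlexNet) won the ImageNet Challenge, sharply reducing the error rate.
  • Key Enablers: CNNs were not new (LeCun's digit-recognition networks date to the 1990s); what changed was computational power (GPUs) and the availability of large labeled datasets.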

Practical Applications and Future Challenges

  • Applications: Medical diagnosis, self-driving cars, robotics, augmented reality.
  • Open Challenges: Semantic segmentation, 3D understanding, activity recognition, VR/AR.
  • Visual Genome Project: Capturing rich visual relationships in images.

Logistics and Course Structure

  • Instructors and Staff:
    • Fei-Fei Li, Justin Johnson, Serena Yeung.
    • 18 TAs for course support.
  • Communication: Piazza for questions and course discussion.
  • Textbook: An optional deep learning textbook.
  • Assignments and Exams:
    • Three problem sets.
    • Written midterm exam.
    • Final course project in teams.
    • Late policy: 7 late days in total, to be spread across assignments.
    • Collaboration policy: Adherence to the Stanford honor code.
  • Prerequisites:
    • Proficiency in Python and familiarity with C/C++.
    • Knowledge of calculus and linear algebra.
    • Basic understanding of computer vision and machine learning principles.

Course Philosophy and Goals

  • Deep Understanding: Implementing CNNs from scratch in Python.
  • Practical Skills: Exposure to state-of-the-art tools such as TensorFlow and PyTorch.
  • State of the Art: Covering latest content and research in computer vision.
  • Fun: Exploring interesting applications like image captioning, DeepDream, and style transfer.
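Implementing CNNs from scratch means writing core operations such as convolution by hand. Below is a minimal sketch assuming NumPy, a single-channel input, a single filter, stride 1, and no padding; `conv2d_naive` is a hypothetical name, not an API from the course assignments. As in most deep learning libraries, the filter is not flipped, so strictly speaking this computes cross-correlation.

```python
import numpy as np

def conv2d_naive(x, w, b=0.0):
    """Hypothetical helper: 'valid' (no padding), stride-1 2D convolution of one image with one filter."""
    H, W = x.shape
    kh, kw = w.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Dot product between the filter and the image patch it currently covers, plus bias.
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w) + b
    return out

# A 4x4 image with a vertical edge: dark left half, bright right half.
image = np.array([[0, 0, 1, 1]] * 4, dtype=float)
# A 2x2 filter that responds to dark-to-bright transitions from left to right.
kernel = np.array([[-1, 1]] * 2, dtype=float)
out = conv2d_naive(image, kernel)
print(out)  # every row of the output is [0, 2, 0]: the filter fires only at the edge
```

The explicit loops keep the arithmetic easy to follow but are slow; practical implementations vectorize this computation or rely on optimized library kernels.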