Overview
This lecture provides a comprehensive introduction to Python for data science, covering Python basics, data types, control flow, functions, essential libraries (NumPy), and key interview topics for aspiring data scientists.
Why Python for Data Science?
- Python is preferred for data science due to its simple syntax, community support, and extensive libraries.
- Python is an interpreted, high-level, dynamically typed language, which makes it easy to learn and quick to debug interactively.
- Python's versatility allows its use in web development, machine learning, AI, and automation.
- Most data science libraries and tools (NumPy, Pandas, scikit-learn) are built on or for Python.
Python Fundamentals
- Python variables are names that reference objects; the same name can be bound to an object of any type.
- Variable assignment uses the = operator; declaration and initialization can occur together.
- Python uses dynamic typing; names are case-sensitive and must follow the naming rules (letters, digits, and underscores only; no hyphens or other special characters; cannot start with a digit; cannot be a keyword).
- There are three primary data types: numeric, sequential, and boolean.
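The points above can be illustrated with a short sketch. The variable names (count, scores, alias, total) are made up for illustration and are not from the lecture.

```python
import keyword

# Declaration and initialization happen in one assignment.
count = 10
print(type(count).__name__)   # int

# Dynamic typing: the same name can be rebound to an object of another type.
count = "ten"
print(type(count).__name__)   # str

# Variables are references: two names can point at the same list object.
scores = [90, 85]
alias = scores
alias.append(70)
print(scores)                 # [90, 85, 70]
print(alias is scores)        # True

# Names are case-sensitive: total and Total are different variables.
total = 1
Total = 2
print(total, Total)           # 1 2

# Naming rules can be checked with str.isidentifier() and the keyword module.
print("my_var".isidentifier())     # True
print("2fast".isidentifier())      # False (starts with a digit)
print("my-var".isidentifier())     # False (contains a hyphen)
print(keyword.iskeyword("class"))  # True (reserved word)
```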
Data Types & Collections
- Numeric types: int (integers), float (decimals), complex (e.g. 2+3j).
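- Collection types: list (ordered, mutable, allows duplicates), tuple (ordered, immutable), and string (ordered, immutable) are sequences; set (unordered, no duplicates) and dict (key-value pairs with unique keys) are collections but not sequences, so they do not support positional indexing or slicing.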
- Lists support positive/negative indexing, slicing, and various methods (append, extend, insert, remove, pop, clear, sort, reverse, count).
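A minimal sketch of the types and list operations listed above. The values and names (items, ages, point) are illustrative only.

```python
# Numeric types
n = 7            # int
x = 2.5          # float
z = 2 + 3j       # complex
print(z.real, z.imag)        # 2.0 3.0

# Positive and negative indexing, and slicing
items = [10, 20, 30, 40, 50]
print(items[0], items[-1])   # 10 50
print(items[1:4])            # [20, 30, 40]
print(items[::-1])           # [50, 40, 30, 20, 10]

# Common list methods (each one mutates the list in place)
items.append(60)             # add one element at the end
items.extend([70, 80])       # add every element of another iterable
items.insert(0, 5)           # insert 5 at index 0
items.remove(30)             # remove the first occurrence of 30
last = items.pop()           # remove and return the last element
items.reverse()              # reverse in place
items.sort()                 # sort ascending in place
print(items, last)           # [5, 10, 20, 40, 50, 60, 70] 80
print(items.count(10))       # 1

scratch = items.copy()
scratch.clear()              # empties the copy, leaves items untouched
print(scratch)               # []

# Other collections
point = (3, 4)                   # tuple: ordered, immutable
unique = set([1, 2, 2, 3])       # set: duplicates are dropped
ages = {"ana": 31, "ben": 27}    # dict: key-value pairs (hypothetical data)
ages["cy"] = 45                  # add a new key
print(unique == {1, 2, 3}, ages["cy"])   # True 45
```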
Control Flow: Conditionals & Loops
- Conditional statements: if, if-else, if-elif-else for decision making.
- Loops: for (definite iterations over sequences), while (indefinite iterations while a condition is true).
- Loop control: break (exit loop), continue (skip iteration), pass (do nothing).
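The constructs above can be sketched as follows. The grade thresholds and the function name grade are assumptions chosen for the example, not values from the lecture.

```python
def grade(score):
    """Pick a letter with if / elif / else (thresholds are illustrative)."""
    if score >= 90:
        return "A"
    elif score >= 75:
        return "B"
    else:
        return "C"


# for: a definite number of iterations over a sequence
evens = []
for n in range(1, 11):
    if n % 2 == 1:
        continue          # skip the rest of this iteration for odd numbers
    if n > 8:
        break             # leave the loop entirely
    evens.append(n)
print(evens)              # [2, 4, 6, 8]

# while: repeat for as long as the condition holds
value = 100
halvings = 0
while value > 1:
    value //= 2
    halvings += 1
print(halvings)           # 6

# pass: a placeholder statement that does nothing
for _ in range(3):
    pass

print(grade(92), grade(80), grade(10))   # A B C
```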
Functions
- Functions promote code reuse (the "Don't Repeat Yourself", or DRY, principle), modularity, and easier debugging.
- Defined using def; can accept parameters with default values, and can collect a variable number of extra positional arguments with *args and extra keyword arguments with **kwargs.
- Lambda (anonymous) functions allow short, one-line logic.
- Functional programming: map, filter for element-wise or condition-based operations on collections.
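A minimal sketch of these ideas. The function names (total_price, summarize) and the default tax rate are hypothetical and serve only as illustration.

```python
def total_price(price, tax_rate=0.1):
    """Parameter with a default value: tax_rate is optional for the caller."""
    return round(price * (1 + tax_rate), 2)


def summarize(*args, **kwargs):
    """*args arrives as a tuple, **kwargs as a dict."""
    return len(args), sorted(kwargs)


print(total_price(100))                  # 110.0
print(total_price(100, tax_rate=0.2))    # 120.0
print(summarize(1, 2, 3, unit="kg", label="x"))   # (3, ['label', 'unit'])

numbers = [1, 2, 3, 4, 5, 6]

# lambda: a one-expression anonymous function, often passed to map or filter
squares = list(map(lambda v: v * v, numbers))         # apply to every element
evens = list(filter(lambda v: v % 2 == 0, numbers))   # keep matching elements
print(squares)   # [1, 4, 9, 16, 25, 36]
print(evens)     # [2, 4, 6]
```

In Python 3, map and filter return lazy iterators, which is why the results are wrapped in list() before printing.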
NumPy for Data Science
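- NumPy arrays are typically faster and more memory-efficient than lists for numerical work because they store elements of a single type in a contiguous block and run their operations in compiled C code.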
- Support only one data type per array (homogeneous).
- Enable direct (element-wise) operations, array reshaping, slicing, broadcasting, and many mathematical functions (universal functions, ufuncs).
- Comprehensive tools for handling missing values (np.nan), random generation, aggregations, and linear algebra.
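The features above can be sketched briefly, assuming NumPy is installed (for example via pip install numpy). The array contents and names are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])

# Element-wise operation: no explicit loop needed
doubled = a * 2

# Reshaping and slicing
grid = a.reshape(2, 3)        # 2 rows, 3 columns
first_col = grid[:, 0]        # first column of every row

# Broadcasting: a shape (3,) array is applied to each row of a (2, 3) array
offsets = np.array([10, 20, 30])
shifted = grid + offsets

# Universal function (ufunc) applied element-wise
roots = np.sqrt(np.array([1.0, 4.0, 9.0]))

# Homogeneous dtype: mixing int and float upcasts everything to float
mixed = np.array([1, 2.5])

# Missing values
readings = np.array([1.0, np.nan, 3.0])
missing_mask = np.isnan(readings)
mean_ignoring_nan = np.nanmean(readings)

# Random generation with a seeded generator for reproducibility
rng = np.random.default_rng(seed=0)
samples = rng.normal(size=4)

# Aggregation and linear algebra
col_sums = grid.sum(axis=0)
m = np.array([[2.0, 0.0], [0.0, 4.0]])
m_inv = np.linalg.inv(m)

print(doubled.tolist())       # [2, 4, 6, 8, 10, 12]
print(shifted.tolist())       # [[11, 22, 33], [14, 25, 36]]
print(col_sums.tolist())      # [5, 7, 9]
print(mean_ignoring_nan)      # 2.0
print(mixed.dtype)            # float64
```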
Common Data Science/Interview Questions
- Mutability: lists are mutable, tuples/strings are immutable.
- Difference between shallow and deep copy (copy module).
- List vs tuple vs set vs dictionary: mutability, use cases, uniqueness, indexing.
- List comprehension for concise list creation.
- Scope: global vs local variables.
- Exception handling: try/except/finally blocks.
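- Object-oriented basics: abstraction, encapsulation, inheritance, method overriding, self. Note that Python has no built-in signature-based method overloading; default arguments or *args are the usual substitutes.
- Sorting: sorted() returns a new list from any iterable; list.sort() sorts the list in place and returns None.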
- Generators (yield), decorators (@decorator syntax).
- Linked list and algorithm logic (including sample problems like finding substrings or knapsack).
Key Terms & Definitions
- Interpreted language: Code is run by an interpreter at execution time rather than compiled ahead of time to machine code (CPython first compiles it to bytecode), so errors surface at the statement being executed and the edit-run-debug cycle is short.
- Mutable: Objects that can be changed after creation (e.g., list).
- Immutable: Objects that cannot be modified after creation (e.g., tuple, string).
- Slicing: Extracting a sub-sequence with start:stop:step notation (works on lists, tuples, and strings).
- List comprehension: A concise way to create lists using an expression inside brackets.
- Broadcasting (NumPy): The rule by which a smaller array is virtually stretched across a larger, shape-compatible array so that element-wise operations work without explicit copies.
- Lambda function: Anonymous, inline function defined with the lambda keyword.
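A few of these terms in action, as a minimal sketch with made-up values:

```python
# Immutable: item assignment on a tuple raises TypeError
point = (1, 2)
try:
    point[0] = 5
except TypeError as err:
    error_name = type(err).__name__
print(error_name)             # TypeError

# Immutable: string methods return a new string, the original is unchanged
word = "data"
upper = word.upper()
print(word, upper)            # data DATA

# Slicing works the same way on strings as on lists
print(word[1:3])              # at

# Mutable: a list can be changed in place
values = [1, 2, 3]
values[0] = 100
print(values)                 # [100, 2, 3]

# List comprehension: expression, loop, and optional condition in brackets
even_squares = [n * n for n in range(10) if n % 2 == 0]
print(even_squares)           # [0, 4, 16, 36, 64]
```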
Action Items / Next Steps
- Practice coding: variables, data types, indexing, slicing, and loops.
- Write functions and explore lambda, map, and filter.
- Install and explore NumPy; practice array creation and operations.
- Review and implement common interview questions and data structure problems.
- Complete homework assignments regarding functions, list operations, and algorithmic challenges (e.g., grades, linked lists).
- Read further on Pandas and other data science libraries for next classes.