CS50 Week 5: Data Structures and Design Possibilities

Goals for Today

Revisit past topics focusing on design possibilities via data structures
Understand how to structure data and use memory to solve problems more effectively

Key Topics

Abstract Data Types (ADTs)
Queues
Stacks
Arrays vs. Linked Lists
Trees
Hash Tables
Tries

Abstract Data Types (ADTs)

ADTs provide certain properties and characteristics
Implementation details are up to the programmer
Example: Queue
- FIFO (First In, First Out)
- Operations: enqueue, dequeue
Stack
- LIFO (Last In, First Out)
- Operations: push, pop

Implementing Queues and Stacks in C

Can use arrays to implement basic queues and stacks
Arrays need a specified size, which can be limiting
Example: Array with capacity 50, if 51 people need space, it can lead to issues
For a dynamic size, additional approaches are necessary

Dynamic Memory with Malloc

Use malloc for dynamic memory allocation
Check if malloc returns null
Manually copy data from old array to new larger array
Free old memory

Procedural steps for implementing dynamic arrays have overhead and potential inefficiencies

Linked Lists

Advantages: Dynamic memory allocation, growth and shrinkage as needed
Disadvantages: Pointers add memory overhead, searching takes O(n) time

Basic Node Structure:

typedef struct node {
  int number;
  struct node *next;
} node;

Uses dynamic memory insights from malloc

Example Operations

Inserting at the beginning (Prepending)
- Constant time O(1) for insertions
- Results in items inserted in reverse order
Appending (Inserting at the end)
- O(n) time to traverse to the end
Maintaining Sorted Order
- Requires traversal and appropriate insertion maintaining O(n) time

Trees

Binary Search Trees (BST)
- Nodes with two children pointers ('left' and 'right')
- Properties:
  - Nodes to the left are smaller
  - Nodes to the right are larger
- Advantages: Allows O(log n) operations (search, insert, delete)
- Disadvantages: Can be unbalanced (degrade to O(n))

Tree Operations

Search Function (Recursive)

bool search(node *tree, int number) {
  if (tree == NULL) return false;
  if (number < tree->number) return search(tree->left, number);
  if (number > tree->number) return search(tree->right, number);
  return true;
}

Must maintain balanced trees for optimal performance

Hash Tables

Concept: Map keys to values using a hash function
Example: Phone contacts application
- Uses an array of size 26 for letters A-Z
- Handles collisions with linked lists
- Hash Function
```
unsigned int hash(const char *word) {
  // Implementation example
  return toupper(word[0]) - 'A';
}
```
Collision Handling: Linked Lists
Advantages: Near constant time, O(1), searching with good hash function
Disadvantages: Potential for collisions, worst-case O(n)

Tries (Prefix Trees)

Specialized tree structure for efficient retrieval
Composed of arrays of pointers at each node
Example Names stored: Toad, Toadette, Tom
Properties:
- Efficient prefix matching
- Constant time insertion and lookup
- High memory overhead for wide trees with sparse nodes

Node Structure:

typedef struct node {
  struct node *children[26];
  char *number;
} node;

Used in implementations like contact lists for efficient autocomplete

Practical Applications

Hash tables and tries can be seen in real-world data structures, e.g., dictionaries, contact lists
Trade-offs between memory and time efficiency must be considered carefully

Conclusion

Data structures should be chosen based on the problem requirements and constraints
Understanding the trade-offs between different implementations is crucial for efficient software design
Next week: Moving from low-level manipulations (C) to high-level abstractions (Python)