πŸ—‚οΈ

CS50 Week 5: Data Structures and Design

Jun 13, 2024

CS50 Week 5: Data Structures and Design

Introduction

  • Focus on design possibilities through data structures.
  • How to use computer memory to solve problems more effectively.

Abstract Data Types (ADT)

  • Abstract data type: high-level description of data structure with certain properties.
  • Example: Queue.
    • Characteristic: FIFO (First In, First Out).
    • Real-world example: forming lines or queues.
    • Operations: enqueue and dequeue.
  • Implementation via arrays:
    • Total capacity (e.g., 50 people).
    • Maintains data contiguously in memory.
    • Requires size tracking.
    • Example in C: typedef struct { int capacity; person people[50]; int size; } queue;
  • Upper limit constraints in fixed size allocation.

Stacks

  • Characteristic: LIFO (Last In, First Out).
  • Real-world example: inbox viewed as a stack.
  • Operations: push and pop.
  • Stack vs Queue demonstration with students.
  • Implementation considerations:
    • Efficient for recent data access (newer data on top).
    • Underlying implementation may resemble queues.
  • Code example: typedef struct { int capacity; person people[50]; int top; } stack;
  • Limitation: Fixed size capacity.
  • Solution involving dynamic memory allocation:
    • Allocate arrays dynamically as data grows.
    • Problem: costly copying and reallocation for growing data sets.

Linked Lists

  • Solve the limitation of fixed array sizes.
  • Nodes linked via pointers, not contiguous in memory.
  • Structure of a node: typedef struct node { int number; struct node *next; } node;
  • Advantages:
    • Dynamic memory allocation.
    • Each node links to the next, so size grows without moving data.
  • Implementation in C: int main(void) { node *list = NULL; // Allocate nodes, link them }
  • Insertions can be done in constant time if always prepending.
  • Downsides:
    • Search operations are linear (O(n) time)
    • Potential for messy memory management (e.g., orphaning nodes, leaks).

Binary Search Trees

  • Inventories sorted in a two-dimensional bifurcation manner.
  • Tree invariant: For any node, left subtree nodes < right subtree nodes.
  • Terminology: root, leaf, subtree.
  • Structured for efficient searching:
    • From the root, traverse left or right based on comparisons.
  • Code implementation: typedef struct node { int number; struct node *left; struct node *right; } node;
  • Search recursively. bool search(node *tree, int number) { /* Base case and other conditions */ }
  • Problem with balance: Trees can degenerate into linked lists if unbalanced.
  • Advanced trees (balanced trees) maintain efficient structure.

Hash Tables

  • Data structure combining ideas of arrays & linked lists for quick look-ups.
  • Hash function: Maps inputs (keys) to specific indices in a table.
  • Resolving collisions: Example of linked lists at array indices.
  • Hash table running times:
    • Best case: Constant time with ideal hashes (O(1)).
    • Worst case: Degrades to O(n).
    • Average cases improved with good hash functions.
  • Example: Simple hash function for name unsigned int hash(const char *word) { return toupper(word[0]) - 'A'; }

Tries

  • Tree of arrays facilitating very fast look-ups.
  • Leverages nodes connected via array pointers.
  • Code example defining node in a trie with implied letters: typedef struct node { struct node *children[26]; bool is_word; } node;
  • Ensures very fast retrieval (practical O(1) time).
  • Downsides: Significant memory usage due to space for all possible letter combinations.

Practical Examples

  • Sweetgreen: Organizes orders by customer's initial.
  • Salads hashing to shelves akin to hash table principles.

Summary

  • Various data structures have different trade-offs between time and space.
  • Moving to higher-level languages like Python will simplify implementation while understanding principles is crucial.