CS50 Week 5: Data Structures and Design Possibilities

Jun 26, 2024

CS50 Week 5: Data Structures and Design Possibilities

Goals for Today

  • Revisit past topics focusing on design possibilities via data structures
  • Understand how to structure data and use memory to solve problems more effectively

Key Topics

  • Abstract Data Types (ADTs)
  • Queues
  • Stacks
  • Arrays vs. Linked Lists
  • Trees
  • Hash Tables
  • Tries

Abstract Data Types (ADTs)

  • ADTs provide certain properties and characteristics
  • Implementation details are up to the programmer
  • Example: Queue
    • FIFO (First In, First Out)
    • Operations: enqueue, dequeue
  • Stack
    • LIFO (Last In, First Out)
    • Operations: push, pop

Implementing Queues and Stacks in C

  • Can use arrays to implement basic queues and stacks
  • Arrays need a specified size, which can be limiting
  • Example: Array with capacity 50, if 51 people need space, it can lead to issues
  • For a dynamic size, additional approaches are necessary

Dynamic Memory with Malloc

  1. Use malloc for dynamic memory allocation
  2. Check if malloc returns null
  3. Manually copy data from old array to new larger array
  4. Free old memory
  • Procedural steps for implementing dynamic arrays have overhead and potential inefficiencies

Linked Lists

  • Advantages: Dynamic memory allocation, growth and shrinkage as needed
  • Disadvantages: Pointers add memory overhead, searching takes O(n) time
  • Basic Node Structure:
    typedef struct node {
      int number;
      struct node *next;
    } node;
    
  • Uses dynamic memory insights from malloc

Example Operations

  • Inserting at the beginning (Prepending)
    • Constant time O(1) for insertions
    • Results in items inserted in reverse order
  • Appending (Inserting at the end)
    • O(n) time to traverse to the end
  • Maintaining Sorted Order
    • Requires traversal and appropriate insertion maintaining O(n) time

Trees

  • Binary Search Trees (BST)
    • Nodes with two children pointers ('left' and 'right')
    • Properties:
      • Nodes to the left are smaller
      • Nodes to the right are larger
    • Advantages: Allows O(log n) operations (search, insert, delete)
    • Disadvantages: Can be unbalanced (degrade to O(n))

Tree Operations

  • Search Function (Recursive)
    bool search(node *tree, int number) {
      if (tree == NULL) return false;
      if (number < tree->number) return search(tree->left, number);
      if (number > tree->number) return search(tree->right, number);
      return true;
    }
    
  • Must maintain balanced trees for optimal performance

Hash Tables

  • Concept: Map keys to values using a hash function

  • Example: Phone contacts application

    • Uses an array of size 26 for letters A-Z
    • Handles collisions with linked lists
    • Hash Function
      unsigned int hash(const char *word) {
        // Implementation example
        return toupper(word[0]) - 'A';
      }
      
  • Collision Handling: Linked Lists

  • Advantages: Near constant time, O(1), searching with good hash function

  • Disadvantages: Potential for collisions, worst-case O(n)

Tries (Prefix Trees)

  • Specialized tree structure for efficient retrieval
  • Composed of arrays of pointers at each node
  • Example Names stored: Toad, Toadette, Tom
  • Properties:
    • Efficient prefix matching
    • Constant time insertion and lookup
    • High memory overhead for wide trees with sparse nodes
  • Node Structure:
    typedef struct node {
      struct node *children[26];
      char *number;
    } node;
    
  • Used in implementations like contact lists for efficient autocomplete

Practical Applications

  • Hash tables and tries can be seen in real-world data structures, e.g., dictionaries, contact lists
  • Trade-offs between memory and time efficiency must be considered carefully

Conclusion

  • Data structures should be chosen based on the problem requirements and constraints
  • Understanding the trade-offs between different implementations is crucial for efficient software design
  • Next week: Moving from low-level manipulations (C) to high-level abstractions (Python)