
[Lecture 28] Understanding VLIW and Computer Architecture

Apr 11, 2025

Lecture Notes: Computer Architecture Paradigms

Recap of Previous Topics

  • Microarchitecture
  • Pipelining
  • Precise Exceptions
  • Out of Order Execution
  • Superscalar Execution
  • Branch Prediction
  • All of these are high-impact concepts used in modern processors.

New Paradigms

VLIW (Very Long Instruction Word)

  • Definition: An ISA paradigm in which a single (very long) instruction encodes multiple independent operations.
  • Comparison with Superscalar:
    • Superscalar: Hardware fetches, decodes, and executes multiple instructions per cycle, detecting and managing dependencies itself.
    • VLIW: The compiler identifies independent instructions and packs them into one wide instruction; hardware executes the operations concurrently without dependency checking.
  • Compiler Role: Responsible for finding independent instructions and scheduling them correctly, since the hardware performs no dependency checking.
  • Hardware Simplification:
    • Simple hardware design as it executes instructions without checking dependencies.
    • Compiler must know the hardware pipeline structure to place instructions correctly.
  • Challenges:
    • Difficult to achieve due to the complexity of finding independent instructions.
    • Hardware still needs minimal support for variable latency operations (e.g., memory operations).
  • Practical Impact:
    • VLIW hasn't been widely successful commercially due to challenges with variable latency operations.
    • Successful in domains where static scheduling is feasible, such as DSPs and embedded systems.
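The compiler's packing job described above can be sketched in a few lines. This is a minimal, hypothetical illustration (the instruction format and greedy policy are assumptions, not how any real VLIW compiler works): operations join the current bundle only if they have no data dependence on anything already in it.

```python
# Minimal sketch (hypothetical instruction format) of how a VLIW compiler
# might greedily pack independent operations into fixed-width bundles.

def pack_bundles(instrs, width=3):
    """Greedy in-order packing: an instruction joins the current bundle
    only if it has no data dependence on anything already in it."""
    bundles, current = [], []
    written = set()  # destination registers written by the current bundle
    read = set()     # source registers read by the current bundle
    for dest, srcs in instrs:
        # RAW: reads a register written in this bundle.
        # WAR/WAW: writes a register read or written in this bundle.
        dependent = (set(srcs) & written) or (dest in written | read)
        if dependent or len(current) == width:
            bundles.append(current)          # close the bundle, start fresh
            current, written, read = [], set(), set()
        current.append((dest, srcs))
        written.add(dest)
        read.update(srcs)
    if current:
        bundles.append(current)
    return bundles

# Each instruction: (destination, (sources,))
program = [
    ("r1", ("r2", "r3")),   # r1 = r2 + r3
    ("r4", ("r5", "r6")),   # r4 = r5 + r6  (independent -> same bundle)
    ("r7", ("r1", "r4")),   # r7 = r1 + r4  (depends on both -> new bundle)
]
for i, bundle in enumerate(pack_bundles(program)):
    print(f"bundle {i}: {[dest for dest, _ in bundle]}")
```

A real compiler would do much more (list scheduling against a machine model, filling empty slots with NOPs), but the core task is the same: prove independence statically, then pack.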

VLIW Characteristics

  • Lockstep Execution: All instructions in a bundle start and complete together.
  • Static Scheduling: The compiler handles all scheduling, including predicting the latencies of variable-latency operations.
  • Static Scheduling Challenges:
    • Memory operations often have variable latencies, complicating static scheduling.
    • Compiler predictions can lead to performance issues if incorrect.
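The cost of a wrong latency prediction can be sketched with made-up numbers: because execution is lockstep, a load that misses in the cache stalls every slot of the machine, so the whole schedule pays the full penalty.

```python
# Illustrative sketch (latencies and rates are made up): in a lockstep VLIW,
# a cache-missing load stalls the entire machine, not just dependent slots.

HIT_LATENCY, MISS_LATENCY = 2, 20   # assumed cycle counts

def execution_cycles(bundle_count, loads, miss_rate):
    """One bundle issues per cycle; each load that misses adds the full
    extra miss penalty, because all slots stall together."""
    stall_per_load = miss_rate * (MISS_LATENCY - HIT_LATENCY)
    return bundle_count + loads * stall_per_load

print(execution_cycles(100, loads=20, miss_rate=0.0))   # -> 100.0
print(execution_cycles(100, loads=20, miss_rate=0.1))   # -> 136.0
```

Even a 10% miss rate inflates runtime by 36% here, which is why the compiler's static latency assumptions matter so much.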

VLIW vs. RISC

  • RISC Philosophy: Simple instructions, compiler handles complexity.
  • VLIW Extension: Extends RISC philosophy to multiple instructions per cycle.
  • Benefits of Simple Hardware:
    • Easier to design and lower power consumption.
    • Potentially higher frequency.

Energy Efficiency Considerations

  • Power vs. Energy:
    • Low power does not necessarily mean low energy consumption.
    • A high-power processor may consume less total energy if it finishes the task proportionally faster (energy = power × time).
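The power-vs-energy distinction comes down to energy = power × time. A tiny numeric sketch (the wattages and runtimes are invented for illustration):

```python
# Sketch with made-up numbers: energy = power * time, so a higher-power
# chip can still win on energy if it finishes the task enough faster.

def energy_joules(power_watts, runtime_seconds):
    return power_watts * runtime_seconds

# "Low-power" core: 2 W, but takes 10 s to finish the task.
# "High-power" core: 15 W, finishes the same task in 1 s.
print(energy_joules(2, 10))   # -> 20 (joules)
print(energy_joules(15, 1))   # -> 15 (joules): less energy at 7.5x the power
```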

Historical Attempts at VLIW

  • Companies like Multiflow, Cydrome, and Transmeta attempted VLIW designs.
  • Intel's Itanium (IA-64): An attempt to replace x86 with a new EPIC architecture, a VLIW derivative. Ultimately not commercially successful.
  • AMD's x86-64: Extended the x86 ISA to 64 bits while maintaining backward compatibility, and succeeded commercially.

VLIW Compiler Optimizations

  • Trace Scheduling: Scheduling instructions across basic blocks along frequently executed paths (traces), with fix-up code for the off-trace cases.
  • Superblock Formation: Combining frequently executed basic blocks into a larger single-entry block (via tail duplication) to widen the scheduling scope.
  • Common Subexpression Elimination: Avoiding recomputation of values that have already been computed.
  • Challenges:
    • Larger code size due to tail duplication and fix-up code.
    • Optimizations are heavily profile-dependent.
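Of the optimizations above, common subexpression elimination is the easiest to sketch. Here is a toy pass over a hypothetical three-address representation (the operation format is an assumption for illustration); it assumes no register is reassigned, so textually equal expressions stay equal:

```python
# Toy sketch of common subexpression elimination (CSE) on a linear list of
# three-address operations: reuse an earlier result instead of recomputing.

def eliminate_common_subexpressions(ops):
    """ops: list of (dest, op, src1, src2). Assumes no register is
    reassigned (SSA-like form), so equal expressions remain equal."""
    seen = {}   # (op, src1, src2) -> register already holding that value
    out = []
    for dest, op, a, b in ops:
        key = (op, a, b)
        if key in seen:
            # Recomputation replaced by a copy of the earlier result.
            out.append((dest, "copy", seen[key], None))
        else:
            seen[key] = dest
            out.append((dest, op, a, b))
    return out

ops = [
    ("t1", "+", "a", "b"),
    ("t2", "*", "t1", "c"),
    ("t3", "+", "a", "b"),   # same as t1 -> becomes a copy of t1
]
print(eliminate_common_subexpressions(ops))
```

Real compilers do this on a control-flow graph with value numbering; superblock formation helps precisely because it gives passes like this a larger straight-line region to work on.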

Dynamic ISA Translation

  • Transmeta's Crusoe: Dynamic binary translation from x86 to VLIW.
  • Apple's Rosetta 2: Dynamic binary translation from x86-64 to ARM for Apple silicon.

Conclusion

  • VLIW, while not successful in general-purpose computing, has influenced many compiler optimizations and had success in specialized areas.
  • Understanding these paradigms provides insight into the trade-offs between hardware and software complexities in computer architecture.