🧩

Ambiguity in Language and Code

Nov 7, 2025

Overview

The transcript explains linguistic ambiguity and its parallels in programming languages, focusing on compiler phases and common C++ ambiguities. It provides English analogies, C++ examples, and fixes, then touches on formal language theory and ambiguity-reducing languages.

Linguistic Ambiguity and Programming

  • Ambiguity in human language mirrors issues in programming interpretation.
  • Programmers communicate with compilers; compilers cannot ask for context.
  • Nearly all programming ambiguities parallel human language quirks.

Compiler Phases

  • Lexical analysis: converts character stream into tokens (identifiers, operators).
  • Parsing (syntactic analysis): builds abstract syntax tree (AST) and catches syntax errors.
  • Semantic analysis: checks meaning and context (types, scopes, logical consistency).
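
As a rough illustration of the first phase, the sketch below turns a character stream into tokens. The names TokenKind, Token, and tokenize are assumptions made for this example and are not taken from the transcript; a real lexer would also handle multi-character operators, literals, and comments.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical token kinds for illustration; real lexers have many more.
enum class TokenKind { Identifier, Number, Operator };

struct Token {
    TokenKind kind;
    std::string text;
};

// Lexical analysis: turn a character stream into a flat list of tokens.
std::vector<Token> tokenize(const std::string& src) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) {
            ++i;  // whitespace separates tokens but is not a token itself
        } else if (std::isalpha(c) || c == '_') {
            std::size_t start = i;
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_')) {
                ++i;
            }
            tokens.push_back({TokenKind::Identifier, src.substr(start, i - start)});
        } else if (std::isdigit(c)) {
            std::size_t start = i;
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i]))) {
                ++i;
            }
            tokens.push_back({TokenKind::Number, src.substr(start, i - start)});
        } else {
            // Every other character becomes a one-character operator token.
            tokens.push_back({TokenKind::Operator, std::string(1, src[i])});
            ++i;
        }
    }
    return tokens;
}
```

Calling tokenize("x = y1 + 42;") yields six tokens. A parser would then arrange them into a tree, and semantic analysis would check, for instance, that x and y1 are declared with compatible types.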

English Examples and Parallels

  • Buffalo sentence: same word as noun/verb causing ambiguity.
  • Visiting relatives can be boring: “visiting” as verb or adjective (structural ambiguity).
  • If you see Sam... punctuation ambiguity changes meaning.
  • He said, “she said hi”: quotation marks delimit the embedded statement, making the nesting explicit.
  • Polish ambassadors are rare: capitalization disambiguates Polish vs polish.
  • Colons: “pack the following: snacks, water, maps” clarifies list expectation.

C++ Syntactic Ambiguities (Parsing)

  • Most vexing parse: a statement intended to define a variable is parsed as a function declaration.
  • Fix (modern): use brace initialization to disambiguate variable construction.
  • Dangling else: in nested ifs without braces, else binds to the nearest if, which may not be the one the indentation suggests.
  • Fix: always use curly braces to explicitly group if/else blocks.
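
Both parsing ambiguities can be reproduced in a few lines. The sketch below is illustrative only: Timer, Job, and the function names are hypothetical and do not come from the transcript. Many compilers warn about the unbraced version of the nested if, which is itself a hint that the grouping is easy to misread.

```cpp
#include <string>

struct Timer {};
struct Job {
    explicit Job(Timer) {}
    int run() const { return 42; }
};

int vexing_parse_demo() {
    // Job job(Timer());  // parsed as a function declaration: job takes a
    //                    // pointer to a function returning Timer and
    //                    // returns Job, so job.run() would not compile.
    Job job{Timer{}};     // brace initialization: unambiguously a variable
    return job.run();
}

// The else below binds to the nearest if (b), whatever the indentation implies.
std::string dangling(bool a, bool b) {
    std::string result = "none";
    if (a)
        if (b)
            result = "a and b";
    else
        result = "else";  // runs when a is true and b is false
    return result;
}

// Braces make the intended grouping explicit: else now pairs with if (a).
std::string braced(bool a, bool b) {
    std::string result = "none";
    if (a) {
        if (b) {
            result = "a and b";
        }
    } else {
        result = "else";
    }
    return result;
}
```

With a true and b false, dangling returns "else" while braced returns "none". The two functions differ only in where the else attaches.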

C++ Lexical Ambiguities (Tokenization)

  • Nested templates: in C++98, >> written without a space is tokenized as the right-shift operator.
  • C++98 fix: insert a space (> >) so the lexer produces two closing angle brackets.
  • Modern C++ (C++11 and later): >> is accepted as two nested template closers.
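
A minimal sketch of the two spellings, assuming a C++11 or later compiler. The alias names GridCpp98 and Grid and the helper cell_count are made up for this example.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// C++98 spelling: the space stops the lexer from forming a single >> token.
typedef std::vector<std::vector<int> > GridCpp98;

// C++11 and later: >> is accepted as two closing angle brackets here.
using Grid = std::vector<std::vector<int>>;

static_assert(std::is_same<Grid, GridCpp98>::value,
              "both spellings name the same type");

// Count the elements across all rows of the nested vector.
std::size_t cell_count(const Grid& grid) {
    std::size_t total = 0;
    for (const auto& row : grid) {
        total += row.size();
    }
    return total;
}
```

The static_assert confirms that the space changes only how the text is tokenized, not the type it denotes.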

C++ Semantic Ambiguities (Meaning/Context)

  • Dependent type names: container::const_iterator, where container is a template parameter, needs the typename keyword.
  • Fix: add typename to indicate the dependent name is a type.
  • Dependent templates: calling a member template on an object of dependent type requires the template keyword so that < is parsed as the start of template arguments.
  • Fix: use the template keyword before the member template name.
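
The two keywords can be seen side by side in the sketch below. Converter, its member template as, and the functions sum and halve are hypothetical names chosen for illustration.

```cpp
#include <vector>

// Hypothetical helper type with a member template, used only for illustration.
struct Converter {
    template <typename T>
    T as(int value) const { return static_cast<T>(value); }
};

// Container::const_iterator depends on the template parameter, so typename
// is needed to say that it names a type.
template <typename Container>
int sum(const Container& items) {
    int total = 0;
    for (typename Container::const_iterator it = items.begin();
         it != items.end(); ++it) {
        total += *it;
    }
    return total;
}

// conv has a dependent type, so without the template keyword the < after
// "as" would be read as a less-than operator.
template <typename Conv>
double halve(const Conv& conv, int value) {
    return conv.template as<double>(value) / 2.0;
}
```

Removing either keyword typically produces a compiler error, because the compiler cannot know what a dependent name refers to until the template is instantiated.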

Demonstration: Writing a Tiny Parser

  • Custom parser interprets tokens and prints random lines from a list.
  • Tokens defined include a keyword and semicolon; occurrences trigger output.
  • Shows tokenization and simple interpretation logic in a small program.
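
The transcript's program is not reproduced in these notes, so the following is only a guess at its general shape. The keyword say, the function names, and the rule that each "say ;" statement emits one line are all assumptions. The line picker is passed in as a parameter so the behavior can be made deterministic when needed.

```cpp
#include <cctype>
#include <cstddef>
#include <functional>
#include <random>
#include <string>
#include <vector>

// Split source text into tokens: ';' is its own token, other tokens are
// whitespace-separated words.
std::vector<std::string> split_tokens(const std::string& src) {
    std::vector<std::string> tokens;
    std::string word;
    for (char c : src) {
        if (c == ';' || std::isspace(static_cast<unsigned char>(c))) {
            if (!word.empty()) {
                tokens.push_back(word);
                word.clear();
            }
            if (c == ';') tokens.push_back(";");
        } else {
            word += c;
        }
    }
    if (!word.empty()) tokens.push_back(word);
    return tokens;
}

// Interpret the token stream: each "say ;" statement (hypothetical keyword)
// emits one line chosen by pick, which maps a list size to an index.
std::vector<std::string> interpret(
    const std::vector<std::string>& tokens,
    const std::vector<std::string>& lines,
    const std::function<std::size_t(std::size_t)>& pick) {
    std::vector<std::string> output;
    if (lines.empty()) return output;
    for (std::size_t i = 0; i + 1 < tokens.size(); ++i) {
        if (tokens[i] == "say" && tokens[i + 1] == ";") {
            output.push_back(lines[pick(lines.size()) % lines.size()]);
        }
    }
    return output;
}

// A random picker for normal use; size must be greater than zero.
std::size_t random_index(std::size_t size) {
    static std::mt19937 rng{std::random_device{}()};
    return std::uniform_int_distribution<std::size_t>(0, size - 1)(rng);
}
```

Passing random_index as pick gives the random behavior described above. Tokens that do not form a "say ;" statement are silently ignored here, where a stricter parser would report them as errors.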

Reducing Ambiguity: Languages and Standards

  • Simplified Technical English (ASD-STE100): ~900-word controlled English for clarity.
  • Lisp and Lisp-like languages: parentheses yield unambiguous structure for compilers.
  • Ada and Haskell: comparatively good at reducing lexical ambiguity.
  • C++ historically had notable lexical/syntactic ambiguities.

Common C++ Ambiguities and Fixes

| Issue | Compiler Phase | Cause | Example Symptom | Fix |
|---|---|---|---|---|
| Most vexing parse | Parsing | Declaration parsed as function | Variable intended parsed as function | Use brace initialization or explicit construction |
| Dangling else | Parsing | Else binds to nearest if | Unexpected else branch execution | Always use curly braces to group blocks |
| Nested templates >> | Lexical | >> tokenized as shift-right | Error requiring space between > > | Modern compiler; or insert space in C++98 |
| Dependent type name | Semantic | Ambiguous member as type or value | Missing typename prior to dependent type | Add typename before dependent type |
| Dependent template call | Semantic | < read as less-than, not template args | Use template keyword error | Insert template before member template name |

Key Terms & Definitions

  • Token: smallest meaningful unit (identifier, operator) from lexical analysis.
  • Abstract Syntax Tree (AST): tree representation of program structure.
  • Most vexing parse: C++ grammar ambiguity where a statement intended as a variable definition is parsed as a function declaration.
  • Dangling else: ambiguity in associating an else with one of multiple if statements.
  • Dependent name: name in a template that depends on a template parameter.
  • typename keyword: tells compiler a dependent name is a type.
  • template keyword (dependent): tells the compiler that the following dependent member name is a template, so < starts a template argument list.

Action Items / Next Steps

  • Use braces consistently to prevent dangling else and improve readability.
  • Prefer modern C++ initialization (braces) to avoid most vexing parse.
  • Add typename and template where required for dependent names and templates.
  • Explore a small open-source compiler (e.g., a simple C compiler) to learn phases.
  • Try writing a tiny parser to experience tokenization and interpretation.
  • Study controlled languages (ASD-STE100) and Lisp to understand clarity by design.