Overview
The transcript explains linguistic ambiguity and its parallels in programming languages, focusing on compiler phases and common C++ ambiguities. It provides English analogies, C++ examples, and fixes, then touches on formal language theory and ambiguity-reducing languages.
Linguistic Ambiguity and Programming
- Ambiguity in human language mirrors issues in programming interpretation.
- Programmers communicate with compilers; compilers cannot ask for context.
- Nearly all programming ambiguities parallel human language quirks.
Compiler Phases
- Lexical analysis: converts character stream into tokens (identifiers, operators).
- Parsing (syntactic analysis): builds abstract syntax tree (AST) and catches syntax errors.
- Semantic analysis: checks meaning and context (types, scopes, logical consistency).
English Examples and Parallels
- Buffalo sentence: same word as noun/verb causing ambiguity.
- Visiting relatives can be boring: “visiting” as verb or adjective (structural ambiguity).
- If you see Sam...: where the punctuation falls changes the meaning.
- He said, “she said hi”: nested quotation marks make the nesting explicit.
- Polish ambassadors are rare: capitalization disambiguates Polish vs polish.
- Colons: “pack the following: snacks, water, maps” clarifies list expectation.
C++ Syntactic Ambiguities (Parsing)
- Most vexing parse: declaration parsed as function instead of variable.
- Fix (modern): use brace initialization to disambiguate variable construction.
- Dangling else: an else always binds to the nearest unmatched if, which may not be the if the indentation suggests.
- Fix: always use curly braces to explicitly group if/else blocks.
C++ Lexical Ambiguities (Tokenization)
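- Nested templates: in C++98, >> written without a space is tokenized as the shift-right operator.
- C++98 fix: insert a space between the two closing brackets (> >) so they are not tokenized as a shift.
- Modern C++ (C++11 and later): >> is treated as two closing angle brackets when it ends nested template argument lists.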
C++ Semantic Ambiguities (Meaning/Context)
- Dependent type names: container::const_iterator needs typename keyword.
- Fix: add typename to indicate the dependent name is a type.
- Dependent templates: call requires template keyword to parse < as template args.
- Fix: use template keyword before member template name.
Demonstration: Writing a Tiny Parser
- Custom parser interprets tokens and prints random lines from a list.
- Tokens defined include a keyword and semicolon; occurrences trigger output.
- Shows tokenization and simple interpretation logic in a small program.
Reducing Ambiguity: Languages and Standards
- Simplified Technical English (ASD-STE100): ~900-word controlled English for clarity.
- Lisp and Lisp-like languages: parentheses yield unambiguous structure for compilers.
- Ada and Haskell: comparatively good at reducing lexical ambiguity.
- C++ historically had notable lexical/syntactic ambiguities.
Common C++ Ambiguities and Fixes
| Issue | Compiler Phase | Cause | Example Symptom | Fix |
|---|---|---|---|---|
| Most vexing parse | Parsing | Declaration parsed as function | Intended variable is treated as a function declaration | Use brace initialization or explicit construction |
| Dangling else | Parsing | Else binds to nearest if | Unexpected else branch execution | Always use curly braces to group blocks |
| Nested templates >> | Lexical | >> tokenized as shift-right | Compile error until a space separates > > | Compile as C++11 or later; in C++98, insert a space |
| Dependent type name | Semantic | Ambiguous member as type or value | Compile error when typename is missing before the dependent type | Add typename before dependent type |
| Dependent template call | Semantic | < read as less-than, not template args | Compile error when the template keyword is missing | Insert template before member template name |
Key Terms & Definitions
- Token: smallest meaningful unit (identifier, operator) from lexical analysis.
- Abstract Syntax Tree (AST): tree representation of program structure.
- Most vexing parse: C++ grammar ambiguity where a statement intended as a variable definition is parsed as a function declaration.
- Dangling else: ambiguity in associating an else with one of multiple if statements.
- Dependent name: name in a template that depends on a template parameter.
- typename keyword: tells compiler a dependent name is a type.
- template keyword (dependent): indicates following name is a template for parsing.
Action Items / Next Steps
- Use braces consistently to prevent dangling else and improve readability.
- Prefer modern C++ initialization (braces) to avoid most vexing parse.
- Add typename and template where required for dependent names and templates.
- Explore a small open-source compiler (e.g., a simple C compiler) to learn phases.
- Try writing a tiny parser to experience tokenization and interpretation.
- Study controlled languages (ASD-STE100) and Lisp to understand clarity by design.