
LLM Reasoning and Limitations

Jul 11, 2025

Overview

This lecture covers the reasoning abilities of large language models (LLMs): why pattern matching and token bias lead them astray, and how recent advances such as chain-of-thought prompting and inference-time compute improve their performance.

Example Math Problem and LLM Mistakes

  • LLMs can be misled by irrelevant details in math problems due to their pattern-matching approach.
  • Extraneous information, like “five were smaller,” often triggers LLMs to incorrectly adjust answers.
  • This behavior reflects training data patterns more than actual understanding.
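A small worked example makes the failure concrete. The problem below is modeled on the style the lecture describes (the specific numbers and wording are illustrative, not quoted from the lecture): the "smaller" clause changes nothing about the count, yet a pattern-matching model often subtracts it anyway.

```python
# Distractor-style problem: "Oliver picks 44 kiwis on Friday, 58 on Saturday,
# and on Sunday double Friday's count -- but 5 of Sunday's kiwis were smaller
# than average. How many kiwis does Oliver have?"
friday, saturday = 44, 58
sunday = 2 * friday                    # "double Friday's count"
total = friday + saturday + sunday     # size is irrelevant to the count

# A model matching the surface pattern "number mentioned => use it" will
# often subtract the distractor:
misled = total - 5

print(total)   # 190 -- the correct answer
print(misled)  # 185 -- the typical pattern-matching mistake
```

The arithmetic is trivial; the point is that the correct solver must *ignore* a number the prompt supplies, which cuts against the statistical patterns in training data.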

How LLMs "Reason"

  • LLMs perform probabilistic pattern matching, searching for similar examples in their training data.
  • Most answers are based on statistical likelihoods, not genuine comprehension or logic.
  • LLMs predict the next token (word or part-word) in a sequence, similar to advanced autocomplete.
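The next-token mechanism above can be sketched in miniature. The vocabulary and logit values below are hand-picked toy assumptions, not real model outputs; the sketch only shows the final step every LLM performs: turn scores into probabilities and pick a continuation.

```python
import math

def softmax(scores):
    """Convert raw scores (logits) into a probability distribution."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Toy vocabulary and made-up logits for the context "The cat sat on the ..."
vocab = ["mat", "dog", "moon"]
logits = [3.0, 1.0, 0.5]
probs = softmax(logits)

# Greedy decoding: emit the statistically likeliest next token.
next_token = vocab[probs.index(max(probs))]
print(next_token)  # "mat"
```

A real model does this over a vocabulary of tens of thousands of tokens, with logits produced by the network from the entire preceding context, but the "advanced autocomplete" character is the same: the output is the most probable continuation, not a verified conclusion.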

Token Bias and Prompt Sensitivity

  • Tiny changes in prompts (input tokens) can significantly alter LLM outputs, leading to inconsistent reasoning.
  • Token bias means LLM responses are highly context-sensitive, sometimes causing errors or hallucinations.

Advancements in LLM Reasoning

  • Two primary opportunities for improving LLM reasoning: during model training (training-time compute) and while generating answers (inference-time compute).
  • Chain-of-thought prompting encourages LLMs to display step-by-step reasoning by adding explicit instructions to the prompt.
  • Inference-time compute lets a model spend more time "thinking" before producing an answer, which improves performance on complex reasoning tasks.
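In its simplest form, chain-of-thought prompting is just a wrapper around the question. The helper below is a minimal sketch (the function name and instruction wording are my own, not from the lecture) showing the kind of explicit instruction the technique adds:

```python
def with_chain_of_thought(question):
    """Wrap a question with an explicit step-by-step instruction."""
    return (
        f"{question}\n"
        "Let's think step by step, and state the final answer on the last line."
    )

prompt = with_chain_of_thought(
    "A train travels 60 km in 40 minutes. What is its speed in km/h?"
)
print(prompt)
```

Sent to an LLM, a prompt like this tends to elicit intermediate reasoning steps before the answer; inference-time-compute approaches go further by having the model generate (and sometimes search over) many such reasoning traces before committing to one.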

Philosophical Discussion: Real vs. Simulated Thought

  • LLMs simulate thinking by generating plausible responses but lack true understanding, awareness, or purpose.
  • The difference between thinking and simulation: real thinking involves consciousness and subjective understanding; simulation only mimics patterns.

Key Terms & Definitions

  • LLM (Large Language Model) — An AI system trained on vast amounts of text data to generate human-like responses.
  • Probabilistic Pattern Matching — Matching input to likely outcomes based on patterns learned from data.
  • Token — The smallest unit (word or part-word) used in language models to process text.
  • Token Bias — Sensitivity of LLM responses to small changes in input tokens.
  • Chain-of-Thought Prompting — Prompting technique that encourages step-by-step reasoning.
  • Inference-Time Compute — Allowing the model to spend more computation reasoning before answering.

Action Items / Next Steps

  • Review chain-of-thought prompting techniques.
  • Explore how changing prompts affects LLM outputs.
  • Reflect on the philosophical distinction between real and simulated thought.