
Understanding the Evolution of grep

May 18, 2025

Lecture on 'grep'

Introduction to 'grep'

  • 'grep' is a well-known command in UNIX systems, originating in the early 1970s.
  • Searches one or more files for arbitrary text patterns.
  • Input can come from files or from other programs (e.g., through UNIX pipelines).
  • Filters inputs far larger than a text editor can comfortably handle.
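
To make the filtering idea above concrete, here is a minimal sketch in Python. It is not how 'grep' itself is written: the names grep_lines and minigrep are hypothetical, and Python's 're' syntax differs from the regular expressions of the original tool. The point it illustrates is that lines are examined one at a time, so the input never has to fit in memory.

```python
import re
import sys
from typing import Iterable, Iterator


def grep_lines(pattern: str, lines: Iterable[str]) -> Iterator[str]:
    """Yield each line that contains a match for the regular expression."""
    regex = re.compile(pattern)
    for line in lines:
        # Only the current line is held here, so input size is not a limit.
        if regex.search(line):
            yield line


def minigrep(pattern: str, paths: list[str], source=None, out=None) -> int:
    """Hypothetical driver: search the named files, or `source` if none are named.

    Returns 0 if any line matched and 1 otherwise.
    """
    source = sys.stdin if source is None else source
    out = sys.stdout if out is None else out
    matched = False
    if not paths:
        # No files named: act as a filter on a stream (pipeline use).
        for line in grep_lines(pattern, source):
            out.write(line)
            matched = True
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as handle:
            for line in grep_lines(pattern, handle):
                # Prefix with the file name when several files are searched.
                prefix = f"{path}:" if len(paths) > 1 else ""
                out.write(prefix + line)
                matched = True
    return 0 if matched else 1
```

Calling minigrep("pattern", []) with no file names reads standard input, which is the case where the sketch sits in the middle of a pipeline.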

History of 'grep'

  • The name 'grep' comes from a command in the 'ed' text editor, so its story starts there.
  • Developed in the early UNIX days (circa 1970-71) on a PDP-11 computer.
  • The PDP-11 had limited computing power:
    • 32K to 64K bytes of memory.
    • Small secondary storage.
  • UNIX software was simple due to hardware limitations and the preferences of developers like Ken Thompson and Dennis Ritchie.

The 'ed' Text Editor

  • 'ed' was the standard UNIX text editor, pronounced 'ee dee'.
  • Written by Ken Thompson, inspired by the QED editor.
  • Designed for terminals that printed on paper, not video display terminals.
  • Commands were single-letter (e.g., 'p' for print, 'd' for delete, 's' for substitute).
  • Line addressing was simple; a command is prefixed by the line or range it applies to, and '$' stands for the last line:
    • '1,$p' to print all lines.
    • '$p' to print the last line.
    • '1d' to delete the first line.
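
The three addressing examples above can be modeled with a short Python sketch. This is an illustration only, not the editor's actual code: resolve and run are hypothetical names, the buffer is a plain list of lines, and the sketch requires an explicit address (it has no notion of a current line, and supports only 'p' and 'd').

```python
def resolve(address: str, buffer: list[str]) -> int:
    """Turn an address ('$' or a decimal line number) into a 1-based line number."""
    number = len(buffer) if address == "$" else int(address)
    if not 1 <= number <= len(buffer):
        raise ValueError(f"address out of range: {address}")
    return number


def run(command: str, buffer: list[str]) -> list[str]:
    """Apply one command of the form first[,last]p or first[,last]d.

    Returns the lines printed by 'p' (an empty list for 'd').
    The 'd' command edits `buffer` in place.
    """
    addresses, action = command[:-1], command[-1]
    first_text, _, last_text = addresses.partition(",")
    first = resolve(first_text, buffer)
    # A single address means a one-line range.
    last = resolve(last_text, buffer) if last_text else first
    if first > last:
        raise ValueError("first address exceeds last")
    if action == "p":
        return buffer[first - 1:last]
    if action == "d":
        del buffer[first - 1:last]
        return []
    raise ValueError(f"unsupported command: {action}")


buffer = ["alpha", "beta", "gamma"]
all_lines = run("1,$p", buffer)   # every line
last_line = run("$p", buffer)     # only the last line
run("1d", buffer)                 # removes "alpha" from the buffer
```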

Regular Expressions in 'ed'

  • Regular expressions were a significant feature added by Ken Thompson.
  • They let a command operate on lines described by a text pattern rather than by line number.
  • Written between slashes, as in '/pattern/', to find a line that matches the pattern.

Development of 'grep'

  • Memory constraints limited 'ed' to small files; it could not handle large ones.
  • Lee McMahon wanted to analyze the Federalist Papers, a body of text too large for 'ed' to handle.
  • Ken Thompson created 'grep' to handle this need:
    • Searches for regular expressions in one or more files.
    • Based on the 'g' (global) command in 'ed', which applies a command to every line that matches a pattern.
    • The name 'grep' comes from the command form 'g/re/p' (global / regular expression / print).
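
As a rough illustration of what 'g/re/p' means, the Python sketch below parses a command of that form and returns the lines it would print. The function name global_print is hypothetical, the pattern uses Python's 're' syntax rather than the syntax of 'ed', and escaped slashes inside the pattern are not handled.

```python
import re


def global_print(command: str, buffer: list[str]) -> list[str]:
    """Run a 'g/re/p' style command and return the lines it would print."""
    match = re.fullmatch(r"g/(.*)/p", command)
    if match is None:
        raise ValueError("expected a command of the form g/re/p")
    regex = re.compile(match.group(1))
    # 'g' selects every line where the expression matches; 'p' prints each one.
    return [line for line in buffer if regex.search(line)]


lines = ["the quick brown fox", "jumps over", "the lazy dog"]
with_the = global_print("g/the/p", lines)
with_ov_or_og = global_print("g/o[vg]/p", lines)
```

Here with_the holds the first and third lines, and with_ov_or_og holds the second and third. Running the same selection over files instead of an in-memory buffer is the job 'grep' took over.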

Educational Anecdote

  • In 1993, the lecturer used 'grep' as a class assignment at Princeton.
  • Students received the source code for 'ed' (1800 lines of C) to convert into 'grep'.
  • They had two advantages and one disadvantage:
    • Advantage: they knew the target behavior of 'grep'.
    • Advantage: they worked in C instead of assembly language.
    • Disadvantage: none of them was Ken Thompson, who originally created 'grep'.

Conclusion

  • 'grep' illustrates how UNIX handles text processing efficiently with a small, focused tool.
  • It grew out of a practical need and was implemented quickly by building on what 'ed' already provided.