NLP Overview and Techniques

Aug 28, 2025

Overview

This lecture introduces Natural Language Processing (NLP), its real-world applications, core techniques, and basic steps for processing human language with computers.

What is Natural Language Processing (NLP)?

  • NLP is a field of Artificial Intelligence that enables machines to read, understand, and derive meaning from human languages.
  • NLP combines linguistics and computer science to help computers interpret and process text and speech.

Real-World Applications of NLP

  • Virtual assistants, website chatbots, and automated customer service use NLP to interact with humans.
  • Everyday tools like autocorrect and plagiarism checkers rely on NLP to process and analyze language.
  • By mimicking human conversation, NLP systems automate routine responses and reduce the manual effort needed to handle them.

Basic NLP Techniques and Steps

  • Segmentation breaks a document into sentences, typically at punctuation such as periods, question marks, and exclamation marks.
  • Tokenization splits sentences into individual words, each called a token.
  • Stop words (e.g., 'are', 'the') are filtered out as they add little meaning.
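  • Stemming strips affixes to reduce related words (e.g., 'skipping', 'skipped') to a common stem ('skip'); the stem is not always a dictionary word.
  • Lemmatization maps a word to its base dictionary form (lemma), accounting for tense, gender, etc.; for example, 'am', 'are', and 'is' all map to 'be'.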
  • Part-of-speech tagging labels words as nouns, verbs, etc., to inform grammar understanding.
  • Named Entity Tagging (also known as named entity recognition) flags names of people, places, movies, and other important entities.
  • After these preprocessing steps, a machine learning algorithm such as Naive Bayes is trained on the processed text so the model learns to interpret sentiment and context.
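
The steps above can be chained into a small pipeline. The following is a minimal sketch using only the Python standard library, not a production implementation: the function names, the tiny stop-word list, the suffix-stripping stemmer, and the training sentences are all assumptions made for illustration. Real projects would normally rely on an established NLP library rather than hand-written rules like these.

```python
import math
import re
from collections import Counter, defaultdict

# Tiny illustrative stop-word list (an assumption; real lists are far longer).
STOP_WORDS = {"a", "an", "and", "are", "i", "is", "it", "the", "this", "was"}

def segment(document):
    """Segmentation: split at '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", document.strip()) if s]

def tokenize(sentence):
    """Tokenization: lowercase the sentence and extract word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())

def naive_stem(token):
    """Crude stemming: strip one common suffix, then undouble a final consonant."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            stem = token[: -len(suffix)]
            if stem[-1] == stem[-2] and stem[-1] not in "aeioulsz":
                stem = stem[:-1]  # 'skipp' -> 'skip'
            return stem
    return token

def preprocess(document):
    """Segment, tokenize, drop stop words, and stem; one token list per sentence."""
    return [
        [naive_stem(t) for t in tokenize(sentence) if t not in STOP_WORDS]
        for sentence in segment(document)
    ]

def train_naive_bayes(examples):
    """Count labels and words from (tokens, label) pairs."""
    label_counts, word_counts, vocab = Counter(), defaultdict(Counter), set()
    for tokens, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab

def classify(model, tokens):
    """Return the label with the highest log-probability (add-one smoothing)."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    scores = {}
    for label, count in label_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(count / total_docs)
        for token in tokens:
            if token in vocab:  # ignore words never seen in training
                score += math.log(
                    (word_counts[label][token] + 1) / (total_words + len(vocab))
                )
        scores[label] = score
    return max(scores, key=scores.get)

# Made-up training data: each sentence becomes one labeled example.
training_text = [
    ("I loved the acting. Great movie!", "positive"),
    ("The plot was boring. Terrible movie!", "negative"),
]
examples = [
    (tokens, label) for text, label in training_text for tokens in preprocess(text)
]
model = train_naive_bayes(examples)

print(preprocess("The dogs are skipping. She skipped lunch!"))
# -> [['dog', 'skip'], ['she', 'skip', 'lunch']]
print(classify(model, preprocess("Loved it!")[0]))  # -> positive
```

Note that the stemmer turns 'loved' into 'lov', which is not a real word. That is typical of stemming and is harmless here, because the same stem is produced at training time and at prediction time.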

Common NLP Exam Question

  • Example: Which NLP technique splits a sentence into individual words?
    a) Stemming, b) Tokenization, c) Lemmatization, d) Segmentation

Key Terms & Definitions

  • NLP (Natural Language Processing) — AI branch that processes and understands human language.
  • Tokenization — Breaking text into individual words or tokens.
  • Stop Words — Common words with little meaning, often removed in preprocessing.
  • Stemming — Cutting words down to their root form by removing suffixes or prefixes.
  • Lemmatization — Reducing words to their base or dictionary form (lemma).
  • Part-of-Speech Tagging — Assigning labels (noun, verb, etc.) to each word.
  • Named Entity Tagging — Identifying proper nouns like names, places, and brands.
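
The difference between the stemming and lemmatization definitions above is easiest to see side by side. The sketch below is a toy illustration under stated assumptions: `crude_stem` is a hypothetical suffix stripper (not a standard algorithm such as Porter's), and `LEMMAS` is a hand-written lookup table standing in for the large dictionaries that real lemmatizers use.

```python
# Hypothetical toy lemma dictionary; real lemmatizers use large lexicons
# and usually the word's part of speech as well.
LEMMAS = {
    "am": "be", "are": "be", "is": "be", "was": "be",
    "mice": "mouse", "studies": "study", "better": "good",
}

def crude_stem(word):
    """Strip one common suffix if at least three letters remain."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def toy_lemmatize(word):
    """Look the word up in the dictionary; fall back to the word itself."""
    return LEMMAS.get(word, word)

def compare(words):
    """Return (word, stem, lemma) triples for side-by-side comparison."""
    return [(w, crude_stem(w), toy_lemmatize(w)) for w in words]

for word, stem, lemma in compare(["studies", "mice", "was", "better"]):
    print(f"{word:8} stem={stem:8} lemma={lemma}")
```

Here 'studies' stems to 'studi' (not a dictionary word) but lemmatizes to 'study', while irregular forms such as 'mice' and 'was' pass through the stemmer unchanged and only the dictionary lookup recovers 'mouse' and 'be'.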

Action Items / Next Steps

  • Review NLP techniques and their definitions.
  • Answer the example exam question about tokenization.
  • Explore further learning resources or courses for in-depth NLP study.