Understanding CRFs for Named Entity Recognition
Aug 2, 2024
Conditional Random Fields (CRF) for Named Entity Recognition (NER)
Introduction
Discussion of how Conditional Random Fields (CRFs) work for extracting named entities from text.
Overview of topics:
Named Entity Recognition (NER)
Information Extraction (IE)
CRF basics
Information Extraction
Definition: Extracting important information from long text.
Examples of information extracted:
Names of persons, organizations, locations.
E.g., "Ram works at Google" (Entities: Ram - person, Google - organization).
Named Entity Recognition (NER)
Part of Information Extraction focused on identifying proper nouns.
Tags for entities:
Person: PER
Organization: ORG
Location: LOC
Geopolitical Entity: GPE
Challenges in NER
Segmentation Ambiguity:
Example: "New York" as a city name.
Difficulty in training models to recognize compound entities.
Tag Assignment Ambiguity:
Example: "Nirma" can refer to a person or a brand.
Example: "Apple" can refer to a fruit or a tech organization.
Approaches to NER
Common methods include:
Linear CRFs
Maximum Entropy Markov Models
BiLSTMs
Conditional Random Fields (CRF)
Definition: A linear-chain CRF assigns each word a tag based on the previous word's tag together with feature functions of the sentence.
Example sentence: "CRF is among the most prominent approaches used for NER."
Feature Functions
Definition: Functions that generate useful features for individual words.
Examples of feature functions:
Is the first letter capital?
Is a vowel present?
Is the word in a gazetteer?
Word embeddings, presence of hyphens, etc.
Output of feature functions helps in building word-level features.
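The word-level features listed above can be sketched as a small Python function; the specific feature names and the toy gazetteer below are illustrative assumptions, not part of the original notes:

```python
# A minimal sketch of word-level feature extraction for CRF-style NER.
# GAZETTEER is a toy lookup list standing in for a real gazetteer.
GAZETTEER = {"google", "new york", "london"}

def word_features(sentence, i):
    """Return a dict of features for the word at index i of a tokenized sentence."""
    word = sentence[i]
    return {
        "is_capitalized": word[0].isupper(),
        "has_vowel": any(c in "aeiouAEIOU" for c in word),
        "in_gazetteer": word.lower() in GAZETTEER,
        "has_hyphen": "-" in word,
        "lowercase": word.lower(),
    }

print(word_features(["Ram", "works", "at", "Google"], 3))
```

Each word thus becomes a bundle of features that the CRF can weight when assigning tags.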
CRF Process Example
Input: Sentence "Ram is cool" with NER labels:
PER O O
(Ram - person, other words - not entities).
Tags explained:
PER: Person (Ram)
O: Other (non-entities)
Feature Function Signature
Signature format: f(x, y_i, y_{i-1}, i)
x: Sentence (e.g., "Ram is cool")
y_i: Current word's tag
y_{i-1}: Tag of the previous word
i: Index of the current word in the sentence
Main Equation in CRF
Equation for the probability of a tag sequence given the sentence:
P(y|x) = (1/Z(x)) exp(∑_i ∑_j w_j f_j(x, y_i, y_{i-1}, i))
Z(x) = ∑_ŷ exp(∑_i ∑_j w_j f_j(x, ŷ_i, ŷ_{i-1}, i))
Explanation of the Equation
f_j(x, y_i, y_{i-1}, i): Feature function j evaluated at position i of the sentence.
w_j: Weight assigned to feature function j.
Z(x): Normalization constant, summing the exponentiated scores over all possible tag sequences so that the probabilities sum to 1.
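The equation can be verified by brute force on the "Ram is cool" example: score every tag sequence, exponentiate, and normalize. The two feature functions and their weights below are illustrative assumptions:

```python
from itertools import product
from math import exp

# Brute-force P(y|x) for "Ram is cool" with tag set {PER, O}.

def f_cap_person(x, y, y_prev, i):
    return 1 if x[i][0].isupper() and y == "PER" else 0

def f_lower_other(x, y, y_prev, i):
    return 1 if x[i].islower() and y == "O" else 0

FEATURES = [f_cap_person, f_lower_other]
WEIGHTS = [2.0, 1.5]  # illustrative weights; in practice these are learned

def score(x, tags):
    """The exponent in the CRF equation: sum of w_j * f_j over all positions."""
    total = 0.0
    for i in range(len(x)):
        y_prev = tags[i - 1] if i > 0 else "START"
        for w, f in zip(WEIGHTS, FEATURES):
            total += w * f(x, tags[i], y_prev, i)
    return total

def prob(x, tags, tagset=("PER", "O")):
    """P(tags | x) = exp(score) / Z(x), with Z(x) summed over all tag sequences."""
    Z = sum(exp(score(x, y_hat)) for y_hat in product(tagset, repeat=len(x)))
    return exp(score(x, tags)) / Z

x = ["Ram", "is", "cool"]
print(prob(x, ("PER", "O", "O")))  # the gold tagging gets the highest probability
```

Enumerating all tag sequences is exponential in sentence length; real CRF implementations compute Z(x) efficiently with the forward algorithm, but the brute-force version makes the equation concrete.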
Calculating Weights (Training)
Weights are learned from labeled training data by maximizing the log-likelihood, typically with gradient descent.
Important for adjusting the influence of different feature functions in predictions.
Conclusion
Understanding CRF's role in NER helps in effectively tagging entities in text.
The complexity of feature functions and tag assignment is managed through statistical models like CRFs.