Overview of Information Retrieval Techniques

Aug 31, 2024

Information Retrieval and Web Search

Introduction to Information Retrieval

  • Information retrieval (IR) is defined as finding documents that satisfy an information need.
  • Commonly involves unstructured data, usually text-based.
  • Examples beyond web search:
    • Music information retrieval (sounds)
    • Email searches
    • Searching company knowledge bases
    • Legal information retrieval.

Historical Context

  • Mid-1990s:
    • A larger volume of data existed in unstructured form than in structured form (e.g., relational databases).
    • Tools for managing unstructured data were underdeveloped compared with those for managing structured data.
  • Post-2000:
    • Shift in the landscape; growth of unstructured data due to blogs, tweets, forums, etc.
    • Emergence of major companies addressing unstructured information retrieval.

Basic Framework of Information Retrieval

  1. Static Document Collection
    • Begin with a collection of documents to search through.
    • Later discussions will include dynamic collections (addition/deletion of documents).
  2. User Task and Information Need
    • User has a task (e.g., getting rid of mice humanely).
    • Translate the task into an information need (e.g., how to trap mice alive).
    • This need is formulated as a query (e.g., "How to trap mice alive?").
  3. Interrogating the Document Collection
    • The query is sent to the search engine, which retrieves documents it judges relevant.
    • The user can refine the query based on the initial search results.
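
The framework above can be sketched in a few lines of Python. This is only a minimal illustration, assuming a tiny in-memory collection and simple Boolean AND matching on lowercase word tokens; the documents, the `tokenize` and `search` helpers, and the refined query are all invented for this example and are not taken from the notes.

```python
import re

def tokenize(text):
    """Lowercase the text and return its set of alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))

def search(query, collection):
    """Return ids of documents that contain every query term (Boolean AND)."""
    terms = tokenize(query)
    return [doc_id for doc_id, text in collection.items()
            if terms <= tokenize(text)]

# Hypothetical static document collection, invented for illustration.
collection = {
    1: "How to trap mice alive using a humane catch-and-release trap",
    2: "Poison baits for killing mice and rats",
    3: "Humane ways to keep mice out of the house",
}

# Step 3: interrogate the collection with the initial query.
first = search("how to trap mice alive", collection)

# Refinement: a shorter query matches more documents in this toy collection.
refined = search("humane mice", collection)

print(first, refined)
```

In this toy setup the initial query matches only document 1, while the refined query also picks up document 3. Real search engines rank results rather than requiring every term to match, so this should be read as a sketch of the query-and-refine loop, not of how a production system works.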

Challenges in Information Retrieval

  • The user may misunderstand the task and frame the wrong information need.
  • Errors in query formulation:
    • Wrong terminology or misuse of search operators can degrade the search results.
    • The choice of query terms influences how relevant the returned documents are.

Evaluating Information Retrieval Systems

  • Two key metrics:
    1. Precision
      • The fraction of retrieved documents that are relevant to the user's information need.
      • Example: If only 1 out of 10 retrieved documents is relevant, precision is 1/10 = 0.1, which is low.
    2. Recall
      • The fraction of all relevant documents in the collection that the system retrieves.
  • Both metrics should be assessed relative to the user's information need, not just the literal query.
  • A misformulated query can lower the precision of the results.
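
To make the two metrics concrete, here is a small Python sketch that computes them from a list of retrieved document ids and a set of ids judged relevant. The function name `precision_recall` and the example ids are assumptions made for illustration; in practice the relevance judgments come from human assessors working from the information need.

```python
def precision_recall(retrieved, relevant):
    """Compute (precision, recall) for one query.

    retrieved: ids of documents the system returned
    relevant:  ids of documents judged relevant to the information need
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: 10 documents retrieved, 4 relevant documents exist
# in the collection, and only one of them (id 3) was retrieved.
retrieved = list(range(1, 11))
relevant = {3, 42, 57, 80}

p, r = precision_recall(retrieved, relevant)
print(p, r)  # 0.1 0.25
```

Here precision is 1/10 because only one of the ten returned documents is relevant, and recall is 1/4 because the system found one of the four relevant documents in the collection.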

Conclusion

  • The discussion provides a basic understanding of information retrieval and the evaluation of search engines.
  • Further exploration of queries and evaluation metrics will follow in upcoming segments.