In-Depth Elasticsearch Insights

Dec 17, 2024

Deep Dive Series: Elasticsearch Lecture Notes

Introduction

  • Speaker: Stefan, Co-founder of Hello Interview
  • Topics Covered:
    • How Elasticsearch is used in interviews and system design.
    • Elasticsearch internals and how a cluster operates.
  • Purpose: To provide in-depth knowledge relevant for tech interviews.

Section 1: Using Elasticsearch in System Design

Overview of Search

  • Search Experience Components:
    • Criteria input (e.g., searching for books by title or price).
    • Sorting and refining results (faceted search).
    • Result as documents (JSON blobs).

Key Concepts in Elasticsearch

  • Document: JSON blobs containing data.
  • Index: Collection of documents for searching.
  • Mapping and Fields: Define searchable schema and types (e.g., price as a float).
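
To make the three concepts above concrete, here is a minimal sketch of a mapping and a matching document, written as plain Python dictionaries. The `books` data, the field names, and the `unmapped_fields` helper are illustrative assumptions, not something taken from the lecture.

```python
import json

# Hypothetical mapping for a "books" index; field names are assumptions.
books_mapping = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},      # analyzed for full-text search
            "author": {"type": "keyword"},  # exact-match filtering and aggregations
            "price": {"type": "float"},     # numeric: range queries and sorting
        }
    }
}

# A document is a JSON object whose fields line up with the mapping.
book_doc = {
    "title": "The Great Gatsby",
    "author": "F. Scott Fitzgerald",
    "price": 12.99,
}


def unmapped_fields(doc, mapping):
    """Return the document fields that have no explicit entry in the mapping."""
    known = mapping["mappings"]["properties"]
    return sorted(field for field in doc if field not in known)


print(json.dumps(books_mapping, indent=2))
print(unmapped_fields(book_doc, books_mapping))
```

Declaring `price` as a float rather than leaving it as text is what lets it take part in range queries and numeric sorting.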

API and Operations

  • Index Creation:
    • RESTful API (PUT /index).
    • Optional configurations (e.g., shards, replicas).
  • Document Insertion:
    • POST /index/_doc with the document as the JSON request body.
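
The two operations above can be sketched as request descriptions built in Python. Nothing is sent over the network here; the helper names and the `books` index are hypothetical, and only the `number_of_shards` and `number_of_replicas` settings and the `_doc` endpoint are standard Elasticsearch names.

```python
import json


def create_index_request(index, shards=1, replicas=1):
    """Describe the PUT request that creates an index with optional settings."""
    body = {
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        }
    }
    return "PUT", f"/{index}", json.dumps(body)


def index_document_request(index, doc):
    """Describe the POST request that adds one document with a generated ID."""
    return "POST", f"/{index}/_doc", json.dumps(doc)


# Hypothetical usage: a "books" index with three primary shards.
print(create_index_request("books", shards=3, replicas=1))
print(index_document_request("books", {"title": "The Great Gatsby", "price": 12.99}))
```

Each tuple could be passed to any HTTP client; keeping the construction separate from the transport makes the request shape easy to inspect.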

Searching

  • Search API:
    • JSON-based query DSL that plays a role similar to SQL (e.g., match and range queries).
  • Sorting:
    • Based on fields or relevance score (_score).
  • Pagination:
    • Stateful (cursors) vs. stateless (from and size, search_after).
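
As a rough illustration of the search features above, the sketch below assembles a search request body that combines a match query, a range filter, sorting, and stateless pagination with `search_after`. The field names (`title`, `price`, `book_id`) and the helper function are assumptions made for the example; `book_id` stands in for a unique keyword field used as a tiebreaker.

```python
def build_search_body(text, max_price, page_size=10, search_after=None):
    """Build a search body: full-text match, price filter, sort, pagination."""
    body = {
        "query": {
            "bool": {
                # Scored full-text match on the title.
                "must": [{"match": {"title": text}}],
                # Unscored filter: keep only books at or below max_price.
                "filter": [{"range": {"price": {"lte": max_price}}}],
            }
        },
        # Relevance first, then price, then a unique tiebreaker field.
        "sort": [{"_score": "desc"}, {"price": "asc"}, {"book_id": "asc"}],
        "size": page_size,
    }
    if search_after is not None:
        # Sort values of the last hit on the previous page.
        body["search_after"] = search_after
    return body


first_page = build_search_body("gatsby", max_price=20.0)
next_page = build_search_body("gatsby", max_price=20.0,
                              search_after=[1.5, 12.99, "b-42"])
print(first_page)
print(next_page["search_after"])
```

Unlike `from` and `size`, which make the cluster skip over every earlier hit on each request, `search_after` resumes from the last hit's sort values, so deep pages stay cheap without any server-side cursor.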

Use Cases and Considerations

  • Use Cases: Geospatial, vector search, full-text search.
  • Considerations:
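    • Not a primary database: keep the source of truth in a durable store and sync changes into the index.
    • Works best with read-heavy workloads; frequent updates are comparatively expensive.
    • Eventually consistent: newly indexed documents become searchable only after a refresh (about one second by default).
    • Joins are limited, so data usually has to be denormalized for efficient queries.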

Section 2: Elasticsearch Internals

Architecture

  • Components of a Cluster:
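    • Master Node: Cluster-level administrative tasks (e.g., creating indices, adding or removing nodes).
    • Coordinating Node: API layer; receives search requests, fans them out to data nodes, and merges the results.
    • Data Node: Stores index data and executes queries against it.
    • Ingest Node: Processes and transforms incoming documents before they are indexed.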
    • Machine Learning Node: For ML tasks.

Shards and Replication

  • Shards:
    • Divide an index's data across nodes.
    • Enable horizontal scaling and parallel queries.
  • Replicas:
    • Copies of shards for fault tolerance and load distribution.
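
A simplified sketch of how a document could be assigned to a primary shard is shown below. It assumes the basic "hash the routing value, take it modulo the number of primary shards" idea; Elasticsearch itself uses a Murmur3 hash and a slightly more involved formula, so `zlib.crc32` here is only a stand-in, and the function name is hypothetical.

```python
import zlib
from collections import Counter


def pick_shard(doc_id, num_primary_shards):
    """Map a document ID to a primary shard number (simplified routing)."""
    # Stand-in hash: Elasticsearch uses Murmur3, not CRC32.
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


# Spread 1,000 hypothetical document IDs over 3 primary shards.
assignments = Counter(pick_shard(f"book-{i}", 3) for i in range(1000))
print(dict(assignments))
```

Because the shard number depends on the modulus, changing the number of primary shards would send existing IDs to different shards, which is one reason the primary shard count is chosen when the index is created. Replicas do not affect this routing: each replica is a copy of one primary shard.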

Lucene and Indexing

  • Lucene's Role:
    • Low-level search operations within nodes.
    • Manages immutable segments for documents.
  • Inverted Index:
    • Maps terms to document IDs for fast search.
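
The inverted index idea above can be sketched in a few lines of Python. This is a toy version under simple assumptions (lowercasing and whitespace tokenization only); Lucene's real analyzers and on-disk segment format are far more elaborate, and the function names are made up for the example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search_all_terms(index, query):
    """Return IDs of documents containing every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


docs = {
    1: "The Great Gatsby",
    2: "Great Expectations",
    3: "The Old Man and the Sea",
}
index = build_inverted_index(docs)
print(search_all_terms(index, "great"))      # {1, 2}
print(search_all_terms(index, "the great"))  # {1}
```

A lookup touches only the posting sets for the query terms rather than scanning every document, which is where the speed comes from.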

Optimization Techniques

  • Query Planning:
    • Coordinating nodes optimize search execution.
  • Doc Values:
    • Columnar storage for efficient retrieval of specific fields.
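
To illustrate why a columnar layout helps, the toy sketch below stores each field as its own list indexed by document position, so sorting by price reads one column instead of every full document. The data and helper names are assumptions for the example; actual doc values are an on-disk Lucene structure.

```python
docs = [
    {"title": "The Great Gatsby", "price": 12.99},
    {"title": "Great Expectations", "price": 8.50},
    {"title": "The Old Man and the Sea", "price": 20.00},
]


def build_doc_values(docs, fields):
    """Store each field as a column: position i holds the value for doc i."""
    return {field: [doc.get(field) for doc in docs] for field in fields}


def sort_doc_ids_by(column, doc_ids):
    """Sort document positions using only the values in one column."""
    return sorted(doc_ids, key=lambda doc_id: column[doc_id])


doc_values = build_doc_values(docs, ["price"])
print(doc_values["price"])                              # [12.99, 8.5, 20.0]
print(sort_doc_ids_by(doc_values["price"], [0, 1, 2]))  # [1, 0, 2]
```

The same column-per-field layout serves sorting and aggregations, which need one field's value for many documents at once.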

Document Life Cycle

  • Ingestion:
    • Client -> Ingest Node -> Data Node -> Lucene Index.
  • Search Execution:
    • Client -> Coordinating Node -> Data Nodes -> Lucene Segments.
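
The search path above is a scatter-gather pattern, sketched below under simplified assumptions: each shard is a plain list of `(score, doc_id)` pairs, each shard returns its local top k, and the coordinating step merges those candidates into a global top k. The function names and sample hits are hypothetical.

```python
import heapq


def shard_top_k(shard_hits, k):
    """Data-node side: return the k best (score, doc_id) pairs on one shard."""
    return heapq.nlargest(k, shard_hits)


def coordinate_search(shards, k):
    """Coordinating-node side: gather per-shard results and keep the global top k."""
    candidates = []
    for shard_hits in shards:
        candidates.extend(shard_top_k(shard_hits, k))
    return heapq.nlargest(k, candidates)


shards = [
    [(1.2, "a"), (0.4, "b")],  # hits on shard 0
    [(2.0, "c"), (0.9, "d")],  # hits on shard 1
]
print(coordinate_search(shards, 2))  # [(2.0, 'c'), (1.2, 'a')]
```

Each shard only has to return k candidates, so the coordinating step merges at most k times the number of shards entries regardless of how many documents matched.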

Conclusion

  • Applications: Ideal for high-performance and complex search scenarios.
  • Interview Application: Useful in system design and infrastructure interviews.
  • Further Learning: Subscribe for more content on system design and interviews.