In-Depth Elastic Search Insights

Dec 17, 2024

Deep Dive Series: Elastic Search Lecture Notes

Introduction

Speaker: Stefan, Co-founder of Hello Interview
Topics Covered:
- Usage of Elastic Search in interviews and system design.
- Elastic Search internals and cluster functionalities.
Purpose: To provide in-depth knowledge relevant for tech interviews.

Section 1: Using Elastic Search in System Design

Overview of Search

Search Experience Components:
- Criteria input (e.g., searching for books by title or price).
- Sorting and refining results (faceted search).
- Results returned as documents (JSON blobs).

Key Concepts in Elastic Search

Document: JSON blobs containing data.
Index: Collection of documents for searching.
Mapping and Fields: Define searchable schema and types (e.g., price as a float).
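As a rough illustration of these three concepts, the sketch below models a mapping and a document as plain Python dictionaries and checks the document against the mapping. The `books` field names and the `conforms` helper are hypothetical, and the type table is a simplification: Elasticsearch does its own validation and coercion on the server.

```python
# Hypothetical mapping for a "books" index: field name -> Elasticsearch field type.
book_mapping = {
    "properties": {
        "title": {"type": "text"},      # analyzed for full-text search
        "author": {"type": "keyword"},  # stored as-is for exact matches and sorting
        "price": {"type": "float"},
    }
}

# A document is a JSON-like blob.
book = {"title": "The Great Gatsby", "author": "F. Scott Fitzgerald", "price": 9.99}

# Simplified Python-side view of which values each type accepts
# (an assumption for illustration, not Elasticsearch's real coercion rules).
PY_TYPES = {"text": str, "keyword": str, "float": (int, float)}


def conforms(doc, mapping):
    """Return True if every mapped field present in doc has a compatible type."""
    props = mapping["properties"]
    return all(
        isinstance(value, PY_TYPES[props[field]["type"]])
        for field, value in doc.items()
        if field in props
    )


print(conforms(book, book_mapping))
```

An index, in this picture, is simply the collection that holds many such documents under one mapping.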

API and Operations

Index Creation:
- RESTful API (PUT /index).
- Optional configurations (e.g., shards, replicas).
Document Insertion:
- POST /index/_doc with the document contents as the request body.
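The two calls above can be sketched as request builders. This is a minimal sketch that only constructs the method, path, and JSON body without sending anything; the helper names and the `books` index are hypothetical, while `number_of_shards` and `number_of_replicas` are the index settings that control shards and replicas.

```python
import json


def create_index_request(index, shards=1, replicas=1):
    """Build the PUT request that creates an index with optional settings."""
    body = {"settings": {"number_of_shards": shards, "number_of_replicas": replicas}}
    return "PUT", f"/{index}", json.dumps(body)


def index_document_request(index, doc):
    """Build the POST request that inserts one document into an index."""
    return "POST", f"/{index}/_doc", json.dumps(doc)


# Hypothetical usage: a "books" index with three primary shards.
print(create_index_request("books", shards=3, replicas=1))
print(index_document_request("books", {"title": "The Great Gatsby", "price": 9.99}))
```

To run these against a real cluster, the tuples could be handed to any HTTP client with a `Content-Type: application/json` header.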

Searching

Search API:
- JSON-based query DSL that plays a role comparable to SQL (e.g., match and range queries).
Sorting:
- Based on fields or relevance score (_score).
Pagination:
- Stateful (cursors) vs. stateless (from and size, search_after).
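To make the query, sorting, and stateless pagination options concrete, here is a minimal sketch that builds a search request body as a Python dictionary. The `title`, `price`, and `book_id` fields are hypothetical, and `book_id` is assumed to be a unique keyword field used as a sort tiebreaker so that `search_after` has a stable ordering to resume from.

```python
def search_body(text, min_price, max_price, size=10, offset=None, search_after=None):
    """Build a search body with a match query, a range filter, a sort, and paging."""
    if offset is not None and search_after is not None:
        raise ValueError("use either offset (from/size) or search_after, not both")

    body = {
        "query": {
            "bool": {
                "must": [{"match": {"title": text}}],
                "filter": [{"range": {"price": {"gte": min_price, "lte": max_price}}}],
            }
        },
        # Sort by price, then by the hypothetical unique book_id as a tiebreaker.
        "sort": [{"price": "asc"}, {"book_id": "asc"}],
        "size": size,
    }
    if offset is not None:
        body["from"] = offset  # from/size: skip `offset` hits
    if search_after is not None:
        body["search_after"] = search_after  # sort values of the last hit seen
    return body


first_page = search_body("gatsby", 5, 20)
next_page = search_body("gatsby", 5, 20, search_after=[9.99, "b-17"])
print(next_page["search_after"])
```

With `search_after`, the client passes the sort values of the last hit from the previous page, so the server does not have to hold any cursor state between requests.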

Use Cases and Considerations

Use Cases: Geospatial, vector search, full-text search.
Considerations:
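- Not a primary database; keep the source of truth in another store.
- Works best with read-heavy workloads.
- Eventually consistent: newly indexed documents become searchable only after a refresh, not immediately.
- Requires denormalization for efficiency.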

Section 2: Elastic Search Internals

Architecture

Components of a Cluster:
- Master Node: Administrative tasks.
- Coordinating Node: API layer, handles search requests.
- Data Node: Stores index data.
- Ingest Node: Processes incoming documents.
- Machine Learning Node: For ML tasks.

Shards and Replication

Shards:
- Divides data across nodes.
- Enables scalability and parallel queries.
Replicas:
- Copies of shards for fault tolerance and load distribution.
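The way a document lands on a shard can be sketched as hashing a routing key modulo the number of primary shards. This is a simplified sketch: `zlib.crc32` stands in for the hash Elasticsearch actually uses, and `place_copies` is a hypothetical toy placement that only shows why a primary and its replica live on different nodes.

```python
import zlib


def shard_for(routing_key, num_primary_shards):
    """Pick a primary shard for a document from its routing key (by default its ID)."""
    # crc32 is a stand-in hash chosen for this sketch, not the real one.
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards


def place_copies(num_shards, nodes):
    """Toy placement: put each primary on one node and its replica on the next node."""
    if len(nodes) < 2:
        raise ValueError("need at least two nodes to separate primary and replica")
    placement = {}
    for shard in range(num_shards):
        placement[shard] = {
            "primary": nodes[shard % len(nodes)],
            "replica": nodes[(shard + 1) % len(nodes)],
        }
    return placement


print(shard_for("book-1", 3))
print(place_copies(3, ["node-a", "node-b", "node-c"]))
```

Because the shard is derived from the number of primary shards, changing that number would send existing documents to different shards, which is one reason the primary shard count is chosen at index creation.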

Lucene and Indexing

Lucene’s Role:
- Low-level search operations within nodes.
- Manages immutable segments for documents.
Inverted Index:
- Maps terms to document IDs for fast search.
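A toy inverted index makes the term-to-document mapping concrete. This is a minimal sketch with whitespace tokenization and lowercasing only; Lucene's analyzers, scoring, and segment files are far richer and are not modeled here.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search_all_terms(index, query):
    """Return the IDs of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


titles = {
    1: "The Great Gatsby",
    2: "Great Expectations",
    3: "The Old Man and the Sea",
}
title_index = build_inverted_index(titles)
print(search_all_terms(title_index, "great"))
print(search_all_terms(title_index, "the great"))
```

A lookup touches only the postings for the query terms rather than scanning every document, which is where the speed comes from.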

Optimization Techniques

Query Planning:
- Coordinating nodes optimize search execution.
Doc Values:
- Columnar storage for efficient retrieval of specific fields.
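The idea behind doc values can be sketched by pulling one field out of every document into a single column indexed by document position. This is an illustrative in-memory sketch with hypothetical helper names, not Lucene's on-disk format.

```python
priced_books = [
    {"title": "Book A", "price": 12.5},
    {"title": "Book B", "price": 7.0},
    {"title": "Book C", "price": 9.99},
]


def build_doc_values(docs, field):
    """Store one field for all documents as a single column (list), by position."""
    return [doc.get(field) for doc in docs]


def sort_doc_ids_by(column):
    """Sort document positions by a column without touching the full documents."""
    return sorted(range(len(column)), key=lambda i: column[i])


price_column = build_doc_values(priced_books, "price")
print(price_column)
print(sort_doc_ids_by(price_column))
```

Sorting or aggregating on price then reads one contiguous column instead of loading and parsing each JSON document.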

Document Life Cycle

Ingestion:
- Client -> Ingest Node -> Data Node -> Lucene Index.
Search Execution:
- Client -> Coordinating Node -> Data Nodes -> Lucene Segments.
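The search path above follows a scatter-gather pattern, which can be sketched as each shard returning its own top hits and the coordinating node merging them. This is a simplified sketch: shards are plain lists of hypothetical `(score, doc_id)` pairs, and the later step of fetching full documents for the winning IDs is left out.

```python
import heapq


def search_shard(shard_hits, k):
    """Each shard returns its own top-k (score, doc_id) pairs, best first."""
    return heapq.nlargest(k, shard_hits)


def coordinate(shards, k):
    """Send the query to every shard, then merge the partial results into a global top-k."""
    partials = [search_shard(hits, k) for hits in shards]
    return heapq.nlargest(k, (hit for partial in partials for hit in partial))


# Hypothetical scored hits held by three shards.
shard_results = [
    [(0.9, "a"), (0.2, "b")],
    [(0.7, "c"), (0.95, "d")],
    [(0.1, "e")],
]
print(coordinate(shard_results, 2))
```

Each shard only ships `k` candidates to the coordinating node, so the merge cost grows with the number of shards times `k` rather than with the total number of matching documents.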

Conclusion

Applications: Ideal for high-performance and complex search scenarios.
Interview Application: Useful in system design and infrastructure interviews.
Further Learning: Subscribe for more content on system design and interviews.
