
Machine Learning Search with Amazon OpenSearch Service

Jul 3, 2024

Influencing Machine Learning in Search with Amazon OpenSearch Service

Introduction

Presenters: Prine Moan (Analytics Specialist) & Haer (OpenSearch Solutions Architect)
Objective: Demonstrate how to build machine learning search with Amazon OpenSearch Service

Search Types Supported by OpenSearch

Sparse Retrieval

Algorithms: TF-IDF, BM25
Keyword Search: Basic matching technique that ranks documents by their keyword overlap with the query
Neural Sparse Search: Expands documents and queries with contextual terms to improve relevance
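To make the keyword-overlap idea concrete, here is a minimal sketch of BM25 scoring in plain Python. It is an illustration only, not the OpenSearch/Lucene implementation: the whitespace tokenizer, the sample documents, and the parameter defaults (k1=1.2, b=0.75) are assumptions chosen for the example.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with the Okapi BM25 formula."""
    # Naive tokenizer (assumption): lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in tokenized for term in set(d))

    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # no keyword overlap, no contribution
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = 1 - b + b * len(d) / avg_len
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


# Hypothetical captions, used only to exercise the function.
docs = ["red leather shoes", "blue running shoes for men", "red summer dress"]
scores = bm25_scores("red shoes", docs)
```

The first caption contains both query terms, so it scores higher than the other two, each of which matches only one term.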

Dense Retrieval

Similarity Search: Uses vectors (embeddings) produced by ML models and compares them with exact k-nearest neighbors (k-NN) or approximate nearest neighbor (ANN) algorithms
Vector Search: Applies to text, images, audio, and video
Multimodal Search: Uses models trained to map different content types (for example, text and images) into a shared embedding space
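As a rough sketch of what similarity search does, the following example runs an exact (brute-force) k-NN lookup using cosine similarity. The three-dimensional vectors and document names are made up for illustration; real embeddings have hundreds of dimensions, and production systems typically use an approximate index rather than scanning every vector.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def knn(query_vec, index, k=2):
    """Return the ids of the k vectors most similar to query_vec (exact search)."""
    ranked = sorted(
        index.items(),
        key=lambda item: cosine(query_vec, item[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in ranked[:k]]


# Toy embeddings (assumption): each document is a 3-dimensional vector.
index = {
    "sneaker": [0.9, 0.1, 0.0],
    "boot": [0.7, 0.3, 0.1],
    "dress": [0.0, 0.2, 0.9],
}
top = knn([1.0, 0.0, 0.0], index, k=2)
```

Here the query vector points in nearly the same direction as "sneaker" and "boot", so those two are returned ahead of "dress".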

Hybrid Search

Combines keyword search and vector search scores
Related Method: Retrieval-Augmented Generation (RAG), which passes the retrieved results to a large language model to produce enhanced responses
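Keyword and vector scores live on different scales, so they are usually normalized before being combined. The sketch below uses min-max normalization and a weighted sum; the scores, document ids, and weights are invented for the example, and the weights are assumed to sum to 1.

```python
def min_max(scores):
    """Rescale a {doc_id: score} map to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def combine(keyword_scores, vector_scores, weights=(0.5, 0.5)):
    """Merge two result sets into one ranking via a weighted sum."""
    kw = min_max(keyword_scores)
    vec = min_max(vector_scores)
    w_kw, w_vec = weights
    combined = {
        doc: w_kw * kw.get(doc, 0.0) + w_vec * vec.get(doc, 0.0)
        for doc in set(kw) | set(vec)  # a doc missing from one list scores 0 there
    }
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical raw scores from a keyword query and a vector query.
bm25 = {"a": 12.0, "b": 7.0, "c": 2.0}
knn_scores = {"a": 0.60, "b": 0.95, "d": 0.70}
ranking = combine(bm25, knn_scores, weights=(0.3, 0.7))
```

With the vector side weighted more heavily, document "b" moves to the top even though "a" has the best keyword score.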

Conversational Search

Adds a memory element to RAG applications so that questions and answers can proceed in a conversational style
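One simple way to picture the memory element is a bounded history of earlier turns that gets prepended to each new prompt along with the retrieved passages. The class below is a hypothetical sketch of that idea; the class name, prompt layout, and sample dialogue are assumptions, and no retrieval or LLM call is made here.

```python
from collections import deque


class ConversationMemory:
    """Keeps the most recent question/answer turns for a RAG prompt."""

    def __init__(self, max_turns=3):
        # A deque with maxlen silently drops the oldest turn when full.
        self.turns = deque(maxlen=max_turns)

    def add(self, question, answer):
        self.turns.append((question, answer))

    def build_prompt(self, question, passages):
        """Combine history, retrieved passages, and the new question."""
        history = "\n".join(f"User: {q}\nAssistant: {a}" for q, a in self.turns)
        context = "\n".join(f"- {p}" for p in passages)
        return (
            f"Conversation so far:\n{history}\n\n"
            f"Retrieved context:\n{context}\n\n"
            f"User: {question}\nAssistant:"
        )


memory = ConversationMemory(max_turns=2)
memory.add("Do you have red shoes?", "Yes, three styles.")
memory.add("Any in leather?", "One leather pair.")
memory.add("What sizes?", "Sizes 6 to 11.")
prompt = memory.build_prompt(
    "Is it waterproof?", ["Leather shoe with a water-resistant coating"]
)
```

Because the history is capped at two turns, the oldest exchange is dropped and only the two most recent turns reach the prompt.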

Demo Highlights

Architecture

Client Application: Hosted on an Amazon EC2 instance
Backend: Amazon OpenSearch Service, accessed through AWS Lambda
ML Models: Hosted in Amazon SageMaker or Amazon Bedrock for vector generation
Machine Learning Connectors: Use blueprints to connect to externally hosted ML platforms (SageMaker, Bedrock, OpenAI models)
Version: Demonstrated on OpenSearch 2.11

Demonstrated Search Types

Sparse Retrieval (Keyword Search)

Functionality: Matches query keywords against terms in image captions
Issue: May return irrelevant results based on unexpected keyword matches

Neural Sparse Search

Improvement: Generates semantically similar terms and adds them to documents and queries, which improves the relevance of search results
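The effect of term expansion can be sketched with sparse term-weight maps scored by a dot product. The terms and weights below are invented for illustration; in practice a sparse encoding model would produce them.

```python
def sparse_dot(query_vec, doc_vec):
    """Dot product of two sparse {term: weight} maps."""
    return sum(w * doc_vec.get(term, 0.0) for term, w in query_vec.items())


# Plain keyword matching: the query says "shoes", the caption says "sneaker".
keyword_only = sparse_dot({"shoes": 1.0}, {"sneaker": 1.0})

# Hypothetical expansions: both sides now carry related terms with weights.
doc_vec = {"sneaker": 1.4, "shoe": 1.1, "footwear": 0.8, "running": 0.5}
query_vec = {"shoe": 1.0, "footwear": 0.6}
expanded = sparse_dot(query_vec, doc_vec)
```

Without expansion the two sides share no term and the score is zero; after expansion they overlap on "shoe" and "footwear", so the document becomes retrievable.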

Dense Retrieval (Vector Search)

Method: Uses embedding vectors for more semantically relevant results
Advantage: No need for exact keyword matches; the search looks for conceptual similarity (e.g., 'style' and 'comfort' are related to 'trendy')
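For reference, a vector search against OpenSearch is expressed as a k-NN query on a vector field. The helper below only builds the request body as a Python dictionary; the field name caption_embedding and the short query vector are placeholders, and the vector would normally come from the same embedding model that was used at indexing time.

```python
import json


def knn_query(field, vector, k=5):
    """Build an OpenSearch k-NN query body for a knn_vector field."""
    return {
        "size": k,
        "query": {"knn": {field: {"vector": vector, "k": k}}},
    }


# Placeholder field name and vector (assumptions for illustration).
body = knn_query("caption_embedding", [0.12, -0.03, 0.88], k=3)
print(json.dumps(body, indent=2))
```

The resulting body can be sent to the index's _search endpoint with any HTTP or OpenSearch client.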

Hybrid Search

Combination: Merges BM25 and k-NN search results
Fine-Tuning: Adjust weights and normalization techniques for better results
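In OpenSearch, the weights and normalization technique are configured in a search pipeline that uses the normalization processor, and the sub-queries are wrapped in a hybrid query. The sketch below builds both request bodies as Python dictionaries. The field names, weights, and query vector are assumptions for the example, and the exact settings used in the demo are not stated in the source.

```python
def normalization_pipeline(weights, normalization="min_max",
                           combination="arithmetic_mean"):
    """Body for PUT /_search/pipeline/<name> with a normalization processor."""
    return {
        "description": "Normalize and combine hybrid sub-query scores",
        "phase_results_processors": [
            {
                "normalization-processor": {
                    "normalization": {"technique": normalization},
                    "combination": {
                        "technique": combination,
                        # One weight per sub-query, in sub-query order.
                        "parameters": {"weights": list(weights)},
                    },
                }
            }
        ],
    }


def hybrid_query(text_field, text, vector_field, vector, k=5):
    """Body for a search that combines a match query with a k-NN query."""
    return {
        "query": {
            "hybrid": {
                "queries": [
                    {"match": {text_field: {"query": text}}},
                    {"knn": {vector_field: {"vector": vector, "k": k}}},
                ]
            }
        }
    }


# Placeholder names and values (assumptions for illustration).
pipeline = normalization_pipeline((0.3, 0.7))
query = hybrid_query("caption", "trendy shoes",
                     "caption_embedding", [0.12, -0.03, 0.88])
```

The search request then references the pipeline by name through the search_pipeline query parameter, so the weights can be tuned without changing the query itself.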

Multimodal Search

Options: Search by text, image, or both
Example: Uploading an image influences search relevance based on both text and image content
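A text-plus-image search can be sketched as a neural query that carries query text, a base64-encoded query image, or both. The helper below builds such a body; it assumes a multimodal embedding model is already registered, and the field name, model id, and image bytes are placeholders rather than values from the demo.

```python
import base64


def multimodal_query(field, model_id, text=None, image_bytes=None, k=5):
    """Build a neural query body from text, an image, or both."""
    if text is None and image_bytes is None:
        raise ValueError("Provide text, image_bytes, or both")
    neural = {"model_id": model_id, "k": k}
    if text is not None:
        neural["query_text"] = text
    if image_bytes is not None:
        # The image is sent as a base64-encoded string.
        neural["query_image"] = base64.b64encode(image_bytes).decode("ascii")
    return {"query": {"neural": {field: neural}}}


# Placeholder values (assumptions): field name, model id, and image bytes.
body = multimodal_query(
    "vector_embedding",
    "my-multimodal-model-id",
    text="trendy shoes",
    image_bytes=b"\x89PNG",
)
```

Passing only text or only an image yields a single-modality query against the same field, which matches the three options listed above.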

Key Features

Expand Query and Documents: Uses sparse embeddings
Pre-trained Models: Leverages OpenSearch pre-trained bi-encoder models
Machine Learning Models: Options include Titan embedding models (Amazon Bedrock) and custom models (Amazon SageMaker)
Out-of-the-Box AI Connectors: Available from OpenSearch 2.9

Practical Application

The QR codes provided in the session link to the demos, detailed information on each search type, and the OpenSearch features released in 2023; use them to build similar web applications
