
Machine Learning Search with Amazon OpenSearch Service

Jul 3, 2024

Influencing Machine Learning in Search with Amazon OpenSearch Service

Introduction

  • Presenters: Prine Moan (Analytics Specialist) & Haer (OpenSearch Solutions Architect)
  • Objective: Demonstrate how to build machine learning-powered search with Amazon OpenSearch Service

Search Types Supported by OpenSearch

Sparse Retrieval

  • Algorithms: TF-IDF, BM25
  • Keyword Search: Basic matching based on keyword overlap between the query and the documents
  • Neural Sparse Search: Expands documents and queries with contextual terms to improve relevance
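As a rough illustration of how BM25 ranks documents by keyword overlap, here is a minimal pure-Python sketch. It is not OpenSearch's implementation: the whitespace tokenizer, the sample captions, and the function name `bm25_scores` are assumptions made for this example, while `k1=1.2` and `b=0.75` are the commonly used defaults.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document in `docs` against `query` with Okapi BM25."""
    # Naive tokenizer (assumption): lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in tokenized for term in set(d))
    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # no keyword overlap, no contribution
            # Lucene-style IDF, which never goes negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Hypothetical image captions, in the spirit of the demo.
captions = [
    "red leather jacket for women",
    "trendy summer dress",
    "leather boots for men",
]
print(bm25_scores("leather jacket", captions))
```

The second caption shares no term with the query, so it scores zero however relevant it might be conceptually; that is the gap neural sparse and dense retrieval try to close.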

Dense Retrieval

  • Similarity Search: Uses vectors (embeddings) generated by ML models and compares them with algorithms such as k-nearest neighbors (k-NN) or approximate nearest neighbor (ANN) search
  • Vector Search: For text, images, audio, and video
  • Multimodal Search: Uses models trained to map different kinds of content (for example, text and images) into a shared embedding space
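To make the similarity comparison concrete, the sketch below runs an exact (brute-force) k-NN search with cosine similarity over a few toy vectors. The three-dimensional embeddings are invented for illustration; real embedding models produce hundreds or thousands of dimensions, and approximate algorithms exist so that every vector does not have to be scanned as it is here.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def exact_knn(query_vec, doc_vecs, k=2):
    """Return the ids of the k documents most similar to query_vec."""
    ranked = sorted(
        doc_vecs,
        key=lambda doc_id: cosine_similarity(query_vec, doc_vecs[doc_id]),
        reverse=True,
    )
    return ranked[:k]

# Toy embeddings (made up for this example).
toy_embeddings = {
    "sneakers": [0.9, 0.1, 0.0],
    "running shoes": [0.8, 0.2, 0.1],
    "coffee mug": [0.0, 0.1, 0.9],
}
print(exact_knn([1.0, 0.0, 0.0], toy_embeddings, k=2))
# -> ['sneakers', 'running shoes']
```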

Hybrid Search

  • Combines keyword search and vector search scores
  • Methods: Retrieval-Augmented Generation (RAG), which adds large language models to produce enhanced responses
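Keyword and vector scores live on different scales, so they are usually normalized before being combined. The following sketch shows one common approach, min-max normalization followed by a weighted arithmetic mean. The scores, weights, and function names are assumptions for illustration, and details such as tie handling may differ from what OpenSearch does internally.

```python
def min_max(scores):
    """Rescale a {doc_id: score} mapping to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every document as a full match (assumption).
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}

def hybrid_scores(keyword_scores, vector_scores, weights=(0.5, 0.5)):
    """Combine two result sets with a weighted arithmetic mean."""
    kw, vec = min_max(keyword_scores), min_max(vector_scores)
    w_kw, w_vec = weights
    combined = {}
    for doc_id in set(kw) | set(vec):
        # A document missing from one result set contributes 0 for that side.
        total = w_kw * kw.get(doc_id, 0.0) + w_vec * vec.get(doc_id, 0.0)
        combined[doc_id] = total / (w_kw + w_vec)
    # Highest combined score first.
    return dict(sorted(combined.items(), key=lambda item: item[1], reverse=True))

# Made-up raw scores: BM25 is unbounded, cosine similarity is at most 1.
bm25 = {"a": 12.0, "b": 4.0, "c": 8.0}
knn = {"a": 0.2, "b": 0.9, "d": 0.55}
print(list(hybrid_scores(bm25, knn, weights=(0.3, 0.7))))
# -> ['b', 'd', 'a', 'c']
```

With the vector side weighted at 0.7, document "b" wins even though it has the lowest keyword score; shifting the weights toward the keyword side would favor "a" instead.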

Conversational Search

  • Adds a memory element to RAG applications, enabling a conversational Q&A style in which follow-up questions build on earlier turns
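The memory element can be pictured as a rolling window of earlier turns that is placed in the prompt alongside the retrieved passages. The sketch below is a hypothetical, framework-free illustration: `ConversationMemory`, `build_prompt`, and the prompt layout are assumptions made for this example, not part of any OpenSearch API, and the call to the language model itself is left out.

```python
from dataclasses import dataclass, field

@dataclass
class ConversationMemory:
    """Keeps only the most recent question/answer turns."""
    max_turns: int = 3
    turns: list = field(default_factory=list)

    def add(self, question, answer):
        self.turns.append((question, answer))
        # Drop the oldest turns once the window is full.
        self.turns = self.turns[-self.max_turns:]

def build_prompt(memory, retrieved_passages, question):
    """Assemble an LLM prompt from history, search results, and the new question."""
    history = "\n".join(f"User: {q}\nAssistant: {a}" for q, a in memory.turns)
    context = "\n".join(f"- {p}" for p in retrieved_passages)
    return (
        f"Conversation so far:\n{history}\n\n"
        f"Context:\n{context}\n\n"
        f"User: {question}\nAssistant:"
    )

memory = ConversationMemory(max_turns=2)
memory.add("Do you have trendy jackets?", "Yes, several leather jackets.")
memory.add("Any in red?", "One red leather jacket is available.")
print(build_prompt(memory, ["red leather jacket for women"], "How much is it?"))
```

Because the earlier turns are in the prompt, the model can work out that "it" in the last question refers to the red jacket.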

Demo Highlights

Architecture

  • Client Application: Hosted on an EC2 machine
  • Backend: Amazon OpenSearch Service, accessed through AWS Lambda
  • ML Models: Hosted in Amazon SageMaker or Bedrock for vector generation
  • Machine Learning Connectors: Use blueprints to connect OpenSearch to externally hosted ML models (for example, on SageMaker, Bedrock, or OpenAI)
  • Version: Demonstrated on OpenSearch 2.11

Demonstrated Search Types

Sparse Retrieval (Keyword Search)

  • Functionality: Maps keywords to terms in image captions
  • Issue: May return irrelevant results based on unexpected keyword matches

Neural Sparse Search

  • Improvement: Generates related terms and assigns them weights, so that documents can match a query without sharing its exact keywords
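One way to picture neural sparse scoring is as a dot product between two weighted term maps: the expanded query and the expanded document. In the sketch below the expansions and weights are invented by hand; in a real deployment a sparse encoding model would produce them.

```python
def sparse_dot(query_weights, doc_weights):
    """Dot product of two sparse {term: weight} vectors."""
    return sum(
        weight * doc_weights[term]
        for term, weight in query_weights.items()
        if term in doc_weights
    )

# Hand-written expansions (assumption): the caption "white sneakers"
# never mentions "shoes", but its expansion does.
doc_expansions = {
    "white sneakers": {"white": 1.3, "sneakers": 1.4, "shoes": 1.1, "footwear": 0.8},
    "coffee mug": {"coffee": 1.2, "mug": 1.5, "cup": 0.9},
}
query_expansion = {"shoes": 1.2, "footwear": 0.6}  # expanded from the query "shoes"

for caption, weights in doc_expansions.items():
    print(caption, round(sparse_dot(query_expansion, weights), 2))
```

A plain keyword search for "shoes" would miss the "white sneakers" caption entirely; with expanded terms it receives a positive score while the unrelated caption stays at zero.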

Dense Retrieval (Vector Search)

  • Method: Uses embedding vectors for more semantically relevant results
  • Advantage: Does not require exact keyword matches; it looks for conceptual similarity (e.g., 'style' and 'comfort' are treated as related to 'trendy')

Hybrid Search

  • Combination: Merges BM25 and KNN search results
  • Fine-Tuning: Adjust weights and normalization techniques for better results
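In OpenSearch, the weights and normalization technique are configured in a search pipeline that contains a normalization processor, and the two sub-queries are wrapped in a `hybrid` query. The sketch below only builds the two request bodies as Python dictionaries, following the shape described in the OpenSearch documentation as best understood here. The field names `caption` and `caption_vector` and the default weights are hypothetical, and sending the requests is left to whichever client the application uses.

```python
import json

def normalization_pipeline(keyword_weight=0.3, vector_weight=0.7):
    """Request body for a search pipeline that normalizes and combines scores."""
    return {
        "description": "Post-processor for hybrid search (illustrative)",
        "phase_results_processors": [
            {
                "normalization-processor": {
                    "normalization": {"technique": "min_max"},
                    "combination": {
                        "technique": "arithmetic_mean",
                        # One weight per sub-query, in query order.
                        "parameters": {"weights": [keyword_weight, vector_weight]},
                    },
                }
            }
        ],
    }

def hybrid_query(text, vector, k=5):
    """Request body that runs a BM25 match and a k-NN query together."""
    return {
        "query": {
            "hybrid": {
                "queries": [
                    # Hypothetical text field holding image captions.
                    {"match": {"caption": {"query": text}}},
                    # Hypothetical knn_vector field holding caption embeddings.
                    {"knn": {"caption_vector": {"vector": vector, "k": k}}},
                ]
            }
        }
    }

print(json.dumps(normalization_pipeline(), indent=2))
print(json.dumps(hybrid_query("trendy footwear", [0.1, 0.2, 0.3]), indent=2))
```

The pipeline body would typically be registered under a name of your choosing and then referenced when the hybrid query is sent. Tuning then consists of changing the weights or the normalization and combination techniques and comparing the resulting rankings.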

Multimodal Search

  • Options: Search by text, image, or both
  • Example: Uploading an image influences search relevance based on both text and image content
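A multimodal query can carry text, an image, or both. The sketch below builds such a request body, assuming the `neural` query shape with `query_text` and `query_image` (a base64-encoded image) described in the OpenSearch documentation. The helper name, the field name `vector_embedding`, and the model id are placeholders rather than values from the demo.

```python
import base64

def multimodal_query(field, model_id, text=None, image_bytes=None, k=5):
    """Build a neural query body from text, an image, or both."""
    if text is None and image_bytes is None:
        raise ValueError("Provide text, an image, or both")
    neural = {"model_id": model_id, "k": k}
    if text is not None:
        neural["query_text"] = text
    if image_bytes is not None:
        # The image travels inside the JSON body as a base64 string.
        neural["query_image"] = base64.b64encode(image_bytes).decode("ascii")
    return {"query": {"neural": {field: neural}}}

# Placeholder bytes stand in for an uploaded image file.
body = multimodal_query(
    field="vector_embedding",
    model_id="<your-model-id>",
    text="trendy footwear",
    image_bytes=b"abc",
)
print(body)
```

When both inputs are present, the embedding model receives them together, which is why an uploaded image can shift the ranking even when the text query stays the same.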

Key Features

  • Expand Query and Documents: Uses sparse embeddings
  • Pre-trained Models: Leverages OpenSearch pre-trained bi-encoder models
  • Machine Learning Models: Options include Amazon Titan embedding models (via Amazon Bedrock) and custom models (hosted on Amazon SageMaker)
  • Out-of-the-Box AI Connectors: Available from OpenSearch 2.9

Practical Application

  • Build similar web applications using the QR codes provided in the session, which link to the demos, detailed information on each search type, and the OpenSearch features released in 2023