Building a News Research Tool for Equity Analysis

Mar 22, 2025

End-to-End LLM Project: Equity Research Analysis

Introduction

  • The project involves building a news research tool for equity research analysis using an LLM.
  • It uses LangChain, OpenAI, and Streamlit.
  • The walkthrough is framed as a story about a character named Peter Pandey, inspired by a hypothetical Rocky Bhai.

Industry Use Case

  • Mutual Funds: Retail investors put money into mutual funds, which in turn invest in individual stocks.
  • Equity Research Analysts: Responsible for researching stocks (e.g., Tata Motors, Reliance) and providing insights.
  • Reading large numbers of articles is tedious, so this tool helps analysts compile and query news data.

Project Overview

  • The tool takes multiple news article URLs, answers user queries based on these articles, and can summarize articles.
  • Technical Framework:
    • LangChain: Provides the document loaders, text splitters, and chains that connect the articles to the LLM.
    • Streamlit: Used for creating a user-friendly interface.
    • OpenAI: Provides the LLM capabilities.

Challenges with Existing Solutions

  • Manual Entry: Copy-pasting articles is not feasible for busy analysts.
  • Aggregate Knowledge Base: Analysts need a single knowledge base that compiles data from many sources efficiently.
  • Token Limit: ChatGPT limits how many tokens (roughly, words) fit in a single prompt, which makes pasting in long article texts impractical.
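
To make the token-limit problem concrete, here is a minimal sketch that estimates whether a set of article texts would fit in one prompt. The 4-characters-per-token ratio is only a rough rule of thumb for English text, and the context_limit and reserved_for_answer values are illustrative assumptions, not figures from the project.

```python
import math


def estimate_tokens(text, chars_per_token=4.0):
    """Rough token estimate; a real tokenizer will give different counts."""
    return math.ceil(len(text) / chars_per_token)


def fits_in_context(texts, context_limit=4096, reserved_for_answer=500):
    """Check whether all texts fit in one prompt, leaving room for the answer.

    context_limit and reserved_for_answer are illustrative values only.
    """
    total = sum(estimate_tokens(t) for t in texts)
    return total <= context_limit - reserved_for_answer


short_article = "a" * 400      # about 100 estimated tokens
long_article = "a" * 40_000    # about 10,000 estimated tokens
print(fits_in_context([short_article]))
print(fits_in_context([short_article, long_article]))
```

When the check fails, the articles have to be split and only the relevant pieces sent to the model, which is what the architecture below is for.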

Technical Architecture

Initial Setup

  • Document Loader: Gathers news articles.
  • Text Splitting: Breaks down articles into manageable chunks.
  • Vector Database: Stores these chunks to facilitate efficient searching.
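
The text-splitting step can be illustrated with a simplified, pure-Python version of recursive splitting: try the coarsest separator first, and fall back to finer ones only for pieces that are still too long. This is a sketch of the idea, not LangChain's implementation; the function name, default separators, and chunk size are assumptions, and chunk overlap is left out for brevity.

```python
def recursive_split(text, chunk_size=200, separators=("\n\n", "\n", ". ", " ")):
    """Split text into chunks of at most chunk_size characters."""
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    if not separators:
        # No separator left to try: fall back to a hard cut.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate          # keep merging small pieces
            continue
        if current:
            chunks.append(current)       # close the chunk built so far
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # Piece is still too long: retry with the finer separators.
            chunks.extend(recursive_split(piece, chunk_size, finer))
    if current:
        chunks.append(current)
    return [c for c in chunks if c.strip()]


article = "Para one.\n\nPara two is here.\n\n" + "word " * 100
chunks = recursive_split(article, chunk_size=50)
print(len(chunks), max(len(c) for c in chunks))
```

Splitting on paragraph and sentence boundaries first keeps related text together, which tends to help retrieval later.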

Advanced Setup

  • Semantic Search: Uses embeddings and vector databases for context-aware searches.
  • FAISS Usage: A lightweight in-memory vector index (an alternative to Pinecone, Milvus, or Chroma).
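
The core of vector search, nearest-neighbour lookup by cosine similarity over an in-memory index, can be sketched in plain Python. The toy_embed function is a deliberate stand-in: it only counts words, so it captures word overlap rather than meaning, whereas a learned embedding model would place similar meanings close together. All class and function names and the sample sentences are made up for illustration.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    """Stand-in embedding: lower-cased word counts (not a learned model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryIndex:
    """Tiny brute-force index: stores (vector, chunk) pairs in a list."""

    def __init__(self):
        self.items = []

    def add(self, chunk):
        self.items.append((toy_embed(chunk), chunk))

    def search(self, query, k=2):
        q = toy_embed(query)
        ranked = sorted(self.items, key=lambda it: cosine(q, it[0]), reverse=True)
        return [chunk for _, chunk in ranked[:k]]


index = InMemoryIndex()
for chunk in [
    "Alpha Motors reported higher quarterly sales.",   # made-up sample text
    "Beta Retail announced a new store partnership.",
    "Rainfall forecasts point to an average monsoon.",
]:
    index.add(chunk)

print(index.search("How did Alpha Motors sales change?", k=1))
```

A dedicated library does the same ranking far more efficiently on large collections of dense vectors.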

Tools and Libraries

  • LangChain: Provides document loaders, text splitters, and embedding classes.
  • OpenAI API: Used for creating embeddings and LLM interactions.
  • Sentence Transformers: Converts text into vectors using pretrained models.
  • FAISS Library: Allows fast similarity search over vector data.

Building the Tool

Steps Involved

  1. Load News Articles: Use LangChain's UnstructuredURLLoader to fetch each article's content.
  2. Text Splitting: Use RecursiveCharacterTextSplitter to chunk the articles.
  3. Embedding: Convert the text chunks into embeddings using OpenAI.
  4. Create Vector Index: Use the FAISS library to store the embeddings and perform search operations.
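
The four steps can be sketched with LangChain roughly as follows. This is an assumption-laden outline rather than the project's actual code: the import paths differ between LangChain releases, the chunk sizes and separators are illustrative, and the function names clean_urls and build_faiss_index are hypothetical. It needs the langchain-community, langchain-openai, langchain-text-splitters, unstructured, and faiss-cpu packages, plus an OPENAI_API_KEY environment variable.

```python
def clean_urls(raw_urls):
    """Drop blank entries and surrounding whitespace from user-supplied URLs."""
    return [u.strip() for u in raw_urls if u and u.strip()]


def build_faiss_index(urls, chunk_size=1000, chunk_overlap=200):
    """Load articles, split them, embed the chunks, and index them in FAISS."""
    # Imported here so the module loads even without the optional packages.
    # These import paths are assumptions and vary between LangChain versions.
    from langchain_community.document_loaders import UnstructuredURLLoader
    from langchain_community.vectorstores import FAISS
    from langchain_openai import OpenAIEmbeddings
    from langchain_text_splitters import RecursiveCharacterTextSplitter

    # 1. Load the article text from each URL.
    documents = UnstructuredURLLoader(urls=clean_urls(urls)).load()

    # 2. Split into chunks, preferring paragraph and sentence boundaries.
    splitter = RecursiveCharacterTextSplitter(
        separators=["\n\n", "\n", ".", " "],
        chunk_size=chunk_size,        # illustrative value
        chunk_overlap=chunk_overlap,  # illustrative value
    )
    chunks = splitter.split_documents(documents)

    # 3 and 4. Embed the chunks with OpenAI and store them in a FAISS index.
    embeddings = OpenAIEmbeddings()  # reads OPENAI_API_KEY from the environment
    return FAISS.from_documents(chunks, embeddings)
```

With the index built, a call such as index.similarity_search("your question", k=4) would return the chunks closest to the question.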

Execution

  • Streamlit Interface: An interactive UI with a sidebar for URL input and a text box for questions.
  • Processing URLs: When the user clicks Process, the data is loaded, split, embedded, and stored in the vector index.
  • Answering Queries: The question is embedded, the most similar chunks are retrieved from the vector index, and the LLM's answer is displayed.

Advanced Features

  • Retrieval QA with Sources Chain (LangChain's RetrievalQAWithSourcesChain):
    • Stuff Method: Combines all relevant chunks in one LLM prompt.
    • Map Reduce Method: Makes a separate LLM call for each chunk, then aggregates the results in a final call.
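
The difference between the two methods is easiest to see by counting model calls. In this sketch the llm argument is any callable that takes a prompt string and returns text; CountingLLM is a fake model used only for illustration, and the prompt wording is an assumption rather than what LangChain sends.

```python
def answer_stuff(llm, question, chunks):
    """Stuff: one call, with every chunk placed in a single prompt."""
    context = "\n\n".join(chunks)
    return llm(f"Context:\n{context}\n\nQuestion: {question}")


def answer_map_reduce(llm, question, chunks):
    """Map reduce: one call per chunk (map), then one call to combine (reduce)."""
    partials = [llm(f"Context:\n{c}\n\nQuestion: {question}") for c in chunks]
    combined = "\n".join(partials)
    return llm(f"Partial answers:\n{combined}\n\nQuestion: {question}")


class CountingLLM:
    """Fake model that only records how many times it was called."""

    def __init__(self):
        self.calls = 0

    def __call__(self, prompt):
        self.calls += 1
        return f"answer {self.calls}"


chunks = ["chunk one", "chunk two", "chunk three"]
stuff_llm, map_reduce_llm = CountingLLM(), CountingLLM()
answer_stuff(stuff_llm, "What changed?", chunks)
answer_map_reduce(map_reduce_llm, "What changed?", chunks)
print(stuff_llm.calls, map_reduce_llm.calls)  # prints: 1 4
```

Stuff is cheaper but its single prompt can exceed the token limit when many chunks are retrieved; map reduce keeps each prompt small at the cost of more calls.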

Deployment Considerations

  • Proof of Concept: Initial development using Streamlit for quick deployment and testing.
  • Long-Term Architecture: Replace the prototype components with a reliable data ingestion system and a more comprehensive vector database.

Conclusion

  • The tool is designed for real-world application by equity research analysts.
  • It enhances efficiency by automating retrieval and analysis of news data.
  • Readers are encouraged to share and use the provided code in their own personal or professional projects.