Building a News Research Tool for Equity Analysis
Mar 22, 2025
End-to-End LLM Project: Equity Research Analysis
Introduction
The project involves building a news research tool for equity research analysis using an LLM.
Utilizes LangChain, OpenAI, and Streamlit.
Features a storytelling aspect involving a character named Peter Pandey, inspired by a hypothetical Rocky Bhai.
Industry Use Case
Mutual Funds
: Retail investors put money into mutual funds, which in turn invest in individual stocks.
Equity Research Analysts
: Responsible for researching stocks (e.g., Tata Motors, Reliance) and providing insights.
Analysts must read many news articles, which is tedious; this tool helps by compiling news data and making it queryable.
Project Overview
The tool takes multiple news article URLs, answers user queries based on these articles, and can summarize articles.
Technical Framework:
LangChain
: Framework for building LLM applications; used here to load, split, and query article text.
Streamlit
: Used for creating a user-friendly interface.
OpenAI
: Provides the LLM capabilities.
Challenges with Existing Solutions
Manual Entry
: Copy-pasting articles is not feasible for busy analysts.
Aggregate Knowledge Base
: Necessary to compile data from various sources efficiently.
Token Limit
: ChatGPT limits input length (measured in tokens), so the full text of several large articles cannot be pasted in at once.
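To make the token-limit problem concrete, here is a minimal sketch that estimates whether a set of texts fits in a model's input budget. The ratio of about 4 characters per token is only a rough rule of thumb for English text, not a real tokenizer, and the 4096-token budget is an illustrative assumption rather than a figure from the notes.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate. The chars_per_token ratio is a heuristic
    (an assumption), not the output of an actual tokenizer."""
    return int(len(text) / chars_per_token) + 1


def fits_in_context(texts, max_tokens: int) -> bool:
    """Return True if the combined estimate stays within the budget."""
    return sum(estimate_tokens(t) for t in texts) <= max_tokens


# A long article (20,000 characters) versus a short excerpt of it.
article = "word " * 4000
excerpt = article[:2000]

# 4096 is a hypothetical budget chosen for illustration.
print(fits_in_context([article], max_tokens=4096))
print(fits_in_context([excerpt], max_tokens=4096))
```

The full article exceeds the assumed budget while the excerpt fits, which is why the tool sends only the relevant pieces of each article to the LLM.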
Technical Architecture
Initial Setup
Document Loader
: Gathers news articles.
Text Splitting
: Breaks down articles into manageable chunks.
Vector Database
: Stores these chunks to facilitate efficient searching.
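The chunking step can be sketched in plain Python as follows. This is a simplified stand-in, not the loader or splitter the project uses: the `Chunk` class, the `chunk_article` function, the size and overlap values, and the example URL are all hypothetical. The sketch shows two ideas: neighbouring chunks overlap so that a sentence cut at a boundary still appears whole in one of them, and each chunk remembers its source URL so answers can later cite it.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    source: str  # URL of the article the chunk came from


def chunk_article(text: str, source: str, size: int = 200, overlap: int = 50):
    """Cut text into fixed-size windows that overlap by `overlap` characters."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(Chunk(text[start:start + size], source))
        if start + size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


# Hypothetical article text and URL, for illustration only.
article_text = "".join(str(i % 10) for i in range(500))
chunks = chunk_article(article_text, "https://example.com/article-1")
print(len(chunks), [len(c.text) for c in chunks])
```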
Advanced Setup
Semantic Search
: Uses embeddings and vector databases for context-aware searches.
FAISS Usage
: FAISS serves as a lightweight in-memory vector index (an alternative to Pinecone, Milvus, or Chroma).
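The idea behind an in-memory vector search can be sketched with the standard library alone. The class below compares the query against every stored vector using cosine similarity, which is the simplest possible strategy; a dedicated library does the same job far faster. The `InMemoryIndex` class and the three-dimensional vectors are made up for illustration; real embeddings come from a model and have hundreds or thousands of dimensions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class InMemoryIndex:
    """Toy vector index: stores vectors in a list and scans all of them."""

    def __init__(self):
        self._vectors = []
        self._payloads = []

    def add(self, vector, payload):
        self._vectors.append(list(vector))
        self._payloads.append(payload)

    def search(self, query, k=2):
        """Return the k (score, payload) pairs most similar to the query."""
        scored = [(cosine(query, v), p)
                  for v, p in zip(self._vectors, self._payloads)]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


# Hand-written toy vectors (assumption): nearby vectors mean related text.
index = InMemoryIndex()
index.add([0.9, 0.1, 0.0], "Tata Motors launches new EV")
index.add([0.1, 0.9, 0.0], "Reliance quarterly profit rises")
index.add([0.0, 0.1, 0.9], "Monsoon forecast updated")

for score, text in index.search([0.8, 0.2, 0.0], k=2):
    print(round(score, 3), text)
```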
Tools and Libraries
LangChain
: Provides document loaders, text splitters, and embedding classes.
OpenAI API
: Used for creating embeddings and LLM interactions.
Sentence Transformers
: Converts text into vectors using pretrained models.
FAISS Library
: Allows fast similarity search within vector data.
Building the Tool
Steps Involved
Load News Articles
: Use LangChain's UnstructuredURLLoader to fetch the articles' content.
Text Splitting
: Use LangChain's RecursiveCharacterTextSplitter to chunk the articles.
Embedding
: Convert text chunks into embeddings using OpenAI.
Create Vector Index
: Use the FAISS library to store embeddings and perform search operations.
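The recursive splitting idea from the steps above can be sketched as follows. This is a simplified illustration written from scratch, not the library's implementation: the `recursive_split` and `merge_pieces` functions, the separator list, and the length limit are assumptions. The approach tries coarse separators first (paragraphs), falls back to finer ones (lines, sentences, words) only for pieces that are still too long, and then glues small neighbours back together up to the limit. Separators at chunk boundaries are dropped in this sketch.

```python
def merge_pieces(pieces, max_len, sep):
    """Greedily join neighbouring pieces while the result fits in max_len."""
    merged, current = [], ""
    for piece in pieces:
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= max_len:
            current = candidate
        else:
            if current:
                merged.append(current)
            current = piece
    if current:
        merged.append(current)
    return merged


def recursive_split(text, max_len=100, separators=("\n\n", "\n", ". ", " ")):
    """Split text into chunks of at most max_len characters."""
    if len(text) <= max_len:
        return [text] if text.strip() else []
    for i, sep in enumerate(separators):
        if sep in text:
            pieces = []
            for part in text.split(sep):
                # Only finer separators are tried on each part.
                pieces.extend(recursive_split(part, max_len, separators[i + 1:]))
            return merge_pieces(pieces, max_len, sep)
    # No separator applies: fall back to a hard cut.
    return [text[j:j + max_len] for j in range(0, len(text), max_len)]


sample = "Para one sentence A. Para one sentence B.\n\nPara two is short."
for chunk in recursive_split(sample, max_len=30):
    print(repr(chunk))
```

Because each part is split only as finely as needed, chunks tend to end at paragraph or sentence boundaries rather than mid-word.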
Execution
Streamlit Interface
: Interactive UI with a sidebar for URL input and a question box for queries.
Processing URLs
: When the user clicks the process button, the data is loaded, split, embedded, and stored in the vector index.
Answering Queries
: Retrieve and display answers based on question input using embeddings and vector search.
Advanced Features
RetrievalQAWithSourcesChain:
Stuff Method
: Combines all relevant chunks in one LLM prompt.
Map Reduce Method
: Divides the task into multiple LLM calls per chunk, then aggregates results.
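The difference between the two methods can be sketched with a stand-in for the LLM. Nothing here calls a real model or the chain itself: `CountingLLM` is a hypothetical fake that only counts how often it is invoked, and the prompt wording is an assumption. The point is the call pattern: the stuff method makes one call with all chunks in a single prompt, while map reduce makes one call per chunk plus a final call to combine the partial answers.

```python
class CountingLLM:
    """Fake LLM (assumption): counts calls and returns a placeholder answer."""

    def __init__(self):
        self.calls = 0

    def __call__(self, prompt: str) -> str:
        self.calls += 1
        return f"answer #{self.calls}"


def answer_stuff(llm, question, chunks):
    """Stuff: put every chunk into one prompt, make a single LLM call."""
    context = "\n\n".join(chunks)
    return llm(f"Context:\n{context}\n\nQuestion: {question}")


def answer_map_reduce(llm, question, chunks):
    """Map reduce: one call per chunk, then one call to combine the results."""
    partials = [llm(f"Context:\n{c}\n\nQuestion: {question}") for c in chunks]
    combined = "\n".join(partials)
    return llm(f"Partial answers:\n{combined}\n\nQuestion: {question}")


chunks = ["chunk one", "chunk two", "chunk three"]

stuff_llm = CountingLLM()
answer_stuff(stuff_llm, "What changed?", chunks)
print("stuff calls:", stuff_llm.calls)

mr_llm = CountingLLM()
answer_map_reduce(mr_llm, "What changed?", chunks)
print("map reduce calls:", mr_llm.calls)
```

Stuff is cheaper but only works while the combined chunks fit within the model's input limit; map reduce keeps every prompt small at the cost of more calls.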
Deployment Considerations
Proof of Concept
: Initial development using Streamlit for quick deployment and testing.
Long Term Architecture
: Incorporate reliable data ingestion systems and comprehensive vector databases.
Conclusion
The tool is designed for real-world application by equity research analysts.
It enhances efficiency by automating retrieval and analysis of news data.
Viewers are encouraged to share the provided code and use it in their own personal or professional projects.