Pydantic-AI Investment Research System¶
This document details a multi-agent investment research system designed to autonomously gather and analyze financial data. The system performs comprehensive financial analysis and generates actionable investment insights. Its adaptive architecture, featuring dynamic plan adjustment and intelligent memory management, allows it to evolve its research strategy based on real-time findings.
Overview¶
This investment research system leverages a multi-agent architecture built on Pydantic-AI. Autonomous agents use specialized tools to gather data from multiple sources, perform financial calculations, and generate actionable insights, while adaptive memory management and dynamic plan adjustment let the system evolve its research strategy as findings accumulate.
The Agents¶
The system employs specialized agents built using pydantic-ai that handle different aspects of the research process.
Figure 1: Agent topology. Blue: planning agents; green: research agents; purple: specialized agents; red: agent output; yellow: tools.
Architecture¶
The project follows an atomic, modular architecture with a clear separation of concerns that keeps the components composable:
agents/
├── dependencies.py # Type-safe shared context and resources
├── memory_processors.py # Advanced memory management for conversations
├── planning_agent.py # Investment research planning and adaptation
└── research_agent.py # Research execution with tool orchestration
tools/
├── calculator.py # Financial metrics and ratio calculations
├── pdf_extractor.py # Hybrid PDF extraction (PyMuPDF + VLM)
├── vector_search.py # ChromaDB document search with caching
├── web_scraper.py # BeautifulSoup content extraction
└── web_search.py # Tavily API integration (not SearxNG)
models/
└── schemas.py # Pydantic data models for type safety
streamlit_app.py # Web interface with multiple research modes
main.py # CLI entry point and core workflows
config.py # OpenRouter/OpenAI configuration
logfire_config.py # Observability and monitoring setup
Core Agents¶
Planning Agents¶
Agent | Purpose | Input | Output | Memory Strategy |
---|---|---|---|---|
planning_agent | Creates initial 2-4 step research plans following investment methodology | User query + context | ResearchPlan with logical steps, reasoning, and priorities | Uses adaptive_memory_processor for conversation management
adaptive_planning_agent | Evaluates execution feedback and dynamically adjusts plans | PlanUpdateRequest with feedback and remaining steps | PlanUpdateResponse indicating whether plan updates are needed | Minimal context strategy: keeps only the system prompt + last 2 successful decisions
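As a minimal sketch, the planning agent and its structured output might be declared roughly like this. The field names on ResearchStep and ResearchPlan are assumptions inferred from the table above, not the project's actual schemas:

```python
from pydantic import BaseModel, Field
from pydantic_ai import Agent

class ResearchStep(BaseModel):
    description: str
    reasoning: str
    priority: int = Field(ge=1, le=5)  # illustrative priority scale

class ResearchPlan(BaseModel):
    steps: list[ResearchStep] = Field(min_length=2, max_length=4)
    overall_reasoning: str

# Model string and prompt are illustrative; in the real project config.py
# routes requests through OpenRouter.
planning_agent = Agent(
    "openai:gpt-4o",
    output_type=ResearchPlan,  # `result_type` in older pydantic-ai releases
    system_prompt=(
        "You are an investment research planner. Produce a 2-4 step plan "
        "with reasoning and a priority for each step."
    ),
)
```

Because the output type is a Pydantic model, the plan is validated on every run; a response that does not fit the schema is retried rather than silently accepted.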
Research Agents¶
Agent | Purpose | Input | Output | Memory Strategy |
---|---|---|---|---|
research_agent | Executes research plans using available tools | Query + plan + ResearchDependencies | InvestmentFindings with summary, insights, metrics, risks, recommendation | Uses filter_research_responses to preserve valuable content
feedback_agent | Evaluates research quality after each step | Step description + findings + expectations | ExecutionFeedback with quality scores, gaps, confidence | Stateless evaluation
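A sketch of how the research agent might be wired up, assuming the dependency and output fields listed above (the real definitions live in dependencies.py and schemas.py):

```python
from dataclasses import dataclass
from typing import Any

from pydantic import BaseModel
from pydantic_ai import Agent

@dataclass
class ResearchDependencies:
    # Type-safe shared resources injected into every tool call.
    vector_db: Any       # e.g. a ChromaDB collection handle
    http_session: Any    # e.g. an aiohttp.ClientSession

class InvestmentFindings(BaseModel):
    summary: str
    key_insights: list[str]
    metrics: dict[str, float]
    risks: list[str]
    recommendation: str

research_agent = Agent(
    "openai:gpt-4o",
    deps_type=ResearchDependencies,
    output_type=InvestmentFindings,
    system_prompt="Execute the research plan step using the available tools.",
)
```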
Specialized Agents¶
Agent | Purpose | Usage | Key Features |
---|---|---|---|
summary_agent | Condenses conversation history while preserving key findings | Activated when conversations exceed 6 messages | Summarizes older messages while keeping recent 3 intact |
vlm_agent | Vision-Language Model for PDF text extraction | Fallback when PyMuPDF quality < threshold | Processes PDF pages as images for complex layouts |
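The PyMuPDF-to-VLM fallback can be sketched as a simple per-page quality gate. The threshold value and the run_vlm_on_image helper are hypothetical stand-ins for the logic in pdf_extractor.py:

```python
import fitz  # PyMuPDF

MIN_CHARS_PER_PAGE = 200  # hypothetical quality threshold

async def extract_pdf_text(path: str) -> str:
    """Extract text with PyMuPDF; fall back to the VLM agent on low-quality pages."""
    doc = fitz.open(path)
    pages: list[str] = []
    for page in doc:
        text = page.get_text("text")
        if len(text.strip()) >= MIN_CHARS_PER_PAGE:
            pages.append(text)
        else:
            # Render the page as an image and let the vision model read it.
            png_bytes = page.get_pixmap(dpi=150).tobytes("png")
            pages.append(await run_vlm_on_image(png_bytes))  # hypothetical helper wrapping vlm_agent
    return "\n\n".join(pages)
```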
Available Tools¶
The research_agent autonomously decides when and how to use these tools:
Tool | Function | Features | Implementation |
---|---|---|---|
search_internal_docs | Searches ChromaDB vector database | Query enhancement, result caching (5min TTL), relevance scoring | ChromaDB with embeddings |
search_web | Current market news and analysis | Privacy-focused search, multiple categories | Tavily API (not SearxNG) |
scrape_webpage | Extract content from web pages | Article/table/full content modes | aiohttp + BeautifulSoup4 |
calculate_financial_metrics | Compute financial ratios | P/E, debt ratios, ROE, margins, etc. | LLM-based parsing and calculation |
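Tools are attached to the agent with Pydantic-AI's decorator API. As a hedged sketch, search_internal_docs might be registered like this, reusing the illustrative ResearchDependencies from above (query enhancement and the 5-minute result cache omitted):

```python
from pydantic_ai import RunContext

@research_agent.tool
async def search_internal_docs(
    ctx: RunContext[ResearchDependencies], query: str, top_k: int = 5
) -> list[str]:
    """Search the ChromaDB vector store for relevant filings and reports."""
    results = ctx.deps.vector_db.query(query_texts=[query], n_results=top_k)
    return results["documents"][0]
```

The docstring matters: it is what the model sees when deciding whether this tool fits the current research step.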
Memory Management System¶
The adaptive_memory_processor implements sophisticated conversation management:
- Short conversations (≤6 messages): Keep all with validation
- Medium conversations (7-12 messages): Filter responses, keep 8 recent
- Long conversations (>12 messages): Aggressive filtering, keep 6 essential
Key features:
- Tool call sequence integrity: maintains proper tool call → response pairs
- Research keyword preservation: keeps messages containing "analysis", "findings", "recommendation", etc.
- Context preservation: always retains system prompts
- Session-level caching: 5-minute TTL for vector search results
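In outline, the processor might branch on conversation length as follows; filter_research_responses and keep_essential stand in for the project's filtering helpers, and the system-prompt handling is simplified:

```python
from pydantic_ai.messages import ModelMessage

def adaptive_memory_processor(messages: list[ModelMessage]) -> list[ModelMessage]:
    # Simplified: assume the first message carries the system prompt.
    system, rest = messages[:1], messages[1:]
    if len(messages) <= 6:
        return messages                                        # short: keep all
    if len(messages) <= 12:
        return system + filter_research_responses(rest)[-8:]   # medium: 8 recent
    return system + keep_essential(rest, limit=6)               # long: 6 essential
```

Recent pydantic-ai releases can apply a function like this through the history_processors parameter on Agent, so the filtering runs before every model call.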
How It All Connects: The Workflow¶
The system transforms a user's investment query into a comprehensive analysis. The process begins when the user submits a question through either the CLI or the Streamlit web app. This query, along with any relevant context about investment goals or constraints, is processed by the planning agent, which analyzes the request and formulates a research strategy of 2-4 logical steps. Research orchestration then follows: a loop of data gathering and analysis that culminates in a report, a final recommendation, and a confidence score.
Once the initial plan is created, the research agent takes over execution, working through each planned step sequentially. For each step, the agent autonomously determines which tools are most appropriate for gathering the required information. It might start by searching the internal vector database for relevant SEC filings and earnings reports, then expand to web searches for current market sentiment and news, and finally perform financial calculations on the gathered data (support for direct API data sources is planned). Throughout this process, the agent maintains awareness of what information has already been collected and what gaps remain to be filled.
The adaptive workflow mode adds a further layer of intelligence through continuous evaluation and adjustment of the plan. After each research step completes, the feedback agent assesses the quality and completeness of the findings, generating structured feedback that includes confidence scores, identified data gaps, and unexpected discoveries. This feedback is passed to the adaptive planning agent, which decides whether the original plan needs modification and records its decision in conversation memory. If significant gaps are identified or valuable unexpected information is discovered, the planning agent can insert new steps, reorder priorities, or remove redundant tasks. This creates an iterative loop in which the system learns and adapts as it progresses, much like a human analyst who adjusts their approach based on initial findings.
Figure 2: Adaptive workflow. Blue: start/end; red: core process; orange: decisions; green: adaptation; purple: execution.
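A minimal sketch of what this evaluate-and-adapt loop could look like, reusing the illustrative agents from earlier sections; the feedback and update fields (needs_replanning, plan_updated, new_steps) are assumptions, not the project's actual schema:

```python
async def run_adaptive_workflow(
    query: str, deps: ResearchDependencies, max_adaptations: int = 3
) -> InvestmentFindings:
    plan = (await planning_agent.run(query)).output  # `.data` in older releases
    findings, adaptations = None, 0
    while plan.steps:
        step = plan.steps.pop(0)
        # Execute the step, then have the feedback agent grade the result.
        findings = (await research_agent.run(step.description, deps=deps)).output
        feedback = (await feedback_agent.run(
            f"Step: {step.description}\nFindings: {findings.summary}"
        )).output
        if adaptations < max_adaptations and feedback.needs_replanning:
            update = (await adaptive_planning_agent.run(
                f"Feedback: {feedback}\nRemaining steps: {plan.steps}"
            )).output
            if update.plan_updated:
                plan.steps = update.new_steps
                adaptations += 1
    return findings
```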
Throughout the entire workflow, the memory management system ensures efficient operation by intelligently filtering and summarizing conversation history to prevent token explosion while preserving essential context. The system maintains multiple types of memory: permanent storage in the vector database for documents, session-level caching for search results, and adaptive conversation memory that adjusts its retention strategy based on conversation length. All findings are continuously aggregated and structured according to Pydantic schemas, ensuring type safety and consistency. The workflow concludes when either all planned steps are completed or the confidence threshold is met, at which point the system generates a final InvestmentAnalysis containing a comprehensive summary, key insights, risk factors, opportunities, and a clear investment recommendation with supporting rationale.
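The final output could be modeled along these lines; the field names follow the description above but are illustrative rather than copied from schemas.py:

```python
from pydantic import BaseModel

class InvestmentAnalysis(BaseModel):
    """Final aggregated result of a research run (illustrative fields)."""
    summary: str
    key_insights: list[str]
    risk_factors: list[str]
    opportunities: list[str]
    recommendation: str          # e.g. "buy" / "hold" / "sell"
    confidence: float            # 0-1; drives the early-exit threshold
    supporting_rationale: str
```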
Workflow Modes¶
The system supports multiple research modes through the Streamlit interface:
- Simple Chat: Basic Q&A without tools
- RAG Only: Vector database search only
- Deep Research: Web search + analysis
- Full Planning: Complete workflow with planning agent
- Adaptive Memory: Full workflow with dynamic plan adaptation
Running the System¶
Prerequisites¶
# Install dependencies
poetry install
# Set environment variables
export OPENROUTER_API_KEY="your-openrouter-api-key"
export TAVILY_API_KEY="your-tavily-api-key" # For web search
# Optional: Logfire for observability
export LOGFIRE_TOKEN="your-logfire-token"
Usage Options¶
- Command Line Interface:
python main.py
- Streamlit Web Interface (recommended):
streamlit run streamlit_app.py
- Programmatic Usage:
from main import adaptive_research_investment

analysis = await adaptive_research_investment(
    query="Should I invest in Apple?",
    context="Conservative investor, 5-year horizon",
    max_adaptations=3,
)
Configuration¶
- LLM Provider: Configured for OpenRouter (supports OpenAI-compatible APIs)
- Vector Database: ChromaDB with local persistence at ./investment_chroma_db/
- Knowledge Base: Pre-loaded documents in ./knowledge_base/
- Default Model: GPT-4 variants via OpenRouter
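Pointing Pydantic-AI at OpenRouter works through its OpenAI-compatible provider. A sketch assuming a recent pydantic-ai version (exact import paths vary between releases):

```python
import os

from pydantic_ai.models.openai import OpenAIModel
from pydantic_ai.providers.openai import OpenAIProvider

# OpenRouter exposes an OpenAI-compatible endpoint, so the OpenAI provider
# works unchanged with a swapped base URL. The model slug is illustrative.
model = OpenAIModel(
    "openai/gpt-4o",
    provider=OpenAIProvider(
        base_url="https://openrouter.ai/api/v1",
        api_key=os.environ["OPENROUTER_API_KEY"],
    ),
)
```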
The system's modular design allows easy extension with new agents, tools, or data sources while maintaining clear boundaries between components.
Conclusion: Augmenting Human Expertise¶
This Pydantic-AI based system offers a powerful and flexible framework for building sophisticated, AI-driven investment research applications. Its emphasis on modularity, composability, natural tool use, and adaptive planning makes it straightforward to extend and maintain.
Sound research must precede any investment decision. By automating the time-consuming work of data gathering and analysis, the agent significantly reduces the search space and illuminates a path for human analysts. It serves as a powerful tool to enrich context, recognize patterns, and ultimately empower us to make better, faster decisions. As the complexity of financial markets grows, agentic systems like this one become vital for maintaining a competitive edge.