The Company Researcher is an open-source tool designed for in-depth company analysis. Built with Tavily’s search and extract capabilities and powered by LangGraph, it delivers precise, real-time insights in a structured format. Ideal for competitive intelligence, lead research, and Go-to-Market (GTM) strategies, this tool uses AI-driven workflows to produce comprehensive, reliable reports for data-driven decision-making.
- Overview
- Key Workflow Features
- Running the Tool Locally
- Running the Tool in LangGraph Studio
- Customization
- Future Directions
The Company Researcher is an open-source tool designed for in-depth company analysis. Built with Tavily’s search and extract capabilities and powered by LangGraph, it gathers both general and targeted information, using feedback loops and optional human validation for accuracy. It is designed to handle complex scenarios, such as distinguishing similarly named companies or gathering data in sparsely documented fields, and can be easily adapted to other research domains.
- Establishing a Ground Truth with Tavily Extract: Each session begins by setting a “ground truth” with Tavily’s extract tool, using a user-provided company name and URL. This foundational data anchors the subsequent search, ensuring all steps stay within accurate and verified data boundaries.
- Sub-Question Generation and Tavily Search: The workflow dynamically generates specific research questions to drive Tavily’s search, focusing the retrieval on relevant, high-value information rather than conducting broad, unfocused searches.
- AI-Driven Document Clustering: Retrieved documents are clustered based on relevance to the target company. This process, anchored by the ground truth, filters out unrelated content, a critical feature for similarly named companies or entities with minimal online presence.
- Human-on-the-Loop Validation: In cases where clustering yields ambiguous results, optional human review allows for manual cluster selection, ensuring the data aligns accurately with the target entity.
- Document Curation and Enrichment with Tavily Extract: Once the appropriate cluster is identified, Tavily’s extract further refines and enriches the content, adding substantial depth to the research. This step enhances the precision and comprehensiveness of the final output.
- Report Generation and Evaluation with Feedback Loops: An LLM synthesizes the enriched data into a structured report. If gaps are detected, feedback loops prompt additional information gathering, enabling iterative improvements without restarting the entire workflow.
- Multi-Format Output: The finalized report can be exported in PDF or Markdown formats, making it ready for easy sharing and integration.
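The control flow described in the list above can be sketched in plain Python. This is an illustrative outline only, not the repository's implementation: every function and field name here is hypothetical, and the Tavily, clustering, and LLM steps are replaced with trivial stand-ins so the feedback loop itself is visible.

```python
from dataclasses import dataclass, field


@dataclass
class ResearchState:
    """State passed between stages (all field names are illustrative)."""
    company: str
    url: str
    ground_truth: str = ""
    documents: list[str] = field(default_factory=list)
    report: str = ""
    gaps: list[str] = field(default_factory=list)
    rounds: int = 0


def extract_ground_truth(state):
    # Stand-in for a Tavily extract call on the user-provided URL.
    state.ground_truth = f"facts extracted from {state.url}"


def generate_questions(state):
    # First round: general questions. Later rounds: one question per gap.
    if state.gaps:
        return [f"{state.company} {gap}" for gap in state.gaps]
    return [f"{state.company} products", f"{state.company} funding"]


def search(questions):
    # Stand-in for Tavily search: one fake document per question.
    return [f"doc about {q}" for q in questions]


def select_cluster(state, docs):
    # Stand-in for clustering: keep documents that mention the target company.
    return [d for d in docs if state.company in d]


def write_report(state):
    # Stand-in for LLM synthesis: concatenate the curated documents.
    state.report = "\n".join(state.documents)


def find_gaps(state, required_topics):
    # Stand-in for report evaluation: a topic missing from the report is a gap.
    return [t for t in required_topics if t not in state.report]


def run(company, url, required_topics, max_rounds=3):
    state = ResearchState(company, url)
    extract_ground_truth(state)
    while state.rounds < max_rounds:
        state.rounds += 1
        docs = search(generate_questions(state))
        state.documents += select_cluster(state, docs)
        write_report(state)
        state.gaps = find_gaps(state, required_topics)
        if not state.gaps:
            break  # report is complete, no further gathering needed
    return state
```

Calling `run("Acme", "https://acme.example", ["products", "funding", "leadership"])` runs a second round only for the missing "leadership" topic, which mirrors the idea of iterating on gaps without restarting the whole workflow.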
- Python 3.11 or later: Python Installation Guide
- Tavily API Key - Sign Up
- Anthropic API Key - Sign Up
- Clone the Repository:
    git clone https://github.com/danielleyahalom/company-researcher.git
    cd company-researcher
- Create a Virtual Environment: To avoid dependency conflicts, it's recommended to create and activate a virtual environment using venv:
    python -m venv venv
    source venv/bin/activate   # macOS/Linux
    venv\Scripts\activate      # Windows
- Set Up API Keys: Configure your Anthropic and Tavily API keys as environment variables or place them in a .env file:
    export TAVILY_API_KEY={Your Tavily API Key here}
    export ANTHROPIC_API_KEY={Your Anthropic API Key here}
- Install Dependencies: Install the required Python packages:
    pip install -r requirements.txt
- Run the Application:
    python app.py
- Open the App in Your Browser:
    http://localhost:5000
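Before running the app, it can help to confirm that the prerequisites listed above are met. The following is a small optional check, not part of the repository; the function names are hypothetical. It only inspects the current process environment, so keys stored in a .env file will not be seen unless they have been loaded into the environment first.

```python
import os
import sys

REQUIRED_KEYS = ("TAVILY_API_KEY", "ANTHROPIC_API_KEY")
MIN_PYTHON = (3, 11)


def missing_keys(env):
    """Return the required keys that are absent or blank in `env`."""
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]


def python_ok(version_info=sys.version_info):
    """True if the interpreter is at least Python 3.11."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def preflight(env, version_info=sys.version_info):
    """Return a list of human-readable problems (empty if all is well)."""
    problems = []
    if not python_ok(version_info):
        problems.append("Python 3.11 or later is required")
    for key in missing_keys(env):
        problems.append(f"{key} is not set")
    return problems


# Report any problems found in the current environment.
for problem in preflight(os.environ):
    print(problem)
```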
LangGraph Studio enables visualization, debugging, and real-time interaction with the Company Researcher's workflow. Here’s how to set it up:
- Download LangGraph Studio:
  - For macOS, download the latest .dmg file for LangGraph Studio from here or visit the releases page.
  - Note: Currently, only macOS is supported.
- Install Docker:
  - Ensure Docker Desktop is installed and running. LangGraph Studio requires Docker Compose version 2.22.0 or higher.
- Clone the Repository:
    git clone https://github.com/danielleyahalom/company-researcher.git
    cd company-researcher
  - Note: This repository includes all required files except for the .env file, which you need to create to store your API keys.
- Configure the Environment:
  - Create a .env file in the root directory to store your API keys:
      touch .env
  - Add your API keys to the .env file:
      TAVILY_API_KEY={Your Tavily API Key here}
      ANTHROPIC_API_KEY={Your Anthropic API Key here}
- Ensure LangGraph Configuration Files Are in Place:
  - The repository includes langgraph.json and langgraph_entry.py, defining the entry point and configuration for LangGraph Studio.
- Start LangGraph Studio:
  - Open LangGraph Studio and select the company-researcher directory from the dashboard.
- Running the Workflow in Studio:
  - Visualize each step of the workflow, make real-time edits, and monitor the workflow’s state.
  - Important Note: If a cluster cannot be automatically selected, the tool will attempt to re-cluster instead.
LangGraph Studio provides a hands-on approach to refining the workflow, enhancing both development efficiency and output reliability.
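Since LangGraph Studio requires Docker Compose 2.22.0 or higher, a quick version check can save a failed start. The sketch below is an optional helper, not part of the repository; the function names are hypothetical. It assumes the `docker compose version` command prints a line containing a version such as `v2.24.5`.

```python
import re
import subprocess

MIN_COMPOSE = (2, 22, 0)


def parse_version(text):
    """Extract the first X.Y.Z version in `text` as a tuple of ints, or None."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    return tuple(int(part) for part in match.groups()) if match else None


def compose_is_supported(version_output):
    """True if the reported Docker Compose version is at least 2.22.0."""
    version = parse_version(version_output)
    return version is not None and version >= MIN_COMPOSE


def installed_compose_version():
    """Return the output of `docker compose version`, or "" if unavailable."""
    try:
        result = subprocess.run(
            ["docker", "compose", "version"],
            capture_output=True,
            text=True,
            check=True,
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        return ""
    return result.stdout
```

A typical use is `compose_is_supported(installed_compose_version())`, which returns `False` both when Docker is missing and when the installed Compose version is too old.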
The tool’s modular structure makes it adaptable to various research applications:
- Modify Prompts: Adjust prompts in question generation or report synthesis for different research needs.
- Extend Workflow Nodes: Add, remove, or modify nodes to focus on specific types of analysis.
- Customize Output Formats: Tailor output formats (e.g., CSS for PDF styling) to suit organizational standards.
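As an illustration of the first point, a question-generation prompt can be parameterized so the same workflow serves a different research purpose. This is a hypothetical template written for this example; the repository's actual prompts may be worded and stored differently.

```python
from string import Template

# Hypothetical prompt template, not taken from the repository.
QUESTION_PROMPT = Template(
    "You are researching $subject for a $purpose report.\n"
    "Ground truth:\n$ground_truth\n"
    "Write $count focused search questions."
)


def build_question_prompt(subject, ground_truth, purpose="company research", count=3):
    """Fill the template; change `purpose` to retarget the workflow."""
    return QUESTION_PROMPT.substitute(
        subject=subject,
        ground_truth=ground_truth,
        purpose=purpose,
        count=count,
    )
```

For example, `build_question_prompt("Acme", "Acme makes anvils.", purpose="market analysis")` produces a prompt aimed at market analysis without touching the rest of the workflow.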
This adaptable workflow can be fine-tuned for a range of applications beyond company research:
- Market Analysis: Apply the workflow to track trends, competitors, and emerging tech.
- Lead Generation: Compile detailed profiles on potential clients for targeted outreach.
- Ongoing Knowledge Bases: Build continuously updated research repositories in fields like law, finance, or healthcare.
This tool exemplifies how AI-driven workflows, backed by precise data extraction and real-time search, can reshape research and analysis across domains.