Update Readme.md.

meerkat0688 · Jul 16, 2024 · 03ace46
1 parent 708fc04
commit 03ace46
Showing 1 changed file with 103 additions and 69 deletions.
diff --git a/README.md b/README.md
@@ -6,15 +6,16 @@
 
 <p align="center">
 | <a href="http://storm.genie.stanford.edu"><b>Research preview</b></a> | <a href="https://arxiv.org/abs/2402.14207"><b>Paper</b></a> | <a href="https://storm-project.stanford.edu/"><b>Website</b></a> |
-
+</p>
 
 **Latest News** 🔥
 
+- [2024/07] You can now install our package with `pip install knowledge-storm`!
 - [2024/07] We add `VectorRM` to support grounding on user-provided documents, complementing existing support of search engines (`YouRM`, `BingSearch`). (check out [#58](https://github.com/stanford-oval/storm/pull/58))
 - [2024/07] We release demo light for developers a minimal user interface built with streamlit framework in Python, handy for local development and demo hosting (checkout [#54](https://github.com/stanford-oval/storm/pull/54))
 - [2024/06] We will present STORM at NAACL 2024! Find us at Poster Session 2 on June 17 or check our [presentation material](assets/storm_naacl2024_slides.pdf). 
-- [2024/05] We add Bing Search support in [rm.py](src/rm.py). Test STORM with `GPT-4o` - we now configure the article generation part in our demo using `GPT-4o` model.
-- [2024/04] We release refactored version of STORM codebase! We define [interface](src/interface.py) for STORM pipeline and reimplement STORM-wiki (check out [`src/storm_wiki`](src/storm_wiki)) to demonstrate how to instantiate the pipeline. We provide API to support customization of different language models and retrieval/search integration.
+- [2024/05] We add Bing Search support in [rm.py](knowledge_storm/rm.py). Test STORM with `GPT-4o` - we now configure the article generation part in our demo using `GPT-4o` model.
+- [2024/04] We release a refactored version of the STORM codebase! We define an [interface](knowledge_storm/interface.py) for the STORM pipeline and reimplement STORM-wiki (check out [`knowledge_storm/storm_wiki`](knowledge_storm/storm_wiki)) to demonstrate how to instantiate the pipeline. We provide an API to support customization of different language models and retrieval/search integration.
 
 ## Overview [(Try STORM now!)](https://storm.genie.stanford.edu/)
 
@@ -46,25 +47,95 @@ Based on the separation of the two stages, STORM is implemented in a highly modu
 
 
 
-## Getting started
+## Installation
 
-### 1. Setup
 
-Below, we provide a quick start guide to run STORM locally.
+To install the knowledge storm library, use `pip install knowledge-storm`. 
 
+You can also install from source, which lets you modify the behavior of the STORM engine directly.
 1. Clone the git repository.
- ```shell
- git clone https://github.com/stanford-oval/storm.git
- cd storm
- ```
+  ```shell
+  git clone https://github.com/stanford-oval/storm.git
+  cd storm
+  ```
 
 2. Install the required packages.
  ```shell
  conda create -n storm python=3.11
  conda activate storm
  pip install -r requirements.txt
  ```
-3. Set up OpenAI API key (if you want to use OpenAI models to power STORM) and [You.com search API](https://api.you.com/) key. Create a file `secrets.toml` under the root directory and add the following content:
+
+
+## API
+The STORM knowledge curation engine is defined as a simple Python `STORMWikiRunner` class.
+
+Because STORM works in the information curation layer, you need to set up an information retrieval module and a language model module to create a `STORMWikiRunner` instance. Here is an example using the You.com search engine and OpenAI models.
+```python
+import os
+from knowledge_storm import STORMWikiRunnerArguments, STORMWikiRunner, STORMWikiLMConfigs
+from knowledge_storm.lm import OpenAIModel
+from knowledge_storm.rm import YouRM
+
+
+lm_configs = STORMWikiLMConfigs()
+openai_kwargs = {
+ 'api_key': os.getenv("OPENAI_API_KEY"),
+ 'temperature': 1.0,
+ 'top_p': 0.9,
+}
+
+# STORM is an LM system, so different components can be powered by different models to balance cost and quality.
+# As a good practice, choose a cheaper/faster model for `conv_simulator_lm`, which is used to split queries and synthesize answers in the conversation. Choose a more powerful model for `article_gen_lm` to generate verifiable text with citations.
+gpt_35 = OpenAIModel(model='gpt-3.5-turbo', max_tokens=500, **openai_kwargs)
+gpt_4 = OpenAIModel(model='gpt-4o', max_tokens=3000, **openai_kwargs)
+
+lm_configs.set_conv_simulator_lm(gpt_35)
+lm_configs.set_question_asker_lm(gpt_35)
+lm_configs.set_outline_gen_lm(gpt_4)
+lm_configs.set_article_gen_lm(gpt_4)
+lm_configs.set_article_polish_lm(gpt_4)
+
+
+# Check out the STORMWikiRunnerArguments class for more configurations.
+engine_args = STORMWikiRunnerArguments(...)
+
+rm = YouRM(ydc_api_key=os.getenv('YDC_API_KEY'), k=engine_args.search_top_k)
+
+runner = STORMWikiRunner(engine_args, lm_configs, rm)
+```
+
+Currently, our package supports:
+- `OpenAIModel`, `AzureOpenAIModel`, `ClaudeModel`, `VLLMClient`, `TGIClient`, `TogetherClient` as language model components
+- `YouRM`, `BingSearch`, `VectorRM` as retrieval components
+
+:star2: **PRs for integrating more language models into [knowledge_storm/lm.py](knowledge_storm/lm.py) and search engines/retrievers into [knowledge_storm/rm.py](knowledge_storm/rm.py) are highly appreciated!**
+
+The `STORMWikiRunner` instance can be invoked with the simple `run` method:
+```python
+topic = input('Topic: ')
+runner.run(
+ topic=topic,
+ do_research=True,
+ do_generate_outline=True,
+ do_generate_article=True,
+ do_polish_article=True,
+)
+runner.post_run()
+runner.summary()
+```
+- `do_research`: if True, simulate conversations with different perspectives to collect information about the topic; otherwise, load the results.
+- `do_generate_outline`: if True, generate an outline for the topic; otherwise, load the results.
+- `do_generate_article`: if True, generate an article for the topic based on the outline and the collected information; otherwise, load the results.
+- `do_polish_article`: if True, polish the article by adding a summarization section and (optionally) removing duplicate content; otherwise, load the results.
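
Because each `do_*` flag either runs a stage or loads that stage's saved results, a later stage can be rerun without repeating the earlier ones. The sketch below shows one way to build the four keyword arguments from a list of stage names. `STAGES` and `stage_flags` are hypothetical helpers written for this illustration and are not part of `knowledge_storm`; only the four flag names come from the README text above.

```python
# Stage names in pipeline order; each maps to a `do_<stage>` keyword argument.
STAGES = ("research", "generate_outline", "generate_article", "polish_article")


def stage_flags(stages_to_run):
    """Map stage names to the `do_*` keyword arguments accepted by `run`.

    Stages that are not listed get False, which (per the README) means their
    previously saved results are loaded instead of being recomputed.
    """
    requested = set(stages_to_run)
    unknown = requested - set(STAGES)
    if unknown:
        raise ValueError(f"unknown stage(s): {sorted(unknown)}")
    return {f"do_{stage}": stage in requested for stage in STAGES}


# Example: reuse the saved research and outline, regenerate and polish the article.
flags = stage_flags(["generate_article", "polish_article"])

# With a configured runner (see the API example above), this could be passed as:
# runner.run(topic=topic, **flags)
```

Skipping a stage only makes sense when its results already exist in the output directory from an earlier run.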
+
+
+## Quick Start with Example Scripts
+
+We provide scripts in our [examples folder](examples) as a quick start to run STORM with different configurations.
+
+**To run STORM with `gpt` family models with default configurations:**
+1. We suggest using `secrets.toml` to set up the API keys. Create a file `secrets.toml` under the root directory and add the following content:
  ```shell
  # Set up OpenAI API key.
  OPENAI_API_KEY="your_openai_api_key"
@@ -77,72 +148,31 @@ Below, we provide a quick start guide to run STORM locally.
  # Set up You.com search API key.
  YDC_API_KEY="your_youcom_api_key"
  ```
+2. Run the following command.
+ ```
+ python examples/run_storm_wiki_gpt.py \
+ --output-dir $OUTPUT_DIR \
+ --retriever you \
+ --do-research \
+ --do-generate-outline \
+ --do-generate-article \
+ --do-polish-article
+ ```
 
+**To run STORM using your favorite language models or grounding on your own corpus:** Check out [examples/README.md](examples/README.md).
 
-### 2. Running STORM-wiki locally
-
-**To run STORM with `gpt` family models with default configurations**: Make sure you have set up the OpenAI API key and run the following command.
-
-```
-python examples/run_storm_wiki_gpt.py \
- --output-dir $OUTPUT_DIR \
- --retriever you \
- --do-research \
- --do-generate-outline \
- --do-generate-article \
- --do-polish-article
-```
-- `--do-research`: if True, simulate conversation to research the topic; otherwise, load the results.
-- `--do-generate-outline`: If True, generate an outline for the topic; otherwise, load the results.
-- `--do-generate-article`: If True, generate an article for the topic; otherwise, load the results.
-- `--do-polish-article`: If True, polish the article by adding a summarization section and (optionally) removing duplicate content.
-
-
-We provide more example scripts under [`examples`](examples) to demonstrate how you can run STORM using your favorite language models or grounding on your own corpus.
-
-
-## Customize STORM 
 
-### Customization of the Pipeline
+## Customization of the Pipeline
 
-Besides running scripts in `examples`, you can customize STORM based on your own use case. STORM engine consists of 4 modules:
+If you have installed from source, you can customize STORM for your own use case. The STORM engine consists of four modules:
 
 1. Knowledge Curation Module: Collects a broad coverage of information about the given topic.
 2. Outline Generation Module: Organizes the collected information by generating a hierarchical outline for the curated knowledge.
 3. Article Generation Module: Populates the generated outline with the collected information.
 4. Article Polishing Module: Refines and enhances the written article for better presentation.
 
-The interface for each module is defined in `src/interface.py`, while their implementations are instantiated in `src/storm_wiki/modules/*`. These modules can be customized according to your specific requirements (e.g., generating sections in bullet point format instead of full paragraphs).
+The interface for each module is defined in `knowledge_storm/interface.py`, while their implementations are instantiated in `knowledge_storm/storm_wiki/modules/*`. These modules can be customized according to your specific requirements (e.g., generating sections in bullet point format instead of full paragraphs).
 
-:star2: **You can share your customization of `Engine` by making PRs to this repo!**
-
-### Customization of Retriever Module
-
-As a knowledge curation engine, STORM grabs information from the Retriever module. The Retriever modules are implemented in [`src/rm.py`](src/rm.py). Currently, STORM supports the following retrievers:
-
-- `YouRM`: You.com search engine API
-- `BingSearch`: Bing Search API
-- `VectorRM`: a retrieval model that retrieves information from user provide corpus
-
-:star2: **PRs for integrating more search engines/retrievers are highly appreciated!**
-
-### Customization of Language Models
-
-STORM provides the following language model implementations in [`src/lm.py`](src/lm.py):
-
-- `OpenAIModel`
-- `ClaudeModel`
-- `VLLMClient`
-- `TGIClient`
-- `TogetherClient`
-
-:star2: **PRs for integrating more language model clients are highly appreciated!**
-
-:bulb: **For a good practice,** 
-
-- choose a cheaper/faster model for `conv_simulator_lm` which is used to split queries, synthesize answers in the conversation.
-- if you need to conduct the actual writing step, choose a more powerful model for `article_gen_lm`. Based on our experiments, weak models are bad at generating text with citations.
-- for open models, adding one-shot example can help it better follow instructions.
 
 Please refer to the scripts in the [`examples`](examples) directory for concrete guidance on customizing the language model used in the pipeline.
 
@@ -157,7 +187,7 @@ Please switch to the branch `NAACL-2024-code-backup`
 
 The FreshWiki dataset used in our experiments can be found in [./FreshWiki](FreshWiki).
 
-Run the following commands under [./src](src).
+Run the following commands under [./knowledge_storm](knowledge_storm).
 
 #### Pre-writing Stage
 For batch experiment on FreshWiki dataset:
@@ -196,7 +226,7 @@ python -m scripts.run_writing --input-source console --engine gpt-4 --do-polish-
 The generated article will be saved in `{output_dir}/{topic}/storm_gen_article.txt` and the references corresponding to citation index will be saved in `{output_dir}/{topic}/url_to_info.json`. If `--do-polish-article` is set, the polished article will be saved in `{output_dir}/{topic}/storm_gen_article_polished.txt`. 
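
To make the output layout concrete, here is a small sketch that reads the files named above for one topic. `load_storm_outputs` is a hypothetical helper, not part of the repository. It assumes the topic directory is named exactly as the `{output_dir}/{topic}` template suggests, and it makes no assumption about the internal structure of `url_to_info.json` beyond it being valid JSON.

```python
import json
from pathlib import Path


def load_storm_outputs(output_dir, topic):
    """Return (article_text, references) for one topic directory."""
    topic_dir = Path(output_dir) / topic
    polished = topic_dir / "storm_gen_article_polished.txt"
    draft = topic_dir / "storm_gen_article.txt"
    # Prefer the polished article when `--do-polish-article` produced one.
    article_path = polished if polished.exists() else draft
    article = article_path.read_text(encoding="utf-8")
    # References keyed by citation information, returned as parsed JSON.
    references = json.loads(
        (topic_dir / "url_to_info.json").read_text(encoding="utf-8")
    )
    return article, references


# Example usage (paths are placeholders):
# article, references = load_storm_outputs("results", "my_topic")
```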
 
 ### Customize the STORM Configurations
-We set up the default LLM configuration in `LLMConfigs` in [src/modules/utils.py](src/modules/utils.py). You can use `set_conv_simulator_lm()`,`set_question_asker_lm()`, `set_outline_gen_lm()`, `set_article_gen_lm()`, `set_article_polish_lm()` to override the default configuration. These functions take in an instance from `dspy.dsp.LM` or `dspy.dsp.HFModel`.
+We set up the default LLM configuration in `LLMConfigs` in [knowledge_storm/modules/utils.py](knowledge_storm/modules/utils.py). You can use `set_conv_simulator_lm()`, `set_question_asker_lm()`, `set_outline_gen_lm()`, `set_article_gen_lm()`, `set_article_polish_lm()` to override the default configuration. These functions take in an instance from `dspy.dsp.LM` or `dspy.dsp.HFModel`.
 
 
 ### Automatic Evaluation
@@ -224,7 +254,11 @@ For rubric grading, we use the [prometheus-13b-v1.0](https://huggingface.co/prom
 
 </details>
 
-## Contributions
+## Roadmap & Contributions
+Our team is actively working on:
+1. Human-in-the-Loop Functionalities: Supporting user participation in the knowledge curation process.
+2. Information Abstraction: Developing abstractions for curated information to support presentation formats beyond the Wikipedia-style report.
+
 If you have any questions or suggestions, please feel free to open an issue or pull request. We welcome contributions to improve the system and the codebase!
 
 Contact person: [Yijia Shao](mailto:[email protected]) and [Yucheng Jiang](mailto:[email protected])