forked from microsoft/UFO
Merge pull request microsoft#38 from microsoft/pre-release
release v0.0.1
Showing 51 changed files with 4,726 additions and 897 deletions.
@@ -1,8 +1,28 @@
# Ignore login file
*.bin

# Ignore Jupyter Notebook checkpoints
.ipynb_checkpoints
/test/*
/deprecated/*
/test/*.ipynb
/logs/*
__pycache__/
**/__pycache__/
*.pyc

# Ignore the config file
ufo/config/config.yaml
ufo/config/config_llm.yaml


# Ignore the helper files
ufo/rag/app_docs/*
learner/records.json
vectordb/docs/*
vectordb/experience/*

# Don't ignore the example files
!vectordb/docs/example/

.vscode
@@ -0,0 +1,32 @@

# Enhancing UFO with RAG using Offline Help Documents

## How to Prepare Your Help Documents ❓

### Step 1: Prepare Your Help Doc and Metadata

UFO currently supports processing help documents in XML format, as this is the default format for the official help documents of Microsoft apps. More formats will be supported in the future.

You can write a dedicated document for a specific task of an app in a file named, for example, `task.xml`. Each help document must be accompanied by a metadata file with the same name plus a `.meta` extension, i.e., `task.xml.meta`. This metadata file should have a `title` field describing the task at a high level and a `Content-Summary` field summarizing the content of the help document. These two files are used for similarity search against user requests, so write them carefully. [ppt-copilot.xml](./doc_example/ppt-copilot.xml) and [ppt-copilot.xml.meta](./doc_example/ppt-copilot.xml.meta) are examples of a help document and its metadata.

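As a rough illustration of how the `title` and `Content-Summary` fields could be read back for indexing, the sketch below assumes the `.meta` file is a small XML document whose root element has `<title>` and `<Content-Summary>` children. That layout is an assumption; check [ppt-copilot.xml.meta](./doc_example/ppt-copilot.xml.meta) for the exact format. The function name `read_help_metadata` and the sample metadata are hypothetical and not part of UFO.

```python
import xml.etree.ElementTree as ET


def read_help_metadata(meta_text: str) -> dict:
    """Hypothetical helper: extract the fields used for similarity search.

    Assumes the .meta content is XML with <title> and <Content-Summary>
    child elements under the root; adjust the tag names if yours differ.
    """
    root = ET.fromstring(meta_text)
    # findtext returns the default when the element is missing.
    title = root.findtext("title", default="").strip()
    summary = root.findtext("Content-Summary", default="").strip()
    # Join both fields into the text that would be embedded and compared
    # against user requests.
    return {"title": title, "summary": summary, "text": f"{title}\n{summary}"}


if __name__ == "__main__":
    example_meta = """<metadata>
  <title>Example task title</title>
  <Content-Summary>A short summary of what the help document covers.</Content-Summary>
</metadata>"""
    print(read_help_metadata(example_meta)["title"])
```
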
### Step 2: Prepare Your Help Document Set

Once all help documents and metadata files are ready, put them into one folder. The folder may contain sub-folders, but each help document and its corresponding metadata file **must be placed in the same directory**.

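Before building the indexer, it can help to check that every document has its metadata file next to it. The following is a minimal sketch using only the standard library; `find_doc_meta_pairs` is a hypothetical helper (not part of UFO), and it assumes documents end in `.xml` and metadata files are named `<document name>.meta`, as described in Step 1.

```python
import os
from typing import List, Tuple


def find_doc_meta_pairs(
    root_dir: str, doc_ext: str = ".xml"
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Walk root_dir (including sub-folders) and pair each help document with
    the metadata file in the same directory (e.g. task.xml -> task.xml.meta).

    Returns (pairs, missing): `missing` lists documents with no metadata
    file alongside them.
    """
    pairs: List[Tuple[str, str]] = []
    missing: List[str] = []
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        names = set(filenames)
        for name in sorted(filenames):
            # task.xml.meta ends with ".meta", so it is not treated as a document.
            if not name.endswith(doc_ext):
                continue
            doc_path = os.path.join(dirpath, name)
            if name + ".meta" in names:
                pairs.append((doc_path, os.path.join(dirpath, name + ".meta")))
            else:
                missing.append(doc_path)
    return pairs, missing


if __name__ == "__main__":
    # Replace with the folder that holds your help documents.
    _, missing_meta = find_doc_meta_pairs("path_of_the_docs")
    for doc in missing_meta:
        print(f"Missing metadata for {doc}")
```

A metadata file placed in a different directory from its document is reported as missing, which mirrors the same-directory requirement above.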
## How to Create an Indexer for Your Help Document Set ❓

Once all documents are ready in a folder named `path_of_the_docs`, you can create an offline indexer to support RAG for UFO by running the following command:

```console
# assume you are in the cloned UFO folder
python -m learner --app <app_name> --docs <path_of_the_docs>
```

Replace `<app_name>` with the name of the application, such as PowerPoint or WeChat, and `<path_of_the_docs>` with the full path to the folder containing all your documents.

> Note: Make sure `app_name` is defined accurately, as it is used to match the offline indexer during online RAG.

This command creates an offline indexer for all documents in the `path_of_the_docs` folder using Faiss, with embeddings computed by a Sentence Transformers model (more embedding models will be supported soon). By default, the created index is placed [here](../vectordb/docs/).
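
To make the retrieval idea concrete, here is a toy, dependency-free sketch: each document's metadata text is turned into a vector, and a user request is matched to the most similar documents by cosine similarity. It deliberately swaps in a bag-of-words vector for the Sentence Transformers embedding and a brute-force scan for the Faiss index, so it only illustrates the principle, not UFO's actual indexer. The names `embed`, `build_index`, and `search` and the sample texts are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Counter:
    """Toy stand-in for a sentence-transformer embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_index(docs: Dict[str, str]) -> List[Tuple[str, Counter]]:
    """Embed each document's metadata text (e.g. title + Content-Summary)."""
    return [(doc_id, embed(text)) for doc_id, text in docs.items()]


def search(index: List[Tuple[str, Counter]], request: str, top_k: int = 1) -> List[str]:
    """Brute-force stand-in for a Faiss lookup: rank all documents by similarity."""
    query = embed(request)
    ranked = sorted(index, key=lambda item: cosine(query, item[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:top_k]]


if __name__ == "__main__":
    index = build_index({
        "task.xml": "Insert a table. How to add and format a table on a slide.",
        "other.xml": "Send a message. How to start a chat with a contact.",
    })
    print(search(index, "how do I add a table to my slide"))
```

A real deployment would replace `embed` with dense embeddings and the linear scan with an approximate-nearest-neighbor index, which is what the Faiss-based indexer provides at scale.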

@@ -0,0 +1,2 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.

@@ -0,0 +1,8 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.

from . import learn

if __name__ == "__main__":
    # Execute the main script
    learn.main()
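
Running `python -m learner` executes this `__main__.py`, which delegates to `learn.main()`. The `learn` module is not shown in this part of the diff. As a hedged sketch only, a `main()` that accepts the `--app` and `--docs` options from the README command could parse them with the standard `argparse` module as below; the `parse_args` helper, the help strings, and making both options required are assumptions, not UFO's actual code.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Hypothetical parser for: python -m learner --app <app_name> --docs <path_of_the_docs>."""
    parser = argparse.ArgumentParser(
        prog="learner",
        description="Build an offline RAG indexer from help documents.",
    )
    parser.add_argument(
        "--app",
        required=True,
        help="Application name, e.g. PowerPoint; used to match the indexer during online RAG.",
    )
    parser.add_argument(
        "--docs",
        required=True,
        help="Full path to the folder containing the help documents and .meta files.",
    )
    # argv=None makes argparse read sys.argv[1:].
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args(["--app", "PowerPoint", "--docs", "C:/help_docs"])
    print(args.app, args.docs)
```
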

@@ -0,0 +1,39 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
from typing import Optional

from . import utils


class BasicDocumentLoader:
    """
    A base class for loading documents with the given extension(s) from a directory.
    """

    def __init__(self, extensions: Optional[str] = None, directory: Optional[str] = None):
        """
        Create a new BasicDocumentLoader.
        :param extensions: The file extension(s) of the documents to load.
        :param directory: The directory to load the documents from.
        """
        self.extensions = extensions
        self.directory = directory

    def load_file_name(self):
        """
        Find the files in self.directory that match self.extensions.
        :return: The list of matching files.
        """
        return utils.find_files_with_extension(self.directory, self.extensions)

    def construct_document_list(self):
        """
        Construct the list of documents, with their metadata, from self.directory.
        Subclasses must override this method.
        :return: The list of loaded documents and their metadata.
        """
        # Base-class stub: concrete loaders provide the actual implementation.
        raise NotImplementedError
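
`BasicDocumentLoader` delegates file discovery to `utils.find_files_with_extension` and leaves `construct_document_list` to subclasses; neither the `utils` module nor a concrete subclass appears in this part of the diff. The self-contained sketch below shows one plausible way the pieces could fit together. The stand-in `find_files_with_extension`, the `XmlHelpDocLoader` subclass, and the returned dictionary layout are all hypothetical, and a simplified copy of the base class is included (without the relative `utils` import) so the snippet runs on its own.

```python
import os
from typing import Dict, List, Optional


def find_files_with_extension(directory: str, extension: str) -> List[str]:
    """Hypothetical stand-in for utils.find_files_with_extension:
    recursively collect files under `directory` whose names end with `extension`."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(directory):
        for name in sorted(filenames):
            if name.endswith(extension):
                matches.append(os.path.join(dirpath, name))
    return matches


class BasicDocumentLoader:
    """Simplified version of the base class above, using the stand-in helper."""

    def __init__(self, extensions: Optional[str] = None, directory: Optional[str] = None):
        self.extensions = extensions
        self.directory = directory

    def load_file_name(self) -> List[str]:
        return find_files_with_extension(self.directory, self.extensions)

    def construct_document_list(self):
        raise NotImplementedError


class XmlHelpDocLoader(BasicDocumentLoader):
    """Hypothetical subclass: pair each .xml help document with its .meta file."""

    def __init__(self, directory: str):
        super().__init__(extensions=".xml", directory=directory)

    def construct_document_list(self) -> List[Dict[str, str]]:
        documents = []
        for doc_path in self.load_file_name():
            meta_path = doc_path + ".meta"
            # Skip documents whose metadata file is not in the same directory.
            if os.path.isfile(meta_path):
                documents.append({"doc": doc_path, "meta": meta_path})
        return documents
```

Keeping discovery in the base class and pairing logic in the subclass lets other document formats reuse `load_file_name` while supplying their own `construct_document_list`.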