Skip to content

Applying the latest advancements in AI and machine learning to solve complex business problems.

License

Notifications You must be signed in to change notification settings

sachins-eng/neo-sophia

 
 

Repository files navigation

Create you AI or LLM Strategy

Large-scale language models like GPT-4 from OpenAI serve as foundational technologies that can be applied to virtually any business issue. However, the robust power and flexibility of this technology come with a significant challenge: it is extremely difficult to pinpoint the optimal opportunities for leveraging this technology within a company.

This project is designed to assist analytics leaders, product managers, and development teams in surmounting these obstacles by demonstrating the technology's application across a variety of common business problems. The project unfolds through a series of episodes, each accompanied by the following resources:

  1. Walkthrough videos available on Prolego's YouTube Channel.
  2. Tagged releases on the main branch of this repository.
  3. Conversations held within Prolego's Discord community.

Capabilities approach

We advise our clients to take a capabilties-based approach when building their AI. That is, create foundational solutions that allow you to solve many different business use cases. Unfortunately too many teams begin solving specifing business problems withough building a generalizable foundation.

Most companies are developing the following capabilties as part of their AI strategy.

Capability Explanation Examples
text classification Assigning categories to documents or document sections. Episode 2
information extraction Pulling out names, places, or specific sections from documents. Episode 2
semantic search Finding information based on its meaning instead of keywords. Episode 1, Episode 4
information summarization Condense extensive documents into concise and essential highlights.
information comparison Identifying similar documents or sections of documents.
document generation Creating precisely written content consistent with style and needs. Often includes a review step. Episode 2
unified natural language query Empowering anyone to get answers to questions about data and documents without SQL or tools. Episode 3, Episode 4, Episode 5
routine task automation Automating analysis of information from various sources, reasoning across them, and making decisions.

Quickstart installation

First install the neo-sophia code on your local machine before proceeding to the examples from the Episodes below.

Install the base neo-sophia code

git clone https://github.com/prolego-team/neo-sophia.git
conda env create -f neo-sophia/env.yml
conda activate neosophia 
pip install -e neo-sophia
cd neo-sophia
cp config_example.json config.json
cp openai_api_key_example.txt openai_api_key.txt

Setup your local configuration.

  1. Change the path locations in config.json or use the defaults.
  2. Add your OpenAI API key to openai_api_key.txt.

Run the tests.

./test.sh

If the tests pass you are ready to run the code in one of the Episodes.

Questions? Just ask in our Discord Community.

Episodes

Episode 5 - Unified NLQ with multiple tables

Episode 4 - Unified NLQ with a table and document

Episode 3 - Intro to Unified Natural Language Query

What will be the AI "killer app" in the enterprise? Our bet is Unified Natural Language Query (NQL). It gives executives and business leaders the ability to get insights from data by asking "natural" questions, similar to how you currently use ChatGPT. In this Episode we describe the business problem and show the extensible power of a simple example of SQL generation supplemented with the reasoning power of an LLM like GPT-4.

Videos (coming soon!)

  • [Unified Natural Language Query is the enerprise AI "killer app"]
  • [SQL generation and interpretation with LLMs]
  1. Checkout Episode 3, Release v0.3.2
    git checkout tags/v0.3.2
  1. Start the demo by running
    python -m examples.sqlite_chat

Episode 2 - Automated document processing

Every company has businesses processes that require ingesting and processing a stream of text documents. Most of this processing requires tedious human effort to find, edit, review, summarize, score, etc. chunks of text from larger documents. In this Episode we demonstrate a generalized approach for solving many of these problems using LLMs. The example takes a set of SEC 10-Q company filings and replaces the "Basis of Presentation" section with different text based on an editable templates.

Videos (coming soon!)

  • [Your AI strategy “quick win” - automated document processing]
  • [Audomated document processing - technical walkthrough]

Running the Episode 2 application

  1. Checkout Episode 2, Release v0.2.0
    git checkout tags/v0.2.0
  1. Start the demo by running
    python -m examples.generate_10q_basis

Episode 1 - Embeddings, Semantic search and Document Q&A

Most companies are struggling to pick the best AI use cases from many different options. By building a core competency in document embeddings you can begin developing a set of capabilities applicable for many enterprise use cases. In Episoide 1 we provide a primer on embeddings for a business audience and demonstrate the use of embeddings in semantic search and document Q&A.

This episode uses data from the MSRB Regulatory Rulebook

Videos

Running the Episode 1 application

  1. Checkout Episode 1, Release v0.1.1
    git checkout tags/v0.1.1
  1. Extract text from the MSRB Rulebook:
    python -m scripts.download_and_extract_msrb
  1. Start the demo by running
    python -m examples.interface

About Prolego & Neo Sophia

Prolego is an AI services company that started in 2017 and has helped some of the world’s biggest companies generate opportunities with AI. "Prolego" is the Greek word for "predict". We needed a name for this repo and decided to use the Greek words for "new" (neo) and "wisdom" (sophia). And we just thought that Neo Sophia sounded cool.

The team:

About

Applying the latest advancements in AI and machine learning to solve complex business problems.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 99.4%
  • Shell 0.6%