diff --git a/examples/math/gsm8k/gsm8k_assertions.ipynb b/examples/math/gsm8k/gsm8k_assertions.ipynb index 2a61898e6..076d9ce61 100644 --- a/examples/math/gsm8k/gsm8k_assertions.ipynb +++ b/examples/math/gsm8k/gsm8k_assertions.ipynb @@ -1,36 +1,25 @@ { - "nbformat": 4, - "nbformat_minor": 0, - "metadata": { - "colab": { - "provenance": [], - "collapsed_sections": [ - "ghG4tCe3e-kQ", - "xJcHuvwZ-rUk" - ] - }, - "kernelspec": { - "name": "python3", - "display_name": "Python 3" - }, - "language_info": { - "name": "python" - } - }, "cells": [ { "cell_type": "markdown", + "metadata": { + "id": "zdbPo0MbQ4R7" + }, "source": [ "\"DSPy7\n", "\n", - "### **SolveGSM8k**: Solving grade school math problems using DSPy" - ], - "metadata": { - "id": "zdbPo0MbQ4R7" - } + "### **SolveGSM8k**: Solving grade school math problems using DSPy\n", + "\n", + "\n", + " \"Open\n", + "" + ] }, { "cell_type": "markdown", + "metadata": { + "id": "6Aq1HjnZdV0e" + }, "source": [ "This notebook builds upon the foundational concepts of the DSPy framework. DSPy overs a novel programming-centric approach to utilizing language and retrieval models. It offers a unique blend of prompting, reasoning, fine-tuning, and tool augmentation, all encapsulated under a minimalistic Python syntax.\n", "\n", @@ -41,26 +30,26 @@ "3. Constrain DSPy program's behavior with runtime DSPy assertions and suggestions.\n", "2. Optimize the DSPy program with in-context learning and prompt tuning.\n", "\n" - ], - "metadata": { - "id": "6Aq1HjnZdV0e" - } + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "NckBcbZN9p6y" + }, + "outputs": [], "source": [ "# Set up your API key for OpenAI\n", "import os\n", "os.environ[\"OPENAI_API_KEY\"]=\"Paste your key here\"" - ], - "metadata": { - "id": "NckBcbZN9p6y" - }, - "execution_count": null, - "outputs": [] + ] }, { "cell_type": "markdown", + "metadata": { + "id": "ghG4tCe3e-kQ" + }, "source": [ "### Step -1. **Installing Cache and DSPy** (Run the collapsed cells)\n", "\n", @@ -68,10 +57,7 @@ "\n", "The first cell ensures all the following LM calls in this notebook will be using the cached OpenAI's API result. Removing this step might sigificantly increase the running time of all the following DSPy programs depending on your OpenAI account setup.\n", "\n" - ], - "metadata": { - "id": "ghG4tCe3e-kQ" - } + ] }, { "cell_type": "code", @@ -101,15 +87,20 @@ }, { "cell_type": "markdown", - "source": [ - "Before we start, we install DSPy and all dependencies." - ], "metadata": { "id": "BiS0I8Oc9HDC" - } + }, + "source": [ + "Before we start, we install DSPy and all dependencies." + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "u8WPRleVRkln" + }, + "outputs": [], "source": [ "%load_ext autoreload\n", "%autoreload 2\n", @@ -130,88 +121,86 @@ "from dspy.evaluate import Evaluate\n", "from dspy.datasets.gsm8k import GSM8K, gsm8k_metric\n", "from dspy.teleprompt import BootstrapFewShotWithRandomSearch" - ], - "metadata": { - "id": "u8WPRleVRkln" - }, - "execution_count": null, - "outputs": [] + ] }, { "cell_type": "markdown", + "metadata": { + "id": "xJcHuvwZ-rUk" + }, "source": [ "### Step 0. **Getting Started** (Run the collapsed cells)\n", "\n", "We'll start by importing our dataset GSM8K, a dataset containing 8.5K high quality linguistically diverse grade school math word problems created by human problem writers. We preshuffled the dataset, and divided it into three smaller sets - train set, dev set (validation set), and test set. For simplicity, we will be using the train set and dev set.\n", "\n", "If you would like to inspect the dataset and see how we setup DSPy to use OpenAI gpt3, expand this step.\n" - ], - "metadata": { - "id": "xJcHuvwZ-rUk" - } + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ZQDVRJ4wgFQx" + }, + "outputs": [], "source": [ "gms8k = GSM8K()\n", "trainset, devset = gms8k.train, gms8k.dev\n", "len(trainset), len(devset)" - ], - "metadata": { - "id": "ZQDVRJ4wgFQx" - }, - "execution_count": null, - "outputs": [] + ] }, { "cell_type": "markdown", - "source": [ - "Now we can inspect some examples." - ], "metadata": { "id": "cYMPUeeGgg9_" - } + }, + "source": [ + "Now we can inspect some examples." + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "x3XWN_wfhgGW" + }, + "outputs": [], "source": [ "math_problem_example = devset[10]\n", "\n", "print(f\"Question: {math_problem_example.question}\\n\")\n", "print(f\"Gold Reasoning: {math_problem_example.gold_reasoning}\\n\")\n", "print(f\"Answer: {math_problem_example.answer}\")" - ], - "metadata": { - "id": "x3XWN_wfhgGW" - }, - "execution_count": null, - "outputs": [] + ] }, { "cell_type": "markdown", + "metadata": { + "id": "js5bCUuYcAks" + }, "source": [ "Then we set up the language model (LM). **DSPy** supports multiple API and local models. In this notebook, we'll work with GPT-3.5 (`gpt-3.5-turbo`).\n", "\n", "We configure **DSPy** to use the turbo LM (`gpt-3.5-turbo`) by default. This can be overwritten for local parts of programs if needed." - ], - "metadata": { - "id": "js5bCUuYcAks" - } + ] }, { "cell_type": "code", - "source": [ - "turbo = dspy.OpenAI(model='gpt-3.5-turbo', max_tokens=500)\n", - "dspy.settings.configure(lm=turbo)" - ], + "execution_count": null, "metadata": { "id": "DBQg4TQBS7GH" }, - "execution_count": null, - "outputs": [] + "outputs": [], + "source": [ + "turbo = dspy.OpenAI(model='gpt-3.5-turbo', max_tokens=500)\n", + "dspy.settings.configure(lm=turbo)" + ] }, { "cell_type": "markdown", + "metadata": { + "id": "bCTCd0l1kQda" + }, "source": [ "### Step 1. **First DSPy program**\n", "\n", @@ -220,10 +209,7 @@ "This allows you to focus on the information flow of your pipeline. **DSPy** will then take your program and automatically optimize **how to prompt** (or finetune) LMs **for your particular pipeline** so it works well.\n", "\n", "If you have experience with PyTorch, you can think of DSPy as the PyTorch of the foundation model space. Before we see this in action, let's first understand some key pieces." - ], - "metadata": { - "id": "bCTCd0l1kQda" - } + ] }, { "cell_type": "markdown", @@ -246,89 +232,92 @@ }, { "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Ue7--j2Tk5qe" + }, + "outputs": [], "source": [ "class SimpleMathSignature(dspy.Signature):\n", " \"\"\"Answer the math question.\"\"\"\n", "\n", " question = dspy.InputField(desc=\"A simple math question.\")\n", " answer = dspy.OutputField(desc=\"The answer to the math question.\")" - ], - "metadata": { - "id": "Ue7--j2Tk5qe" - }, - "execution_count": null, - "outputs": [] + ] }, { "cell_type": "markdown", + "metadata": { + "id": "P-lX8Grsl9wT" + }, "source": [ "In `SimpleMathSignature`, the docstring describes the sub-task here (i.e., answering math questions). Each `InputField` or `OutputField` can optionally contain a description `desc` too. When it's not given, it's inferred from the field's name (e.g., `question`).\n", "\n", "Notice that there isn't anything special about this signature in **DSPy**. We can just as easily define a signature that takes a long snippet from a PDF and outputs structured information, for instance.\n", "\n", "One trick for DSPy signature is that when it only contains simple fields performing straightforward tasks, we can replace the whole class definition with a syntactic sugar `question -> answer`. Now, lets define our first DSPy program with DSPy predictor. A predictor is a module that knows how to use the LM to implement a signature. Importantly, predictors can learn to fit their behavior to the task!" - ], - "metadata": { - "id": "P-lX8Grsl9wT" - } + ] }, { "cell_type": "code", - "source": [ - "basic_math_solver = dspy.Predict(\"question -> answer\") # Alternatively, we can write dspy.Predict(SimpleMathSignature)" - ], + "execution_count": null, "metadata": { "id": "D3A1CH3foAFv" }, - "execution_count": null, - "outputs": [] + "outputs": [], + "source": [ + "basic_math_solver = dspy.Predict(\"question -> answer\") # Alternatively, we can write dspy.Predict(SimpleMathSignature)" + ] }, { "cell_type": "markdown", - "source": [ - "**`DSPy.Predict`** is the simplest DSPy predictor. Now, we can call this minimal _program_ with a hand crafted question:" - ], "metadata": { "id": "_GBP5jBUpRQP" - } + }, + "source": [ + "**`DSPy.Predict`** is the simplest DSPy predictor. Now, we can call this minimal _program_ with a hand crafted question:" + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "w1rZSbu2pQY3" + }, + "outputs": [], "source": [ "prediction = basic_math_solver(question=\"What is 1+1+1?\")\n", "\n", "print(f\"Answer: {prediction.answer}\")" - ], - "metadata": { - "id": "w1rZSbu2pQY3" - }, - "execution_count": null, - "outputs": [] + ] }, { "cell_type": "markdown", + "metadata": { + "id": "WxFlRkxVqPw_" + }, "source": [ "In the example above, we asked the predictor a simple math question \"What is 1+1+1?\". The model outputs an answer (\"3\").\n", "\n", "For visibility, we can inspect how this extremely basic predictor implemented our signature. Let's inspect the history of our LM (**turbo**)." - ], - "metadata": { - "id": "WxFlRkxVqPw_" - } + ] }, { "cell_type": "code", - "source": [ - "_ = turbo.inspect_history(n=1)" - ], + "execution_count": null, "metadata": { "id": "7qMBWy-8qO0t" }, - "execution_count": null, - "outputs": [] + "outputs": [], + "source": [ + "_ = turbo.inspect_history(n=1)" + ] }, { "cell_type": "markdown", + "metadata": { + "id": "VllCRN6tq-ty" + }, "source": [ "Great. Now let's define the actual program. This is a class that inherits from `dspy.Module`.\n", "\n", @@ -336,13 +325,15 @@ "\n", "- The `__init__` method will simply declare the sub-modules it needs: This time, we will be using a fancier predictor that implementes Chain-of-Thought prompting `dspy.ChainOfThought`. `dspy.ChainOfThought` will add another field called \"rationale\" as output to help the model think step-by-step.\n", "- The `forward` method will describe the control flow of answering the question using the modules we have (here, we just have one)." - ], - "metadata": { - "id": "VllCRN6tq-ty" - } + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "kpxVuUblVvP7" + }, + "outputs": [], "source": [ "class SimpleMathSolver(dspy.Module):\n", " def __init__(self):\n", @@ -353,90 +344,88 @@ " return pred\n", "\n", "simple_math_solver = SimpleMathSolver()" - ], - "metadata": { - "id": "kpxVuUblVvP7" - }, - "execution_count": null, - "outputs": [] + ] }, { "cell_type": "markdown", + "metadata": { + "id": "sUmsMEGtrjm7" + }, "source": [ "#### **Exercise**\n", "Create your own math problem, use the math solver we just defined. Then, inspect the trace of the LM with `turbo.inspect_history` to see what has changed compared to the `dspy.Predict` predictor." - ], - "metadata": { - "id": "sUmsMEGtrjm7" - } + ] }, { "cell_type": "code", - "source": [ - "### Fill this code cell" - ], + "execution_count": null, "metadata": { "id": "cXUH1WaOsIEr" }, - "execution_count": null, - "outputs": [] + "outputs": [], + "source": [ + "### Fill this code cell" + ] }, { "cell_type": "markdown", + "metadata": { + "id": "pVC0KPuvsoMD" + }, "source": [ "We can now evaluate our simple math solver on the validation set.\n", "\n", "For a start, let's evaluate the accuracy of the predicted answer. We provide a simple metric function called `gsm8k_metric`, which essentially extract the numerical answer from the model input." - ], - "metadata": { - "id": "pVC0KPuvsoMD" - } + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "EBOdvzCGTBYD" + }, + "outputs": [], "source": [ "evaluate = Evaluate(devset=devset[:], metric=gsm8k_metric, num_threads=16, display_progress=True, display_table=5)\n", "\n", "evaluate(simple_math_solver)" - ], - "metadata": { - "id": "EBOdvzCGTBYD" - }, - "execution_count": null, - "outputs": [] + ] }, { "cell_type": "markdown", - "source": [ - "### Step 2. **Adding constraints with DSPy Assertions**" - ], "metadata": { "id": "CNFkEVkvtaWz" - } + }, + "source": [ + "### Step 2. **Adding constraints with DSPy Assertions**" + ] }, { "cell_type": "markdown", - "source": [ - "We have **61.67%** on our validation set, not bad! But we also noticed two things: 1). many answers are sentences rather than the numerical result we want. Although we are able to parse most of the answers within `gsm8k_metric`, generating irrevalent tokens as answers might negatively affect the overall accuracy; 2). some of the reasoning might not contain the desired computational steps as in the example below." - ], "metadata": { "id": "rFKOzC4NxkSe" - } + }, + "source": [ + "We have **61.67%** on our validation set, not bad! But we also noticed two things: 1). many answers are sentences rather than the numerical result we want. Although we are able to parse most of the answers within `gsm8k_metric`, generating irrevalent tokens as answers might negatively affect the overall accuracy; 2). some of the reasoning might not contain the desired computational steps as in the example below." + ] }, { "cell_type": "code", - "source": [ - "simple_math_solver(devset[0].question)\n", - "_ = turbo.inspect_history()" - ], + "execution_count": null, "metadata": { "id": "EMrMDwRXAdqi" }, - "execution_count": null, - "outputs": [] + "outputs": [], + "source": [ + "simple_math_solver(devset[0].question)\n", + "_ = turbo.inspect_history()" + ] }, { "cell_type": "markdown", + "metadata": { + "id": "90BxOnIrAgnD" + }, "source": [ "Forturnately, in DSPy, we can utilize a simple yet powerful construct called **LM Assertions** to constrain the output of LMs. For example, here, we can say:\n", "\n", @@ -448,23 +437,25 @@ "\n", "LM assertions in DSPy could either be a hard constraint `Assert` or a soft constraint `Suggest`. LM assertions accept two argument, one is the predicate to be tested, similar to that of traditional assertions; then, we also require an additional \"error message\" to guide the language model to refine itself when failing.\n", "\n" - ], - "metadata": { - "id": "90BxOnIrAgnD" - } + ] }, { "cell_type": "markdown", + "metadata": { + "id": "W2dBQ-g3YvO9" + }, "source": [ "#### Math Solver with Suggestions\n", "We can encode the two observations we have into two suggestions, and add them to the `SimpleMathSolver`:" - ], - "metadata": { - "id": "W2dBQ-g3YvO9" - } + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "UDCY7Jw3ys86" + }, + "outputs": [], "source": [ "def extract_number(question):\n", " numbers = [int(s) for s in question.split() if s.isdigit()]\n", @@ -488,56 +479,54 @@ " return pred\n", "\n", "simple_math_solver_suggest = SimpleMathSolverWithSuggest().activate_assertions()" - ], - "metadata": { - "id": "UDCY7Jw3ys86" - }, - "execution_count": null, - "outputs": [] + ] }, { "cell_type": "markdown", - "source": [ - "Now we can rerun our math solver on the first question, and see how LM assertions in DSPy internally fix these errors." - ], "metadata": { "id": "VWuOTcUIZLtj" - } + }, + "source": [ + "Now we can rerun our math solver on the first question, and see how LM assertions in DSPy internally fix these errors." + ] }, { "cell_type": "code", - "source": [ - "simple_math_solver_suggest(devset[0].question)\n", - "_ = turbo.inspect_history(n=3)" - ], + "execution_count": null, "metadata": { "id": "bANEA2jmFXpM" }, - "execution_count": null, - "outputs": [] + "outputs": [], + "source": [ + "simple_math_solver_suggest(devset[0].question)\n", + "_ = turbo.inspect_history(n=3)" + ] }, { "cell_type": "markdown", - "source": [ - "Finally, let's evaluate the performance of `simple_math_solver_suggest`:" - ], "metadata": { "id": "jzHHqOA0Zb9x" - } + }, + "source": [ + "Finally, let's evaluate the performance of `simple_math_solver_suggest`:" + ] }, { "cell_type": "code", - "source": [ - "evaluate(simple_math_solver_suggest)" - ], + "execution_count": null, "metadata": { "id": "ASzhftKkZbHx" }, - "execution_count": null, - "outputs": [] + "outputs": [], + "source": [ + "evaluate(simple_math_solver_suggest)" + ] }, { "cell_type": "markdown", + "metadata": { + "id": "35xoJc9lye7x" + }, "source": [ "### Step 3. **Compiling DSPy programs with optimizers**\n", "\n", @@ -554,83 +543,83 @@ "3. A few **training inputs**. This may be very small (i.e., only 5 or 10 examples) and incomplete (only inputs to your program, without any labels).\n", "\n", "If you happen to have a lot of data, DSPy can leverage that. But you can start small and get strong results.\n" - ], - "metadata": { - "id": "35xoJc9lye7x" - } + ] }, { "cell_type": "markdown", + "metadata": { + "id": "FV5worzCvkyA" + }, "source": [ "In this turtorial, we demonstrate one DSPy optimizer called `BootstrapFewShotWithRandomSearch`, which bootstraps demonstrations from the training set and search for the best combination of demonstrations. Two things to note here:\n", "1. Most optimizers work with LM assertions.\n", "2. This step is time/compute intensive. Therefore we cached the API calls. The good thing is, once you optmized the program, you can save the compiled DSPy program and reuse it later!" - ], - "metadata": { - "id": "FV5worzCvkyA" - } + ] }, { "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "xF4Y2LMxnR1j" + }, + "outputs": [], "source": [ "optimizer = BootstrapFewShotWithRandomSearch(gsm8k_metric, max_bootstrapped_demos=3, max_labeled_demos=6, num_candidate_programs=6)\n", "\n", "compiled_prog = optimizer.compile(student=simple_math_solver, trainset=trainset[:], valset=devset[:100])\n", "compiled_prog_suggest = optimizer.compile(student=simple_math_solver_suggest, trainset=trainset[:], valset=devset[:100])\n" - ], - "metadata": { - "id": "xF4Y2LMxnR1j" - }, - "execution_count": null, - "outputs": [] + ] }, { "cell_type": "code", - "source": [ - "# Evaluating compiled program\n", - "evaluate(compiled_prog)" - ], + "execution_count": null, "metadata": { "id": "xiKwKBPJnEN6" }, - "execution_count": null, - "outputs": [] + "outputs": [], + "source": [ + "# Evaluating compiled program\n", + "evaluate(compiled_prog)" + ] }, { "cell_type": "code", - "source": [ - "# Evaluating compiled program with suggestions\n", - "evaluate(compiled_prog_suggest)" - ], + "execution_count": null, "metadata": { "id": "nRRDcsV-AF8_" }, - "execution_count": null, - "outputs": [] + "outputs": [], + "source": [ + "# Evaluating compiled program with suggestions\n", + "evaluate(compiled_prog_suggest)" + ] }, { "cell_type": "markdown", - "source": [ - "Now we inspect on our previous example, and see how the optmizer tunes the prompt:" - ], "metadata": { "id": "Mz4u8j-k2lNY" - } + }, + "source": [ + "Now we inspect on our previous example, and see how the optmizer tunes the prompt:" + ] }, { "cell_type": "code", - "source": [ - "compiled_prog(devset[0].question)\n", - "_ = turbo.inspect_history()" - ], + "execution_count": null, "metadata": { "id": "7Esd1Ry92vIt" }, - "execution_count": null, - "outputs": [] + "outputs": [], + "source": [ + "compiled_prog(devset[0].question)\n", + "_ = turbo.inspect_history()" + ] }, { "cell_type": "markdown", + "metadata": { + "id": "UGG4S1TpUnFz" + }, "source": [ "### More DSPy turtorials\n", "\n", @@ -640,10 +629,25 @@ "4. ... more on [DSPy github](https://github.com/stanfordnlp/dspy)\n", "\n", "#### Contact: Shangyin Tan (shangyin@berkeley.edu)" + ] + } + ], + "metadata": { + "colab": { + "collapsed_sections": [ + "ghG4tCe3e-kQ", + "xJcHuvwZ-rUk" ], - "metadata": { - "id": "UGG4S1TpUnFz" - } + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + }, + "language_info": { + "name": "python" } - ] -} \ No newline at end of file + }, + "nbformat": 4, + "nbformat_minor": 0 +}