diff --git a/.github/actions/spelling/allow.txt b/.github/actions/spelling/allow.txt index 29e1122872..c2aecb0711 100644 --- a/.github/actions/spelling/allow.txt +++ b/.github/actions/spelling/allow.txt @@ -932,6 +932,7 @@ netif newaxis newaxisngram nfcorpus +nfl ngram ngrams nlp diff --git a/gemini/getting-started/intro_gemini_2_0_flash_thinking_mode.ipynb b/gemini/getting-started/intro_gemini_2_0_flash_thinking_mode.ipynb new file mode 100644 index 0000000000..37af16fa2b --- /dev/null +++ b/gemini/getting-started/intro_gemini_2_0_flash_thinking_mode.ipynb @@ -0,0 +1,579 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "sqi5B7V_Rjim" + }, + "outputs": [], + "source": [ + "# Copyright 2024 Google LLC\n", + "#\n", + "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", + "# you may not use this file except in compliance with the License.\n", + "# You may obtain a copy of the License at\n", + "#\n", + "# https://www.apache.org/licenses/LICENSE-2.0\n", + "#\n", + "# Unless required by applicable law or agreed to in writing, software\n", + "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "# See the License for the specific language governing permissions and\n", + "# limitations under the License." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "VyPmicX9RlZX" + }, + "source": [ + "# Getting Started with the Gemini 2.0 Flash Thinking Mode\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + "
\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "8MqT58L6Rm_q" + }, + "source": [ + "| | |\n", + "|-|-|\n", + "| Author(s) | [Guillaume Vernade](https://github.com/giom-v), [Eric Dong](https://github.com/gericdong) |" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "3w14yjWnPVD-" + }, + "source": [ + "## Overview\n", + "\n", + "Gemini 2.0 Flash with Thinking is an experimental model that explicitly shows its thoughts. Built on the speed and performance of Gemini 2.0 Flash, this model is trained to use thoughts in a way that leads to stronger reasoning capabilities.\n", + "\n", + "This tutorial demonstrates how you can use the Gemini 2.0 Flash Thinking mode to solve the following complex tasks, each of which requires multiple rounds of strategizing and iterative problem solving:\n", + "\n", + "- Example 1: Code simplification\n", + "- Example 2: Geometry problem (with image)\n", + "- Example 3: Understanding the image of a table\n", + "- Example 4: Generating a question for a specific level of knowledge\n", + "- Example 5: Statistics\n", + "- Example 6: Mathematical brain teaser\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "gPiTOAHURvTM" + }, + "source": [ + "## Getting Started" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "CHRZUpfWSEpp" + }, + "source": [ + "### Install Google Gen AI SDK for Python\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "sG3_LKsWSD3A" + }, + "outputs": [], + "source": [ + "%pip install --upgrade --quiet google-genai" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "HlMVjiAWSMNX" + }, + "source": [ + "### Authenticate your notebook environment (Colab only)\n", + "\n", + "If you are running this
notebook on Google Colab, run the cell below to authenticate your environment." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "12fnq4V0SNV3" + }, + "outputs": [], + "source": [ + "import sys\n", + "\n", + "if \"google.colab\" in sys.modules:\n", + " from google.colab import auth\n", + "\n", + " auth.authenticate_user()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "0Ef0zVX-X9Bg" + }, + "source": [ + "### Import libraries\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "xBCH3hnAX9Bh" + }, + "outputs": [], + "source": [ + "import os\n", + "\n", + "from IPython.display import Markdown, display\n", + "from PIL import Image\n", + "from google import genai" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "LymmEN6GSTn-" + }, + "source": [ + "### Set Google Cloud project information and create client\n", + "\n", + "To get started using Vertex AI, you must have an existing Google Cloud project and [enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com).\n", + "\n", + "Learn more about [setting up a project and a development environment](https://cloud.google.com/vertex-ai/docs/start/cloud-environment)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Nqwi-5ufWp_B" + }, + "outputs": [], + "source": [ + "PROJECT_ID = \"[your-project-id]\" # @param {type: \"string\"}\n", + "if not PROJECT_ID or PROJECT_ID == \"[your-project-id]\":\n", + " PROJECT_ID = str(os.environ.get(\"GOOGLE_CLOUD_PROJECT\"))\n", + "\n", + "LOCATION = os.environ.get(\"GOOGLE_CLOUD_REGION\", \"us-central1\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "T-tiytzQE0uM" + }, + "outputs": [], + "source": [ + "client = genai.Client(\n", + " vertexai=True,\n", + " project=PROJECT_ID,\n", + " location=LOCATION,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "w0u6hYSleE0H" + }, + "source": [ + "## Use the Gemini 2.0 Flash Thinking Mode\n", + "\n", + "The following examples show some of the complex tasks that the Gemini 2.0 Flash Thinking mode can solve. In each example, you can try different models to see how this new model compares. In some cases, the other models will still give a good answer; if so, re-run the example a couple of times and you should see that the Gemini 2.0 Flash Thinking mode is more consistent thanks to its thinking step." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "5M7EKckIYVFy" + }, + "source": [ + "### Set model ID\n", + "\n", + "See the [Google models](https://cloud.google.com/vertex-ai/generative-ai/docs/learn/models) page for more information." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "-coEslfWPrxo" + }, + "outputs": [], + "source": [ + "MODEL_ID = \"gemini-2.0-flash-thinking-exp-1219\" # @param {type: \"string\"}" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "IrRBg9UGC9nn" + }, + "source": [ + "### **Example 1**: Code simplification\n", + "\n", + "First, try a simple code comprehension and simplification example."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "dLhhffx2C9nn" + }, + "outputs": [], + "source": [ + "response = client.models.generate_content(\n", + " model=MODEL_ID,\n", + " contents=\"How can I simplify this? `(Math.round(radius/pixelsPerMile * 10) / 10).toFixed(1);`\",\n", + ")\n", + "\n", + "print(response.candidates[0].content)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "d6cOmdVPC9nn" + }, + "source": [ + "The model response has multiple parts. While you could use `response.text` to get all of it at once as usual, it's more interesting to check each part separately when using the thinking mode.\n", + "\n", + "The first part is the \"inner thoughts\" of the model, where it analyzes the problem and comes up with its strategy. The `thought` field indicates whether a part is a thought from the model." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "c2e4f7b7cc2c" + }, + "outputs": [], + "source": [ + "print(response.candidates[0].content.parts[0].thought)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "OgaDQSL2C9nn" + }, + "outputs": [], + "source": [ + "Markdown(response.candidates[0].content.parts[0].text)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Ig6POLZnC9no" + }, + "source": [ + "Most of the time you won't need to check the thoughts, as you'll mostly be interested in the answer, but having access to them gives you a way to check where the answers come from and how the model arrives at them.
It's not a black box anymore!\n", + "\n", + "The second part is the actual answer:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "7cbb7f5d9faf" + }, + "outputs": [], + "source": [ + "print(response.candidates[0].content.parts[1].thought)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "BJ2aYsAhC9no" + }, + "outputs": [], + "source": [ + "Markdown(response.candidates[0].content.parts[1].text)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "3ZqndSMtC9no" + }, + "source": [ + "For comparison, here's what you'd get with the \"classic\" [Gemini 2.0 Flash](https://cloud.google.com/vertex-ai/generative-ai/docs/gemini-v2) model.\n", + "\n", + "Unlike the thinking mode, the normal model does not articulate its thoughts and tries to answer right away, which can lead to simpler answers to complex problems." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "pY1colD0C9no" + }, + "outputs": [], + "source": [ + "response = client.models.generate_content(\n", + " model=\"gemini-2.0-flash-exp\",\n", + " contents=\"How can I simplify this? `(Math.round(radius/pixelsPerMile * 10) / 10).toFixed(1);`\",\n", + ")\n", + "\n", + "Markdown(response.text)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "JmxrcZPfC9no" + }, + "source": [ + "### **Example 2**: Geometry problem (with image)\n", + "\n", + "This geometry problem requires complex reasoning and also uses Gemini's multimodal capabilities to read the image."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "8ksBfYPZC9no" + }, + "outputs": [], + "source": [ + "!wget https://storage.googleapis.com/generativeai-downloads/images/geometry.png -O geometry.png -q\n", + "\n", + "im = Image.open(\"geometry.png\").resize((256, 256))\n", + "im" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "80f2vzWxC9no" + }, + "outputs": [], + "source": [ + "response = client.models.generate_content(\n", + " model=MODEL_ID, contents=[im, \"What's the area of the overlapping region?\"]\n", + ")\n", + "\n", + "Markdown(response.text)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "9_WlcJZjC9np" + }, + "source": [ + "### **Example 3**: Understanding the image of a table\n", + "\n", + "Here's another example based on an image. This time, the difficulty is to understand the table and add up all the numbers correctly." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "-G0d2an5C9np" + }, + "outputs": [], + "source": [ + "!wget https://storage.googleapis.com/generativeai-downloads/images/nfl.png -O nfl.png -q\n", + "\n", + "im = Image.open(\"nfl.png\")\n", + "im" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "rexKogjfC9np" + }, + "outputs": [], + "source": [ + "response = client.models.generate_content(\n", + " model=MODEL_ID, contents=[im, \"Who is going to win this week?\"]\n", + ")\n", + "\n", + "Markdown(response.text)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "UriRl34GC9np" + }, + "source": [ + "### **Example 4**: Generating a question for a specific level of knowledge\n", + "\n", + "This time, the question requires several types of knowledge, including what is relevant to the AP Physics C exam.
The generated questions are not the interesting part; the reasoning behind them shows that they are not just randomly generated.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "MU3FAUqcC9np" + }, + "outputs": [], + "source": [ + "response = client.models.generate_content(\n", + " model=MODEL_ID,\n", + " contents=\"Give me a practice question I can use for the AP Physics C exam.\",\n", + ")\n", + "\n", + "Markdown(response.text)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "FnDwUxI-fRWo" + }, + "source": [ + "### **Example 5**: Statistics\n", + "\n", + "Here's a new mathematical problem. Once again, what's interesting is not the answer (you might know it already) but how the model comes up with it." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "hQozYPZzgXRE" + }, + "outputs": [], + "source": [ + "response = client.models.generate_content(\n", + " model=MODEL_ID,\n", + " contents=\"You repeatedly flip a coin until you either flip three heads, or heads tails heads.
Which is more likely to happen first?\",\n", + ")\n", + "\n", + "display(Markdown(\"### Thoughts\"))\n", + "display(Markdown(response.candidates[0].content.parts[0].text))\n", + "display(Markdown(\"### Answer\"))\n", + "display(Markdown(response.candidates[0].content.parts[1].text))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "m9RG4DRCm1vY" + }, + "source": [ + "### **Example 6**: Mathematical brain teaser" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "A0_4uP5Wm6yx" + }, + "outputs": [], + "source": [ + "response = client.models.generate_content(\n", + " model=MODEL_ID,\n", + " contents=\"Add mathematical operations (additions, subtractions, multiplications) to get 746 using these numbers only once: 8, 7, 50, and 4\",\n", + ")\n", + "\n", + "display(Markdown(\"### Thoughts\"))\n", + "display(Markdown(response.candidates[0].content.parts[0].text))\n", + "display(Markdown(\"### Answer\"))\n", + "display(Markdown(response.candidates[0].content.parts[1].text))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "lND4jB6MrsSk" + }, + "source": [ + "## Next Steps\n", + "\n", + "- Explore the Vertex AI [Cookbook](https://cloud.google.com/vertex-ai/generative-ai/docs/cookbook) for a curated, searchable gallery of notebooks for Generative AI.\n", + "- Explore other notebooks and samples in the [Google Cloud Generative AI repository](https://github.com/GoogleCloudPlatform/generative-ai)." + ] + } + ], + "metadata": { + "colab": { + "name": "intro_gemini_2_0_flash_thinking_mode.ipynb", + "toc_visible": true + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +}