The Gemini AI Toolkit makes it easy to use Google's 'Gemini' language models for creating chatbots, generating text, analyzing images and transcribing audio. It's designed for everyone, from beginners to experienced developers, allowing quick addition of AI features to projects with simple commands. While it offers simplicity and lightweight integration, it doesn't compromise on power; experienced developers can access the full suite of advanced options available via the API, ensuring robust customization and control. This toolkit is perfect for those looking to efficiently tap into advanced AI without getting bogged down in technical details, yet it still provides the depth needed for complex project requirements.
- Conversational AI: Create interactive, real-time chat experiences (chatbots) or AI assistants.
- Image Captioning: Generate detailed descriptions and insights or create captions from images.
- Audio Transcription: Convert audio files into transcripts or analyze their content seamlessly.
- Text Creation: Produce coherent and contextually relevant text and answers from simple prompts.
- Highly Customizable: Tailor settings like streaming, JSON outputs, system prompts and more to suit your specific requirements.
- Lightweight Integration: Efficiently designed with minimal dependencies, requiring only the
requests
package for core functionality.
Python 3.x
- An API key from Google AI Studio
The following Python packages are required:
requests
: For making HTTP requests to Google's Gemini API.
The following Python packages are optional:
python-dotenv
: For managing API keys and other environment variables.
To use the Gemini AI Toolkit, clone the repository to your local machine and install the required Python packages.
Clone the repository:
git clone https://github.com/RMNCLDYO/gemini-ai-toolkit.git
Navigate to the repositories folder:
cd gemini-ai-toolkit
Install the required dependencies:
pip install -r requirements.txt
-
Obtain an API key from Google AI Studio.
-
You have three options for managing your API key:
Click here to view the API key configuration options
-
Setting it as an environment variable on your device (recommended for everyday use)
- Navigate to your terminal.
- Add your API key like so:
export GEMINI_API_KEY=your_api_key
This method allows the API key to be loaded automatically when using the wrapper or CLI.
-
Using an .env file (recommended for development):
- Install python-dotenv if you haven't already:
pip install python-dotenv
. - Create a .env file in the project's root directory.
- Add your API key to the .env file like so:
GEMINI_API_KEY=your_api_key
This method allows the API key to be loaded automatically when using the wrapper or CLI, assuming you have python-dotenv installed and set up correctly.
- Install python-dotenv if you haven't already:
-
Direct Input:
-
If you prefer not to use a
.env
file, you can directly pass your API key as an argument to the CLI or the wrapper functions.CLI
--api_key "your_api_key"
Wrapper
api_key="your_api_key"
This method requires manually inputting your API key each time you initiate an API call, ensuring flexibility for different deployment environments.
-
-
The Gemini AI Toolkit can be used in four different modes: Chat
, Text
, Vision
and Audio
. Each mode is designed for specific types of interactions with the Gemini models.
Chat mode is intended for chatting with an AI model (similar to a chatbot) or building conversational applications.
CLI
python cli.py --chat
Wrapper
from gemini import Chat
Chat().run()
An executable version of this example can be found here. (You must move this file to the root folder before running the program.)
Text mode is suitable for generating text content based on a provided prompt.
CLI
python cli.py --text --prompt "Write a story about a magic backpack."
Wrapper
from gemini import Text
Text().run(prompt="Write a story about a magic backpack.")
An executable version of this example can be found here. (You must move this file to the root folder before running the program.)
Vision mode allows for generating text based on a combination of text prompts and images.
CLI
python cli.py --vision --prompt "Describe the image with a creative description." --media "https://storage.googleapis.com/generativeai-downloads/images/jetpack.jpg"
Wrapper
from gemini import Vision
Vision().run(prompt="Describe the image with a creative description.", media="https://storage.googleapis.com/generativeai-downloads/images/jetpack.jpg")
An executable version of this example can be found here. (You must move this file to the root folder before running the program.)
Audio mode allows for generating text based on a combination of text prompts and audio.
CLI
python cli.py --audio --prompt "Listen carefully to the following audio file. Provide a brief summary." --media "https://storage.googleapis.com/generativeai-downloads/data/State_of_the_Union_Address_30_January_1961.mp3"
Wrapper
from gemini import Audio
Audio().run(prompt="Listen carefully to the following audio file. Provide a brief summary.", media="https://storage.googleapis.com/generativeai-downloads/data/State_of_the_Union_Address_30_January_1961.mp3")
An executable version of this example can be found here. (You must move this file to the root folder before running the program.)
Description | CLI Flags | CLI Usage | Wrapper Usage |
---|---|---|---|
Enable chat mode | -c , --chat |
--chat | See mode usage above. |
Enable text mode | -t , --text |
--text | See mode usage above. |
Enable vision mode | -v , --vision |
--vision | See mode usage above. |
Enable audio mode | -a , --audio |
--audio | See mode usage above. |
User prompt | -p , --prompt |
--prompt "Write a story about a magic backpack." | prompt="Write a story about a magic backpack." |
Media file path or url | -m , --media |
--prompt "Describe this media." --media "media_path_or_url" | prompt="Describe this media.", media="media_path_or_url" |
Enable streaming output | -s , --stream |
--stream | stream=True |
Enable json output | -js , --json |
--json | json=True |
API Key | -ak , --api_key |
--api_key "your_api_key" | api_key="your_api_key" |
Model name | -md , --model |
--model "gemini-1.0-pro-latest" | model="gemini-1.0-pro-latest" |
System prompt (instructions) | -sp , --system_prompt |
--system_prompt "You are a cat. Your name is Neko." | system_prompt="You are a cat. Your name is Neko." |
Maximum tokens to generate | -mt , --max_tokens |
--max_tokens 1024 | max_tokens=1024 |
Sampling temperature | -tm , --temperature |
--temperature 0.7 | temperature=0.7 |
Nucleus sampling threshold | -tp , --top_p |
--top_p 0.9 | top_p=0.9 |
Top-k sampling threshold | -tk , --top_k |
--top_k 40 | top_k=40 |
Number of candidates to generate | -cc , --candidate_count |
--candidate_count 1 | candidate_count=1 |
Stop sequences for completion | -ss , --stop_sequences |
--stop_sequences ["\n", "."] | stop_sequences=["\n", "."] |
Safety categories for filtering | -sc , --safety_categories |
--safety_categories ["HARM_CATEGORY_HARASSMENT"] | safety_categories=["HARM_CATEGORY_HARASSMENT"] |
Safety thresholds for filtering | -st , --safety_thresholds |
--safety_thresholds ["BLOCK_NONE"] | safety_thresholds=["BLOCK_NONE"] |
To exit the program at any time, you can type
exit
orquit
. This command works similarly whether you're interacting with the program via the CLI or through the Python wrapper ensuring that you can easily and safely conclude your work with the Gemini AI Toolkit without having to resort to interrupt signals or forcibly closing the terminal or command prompt.
Description | Model | Max Tokens |
---|---|---|
Gemini Pro 1.0 [Latest] | gemini-1.0-pro-latest |
2048 |
Gemini Pro 1.0 [Stable] | gemini-1.0-pro |
2048 |
Gemini Pro 1.0 [Stable] | gemini-1.0-pro-001 |
2048 |
Gemini Pro 1.0 Vision [Latest] | gemini-pro-vision |
4096 |
Gemini Pro 1.0 Vision [Stable] | gemini-1.0-pro-vision |
4096 |
Gemini Pro 1.5 (Preview) [Stable] | gemini-1.5-pro |
8192 |
Gemini Pro 1.5 (Preview) [Latest] | gemini-1.5-pro-latest |
8192 |
Gemini Flash 1.5 (Preview) [Stable] | gemini-1.5-flash |
8192 |
Gemini Flash 1.5 (Preview) [Latest] | gemini-1.5-flash-latest |
8192 |
Contributions are welcome!
Please refer to CONTRIBUTING.md for detailed guidelines on how to contribute to this project.
Encountered a bug? We'd love to hear about it. Please follow these steps to report any issues:
- Check if the issue has already been reported.
- Use the Bug Report template to create a detailed report.
- Submit the report here.
Your report will help us make the project better for everyone.
Got an idea for a new feature? Feel free to suggest it. Here's how:
- Check if the feature has already been suggested or implemented.
- Use the Feature Request template to create a detailed request.
- Submit the request here.
Your suggestions for improvements are always welcome.
Stay up-to-date with the latest changes and improvements in each version:
- CHANGELOG.md provides detailed descriptions of each release.
Your security is important to us. If you discover a security vulnerability, please follow our responsible disclosure guidelines found in SECURITY.md. Please refrain from disclosing any vulnerabilities publicly until said vulnerability has been reported and addressed.
Licensed under the MIT License. See LICENSE for details.