Google GenAI Starter Template

A comprehensive FastAPI starter template for working with Google's Gemini AI models, supporting text, image, and audio processing.

Features

  • Text Processing
    • Text generation with prompts
    • Structured chat conversations
    • Context-aware text generation
  • Image Processing
    • Single image analysis
    • Multiple image analysis
    • Image comparison
  • Audio Processing
    • Audio transcription
    • Content analysis
    • Audio summarization

Installation

  1. Clone the repository:
git clone https://github.com/yourusername/google-genai-starter-template.git
cd google-genai-starter-template
  2. Install dependencies:
pip install -r requirements.txt
  3. Install additional audio dependencies:
# For Ubuntu/Debian
sudo apt-get install ffmpeg

# For macOS
brew install ffmpeg

# For Windows
# Download ffmpeg from https://ffmpeg.org/download.html
  4. Set up environment variables (a sketch of reading the key in application code follows these steps):
export GOOGLE_API_KEY="your_api_key_here"
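
The server reads this key at startup. As a minimal sketch of how application code can pick the key up, assuming the google-generativeai package listed under Dependencies (the model name and module layout here are illustrative, not the template's actual code):

import os

import google.generativeai as genai

# Read the key exported in the previous step; raises KeyError if it is missing.
api_key = os.environ["GOOGLE_API_KEY"]

# Configure the Gemini client once at application startup.
genai.configure(api_key=api_key)

# Quick sanity check that the key works; the model name is illustrative.
model = genai.GenerativeModel("gemini-1.5-flash")
print(model.generate_content("Say hello in one sentence.").text)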

Usage

  1. Start the server:
uvicorn app.main:app --reload
  2. Access the API documentation:
  • Swagger UI: http://localhost:8000/docs
  • ReDoc: http://localhost:8000/redoc

API Endpoints

Text Processing

POST /api/text/generate
POST /api/text/chat
POST /api/text/context
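
The text routes accept JSON bodies like the ones shown under Example Requests. As a minimal sketch of how /api/text/generate could be wired, assuming a Pydantic request model and the google-generativeai package (the router, schema, and model names here are illustrative, not the template's actual code):

from fastapi import APIRouter
from pydantic import BaseModel
import google.generativeai as genai

router = APIRouter(prefix="/api/text")

class GenerateRequest(BaseModel):
    # Field names mirror the example request further below; the actual schema may differ.
    prompt: str
    temperature: float = 0.7

@router.post("/generate")
async def generate_text(request: GenerateRequest):
    # Generate a completion with the requested sampling temperature.
    model = genai.GenerativeModel("gemini-1.5-flash")
    response = model.generate_content(
        request.prompt,
        generation_config={"temperature": request.temperature},
    )
    return {"text": response.text}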

Image Processing

POST /api/image/analyze
POST /api/image/analyze-multiple
POST /api/image/compare

Audio Processing

POST /api/audio/transcribe
POST /api/audio/analyze
POST /api/audio/summarize
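
Unlike the text routes, the image and audio routes take multipart form uploads. As a rough sketch of that upload pattern for /api/image/analyze, assuming FastAPI's UploadFile, Pillow, and the google-generativeai package (names are illustrative, not the template's actual code); the audio routes follow the same shape with an audio file instead of an image:

import io

from fastapi import APIRouter, File, Form, UploadFile
from PIL import Image
import google.generativeai as genai

router = APIRouter(prefix="/api/image")

@router.post("/analyze")
async def analyze_image(
    image: UploadFile = File(...),
    prompt: str = Form("Describe this image"),
    temperature: float = Form(0.7),
):
    # Load the uploaded bytes into a PIL image, which Gemini accepts directly.
    pil_image = Image.open(io.BytesIO(await image.read()))

    model = genai.GenerativeModel("gemini-1.5-flash")
    response = model.generate_content(
        [prompt, pil_image],
        generation_config={"temperature": temperature},
    )
    return {"text": response.text}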

Example Requests

Text Generation

import requests

response = requests.post(
    "http://localhost:8000/api/text/generate",
    json={
        "prompt": "Write a short story about AI",
        "temperature": 0.7
    }
)
print(response.json())

Image Analysis

import requests

# Send the image as a multipart upload; prompt and temperature travel as plain form fields.
with open('image.jpg', 'rb') as image_file:
    response = requests.post(
        "http://localhost:8000/api/image/analyze",
        files={'image': image_file},
        data={
            'prompt': 'Describe this image',
            'temperature': '0.7'
        }
    )
print(response.json())

Audio Processing

import requests

# Upload the audio file as multipart form data; the language code is a plain form field.
with open('audio.mp3', 'rb') as audio_file:
    response = requests.post(
        "http://localhost:8000/api/audio/transcribe",
        files={'audio_file': audio_file},
        data={'language': 'en-US'}
    )
print(response.json())

Dependencies

  • FastAPI
  • google-generativeai
  • Pillow
  • SpeechRecognition
  • pydub
  • python-multipart
  • uvicorn
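
These correspond to a requirements.txt along the following lines (left unpinned here; the repository's actual file may pin specific versions):

fastapi
uvicorn
google-generativeai
Pillow
SpeechRecognition
pydub
python-multipart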

Development

  1. Install development dependencies:
pip install pytest black isort flake8
  2. Run tests (a minimal example test is sketched after these commands):
pytest
  3. Format code:
black .
isort .
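
A minimal example test, assuming the FastAPI instance is importable from app.main as in the Usage section (recent FastAPI versions require the httpx package for TestClient):

from fastapi.testclient import TestClient

from app.main import app

client = TestClient(app)

def test_docs_are_served():
    # FastAPI serves the interactive Swagger UI at /docs by default.
    response = client.get("/docs")
    assert response.status_code == 200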

Contributing

  1. Fork the repository
  2. Create your feature branch
  3. Commit your changes
  4. Push to the branch
  5. Create a new Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Support

For support, please open an issue in the GitHub repository.
