Educational Content Generation Prototype

Project Description

This Python prototype system is designed to generate and analyze educational content tailored to specific subjects and grade levels. It demonstrates the ability to:

Generate Contextually Relevant Content: Process user prompts to create coherent educational material using a pre-trained language model (Groq's llama3-8b-instant).
Refine Content for Clarity and Readability: Simplify the generated content to be grade-level appropriate and engaging using Google's Gemini (gemini-2.0-flash-exp).
Integrate Information Retrieval: Enhance content with relevant information from DuckDuckGo search to improve accuracy and context.
Evaluate Content Quality: Automatically analyze generated content for readability using various metrics from the textstat library.
Modular and Scalable Pipeline: Structure the system as a modular pipeline for content generation, refinement, and evaluation, making it extensible for future features.

This prototype is built to fulfill the requirements outlined in the Task Description for Granville Tech company.

Output Screenshot

Educational Content Output

Readability Analysis

Workflow Architecture

The system follows a modular pipeline:

Prompt Input: User provides input specifying grade level, subject, topic, and optional details.
Content Generation (Groq LLM): Uses Groq's llama-3.3-70b-versatile model to generate initial educational content based on the user prompt.
DuckDuckGo Search: Formulates a search query based on the generated content and uses DuckDuckGo Search to retrieve relevant web results for fact-checking and context enrichment.
Content Simplification (Gemini LLM): Employs Google's Gemini (gemini-2.0-flash-exp) to simplify the generated content for the target grade level and incorporate relevant information from search results. The output is structured as a JSON object.
Content Analysis: Analyzes the simplified content (JSON output) for readability using metrics from the textstat library (Flesch-Kincaid, SMOG, Coleman-Liau, ARI, Linsear Write, Dale-Chall).
JSON Output and Storage: Saves the simplified content in a structured JSON format to a file in the output_json directory.

Simplified Workflow Diagram:

User Input (Grade Level, Subject, Topic)
--> [Groq LLM - Content Generation]
--> [DuckDuckGo Search]
--> [Gemini LLM - Content Simplification & JSON Structuring]
--> [Textstat - Readability Analysis]
--> JSON Output File (output_json directory)

Setup and Installation

Clone the Repository:

git clone https://github.com/Kaos599/GranVille_Assignment.git
cd GranVille_Assignment

Install Python Dependencies: Ensure you have Python 3.8 or higher installed. Create a virtual environment (recommended):
```
python -m venv venv
source venv/bin/activate  # On Linux/macOS
venv\Scripts\activate  # On Windows
```
Install the required Python libraries using pip:
```
pip install -r requirements.txt
```
(Create a requirements.txt file in your repository root with the following content):
```
groq
duckduckgo-search
google-generativeai
textstat
python-dotenv
```
Set API Keys:
- Groq API Key: Obtain a Groq API key from Groq Console and set it as an environment variable named GROQ_API_KEY. You can use a .env file for this (example .env file in the repository root):
```
GROQ_API_KEY=YOUR_GROQ_API_KEY_HERE
```
- Gemini API Key: Obtain a Gemini API key from Google AI Studio and set it as an environment variable named GEMINI_API_KEY. Add this to your .env file as well:
```
GEMINI_API_KEY=YOUR_GEMINI_API_KEY_HERE
```
- Install python-dotenv: If you use a .env file, make sure you have python-dotenv installed (included in requirements.txt).

How to Run

Navigate to the project directory in your terminal.
Run the main script:
```
python educational_content_generator_json_output_analysis.py
```
(Ensure the script filename is correct if you named it differently)
Example Usage in educational_content_generator_json_output_analysis.py: The script includes example test inputs in the if __name__ == "__main__": block. You can modify these inputs or add more to test different subjects, grade levels, and topics.
Output:
- The script will print progress and analysis information to the console.
- Generated educational content in structured JSON format will be saved to files in the output_json directory.
- Analysis metrics (readability scores) for each generated JSON file will also be printed to the console.

Project Structure

GranVille_Assignment/
├── output_json/ # Directory to store generated JSON output files
├── educational_content_generator_json_output_analysis.py # Main Python script (or your script's name)
├── analyze_content_quality.py # Python script for content analysis
├── requirements.txt # Python dependencies
├── README.md # Project documentation (this file)
└── .env # (Optional) File to store API keys (not committed to Git)

Key Modules and Functions

educational_content_generator_json_output_analysis.py (Main Script):
- generate_educational_content_workflow(grade_level, subject, topic, topic_details): The main function that orchestrates the entire content generation, search, simplification, and analysis pipeline.
- call_groq_llm(prompt, system_message=None, model="llama3-8b-instant", temperature=0.1): Function to interact with the Groq LLM API for content generation.
- call_gemini_llm_structured(prompt, model_name="gemini-2.0-flash-exp", temperature=1.0): Function to interact with the Gemini LLM API for content simplification and structured JSON output.
- duckduckgo_search_func(query): Function to perform DuckDuckGo text searches.
analyze_content_quality.py (Analysis Script):
- analyze_educational_content_json(json_filepath): Analyzes a single JSON file for readability metrics using textstat.
- analyze_json_files_in_directory(directory_path="output_json"): Analyzes all JSON files in a directory and generates a summary report.

Metrics for Content Quality Evaluation

The system currently implements the following metrics for evaluating content quality:

Readability Scores (from textstat library):
- Flesch-Kincaid Grade Level
- SMOG Grade
- Coleman-Liau Index
- Automated Readability Index (ARI)
- Linsear Write Formula
- Dale-Chall Readability Score

These metrics provide an initial assessment of the linguistic clarity and grade-level appropriateness of the generated content.

Future Enhancements

Improved Bias Detection and Mitigation: Implement robust bias detection using libraries like Perspective API or Fairlearn, and integrate LLM-based bias mitigation into the workflow.
Enhanced Coherence and Contextual Alignment Analysis: Incorporate more advanced NLP techniques to automatically evaluate content coherence and contextual alignment with curriculum standards.
Curriculum Integration: Integrate curriculum data or knowledge bases to enable automated curriculum alignment checks and improve content relevance.
User Interface: Develop a user interface (e.g., using Langflow's UI features or a web framework) to make the system more user-friendly and accessible.
Fact-Checking Module: Enhance the fact-checking process by more systematically verifying factual claims against DuckDuckGo search results or dedicated fact-checking APIs.
Content Engagement Analysis: Implement metrics to assess the engagement potential of the content (e.g., sentiment analysis, linguistic feature analysis).

Project Status

This project is currently in the Prototype stage. It demonstrates the core functionality of educational content generation and analysis but is not yet a production-ready system. Future development is planned to enhance its features and robustness.

Name		Name	Last commit message	Last commit date
Latest commit History 23 Commits
output_json		output_json
.gitignore		.gitignore
README.md		README.md
analysis.py		analysis.py
main.py		main.py
requirements.txt		requirements.txt
tempCodeRunnerFile.py		tempCodeRunnerFile.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Educational Content Generation Prototype

Project Description

Output Screenshot

Educational Content Output

Readability Analysis

Workflow Architecture

Setup and Installation

How to Run

Project Structure

Key Modules and Functions

Metrics for Content Quality Evaluation

Future Enhancements

Project Status

About

Releases

Packages

Languages

Kaos599/GranVille_Assignment

Folders and files

Latest commit

History

Repository files navigation

Educational Content Generation Prototype

Project Description

Output Screenshot

Educational Content Output

Readability Analysis

Workflow Architecture

Setup and Installation

How to Run

Project Structure

Key Modules and Functions

Metrics for Content Quality Evaluation

Future Enhancements

Project Status

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages