📄 Document Analyzer

A user-friendly document analysis tool built with Streamlit and Microsoft's MarkItDown technology. This application enables users to extract and analyze content from various document formats with optional GPT-4o enhancement for image descriptions.

✨ Features

Multi-Format Support: Analyze a wide range of document formats including PDF, PPTX, DOCX, XLSX, images, audio files, and more
GPT-4o Integration: Optional image descriptions generated with OpenAI's GPT-4o
Interactive UI: Simple, intuitive interface built with Streamlit
Export Functionality: Download extracted content in text format
Privacy-Focused: Temporary file handling with secure cleanup
Preview: View document extraction results
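The temporary file handling mentioned above can be sketched as follows. This is a minimal illustration, not the application's actual code: the function name `process_upload` and the `handler` callback are hypothetical, and the sketch only assumes that an upload arrives as bytes and that the converter needs a file path.

```python
import os
import tempfile
from typing import Callable


def process_upload(data: bytes, suffix: str, handler: Callable[[str], str]) -> str:
    """Write uploaded bytes to a temp file, run handler on its path, then delete the file."""
    # delete=False lets the handler reopen the file by path on every platform.
    tmp = tempfile.NamedTemporaryFile(delete=False, suffix=suffix)
    try:
        tmp.write(data)
        tmp.close()  # flush to disk before the handler reads it
        return handler(tmp.name)
    finally:
        tmp.close()  # no-op if already closed
        # Remove the temp file even if the handler raised.
        if os.path.exists(tmp.name):
            os.remove(tmp.name)
```

In the real application the handler would presumably wrap MarkItDown, for example `lambda path: MarkItDown().convert(path).text_content`. Keeping the suffix matters because format detection may rely on the file extension.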

🚀 Getting Started

Prerequisites

Python 3.7+
OpenAI API key (optional, for GPT-4o enhancement)

Installation

Clone the repository:

git clone https://github.com/lesteroliver911/microsoft-markitdown-streamlit-ui.git
cd microsoft-markitdown-streamlit-ui

Install required packages:

pip install -r requirements.txt

Set up environment variables:

# Create .env file
touch .env

# Add your OpenAI API key (optional)
echo "OPENAI_API_KEY=your_api_key_here" >> .env

Run the application:

streamlit run main.py

💻 Usage

Launch the application
Upload your document using the sidebar
Toggle GPT-4o enhancement if desired
View extracted content and document information in the respective tabs
Download the extracted content as needed

📋 Supported Formats

PDF documents
PowerPoint presentations (PPTX)
Word documents (DOCX)
Excel spreadsheets (XLSX)
Images (JPG, PNG) with EXIF data and OCR
Audio files (MP3, WAV) with EXIF data and transcription
HTML files
Text-based files (CSV, JSON, XML)
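A quick way to check whether a file falls into one of the formats above before uploading it is an extension lookup. The sketch below is illustrative only: `SUPPORTED_EXTENSIONS` and `is_supported` are hypothetical names, the set is derived from the list above (with `.jpeg` added as an assumption), and the application itself may detect formats differently.

```python
from pathlib import Path

# Extensions taken from the supported-formats list; ".jpeg" is an assumed alias for ".jpg".
SUPPORTED_EXTENSIONS = {
    ".pdf", ".pptx", ".docx", ".xlsx",
    ".jpg", ".jpeg", ".png",
    ".mp3", ".wav",
    ".html", ".csv", ".json", ".xml",
}


def is_supported(filename: str) -> bool:
    """Return True if the file's final extension (case-insensitive) is in the supported set."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```

Only the last extension is considered, so a name such as `archive.tar.gz` is checked as `.gz` and rejected.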

⚙️ Configuration

The application can be configured using environment variables or through the UI:

OPENAI_API_KEY: Your OpenAI API key for GPT-4o enhancement
Custom API key input available in the UI
Cache management with built-in clearing functionality
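Since the key can come from either the environment or the UI, the application needs some rule for choosing between them. The sketch below assumes that a key typed into the UI takes precedence over `OPENAI_API_KEY`; that ordering and the function name `resolve_api_key` are assumptions for illustration, not confirmed behavior of the app.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    ui_value: Optional[str],
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Pick the API key: UI input first (assumed precedence), then OPENAI_API_KEY."""
    for candidate in (ui_value, env.get("OPENAI_API_KEY")):
        # Treat empty or whitespace-only values as "not provided".
        if candidate and candidate.strip():
            return candidate.strip()
    # No key available: GPT-4o enhancement would stay disabled.
    return None
```

Returning `None` when no key is found lets the caller fall back to plain extraction, which matches the key being optional.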

📝 License & MS Repo

This project is licensed under the MIT License - see the LICENSE file for details.

Original Microsoft MarkItDown repo: https://github.com/microsoft/markitdown

🙏 Acknowledgments

Microsoft MarkItDown technology
Streamlit framework
OpenAI GPT-4o (optional integration)

Made with ❤️ by Lester Oliver
