
Starred repositories
Extracting Tables from Document Images using a Multi-stage Pipeline for Table Detection and Table Structure Recognition
Qwen2.5-Omni is an end-to-end multimodal model by the Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, and video, and of performing real-time speech generation.
Transformer-based OCR, with an extended application to official seals (seal recognition)
[CVPR'24] DiffSHEG: A Diffusion-Based Approach for Real-Time Speech-driven Holistic 3D Expression and Gesture Generation
LHM: Large Animatable Human Reconstruction Model from a Single Image in Seconds
No fortress, purely open ground. OpenManus is Coming.
A Conversational Speech Generation Model
img2table is a Python library for table identification and extraction from PDFs and images, based on OpenCV image processing
Open-sourced code for "HOMIE: Humanoid Loco-Manipulation with Isomorphic Exoskeleton Cockpit".
LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis
Pippo: High-Resolution Multi-View Humans from a Single Image
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
[NeurIPS 2024] Neural Localizer Fields for Continuous 3D Human Pose and Shape Estimation
Document Rectification and Illumination Correction using a Patch-based CNN
[ECCV 2024] DragAnything: Motion Control for Anything using Entity Representation
Official Code for MotionCtrl [SIGGRAPH 2024]
An AI-powered research assistant that performs iterative, deep research on any topic by combining search engines, web scraping, and large language models. The goal of this repo is to provide the si…
Hallo3: Highly Dynamic and Realistic Portrait Image Animation with Video Diffusion Transformer
Inference library for sequence-based table recognition, integrating table recognition algorithms such as PP-Structure and modelscope.
Official implementation of "DepthLab: From Partial to Complete"
Non-rigid iterative closest point, nricp.
MPB (Miner-PDF-Benchmark) is an end-to-end PDF document comprehension evaluation suite designed for large-scale model data scenarios.
Code to accompany "A Method for Animating Children's Drawings of the Human Figure"
Python tool for converting files and office documents to Markdown.