ESC-50: Dataset for Environmental Sound Classification

Python 1,454 291 Updated Mar 20, 2024

CLIP-based aesthetics predictor inspired by the interface of 🤗 huggingface transformers.

Python 33 Updated Jun 14, 2024

Fifth-place entry in the Modelscope-Sora Challenge

Python 10 1 Updated Sep 12, 2024

Code and Pretrained Models for Interspeech 2023 Paper "Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong Audio Event Taggers"

Python 348 28 Updated Feb 21, 2024

The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.

Python 1,398 93 Updated Aug 13, 2024

Convert PDF to markdown + JSON quickly with high accuracy

Python 19,280 1,150 Updated Jan 15, 2025

Taming Stable Diffusion for Lip Sync!

Python 1,807 204 Updated Jan 15, 2025

A Python-based CLI tool for captioning images with WD series, Joy-caption-pre-alpha, Meta Llama 3.2 Vision Instruct, and Qwen2 VL Instruct models.

Python 30 6 Updated Nov 10, 2024

[ECCV2024] Video Foundation Models & Data for Multimodal Understanding

Python 1,557 97 Updated Jan 14, 2025

An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"

Python 904 125 Updated Apr 12, 2024

Intelligent multilingual AI video dubbing and translation tool - Linly-Dubbing: "AI-empowered, language without borders"

Jupyter Notebook 2,011 190 Updated Aug 23, 2024

State-of-the-Art Text Embeddings

Python 15,765 2,525 Updated Jan 10, 2025

Code and Pretrained Models for ICLR 2023 Paper "Contrastive Audio-Visual Masked Autoencoder".

Python 244 23 Updated Mar 20, 2024

[CVPR 2023 Highlight] InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions

Python 2,575 241 Updated Jan 16, 2025

Contrastive Language-Audio Pretraining

Python 1,500 149 Updated Nov 21, 2024

Tiny RDM (Tiny Redis Desktop Manager) - A modern, colorful, super lightweight Redis GUI client for Mac, Windows, and Linux.

Vue 9,600 481 Updated Jan 8, 2025

Minimal keyword extraction with BERT

Python 3,656 357 Updated Jul 16, 2024

Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.

Python 3,286 255 Updated Oct 18, 2024

Interpreting and Analyzing CLIP's Zero-Shot Image Classification via Mutual Knowledge, NeurIPS 2024

Jupyter Notebook 8 1 Updated Dec 5, 2024

Python SDK for Milvus.

Python 1,068 339 Updated Jan 16, 2025

Milvus is a high-performance, cloud-native vector database built for scalable vector ANN search

Go 31,843 3,000 Updated Jan 16, 2025

The GUI for Milvus

TypeScript 1,459 133 Updated Jan 14, 2025

A Gradio web UI for Large Language Models with support for multiple inference backends.

Python 41,620 5,421 Updated Jan 15, 2025

Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Python 4,211 257 Updated Jan 11, 2025

A free, open source, multi-platform SQLite database manager.

C 5,612 601 Updated Jan 16, 2025

Batch download tool for WeChat Official Account articles; supports downloading images and comments, and saving as html/mhtml/md/pdf/docx files

HTML 3,653 394 Updated Jan 15, 2025

The Places365-CNNs for Scene Classification

Python 1,945 537 Updated Jul 16, 2020

AutoShot: A Short Video Dataset and State-of-the-Art Shot Boundary Detection - CVPR NAS 2023

Python 122 14 Updated Apr 18, 2023

A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization

Python 1,504 126 Updated Dec 24, 2024

Open Source Computer Vision Library

C++ 80,098 55,938 Updated Jan 15, 2025