Stars
clownrat6 / VideoLLaMA2
Forked from DAMO-NLP-SG/VideoLLaMA2
VideoLLaMA 2: Improving Video-LLMs with Convolutional Spatial-Temporal Aggregation and Stronger Audio Capability
[cvpr2023] implementation of out-of-candidate rectification methods
This project aims to reproduce Sora (OpenAI's T2V model), but we have only limited resources. We sincerely hope the whole open-source community can contribute to this project.
Mainly records knowledge and interview questions relevant to large language model (LLM) algorithm (application) engineers
PyTorch implementation of RCG https://arxiv.org/abs/2312.03701
Code of Pyramidal Flow Matching for Efficient Video Generative Modeling
[NeurIPS 2024] Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective
[CVPR 2025] The official CLIP training codebase of Inf-CL: "Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss". A highly memory-efficient CLIP training scheme.
Open-vocabulary Video Instance Segmentation Codebase built upon Detectron2, which is really easy to use.
✨✨The Curse of Multi-Modalities (CMM): Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
[CVPR 2024 Highlight] Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
clownrat6 / mmsegmentation
Forked from open-mmlab/mmsegmentation
OpenMMLab Semantic Segmentation Toolbox and Benchmark.
2021 Fall Computer Vision (Jian Zhang)
The implementation of VectorNet. Done and Lose
Space Invaders game implemented with VHDL
Implementation of YOLO v3 object detector in Tensorflow (TF-Slim)
AlexeyAB / darknet
Forked from pjreddie/darknet
YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)
This is a robot project for live television. The system tracks the host's face, keeping the face in the middle of the screen. The main algorithm is YOLOv3, trained on WIDER FACE and tested on FDDB be…
LabelImgTool is a graphical image annotation tool that supports CLS, DET, and SEG (semantic & instance)
Complete YOLO v3 TensorFlow implementation. Support training on your own dataset.
General code to convert a trained Keras model into a TensorFlow inference model
Deep learning-based Face detection using the YOLOv3 algorithm (https://github.com/sthanhng/yoloface)
🚀 😏 Near Real Time CPU Face detection using deep learning
(WARNING: This repository is NO LONGER maintained) Real-time face detection and recognition based on opencv/tensorflow/mtcnn/facenet