
Depth Pro: Sharp Monocular Metric Depth in Less Than a Second.

Python 886 26 Updated Oct 5, 2024

Fast and differentiable MS-SSIM and SSIM for pytorch.

Python 1,062 127 Updated Mar 12, 2024

A Collection of Variational Autoencoders (VAE) in PyTorch.

Python 6,517 1,058 Updated Jun 13, 2024

A Python library for the Docker Engine API

Python 6,789 1,668 Updated Sep 30, 2024

🚀🚀 Revisiting Binary Local Image Description for Resource Limited Devices

C 156 26 Updated Oct 21, 2023
Python 116 3 Updated Jun 23, 2024

Make drawing and labeling bounding boxes easy as cake

Python 389 30 Updated Aug 29, 2024
Jupyter Notebook 50 1 Updated Jul 14, 2024

Toolkit to segment text into sentences or other semantic units in a robust, efficient and adaptable way.

Python 695 39 Updated Sep 24, 2024

The Official PyTorch Implementation of "LSGM: Score-based Generative Modeling in Latent Space" (NeurIPS 2021)

Python 348 49 Updated Dec 2, 2021

MOMENT: A Family of Open Time-series Foundation Models

TypeScript 324 52 Updated Oct 3, 2024

VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models

Python 4,487 333 Updated Jul 10, 2024

Understand Human Behavior to Align True Needs

Python 3,316 292 Updated Jul 20, 2024

Figma clone with NextJS 14, TypeScript, Liveblocks, Fabric.js, Tailwind CSS, Shadcn UI.

TypeScript 5 1 Updated Feb 23, 2024

[ECCV2024] Video Foundation Models & Data for Multimodal Understanding

Python 1,335 85 Updated Sep 23, 2024

Official code for Goldfish model for long video understanding and MiniGPT4-video for short video understanding

Python 541 59 Updated Oct 4, 2024

Control Any Computer Using LLMs

Python 474 32 Updated Jul 22, 2024

Academic paper management with Notion

Python 9 4 Updated Jan 30, 2024

Open-Set Grounded Text-to-Image Generation

Python 1,983 148 Updated Mar 6, 2024

Grounded Language-Image Pre-training

Python 2,177 191 Updated Jan 24, 2024

A third-party implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection".

Python 396 62 Updated Jun 25, 2024

Yet another SAM webui + CLIP

TypeScript 241 29 Updated Mar 26, 2024

OMG-LLaVA and OMG-Seg codebase [CVPR-24 and NeurIPS-24]

Python 1,240 47 Updated Oct 2, 2024

🔥🕷️ Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper

Python 11,418 795 Updated Oct 2, 2024

The official Meta Llama 3 GitHub site

Python 26,476 2,993 Updated Aug 12, 2024

[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.

Python 749 37 Updated Jun 2, 2024

Official repository for Instance Prototype Contrastive Learning (IPCL)

Python 15 5 Updated Jun 20, 2022

A natural language interface for computers

Python 52,482 4,628 Updated Sep 26, 2024

AppAgent: Multimodal Agents as Smartphone Users, an LLM-based multimodal agent framework designed to operate smartphone apps.

Python 4,915 533 Updated Aug 8, 2024