Lists (9)
bio-related: tools for working with biological data (e.g., animal & plant images or labels)
Bookmarks: Interesting tools and such
general tools: useful/interesting tools
Imageomics Projects
My stack
tools to explore
training repos
vector databases
Stars
Annotate better with CVAT, the industry-leading data engine for machine learning. Used and trusted by teams at any scale, for data of any scale.
1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.
An open source implementation of CLIP.
The AI developer platform. Use Weights & Biases to train and fine-tune models, and manage models from experimentation to production.
Refine high-quality datasets and visual AI models
Jupyter Notebooks as Markdown Documents, Julia, Python or R scripts
Automatically visualize your pandas dataframe via a single print!
Easily turn large sets of image urls to an image dataset. Can download, resize and package 100M urls in 20h on one machine.
ICLR2024 Spotlight: curation/training code, metadata, distribution and pre-trained models for MetaCLIP; CVPR 2024: MoDE: CLIP Data Experts via Clustering
Track emissions from Compute and recommend ways to reduce their impact on the environment.
Access large archives as a filesystem efficiently, e.g., TAR, RAR, ZIP, GZ, BZ2, XZ, ZSTD archives
Lightning fast data version control system for structured and unstructured machine learning datasets. We aim to make versioning datasets as easy as versioning code.
Creative interactive views of any dataset.
Croissant is a high-level format for machine learning datasets that brings together four rich layers.
A deep learning framework for multi-animal pose tracking.
a toolkit for pose estimation using deep learning
Open source implementation of "A Self-Supervised Descriptor for Image Copy Detection" (SSCD).
Humane command line arguments parser. Now with maintenance, typehints, and complete test coverage.
This is the repository for the BioCLIP model and the TreeOfLife-10M dataset [CVPR'24 Oral, Best Student Paper].
Command line (CLI) tool to inspect Apache Parquet files on the go
Fast Near-Duplicate Image Search and Delete using pHash, t-SNE and KDTree.
Open source, scalable software for the analysis of bioacoustic recordings
Simplify camera trap image analysis with ML species recognition models based around the MegaDetector model
Command line program to validate and convert CITATION.cff files.
Python package that simplifies using the BioCLIP foundation model.