Stars
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
PlasX, a machine learning classifier for identifying plasmid sequences based on genetic architecture
Program to quickly and accurately assemble plasmids in hybrid and long-only sequenced bacterial isolates
SCAPP is a plasmid assembly tool. This tool is described in our paper: https://microbiomejournal.biomedcentral.com/articles/10.1186/s40168-021-01068-z
Python script that downloads all PubMed abstracts matching user-specified keyword searches by performing automated NCBI E-utility queries
Large Language Model Text Generation Inference
Load Google's pre-trained Word2Vec model using gensim APIs
Parse multiple Antimicrobial Resistance Analysis Reports into a common data structure
Biological foundation modeling from molecular to genome scale
AMRFinderPlus - Identify AMR genes and point mutations, and virulence and stress resistance genes in assembled bacterial nucleotide and protein sequence.
CodonBERT: a BERT-based architecture tailored for codon optimization using the cross-attention mechanism.
Genomic language model predicts protein co-regulation and function
Repository for mRNA Paper and CodonBERT publication.
A CLI tool for clustering with the Leiden algorithm
Resistance Gene Identifier (RGI). Software to predict resistomes from protein or nucleotide data, including metagenomics data, based on homology and SNP models.
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
CONJScan models - Models for detection of conjugative, decayed conjugative and mobilisable elements
LAVIS - A One-stop Library for Language-Vision Intelligence
PLSDB pipeline to collect bacterial plasmids from NCBI
Orin-beep / StrainScan
Forked from liaoherui/StrainScan
High-resolution strain-level microbiome composition analysis tool based on reference genomes and k-mers
K-Means clustering - constrained with minimum and maximum cluster size. Documentation: https://joshlk.github.io/k-means-constrained
Evolutionary Scale Modeling (esm): Pretrained language models for proteins