# Awesome Transformers

A curated list of awesome transformer models.
If you want to contribute to this list, send a pull request or reach out to me on Twitter: @abacaj. Let's make this list useful.
A number of the models listed here are not fully open source (non-commercial licenses, etc.); this repository should also make you aware of that. Tracking the original source/company of each model helps.
I would also eventually like to add model use cases, so it is easier for others to find the right one to fine-tune.
Format:

- Model name: short description, usually from the paper
- Model link (usually Hugging Face or GitHub; see the loading sketch after this list)
- Paper link
- Source (company or group)
- Model license
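Most entries that link to the Hugging Face Hub can be loaded with the `transformers` library. Below is a minimal sketch, assuming `transformers` and `torch` are installed; `bert-base-uncased` is only a stand-in checkpoint name, so swap in the checkpoint from the model's link and check its license first.

```python
# Minimal sketch: load a Hub-hosted checkpoint from this list.
# "bert-base-uncased" is a stand-in; replace it with the checkpoint
# named in the model's link, and check the model's license first.
from transformers import AutoModel, AutoTokenizer

checkpoint = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModel.from_pretrained(checkpoint)

inputs = tokenizer("Awesome transformer models.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, sequence_length, hidden_size)
```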
## Table of Contents
### Encoder models

- ALBERT: "A Lite" version of BERT
- BERT: Bidirectional Encoder Representations from Transformers
- DistilBERT: a distilled version of BERT that is smaller, faster, cheaper and lighter
- DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing
- ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
- RoBERTa: A Robustly Optimized BERT Pretraining Approach
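Encoder models like the ones above are typically used for masked-token prediction, classification, or sentence embeddings. A minimal sketch of masked-token prediction, assuming the `transformers` library; `roberta-base` is an assumed checkpoint name.

```python
# Minimal sketch: masked-token prediction with an encoder model.
# "roberta-base" is an assumed checkpoint; note that BERT-style
# models use [MASK] instead of RoBERTa's <mask> token.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="roberta-base")
for prediction in fill_mask("Transformer encoders are great at <mask> tasks."):
    print(prediction["token_str"], round(prediction["score"], 3))
```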
### Decoder models

- BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining
- CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis
- LLaMA: Open and Efficient Foundation Language Models
- GPT: Improving Language Understanding by Generative Pre-Training
- GPT-2: Language Models are Unsupervised Multitask Learners
- GPT-J: A 6 Billion Parameter Autoregressive Language Model
- GPT-Neo: Large Scale Autoregressive Language Modeling with Mesh-TensorFlow
- GPT-NeoX-20B: An Open-Source Autoregressive Language Model
- NeMo Megatron-GPT: Megatron-GPT 20B is a transformer-based language model
- OPT: Open Pre-trained Transformer Language Models
- BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
- GLM: An Open Bilingual Pre-Trained Model
- YaLM: Pretrained language model with 100B parameters
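The decoder models above are autoregressive and are usually driven through the text-generation pipeline. A minimal sketch, assuming the `transformers` library; `gpt2` is used as a small stand-in checkpoint, since most of the larger decoders listed here need one or more GPUs.

```python
# Minimal sketch: autoregressive generation with a decoder model.
# "gpt2" is a small stand-in checkpoint; the larger decoders in this
# section expose the same interface but need far more memory.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
result = generator("Open-source language models are", max_new_tokens=30, do_sample=False)
print(result[0]["generated_text"])
```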
### Encoder+decoder (seq2seq) models

- T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
- FLAN-T5: Scaling Instruction-Finetuned Language Models
- CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation
- BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
- Pegasus: Pre-training with Extracted Gap-sentences for Abstractive Summarization
- mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer
- UL2: Unifying Language Learning Paradigms
- EdgeFormer: A Parameter-Efficient Transformer for On-Device Seq2seq Generation
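Encoder+decoder models map an input sequence to an output sequence, which the text2text-generation pipeline exposes directly. A minimal sketch, assuming the `transformers` library; `google/flan-t5-small` is an assumed checkpoint name.

```python
# Minimal sketch: text-to-text generation with a seq2seq model.
# "google/flan-t5-small" is an assumed checkpoint; other
# encoder+decoder models in this section follow the same interface.
from transformers import pipeline

text2text = pipeline("text2text-generation", model="google/flan-t5-small")
output = text2text("Translate English to German: The house is wonderful.")
print(output[0]["generated_text"])
```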
### Multimodal models

- Donut: OCR-free Document Understanding Transformer
- LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking (Microsoft; CC BY-NC-SA 4.0, non-commercial)
- TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models
- CLIP: Learning Transferable Visual Models From Natural Language Supervision
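CLIP, for example, scores images against free-text labels, which enables zero-shot image classification. A minimal sketch, assuming `transformers` and `Pillow`; `openai/clip-vit-base-patch32` is an assumed checkpoint and `cat.jpg` is a placeholder image path.

```python
# Minimal sketch: zero-shot image classification with CLIP.
# "openai/clip-vit-base-patch32" is an assumed checkpoint and
# "cat.jpg" is a placeholder image path.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

checkpoint = "openai/clip-vit-base-patch32"
model = CLIPModel.from_pretrained(checkpoint)
processor = CLIPProcessor.from_pretrained(checkpoint)

image = Image.open("cat.jpg")
labels = ["a photo of a cat", "a photo of a dog"]
inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
probs = model(**inputs).logits_per_image.softmax(dim=1)
print(dict(zip(labels, probs[0].tolist())))
```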
### Vision models

- DiT: Self-supervised Pre-training for Document Image Transformer
- DETR: End-to-End Object Detection with Transformers
- EfficientFormer: Vision Transformers at MobileNet Speed
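DETR, for example, can be run through the object-detection pipeline. A minimal sketch, assuming the `transformers` library with its vision dependencies installed; `facebook/detr-resnet-50` is an assumed checkpoint and `street.jpg` is a placeholder image path.

```python
# Minimal sketch: object detection with DETR.
# "facebook/detr-resnet-50" is an assumed checkpoint and
# "street.jpg" is a placeholder image path.
from transformers import pipeline

detector = pipeline("object-detection", model="facebook/detr-resnet-50")
for detection in detector("street.jpg"):
    print(detection["label"], round(detection["score"], 2), detection["box"])
```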
### Audio models

- Whisper: Robust Speech Recognition via Large-Scale Weak Supervision
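Whisper can be used for transcription through the automatic-speech-recognition pipeline. A minimal sketch, assuming `transformers` plus `ffmpeg` for audio decoding; `openai/whisper-small` is an assumed checkpoint and `meeting.wav` is a placeholder audio path.

```python
# Minimal sketch: speech-to-text with Whisper.
# "openai/whisper-small" is an assumed checkpoint and
# "meeting.wav" is a placeholder audio path (ffmpeg handles decoding).
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")
print(asr("meeting.wav")["text"])
```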
### Recommendation models

- Recommendation as Language Processing (RLP): A Unified Pretrain, Personalized Prompt & Predict Paradigm (P5)