facebookresearch / xformers
Hackable and optimized Transformers building blocks, supporting a composable construction.
xFormers is:
# Install the latest dev build via conda
conda install xformers -c xformers/label/dev

# Or build from source:
# (Optional) Makes the build much faster
pip install ninja
# Set TORCH_CUDA_ARCH_LIST if running and building on different GPU types
pip install -v -U git+https://github.com/facebookresearch/xformers.git@main#egg=xformers
# (this can take dozens of minutes)
Memory-efficient MHA
Setup: A100 on f16, measured total time for a forward+backward pass
Note that this is exact attention, not an approximation, obtained simply by calling xformers.ops.memory_efficient_attention
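The memory savings come from computing exact attention blockwise over the keys and values, with running softmax statistics, so the full (n, n) attention matrix is never materialized at once. Here is a rough NumPy sketch of that idea; it illustrates the technique only and is not the fused CUDA kernel that xFormers actually ships:

```python
import numpy as np

def naive_attention(q, k, v):
    """Standard attention: materializes the full (n, n) score matrix."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def chunked_attention(q, k, v, chunk=32):
    """Exact attention computed over key/value chunks, keeping running
    softmax max/sum statistics so only (n, chunk) scores exist at a time."""
    n, d = q.shape
    out = np.zeros_like(v, dtype=np.float64)
    running_max = np.full(n, -np.inf)
    running_sum = np.zeros(n)
    for start in range(0, k.shape[0], chunk):
        k_c, v_c = k[start:start + chunk], v[start:start + chunk]
        scores = q @ k_c.T / np.sqrt(d)                 # (n, chunk) only
        new_max = np.maximum(running_max, scores.max(axis=-1))
        correction = np.exp(running_max - new_max)      # rescale old partials
        w = np.exp(scores - new_max[:, None])
        out = out * correction[:, None] + w @ v_c
        running_sum = running_sum * correction + w.sum(axis=-1)
        running_max = new_max
    return out / running_sum[:, None]

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((64, 16)) for _ in range(3))
assert np.allclose(naive_attention(q, k, v), chunked_attention(q, k, v))
```

In xFormers itself, the same result comes from a single call on batched tensors, e.g. xformers.ops.memory_efficient_attention(query, key, value), with the chunking handled inside the kernel.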
More benchmarks
xFormers provides many components, and more benchmarks are available in BENCHMARKS.md.
This command will provide information on an xFormers installation, and what kernels are built/available:
python -m xformers.info
Let's start from a classical overview of the Transformer architecture (illustration from Lin et al., "A Survey of Transformers")
You'll find the key repository boundaries in this illustration: a Transformer is generally made of a collection of attention mechanisms, embeddings to encode some positional information, feed-forward blocks, and a residual path (typically referred to as pre- or post- layer norm). These boundaries do not work for all models, but we found in practice that, given some accommodations, they capture most of the state of the art.
Models are thus not implemented in monolithic files, which are typically complicated to handle and modify. Most of the concepts present in the above illustration correspond to an abstraction level, and when variants are present for a given sub-block it should always be possible to select any of them. You can focus on a given encapsulation level and modify it as needed.
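As a toy sketch of this encapsulation idea (illustrative only, not the actual xFormers API): each variant of a sub-block implements the same interface, so a config can select any of them without touching the rest of the model.

```python
# Toy sketch of the "parts zoo" idea: feed-forward variants share an
# interface and are selected by name. Hypothetical names, not xFormers API.
import numpy as np

def relu_feedforward(x, w1, w2):
    return np.maximum(x @ w1, 0.0) @ w2

def gelu_feedforward(x, w1, w2):
    h = x @ w1
    # tanh approximation of GELU
    return (0.5 * h * (1.0 + np.tanh(np.sqrt(2 / np.pi) * (h + 0.044715 * h**3)))) @ w2

FEEDFORWARDS = {"relu": relu_feedforward, "gelu": gelu_feedforward}

def transformer_block(x, w1, w2, feedforward="relu"):
    """Residual block whose feed-forward variant is chosen by config,
    leaving the residual path untouched."""
    return x + FEEDFORWARDS[feedforward](x, w1, w2)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
w1, w2 = rng.standard_normal((8, 16)), rng.standard_normal((16, 8))
assert transformer_block(x, w1, w2, "gelu").shape == x.shape
```

The same pattern applies at every level of the tree below: attentions, positional embeddings, and activations are all registries of interchangeable parts.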
├── ops # Functional operators
└ ...
├── components # Parts zoo, any of which can be used directly
│ ├── attention
│ │ └ ... # all the supported attentions
│ ├── feedforward #
│ │ └ ... # all the supported feedforwards
│ ├── positional_embedding #
│ │ └ ... # all the supported positional embeddings
│ ├── activations.py #
│ └── multi_head_dispatch.py # (optional) multihead wrap
│
├── benchmarks
│ └ ... # A lot of benchmarks that you can use to test some parts
└── triton
└ ... # (optional) all the triton parts, requires triton + CUDA gpu
Local attention is among the supported variants (notably used in a number of published models); to add a new one, see CONTRIBUTING.md.
This is completely optional, and will only occur when generating full models through xFormers, not when picking parts individually.
There are basically two initialization mechanisms exposed, but the user is free to initialize weights as they see fit after the fact. Blocks expose an
init_weights()
method, which defines sane defaults. If the second code path is used (constructing the model through the model factory), we check that all the weights have been initialized, and may error out if that is not the case
(if you set xformers.factory.weight_init.__assert_if_not_initialized = True
)
Supported initialization schemes are:
One way to specify the init scheme is to set the config.weight_init
field to the matching enum value.
This could easily be extended; feel free to submit a PR!
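A minimal sketch of this enum-selected initialization idea, including the "error out if anything was left uninitialized" check described above (scheme names, helpers, and config layout here are hypothetical, not the actual xFormers factory):

```python
from enum import Enum
import numpy as np

class WeightInit(Enum):  # hypothetical scheme names, for illustration only
    SMALL = "small"
    VIT = "vit"

def init_weights(shape, scheme):
    """Draw weights with a variance chosen by the selected scheme."""
    rng = np.random.default_rng(0)
    fan_in = shape[0]
    if scheme is WeightInit.SMALL:
        std = (2.0 / (5.0 * fan_in)) ** 0.5   # smaller-variance init
    else:
        std = (1.0 / fan_in) ** 0.5           # plain 1/sqrt(fan_in)
    return rng.normal(0.0, std, size=shape)

def build_model(config):
    """Every weight goes through init_weights; a final check (mirroring the
    factory's assert-if-not-initialized idea) errors out on missed tensors."""
    weights = {name: init_weights(shape, config["weight_init"])
               for name, shape in config["layers"].items()}
    assert all(w is not None for w in weights.values()), "uninitialized weight"
    return weights

model = build_model({"weight_init": WeightInit.SMALL,
                     "layers": {"proj": (64, 64), "ffn": (64, 256)}})
assert model["ffn"].shape == (64, 256)
```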
If the build fails, some things to check:

- Depending on your environment, you may need to switch the CUDA toolkit, for instance with module unload cuda; module load cuda/xx.x, and possibly also nvcc.
- Make sure that the TORCH_CUDA_ARCH_LIST env variable is set to the architectures that you want to support. A suggested setup (slow to build but comprehensive) is export TORCH_CUDA_ARCH_LIST="6.0;6.1;6.2;7.0;7.2;7.5;8.0;8.6".
- If the build uses too much memory, limit parallelism with MAX_JOBS (eg MAX_JOBS=2).
- If you get an UnsatisfiableError when installing with conda, make sure you have PyTorch installed in your conda environment, and that your setup (PyTorch version, CUDA version, Python version, OS) matches an existing binary for xFormers.

xFormers has a BSD-style license, as found in the LICENSE file.
If you use xFormers in your publication, please cite it by using the following BibTeX entry.
@Misc{xFormers2022,
author = {Benjamin Lefaudeux and Francisco Massa and Diana Liskovich and Wenhan Xiong and Vittorio Caggiano and Sean Naren and Min Xu and Jieru Hu and Marta Tintore and Susan Zhang and Patrick Labatut and Daniel Haziza},
title = {xFormers: A modular and hackable Transformer modelling library},
howpublished = {\url{https://github.com/facebookresearch/xformers}},
year = {2022}
}
The following repositories are used in xFormers, either in a form close to the original or as an inspiration: