The builders module simplifies the construction of transformer networks. The following example showcases how simple it is to create a transformer encoder using the TransformerEncoderBuilder.
```python
import torch

# Building without a builder
from fast_transformers.transformers import TransformerEncoder, \
    TransformerEncoderLayer
from fast_transformers.attention import AttentionLayer, FullAttention

bert = TransformerEncoder(
    [
        TransformerEncoderLayer(
            AttentionLayer(FullAttention(), 768, 12),
            768,
            12,
            activation="gelu"
        ) for l in range(12)
    ],
    norm_layer=torch.nn.LayerNorm(768)
)

# Building with a builder
from fast_transformers.builders import TransformerEncoderBuilder

bert = TransformerEncoderBuilder.from_kwargs(
    attention_type="full",
    n_layers=12,
    n_heads=12,
    feed_forward_dimensions=768*4,
    query_dimensions=64,   # per-head dimensions (768 // 12)
    value_dimensions=64,   # per-head dimensions (768 // 12)
    activation="gelu"
).get()                    # get() returns the constructed module
```
Although creating a transformer appears to be equally simple with and without the builder, changing the creation logic is significantly easier with the builder. For instance, the attention_type can be read from a configuration file or from command line arguments.
The rest of this page describes the API of the builders.
The interface for all the builders is a simple method
get() without any
arguments that returns a PyTorch module that implements a transformer.
All the parameters of the builders are simple python properties that can be set after the creation of the builder object.
```python
builder = ...                          # create a builder
builder.parameter = value              # set a parameter
builder.other_parameter = other_value  # and another parameter
transformer = builder.get()            # construct the transformer

builder.parameter = changed_value      # change a parameter
other_transformer = builder.get()      # construct another transformer
```
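As a concrete illustration of this pattern, the sketch below (with arbitrary hyper-parameter values) reuses one builder to construct two encoders that differ only in their attention type:

```python
from fast_transformers.builders import TransformerEncoderBuilder

builder = TransformerEncoderBuilder.from_kwargs(
    n_layers=4,
    n_heads=4,
    query_dimensions=64,
    value_dimensions=64,
    feed_forward_dimensions=1024
)

builder.attention_type = "full"     # softmax attention
softmax_model = builder.get()

builder.attention_type = "linear"   # linear attention
linear_model = builder.get()
```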
The BaseBuilder provides helper static methods that make it simpler to set multiple builder arguments at once from configuration files or command line arguments.
- from_dictionary(dictionary, strict=False): Construct a builder and set all the parameters in the dictionary. If strict is set to True, a ValueError is raised in case a dictionary key does not correspond to a builder parameter.
- from_kwargs(**kwargs): Construct a builder and set all the keyword arguments as builder parameters.
- from_namespace(args, strict=False): Construct a builder from an argument namespace returned by the python argparse module. If strict is set to True, a ValueError is raised in case an argument does not correspond to a builder parameter.
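The following sketch shows the two configuration-oriented helpers in action; the configuration values and command line arguments are purely illustrative:

```python
import argparse
from fast_transformers.builders import TransformerEncoderBuilder

# From a dictionary, e.g. parsed from a JSON or YAML configuration file
config = {
    "attention_type": "full",
    "n_layers": 4,
    "n_heads": 4,
    "query_dimensions": 64,
    "value_dimensions": 64
}
model = TransformerEncoderBuilder.from_dictionary(config, strict=True).get()

# From command line arguments parsed by argparse
parser = argparse.ArgumentParser()
parser.add_argument("--attention_type", default="linear")
parser.add_argument("--n_layers", type=int, default=4)
args = parser.parse_args()
model = TransformerEncoderBuilder.from_namespace(args).get()
```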
There exist the following transformer builders for creating encoder and decoder architectures for inference and training:
- TransformerEncoderBuilder builds instances of TransformerEncoder
- TransformerDecoderBuilder builds instances of TransformerDecoder
- RecurrentEncoderBuilder builds instances of RecurrentTransformerEncoder
- RecurrentDecoderBuilder builds instances of RecurrentTransformerDecoder
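For instance, a RecurrentTransformerEncoder for autoregressive inference can be built with the same kind of parameters as the regular encoder; the sketch below uses arbitrary hyper-parameter values:

```python
from fast_transformers.builders import RecurrentEncoderBuilder

# The resulting module processes the sequence one element at a time,
# which is what makes it suitable for autoregressive inference
model = RecurrentEncoderBuilder.from_kwargs(
    attention_type="linear",
    n_layers=4,
    n_heads=4,
    query_dimensions=64,
    value_dimensions=64,
    feed_forward_dimensions=1024
).get()
```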
Attention builders simplify the construction of the various attention modules and allow for plugin-like extension mechanisms when creating new attention implementations.
Their API is the same as that of the transformer builders: parameters are set as attributes, and calling get(attention_type: str) constructs an nn.Module that implements an attention layer.
```python
from fast_transformers.builders import AttentionBuilder

builder = AttentionBuilder.from_kwargs(
    attention_dropout=0.1,                    # used by softmax attention
    softmax_temp=1.,                          # used by softmax attention
    feature_map=lambda x: (x>0).float() * x   # used by linear
)

softmax = builder.get("full")
linear = builder.get("linear")
```
The library provides the following attention builders that create the correspondingly named attention modules:
- AttentionBuilder builds attention modules for TransformerEncoder and TransformerDecoder
- RecurrentAttentionBuilder builds self attention modules for the recurrent transformers
- RecurrentCrossAttentionBuilder builds cross attention modules for RecurrentTransformerDecoder
The attention builders allow for attention composition through a simple convention of the attention_type parameter: the composite attention type and the attention type it wraps are joined with a colon, as in conditional-full:improved-clustered. Attention composition allows the creation of an attention layer that accepts one or more attention layers as parameters. An example of this pattern is the ConditionalFullAttention, which performs full softmax attention when the sequence length is small and delegates to another attention type when the sequence length becomes large.
The following example code creates an attention layer that uses improved clustered attention for sequences larger than 512 elements and full softmax attention otherwise.
```python
builder = AttentionBuilder.from_kwargs(
    attention_dropout=0.1,   # used by all
    softmax_temp=0.125,
    topk=32,                 # used by improved clustered
    clusters=256,
    bits=32,
    length_limit=512         # used by conditional attention
)

attention = builder.get("conditional-full:improved-clustered")
```
Attention layers that are designed for composition cannot be used standalone. For instance, conditional-full is not a valid attention type by itself.
The attention builders allow the dynamic registering of attention implementations through an attention registry. There are three registries, one for each available builder. You can find plenty of usage examples in the provided attention implementations (e.g. FullAttention).
This should only concern developers of new attention implementations and a simple example can be found in the custom attention layer section of the docs.
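As a rough sketch of the registration pattern, modeled on how FullAttention registers itself (the MyAttention class below is a hypothetical stub, and the spec helpers are assumed to be importable from fast_transformers.attention_registry):

```python
import torch
from fast_transformers.attention_registry import AttentionRegistry, Optional, Float

class MyAttention(torch.nn.Module):
    """Hypothetical attention module, shown only to illustrate registration."""
    def __init__(self, softmax_temp=None, attention_dropout=0.1):
        super().__init__()
        self.softmax_temp = softmax_temp
        self.dropout = torch.nn.Dropout(attention_dropout)

    def forward(self, queries, keys, values, attn_mask, query_lengths, key_lengths):
        raise NotImplementedError  # the actual attention computation goes here

# Register under a new attention_type so that the attention builders
# can construct it with builder.get("my-attention")
AttentionRegistry.register(
    "my-attention", MyAttention,
    [
        ("softmax_temp", Optional(Float)),
        ("attention_dropout", Optional(Float, 0.1)),
    ]
)
```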