Tips & Tricks
In this module we provide examples of common use cases when using the fast transformers library. We will be adding more examples as more utilities are implemented.
Mirrored networks
Mirrored networks are networks that share the same parameter instances but differ in their module implementations. The most common use case is to have mirrored batch and recurrent versions of the same transformer model, so that we can train with the batch version and evaluate with the recurrent version.
We provide the utility make_mirror(src_module, dst_module) to automatically set the destination module's parameters to those of the source module, so that the two modules share the same parameter instances.
from fast_transformers.builders import TransformerEncoderBuilder, \
    RecurrentEncoderBuilder
from fast_transformers.utils import make_mirror
params = dict(...)
transformer = TransformerEncoderBuilder.from_dictionary(params).get()
recurrent_transformer = RecurrentEncoderBuilder.from_dictionary(params).get()
make_mirror(transformer, recurrent_transformer)
# Now training transformer also changes the parameters of recurrent transformer
# and vice-versa.
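The following is a minimal sketch of the use case described above: training with the batch version and decoding step by step with the recurrent one. The dimensions, the causal mask, and the loop are assumptions made for illustration (only the two builders, make_mirror, and TriangularCausalMask come from the library), and the recurrent call assumes a library version whose recurrent encoder accepts a state keyword argument.
import torch
from fast_transformers.masking import TriangularCausalMask

# Assumed dimensions for this sketch; they must match the builder parameters.
batch_size, seq_len, d_model = 8, 100, 256
x = torch.randn(batch_size, seq_len, d_model)

# Train with the batch version using a causal mask so that the recurrent
# version later computes the same function.
y = transformer(x, attn_mask=TriangularCausalMask(seq_len, device=x.device))
# ... compute a loss on y and back-propagate as usual ...

# Evaluate with the recurrent version one timestep at a time; thanks to
# make_mirror it uses the parameters learned by the batch version.
state = None
for i in range(seq_len):
    y_i, state = recurrent_transformer(x[:, i], state=state)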
Checkpointing
Checkpointing is important when training large neural networks because it trades compute for memory and thus allows more layers to fit in a single GPU. The default PyTorch checkpointing mechanism (torch.utils.checkpoint) only accepts tensors as arguments, which unfortunately excludes our self-attention and transformer modules that expect BaseMask objects for masking.
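One possible workaround is to capture the mask objects in a closure so that only the input tensor crosses the checkpoint boundary. The helper below is a hypothetical sketch, not part of the library; it assumes the standard layer(x, attn_mask=None, length_mask=None) encoder layer signature.
import torch
from torch.utils.checkpoint import checkpoint

# Hypothetical helper (not part of the library): capture the BaseMask
# arguments in a closure so that only the input tensor passes through
# torch.utils.checkpoint.checkpoint.
def checkpointed(layer, x, attn_mask=None, length_mask=None):
    def run(x_inner):
        return layer(x_inner, attn_mask=attn_mask, length_mask=length_mask)
    return checkpoint(run, x)

# Usage sketch: checkpoint every block of an encoder built as above.
# for layer in transformer.layers:
#     x = checkpointed(layer, x, attn_mask=causal_mask)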
Under development
We are developing wrappers around the default checkpointing mechanisms that will allow users to checkpoint modules of their choosing or even checkpoint every transformer block in a transformer encoder or decoder.
Check back for details or follow issue #21 in our GitHub repository.