Module fast_transformers.recurrent.attention.self_attention

Autoregressive implementations for self attention as a recurrent module.

The attention implementations in this module expect one input for the query, one for the key, and one for the value, and they attend to all the keys and values seen so far. No masking is necessary, as an implicit lower-triangular attention mask is assumed in all cases.

Example

import torch

from fast_transformers.recurrent.attention import \
    RecurrentAttentionLayer, RecurrentFullAttention

att = RecurrentAttentionLayer(RecurrentFullAttention(), 16, 4)
state = None
x = torch.rand(8, 16)
for i in range(10):
    x, state = att(x, x, x, state=state)
Source code
#
# Copyright (c) 2020 Idiap Research Institute, http://www.idiap.ch/
# Written by Angelos Katharopoulos <angelos.katharopoulos@idiap.ch>
#

"""Autoregressive implementations for self attention as a recurrent module.

The attention implementations in this module expect one input for the query,
one for the key, and one for the value, and they attend to all the keys and
values seen so far. No masking is necessary, as an implicit lower-triangular
attention mask is assumed in all cases.

Example
-------

    import torch

    from fast_transformers.recurrent.attention import \
        RecurrentAttentionLayer, RecurrentFullAttention

    att = RecurrentAttentionLayer(RecurrentFullAttention(), 16, 4)
    state = None
    x = torch.rand(8, 16)
    for i in range(10):
        x, state = att(x, x, x, state=state)
"""

from .attention_layer import RecurrentAttentionLayer
from .full_attention import RecurrentFullAttention
from .linear_attention import RecurrentLinearAttention

Sub-modules

fast_transformers.recurrent.attention.self_attention.attention_layer

Similar to the corresponding module in fast_transformers.attention, this module performs all the query, key, value projections and output projections, leaving the implementation of the attention to the inner attention module (see the first sketch after this list).

fast_transformers.recurrent.attention.self_attention.full_attention

Implement the typical softmax attention as a recurrent module to speed up autoregressive inference. See fast_transformers.attention.full_attention. (A sketch of the recurrence follows this list.)

fast_transformers.recurrent.attention.self_attention.linear_attention

Implement the causally masked linear attention as a recurrent model. (A sketch of the recurrence follows this list.)
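
Because the per-step plumbing can be hard to picture from prose alone, here is a minimal sketch of what the attention_layer entry above describes: project one timestep's query, key, and value, split them into heads, delegate to the inner attention, then merge the heads and apply the output projection. SketchRecurrentAttentionLayer and naive_inner_attention are hypothetical names for illustration only; they are not the library's actual RecurrentAttentionLayer, whose real internals live in the module source.

import torch

def naive_inner_attention(q, k, v, state):
    # Trivial stand-in for an inner attention: return the current value
    # and pass the state through unchanged. The real inner modules are
    # sketched below.
    return v, state

class SketchRecurrentAttentionLayer(torch.nn.Module):
    def __init__(self, inner_attention, d_model, n_heads):
        super().__init__()
        self.inner_attention = inner_attention
        self.n_heads = n_heads
        self.q_proj = torch.nn.Linear(d_model, d_model)
        self.k_proj = torch.nn.Linear(d_model, d_model)
        self.v_proj = torch.nn.Linear(d_model, d_model)
        self.out_proj = torch.nn.Linear(d_model, d_model)

    def forward(self, query, key, value, state=None):
        N, D = query.shape
        H = self.n_heads
        # Project one timestep and split into heads: (N, D) -> (N, H, D // H)
        q = self.q_proj(query).view(N, H, -1)
        k = self.k_proj(key).view(N, H, -1)
        v = self.v_proj(value).view(N, H, -1)
        # Delegate the actual attention computation to the inner module
        out, state = self.inner_attention(q, k, v, state)
        # Merge the heads and project back to the model dimension
        return self.out_proj(out.reshape(N, D)), state

layer = SketchRecurrentAttentionLayer(naive_inner_attention, 16, 4)
x = torch.rand(8, 16)
y, s = layer(x, x, x)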
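For full_attention, here is a minimal sketch of softmax attention unrolled as a recurrence, assuming the state simply accumulates every key and value seen so far. The function name and the (keys, values) state layout are illustrative, not the library's actual API.

import torch

def sketch_recurrent_full_attention(q, k, v, state=None, softmax_temp=None):
    # q, k, v hold a single timestep with shape (N, H, E); the state is
    # assumed to stack all keys and values seen so far along dim 2.
    if state is None:
        keys = k.unsqueeze(2)                                # (N, H, 1, E)
        values = v.unsqueeze(2)                              # (N, H, 1, E)
    else:
        keys = torch.cat([state[0], k.unsqueeze(2)], dim=2)  # (N, H, t, E)
        values = torch.cat([state[1], v.unsqueeze(2)], dim=2)
    if softmax_temp is None:
        softmax_temp = 1.0 / (q.shape[-1] ** 0.5)
    # Softmax over all t positions seen so far; no mask is needed because
    # future positions are simply not in the state yet.
    scores = torch.einsum("nhe,nhte->nht", q, keys) * softmax_temp
    out = torch.einsum("nht,nhte->nhe", torch.softmax(scores, dim=-1), values)
    return out, (keys, values)

q = k = v = torch.rand(8, 4, 4)
out, state = sketch_recurrent_full_attention(q, k, v)
out, state = sketch_recurrent_full_attention(q, k, v, state)

Note that the state grows by one entry per step, so each step still costs O(t); the speedup over re-running a full transformer comes from never recomputing attention for past positions.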
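For linear_attention, here is a minimal sketch of the causal linear attention recurrence from Katharopoulos et al. (2020), assuming an elu(x) + 1 feature map phi. The state is the pair of running sums S = sum_i phi(k_i) v_i^T and Z = sum_i phi(k_i), which makes each step O(1) in the sequence length; the function name and state layout are again illustrative, not the library's actual API.

import torch

def sketch_recurrent_linear_attention(q, k, v, state=None, eps=1e-6):
    # Feature map phi(x) = elu(x) + 1, applied to one timestep (N, H, E)
    phi_q = torch.nn.functional.elu(q) + 1
    phi_k = torch.nn.functional.elu(k) + 1
    if state is None:
        S = q.new_zeros(k.shape + (v.shape[-1],))  # (N, H, E, M)
        Z = torch.zeros_like(phi_k)                # (N, H, E)
    else:
        S, Z = state
    # The recurrence: S_t = S_{t-1} + phi(k_t) v_t^T, Z_t = Z_{t-1} + phi(k_t)
    S = S + torch.einsum("nhe,nhm->nhem", phi_k, v)
    Z = Z + phi_k
    # Output: (phi(q)^T S) / (phi(q)^T Z), with eps for numerical stability
    num = torch.einsum("nhe,nhem->nhm", phi_q, S)
    den = torch.einsum("nhe,nhe->nh", phi_q, Z).unsqueeze(-1) + eps
    return num / den, (S, Z)

q = k = v = torch.rand(8, 4, 4)
out, state = sketch_recurrent_linear_attention(q, k, v)
out, state = sketch_recurrent_linear_attention(q, k, v, state)

Unlike the full attention sketch above, the state here has a fixed size regardless of how many steps have been consumed, which is what makes autoregressive inference with linear attention so much cheaper.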