# MLP-based architecture with variable length input for automatic speech recognition

Implementation of the paper "MLP-based architecture with variable length input for automatic speech recognition."

## Materials
`mlp-based-models/`: implementation of our proposed MLP-based architecture for speech recognition.  
`implementation-for-espnet/`: implementation for running experiments with ESPnet.

## Usage
```python
import torch
from torch import nn

from cmlp import CMLPEncoder
# from tsmlp import TSMLPEncoder
# from fmlp import FMLPEncoder

cmlp = CMLPEncoder(
    elayers=18,
    adim=256,
    eunits=1024,
    act=nn.GELU(),
    act_in=nn.GELU(),
    act_out=nn.Identity(),
    attn_dim=0,
    causal=False,
    cmlp_type=1,
    kernel=15,
    dropout=0.1,
)

# tsmlp = TSMLPEncoder(
#     elayers=18,
#     adim=256,
#     eunits=1024,
#     act=nn.GELU(),
#     act_in=nn.GELU(),
#     act_out=nn.Identity(),
#     attn_dim=0,
#     causal=False,
#     shift_size=2,
#     dropout=0.1,
# )

# fmlp = FMLPEncoder(
#     elayers=18,
#     adim=256,
#     eunits=1024,
#     act=nn.GELU(),
#     act_in=nn.GELU(),
#     act_out=nn.Identity(),
#     attn_dim=0,
#     causal=False,
#     kernel=15,
#     dropout=0.1,
# )

x = torch.randn(1, 100, 256)  # dummy input features: (batch, time, adim)
logits = cmlp(x)  # (1, 100, 256)
```
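
Since the architecture targets variable-length input, the following is a minimal sketch of how utterances of different lengths could be batched into the `(batch, time, adim)` layout used above. It relies only on standard PyTorch utilities; the utterance lengths are hypothetical, and whether or how the encoders consume a padding mask is not shown in this README, so the mask here is purely illustrative.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

ADIM = 256  # feature dimension, matching adim=256 in the usage example

# Three hypothetical utterances of different lengths, each of shape (time, ADIM).
utterances = [torch.randn(t, ADIM) for t in (100, 73, 42)]
lengths = torch.tensor([u.size(0) for u in utterances])

# Zero-pad every utterance to the longest one: (batch, max_time, ADIM).
batch = pad_sequence(utterances, batch_first=True)

# Boolean mask: True on real frames, False on padded frames.
mask = torch.arange(batch.size(1))[None, :] < lengths[:, None]

print(batch.shape)               # torch.Size([3, 100, 256])
print(mask.sum(dim=1).tolist())  # [100, 73, 42]

# Assumption: the encoder accepts (batch, time, adim), as in the usage example.
# out = cmlp(batch)
```

The commented-out call assumes the `cmlp` instance from the usage example; check the encoder's `forward` signature in `mlp-based-models/` before passing a mask or lengths.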

## Experiments on ESPnet
To run experiments with ESPnet, clone ESPnet into this directory.
```bash
git clone https://github.com/espnet/espnet
```
Then run `setup.sh` to copy the files in `implementation-for-espnet/` into the appropriate directories.
```bash
bash setup.sh
```
