# Segmentation Free Inference of RNA Modifications from Direct RNA Signal Data
This is a PyTorch implementation of the methodology described in the article. The basecalling utilities are mostly adapted from Bonito (https://github.com/nanoporetech/bonito/tree/master/bonito).

There are two preprocessing scripts:
1. `prepare_basecalling_training.py` to prepare datasets for training a basecaller
2. `prepare_m6araw_training_data.py` to prepare datasets for training m6araw

There are two training scripts:
1. `train_basecalling.py` to train the basecaller on the dataset prepared by `prepare_basecalling_training.py`
2. `m6araw_utils/train.py` to train m6araw on the dataset prepared by `prepare_m6araw_training_data.py`

The training and testing information is available in `hct116_train_test_info.csv`. The annotation for HCT116 is available as `hct116_m6ace.csv.gz`, while the annotations for HEK293T are available as `hek293t_m6ace.csv.gz` and `hek293t_miclip.csv.gz`. We use the GRCh38 Ensembl annotations, release 91, with the reference FASTA file obtained by combining the coding and noncoding RNA reference annotations.
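
The combined reference can be built by concatenating the two FASTA files. The snippet below is a minimal sketch of one way to do that, not the script used for the article. The helper names `open_maybe_gzip` and `combine_fasta` are hypothetical, and the file names in the usage comment are assumptions about how the Ensembl release 91 downloads are named locally.

```python
import gzip
from pathlib import Path


def open_maybe_gzip(path, mode):
    """Open a plain or gzip-compressed file in binary mode, chosen by suffix."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, mode)
    return open(path, mode)


def combine_fasta(input_paths, output_path):
    """Concatenate FASTA files in the given order.

    Returns the number of records (header lines starting with '>') written.
    """
    n_records = 0
    with open_maybe_gzip(output_path, "wb") as out:
        for path in input_paths:
            last_line = b"\n"
            with open_maybe_gzip(path, "rb") as handle:
                for line in handle:
                    if line.startswith(b">"):
                        n_records += 1
                    out.write(line)
                    last_line = line
            # Make sure the next file's first header starts on its own line.
            if not last_line.endswith(b"\n"):
                out.write(b"\n")
    return n_records


# Example usage (file names are assumptions, adjust to your local copies):
# combine_fasta(
#     ["Homo_sapiens.GRCh38.cdna.all.fa.gz", "Homo_sapiens.GRCh38.ncrna.fa.gz"],
#     "Homo_sapiens.GRCh38.cdna.ncrna.fa",
# )
```

The sketch does not deduplicate transcript identifiers. It assumes the coding and noncoding files contain disjoint records.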