# DeepASmRNA: attention-based convolutional neural network method with scalability and interpretability for predicting alternative splicing events from transcript sequences without a reference genome The sharp increase in the number of sequenced transcriptomes without reference genomes has empowered the investigation of AS events. We proposed an attention-based CNN model, DeepASmRNA, an accurate, scalable and biologically interpretable tool for predicting AS events using a transcriptome without a reference genome, which is a step towards investigating AS genome-wide in species without a reference genome. To our knowledge, DeepASmRNA is the first only dependence of primary sequences of mRNA transcripts for predicting AS at the genome-wide level. DeepASmRNA will greatly expand the studies of alternative splicing in species without a reference genome. Our method, DeepASmRNA, is composed of two parts. For the first part, we use all-versus-all BLASTN to identify alternatively spliced transcripts. For the second part, we use an attention-based convolution neural network (CNN) approach to classify 4 basic types of AS: exon skipping (ES), alternative acceptor site (AA), alternative donor site (AD) and intron retention (IR). Our model takes the primary sequence of transcripts without a reference genome as input and outputs the probability of ES, AA, AD and IR for each transcript pair. ##### the workflow of model ![img](workflow.png) ## Dependencies: 1. Python 3.5–3.8 2. TensorFlow >2.1 3. blastn 2.10.1 We strongly recommend using Anaconda to install all dependencies. You can simply install the dependencies by running following commands. ```bash conda create -n DeepASmRNA python=3.7 conda activate DeepASmRNA conda install tensorflow=2.1 conda install biopython ``` ## Installation: After testing all dependencies works well, you can git clone it into your working directory, and all executable file placed in bin/ ```bash git clone https://github.com/CMB-BNU/DeepASmRNA.git cd DeepASmRNA/bin chmod 777 identifier.sh echo "export PATH=`pwd`:\$PATH" >>~/.bashrc && source ~/.bashrc ``` ## Running: #### first way: for overall workflow ```bash sh identifier.sh transcript.fasta model ### For model, choosing from [arabidopsis, rice, human], arabidopsis or rice for plant, human for animal ``` #### second way: step by step #### 1). predict AS event For predict AS event: ```bash makeblastdb -in transcript.fasta -dbtype nucl ### make blast database blastn -query transcript.fasta -db transcript.fasta -strand plus -evalue 1E-10 -outfmt 5 -ungapped -num_threads 20 -out transcript.xml ### sequence alignment using blastn python3 predictAS.py transcript.xml transcriptas.txt >transcript.seq ### predict AS transcript pair ``` #### 2). classify AS event You can run AS classification model by ```bash python classifyAS.py transcript.seq \ ### input file name -m human \ ### optional, model for species, choosing from [arabidopsis, human, rice, fine_tune], default = human -o transcriptas_type.txt ### optional, output file name ``` Also you can use a very small dataset to enhance the performance of model, for example: ```bash python classfyAS.py PATH/TO/input_file \ ### input file name -m human \ ### optional, model for species, choosing from [arabidopsis, human, rice, fine_tune], default = human -o PATH/TO/input_file \ ### optional, output file name -ft PATH/TO/fine_tune_input_file ### optional, fine tune input file ``` Different from 'input' option, the 1st column of fine tune file should be the label of the sequence, choosing from (ES, AA, AD, IR). If you use -ft option to fine tune the model, the meaning of -m is the basic model you choose to be trained, the original model will not be changed after each time of fine tune learning. ## Output file : For transcript.fasta as input file Three output file were obtained: 1> transcriptas.txt about the AS events information which including position, identity, coverage, length 2> transcriptas.seq about the sequence of AS events which including the AS region and upstream 50 bp and downstream 50 bp 3> transcriptas_type.txt about the type of the AS event and its probability ## Running example You can run the example by run_example.sh ## Citation: Please cite: