Bulk RNA-Seq Analysis
RNA-Seq technology provides insight into how cells and tissues function by measuring gene expression levels. Because all normal cells within an organism share the same genome, differences in cell identity and function are largely determined by gene expression. Bulk RNA-Seq (whole-transcriptome sequencing) experiments measure the aggregate gene expression of an entire sample. They do not distinguish among the cell types within the sample; instead, they give a view of gene expression across a whole organ or tissue. Bulk RNA-Seq has also been instrumental in the development of many single-cell RNA sequencing methods.
Bulk RNA-Seq data analyses consist of the following key steps:
- Quality check and preprocessing of raw sequence reads: PCR Clean and Trimmomatic
- Mapping reads to a reference genome or transcriptome: Bowtie-2
- Counting reads mapped to individual genes or transcripts: RsemExpTable
- Identification of differential expression: DESeq2
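The preprocessing step in the first item can be illustrated with a toy sketch. It removes exact duplicate reads and then trims a 3' adapter, in the same order as PCR Clean and Trimmomatic appear in the pipeline. This is not how either tool works internally: duplicates are simplified here to identical sequences, and the adapter sequence and the `min_overlap` parameter are assumptions chosen for illustration.

```python
from typing import List

# Hypothetical 3' adapter prefix; real adapter sequences depend on the
# library preparation kit and are normally supplied to the trimming tool.
ADAPTER = "AGATCGGAAGAGC"


def remove_exact_duplicates(reads: List[str]) -> List[str]:
    """Keep only the first occurrence of each identical read sequence."""
    seen = set()
    unique = []
    for read in reads:
        if read not in seen:
            seen.add(read)
            unique.append(read)
    return unique


def trim_adapter(read: str, adapter: str = ADAPTER, min_overlap: int = 3) -> str:
    """Remove the adapter and everything after it from the 3' end of a read."""
    # Case 1: the full adapter occurs inside the read.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Case 2: only the start of the adapter is present, running off the 3' end.
    for k in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read


def preprocess(reads: List[str]) -> List[str]:
    """Deduplicate first (PCR Clean precedes Trimmomatic), then trim adapters."""
    return [trim_adapter(r) for r in remove_exact_duplicates(reads)]


if __name__ == "__main__":
    raw = ["ACGTACGTAGATCGGAAGAGC", "ACGTACGTAGATCGGAAGAGC", "TTTTGGGGAGAT", "CCCCAAAA"]
    print(preprocess(raw))  # ['ACGTACGT', 'TTTTGGGG', 'CCCCAAAA']
```

Production duplicate removal usually works on alignment coordinates or tolerates sequencing errors, and Trimmomatic also uses base qualities, so treat this only as a picture of what "cleaning" a read means.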
The T-BioInfo platform lets users build pipelines in an intuitive graphical interface that simplifies the process and removes the need for large amounts of local RAM and processing power. In addition, the pipeline builder stores all pipeline settings in a reference file, which makes completed pipelines easier to reproduce.
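The format of T-BioInfo's reference file is not described here. As a hypothetical sketch of why storing every setting helps reproducibility, a pipeline's configuration could be serialized as deterministic JSON: reloading the text restores the same steps and parameters, and a hash of the text identifies the exact configuration. Step names follow the text; the parameter names and values are invented for illustration.

```python
import hashlib
import json
from typing import Any, Dict

# Hypothetical pipeline description: step names follow the text above,
# but the parameter names and values are illustrative only.
PIPELINE: Dict[str, Any] = {
    "name": "bulk_rnaseq_demo",
    "input": {"format": "fastQ", "layout": "paired-end"},
    "steps": [
        {"tool": "PCR Clean", "params": {}},
        {"tool": "Trimmomatic", "params": {"min_length": 36}},
        {"tool": "Bowtie-2", "params": {}},
        {"tool": "RsemExpTable", "params": {}},
        {"tool": "DESeq2", "params": {"alpha": 0.05}},
    ],
}


def to_reference_text(spec: Dict[str, Any]) -> str:
    """Serialize the pipeline deterministically (sorted keys) for storage."""
    return json.dumps(spec, indent=2, sort_keys=True)


def fingerprint(spec: Dict[str, Any]) -> str:
    """SHA-256 of the serialized spec; identical settings give identical hashes."""
    return hashlib.sha256(to_reference_text(spec).encode("utf-8")).hexdigest()


def from_reference_text(text: str) -> Dict[str, Any]:
    """Rebuild the pipeline description from its stored reference text."""
    return json.loads(text)


if __name__ == "__main__":
    stored = to_reference_text(PIPELINE)
    restored = from_reference_text(stored)
    print(restored == PIPELINE, fingerprint(restored) == fingerprint(PIPELINE))
```

Sorting keys matters here: without it, two logically identical configurations could serialize in different orders and produce different fingerprints.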
Downstream Analysis: Transcriptomic Data
To run the pipeline, the input data must be in one of the following formats: fastQ, fasta, SAM_BAM, SimulateFQ or SimulateFA, with either single-end or paired-end reads. The RNA-Seq analysis pipeline starts with a job called “Start”, which compiles the user-selected input options into a series of tags and generates the matching pipeline options, restricting the available algorithms to those that can handle the input data.
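The tag logic of the “Start” job is internal to T-BioInfo. The hypothetical sketch below only illustrates the general idea: input choices are compiled into a set of tags, and tools are kept only if they can read that input. The format names come from the text; the capability table and function names are assumptions, not the platform's actual rules.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

# Input formats and read layouts listed in the text above.
SUPPORTED_FORMATS = {"fastQ", "fasta", "SAM_BAM", "SimulateFQ", "SimulateFA"}
LAYOUTS = {"single-end", "paired-end"}


@dataclass(frozen=True)
class Tool:
    name: str
    accepts: FrozenSet[str]  # hypothetical: input formats the tool can read


# Hypothetical capability table, for illustration only.
TOOLS = [
    Tool("PCR Clean", frozenset({"fastQ", "SimulateFQ"})),
    Tool("Trimmomatic", frozenset({"fastQ", "SimulateFQ"})),
    Tool("Bowtie-2", frozenset({"fastQ", "fasta", "SimulateFQ", "SimulateFA"})),
    Tool("RsemExpTable", frozenset({"SAM_BAM"})),
]


def start_tags(fmt: str, layout: str) -> FrozenSet[str]:
    """Compile the user's input choices into a set of tags."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported input format: {fmt}")
    if layout not in LAYOUTS:
        raise ValueError(f"unknown read layout: {layout}")
    return frozenset({fmt, layout})


def compatible_tools(tags: FrozenSet[str], tools: List[Tool] = TOOLS) -> List[str]:
    """Keep only tools whose accepted formats overlap the input tags."""
    return [t.name for t in tools if t.accepts & tags]


if __name__ == "__main__":
    print(compatible_tools(start_tags("SAM_BAM", "paired-end")))  # ['RsemExpTable']
```

With already-aligned SAM_BAM input, for example, the preprocessing and alignment steps drop out and only quantification remains available.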
A researcher can build the pipeline in the graphical interface by chaining the following algorithms:
Start > PCR Clean (removes duplicate reads from raw sequencing data) > Trimmomatic (trims technical adapter sequences from raw reads) > Bowtie-2 (a fast alignment algorithm based on the “seed”, or k-mer, approach) > RsemExpTable (quantifies transcript abundances from the alignment file) > DESeq2 (performs differential expression analysis of high-dimensional count data).
Some steps open a dialog box for setting their parameters. After selecting all the desired options, click the “End” button to name the pipeline, upload the data and run it.
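The seed (k-mer) idea behind Bowtie-2 can be illustrated with a toy hash-based index: exact k-mer matches between a read and the reference each "vote" for a candidate start position of the read. This is a simplification; Bowtie 2 itself finds seeds with an FM index and then extends them with gapped dynamic-programming alignment. The function names here are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_kmer_index(reference: str, k: int) -> Dict[str, List[int]]:
    """Map every k-mer in the reference to the positions where it starts."""
    index: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return dict(index)


def seed_candidates(read: str, index: Dict[str, List[int]], k: int,
                    ref_len: int) -> List[Tuple[int, int]]:
    """Vote for read start positions implied by exact k-mer seed hits.

    Returns (start, votes) pairs, best-supported first. A real aligner would
    then extend each candidate with a gapped alignment to score it.
    """
    votes: Dict[int, int] = defaultdict(int)
    for offset in range(len(read) - k + 1):
        for pos in index.get(read[offset:offset + k], []):
            start = pos - offset
            # Discard hits that would place the read outside the reference.
            if 0 <= start <= ref_len - len(read):
                votes[start] += 1
    return sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    ref = "ACGTTGCAACGTAGGCT"
    idx = build_kmer_index(ref, k=4)
    print(seed_candidates("GCAACGTA", idx, k=4, ref_len=len(ref)))  # [(5, 5)]
```

In the example, the k-mer `ACGT` also occurs at reference position 0, but that hit would place the read before the start of the reference and is discarded, leaving position 5 supported by all five seeds.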
Bulk RNA-Seq Data Analysis and Visualization
After the pipeline has finished, it produces a set of output files that can be downloaded for further statistical analysis and biological interpretation:
– MultiQC report and FastQC: a simple way to run quality-control checks on raw sequence data from high-throughput sequencing pipelines.
– Gene expression table: Gene expression values for isoforms of each gene in all the samples.
– Plots (heatmap, volcano plot, IFC plot and MA plot): these plots make it easy to spot genes with large fold changes that are also statistically significant. These may be the most biologically relevant genes.
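The volcano plot described above places each gene by its log2 fold change (x-axis) and by the negative log10 of its adjusted p-value (y-axis), so genes that are both strongly changed and significant land in the upper corners. A minimal sketch of computing those coordinates, assuming results arrive as (gene, log2FC, padj) tuples; the cutoffs are common conventions, not T-BioInfo defaults, and the gene values are toy numbers.

```python
import math
from typing import List, Tuple

# Hypothetical cutoffs: |log2 fold change| >= 1 (a two-fold change) and an
# adjusted p-value below 0.05 are common choices, not platform defaults.
LFC_CUTOFF = 1.0
ALPHA = 0.05


def volcano_points(results: List[Tuple[str, float, float]],
                   lfc_cutoff: float = LFC_CUTOFF,
                   alpha: float = ALPHA) -> List[Tuple[str, float, float, bool]]:
    """Turn (gene, log2FC, padj) rows into volcano-plot coordinates.

    x = log2 fold change, y = -log10(adjusted p-value); the flag marks genes
    that pass both the effect-size and the significance cutoff.
    """
    points = []
    for gene, log2fc, padj in results:
        y = -math.log10(max(padj, 1e-300))  # guard against padj == 0
        flagged = padj < alpha and abs(log2fc) >= lfc_cutoff
        points.append((gene, log2fc, y, flagged))
    return points


if __name__ == "__main__":
    # Toy values for illustration only.
    demo = [("geneA", 2.5, 0.001), ("geneB", 0.3, 0.0001), ("geneC", -1.8, 0.2)]
    for gene, x, y, flagged in volcano_points(demo):
        print(f"{gene}\tx={x:+.1f}\ty={y:.1f}\tflagged={flagged}")
```

Here geneB is highly significant but its fold change is small, and geneC changes strongly but is not significant; only geneA passes both filters. An MA plot uses the same log2 fold changes on its y-axis but plots them against mean expression instead of significance.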
Differentially Expressed Genes: statistical tests are run on the expression values of each gene across all samples to list the differentially expressed genes along with their statistical significance. The DESeq2 outputs include:
- “FPKM_DeSeq2.txt”: the IDs of differentially expressed genes, with their test statistics, reported together with FPKM (fragments per kilobase of exon per million mapped fragments) values
- “expression_genes_not_filtered_DeSeq2.txt”: gene-level results without filtering, so genes with zero values are retained
- “expression_isoforms_DeSeq2.txt”: differential expression results for each isoform in the dataset
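The FPKM values listed above follow a standard normalization: a gene's fragment count is divided by its length in kilobases and by the library size in millions of mapped fragments. A minimal sketch of that formula, with toy numbers; quantifiers such as RSEM use an effective transcript length that accounts for fragment size, so values reported by the platform may differ from this plain calculation.

```python
def fpkm(fragments: int, length_bp: int, total_mapped: int) -> float:
    """FPKM = fragments / (length in kb * total mapped fragments in millions)."""
    if length_bp <= 0 or total_mapped <= 0:
        raise ValueError("length and library size must be positive")
    # 1e9 = 1e3 (bp -> kb) * 1e6 (fragments -> millions of fragments)
    return fragments * 1e9 / (length_bp * total_mapped)


if __name__ == "__main__":
    # Toy numbers: 500 fragments on a 2,000 bp gene in a 25-million-fragment library.
    print(fpkm(500, 2_000, 25_000_000))  # 10.0
```

Worked through by hand: 500 fragments / 2 kb = 250 fragments per kb, and 250 / 25 million-fragment units = 10 FPKM. Normalizing by length lets longer and shorter genes be compared within a sample, and normalizing by library size lets the same gene be compared across samples sequenced to different depths.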