Differential expression analysis identifies genes whose expression differs significantly between conditions, for example between diseased and healthy samples. In pharmaceutical and clinical research, these differentially expressed genes (DEGs) can help pinpoint candidate biomarkers, therapeutic targets, and gene signatures for diagnostics.

The three postulates of differential gene expression are as follows:

1. Every cell nucleus contains the complete genome established in the fertilized egg. In molecular terms, the DNAs of all differentiated cells are identical.
2. The unused genes in differentiated cells are not destroyed or mutated, and they retain the potential for being expressed.
3. Only a small percentage of the genome is expressed in each cell, and a portion of the RNA synthesized in the cell is specific for that cell type.

(Reference: https://www.ncbi.nlm.nih.gov/books/NBK10061/)

Processing Differential Expression Analysis

RNA-seq is increasingly favored for high-throughput expression analysis, although RNA-seq and modern microarray platforms produce highly correlated expression values and each has its own technical advantages. Investigating the differences between diseased patients and healthy individuals helps us understand the pathology of diseases and, eventually, treat them. One particular focus of such investigations is differentially expressed genes (DEGs), genes whose expression levels differ between diseased and healthy samples.

Raw expression data require processing and quality assessment, and the resulting expression values convey a relative rather than an absolute measure. For this reason, the analysis is usually restricted to identifying genes with the largest change between samples, or genes whose change exceeds a statistical significance threshold or a fixed fold-change threshold. The choice of test depends on the data: if the values follow a normal distribution, a t-test such as Welch's t-test can be used (the Wilcoxon rank-sum test is a nonparametric alternative that does not assume normality); if the data are raw, non-normalized read counts, count-based methods such as DESeq, DESeq2, or edgeR are appropriate.
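For normally distributed values, Welch's t-test compares two group means without assuming equal variances. The sketch below computes the Welch t statistic and the Welch-Satterthwaite degrees of freedom for a single gene using only the Python standard library. The function name `welch_t` and the expression values are hypothetical; a library routine such as SciPy's `scipy.stats.ttest_ind(a, b, equal_var=False)` would also return the p-value.

```python
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Return (t, df) for Welch's unequal-variance t-test on two samples."""
    n_a, n_b = len(group_a), len(group_b)
    # Squared standard errors: sample variance (n - 1 denominator) / group size
    se2_a = variance(group_a) / n_a
    se2_b = variance(group_b) / n_b
    t = (mean(group_a) - mean(group_b)) / (se2_a + se2_b) ** 0.5
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical log-scale expression values of one gene in two groups
disease = [8.1, 8.4, 8.9, 8.6]
healthy = [7.2, 7.5, 7.1, 7.4]
t, df = welch_t(disease, healthy)
print(f"t = {t:.2f}, df = {df:.1f}")
```

The t statistic and df would then be looked up in a t distribution to get a p-value; with thousands of genes, those p-values still need multiple-testing correction.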

Processing Differential Expression and Pathway Analysis Pipeline

To run the differential expression analysis pipeline, the input data must be raw read counts, must contain gene IDs or symbols with no duplicate genes, and must not end with a blank line. Several methods are available for differential expression analysis, such as edgeR and DESeq. The DESeq2 package is designed for normalization, visualization, and differential analysis of high-dimensional count data. It uses empirical Bayes techniques to estimate priors for log fold change and dispersion, and to calculate posterior estimates for these quantities. The pipeline requires setting parameters for the count filter, the volcano plot threshold, and the reference database (human or mouse).
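The input requirements above can be checked before uploading. This is a minimal sketch, not the server's own validator: it assumes a tab-delimited table with a header row, the gene ID or symbol in the first column, and raw counts in the remaining columns. The function name `validate_count_table` and the example table are hypothetical.

```python
def validate_count_table(text):
    """Return a list of problems found in a tab-delimited raw-count table.

    Assumed (hypothetical) layout: a header row, then one row per gene with
    the gene ID/symbol in the first column and integer counts after it.
    """
    problems = []
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # a single newline after the last row is fine
    if not lines:
        return ["file is empty"]
    seen = set()
    for lineno, line in enumerate(lines[1:], start=2):  # line 1 is the header
        line = line.rstrip("\r")  # tolerate Windows line endings
        if not line.strip():
            problems.append(f"line {lineno}: blank line")
            continue
        gene, *counts = line.split("\t")
        if not gene:
            problems.append(f"line {lineno}: missing gene ID/symbol")
        elif gene in seen:
            problems.append(f"line {lineno}: duplicate gene {gene!r}")
        seen.add(gene)
        # Raw counts are non-negative integers; "1.5" or "" fails this check
        if not counts or not all(c.isdigit() for c in counts):
            problems.append(f"line {lineno}: counts must be non-negative integers")
    return problems


# Hypothetical table with a duplicated gene and a trailing blank line
table = "gene\tS1\tS2\nTP53\t120\t98\nGAPDH\t5000\t4870\nTP53\t7\t3\n\n"
for problem in validate_count_table(table):
    print(problem)
```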


Enrichment Analysis & GSEA

To understand the biological implications of the significant genes, we perform enrichment analysis and GSEA (Gene Set Enrichment Analysis). The enrichment analysis shows which important pathways the significant genes are enriched in. Similarly, it identifies the Gene Ontology (GO) terms that are significantly associated with the set of significant genes. Users can adjust the significance thresholds, but 0.05 is the key cutoff for selecting the most significant genes and the significantly enriched terms and pathways.
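One common way to test whether significant genes are enriched in a pathway or GO term is an over-representation test based on the hypergeometric distribution (equivalent to a one-sided Fisher's exact test). The text does not say which test the pipeline uses, so this sketch is only illustrative; the function name `hypergeom_enrichment_p` and all gene counts are hypothetical. GSEA itself works differently, using a rank-based running-sum statistic over all genes rather than a fixed list of significant genes.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """One-sided over-representation p-value, P(X >= k).

    X is the number of pathway genes among n significant genes drawn without
    replacement from a universe of N genes, K of which are in the pathway.
    """
    upper = min(n, K)
    # comb() returns 0 when more genes are requested than exist, so the sum is exact
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Hypothetical numbers: 20,000 genes in the universe, 100 in the pathway,
# 200 significant genes, 8 of which fall in the pathway (about 1 expected by chance)
p = hypergeom_enrichment_p(k=8, n=200, K=100, N=20000)
print(f"p = {p:.2e}", "enriched" if p < 0.05 else "not enriched")
```

In practice, the p-values from many pathways or GO terms would themselves be corrected for multiple testing before applying a cutoff such as 0.05.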

Results obtained from Differential Expression and Pathway Analysis Pipeline:

The differential expression pipeline on the T-Bioinfo Server yields the outputs listed below:

  • Deseq_all.txt: Lists the differentially expressed genes in tabular format with the following values:
    • Fold change: Typically reported on a logarithmic scale (base 2), so a log2 fold change of 1 means a twofold increase and -1 a twofold decrease.
    • P-value: Indicates the strength of evidence that the gene is differentially expressed in that comparison; smaller values mean stronger evidence.
    • Adjusted p-value (corrected for multiple testing): Each gene's p-value is recalculated to account for running many statistical tests, one per gene. Genes with an adjusted p-value < 0.05 can be considered significantly differentially expressed between the two conditions being compared.
  • Volcano plot: Visual representation of differentially expressed genes, plotting log2 fold change against statistical significance.
  • Gene enrichment plot: Shows which pathways, biological processes, or other terms the genes are associated with.
  • GSEA plots: You can find details regarding each plot in the next section.
    • KEGG pathways: Enriched pathways, i.e., which pathways are activated or suppressed in the samples of a group.
  • UpSet plot: Highlights the overlap of genes among different gene sets.
  • Ridgeplot: Distribution of the core enriched genes for GSEA-enriched categories.
  • CNet plot for GO terms: Network of GO terms and the genes enriched in them.
  • Activated & suppressed enriched GO terms: Dot plot of enriched GO terms, showing which are activated or suppressed in the samples of a group.
  • Heatmap: Each row represents a gene and each column represents a sample. The color and intensity of each box represent the gene's expression level in that sample.
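The fold-change and adjusted p-value columns, and the way a volcano plot separates genes, can be illustrated with a short sketch. It assumes the Benjamini-Hochberg (BH) false discovery rate procedure, which is DESeq2's default adjustment (`pAdjustMethod = "BH"` in `results()`), and a hypothetical volcano threshold of |log2 fold change| ≥ 1. The gene names, mean counts, and p-values are made up, and the fold change here is a simple ratio of means; DESeq2 estimates fold changes from its fitted model, so this is not the pipeline's code.

```python
from math import log2


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def classify(log2fc, padj, alpha=0.05, min_abs_lfc=1.0):
    """Label a gene 'up', 'down' or 'ns' (not significant), as a volcano plot might colour it."""
    if padj < alpha and log2fc >= min_abs_lfc:
        return "up"
    if padj < alpha and log2fc <= -min_abs_lfc:
        return "down"
    return "ns"


# Hypothetical normalized mean counts (disease, healthy) and raw p-values
genes = {
    "GENE_A": (400.0, 100.0, 0.001),
    "GENE_B": (50.0, 200.0, 0.004),
    "GENE_C": (120.0, 100.0, 0.030),
    "GENE_D": (80.0, 90.0, 0.600),
}
pvals = [p for _, _, p in genes.values()]
padj = benjamini_hochberg(pvals)
for (name, (dis, hea, p)), q in zip(genes.items(), padj):
    lfc = log2(dis / hea)
    print(f"{name}: log2FC={lfc:+.2f}  p={p:.3f}  padj={q:.3f}  {classify(lfc, q)}")
```

GENE_C passes the 0.05 adjusted p-value cutoff but is labeled "ns" because its fold change is small, which is why volcano plots combine both thresholds.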
