Dashboards: T-BioInfo Server
Differential Gene Expression
Differential gene expression analysis identifies genes whose expression levels differ between conditions, for example between diseased and healthy tissue. In pharmaceutical and clinical research, differentially expressed genes (DEGs) can help pinpoint candidate biomarkers, therapeutic targets, and gene signatures for diagnostics.
Based on the embryological evidence for genomic equivalence (and on bacterial models of gene regulation), a consensus emerged in the 1960s that cells differentiate through differential gene expression.
To identify differentially expressed genes between two conditions, it is important to choose a statistical distribution that approximates the behavior of the count data. Several methods for differential expression analysis, such as edgeR and DESeq, model read counts with the negative binomial (NB) distribution.
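As a rough illustration of why an NB model is used, the sketch below estimates the NB dispersion of a single gene with a simple method-of-moments formula and computes a log2 fold change between two conditions. It is not how edgeR or DESeq work internally: the function names, the pseudocount, and the example counts are all assumptions made for illustration, and the counts are assumed to be already normalized for library size.

```python
from math import log2
from statistics import mean, variance


def nb_dispersion(counts):
    """Method-of-moments estimate of alpha in the NB relation
    var = mu + alpha * mu**2 (alpha = 0 corresponds to Poisson-like data)."""
    mu = mean(counts)
    if mu == 0:
        return 0.0
    var = variance(counts)  # sample variance (n - 1 denominator)
    return max((var - mu) / mu ** 2, 0.0)


def log2_fold_change(control, treated, pseudocount=1.0):
    """log2 ratio of mean counts; the pseudocount avoids division by zero."""
    return log2((mean(treated) + pseudocount) / (mean(control) + pseudocount))


# Hypothetical normalized counts for one gene in two conditions.
control = [10, 12, 9, 11]
treated = [40, 55, 38, 47]

print("dispersion (treated):", nb_dispersion(treated))
print("log2 fold change:", log2_fold_change(control, treated))
```

A dispersion above zero means the counts vary more than a Poisson model would predict, which is the situation the NB distribution is meant to handle.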
Metagenomics
Metagenomics is the study of microbial communities in their natural environments. Approaches to studying these communities offer various levels of resolution and functional annotation, ranging from amplicon and whole-metagenome sequencing to metatranscriptome and metaproteome/metabolome analysis. The most commonly used approach, 16S rRNA sequencing, can be used to study microbiome composition and identify over-represented species of microorganisms. QIIME2 is an open-source bioinformatics pipeline for performing microbiome analysis from raw DNA sequencing data.
In addition, microbial studies that use DADA2 obtain high-resolution, accurately reconstructed amplicon sequences, which improves the detection of sample diversity and biological variants.
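The idea of studying composition and spotting over-represented taxa can be sketched with a per-taxon count table. The example below is only an illustration and does not use the QIIME2 or DADA2 APIs: the function names, the taxon counts, and the fold-ratio cutoff are hypothetical, and a simple ratio is not a substitute for a proper statistical test.

```python
def relative_abundance(counts):
    """Convert raw per-taxon read counts into fractions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {taxon: n / total for taxon, n in counts.items()}


def enriched_taxa(sample, reference, min_ratio=2.0):
    """Taxa whose relative abundance in `sample` is at least `min_ratio`
    times their relative abundance in `reference`."""
    s = relative_abundance(sample)
    r = relative_abundance(reference)
    return sorted(
        taxon for taxon in s
        if taxon in r and r[taxon] > 0 and s[taxon] / r[taxon] >= min_ratio
    )


# Hypothetical read counts per genus for two samples.
sample = {"Bacteroides": 600, "Lactobacillus": 100, "Escherichia": 300}
reference = {"Bacteroides": 500, "Lactobacillus": 400, "Escherichia": 100}

print(relative_abundance(sample))
print(enriched_taxa(sample, reference))
```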
Single Cell RNA-Seq
Single cell RNA-Seq data is being generated at unprecedented rates thanks to increasingly affordable and easy-to-use technologies such as 10x Genomics.
Moreover, improved accuracy, integration algorithms that remove batch effects, and user-friendly tools make single cell experiments a valuable addition to research projects.
There are several ways to summarize scRNA-Seq data after processing, including UMAP or tSNE plots, summary statistics for cell types, and marker genes that are characteristic of the identified clusters. Many of these outputs vary with the processing and integration steps used, and they also carry important information about how reliably the data can be interpreted.
Along with UMAP/tSNE plots, summary statistics, and marker genes, Single Cell Data Analysis on the T-BioInfo Server provides heatmaps and ridge plots highlighting genes that are differentially expressed in the cells of each cluster compared to the cells of all other clusters. Violin plots are also generated to show expression levels across the clusters, along with feature plots arranged as lattice plots.
Finally, by integrating Celldex, researchers working with human and mouse data gain access to a collection of reference expression datasets that can be used for cell type annotation. Cell types can also be annotated manually for the clusters obtained, based on the marker genes present. If you would like to know more or have your data analyzed, feel free to contact us.
The standard approach to processing scRNA-Seq data takes several inputs: barcodes, a count matrix, and features (genes). The data has to be “processed”, meaning it should be 1) cleaned up, 2) normalized, and 3) analyzed for highly variable features and adjusted for cell cycle heterogeneity. These methods vary somewhat between implementations, but they are available through commonly used and regularly updated packages such as Seurat (https://satijalab.org/seurat/index.html).
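The first two processing steps can be sketched on a tiny count matrix. The code below is a minimal illustration, not the T-BioInfo or Seurat implementation: the function names, the `min_genes` cutoff, and the toy matrix are assumptions. The normalization shown (divide by the cell total, multiply by a scale factor of 10,000, take the natural log of one plus the result) follows the common log-normalization scheme.

```python
from math import log1p


def filter_cells(matrix, min_genes=2):
    """Clean-up step: keep cells (rows) with at least `min_genes` detected genes."""
    return [row for row in matrix if sum(1 for c in row if c > 0) >= min_genes]


def log_normalize(cell_counts, scale_factor=10_000):
    """Normalization step: scale each cell to the same total, then log-transform."""
    total = sum(cell_counts)
    if total == 0:
        return [0.0] * len(cell_counts)
    return [log1p(c / total * scale_factor) for c in cell_counts]


# Hypothetical count matrix: rows are cells (barcodes), columns are genes.
counts = [
    [0, 3, 1, 6],
    [0, 0, 0, 1],  # only one detected gene, removed by the filter
    [5, 5, 0, 0],
]

kept = filter_cells(counts)
normalized = [log_normalize(cell) for cell in kept]
for row in normalized:
    print([round(value, 2) for value in row])
```

Scaling every cell to the same total before the log transform keeps differences in sequencing depth from being mistaken for differences in expression.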
The T-BioInfo platform allows users to build intuitive pipelines on a user-friendly interface that simplifies the process and eliminates the need for the large amounts of RAM and processing power that single cell RNA-Seq data typically requires. In addition, the pipeline builder stores all variables in a reference file to make completed pipelines easier to reproduce.
Unsupervised Analysis
The unsupervised analysis section provides several methods for clustering and visualization of unlabeled data. These algorithms are designed for data exploration: they reveal groups of similar samples (clusters) without prior information about existing groups. Using these methods, unknown classes inherent in the data samples can be revealed by analyzing their similarities based on the chosen input features.
Principal component analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. The components capture the underlying structure in the data, ordered by the amount of variance each one explains. Plotting gene profiles along the first two components gives a 2D graph in which groups of associated samples are easy to see.
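The projection onto two components can be sketched in plain Python. This is an illustrative sketch only: it finds the first two components by power iteration with deflation on the covariance matrix, which is one of several ways to compute PCA and not necessarily what the platform uses. The function names and the sample matrix (rows are samples, columns are genes) are hypothetical.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def center(rows):
    """Subtract each column (gene) mean so every feature has mean zero."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def leading_eigenpair(matrix, iterations=500, seed=0):
    """Power iteration: largest eigenvalue and unit eigenvector of a
    symmetric positive semi-definite matrix."""
    rng = random.Random(seed)
    v = [rng.random() for _ in matrix]
    for _ in range(iterations):
        w = [dot(row, v) for row in matrix]
        norm = dot(w, w) ** 0.5
        v = [x / norm for x in w]
    return dot(v, [dot(row, v) for row in matrix]), v


def pca_2d(rows):
    """Project samples onto the first two principal components."""
    centered = center(rows)
    cov = covariance(centered)
    lam1, pc1 = leading_eigenpair(cov)
    d = len(pc1)
    # Deflation: remove the first component so the next one can be found.
    deflated = [[cov[i][j] - lam1 * pc1[i] * pc1[j] for j in range(d)]
                for i in range(d)]
    _, pc2 = leading_eigenpair(deflated, seed=1)
    coords = [(dot(r, pc1), dot(r, pc2)) for r in centered]
    return coords, pc1, pc2


# Hypothetical expression matrix: 5 samples (rows) by 3 genes (columns).
samples = [
    [2.0, 1.9, 3.0],
    [1.0, 1.1, 0.5],
    [4.0, 4.2, 2.5],
    [3.0, 2.8, 0.2],
    [5.0, 5.1, 1.3],
]

coords, pc1, pc2 = pca_2d(samples)
for x, y in coords:
    print(round(x, 3), round(y, 3))
```

Each printed pair is the position of one sample on the 2D plot. The two component vectors are orthogonal, and the first axis carries the larger share of the variance.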
K-means is one of the most widely used data clustering algorithms. In k-means we take the number of clusters k as an input parameter and randomly select k initial “cluster centroids” in the feature space. Each data point is then assigned to the cluster of its nearest centroid. After that, the position of each cluster centroid is updated to the mean of its assigned data points. Once the centroids have moved, some data points may have a new nearest centroid, and the next iteration begins. The process continues until convergence (no data points are reassigned) or until a predefined stop condition is met (e.g., a maximum number of iterations). Note that k-means starts with random initialization, so different runs on the same data with the same parameters may give different results. For this reason, k-means should be run multiple times and the consistency of the resulting clusterings should be analyzed.
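The iteration described above can be sketched in a few lines. This is a minimal illustration, assuming squared Euclidean distance and initial centroids drawn at random from the data points themselves (one common choice). The function names, the `seed` parameter, and the example points are hypothetical.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100, seed=None):
    """Return (labels, centroids, inertia) for one run of k-means."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initial centroids
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    inertia = sum(squared_distance(p, centroids[lab])
                  for p, lab in zip(points, labels))
    return labels, centroids, inertia


points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]

# Several runs with different random starts; keep the tightest clustering.
runs = [kmeans(points, k=2, seed=s) for s in range(5)]
best_labels, best_centroids, best_inertia = min(runs, key=lambda r: r[2])
print(best_labels, best_inertia)
```

Comparing the labels across the runs, rather than only keeping the best one, is a simple way to check how consistent the clustering is.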
BIRCH (balanced iterative reducing and clustering using hierarchies) is an unsupervised data mining algorithm used to perform hierarchical clustering over particularly large datasets. With modifications, it can also be used to accelerate k-means clustering and Gaussian mixture modeling with the expectation–maximization algorithm.
An advantage of BIRCH is its ability to incrementally and dynamically cluster incoming, multi-dimensional metric data points in an attempt to produce the best quality clustering for a given set of resources (memory and time constraints). In most cases, BIRCH only requires a single scan of the database.
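The single-scan behavior rests on a compact summary called a clustering feature: the number of points, their linear sum, and their sum of squares. The sketch below is a heavily simplified, flat version of that idea and not a full BIRCH implementation: it keeps a plain list of clustering features instead of a balanced CF tree, and the class name, function name, and `threshold` value are assumptions.

```python
from math import sqrt


class ClusteringFeature:
    """Summary (N, LS, SS) of a group of points; the points are not stored."""

    def __init__(self, point):
        self.n = 1
        self.ls = list(point)                  # linear sum per dimension
        self.ss = sum(x * x for x in point)    # sum of squared norms

    def centroid(self):
        return [s / self.n for s in self.ls]

    def radius_if_added(self, point):
        """Root-mean-square distance to the centroid if `point` joined."""
        n = self.n + 1
        ls = [s + x for s, x in zip(self.ls, point)]
        ss = self.ss + sum(x * x for x in point)
        return sqrt(max(ss / n - sum((s / n) ** 2 for s in ls), 0.0))

    def add(self, point):
        self.n += 1
        self.ls = [s + x for s, x in zip(self.ls, point)]
        self.ss += sum(x * x for x in point)


def single_pass_cluster(points, threshold):
    """Read each point once: absorb it into the nearest summary if the
    radius stays within `threshold`, otherwise start a new summary."""
    features = []
    for p in points:
        nearest = min(
            features,
            key=lambda cf: sum((c - x) ** 2 for c, x in zip(cf.centroid(), p)),
            default=None,
        )
        if nearest is not None and nearest.radius_if_added(p) <= threshold:
            nearest.add(p)
        else:
            features.append(ClusteringFeature(p))
    return features


stream = [(1.0, 1.0), (1.2, 0.8), (8.0, 8.0), (0.9, 1.1), (8.2, 7.9)]
features = single_pass_cluster(stream, threshold=1.0)
for cf in features:
    print(cf.n, [round(c, 2) for c in cf.centroid()])
```

Because each summary can be updated by simple addition, memory use depends on the number of summaries rather than the number of points, which is what makes incremental clustering of incoming data practical.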