GENOMIC DATA ANALYSIS

The advent of next-generation sequencing (NGS) technologies has enabled exponential growth in the production of high-throughput omics data. These data are widely analyzed to identify genomic variants, including single nucleotide polymorphisms (SNPs) and DNA insertions and deletions (indels), across a spectrum of genetic disorders, and they provide new insights into how genetic polymorphisms affect disease phenotypes. To facilitate this research, the T-Bioinfo platform hosts a variant calling pipeline developed to help researchers identify and annotate sequence variants accurately and rapidly. Variant calling is the task of identifying possible variations in genome or transcriptome sequences with respect to a chosen reference sequence. In germline variant calling, the reference sequence is the standard reference for the species of interest. In somatic variant calling, the reference is the genome of a chosen control somatic cell sample. “The variant calling pipeline identifies single nucleotide variants present within the whole genome and exome data. The variants are identified by comparing the datasets of an individual with a reference sequence.”

Processing Genomic Data Analysis pipeline


The Genomic Data Analysis pipeline runs a series of algorithms to report the variants observed in the samples. These algorithms include:

  • Bowtie2, which takes reads in .fq/.fa files, aligns them to the reference genome, and gives mapping results in SAM format. Each row of the SAM file contains the input read name, the reference sequence name, the position on the reference, and the number of mapped/skipped/inserted/deleted positions (encoded in the CIGAR string).
  • Visualization in JBrowse, where users can view genome annotations against a reference “ruler,” with an overhead bar giving a visual indication of chromosome position.
  • Variant calling algorithms: Freebayes uses short-read alignments (BAM files) for any number of individuals from a population and a reference genome (in FASTA format) to determine the most likely combination of genotypes for the population at each position in the reference.
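As a rough illustration of the SAM fields mentioned for Bowtie2, the sketch below splits one alignment line into its read name, reference name, and position, and totals the CIGAR operations. The sample line is invented for this example and is not output from the T-Bioinfo pipeline; the function name `parse_sam_line` is likewise hypothetical.

```python
import re
from collections import Counter

# Hypothetical SAM alignment line (tab-separated), invented for illustration.
SAM_LINE = (
    "read_001\t0\tchr1\t10468\t42\t5M1I3M2D4M\t*\t0\t0\t"
    "ACGTACGTACGTA\tIIIIIIIIIIIII"
)

# A CIGAR string is a run of <length><operation> pairs, e.g. 5M1I3M2D4M.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_sam_line(line):
    """Return the fields described in the text for one SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    qname, _flag, rname, pos, _mapq, cigar = fields[:6]

    # Sum the lengths per operation: M = aligned, I = inserted,
    # D = deleted, N = skipped region on the reference.
    counts = Counter()
    for length, op in CIGAR_OP.findall(cigar):
        counts[op] += int(length)

    return {
        "read_name": qname,
        "reference_name": rname,
        "position": int(pos),  # 1-based leftmost mapping position
        "cigar_counts": dict(counts),
    }


record = parse_sam_line(SAM_LINE)
print(record)
```

For real data, an established library for reading SAM/BAM files would normally be preferable to hand-written parsing; the point here is only to show which columns carry the information listed above.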

Freebayes reports the positions it finds putatively polymorphic in variant call format (VCF). Mutect2 was developed at the Broad Institute for the reliable and accurate identification of somatic point mutations in next-generation sequencing data from cancer genomes. In addition to calling mutations, MuTect generates a coverage file in wiggle format, which indicates for every base whether it is sufficiently covered in both the tumor and the normal sample to be sensitive enough to call mutations. We currently use cutoffs of at least 14 reads in the tumor and at least 8 in the normal; these cutoffs are applied after noisy reads are removed in the preprocessing step.
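The coverage cutoffs above can be sketched as a simple per-base check. This is only an illustration of the stated thresholds (14 reads in the tumor, 8 in the normal), not MuTect's actual implementation; the depth values and the function names are hypothetical, and the depths are assumed to be counted after noisy reads have been removed.

```python
# Cutoffs taken from the text; everything else here is a hypothetical example.
TUMOR_MIN_READS = 14
NORMAL_MIN_READS = 8


def is_sufficiently_covered(tumor_depth, normal_depth,
                            tumor_min=TUMOR_MIN_READS,
                            normal_min=NORMAL_MIN_READS):
    """True if a base has enough reads in both tumor and normal samples."""
    return tumor_depth >= tumor_min and normal_depth >= normal_min


def coverage_mask(tumor_depths, normal_depths):
    """Return 1 per base that passes both cutoffs, 0 otherwise.

    A 0/1 value per base is similar in spirit to a per-base coverage track.
    """
    return [
        1 if is_sufficiently_covered(t, n) else 0
        for t, n in zip(tumor_depths, normal_depths)
    ]


# Invented per-base read depths for five consecutive positions.
tumor = [20, 14, 13, 30, 5]
normal = [10, 8, 9, 7, 2]

mask = coverage_mask(tumor, normal)
print(mask)  # [1, 1, 0, 0, 0]
```

Only the first two positions pass: the third falls short in the tumor, the fourth in the normal, and the fifth in both.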

Outputs Obtained:

The Genomic Data Analysis pipeline on the T-Bioinfo Server integrates JBrowse for visualizing the variants observed in the patient samples. JBrowse 2 is a pluggable open-source platform for visualizing and integrating biological data. At its core, it is a genome browser, but it has also been built as an extensible platform to enable visualization of all kinds of biological data.

The results obtained by running the pipeline also include the “Mapping Statistics” table, which highlights the “Overall alignment rate” of the reads on the reference genome.
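How the server builds the “Mapping Statistics” table is not described here. As a hedged sketch, Bowtie2 itself prints an alignment summary whose last line reports the overall alignment rate, and that figure can be extracted as shown below. The summary text is an illustrative sample for unpaired reads, and the function name `overall_alignment_rate` is an assumption made for this example.

```python
import re

# Illustrative Bowtie2 alignment summary for unpaired reads (sample values).
SUMMARY = """\
10000 reads; of these:
  10000 (100.00%) were unpaired; of these:
    596 (5.96%) aligned 0 times
    9404 (94.04%) aligned exactly 1 time
    0 (0.00%) aligned >1 times
94.04% overall alignment rate
"""

RATE_PATTERN = re.compile(r"([\d.]+)% overall alignment rate")


def overall_alignment_rate(summary_text):
    """Extract the overall alignment rate (in percent) from a summary."""
    match = RATE_PATTERN.search(summary_text)
    if match is None:
        raise ValueError("no 'overall alignment rate' line found")
    return float(match.group(1))


rate = overall_alignment_rate(SUMMARY)
print(f"Overall alignment rate: {rate:.2f}%")
```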

In JBrowse, three different tracks can be visualized:

  • Reference sequence (GRCh38NoPatch): the GRCh38 reference sequence to which the reads are aligned for variant calling
  • GRCh38NoPatch.gtf.sorted.gff: genes and transcripts in GFF format
  • Mutect.vcf: a VCF file containing meta-information lines, a header line, and data lines, each containing information about a position in the genome
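The three parts of a VCF file named above (meta-information lines, a header line, and data lines) can be told apart by their leading characters. The sketch below separates them and labels each record as an SNV or an indel by comparing allele lengths. The VCF content is a made-up minimal example, not output from Mutect.vcf, and the helper names are hypothetical; multi-allelic records and other edge cases are deliberately ignored.

```python
# Hypothetical minimal VCF content, invented for illustration.
VCF_TEXT = """##fileformat=VCFv4.2
##source=hypothetical_example
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO
chr1\t12345\t.\tA\tG\t.\tPASS\tDP=42
chr2\t67890\t.\tCT\tC\t.\tPASS\tDP=30
"""


def parse_vcf(text):
    """Split VCF text into meta lines, header columns, and data records."""
    meta, header, records = [], None, []
    for line in text.splitlines():
        if line.startswith("##"):      # meta-information line
            meta.append(line)
        elif line.startswith("#"):     # single header line (#CHROM ...)
            header = line.lstrip("#").split("\t")
        elif line.strip():             # data line: one genome position
            records.append(dict(zip(header, line.split("\t"))))
    return meta, header, records


def variant_type(record):
    """Label a single-allele record as 'SNV' or 'indel' by allele length."""
    if len(record["REF"]) == 1 and len(record["ALT"]) == 1:
        return "SNV"
    return "indel"


meta, header, records = parse_vcf(VCF_TEXT)
for rec in records:
    print(rec["CHROM"], rec["POS"], rec["REF"], rec["ALT"], variant_type(rec))
```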
