For the past year or so, Big Data Analysis Tbio has been under development and testing. We have been taking on showcase projects, applying our bioinformatics tools to public-domain and collaborative projects. These are our conclusions and updates:
- RNA-seq Challenges:
– Application of the BS segmentation algorithm for unsupervised, accurate identification of expressed genomic regions, that is, regions defined without relying on GTF annotation (the method is also useful for poorly annotated genomes).
– Clustering-based analysis of the transcriptomic and genomic repeatome: detection of new repeats/transposons and their expression levels when DNA methylation is disrupted, as in cancer, and estimation of repeat abundance in raw genomic data.
– An approach to differentiating expressed genomic regions (coding and non-coding) using Factor Regression Analysis.
- Epigenetics pipelines: accurate detection of ChIP-Seq enriched fragments, newly developed algorithms for bisulfite DNA methylation data, and analysis of the interplay of these regions in epigenetic regulation of the genome.
- Virology: methods for high-precision detection of mutations, used for characterizing mutation fitness and detecting viral quasispecies (haplotypes).
- Small Molecules: screening against libraries of small molecules, identification of pharmacophores.
- Integration of heterogeneous omics datasets: different types of omics data describing the same biological process (the same set of biological samples). Networks of mutual regulation among omics players. Clustering algorithms and co-expression network modules associated with biological conditions.
Our collaborators have been providing great feedback so far, and we hope to continue taking on new, challenging projects soon! Please contact us for more information on the bioinformatics and interpretation tools we can offer.