Project-based learning is perhaps the most effective way to gain both an understanding of and practical experience in a technical subject. A growing number of public domain datasets is making it possible to conduct bioinformatic analysis of real data without access to a lab or a sequencer. This is especially true for RNA-seq. The Pine Biotech team has been working on several publication-based educational projects, in which a portion of the original raw data is extracted and reduced so that it can be analyzed by students or professionals who have never run an RNA-seq analysis. With the beta version of our T-BioInfo platform, a simple RNA-seq analysis can be run in just under an hour, allowing students to experience the power of machine learning for bioinformatics data analysis within the timespan of a single workshop.

Our goal at Pine Biotech is to make big data bioinformatic analysis easier for biologists who are not bioinformaticians. Our main means of doing this is the development of our multi-omics analysis platform, T-BioInfo. Our second, related focus is educational activities on big data analysis using the T-BioInfo platform.

Increased and improved data collection, especially high-throughput data, has driven effective and personalized diagnostics and treatment. While today’s massive and exponentially growing body of data provides an unprecedented level of biomedical detail, the generated datasets are huge, heterogeneous, full of artifacts, and very complex. Realization of the potential of these resources – whether in basic science research, translational research, biotech, or clinical practice – requires practical education in the technological skills needed to harness them. Such education must go beyond theory to provide a practical understanding of the tools, approaches, logic, expected outcomes, and applications related to interpretation of such datasets.

We recently began to ask our contacts how important it was for them personally to be able to analyze and interpret omics data. While the survey is ongoing, a wide range of people have already responded – professors and Ph.D. students as well as healthcare and biotech professionals.

While informatics literacy has increased substantially among students, researchers, and clinicians, most are not ready to use a command-line interface and/or are not inclined to invest the significant time needed to learn this skill. In addition, while solutions with a graphical user interface have started to appear, most of these require a good understanding of the inputs, outputs, and configuration of algorithms as well as the logic of constructing pipelines of algorithms. Further, working with big data requires a foundation in statistics and machine learning as well as substantial computational resources, which are expensive and require hardware expertise to assemble. All these factors hinder the usefulness of available public domain datasets and tools, limit the active adopters to those who have access to resources, and delay the adoption of big data for use in biomedical practice.

This is especially apparent farther away from established clusters of advanced universities and high-tech centers, located disproportionately in the US Northeast and California. Such institutions are supported by large grants and higher incomes, and therefore are already well-positioned in terms of both economic and human resources. In contrast, more isolated academic communities often struggle with a lack of the experience, skills, and resources needed to conduct technologically advanced research. This is also true of many countries outside the US.

To address this gap, we decided to develop a set of practical, modular courses in omics data analysis based on public domain projects and our user-friendly, web-based bioinformatics analysis platform. The goal is to use our GUI-based platform to skip the complexities of coding and in-depth theory; instead, students are given a basic overview of the essential biological and informatics concepts and then jump right into practice. Toward that end, we have begun development of a series of hands-on workshops and online courses, with all of the data accessible online.

