Exponential growth of data in biomedicine needs a skilled and experienced workforce

A wealth of public-domain biomedical data is available to anyone online. However, data analysis technology can only be used by those trained to work with it, leaving researchers who lack the proper skills and tools drowning in an overwhelming amount of information. It might sound intimidating, but in reality, this is GOOD NEWS!

Data science is a fast-growing branch of the technology landscape, and with IT and analytic skills highly sought after by employers, data scientists pull in hefty salaries for their expertise. There is an urgent need for quick, cost-effective, and accurate data analysis in biomedical research as well. This need is driven by the continually expanding wealth of patient data, coupled with the need to synthesize the data and understand its importance in medicine. One of the most exciting areas, where new discoveries are being made all the time, is molecular data. The datasets there are huge, but every new project brings us closer to understanding complex diseases, evolution, and human longevity.

Today, biomedical data is typically analyzed by a trained bioinformatician, a tech-era mash-up of “biologist” and “statistician” who breaks down and interprets large sets of medical data in a clinical or research setting. But for the most part, bioinformaticians have to go through rigorous training in Linux, Python, R, databases, visualization, and an endless list of new algorithms that are complex and require deep understanding to use. Just getting to a meaningful project might take years, depending on how well versed you are in all of these technologies. All of these skills essentially turn you into a computer scientist first, and then place you under a biologist’s or clinician’s authority to guide the “real research”.

But more and more biologists, clinicians, and even patients want to play a role in these discoveries. At one time, you had to know Fortran to use a computer; that is, until the Mac and Windows came out with visual interfaces that made the computer a household item. Then kids could play with the infinite scripts (games) that leveraged this complex technology. Well, that’s exactly what we would like to do: let’s make bioinformatics easy, visual, and intuitive. We think we can do it together.

To start, we’ve put together a number of projects that one can analyze using our visual bioinformatics platform. Here are a few examples:

“PDX Models: Tumor-stroma interaction” is inspired by the publication “Whole transcriptome profiling of patient-derived xenograft models as a tool to identify both tumor and stromal specific biomarkers” by Dr. James Bradford. The project’s approach focused on comparing several different breast cancer types, including 79 PDX mouse models with human primary tumors, using RNA-seq and machine learning methods. PDX models maintain more similarities to the parental tumors than a traditional cell line does. Subsequently, alternative analysis of the experimental data provided deeper insight into the problem and identified new, biologically meaningful group-wise associations between tumor and stroma genes.

“Cell Lines: multi-omics network of associations” is based on the publication “Expression Profiling of Macrophages Reveals Multiple Populations with Distinct Biological Roles in an Immunocompetent Orthotopic Model of Lung Cancer”. The approach focused on the molecular features associated with the responses of a collection of 70 breast cancer cell lines to 90 experimental or approved therapeutic agents. This project included RNA-seq, mutations, and IC50 drug values. During analysis, biassociation was applied to the expression and mutation variant results, as well as the drug response values (GI50), to find relationships between the datasets.

Explore these datasets online and learn more about our bioinformatics platform, T-BioInfo, at http://edu.t-bio.info/projects/