One challenge in the supervised interpretation of -omics data is the limited coverage of existing knowledge databases. An association is typically established through an “enrichment” score that measures how strongly a set of genes overlaps the members of a known pathway. While this technique is among the most widely used for interpreting gene expression studies, it can only recover associations that are already annotated, which limits the discovery of new ones.
To highlight the uses and benefits of an unsupervised approach, we present a case study in which a public data set was analyzed by clustering genes that are differentially expressed in Alzheimer patients.
Case Study: Unsupervised method for pathway analysis in Alzheimer patients from Jaclyn Williams
The authors of the publication “Genetic Control of Human Brain Transcript Expression in Alzheimer Disease” studied more than 300 individuals to determine which genes were differentially expressed in 176 Alzheimer patients vs. 188 controls. They hypothesized that genetic variation driving gene expression was integral to pathogenic aging and the subsequent development of disease (1).
A novel unsupervised method developed at the Tauber Bioinformatics Research Center (2) was used to explore gene co-expression and select the network modules (clusters of co-expressed genes) that differentiated Alzheimer’s disease (AD) patients from controls. All expression values were clustered, and “good” clusters were selected as those with high t-test values between the two groups. One cluster whose expression profile differed significantly between the groups was cluster 40. The functions of its genes were then analyzed using existing knowledge bases.
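The cluster-then-select idea described above can be illustrated with a minimal sketch. This is not the Tauber method, whose details are not given here. It assumes a simple greedy correlation clustering as a stand-in, a Welch t-statistic on each cluster's mean profile, and a small synthetic data set. All names (`cluster_genes`, `score_clusters`, `select_clusters`, the thresholds, and the gene labels) are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def cluster_genes(expr, threshold=0.8):
    """Greedy stand-in for co-expression clustering (an assumption, not the
    published method): a gene joins the first cluster whose seed gene it
    correlates with at or above `threshold`, else it seeds a new cluster."""
    clusters = []
    for gene, profile in expr.items():
        for members in clusters:
            if pearson(profile, expr[members[0]]) >= threshold:
                members.append(gene)
                break
        else:
            clusters.append([gene])
    return clusters

def welch_t(a, b):
    """Welch's t-statistic for two independent samples."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((v - ma) ** 2 for v in a) / (len(a) - 1)
    vb = sum((v - mb) ** 2 for v in b) / (len(b) - 1)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b))

def score_clusters(expr, clusters, is_case):
    """t-statistic of each cluster's per-sample mean expression, cases vs. controls."""
    scores = []
    for members in clusters:
        profile = [sum(expr[g][i] for g in members) / len(members)
                   for i in range(len(is_case))]
        cases = [v for v, c in zip(profile, is_case) if c]
        controls = [v for v, c in zip(profile, is_case) if not c]
        scores.append(welch_t(cases, controls))
    return scores

def select_clusters(clusters, scores, cutoff=5.0):
    """Keep clusters whose absolute t-statistic reaches the (hypothetical) cutoff."""
    return [m for m, t in zip(clusters, scores) if abs(t) >= cutoff]

# Synthetic data: four samples from cases, then four from controls.
is_case = [True, True, True, True, False, False, False, False]
expr = {
    "g1": [5.0, 5.2, 4.9, 5.1, 3.0, 3.1, 2.9, 3.2],  # differs between groups
    "g2": [6.1, 6.3, 6.0, 6.2, 4.0, 4.2, 3.9, 4.1],  # co-expressed with g1
    "g3": [1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 1.0, 2.0],  # no group difference
    "g4": [1.1, 2.1, 0.9, 2.0, 1.0, 1.9, 1.1, 2.0],  # co-expressed with g3
}

clusters = cluster_genes(expr)
scores = score_clusters(expr, clusters, is_case)
selected = select_clusters(clusters, scores)
print(selected)
```

In this toy data set only the cluster containing g1 and g2 separates the two groups, so it is the only one kept. A real analysis would work with thousands of genes, a more robust clustering method, and multiple-testing correction.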
Using DAVID and KEGG, 93 of the 138 genes identified in cluster 40 were mapped and annotated. DAVID functional annotation identified a gene group, zinc finger proteins and Krueppel-associated box (KRAB), with a significant enrichment score. Interestingly, KRAB-zinc finger proteins have been associated with cognitive impairment and the onset of Alzheimer’s (3,4).
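To give a sense of what an enrichment score measures, the sketch below computes a plain one-sided hypergeometric p-value for the overlap between a gene list and an annotation category. DAVID reports its own EASE-based scores, so this is a simplified stand-in rather than DAVID's calculation. Apart from the 93 mapped genes, every count in the example (hits, category size, background size) is hypothetical, as is the function name `hypergeom_enrichment_p`.

```python
from math import comb

def hypergeom_enrichment_p(hits, list_size, category_size, background_size):
    """Probability of seeing at least `hits` category genes when `list_size`
    genes are drawn at random from a background of `background_size` genes,
    of which `category_size` carry the annotation."""
    total = comb(background_size, list_size)
    upper = min(list_size, category_size)
    tail = sum(
        comb(category_size, k) * comb(background_size - category_size, list_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total

# Hypothetical counts for illustration only; 93 is the number of mapped genes
# reported above, the other three values are assumptions.
p_value = hypergeom_enrichment_p(
    hits=12,                # assumed: list genes annotated to the category
    list_size=93,           # mapped genes from cluster 40
    category_size=400,      # assumed: annotated genes in the background
    background_size=20000,  # assumed: size of the background gene set
)
print(f"p = {p_value:.3g}")
```

A small p-value indicates that the category appears in the list more often than random sampling from the background would explain.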
This demonstrates that it’s possible to link heterogeneous sets of observations without a prior hypothesis. The same mathematical model can be applied wherever gene expression data are available together with additional attributes. In this case, unsupervised analysis identified genes that have been implicated in the onset of Alzheimer’s disease. Combining supervised and unsupervised methods can improve our understanding of the complex regulatory networks associated with various conditions.