Case Study: Unsupervised methods for pathway analysis in Alzheimer patients

One challenge with supervised interpretation of -omics data is that it is limited by the content of existing knowledge databases. An association is typically established by scoring how strongly a set of genes is “enriched” for members of a known pathway. Although this is among the most widely used techniques for interpreting gene expression studies, it cannot reveal associations that are absent from those databases.

To highlight the uses and benefits of an unsupervised approach, we present a case study where a public data set was analyzed using clustering of differentially expressed genes in Alzheimer patients.


The authors of “Genetic Control of Human Brain Transcript Expression in Alzheimer Disease” surveyed over 300 participants to identify genes differentially expressed between 176 Alzheimer’s patients and 188 controls. They hypothesized that genetic variation driving gene expression is integral to pathogenic aging and the subsequent development of disease (1).

A novel unsupervised method developed at the Tauber Bioinformatics Research Center (2) was used to explore gene co-expression and to select the network modules (clusters of co-expressed genes) that differentiated Alzheimer’s disease (AD) patients from controls. All expression values were clustered, and “good” clusters were selected on the basis of high t-test statistics. One cluster whose expression profile differed significantly between the two groups was cluster 40. The functions of its genes were then analyzed using existing knowledge bases.
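The cluster-selection step described above can be illustrated with a minimal Python sketch. It is not the Tauber method (2): it substitutes a simple greedy correlation clustering as a stand-in, uses only the standard library, and runs on synthetic data. The function names, the 0.8 correlation threshold, and the gene labels are all assumptions made for illustration.

```python
import random
from statistics import mean, stdev, variance


def zscore(values):
    """Standardize one gene's expression profile across samples."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def pearson(za, zb):
    """Pearson correlation of two profiles that are already z-scored."""
    return sum(x * y for x, y in zip(za, zb)) / (len(za) - 1)


def greedy_clusters(z, min_corr):
    """Assign each gene to the first cluster whose seed gene it correlates with."""
    clusters = []  # each cluster is a list of gene names; element 0 is the seed
    for gene, profile in z.items():
        for members in clusters:
            if pearson(profile, z[members[0]]) >= min_corr:
                members.append(gene)
                break
        else:
            clusters.append([gene])  # no match: the gene seeds a new cluster
    return clusters


def welch_t(a, b):
    """Welch's t statistic for two independent groups."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return (mean(a) - mean(b)) / se


def rank_clusters(expr, is_case, min_corr=0.8):
    """Return (|t|, member genes) per cluster, strongest case/control split first."""
    z = {gene: zscore(profile) for gene, profile in expr.items()}
    ranked = []
    for members in greedy_clusters(z, min_corr):
        # Summarize the cluster as the mean z-score of its genes in each sample.
        summary = [mean(z[g][i] for g in members) for i in range(len(is_case))]
        cases = [v for v, c in zip(summary, is_case) if c]
        controls = [v for v, c in zip(summary, is_case) if not c]
        ranked.append((abs(welch_t(cases, controls)), members))
    return sorted(ranked, key=lambda item: item[0], reverse=True)


# Synthetic data (hypothetical): 10 cases followed by 10 controls.
random.seed(0)
is_case = [True] * 10 + [False] * 10
expr = {}
for i in range(1, 6):  # M1..M5: a co-expressed module shifted upward in cases
    expr[f"M{i}"] = [(2.0 if c else 0.0) + random.gauss(0, 0.2) for c in is_case]
for i in range(1, 6):  # N1..N5: independent noise genes
    expr[f"N{i}"] = [random.gauss(0, 1) for _ in is_case]

ranked = rank_clusters(expr, is_case)
top_t, top_genes = ranked[0]
print(f"top cluster: {top_genes}, |t| = {top_t:.1f}")
```

In this toy setting the top-ranked cluster plays the role that cluster 40 plays in the case study: a group of co-expressed genes whose summary profile separates patients from controls. With real data one would likely use an established clustering library and correct the t-test results for multiple comparisons.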

Using DAVID and KEGG, 93 of the 138 genes in cluster 40 were mapped and annotated. DAVID functional annotation identified a gene group, zinc finger proteins and Krueppel-associated box (KRAB), with a significant enrichment score. Interestingly, KRAB zinc finger proteins have been associated with cognitive impairment and the onset of Alzheimer’s disease (3,4).
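To make the idea of an enrichment score concrete, the sketch below computes a generic hypergeometric over-representation p-value in Python. This is not DAVID's own scoring, only the textbook test that such tools build on. The list size of 93 comes from the case study; the background size, term size, and hit count are hypothetical numbers chosen for illustration, as is the function name.

```python
from math import comb


def enrichment_p_value(population_size, term_size, list_size, list_hits):
    """P(X >= list_hits) when list_size genes are drawn without replacement
    from population_size genes, of which term_size carry the annotation."""
    max_hits = min(term_size, list_size)
    tail = sum(
        comb(term_size, k) * comb(population_size - term_size, list_size - k)
        for k in range(list_hits, max_hits + 1)
    )
    return tail / comb(population_size, list_size)


# 93 mapped genes is taken from the case study; the rest are hypothetical.
population_size = 20000  # assumed number of annotated background genes
term_size = 700          # assumed number of background genes with the term
list_size = 93           # mapped genes from cluster 40
list_hits = 15           # assumed number of cluster genes with the term

p_value = enrichment_p_value(population_size, term_size, list_size, list_hits)
fold = (list_hits / list_size) / (term_size / population_size)
print(f"fold enrichment = {fold:.1f}, p = {p_value:.2e}")
```

A small p-value here means the annotation appears in the gene list more often than random sampling from the background would explain. Because many annotation terms are tested at once, such p-values are normally adjusted for multiple testing before being interpreted.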

This demonstrates that it is possible to link heterogeneous sets of observations without a prior hypothesis. The same approach can be applied wherever gene expression data are available alongside additional sample attributes. In this case, unsupervised analysis identified genes that have been implicated in the onset of Alzheimer’s disease. Combining supervised and unsupervised methods can improve our understanding of the complex regulatory networks associated with various conditions.

  1. Webster JA, Gibbs JR, Clarke J, Ray M, Zhang W, Holmans P, … Myers AJ. Genetic Control of Human Brain Transcript Expression in Alzheimer Disease. American Journal of Human Genetics. 2009;84(4):445-458.
  2. Brodsky LI, Wahed AS, Li J, Tavis JE, Tsukahara T, et al. A Novel Unsupervised Method to Identify Genes Important in the Anti-viral Response: Application to Interferon/Ribavirin in Hepatitis C Patients. PLoS ONE. 2007;2(7):e584.
  3. Shulman JM, Chibnik LB, Aubin C, Schneider JA, Bennett DA, De Jager PL. Intermediate Phenotypes Identify Divergent Pathways to Alzheimer’s Disease. Domschke K, ed. PLoS ONE. 2010;5(6):e11244.
  4. Gower-Winter SD, Levenson CW. Zinc in the central nervous system: From molecules to behavior. BioFactors (Oxford, England). 2012;38(3):186-193.