Many researchers are trying to elucidate mechanisms of disease progression by combining genotype data (genetic information), phenotype data (e.g., clinical manifestations of disease), and demographic information. In the Phenotype Discoverer (PhenDisco) project, we seek to organize the phenotype data deposited in the database of Genotypes and Phenotypes (dbGaP), a resource supported by the National Center for Biotechnology Information (NCBI) at the National Library of Medicine, National Institutes of Health (NIH). We are particularly interested in structuring and computing with phenotype variables from studies relevant to the National Heart, Lung, and Blood Institute (NHLBI).
Overview of the PhenDisco system. We structure phenotypes in dbGaP using custom-built software that retrofits existing data sets into an information model and lets users query dbGaP for studies of interest using free text. The semantic-driven Genotypes and Phenotypes (sdGaP) resource will soon be made publicly available to support further development in this area.
Biomedical research can be greatly enhanced by the analysis of genetic and clinical data. If researchers can easily access these data, the investment made to generate them, originally to answer a handful of questions, can be multiplied many times over: through re-analysis and re-use, the same data can help researchers answer thousands of new questions.