Abstract

Recent developments in high-throughput biology have enabled the systematic exploration of the relation between genomic variants and phenotypes. The immense amount of data generated by high-throughput experiments, however, poses challenges to researchers: new statistical and computational approaches are needed to use these data efficiently and draw biologically meaningful conclusions. In this thesis, we developed new methods that take advantage of high-throughput biological data to tackle three important problems: analyzing genome-wide associations between human genomic variants and phenotypes, finding co-complexed proteins from protein interaction networks, and estimating the false-positive and false-negative rates of two-hybrid protein-protein interaction screens. We also present a database designed to compile and perform preliminary analyses of systematic mutations of yeast histones.

The new gene-based association test that we developed has greater power than previous methods because it merges multiple weak associations within a gene into a stronger combined signal. Applied to ECG traits, the new approach identified two genome-wide significant loci in addition to the four identified by traditional methods. Both new findings were validated in a meta-analysis of a larger population.

Protein complexes are basic functional units of biological processes, and finding proteins that reside in the same complex can provide important information for understanding disease mechanisms. We reviewed current methods and proposed new ones for finding co-complexed proteins from 'seed' proteins using confidence-weighted protein physical interaction networks. We systematically evaluated all approaches and explored how different confidence metrics affect their performance.
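The step of merging weak per-SNP signals into one gene-level signal is not spelled out here. As a rough illustration only, the sketch below uses Fisher's method, a classic way to combine p-values, which is not necessarily the test developed in the thesis. It assumes the SNP tests are independent; SNPs in linkage disequilibrium violate that assumption, so a real gene-based test has to account for their correlation (for example by permutation or simulation). The function name `fisher_combined_p` is hypothetical.

```python
import math
from typing import Sequence


def fisher_combined_p(p_values: Sequence[float]) -> float:
    """Combine per-SNP p-values for one gene with Fisher's method.

    Illustrative sketch: assumes the SNP tests are independent,
    i.e. it ignores linkage disequilibrium between SNPs.
    """
    if len(p_values) == 0:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    # Fisher statistic X = -2 * sum(ln p_i); chi-square with 2k df under H0.
    x = -2.0 * sum(math.log(p) for p in p_values)
    # Closed-form chi-square survival function for even df = 2k:
    # P(X > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    half = x / 2.0
    term = 1.0
    total = 1.0
    for j in range(1, k):
        term *= half / j  # builds (x/2)^j / j! incrementally
        total += term
    return math.exp(-half) * total


if __name__ == "__main__":
    # Four weakly associated SNPs in one (hypothetical) gene.
    print(fisher_combined_p([0.01, 0.01, 0.01, 0.01]))
```

For example, four SNPs that each reach only p = 0.01 combine to roughly p = 1.2e-5 under the independence assumption. This shows how several weak signals can add up to a stronger one, and also why ignoring correlation between SNPs would overstate the combined evidence.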

To help improve protein physical interaction networks, we extended capture-recapture theory to estimate protein-specific false-positive and false-negative rates in yeast two-hybrid screens. Analysis of yeast, worm and fly protein-protein interaction data indicated that 25% to 45% of the reported interactions are likely false positives. The overall false-negative rate ranges from 75% for worm to 90% for fly and arises in part from a roughly 50% false-negative rate due to statistical under-sampling.
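The basic capture-recapture idea can be illustrated with the textbook two-sample Lincoln-Petersen estimator, shown here as a simplified sketch rather than the protein-specific extension described above. It treats two independent screens of the same protein set as two "captures" and assumes that every reported interaction is real (no false positives) and that every true interaction is equally likely to be detected. `ScreenOverlap`, `lincoln_petersen` and `false_negative_rate` are hypothetical names, and the counts in the usage note are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreenOverlap:
    n_a: int     # interactions reported by screen A
    n_b: int     # interactions reported by screen B
    n_both: int  # interactions reported by both screens


def lincoln_petersen(o: ScreenOverlap) -> float:
    """Estimate the total number of detectable interactions.

    Assumes independent screens, no false positives and equal
    detection probability for every true interaction.
    """
    if o.n_both <= 0:
        raise ValueError("need at least one shared interaction")
    if o.n_both > min(o.n_a, o.n_b):
        raise ValueError("overlap cannot exceed either screen's count")
    return o.n_a * o.n_b / o.n_both


def false_negative_rate(o: ScreenOverlap, screen: str = "a") -> float:
    """Fraction of the estimated total that one screen missed."""
    if screen not in ("a", "b"):
        raise ValueError("screen must be 'a' or 'b'")
    found = o.n_a if screen == "a" else o.n_b
    return 1.0 - found / lincoln_petersen(o)


if __name__ == "__main__":
    overlap = ScreenOverlap(n_a=200, n_b=150, n_both=60)
    print(lincoln_petersen(overlap), false_negative_rate(overlap, "a"))
```

With 200 interactions in screen A, 150 in screen B and 60 in both, the estimated total is 200 * 150 / 60 = 500, giving false-negative rates of 0.6 for A and 0.7 for B. Relaxing the no-false-positive assumption is exactly what makes the real problem harder than this sketch.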

Histones are the basic protein components of nucleosomes. They are among the most conserved proteins and are subject to a plethora of post-translational modifications. We designed a database of systematic histone mutations that combines histone mutant phenotypes with information about sequences, structures, post-translational modifications and evolutionary conservation. Preliminary analyses confirm that mutations at highly conserved residues and at modifiable residues are more likely to produce phenotypes.
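One standard way to test whether phenotypes are enriched at conserved residues is a one-sided Fisher exact test on a 2x2 table. The analysis actually used in the thesis is not specified here, so this is only an assumed illustration. Rows are conserved versus non-conserved residues, columns are mutants with versus without a phenotype, and `fisher_exact_greater` is a hypothetical helper built directly on the hypergeometric distribution.

```python
import math


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher exact test p-value for the table [[a, b], [c, d]].

    Rows: conserved / non-conserved residues.
    Columns: mutants with / without a phenotype.
    Returns P(X >= a), where X is the number of phenotype-producing
    mutants among conserved residues under the hypergeometric null.
    """
    if min(a, b, c, d) < 0:
        raise ValueError("counts must be non-negative")
    row1 = a + b           # conserved residues mutated
    col1 = a + c           # mutants with a phenotype
    n = a + b + c + d      # all mutated residues
    denom = math.comb(n, row1)
    upper = min(row1, col1)
    tail = sum(
        math.comb(col1, x) * math.comb(n - col1, row1 - x)
        for x in range(a, upper + 1)
    )
    return tail / denom


if __name__ == "__main__":
    # Toy counts, not data from the thesis.
    print(fisher_exact_greater(3, 0, 0, 3))
```

For the toy table [[3, 0], [0, 3]] (made-up counts), the p-value is 1/20 = 0.05. The same test applies unchanged to modifiable versus unmodifiable residues.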

Details

Title: Computational approaches to study the relation between genomic variations and phenotypes
Author: Huang, Hailiang
Year: 2012
Publisher: ProQuest Dissertations & Theses
ISBN: 978-1-267-61402-5
Source type: Dissertation or Thesis
Language of publication: English
ProQuest document ID: 1116402064
Copyright: Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.