Abstract

The segmentation of time series and genomic data is a common problem in computational biology. With increasingly complex measurement procedures individual data points are often not just numbers or simple vectors in which all components are of the same kind. Analysis methods that capitalize on slopes in a single real-valued data track or that make explicit use of the vectorial nature of the data are not applicable in such scenaria. We develop here a framework for segmentation in arbitrary data domains that only requires a minimal notion of similarity. Using unsupervised clustering of (a sample of) the input yields an approximate segmentation algorithm that is efficient enough for genome-wide applications. As a showcase application we segment a time-series of transcriptome sequencing data from budding yeast, in high temporal resolution over ca. 2.5 cycles of the short-period respiratory oscillation. The algorithm is used with a similarity measure focussing on periodic expression profiles across the metabolic cycle rather than coverage per time point.

Details

Title
Similarity-Based Segmentation of Multi-Dimensional Signals
Author
Machné, Rainer 1 ; Murray, Douglas B 2 ; Stadler, Peter F 3 

 Institute for Synthetic Microbiology, Cluster of Excellence on Plant Sciences (CEPLAS), Heinrich Heine University Düsseldorf, Düsseldorf, Germany; Department of Theoretical Chemistry of the University of Vienna, Vienna, Austria 
 Institute for Advanced Biosciences, Keio University, Tsuruoka, Yamagata, Japan 
 Bioinformatics Group, Department of Computer Science, Interdisciplinary Center for Bioinformatics, German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Competence Center for Scalable Data Services and Solutions, and Leipzig Research Center for Civilization Diseases, University Leipzig, Leipzig, Germany; Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany; Fraunhofer Institute for Cell Therapy and Immunology, Leipzig, Germany; Department of Theoretical Chemistry of the University of Vienna, Vienna, Austria; Center for RNA in Technology and Health, Univ. Copenhagen, Frederiksberg C, Denmark; Santa Fe Institute, Santa Fe, NM, USA 
Pages
1-11
Publication year
2017
Publication date
Sep 2017
Publisher
Nature Publishing Group
e-ISSN
20452322
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
1957776639
Copyright
© 2017. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.