About the Authors:
Hung-Chia Chen
Affiliations: Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, Arkansas, United States of America; Graduate Institute of Biostatistics and Biostatistics Center, China Medical University, Taichung, Taiwan
Wen Zou
Affiliation: Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, Arkansas, United States of America
Tzu-Pin Lu
Affiliations: Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, Arkansas, United States of America; Department of Public Health, Graduate Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei, Taiwan
James J. Chen
* E-mail: [email protected]
Affiliations: Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, Arkansas, United States of America; Graduate Institute of Biostatistics and Biostatistics Center, China Medical University, Taichung, Taiwan
Introduction
Recent advances in biotechnology have generated great interest in developing statistical methods and data mining techniques for analyzing massive amounts of biological and medical data. These analyses aim to understand biological processes, discover new species, and identify new biomarkers for safety assessment, disease diagnosis and prognosis, and prediction of treatment response. For example, metagenomics uses DNA sequence data to detect and identify representative species in environmental and clinically relevant samples and to discover genes or organisms with novel or useful functional properties [1]–[4].
In clinical treatment, patients are heterogeneous because of differences in genetic predisposition, lifestyle, and disease characteristics. Personalized medicine uses genomic predictors developed for a target patient population to assign patients to more effective therapies while ensuring safety and avoiding adverse events or unnecessary treatment [5], [6]. A main goal is to develop a procedure that classifies patients into subgroups representing different disease characteristics or different responses to a specific treatment. For example, acute lymphoblastic leukemia (ALL) is a heterogeneous disease comprising several subtypes (T-ALL, E2A-PBX1, BCR-ABL, TEL-AML1, MLL) that differ in their response to chemotherapy [7]–[9]. Identifying these leukemia subtypes so that patients can be accurately assigned to specific risk/treatment groups is a difficult and expensive process that requires the combined expertise of a hematologist/oncologist, a pathologist, and a cytogeneticist [9].
In food safety surveillance, serotyping of pathogen strains is usually the first important step in identifying and characterizing Salmonella isolates during outbreak investigations. However, standard...