CORRESPONDENCE
A cross-platform toolkit for mass spectrometry and proteomics
npg 2012 Nature America, Inc. All rights reserved.
To the Editor: Mass spectrometry-based proteomics has become an important component of biological research. Numerous proteomics methods have been developed to identify and quantify the proteins in biological and clinical samples1, identify pathways affected by endogenous and exogenous perturbations2 and characterize protein complexes3. Despite these successes, the interpretation of vast proteomics datasets remains a challenge. There have been several calls for improvements and standardization of proteomics data analysis frameworks, as well as for an application-programming interface for proteomics data access4,5. In response, we present here the ProteoWizard Toolkit, a robust set of open-source software libraries and applications designed to facilitate proteomics research. The libraries implement the first-ever noncommercial, unified data access interface for proteomics, bridging field-standard open formats and all common vendor formats. In addition, diverse software classes enable rapid development of vendor-agnostic proteomics software. ProteoWizard projects and applications, building upon the core libraries, are also becoming standard tools for enabling significant proteomics inquiries.
Historically, the development of proteomics software tools has been hindered by three factors: first, developers must write readers and writers for the numerous file formats used for holding mass spectrometry data and analysis results, which range from vendor-specific mass spectrometry data formats to software application-specific formats; second, developers must implement numerous common but critical algorithms (e.g., protein digestion, mass computation, peak integration, charge-state detection and isotope deconvolution), which is both time-consuming and error-prone; and third, comparison and validation of analysis algorithms is complicated by the vast diversity of possible workflows. Together, these three impediments create a significant bottleneck in the development of new proteomics software applications. Beyond slowing the pace of proteomics software development, these impediments have also hampered the field of proteomics by interfering with the meaningful comparison, sharing and exchange of data analyses obtained on different platforms or by different laboratories.
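As an illustration of the second factor, the sketch below shows two of the common algorithms named above, protein digestion and mass computation, in minimal form. It is not ProteoWizard code and does not use the ProteoWizard interface; the function names (`tryptic_peptides`, `monoisotopic_mass`, `mz`) are hypothetical, and the digestion rule assumed here is the usual simplification for trypsin (cleave after K or R unless the next residue is P). The residue masses are standard monoisotopic values.

```python
# Illustrative sketch only: hypothetical helper names, not the ProteoWizard API.

# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # mass of H2O added for the peptide termini
PROTON = 1.007276   # mass of a proton, used for m/z


def tryptic_peptides(sequence, missed_cleavages=0):
    """Digest a protein sequence in silico.

    Assumed rule: cleave after K or R, except when followed by P.
    Peptides spanning up to `missed_cleavages` uncut sites are included.
    """
    if not sequence:
        return []
    # Collect cleavage boundaries, including both ends of the sequence.
    sites = [0]
    for i, residue in enumerate(sequence[:-1]):
        if residue in "KR" and sequence[i + 1] != "P":
            sites.append(i + 1)
    sites.append(len(sequence))

    peptides = []
    for start in range(len(sites) - 1):
        last = min(start + missed_cleavages + 2, len(sites))
        for end in range(start + 1, last):
            peptides.append(sequence[sites[start]:sites[end]])
    return peptides


def monoisotopic_mass(peptide):
    """Neutral monoisotopic mass: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[residue] for residue in peptide) + WATER


def mz(peptide, charge):
    """Mass-to-charge ratio of the peptide carrying `charge` protons."""
    return (monoisotopic_mass(peptide) + charge * PROTON) / charge


if __name__ == "__main__":
    for peptide in tryptic_peptides("MKRPAAK", missed_cleavages=1):
        print(peptide, round(monoisotopic_mass(peptide), 4), round(mz(peptide, 2), 4))
```

Even this small example has several places where independent implementations can diverge (the proline rule, missed cleavages, the choice of mass table), which is why reimplementing such routines in every new tool is error-prone.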
Efforts to mitigate these issues led initially to the development of several open interchange formats6,7 and a series of software tools that extracted data from vendor formats into open formats. The majority of mass spectrometry vendors also now provide approaches to export their data to open formats. Although this is an important step forward, both the academic...