Introduction
The quantity of biological information continues to increase each year, vastly exceeding the ability of scientists to assimilate it into their knowledge stores. Consequently, there is a disconnection between the information available and the information that is actually utilized. To remedy this situation, databases have been developed to curate biological information, for example, STRING: functional protein association networks (
Specifically, we describe a new database containing a comprehensive list of hypertonicity‐induced effects on proteins or mRNA, the information for which has been culled from the biological literature. The data are presented in the form of grammatical triplets consisting of <subject><verb phrase><object>. Because the database is about hypertonicity effects, all the subject terms are “hypertonicity.” The verb phrases are actions such as “increases phosphorylation of” or “decreases abundance of.” The objects are given as official gene symbols corresponding to the affected proteins. This triplet structure is used to allow the database to be read either directly by humans or by computational parsing programs. The latter is necessary to facilitate computational network building. An example of a similar database on a different topic, namely vasopressin actions in the kidney, is offered by Sanghi et al. ().
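The triplet structure described above lends itself to simple computational parsing. The following is a minimal sketch in Python, not the parsing method used for the database: the list of verb phrases is a small hypothetical subset, and the names `Triplet` and `parse_entry` are introduced here only for illustration.

```python
from typing import NamedTuple, Optional

# Hypothetical subset of verb phrases; the database defines its own list.
VERB_PHRASES = (
    "increases phosphorylation of",
    "decreases phosphorylation of",
    "increases abundance of",
    "decreases abundance of",
)


class Triplet(NamedTuple):
    subject: str
    verb_phrase: str
    obj: str   # official gene symbol
    rest: str  # trailing qualifiers such as site or cell type, if any


def parse_entry(sentence: str) -> Optional[Triplet]:
    """Split one entry into <subject><verb phrase><object> plus any remainder."""
    text = sentence.strip().rstrip(".")
    for verb in VERB_PHRASES:
        marker = f" {verb} "
        if marker in text:
            subject, remainder = text.split(marker, 1)
            parts = remainder.split(" ", 1)
            rest = parts[1] if len(parts) > 1 else ""
            return Triplet(subject.lower(), verb, parts[0], rest)
    return None  # no known verb phrase found


entry = parse_entry(
    "Hypertonicity increases phosphorylation of MAP3K3 at S526 in HEK 293"
)
```

Because the object is always a single gene symbol, the first token after the verb phrase can be taken as the object, and everything that follows is kept as free-text qualifiers.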
Methods
For our database, we used multiple sources. One source was the result of direct literature searching accomplished as part of the normal day‐to‐day research of the final two authors on this paper over the past two decades. The specific search terms were: “taurine transport” OR “sorbitol” OR “betaine” OR “glycerophosphocholine” OR “inositol” OR “hypertonicity” OR “hyperosmolality” OR “osmoregulation” OR “NTE” OR “TonEBP” OR “OREBP” OR “NFAT5” OR “gde2” OR “gdpd5” OR “osmostress” OR “osmotic stress” OR “smit.” In addition, we performed extensive PubMed searches prior to finalizing the database to ensure that all relevant experiments in which “hypertonicity” was the independent variable were included. Search terms were: “osmotic,” “hyper,” “hypertonicity,” “osmostress,” “osmotic stress,” and “osmoregulation.” In each of our database entries the <subject> is “hypertonicity.” We included hypertonic effects due to elevated extracellular inorganic ions such as NaCl as well as organic solutes such as sorbitol and raffinose. Effects of urea are not included because most of its effects are due to its denaturing ability rather than effects on tonicity (Yancey et al. ). The list of database entries was supplemented by data from two proteomic data sets. The first was from a study of changes in protein abundance in the nucleus of HEK293 cells in response to hypertonicity (Li et al. ). The other was from a study of changes in phosphorylation of proteins in HEK293 cells in response to hypertonicity (Wang et al. ). To curate the data, we initially organized the details about each study in an electronic spreadsheet (Excel, Microsoft). We constructed each database entry to conform to English grammar syntax and to read from left to right as <subject><verb phrase><object><prepositional phrase><parenthetical information>. Next, we used the spreadsheet to generate a Hyper Text Markup Language (HTML) file, as previously described in the Appendix of Sanghi et al. ().
This was placed on a publicly accessible web server at
The database was analyzed in part by mapping the gene symbol list (<objects>) to either Gene Ontology descriptors or Protein Domains using the program Automated Bioinformatics Extractor (ABE,
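The spreadsheet-to-HTML step described in this section can be illustrated with a short Python sketch. It is not the procedure from the Appendix of Sanghi et al.; it assumes the spreadsheet has been exported as CSV, and the column names and the functions `row_to_sentence` and `rows_to_html` are hypothetical.

```python
import csv
import html
import io

# Hypothetical column names mirroring
# <subject><verb phrase><object><prepositional phrase><parenthetical information>.
SENTENCE_FIELDS = ["subject", "verb_phrase", "object", "prepositional_phrase"]


def row_to_sentence(row: dict) -> str:
    """Join the fields of one row, left to right, into a readable sentence."""
    parts = [(row.get(name) or "").strip() for name in SENTENCE_FIELDS]
    sentence = " ".join(part for part in parts if part)
    note = (row.get("parenthetical") or "").strip()
    if note:
        sentence += f" ({note})"
    return sentence


def rows_to_html(rows) -> str:
    """Render the entries as an HTML list, escaping special characters."""
    items = "\n".join(
        f"  <li>{html.escape(row_to_sentence(row))}</li>" for row in rows
    )
    return f"<ul>\n{items}\n</ul>"


# Stand-in for a CSV export of the curation spreadsheet.
sample_csv = (
    "subject,verb_phrase,object,prepositional_phrase,parenthetical\n"
    "Hypertonicity,increases phosphorylation of,MAP3K3,at S526 in HEK 293,\n"
)
rows = list(csv.DictReader(io.StringIO(sample_csv)))
page = rows_to_html(rows)
```

Keeping each field in its own column until the final rendering step means the same spreadsheet can feed both the human-readable page and any downstream parser.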
Results
The Database of Osmoregulated Proteins in Mammalian Cells can be accessed at (
Figure describes some characteristics of the Database of Osmoregulated Proteins in Mammalian Cells. Figure A shows the frequency of each verb phrase. The most frequent entry is “increases phosphorylation of” with 426 entries. In Figure B, we show the frequency distribution of the experimental systems used. Figure C presents the frequency distribution of the gene symbols. NFAT5 (TonEBP/OREBP) was the most frequently referenced gene symbol by a wide margin. NFAT5 is an osmotically regulated transcription factor (Burg et al. ). The next most frequent were AKR1B1 (aldose reductase), MAPK14 (p38 kinase), and AQP2 (aquaporin‐2). (Note: The frequency of individual terms is given as a description of the database and does not necessarily equate with “importance”).
Characteristics of the Database of Osmoregulated Proteins in Mammalian Cells. (A) A pie chart showing the frequency of verb phrases or effects on target proteins such as changes in phosphorylation or abundance as found in the database. (B) A pie chart showing frequency of experimental system, often cell type, used in the studies cited in the database. (C) A bar graph of the most frequent target proteins, shown as gene symbols, found in the database.
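Frequency summaries of this kind amount to counting the values in one field of each entry. A minimal Python sketch follows; the records are placeholders rather than database content, and the function name `frequency` is an assumption made for illustration.

```python
from collections import Counter

# Placeholder records, NOT taken from the database:
# (verb phrase, gene symbol, experimental system)
entries = [
    ("increases phosphorylation of", "GENE_A", "cell line X"),
    ("increases phosphorylation of", "GENE_B", "cell line X"),
    ("decreases abundance of", "GENE_A", "cell line Y"),
    ("increases phosphorylation of", "GENE_A", "cell line Y"),
]


def frequency(records, column):
    """Return (value, count) pairs for one column, most frequent first."""
    return Counter(record[column] for record in records).most_common()


verb_counts = frequency(entries, 0)    # basis for a verb-phrase chart
gene_counts = frequency(entries, 1)    # basis for a gene-symbol chart
system_counts = frequency(entries, 2)  # basis for an experimental-system chart
```

As noted in the text, such counts describe the contents of the database and should not be read as a measure of biological importance.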
Figure shows the most frequent protein domains found in proteins on the database. These data were extracted using the program Automated Bioinformatics Extractor (ABE,
A bar graph of the most frequent protein domains found in target proteins on the database using Automated Bioinformatics Extractor (ABE, http://helixweb.nih.gov/ESBL/ABE/).
Figure shows the most frequent Gene Ontology Molecular Function terms found in proteins on the database. These data also were extracted using the program ABE. The most common terms were “ATP binding,” “DNA binding,” “protein kinase binding,” and “protein kinase activity.”
A bar graph of the most common molecular functions found for target proteins following Gene Ontology analysis using Automated Bioinformatics Extractor (ABE, http://helixweb.nih.gov/ESBL/ABE/).
Each data entry was classified by the type of study: transcriptomics, proteomics, and reductionist. Figure is a Venn diagram showing the number of proteins in each category. Only three proteins were found to be present in all three categories. These proteins were AKR1B1 (aldose reductase), HSPA1A (heat shock 70 kDa protein 1A), and ATP1A1 (ATPase, Na+/K+ transporting, alpha 1 polypeptide). Aldose reductase is an enzyme that converts cell glucose to the organic osmolyte sorbitol (Flynn ). Heat shock 70 kDa is an abundant cytosolic chaperone, buffering the cell from protein folding abnormalities in response to tonicity changes (Borkan and Gullans ). Na+/K+ ATPase is an ATP‐dependent transporter that moves sodium out of the cell and potassium into the cell, energizing cell volume regulation (Orlov et al. ).
A Venn diagram showing the overlap in proteins among the three types of studies (transcriptomics, proteomics, and reductionist) that were incorporated into the database.
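The overlap among study types can be computed with ordinary set operations once each entry carries a study-type label. The Python sketch below uses placeholder gene symbols, not database content, and the function names are hypothetical.

```python
from collections import defaultdict

# Placeholder (gene symbol, study type) pairs, NOT taken from the database.
records = [
    ("GENE_A", "transcriptomics"),
    ("GENE_A", "proteomics"),
    ("GENE_A", "reductionist"),
    ("GENE_B", "proteomics"),
    ("GENE_B", "reductionist"),
    ("GENE_C", "transcriptomics"),
]


def group_by_study_type(pairs):
    """Collect the set of gene symbols reported under each study type."""
    groups = defaultdict(set)
    for gene_symbol, study_type in pairs:
        groups[study_type].add(gene_symbol)
    return groups


def venn_counts(t, p, r):
    """Count gene symbols in each of the seven regions of a three-set Venn diagram."""
    return {
        "transcriptomics only": len(t - p - r),
        "proteomics only": len(p - t - r),
        "reductionist only": len(r - t - p),
        "transcriptomics and proteomics": len((t & p) - r),
        "transcriptomics and reductionist": len((t & r) - p),
        "proteomics and reductionist": len((p & r) - t),
        "all three": len(t & p & r),
    }


groups = group_by_study_type(records)
counts = venn_counts(
    groups["transcriptomics"], groups["proteomics"], groups["reductionist"]
)
in_all_three = (
    groups["transcriptomics"] & groups["proteomics"] & groups["reductionist"]
)
```

The seven region counts always sum to the number of distinct gene symbols, which offers a quick consistency check on the classification.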
Discussion
Extraordinarily high extracellular osmolality is essential for the normal functioning of the renal medulla. Without elevated NaCl and urea concentrations, mammals could not concentrate urine and water balance would be compromised (Pannabecker ). Important questions previously addressed the effects of tonicity on the cells that comprise the renal medulla as well as the cells that pass through the medulla itself. Notably, high NaCl has major perturbing effects on cell function including cell death or apoptosis. However, cells also exhibit multiple mechanisms to adapt to hypertonicity. Overall, the effect of tonicity on cell function has been an active area of study for many years and a very large volume of literature has accumulated (Burg et al. ). Even in tissues not commonly considered to endure variation in tonicity, a modest hypertonic milieu may naturally occur. Such is the case for nucleus pulposus cells of the intervertebral disc (Tsai et al. ) and developing thymocytes (Trama et al. ).
Perhaps, in earlier days, an individual could keep track of and store all of the published data in their field. Then, it may have been possible to integrate all of the knowledge and form synthetic approaches to further investigations. More recently, with the advent of “omics‐” based approaches, the available data have proliferated hugely and we now have extraordinarily enhanced access to these data. At this point, the same task of gathering, assimilating, and synthesizing is beyond the capacity of the human brain. As an aid to investigators, we have developed the “Database of Osmoregulated Proteins in Mammalian Cells.” The database will be maintained online where it can be easily accessed, sorted, and viewed.
The “Database of Osmoregulated Proteins in Mammalian Cells” is a compilation of reported effects of hypertonicity on proteins and mRNAs. The format of the database is readable by humans directly or by computers (Evans and Rzhetsky ; Rebholz‐Schuhmann et al. ) using standard sentence‐parsing algorithms, for example, the Stanford Parser (
In each of our database entries the <subject> is “hypertonicity.” The database includes approximately 1600 entries. Since tonicity can be bidirectional, the database also might be used to query what might occur if tonicity were decreased, in essence, changing the subject to hypotonicity. For example, finding an entry that states that “Hypertonicity increases phosphorylation of MAP3K3 at S526 in HEK 293” could lead to the hypothesis that hypotonicity decreases phosphorylation at that serine. A change in phosphorylation can be associated with a change in activity. For example, in a study examining the effect of an siRNA library against all known phosphatases on NFAT5 activity, there were 57 siRNAs that changed NFAT5 transcriptional activity (Zhou et al. ). At high NaCl, 31 increased activity, indicating that the phosphatase was inhibitory and implying that phosphorylation, presumably by a kinase, was stimulatory. In comparison, 16 decreased activity. The phosphorylation could be directly on NFAT5 or on a signaling element in its activation pathway. The database can be queried for proteins that have a desired attribute with respect to change in tonicity, as per the following scenario. An investigator identifies a consensus phosphorylation site in a protein of interest, mutates it so that it cannot be phosphorylated, and finds that the mutation increases activity of the protein with exposure to elevated NaCl. Consensus‐site databases list multiple kinases that could phosphorylate that site, but which one should be tested? Perusing our database could narrow the search by identifying kinases whose activity or phosphorylation changes with hypertonicity.
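Both uses described above, narrowing a list of candidate kinases and restating an entry as a hypothesis about hypotonicity, can be sketched in a few lines of Python. Apart from the MAP3K3 example quoted in the text, the entries and candidate names are placeholders; the keyword "activity of" is an assumed wording for activity-related verb phrases, and the function names are hypothetical.

```python
# Entries as (subject, verb phrase, object). The first comes from the example
# sentence in the text; the others are placeholders, NOT database content.
entries = [
    ("hypertonicity", "increases phosphorylation of", "MAP3K3"),
    ("hypertonicity", "decreases abundance of", "GENE_A"),
    ("hypertonicity", "decreases phosphorylation of", "KINASE_X"),
]

# Assumed keywords marking verb phrases about phosphorylation or activity.
KEYWORDS = ("phosphorylation of", "activity of")

OPPOSITE = {"increases": "decreases", "decreases": "increases"}


def responsive_candidates(records, candidates, keywords=KEYWORDS):
    """Map each candidate whose phosphorylation or activity changes to its verb phrases."""
    hits = {}
    for _subject, verb_phrase, obj in records:
        if obj in candidates and any(key in verb_phrase for key in keywords):
            hits.setdefault(obj, []).append(verb_phrase)
    return hits


def hypotonicity_hypothesis(verb_phrase, obj):
    """Flip the direction of an effect to state an untested hypothesis."""
    direction, _, remainder = verb_phrase.partition(" ")
    flipped = OPPOSITE.get(direction)
    if flipped is None:
        return None  # verb phrase has no recognized direction
    return f"Hypothesis: hypotonicity {flipped} {remainder} {obj}"


# Hypothetical kinases predicted for a consensus site of interest.
candidates = {"MAP3K3", "KINASE_X", "KINASE_Y"}
hits = responsive_candidates(entries, candidates)
hypothesis = hypotonicity_hypothesis("increases phosphorylation of", "MAP3K3")
```

The flipped statement is only a starting hypothesis; the database records observed hypertonicity effects and does not itself establish the opposite response.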
The <verb phrases>, or actions, that appear in the database describe changes in protein abundance or translocation; cleavage or shedding; protein activity or binding; acetylation, phosphorylation, or ubiquitination; mRNA abundance; and mRNA transcription rate.
The <objects> represented 1017 genes out of more than 21,000 genes in the mammalian genome. Importantly, we used the official gene symbol of the protein to represent <object>. Proteins commonly have redundant or ambiguous names and usage is not consistent. The use of official gene symbols largely avoids the ambiguity, inaccuracy, and redundancy inherent in protein nomenclature.
In addition to providing a venue for human users to readily access and assimilate the published information, the database is intended to facilitate automated data extraction. The database structure of <subject>, <verb phrase>, <object> is a triplet that a parser algorithm, using syntactical context, can extract from text to identify relationships among individual words (Evans and Rzhetsky ; Rebholz‐Schuhmann et al. ).
In summary, we have provided an expert‐curated database of osmoregulatory responses at a molecular level that can be mined for experimental design, for systems‐biology modeling, or as a shortcut to literature review. The format for data presentation is such that human readers or computers can interpret each data entry as a simple sentence, thereby facilitating data acquisition. It is hoped that experts throughout the field of physiology will record their own knowledge bases in a similar way to preserve the information and to facilitate large‐scale data integration with other data sets.
Acknowledgments
C.R.G., M.A.K., M.B.B., and J.D.F. collected data, wrote, and reviewed the MS. The authors thank Dr. Chin‐Rang Yang for technical assistance.
Conflict of Interest
The authors have no conflict of interest to declare.
© 2014. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Abstract
Biological information, even in highly specialized fields, is increasing at a volume that no single investigator can assimilate. The existence of this vast knowledge base creates the need for specialized computer databases to store and selectively sort the information. We have developed a manually curated database of the effects of hypertonicity on target proteins. Effects include changes in
Details
1 Systems Biology Center, National Heart, Lung and Blood Institute, National Institutes of Health, Bethesda, Maryland, USA