from the Center for Clinical Genetics at the Hadassah University Medical Center (Jerusalem, Israel), whereby gene annotation summaries were added to the GeneALaCart repertoire upon her specific request. These, along with other GeneALaCart fields, such as genomic location and disease relationships, are used for genetic counselling.
Omics integration
GeneCards strives to consolidate a complete human gene compendium and to create an annotation network that connects genes. One can traverse this web to integrate various omics data via its gene-centric framework in order to understand underlying complex patterns. This is exemplified by work in the context of the EU consortium SysKid (http://www.syskid.eu/), which has 25 participating groups from 16 countries. The strategic aim of the consortium is to use systems biology to enable novel chronic kidney disease (CKD) diagnosis and treatment. GeneCards is being used as a consortium tool in ways that far transcend its local utilisation by the Lancet group. Different types of CKD-related omics data have been collected, such as transcriptome (including microRNA expression), proteome and metabolome data, and SNP associations with genes. GeneCards assists in finding genes and pathways related to such data, so as to implicate them in the disease and to help develop new methods of diagnosis and treatment. A crucial component in this process is Set Distiller, part of GeneDecks, a member of the GeneCards suite. Set Distiller is an analysis tool that ranks descriptors by their degree of sharing within a given gene set [14]. In a pilot study, six metabolites suggested by consortium members as strong candidate CKD biomarkers were analysed. This revealed shared descriptors among the genes associated with each metabolite, which allowed the metabolites to be ranked by their relevance to kidney disease [29]. This capacity is now being augmented by a weighting algorithm to prioritise the metabolite-related gene sets.
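The idea of ranking descriptors by their degree of sharing within a gene set can be illustrated with a minimal Python sketch. It is not the Set Distiller implementation, whose scoring is not described here; it simply counts how many genes in a set carry each descriptor. The function name, the `min_genes` threshold, the gene names and the annotations are all hypothetical.

```python
from collections import defaultdict


def rank_shared_descriptors(gene_annotations, min_genes=2):
    """Rank descriptors by the number of genes in the set that carry them.

    gene_annotations maps a gene symbol to a set of descriptor strings.
    Only descriptors shared by at least min_genes genes are returned.
    """
    genes_by_descriptor = defaultdict(set)
    for gene, descriptors in gene_annotations.items():
        for descriptor in descriptors:
            genes_by_descriptor[descriptor].add(gene)

    ranked = [
        (descriptor, sorted(genes))
        for descriptor, genes in genes_by_descriptor.items()
        if len(genes) >= min_genes
    ]
    # Most widely shared first; ties are broken alphabetically for stable output.
    ranked.sort(key=lambda item: (-len(item[1]), item[0]))
    return ranked


# Hypothetical toy gene set (placeholder symbols and annotations, not real data).
toy_gene_set = {
    "GENE_A": {"pathway:urea cycle", "disease:kidney disease", "compound:creatinine"},
    "GENE_B": {"pathway:urea cycle", "disease:kidney disease"},
    "GENE_C": {"disease:kidney disease", "compound:urea"},
}

for descriptor, genes in rank_shared_descriptors(toy_gene_set):
    print(descriptor, len(genes), genes)
```

In this toy set, the disease descriptor is shared by all three genes and therefore ranks first, while descriptors carried by a single gene are dropped. A weighting scheme such as the one mentioned above would replace the plain count used as the sort key.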
The consortium has established a GeneKid database, modelled on the GeneCards design, to hold the omics information as it arrives from consortium members. The GeneKid database consists of 18 tables that hold omics data as the main entities, together with the studies and samples from which they originated. An essential aspect of creating an integrated omics network is linking each of GeneKid's omics data entries to a human gene, thereby 'symbolising' all annotations (ie finding the correct official HUGO Gene Nomenclature Committee symbol for each) through one shared entity. This is often a non-trivial task owing to the heterogeneity and non-uniqueness of the gene identifiers provided by the experimental laboratories. An especially challenging related task is associating genes with cellular metabolites, an important aspect of the SysKid effort. There is scant gene-association information for many metabolites; a requirement therefore arose to enhance GeneCards' capacities in this respect. This is an example of the two-way interaction that often occurs between users and GeneCards developers. As a result, two new compound-gene association sources have just been added to GeneCards (Version 3.06) in the drugs and compounds section (Figure 1). These are The Human Metabolome Database (HMDB)[30] and DrugBank,[31] bioinformatics and cheminformatics resources that combine information about drugs and their targets.
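The 'symbolisation' step can be sketched as a lookup through an alias index that flags non-unique identifiers instead of guessing. This is an illustrative Python sketch only, not the GeneKid pipeline; the function names, the status labels and all symbols and identifiers are hypothetical placeholders.

```python
def build_alias_index(gene_records):
    """Map every known identifier (upper-cased) to the official symbols using it.

    gene_records maps an official symbol to a list of its other identifiers
    (aliases, probeset IDs, protein accessions and so on).
    """
    index = {}
    for symbol, identifiers in gene_records.items():
        for identifier in [symbol, *identifiers]:
            index.setdefault(identifier.strip().upper(), set()).add(symbol)
    return index


def symbolise(identifier, index):
    """Return (symbol, status), where status is 'unique', 'ambiguous' or 'unmapped'."""
    candidates = index.get(identifier.strip().upper(), set())
    if len(candidates) == 1:
        return next(iter(candidates)), "unique"
    if candidates:
        # The identifier is shared by several genes; leave it for manual curation.
        return None, "ambiguous"
    return None, "unmapped"


# Hypothetical records: ALIAS_X is deliberately shared by two genes.
toy_records = {
    "SYMBOL1": ["ALIAS_X", "probe_001"],
    "SYMBOL2": ["ALIAS_X", "ACC_00002"],
}
toy_index = build_alias_index(toy_records)

for raw_identifier in ["probe_001", "alias_x", "unknown_id"]:
    print(raw_identifier, symbolise(raw_identifier, toy_index))
```

Reporting ambiguous and unmapped identifiers explicitly, rather than picking a candidate, reflects the heterogeneity and non-uniqueness problem described above: such entries need a further identifier or manual review before they can be linked to a gene.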
Literature mining of papers from 17 omics studies related to CKD[29] assisted in benchmarking the GeneKid pipeline (Figure 2). Additional benchmarking was performed on a list of 26 initial biomarkers (22 proteins, two peptides, one autoantibody and one nucleotide), prioritised by the extent of relevant gene annotations, using the GeneCards database to obtain disease, compound and pathway relationships. When an experimental identifier could not easily be associated with a gene, an exhaustive effort was made using any available identifier, such as a probeset, protein or SNP identifier, again highlighting the power of GeneCards' integration. A consortium user interface was constructed, enabling basic services such as browsing the GeneKid database by study, sample and experiment information, to give the 25 collaborating groups access to interim results. This capacity is strongly dependent on GeneCards' concepts and architecture. One of the key features assisting the consortium is the information within GeneCards about products, such as antibodies and silencing RNA kits, affiliated with specific genes of interest. These help to expedite the execution of relevant SysKid experiments and assist in the development of proprietary diagnostic tools. This applies particularly to a shortlist of seven candidate CKD genes that are now being tested. Such use exemplifies the power of the products feature within GeneCards. Notably, ~15 per cent of all users who browse GeneCards use one or more of these links.
Ongoing GeneCards expansions
Animal models
The afore-mentioned SYNLET example of transferring experimental knowledge from one organism (yeast) to another (human) has emphasised the need to add annotations derived from various model organisms to our human-centric database. Importantly, this includes enrichment with orthologues from species not yet covered, by supplementing the current sources (eg HomoloGene[32] and others) with additional orthologues from Ensembl,[33] thus increasing gene orthologue mapping. One model organism for which integration work has begun is zebrafish (Danio rerio), because of its importance as a model for human disease and drug discovery [34]. A major aim is to obtain additional information about phenotypes that can be incorporated in GeneCards' function section. This will be followed by other animal models, such as Caenorhabditis elegans and Drosophila melanogaster. Some product links to rat animal models have recently been added, with more species and products planned.
Tissue proteomics profiling
Several studies have found a moderate-to-weak correlation between the expression levels of protein and mRNA for a given tissue [35-37]. This may be attributed to experimental imprecision or to biological causes, such as post-transcriptional regulation [36]. For years, GeneCards has displayed mRNA expression levels for different normal and cancerous human tissues, obtained from both inhouse and external microarray experiments [9]. Owing to the above considerations, we have now decided to complement such data with a pilot quantitative tissue proteomics display in GeneCards' protein section. This was done via a collaboration with E. Kolker and colleagues at Seattle Children's Hospital (Seattle, WA, USA), who have created a database of protein expression for a total of nine normal tissues, as well as cancerous cell lines and body fluids, based on published mass spectrometry experiments. The total number of genes covered by this dataset is about 8,000, but most of them have coverage for a relatively small fraction of the nine tissue-related sample types (Figure 3). We intend collaboratively to broaden these data by seeking additional sample types for which similar information is available, as well as to integrate more than one source for certain tissues. This addition will allow users to compare transcriptome and proteome expression patterns for numerous genes.
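A comparison of transcriptome and proteome patterns for a single gene could look like the following Python sketch, which computes a Pearson correlation over the tissues present in both profiles. It is an illustration under stated assumptions, not a GeneCards feature: the function names, the `min_tissues` cut-off and the expression values are hypothetical, and the cited studies may have used other measures.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists; None if either is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return covariance / sqrt(var_x * var_y)


def expression_correlation(mrna, protein, min_tissues=3):
    """Correlate mRNA and protein levels over the tissues measured in both.

    mrna and protein map tissue name to expression level. Returns None when
    fewer than min_tissues tissues are shared, since sparse protein coverage
    makes a correlation over very few points uninformative.
    """
    shared = sorted(set(mrna) & set(protein))
    if len(shared) < min_tissues:
        return None
    return pearson([mrna[t] for t in shared], [protein[t] for t in shared])


# Hypothetical expression values for one gene (arbitrary units).
toy_mrna = {"liver": 1.0, "kidney": 2.0, "brain": 3.0, "heart": 4.0}
toy_protein = {"liver": 2.0, "kidney": 4.0, "brain": 6.0}

print(expression_correlation(toy_mrna, toy_protein))
```

Restricting the calculation to shared tissues matters here because, as noted above, most genes in the proteomics dataset are covered in only a small fraction of the sample types.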
RNA genes
A major challenge of the post-genome era is to obtain a truly comprehensive list of all human genes. This is hard to achieve for several reasons, including ambiguities in gene identification within genomic sequences. One of the most important expansion targets is ncRNA genes. GeneCards currently mines a total of 14,315 such genes (Version 3.06) and their annotations from Ensembl (including the ncRNA subsection), HGNC,[38] the National Center for Biotechnology Information (NCBI)'s Entrez Gene and miRBase [39]. An immediate goal is to begin mining and integrating several of the numerous RNA gene databases, each of which provides partial information about the RNA gene universe. One target is to include new RNA gene types such as lncRNA, piRNA and snoRNA [38]. Another is to introduce some of the following new sources: fRNAdb,[40] NONCODE,[41] RNAdb[42] and/or Rfam [43].
Gene and protein identifier mapping
Many interesting biological and bioinformatics applications require the integration of data from various sources, and have taken advantage of the rich annotation within GeneCards to facilitate the translation of identifiers (including symbols, aliases and database-specific identifications) and annotations (eg location on the chromosome via the GeneLoc algorithm[4]) from one system to another. Examples include combining microarray data with pathway databases (as done in the SYNLET project) and/or disease databases; matching names and descriptions used in the literature with official gene symbols; developing GeneAnnot-based custom CDFs;[20] and associating gene symbols with vendor products. We intend to enhance this central GeneCards capacity substantially. Clear examples of the need are symbol management and integration for RNA genes, and gene-to-protein identifier mapping in an upcoming effort to add proteome expression summaries for human tissues, in collaboration with E. Kolker.
Online analytical processing (OLAP)
OLAP is a tool designed for sifting through data and quickly locating trends that are worthy of further scrutiny [44]. This functionality is currently used most widely for decision support in financial management, but can also be of great benefit to biological and pharmaceutical researchers. The classical OLAP model of multi-dimensional data separates facts (records) into dimensions and measures, where the measure is the value obtained at the coordinates determined by the dimensions, and queries are made only on the measures. Applying the OLAP model to biological annotation data is not trivial, since the queries are made on both the measure (eg how many genes participate in the cell cycle pathway) and the dimensions (eg how many pathways are related to genes on chromosome 11), but this hurdle may be overcome, as reported for OLAP models of geographical data [45, 46]. Another aspect of OLAP development is devising biological visualisation methods that will make querying and analysing results an intuitive process. We intend to employ one such OLAP technology,[47] namely the Mondrian system (http://mondrian.pentaho.com/), to enable traversals over annotations and navigation through the vast amounts of data from omics experiments.
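The two example queries can be sketched in plain Python over a small fact table, to show why annotation data blur the line between measures and dimensions. This sketch does not use Mondrian or its query language; the fact rows, field names and the `count_distinct` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical fact rows: one row per gene-pathway annotation.
FACTS = [
    {"gene": "GENE_A", "chromosome": "11", "pathway": "cell cycle"},
    {"gene": "GENE_B", "chromosome": "11", "pathway": "cell cycle"},
    {"gene": "GENE_B", "chromosome": "11", "pathway": "apoptosis"},
    {"gene": "GENE_C", "chromosome": "2", "pathway": "cell cycle"},
    {"gene": "GENE_D", "chromosome": "2", "pathway": "urea cycle"},
]


def count_distinct(facts, group_by, counted):
    """Roll up the facts: per value of group_by, count distinct values of counted."""
    groups = defaultdict(set)
    for row in facts:
        groups[row[group_by]].add(row[counted])
    return {key: len(values) for key, values in groups.items()}


# "How many genes participate in the cell cycle pathway?"
# Here pathway acts as the dimension and the gene count is the measure.
genes_per_pathway = count_distinct(FACTS, "pathway", "gene")
print(genes_per_pathway["cell cycle"])

# "How many pathways are related to genes on chromosome 11?"
# Now pathway, a dimension in the previous query, is itself what is counted.
pathways_per_chromosome = count_distinct(FACTS, "chromosome", "pathway")
print(pathways_per_chromosome["11"])
```

The same field (`pathway`) serves as a grouping dimension in the first query and as the counted quantity in the second, which is the departure from the classical model described above.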
Conclusion
The human genome project is currently at a stage where huge amounts of inter-individual comparative data are becoming available. An example is the new capacity, afforded by next-generation DNA sequencing, for performing whole-exome or whole-genome analyses of hundreds of human individuals. This data avalanche is at present partly addressed by the GeneCards variation section. The synergy between GeneCards' integrative architecture, its multi-source mining and its user feedback mechanisms increases the likelihood that GeneCards will remain an informative genome annotation and research tool.
HGNC. [http://www.genenames.org/]
Entrez gene. [http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene]
Ensembl. [http://www.ensembl.org/index.html]
Rosen N, Chalifa-Caspi V, Shmueli O, Adato A, et al: GeneLoc: Exon-based integration of human genome maps. Bioinformatics. 2003, 19 (Suppl 1): i222-i224. 10.1093/bioinformatics/btg1030.
Ashburner M, Ball CA, Blake JA, Botstein D, et al: Gene ontology: Tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000, 25: 25-29. 10.1038/75556.
Bult CJ, Eppig JT, Kadin JA, Richardson JE, et al: The Mouse Genome Database (MGD): Mouse biology and model systems. Nucleic Acids Res. 2008, 36: D724-D728.
Chalifa-Caspi V, Yanai I, Ophir R, Rosen N, et al: GeneAnnot: Comprehensive two-way linking between oligonucleotide array probesets and GeneCards genes. Bioinformatics. 2004, 20: 1457-1458. 10.1093/bioinformatics/bth081.
Consortium TU: The Universal Protein Resource (UniProt). Nucleic Acids Res. 2008, 36: D190-D195. 10.1093/nar/gkn141.
Su AI, Wiltshire T, Batalov S, Lapp H, et al: A gene atlas of the mouse and human protein-encoding transcriptomes. Proc Natl Acad Sci USA. 2004, 101: 6062-6067. 10.1073/pnas.0400782101.
Safran M, Dalah I, Alexander J, Rosen N, et al: GeneCards Version 3: The human gene integrator. Database (Oxford). 2010, 2010: baq020.
Solr. [http://lucene.apache.org/solr/]
Lucene. [http://lucene.apache.org/]
Shmueli O, Horn-Saban S, Chalifa-Caspi V, Schmoish M, et al: GeneNote: Whole genome expression profiles in normal human tissues. C R Biol. 2003, 326: 1067-1072. 10.1016/j.crvi.2003.09.012.
Stelzer G, Inger A, Olender T, Iny-Stein T, et al: GeneDecks: Paralog hunting and gene-set distillation with GeneCards annotation. OMICS. 2009, 13: 477-487. 10.1089/omi.2009.0069.
Safran M, Chalifa-Caspi V, Shmueli O, Olender T, et al: Human gene-centric databases at the Weizmann Institute of Science: GeneCards, UDB, CroW 21 and HORDE. Nucleic Acids Res. 2003, 31: 142-146. 10.1093/nar/gkg050.
Harel A, Inger A, Stelzer G, Strichman-Almashanu L, et al: GIFtS: Annotation landscape analysis with GeneCards. BMC Bioinformatics. 2009, 10: 348. 10.1186/1471-2105-10-348.
Mayer B, Harel A, Dalah S, Pretrokovski S, et al: Omics data management and annotation. Bioinformatics for Omics Data. Edited by: Meyer B. 2011, Humana Press, Totowa, NJ, 71-96.
Muller A, Holzmann K, Kestler HA: Visualization of genomic aberrations using Affymetrix SNP arrays. Bioinformatics. 2007, 23: 496-497. 10.1093/bioinformatics/btl608.
Barton G, Abbott J, Chiba N, Huang DW, et al: EMAAS: An extensible grid-based rich internet application for microarray data analysis and management. BMC Bioinformatics. 2008, 9: 493. 10.1186/1471-2105-9-493.
Ferrari F, Bortoluzzi S, Coppe A, Sirota A, et al: Novel definition files for human GeneChips based on GeneAnnot. BMC Bioinformatics. 2007, 8: 446. 10.1186/1471-2105-8-446.
Kaelin WG: The concept of synthetic lethality in the context of anticancer therapy. Nat Rev Cancer. 2005, 5: 689-698. 10.1038/nrc1691.
Fechete R, Barth S, Olender T, Munteanu A, et al: Synthetic lethal hubs associated with vincristine resistant neuroblastoma. Mol Biosyst. 2010, 7: 200-214.
Baryshnikova A, Costanzo M, Dixon S, Vizeacoumar FJ, et al: Synthetic genetic array (SGA) analysis in Saccharomyces cerevisiae and Schizosaccharomyces pombe. Methods Enzymol. 2010, 470: 145-179.
O'Roak BJ, Deriziotis P, Lee C, Vives L, et al: Exome sequencing in sporadic autism spectrum disorders identifies severe de novo mutations. Nat Genet. 2011, 43: 585-589. 10.1038/ng.835.
Ropers HH, Hamel BC: X-linked mental retardation. Nat Rev Genet. 2005, 6: 46-57. 10.1038/nrg1501.
Tarpey PS, Smith R, Pleasance E, Whibley A, et al: A systematic, large-scale resequencing screen of X-chromosome coding exons in mental retardation. Nat Genet. 2009, 41: 535-543. 10.1038/ng.367.
Zahr NM, Bell RL, Ringham HN, Sullivan EV, et al: Ethanol-induced changes in the expression of proteins related to neurotransmission and metabolism in different regions of the rat brain. Pharmacol Biochem Behav. 2011, 99: 428-436. 10.1016/j.pbb.2011.03.002.
Le-Niculescu H, Case NJ, Hulvershorn L, Patel SD, et al: Convergent functional genomic studies of omega-3 fatty acids in stress reactivity, bipolar disorder and alcoholism. Transl Psychiatry. 2011, 1: e4. 10.1038/tp.2011.1.
Fechete R, Heinzel A, Perco P, Monks K, et al: Mapping of molecular pathways, biomarkers and drug targets for diabetic nephropathy. Proteomics Clin Appl. 2011, 5: 354-366. 10.1002/prca.201000136.
Wishart DS, Knox C, Guo AC, Eisner R, et al: HMDB: A knowledgebase for the human metabolome. Nucleic Acids Res. 2009, 37: D603-D610.
Knox C, Law V, Jewison T, Liu P, et al: DrugBank 3.0: A comprehensive resource for 'omics' research on drugs. Nucleic Acids Res. 2011, 39: D1035-D1041. 10.1093/nar/gkq1126.
Sayers EW, Barrett T, Benson DA, Bolton E, et al: Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2011, 39 (Suppl 1): D38-D51.
Ensembl Pan Taxonomic Compara. [http://fungi.ensembl.org/info/docs/compara/index.html]
Kari G, Rodeck U, Dicker AP: Zebrafish: An emerging model system for human disease and drug discovery. Clin Pharmacol Ther. 2007, 82: 70-80. 10.1038/sj.clpt.6100223.
Fu N, Drinnenberg I, Kelso J, Wu J-R, et al: Comparison of protein and mRNA expression evolution in humans and chimpanzees. PLoS ONE. 2007, 2: e216. 10.1371/journal.pone.0000216.
Cox B, Kislinger T, Emili A: Integrating gene and protein expression data: Pattern analysis and profile mining. Methods. 2005, 35: 303-314. 10.1016/j.ymeth.2004.08.021.
Tian Q, Stepaniants SM, Mao M, Weng L, et al: Integrated genomic and proteomic analyses of gene expression in mammalian cells. Mol Cell Proteomics. 2004, 3: 960-969. 10.1074/mcp.M400055-MCP200.
Wright MW, Bruford EA: Naming 'junk': Human non-protein coding RNA (ncRNA) gene nomenclature. Hum Genomics. 2011, 5: 90-98.
Griffiths-Jones S, Saini HK, van Dongen S, Enright AJ: miRBase: Tools for microRNA genomics. Nucleic Acids Res. 2008, 36: D154-D158. 10.1093/nar/gkn221.
Kin T, Yamada K, Terai G, Oxida H, et al: fRNAdb: A platform for mining/annotating functional RNA candidates from non-coding RNA sequences. Nucleic Acids Res. 2007, 35: D145-D148. 10.1093/nar/gkl837.
Liu C, Bai B, Skogerbo G, Cai L, et al: NONCODE: An integrated knowledge database of non-coding RNAs. Nucleic Acids Res. 2005, 33: D112-D115. 10.1093/nar/gni113.
Pang KC, Stephen S, Engstrom PG, Tajal-Arifin K, et al: RNAdb -- A comprehensive mammalian noncoding RNA database. Nucleic Acids Res. 2005, 33: D125-D130. 10.1093/nar/gni117.
Gardner PP, Daub J, Tate JG, Nawrocki EP, et al: Rfam: Updates to the RNA families database. Nucleic Acids Res. 2009, 37: D136-D140. 10.1093/nar/gkn766.
Codd EF, Codd SB, Salley CT: Providing OLAP (On-Line Analytical Processing) to User-Analysis: An IT Mandate. Technical report, E.F. Codd and Associates. 1993.
Bédard Y, Merrett T, Han J: Fundamentals of spatial data warehousing for geographic knowledge discovery. Geographic Data Mining and Knowledge Discovery. Edited by: Miller HJ, Han J. 2001, Taylor and Francis, London, 53-73.
Rivest S, Bédard Y, Marchand P: Toward better support for spatial decision making: Defining the characteristics of spatial on-line analytical processing (SOLAP). Geomatica. 2001, 55: 539-555.
Alkharouf NW, Jamison DC, Matthews BF: Online analytical processing (OLAP): A fast and effective data mining tool for gene expression databases. J Biomed Biotechnol. 2005, 2005: 181-188. 10.1155/JBB.2005.181.
Copyright BioMed Central 2011
Abstract
Since 1998, the bioinformatics, systems biology, genomics and medical communities have enjoyed a synergistic relationship with the GeneCards database of human genes (http://www.genecards.org). This human gene compendium was created to help to introduce order into the increasing chaos of information flow. As a consequence of viewing details and deep links related to specific genes, users have often requested enhanced capabilities, such that, over time, GeneCards has blossomed into a suite of tools (including GeneDecks, GeneALaCart, GeneLoc, GeneNote and GeneAnnot) for a variety of analyses of both single human genes and sets thereof. In this paper, we focus on inhouse and external research activities which have been enabled, enhanced, complemented and, in some cases, motivated by GeneCards. In turn, such interactions have often inspired and propelled improvements in GeneCards. We describe here the evolution and architecture of this project, including examples of synergistic applications in diverse areas such as synthetic lethality in cancer, the annotation of genetic variations in disease, omics integration in a systems biology approach to kidney disease, and bioinformatics tools.