Abstract
Bimodal gene expression is a phenomenon frequently observed in gene expression data from certain types of studies, including cancer studies and drug/therapy effect studies. Several algorithms have been proposed to predict bimodal genes, with some success. However, their performance is occasionally unsatisfactory. We propose a new algorithm to detect bimodal genes. The new algorithm is based on the assumption that bimodality is related to the gap between two consecutive expression values. We show that the new algorithm outperforms several benchmark algorithms on both real and simulated data sets.
Keywords: bimodal distribution, non-parametric analysis, differential genes, heterogeneity.
(ProQuest: ... denotes formulae omitted.)
1 Introduction
For a decade, microarray experiments have aided the discovery of genetic differentiation patterns that explain observed phenotypic differentiation [1]. Their success is due to high-throughput, genome-wide examination. Differential genes related to phenotypic differentiation can be identified using the standard Student's t-test if the data satisfy its assumptions. However, biological diversity makes this difficult because a large number of genes appear to have bimodal or multi-modal distributions [2]. Fig 1 shows a typical bimodal distribution of a gene's expression across samples in the same category (such as cancer samples).
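To make the notion of a bimodal gene concrete, the following minimal sketch simulates one unimodal and one bimodal gene and compares the widest gap between consecutive sorted expression values, scaled by the range. It only illustrates the gap assumption stated in the abstract and is not the algorithm proposed in this paper. The function names, the sorting step, the range scaling, and the simulated means and standard deviations are all assumptions made for illustration.

```python
import random


def largest_consecutive_gap(values):
    """Return (gap, midpoint) for the widest gap between consecutive sorted values."""
    ordered = sorted(values)
    if len(ordered) < 2:
        raise ValueError("need at least two expression values")
    # Pair each value with its successor and record the gap and its midpoint.
    gaps = [(b - a, (a + b) / 2) for a, b in zip(ordered, ordered[1:])]
    # Tuples compare by first element, so the widest gap is returned.
    return max(gaps)


def gap_ratio(values):
    """Widest consecutive gap divided by the full range (0.0 if all values are equal)."""
    ordered = sorted(values)
    span = ordered[-1] - ordered[0]
    if span == 0:
        return 0.0
    gap, _ = largest_consecutive_gap(ordered)
    return gap / span


# Hypothetical simulated expression values (illustrative parameters only).
rng = random.Random(0)
unimodal = [rng.gauss(8.0, 1.0) for _ in range(40)]
bimodal = ([rng.gauss(6.0, 0.5) for _ in range(20)]
           + [rng.gauss(11.0, 0.5) for _ in range(20)])

print("unimodal gap ratio:", round(gap_ratio(unimodal), 3))
print("bimodal gap ratio:", round(gap_ratio(bimodal), 3))
```

In this toy setting, a gene whose samples fall into two well-separated groups would be expected to show a larger gap ratio than a gene with a single mode, although a single outlier can also produce a large gap, which a real detection method has to take into account.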
Khalil et al. have explained that cancer is a complex disease [4] because it has many subtypes. The existence of bimodal genes may be related to important subtypes of a disease. In medical science, bimodal genes can be the product of somatic mutations, such as the amplification of the receptor tyrosine kinase proto-oncogene "erbB2" during the development of cancer [5]. Another cause of bimodality in cancers is germ cell mutations such as SNPs [6]. It has been noted that the majority of cancer data demonstrate this kind of heterogeneous pattern [7-9]. Genetic translocations, which result from the rearrangement of parts between non-homologous chromosomes, occur commonly in cancer cells [10]. These mutations play a major role in cancer cell progression or, more generally, disease development. Furthermore, genomic lesions may affect some samples but not all, leading to the occurrence of bimodality. An example of recurrent fusion was observed by Tomlins and others in prostate cancer datasets, where they found the ERG and ETV1 genes overexpressed...




