PRIMER

How does gene expression clustering work?

Patrik D'haeseleer

Clustering is often one of the first steps in gene expression analysis. How do clustering algorithms work, which ones
should we use and what can we expect from them?

Our ability to gather genome-wide expression
data has far outstripped the ability of our puny
human brains to process the raw data. We can
distill the data down to a more comprehensible
level by subdividing the genes into a smaller
number of categories and then analyzing those.
This is where clustering comes in.

The goal of clustering is to subdivide a set
of items (in our case, genes) in such a way that
similar items fall into the same cluster, whereas
dissimilar items fall in different clusters. This
brings up two questions: first, how do we
decide what is similar; and second, how do we
use this to cluster the items? The fact that these
two questions can often be answered independently contributes to the bewildering variety
of clustering algorithms.

Gene expression clustering allows an open-ended exploration of the data, without getting lost among the thousands of individual
genes. Beyond simple visualization, there are
also some important computational applications for gene clusters. For example, Tavazoie
et al.1 used clustering to identify cis-regulatory
sequences in the promoters of tightly coexpressed genes. Gene expression clusters also
tend to be significantly enriched for specific
functional categories, which may be used to
infer a functional role for unknown genes in
the same cluster.

In this primer, I focus specifically on clustering genes that show similar expression patterns across a number of samples, rather than
clustering the samples themselves (or both). I
hope to leave you with some understanding
of clustering in general and three of the more
popular algorithms in particular. Where possible, I also attempt to provide some practical
guidelines for applying cluster analysis to your
own gene expression data sets.

A few important caveats

Before we dig into some of the methods in
use for gene expression data, a few words of
caution to the reader, practitioner or aspiring
algorithm developer: It is easy, and tempting, to invent yet
another clustering algorithm. There are hundreds of published clustering algorithms,
dozens of which have been applied to gene

[Figure: panels a-d, axes labeled Experiment 1 and Experiment 2. Illustration: Bob Crimi]

© 2005 Nature Publishing Group http://www.nature.com/naturebiotechnology

Patrik D'haeseleer is in the...