ISOpure: a general algorithm for gene expression deconvolution and purification

Quon, G., Haider, S., Deshwar, A.G., Cui, A., Boutros, P.C., Morris, Q.D. (2013) Computational purification of individual tumor gene expression profiles leads to significant improvements in prognostic prediction. Genome Medicine, 5:29.

The ISOpure algorithm was originally designed for studying gene expression profiles from lung and prostate tumor samples. ISOpure reduces the inter-tumor variance in gene expression that is caused by contaminating non-cancerous cells in the tumor samples, thus “purifying” them. More generally, ISOpure applies to any setting in which expression profiles are collected from case and control individuals and the samples are mixtures of cell types, where the goal is to remove from the case profiles the effect of cell types that are not relevant to the disease.

The input to ISOpure is:

  • N gene expression profiles from case (disease) tumor samples
  • R gene expression profiles from control (healthy) samples, from the same source (tissue) as the tumors

ISOpure outputs:

  • N purified gene expression profiles, one per input tumor sample
  • N estimates of % cancer content, one per input tumor sample
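
To make the shapes of these inputs and outputs concrete, here is a minimal sketch that simulates a toy data set. It is not ISOpure's interface or its statistical model: the variable names, the gene-by-sample matrix layout, the gamma-distributed expression values and the simple linear mixing rule are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
G, N, R = 200, 5, 8  # genes, tumor samples, control samples (toy sizes)

# Inputs (assumed layout): one row per gene, one column per sample.
controls = rng.gamma(shape=2.0, scale=50.0, size=(G, R))  # R control profiles

# Hidden quantities used only to simulate the tumors; unknown in practice.
true_cancer = rng.gamma(shape=2.0, scale=50.0, size=(G, N))
true_fraction = rng.uniform(0.3, 0.9, size=N)  # cancer content per tumor
mix = rng.dirichlet(np.ones(R), size=N).T      # (R, N), each column sums to 1
healthy_part = controls @ mix                  # contaminating healthy signal

# Toy assumption: each tumor is a linear mix of cancer and healthy signal.
tumors = true_fraction * true_cancer + (1.0 - true_fraction) * healthy_part

# Shapes of the two outputs listed above (hypothetical names).
expected_shapes = {"purified_profiles": (G, N), "cancer_fraction": (N,)}
print(tumors.shape, controls.shape, expected_shapes)
```

In this layout a purification method receives only tumors and controls, and is expected to return one purified profile and one cancer-content estimate per tumor column.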

Compared to existing deconvolution methods, ISOpure has a number of advantages:

  • ISOpure produces a purified cancer expression profile for each input tumor expression profile; many other algorithms estimate only the mixing proportions of healthy cells, and often also require expression profiles of individual cell types as input.
  • ISOpure does not need expression profiles of individual healthy cell types. Instead, it uses expression profiles of healthy individuals from the same source (tissue) as the tumor samples. The underlying assumption is that, with enough samples from control individuals, most contaminating healthy cells in the tumor samples will be represented in the samples from the control individuals. These control samples do not need to be matched to the tumor samples.
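
The assumption about control samples can be illustrated with a small simulation. This is a sketch under stated assumptions, not ISOpure's inference procedure: it invents K hidden healthy cell types, builds bulk healthy samples as random mixtures of them, and uses ordinary least squares (a stand-in technique) to check how well a tumor's healthy contaminant can be expressed in terms of the control profiles. All names and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
G, K = 100, 4  # genes, hidden healthy cell types (toy sizes)

# Hypothetical pure cell-type profiles; never observed directly.
cell_types = rng.gamma(shape=2.0, scale=50.0, size=(G, K))

def bulk_samples(n):
    """Simulate n bulk healthy samples as random convex mixes of the cell types."""
    proportions = rng.dirichlet(np.ones(K), size=n).T  # (K, n), columns sum to 1
    return cell_types @ proportions

def relative_residual(controls, target):
    """Least-squares misfit of target by the control profiles, relative to its norm."""
    coef, *_ = np.linalg.lstsq(controls, target, rcond=None)
    return np.linalg.norm(controls @ coef - target) / np.linalg.norm(target)

contaminant = bulk_samples(1)[:, 0]   # healthy signal hidden inside one tumor
many_controls = bulk_samples(10)      # more controls than cell types
few_controls = many_controls[:, :2]   # fewer controls than cell types

res_many = relative_residual(many_controls, contaminant)
res_few = relative_residual(few_controls, contaminant)
print(f"10 controls: {res_many:.2e}, 2 controls: {res_few:.2e}")
```

In this toy setting, ten bulk controls jointly cover all four cell types, so the contaminant is reproduced almost exactly, while two controls leave part of it unexplained. No profile of an individual cell type is ever given to the fit, which is the point of the assumption.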