PERT: expression deconvolution under varied environmental conditions

Qiao, W.*, Quon, G.*, Csaszar, E., Yu, M., Morris, Q.D., Zandstra, P.W. (2012) PERT: a method for expression deconvolution of human blood samples from varied microenvironmental and developmental conditions. PLoS Computational Biology, 8(12): e1002838

The PERT algorithm performs expression deconvolution of samples containing mixed cell types. It applies when the expression profiles of most of the individual cell types are available and the goal is to estimate the proportion of each mixed sample that is attributable to each cell type. PERT targets the situation in which the cell type-specific expression profiles were collected under different experimental conditions than the mixed samples. For example, in this work the mixed blood samples were collected under culture conditions, whereas the cell type-specific expression profiles were collected from uncultured cells. Cell culture induces systematic changes in expression that, when left uncorrected, yield inaccurate estimates of the mixture proportions. In our experiments, batch correction software did not remove these systematic changes, so we developed PERT to address this issue.

The underlying assumption of PERT is that the cell culture (or other) conditions under which the mixed samples are collected “perturb” the expression level of each gene in a similar manner across all cell types.
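As a rough illustration of this assumption (not the PERT model itself), the sketch below treats the perturbation as one multiplicative factor per gene that is shared by every cell type. The multiplicative form, the function name perturb_profiles, and the toy sizes are assumptions made for this example only.

```python
import random

def perturb_profiles(reference, rho):
    """Apply one per-gene factor rho[g] to every cell type's profile.

    reference: list of K profiles, each a list of G expression levels.
    rho: list of G factors (multiplicative form is an assumption here).
    """
    return [[x * r for x, r in zip(profile, rho)] for profile in reference]

random.seed(0)
G, K = 5, 3  # toy sizes: 5 genes, 3 cell types

# Hypothetical uncultured reference profiles and per-gene perturbations.
reference = [[random.uniform(1.0, 10.0) for _ in range(G)] for _ in range(K)]
rho = [random.uniform(0.5, 2.0) for _ in range(G)]

perturbed = perturb_profiles(reference, rho)

# For each gene, the fold change is the same in all K cell types.
fold_changes = [
    [perturbed[k][g] / reference[k][g] for k in range(K)]
    for g in range(G)
]
```

Under this reading, a gene whose expression doubles in culture doubles in every cell type, which is what allows a single vector of length G to describe the change.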

The inputs to PERT are:

  • D gene expression profiles, each of which is a sample containing multiple cell types at unknown proportions
  • K gene expression profiles, each of which is from a single homogeneous cell type

The outputs of PERT are:

  • a matrix T with D rows and K columns, where T(d,k) describes the fraction of RNA from the mixed sample d that is attributable to the input cell type k
  • a vector rho of length G (the number of genes in the expression profiles) that describes how the expression level of each gene is perturbed in the mixed samples relative to the individual cell types
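To make the shapes of these inputs and outputs concrete, the following sketch rebuilds mixed profiles from a given T and rho. It assumes a simple linear mixture with a multiplicative per-gene perturbation, which is a simplification chosen for illustration and not the inference procedure PERT uses. The function name reconstruct_mixed and all numeric values are hypothetical.

```python
def reconstruct_mixed(T, rho, reference):
    """Rebuild D mixed profiles from the outputs and the reference input.

    T: D x K proportions, rho: length-G factors, reference: K x G profiles.
    Assumes mixed[d][g] = rho[g] * sum_k T[d][k] * reference[k][g].
    """
    D, K, G = len(T), len(reference), len(rho)
    mixed = []
    for d in range(D):
        row = []
        for g in range(G):
            blended = sum(T[d][k] * reference[k][g] for k in range(K))
            row.append(rho[g] * blended)
        mixed.append(row)
    return mixed

# Hypothetical toy values: K = 2 cell types, G = 3 genes, D = 2 mixed samples.
reference = [[10.0, 2.0, 4.0],
             [2.0, 8.0, 4.0]]
T = [[0.75, 0.25],   # sample 0: 75% cell type 0, 25% cell type 1
     [0.5, 0.5]]     # sample 1: equal parts
rho = [1.0, 2.0, 0.5]

mixed = reconstruct_mixed(T, rho, reference)
print(mixed)  # [[8.0, 7.0, 2.0], [6.0, 10.0, 2.0]]
```

In this toy case each row of T sums to 1, which holds only if every cell type present in the mixed sample has a reference profile.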