
Research

The ultimate goal of our research is to advance biomedical science and clinical practice by developing rigorous and efficient tools. We tackle the challenges posed by modern biomedical "big data", including high dimensionality, heterogeneity, technical artifacts, and limited reproducibility, so that the data can be interpreted reliably.

3 core directions · 10+ Bioconductor packages · 30K+ annual downloads

Single-cell & Spatial Omics Analysis

Single-cell RNA sequencing (scRNA-seq) enables the identification of rare cell types and cell type-specific gene expression patterns by measuring individual cells rather than population averages. We develop statistical methods for:

  • Cell type annotation and clustering: Cellcano for scATAC-seq cell typing, FEAST for feature selection, and WIND/CEMUSA for clustering evaluation.
  • Differential expression analysis: SC2P uses mixture models (zero-inflated Poisson and lognormal-Poisson) for flexible differential expression inference in sparse scRNA-seq data.
  • Spatial transcriptomics: SIGLE for spot-level representation learning, STAND for abnormality detection, and MicroMap for predicting spatial expression from H&E images.
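
To make the mixture-model idea in the differential expression item concrete, here is a minimal sketch of a zero-inflated Poisson (ZIP) distribution, one of the two components named above. It is not SC2P's implementation: the function names, the toy counts, and the parameter values are assumptions chosen for illustration only.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability of observing count k with rate lam (lam > 0)."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def zip_pmf(k, pi_zero, lam):
    """Zero-inflated Poisson: with probability pi_zero the count is a
    structural zero, otherwise it is drawn from Poisson(lam)."""
    base = (1.0 - pi_zero) * poisson_pmf(k, lam)
    return pi_zero + base if k == 0 else base


def zip_log_likelihood(counts, pi_zero, lam):
    """Log-likelihood of a list of counts under the ZIP model."""
    return sum(math.log(zip_pmf(k, pi_zero, lam)) for k in counts)


# Hypothetical counts for one gene across ten cells (mostly zeros).
counts = [0, 0, 0, 0, 0, 0, 1, 0, 2, 0]
sample_mean = sum(counts) / len(counts)

# pi_zero = 0 reduces the ZIP model to a plain Poisson model.
ll_poisson = zip_log_likelihood(counts, 0.0, sample_mean)
# Illustrative (not fitted) ZIP parameters.
ll_zip = zip_log_likelihood(counts, 0.5, 0.6)

print(f"Poisson log-likelihood: {ll_poisson:.3f}")
print(f"ZIP log-likelihood:     {ll_zip:.3f}")
```

On sparse counts like these, the extra zero-inflation parameter lets the model put more probability on zeros without lowering the rate for the non-zero cells, which is the motivation for mixture models in this setting.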

Signal Deconvolution for Bulk Omics Data

Bulk omics data remain the primary choice for population-level studies because of their lower cost. We develop methods to extract cell type-specific information from bulk data, in effect "unmixing the smoothie":

  • Signal deconvolution: Estimating cell type proportions and reconstructing pure cell type profiles without single-cell reference data.
  • Cell type-specific inference: Accounting for cell type mixture in differential expression/methylation and sample clustering.
Software: TOAST (reference-free deconvolution) and InfiniumPurity (tumor purity estimation).
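
The mixing model behind deconvolution can be sketched in a few lines. The example below is a simplification, not the method used by TOAST: it assumes only two cell types and assumes their pure profiles are already known, which is the reference-based case rather than the reference-free case described above. All names and numbers are hypothetical.

```python
def mix(profile_a, profile_b, prop_a):
    """Simulate a bulk profile as a weighted average of two pure
    cell type profiles, with proportion prop_a for cell type A."""
    return [prop_a * a + (1.0 - prop_a) * b
            for a, b in zip(profile_a, profile_b)]


def estimate_proportion(bulk, profile_a, profile_b):
    """Least-squares estimate of the proportion of cell type A.

    Solves min_p sum_g (bulk_g - p * a_g - (1 - p) * b_g)^2 in closed
    form, then clips the result to the valid range [0, 1].
    """
    diff = [a - b for a, b in zip(profile_a, profile_b)]
    denominator = sum(d * d for d in diff)
    if denominator == 0:
        raise ValueError("profiles are identical; proportion is not identifiable")
    numerator = sum((y - b) * d for y, b, d in zip(bulk, profile_b, diff))
    return min(1.0, max(0.0, numerator / denominator))


# Hypothetical expression of four genes in two pure cell types.
profile_a = [10.0, 2.0, 7.0, 1.0]
profile_b = [1.0, 8.0, 3.0, 6.0]

bulk = mix(profile_a, profile_b, prop_a=0.3)
estimate = estimate_proportion(bulk, profile_a, profile_b)
print(f"estimated proportion of cell type A: {estimate:.3f}")
```

Reference-free methods face the harder problem of estimating both the proportions and the pure profiles from the bulk data alone.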

Differential Analysis for Bulk Omics Data

We develop rigorous statistical methods for detecting differential signals in bulk sequencing data:

  • BS-seq: DSS uses hierarchical beta-binomial models for differential methylation analysis (20,000+ annual downloads).
  • RNA-seq: Methods accounting for overdispersion and small sample sizes.
  • ChIP-seq: Differential peak detection for protein-DNA binding sites.
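
As an illustration of the beta-binomial model mentioned in the BS-seq item, the sketch below computes the probability of observing k methylated reads out of n at a single site. It is not the DSS implementation: the mean/dispersion parameterization (mu, phi) and the example values are assumptions made for this sketch.

```python
import math


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_pmf(k, n, mu, phi):
    """Probability of k methylated reads out of n.

    Assumed parameterization: the methylation level follows a Beta
    distribution with mean mu and dispersion phi (both in (0, 1)), so
    that alpha + beta = (1 - phi) / phi.
    """
    alpha = mu * (1.0 - phi) / phi
    beta = (1.0 - mu) * (1.0 - phi) / phi
    log_choose = (math.lgamma(n + 1) - math.lgamma(k + 1)
                  - math.lgamma(n - k + 1))
    return math.exp(log_choose
                    + log_beta(k + alpha, n - k + beta)
                    - log_beta(alpha, beta))


def variance(pmf_values):
    """Variance of a distribution given as probabilities for k = 0..n."""
    mean = sum(k * p for k, p in enumerate(pmf_values))
    return sum((k - mean) ** 2 * p for k, p in enumerate(pmf_values))


# Hypothetical site: 20 reads, mean methylation 0.7, dispersion 0.1.
n, mu, phi = 20, 0.7, 0.1
bb = [beta_binomial_pmf(k, n, mu, phi) for k in range(n + 1)]
binomial_variance = n * mu * (1.0 - mu)

print(f"binomial variance:      {binomial_variance:.2f}")
print(f"beta-binomial variance: {variance(bb):.2f}")
```

The dispersion parameter inflates the variance beyond what a plain binomial allows, which is how the model accommodates biological variation between replicates.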