Introduction to Large-Scale Biomedical Data Analysis

Class Information


This is an overview course for the Bioinformatics, Imaging and Genetics (BIG) concentration in the PhD program of the Department of Biostatistics and Bioinformatics. It aims to introduce students to modern high-dimensional biomedical data, including data in bioinformatics and computational biology, biomedical imaging, and statistical genetics.

This course is co-taught by several BIG core faculty members, each giving one or two lectures. It focuses on data characteristics, the opportunities and challenges these data present for statisticians, and current developments and active research areas in bioinformatics, biomedical imaging, and statistical genetics.

Prerequisites: BIOS 501 or equivalent, or permission from the instructor.


Class schedule and notes

Date Lecture Title Description Homework
8/31 Lecture 1: Introduction to high-throughput data analysis (Wu) [Notes] Course information. Introduction to high-throughput data. Feature selection from high-throughput data.
9/7 Lecture 2: What genomics big data are available and what is their utility? (Qin) [Notes] Review existing genomics big data resources, including ENCODE, 1000 Genomes, GWAS Catalog, GTEx, and TCGA. Discuss how to use them to extract biological insights.
9/14 Lecture 3: Introduction to single cell genomics (Wu) [Notes] Biological motivation, technologies, and data analysis methods for single cell RNA-seq, including cell clustering, differential expression, and cell type identification. Reading assignment 1: (a) single cell RNA-seq review, (b) a paper of your choice from the 2021 NAR database issue.
9/21 Lecture 4: Introduction to biomedical imaging (Guo) [Notes] Introduction to imaging techniques, imaging modalities, and data acquisition and structure.
9/28 Lecture 5: Statistical Analysis of Neuroimaging Data (Risk) [Notes] An overview of the analysis of neuroimaging data with a focus on functional magnetic resonance imaging (fMRI), including experimental design, data processing, and statistical analysis. We will also discuss accelerated acquisition techniques and their statistical implications. Reading assignment 2: paper 1, paper 2, paper 3
10/5 Lecture 6: Introduction to Genome-wide Association Studies (Yang) [Notes] An overview of linkage disequilibrium, single variant GWAS methods, population stratification, and meta-analysis.
10/12 Fall break, no class
10/19 Lecture 7: Microbiome (Hu) [Notes] Statistical methods and data analysis for microbiome data.