BIOS731: Advanced Statistical Computing

Class Information


This course is designed for Ph.D.-level biostatistics students. It covers common statistical computing methods and their applications. Topics include random number generation, permutation and bootstrap methods, optimization methods, Expectation-Maximization (EM), Minorization-Maximization (MM), linear/quadratic programming, hidden Markov models (HMM), and Markov chain Monte Carlo (MCMC).
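For a flavor of the first topic above, a percentile bootstrap confidence interval for a sample mean can be sketched in a few lines of Python. This is an illustrative toy with made-up data, not material from the lectures:

```python
import random
import statistics

def bootstrap_ci(data, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the sample mean."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_boot):
        # Resample the data with replacement, same size as the original sample.
        sample = [rng.choice(data) for _ in data]
        means.append(statistics.mean(sample))
    means.sort()
    # Read off the empirical alpha/2 and 1 - alpha/2 quantiles.
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

data = [2.1, 3.4, 1.9, 4.2, 2.8, 3.1, 2.5, 3.9]
lo, hi = bootstrap_ci(data)
print(lo, hi)
```

The same resampling loop, with labels shuffled instead of observations redrawn, gives a permutation test.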

The class has two main goals: (1) to teach the general theory and algorithmic procedures of several widely used statistical models; (2) to develop fluency in statistical programming. The class emphasizes implementation over statistical theory, although a non-trivial amount of theory will be covered. Students will gain computational skills and practical experience with simulation and statistical modeling.

This course requires a significant amount of programming. Each homework set involves the analysis of a real dataset or the implementation of an algorithm in a high-level programming language (Matlab/R/Perl/Python).

Upon successfully completing this course, students will be able to:

  1. Understand the general theories and computational procedures of the algorithms covered in the class.
  2. Implement the algorithms using high-level programming languages.
  3. Apply the methods to model real world data.

The target audience includes Biostatistics and Bioinformatics Ph.D. students, as well as other graduate students working in highly quantitative fields.

Prerequisites: BIOS 510 or equivalent. BIOS 545 “Introduction to R Programming” or prior programming experience in one or more high-level programming languages (Matlab/R/Perl/Python).

Grading: Three sets of homework, each worth 30%. Class attendance and participation, 10%.

Recommended books:

  1. Lange, K., Numerical Analysis for Statisticians.
  2. Gilks, W. R. et al., Markov Chain Monte Carlo in Practice.
  3. Press, W. H. et al., Numerical Recipes in C.
  4. Durbin, R. et al., Biological Sequence Analysis.

Class schedule and notes

Date Lecture Title Description Homework
8/30 (Thur) Lecture 1: Random number generation, permutation and bootstrap (Wu) [Notes] Course information. Random number generation. Permutation and bootstrap.
9/4 (Tues) Lecture 2: Optimization methods (Wu) [Notes] Optimization algorithms: Newton-Raphson, quasi-Newton, iteratively re-weighted least squares for generalized linear models.
9/6 (Thur) Lecture 3: EM algorithm I (Wu) [Notes] EM algorithm and applications.
9/11 (Tues) Lecture 4: EM algorithm II (Wu) [Notes] EM algorithm extensions, SEM algorithm, EM gradient algorithm, ECM algorithm.
9/13 (Thur) Lecture 5: MM algorithm (Wu) [Notes] MM algorithm and applications. homework1
9/18 (Tues) Lecture 6: HMM I (Wu) [Notes] Introduction to HMM. Forward-backward algorithm.
9/20 (Thur) Lecture 7: HMM II (Wu) [Notes] Viterbi algorithm.
9/25 (Tues) Lecture 8: Linear programming I (Wu) [Notes] Introduction. Simplex algorithm. Duality.
9/27 (Thur) Lecture 9: Linear programming II (Wu) [Notes] Interior point method. Quadratic programming. Applications of LP: quantile regression.
10/2 (Tues) Lecture 10: Linear programming III (Wu) [Notes] Applications of LP: LASSO, support vector machines. homework2, Data for question 1
10/4 (Thur) Lecture 11: MCMC I (Qin) [Notes] Introduction to MCMC.
10/9 (Tues) Fall break, no class
10/11 (Thur) Lecture 12: MCMC II (Qin) [Notes] Introduction to sequential MC.
10/16 (Tues) Lecture 13: MCMC III (Qin) [Notes, handout] Other techniques, applications of MCMC. homework3
10/18 (Thur) Lecture 14: Dimension reduction (Wu) [Notes] Introduces two dimension reduction methods: non-negative matrix factorization (NMF) and latent Dirichlet allocation (LDA).
10/23 (Tues) Lecture 15: Introduction to deep learning [Notes] Perceptron, neural networks, deep architectures, convolutional neural networks, transfer learning.
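As a small illustration of the implementation style the schedule calls for, the Newton-Raphson iteration from Lecture 2 can be written in a few lines of Python. The exponential-rate example below is a hypothetical toy (not taken from the lecture notes): the MLE solves the score equation n/λ − Σxᵢ = 0, so the iteration should converge to λ̂ = n/Σxᵢ.

```python
def newton_raphson(f, fprime, x0, tol=1e-10, max_iter=50):
    """Find a root of f by Newton-Raphson iteration: x <- x - f(x)/f'(x)."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / fprime(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# Toy example: MLE of an exponential rate. The score equation is
# n/lam - sum(x) = 0, with closed-form solution lam_hat = n / sum(x).
xs = [0.5, 1.2, 0.8, 2.0, 0.3]
score = lambda lam: len(xs) / lam - sum(xs)
score_deriv = lambda lam: -len(xs) / lam ** 2
lam_hat = newton_raphson(score, score_deriv, x0=1.0)
print(lam_hat)  # converges to 5 / 4.8
```

Starting the iteration close to the solution matters in general; here any positive starting value in the right neighborhood converges quadratically.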

Additional readings and resources: