This is a detailed 8-week learning plan for high school students (sophomores and above). Students are expected to have basic knowledge of programming and statistics. The workload is 2-3 hours per day on weekdays.


Learning objective

To develop essential skills for data analysis, including visualization, summarization, basic statistical inference, and report writing.


Content

Basic statistics knowledge

  • Understand data formats: vectors and matrices.
  • Understand the concept of a data distribution.
  • Summary statistics: mean, median, variance, standard deviation, proportions.
  • Understand common data visualizations and how they relate to data distributions and summary statistics: boxplot, histogram, scatter plot, bar plot.
  • Concepts of correlation and contingency tables.
  • Basic statistical inference: two-group t- and z-tests, simple linear regression.
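
As a rough illustration of the statistics listed above, here is a minimal R sketch. The vectors `group_a` and `group_b` are made-up values chosen for this example, not course data, and treating them as paired values for the correlation and regression is an assumption made only to keep the example short.

```r
# Hypothetical measurements for two groups (illustrative values only)
group_a <- c(2, 4, 4, 4, 5, 5, 7, 9)
group_b <- c(3, 5, 6, 6, 7, 8, 8, 10)

# Summary statistics for group_a
mean_a   <- mean(group_a)      # arithmetic mean
median_a <- median(group_a)    # middle value
var_a    <- var(group_a)       # sample variance (divides by n - 1)
sd_a     <- sd(group_a)        # sample standard deviation

# Proportion of values in group_a that are greater than 4
prop_a <- mean(group_a > 4)

# Contingency table: group membership versus "value greater than 4"
values <- c(group_a, group_b)
groups <- rep(c("a", "b"), each = length(group_a))
tab    <- table(group = groups, above_4 = values > 4)

# Correlation, treating the two vectors as paired values (an assumption)
cor_ab <- cor(group_a, group_b)

# Two-group t-test (R uses the Welch version by default)
t_res <- t.test(group_a, group_b)

# Simple linear regression of group_b on group_a
fit <- lm(group_b ~ group_a)

print(c(mean = mean_a, median = median_a, variance = var_a, sd = sd_a))
print(tab)
print(t_res$p.value)
print(coef(fit))
```

Note that `var()` and `sd()` compute the sample versions, which divide by n - 1 rather than n.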

Programming language

  • R: for data analysis and visualization.
    • Basic usage of RStudio.
    • Using the R console for basic numeric computation.
    • Understand R data types, including vectors, matrices, and data frames.
    • File input and output: how to read in files and write results out.
    • Basic R graphics, and how to output R figures to a file.
    • Basic data analysis: compute summary statistics, simple statistical tests, linear regression.
  • Markdown and R Markdown: for writing reports.
    • Create Markdown files and convert them to PDF and/or HTML.
    • Basic syntax for different types of formatting.
    • How to include figures in the document.
    • How to include LaTeX-style equations.
  • A little bit of LaTeX.
    • I don’t expect students to write standalone LaTeX documents, but they need to know how to write equations in LaTeX and insert them into R Markdown documents.
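
The file input/output and figure-output items above can be sketched in R as follows. This is only a minimal example: the `scores` data frame and its columns are hypothetical values invented for illustration, and temporary files are used so that nothing is written into the working directory.

```r
# Small example data frame (made-up values for illustration)
scores <- data.frame(
  student = c("A", "B", "C", "D"),
  hours   = c(1, 2, 3, 4),
  score   = c(55, 65, 70, 85)
)

# Write the data to a CSV file in a temporary location, then read it back
csv_path <- tempfile(fileext = ".csv")
write.csv(scores, csv_path, row.names = FALSE)
scores_in <- read.csv(csv_path)

# Save a scatter plot to a PDF file
pdf_path <- tempfile(fileext = ".pdf")
pdf(pdf_path, width = 5, height = 4)    # open the PDF graphics device
plot(scores_in$hours, scores_in$score,
     xlab = "Hours studied", ylab = "Score", main = "Score vs. hours")
dev.off()                               # close the device to finish the file

# Save an R object to disk and load it again
rds_path <- tempfile(fileext = ".rds")
saveRDS(scores_in, rds_path)
scores_again <- readRDS(rds_path)
```

In a real project the paths would normally be ordinary file names such as `"scores.csv"`. The `png()` and `jpeg()` devices work the same way as `pdf()`: open the device, draw the plot, then call `dev.off()`.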

Data analysis

  • Analyze a COVID-19 dataset.
  • Write a final report in the standard format of a scientific paper.

Materials

Below are some selected materials, websites, and videos. I’ll also use other materials, detailed in the weekly schedules.


Weekly schedules

Click the link for each week for the detailed learning schedule and homework. All homework needs to be written in R, Markdown, or R Markdown.

  • Week 1
    • Install R and RStudio.
    • Understand R and RStudio basics.
    • Understand basic R data types and operators: scalars, vectors, matrices. Understand how to combine and subset vectors and matrices.
    • Install a Markdown editor. Write a simple Markdown document.
  • Week 2
    • Understand the concept of R packages. Learn to install packages.
    • R data frame and list.
    • R file I/O (input/output): Understand tab-delimited and CSV files. Learn how to read those files into R and write results out to a text file.
    • How to save and load R objects.
    • Statistics:
      • Understand mean, median, variance, standard deviation, etc. Learn to use R to compute these values.
      • Understand the concepts of continuous and categorical data and their distributions. Understand the meaning of histograms, boxplots, and bar plots.
    • Basic R graphics. Understand the meaning of different types of plots. Use R to generate the plots.
    • Learn to save figures from an R session to pdf/jpg/png/etc.
    • Markdown: experiment with different fonts, ordered and unordered lists, and included figures. Write a complete Markdown document.
  • Week 3
    • R control statements: loops, if-else.
    • R graphics: plotting with colors, line types, point types, legends, etc.
    • Basic LaTeX.
    • R Markdown.
  • Week 4
    • Random number generation in R.
    • More advanced R graphics:
      • Overlay lines on a scatter plot.
      • Multiple panels in one figure.
      • Colors in R.
      • Figure margins.
    • Basic statistics:
      • Concepts of random variables and probability distributions.
    • Learn LaTeX math symbols. Insert LaTeX equations in R Markdown.
  • Week 5
    • Concepts of correlation and contingency tables. Relationships among two or more variables (continuous and/or categorical).
    • Use scatter plots and boxplots to explore relationships among two or more variables.
    • Understand the concept of simple linear regression.
    • Use R to do linear regression.
    • Write R functions.
  • Week 6
    • Linux system and commands.
    • R graphics, including base and ggplot2.
  • Week 7
    • R animation.
    • Review earlier content.
  • Week 8
    • Write a review of the whole course.
    • COVID-19 data analysis. Write a report.
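
To give a feel for how several of the topics from Weeks 3 to 5 fit together (random numbers, control statements, writing functions, linear regression, and overlaying a line on a scatter plot), here is a minimal R sketch. The simulated data, the true line y = 2 + 3x, and the helper name `fit_line` are all assumptions made for this example and are not part of the course materials.

```r
# Fix the seed so the random numbers are reproducible
set.seed(1)

# Simulate data from a known line: y = 2 + 3 * x + noise
n <- 100
x <- runif(n, min = 0, max = 10)
y <- 2 + 3 * x + rnorm(n, mean = 0, sd = 1)

# A small user-defined function (hypothetical name): fit a line and
# return the estimated intercept and slope
fit_line <- function(x, y) {
  fit <- lm(y ~ x)
  coef(fit)
}

est <- fit_line(x, y)

# A for loop with if-else: count points above and below the fitted line
above <- 0
below <- 0
for (i in seq_len(n)) {
  predicted <- est[1] + est[2] * x[i]
  if (y[i] > predicted) {
    above <- above + 1
  } else {
    below <- below + 1
  }
}

# Scatter plot with the fitted line overlaid, written to a temporary PDF
plot_path <- tempfile(fileext = ".pdf")
pdf(plot_path)
plot(x, y, col = "steelblue", pch = 19, main = "Simulated data")
abline(a = est[1], b = est[2], col = "red", lwd = 2)
legend("topleft", legend = c("data", "fitted line"),
       col = c("steelblue", "red"), pch = c(19, NA), lty = c(NA, 1))
dev.off()

print(est)
print(c(above = above, below = below))
```

Because the data are simulated from a known line, the estimated intercept and slope should come out close to 2 and 3, which is a handy way to check that the regression code behaves as expected before applying it to real data.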