Data science for high schoolers
This is a detailed 8-week learning plan for high schoolers (sophomores and above). Students are expected to have basic knowledge of programming and statistics. The workload is 2-3 hours per day on weekdays.
Learning objective
To develop essential skills for data analysis, including visualization, summarization, basic statistical inference, and report writing.
Content
Basic statistics knowledge
- Understand data formats: vector and matrix.
- Understand the concept of a data distribution.
- Summary statistics: mean, median, variance, standard deviation, proportions.
- Understand the common types of data visualization and how they relate to data distributions and summary statistics: boxplot, histogram, scatter plot, bar plot.
- Concepts of correlation and contingency tables.
- Basic statistical inference: two-group t-tests and z-tests, simple linear regression.
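As a minimal sketch of the summary statistics and two-group test listed above, the following R snippet works on simulated data. The group names, sample sizes, and means are assumptions chosen for illustration, not course data.

```r
# Simulated example data (assumed values, for illustration only)
set.seed(1)
group_a <- rnorm(30, mean = 50, sd = 10)
group_b <- rnorm(30, mean = 55, sd = 10)

# Summary statistics for one group
stats_a <- c(mean = mean(group_a), median = median(group_a),
             variance = var(group_a), sd = sd(group_a))
print(stats_a)

# Proportion of values in group A that are above 50
prop_above_50 <- mean(group_a > 50)

# Two-group t-test (R uses Welch's unequal-variance version by default)
result <- t.test(group_a, group_b)
print(result$p.value)
```

The standard deviation is the square root of the variance, so `stats_a["sd"]^2` should match `stats_a["variance"]`.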
Programming language
- R: for data analyses and visualization
- Basic usage of RStudio.
- Using the R console for basic numeric computation.
- Understand R data types, including vectors, matrices, and data frames.
- File input and output: how to read in files and write results out.
- Basic R graphics, and how to output R figures to file.
- Basic data analyses: compute summary statistics, run simple statistical tests, fit a linear regression.
- markdown and Rmarkdown: for writing reports.
- Create markdown files, convert to PDF and/or HTML.
- Basic syntax for different types of formatting.
- How to include figures in the document.
- How to include LaTeX-style equations.
- A little bit of LaTeX.
- I don’t expect students to write standalone LaTeX documents, but they need to know how to use LaTeX to write equations and insert them in Rmarkdown documents.
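For instance, the sample mean and sample variance could be written in LaTeX as shown below. This is only an illustrative pair of formulas. In an Rmarkdown document, a display equation goes between double dollar signs and an inline equation between single dollar signs.

```latex
$$
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i, \qquad
s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2
$$
```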
Data analysis
- Analyze a COVID-19 dataset.
- Write a final report in the standard format of a scientific paper.
Materials
Below are some selected materials, websites, and videos. I’ll also use other materials, detailed in the weekly schedules.
- R
- markdown
- LaTeX
- Rmarkdown
Weekly schedules
Click the link for each week for the detailed learning schedule and homework. All homework needs to be written in R, markdown, or Rmarkdown.
- Week 1
- Install R and RStudio.
- Understand R and RStudio basics.
- Understand basic R data types and operators: scalar, vector, matrix. Understand how to combine and subset vectors and matrices.
- Install a Markdown editor. Write a simple markdown document.
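The Week 1 material on combining and subsetting vectors and matrices can be sketched as follows. The variable names and values are arbitrary examples.

```r
# Scalars and vectors
x <- 5
v <- c(2, 4, 6, 8)
w <- c(v, x)             # combine: 2 4 6 8 5
v[2:3]                   # subset by position: 4 6
v[v > 4]                 # subset by condition: 6 8

# Matrices (filled column by column by default)
m <- matrix(1:6, nrow = 2)
m[1, ]                   # first row: 1 3 5
m[, 2]                   # second column: 3 4
m2 <- rbind(m, 7:9)      # add a row; result is 3 x 3
m3 <- cbind(m, c(0, 0))  # add a column; result is 2 x 4
```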
- Week 2
- Understand the concept of R packages. Learn to install packages.
- R data frames and lists.
- R file I/O (input/output): Understand tab-delimited and CSV files. Learn how to read those files into R and write results out to a text file.
- How to save and load R objects.
- Statistics:
- Understand mean, median, variance, standard deviation, etc. Learn to use R to compute these values.
- Understand the concept of continuous and categorical data and their distributions. Understand the meaning of histograms, boxplots, and barplots.
- Basic R graphics. Understand the meaning of different types of plots. Use R to generate the plots.
- Learn to save figures from an R session to PDF, JPG, PNG, etc.
- Markdown: experiment with different font styles, ordered and unordered lists, and figures. Write a complete markdown document.
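As a rough sketch of the Week 2 file I/O and figure-saving topics, the snippet below writes a small made-up data frame to CSV and tab-delimited files, reads it back, saves an R object, and saves a histogram to a PDF. The file names and values are assumptions, and everything is written to a temporary directory.

```r
# A small data frame with made-up values (illustration only)
df <- data.frame(id = 1:5, score = c(82, 91, 77, 88, 95))

# Write to CSV and tab-delimited files in a temporary directory
csv_file <- file.path(tempdir(), "scores.csv")
tsv_file <- file.path(tempdir(), "scores.txt")
write.csv(df, csv_file, row.names = FALSE)
write.table(df, tsv_file, sep = "\t", row.names = FALSE, quote = FALSE)

# Read the files back in
df_csv <- read.csv(csv_file)
df_tsv <- read.delim(tsv_file)

# Save an R object to disk and load it again
rds_file <- file.path(tempdir(), "scores.rds")
saveRDS(df, rds_file)
df_rds <- readRDS(rds_file)

# Save a histogram to a PDF file; dev.off() closes the file
pdf_file <- file.path(tempdir(), "scores_hist.pdf")
pdf(pdf_file)
hist(df$score, main = "Scores", xlab = "Score")
dev.off()
```

Swapping `pdf()` for `png()` or `jpeg()` follows the same open, plot, `dev.off()` pattern, provided the R installation supports those devices.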
- Week 3
- R control statements: loops, if-else.
- R graphics: customize plots with colors, line types, point types, legends, etc.
- Basic LaTeX.
- R markdown.
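The control statements covered in Week 3 might look like the sketch below. The scores and the cutoffs (90 and 60) are made up for illustration.

```r
# Classify each score with a for loop and if-else (made-up scores)
scores <- c(58, 73, 91, 66)
labels <- character(length(scores))
for (i in seq_along(scores)) {
  if (scores[i] >= 90) {
    labels[i] <- "high"
  } else if (scores[i] >= 60) {
    labels[i] <- "pass"
  } else {
    labels[i] <- "fail"
  }
}
print(labels)  # "fail" "pass" "high" "pass"

# A while loop: count how many doublings take 1 past 100
n <- 1
steps <- 0
while (n <= 100) {
  n <- n * 2
  steps <- steps + 1
}
print(steps)   # 7, since 2^7 = 128 is the first power of 2 above 100
```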
- Week 4
- Random number generation in R.
- More advanced R graphics:
- Overlay lines on a scatterplot.
- Multiple panels in one figure.
- Colors in R.
- Figure margins.
- Basic statistics:
- Concepts of random variables and probability distributions.
- Learn LaTeX math symbols. Insert LaTeX equations in Rmarkdown.
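Several Week 4 topics (random numbers, multiple panels, margins, colors, and overlaying a line) can be combined in one short sketch. The distributions, colors, and file name below are arbitrary choices, and the figure is written to a temporary PDF file.

```r
set.seed(42)                         # make the random draws reproducible
x <- runif(100, min = 0, max = 10)   # uniform random numbers
y <- 2 * x + rnorm(100, sd = 2)      # linear trend plus normal noise

pdf_file <- file.path(tempdir(), "week4_panels.pdf")
pdf(pdf_file, width = 8, height = 4)
old_par <- par(mfrow = c(1, 2),      # two panels side by side
               mar = c(4, 4, 2, 1))  # margins: bottom, left, top, right
hist(x, col = "lightblue", main = "Histogram of x")
plot(x, y, pch = 19, col = "gray40", main = "Scatterplot")
abline(a = 0, b = 2, col = "red", lwd = 2)  # overlay the line y = 2x
par(old_par)                         # restore the previous settings
dev.off()
```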
- Week 5
- Concepts of correlation and contingency tables. Relationships among two or more variables (continuous and/or categorical).
- Use scatterplots and boxplots to explore relationships among two or more variables.
- Understand the concept of simple linear regression.
- Use R to do linear regression.
- Write R functions.
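A minimal sketch of the Week 5 topics, assuming simulated data: the variable names (`hours`, `score`), the cutoffs, and the helper `fit_slope` are hypothetical and used only for illustration.

```r
# Simulated data (assumed values, for illustration only)
set.seed(7)
hours <- runif(50, 0, 10)
score <- 60 + 3 * hours + rnorm(50, sd = 5)
passed <- ifelse(score >= 75, "yes", "no")
studied_long <- ifelse(hours >= 5, "yes", "no")

# Correlation between two continuous variables
r <- cor(hours, score)

# Contingency table for two categorical variables
tab <- table(studied_long, passed)

# Simple linear regression of score on hours
fit <- lm(score ~ hours)
print(coef(fit))

# A small user-defined function: fitted slope of y on x
fit_slope <- function(x, y) {
  unname(coef(lm(y ~ x))[2])
}
print(fit_slope(hours, score))
```

Wrapping the regression in a function makes it easy to reuse the same computation on other pairs of variables.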
- Week 6
- Linux system and commands.
- R graphics, including base and ggplot2.
- Week 7
- R animation
- Review some earlier content.
- Week 8
- Write a review of the whole course.
- COVID-19 data analysis. Write a report.