Tutorial: Introduction to R and Data Visualization

Instructor: Abel Rodriguez, Professor, Applied Mathematics and Statistics, University of California, Santa Cruz, abel@ams.ucsc.edu

This short course covers two distinct but interrelated topics: it introduces both R (one of the premier languages for statistical analysis) and data visualization techniques, including how to implement them in R. The course is divided into 6 sessions of 105 minutes each (a notional total of 10.5 hours of instruction). Participants are expected to bring their own computers so they can work through the examples alongside the instructor.

There are numerous books and online courses on R, and many books on data visualization. Below are my favorites; important portions of the course are based on material drawn from them.

  • Venables, William N., Smith, D.M. and the R Core Team. An introduction to R. Available at https://cran.r-project.org/doc/manuals/R-intro.pdf
  • Maindonald, John, and John Braun. Data analysis and graphics using R: an example-based approach. Vol. 10. Cambridge University Press, 2006.
  • Venables, William N., and Brian D. Ripley. Modern applied statistics with S-PLUS. Springer Science & Business Media, 2013.
  • Cairo, Alberto. The Functional Art: An introduction to information graphics and visualization. New Riders, 2012.
  • Yau, Nathan. Visualize This. John Wiley & Sons, 2012.

Content

Session 1. Introduction to R
  • What is R? Basic syntax and operations.
  • Objects: vectors, arrays, data frames, lists.
  • Flow control.
  • Functions and vectorization.
  • Loading datasets.
  • Descriptive statistics.
  • Packages.
Session 2. Useful tools in R
  • Linear algebra.
  • Scripting.
  • Optimization.
Session 3. Introduction to Data Visualization
  • General principles
  • Pre-attentive processing
  • Choosing visual cues
  • Choosing coordinate systems
  • Choosing scales
  • Choosing context information
Session 4. Data Visualization in R
  • Case study: Combining time series and part-to-whole relationships
  • Case study: Hierarchically classified data
  • Case study: Periodic data
  • Case study: Two time series with different scales
  • Case study: Uncertainty bands
Session 5. Basic data analysis in R
  • Fitting linear models
  • Fitting generalized linear models
  • Multivariate statistics
Session 6. Advanced Topics
  • Greek letters and math notation in R
  • Mapping
  • Network data
  • Trees and dendrograms
  • Reproducible research