Essential Guide to Data Analysis in R

Aug 3, 2024

Introduction to Data Analysis with R

Installation

  • Install R first, then RStudio (a front end for R)
  • Search the web for 'RStudio' or 'RStudio Desktop' and follow the installation instructions
  • Day to day, open RStudio only; there is no need to launch R itself

Basic Operations

  • RStudio can be used as a graphing calculator
    • Example: 5 + 7, abs(-17)
  • Variable assignment
    • Example: x <- -12, y <- c(-12, 6, 0, -1)
    • Use <- for assignment (preferred over =)
  • Operations on variables
    • Example: x + 7, abs(x), y * 2, abs(y)
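
The operations above can be tried directly in the RStudio console. The following is a minimal sketch using only base R and the values from the examples:

```r
# Use R as a calculator
5 + 7        # 12
abs(-17)     # 17

# Assign a single value and a vector with <-
x <- -12
y <- c(-12, 6, 0, -1)

# Functions and arithmetic work on single values and element-wise on vectors
x + 7        # -5
abs(x)       # 12
y * 2        # -24 12 0 -2
abs(y)       # 12 6 0 1
```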

Data Import

  • Import data sets via the file browser in the lower right of RStudio
    • Example: Import an Excel or CSV file
    • Use the read_excel function (readxl package) for Excel files; read_csv (readr package) reads CSV files
    • Store the resulting data frame in a variable
    • Use View() to see the imported data
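
The import steps can also be written as code instead of using the file browser. This sketch assumes the readxl package is installed and uses one of the sample workbooks that ships with readxl as a stand-in for your own file; the variable names are illustrative.

```r
library(readxl)

# readxl ships with sample workbooks; this one stands in for your own file
path <- readxl_example("datasets.xlsx")

# Store the imported data frame in a variable
demo_data <- read_excel(path)

# Inspect it: View() opens RStudio's data viewer, head() prints the first rows
# View(demo_data)
head(demo_data)
```

For your own data, replace path with the location of your Excel file.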

Scooby-Doo Data Set Example

  • Example data set from Tidy Tuesday
    • Install the readxl package once using install.packages('readxl')
    • Load it with library(readxl) to import Excel data
    • Import and view the data set: Scooby <- read_excel('file_path'), then View(Scooby)
  • Summary statistics
    • Calculate mean: mean(Scooby$runtime)
    • Handle missing values: mean(Scooby$imdb, na.rm = TRUE)
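
To see why na.rm = TRUE matters, here is a small sketch with a hypothetical stand-in data frame. The values are made up for illustration and are not taken from the Scooby-Doo data set; only the column names follow the examples above.

```r
# Hypothetical stand-in for the imported data; the values are made up
scooby_demo <- data.frame(
  runtime = c(22, 21, 23, 22),
  imdb    = c(8.1, NA, 7.5, 7.9)
)

mean(scooby_demo$runtime)              # 22
mean(scooby_demo$imdb)                 # NA, because one rating is missing
mean(scooby_demo$imdb, na.rm = TRUE)   # mean of the three non-missing ratings
```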

Scripting

  • Use R scripts to document and encode your analysis
    • Example: Create a new script with File -> New File -> R Script
    • Save the script, and run the current line or selection with Cmd + Enter (Mac) or Ctrl + Enter (PC)
    • Load packages at the top of the script, e.g. library(tidyverse)

Data Manipulation

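  • List built-in data sets: data()
  • View a data set: View(mpg), ?mpg, glimpse(mpg) (mpg comes with ggplot2, which library(tidyverse) loads)
  • Filter rows with the filter function
    • Example: filter(mpg, cty >= 20)
    • Save filtered data: mpg_efficient <- filter(mpg, cty >= 20)
  • Add or change columns with the mutate function
    • Example: mpg_metric <- mutate(mpg, cty_metric = cty * conversion_factor), where conversion_factor is a number you define beforehand (about 0.425 converts miles per gallon to km per litre)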
  • Use pipes %>% to chain commands
    • Example: mpg %>% filter(cty >= 20) %>% mutate(cty_metric = cty * conversion_factor)
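
The filter, mutate, and pipe steps above can be combined into one runnable sketch. It assumes dplyr and ggplot2 are installed (library(tidyverse) loads both). The value of conversion_factor is an assumption added here, since the notes do not define it: 1 US mile per gallon is roughly 0.425144 km per litre.

```r
library(dplyr)     # filter(), mutate(), %>%, glimpse()
library(ggplot2)   # provides the mpg data set

# Assumed value: converts US miles per gallon to km per litre
conversion_factor <- 0.425144

# Keep only cars with city mileage of at least 20 mpg
mpg_efficient <- filter(mpg, cty >= 20)

# The same filter, chained with a mutate step using the pipe
mpg_metric <- mpg %>%
  filter(cty >= 20) %>%
  mutate(cty_metric = cty * conversion_factor)

glimpse(mpg_metric)
```

The piped version reads top to bottom in the order the steps happen, which is the main reason to prefer it over nesting the function calls.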

Grouped Summaries

  • Group rows with group_by, then compute one row of statistics per group with summarize
    • Example: mpg %>% group_by(class) %>% summarize(mean_cty = mean(cty, na.rm = TRUE), median_cty = median(cty, na.rm = TRUE))
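
Written across several lines, the grouped summary is easier to read. This sketch assumes dplyr and ggplot2 are installed; the n_cars column is an addition for illustration and is not part of the original example.

```r
library(dplyr)
library(ggplot2)   # provides the mpg data set

class_summary <- mpg %>%
  group_by(class) %>%
  summarize(
    mean_cty   = mean(cty, na.rm = TRUE),
    median_cty = median(cty, na.rm = TRUE),
    n_cars     = n()   # added for illustration: number of rows in each group
  )

class_summary   # one row per vehicle class
```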

Data Visualization

  • Use ggplot2 for plotting
    • Example: ggplot(mpg, aes(x = cty)) + geom_histogram()
    • Add axis labels: + labs(x = 'City Mileage')
    • Scatter plot: ggplot(mpg, aes(x = cty, y = hwy)) + geom_point()
    • Add regression line: + geom_smooth(method = 'lm')
    • Color by category: aes(color = class)
    • Use color palettes: + scale_color_brewer(palette = 'Dark2')
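
The plotting pieces above can be assembled into two complete plots. This is a sketch assuming ggplot2 is installed; the bins value, the y-axis label, and the object names are choices made here for illustration.

```r
library(ggplot2)

# Histogram of city mileage with an axis label
p_hist <- ggplot(mpg, aes(x = cty)) +
  geom_histogram(bins = 20) +          # bins = 20 is an illustrative choice
  labs(x = "City Mileage")

# Scatter plot coloured by class, with a single linear regression line
p_scatter <- ggplot(mpg, aes(x = cty, y = hwy)) +
  geom_point(aes(color = class)) +
  geom_smooth(method = "lm") +
  scale_color_brewer(palette = "Dark2") +
  labs(x = "City Mileage", y = "Highway Mileage")

print(p_hist)
print(p_scatter)
```

Placing aes(color = class) inside geom_point colours only the points, so geom_smooth fits one line to all cars. Moving it into the ggplot() call would instead fit a separate line per class.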

Reporting with R Markdown

  • Create R Markdown documents for sharing results
    • Example: File -> New File -> R Markdown
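    • Add code chunks delimited by triple backticks, with {r} right after the opening backticks
    • Render documents with the Knit button
    • Modify the template: replace its sample code with your own, e.g. library(tidyverse) and glimpse(mpg)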
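
To make the chunk syntax concrete, the sketch below writes a minimal R Markdown file from R code. The title and file name are hypothetical, and the fence string is built with strrep() only to avoid typing literal backticks inside this example.

```r
# Three backticks, the delimiter that opens and closes a code chunk
fence <- strrep("`", 3)

# Lines of a minimal R Markdown document (title and contents are illustrative)
rmd_lines <- c(
  "---",
  "title: \"mpg overview\"",
  "output: html_document",
  "---",
  "",
  paste0(fence, "{r}"),   # opening fence followed by {r}
  "library(tidyverse)",
  "glimpse(mpg)",
  fence                    # closing fence
)

# Write the document to a temporary file
rmd_path <- file.path(tempdir(), "mpg_overview.Rmd")
writeLines(rmd_lines, rmd_path)

# Rendering from code instead of the Knit button
# (needs the rmarkdown package and pandoc):
# rmarkdown::render(rmd_path)
```

Opening the resulting file in RStudio and clicking Knit should produce an HTML report, provided the tidyverse package is installed.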

Tips

  • Comments in code: Use # to add comments; R ignores everything after # on a line
  • Customize code chunks: Use the gear icon in RStudio to adjust chunk options
  • Explore the R Markdown cheat sheets for formatting options