
Introduction to R and RStudio

Jan 3, 2026

Overview

  • Lecture: Introduction to Programming with R using RStudio.
  • Goals: Write basic R programs, use RStudio, handle user input, manage data types, read/write CSVs, explore data frames, vectors, and factors.
  • Examples: "hello.R" program, greeting users, vote counting programs, reading FiveThirtyEight CSV.

First Programs And RStudio

  • R is a language and interpreter for working with data.
  • RStudio is an IDE specialized for R with a Console and Editor.
  • Working directory: folder RStudio uses by default for files.
  • Create files with file.create("filename.R") and view files in the Files pane.
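
A minimal sketch of the working directory and file.create() in action. The name hello_path and the use of a temporary folder are assumptions made here so the example does not leave files in your project.

```r
# Show the folder R currently treats as the working directory.
print(getwd())

# Build a path inside a temporary folder (hello_path is a hypothetical name).
hello_path <- file.path(tempdir(), "hello.R")

# file.create() makes an empty file and returns TRUE on success.
created <- file.create(hello_path)
print(created)

# file.exists() confirms the file is really there.
print(file.exists(hello_path))
```

In RStudio, calling file.create("hello.R") without a folder would place the file in the working directory, where it shows up in the Files pane.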

Key steps to write and run:

  • Save the script (Cmd/Ctrl-S) and run single lines with Run or Cmd/Ctrl-Enter.
  • Use Source to run entire script file.

Functions, Arguments, Side Effects, Return Values

  • Function syntax: name(arguments) — parentheses denote function invocation.
  • Arguments: inputs that affect function behavior.
  • Side effect: anything a function does besides returning a value (e.g., print outputs to console).
  • Return value: function output that can be stored and reused.
  • Example: print("hello, world") prints; readline(prompt) returns user's input.
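
To make the difference between a side effect and a return value concrete, here is a small sketch. nchar() is not part of the lecture; it is used here only as an assumed example of a function whose result is worth storing.

```r
# Side effect: print() writes its argument to the console.
print("hello, world")

# Return value: nchar() counts characters and hands the count back,
# so the result can be stored and reused later.
text_length <- nchar("hello, world")
print(text_length)  # 12
```

readline() behaves like nchar() in this respect: its return value (the typed text) is only useful if it is stored or passed to another function.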

Variables And Assignment

  • Assignment operator: <- (or = in some contexts). Example: name <- readline("What's your name?")
  • Environment pane shows stored objects and their values.
  • Use comments (#) to document code above relevant lines.

String Concatenation

  • Strings: text wrapped in double (or single) quotes.
  • Concatenate with paste(..., sep = " ") or paste0(...) for no separator.
  • Named parameter example: sep sets separator; default sep = " " in paste.
  • Function composition: nested function calls compose functions; innermost runs first.
  • Readability trade-off: prefer separate assignments and comments for clarity.
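
The points above can be sketched as follows. The variable names are placeholders, and toupper() is an assumption added here only to show a nested call.

```r
first <- "hello"
second <- "world"

# paste() joins its arguments with a single space by default.
spaced <- paste(first, second)              # "hello world"

# The named parameter sep overrides the default separator.
custom <- paste(first, second, sep = ", ")  # "hello, world"

# paste0() joins with no separator at all.
joined <- paste0(first, second)             # "helloworld"

# Composition: the inner paste() runs first, then toupper() receives its result.
shout <- toupper(paste(first, second))      # "HELLO WORLD"

print(spaced)
print(custom)
print(joined)
print(shout)
```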

Basic Debugging

  • Common error: mistyped function names produce "could not find function" error.
  • Use error messages to locate and fix bugs.
  • Use ?function to access documentation (help pages).
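
As an illustration of the "could not find function" error, the sketch below calls a deliberately misspelled function. tryCatch() is not covered in the lecture; it is used here only as an assumption so the script keeps running and the message can be inspected.

```r
# `prnt` is a deliberate typo for print(), so R cannot find it.
error_text <- tryCatch(
  prnt("hello, world"),
  # If an error occurs, return its message instead of stopping the script.
  error = function(e) conditionMessage(e)
)

print(error_text)  # could not find function "prnt"
```

Typed directly at the Console without tryCatch(), the same call stops with the same message, which points straight at the misspelled name.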

User Input Example: Dynamic Greeting

  • Steps:
    • name <- readline("What's your name?")
    • greeting <- paste("Hello, ", name, sep = "")
    • print(greeting)
  • Or compose print(paste0("Hello, ", name)) to avoid storing greeting.
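
A sketch of the greeting program that also runs outside an interactive session. The interactive() check and the fallback name "world" are assumptions added here: readline() only waits for typing when R is running interactively.

```r
# Ask for a name when someone is at the keyboard; otherwise use a placeholder.
# The trailing space in the prompt keeps the typed answer off the question mark.
name <- if (interactive()) readline("What's your name? ") else "world"

# paste0() joins the pieces without adding a separator.
greeting <- paste0("Hello, ", name)
print(greeting)
```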

Arithmetic And Data Types

  • Arithmetic operators: +, -, *, /.
  • readline() returns character strings; convert to numeric with as.integer() or as.double().
  • Data types (storage modes): character, integer, double, etc.
  • Coercion: converting storage mode via as.integer(), as.double(), as.character().
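
A short sketch of storage modes and coercion. The values are made up, and typeof() and suppressWarnings() are assumptions added here to make the storage mode and the failure case visible.

```r
# What readline() would hand back: digits stored as text.
typed <- "42"
print(typeof(typed))        # "character"

# Coerce the text to a whole number.
as_int <- as.integer(typed)
print(typeof(as_int))       # "integer"

# Coerce text with a decimal point to a double.
as_dbl <- as.double("3.5")
print(typeof(as_dbl))       # "double"

# Coerce a number back to text.
as_chr <- as.character(as_int)
print(as_chr)               # "42"

# Text that is not a number becomes NA (R also raises a warning,
# silenced here so the output stays short).
not_a_number <- suppressWarnings(as.integer("abc"))
print(not_a_number)         # NA
```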

Counting Votes Example

  • Read three inputs for Mario, Peach, Bowser and compute total:
    • mario <- as.integer(readline("Enter votes for Mario:"))
    • peach <- as.integer(readline("Enter votes for Peach:"))
    • bowser <- as.integer(readline("Enter votes for Bowser:"))
    • total <- sum(mario, peach, bowser)
    • print(paste("Total votes", total))
  • Prefer sum(...) to add any number of candidates.
  • Use function composition to convert and assign in one line.
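
The vote-counting steps can be tried without typing anything by standing in for readline() with fixed strings. The vote counts below are invented for illustration.

```r
# Simulated keyboard input: readline() would return strings like these.
mario_typed <- "100"
peach_typed <- "150"
bowser_typed <- "120"

# Convert each answer to an integer before doing arithmetic.
mario <- as.integer(mario_typed)
peach <- as.integer(peach_typed)
bowser <- as.integer(bowser_typed)

# sum() accepts any number of arguments, so adding a candidate is easy.
total <- sum(mario, peach, bowser)
print(paste("Total votes:", total))  # "Total votes: 370"
```

Skipping the conversion and adding the strings directly with + would fail, because arithmetic operators do not accept character values.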

Data Representation: Tables, CSV, Data Frames

  • Tabular data: rows (observations) and columns (variables).
  • CSV (comma-separated values) is a common file format for tables.
  • Read CSV into R:
    • read.table("file.csv", sep = ",", header = TRUE) or read.csv("file.csv")
  • Data frame: R's tabular data structure (columns can have names and types).
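
A self-contained sketch of both ways to read a CSV. The file is written to a temporary location first; the column name candidate and all values are assumptions made for the example.

```r
# Write a tiny CSV to a temporary file so the example needs no external data.
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("candidate,poll,mail",
    "Mario,100,20",
    "Peach,150,30",
    "Bowser,120,10"),
  csv_path
)

# Generic reader: the separator and header must be stated explicitly.
votes_table <- read.table(csv_path, sep = ",", header = TRUE)

# Convenience reader: comma separator and header row are the defaults.
votes <- read.csv(csv_path)

print(votes)
# For a simple file like this one, both readers give the same data frame.
print(all.equal(votes, votes_table))
```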

Table: CSV Read Example

Function | Purpose
read.table(file, sep, header) | Generic file reader; specify separator and header.
read.csv(file) | Convenience wrapper for CSV files; assumes sep = "," and header = TRUE.

Inspecting Data Frames And Vectors

  • Use ls() to list environment objects; rm(list = ls()) to clear environment.
  • View(data_frame) opens a spreadsheet-like tab.
  • nrow(df) and ncol(df) return row and column counts.
  • Accessing elements:
    • Bracket notation df[row, col] (indices start at 1).
    • Omit row or column to select whole column or row: df[, 2] gives all rows in column 2.
    • Column by name: df$column_name returns a vector of that column.
  • df[, 1] returns a vector of the first column; df[1] returns a data frame containing only the first column.
  • Use colnames(df) and rownames(df) to inspect names.

Table: Data Access Examples

Syntax | Returns
df[1, 2] | Single value at row 1, column 2
df[, 2] | Vector of all values in column 2
df$poll | Vector of the "poll" column, selected by name
df[1] | Data frame of first column (not a vector)
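
The access patterns in this section can be tried on a small data frame built in place. The data frame votes and its values are assumptions for illustration.

```r
# A small data frame with one row per candidate.
votes <- data.frame(
  candidate = c("Mario", "Peach", "Bowser"),
  poll = c(100L, 150L, 120L),
  mail = c(20L, 30L, 10L)
)

print(nrow(votes))      # 3
print(ncol(votes))      # 3
print(colnames(votes))  # "candidate" "poll" "mail"

# Row 1, column 2: a single value.
print(votes[1, 2])      # 100

# All rows of column 2: a vector.
print(votes[, 2])       # 100 150 120

# Same column selected by name: also a vector.
print(votes$poll)       # 100 150 120

# Single-bracket, single-index: still a data frame.
print(class(votes[2]))  # "data.frame"
```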

Vectors And Vectorized Operations

  • Vector: one-dimensional sequence of elements that all share the same storage mode.
  • Access vector elements with v[i]; indexing starts at 1.
  • Many R functions are vectorized: accept and operate on entire vectors efficiently.
  • Vector arithmetic is element-wise:
    • Example: votes$poll + votes$mail returns a vector of row-wise sums (per candidate).
  • Use sum(vector) to sum all elements of a numeric vector.
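
A minimal sketch of element-wise arithmetic, using two stand-alone vectors in place of the votes$poll and votes$mail columns. The numbers are invented.

```r
# One entry per candidate, in the same order in both vectors.
poll <- c(100, 150, 120)
mail <- c(20, 30, 10)

# Indexing starts at 1.
print(poll[1])          # 100

# Adding two vectors adds matching positions: 100+20, 150+30, 120+10.
per_candidate <- poll + mail
print(per_candidate)    # 120 180 130

# sum() collapses a numeric vector to a single total.
grand_total <- sum(per_candidate)
print(grand_total)      # 430
```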

Adding Columns To Data Frames And Writing CSVs

  • Add new column: df$new_col <- some_vector (length must equal number of rows).
  • Write data frame to CSV: write.csv(df, "filename.csv", row.names = FALSE) to avoid default row names column.
  • row.names argument controls whether row names are written.
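
The two steps above can be sketched together as follows. The data, the column name total, and the temporary output path are assumptions for the example; reading the file back is only there to check what was written.

```r
votes <- data.frame(
  candidate = c("Mario", "Peach", "Bowser"),
  poll = c(100, 150, 120),
  mail = c(20, 30, 10)
)

# Add a column: the vector on the right has one value per row.
votes$total <- votes$poll + votes$mail

# Write to a temporary file; row.names = FALSE leaves out the
# extra column of row names that write.csv() adds by default.
out_path <- tempfile(fileext = ".csv")
write.csv(votes, out_path, row.names = FALSE)

# Read the file back to confirm the new column was saved.
round_trip <- read.csv(out_path)
print(colnames(round_trip))  # "candidate" "poll" "mail" "total"
print(round_trip$total)      # 120 180 130
```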

Factors: Categorical Data

  • Factor: representation for categorical variables (one-dimensional).
  • Create factor: factor(vector, labels = c(...), exclude = c(...)) or as.factor().
  • Factor stores levels (unique categories) and can map numeric codes to human-readable labels.
  • NA represents missing/unavailable data; NULL and NaN are distinct special values.
  • Exclude unwanted categories when creating factors to convert them to NA.
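
A small sketch of factor() with labels and exclude. The numeric codes and their meanings are hypothetical: here 1, 2, 3 stand for Yes, No, Maybe, and 9 is treated as an unwanted code.

```r
# Hypothetical coded survey answers; 9 is a code we do not want to keep.
responses <- c(1, 2, 2, 3, 1, 9)

# Labels are matched to the remaining codes in sorted order (1, 2, 3).
# The excluded code 9 is dropped from the levels and becomes NA.
answers <- factor(
  responses,
  labels = c("Yes", "No", "Maybe"),
  exclude = 9
)

print(answers)
print(levels(answers))  # "Yes" "No" "Maybe"
print(is.na(answers))   # FALSE FALSE FALSE FALSE FALSE TRUE
```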

Table: Special Values And Conversions

Value/Function | Meaning/Use
NA | Missing / not available
NaN | Not a number (e.g., invalid numeric result)
NULL | Nothing / empty object
as.integer(x), as.double(x) | Coerce x to integer or double
factor(x, labels, exclude) | Convert x to categorical factor with labels and exclusions
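
To tell the three special values apart in practice, the sketch below uses is.na(), is.nan(), and is.null(). These test functions and the na.rm argument are not named in the lecture and are included here as assumptions.

```r
missing_value <- NA      # a value that exists but is unknown
invalid_result <- 0 / 0  # arithmetic with no valid answer gives NaN
empty_object <- NULL     # no object at all

print(is.na(missing_value))    # TRUE
print(is.nan(invalid_result))  # TRUE
print(is.null(empty_object))   # TRUE

# NULL has length zero, while NA still occupies one slot.
print(length(empty_object))    # 0
print(length(missing_value))   # 1

# A single NA makes a sum NA unless missing values are removed first.
print(sum(c(1, NA, 3)))                # NA
print(sum(c(1, NA, 3), na.rm = TRUE))  # 4
```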

Working With External Data (Example: FiveThirtyEight)

  • Read online CSV by providing URL to read.csv().
  • Use codebook to interpret column names and codes.
  • Use unique(df$column) to list unique values in a column.
  • Use factor() and labels to convert coded numeric responses to meaningful categories.
  • Use nrow() and ncol() to explore dataset size.
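
The workflow above can be rehearsed without a network connection. In the sketch below, inline text stands in for a downloaded CSV (with a real dataset, the URL would be passed to read.csv() instead). The column names, the codes, and the codebook meanings are all hypothetical and are not taken from any FiveThirtyEight dataset.

```r
# Inline CSV text standing in for an online file.
survey <- read.csv(text = "respondent,answer
1,2
2,1
3,2
4,3
5,1")

# Explore the size of the dataset.
print(nrow(survey))  # 5
print(ncol(survey))  # 2

# List the distinct codes that appear in the column.
codes <- sort(unique(survey$answer))
print(codes)         # 1 2 3

# Hypothetical codebook: 1 = Never, 2 = Sometimes, 3 = Always.
survey$answer <- factor(
  survey$answer,
  labels = c("Never", "Sometimes", "Always")
)
print(survey$answer)
```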

Common Patterns And Tips

  • Use ?function to read documentation (help pages) and learn parameters.
  • Prefer meaningful variable names and comments for readability.
  • Convert inputs to appropriate types immediately to avoid later errors.
  • Favor vectorized operations (sum on a column vector) over manual element-by-element loops for clarity and performance.
  • Use functions like unique(), sum(), nrow(), ncol(), View(), colnames(), rownames() to inspect and understand data.

Next Steps / Action Items

  • Practice:
    • Write simple R scripts using readline, print, paste/paste0, and assignment.
    • Read a CSV with read.csv and inspect nrow/ncol, View, and column access patterns.
    • Convert coded columns to factors with labels and exclude problematic codes.
  • Upcoming topics: data filtering/subsetting, transforming data, handling missing values, and more advanced data manipulation.