Overview
- Lecture: Introduction to Programming with R using RStudio.
- Goals: Write basic R programs, use RStudio, handle user input, manage data types, read/write CSVs, explore data frames, vectors, and factors.
- Examples: "hello.R" program, greeting users, vote counting programs, reading FiveThirtyEight CSV.
First Programs And RStudio
- R is a language and interpreter for working with data.
- RStudio is an IDE specialized for R with a Console and Editor.
- Working directory: folder RStudio uses by default for files.
- Create files with file.create("filename.R") and view files in the Files pane.
Key steps to write and run:
- Save script (Cmd/Ctrl-S) and run single lines via Run or Command/Ctrl-Enter.
- Use Source to run entire script file.
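The workflow above can be sketched end to end in a few lines. This is a minimal sketch, not the lecture's exact session: it writes "hello.R" into a temporary folder (an assumption, so nothing lands in your real working directory) and uses source(), the function behind the Source button.

```r
# Create an empty script file in a temporary folder
# (a stand-in for the working directory; this choice is an assumption).
script <- file.path(tempdir(), "hello.R")
created <- file.create(script)

# Put a one-line program into the file, as you would type it in the Editor.
writeLines('print("hello, world")', script)

# source() runs every line of the script file.
source(script)
```

Running the sourced script prints `[1] "hello, world"` to the Console.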
Functions, Arguments, Side Effects, Return Values
- Function syntax: name(arguments). The parentheses call (invoke) the function.
- Arguments: inputs that affect function behavior.
- Side effect: anything a function does besides returning a value (e.g., print() writes output to the console).
- Return value: function output that can be stored and reused.
- Example: print("hello, world") prints; readline(prompt) returns user's input.
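A short sketch may help separate the two ideas. nchar() and toupper() are base R functions that are not part of the notes above; they are used here only because they return values without needing user input.

```r
# nchar() has no visible side effect; it only returns a value, which we store.
len <- nchar("hello, world")

# print() has a side effect: it writes its argument to the console.
# It also returns that argument (invisibly), so the result can still be stored.
shown <- print("hello, world")

# A stored return value can be passed on to another function.
loud <- toupper(shown)
print(loud)
```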
Variables And Assignment
- Assignment operator: <- (or = in some contexts). Example: name <- readline("What's your name?")
- Environment pane shows stored objects and their values.
- Use comments (#) to document code above relevant lines.
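As a small illustration of assignment and comments, the sketch below hard-codes the value that readline() would return, since readline() needs an interactive session. The names used are taken from the vote-counting example and are otherwise arbitrary.

```r
# Store a value under the name `name`.
# Stand-in for readline("What's your name? "), which needs an interactive session.
name <- "Mario"

# Using the variable's name retrieves the stored value.
print(name)

# Assigning again replaces the old value; the Environment pane would now show "Peach".
name <- "Peach"
print(name)
```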
String Concatenation
- Strings: text wrapped in double quotes (single quotes also work).
- Concatenate with paste(..., sep = " ") or paste0(...) for no separator.
- Named parameter example: sep sets separator; default sep = " " in paste.
- Function composition: nested function calls compose functions; innermost runs first.
- Readability trade-off: prefer separate assignments and comments for clarity.
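The three ways of joining strings, and the nested versus step-by-step styles, can be compared side by side. This is an illustrative sketch; the variable names are made up for the example.

```r
name <- "Mario"

# paste() inserts sep between its arguments; the default sep is a single space.
with_default <- paste("Hello,", name)

# Setting sep = "" means any spacing must be part of the strings themselves.
with_sep <- paste("Hello, ", name, sep = "")

# paste0() is shorthand for paste(..., sep = "").
with_paste0 <- paste0("Hello, ", name)

# Composed style: paste0() runs first, then print() receives its result.
print(paste0("Hello, ", name))

# Step-by-step style: one more line, but each step can carry its own comment.
greeting <- paste0("Hello, ", name)
print(greeting)
```

All three variables hold the same string, "Hello, Mario".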
Basic Debugging
- Common error: mistyped function names produce "could not find function" error.
- Use error messages to locate and fix bugs.
- Use ?function to access documentation (help pages).
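To see the error described above without stopping the script, the sketch below wraps a deliberately mistyped call in tryCatch(). tryCatch() and conditionMessage() are base R functions not covered in these notes; they are used here only to capture the message as a string.

```r
# `prnt` is a deliberate typo for `print`.
# tryCatch() intercepts the error and hands back its message instead of halting.
message_text <- tryCatch(
  prnt("hello, world"),
  error = function(e) conditionMessage(e)
)

# The message names the function R could not find, which points at the typo.
print(message_text)

# In an interactive session, typing ?print opens the help page for print().
```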
User Input Example: Dynamic Greeting
- Steps:
  - name <- readline("What's your name?")
  - greeting <- paste("Hello, ", name, sep = "")
  - print(greeting)
- Or compose print(paste0("Hello, ", name)) to avoid storing greeting.
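Put together as a script, the greeting might look like the sketch below. The interactive() check is an addition not found in the notes: readline() only waits for input in an interactive session, so a fallback name is used otherwise.

```r
# Ask for a name when a person is at the console; otherwise fall back to "Mario".
# (The fallback is an assumption so the script also runs non-interactively.)
name <- if (interactive()) readline("What's your name? ") else "Mario"

# Build the greeting, then print it.
greeting <- paste0("Hello, ", name)
print(greeting)
```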
Arithmetic And Data Types
- Arithmetic operators: +, -, *, /.
- readline() returns a character string; convert it to a number with as.integer() or as.double() before doing arithmetic.
- Data types (storage modes): character, integer, double, etc.
- Coercion: converting storage mode via as.integer(), as.double(), as.character().
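A quick sketch of coercion in practice. typeof() reports the storage mode; it and suppressWarnings() are base R functions not mentioned in the notes, and the input strings are made up.

```r
# Text that looks like a number is still a character string.
raw <- "42"
print(typeof(raw))      # "character"

# raw + 1 would stop with "non-numeric argument to binary operator",
# so convert before doing arithmetic.
votes <- as.integer(raw)
print(typeof(votes))    # "integer"
print(votes + 1L)       # 43

# as.double() keeps the fractional part.
share <- as.double("0.5")
print(typeof(share))    # "double"

# Converting back to text.
back <- as.character(votes)

# Text that is not a number becomes NA (R also emits a warning, silenced here).
bad <- suppressWarnings(as.integer("forty-two"))
print(bad)              # NA
```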
Counting Votes Example
- Read three inputs for Mario, Peach, Bowser and compute total:
  - mario <- as.integer(readline("Enter votes for Mario:"))
  - peach <- as.integer(readline("Enter votes for Peach:"))
  - bowser <- as.integer(readline("Enter votes for Bowser:"))
  - total <- sum(mario, peach, bowser)
  - print(paste("Total votes", total))
- Prefer sum(...) to add any number of candidates.
- Use function composition to convert and assign in one line.
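The same program can be tried without typing input by substituting fixed strings for what readline() would return. The vote counts below are made-up values for illustration.

```r
# Each string stands in for the text readline() would hand back.
# as.integer() wraps the input so conversion and assignment happen in one line.
mario  <- as.integer("100")
peach  <- as.integer("150")
bowser <- as.integer("120")

# sum() accepts any number of arguments, so adding a candidate means adding one argument.
total <- sum(mario, peach, bowser)

print(paste("Total votes:", total))
```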
Data Representation: Tables, CSV, Data Frames
- Tabular data: rows (observations) and columns (variables).
- CSV (comma-separated values) is a common file format for tables.
- Read CSV into R:
  - read.table("file.csv", sep = ",", header = TRUE) or read.csv("file.csv")
- Data frame: R's tabular data structure (columns can have names and types).
Table: CSV Read Example
| Function | Purpose |
|---|---|
| read.table(file, sep, header) | Generic file reader; specify separator and header. |
| read.csv(file) | Convenience wrapper for CSV files; assumes sep="," and header=TRUE. |
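To compare the two readers on the same file, the sketch below first writes a tiny CSV to a temporary file. The file contents (candidate, poll, mail columns and their values) are invented for illustration.

```r
# Write a small CSV to a temporary file so the example is self-contained.
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("candidate,poll,mail",
    "Mario,100,50",
    "Peach,150,60",
    "Bowser,120,40"),
  csv_path
)

# Generic reader: the separator and header row must be stated.
votes_a <- read.table(csv_path, sep = ",", header = TRUE)

# CSV convenience wrapper: comma separator and header row are assumed.
votes_b <- read.csv(csv_path)

# Both calls produce a data frame with 3 rows and 3 columns.
print(votes_b)
print(dim(votes_a))
```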
Inspecting Data Frames And Vectors
- Use ls() to list environment objects; rm(list = ls()) to clear environment.
- View(data_frame) opens a spreadsheet-like tab.
- nrow(df) and ncol(df) return row and column counts.
- Accessing elements:
  - Bracket notation df[row, col] (indices start at 1).
  - Omit row or column to select whole column or row: df[, 2] gives all rows in column 2.
  - Column by name: df$column_name returns a vector of that column.
  - df[ , 1] returns vector of first column; df[1] returns a data frame with first column.
- Use colnames(df) and rownames(df) to inspect names.
Table: Data Access Examples
| Syntax | Returns |
|---|---|
| df[1, 2] | Single value at row 1, column 2 |
| df[, 2] | Vector of all values in column 2 |
| df$poll | Vector of "poll" column by name |
| df[1] | Data frame of first column (not a vector) |
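The access patterns in the table can be tried on a small data frame built by hand. The data below is invented; data.frame() is used instead of reading a file so the sketch stands alone.

```r
# A hand-built data frame with made-up values.
votes <- data.frame(
  candidate = c("Mario", "Peach", "Bowser"),
  poll = c(100L, 150L, 120L),
  mail = c(50L, 60L, 40L),
  stringsAsFactors = FALSE
)

print(nrow(votes))       # 3
print(ncol(votes))       # 3
print(colnames(votes))   # "candidate" "poll" "mail"

# Single value at row 1, column 2.
print(votes[1, 2])       # 100

# All rows of column 2, as a vector; same result as selecting the column by name.
print(votes[, 2])
print(votes$poll)

# Single-bracket selection with one index keeps the data frame structure.
print(class(votes[, 1])) # "character"
print(class(votes[1]))   # "data.frame"
```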
Vectors And Vectorized Operations
- Vector: one-dimensional sequence of elements that all share the same storage mode (not to be confused with R's separate list type).
- Access vector elements with v[i]; indexing starts at 1.
- Many R functions are vectorized: accept and operate on entire vectors efficiently.
- Vector arithmetic is element-wise:
  - Example: votes$poll + votes$mail returns a vector of row-wise sums (per candidate).
- Use sum(vector) to sum all elements of a numeric vector.
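A short sketch of element-wise arithmetic, using two made-up vectors in place of the votes$poll and votes$mail columns:

```r
# Stand-ins for the poll and mail columns (values are invented).
poll <- c(100, 150, 120)
mail <- c(50, 60, 40)

# Indexing starts at 1.
print(poll[1])            # 100

# + pairs up elements by position: 100+50, 150+60, 120+40.
per_candidate <- poll + mail
print(per_candidate)      # 150 210 160

# sum() collapses a numeric vector to a single total.
grand_total <- sum(per_candidate)
print(grand_total)        # 520
```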
Adding Columns To Data Frames And Writing CSVs
- Add new column: df$new_col <- some_vector (length must equal number of rows).
- Write data frame to CSV: write.csv(df, "filename.csv", row.names = FALSE) to avoid default row names column.
- row.names argument controls whether row names are written.
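The sketch below adds a computed column and writes the result to a temporary file, then reads it back to check what was saved. The data and the column name total are assumptions made for illustration.

```r
# A small data frame with made-up values.
votes <- data.frame(
  candidate = c("Mario", "Peach", "Bowser"),
  poll = c(100L, 150L, 120L),
  mail = c(50L, 60L, 40L),
  stringsAsFactors = FALSE
)

# The new column has one value per row, computed element-wise.
votes$total <- votes$poll + votes$mail

# row.names = FALSE keeps the 1, 2, 3 row labels out of the file.
out_path <- tempfile(fileext = ".csv")
write.csv(votes, out_path, row.names = FALSE)

# Reading the file back shows exactly the four columns that were written.
read_back <- read.csv(out_path)
print(read_back)
```

Without row.names = FALSE, the file would gain an extra unnamed first column holding the row names.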
Factors: Categorical Data
- Factor: representation for categorical variables (one-dimensional).
- Create factor: factor(vector, labels = c(...), exclude = c(...)) or as.factor().
- Factor stores levels (unique categories) and can map numeric codes to human-readable labels.
- NA represents missing/unavailable data; NULL and NaN are distinct special values.
- Exclude unwanted categories when creating factors to convert them to NA.
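As a sketch of labels and exclude working together, suppose a column stores voting methods as numeric codes, with 9 marking an invalid response. The codes and label names here are hypothetical, not taken from any real codebook.

```r
# Hypothetical coded responses: 1, 2, 3 are valid methods; 9 is an invalid entry.
codes <- c(1, 2, 3, 1, 9)

# exclude drops 9 from the levels, so it becomes NA.
# labels are matched to the remaining levels in sorted order: 1, 2, 3.
method <- factor(codes, labels = c("Poll", "Mail", "Proxy"), exclude = 9)

print(method)
print(levels(method))    # "Poll" "Mail" "Proxy"
print(is.na(method))     # only the last element is TRUE
```

Note that labels must have one entry per level that survives the exclusion, which is why three labels are supplied for the three remaining codes.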
Table: Special Values And Conversions
| Value/Function | Meaning/Use |
|---|---|
| NA | Missing / not available |
| NaN | Not a number (e.g., invalid numeric result) |
| NULL | Nothing / empty object |
| as.integer(x), as.double(x) | Coerce x to integer or double |
| factor(x, labels, exclude) | Convert x to categorical factor with labels and exclusions |
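The three special values behave differently in practice, as the sketch below illustrates. is.na(), is.nan(), is.null() and the na.rm argument of sum() are base R features not covered in the table above.

```r
# NA marks a missing value; arithmetic involving it stays missing.
scores <- c(1, NA, 3)
print(is.na(scores))              # FALSE TRUE FALSE
print(sum(scores))                # NA
print(sum(scores, na.rm = TRUE))  # 4

# NaN arises from undefined numeric operations such as 0 / 0.
undefined <- 0 / 0
print(is.nan(undefined))          # TRUE

# NULL is the absence of an object; it has length 0.
nothing <- NULL
print(is.null(nothing))           # TRUE
print(length(nothing))            # 0
```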
Working With External Data (Example: FiveThirtyEight)
- Read online CSV by providing URL to read.csv().
- Use codebook to interpret column names and codes.
- Use unique(df$column) to list unique values in a column.
- Use factor() and labels to convert coded numeric responses to meaningful categories.
- Use nrow() and ncol() to explore dataset size.
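The exploration steps above can be rehearsed offline. In the sketch below, inline text stands in for the downloaded file (read.csv() accepts a URL string in the same position as a file path), and the column names, codes, and labels are all hypothetical rather than taken from the FiveThirtyEight codebook.

```r
# Inline text stands in for a downloaded CSV; columns and codes are invented.
survey <- read.csv(text = "respondent,answer
1,1
2,2
3,1
4,3
5,2")

# How big is the dataset?
print(nrow(survey))   # 5
print(ncol(survey))   # 2

# Which codes actually occur in the column?
found <- unique(survey$answer)
print(found)          # 1 2 3

# Replace the codes with readable categories (labels are hypothetical).
survey$answer <- factor(survey$answer, labels = c("Yes", "No", "Unsure"))
print(survey$answer)
```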
Common Patterns And Tips
- Use ?function to read documentation (help pages) and learn parameters.
- Prefer meaningful variable names and comments for readability.
- Convert inputs to appropriate types immediately to avoid later errors.
- Favor vectorized operations (sum on a column vector) over manual element-by-element loops for clarity and performance.
- Use functions like unique(), sum(), nrow(), ncol(), View(), colnames(), rownames() to inspect and understand data.
Next Steps / Action Items
- Practice:
  - Write simple R scripts using readline, print, paste/paste0, and assignment.
  - Read a CSV with read.csv and inspect nrow/ncol, View, and column access patterns.
  - Convert coded columns to factors with labels and exclude problematic codes.
- Upcoming topics: data filtering/subsetting, transforming data, handling missing values, and more advanced data manipulation.