Understanding R Programming and Statistics

Oct 2, 2024

Lecture Notes

Announcements

  • Discord Server: An undergraduate learning assistant has set up a Discord server for students. The lecturer will not use it, but students are encouraged to participate.
  • Course Project:
    • Project Name: Prison Plot
    • Tasks: Recreate a plot with R code
    • Assumptions: Students are expected to have read three specific book chapters before completing it
    • Due Date: September 27th

R Programming: Vectors and Indexing

Vectors

  • Types: Numeric, character (text strings), logical (TRUE/FALSE)
  • Operations: Created with c() and manipulated with other functions
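A minimal sketch of the three vector types. The variable names and values are made up for illustration; c() ("combine") and class() are base R functions.

```r
# Numeric vector created with c() ("combine")
nums <- c(2.5, 7, 10)
# Character vector
words <- c("apple", "banana", "cherry")
# Logical vector (R spells these values TRUE and FALSE)
flags <- c(TRUE, FALSE, TRUE)

# class() reports the type of each vector
class(nums)   # "numeric"
class(words)  # "character"
class(flags)  # "logical"

# Arithmetic and many functions work on the whole vector at once
nums * 2      # 5 14 20
sum(flags)    # 2 (TRUE counts as 1, FALSE as 0)
```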

Indexing

  • Access Specific Elements: Use square brackets []
    • Example: randnums[5] gives the fifth element
  • Access Range of Elements: Use the colon operator (:) to build a sequence of indices
    • Example: randnums[1:3] gives the first three elements
  • Functions:
    • mean(): Calculate average, e.g., mean(randnums[1:3])
    • length(): Find the number of elements
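The indexing rules above, sketched on a small vector. The lecture's actual randnums values are not given, so the numbers below are hypothetical.

```r
# Hypothetical values; the lecture's randnums contents are not shown in the notes
randnums <- c(12, 5, 8, 21, 3, 17, 9, 14)

randnums[5]           # fifth element: 3
randnums[1:3]         # first three elements: 12 5 8
mean(randnums[1:3])   # (12 + 5 + 8) / 3 = 8.333333
length(randnums)      # number of elements: 8
```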

Accessing Elements

  • Last Element: Use the length() function as the index
    • Example: randnums[length(randnums)]
  • Last N Elements: Combine the colon operator with length()
    • Example (last five elements): randnums[(length(randnums)-4):length(randnums)]
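A sketch of both patterns on hypothetical values. The comparison with tail() is an addition not in the notes; tail() is a standard R function that returns the last n elements directly.

```r
randnums <- c(12, 5, 8, 21, 3, 17, 9, 14)  # hypothetical values

n <- length(randnums)
randnums[n]             # last element: 14

# Last five elements: positions (n - 4) through n.
# The parentheses matter: ":" binds tighter than "-", so n - 4:n means n - (4:n).
randnums[(n - 4):n]     # 21 3 17 9 14

# tail() gives the same result
tail(randnums, 5)
```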

Good vs Bad Practice

  • Hardcoding indices is bad practice; using functions like length() is better.
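A small illustration of why hardcoding is fragile, using hypothetical values: once the vector grows, a hardcoded index no longer points at the last element, while length() adapts automatically.

```r
randnums <- c(12, 5, 8, 21, 3, 17, 9, 14)  # hypothetical values, 8 elements
randnums[8]                  # hardcoded: happens to be the last element (14)
randnums[length(randnums)]   # general: also 14

# Add one more value; the vector now has 9 elements
randnums <- c(randnums, 30)
randnums[8]                  # hardcoded index still returns 14, which is no longer last
randnums[length(randnums)]   # still the last element: 30
```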

Help Documentation in R

  • Access with ?function_name (e.g., ?mean)
  • Describes each of the function's arguments in detail
  • Example: The mean() help page explains na.rm, which removes missing values (NA) before the mean is computed
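What na.rm changes in practice, on a hypothetical vector with one missing value. Running ?mean interactively opens the help page that documents this argument.

```r
x <- c(4, 8, NA, 6)     # hypothetical data with one missing value

mean(x)                 # NA: a single missing value makes the result missing
mean(x, na.rm = TRUE)   # 6: the NA is dropped, then (4 + 8 + 6) / 3
```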

R Basics: Data Frames

Data Frame Creation

  • Definition: R's equivalent of a spreadsheet
  • Syntax: data.frame(variable1, variable2, ...)
  • Requirements: Each column must have the same number of entries
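A minimal sketch of building a data frame from existing vectors. The column names and values are hypothetical; each vector has three entries, so the result has three rows.

```r
# Hypothetical columns, each with three entries
name  <- c("Ana", "Ben", "Cara")
age   <- c(21, 19, NA)
major <- c("Psych", "Stats", "Biology")

students <- data.frame(name, age, major)
students
nrow(students)   # 3 rows
ncol(students)   # 3 columns

# Mismatched lengths such as these produce an error:
# data.frame(x = 1:3, y = 1:2)
```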

Manipulation

  • Single Column Access: dataframe$column_name
  • Mean Calculation with NA Handling: mean(dataframe$column, na.rm=TRUE)
  • Add New Columns: dataframe$new_column = operation
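The three operations above on a small hypothetical data frame. The sketch uses `<-`, R's conventional assignment operator; the `=` written in the notes behaves the same way at the top level.

```r
students <- data.frame(
  name = c("Ana", "Ben", "Cara"),
  age  = c(21, 19, NA)            # hypothetical ages, one missing
)

students$age                         # single column as a vector: 21 19 NA
mean(students$age, na.rm = TRUE)     # 20: mean of 21 and 19

# New column computed from an existing one (NA stays NA)
students$age_next_year <- students$age + 1
students
```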

Subsetting Data

  • By Rows: dataframe[row_indices, ]
  • By Columns: dataframe[, column_indices]
  • By Specific Rows and Columns: dataframe[row_indices, column_indices]
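Each subsetting form, sketched on a hypothetical data frame. Leaving a position empty before or after the comma means "all rows" or "all columns"; columns can be given by number or by name.

```r
students <- data.frame(
  name  = c("Ana", "Ben", "Cara", "Dev"),
  age   = c(21, 19, NA, 23),
  major = c("Psych", "Stats", "Biology", "Stats")
)  # hypothetical data

students[1:2, ]                   # rows 1 and 2, all columns
students[, c(1, 3)]               # all rows, columns 1 and 3 (name, major)
students[2:3, "age"]              # rows 2-3 of the age column: 19 NA
students[2:3, c("name", "age")]   # rows 2-3, two columns selected by name
```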

Basic Statistical Concepts

Population and Sample

  • Population: Entire collection of observations
  • Sample: Subset of the population
  • Analogy: Pot of soup (population) and a sip (sample)

Parameters and Statistics

  • Parameter: Measurement of a population, denoted by Greek letters (e.g., μ)
  • Statistic: Measurement of a sample, denoted by Latin letters (e.g., x̄)
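A simulated sketch of the soup analogy, under the assumption that we can generate an entire hypothetical population (in real research the population is usually not fully observable). The population mean is the parameter; the mean of a random sample is a statistic that estimates it.

```r
set.seed(42)  # fixed seed so the random draws are reproducible

# Hypothetical population of 10,000 values (the whole pot of soup)
population <- rnorm(10000, mean = 50, sd = 10)
mu <- mean(population)              # parameter: population mean (mu)

# A sample of 100 values drawn from it (one sip)
smp <- sample(population, size = 100)
x_bar <- mean(smp)                  # statistic: sample mean (x-bar)

c(mu = mu, x_bar = x_bar)           # close, but usually not identical
```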

Descriptive vs Inferential Statistics

  • Descriptive Statistics: Summarize data (e.g., mean, median)
  • Inferential Statistics: Make inferences about populations based on samples
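Descriptive statistics in R, sketched on a hypothetical set of scores. mean(), median(), and summary() are base R functions.

```r
scores <- c(72, 85, 90, 66, 85, 78)   # hypothetical sample of test scores

mean(scores)      # 476 / 6 = 79.33333
median(scores)    # sorted: 66 72 78 85 85 90, so (78 + 85) / 2 = 81.5
summary(scores)   # min, quartiles, median, mean, and max in one call
```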

Random Sampling

  • Method ensuring every population element has an equal chance of selection
  • Important for reducing bias; ideal in survey research
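A minimal sketch of simple random sampling with base R's sample(), assuming a hypothetical list of 500 population IDs. Without a prob argument, every element is equally likely to be drawn, and sampling is without replacement by default.

```r
set.seed(1)  # reproducible draw

# Hypothetical sampling frame: IDs 1 to 500 for every member of the population
population_ids <- 1:500

# Simple random sample of 20 IDs, each ID equally likely, no repeats
chosen <- sample(population_ids, size = 20)
chosen
```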

Realistic Research Constraints

  • True random sampling is often impractical due to ethical and practical issues
  • Non-random samples, such as convenience samples, can introduce bias
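A simulated illustration of convenience-sample bias. The scenario is hypothetical: we assume the 50 easiest people to reach happen to be the youngest, and compare that sample with a random sample of the same size.

```r
set.seed(7)
# Hypothetical population of 1,000 ages
ages <- round(runif(1000, min = 18, max = 80))

# Hypothetical convenience sample: the 50 youngest (assumed easiest to reach)
convenience <- sort(ages)[1:50]

# Random sample of the same size for comparison
random_smp <- sample(ages, size = 50)

mean(ages)          # population mean
mean(random_smp)    # typically close to the population mean
mean(convenience)   # systematically too low: this sample is biased
```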

Conclusion

  • The basic R concepts covered here are foundational for future topics
  • Reading the course material is important for deeper understanding
  • Students are encouraged to ask questions and engage in upcoming sessions