Data Handling and T-Tests Overview

Nov 1, 2024

Lecture: Data Handling and T-Tests

Introduction

  • Recap of previous session: converting data from wide to long format.
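  • Discussion on data organization: most analyses need data in tidy (long) format. The paired t-test is the exception, because it compares two columns of measurements from the same subjects and so works directly on wide data.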
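
To illustrate why a paired t-test fits wide data, here is a minimal sketch. The data frame candle_wide and all of its values are hypothetical, invented for this example rather than taken from the lecture. It also shows that a paired test gives the same result as a one-sample t-test on the differences.

```r
# Hypothetical wide-format data: one row per participant, one column per room.
# Names and values are made up for illustration only.
candle_wide <- data.frame(
  participant = 1:4,
  bright = c(62, 58, 71, 66),
  dim    = c(55, 54, 65, 60)
)

# Paired t-test: each participant's two scores sit side by side in one row.
paired <- t.test(candle_wide$bright, candle_wide$dim, paired = TRUE)

# Equivalent one-sample t-test on the per-participant differences.
one_sample <- t.test(candle_wide$bright - candle_wide$dim)

print(paired)
print(one_sample)
```

Both calls report the same t statistic and confidence interval, which is why the difference scores can be treated as a single sample.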

Data Preparation

Converting Data

  • Converted wide format data (candle data) into long format.
  • Dependent Variable: Words Per Minute (in single column).
  • Independent Variable: Room Brightness (separate column).
  • Tidy data facilitates easier statistical analysis.
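
The conversion described above might look roughly like the following sketch. It assumes tidyr's pivot_longer was the function used, which the notes do not confirm. The data frame candle_wide, the column name wpm, and all values are hypothetical.

```r
library(tidyr)

# Hypothetical wide-format data: one row per participant, one column per room.
# Names and values are invented for illustration.
candle_wide <- data.frame(
  participant = 1:4,
  bright = c(62, 58, 71, 66),
  dim    = c(55, 54, 65, 60)
)

# Stack the two room columns: the old column names go into 'room'
# (independent variable) and the scores into 'wpm' (dependent variable).
candle_long <- pivot_longer(
  candle_wide,
  cols      = c(bright, dim),
  names_to  = "room",
  values_to = "wpm"
)

print(candle_long)
```

The result has one row per participant per room, so every words-per-minute value sits in a single column.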

Creating Data Frame for Plotting

  • Created plot_data to contain statistics necessary for plotting.
  • Used the summarize function to calculate statistics such as the mean.
  • Grouped the data by the room column first, so that each statistic is computed separately for the bright, dim, and difference groups.

Calculating Statistics

  • Calculated mean, sample size (n), standard deviation (SD) for each group.
  • Organized code for clarity and error prevention.
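
A minimal sketch of how plot_data could be built with dplyr follows. The names plot_data, room, mean, n, and SD come from the notes. The data frame candle_long, the column name wpm, and the values are hypothetical.

```r
library(dplyr)

# Hypothetical long-format data with three groups; values are invented.
candle_long <- data.frame(
  participant = rep(1:4, times = 3),
  room = rep(c("bright", "dim", "difference"), each = 4),
  wpm  = c(62, 58, 71, 66,  55, 54, 65, 60,  7, 4, 6, 6)
)

# One row of summary statistics per room.
plot_data <- candle_long %>%
  group_by(room) %>%
  summarize(
    mean = mean(wpm),  # group mean
    n    = n(),        # sample size
    sd   = sd(wpm),    # sample standard deviation
    .groups = "drop"
  )

print(plot_data)
```

Putting each statistic on its own line inside summarize keeps the code easy to scan and makes a missing comma or parenthesis easier to spot.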

Creating a Bar Plot

  • Used ggplot to create a bar plot with plot_data.
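  • Mapped room to the x-axis and mean to the y-axis with aes, and drew the bars with geom_bar. Because the means are already computed, geom_bar needs stat = "identity".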
  • Adjusted plot aesthetics (colors, alpha).
  • Put the bars in the intended order by converting the room column to a factor with explicit levels (otherwise they are ordered alphabetically).
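
The plotting steps above can be sketched as follows. The summary values, colour, alpha, and axis labels are hypothetical choices for illustration, not the ones used in the lecture.

```r
library(ggplot2)

# Hypothetical summary table standing in for plot_data.
plot_data <- data.frame(
  room = c("bright", "dim", "difference"),
  mean = c(64.25, 58.5, 5.75)
)

# Fix the bar order. Left as character, room would be plotted
# alphabetically: bright, difference, dim.
plot_data$room <- factor(
  plot_data$room,
  levels = c("bright", "dim", "difference")
)

# stat = "identity" tells geom_bar to use the supplied means as bar heights
# instead of counting rows.
p <- ggplot(plot_data, aes(x = room, y = mean)) +
  geom_bar(stat = "identity", fill = "steelblue", alpha = 0.7) +
  labs(x = "Room brightness", y = "Mean words per minute")

print(p)
```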

Error Bars and Confidence Intervals

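  • Calculated confidence intervals with the one-sample t-test method: mean ± critical t value × standard error, where the standard error is SD / sqrt(n).
  • Added the CIs as error bars to the plot using geom_errorbar.
  • Highlighted how to read the CIs: when the CI for the difference does not include zero, the difference between conditions is statistically significant at the corresponding level.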
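
A minimal sketch of the CI calculation and error bars follows. The 95% confidence level is an assumption, since the notes do not state the level used. The summary values and the column names se, t_crit, ci_lower, and ci_upper are hypothetical.

```r
library(dplyr)
library(ggplot2)

# Hypothetical summary statistics standing in for plot_data.
plot_data <- data.frame(
  room = factor(c("bright", "dim", "difference"),
                levels = c("bright", "dim", "difference")),
  mean = c(64.25, 58.5, 5.75),
  n    = c(4, 4, 4),
  sd   = c(5.56, 5.07, 1.26)
)

# One-sample CI for each group: mean +/- t_crit * SE (95% level assumed).
plot_data <- plot_data %>%
  mutate(
    se       = sd / sqrt(n),
    t_crit   = qt(0.975, df = n - 1),
    ci_lower = mean - t_crit * se,
    ci_upper = mean + t_crit * se
  )

# Bars for the means, error bars for the CIs.
p <- ggplot(plot_data, aes(x = room, y = mean)) +
  geom_bar(stat = "identity", fill = "steelblue", alpha = 0.7) +
  geom_errorbar(aes(ymin = ci_lower, ymax = ci_upper), width = 0.2)

print(plot_data)
print(p)
```

In this made-up example the lower bound for the difference group is above zero, so its error bar does not cross the zero line.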

Transition to Independent T-Tests

  • Explanation of moving from dependent (paired) comparisons, where the same subjects are measured twice, to comparisons between independent groups.

Independent T-Test: New Data

Dataset: Swimming

  • Data included swimmer classification (optimist/pessimist) and performance ratio.
  • Removed unnecessary columns (e.g., sex, event type).
  • Discussed nature of the study (observational, not an experiment).

Statistical Approach

  • Distinguished predictor from response variables: here the swimmer classification is the predictor and the performance ratio is the response.

Analyzing Data

  • Calculated summary statistics for optimists and pessimists.
  • Planned next steps for independent t-test analysis.
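
The column removal and group summaries for the swimming data could be sketched as below. The data frame swim, its column names (sex, event, attitude, ratio), and all values are hypothetical. The real dataset's names are not given in the notes.

```r
library(dplyr)

# Hypothetical swimming data; names and values are invented for illustration.
swim <- data.frame(
  sex      = c("F", "M", "F", "M", "F", "M"),
  event    = c("free", "back", "free", "fly", "back", "free"),
  attitude = c("optimist", "optimist", "optimist",
               "pessimist", "pessimist", "pessimist"),
  ratio    = c(0.99, 1.01, 0.98, 1.03, 1.02, 1.04)
)

swim_stats <- swim %>%
  select(-sex, -event) %>%   # drop columns not needed for this analysis
  group_by(attitude) %>%     # predictor: optimist vs. pessimist
  summarize(
    mean = mean(ratio),      # response: performance ratio
    n    = n(),
    sd   = sd(ratio),
    .groups = "drop"
  )

print(swim_stats)
```

Unlike the candle data, each swimmer appears in only one group, which is what makes the two sets of scores independent.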

Key Concepts

  • Importance of tidy data format for analysis.
  • Usage of summary statistics for plotting and analysis.
  • Understanding and visualizing statistical significance with confidence intervals.
  • Distinction between observational studies and experiments.
  • Statistical tests assess whether an observed difference could plausibly be due to sampling error; they do not by themselves establish causation.

Next Steps

  • Further exploration of independent t-tests and analysis of new data.

Additional Resources and Support

  • Extra office hours for project assistance.

This lecture focused on data handling techniques in R, particularly transforming data for analysis, and on basic concepts of paired and independent t-tests, using the tidyverse packages. The practical application involved creating plots and understanding statistical significance through confidence intervals.