Lecture: Data Handling and T-Tests
Introduction
- Recap of previous session: converting data from wide to long format.
- Discussion of data organization: most analyses need data in tidy (long) format. The exception is the paired t-test, where wide format (one column per condition) makes per-participant differences easy to compute.
Data Preparation
Converting Data
- Converted the wide-format candle data into long format.
- Dependent Variable: words per minute, stored in a single column.
- Independent Variable: room brightness, stored in its own column.
- Tidy data makes grouping, summarizing, and plotting more straightforward.
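The lecture notes do not reproduce the conversion code. Below is a minimal sketch of one way to do it with tidyr's pivot_longer(). The column names (participant, bright, dim, room, wpm) and the numbers are made-up assumptions for illustration, not the actual candle dataset.

```r
library(tidyr)

# Hypothetical wide-format candle data (made-up values):
# one row per participant, one column per room condition.
candle_wide <- data.frame(
  participant = 1:4,
  bright = c(52, 48, 60, 55),
  dim    = c(45, 47, 51, 50)
)

# pivot_longer() stacks the two condition columns:
# the old column names go into `room` (IV), the values into `wpm` (DV).
candle_long <- pivot_longer(
  candle_wide,
  cols = c(bright, dim),
  names_to = "room",
  values_to = "wpm"
)

print(candle_long)
```

Each participant now contributes two rows, one per room, so the dependent variable lives in one column and the condition label in another.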
Creating Data Frame for Plotting
- Created plot_data to hold the statistics needed for plotting.
- Used the summarize function to calculate statistics such as the mean.
- Grouped the data by the room column to get separate statistics for the bright, dim, and difference groups.
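A sketch of the group-and-summarize step, assuming (this is not stated in the notes) that the "difference" group is each participant's bright minus dim score, added before pivoting so that room has three levels. Column names and values are hypothetical.

```r
library(dplyr)
library(tidyr)

# Hypothetical candle data (made-up words-per-minute values).
candle_wide <- data.frame(
  participant = 1:4,
  bright = c(52, 48, 60, 55),
  dim    = c(45, 47, 51, 50)
)

# Assumption: "difference" = bright - dim per participant,
# stacked alongside the two conditions.
candle_long <- candle_wide %>%
  mutate(difference = bright - dim) %>%
  pivot_longer(cols = c(bright, dim, difference),
               names_to = "room", values_to = "wpm")

# group_by() + summarize() yields one row of statistics per room level.
plot_data <- candle_long %>%
  group_by(room) %>%
  summarize(
    mean = mean(wpm),
    n    = n(),
    sd   = sd(wpm)
  )

print(plot_data)
```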
Calculating Statistics
- Calculated the mean, sample size (n), and standard deviation (SD) for each group.
- Organized the code for clarity and to prevent errors.
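For reference, these are the standard definitions behind the statistics, where x_1 to x_n are one group's scores. R's sd() uses the n − 1 denominator shown here, and the standard error (SE) is what the confidence intervals are built from.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2},
\qquad
\mathrm{SE} = \frac{s}{\sqrt{n}}
```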
Creating a Bar Plot
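- Used ggplot to create a bar plot from plot_data.
- Mapped room to the x-axis and mean to the y-axis with geom_bar, using stat = "identity" because the bar heights are precomputed means rather than counts.
- Adjusted plot aesthetics (colors, alpha).
- Set the bar order by converting room to a factor with explicit levels; otherwise ggplot orders character values alphabetically.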
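A minimal sketch of the bar plot, assuming plot_data has columns room and mean (the numbers are made up). It shows the factor step that fixes the bar order and the stat = "identity" argument that lets geom_bar() draw precomputed values.

```r
library(ggplot2)

# Hypothetical summary table in the shape of plot_data (made-up numbers).
plot_data <- data.frame(
  room = c("bright", "difference", "dim"),
  mean = c(53.75, 5.5, 48.25)
)

# Without explicit levels, ggplot would order the bars alphabetically
# (bright, difference, dim); factor() sets the intended order.
plot_data$room <- factor(plot_data$room,
                         levels = c("bright", "dim", "difference"))

# stat = "identity" plots the values as given; geom_col() is equivalent.
p <- ggplot(plot_data, aes(x = room, y = mean, fill = room)) +
  geom_bar(stat = "identity", alpha = 0.7) +
  labs(x = "Room", y = "Mean words per minute")

print(p)
```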
Error Bars and Confidence Intervals
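- Calculated confidence intervals with the one-sample t-test method: mean ± critical t × SE, where SE = SD / √n and the critical value comes from a t distribution with n − 1 degrees of freedom.
- Added the CIs to the plot as error bars using geom_errorbar.
- Highlighted that when the CI for the difference does not include zero, the paired t-test is significant at the matching alpha level (a 95% CI corresponds to α = .05).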
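A sketch of the interval calculation and error bars, assuming a 95% level and made-up long-format data in which the "difference" rows already hold bright minus dim scores. The interval for the difference group is cross-checked against t.test(), which tests against mu = 0 by default.

```r
library(dplyr)
library(ggplot2)

# Hypothetical long-format data with three room levels (made-up values).
candle_long <- data.frame(
  room = rep(c("bright", "dim", "difference"), each = 4),
  wpm  = c(52, 48, 60, 55,  45, 47, 51, 50,  7, 1, 9, 5)
)

# One-sample t interval per group: mean +/- qt(0.975, n - 1) * sd / sqrt(n)
plot_data <- candle_long %>%
  group_by(room) %>%
  summarize(
    mean     = mean(wpm),
    n        = n(),
    sd       = sd(wpm),
    se       = sd / sqrt(n),
    t_crit   = qt(0.975, df = n - 1),
    ci_lower = mean - t_crit * se,
    ci_upper = mean + t_crit * se
  )

# Cross-check: t.test() on the differences reports the same interval.
diffs <- candle_long$wpm[candle_long$room == "difference"]
ci_ttest <- as.numeric(t.test(diffs)$conf.int)

# Error bars span each group's CI.
p <- ggplot(plot_data, aes(x = room, y = mean)) +
  geom_bar(stat = "identity", alpha = 0.7) +
  geom_errorbar(aes(ymin = ci_lower, ymax = ci_upper), width = 0.2)

print(plot_data)
print(p)
```

Only the difference group's interval speaks to the bright vs. dim comparison; the intervals for the raw conditions just describe each condition's mean.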
Transition to Independent T-Tests
- Moved from comparing dependent (paired) measurements to comparing independent groups.
Independent T-Test: New Data
Dataset: Swimming
- The data include each swimmer's classification (optimist or pessimist) and a performance ratio.
- Removed unnecessary columns (e.g., sex, event type).
- Discussed the nature of the study: it is observational (swimmers were classified, not randomly assigned to groups), not an experiment.
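A minimal sketch of dropping columns with dplyr's select(). The column names (explanatory_style, ratio, sex, event) and all values are hypothetical stand-ins for the real swimming dataset.

```r
library(dplyr)

# Hypothetical swimming data; names and values are illustrative only.
swim <- data.frame(
  explanatory_style = c("optimist", "pessimist", "optimist", "pessimist"),
  ratio = c(0.98, 1.04, 1.01, 0.95),
  sex   = c("F", "M", "M", "F"),
  event = c("100 free", "200 fly", "100 back", "50 free")
)

# A minus sign inside select() drops a column; the predictor
# and the response remain.
swim_small <- swim %>% select(-sex, -event)

print(swim_small)
```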
Statistical Approach
- Distinguished the predictor (optimist/pessimist classification) from the response (performance ratio).
Analyzing Data
- Calculated summary statistics for optimists and pessimists.
- Planned next steps for independent t-test analysis.
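A sketch of the summary statistics and one possible next step, using made-up data with hypothetical column names group and ratio. R's t.test() with a formula runs Welch's independent-samples test by default (var.equal = FALSE); the hand calculation shows the statistic is the difference in means divided by the unpooled standard error.

```r
library(dplyr)

# Hypothetical data (made-up values): `group` is the predictor,
# `ratio` the response.
swim <- data.frame(
  group = rep(c("optimist", "pessimist"), each = 5),
  ratio = c(1.02, 0.99, 1.05, 1.01, 1.03,
            0.96, 0.98, 1.00, 0.94, 0.97)
)

# Summary statistics per group (rows sorted: optimist, pessimist).
swim_summary <- swim %>%
  group_by(group) %>%
  summarize(mean = mean(ratio), n = n(), sd = sd(ratio))

print(swim_summary)

# Welch independent-samples t-test (R's default for two groups).
result <- t.test(ratio ~ group, data = swim)
print(result)

# Same statistic by hand: (mean1 - mean2) / sqrt(s1^2/n1 + s2^2/n2)
se_welch <- sqrt(sum(swim_summary$sd^2 / swim_summary$n))
t_manual <- (swim_summary$mean[1] - swim_summary$mean[2]) / se_welch
```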
Key Concepts
- Importance of tidy data format for analysis.
- Usage of summary statistics for plotting and analysis.
- Understanding and visualizing statistical significance with confidence intervals.
- Distinction between observational studies and experiments.
- Role of statistics: tests assess whether an observed difference could plausibly be due to sampling error; they do not establish causation.
Next Steps
- Further exploration of independent t-tests and analysis of new data.
Additional Resources and Support
- Extra office hours for project assistance.
This lecture focused on data handling techniques in R, particularly transforming data for analysis, and on basic concepts of paired and independent t-tests using the tidyverse. The practical work involved creating plots and interpreting statistical significance through confidence intervals.