Overview
This lecture covers analyzing the effects of different poisons on mouse survival times, using nominal independent variables in statistical models, interpreting the results, and visualizing the data with box plots.
Analyzing the Poison Mice Dataset
- The independent variable "poison" is categorical (nominal or ordinal), not quantitative, but it can still be used in a linear model, where R encodes it as indicator (dummy) variables.
- Dependent variable is "time" (survival time of mice).
- In R, lm() compares each non-baseline poison level (second, third) to the baseline (first). The baseline is the first factor level, and R orders factor levels alphabetically by default.
- Only first vs. third shows a significant difference; first vs. second is not statistically significant.
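The baseline behavior described above can be sketched in R as follows. This is a minimal illustration, not the lecture's code: the data frame name `mice` and all survival times are made up for the example, and only the column names `time` and `poison` and the level names come from the notes.

```r
# Made-up illustrative data; NOT the lecture's dataset.
mice <- data.frame(
  poison = factor(rep(c("first", "second", "third"), each = 4)),
  time   = c(0.50, 0.55, 0.60, 0.65,   # first
             0.48, 0.52, 0.58, 0.62,   # second
             0.20, 0.25, 0.30, 0.35)   # third
)

# factor() sorts levels alphabetically, so "first" becomes the baseline.
levels(mice$poison)

# Fit the linear model; R dummy-codes the factor automatically.
fit <- lm(time ~ poison, data = mice)

# The rows "poisonsecond" and "poisonthird" test each level against "first".
summary(fit)

# With a single factor, the intercept is the baseline group mean and each
# coefficient is that group's mean minus the baseline mean.
coef(fit)
tapply(mice$time, mice$poison, mean)

# To compare against a different baseline, relevel the factor and refit.
mice2 <- transform(mice, poison = relevel(poison, ref = "second"))
fit2 <- lm(time ~ poison, data = mice2)
summary(fit2)
```

Note that a single lm() fit only compares each level to the baseline; the second vs. third comparison requires releveling as above or a pairwise test.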
Statistical Testing & Interpretation
- Pairwise t-tests can be run to compare each pair of poison levels.
- The Bonferroni correction is a conservative adjustment to reduce false positives in multiple comparisons.
- P-values under 0.05 are conventionally treated as indicating a statistically significant difference between groups.
- Results: first vs. second is not significant (p ≈ 0.98); first vs. third and second vs. third are significant (p ≈ 0.001).
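A minimal sketch of running the pairwise tests with and without the Bonferroni correction, using base R's pairwise.t.test(). The data frame `mice` and its values are made up for illustration, so the p-values it produces will not match the lecture's results.

```r
# Made-up illustrative data; NOT the lecture's dataset.
mice <- data.frame(
  poison = factor(rep(c("first", "second", "third"), each = 4)),
  time   = c(0.50, 0.55, 0.60, 0.65,   # first
             0.48, 0.52, 0.58, 0.62,   # second
             0.20, 0.25, 0.30, 0.35)   # third
)

# Unadjusted pairwise t-tests (pooled standard deviation by default).
raw <- pairwise.t.test(mice$time, mice$poison, p.adjust.method = "none")

# Same tests with the Bonferroni correction applied to the p-values.
bonf <- pairwise.t.test(mice$time, mice$poison, p.adjust.method = "bonferroni")

# Each result holds a lower-triangular matrix of p-values, one per pair.
raw$p.value
bonf$p.value
```

Comparing the two matrices shows how the correction inflates each p-value, which is why it is described as conservative.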
Data Visualization with Box Plots
- Box plots visually compare survival times across poison levels (first, second, third).
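- Outliers appear as individual dots beyond the whiskers (by default in ggplot, points more than 1.5 times the interquartile range from the edge of the box).
- Differences in medians and spread suggest which groups differ; the statistical tests determine whether those differences are significant.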
- Color differentiation (e.g., purple, green, pink) can be used for clarity.
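A box plot along these lines could be drawn with ggplot2 roughly as follows. This is a sketch under assumptions: the data frame `mice` and its values are made up, the axis labels are placeholders, and the mapping of colors to levels is arbitrary.

```r
library(ggplot2)

# Made-up illustrative data; NOT the lecture's dataset.
mice <- data.frame(
  poison = factor(rep(c("first", "second", "third"), each = 4)),
  time   = c(0.50, 0.55, 0.60, 0.65,   # first
             0.48, 0.52, 0.58, 0.62,   # second
             0.20, 0.25, 0.30, 0.35)   # third
)

# One box per poison level, filled with a distinct color per level.
p <- ggplot(mice, aes(x = poison, y = time, fill = poison)) +
  geom_boxplot() +
  scale_fill_manual(values = c(first = "purple", second = "green", third = "pink")) +
  labs(x = "Poison level", y = "Survival time")

print(p)
```

The `fill` aesthetic colors the inside of each box, while `color` would change the outline instead; swapping one for the other is an easy way to experiment.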
Research and Real-World Application
- Understanding which doses/levels are significantly more lethal is critical for fields like national security.
- Statistically significant differences inform assessments of chemical or biological threats.
Key Terms & Definitions
- Nominal Variable: Categorical variable without numerical order (e.g., type of poison).
- Ordinal Variable: Categorical variable with a meaningful order (e.g., first, second, third dose).
- Linear Model (lm): Statistical model used to predict a dependent variable based on one or more independent variables.
- Baseline: Reference group that other groups are compared against (in R, the first factor level, which is alphabetical by default).
- Pairwise t-test: Statistical test comparing the means of two groups at a time.
- Bonferroni Correction: Method that adjusts p-values for multiple comparisons by multiplying each p-value by the number of comparisons (capped at 1).
- P-value: Probability of observing data at least as extreme as the data actually observed, assuming there is no real difference between groups (the null hypothesis); values below 0.05 are conventionally considered significant.
- Box Plot: A graphical representation of a data distribution showing the median, quartiles, and outliers.
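To make the Bonferroni arithmetic concrete, the short sketch below applies base R's p.adjust() to three raw p-values. The p-values are hypothetical numbers chosen for the example, not results from the lecture.

```r
# Hypothetical raw p-values from three pairwise comparisons (made up).
raw_p <- c(0.30, 0.01, 0.04)

# Bonferroni: multiply each p-value by the number of comparisons (3), cap at 1.
adj_p <- p.adjust(raw_p, method = "bonferroni")
adj_p          # 0.90 0.03 0.12

# Which comparisons remain significant at the 0.05 level after adjustment?
adj_p < 0.05   # FALSE TRUE FALSE
```

The third comparison (raw p = 0.04) is significant before the correction but not after it, which illustrates the trade-off between fewer false positives and reduced power.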
Action Items / Next Steps
- Review the box plot code and experiment with color and fill options in ggplot.
- Practice running pairwise t-tests with and without Bonferroni correction on sample data.
- Prepare to interpret statistical outputs (p-values, significance) in future assignments.