Lecture Notes: Intro to Statistics
Lecturer: Monica Wahi
Section 1.1: What is Statistics?
Learning Objectives:
- State at least one definition of statistics
- Give one example of a population parameter and one example of a sample statistic
- Classify a variable as quantitative or qualitative, and as nominal, ordinal, interval, or ratio
Topics Covered:
- Definitions of statistics
- Population parameter and sample statistic
- Classifying levels of measurement
Definitions of Statistics
- Statistics: Study of how to collect, organize, analyze, and interpret numerical information/data
- Statistics vs Math: Statistics involves analyzing and interpreting data, not just numbers.
- Science of Uncertainty and Technology of Extracting Information: Helps in decision-making, especially in fields like healthcare and public health.
Examples:
- CDC Example: Evaluating flu viruses to decide next year's vaccine composition.
- Uncertainty: Even with statistics, there can be incorrect predictions (e.g., flu vaccines).
Concepts in Statistics
- Individuals vs Variables:
- Individuals: People or objects included in a study (e.g., patients, hospitals, states)
- Variables: Characteristics of individuals to be measured/observed (e.g., nosocomial infection rates in hospitals)
Examples in Healthcare:
- Individuals: Hospitals, states, programs
- Variables: Infection rates, mortality rates
Importance: Statistics is crucial in healthcare for decision-making and understanding processes.
Population Parameter vs Sample Statistic
- Population: Entire group of individuals sharing a common characteristic (e.g., all nurses at a hospital)
- Sample: A portion of the population, which may be representative or biased
- Example: Surveying nurses in ICU only vs across departments
- Census: Data from every individual in a population
- Sample Data: Data from some individuals
Examples in Healthcare:
- Medicare: Public health insurance for older adults in the US; almost everyone aged 65+ is covered.
- Population Data: Medicare claims data
- Sample Data: Surveys like Medicare beneficiary survey, American Community Survey
Statistical Notation
- Population size: Capital N
- Sample size: Lowercase n
- Parameter vs Statistic:
- Parameter: Measures describing the entire population (e.g., mean age of all Americans on Medicare)
- Statistic: Measures describing a sample (e.g., mean age from a Medicare beneficiary survey)
- Descriptive vs Inferential Statistics:
- Descriptive: Organizing, picturing, and summarizing information
- Inferential: Using sample information to draw conclusions about the population
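The notation above can be illustrated with a minimal sketch. The ages below are made-up numbers, not real Medicare data, and the names `population_ages` and `sample_ages` are hypothetical.

```python
import random
import statistics

# Hypothetical population: ages of N = 10,000 beneficiaries (made-up data).
rng = random.Random(42)
population_ages = [rng.randint(65, 100) for _ in range(10_000)]
N = len(population_ages)

# Parameter: a measure describing the entire population.
population_mean = statistics.mean(population_ages)

# Statistic: a measure describing a sample of size n from that population.
n = 200
sample_ages = rng.sample(population_ages, n)
sample_mean = statistics.mean(sample_ages)

print(f"N = {N}, parameter (population mean age) = {population_mean:.2f}")
print(f"n = {n}, statistic (sample mean age) = {sample_mean:.2f}")
```

Computing `sample_mean` is descriptive statistics; using it as an estimate of `population_mean` is inferential statistics.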
Classifying Variables
- Quantitative (numerical) vs Qualitative (categorical)
- Quantitative (Numerical): Numerical measurements (e.g., blood pressure, platelet count)
- Qualitative (Categorical): Characteristics (e.g., health insurance type, country of origin)
- Levels of Measurement:
- Quantitative Variables: Further classified as interval (no true zero) or ratio (true zero)
- Qualitative Variables: Further classified as nominal (no order) or ordinal (natural order)
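As a rough sketch, the classification can be written as a lookup table. The example variables and the `describe` helper are illustrative choices, not part of the lecture.

```python
# Each level of measurement maps to its variable type and a short meaning.
LEVELS = {
    "nominal": ("qualitative", "categories with no natural order"),
    "ordinal": ("qualitative", "categories with a natural order"),
    "interval": ("quantitative", "numeric, no true zero"),
    "ratio": ("quantitative", "numeric, true zero"),
}

# Hypothetical example variables, chosen for illustration only.
examples = {
    "health insurance type": "nominal",
    "pain severity (mild/moderate/severe)": "ordinal",
    "body temperature in degrees Celsius": "interval",
    "platelet count": "ratio",
}

def describe(variable):
    """Return a one-line classification for a known example variable."""
    level = examples[variable]
    kind, meaning = LEVELS[level]
    return f"{variable}: {kind}, {level} ({meaning})"

for name in examples:
    print(describe(name))
```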
Section 1.2: Sampling
Learning Objectives:
- Define sampling frame and sampling error
- Give examples of simple random sampling and systematic sampling
- Explain stratified sampling, and compare cluster sampling with convenience sampling
- Give an example of multistage sampling
Topics Covered:
- Definitions: Sample, sampling frames, errors
- Types of Sampling: Simple random, systematic, stratified, cluster, convenience, multistage
Importance of Sampling
- Purpose: To infer from the sample to the population
- Resources: Saves time, effort, and resources compared to collecting data from the entire population
Sampling Frame:
- Definition: List of individuals from which a sample is selected
- Examples: List of nursing students, HR list of employees
- Undercoverage: Occurs when members of the population are missing from the sampling frame
Types of Errors
- Sampling Error: Natural difference between a population parameter and the corresponding sample statistic (e.g., population mean vs sample mean)
- Non-Sampling Error: Mistakes such as a bad list, sloppy data collection
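A small sketch of sampling error, assuming a made-up population of blood pressure readings. Each sample is drawn correctly, yet its mean still differs from the population mean by chance.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical population of 5,000 systolic blood pressure readings (made up).
population = [rng.gauss(120, 15) for _ in range(5_000)]
population_mean = statistics.mean(population)

# Draw several simple random samples and record each sample's error.
errors = []
for _ in range(5):
    sample = rng.sample(population, 50)
    errors.append(statistics.mean(sample) - population_mean)

for i, err in enumerate(errors, start=1):
    print(f"sample {i}: sampling error = {err:+.2f}")
```

The errors vary in sign and size from sample to sample. A non-sampling error, such as a bad list, would not show up in this sketch because the frame here is the full population.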
Simulations
- Definition: Numerical representation of a real-world phenomenon, used to observe possible outcomes
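A minimal simulation sketch is shown here. The ward size and the 10% infection risk are assumed values for illustration only, not figures from the lecture.

```python
import random
from collections import Counter

rng = random.Random(1)

PATIENTS = 20          # hypothetical ward size
INFECTION_RISK = 0.10  # assumed per-patient probability, illustration only
RUNS = 1_000           # number of simulated wards

def simulate_ward():
    """Return the number of infected patients in one simulated ward."""
    return sum(rng.random() < INFECTION_RISK for _ in range(PATIENTS))

# Tally how often each outcome occurs across all simulated wards.
outcomes = Counter(simulate_ward() for _ in range(RUNS))

for infected in sorted(outcomes):
    print(f"{infected} infected: {outcomes[infected]} of {RUNS} runs")
```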
Types of Sampling Methods
Simple Random Sampling:
- Definition: A sample of size n drawn so that every possible sample of size n has an equal chance of being selected
- Methods:
- Using a hat: IDs on slips of paper
- Random number generation: Assign each individual a number, then select numbers with a random number generator
- Limits: Requires a good list, not feasible for unknown real-time individuals (e.g., ER patients)
- Examples: List of hospitals, students, employees
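The random number approach can be sketched with Python's standard `random` module. The hospital IDs are a hypothetical sampling frame.

```python
import random

# Hypothetical sampling frame: a list of 12 hospital IDs.
sampling_frame = [f"hospital_{i:02d}" for i in range(1, 13)]

rng = random.Random(7)  # fixed seed so the sketch is repeatable
n = 4

# sample() draws n distinct items without replacement, so every
# possible set of n hospitals is equally likely to be chosen.
simple_random_sample = rng.sample(sampling_frame, n)
print(simple_random_sample)
```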
Stratified Sampling
- Definition: Dividing the population into subgroups (strata) and sampling each subgroup
- Examples: Grade-based stratification for students, department-based sampling in hospitals
- Limitations: Oversampling, requires a good list, effort to divide into strata
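A sketch of department-based stratified sampling, assuming a made-up employee list. Taking the same number from each stratum is one possible design; proportional allocation is another.

```python
import random
from collections import defaultdict

rng = random.Random(3)

# Hypothetical sampling frame: (employee_id, department) pairs.
departments = ["ICU", "ER", "Pediatrics"]
frame = [(f"emp_{i:03d}", departments[i % 3]) for i in range(90)]

# Step 1: divide the population into strata (here, by department).
strata = defaultdict(list)
for emp_id, dept in frame:
    strata[dept].append(emp_id)

# Step 2: take a simple random sample within each stratum.
PER_STRATUM = 5  # assumed sample size per stratum
stratified_sample = {
    dept: rng.sample(members, PER_STRATUM)
    for dept, members in strata.items()
}

for dept, members in stratified_sample.items():
    print(dept, members)
```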
Systematic Sampling
- Steps: Arrange in order, pick a random start, take every kth individual
- Flexibility: Can be done with or without a list
- Limitations: Potential periodicity issues
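The steps above can be sketched as follows. The patient list and the `systematic_sample` helper are hypothetical.

```python
import random

def systematic_sample(ordered_items, k, rng):
    """Pick a random start among the first k items, then take every kth item."""
    start = rng.randrange(k)
    return ordered_items[start::k]

rng = random.Random(11)

# Hypothetical ordered list of 100 patients (e.g., in order of arrival).
patients = [f"patient_{i:03d}" for i in range(1, 101)]

sample = systematic_sample(patients, k=10, rng=rng)
print(sample)
```

If the ordering has a cycle that lines up with k (for example, every 10th patient arrives on the same shift), the sample can be biased, which is the periodicity issue noted above.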
Cluster Sampling
- Use Case: When geography or fixed clusters matter
- Steps: Divide map into clusters, randomly select clusters, measure everyone in those clusters
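A minimal sketch of these steps, assuming eight made-up neighborhoods of five households each.

```python
import random

rng = random.Random(5)

# Step 1: hypothetical clusters, 8 neighborhoods with 5 households each.
clusters = {
    f"neighborhood_{c}": [f"household_{c}_{h}" for h in range(1, 6)]
    for c in "ABCDEFGH"
}

# Step 2: randomly select whole clusters.
chosen_clusters = rng.sample(sorted(clusters), 3)

# Step 3: measure everyone in the selected clusters.
cluster_sample = [hh for c in chosen_clusters for hh in clusters[c]]

print(chosen_clusters)
print(len(cluster_sample), "households sampled")
```

Unlike stratified sampling, only some groups are represented, but every member of a chosen group is included.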
Convenience Sampling
- Definition: Using readily available data
- Use Cases: Low-risk questions, low-resource scenarios
- Limitations: Bias due to non-representative samples
Multistage Sampling
- Definition: Combining multiple sampling methods in stages
- Examples: National surveys like NHANES (sample counties, then segments, then households, then individuals)
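A simplified three-stage sketch is shown here (counties, then segments, then households). The nested population is made up, and this is not the actual NHANES procedure; real surveys typically add a further stage for individuals and use unequal selection probabilities.

```python
import random

rng = random.Random(9)

# Hypothetical nested population: counties -> segments -> households.
population = {
    f"county_{c}": {
        f"segment_{c}{s}": [f"household_{c}{s}_{h}" for h in range(1, 11)]
        for s in range(1, 5)
    }
    for c in "ABCDEF"
}

selected = []
for county in rng.sample(sorted(population), 2):            # stage 1: counties
    segments = population[county]
    for segment in rng.sample(sorted(segments), 2):         # stage 2: segments
        selected.extend(rng.sample(segments[segment], 3))   # stage 3: households

print(len(selected), "households selected")
```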
Section 1.3: Introduction to Experimental Design
Learning Objectives:
- State steps for conducting a statistical study
- Avoid bias in surveys
- Understand randomization and blinding
Steps in Conducting a Study:
- State hypothesis
- Identify individuals of interest
- Specify variables to measure
- Determine population or sample, and sampling method
- Consider ethical concerns
- Collect data
- Use statistics to test the hypothesis
- Report findings, note concerns, recommend future studies
Types of Studies:
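- Experiment: A treatment or intervention is deliberately assigned to individuals
- Observational: No treatment is assigned; individuals are only observed and measured
- Replication: Studies must be rigorous and documented well enough that others can repeat them, which is how science progresses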
Avoiding Bias:
- Survey Design: Minimize bias, including non-response bias and voluntary response bias
- Question Wording: Avoid ambiguity, leading questions
- Order of Questions: May influence responses
- Interviewer Influence: Nonverbal cues can affect answers
Randomization:
- Purpose: Prevent bias in selecting treatment groups
- Steps: Recruit sample, measure confounders/outcomes, randomly assign to groups
- Blinding: Deliberately not telling participants/study staff the treatment assignment to prevent bias
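Random assignment can be sketched as a shuffle followed by a split. The participant IDs and the A/B group codes are hypothetical; real trials often use more elaborate schemes.

```python
import random

rng = random.Random(2024)

# Hypothetical recruited sample of 20 participants.
participants = [f"participant_{i:02d}" for i in range(1, 21)]

# Shuffle a copy, then split in half so chance alone decides the groups.
shuffled = participants[:]
rng.shuffle(shuffled)
half = len(shuffled) // 2

assignment = {p: "treatment" for p in shuffled[:half]}
assignment.update({p: "control" for p in shuffled[half:]})

# Blinding sketch: staff see only a neutral code, not the group name.
codes = {"treatment": "A", "control": "B"}
blinded = {p: codes[group] for p, group in assignment.items()}

for p in participants[:5]:
    print(p, blinded[p])
```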
Key Concepts in Statistics
- Inferential Statistics: Using sample data to infer characteristics about a population
- Measures of Central Tendency: Mode, median, mean
- Measures of Variation: Range, variance, standard deviation
- Probability: Likelihood of an event occurring, expressed as a number between 0 and 1 or as a percentage
- Sampling Methods: Various techniques to obtain a representative sample from a population
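The measures of central tendency and variation listed above can be computed with Python's standard `statistics` module. The lengths of stay below are made-up sample data.

```python
import statistics

# Hypothetical sample: hospital lengths of stay, in days.
lengths_of_stay = [2, 3, 3, 4, 5, 5, 5, 7, 9]

# Measures of central tendency
mode = statistics.mode(lengths_of_stay)
median = statistics.median(lengths_of_stay)
mean = statistics.mean(lengths_of_stay)

# Measures of variation (sample variance and sample standard deviation)
data_range = max(lengths_of_stay) - min(lengths_of_stay)
variance = statistics.variance(lengths_of_stay)
std_dev = statistics.stdev(lengths_of_stay)

print(f"mode={mode}, median={median}, mean={mean:.2f}")
print(f"range={data_range}, variance={variance:.2f}, std dev={std_dev:.2f}")
```

`statistics.variance` and `statistics.stdev` treat the data as a sample; `statistics.pvariance` and `statistics.pstdev` are the population versions.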
Final Notes
- Importance of Statistics: Critical for making informed decisions in various fields, especially healthcare
- Continuous Learning: Always stay updated with new methods and findings in the field of statistics