Intro to Statistics

Jul 11, 2024

Lecture Notes: Intro to Statistics

Lecturer: Monica Wahi

Section 1.1: What is Statistics?

Learning Objectives:

  1. State at least one definition of statistics
  2. Give one example of a population parameter and one example of a sample statistic
  3. Classify a variable as quantitative or qualitative, and as nominal, ordinal, interval, or ratio

Topics Covered:

  • Definitions of statistics
  • Population parameter and sample statistic
  • Classifying levels of measurement

Definitions of Statistics

  • Statistics: Study of how to collect, organize, analyze, and interpret numerical information/data
  • Statistics vs Math: Statistics involves analyzing and interpreting data, not just numbers.
  • Science of Uncertainty and Technology of Extracting Information: Helps in decision-making, especially in fields like healthcare and public health.

Examples:

  • CDC Example: Evaluating flu viruses to decide next year's vaccine composition.
  • Uncertainty: Even with statistics, there can be incorrect predictions (e.g., flu vaccines).

Concepts in Statistics

  • Individuals vs Variables:
    • Individuals: People or objects included in a study (e.g., patients, hospitals, states)
    • Variables: Characteristics of individuals to be measured/observed (e.g., nosocomial infection rates in hospitals)

Examples in Healthcare:

  • Individuals: Hospitals, states, programs
  • Variables: Infection rates, mortality rates

Importance: Statistics is crucial in healthcare for decision-making and understanding processes.

Population Parameter vs Sample Statistic

  • Population: Entire group with a common theme (e.g., nurses at a hospital)
  • Sample: A subset of the population; it may be representative or biased
    • Example: Surveying nurses in ICU only vs across departments
  • Census: Data from every individual in a population
  • Sample Data: Data from some individuals

Examples in Healthcare:

  • Medicare: Public health insurance for older adults; almost everyone aged 65+ is covered.
  • Population Data: Medicare claims data
  • Sample Data: Surveys like Medicare beneficiary survey, American Community Survey

Statistical Notation

  • Population size: Capital N
  • Sample size: Lowercase n
  • Parameter vs Statistic:
    • Parameter: Measures describing the entire population (e.g., mean age of all Americans on Medicare)
    • Statistic: Measures describing a sample (e.g., mean age from a Medicare beneficiary survey)
  • Descriptive vs Inferential Statistics:
    • Descriptive: Organizing, picturing, and summarizing information
    • Inferential: Using sample information to draw conclusions about the population
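
As a rough illustration of the parameter/statistic distinction and the N/n notation, the Python sketch below builds a made-up population of ages and compares its mean with the mean of one random sample. The ages are synthetic (not Medicare data), and the names population_ages and sample_ages are hypothetical.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: synthetic ages for N = 10,000 people
population_ages = [rng.randint(65, 100) for _ in range(10_000)]
N = len(population_ages)

# Sample of n = 200 individuals drawn without replacement
sample_ages = rng.sample(population_ages, 200)
n = len(sample_ages)

parameter = statistics.mean(population_ages)  # describes the whole population
statistic = statistics.mean(sample_ages)      # describes only the sample

print(f"N = {N}, population mean (parameter) = {parameter:.2f}")
print(f"n = {n}, sample mean (statistic)     = {statistic:.2f}")
```

Computing the two means is descriptive statistics; using the sample mean to say something about the population mean is the inferential step.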

Classifying Variables

  • Quantitative (numerical) vs Qualitative (categorical)
    • Quantitative (Numerical): Numerical measurements (e.g., blood pressure, platelet count)
    • Qualitative (Categorical): Characteristics (e.g., health insurance type, country of origin)
  • Levels of Measurement:
    • Quantitative Variables: Further classified as interval (no true zero) or ratio (true zero)
    • Qualitative Variables: Further classified as nominal (no order) or ordinal (natural order)
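
The classification rules above can be sketched as a small decision function. This is only an illustration of the logic in these notes; the function name level_of_measurement and its arguments are hypothetical, and the Celsius and pain-severity examples are added for illustration rather than taken from the lecture.

```python
def level_of_measurement(is_quantitative: bool,
                         has_natural_order: bool = False,
                         has_true_zero: bool = False) -> str:
    """Return the level of measurement following the rules in the notes."""
    if is_quantitative:
        # Quantitative: ratio if zero means "none of the quantity", else interval
        return "ratio" if has_true_zero else "interval"
    # Qualitative: ordinal if the categories have a natural order, else nominal
    return "ordinal" if has_natural_order else "nominal"


# Platelet count: numerical, and zero means no platelets
print(level_of_measurement(True, has_true_zero=True))
# Temperature in Celsius: numerical, but 0 degrees is not "no temperature"
print(level_of_measurement(True, has_true_zero=False))
# Health insurance type: categories with no order
print(level_of_measurement(False, has_natural_order=False))
# Pain severity (mild, moderate, severe): ordered categories
print(level_of_measurement(False, has_natural_order=True))
```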

Section 1.2: Sampling

Learning Objectives:

  1. Define sampling frame and sampling error
  2. Give examples of simple random sampling and systematic sampling
  3. Explain stratified sampling, and compare cluster vs convenience sampling
  4. Give an example of multistage sampling

Topics Covered:

  • Definitions: Sample, sampling frames, errors
  • Types of Sampling: Simple random, systematic, stratified, cluster, convenience, multistage

Importance of Sampling

  • Purpose: To infer from the sample to the population
  • Resources: Saves time, effort, and resources compared to collecting data from the entire population

Sampling Frame:

  • Definition: List of individuals from which a sample is selected
  • Examples: List of nursing students, HR list of employees
  • Undercoverage: Members of the population are missing from the sampling frame

Types of Errors

  • Sampling Error: Natural difference between a population parameter and the corresponding sample statistic (e.g., population mean vs sample mean)
  • Non-Sampling Error: Mistakes such as a bad list, sloppy data collection
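
To make sampling error concrete, the sketch below draws several random samples from one synthetic population and prints how far each sample mean lands from the population mean. The population values are made up (loosely styled as blood pressure readings), and all variable names are hypothetical.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Synthetic population of 5,000 measurements
population = [rng.gauss(120, 15) for _ in range(5_000)]
mu = statistics.mean(population)  # population mean (the parameter)

errors = []
for draw in range(1, 6):
    sample = rng.sample(population, 50)   # a fresh sample of n = 50
    x_bar = statistics.mean(sample)       # sample mean (the statistic)
    errors.append(x_bar - mu)             # sampling error for this sample
    print(f"sample {draw}: mean = {x_bar:.2f}, sampling error = {x_bar - mu:+.2f}")
```

Each sample was drawn correctly, yet the errors differ from zero and from each other. That spread is sampling error, as opposed to non-sampling error, which comes from mistakes in the list or the data collection.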

Simulations

  • Definition: Numerical representation of a real-world phenomenon, used to observe possible outcomes
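
A minimal sketch of a simulation, assuming a fair six-sided die as the real-world phenomenon (an example chosen here for simplicity, not one from the lecture): the program generates many rolls and tallies how often each outcome appears.

```python
import random
from collections import Counter

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Simulate 6,000 rolls of a fair die
rolls = [rng.randint(1, 6) for _ in range(6_000)]
counts = Counter(rolls)

for face in range(1, 7):
    share = counts[face] / len(rolls)
    print(f"face {face}: {counts[face]} rolls ({share:.1%})")
```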

Types of Sampling Methods

Simple Random Sampling:

  • Definition: Subset of the population where every sample of size n has an equal chance of being selected
  • Methods:
    • Using a hat: IDs on slips of paper
    • Random number generation: Assign numbers
  • Limits: Requires a good list, not feasible for unknown real-time individuals (e.g., ER patients)
  • Examples: List of hospitals, students, employees
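
The random number generation method can be sketched with Python's standard random module. The sampling frame of 100 hospital IDs is hypothetical; random.sample draws without replacement, so every possible set of 10 hospitals is equally likely.

```python
import random

rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Hypothetical sampling frame: a complete list of 100 hospitals
sampling_frame = [f"hospital_{i:03d}" for i in range(1, 101)]

# Simple random sample of n = 10, drawn without replacement
srs = rng.sample(sampling_frame, k=10)
print(srs)
```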

Stratified Sampling

  • Definition: Dividing the population into subgroups (strata) and drawing a random sample from each subgroup
  • Examples: Grade-based stratification for students, department-based sampling in hospitals
  • Limitations: Oversampling, requires a good list, effort to divide into strata
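
A minimal sketch of department-based stratified sampling, assuming a hypothetical list of 200 nurses spread across four made-up departments. The frame is split into strata first, and a simple random sample is then taken inside each stratum.

```python
import random
from collections import defaultdict

rng = random.Random(3)  # fixed seed so the sketch is repeatable

# Hypothetical sampling frame: (nurse ID, department) pairs
departments = ["ICU", "ER", "Pediatrics", "Surgery"]
frame = [(f"nurse_{i:03d}", departments[i % 4]) for i in range(200)]

# Step 1: divide the frame into strata by department
strata = defaultdict(list)
for nurse_id, dept in frame:
    strata[dept].append(nurse_id)

# Step 2: draw a simple random sample of 5 nurses within each stratum
per_stratum = 5
stratified_sample = {
    dept: rng.sample(members, per_stratum) for dept, members in strata.items()
}

for dept, chosen in stratified_sample.items():
    print(dept, chosen)
```

Taking the same number from every stratum is one possible design choice; sampling in proportion to stratum size is another.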

Systematic Sampling

  • Steps: Arrange in order, pick a random start, take every kth individual
  • Flexibility: Can be done with or without a list
  • Limitations: Potential periodicity issues
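
The steps above can be sketched as follows, assuming a hypothetical ordered list of 100 patient IDs and k = 10. The random start is chosen among the first k positions, and every kth individual after it is taken.

```python
import random

rng = random.Random(11)  # fixed seed so the sketch is repeatable

# Step 1: individuals arranged in order (hypothetical patient IDs 1..100)
ordered_frame = list(range(1, 101))

# Step 2: pick a random starting position among the first k individuals
k = 10
start = rng.randrange(k)  # index from 0 to k - 1

# Step 3: take every kth individual from the starting position onward
systematic_sample = ordered_frame[start::k]
print(systematic_sample)
```

If the ordering repeats with a cycle that matches k (for example, every 10th slot is always the same kind of patient), the sample inherits that pattern, which is the periodicity issue noted above.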

Cluster Sampling

  • Use Case: When geography or fixed clusters matter
  • Steps: Divide map into clusters, randomly select clusters, measure everyone in those clusters
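
A rough sketch of the cluster sampling steps, assuming 20 hypothetical map clusters of varying size. Clusters are selected at random, and then everyone in the selected clusters is included.

```python
import random

rng = random.Random(5)  # fixed seed so the sketch is repeatable

# Step 1: the map divided into 20 hypothetical clusters of 5 to 15 people
clusters = {
    f"cluster_{c}": [f"c{c}_person_{i}" for i in range(rng.randint(5, 15))]
    for c in range(1, 21)
}

# Step 2: randomly select 4 clusters
chosen_clusters = rng.sample(sorted(clusters), k=4)

# Step 3: measure everyone in the selected clusters
cluster_sample = [person for c in chosen_clusters for person in clusters[c]]

print(chosen_clusters)
print(len(cluster_sample), "individuals in the sample")
```

Unlike stratified sampling, only some groups are visited, but within those groups nobody is left out.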

Convenience Sampling

  • Definition: Using readily available data
  • Use Cases: Low-risk questions, low-resource scenarios
  • Limitations: Bias due to non-representative samples

Multistage Sampling

  • Definition: Combining multiple sampling methods in stages
  • Examples: National surveys like NHANES (sample counties, then segments, then households, then individuals)
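
A simplified sketch of multistage sampling with three stages (counties, then households, then one person per household). This is not the actual NHANES design; the nested population below is synthetic and all names and stage sizes are hypothetical.

```python
import random

rng = random.Random(9)  # fixed seed so the sketch is repeatable

# Synthetic population: 10 counties, 30 households each, 4 people per household
population = {
    f"county_{c}": {
        f"c{c}_household_{h}": [f"c{c}_h{h}_person_{p}" for p in range(1, 5)]
        for h in range(1, 31)
    }
    for c in range(1, 11)
}

# Stage 1: randomly select 3 counties
counties = rng.sample(sorted(population), k=3)

multistage_sample = []
for county in counties:
    # Stage 2: randomly select 5 households within each chosen county
    households = rng.sample(sorted(population[county]), k=5)
    for household in households:
        # Stage 3: randomly select 1 person within each chosen household
        multistage_sample.append(rng.choice(population[county][household]))

print(counties)
print(multistage_sample)
```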

Section 1.3: Introduction to Experimental Design

Learning Objectives:

  1. State steps for conducting a statistical study
  2. Avoid bias in surveys
  3. Understand randomization and blinding

Steps in Conducting a Study:

  1. State hypothesis
  2. Identify individuals of interest
  3. Specify variables to measure
  4. Determine population or sample, and sampling method
  5. Consider ethical concerns
  6. Collect data
  7. Use statistics to answer hypothesis
  8. Report findings, note concerns, recommend future studies

Types of Studies:

  • Experiment: Treatment/intervention is assigned
  • Observational: No treatment, just observation
  • Replication: Studies must be rigorous to allow replication for scientific progress

Avoiding Bias:

  • Survey Design: Minimize bias, non-response, voluntary response
  • Question Wording: Avoid ambiguity, leading questions
  • Order of Questions: May influence responses
  • Interviewer Influence: Nonverbal cues can affect answers

Randomization:

  • Purpose: Prevent bias in selecting treatment groups
  • Steps: Recruit sample, measure confounders/outcomes, randomly assign to groups
  • Blinding: Deliberately not telling participants/study staff the treatment assignment to prevent bias
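
Random assignment to groups can be sketched as a shuffle followed by a split. The 20 participant IDs are hypothetical, and equal group sizes are an assumption of this sketch rather than a requirement stated in the lecture.

```python
import random

rng = random.Random(2024)  # fixed seed so the sketch is repeatable

# Hypothetical recruited sample of 20 participants
participants = [f"participant_{i:02d}" for i in range(1, 21)]

# Shuffle a copy so that chance alone decides the order
shuffled = participants[:]
rng.shuffle(shuffled)

# First half goes to treatment, second half to control
half = len(shuffled) // 2
assignment = {pid: "treatment" for pid in shuffled[:half]}
assignment.update({pid: "control" for pid in shuffled[half:]})

for pid in participants:
    print(pid, assignment[pid])
```

Because the researcher does not choose who goes where, the groups cannot be tilted toward a preferred outcome. Blinding would additionally keep this assignment table hidden from participants and study staff.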

Key Concepts in Statistics

  • Inferential Statistics: Using sample data to infer characteristics about a population
  • Measures of Central Tendency: Mode, median, mean
  • Measures of Variation: Range, variance, standard deviation
  • Probability: Likelihood of an event occurring, expressed as a percentage or decimal
  • Sampling Methods: Various techniques to obtain a representative sample from a population
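
The measures of central tendency and variation listed above can be computed with Python's standard statistics module. The data values are made up for illustration, and the variance and standard deviation here are the sample versions (dividing by n - 1).

```python
import statistics

data = [2, 4, 4, 5, 7, 9, 11]  # made-up sample of n = 7 values

# Measures of central tendency
mode = statistics.mode(data)      # most frequent value
median = statistics.median(data)  # middle value of the sorted data
mean = statistics.mean(data)      # sum of values divided by n

# Measures of variation
data_range = max(data) - min(data)     # largest minus smallest value
variance = statistics.variance(data)   # sample variance
stdev = statistics.stdev(data)         # sample standard deviation

print(f"mode = {mode}, median = {median}, mean = {mean}")
print(f"range = {data_range}, variance = {variance}, std dev = {stdev:.3f}")
```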

Final Notes

  • Importance of Statistics: Critical for making informed decisions in various fields, especially healthcare
  • Continuous Learning: Always stay updated with new methods and findings in the field of statistics