Introduction to Statistics

Jul 9, 2024

Definition of Statistics

  • Mathematical science related to:
    • Collection, presentation, analysis, and interpretation of data.
    • Utilized to solve real-world complex problems and make decisions.
    • Involves statistical principles, functions, and algorithms.

Types of Analysis

Statistical Analysis (Quantitative Analysis)

  • Involves collecting, exploring, and presenting large data sets to identify patterns/trends.

Non-Statistical Analysis (Qualitative Analysis)

  • Provides general, non-numeric information (text, sound, images).
  • Both kinds of analysis produce results, but statistical analysis yields quantifiable insights, which makes it vital for businesses.

Categories of Statistics

Descriptive Statistics

  • Organizes data and summarizes main characteristics numerically/graphically.
  • Example: Heights of students in a classroom (max, min, avg).
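
The classroom example above can be sketched in a few lines of Python. The height values are hypothetical and chosen only for illustration.

```python
# Hypothetical heights (in cm) of the students in one classroom.
heights_cm = [150, 162, 158, 171, 165, 149, 177]

tallest = max(heights_cm)                     # maximum height
shortest = min(heights_cm)                    # minimum height
average = sum(heights_cm) / len(heights_cm)   # arithmetic mean

print(f"max={tallest} cm, min={shortest} cm, avg={average:.1f} cm")
```

These three numbers describe only the students who were measured; they make no claim about anyone outside the classroom.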

Inferential Statistics

  • Uses probability theory to generalize from a sample to the larger data set and draw conclusions about it.
  • Example: Students' heights are categorized (tall, medium, short), and a small sample is studied to draw conclusions about the whole group.
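
As a rough sketch of generalizing from a sample, the snippet below estimates a population mean from 50 sampled heights and attaches an approximate 95% confidence interval. The population is simulated (an assumption made purely for illustration), and the interval uses the normal approximation.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 1000 simulated heights in cm.
population = [random.gauss(165, 8) for _ in range(1000)]

# Study only a small sample, as inferential statistics does.
sample = random.sample(population, 50)

sample_mean = statistics.mean(sample)
std_error = statistics.stdev(sample) / math.sqrt(len(sample))

# z multiplier for a two-sided 95% interval (about 1.96).
z = statistics.NormalDist().inv_cdf(0.975)
ci_low = sample_mean - z * std_error
ci_high = sample_mean + z * std_error

print(f"estimated mean: {sample_mean:.1f} cm")
print(f"approx. 95% CI: {ci_low:.1f} to {ci_high:.1f} cm")
```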

Key Statistical Terms

  • Population: The entire group from which data is collected.
  • Sample: A subset of a population.
  • Variable: A characteristic that varies among members of a population.
    • Quantitative Variable: Differs in quantity (e.g., weight).
    • Qualitative Variable: Differs in quality (e.g., color).
    • Discrete Variable: Takes only separate, countable values, with nothing in between (e.g., number of children).
    • Continuous Variable: Can take any value between two values (e.g., race time).

Statistical Measures

Frequency

  • Number of times a data value occurs.
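
A minimal sketch of a frequency count, using Python's `collections.Counter` on a hypothetical list of shirt sizes:

```python
from collections import Counter

# Hypothetical observations of a qualitative variable.
shirt_sizes = ["M", "S", "M", "L", "M", "S"]

# Counter maps each distinct value to the number of times it occurs.
frequency = Counter(shirt_sizes)

for size, count in frequency.most_common():
    print(size, count)
```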

Central Tendency

  • Tendency of data values to cluster around a central value.
  • Measures: Mean, Median, Mode.
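
The three measures can be computed with Python's standard `statistics` module. The data set is hypothetical.

```python
import statistics

data = [2, 3, 3, 5, 7, 10]  # hypothetical data values

mean_value = statistics.mean(data)      # sum of values divided by their count
median_value = statistics.median(data)  # middle value of the sorted data
mode_value = statistics.mode(data)      # most frequently occurring value

print(mean_value, median_value, mode_value)
```

With an even number of values, as here, the median is the average of the two middle values.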

Spread (Dispersion)

  • How varied the data values are.
  • Measures: Standard Deviation, Variance, Quartiles.
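
A short sketch of the spread measures, again with the `statistics` module. It treats the hypothetical data as a full population (`pvariance`, `pstdev`); for a sample, `variance` and `stdev` would be used instead. Quartile values depend on the interpolation method, and this sketch relies on the module's default.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, treated as a population

variance = statistics.pvariance(data)        # mean squared deviation from the mean
std_dev = statistics.pstdev(data)            # square root of the variance
quartiles = statistics.quantiles(data, n=4)  # three cut points: Q1, Q2 (median), Q3

print("variance:", variance)
print("standard deviation:", std_dev)
print("quartiles:", quartiles)
```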

Position

  • Location of a data value relative to the other values in the data set.
  • Measures: Percentiles, Quartiles, Standard Scores.
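
The sketch below locates one value within a hypothetical set of test scores using a standard score (z-score) and a percentile rank. The percentile rank here is defined as the share of values at or below the given value, which is one of several common conventions.

```python
import statistics

scores = [55, 60, 65, 70, 75, 80, 85]  # hypothetical test scores
value = 80                             # the score whose position we want

mean_score = statistics.mean(scores)
std_dev = statistics.pstdev(scores)    # population standard deviation

# Standard score: how many standard deviations the value lies from the mean.
z_score = (value - mean_score) / std_dev

# Percentile rank: percentage of scores at or below the value.
percentile_rank = 100 * sum(s <= value for s in scores) / len(scores)

print(f"z-score: {z_score:.2f}")
print(f"percentile rank: {percentile_rank:.1f}")
```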

Statistical Analysis System (SAS)

Procedures for Descriptive Statistics

  • PROC PRINT: Prints the observations in a dataset (all variables by default).
  • PROC CONTENTS: Describes the structure of a dataset.
  • PROC MEANS: Computes descriptive statistics.
  • PROC FREQ: Produces frequency tables.
  • PROC UNIVARIATE: Conducts basic statistical analyses of a variable's distribution.
  • PROC GCHART: Produces charts such as bar and pie charts.
  • PROC BOXPLOT: Creates box-and-whisker plots.
  • PROC GPLOT: Creates two-dimensional plots such as scatter and line plots.

Inferential Statistics

Hypothesis Testing

  • Technique for deciding whether sample data provides sufficient evidence to infer that a condition holds for the whole population.
  • Example: Comparing the wages of men and women.
  • Hypotheses:
    • Null Hypothesis (H0): Assumed true unless the data provides sufficient evidence against it.
    • Alternative Hypothesis (H1): Accepted if H0 is rejected.

Variables

  • Categorical (Nominal): Two or more categories with no inherent order (e.g., gender).
  • Ordinal: Ordered categories, but the distances between them are not defined (e.g., size: small, medium, large).
  • Interval: Ordered, meaningful differences, no true zero (e.g., temperature in Celsius).
  • Ratio: Ordered, meaningful differences, true zero (e.g., height).

Hypothesis Testing with SAS

  • The example demonstrates hypothesis testing with a one-sample T-test.
  • Null hypothesis (H0): Mean delivery time = 6 days.
  • Alpha value (significance level): Probability of rejecting H0 when it is actually true = 0.05.
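
The lecture runs this test in SAS. As a language-neutral illustration, here is a minimal Python sketch of the same one-sample T-test computed by hand. The delivery times are hypothetical, and the critical value 2.262 is the standard two-sided t-table entry for alpha = 0.05 with 9 degrees of freedom (10 observations).

```python
import math
import statistics

# Hypothetical delivery times in days for 10 orders.
delivery_days = [7, 6, 8, 5, 7, 6, 9, 7, 6, 8]

hypothesized_mean = 6   # H0: mean delivery time = 6 days
t_critical = 2.262      # two-sided t-table value for alpha = 0.05, df = 9

n = len(delivery_days)
sample_mean = statistics.mean(delivery_days)
sample_sd = statistics.stdev(delivery_days)   # sample standard deviation
std_error = sample_sd / math.sqrt(n)

# t statistic: distance of the sample mean from the hypothesized mean,
# measured in standard errors.
t_stat = (sample_mean - hypothesized_mean) / std_error

# Reject H0 when |t| exceeds the critical value.
reject_h0 = abs(t_stat) > t_critical

print(f"t = {t_stat:.3f}, reject H0: {reject_h0}")
```

Comparing against a table value keeps the sketch within the standard library. A statistics package would normally report a p-value to compare with alpha directly.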

Hypothesis Testing Procedures

Parametric Tests

  • Assume the data follows a specified probability distribution.
    • Tests include: T-test, ANOVA, Chi-square (often classified as non-parametric instead), Linear Regression.

Non-Parametric Tests

  • Do not assume a specific probability distribution or require knowledge of its parameters.
    • Tests include: Wilcoxon rank sum test, Kruskal-Wallis test.
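
To show how a rank-based test avoids distributional assumptions, here is a simplified sketch of the Wilcoxon rank sum test using the normal approximation. The function name `rank_sum_test` and the data are hypothetical. The sketch assumes there are no tied values, and the approximation is rough for samples this small.

```python
import math
from statistics import NormalDist


def rank_sum_test(group_a, group_b):
    """Wilcoxon rank sum test via the normal approximation (assumes no ties)."""
    # Rank all observations together, smallest value = rank 1.
    pooled = sorted(group_a + group_b)
    rank_of = {value: rank for rank, value in enumerate(pooled, start=1)}

    # Test statistic: sum of the ranks belonging to group A.
    w = sum(rank_of[value] for value in group_a)

    n_a, n_b = len(group_a), len(group_b)
    expected_w = n_a * (n_a + n_b + 1) / 2             # mean of W under H0
    sd_w = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)  # std. dev. of W under H0

    z = (w - expected_w) / sd_w
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))        # two-sided p-value
    return w, z, p_value


# Hypothetical measurements from two independent groups.
group_a = [12, 15, 11, 18, 14, 13]
group_b = [22, 25, 19, 24, 21, 20]

w, z, p_value = rank_sum_test(group_a, group_b)
print(f"W = {w}, z = {z:.3f}, p = {p_value:.4f}")
```

Only the ordering of the values enters the calculation, which is why no assumption about the shape of the underlying distribution is needed.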

Advantages and Disadvantages

Parametric Tests

Advantages

  • Model population parameters (e.g., mean, variance) directly.
  • Make modeling and data analysis easier.

Disadvantages

  • Assume the data follows a specific distribution, usually the normal distribution.

Non-Parametric Tests

Advantages

  • Make fewer assumptions about the data.
  • Work with data that does not follow a known distribution, including ranked (ordinal) data.

Disadvantages

  • Less statistically efficient than parametric tests when the parametric assumptions hold, and harder to apply to large samples.

End of Lecture