🧪

Handling CSV Files and Hypothesis Testing

May 21, 2024

Lecture Notes on Handling CSV Files and Hypothesis Testing

Introduction

  • The session plan was adjusted based on feedback from students facing issues with the practical assignment.
  • Problems encountered by students involved opening CSV files, where data appears in one column.
  • Aim: Address handling of CSV files and then resume hypothesis templates.

Handling CSV Files in Excel

Issue: Data in One Column

  • Data in CSV files may appear in one column when opened in Excel.
  • The problem typically arises due to system settings using decimal commas instead of decimal points.
  • Changing system settings to use decimal points solves the issue (instructions on changing this are available).
  • Alternative solution: Use Excel’s Text to Columns function.
    • Highlight the column with data.
    • Go to Data -> Text to Columns.
    • Choose Delimited -> Next -> Choose Comma delimited -> Finish.
    • Data will split into columns correctly.

Data Import Option

  • Alternative: Import data from CSV within Excel.
    • Data -> Get External Data -> From Text.
    • Choose the CSV file -> Import.
    • Follow a similar process to Text to Columns.
    • Data will be imported into Excel columns correctly.

Practical Assignment Overview

  • Four questions; Questions 2, 3, and 4 involve CSV datasets specific to each student.
  • The FIFA dataset is used for reference.
  • Automated calculations in templates are emphasized for accuracy and efficiency.

Hypothesis Testing Templates

Setting up the Template

  • Requirement: Input hypothesized value and population standard deviation.
  • Calculations must be automatic when new data is input.

Calculating P-Values

  • Different formulas for lower tail, upper tail, and two-tailed tests.
    • Lower Tail: NORM.S.DIST(t-statistic, TRUE)
    • Upper Tail: 1 - NORM.S.DIST(t-statistic, TRUE)
    • Two-Tailed: 2 * MIN(NORM.S.DIST(t-statistic, TRUE), 1 - NORM.S.DIST(t-statistic, TRUE))*

Critical Values

  • Calculated using significance levels.
    • Lower Tail: NORM.S.INV(alpha)
    • Upper Tail: NORM.S.INV(1 - alpha)
    • Two-Tailed: NORM.S.INV(1 - alpha/2) and -NORM.S.INV(1 - alpha/2)

Hypothesis Testing with Sigma Unknown

Example Template Use

  • Include sample size (COUNT), sample mean (AVERAGE), and sample standard deviation (STDEV.S).
  • Input hypothesized value and significance level.
  • Calculate the T-statistic:
    • Formula: (Sample Mean - Hypothesized Value) / (Sample Standard Deviation / SQRT(Sample Size))
  • P-Value: Use T.DIST for lower tail, 1 - T.DIST for upper tail, 2 * MIN(...) or T.DIST.2T for two-tailed.
  • Critical Values: Use T.INV and T.INV.2T functions.*

Proportion Hypothesis Testing

Setting up the Template

  • Calculate sample size (use COUNT.A and subtract one for the heading).
  • Count responses (use COUNTIF).
  • Calculate sample proportion (Count of Response / Sample Size).
  • Calculate Standard Error: SQRT((Hypothesized Value * (1 - Hypothesized Value)) / Sample Size).
  • P-Values and Critical Values: Similar to above discussions but using proportions.*

Conclusion

  • Emphasis on understanding the setup of templates for hypothesis testing.
  • Reminder to consult instructions and recordings if issues arise.
  • Next session to continue with templates and address any Q&A.