Understanding Big Data and Statistical Inference
Oct 2, 2024
Lecture 4: Big Data, Statistical Inference, and Practical Significance
Sampling Error
Definition: The deviation of a sample from the population it was drawn from (for example, a sample mean differing from the population mean) that arises purely from random sampling.
Independent random samples are generally representative of the population.
Unavoidable: Every random sample has some sampling error.
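Here is a minimal Python sketch of sampling error. It assumes a hypothetical population made up of the integers 1 to 1000. The sketch draws several simple random samples and prints how far each sample mean lands from the population mean. The function name `sampling_error` is illustrative and does not come from the lecture. The printed errors will generally be nonzero and will differ from sample to sample.

```python
import random
import statistics


def sampling_error(population: list[float], sample: list[float]) -> float:
    """Sample mean minus population mean for one sample."""
    return statistics.mean(sample) - statistics.mean(population)


if __name__ == "__main__":
    # Hypothetical population: the integers 1..1000 (population mean 500.5).
    population = list(range(1, 1001))
    rng = random.Random(42)  # fixed seed so the run is repeatable

    # Draw five simple random samples of size 50, without replacement.
    for _ in range(5):
        sample = rng.sample(population, 50)
        print(f"sampling error: {sampling_error(population, sample):+.2f}")
```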
Non-Sampling Error
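Deviations of the sample from the population that are not caused by random sampling.
Coverage Error: The population from which the sample is drawn does not align with the research objective.
Non-Response Error: Some segments of the target population are systematically under-represented or over-represented in the sample.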
Minimizing Non-Sampling Error:
Define the target population carefully before collecting data.
Design the data collection process meticulously.
Pretest the data collection procedure.
Sampling Techniques
Stratified Random Sampling: Use when population-level information about an important qualitative (categorical) variable is available.
Cluster Sampling: Use when the population can be divided into heterogeneous subgroups (clusters).
Systematic Sampling: Use when population-level information about an important quantitative variable is available.
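The three techniques above can be sketched in Python as follows. This is an illustrative sketch, not the lecture's procedure. The following are all assumptions:

* the function names;
* the proportional allocation used in `stratified_sample`;
* the hypothetical customer and store data.

Systematic sampling can use quantitative population-level information by first sorting units by that variable and then taking every k-th unit. This is one common approach, offered here as an assumption.

```python
import random
from collections import defaultdict


def stratified_sample(units, stratum_of, fraction, rng):
    """Proportional stratified sampling: draw about the same fraction from each stratum."""
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for members in strata.values():
        k = max(1, round(fraction * len(members)))  # keep at least one unit per stratum
        sample.extend(rng.sample(members, k))
    return sample


def cluster_sample(clusters, n_clusters, rng):
    """Cluster sampling: randomly choose whole clusters and keep every unit in them."""
    chosen = rng.sample(list(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]


def systematic_sample(units, k, rng, sort_key=None):
    """Systematic sampling: random start among the first k units, then every k-th unit.

    If sort_key is given, units are first ordered by that quantitative variable,
    so the sample spreads across its whole range.
    """
    ordered = sorted(units, key=sort_key) if sort_key is not None else list(units)
    start = rng.randrange(k)
    return ordered[start::k]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical customers: (id, region, annual_spend).
    customers = [(i, "north" if i % 3 else "south", 100 + 7 * i) for i in range(30)]
    print(stratified_sample(customers, lambda c: c[1], 0.2, rng))
    # Hypothetical stores (clusters) and the customer ids that belong to them.
    stores = {"A": [1, 2, 3], "B": [4, 5], "C": [6, 7, 8, 9]}
    print(cluster_sample(stores, 2, rng))
    print(systematic_sample(customers, 5, rng, sort_key=lambda c: c[2]))
```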
Big Data
Definition: Data sets so large or complex that they exceed the capacity of standard data-processing techniques and software.
Sources: Sensors, mobile devices, internet activity, digital processes, and social media.
Size Terminology
Units (in increasing order of size): Kilobyte, Megabyte, Gigabyte, Terabyte, Petabyte, Exabyte, Zettabyte, Yottabyte.
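The next Python sketch illustrates the size terminology. It assumes SI (decimal) prefixes, where each unit is 1000 times the previous one. Some software instead uses powers of 1024 (kibibyte, mebibyte, and so on). The helper `human_readable` is hypothetical.

```python
# SI (decimal) units: each step is 1000 times larger than the previous one.
UNITS = ["B", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def human_readable(n_bytes: int) -> str:
    """Express a whole number of bytes in the largest unit it reaches (capped at YB)."""
    index = 0
    # Integer comparisons avoid floating-point rounding at the unit boundaries.
    while index < len(UNITS) - 1 and n_bytes >= 1000 ** (index + 1):
        index += 1
    return f"{n_bytes / 1000 ** index:.1f} {UNITS[index]}"


if __name__ == "__main__":
    # Print each unit as a power of ten bytes.
    for power, unit in enumerate(UNITS[1:], start=1):
        print(f"1 {unit} = 10^{3 * power} bytes")
    print(human_readable(2_500_000_000))
```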
Attributes of Big Data
Volume: Amount of data generated.
Variety: Diversity of data types and structures.
Veracity: Reliability (trustworthiness) of the data.
Velocity: Speed at which data are generated.
Types of Big Data
Tall Data: Many observations (rows).
Wide Data: Many variables (columns).
Standard Error and Confidence Intervals
Standard Error: Decreases as sample size increases. The standard error of the sample mean is σ/√n, so quadrupling n halves it.
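This minimal sketch shows how the standard error of the mean, σ/√n, shrinks as the sample grows. The standard deviation of 10 is hypothetical.

```python
import math


def standard_error_of_mean(sd: float, n: int) -> float:
    """Standard error of the sample mean: sd / sqrt(n)."""
    if n < 1:
        raise ValueError("sample size must be at least 1")
    return sd / math.sqrt(n)


if __name__ == "__main__":
    # Hypothetical standard deviation of 10; each fourfold increase in n halves the result.
    for n in (100, 400, 1_600, 1_000_000):
        print(n, standard_error_of_mean(10.0, n))
```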
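Confidence Intervals:
Intervals become narrower as sample size increases.
With extremely large samples, an interval can become so narrow that it adds little beyond the point estimate. It also reflects only sampling error, not non-sampling error.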
Margin of Error: The amount added to and subtracted from the point estimate to form the confidence interval. It shrinks as sample size grows.
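Below is a sketch of a large-sample 95% confidence interval for a mean. It assumes the normal critical value z ≈ 1.96; for small samples, a t critical value would be used instead. The function name and the mean and standard deviation are hypothetical. The loop shows the margin of error shrinking as n grows.

```python
import math

Z_95 = 1.96  # approximate standard normal critical value for 95% confidence


def confidence_interval(mean: float, sd: float, n: int, z: float = Z_95) -> tuple[float, float]:
    """Large-sample interval estimate: mean +/- margin of error."""
    margin_of_error = z * sd / math.sqrt(n)
    return mean - margin_of_error, mean + margin_of_error


if __name__ == "__main__":
    # Hypothetical sample mean 50 and standard deviation 10 at growing sample sizes.
    for n in (100, 10_000, 1_000_000):
        low, high = confidence_interval(50.0, 10.0, n)
        print(f"n={n:>9,}: ({low:.4f}, {high:.4f}), width {high - low:.4f}")
```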
Implications for Confidence Intervals
Differences between sample means may be due to sampling error, non-sampling error, or a real change in the population mean.
Business Implications: Even small differences in a population mean can have substantial business effects.
Hypothesis Testing
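Very Large Samples: Almost any difference from the hypothesized value, however small, may lead to rejection of the null hypothesis.
P-Value: For a given observed difference, the p-value decreases as sample size increases.
Non-Sampling Errors: Can increase the risk of a Type I error (rejecting a true null hypothesis) or a Type II error (failing to reject a false null hypothesis).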
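The sketch below shows how the p-value behaves as the sample grows, using a two-sided one-sample z test. It assumes a known (or large-sample) standard deviation. The observed difference of 0.1 and the standard deviation of 1 are hypothetical. Both are held fixed while n grows, so only the sample size drives the p-value down.

```python
import math


def two_sided_p_value(observed_diff: float, sd: float, n: int) -> float:
    """Two-sided p-value of a one-sample z test for H0: difference = 0."""
    z = observed_diff / (sd / math.sqrt(n))
    # P(|Z| >= |z|) for a standard normal Z equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # Hypothetical: the same small difference at growing sample sizes.
    for n in (100, 400, 10_000):
        print(n, two_sided_p_value(0.1, 1.0, n))
```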
Practical vs. Statistical Significance
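Business decisions should consider both statistical significance (whether a difference is unlikely to be due to sampling error alone) and practical significance (whether the difference is large enough to matter).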
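One way to keep the two ideas separate is to report them side by side. In this sketch, `min_meaningful_diff` is a hypothetical business threshold that the decision-maker would have to choose. The p-value comes from a two-sided z test, assuming a known standard deviation.

```python
import math


def assess(observed_diff: float, sd: float, n: int, min_meaningful_diff: float,
           alpha: float = 0.05) -> dict:
    """Report statistical and practical significance as two separate judgments."""
    z = observed_diff / (sd / math.sqrt(n))
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided z-test p-value
    return {
        "p_value": p_value,
        # Statistical: is the difference unlikely under sampling error alone?
        "statistically_significant": p_value < alpha,
        # Practical (hypothetical business rule): is the difference big enough to matter?
        "practically_significant": abs(observed_diff) >= min_meaningful_diff,
    }


if __name__ == "__main__":
    # Hypothetical: a tiny difference measured on a very large sample.
    print(assess(0.1, 1.0, 1_000_000, min_meaningful_diff=0.5))
```

Take n = 1,000,000 and a threshold of 0.5. A difference of 0.1 is then statistically significant but not practically significant.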
Next Steps
A future lecture will cover using R for these calculations and computations.