What are the advantages of Concise Sampling?
It retains the characteristics of the entire stream, and its footprint can be adjusted to fit the available main memory.
Provide an example of Biased Reservoir Sampling in practice.
Selecting product ratings from users who tend to give more accurate ratings, based on user history.
What is the main goal of sampling in big data streams?
The main goal is to ensure that the sampled data retains significant characteristics of the entire stream while reducing computational resources and time.
What is the key difference between Fixed Proportion Sampling and Fixed Size Sampling?
Fixed Proportion Sampling keeps a fixed percentage of the data stream, so the sample grows with the stream, while Fixed Size Sampling keeps a fixed number of records no matter how long the stream is.
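The difference can be illustrated with a minimal sketch. It assumes that Fixed Proportion Sampling is done by keeping each record independently with a fixed probability, and that Fixed Size Sampling is done with classic reservoir sampling; the notes do not name a specific algorithm, so both choices, the function names, and the example stream are illustrative assumptions.

```python
import random


def fixed_proportion_sample(stream, proportion, rng):
    """Keep each record independently with probability `proportion`.

    The sample size is not fixed: it grows with the length of the stream.
    """
    return [record for record in stream if rng.random() < proportion]


def fixed_size_sample(stream, size, rng):
    """Reservoir sampling: keep exactly `size` records (or fewer if the stream is shorter)."""
    reservoir = []
    for index, record in enumerate(stream):
        if index < size:
            # Fill the reservoir with the first `size` records.
            reservoir.append(record)
        else:
            # Pick a slot in [0, index]; the record is kept only if the slot
            # falls inside the reservoir, i.e. with probability size / (index + 1).
            slot = rng.randint(0, index)
            if slot < size:
                reservoir[slot] = record
    return reservoir


rng = random.Random(0)
stream = range(10_000)  # hypothetical stream of record IDs

proportional = fixed_proportion_sample(stream, 0.01, rng)  # about 1% of records
fixed = fixed_size_sample(stream, 100, rng)                # exactly 100 records
```

With a longer stream, `proportional` would grow while `fixed` would stay at 100 records.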
What is a potential drawback of Biased Reservoir Sampling?
It may introduce significant biases and requires careful adjustment of analysis parameters.
What are the main advantages of Fixed Proportion Sampling?
Fixed Proportion Sampling usually yields a representative sample and works well when the data volume is very large and ample computational resources are available.
What defines Concise Sampling?
Concise Sampling maintains a small, fixed-size reservoir and keeps the sample representative by tracking unique attribute values.
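One way to picture this is the simplified sketch below. It assumes the common formulation in which repeated values are stored as value/count pairs, each record is admitted with probability 1/threshold, and the threshold is raised whenever the reservoir outgrows its memory bound. The names `concise_sample`, `max_entries`, and `growth`, and the stream of customer IDs, are hypothetical; the memory bound here simply counts distinct values.

```python
import random
from collections import Counter


def concise_sample(stream, max_entries, rng, growth=2.0):
    """Simplified concise sample: maps value -> count, with at most `max_entries` distinct values."""
    threshold = 1.0  # each record is admitted with probability 1 / threshold
    sample = Counter()
    for value in stream:
        if rng.random() < 1.0 / threshold:
            sample[value] += 1
        # When the reservoir is too large, raise the threshold and thin the sample.
        while len(sample) > max_entries:
            new_threshold = threshold * growth
            keep_prob = threshold / new_threshold
            for key in list(sample):
                # Each sampled occurrence survives independently with keep_prob.
                kept = sum(1 for _ in range(sample[key]) if rng.random() < keep_prob)
                if kept:
                    sample[key] = kept
                else:
                    del sample[key]
            threshold = new_threshold
    return sample, threshold


rng = random.Random(42)
customer_ids = [rng.randrange(50) for _ in range(5_000)]  # hypothetical customer IDs
sample, threshold = concise_sample(customer_ids, max_entries=10, rng=rng)
```

Under these assumptions, `sample[value] * threshold` gives a rough estimate of how often `value` occurred in the stream, and a larger `max_entries` trades memory for accuracy.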
Identify a challenge faced with Concise Sampling.
It is limited by memory size and needs parameter adjustment for the best results.
What is a major disadvantage of Fixed Size Sampling?
It does not guarantee a representative sample and can be biased if the data distribution is not random.
What are the advantages of Biased Reservoir Sampling?
It is suitable when resources are constrained, such as limited memory or computational power.
Give an example of where Fixed Proportion Sampling might be used.
Analyzing user sentiments on social media by sampling 1% of tweets.
What technique selects a subset of a data stream based on a non-uniform, predetermined probability distribution?
Biased Reservoir Sampling.
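A non-uniform selection of this kind can be sketched with weighted reservoir sampling. The version below uses the Efraimidis-Spirakis key method, which is one well-known weighted technique and not necessarily the one these notes have in mind. The function name, the weights, and the rating data are hypothetical.

```python
import heapq
import random


def weighted_reservoir_sample(stream, size, rng):
    """Keep `size` records, favouring records with larger weights.

    `stream` yields (record, weight) pairs with weight > 0.
    """
    heap = []  # min-heap of (key, arrival index, record)
    for index, (record, weight) in enumerate(stream):
        # A larger weight pushes the key towards 1, so the record is more likely to stay.
        key = rng.random() ** (1.0 / weight)
        entry = (key, index, record)
        if len(heap) < size:
            heapq.heappush(heap, entry)
        elif key > heap[0][0]:
            # Evict the record with the smallest key.
            heapq.heapreplace(heap, entry)
    return [record for _, _, record in heap]


rng = random.Random(7)
# Hypothetical ratings: users 0-99 are treated as more accurate (weight 100),
# all other users get weight 1. Each record is (user, rating).
ratings = [
    ((user, rng.randint(1, 5)), 100.0 if user < 100 else 1.0)
    for user in range(1_000)
]
sample = weighted_reservoir_sample(ratings, size=50, rng=rng)
trusted = sum(1 for user, _ in sample if user < 100)
```

Here `trusted` counts how many sampled ratings come from the heavily weighted users; it should be far above the 10% share those users have in the stream.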
In what scenario might Fixed Size Sampling be particularly useful?
It is useful when the data volume must be reduced and a simple implementation is preferred.
Give a practical example of where Concise Sampling can be applied.
A bank analyzing customer spending habits by selecting distinct customer IDs from transaction streams.
What challenges might arise with Fixed Proportion Sampling?
It can under-represent or over-represent parts of the data, and it requires high computational power for large data volumes.