Alright, good day, everyone. This is our Lecture 1, our first topic: an introduction to statistics and data analysis, which includes the topic of obtaining data. For the outline of this discussion, we have the introduction to statistics and data analysis, where we will talk about how statistics and data analysis are related. We're also going to talk about summary statistics, which is a sort of review of the statistics you took back in high school, and then graphical summaries.
So, let's start with the introduction to statistics and data analysis. As future civil engineers, or as engineers in general, we face certain challenges in relation to statistics and data analysis. What are those?
First, advancement in the sciences and engineering occurs in large part through the collection and analysis of data, and proper analysis of data can be challenging because scientific data are subject to random variation. When you say random variation, that means the data we collect differ from one measurement to another. It's random, meaning you cannot control it and there's no bias. And there's variation, meaning the values can differ from one another. So that's one of the challenges we face in dealing with scientific data.
Second, how can one draw conclusions from the results of an experiment when those results could have come out differently? Most of the time, we cannot control the outcome; we don't really control it. So if the outcome comes out differently, how are we going to deal with it? That's where our statistical treatment comes in.
And third, the methods of statistics allow scientists and engineers to design valid experiments and to draw reliable conclusions from the data they produce. Take note of the word valid: it means the experiment is accepted as correct, including in relation to other studies. And reliable means the conclusions could be used by other studies.
So, from these three challenges, that's where our use of what we call the engineering method and statistical thinking comes in, because these challenges are not easy to handle. What was the first challenge? We have random variation. Random, because we don't control it. Variation, because the values differ.
That leads to another challenge. What was that? The results could come out differently. From there, that's where the methods of statistics come in.
So, now we're going to talk about the engineering method and statistical thinking. Here's a diagram I can show you for statistical thinking and the engineering method. In your statistics class back in high school, you were asked to do Chapters 1 to 5 of your research or study. Chapter 1 is the introduction. Chapter 2 is the review of related literature and studies. Chapter 3 is the methodology. Chapter 4 is the analysis and presentation of data. And Chapter 5 is the conclusions and recommendations. Actually, those chapters come from the methods we're about to discuss. So here's an example of a flowchart in which we can see those chapters you wrote back in high school.
First, we have: develop a clear description of the problem. This falls under Chapter 1 of your study, which covers the statement of the problem, the objectives, and the hypothesis. From there, we can see what the problems are. Last time, we talked about the social or societal issues we're facing today, and from there we can form what we call the statement of the problem. That really is the first step.
This is a flowchart, so the next step is: identify the important factors. What are those important factors? Maybe the problem is environmental. Let's say the issue is trees being cut down so that a building can be put up, for example a residential or commercial building. What are the important factors you need to consider? For example, is there really a need for the building? Or, if there really is a need, what other factors should be considered? For example, the trees you're going to cut down are very important to the environment because they lessen the greenhouse effect, the trapping of heat in the Earth's atmosphere, so that it doesn't get too hot. So those are the factors to be balanced.
Then, from those factors, you propose or refine a model. When you say model, it could be quantitative or qualitative. At this point you're only proposing: you state the possible models that could be used so that you could at least assess, for example, the impact of cutting down trees in order to build a vertical structure.
And then, from there, you collect data. This is where our topic of obtaining data comes in. In collecting data, there are three kinds of study, or ways to collect data.
First, we have what we call the retrospective study. A retrospective study uses data that have been collected from past studies, or historical data. For example, you may already have related studies from Chapter 2 of your study. That means you can already see some of the variation. Just like we talked about a while ago, one of the challenges is the random variation of the data: random because you don't control it, variation because the values keep changing. So from there you get a heads-up: okay, in this line of study, this is the possible variation in the data that I could get.
Second, we have what we call the observational study. An observational study is an observation; that is, we use our senses: touch, smell, hearing, sight. We look at what data can be obtained in the present. It can be ocular, something we see, which is qualitative at some point. It can also be quantitative, when we're going to count.
And then we have what we call the designed experiment. These are the tests that we perform. For example, let's go back to our possible topic, the impact of cutting down trees. What are the designed experiments? We're going to test the soil to see whether it is good enough for stability. Of course, if the area was planted with trees and the soil is moist, the soil may not be stable. Or we could also test the mineral content below the surface, because it may turn out that you can no longer put up a building there; it would just topple over. Something like that. Those are the designed experiments.
We're going to test the concrete, other materials, and so on. And then from there, notice that the flow goes back to identifying the important factors. Once you have the test results, you can see whether the refined model you made is feasible. If not, if the model is not suitable, then you change it, especially if there are additional factors to consider.
And then, from there, that's where manipulating the model comes in. When you say manipulate the model, these are our statistical treatments. Then we confirm the solution against the standards, if there are standards, or against other studies, and then we draw conclusions from that confirmation of the solution and the data.
It states here that many of the engineering sciences are employed in the engineering problem-solving method. What are those? For example, the mechanical sciences: statics and dynamics. We have fluid science, or fluid mechanics; the thermal sciences: thermodynamics and heat transfer; the electrical sciences; materials science, which is evident here in civil engineering; and the chemical sciences. Those are the areas whose problems we can handle by applying this engineering method and statistical thinking.
So, next we have what we call the basic idea. What's the basic idea? The basic idea behind all statistical methods of data analysis is to make inferences about a population by studying a relatively small sample from it. When you say inference, it's like a prediction: you're going to predict something about the whole using the data. The population is the sum, the entirety. And the sample is a subset of your population.
So, for example: consider a machine that makes steel balls for ball bearings used in clutch systems. The specification for the diameter of the balls is 0.65 ± 0.03 centimeters. During the last hour, the machine has made 2,000 balls. The quality engineer wants to know how many of these balls meet the specification. He does not have time to measure all 2,000 balls, so he draws a random sample of 80 balls, 72 of which, or 90%, meet the specification. So the question is: how sure can he be that 90% of the whole population meets the specification?
Let's go over what was given. We have a population of 2,000 balls. The specification we need to satisfy is 0.65 ± 0.03 centimeters, or in this case 0.62 to 0.68 centimeters. That's what we call the range, the range of acceptable values of the ball diameter. But the QE, the quality engineer, doesn't have time to check all 2,000 balls. So what he did was take 80 balls out of the 2,000. And it turned out that 72 of them met the specification of 0.62 to 0.68 cm in diameter.
So how can he be sure? What he did was draw the balls at random. It says here that he draws a random sample. So he let the production of the 2,000 balls finish first, and then he took his sample randomly, without bias. So he could say that about 90% of the 2,000, which is how many? About 1,800 out of 2,000 balls, met the specification. That's 90% of the production of balls. What he did there is what we call inference.
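The arithmetic in the ball-bearing example can be sketched in a few lines of Python. This is only an illustration: the variable names are made up for this sketch, and the projected count is a point estimate from the sample, not a guarantee about the balls that were never measured.

```python
# Point estimate for the ball-bearing example (numbers from the lecture).
POPULATION_SIZE = 2000   # balls made in the last hour
SAMPLE_SIZE = 80         # balls the quality engineer measured
SAMPLE_IN_SPEC = 72      # sampled balls that met the specification

TARGET_CM = 0.65
TOLERANCE_CM = 0.03

# Acceptable diameter range: 0.62 cm to 0.68 cm.
lower_cm = round(TARGET_CM - TOLERANCE_CM, 2)
upper_cm = round(TARGET_CM + TOLERANCE_CM, 2)

# Proportion of the sample that met the specification.
sample_proportion = SAMPLE_IN_SPEC / SAMPLE_SIZE

# Inference: project the sample proportion onto the whole population.
estimated_in_spec = round(sample_proportion * POPULATION_SIZE)

print(f"Spec range: {lower_cm} to {upper_cm} cm")
print(f"Sample proportion in spec: {sample_proportion:.0%}")
print(f"Estimated balls in spec: {estimated_in_spec} of {POPULATION_SIZE}")
```

A different random sample of 80 balls would probably give a slightly different proportion, which is why the 1,800 should be read as an estimate.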
He made a prediction using what? The random sample of 80 balls.
So, now, let's move on to the fields of statistics. There are two fields of statistics that we're going to talk about throughout the whole semester. First, we have what we call inferential statistics. This is what we mentioned earlier with the term inference. It is the process of using data analysis to make predictions or inferences from the data. And then we also have what we call descriptive statistics, which is used to describe the basic features of the data in a study in the form of charts, graphs, and so on.
We often use these two together. In high school, maybe you just used descriptive statistics. You may also have used inferential statistics, because you took samples. But of course there was some bias, because you controlled which data you wanted. In general, our data should not be controlled; that's exactly why we make predictions. Controlled, in the sense that there is bias or that you eliminate the possible variations. In inferential statistics, we don't do that. In descriptive statistics, we use charts and graphs to present the effects in our data, our independent and dependent variables.
So those are the two fields of statistics. Now let's go to our main topic, what we call obtaining data, or collecting engineering data.
In collecting engineering data, we have what we call the population and the sample. The population is the entire collection of objects or outcomes about which information is sought. So, the whole. In statistics, the size of the population is denoted by the capital letter N. The sample is a subset of the population containing the objects or outcomes that are actually observed. In statistics, we use the lowercase letter n for its size, because it is a subset. You only take a part because, in statistics, we usually cannot get the data of the entire population. It's too difficult and it takes too much time. That's why we use a sample, a subset of the population, to get our actual observations.
So, in terms of the sample, let's talk about how we're going to sample from our entire population. We have what we call sampling techniques. First, we have what we call the simple random sample. A simple random sample of size n is a sample chosen by a method in which each collection of n population items is equally likely to comprise the sample, just as in a lottery. For example, there are 10,000 tickets and there will be 5 winners. What is the fairest way to choose the winners? Of course, you can draw lots, or use what we call the fishbowl technique, like a raffle drum (tambiolo). Then you draw 5 tickets from those 10,000 for the winners.
Another example. A utility company wants to conduct a survey to measure the satisfaction level of its customers in a certain town. There are 10,000 customers in the town, and utility employees want to draw a sample of size 200 to interview personally. They obtain a list of all 10,000 customers and number them from 1 to 10,000. They use a computer random number generator to generate 200 random integers between 1 and 10,000 and then contact the customers who correspond to those numbers. The question is, is this a simple random sample?
So, from here: the utility company has 10,000 customers, and they want only 200 out of the 10,000 to interview personally. They numbered the customers from 1 to 10,000 and then used a computer random number generator to generate 200 random integers. Every customer has the same chance of being picked, so this is what we call simple random sampling.
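As a rough sketch of what the utility employees did, Python's standard `random` module can draw the 200 customer numbers. The seed value and the variable names are assumptions made for this illustration. The sketch also assumes the 200 numbers should be distinct, which the problem statement does not spell out.

```python
import random

# Hypothetical customer list: IDs 1 to 10,000, as in the utility example.
customer_ids = range(1, 10_001)

# A seeded generator is used only so the sketch is repeatable.
rng = random.Random(2024)

# random.sample draws without replacement, so every set of 200
# distinct customers is equally likely to be chosen.
survey_sample = rng.sample(customer_ids, k=200)

print(len(survey_sample), "customers selected")
print("First five IDs drawn:", survey_sample[:5])
```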
So the answer is yes, it is a simple random sample.
Next, how about this one? A quality engineer wants to inspect electronic microcircuits in order to obtain information on the proportion that are defective. She decides to draw a sample of 100 circuits from a day's production. Each hour for 5 hours, she takes the 20 most recently produced circuits and tests them. Is this a simple random sample?
Let's take a closer look at this problem. It states here that she wants 100 circuits from the day's production. So the sample size, n, is 100, taken from the population. Now, the catch is that each hour, for 5 hours, she takes the 20 most recently produced circuits. In that case, this doesn't look like a simple random sample. The reason is that there is a bias: she picks the most recent ones.
Let's go back to the definition of a simple random sample. It says here: a sample chosen by a method in which each collection of n population items is equally likely to comprise the sample. Equally likely to comprise the sample. If she picks the 20 most recently produced circuits each hour for 5 hours, then not every possible sample is equally likely to be chosen.
So the answer in this example is no. It is not a simple random sample.
How about this one? A construction engineer has just received a shipment of 1,000 concrete blocks, each weighing approximately 25 kilograms. The blocks have been delivered in a large pile. The engineer wishes to investigate the compressive strength of the blocks by measuring the strength in a sample of blocks. What is the more appropriate method of selecting the sample?
So we have 1,000 concrete blocks. For example, this is the pile; the blocks are probably stacked on top of one another, like this. And each block weighs 25 kilos. The question is, what is the more appropriate way to sample? Can we still use a simple random sample here?
I think our construction engineer would have a hard time with a simple random sample. Why? Let's say he marks the blocks from 1 to 1,000 with integers: block number 1, block number 2, and so on. If he does simple random sampling, he has, let's say, a draw of lots or a computer-generated integer that selects his sample. Let's say he's going to take 6 blocks. Suppose the draw lands on block number 500, and it's here at the bottom of the pile. Can he get block 500 right away? Of course not; he first has to lift the blocks on top of it. So it's very impractical to use simple random sampling in situations like this.
What we're going to use is another sampling technique, called a sample of convenience. A sample of convenience is a sample that is not drawn by a well-defined random method. So what is it? We take whatever is convenient for us. Other books call it a man-in-the-street sample, because what is considered is whatever is convenient for the person taking the sample.
Of course, this is not applicable to all situations. There are only select cases where this sampling technique can be applied. We can't use it everywhere because, if we did, there would be bias in our sampling.
So, how about this example? A quality inspector draws a random sample of 40 bolts from a large shipment, measures the length of each, and finds that 32 of them, which is 80%, meet the length specification. By chance, a second inspector gets a few more good bolts in her sample, about 90%. The proportion of good bolts in the population is likely to be close to 80% or 90%, but it is not likely to be exactly equal to either value.
What happened? Let's say the first quality inspector, QI 1, tested 40 bolts in total, but only 32 of them met the specification. Then there's a second quality inspector, QI 2, and the problem says that in her sample 90% passed the inspection. Let's assume her sample was also 40 bolts. If it's 40 and 90% passed, that's 36 bolts that passed. On the other side, we have 80%. What we're told is that the true proportion is probably not exactly 80% or exactly 90%. So how can we represent this? It comes out that roughly 80% to 90% of the bolts may meet the specification; that's the range of our passing rate.
What we're seeing here is what we call sampling variation. Why sampling variation? Because there is variation in the data produced. Let's go back. For the first inspector, only 80% met the specification. For the second inspector, 90%. That means they're not the same; there's variation. So we could say 80% to 90%, or conclude from this that if there are, let's say, 1,000 bolts, roughly 80% to 90% of them may pass the specification. But they cannot state the exact percentage, because the two tests gave different results. Unless they run many more tests and find the mean percentage, and even then it still won't be exact.
So again, that's called sampling variation. Sampling variation happens when two or more different samples from the same population differ from each other, because the results of their sampling tests, the data they obtained, came out different.
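A small simulation can make sampling variation visible. The sketch below is hypothetical: the shipment size of 1,000 bolts and the true pass rate of 85% are assumptions invented for the illustration, since the lecture does not state the true proportion.

```python
import random

# Hypothetical shipment: 1,000 bolts, 850 of which meet the length spec.
# The true proportion (85%) is an assumption made up for this sketch.
shipment = [True] * 850 + [False] * 150

def inspect(sample_size, rng):
    """Draw a simple random sample and return the fraction that passes."""
    sample = rng.sample(shipment, k=sample_size)
    return sum(sample) / sample_size

rng = random.Random()  # unseeded: results change from run to run

inspector_1 = inspect(40, rng)
inspector_2 = inspect(40, rng)

print(f"Inspector 1: {inspector_1:.1%} in spec")
print(f"Inspector 2: {inspector_2:.1%} in spec")
print(f"Whole shipment: {sum(shipment) / len(shipment):.1%} in spec")
```

Running it several times typically shows the two inspectors disagreeing with each other and with the shipment as a whole, even though both sample the same population.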
So, that is sampling variation. Now let's look at sampling with and without replacement. Sampling with replacement means that what one gets in one sample does not affect what one gets in a different sample. In this case, we say that the samples are independent. It's the other way around for sampling without replacement. Sampling without replacement means that what one gets in one sample does affect what one gets in a different sample. In this case, we say that the samples are dependent.
To make this easier to understand, let's take this example. An urn contains 5 balls numbered 1 through 5. I pick 2 balls, write down their numbers, and place them back in the urn. Then I pick another 2 balls and write down their numbers. Are the two samples dependent or independent?
Let's analyze this situation. There's an urn, a container, with 5 balls numbered 1 to 5. First I drew two balls, and I got ball number 2 and ball number 3. Then, as it says, I put them back, and I drew two again, and this time I got number 3 and number 4. The question is, are the two samples dependent or independent? Let's go back to the definitions. With sampling with replacement, the samples are independent. With sampling without replacement, the samples are dependent. Here we put the balls back, so there was replacement. So we can say that our samples are independent.
How did they come out independent? Let's say it's like this. On the first draw, I took two balls out of five. Then I put them back. On the second draw, I again took two balls out of five. So the two samples are independent.
Now, when do we say that the samples are dependent? If the situation goes like this. Let's say on the first draw I got balls 1 and 2, and I did not put them back. Then on the second draw I got 4 and 5. These samples are dependent. Why? The first time, I drew two balls out of the five in the container. But the next time, it became two balls out of three. So the two draws are different. The case on the left is dependent, because it is without replacement.
That's the main difference. The case on the right, on the other hand, is independent, because the chances of drawing any two balls are the same on both draws, since the balls were put back. That's what we call with replacement. So that's the main difference between sampling with replacement and sampling without replacement.
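The urn example can be sketched in Python as follows. The seed and the variable names are assumptions for this illustration. Within each pair the two balls are distinct, as in the lecture; "replacement" here refers to whether the first pair goes back into the urn before the second pair is drawn.

```python
import random

rng = random.Random(7)  # seeded only so the sketch is repeatable
urn = [1, 2, 3, 4, 5]

# With replacement between samples: the first pair goes back into the
# urn, so the second pair is drawn from the same five balls.
first_with = rng.sample(urn, k=2)
second_with = rng.sample(urn, k=2)

# Without replacement between samples: the first pair stays out, so
# the second pair can only come from the three balls that are left.
first_without = rng.sample(urn, k=2)
remaining = [ball for ball in urn if ball not in first_without]
second_without = rng.sample(remaining, k=2)

print("Put back:    ", first_with, "then", second_with)
print("Not put back:", first_without, "then", second_without)
print("Balls left for the second draw:", remaining)
```

In the second case the first pair determines which balls are still available, which is exactly why those samples are called dependent.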
So, now, let's move on to other sampling methods. Aside from the topics we've covered, namely the sample of convenience, simple random sampling, sampling variation, and sampling with and without replacement, there are other sampling methods that can also be used. First, we have what we call weighted sampling. It's when some items are given a greater chance of being selected than others. It's like in a lottery: the more tickets you hold, the greater your chances of winning, which is true. If you're using the fishbowl technique and you put more of your entries in the fishbowl, the chance that one of them gets drawn is higher.
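The lottery idea can be illustrated with a short sketch. The names and ticket counts below are invented for the example; `random.choices` from the Python standard library selects items with probability proportional to the weights it is given.

```python
import random

# Hypothetical raffle: the number of tickets each person holds.
tickets_held = {"Ana": 1, "Ben": 3, "Carla": 6}

names = list(tickets_held)
weights = list(tickets_held.values())

rng = random.Random()  # unseeded: the winner changes from run to run

# random.choices picks items with probability proportional to weights.
winner = rng.choices(names, weights=weights, k=1)[0]

# Chance of winning for each person: tickets held / total tickets.
total_tickets = sum(weights)
chances = {name: count / total_tickets for name, count in tickets_held.items()}

print("Winner:", winner)
print("Chances:", chances)
```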
That's called weighted sampling. Next is what's called stratified random sampling. In stratified random sampling, the population is divided into subpopulations known as strata, the groups, and a simple random sample is drawn from each stratum. So you take samples from each group. The groups could be, say, male and female, or age, or ethnic groups, for example Visayans, Mindanaoans, Tagalogs, something like that.
And then we have what we call cluster sampling, which is somewhat similar to stratified sampling. In this case the items are also in groups, so you could say that cluster sampling is also a kind of stratified random sampling. The difference is that, in cluster sampling, the groups are really defined. In stratified random sampling, they may not be. Let's say the respondents are a random mix of male and female, and they also have ages. Age would be the stratified part, while gender, male or female, would be the clusters. So there is some similarity between stratified random sampling and cluster sampling, but cluster sampling focuses on the groups, or what we call clusters.
To understand one of the methods listed here, stratified random sampling, let's work through a problem. It says: determine the impact of the post-pandemic period on the commuters in four barangays in Quezon City. The total population of commuters in the four barangays is 2 million. The barangays are given as Barangay 1, 2, 3, and 4. The number of respondents wanted is 1,500 across all the barangays. Let's just suppose, for the sake of the example, that Quezon City is made up of these four barangays. So how do we get the sample for each barangay? This is stratified random sampling. The barangays are the strata. So how do we do it? First, let's look at each barangay's percentage of the population.
For 250,000, what percent is it of 2 million? It's 250,000 over 2 million. Likewise, for 1,300,000, let's find what percent it is of the population. So: 250,000 is 12.5% of 2 million. 1.3 million is quite large; that's 65%. For 350,000, it's a little higher than Barangay 1, which has only 250,000, but lower than Barangay 2. That is 17.5%. And 100,000 is 5%, because 5% of 2,000,000 is 100,000. So now we have the percentages.
The question is, how do we get the sample for each barangay, since all we need in total is 1,500? The percentages add up to 100%, so in terms of the sample, the 1,500 is our 100%. Of the 1,500, 12.5% is 187.5. 65% is 975. 17.5%, which is a little higher than Barangay 1, is 262.5. And 5% is easy: that's 75 commuters. So there, we have them.
And the total is 1,500. Now we have a problem, because there is no such thing as 0.5 of a person. We can't use these exact values. Mathematically, 187.5 is correct, but we can't go with the 0.5 because we need respondents, and there is no such thing as half a respondent. A respondent has to be one whole person. So what do we do? We need to convert this into a whole number.
In rounding off to a whole number, recall from your mathematics: let's say x is our ones digit. From x.1 to x.4, it just becomes x. Let's say x is 5: from 5.1 to 5.4, the answer is 5. But if it's 5.6 or 5.8, it becomes 6. That's the strategy for rounding off our numbers. But what about 5.5?
Here's what happens. If the digit preceding the .5 is odd, let's say 7.5 or 9.5, the answer stays at that odd number. Retain the odd. But if it's even, let's say 4.5 or 6.5, we round up so that it becomes an odd number. So what happens here? 187.5 becomes 187. The 975 stays as it is. 262.5 becomes 263, and the last one is 75.
Now, is that applicable to all situations? No. We use it here because we're dealing with people; remember that we're talking about commuters. You might ask: why did we round 187.5 down when it ends in .5? Why didn't we make it 188? As I said, if the preceding digit is odd, it remains odd, and if it's even, you make it odd. That's the rule we follow, specifically in a situation like this.
So it comes out that in Barangay 1 you need 187; in Barangay 2, 975; in Barangay 3, 262.5 becomes 263; and in Barangay 4, 75 commuters, which make up our 1,500 samples. So, that's for our stratified random sampling.
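The proportional allocation worked out above can be reproduced with a short Python sketch. The helper `round_half_to_odd` is a hypothetical function written for this illustration; it implements the rounding rule as stated in the lecture. Note that Python's built-in `round` sends exact halves to the even neighbor instead, so it is not used here. Exact fractions are used to avoid floating-point surprises at the .5 boundary.

```python
from fractions import Fraction

# Commuter counts per barangay, as given in the lecture example.
strata = {
    "Barangay 1": 250_000,
    "Barangay 2": 1_300_000,
    "Barangay 3": 350_000,
    "Barangay 4": 100_000,
}
TOTAL_RESPONDENTS = 1_500

def round_half_to_odd(value):
    """Round to a whole number; an exact .5 goes to the odd neighbor."""
    whole = value.numerator // value.denominator  # floor of the value
    remainder = value - whole
    if remainder < Fraction(1, 2):
        return whole
    if remainder > Fraction(1, 2):
        return whole + 1
    # Exactly .5: keep an odd whole part, bump an even one up to odd.
    return whole if whole % 2 == 1 else whole + 1

population = sum(strata.values())

allocation = {}
for name, size in strata.items():
    share = Fraction(size, population)      # e.g. 1/8 = 12.5%
    exact = share * TOTAL_RESPONDENTS       # e.g. 187.5 respondents
    allocation[name] = round_half_to_odd(exact)
    print(f"{name}: {float(share):.1%} -> {float(exact)} -> {allocation[name]}")

print("Total respondents:", sum(allocation.values()))
```

With this rule the rounded allocations still add up to the 1,500 respondents required; that is a property of these particular numbers, not something rounding guarantees in general.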
So, we've talked about sampling. Now let's move on to the types of data. In general, we have two kinds of data: numerical, or quantitative, and categorical, or qualitative. Numerical data assign to each sample item a numerical quantity designating how much or how many, resulting in a set of values. That's why it's called quantitative: you quantify. Qualitative or categorical data, from the term quality, place each sample item into categories, the qualities we want to capture from our samples.
In your previous statistics studies, you used both. You used quantitative data in measuring, let's say, the impact, and qualitative data in terms of, let's say, the effect. When we're studying effects, that's qualitative. For example, in concrete, what is the possible effect if you pour a highly acidic solution onto it? We can see that the concrete may degrade, for example, or its strength may decrease. That's qualitative. But once you define the strength in terms of values, you are backing up your categorical or qualitative data with quantitative data, our numerical values.
So those are the two types of data we study in statistics in general.
For example, here in our example, there's a loading test of a column-to-beam welded connection. It says here that the quantitative variable is the torque, or what we call the twisting moment, and the qualitative variable is the location of the weld. What does that look like? Let's say this is steel, a column-to-beam connection. This is your beam, and this is your column. What's being looked at is the welded connection, here, here, and here, where it was welded.
So, the moment or torque is quantitative. We can have a value of torque here; it has a unit, so of course it's quantitative. And the location of the weld is qualitative. Was it right to put the weld here? Was the welding done properly? Should we add weld or stiffeners? Will it be strong? And again, you back that up with, for example, quantitative data.
That's what I was saying earlier: quantitative and qualitative variables and data always go together. You cannot separate one from the other, because you have to back up your qualitative data with quantities, with numbers. You have to quantify. You can't just state a claim outright. Even if what we're doing is a descriptive study, you still have to back it up; you still have to quantify. For example, let's say you claim that traffic is tighter or heavier on Street 1 than on Street 2. How come? Of course, you need the number of vehicles, and so on.
So, those are the types of data.
So, we are done with sampling and the types of data. Okay? Now let's move on to what we call summary statistics. We're now on the second topic.
Summary statistics. First, we have what we call the sample mean, or the mean, or the average. This is still the arithmetic mean: the sum of the numbers in a sample divided by how many there are in the sample.
So this is the summation of the scores or values over the total number of values or scores. That's our mean.
So its symbol, the mean or x-bar, is the summation, capital sigma, of x sub i, with i ranging from 1 to n, over n: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$. That's where the 1 over n comes from. So that's our sample mean.
Okay? Along with the sample mean, we also talk about what we call the sample standard deviation and the sample variance.
The sample standard deviation is the quantity that measures the degree of spread, or scatter, of the data.
The square of the standard deviation is what we call the sample variance, which measures the degree of variation. If the standard deviation is about spread or scatter, the variance is about how much the values differ from one another.
We'll only go over this briefly because you already studied it in high school.
The variance has the symbol s squared, although in other books the lowercase Greek letter sigma (squared) is used. s squared is 1 over n minus 1 times the summation of x sub i minus the mean, squared, with i ranging from 1 to n: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$. And the square root of that is what we call the standard deviation.
So when we take the square root of s squared, the answer is s, the standard deviation.
What is the standard deviation again? It measures the degree of spread or scatter. That's why with the standard deviation we have what we call the bell-shaped curve, with your mean in the middle, then 1, 2, 3 standard deviations above it and negative 1, negative 2, and so on below it. That's how we measure the spread of your scores.
If the scores are centered and equally distributed, the curve really does look bell-shaped. So those are our sample mean, sample variance, and standard deviation. They go together with the mean because we use the mean for the center, the average.
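To make the three formulas concrete, here is a minimal Python sketch of the sample mean, the sample variance with the n minus 1 divisor, and the sample standard deviation. The function names and the list of scores are assumptions made up for illustration; they are not from the lecture slides.

```python
import math

def sample_mean(xs):
    """x-bar: the sum of the values divided by how many there are."""
    return sum(xs) / len(xs)

def sample_variance(xs):
    """s squared: squared deviations from the mean, summed, over n - 1."""
    xbar = sample_mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)

def sample_std(xs):
    """s: the square root of the sample variance."""
    return math.sqrt(sample_variance(xs))

# Hypothetical scores, for illustration only.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_mean(scores))      # mean of the scores
print(sample_variance(scores))  # divides by n - 1 = 7, not by n
print(sample_std(scores))
```

Note the divisor: the sketch divides by n minus 1, as in the formula above, not by n.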
Now, still under the sample mean: for most of the data that we have, we expect the values to be close to one another. For example, with your studies, of course, you have what we call a preconceived result.
That is, you have an expectation based on the related studies. But we can't avoid it: as with the challenges we mentioned, we have random variation, and most of the time there really are data that come out differently from the others. Those were two of our challenges. Data like that are what we call outliers.
Okay. Outliers are points that are much larger or smaller than the rest of the sample. They may result from data errors, so they need to be scrutinized and, where they are errors, corrected or deleted.
Okay. In the illustration on the right, these look like our outliers.
It's as if we're following a curve, let's say the progression of the strength of your material, and then suddenly one point is separated from the rest. Or here on the other side, this is the curve being followed, and the points should be close to it.
Let's say this is the mean, but this point ended up over here. These are called outliers. They don't belong with the rest. We can delete or remove them, or we can also look into why they happened.
From there you may see the possible causes. Let's say human error, or was there some outside effect that gave you that outlying data? Okay? So those are outliers. They come with the territory; we can't really get rid of them in our engineering data because we don't control the data.
Remember, our data are subject to random variation. They're random, and on top of that they vary. Okay?
So, we are done with the sample mean. What was that again? The average, or arithmetic mean.
And then we have the variance and the standard deviation, where we also use the mean, and then the outliers. Now, let's talk about the median.
The median is a measure of the center of our scores. To get the median, we first arrange the scores in order. The numbers are ordered from smallest to largest. It can also be from largest to smallest, depending on how you analyze the data.
Okay? If n is odd, let's say 3, 5, 7, the sample median is the number in position (n + 1)/2. Let's say n is equal to 13.
Your median is at position (n + 1)/2, so that comes out to (13 + 1)/2. 14 over 2 is 7. So the 7th value is your median.
Say you have values numbered 1 to 13. Then number 7 is your median. Okay? Now, if n is even, the sample median is the average of the numbers in positions n/2 and n/2 + 1.
For example, say we have n equal to 18. The median is an average.
Average means you add them and divide by 2, because there are two numbers. The first is in position n/2, which is 18 over 2, meaning the 9th. The other is in position n/2 + 1, which comes out to 18/2 + 1, or the 10th. So your median is the average of the 9th and the 10th values.
In terms of position, it's as if it sits at 9.5. Okay?
So that's our median. Actually, this is a review of your statistics from back when you were in high school, your senior and junior high.
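The odd and even cases just described can be sketched in a few lines of Python. This is only an illustration of the position rule from the lecture; the function name `sample_median` is made up here, and the inputs simply reuse the lecture's n = 13 and n = 18 examples with the values 1 to n.

```python
def sample_median(values):
    """Median by position: (n + 1)/2 if n is odd, else the average of
    the values in positions n/2 and n/2 + 1 (positions count from 1)."""
    ordered = sorted(values)          # smallest to largest
    n = len(ordered)
    if n % 2 == 1:
        position = (n + 1) // 2       # e.g. n = 13 -> 7th value
        return ordered[position - 1]  # Python lists count from 0
    lower = ordered[n // 2 - 1]       # value in position n/2
    upper = ordered[n // 2]           # value in position n/2 + 1
    return (lower + upper) / 2

print(sample_median(range(1, 14)))  # values 1..13 -> the 7th value
print(sample_median(range(1, 19)))  # values 1..18 -> average of 9th and 10th
```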
Okay? So now let's move on to quartiles. Actually, we discuss quartiles whenever we talk about the median.
Quartiles are what we get when we divide our set of scores or sample values using three quartiles. The first quartile is at position 0.25(n + 1). The second quartile is at 0.5(n + 1). And the third quartile is at 0.75(n + 1). So let's say this line represents the total number of sample values, n.
That's n, the whole set.
If this end is 0, then this end is 100%. We divide it with three quartiles: one, two, three. These are at 25%, 50%, and 75%, so the line ends up in four parts. This is the first quartile, as it says there, this is the second, and this is our third.
Okay? So our first quartile is at 0.25(n + 1). Doesn't that look a lot like the median?
And the 50% point, the second quartile, is also known as the median.
That's why we discuss quartiles together with the median; they fall under that topic. So this is also our median. The third quartile is at 75% of your scores.
So now, with quartiles, we also talk about the interquartile range, or IQR. What is that? It is the difference between quartile 3 and quartile 1. To understand quartiles, let's answer this problem. In the article "Evaluation of Low Temperature Properties of Hot Mix Asphalt (HMA) Mixtures," Journal of Transportation, 2002, pages 578 to 583, the following values of fracture stress, in megapascals, were measured for a sample of 22 mixtures of hot mix asphalt, HMA.
The scores appear to be already arranged, so they're easy to read. Now, what are we asked to find?
The first and third quartiles. How do we do that again?
The formula: quartile 1 is at position 0.25(n + 1). What is our n again? It's given there: 22 mixtures.
So the position of quartile 1 is 0.25 multiplied by (22 + 1), that is, 25% or 0.25 of 23, which is equal to 5.75.
That's where our quartile 1 is. Take note, this has to do with our sample, because we don't know the population here. It isn't laid out for us.
We don't know its population. Okay? So the position of quartile 1 is 5.75.
How do we use that? Let's find it. The data are arranged from lowest to highest: 1st, 2nd, 3rd, 4th, 5th, then it goes past the 5th, and this is the 6th. So our quartile 1 is here, between the 5th and the 6th values.
To get the value of quartile 1, just like with the median, we take the average, right? That is, 80 plus 105, over 2. So our quartile 1 is equal to 92.5.
Okay. That's our quartile 1. Its location is at position 5.75.
Position 5.75 is between the 5th and 6th values, not exactly in the middle but between them, so we take the average. You might ask: sir, why is that? Why isn't it three-fourths of the way, since it's 0.75?
Remember that the quartile we're talking about here is in terms of the sample. Just the sample. Okay?
That means it's possible that in our population there is a score of around 90, right? Because we came out with a quartile of 92.5.
Let's say someone got 90 or 91. That could be. We can't analyze the whole population, but we can analyze the sample.
Okay? Now, what about quartile 3? Quartile 3 is at position 0.75(n + 1). So that is 0.75 times (22 + 1), or 75% of 23, which is 17.25.
Where is that in the data? 17.25. Let's continue counting.
This is the 7th, 8th, 9th, 10th, 11th, then the 12th, 13th, 14th, 15th, 16th. So this is the 17th, and this is the 18th, the 247. So our quartile 3 is here, between 245 and 247: (245 + 247)/2. So our quartile 3 is 246. That's our answer for quartile 3. It could well be that a score of exactly 246 exists in the population.
Okay? So those are our quartiles. What if we're asked for the interquartile range? For the interquartile range, we just subtract quartile 1 from quartile 3: 246 minus 92.5, which is 153.5.
So that's the answer if we're asked for the interquartile range. Okay?
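The arithmetic of this worked problem can be checked with a short Python sketch. It uses only the numbers read out in the lecture (n = 22 and the neighbouring values 80, 105, 245, and 247); the full data table is not reproduced in this transcript, so the sketch does not try to rebuild it. The variable names are made up for illustration.

```python
n = 22  # sample size stated in the problem

# Positions of the quartiles under the 0.25(n + 1) and 0.75(n + 1) rule.
q1_position = 0.25 * (n + 1)  # falls between the 5th and 6th values
q3_position = 0.75 * (n + 1)  # falls between the 17th and 18th values

# Neighbouring sample values as read out in the lecture.
q1 = (80 + 105) / 2    # average of the 5th and 6th values
q3 = (245 + 247) / 2   # average of the 17th and 18th values

iqr = q3 - q1          # interquartile range: Q3 minus Q1

print(q1_position, q3_position)
print(q1, q3, iqr)
```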
So now let's move on to percentiles. Percentiles are similar to quartiles. Okay? With percentiles we work in percent, which means out of 100: the divisions run from 1 to 100, or 0 to 100.
As it says here: the pth percentile divides the sample so that, as nearly as possible, p percent of the sample values are less than the pth percentile and (100 minus p) percent are greater, where p is between 0 and 100. Let n represent the sample size.
Then the position comes out as (p/100)(n + 1). It looks just like the quartile, right? So that means our 25th percentile is the first quartile, because 25 over 100 is 0.25. The median is the 50th percentile and the second quartile, and the 75th percentile is the third quartile.
Okay? If that quantity is an integer, the sample value in that position is the percentile. Otherwise, take the average of the two sample values on either side. So it's just like the quartiles earlier.
Okay? To understand it, let's answer this problem. Same problem, same given. We're asked to find the 65th percentile.
So that's P65, which uses 65 over 100, because it's a percentile, times (n + 1).
That's 0.65 multiplied by (22 + 1). So the position of our 65th percentile is 14.95, which is between the 14th and the 15th values. If this one here is the 11th, then 12th, 13th, 14th, 15th. So it's between these two.
This is where our P65 is. So our 65th percentile is (236 + 240)/2, which is equal to 238. That's our answer for the 65th percentile. So it's easy.
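The general rule, position (p/100)(n + 1), then either take that value or average the two neighbours, can be written as one small function. The sketch below follows the rule as stated in the lecture; the function name `sample_percentile` and the seven-value data set are assumptions for illustration, not the asphalt data.

```python
def sample_percentile(values, p):
    """pth percentile using the (p/100)(n + 1) position rule."""
    ordered = sorted(values)
    n = len(ordered)
    # Multiply before dividing so that whole-number positions stay exact.
    position = p * (n + 1) / 100
    if position < 1 or position > n:
        raise ValueError("position falls outside the sample")
    lower = int(position)              # whole-number part of the position
    if position == lower:
        return ordered[lower - 1]      # integer position: take that value
    # Otherwise average the two sample values on either side.
    return (ordered[lower - 1] + ordered[lower]) / 2

# Hypothetical data, for illustration only.
data = [12, 15, 11, 18, 20, 14, 17]
print(sample_percentile(data, 25))  # first quartile
print(sample_percentile(data, 50))  # median
print(sample_percentile(data, 65))  # position 5.2 -> average of 5th and 6th
```

Other textbooks and software interpolate between the two neighbours instead of averaging them, so results from a library may differ slightly from this rule.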
Actually, the idea behind quartiles and percentiles comes from the median, right? Remember what happens when n is odd or when n is even.
It depends. But here we use n plus 1. What was our n plus 1 again? With (n + 1)/2, that's the position when n is odd; when n is even, you still have to take an average.
That idea from the median is what we apply to quartiles and percentiles. Okay? So that's our median, together with quartiles and percentiles.
Okay? So now we are done with summary statistics. That's all we really covered.
Actually, it was mostly the mean and the median, because those are what we use to analyze data.
We also deal with modes. If you remember, there are three: mean, median, mode. The mode is about count: which value occurs most often. Are there more of value 1, or more of value 2?
We won't include that anymore, because it has more to do with observing qualities in our data.
It belongs more in the recommendation and conclusion part. It depends on the counts, the scores, the variation.
Actually, the modes tend to fall near the mean and the median too. If many of the data have the same value, then the mean will be close to that value, right?
For you, say after an exam, if many of the scores are in the 50 to 60 range, then the mean is probably in the range of 50 to 60 as well. Okay?
So that's it. Okay? Now let's move on to the last item in our topic outline, which is graphical summaries. Graphical summaries are what we usually use to present our data.
After we compute the mean, median, and mode, and after we present the sampling method, this comes next. The first of them is what we call the stem-and-leaf plot. To understand the stem-and-leaf plot, we have a sample problem here right away.
The table shows a study of the bioactivity of a certain antifungal drug. The drug was applied to the skin of 48 subjects. After 3 hours, the amounts of drug remaining in the skin were measured in units of nanograms per square centimeter. The list has been sorted in numerical order. Numerical order could be highest to lowest or lowest to highest; here it starts from the lowest values.
A stem-and-leaf plot has a stem and a leaf. That's really what it is. In this case, you can see that the stem is the tens digit, and the leaf is the ones digit.
So here we have 3, 4, 4, 7, 7, 8, 9, 9. Their tens digit is 0, which is why they're placed in this row: 3, 4, 4, 7, 7, 8, 9, 9.
Okay? When our stem is 1, what are the values? They're in the tens: 12, 12, 15, 16, 16, 17, 17, 18. So what you write in the leaf is the ones digit: 2, 2, 5, 6, 6, 7, 7, 8. And so on. That's why for 20 there are two zeros: 20, 20.
Okay? Stem-and-leaf plots are made so that calculation is easier and quicker, without relying much on a calculator. They're usually used for rough estimation, because you can see the values right away. You can think of the idea as similar to the abacus, the Chinese calculating tool.
What is an abacus? It's the one with beads. The Chinese could solve quickly because they were used to the abacus. The stem-and-leaf plot works in a similar way.
For example, suppose we're asked to get the average of the scores. Then what you do is add them, right? How do you add them quickly? With the stem-and-leaf plot, all the ones are on this side and the tens are on that side.
So you count. Take this row: add the leaves 3, 4, 4, 7, 7, 8, 9, 9, and you have a total for it.
Next, take the row with 12, 15, 16, and so on. First count the leaves: 1, 2, 3, 4, 5, 6, 7, 8. There are 8.
That means 8 times 10, because this stem stands for 10. So you have 80. Then you add up the leaves and add that on.
So you build up the totals row by row, and so on, for however many rows there are.
For example, how many are in this row? 1, 2, 3, 4. The stem here is 4, so that's 40 times 4, which is 160. This row has a stem of 5, also times 4, so that's 200. This one has none, this one has one, and this one is in the 70s.
So the totals come out: 200, 160, and so on. That makes it faster to compute and to get the average.
So that's the stem-and-leaf plot. It's used when you want to get the mean quickly: you don't have to add up all the values one by one, because the computation by rows is fast.
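The splitting into tens and ones, and the row-by-row adding, can be sketched in Python as follows. The list holds only the values read aloud in the lecture (the rows with stems 0, 1, and 2), not the full table of 48 measurements, and the function names are made up for illustration.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group each value by its tens digit (stem) and ones digit (leaf)."""
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)   # e.g. 17 -> stem 1, leaf 7
        rows[stem].append(leaf)
    return dict(rows)

def total_from_rows(rows):
    """Add row by row: stem * 10 * (number of leaves) + sum of leaves."""
    return sum(stem * 10 * len(leaves) + sum(leaves)
               for stem, leaves in rows.items())

# Only the values read out in the lecture, not the full data set.
partial = [3, 4, 4, 7, 7, 8, 9, 9,
           12, 12, 15, 16, 16, 17, 17, 18,
           20, 20]

rows = stem_and_leaf(partial)
for stem in sorted(rows):
    print(f"{stem} | {' '.join(str(leaf) for leaf in rows[stem])}")

print(total_from_rows(rows) / len(partial))  # mean of the listed values only
```

For the stem-1 row, the function does exactly what the lecture describes: 8 leaves times 10 gives 80, and the leaves themselves are added on top.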
The stem-and-leaf plot is one example. Its counterpart, closely tied to it, is what we call the dot plot. A dot plot is a graph that can be used to give a rough impression of the shape of the sample. It is useful when the sample size is not too large. Like this one, which isn't that many.
How many is it? 48. The population could be, let's say, 1,000, and only 48 were taken.
So it works when the sample isn't too large, and when there are repeated values. The scores here come from the same data as the stem-and-leaf plot.
These are the 3, 4, 4, 7, 7, 8, 9, 9. See? It lets you see where the data fall, or which category has more of the data.
For example, it turns out that 20 to 30 nanograms per square centimeter is where many of the drug values fall.
You can see the amounts that remained on the skin at 20 to 30 nanograms. So that's how many there are. The one doing the study gets a rough estimate: okay, if this is where most of the values are, then your mean could fall here.
It could also be from 20 to 40, because the points seem to cluster a bit more there. There seems to be a significant number of scores here.
So you can see it's rough. A rough impression. It's not exact. That's our dot plot.
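A dot plot is simple enough to draw as text. The sketch below prints one row per value with one "o" per occurrence, which is a sideways version of the usual dot plot. The function name is made up, and the list is again only the values read aloud in the lecture, not the full 48 measurements.

```python
from collections import Counter

def dot_plot(values):
    """Return one text line per value: the value, then one 'o' per occurrence."""
    counts = Counter(values)
    lines = []
    for v in range(min(values), max(values) + 1):
        lines.append(f"{v:>3} | {'o' * counts[v]}".rstrip())
    return lines

# Only the values read out in the lecture, not the full data set.
partial = [3, 4, 4, 7, 7, 8, 9, 9,
           12, 12, 15, 16, 16, 17, 17, 18,
           20, 20]

for line in dot_plot(partial):
    print(line)
```

Repeated values stack up as longer rows, which is what gives the rough impression of where the data cluster.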
And the last of our graphical summaries is what we call the histogram. The histogram is what we use most often when we present data, especially when there are many sample values. As it says here, a histogram is a graphic that gives an idea of the shape of the sample. Earlier, the dot plot gave only a rough impression.
With this one you can actually see the shape. You can see a bell shape here if the scores are nicely distributed. As it says, it indicates regions where the sample points are concentrated and regions where they are sparse, that is, where they are close together or spread apart. This example shows the emissions of vehicles at high altitude: particulate matter, or PM, emissions.
So how do we make a histogram? For a histogram, we first need to make what we call a frequency table.
That's what you see on the right side, the one with the class interval in grams per gallon, then the frequency, and then the relative frequency. Those are what we use. The class intervals go on the x-axis, and the relative frequency goes on the y-axis of the histogram.
The class interval may simply be dictated, it depends. We can also get it using what we call the range, right? What is the range? Highest score minus lowest score.
Then we look at what class interval is workable. In this case, it looks like the class width used is 2, right? Because the first class goes from 1 to 3: 1 less than or equal to x, but x less than 3.
So that's our class interval. Okay?
Once we've made the frequency table, in the frequency column we put the number of scores that fall in that range, that class interval. Okay? And then their percentage: whatever the count is over the total of 62, like 12 over 62, then 2 over 62, 1 over 62, and so on.
Okay? Once we have that, we draw the histogram.
Our emissions, the class intervals, are on the x-axis. And then our y-axis is the relative frequency.
The percentages can be in percent or in decimal form. Okay? From here, we can see its shape: it's a bit high here, it goes down, then it goes up again. That means the scores are concentrated in the class interval from 5 to 7, because that's where the most are: 18.
You can see it, right? That's where our scores cluster. As it says here, to construct a histogram: one, determine the number of classes to use and construct intervals of equal width. We used a width of 2 in this case. Two, compute the frequency and relative frequency of each class.
For the frequency, we just count. The relative frequency is the number counted over the total. And then three, draw a rectangle for each class.
That's our histogram. Now you might ask: sir, isn't that a bar graph too? A bar graph is different from a histogram.
Not every bar graph is a histogram, because a bar graph can have anything on its x and y axes. But a histogram can be seen as a kind of bar graph, since it also has an x and a y axis. It's just that in a histogram the y-axis has to be the frequency or relative frequency, in percent or in decimal, and the x-axis has to be the class intervals.
So if the question is whether a histogram is a bar graph: it can be one, but not all bar graphs are histograms. So that's our definition of a histogram. Actually, this is just a review too.
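The three steps for constructing a histogram can be sketched in Python. The class width of 2 starting at 1 follows the lecture's example, but the emission values below are invented for illustration (the lecture's table of 62 vehicles is not reproduced in this transcript), and the function name is an assumption.

```python
def frequency_table(values, start, width, num_classes):
    """Equal-width classes [start, start + width), ...: count the values
    in each class, then divide each count by the total."""
    counts = [0] * num_classes
    for v in values:
        k = int((v - start) // width)   # index of the class containing v
        if not 0 <= k < num_classes:
            raise ValueError(f"{v} is outside the class intervals")
        counts[k] += 1
    total = len(values)
    return [(start + k * width, start + (k + 1) * width,
             counts[k], counts[k] / total)
            for k in range(num_classes)]

# Hypothetical emissions in grams per gallon, for illustration only.
emissions = [1.5, 2.1, 3.3, 3.9, 4.4, 5.0, 5.2, 5.8, 6.1, 6.9, 7.5, 9.2]

table = frequency_table(emissions, start=1, width=2, num_classes=5)
for low, high, freq, rel in table:
    # One text "rectangle" per class; its length is the frequency.
    print(f"{low} <= x < {high}: {freq:2d}  {rel:.3f}  {'#' * freq}")
```

The class intervals label the rows (the x-axis of a drawn histogram), and the bar length stands in for the height of each rectangle.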
I know some of you did this in your senior high and junior high, so it should be fairly easy. Now let's talk about skewness. Skewness actually comes from the histogram as well. Skewness refers to the asymmetry of a histogram.
A symmetric histogram has its right half a mirror image of its left half. It's like folding it in half. This one is symmetric.
Let's say it looks like this. It comes out like this.
Okay? So this one is symmetric.
It has a single peak, a unimodal distribution. Okay? Now, this one is what we call skewed to the left, or negatively skewed. Why skewed to the left?
Because the left side is low. Say the axis here runs from 0.6 to 2.0.
Where is the middle? Around 1.2, or let's say 1.3. Let's say this is its middle.
See, on the left there are only a few, so it's negatively skewed. It's skewed to the left. That means its symmetry is not even.
Almost all of the scores are on the right. But when it's positively skewed, or skewed to the right, let's say this runs from 0 to 1.4, so the middle is around 0.7. That means here on the right there are almost no scores. The scores are on the left.
You might get confused, but that's really how it is. This one is positively skewed, and this one is negatively skewed, or skewed to the left.
So that's our skewness. When the two sides are exactly even, the histogram is called symmetrical, or symmetric. Skewness is also related to what we call histogram modes, another feature of graphical summaries. A histogram mode is what we call a peak, or local maximum, in a histogram.
Like the one earlier, the symmetric one. Its histogram mode is labeled single peak, or unimodal.
So that's unimodal. It means there is only one peak, at a single location.
But in what we call a bimodal histogram, two peaks come out. Maybe there was some variation, or what we call subsamples. For example, you took a quiz.
You have your quiz number 1, and there are scores that fall in 50 to 60. This is just an example; the scale in the figure is different, but let's make it 50 to 60. Then quite a number of others also clustered in 70 to 80. And it just so happened that the two were equal.
For example, how many are you? Let's say there are 50 of you in class, and it turns out there are 17 each
in 50 to 60 and in 70 to 80. Then we'll have two peaks. That's called a bimodal distribution. What could the factors be? Let's say those who landed in 50 to 60 studied an average amount.
And those in 70 to 80 were perhaps in the same study group. Or even if they weren't in the same group, maybe they got the right references or studied more thoroughly than those in 50 to 60; both groups passed anyway. Okay?
Or it could be that your samples were grouped or clustered.
It comes out as a grouping. Let's say those who got 50 to 60 were males aged 10 to 12, and those who got 70 to 80 were males aged 13 to 14, because they know a bit more. Then you get a double peak.
Okay? So that's our histogram mode.
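Since a histogram mode is a peak, or local maximum, one rough way to count modes is to look for classes whose frequency is higher than both neighbours. The sketch below does that for the quiz example: the two counts of 17 come from the lecture, while the other class counts (chosen so the total is 50) and the function name are assumptions for illustration. It is a deliberately simple check and ignores flat-topped peaks.

```python
def histogram_peaks(frequencies):
    """Return the indexes of classes that are higher than both neighbours."""
    peaks = []
    last = len(frequencies) - 1
    for i, f in enumerate(frequencies):
        left = frequencies[i - 1] if i > 0 else -1       # no class to the left
        right = frequencies[i + 1] if i < last else -1   # no class to the right
        if f > left and f > right:
            peaks.append(i)
    return peaks

# Classes: 40-50, 50-60, 60-70, 70-80, 80-90 (50 students in total).
# Only the two 17s are from the lecture; the rest are made up.
quiz = [4, 17, 8, 17, 4]
print(histogram_peaks(quiz))              # two peaks -> bimodal

# A hypothetical symmetric, single-peak histogram for comparison.
print(histogram_peaks([2, 5, 9, 5, 2]))   # one peak -> unimodal
```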
Okay? So that ends our discussion for Lecture 1, our introduction to statistics and data analysis, which includes obtaining data.