Transcript for:
Understanding the Basics of Data

The word data means many different things to many different people, and for that reason the purpose of this lecture is to provide a primer on data.

So what does the word data actually mean? Data refers to the facts of transactions or events. Importantly, the word data is actually plural, so data refers to multiple data points. If we want to refer to a single data point, we can say data point or we can say datum. Now, in the real world many people use the word data as if it were singular, and that's fine; you'll see this commonly on news sites. But for our purposes, we'll treat data as a plural word.

We also need to think about the quality and the integrity of our data, because if we have garbage data going into our data analysis, we're going to have garbage findings and interpretations. So it's important to understand what data actually are, the different types of data, and what we should do to make sure they have high integrity and high quality so that they can be analyzed. Part of this comes with good measurement and good data acquisition, but some of it we can do on the back end too, by applying structure to the data so that we can analyze them and interpret the findings in a meaningful way.

When we think about data integrity and quality, we need to keep in mind that data in and of themselves are not inherently of high quality or necessarily error-free. Sometimes we think, oh, this is in numeric form, it's data, and therefore it must have some kind of value and must be of high quality. But that doesn't necessarily mean it's accurate. There are all sorts of ways for errors to creep in: transcription and transposition errors, a lack of validation, joining errors, and various other human errors that can lead to low-quality or, quote, garbage data. So we really need to focus on how data can be cleaned and structured so they can actually be of use
to us, so that we can have high data integrity and high data quality as well. Again: garbage in, garbage out.

We can distinguish between two types of data in general: qualitative data versus quantitative data. So what's the difference between them? Qualitative data involve rich description. This can be text, a photo, a recording of some oral communication, a sound, or something like that. These data don't have inherent numeric quantities, or at least numeric quantities haven't been applied to them yet. Quantitative data, on the other hand, have inherent numeric properties and therefore can be counted and analyzed using quantitative data analyses. Qualitative data need to be analyzed using a separate set of techniques referred to as qualitative data analysis, which includes thematic analysis, content analysis, and so forth.

We're going to focus mostly on quantitative data, not because they're more important, but because this really gets at the heart of what HR analytics is currently. With that said, we have a number of effective new tools for natural language processing, latent semantic analysis, and text analysis in general that can be useful for fast-tracking qualitative data analyses, which historically have taken quite a bit of time because they involve multiple raters and judges going through qualitative data, whether open-ended survey question responses or something like that.

So let's look at an example of how we can distinguish between qualitative data and quantitative data, this time in the context of performance evaluations. Let's imagine that we have a performance evaluation tool. Part of the tool involves raters rating the employees along a number of different performance dimensions or standards, and the other part is an open-ended
component at the end, where raters fill in the blank and provide information in narrative form about the employees, describing in words what they did or didn't do, or what they did well or not so well, during the review period.

If we look at the first row here, and let's assume that this row contains data for the same person, an example of qualitative data would be: John completes reports in a timely manner and with few errors. Quantitative data, on the other hand, would be their actual numeric score. So we can see in the right-hand column that we have the quantitative data: 3.9, 4.4, 4.2. This is something that we can apply descriptive statistics to, as well as inferential statistics and other types of quantitative techniques. The qualitative data on the other side are not going to be readily analyzed that way. That's not to say they can't be quantitatively analyzed; it would just take an extra step of somehow counting something in the qualitative data, maybe the number of times that someone uses positive words from a select list, or something like that. But in their original form, they're not quite ready to be analyzed using a quantitative approach. As I mentioned, there's a whole other area of qualitative data analysis that we can pursue.

In practice, though, the distinction between qualitative and quantitative data is not always so clear. Sometimes we make that transition very quickly, from something that is inherently qualitative in nature to something we then apply numeric properties to. Let's take the example of an employee survey. Let's assume we are surveying employees using a number of different survey items, or questions, to which they respond using a Likert-type scale that ranges from very satisfied to very dissatisfied, and we're trying to tap into the concept, or construct, of job satisfaction. So let's imagine the item that we're interested in is a job satisfaction
item that's very common: In general, I am satisfied with my job. Are you very satisfied, somewhat satisfied, neither satisfied nor dissatisfied, somewhat dissatisfied, or very dissatisfied? As you can see here, we've arbitrarily coded very satisfied as 1, all the way down to very dissatisfied as 5. So we've taken these qualitative descriptions of how satisfied someone is and translated them into quantitative scores for a person or a group of people. Again, the distinction is not always clear, and it can shift very quickly depending on our choices about what to quantify and what not to quantify.

Another thing to consider when we're thinking about data broadly is that data don't speak, they're interpreted. This is a quote from John Mathieu from the University of Connecticut. This is an important thing to remember: data analysis and data interpretation involve a human being, so we need to consider how we interpret the data in some context. The data aren't saying anything to us. We have to look at them very carefully, make inferences about them, interpret them, and then report them outwardly.

Next, we can distinguish between the concepts of data, information, and knowledge. We tend to use these terms interchangeably, but there are some technical distinctions between them. You can think of data as the most basic version, information as built upon data, and knowledge as built upon information. Again, as we discussed before, data represent the facts of transactions or events. Information, on the other hand, refers to the interpretation of the data with a particular goal in mind. And finally, knowledge refers to information that has been given meaning; it consists of the procedures that are necessary to follow to use the data and information.

So let's bring in an example to make this a
little bit more lifelike. Let's use the example of employee age. If we talk about facts about employee age, this could simply be a list, an array, a vector, or a column filled with employees' ages. Let's say you have 100 different employees; we could have their actual ages listed out in years. That's the simplest form: those are the data.

Next, we could look at the information. We can do something with those data with a given goal or purpose. Let's say we calculate the arithmetic average, or mean, and find that this group of employees has a mean age of 45.3 years.

We can take that one step further by translating this into knowledge. Let's assume 45.3 is considered a high average employee age in this context. We can then use it to plan for future recruitment and to anticipate the likelihood of retirement, assuming that most people are going to retire around 65 to 70 or so.

Okay, so this concludes the primer on data. Again, this is not meant to be exhaustive; it's meant to be a crash course on what the word data actually means. Thank you very much.
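
As a rough illustration of the employee age example from the lecture, the step from data to information to knowledge might be sketched in Python as follows. The ages, the cutoff for a "high" average age, and the five-year planning horizon are all made-up assumptions for illustration; only the retirement age of 65 comes from the range mentioned in the lecture.

```python
from statistics import mean

# Data: the raw facts. These ages are hypothetical values, not from the lecture.
employee_ages = [29, 41, 52, 58, 61, 47, 38, 63, 55, 44]

# Information: the data interpreted with a goal in mind (here, the arithmetic mean).
mean_age = mean(employee_ages)

# Knowledge: information given meaning through a procedure we can act on.
HIGH_MEAN_AGE = 45     # assumed cutoff for a "high" average age in this context
RETIREMENT_AGE = 65    # lower end of the 65-to-70 range mentioned in the lecture
PLANNING_HORIZON = 5   # assumed number of years ahead we plan recruitment for

# Employees who reach the assumed retirement age within the planning horizon.
near_retirement = [a for a in employee_ages if a >= RETIREMENT_AGE - PLANNING_HORIZON]
workforce_is_older = mean_age > HIGH_MEAN_AGE

print(f"Mean age: {mean_age:.1f} years")
print(f"Employees within {PLANNING_HORIZON} years of age {RETIREMENT_AGE}: {len(near_retirement)}")
if workforce_is_older:
    print("Average age is high: plan recruitment for likely retirements.")
```

The list is the data, the mean is the information, and the decision rule at the end is one possible form of knowledge. A real analysis would choose the cutoffs from the organization's own context.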