Thank you for that lovely introduction. I thought I would get started by sharing one or two small facts that maybe not everyone on this call knows about me: my background is also in teaching high school science. I spent three years as a high school physics teacher before making my way into educational research and the field of learning analytics.

What I'm going to try to do in the next 10 to 15 minutes is give the 10,000-foot view of what learning analytics is all about. In doing so I'll try to pull in some examples from STEM education, and, in response to a question Sean asked me, with each piece of the puzzle I'll try to keep the focus on: why does this matter, what does this tell us about learning, and how does this help us educate better? With the caveat that I can't do justice to the field in ten minutes, let's get going.

So what is learning analytics? The basic definition, from back in 2011 and still in use, is that it's the collection and analysis of data traces related to learning in order to inform and improve the process and/or its outcomes. There's one important thing I would add that is implicit in this definition but doesn't always pop out for people: the idea of closing the loop. That's what really differentiates learning analytics from a lot of educational data science. We're not just using more sophisticated forms of data and complex computational approaches to understand what's going on; we're trying to find something that's actionable in the moment, actionable for teachers, for students, maybe for advisors. This isn't long-cycle education research, where we learn something that might change the world in ten years, or five, or even one. The notion is that we're developing traces that can let us know how learning is going while it's happening, so that we can do something about it right now, for the same learners from which the data came.
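To make the "closing the loop" idea concrete, here is a deliberately minimal sketch, with every event name and threshold invented for illustration rather than taken from any real system: an indicator is computed from live activity traces and immediately turned into an action for the same learners the data came from.

```python
# A minimal, hypothetical "closing the loop" sketch: compute an indicator
# from activity traces collected during the current session and flag
# students for immediate follow-up. Event names and the threshold are
# invented for illustration only.

def inactivity_alerts(events, all_students, threshold=1):
    """Flag students whose event count in the current window is below threshold."""
    counts = {s: 0 for s in all_students}
    for student_id, _action in events:      # each event: (student_id, action)
        counts[student_id] += 1
    return sorted(s for s, n in counts.items() if n < threshold)

# Toy traces from the session in progress, not a retrospective dataset:
events = [("ana", "open_step"), ("ana", "run_model"), ("ben", "open_step")]
print(inactivity_alerts(events, ["ana", "ben", "cai"]))  # → ['cai']
```

The point of the sketch is the timing, not the arithmetic: the alert fires while the activity is still underway, so a teacher can act on it for these students rather than for next year's cohort.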
It's a bit controversial to start with data, because I would argue it's not the most important part, but it is what most people think of when they think of learning analytics. So I'm going to start with data and build up to the larger pieces surrounding it.

What's new about data? Data isn't new; we've had data in education for a very long time. What is new is the number of different people we can collect data for; particularly with everything having moved remote in the last year, the number of people we have some sort of data on has increased dramatically. There's also the number of different variables we're able to collect: lots of different things you can see about what students are doing, when, and how. In addition, and this is maybe one of the more important ones, we're able to collect this data much more frequently, so we suddenly have much finer granularity. There's immediacy of access to the data and to the analyses done on it. And there's the notion of linkages: as everything starts to get stored together in the cloud, the opportunity to connect different sources of data and learn things we couldn't before is very exciting. It's also one of the main challenges; just connecting our registrar data and our LMS data turns out to be non-trivial.

So what kinds of data are we talking about? I'm going to use a classification system I elaborated from some work by Ulrich Hoppe. First we've got the old-school data education has always had: who the students are and how they've done. But when we talk about analytics, we're usually thinking of one of three kinds of data that we call the three A's. The first is activity: things students are doing, which might be captured through log files, physical traces, eye tracking, or self reports.
We also have artifacts: things students create. These might be answers they write or numbers they give while solving a problem; they can be steps along the way or something at the end. And finally we have associations: connections students make, who and what they interact with, which things they look at in temporal succession, what kinds of linkages they draw. I find this very helpful for thinking about what new kinds of data there are. If you're a STEM educator or STEM researcher, you can ask: what is happening in every moment? What are students doing, what are they creating, and what are they connecting, and how can we collect data on that which might tell us something about the learning going on?

A couple of caveats about what big data isn't, because I think sometimes we'd all like it to be more than it can be. It's not neutral. We often imagine the data is just sitting there, but the process by which it was created always influenced it in ways that can be very important: which sources of data exist and which don't, and which categories were used. In some of the work we've been doing at NYU, we've been having a lot of conversations about the categories used for race, ethnicity, and gender; rather than taking for granted the ones that happen to be sitting around, we ask whether they're the right ones, whether those are even the right kinds of variables to be looking at, or whether they're proxies for things we could measure in much better ways. Big data is also not natural: good data is not easy to get, and it's not something that just happens. It has to be purposeful, and it's going to involve planning and processing.
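The three A's, activity, artifacts, and associations, can be made concrete with a toy sketch. The event vocabulary below is entirely invented; no real logging system is implied.

```python
# Toy illustration of the three A's classification; every event name
# here is invented for illustration, not taken from a real system.
THREE_AS = {
    "activity":    {"open_step", "run_simulation", "pause_video"},   # what students do
    "artifact":    {"submit_answer", "save_graph", "write_note"},    # what students create
    "association": {"reply_to_peer", "link_resources", "cite_step"}, # what students connect
}

def classify(event_name):
    """Map a raw trace event onto one of the three A's, or None if untagged."""
    for bucket, names in THREE_AS.items():
        if event_name in names:
            return bucket
    return None

print(classify("save_graph"))     # → artifact
print(classify("reply_to_peer"))  # → association
```

Even this trivial bucketing makes the design point: someone had to decide which events to log and how to categorize them, which is exactly the sense in which the data is not neutral.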
Good learning analytics doesn't start after students have engaged in the activity, with the data sitting around; it starts with thinking about how the activity and the tools are set up, and how that setup will generate the right kind of data to let you do something interesting and useful. The phrase "garbage in, garbage out" is important to heed: no matter how good your algorithms are, if the data wasn't tracking interesting things, you can't generate that after the fact.

So although we think of data as the starting point, data in fact comes second to design. And as I mentioned, the data is not neutral. For example, in a study of students walking around a college campus, you might imagine they simply walked where they walked, but in fact a design factor strongly shaped where they walked: the pathways. The same thing exists in online environments, and I pulled a couple of STEM-relevant examples I thought might be interesting. This one is from a really nice study by Vanessa Svihla and colleagues back in 2015, done in the WISE inquiry learning environment, which I'm guessing a few of you are familiar with. What do the tracks, analogous to the pathways I just showed, look like here? There's the reset-and-run button, a couple of toggles and numbers students can fill out, and the different steps they can go through. What the study found, looking at the steps on the left, is that when students went back and revisited certain kinds of steps, it was a predictor of learning. But it wasn't all steps; it was the ones with dynamic, interactive visualizations, and the benefit only showed up when students revisited them in a distributed manner. In other words, they had to go back, separated over time.
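A finding like that can be mined from timestamped log files. Here is a hedged sketch of the general idea, not the study's actual method: the step IDs, the tags marking which steps are dynamic visualizations, and the one-hour gap heuristic are all invented for illustration.

```python
# Hypothetical sketch: count "distributed revisits" of visualization steps
# from timestamped log events. The step tags and the one-hour gap are
# invented, not the published study's actual coding.
DYNAMIC_VIZ_STEPS = {"s3", "s7"}   # steps tagged as interactive visualizations
GAP = 3600                          # seconds apart to count as "distributed"

def distributed_revisits(log):
    """log: list of (timestamp_seconds, step_id) events for one student."""
    last_seen, revisits = {}, 0
    for t, step in sorted(log):
        if step in DYNAMIC_VIZ_STEPS and step in last_seen and t - last_seen[step] >= GAP:
            revisits += 1           # a return to a viz step after a long gap
        last_seen[step] = t
    return revisits

log = [(0, "s3"), (120, "s4"), (4000, "s3"), (4100, "s7"), (4200, "s7")]
print(distributed_revisits(log))  # → 1  (s3 revisited later; s7 revisit too soon)
```

A per-student count like this could then serve as a feature in a model predicting learning outcomes, which is the shape of the analysis the study performed.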
That finding was mined from log-file data, so it might be considered more of an educational data mining study. But there's a really exciting, more recent one I wanted to show you that takes this a step further into learning analytics. This is work by Janice Gobert and her team in their inquiry environment, and you can see a similar kind of thing: students fill out little boxes and can alter an experiment. Here we'll look at the part on the left, where they fill out drop-down menus to enter a hypothesis about how the amount of heat affects the boiling point of water. Using the data from this, they've created a dashboard that teachers can access in real time while students are using the system. The first thing you see is a hypothesizing alert. Their system has three phases, hypothesizing, collecting data, and analyzing data, and since hypothesizing is the first step, if 62 percent of students are struggling there, it's a place you want to investigate. Teachers can dig in and see progress, and that kind of measure, how much someone has done, has been around for a while. Where it starts to get really interesting is here: digging in on the test student, Taylor Read, we see they've created a hypothesis that targets a dependent variable to be manipulated, which is obviously problematic; it should be an independent variable. How is the system able to detect that? The drop-down menus contain different variables, and on the back end each one is tagged as independent or dependent. If a student picks a dependent variable as the thing being manipulated, the alert fires, and if you see that a few times you start to understand the kinds of problems students might be having. If it's just one student, the instructor might do some targeted scaffolding and reach out; but if a lot of students are having the same problem, it might be time for a just-in-time mini-lecture, or to check whether something else is giving them trouble.
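The back-end idea can be sketched in a few lines. The variable names and the rule below are simplified inventions, far cruder than the real system's checks.

```python
# Hedged sketch of the back-end idea: variables are tagged as independent
# or dependent, and the student's drop-down hypothesis is checked against
# those tags. Names and rules are simplified for illustration.
VARIABLE_ROLES = {"amount_of_heat": "independent", "boiling_point": "dependent"}

def hypothesis_alert(manipulated, measured):
    """Flag a hypothesis whose manipulated/measured roles are reversed."""
    if VARIABLE_ROLES.get(manipulated) == "dependent":
        return f"alert: '{manipulated}' is a dependent variable, not manipulable"
    if VARIABLE_ROLES.get(measured) == "independent":
        return f"alert: '{measured}' is an independent variable, not an outcome"
    return "ok"

print(hypothesis_alert("boiling_point", "amount_of_heat"))  # fires an alert
print(hypothesis_alert("amount_of_heat", "boiling_point"))  # → ok
```

The design lesson is that the check is only possible because the drop-down menus constrain input and the variables were tagged ahead of time; the alert was designed into the activity, not bolted on afterward.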
So this is the whole cycle at once: clever design that allows you to collect the right data, which produces alerts that let teachers take action.

The other thing to remember is that data is always a proxy. You see what a student selected, but does that really mean they didn't understand the difference between dependent and independent variables? Did two variables sound the same? Were they not paying attention? Data is a proxy for how much effort a student is putting in, how engaged they are, how interactive they're being, what knowledge and skills they have, their emotions, and so on. It's very easy to jump to a conclusion from data, but it's important to interrogate the different reasons the data might have come out the way it did.

With computation we get to what people think of as the heart of learning analytics, though I'd argue it's just one of several pieces. We won't go into depth, but I want to highlight five major classes of analytic methods. First, we see a lot of work on prediction: classification or regression, with a focus on identifying individuals or features that matter. On the left you see a decision tree that Bart Rienties and his team built to identify whether students were more likely to pass, fail, or get a distinction on an assignment, based on a bunch of factors. Models like these are often used to build early alert systems that can tell you who's likely to struggle in a course, so intervention can happen early.
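As a toy illustration of the prediction idea, here is a hand-written rule in the spirit of a learned decision tree. The features, split points, and labels are all invented and far simpler than the published models.

```python
# Toy early-alert rule shaped like a tiny decision tree; the features
# and thresholds are invented, not taken from any published model.
def predict_outcome(logins_per_week, quiz_avg):
    """Return a coarse outcome label from two activity features."""
    if logins_per_week < 2:
        return "at_risk"                 # low engagement dominates the split
    if quiz_avg >= 0.8:
        return "distinction_likely"
    return "pass_likely"

roster = {"ana": (5, 0.9), "ben": (1, 0.7), "cai": (4, 0.6)}
alerts = [s for s, feats in roster.items() if predict_outcome(*feats) == "at_risk"]
print(alerts)  # → ['ben']  (candidates for early outreach)
```

In practice the tree would be learned from historical data rather than written by hand, but the interpretable if/then structure is the same, which is why such models can hint at the "why" as well as the "who".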
If they're built well, the variables going in don't just tell you who, but also a bit about why.

Other methods don't have a clear target they're trying to predict; they're about structure discovery. Here we see clustering, network analysis, and topic modeling: unsupervised models where the notion is that we're looking for patterns, for groups of people or relationships. If a lot of students show a similar pattern in the data, you can say there's a type. This particular study is one we did of instructors in a discussion forum, looking at the different patterns in their conversations. It's a social network analysis; users 1 and 417 are the instructor and the TA, but we also see something going on at the top with user 2, which turns out to be a super-thread where the students were getting to know each other. With too many students to read through the discussion by hand, this kind of analysis lets us see where the interesting areas are to dig in further.

Another kind of analysis we see quite a bit of is temporality, with methods like lag sequential analysis and hidden Markov models. The example on the left is a nice one from 2013, by Matthew Berland and colleagues. They looked at students learning to program; every time the program was updated, they classified the state it was in based on the kinds of commands it contained. The students were learning to program robots to play soccer, and a program could be in a minimal state, when they hadn't done much, or an active state, where all they were doing was telling the robots to do things.
It could also be in a balanced state, with a balance between action commands and logical commands, and there are a couple of others as well; logical, heavy on the logic commands, is actually the most mature kind of program. What they were able to do is match these states and see how students move between them. Students didn't just start out with, and end up with, a logical program; they went through a variety of stages, and yes, they went back. So this is a transition diagram that says: if a program is in an active state, where is it going to go? It might go back to being minimal, or move on to being balanced and then have a chance to eventually become logical, but it's not going to jump straight over and become a test-bed program. You could imagine similar states students might pass through in learning science, with certain kinds of misunderstandings and likely transitions. This helps us think about stepping stones, not just the beginning and the end but the process in between, and how we can see where students are and help them move to the next step we're hoping they reach.

There are also visual analytics: all sorts of ways to visualize information to help understand what's going on. What you have here is some work by Ravi Vatrapu and colleagues, a heat map based on eye-tracking data. You can also see Sankey diagrams, which carry a bit of temporality and transition in them. The idea is to present information visually so that we, rather than the algorithm, can spot the patterns.

Finally, we see a lot more work now with NLP, natural language processing. There are many types, but basically the question is how we can extract information from the language being used, generally by students.
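A deliberately crude sketch of the rhetorical-move idea follows; real systems rely on trained NLP models rather than keyword lists, and the marker phrases below are invented.

```python
# Naive keyword-spotting sketch of rhetorical-move tagging. Real tools
# use far richer NLP than this invented marker list; this only shows
# the shape of the output such a system produces.
MOVE_MARKERS = {
    "claim":    ["we argue", "this shows", "therefore"],
    "evidence": ["our data", "we observed", "the results"],
    "warrant":  ["because", "which suggests"],
}

def tag_moves(sentence):
    """Return the sorted list of rhetorical moves whose markers appear."""
    s = sentence.lower()
    return sorted({move for move, markers in MOVE_MARKERS.items()
                   if any(m in s for m in markers)})

print(tag_moves("We observed a higher boiling point, which suggests heat matters."))
# → ['evidence', 'warrant']
```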
One nice example is a tool called AcaWriter, developed at UTS Sydney, which looks at the text students generate and identifies rhetorical moves. You can imagine this working really well with the argumentation frameworks we see in STEM education: where are students making claims, where are they presenting evidence, warrants, those kinds of things?

A brief interlude before we get to where this all comes to fruition: the computation ideally should be driven by theory. We're not talking about data mining, or even data geology, but a notion of data archaeology. In other words, what is our theory of how students are learning, and how do the traces we find relate to it? You can think of the traces as remnants of a lost learning civilization: we're not just sifting through the dirt looking for shards of pottery, but trying to understand whether a shard was part of a vase, how it was used in ceremonies, what lived experience generated this data. Theory gives guidance about what data to collect, how to construct variables, and what groups we might need to look at separately. One big challenge in learning analytics, and a good guideline, is that one size does not fit all. All students aren't the same, and if you don't know what to split students on, you can develop a model that looks okay but is hiding some very important things.

I won't go into it in depth, but there's a really nice study by Tanmay Sinha et al. showing how you can take really basic data, whether students are playing a video, seeking forward, making it go faster or slower, and aggregate it up to higher-level activities: skipping, which might indicate disengagement; checking back and searching for information; re-watching and reviewing content; or clarifying an idea. It's important to recognize that even if our data is low-level, the constructs we think about don't have to be.
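The aggregation step can be sketched as follows. The rules below are illustrative inventions, not the actual coding scheme from the Sinha et al. work.

```python
# Invented sketch of aggregating low-level video-player events into
# higher-level behaviors of the kind the Sinha et al. study describes;
# the rules here are illustrative, not the paper's actual scheme.
def label_behavior(events):
    """events: ordered list of raw player actions for one viewing session."""
    forward = events.count("seek_forward")
    backward = events.count("seek_backward")
    if forward >= 3 and backward == 0:
        return "skipping"        # racing ahead: possible disengagement
    if backward >= 2:
        return "reviewing"       # checking back over earlier content
    return "watching"

print(label_behavior(["play", "seek_forward", "seek_forward", "seek_forward"]))
# → skipping
print(label_behavior(["play", "seek_backward", "pause", "seek_backward"]))
# → reviewing
```

The raw clicks are nearly meaningless on their own; it's the mapping to theoretically motivated behaviors like reviewing or disengaging that makes them interpretable.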
So finally we get to the part where we start to close the loop and think about insight and action. What can we learn from all this information, and the way it's processed, that helps us do something to support students? It's important to remember that learning analytics is very much a socio-technical system, where human decision-making and action are as important as, maybe more important than, getting the technical pieces right. This is a model from work we've been doing over the last few years, and while I can't go through all the pieces in depth, I want to point out that there's a lot more to it than "look at data, figure out the answer, fix everything." It has to do with the kinds of questions instructors bring to the data, and with where they find answers. A big piece is relative and absolute reference points: we know that zero is bad and 100 is good, but otherwise, if you see students doing certain things in a certain order, or for a certain amount of time, how do you know what it means? We see a lot of triangulation, contextualization, and making of attributions. Then, assuming we can understand the answer to a question we thought was worth asking, we might take some of the actions I mentioned earlier: scaffold the whole class or individual students, or revise the design of the course. One big thing we saw was quite a bit of "wait and see": we've got data that tells us something, but I'm not quite sure. So in designing analytics it's important to be transparent enough about what we're doing that the users of the analytics feel confident making decisions. I always say we need to make this a need-to-know, not a nice-to-know, situation. Finally, I wanted to point out that reflecting on pedagogy is something we've seen come up quite a bit.
The things we collect data on may bring to the surface issues instructors or students haven't been focusing on, so we can think about the data not just as giving answers to narrow questions but as changing the whole way we think about what it means to learn and what it means to teach.

On insight and action: we had the three A's for data, and here I have the five P's, a process you can work through if you're going to develop some analytics. What are the goals of the educational activity in the first place? What's the point? What do productive and unproductive ways of engaging look like? What's the process? How can the available data, or data we start collecting, serve as indicators of these? What's the proxy? How are those inputs and outputs going to inform teaching and learning? What's the plan? And why does it matter; how can the results improve our understanding of learning? What's the payoff? I do see lots of analytics developed that fall in the nice-to-know category, so I think it's really important to be able to answer all of these questions before you start building an analytic system.

To wrap up: this is the heart of what we're doing, but there's a lot surrounding it that's very important to keep in mind, in terms of ethics, student and instructor agency, questions of privacy, and, more and more, questions of equity. Equity goes beyond simple notions of algorithmic bias to really asking what we're deciding the outcome should be in the first place, and what the right way to get there is, without assuming it's the same for everybody. We've got lots of different populations of students that we need to care for.
And there are ways in which, once we create these tools, we can end up in a position where we have to submit to them. So we want to think very carefully about what we're creating and how we leave space for speaking back, for letting humans bring the nuance that the machines may not capture.

To close, there's a nice quote from many years ago about how the clock is a powerful machine that created the products of seconds and minutes and changed our relationship to time; before we had clocks, there was no being late, there was just showing up. Learning analytics is a powerful machinery that creates new data products, and it can really change our relationship to teaching and learning. I think that change can be positive, but we also need to think about the ways in which it might not be: how data can inform but not dictate pedagogical decisions, how our pedagogical decisions can generate better data, and this whole list of really important things to be careful about. There's lots of literature in education on the danger of labeling students, of labels that stick, and learning analytics has the potential to do that as well.

With that, I think I will close and say thank you, but there is one last thing: we've just launched a new resource at the Learning Analytics Research Network called LA 101. I know you're going to be in amazing hands for the next several weeks, learning about learning analytics in a much more hands-on way, but if you're looking for a curated set of readings, videos, and tutorials across the different areas of learning analytics, to get up to speed or to further your knowledge, this might be something to check out to keep your learning going after the
next couple of weeks have ended. Thank you.