Transcript for:
Analyzing Large N Data in Conflicts

Hello everyone, thanks so much for coming out. I'm going to use this video in a broad spectrum of different classes, because the goal here is just to understand two quick things. First, what is large-N data? This is a data set particularly for US foreign policy and national security, just things that have to do with international relations, so this is very, very important.

So first, this is what we call a large-N data set. Look at how many data points you have: 1,848. So this obviously is very different from a case study, where you might do one or two, let's say why diplomacy worked in one area and not another, or why there was a revolution in one country and not another. This is a large-N data set. Now, we're going to ignore that, I don't know why they have one, two, but it's a pretty interesting data set where you have a conflict type, region, and death estimate. The goal here is to understand: this is conflict type, there are three types, and this is the death estimate. So you can ask an interesting research question. Let's say you're a foreign policy advisor, not just for the United States but any country, and you say, what kind of conflict creates more death?

So I wrote here, there are three types. Interstate is a conventional conflict between states, like the Ukraine and Russia. Internal is a conflict of a government against rebel groups, like in Colombia, right, a rebel group fighting against the government; because of the peace process that ended, but there are still issues there. Internationalized is a government against a rebel group that has an outside force helping it, like Sierra Leone, where Sierra Leone was having problems with rebel groups that were being supported by Liberia; Nicaragua, the Contra War, etc.

Now, I don't know who coded for this, but we're just going to use it as it is. There's always a problem with coding. So you have different conflict types, we have those three, so you can see internal, a lot of internationalized, etc.
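The layout described above (one row per conflict, with a conflict type, a region, and a death estimate) can be sketched in R. This is only a toy stand-in: the data frame name `conflicts`, the column names, and every value are assumptions made for illustration, not the data set shown in the video.

```r
# Toy stand-in for the lecture's data set. All names and numbers are invented.
conflicts <- data.frame(
  conflict_type = c("internal", "internal", "internal",
                    "internationalized", "internationalized",
                    "interstate"),
  region = c("Americas", "Africa", "Asia", "Africa", "Americas", "Europe"),
  death_estimate = c(120, 300, 90, 1500, 2100, 5200)
)

# Treat conflict type as a categorical (nominal) variable.
# factor() orders the categories alphabetically by default.
conflicts$conflict_type <- factor(conflicts$conflict_type)

# How many data points are there, and how many conflicts of each type?
n_rows <- nrow(conflicts)
type_counts <- table(conflicts$conflict_type)

print(n_rows)
print(type_counts)
```

In a real large-N data set, `n_rows` would be in the hundreds or thousands rather than six, but the structure is the same.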
So you're going to see that's a lot of internal and internationalized, but there is interstate as well, I know, in the data set. So what we're going to do here is see which one creates more deaths. That is the question. So the conflict type is the independent variable and the death estimate is the dependent variable; that's how you would say it. If you are doing large-N data science with foreign policy, national security, etc., that would be the question.

So what we're going to do here, it depends on the class you're in, and you don't necessarily have to do this, but this is just what we do. We'll call it something like, I don't know, what do you people like? A lot of students vape, I'm sure, so we'll call it vape. We can call it anything you want. This is using R and RStudio. R is the statistical environment; RStudio is the user-friendly interface that makes it a little better looking. And then lm, that's a function for linear modeling. So the dependent variable comes first, which is the death estimate. All you've got to do is type it up here and you see it, see, death estimate. And then the tilde is regressing on the conflict type. So we want to know what the conflict type is, here it is. And what's the name of our data? Data equals, what's the name, conflict national security. So right there, oops, where is it, up here, because that's the data set. So this is a pretty big data set.

There we go, I ran it. It's called vape, lm is the linear model, death estimate is here, the dependent variable, the result, and the conflict type here is the type of conflict, and the data is national security. Very interesting. So we want a summary of the model, because this is a large-N data set, it is not case studies, you have all these data points. And you come up and you say vape, see, here it is, having a party with the vape, and there are your results. And I can hit this button, we
won't be able to see this, and then there you go. It's very interesting. So this is interstate and internationalized against internal, which is the base here. Since these are three categories of a nominal variable, and you won't have to know all this, they're basically being compared to internal. The reason why internal is the first one, the base, is because that word is first in the alphabet. That's what R does.

So when you look at it, our results are pretty significant, meaning the type of conflict matters. The t value is very high, five, five. The standard error is essentially the precision of our estimate, in the sense that we can't pick every single solitary conflict, so it's kind of a sample of what's out there, and our sample is representative. But this is very important: this is our practical value, the estimate, which shows, essentially, that moving from internal to internationalized or interstate, you're getting an increase. So if you look, internationalized is very high and this is very high, and they're positive, so that means that it's going up. And you can see right here with that huge number that interstate seems to be the highest, right? Look at that, I'm sorry, the t value, etc.

So we really don't need to know everything here, because this is just kind of introducing you to it, but this is exactly what you would do in large-N data. A lot of people say they don't like it, right, they don't like statistics, but it's very interesting just to see how, oh wow, interstate seems to be very, very significantly higher. Look at that, that's the estimate. So that's basically saying, compared to internal, you're getting what, 4,346 more deaths on average than internal. And then you've got internationalized, where you're getting roughly 1,430 more deaths on average than internal. So this is pretty intense, showing that these two have significantly higher death estimates on average than internal. So, but how can we see all of
them compared? Well, that's where we use this emmeans function. You've got emmeans, and what do we call it, vape, we're going to do vape. You don't necessarily have to know how to run this, depending on your class, but that's pairwise, so we're going to see all of them compared, the categories of the independent variable, conflict type. Let's take a look, and this is all of them compared. So you see they're all statistically significant, because the p value is less than .05, meaning that the conflict type matters. It matters.

So this is very important to understand in international relations, national security, let's say US foreign policy: do we get involved in this crisis? Well, you have to understand that this crisis leads to a whole lot more deaths. So take, for example, this internal versus internationalized. Internal is significantly lower, because it goes by the first one here, and that's a negative, so that means there's less, and it's statistically significant, which means between internal and internationalized, internationalized is more. Internal here, again, now being compared to interstate, is also negative, which means that there are more interstate deaths relative to internal. And then you've got internationalized and interstate, which means interstate is more, because this is negative and that goes with internationalized.

So what does that mean, all this statistically significant, p value under .05, very high ratios? They're all negative here, which means the first one has lower deaths. It's basically saying that in conflict you have three types here: internal, which is just the government versus a non-state actor; internationalized, which is government versus a non-state actor, rebel group, etc., that's being supported significantly by an outside power; and interstate, which is like two state powers, like the Ukraine and Russia. So it's basically saying, internal versus internationalized, internationalized has more deaths, because internal is negative and it's statistically significant.
Interstate and internal is statistically significant, because interstate obviously has more than internal, because that's negative. And then internationalized versus interstate, because that's negative, reflecting this, that means interstate, two states like the Ukraine and Russia and other states, has significantly more deaths. So you have to understand that. If you want to get into the Ukraine-Russia war, correct, you need to understand that that's going to produce a whole lot more deaths on average than these other types of conflicts.

I know it's kind of difficult, but the goal here is to kind of understand what big data is. This is what big data is. You've got one independent variable, conflict type, then three attributes of it, internal, internationalized, and interstate, and then you have the death estimate, which is the dependent variable, the result. And after running it, statistically speaking, and this is not a statistics class necessarily, depending on which one you watch it in, you see a statistically significant relationship between the death estimate and the conflict type. So conflict type matters. And I was able to change my mouse to blue to show how smart I am. Look at that, pretty nice, huh? Well, thanks everyone for coming out and listening to me.
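For anyone who wants to retrace the walkthrough, here is a minimal sketch of the same workflow in base R: fit the linear model, read the coefficients against the internal baseline, then compare every pair of conflict types. The data frame `conflicts`, its column names, and all of the numbers are invented for illustration, so the output will not match the figures in the video. The pairwise step in the video appears to use the emmeans package; this sketch swaps in base R (`pairwise.t.test` plus hand-computed differences in group means) so that it runs without extra packages.

```r
# Toy data: invented numbers, NOT the lecture's data set.
conflicts <- data.frame(
  conflict_type = factor(rep(c("internal", "internationalized", "interstate"),
                             each = 4)),
  death_estimate = c(100, 200, 300, 400,
                     1400, 1600, 1800, 2000,
                     4000, 4500, 5000, 5500)
)

# Dependent variable to the left of the tilde, independent variable to the right.
model <- lm(death_estimate ~ conflict_type, data = conflicts)
print(summary(model))

# "internal" sorts first alphabetically, so R uses it as the baseline.
# The intercept is the internal mean; the other two coefficients are the
# average differences from internal.
coefs <- coef(model)
print(coefs)

# Mean death estimate for each conflict type.
group_means <- sapply(split(conflicts$death_estimate, conflicts$conflict_type),
                      mean)

# All pairwise differences, first type minus second type.
# A negative value means the first type has fewer deaths on average.
pairs <- combn(levels(conflicts$conflict_type), 2)
pair_diffs <- apply(pairs, 2,
                    function(p) group_means[[p[1]]] - group_means[[p[2]]])
names(pair_diffs) <- apply(pairs, 2, paste, collapse = " - ")
print(pair_diffs)

# P values for each pair, adjusted for multiple comparisons.
print(pairwise.t.test(conflicts$death_estimate, conflicts$conflict_type,
                      p.adjust.method = "holm"))

# With the emmeans package installed, a similar table could come from:
# emmeans::emmeans(model, pairwise ~ conflict_type)
```

In this toy data the three differences all come out negative, which mirrors the pattern described in the video: the first type named in each pair has the lower average death estimate.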