Transcript for:
Hotspot Analysis in Spatial Statistics

Hot spot analysis, part 1: choosing a conceptualization of spatial relationships.

The purpose of this series of videos is to walk through a hot spot analysis, with a specific focus on choosing the right parameters based on the questions that you're asking and the data that you're using.

The first thing that we have to do is understand the question that we're going to ask in our analysis. According to the CDC, the rates of childhood obesity have more than doubled over the past 20 years, with an estimated 16 percent of adolescents considered overweight based on their body mass index. This analysis will focus on Los Angeles County, where I was able to obtain the percentage of overweight fifth graders in over 1,000 elementary schools. We're interested in figuring out if there are hot spots of overweight fifth graders, so that we might focus our research efforts on those areas and eventually use additional tools to understand what might be causing those spatial trends.

So, with the stage set for the problem, let's start exploring the childhood obesity data that I have and see if we can start to figure out what's going on in Los Angeles County. Each of these polygons represents a school zone, which is an aggregation of the several block groups that are closest to each of the schools in my dataset. The darker green areas have the highest percentage of overweight fifth graders, and the lightest green have the lowest percentage. From looking at this thematic map, I can see a couple of areas that appear to have clusters of overweight fifth graders, and some areas that seem to be generally darker than other areas, perhaps here in the center of the map. But these trends that I'm seeing are really subjective. What if I were to change the classification scheme for my thematic map from equal interval to natural breaks instead, and maybe ten classes instead of five? Now those darker green areas have virtually disappeared, and the patterns that I saw aren't nearly as apparent.

In order to really understand if there
are areas in the county with a serious childhood obesity problem, I'm going to use the Hot Spot Analysis tool, which uses the Getis-Ord Gi* statistical method. Using hot spot analysis, we're going to test to see if these clusters that I see are statistically significant and therefore worth investigating further. So I'm going to jump right into the tool and walk through all of the decisions that have to be made before we can hit OK.

We can find the tool in the Spatial Statistics toolbox, in the Mapping Clusters toolset. The first thing that we have to do is point the tool to the input feature class, which in this case is my feature class of school zones that hold the information I collected about the rates of childhood obesity in Los Angeles County.

The next thing that it asks for is the input field. Unlike a tool like the nearest neighbor cluster analysis tool, which determines if a set of points in a dataset are clustered or dispersed based on their location, the Hot Spot Analysis tool is testing to see if there are clusters of high values and clusters of low values in your dataset. This means that there's always going to be an input field. In this case, the input field is the obesity rates that I have. This means that this tool will be testing to see if there are hot spots of school zones with high obesity rates that are surrounded by high obesity rates, or low rates that are surrounded by low rates. This is what hot spot analysis using the Getis-Ord Gi* statistical method is all about.

So we've already decided what our input feature class is, and also our input field. Now we have to navigate to a geodatabase or folder where we want to save the new output feature class. That's easy enough.

The next decision that we have to make is an important one, and that is choosing the right conceptualization of spatial relationships. This is where your expertise and familiarity with your dataset and your study area become invaluable tools for decision making. The idea is that
there is an interaction or an influence among a feature and its neighbors, some shared commonality. And once we accept the fact that this interaction amongst neighbors is important, we have to decide what it means to be neighbors. This is what it means to choose the conceptualization of spatial relationships. Let's walk through the options and see which one makes sense today.

For hot spot analysis, the default is the fixed distance band, and that's actually a great option. Fixed distance band uses a critical distance to decide what neighbors to include. Because that distance is fixed, the scale of analysis will not change; it will be consistent across the study area, and that's often very important.

Polygon contiguity is another option. Instead of using a distance to determine what neighbors to include, it uses the neighbors that are contiguous, or that share a boundary. This is another great option for a lot of analyses that involve polygons, but let's think about it in terms of our study area. In Los Angeles County, there are some polygons that are very small, like those here in the middle, and other polygons that are very large, like out here on the outskirts. In those areas with large polygons, using polygon contiguity, the scale of the analysis will be very large. In the areas with small polygons, the scale will be much smaller. So if you have polygons that are similar in size over your entire study area, polygon contiguity would be a good option that would keep the scale of your analysis pretty constant. But for this study area, it wouldn't work quite as well.

Another option is the inverse distance option, and in the case of hot spot analysis it's really not recommended. The reason is that, in hot spot analysis, inverse distance tends to give you very small, isolated, salt-and-pepper hot spots, meaning that the neighborhoods being examined tend to be too small.

The last option that I want to mention is the zone of indifference. The zone of indifference is similar to the fixed
distance band in that it uses a critical distance to decide what neighbors to include. The difference is that, with fixed distance band, the features that fall outside of that critical distance are not included. With zone of indifference, after the critical distance is exceeded, it uses inverse distance to weight the features. So it's kind of like a fixed distance band with a fuzzy boundary. In the case of my analysis, either fixed distance band or zone of indifference would be great options. I'm going to choose the zone of indifference.

Please stay tuned for part two to learn how to choose an appropriate distance band.
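
To make the difference between fixed distance band and zone of indifference concrete, here is a minimal sketch in Python with NumPy. It is not the tool's own implementation. The function names, the six sample points, the obesity percentages, and the critical distance of 2.5 are all made up for illustration. The decay rule used beyond the critical distance (weight = critical distance / distance) is an assumption chosen only so the weights fall off smoothly; the tool may compute this differently. The z-score uses the commonly published form of the Getis-Ord Gi* statistic.

```python
import numpy as np

def pairwise_distances(coords):
    """Euclidean distance between every pair of feature locations."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def fixed_distance_weights(dist, threshold):
    """Weight 1 inside the critical distance, 0 outside.
    A feature is its own neighbor (distance 0), as Gi* requires."""
    return (dist <= threshold).astype(float)

def zone_of_indifference_weights(dist, threshold):
    """Weight 1 inside the critical distance; beyond it the weight decays
    with distance. The threshold / distance rule is an illustrative assumption."""
    weights = np.ones_like(dist)
    beyond = dist > threshold
    weights[beyond] = threshold / dist[beyond]
    return weights

def gi_star(values, weights):
    """Getis-Ord Gi* z-score for each feature, given a spatial weights matrix."""
    x = np.asarray(values, dtype=float)
    n = x.size
    mean = x.mean()
    s = np.sqrt((x ** 2).mean() - mean ** 2)
    w_sum = weights.sum(axis=1)
    w_sq_sum = (weights ** 2).sum(axis=1)
    numerator = weights @ x - mean * w_sum
    denominator = s * np.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
    return numerator / denominator

# Hypothetical data: two groups of three school zones along a line,
# with made-up percentages of overweight fifth graders.
coords = np.array([[0, 0], [1, 0], [2, 0], [10, 0], [11, 0], [12, 0]], dtype=float)
pct_overweight = np.array([30.0, 32.0, 31.0, 10.0, 12.0, 11.0])
critical_distance = 2.5  # made-up value; choosing it is the subject of part two

dist = pairwise_distances(coords)
fixed_w = fixed_distance_weights(dist, critical_distance)
zone_w = zone_of_indifference_weights(dist, critical_distance)

z_fixed = gi_star(pct_overweight, fixed_w)
z_zone = gi_star(pct_overweight, zone_w)

print("fixed distance band:  ", np.round(z_fixed, 2))
print("zone of indifference: ", np.round(z_zone, 2))
```

In this toy data the first three zones get positive z-scores (high values surrounded by high values) and the last three get negative ones under both weighting schemes. The only difference is that zone of indifference lets the far-away zones contribute a little instead of nothing. Note that if the critical distance were large enough to take in every feature, the fixed distance band weights would all be equal and the statistic would be undefined, which is one reason the choice of distance matters.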