Transcript for:
GIS Attribute Data Overview

[Music]

... attribute data. Some of the contents of this chapter, as you already know, are solely for ArcGIS Pro, so I'll talk about that after the fact. But some of the contents are going to be similar in any GIS tool that you actually use. The reason we talk about that is that you may not see some of these contents in QGIS, but that doesn't mean they're not going to be available in other tools.

So what is attribute data? We talked about the spatial data types: we have points, lines, and polygons. Those are the three main vector data types. And we have an additional data source, which is the attribute table. The attribute table is related to the spatial feature class and explains it in a little more depth. Say you have a polygon of the state of California, and you want to know the population within the state, or the unemployment rate, or the median age. All those different attributes are related to the spatial data that you have.

You may also have a standalone table that is not connected to this spatial data but can be connected. Most spatial data already comes with an attribute table, but that doesn't mean you cannot connect additional pieces of information coming from external sources. Those sources just have the attributes, and you want to make that connection.

So let's talk about the tables. We discussed that last time as well. In the tables we have these columns, usually known as attributes. We have an ID field; in this case, OBJECTID. Even though the ID field is numerical, it's not a number we do statistics on. It's just a number to identify every single polygon, every single line, or every single point. So this field is the unique ID, the identifier of that specific shape. And we have several attributes: for example, the state code or the population
2014. All of those could be really important when you want to do other types of analysis. So we have records, which are the rows, and we have fields. I think some of you, on the chapter review questions, when we asked how many fields you see, just counted the rows. No: records are the rows, but fields are the columns.

In GIS (you did that in QGIS, and you can do that in ArcGIS Pro; it's the same concept) you can edit these fields. You can add a new field, remove an existing field, or do some calculation. For example, if you have population 2014 and population 2020, you can subtract them and show the change in population. So you create a new column, and for that column you have to identify its type. In the previous lab you had to select the format for the type of the column. If it's a numerical value, we usually have to specify whether it's an integer, which is a whole number, or a double, a decimal. Depending on the case you may select either. For example, if you want to show the rate of change, having a decimal value is useful. But if you want to show the actual population, you can't have a decimal value: you don't have half a person. You count individuals, so population values are integers. Make sure you understand the importance of data type for different fields.

There are some other attribute formats, specifically in database management systems, or DBMS. One is known as a flat file database. Similar to the table you just saw, it stores data as rows of information, and the reason for doing that is that it makes searching and querying much more efficient: you can easily search every single row within the attribute table. Let me go back here. For example, if you are
interested in King County in Texas, you can just search for it within the flat file. You don't have to open up the GIS environment, load the spatial data, search for the specific county, and get its population. You can simply search the flat file for King County and return its population. Much of the analysis we do is just searching and querying, but not all of the analysis we do in GIS is going to be like that.

In GIS we have a concept known as the database management system. Basically, when you have all these databases, you try to manage them, make the query process available to your users, and create reports from the queries. That whole mindset is known as a database management system. But the way you store the different pieces of information and support the analysis can differ.

You can have several tables. On top are your more important tables, what we call parents, and then you have other tables that are related to them but are not as important as the first one. So you have the parent, Customer, here, with all the different types of information about the customer. The parent table is Customer, and electric usage, service calls, and even billing history are all children of Customer. So there is a concept of parent and child, and that type of database management system is known as hierarchical.

The reason it's not used in most recent GIS packages is that it makes querying less efficient. For example, if you want to know the billing history for a specific customer, you have to start from the top and go through this whole path to get to the billing history. You can see right away, just by looking at this chart, that it's going to be less efficient. However, there is another type of database management system, known as the relational database management system. So in
a relational database management system, there is going to be a specific field or ID that relates some of these tables together. Let me go back. Here we have OBJECTID, which is useful, but you also have some other codes. You have FIPS, a federal code for every single county. You have the state code and the county code. Any of these could be used, because it is going to be unique for that specific county, for that specific polygon. For Loving County in Texas, the county code is 301 and the state code is 48, so that combination is unique.

So you can use that. If you have another table for the states (I'm going to show you in a couple of minutes), you can relate the two and make that relationship. That's why it is known as a relational database management system: you can create those relationships based on any of these IDs. It doesn't have to be the OBJECTID; it can be any of the others. That makes it much more flexible compared to the previous format, the hierarchical one. At the same time, because it's adopted in most GIS packages, you need to understand how the relationships work and how you can define different types of relationships. So let's talk about that.

It's similar in QGIS, so don't worry that the visual is about ArcGIS; it doesn't mean you can't do this in QGIS. The process of joining tables is basically the concept we use in a relational database management system: you have a specific field that is common to two tables. You have one table here, the US states. It is your target table, your original table. Now you have an additional piece of information: for example, a recent survey or a recent calculation gives you, for every single state, an additional piece of information. You may not want to keep them separate; you want to be able to do the
analysis all at once. So what you can do is find that common field. The common field here is going to be the state FIPS, or state ID: the federal code that we use, which is unique for every single state. For example, for Hawaii we're going to use state FIPS 15. So you can link these together based on that state FIPS, and you build a somewhat longer, more complete table based on that join process.

The join process now raises a specific question: what would you do if there is more than one record for a state FIPS? For example, we know that the state of Hawaii has different islands. What if your join table has separate rows, separate records, for the different islands of Hawaii? How do you join them together? Do you first aggregate those and then find a way to put that information inside the table, or is there a better way? Does anyone know how to join two tables when you have more than one record for the one single record in the target table?

"Would you use, like, a secondary character, like a letter?"

No, no, you don't use a secondary character. The good thing about GIS is that it has already thought about these situations, so you can identify a way to combine these together. Let's get to that point in a second. I'll come back to multiple joins, but let's talk about the relationship first.

That relationship is known as cardinality: how we identify the target and join tables and how we link them. If it's one-to-one, for every single state in the target table we have one record in the join table. That's great; that's the best possible scenario. We don't get confused, and the software can easily link these together. There is no confusion whatsoever. But
that's not always the case. We may have, like the Hawaii example I just talked about, several islands, and that makes it a little more difficult to do the calculation. You may also have more detailed information, broken down to the counties. For example, for the state of Washington, maybe your team has not done the aggregation, so you have every single county in the state of Washington under a different name. Now you need to find a way to combine those, or maybe you want to keep that information and not aggregate at this stage.

So here we have the many-to-one relationship: in the target table we have one record for the state of Washington, but in the join table we have more than one. That relationship is known as many-to-one, or, the other way around, one-to-many. There is no difference between many-to-one and one-to-many when you join them; it depends on which direction you look at it from. The point is that you're going to get more than one record for the state of Washington when you join these two tables.

When you do that, GIS asks how you want to combine them. Do you want to keep all of the records? That's where we start: we keep all of the records. For the state of Washington, for example, when we join the tables, instead of having only one polygon, one record, we come up with four polygons, four records. But if you have population and you want to combine them, if you want to keep only one record, you get a way to combine or aggregate. GIS asks: do you want to calculate the combination based on the sum, the maximum value, the minimum value, or just keep the value of the original table? So there are different ways you can keep the values inside the joined table. Just understand the difference between one-to-one and many
to one, or one-to-many, relationships. In one-to-one, for every single record in the target table you have only one record in the join table, so when you join the two tables you come up with exactly the same number of records. In a many-to-one or one-to-many relationship, you get more or fewer records depending on the way you do it. In this example, many-to-one, you get fewer records in your joined table, because you aggregate them and put all of them into one specific state. In one-to-many, you keep all of the records, so you get more records, same as here.

Think about a very understandable case. Each student, like yourselves, is possibly taking several classes at this stage, and each class also has many students. So there are different ways you can combine the student list with the classes: each class has some of these students in it, and each of these students takes not all of the classes but some of them. These many-to-many relationships happen in real life, and that could be the case for a customer management system or other customer-related data. You have different stores, different customers, and different products, and each of these has its own table. Now you want to build a table that explains these relationships. That is known as the many-to-many cardinality question.

All right, just to summarize: cardinality is the way you define the relationship between several tables inside GIS, or basically inside any database management system. Remember, we only focus on relational database management systems. We're not really interested in hierarchical or other types of databases that are available and used to be popular maybe 20 years ago. Right now we're only using the relational database
management system. Most GIS packages, I would say basically all of them, have relational database management inside them, and you can define that cardinality four different ways. You can have a one-to-one relationship, so every single record has one matching record inside the table that you want to join. You may have one-to-many or many-to-one relationships: depending on the direction, for every single record you may get more than one inside the join table, or for more than one record you may get only one record inside the join table. And finally, you may have a somewhat more complicated version, the many-to-many relationship, or many-to-many cardinality, where you have several tables and some of them are related in different ways. I think that graphic explains the many-to-many relationship very clearly. Just knowing that concept helps you when you want to join several tables. So don't worry about the examples per se, but make sure you understand the value of it, because it helps you create much more complete data sets.

For example, the Census Bureau. I think most of you have already filled out your surveys and put in your information for the 2020 census. What they're doing now is trying to join that information to the previous, historical data. How do they do that? They don't really look at the individual person. They look at the aggregation across the census boundaries, or the block groups, or the ZIP codes. For ZIP codes, for example, they aggregate all of the values for age, for gender, for all of those. Then they look at the previous period and say: okay, this is ZIP code 91786; what was the value for the previous period? And they join that as a one-to-one relationship.

Does that make sense? Hopefully it does. If it doesn't, let me know. Mark? "Oh, no, that was good, yeah." Okay, great.

So, as you can see, I stopped here at 7-14.
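
The join cardinalities summarized above can be sketched outside any GIS package with Python's built-in `sqlite3` module. This is only an illustration: the `states` and `survey` tables, the `visitors` column, and every number in them are made up here, and the SQL stands in for whatever join dialog your GIS tool offers. Hawaii has several rows in the join table, Washington has exactly one, and Texas has none.

```python
import sqlite3

# Hypothetical tables and values, invented for this sketch only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE states (state_fips TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE survey (state_fips TEXT, area_name TEXT, visitors INTEGER);

    INSERT INTO states VALUES
        ('15', 'Hawaii'), ('48', 'Texas'), ('53', 'Washington');

    -- Hawaii: one row per island. Washington: a single row. Texas: no rows.
    INSERT INTO survey VALUES
        ('15', 'Oahu', 120), ('15', 'Maui', 80), ('15', 'Kauai', 40),
        ('53', 'Washington', 200);
""")

# Option 1: keep all records. A target row repeats once per matching join row,
# so Hawaii appears three times and the result has more rows than `states`.
keep_all = conn.execute("""
    SELECT s.name, v.area_name, v.visitors
    FROM states AS s
    LEFT JOIN survey AS v ON v.state_fips = s.state_fips
    ORDER BY s.state_fips, v.area_name
""").fetchall()

# Option 2: aggregate so that each state comes back as exactly one record,
# choosing sum, maximum, or minimum as the way to combine the matches.
aggregated = conn.execute("""
    SELECT s.name, SUM(v.visitors), MAX(v.visitors), MIN(v.visitors)
    FROM states AS s
    LEFT JOIN survey AS v ON v.state_fips = s.state_fips
    GROUP BY s.state_fips, s.name
    ORDER BY s.state_fips
""").fetchall()

for row in keep_all:
    print("keep all:", row)
for row in aggregated:
    print("aggregated:", row)
```

The left join keeps every state from the target table even when nothing matches, which is why Texas still shows up, just with empty values.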
It's the same for the chapter slides I put on Canvas. You have the full lecture, but some parts of the lecture are really tied to ArcGIS Pro and the ArcGIS environment, so I only briefly talked about them. You're not going to see the exact same options in QGIS; you're going to see different options. So we don't really have to worry about those, but I just want to make sure you understand what happens when you create joins inside any GIS environment.

In the previous example we talked about cardinality. So what if there are no values in the join table? The state is there and you can create the join, but there is no value for the count or for the damage. What happens when you create that join? You may come up with some null values, which is okay, because we still want to keep that location intact. We still want to have the boundary of the state, or of that specific place, or the point that we have. You don't want to get rid of it, but there is no value whatsoever when you create the join. So when you see these null values after the join, understand that it's because there is no value that satisfies your relationship.

But there is another point here; I've seen it too many times. When I create a join, I forget about the formatting and try to join, for example, locations. The join might actually occur, but all of the joined values are null. That tells me there's something wrong with the join process, something doesn't add up, because I know that some of them should have a value. When all of them come back null, that's a sign that the join step was not done properly. So you need to go back, remove that join, and make sure you understand how the join and the cardinality work. Then you can
probably redo that step. Maybe some variable you selected is stored differently in one table. Let me go back here. Yes, here: some of the values, for example state FIPS, even though you see numbers, are stored as text. These are not numerical values; they are just codes or IDs. But maybe someone stored those values inside the join table as a number. I mean, the format there is a number, and here it is text, a string. So when you create that join, GIS might not recognize the match and give you an error, or sometimes it performs the join but all of the joined values are null. So make sure you understand: when you have ID values, they should be strings. Even though you see numbers, that doesn't mean they are numbers you can do statistics on. They are just numbers that identify the different records inside your database.

Of course, you can create joins across multiple tables, known as multiple joins, but I'm just going to continue here. There is a tool in ArcGIS Pro with which you can easily create some of these graphs. We're going to do that in QGIS, but using different tools. Basically, for any numerical value we can calculate its statistics, and then you can show its graph, such as a scatter plot of the value you're looking at. It gives you some hints: maybe these are outliers, maybe there is something going on with these several points that are distant from the rest of the population.

We did create a field in our previous lab. When you create your field, you need to identify its name and its type. Another piece of information you enter is the accuracy or, if it's a decimal, the length of the decimal: how many decimal places you want to keep. So basically you identify the number of decimals you want to have, because many times it does not make
sense to have more than two or three decimals. You may do the calculation and get up to 16 decimal places, but what's the meaning of that? There's no real value in keeping those numbers, other than that you are going to increase the storage needed for that table. This is a field that you calculated, every single value has 16 decimal places, and that increases the amount of storage you need to keep that record, that value. Then think about GIS projects at the enterprise level, the state level, or even federal systems. Those make a huge difference. You don't have just one single polygon; you may have millions of layers, and knowing how many decimals you want to keep is going to be very critical to managing the size of the database. At the same time, when you have more decimals and you want to do the analysis or query the layer, it's going to take more time, because that's the value you need to look at, not a simple integer.

A very important type in GIS, or any other database, is known as binary data. Basically, binary data has only two options: you have either zero or one, and based on that you can identify what the case is. For example, if you have recorded the existence of a specific species in a region, and you do that for the whole state, then for every single polygon you have a value of zero or one. If it's one, it tells you that species was in that region; if it's zero, that species did not exist there. It gives you the message without needing text to explain it. It's easier to have binary data and code it that way to explain the situation.

There is some discussion about the storage amount and all those things about storage, which is important but not our discussion. I just wanted to mention that after slide 17, I believe, or maybe 15, these are not going to be important in terms of the exams or other things, but I just want
to mention that this is available in the presentation I posted. It talks about the different types of data sets, the data formats in GIS. You have probably seen floats: what a float means, what the precision of float numbers is, and how float precision changes. We usually use scientific notation, the exponent or power-of-10 form, for numbers. Instead of storing that large number, we store a small one and then multiply it by 10 to the power of 12. It gives us the same value, but it's much easier to store and the storage amount is much less.

Slide 7-30: I would keep that slide, just in case. In any environment you go to, you may see some of these same formats. For example, you have the short format. What is the short format? It is a number stored as a two-byte binary number. What is its range? It is from about negative 32,000 to positive 32,000. So if you have a large value, you can't use short. Make sure you understand how the ranges work, because if you have a large value, you need to use long. I usually use long and never use short, because storage is not going to be our main concern. Our main concern is having a specific attribute with appropriate accuracy. If you lose your data because it's more than 32,000, that's not really acceptable.

I also use double a lot. It gives me dynamic formatting: I can easily reformat my number, and even if it's a very large number, I can show it very quickly. You have text, sometimes known as string. You just need to identify how many characters or letters of the string you want to keep. For example, if it's the name of a city, what would your expectation be for the name of a city? Is it going to be 500 words, or is it possibly going to be less than 10 letters? Identifying that helps you keep your storage limited. And finally, date: another format, which shows the
dates for specific events or specific values. That could be saved as just a simple month, day, and year, or it could also include the actual time of that event.

Slide 7-30 (even though it says 7-31 here, I think on your PDF it's 7-30) talks about the geometry fields. In GIS, when you have spatial data, when you have a polygon, the way to calculate its area or its perimeter is simply to use the attribute table and do the calculation there. You don't want to use the measurement tool, because it's going to be more complicated, more time-consuming, and of course less accurate. For example, I can measure that area by tracing the polygon, but I have to zoom in and match every single point to a vertex of that polygon. That is doable, but it's more complicated. Instead, I can open its attribute table. GIS already knows the coordinates of every single point on that polygon, and when I calculate the shape area I can easily get the area of every one of these polygons. It's probably just going to take 10 seconds. Of course, if you have too many polygons it might take a little longer, but it's much easier than calculating the areas or the lengths with the measurement tools.

And domains: we don't talk about them, because that's a very specific concept in the geodatabase, and we don't use domains in QGIS. We talked about the table views before, so I'm just going to skip through that and make sure we have everything you want.

One last thing: when you want to do actual edits on the attribute tables, you need to enable editing. Actually, you did that last time, when you wanted to add a specific column, a specific field. You enable editing inside QGIS, you open the attribute table of the spatial data you want to edit, you start to do the editing, and then
finally you save it. So these layers are editable, most of them. When you have the data on your machine and you have access, you can easily edit them. But when you edit the values and save, there's no undo button that works for that. You can't go back and restore the previous value. That is, of course, on a local machine. If it's on a server, the database is able to capture the different versions of the table. But in reality, most of the layers we have are stored locally on our machines. So when you do the editing, make sure you know what you're doing and that you have a backup of the layer, and hopefully you also have a good understanding of the process. After you do the editing, you can't go back and undo those entries.

So that's the lecture for chapter seven, about the attribute table.
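
As a rough illustration of the backup advice above, here is a minimal Python sketch that copies a locally stored layer before you start editing it. It assumes the layer is a shapefile-style data set, meaning several files that share one base name (`.shp`, `.shx`, `.dbf`, and so on). The `backup_layer` helper, the `backups` folder, and the timestamped folder name are all hypothetical choices made for this example, not part of QGIS or ArcGIS.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_layer(layer_path, backup_root="backups"):
    """Copy a layer file and its same-named sidecar files into a new folder.

    Hypothetical helper: the folder layout and naming are assumptions.
    Returns the list of copied file paths.
    """
    layer = Path(layer_path)
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    target_dir = Path(backup_root) / f"{layer.stem}_{stamp}"
    target_dir.mkdir(parents=True, exist_ok=True)

    copied = []
    # A shapefile is several files sharing one base name, so copy every
    # sibling whose name starts with that base name, not only the .shp file.
    for sibling in sorted(layer.parent.glob(f"{layer.stem}.*")):
        if sibling.is_file():
            destination = target_dir / sibling.name
            shutil.copy2(sibling, destination)  # copy2 also keeps timestamps
            copied.append(destination)
    return copied
```

You would call something like `backup_layer("data/counties.shp")` (a made-up path) before enabling editing, so that if a saved edit turns out to be wrong, the original files are still sitting in the backup folder.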