Transcript for:
IICS Administration and Metadata Management

I'm logged into IICS — that's the application. My focus today is going to be on the administration aspect as well as the metadata administration aspect. The difference between Metadata Command Center and Administrator, if you layer it in a bottom-up fashion, is this: at the bottom is the infrastructure, and for infrastructure management you use Informatica Administrator. If you're coming from an on-premise EDC or IDQ background, this is the equivalent of the Administrator application — the admin console in the on-premise domain — where all the infrastructure is managed: user management, connection management, all of those things.

Layered on top of that is Metadata Command Center, which helps you manage all of the metadata you want to harvest for your enterprise. And metadata includes not just the catalog sources: for governance assets, if you want custom fields on the out-of-the-box assets — the equivalent of Axon custom fields — this is the place to make those configurations. If you want to extend or configure workflows, this is the place. So for anything that is administration-specific or monitoring-specific, you use Metadata Command Center, and it's usually for a smaller group of higher-privileged users. For the folks who are familiar with on-prem, Metadata Command Center is the equivalent of the LDM admin. And the Data Governance and Catalog application is the consumption layer, the equivalent of the EDC application.

All right, let me start with the bottommost layer, Administrator. If you're familiar with IICS, of course you're already familiar with Administrator, but if you're not familiar with Informatica Cloud, this is the place where you set up your SAML configurations. You can use the SAML configuration here just for authentication, or, as an extension of that, you can bring in the users — synchronize them and create IICS user objects — which helps with authorization. You can put them in roles and give them different stakeholderships, so you can have all of those users seated in the system before you open it up to the end users, or before you pick them as individual users or stakeholders within the Cloud Data Governance and Catalog application. So that's the SAML part and the user management part, but there is also native user support: users and user groups are both available, user roles are available, a bunch of out-of-the-box roles are available, and you can create your own additional custom roles depending on how you want to configure your organization or your client's organization. In terms of user management, this is the place to start.

Outside of that, the runtime environment is another critical piece. All of the compute for metadata extraction and data profiling happens on the runtime environment, but the same runtime environment gets used — or can be used — for your Cloud Data Integration jobs, your Cloud Data Quality jobs, or, if you're using MDM, mass ingestion. For all the applications that run on the Secure Agent, the runtime environment — a collection of Secure Agents — is the place to configure.
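Much of what Administrator manages can also be reached programmatically through the IICS REST API, which is handy when scripting org setup. Below is a minimal Python sketch using only the standard library. The v3 login endpoint and the INFA-SESSION-ID header follow Informatica's published REST API, but the base URL is POD/region-specific and the runtime-environment listing path is an assumption to verify against the current API reference.

```python
# Minimal sketch: scripting against the IICS REST API.
# The base URL is POD/region-specific; the runtimeEnvironments path
# below is an assumption -- check the current API docs.
import json
import urllib.request

BASE_URL = "https://dm-us.informaticacloud.com"  # adjust for your POD/region

def iics_login(username: str, password: str) -> dict:
    """POST credentials to the documented v3 login endpoint; return the session payload."""
    req = urllib.request.Request(
        f"{BASE_URL}/saas/public/core/v3/login",
        data=json.dumps({"username": username, "password": password}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

session = iics_login("org.admin@example.com", "********")
session_id = session["userInfo"]["sessionId"]

# Subsequent administrative calls carry the session id as a header;
# this exact path is hypothetical -- verify against the API reference.
envs_req = urllib.request.Request(
    f"{BASE_URL}/saas/public/core/v3/runtimeEnvironments",
    headers={"INFA-SESSION-ID": session_id},
)
with urllib.request.urlopen(envs_req) as resp:
    print(json.load(resp))
```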
Next, connections. This is one difference between EDC and CDGC, Cloud Data Governance and Catalog: a common set of connections can be used. The connections that you define here can be used for CDI, for CMI, or for Cloud Data Governance and Catalog — one place to define all the connections for your organization, and they get consumed in the various applications. Similarly there is metering, and there are a bunch of other pieces. Yes, all of this sits outside the individual services, but it's a critical piece that needs to be set up by the infrastructure administrator for the various services to consume.

Once you go up the stack, you're at the metadata administrator level — all of these applications are just a click away in the apps menu. In Metadata Command Center you can create your own catalog source; that's probably the biggest use you will find here. There are multiple sources supported, and JDBC is supported as well if you do not find a native scanner listed on this page. There are also custom catalog sources if you don't find a source here. Let me pick Sybase — we do not have a native scanner for Sybase. If you want to extend one of the models that we have, maybe the SQL Server model, you can take that, make the required changes, and bring it in as a custom catalog source. If you're familiar with EDC, it's a very similar concept: you start with the metadata model definition, and once that is defined you put a layer around it, which is the custom catalog source. Those custom catalog sources appear here, and then you can configure them.

All right, let's move on to the next topic, which is data classification and lookup tables. A lookup table, as the name suggests, is a table of CSV values — column and row values. You can put in values that can be used for evaluating classifications. For example, if you want to check whether something is a product code, the lookup table can have all the product codes; the classification expression can look up that table, and if the value is there, you can mark that particular column or field — the data element — as a product code, or as anything else the business requires for classification (a small sketch of this idea appears below).

So let me start with one of the catalog sources that is already configured. Let me pick this one — it's a SQL Server source — and start with that. This is the place where, as I said, the connections are used: the connections that you defined in Administrator show up here for the specific source type you're interested in, in this case Azure SQL Server. You choose the connection and you do the configuration, and this is where you'll spend the majority of the time. You can perform various activities on this particular catalog source: you can extract the metadata, you can profile — run column profiles — you can run classifications, you can enable or disable the various options here, and you can run relationship discovery and glossary association. Each of these has its own set of parameters to deal with, very much similar to how it is done in EDC.
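To make the lookup-table idea from a moment ago concrete, here is a purely illustrative Python sketch of the evaluation logic — not the product's engine. The PRODUCT_CODES set stands in for a CSV lookup table, and the 95 percent threshold mirrors the conformance idea we'll see again in the classification editor; all names are made up.

```python
# Illustrative only: classify a column as PRODUCT_CODE when enough of
# its frequent values appear in a lookup table.
from collections import Counter

PRODUCT_CODES = {"PC-1001", "PC-1002", "PC-2040"}  # stand-in for a CSV lookup table

def conformance(values: list[str], lookup: set[str], top_n: int = 100) -> float:
    """Share of the column's most frequent distinct values found in the lookup table."""
    frequent = [v for v, _ in Counter(values).most_common(top_n)]
    return sum(v in lookup for v in frequent) / len(frequent) if frequent else 0.0

column_sample = ["PC-1001", "PC-1002", "PC-1001", "PC-2040"]
if conformance(column_sample, PRODUCT_CODES) >= 0.95:
    print("classify this data element as PRODUCT_CODE")
```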
The concept of a runtime environment, though, is something new, because here you're just configuring, and you have the option of using one or more runtime environments. In EDC, whatever you had installed was the only runtime environment available. Here you have two options. One is serverless execution, where you do not install anything on premise: you use the Informatica-managed infrastructure. This helps with cloud sources primarily, and it can do the metadata extraction from, say, an S3 source. You're not installing any on-premise runtime environments; you're using Informatica-managed infrastructure, you provide the credentials to connect to S3, and we reach out to that particular S3 source on the defined schedule, apply the right set of filters, and harvest all the metadata. That's one option. The other option is that you can have more than one runtime environment: one could be running within the data center, on premise, and another could be running in an AWS or Azure VPC, depending on how the data sprawl is. It's recommended to install the runtime environment closer to the data source; that way the latency issues are resolved, and also, if it's a cloud source, taking the data out of that cloud ecosystem may incur cost that can be avoided. Data or metadata? If it is data profiling, then yes, there is some data involved; if not, then it is just the metadata.

Another aspect here, which I touched upon earlier, is programmable objects. This is the advanced scanner configuration: if you want advanced scanning to happen for the SQL database, it's just a flag that you need to set. Data profiling has a similar set of options. For data classification, you create multiple classifications, or you use one of the 85 or 86 classifications that we provide out of the box. We refresh these and add new classifications every month, so after a couple of months you may be seeing 100-plus classifications in this list. You can include all the classifications that you want to run against the catalog source you are going after.

Another important AI/ML-based classification feature that we have is intelligent glossary association. Based on certain attributes of the data element — the column name, related names, comments, the parent name — we do an automatic association, based on various algorithms, with the glossary names, descriptions, and aliases that you create. The association can happen automatically, and you can auto-accept it. If you do not want auto-acceptance to happen, of course you can disable it, and the associations at that point will appear as recommendations that you can manually accept or reject. But if you enable auto-acceptance based on a conformance score, they are automatically associated — and of course you can always go back and reject one if you think it is an incorrect association (a small sketch of this score-based auto-accept pattern appears below).

The last feature that I want to highlight here is relationship discovery. This is another AI/ML capability; it helps with schema matching recommendations. Currently these schema matching recommendations are made available in Cloud Data Integration: if your developer is building a mapping, and relationship discovery is enabled for this particular catalog source, then recommendations start showing up in that process.
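To illustrate the auto-accept pattern — not the product's actual AI/ML models — here is a toy Python sketch: a crude string-similarity ratio plays the role of the real scoring, and AUTO_ACCEPT_SCORE is an invented threshold.

```python
# Conceptual illustration of score-driven auto-acceptance.
# difflib is a crude stand-in for the product's matching algorithms;
# in the toy below only glossary names are compared, though the real
# feature also considers descriptions and aliases.
from difflib import SequenceMatcher

GLOSSARY = {"Customer Identifier": "Unique key for a customer", "Annual Revenue": "..."}
AUTO_ACCEPT_SCORE = 0.8  # illustrative conformance threshold

def best_match(column_name: str) -> tuple[str, float]:
    scored = [
        (term, SequenceMatcher(None, column_name.lower(), term.lower()).ratio())
        for term in GLOSSARY
    ]
    return max(scored, key=lambda t: t[1])

term, score = best_match("customer_id")
if score >= AUTO_ACCEPT_SCORE:
    print(f"auto-associate with '{term}' (score {score:.2f})")
else:
    print(f"recommend '{term}' for manual accept/reject (score {score:.2f})")
```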
Within Cloud Data Governance and Catalog itself we still haven't built similarity discovery — that is, suggesting other similar types of columns, data elements, or data sets. That has not been built into the product yet, but in the coming monthly releases you will see it popping up. It will help in this way: if you have curated a particular data element — a column or a field — with a specific glossary association, a classification, maybe a privacy policy association, a bunch of things, and you want to propagate that to all other similar columns or data elements, you can push a button and it gets propagated. That helps with efficiently managing the catalog assets with the right set of governance constructs.

All right, so those are the various capabilities. You can always apply filters, and we do recommend you apply filters and start small: start with a small schema, a small database, or a small set of tables or views within a schema, so that you bring in only a smaller amount of metadata in the initial phases of the project. Go through the motions of using the product — classifying, creating the glossaries, associating them automatically — and once you complete the initial iteration, then you expand: you come back and remove the filters, or adjust the filters so a broader set of data assets is included or excluded (a small sketch of this include/exclude idea appears below). The primary consideration here is cost. The metering is based on compute — all the metadata extraction you're doing, the classification that needs to be done, the data profiling that needs to be done, the glossary association that needs to be done; there is cost associated with all of it — and there is also cost associated with the volume of metadata you are harvesting. If you bring in a lot of metadata and then need to go back and restart the experiment — because you missed something, or something was done incorrectly and you want to redo it — you're incurring cost unnecessarily. So while you're in the initial phases you may want to start small and then expand, and the filters help with that. These filters are applicable for both metadata and data.

Last but not least, you can run on a schedule or you can run ad hoc. Let me start with running ad hoc here. Since I've made some changes, let me discard them, click the database — the catalog source — and push the Run button. All the various capabilities that we configured show up, and here is one difference from EDC: there is no need to run all of them in a sequence. If you're adding classifications and you want to run just the additional classifications, you can do exactly that. Of course, the very first time, the metadata extraction and data profiling have to be completed, but after that, if you want to iterate multiple times to include more classifications, or to include just the glossary association, you can do that without reaching out to the catalog source again. In EDC you had to reach out to the catalog source again; that is one enhancement we have made in CDGC. Also, on computation: the metadata extraction and data profiling computation happens on the Secure Agent — the runtime environment running closer to the data source — while data classification and any of the discovery jobs (data classification, relationship discovery, glossary association) all run within Informatica-managed infrastructure.
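Here is a small Python sketch of that include/exclude filtering idea. The fnmatch-style patterns over schema.table names are illustrative; the product has its own filter UI and syntax.

```python
# Start-small sketch: include/exclude patterns applied before
# metadata (and data) is harvested. Patterns are illustrative.
from fnmatch import fnmatch

INCLUDE = ["sales.*"]      # begin with a single schema
EXCLUDE = ["sales.tmp_*"]  # skip scratch tables

def selected(qualified_name: str) -> bool:
    keep = any(fnmatch(qualified_name, p) for p in INCLUDE)
    drop = any(fnmatch(qualified_name, p) for p in EXCLUDE)
    return keep and not drop

tables = ["sales.orders", "sales.tmp_load", "hr.employees"]
print([t for t in tables if selected(t)])  # ['sales.orders']
```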
That gives you the flexibility to run them iteratively, based on the metadata and the data profiling statistics you have already harvested — you don't need access to the data source again. So that's one part.

The second part is that you can set permissions here. You may have ten administrators, but you probably want to identify only one, or maybe two, users to specifically have access to that particular catalog source configuration — not the catalog source itself; I'm referring to the catalog source configuration. Maybe I want just Chris Phillips and Eric to be the only administrators on this catalog source configuration: you can specify that here. Or, if you want everyone in a given role — say the data governance administrator role — then all of the users belonging to that role can participate, with the privileges granted at the administrator level. So you have that flexibility, and similar controls exist for all the other capabilities here. That's the catalog source side.

On data classifications: as I said earlier, there are a bunch of classifications available. Every month, if you come and click the Import button after we have done our deployments, you'll see some additional classifications getting imported. Similarly, lookup tables can also be imported. Creating classifications is a fairly simple exercise — for the folks who are familiar with EDC, this may sound like an enhancement we have made. Let's say it is a U.S. state: if you want to classify a column or a data element as a U.S. state, there is a simple expression editor here. So: the name contains, let's say, "state"; and if you want an additional rule expansion, the value frequently appears in the lookup table — for which you need to specify the lookup table. Let me search for the U.S. states lookup table, where "state" is the column name. I am looking up the frequent values within that particular column and matching them against the lookup table, and if that conformance is 95 percent, I am good to classify that particular column as a U.S. state. So a simple editor is available for a non-technical user: they can iterate on this multiple times, run one classification at a time, and see if it satisfies the requirement. This helps with fine-tuning the curation.

Now, if you want to make this more complex — and the complexity may be required for specific expansions you want to do — this is a Spark SQL expression; that's how it has been designed. We provide a whole bunch of metadata (I don't think EDC had all of these capabilities), so some of these additional statistics that are available can be used. You can also use the operators and the built-in functions that are available. Of course, we looked at lookup tables. And you can create constants so that the rule you're creating is easily readable: if there are lots of places where you are doing, let's say, this frequent-value lookup, you can give it a constant name and then use that constant multiple times. You can validate and then put it to test. So there's a powerful expression editor that has been provided — a simple one as well as an advanced one.
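As a rough Python rendering of that simple-editor rule — the product actually evaluates Spark SQL expressions, so this is only an analogy — note how the named constant keeps the 95 percent threshold readable, the same motivation as constants in the advanced editor. The state list is trimmed and all names are illustrative.

```python
# Analogy in plain Python for: name contains 'state' AND frequent
# values conform to the lookup table at >= 95%. Illustrative only.
from collections import Counter

US_STATES = {"ALABAMA", "ALASKA", "ARIZONA"}  # trimmed; full lookup table in practice
CONFORMANCE_THRESHOLD = 0.95  # named constant keeps the rule readable

def conformance(values: list[str], lookup: set[str], top_n: int = 100) -> float:
    frequent = [v for v, _ in Counter(values).most_common(top_n)]
    return sum(v in lookup for v in frequent) / len(frequent) if frequent else 0.0

def is_us_state_column(column_name: str, values: list[str]) -> bool:
    return ("state" in column_name.lower()
            and conformance([v.upper() for v in values], US_STATES) >= CONFORMANCE_THRESHOLD)

print(is_us_state_column("customer_state", ["Alaska", "Arizona", "Alaska"]))  # True
```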
Outside of the metadata management part, you can also monitor all the jobs that are currently running, or that you own, and you can drill into a specific job that was executed. You can see the job statistics: in this case I did the metadata extraction, so you can see how many objects got extracted, and you can see this for every job type. For the glossary associations, 186 clusters were associated out of the 276. If there are any failures, you should be able to track them in the logs. Soon we will have the ability to click through from here to an object-level view of what happened to a particular table or object as it went through the processing. You can also download the logs and send them to Informatica, or do an offline analysis. So that's the monitoring side. And if you are familiar with EDC, a very similar connection assignment concept exists here: any third-party system which has not been cataloged yet appears as an unresolved connection endpoint, which can be resolved here on the monitor page.

Taking a step back: as I said earlier, this is not just for metadata. All the other configurations — the CDGC application behavior — can be defined here. The first thing is workflows. If you want to configure workflows, we are providing three out-of-the-box workflows, which can be configured for various different types of activities. You can visualize the workflow that we provide out of the box and go through the configurations: who — meaning which type of role — needs to perform the various approval steps. Basically, taking on the different responsibilities in the workflow can be configured, and it can be configured for various types of assets as well. In this case, metric and business term are both configured, and you can add additional types (a small sketch of what this configuration expresses appears below). So there are out-of-the-box workflows that can be configured for various asset types, and maybe the majority of the simpler use cases get covered. We do not have a custom workflow capability built into the product yet; that's on our roadmap. Once we have it, you'll be able to build any sort of complex workflow you want and perform the configuration right here.

Customizations — we briefly touched upon this. You can bring in different metadata models. For example, take BI: there is a BI model that we have provided, and if you want to make modifications to it, you can upload it as a custom model and build it into a custom catalog source type. That's one extension, and the third extension is that for the existing models you can add additional custom fields. Those are the various configurations that can be performed. Currently, Cloud Data Governance and Catalog happens to be the first application taking advantage of this. There are some configurations that will come for Cloud Data Marketplace as well in the coming releases, and we're also working on cloud data prep: very much like the on-premise Enterprise Data Preparation (EDP) product, you will be able to preview the data in Cloud Data Governance and Catalog and get into preparation mode there, and the configuration of that will start appearing in Metadata Command Center.
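To show what the workflow configuration expresses — which role performs which approval step for which asset type — here is a conceptual Python model. This is purely an illustration of the mapping, not the product's configuration format, and the role names are invented.

```python
# Conceptual model only: asset type -> ordered approver roles.
# Role and asset-type names are invented for illustration.
WORKFLOW_CONFIG = {
    "BusinessTerm": ["Data Steward", "Data Governance Administrator"],
    "Metric": ["Data Owner"],
}

def approval_steps(asset_type: str) -> list[str]:
    """Roles that must approve changes to the given asset type, in order."""
    return WORKFLOW_CONFIG.get(asset_type, [])

print(approval_steps("BusinessTerm"))
```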
And it's not limited to just the applications that Informatica is providing: when we expose the public APIs in the coming releases, you can build your own custom applications, using Metadata Command Center to set them up and configure them. You may be using the Informatica-provided applications, or you may be just using the APIs and building your own custom applications. That's the role of Metadata Command Center — to set up your cloud metadata platform; from then on, you can use it with the applications that Informatica has provided, or you build your own. So that's the middle tier. The Administrator application is the bottommost tier, which provides all the infrastructure, and the top tier is the applications that sit on top of Metadata Command Center — one of them being the Data Governance and Catalog application that we looked at early on.