Transcript for:
Overview of the dbt Semantic Layer

Hello, everyone, and welcome to the talk we've all been waiting for: Hands-On with the dbt Semantic Layer, presented by our PM for the Semantic Layer, Cameron Afzali, and our developer experience advocate for the semantic layer and metrics, Callum McCann. Today we're going to be walking you through the semantic layer. If you're here, you're probably one of two types of person: either someone who remembers the exact moment they saw Drew open up that PR saying that dbt should know about metrics, or maybe you've heard about the semantic layer and you're thinking, that sounds cool, but what is a semantic layer? This is a talk for both of those types of people and everyone in between. We're going to walk through why the semantic layer exists, how you'll use it, and a little bit about where we're going from here. Remember to keep the conversation going in Slack: check the Hands-On with the Semantic Layer channel and post your reactions, questions, anything at all, and after the talk Cam and Callum will go through and respond to everyone. Thank you so much, and now on to Cam and Callum. [Applause]

Hello, everyone, and thank you for attending today. My name is Cameron Afzali, and I'm the product manager for the dbt Semantic Layer.

Hey, everyone, I'm Callum McCann, a senior developer experience advocate focusing on metrics. I want to talk a little bit about our path to dbt Labs. We came at this through two different paths: me through the lens of the analytics engineer, and Cam from the stakeholder side of the house, through the lens of product management. Before we were at dbt Labs, we actually worked together at another organization, trying to help organizations do analysis on their metrics. When we first got there, it was the classic startup story: all of the logic for our metrics was self-contained in the SQL behind dashboards, and it was really hard to understand. I'd implemented dbt at a number of other organizations and decided, let's implement it here.

As we went along and were able to better understand what our customers were doing, we actually discovered that all of our most successful customers were using dbt. For those customers, dbt helped them curate dimensions, manage grain, and do a lot of the data preparation required for advanced analytics. We recognized that if customers used dbt, they were probably going to be successful with our product, so we started building our product around dbt. We even built an integration with dbt Cloud so customers could import their metric metadata. That was when we decided to join the dbt Labs team.

There are two key opportunities I see with the dbt Semantic Layer. The first is to move up the stack. By defining your metrics in a central place, just like defining your models in a central place, you don't have to go around redefining custom queries and redefining those metrics. You're able to move up the stack, which means you can spend your valuable time doing the things that matter most, the things you're best at, rather than redefining a query or a metric. The second opportunity is to contribute to the knowledge loop. By adding more information to your dbt project, you don't have to spread that information across a bunch of different tools; you're able to centralize it in one place. We know a lot of you like doing that with your models, and we'd like to enable you to do it with metrics as well. So now we're going to talk about how the dbt Semantic Layer extends the value of dbt.
This might look like a familiar question, and you can replace revenue with any metric you care about, but I know we've all heard questions like "what was revenue in June?" It sounds really simple, right? It's just revenue. But a lot of the time, people are calculating it in different ways, with different filters, at different time grains. And the most pernicious thing is when people get different answers for this one metric: what was revenue in June? We've seen this problem proliferate as people get new tools and new ways of looking at data, and as more teams interact with that data. In my world, the product world, we might have a question like "how many active users did we have last week?" Callum and I encountered this at our last company: people would come up with different answers, and how can you run a business like that?

We've heard similar things from our semantic layer beta customers: that it gets confusing when people are using different metric definitions; that if you have to go update them in different tools, you're bound to miss one; that discoverability is a challenge for stakeholders looking not just for metrics but also for relevant datasets; and that folks want a controlled interface so they can answer more questions.

That's why we built the semantic layer. When you ask questions like "what was revenue in June?", everyone is pulling from the same revenue metric, everyone is using the same time grain on a monthly basis, everyone is using the same filters and the same dimensions; it's all coming from a common source of truth. And as you can see at the top of the slide, the different tools in your data stack are integrated with the semantic layer. Here's what those same beta customers said after they tried it: that the semantic layer can help them a lot; that they want everyone in their organization to start leveraging dbt metrics; that they can even see themselves moving everything to the semantic layer from LookML. And the same user who wanted a controlled interface sees the semantic layer reducing the time to valuable data products and improving long-term governance. Note, in that last quote, that we're not just talking about native integrations with the semantic layer: you can use it as a developer platform through our proxies and our APIs.

So now we'll talk about how we're going to extend the dbt workflow. We've talked at a high level about the semantic layer, and we're going to steadily zoom in. The dbt Semantic Layer is the interface between your data and your analyses; it's a platform for compiling and accessing dbt assets in downstream tools. And note that I didn't just say metrics; that's a hint for later. The product architecture of the semantic layer is composed of three components. The first is dbt metrics, which we've talked about: defining metrics in your dbt project. That's already out there, and I'd imagine some of you have played around with it; we're going to give you ways to leverage your metrics now. The second is the proxy server, which lets you compile dbt-SQL queries, execute them against your warehouse, and return that data to your integrated tools. And finally, we have the metadata API, which lets you pull metric definitions, and model metadata as well, into integrated tools. On the right you can see how they all fit together.
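To make that concrete, here is a minimal sketch of the kind of dbt-SQL an integrated tool might send through the proxy, assuming the calculate macro from the dbt metrics package and the revenue metric discussed in this talk (the dimension name is illustrative):

```sql
-- Illustrative dbt-SQL sent by a client to the proxy server.
-- The proxy compiles the Jinja into plain warehouse SQL, runs it
-- against the warehouse, and returns the results to the tool.
select *
from {{ metrics.calculate(
    metric('revenue'),        -- a metric defined in the dbt project
    grain='month',            -- one of the metric's allowed time grains
    dimensions=['location']   -- from the metric's curated dimension list
) }}
```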
We see the semantic layer extending workflows in two ways. The first is ubiquity: we want the semantic layer to work with all of the different tools you use and to meet you, and your stakeholders, where you are, extending the dbt DAG to downstream tools. The second is utility: whether our partners are developing integrations or we're equipping you with APIs and connectors, we want them to be meaningful integrations that improve your workflows.

Here's the array of partner integrations we have so far, and mind you, this is just from the past few months; this slide is going to be exploding with logos when we GA next year. The partners at the top import metric definitions and query their data: ThoughtSpot, Hex, Mode, Deepnote, Atlan, Houseware, and Lightdash. The partners at the bottom import metric definitions or help you load them, and a lot of them are also developing additional integrations. If you don't see the logo of a tool you use on this slide, please reach out to your friendly vendor and talk to them about developing an integration. We want to build an interconnected ecosystem and be that common layer; we want you to be able to use your metrics and models across all the tools you like using, and we're open to working with those folks.

All right, let's walk through an example user flow for the semantic layer. First, a data engineer might use packages to load metrics from their CDP data sources. An analytics engineer could then jump into the Cloud IDE and define additional metrics in their dbt project. A business user picks up from there: they look at all the metrics that have been defined within their data catalog and decide which metric is right for them. Then that business user can select the metric and dimensions in a reporting tool; they know what it means, they preview the value, they trust that it's verified, and they can run their own analysis. No more asking for endless ad hoc dashboards, right? Let's say the business user has additional questions: a data analyst can pick it up from there, do a deeper dive, and maybe use some additional algorithms to understand the data more. And finally, a data scientist could monitor, say, the revenue metric for anomalies using an observability tool. The semantic layer is powering all of these different use cases.

I want to talk a little bit about why this matters to everyone in the room: the analytics engineers, the people online, all of us as a community. Historically, dbt has been a tool for analytics engineers. It's given us a speed in serving our organizations that we've never had before, but that speed has led to a proliferation of data assets, and that's introduced another issue: consistency across all of your consuming experiences. So dbt decided to focus on the semantic layer to address this consistency. We want your organization to have a unified world view, and in so doing, we can address one of the fundamental data problems that I think everyone engaged in this work has seen at some point: the "why doesn't revenue over here match revenue over there?" question, that late-night email from your CFO that freaks everyone out and sends them on a spiral to figure it out. We saw this at our last organization, like Cam mentioned, where our internal tooling and our analytical tooling handled time in slightly different ways, which led to slightly different calculations of our weekly active user metric. I was called in to figure out what was going on. It took me about a day to debug the actual issue; it took me months to rebuild the trust in those metrics.
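As a hypothetical illustration of how that happens (this is not the actual code from that company), consider two tools that bucket the same events into weeks differently; they will report different weekly active user counts from identical data:

```sql
-- Tool A: weeks truncated to a Monday start
select
    date_trunc('week', event_at) as week_start,
    count(distinct user_id)      as weekly_active_users
from events
group by 1;

-- Tool B: effectively Sunday-start weeks, via a one-day shift before
-- truncating. Same events, different buckets, different WAU numbers.
select
    date_trunc('week', dateadd('day', 1, event_at)) as week_start,
    count(distinct user_id)                         as weekly_active_users
from events
group by 1;
```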
And that's the most important part: rebuilding trust takes a long time. We don't want analytics engineers to have to worry about that. We want you all to spend your time on higher-leverage work, delivering value to the business, while dbt handles consistency for you.

We've talked about a lot of high-level stuff, and now we want to get into some demos of what this actually looks like. We're going to walk through four of those use cases Cam mentioned earlier: defining, in the flow of an analytics engineer in dbt Cloud; discovering, in the catalog use case with Atlan; reporting, in the BI use case with Mode; and analyzing, with Hex.

In this first demo, what I'm doing is going into my packages.yml file in Cloud and adding the metrics package, which allows me to interact with my metrics. I then move over to a YAML file where I've actually defined metrics. I've given the metric a name; a label, which is the human-readable version; the model it depends on; any descriptive information I want to include; the calculation method, which is the aggregation we'll apply to the expression; the expression, which is the SQL expression; the timestamp we'll join to a date spine; the time grains we want our users to be able to aggregate on; and finally a list of dimensions, which is actually a subset of all the dimensions in the model: the ones we believe will deliver value to our stakeholders. Additionally, you can open up the lineage and see it flow all the way from your sources, to your staging models, to your intermediate models, up to your marts, with all of that feeding into the metric definition itself.

If you want to interact with this metric inside dbt Cloud, you can use one of those macros I mentioned earlier, specifying the metric as revenue and including the grain, and when you hit Preview, it returns a dataset at the selected grain with that metric value. That keeps everything consistent across all the consuming tools. Additionally, you can see the compiled code in a friendly, dbtonic way, not as something confusing, and it follows all those best practices so you can understand it.

Now, about the prerequisites to get this set up: if you're interested today, you can go to Environments, navigate to your deployment environment, hit the Edit button, scroll down to Semantic Layer, hit that toggle, and get the URL you'll need to set this up in all of your tools. There's a lot more on the documentation site, and I want to give a huge shout-out to the docs team, so if you have any questions, I'd navigate there. Now I'm going to hand it back over to Cam to walk through some of our partner integrations.
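Here is a sketch of the two files described above; the package version, model, and column names are illustrative, and the field list follows the walkthrough:

```yaml
# packages.yml: add the metrics package so you can query your metrics
packages:
  - package: dbt-labs/metrics
    version: [">=0.3.0", "<0.4.0"]  # illustrative; pin per the package docs
```

```yaml
# An illustrative metric definition in a project .yml file
metrics:
  - name: revenue
    label: Revenue                 # human-readable version of the name
    model: ref('fct_orders')       # the model the metric depends on
    description: "Total revenue from completed orders"
    calculation_method: sum        # the aggregation applied to the expression
    expression: order_total        # the SQL expression being aggregated
    timestamp: ordered_at          # the column joined to the date spine
    time_grains: [day, week, month, year]
    dimensions:                    # a curated subset of the model's columns
      - location
      - customer_segment
    filters:
      - field: status
        operator: '='
        value: "'completed'"
```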
Thank you, Callum. All right, let's go back to our example flow: say we've got a finance team and a data team working together to understand changes in revenue over time. I'm a business user, and I want to understand what my analytics engineer has defined in the dbt project; we've loaded our metrics into Atlan, which is a data catalog. Here I'm able to go into Atlan and see all of my different assets: the raw tables, metrics, models, etc. I'm going to select my metrics, and I can see a list of the different metrics from my imported dbt project. I can see which ones are verified and which are drafts, which I should use and which I shouldn't, and I'm going to open up my revenue metric.

This loads the different details, the metadata I've defined in the dbt project, and there we just queried the semantic layer to pull a preview of the annual revenue. Pay special attention to this 4.4 million figure for 2019; we're going to see it again in a minute. Atlan also loads up the lineage of the metric, pulling from the DAG that Callum showed, so I can trace back and see what tables, and even what other metrics, this metric depends on, all the way back to the source. Atlan also takes care of column-level lineage, so I know how the different columns from the models flow into the dimensions in the metric; this lets me trace dependencies and fully understand what's being measured here. Atlan also loads up the different models from my dbt project, so I can look at their metadata and, by extension, understand the metrics better.

All right, so I've found my revenue metric and I know what I want to analyze; I'm going to hop into my other tool and dive a little deeper. I'm going to jump into Mode, which lets me create a report, visualize this metric in a custom way, and share it with my stakeholders. I'll start by creating a report from a dbt metric, choosing the same revenue metric (I recognize it from the list), and creating a bar plot to look at it over time. What Mode is doing here is loading up the metric from the semantic layer and auto-generating a chart of the metric broken down on a daily basis. I'm going to break it down annually instead, because I want to look at it at a little higher level, and let's check, just to make absolutely sure it's the same metric: there's the 4.4 million figure again. We're avoiding the time-grain issue that Callum talked about, the one we've encountered in the past, so we're going to trust this value. I want to break it down by location and have a little more detail here to share with my stakeholders. Great, so now I can visualize the annual revenue by location. Location was actually one of the dimensions that the analytics engineer curated for me, so I don't get overwhelmed with a list of fifty different dimensions. I'm able to add this chart to my report and share it out with my stakeholders. But there's a little bad news at this point: I do have more questions and want to dive a little deeper, so I'm going to hand it off to my data analyst to do a deeper dive in Hex.

Awesome. So now I'm a data analyst supporting the finance team, and I'm jumping into Hex, which also integrates with the semantic layer. I'm able to load up a dbt metric cell under the Transform menu, and I can see my list of metrics for the semantic layer connection. I'll break it down by year, and let's absolutely triple-check that it's the same metric and we're getting the right value: you can see the same 4.4 million. Now I want to break it down at a little bit of a finer grain, and I'm actually going to add another metric as well: the customers metric, looking at it on a weekly basis. Great, so I get a little more information here. I'm able to choose multiple metrics, I can choose the grain, and I can choose dimensions as well as secondary calculations; I'll run that again, visualize it, and keep diving deeper from there. Hex, like Mode, allows me to visualize the metric's changes over time; here I'm going to break it down on a weekly basis and pull in the location, and there we go: we can even see some seasonality in the weekly changes.
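A sketch of the kind of query an integration could generate for that step, assuming the metrics package's period_over_period helper for the secondary calculation (metric and dimension names are illustrative):

```sql
select *
from {{ metrics.calculate(
    [metric('revenue'), metric('customers')],  -- multiple metrics at once
    grain='week',
    dimensions=['location'],
    secondary_calculations=[
        metrics.period_over_period(
            comparison_strategy='ratio',  -- week-over-week ratio
            interval=1,
            alias='wow_ratio'
        )
    ]
) }}
```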
But another special feature of a lot of the different integrations, including Hex's, is that I can write in dbt-SQL: I can pull from a model as well as a metric. Here I'm going to select a preview from my orders model, and I can also use macros here. I think the sky is the limit for where you all can take this; we're really excited to see what you play around with.
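For instance, a hypothetical dbt-SQL cell could combine a model reference and a metric in one query; the output column names (date_month, revenue) and the model columns here are assumptions based on the package's conventions:

```sql
-- ref() a model and calculate a metric in the same query;
-- the proxy compiles both before the SQL reaches the warehouse.
with monthly_revenue as (
    select * from {{ metrics.calculate(metric('revenue'), grain='month') }}
),
orders as (
    select * from {{ ref('orders') }}
)
select
    r.date_month,
    r.revenue,
    count(o.order_id) as order_count
from monthly_revenue as r
left join orders as o
    on date_trunc('month', o.ordered_at) = r.date_month
group by 1, 2
```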
So that's just a little preview of some of our partner integrations. There's a lot more to see, they're continuing to develop them, we're working really closely with our friends in the ecosystem, and we would love to hear from you all as well about what you'd like to see, so we can keep working as a community.

All right, now we're going to talk a little bit about the product roadmap. What we've shown is just the first step: this is a public preview, not GA, right? It's a stepping stone to where we want to go, given feedback from you all. Our roadmap has three main focus areas. The first is to enable efficient and comprehensive modeling of your metrics and your models. The second is to provide a reliable and accessible experience in dbt Cloud; we want to be central infrastructure in your data stack, and to earn that right. And the third is to nurture an ecosystem: making it easier to develop integrations with the semantic layer and supporting proxying queries to more data platforms. Our ultimate goal is to power new and improved workflows for you all.

I want to talk a little bit about today's metrics leading into tomorrow's semantics. Right now, with the functionality we've released this week, you can define metrics in your dbt project with things like dimensions, filters, and grains, and you can query them with secondary calculations in all of these integrations. But like Cam said, this is just the first step toward a cohesive semantic layer. We've received a lot of really great feedback from our early adopters and beta customers about wanting to encode more information into their dbt projects, things like relationships and column hierarchies, and that feedback has really helped us shape the roadmap up to GA and even beyond.

We have a few ideas we want to present to you here, but we also want to make sure that community voices are heard. One of those ideas is entities, which could be a higher-level abstraction on top of your models: they would allow you to define relationships between different models, define data types on dimensions, have metrics associated with them, and allow metrics to traverse that graph and pull in dimensions from other entities. We're also really interested in improving the developer experience with metrics. That includes versioning metrics beyond the versioning that's self-contained within version control (so if you have to switch to version two of a metric, will there be a deprecation period?); testing metrics, to ensure numbers stay consistent as you change a definition, or as you go along, to ensure there aren't underlying data issues; and development environments, having your semantic layer understand the difference between working in a development environment and a production environment. Like I said, this is just the first step on a long roadmap of things we want to work on, but we really want to make sure community voices are heard, because you all, in this room and online, have the ideas that are really going to level this up. So if you have ideas for things you really want to see, things you think would provide value to your organization, please join some of the Slack channels we'll mention at the end of this talk, open issues in GitHub, or just send me DMs personally. I am very interested to hear your ideas, and we want to work with the community to make sure this functionality and this product work for all of you.

Great. So Callum covered the roadmap for dbt Core and the package ecosystem, which is inextricably linked with the rest of the semantic layer; I'm going to talk a little bit about what the semantic layer roadmap looks like for dbt Cloud. We are providing the semantic layer as highly available infrastructure: we're currently working towards 99.9% aggregate availability, and we're going to keep iterating towards being highly available and being part of your mission-critical analytics flows. We're also working a lot on query performance, hitting a 100-millisecond P95 compilation overhead, which is a fancy way of saying that when you run a query through the semantic layer, you're not going to notice a big difference between that and how you usually run a query. We're also going to keep improving enterprise access management, making sure we meet the needs of the largest and most complex organizations as we provide this service.

We're going to continue to iterate on an accessible setup experience as well; this is where we'd love to hear from you as you get started with the semantic layer. That looks like easier environment setup and management (could you set up the semantic layer for a particular deployment environment?) and better workflows in the IDE. I know Nate is presenting about the IDE later today; we think that developing metrics in the IDE is a really unique prospect, and we want to make it easier to define that YAML and test those metrics out in one central place in dbt Cloud.

The semantic layer also couldn't realize its full value without the ecosystem, and we're so grateful to be working with our integration partners. We want to make it easier for you all to integrate with us, so we're going to build new and richer APIs, as well as other functionality that makes it easier to integrate with the semantic layer. For users, we're going to make it easier to manage your integrations in dbt Cloud: seeing a list of those integrations, understanding how they're used, audit logging, things along those lines, so you understand how folks downstream are using them. That could also look like auto-exposures: we currently have exposures, and the semantic layer is a way to really leverage those, understanding where a metric is being used and in which dashboards. The semantic layer is a way to do that programmatically, so you understand those dependencies and which metrics are most important to the folks you work with. And finally, one of the biggest areas of investment for us is proxying queries to more data platforms. We support proxying queries to Snowflake in this public preview release, and we're going to build support for a lot more data platforms, such as Redshift, BigQuery, Databricks, and beyond; this is going to be a huge area of focus. Again, if you're interested, or if you have strong opinions about any of these areas of the roadmap, we would love to hear from you. We directly take your feedback into account as we continue to iterate on the roadmap, but we wanted to give you a preview of what's coming up next.
All right, if you're anything like me, you're wondering: okay, I've got the high level, I understand what this semantic layer is, but can I get my hands on it? The good news is: you can. To get started, you'll want a dbt Cloud account. You'll need a Team or Enterprise plan to use the metadata API, but you can use the proxy server with a Developer account as well. Currently we support the Snowflake data platform, and we're going to keep iterating on that, adding more support in the coming months. You'll need a project on dbt Core 1.2 or later with the metrics package, and currently no environment variables; support for those is coming within the next couple of weeks. Check out the user docs and our launch site in Slack for more information; as Callum noted, we've got an awesome docs team. I'd also like to stop and thank our engineering team for doing an incredible job. This has been a very complicated product to put together, and we're really proud of what we've done and excited to continue to iterate with you.

All right, next steps: please try the semantic layer; join these Slack channels (we will be there, like, all the time); engage with tech partners to invest in integrations (if you didn't see a logo on that list, we are open to working across the ecosystem); and please provide feedback on your experience and on the direction we presented today. Thank you all so much for attending. We now have time for Q&A.

So I think we've got a little bit of time; if there are any questions anyone wants to flag on Slack or in person, we've got some time. For anyone online who didn't hear it, the question was about open source and the dbt Server component of the semantic layer. My understanding is that dbt Server will be source-available within the next couple of weeks; the team is preparing the repo to share. It's under a BSL license, so non-commercial, but you'll be able to try it out and use that source.

The next question was: how is it paid for, how does it work commercially, how do we make money? I'll pass it off to Cam, because I'm the open source person. That's a fun question. Okay, I'll put it this way: for the public preview, we want you all to try it out and give us feedback. Like we noted, Developer accounts can use the proxy server; Team and Enterprise accounts can also use the metadata API, and so a lot of those full-fledged integrations we showed earlier. Try it out, give us feedback, and as we study usage patterns, we'll figure out pricing from there. But during the public preview period, don't worry: try it out and give us feedback.

The next question was: does the proxy execute the SQL, or does the proxy send the SQL to the data warehouse and the warehouse executes it? I can answer that one. What the proxy does is receive the dbt-SQL from whatever the client is, compile it down to the underlying warehouse SQL, send that to the warehouse, retrieve the results, and transmit the data back to whatever the consuming experience is. So correct: results that come from the warehouse do go through the proxy server.

Is there anyone on Slack who might have flagged one of the questions in the Slack channel? I'd love to make sure we include everyone who's online. You have a question from the online channel? Perfect, go for it. Is there support for custom calendars? Yes, there is. You can find information about that in the dbt metrics package; you set that with a variable in your dbt project.
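A sketch of that configuration; the variable name follows the dbt metrics package docs of this era, but treat both the name and the expected value as something to verify there:

```yaml
# dbt_project.yml: point the metrics package at a custom calendar model
vars:
  dbt_metrics_calendar_model: custom_calendar  # a model holding your fiscal calendar
```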
Next question: is there any built-in method for caching datasets, so that you can trigger a refresh, for instance if you have data that's changing a lot and it's a large dataset? So the question was whether there are any mechanisms for caching data. There aren't right now, but if you have strong opinions on that and think it's a feature you'd really need for your organization, thank you for already letting us know, and please continue to let us know as we move forward. That's definitely on our radar as we iterate on query performance. If you have specific use cases, please flag those; we'd love to hear them, whether it's diffing based on changed data or something else, but we definitely hear you there, and it fits within our roadmap for sure.

A question from the community: why not just define these as models? Why define them as metrics at all? There's a great blog post about it (the author is standing right here, and someone will probably share it in the Slack channel), but the answer is that you want to retain the underlying flexibility of the lower-grain data. If you have to aggregate in a model to whatever the metric level is, you're implicitly making decisions for your stakeholders about how they want to consume the data. If you aggregate to a monthly level in your underlying model and it's statically represented that way, you're not giving users the ability to then go down to the week or the day, or slice it by any other combination of dimensions they may be interested in. Defining it as a metric gives them the flexibility to drill into the specific questions they have and consume the metric in the way that's actually relevant to them, because everybody thinks differently: someone on your finance team and someone on your marketing team may consume the same metric differently, based on the context they bring.
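To make the grain argument concrete, a hypothetical contrast (model and metric names are illustrative):

```sql
-- A statically aggregated model fixes the grain for every consumer:
--
--   select date_trunc('month', ordered_at) as date_month,
--          sum(order_total) as revenue
--   from {{ ref('fct_orders') }}
--   group by 1
--
-- Monthly is now the floor; nobody can drill down to week or day.

-- The same logic defined as a metric lets each consumer pick the grain:
select * from {{ metrics.calculate(metric('revenue'), grain='day') }}
```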
Any other questions? Go for it. So the question was: how does this look for companies that are currently using LookML? Right now there is no integration with LookML. We're very open to working with the Looker folks, but there are elements of overlapping functionality that we'd definitely need to figure out; that's something we're open to chatting with them about, and to hearing from the community whether that's a path they want us to go down. And I'd say if there's something people want to see in dbt metrics or dbt projects that they can't do, but that they can do in LookML, we'd love to hear about that as well.

The next question was about how we handle queries coming across the server connection from your clients. Definitely: if we detect that a query coming in doesn't contain dbt-SQL, we're not going to run it through the compiling process, so you won't see that increase in overhead, although we're continuing to iterate on that overhead so it'll be practically imperceptible anyway. The question, by the way, was: at large organizations, should we potentially break out connections to the semantic layer that use metrics from a different connection that goes straight to the data warehouse? The answer is: we're thinking about it.

Next question: how do we handle the proxy server being hosted in different regions? Currently we support customers who are hosted in the US, but as we expand dbt Cloud to other regions (for instance, we spun up the EMEA instance), we're going to extend support for the semantic layer to the EMEA instance so that data can stay in the EU. And if we have a critical mass of customers in, say, Canada, where the data needs to stay there, it would have to align with the dbt Cloud roadmap overall, but we would follow that pattern.

Another question, about which components live where. There's a great chart in the documentation that goes over what the individual components are and where they're associated, and we have one on our slides as well, but here's how you can break it down. dbt Core contains the metric definition itself: when you go in and define a metric node, that architecture and that API spec live inside dbt Core. You query that with dbt metrics, which is an open source dbt package that you install. Then there are dbt Server and the proxy server, which are two different components: the Server will be source-available, as we talked about (there's more information in the documentation), and the proxy server is proprietary and part of dbt Cloud. The metadata API is also part of dbt Cloud, and that's how you get access to metric definitions and any other metadata about your project that you'd otherwise find in things like the manifest.

Any other questions? Go for it. Ah, this is a variation on the models-versus-metrics question: what about metrics that aren't time series? Our opinion is that most metrics that are valuable to the business, which is the specific subset we really wanted to focus on, are, not always but almost always, time series metrics: you want to track them across the life of the business and break them down. If there are metrics that aren't, or if you're interested in seeing a time series aggregated at a non-time grain, that functionality does exist: you just need to define the time grain of all-time inside your metric definition, and that allows you to see the metric across all time for whatever slice of dimensions you've provided to the macro.

Alrighty, I don't see any other hands up, and I don't know if there are any other open questions from the community, but we can get to those in Slack. We'll give everyone some time back, and I hope you all have a wonderful day, both in here and online. Thank you all so much for attending. [Applause]