Hello, hello everyone, and welcome to the talk that we've all been waiting for: Hands-On with the dbt Semantic Layer, presented by our PM for the Semantic Layer, Cameron Afzal, and our developer experience advocate for the Semantic Layer and metrics, Callum McCann.
Today, we're going to be walking you through the Semantic Layer. Now, if you're here, you're probably one of two types of people. You're either someone who remembers the exact moment when they saw Drew open up that PR saying that dbt should know about metrics, or maybe you've heard about the semantic layer and you're like, that sounds cool, but what is a semantic layer?
And this is a talk for both of those types of people and everyone in between. We're going to walk through why the semantic layer exists, how you'll use it, and a little bit about where we're going from here. Now, remember to keep the conversation going in Slack: check the hands-on-semantic-layer channel, post your reactions, questions, anything like that, and after the talk, Cam and Callum will go through and respond to everyone. Thank you so much, and now on to Cam and Callum.
Hello everyone. Thank you for attending today. Oops. My name is Cameron Afzal. I'm the product manager for the dbt Semantic Layer.
Hey, everyone. I'm Callum McCann. I'm a senior developer experience advocate focusing on metrics.
So I want to talk a little bit about our path to dbt Labs. We came at this through two different paths: me through the lens of the analytics engineer, and Cam from the stakeholder side of the house, through the lens of product management. Before we were at dbt Labs, we actually worked together at another organization, trying to help organizations do analysis on their metrics. And when we first got there, it was the classic startup story.
You know, all of the logic for our metrics was self-contained in the SQL behind dashboards, and it was really hard to understand. I'd implemented dbt at a number of other organizations and decided, let's implement it here.
So as we went through and we were able to better understand what our customers were doing, we actually discovered that all of our most successful customers were using dbt. So for those customers, dbt helped them curate dimensions, manage grain, and do a lot of the data preparation that was required for them to do advanced analytics. We recognized that if a customer used dbt, they were probably successful with our product.
So we started building our product around dbt. We even built an integration with dbt Cloud so customers could import their metric metadata, and that was when we decided to join the dbt Labs team. So there are two key opportunities with the dbt Semantic Layer that I see. The first is to move up the stack.
By defining your metrics in a central place, just like defining your models in a central place, you don't have to go around redefining custom queries and redefining those metrics. You're able to move up the stack, which means you can spend your valuable time doing the things that matter most, the things you're best at, rather than redefining a query or a metric. The second main opportunity is to contribute to the knowledge loop.
By adding more information to your dbt project, you don't have to spread that information out across a bunch of different tools. You're able to centralize it in one place. We know that a lot of you all like doing that with your models.
We would like to enable you to do that with metrics as well. So now we're going to talk about how the dbt semantic layer extends the value of dbt. So this might look like a familiar question.
You can replace revenue with any metric that you care about, but I know we've all heard questions like, what was revenue in June? And it sounds really simple, right? It's just revenue.
But a lot of times people might be calculating that in different ways, with different filters, at different time grains. And the most pernicious thing is when people get different answers for this one metric. What was revenue in June?
We've seen this problem proliferate as people have new tools, new ways of looking at data, and more teams interacting with that data. And so in my world, the product world, we might have a question like, how many active users did we have last week? Callum and I encountered this at our last company, and people would come up with different answers. And how can we run a business like that?
We've heard similar things from our semantic layer beta customers. We've heard that it gets confusing when people are using different metric definitions; that if you have to go update them in different tools, you're bound to miss one; that discoverability is a challenge for stakeholders, not just when looking for metrics, but also for relevant data sets; and that folks want a controlled interface so they can answer more questions.
So that's why we built the semantic layer. When you ask questions like, what was revenue in June, everyone is pulling from the same revenue metric. Everyone is using that same time grain on a monthly basis.
Everyone's using the same filters, the same dimensions. It's all coming from a common source of truth. And as you can see at the top, different tools in your data stack are going to be integrated with the semantic layer. So now you can hear from those same beta customers what happened after they tried the semantic layer. They said that the semantic layer can help them a lot, that they want everyone in their organization to start leveraging dbt metrics, and that they can even see themselves moving
everything to the semantic layer from LookML. And from that same user who wanted a controlled interface: they see the semantic layer reducing the time to valuable data products and improving long-term governance. So note in that last quote, we're not just talking about native integrations with the semantic layer; you can also use the semantic layer as a developer platform using our proxies and our APIs. Now we'll talk about how we're going to extend the dbt workflow. We've talked at a high level about the semantic layer.
We're going to steadily zoom in and in. So the dbt Semantic Layer is the interface between your data and your analyses. It's a platform for compiling and accessing dbt assets in downstream tools. And note that I didn't just say metrics. That's a hint for later.
So the product architecture of the semantic layer is composed of three different components. The first is dbt metrics, which is how you define metrics in your dbt project. That's already out there; I'd imagine some of you have played around with it.
We're going to give you ways to leverage your metrics now. The second is the proxy server, which allows you to compile dbt SQL queries, execute those against your warehouse, and then return that data to your integrated tools. And finally, we have the metadata API, which allows you to pull metric definitions, and model metadata as well, into integrated tools. On the right, you can see how they all fit together.
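To make the proxy piece concrete, here's a rough sketch of the round trip. The metric name is hypothetical, and the exact macro signature varies by dbt_metrics package version, so treat this as illustrative rather than canonical.

```sql
-- 1) A connected tool sends dbt SQL to the proxy, Jinja intact:
select *
from {{ metrics.calculate(
    metric('revenue'),   -- hypothetical metric name
    grain='month'
) }}

-- 2) The proxy compiles it to plain warehouse SQL (conceptually, the
--    underlying model joined to a date spine and aggregated to month),
--    executes that on the warehouse, and returns the rows to the tool.
```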
So we see the semantic layer extending workflows in two different ways. The first is ubiquity: we want the semantic layer to work with all of the different tools that you use and meet you, and your stakeholders, where you are, extending that dbt DAG to downstream tools. The second is utility: when our partners are developing integrations, or when we're equipping you with APIs and connectors, we want them to be meaningful integrations that improve your workflows.
And here's the array of partner integrations that we have so far. So mind you, this is just in the past few months. This slide is going to be exploding with logos when we GA next year.
So the partners at the top import metric definitions and query their data: Thoughtspot, Hex, Mode, Deepnote, Atlan, Houseware, and Lightdash. The partners at the bottom import metric definitions or help you load them, and a lot of them are also currently developing additional integrations. If you don't see the logo of a tool you use on this slide, please reach out to your friendly vendor.
Talk to them about developing an integration. We want to build an interconnected ecosystem and be that common layer. We want you to be able to use your metrics and models across all the tools you like using, and we're open to working with those folks.
All right, let's walk through an example user flow for the semantic layer. First, a data engineer might use packages to load metrics from their CDP data sources. An analytics engineer could then jump into the Cloud IDE and define additional metrics in their dbt project. A business user is able to pick up from there, look at all the metrics that have been defined within their data catalog, and decide which metric is right for them. Then the business user can select that metric and its dimensions in a reporting tool.
They know what it means. They preview the value. They trust that it's verified.
And they can run their own analysis. So no more asking for ad hoc dashboards, or at least far fewer of them. Right?
Let's say the business user has additional questions. A data analyst is able to pick it up from there, do a deeper dive, maybe use some additional algorithms to understand the data more. And finally, a data scientist could monitor, say, the revenue metric for anomalies using an observability tool.
The semantic layer is powering all of these different use cases. So I want to talk a little bit about why this matters to everyone in the room, the analytics engineers, the people online, all of us as a community. Historically, dbt has been a tool for analytics engineers.
It's given us a speed to serve our organizations that we've never had before. But that speed has led to the proliferation of data assets. And that's introduced another issue: consistency across all of your consuming experiences.
And so dbt Labs decided to focus on the semantic layer to address this consistency. We want your organization to have a unified worldview, and in so doing we can address one of the fundamental data problems that I think everyone here has probably seen at some point in their life: the "why doesn't revenue over here match revenue over there?" That late-night email you get from your CFO, freaking everyone out and sending them on a spiral to figure it out. We saw this at our last organization, like Cam mentioned, where our internal tooling and our analytical tooling handled time in slightly different ways, which led to slightly different calculations of our weekly active users metric. I was then called in to figure out what was going on. It took me about a day to debug what the actual issue was. It took me months to rebuild the trust in those metrics.
And that's the most important part. Rebuilding that trust takes so long, and we don't want analytics engineers to have to worry about it. We want you all to be able to spend your time on higher-leverage work delivering value to the business, and let dbt handle consistency for you. So we've talked a lot about high-level stuff, and now I want to get into some demos of what this actually looks like.
So we're going to walk through four of the use cases that Cam mentioned earlier: defining metrics in the flow of an analytics engineer in dbt Cloud, discovering them in the catalog use case with Atlan, reporting in the BI use case with Mode, and analyzing with Hex.
So in this demo, you can see that I go into my packages.yml file in dbt Cloud and add the metrics package, which allows me to interact with my metrics. I then move over to a YAML file where I've actually defined a metric. I've given it a name; a label, as a human-readable version of that name; the model that it depends on; and any descriptive information I want to include. Then there's the calculation method, which is the aggregation we'll apply to the expression (the SQL expression); the timestamp that we'll join to a date spine; and any of the time grains that we want our users to be able to aggregate on. And finally, a list of dimensions, which is deliberately a subset of all the dimensions in the model: the ones we believe will deliver value to our stakeholders.
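Put together, that definition might look something like the sketch below. The package version range, model, columns, and dimension names are all hypothetical, and the exact schema keys (for example, `calculation_method` versus the older `type`) depend on your dbt version, so check the docs for your release.

```yaml
# packages.yml -- pull in the metrics package (version range illustrative):
#
# packages:
#   - package: dbt-labs/metrics
#     version: [">=0.3.0", "<0.4.0"]

# models/marts/metrics.yml -- a hypothetical revenue metric:
version: 2

metrics:
  - name: revenue                    # unique name, used when querying
    label: Revenue                   # human-readable version of the name
    model: ref('fct_orders')         # the model the metric depends on
    description: "Total revenue from completed orders"

    calculation_method: sum          # the aggregation...
    expression: order_total          # ...applied to this SQL expression

    timestamp: ordered_at            # column joined to the date spine
    time_grains: [day, week, month, year]

    dimensions:                      # a curated subset of the model's columns
      - customer_location
      - order_status

    filters:
      - field: status
        operator: '='
        value: "'completed'"
```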
Additionally, you can open up the lineage and see it flow all the way from your sources to your staging models, to your intermediate models, all the way up to your marts, with all of that feeding into the metric definition itself. If you're interested in interacting with this metric inside dbt Cloud, you can use one of those macros I mentioned earlier, passing it the revenue metric and a grain; when you hit Preview, it returns a data set at that selected grain with that metric value, and that keeps everything consistent across all the consuming tools. Additionally, you can see the compiled code in a friendly, dbtonic way, not as something that's a little confusing, and it follows all those best practices so you all can understand it.
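Concretely, that preview step would look something like this sketch. The metric name is hypothetical, and the macro signature varies by dbt_metrics package version.

```sql
-- In a scratch tab in the dbt Cloud IDE, hit Preview on this to get
-- one row per week with the metric's value, consistent everywhere.
select *
from {{ metrics.calculate(
    metric('revenue'),   -- hypothetical metric name
    grain='week'
) }}
```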
Now, there are a few prerequisites required to get this set up, but if you're interested today, you can go to Environments, navigate to your deployment environment, hit the Edit button, scroll down to Semantic Layer, flip that switch, and get the URL that you'll need to set this up in all of your tools. There's a lot more on the documentation site, and I want to give a huge shout-out to the Docs team, so if you have any questions, I'd navigate there. Now I'm going to hand it back over to Cam to talk through some of our partner integrations. Thank you, Callum.
All right, so let's go back to our example flow before. Let's say we've got a finance team, a data team working together to understand changes in revenue over time. So I'm a business user, right?
I want to understand what my analytics engineer has defined in the dbt project, and we've loaded our metrics into Atlan, which is a data catalog. So here I'm able to go into Atlan and see all of my different assets: raw tables, metrics, models, etc. I'm going to select my metrics here, and I can see a list of the different metrics from my imported dbt project. I can see which ones are verified and which are drafts, which I should use and which I shouldn't. And I'm going to open up my revenue metric here. This loads the different details, the metadata I've defined in the dbt project.
And there we just queried the semantic layer to pull a preview of the annual revenue. Pay special attention to this 4.4 million figure for 2019. We're going to see it again in a minute. So, Atlan also loads up the lineage of the metric, pulling from the DAG that Callum just showed, so I can trace back and see what tables and even what other metrics the metric is dependent on. I'm tracing that all the way back to the source. Atlan also takes care of column-level lineage, so I know how the different columns from the models flow into the dimensions in the metric.
This allows me to trace dependencies and fully understand what is being measured here. Atlan also loads up the different models from my dbt project, so I can look at their metadata and, by extension, understand the metrics better. All right, so I've found my revenue metric and I know what I want to analyze; I'm going to hop into my other tool here and dive a little deeper.
All right, so now I've got my revenue metric, and I'm going to jump into Mode. Mode allows me to create a report, visualize this metric in a custom way, and share that with my stakeholders. So I'm going to start by creating a report from a dbt metric.
I'm going to choose that same revenue metric here, which I recognize from the list, and create a bar plot to look at it over time. So what Mode is doing here is loading up the metric from the semantic layer and auto-generating a chart of the metric broken down on a daily basis. I'm going to break it down annually, though, because I want to look at it at a little higher level.
And let's check and just absolutely make sure it's the metric we want. There you can see the 4.4 million figure again. We're avoiding the time grain issue that Callum talked about that we've encountered in the past.
We are going to trust this value. I want to break it down by location and have a little more detail here to share with my stakeholders. Great. So now I can visualize the annual revenue by location. Location was actually one of the dimensions that the analytics engineer curated for me, so I don't get overwhelmed with the list of 50 different dimensions.
I'm able to add this chart to my report and then share that out with my stakeholders. But there's a bit of bad news: at this point, I do have more questions.
I want to dive a little bit deeper, so I'm going to hand it off to my data analyst to do a deeper dive in Hex. Awesome, so now I'm a data analyst supporting the finance team, and I'm jumping into Hex here, which also integrates with the semantic layer. I'm able to load up a dbt metric cell under the Transform menu, and I can see my list of metrics here for the semantic layer connection. Now I'll break it down by year, and let's absolutely triple-check that it's the same metric and we're getting the right value: you can see the same 4.4 million. But now I want to break it down at a little bit of a finer grain. So I'm actually going to add another metric as well.
I'm going to add the customers metric and look at it on a weekly basis. Great. So I get a little more information here.
I'm able to choose multiple metrics, the grain, and dimensions, as well as secondary calculations, roughly like the sketch below. I run that again, and I'm going to visualize this and keep diving deeper from there.
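Under the hood, a query along these lines is what a metric cell generates. This is a sketch: the metric names are hypothetical, and the `calculate` and `period_over_period` signatures follow the dbt_metrics package docs, so double-check them against your installed version.

```sql
select *
from {{ metrics.calculate(
    [metric('revenue'), metric('customers')],   -- multiple metrics at once
    grain='week',                                -- the time grain
    dimensions=['customer_location'],            -- curated dimensions
    secondary_calculations=[
        -- e.g. week-over-week change for each metric
        metrics.period_over_period(
            comparison_strategy='difference',
            interval=1
        )
    ]
) }}
```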
So Hex, like Mode, allows me to visualize the metric's changes over time. And here I'm going to break it down on a weekly basis, pull in the location, and there we go. We can even see some seasonality in the weekly changes.
But another special feature of a lot of the different integrations, including Hex's, is that I can write dbt SQL directly: I can pull from a model as well as a metric. So here I'm going to select a preview from my orders model, and I can use macros here as well (there's a small sketch of this below), so I think the sky is the limit for where y'all can take this; we're really excited to see what you play around with. So that's just a little bit of a preview of some of our partner integrations. There's a lot more to see, they're continuing to develop them, we're working really closely with our friends in the ecosystem, and we would love to hear from you all as well about what you would like to see, so we can keep working as a community.
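That model query is just ordinary dbt SQL; a minimal sketch, assuming the project has an `orders` model:

```sql
-- ref() resolves through the proxy exactly as it would in a model file,
-- so models, metrics, and macros can all be mixed in one analysis.
select *
from {{ ref('orders') }}
limit 10
```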
Alright, so now we're going to talk a little bit about the product roadmap. What we've shown is just the first step. This is a public preview; it isn't GA yet, right? It's a stepping stone to where we want to go, given feedback from you all.
So our roadmap has three main focus areas. The first is to enable efficient and comprehensive modeling of your metrics and your models. The second is to provide a reliable and accessible experience in dbt Cloud; we want to be central infrastructure in your data stack and earn that right. And the third is to nurture an ecosystem: make it easier to develop integrations with the semantic layer and support proxying queries to more data platforms.
And our ultimate goal is to power new and improved workflows for you all. So I want to talk a little bit about today's metrics leading into tomorrow's semantics. Right now, with the functionality that we've released this week, you can define metrics in your dbt project with things like dimensions, filters, and grains.
You can query them, with secondary calculations, in all of these integrations. But like Cam said, this is just the first step toward a cohesive semantic layer. We received a lot of really great feedback from our early adopters and our beta customers about wanting to encode more information into their dbt
project: things like relationships and column hierarchies. We've taken this feedback, and it's really helped us understand what the roadmap looks like up to GA and even beyond. So we have a few ideas that we want to present to you all here, but we also want to make sure that community voices are heard. One of those ideas is entities, which could be a higher-level abstraction on top of your models: they would allow you to define relationships between different models and define data types, they could have metrics associated with them, and they would allow metrics to traverse that graph and pull in dimensions from other entities.
We also are really interested in improving the developer experience with metrics. Things like versioning metrics beyond just the versioning that is self-contained within version control. So, you know, if you have to switch to version two of a metric, will there be a deprecation period?
Testing metrics, to ensure that the numbers stay consistent as you change a definition, or as you go along, to ensure that there aren't underlying data issues. And development environments: having your semantic layer understand the difference between working in a development environment and a production environment. And like I said, this is just the first step in a long roadmap of things that we want to work on.
But we really want to make sure that the community voices are heard because you all in this room and online have the ideas that are really going to up-level this. So if you have ideas of things that you really want to see that you think would provide value to the community, to your organization, please join some of the Slack channels that we'll mention at the end of this talk. Join and open issues in GitHub, or just personally send me DMs. I am very interested to hear your ideas, and we want to work with the community to make sure that this functionality and this product works for all of you.
Great. So Callum covered the roadmap that we're looking at for dbt Core and the package ecosystem, which is inextricably linked with the rest of the semantic layer. I'm going to talk a little bit about what the semantic layer roadmap looks like for dbt Cloud.
We are providing the semantic layer as highly available infrastructure. We're currently working towards 99.9% aggregate availability. We're going to keep iterating towards being highly available and be part of your mission-critical analytics flows.
We're also working a lot on query performance: hitting 100-millisecond P95 compilation overhead. That's a fancy way to say that when you run a query through the semantic layer, you're not going to notice a big change from how you usually run a query, right? We're also going to continue improving enterprise access management, making sure that we're meeting the needs of the largest and most complex organizations as we provide this service. We're going to continue to iterate on an accessible setup experience as well.
This is where we'd love to hear from you all as you get started with the semantic layer. That looks like easier environment setup and management, because you set up the semantic layer for a particular deployment environment, and better workflows in the IDE. So I know Nate is presenting about the IDE later today. We think that developing metrics in the IDE is a really unique prospect, and we want to make it easier to define that YAML and test those metrics out in one central place in dbt Cloud. The semantic layer also couldn't realize its full value without the ecosystem, so we're very grateful to be working with our integration partners.
We want to make it easier for you all to integrate with us, so we're going to build new and richer APIs, as well as other functionality that makes integrating with the semantic layer easier. For users, we're going to make it easier to manage your integrations in dbt Cloud: seeing a list of those integrations, understanding how they're used, audit logging, things along those lines, so you understand how folks downstream are using those integrations.
That could also look like things like auto-exposures. We currently have exposures, and the semantic layer is a way to really leverage those.
You can understand where a metric is being used and in which dashboards; the semantic layer is a way to do that programmatically, so you understand those dependencies and which metrics are most important to folks in your org. And finally, one of the biggest areas of investment for us is proxying queries to more data platforms.
We support proxying queries to Snowflake in this public preview release, and we're going to be building support for a lot more data platforms, such as Redshift, BigQuery, Databricks, and beyond. This is going to be a huge area of focus. Again, if you're interested, or if you have strong opinions about any of these areas of the roadmap, we would love to hear from you. We directly take your feedback into account as we continue to iterate on the roadmap, but we wanted to give you a preview of what's coming up next. All right, if you're anything like me, you're wondering: okay, I've got the high level, I understand what this layer is, can I get my hands on it?
And the good news is, you can. To get started, you'll want a dbt Cloud account; you'll need a Team or Enterprise plan to use the metadata API, but you can use the proxy server with a Developer account as well. Currently, we support the Snowflake data platform, and we're going to keep iterating on that, adding more support in the coming months. You're going to need a project on dbt Core 1.2
and the metrics package. Currently, there's no environment variable support, but it's coming within the next couple of weeks. So check out the user docs and our launch site in Slack for more information. As Callum noted, we've got an awesome docs team. I'd also like to stop and thank our engineering team for doing an incredible job.
This has been a very complicated product to put together, and we're really proud of what we've done, and are excited to continue to iterate with you all. All right, so next steps, please try the semantic layer, join these Slack channels. We will be there, like, all the time.
Engage with tech partners to invest in integrations. If you didn't see a logo on that list, we are open to working across the ecosystem, and please provide feedback on your experience and the direction we presented today. Thank you all so much for attending.
We now have time for Q&A. Thank you. So I think we got a little bit of time.
If there are any questions that anyone wants to flag on Slack or in person, we've got some time. Yeah, so for anyone online who didn't hear the question: it was about open source and the dbt Server component of the semantic layer, and whether it's available. My understanding is that dbt Server will be source-available within the next couple of weeks; the team is preparing the repo to share. It's under a BSL license, so non-commercial, but you'll be able to try it out and use that source. The next question was: how is it paid for? Yeah, so the question was, how does it work commercially?
How do we make money? I'll pass it off to Cam, because I'm the open source person. That's a fun question. OK, so I'll put it this way.
For the public preview, we want you all to try it out and give us feedback. Like we noted, Developer accounts can use the proxy server; Team and Enterprise accounts can also use the metadata API, and therefore a lot of those full-fledged integrations that we showed earlier.
But try it out, give us feedback, and as we study usage patterns, we'll figure out pricing from there. But during the public preview period, don't worry, try it out and give us feedback. So the question was, does the proxy execute the SQL, or does the proxy send the SQL to the data warehouse and the warehouse executes it? I can answer that one. So what the proxy does is it receives the dbt SQL from whatever the client is.
It compiles it down to the underlying warehouse SQL, sends that to the warehouse, retrieves the results, and transmits the data back to whatever the consuming experience is. So that would mean that results that come from the warehouse... Correct.
The results that come from the warehouse do go through the proxy server. Is there anyone on the Slack who might have flagged a question? I flagged one of the questions that's in the Slack channel; I'd love to make sure that we include everyone who's online. Do you have a question from the online channel?
Perfect. Go for it. Is there support for custom calendars? Yes, there is. You can find information about that in the dbt metrics package; you set it with a variable in your dbt project.
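Per the dbt_metrics package docs, that configuration looks roughly like this. The calendar model name here is hypothetical, and it's worth double-checking the variable name and value format against the README for your package version.

```yaml
# dbt_project.yml -- point the metrics package at a custom calendar model
vars:
  dbt_metrics_calendar_model: custom_fiscal_calendar   # hypothetical model
```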
Yeah, the next question was, are there any mechanisms for caching data? There aren't right now, but if you have strong opinions on that and you think it's a feature your organization would really need, thank you for already letting us know, and please continue to let us know as we move forward. That's definitely on our radar, I'd say, as we're iterating on query performance. So yeah, if you have any specific use cases, flag those; we'd love to hear them, whether it's something like diffing based on changed data. We definitely hear you there.
And that fits within our roadmap for sure. Question from the community? I got one from the Slack.
Yeah. Ah, yes. Why not just define these as models? Why define them as metrics at all?
There's a great blog post about it. The author is standing right here. You can go and find that.
Someone will probably share it in the Slack channel. But the answer to that is: you want to retain the underlying flexibility of the lower-grain data. If you have to aggregate in a model to whatever the metric level is, you're implicitly making decisions for your stakeholders about how they want to consume the data.
If you aggregate to a monthly level in your underlying model and it's statically represented that way, you're not giving users the ability to then go down to the week or the day, or slice it by any other combination of dimensions they may be interested in. Defining it as a metric gives them the flexibility to drill into the specific questions that they have and consume the metric in the way that is actually relevant to them, because everybody thinks differently. And so the way someone on your finance team and someone on your marketing team consume the same metric will differ based on the context that they bring.
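To make that concrete, here's a sketch of one metric consumed at different grains with no extra models; the names are hypothetical, and the signature varies by dbt_metrics package version.

```sql
-- Finance looks at monthly revenue by location...
select *
from {{ metrics.calculate(
    metric('revenue'),
    grain='month',
    dimensions=['customer_location']
) }}
-- ...while an analyst can re-run the same definition with grain='day'
-- (or any declared grain) without anyone building a new model.
```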
Any other questions? Go for it. Yeah, so the question was, how does this look for companies that are currently using LookML? Right now, there is no integration with LookML.
We're very open to working with the Looker folks, but there are areas of overlapping functionality that we definitely need to figure out. That's something we're open to chatting with them about, and to hearing from the community about whether that's a path they want us to go down. And I'd say, if there's something people want to see in dbt metrics or dbt projects that they can't do today but can do in LookML, we'd love to hear about that as well. Yeah. Yeah. Definitely.
So if we detect that an incoming query doesn't contain dbt SQL, we're not going to run it through the compilation process, so you won't see that increase in overhead; and we're continuing to iterate on that overhead anyway, so it'll be practically imperceptible. The question, by the way, was: at large organizations, should we potentially break out connections to the semantic layer that use metrics from a different connection that goes straight to the data warehouse? The answer is: we're thinking about it, and no.
Question? Yeah. So the question was, how do we handle the proxy server being hosted in different regions? Currently, we support customers who are hosted in the U.S., but as we expand dbt Cloud to other regions (for instance, we spun up the EMEA instance), we're going to be extending support for the semantic layer to the EMEA instance so that that data can stay in the EU.
And so if we have a critical mass of customers in Canada, for instance, where the data needs to stay there, it would have to align with the dbt Cloud roadmap overall. But we would follow that pattern. Question?
Yeah, definitely. So there's a great chart in the documentation that goes over what the individual components are and where they live. But the way you can break it down, and we have a chart on our slides as well, is that dbt Core contains the metric
definition itself. So, you know, when you go in and you define a metric node, that architecture and that API spec live inside of dbt Core. Then you query that with dbt metrics, which is an open source dbt package that you install.
Then there is the dbt Server and the dbt proxy server, which are two different components. dbt Server will be source-available, as we talked about; there's more information in the documentation.
And the proxy server is proprietary and part of dbt Cloud. And then the metadata API is also part of dbt Cloud, and that's how you get access to metric definitions and any other metadata about your project that you would otherwise find in things like the manifest. Any other questions?
Go for it. This one's a variation of the models-versus-metrics question: what about metrics that aren't time series? So our opinion is that most metrics that are valuable to the business, which is the specific subset we really wanted to focus on, are, not always, but almost always, time series metrics. You want to track them across the life of the business and break them down over time. If you have metrics that aren't that, or if you're interested in seeing a time series aggregated at a non-time grain, that functionality does exist. You just need to define the all_time grain inside of your metric definition, and that allows you to see the metric across all time for whatever slice of dimensions you've provided to the macro.
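In practice, that means declaring the grain in the metric's `time_grains` list and then querying against it; a sketch, with hypothetical names and the usual caveat that signatures vary by dbt_metrics package version:

```sql
-- Assumes the metric definition includes all_time, e.g.:
--   time_grains: [day, week, month, year, all_time]
select *
from {{ metrics.calculate(
    metric('revenue'),
    grain='all_time',                 -- one aggregate row per dimension slice
    dimensions=['customer_location']
) }}
```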
Alright, I don't see any other hands up, and I don't know if there are any other open questions from the community, but we can get to those in Slack. We'll give everyone some time back, and hope you all have a wonderful day, both in here and online. Thank you all so much for attending. Thank you.