Hey everyone, in this video I want to talk about Microsoft Fabric, a fantastic service that really provides a SaaS solution for our organization's core enterprise data needs. Now if we think about the picture today for most organizations, and why Microsoft Fabric is so powerful, often we have data in many different places. As a company I've got many different groups, so I've got group one, there's a group two, different areas of my organization. Well, each of those uses different tools, so they have various different engines: an analytics engine, a data warehouse, SQL, Spark, whatever that is. And then they each have their own little areas of data, and today they're doing different transformations because, hey, for this engine I need it in a different format. But then this group, well, they also want to use a portion of that data, so they have to go and do some kind of export/import or replication, but now I've got another copy of the data, and well, that data can get stale. So basically you end up with all these different silos of data throughout your organization; some of them are replicas, some of them are conversions because different engines need their own proprietary format, and there's just a whole bunch of work associated with that picture.

And then as a company, how do I know where all my data is? How do I avoid problems when certain copies of the data have got stale? How do I audit, how do I think about data governance, the discovery, the classification, the protection? Yes, it's possible, but it really is very, very painful, and I end up with a massive amount of effort managing all the different data warehouses, the data lakes, whatever that may be. Because there's been this big shift: we used to extract the data from somewhere, transform it and then load it into some store, but that assumed as a company I knew exactly what I wanted to do with the data, because I've already transformed it into some final format, I've abstracted stuff away, I've removed things; well, now I can only use it for that purpose. So there was a shift to extract, load, transform, where I would just store it in a data lake in its raw source format, and then I can transform it in lots of different ways in the future. So there was a big shift around that, and I end up with these massive numbers of silos and transformations and copies of the data, and as you would imagine, that is a huge problem.

And so this is where Microsoft Fabric comes in. We have this new solution, Microsoft Fabric, and the goal around Microsoft Fabric is that it really provides a complete set of solutions when I think about my enterprise data. It's made up of lots and lots of different workloads. Yes, it has lakehouse capabilities; remember, a lakehouse can have structured, semi-structured and unstructured data. Yes, it has a warehouse, that schema-based structured data. It has the Spark engine, so I can do my data engineering and my data science on top of that. There's Data Activator, so I can have an alerting system based on certain events: hey, revenue drops below this threshold, perform this action, send an email telling someone to panic. There's Real-Time Analytics, where I can have the streaming of data, logs and events. As you would expect, there's Data Factory; we need the pipelines, one of the most pro-code ways that I can actually get data in, perform actions on data, maybe move it around. And of course there's going to be Power BI integration, and something special I'll talk about later on that Power BI gets from being part of Microsoft Fabric. But let's build up the
different components and some of the things that make Fabric different from just, okay, it's a suite: they've taken these different solutions and bundled them together as Microsoft Fabric, but it's really just the same components. It's not; there's actually a massive amount of innovation and engineering that makes Microsoft Fabric Microsoft Fabric, and I want to dig into that.

The first bit, when you think about what makes Fabric Fabric: we mentioned the idea of the lakehouse and the warehouse, so we have to have somewhere to actually store the data. Now normally one of the problems is we have to get our data into somewhere, and you end up with all of those silos, you end up with all these transformations because different engines need different formats that they can read and write, and that's the really painful part: I end up with lots of different storage accounts based on those different copies of the data. So at the base of Microsoft Fabric (I'm going to give myself lots and lots of space here) we have the concept of OneLake. You can think of OneLake as a software-as-a-service version of a data lake: if you think of OneDrive that we have on our machines, what OneDrive does for our personal data, OneLake is going to do for our enterprise data. It becomes that OneDrive for the organization, removing the need for me as a company to end up with some very complex architecture of data lakes and warehouses; it makes it far more intuitive, and this OneLake is very much at the foundation of Microsoft Fabric for your tenant.

So what's going to happen here? You have your Entra ID tenant (that used to be called Azure Active Directory), so your entire organization, all your accounts, your users, service principals, computers, policies, they live under an Entra ID tenant. If I think about my organization having its Entra tenant, it is a one-to-one mapping, i.e. there is only one OneLake per tenant. There's nothing I have to do to get this; it's there for my tenant. Anything I create within Microsoft Fabric, if my account is part of the tenant, will be part of that single OneLake. So I can think about the Entra tenant as the boundary for my OneLake: any item (which we're going to talk about) that I create within Microsoft Fabric will live within my organization's OneLake. As soon as I start using Fabric I have my OneLake automatically; there are no silos anymore, there's no button I have to click as the first administrator to create the OneLake, it just exists the first time I try to do anything.

Now that may fill you with a certain amount of panic initially; it's like, wait, I need to have some separation between different groups and permissioning. It doesn't mean that, as a different group, I can't set access controls to control who can see the data; I absolutely still can, I still have those abilities to restrict, but I'm part of a single namespace. So think about, as an organization, my requirement to perform governance and audit: I now have that ability on all of my data, no matter who created it; it's going to be part of this organization's namespace, and it's all going to happen automatically for me.

Now because it lives within the tenant, there are actually some tenant-level settings. If we jump over really quickly, here I'm just looking at Microsoft Fabric, but you would see up here I have my little gear icon, and one of the things I can do is, down at
the bottom, open governance and insights; I have an admin portal, and I've already opened that up. So this is the admin portal, and these are all the tenant settings we have available, and there are lots of different things I can turn on and enable for the organization. I've enabled some of them regarding some of the mirroring settings, but you have a huge amount of control over what you want to enable for your tenant. So I have all of these different governance settings, there are advanced networking settings I can get to down here, some of the audit and usage, but I still have a lot of control, as a company, over what I want to light up within my Microsoft Fabric environment. So think of the OneLake as just that core, Fabric-level namespace.

Now I want to actually be able to go and create stuff, and what I have to do, therefore, is have some capacity. So we have Fabric capacities that some particular team or project can then leverage, and these are typically an Azure resource. What I would actually do is go and create a Fabric capacity within my project's Azure subscription: if I want to control the billing, if I want some area of compute capacity that I want to be able to leverage within my group, I can create my own capacity. So if I jump back over here, and this time look in the Azure portal, we can see we have this concept of Microsoft Fabric, and what I'm creating here is a Fabric capacity. Now my Fabric capacity, you can see, is of a certain scale, a certain size; you can see I have all these different sizes available to me, all these different capacity units. You also create this in a specific region, so that's going to control where the compute is, but it's also going to control where the data that gets created is associated from a regional perspective, and I can create as many different Fabric capacities as I want. There are even trials, so I could go and sign up for a trial; I get it for two months, and I think I get a pretty big SKU, if I remember correctly I actually think I get the F64 SKU for two months, so it's a massive amount of compute I can go and play with. But I can therefore go and create one of those capacities, and here's the pricing page that goes through the details of all those different capacities. So I pick a certain SKU, and it's an amount of capacity I can use per second, and obviously there are also aspects when I actually go and store data: there are storage prices as well, based on the gigabytes I'm storing per month, and I can also do BCDR; by default the storage is zone redundant, but I can do geo-redundant as well.

Now the thing that's really important to understand about this: we had our OneLake, and then within my organization I create these different capacities. So moving up a layer from there, I would go and create as many as I want; maybe a certain business unit goes and creates capacity one, a different business unit, hey, they go and create their capacity two, and remember that capacity is of a certain SKU, so that maps to the amount of compute scale it can perform. Now one of the things you may have noticed on that screen is I can actually pause and resume this; it's billing me at a per-second level, I think with a minimum of a minute. But again, if we jump back and look at that, notice one of the things I can do is pause it. Mine is currently active because I actually need to perform actions on the items that live under this capacity, but if I know I've got quiet times I can absolutely go and pause it, and I could write automations, so I could have some Logic App or Azure Function that at certain times goes and pauses and resumes that capacity.
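As a quick aside, if you do want to script that pause and resume rather than clicking in the portal, a minimal sketch of the idea might look like the following. I'm assuming the suspend and resume actions that exist on the Microsoft.Fabric/capacities ARM resource; the subscription, resource group, capacity name and api-version are placeholders you'd swap in, so treat this as an illustration of the automation pattern rather than a finished function.

```python
# Minimal sketch of pause/resume automation for a Fabric capacity, the kind of
# thing you'd put behind a timer-triggered Azure Function or Logic App.
import requests
from azure.identity import DefaultAzureCredential

SUBSCRIPTION = "<subscription-id>"        # placeholders to fill in
RESOURCE_GROUP = "<resource-group>"
CAPACITY = "<fabric-capacity-name>"
API_VERSION = "2023-11-01"                # assumed; check the current Microsoft.Fabric api-version


def set_capacity_state(action: str) -> None:
    """action is 'suspend' (pause) or 'resume'."""
    token = DefaultAzureCredential().get_token("https://management.azure.com/.default").token
    url = (
        f"https://management.azure.com/subscriptions/{SUBSCRIPTION}"
        f"/resourceGroups/{RESOURCE_GROUP}/providers/Microsoft.Fabric"
        f"/capacities/{CAPACITY}/{action}?api-version={API_VERSION}"
    )
    resp = requests.post(url, headers={"Authorization": f"Bearer {token}"})
    resp.raise_for_status()


# e.g. one schedule calls suspend in the evening, another calls resume in the morning
# set_capacity_state("suspend")
# set_capacity_state("resume")
```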
So yes, there's a cost associated with it, but I can pause it, and I can change that size very easily up and down as my needs change throughout the day, week, month, whatever that may be. And it's a virtual amount of capacity; this is not going and creating a VM, for example, or a container with that many vCores that equates in some way to that capacity. It's a virtual bucket of capacity my workloads can use; actual physical capacity is not assigned until I try and run some job, so it's all serverless. What I'm buying is a virtual bucket of capacity, and then all of the different workloads I can run in Fabric go and draw against that amount of capacity in that bucket. So the jobs, when they execute, actually go and get assigned physical capacity, which equates to a certain amount of that virtual currency that gets billed against that bucket. Behind the scenes I might get infrastructure spinning up for SQL, separate infrastructure spun up for Spark, separate infrastructure spun up for Analysis Services and Power BI; they all then just tell that central capacity tracking, hey, this is the amount we're using, it maps that to that virtual amount, and as long as I'm staying within that level, everything is great. So I can go and use it for many different types of workload.

But it's better even than that. If I consider over time, let's take a 24-hour period just as an example, and let's say what I've purchased is this amount, that's my SKU, that's the number of Fabric capacity units I have available to me, and then there's the actual amount of work I'm doing. Maybe I'm hovering over here, but then I have some peak amount of work, so it shoots up; it wants to use way more than I've actually paid for. Well, I can burst, and then it does something called smoothing, so it lets you go above the amount of provisioned virtual capacity you have. I can burst and use a lot more for an amount of time, and then you kind of owe it some capacity, so it will smooth that into your quieter times. Now the amount I can burst varies: for my long-running, background, scheduled jobs it actually lets me smooth over 24 hours, so I could run at way more than what I've actually provisioned and then pay it back; if it's a shorter, more interactive job, it's smoothed over a variable amount of time based on what it needs. The amount I can burst is actually huge: if I look at the documentation based on the SKU, for the really small one (and I'm running the cheapest, smallest one because I'm cheap) I can actually burst to 32 times. Now realize, if I'm doing small experimentation the F2 is probably fine, and most of the things I'm running maybe run for 10 seconds, but for that 10 seconds I could use 32 times that actual amount of capacity, so I could actually use the same as what an F64 is capable of doing, and then it will just need to pay it back over the next five or six minutes, whatever that time is, whatever I've bursted beyond. Now the bigger you get, the scaling factor starts to reduce, but even for the biggest ones it's still 12x, so if I have those scheduled, long-running jobs for maybe a couple of hours, hey, I could run for maybe one hour with 12 times the amount of actual capacity and then just pay it back over that next 24-hour period while other things are still going on.
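To make that bursting and smoothing a bit more concrete, here's some back-of-the-envelope arithmetic using the F2 numbers I just mentioned. The exact burst factors and smoothing windows vary by SKU and job type, so this is purely illustrative, not the actual billing algorithm.

```python
# Rough illustration of burst + smoothing using the F2 example above.
sku_cu = 2                    # an F2 gives 2 capacity units (CUs) per second
burst_factor = 32             # the smallest SKUs can burst to roughly 32x
job_seconds = 10              # a short interactive job

cu_seconds_used = sku_cu * burst_factor * job_seconds   # 640 CU-seconds consumed
cu_seconds_earned = sku_cu * job_seconds                # 20 CU-seconds accrued while it ran

overage = cu_seconds_used - cu_seconds_earned           # 620 CU-seconds owed
payback_seconds = overage / sku_cu                      # ~310 s, roughly five minutes
print(cu_seconds_used, overage, payback_seconds)
```

That lines up with what I said: a ten-second job at F64-like power, paid back over the next five or six minutes of quieter time.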
Now if I start to go beyond those limits it will start to throttle me, and eventually, if I keep pushing it, it will start rejecting the jobs; there are finite numbers, so I may need to go and change the scale of that particular capacity. But it's super flexible; it's really trying to help you optimize the spend while still being able to do some pretty fantastic things well above that if you actually need to.

Okay, so where we are right now: we have this tenant-level OneLake, so everything to do with Fabric within our tenant will be part of that namespace, and it's easy to find and apply governance controls. Then I need to create capacities; it's all serverless for the actual workloads, but obviously I still have to pay, so I provision a virtual bucket of capacity, and all the different workloads associated with that capacity draw from it and get these nice burst capabilities. But how do I then actually create the workloads? As a team, as a project, I'm going to have many different types of item: maybe it's a lakehouse, a warehouse, some notebooks, maybe pipelines coming in, semantic models for my Power BI, and I want to be able to organize those and maybe set permissions on them. So our next layer is workspaces, and I associate my workspace with a capacity. So again, under here maybe I have my workspace one, I can have another workspace under this capacity, I have a workspace under this other capacity, you get the idea; I associate a workspace with a specific capacity. In many ways, think of it like the Azure world, where I create a resource group to group like things together and I can apply certain policies and permissions on the resource group; that's really what a workspace is doing for me here within Microsoft Fabric.

And once again, let's go and have a look at that. Let's just close those down. So this is Microsoft Fabric, this is kind of the main introductory screen, if I just jump to app.fabric.microsoft.com;
it's showing me those key types of workload that I would have as part of this, which we're going to talk about in more detail. I have my little menu down here in the bottom corner that lets me jump to different tools but also different experiences: my data engineering, data science, a data warehouse, whatever that may be. If I just jump over to my data warehouse, for example, one of the things I can see here are all my different workspaces. Now if I select that, we can see I've created one already, which is what we're going to be using for most of this little demo, but I can also create a new workspace, and the key thing to bear in mind is I give it a name, and the important part is I have to assign it to a capacity. So here, for example, notice there can be a trial option (I don't have the trial active in my tenant, but you might see a trial option); I'm going to use a Fabric capacity, and here you pick from the capacities you have available, that you have permissions to. So hey, I want to create a new workspace, I'm using Fabric capacity, what is the semantic model format I want to use, and then the specific capacity. Multiple workspaces can use the same capacity; it's not one-to-one, and it's really just that organizational construct.

And then if I jump over here, this is that sample workspace I've got open. Even within the workspace itself I can see I have workspace settings; there are different things I can do, so for example my Spark settings, where I have various controls (we're going to come back and look at this in more detail), there's some network security, potentially, depending on the SKU you're leveraging, I can see some of the details, I could update the little icon for it. There are things I can do here, I can manage access, so this is where, within that workspace, I could grant different people or groups permissions. But these really are just organizational structures. If I just go and start browsing, for example, and I can see that basic workspace over here, if I look at my OneLake data hub I can go and select a certain workspace, and then looking down here I can go and see all the resources under that particular workspace; it's really just organizing my things. And I'll give you a sneak peek at something: there's even a desktop application that I can use for OneLake, and when I look at this it organizes the folder structure by my workspace. Now I would see any workspace I have permissions to (I only have one), and if I open that up I see the items I have within my workspace. So it really is just an organizational structure, and that's the key goal around this; it gives me a lot of different flexibility. I create these workspaces, and then based on the workspace I'm going to want to do certain things. Again, within it I might see lakehouses, warehouses, all these different things, but they're just items that get organized within a particular workspace; that's the goal for this.
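And while I've been doing all of this in the portal, workspaces can be created programmatically too. A rough sketch of creating a workspace and attaching it to a capacity with the Fabric REST API might look like this; I'm assuming the api.fabric.microsoft.com/v1/workspaces create endpoint, and the capacity GUID is a placeholder you'd grab from the admin portal or the capacities API.

```python
# Sketch: create a Fabric workspace and back it with a specific capacity.
import requests
from azure.identity import InteractiveBrowserCredential

token = InteractiveBrowserCredential().get_token(
    "https://api.fabric.microsoft.com/.default"
).token

resp = requests.post(
    "https://api.fabric.microsoft.com/v1/workspaces",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "displayName": "sample-workspace",
        "capacityId": "<capacity-guid>",   # which Fabric capacity backs this workspace
    },
)
resp.raise_for_status()
print(resp.json())   # includes the new workspace id, which later item-level calls reference
```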
Okay, great, we're nearly ready to actually start creating some stuff, which is the goal of what we want to do. We talked already about the fact that this is at the tenant level, we have this OneLake, and before, we talked about the idea that we have all these different engines that are used by many different types of workload, and we already saw they all still exist as part of Fabric; on that main page we saw the different tools, we saw the different experiences. So under Fabric I still have all of those same things. From an engine perspective I still have (trying to line this up) Spark, and of course my Spark could get used by Data Factory and my notebooks; I have my T-SQL, so that's going to be used for the data warehouse; I have KQL, and I may have a KQL queryset; I have Analysis Services, and obviously the huge one there is going to be Power BI. So all these different tools talk to different engines, and we talked before about the challenge we had: they each used their own specific format, they each worked with a certain type of data. So if I had some data in a data warehouse and then I wanted to run some Spark analysis on it via notebooks, I'd have to start replicating and messing around with the data, which was painful. That's a very negative thing; I never really want to have copies of data, it's hard to manage and govern, there are staleness issues, so that was always a huge pain point.

So what's happening here, when I look at everything we're going to create under OneLake with regards to storage of any kind of data we want to use with those engines: if I think of the data layer, under the covers it's still using ADLS Gen2 accounts. It's still creating instances of those data lake storage accounts that can really store anything at massive scale, but it's completely abstracted away from you; you don't need to worry about that, it's going to create them as it needs to and take care of the underlying physical storage. And then everything it writes, well, it's going to store it as Parquet files, and specifically it's storing it as Delta Parquet. So think of Parquet as the actual storage of the data; then what you have on top of that is the metadata, and it's using that Delta log format. So on top of the Parquet it's using the Delta log, and Delta Parquet storage is going to be used for everything.

What this means, and what has happened here, is a huge amount of the engineering effort that went into Microsoft Fabric. Yes, they created the OneLake, this new namespace; they then invested in this open standard, the Delta Parquet format, that everything from these engines is going to store in. And they updated all of these engines (the Spark, the T-SQL, the KQL, the Analysis Services) to speak the Delta Parquet format. So if I have these Delta Parquet files, and that's how everything gets stored within that OneLake, all of my structured data is this Delta Parquet; it doesn't matter who wrote the data, everyone can now read it. From this Delta Parquet format, T-SQL can read it, KQL can read it, Analysis Services can read it, Spark can read it. So I could have the scenario: a data warehouse job wrote some data, I can read it from Spark; Spark wrote some data, I can read it from Analysis Services, I can read it from T-SQL, I can lay Power BI over everything. They've removed those silos and those proprietary formats I had before that led to this ugly replication and transformation and copying. Everything has been updated; the engineering went in to make everything write to the Delta Parquet format, and everything can read the Delta Parquet format, so there's no more proprietary format that inhibits the ability of different engines, and therefore the tools on top of those, to work with any of the data within our organization that's within our OneLake. They made all of that work speak Delta Parquet, so it's one format, non-proprietary, for all of the different tools; I don't have to worry about staleness or copies, I have complete flexibility.
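To make that "write once, read from any engine" idea concrete, here's a minimal notebook sketch: Spark writes a Delta table into the lakehouse Tables area, and that same table is then what the SQL analytics endpoint, Power BI and so on can query without any export. The spark session is the one a Fabric notebook provides, and the table and column names are just made up for the example.

```python
# Build a tiny DataFrame and write it as a Delta table in the lakehouse.
df = spark.createDataFrame(
    [(1, "Contoso", 2500.0), (2, "Fabrikam", 1800.0)],
    ["customer_id", "customer_name", "revenue"],
)

# saveAsTable in a Fabric lakehouse lands the data as Delta Parquet under Tables.
df.write.format("delta").mode("overwrite").saveAsTable("sales_summary")

# Any Spark session can read it straight back; the other engines see the same table.
spark.sql("SELECT customer_name, revenue FROM sales_summary").show()
```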
And Parquet is that columnar format with very good compression for column-based data, so that's the data storage. We can see this: if I jump back over to my view here for a second, and I look at my lakehouse, and I look at my Tables, which is the structured area of my lakehouse, and I look at one of my sets of data, my dimension customer, we see we have the underlying Parquet files that hold the actual data, and then I can see the Delta log, which is that metadata. So that's how it's storing it under the covers.

The fact that we have this Delta-based storage actually does some really interesting things. The Parquet gives great, efficient compression, good for actually storing the stuff, and then the metadata: the metadata is the properties of the data, the changes to the data. It helps provide that ACID compliance that's so important to many of the operations we perform; we need the atomicity, the consistency, the isolation, the durability of the transactions within. Now that open format, that Delta log, has information about the table itself but also the history of all of the transactions and changes to the table. So what this means is I can time travel: I could actually go back and look at the data at a different period in time without having to store it again. I can just use the Delta log with the Parquet to say, hey, what did the data look like two days ago? I don't have to change any of the actual data, I can just go and do that.

And I'll say this again, but if you take one thing away, this is massive in terms of what Microsoft Fabric brings to the table. Yes, it's this fantastic namespace that's organization-wide to make it easy to discover, govern and audit, but the fact that they've made the engineering effort so that everything now writes and reads this open Delta Parquet format removes all of those barriers we had before between the different engines, and suddenly all the different data I have in my organization can be leveraged from all the different types of tools based on the types of activity I actually need to do.

There is something else they did as well: there's the VertiPaq engine, which can do V-Ordering. V-Order is a write-time optimization for that Parquet file format, and what that means is, when you do this V-Ordering, you get amazing performance when you're performing reads from these different engines. So if I'm using Power BI, SQL or Spark, they leverage that VertiScan technology, and that V-Ordering gives you something like in-memory access times. So not only is it using this open Delta Parquet format, they're applying that V-Ordering to all of the writes, which means the performance is going to be phenomenal when I start actually doing things against it from my data warehouse, from my Analysis Services. That's a really, really big deal, so I should probably call that out: yes, it's Delta Parquet, but it's also this V-Ordering (I'll give it a smiley face) for super fast performance on our files.
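Going back to that time-travel point for a second: it isn't some exotic feature you have to build, it falls straight out of the Delta log. A small sketch from a notebook, using standard Delta Lake syntax against the illustrative table from earlier (the version number and timestamp are just examples):

```python
# Current state of the table.
current = spark.read.table("sales_summary")

# The table as it looked at an earlier version in the Delta log...
v0 = spark.sql("SELECT * FROM sales_summary VERSION AS OF 0")

# ...or as of a point in time, with no second copy of the data stored anywhere.
two_days_ago = spark.sql(
    "SELECT * FROM sales_summary TIMESTAMP AS OF '2024-01-01 00:00:00'"
)

# The transaction history itself is queryable too.
spark.sql("DESCRIBE HISTORY sales_summary").show(truncate=False)
```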
Okay, so now we're starting to get to some of the really interesting things about what makes Microsoft Fabric and the OneLake so special, and it does actually go a step further. If you think about it, the Delta log is the metadata providing information on top of the actual storage, which is Parquet; what that means is, technically, other metadata formats could be supported to make it even easier for other things to work with the data in the OneLake, and the one they've started with, and are introducing, is support for Iceberg. So additionally they have Iceberg metadata support; it's just another metadata format, and where Iceberg is very much used is Snowflake: Snowflake speaks Iceberg. What this now means is Snowflake can natively use OneLake for the storage of its data, and what that also means is I can now leverage the data that Snowflake has written from all of my other workloads, and the other data I have can be leveraged by Snowflake, because Microsoft have used a technology to do conversions between them. There's this XTable: we have two different metadata formats, and it's using XTable to do those conversions. That's an Apache project, Apache XTable, and it does a translation between the metadata formats, between the Delta log and that Iceberg, which now gives me complete flexibility: hey, Snowflake is just another workload that can go and interact with the OneLake, and it's not a silo. That's the key point, they don't want to introduce silos; I don't want a situation where, hey, Snowflake could write to it but nothing else could read it. Sure, Snowflake speaks Iceberg, but the XTable that's been implemented does translations between the different metadata formats: these engines all support the Delta log, and it will do conversions, so other stuff, Snowflake wants Iceberg, it can take the data these engines have written and make it available to Snowflake. So now I get really good compatibility between all my different workloads.

I have a massive amount of data now, as we're going to see as we go on, in my OneLake in this Delta Parquet format, and we saw under the covers, remember, we're using ADLS Gen2 accounts. Well, ADLS Gen2 has the ADLS Gen2 API; that's a very well understood API that we use today to speak to ADLS Gen2 accounts. What OneLake does is give us this OneLake namespace, so now, if I speak the ADLS Gen2 API, which is a very widely adopted API, I just change the target I'm talking to and suddenly I can communicate with the data: I would see the Parquet files, I would see the Delta log, I would see the unstructured data that maybe I'm storing in a lakehouse I'm going to create in there. It makes it super flexible and available: any app can just go and speak the ADLS Gen2 API and talk to the data in the OneLake. Your existing tools: if I was using Azure Storage Explorer, well, that would just work; Databricks and HDInsight speak that as well; it becomes a data platform that is not just for Microsoft Fabric. Azure Data Studio, all of these things can talk to the ADLS Gen2 REST API and the SDKs; it's not some special OneLake API, so it makes it very easy for workloads to integrate. Anything can use those open integrations to use the OneLake; that's the whole point, it's not just for the engines that are part of Microsoft Fabric. Nearly everything speaks that API, it's an open integration that other vendors can go and add new capabilities on top of, and again we have those additional metadata formats so things like Snowflake can hook in and talk as well.
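Because it really is just the ADLS Gen2 surface, existing code barely changes. Here's a minimal sketch of listing a lakehouse's files from outside Fabric with the standard azure-storage-file-datalake SDK, assuming the onelake.dfs.fabric.microsoft.com endpoint where the filesystem is the workspace and paths follow the item/Files or item/Tables convention; the workspace and lakehouse names are placeholders.

```python
# Talk to OneLake with the same SDK you'd use for any ADLS Gen2 account.
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

service = DataLakeServiceClient(
    account_url="https://onelake.dfs.fabric.microsoft.com",
    credential=DefaultAzureCredential(),
)

fs = service.get_file_system_client("sample-workspace")        # workspace acts as the filesystem
for path in fs.get_paths(path="lakehouse1.Lakehouse/Files"):   # unstructured area of the lakehouse
    print(path.name)
```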
Now, additionally, what this is hopefully making very clear is that I don't have multiple copies of my data. I have a single point of truth; every workload can read the data that has been written; there are no separate transformations or imports based on the engine or the workload I want; they're all speaking that Delta Parquet format. There's no separate "hey, I've got one analysis over here that needs a different type of engine over there, let's do some ETL or transformations or Spark or Databricks"; none of that happens.

One question that does come up: okay, so I have this OneLake namespace, I'm using the ADLS Gen2 API, so I'm talking network traffic; what are the abilities to restrict that today? Private Link is a big one, for example; Private Link creates an IP address in your virtual network that represents access to a single resource. Today, all I have is at the tenant level: at the tenant level, if I look at my tenant settings, I think there's advanced networking somewhere (you can see there are a lot of different settings here), and under advanced networking I can turn on Private Link and then block public internet access. But that's at the tenant, so it will impact every single workload; that's pretty broad, and you'd have a lot of considerations before you just went and turned that on. They are working on workspace level; I think that is on their roadmap. Now at a workspace level, if I can find my workspace tab and just go to it, so at the workspace level, if I look at my workspace settings (let me just go back there, to the sample workspace), you do see network security; however, this is managed private endpoints, and what this is about today is letting my workloads running inside Microsoft Fabric go and talk to other resources I have in Azure. So that's more the other way around: that would be, hey, I've got some Spark job running in this workspace and it wants to go and talk to an ADLS Gen2 storage account that has a private endpoint, so it's letting me connect to something to be able to do processing on it. It is not a separate private endpoint to utilize the services of Fabric; that is not there today, and again I think it's on the roadmap. So today, at the tenant level, I could turn on Private Link and block public access, but you'd have to really be careful about all the ways you're using the Microsoft Fabric services to ensure that didn't go and break something pretty significant.

Let's talk about items. I've mentioned items, and I've said the words lakehouse and warehouse and notebooks a few times; let's actually dive in to see what those things are. Remember, what we've created at this point is a workspace; the workspace was associated with a capacity, which gives us that serverless compute for the things I actually want to do; the workspace lives in a region, and that governs where the data we create is actually stored. But now I want to actually be able to store things, so I need an item into which I can store things, and very often the first thing we're ever going to create is a lakehouse. Now that's not a rule, let's not say it has to be the first thing you create, but very commonly we're going to go and create a lakehouse. So when I think of a lakehouse (lakehouse one), the whole point of a lakehouse is it's a container for some type of artifact, and it supports both structured and unstructured data. For the unstructured, what we'll actually see off of this is a Files node, and then I can create folders and see a structure and items under that; for the structured, I would see a Tables node, and everything in the tables (well, it's actually not strictly true, but most of what we do in tables) we're going to store in that Delta Parquet format. So the lakehouse is that first item that gives us the unstructured and the structured. Now as I start to create things, remember I showed you this already, that OneLake namespace provides access to all of these things, and I'll see everything that I, as the person looking at it,
have access to. And remember, it will be broken down by the workspace and then the different items under it, so I'll see a lakehouse one, maybe I'll see a data warehouse one, maybe there's a warehouse two, and then maybe I've got access to workspace two and under there I see another lakehouse, and so on. That's the namespace we see, so all of the things I start to create I will see under that namespace. If we go back and look again, I'm looking at a particular warehouse, sorry, workspace, and I can see the different items I have here: you can see I've got a lakehouse, I've got a warehouse, and remember, when I looked at the file system, what I actually saw is I only have access to the one workspace, so that's the first structure, and then when I go into that I see the things that have storage. I'm not seeing my notebooks, but hey, I can see my lakehouse, I can see my data warehouse. If I look at my lakehouse I see Tables, which is the structured, and then Files, which is anything: in Files, well, I put in some pictures, I put in a CSV file, which is what I used to import and populate some structured tables in my lakehouse, and I have some other data I used to create my warehouse. And remember again, if we actually go and look at Tables, which is the structured part, all I'll see in there are the Parquet files, and then we have those Delta log transactions, which are up there. So my lakehouse is very commonly going to be the first thing we create, because I can have that collection of both unstructured and structured, and then on the lakehouse I can run various different workloads: I can run Spark, very commonly I'm going to run notebooks on it to go and actually do population, maybe a Data Factory pipeline to go and populate those things, and I'm going to see everything I have access to.

If I look within here, so here I'm in my workspace, if I just go and select my lakehouse I'll see that same structure: I can see I have my various tables and I have my unstructured data. Now just pay attention to the icons for a second: if I select my dimension customer, this is obviously structured data, and if I look at the icon, the little triangle tells me it's stored in Delta Parquet. Technically Spark could write non-Delta Parquet; it would still show in Tables, but it won't have the little Delta Parquet icon. So I can still run Spark and force a different format, I could force it to write CSV, I could force something else; typically we're not going to do that. Notice on this one it's also got a little link, so that's a shortcut, and we're going to talk a lot more about that. But here I can go and see all of my Delta Parquet-based data and do various things with it.

The other really cool thing, and this again speaks to compatibility (and actually, let me just show you the files, so I can see all the files there as well), is when I look at this at the top I have this drop-down; it also gives me a SQL analytics endpoint. So now I could switch over to this; it's only going to show me tables, obviously SQL can't go and look at the unstructured, but I see exactly the same data. I could now go and access that data without having to understand or use Spark; I could just use T-SQL. Now this is a read-only mode, a read-only mode over the lakehouse Delta tables: I could create functions, views, I could have SQL object-level security, but I can't write to it. This is just the ability to run analysis over the data in the structured tables part of my lakehouse, and again, I could go and switch back and it's the same data; it's just giving me that flexible capability to leverage those things.
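And because that SQL analytics endpoint is just a T-SQL endpoint, I can hit it from outside Fabric too. Here's a small sketch using pyodbc, assuming ODBC Driver 18 and Entra ID interactive authentication; the server name is the connection string you copy from the lakehouse's SQL analytics endpoint settings, shown here as a placeholder.

```python
# Read-only T-SQL against the lakehouse SQL analytics endpoint from outside Fabric.
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=<sql-analytics-endpoint>.datawarehouse.fabric.microsoft.com;"  # placeholder from the portal
    "Database=lakehouse1;"
    "Authentication=ActiveDirectoryInteractive;"
    "Encrypt=yes;"
)

for row in conn.execute("SELECT TOP 5 * FROM dbo.dimension_customer"):
    print(row)
```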
So, great, I now have the ability to start storing stuff, but realize for many organizations I have data in other places, and I said that word shortcut and I showed that little icon. Now the shortcut is a feature of the lakehouse, and the whole point is I have data somewhere else that I want to bring into the visibility of the workloads that talk to OneLake without actually moving the data. So maybe I have data in AWS or Google Cloud or other ADLS Gen2 storage accounts, and I want it to have the look and feel of being in OneLake without actually doing that migration; it's a symbolic link. The key point is I have data somewhere else (again, it could be AWS, GCP, it could be another location in Azure) and what I'm going to do is create a shortcut. I can create a shortcut to anything into Files, or, if it's Delta Parquet format, I can create it into Tables. So this has to be Delta Parquet: if the thing I'm shortcutting from over there is in Delta Parquet format, I can do a shortcut into Tables; if it's not, then I can do a shortcut into Files, and once it's in Files maybe I do something else, some translation, some conversion, to then make it available in Tables. I could also have data in other places within my OneLake, another workspace, that I want to be able to leverage in here; well, I can shortcut that in as well, which is exactly what you saw in my environment.

When we looked at my environment, if we remember those icons for a second, notice this one has a shortcut: holidays is actually stored in a completely separate item (let's go back to my sample workspace), it's stored in my data warehouse. That was brought in using SQL, but I want to be able to use that data in a single way and interact with it via Spark, for example, within my lakehouse, and so what I added (let's go back to my regular endpoint, in Tables) was just a new shortcut, and I added it in from within there. Now notice what I can do: it's telling me, hey, look, I can bring things in from internal sources, which is what I did right there, but it also lets me do it from Amazon S3, Google Cloud, other storage accounts, Dataverse; I can bring in shortcuts from all of those places. And what this is letting me do is, it's not moving anything; I don't want to do an upfront migration of the data, but I want to play with it, I want to experiment with it in Fabric, so I can shortcut it into Fabric and use it with any of the engines; it's all being accessed via this symbolic link. And again, remember, I can put anything into the Files section, but it has to be Delta Parquet to shortcut into the Tables section. There is actually a caching option I can use, because obviously every time I'm interacting with it, it's reading from those sources, so there may be some cost implications there; within the workspace settings I can turn on a cache, which could help with that.
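For what it's worth, shortcuts are scriptable as well. Here's a rough sketch of creating that same kind of shortcut through the OneLake shortcuts REST API on the lakehouse item; the GUIDs are placeholders and I'm approximating the payload shape, so treat the portal's New shortcut dialog as the source of truth.

```python
# Sketch: create an internal OneLake shortcut under the lakehouse's Tables area.
import requests
from azure.identity import InteractiveBrowserCredential

token = InteractiveBrowserCredential().get_token(
    "https://api.fabric.microsoft.com/.default"
).token

workspace_id = "<workspace-guid>"
lakehouse_id = "<lakehouse-item-guid>"

resp = requests.post(
    f"https://api.fabric.microsoft.com/v1/workspaces/{workspace_id}"
    f"/items/{lakehouse_id}/shortcuts",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "name": "holidays",
        "path": "Tables",                       # the shortcut lands under Tables
        "target": {
            "oneLake": {                        # internal OneLake source (e.g. the warehouse)
                "workspaceId": "<source-workspace-guid>",
                "itemId": "<source-warehouse-guid>",
                "path": "Tables/dbo/holidays",
            }
        },
    },
)
resp.raise_for_status()
```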
What about if the data is not available in Delta Parquet? What about if the data is in some proprietary format? What if I'm working with Cosmos DB or Azure SQL Database, or, different from that, Snowflake, and I want to be able to work with those proprietary storage formats within my OneLake? Well, what I can do here is introduce the concept of a mirror. Now this is a separate item, created as a new item: a lakehouse is an item, a warehouse is an item, a notebook is an item, a mirror is an item, so it's something created within my workspace. What it's going to do is use change data capture, so that as things are changed it writes them into the mirror and does a transformation, so those will just appear as part of my OneLake; the mirror does that transformation and writes it into that Delta Parquet format with the Delta log, so it can then be read and processed by all of those other engines.

And we'll see those three types: if I jump back over (now, I had to enable this in the tenant; at the time of recording this is not there by default), and again this is actually a mirror, it's taking that copy of the data so I can then consume it from the workloads in my engines. If I go and look at my workspace and do a new, I'm just going to say more options, so this is everything I can do; if we look at all the different types of things I have, under my data warehousing capabilities I can see mirrored Azure SQL Database, mirrored Snowflake and mirrored Azure Cosmos DB: easily replicate data from an existing source into an analytics-friendly format. The really cool thing here is, chances are I'm not even going to have to pay for this. If I look at the pricing for a second, I don't have to pay for the work to transform, and I get free mirroring storage for the replicas up to a certain amount based on my capacity SKU; so if I was running an F64 I'd get 64 terabytes of storage for free. The transform is free, the storage is free up to those amounts, and then I'd obviously have to start paying for it, but it's a really nice way to get access to data that's stored in some proprietary format and consume it from all of the engines within my OneLake. It's only if I go above those numbers that I'd start paying for the storage; obviously I pay when I start reading and consuming it, I'm going to use up my capacity for the consumption, for the reading of the data, but the actual maintaining of the mirror, the work to do the transform, the writing of it and the storage, depending on the SKU of the capacity that the underlying workspace uses, may not cost me anything. So it's just another option for what I can do, but again, this is a one-way replication; it's making a read-only set of data that I can then consume for analysis within my OneLake.

There's another thing I can do here. I talked about my tenant; imagine there's someone else, some external person over here, and they have their own OneLake because they're in their own tenant, and what I want is for them to be able to have access to tables or files residing in my lakehouse or a KQL database. I could do the same sort of thing: I don't want them to copy the data, so think of it like a shortcut, but going the other way, to something external to my lakehouse. Well, I can do that: one of the things I can do from my lakehouse is an external data share, and what this lets me do is, from within my particular lakehouse, say, hey, I want to go and share this piece of data (it probably doesn't show up on the board very well, it's a poor color to pick). I give the target, the person I want to share the data with, then I select the items I want to share; they are not getting a copy of it, it is not a duplicate, it's a reference to the data in my OneLake. I have to enable this at the tenant level, but now I can go and send them an invitation. So if I jump back over again, I'm in my workspace, I'm looking at my lakehouse, and notice I
have this option, external data share. So I've turned this on within my tenant; all I would now do is select the particular things that I want to share and specify the person I wish to share them with. They would then get an invitation, and once they accept it they can pick where they want to shortcut that into their OneLake hierarchy and be able to consume and work with that data. And again, I could rescind that at any time if I wanted to stop them being able to access the data.

Okay, I said the word warehouse, I said I did a shortcut from a warehouse, so obviously we also have the concept of a warehouse (I'm running out of colors here). So I can create warehouses; this is an item, and the big thing here with the warehouse is, yes, we very commonly start with a lakehouse, structured and unstructured and semi-structured data, and then as I start to get focused and refine my data, I get down to that structured data and I maybe want to put it into a warehouse; I want to do more interaction with SQL to go and write into that data. Now with a PaaS offering, remember, I would be putting together all the compute, the storage, the resource groups and networking; this is SaaS, it's very easy to use, I just create this as an item and it's ready to go, and it's decoupling the compute and the storage. So it's serverless compute: when I run things it's going to consume from the capacity that sits under the workspace in which I create my warehouse item, and the storage, once again, remember, is in that Delta Parquet format as we talked about. I could shortcut things from here into the tables, I can do all of these wonderful things, and what's crazy about this is I think it's the first warehouse that is using this open format; they have papers on one-petabyte scale, so they've done massive tests, and remember again, because of that V-Ordering they get phenomenal performance from it.

So I've created a warehouse item; we saw that already within my environment, in my workspace, and we can see I've got a warehouse, and we can get data in in many different ways: there are pro-code, low-code and no-code ways to get the data in, but it's all structured. The whole point here is it's schema-based; I've got the structure, I've got my data. If I run a query, remember what it's going to do, just like everything else with SQL, is estimate how it can run that query very efficiently; there's a unified query optimizer as part of the warehouse solution, and every query I run is actually broken down into tasks. Those tasks are executed on virtual cores, some of them in parallel, which enables much faster execution compared to the older step-based execution we had with our data warehouses, and those assigned vCores obviously map to a certain amount of Fabric capacity; once again it uses that burst and smoothing, so we don't have to suffer if we're running something fairly big and we have a certain base amount of capacity available to us. And it does compute isolation, which means, imagine I'm doing an ETL job; well, that's not going to interfere with running queries, because query and ingestion are two very separate, non-competing compute pools; it's going to help avoid that noisy neighbor problem. So it is the first transactional data warehouse using that open standard format with the ACID compliance; it's auto-integrated, it's auto-optimized, you don't really worry about anything, you're not worrying about hash distributions or workload groups that
we would normally have to worry about for optimal performance; it's doing all of that for you. Think of this as that SaaS offering, and because of that V-Ordering it gives a huge performance boost for the warehouse, and for Power BI when I'm leveraging that as well. I mean, we're used to the idea of a warehouse; from a features perspective the idea is that the extract, transform, load is for everybody, and it's really designed so that, based on your skill set, you can get data in in different ways. For pro-code, yes, Data Factory, and we see that when we go and look: for example, I'm in my data warehouse view over here, if I do create and we look at our data warehousing, yep, I can go and create a new warehouse over here. Also, let's go back over for a second: I can get data, I can do a dataflow, and a dataflow is very much low-code, it's like a visual, Excel-like experience where I can easily do transformations. And remember, we already saw the idea of shortcuts and mirroring with no code. Pro-code? Sure, I can use Data Factory, so I could go and use full data pipelines, I could do stored procedures (but I need the skill to do that), I could just run SQL commands, I could use CTAS, create table as select. So I have a vast number of experiences for how to actually get data into my environment. I can do cross-database queries; native, shortcut, mirrored, it doesn't matter, and this again you're going to see time and time again: the whole point of all of this is, whether the data is natively stored, whether it's been shortcut, whether I'm doing a mirror, it doesn't matter, it all appears the same to all of these engines that want to go and consume it. I can build a virtual data warehouse over all of those different sources, and I can even save the view, so I can create these new views and then query based on that.

There's a visual query capability as well: if I jump back over again, I can do a new visual query and just start dragging the tables over, and then from here it will show me the data that's currently there based on what I have. You'll notice there are functions to reduce rows, I can manage columns, I can do sorting, I can do transformations, I can combine things, I could save it as a view, I could view the SQL that's powering it, and it's giving me that preview down here, so it really gives me a nice experience to start interacting with the data within my data warehouse.

And something I'm not going to cover in much detail right now: Power BI gets some really nice benefit from the OneLake in how it can talk to the data here. We're used to the idea that, well, we can import the data, or we can do a live translation of queries, but the performance suffers, I get lag. There's something very powerful now: because of this Delta Parquet format and the work done in Analysis Services, we get Direct Lake mode, which doesn't have to import the data and doesn't have to do translations in how it talks, so you can get phenomenal Power BI performance as well.

It's also doing automated table maintenance; it's optimizing the compaction, the checkpointing, because normally with Delta and Parquet each operation I perform would be a separate Parquet file, so if I was doing singleton operations I'd end up with a stack of Parquet files, which is why we normally do batches, and any update, well, it's a delete and then an insert. This is going to go and tidy that up automatically for me, so instead of having to worry about fragmentation, writing some job where I have to go and optimize and compact things myself, the storage optimization will look at the Parquet files I have and what it can do
across them to get me to a smaller number of larger Parquet files that will give me increased efficiency. So there's a whole set of work being done just to improve that warehouse experience.

Now, I know one of the questions I had when I started to look at this was: we have a lakehouse, which has a structured bit, and we have a warehouse, which is structured only, and the lakehouse, remember, has a SQL endpoint I can consume, so do I use a lakehouse or do I use a warehouse? The whole point is, because I can put that SQL endpoint on the lakehouse, the front-end experience can be the same, creating views can be the same, reading the data is the same; the difference is which engine I want to use to write the data. I base the decision on how I want to do the authoring; the consumption doesn't matter. I cannot use the Spark engine to write into a warehouse, and I cannot use the T-SQL engine to write into a lakehouse. So do I want to write stored procedures, or do I want to write Python? How are you shaping and preparing the data in the first place? It's really about the tool: what tool am I, as the developer, using? If I'm using relational databases, if I'm using SQL, if I'm using a warehouse architecture, if I want multi-table transactions (well, I only get that in the warehouse; that's a very pure warehousing capability, locking multiple tables), I'm going to use the warehouse. If I'm more code-first, if I'm writing Python code, if I work in notebooks, if I'm using Spark, if I'm used to working with messy data and getting it into shape, I use a lakehouse. But often it's not an either/or; in most of our solutions I'll have instances of both.

You might have heard of the medallion architecture, and what's really cool is workspaces will help you with this. If we jump back for a second (I skipped this a little bit), if I go back to my workspace we can see all of the resources within my workspace, sure, but if I drag this little bar here it opens up this task flow, and what I can do is select a task flow. These are common task flows I might use, and one of them is medallion. If I select this, it's telling me the types of workloads it's going to require and the required item types; it's going to create a warehouse and a lakehouse, and it will then create a set of steps. Now I could change this, I could say I don't want this anymore, I could get rid of it completely, I could say I want a new type completely, I could add my own types of items into it, I could just delete it entirely, but the point is it's going through and guiding me on what I might want to do: bronze data, silver data, gold data. I'm just going to delete this because I don't want it, but realize they're there to help you, so super powerful.

What the medallion architecture is all about: bronze would be that raw, unprocessed data from the source; I'm just extracting it and I want to load it in somewhere, and that would be the lakehouse. Silver might be, okay, I start to match and merge and do a little bit of cleansing; that might be my silver medallion data, and that again would still be in my lakehouse. And then the highly refined, aggregated data that makes up my gold, that I would put in my warehouse. So don't think of it as an either/or; think about what the right storage is, based on the engine I want to use, for the stages of life my data goes through, and again, those task flows can help you get that right.
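Pulling that together, here's a compressed sketch of that medallion flow in PySpark from a notebook attached to the lakehouse; the file path, table names and columns are all just illustrative.

```python
from pyspark.sql import functions as F

# Bronze: land the raw source file as-is in the lakehouse Files area,
# then load it without reshaping anything.
bronze = spark.read.option("header", True).csv("Files/landing/sales.csv")
bronze.write.format("delta").mode("overwrite").saveAsTable("bronze_sales")

# Silver: light cleansing and conforming (types, dedupe, trimmed columns).
silver = (
    spark.read.table("bronze_sales")
    .dropDuplicates(["order_id"])
    .withColumn("order_date", F.to_date("order_date"))
    .withColumn("amount", F.col("amount").cast("double"))
)
silver.write.format("delta").mode("overwrite").saveAsTable("silver_sales")

# Gold: the refined aggregate you'd typically surface from the warehouse
# or a semantic model for Power BI.
gold = silver.groupBy("order_date").agg(F.sum("amount").alias("daily_revenue"))
gold.write.format("delta").mode("overwrite").saveAsTable("gold_daily_revenue")
```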
Now, so far all I've covered is the storage side; again, there are KQL capabilities to interact with, there are other things I can store, but I've really just talked about lakehouse and warehouse, and of course the point is, once I have all of this, I want to do different types of activity on top of it based on my personas: data engineers, data scientists. So I want to pivot just a little bit.

Okay, so a data engineer. Let's just change our view for a second: remember we have views in here based on our persona, so we were in the data warehouse view, but I'm going to change it now to data engineering. Straight away, for data engineering, it shows lakehouses, notebooks, environments, Spark job definitions, data pipelines, import a notebook; it's guiding me through some of the key things as recommendations, showing me what I'm doing here. Now remember, pre-Fabric I'd probably be using Azure Synapse Spark, I would have used ADLS Gen2 accounts, I'd have gone through various types of hops to read the data. My whole goal as a data engineer, commonly, is that I'm working on a massive amount of data with a massive amount of processing; compared to a data scientist it might be considered more basic types of action: I'm preparing the data that everyone else in the organization can then consume and do more advanced analytics on, I'm filtering, I'm cleaning, I'm transforming, and I'm doing this at massive scale for different types of workloads. Now obviously Data Factory is one of the tools a data engineer might be using to get the data in there, but then I also obviously have notebooks, leveraging Spark to do many different types of things. And when I think of data engineering, there are really two key artifacts it showed there: one of them is obviously the lakehouse, the other is the notebook.

Now what's crazy about this (we've got notebooks up here already) is I'm not provisioning any type of compute. When I want to use this, at the workspace level there's a starter Spark pool that just gets created for us. If I go and look at my workspace, look at my workspace settings, remember we had that data engineering/science section and we had our Spark settings: what it's doing by default is it's created this starter pool, a memory-optimized, medium-size, one-to-two-node pool, and what's nice here is it has single-node support; that's one of the features we have available to us. Now I could create my own pool, I could create additional pools and customize them, I could edit the settings of my particular pool if I wanted to as well, but this is just there for us, and what's happening is there is a warm pool that Microsoft maintains of those node sizes, which means they're available crazy fast. Now you also saw there's the idea of environments; environments let me have different notebooks run on different pools with different configurations, so there's a whole set of things I can do. But the philosophy is: keep it as simple as possible until you want to go deep, and if you want to go deep you can go really deep and customize a lot of the different things.

But from right here, let's just say I'm looking at my lakehouse, for example, and I open a notebook; I'm going to create a new notebook, so it's creating that. Well, let's just do something super complicated, one plus one, and we'll run it. One little second, two little seconds, three little seconds, four little seconds (it's counting for me as well), but really quickly I'm going to get my response: two. That was not minutes waiting for it to spin up some Spark cluster; it just went and grabbed, from the warm pool, a unit of compute that it needed to execute that query.
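To make that concrete, here's roughly what those first notebook cells look like in a Fabric notebook attached to a lakehouse; the dimension_customer table is just the sample data from earlier.

```python
# Cell 1: a trivial sanity check. It comes back in seconds because the compute
# is grabbed from the workspace's warm starter pool, not a cold cluster spin-up.
1 + 1

# Cell 2: read a Delta table from the attached lakehouse with Spark.
df = spark.sql("SELECT * FROM dimension_customer LIMIT 10")
display(df)   # the notebook's built-in rich table/chart view
```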
And remember, I can still do all of the other cool things. I could still grab some data and drag it over, and now it's created the Spark query for me; I'm going to change that limit to 10 and run it. It's creating PySpark, Python. I can do other things in here, I can write other things, but it's so simple to go and leverage this, and I could obviously mix in other types of actions, whatever I want, whatever I'm used to doing. It's just available super fast to me.

Now, one of the other settings you may have noticed in that workspace (let me go back to it again) was this high concurrency option that I have turned on. So what is high concurrency all about? We know this is serverless compute: it spins up, it's one of those engine types, one of those types of consumption that goes against the capacity assigned to the workspace running this particular notebook. Think about what's happening with the load. When I'm running a particular notebook and it's performing some action, at the time I execute it, it grabs a unit of actual compute from that warm pool to run it for me. So I'm running notebook one; if I run a second notebook, it goes and grabs another unit, another VM, to run it; then I run a third notebook, a fourth, a fifth, and so on. Each of those has a certain amount of memory and CPU, and a lot of that is probably wasted; I was just running one plus one, a very basic little bit of compute. And remember my Fabric capacity is at a certain level, so if I'm running five, six, seven things concurrently and they're each using a unit that consumes a certain portion of the capacity, I'm wasting a lot. Those units are mostly idle, they're not doing very much, their actual utilization is way down here, so there's a ton of waste. If I turn on high concurrency, what it says is: look, this is silly; instead of that, notebooks two, three, four and five will run in that same unit, so my consumption is much higher within that particular unit of compute and I'm being way more efficient with the resources assigned to run it. It still has to worry about running out of memory, so the maximum number of notebooks in a single session is five, and I cannot share a session between users; it has to be a single user, because there are security boundary considerations there. But that's what high concurrency does: it lets me run up to five notebooks in a session, it's more efficient with the actual usage, and it avoids wasting the capacity I have. So when I think about data engineering, the Data Factory, the notebooks and the Lakehouse are just huge capabilities that I'm going to be leveraging across all of that.

Then think about data scientists. I always think of data scientists as taking what the data engineers do and building on it. If we pivot our view to data science, it's really adding the idea of machine learning models and experiments: they're still using lakehouses, notebooks and environments, but they're adding these models and experiments. They're going to take that clean, prepared data from the data engineers and run their own, potentially more advanced, workloads on it, which is more about predictive analysis, and they work in the same place. The data scientists and the engineers are using the same tools; I'm using a notebook to train a model, I just have these additional capabilities.
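To give a feel for what those models and experiments look like from the notebook side, here's a hedged sketch. As I understand it the experiment items are MLflow-backed, so the pattern is standard MLflow tracking code; the experiment name, table and columns below are just placeholders:

```python
# Sketch of a data science cell: train a simple model against the cleaned
# "silver" data and track the run as an experiment via MLflow.
import mlflow
import mlflow.sklearn
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

mlflow.set_experiment("churn-experiment")               # shows up as an experiment item

df = spark.read.table("silver_customers").toPandas()    # hypothetical table
X_train, X_test, y_train, y_test = train_test_split(
    df[["tenure_months", "monthly_spend"]], df["churned"], test_size=0.2
)

with mlflow.start_run():
    model = LogisticRegression().fit(X_train, y_train)
    mlflow.log_metric("accuracy", model.score(X_test, y_test))
    mlflow.sklearn.log_model(model, "model")             # can then be kept as an ML model item
```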
I'm exploring the data, I'm training the models, I'm applying the models. I have this whole exploratory data analysis (EDA) experience, there are low-code UI experiences to create my models, and there's a Data Wrangler which not only cleans my data, it gives me the code for how it cleaned it. So yes, it will go and help clean the data, but then I can rerun that through the code it generates. It's really an end-to-end experience; they talk about "five by five": five seconds to get started, five minutes to get meaningful results.

We can see that Data Wrangler in the notebook. If I jump back over: do I still have my notebook around, or did I get rid of it? There it is, my notebook. One of the things you see here is the Data Wrangler. Now, I would have to go and set some things up to really start using it, which I haven't done, but notice it's finding the data frames, and I can do a custom sample. The whole point of this is that it provides an immersive interface for all of that exploratory data analysis: a nice grid-like data display, plus a dynamic summary of the statistics, built-in visualizations, and that data cleaning. Let's just pick something so we get this sample view. I can start to see the operations, and it's giving me great insight into everything that's going on: drop duplicate rows, apply, and it shows me what it would do and the code it would run to do it. It really is a super powerful set of capabilities, and it's doing that code generation as well, so it's really simplifying all of the data preparation and making my data workflows more efficient.

Now, one thing to consider: this is using fairly generalized capacity, and what I mean by that is there are no special GPU accelerations available to me at the time of recording. So if I was doing video or image processing as part of that data science, it may struggle. If it's standard text, if it's tabular data, and the end game is business insights, that's going to be absolutely fine, but bear in mind it could potentially be a challenge if I'm working with images and video, because it's using this more generalized serverless compute for those experiences.
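And just to make the Data Wrangler's output concrete: the code it hands back is ordinary pandas that drops straight into the notebook, roughly in this shape. The function name and columns here are illustrative, not what the tool literally emits:

```python
import pandas as pd

# Illustrative shape of Data Wrangler-style generated cleaning code: each UI
# step becomes a plain pandas operation you can re-run or tweak yourself.
def clean_data(df: pd.DataFrame) -> pd.DataFrame:
    # The "drop duplicate rows" step
    df = df.drop_duplicates()
    # Drop rows missing a key column
    df = df.dropna(subset=["customer_id"])
    # Normalize a text column
    df["country"] = df["country"].str.strip().str.upper()
    return df

df_clean = clean_data(df_raw)   # df_raw being the sampled frame you wrangled
```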
The final thing I want to talk about is Power BI. Remember, Power BI is all about the idea that I want to create these really nice semantic models, and with those semantic models I get the data into a format that's easily consumable by the business users. I can abstract away complex data structures, I can put it into an understandable format that business users can then go and build on and perform analysis against, I can rename tables and columns, create relationships, apply logic, do a whole bunch of cool things, and I can create layouts, even automatically create mobile app layouts. It's a really powerful capability. And now I have all these workloads talking to my OneLake. The thing is, the semantic models are using data from some source via Analysis Services, and in the past we had different options for how to actually get that data. I could use import mode: import mode brings the data into memory, but I have to store it locally and then go and refresh the data to keep it current, so there's a challenge with that. Going against SQL Server we also had DirectQuery mode, but that adds a translation and therefore a performance hit, because the visuals I'm building talk DAX and it has to convert that to SQL, so there's some performance hit and lag introduced, and it's constantly going back and talking to the source.

With Microsoft Fabric, with OneLake and Power BI, we now have this new Direct Lake mode (we'll do this in a different fancy color), because now it's just talking to the data directly. There's no translation; it's working directly with the data in that Delta Parquet format, remember that V-Order optimization we talked about, and it gets a massive performance boost. You can compare it, because obviously I could still create a semantic model based on DirectQuery mode using the SQL endpoint of my Lakehouse, for example, and there's a crazy difference in performance between the two. Now, Direct Lake works on the structured Delta Parquet content in the Lakehouse only, obviously not unstructured content, and I can't just toggle it on: it's at the foundation of the semantic model, so I have to set it when I create the semantic model. There are still upper limits on the number of rows it brings in, but there's none of the translation that comes with DirectQuery, and this is going to be your preferred approach. There's really not a reason you wouldn't use it; I guess technically, if I already had a semantic model using the SQL endpoint and it's very complicated and I don't want to rebuild it, then sure, I'd probably stick with DirectQuery, but if I'm creating a new one you are going to want to leverage Direct Lake mode.

So if I was looking at my Lakehouse, I can just go and create a new semantic model; let's select those two tables and call it model one (I can't type very well). Then once you're in the semantic model, remember, you would go and create those relationships and refine the things you want to make available to the business users. It's leveraging that shortcut data as well, the holidays data I added earlier, and note that when I look at the tables and check the advanced properties I'm in Direct Lake mode: that's my dimension customer table, and this is my shortcut to my warehouse, Direct Lake, so I'm going to get that amazing performance. Then I would go and do all of the regular things I typically do in Power BI.

Another consideration: remember, one of the things I could do in SQL is row-level security. If I'm doing row-level security in SQL I can't do Direct Lake, because I need the SQL engine to be enforcing those row-level permissions, so I'd have to use DirectQuery mode and go via the SQL endpoint. Now, I could do Power BI row-level security instead; that's absolutely fine, I can do that on the semantic model in Power BI, and even if I'm using Direct Lake it will actually check the SQL endpoint periodically to make sure you've not turned on row-level security there, because then it would have to go via the SQL endpoint to have that enforced. So that would, I guess, be one reason why I wouldn't use Direct Lake mode; if you can, try to avoid doing the row-level security in SQL and bring it up a level.

Another thing I can do once I create my semantic models: I'm storing them in my workspace, and with OneLake integration, once I've got my semantic model (remember I could be importing data from CSV, I could have calculated columns in memory), I can publish it to the Fabric workspace, and then the next time I refresh it, it gets populated into OneLake, which can then be consumed by the other engines. So I could now have a notebook using Spark to consume what I've created in my semantic model.
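Because those materialized tables are just Delta sitting in OneLake, the Spark side needs nothing special. Here's a sketch with made-up workspace and item names; OneLake exposes items over an ADLS Gen2-style abfss endpoint, and the exact path shape I show for a semantic model's exported tables is my assumption rather than something shown in this video:

```python
# Sketch: consume OneLake Delta tables from Spark. Workspace, item and table
# names are placeholders; these are just normal Delta reads over abfss paths.
base = "abfss://MyWorkspace@onelake.dfs.fabric.microsoft.com"

# A Lakehouse table...
df_lake = spark.read.format("delta").load(
    f"{base}/MyLakehouse.Lakehouse/Tables/dim_customer"
)

# ...and, with OneLake integration enabled, the tables a semantic model writes
# out can be read the same way (the .SemanticModel path shape is my assumption).
df_model = spark.read.format("delta").load(
    f"{base}/SalesModel.SemanticModel/Tables/DimCustomer"
)
df_model.show(10)
```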
Now, it's not going to include runtime aggregation: if I had a measure, that wouldn't be included in what's published, because that's evaluated at runtime, and it's not going to show me the relationships between the tables. But any of the calculated columns or calculated tables would be available to me; if I wanted that other data I'd have to implement it somewhere else. So I can take those semantic models and essentially have them surface as an item within my OneLake, and I could then consume them, as they refresh, from Spark and other types of engine. But that Direct Lake mode: if you take away nothing else, Direct Lake is just massive in terms of the performance for Power BI.

Now, one thing you obviously have to talk about in this day and age is AI, and yes, there are multiple dimensions when you think about artificial intelligence. Absolutely there are copilots for everything: copilots that help me create SQL, create my Spark code, organize my models, create my notebook content. That's almost table stakes now. There's also something they've talked about called AI skills. I can't demo it, I don't have access to it, but it's going to be a type of item you can create that lets the consumer of the data you have here use natural language to interact with it and get help, and I think that's going to be a really nice capability.

One final thing that's not a feature of Fabric itself, but something Fabric is going to make way easier thanks to OneLake: think about what this has become. All of the data these engines are writing is stored in, and linked from, this single namespace. I can also include data that's in AWS, GCP and elsewhere via shortcuts, and I can mirror data from Cosmos DB, Azure SQL, Snowflake, and they're all treated the same once they're in here; I can consume them. So now there's this single namespace holding all of the data in my entire organization, and what would be really nice to do here is Purview. Purview works with OneLake, it works with the shortcuts, and it works with the mirroring, so all of those different sources come together because Purview just has to work with OneLake; OneLake is doing a lot of the work with regards to the data, the shortcuts and the mirroring. When I think about my data governance requirements around discovering my data, classifying it and protecting it, Purview can now work on OneLake, get that single view, and make it far easier to discover and apply those protections for the organization.

If you're curious about where things are going, there is a roadmap. If we go and look at the roadmap, there's a "what's new and planned" section, and it breaks things down into the different components. If I look at OneLake, for example, it talks about what it's working on (shortcuts to on-premises data, which is obviously there, data access roles, the OneLake security model) and then the things it has already shipped, so I can go and look at the different sets of capabilities to see where the thing I'm interested in actually is. Private link at the tenant level: they've done that. Managed VNet support: we saw that already. And it talks about private link at the workspace level, planned for Q4 2024, which I think will be super interesting. So I can go and look at what's planned, what's coming, and even see what's been shipped recently.
I think this very transparent view of what's coming and what's new is going to be really powerful for people.

So that was it; that was the stuff I really wanted to cover and talk about. I know I covered a lot, I always do, I apologize, but I hope it gives you some idea of what Fabric is, because I know for me personally I initially thought, oh, is it just a suite? And it's really not. If you take away one thing from this, it's that OneLake becomes a single point for your entire tenant that all of the different engines can now talk to and consume, so there are no more silos and no more painful transformations between things; all the engines have been updated to work with this open Delta Parquet format. Then couple that with the shortcuts bringing data in from other places (and we saw that's expanding), couple it with mirroring for structured sources, couple it with the Iceberg metadata support and the XTable conversion; that's the other route, where if you actually want to read and write with Snowflake you'd use the Iceberg metadata support, as opposed to "hey, I just want to be able to consume it for analysis". And the fact that OneLake is based on the standard ADLS Gen2 API means it's just open interoperability (a complicated word for a Sunday morning) for anything; it's not just for Fabric. That OneLake becomes a place that any application and workload can work with, because it's an interoperable API that things are used to already.

It's also got this really powerful serverless compute where I get a virtual bucket of capacity based on the amount I've provisioned, but it lets me do massive bursting and pay it back over the next 24 hours, so I don't have to heavily over-buy just because I have some peaks in the things I need to do. We still have all of the core experiences we're used to for data engineering, data science, the warehousing and the analytics side, sprinkle in the copilots and the AI skills, and Purview now works with the OneLake, and it really gives us as an organization something amazingly powerful. Some of the coming capabilities, like workspace private link, I think will be huge for a lot of companies, so that's a very welcome thing on the way as well. But it's going to kill off this pain of the silos of data and the transformations, because "this engine has to talk this format" goes away when we have an open standard that all the engines have been updated to talk to. So that was it. I really hope that was useful. As always, till next video, take care.