Transcript for:
Empowering Non-Tech Users with Databricks

[Music] All of this data is locked in the hands of your data scientists and data engineers. In this session we will show you a great new way of unlocking that insight and getting more of that data into the hands of your regular, non-technical business users. My name is T, I'm the director of product. This is Andre, he is a staff software engineer at Databricks. We're going to talk to you about Databricks Apps.

Well, let me start with a story. We all have Sarah, the hotshot VP of sales in the organization. She speaks Excel, she speaks email, she speaks anything in a browser, but she doesn't necessarily know how to interact with a ton of data outside of Excel. In your organization you have all your data stored somewhere in the cloud, or a combination of things in the cloud and things in other places, and Sarah still needs to be able to get insight, that valuable insight we were talking about, into her sales by region, by area, and get forecasts of what those sales are going to be in the future. She can do some of that in Excel, but not everything.

On the other side of the organization we have Dan, the data scientist. Dan knows how to speak data, is familiar with a number of programming languages, works with Databricks and other tools, and can get that valuable insight from the data. A lot of our job is connecting Dan with Sarah and others in the organization who need that kind of insight. So how do we make it happen?

How many of you have a similar story happening in your company? Okay, you're in the right place, good. How many identify more as a Dan? How many are a Sarah? Are you afraid? One, one person did. Okay, we're going to be talking about the two for the rest of the talk.

So how does Dan do it today? Dan uses a combination of notebooks, queries, dashboards, and a whole set of other tools. In a notebook Dan can use his favorite programming language: he can use Python, he can use SQL, he can do lots of massive processing on the data, access the AI models
that they have, and otherwise get to the piece of insight that he wants to share with Sarah. Notebooks are not a great way of sharing data with Sarah. I think if Dan shared the notebook with Sarah, some problems might happen, and we wouldn't want that. So Dan typically would, I don't know, take a screenshot and send something to Sarah, or maybe write some SQL queries and create a nice dashboard for Sarah and other people like her. And dashboards are great, because dashboards are mostly accessible to people in the outside world: they can click on a link in a browser and see nice visualizations. Some people can make dashboards sing and dance, with all these filters, really nice and interactive dashboards, which is great if you know exactly what questions are being asked and the interactions are limited to what you would expect a user to want. These are good.

So this can work in a situation where, say, you want to see the sales of the last few quarters. But what if the assignment was a little bit harder for Dan, and Sarah wanted to do some complex what-if analysis that involves very detailed interaction: clicking on filters and subsets of the data, imaginary results that show up from the right, overlays, and interesting things like that? That's not as easy to do in a dashboard. If the colleagues on the production team wanted to actually visualize the physical product that they have and overlay the data on the physical product, good luck doing that in a dashboard; that's going to be pretty complicated. And if you want to actually interact with the data, not just look at it but also consolidate data and write to data, you need kind of a
different thing that allows you to do that kind of stuff with the data. So notebooks and dashboards are not always the answer; sometimes you just need something more. And that something more that Dan needs is actually to write an app: an app that can speak to users in the language they understand, that shows the content they want to see, with the interaction model that the user expects. Further, Dan wants to be able to share that application with lots of different people who may or may not have access to the underlying platform, like Databricks, and he wants them to be able to just use those apps on their computers, on their PCs.

So how does Dan do that? Well, Dan can write apps. Luckily Dan knows how to write code: he can use Python, he can use other tools, and there are lots of frameworks for writing apps. What is hard is that Dan now needs to worry about how to deploy that application and where to deploy it, dealing with app servers, dealing with containers. Dan doesn't want to play infrastructure guy in the afternoon; Dan just wants to write the code for the application.

Also, sharing the application is not as easy as it seems. You don't just send a link to somebody. What if that somebody sends that link to somebody else, and that somebody else is not supposed to access the application? You need to control how these applications are shared, and you need to manage that control somewhere. It would be nice to be able to do that from the place where you normally manage permissions for data; maybe you get to set permissions for these applications there as well. But then that becomes another job for Dan: having to deal with single sign on and OAuth and different authentication methods. That's complexity. Your IT team probably has a say about who gets to access the data in the applications. Maybe they set a rule where, you know,
certain IPs are the only ones that can access data in your company. Well, now Dan has to take that into consideration as well, or risk that IT is unhappy with the application; and if they are unhappy, there's probably a good case to be made for taking that app down. So Dan now has to deal with hosting, sharing, and securing the application, and those are hard problems, even harder for Dan than just writing the code for the application.

So what do we do? Well, that's where custom apps come in. We would like to introduce Databricks custom applications. It is the simplest way for data scientists and data engineers to create applications that are secure, governed, and easy to deploy on Databricks. With custom apps you just write code for your application and you ask Databricks to host it. By running the application, Databricks takes care of all the underlying IT infrastructure. More importantly, that infrastructure lives by the same IT security rules that you already set for your workspace, that you've already been managing for your data, that your IT team has already approved and agreed on. You can also specify who exactly gets to see that data. Further, the applications respect the same rules that you already have for accessing the data: you can choose to limit who gets to see the application, and the contents of the application, based on what their credentials actually allow them, or you can choose to do more than that. You get a lot of choices, but it's easy to do.

Instead of me talking to you more about it, I think it's best if you actually see it in action. So Andre, why don't you show us a demo of how this apps thing is supposed to work?

Yep, I will. Okay, morning everyone. Let's code up some apps. I'm going to go live this time; let's see what's going to happen. Most of you are engineers or data scientists, so you know that one of the main
skills that we have is copy and paste. So I went to this Gradio thing. What is happening right now is that there are a lot of very cool frameworks for building web applications in Python, and one of them is Gradio. I'm not advertising Gradio; anything works. Streamlit works, Flask works, you can write HTML and CSS, we don't care.

But I found this one. I was talking to someone here in the audience yesterday about maps, and there's this little piece of code here that I wanted to copy and paste. All that I need to do is create a Python file, paste the code, and save. My setup here is very, very simple: I have my virtual environment, I have my Databricks config, and I have the VS Code extension for Databricks that is automatically syncing my code to the workspace. So if I go to my workspace, I should, very theoretically, see something appear in here. My source code is in the workspace right now; that's great. But apps are a little bit more complicated to write than that, and I at least prefer using a proper IDE, so let's use this.

So the first thing is to look at this code and say, oh, there's a "from datasets", I probably need to install that. So the other thing I need to do is add a requirements.txt and say datasets. Perfect. Then I go to my terminal and run databricks apps create with a name for the demo app. When I'm creating an app, what do we do? We provision a URL, we set up a service principal, we set up OAuth and SSO for you. So at the end of this we have everything that the app needs to run, except for the actual compute: the identity of the app and the URL are all there.

The next thing I'm going to do is databricks apps deploy. What did I call this? Data and AI Summit demo. And I need to know the folder here, so I just copy this path, put it in there, and go for it. What we're doing right now is actually starting a compute specifically for
that app, so everything is sandboxed and one app doesn't have access to another. It is a small VM that is specific to this app, which is meant to be cheap. That should take a couple of minutes. A note: it takes a couple of minutes, and there is no way for it to get slower than this, because we're actually provisioning the VM for you, so it's only going to get faster.

But in the meantime let's change this. This is not very fun, because I'm reading from a CSV file from a dataset; that's not how an app should be in Databricks. I just happen to have that same dataset in Databricks, look at that. It's imported; that's great. So now let's make this app actually query this data. You can see some sample data: ID, name, neighborhood, latitude, longitude, room type, all that. Our app now needs to query that.

Okay, so to query this, what do I need? Simple, let's do from databricks import sdk, sql. I probably don't need any of that. I probably need a function called query, and that function probably returns a DataFrame. Now let's code up some stuff. How do you connect to Databricks? The first thing is that we automatically inject all the information that you need for the service principal of this app into your container, which means that you can literally just say config equals sdk.
config.Config, and that is all the authentication that you need for your app. The same thing works if you have the Databricks config and everything set up in your local development environment. A very important part of apps is that if it runs on your local machine, it should run in Databricks. It was very important for me to make that work, because you don't want to be debugging things inside Databricks, in staging and all that stuff; you want to debug on your local machine. Although I will show you how to debug and see the logs too.

So that's it, I got a config. Now, sql.connect. What does connect need? Not that. That's the second skill we are learning, how to handle the Copilot; that's not the way it works. Server hostname is config.host, because the host is also injected into the container; you don't need to know it, and you should have it in your Databricks config file. The other thing is an HTTP path, which we will keep empty right now and I'll grab soon. And there's a thing called credentials provider. Everything that the app does with Databricks uses OAuth, which means that there are only short-lived tokens, and you decide how long those tokens live for. It also means that you need to code in a way that doesn't assume the token is going to be alive the whole time. That's what this credentials provider does: it refreshes the token automatically for you. So it's very simple, I just say lambda: config.authenticate. Authenticate returns a header with a bearer token; that's pretty much it.

Okay, as connection, with... oh, this one was right. Connection, cursor, yep, I think that's it. I just want to make this a little bit easier: I want fetchall_arrow, because that one has a to_pandas API, and that's exactly what I need. And then I just return the data. Cool. Might want to do that here; that is there. Okay, so now a couple of things we need to fix. The HTTP path: let's grab an HTTP path. Go to SQL Warehouses, there's one here, put it there. Cool.
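
For readers following along without the screen, the query function described above can be sketched roughly as follows. This is a reconstruction under assumptions, not the code typed in the demo: it assumes the `databricks-sdk` and `databricks-sql-connector` packages are installed, it imports `Config` from `databricks.sdk.core`, and `HTTP_PATH`, `TABLE`, and the `build_select` helper are hypothetical placeholders added for illustration.

```python
import re

# Hypothetical placeholders: substitute your own warehouse HTTP path and table.
HTTP_PATH = "/sql/1.0/warehouses/<warehouse-id>"
TABLE = "catalog.schema.listings"

# A plain one-, two-, or three-part identifier such as catalog.schema.table.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*){0,2}")


def build_select(table: str) -> str:
    """Return a SELECT * statement after checking the table name is a plain identifier."""
    if not _IDENT.fullmatch(table):
        raise ValueError(f"unexpected table name: {table!r}")
    return f"SELECT * FROM {table}"


def query(table: str = TABLE, http_path: str = HTTP_PATH):
    """Run the query on a SQL warehouse and return the result as a pandas DataFrame."""
    # Imported lazily so this module loads even without the Databricks packages.
    from databricks import sql
    from databricks.sdk.core import Config

    # Picks up the credentials injected into the app container,
    # or the local Databricks config when run on a laptop.
    cfg = Config()
    with sql.connect(
        server_hostname=cfg.host,
        http_path=http_path,
        # The connector calls this to get fresh auth headers, so the code
        # never assumes one short-lived OAuth token stays valid.
        credentials_provider=lambda: cfg.authenticate,
    ) as connection:
        with connection.cursor() as cursor:
            cursor.execute(build_select(table))
            return cursor.fetchall_arrow().to_pandas()
```

Calling `query()` with a real warehouse path and table should return a DataFrame that a framework such as Gradio can render; the identifier check is an extra precaution that was not part of the talk.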
And then I need to grab my table. What's my table name? This guy, right. Cool. Do not do this in production; select star from a random table is not that good. Okay, it's a demo.

Okay, so in the meantime... oh yeah, I forgot to look down here. Look at that: app deployed, process is running. Cool. I'm just going to click on this link. Okay, I didn't save my file yet, but if I go and open this: that app that we copied and pasted from the internet is running in Databricks. This thing has its own URL. But this is not very hard to do. What's hard to do, and what I spent the most time on, is honoring IP access lists, private links, end-to-end encryption, all the stuff that you argue with IT about. Everything's there: the same infrastructure and the same security boundaries that your workspace has, your app now has as well.

Now the cool thing is that you have all the pixels on the screen. This URL is all yours; you can do whatever. Now you can share your app with your URL, and you get SSO by default, so you have to actually log in to Databricks with whatever IdP you have. So that's very impressive to get done, and as you saw, in I don't know, 10 minutes, it didn't even take 10 minutes, we got an app deployed and everything done. Cool.

But again, this is not very fun. This works, but the data is CSV; let's query this. Okay, so back to my IDE, back here to the code. I have a query: I have config for authentication, I connect to SQL, I have my warehouse, I select from my table and return the data. I think the only thing missing here is that somehow I need to... oh, look at that, it's actually right. Okay, so I saved this. Nice. Again, this should have been updated in my workspace; yeah, my code is updating in the workspace automatically. Let's see what happens here. I'm going to redeploy this app with my changes. Redeploy:
prepare source code, code deployed. It didn't seem like much, but it actually updated. And the way to prove that: we have a special little path here called logz, with a zed. I'm from Canada, so I say zed. This is piping your standard out and standard error live for you. You can see that the first time, we got this code, installed its requirements, and started your app with... oh, is this too small? Let me increase this. It started your app with python app.py. These are all the environment variables that we load for you automatically, so things work from scratch. And the reason why it works is that Gradio, Streamlit, uvicorn, all of those things are already set up for you. It's running there.

Now the second deployment got the code again, and you can see that the code here is in a special path. In this case I'm using a deployment mode called snapshot, so I'm actually grabbing the code and saving it where the service principal's folder is. So if you go and change your code, you know exactly what got deployed, because sometimes, when you have things that are live, you don't know what's on the server versus what's in the workspace. In this case there's no problem about that.

But I guess I'm going to have to prove that this thing is actually querying, right? So if I go and refresh: got an error. Why did I get an error? Error, request to the server. So my code is there, and it's failing on something. In the keynote there was an error too, right? But in here, writing code, the number of times you're going to have to go and do this... so I really wanted to demo things breaking. And it's very important: now it's breaking, that's great, but you get all the errors, that's fine. So now let's try to fix it. What did I do wrong? Got that. Oh, I think I should have done this. Okay, so deploy again: deploying source code, deployed, the app is starting again. See, it's updating live. Now: app started. If I go and refresh, there it is. And now we just went
through the process of deploying an app, connecting it to Databricks with the service principal authentication and OAuth, finding an error, debugging it, fixing it, and deploying again, all that fast. [Applause]

Cool. So now what's next? We likely do not want to run an app in production just by calling demo.launch and using python. Most likely you're going to use some sort of process manager like gunicorn, uvicorn, or something like that. Just running python app.py is fine for a demo, but it's not the real thing. So let's do the real thing. Again, let's use our first important skill and go copy some code. Gradio, for some reason, has a way to mount onto FastAPI, so let's do that. If I copy this and put it in there, cool, and I copy that and put it at the bottom. I think this is called io, and I want this to be the normal path. Cool. Okay, so what's happening now is that Gradio is just rendering things, and FastAPI is the one that is actually managing the paths.

That's great, but how do I tell Databricks Apps to actually run this thing the right way? Oh, before I forget, there's another thing that's very important for app development: your source code cannot depend on a specific workspace directly, because you're probably going to develop in one workspace and then deploy in a different workspace. So this thing of just hard-coding an HTTP path from a warehouse is actually not good. Let's change that. We have this special thing called app.yaml. I don't know if you've heard about YAML files, but they're everywhere nowadays. There are two things that you can put inside this app.
yaml: one is called a command, the other one is called an environment. Ooh, look at that, that was crazy. Don't want that, though. Okay, so I want to run this with uvicorn app... yeah, I don't need that. I actually want to set workers. So now what I'm saying is: run my app using this command. And I'm going to define an environment variable for this app, and that's going to be my SQL Warehouse HTTP path. I'm just going to call this HTTP path, and where's my HTTP path here? That. Cool. uvicorn, is that right? Anyone shout if you see something wrong. Should work. And then here I'm going to read that HTTP path from the environment, and I think I need to import os somewhere here. Ah, cool.

So what I did here: I'm using a proper process manager, so if your app crashes on a request, your whole app is not going to crash now. And I'm using the environment variable so that my source code doesn't rely on a resource in the workspace. What do I do? I deploy again: preparing source code, downloading source code, deploy done. Let's go see the logs. Okay, requirements have not changed, skipping. Now it's running uvicorn app, blah blah blah. So I can see that I have one new environment variable being loaded in my app, and uvicorn is running and started. I should be able to just go to the app, refresh this page, and the app is still there, running.

Okay, that's cool, but the app is still using the service principal. Sometimes that is exactly what you want, because the end users don't really have authorization on the data. But other times you really want the end user's authentication to be used by the app. So let's do that. Gradio has a thing called request, with the type gradio.
Request, and if I pass this over to my query, I can get headers. So let's get a few things. I'm going to get an email equals request.headers, is it headers.get? Let me check my notes: request.headers. We forward a few things to the app all the time, based on the OIDC server and the identity provider: we get the preferred username, the email, and the user ID for you, and also the real IP and a few other things if you really need them. So in this case I'm going to do X-Forwarded-User-Email... let me just check my docs. Yeah, those are the things that we forward: forwarded user, forwarded email, preferred username, and access token. So let's change that: forwarded email.

The other thing is the user's actual token. This is an OAuth token that was issued at the very beginning. By default the app doesn't have anything injected; everything is off. So let's get the X-Forwarded-Access-Token. Well, that was smart. So if I want to authenticate as the end user, all I need to do now is say access token equals token. And just so we know that this is working, let's print some stuff: print email. Cool. [Music] Done. Let's see what happened here. Things look good. Now if I go and make a request: headers is not a callable. See? Cool, let's change this: request.headers.
get. I asked you to shout if I did something wrong. Look, it's deploying. There it is. Cool. So like I said, that's my email printed there, and now the app is actually using the end user's authentication. What's cool about this is that normally in Databricks, or in other authentication models, like a notebook: a notebook is like, here's my laptop, go run it. That's the authentication model. In an app it's a bit more complicated. For example, say you're uploading files in an app. Where the uploaded file goes in a volume is not any end user's business; that is owned by the app, so you should use the service principal to do that upload to Databricks. But if you're querying some very sensitive data, you actually use the end user's authentication. It is up to you to decide what's the best authentication model to use. And going from a URL to user authentication in the data store is not that simple. I don't know, try to do that in Azure. And that is pretty much it.

Now of course, let's show off a little bit more. We're starting to add this UI here for apps. I have a bunch of apps under my apps. I want to create an app, so I create an app, say, chatbot, and put in a nice description. Another thing that we're working on: most likely you're going to end up in a case where there are a lot of the same use cases in different apps, so you kind of start from a skeleton. Very rarely are you going to get everyone to write code from scratch. So for some types of apps you already have a skeleton, and even more than that, you might have certain templates that you want to use: the app needs to look like that, or look like this. So we're working on ways for you to add templates. In this case I have a single one. You create a template, define where you want to save it, that's great. Get rid of this;
this is a staging environment. So create: this is going to import the template. The template is in a GitHub repository, so you can save everything in the GitHub repository, get your templates in there, and they appear in Databricks, configured for you. And I got this. I go to deploy, and here I have a couple of options, but I deploy. Like the first time, it takes a couple of minutes to deploy: starting compute. If I go to my other app, there under my apps is the demo. It shows all the information about that app in here: deployed, where the source code is, blah blah blah.

And let me just show a couple of the other apps that we have, so you know that it's actually real. This is a chatbot with DBRX. If I go to the URL: a little chatbot, ask something, it returns. This is like 20 lines of code, asking model serving endpoints in Databricks how to do things. It's very simple. The other one is... yeah, everyone loves Streamlit, let's show some Streamlit app here. In this case I have Streamlit running, showing some stuff; it's putting my user information in there. In this Streamlit app I'm also querying with both. You see, in this case I did not grant permission on the data to the app's service principal, so the data is not available to the app, but my own user has access. So that's Streamlit for you. There's Dash... is it done? Am I out of time? There's Dash. You know, I think you believe me now, it works. And that's it for the demo.

Pretty cool, thank you Andre. It's awesome to be able to see how easily you can develop one of those applications. It's been about 20 minutes or so, and you went from copying code from the internet to running it in the context of Databricks, seeing it running, and actually making it go to production. The nice thing about being able to develop these applications in Databricks is that you can basically use Python, you can use your choice of framework. You like
Gradio? Fine. You like Streamlit? Okay. You want to use Dash? Fine. Whatever framework you want to use, you're pretty open in the choices that you have. There's really no magic happening behind the scenes. Like Andre said, what you're developing on your local machine is what's going to end up running in the container in the cloud. You can debug it on your local machine; that really simplifies things. You can use the editor of your choice: you want to use notebooks, you can; you want to use Visual Studio Code, you can; you want to use any IDE, you can. All of these are going to work. Basically, use your own environment of choice. You have a CLI that's working for you.

Most importantly, there is the built-in security, and I think that really is where it all shines: not having to deal with the heavy lifting of the infrastructure or act as an IT person, having it respect all the rules that you've already put in your workspace, being able to automatically authenticate into those applications and apply the rules based on the user who actually authenticated, and being compliant with all the IT policy is fantastic. Having end-to-end encryption of anything that is being sent outside the application really simplifies things. And when you look at where the apps are running, they are all running inside Databricks, inside the Databricks serverless compute managed by Databricks, so you don't have to manage containers, or deal with containers, or even know what is running under the hood. You just get a URL, you send it to people, and as long as they have the right security, they're going to get in.

Governance and access are very important. We showed you how to share that data. You can literally pick the names of the people that you want to share it with, pick emails. You want to share it with non-technical users, people outside Databricks? That's good. All of that works, again, with the single sign
on, and the application itself can either act as its own application and fetch the data as the application's service principal, or it can fetch the data and interact with other assets in Databricks as the actual user. Your choice: you can mix and match, or do one or the other. You really get all of that control.

And finally, as somebody who is developing applications not just by yourself but with a team, all the team practices that you have, you can use when creating applications as well. You can use Git, you can use CI/CD, you can use Databricks Asset Bundles, which we launched recently, to help you organize all of that and manage deployment between different environments. All of that comes with it.

When can you get your hands on this? Right now, Apps is in private preview. We have a good number of customers working with us in private preview, and we have a few open spots, so if somebody's interested in joining the private preview, talk to us after the talk; we're happy to add you, based on what your needs are. Later this summer is when everybody else gets it: as a public preview later in the summer, you should be able to try Apps by yourself if you're not part of the private preview. And our goal is to get this into general availability for everyone by the end of the year.

If you would like to learn more, reach out to Andre or Nick on our team; they are really the custodians of Lakehouse applications in Databricks. If you have any feedback you want to send us, or if you want to join the private preview, feel free to do that. This QR code here can take you directly to the sign-up for the private preview. And that was it. We are right on time, so thank you so much. We really appreciate you showing up, and we hope you enjoy the rest of your conference. [Music]