I'm Alex Bowski. There are also other ways to pronounce my name, which is a fun fact, and you can use the Polish pronunciation too if you want the Polish version. I usually take my coffee any way I can get it. I love a flat white, but it's very hard to find a proper flat white in the US. I got hooked on them in Edinburgh.

Welcome back to the MLOps Community podcast. This is the definitive guide on workflows, AKA DAGs, AKA pipelines. We get into the nitty-gritty of why they're valuable and how you can use them. We're talking workflows within workflows, and workflows that are dependent on other workflows. It's just a constant deep dive on what they are, why they're useful, and how you can best take advantage of them. My man Alex did a whole survey of the different tools that are out there. He covered 79 of them, and we'll leave his blog post, with all of the insights he gained from this survey of the open-source workflow tools on GitHub, in the description so you can check it out. Let's get into the conversation with Alex.

[Music]

Workflow systems: what those mean, what they are, and what got you interested in them. Where are we coming from with that? You've got lots of thoughts on it, and there's this really cool summary or survey that you showed me, so I want to dive into that, but let's just set the scene with workflow systems.

Sure. The idea of a workflow, and I had to look this up, goes back to really the 1920s, which is cool. The term itself comes out of people looking at process engineering, manufacturing, and other kinds of business contexts. So that makes sense, right? And it is how we use it today. But the real genesis of the current generation, I think, comes out of rule-based expert systems, the generation of maybe the '90s, where instead of writing code to implement business processes, people would have rules. It would use a rules engine, and the rules engine would tell the application, or whatever, what the next thing to do is. And so
there's this sort of implicit idea of a chain of things that somebody has specified: what's the first step, and if that's successful, and maybe meets some criteria, then what's the next thing? The average person might think of those as like a filter, but in more computer-sciencey talk, it's a big graph. Inside the graph is a bunch of steps, and the steps are basically tasks that do things: they manipulate data and produce results or have side effects. Then when one successfully completes, some downstream task is invoked, and that is the workflow itself. There have been systems developed in the post-rules-engine world that think of it more like a graph, and they think it's more intuitive for a human to draw out that flowchart of what they want to do. And there's a whole other side of this world that is not used in machine learning as much, which is business process modeling. There are actually standards around that: there's a thing called BPMN, the business process modeling notation. Those are very cool things themselves that people use to describe the processes they use inside their businesses for all kinds of purposes, and having an exchange format and notation for that is great. But I think there's a bifurcation here between those kinds of systems, which were built for business processes in general, and how we use them in data, DevOps, data engineering, ML operations, and so forth. And then there's a small category of workflow engines that are embedded in applications themselves, which are kind of all over the place. So my interest was to look at this in a broader context, because I was revisiting things I've done in the past, and there's all this new stuff, and what is out there? I spent some time on GitHub. I found people had some great lists, and I found some ways to search for these things, and I basically built a big spreadsheet of workflow
systems and the features they have, trying to sort out these different systems: when did they start, are they active, and what features do they have versus others?

So presumably the majority of them are geared towards technical users?

Yes. As I said, there's a bifurcation. The business side of it is more business process modeling. They have nice interfaces, and they're meant for your average user inside a corporation to be able to describe a process. Some of them have a sort of no-code aspect to them. At the end of the day, it's still a very technical thing to draw one of these big diagrams, even if you have a beautiful tool to do it with. You have to know a lot, and you have to think a lot about how we actually do this. Sometimes they're replacing or modeling things that are done by humans. I have a whole story about that from the last company I was at, working with scientists. We were actually using this BPMN notation just to help them draw a picture. I needed a standard way to draw a picture of what they were doing, and so that business process modeling notation was actually a useful tool in the scientific context.

Right. It is funny that you talk about how we can represent it as a graph, because I've heard this before. This last year, when we had the AI Quality Conference, one of the speakers, Solomon, who is one of the creators of Docker, talked about how everything is a graph. My friend David, who was sitting next to me, said, "Wow, I never thought about that." It's like, yeah, everything's a DAG; you can really represent anything as a DAG, kind of. So it's almost like thinking about it like that.

Yeah, graphs are a super useful notation. They also can get messy really quick, right? This is where you need tooling and notations, and it depends on what you're doing with it. What I found interesting is that we stumbled on this BPMN, and I had not
been following that area for a while, but we were struggling with how to draw a picture of these process engineering flows, because we wanted to automate this. We have machine learning, we have robots, we have humans, and they're all part of this process, with different interactions. When a person does something in the lab, they literally have a plate with wells in it, with inoculant and stuff in it, and they're putting it inside a liquid handler robot or inside an incubator robot. We need to know what task the human did before the robot is told to do its task, and before the machine learning is told to do its thing with the resulting data. Modeling that whole process let us build tools and user interfaces to make the lab more efficient. One of my takeaways from that experience was that in what we do in MLOps, we're just the technical part that we're concerned about, and then there's how it's used in the bigger organization, and that's also a workflow. So it's workflows inside workflows, and where you slice it is important to the result you're trying to get. In this case, we were trying to make the processes the lab used more efficient, to get more throughput out of the lab, as well as to get all the interesting results from the computational side of it. Writing that workflow down as a whole thing, with all the technical bits in there so everybody understood all the parts, was a challenge. So having a notation was our first step: how do we communicate this? How do I get literally a drawing, which doesn't have to turn into code, just a drawing where everybody says, "Yes, that's what we do"? Right, yeah, and from all sides. And even if their eyes glaze over for a part of it that's not their thing, we all have consensus and we're building around the same thing. So we stumbled
across this mostly because there was a really cool web-based tool out there from a company called Camunda. They have an open-source thing that runs in the browser: you can drag and drop things and build the whole workflow, and then it spits out an artifact, a BPMN file, so theoretically down the road you could do something with that. In our case, it was simply a diagram that we could use as a communication piece, as part of our internal technical documentation. But it was a good starting point: just drawing a picture of what we are implementing with our workflow systems.

You talk about the way that you slice it, and being able to look at the workflows almost from infinitely zoomed out to infinitely zoomed in, and all of the different layers of workflows involved each time you zoom in. Then there's where one workflow starts and one stops, and who owns which workflow, and then you're getting into technical workflows versus non-technical workflows. Do you have any experience with what is useful in that regard, like how to slice and dice these?

Yeah. I think one of the things is to maybe not think of it as there being the one workflow. On the outside, there's the process of your business, whatever it might be. So if you're a very in silico, technical kind of organization, like you have a digital product and you're using machine learning, there's the internal technical piece of how that model does inference, or is trained, or is evaluated. That's a very technical workflow, something that a small team somewhere in your organization really understands, and beyond that it's a black box. It's a black box in a larger workflow, which is how your organization uses that model and how they make decisions around it. So, train a new model: the output of that whole process might be not just the model but how good it is at a particular task. Somebody has to make a decision about
what they do with that. That decision could be automated: if it passes certain criteria, we put it into some kind of production track. Or it might not be; there might be a human who has to go in there, make a decision, and start the ball rolling. That's part of a bigger process, and that's sort of a business process workflow. So I think there are opportunities here to have a layered model, where you may be using different technologies, different workflow systems, but they're still interacting, because one is using the other. In the case of my last company, the outer workflow of what the scientists' labs were doing, versus the technical bits of each tool they were using, was just a paper discussion. It's a diagram, it's part of the documentation, and it's there so that we all understand what we're doing. Maybe the ultimate goal is that it's executed by some system, but that's a long-term goal. The short-term goal is: if you dig into the boxes in this thing, how do we accomplish that task? Maybe there's a big procedure, a wet lab thing. Maybe it's a robot. Maybe it's a whole machine learning workflow that runs a bunch of code and manipulates a bunch of data. So you can draw the workflow out, you can execute it, and you can have a system that actually implements it, and I think those are useful architectural ways to decompose the problem. Then you can choose where you spend your time and money implementing a system. And there are choices for each of those things. That's what's cool about this: you can go full on, like, "I've got workflows for everything and I've found a system for everything," and there are tools for those things. It's the cool place where we are at this point in time, which was not true even a decade ago.

Yeah. It does feel like if you are able to understand the different workflows and what happens after your workflow ends,
in a way you are setting yourself up for greater success. If we take that example, there's your little piece, which is the model, and then exposing the model to the greater organization. That's great, and you can be optimizing the model for things that you think are cool. But if you really know what happens after you expose it to the organization, how other teams are using it, and what they're looking for from that model, then you're optimizing for something greater than just what you think is useful, and you understand how it's being used in the greater context. Have you seen that be the case, that when you notice all the different uses and dependencies being built on your specific workflow, you're setting yourself up for more success?

I think it depends on the scope. Workflow systems, when they are properly handled, whatever kind of technology you choose, have to be supported, they have to be vibrant, and they have to meet the users' needs. At Stitch Fix, when I was there, there were something like 140 data scientists running stuff for the organization. Some of these things were back-of-house types of things. Some were daily things that populated their internal systems, and those were really critical. Some were research work, and so forth. So it was all over the place. For the ability to describe the steps of things and interact with them, it was Airflow underneath, but they didn't actually interact with Airflow. They had a DSL, and that's a term we should define: a domain-specific language. They had a way to describe the workflow in a DSL, and then they could hand it over to the system, which would run it through Airflow underneath, and then there was a way to execute the tasks on their big batch system. All that complexity was hidden from those 140 users, so they didn't have to become experts on that technology, and
it did lots of good things for them. The task executor auto-learned stuff, like, "Oh, your task needs more memory, so we're going to retry it with more memory because it ran out of memory," and it learned the right parameters for the user. So again, they didn't have to be experts on deployment. You can do lots of cool things with workflow systems like that, and that makes them more productive, right? They are less frustrated, until the system breaks, and then they're frustrated with it. But happy users are silent. So I think I've seen that kind of success. What I haven't seen, and maybe someone will tell me I'm wrong, is it jumping silos. So that's a data science organization in a big company, and the tools are used internally within that part of the organization. But you don't see nesting of workflows as much, where this is now part of a bigger workflow. I think those processes are more bespoke. The output of the machine learning workflow, the model that was updated, is used by some other system through some other connection that is specific to how we deploy that thing. The idea of a workflow using a workflow, where different parts of the organization are each using workflow systems, I haven't seen in real life. I'd like to see it. I think that's a vision: that we describe our organizations as processes. I like to think of it as process engineering, and in fact that's a term I'm borrowing from manufacturing, where everything has to follow a procedure, we describe what the procedures are, and they fit together like puzzle pieces. I think businesses in digital contexts can work the same way, but you have to do that process engineering, and it's hard, so people kind of skip that hard step. Where there's a payoff, people will use these systems, and I think we see that with data engineering and
MLOps and operations, where there is a payoff, because there's all this complexity about how we do our tasks, and we can hide that in the system and make the end user, who is a data scientist or machine learning engineer or data engineer or some other kind of data person, more productive, because most of those details are left to a very small handful of people in a corner making that workflow system work for them.

Why do you think it is hard, as you mention, to do this process engineering? Is it just because it's time-consuming and cumbersome, and there's a lot of friction in describing the systems and explaining the processes? You can't just do it and have it auto-documented for you, really.

Right. I think my last company experience is interesting because my main users were internal. They were scientists, and this is not a worldview they're used to, in terms of, "We're going to describe what we're doing in our lab as a graph of tasks." They really are writing procedures, operating procedures, and it's a big document with hierarchies of things: next you do whatever, and next you talk to this machine and set up these protocols. So there's the idea of looking at it as a graph of tasks and how they interact, and annotating it with: here's the data that I need, here's the prep that has to happen before I can start this task, here are the controls I have, here are the failures that could happen in this situation. That level of detail they're just not used to writing out. And you can understand that, because it's not useful to them in their day. They know how to handle those situations, and they're familiar with the equipment and with the work they're doing. That's what they've been trained to do, and that's what they've been doing in their careers. But that doesn't hold when you try to scale, and this is where the problem comes in:
you try to scale it, you try to understand more about the metrics of it, the data artifacts, the other things you might be able to get out of the system, and then you need that process engineering. If I'm going to automate something, you have to be able to draw me a picture of what you do and tell me all the facets of it. That's just a hard conversation to have. I had varied success when I did that. Some people really pushed back hard. They were like, "I don't know why I would have to do this. Here's the thing I'm doing, here's the procedure I have, it works fine for us." Others were like, "Sounds interesting, the diagram looks interesting, but I don't quite understand it." And then, when they engaged with it at different points in time, all of a sudden they were like, "Oh, okay, I can see some value in this, maybe as a way to document what we're doing in a different way." That's a great opener, right? If you see value in drawing the picture, and I can take the picture, work with one of these tools, and actually build a system from it, that's great. So I had that cool range of responses from people. I think that is the challenge: an organization has to be able to attach some kind of value to it. What are we getting out of it? What's the ROI of doing all this process engineering? It's why in places like manufacturing it makes total sense that they do this kind of engineering, because efficiency, quality, and measuring stuff are what manufacturing is all about. But when things are much more, I wouldn't call them boutique, that's probably not the right term, but basically when they're highly trained people who are building these things, running a lab, or running the business side of your organization, they don't see that sort of technical benefit, and you're asking them to speak this weird language. What's the benefit of that? So you have to lead them with where the value is, and so I
always like to build demos or prototypes, or find some exemplar that's going to give them the why: why would I do this? That's where we started at my last company and at others. You've got to build something that has a captivating value that somebody can see, and then there's an institutional investment in going beyond that.

Yeah, I noticed it with just this podcast, for example. I had a friend tell me really early on, "You're spending too much time on that podcast. What is your favorite part about it? What do you like doing? Then let's figure out how we can automate the parts, or get someone else to do the parts, that you don't like doing." There are a lot of intense things that go into creating podcasts. I was doing one a week and feeling overwhelmed, because there's so much that goes into it, and I was dropping the ball on a lot of stuff. So I recognized that I needed a system. I was lucky enough, because my friend told me, "Hey, sit down with my buddy. He's really good at this type of thing, at recognizing where you plug in and where you don't need to plug in." So I sat down with the guy, and he just said, "So what do you do first?" I said, "I find the guest." "And then what do you do once you find the guest?" "I ask if they want to come on." "Okay, and then what do you do?" We just went through that: what if the guest says no, or what if the guest says this? Three weeks later I had a very in-depth flowchart, and it helped me so much, because it was the blueprint. So, taking a bit of a left turn, I'm wondering: what are the most interesting trends that you've seen in workflows for machine learning?

I think one of the interesting trends here is that workflow systems have really grown up in the last, I would say, decade. Ten years ago it was more of a niche thing, like, "Why would we do this? It's weird. I can just write all the code, or I'll write a bash script." Now it's
much more common practice that, oh, you've gotten to the point, from building whatever your prototype is, where you need to have a workflow system. Here are choices of systems out there that people are using: there are older ones like Airflow, there are newer ones like Metaflow, and everything in between. Then you pick your technologies and things. So I think there's been that change to where this is not a hard decision where you have to convince people. No, it's like, yes, people use workflow systems to train, to do inference, to do all kinds of tasks for them, and you should have one, right? And what's your deployment of choice? If you're using Kubernetes, there are all kinds of choices. There are others that do it more like service orchestration. You pick from amongst the menu of things. There are SaaS services, and there are things that you deploy yourself. So I think the great thing is that we've moved from it being a hard sell in a corner to this being just standard practice. There are a handful of companies that are doing this as a business, which is great. There's a lot of open source here, even among the ones that have a SaaS system but whose core technology is open source, which is why I went to GitHub. It was easy to find a long history of these things, over the last couple of decades actually, of people building these systems. Some of them are still active, and some of them have gone stale and are no longer active projects. And they're in different categories. There's the whole business process stuff, which we've been talking about a lot at the high level, and then there are these other categories. When I did my survey, I made these splits: there are things for business processes, things for the generic aspect of it, and then there are things specifically for science, which has its own challenges around HPC computing. Then there's the sort of category that starts with data
engineering, then there's the data science and ML side of it, and then there's a little bit on operations. You can see the newer generation of tools came out of that data engineering side and then grew into data science and ML, with a little offshoot for operations, which is more like systems management: how do I add a new node to my Kubernetes cluster, how do I install software across a bunch of machines, et cetera. It's the same kind of workflow problem, applied in the operations context. Of the 79 systems I looked at out there, something like 46% are business process automation. The other big chunk is the data science and ML side of things, which is about 22%. So that's a trend, right? These systems are growing and active. And it's not like we're not using the business process stuff. That is a whole healthy world, and those systems are growing as well. But there's the focus on MLOps, which is obvious in lots of contexts: these are multi-step processes, and this is where a workflow system of a certain sort can shine. I actually went in and asked, when was this project created, and is it still active? Those are two dimensions, because some things are personal hobbies, and some things are products that maybe somebody abandoned, or the company went out of business, or something. There are lots of reasons why things stop getting developed; I guess they get usurped by a new thing. If you look at that, the business process stuff has the oldest repos out there that you can find, of various products. It goes back to almost 2005 or 2006, somewhere in there, and there were still new projects being actively created up until sometime in the last year. So people are still innovating and building new things on the business side of things. It's the same thing, but when you go look at data
science and ML, it's more like 2015, and then from there active through last year, and it shifts back a little bit for data engineering. Science had its heyday from the 2000s to about 2015. There are lots of reasons for that. Those systems are still actively being used, but, and this is my take, the ones that solve a very specific HPC problem, like "I am doing some massive model," are the ones still in use, whereas for machine learning, data science, and data engineering, people have new choices, right? They don't have to use these other systems. So I think there's a bifurcation there of use cases. It's interesting to see that there are different trends here, but they're also in these different columns of use.

Right. It's fascinating that you break it down like that. Besides the business side of the house, if we're looking at the technical stuff, I can probably rattle off three or four data engineering ones when it comes to the most popular. You've got Airflow, which has proliferated everywhere, and most people are using it. Whether or not they like using it is another story. It's been around the longest and it's had the most adoption. Then you have the Mages, or, I think it's Dagster, and the Prefects out there that are attacking it in different ways. It's almost like the data engineering workflow 2.0 type of thing, I would say, because they're a bit newer and they're taking a different approach to things. Then in the ML world you have the ZenMLs and the Metaflows, like you said, and even Flyte, I think, is another one in there. Those are all fascinating because they're going after the ML-specific type of use cases. And then in the DevOps world you've got your Argos, and maybe you could consider Kubeflow in that world; it kind of plays in both the ML world and the DevOps world.

So when I was surveying this, I had a whole column, and this is a
spreadsheet, so I was yes/no-ing and trying to put categories on things. I had a whole column about machine learning and AI, and it was one of the most challenging, because I'm looking at the documentation and at the code in the repo, and everybody who has an active project is adding ML and AI to their documentation, as in, "We do this." So I marked those projects as neutral. Where I found actual evidence, like "we have tasks for it, we have examples, here's a workflow that does whatever," something that shows you can actually do it, I marked it as a yes. I think there's a nuance here, which is that any of these workflow systems that have some kind of model of a task executor, and a lot of them do, Airflow included, can do advanced machine learning workflows just fine. That's because the task executor could be some complicated thing running on Kubernetes, it could be some other system that you're interacting with, or it could be an inference endpoint that you're using at some inference provider like Baseten. They all have that capability. And that's true for the people who have business process systems too, or, say, an older, more mature product that's now saying, "Hey, we can do machine learning and AI." It's probably true there as well. The question is how easy it is for the practitioner to actually do that. Even though they said it, do I have to do all the heavy lifting to make it happen, or do you have infrastructure for me? Do you have examples? Do you have documentation? Have you thought through the nuances of what I need for, say, a model training pipeline, and how I get access to assets and GPUs and the things that I need? Or is that stuff I have to figure out even though I'm using your workflow engine? I think that's the differentiator. Some of the newer systems are also code-oriented; they're on that infrastructure-as-code track,
and so if you're going to describe your workflow, you use Python annotations, and you don't have to deal with a DSL and all these other things. You just write code. That's a trend right now: everything is code, we just write it in Python, and we use annotations. That works for some people quite well, and there's nothing wrong with that, but it's not the only way. With older systems like Airflow, there's also a differentiator in that the DAG is stored in a database, so there's not necessarily a representation of it other than talking to the system, and there are a bunch of systems that work like this. Then some things have DSLs: it's a YAML file, a JSON file, or some other custom language, and you write in that DSL. And some things have just code. That's why I tried to make a differentiation in the analysis of these different systems, looking at how you interact with these things. The trend for data science and ML is more code, more annotations, and fewer DSLs. Maybe we should talk about DSLs at some point. The older trend has been that there's a serialization format: there's a thing you can author, an artifact that is the workflow itself. It's encoded in JSON, XML, YAML, or some custom language, and it's a piece of code itself that you can check in somewhere, but it's not Python. It's something else that describes all the metadata around the workflow, and you treat it as such. That has its use cases, but the trend is away from that right now, I think.

I don't want to let that slip. You mentioned to me before we hit record that it feels like everything is moving towards code in this infrastructure-as-code world, and the big question there was: is that good?

Yeah, I have mixed feelings about that. I understand how it is when you're writing something that's very compact and does a very technical thing, and it's all sitting there in Python. It's a bunch of PyTorch
and other things like that, and being able to wrap those up in functions and then organize those functions into a workflow as a sequence of steps is very elegant and useful. But the challenge is that the code is then the only way you can understand what the workflow looks like. If you want to draw a picture of it from your code, you have to run the code somehow and get an artifact. And what is the artifact? You don't have a DSL, so how do you generate a diagram from that? A lot of these tools will do that for you. They will make a picture for you, and maybe you can save it as an SVG, maybe you can't, maybe you can take a screenshot, whatever. But you have a picture, and you can give somebody that picture and say, "This is what we're doing," and you can have a discussion about it. So I think the problem with the annotation side of things is that only people who can write code can understand that workflow, and I think that's a challenge. If you're like me and your ultimate goal is more workflows and nesting of workflows, there are people who don't write code. Your pure machine learning workflow is a part of a bigger system, and that's a big workflow, and we have pictures for all the rest, but yours is a black box, and we don't understand how it works, what the different failure states are, and so forth. So I think it runs afoul of that. I don't think the infrastructure-as-code approach, the annotation approach, is bad, because it could produce artifacts that are the definition of the workflow. I just don't see a lot of evidence that that is where these tools are going right now. Maybe, as they go on their adventure of building their systems and services, something will come out. There's a wide variance in what these DSLs look like, and there's a lot of history there. There have been some attempts to standardize it in various
contexts. The science domain had a YAML-based thing called Common Workflow Language. BPMN is something from the Object Management Group, I believe, as a standard for business process modeling, and they have a notation, so that's another standard, for the diagramming of them. Have they taken root in various communities? Sure. Are they widespread? Probably not. So it's not clear that there's a real winner there, and it's not clear that we necessarily need a standard. But certainly within your organization, if you had 10 different formats, you'd probably be unhappy.

Yeah, you'd be going crazy. There's some work to be done there. But when it comes to these domain-specific languages, is it something where, in an organization, you choose one and you go with it, and you can abstract away? Like you were doing at Stitch Fix: you mentioned you had the end users using the domain-specific language, and then underneath the hood you had almost that infrastructure-as-code layer, and it feels like that was working well for you all.

Yeah, maybe. Depending on your perspective, it was working well to keep the system's longevity, so that as we changed how those tasks were interpreted by the system, as technology changes, the workflow is just metadata about what we would like to see happen and how the steps are chained together. So I think there's value in that. But there is definitely some pushback that comes with that as well, because it's another thing, another artifact, it's external to your code, and there can be mismatches. So there are lots of challenges there, and I think it really depends on the particular user that you are interacting with. What's nice about most of these things is there's a high prevalence of people coding them in YAML. You could like it or not. I don't find YAML a problem, but a lot of people don't like it, and that's okay. But the structure is pretty much the same: there's a list of steps, and the step has a
bunch of metadata, and they all have names, and they point to each other through some mechanism. Having that kind of common format lets you take a workflow from system A and a workflow from a different system B, and if they're both in YAML, you can think about how you would represent those. They're artifacts that you can check into source control. You could generate diagrams from them; maybe there's a tool and a system for that. So I think having a DSL has benefits like that, where it's just something you can parse, and you don't have to run code, because that requires infrastructure, that requires environment setup. You can just parse it and understand what the steps are in that thing. And that's when people jump off and say, oh, we should have a standard. But standards are hard, getting people to agree, and they take a very long time. I'm not saying that won't happen; it's just not something I see happening right now. But maybe there'll be a need for it in the future. And then there's the whole custom side. People have created their own little languages, declarative languages for describing these things. That is definitely on the decline; I don't think people are doing that so much anymore. That trend is fading. There's a reason why they do it: you can make a nicer thing. But then you have the problem that people have to learn your syntax and semantics, and maybe that's more trouble than it's worth at this point.

Well, especially onboarding new folks. Everybody's got to go through that, and so you're now creating a more cumbersome process to get someone up to speed.

Yeah. And everybody likes to use the term DAG, which I always push back on, because not everything's a DAG. A DAG is acyclic, but there are workflows that have loops, and so they're graphs in general. And sometimes they're forests, in the sense that there could be workflows that have two different independent pieces.
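As a rough illustration of the point above about parsing a declarative definition without running it, here is a minimal sketch in Python. The step layout (`name` and `next` fields) is an invented format for illustration, not the schema of any particular tool, and it is written as a plain Python structure of the kind a YAML or JSON parser would hand back. The sketch checks whether the graph has a loop (so it is not a DAG) and counts the independent pieces (more than one would make it a forest).

```python
# Hypothetical workflow definition, shaped like what a YAML parser might return.
WORKFLOW = [
    {"name": "prepare", "next": ["train"]},
    {"name": "train", "next": ["evaluate"]},
    {"name": "evaluate", "next": []},
    {"name": "audit", "next": []},  # not connected to the rest: a second piece
]


def build_edges(steps):
    """Map each step name to the list of step names it points to."""
    edges = {s["name"]: list(s.get("next", [])) for s in steps}
    # Make sure every referenced target also appears as a key.
    for targets in list(edges.values()):
        for target in targets:
            edges.setdefault(target, [])
    return edges


def has_loop(edges):
    """Depth-first search; reaching a step that is still in progress means a loop."""
    NOT_VISITED, IN_PROGRESS, DONE = 0, 1, 2
    state = {name: NOT_VISITED for name in edges}

    def visit(name):
        state[name] = IN_PROGRESS
        for target in edges[name]:
            if state[target] == IN_PROGRESS:
                return True
            if state[target] == NOT_VISITED and visit(target):
                return True
        state[name] = DONE
        return False

    return any(state[name] == NOT_VISITED and visit(name) for name in edges)


def count_pieces(edges):
    """Count independent pieces, ignoring the direction of the edges."""
    neighbors = {name: set() for name in edges}
    for name, targets in edges.items():
        for target in targets:
            neighbors[name].add(target)
            neighbors[target].add(name)
    seen, pieces = set(), 0
    for start in edges:
        if start in seen:
            continue
        pieces += 1
        stack = [start]
        while stack:
            name = stack.pop()
            if name not in seen:
                seen.add(name)
                stack.extend(neighbors[name] - seen)
    return pieces


EDGES = build_edges(WORKFLOW)
print("has loop:", has_loop(EDGES))
print("independent pieces:", count_pieces(EDGES))
```

Nothing here executes a task or needs an environment; it only reads the description of the steps, which is the benefit being described.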
There are lots of complex things out there, but those are the edge cases. Even things that have loops are the edge cases. So the DAG term is a simplification, computer-science-wise, to make it nice to execute. But even with a DAG you can have meets and joins, so you can have a little sort of loop in your thing, and you've got to cut that somewhere. If you're writing it in YAML, JSON, whatever, you've got to cut it, and how you make those choices gets easier or less easy as the thing gets more complex. That's where you need tools at the end of the day.

Or if you just have thousands of different DAGs or workflows in the organization. I've heard so many stories of folks who are like, yeah, we started as a startup, and then we had success, and the Airflow DAGs just kept growing, and nobody really went back and sorted those out. And so you have that sprawl.

Yeah. When I was at Stitch Fix, we had Airflow, and underneath there we had the system with the DSL, so I could pull all of the flows, the thousands that we had, and look through them. What I found interesting is that there were some oddballs that did all kinds of crazy stuff, but most of them were a straight chain of steps: A, B, C chained together. That's most of what people are doing, and it's not a surprise. So for all this talk of it's a DAG, it has loops, or it's not a DAG, most people's case, like the 90% case, is probably a straight-through chain of things. I'm guessing at that number, but what I found is that overwhelmingly the majority, way higher than 50%, and this was an organization that had been doing this for a while, was these straight-through chains. And it makes sense. There are some preparation stages for what you're doing, and then there's the main event: you're training the model, you're doing inference, you're upserting into a database. And then there's some cleanup, maybe, at the end. And that's most people's workflows.
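The "most flows are a straight chain" observation is the kind of thing that can be checked once the definitions are just data. A minimal sketch in Python, assuming an invented step format with `name` and `next` fields (not the actual Stitch Fix or Airflow schema) and three made-up example flows:

```python
def is_straight_chain(steps):
    """True if the steps form one A -> B -> C chain: no branches, joins, or loops."""
    names = [s["name"] for s in steps]
    if not names:
        return False
    out_edges = {s["name"]: list(s.get("next", [])) for s in steps}
    in_count = {name: 0 for name in names}
    for targets in out_edges.values():
        for target in targets:
            if target not in in_count:
                return False  # points at a step that is not defined
            in_count[target] += 1
    # No step may fan out to several steps or be joined by several steps.
    if any(len(targets) > 1 for targets in out_edges.values()):
        return False
    if any(count > 1 for count in in_count.values()):
        return False
    starts = [name for name in names if in_count[name] == 0]
    if len(starts) != 1:
        return False
    # Walk from the single start; a chain visits every step exactly once.
    seen, current = set(), starts[0]
    while current is not None and current not in seen:
        seen.add(current)
        targets = out_edges[current]
        current = targets[0] if targets else None
    return len(seen) == len(names)


# Made-up flows for illustration only.
FLOWS = {
    "nightly_training": [
        {"name": "prepare", "next": ["train"]},
        {"name": "train", "next": ["cleanup"]},
        {"name": "cleanup", "next": []},
    ],
    "batch_inference": [
        {"name": "load", "next": ["score"]},
        {"name": "score", "next": ["upsert"]},
        {"name": "upsert", "next": []},
    ],
    "fan_out": [
        {"name": "split", "next": ["left", "right"]},
        {"name": "left", "next": ["merge"]},
        {"name": "right", "next": ["merge"]},
        {"name": "merge", "next": []},
    ],
}

chain_share = sum(is_straight_chain(flow) for flow in FLOWS.values()) / len(FLOWS)
print(f"share of straight chains: {chain_share:.0%}")
```

Run over a real catalog of definitions, the same check would give the share of flows that need nothing beyond sequential execution.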
And that's, I think, how these ML and data engineering workflows differ from the business process workflows. Business process workflows aren't a straight-through chain. There are decisions being made: did we answer this customer's question? If that system fails, we go over here to do something else. If there's a transaction and the transaction goes through, the order gets made; if it doesn't go through, there's some other process. So those business process workflows have much more complexity in them. They have a lot of branching and conditionals, and they have a lot of side effects: if we succeed here, we're going to notify another system, we're going to send a message to the customer, or something else like that. So they have all these side effects that happen along the way that, although you can imagine them, are just not seen in practice as much in MLOps. So that's how these systems are different, and that's why there are different products for them.

It makes sense. I'm just thinking, what you get with your favorite DAG tool is a Slack message, right?

Which is useful, right?

Yeah, exactly. But it's not like what you're talking about with this super complex logic, or loops happening, or then spitting out into another subset of a workflow. It is fascinating to think about. I can't believe it took us whatever, 40 or 45 minutes, to get into this part of the workflow engines, but I have to ask about agents and what your take on them is as almost like workflow engines, and workflows in general. And I will preface by saying we had Igor on here a few weeks ago, and he was talking about how agents, or even LLM calls, should just be seen as another step in a DAG: oh, we're taking messy data and making it neat and tabular, and that's one of the steps of the DAG. But with agents, I've seen so many different examples of folks who have tried to
explain the agents, or have the agents work, as DAGs, where they just make up the graph on the fly, and you now are dealing with this workflow that was created by an agent.

There are some companies in the mix that I looked at who are specifically focused on agentic systems. I put them in the category of sort of the business process, because their tools look exactly like that. They're a little more generic, in the sense that I would think of them as supporting more like your chatbot type of interface, where you're talking to your favorite airline and things are happening. When you say, yeah, I'd like to buy that ticket, or I have this problem with my luggage, it interacts with a bunch of stuff and comes back to the agent interface. That's a workflow that they have to manage, and it's automated in some capacity. So there are people who are building products for that kind of workflow, and I found them amongst the things that I was surveying. I think the interesting side that's not that is this challenge of the LLM-based agent system, where you have some inference that's happening and then there's a consequence: it has put something out there that was either code or something that's coming back to the user, and that's part of this bigger workflow. That's more like an embedded workflow, and that could be done on the fly, as you say: it could be generated by the system itself, it could be generated by another piece of code or some other model. And part of the mix here are these things that are more like workflow engines. They're not systems necessarily, as much as a library to do this kind of workflow orchestration. You can write your thing, or give it one of these DAGs of things to do, and it will run it to some completion. That's more like an embedded workflow engine, and
there's a bunch of them like that out there. That is interesting because it goes back decades to what people were doing with rules engines and workflow systems, because those were engines that you put inside a product. Sometimes they were desktop applications that were doing this stuff, and they were running the rules and acting on your behalf, and there's a user in front of them. So now we've got a chatbot interface or agent out there that somebody's interacting with on a website or through an app, and it's doing the same thing, but at a different scale. It's not running in your desktop app, it's running out there in the cloud somewhere. But it's the same thing: it's an embedded engine running an embedded workflow, dynamic or not. So I think there are just some cool possibilities there. This is a good research topic for somebody, or myself: what have people done successfully? How is it architecturally different from what people are doing now with these interfaces? Where's the, sort of, there there? What can't you do with the current systems that you could do with this sort of theoretical embedded LLM-based thing or agent? There are a lot of possibilities.

What I hadn't thought about before, that you just opened my mind to, is how the agent is almost a gateway to choosing the right workflow. We talk a lot about agents being able to choose tools or have access to tools, and most people think, okay, now it can scrape the web, or it can have access to my database. But the thing that you just said is, yeah, one of the tools might be that it kicks off a workflow. So then you don't have to worry about the agent spawning a workflow every time, where maybe it spawns the wrong workflow or the workflow isn't exactly what you need to happen. The agent just has to choose between which workflows it needs to use. And that is very much like
going back 20 years, but now we have a little bit looser way of having the end user interact with the agents, or the if-this-then-that statement.

Yeah, that's kind of the interesting juncture that we're at. We have large language models, and we have these systems, workflow systems, and we've taken them apart. So now we have these pieces that are much more advanced than they were back then, and they're not all in one system, pointed in one direction. We can take these puzzle pieces and put them together like Legos and make different things out of them, and since the technology is more advanced, we can do some really amazing stuff with it. I think that's a nice juncture that we're at. I was surprised, going through the list of projects, how many there were that were still active in all the categories that I had. And 79 is not a huge number; there's a lot of noise out there. But people are actively developing these things, they're maintaining them, they're using them for things, and they're in a variety of contexts. So maybe it's not what you hear out there right now in terms of the buzz in the industry, but this is a very healthy, vibrant area of work. People are obviously using it for stuff, because these projects are active. Some of them are niche and in a corner, some of them are commercial products that people are selling, and everything in between. I think that's good for users out there, because they can find the thing that matches their need. The only challenge is that there's a little bit of tyranny of choice. If you're new to this and you're like, I need a workflow system for X, you've got some choices that you have to make.

How do you categorize the RPA systems? Would that be the business ones that you're talking about?

RPA?

RPA, what do they call it, robotic process automation, I think, is what it stands for.

Okay. I don't... I'm
trying to think of whether I ran into any that were more on the automation and manufacturing side, and I'm going to say that I didn't see a lot there in what I surveyed. So that might be a whole different thread of this. Certainly there are lots of people doing things like we did at my last company, where we were sending a protocol to a liquid-handling robot because it's part of the automation of what's happening in a lab. For that kind of automation, in biotech and in pharma in general, you can use the workflow systems that we've been talking about here, because it's a service call to something: the thing has an API, and you push a protocol to it and tell it to go. So that level of automation I think people are using these kinds of tools for. But the more industrial things, there's a whole other world there. I learned that in science there's a whole other world of plant automation that uses a different technology, and it uses some really old technology, which is scary. It uses OPC, which is Microsoft OLE from like 1995.

What?

And so that's why you can hack these things, like a power plant, for example.

Yeah, hold it for ransom.

So there are some areas there that are completely off the radar from this group of workflow systems, and there might be really good tech there. I just think that's a whole other world.

Yeah. So you've been seeing the trends of what's dying and what's growing, and you did all this research on it. Do you feel like you have any bets or guesses on what the future of these systems holds?

I think there are some good contenders in the SaaS realm for machine learning workflows in general and MLOps, and they have some nice tools. I think the challenge for the machine learning and AI context is that we're just getting started, and so their customer base is all these early adopters, startups and people like that. So if you take this thing
and you walk into a large enterprise that's regulated and that has, you know, air-gapped systems, they can't use this SaaS system.

Yeah.

And there are the big applications in these places. Making that leap from, you can use your SaaS service and everything's cool and we're writing Python code and doing all this cool stuff with it, to these places where it's highly regulated, where there are all these other enterprise challenges and you have to have certain kinds of certification to be able to operate in there: that's where the revenue is for these companies to potentially go to. So they have to make that leap, and there are some that are doing that. The evolution of some of these companies to be able to provide enterprise products that really meet these other needs, where it has to have certain security levels and certain compliance things, and be able to work in these non-cloud-based environments or private clouds and so forth, that is a growth area. And you have to be able to survive that, because it's also expensive. So I'm watching to see who matures in that realm. That's why some of the projects that are open source, good technology, and well supported work well in these places, because their technology teams can take them, put them inside these environments, and deploy them, but then they have to manage them themselves. So I think that's a trend. I was happy to see that the business process automation thing seems to have a healthy community of people using it. I think agentic systems are an area of growth for them, and to some extent the ML and AI stuff is an area of growth for them. I'd be curious to know if they get traction in there, versus just saying, we do ML. Are people turning to these other products that are a little bit older? We'll see.

It's almost like, with the business process automation, I see now they're
incorporating the capabilities of LLMs into their products. So now, as one of the steps in your whole workflow, you can add whatever an LLM call is capable of doing, whether that is summarizing a bulk of text, or you scrape a website and then from that scraped website you pick out the most important stuff, and now you have that data to pass on to the next step. So you've got new tools that you can work with. I've seen it done really well with friends in marketing who are trying to create content. You've got the easy way of creating AI slop, which is just saying to ChatGPT, create me a blog post about whatever, GPU consumption in the US, and then it'll spit out whatever it has inside of it. Or you can start by saying, create me an outline, find three relevant blogs that talk about it, and then you as a human choose the different blogs that you like, and it uses the information from those blogs to create an outline. Then you say, now create the intro paragraph, now create three body paragraphs, and you really prompt-engineer it to be a much more in-depth type of workflow. And the tools are now giving you those capabilities by default, because you have the LLM calls.

I think one interesting possible evolution here would be that you have engineering teams, which are often very expensive, taking things like gen AI technology in general, LLMs, and doing various things to them: their quality checks and stuff like that. Does this thing pass our tests? Is there a risk assessment test to make sure that it's not going to do something bad or spit out bad results, guardrails and all this stuff. So you imagine that there are these more technical workflows where they can take a new model, run it through its paces, and say this is good, this is not good. Or there's some score, maybe; it's not a black-and-white type of thing, it's a score of how well it does in these different dimensions. But as
there's more to do and there are more uses of these technologies, you can imagine that there's a higher-level user there who's not on the technical team. This is their product, and there's a new model, or there's a fix, or there's a better model that doesn't do the bad thing, because they tested the new version of it, and there's a workflow for accepting that and rolling it out to their organization or in their application. If that always has to be a technical engineering problem, that's expensive. So I can imagine that part of the way people use workflow systems is that kind of multi-layered thing, where there are technical workflows that use very specific technology geared towards the task at hand, training a model, evaluating it, producing these risk scores, and then there's the business-level workflow that's saying, how do I take that model and get it into my application, roll it out to my users, get it into production? And there's a gatekeeper, and this is where human in the loop comes in, which we haven't talked about. There's a human-in-the-loop step there: do I want to do this? Because there's a business decision to roll this thing out. When you codify that as a workflow, the only way it gets out the door is that somebody goes and does that human-in-the-loop step of saying yes. That's a human decision maker. It's not just a technical team somewhere where, if they do the right thing in their DevOps, it goes out the door. Maybe they no longer even have that ability; it's only done through the workflow system, and there's a human who makes that decision, and it's traceable in your organization, and maybe it's less prone to mistakes like accidentally rolling out the wrong model, or one that doesn't pass your tests. So that kind of level of control, and then bringing it out of the engineering organization and back into the hands of a product manager or a business
user of some sort, I think that's going to make the thing cost less, and you're going to get better results in terms of quality. That's where a workflow system, and different layers of them, can be super helpful. I would love to see that kind of thing. These things already exist, but they're run by technical teams, who are using the same tool to do their DevOps and their operations. So the tools are all there, but I think the business process automation inside of it is not there as much, because again, those people are usually like, yes, engineering team, roll out the new version. It's a Slack message to a person, and then a person goes and does a task.

Looks good to me.

Yeah, and that's where we are. I think the challenge with LLMs is they're squishy, and someone needs to look at these risk scores, and there are new benchmarks and cool things coming out from that, and then they make a decision about whether the risk is acceptable to roll out this new version of whatever model we're using from whatever provider. We've done our tests, we have our evaluations, and now I have to make a decision. My job as the product manager, or as whomever, is to roll this thing out. You don't want to skip that, and you want to record it, so that we understand how this thing got out, and then you may want to change your process, or change that workflow, to deal with whatever issues your organization might have in terms of its use of these AI technologies. It's another gate, but it's like a business gate.

100%. In fintech, or just financial, or really any regulated space, that is a necessity, because they're going to get audited. You've got to have that explainability of what exactly you were thinking when you put that out there. So it makes sense that this would only come in a workflow, so that you have that specific place where someone pushed the button and said, yep, we're good with that, and the logic behind why they chose
that. But I do like the idea of taking it out of engineering's hands and just getting a different set of eyes on it. We had Allegra on here a few weeks ago, and she was talking about how she's a big proponent of density of diversity in a room. By taking things out of engineering's hands, it's not only engineers that are looking at it, and so by way of that you have a higher density of diversity.

Yeah, that sounds great. And workflow systems are a good tool for defining that process, recording it, capturing metadata, and collecting information along the way, so you have a trail of information about what you did. Then you can act on that trail in terms of making your processes better, or when somebody just wants to know what versions of this model we are using out in our systems, and they might be different, you have that trail. People are doing this some other way; maybe they have a system for this, but maybe they're not using these tools that are available. They may not have to build a custom system for it; they can just use a workflow system that exists out there. So there's that build-or-buy choice there too.

And you know what this also makes me think of is how a friend told me about how wild any enterprise is right now when it comes to what AI they're using, and not just your ML teams. If you think about governance at the enterprise level, or at the organizational level, maybe the marketing team is using one piece of business process software that has some LLM calls or AI capabilities within the tool, then you've got the actual software engineers that are using these AI coding helpers, and you've got the HR team that's using some SaaS software that has some kind of AI capabilities. You don't realize it, but you as an organization are exposed. If your idea is, yeah, we're keeping all of our data inside, that is totally out the window, because everybody's
using a different piece of SaaS tooling that is potentially sending the data wherever it needs to go. So from a governance perspective, if you have it documented in workflows, you also have a bit tighter control on what is being used, how it's being used, and how these things are happening, I would assume. But I can imagine that it can also slip through the cracks there too.

Yeah, for sure. There are a lot of data governance challenges that we hear about today that are made worse, I think, and I'm not sure there's a clear way out of that box right now. Even workflows for this stuff, yeah, it's not going to help that much.

That's the truth, man. But on the data governance piece, I had a friend tell me that their company did an audit, and they were expecting that folks were going to be using like 10 AI tools. This is a relatively small startup, like 200 folks; midsize is what you would call it. After they did the whole audit, what they found is that there were over 92 tools being used that had AI capabilities or were AI tools, and they were just looking at each other thinking, wow, this is wild. And there were a lot of repeat tools, so you have a lot of the same workflows, but maybe in different parts of the organization, different branches. Even if you are okay with the OpenAI enterprise edition, maybe one branch is paying for it and another branch is paying for individuals to be using it. So all that governance is a mess.

That was a challenge in my last company, partially because I am not a microbiologist, and so I don't know what the standard tooling and resources are that they use online. The way that we took that apart was this process engineering aspect: having those discussions and drawing those pictures of, what do you do in your day
to do this task, and what systems, and where do you get that result from? Oh, we go online and we take that DNA and we stick it into this tool and we run a BLAST search, and then we do this other thing with this other site, because they're good at this particular thing I'm interested in. And you're like, okay. So there are all these touch points to different systems. And these were things that were very unique, isolate strains, stuff like that, so that data, that genome, was our bread and butter. So you have this challenge of, if we take a little snippet of it, nobody knows where that came from, and so we're okay with that going out. But there's a fine line, maybe a gray line, there of how much is too much. Just knowing those interactions all comes from doing this, I think, as the process. If you can draw a picture like that, a flowchart, whatever, pick your favorite tool, then you can ask, okay, what am I talking to? It also helps you with the data artifacts problem: what data went in, what data comes out, do we care, do we store that, where does it go? Does it go into some knowledge graph? Does it need to be stored to have a full view of the experiment or whatever we're doing here? For compliance in your industry, maybe you need to record that, because you're required to record those interactions, and that is essential to your business. People do this, but they don't necessarily do it in a uniform way. That's where I was pushing on using this thing called BPMN. It was just a nice notation, a visual notation where people have considered all these problems, so we don't have to make one up; we can just use it, and there are tools for it. But that also means you can say, okay, I'm going and talking to this service, what does that service do? And then you can decide whether you trust them.

Yeah, what data are you giving it? What's your end goal?
Yeah. How are you trying to use it? Because maybe we already have another service that we're paying for, and you're paying extra for that. We had Maria on here probably two years ago now, and she did this with her company AO, because they did not have any centralized ML platform. So they just went to all the different teams and said, what are you using, and how are you using it? And they recognized that they already had a lot of usage on Databricks, so they were like, I think we should probably standardize on Databricks so that it is cleaner. Just recognizing that from talking to people and drawing it out, I really think there's so much value in that. But what you're saying is even more. It's like you're taking an Etch A Sketch and making a beautiful picture of what is happening at each step and what the goal is for each of these steps. Being able to have that, you are so far ahead of the curve, because you understand each person's tasks, what they're doing and how they're doing it, but you also understand how that fits into different workflows, as you were saying, not looking at it as one big workflow, but seeing all the different workflows and how they interact with each other.

Because then you can ask all kinds of great questions. Is it worth automating this? Do we need to be worried about this kind of data leakage from this service that you're using? Is there a better provider for that? If we were to use some kind of new machine learning tech, generative AI or an LLM or whatever, some model, where does it provide the most value in this process that we have, this big picture? You can ask all these great questions, and you can see what you know, and you can get samples of your data, because now you've done that analysis. It might seem old school and hard to do, which is the challenge. I have this challenge: this seems like a massive undertaking. Let's start simple. Let's start with a little piece of it, or maybe a really big block diagram
with big chunks of what we do on a daily basis, and some of them are just black boxes that you dig into when the time is right. I think with those kinds of different ways of making the problem smaller and more useful, you can be very strategic about where you start. It's probably a bigger challenge for bigger organizations, but I also think smaller companies need to think about this, because if you're going to use machine learning tech in general, whether it's just inference or you're building models, it's not just those ML engineering teams in the corner that need the workflow systems and the process engineering. It's everything around it as well. So you can go bottom-up, but that's a hard sell. You can go more top-down, in terms of where is this going to provide value in the organization, and what do we, in our industry, whatever it is, need to be worried about. And again, a picture is an amazing thing, and a way to communicate outside of those technical groups that are used to graphs.

You know what I find the most difficult in these is when you update processes: updating them in all the documentation, and then making sure that, okay, this is the newest way that we're doing things, even though maybe it's an experiment for a few weeks before you really realize if it is a better way of doing it. Most of the time I'll update a process on the fly, and then later maybe you don't codify that as well, and so it's updated in one place but not in another. The entropy of all the stuff that's happening is really hard to wrangle, and you get something like a workflow debt, I guess you could call it.

Right. And that's where, if the workflow system is how you get things done, and it can produce documentation and diagrams and things like that, and that's where you go to look up how we are actually doing this,
that's one of the big, sort of core benefits: there's no out of sync, because it's how you do things. The challenge is when it fits into a bigger process that isn't automated, but I think it's a useful starting point to have those discussions. And I think it makes whatever the technical task is easier to do, even if it's a documentation artifact. You're drawing this diagram, and you go back to the diagram and say, so explain the change that you want in terms of changing this diagram. Then you've started with the change to your description, your architecture, or whatever, the process, the workflow, and then you go off and build the thing. And then there's a kind of reconciliation sometimes, where the reality I found as I went to go build it is maybe not quite matching what we thought at the beginning. But that's part of doing an agile process: you should be coming back to that thing and revisiting it. Having a little bit of structure there, and diligence, helps, but it doesn't have to be overwhelming.

But if everything was a workflow, it would all just be up to date, right?

That's the ideal version of this thing. It's not a reality, of course.

It would be so much easier.
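To make the "no out of sync" point a little more concrete: if the definition the engine runs is also plain data, the diagram can be generated from it rather than maintained by hand. A minimal sketch in Python, assuming a hypothetical step list with `name` and `next` fields and invented step names; it emits Graphviz DOT text, which Graphviz tooling can render as a picture.

```python
# Hypothetical workflow definition; the step names are made up for illustration.
STEPS = [
    {"name": "train", "next": ["evaluate"]},
    {"name": "evaluate", "next": ["approve_rollout"]},
    {"name": "approve_rollout", "next": ["deploy"]},  # the human-in-the-loop gate
    {"name": "deploy", "next": []},
]


def to_dot(steps, graph_name="workflow"):
    """Render the step list as Graphviz DOT text: one node per step, one edge per link."""
    lines = [f"digraph {graph_name} {{"]
    for step in steps:
        lines.append(f'    "{step["name"]}";')
    for step in steps:
        for target in step.get("next", []):
            lines.append(f'    "{step["name"]}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines)


DOT = to_dot(STEPS)
print(DOT)
```

Because the picture is derived from the same definition that gets executed, changing the workflow changes the diagram, which is the property being described here.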