In this video we're going to talk about a relatively new hot topic in the DevOps and cloud space, which is “platform engineering”. There is a lot of discussion going on, with some people asking whether platform engineering replaces DevOps. Many people say it goes hand in hand with DevOps and is rather an addition to it, but in reality it's a bit more complicated than that. Platform engineering actually changes a lot of the established rules we knew about DevOps, SRE and cloud engineering; it changes the game and introduces some new rules. So let's clearly define what exactly platform engineering is, why there was even a need for this new role and how it evolved, and of course how it compares to DevOps and cloud engineering and whether it really replaces any of these roles. Whenever a new role or concept appears, the first question should always be: why was there a need for it? There must be a problem behind it that couldn't be solved with existing solutions and that naturally caused the evolution of this role. So let's talk about the problems that led to platform engineering emerging as a solution. Initially we had developers and operations working in separate teams. Developers were responsible for programming the application, and when it was ready, they would throw the packaged application over to operations, who were responsible for deploying and running it. So while you had a dedicated operations team with the expertise to properly manage the infrastructure or operate, for example, a company-wide CI/CD platform or other platforms for the application teams, this was an inflexible and slow process, with developers waiting on operations whenever they needed any change in infrastructure or any infrastructure resources, like additional servers or a Jenkins pipeline for their applications. On the other side, the operations team was waiting on developers to fix something in the application that affected the deployment or the application runtime, etc.
So when DevOps was introduced, it united those teams and removed this inflexible and limiting process. It removed the communication challenges and knowledge silos between these two parts: developing the application, and running and operating the application. This was a huge improvement over the traditional way of working, and it led to one DevOps team that now owns the application as well as the underlying runtime and infrastructure. So basically the application itself and everything the application needs to run. This is way more flexible, fast and just a cool way of working for the engineers, but with lots of options and ownership comes a lot of responsibility and cognitive load. Now you have one DevOps team where everyone is developing the application and running the entire stack under the application. So you have one team where either developers or a dedicated DevOps engineer is setting up a CI/CD platform and creating the pipelines, writing Terraform scripts for infrastructure as code, spinning up Kubernetes clusters, configuring the cluster with best practices, configuring logging and monitoring, adding security scans, maintaining Helm charts, and also maintaining all these infrastructure as code scripts as the tools evolve and new versions come out, right? Plus managing Docker repositories, and all of this in addition to the actual application development, which is why we are even doing all these other things. This increases the flexibility, speed and efficiency of work, but naturally it also adds tremendous cognitive load on the team, because there are too many things that just a few roles need to be responsible for. But it goes even beyond that. Now imagine you have another application team developing a completely different application, and they have the same challenges and tasks.
So to increase their efficiency, a DevOps engineer designs the same kind of CI/CD workflow for that project and sets up a cluster, maybe this time a managed EKS service instead of a self-managed one, configures all the integrations with the AWS cloud, configures the storage, adds security scanning steps, monitoring, the whole secret management and so on. Maybe they use a different Docker repository like ECR for easy integration with the EKS cluster. So they have the same challenges and the same needs for their application to run somewhere, and they do all this stuff in their own way with their own tech stack, right? And maybe you have 10 more such teams in your organization, and now all of them need to figure out all these things for their projects. So you end up with 10 DevOps teams building and operating 10 different platforms to deliver and run their applications. This can be hugely inefficient and wasteful, and it puts a high cognitive load on the individual engineers in those teams. And with complex systems like Kubernetes, managing the infrastructure sometimes becomes more effort than writing the application code and producing the business logic itself. So you have teams that are too busy building a platform and have less time to build the business value in the applications. Plus you need expertise in so many things within a single team if they have to manage the entire infrastructure stack underneath the application, which means adding many experienced engineers to many teams, which naturally also means higher human resource costs, right?
And finally, this becomes hard to scale, because each new project team has to invest the initial time and effort in setting up the infrastructure before they can even release the application to the end users. And a very, very important point: even though you have these teams managing their own infrastructure and platforms, they may actually not have enough expertise to do everything correctly, so you may end up with Kubernetes misconfigurations, a complete lack of monitoring and logging, or security issues all over your systems. Plus, when you have one compliance or security team in the whole company and tens of teams with 10 completely different tech stacks and just different ways of running their applications, it will be pretty hard for that team to get insights and help all these teams implement proper security, because it will be really hard for them to understand what's going on in all these individual projects. So this becomes hard to scale because we have no standards. Also, when each team uses a different tech stack with different configurations, this leads to inconsistency across the organization.
And finally, I want to mention that the concept of "you build it, you run it", which we know from SRE, was created back when applications weren't as complex as today, and the underlying environment for running the application wasn't as complex either. So in today's highly complex multi-cloud, multi-cluster or hybrid cloud world, with thousands of microservices and hundreds of services for those microservices, it's a bit too much to ask developers to integrate this concept in their work. Asking an expert front-end engineer to stop their work and focus on properly configuring a Kubernetes cluster is not really cool and not the most efficient use of that engineer's skills.
So to quickly summarize: we either have autonomous DevOps teams going wild and doing whatever, being fast and flexible, but also leading to too much responsibility on one team with high cognitive load and organization-wide
inconsistency, or we have the traditional separate Dev and Ops, where infrastructure is managed and secured but with a painfully slow process that really limits the work of developers. And that's where platform engineering comes in to save the day, but only when implemented correctly.
So how does platform engineering solve these problems? Platform engineers take the tools that are needed for deploying and running the application and standardize their usage across teams. So if 10 teams use 10 different CI/CD solutions, make a standard CI/CD offering. Or if every team uses Kubernetes but in different ways, standardize the usage of Kubernetes. So basically, platform engineers standardize everything that is part of the application's non-functional requirements.
So what are these non-functional requirements of the application, basically the things that are not business logic but are necessary to deliver the application to its end users? Well, first of all, every application needs a version control system. Every application needs a CI/CD pipeline. The application needs to run somewhere, on a runtime environment like Kubernetes. The Kubernetes cluster of course needs an underlying infrastructure, like the AWS cloud platform or even a multi-cloud platform. The application and the underlying infrastructure also need logging and monitoring. The application also needs proper security. So as you see, there are a lot of things that need to be set up for the application to run properly, and each of these categories has lots of different tools and technologies, right? There are loads of CI/CD platforms out there, there are different cloud platforms and so on. So platform engineers standardize the usage of the tools that offer these services for the platform.
Now this is a pretty large list of services that the platform team takes over. So does that mean that the platform team now becomes completely responsible for all of this, or what exactly are they taking over? Do they create the Kubernetes cluster and
configure it? Do they create the whole CI/CD pipeline and manage it for the teams, as well as operate these tools? Or do the platform and product teams share the responsibility for those services, and if so, where do we draw the line between those responsibilities? That's exactly where it gets interesting, and why I mentioned that platform engineering only works if implemented correctly. So let's talk about and really understand what platform engineering is responsible for.
When we think about the tools that an application needs, like GitLab, Kubernetes, Jenkins, cloud platforms, databases and so on, each of these tools has two sides: the admin side and the user side. The admin of the tool sets up and configures the tool, makes sure backups are in place and access is secured, installs any needed plugins and so on. So all the things that make the tool ready to be used for the actual task it's meant for. For example, a Kubernetes cluster needs to be provisioned, a network plugin needs to be installed, access and permissions need to be configured for security, a load balancer needs to be configured for the cluster, and a bunch of other stuff needs to happen to make the cluster ready for deploying applications into it. Once the admins have set up and prepared the tools, the users can come and use the tool for its intended purpose, like application developers accessing Jenkins and creating pipelines for their application. So you have a role that operates and manages the tool and a role that uses the tool, and these two responsibilities can be easily separated.
Now in DevOps teams, as I mentioned, you have both of these responsibilities in one self-sufficient team, and that means individual application teams can decide how to operate and how to use the tools. But as I mentioned, these are two separate sets of skills, and you actually need a separate knowledge base for each. There are even separate Kubernetes certifications for administering the cluster, which is the Kubernetes administrator certificate, and for deploying into the
cluster, which is called the Kubernetes application developer certificate, because again, you have to specifically learn each aspect of the tool.
So in order to standardize the tools across teams, platform engineers need to take over the operation side of these tools, which means selecting the standard tools and setting them up, again in one standard way, with production and security best practices. At the same time, this is an improvement for the application teams, because it takes load off the application developers. There is at least one part of the services they aren't responsible for anymore, so less cognitive load and more capacity for creating business value. In addition, you can use the expertise of your engineers more efficiently, because now, instead of needing an expert on Kubernetes cluster administration in every team, you can have fewer of them in the platform engineering team, who take over this work for all application teams. Same for CI/CD tools, database administration and so on. So you basically extract the need for expertise in administering these tools from the application teams. Instead of engineers in every team who kind of half know this and that tool, you have one expert team who has the expertise to operate these tools properly. Plus, because it's their core responsibility, they also have more time for it, because they don't have the additional stress of having to release new features. So this distributes the pressure, the responsibility and the need for expertise among multiple application teams and one organization-wide platform team.
And now, instead of each team deciding which CI/CD tool to use or which Kubernetes cluster setup to use, the platform team offers a ready solution for the application teams. And instead of each team building their own pipeline steps with security scans, the platform can standardize that these scans are part of each team's pipeline. If, for example, the company has specific regulations for its industry or country, these regulation-specific scans will be part of every
pipeline by default.
But wait, didn't we put Dev and Ops together to avoid siloed teams that have separate responsibilities? Because now it sounds like we have a separate platform team that decides which tools are going to be used company-wide, sets up all these tools as they see fit and gives the application teams access to those tools. That sounds a bit like the traditional approach that we definitely don't want to go back to. Well, no worries, because it continues from here. This is just the baseline.
So now, how do app teams access these services? Obviously it would be no improvement if we went back to the slow, inflexible process of developers requesting that the platform team give them access to some services. So how does that work? The platform team takes all these tools that application teams need, like the cloud platform, Kubernetes and databases, applies their expertise and configures them properly, so they are secure, up to date, etc. Then they create an abstraction layer on top with a user-friendly interface, like a UI or an API, so that application teams can now come and self-service whatever services and tools they need. So those provisioned, configured, secured services, with an interface to easily access them and use them for the applications, are a platform. And since developers can just log in and self-service without going to the platform team to ask for the resources, it is a platform as a service for the internal developer teams, also called an internal developer platform or IDP.
So platform teams are essentially building the IDP, or internal developer platform, hiding away and abstracting the complexity of operating and managing the services that developers need to release and run their applications. So basically, instead of the application team logging into the AWS cloud and provisioning an EKS cluster, they go to the platform or IDP, log in to that platform and use the EKS cluster, which is pre-configured with the proper security, backups, etc. by the platform team, within minutes,
independently, without asking the platform team for anything. So the application team has the flexibility and speed to access the services for their applications without needing to worry about operating those services.
Now let's go into how this all looks in the real world and think about the following scenario. Remember I said the platform standardizes the tools that are used across teams, instead of five teams using five different CI/CD tools. Now let's imagine the platform offers GitLab CI/CD as a standard solution. But what if application team A says, "I want to use CircleCI instead"? Basically, teams want to have the freedom to use new, modern, cool tools, or tools that may fit their workflow better. Maybe they want to use Argo CD or a specific service mesh for their microservice application. So it seems like with the platform team and the whole standardization we are locking down the selection and saying: that's what we use in this company and that's it, everyone abide.
Well, of course not. That wouldn't actually be an improvement, and it would take away a lot of the flexibility and creativity that autonomous DevOps teams have. And the last thing platform engineers want to do is become a blocker and have a weird dynamic with application teams, because that would kind of ruin the whole concept. So instead of saying "you can only use this CI/CD tool" or "you can only use this cloud platform", the platform team will say, "Oh, you want to use CircleCI instead? Okay, we will help you set it up and configure it with best practices." And once they do that, they can integrate it into the platform and offer it to other teams as well, who may also benefit from using CircleCI in their workflows. This way, as new services are added to the platform, instead of limiting application teams to only this CI/CD tool or this Docker registry and so on, you are saying: you can use any of these registries or CI/CD tools or any services that we offer as part of our platform, because then we know that they are all properly configured and
operated in the background. And with time, the platform may add more such tools that application teams can select from. Now again, in real life there could be cases where one specific tool is only needed in one team, in which case the platform team can decide that the application team stays the owner of that team-specific tool and operates it themselves, instead of integrating it into the platform or IDP.
So in no case should platform engineering be mixed up with the old way of working, where the infrastructure or operations team analyzes several tools for security, compliance, etc., decides on one correct way of using a specific tool, locks this down and says to the developers: this is the only way you can use this tool and that's it, because we already did our analysis and that's the way to go. So when implementing a platform engineering team in your company, you want to keep the platform flexible while adding guard rails and pre-configurations to ensure security, consistency, proper configuration and so on. It's important to understand that platform engineering is a step forward from DevOps, not a step backwards with Dev and Ops separated again. But I have to say, in reality I think if people don't understand the concept of platform engineering properly, there is a risk of it drifting towards that step backwards. I talked about the right approach, but later in the video I will go into a little more detail on how to approach building platform engineering teams in the right way to avoid that.
Before moving on, I want to give a shout-out to Pulumi, who is the sponsor of this video. Pulumi is infrastructure as code in any programming language, like Python, TypeScript, .NET, Go and Java. Pulumi lets you ship infrastructure as code faster, because you have access to all the standard features of programming languages, like IDE support, auto-completion, type checking, loops and conditionals.
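As a rough illustration of what loops, conditionals and type annotations look like when infrastructure is described in a general-purpose language, and of how guard rails and pre-configured defaults can live in the same code, here is a minimal sketch in plain Python. It does not use Pulumi's actual API or any cloud SDK; `ClusterSpec`, `build_specs`, `ALLOWED_REGIONS` and the node counts are all hypothetical names and values made up for this example.

```python
from dataclasses import dataclass

# Hypothetical guard rail: the regions the platform team has approved.
ALLOWED_REGIONS = {"eu-central-1", "us-east-1"}


@dataclass(frozen=True)
class ClusterSpec:
    """Hypothetical description of one Kubernetes cluster a team asks for."""
    team: str
    environment: str
    region: str
    node_count: int
    # Pre-configured defaults chosen by the platform team (assumed values).
    encrypted_storage: bool = True
    logging_enabled: bool = True


def build_specs(team: str, environments: list[str], region: str) -> list[ClusterSpec]:
    """Create one cluster spec per environment, enforcing the region guard rail."""
    if region not in ALLOWED_REGIONS:
        raise ValueError(f"region {region!r} is not offered by the platform")

    specs = []
    for env in environments:  # a plain loop instead of copy-pasted config
        # A plain conditional: production gets more nodes than other environments.
        node_count = 3 if env == "prod" else 1
        specs.append(
            ClusterSpec(team=team, environment=env, region=region, node_count=node_count)
        )
    return specs


if __name__ == "__main__":
    for spec in build_specs("payments", ["dev", "prod"], "eu-central-1"):
        print(spec)
```

The application team only passes in its own parameters (team name, environments, region), while the defaults and the region check come from whoever maintains the function, which is the split between using and operating a tool described above.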
You can also ask Pulumi AI to write your infrastructure as code script for any use case, all using natural language prompts. The automation API allows you to embed Pulumi into your applications to power complex infrastructure automation workflows. So you can manage 10x more resources with it. I will leave all the details of Pulumi in the video description, and you can use the links to sign up for Pulumi and you will actually receive free swag from them. Now let's move on and see why infrastructure as code tools are an essential component of an IDP.
Now, avoiding these strict rules applies not only to which tools developers can choose but also to the usage of those tools, like saying "this is the only EKS configuration you are allowed to have". We want to give developers flexibility in the usage of the tools as well, not only in the selection of tools. And as I mentioned, if we split the responsibilities, we see that the platform is there to take load off the application teams and create consistency in the organization. In the same way, the platform team helps the product teams use the tools correctly by introducing automated guard rails integrated as part of the platform.
So now the question is: how can they integrate those guard rails for using tools correctly into the IDP and make them part of the offering? That's where infrastructure as code or configuration as code templates come in. The platform team can leverage infrastructure as code tools like Terraform, Ansible or Pulumi to create the templates. This means that these templates can have best-practice configurations baked in, they will be used to automate the provisioning of resources, and additionally they offer product teams the flexibility to pass in various parameters, based on their individual project needs, to create and configure those services. So we have a fully automated self-service process with high flexibility. Another example would be to have various pipeline templates, so if a product team has a Python application they can use
the pipeline template for Python apps specifically, which has security scan tools or test steps for Python applications pre-configured.
And this leads to the point of how to implement this concept successfully in a company. The way you absolutely should not approach this is by starting off with a huge master concept and this ideal IDP or self-service platform that has all the coolest features and a modern tech stack integrated inside, is super flexible and powerful and can do a thousand things. This is not going to work in almost all cases, and there are several reasons for that, which we're going to talk about in this section. Instead, we want to take the very popular agile approach here as well. You should start with small steps that can immediately add value to at least one team. The reason is that in many cases, when application teams are using old, outdated technologies, maybe old versions of modern technologies and so on, it will be very hard for them to migrate to this full-fledged, modern tech stack platform in one fell swoop, right? So it's not going to be much of an improvement for them, because they will now have all the effort and work on their side to actually start using that platform. So as a platform team, or when implementing an internal platform in your organization, you should always first consider where the product teams are actually starting from, so what the status quo of technology usage is, and then help them move slowly from the current state to the ideal state in steps. This approach is way more efficient.
So the first thing is: you should treat the IDP as a product. What does that mean? The IDP, or internal developer platform, is not a project that you just implement once and that's it, and application teams should just take it from there. Instead, it is actually a platform as a service that needs to be developed and then continuously improved over time. So just like the applications that product teams develop, the platform is the product that platform
engineers develop. And just like an application team introduces new features to its applications, updates those features, makes improvements and so on, platform teams also need to manage and upgrade versions of the services they offer to the product teams, as well as add new services, new tools, tool combinations and so on. So it is a product with its own development and release life cycle. They are developing an internal product, or internal platform, for developers, and that's why it's called an IDP. And like all other products, it needs ongoing work and iterations. As I said, just like you develop an application one step at a time, one feature at a time, starting with version 1.0 and iteratively improving it, that's how the platform should be developed as well.
So now the interesting question is: what is version 1.0 of the platform? Where do you start? Well, I have some practical steps to answer that. Start with low-hanging fruit. For example, if you identify the common tools that many teams use across the organization, these could be the first candidates for tools that can be integrated into the platform and offered as a service. This could be Jenkins, GitLab CI/CD, Kubernetes, Vault, so basically any tool that has kind of become a standard because a lot of teams are using it. In order to do that, you need to work closely with application teams, because you're developing the platform for them, to make their work more efficient. So it makes sense to see what blocks them the most, what the most challenging thing is for them, like managing a Kubernetes cluster, and take over that part. Developers will be willing to cooperate if they see that you're actually solving an issue or a bottleneck in their work process. If you start with random stuff, like "hey teams, you are all using different CI/CD tools, so we want to introduce a few standard offerings and you all have to switch to one of those", you're actually adding more to their work without improving their processes, at least not in the
short term. So such things should be done in a later step, maybe version 2.0, once you have proven to them that you're actually making their work easier and more efficient and have already offloaded some of the work, so they now have a little more capacity to do this kind of initial additional work. So as you see, building a platform team successfully is as much about the human aspect, how to manage the work with application teams and how to create a culture around it, set collaboration rules and clear responsibilities, as it is about the tools and technologies that allow implementing that self-service internal developer platform. And in the long term you have a company-wide platform engineering team and a bunch of app teams, and instead of each application team doing their own thing and handling application runtime and infrastructure in their own way, they use pre-configured services that they can self-service via a platform that the platform team has built, with the best practices, standards and all the expertise already baked into it.
So now a very interesting and logical question is: does this mean we don't need DevOps engineers anymore? We talked about the shared responsibility between platform and application, right? The platform takes over the operations part, while the application team is responsible for properly using the tools and integrating them into their development workflow. So application teams don't need to set up and operate the cluster, but they still need to know how to deploy their applications into the cluster properly, like creating correctly configured manifest files. They don't need to know how to create Terraform modules for infrastructure, but they still need to know how to use those Terraform modules and maybe integrate them into their pipelines. They may get a CI/CD template from the platform, but they still need to set it up and add any additional steps needed for their application. So in addition to application development, they still have some non-functional
requirements they need to worry about, even though the scope of that has become much smaller by distributing it to the platform team. And this means you still need DevOps engineers in product teams, but they have now shared their cognitive load and don't have to have deep expertise in cloud and Kubernetes and Helm charts, monitoring, security, compliance, development and a hundred other things, because now they are focused on properly using those non-functional tools rather than operating them. So the work is more focused and easier through this load distribution, but they still need to do those tasks properly and need expertise in using those tools.
But here is where it gets even more interesting. As I said, the platform is also a product, right? It's a product for the application teams. In the same way, you need to add features, make the UI more user-friendly, offer services for new cool tools, fix bugs in the platform, develop Terraform modules and, using GitOps, create pipelines for the infrastructure as code that underlies the platform. Again, if you have a separate cloud or on-premise infrastructure team or security team, the platform team needs to work closely with them to build their platform product. So this means the platform, just like the application, needs continuous development with many feedback iterations and close input gathering from the end users, which are mostly application teams but sometimes also governance and compliance people, because they need access to the information about whether systems are compliant across the organization. So all of these things that I just described and listed are actually processes that require DevOps, because they're the same processes that we use in application development. So you have tons of DevOps processes needed in the platform development process, which logically would mean that you may need a separate DevOps engineer role in the platform team as well.
Now in reality, as we know, when it comes to DevOps there are lots of variations of how
organizations implement that. It could be that companies hire platform engineers who do the DevOps and cloud engineering tasks, and they will just call it a platform engineer role, so different job title, same skills. It could be that they move the DevOps engineers from the product teams completely to form a separate platform team, but this would then create a vacuum in the application teams, because as I said, you still need someone who will create Kubernetes manifest files or create CI/CD pipelines and integrate them with various platforms. So companies may make these tasks part of the developers' work, and you may have application teams without a dedicated DevOps engineer role where developers take over those tasks, which is already practiced in many organizations. But the bottom line is that whether you have a dedicated DevOps engineer role in both teams or not, you need the DevOps processes in both application and platform development. And again, how companies decide to structure the teams and the roles is most probably going to vary across organizations. So based on that, you essentially end up with an application or product DevOps team and a platform DevOps team.
Now there is one question we get asked a lot, so I want to address it here in this context. Many people ask which of these parts we teach in our DevOps bootcamp and courses: do we teach the application DevOps side or the platform DevOps side? This question has become way more common since the introduction of the platform role. In our DevOps bootcamp you actually learn both parts of DevOps: administering and operating or setting up the tools, as well as using those tools to streamline the development process. For example, our Kubernetes administrator course is completely about setting up and administering the cluster, as the name also suggests, which would be part of a platform engineer's responsibility. In the Kubernetes module of our DevOps bootcamp, on the other hand, you'll learn mostly the Kubernetes usage side, but also the part of
configuring proper monitoring and setting up alerting in the cluster, or setting up a load balancer and automatically deploying to the cluster from a CI/CD pipeline, for example. When it comes to infrastructure as code with Terraform, which we talked about here, in the bootcamp you learn how to automatically provision AWS infrastructure like EC2 servers or an EKS cluster, but you also learn how to use existing Terraform modules and integrate them into the application CI/CD pipeline, for example. And it's the same with every other tool that we cover in the bootcamp: Jenkins, Nexus as a repository for Docker images. We basically provision these tools from scratch on dedicated virtual machines, then write infrastructure as code scripts to automate the provisioning and configuration of these tools, and learn things like installing plugins and doing Jenkins administration, creating users and access permissions, or cleanup policies in Nexus, for example. So tasks that would actually be part of the platform engineer's skills. But we also use these tools for setting up application pipelines on Jenkins, learning different types of pipelines and how to configure them, how to integrate and hook them into various other tools, or uploading application artifacts to Nexus and creating those repositories on Nexus and so on. And then in the GitLab CI/CD course, for example, it is very similar. You learn not only how to set up the CI/CD pipeline for a microservices application, but also the architecture of GitLab CI/CD and how to configure runners locally as well as on an AWS virtual machine, and how to configure different agents on those runners based on your pipeline tasks, which again would be the knowledge that a platform engineer would need.
So for a long time we've had the DevOps engineer needing to have this skill set of both setting up the cluster and using the cluster, setting up and securing AWS infrastructure and using the infrastructure to deploy and run applications. Now these tasks and roles are split, and rightfully so, not because knowing
both is too overwhelming or because you can't know everything. I definitely love knowing and doing both parts, I have been working on both in my projects, and you can actually learn them. But in reality, depending on how complex a project is, it may be unrealistic that one person or role will have the time and capacity at their job to do both. So you can decide what you do with your complete DevOps knowledge after finishing the bootcamp or the courses. You can join the platform team and use your knowledge there to configure and administer tools and build the developer platform, or you can join the development team and streamline application development and release processes there, while being the interface or intermediary between the platform team and your application team. And what's absolutely obvious is that you can do any of these jobs much better when you have the full picture and complete knowledge of these tools from both perspectives, because as a platform engineer you need to work closely with application teams and understand their processes to find any bottlenecks or things that are needed in all teams and can be standardized. So with the full knowledge of both sides, like we teach in our courses and bootcamp, you can do the job much more easily. If you want to check out any of our courses, I will leave the information in the video description.
And the final topic I want to address is: what is the difference between a platform engineer and a Cloud engineer? Well, generally speaking, platform engineering is an enhancement of all the other concepts like DevOps, Cloud and SRE, as we saw throughout this video. But if we narrow it down to the main differences, a Cloud engineer needs to know cloud services and be an expert in them, usually even specializing in one of the cloud platforms. So they need to know how to migrate from on-premise to cloud, how to set up a hybrid infrastructure, manage storage and backups in the cloud, manage cloud costs, so
basically everything cloud related, and they should be able to combine cloud services to build infrastructure that maps to what the company needs. But a platform engineer actually has a wider range of knowledge, covering tools outside the cloud alone, and they actually build a platform that developers or product teams can use to self-service any resources they need, on top of the cloud resources and various other tools. So basically they're taking the infrastructure as a service that AWS, for example, offers and turning it into a custom platform as a service for the company's internal teams. So essentially they build a layer on top of the cloud, with a bunch of cloud services as well as other services and tools which are not part of the cloud.
And I'm sure, as always, each project will look different and implement these concepts in various ways. Many companies will hire DevOps engineers as platform engineers, and many Cloud engineers will probably become platform engineers as well. In smaller companies, platform engineers will probably take over the DevOps role. In larger companies, the Cloud team will work with the platform team and share their expertise, so a cloud layer with a platform layer on top. And it may stay like this for a while, until an industry standard evolves for how to structure the teams. It could be a large company that implements this successfully at scale, and then other companies can basically replicate that successful model, so we have some kind of standardization there. But at its core, that's the foundation. And from an engineer's perspective, understanding these differences and the division of responsibilities really helps you in your job search, as well as in actually doing your work properly when you are hired for a specific role, because you can then guide your team or your company into having more clarity around these roles as well.
So we've talked about a ton of things in this video, and I really hope I was able to help you understand what platform engineering exactly is, how it fits into the existing
DevOps and Cloud world, and I hope you got some valuable information from this video that you can use in practice in your own work or generally in your career. And with that, thank you for watching and see you in the next video! :)