Hey team, Sid here with DevOps Directive, and welcome to this complete Kubernetes course. I've been wanting to build out this course for a few years now and finally found the time to do so properly. This course is for software engineers who want to level up their expertise and add Kubernetes to their toolbelt of DevOps and infrastructure skills. Over the next few hours we're going to go from beginner to pro, learning the fundamentals of Kubernetes before building out a robust application platform and deploying a representative demo application to multiple environments. Also, while you're probably watching this video on YouTube, the video is just one component of the course. There's a companion GitHub repo containing all of the code samples so you can follow along, each lesson has a written module that you can view over at courses.devopsdirective.com, and there's a Discord community where you can interact with other community members and ask questions if anything isn't clear or if you get stuck along the way. I've invested around 150 hours into designing and building out this course, and I'm providing it to you free of charge because I want to ensure that this information is accessible to everyone who wants it. If you find value in the course and want to support me in building out this kind of content, there are a few ways you can help. First, like the video on YouTube and star the repo on GitHub; these are both completely free and will help more people discover and benefit from the content. That being said, I can't pay for my groceries with YouTube likes and GitHub stars, so if you do want to support financially, you can do so via GitHub Sponsors or Buy Me a Coffee, both of which will be linked in the description. Okay, that's enough preamble, let's get into it. Throughout the course we'll be switching back and forth between covering theory, using diagrams and visuals, and jumping into our code editor to interact with Kubernetes clusters directly,
deploying resources, inspecting them, and learning how to deploy and manage applications on top of Kubernetes. There are a handful of prerequisites that will help you immensely as you follow along in this course. The first is some familiarity with web applications; the example applications that we use in this course are built in JavaScript, Go, and Python, so familiarity with at least one of those will be super helpful. The second is knowing some basic shell commands and being comfortable navigating around within a Linux shell; as we build and debug our applications, this will come in super handy. The third is some basic cloud infrastructure knowledge. We're mostly going to be working with managed Kubernetes clusters, and I'll talk about what that means in a later lesson, but being able to go to a cloud provider, create an account, and provision some resources will help you understand what we do when we set up those clusters and how to do so effectively. Finally, I'm assuming that you have intermediate-level containerization knowledge. If you're lacking this, I do have a whole course covering Docker and containers that will get you far beyond the expected baseline here, so I would suggest you go check that out. If you have these four things, learning Kubernetes will be much, much easier. Now, I've broken the course down into two major sections. In the first section we'll be doing a bit more of the theory component: we'll learn the motivation and history behind Kubernetes, how Kubernetes itself works, and what the system components do. We'll set up our development system with all of the necessary tooling and dependencies that we're going to use, and then we'll provision both a local cluster as well as two cloud clusters. Any of those three options are perfectly fine for following along with the demos, but I wanted to give people options for the cloud provider of their choice, and for those who don't want to have to create and manage a remote cluster
there is an option to run entirely on your own system. We'll then dive into the built-in resource types that come out of the box with Kubernetes; these are the building blocks upon which you're going to build your applications. Finally, in part one we'll wrap up by covering a tool called Helm, which is a way to package and bundle your applications that is very popular in the cloud native ecosystem and is often how you will consume third-party applications that you install into your clusters. In part two we're going to go really hands-on and learn how to take the building blocks that we covered in part one and apply them to a representative demo application. We'll first showcase what that application is, deploy it onto Kubernetes using the built-in Kubernetes resource types, and talk about how people are extending those built-in resources with custom resources of their own. We'll deploy a number of third-party tools to enhance our experience working with the clusters, examine a couple of challenges with the developer experience in Kubernetes and how to improve upon those, and learn how to debug broken applications within Kubernetes. Then we're going to take those skills and learn how to deploy to multiple environments. In the real world you're not just going to have one cluster; you'll probably have a staging cluster and a production cluster, maybe many more, so being able to take a configuration and build it out in such a way that you can reuse it across those environments is super critical. We're going to talk about cluster and node upgrades: Kubernetes itself is evolving and releasing new versions every quarter, and we need to be able to upgrade onto those, both to take advantage of new features and to stay up to date with the latest security patches. We'll talk about how to automate our continuous integration and continuous delivery, how we can take changes to our codebase
and get them deployed into Kubernetes automatically and quickly, and then we're going to talk about concepts that were out of scope for this course but that you might want to investigate moving forward. I've mentioned the demo application a few times, so I want to highlight here what it is so that you have an expectation of the type of thing we're going to be building out. It uses a React front end, so that's the client application you're going to run in your browser. It then has two different API implementations, and these APIs are super minimal: they have a health check endpoint, and they have a root endpoint that returns two things, the current time as well as the number of times that endpoint has been called. We store the information about those calls in a Postgres database so that we can know how many times the endpoint has been called, and then we also have a Python application that just runs in a loop and calls those APIs over and over to generate some load. This is quite similar to the demo application that we used in the Docker course; the new components here are the Python load generator, and, whereas in the Docker course we didn't store anything in the Postgres database and were just getting that timestamp, this time every call to either of these APIs stores a row in a table in that database. Before we start working with Kubernetes, it's useful to take a look back and try to understand some of the history and motivation behind why a system like Kubernetes even exists. If we roll the clock back to the 2000s, there was no Kubernetes and there was no cloud. If you were deploying your applications to the internet, you needed to do so on premises, meaning you had servers that you owned and operated within a data center, or perhaps you hosted your servers within a shared facility where someone else took care of the power, HVAC, and facility infrastructure and you operated your servers within that location. Companies
would generally have teams of system administrators to handle provisioning and managing these fleets of servers, and there was a lot of toil and effort that went into doing so. You were operating primarily in a bare-metal context, and by that I mean running your applications directly on the hardware, without a layer of virtualization between the hardware and your application stack. Because of the complexities of provisioning and managing these servers, it was not practical to operate with a microservice architecture; monoliths were pretty much the only practical way to go, because they greatly simplified the operational overhead of dealing with this hardware. The tooling was relatively immature: you would often have tons of homegrown tooling for monitoring your applications, lots of Bash scripts and glue code and PowerShell tying everything together. That's what the world looked like in the 2000s for deploying your applications. Now, in the 2010s, the cloud emerged and shifted the paradigm: virtual machines could be created and destroyed in a matter of minutes, which allows you to approach your operations differently. You can deploy a new VM very quickly, rather than needing to physically purchase a server and install it in a data center. Along with this, a lot of the tooling around configuration management for these systems, things like Puppet and Chef, came into popularity and allowed you to much more easily and programmatically configure a fleet of servers. You're still doing some things manually, though; for example, if you have a number of different applications that need to go onto a set of VMs, you're still doing manual bin packing, meaning playing Tetris with the resource requirements to fit your applications onto the virtual machines in an intelligent way. The tooling is continuing to improve, and it makes managing a larger set of applications practical, so you're able to look
at different architectures, such as microservices, and manage them in a more practical manner. However, managing large numbers of cloud resources is still a challenge. If we move forward into the last few years, containers have really become the de facto standard for deploying applications, and workload orchestrators such as Kubernetes are enabling operators to treat clusters of machines as a single resource. These types of systems provide a number of utilities and interfaces to address common challenges with operating applications. Things like automatic and efficient scheduling across different instances: rather than picking and choosing, oh, this application and these two applications need to fit on this virtual machine, you provide a specification to the system about how much CPU and memory each application requires and allow it to figure out the proper scheduling onto each machine. Things like automated health checks, to ensure that if an application gets into an unhealthy state, it will be removed and replaced automatically. Service discovery, meaning how your applications can find and talk to each other over the network; many of these workload orchestrators, including Kubernetes, contain mechanisms for doing this right out of the box. Configuration management: rather than needing to configure the host itself, much of that configuration moves into the orchestrator layer, so you define all of the ways your application needs to be configured and deploy that alongside your application as part of its config. The ability to automatically scale up and down based on demand. Mechanisms to easily provision and manage persistent storage to go alongside your application. And finally, how you can configure networking across the different applications running within one of these systems. Kubernetes itself evolved from earlier work within Google and their workload orchestrator named Borg; as you can see here, this is an architecture diagram showing how
things worked internally at Google. They decided to take many of the ideas that they had built and matured for their internal systems and make a huge bet, building out this open source system as a differentiator for them in the cloud world as they wanted to grow Google Cloud Platform. They started the Kubernetes project, brought in many other companies from across the industry, and have continued to evolve and build Kubernetes such that it has become a powerhouse in the cloud deployment space. If you want to learn more about the history, and there are a number of really interesting aspects of why Google decided to open source Kubernetes and what the bet was that they were making, Honeypot has made a two-part documentary where they interview experts and key people from throughout Kubernetes history, and I would definitely recommend watching those two videos. They will give you great historical context for where Kubernetes came from and how it got to where it is today. Now that we have a grounding in the historical context of Kubernetes and the types of challenges it helps to solve, let's take a look at the system components and the architecture of Kubernetes itself, to understand how those components play together and form the system. There are a few terms that are very important to understand as a foundation; those four terms are cluster, node, control plane, and data plane. In this diagram I've laid out what those actually are. The cluster is the set of resources that make up the Kubernetes system. It is comprised of individual nodes, where each node is a server, either a virtual machine or a bare-metal system, interconnected with the others to form the Kubernetes cluster. Those nodes are broken across two groups. The first is called the control plane, and this is where all of the system components run, so the components that comprise the Kubernetes system itself will run on the control plane, and then
the data plane is where our end-user applications run, so if I'm building a system that's deployed to Kubernetes, those applications are going to run on these worker nodes. You can host a cluster on a single node where the control plane and the data plane are one and the same; however, in most production systems you'll have a separate control plane, likely with three or more nodes, and then a data plane with as many nodes as you need to host your applications. I've taken that same diagram, split it between control plane and data plane, and added the system components that make up Kubernetes. The blue icons are the Kubernetes system components, and then you can see I've added a workload and another workload; those are the end-user applications. Finally, I've added a cloud provider API, because oftentimes you're deploying your cluster onto a cloud and there are interactions between the system and the cloud provider. Let's walk through each of these components in turn and describe briefly what they do. The first one, at the top here, is the cloud controller manager. This is the interface between Kubernetes and the cloud provider, and it's where the logic for resources that live at that boundary happens. For example, if you need to provision a load balancer on the cloud provider, there are going to be API calls that the cluster makes to the cloud API, and any other resources that need to be provisioned outside of the cluster, within the cloud provider, are handled there as well. The next component, labeled CM, is the controller manager. This runs all of the various controllers that regulate the state of the cluster. Kubernetes is based on the idea of a control loop, where you have a set of applications running, looking at some state, and making sure that the desired state matches the actual state of the world, and the controller manager is what manages all of those controllers such that they can do their job and make sure that the actual state
matches the desired state. The next one down is the API server; this is the Kubernetes API itself and is how you actually interact with the Kubernetes cluster. You make calls to this API, and it in turn makes calls to the other various components as needed to carry out the actions you're requesting. Next up is etcd. This is the data store that Kubernetes uses to manage all of the resources that you deploy to it; it is a highly available key-value store that contains information about all of the resources deployed into the cluster and ensures data consistency across the different nodes. The last component shown on the control plane is the scheduler. The scheduler's job is to assign pods to nodes based on current usage: when you define a workload for Kubernetes, you can tell it how much CPU and memory that workload requires, and the scheduler can then look at the set of available nodes and determine which one makes the best fit based on the available resources. Now, there are two Kubernetes system components that live on the worker nodes as well, and those are the kubelet and kube-proxy. The kubelet is the component responsible for actually spawning and managing the workloads themselves; it also performs the health checking of the application and relays that information back to the API server on the control plane. Kube-proxy is responsible for setting up and maintaining the networking between the different workloads; it sets up the rules within iptables, for example, to ensure that the workloads can communicate based on how you've set up the configuration. Now, not every cluster is going to leverage kube-proxy; there are some newer networking plugins that don't use kube-proxy and instead use different mechanisms for this, but most clusters that you use will have kube-proxy there to handle the networking. While it's useful to understand what all these system components do, in the context of this course we're focused more on the workloads that
we're deploying onto Kubernetes and how to use an existing Kubernetes cluster, rather than administering the cluster itself. Many cloud providers have hosted or managed clusters that hide most of this complexity from us and provide us with access to that Kubernetes API server, and we're able to consume that API to deploy our applications without having to get into the nitty-gritty of all of these system components. I provide this overview so that you understand what the components are doing behind the scenes, but mostly, in the context of this course, we're going to be focused on the applications that we deploy onto the data plane, onto those worker nodes. Now, there are also three standard interfaces that Kubernetes uses to handle the container runtime, container networking, and container storage. These allow for a much more modular system, where you can have a base Kubernetes cluster and then use different implementations of these standard interfaces, allowing an individual company or organization to iterate more quickly on one specific component of the stack. There are many different implementations of the CNI, the CRI, and the CSI, so we can pick and choose which implementations fit our needs and swap them out to achieve new functionality or better performance. This is a really key aspect of the Kubernetes ecosystem: much of this used to be included in the full Kubernetes distribution itself, but these pieces have been moved out of the core of Kubernetes so that we can have pluggable implementations and innovate in one domain independent of the rest of the project. The CRI, or container runtime interface, is the standard interface that Kubernetes uses to execute and run container processes within the system. The two most popular container runtimes these days for Kubernetes clusters are containerd and CRI-O, with containerd certainly being the most popular. Historically, Docker was used as the container runtime, but its interface
was slightly different from the official CRI interface, so a compatibility component called dockershim used to exist; it was deprecated in Kubernetes 1.20 and removed in 1.24, so Docker can no longer be used as the container runtime because it's not compatible with the CRI interface, and from then on containerd and CRI-O have been the more popular choices. The CNI, or container network interface, defines how networking should be set up for the containers running within Kubernetes. There are many different implementations of the CNI, including a number listed here. Some are cloud provider specific: if you're working on Amazon, you may want to use the Amazon VPC CNI because it integrates well with their virtual private cloud offering, and Azure and Google similarly have their own CNI implementations. There are also a number of popular projects, such as Calico, Flannel, or Cilium, that implement this as well. I mentioned kube-proxy earlier; not all of these CNI implementations use kube-proxy. Something like Cilium uses a technology called eBPF to eliminate the need for kube-proxy and instead manages the networking at the kernel layer. The CSI is a standard interface by which we can provide storage to containers. This interface is used for a variety of purposes. The most obvious is to provide durable, persistent storage to a workload running in Kubernetes; these drivers oftentimes interact with a cloud provider to use its underlying block storage implementation, such as EBS for Amazon or Compute Engine persistent disks for Google Cloud, and by using that specific CSI driver, Kubernetes is able to interact with the cloud provider and provision and manage those persistent disks. There are also other use cases for CSI drivers, where you can use a driver to provide information or configuration to a container at runtime; for example, cert-manager, which is a tool for provisioning TLS certificates, can use
the container storage interface to load a certificate at runtime into the file system of the container automatically, and the Secrets Store CSI driver enables you to load sensitive information into the file system at runtime as well. While we're focusing on Kubernetes and a handful of projects associated with it, the cloud native ecosystem is massive, and in particular the Cloud Native Computing Foundation has a program by which they assess and manage the maturity of different projects. Here I have what is known as the CNCF landscape, and as you can see there is a huge swath of different projects at various levels. Some are at the graduated level, so those are quite mature and are often used in production across the industry; others are at the incubating stage, which are maybe a bit less mature but are up-and-coming projects in the space. So as you start to work with Kubernetes and need specific capabilities, definitely check out the CNCF landscape and see if there is tooling that might provide what you need. Okay, the time has come for us to set up our local development systems and provision the Kubernetes clusters that we're going to be using throughout the course for the demo and lab sections. There are a number of different tools that we're going to use, and within the GitHub repo for the course I've provided a configuration that will allow us to install most of these automatically. We're going to install a couple manually and then use a tool called devbox to install the rest of the dependencies; this will ensure that you have all of the dependencies available and are using versions that are compatible with the ones I'm using in this filming. The first dependency that we're going to install is Docker Desktop. For this one I suggest that you go to the docs at docs.
docker.com and follow the instructions for your specific operating system; they will guide you through the installation process, which should be relatively straightforward. The second tool that we're going to install manually is something called devbox. Devbox is essentially a wrapper around a technology called Nix. Nix is a package manager that provides access to a huge number of tools and allows you to configure a reproducible installation; however, the learning curve for Nix is notoriously difficult, and devbox provides a really nice wrapper around it that makes it much easier to get started. If you go to the Jetify website and look at the docs for devbox, there's a quick start that provides the commands you need to run for your specific system. If you're on Mac or Linux, it should work directly; if you're using Windows, I would suggest that you use WSL so that you have a Linux environment to work with. Jumping over to the companion repo for the course, you can see at the top level we have this devbox.json and devbox.lock file. The devbox.json file contains a listing of all the different CLI tools that we're going to install via devbox, and the devbox.lock file pins the specific versions, so you're guaranteed to get the exact versions I'm using and won't have any incompatibilities because of version differences. Devbox installs these into an isolated environment on your system and sets up your path environment variables such that you can activate a specific shell associated with this project and then deactivate it when you're done, without any impact on other projects on your system. So, for installing devbox: as you can see, I'm here on their doc site, the Linux instructions are here, and the Mac instructions, which I'm on, are here. I'm going to curl this endpoint, which gets me a bash script that is then piped into bash to install it, and I paste that into my terminal.
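To give a rough idea of the devbox.json format mentioned above, such a file is essentially just a list of packages, optionally pinned to versions. The package names and versions below are hypothetical placeholders, not the course's actual pinned list; check the companion repo for the real file:

```json
{
  "packages": [
    "kubectl@1.31.0",
    "go-task@3.39.2",
    "kind@0.24.0",
    "helm@3.16.1"
  ]
}
```

`devbox shell` reads this file (together with devbox.lock) to build the isolated environment, and `devbox list` prints the packages it resolved.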
It's going to ask me if this default location is where I want to install it; I'll say yes. I need to enter my password, because it requires sudo permissions, and there we go, we've successfully installed devbox. Now, anywhere within the context of this git repository, because I have the devbox configuration files I just mentioned, I can do devbox shell and it's going to create a shell containing all of those dependencies, ready for use. The very first time you run it, it's going to take quite a bit longer, because it has to pull all of those dependencies from the package manager, but on subsequent runs it caches the dependencies such that the shell starts right up. I can then do devbox list, and we can see all the dependencies that devbox installed within my current shell session. You'll notice one dependency in particular, go-task; this is a task runner program that I use throughout this course to store all of the commands that you're going to need, making it very easy to follow along with the various demos. For example, if I now navigate to the third module of the course and run task --list-all, you can see I've defined a number of tasks within this Taskfile, each of which executes some set of commands as we run through this portion of the course. For example, we have a number of tasks in this Taskfile associated with deploying and managing a kind (Kubernetes in Docker) cluster, which brings me to the next step in this module: we're going to deploy a kind cluster onto our local machine. One really nice thing about kind is that even though it's running locally and each node is technically a container, it still supports running a configuration with multiple nodes. So we're actually going to set up a cluster which has a single control plane node as well as two worker nodes, so that while we're running on a single host, we still get a feel for how things would work when scheduling
workloads across different nodes, even if they're just simulated. This will work for most examples throughout the course; there are a few places later on where we set up public DNS to route traffic to our applications, and that won't work here, but for pretty much everything else the kind cluster will work as expected. So let's jump over to our code editor. If I run tl, which I've aliased to task --list-all, it shows me all the tasks within the Taskfile defined in the third module of the course, and as you can see there are these four associated with our kind cluster. The first one is to generate a configuration file that we're going to use to deploy our cluster. The reason we have to generate this, rather than use one that already exists, is that I want to have these two extra mounts specified, which are where, within my host system, the kind cluster is going to mount in each node for persisting data. In this case this is a path on my system, and it's an absolute path; obviously your username is going to be different than mine, so rather than use that directly, I've created a template containing these two environment variables, such that when we run the generate-config task, it executes a command that substitutes the present working directory into that template file and outputs a version, not checked into git, containing the absolute paths for these two extra mounts. Now, you'll notice that I was able to tab-complete the various tasks that are available. The way I set that up was by following the setup completion step within the installation instructions: completions are available in the task repository within the completion subdirectory, which provides the necessary completion files for the various shells. Download the relevant file, make it executable, and finally add a source call to the profile for your shell such that you'll
get that autocompletion set up each time you open a new shell session. With this config now generated, we can run the second command in our Taskfile, which actually creates the cluster: it runs the kind create cluster command and passes it the name of the config file. So I'll run the create-cluster task, and behind the scenes it's going to create the corresponding Docker containers that we specified: one container for the control plane and two containers for the worker nodes. This is going to take a little while, maybe 30 seconds. Great, it looks like it's done setting up, so I can do kubectl get nodes; by default, kind adds the configuration for connecting to that Kubernetes API directly to our kubeconfig, such that we can now authenticate to the API server, and running kubectl get nodes shows us we have the control plane node and the two worker nodes within the cluster. Awesome. Now, there's one additional task associated with the kind cluster here in this Taskfile, and that is cloud-provider-kind. This is a tool that will enable us to run load balancers within the kind cluster such that we'll be able to access them from the host, so when we get to the point where we're deploying a service of type LoadBalancer, we'll come back here and run this command, and it will set up a program running on the host that makes that connectivity happen. For now we don't need to deal with it, but I'm calling it out because we're going to use it later. Once we're done with the course, or want to clean things up, we have the kind delete cluster command, which removes the cluster from our system, cleaning up those containers on our host. Now, I showed you the kubectl get nodes command; we can also look at kubectl get pods across all namespaces, and we see a number of system pods, including the types of things that
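For reference, a kind config along the lines described, one control plane node plus two workers with extra mounts for persisted data, looks roughly like this. The host paths here are placeholders (the course generates the real absolute paths via the template task), and the container path is an assumption based on the default data directory of kind's local-path storage provisioner:

```yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
  - role: worker
    extraMounts:
      - hostPath: /some/absolute/path/worker-1   # placeholder; generated per machine
        containerPath: /var/local-path-provisioner
  - role: worker
    extraMounts:
      - hostPath: /some/absolute/path/worker-2   # placeholder; generated per machine
        containerPath: /var/local-path-provisioner
```

You would pass a file like this to kind via `kind create cluster --config <file>`, which is what the create-cluster task wraps.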
I talked about earlier: kube-proxy is handling that networking piece, and we've got the API server, the controller manager, etc. And so with that, we have a fully functioning local cluster, where each node is running as a container, that we can use for the various demo portions of the course. However, I think it's also educational to deploy remote clusters on various cloud providers, because that's probably how you're going to configure things in a workplace environment. The first cloud provider that I'm going to use is one called Civo. It's going to allow us to provision a simple cloud-based cluster; they have very fast cluster creation and destruction, so it's very useful for cases like this, where we're going to provision clusters and tear them down throughout our learning process. They provide a $250 free credit during your first month as a new user, so that should be more than plenty to handle the use case within this course; beyond that, you'll have to start paying for the compute resources. Relative to kind, this is going to allow us to demonstrate some more realistic networking configurations: we'll be able to have public load balancers that we can route DNS records to, as well as persistent storage that lives outside of the cluster, so that we can maintain state across application deployments and reloads outside of our containers. Last time I checked, there was an account verification process required for new accounts, so after you sign up there could be some delay before you're actually able to provision a cluster; make your account, reach out to the team there, and make sure your account gets verified so that you'll be able to provision clusters and work along with the course. Now let's jump back over to our code editor and provision a cluster with Civo. If I run the tl or task list command, I'll see the tasks associated with Civo. We're going to first authenticate to the Civo command line
with an API key. We'll then create a network - you could provision a cluster within the default network, but it's much better practice to create an isolated network rather than using the default, so you're in control of all the settings. We'll create a firewall to specify exactly which ports should allow ingress and egress traffic within that network, and then we'll create the cluster itself. If you wanted to short-circuit all of that and run those steps in sequence, you could: first you need to authenticate the CLI, but then you could run the create-all command and it would create the network, the firewall, and the cluster all in one go. Let's go ahead and run that first authenticate-cli command. The notes are provided within the task itself: we're going to log in and create an account; if you don't have a team already, you'll want to go to the teams page, create a team, and add yourself to it; then you can go to the security page to get the API key itself. So if we navigate here - I'm already logged in - I'll regenerate a new key. It's generated here, and I'll copy it. Now I need to give it a name; we'll call it beginner-to-pro, and I'll paste in the value of that key. You can see I had one key named primary, which was set as the default, and I have this new key which I just added to the CLI. In order to use the new key, I need to run `civo apikey current` and give it that beginner-to-pro name. If I list the keys again, I can see the new beginner-to-pro key I just saved is now set as the default. Great. Let's examine what these next commands are actually going to execute. The first one runs `civo network create`, passing in the cluster name environment variable as well as the Civo region environment variable; those are set up at the top of the Taskfile. The cluster name is going to be devops-directive-kubernetes-course, and I'm using the NYC1 region. You could change the region if you prefer a different one or are located somewhere else.
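Sketched as raw CLI calls, the authentication and network steps look roughly like the following. The subcommand and flag spellings here are from memory of the Civo CLI and may differ from the current release, so treat the Taskfile in the companion repo as authoritative:

```bash
# Store the key under a name, make it the active key, then confirm.
civo apikey save beginner-to-pro "$CIVO_API_KEY"
civo apikey current beginner-to-pro
civo apikey ls

# Create an isolated network instead of using the default one.
export CIVO_REGION=NYC1
export CLUSTER_NAME=devops-directive-kubernetes-course
civo network create "$CLUSTER_NAME" --region "$CIVO_REGION"
```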
So if I execute that, we can see a new network was just created with this ID. Next up, we're going to create the firewall associated with that network. This runs a few commands in sequence. The first one creates the firewall itself. Now, by default, the firewall rules that Civo creates with a new firewall are very permissive - they allow access on all ports - so we actually want to remove the rules they provide. There's a little loop that looks up the firewall rule IDs and removes them all, and then we add back three specific rules: one for port 80, one for port 443, and one for port 6443. Ports 80 and 443 are for normal web traffic that we're going to bring into the cluster, and 6443 is for accessing the Kubernetes API itself. The CIDR block we're allowing is all IP addresses, because we want traffic to arrive from the public internet. If you wanted to be a bit more restrictive for the API server in that third rule, you could limit it to, say, the specific IP of the machine you're connecting from, or you could have another machine that you connect through as a proxy and specify only the IP address of that machine. Let's go ahead and run that command. It created the firewall, it's looping through and deleting the default rules, and then it added back the three rules I described. I have a note here that if we wanted to lock down the Kubernetes API server further, we could restrict that third firewall rule to a specific set of IP addresses. Let's go look in the Civo interface and see the network and firewall we just created. Under networking → networks, you'll see the default network that we could have used, and the new network that I just created; I also host a Plausible web analytics server here, and that's on its own network as well. We can look at the firewalls: here's the firewall that I created.
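The firewall sequence described above - create, strip the permissive defaults, add back three rules - looks roughly like this. Flag names and the rule-ID lookup are my best recollection of the Civo CLI and are simplified; check the repo's Taskfile for the exact loop:

```bash
# Create the firewall on our network.
civo firewall create "$CLUSTER_NAME" --network "$CLUSTER_NAME" --region "$CIVO_REGION"

# Remove each of the permissive default rules by ID.
for rule_id in $(civo firewall rule ls "$CLUSTER_NAME" --output custom --fields id); do
  civo firewall rule remove "$CLUSTER_NAME" "$rule_id" --yes
done

# Allow only HTTP (80), HTTPS (443), and the kubernetes API (6443) from anywhere.
for port in 80 443 6443; do
  civo firewall rule create "$CLUSTER_NAME" \
    --startport "$port" --endport "$port" \
    --protocol tcp --direction ingress --cidr 0.0.0.0/0
done
```

To lock down the API server, you'd replace `0.0.0.0/0` on the 6443 rule with a narrower CIDR such as your own IP.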
It's attached to the same network we just created. We can look at the rules and see that for inbound traffic we have the three rules I described, and we're allowing egress on all ports across all IP ranges. At this point, the next step is to create the cluster itself. We're going to run the `civo kubernetes create` command, pass it the cluster name, and specify the network and the firewall we just created. In this case we're creating a cluster with two nodes using this machine type; if you want additional CPU or memory, you can either change the number of nodes or change the size used to create those nodes. By default, Civo installs a Traefik ingress controller using a NodePort configuration. We don't want that for our cluster - we're going to start with a base cluster that has no ingress controller. And then the wait option tells the command line to wait until the cluster is done provisioning before it returns. So I'll run `task 03-create-cluster`; it executes that command, substituting in the environment variables as specified in my Taskfile, and this will take between 90 seconds and two minutes. If we navigate back over to the Civo dashboard, we can see the cluster listed on the Kubernetes page. It looks like one of my two nodes is active and the other is still coming online. This is the node type I specified: two CPUs, 4 GB of memory, and a 50 GB disk. Let's click into it. Okay, now it has completed. The first time I tried it, it failed for some reason, so I deleted and recreated it; it completed this time in about a minute and a half, so that's great. And if we run kubectx, we see that currently we only have the kind cluster in our kubeconfig file. However, we have a get-kubeconfig task which executes `civo kubernetes config`, passing in the cluster name along with the --save and --switch flags, so it merges the credentials into our existing kubeconfig file and switches the default context to the new Civo cluster.
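The cluster-creation and kubeconfig steps amount to roughly these two commands. The node size and the name of the removed Traefik application vary by Civo CLI version, so those values are illustrative:

```bash
# Two-node cluster on our network and firewall, with no preinstalled ingress controller.
civo kubernetes create "$CLUSTER_NAME" \
  --network "$CLUSTER_NAME" \
  --existing-firewall "$CLUSTER_NAME" \
  --nodes 2 \
  --size g4s.kube.medium \
  --remove-applications traefik2-nodeport \
  --wait

# Merge credentials into ~/.kube/config and make this cluster the current context.
civo kubernetes config "$CLUSTER_NAME" --save --switch
```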
Now if I do kubectx, we see both the kind cluster and the devops-directive-kubernetes-course cluster, and if I do `kubectl get nodes`, we see the two nodes within that cluster. Also, you'll notice throughout the course that I oftentimes skip typing out kubectl and just do `k get nodes` - that's because I have an alias set with k=kubectl. So now we have our kind cluster set up and a Civo cluster set up, and the third type of cluster that I'm going to demo is Google Kubernetes Engine on GCP. GKE is the oldest and, in my opinion, the best managed cluster experience - if you want a managed cluster and you aren't already using a different cloud provider, GKE is probably the way to go. There are two different modes they offer. There's a standard mode, which will be very similar to what we just saw with Civo: you deploy the cluster, they manage the control plane, and you more or less manage the worker nodes within your Google Cloud account. There's also an autopilot mode, which abstracts even more away, such that you don't really have to think about the control plane at all. You set it up, you interact with the Kubernetes API, and they abstract away all of the worker nodes as well - you just define a workload and they handle all of the underlying node provisioning and management. So if you want to really minimize the operations involved in dealing with Kubernetes, autopilot can be a great way to go. We're going to go with the standard mode just so we get a little more hands-on experience working with clusters. Google Cloud also has a free trial for new users: within the first 90 days you can get a $300 credit. Also, within their free tier you can have one zonal cluster, meaning the control plane is deployed within a single zone rather than across multiple zones. That covers the control plane only - any worker nodes that you deploy, you're going to have to pay for outside of that free trial - but it is nice that they
have one zonal cluster control plane included within Google Cloud. Again, we're going to be able to demonstrate networking that is more realistic relative to what you'd use in a production setting, with load balancers that we can route public DNS to, just like on Civo. Within Google Cloud there are multiple different persistent storage classes - Civo just has one - so you can deploy different types of storage with different speeds, IO capabilities, etc. Google Cloud also provides some basic but useful workload monitoring capabilities out of the box: you'll be able to see things like how much CPU and memory your workloads are using, as well as view aggregated logs from across all of your workloads. If we jump back to our code editor and once again list out our tasks, similarly we're going to start by initializing the CLI. Interestingly, Google doesn't enable a lot of their cloud APIs out of the box - you have to tell it specifically, in each project, which APIs you want to be able to use - so we'll enable all those APIs. We're going to set a default region and zone, just so we don't have to specify them every single time, and then the following steps are pretty similar to what we did with Civo: we'll create a VPC, or virtual private cloud - that's essentially the network - we'll create a subnet within that network, and once we've done that we can create the cluster itself. The create-all command just combines all of these steps together, so if you wanted to run a single command and bring up the cluster like that, you can do so. Just like with Civo, you could do all of this through the web interface - you could click here, create a GKE cluster, and go through the web UI - however, we want to have everything codified in our repo, so we're going to do it via the command line. An even better approach would be to use an infrastructure-as-code tool like Terraform to handle this; I'm just using the CLI because it's a quick way to get up and
running, but it still allows us to codify our configuration in the repo. The very first thing we're going to do is initialize the CLI, so we'll run `t 01-init-cli`. It walks us through the setup of the command line; in this case I'm going to reinitialize the configuration with new settings, so I'll pick number one - if you haven't already initialized, you'll be picking number two, create new configuration. It does a little bit of diagnostics, you pick which account you want to use and which project - in this case I want to use the kubernetes course project - and we'll configure a default compute region and zone. I'm going to use us-central1-a as my default, which is number eight, and we should be good to go. Next up, I'll run the enable-apis command. The APIs that we're going to use here are the Compute API, the Container API, the Cloud Resource Manager API, the Identity and Access Management API, Secret Manager, Service Management, and Service Usage. These are all the APIs we'll end up using behind the scenes, so we need to enable them before we're able to do so. Because the default region and zone were already set during that initialization process, you can skip the next task; if they were unset, you could run this command and it would set them for us. Now we want to create a VPC. If we look within the web UI and click on VPC network, we can see there's this default network with 41 subnets and lots of settings that we may or may not want. Instead of using that, we're going to deploy our own new VPC using `gcloud compute networks create`. We're passing it a name for the VPC, and we're giving it the subnet mode custom - this allows us to deploy our own subnets, versus automatically deploying one subnet per zone. Once the VPC is created, we're going to create the subnet. We're naming it subnet-1, we're passing it the name of our VPC, we're passing it the region we want to deploy into, and then we give it an IP range.
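The API, VPC, and subnet steps correspond to gcloud commands along these lines (resource names and the CIDR range are examples, not necessarily the exact values in the repo):

```bash
# Enable the service APIs the course relies on.
gcloud services enable \
  compute.googleapis.com \
  container.googleapis.com \
  cloudresourcemanager.googleapis.com \
  iam.googleapis.com \
  secretmanager.googleapis.com \
  servicemanagement.googleapis.com \
  serviceusage.googleapis.com

# Custom-mode VPC: we manage subnets ourselves instead of one auto subnet per region.
gcloud compute networks create kubernetes-course --subnet-mode=custom

# A single subnet in our default region with an explicit private IP range.
gcloud compute networks subnets create subnet-1 \
  --network=kubernetes-course \
  --region=us-central1 \
  --range=10.0.0.0/20
```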
The IP range is the set of private IP addresses that should be included in this subnet within the VPC. With the VPC and subnet in place, we can now run the create-cluster command. The command running behind the scenes is `gcloud container clusters create`, passing it a cluster name, giving it a zone, giving it the network name from the network we created before and the subnetwork name we created in the previous step. The machine type we're using is this e2-standard-2 machine, with two nodes, and we're turning on the standard Gateway API option - this is so that it will automatically deploy the custom resource definitions for the Gateway API, which we'll talk about a little in the next section. And then this final option, the workload pool, allows us to authenticate to other Google Cloud services without needing static credentials. We're able to use something called workload identity to handle authentication to other Google Cloud services in a much more streamlined fashion than if we needed to create static credentials and pass those in. This is going to take a few minutes, and in the meantime we can go over to the Google Cloud console and take a look at the resources we've created thus far. If we refresh on the VPC page, we can see we have not only the default network but also the new network we created, with our single subnet in the us-central1 region using the IP range that we specified. If we go to the Kubernetes Engine page, we can see the cluster that I'm creating via the command line spinning up: it's configured, it's deploying, and then it will do some health checks just to make sure the cluster is healthy before it deems it complete. All right, it looks like the cluster is up and now it's doing its health checks. We used a few options with that command-line provisioning, but as you can see here, there are tons of different toggles and configurations we could have chosen as we provisioned the cluster.
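The create-cluster command described above looks roughly like this (cluster and network names are illustrative; the `--gateway-api` and `--workload-pool` flags are the ones that enable the Gateway API CRDs and workload identity):

```bash
PROJECT_ID=$(gcloud config get-value project)

gcloud container clusters create kubernetes-course \
  --zone=us-central1-a \
  --network=kubernetes-course \
  --subnetwork=subnet-1 \
  --machine-type=e2-standard-2 \
  --num-nodes=2 \
  --gateway-api=standard \
  --workload-pool="${PROJECT_ID}.svc.id.goog"
```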
If there are non-default settings you want, you should go through the documentation and see whether any of these should be turned on. I've enabled the options we care about for exploring the different Kubernetes capabilities, but there are things here you may want to take a look at. Okay, it looks like our cluster creation is complete, so let's go ahead and do kubectx. You'll notice that the GKE cluster creation process added the credentials to our kubeconfig file automatically and set it as the default, so now we have the kind cluster, the GKE cluster, and the Civo cluster all listed here. We'll use the GKE cluster: `k get nodes` shows our two nodes online and ready, and if we do `k get pods` across all the namespaces, we see that, just like with our kind cluster, we've got the system pods in that kube-system namespace even though we haven't deployed any workloads ourselves. These are things GKE is managing behind the scenes so that our cluster works properly. Each of these cloud providers has a cleanup task as well, so if we wanted to delete the cluster, the network, etc., we could run the gcp cleanup task. However, there are edge cases where maybe you've deployed something associated with the cluster that I haven't accounted for, and if you run that cleanup task and it hits an issue, you may need to go into the console, figure out which resources are blocking, and manually clean those up. I want to call that out because if you leave these resources up and running, you will be paying for them over time once the free trials run out, and I don't want you to accidentally leave a cluster running. I did that when I was first starting out and incurred a multi-hundred-dollar bill from AWS - luckily they forgave it when I explained what had happened - but just make sure you're careful about looking at the resources you're deploying, understanding them, and cleaning them up when they're no longer necessary. You can use
whichever of these cluster types you'd like throughout the course. I wanted to showcase how to deploy all three so that you can pick, but realistically any of them will work, and hopefully this gives you experience working with different types of clusters so that when you encounter a new one in the future, you'll be comfortable and able to work with it effectively. Now that we have an understanding of some of the Kubernetes history and how the system components function, as well as having deployed one or more clusters we can operate with, it's time to take a look at the built-in resources that Kubernetes provides out of the box, what we would use them for, and how they behave. This section is meant to give you a high-level understanding of the different pieces and building blocks, but outside the context of a demo application it's very hard to fully understand the nuances. My hope is that you'll leave this section with an awareness of the breadth of resources that are available, and then, as we revisit these resources in more detail while building out our demo application later in the course, that understanding will solidify and you'll start to see the patterns with which you'd use them in future application designs. The first resource type we want to look at is the namespace. It provides a mechanism to group resources within the cluster - it helps provide organization, or tidiness, where you can take a single application and put it in its own namespace, or take a group of similar applications, or applications that interact with each other, and put them in a namespace together. While it is possible to put everything in the default namespace, you would end up with tons and tons of resources and it would be very hard to understand what was going on, so it's best practice to use namespaces to group things logically, making it easier to reason about and understand the applications running within a cluster. It is
important to call out that, by default, namespaces do not act as a network or security boundary - you can still make network calls across namespaces - so while it might seem like you're isolating your applications with a namespace, they don't provide additional security out of the box. When you first create a Kubernetes cluster, there are going to be four initial namespaces created by the system: the default namespace, which is there just to let you get started without needing to create your own namespaces if you don't want to (though, as I said, it's best practice to do so), and then some system namespaces - kube-system, kube-node-lease, and kube-public - that the Kubernetes system uses. You probably won't need to worry about those too much, except to know that within the kube-system namespace you'll recognize some of the system components we've talked about, like kube-proxy. Within the module 4 directory in the companion repo, you'll see there's a subdirectory for each of the resource types we're going to explore; in this case we'll be taking a look at the namespace folder. Within each folder there will be a Taskfile as well as some YAML files. Those YAML files are where we have the actual definitions of the resources themselves, while the Taskfile contains all the commands we're going to use to interact with a resource. They all start by creating a namespace, and that's why we're looking at namespaces first: by creating a namespace for each resource type, we're able to isolate things and avoid name conflicts even if we don't clean up from one resource to the next. While this may be a little over the top in terms of creating tons of namespaces, I thought it would be useful for isolating each resource type and avoiding potential conflicts. So hopefully that gives you some idea of the structure. I'm going to jump over to my code editor and pull up these same
files, and then we can execute them against one of our clusters. I could use any of the three clusters that I set up; in this case I'm going to use the kind cluster, just because it's simple and local - the results would be identical across all of these clusters, though. I want to call out at this point another useful command that the kubectl interface has, and that is `kubectl explain`. You can do `kubectl explain` followed by any resource type - in this case we'll do namespace - and it will tell us all the different fields we can specify, with a description of what each field represents. Within this top-level set of fields you can also dive in to see any of the sub-fields; you could, for example, do `k explain namespace.metadata`, and within the metadata we can see all the different fields - in this case the only field we're setting is the name field. For each of these resources we'll also be using the kubectl command line to investigate and inspect things, so before we've done anything else we can do `k get namespaces`. There are the four namespaces I called out, which would be in any Kubernetes cluster upon startup, and then there's this additional local-path-storage namespace that is specific to kind: it provisions an application in that namespace to enable us to use local paths on our host system for persistent storage. At the top here we can see the definition of a namespace. We specify that it is of kind Namespace and belongs to API version v1 - that's the root Kubernetes API, version one - and the only information we really need to provide is the metadata name field; this is the name that will show up after we create it. Now, if I navigate into the namespace subdirectory where this Taskfile lives and do `tl` for task list, we can see there are three different commands here. The first one creates a namespace via the command line.
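The namespace definition being described is about as small as a Kubernetes manifest gets; following the course's module naming convention, it would look like:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: 04--namespace-file
```

You'd apply it with `kubectl apply -f <file>` (the filename itself is up to you).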
We can just say `kubectl create namespace` and then pass it a name, so I'll go ahead and do that. It executed this command, and we see that the namespace 04--namespace-cli was created; I can do `k get namespaces` and see that it now exists. However, this is creating it in more of an imperative fashion. If we want to store the definition of our resources in git, we do so by putting the definition within a YAML file, and then, instead of running the create command, we can run the apply command: `kubectl apply -f` (f for file), passing the name of that resource configuration file. In this case we can run the second task; it creates the namespace based on this file, and now again we can do `k get namespaces` and see the new 04--namespace-file namespace was created. In general, it is preferred to store the definitions of your resources within your codebase rather than creating them imperatively on the command line, so I would generally recommend that you create these files and use the apply command rather than creating resources directly with `kubectl create namespace`. Namespaces are simple enough that it doesn't matter too much, but as we move to more complex resources this will matter more and more. Now, just to clean up, we can run the delete-namespaces command, which runs two commands in sequence: first we delete one namespace by passing its name via the command line, and then, alternatively, we pass the file containing the definition, and it finds the resources contained within that file and deletes those. If I do `k get namespaces` again, we can see the two namespaces we previously created are gone. I wanted to cover namespaces first because we're going to use them in each of these lessons, but the next resource we're looking at, the pod, is really the foundational building block upon which the rest of our workloads in Kubernetes will be built. It is often cited as the smallest deployable unit within
Kubernetes, and it is where our containers are going to run. However, I'll caveat that with the fact that you're almost never going to create a pod directly. We're going to do so here for educational purposes, so that we can look at it and interact with it in isolation, but there are higher-level controllers and resources, which we'll get to down the road, that we'll use in almost every scenario. I've included here a minimal definition for a pod resource: we need to give the pod a name within the metadata field, and then we need to specify one or more containers within the spec field, giving each of those containers a name and a container image to reference. While a pod can have a single container, it can also have multiple containers within it. If you have multiple containers within a pod, generally you're going to have one primary application container, and then you might have what are known as init containers or sidecar containers. An init container is a container that runs before the startup of the other containers; it can be used to do things like preparing files in the file system or injecting dependencies, such as a metrics aggregator, into the primary container - there's a number of use cases there. Sidecar containers don't run before the primary container but run alongside it in perpetuity. These are often used for things like serving as a network proxy - for example, on GCP there's a proxy provided by Google to help you connect to their managed database instances, called Cloud SQL Proxy, and that is often run as a sidecar container. Now, within a pod, the containers share networking resources and storage resources: they can have shared access to volumes, so they can pass files and signals between each other using those volumes. And while my previous configuration literally contained only a name and an image, there are tons of different configuration options available.
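A minimal pod manifest matching that description - one named container and its image, everything else left at defaults (the pod name and image tag here are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-minimal
spec:
  containers:
    - name: nginx
      image: nginx:1.26
```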
are available such as which ports the application should be listening on uh things like liveness and Readiness probes that tell the cuet how to check if the application is healthy or not and how often to check you can set the uh CPU memory and storage requests and limits so the requests are the amount that you require and and the limits are the maximum amount that the system should allow uh the application to utilize there is a security context which allows you to specify whether or not the application is allowed to run as root whether it should have readon access to the file system Etc we can specify environment variables and that can be a way to manage configuration across different environments or pass in sensitive information in the form of credentials you can specify volume that could be used for persistent storage or configuration and you can specify how DNS should resolve for that particular pod as we deploy both in this lesson and future lessons we'll explore how a number of these different configuration options manifest now I wanted to just throw this slide in here to give a number of useful patterns kind of a cheat sheet for using Cube CTL you'll see me follow and execute many of these commands throughout I'm not going to go through each one right now but I put it here for reference so that you can come back and take a look at it the general structure of these commands is going to be Cube CTL and then a verb like uh get create apply edit Etc and then a noun where that noun is a is the resource type so that could be namespace or pod or any of these other resources that we're learning about in this section if you're targeting a particular namespace you would do the- n flag with the namespace name and then you can get the output in a variety of formats uh by default def fault it's going to be a format that looks nice in the command line via standard out however if you want it in Json or yaml such that you can do parsing of the results via uh some other 
You specify the output format with the -o flag. Jumping back to our code editor and navigating to the pod subdirectory of lesson 4, we can see that we have a namespace file. That namespace file follows the convention 04--pod, meaning it's module 4, then a double dash, and then the resource name, pod. We also have two pod definitions here that we're going to take a look at, but first let's peek into the Taskfile and see the commands we'll be running. As you'll notice, it's very familiar: the first thing we do is create a namespace, so we'll apply that namespace YAML by running task 01-create-namespace. You can see it does two things: first it runs the `kubectl apply` command on our namespace resource, creating that 04--pod namespace, and second it calls the kubens command, setting that namespace as our default so that in future commands we don't have to pass the -n flag. Now, if I try `k get pods`, we can see we haven't deployed anything yet. Next, let's use the `kubectl run` command to create a pod directly from the command line. You'll notice I've provided a big warning saying please, please do not do this. It works just fine; however, you now have no record within your codebase that this pod was created, and it's likely to live on in perpetuity as an orphaned resource until it causes some issue down the road. But if we look at the command it ran: it executed `kubectl run`, we passed the image via the --image option - in this case an nginx web server, version 1.26 - we deployed it into the namespace that we created, and I named it created-the-wrong-way. So if we do `k get pods`, we can see created-the-wrong-way is the name of our pod, and it's been running for 41 seconds. For now we'll just leave it there - it's not harming anything - and we can take a look at the two pod configurations we have here in our YAML files.
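For the record, the imperative command being warned against looks like this (again: it works, but leaves no trace in your codebase):

```bash
kubectl run created-the-wrong-way \
  --image=nginx:1.26 \
  --namespace=04--pod
```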
The first one is a super minimal configuration with the minimum set of values specified: we're providing a name, we're providing one container, and that container has a name and an image. This container, similar to the one we created with the `kubectl run` command, will take the default values for all of the other configuration specs; however, at least this time we have the resource YAML in our codebase, so it's possible for us to understand and reason about why it exists the way it does. I can apply it by running the minimal-apply task: it starts in the ContainerCreating state, and a few seconds later it's running. These two have pretty much identical specifications, but the minimal YAML is preferred because, again, we have a declarative state we can store in our code repository. That being said, there's a number of default values that this minimal configuration takes on, and we can do better - that's what this nginx-better pod specification is doing. Many of these ideas and specifications are pulled from a friend of mine, Bret Fisher, who has a GitHub repo called podspec - you should definitely check it out. It provides a set of good defaults you can use within your pod spec, along with an understanding of which of these values matter and which are fine to leave at the regular default. In this case we're specifying both the name and the namespace. In the minimal configuration we didn't specify a namespace, so it was deployed into whatever namespace our kubeconfig happened to set as the default - because in my previous task I had set my default namespace to 04--pod, it was deployed there, but otherwise it would have gone into the default namespace. For our better spec, we're actually specifying the namespace in the configuration, so it's super clear and explicit - there's no way to confuse which namespace it ends up in. Also, instead of only specifying a name and an
image, you can see I've specified a whole bunch more information about the container. I'm also using an image from Chainguard - Chainguard is an interesting company that helps build more secure container images, so if you're concerned about vulnerabilities and keeping your images secure, definitely check out the types of images they provide. You'll see that I've provided a listing of the ports the app is listening on. For something like nginx, generally you want it serving content on some port - in this case 8080 is the default for this image - so we'll be able to test that in a second. I've also specified the readiness probe that the kubelet should use: just make a request on that port using the root path. I've given it resource requests and limits; in this case I'm specifying memory and CPU requests, so this container is guaranteed at least 50 mebibytes of memory and 250 millicores of CPU, and I'm specifying a limit only for memory. This is important because if you don't specify a memory limit and the container has a memory leak, it can consume more and more memory until it's competing with other containers and pods scheduled on the same node, which can cause issues - Kubernetes will start to evict some of your pods, which can cause instability in your application. We've also set the security context to disallow privilege escalation, set the privileged value to false, enabled the seccomp runtime default profile, and set specific non-root users and groups. As long as our application and container image are designed to run in such a fashion, these provide an additional layer of security, such that there's no potential for privilege escalation should an attacker gain access to this runtime environment. We apply this in the same way with the corresponding task, and now all three of our pods are running.
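Pulling the pieces just described into one manifest gives a sketch like the following. The image tag, user/group IDs, and exact request values are illustrative stand-ins for what's in the repo:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-better
  namespace: 04--pod
spec:
  securityContext:
    runAsUser: 1001        # example non-root IDs
    runAsGroup: 1001
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: nginx
      image: cgr.dev/chainguard/nginx:latest
      ports:
        - containerPort: 8080   # this image listens on 8080, not 80
      readinessProbe:
        httpGet:
          path: /
          port: 8080
      resources:
        requests:
          memory: "50Mi"
          cpu: "250m"
        limits:
          memory: "50Mi"   # memory limit only; CPU is left unlimited
      securityContext:
        privileged: false
        allowPrivilegeEscalation: false
```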
To show you that these pods are actually running nginx and actually serving something, we can use a command called kubectl port-forward. Let's start by port forwarding to the minimal pod: we run kubectl port-forward in the namespace we're operating in, with the name of the pod, and we go from port 8080 on my host system (so 8080 in my browser or on the command line) to port 80 within that container. If I open localhost:8080, we see the default nginx page served up, which shows that it is running and doing what we expect. However, if you remember, we didn't actually specify any ports here; we can still port forward to that pod directly, but it's much better if we specify the ports it is expected to listen on, so that others looking at the configuration can easily read it and see: oh, it's listening on port 80. Similarly, we can port forward to the other pod, but in this case we're forwarding from 8080 on the local system to 8080 in the container. That's because this image runs as a non-root user, and that non-root user cannot bind to port 80 inside the container. A slight nuance there, but it's important to understand which ports the applications are listening on so that you can connect to them on the right port; having that defined within the spec makes it very clear to other teammates that this pod is going to be listening on port 8080. If I reload this, we get the exact same thing; those two container images are quite similar, except that the Chainguard one is a bit more minimal and likely has fewer vulnerabilities. Finally, before we move on, I'm just going to delete the namespace, and conveniently, deleting a namespace recursively deletes the resources inside of it, so all of the pods I created are now going to be deleted. If I do k get namespace, you can see that namespace no longer exists, and with k get pods, the namespace doesn't exist, so the pods definitely don't exist within it. Building up from the base level of the pod, we can now move on to the
ReplicaSet. The ReplicaSet takes a pod definition and wraps it in another layer, adding the concept of replicas. It takes the definition of the pod that you provide, and then you can specify from one to n replicas, and there's a controller within the kube-controller-manager that ensures we have the specified number of instances of that pod running at all times. You'll notice, highlighted in green, that the specification of the pod on the left maps one-to-one with the specification under the template section of the ReplicaSet, so any of those options we talked about before, we can configure for the ReplicaSet. The way you connect a ReplicaSet to its underlying pods is through labels: in the selector we use matchLabels with some key-value pairs, and when managing these, the ReplicaSet controller is going to look for pods containing those same labels. So we need to make sure that whatever we specify in that selector, we also add as labels in the pod template; these get applied to the individual pods, and that's how the controller maintains the link between the ReplicaSet and the pods. Again, like pods, you're almost never going to manage a ReplicaSet or even interact with one directly; instead we'll use a higher-level construct, the Deployment, which we'll learn about next. Let's jump over to our code editor, create some ReplicaSets, and see how they behave. I'll navigate to the replica-set folder; we've got a Taskfile, and the first thing I'll do, as always, is create the namespace. So we created this 04-replicaset namespace, and then we use the kubens command to set it as the default. Now let's take a look at the minimal ReplicaSet definition. It aligns with what I showed on the slide: at the bottom, within the template section, we have a specification that contains the definition for our pod, with a label that's going to be applied to each of those pods.
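A minimal ReplicaSet along these lines might look like the following sketch. The names and image tag are illustrative, not the exact file from the course repo.

```yaml
# Minimal ReplicaSet sketch: the template section is a pod spec, and the
# matchLabels selector must match the labels on that pod template.
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: nginx-minimal
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx-minimal
  template:
    metadata:
      labels:
        app: nginx-minimal   # must match the selector above
    spec:
      containers:
        - name: nginx
          image: nginx:1.26
```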
We also have our matchLabels selector, which allows the ReplicaSet controller to find and create those pods. If I apply this definition, we expect one ReplicaSet to be created, and then three pods (because I have replicas set to three) to be created as well. Applying this file with the kubectl apply -f command, we see the ReplicaSet was created. I can do k get replicasets, or just k get rs; it shows our ReplicaSet, the desired number of replicas, the current number of replicas, and how many of those are determined to be ready based on their health checks. If I do k get pods now, we see each of these three pods has come up based on that single ReplicaSet. An interesting thing here: each pod gets a random hash at the end of its name so that we don't have naming conflicts; the controller takes the name of our ReplicaSet and appends this as a suffix to each of the pods it creates. However, watch this: I'll delete one of these pods, then run k get pods again, and we have two of the original three running, but the third, which was deleted, has been replaced. The ReplicaSet controller noticed that the number of replicas was lower than the desired state and added a new pod to compensate. Like I said, you'll very rarely interact with ReplicaSets directly. You'll see them in your cluster because we're going to use Deployments, and Deployments create ReplicaSets, but it's useful to understand what they are: what we're adding here is the number of replicas, plus the use of labels to connect the ReplicaSet to its underlying pods. ReplicaSets are great for maintaining a static definition of a pod and keeping the number of instances that we want alive. However, changing the pod spec, by doing things like updating the container image within it or modifying the amount of resources the pod needs, basically any change to the specification of
that pod, is something the ReplicaSet can't handle in and of itself. For that we move one step further up this chain and create a Deployment. If you look at the definitions here, again we can go from pod to ReplicaSet, where the spec within the template matches that of the pod, and then, as you can see in this minimal example, the spec of the Deployment matches the spec of the ReplicaSet. There are some additional fields that I'm not showing here in the Deployment spec that you can use, such as the strategy with which you want to handle rollouts, and the revisionHistoryLimit (how many older versions of a Deployment you want to keep around), but for the most part the specs between these two resources are quite similar. The main difference is that the Deployment controller adds the concept of rollouts and rollbacks, so you can specify how you want the pods to change as you go from one version of the Deployment to the next. For example, if we update the image tag to a new version of our application, we can specify: do we want that to happen one pod at a time, how many pods are we okay with being unhealthy at a time, etc. This gives us all of the tooling we need to define our applications, take advantage of the ReplicaSet controller to ensure the number of replicas that we want are running, and then take advantage of the Deployment capabilities to smoothly roll from one version or configuration to the next. You're going to use Deployments very frequently; for pretty much any stateless application that you run on top of kubernetes, you'll reach for the Deployment as your resource of choice. It allows you to define that pod spec, specify how many replicas you want, and smoothly move from one version or configuration to the next. Let's jump over to the code editor, create a few Deployments, and see how they behave. I'll navigate to the deployment subdirectory, and once again we'll create the namespace.
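As a sketch, a minimal Deployment is nearly identical to the minimal ReplicaSet, just with a different kind. Again, names and image are illustrative rather than the exact repo file.

```yaml
# Minimal Deployment sketch: same template/selector shape as a ReplicaSet,
# but the Deployment controller adds rollout and rollback behavior on top.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-minimal
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx-minimal
  template:
    metadata:
      labels:
        app: nginx-minimal
    spec:
      containers:
        - name: nginx
          image: nginx:1.26
```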
Now we've got this 04-deployment namespace and we've set it as the default. Let's apply the minimal deployment configuration, and we'll apply the better configuration as well. These are pretty much the same concepts we saw in the pod lesson, where the minimal specification only contains the absolute minimum set of required values, whereas the better specification includes all of the additional pod configuration options. We can do k get deployments to see all the deployments in this namespace; in this case we have our two deployments, and one was just created, so only one of its three pods was available at the time. If I run it again, we see that both of the deployments are healthy. We can do k get replicasets, and we see one ReplicaSet for each: we created the Deployment, kubernetes in turn created an underlying ReplicaSet, and each of those created and manages the underlying pods. So if I now do k get pods, I expect to see six: three associated with the minimal deployment and three associated with the better deployment. Great. I mentioned this concept of Deployments adding rollouts and rollbacks; we can showcase this in one of a few ways. One, we could modify the container image specified in the Deployment specification and reapply it; that would trigger an update to the Deployment, which would create a new ReplicaSet, which would create new pods. Or we can issue the rollout restart command. So we run task t4, which issued kubectl rollout restart on the nginx-better deployment, and I'm watching the kubectl get pods command, and we saw it cycle through those pods one at a time: bringing up a new one, deleting an old one, bringing up a new one, deleting an old one. We can also look at the ReplicaSets now, and we see there's one ReplicaSet for the minimal deployment, but because we've gone through this upgrade, or this rollout, it actually created a new ReplicaSet for the better deployment. The old one still remains, but now its number of replicas is set to zero.
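The rollout behavior being demonstrated can be tuned with the strategy and revisionHistoryLimit fields mentioned earlier. A hypothetical stanza inside a Deployment spec (values are illustrative) might look like:

```yaml
# Hypothetical rollout tuning inside a Deployment's spec.
spec:
  revisionHistoryLimit: 5    # keep 5 old ReplicaSets around for rollback
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1            # at most one extra pod above the desired count
      maxUnavailable: 0      # never drop below the desired count mid-rollout
```

The imperative commands driving what's shown on screen are kubectl rollout restart deployment/<name> and kubectl rollout undo deployment/<name>.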
With replicas set to zero, the old ReplicaSet isn't creating any pods, but it's kept around so that we could roll back if we wanted. For example, we could run the kubectl rollout undo command, and it will roll back to that previous state: we'll see the number of pods in the old ReplicaSet jump back up and the number of pods in the current ReplicaSet go down. There we go. Now, as I mentioned, the other thing that might trigger a rollout like this is modifying the pod specification. For example, if we went into the nginx minimal configuration, updated the image to version 1.27, and reapplied it with kubectl apply -f on the nginx minimal file, we can see that it tears down some of the old pods and creates new pods in their place. This is a new pod that has been created with the new image; these are still the old pods that haven't been torn down yet. Once that rollout completes, we have three new pods, all running the new image. This shows the power of the declarative approach: because we have a single definition of our application deployment, we can modify fields within that specification as we upgrade things or need to change our configuration, then reapply that same definition, and kubernetes takes the new state and figures out, based on its controllers, how to reconcile it with the state of the cluster, such that we get the end state we desire. By default it does so in a rolling fashion, replacing one pod at a time, so we never had any downtime: we had multiple pods running, and it would only remove an old pod after the new pod was healthy. We'll be using Deployments quite a bit for the stateless components of the demo application in a future lesson. Now that we have a mechanism to host our applications using the Deployment, the next logical question is how to get network traffic to those applications, and the answer in kubernetes is a Service.
Effectively, a service serves as a load balancer across replicas of our application; it could be accessible only internally, or it can be accessible outside the cluster as well. Like the connection between the ReplicaSet or Deployment and the underlying pods, services use labels to determine which pods to serve traffic to. There are three primary types of services that we care about. The first is known as a ClusterIP service; this is the default type, and it is accessible only within the cluster. It provides a stable IP address that routes traffic to any number of replicas containing the appropriate labels. In most cases, when we create a Deployment we'll create a service alongside it to route traffic to the pods within that Deployment. A NodePort is similar to a ClusterIP, except that instead of being accessible only inside the cluster, it listens on every node within the cluster, so you can route traffic from outside the cluster into the cluster and to your application. And finally, the LoadBalancer service type uses the cloud-controller-manager component of kubernetes to talk to the cloud API for whichever cloud you're running on and provision a load balancer within their system; that load balancer is then used to bring traffic into the cluster. It's important to note that these load balancers generally have a charge associated with them in each cloud provider, so adding a LoadBalancer service for every single application that needs external traffic would add up quickly. There are other mechanisms that we'll talk about down the road where you can use a single point of entry, either a load balancer or a NodePort, and then route traffic using either what's called an Ingress or the Gateway API; we'll talk about those in a little bit.
Looking at the service specs down at the bottom, the main things you need to define are the ports and protocols the service is listening on: the port is the inbound port for addressing the service, and the targetPort is the port on the pod that it connects to. You also need a selector, and this needs to match the labels specified in your pod definition. In the NodePort specification, you can see at the bottom I have a commented-out nodePort value; you can specify an exact node port, which needs to be between 30000 and 32767, and if you don't specify one, kubernetes will pick a port within that range and automatically deconflict with any other services. If we look at the diagram in the upper right, you can see traffic can reach our cluster from the outside world, either via a NodePort service or via a LoadBalancer service, while the ClusterIP service is only accessible from within the cluster; here we have application B talking to application A via that ClusterIP service. Behind the scenes, kubernetes sets up all the necessary networking to route that traffic across nodes, as long as they're connected to the cluster. Let's go ahead and jump to the code editor, provision a few services, and see how they behave. We'll need to navigate to the service subdirectory, and as you can see, "task was not found"; that's because I restarted my computer between filming sessions, so I can just run devbox shell, which finds the devbox configuration and starts a new shell with all those dependencies available. As always, we start by creating the namespace, so now we have the 04-service namespace set as our default. Next we'll apply this deployment configuration; it matches the minimal configuration from the previous lesson, with the main difference being that I've changed the labels a little bit, just to showcase which of these labels matters for which purpose.
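As a sketch of the service shape being described (the name, selector key, and ports are illustrative, not the exact repo files):

```yaml
# Sketch of a service; ClusterIP is the default type, and the NodePort
# variation is shown as a comment.
apiVersion: v1
kind: Service
metadata:
  name: nginx-clusterip
spec:
  type: ClusterIP          # or NodePort / LoadBalancer
  selector:
    app: nginx             # must match the POD labels, not the Deployment's own labels
  ports:
    - protocol: TCP
      port: 80             # port the service is addressed on
      targetPort: 8080     # port the container listens on
      # nodePort: 31500    # NodePort type only; must be 30000-32767, auto-assigned if omitted
```

Within the same namespace this resolves as just the service name; from another namespace you'd use the fully qualified form <service>.<namespace>.svc.cluster.local.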
In this case, the key for the label on the deployment is foo, the key for an annotation on the deployment is bar, the selector uses the key baz, and the pod template uses the label baz. The important thing is that these last two match, and we'll also have to use that same baz pod label key-value pair in our service to route traffic to it. We can do a kubectl apply on that deployment file, and it creates an nginx-minimal deployment within this namespace; those pods are all up and running. Now we're going to create three services, one for each of the types I described. I've added a warning here for when we create the NodePort type service: unless we have our firewall rules or security groups set up to allow traffic on that node port, we won't be able to reach the cluster on it. So if you do want to use a NodePort to bring traffic into your cluster, you'll need to update those rules to allow that traffic. We can do k get service, or just k get svc. For this example I was using the cluster deployed onto Civo Cloud, and when we run k get service you'll notice the ClusterIP service has an internal IP address (a private IP), the NodePort does as well, and the LoadBalancer has both an internal address and a public, external-facing address that we can use to access the service. You'll see that the NodePort service is listening both on port 80 and on port 31500. If I access that external IP from my browser, you can see traffic is routed into my cluster. If we go into the web interface for Civo, under the networking and load balancers page, we can see this load balancer was just provisioned; we've got the public IP that we were just using, and now we could set up DNS and route traffic to our service from the public internet. This highlights the usage of the cloud-controller-manager, where we're interfacing with a cloud provider to provision a resource outside of the cluster that
integrates nicely with the cluster. Also, while it's nice that it creates an IP address for us, that wouldn't be all that useful from a service discovery standpoint, and so there's actually another service running within our cluster called CoreDNS. We can look at it in the kube-system namespace: there's a deployment containing two pods for CoreDNS, and what it does is look at all these services and set up DNS records that resolve inside the cluster, so that based on the service name and the namespace, we get a DNS name that routes us to that service. To demonstrate this, I'll create a temporary pod; first I'll create it in the namespace we're working in, the 04-service namespace, and then I'll curl the nginx ClusterIP service on port 80. As you can see, we get the "Welcome to nginx" default index.html returned to us. So if we're in the same namespace, we can just use the name of the service directly. However, if we exit this and create the same pod in the default namespace (or any other namespace for that matter) and try to run the same command, the name no longer resolves. If we want to make a call across namespaces, we have to use the service name, followed by the namespace where the service lives, followed by svc.cluster.local. Now if we run this command, even though we're making the call from the default namespace, by using that fully qualified domain name CoreDNS will resolve it to the service in our 04-service namespace. This will be very important as you work with services and want your applications to address each other within the cluster. I'm also going to run this demo on the kind cluster, just to showcase how to get the LoadBalancer service working properly. I'll apply the deployment and the LoadBalancer service type: t01 to create the namespace, t02 to apply the deployment, and then t05 to apply the LoadBalancer service.
If we did nothing else, this LoadBalancer service would be stuck in a pending state forever, because kind out of the box doesn't have support for LoadBalancer type services. However, if you recall back in lesson 3 when we were setting things up, we had an additional task where we could run cloud-provider-kind to enable those LoadBalancer services. So I'll open a new terminal, navigate to lesson three, and run that cloud-provider-kind command in this terminal (we have to run it with sudo), and now if we do k get service, we can see that our load balancer running on the kind cluster has an external IP address. This isn't truly an external IP that's publicly accessible, but it will be accessible from our host system, so I can now access this from my laptop and reach the deployment running in the kind cluster directly. This is very useful if you want your local development cluster to match the behavior of your remotely deployed cloud clusters. So the Deployment was the right resource type for our long-running stateless applications, but that's not the only type of application we might want to deploy onto kubernetes. There are also applications that we want to run to completion, and that's where the Job resource type comes into play. It's still built on the foundation of a pod, but we're adding to the concept of a pod the idea of one or more completions for a particular container. As you can see in the example specification at the bottom, rather than running something like nginx, which is meant to run as a long-running process, I've instead modified it to use the busybox image. Busybox is a suite of Unix tools bundled up in a container image, and one of those tools that I'm using here is the date command. All this container is going to do is start up, issue the date command, which prints the current date and time to standard out, and then exit. I've also specified a restartPolicy of Never, so that kubernetes
doesn't try to restart this container within the context of that job. On the left-hand side is a standalone pod, and we can see that the specification for the pod translates directly into the spec template portion of the job specification. I've also added a backoffLimit to the job specification; that tells kubernetes how many times, if it were to fail on the first try, it should retry before giving up on completing the job. Let's jump over to the code editor and create some jobs to see how they behave. I'll navigate to the job subdirectory, and we'll start by creating the namespace. First, let's create this pod directly, without a job. You can see it starts in the ContainerCreating state, and then its status is Completed. We can look at the logs for this pod by doing k logs and passing it the name, and we see it printed the current date and time to standard out, just like we expected. However, if something had gone wrong during the execution of this pod, say, rather than printing something immediately and exiting, it was a long-running job that might take an hour, or two hours, or a day, and during that time there was some sort of interruption, then if we had defined it as just a pod, kubernetes would not retry it for us. By using a job specification, we can track completions, and if one fails it will automatically retry. So instead we can jump over and use this job specification. Here we're doing kubectl apply on the minimal job YAML; we can list the jobs, and here's our job in a Complete status with one of one desired completions. The default number of completions is one; you can modify that if you want it to execute multiple times, and we can see it took 3 seconds to complete. Let's look at the pods: this is the pod we created standalone, manually, and then this is the pod the job controller created based on our job specification. It shows zero of one ready because the pod has already run to completion and is no longer executing.
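A sketch of such a minimal job might look like this (names and tags are illustrative, not the exact repo file):

```yaml
# Minimal Job sketch: run the busybox date command once to completion.
apiVersion: batch/v1
kind: Job
metadata:
  name: echo-date
spec:
  backoffLimit: 1          # retry once before marking the job failed
  template:
    spec:
      restartPolicy: Never # don't restart the container in place
      containers:
        - name: date
          image: busybox:1.36
          command: ["date"]
```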
Now let's take a look at this additional job specification, using some of the other options available for configuration. Here I've set parallelism to two, so it will run up to two copies of this pod in parallel; we want two completions, so it will run them until we have two successful completions. I've specified an activeDeadlineSeconds, which is how long a pod should be allowed to run before kubernetes decides it has taken too long and kills it, and again we've specified a backoffLimit, saying retry up to one time. Now, I forget whether this is inclusive or exclusive, so let's actually do a k explain on jobs to check: the backoffLimit field specifies the number of retries before marking the job failed, so by specifying one, it will retry one time, and if that retry failed, it would mark the job as failed. As we can see, the default value would have been six. I've also modified the container spec slightly: we're using a Chainguard image rather than the upstream busybox image, we're specifying limits for memory and requests for memory and CPU, and we're modifying the security context, following many of the best practices I highlighted in the section where we covered pods. Now, if I apply that better job, we expect two pods to be created at the same time because of the parallelism and completions settings; we can see that we have two of two completions, and the job is now done. We're going to reach for a Job whenever we have a specific task that needs to be completed one or more times, rather than a long-running process that kubernetes should keep up forever. Continuing down this path of running workloads that are expected to complete: we just learned about the Job and how we can use it to run a one-off task to one or more completions, but there are many cases where you want a job like that to run periodically, on a schedule. That's where the CronJob comes in.
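To recap the Job options just discussed, a "better" job along those lines might be sketched like this; the name, image path, and deadline value are illustrative assumptions rather than the exact repo file:

```yaml
# Sketch of a job using parallelism, completions, and an active deadline.
apiVersion: batch/v1
kind: Job
metadata:
  name: echo-date-better
spec:
  parallelism: 2             # run up to two pods at once
  completions: 2             # need two successful completions overall
  activeDeadlineSeconds: 120 # kill pods that run longer than this
  backoffLimit: 1            # one retry before the job is marked failed
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: date
          image: cgr.dev/chainguard/busybox:latest
          command: ["date"]
```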
Here we can create a CronJob and specify a schedule in the form of a cron string. You'll notice the five-field string under the schedule key: five asterisks means run every minute, and each field in that string specifies a different portion of the time at which it should run. So once again we're building up from the pod: we have a pod that runs to completion, we wrap that in a Job, and then we further define a CronJob, which executes our Job on a schedule. For learning how to specify the cron string, or for figuring out the cron string you care about, I'd suggest going to the website crontab.guru. As you can see, in my example I had specified five stars, which just means "at every minute". Say we wanted to run once a day at midnight: we could write "0 0 * * *", and that would run at midnight UTC every day. You specify different values for the different fields within the string, and that gives you a different schedule on which the CronJob will run. There's not much difference between CronJobs and Jobs, except that CronJobs execute automatically on their schedule, whereas Jobs execute only at creation time. Let's jump to the code editor, create some cron jobs, and see how they behave. We'll navigate to the cron-job subdirectory and create the namespace. Let's go ahead and apply the better CronJob definition; we can see the CronJob was created, but if I do k get jobs, there are no jobs yet. We have to wait until the next minute occurs, at which point a job will be spawned based on our schedule. Now that the next minute has ticked over, we can see the CronJob has spawned a job; it's running, and now it's completed. If we let this run, every minute we would get a new job created from that template, which would run based on our specification. However, when debugging, your schedule is usually not going to be as frequent as every minute; you're probably going to run something daily or a few times a day.
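A CronJob wrapping a job like the one above might be sketched as follows (names are illustrative):

```yaml
# CronJob sketch: the jobTemplate is a Job spec, executed on the schedule.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: echo-date-cron
spec:
  schedule: "0 0 * * *"   # minute hour day-of-month month day-of-week; daily at midnight
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: date
              image: busybox:1.36
              command: ["date"]
```

You can also trigger it on demand with kubectl create job --from=cronjob/<cronjob-name> <job-name>, which is what the next demo shows.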
So it's not very convenient to have to wait until the scheduled time occurs in order to test that your configuration is correct. There is a convenient option to create a job from a CronJob specification: we run kubectl create job --from, pass it the name of our CronJob, and give it a name for the job that should be created; in this case I named the job manually-triggered. If I do k get jobs, we can see here's the one that was automatically created from the schedule, and here's the one that I just manually created. Now that another minute has passed, if I do k get jobs again, we can see two jobs have been created automatically, plus the manually triggered one I created via the command line. This can be a useful technique if you're actively iterating on a job and want to make sure your updated configuration is working, without having to wait for the next instance of the cron schedule. Another type of application you might want to run on a kubernetes cluster is one that requires having an instance on each of our different nodes, and for that we use what is called a DaemonSet. The types of applications that use this are things like log or metrics aggregation: you often want one copy of your application running on each node, where it can scrape those logs or metrics and propagate them to some other system. You might also want to do some sort of monitoring of the node itself, or, if you need to do something like set up storage on those nodes, having the application running locally is necessary or improves performance. In these examples I'm using a different container, because using the nginx image wouldn't make much sense here; instead I'm using the fluentd container image. Fluentd is a log shipping agent; in this case it's not configured to send the logs anywhere, but I figured I should at least use an application that would make sense in this context.
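A DaemonSet along these lines might be sketched as follows; the image tag and label key are illustrative assumptions, not the exact repo file:

```yaml
# DaemonSet sketch: one fluentd pod per eligible node.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
spec:
  selector:
    matchLabels:
      app: fluentd
  template:
    metadata:
      labels:
        app: fluentd
    spec:
      containers:
        - name: fluentd
          image: fluent/fluentd:v1.16-1
      # To also run on control plane nodes, a toleration like this is needed:
      # tolerations:
      #   - key: node-role.kubernetes.io/control-plane
      #     operator: Exists
      #     effect: NoSchedule
```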
On the left-hand side we've got a standalone pod definition, and on the right-hand side we see that pod definition maps into the spec portion of our DaemonSet template. Just like with our Deployment, we use the matchLabels selector to align the DaemonSet resource with the pods it creates and manages. By default, a DaemonSet runs one copy on every node except the control plane nodes; you can modify this, for example to have it run only on a subset of nodes based on particular conditions, or so that it does run on the control plane nodes if that were necessary. Let's jump to our code editor and deploy a DaemonSet to see how it behaves. We'll navigate to the daemon-set directory, create our namespace, and then create our DaemonSet. As you can see, it created two pods. I'll do k get pods -o wide, which gives us some additional information that's not included by default; the part we care about here is the node each pod is running on. This was in my kind cluster: we've got our control plane node and our two worker nodes, and the DaemonSet automatically spawned one copy on each of the two worker nodes, but not on the control plane node. As I mentioned, you'll often encounter DaemonSets when working with things like log and metrics aggregation; for example, if you were installing Datadog into the cluster, you'd run their Datadog agent as a DaemonSet so it can collect the necessary information from each of those nodes. And that brings us to the fourth and final built-in workload type within kubernetes: the StatefulSet. A StatefulSet is quite similar to a Deployment, except that it's designed for stateful workloads, and because of this, each pod (each replica) within the StatefulSet has a stable, or sticky, identity: rather than having a random hash at the end of the pod names, we get an ordinal suffix, -0, -1, and so on. Within a
Deployment, if we had used persistent volumes, each replica would have shared a single volume; in a StatefulSet, each pod gets its own separate volumes. And finally, the rollout behavior is ordered: it goes in sequence, 0, 1, 2, and so on, whereas with a Deployment there's no specified order in which kubernetes replaces each pod. The goal of the StatefulSet is to enable configuring workloads that require some sort of state management, such that each pod in the StatefulSet may behave slightly differently. For example, if you have a database configuration where one of the pods is the primary and the other pods are read replicas, you could use a StatefulSet to handle that, because each pod has a sticky identity and maintains its connection to its own separate storage volume. You'll notice that the minimal example here on the right is a bit more involved than many of our other resource types; that's because, in order to showcase some of the features of the StatefulSet, I needed to add an init container. As you can see, the init container here runs a little bash script that stores an HTML snippet at a path on the file system, and that file system path happens to be in a volume shared between the primary container and the init container. So the init container will come up, store the HTML snippet containing the ordinal number of the pod into the volume, and then when the primary container comes up, it will load it. This approach of using an init container to store a specialized configuration for each replica, based on which number in the StatefulSet it is, is a technique you can use to have different configurations loaded into each of your replicas, so that they can know whether they need to act as the primary, a read replica, etc., or whatever configuration your application uses. One thing to note here: in the volumeClaimTemplates
template section we haven't covered persistent volumes or persistent volume claims yet those are coming soon but in order for this stateful set to run successfully that storage class name standard needs to exist in your cluster I believe for the kind cluster and the gke cluster that I've spun up for this course those have out of the box a storage class named Standard so it will work I think the COO cluster by default it has a storage class but it is not named Standard so you would need to update this to use the storage class that exists or create a new storage class named Standard to be compatible stateful sets could be a course in and of themselves uh there are as many types of configuration for a stateful application as there are stateful applications that exist and there are a number of shortcomings associated with the stateful set in kubernetes specifically a big one is that you can't modify many of these fields after it's been created for example you can't modify the resource request or storage size of that persistent volume after the stateful set has been created even though in newer versions of kubernetes you can resize a volume but the stateful set in order to main backwards compatibility only allows modifying very few Fields within its spec there are some approaches that projects have used to try and get around this they're outside the scope of this course and I think a deep dive on stateful sets is important if you plan to run significant stateful applications on top of kubernetes let's jump over to the code editor deploy a stateful set and see how it behaves we'll navigate to the stateful set directory we'll create our namespace and then let's apply that definition containing the init container so I'll load it here just to remind ourselves we're going to have three replicas I'm also explaining here that engine X is a bit silly to use as a example for a stateful application because generally it is run as a stateless web server or proxy but I'm 
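Here's a sketch of how the pieces described above fit together. The init script is a simplified stand-in for the course's actual manifest, and the names are illustrative, but it shows the shared volume, the ordinal extraction, and the volumeClaimTemplates with the "standard" storage class:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: nginx
spec:
  serviceName: nginxes        # the headless service, covered shortly
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      initContainers:
        - name: populate-index
          image: busybox:1.36
          # write an index page containing this pod's ordinal into the shared volume
          command:
            - sh
            - -c
            - 'echo "hello from pod ${HOSTNAME##*-}" > /usr/share/nginx/html/index.html'
          volumeMounts:
            - name: data
              mountPath: /usr/share/nginx/html
      containers:
        - name: nginx
          image: nginx:1.26
          volumeMounts:
            - name: data
              mountPath: /usr/share/nginx/html
  volumeClaimTemplates:        # each replica gets its OWN PersistentVolumeClaim
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: standard
        resources:
          requests:
            storage: 100Mi
```

The `${HOSTNAME##*-}` expansion strips everything up to the final dash, leaving just the ordinal — which is exactly the trick that makes pod nginx-0 serve "hello from pod 0" in the demo.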
But I'm demonstrating a common pattern where you use an init container to pre-populate a customized config for each replica and then load that within the primary container to act accordingly; there's an example in the Kubernetes docs showing a similar pattern used for a MySQL StatefulSet, setting up one replica as the primary and the other replicas as read replicas. An important field to call out here is the serviceName field. Each StatefulSet is expected to have what is known as a headless service associated with it, and in this case I'm naming it "nginxes" — plural, because it's a headless service through which we can address each pod independently. The way you create a headless service is to specify a ClusterIP service but set the clusterIP field to None. In that case it will not create an internal IP address that load balances across the pods; instead we can address each pod directly via DNS inside the cluster. So if I run the apply-with-init command, we first create a regular ClusterIP service that routes across all three pods in my StatefulSet, then we create the headless nginxes service, and finally we apply our StatefulSet definition, including that init container. When we first create it, it's in a pending state; we see the first pod come up and start running, then the second pod, and then the third. As I mentioned, instead of a random hash in the name we get these sequential ordinal numbers as suffixes. We've also got our two services here. This is our normal ClusterIP service — if we routed traffic to it, the traffic would get load balanced across all three replicas — and then, because we specified clusterIP None in the headless service, it has no cluster IP; instead it sets up the DNS required so we can address each of those pods independently. Let's port forward to one of our pods; now, if we access it from the browser and refresh, we see "hello from pod zero". If we instead port forward to pod number one and reload, we get "hello from pod number one". The way we set that up was within our init container: we used the HOSTNAME environment variable, stripped out the ordinal, and injected it into the statement stored in that file. That file lives on a volume which is loaded by our primary container as well, so when nginx comes up, it is the default index file nginx serves. As I mentioned, we could spend hours on StatefulSets alone. I'm not going to do that in this course, but I'll reiterate: if you plan to run significant stateful workloads on top of Kubernetes, I would certainly do a deep dive on StatefulSets and how to use them effectively and safely. Now that we've covered all the primary workload resource types, we can start to look at how we would configure these for each specific environment. One of the primary tools for doing this is known as a ConfigMap: it enables you to have environment specific configuration and decouple that configuration from the deployments and the container images
that they run. There are two primary styles of ConfigMap that you'll see. The first uses property-like keys; these keys generally represent environment variables that you're going to inject into a container running in a pod. The second uses file-like keys, where you have a configuration file that your application expects to load at runtime. I'm showing two examples here on the bottom left. The first is the property-like keys style, where we've got a name, a version, and an author specified; the second is the file-like key style, where the key is conf.yaml and the value is a multi-line string containing those same contents. I've used all uppercase in the property-like keys as a convention, since they're going to be injected as environment variables, and lowercase (or potentially camel case) in the file-like keys version, matching how you might expect a configuration file to look. On the right hand side is a pod spec consuming these two ConfigMaps. The property-like keys ConfigMap is used in the envFrom field, which loads all of the contents of that ConfigMap in as environment variables, and the file-like keys ConfigMap is added as a volume and mounted into the file system at /etc/config; at that location in the file system there will be a conf.yaml containing those values. Let's jump to our code editor and see how this works. We'll navigate to the ConfigMap subdirectory and create our namespace. As you can see, I've got those same YAML configurations that I showed on the slide here in my editor, as well as the pod that consumes them. I'm going to go ahead and apply all three of those. We can do `k get configmaps` and see that we have the file-like keys ConfigMap, the property-like keys ConfigMap, and a kube root certificate that was created by the system, which we can ignore for now. The pod is up and running. In order to show you that the file-like ConfigMap was actually mounted into the file system, I'm going to execute a cat command inside that container to show us the contents of that file. This `kubectl exec` command passes in the name of the pod, the name of the container within the pod (that was specified here), and then the command `cat /etc/config/conf.yaml`, which prints out the contents of that file from the container's file system. As you can see, the data from the ConfigMap was injected into that volume, mounted into the file system, and accessible from our application. To showcase the environment variables version, we can run a printenv command inside that same pod. Again we're using `kubectl exec` to pass a command into the pod — this is the name of our pod and the name of the container within the pod — and here's the command we want to run: printenv prints all the environment variables active within that container. We can see some things that would have been there already, but we can also see the three values that we injected: the author, the app name, and version 1.0.0, as seen in our ConfigMap. Most applications follow one of these two patterns for loading config at runtime, either from a file or from environment variables. This enables you to decouple your environment specific configuration from the common configuration used across environments.
Inevitably, there's going to be some configuration that is sensitive, which you want to protect with a higher degree of security than the rest of your configuration. For these you can use the Kubernetes Secret resource. Secrets are very similar to ConfigMaps: you consume them in the same ways, via either environment variables or mounting them into the file system. The main difference in their definition is that the data within them is base64 encoded. It's important to note that this is not a security mechanism; it exists so that Secrets can support binary data as well as string data. Just because the value looks a little scrambled when you view it doesn't mean it's encrypted, and you should still treat the encoded data as sensitive. When you are setting up and managing a cluster, you can specify an encryption configuration that tells Kubernetes if and how Secrets should be encrypted in the etcd data store. Some managed clusters handle this for you; others do not. Definitely look into these policies if you're trying to meet a specific security standard with your clusters. Also, because Secrets are a separate resource type within Kubernetes, you can manage and control them with separate authorization policies than ConfigMaps, such that certain users are able to read them and certain users are able to write to them, independently of other Kubernetes resource types. I've shown two definitions of Secrets here at the bottom. The first uses the stringData field, a convenience that allows you to define your manifest in plain text; as you can see, I have the key foo and the value bar, and when you apply this to the cluster, Kubernetes will base64 encode the value. On the right hand side I'm specifying the data field directly, and in this case the key is in plain text but the value is the base64-encoded version of bar.
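The two definition styles just described look roughly like this (names are illustrative; "YmFy" is the base64 encoding of "bar"):

```yaml
# Plain-text convenience: Kubernetes base64-encodes the values on apply
apiVersion: v1
kind: Secret
metadata:
  name: string-data-secret
type: Opaque
stringData:
  foo: bar
---
# Pre-encoded form: you supply the base64 value yourself
apiVersion: v1
kind: Secret
metadata:
  name: base64-data-secret
type: Opaque
data:
  foo: YmFy
```

Both produce the same stored Secret; stringData is simply a write-time convenience, and reading either back from the API shows the encoded data field.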
As you can see from the pod spec on the right, you consume Secrets in pretty much the same way you do ConfigMaps, either as environment variables or as volumes mounted into the file system. However, I've shown an additional option here: instead of loading the entire contents of the Secret as environment variables, you can specify the specific key within that Secret that you want to load. In this case I'm referencing the base64-data Secret and loading only the contents of the foo key, and that value gets placed into the environment variable name specified in the pod spec. The envFrom option that I showed for ConfigMaps also works; I wanted to show how you would pull a specific key-value pair out of a Secret and map it to a different environment variable name within your container. Let's jump over to our code editor, create some Secrets, and see how they work. First I'll navigate to the secret subdirectory and create my namespace. Now let's start by creating the Secret defined with stringData; I can run the 02 task, which applies this file. You'll notice the type is specified as Opaque, which is the default type of Secret. If I now get that Secret and output it in YAML, piping it to yq just so we get a nicely formatted value, you can see that even though I specified it in plain text, within the Secret itself under data we have the key foo with the base64-encoded value of bar. If you ever want to get the value out of a Secret and base64 decode it, you can use yq similarly: we get the secret, passing the name of the secret, output it in YAML format, pipe that to yq to pull the foo key out of the data field, and finally pipe that to `base64 -d`. As you can see, decoded, we get our value back. If there's ever a Secret whose contents you need to look up, that's generally how I do it.
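The per-key consumption described above can be sketched like this. The environment variable name and Secret name are my placeholders for the course's manifest:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: secret-example
spec:
  containers:
    - name: app
      image: busybox:1.36
      command: ["sleep", "3600"]
      env:
        - name: ENV_VAR_FROM_SECRET     # the name the container will see
          valueFrom:
            secretKeyRef:
              name: base64-data-secret  # the Secret to read from
              key: foo                  # pull just this one key
```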
We can create a Secret with the same contents using the data field directly, by specifying the value as base64-encoded data. In order to generate this value, you might think, "oh, I'll just do echo and pipe that to base64". However, echo in the shell adds a newline to the end, and that newline actually gets included in the base64 encoding, which will give you the wrong value when you try to use that password. So instead, it can be useful to use printf rather than echo when base64 encoding. Running the 03 task, I'm doing printf of bar and piping that to base64. Watch what happens if I do echo bar into base64: we get a different value, because a newline character is being added by echo and it's getting encoded here. So be careful when you are generating your base64-encoded strings not to include newlines in them; if we tried to decode this wrong value and pass it to, say, our database, we would get denied, because it's the wrong password. We can go ahead and apply this Secret definition as well, and now we've got our two Secrets here; once again I can get the value out by decoding it as before. Great. There are other types of Secrets beyond Opaque. For the generic case where you're storing a password, Opaque is going to be the right type, but there's also what is known as a dockerconfigjson type Secret, which is used by Kubernetes to authenticate to a container registry. If you have a private container registry, you'll be able to pull and use those images by specifying a Secret like this. As you can see, it uses the file-like syntax, where .dockerconfigjson is the key and the value is base64-encoded JSON, which gets used by the container runtime to pull those images. So let's echo this, base64 decode it, and pipe it to jq; as you can see, we get our output in JSON format. It contains an auths key.
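The echo-versus-printf pitfall described above is easy to verify in any POSIX shell:

```shell
# printf emits exactly the bytes you give it
printf 'bar' | base64        # -> YmFy

# echo appends a trailing newline, which gets encoded too
echo 'bar' | base64          # -> YmFyCg==

# decoding the correct value round-trips cleanly
printf 'YmFy' | base64 -d    # -> bar
```

Those extra characters (`Cg==` is the encoded newline) are exactly what would turn a valid password into an invalid one.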
In this case it holds a username and password for the Docker Hub container registry — just a dummy username and password as an example. This auth block is again base64 encoded and simply contains the username and password together. Rather than trying to construct this with the exact right format and then base64 encode it, there is a kubectl command that creates such a Secret for us, so I'll run the 07 task. Here I'm doing `kubectl create secret docker-registry`, naming it, telling it to use the dockerconfigjson type, and then as options on the command line you pass in your email, your username, your password, and the Docker server — in this case the Docker Hub server — and it will create that Secret for us. I suggest using this style of command to create it. You could even add `--dry-run=client -o yaml`, and now, instead of creating it on the server, it gives you the value — effectively using the kubectl command line to generate a properly formatted manifest rather than creating that Secret imperatively. This allows you to store these values in your secret manager of choice for future consumption. So: we've looked at Opaque Secrets, which are for arbitrary user-defined data — think usernames and passwords for various sets of credentials — and we just looked at the dockerconfigjson Secret, used for storing credentials to container registries. The other ones that are useful to know about are the TLS type, used for storing TLS client and server data — if you set up applications that you connect to over HTTPS, potentially using something like cert-manager, you'll notice it uses these TLS type Secrets — and the service account token type: Kubernetes uses service accounts to grant access to things within the cluster, and the way you utilize those service accounts is via these tokens. Those four types are the most commonly used; the others exist, but I don't use them all that frequently. All right, we're in the home stretch, with only a few more built-in
resources that we're going to cover here. The next two are related to getting network traffic from outside the cluster into your cluster and routing it to your various services. The first is called Ingress, and the second is a newer API called Gateway API; we'll cover them both here. Effectively, Ingress allows you to route traffic from a single external load balancer — this could be something from your cloud provider — to a number of internal ClusterIP services. There's a whole bunch of implementations of this API that you can choose from. One of the most common is called ingress-nginx, but there are many projects out there — HAProxy, Kong, Istio, Traefik — that implement the Ingress API and allow you to route traffic in this way. I'll call out that the Ingress specification only supports layer 7 routing (think HTTP and HTTPS), but many of the implementations support layer 4 routing (TCP and UDP) with additional configuration, and oftentimes those additional configurations are managed via Kubernetes annotations: if something doesn't have a field in the official schema, the way controllers get around that is by letting you provide custom configuration via annotations, which are arbitrary key-value pairs. Looking at the definition here, this is the most minimal Ingress you can define. It's essentially saying route all traffic — I have the root path there, and it's path type Prefix, so anything will match — to the nginx ClusterIP backend. You specify information about the request that should be routed, and then where you want that traffic to go; in this case it's to that ClusterIP service on port 80. Looking at the diagram, you can see that traffic comes in from outside our cluster and hits a load balancer, at which point it is routed to the Ingress controller pod. That is a pod running internal to the cluster, which watches our Ingress resources and sets up its routing rules accordingly.
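A minimal Ingress of the shape described above can be sketched like this, assuming a ClusterIP service named `nginx` already exists:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: minimal-ingress
spec:
  rules:
    - http:
        paths:
          - path: /              # root path...
            pathType: Prefix     # ...matched as a prefix, so everything matches
            backend:
              service:
                name: nginx      # the ClusterIP service to route to
                port:
                  number: 80
```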
So based on our Ingress — let's say we're running ingress-nginx — the controller reads our Ingress object and automatically configures the nginx configuration internal to that pod to route traffic according to the rules in our Ingress resource. We can then use information such as the path to route traffic between different services — service A or service B — which in turn send that traffic to our pods. I'm going to go ahead and cover Gateway API here, and then we'll jump into the code editor and show examples of both. Gateway API is a newer evolution of the Ingress API; it came about because many people felt the Ingress API spec didn't fully meet their needs. It's important not to confuse this with the generic concept of an API gateway. They have very similar names, but Gateway API is the specific Kubernetes API, while an API gateway is the general concept of a resource that can take API calls and route them to various services — one is a general concept, and one is the specific API. Gateway API adds official support for layer 4 routing, and it has support for more advanced routing scenarios built into the specification itself. We can define something similar to the Ingress we saw on the previous slide; in this case it's an HTTPRoute, which is part of the Gateway API specification. I'm matching a path prefix of the root path, just like before, and I'm pointing it at that backend reference of the nginx ClusterIP service, so this HTTPRoute will behave pretty much identically to the Ingress we saw prior. Looking at the diagram, you can see again that traffic comes in from outside the cluster and hits our load balancer. It's then routed to our Gateway controller, which monitors the cluster for Gateway and route resources and configures its routing based on those resource rules; finally, it routes our network traffic according to those rules to the different services, which then send the traffic to our pods.
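An HTTPRoute equivalent to the minimal Ingress might be sketched like this; the Gateway name is a placeholder for whichever Gateway you attach it to:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: minimal-route
spec:
  parentRefs:
    - name: my-gateway        # the Gateway this route attaches to
  rules:
    - matches:
        - path:
            type: PathPrefix  # match everything under /
            value: /
      backendRefs:
        - name: nginx         # ClusterIP service backend
          port: 80
```

Structurally it carries the same information as the Ingress — a match and a backend — just split between a Gateway (listeners, infrastructure) and a route (rules).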
Gateway API has been in progress and under development for a number of years: it graduated to beta in 2022 and to general availability in 2023. When I ran a poll across Twitter and LinkedIn asking what percentage of people were using Ingress versus Gateway API to bring traffic into their clusters, it was still quite split. I think more people were still using Ingress, just because it already exists and there's some cost to switching over; however, if you're doing greenfield development, it could pay to go ahead and use the Gateway API, because that is the standard moving forward. Let's go ahead and deploy some of these and see how they behave, starting with Ingress. We can navigate to our Ingress folder and create our namespace. For this example I'm going to be using my GKE cluster; one of the reasons is that it supports the Gateway API out of the box, whereas the Civo cluster does not. We could still install a third party Gateway API controller, but I'm going to start with Google Cloud here because it supports the Gateway API out of the box — it's a configuration we had to set when we provisioned our cluster, but it doesn't require any additional software installation. We now need to create a service that is going to listen and reply to our traffic, so I'll go ahead and create a deployment for that; the deployment is just a minimal nginx deployment with three replicas and pretty much all default settings. Now let's apply our service. The first type of service we're going to deploy in front of that deployment is a NodePort service, and the reason for that is that the built-in GKE load balancer type that Ingress uses requires a backend service of type NodePort. So we'll deploy this service, which routes traffic to the pods of that nginx deployment; as a reminder, we're using the selector to match the labels on the pods we're routing traffic to.
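The NodePort service just described might look like this minimal sketch (names assume the nginx deployment's pods are labeled `app: nginx`):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  type: NodePort        # required backend type for GKE's built-in Ingress load balancer
  selector:
    app: nginx          # must match the deployment's pod labels
  ports:
    - port: 80          # service port
      targetPort: 80    # container port on the nginx pods
```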
We also have an Ingress definition here, and the way we tell the cluster that we want to use the Google Ingress controller is by specifying an annotation. It is of kind Ingress, but we tell it to use the Ingress class gce. This gce Ingress class sets up an external load balancer; you could instead use the gce-internal class, which would set up a load balancer in GCP that is not publicly accessible — it would only be reachable from inside your Virtual Private Cloud. It's important to call out that another way to configure the Ingress class — and we will for another Ingress controller — is the ingressClassName field under the spec. That's not supported for GKE Ingress, which is why we had to use this more outdated annotation approach. Let me go ahead and apply that: we started by applying that NodePort definition, and then we applied this minimal GKE Ingress. Now, I specified a host here: when this Ingress looks at inbound network traffic, it checks for this host to decide whether or not to route traffic to this backend. You could have it match all hosts, so that all traffic would go there, but I think it's more realistic that you'll be routing traffic for a specific domain name. To simulate this — I don't own this particular domain — I can modify my local /etc/hosts file so that when I make a request to this domain, it actually routes to the proper place. Let's go into the Google Cloud console and find the address of the load balancer that it provisioned. We can see the Ingress we created show up here in the UI, initially in a creating status; as I click into it, we see that it has finished provisioning and has given us an external load balancer with an IP address. If I navigate to this IP address directly, it doesn't work — that's because we specified that host name in our Ingress resource.
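Putting the annotation and the host rule together, the GKE Ingress can be sketched like this (the hostname is illustrative; substitute whatever domain you're simulating in /etc/hosts):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: gke-ingress
  annotations:
    # older annotation style; the GKE controller does not support spec.ingressClassName
    kubernetes.io/ingress.class: gce
spec:
  rules:
    - host: ingress-example.com      # only traffic for this host is routed
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: nginx          # the NodePort service from earlier
                port:
                  number: 80
```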
So now I need to take that IP address and update the entry I have mapping to that domain. Let me grab the IP address, copy it, replace this, and now, if I navigate to the domain in my browser, you can see I get the default "welcome to nginx" page. I don't have TLS set up, so this is an HTTP request, not HTTPS, but you can see how that traffic made it from my computer, to the load balancer, to the node port, to the pod itself. Now, what if you were not running on GKE or another cloud provider, or you didn't want to use their built-in Ingress controller? The option there is to install your own Ingress controller into the cluster, so let's go ahead and install ingress-nginx, one of the most popular implementations. To do this I'm using a tool called Helm. We'll learn more about Helm in the next lesson, but it's essentially a package manager. I'm saying: install the ingress-nginx Helm chart, here's where you can find the definition for that chart, I want it in the ingress-nginx namespace, create that namespace if it doesn't exist, and use version 4.10.1. It looks like it installed properly, and it gives us some examples in the output of the command. If I do `k get pods` in the ingress-nginx namespace, you can see the controller is running. With that nginx controller installed, I can now create another Ingress object; in this case I could use the same annotation style to specify the Ingress class, but I can also use spec.ingressClassName and tell it to use nginx. My actual rules here are pretty much the same, one difference being that I'm showing another option for the path types that exist — we used Prefix in the previous example, and there are some nginx implementation specific ones whose behavior you can look up in the docs. Let's go ahead and apply that. For this I created a ClusterIP service as well as this Ingress resource, and the reason it's a ClusterIP is that, because the nginx controller is running in my cluster, it doesn't need a NodePort to reach the backend; it can access it directly over a ClusterIP instead. Again I'm going to modify my /etc/hosts file to match the load balancer that nginx provisions. If we look at the services in the ingress-nginx namespace, you can see it has created a service of LoadBalancer type, which the cloud controller manager then goes off and provisions a load balancer for. So now we have a load balancer that routes to the controller pod, which in turn routes via the Ingress to our other deployment. Let's update that hosts file — this one is for the nginx Ingress example domain — save it, and now, if we go up here and load it in our browser, you see the same result. This shows that we have two separate implementations of the Ingress specification, one provided by Google and one provided by the open source ingress-nginx project. There are many more choices, but this showcases how you have options around which implementation of this API you actually want to use. Now, each of those external load balancers that GKE provisioned is going to be charged per month in addition to your normal cluster compute, so I'm going to go ahead and delete those two Ingresses, just to be sure they get torn down in the background and I'm not getting charged extra for them. So I'll do `k delete ingress` — I'm in the proper namespace already, because when I created the namespace it was set
as my default. I'll just say "all", and you can see the two Ingresses were deleted. I'm also going to delete the ingress-nginx namespace, because it also provisioned an external load balancer; deleting the namespace should delete all the resources inside it, including that service. Let's jump to the Gateway API directory and see how it behaves — quite similarly, but slightly different. We'll create our namespace and set it as the default, so now we're in that 04--gateway-api namespace. We're going to create a deployment just like before — our basic three replica nginx deployment. And just like with Ingress, I'm going to show two different implementations of the Gateway API specification. The first is the built-in GKE implementation. If you'll remember, let's go look at our setup for GKE: where I provisioned the cluster, I passed gateway-api=standard. If you didn't pass this at all, it would not enable the Gateway API; here I'm saying use the standard definition. There are two resources we need to deploy to route traffic to our deployment via the Gateway API. First we need to deploy the Gateway, and in this case we give it a gatewayClassName. There are a few different class names here that you can look up in the GKE documentation; effectively this one is saying I want a GKE layer 7 (so layer 7 networking) global external managed load balancer, which is quite similar to the external load balancer that was provisioned by the Ingress controller. It is important to call out that the GKE Gateway API implementation does not currently support TCP routes — I think that is in progress, but it only supports HTTP routes — so from that perspective it does not fully support the Gateway API spec; I believe they plan to add that support in the future. You then set up an array of listeners.
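The Gateway just described can be sketched like this; the class name below is GKE's global external L7 class as I understand it, so treat it as an assumption to check against the GKE docs:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: gke
spec:
  gatewayClassName: gke-l7-global-external-managed  # GKE-managed global external L7 load balancer
  listeners:
    - name: http
      protocol: HTTP     # GKE's implementation currently only supports HTTP routes
      port: 80
```

HTTPRoutes then attach to this Gateway by naming it in their parentRefs.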
In this case I'm saying I want it to listen on port 80 using the HTTP route type, and then from this Gateway I can specify any number of HTTPRoutes containing the actual routing rules I want it to follow. Let's take a look at what that looks like. Note that the name of the Gateway is gke; in the HTTPRoute definition, the parentRefs field refers to that Gateway resource. You can see it contains pretty much the same information as the Ingress, just moved around a little bit. We're saying: for this hostname (the gateway-example GKE domain), follow this rule, where any request is sent to my nginx NodePort backend — and again, the GKE implementation only supports NodePort types for that backend service. Let's go ahead and create it. We can see the Gateway resource and our HTTPRoute resource; once that Gateway is healthy, we should see the address and the programmed status change. We can describe it: it was added 37 seconds ago and then synced a number of times; right now it's effectively waiting for the controller to go off and provision that external load balancer. Let's look in the UI under gateways. We see it here, and it says "waiting to be accepted" — not sure what that means. OK, it looks like whatever it was waiting on accepted it, and it's now in a healthy status with an external IP address for the load balancer. It shows the address here and tells us that it is now programmed. Once again I'll modify my /etc/hosts file with the gateway-example GKE entry, we'll load that in our browser, and once again we're routed to our deployment. Perfect. Now, finally, as one more example, let's use a third party Gateway API implementation. Many of the ones I tried were only partially supported, or in various states of working or broken; one that worked right away for me was the Kong Gateway API implementation. We're going to go ahead and install the Kong Ingress controller, again using Helm
We're going to install the Kong Ingress controller, again using Helm: we pass it the chart, put it in the kong namespace, and install version 0.12.0. Additionally, the Kong implementation uses different versions of the custom resource definitions associated with the Gateway API. We're going to talk more about what custom resource definitions (CRDs) are down the road; for now, just notice that I'm updating their versions to be compatible with this new controller. If we look at the pods in the kong namespace, we have our Kong controller and our Kong gateway. There are three resources we'll deploy alongside this to actually use the implementation. The first is the GatewayClass: in the GKE case, the gateway class already existed, installed when we set up the cluster; here we need to create a GatewayClass that says which controller ought to be used. You can have multiple controllers running alongside each other in the cluster and specify which one to use via this GatewayClass mechanism, so I'm telling it to use my Kong gateway class controller. The Gateway itself looks very similar to the GKE one; the only difference is that instead of the Google-specific gateway class name, I'm specifying kong as my gatewayClassName. Finally I'll have an HTTPRoute, which again looks pretty much identical, the one difference being that I'm routing to a ClusterIP service. At this point the Kong controller is running inside the cluster and doesn't need to use a NodePort, whereas GKE interfaces at the external load balancer layer and has to route traffic into the cluster via a NodePort; the Kong gateway controller, already running in the cluster, can use that ClusterIP service directly. I'll apply all three of those resources. I've also added this little warning: like the GKE implementation, the Kong Ingress controller does not support TCPRoutes out of the box.
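As a sketch, the GatewayClass that selects the Kong controller might look like the following. The controllerName string follows the identifier Kong documents for its ingress controller, but treat it as an assumption here and check the Kong docs for your version:

```yaml
# Hedged sketch of a GatewayClass: the mechanism for choosing which of
# several controllers in a cluster should reconcile a given Gateway.
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: kong
spec:
  controllerName: konghq.com/kic-gateway-controller  # assumed identifier
```

A Gateway then simply sets `gatewayClassName: kong` instead of the GKE class name, and everything else stays the same.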
By default it only enables HTTP-type routes; there's additional documentation on how to set up the Kong Ingress controller for TCPRoutes if you want to use them. Let's take a look at the Gateway: the new Kong gateway is online and has been given an external IP address. If I look at the services in the kong namespace, we'll likely see a LoadBalancer-type service that matches — yep, this gateway is now using this LoadBalancer-type service. Perfect. Let's look at our HTTPRoute: we can see it's been created. I'll add that entry to my /etc/hosts file so I can access it, open a new browser tab, paste it in, and there we go. I'll go ahead and delete this namespace and the kong namespace just to clean things up and make sure I'm not getting charged for those external load balancers. Hopefully that gives you an idea of how to use Ingress and the Gateway API. If you're building something new, I would suggest going down the Gateway API route: it will future-proof your system and allow you to take advantage of the newer features. If you're already using Ingress and it's working fine for your needs, there's no reason to switch over right away. We showed how to use the built-in Ingress and Gateway API controllers on a managed provider like GKE, but if your managed provider doesn't include one — let's say you're working with Civo — you would then want to install a controller of your choice. I believe Civo clusters would have installed a Traefik ingress controller by default; I disabled that so we could install our own, but you can pick whichever controller suits your needs. The next topic to cover is how we can store data such that it persists across container restarts. Containers by themselves have an ephemeral file system: every time we create a new container, we get a fresh new file system. If we want data to persist across those restarts, we need to store those data in what is known as a persistent
volume. There are many different implementations of persistent volumes; with cloud providers, these are often the block storage devices they offer, so in Amazon it would be EBS, and in GCP it's a Compute Engine persistent disk. You can also use things like network file shares as persistent volumes, but essentially persistent volumes and persistent volume claims are Kubernetes's interface for creating, managing, and consuming storage that outlives any particular pod. One of the most important attributes of this storage is the set of access modes. One of the most common you'll see is ReadWriteOnce, which says you can mount the storage device to a single pod in read-write mode: if it's already mounted to a pod and you try to mount it from another, that second mount would fail. This is how most block storage devices behave. There's a little nuance here, in that with ReadWriteOnce you actually could have multiple pods mount it in a read-write fashion if they were on the same node, because the mode is associated with which node the container is running on. A newer variant, ReadWriteOncePod, limits it to a single pod regardless of whether the other pods are on the same node. There's also ReadOnlyMany, which says multiple pods can mount it, but only read-only; you would use this if you needed multiple pods to pull data from a single source but the underlying storage had no way to deconflict multiple writes at the same time. And then ReadWriteMany is the case where multiple pods, across multiple nodes, can all mount the volume and read and write at the same time; the underlying implementation needs to support handling those concurrent writes in a consistent fashion. These will often be things like NFS systems, which are already designed for this type of use case but generally have limitations in other areas, such as data-access speed.
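The access modes just described appear verbatim in claim specs. A minimal sketch, with illustrative names and sizes:

```yaml
# Hedged sketch of a PersistentVolumeClaim showing where accessModes live.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-claim
spec:
  accessModes:
    - ReadWriteOnce   # other valid values: ReadWriteOncePod,
                      # ReadOnlyMany, ReadWriteMany
  resources:
    requests:
      storage: 1Gi
```

Whether a given mode actually works depends on the underlying storage driver, not just on what you write here.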
Another very important configuration is the reclaim policy on the storage class: when the persistent volume claim is deleted, what should happen to the underlying persistent volume? In the Retain case, the persistent volume — and therefore the disk or volume it provisioned in the cloud — remains; if it's set to Delete, then when you delete the persistent volume claim, the underlying persistent volume goes away too. Retain can be a bit safer, but it may leave unattached persistent volumes with no consumer in your cluster, which you'll have to clean up manually if those data are no longer needed. You can provision persistent volumes and persistent volume claims directly, or, within a StatefulSet, you can provide a template that provisions them dynamically. In the specifications here on the slide, I've shown a StorageClass — storage classes are what map a persistent volume to an underlying implementation, and I'm showing the one for Civo's implementation, which maps to their volume object — and a PersistentVolumeClaim, which uses that storage class to provision a persistent volume behind the scenes. There you can see the access mode as well as the size of the volume that will be provisioned. In our diagram, a pod specifies one of two things: either a volume claim template (in the case of a StatefulSet), which enables dynamic provisioning of the persistent volume, or, in other cases, a persistent volume claim that is provisioned directly. That persistent volume claim consumes a persistent volume: the persistent volume maps to the underlying storage, and the persistent volume claim is the tie between that storage and the pods consuming it. The claim, in turn, is either going to specify a storage class, which can dynamically provision the underlying storage.
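A sketch of the StorageClass shape, showing where the reclaim policy lives. The provisioner string here is my assumption of Civo's CSI driver name — check the civo-csi documentation before relying on it:

```yaml
# Hedged sketch: a StorageClass mapping claims to a CSI implementation.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: civo-volume
provisioner: csi.civo.com      # assumed driver name for Civo's CSI plugin
reclaimPolicy: Delete          # or Retain, to keep the volume after the PVC is deleted
volumeBindingMode: WaitForFirstConsumer
```

A PVC opts into this class via `storageClassName: civo-volume`.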
Alternatively — for example, if you're operating in an on-premises environment — you could pre-provision the underlying storage and then define a persistent volume that manually maps to it. This is a case where the underlying implementation differs across clusters, so I'm going to demonstrate it on all three of our clusters (the kind cluster, the Civo cluster, and the GKE cluster) so we can get a feel for how these behave. The first thing we're going to do is manually provision a persistent volume and a PVC, as well as a pod to consume them, using our kind cluster. I'll switch to our kind cluster, create the namespace, and now let's look at the definitions in the kind subdirectory. I've got this persistent volume definition: I'm naming it manual-kind and specifying how large it should be, the access mode, and which storage class. I can look at the storage classes in my cluster by doing k get storageclasses; as you can see, I have one storage class that was created by default by my kind cluster, using the Rancher local-path provisioner, so it's going to be a local path within my host system. For this type of storage class I need to specify a path — this could be anything, but I'm putting in some path in the container as an example — and I'm specifying that it needs to be on a specific worker node in my cluster. Now, my persistent volume claim is going to find that persistent volume using the matchLabels selector: as we saw, I have this label on my persistent volume, and the persistent volume claim uses that same label to find it. I'm saying I want to use the entire 100 megabytes that I provisioned, again specifying my access mode and the storage class name. Finally, I have a pod that will use the persistent volume claim as a volume and mount it at a path within my container.
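A rough reconstruction of that manual pairing — a labeled hostPath PV and a PVC that selects it by label. Paths and names are illustrative (the course files may differ, and the course example additionally pins the PV to a specific worker node):

```yaml
# Hedged sketch: manually provisioned PV + PVC matched via labels.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: manual-kind
  labels:
    type: manual             # the label the claim will match on
spec:
  capacity:
    storage: 100Mi
  accessModes: [ReadWriteOnce]
  storageClassName: standard
  hostPath:
    path: /some/path/in/container   # illustrative path inside the kind node
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: manual-kind-claim
spec:
  selector:
    matchLabels:
      type: manual           # finds the PV above
  accessModes: [ReadWriteOnce]
  storageClassName: standard
  resources:
    requests:
      storage: 100Mi
```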
So let me go ahead and deploy all of that with t2. As you can see, we created our persistent volume, our persistent volume claim, and our pod to consume it. If we look at our persistent volume, it has been created and is in a Bound status, because we're consuming it from that pod. Look at our PVC: it maps to our manual volume. And finally our pod: it's running, and if we do -o wide we see it's running on the specific worker I mentioned, which is where we provisioned that manual persistent volume. Let's exec into the container and navigate to the mount path for the volume. If I cat the file there, we see the message I wrote from the host. When we set up our kind cluster, one of the things I did in the kind config was take a host path — an absolute path on my host file system — and mount it into the container at the same path I used in my persistent volume. Because of this, that file from my host system is now showing up inside this pod. I can also create a new file here, then go up to that directory on my host system, and I see the file was just created. This showcases how the hostPath volume connects my host system and the container; if you're working with a Kubernetes cluster on premises, you may use something like this, creating volumes that map to paths on your physical hosts. So that shows the manual creation of persistent volumes and persistent volume claims. Let's now look at how you can provision these dynamically. In this case we don't actually need a persistent volume: we can specify a persistent volume claim directly, with a storage class name, such that Kubernetes can go ahead and provision the volume for us. You'll notice we don't have a selector here, because there is no pre-existing persistent volume with labels to match. I'm also defining a deployment that will consume the persistent volume claim: under volumes, I reference the persistent volume claim by the name I gave it and mount it in.
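The dynamic variant just described is shorter: no selector and no pre-created PV, only a storage class name that must match one reported by `kubectl get storageclasses`. Names are illustrative:

```yaml
# Hedged sketch: dynamically provisioned claim — the provisioner behind
# the named storage class creates the PV on demand.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dynamic-claim
spec:
  accessModes: [ReadWriteOnce]
  storageClassName: standard   # kind's default Rancher local-path class
  resources:
    requests:
      storage: 100Mi
```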
Because this volume type is ReadWriteOnce, these pods can only run simultaneously on the same node; if they were scheduled onto separate nodes, one of them wouldn't be able to mount the persistent volume. Let me go ahead and deploy that. You can see they were both scheduled onto the same worker, which they had to be in order to mount the same persistent volume. This is an interesting difference in how deployments versus StatefulSets work with persistent volumes: with a deployment, if you specify a persistent volume claim, all of the replicas mount the same one; with a StatefulSet, each replica gets an independent persistent volume claim. Like I said, StatefulSets consume these slightly differently, so let's look at how a StatefulSet defines its persistent volumes. Rather than creating a persistent volume claim separate from the deployment, we instead have a section of the specification called volumeClaimTemplates. You give it a name and a specification, and it is used for each replica: each one provisions its own persistent volume claim using the information provided. In this case I'll have two replicas, and I'm creating a headless service, since every StatefulSet is supposed to have one. With the deployment, both replicas came up right away; with a StatefulSet, they come up in series — okay, the first one is running, now the second one is running. If we look at the persistent volume claims now, you can see that for the dynamic PVC from the deployment there's only one, shared across the two pods, while for the StatefulSet each pod gets its own individual PVC. Each of these has a corresponding persistent volume, and they're all bound to the four pods currently running. Hopefully that gives you an idea of the different provisioning paths.
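The volumeClaimTemplates shape just walked through can be sketched as follows. Names, image, and mount path are illustrative:

```yaml
# Hedged sketch: a StatefulSet where each replica gets its own PVC
# generated from the volumeClaimTemplates section.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: nginx
spec:
  serviceName: nginx-headless   # every StatefulSet should have a headless service
  replicas: 2
  selector:
    matchLabels: {app: nginx}
  template:
    metadata:
      labels: {app: nginx}
    spec:
      containers:
        - name: nginx
          image: nginx:1.26
          volumeMounts:
            - name: data
              mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
    - metadata:
        name: data               # PVCs come out named data-nginx-0, data-nginx-1, ...
      spec:
        accessModes: [ReadWriteOnce]
        resources:
          requests:
            storage: 100Mi
```

Contrast this with the deployment case, where a single named PVC under `volumes` is shared by every replica.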
You can specify these manually or dynamically, and dynamically specified volumes behave differently between deployments and StatefulSets. Let's also go ahead and deploy a version of the StatefulSet onto Civo and onto GKE. To do this, I'll switch to my Civo cluster and deploy t5. If I navigate to the Civo subdirectory, you can see I've got my StatefulSet defined here; it looks identical to the one I deployed into my kind cluster, the only difference being the storage class name. If I get the storage classes within this cluster, you can see it has the Civo volume class that was provisioned by default when Civo created the cluster, using the Civo CSI provisioner. Remember, we talked about the Container Storage Interface in the previous lesson: this is where it comes into play, where a different driver can be used to interact with the different underlying implementations of the storage provider. If I look at the pods now, we see one in a ContainerCreating state, and we can see the underlying PVC. If I go to the UI, I can see the volume created within the Civo cloud, outside of my cluster. Let's see if it has fully spun up; with the pod still in ContainerCreating, let's describe it. It's saying the volume is not available; presumably that will resolve over time, so let's give it a little bit. Let me check again: okay, now my first pod is running and my second pod is creating, which should have created its PVC behind the scenes, so let me refresh here and see if another volume has been created — there it is, it should be spinning up. Because these are going off behind the scenes and provisioning another resource within the cloud provider, it can take a little longer. On my kind cluster I was just mapping to a path on the host file system, so it was very quick; in these cases an API call has to go to the Civo cloud, the volume needs to provision, it needs to be attached to the cluster, etc., so there can be a
little bit more of a delay when provisioning these cloud resources. However, they enable you to scale much further, because you're no longer tied to the specific storage devices on your system: you can provision and attach as much additional storage as you want via these volumes. All right, both of our pods have now come up and dynamically provisioned their underlying PVCs. Let's do the same on Google — it's going to look almost identical, but it's worth showcasing. I'll delete this StatefulSet and clean those volumes up, and we can move over to the GKE cluster and create our namespace. If we look at the storage classes here, there are three. The standard class uses the kubernetes.io Google Compute Engine persistent disk provisioner, which is included in Kubernetes upstream: before CSI existed, the implementations for these lived in the Kubernetes project itself, and afterwards the individual CSI storage drivers were created. So of the three classes, this one uses the older in-tree implementation and these two use the newer CSI implementation. As you can see, the standard ReadWriteOnce class is the default, and so that's what I'm using in my storage class name here; otherwise it should be pretty much identical to my Civo one. Let's go ahead and deploy it. Okay, as you can see, the first pod is already up, so volume creation on GKE was quite fast, and it's creating the second one now. If we navigate to the cloud console and go to Compute Engine, here are the two nodes in our cluster; but if we go to Disks, these two are the disks attached to my two nodes, and these two are the disks I just provisioned for my StatefulSet. Awesome. Let me go ahead and delete my StatefulSet; those pods are now gone. There is a configuration on the persistent volume claims themselves that defines what happens when the resource consuming them goes away. Historically they would just remain; now you can
specify behaviors for both whenDeleted and whenScaled. The whenScaled case applies when a StatefulSet is scaled up or down: say you had five replicas and scale to four — what should happen to that fifth persistent volume claim? The whenDeleted case applies if you delete the StatefulSet entirely. The default behavior is to retain; if you want them cleaned up automatically, you can specify Delete. As you can see, I did not include this in my specification, so it defaulted to Retain and the PVCs stayed in place. So, to recap: the reclaim policy (on the PVC's storage class) specifies what happens when you delete the PVC — should the underlying persistent volume, and therefore the disk in the cloud provider, remain or be deleted as well — whereas the StatefulSet's persistent volume claim retention policy specifies what happens to the PVCs when their consumer is scaled down or deleted. Let me go ahead and delete these two PVCs to delete the underlying volumes. Hopefully that gives you a sense of how to work with persistent volumes in their various configurations, and how you would use them to store and persist your data outside the life cycle of any particular pod. As you host stateful applications on Kubernetes, understanding these behaviors is very important to ensure your data are safe. The next topic we need to cover is how you grant applications or users access to the Kubernetes API. For each of the resource types we've been talking about, you can define permissions and allow or disallow access based on them; for your applications to make calls to the Kubernetes API, you can grant access on a per-namespace basis or cluster-wide, and we're going to show how to do both. As an example, I've written a very simple job, in the bottom right here, which runs a container using the kubectl command line and is
just going to issue a kubectl get pods command in the default namespace; I also have a variant that tries to get pods across all namespaces. If you didn't include any information about a service account, the default permissions would deny this access and the job would fail. However, we can grant the access by creating a service account (in this case namespaced-pod-reader), creating a role that gives get, list, and watch access to pods in the default namespace, and then binding that role and service account together via a role binding. Then, within the pod template of our job definition, we use that service account and tell it to auto-mount the service account token so our kubectl command can use it and succeed. I think this will make more sense as we create these roles and see how they behave, succeeding and failing, in an actual cluster environment. I'll navigate to the rbac subdirectory and create my namespace. First, let's create the job with no service account, and therefore no permissions to query the Kubernetes API. If we take a look at it, it's like the one from the slide, but we have not specified a service account and we've not specified anything about the service account token. Now if I do a k get pods, we can see it has errored, tried again, and errored. If we look at the logs, the error message tells us that the service account being used — in this case the default service account in my namespace — is not allowed to run the get pods command. To grant access for that command, let's first do it at the namespace level. I'll start with the service account called namespaced-pod-reader. I'm also going to create a role called pod-reader — a Role applies only within a namespace, whereas a ClusterRole applies across the entire cluster — and then bind that pod-reader role to my namespaced-pod-reader service account using a RoleBinding. Finally, within my job specification, I'm now using the namespaced-pod-reader service account.
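The namespaced trio just described can be sketched as follows; the names mirror the narration but may differ from the exact course files:

```yaml
# Hedged sketch: ServiceAccount + Role + RoleBinding granting read access
# to pods within a single namespace.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: namespaced-pod-reader
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
rules:
  - apiGroups: [""]          # "" is the core API group, where pods live
    resources: [pods]
    verbs: [get, list, watch]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
subjects:
  - kind: ServiceAccount
    name: namespaced-pod-reader
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```

The cluster-wide version swaps Role/RoleBinding for ClusterRole/ClusterRoleBinding with the same rules shape.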
I'm specifying that it should auto-mount the service account token. This was a change introduced a few Kubernetes versions ago: historically, a token associated with the service account was mounted automatically just by specifying the account; now you have to set this to true if you want that to happen. And then I'm issuing the get pods command specifically within the 04-rbac namespace I'm operating in. So if I do t3: we created the service account, the role, and the role binding, we created this job, and then we also created this other job where I've modified the command so that, instead of only trying to get pods in the current namespace, it also tries to get pods across all namespaces. If I do get pods now: this was my first job that failed, so let me delete it to clean up. We have the first job, which operated only in the 04-rbac namespace, and the second, which failed, retried, and failed again. The reason we only see two copies is that we set the backoff limit to one: it fails once, tries one more time, and if that fails it stops. Let's look at the logs from each. The logs from the successful pod show the pods that exist within this namespace, because that's what the command requested; from the other, we can see it is not authorized to get pods at the cluster scope, because the role we specified only had access within the namespace. If we did need access at the cluster level, we could use a ClusterRole and a ClusterRoleBinding. So now I have another service account that I'm calling cluster-pod-reader, and a ClusterRole with get, list, and watch access to pods, applying across all namespaces. You can specify any number of rules within a ClusterRole; here I'm granting access to pods, but these could be any Kubernetes resource, or any custom resource defined down the road. Also, for these verbs, I'm specifying here that
it should have read access; if you needed write access, you would specify that in the verbs section. We then have our ClusterRoleBinding, tying together the service account with the ClusterRole, and finally our job specification, referencing the cluster-pod-reader service account and accessing pods across all namespaces. I'll go ahead and apply that. We can see it is running — it has completed, and it completed successfully. Looking at the logs, you see pods from across a bunch of namespaces: one I hadn't cleaned up yet, the current namespace, and the system namespace. So it succeeded in getting pods across all namespaces. If you ever need your workloads to access the Kubernetes API and its resources, RBAC and service accounts are the mechanism for that. Also, while this course is not focused on administering Kubernetes clusters, as you grant access to users of a cluster, RBAC performs the same way: you specify which resources any individual user or group of users is allowed to access, and what actions (verbs) they're allowed to take against those resources. The specific mechanisms for creating a user account and mapping it to a role or cluster role differ across managed clusters, but they all use this RBAC system under the hood. And that actually brings us to the end of the built-in Kubernetes resource types I wanted to cover; hopefully that gives you a lay of the land of the different resources available and how you would use them to build out your application architectures. Before we move on, though, I do want to call out labels and annotations. We've used labels a little throughout this section, specifically to tie things together and allow referencing between, for example, a deployment and the pods it manages. In general, labels should be used for attaching key-
value pairs to resources that identify and organize those Kubernetes resources. Those examples, where you're tying a service to a pod or a deployment to a pod, use identifying labels that you attach to the resources. You can also use labels when making kubectl queries against the API server, to filter the results. Oftentimes you'll include a label specifying which application a resource is associated with — for example, the name of your application, or the version it's running. Annotations similarly live within the resource metadata; however, they should be used for information that is non-identifying. The Kubernetes system often uses annotations to store things like configuration details or deployment history, and we saw in the Ingress section that many controllers use annotations to add configuration beyond the default specification. So I just wanted to highlight the difference: labels are for identifying information, annotations for non-identifying information. And while annotations can be used to configure behavior, like we saw with Ingress, it's important to call out that they are arbitrary, untyped strings with no schema enforcement, so it is very easy to make a mistake and fairly hard to debug those mistakes. When using annotations that way, you'll want to double-check that the values you put in are valid and behave the way you'd like. Now, it may have felt like we took forever in this section; that's because there are a ton of built-in resources that Kubernetes provides. I do want to call out a few things I'm not covering here — I'm leaving these as an exercise for the viewer; they're useful to know about, but I didn't deem them critical for this particular course. The first is limit ranges: a limit range allows
you, as a Kubernetes administrator, to govern the resources a particular object kind can request, to prevent deployments that would take up the entirety of the cluster's resources. Network policies are a mechanism to control access from a network standpoint across different boundaries: you can say this set of pods are the only ones allowed to talk to that set of pods. They let you declaratively specify the set of allowable network traffic patterns, so applying network policies can be a great way to increase the security of your cluster by preventing connectivity between pods that shouldn't be talking to each other. The next two, mutating webhook configurations and validating webhook configurations, are ways you can enforce standards about the characteristics of the resources being deployed. For example, you could have a validating webhook configuration requiring every deployment to specify its memory request and limit, and reject any configuration that didn't meet those rules; so you can use validating webhooks to enforce the specific standards you care about. Mutating webhooks go a step further: they let you take a request to deploy some resource and change it on the fly. This allows for some really interesting behavior — for example, automatically injecting a sidecar container that performs some functionality, without the end user needing to know about it or include it in their definitions. The horizontal pod autoscaler is a tool to automatically scale the number of replicas of, say, a particular deployment or StatefulSet based on metrics you define. You could scale based on CPU usage, which is included out of the box, or on custom metrics: say you have some queue-processing workload — you could include the queue length as a custom metric and scale accordingly.
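For the out-of-the-box CPU case mentioned above, a minimal sketch of a HorizontalPodAutoscaler might look like this; the target name and the numbers are illustrative, and this assumes a metrics server is running in the cluster:

```yaml
# Hedged sketch: HPA scaling a deployment between 2 and 10 replicas,
# targeting 70% average CPU utilization.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: worker-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: worker           # illustrative deployment name
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

Custom metrics (like the queue length mentioned above) plug into the same `metrics` list via a metrics adapter.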
Then there are custom resource definitions, one of the more interesting pieces of Kubernetes: they essentially allow you to extend the Kubernetes API and define resources beyond the built-in set, for the specific purposes of your application. We're going to cover these a little in a later section, because they're a key aspect of much of the tooling built on top of Kubernetes. If any of these additional resources sound useful for your particular application, I urge you to go look at their documentation and figure out how to apply them to your use case. The final module in this foundations section covers Helm. In the previous section we used Helm a couple of times to install various tools, such as the NGINX ingress controller, but in those cases I kind of glossed over the details, so I want to spend a little time getting you familiar with Helm, how to use it, and what Helm charts look like, so we can use them effectively moving forward. There could be an entire course on Helm; I'm going to cover it relatively quickly and just touch on the most important features you need to know in order to use the tool. Helm has become the de facto standard for distributing software that runs on Kubernetes: for many open-source projects or third-party tools you run in your cluster, the way you install them is via a Helm chart. It's kind of a combination of a package manager and a templating engine. If you're familiar with npm in the Node.js ecosystem, or your package manager for Linux, or Brew for macOS: Helm similarly lets you take a set of resources you would deploy, bundle them up, and store them in a repository where other people can consume them. It also has templating built in, which allows you to use a single Helm chart with configuration relevant to a particular environment, so you can use the same chart and container images across different environments. The
primary use cases are application deployment — as I described, installing third-party tools into your cluster — and environment management, meaning configuring applications across different environments using the same underlying resources. There is a command-line tool called helm that you use to interact with Helm charts; we're going to look at the install and upgrade commands as well as the rollback command. The diagram here on the right shows how the different pieces of a Helm chart come together. At the top level you have a repository; this can be either a Helm repository or an OCI (Open Container Initiative) registry — you can store Helm charts in either. Within that are one or more Helm charts. The chart itself is structured with some metadata as well as some templates, and those templates are where the Kubernetes resource definitions live: things like deployments, services, ConfigMaps, etc. Alongside the metadata there's a values.yaml file; this is the interface through which you can configure and customize the templates. You specify a set of values in that values file, or pass them in at runtime, and those get templated in — hence the name — and rendered out as Kubernetes manifests. When we install a chart, it creates an object called a release in our cluster, and that release holds the rendered version of those templates as deployed into the cluster. Now, the templating, when you first look at it, seems pretty simple: here we've got a pod and we're substituting in whatever version is specified in that values.
Here you would use nginx, and the tag will be whatever version is specified. Seems simple enough. However, these templates can get pretty gnarly pretty quickly. This example is pulled, I think, from the MySQL Helm chart, and as you can see, with all those curly braces it very quickly becomes hard to interpret what the heck is going on. Some sections are only included conditionally, others use templated values, and the customization interface that chart authors define can get complex very quickly, because they need to account for all the different deployment scenarios end users might want. That is just one aspect of working with Helm that can be somewhat of a challenge.

There are a number of templating features within Helm; I'm going to cover the four I think are most important. The first is how to reference metadata within your templates: you can reference the chart object, the release object, or the values object. On the bottom right, under app version, we reference .Chart.AppVersion, which pulls information from the chart itself into the templated value. Similarly, we can reference the release: in the name of my ConfigMap here, I'm templating in the release name as a portion of the name that gets rendered. And with my values file, I can reference any field that's defined; here I'm using the configData field from my values, and on the left I'm referencing the environment value and checking whether or not I'm in the production environment.

Variables are also an important concept. On the left-hand side I'm defining a variable called envShort, which is just a shorter version of my environment: if it's production I set it to prod, otherwise to non-prod. Within my templates I can then reference that variable, and as you can see, the envShort field pulls in that value and uses it in the ConfigMap. Helm also includes features for control flow. Conditionals mean I can have values that only render under certain conditions; for example, on the left-hand side, when defining that variable, I'm using a conditional on whether .Values.environment is production, and on the right-hand side I'm using a conditional to check whether a feature is enabled. We can also loop over sets of values, and the syntax for that is range: here I'm looping over the values in the configData field and, for each of them, adding a new key to the ConfigMap. Using these four concepts you'll be able to cover most of the templating scenarios you care about, and you'll be able to understand and parse most of the templates you read.

I've pulled out a few examples that we can run through in our clusters to get some hands-on experience with Helm charts, both consuming a third-party chart and authoring some simple charts of our own. The first thing I want to do is navigate to my Helm folder. Within it there's a postgres folder and a charts folder; let's start with the postgres one, where we'll be consuming a third-party Helm chart. I'll create a namespace just to isolate things. Before we do anything else, let's confirm that you have Helm installed; it should be installed by devbox. Here I'm using version 3.15.0. This is Helm version 3, which runs only as a client-side tool.
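The four features just described can all appear in a single template. Here's an illustrative sketch; the configData field and the envShort variable follow the slide's naming as reconstructed from the narration, so treat the exact names as assumptions:

```yaml
# templates/configmap.yaml (illustrative)
{{- $envShort := "non-prod" -}}
{{- if eq .Values.environment "production" -}}
{{- $envShort = "prod" -}}
{{- end -}}
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ .Release.Name }}-config    # release metadata
data:
  appVersion: {{ .Chart.AppVersion }} # chart metadata
  environment: {{ .Values.environment | quote }}
  envShort: {{ $envShort }}           # the variable defined above
  {{- range $key, $val := .Values.configData }}
  {{- if $val.enabled }}
  {{ $key }}: {{ $val.value | quote }}
  {{- end }}
  {{- end }}
```

The `-` inside the curly braces trims surrounding whitespace so the rendered YAML stays clean, which is part of why real-world templates look so dense.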
Many years ago there was a previous version of Helm with a component you installed into your cluster called Tiller. That's old enough now that any chart relying on it should be deprecated, so we're not going to cover it in detail; just know that moving forward you should be using at least Helm version 3, which operates only on the client side.

The first thing I'm going to do is issue the helm repo add command. This adds a repo to my set of Helm repositories, in this case a public repo hosted by Bitnami, a company that develops many Helm charts. As you can see, I had already added this; you should add it to your Helm configuration. Next I issue the helm search repo command against the bitnami repo and the postgresql chart specifically, asking for the different versions available. As you can see, this chart has been upgraded many times, as has the application: one column is the version of Postgres it will install, and the other is the chart version associated with it. There are many versions available; we're going to install one of the newer versions, but not the newest, so that we can show the process of an upgrade.

I mentioned that there are both Helm repositories and OCI repositories, and I want to showcase the slightly different approach you use when interacting with OCI repositories. When you're looking at the documentation for a particular Helm chart, it should tell you whether it's in a Helm repo or an OCI repo; some tools actually offer both. I can log into an OCI registry by issuing helm registry login, and Docker Hub is actually an OCI registry, so you'd use your Docker Hub username and password to log in. Great, my login succeeded. We can't use the helm search command against an OCI repository, but there is a CLI installed via devbox called oras that lets you interact with OCI repos directly. Here I'm asking for the set of tags within the Docker Hub bitnamicharts registry for the postgresql chart. This should be the same set of tags available via the Helm repo, because Bitnami hosts both its own Helm repository and an OCI repo on Docker Hub.

Now, if you want to view a local copy of a Helm chart and it's hosted on GitHub or similar, you can go find the repository and look at it directly, but you can also pull the chart. I can issue a helm pull command referencing the repo, the chart, and a particular version, and you'll see that within my postgresql directory I now have a tarball containing the contents of that chart. I can unpack it with tar -xzf and the name of the tarball, which produces a directory called postgresql with the contents of the chart. If I look at the Chart.yaml, here is the metadata associated with this chart: you can see the chart version, the name of the chart, who maintains it, additional keywords for discovery purposes, and the application version, which is the Postgres version included in the chart. We can also look within the templates subdirectory and see that deploying this will deploy some combination of role-based access control objects, a StatefulSet, a Service, some monitoring capabilities, a network policy, and configuration, each with all sorts of templating built in. This Helm chart even has the ability to deploy a backup CronJob to help you back up your database. So by pulling it you can explore locally, and also look at the values.yaml file.
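The metadata being described follows the standard Chart.yaml schema. A sketch of what you'd typically see after pulling a chart like this; the field values here are illustrative, not copied from the Bitnami chart:

```yaml
# Chart.yaml (illustrative)
apiVersion: v2
name: postgresql
version: 15.4.1          # chart version
appVersion: "16.3.0"     # the Postgres version the chart installs
description: PostgreSQL packaged for Kubernetes
keywords:
  - postgresql
  - database
maintainers:
  - name: example-maintainer
```

The chart version and appVersion move independently, which is why the search output shows two version columns.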
The values.yaml contains all of the available configuration options we can modify for our specific purposes. We could pass in things like what we want the password, username, and default database to be; or we could store those in an existing secret within our cluster and just reference it, so we don't have to pass them directly via the chart. If we wanted to override the image being used (say we had our own version, or we're hosting it in our own container registry) we could do so here. There are all sorts of values you may want to override. Hopefully the chart authors provided a good set of defaults, but you should always review the available configuration and see whether anything needs to be set to meet the needs of your deployment.

There's also a values schema file. This is not a required file, but it can be very useful for enforcing a schema on the chart's values. Otherwise, people can specify anything they want in the values file; say they had a typo: without any validation, that typo might go unnoticed and silently not affect the deployment, whereas if you include a schema, Helm is able to enforce it. In addition to the values.yaml and the values schema file in the chart itself, you can also issue the helm show values command, which lists the available values without necessarily needing to pull the chart.

Okay, that's a lot of preamble, but let's go ahead and install this Helm chart. We issue the helm install command, passing it the name of our release, the chart, and a version. I have two versions specified in my Taskfile: I'm starting with chart version 15.4.1, and then in the next step I'll upgrade to 15.4.2, just to show how the upgrade process works. I'm telling it to create the namespace if it doesn't exist, and I'm passing it a values file. If I go to that values.yaml, I'm giving it a common annotation that I want applied to all resources, and this will be substituted in via the templates onto every resource the chart deploys.

There's some helpful information printed out after the deployment succeeds. It tells us how to access this via DNS within the cluster: the name of the service, then the namespace, then svc.cluster.local. We can see it created a ClusterIP service behind the scenes, and that it stored our password in a secret; because I didn't specify a password, I believe it generates a random one automatically. Let's go ahead and do k get pods, and as you can see, we have a single replica from that StatefulSet up and running. I'll do k get all in my current namespace, and you can see we've got our StatefulSet, a single pod, and a service as well as a headless service. Now let's see whether the common annotation got substituted in and applied to my resources. If I do k get statefulset -o yaml and pipe that to yq, here is the annotation I provided via the values.yaml file, injected into the StatefulSet itself, and as you can see it was also injected into my service. So I just wanted to showcase how setting a value and passing it via the values.yaml file gets applied across the rendered templates.
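A values override like the one described might look roughly like this. The commonAnnotations key is the convention the Bitnami charts use for annotating every rendered resource; treat the exact key as an assumption and check the chart's own values.yaml:

```yaml
# values.yaml override passed to helm install/upgrade (illustrative)
commonAnnotations:
  deployed-by: devops-directive-kubernetes-course
```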
Otherwise, this was a completely default installation: it's using my kubeconfig credentials and the namespace I configured as default, which in this case was my 05--postgresql namespace.

So what if I want to upgrade? I can issue a helm upgrade command. Whenever I do a helm upgrade, I generally also include the --install flag, because that allows you to use the same command whether it's the first time you're installing or you're upgrading; if you didn't include --install and the Helm release did not exist yet, the command would fail. So I just use this one command to always install or upgrade. Here everything is the same as before; my only difference is that I'm passing in a new version. Before we had 15.4.1, now I'm passing 15.4.2, and the upgrade appears to have been successful.

It's also useful to know that Helm stores the history of its releases within secrets in the cluster. If I do k get secrets in this namespace, I can see I have two releases: the initial install and then the upgrade. This is how, as a client-side-only tool, Helm is able to store enough history that we can do things like roll back. Let's say the new version we upgraded to didn't perform as expected and we needed to roll back to the previous version: we can do helm rollback postgres, where postgres is the name of the release. If I do helm list, we can see this is the release I have in the cluster, so helm rollback takes my current version and goes back to the previous one; you can see it's now referencing the original 15.4.1 that I had deployed into the cluster.

I used this command already, but if you do helm list with a specified namespace, it shows all the releases within that namespace. I can also ask Helm for the specific set of values that were provided for a deployment: helm get values returns the user-supplied values (the rest use the defaults), meaning the ones I provided via my values file at runtime. This is a very useful command if you need to figure out the configuration that was deployed. Hopefully those are all stored in your git repository and up to date, but this can be a way to validate that the deployed version matches your expectations. You can also run the helm get manifest command, which looks at the deployed release and gives you the rendered-out manifests, taking all the templated values and producing the actual deployed versions in the cluster. This is a great way to see all the manifests that a Helm chart is deploying, in their rendered form.

Finally, let's say you're no longer using whatever tool you had installed with Helm. There's an uninstall command: just helm uninstall plus the name of the release. The release is uninstalled, and if I do k get all in my current namespace, the two services, the StatefulSet, and the corresponding pod were all cleaned up. Awesome. That covers the types of interactions you'll have with a third-party Helm chart: how to add the repo, how to install the chart, how to look at the available configuration values, and how to interact with the releases on the cluster side once you've deployed things.

Now let's take a brief look at how we might author some charts of our own. I'll navigate to the charts subdirectory and create a namespace. I've included a couple of things here. The first is what I'm naming helm-create-unmodified. There is a helm create command: I can do helm create foo and it will create a new chart with a bunch of default values and all of the necessary boilerplate for using Helm (I'll go ahead and delete this one). helm-create-unmodified is exactly that: I ran helm create and it produced this chart.
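The release lifecycle commands from this walkthrough, collected in one place as a sketch. These assume Helm v3, a reachable cluster, and the release and namespace names used in the lesson, so they are not runnable outside that setup:

```bash
helm repo add bitnami https://charts.bitnami.com/bitnami
helm search repo bitnami/postgresql --versions   # list chart/app versions

# install or upgrade with one command; pin the chart version
helm upgrade --install postgres bitnami/postgresql \
  --version 15.4.1 \
  --namespace 05--postgresql --create-namespace \
  --values values.yaml

helm list -n 05--postgresql                    # releases in the namespace
helm get values postgres -n 05--postgresql     # user-supplied values only
helm get manifest postgres -n 05--postgresql   # rendered manifests
helm rollback postgres -n 05--postgresql       # back to the previous revision
helm uninstall postgres -n 05--postgresql
```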
If you want to explore the set of default configuration it provides, it showcases a bunch of Helm's different features and how you would use them; you would then go in, take those templates, and customize them to meet your needs. I've also included a super minimal Helm chart that skips all of the boilerplate but is just enough to get started and showcase some of the templating options we have. I'm calling this one chart-minimal. You can see in the chart metadata, Chart.yaml, we've got the name and a description. There are two Helm chart types. There's the application type: if your Helm chart deploys an application, that is generally the one you want to use. The other is the library type, for charts that define resource primitives to be consumed by other charts. You can have a dependency hierarchy where one chart depends on another, and library charts are not meant to be installed in and of themselves but are used by other charts to provide foundational resource types.

Let's look at the default values.yaml. Here I'm specifying an environment as well as some config data. Then in my templates directory I have two things. First, a _helpers.tpl file: this is the same variable definition I showed on the slide. Based on the environment in my values file, it defines an envShort variable that will be either prod or non-prod. Then, within the actual templates that get rendered, I just have a ConfigMap. This ConfigMap is named based on my release name; it includes an app version, it includes the environment (with quotation marks around it), it includes that envShort variable, and then it loops over the set of values in my config data and decides, based on those values, whether or not each should be included in the ConfigMap. There's also a NOTES.txt file you can include in your templates. It gets rendered out after you install or upgrade the chart, so it's where you provide the kind of helpful getting-started information the Postgres chart gave us about the service, the password, etc.

Now I want to install it with the default values. To do that, I'll do helm upgrade --install; like I said, I use this even the first time I'm installing something, because it's valid in either case. I pass it the name of the release I want to use, and then a relative path to where the chart lives in the file system. I haven't pushed this chart to any repo, so instead of pulling from a repo, I'm able to reference a local chart within my file system. You can see the release was installed in my 05-chart namespace, and we got that note from NOTES.txt telling me which environment I deployed to. It rendered out envShort, and because my default values file said production, we deployed prod. Let's look at the ConfigMaps in my namespace to see what was actually deployed. Okay, we've got our ConfigMap data here: it included the app version, the environment, and envShort (in this case prod), and then, based on the loop, since the enabled key was true, we got a conditional key with the string "this will be included" as its value.

Let's go ahead and deploy our chart again using our alternative values file. Because it's the staging environment, we'll expect envShort to be non-prod, and we'll expect the conditional key not to show up, since enabled is false, meaning the conditional skips over it. The way you pass a specific set of values is the same command, except now I'm passing a values file as well. You can see our notes were templated out properly, and we can get our ConfigMap and look at the values: we no longer have the conditional key from our config data.
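The two values files might look roughly like this. This is reconstructed from the narration, so the key names and structure are assumptions rather than a copy of the course repo:

```yaml
# values.yaml (defaults: production)
environment: production
configData:
  conditional-key:
    enabled: true
    value: this will be included
---
# values-staging.yaml (override file)
environment: staging
configData:
  conditional-key:
    enabled: false
    value: this would be skipped
```

With the staging file, the template's enabled check is false, so the conditional key is omitted from the rendered ConfigMap while the other keys still render.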
We're left with only these three keys, because that enabled field was false. Awesome. I'll go ahead and clean up that namespace, and that is our speedrun of Helm. One thing we didn't cover is Helm hooks. Hooks allow you to specify resources that should be deployed either before you install or upgrade something, or afterwards. Oftentimes there's some sequencing required when rolling out applications: say you need to do a database migration, you might put that in a hook that runs before install or before upgrade; or there could be cleanup actions that need to run after, which would be a different hook. So I just wanted to call out that hooks exist, even though we didn't cover them here. Hopefully that gives you an idea of how you'll use Helm, which will be a very common occurrence as you install third-party tools into your clusters.

At this point we've built out our understanding of the fundamentals and the baseline knowledge necessary to start using Kubernetes effectively. Now we need an example application that aligns with what you'd see in the real world, so in this module I'm going to showcase a bare-bones but representative application that we can use to apply everything we've learned thus far by deploying it onto our Kubernetes clusters. If you took my Docker course, this demo application will look very familiar. I've made a few small tweaks, but it's mostly the same base application: a three-tier web application using React on the front end, two separate API implementations (one in Node.js and one in Go), and a Postgres database on the back end. The two new elements are, first, an additional service, a Python load generator that spins up and calls the APIs repeatedly, and second, that I'm now storing each request to these APIs as a row in a database table, so we can keep track of the number of requests made to each of them.

On the right-hand side you can see what the UI for this application looks like. It takes the responses from those two APIs and displays them on the page: we get a timestamp from the database as well as the number of requests made to each API. Because this course is not focused on application development, I'm not going to go too deep into all these configurations, but I will quickly walk through the applications and show you how to run them, so that you're familiar with the underlying implementation.

The first thing I'll do is navigate to our module 6 directory. We're going to start at the bottom of the stack and work our way up, so the first thing to deploy is our database. I've got a number of tasks in my Taskfile to do so. First I execute the postgres run-postgres task, and you can see it issues a docker run command: it passes the POSTGRES_PASSWORD environment variable (set to foobarbaz), creates a volume to store the underlying data, mapped to the path inside the container where Postgres expects its data to be stored, and finally connects port 5432 on my localhost to port 5432 in the container. The image we're running is Postgres version 16.3 on an Alpine base.

Okay, I'm going to leave that terminal up and running and open a new one. As I mentioned, I'll be storing information about each request in a table in the database, so I need to create that table and its schema. I have migration files here: an "up" migration that creates the table, and a corresponding "down" migration I would run if something went wrong. In here you can see I'm creating a table named request in the public schema with two columns.
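Based on that description, the up migration presumably looks something like this. The table and column names are reconstructed from the narration, so treat them as approximate:

```sql
-- create_users_table.up.sql (reconstructed sketch)
CREATE TABLE public.request (
    created_at TIMESTAMP NOT NULL DEFAULT NOW(),  -- set when the row is inserted
    api_name   TEXT      NOT NULL                 -- 'node' or 'go'
);

-- create_users_table.down.sql would undo it:
-- DROP TABLE public.request;
```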
The first column is created_at, which is of type timestamp and uses the timestamp at which any given row is created; the second column is api_name, which will be either node or go depending on which API is called. To run this migration, I can run this particular task. As you can see, it first gets the container ID with a docker ps command, filtering down to the particular image I'm running; it then copies the migrations file from my host system into that container; and finally it issues a docker exec command, using the container ID from the first step, to run the psql command line as the postgres user and execute this SQL script. Based on the logs, it successfully issued the create table command, so the table should now exist in my Postgres database.

Now, with my database running and the schema created, I can spin up my two backend applications, so we'll move one layer up the stack to our backend APIs. The Go API lives in the api-golang subdirectory. The dependencies are defined within the go.mod file; I'm using two top-level dependencies, the Gin web framework and the pgx Postgres client. To install these dependencies locally, I can run the install task, which under the hood runs go mod tidy. In this case I already had all these dependencies installed, so nothing happened; if you ran this for the first time, it would install the dependencies onto your system. I can then run the application with the run task, which behind the scenes passes in a DATABASE_URL containing the credentials for my database as well as where on the network it can be found (because I'm running Postgres in a Docker container mapped to a port on my localhost, I can connect to it at this address), and finally calls go run main.go, which compiles and runs the application. It's now listening on port 8000, so let me go ahead and access it. You can see it gives me the current time from the database and the number of requests I've made; each time I refresh, we get an updated time as well as an incremented request count. Great.

In the console we can see it logging out 200 responses, which are successful API responses, and because I don't have a favicon defined, it logs a 404 when the browser tries to request the favicon. Looking at the source code briefly: my main function is very simple. We load in that DATABASE_URL, either from an environment variable or from a file; we then initialize a client to connect to the database; and finally we set up two API endpoints. The first is the root endpoint: when I request the API with no additional path, we do two things. One, we insert a row into the table saying a request was made; and two, we load the current time and select the request count from the database. If I jump into those two functions: the first inserts into the request table, with the api_name set to go and the timestamp picked up automatically; the second selects the current time and counts the number of rows where the api_name matches my current API name. That's all there is to this API, plus a health check endpoint that checks whether it can make a database connection and, if so, returns a 200.

We can now jump to the Node.js application. I'll leave the Go API running, create a new terminal, navigate into the api-node subdirectory, start my devbox shell, and then install our dependencies; this calls npm install. Okay, our dependencies should be installed, and then we call npm run dev, again passing in our DATABASE_URL with the connectivity information. This one is now listening on port 3000; let's just validate that it's working. Okay, we get back the current time and the number of requests, and as we make more requests, the count climbs. As you can see, the request count is independent between the two APIs, which is what we want. We're seeing the same responses as before.
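The two queries described for the root endpoint would look roughly like this in SQL. This is a reconstruction, not a copy from the repo, and the parameter placeholder style varies by client library:

```sql
-- 1) record that a request was made (created_at defaults to NOW())
INSERT INTO public.request (api_name) VALUES ('go');

-- 2) fetch the current time and this API's running request count
SELECT NOW() AS current_time,
       (SELECT COUNT(*) FROM public.request
         WHERE api_name = 'go') AS request_count;
```

The Node API would run the same pair of queries with 'node' as the api_name, which is why the two counts stay independent.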
Like with the Go application, we get 200s on the API requests and 404s on the favicon, because there is no favicon (404 meaning not found). Looking at my package.json, you can see that I'm using Express as my API server, Morgan as a logging utility, and pg as a Postgres client.

So we now have our Postgres instance and our two backend APIs running; let's set up the React client. It lives in the client-react subdirectory. I start my devbox shell, then install our dependencies (this runs an npm install), and then run it with npm run dev. Here we don't need any credentials; it's calling our backend APIs, which don't have any authentication. We can navigate to our browser and see the application: it's making a request to both of these APIs and rendering the results on the front end. As I refresh the page, each count increments a single time, because each refresh makes a single call to each API. Looking at the source, it's a very simple React app. We have our top-level app.jsx with a CurrentTime function component used for both of those displays: two instances of that component with two separate APIs, the first calling our Go-based API, the second calling our Node API. The data from those gets returned and populated into the React component. There we go.

Now, the final service is written in Python, and it just makes repeated requests to one of our backend APIs. Let me create one more terminal and start my devbox shell. We'll install our dependencies: I'm using a package manager called Poetry to help manage my Python environment, so here I'm calling poetry install --no-root. In this case all of my dependencies have already been installed, so that works fine. Finally, I'll issue my run task, which behind the scenes calls poetry run python on my main.py, specifically targeting localhost:8000. If you remember, 8000 was the port my Go application is serving on, and we can see in the other terminal that the Go application is now getting hit repeatedly by the load generator. If I load the page on the front end, you can see the Go API request count climbing rapidly, whereas the Node API is stable until I make another request; we had made 300 requests to the Go API, and it's already up to 340.

If we take a look at the source code of the load generator, it's very simple. Within our main.py, we import some dependencies and have two pieces. First, our load generator function: it loops forever as long as a terminate flag is false. On each loop, it tries to make a request to the API it was given and logs some info to the console; if the request fails for some reason, it catches the exception and keeps trying; then it sleeps for a configurable number of milliseconds. The only other piece is handling termination signals: if we kill the application with a Ctrl-C, it catches that signal, logs that it has done so, and toggles the terminate flag to true, which ends the loop and lets the process terminate. At the bottom I'm just setting up some basic logging, loading which API I want to call as well as how long to sleep between iterations, setting up my signal handling, and calling my run load generator function. That's really all there is to it. While each of these individual services is quite minimal, together they cover a variety of languages and many of the different types of configuration you might need to handle within a microservice-based application deployed onto Kubernetes.

Before we can go about deploying our application onto Kubernetes, we need to build the container images that Kubernetes is going to use. While I'm not going to do a deep dive on building container images like I did in the Docker course, I'll go over them briefly and showcase how we can build and push them to container registries.
Registries here as a quick reminder the general way that we Define a container image is via a do file this is a text file containing essentially the steps of instructions required to build out the application you're usually going to start with some base operating system install maybe your language runtime if you're running in python or node.js you'll install your application dependencies copy in your source code and then set up the necessary commands to run the application once we've built an image we then need to store it somewhere that our cluster can use the place that we do this is called a container registry Docker Hub is a great example of on but there's many others including GitHub container registry many of the major clouds have their own container Registries Etc so you'll build these images either on your laptop or maybe within a continuous integration server you'll push those to a registry and then our cluster as a production environment is going to pull an image from the registry with a given configuration I'll call out here that it is important to think about the Build architecture of the images your development system may or may not match the architecture of your staging or production servers for example if you're building on a Apple silicon laptop but you're deploying to an x86 server by default those two would be incompatible however there is a way to build multi-architecture images where you're going to emulate these other architectures during the build such that the container image can be run across multiple different architectures for each of our applications I've set up one version that's using Docker build that's going to be a single architecture that matches the architecture of your development system I've also set up a build X command which is going to allow us to build multiplatform images across a variety of architectures in order to run on kubernetes we do need to build all of these into container images let me stop all of these processes 
and we'll take a look at how we can build the container images we're going to use in our Kubernetes cluster. If you want a deep dive on how the Dockerfiles were written and how the build process was optimized, go check out the Docker course; here I'm just going to run through them quickly and showcase how to build them and push them to a registry. In the case of Postgres, the database we run in Kubernetes will use the public postgres image; however, we do need to be able to execute our migrations against that database. To do that, I start from a base image using the golang-migrate package, a CLI and Go library for running database migrations, and from that base image I simply copy in everything from my migrations folder. So it's a very simple Dockerfile. To build it, I can run the build-container-image task, which issues a docker build -t command. Here I'm tagging it with a Docker Hub image name; if you wanted to build and push this to your own container registry, you would need to replace this. Then I pass it the build context, which is my current directory. Looks like that built successfully. In today's world, where many people are developing on ARM-based architectures because of Apple silicon, or potentially deploying to ARM-based servers, it can be useful to build multi-architecture container images that can be used on either amd64 or ARM. To do this we can use docker buildx. The first thing we want to do is bootstrap a buildx builder: we issue the docker buildx create command, give it a name, tell it to use the docker-container driver, allow it to connect to our host network, and specify it as the default builder. Looks like that succeeded, and now we're configured to use that builder when we issue docker buildx commands. Now if I go
back into my postgres directory, I also have this build-container-image-multiarch task, and we can see it issues a docker buildx build command, passing two platforms (both linux/amd64 and linux/arm64), passing in the image tag, and then automatically pushing the result to the registry. If we go to that registry on Docker Hub, you can see it pushed that tag just a few seconds ago, and it has both of the architectures we specified available, so you could pull this image and run it on either type of system. For the Go API, I do have a Dockerfile specified; it's the one I used in the Docker course. However, there's a really cool tool called ko that allows you to build Go applications without needing to write or optimize your own Dockerfile. In this case, when I issue the build-container-image task, it calls ko build with a couple of options enabled: platform=all builds both ARM and amd64, and I pass it the name of the Docker repo I want to use. Then, without even needing a Dockerfile, it goes off and builds my application. We can go to Docker Hub, look at that repo, and see that in this case it built not only amd64 and arm64 but also arm/v7, ppc64le, and s390x. So with that one simple ko command, and no Dockerfile at all, I was able to build for all these different architectures, and you can see it's just an 8 megabyte image, so it's fairly well optimized. For Node.js, this is the same Dockerfile we used in the Docker course: we've got our base image pulling from an official upstream Node image, we set some environment variables, install our dependencies, copy in our source code, and then issue the command to start the application. We have the two tasks again: the first builds a single-architecture image using the docker build command, and the second uses our buildx builder to build the multi-architecture
version. Now, it's important to note that you can't use buildx to build the multi-architecture version without the push flag, so if you didn't have a remote registry on Docker Hub or elsewhere set up, you could run a local registry using the docker run command on the registry image, listening on port 5000 and running in the background. You could then tell buildx to push your multi-architecture images to that local registry, which can be a good way to validate your process without having a remote registry authenticated. Okay, that was the Node API. Within client-react, we've got the same deal: our Dockerfile looks quite similar to our Node.js application's. We start from an official upstream image, install dependencies, and copy in our source. The one difference is that instead of serving this as a Node.js application, we run the npm build command, which outputs our static HTML, JavaScript, and CSS, and then we serve those from an nginx container. You could serve them from a container like this, or you could distribute them via a CDN. Finally, for our Python application (this one did not exist in the Docker course, so I wrote this Dockerfile just for this purpose), you can see we have a multi-stage Dockerfile. In the first stage we generate the requirements needed by our application: we install Poetry, then use our pyproject.toml and poetry.lock files to generate a requirements.txt file. In the second stage we copy that requirements.txt file in, install the dependencies via pip, copy in our source code, and specify the command that will be run. We do it this way so that we don't need to include Poetry and all of that development machinery in our final image. The build command and the multi-architecture build command are the same as before: a docker build and a docker buildx command. So as you can see, by defining things in a similar way across all these different services, the actual build commands end up being identical, whether we use docker build for a single architecture or buildx for a multi-architecture image. The Go API was the one slightly different case, because we leveraged that third-party tool, ko, to handle all of that for us. And while I went quickly there, hopefully that gives you an idea of the configuration surface area of the various services and how they all fit together in this puzzle, so that you have the context required as we now take all of these services and figure out how to deploy them into Kubernetes. Once we have our container images built and pushed, we need to translate the different components of our architecture into a set of Kubernetes resources that we can deploy onto the cluster. The stateless components include our two backend APIs, the server for our React client files, and the load generator; none of those have any state we need to store in the cluster, so for those we can use Deployments. Our database represents a stateful application and therefore should be deployed as a StatefulSet; in this case we'll use a Helm chart to do so. We'll deploy Services in front of each of our Deployments and the StatefulSet to provide stable network endpoints, and then we'll deploy an Ingress controller and Ingress routes to route traffic from outside the cluster to the appropriate services inside it. Finally,
we'll add ConfigMaps and Secrets for any configuration we want to decouple from our application deployments. Let's jump in, define these resources, and apply them to our cluster. All right, let's navigate to our 07-deploying-demo-application directory. As you can see, I've got a number of tasks defined: for each service I have an apply task that runs a kubectl apply on all the underlying resources. I also have a common subdirectory that deploys my Ingress controller; in this case I'm going to deploy Traefik, just to show an example of a different Ingress controller. And for Postgres, I'll install it via a Helm chart and then apply my initial database migration. Let's start at the bottom of the architecture diagram and work our way up, beginning with Postgres. Here we run the install-postgres task, which adds the Helm repo and then calls helm upgrade --install on version 15.3.2 of the chart, using the --set flag to set auth.postgresPassword to the value I want. This --set flag is an alternative to passing in a values file: you can set specific values via the command line this way, and for a credential like this, where you don't necessarily want to store it in a file, passing it at deploy time can be a good option. You can see that was deployed into the postgres namespace, and the Helm chart created a relatively small (but sufficient for our purposes) PersistentVolumeClaim associated with the StatefulSet. Now, to run the database migration, the type of Kubernetes resource that makes the most sense is a Job: a task we want to run one time, to completion. So what I did here is define a Kubernetes Job. I'm calling it db-migrator and putting it in the demo-app namespace, so before I can deploy it I'll need to create that namespace. I have a common apply-namespace task that applies the namespace yaml, and that yaml contains demo-app. All the services will go into the demo-app namespace except for Postgres, which goes into its own namespace. Great, now that namespace exists. Let's take a look at the template for the Job. It has one container, which I'm naming migrate, using the image I built and pushed containing all of my migration scripts; in this case, it's just the one create-users-table script. Then, in the args for the migrate command-line tool, I give it the path in the container where the migrations live, /app/migrations, and I pass in a database URL as an environment variable containing the credentials and hostname for my database. I'm also turning off SSL mode, because I'm not running SSL on the database, and specifying that I want to run the up migrations rather than the down migrations. To get this database URL into the container, I pass it a reference to a secret, which I've defined in this file here. I have a secret
named db-password in the demo-app namespace, and I'm using stringData to define my database URL, with the password that I created. If I look at the services in the postgres namespace, I've got a ClusterIP service and a headless ClusterIP service; in this case I want to address the normal ClusterIP service, so the URL is the name of the service, then the namespace, then svc.cluster.local, on port 5432, with the postgres database. Awesome. Let me apply both of those files: it starts with the Secret and then creates the Job. If we look at the pods, it has already completed (one of one completions), and if we look at the log from that pod, it created our users table in 14 milliseconds. So now we've got our database running and the initial migration has run. We would want to run this migration job every time we have a new migration, before making the corresponding application deployments. Now I'll move on and deploy the things in my common subdirectory. We already deployed the namespace, but I also want to deploy the Traefik Ingress controller. I can do that with the common deploy-traefik task, which adds my Helm repository and then runs my helm upgrade --install command, naming my release traefik and pinning the chart version. If I look at everything in the traefik namespace, we can see it created a Deployment, which in turn created a ReplicaSet, which in turn created a Pod. We also have a LoadBalancer-type Service; in this case I'm running in my GKE cluster, so that will provision a Google Cloud load balancer. Calling that command again, the external IP has been populated. There's one additional common element shared between my two APIs, and that's a Middleware. The Middleware is a custom resource defined by Traefik that enables things like stripping a path prefix from incoming requests. So, for example, if I navigate to my domain and go to /api-node, I'm going to define an
Ingress to route that traffic to my Node.js application. However, my Node.js application is not expecting that prefix in the request, so this Middleware strips it out, and by the time the request hits my Node API it looks as though it's coming from the root path. So I've created that Middleware. Let's now take a look at the Go API resources. We've got a Deployment, which should look very familiar from the deployments we learned about in module 4. I'm calling it api-golang, it's in my demo-app namespace, and I'm including an app label just to say that this resource is associated with my Go API. It has a single replica, and I'm specifying my selector, whose label needs to match the label on my pod template as specified here. The container image is the one from Docker Hub that I built and pushed, and I'm specifying that it listens on port 8000. I'm pulling in my database secret from a secret named api-golang-database-url, which is defined alongside it here. I'm specifying a containerPort of 8000 and a readiness probe for the kubelet to ping to check whether my application is healthy. I'm specifying the resources this application requests, in this case 100 megabytes of memory and 50 millicores of CPU, and I'm limiting the privileges of the pod, which is good from a security perspective. That should be about it. I also have a Service defined: a standard ClusterIP service listening on port 8000 and passing requests to port 8000 on my pods, using the same selector as the Deployment in order to connect it to the underlying pods. So if I run the api-golang apply task, it calls kubectl apply -f on that directory; if you do a kubectl apply on a directory, it tries to apply every yaml file within that directory. So I've created my Deployment, my IngressRoute, my Secret, and my Service. If we look at the pods in demo-app, the db-migrator pod is the one we created a few minutes ago, and my Go API is up and running. If
we look at my IngressRoute, you'll notice it's a little different from the Ingress we deployed in module 4. Traefik has defined its own custom resource to avoid having to use annotations for all the custom behaviors; instead, you define this IngressRoute, and Traefik interprets it to determine where network requests should be routed. This is the domain it's listening for, and this is the path prefix it uses to decide whether a request should go to my Go API. I'm using that common Middleware I deployed to strip out the prefix, and then routing to the Service I've defined alongside this. One thing I just noticed here is that this port should actually be 8000; I had it going to 8080, but I've updated my application to listen on port 8000, so I'll save that and apply. Just to validate that things are working, I'll port-forward to the service so we can access it on localhost:8000. There we go, we've got our API responding. I can then take this external IP address, go over to my DNS provider, Cloudflare, and update my A record to point to that IP address. Then if we navigate to kubernetes-course.devopsdirective.com/api-golang, we can see the traffic getting routed to our API. Okay, let's move on to our Node.js application. The resources look pretty much identical to the Go ones, the main differences being that we're running our api-node container image and listening on a different port, port 3000. We have a Secret with the same contents, just pointing us with the credentials to the proper service inside the cluster, and our IngressRoute uses the path prefix /api-node instead of /api-golang. Let's go ahead and apply this. We can see our pod being created; it's now running. Let's try to access it. Okay, we see the Node API giving us a response. Now for our client-react app: it uses that nginx container we built, which contains the output of our npm build process. Again, it should all look very familiar, the main difference being that we're using the specific container image associated with this application. I'm listening on port 8080, I've got my readiness probe set up, and I've got my resource requests and my security context set up. I'm also mounting in this ConfigMap as a volume, located at /etc/nginx/conf.d, which is the default location where nginx looks for a configuration file. Within that ConfigMap I'm defining that nginx should listen on port 8080, and I have my health check route here. These two locations are not actually necessary, because the routing to my backend APIs happens at the Ingress layer. And then this tells nginx where the files for my React client live in the container, and that should be sufficient. So we've created our ConfigMap, a Deployment, and an IngressRoute. This IngressRoute essentially says: match all traffic to this domain, without specifying any path prefix, so any request to the root path gets routed here. I'm pointing it at my service that's defined right here
, that's in front of my deployment, and that should be sufficient to get traffic into my application. So if I remove this prefix, now you can see there's my application. The final service I haven't deployed yet is the load generator. For this I have a Deployment, specifying the labels accordingly to indicate that it is its own application, using the container image I built and pushed, pointing it at the ClusterIP service for my Node API, and specifying a half-second delay between each call. If we wanted to, we could move these types of configuration into a ConfigMap; in a simple case like this it's okay to define them directly in the Deployment, but if there were many different environment variables it might make sense to extract them out into a ConfigMap. I've got my resources defined as well as my security context. Now, here's an important one: on Docker Hub, I have the registry containing this container image set to private. I did that so I could demonstrate how image pull secrets work and show you that first it will fail to pull, then we'll create an image pull secret, and then it will succeed. So let's start by commenting out this imagePullSecrets field and deploying the resources. I've got a warning here about the fact that this is a private container image repository, and if I now look at my pods, this pod is in an ImagePullBackOff state, meaning it tried to pull the image and failed. If we describe it, you can see "failed to pull"; that's because it's a private container image registry. In order to solve that we need to do two things: uncomment the imagePullSecrets field, and create a secret containing the proper credentials. To do that I can use this load-generator-python create-image-pull-secret task, but it's expecting me to specify three environment variables: my Docker username, Docker email, and Docker password. So I'm going to specify those and cut
that out of the video. Now that I've exported those environment variables, rerunning the command succeeds: it calls kubectl create secret in my demo-app namespace, of type docker-registry, gives it a name, and passes in all of the values it needs. With our registry credentials secret created, we can now reapply our load generator configuration. If I get the pods, we can see it was able to successfully pull and run our container image, and now we have our load generator pod. Let's check its logs: as you can see, it's just making requests over and over to that internal ClusterIP. So if we go back to our website, we'd expect the request count for the Node API to grow much faster than that of the Go API. It's set to delay half a second between each request, so it won't climb super rapidly, but you can see those requests happening in the background; it's at 182 and now it jumps up to 200. To recap: we created our four stateless applications using Deployments, three of them with ClusterIP services in front; we've got our Traefik Ingress controller installed, routing traffic via those IngressRoutes; we've got our Postgres database, which we installed via the Helm chart; and we performed our database migrations via a Job. The credentials for the database are stored in Secrets, one for each service. The Ingress controller provisioned an external load balancer, for which we created a public DNS record so we can now access the application. And because the load-generator-python container image registry was private, we had to create that image pull secret in order to pull it successfully. Now everything is running within the cluster. One nice thing about GKE is that out of the box you get log aggregation: I can go here to my cluster and get logs from the control plane, or I can go to the logging page and get logs from all my different services, and I could filter down to specific ones. And so
this is important: previously we've only been looking at logs by issuing a kubectl logs command, and those logs only persist through the lifecycle of the pod, so you want to ship your logs to some log aggregation system. Here Google is handling that for us, pulling those logs off our containers and into its logging platform, where we can observe and search them over time, and see them all in one aggregated place. If we click into the workloads tab here in the overview, it shows all the different services we've deployed as well as their status. We can then click into the observability tab and see the usage of the various services; in this case let's narrow down to our demo-app namespace, and now you can see the various apps we've deployed. It also shows us the different events we've created, so this is showing me when I modified a deployment or when I deleted a deployment. If you're looking at this and you see a spike, you can very easily dive in and see whether it corresponded to some action you took. Maybe you upgraded a deployment and that caused your memory usage to spike; you could then go back, look at what changed in that version of your application, and figure out why that happened. Of the managed cluster offerings, I would say Google's observability tooling is the most mature I've seen. On Civo there is some log aggregation that's still in kind of a beta state, but I don't think they have anything quite like this in terms of being able to see your metrics and correlate them to events in the cluster. You could deploy your own observability stack: you could deploy something like Prometheus to collect the metrics, and Grafana, a tool for building dashboards, to query Prometheus and build dashboards like this. But having it work with no additional configuration on our end is pretty nice here within GKE. So far we've mostly been using capabilities that are built into Kubernetes itself,
with some additional functionality provided by drivers for those common container interfaces. Now let's take a look at the ways you can extend the Kubernetes API to adapt the system to fit the needs of your particular application. When people talk about Kubernetes, the first thing that comes to mind is that it's a container orchestrator: it allows you to take containerized workloads and deploy and schedule them across a number of different compute resources. However, it's not just a container orchestrator. Kubernetes effectively provides an API for declaring a set of resources: you send those resources to the Kubernetes API server, it stores them in etcd, and it makes sure the state of those resources is shared across all the control plane nodes and accessible via the API. The second piece Kubernetes provides is the concept of a control loop that continuously observes and acts upon those resources. Let's take some of the built-in resources we learned about in module 4 and think about how this plays out in those cases. If I create a Deployment, I have my deployment.yaml file defining a whole bunch of configuration for that Deployment. I send it to the Kubernetes API with the kubectl apply command and it gets stored in etcd. Then that second part kicks in: the Kubernetes controllers look at that Deployment and say, "this is a Deployment, I need to create a corresponding ReplicaSet," and they do so. Then another controller looks at that ReplicaSet and says, "this is a ReplicaSet, I know what I need to do: create a number of pods to go along with it." The behaviors the system should take for those built-in resources are all provided by these controllers that ship with Kubernetes out of the box. However, you can define your own custom resources with whatever schema you want, tell Kubernetes how to interpret that schema, and it will happily accept those custom resources, store them in etcd, and maintain their state. You can then write your own applications, in this case called controllers, which query the Kubernetes API to find out which of those custom resources you have deployed into your cluster, and take whatever action you want. This may sound a little abstract, so let's break it down like we did for the built-in resources, with a couple of use cases. One example is a project called CloudNativePG, which provides a system for deploying PostgreSQL databases onto Kubernetes that we're going to look at, and actually deploy into our cluster, in the following section. What they've done is create a set of custom resources associated with Postgres clusters, backups of Postgres, and so on, and then the logic that a human database admin would perform is encoded into a custom controller, such that you can execute many common workflows in a declarative fashion by deploying these custom resources. Another great example is management of TLS certificates: there's a project called cert-manager, which I'm not covering in this course but would certainly suggest you take a look
at it. They use custom resources for managing certificates: they have one for certificates, one for, say, an HTTP challenge, and one for a certificate provisioner pointed at Let's Encrypt, for example. So they have those custom resources, and they've also implemented a controller such that when you deploy those custom resources to your cluster, that controller is able to go off and take action, such as provisioning a new certificate and storing the credentials in a Kubernetes secret. Another great example is a project called Crossplane, which enables you to deploy infrastructure simply by creating custom resources in Kubernetes. They have what are known as managed resources, which map one-to-one with an infrastructure provider's resources. If you're deploying something on AWS, maybe you have an EC2 instance custom resource that lives in your cluster; the Crossplane controller sees it, interfaces with the AWS API, and creates the corresponding instance, with all the configuration you've provided, in your AWS account. As you can see, these three use cases are incredibly varied, and hopefully that sparks the idea that you can take this pattern, defining a set of resources and then writing an application (a controller) to observe and act upon those resources, and apply it to all sorts of different scenarios. There are many projects that provide the tooling required to build these types of controllers. Some of the most popular are Kubebuilder, Operator SDK, and Metacontroller, or you can use the Kubernetes client code that's provided for many popular languages to bootstrap your own. If you want to get started building your own Kubernetes operators, Kubebuilder has an excellent tutorial in which you literally go through the process of writing a replacement for the built-in CronJob controller. So you're effectively solving the same types of problems that the people building Kubernetes were, and learning the underpinnings of how this
operator model works by building your own implementation of a CronJob controller on top of Kubernetes. To make this a little more concrete, I want to show what a custom resource definition looks like. It contains an OpenAPI schema that defines all the fields that are allowed, required, or optional for your custom resource. The example here is the IngressRoute, which we actually used in the previous module: when we deployed Traefik for our application's Ingress, it defined these custom resources, which it then used to define its routing configuration. If we look down here at the schema, we can see all the different properties. It tells us what the resource is, and that in order to create an IngressRoute object we need to define an apiVersion, a kind, metadata, and a spec; within that spec we need to have routes, and we can have a TLS configuration. You define a custom resource like this, and it gets stored in your cluster just like any other resource. So if I do k get crd, here are all the custom resource definitions installed in my cluster. You can see that Traefik deployed a number of custom resources, that Kong (which we used in the Gateway API section of the built-in resources module) deployed a number of custom resources, and, because this is the GKE cluster, that Google has also deployed a number of custom resources it uses on the back end. Hopefully this description of a variety of different operators, and how they tie into the control loop model that Kubernetes provides, gets you thinking about how you could apply this pattern to your own application scenarios. Building operators is a relatively advanced topic and falls outside the scope of this course, but we will be deploying and using some of these operators in the following section, to get a feel for how they work and how you interact with their custom resources. Now that we've learned about the idea of extending the Kubernetes API and how to build
systems on top of Kubernetes, let's take a few examples of that type of project and deploy them into the cluster. These are applications that enhance our usage of the cluster in one way or another. There are many companies and open source projects building this type of tooling, so you'll want to look across the landscape to understand whether there are tools that would make your application platform on top of Kubernetes even better. If we think about how we would operate a database for an application running in Kubernetes, there are four main options. The first is to keep the database outside the cluster entirely: you could deploy it on your own, or you could use a database-as-a-service, so if you're operating in AWS that could be RDS, and if you're in Google Cloud that could be Cloud SQL. If your cloud provider offers a database like this, it can be a great way to shift some of the operational overhead of managing a database, handling backups, ensuring everything is working, and testing those things onto the cloud provider, reducing the amount of time and effort that you and your team need to spend on the database itself. However, if you do plan on hosting it within the cluster, there are three primary options: you could write your own StatefulSet; you could use a Helm chart, like we did previously, where someone else has written a StatefulSet that you just configure; or there's this project CloudNativePG, which has built an operator that you deploy on Kubernetes so you can declaratively manage your Postgres clusters via Kubernetes custom resources. What does this actually look like? You deploy the operator into your cluster (this green box is the operator pod), and along with it come a number of custom resources. For example, there's a Cluster resource up here at the top, where you define how many instances you want and how much storage you want on the back end, along with a number of
other configuration options. When you create that Cluster custom resource, the operator will see it and, based on the configuration provided, create one or more pods: a primary pod that is read-write, and one or more secondary or replica pods that will be read-only within your cluster. It also has custom resources for backups, where you can specify that you want to back your cluster up to an object store of your choosing: on Google Cloud that's going to be Google Cloud Storage, Civo cloud has an object store as well, and Amazon has S3. So you can define your backups, or a scheduled backup for your database, in a declarative fashion. Let's go ahead and jump over to our code editor and deploy CloudNativePG and see how we can set this up. I have a number of tasks defined here within my Taskfile for the module 9 subdirectory. The first thing that I'm going to do is create a namespace, the 09-cnpg namespace. Next up, I'm going to install the operator. To install the operator I'm using a Helm chart, so I use helm repo add to add the repo to my Helm configuration, and then I call the helm upgrade --install command, giving it a name for the release, putting it in the cloud native PG system namespace (creating that namespace), and specifying which chart to use. I'm using all the default values here; if you do need to do some customization, take a look at the Helm chart values to figure out what you might need to change. It then outputs a sample cluster that we can use. If we look at the pods in the cloud native PG system namespace, we have one pod and it's up and running. Great. We can then also look at the custom resources that this has deployed into our cluster by doing k api-resources and then grepping for postgres. You can see we have a Cluster resource, a Backup resource, a ScheduledBackup resource, as well as ImageCatalogs and ClusterImageCatalogs (I believe those are how you define the container images you want to use for each particular version of a
cluster you want to deploy), and then this Pooler resource, which I believe uses PgBouncer to set up connection pooling for your cluster. With that installed, we can now deploy a Cluster custom resource. I'm using the absolute minimum configuration here; the goal of this section is not to teach you how to deploy the optimal Postgres database with CloudNativePG, it's to show you how you can use CloudNativePG and interact with these custom resources. So if you're using this, you should look into the options that the Cluster CRD provides, and I believe CloudNativePG even has a Helm chart with some of the best practices for deploying a cluster encoded into it, so that could be a good place to start. Let's go ahead and deploy this minimal cluster into our Kubernetes cluster. If we now look at the cluster, we can see that it's setting up the primary. If we get the pods in our namespace, there's this initdb job, which is executed first before it spins up our primary replica. It looks like the pod associated with the job has completed, and now here's our primary replica for the database coming up. If we do k get pvc, we can see that it has created two PVCs. The first one, for our primary replica, is bound to the pod that we just saw; the second one, because that second pod hasn't come up yet, is still in a Pending status. Our primary replica is now running, and we have a join job, which is going to allow that additional read replica to join our cluster. That is now done, and now our read replica is coming up. We can see the two instances, but only one of them is ready so far. And now both of those replicas are healthy and our cluster is in a healthy state. We can also look at the services that it stood up and see that we have a read-write endpoint that's going to target our primary replica, and then we also have a read-only endpoint that's going to target our read replica. Interestingly, if we do k get statefulsets, there are none. The CloudNativePG project has decided to skip
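A minimal Cluster manifest along the lines described here might look like the following. This is a sketch; the resource name and storage size are illustrative, and in practice you should consult the Cluster CRD for the many options it supports:

```yaml
# Minimal CloudNativePG Cluster sketch; name and values are illustrative
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: my-postgres
spec:
  instances: 2      # one primary (read-write) plus one replica (read-only)
  storage:
    size: 1Gi       # a deliberately small volume for the demo
```

Applying this is what triggers the initdb job, the primary pod, the join job, and the replica pod described above.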
StatefulSets entirely and manage its pods directly. Presumably this was due to some of the limitations of how StatefulSets work within Kubernetes, and they found those limitations were easier to overcome by directly controlling the pods themselves. Specifically, most fields within a StatefulSet, including things like the size of the volumes it provisions, are immutable, even though volumes can now be dynamically expanded in Kubernetes, so they have chosen as a project to skip StatefulSets and manage their pods directly. Just an interesting callout there. Now let's take a look at what we need to do to set up backups for our Postgres cluster. Here is a configuration for a cluster that looks quite similar to before: we're going to have two instances and a very small volume size, but I've added two things. One, I've added this backup section, along with a Google Cloud Storage bucket that I'm about to create; I'm specifying that it can use the Google credentials that it finds from my GKE environment, and telling it to retain backups for 30 days. Two, I'm setting up a service account template. This is necessary to use a feature within GKE called Workload Identity, which will allow me to link together a Kubernetes service account with a Google Identity and Access Management (IAM) service account, such that my Kubernetes service account can leverage the same permissions and roles that the IAM service account has. This will allow me to access this bucket to store my backups without needing to store any sort of static credential in a secret in the cluster. The way that works is you put an annotation on the Kubernetes service account that looks like this: this will be my IAM service account name, and this is the GCP project that it is associated with. Let's go ahead and create those resources. First we can create the bucket; here I'm running gcloud storage buckets create and passing it that name. We can then go over to the Google Cloud console, click refresh, and we see
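The backup configuration described here might be sketched like this. The bucket name, project, and service account are placeholders, and the barmanObjectStore fields follow CloudNativePG's backup API as I understand it:

```yaml
# Cluster-with-backup sketch; bucket, project, and SA names are placeholders
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: my-postgres
spec:
  instances: 2
  storage:
    size: 1Gi
  serviceAccountTemplate:
    metadata:
      annotations:
        # Links the Kubernetes SA to a GCP IAM SA via Workload Identity
        iam.gke.io/gcp-service-account: cnpg-backups@MY_PROJECT.iam.gserviceaccount.com
  backup:
    retentionPolicy: "30d"                       # keep backups for 30 days
    barmanObjectStore:
      destinationPath: gs://MY_UNIQUE_BUCKET     # bucket names are globally unique
      googleCredentials:
        gkeEnvironment: true                     # use the GKE Workload Identity credentials
```

With gkeEnvironment set, no static credential secret is needed; the pod picks up its identity from the annotated service account.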
this bucket was just created, with nothing in it. Now let's add the necessary permissions on the Google Cloud side to enable us to store objects in that bucket. This is going to do a few things. First, we're going to create a service account named cnpg-backups. Then we're going to attach two roles to it, the Storage Object Admin role as well as the Legacy Bucket Reader role, scoped specifically to the bucket that I just created; I'm using the gsutil command line tool to add those two roles on the bucket. It's important to call out here that this bucket name needs to be globally unique, so you cannot share the same bucket name as me. If you're following along, you'll need to modify the name of this bucket (maybe add a postfix, maybe put your own name in there somewhere), and that will allow you to have your own globally unique Google Cloud Storage bucket. With those two roles associated with the bucket, I can then add the Workload Identity User role to the IAM service account, and this is what allows the Kubernetes identity to assume the IAM identity. Specifically, you need to give it this member, where I'm saying that the service account within this Google Cloud project, specifically in the namespace that I'm working in and with this service account name, is allowed to use that IAM service account. Looks like that was successful. We can see all this set up on the Google Cloud side: we click into Permissions and scroll down, and here is the service account that I created and the two roles that I attached to it for this bucket. If we go over to the IAM page and click under Service Accounts, we can see the service account created here. If I click into it and go to Permissions, here is that Workload Identity User role, which references our namespace and the name of the service account that we're going to create in the Kubernetes cluster. And then the final piece that we needed to make this happen, which I set up quite a while ago in module three
when we created our cluster: we used this --workload-pool option, which passes our Google project ID followed by .svc.id.goog, and that enables the GKE cluster to utilize identities in this workload pool. That can be seen in the cluster configuration here, where Workload Identity is enabled and the workload identity namespace is the one associated with this GCP project. With that all set up, we can now apply this cluster config with the backup, and this did three things. One, it applied this file; earlier I didn't call out what this stood for, but Barman is the Backup and Recovery Manager (you'll see Barman a few times here in the backup configurations; that's what it stands for). So I applied this cluster-with-backup configuration. Two, I applied a scheduled backup file, which is essentially a cron job that's going to use the backup configuration in my cluster on this schedule, so it would run at midnight UTC daily. And three, just so we'd have something to look at and not have to wait until midnight, I also added a Backup resource, which will run right now, in this namespace, pointing at my cluster. So first let's do k get clusters; looks like this cluster is still coming up. Let's do k get backups; it's in a running phase. Our cluster is now healthy and our backup has completed. Let's go look in the bucket and see what it has created. We click into our bucket; this is the name of our Postgres cluster, and we've got both our base data backup with a timestamp, and then any information that was still in the write-ahead log that hadn't been fully synced into the primary database storage would be here in the wals subdirectory. Great. So just like that, we were able to deploy the CloudNativePG operator into our cluster, deploy a couple of Postgres clusters, and set up backups, using Workload Identity to store those data in a Google Cloud Storage bucket, and utilize the effort and testing that has gone into the CloudNativePG project without needing to define our own StatefulSet or rely on that
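The scheduled and on-demand backup resources described here might be sketched as follows. Names are illustrative, and note that CloudNativePG's schedule field uses a six-field cron expression (seconds first), as I understand its API:

```yaml
# ScheduledBackup sketch: midnight UTC daily, six-field cron (seconds first)
apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: daily-backup
spec:
  schedule: "0 0 0 * * *"
  cluster:
    name: my-postgres
---
# On-demand Backup sketch: runs immediately so there is something to look at now
apiVersion: postgresql.cnpg.io/v1
kind: Backup
metadata:
  name: initial-backup
spec:
  cluster:
    name: my-postgres
```

Both reference the Cluster by name; the operator picks up the barmanObjectStore settings from the Cluster's backup section.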
Helm chart for this logic. It's also possible to set up backups in a cloud like Civo: you can use any S3-compatible object store as the destination for these backups. In that case you do need to give it an endpoint URL, because it is not using the default AWS S3 endpoints, and there's no workload identity feature on Civo, so as you create your bucket you would create a set of credentials with an access key and a secret access key and store those within a Kubernetes secret. But other than that, it would work in the same way. The next tool that I want to add to my Kubernetes cluster is called Trivy Operator. Trivy is an open source project from a company called Aqua Security, and it is used for scanning container images and cluster configurations. By deploying this into our cluster, it's going to automatically detect every container image that is running in the cluster and scan it when it is detected, but also rescan every 24 hours (or whatever time period you specify). This is really awesome: one, because you're going to get automatic visibility into all of the container images, whether they're your first-party or third-party container images; and two, because if you were to scan images only within a CI pipeline, for example, oftentimes vulnerabilities are discovered after you have deployed an image, so that initial scan result may become outdated and need to be superseded by one that includes the latest vulnerability database. It also exposes metrics that you can, for example, send to Prometheus or Datadog and use to alert your security team based on certain criteria. We're going to deploy the Trivy operator into our cluster; it's then going to watch for any Job, ReplicaSet, DaemonSet, or StatefulSet, extract the image and tag from it, and that will trigger Kubernetes jobs that do both a vulnerability scan as well as a configuration audit, which then produces additional custom resources that'll be stored in
the cluster, and we can view a vulnerability report as well as a configuration audit report. So let's go ahead and install Trivy and then view what these look like. We're going to install the Trivy operator with Helm, so we're doing a helm repo add for the Aqua repo and then a helm upgrade --install for the Trivy operator. We're using the default values again; there are a number of configuration options, but for this use case the defaults work just fine, and we're specifying a version. If we do k get pods in the trivy-system namespace, you can see first here's the operator pod, which came up 26 seconds ago, but already in this first 30 seconds of its life it has spun up a number of vulnerability report scan jobs, one for each image that it's detecting in the cluster. We can then do k get vulnerabilityreports. None of those scan jobs have completed yet, but as they complete, Trivy is going to write to these custom resources, where we will be able to take a look at the specific vulnerabilities that our images have. We can see that as those vulnerability scan jobs complete, more and more of these vulnerability reports are added; it started with the kube-system namespace, and now we can see that some of the images we built are included. You can also use the -o wide flag, and then in the output we get a summary of how many vulnerabilities at each severity level Trivy has found. For example, in our migrator job we see that there is one critical vulnerability. What if we wanted to dive in and look at specifically what that was? We can do k get vulnerabilityreport in the demo-app namespace, give it the name, and now if we do -o yaml and pipe that to yq, we get all of the details about the vulnerabilities that it found. I can search for critical, and we can see that there is a standard library vulnerability in Golang 1.22.2 associated with the net/netip module that was fixed in 1.22.4. So if we wanted to fix this, we should upgrade the version of Go that our base image is using in order to get
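To give a feel for what these custom resources contain, here is a rough, abridged sketch of a VulnerabilityReport. The resource name is hypothetical and the schema is simplified; the version numbers echo the Go example discussed here, but the CVE identifier is a placeholder:

```yaml
# Abridged trivy-operator VulnerabilityReport sketch; name and CVE are placeholders
apiVersion: aquasecurity.github.io/v1alpha1
kind: VulnerabilityReport
metadata:
  name: example-migrator-report
  namespace: demo-app
report:
  summary:                      # what the -o wide columns summarize
    criticalCount: 1
    highCount: 0
    mediumCount: 3
    lowCount: 2
  vulnerabilities:
    - vulnerabilityID: CVE-0000-00000   # placeholder identifier
      severity: CRITICAL
      resource: stdlib                  # Go standard library finding
      installedVersion: v1.22.2
      fixedVersion: 1.22.4
```

The -o wide columns come from the summary block, while the per-CVE details are what you see when dumping the full resource as YAML.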
that updated version. We can click into the link provided to see the Aqua Security summary of the vulnerability, or we can search this CVE number and see the official NIST details about the CVE: it describes what the CVE is, has some additional resources, and again tells us which configurations it has been fixed in. And so with that basic install of the Trivy operator, we got custom resources that are now giving us deep insight into vulnerabilities across our workloads. There are many other projects that you should consider taking a look at; I just wanted to highlight a couple that I find very valuable and showcase how you go about installing and then working with the custom resources that these systems deploy. A few that I would call out here quickly are cert-manager, which is a tool for provisioning and managing TLS certificates, and the External Secrets Operator, which we're going to showcase in the developer experience module; that one allows you to store the source of truth for secrets outside of the cluster and then pull those in and mirror them as Kubernetes Secrets, which you can consume from your workloads. There's a range of monitoring and observability tooling: paid options like Datadog or Grafana Cloud, and self-hosted options if you want to go the open source route, such as Prometheus, Grafana, and Loki. There's GitOps tooling to automatically keep your cluster state synced with the state of your configurations in Git. There are service meshes from a networking standpoint, and Open Policy Agent from a compliance standpoint, for specifying rules about what types of configurations can be deployed in your cluster. I mentioned Crossplane before, which is an infrastructure-as-code tool that runs on top of Kubernetes. Knative is a serverless platform that runs on top of Kubernetes, KubeVirt allows you to deploy and manage virtual machines with Kubernetes, and Velero is a backup and restore tool for your Kubernetes cluster. So I just want to
give a speed run of a bunch of valuable tools that you should look into as you build out your application platforms on top of Kubernetes. Explore the landscape, and hopefully you can find projects, companies, or tools that meet the needs of your specific use cases. At this point we have our application running in Kubernetes, and we have a number of auxiliary tools running to make our experience better. However, there are still a handful of developer experience challenges that I want to call out and address here in this module. Specifically, I want to look at, one, iteration speed: how long it takes you from making a change to your application to being able to test that change in a representative environment. And two, secrets management: every time we've shown a secret, I've had a big warning that says don't store this in version control, etc., so we're going to take a look at how you can manage secrets effectively without needing to put them at risk. In terms of the iteration loop, the speed with which you can make a change and see that change represented in your application, there are a number of different approaches that people take. Some people do their development outside of containers and just run the processes locally on their system. The big challenge with that is that your dev environment can now be significantly different from your staging or production environments, and eventually you're going to hit edge cases where the application behaves differently because of those differences. The next approach that I see a number of teams taking is using Docker Compose. Docker Compose allows you to declaratively specify a group of services, defined as containers, along with networking and configuration. It does allow you to test your applications in their containerized form; however, because you're configuring it with a different tool, the configurations, networking, and things like Ingress (anything that lives at that cluster layer) are going to be different in Docker Compose versus
Kubernetes. So, while it's more similar than testing without containers, you're still going to have some differences between your Docker Compose environment and the Kubernetes environment that you're deploying into. That being said, Docker Compose is very easy to get started with and therefore can be a good option early on. The next four tools all use Kubernetes directly and have some variation on detecting changes and automatically rebuilding and pushing images so that those changes show up in the cluster environment as quickly as possible. With some of them you work with local clusters; with others you set up a proxy to a remote cluster, and they do some technical magic to intercept network requests and route them to the right places. As you're figuring out how your team is going to work with Kubernetes and develop applications for it, I would explore these options. As I said, we're going to showcase how to use Tilt to build out an environment like this locally. Let's jump over to the code editor and do just that. Let's go ahead and navigate into our module 10 directory, into the tilt subdirectory, and take a look at our Tiltfile. This top-level Tiltfile just references four additional Tiltfiles, one for each of the services that we're developing: one for our Go API, one for Node, one for client-react, and one for our load generator. Now, you could set this up to also deploy things like the Helm chart for Postgres (Tilt does support Helm out of the box), but for this demo I just wanted to showcase a basic setup where we deploy everything into the cluster and then run Tilt to iterate on our services. Let's go take a look at what the Node API Tiltfile looks like. That lives in module 6, alongside the code; I put it alongside the code just because that makes it easier to detect changes and rebuild, and that's where the Dockerfile lives. So if we go here and open our Tiltfile, this is about as simple as it gets: we specify
how to build the image. In this case I'm giving it a container registry name, giving it a build context, and telling it to use Docker; it's going to use the Dockerfile that lives alongside this Tiltfile. I'm then telling it where the YAML for the Kubernetes resource that consumes this container image lives, so this is the module 7 api-node deployment.yaml, and then I'm telling it to port-forward port 3000 such that we can access this API. The Golang one is going to look just a little different, because we're not using docker build, we're using ko to build it. So if we open this up, there's an extension for Tilt that uses ko; it runs the ko build command, we again pass it some arguments, tell it where the YAML for the consumer of the container image lives, and specify which port we want to forward network traffic to. The Tiltfiles for the other two resources look quite similar, so we don't need to go through them. We're going to make sure we're pointing at the kind cluster, and then we're going to make sure that we've deployed our base resources: as I mentioned, currently the Tiltfile only deploys those four services, so we want to make sure we have Postgres already running, Traefik already running, etc. Let's go ahead and do that by navigating to module 7; then I can do a task apply-all, and it's going to install Postgres, install Traefik, configure all my services, etc. We can see here that all the services are in a running state except for the load generator, which is in an ImagePullBackOff state. That's because I haven't created the Docker config JSON secret needed to pull the image from the private Docker registry. I'm going to ignore that error for now, because Tilt is actually going to build a new version of the image locally and push it to my container registry, so it's not going to be a problem. In order to access this application, we can look at the services in our cluster, and remember, in order to get an external IP
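Pieced together from this description, a Tiltfile along these lines might look as follows. It's written in Starlark, Tilt's configuration language; the registry name and paths here are illustrative placeholders, not the course repo's actual values:

```python
# Tiltfile sketch for the Node API; registry and paths are illustrative
docker_build(
    'registry.example.com/demo/api-node',   # image ref Tilt will rebuild and push on changes
    '.',                                    # build context; uses the Dockerfile alongside this file
)
k8s_yaml('../module-07/api-node/deployment.yaml')  # the resource that consumes the image
k8s_resource('api-node', port_forwards=3000)        # forward port 3000 so the API is reachable
```

For the Go service, the docker_build call would be swapped for Tilt's ko extension, which builds the image with ko instead of a Dockerfile, while the k8s_yaml and port-forward pieces stay the same.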
for a LoadBalancer-type service, we had to run this cloud-provider-kind software on the host. So I'm going to run that in the background, and it will allow traffic sent to this IP address to be routed to my Ingress controller. I've added this to my /etc/hosts file, so that when I access kubernetes-course.devopsdirective.com it'll route that traffic to this local IP address. So now if I access this in the browser, I'm getting back my API responses. At this point we're ready to run our tilt up command. Just running tilt up in the directory where that top-level Tiltfile lives brings up this dialogue; in this case I'm going to hit space, it opens up a browser, and we can see the services defined within our system. If we click into the detail page, we can see logs associated with the various services, as well as any logs coming from the Tilt system about changes to those services. One thing I noticed here is that I had forgotten to change the text that's displayed in the browser tab; this still says DevOps Directive Docker Course. Let's go ahead and make that change and see how Tilt rebuilds and updates our image. So I'll go here into my editor, search for "DevOps Directive Docker Course", here it is, change it to Kubernetes, save it, and now let's go back to our Tilt window. You can see client-react detected that change and rebuilt our image in five seconds, and if we go back to this tab and click refresh, we get the updated text. Now, any change we make to any of our services, Tilt is going to detect and apply. You can see the load generator is querying our Node API, and so we're getting the logs from both of those; Tilt is collating all those logs together. We could filter down to a specific service or view all of the logs in one window; we can also filter by regular expressions, or by whether the logs are coming from the Tilt system or from applications. If we go into our Go application, let's just change the health check to,
instead of returning pong, return blah blah blah, and then we click save. You'll see immediately that Tilt detects the Golang change and rebuilds our image; you can see the logs from that build process, and with that, in 12 seconds, it rebuilt our image. Also, now that my Docker cache is hot, let's change that back and see how long it takes to build. If I go back over to Tilt, it's updating, and it looks like the time was cut roughly in half; some of these other services are also building much quicker. This is where you really want to optimize your Dockerfile so that your Docker image cache is working as efficiently as possible. Tilt also has a concept of live updates, which allows you to specify certain files within a project that will get synced directly into a container without needing to rebuild it. For an interpreted language this can make a huge difference: you may be able to skip the Docker build process entirely and sync changes directly into your running container. For example, if you were working with Python, you could specify that anything under the api directory should perform a sync, whereas any change to the dependencies via the requirements.txt file would run a full build. For a compiled language it's a little less intuitive how this works, but as you can see in this Java example, it's syncing in some of the compiled class files from the built jar directory and then restarting the process. Following these steps, they were able to optimize the build time for this Java application from their initial naive approach of nearly 90 seconds, which is clearly unacceptable for a local development loop, all the way down to under five seconds. There are docs for using and optimizing Tilt across a number of different languages such as Go, Python, Node, etc.; as you're setting up Tilt, make sure to go there and follow their best practices to ensure that your iteration speed is as quick as possible. We can also take a look at our deployments to see how Tilt is working behind the scenes, so if we scroll up
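The Python live-update setup described here might be sketched in a Tiltfile like this (Starlark; paths are illustrative, and per my understanding of Tilt's API, fall_back_on must come before the sync steps):

```python
# Tiltfile live_update sketch for an interpreted language; paths are illustrative
docker_build(
    'registry.example.com/demo/api-python',
    '.',
    live_update=[
        fall_back_on(['requirements.txt']),  # dependency changes force a full image rebuild
        sync('./api', '/app/api'),           # source changes sync straight into the running container
    ],
)
```

Because Python is interpreted, synced files take effect without a rebuild; for a compiled language you would additionally sync build artifacts and restart the process, as in the Java example mentioned above.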
here to the image: now, instead of using the tag that I had specified in my YAML file, Tilt has built this specific image and updated my deployment so that it uses the new image rather than the one I had specified. When we are done with our development, we can just kill this process. So, by using something like Tilt, you can develop directly against a Kubernetes configuration that matches as closely as possible what you're going to run in production. This will allow you to identify and debug issues that live not only at your application layer but also at that Kubernetes layer, or maybe in your Ingress controller. So I would definitely take a look at Tilt, Skaffold, mirrord, or maybe Telepresence and see if one of those meets the needs of your development teams. The other major developer experience issue that I've hinted at throughout the course is the management of secrets. For all of our Kubernetes configurations, we are able to declare them within our Git repository, which serves as a source of truth that we can then use to deploy from and interact with; however, for sensitive information, we don't want to store it in plain text in a Git repo, and this leads to some friction, or some challenge, in managing those secrets. There are a number of approaches that I have seen used, and some of them work better than others. The first one, which many teams get started with, is to manually create and update secrets separately from whatever process they use to update and manage their other configurations. Maybe they store the secret.
yaml in their password manager, and then whenever they need to make an update, someone has to log into that password manager, pull down that YAML, make some updates, and apply it manually to the cluster. This is how many teams get started, because it requires no additional effort upfront; however, it quickly becomes a nightmare to manage, and it's very easy for a human to make a mistake in that process: forget to update the source of truth, or forget to apply it to the cluster at the right time. I suggest that teams look at some of these other options rather than stick with that initial approach. The second option, which we're going to leverage and demo here, is the External Secrets Operator. This allows you to store the source of truth for a secret in a system outside of the cluster, such as a secret manager from a cloud provider or HashiCorp Vault; there are a number of sources that you can use. This lets you keep a non-sensitive configuration in your repo that points to an external configuration holding the sensitive data, and the operator will then mirror the value from that external secret into a Kubernetes secret for you to consume within your cluster. Another couple of options are Sealed Secrets and SOPS. These are both mechanisms to encrypt your secrets: if you encrypt your secrets with a secure key, the encrypted versions become less sensitive and you're able to store them inside of Git along with your other configurations, while the key to unseal or decrypt those secrets lives only in the cluster. This can provide a mechanism to keep your secrets in your Git repo and manage them just like all your other configurations. The challenge with these workflows is that now, in order to create a secret, you need access to that encryption key, and there are also some challenges with rotating those encryption keys if you need to down the road. You also could skip using Kubernetes Secrets entirely and use something like Hashi
Corp Vault, AWS Secrets Manager, or Doppler, and have your application query those systems directly and pull in the credential values at runtime, avoiding the need to deploy Kubernetes Secrets at all. There is sort of a bootstrapping process, right, where you need some credential to get access to the system, at which point you can pull in the credentials you need. Depending on where your secret store lives, that bootstrap credential could be a Kubernetes secret, or it could use workload identity like we did for the CloudNativePG backups, where we were able to take a Kubernetes service account and tie it to a Google Cloud service account to leverage those cloud resources. One good option that avoids the need for any static credentials is to use your cloud's secret manager and then use workload identity to access those secrets, either with a tool like External Secrets Operator or by pulling them in at runtime from your application directly. The version of this that we're going to implement for the course uses External Secrets Operator with Google Cloud Platform Secret Manager. On the left-hand side, we're defining a secret in GCP Secret Manager: it's called external-secrets-example, and the value is "the secret is stored in Google Secrets Manager". We can then define a manifest within our Git repository for this ExternalSecret, and it's going to reference the Google Cloud secret and pull those values into a Kubernetes secret. We deploy this ExternalSecret resource to our cluster; the External Secrets Operator running in that cluster will see our new custom resource, go read the value, and populate a Kubernetes secret accordingly. Let's go ahead and set this up. Navigating into the external-secrets subdirectory, the first thing that we're going to do is switch over to our Google Cloud cluster. The reason we want to use our Google Cloud cluster is that we can leverage that workload identity feature to access our secret manager with no static credentials. Now we can install
external-secrets, which we're going to do via Helm: we're adding the external-secrets repo and then installing it with the helm install command. With our External Secrets Operator installed, we're now going to follow the same set of steps we did for the Google Cloud Storage bucket permissions: create an IAM service account, tie that to a Kubernetes service account, and then use it to access Google Cloud Platform Secret Manager. So we're creating a service account named external-secrets with the gcloud command line. We then need to grant access for that IAM service account to access secrets within Google Cloud, so we tie the service account that we just created to that role. We then need to attach the Workload Identity User role to that service account, which will allow us to tie together our Kubernetes service account with the IAM service account; specifically, we give it the context of the Kubernetes service account that's going to be the consumer, in this case in the external-secrets namespace and named external-secrets. We then need to annotate our Kubernetes service account with the appropriate annotation to enable that connection to take place. If we look here at our service account, the annotation we provided is for the GKE IAM service; specifically, we want to be able to use the IAM service account named external-secrets in this GCP project. Now, instead of defining this service account YAML in our repo and applying it after the fact, we could have handled this within our Helm installation directly, and to do that we would need a values.
yo file and specifically we would set the looking at the helm chart here we can see that there's a service account uh field and within that there's an annotations object and so we can set service account annotations and then here is our annotation and now within our installation command we're going to make that an upgrade install command and let's pass it our values file and now if we reinstall now that annotation would automatically be added to our service account without needing that extra step the next piece of the puzzle is to create uh what is known as a Secret store and so this is the custom resource within external Secrets operator that tells the operator where to find the secrets and so for this use the cluster Secret store kind you could also use a Secret store which is namespace scoped I'm calling this the gcp key store uh and specifically I'm using the Google Cloud platform Secrets manager within my particular project so if I apply that it's come up it's come up it's ready and it has readwrite capabilities great now we can apply an external secret configuration and so this is the same configuration I showed here on the slide and let's take a look at what's in here I'm naming it example it's going into the external Secrets name space every 1 hour I want the operator to check for new values and automatically pull those in there is a way to add an annotation to our external secret to force it to sync sooner but 1 hour is fine as a default I then reference that cluster Secret store that I just created and here I'm specifying which kubernetes secret I want external Secrets operator to populate using these data and then finally I specify where external Secrets operator should find the values for this particular secret so I'm saying look for the secret in Google Cloud Secrets manager name external Secrets example and then where within the kubernetes secret I should store those values so if you recall if we go back to the secrets manager I've prepopulated this 
secret, it's called external-secrets-example, and the value is this. If we look at the ExternalSecret I just created, we can see that it has successfully synced and is ready. What that means is that if we now look at the Secrets in this namespace, here is the Secret that it created based on the target name. If we look at the values, specifically under the data key and under the key I specified it should store the value at, and base64-decode it, we get the value that was stored on the Secret Manager side, now within our Kubernetes Secret. So now we can store this ExternalSecret resource, which has no sensitive data, in our Git repo alongside the rest of our configurations, but keep the source of truth within Google Cloud Secret Manager, which is purpose-built for controlling and managing sensitive information. The team can go into Google Cloud Secret Manager to update and manage secrets accordingly; you can control access to those secrets directly via Google Cloud IAM, but you also get the benefit of automatically syncing them into Kubernetes Secrets that you can consume from applications. This is a pattern that I use quite frequently for managing secrets. As long as you're on a cloud provider that has a secret manager like this, and AWS, Google, and Azure all do, this can be a great option to streamline secrets management and avoid the manual, out-of-band process you may have started with. So hopefully that gives you an idea of how you can solve some of the developer experience challenges associated with Kubernetes: both iteration speed, that process of building, pushing, and deploying new images into your cluster as you iterate, as well as the secrets management problem of how to manage secrets efficiently while also keeping them secure. In the next module of the course I want to take a look at how to debug applications running on Kubernetes when something is going wrong. Specifically, I want
to focus on issues at the Kubernetes layer that would be preventing your applications from running or causing them to crash, rather than at the application layer. There is an awesome visual flowchart guide published by learnk8s.io; if you go to this URL you can find it, along with an accompanying blog post, and here's the full-resolution version. You're almost always going to start with a get pods command. This gives you a high-level overview of what pods exist in the cluster and what state they are in. Then, based on their state, you follow the path and try to figure out what exactly is going on: by looking at the logs you might see that there's an application issue; by looking at the description you can see all the different events the cluster publishes about the pods; and if your pod appears to be ready but is not working, you could port-forward to that pod and access it directly from your system. While I don't use this chart directly anymore and generally just know these commands, it's going to be a great starting point as you're debugging an application for the first time and you're not sure what to do next. As an example for trying this out, I'm going to deploy a sample application from Google called the microservices demo. It is a pretend online shop made up of a number of different microservices; as you can see here, there's one for checkout, one for currency, one for sending emails, one for generating load, etc. I have purposefully taken their configurations and made a handful of changes such that when you try to deploy it, it will break. Let's go ahead and deploy that broken version of the microservices demo into our cluster and then try to walk through and figure out what specifically is going on and how to fix it. Let's just use the Civo cluster for this one; any of the clusters would be fine. Then we'll navigate to our debugging module. We'll start by creating a namespace, and then we're going to install our
microservices demo. To do that, I'm just going to run a kubectl apply -f on the microservices demo YAML file in the subdirectory. There's a note at the top which references both the original project from Google Cloud Platform and the fork I made where I introduced these breaks. If we look at all the services in our cluster, we can see that the microservices demo created a number of Services, including this load balancer Service, so let me access that from the browser and see what's going on. Okay, we're able to use the load balancer to access something, but it's surfacing this 500 internal server error. It looks like it's trying to connect to one of the services, in this case 'could not retrieve currency', so maybe it's the currency service that failed, and this frontend is bubbling that application error back up to us. Let's check the status of the pods. This is also a good opportunity to showcase a tool called K9s, which is a text-based user interface for Kubernetes. If I just run k9s, it starts a TUI and I can navigate around via my keyboard, so rather than having to issue a whole bunch of kubectl get pods and kubectl describe pod commands, I can just run K9s and navigate around accordingly. You can see the current kubectl context at the top, and as you can see, four of my services appear to be broken. The one we saw on the frontend was referencing currency, so it's probably this currency service. I can hit the d key to describe it, and now we're dropped into a description of that pod. If we scroll here to the bottom, we can see all the events that Kubernetes is publishing about this pod; by default these persist for one hour, so after an hour they would get wiped out. We can see that two minutes ago it was assigned to a specific node, the container image was pulled, it started, but then the readiness probe failed. So let's take a look at what's going on here.
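For orientation, probes like the one that's failing are declared on the container spec in the pod template. Here's a minimal, hedged sketch; the service name, image, and port are placeholders, not the demo's actual manifest:

```yaml
# Illustrative only -- not the microservices demo's real configuration.
containers:
  - name: currency-service        # placeholder name
    image: currencyservice:tag    # placeholder image
    ports:
      - containerPort: 7000
    readinessProbe:               # gates traffic: the pod isn't "Ready" until this passes
      tcpSocket:
        port: 7000                # must match the port the app actually listens on
    livenessProbe:                # repeated failures cause the kubelet to restart the container
      tcpSocket:
        port: 7000
```

Keeping the probe ports aligned with the containerPort is exactly the kind of detail we'll be checking as we describe these failing pods.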
If we scroll through the definition for this deployment, we can see the reason here: the container was OOMKilled, which means out-of-memory killed. It tried to use more memory than the limit that was specified, and if we look down in the resources section, it's only requesting 5 mebibytes of memory and has a limit of 10. So why don't we bump those up, give it additional memory, and see if that solves the issue. I'll open up my microservices demo YAML, search for the currency service, and find that deployment; as you can see, I've commented out the original values and replaced them with these insufficient ones. So if I restore the original values and then do an apply again, and go back into K9s, you can see the original currency service pod was crashing, we have a new replica coming up from the modified deployment, the new one appears to be healthy, and the old one was taken down. Okay, that's one of our four issues solved. Let's load the application on the frontend again and see if we get a different error. Okay, we're still getting a 500 error, but this time it's talking about the cart service. Let's take a look at the cart service: it is indeed in a non-ready state, and we can see that it has been restarting over and over. Let's go ahead and describe it and look at the events: it was assigned, the image was pulled, it started, and then the liveness probe failed. If the liveness probes that we specify are not passing, Kubernetes will assume the application is unhealthy and will kill the pod in order to try again. If we scroll up to look at our readiness and liveness probe definitions, we see that the liveness probe is checking our pod on port 8080, and the readiness probe is the same. So why might that not be working? If we scroll up to the container definition, we can see the actual port that's specified is port 7070: the kubelet is trying to check for liveness on port 8080, but the
application is actually listening on port 7070. Let's go ahead and fix that in our definition: we go to the cart service, scroll down, and as you can see, I had commented out the original version and replaced it with this bogus port. Now we'll reapply and go back into K9s to see if that fixes things. We've got our new pod coming up, associated with the modified deployment; it appears to be running but is not in a ready state yet, and now our cart service appears to be healthy. We've got two additional pods that seem like they're having issues. Let's take a look at the redis-cart: we see that it is in an ImagePullBackOff state, which means Kubernetes is unable to pull the image associated with the container. If we look at the events, we can see more details: it failed to pull the redis image with the debian tag. So let's go to the Docker Hub page for the Redis container image and look under tags: for the tag we're trying to pull, no tags are found, so perhaps this tag is simply invalid. Instead, let's take a look at one of these other versions, say version 7.2.5. We'll go into the YAML file, search for redis-cart, and find the container image; yes, here's the bogus image I had added to make it fail. It looks like the original definition uses this redis:alpine image, so let's go ahead and use that. We'll apply, go into K9s, and now our redis-cart service has come up healthy. Finally, we have our load generator service... oh, interestingly, the cart service is still failing. Let's take a look at that. It looks like I maybe modified only the readiness probe and not the liveness probe, so let's fix that: yes, I had modified one of these back to the original value but not the other, so hopefully that fixes our cart service. Then the load generator is the final one not working. Let's go ahead and describe it: if we scroll down and look at the events, two nodes have insufficient CPU, so zero of two nodes are available for scheduling
this pod. Let's go ahead and look at the CPU required: we can see that the limit is specified as up to five virtual CPUs and the request as three CPUs, and I believe the nodes I provisioned in Civo have only two CPUs each. So that makes sense: the deployment is requesting at least three CPUs just for this application, but none of the nodes have the capacity to satisfy that, so it gets stuck in this pending state. Let's look at the load generator service. Here you can see the original CPU request was 300 millicores and I had upped that to 3000, and the original CPU limit was 500 millicores and I'd upped that to 5000; those values were just too high, and our cluster could not meet the demand. We can now apply one more time and go into K9s: it looks like our load generator is coming up, now it's healthy, and now all of our services for this application are healthy. Let's go ahead and access it from the browser and refresh the page: this is what the demo application is supposed to look like. We've got our products with different pricing, we can choose a quantity, we can add items to the cart and those get saved, and we can switch to a different currency. So now all of the microservices appear to be functioning correctly, represented both by the fact that our pods are all running and healthy and by the fact that we can access our application successfully from the browser. Now, all of these breaks were within the Deployment definitions; there could also be issues, say, between the Services and the Deployments if you specified the wrong set of selectors or the wrong ports. My goal here was to give you an idea of how to look at the information the Kubernetes cluster provides, either via the kubectl command line or via a tool like K9s if you prefer, and walk down all the potential issues, identify them, and fix them. As you work with Kubernetes, inevitably you're going to run into some issues;
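Two of the four breaks above (the OOMKilled currency service and the unschedulable load generator) trace back to the same resources block on the container spec. Here's a hedged sketch with purely illustrative numbers; size these to your own workload and node capacity rather than copying them:

```yaml
# Illustrative values only -- not the demo's actual configuration.
resources:
  requests:            # the scheduler must find a node with this much free capacity;
    cpu: 300m          # a 3000m request could never fit on the 2-CPU nodes
    memory: 64Mi
  limits:              # exceeding the memory limit gets the container OOMKilled
    cpu: 500m
    memory: 128Mi
```

Requests drive scheduling decisions while limits drive runtime enforcement, which is why one bad value showed up as a pending pod and the other as a crash loop.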
make sure to take a step back, think about what the root cause could be, and systematically work through the different avenues for gathering information and identifying and fixing those bugs. Up until this point we've had a single configuration that we've been deploying into a single Kubernetes cluster. This is valuable, but in any real-world scenario you're going to have multiple environments that you want to deploy your applications into. These can be long-lived environments like production or staging, and there can also be short-lived, ephemeral environments that you want to spin up and tear down quickly. In order to do this, we need some approach that allows us to take a common configuration and reuse it across multiple places. There are a few popular tools that I'm going to cover in this section, describing the trade-offs between them, and eventually we'll build out a fully working example of our demo app that we'll deploy to multiple clusters. The first tool I'll cover is Kustomize. It is built into kubectl, and it can also be installed as a standalone binary. It uses what is called a base and overlay model: you start out by defining some base configuration, which has all of the common fields that all of your environments use, and maybe some default values as well. Then, for each environment you want to deploy into, you specify an overlay; only the fields that differ for that particular environment need to be specified in the overlay, so it can ideally be a much smaller configuration. Kustomize then takes those two and merges them together to get your end resulting configuration. It's very easy to get started with conceptually, plus you already have the tooling built into kubectl. However, there are some major limitations that you'll run into eventually, related to how it handles arrays within YAML, and if you need to replace a specific value within a
YAML array. And if you need to patch multi-line strings within YAML, there's no support for that, so you'll have to replace the entire multi-line string block; that is particularly relevant if you use file-like ConfigMaps, where you're defining the contents of a particular file as a multi-line string within your ConfigMap YAML. Another very popular tool used to handle deploying into multiple environments is Helm. We looked at Helm earlier in this course, so this is just taking that tool and applying it to your first-party applications. Helm uses a templating model: you'll have your chart, which includes a number of templates for all the resources you plan to deploy; within those template files you can specify variables that you want to substitute values into; and then for each environment you'll have a custom values file containing the specific values for that environment. When you do a helm install or helm upgrade, it takes those values, substitutes them into the template files, and passes the resulting hydrated templates to the cluster. One nice thing about Helm is its concept of hooks. Hooks allow you to specify the ordering of resource deployment, so if there are dependencies between certain things, say you need to run a migration job before you upgrade your application, you can use hooks to encode that logic. Helm charts do represent another deployment artifact that you're going to need to build, publish, and manage, so it's another process you have to deal with and potentially automate. I think some challenges with Helm are that the templating style, those Go templates, quickly becomes hard to read as you add more and more templating to your chart. Helm also adds a significant amount of boilerplate even for the simplest use cases, so if you only need to substitute in an image tag and maybe a few other values like resource requests and limits, Helm can feel quite heavy. And then another
challenge can be CRD management. We talked about custom resources earlier in the course and how tools use them; while you can install CRDs via Helm, there's not a lot of control over how you would upgrade those CRDs with Helm, and if you had conflicting Helm charts trying to install different versions of a CRD, that could become an issue as well. Now, a third tool I want to highlight, and one I'm particularly excited about, so much so that I sponsor the creator of this tool on GitHub, is kluctl (pronounced "klu-cuddle"). It also uses a templating model, similar to Helm, and also has hooks for configuring the order of deployments, but it requires much less boilerplate than Helm when implementing it for a project. You'll have a number of template files, kind of like Helm; you'll have a kluctl YAML file where you define the different environments you want to deploy to; and you can have variables at that top level that get templated in, or you can use those top-level environments as entry points and store the configuration for each environment within the templates themselves. It integrates nicely with Helm and Kustomize, so if you're already using those tools it's not too difficult to migrate over piece by piece as you see fit. There's also a built-in GitOps engine, which you can use as an alternative to something like Flux CD or Argo CD, that allows you to leverage all of the features kluctl has. And there are just a ton of nice developer experience features within this tool that I think will become apparent as we start to interact with it. I do want to call out a few other options I've seen in the wild. Some people implement their own templating solution with bash and envsubst, or use yq to substitute values within the YAML. This generally works and is fairly simple to get started with, but over time you can end up with a very complex, convoluted setup
that's hard to maintain. There's a tool called Timoni, which is a quite powerful application packaging system that allows you to package up both your application container images and all the configuration for the different environments as OCI artifacts. It uses a language called CUE, which can have a bit of a steep learning curve, but it's worth looking into if you have nuanced application packaging needs. There are also providers for infrastructure-as-code tools like Terraform and Pulumi that allow you to provision Kubernetes resources via those tools. And cdk8s is a tool that enables you to write and deploy Kubernetes configuration using general-purpose programming languages like TypeScript, Python, or Go, which allows you to use the control flow within those languages to produce configurations unique to any particular environment. Let's go ahead and jump over to the code editor and take a look at what some configurations with Kustomize, Helm, and kluctl look like for our demo application. Okay, let's navigate into the module 12 subdirectory. Within this directory we've got a Helm directory, a kluctl directory, and a Kustomize directory. We're going to start with Kustomize. Looking at the structure here, you can see I've built out a configuration for all of my services; I haven't included, for example, the Helm installation for Postgres or the Helm installation for Traefik, but this should give you an idea of what a Kustomize overlay model looks like for your first-party services. As you can see, within the base directory I have one subdirectory for each of my services, and within each of those I have the resource YAML files; these are going to be very similar to what we saw in module 7, where we deployed onto Kubernetes in a single environment. Then, in addition to my base, I have a production subdirectory and a staging subdirectory; each of these contains only the subset of the resource definitions that need to be
modified from that base. So let's take a look at our Go-based API. If we go into the base for our Golang application, there's a deployment YAML file that is actually identical to what we saw in module 7: we're defining a set of containers, including the one that we built and pushed, we're specifying the port, we're giving it some resource limits, and we're specifying the security contexts. This is all stuff that we've seen before; however, this image tag is something that's going to differ between environments, right? So how can we write an overlay that allows us to modify just this subset of the YAML file? If we go into our production overlay for the Golang API, we have a few files. We start out with this kustomization file; this is how you tell Kustomize which files it should reference. In my base, I say each of these files represents a resource, and within that section go the base definitions of your Kubernetes resources. Within our overlay, however, under the resources section we pass a path to the base folder, and that pulls in all of the resources from that base folder. Then I have a number of patches here. The first patch is my deployment.yaml; let's take a look at what that looks like. In my patches I'm changing two fields from my deployment.
yaml: instead of one replica, I want two replicas for production, and instead of version foobar-baz, I want the production version. When we go to apply this to the cluster, Kustomize takes that base and replaces those two particular fields within the YAML, such that the resulting deployed specification has these values. If we look at our staging overlay, in staging we're going to run a single replica and the staging version. This shows that we can share most of the configuration in that common base and then only modify the fields we care about via these overlays. The other patch I'm making replaces one of the routes in my ingress rule: for production the host is going to be kubernetes-course.devopsdirective.com, and for staging I'm going to postfix -staging to the end of it. Now, you'll notice that this patch looks a little different, and this brings up the limitation I mentioned about Kustomize not having great capabilities for merging together configurations that use YAML arrays. Within the ingress route, the routes are a YAML array, so in order to target the rule we want to modify, we have to use this style of patch, where we specify an index within that array; Kustomize is not able to automatically align our patch to a particular entry within the array, which is why we had to use this style of patch for this particular use case. Because we didn't specify patches for any of the other resources, such as the Secret or the Service, those values get deployed as-is from the base into all of our environments. The other services look quite similar: we're patching in a new image tag and maybe modifying the resource requests or the replica counts; I'll let you review those on your own time. Let's look at how we would interact with these setups. If we want to render out the values but not apply them to the cluster, we can do a few things. Let's render out
the production values. To do that, we can call kubectl kustomize and pass it the path to the overlay environment we want to render, and then I'm just piping that to yq to get nice syntax highlighting. As you can see, if we scroll down to the Golang API, we've got the host that got patched in, and if we find the Deployment, we can see we've got two replicas and the production version is the one being used. Let's instead render out staging and see the difference: for staging, our Deployment has a single replica and uses the staging version, and if we look at the ingress route... oh, it looks like I have a bug where I didn't patch that properly in one of my overlays. For Node I did, for Golang I did, and for the React/Nginx client I did not, so let's fix that. If we go into our staging overlay for the client-react ingress route, this was incorrect and should be staging. Let's render it again, and now our ingress route for the staging configuration is correct. If we wanted to apply these to the cluster, we could navigate to the corresponding directory, run kubectl apply -k, and pass it the path to that directory. Because I don't have real image version tags (I had those production-version and staging-version placeholders), I'm not going to do that right now, but that is how you would. So, as you can see, getting started with Kustomize is quite easy, and I think it's a great place for people using Kubernetes to get started with deploying into multiple environments. The next tool I mentioned is Helm, which I've already covered quite a bit. We'll take a very quick look at what a Helm chart for one of our services might look like and the types of values we might expose as the interface. Under the helm subdirectory, you can see we have a chart directory with our Chart.
yaml. Within the templates directory is where you define the resources that will be deployed, and here I just have a single example, as if we were defining a Helm chart for our Golang API. Under here you can see this deployment template again looks very similar to what we saw before, the main difference being that now we've got this Go templating that allows us to specify the number of replicas via the replicas value and the tag we want to use for our container image via the version value. In our ingress route we're able to substitute in an environment postfix, and then our Secret and Service are static YAML with no modification between environments. If we now look at the values.yaml, I've got three values specified: an environment postfix, a replica count, and a version. The values.yaml within your chart provides the default values for everything, so I've specified a default version, one replica, and an empty string for the postfix, such that the ingress route template collapses down and we get kubernetes-course.devopsdirective.com. Then I've got these example values files that show how you can pass values in when you're doing a Helm install, such that they get used as those templates render. Let's start by rendering out the different values files and seeing how that impacts our manifests. To do that we can use the helm template command: here I'm calling helm template, passing it the path to the Helm chart, and piping that to yq just so we get nice syntax highlighting. This renders the manifests based on the defaults in our values.yaml, so we would expect the image tag to be the default version, a single replica, and an empty string used in that ingress route. If we go to our Deployment, you can see the default version was indeed used, we have a single replica, and our ingress route did not have any postfix on our subdomain. Great. If we then want to render out production, for example, we pass the helm template command the --values option with the values file we want to use, in this case the production values file, so we expect two replicas and the production version to have been used. Scrolling down, we see that's exactly what happened: two replicas, and the production version was used. Now, if instead of helm template we used helm upgrade --install, like we've been doing for a lot of our third-party Helm charts, we would pass it that same --values option with our values file, and these would get applied to the cluster as a Helm release. The other thing I mentioned about Helm is that this is an additional artifact that you need to build, publish, version, etc. So we have a package command: if I run task package, it runs helm package and passes it the path to our chart, using the version that's in our Chart.
yaml, and it creates a tarball containing our Helm chart. So here we see the api-golang Helm chart, version 0.1.0; it bundled up our Helm chart into this package. We can then take that package and push it to a registry of our choosing: if I run task push, it does a helm push, passing it the tarball I just generated and an OCI registry where I have created a repository for this particular chart. I had created this repository on Docker Hub, and as you can see, I just pushed this version a few seconds ago; now others could connect to this repo and pull down that version to consume within their clusters. Hopefully that gives you an idea of how you could write Helm charts for your own services and choose the specific configurations you want to expose. Now let's use kluctl to do the same thing. With kluctl we're actually going to build out a full version of our application, including the third-party dependencies; we'll deploy a staging environment to our Civo cluster and a production environment to our GKE cluster. Now, before we build out the entire project, let's start with a single service and see how we would use kluctl to define the templates and environments that we're going to deploy our Golang application into. From there we'll extend it and add all the other services, including our third-party dependencies. I'll navigate into the kluctl single-service directory and open up this kluctl.
yaml file, which is where you define all the different environments. As you can see, I've defined two targets; targets are the language used by kluctl to define an environment you're going to deploy into. Oftentimes this corresponds one-to-one with a Kubernetes cluster, so a single target may represent a single cluster; however, that's not required. You can have multiple targets that go into the same cluster, into different namespaces, for example. In all the examples here I'm going to have that one-to-one mapping from target to cluster. There's a really nice feature where within a target you can specify a context; this is the context within your kubeconfig file. By doing so, you avoid the potential for being authenticated against the wrong cluster and applying an incorrect configuration, which could be catastrophic if you did so against the production environment. By adding a context here, you're able to eliminate that whole class of bugs entirely. In addition to targets, you can define args; these are top-level global variables that can be used throughout the entire kluctl project. In this case I'm passing a single argument, the name of the environment, so I'm going to pass it either production or staging; using that argument, I'm able to load additional configuration that gets injected into my templates. The discriminator field is how kluctl keeps track of which resources it is managing, so you need some sort of discriminator that allows the tool to uniquely distinguish between different targets. If you're deploying into separate clusters this is less critical, but it's still best practice; if you're deploying into the same cluster, this is the only way kluctl can know which resources it is managing for a particular configuration, and that's what allows it to do things like prune and delete resources safely. Here I'm templating the name of the target itself into the discriminator, so this will be kluctl-
So the discriminator will be kluctl-staging or kluctl-production. The next important concept and file to look at is the deployment.yaml file. This is separate from the concept of a Kubernetes Deployment, and you'll see these deployment.yaml files throughout the Kluctl project within my repo. I'm using lowercase deployment to mean a Kluctl deployment, and as I have throughout the course, I use the uppercase-D Deployment to refer to a Kubernetes Deployment. If we look in this file, it has a few things. First, it's loading in additional variables. Like I said, there's that single argument at the top level; you could put more arguments there, but it can quickly get quite noisy if you put all of your templating configurations within your kluctl.yaml file. So instead I've opted to have just the environment name there, and then I'm able to use that environment name to load in an additional configuration file. Within this I'm loading a file at config/staging or config/production, so if I go into my config subdirectory, you can see I have a production file and a staging file. These configurations at the top level are variables that are shared across different services. For example, I have my host name, which is shared between my two backend services as well as my react client, and by putting it at this location I'm then able to reference that shared host name across all those different manifests. Depending on which target I choose, I'll load either the production or the staging host name.
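The variable-loading step described above could be sketched like this; the file layout and the host name value are assumptions for illustration:

```yaml
# deployment.yaml (sketch): load the shared variables file for the chosen environment
vars:
  - file: config/{{ args.environment }}.yaml

# --- config/staging.yaml (sketch of a shared variables file) ---
# host_name: staging.example.com   # shared by both backend services and the react client
```

Because the file path itself is templated on the environment arg, the same deployment.yaml serves both targets with no duplication.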
The production value is kubernetes-course.devopsdirective.com, and for staging I postfix that with -staging, just like we saw in some of the other tools. Also within the deployment file I have further deployments listed, so you're able to reference other deployments to recursively go through your directories and define how you want all the resources to be applied. In this case I have one path called namespaces, that's this subdirectory, and I have another path called services. In between I have a barrier, and what a barrier does is ensure that items that come prior to it in the list are executed first, such that my namespaces will be created before my services try to spin up. And then finally you can place a common label; these are Kubernetes labels that get applied to all the resources provisioned by this deployment. We can then take a look at these sub-deployments. If we look in namespaces, it's just a single Namespace object, because when a namespace sits as a manifest at the top level, Kluctl will automatically pick it up. In the services subdirectory I'm going to have multiple services. I have my api-golang folder, the deployment references that subdirectory, and within there I have one more deployment. Here I'm passing it a path to the manifests directory, which is where all of my Kubernetes manifests live, and then I have another config subdirectory; these are configurations that are specific to this application. If I load that one up, I'm loading in a version and a number of replicas that get used in my deployment.yaml. If I go here in my deployment.yaml, I can see I'm referencing that api_golang.replicas value that gets loaded in from this config; it's going to be two for production or one for staging. The version is also going to come from this api-golang-specific configuration. But then if I look at the IngressRoute here, I'm using that shared host name variable.
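The service-level wiring just described could be sketched like this; the variable names and image reference are illustrative stand-ins for the repo's actual values:

```yaml
# services/api-golang/deployment.yaml (sketch)
vars:
  - file: config/{{ args.environment }}.yaml   # service-specific config (replicas, version)
deployments:
  - path: manifests                            # directory of templated Kubernetes manifests

# --- config/production.yaml (sketch of the service-specific config) ---
# api_golang:
#   replicas: 2
#   version: "0.5.0"

# In manifests/, the templated values are then referenced, e.g.:
#   replicas: {{ api_golang.replicas }}
#   image: registry.example.com/api-golang:{{ api_golang.version }}
#   host: {{ host_name }}    # the shared variable from the top-level config
```

Note the split: service-specific values live beside the service, while cross-cutting values like the host name come from the top-level config files.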
That host name value is coming from my top-level configuration files. So while this is a relatively minimal example, it shows a lot of the different ways that you can handle templating and managing configurations for different environments, including sharing some of those configurations across services while having others be service-specific. Similar to helm template or kubectl kustomize, Kluctl has a render option. We're going to pass it the -t flag to specify the target; most Kluctl commands require this target flag, because Kluctl needs to know the context of the target you're trying to execute the command against. So here we're going to render out the production configs, print them all to the console, and then pipe that to yq to get syntax highlighting. As we can see, we get all of the resources associated with our Go-based API, and we also get that Namespace at the top of the list, because that's what's going to be deployed first. It's also adding some convenient labels; you can see the common label that we specified within our top-level deployment.yaml.
There we said we want this kubernetes-course label applied to all of our resources, and as you can see, here it is. You can see our number of replicas was injected, as well as the version that we specified. Let's now render out the staging values. You can see our host name indeed has the staging postfix like we would want, and our Deployment has a single replica and is using a different version that it's picking up from that staging-specific config. So that's how we can configure a single service. Let's take a look at what the Kluctl configuration looks like when we add in the remainder of the services and start to add third-party Helm charts and configure those as well. I'll navigate up one directory to the kluctl directory, and let me show you what the directory hierarchy looks like. As you can see, I've added a whole bunch more things: the api-node, client-react, load-generator-python, etc. Those are going to look quite similar to what we just saw with api-golang, where each of them has a config subdirectory with a production and a staging YAML file, one for each environment, and a manifests subdirectory containing all of the Kubernetes resource manifests with the template placeholders that get overridden at deploy time. What's new here is that I have this third-party directory where I'm now installing CloudNativePG as well as Traefik from Helm charts. Let me look at my top-level deployment and compare it to our deployment from before. Previously I just had namespaces, then a barrier, then services. Now I have namespaces, a barrier, my third-party applications, and then I need another barrier, because the Postgres cluster that I'm deploying will need to use the custom resources that CloudNativePG is installing. And then finally my first-party services run last, once all of that base layer of infrastructure is deployed. Now let's go into our third-party subdirectory.
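The ordering just described could be expressed roughly like this in the top-level deployment.yaml; the barrier entries force each group to finish applying before the next begins (the label key and value are illustrative):

```yaml
# top-level deployment.yaml (sketch)
deployments:
  - path: namespaces
  - barrier: true        # namespaces must exist before anything else is applied
  - path: third-party
  - barrier: true        # CRDs and operators must be ready before first-party services
  - path: services

commonLabels:
  app.kubernetes.io/part-of: kubernetes-course   # applied to every resource in this deployment
```

Without the second barrier, the Postgres Cluster custom resource could be applied before CloudNativePG has registered its CRDs, and the apply would fail.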
We've got two things: our Traefik installation and our CloudNativePG installation, and we're specifying that we want to wait for CloudNativePG to be ready before this deployment is declared ready. Clicking in here, you can see we have a few things. One, the namespace that it will be deployed into. Two, a Helm values file; if we wanted to use any non-default values, we could specify them here. And three, we specify the chart: we've got our Helm repo name, the version we want to deploy, what we want the Helm release to be named, the namespace we want to deploy into, and then this output file. That's an optional field, but it is where Kluctl will render out the contents of the Helm chart. The way Kluctl interacts with Helm is actually to render out the contents and then apply them via its own mechanism, rather than calling out to Helm directly, and so this is the file name where the rendered-out contents will temporarily live. We then have a kustomization.yaml, which tells Kluctl how to apply the final resources; in this case we're going to deploy the namespace and then deploy that output file I just mentioned, which will have the rendered contents of our Helm chart. This does mean that the behavior of Helm hooks may be slightly different when using a Helm chart with Kluctl. They do their best to maintain behaviors across the two, and there are similar hooks within Kluctl that it will use, but it may not be exactly one to one in terms of hook behavior, so that's just something to call out here. The CloudNativePG subdirectory looks pretty much identical: we have the namespace it's going into, we have the Helm chart itself, including version, repo name, etc., and then we have any values that we would want to apply; in this case we're installing with all the default values.
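A third-party Helm install in Kluctl is typically described with a helm-chart.yaml plus a kustomization.yaml in the same directory; a sketch for the Traefik case might look like this (the chart version and repo URL are placeholders, not the course's exact values):

```yaml
# helm-chart.yaml (sketch)
helmChart:
  repo: https://traefik.github.io/charts
  chartName: traefik
  chartVersion: "28.0.0"         # placeholder version
  releaseName: traefik
  namespace: traefik
  output: deploy.yaml            # where Kluctl renders the chart before applying it itself

# --- kustomization.yaml in the same directory ---
# resources:
#   - namespace.yaml             # the namespace manifest
#   - deploy.yaml                # the rendered chart output referenced above
```

Listing the rendered output file in kustomization.yaml is what lets Kluctl apply the chart's resources through its own diff/prune machinery instead of shelling out to Helm.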
Now, the one services subdirectory that's going to look a little different from the others is the postgres directory. This is because here we are deploying a CloudNativePG cluster using the custom resource definitions that the CloudNativePG operator installs. So again we have a specific configuration just for this subdirectory. Here we're saying in production we want two instances, so we'll have a read-write instance and a read-only instance; in staging we're just going to have the one. Our manifest here is a Cluster, so this is using the CloudNativePG custom resource, deploying it into the postgres namespace, setting up a super minimal persistent volume, and referencing a Secret where the superuser password is going to live. If this were your actual application, you would not want to use the superuser, you would likely need a much larger disk, and you would probably want to set up backups; the goal here is just to get a baseline cluster set up that we can run our application against. So let's go ahead and deploy our staging configuration onto the Civo cluster; this was the context name for that cluster in my kubeconfig. To showcase that the context set in this configuration does indeed get applied, let's do a kubectx to my kind cluster. So even though my default context is this kind cluster, Kluctl is going to use the context specified here. We'll run task deploy-staging, and that just runs kluctl deploy, passing it the staging target. You can see that when you run this deploy command, it goes through a number of steps to figure out what it needs to deploy. It then gives you a diff between the current state of the cluster and your proposed deployment; in this case all these objects are new. It's deploying all of our application resources, it's defining the custom resource definitions for the third-party apps, etc. One thing I didn't call out that I will look at now is the hooks that I use to ensure that my migrator Job runs before my applications. If we look under our golang application's manifests, the db-migrator Job has a hook specified.
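Kluctl hooks are driven by annotations on the resource itself. A migration Job marked to run ahead of the rest of the deployment could look roughly like this; the annotation names follow Kluctl's Helm-style hook convention, and the Job details (name, image) are illustrative:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migrator
  annotations:
    kluctl.io/hook: pre-deploy                           # run this Job before other resources apply
    kluctl.io/hook-delete-policy: before-hook-creation   # delete any previous run's Job first
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          image: registry.example.com/db-migrator:0.1.0  # placeholder image reference
```

The delete policy matters for Jobs specifically, since a completed Job's pod template is immutable and re-applying it without deleting the old one would fail.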
It's specified as a pre-deployment hook. Now, this is a little bit less realistic; you should not have two services depending on the same API schema. In this case the node API is not actually coupled to when this migration Job runs, but I just wanted to showcase how you could apply this pattern, and you would follow the same approach if it were more realistic and each service talked to its own schema or its own database entirely. That hook ensures that the migrator runs before the service is installed and/or upgraded. It's giving us some warnings about validation webhooks and the fact that some of the custom resources don't exist in the cluster yet, so the dry run can't validate them. Let's go ahead and proceed. It created our two namespaces and is now applying our third-party services, because of how they were ordered in that top-level deployment.yaml. It waited for our CloudNativePG deployment to become healthy, and now it's applying our application manifests and deploying that migrator Job. Let's look in the demo-app namespace at the logs for that: k logs on the db-migrator. Okay, we got a connection refused; I wonder if that's just because the Postgres cluster itself wasn't healthy yet. Yeah, it looks like our cluster is still coming up. Okay, my Postgres instance is now up and running, so let me just rerun the Kluctl deploy. That's the nice thing about having everything defined declaratively: we can reapply that same configuration and ideally issues will self-resolve as the resources come alive. We can see the missing objects from last time are the ones waiting on that migrator Job to finish. So if I run this, our migrator Job should be coming up. Looks like the migrator Job completed successfully and then it created our Go API Deployment, so that's great. Our react client is in a CrashLoopBackOff state; let's look at the logs. It was not finding our Go API, which is now running.
Now that the API is running, the next time Kubernetes tries to restart the client it should come up healthy, or we can speed that along with a k rollout restart deployment client-react. Great, looks like it came up healthy that time. Now the final issue is that our load-generator-python is in an ImagePullBackOff state; that indicates to me that the Secret that should be used for the Docker Hub repository is not working properly. Off screen I just added that Secret, so let me do a rollout restart for that. The pod is in a creating state, and now we're in a Running state. Okay, so now our application appears to be healthy. Let's look across all the namespaces and find our public endpoint for the load balancer. Here it is; this is what Traefik is going to be listening on. So let's go ahead and modify our /etc/hosts file. We could also set a public DNS record for this, but it's just faster to do it this way. Now if we navigate in our browser and zoom in, there is our first manual human request to our API. We see the request count of one for our golang API, but the load generator has already made a number of requests to our node API. Let's refresh again; we see a second request here, and this continues to tick upwards. So there we have it: our staging configuration deployed to our Civo cluster. Now, just to showcase the power of why we set this up in the first place, let's deploy our production configuration to the GKE cluster. To do that we can just switch to our GKE context; I have it specified here in my top-level Kluctl config file, so let's modify this and do a deploy again. We got the same listing of objects and some warnings about the custom resource definitions not existing; we're going to proceed anyway. Once again, this migrator Job is going to fail until that Postgres cluster is healthy, so let's just let it sit here for a minute and see if the retries on that Job, six times by default I believe, will be sufficient.
It looks like the first of my two replicas is now healthy, so let's see if the Job will succeed the next time it executes; I think it will. It did succeed; however, it looks like the Kluctl command timed out. I could either extend the timeout on the Kluctl command, or add an additional dependency to ensure that the migrator Job doesn't execute until my cluster is up, but this is really only going to happen the first time we deploy, so I'm just going to rerun my apply command and allow it to proceed a second time. Looks like everything was applied. I'm not sure what this issue here at the end is; I'm seeing something about how the command line is trying to access the Secrets that it's provisioning, but it looks like everything applied successfully. Once again my load generator is in an ImagePullBackOff state; this is a fresh cluster with no image pull Secrets in it, so let me deploy that off screen and then rollout restart the Deployment. Our load generator is now healthy. Also, you can see that I have two golang API pods here, because I implemented that configuration with the production instance having two replicas and the staging instance only having one. I can grab my external IP for that load balancer and modify my /etc/hosts, and now we can navigate to the production URL. There we go, we're live in production. So I now have a single configuration that I can use to very easily template the specific values that I want to change across environments. Now let me make just a trivial change to one of our applications; for example, let's add another replica to production and deploy again. This diff feature is incredibly powerful and can keep you from making silly mistakes, because we can see exactly what is changing between resources. Here we see the number of replicas in this specific Deployment going from two up to three.
Great. I haven't seen any other tools that do this good a job of diffing against the live state of the cluster. When you're using GitOps you can see a git diff, but that's still one layer removed from what is actually deployed in the cluster; it should match, but it doesn't always. Now if we do a get pods in the demo-app namespace, we can see three pods, including the one that just came up 16 seconds ago. Hopefully that gives you an idea of how powerful Kluctl is: it gives us the templating power of Helm, but without a lot of the overhead and boilerplate that comes with it, and it's basically as easy to get started with as Kustomize. There are a few additional files we need to use, and we do have to install the kluctl binary, but given the power that it brings, I think it's totally worth doing so. Pretty much any greenfield project that I'm building out right now, I try to use Kluctl unless there's some reason I can't, and I would urge you to as well. We'll see in the CI/CD section that there is a built-in GitOps controller, so we'll be able to take all of those capabilities we just learned about, apply them automatically to our cluster using the Kluctl GitOps controller, and really take our automation of deploying to different environments to the next level. Another important operational consideration with Kubernetes is ensuring that your cluster remains up to date, both the control plane as well as your worker nodes. The Kubernetes project releases updates every three or four months, generally about three times a year, and after a certain period of time they stop maintaining older versions. That means those older versions won't get security updates, and as you build on top of Kubernetes, you're inevitably going to find a recently released feature that you want to leverage. In this module I'm going to take a quick look at one possible procedure for updating your cluster and nodes safely.
This is a procedure that I've used a number of times. In certain cases your managed cluster may not provide the level of control required to do this and you may have to upgrade in place, but if possible, I suggest using an approach like this. The first step of any upgrade is to check whether you are using any of the APIs that may be removed in the version you're upgrading to. As the Kubernetes APIs evolve, new versions are added and old versions may get taken away, so you want to make sure that you do not have any of those older resources deployed into your cluster; if you do, you should upgrade them to the newer versions. Kubernetes has a deprecation policy where deprecated resources are kept for a few versions before removal, giving you an upgrade path: get the new resource version, upgrade your resources, and unblock the next cluster upgrade. There's an open source tool that can help with this called kubent, or kube-no-trouble. Here's the GitHub repo for it; I'm going to run it against my clusters just to verify I'm not using any deprecated APIs. It will scan your cluster, check the API versions used by all of the deployed Kubernetes resources, make sure none of them are being removed, and warn you if they are. Once you're sure you're not using any deprecated APIs, you want to upgrade your control plane. Your control plane can be ahead of your worker nodes in terms of version; it used to be by one minor version, but I believe they've relaxed that to two minor versions. So you upgrade your control plane first, and once that's done, you upgrade your nodes. There are two paths to doing this. You could upgrade the nodes in place; usually what this means is that the cloud provider, such as GKE with its concept of a node pool containing multiple nodes, will add a new node with the newer version, then cordon and drain an older node.
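Stepping back to the deprecated-API check for a moment, here is a concrete example of the kind of thing kubent flags. A PodDisruptionBudget still on the policy/v1beta1 API (removed in Kubernetes 1.25) just needs its apiVersion bumped; the body of the resource is otherwise unchanged (the name and selector below are illustrative):

```yaml
# Old (removed in Kubernetes 1.25):
#   apiVersion: policy/v1beta1
# New:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-golang
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: api-golang   # illustrative selector
```

This is why the scan comes first: applying manifests like the old version against an upgraded cluster would simply fail, so you migrate them before touching the control plane.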
That moves traffic onto the new node, and it cycles through the nodes in your node pool one at a time, or multiple at a time, depending on the configuration settings you've applied. The other option, which I think is safer, is to deploy an entirely new set of nodes on the newer version. Once those have come up and are healthy, you shift the workloads running in your cluster onto those new nodes, and once that is done, you can delete the old nodes. This blue-green approach is safer because if something goes wrong in the upgrade process, it's very easy to shift the workloads back, since you already have a set of healthy nodes, and you're much less likely to encounter downtime from an upgrade issue with no available capacity at the ready. I'll also call out that some people do away with this problem entirely by moving to an entirely new cluster: rather than upgrading the control plane, they back up the state of the entire cluster and shift it over. If you have a bunch of stateful workloads running in your cluster, this is much more difficult to do with no downtime, but I'm just noting that some people feel that is safer; not just shifting onto a new set of worker nodes, but shifting onto an entirely new cluster, is a different approach. It's also important to call out that the blue-green approach I'm describing, creating new nodes and destroying the old ones, is much easier to achieve in a cloud environment than on premises. Because of that, you're more likely to upgrade in place on a long-lived physical server that you're managing on premises, rather than needing a whole new set of hardware to move your workloads onto. So with that, let's go ahead and upgrade our GKE cluster using this process. We'll start by navigating to module 13. The first step is to run kube-no-trouble against our cluster, so let's make sure we're pointed at our GKE cluster.
kubent will use your default kubeconfig context, so I'm running just the kubent command that should have been installed by devbox. It's using Rego, a language you can use to specify policies, and it's checking against all of the Kubernetes APIs that were deprecated in each of these versions, or that are planned to be deprecated in the future. We got no warnings back, so we are good to go; there are no deprecated APIs being used, and we can proceed to the next step. If we look at our Kubernetes cluster in the console and click into it, we can see I am on version 1.29.6. Then we want to list the available versions we can upgrade to. GKE maintains three release channels: stable, which upgrades the most slowly; regular, a bit faster; and rapid, which has even more versions available. As you can see, only in the rapid channel can you use the 1.30 versions, so let's make sure our cluster is indeed using the rapid channel; we'll update the release channel accordingly. That runs the gcloud container clusters update command, passing it the rapid channel. Now let's pick one of the versions we want to upgrade to; let's grab the default version for the rapid channel, that seems fine, and specify it in our task here. Paste that in, and this version variable will get substituted in when we call our container clusters upgrade command; specifically, I'm telling it to upgrade the control plane. So let's do that. Here it issued that command, passing it the version. It's warning me that this is going to block other operations on the cluster; that's fine. This is just upgrading the control plane, so it shouldn't impact the worker nodes at all; I wouldn't expect any of my workloads to move, and they should continue running just fine while the upgrade is happening. If I reload the page here on Google Cloud, you'll see that my cluster is now in an upgrading state. This will take a few minutes behind the scenes.
After about five minutes, that control plane upgrade completed. You can see here this note that if auto-upgrade is enabled, the node upgrade would happen in the background, and GKE has been around long enough, and has tested enough of these upgrades, that an in-place auto-upgrade of a node pool is generally pretty darn safe. But if you want one extra degree of safety, you can do what I'm about to show you: provision a new node pool on the new version and shift the workloads onto it. To do that, we run the node pools create command; it's going to be the same machine type as before, with two nodes in it, and we pass it our cluster and zone. Just as a reminder, previously we had just the default pool; now we have the default pool and the updated node pool. You can see the versions are different, because the new node pool uses the same version the control plane is using. If we do a k get nodes within the cluster, we can see the four nodes here; three of them are Ready and one is not. Now all four nodes are Ready. At this point we want to shift the workloads from the old node pool to the new one. I have a command here to do what's called cordoning a node; that will apply a taint such that new workloads will not be scheduled onto it. Then you can issue the drain command, which will terminate all the workloads on an existing node. There are some additional options here that can help if things get stuck; for example, emptyDir volumes can prevent a workload from being deleted, and if things are stuck for another reason you can add the force flag. The first thing I'm going to do, though, is just cordon the two old nodes. Let's do a k get nodes, and then we'll issue a k cordon on that node and that node. You can see the status has changed from Ready to Ready,SchedulingDisabled; because these nodes are now cordoned, new workloads will not be scheduled onto them.
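For reference, cordoning just flips one field on the Node object; `kubectl cordon` is equivalent to patching the node like this (the node name is illustrative):

```yaml
apiVersion: v1
kind: Node
metadata:
  name: gke-default-pool-node-1   # illustrative node name
spec:
  unschedulable: true             # what `kubectl cordon` sets; the scheduler skips this node
```

Existing pods keep running; only new scheduling decisions avoid the node, which is why a rollout restart is what actually moves the workloads.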
If I do a k get pods -o wide, you can see that these are currently all scheduled onto the default node pool, but if I do a k rollout restart on a Deployment, as you can see, the new pods get scheduled onto the new node pool. Great. For any workloads you're worried about having downtime, you would rollout restart them, and as those new pods are scheduled, because we've cordoned the old nodes, they'll land on the new updated pool. Then, to force everything else over once you've moved the workloads you care about, you can issue this drain command, and specifically pass it the force flag, which is kind of like taking a sledgehammer to it: terminate all the pods regardless of their current state. So let's do that just to force everything else over. It's evicting a bunch of pods, and eventually, once you've restarted all the workloads you're worried about downtime for, the only things running on your old node pool will be DaemonSets. That's okay, because we already have a copy of each of those running on our new nodes as well, so we can delete that node pool safely. We can issue a gcloud container node-pools delete, say yes, and at this point Google will go off and delete this node pool from our cluster. A few minutes later, the original default node pool is gone, we've got our updated node pool here, and if we look at the nodes in the cluster, we see only the two nodes. Awesome. So like I said, doing an in-place upgrade can work, and it will cycle one node at a time; however, having that additional capacity ready to shift the workloads back to if something goes wrong is a nice safeguard to have. Okay, we've reached the final module of the course, and this is the one that takes a project from a one-off hobbyist project, where you're manually deploying things, into a more production-ready system where you have automations that kick in when you push your code to git.
Your changes get automatically built into container images, and those images get automatically deployed into your clusters. That's what we'll be looking at in this continuous integration and continuous delivery section. Specifically, it allows us to reach higher levels of capability. Early in the course I showed you how to use the kubectl create command from the command line; we never want to be doing that in production, it's only really useful for learning purposes. The next level, which we progressed to in module 7, was a manual kubectl apply: we got our configurations ready and applied them. In module 12 we defined our configurations so they could be applied to multiple environments; however, it was still a manual application. The next level beyond that would be to run a kubectl apply or a kluctl deploy from an automated pipeline like GitHub Actions or CircleCI. And then the level beyond that, which has become a de facto standard for companies managing Kubernetes resources, is something called GitOps. GitOps is the idea that you have all your manifests in git, which we already do, but now you're automatically syncing the state of that repo into the cluster and applying those manifests, such that your cluster state and your repo state maintain parity automatically. For continuous integration we'll be using GitHub Actions. Generally, the things you want to run in continuous integration pipelines are your test suite, so when people try to merge new code from a pull request into main, you execute all the tests you have to make sure nothing breaks, along with things like linting and validation. You would also generally build your container images and push them to a registry for your Kubernetes clusters to consume. On the continuous delivery side, we're thinking about how we get those changes into our Kubernetes manifests, and then how we apply those Kubernetes manifests to the clusters.
We also want to validate that those deployments are working as expected. In this case we're going to use Kluctl and its built-in GitOps capabilities, combined with a GitHub Action that automatically updates our manifests within the git repo. You could use something like Renovate bot to achieve this as well; I've implemented a somewhat hacky GitHub Action that does a find-and-replace on the specific versions we care about in the manifests. So let's jump over to our code editor and get this working. GitHub Actions for a repo live in a top-level .github folder; within there, there's a workflows folder and any number of YAML files. In this case I have a single workflow that I'm naming image-ci. For a workflow you can specify triggers; here I'm saying that every time someone pushes to the main branch in this repo I want this workflow to run, or any time someone pushes a tag that matches this number.number.number format. Those tags will be my production releases, whereas I'm going to deploy to staging on each push to main. Then I'm specifying this paths key, which says to only rebuild images when the applications themselves change. Because I've got so much stuff in this repo, I didn't want to rerun this workflow every single time I changed something in module 4, for example; this says to only run the workflow when the module 6 files change, because that's where the applications and their Dockerfiles live. Within a workflow you can specify any number of jobs. We start with the job that generates the image tag; this is the tag we'll use for those container images, and it has just one output, the image tag itself. The second job takes that image tag as an input and builds, tags, and pushes all of our container images. The reason I separated this into its own job is so that the second job can run in parallel using this matrix strategy.
we'll have five pipelines running in parallel, building our container images all at once, whereas the image tag generation only needed to happen a single time. If you had separate image tags for each of your services, for example because you were able to release them independently, you might need to generate an image tag specific to each service, and in that case those steps could live in the same job. The third job is what I've named update-tags, and this job specifically goes into the repo, finds all of the Kubernetes manifests that use those tags, and updates them before creating a pull request, such that I can go in as a human, review the tags that have been modified, and merge that pull request to deploy to production. Let's look at the specifics of what's happening here. The very first step is to check out my code, and because I'm using the git tags as a mechanism to generate the image tag, I need to use this fetch-depth of zero; this ensures the GitHub Action workflow has all of the git tags available to it rather than just the latest commit. I'm then installing Task, the task runner I've been using throughout; I want that available in the runner because I've actually defined the commands executed by this workflow as tasks. Then I'm going to generate the image tag. Here I'm specifying a working directory, so this is my module 14 directory, and specifically I'm calling the generate-version-tag task. Let's take a look at what that actually is. Let me navigate to module 14, then to the GitHub Actions directory within that, and you can see I've got this generate-version-tag command. Let's execute it: it output 0.5.0-44-g and then a hash. The command that it's running is git describe, looking at all the tags, finding the first parent of any particular tag, and matching on this pattern. If I run git tag you can see all the tags I've applied in this repo, ranging up to 0.5.0; the most recent one is this tag, and so when I run this git describe command it starts
with that most recent tag, tells me I've made 44 commits since that tag, and then this is my latest commit, this 51b89. So this command gives me a mechanism to have my image tags increment with each new commit on main, and any time there's a release this prefix will change, which makes it very easy for me to see from my image tag what the latest version on production is and how many commits ahead my staging version is. If I was on a commit with a specific tag, this would instead just return 0.5.0. Okay, so we generated our image tag, we stored that in a variable, and then we echoed it out into the GitHub output at the key image_tag. Because we specified this output, specifically calling out the step that we should grab it from, that value of 0.5.0-44-g51b89 will then get passed into the next job. This needs key ensures that this job won't run until the previous job has completed, and then we'll have one copy of this job for each of these paths. Let's take a look at what steps are going to be run. Again we start by checking out, so this gives me the code associated with the event that triggered the workflow. I'm installing Task, and I'm setting up QEMU; this provides computing architecture emulation such that I'll be able to build both amd64 and arm64 versions of my container images. I'm then setting up Docker Buildx. These are third-party GitHub Actions that you can pull in, as you can see, with just a couple of lines. There is a GitHub Actions marketplace with all sorts of third-party actions that you can take a look at and see if they meet your needs, and if there is an action available that fits your particular workflow, you can leverage those open source projects and avoid having to write a bunch of custom scripts yourself. So we're going to set up QEMU, we're going to set up Docker Buildx, and we're then going to log into Docker Hub, passing it a username and token. If this job is specifically associated with our Golang API, we need
to set up Go in order to build it. We also need to set up ko, the tool we're using to build that container image, but again only if this copy of the job is running for our Golang API. Finally we run the build-image command: we pull in the image tag from the previous job, we specify a working directory matching the path for this particular copy of the job, and then we call out to Task, running our multi-architecture build task from that module. We're leveraging the work we already did in those earlier projects to execute this task, and because we have the same task name across each of our services, we're able to run this one command and have it work across all of those different projects simply by specifying a different working directory. One thing we're not optimizing here is saving and restoring the Docker cache across GitHub Action workflow runs; that's something you could do if you wanted to speed up the build times, having it cache those Docker layers between runs. Now, if you recall, when we issued these Buildx multi-architecture build commands they included a push to the registry, so that's why we don't have an additional push step: these tasks build and push those images. The final job I mentioned is this update-tags job, and this is where we want to take the tag we used for building these images and update the corresponding Kubernetes manifests within our repo. Here I'm saying that both of the previous jobs need to complete before this one will execute, and then we take the following steps: we check out the code, we install Task, and then the bulk of the work is done in this update-image-tag step. We get our tag from that initial job and we run two tasks. The first one updates our staging tags, and this runs every single time this workflow executes, whether triggered from main or from a release. Then, if this workflow was triggered by a tag, GITHUB_REF will look like this: it will say refs/tags/ and then have the tag
number, and if this is true then we'll also run the update-production-tags task with the new tag. Let's go look at these tasks and see how they work. I've got this task called update-image-tags, and as its description says, it recursively updates tags in files with the specified comment. It takes an input of a starting path, recursively goes through every file under it looking for a specific identifier comment to determine which lines to modify, and then replaces those with the version specified. I've included this particular task file in the excluded list because otherwise the identifier comments that I'm using here would get modified by the task itself; that's just a way to prevent that from happening. Then I run a series of commands. The first is just error handling, saying you have to specify an identifier comment and you have to specify a starting path, otherwise it won't work; that's there to provide a help statement for people using this command. I'm then printing out some information, and then this is where the actual execution of the replacement takes place. I use the find command, using the starting path as the entry point, looking for any YAML file; I then use grep to search for the identifier comment, take the output of those, and loop over them. For each of those items, as long as the file is not within my excluded files list, I update that file using sed, the stream editor: I find the version tag that exists alongside the identifier comment and replace the version with the new tag. Once I've looped through everything, those files will have been updated in place, because I used the -i flag, and we can proceed. I then have two additional tasks which call this top task. First I check that you have a new tag specified, as it is required, and then I call update-image-tags, specifying the two inputs; in this case the identifier for staging is staging-image-tag. Looking for those across our repo, we can see within module 12, in my
kluctl services directory, specifically in my staging.yaml config, I have a version with a tag that looks like the ones I've been generating, followed by that identifier comment. So when I issue this command it will find all of these similar definitions and update them all accordingly. Also, for my starting path I'm using the git rev-parse command to give me the absolute path of the root of my git repo; this allows it to work regardless of whether it's on my system and I've cloned the repo into one location, or it's on the GitHub Action runner and it's cloned into another location, or it's on your system. It will still give a proper starting point to traverse the repo and find all of these versions. So let's try this locally. We'll start just by running it: the task failed with "new tag is required". Because I didn't specify new-tag it didn't know what to update to, so that error checking we added did its job. Let's rerun this, but we'll set new-tag equal to foobarbaz. Here it found all the YAML files, looped through them, and we can see it tried to update all those files. Let's go into this one, and we can see that now, instead of the previous version, the tag is specified as foobarbaz across all of those staging configuration files. If we look at the production ones, because they use a different identifier comment, they were not changed. We can instead run update-production-tags with foobarbaz; it will loop through, and now the production versions match that while the staging versions are still on foobarbaz. This would work perfectly fine; it does make a few assumptions about how you're storing things, but as long as you've followed the convention of using these identifier comments, this is a reasonable approach for updating tags. If we now jump back to our GitHub Action workflow: on every single run of this workflow we're going to update the staging tags, and on workflow runs triggered by release tags we're
going to update both the staging tags and the production tags. At this point those tags have been updated, but only locally on the runner within its copy of the git repo, so now we need to get those local changes pushed back up to GitHub so that we can merge them into main and get them deployed. This final action is a third-party action called create-pull-request. It does require a personal access token, you can't just use the token associated with the workflow, and it opens a pull request against whatever base branch you specify, in this case main, containing those changes. So why don't I go ahead and make a commit to the repo, and we can watch this workflow happen. You'll recall that I have this paths filter here, so it's only going to rebuild these images when I modify something in this path; why don't I go into module 6 and create a trivial change to the README, I'll just add a period. Now we can commit that and push it, and if we go to the repo, under Actions you can see a new workflow run has been created. We start out by generating the image tag, then we run our five build jobs in parallel. If we click into one, we can see the tag it received was 0.5.0-45, because now it's 45 commits since that latest release. Two of our applications have now built and pushed successfully; let's go over to Docker Hub and check, and we see this version 0.5.0-45 was just pushed a minute ago, awesome. Okay, our build jobs have completed after a few minutes, and now we're in the update-tags job. We can see that it updated our staging configuration files like we would expect and created a pull request. We should be able to go under pull requests and see it here; the title and tag are correct, and because it used my personal access token, it shows my name as having pushed the latest commit to this branch. If we look at the changes, it contains changes to all of those staging configs, just like we would expect. Because this was a push to main and not a
push due to a release tag, it only updated the staging ones. So that showcases an end-to-end CI workflow for generating useful image tags, building all of our container images across a parallel set of jobs, and then updating those tags within our Kubernetes manifests automatically. You'd obviously want additional workflows for things like running your unit tests and integration tests, but I wanted to focus on how we get code changes into container images. Now we can shift focus onto our GitOps installation to get those versions, which are now represented in that pull request, into our cluster. When it comes to GitOps, there are two names people will always reference: Argo CD and Flux CD. Those are both very popular and very powerful projects; however, I'm going to showcase a third approach using kluctl GitOps, one because I think it's awesome and we can leverage all of that multi-environment configuration capability that we showcased in module 12, and two because I want more people to use this project. I think the developer experience is actually better in many cases than Flux or Argo, and because of that I wanted to highlight it here and hopefully convince you that you should use it as well. The concept behind GitOps, as I mentioned before, is that you're going to have a controller running in your cluster that is able to pull updates from git. That pull can be triggered via a webhook to increase the speed with which updates make it into the cluster, but that's not super important; the important piece is that we have this operator which is able to pull in updates automatically and keep those updates in sync with the deployed state of the cluster. Within the kluctl GitOps subdirectory we've got a few things defined. We've got our top-level kluctl project file, where we define a deployment just like we did for our multi-environment deployment, but in this case the deployment is controlling the kluctl GitOps controller. We've got our two targets:
again, staging is pointing to the Civo cluster and production is pointing to my GKE cluster. The args that we pass into this deployment are just the cluster name, and we've got a discriminator using that cluster name to enable kluctl to properly prune and delete resources. The deployment specifies that we start with our namespaces, we have a barrier, and then we deploy the clusters subdirectory. For namespaces, there are two namespaces that kluctl uses for GitOps: we have the kluctl GitOps namespace as well as the kluctl system namespace, so those will be deployed first. Then it looks into this clusters subdirectory, where we find the following deployment, and this is another interesting thing about kluctl that we didn't look at in module 12. In module 12, all of our deployments referenced things in our own file tree, we referenced a subdirectory; however, we can also reference a git repo, public or private (we could specify credentials to access it here). So these configurations don't even need to live in our repo; they can live, for example, in the kluctl repo itself, and we specify that it should install the controller and the web UI. We're also specifying a tag directly, leveraging git's ability to have versions and releases and serve us these files, rather than needing to pull them in locally. So let's go to that repo and see what those contain. If we go to this repo and then to the install directory, the two things it was referencing were the controller and the web UI: the controller entry is going to deploy this, and the web UI entry is going to deploy this. The web UI is a stateless deployment that will allow us to visualize the state of our kluctl resources, and the controller is the brains of the operation; it will actually perform the command line executions that we would have done locally, but now they're going to happen inside of the cluster. So those are the system components, and then we have two additional
subdirectories. We've got the all subdirectory, so anything that's going to be common across our staging and production environments lives in here. Within that we specify a custom resource called a KluctlDeployment; this is how we tell kluctl about our project. Specifically we say: here is the repo on GitHub, and I want you to look in module 14, my current module, to find the resources I care about. By specifying this, kluctl is going to be able to manage itself via GitOps. I want to target the cluster name specifically, and I pass that in as I deploy this such that it will be usable as an argument. These two options allow the controller to clean things up if I were to delete the resource from my git repository. So this is going to be shared across all of our clusters, with the only difference being that each will get a different cluster name from the argument that I pass. Back to our clusters deployment.
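Before we move on, the core behavior these KluctlDeployment resources configure, a controller repeatedly pulling a git repo and applying it when it changes, can be sketched as a toy loop. This is only an illustration of the GitOps idea, not kluctl's actual implementation; a real controller also handles pruning, drift detection, and health checks, and the "apply" here is just an echo standing in for a kluctl deploy run inside the cluster.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Create a throwaway git repo to stand in for the manifests repo.
repo=$(mktemp -d)
git -C "$repo" init -q
git -C "$repo" -c user.email=ci@example.com -c user.name=ci \
  commit -q --allow-empty -m "manifests v1"

last_applied=""
reconcile() {
  head=$(git -C "$repo" rev-parse HEAD)
  if [ "$head" != "$last_applied" ]; then
    echo "applying manifests at $head"   # a real controller runs kluctl deploy here
    last_applied="$head"
  else
    echo "in sync"
  fi
}

reconcile   # first run: applies
reconcile   # no new commits: reports in sync
git -C "$repo" -c user.email=ci@example.com -c user.name=ci \
  commit -q --allow-empty -m "manifests v2"
reconcile   # new commit detected: applies again
```

Running the loop on a timer (kluctl's default is every five minutes, as we'll see shortly) or on a webhook is what keeps the cluster state and repo state in parity.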
The final piece that we're going to install is based on the cluster name, one of which will be staging and one of which will be production; we're going to install one of these two deployments. This deployment is pointing to the module 14 subdirectory, while these deployments are pointing to the module 12 directory and will control the deployment of our application itself, just like we did in module 12, but now in an automated fashion. Now, I know that might have been a little hard to follow in terms of which elements the KluctlDeployments were managing, so I drew this diagram to hopefully help clarify things. On the left-hand side we see the two KluctlDeployments. The first, which we're naming kluctl-gitops, is going to manage itself, the kluctl controller and web UI deployments, and finally the KluctlDeployment referencing our demo application deployment from the other module; by including itself in this deployment, we can now manage kluctl via GitOps. That second KluctlDeployment, demo-app, references our module 12 directory, so it will deploy our demo app kluctl deployment, which includes our third-party dependencies, the Postgres cluster, and all of our first-party services. We'll manually deploy the kluctl-gitops deployment one time; it will then install the second KluctlDeployment, which will in turn deploy all of our applications. Let's go ahead and deploy these two GitOps controllers into the staging and production clusters. We'll start with staging. I went ahead and cleaned up the namespaces and resources that we had deployed manually for module 12, such that we have a fresh slate and can now deploy into it. Navigating into module 14 and then into the kluctl GitOps directory, we can see we have four tasks here: the first one deploys into the production cluster, the second one deploys into the staging cluster. Let's start with staging; this just calls
kluctl deploy, passing it the staging target. We'll say yes, and if we list our KluctlDeployments across all namespaces, we've got our GitOps deployment; this one looks like it has finished deploying and was successful. This includes two pods within the kluctl namespaces: the controller, where all of the kluctl logic is applied, and the web UI, which we can actually use to investigate how these deployments are progressing. When we deployed this, a random password was generated, which I can get out of the Kubernetes secret using this command. I'll copy that, and then I'll port-forward to the service in front of that web UI, so that's the kluctl web UI service in the kluctl system namespace on port 8080. Now I can go to localhost:8080 and log in with admin and the password I just copied, and we can see here's the GitOps deployment. It looks like it's healthy; you can see it includes both the commands that I issued via the command line, so that's this path, deployed two minutes ago, and here are the deployments issued by the controller. It looks like our demo app is still reconciling: the Node API is up, and the DB migrator is crashing, likely because the database is not up yet. Okay, the database just came up 38 seconds ago, so hopefully the next time the DB migrator restarts it will connect successfully; that will enable the Golang API to come up, and then our React client will start up successfully as well once it detects all of the necessary backends. I'll also create that secret for our Python load generator. Now, if this were a real project, rather than manually creating that image pull secret I would store it in Google Cloud Secret Manager and use the External Secrets operator, like we showed in an earlier module, to pull those values into our cluster automatically; the non-sensitive external secret configurations could then live within the git repo alongside the rest of our configuration. That way there
would be no manual steps after that initial deployment. Now, a few minutes later, after all of those bootstrapping steps have taken place, all of our service pods are healthy. We can find our external IP, edit /etc/hosts (this was our staging cluster), navigate to our staging URL, and there's our application, completely bootstrapped via GitOps. Let's take another look at that web UI. I'll refresh the page and revalidate now that our application is healthy. We've got a healthy validation state, but it's telling me that my reconcile failed; I think it's because it timed out on that initial run. Let's try it again. It's still showing an error here, specifically for the Traefik dashboard IngressRoute; this error appears to be something with how the Helm hook associated with that IngressRoute is being applied. Let me try manually deploying module 12 to see if that resolves the issue: deploy-staging. That's one interesting thing about the kluctl GitOps model: it's meant to let you use the in-cluster controller in conjunction with the CLI, whereas Flux and Argo generally discourage that practice. By default, if you manually apply a change, the GitOps controller is not going to revert it unless you have specifically told it to; otherwise, the next time you push an update to your git repo, it would revert those manual changes and apply whatever configuration you've pushed. Refreshing the web UI, we can see that we now have a command line push here that was deployed one minute ago, and everything appears healthy. Let's rerun our deployment. Ah, I probably could have just rerun that deployment on the server side and it would have succeeded; we'll try that on the production cluster. So we've got our staging cluster deployed, now let's deploy our production cluster: kluctl deploy, pass it the target, say yes. All right, let's take a look at the web UI as the two kluctl deployments come online.
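As an aside, the web UI password retrieval we've now done on both clusters boils down to pulling a base64-encoded value out of a Kubernetes secret and decoding it. The secret and field names below are assumptions for illustration (check the kluctl docs for the exact ones), and the decode step is simulated so the snippet runs without a cluster.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Against a real cluster the commands would look something like
# (names assumed, not taken from the course repo):
#   kubectl get secret webui-secret -n kluctl-system \
#     -o jsonpath='{.data.adminPassword}' | base64 -d
#   kubectl port-forward -n kluctl-system svc/kluctl-webui 8080:8080
#
# Kubernetes stores secret values base64-encoded, so the decode step
# is plain base64. Simulated here with a stand-in value:
encoded=$(printf 's3cret-pass' | base64)   # stand-in for the jsonpath output
password=$(printf '%s' "$encoded" | base64 -d)
echo "$password"
```

The same pattern works for any secret field: jsonpath to select the data key, then base64 -d to recover the plaintext.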
This is a random password that was generated, so it'll be different than the staging password. You can see the GitOps KluctlDeployment: we deployed it first through the command line, and then the first GitOps reconciliation was successful. We can watch this demo-app production deployment going through; it's waiting on that migrator job, which is crashing until the Postgres cluster comes online. It looks like our database is healthy, our migrator job has completed, and our two replicas of the Golang API are coming online. Now I'm going to delete this pod so it restarts, and we've got our healthy services. Let's get our external IP, put it in /etc/hosts, navigate to our domain, and there we go: our production application is now deployed via GitOps. Let's check the web UI and see if that same issue from staging happened in production; it looks like it did. When I click validate, it checks that the target is ready, and that's true; now let's click deploy. The controller should rerun that deploy command behind the scenes and everything should come up healthy. There we go, we got a check mark. At this point, these KluctlDeployments will check for new configurations on GitHub and rerun those deployments every five minutes, as specified in their configuration, so they will automatically pull in new updates. To demonstrate this, we can go here and merge the pull request that was generated automatically by our GitHub Action, and we should see an upgrade across all our services on staging from this version to that version. Looks good, let's go ahead and merge it. We'll switch over to our staging cluster just to confirm the versions that we're currently running. Let's do k get pods; we can see it was indeed still using that earlier version. Now I could wait for the next iteration, or I can go ahead here; oh, it looks like a new version has already been detected on the GitOps side. Once those five minutes have elapsed, this will
rerun and we should get that latest version. I can also skip the wait by going ahead and clicking the deploy button. Let's go into our cluster with k9s, and it looks like that new version must have been detected, because we've got new pods coming online, and all of our pods have now upgraded to the latest version. Let's just confirm that we're now on version 0.5.0-45, and we can see this was deployed 40 seconds ago. So that showcases making a change in our git repo and having it automatically reflected in the clusters. Now, instead of having to deploy things manually, we can just make those changes in our git repo and they will flow automatically into the cluster at the next iteration of that deploy. One aspect of our GitHub Action that we didn't demonstrate was the trigger based on a release event. So, now that we've made it all the way through the course, everything is up and running, our GitOps controllers are active, and our CI/CD is set up, I'm going to tag a release. We'll go here to the releases page, draft a new release; this is going to be the 1.0.0 release. We'll create a new tag when we publish it, and we'll click publish. This creates the 1.0.0 tag, and we should be able to go to GitHub Actions and see that we now have a run of our image-ci workflow based on that tag. Because of how we set up that generate-image-tag command using git describe, we'll see that the image tag being used for our container images is 1.0.0. Now that the images have been built and pushed, we're updating the tags, and with that complete, we see this auto-generated pull request updating the image tags to 1.0.0. You'll notice that it is updating both our production tags and our staging tags, which is what we want; we want to be running this image across both environments. We can merge this, go to staging and redeploy here, go to production and redeploy here, and we can see that our pods updated a couple of
minutes ago. Let's check the version, and there we can see that our pod is running that updated 1.0.0 version in production; I expect it to be the same in staging. Both of our apps are up and healthy, and we can hit either one from the browser. And so with that, we've got a fully automated pipeline for making code or configuration changes, pushing those to git, and having those changes make their way into the appropriate environment: building the container image, pushing it to the registry, updating the git manifests, and then having our GitOps controller automatically pull that into the cluster and update its state. This gives us that powerful, familiar workflow where we can push code to main and have it automatically deploy to staging. The one thing we would need to change is that rather than creating a pull request from our GitHub Action, we could commit those staging image changes directly to main; I just wanted to use the pull request option to showcase how you could keep a human in the loop if you wanted to. But now you've got a really robust GitOps-based workflow that you can use across any number of clusters within your organization. Before signing off, I do want to call out a few additional topics that could serve as logical next steps as you continue to build your Kubernetes knowledge. There are a few networking topics that would be worth looking into. We learned about Kubernetes networking, how to get traffic into your cluster, and how to communicate between services, but there's a lot more depth you could explore there. I would look at the various CNI plugins and the trade-offs between them, as well as how to handle networking across multiple clusters and how to optimize your network for maximum scalability. Additionally, you could look at network policies and how they can help you secure your cluster by defining the specific network paths that should be allowed for egress and ingress between and amongst your services. And then thirdly, service meshes: there are tools
like Istio and Linkerd that can provide a ton of networking capabilities; they'll give you mutual TLS, automated retries, and some additional observability with no application code changes. On the workload optimization side, you should learn how to tune your services with the appropriate level of resources. There's going to be a balance between optimizing your resource utilization and cost efficiency while also achieving application stability, and in order to do this you'll need to understand the resources your applications consume under different load patterns. There are tools that can help with this, like Goldilocks and KRR; you can use those to monitor your applications, and they'll provide recommendations about the resources you should be requesting. I would also look at autoscaling. Throughout the course we had a static cluster size and a fixed number of replicas for all of our workloads; however, within Kubernetes you can scale at both the pod and the cluster layer. The Horizontal Pod Autoscaler is a component that allows you to scale the number of replicas in a workload up or down; this can be based on CPU usage or other custom metrics. Then the Cluster Autoscaler, or an open source project called Karpenter, allows you to scale your cluster by adding and removing nodes based on pod scheduling demand. Speaking of scheduling, we mostly let the default scheduler do its thing, with the exception of module 13, where we shifted some workload from the old to the new node pool while performing an upgrade. You could look at using node affinities, taints and tolerations, or custom schedulers to influence where pods get scheduled within your cluster, to ensure that you're designing your systems for high availability, resource efficiency, and whatever specific application requirements you may have. As you start running Kubernetes in production, you're going to need to understand and implement some sort of disaster recovery plan in case something goes wrong.
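Circling back to the Horizontal Pod Autoscaler for a moment: its core scaling rule, per the Kubernetes documentation, is desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric). The numbers below are made up for illustration, and the real controller also applies tolerances, stabilization windows, and your configured min/max replica bounds, but the arithmetic itself is this simple.

```shell
#!/usr/bin/env bash
set -euo pipefail

# HPA core rule: desired = ceil(current * currentMetric / targetMetric)
current_replicas=4
current_cpu=90   # observed average CPU utilization (%), hypothetical
target_cpu=60    # target CPU utilization (%), hypothetical

# Ceiling division using integer arithmetic.
desired=$(( (current_replicas * current_cpu + target_cpu - 1) / target_cpu ))
echo "scale from $current_replicas to $desired replicas"
```

So a workload at 90% CPU against a 60% target grows from 4 to 6 replicas, bringing per-pod utilization back toward the target.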
Tools like Velero and Kasten K10 can help with this. If you're using GitOps like we did in the course, then your cluster state should be stored in version control, but any stateful applications you're running will need an appropriate backup and recovery solution, and you should be testing that solution periodically to make sure it still works. Finally, we talked about operators in module 8 and how you can extend the Kubernetes API, but we didn't dive deep into actually doing so within the course; taking those ideas and building out a custom operator of your own is another great way to take your Kubernetes skills to the next level. And with that, you've reached the end of the course. If you made it this far, my hope is that you feel ready to deploy and operate your applications on Kubernetes. To briefly recap: we started by building our foundational knowledge of Kubernetes, learning about the history and motivations for the system, exploring the built-in capabilities, and learning how to use Helm to deploy applications. We then took that knowledge and deployed a representative demo application, along with a variety of useful tooling, into a Kubernetes cluster. Finally, we explored what happens after your app is deployed: how do you debug, how do you deploy to multiple environments, and how do you automate the process of getting code into your cluster with automated pipelines and GitOps. My goal is for this course to become the go-to resource for people who want to learn Kubernetes effectively, so if you found value in the course, consider sharing it with your colleagues at work or with your network on social media. If you do so, please tag me; I'm @sidpalas on Twitter, or you can search sidpalas on LinkedIn. Also, if you want to connect with others who've completed the course, come join my Discord community; there's a link in the description and we can continue to talk about all things Kubernetes. Remember, Kubernetes is a vast and evolving ecosystem; hopefully this course has given you a solid
foundation, but there's always more to learn. And remember: just keep building.