Transcript for:
Exploring OpenTelemetry and Observability

For those who don't know me, my name is Steve Flanders. I'm a Senior Director of Engineering at Splunk, which was recently acquired by Cisco, so I'm not sure which one I'm supposed to say, but I still call myself a Splunker. I've been involved since the very beginning with OpenCensus, which is one of the precursor projects to OpenTelemetry (there was OpenCensus and OpenTracing, and now OpenTelemetry). I've been in the observability space for over a decade: I was at VMware working on a logging product, I joined as part of the founding team at a company called Omnition that Splunk acquired (it's now the Splunk APM product), and at Splunk I've been responsible for the Splunk Infrastructure Monitoring product, which came from the SignalFx acquisition. So I've been working with traces, metrics, and logs for a very long time. And very recently, in fact last week, I released a book called Mastering OpenTelemetry and Observability; hopefully you'll take a look at it.

So let's start with a quick poll. We're at KubeCon, so I think there are going to be a lot of hands here. How many people have heard of OpenTelemetry? Okay, great. How many people are using OpenTelemetry? How many people are using OpenTelemetry in production? Oh wow, I love it. We've come a long way. In fact, earlier I was having conversations with some folks about the Collector, which I thought was great. So I'm going to start with a little bit of basics: who has not used OpenTelemetry before? Anyone? There we go, a few hands; that's expected. I'll start with a quick introduction to the project, but I'm going to spend most of my time today talking about one specific component, and that is called the Collector.

Let's start at a very high level: what is OpenTelemetry? Basically, it is an open standard that makes it possible to generate, collect, and process telemetry data. You might have heard of the three pillars of observability: traces, metrics, and logs. OpenTelemetry provides a standard for those, but it supports even more than those three pillars; you have things like client instrumentation, profiling, synthetic data, and the like. The project is really focused on all the telemetry data you would care about in your environment, whether that's within an application or on an end user device like a browser or a mobile device, in any language you care about. All of that is covered. But OpenTelemetry does not provide a backend for that data. Instead, it is vendor agnostic and allows you to send that data wherever you want, whether that's an open source project, your own homegrown solution, or a vendor, for example a cloud provider or some other service provider. It's very, very flexible.

Now, how it does this: first, it defines a standard called the specification. The specification basically defines the rules (what must happen, what should happen, what could happen) for generating this telemetry data for any signal it supports. "Signal" is the term the project uses for these telemetry data types: a metric is a signal, a trace is a signal, a log is a signal, and there are other types as well. The project then enables context and correlation across these different signal types, and there's a lot of value in that. Now, you may be wondering why this matters, why you would need a standard for this. Because prior to this there wasn't one, and so any backend you used would provide its own instrumentation, which may or may not have had the capabilities or integrations you cared about. There were security aspects and operational concerns, and changing instrumentation, especially from a customer or end user perspective, was very difficult. Now you can instrument or add data collection components from OpenTelemetry and use them across vendors and across the open source tools you have, which makes it very powerful.

On top of this specification there are basically two reference implementations. First, there are what are known as instrumentation libraries; these go within your code itself, and you can use them through automatic instrumentation or manual instrumentation. Second, you have the Collector component. You can think of it like an agent, a gateway, or a pipeline that receives telemetry data, processes it in some way, and then sends it wherever you want it to go. I could spend all day just talking about the project, there's really a lot to it, but as I mentioned, I'm going to focus mostly on the data collection part today.

Now, why does OpenTelemetry matter? First, it's an open standard, which is great because now we have one way of doing things, one terminology we can follow. It also lets you be vendor agnostic, so you're no longer tied to a particular vendor through your instrumentation or data collection pieces. This gives you data portability and data control: you choose what to do with your telemetry data, you choose how much to generate, you choose where to send it. You have full control of it now, and that's very powerful. If you want a data lake locally, or to run open source tools within your environment, you can. If you want to send data to a managed provider or a SaaS solution, you can. If you want to send it to both a local tool and a SaaS, you can. It's very powerful from an end user perspective for helping you achieve observability.

Of course, we're at KubeCon: OpenTelemetry is part of the CNCF, and it's actually a very active project. In fact, it's the second most active project in all of the CNCF, behind only Kubernetes, and I think that shows it's a real problem that needs to be solved. The community is great, it has a very large ecosystem, and you see collaboration across vendors, cloud providers, and end users, which I think is amazing, because working together is how you get the best observability at the end of the day. What you'll see is that basically every major vendor and cloud provider supports and contributes to OpenTelemetry today, and many end users have already contributed to it or adopted it in their environments, again for the flexibility it provides. Now you can have one way of doing it, and it supports the other open standards that are out there. It's meant to be flexible and extensible: as the observability market grows and evolves, so will the OpenTelemetry project along with it.

All right, as I mentioned, most of this talk is on the Collector, and that's it for the introductory information. What is the Collector? Basically, it's a binary that allows you to receive, process, and export data. Internally it looks something like this; it's not a complete picture, it's a simplification. You have this notion of receiving data, getting it into the Collector, and there are different ways to do this: you can use push or pull mechanisms. For example, if you're familiar with Prometheus, you would usually scrape an endpoint to get metric data; that's a pull mechanism. Whereas if you've used distributed tracing before, it's often the application that pushes the trace data to an agent or an endpoint; that's a push mechanism. Now, once you receive this data, you may want to do something with it, because just generating it may not be sufficient. For example, maybe you want to process it in some way: you want to filter what is actually sent to an endpoint, you want to redact sensitive information that might exist in there, maybe you want to do some sort of aggregation or sampling or what have you. All of that happens within the internals of the Collector itself.
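As a sketch, the push versus pull distinction shows up directly in receiver configuration. This fragment is illustrative only; the job name and endpoints are made-up placeholders, not values from the talk:

```yaml
receivers:
  # pull: the Collector scrapes a Prometheus endpoint on an interval
  prometheus:
    config:
      scrape_configs:
        - job_name: example-app            # hypothetical job name
          scrape_interval: 30s
          static_configs:
            - targets: ["localhost:8888"]  # hypothetical target
  # push: applications send OTLP data to an endpoint the Collector listens on
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
```

The prometheus receiver embeds a standard Prometheus scrape config under its `config` key, while the otlp receiver simply opens a listening endpoint and waits for applications to push to it.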
Then, eventually, you want to export the data: you want to send it to some destination. The Collector is usually not the final stop. You could persist the data to disk if you wanted to, but that's not all that valuable on its own, because at the end of the day you want to query that data; you want dashboards, visualizations, and alerts from it. And again, OpenTelemetry doesn't provide a backend; instead it plugs into the backends available in the market today.

All right, now in terms of reference architectures, there are two primary ways to deploy the Collector today. The first mode is called agent mode, which basically means it runs as close to the application as possible. That might be a binary alongside your application, it might be a sidecar, it could be a DaemonSet so that it runs on every single host in Kubernetes, for example. But basically it's very close to the app itself. The benefit of this mode is that you're offloading responsibility from the application as quickly as possible: you have the application generate the telemetry data, but you don't have it do any of the processing. The benefits of that are twofold. One, you're not introducing additional load within your application that consumes more CPU, memory, and resources in general. Two, you can have the Collector, which is a single binary, handle all the processing in a generic way; otherwise you have to have those processing capabilities in every single language in your environment, because application instrumentation is language specific at the end of the day, and there are nuances to that in terms of configuration, or even in knowing language-specific semantic conventions.

The other way you can deploy the Collector is as a gateway, or as a service. This would sit at some sort of network boundary, maybe a data center or a region or a realm, whatever your terminology is. In this mode it's usually clustered, so you have more than one instance of the Collector, and you usually have a load balancer in front of it so you can support a whole bunch of load coming in. In agent mode you have one instance that runs right alongside the application; if you're doing a DaemonSet, you're going to have one per host. You would not cluster it that way, because it's using certain ports on that host; you would have port conflicts, and it doesn't actually work. In gateway mode you usually have a cluster, because you want things like high availability, or maybe you're supporting a large number of applications or hosts in your environment and the pure scale of it requires more than one instance.

Both the agent and the gateway can send data wherever you want, so you don't have to use both here; you can choose either one. And at the end of the day, everything in OTel is optional, so you can also choose not to use the Collector at all. You can have the application, via the OTel library for example, send directly to a backend if that solves your business needs. So there's really flexibility and choice every step of the way, depending on your requirements. Personally, I think the Collector is a great component because it offloads a lot of responsibilities and can get you into a vendor-agnostic state a lot faster. You don't even have to change your application instrumentation: if you've already instrumented your apps, and let's say it's not OTel based, then as long as it's a format the OTel Collector can receive, the Collector will handle all the translations for you. You could receive in one format, let's say Prometheus format, and export in a different format, for example OTLP, which is OpenTelemetry's protocol. All those translations happen within the Collector automatically, and that's what makes it vendor agnostic at the end of the day.

Now, there are a couple of specific things I want to drill into that will become more relevant as we get to the demo. I mentioned the Collector is a single binary. That binary is written in Go, a compiled language, so the good news is it supports the major operating systems and you can just pull down the binary if you want to use it. There are other ways to deploy it as well. We're at KubeCon, so I'm sure lots of you are using Docker containers or some container engine, and you may be using Kubernetes; all of that is fully supported. OpenTelemetry was born into the cloud native era, which is great, so it has native support for all these environments, and there is packaging for this built into the OpenTelemetry community. If you look at the documentation you'll find Docker images you can readily pull down, usually customized for what you're using. In the case of Kubernetes, what you'll see soon is that there's packaging that pulls in only the components necessary to support Kubernetes. You could add more if you wanted to, but the goal is to give you native support easily. And again, everything is extensible, so if you want to write your own packaging, that's possible as well.

Now, there's one notion in OpenTelemetry you may have heard of called distributions. OpenTelemetry natively has three primary distributions of the Collector today, but anyone can have a distribution of OpenTelemetry if they want to: an end user, a vendor, or another open source project. The project itself has three, typically called core, contrib, and kubernetes (or k8s). Core has the core capabilities required to use the project on its own. For example, I mentioned that OpenTelemetry has its own protocol, called OTLP; as a result, the Collector has an OTLP receiver and an OTLP exporter, and because those are necessary for the project, they live in the core repository. Contrib is where everything that's not core to the project lives, things that are a little more nuanced or only applicable to a subset of environments. If I want to send data to an endpoint like Zipkin or Jaeger, that's going to be in the contrib repository: it's not core to OpenTelemetry, but it is a capability some end users are going to care about. The reason I'm mentioning core versus contrib is that it will be necessary when we actually configure the Collector coming up, so I want to make sure to cover it. In the case of kubernetes, as I mentioned, the distribution just packages the applicable components, the receivers, processors, and exporters you need to support a Kubernetes environment.

Now, there is a tool available called OCB, the OpenTelemetry Collector Builder, that allows you to build your own distribution. If there are certain components you want to pull out of contrib, and you don't want every single component that exists in contrib, only a subset, you can write a manifest file and create your own distribution. And what you'll see is that a lot of cloud providers, vendors, and open source projects have their own distributions too. A distribution is not a fork: done right, it's basically pulling in the OpenTelemetry components necessary to support a specific environment. Most vendors today have their own distribution, and usually they're only pulling in the components relevant to them. Maybe they have their own vendor-specific exporter, for example; that would be in their distribution, and if you're not using that vendor, you don't need that exporter. Many vendors are actually moving to the OpenTelemetry format natively.
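As a sketch of what an OCB manifest file looks like: the component choices here are hypothetical, and the version numbers are placeholders that would need to match your Collector release (check the builder documentation for current module versions):

```yaml
# manifest.yaml for the OpenTelemetry Collector Builder (OCB)
dist:
  name: otelcol-custom
  description: A custom Collector distribution with only the components we need
  output_path: ./otelcol-custom
receivers:
  - gomod: go.opentelemetry.io/collector/receiver/otlpreceiver v0.100.0
processors:
  - gomod: go.opentelemetry.io/collector/processor/batchprocessor v0.100.0
exporters:
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter v0.100.0
```

Running the builder against a manifest like this compiles a distribution containing only the listed components, which is how vendors and end users produce slimmed-down or specialized Collectors without forking.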
Hopefully, in the future, it'll all just be native OTLP, but not all vendors are there today, so that's one particular reason for a distribution. Or maybe there's some specific thing a vendor supports that's not applicable to other environments; that could be in a distribution as well. Ideally, distributions should not be custom in the sense of containing things that are never going to make it into the OpenTelemetry project, because that would be vendor lock-in again. It would defeat the whole point of "I've instrumented once, I have one data collector, and now I can collect everything." Ideally it should be the same config that OTel provides. It's possible some third party has developed something they haven't been able to get upstream yet and temporarily ships it in their distribution, but long term, everything within a distribution should be available upstream.

All right, let's talk a little bit about OpenTelemetry configuration. I've mentioned several of these already, but there are basically five major classes of Collector components. Receivers get data in. Processors, we talked about those. Exporters get data out. Then there are two more. One is called extensions: these are usually things that do not touch the telemetry data itself but provide some capability you might want. For example, if you want health check capabilities for the Collector, that's an extension. Extensions can also enrich things like receivers and exporters: say I want some way of authenticating a receiver or an exporter, that would live in an extension; say I want something like service discovery, that would be an extension. The extension can be called by a receiver or an exporter, but the extension itself doesn't typically touch the telemetry data; it provides additional capabilities on top.

The newest type of component is what's known as a connector. A connector is unique in that it is both a receiver and an exporter. In OpenTelemetry you build what are called pipelines: I define receivers (how I get data in), processors (what I want to do with that data), and exporters (how I get that data out), and that's a pipeline. I can have multiple of these pipelines. But after I run that flow, let's say I get to the exporter step, maybe I want to reprocess the data again, do something else with it. That's where a connector comes in. For example, say I process something, get a metric out the other end that I care about, and want to do something with that metric: I can export into the connector, and then reprocess the data in a new pipeline where the receiver is that same connector, doing a different thing with the output. There are a bunch of use cases for this, but it's a bit more advanced, so I'm not going to talk too much about it; just know that it's available as a component as well.

How is the OTel Collector configured? YAML. How many people like YAML? Oh wow, a lot. How many people don't like YAML? Yeah, I'm with all of you; I've never liked YAML. So everything is YAML based, and in Kubernetes everything is YAML based too, so you're going to be very familiar with it at the end of the day. The nice thing is, if you understand the components and the structure of the Collector, you basically understand how the YAML configuration works. Every component class I mentioned is a top-level construct in the YAML config: you can see receivers here, and you can see processors and exporters. Very simple. Then every receiver has a name, for example you can see a hostmetrics receiver here, and every component can have some amount of configuration. Of course, that is documented somewhere, so you need to go find that documentation, and then you specify the right configuration and the component will work.

Now, there are a couple of things to note about the Collector configuration specifically. It's really a two-step process. Step one: you must define the components you want to use and configure them the way you want them configured. So if you want receivers, processors, and exporters, you have to have a section for each of those, and you must configure them properly. But just putting a component in the config, just the top-level part you can see highlighted here, does not enable it; it has only defined and configured it. To enable it, you must add it to a service pipeline; that's the second step. You can see at the bottom here it says service, then pipelines. Those pipelines are telemetry-specific: you can see metrics and traces listed here, but logs and other data types would also show up. For metrics, you can see it defines which receivers it's going to call (and clearly the receiver must support metrics, otherwise the config would fail), and then exporters, and you have to specify the ones you want. So configuration here is really a two-step process. If you only define a component outside a service pipeline, say I had this hostmetrics receiver up here but it wasn't listed in any of these pipelines, it's kind of useless; it's not actually doing anything, it's just sitting in the config. So note that you have to do both parts.

Now, how do I find all these configuration options? GitHub readme pages. Going back to the core versus contrib thing: say I care about a specific receiver's config. I would have to know whether it's in the core repository or the contrib repository. Here I'm in contrib; I'd go to whichever component class I care about, here receivers, maybe you want processors or exporters, and then I'd go find the component I'm trying to configure. Here I'm looking at the OTLP receiver.
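The two-step shape described above can be sketched as a minimal config. The components here are illustrative, and the backend endpoint is a made-up placeholder, not something from the talk:

```yaml
# step 1: define and configure components (this alone enables nothing)
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
processors:
  batch:
exporters:
  otlphttp:
    endpoint: https://backend.example.com:4318   # hypothetical backend

# step 2: enable components by referencing them in a service pipeline
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp]
```

Remove `otlp` from the pipeline's `receivers` list and the receiver stays defined but inactive, which is exactly the "defined but not enabled" pitfall described above.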
There's a readme file, and that readme file has a configuration section with YAML examples showing what's required to get started and all the configuration options that exist. So it's a great reference for making sure you're configuring things properly; you just have to check whether the component is in core or contrib.

Now, once you know the configuration, you have to pass it to the Collector. The Collector is a binary, so here you can see multiple examples: one takes the otelcol binary and passes a config flag pointing at a YAML file. You can actually pass more than one file, and you can combine them, so this can get complicated depending on how you want to do it. Those config files can be statically defined or dynamically generated; for example, you can pass environment variables into the YAML file if you want to. And there is something in OpenTelemetry that makes configuration a little more SQL-like: it's called the OpenTelemetry Transformation Language, or OTTL, and it's another way of specifying configuration. A lot of OTTL is still in an alpha state, so I'm not going to talk much about it, but I do think it's a very cool project and something to take a look at. I think in the future many of the processors, if not the entire OTel config, will be based on OTTL, so stay tuned for more information on that.

All right, let's do a quick demo of how to get this working in an environment. I've downloaded the binary to my system; I'm on macOS here, so I have the darwin/arm64 build, and it's the contrib distribution. I'm not using core, I'm using contrib, because it has a lot more cool components. If I just run the binary as-is, it's not going to work, because by default there is no config specified; it doesn't start, and it basically says you need to provide a configuration. So let's go ahead and build something.

As I mentioned, we have to define the components we care about. So I'm going to want receivers: I can say otlp (again, the OpenTelemetry protocol) and I can say hostmetrics; let's use that as an example. Hostmetrics has this notion of scrapers, which are things like CPU, memory, and disk, and let's say I want to enable the memory scraper. Very basic configuration here. Then I'll define exporters, and to keep it simple I'm going to use the debug exporter, which basically logs the output to the console. So I'm not sending the data anywhere specific, but of course you could send this to Jaeger or Prometheus or Elastic or whatever backend you care about; all those configurations are going to be specific to your environment. Now, all I need to specify is receivers and exporters; processors are technically optional, so this is a valid config at the end of the day. But I have only defined and configured these components; I still have to build my service pipelines. So here, let's say I want a metrics pipeline, and I have to define the order in which I want things to happen: let's enable the hostmetrics receiver and send that data to the debug exporter. A very basic config at the end of the day; I'm not doing anything with the data, I'm just taking it in and sending it back out the other end.

Now, the first question might be: is this config valid? The answer is actually no, but you don't know that while you're typing YAML. It turns out the Collector has this validate command, which is pretty cool: it will validate the config file you have and make sure it actually contains what you need. Here it comes back and says the config is not valid: it says that exporter is an invalid key for the service pipeline metrics. Okay, that's because it's exporters, plural, and that matters.
here I have a syntax error now the reason why I'm showing you this validate command is because I don't see a lot of people use it uh if you were to push this to production and you don't run the validate commands The Collector didn't start as you saw right that means it crashed well let's say you had a working collector config you made a modification you push that to production you don't run the validate command you would bring down the collector and you'd start losing Telemetry data it's not great right so you should always validate before you actually deploy this otherwise you'll run into problems okay so we fixed the first typo now it's saying hey there's something wrong with the OTL Receiver right it says that it needs to specify some sort of protocol well maybe it's my first time using the OTP receiver so I know nothing about it so let's go ahead and go find the uh open simetry receiver so I'm in the core repository receivers OTL there's a read me it says well if you define an OTL receiver you must Define protocols and you can choose grpc or HTTP or both well I didn't do that so again I have an invalid configuration so if things are not working on your collector go check the readme files they're very very helpful uh if you see a documentation problem in the readme please file an issue or submit a a PR to kind of fix it uh but the readme file is a great way to see if you're doing things correctly so I can say protocols here and I could protocols and I could say something like HTTP and now let's run it one more time and see if we are valid and no error this time great I at least have a valid config now it's valid but remember I have OTL defined here I didn't Define it in a pipeline OTP is not actually being enabled so it's kind of useless but it is Val config like The Collector will start it's not going to crash on me uh so we have kind of our first step so let's go ahead and run this I can say run it with the config file and now it will start and it is actually 
logging information that is very very relevant for example it tells you what is actually being configured in these info messages and if something was wrong it would actually show warnings and errors in the logs so I highly recommend that you look at the logs as well this is a great way to see if things are working as you expect at the end of the day now the most important line is actually this last one which basically says hey I actually collected some metrics and uh three data points actually and I sent it to the debug exporter now you can't actually see that here because it's not being it's just being summarized as to what's being generated let's make this a little bit easier to to see so we can say uh verbosity debug and just rerun this and then oop unknown metrics verocity oh details yes my bad details see this is why you need to go check the config uh go and so so now that summarized line becomes an actual output that I can read so I configured the metric scraper for memory and you can see it created symmetrics for me used free and inactive and you can see the value of those data points as well so if I were to send this to Prometheus it should receive this data or if I expose a Prometheus endpoint it could be scraped by a Prometheus server so again very easy to get started but this is pretty basic right like no one's probably going to have just one receiver one exporter at the very least you're going to have processors so let's talk about processors because they're kind of uh important and unique here so first there's something known as the batch processor and second there's one called the memory limiter uh I'm going to call these out explicitly because uh it's a little bit buried in the documentation if you look you go to the colle I'm open symmetry iio docs collector if you go to the configuration it will tell you how to like configure so you can see passing the config flag it's showing you different ways of passing environmental variables it tells you about 
receivers processors and exporters we can go to processors and there's a link here that says processors are optional although some are recommended that's interesting let's click that and it tells you actually that you should have the memory limiter first and the batch close to the end and some other specific ones in between well you have to manually configure this so if you didn't read the documentation you might not be aware of this but by default open symmetry doesn't batch so every single like Telemetry thing that's being generated will just export immediately well well at scale that's not ideal you kind of want to batch this stuff up it compresses very very well you want to resour uh limit some of the resource use or network connections that you're making the memory limiter is actually also very important if you don't configure the memory limiter then the collector can eventually consume all the memory and then crash because it's out of memory so setting the limiter in place is very important for production environments otherwise Again The Collector could be in some sort of Crash Loop State and you'll be dropping Telemetry data so I highly recommend configuring both uh the batch processor actually has a default config that's pretty sane so you don't typically have to specify more options um in the case of the memory limiter you have to specify how often you want it to check maybe like 5 Seconds uh and you need to specify at least some sort of limit so I'm going to specify like a 400 megabyte limit and then again that's only to find it and configured it if I want it to actually be enabled I have to put it into a pipeline so I would have to say memory limiter because it recommends putting that first and then batch so that actually brings up another excellent point which is processors that list is executed in the order in which you define it so the memory limiter is being run first the batch processor is being run second if I switch the order it happens in this 
the other order. The processors are the only component where the order you define in the configuration matters; it's not true for receivers or exporters, it's only true for processors, so keep that in mind. Okay, having done that, the first step should be to validate the config; let's make sure it's actually valid and there are no errors. Great, then I can just rerun it, and this is the exact same test. I didn't actually change the telemetry data, so the output is going to be the same other than the values and the timestamps having changed; it's behaving the exact same way, but now I have a more production-like config, configuring things that are recommended by the OpenTelemetry project. One final thing: let's add one more processor that's actually a little bit more interesting. I'm going to add the resource detection processor. I'm going to probably forget the config here... detectors, and system. So this is going to check the system and attempt to add more metadata. I hope it is called resource detector, I'm going to find out in a second, and I want to add it between these two. So, resource detector, and let's see how I did in terms of configuration... nope. Resource detector... processors... what's it called? I forgot... detector... one... yeah, resource detect... oh, oh, thank you, it's just a typo, thanks. Pretty close, still doesn't like me... resourcedetection, thank you, thank you, resource detection, so close. It's been a long day. No underscore? Really? No, not... no underscore, okay. I tested this before and I've already forgotten; this is why checking the documentation is important, and hey, thanks folks. Okay, config. This is only going to do one thing: it's still going to generate the used, free, and inactive metrics, but if you scroll up you're actually going to see that it now contains some metadata that it didn't contain before. That metadata is the hostname and the OS type of my system. Now the resource detection processor can do a lot more than this, it's a very basic example, and it can hook into things like Kubernetes
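The step being fumbled live can be sketched like this (note the processor name is `resourcedetection`, one word, no underscore, which is exactly the typo trap from the demo):

```yaml
processors:
  memory_limiter:
    check_interval: 5s
    limit_mib: 400
  resourcedetection:       # one word, no underscore
    detectors: [system]    # inspects the host and adds host.name, os.type, etc.
  batch: {}

service:
  pipelines:
    metrics:
      receivers: [hostmetrics]
      processors: [memory_limiter, resourcedetection, batch]   # detection between the two
      exporters: [debug]
```

Because processors run in the listed order, the detected resource attributes are attached before batching and export.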
or your Docker daemon or a DaemonSet or anything like that and collect that metadata too. But this is enriching this telemetry data, and honestly this is very important, because that's how you do problem isolation and root cause detection at the end of the day. You want to know why memory is high, or where exactly in your environment it's happening; that's where things like resource detection can be helpful. Now a final example would be showing you how to do CRUD metadata operations: create, read, update, delete type things. You can do that to any of the metadata that's attached here. For example, if there was some personally identifiable information or something that you didn't want to send to a backend, to an exporter, you could actually redact that information or hash it. If you wanted to enrich this from the collector, let's say I want to add a tag for everything that comes through the collector, you can do that as well through processors. So again, pretty easy to get started, but you have to know the syntax, and even I made a mistake here live, right? You have to check the READMEs for this. The OTel documentation I think is pretty rich, so definitely take a look; they have different examples to get you started, but all those receivers, processors, and exporters are in the GitHub repo today; they are not fully up on the doc site. Hopefully that will be fixed at some point here very soon. Okay, so wrapping up here, I have a link to everything that I showed so you can review it afterwards; I'll share the links on the schedule site so you can see it. And then finally, thanks so much, hopefully you found this useful, and hopefully you'll take a look at the book. I actually have a few copies, so we can do that for people that have questions, and there's a promo code for people at KubeCon if you're interested. So thank you very much. [Applause] And I have a couple minutes for
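The metadata operations described above can be sketched with the attributes processor; the specific attribute keys here (`environment`, `user.email`, `client.ip`) are hypothetical examples, not anything from the demo:

```yaml
processors:
  attributes:
    actions:
      - key: environment       # enrich: tag everything passing through this collector
        value: production
        action: insert
      - key: user.email        # redact PII before it reaches any exporter/backend
        action: delete
      - key: client.ip         # or keep the attribute, but only as a hash
        action: hash
```

Each action runs in order against the attributes on the telemetry flowing through the pipeline, so you can combine enrichment and redaction in one processor.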
questions; if you could use the microphones it would be great. Hello. "Hey, so from some of the other talks that we heard, if you want to get metrics at full resolution while sampling traces, we need to run a layered collector. Is that what you would recommend, where we need to run the collector in multiple layers if we want to preserve metrics while sampling traces?" So you're sampling traces but you want all the metrics, is that correct? "Yeah, with 100% fidelity." Yeah, so there is a connector for this that will give you the RED metrics out of the data, and then if you're using something like tail sampling, you'll have to use routing: you have to route every single span within the same trace to the same collector instance. If you're using the collector in agent mode, it actually has a way of routing dynamically, based on the trace ID, to the same instance. You have to enable that in agent mode, and then within the collector you have to enable the connector that will actually generate the metrics for you. If you wanted to, you could separate the metrics and the traces into different collector clusters; that's really only necessary at high scale, or if you really care about one telemetry type over the other and you don't want noisy neighbor problems. For simplicity's sake, I would say doing it in the same collector instance is probably fine for most use cases, but at massive scale or for some corner cases you may need to break that out. "Thanks." Yes? "So, a stupid question: is there any way to debug the processor logic other than using the debug exporter? What I faced was, when I tried to configure the span metrics connector, I wrote completely incorrect OTTL and it didn't export anything, and I needed to figure out why, and I couldn't figure out why it was wrong just by checking the debug exporter." Yep, so you're asking, hey, I want to make sure this thing is
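The layered setup described in this answer might be sketched as two tiers; the gateway hostname is hypothetical, and the exact loadbalancing exporter fields should be checked against its README:

```yaml
# Agent tier: route all spans of a given trace to the same gateway instance.
exporters:
  loadbalancing:
    routing_key: traceID
    protocol:
      otlp:
        tls:
          insecure: true
    resolver:
      dns:
        hostname: otel-gateway.example.local   # hypothetical gateway service name

# Gateway tier: derive RED metrics from 100% of spans *before* tail sampling drops any.
connectors:
  spanmetrics: {}
```

The spanmetrics connector sits between a traces pipeline and a metrics pipeline, so the metrics keep full fidelity even when the traces themselves are sampled downstream.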
working properly, how do I see more of the internals? It depends on which part you care about. For example, if you care more about how the pipelines are constructed, there's a zPages extension, and it'll actually show you the pipelines live; there's actually a way to see the pipeline to make sure it's looking the way that you think it is. There are even sites like otelbin.io where you can take a YAML file and upload it and it will visually show you what your configuration looks like. So if that's your problem, there are ways to do it. But if it's more like, hey, I've configured it a certain way and I'm not seeing the output that I expect, or the data is being dropped somewhere, the debug exporter is one of the easiest ways, I'd say. There was another one, I'm not sure if it's still there, called the tap exporter, and there was a way of tapping a pipeline to see the data flowing through it. I'm not sure if that's still there, but personally I use the debug exporter often when I'm having issues like that. "All right, thank you." Yep, go ahead. "Hi, I have a question about large-scale use cases of OpenTelemetry. Oftentimes it's advised in monitoring systems to push aggregations to the edge as much as possible. Are there either any stateful or stateless aggregation capabilities currently in the collector, and are there plans to add those sorts of capabilities in the future?" Yeah, so we were actually having some of this conversation earlier. I talked a lot about the YAML config of the collector but not necessarily the operational aspects of it. In general, my recommendation would be: if you can run the collector stateless, it's in your best interest. It's way easier, you can just add another instance behind the load balancer, it scales linearly, life is good. But of course there are use cases where that's not possible. If you're doing tail-based sampling, it's really not possible, because you have to make sure all the spans for a trace end up on the same
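Enabling the zPages extension mentioned here is a small config change; the port below is the commonly documented default, but verify against the extension's README:

```yaml
extensions:
  zpages:
    endpoint: localhost:55679   # then browse e.g. /debug/pipelinez for live pipeline views

service:
  extensions: [zpages]
```

Like processors, an extension defined under `extensions:` does nothing until it is also listed under `service.extensions`.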
collector instance; that's stateful at the end of the day. There are stateful aspects of the collector. For example, there are storage extensions that you can configure for exporters or processors, and those are stateful at the end of the day. So one use case might be: I want to make sure that I don't drop telemetry data, so if the collector restarts or something, I need to retry that data. You could use the storage extension here to store that data locally and have it basically checkpoint, pick it back up when the collector comes back online, and then export it back out. You were talking about aggregation; there are aggregation use cases where that's also necessary, if you're doing histograms or certain dynamic things where you want to aggregate things up before you send them out. Where possible, I would say do it in memory, because if you lose it, it's probably not the end of the world. But for some use cases, like logs or compliance environments, losing that data is a compliance violation, and you would have to use something stateful on the collector to make it work. But yes, there are definitely stateful aspects that you can configure; if you can avoid it, I would say avoid it. If you think something is missing, please file an enhancement request and just say, hey, I need this capability; the community is very active and can definitely help you with that as well. Yes? "All right, yeah, this isn't really a configuration question per se, but just kind of the rationale behind using OTel versus other tooling. My company is already using Fluent Bit and Prometheus to get logs and metrics from everything. We've had some internal meetings about whether to start using OTel, but it's kind of confusing talking to some of the senior devs, trying to figure out why we'd be adding one more tool to what we're already doing, or what the advantages of OTel would be over some of the other stuff we're already using." So look, if
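The checkpoint-and-retry pattern described here might look like the following sketch, using the file storage extension with an exporter's persistent sending queue (the directory path and backend endpoint are hypothetical):

```yaml
extensions:
  file_storage:
    directory: /var/lib/otelcol/storage   # hypothetical local path; must be writable

exporters:
  otlp:
    endpoint: backend.example.com:4317    # hypothetical backend
    sending_queue:
      storage: file_storage               # persist queued data so retries survive restarts

service:
  extensions: [file_storage]
```

With the queue backed by storage, data that was queued but not yet exported when the collector crashed is picked back up and exported when it comes back online.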
what you have is working and it meets your business requirements, there's no reason to change, right? So if you're happy with Fluent Bit and Prometheus, fine, no problem. It depends, I would say, on what your use cases are and what you're trying to solve for. One use case that OTel can help with is being vendor agnostic: let's say you want to switch off of Prometheus, or for whatever reason you want to switch off of Fluent Bit; if you have OTel, you can now point it to something else if you wanted to. If you don't have that requirement, don't change, right? It's fine. So it really comes down to what you're trying to solve. I would say if you're using a vendor's proprietary instrumentation or data collection, there's a lot of value in moving to something like OTel, because now you're not locked into that vendor: if you want to send the data somewhere else, you easily can, and if you want to switch vendors, you can easily do it. But you're talking about open source tooling, and OTel supports Fluent Bit; they natively integrate, and you can have Fluent Bit send to the collector if you want. So there's no need to replace what's already working in your environment. If there's some value you get out of OTel that you might not be getting from Fluent Bit, maybe that could be a reason for you; if that's not the case, keep going, you're already using open source tooling. I don't want to get into a religious war on which open source tool to use; they're all great, right? If it's meeting your use cases and you're not locked into a vendor solution, maybe there's not a lot of value being provided. We're at time, so I'm happy to take additional questions, I can follow people out, but I think I'm going to get kicked out, so thank you so much. [Applause]