Transcript for:
Understanding Simultaneous Localization and Mapping

In the last video, we talked about how we could estimate an autonomous vehicle's pose-- that's its location and orientation-- if we already had a model or a map of the environment. In this video, we're not going to have a map ahead of time. We're going to have to build one while simultaneously figuring out where the vehicle is within that map using a process called SLAM, Simultaneous Localization And Mapping. There are many different SLAM algorithms. But they can mostly be classified into two groups-- filtering and smoothing. Filtering, like the extended Kalman filter, or the particle filter, models the problem as an online state estimation, where the robot's state and maybe part of the environment is updated on the go as new measurements become available. On the other hand, smoothing techniques estimate the full robot trajectories from the complete set of measurements, not just the new measurements. And understanding these methods can be more intuitive if we address them under a graph-based formulation. In fact, in recent years, one particular framework posed graph optimization, or more generically, factor graph optimization has become the de facto standard for most modern SLAM software solutions. So in this video, we're going focus on understanding what pose graph optimization is, and why it works. So I hope you stick around for it. I'm Brian. And welcome to a MATLAB tech talk. Let's set up the problem. We have an autonomous vehicle, a robot, that has the ability to move through an environment. We've given it lidar to sense the distances and angles to nearby obstacles. And we've given it a way to dead reckon its relative position over time using odometry. In this case, it uses wheel encoders to count the number of rotations each wheel makes as it drives. And from that, it estimates how far it's gone and how it's turned since its last known position. So the question is, how can we use these sensors to understand what the environment map looks like and figure out the robot's pose over time? Well, let's start with a simple mapping problem. Here I've placed the robot in a rectangular room with a circular obstacle in the corner. We can see this map and see where the robot is, but it has no idea of either. To it, it's just an unknown void. Let's start with a very ideal situation, one in which there is no uncertainty, or errors in the lidar, or odometry measurements. They're just perfect. In this situation, developing a map of the environment is relatively trivial. The robot could take a lidar measurement. And we'd have confidence that what it measured is the real obstacle location. It could save that off in a global map, and then continue on. After driving some distance, which, again, we know precisely because of the perfect odometry, it takes another measurement. Now, the two measurements will align nicely in the global map since there are no errors in our system. And in this way, we can just keep driving all around the environment and create a perfect map, all while knowing exactly where we are at all times. Of course, this scenario isn't realistic. There is error in both the lidar measurement and in the odometry. So there's some uncertainty in the estimated robot pose and in the distances to the measured objects. So let's try to map this room one more time. But this time we'll say that the lidar is still pretty accurate, but there is a large uncertainty in the odometry measurements. I don't know. Maybe one of the wheels keeps slipping. So the robot thinks it's moving in a different direction than it actually is. So let's see how this works out for our robot. We get an initial lidar measurement. And we assume that there's an obstacle there just like before. And then we drive a little bit, except this time the estimated pose is different than the real pose. So the lidar returned some relative distances to the wall that the real robot sees, but we can only assume that it's relative to the estimated robot pose since that's all we know. This places the two measured obstacles in different frames, and not a global environment frame. So they don't line up. If we continue this process as we drive the robot around the room, the error in odometry causes our estimated posed to deviate more from the real pose. And what we're left with in the end is this map of obstacles that doesn't resemble the real environment. So this is one of the reasons why we can't just take measurements of the environment and stick the results into a map. Uncertainty in our system is going to mess it up. But this is where SLAM algorithms can help us. As I said at the beginning, in this video, we're going to focus on pose graph optimization. I want to provide a little intuition behind the algorithm and give you a feeling for how it works without a bunch of math. But if you want to learn more about the mathematics, I've left some great resources in the description that go through it in detail. Just like before, we have our real and estimated robot poses, which at the beginning lie right on top of each other in the real environment map. But now on the right, there's also this blank area. And this is where we're going to build up the pose graph. The first thing our robot does is take a measurement of the environment. This measurement is associated with the current estimated robot pose. And we can add both to the pose graph. So essentially what we're doing is saying that this pose is defined as an x and y location and a rotation angle. And along with it we save off the distances and angles to the sensed obstacles. Now, there's also uncertainties associated with this pose entry, but I'll get to that in a second. So we have our first entry in the pose graph. So let's get the second. Again, the robot drives for a little bit and our estimated pose starts to deviate from the real pose. Another measurement of the environment, which gets associated with the new estimated pose. And this combination is now saved in the pose graph. So we have two poses, each with their own local estimate of where the obstacles are. And even though we don't know where these two poses are in the environment, we do have an idea of about how far apart they are from each other. In our case, this knowledge came from counting the wheel rotations. But the idea behind the SLAM algorithm isn't tied to a specific set of sensors. The relative pose distances could have come from another Internal Measurement Source, like an IMU, or we could have figured out how far the robot moved from other external sources, like GPS or visible odometry. The point is that we have some best guess as to how far away these two poses are from each other, as well as a measure of how confident we are in that guess. In this way, there's a constraint that we can place on the relative distance between these two poses. Ideally, they would stay exactly this far apart since that's our best estimate. But due to our uncertainty in the odometry process, maybe we'd be better off if we moved these two poses around relative to each other. For example, it would be nice if there was a mechanism that would be able to move the two poses around to align the currently misaligned obstacles. Well, there is. And the constraints are the key to doing it. To visualize a constraint I think it's helpful to imagine a rubber bar connecting the two poses. The nominal length of the bar is how far apart we estimate them to be. And with no external forces on these poses, the bar will just keep them at this fixed distance. However, if I hold one pose fixed and move the other pose closer to or further from it, it will compress or stretch the rubber bar. And there will be a force that wants to restore the poses back to their nominal distance. And the strength of the rubber bar depends on how confident we are in the distance estimate. If we have more confidence or we have a really good odometry process, then this is a really strong bar that makes it difficult to change this nominal distance. If we have almost no confidence, then this rubber bar is very weak and provides almost no restoring force. So in pose graph nomenclature, we say that the poses are the nodes of the graph. And the constraint or the rubber is an edge. Of course, this constraint acts in all three pose dimensions, both x and y, and in rotation, always trying to bring it back to its estimated distance. So there's not much we can do with two nodes and one edge. So let's keep going. Just like before, the robot keeps driving along and measuring the environment. And after a measurement, we can take the estimated pose, along with its measurement of the local obstacles, and place it in our pose graph and add a constraint. So now we have three nodes and two edges. And we can continue, this filling out our pose graph one pose at a time until we have something that looks like that incorrect map that we built earlier with the exception that there are these constraints connecting all of the poses together. We still can't do much with this information because you can imagine that all of the bars are at their nominal length, and everything wants to stay right where it is. However, we're at a point where something interesting can happen. Our first pose and the current pose are both observing the same feature in the environment. This means that we can build a new edge, a new constraint between these two nodes. We just need to understand how these two features align to figure out where the two poses have to be relative to each other. And in this example, they would have to be in the exact same location. So now we can add a new constraint one that connects the first and last poses together and closes the loop. This bar has a nominal length of zero since these two poses want to be right on top of each other. And the strength of this rubber bar depends on how confident you are in your external measurements and your ability to associate the two sets of data as the same feature. If you're really confident, like we are in this example, this bar is extremely strong. It really wants to pull these two poses together. So hopefully, you can start to see that a loop closure is the thing that makes all of this work. Without a closure, all of the constraints were just happily being met. But with a loop closure, we have a way to inject tension into this graph. The blue bar wants to pull those two nodes together. And the purple bars all want to keep the relative distances the way that they are. And so allowing this graph to settle to equilibrium, or to balance all of the forces that each of the constraints are imposing, is the optimization part of pose graph optimization. And what's really cool about this is that by optimizing the pose graph, not only do we have a better estimate of the current pose and a better model of the environment, but we also have a better estimate of where the robot was in the past since all of the past poses were updated as well. So we got a lot of value from this one loop closure. Of course, you can see that we don't have a perfect model of the environment or the robot poses. But that's OK, because we can continue to improve our estimate by making more loop closures. The robot can continue to drive, adding new poses from odometry and new loop closures wherever it's able to determine a relationship using external data. And for this example, let's assume we were able to relate these three poses together, even though this area of the room is relatively featureless. At this point, all of the previous constraints are the same as they were before. And until we added these loop closures, the graph was still in equilibrium. But now we've added new loop closures. And this has created more tension, which wants to tug and push on all of the other poses a little bit. And so we need to rerun the optimization again and allow the graph to resettle. In this way, you can see how after each loop closure the graph will continue to improve its global model of the environment and the past robot poses. That is as long as the two poses that you connect are truly looking at the same environment feature. If you make an incorrect association and you build an edge with a constraint that is not real, you can imagine how this would persist in your graph, and continue to skew your optimization process every time you run it. It would always be pulling those nodes in a direction away from the truth. So it's incredibly important that you are confident in any loop closures that you make. In fact, it's better to miss a real loop closure than it is to add a false one. You can always drive around some more and make another closure in the future. But it's harder to remove the bad ones once they're in. Now, it's also important to note that a loop closure requires an absolute measurement of the environment, like we got with lidar, or you could get with a visible camera, or some other environment sensor. You can't close the loop using a relative sensor like an IMU or the wheel encoders because there's no way to relate that information back to an older pose. You may think you're near a past node, but without checking the environment you can't know for sure. So we have our pose graph, and we've optimized it. Now how do we build a map of the environment from all of these data points? Well, there's several different ways to represent an environment model, but I'm going to quickly introduce the binary occupancy grid. The idea behind it is that the environment is broken up into a grid of cells. If you believe the cell is occupied, you set the value to a one. And if it's not occupied, you set it to a zero. Here I'm assuming the entire environment is empty, and then filling in the cells where I believe there is an obstacle. But you could also assume the opposite, where the cells are all occupied, and you set them to zero when your sensor indicates that there's nothing there. You can see that with just ones and zeros we've generated a pretty accurate model of the rectangular room and part of the circular obstacle. There are some holes. And we can fill them in by driving around a bit more. But overall, it looks OK. Now, there's also probabilistic occupancy grids, where the cell doesn't have to be fully occupied or not. And instead, there's a probability that it's occupied between 0 and 1. With this model, you have full confidence in black and white cells. And everywhere else is some shade of gray, depending on your uncertainty. So there we have it, SLAM using pose graph optimization. Hopefully now you can see how a map of the environment and the pose of the robot can be determined simultaneously with this method. And now that we have an occupancy grid map, we can start the process of planning a future trajectory through this environment. The robot no longer has to wander around aimlessly, but can use this map to plan where it wants to go. And we're going to cover some ways to approach planning in the next video. But before I end this one I want to remind you that a great way to learn this stuff is to just practice it and try it out on your own. There's a MATLAB example that uses the navigation tool box called Implement SLAM with Lidar Scans that builds up an occupancy grid map of the environment using just lidar, no relative odometry process required. Essentially, the environment has enough unique features in it that the lidar measurement of every pose can be related back to other past poses just through data association, or feature matching. It's worth checking out and playing around with. And I've linked to another video below that walks through this exact example, if you want more information. That's where I'm going to end this video. If you don't want to miss any other future tech talk videos, don't forget to subscribe to this channel. And if you want to check out my channel, Control System Lectures, I cover more control theory topics there as well. Thanks for watching. And I'll see you next time.