Transcript for:
Physics Informed Neural Networks (PINNs) Overview

Welcome back. So in this video I'm going to introduce this idea of a physics-informed neural network, or PINN, which was first presented in this paper by Maziar Raissi, Paris Perdikaris, and George Karniadakis, and since its introduction it has become one of the workhorse algorithms and ideas in physics-informed machine learning. It's based on a neural network idea that also incorporates known physics, in the form of a partial differential equation, in the loss function. So I'm going to walk you through this, some examples of how it's been extended, some caveats and cautionary tales, and some big success stories as well of what it can do. A traditional approach to using a neural network to predict something like a fluid flow field: you might build a neural network, some kind of deep feedforward network, that predicts the field variables of interest, maybe u, v, w, and p. These would be the x, y, and z velocity field components, and p might be the pressure of the fluid, and those are functions of the spatial coordinates x, y, z and time t. This picture is just a two-dimensional fluid, but you can imagine if I wanted to model a three-dimensional fluid with three velocity components and a pressure in terms of three spatial coordinates and time, I might be able to do that by building a big feedforward neural network and training it on a bunch of examples of flow data; I might be able to learn that mapping from space and time to those outputs u, v, w, and p. Okay, this is kind of the naive approach to using a neural network to predict some physics. What these authors did in this now-classic PINN paper is extend this naive neural network picture, where you have just a simple loss function comparing your predicted fields to your true measured fields. They extend it by computing things like the partial derivatives of these output variables with respect to the input spatial and temporal variables, and this is a really simple but very powerful idea: you can compute these terms using essentially the same automatic differentiation and backpropagation that you would normally use to train a neural network. You can use that same machinery in a modern framework like PyTorch or JAX to compute these partial derivatives of the outputs with respect to the inputs, and once you have those partial derivatives, you can add an additional loss term saying that your partial differential equation should be satisfied. So if you know, for example, that this velocity field should be divergence-free, you can add a term, in purple here, that is literally the divergence of your velocity field, and if it's nonzero it means that that physics is being violated. I have an example here of the Navier-Stokes equations, which is one of the most popular examples used with PINNs. And again, just to reiterate what this big picture does: it takes a standard neural network representation of the field variables you want to predict or estimate, things like velocity fields and pressure fields as a function of space and time; it takes that standard, generic neural network model and computes the partial derivatives that are relevant for the physics in this problem, again using automatic differentiation and backpropagation; and from those computed partial derivatives you can create this additional loss term that tells you whether this known physics is being violated or not.
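To make that core trick concrete, here is a minimal sketch of what computing a PDE residual with automatic differentiation can look like in PyTorch. The network size, the variable names, and the divergence-free example are my own placeholders, not the architecture or code from the paper.

```python
import torch
import torch.nn as nn

# Small MLP mapping (x, y, t) -> (u, v, p); purely illustrative sizes.
net = nn.Sequential(
    nn.Linear(3, 64), nn.Tanh(),
    nn.Linear(64, 64), nn.Tanh(),
    nn.Linear(64, 3),
)

def divergence_residual(xyt):
    """Continuity residual du/dx + dv/dy at the given space-time points."""
    xyt = xyt.clone().requires_grad_(True)
    out = net(xyt)
    u, v = out[:, 0], out[:, 1]
    # Autograd gives the gradient of u and v with respect to (x, y, t);
    # we keep only the spatial components we need.
    du = torch.autograd.grad(u.sum(), xyt, create_graph=True)[0]
    dv = torch.autograd.grad(v.sum(), xyt, create_graph=True)[0]
    return du[:, 0] + dv[:, 1]   # du/dx + dv/dy, ~0 for an incompressible flow

# Physics loss on a batch of points: mean squared divergence.
points = torch.rand(256, 3)
physics_loss = divergence_residual(points).pow(2).mean()
```

The important point is that `divergence_residual` can be evaluated at any point in the domain, whether or not you have a measurement there, which is exactly what the virtual points discussed next exploit.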
Basically, if there are things that should be zero, like the divergence, or if you take the right-hand side of the equation and move it over to the left-hand side so that everything adds up to zero, those become terms in this purple virtual loss function, and what that does is add a penalty when your neural network is violating known physics. So this works really well for systems where we know something about the physics, where we know we're dealing with a fluid flow or a quantum system or an electromagnetic system, and we're still trying to predict our field variables as a function of space and time. It's a really cool idea, and you'll notice that I have these sums labeled: this one is a sum over actual data, so that's training data, actual velocity fields that you have either measured or computed, and then this purple sum is over virtual points. This is one of the cool things about this network: if you know the physics, if you know you're looking at a Navier-Stokes fluid flow, then even if I only have pretty limited data I can still test whether my network is physical, meaning does it satisfy these equations, is this loss function small, and I can evaluate that on test points that are not actually in my training data. That's a huge and powerful idea. Often I've seen these PINNs trained on very small amounts of actual measurement data, with a lot of the training offloaded to these virtual points, where you can evaluate whether the network is physical, because even at a virtual point where I don't have real measurement data I can still check whether my network is conserving mass and momentum. Okay, pretty cool idea. And if we go back to this big picture of physics-informed machine learning, it's pretty clear that PINNs, these physics-informed neural networks, sit here in stage four: crafting a loss function that promotes what we know about the physics. Of course there's some architecture involved, we're computing those partial derivatives and we're modeling field variables as a function of space and time, but I think the real special aspect of PINNs is here in stage four, crafting a loss function. Okay, so back to this picture. Now, a couple of things I want to point out: there are reasons I like this architecture a lot, and there are things that I think you should be aware of, maybe a little bit of a cautionary tale about using an architecture like this. In general I think this is a really clever idea, because it takes something that is a very simple and powerful idea in machine learning, this big feedforward neural network, a big MLP, and it adds a really simple way of incorporating physics into that problem. This loss function, and the fact that we're using automatic differentiation to compute the quantities of interest that go into this virtual PDE loss, is a very clever and also very simple way of adding known physics to the neural network training process. I like that, and I think that's probably one of the reasons this has been so widely adopted, cited thousands and thousands of times and used by tons of researchers across the world: it's a really simple and intuitive way of adding physics into the standard way you would build a neural network to model physics. Another thing I really like, and I mentioned this before, is that this really allows you to work with relatively small data sets.
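Continuing the toy sketch from above (again, the names, sizes, and random stand-in data are hypothetical, not the paper's code), the full training loss is just the data misfit at the few measured points plus the PDE residual at a much larger set of virtual collocation points:

```python
# Combine a data loss at the few points where we actually have measurements
# with a physics loss at many "virtual" collocation points where we only ask
# the PDE to hold. Reuses net and divergence_residual from the sketch above.
data_xyt   = torch.rand(32, 3)        # sparse measurement locations (stand-in data)
data_uvp   = torch.rand(32, 3)        # measured u, v, p at those locations (stand-in data)
colloc_xyt = torch.rand(2048, 3)      # many virtual points, no measurements needed

lam = 1.0                             # weight balancing data fit vs. physics
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(1000):
    optimizer.zero_grad()
    data_loss = (net(data_xyt) - data_uvp).pow(2).mean()
    pde_loss  = divergence_residual(colloc_xyt).pow(2).mean()
    loss = data_loss + lam * pde_loss
    loss.backward()
    optimizer.step()
```

Note that the collocation points can be sampled anywhere in the space-time domain; no measurements are needed there, which is what lets you get away with sparse data.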
You don't necessarily need that much data for the actual data-based loss function; you can use virtual points to evaluate whether your network is physical or not, and so I've seen lots of these networks trained with tons of virtual points and not that much data. I think that's a really, really powerful part of this as well. Now, the thing that, while not exactly a negative, is just a feature of the way this is developed, is that the physics is added as a loss function. That's a strength, because it's very intuitive and easy, but it also only suggests that your physics is being satisfied. These two terms are always going to be fighting with each other; you're always going to have this pull between your possibly noisy data measurements and this virtual term that says your PDE, your physics, should be satisfied. So when you train this thing, it might be a pretty stiff and difficult optimization, and you're almost never going to actually have this purple term be exactly zero. Your physics is not being exactly satisfied by this PINN: mass is not exactly being conserved, momentum is not exactly being conserved, and that may or may not become important in different applications. In tons of applications it is perfectly fine to have a small but nonzero physics loss, and it's still an improvement over not adding that loss at all, but in some applications you might want a more exact enforcement of that physics, maybe through an architectural choice or through an actual constrained optimization, and that's something we've talked about a little bit before and will talk about again. I just want to point that out here: very simple and intuitive, it allows you to work with relatively small data sets, and those are both huge positives, but the fact that this is a loss function means you're only suggesting that the physics is satisfied, you're not enforcing it, and you should just be aware of that. I think that's super important.
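As a small aside on that architectural route, and continuing with the same PyTorch setup as above, here is a hedged sketch (my own illustration, not from the PINN paper) of how a 2D incompressibility constraint can be baked into the network itself: predict a scalar stream function and differentiate it, so the divergence is zero by construction rather than by penalty.

```python
# Hypothetical 2D example of a "hard" constraint: let the network output a
# scalar stream function psi(x, y, t) and define u = d(psi)/dy, v = -d(psi)/dx,
# so that du/dx + dv/dy = 0 holds exactly, by construction, not just in the loss.
psi_net = nn.Sequential(
    nn.Linear(3, 64), nn.Tanh(),
    nn.Linear(64, 64), nn.Tanh(),
    nn.Linear(64, 1),
)

def velocity(xyt):
    xyt = xyt.clone().requires_grad_(True)
    psi = psi_net(xyt)[:, 0]
    grads = torch.autograd.grad(psi.sum(), xyt, create_graph=True)[0]
    u =  grads[:, 1]   # d(psi)/dy
    v = -grads[:, 0]   # -d(psi)/dx
    return u, v        # exactly divergence-free by construction
```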
And I want us to be thinking about what this kind of architecture is going to be good for. Okay, so it's really, really good for estimating things like whole flow fields from fairly sparse sensor measurements; it's a really good way of doing that, and it improves on some of the past methods that we used before PINNs existed, and I'll talk a little bit about that. So data assimilation and flow estimation from limited or sparse measurements, that's a pretty cool application of this. You can picture maybe you have a fusion reactor, some kind of spherical fusion reactor, and you can't put sensors everywhere inside of that reactor because they'll melt, so maybe you only have a few probes on the surface. This could be a pretty cool architecture to take that limited data on the surface, add in the physics loss, because we know the physics is magnetohydrodynamics or some kind of plasma physics, and estimate the internal flow that's most consistent with those measurements and with the fact that our physics is governed by this PDE. That's just one example. It might not be the most useful thing in the world for speeding up your high-fidelity simulations, though; that's something I think you should really be thinking about: how can I use this architecture, and what are ways that I maybe don't want to use this architecture? Good. And so actually one of the best and most interesting papers, published around the same time as the original PINNs paper, is this paper, I think it's called "Hidden Fluid Mechanics," by many of the same authors, in Science. It's a very high-profile paper where they essentially show that if you have relatively limited data from a fluid flow, I think this is some kind of fluid flow in an artery, like a blood flow, I don't know if this is what's called a thrombosis, but it looks like there's some recirculation of the blood flow here, you can take very limited information, and sometimes information that isn't even a velocity field, like a smoke visualization or some kind of movie, and infer the actual quantitative velocity field that comes closest to satisfying the Navier-Stokes equations and satisfying the measurement data that you have. You can see that you actually get relatively good agreement between the PINN reconstruction and the ground-truth reference velocity field, and you can see it quantitatively here. You should download this paper and actually read it; it's a very interesting paper and it tells you how to take other types of measurements and infer these velocity fields. Again, I think that's one of the things PINNs are really, really good for: taking limited sensor data and reading between the lines of what the velocity field should look like to be consistent with your measurement data and with your governing equations. And it's funny, that's actually pretty close to what a Kalman filter does if you think about it: it optimally balances this idea that you want to agree with your data but you also want to agree with your model, and depending on how much you trust your model or your data, you might change the hyperparameter that balances the two. There's also a hyperparameter in PINNs that balances how much you trust your data versus how much you want to enforce your governing partial differential equation. Now, I'll point out that because we're computing those partial derivatives that go into the physics-informed loss function, this is going to have a harder time on systems that are discontinuous, with things like shock waves, or on chaotic, convecting flows where this neural network function of time might be really diverging and chaotic; it's probably not going to be amazing at those kinds of flows. So again, in the back of your mind you have to be thinking: this is not magic, this is a neural network with a custom loss function, and you need to know when it is and is not going to work. But it does really, really cool things in certain circumstances, for things like flow reconstruction from limited measurements. In the past we used a method called 4D-Var; it was a variational method in space and time, that's where the 4D comes from, and it was this horrific physics-based constrained optimization, an optimization constrained by the physics, to again find a flow field consistent with your measurements and with the physics. All of that, in PINNs, is kind of abstracted away from the human; it uses modern machine learning architectures and optimizers, and it typically does about as well in a lot of cases.
So for some of you who remember 4D-Var, you'll see the parallel pretty quickly; for those of you who don't, maybe just do a quick Google on 4D-Var and you'll see roughly how these are related and how they differ. Okay, now I'll put some links in the video description; there are some great resources out there that I'm going to link to. A few really good videos: this one is by Juan Toscano, and I believe this one is by Ben Moseley on his YouTube channel; these are both really good tutorial videos. In this one, Juan actually works through how to train a physics-informed neural network in code, so I would highly recommend it, and there are some great tutorials online for how to code this up in PyTorch. I'll put links to these in the video description, but really it's super important for you to try this yourself. I think that's one of the main points here: if you want to know what this is good for and what it is not good for, you've got to actually try it on a couple of examples and see. Did it take forever to train, or was it fast to train? Did it take forever to execute, or was it fast to execute? Did it generalize? Did I need a bunch of data or a little data? Those are all things you would play around with. Roughly speaking, these methods are good for filling in lots of missing data, or for inferring fields when you have very sparse measurement data, and they're good for speeding up the inference step. Training might be expensive compared to something like CFD or fluid flow simulations, and you in fact need the training data from those high-fidelity sources, but the inference step, once you have a trained model, can be very, very fast, and that's another good use of these methods. Among these resources, again, this video is by Ben Moseley, and he has a really nice blog all about scientific machine learning, with a really cool blog post specifically about PINNs, with code, on how to actually train and test these yourself. I think this is one of my favorites to follow along with, and they contrast PINNs with the classic naive neural network approach where you don't add that physics-informed loss. I believe the example they train on is a really simple one-dimensional physics problem where you basically take a mass on a spring, you let the mass go, and you get this decaying oscillation of your spring-mass-damper system, and there is some pretty sparse training data, these blue points here. These are essentially movies, so go to the website and actually check out the movies, but essentially this shows the training of a naive neural network over training steps, and it basically never trains; it settles into this suboptimal, crummy solution because it can't generalize past the blue data it was trained on, because it doesn't know anything about the physics. In contrast, if you have this physics-informed loss function where you actually say, hey, I know this is an F = ma spring system, then from those same blue training points you can train a physics-informed neural network that is much, much better at generalizing into the future, because it got the physics right, because that was in the loss function. Okay, a really simple example, but it's one you can code up yourself.
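Here's a hedged sketch in the spirit of that example (the constants, network size, and names below are my own stand-ins, not the code from that blog post): a small network u(t) for a damped oscillator m u'' + mu u' + k u = 0, with the physics residual obtained by differentiating the network twice with autograd.

```python
import torch
import torch.nn as nn

m, mu, k = 1.0, 0.4, 16.0              # hypothetical mass, damping, stiffness

u_net = nn.Sequential(
    nn.Linear(1, 32), nn.Tanh(),
    nn.Linear(32, 32), nn.Tanh(),
    nn.Linear(32, 1),
)

def oscillator_residual(t):
    """Residual m*u'' + mu*u' + k*u of the damped oscillator ODE."""
    t = t.clone().requires_grad_(True)
    u = u_net(t)
    du  = torch.autograd.grad(u.sum(), t, create_graph=True)[0]   # first derivative
    d2u = torch.autograd.grad(du.sum(), t, create_graph=True)[0]  # second derivative
    return m * d2u + mu * du + k * u   # should be ~0 if the ODE is satisfied

t_colloc = torch.linspace(0, 1, 200).reshape(-1, 1)
physics_loss = oscillator_residual(t_colloc).pow(2).mean()
```

In training, this residual term would be added to a data loss at the few blue measurement points, exactly as in the fluid sketch earlier.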
You can follow it, it all makes intuitive sense, and it's a little easier than the full PDE version. Now, I will point out, I noticed here that this one is trained for 620 steps and this one is trained for 12,000 steps, so again, that should make you wonder: hey, how much training do I really have to do to get this PINN to converge? But again, maybe the training is an offline computation; maybe you're willing to spend a lot of resources to train this thing as long as it learns the physics and executes quickly in the future. And again, this generalizes way better because it incorporates the physics; that's kind of the upshot here. Good. And there are a lot of cool extensions to PINNs; that's one of the things I like. You know, my lab and my collaborators develop a lot of similar machine learning and physics-informed machine learning methods, and when I think of my favorite methods, things like DMD and SINDy and PINNs, they tend to be highly extensible: they're such a simple idea that you can often add a lot of add-ons to make them more performant for different kinds of scenarios. This is just a very select few examples; out in the literature there are dozens of custom types of PINNs for different types of physics, different types of data, different types of scenarios. One of the ones I think is really cool is fractional PINNs, for partial differential equations that have things like fractional derivatives. What if I have a one-half-power diffusion term, or some hyper-diffusion term, or an integro-differential equation? This happens all the time; not everything is as simple as Navier-Stokes, just mass and momentum conservation with easy integer-order differential operators. Often you have these nasty integro-differential equations with fractional derivatives in your physics. So there's this really cool split architecture, where your traditional PINN terms are handled using automatic differentiation, using the PyTorch or JAX environment, and the other terms, your fractional derivatives, your integral terms, things like that, are handled using traditional numerical discretization schemes, things like finite differences, finite elements, spectral methods, and you combine those in this fractional PINN architecture. Pretty cool idea, and it actually broadens this out to a massive, massive class of PDEs. Another one I really, really like is this idea of Delta-PINNs. This is an idea that's been floating around a lot; I know some of my postdocs and collaborators talk about similar ideas. If you have a geometry, if I know that my physics is evolving on a sphere, or on this spiral, or, let's say I look at the cochlea of an ear, I know that's this spiral shape, so if I know the geometry, the boundary conditions, and where my PDE is living, you actually have a lot of prior information you can bake into the PINN or other machine learning algorithms. Specifically, if you look at the Laplace-Beltrami operator on that geometry, its eigenfunctions give you a very good, natural coordinate system in which to represent solution functions of that PDE. The simplest example, which we've all seen before, is that for something like the heat equation on a rectangular geometry, the eigenfunctions of the Laplacian on that domain are sines and cosines, and we know from the Fourier transform that those sines and cosines are a good coordinate system that linearizes, or simplifies, the heat equation.
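Just to pin down that simplest case with a toy computation (my own illustration, with made-up constants and initial condition): on an interval with zero boundary conditions, the Laplacian's eigenfunctions are sine modes, and expanding the heat equation in them turns it into decoupled, exactly solvable ODEs for the coefficients.

```python
import numpy as np

# Heat equation u_t = kappa * u_xx on [0, L] with u(0) = u(L) = 0.
L, kappa, n_modes = 1.0, 0.1, 20
x = np.linspace(0, L, 200)
u0 = x * (L - x)                                   # hypothetical initial condition

# Project the initial condition onto the Laplacian eigenfunctions (sine modes).
modes = np.array([np.sin(n * np.pi * x / L) for n in range(1, n_modes + 1)])
coeffs = 2 / L * np.trapz(modes * u0, x, axis=1)

def heat_solution(t):
    """Each eigen-coefficient simply decays at its own rate (n*pi/L)^2 * kappa."""
    decay = np.exp(-kappa * (np.arange(1, n_modes + 1) * np.pi / L) ** 2 * t)
    return (coeffs * decay) @ modes
```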
It's a very similar idea, but for more complex geometries: the eigenfunctions of the Laplace-Beltrami operator on an interesting geometry will be a very good coordinate system in which to represent solutions of your PDE. This can be useful in things like cardiac flows: if I take a weird, irregular geometry like my heart, with all the veins and arteries, again I can look at the eigenfunctions of the Laplace-Beltrami operator, and that gives me a good basis in which to represent, and sum, to approximate my PDE solution. There's a lot going on under the hood here that I'm not going to get into, but this Delta-PINN uses that idea to incorporate some notion of the geometry into the representation of the PDE, and you get these improved representations. So here are some cool examples: this is a heat transfer problem with some heat convection on a geometry with specified boundary conditions, and you can see that the regular PINN doesn't agree with ground truth as well as this Delta-PINN, so having some knowledge of the geometry and of the PDE physics lets you do better than having one or the other alone. This is a cool example where they use this method to compute geodesics on this bunny geometry. Geodesics is just a fancy word for the shortest-distance path on a manifold: if I look at the shortest-distance path from my location to, say, Switzerland, it's the great circle that connects those two points on our spherical Earth, if you believe the Earth is in fact a sphere. These geodesics are actually the solution of a certain partial differential equation, which I'm not going to get into, but there is some PDE governing geodesics that you can add as a loss function, and this just shows that when you use these Laplace-Beltrami eigenfunctions for this geometry, you can very accurately get those geodesic solutions. It's a really cool example, and a clever one to benchmark their method on. And I don't really know exactly what this last example is, but I think it's super cool that there is a dragon in their paper, which I like. Okay, good. So those are just two of many, many examples of powerful extensions to this idea of PINNs; again, because it's such a simple idea, you're just adding a loss function that is literally how much the PDE is or is not being satisfied, there are a lot of ways of extending it to other, more complicated systems. But there are some caveats: these don't always train that well, sometimes they're stiff, sometimes they overfit, there are issues. Anybody who has worked a lot with PINNs knows there are ways of massaging them to work, and sometimes it just doesn't work, and that's like every method; no method is a silver bullet, that just doesn't exist. But it's good to understand what the limitations are and to actually study them in a very methodical, process-driven way, and that's one of the things I really, really like about this paper out of Michael Mahoney's group and collaborators, on characterizing the actual failure modes in PINNs. I'm not going to expect you to read all of this.
You should download the paper and check it out, because they have open-source code and you can play with it yourself, but some of the main upshots, the main contributions as stated in the paper, are these: they tried to train PINNs on relatively simple but relevant problems, things like convection, reaction, and reaction-diffusion, simple PDEs, and they basically found that a standard PINN often doesn't train and gives very large error except in really, really simple parameter regimes of these PDEs. So they set up a family of physics PDE problems where they can show that under certain parameter regimes the method fails to train. Then they looked at what happens if you re-balance that PDE loss term: you can crank the PDE loss term up or down, that's a hyperparameter for how important the physics is or is not, and they found that increasing or decreasing that PDE-based soft constraint can actually make the loss landscape more complex to optimize and make it harder to find a good solution in some cases. I think this is a really careful paper; I really like the work by Michael Mahoney and his colleagues, because they are very, very careful in setting up this problem. I don't think they're just criticizing PINNs; they're actually trying to systematically understand when they don't work and fix the problem. So they look at the neural network itself, without the physics-informed loss, and they do in fact find that it has the capacity to represent the solutions, so it's not a capacity issue, it's not the neural network's fault; there's something about having this dueling data and physics loss that makes it stiff, or difficult, or prone to overfitting, in some cases, not all. And again, it's easy to show when something doesn't work, but one of the cool things about this paper is that they actually propose two concrete paths to improve the PINN training process. One of them is curriculum regularization, where they basically slowly increase that physics loss weight from zero up to some large value; that's a very clever and straightforward idea. Their second idea is posing the learning problem as a sequence-to-sequence learning task. So, two concrete ideas for how to actually fix these failure modes, and all of their code is available in PyTorch.
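Purely to illustrate the flavor of that first fix (this is my own sketch of a curriculum-style ramp, not the code from that paper), you can reuse the toy data and physics losses from the sketches earlier and simply ramp the physics weight up over the first part of training:

```python
# Curriculum-style training sketch (illustrative, not the paper's code):
# ramp the PDE weight from 0 to its full value so the network first fits the
# data, then is gradually asked to satisfy the physics as well.
# Reuses net, divergence_residual, data_xyt, data_uvp, colloc_xyt from above.
max_lam, warmup_steps, n_steps = 1.0, 2000, 10000
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(n_steps):
    lam = max_lam * min(1.0, step / warmup_steps)   # linear ramp from 0 to max_lam
    optimizer.zero_grad()
    data_loss = (net(data_xyt) - data_uvp).pow(2).mean()
    pde_loss  = divergence_residual(colloc_xyt).pow(2).mean()
    (data_loss + lam * pde_loss).backward()
    optimizer.step()
```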
So a lot of work has gone into analyzing and understanding how to train these networks, when they will and will not train, when they will generalize, and how to make them more robust. I think that's also really important to know: if you just design a vanilla PINN it might work, but you might run into some issues that other researchers have spent a lot of time and effort trying to diagnose. There's another cool paper I like that essentially looks at the Pareto front. They did a bunch of hyperparameter characterization; this looks to me like a heat equation, so there's some true field, there's a predicted field from a PINN, and there's some amount of error. What they did was look at what happens if you change the hyperparameter that balances the loss function between the data loss and the physics-informed loss: you can sweep through that hyperparameter, make one term bigger or smaller, and you essentially get different error levels in your reconstructed fields. So they did an extensive test, on a relatively simple heat equation example, where they looked at the physics loss and the data loss for different example scenarios, initial conditions, and data as they swept through the loss weight. I forget which end of the sweep corresponds to the data and which to the physics, but one of these regimes makes it so that only your data matters, and the other regime makes it so that only your physics loss matters, and you can see this really interesting landscape of the different amounts of loss. In some cases maybe your physics is being really well satisfied but your data error is large, or maybe your physics is not being satisfied very well but your data loss is small, and there's this clear balance between those two loss functions; again, you're never going to get a perfect result where your physics is exactly satisfied and your data is exactly fit, if there's any error in your data. And again, it's a pretty extensive view of what happens to the relative error for different values of those hyperparameters, and they give a lot of suggestions on how to choose and tune them, so it's also a practical guide; you might want to read it before you train your PINN.
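If you want to see that trade-off on the toy problem from earlier, a sweep is only a few lines. This is again just a sketch with made-up weights, reusing the toy pieces defined above: train an independent copy of the network for each weight and record the final data and physics losses, which traces out the kind of Pareto-style curve that paper describes.

```python
# Loss-weight sweep sketch (illustrative; reuses net, data_xyt, data_uvp,
# and colloc_xyt from the earlier sketches).
import copy

results = []
for lam in [1e-2, 1e-1, 1e0, 1e1, 1e2]:
    model = copy.deepcopy(net)                       # independent copy per weight
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for step in range(2000):
        opt.zero_grad()
        xyt = colloc_xyt.clone().requires_grad_(True)
        out = model(xyt)
        du = torch.autograd.grad(out[:, 0].sum(), xyt, create_graph=True)[0]
        dv = torch.autograd.grad(out[:, 1].sum(), xyt, create_graph=True)[0]
        pde_loss  = (du[:, 0] + dv[:, 1]).pow(2).mean()
        data_loss = (model(data_xyt) - data_uvp).pow(2).mean()
        (data_loss + lam * pde_loss).backward()
        opt.step()
    results.append((lam, data_loss.item(), pde_loss.item()))

for lam, d, p in results:
    print(f"lambda={lam:g}: data loss {d:.2e}, physics loss {p:.2e}")
```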
Okay, that was a mile-high overview of this very important and very widely used physics-informed neural network, very often called a PINN; that's the acronym. Just to recap: it is a really simple and intuitive way of taking a standard, vanilla neural network that predicts some physical fields as a function of space and time, and using the automatic differentiability of these modern machine learning frameworks, PyTorch and JAX and so on, to compute partial derivatives and add a custom loss function that quantifies whether the physics is or is not being satisfied. So, for example, if I know the divergence should be zero for my velocity field, I can literally add a loss term that is the norm of the divergence of those predicted velocity fields, and I can compute it at arbitrary points, so I don't need to evaluate this physics-informed loss on measured data; I can evaluate it at virtual points. That means I can use these PINNs to estimate flow fields and velocity fields even if I have relatively sparse training data. So it's a really cool idea, it's very widely used, and there are lots of extensions to it, but again, it's really important for you to remember that because it is a loss function, you're adding physics into the loss function, you're adding the suggestion that your solution is physical; you're not enforcing that your solution conserves mass or momentum or energy, you're just promoting that physicality through this loss term. And again, sometimes it's pretty stiff, sometimes it doesn't generalize, sometimes it doesn't work for chaotic systems; no method is perfect. But it is a really simple and intuitive way of adding physics into your problem, and I think that's probably one of the reasons it's been so widely adopted in the community. It sits right at that sweet spot: machine learning people love it because it's easy for them, and physics people love it because it's easy to add neural networks into their physics workflows, so it's kind of easy and pretty powerful. Okay, more on this: definitely check out the resources in the description, try to code it up yourself, try to see where it does and doesn't work, and we'll look at other methods soon. All right, thank you.