Welcome back to more molecular dynamics. I am pleased to announce the next speaker, Georg Kresse. Georg Kresse is the head of the VASP Software company and a professor at the University of Vienna. He has dedicated his whole professional life to making VASP the powerful software package that it is today. In his talk he will present a brand new feature of VASP, so I think many of you are already looking forward to hearing about machine-learned force fields. There will also be the chance for a Q&A after the talk, but you can enter your questions in the Q&A window at any time during the talk. The stage is yours.

Yes, thanks a lot for the introduction. The topic is a primer to machine learning; I have changed and tweaked that a little bit, with the focus on VASP of course. I will tell you why we do it, how we do it, and give you a few examples to make this a little more lively and interesting. So, machine learning, and how to train force fields, in short machine-learned force fields: I will first explain why we do this and how we do it, then discuss the flags we have built into VASP and give a few illustrations for these flags, and then, in the second part, depending on the time I have, I will show you some applications to materials science. What I will concentrate on are phase transitions. We will start off with zirconia, which shows a couple of phase transitions, from monoclinic to tetragonal and finally to cubic, and that will give you some idea of what the machine-learned force field actually does, where you have to be a little bit careful, and what to worry about. Then I will possibly make a short excursion to a simpler system, the hcp-to-bcc zirconium phase transition, and finish off with a multi-component system and maybe also talk about chemical potentials in solution. As a kind of appetizer I will show you that you can even learn beyond DFT energies, namely RPA, again for zirconia. I am not sure how much I will talk about this; it depends a little bit on how long the other subjects take me.

So, materials modelling and the Schrödinger equation: I think you have heard a lot about this from Martijn Marsman and all the other people in the VASP group. In principle, what we would like to solve is the many-body Schrödinger equation, which I have, just for fun, written down here. There is a kinetic energy operator, whose Laplacian acts on all the electronic coordinates r1 to rN, and there is the interaction between the electrons as well as the interaction between the electrons and the ions. The complicated thing is that the many-body wave function depends on very many degrees of freedom: if you have 1,000 electrons, that is 1,000 coordinates, and even storing this object is impossible. That is why density functional theory became so extremely popular; it is a quite nice way to obtain approximate solutions of the many-body Schrödinger equation. You have heard about this: you still have a kinetic energy operator, but it now acts on a single coordinate, and the object is now a one-particle wave function, or what we usually refer to as an orbital. The complexity is massively reduced. With most modern codes you now have cubic system-size scaling, and you can do something like 2,000 electrons in a few minutes for a single structure. So it is an incredibly versatile theory. I heard a talk by Volker Heine some 20 to 30 years ago where he called DFT a sledgehammer. He was someone who had thought deeply about the electronic structure problem before density functional theory came around, and in his opinion DFT was a complete change of everything. With DFT you were able to solve pretty much all the problems that were then of interest to people: it is reasonably fast, it is reasonably accurate, and, something that is maybe not so often realized, first derivatives are incredibly fast and simple to calculate using the Hellmann-Feynman theorem. Because you have first derivatives, you can calculate forces and the stress tensor and do molecular dynamics as well as structure prediction. This is still one of my favourite examples, where we simulated liquids without any input but the atomic number: what you see here is essentially the structure factor, which you can compare with experiment, and it is almost spot on, again without any input. But it is also clear that DFT is still a fairly slow theory when applied at finite temperature. Imagine that you want to do something like 100,000 time steps; even with modern computers that will take a month, maybe even a year, to simulate. Also, it is hardly applicable to more than, say, 2,000 electrons, because then the cubic system-size scaling becomes really annoying. So there have been many workarounds, like force fields, cluster expansions, or coarse graining, and now there is the new kid on the block, machine-learned force fields, which I would call the new sledgehammer. We are stepping up from density functional theory and use machine-learned force fields to accelerate the calculations by another factor of one thousand, maybe ten thousand.

Okay, so how do we do this? We essentially want to machine-learn force fields from first-principles calculations, and the basic principle is extremely simple. You first construct a database by doing first-principles, ab initio calculations; typically we will do something like 1,000 structures, and for all of these structures we calculate the energies, the forces, and the stress tensor, as you usually do in VASP, and put this into a small database. The second step is that you choose a representation for the local environment: you impose a cutoff sphere around each atom and calculate what is called a descriptor, which characterizes the local environment around each atom. In the final step you then fit a force field, a finite-ranged force field as a matter of fact, and try to reproduce the initial data that you have produced before. So there are three steps: the database, the choice of the representation of the environment around each atom, and finally the fitting, using either neural networks or regression methods. This is a figure from Deringer, Caro, and Csányi that I have copied here, and it illustrates these three steps: database, features, and finally, in our case, the regression. Concerning the database construction we have been a little bit innovative: we select the structures on the fly, and this is something I will talk about a little later in the talk. I would rather start with step two, that is, how we describe the local environment around each atom. We must map the environment that surrounds each atom onto a set of descriptors that capture how things look around the central atom that we consider. So here is a sketch of the
boron nitride structure, with a boron atom here and a nitrogen atom here, and what you need to describe is the local environment around this boron atom and around this nitrogen atom, up to a certain cutoff distance. How is this usually done? At first people tried a lot of things, but at the end of the day everything boiled down to describing the local environment by two quantities: the pair correlation function and the angular correlation function around the central atom. Looking at the literature, in 2007 Jörg Behler and Michele Parrinello suggested what they call symmetry functions, but if you look carefully at what they really suggest, you find that it is exactly this: a combination of pair correlation descriptors on one side and angular correlation functions on the other. In 2013 Bartók, Kondor, and Csányi suggested something they call the power spectrum; it turns out, if you again look carefully at what this power spectrum is about, that it is again nothing but a combination of pair correlation functions and angular correlation functions. So it is fair to say that these ideas are essentially the same, and that these are the ideas that ultimately everyone is now using. There have been some other ideas, like bag of bonds, the Coulomb matrix, and long-range descriptors, but at this point in time I would say the combination of these two features has more or less survived, and these are the ones that are mostly used now.

So the descriptors are calculated from the classical density distribution around a central atom. You choose the central atom and you expand the surrounding environment, the distribution of the other atoms around the central atom, into a set of descriptors. How is this specifically done? You look at the classical density distribution; we are not talking about electrons here, only about the distribution of the ions around the central atom. You cast this distribution into a basis set, specifically a basis set of spherical harmonics Y_lm times spherical Bessel functions j_l(q_n r). This is very similar to what you might have heard about in other electronic structure codes, where such a basis is used to describe the electrons; here you do not apply it to the electrons but to the distribution of atoms around each central atom. So the density distribution around the central atom is cast into a linear combination of spherical harmonics and Bessel functions, and you could finish here. It turns out, however, that this is not very convenient, because the coefficients c_lmn that you need to expand the classical density distribution around the central atom are not rotationally invariant: if you rotate your coordinate frame, you will get a different set of coefficients c_lmn, and that makes machine learning fairly slow. So the idea is to recast this classical density distribution into a pair correlation function and an angular correlation function. The pair correlation function measures how likely it is to find other atoms at a certain distance r from the central atom; the angular correlation function measures how likely it is to find one atom at a distance r and a second one at a distance s such that both enclose an angle theta.

Okay, so why these descriptors? The pair descriptor is obviously translationally and rotationally invariant: if you shift everything, or if you rotate your system, the likelihood to find a certain atom at a certain distance remains exactly the same. The three-body correlation function, the likelihood to find one atom at distance r and another at distance s at a certain angle, is also invariant under rotation: if you take these two atoms and rotate everything, the likelihood of course remains the same. So both are invariant under rotations, and if the local energy is a function of only these two descriptors, it is guaranteed that the energy remains entirely the same if you, say, rotate the Bravais lattice in VASP; if you rotate these lattice vectors and everything with them, the energy as well as the forces are guaranteed to remain the same. Translations obviously do not matter either, since we only refer to distances between two atoms, or angles between any two atoms, and these will not change if you rigidly translate the entire system. So we preserve all the good properties we already have inside VASP, namely translational and rotational invariance.

So how is this done in practice, how are the calculations done under the hood inside VASP? In the first step, what you need to calculate are the expansion coefficients. As I already said, they are obtained by evaluating the spherical harmonic for the direction of the vector linking two atoms, times the spherical Bessel function evaluated at the corresponding distance, and summing this over all nearest neighbours; this yields the coefficients c_{i,nlm}. This is the first step, but I have already told you that this quantity is not rotationally invariant. So, in a second step, you obtain the pair and angular descriptors from these intermediate quantities. It turns out the pair descriptors are extremely simple to evaluate: you just restrict yourself to the coefficients with l = 0 and m = 0, and you look at these coefficients for each atom; you obtain one value c_{i,n} for each atom and for each radial basis function that you have chosen. So if you have, say, eight basis functions, you obtain eight coefficients; these are essentially synonymous with the likelihood to find a neighbour at a certain distance, and the pair correlation function is obtained from them by a simple transformation. The number of radial basis functions that we typically use inside VASP is between 8 and 16, so you typically obtain eight such coefficients for each atom in your system. The second descriptor is more complicated: it is the angular descriptor, and there is a close relation between the original, non-rotationally-invariant coefficients and this angular descriptor. Again you calculate the angular descriptor for each atom i; it depends on two indices n and ν, where both n and ν run over all radial basis functions, so if you have eight radial basis functions this gives 8 × 8 = 64 values, and in addition there is a dependency on the l quantum number. Essentially, a Clebsch-Gordan-like contraction allows you to calculate, from the original coefficients, these descriptors for the angular distribution around the central atom. Again, there is a one-to-one relation between these coefficients and the angular distribution function, which is given by this relation here.
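To make the spoken formulas a little easier to follow, here they are in compact notation. This is a sketch in my own notation of the quantities just described; normalization factors, the Gaussian broadening, and the radial cutoff function are omitted, so consult the VASP manual or the SOAP/GAP literature for the exact conventions:

```latex
% expansion coefficients of the atomic density around atom i
c_{i,nlm} \;=\; \sum_{j \,\in\, \text{neighbours}(i)} j_l(q_n r_{ij})\, Y_{lm}(\hat{\mathbf r}_{ij})

% pair (radial) descriptor: keep only l = 0, m = 0
c_{i,n} \;\equiv\; c_{i,n00}

% angular (three-body) descriptor: contraction over m, rotationally invariant
p_{i,n\nu l} \;=\; \sum_{m=-l}^{l} c_{i,nlm}\, c^{*}_{i,\nu lm}
```

The contraction over m in the last line is what makes the three-body descriptor invariant under rotations, which the raw coefficients c_{i,nlm} are not.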
Now, how many radial basis functions do we typically have? We typically have eight radial basis functions, which gives 8 squared, so something like 64 coefficients, and l_max is typically chosen to be 4. Furthermore, you always have to be careful when you have different species in your system: for, say, four different species, the total number of these descriptors p_{i,nνl} and c_{i,n} can vary between something like 200 and 1,000. I have written 400 to 2,000 on the slide, but in the present implementation it is more like between 200 and 1,000 coefficients. Okay, so for each atom we calculate this set of coefficients, and we finally assume that the total energy is a sum of local energies: the total energy E is a sum over all atoms i of local energies, and the assumption we make is that the local energy is a function of the local environment. So U_i, the energy of this particular atom, is given by a function that depends on the coefficients we have just obtained, c_{i,n} and p_{i,nνl}. This dependency on the local environment alone is one of the key assumptions one makes in machine learning, and it cannot be exact: we have truncated the local environment, and we only look at what the other atoms are doing up to a certain distance around the central atom. So we assume that the dependency is only within a certain cutoff distance, and this is in essence the key assumption that you make when you train a machine-learned force field. Note also that density functional theory itself never breaks up the energy into local contributions; this is something that the regression model you use at the end of the day will do for you. It breaks up the energy into local contributions, and in fact the machine learning code will try to do this breakup in the best possible way in order to obtain accurate energies as well as forces and an accurate stress tensor.

The forces are in fact also a derivative of this function. We have already established that the local energy is a function of the local environment; the forces are then necessarily given as the derivative of this local function with respect to the positions of the other atoms in the sphere surrounding the central atom. If you wiggle one of the atoms, obviously the total energy changes, and that defines the force on the atom that has been wiggled. So the derivative of the total energy with respect to the positions is given by the sum over all atoms of the derivatives of the local energies with respect to the positions of the atoms. Recall that the local energy is a function of the local environment, so you have to take the derivative of this local environment with respect to the positions; you can do this using the chain rule, by first taking the derivative with respect to, for instance, the coefficient c_{i,n} and then taking the derivative of c_{i,n} with respect to the positions of the atoms surrounding the central atom. Again, it is clear that VASP does not provide anything like local contributions to the forces; this is what the regression, the fitting, has to take care of. Similar equations are obtained for the stress tensor; we also fit the stress tensor, but I will not write down the equations for this.
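In equations, the ansatz and the resulting forces read roughly as follows; again a sketch in my notation, with c_{i,n} and p_{i,nνl} the pair and angular descriptors of atom i:

```latex
E_{\mathrm{tot}} \;=\; \sum_i U\!\big(c_{i,n},\, p_{i,n\nu l}\big)

\mathbf F_j \;=\; -\,\frac{\partial E_{\mathrm{tot}}}{\partial \mathbf r_j}
 \;=\; -\sum_i \left(
   \sum_{n} \frac{\partial U}{\partial c_{i,n}}\,\frac{\partial c_{i,n}}{\partial \mathbf r_j}
 \;+\; \sum_{n\nu l} \frac{\partial U}{\partial p_{i,n\nu l}}\,\frac{\partial p_{i,n\nu l}}{\partial \mathbf r_j}
 \right)
```

The derivatives of the descriptors with respect to the atomic positions are known in closed form, which is what makes the force expression above practical.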
So the energies, the stress tensor, and the forces are simultaneously fitted by one and only one local energy functional U that depends on the local environment surrounding the central atom. How many equations do we have to fit? Let us do a back-of-the-envelope calculation. Say we have done something like 500 electronic structure calculations, and each of these calculations was for a system with 100 atoms; that is pretty much what we typically do. How many fit equations do we then have? We have 500 equations for the energies, so 500 equations that we need to fit. We have 500 × 100 × 3 equations for the forces: each of the 500 structures has 100 atoms, and we get three force components, x, y, and z, so these are 300 forces per calculation, which we have to fit simultaneously for all 500 structures. This gives about 150,000 equations. Finally, we also have six stress components, xx, yy, zz, xy, yz, and zx, that we have to fit for the 500 structures, which gives another 3,000 equations. Just look at this: you see immediately that there is very little information in the energies, only 500 equations, so the bulk of the information is actually contained in the forces, the 150,000 equations that we need to fit. Actually, there is something called gradient-domain learning that does away with the energies completely and only tries to fit the forces. That works pretty nicely, but it does not work if you have phase transitions between completely different phases, say liquid silicon and cubic diamond silicon, because there is no direct connection between the two phases; you cannot construct an adiabatic pathway between both, and then you have to take care to fit the energies as well.

Okay, time for a small wrap-up, to get the main ideas across again and summarize them. The energies and the forces are functions of the local environment around each atom. To each atom we assign an energy, so the total energy is the sum over all the atoms in our simulation box, say 100 atoms, and for each atom we have an energy that depends on the local environment around that atom. The same holds for the forces: we obtain closed equations for the forces from the force field by taking the derivative of this energy with respect to the positions. The local environment is described by a set of coefficients that we can summarize as radial and angular distribution functions: around each atom we know at which distances the neighbours are and what the angular distribution around that atom looks like. We typically have up to 1,000 coefficients, at least for systems with four species; if you have only one species it can go down to something like 200 coefficients. So for each atom we have on the order of 1,000 coefficients that characterize the local environment, and the local energy U is therefore a function of those 1,000 coefficients. That is pretty crazy: this is the function we need to machine-learn, the function we need to determine, and somehow, in a very complex manner, it should depend on these 1,000 coefficients. But that is typically what you do in machine learning: you have a huge set of descriptors, and you want to establish how some quantity depends on a very large set of descriptors.
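To keep the numbers of that back-of-the-envelope estimate in one place:

```latex
N_{\mathrm{energy}} = 500, \qquad
N_{\mathrm{force}}  = 500 \times 100 \times 3 = 150\,000, \qquad
N_{\mathrm{stress}} = 500 \times 6 = 3\,000,
\qquad
N_{\mathrm{eq}} \approx 153\,500
```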
In our case the energy can depend on up to 1,000 coefficients. And what do we have? We typically have 500 energy equations and something like 150,000 force equations that we need to fit simultaneously. So this is what we need to determine, and this is the data we have. Okay, so how can you do this? One way forward is to use neural networks: here are your 1,000 descriptors, the values characterizing the local environment, and you know there should be a relationship between these descriptors and the local energy. This is exactly what you can do with a deep neural network: you can try to train and learn the functional relationship between the 1,000 input values and the output, which is a single energy predicted for that atom. We, however, use regression methods, specifically something called kernel methods. Before I explain this, let me briefly mention why this is really going to be complicated. Imagine we want to do a linear regression. We have a set of 1,000 coefficients for each atom, and the functional dependency on these 1,000 coefficients can be arbitrarily complex. What do I mean by this? Imagine we had just two coefficients, x and y, instead of one thousand. The energy is not necessarily linearly related to these coefficients; the relation can be arbitrarily complex, and it could mean that we actually need a complete basis in these two coefficients to express it. Say our coefficients are x and y; a complete set of functions would be a constant, x, y, x², xy, y², x³, x²y, xy², y³, and maybe also functions of fourth order. But now we do not have two such coefficients, we have one thousand, and we would have to construct all possible functions of these one thousand coefficients and then determine the regression coefficients. That is going to be a really tough problem. Which functions should we choose in this set: only linear functions, quadratic functions, cubic functions? That is a pretty difficult question. So standard linear regression in the space of 1,000 coefficients with an unknown functional relationship is simply not going to work: how on earth are we going to construct a complete set of functions in 1,000 dimensions? The answer is, you cannot. Well, some people have been very smart; I must briefly mention that in particular Shapeev has introduced what are called moment tensor potentials, which astoundingly work pretty well, but that is a pretty smart ad hoc choice of what this functional relationship could be. The other way is neural networks, and that seems to be a way forward, but what we use is a third approach, kernel methods. I will try to explain briefly what these kernel methods are about. They are ubiquitous in machine learning, certainly not our idea, and you will come across them in many areas if you study machine learning a little. I would even say that neural networks and kernel methods are the two branches that currently dominate machine learning, and kernel methods, as I will explain a little later, are generally somewhat more data efficient than neural networks. But let us try to get across how these kernel methods work.

The central idea is that you pick a certain number of atoms from your huge database, say one thousand, and I call these atoms reference atoms. You calculate the local environment around these reference atoms, so you get a set of descriptors for them; I have collected the descriptors into a vector x_ib. Remember that this vector has 1,000 components; it contains both the pair descriptors and the three-body descriptors. So 1,000 coefficients for each of these reference atoms, and we have something like 1,000 reference atoms, which gives us a 1,000 × 1,000 matrix if you want. Now, the idea of the kernel trick is the following: you evaluate the similarity between an environment where you want to know the local energy, let us call it x, and the environments x_ib of the reference atoms, and this is achieved by something called a kernel. The kernel should be one if the two environments are exactly identical, and it should go to zero if they are completely different, completely orthogonal so to say. The fitting is then done by determining a coefficient w_ib for each of these reference atoms. So you still do a linear regression, but you have done away completely with the choice of functions to construct in this one-thousand-dimensional space; instead you pick reference atoms and attach weights to them. Now, this was a rather mathematical concept, so let me give you a graphical illustration of what it does. Assume that we have only a single descriptor x, and our structures are more or less characterized by this single descriptor. Essentially, you introduce a kernel, and the most common kernel is the so-called Gaussian kernel: you measure the distance between the descriptors of the reference atom and the descriptors of the atom for which you want to know the local energy. This distance is zero if both are equivalent, and then the exponential gives one; if both are completely different, far away from each other, the kernel gives essentially zero. I have drawn a one-dimensional Gaussian kernel here for all the data points; these are the reference atoms that we have chosen. Now we perform the regression: we have evaluated an observable, say the local energy, at these data points, and what we do is fit it, which means we determine the weights by adjusting the heights of these Gaussians. We multiply each Gaussian kernel by a weight factor attached to its reference structure, sum these Gaussians, and obtain a surrogate model for the original energy surface. The original energy surface is the orange curve, and the surrogate model we have constructed in this way is the blue curve.

Okay, time for a second wrap-up. I have told you that the energies and forces are functions of the local environment, and that we can take the derivative of the local energy with respect to all positions. The key point in this second part of the talk was that we do not try to establish a closed functional form for U; instead we use the kernel trick to approximate this function. We choose a certain number of reference atoms from the many atoms in our calculations, and we then determine the fitting weights in such a manner that we get a good approximation for the energy.
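Since the one-dimensional picture is easy to put into code, here is a minimal sketch of exactly that construction: reference points, a Gaussian kernel, weights from a least-squares fit, and the resulting surrogate curve. The toy function, the kernel width, and the number of reference points are purely illustrative choices, not anything taken from VASP.

```python
import numpy as np

# Toy "true" local energy as a function of a single descriptor x (illustrative only).
def f(x):
    return np.sin(2.0 * x) + 0.3 * x**2

x_ref = np.linspace(0.0, 4.0, 12)      # descriptors of the chosen "reference atoms"
x_train = np.linspace(0.0, 4.0, 40)    # training points where the observable is known
y_train = f(x_train)

sigma = 0.4                            # width of the Gaussian kernel

def kernel(x, x_b):
    """Gaussian similarity: 1 for identical descriptors, -> 0 for very different ones."""
    return np.exp(-((x[:, None] - x_b[None, :]) ** 2) / (2.0 * sigma**2))

# Each column is one Gaussian centred on a reference point; the weights w_b
# (the heights of the Gaussians) follow from a least-squares fit.
K = kernel(x_train, x_ref)
w, *_ = np.linalg.lstsq(K, y_train, rcond=None)

# The surrogate model (the "blue curve"): a weighted sum of the Gaussians.
x_test = np.linspace(0.0, 4.0, 200)
y_model = kernel(x_test, x_ref) @ w
print("max deviation from the toy energy surface:", np.max(np.abs(y_model - f(x_test))))
```

In the real force field the scalar x is replaced by the roughly 1,000-component descriptor vector of an atom, and the fit runs over energies, forces, and stresses simultaneously.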
Okay, so the total energy is a sum of local energies, and the local energies are expressed by this kernel: here is the local environment of atom i, these are the reference atoms, and the weights are attached to the reference atoms. We can also take the derivative of this equation. That is going to be pretty nasty, and there is a lot of algebra if you actually run through it, but it can be done in a couple of days to derive closed equations. So you can take the derivatives with respect to the positions, and the important thing is that the derivatives are taken with respect to x_i, while the x_ib remain unchanged. That means we also have a linear relationship between the forces, the derivatives of the energy with respect to the positions, and the weights w_ib. So both U, the internal energy, and the forces are linear in the weights w_ib.

Okay, there are still a couple of open questions. How do we choose the structures? I told you before that we typically have 500 structures for which we do the first-principles calculations; how do we choose them? And second, I have not yet told you how we choose the reference atoms. We have something like 500 structures with 100 atoms each, which gives us 50,000 atoms to pick from, and we can select any of them to become reference atoms. So how do we do this? Here comes the key trick, the on-the-fly machine learning. How does it work? We first read in an existing machine-learned force field, if one is already available on disk, and then we make predictions for the local energies and the forces. But we do one more thing: we also predict the uncertainty of our predictions. We can do this because we do not use just kernel ridge regression, we actually use what is called Bayesian regression, and Bayesian regression allows us to predict not only the energies and forces but also how accurate these predictions are. This is called the Bayesian variance, or the Bayesian standard error, and we predict this error for the force acting on each atom. The equation is not particularly important, I have just written it down here; in other terms, it is the diagonal of the covariance matrix. So we predict forces and energies, but also how accurate the forces are. Now, if the accuracy of the prediction is not good enough, if the error is above a certain threshold, then we perform a first-principles calculation; the newly created first-principles energies, forces, and stress tensor are added to our database, and the force field is retrained on the fly. Once we have retrained the force field, we use this new force field to update the atomic positions and perform a new MD step, and after the MD step we go back, predict the potential energy surface and the uncertainties again, and check whether our predictions are accurate enough; if not, we again perform a first-principles calculation. Recall that the error in the force is calculated for each atom in the structure under consideration, and if the error exceeds the threshold for any atom, for any single atom, a DFT calculation is performed. Furthermore, that specific atom becomes a reference atom: it is added to our set of reference atoms x_ib. So those atoms where the force error was particularly large are included in what we call the reference atom basis set.
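Written out as pseudocode, the on-the-fly loop just described looks roughly like this. It is only a sketch of the logic; the callables passed in (predict, bayes_error, run_dft, retrain, md_step) are placeholders for what VASP does internally, not actual VASP routines.

```python
def on_the_fly_md(structure, force_field, database, threshold, n_steps,
                  predict, bayes_error, run_dft, retrain, md_step):
    """Schematic on-the-fly learning loop (a sketch, not the VASP implementation)."""
    for _ in range(n_steps):
        forces = predict(force_field, structure)        # ML prediction
        errors = bayes_error(force_field, structure)    # Bayesian error, one per atom

        if max(errors) > threshold:                     # prediction not trusted
            database.append(run_dft(structure))         # DFT energy, forces, stress
            force_field = retrain(database)             # atoms with large errors also
                                                        # become new reference atoms
            forces = predict(force_field, structure)    # re-predict with updated field

        structure = md_step(structure, forces)          # propagate the ions
    return force_field, database
```

The important point is that the expensive run_dft branch is only taken when the Bayesian error indicates that the current force field cannot be trusted.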
It turns out that with this strategy we need only fairly few first-principles calculations, typically on the order of 1,000, to train very efficient force fields, and the number of reference atomic environments also grows fairly slowly.

Okay, so the key trick is the on-the-fly learning: we simply perform molecular dynamics. This is an old slide that shows how this works. The red line is the predicted error, and the black line is the true error, evaluated by taking the force field available at that step and comparing its predictions with actual first-principles calculations. So the black curve is the true error, while the red curve shows the error estimated from the Bayesian regression, which does not involve any first-principles calculation at that point. Without doing a first-principles calculation we can look at the predicted error from the Bayesian regression, and it turns out that this correlates very well with the actual error. This shows you one thing: the Bayesian error estimate is in fact reliable, a true surrogate for the actual error. What happened here specifically? Here we trained a force field for methylammonium lead iodide (MAPbI3), and what happened is that the methylammonium molecule, this one here, started rotating in its cage. We have a cage of lead and iodine surrounding the methylammonium ion, the ion starts rotating, and as it rotates the Bayesian regression predicts that the errors will increase. That is to be expected, since as the methylammonium rotates it starts to form new bonds with the surrounding lead and iodine atoms, which makes the original force field unreliable, and we need to perform first-principles calculations. The dots here are the cases where we have performed a first-principles calculation: here the error increases and a calculation is done, here it increases again and again a first-principles calculation is performed.

Okay, let us do another wrap-up. At the end of the day, without going into too much detail, we have to solve a simple least-squares problem, a regression problem. This matrix is called the design matrix, these are the weights that we have to determine, and this vector y collects all the desired output values: all the forces and all the energies that we have obtained from the training structures. Again, typically we do something like 500 first-principles calculations, so we have 500 energies in this vector, and then 500 × 100 × 3 forces; I already told you that we typically have about 100 atoms per structure and three force components per atom. We collect all these desired values into one big vector y. The vector w is the quantity that we need to determine; going back a few steps, these are the weights w_ib, where the index runs over all the reference atoms, so the number of components equals the number of reference atoms that we have picked, the ones for which the threshold was exceeded, typically 500, sometimes two or three thousand. The design matrix Ψ contains, for the energies, essentially the kernels, and for the forces the derivatives of the kernels with respect to the position coordinates; since the local energies have to be summed to get the total energy, the corresponding rows also contain a sum over all atoms. The weights w_ib are what we have to determine, and recall that their number is proportional to the number of reference atoms we have picked. So this matrix is pretty large: the number of columns equals the number of components of w, typically 500, and the number of rows equals the number of components of the vector y, that is, 500 plus 500 × 100 × 3. For this design matrix, the row index runs over the 500 energies plus all the forces, and the column index runs over the chosen reference atoms, typically 500. What does this mean? It means this is a hugely over-determined system: about 150,000 equations but only 500 unknowns. You can never solve it exactly; you have to recast it into a minimization problem, where you demand that the norm of the design matrix times w minus y is a minimum. This is what we need to solve at the end of the day, and the solution is easily obtained: it can be formally shown that you get it by multiplying the equation with the transpose of the design matrix on both sides. So you multiply with Ψ transposed to obtain what is called the normal equation; you can look this up anywhere, the normal equation is a pretty standard thing you find in any good textbook. Let us look at how large this equation is. The right-hand side has 500 components, and Ψ transposed times Ψ is only a 500 × 500 matrix; it also boils the original vector y, which had 150,000 entries, down to something with only 500 components, because Ψ transposed is a matrix with 500 × 150,000 entries. So this is immediately and exactly solvable: the final step is to invert this 500 × 500 matrix and multiply it onto the right-hand side to obtain the weights. This is essentially what we do to machine-learn our coefficients w, and there is really nothing magic here; it is the standard solution of a least-squares problem that you can find in any textbook. Now, there is one small hitch, a small modification: this matrix often becomes, and in our case often is, singular, so it has some eigenvalues that are zero and cannot be inverted. To get around this, one regularizes the equation by adding the unit matrix multiplied by a tiny value, typically of the order of 10⁻⁴.
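In formulas, and in my notation (Ψ for the design matrix, w for the weights, y for the collected energies, forces, and stress components), the chain of steps just described is:

```latex
\Psi\,\mathbf w \;\approx\; \mathbf y
\quad\Longrightarrow\quad
\min_{\mathbf w}\;\lVert \Psi\,\mathbf w - \mathbf y \rVert^{2}
\quad\Longrightarrow\quad
\Psi^{\mathsf T}\Psi\,\mathbf w \;=\; \Psi^{\mathsf T}\mathbf y
\qquad \text{(normal equation)}

\big(\Psi^{\mathsf T}\Psi + \lambda\,\mathbb 1\big)\,\mathbf w \;=\; \Psi^{\mathsf T}\mathbf y ,
\qquad \lambda \sim 10^{-4}
\qquad \text{(regularized normal equation)}
```

Here Ψ has roughly 150,000 rows (energies, forces, stresses) and roughly 500 columns (reference atoms), so Ψ^T Ψ is only a 500 × 500 matrix.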
So we replace the normal equation by what is called the regularized normal equation; this is the final thing we solve, and by adding this term the matrix becomes invertible. There is one small catch, however: in many of our studies we find that the regularization we have to use to get the most accurate results approaches machine precision. This equation is solved in pretty much all machine learning codes that use kernel ridge regression, so it is the absolute standard, but the problem we observe is that the condition number of this squared matrix in the normal equation often becomes so large that machine precision is reached. Actually, there is another way to solve the problem, and that is done by formally multiplying with the inverse of the design matrix: Ψ⁻¹ times Ψ gives the unit matrix, and on the right-hand side you have Ψ⁻¹ times y, so formally you can also write the solution as w = Ψ⁻¹ y. There is, however, one problem: this matrix is not invertible in the strict ordinary sense, since it is a rectangular matrix. Its rank is limited by the number of columns, so only about 500, while it has something like 150,000 rows, and such a matrix is not exactly invertible. You have to use what is called the pseudo-inverse, and in practice we do this with a singular value decomposition: we feed the design matrix into an SVD and from that calculate the pseudo-inverse. Okay, I know this might have been a little bit confusing. The first route is the standard machine learning way, used by virtually everyone who does kernel ridge regression: the inversion of the (regularized) normal equation. The second route is a twist we have introduced recently, where we avoid squaring the problem and instead calculate the pseudo-inverse using the singular value decomposition. As I will show you, this second route, which in the case of VASP is only done in post-processing, makes the machine-learned force field a little bit more accurate.
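As a toy illustration of the two solution routes, here is a small, self-contained comparison on a random over-determined system; the matrix sizes, the noise level, and the regularization are placeholders and have nothing to do with an actual VASP fit:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy over-determined least-squares problem standing in for the design matrix
# (rows ~ energies/forces/stresses, columns ~ reference atoms).
n_rows, n_cols = 5000, 200
Psi = rng.normal(size=(n_rows, n_cols))
w_true = rng.normal(size=n_cols)
y = Psi @ w_true + 1e-3 * rng.normal(size=n_rows)     # "training data" with small noise

# Route 1: regularized normal equation (squares the condition number).
lam = 1e-4
w_normal = np.linalg.solve(Psi.T @ Psi + lam * np.eye(n_cols), Psi.T @ y)

# Route 2: pseudo-inverse via singular value decomposition, no squaring.
w_svd = np.linalg.pinv(Psi) @ y       # equivalently: np.linalg.lstsq(Psi, y, rcond=None)

print("normal equation    |w - w_true| =", np.linalg.norm(w_normal - w_true))
print("SVD pseudo-inverse |w - w_true| =", np.linalg.norm(w_svd - w_true))
```

For this well-conditioned toy matrix both routes give nearly the same answer; the difference only matters for the nearly singular design matrices mentioned above, where squaring the matrix in the normal equation doubles the loss of precision.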
Okay, now it is time to delve a little bit into how we do this inside VASP and what parameters you have to choose. It is nice to hear about all the theory, but I think we should look a little bit at the many parameters that are involved and which ones are the most important. I told you that we have to characterize the local environment, so obviously we have to decide on the cutoff. Typically we choose a 5 Å cutoff; you can change this with the parameter ML_RCUT1 in the INCAR file, but in our experience 5 Å is a perfectly acceptable value for most of the studies we have performed, and sometimes we increase it to 6 Å. The second thing is something I have not talked about very carefully: the atomic distribution is, strictly speaking, made up of delta functions, and, following earlier suggestions in the literature, we and many other people replace each delta function by a Gaussian broadening function. The broadening parameter VASP uses by default is 0.5 Å; again, it is a parameter you do not have to worry about, and you can stick to it for most of your studies. The final quantity to choose is the number of radial basis functions: recall that we have spherical Bessel functions, but we have to tell the code how many of them to use. It turns out this does not matter very much; between 8 and 12 is typically the best choice, and the current default is 8 radial basis functions. We also have parameters for the angular descriptors: the radial basis is first used to express the pair distribution function, but we also have to care for the three-body correlation functions, the angular distribution, so there are similar parameters for that, plus one additional parameter, the maximum l quantum number (recall we have Y_lm times the spherical Bessel functions). In virtually all studies we have used l_max = 4. I think there is no need to worry about these defaults; they work 99% of the time. You can increase the number of radial basis functions a little, from 8 to maybe 10 or 12, but that is about the only thing you should do. Of course, if you increase the number of radial basis functions you increase the compute cost and usually only slightly improve the fit quality; beyond 8 we have hardly ever found a significant improvement, but if you are worried you might try it. You can also decrease ML_MRB1, the number of radial basis functions, which obviously improves the speed, but in our experience that is not desirable because it usually worsens the accuracy somewhat; the lowest value you can go to is around 6, so 8 is pretty much the perfect choice. So there are not a lot of things to worry about here; you can more or less leave these parameters as they are and keep them fixed.

There is something you do have to worry a little bit about, and that is the on-the-fly training. We have really tried to make this a black box, but it is something you have to control. Fortunately there is essentially only a single parameter that you may need to change, and that is called ML_CTIFOR, which is admittedly a slightly strange name. It is essentially the threshold for the forces: if this threshold is exceeded, a first-principles calculation is performed, so it decides precisely when the first-principles calculations happen. If the Bayesian error estimate exceeds this threshold, the first-principles calculation is performed. Usually, and we try to make this easy for our users, this threshold is automatically updated; whether the update is done automatically is again controlled by a specific flag in VASP. How is the update performed? The logical way to do it is to consider the previous Bayesian errors, the predicted errors observed in the previous steps. We therefore store the Bayesian errors for typically 10 steps and average them, and this averaged previous error is fed into ML_CTIFOR and becomes essentially the threshold. There is one important parameter to tweak this, and that is ML_CX: if ML_CX is positive, we pick a threshold a little larger than the average of the previous Bayesian errors; if ML_CX is negative, the threshold will be smaller than the previous average error. A larger threshold means you surpass it less often, so fewer first-principles calculations are performed; a smaller threshold, i.e. negative ML_CX, means more first-principles calculations are performed. So this is essentially the parameter you mostly have to worry about: it allows you to tune the number of first-principles calculations. Make it a little bit positive, say 0.2, and you get fewer first-principles calculations; make it a little bit negative, so the threshold becomes smaller, and you get more first-principles calculations. In our experience, leaving it at zero, which is the default, works for almost all the systems we have considered, but it is still something we are fine-tuning and working to improve.

Okay, here is a typical learning run for solid silicon. We heated it from 0 K to 800 K, and this is a plot you need to make in order to check whether everything works well. What is shown in green is essentially the predicted Bayesian error, or rather the maximum Bayesian error found on any of the atoms; this other line is the actual error, the average fitting error in the run; and the blue line shows the threshold that is currently chosen. Again, the threshold is calculated by averaging previous Bayesian errors, so the threshold slowly grows here because the Bayesian error also grows on average, and whenever ML_CTIFOR is updated there is a jump in this criterion. A first-principles calculation is performed whenever the green line is above the blue line. This is a nice run, everything worked well, because the green line keeps going above the blue line and you continue to get first-principles calculations. If it happens that the blue line is always above your green line, learning of course stops, and then you may have to decrease ML_CX to make the threshold a little bit tighter.

Here is another example, where we trained for CO on a rhodium surface. Our initial threshold ML_CTIFOR was 0.05. What you see here is again the average error, the red line; in this case the average error is dominated by the rhodium atoms, on which we had trained before on the clean surface, so the error on the clean surface and on the bulk-like rhodium atoms is pretty small, and therefore the average error is pretty small as well. Then we added CO on the surface. The Bayesian errors are pretty large, because they are the maximum errors on any of the atoms, and in this case they are dominated by the CO, on which we had not yet done any training. What you see is that the Bayesian errors, the green lines, sometimes really jump up; here they jump up tremendously, and this is related to the fact that the CO molecule shifts on the surface, jumping away from the initial hcp site to the top site, or vice versa, I do not recall exactly which simulation this was, but it jumps from one adsorption site to another. That causes a huge predicted Bayesian error, and therefore the machine immediately starts to perform a lot of first-principles calculations in order to enlarge the database and improve the fit. Again, everything here works nicely and smoothly: the threshold behaves well, and we continuously have a few first-principles calculations, not too many, and again you can steer whether you want more or fewer by setting the ML_CX parameter.
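Just to pin down the threshold-update logic described above, here is a small sketch of it in code. The window length of 10 steps and the way ML_CX enters are taken from the description in this talk; the exact formula inside VASP may differ in detail, so treat this purely as an illustration:

```python
def update_ctifor(recent_bayesian_errors, ml_cx=0.0, window=10):
    """Illustrative sketch of the ML_CTIFOR update described in the talk.

    The new threshold is the average of the Bayesian errors stored over the
    last `window` MD steps, scaled up for ML_CX > 0 (fewer DFT calls) or down
    for ML_CX < 0 (more DFT calls).  Not the literal VASP implementation.
    """
    history = recent_bayesian_errors[-window:]
    return (1.0 + ml_cx) * sum(history) / len(history)

# Example: with ML_CX = 0.2 the threshold sits 20% above the recent average error,
# so it is exceeded less often and fewer first-principles calculations are triggered.
print(update_ctifor([0.010, 0.012, 0.011, 0.013], ml_cx=0.2))
```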
Last but not least, how do we typically perform the training? For bulk materials we now always use Langevin constant-pressure dynamics: we set MDALGO = 3, which selects the Langevin dynamics that I think Thomas has already talked about, and we set thermostats (friction coefficients) both for the cell shape and for the atoms; in this case it is a monoatomic system. You also have to be a little careful to use sufficiently large unit cells in order to reduce the volume fluctuations: the smaller the unit cell, the larger the volume fluctuations, and with a larger unit cell the volume fluctuations are comparatively smaller. This is important to remember, so we typically use around 64 to 100 atoms. Finally, be careful to increase the default plane-wave cutoff, because you want to avoid what is called Pulay stress, an error in the predicted stress tensor related to basis-set incompleteness. We run these calculations typically at a cutoff that is 30% larger than what we would use for standard calculations; that is, we increase ENCUT from its default value by typically a factor of 1.3. Now, there is one thing you have to be careful about, and it is a common mistake we see with many students who start with the code: the Bravais lattice vectors might change substantially during the run. For instance, you might start from a perfectly orthorhombic lattice, and after, say, 10,000 steps the system has completely changed its Bravais lattice. If this happens, you can throw away your simulation and your entire machine learning, so you have to take care and watch carefully that the lattice vectors remain essentially unchanged. Why is this so? VASP does not change the basis set during these runs; it is a little bit of a problem for us that VASP is currently unable to adapt the basis set to cell-shape changes. If you encounter this kind of problem, it is better to chop the machine learning up into small fragments, run not more than about 1,000 steps, and re-adapt the basis set by restarting VASP after, say, 1,000 steps. For surfaces, of course, we keep the box fixed: we use Langevin dynamics for the ions but not for the cell shape.
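To make this recipe concrete, here is roughly what the corresponding INCAR could look like for such an on-the-fly NpT training run. This is a sketch only: the tag names follow the VASP version this talk is based on (newer releases use ML_MODE instead of ML_ISTART, for instance), the friction coefficients, temperatures, and the ENCUT value are merely illustrative, and everything should be checked against the manual of your own installation.

```python
# Sketch of an INCAR for on-the-fly training with Langevin NpT dynamics,
# following the recipe described above.  Verify tag names and values against
# the VASP manual for your version before use.
incar = """\
 ENCUT   = 520        ! illustrative value, ~1.3 x the POTCAR default (Pulay stress)
 IBRION  = 0          ! molecular dynamics
 MDALGO  = 3          ! Langevin thermostat
 ISIF    = 3          ! cell shape and volume are allowed to change (NpT)
 LANGEVIN_GAMMA   = 10.0   ! friction for the ions (one value per species), illustrative
 LANGEVIN_GAMMA_L = 10.0   ! friction for the lattice degrees of freedom, illustrative
 NSW     = 1000       ! restart in ~1000-step fragments if the cell changes a lot
 TEBEG   = 100        ! heating run: start temperature (illustrative)
 TEEND   = 2000       !              end temperature  (illustrative)

 ML_LMLFF  = .TRUE.   ! switch on the machine-learned force field machinery
 ML_ISTART = 0        ! train from scratch (1 = continue training from ML_AB)
 ML_RCUT1  = 5.0      ! descriptor cutoff in Angstrom
 ML_MRB1   = 8        ! number of radial basis functions
 ML_CX     = 0.0      ! default threshold update usually works
"""
with open("INCAR", "w") as f:
    f.write(incar)
```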
A few words about files. When you run the machine learning, VASP creates two files. One is called the ML_ABN file; it contains essentially our first-principles database, all the energies and forces that we have collected from the simulations, so it is our fitting database. The second file that is created is the ML_FFN file, which contains the final force field. So this is the database and this is the final force field, and both can be recycled. You can copy the ML_ABN file to ML_AB, read it in, and refit, or continue the training. For instance, if you copy it over and add an additional atom, say a CO molecule on the surface, you can continue your training from the previously learned force field and add additional structures to your data set. So you can start by training on zirconium metal, add oxygen, and continue the training for zirconia; this way you simultaneously obtain a force field for the zirconium metal as well as for the zirconia oxide. You can also retrain without performing any first-principles calculations: here I have set the number of steps to zero and told the code to read in the ML_AB file and refit the force field, for instance with a different number of radial basis functions. So you can read in the database file and refit it, changing the cutoff distance, the broadening parameter, or whatever else you want to change, and obtain a refitted force field.

I told you that we can also use the pseudo-inverse. Usually VASP uses the Bayesian regression, but there is a way, at the end, to switch from the Bayesian regression to the pseudo-inverse: you read in the ML_AB file, that is, the database, and tell the code to replace the Bayesian regression by a singular value decomposition; this is the pseudo-inverse I was talking about, and I will show in a minute that it generally gives slightly better results than the Bayesian regression. So again: you run the training, you get an ML_AB file, and at the end of the day you read the ML_AB file back in and re-parameterize your machine-learned force field using the singular value decomposition. The ML_FFN file is our final force-field file; you can also read that in by setting ML_ISTART accordingly, which tells VASP to rely solely on the machine-learned force field for the subsequent runs, without ever starting a first-principles calculation. This does not allow you to retrain, but you can now apply the force field to a much larger structure, with maybe 300, 1,000, or even 5,000 atoms, and just run molecular dynamics for the particular system you are interested in.

Now I will give you a few demonstrations of what we have done in the past; I will actually be much quicker than I had planned. I will first cover zirconia. Zirconia is an ultra-hard material with excellent high-temperature stability; it can be used, for instance, to make diamond imitations. It is technologically relevant because it is so hard, and it is also formed when zirconium metal is exposed to oxygen; zirconium metal itself is a very important material for reactor claddings, and one thing that always happens is that its surface oxidizes. Here we are interested in the fact that it has a complex phase diagram, and it is challenging to calculate because it is a highly anharmonic material. What we wanted to calculate is the phase transition from the monoclinic phase, which is the low-temperature phase, to the tetragonal phase and finally to the cubic phase. The training was done by heating both the monoclinic structure and the tetragonal structure from low temperatures to high temperatures; in total we performed 592 first-principles calculations, and at the end, once we had acquired the ML_AB file, we retrained with the singular value decomposition, which typically takes about 30 minutes to read in the ML_AB file and recreate the ML_FF file.

Here is a comparison between the Bayesian linear regression and the singular value decomposition. This is the force field you obtain directly from VASP using the Bayesian linear regression, solving the normal equation, and this is the one where we re-parameterized the force field using the singular value decomposition and the pseudo-inverse. What you see are training and test errors for the energy, the forces, and the stress tensor, and what you see immediately is that the SVD, calculating the pseudo-inverse, reduces the errors; not by a lot, but by something like 20 percent. The phonons predicted by the Bayesian linear regression and by the singular value decomposition look quite similar; the reference results, PBEsol in this case, are shown as dashed lines. If you quantitatively compare the phonons, you see that the Bayesian linear regression has somewhat larger errors for all three phases than the singular value decomposition, so clearly the singular value decomposition makes the force field slightly more accurate, and that is why we use it at the end of the day.

So, we wanted to predict the phase transition from the monoclinic phase to the tetragonal phase and finally to the cubic phase. How did we do this? To start, we simply heated the monoclinic phase, the lowest-temperature phase, and indeed at a temperature of about 1,800 K there is a sudden change in the volume that is accompanied by a transition to the tetragonal phase. So just by running the machine-learned force field on the system, and doing the heating sufficiently slowly, you indeed see the phase transition from the monoclinic to the tetragonal phase. If you were to do this with VASP without a machine-learned force field, it would probably take you a year or so to do the required calculations, because you have to heat so slowly. We then continued the heating, and at pretty much 2,400 K, I think it was 2,360 K, we found the second phase transition, to the cubic phase. You see this better here: the tetragonal phase is characterized by one lattice constant c and two lattice constants a, and as you heat, these collapse; the c lattice constant decreases and the a lattice constant increases until they merge into a single lattice constant, which characterizes the cubic phase. Oh sorry, these are the experimental data; here are the data we find from the simulation: pretty precisely at 2,400 K the c and a lattice constants collapse into a single lattice constant, characterizing the cubic phase. Nicely, you can now start cooling: once you are in the cubic phase, you cool the system down, again just running the machine-learned force field, and you observe a reversible phase transition back from the cubic to the tetragonal phase upon cooling; the c and a lattice constants become distinct again, so the system runs back into the tetragonal phase without any barrier. Unfortunately, if you continue cooling, it does not revert back to the monoclinic phase; the tetragonal phase stays stable until you hit zero temperature, so the tetragonal phase is, so to say, a metastable phase and does not revert to the monoclinic phase.

Okay, so how can we accurately determine the phase transition temperature? Of course, one way would be to cool very, very slowly, but that is super inelegant and not very accurate. Instead, we took a different approach and did something that is called thermodynamic integration. The key point here is that the free energy is not an observable, so you cannot calculate it as an ensemble average; I think Thomas has already told you about this. You need a different concept, and the concept we used here is to calculate the free energy essentially analytically at low temperatures. To do this we used what is called the quasi-harmonic approximation, and we obtained the free energy in the quasi-harmonic approximation using phonopy.
So we calculated this essentially for both the monoclinic and the tetragonal phase at low temperatures. That is adequate: if you are at, say, 20 or 50 K, the quasi-harmonic approximation, or the harmonic approximation so to say, is an excellent approximation and you can use it reliably. You essentially just use phonopy to get free-energy estimates, you do the phonon calculations, and it works like a charm; it does not, however, work at higher temperatures. Within the quasi-harmonic approximation we calculated the free energy, this line here between the monoclinic and the tetragonal phase, and the prediction would be a transition temperature of 1200 K. That is not in good agreement with the experiment, so you have to correct for anharmonicity, and anharmonic effects are fairly large here because this is a highly anharmonic material.

Okay, so how did we do this? I already said a little bit about thermodynamic integration. The specific thermodynamic integration we used here is something you find in standard thermodynamics textbooks, and it is even used by experimentalists to measure the free energies of materials. It is an integral from some temperature T0 up to T over the enthalpy, which in our case is equivalent to the internal energy. This is something VASP spits out: it is essentially the average of the energies that VASP calculates. Strictly it is the thermodynamic average of the free energies, "free energies" because what VASP writes includes the electronic entropy, but for our purposes this is essentially the internal energy. So you need to determine the internal energy at a set of temperatures along this path, and then you integrate this enthalpy, or internal energy, divided by the square of the temperature at which you are running, over dT, to obtain the free energy: F(T)/T = F(T0)/T0 - the integral from T0 to T of H(T')/T'^2 dT'.

This is easy to do; it is really straightforward to calculate with a few Python scripts or a few batch scripts. You only need to run zero-pressure simulations with the machine-learned force field at a certain number of temperatures, for both the monoclinic and the tetragonal phase. What VASP, or the machine-learning code, spits out is the internal energy at each of these temperatures; you average it, perform the integral, and you are essentially done: you obtain the free energy for both the monoclinic and the tetragonal phase. You then take the difference, and looking at this graph you predict a transition temperature of 1500 K, which is now in quite nice agreement with the experimental value of about 1400 K.
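A minimal numerical sketch of that temperature integration, assuming the mean enthalpies per cell at each temperature have already been extracted from the zero-pressure runs; the file names, the temperature grid and the reference free energies F0 below are placeholders, not values from the talk.

```python
# Gibbs-Helmholtz integration:  F(T)/T = F(T0)/T0 - integral_{T0}^{T} H(T')/T'^2 dT'
import numpy as np
from scipy.integrate import cumulative_trapezoid

def free_energy_from_enthalpy(T, H, F0):
    """T: temperatures (K); H: mean enthalpy per cell at each T (eV);
    F0: free energy at T[0] (eV), e.g. from the quasi-harmonic calculation."""
    integral = cumulative_trapezoid(H / T**2, T, initial=0.0)
    return T * (F0 / T[0] - integral)

T = np.arange(100.0, 2001.0, 100.0)                # placeholder temperature grid
H_mono = np.loadtxt("H_monoclinic.dat")            # hypothetical files with <E + pV> per cell
H_tet  = np.loadtxt("H_tetragonal.dat")

F_mono = free_energy_from_enthalpy(T, H_mono, F0=-105.0)   # F0 values are made up
F_tet  = free_energy_from_enthalpy(T, H_tet,  F0=-104.6)

dF = F_tet - F_mono
print(f"free energies cross at ~{T[np.argmin(np.abs(dF))]:.0f} K")
```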
Okay, one more thing you can do with machine-learned force fields, and it is a pretty cool feature: you can calculate the thermal conductivity. Some people have done this before, but we have been really, really careful about it. It is actually something that is virtually impossible to do with first-principles calculations alone. Why? You need a local energy model: this here is essentially the equation for the heat flux, and you see that it needs the derivative of the local energy with respect to the ionic positions. That is not available in VASP; there is no such thing as a local energy, so this quantity can never be calculated in VASP directly. But once you have trained the machine-learned force field, you have obtained a local approximation for the energy: the total energy is approximated as a sum of local energies of all the atoms. These you can plug into the standard equation and calculate the heat conductivity from this closed expression (a small numerical sketch of this follows a bit further below).

We did this for tetragonal as well as monoclinic zirconia, and these are our data points. There have been some attempts to do this with first-principles calculations, essentially using FHI-aims; there they were able to break the energy up into local energy contributions because they use localized basis sets, which is not something we can easily do. We do not entirely trust those calculations: I think the error bar on those data is pretty large, you also see a huge amount of noise, and they used a lot of fancy tricks to get those values. With the machine-learned force field it essentially comes out of the calculation directly, and you see here the comparison with experiment; the agreement, I would say, is actually pretty good.

There are a few things one needs to highlight here. The equation I have written down is called the Green-Kubo equation, and this Green-Kubo equation needs a local energy model; it is simply not available if you do not have one. So here the machine learning does more than just fitting: it actually assigns energies to atoms. The total energy from VASP is broken up into local energy contributions, which is something a plane-wave code never does; it is done by the regression model. What does the regression actually do? It tries to give you the best possible local energy model: our model has to depend on the local environment, I have already told you this, and within the boundaries of this local energy model it finds the best fit to the forces and total energies. So it is doing more than just an interpolation. The results are largely independent of the machine-learned force field: we have tried different machine-learned force fields, and the upshot is that the agreement between these simulations is really within five to ten percent.

Okay, it is one hour fourteen, so very quickly: we have done similar things for zirconium. Here the graph again compares Bayesian linear regression with singular value decomposition for the phonons. The agreement is not as nice, essentially because we trained on fairly small unit cells, so the training was perhaps not done as carefully as it should have been; but it was not our goal to get supremely accurate phonon dispersion relations. Here is one important thing I have not shown you before: the elastic constants. What you see is that our machine-learned elastic constants (so with the force field we can also calculate elastic constants) agree very nicely with the DFT-calculated ones, extremely nicely. Well, there are some discrepancies, the largest one for C33, where the error is something like five to seven percent, but otherwise the agreement between the machine-learned force field and the VASP-predicted elastic constants is really exceedingly good. Of course the agreement with experiment cannot be expected to be as great, and indeed C44 is quite off for this particular material, for zirconium. So you can also predict elastic constants using machine-learned force fields.
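Coming back to the Green-Kubo expression from a moment ago: a rough numerical sketch, not the production workflow behind the results above, assuming the instantaneous heat-flux vector has already been written out along the MD trajectory. The file name, time step, temperature, volume and correlation length are all made-up placeholders.

```python
# Green-Kubo: kappa = 1/(3 V kB T^2) * integral_0^inf <J(0).J(t)> dt
import numpy as np

kB = 8.617333e-5                              # Boltzmann constant in eV/K

def heat_flux_acf(J, n_lags):
    """<J(0).J(t)> averaged over time origins; J has shape (n_steps, 3)."""
    n = len(J)
    return np.array([np.mean(np.sum(J[: n - lag] * J[lag:], axis=1))
                     for lag in range(n_lags)])

J = np.loadtxt("heat_flux.dat")               # hypothetical file, one J vector per step, eV*A/ps
dt, T, V = 0.002, 1500.0, 1.0e4               # time step (ps), temperature (K), volume (A^3)

acf = heat_flux_acf(J, n_lags=5000)
t = np.arange(len(acf)) * dt
kappa = np.trapz(acf, t) / (3.0 * V * kB * T**2)        # in eV / (A ps K)
print(f"kappa ~ {kappa * 1.602177e3:.2f} W/(m K)")      # conversion to SI units
```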
But very quickly, back to zirconium: this case is actually a little bit easier than zirconia, because if you heat you have a phase transition from the hcp phase to the bcc phase, and if you cool it is reversed. Here we predicted the phase-transition temperature by very slowly heating and very slowly cooling, four times, and then averaging the transition temperatures we observed in the computer; that gives a value of 1047 K for the transition temperature.

Melting properties: here we used yet another method, the so-called interface pinning method, combined with the machine-learned force field, to predict the melting temperatures of a couple of materials. The interface pinning method works in the following way: you have the crystal on the left side, in the central part you have the liquid, and then, because of the periodic boundary conditions, you find the crystal again. So you simulate a crystal-liquid interface, and using some tricks you can determine when the liquid and the crystal are in equilibrium, and from that you can deduce the melting temperature. There is not a lot to say here; we did this for different functionals. Maybe the intriguing result is that LDA is really dreadful for silicon. SCAN yields the best results, the best agreement with experiment, but even for germanium SCAN underestimates the melting temperature by 200 K. This here is aluminium, the case where PBEsol is best, and magnesium oxide is the case where LDA is actually best. This shows you that density functional theory is really not so easy, so to say: depending on the functional you use and depending on the material you want to consider, you really have to pick the best functional for the material at hand to get the best agreement with experiment. This is something you should always be aware of: density functional theory is an approximation, and there is not yet one single functional that is best. One must say, however, that if you use SCAN you get quite a good compromise for all materials: SCAN, this newer meta-GGA functional, by and large gives you a pretty satisfactory description of the melting temperature of all these materials, but even then the errors can be up to something like 10, or in this case even 20, percent.

Solvation energies can also be calculated. Here we calculated, and I will not give you the details, the solvation energy of the lithium fluoride crystal: how much energy you gain when you dissolve the lithium fluoride crystal and it goes into solution in water. Essentially this is what happens: you have one lithium atom in the water and the fluorine atom embedded in water. We did this again using thermodynamic integration, but for lack of time I refer you to this paper, and I want to give you only one particularly important point that the paper works out. We trained the machine-learned force field only for the case of full interaction, where the fluorine and the lithium fully interact with the water, as well as for the case where the lithium and the fluorine do not interact with the water at all, which means we have pure water. So we train only on lithium fluoride in water and on pure water; that is the only training data we supply. Regardless of that, we perform a thermodynamic integration in which we use fractionally switched-on interactions between lithium fluoride and the water. We have trained only at the end points, where you have pure water and lithium fluoride dissolved in water; we have never done any training for the intermediate points. It turns out that to do the thermodynamic integration we used here, you never need to train on the intermediate data. That is remarkable.
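The switching described here is presumably the usual coupling-constant form of thermodynamic integration, where the free-energy difference is the integral over lambda from 0 to 1 of the ensemble average of dU/dlambda. A minimal sketch, assuming those averages have already been collected from runs at a few intermediate lambda values; the quadrature grid and the numbers are placeholders.

```python
# Coupling-constant thermodynamic integration:  dF = integral_0^1 <dU/dlambda>_lambda dlambda
import numpy as np

# Gauss-Legendre nodes and weights mapped from [-1, 1] to [0, 1]
nodes, weights = np.polynomial.legendre.leggauss(5)
lam = 0.5 * (nodes + 1.0)
w = 0.5 * weights

# hypothetical ensemble averages <dU/dlambda> in eV, one MD run per lambda value
dU_dlam = np.array([-3.1, -2.4, -1.8, -1.1, -0.6])

delta_F = np.sum(w * dU_dlam)
print("lambda grid:", np.round(lam, 3))
print(f"free-energy difference from the lambda integration: {delta_F:.2f} eV")
```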
Last but not least, we can also train using delta machine learning, but I will not show you those results; I have talked about them twice recently, in talks given for Materials Design, and if you are interested you can probably still find those talks online.

Let me put up the acknowledgements. This work was mostly funded by the FWF as well as by Materials Design, the last bit within the ONSET proposal, so I thank them for financial support. I really need to emphasize who did most of the coding of the machine-learning code: that was Ryosuke Jinnouchi, he wrote most of the code. Ferenc Karsai is now in charge of rewriting the code and improving its performance, and he also contributed a lot to the writing of the code while Ryosuke was in Vienna. Many of the studies I have mentioned were performed by Carla; she did the zirconia work. Thank you for listening, but let me put up the summary.

I think it is pretty clear that finite-temperature materials modelling is now at our fingertips. The combination of density functional theory and machine learning can essentially resolve the intricacies of complex materials. One thing that is really remarkable: the amount of data we need for the machine learning is astoundingly small; we typically need 500 to 1000 first-principles calculations for each phase we want to treat. It is clear that machine learning is not a hype; at this point I think everyone knows it is not a hype, and even if it is a hype, it is here to stay. In my view, machine learning is a kind of universal function approximator, and it can even fit relationships that humans are unable to resolve. We humans have very limited capabilities for resolving complicated relationships; we have to train the machine, we have to tell the machine how to establish the relationships, but once we have trained it, the machine is much better at actually resolving them. One thing I want to get across to you: if you do not do machine learning, you are dead in the water. None of these calculations would have been possible without machine learning; or yes, they would have been possible, but they would have taken not 10,000 CPU hours but rather a million CPU hours.

There is one thing I need to warn you of, though: machine learning is totally agnostic, so maybe it has a little bit too little physics in it. So always test, test, test. Also, this is a new code, so the code can always fail and give erroneous predictions. What I mean specifically by "test" is the following. Here we have trained on different phases of silicon, including cubic diamond, hexagonal diamond, BC8, simple hexagonal and so on. The only structure we omitted (of course we could have included it) was ST12, so ST12 was not in our training set. Now, this is our prediction for ST12, which we have not trained on, and this is the proper result for ST12: there is a huge error in the predicted energy for ST12, because we have not trained on it. So machine-learned force fields are not transferable; they have no physics built in; they are totally agnostic, really data based, and they do only what the data tells them to do. If you go outside the regime of validation, you get totally wrong results. So be careful, and specifically always look at the predicted Bayesian errors: usually when we run our code, even without training, so even when it is not kicking in first-principles calculations, we can predict how large the errors are. It is your obligation to watch these errors carefully, and if you find situations where the error grows, you have to extend your training database.
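As a trivial illustration of the kind of "test, test, test" meant here: comparing force-field predictions against fresh DFT reference data for configurations that were deliberately held out of training. The file names, units and error threshold are placeholders.

```python
# Hold-out validation of a machine-learned force field against DFT references
import numpy as np

E_ff  = np.loadtxt("energies_forcefield.dat")   # predicted energies, eV/atom (placeholder file)
E_dft = np.loadtxt("energies_dft.dat")          # DFT reference energies, eV/atom

err = E_ff - E_dft
print(f"MAE  = {1000 * np.mean(np.abs(err)):.1f} meV/atom")
print(f"RMSE = {1000 * np.sqrt(np.mean(err**2)):.1f} meV/atom")
if np.max(np.abs(err)) > 0.01:                  # 10 meV/atom; threshold is arbitrary
    print("large errors -> add first-principles data for this region and retrain")
```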
With that kind of warning I want to finish. Thank you for your attention, and I will now try to answer the questions.

Okay, great, thank you very much, Georg, for this introduction to the machine-learning methods implemented in VASP. Let me start with a very fundamental and basic question: what is the ground truth that is baked into these machine-learned force fields?

Yes, sure. We are talking here about machine-learned force fields that are supposed to be a surrogate model for density functional theory, most of the time. So the way we do it: our ground truth is density functional theory, and the information we pass from density functional theory to the machine-learned force field are the energies, the forces and the stress tensor. So the ground truth is density functional theory, yes.

Thank you for that answer; let us go to the next question. If I have a given temperature and pressure, then the calculation is run in an NpT ensemble. For a liquid phase, or where we have a binary solid solution, should the Langevin gamma be the same, or how do you change it?

I think, yes, good question. I do not know; the ground truth is that I have no clue. You should really have asked this in the previous talk, because Tomáš Bučko wrote that code, but I can give you a few hints. Most importantly, the Langevin gamma is very system dependent, so there is no a priori way to pick the best set of parameters; you have to play around a little. In principle it should not really matter, but choosing a too large friction parameter makes your trajectory very stochastic, so to say, because the stochastic forces also become larger; essentially your molecular dynamics becomes more like a Monte Carlo algorithm, so you do not move smoothly through phase space. So again, there is no optimal choice, it is highly system dependent. If you ask me, I would probably choose the same value for the liquid, because I just cannot be bothered; I would simply use the same value, and I think as a practitioner that is what most people would do.

Yeah, with some testing in place.

Yes, with some testing, but testing is also difficult for these things, right? With machine-learned force fields it is okay; with first-principles calculations it is really tedious. If you start testing the Langevin gamma with first-principles calculations, you use millions of CPU hours for nothing. So I think it is best to first train the machine, and then we can play. But in principle the observables should not depend on the Langevin gamma, as far as I understand, if you are not extremely unlucky.

So maybe one important thing to add there is that if you choose the wrong Langevin gamma, then your system could easily explode; but if it does not explode and you can simulate the material, then the ensemble does not have to be perfect, because you are using the energies to train the machine-learned force field, and it is the production run that needs to sample a good ensemble.

Yes.

But if you refer to the machine-learning training and how you need to set the Langevin gamma there, then you can probably rely mostly on the defaults.

Yeah.
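To make the point about the friction parameter concrete: this is not how VASP implements its thermostat, just a toy one-dimensional Langevin integrator (Euler-Maruyama, unit harmonic well) showing that with a large gamma the velocities decorrelate almost immediately, so the motion through phase space becomes diffusive, almost Monte-Carlo-like. All parameters here are arbitrary.

```python
# Toy Langevin dynamics: dv = (F/m - gamma*v) dt + sqrt(2 gamma kT dt / m) * xi
import numpy as np

def langevin_step(x, v, gamma, rng, kT=1.0, mass=1.0, dt=0.01):
    """One Euler-Maruyama step for a particle in a unit harmonic well (F = -x)."""
    v += (-x / mass - gamma * v) * dt \
         + np.sqrt(2.0 * gamma * kT * dt / mass) * rng.standard_normal()
    x += v * dt
    return x, v

rng = np.random.default_rng(0)
for gamma in (0.1, 10.0):                       # weak vs strong friction (arbitrary units)
    x, v, vel = 1.0, 0.0, []
    for _ in range(200_000):
        x, v = langevin_step(x, v, gamma, rng)
        vel.append(v)
    vel = np.array(vel[1000:])                  # discard a short equilibration period
    corr = np.mean(vel[:-50] * vel[50:]) / np.mean(vel * vel)
    print(f"gamma = {gamma:5.1f}:  velocity autocorrelation after 50 steps = {corr:+.2f}")
```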
Okay, so let us go to the next question: to create a force field, do we start with an initial guess, or when is the force field first applied?

Yes. When you start and you do not yet have a force field available, what the code reads in is the ML_AB file; if there is no ML_AB file, it simply starts from scratch. So if this file is not available, you just start from scratch; the uncertainty is then essentially infinite, so the code will typically perform ten first-principles calculations in a row and create an initial database, so to say. So there is no problem; do not worry if you do not have an initial data set, it will work properly as well. As I said, in this case you set ML_ISTART to 0, which tells the code that there is not yet an initial database, and it will work flawlessly; that is really not a big issue.

Great, I think that answered the question, and thank you very much for this great presentation. I am going to stop the recording at this point.

Yes, thanks everyone, and I wish you a nice evening if you are in Europe, or a nice day if you are in the US. Goodbye, everyone.