Transcript for:
C++ and Quantitative Finance Lecture

Welcome to this presentation on C++ and quantitative finance. I don't want to bore you too much, but just a little bit about myself: I'm now starting my fourth year as a full-time instructor in the Computational Finance and Risk Management program at the University of Washington. It's a terminal master's program, and we also place our students in internships and employment, so it's not just the degree; we are also rated on our placement rate. Before that, I spent about twenty-four years in the finance field, mostly in quantitative development, and I got my start with C++ in about 1998, so some of you will also remember way back when we didn't have some of this nice stuff.

Okay, so what we're going to talk about. First, I'm going to focus more on the needs of end users, and what I mean by that are quants who usually use interpreted languages such as VBA, MATLAB, or Python to develop models, and who end up staying at the office until 9:30 at night waiting for their programs to run. The other case you'd hear about is someone having a model in MATLAB that would, I'm not kidding, take two to three days, and then someone would come along and code it up in C++ and it would run in under five minutes on a standard desktop. So I think what I'm going to talk about today, and C++ in general, would appeal to people like that, and also to people who build financial libraries: option pricing models, risk management models, that sort of thing. That's what I did after I got into C++ programming.

One thing: this presentation is not complicated at all. The theme really is "easy to use," but we're going to get some powerful results out of that; in other words, tools in C++ that we can leverage. I know "leverage" is kind of a cliché term, right? But it really rings true here, in that we can think of the abstractions and new features in C++ as being like the torque [unclear] along the right
there, and so it takes less effort to lift a heavier load. But I think, more accurately, it looks something like that. So the idea is we want to let C++ do the driving as much as we can, kind of like Greyhound, but a lot faster.

At the end of the last decade we really started to see the start of something beautiful, and C++11 was soon to follow. Thank you, John Califor [sic], for a good book; I assign that book to my students on the first day of the term and I have them read the whole book. We'll talk about the math features coming up, but we also got move semantics and, along with that, unique_ptr, and these actually helped make model code not just more efficient but more maintainable and more readable, and development time was more rapid as well. I went back to some of the stuff that I worked on in the previous decade and used some of these new methods, and it really does make a difference. I could talk about that one topic for another hour, but don't worry, maybe another time. Then we also got parallel STL algorithms in C++17. Everybody knows that, but if there ever was an epitome of leverage, it's those: you just add one parameter to an algorithm and you can reap incredible efficiency gains.

The other thing that happened was better availability of decent open-source math libraries. For example, you may have heard of Eigen and Armadillo; these are both quite popular in finance, and they don't just include matrices, vectors, and the basic operations, they also include a lot of advanced decompositions, which are necessary for a lot of financial modeling. More recently, I've only just heard about these new libraries here. I have not tried them, but if they can do what they claim to do, that would also be really great.

And then we have coming attractions. We're finally going to get a date class in C++20. I know that there are other things, like concepts and modules, that people are really, and understandably so, gaga about,
but for quant developers, having a proper date class, finally, is really great, because dates are so important in finance, and this was yet another example of something that we would often have to program ourselves. Well, now we're finally going to get one. Also, as we speak, the SG14 group is meeting and discussing putting linear algebra into probably C++23, so that will also be a welcome improvement.

A few things about Boost: there are some things in there that are very intuitive and very useful, but, to put it politely, there are some other things in there that could be a lot more user-friendly. We'll talk about a few things in the first category toward the end of this presentation.

Okay, so what I want to do is demonstrate. I realize C++11 is no longer that new, but I want to look at an example that is very typical in quant finance: developing a model for an option's price. To be honest, we've got all this great stuff even in C++11, but I'm not seeing it in other curricula or in textbooks. I was doing my little Rodney Dangerfield thing at the beginning there; it's like it gets no respect. But I want to show how we can use these things, which are also very easy to use, and apply them to a very common problem in finance. We will just take the easy case of a European option priced by Monte Carlo, but what we do here could be applied to much more complex options.

In case you don't know what an option is: a European option is a tradable contract that gives the holder the right to buy or sell a share of stock at a predetermined strike price in the future, and, more specifically for European options, at the expiration date. Other types of options, like American options and Bermudan options, you can exercise before the option expires. But even though you can't exercise until the option expires, that option has value, so the options themselves, as many of you know, are traded. So we want to find a price for an
option at the present time, even though the payoffs won't come until the future.

To get this set up: we don't know what's going to happen in the future, but what we can do is project stock prices into the future using random scenarios. To illustrate that, here are just five scenarios; don't worry yet about how it's done. You can see down here toward the left that the current price at t = 0 is $100 a share. Now suppose that we buy a call option with a strike price of $105. At expiration, let's look at what happens. If we look at the blue scenario, we see the terminal stock price is about $120, so that means we have the right to buy the stock at $105 and then we can turn around and sell it at $120 and make $15 a share profit. But that's in the future, and what we need for our pricing model is to discount it back to today, so we discount it all the way back using a discount factor based on the current interest rate.

Now what happens if the terminal stock price is below the strike? In that case the option expires worthless, because it makes no sense: you're not going to pay $105 and sell for $100, unless you're the US government. Sorry, had to go there. But we have to discount that back as well, trivially, to zero. And then it would be the same for the other payoffs, whatever you get. So this sets up the model.

Now how do we use these to price the option? Suppose that the risk-free rate of interest is 1.2% and our time to expiration is four months, so one third of a year. Then the way we compute the option value is we add up all of the payoffs, and we have to account for the zero values. I mean, trivial is trivial, but you'll see why in a minute; we don't
actually have to include them in the computation, but we have to be mindful of them. Then we're going to discount back to today using the interest rate and the time to maturity, or to expiration (you just use a continuous discount factor), and then we take that value and divide by the total number of scenarios, and that gives us our option price, in this case about six bucks.

But in reality, five scenarios is not going to give us a good value for our option price; we're typically going to be looking at about ten thousand, even up to about a hundred thousand, scenarios that we need to generate. As you can imagine, this can lead to rather computationally intensive operations. But the cool thing is, if you plot it all out, it kind of looks like a Jimi Hendrix album cover.

Okay, so let's superimpose the same example that we used before, with the strike price at $105. You can see down here the initial price is $100, so if we end up above that strike price then we have to calculate the payoffs of all those scenarios, and at the bottom there they expire worthless. So that's what we need to do, but before we can do anything we need to first generate one scenario, and once we have that as reusable code, we just use the same thing many times.

To do this we will use the stochastic process that you see there. Let me back up a second. What we're going to do first of all is take that time frame from zero to capital T and chop it up into little increments, Δt. This comes out of what's called stochastic calculus, but it's like calculus in that Δt is an approximation for a differential, a small amount. So we're going to chop it up, and then at each time step we're going to generate the next stock price by using this formula. This formula falls out of the Black-Scholes theory. Some of you might be wondering, "Well, why don't you just use Black-Scholes?" We could
for this one, but I want to show how to use Monte Carlo pricing, and there are a lot of options for which there are no closed-form solutions, so you can use this method for more complex options.

Okay, so we're going to write a class called EquityPriceGenerator, and we're going to take in most of these values at construction, and they'll be stored as member variables. They're pretty self-explanatory, except maybe volatility; if you don't know what that is, it's the standard deviation of movements in the stock price. So we have all that, but you may notice there's one term up there, epsilon sub t, and that's what drives the randomness. What that is, is a draw from a standard normal distribution, and this is where the random feature in C++11 comes in very handy. We will of course have to include our own header file, but we will also have to include the random header so we can use these features.

Now, the way it works in C++ is a two-step process. First we need to create an engine object, and what that will do is generate positive integers that are uniformly distributed. Then we need a distribution object, in this case the normal distribution, and it will apply a transformation of those integers to standard normal values, so each value you get is just the next standard normal random draw. There are a number of different engine classes and a number of different distribution classes in the standard. We will use the Mersenne Twister algorithm, the 64-bit version. This is the most robust engine available in the standard, and it's what my group and I commonly used when I worked in the private sector. And then we will also use the normal distribution, so we need to set that up.

All right, so the place where most of the work will be done is... oh, and by the way, all this sample code, not just what you see here but the entire code,
is available on my GitHub site, and I'll give you the URL at the end of the presentation.

As you probably know, when we generate a sequence of pseudo-random numbers we need a seed value, so that's what we'll have to take in here in this functor. We'll take those generated random stock prices and store them in a vector, v, which you see there.

All right, so this is where the magic starts. We're going to create an object called mtEngine; it takes the seed value in its constructor. So this is our engine. Then we create an instance of the normal distribution called nd. You might notice that there are no constructor arguments for the normal distribution; that's because the default is mean 0, standard deviation 1. You might also note there's a default template parameter: it's double precision.

The next step is to implement this stochastic process. Lambda expressions make this very easy, and you can basically see how it's been implemented here; I don't think I need to go through all that. The next step is to take the current stock price that we see in the market, and that will be the first stock price in the vector for our scenario. So now we have an S sub t minus one and we can start the process. That goes on down here: we're going to iterate through all the time steps, and at each time step we're going to call the iterative stochastic process and get the next price in the scenario path. The equity price argument here is the previous equity price, and to get the draw from the normal distribution we call nd through its function-call operator, and the argument is just the engine object. Every time you call that, it's going to generate a new pseudo-random number taken from a standard normal distribution. It's important to remember that this is all made possible by the random feature in C++11.

So we now have a class that will generate one random equity scenario, but remember we need about
maybe 10,000 of these things. As you might imagine, we've got 10,000 of these things and they don't care about each other, so it's very easy to use task-based concurrency, another feature in C++11. If we go back to our five-scenario example, each generated path is going to be a vector of these prices. To generate them in parallel, we're going to do each one as a task, and that's managed by a future object. The future manages creation of that vector, so the vector is the template parameter. Now, assuming that everything runs fine, when it's done we call get(), and that will get each of those individual vectors of random scenarios. But remember, for a European option all we care about is that last price, so we just call back() on the vector that we get back from the future object, and then, like before, we discount these back and take the average to get the price.

So let's look at how we might do this. We'll create a class called MCEuroOptPricer, and we will use the previous result, the EquityPriceGenerator. Just an aside: another nice feature from C++11 is enum class, and we can use this to define our option type, whether it's a call or a put. Now, if you look at the constructor, a lot of those variables are the same as what we needed for our EquityPriceGenerator, but there are a few extra that we need specifically for pricing an option: the strike price of the option, the option type (put or call), and then there's quantity. Quantity is there in case you're taking a position in more than one option, but for our purposes we'll just assume one, for simplicity, because we're buying an option on one share.

Where most of the work will be done is in this private member function, compute price async, and this is where we'll use task-based concurrency to generate each one of those scenarios. But so that we can compare
runtimes, I've also implemented a non-parallel version, so when we get done with this we can actually compare how much better we do.

Again, this function is where the bulk of the work is done. First we need to create an instance of our EquityPriceGenerator, and the inputs are our member variables, so that's done. We can assume we have a vector of seeds equal in size to the number of scenarios; that's done by this function, generate seeds, and each seed is a distinct integer. Since we're going to be using future classes, we need to include the header, and we're going to have a lot of future objects and need a place to store them, so we use a container, and when in doubt, use a vector; that's what Herb Sutter tells us.

So now we've set it up so that each of these objects will be in a vector, but at this point nothing has happened. We have to give the command to execute each of these tasks in parallel. A lot of you may know this already, but again, I want to show how you can use this to solve a real-world problem. We call std::async, and it takes as its inputs the function that is going to run in parallel and the argument for that function, so we're just going to use the functor on the epg object.

Now, at this point, as I'm sure many of you know, there are some nuances, not necessarily a lot, but there are a few, and some things that you might want to consider about using task-based concurrency. But we will assume that everything runs fine, we get the scenarios generated in parallel, and we are now in a position to iterate through this vector of futures. Like I said, we call the get() function on the future, which returns the vector; we call back() on the vector, which gives us our terminal price; and then we calculate the payoff based on whether it's a call or a put, discount it, and put it into another vector. Then down at the bottom we compute the average, and you
see we get the price. So that's how that works.

Looking at some results: this was run on a Hyper-V 20-core virtual machine, thanks to a friend of mine who works at Microsoft. In the real world, though, a virtual machine like this would probably be pretty commonplace now in a quant shop. You can see in our first example there's really no difference in runtime; that's just monthly time steps, one year, 10,000 scenarios. But as we start increasing the number of time steps and scenarios, and then the value of capital T, you can see we start to get some pretty significant improvements, and as we go even farther we start to converge around 90%.

The upshot here is that we've achieved this with very little effort. I tell my students: don't do more work than you have to. There's all this great stuff now in C++ that makes that possible, but my little rant is that in the teaching materials and books and so forth, they're still back at "let's do new and delete and design our own linked lists" and things like that. Maybe that's good for computer science, but for quantitative development it's not really that useful. Anyway, that's really the upshot, that and the fact that we have not had to deal with any manual spawning or killing of threads, so it's less error-prone, more maintainable, and on top of that it can actually be more efficient. If you don't believe me, read Meyers's Effective Modern C++; that's where that little factoid came from.

Okay, so just a couple of other things to note. We used the simple case of a European equity option, but rest assured there are far more complex options that must be priced in practice. Some of you might be saying, "Why in the Sam Hill is he pricing a 10-year European stock option? Those don't exist in the market." True, they don't, but a lot of other options do. Some of them are, for example, interest rate and foreign exchange swaptions; those can go out
five, ten years. There are hybrid structured derivatives, which also involve interest rates and foreign exchange; those are really fun to work with, and that's where the math gets really interesting. And then there are also guaranteed investment products that are designed by life insurance companies and also have a life insurance component. These things can get very, very complex and often need to be projected out 30 years. So believe me, it can get a lot more complex, but a lot more fun, and, just as an aside, this is the kind of stuff that I worked with before I went into teaching.

As I mentioned, just to recap, there are different types of engine algorithms and distributions available in the standard library, so it's not just Mersenne Twister and the normal distribution; there are others. But for this type of problem, which is based on what's called no-arbitrage pricing theory and Black-Scholes theory, you're going to be using the standard normal, and you want the most robust engine you can find.

Okay, so we've gone through the modern C++ stuff. Oh, just to recap: Mersenne Twister is the most robust engine, and I count seventeen distributions available in the standard library. If you want more information on this, I highly recommend Nicolai Josuttis's standard library book, second edition. It's a great book for teaching; I use it all the time, and back when I was in the private sector I had the first edition. I practically carried that thing everywhere, kind of like when I carried a basketball everywhere when I was a kid.

Moving on, we'll talk a little bit about Boost. As you know, Boost is divided into a lot of different libraries; we'll look at some of the math-related stuff. One of those libraries is the Boost Math Toolkit, and there are two, shall we say, packages (these are all written by different authors, so you could arguably call them libraries as well) that are very intuitive and very useful: statistical distributions, and
that's a bit of a mouthful, so I'll probably say probability distributions, and numerical integration. In addition, there are some other libraries outside the Math Toolkit, and you see them here; we'll talk about those a little bit. I don't have a lot of time to go into detail, but there is again some sample code that I'll provide you at the end.

Okay, so probability distributions in Boost. Each distribution in this library is a class type. Now, I can imagine that some of you are saying, "Why are you talking about probability distributions in Boost when we have them in the standard library?" The answer is that in the standard library they are for random number generation only. In Boost, what we get are the probability density function, the cumulative distribution function, and the quantile function. In reality we really need all four, but I'll talk about that at the end here.

It's very easy to use. You need to include the necessary headers, and here are just a couple of examples, with the t distribution and the normal distribution. Then we can create the objects very easily. How do we determine a t distribution? Degrees of freedom, right? So students_t d1, with the degrees of freedom. Done, easy. And basically the same thing for the normal, with mean and standard deviation like before.

Now, what we're after are those functions that I mentioned, and the way they're implemented in Boost is as generic functions, not member functions, but they are extremely easy to use. If you want the PDF, you call pdf with your distribution object; for example, with d1, the t distribution object we created, you put pdf(d1, x), with the value of x at which you want to evaluate it. Done, easy. Same thing for the CDF, and similar for the quantile function, except we replace x with the percentile value.

This raises a natural question. First of all, I count 34 probability distributions in Boost versus 17 in the standard library, and in addition the standard library gives us
random number generation while Boost gives us the usual functions that we need. Now, will we eventually see a union of all of these functions and distributions somewhere in the standard library? For example, in the R language, which most people are familiar with for math and statistics, for a new distribution to go into either base R or an approved package, it must include all four; it's required, or else it'll be rejected. In practice we need all four, so it would be nice to have it all in one place, with all 34 distributions and all the functions together. But anyway.

The other one that I really like in Boost is numerical integration. There's a package in the toolkit called quadrature; it actually does both numerical integration and differentiation, and it is really easy to use. Suppose you want to calculate an integral using the trapezoidal rule: use the trapezoidal function. What does it take as inputs? It takes the function, as either a function object or a lambda, and then the limits of integration, and you're done. Very easy. There are some additional parameters that have defaults, so I didn't include them, but if you want to increase the number of iterations or set a different tolerance, you can.

Now this raises another question: what if we wanted to write root-finding algorithms? These are in Boost, but again, because it's a different author, it's done completely differently from what you see here, and, I don't want to offend anyone, but it's far more complex than what you see here. But we could do the same thing, and in fact a student of mine and I, over the summer, did some prototyping of the bisection method and Steffensen's method, and we were able to get them to work. We took a few hints from the actual source code for numerical integration in Boost. So again, it's just like that and you're done; you don't have to set up a lot of stuff, and you know how it can be sometimes in some of
these libraries. Anyway, there are some examples of those in the sample code if you want to take a look at them.

Okay, so to close it out, there are three other libraries that I have found to be quite useful for financial modeling. One is the circular buffer library. This is an STL-compliant container; it's much like a std::deque except that it has a fixed capacity, so it's very useful for handling live rate feeds, say from Reuters or Bloomberg. It will fill up to capacity, and then when the next data point comes in it pops off the oldest one and pushes on the new one, and it's very convenient for that purpose.

Another one is the accumulators. These are also STL compliant, and they're ideal for managing data columns because they're equipped with the usual descriptive statistics functions like mean, median, max, standard deviation, and so forth, so those can be useful too.

And then finally there is MultiArray, also STL compliant. It's a templated multidimensional array, and one thing it can be very useful for is lattice models for option pricing. To give you an example, I'm going to show you a binomial lattice for pricing a European option. It looks something like this. It's similar to the Monte Carlo method, except the up and down movements are prescribed, and the probability is also prescribed. But it's not arbitrary; this also comes out of the Black-Scholes theory. So again, if you were to take all three approaches, the Monte Carlo and this example included, they should all converge to the Black-Scholes price. It's similar in that, as you can see, we go out on the tree, just like scenarios in Monte Carlo, and we generate the equity prices. We get to the expiration date and look at the payoff: these are out of the money and these are in the money. Then we use interest rates and the probabilities, going back, to calculate expected values of the option price at each one of these nodes, until we get to our option price
here.

Now, why are these useful? Each one of these nodes is an object. This is a very simple case: you could have a struct that stores the equity price going out and the payoffs going back. But again, there are much more complex options, and in fact that object might be a full class with more calculations that need to be done, along with member functions that do some of those interim calculations. The other thing, too, is that, as with Monte Carlo, you would need more time steps; otherwise you're not really going to converge to anything meaningful.

Okay, so, wow, I was worried about not having enough time, but we're almost there. You can see down at the bottom that this is from a book called Option Theory by Peter James. If you're interested at all in this field, I highly recommend this book; I use it for several of my classes and also used it in practice. And speaking of references, here are a few more that I used for this talk.

If you would like more information on our program (shameless plug), that's our URL. You can also ask me about it if you're interested. We are a top-15 program in the country and we get essentially a hundred percent placement of graduates. After that, just to wrap up, here's the GitHub address for the sample code. If you'd like to contact me, this is my email address at the University; I'm also on LinkedIn, and I will also be around all week, for the rest of the week. I was going to say it's almost over; we're past the halfway mark now. But anyway, I'll be around the rest of the week, so if you see me and you'd like to ask any more questions, that's fine. Conversely, if you work in this field, I'd like to hear from you; just don't give me any proprietary information, because I don't want to go to jail and I don't want you to go to jail. So anyway, if you see me around, in the spirit of leverage, let's leverage the leverage of a beverage and talk a little shop. Thank you very much for attending.

So we do have some
time for questions.

Audience: Hi, thanks for presenting. So much that I see in this field devolves into some form of matrix and linear algebra. Do you have any recommendations or suggestions for how to work with matrices, or linear algebra libraries that you've used? Thank you.

Speaker: Yeah, well, the two that I mentioned, Eigen and Armadillo, are widely used. I personally prefer Eigen, but a lot of people like Armadillo. One thing that's nice about Eigen is that it's all templated, so all you have to do is include the header files, whereas with Armadillo I have run into some problems with linking with some of the libraries, in some complex cases where we were interfacing with another language. Also, the documentation for Eigen is really good. So I hope that answers the question.

Audience: Can you recommend any good sources on what might be called software engineering best practices specifically targeted toward quants?

Speaker: Take my classes, available online. I'm sorry, shameless plug. Well, that kind of gets back to my gripe at the beginning: I'm unable to really find any decent resources on it.

Audience: That's why I asked; I had the same problem.

Speaker: Okay, yeah. But I'm happy to share lecture notes or anything with you, so just drop me a note, say you're the guy who asked, and I'll be happy to.

Audience: Yeah, are you sure you want to do that?

Audience: Hi, thanks for your talk. This is [name unclear] from Bloomberg. Can you go back to the slide where you showed the std::async, the slide where you have the code where you're creating the threads?

Speaker: Okay, so right here.

Audience: Where you create the async, don't you have to specify a launch policy here?

Speaker: That's why I said there are some subtleties and nuances that I just...

Audience: I see. Yeah, because I just talked to Anthony Williams today and I think from C++14 this syntax is invalid, I mean undefined behavior, according to the standard.

Speaker: I've run it on a 14 and a 17 compiler, no problem.

Audience: I see.

Speaker: Yeah, it's just [unclear], but if you have, you know, if there's
something I can be aware of, you've got my email address.

Audience: Cool, thank you.

Speaker: I'd like to hear about it. Thank you.

Audience: I have a follow-up comment on this slide especially. So, first question, or comment phrased as a question: you created something like ten thousand or a hundred thousand threads...

Speaker: Mm-hmm.

Audience: ...in the for loop. Don't you have thread congestion on your virtual machine?

Speaker: It was just 24 cores. I didn't have any problem, and that's what I say is the beauty of task-based concurrency, because you don't have to worry about it. I've just never had a problem with it.

Audience: Okay. Oh, yeah, because it doesn't specify a policy. Then a comment: when you do the accumulate, well, there is a parallel accumulate, right?

Speaker: Oh, I know, I know, I just...

Audience: Okay. A question, out of curiosity: in this example you take just the latest point, so why do you return the whole vector and then drop all except the last point?

Speaker: Because you need to generate the whole vector in order to get that last price.

Audience: Yeah, but it can be just a running variable, because your n-plus-one time point depends only on the nth point. So, just a question.

Speaker: I suppose I just found this way to be more straightforward. I don't know... what do you mean? At this point here, all I'm doing is getting the last price; I'm not copying the vector there.

Audience: No, but...

Speaker: But still, there's no copy here.

Audience: Yeah, but you still allocate the whole memory for the whole array, when in principle you just iterate through to the last point.

Speaker: I suppose you could, but mathematically it made a lot of sense to just put the whole thing in, and also it was just to show how to use task-based concurrency. I'm sure there are some improvements that could be made.

Audience: Since we're talking about mathematics, I have one minor comment. There's a previous slide with the formula, the iterative formula; can you please go to that slide? Not the previous one, but it was some slide earlier. This one, yeah. So shouldn't it converge to something
when you make Δt go closer to zero? I mean, you've basically [unclear]...

Speaker: Yes, it does. It converges to a differential equation, a stochastic differential equation. [unclear] This is what falls out of the theory when you discretize the formula that you get from stochastic calculus.

Audience: Okay, well, thank you.

Speaker: Sure.

Audience: I hope this isn't out of place, but I'd like to comment on the previous questioner's question. I happen to work in a quant group where we are producing data that's used for accounting, public accounting, and we get audited on a regular basis, and having things like the entire sequences that you've generated in a Monte Carlo model ends up being quite useful in scenarios like that, because sometimes we are fighting with one of the Big Eight, or Big however-many, accounting firms, and we have to get to the point of giving them a spreadsheet with "okay, this is what our Monte Carlo method generated." So the question of how much data you keep depends somewhat on whether you have to do that type of validation for third parties. I know that's a rare thing in a quant group.

Speaker: No, it's not. Oh, no, because there are lots of regulations involved, and that's why it's important; that's where the seed value comes in. In fact, when I worked in variable annuities especially, you've got FINRA, the SEC, state insurance regulators, standards of practice by the Society of Actuaries, and some of the accounting standards, GAAP and FASB, and so you have to be able to reproduce your results. So that's actually a very valid comment.

Okay, so, anything else?

[Applause]
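
Editor's sketch of the Monte Carlo walkthrough: the slides and the speaker's GitHub code are not part of this transcript, so the following is only a minimal illustration of the steps described in the talk (a 64-bit Mersenne Twister engine feeding a standard normal distribution, a lambda for the iterative price step, one std::async task per scenario, then get(), back(), discount, and average). The function names generate_path and price_european_mc, the sequential seeds, and the log-normal discretisation S_t = S_{t-1} exp((r - sigma^2/2) dt + sigma sqrt(dt) eps_t) are assumptions made for this sketch, not the speaker's actual classes or formula slide.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <future>
#include <random>
#include <vector>

// Generates one price path S_0, S_1, ..., S_n for n = n_steps.
// Assumed discretisation (the slide is not in the transcript):
//   S_t = S_{t-1} * exp((r - sigma^2 / 2) * dt + sigma * sqrt(dt) * eps_t)
std::vector<double> generate_path(double spot, double rate, double vol,
                                  double expiry, int n_steps,
                                  std::uint64_t seed)
{
    assert(n_steps > 0);
    std::mt19937_64 engine(seed);    // 64-bit Mersenne Twister engine
    std::normal_distribution<> nd;   // defaults: mean 0, std dev 1, double
    const double dt = expiry / n_steps;

    // One step of the iterative process: previous price and a normal draw in.
    auto next_price = [=](double prev, double eps) {
        return prev * std::exp((rate - 0.5 * vol * vol) * dt
                               + vol * std::sqrt(dt) * eps);
    };

    std::vector<double> path;
    path.reserve(n_steps + 1);
    path.push_back(spot);            // first entry is the current market price
    for (int i = 0; i < n_steps; ++i)
        path.push_back(next_price(path.back(), nd(engine)));
    return path;
}

enum class OptionType { Call, Put };

// Prices a European option by averaging discounted payoffs over scenarios.
double price_european_mc(OptionType type, double strike, double spot,
                         double rate, double vol, double expiry,
                         int n_steps, int n_scenarios)
{
    assert(n_scenarios > 0);
    std::vector<std::future<std::vector<double>>> futures;
    futures.reserve(n_scenarios);
    for (int i = 0; i < n_scenarios; ++i) {
        // Placeholder seeding: a distinct integer per scenario.
        const auto seed = static_cast<std::uint64_t>(i) + 1;
        // No launch policy is given (as in the talk), so the implementation
        // may run each task on a new thread or defer it until get().
        futures.push_back(std::async(generate_path, spot, rate, vol,
                                     expiry, n_steps, seed));
    }

    const double discount = std::exp(-rate * expiry);  // continuous discounting
    double sum = 0.0;
    for (auto& f : futures) {
        const double terminal = f.get().back();  // only the last price matters
        const double payoff = (type == OptionType::Call)
                                  ? std::max(terminal - strike, 0.0)
                                  : std::max(strike - terminal, 0.0);
        sum += discount * payoff;  // zero payoffs still count in the average
    }
    return sum / n_scenarios;
}
```

As a usage example, price_european_mc(OptionType::Call, 105.0, 100.0, 0.012, 0.25, 1.0 / 3.0, 12, 10000) would price the talk's $105-strike, four-month call under an assumed 25% volatility, a figure the talk does not state. Passing std::launch::async as the first argument to std::async would force one thread per scenario, which relates to the launch-policy question raised in the Q&A; on some toolchains the program also needs to be linked with -pthread.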