Transcript for: Lecture Notes: Copulas Beyond Two Dimensions

In the last lecture, we discussed a simple method of fitting data to some well-known parametric families of copulas. You may have noticed that so far we've been working exclusively in two dimensions. We did this for a couple of reasons: mainly because it's easier to introduce the topic and fundamentals that way, but also because as we scale up the number of dimensions, the computational problems get harder and the available tools become less pervasive. With that caveat in mind, today I want to dive deeper into going beyond two dimensions when working with copulas.

At a high level, there are many ways to go beyond two dimensions with copulas. For starters, Sklar's theorem, which we introduced a couple of lectures ago in the theoretical foundations lecture, is well defined for more than two dimensions. The multi-dimensional form of Sklar's theorem is shown here in both variants: in the first one, we define a joint CDF of d dimensions as a function of the copula, which defines the dependence structure, and the marginals; and we can write the inverse form of Sklar's theorem as well. So from a theoretical perspective, Sklar's theorem holds for cases beyond two dimensions.

Now, I should note there are some technicalities related to the Fréchet–Hoeffding bounds, which we also covered in the theoretical foundations video. But these are really more applicable to theoretical copula researchers, so we can set them aside for today. Of course, there's a lot of literature on the Fréchet–Hoeffding bounds for multidimensional copulas, and you can research that if you're interested.

For the purposes of this lecture, what we need to know is that while the theoretical framework of copulas holds in multiple dimensions, they become harder to use in practice. One reason is that there are fewer parametric forms of copulas in higher dimensions. Just to rewind a little: last time we talked about a few important bivariate copula families, including the Gaussian, the Student-t, and the Archimedean families. But as we scale up the number of dimensions, the number of parametric copulas that have been presented in the literature gets smaller. You can get around this by building an empirical copula, but the problem with empirical copulas is that inference on them is slow; fitting one is basically a curve-fitting operation.

The other problem with multiple dimensions, and this applies to every multidimensional statistical problem, is that as the number of dimensions grows, the amount of data you need to properly fit a model grows exponentially. This is a general statistical problem: even with parametric forms, more data is needed as you scale up the number of dimensions. The flip side is that as the number of dimensions increases, the flexibility you get from a parametric form decreases. A parametric form imposes constraints on the model you're trying to fit, and as the dimensionality of your data grows, there's a higher probability that those constraints are not satisfied by the data.
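For reference, here is my reconstruction of the two variants referred to above; these are the standard statements of Sklar's theorem in d dimensions:

```latex
% d-dimensional Sklar's theorem: the joint CDF is the copula C
% applied to the marginal CDFs,
F(x_1, \dots, x_d) = C\bigl(F_1(x_1), \dots, F_d(x_d)\bigr),
% and the inverse form recovers the copula from the joint CDF
% via the marginal quantile (inverse CDF) functions:
C(u_1, \dots, u_d) = F\bigl(F_1^{-1}(u_1), \dots, F_d^{-1}(u_d)\bigr).
```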
So direct multidimensional modeling, while theoretically sound, has these issues. Considering them, there have been, to my knowledge, two research thrusts to bridge this gap. One is called copula Bayesian networks, and the other is called vine copulas.

Copula Bayesian networks were introduced by Gal Elidan. They enable Bayesian networks to have complex dependence structures and flexible marginal distributions through the copula framework, and so they can model data with better fidelity than traditional Bayesian networks, which typically use Gaussian models to fit the data. I don't want to go into Bayesian networks in this lecture, because the second approach, vine copulas, invented by Bedford and Cooke in 2002, seems from my survey of the literature to have been adopted by the statistical community more widely than copula Bayesian networks. For that reason, in this video we'll focus on getting a basic handle on vine copulas.

Let's revisit some of the math that will be helpful in understanding why vine copulas work, starting with joint density functions. From basic probability theory, we know that the joint density of x1 and x2 can be written in two mathematically equivalent ways: f(x1, x2) equals the density of x1 given x2 multiplied by the density of x2, which is equivalent to the density of x2 conditioned on x1 multiplied by the density of x1. This follows from the definition of conditional density (rearranging the same identity gives you Bayes' rule). So in two dimensions, there are two equivalent ways of representing the joint density in terms of conditional densities.

Now let's look at three dimensions. You can work this out if you want, but the point is that there are six mathematically equivalent representations of the joint density of x1, x2, x3 as a product of conditional densities. The takeaway is that the number of conditional parameterizations of a d-dimensional joint density grows factorially: for two variables we have 2, for three we have 6, for four we have 4! = 24, and in general d! orderings. It grows very quickly. The basic idea of vine copulas is to model high-dimensional probability distributions using this conditional density construct.

To make the final link, let's look at the relationship between conditional densities and copulas, which is given right here. This is basically just a derivation. The conditional density of x1 given x2 equals the full joint density of x1, x2 divided by the density of x2; recall this is just the definition of conditional density. But the joint density can be rewritten: recall that it is the copula density, the dependence structure linking x1 and x2, multiplied by the marginals. So it would actually be c12 multiplied by f1 and f2, but because f2 is also in the denominator, the f2 cancels out and we're left with c12 multiplied by f1. We can extend this again to three dimensions. These formulas show how to represent conditional densities in terms of copulas.

So now that we understand how to represent a joint density in terms of conditional densities, and how to represent conditional densities in terms of copulas, let's put these two ideas together and create a vine copula.
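Written out, my reconstruction of the relations referenced above:

```latex
% Two equivalent chain-rule factorizations of a bivariate density:
f(x_1, x_2) = f_{1|2}(x_1 \mid x_2)\, f_2(x_2)
            = f_{2|1}(x_2 \mid x_1)\, f_1(x_1).
% Substituting f(x_1,x_2) = c_{12}(F_1(x_1), F_2(x_2)) f_1(x_1) f_2(x_2)
% into the definition of the conditional density, the f_2 cancels:
f_{1|2}(x_1 \mid x_2) = \frac{f(x_1, x_2)}{f_2(x_2)}
                      = c_{12}\bigl(F_1(x_1), F_2(x_2)\bigr)\, f_1(x_1).
```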
And that's what we're showing right here. We start with the joint density of three variables, rewrite it in terms of conditional densities, as shown above, and then substitute the relations that express those conditional densities with their copulas. You can check my math on this (the full factorization is written out at the end of this section), but when you write it all out in the right order and simplify, what you get is that the joint density can be represented as the product of the marginals multiplied by copulas between pairs of these three variables, plus one pair copula that is conditioned on the third variable. So here we have three copulas: c13, which models the dependence between variables 1 and 3; c23, which models the dependence between variables 2 and 3; and c12|3, which models the dependence between variables 1 and 2 conditioned on variable 3.

Mathematically, this is an equivalent representation of the joint density; what this is showing is a factorization. And what we can do with this factorization is turn it into a graph structure, which is what this diagram shows. In the first level of the tree (we'll get to the levels in a little bit), we have nodes representing the marginal distributions, shown here in the order 1, 3, 2; the order is reversed just to make the diagram neater. A link between two marginal nodes represents the dependence between those two variables: this edge is copula c13, shown right here, and this edge represents the dependence between variables 2 and 3, which is this copula right here. The second level of the tree shows the copula of 1 and 2 conditioned on 3, whose arguments are the conditional marginal distributions, which is what is shown in here.

Just to zoom back out and recap all of this: essentially, we take a joint density, represent it as a product of conditional densities, replace those conditional densities with copulas, and then convert that into a vine representation. That's what a vine copula is. It's a way to factorize a multidimensional probability distribution into bivariate conditional copulas. And that's the important part: we've reduced a high-dimensional problem to a collection of bivariate problems, and we have a lot of bivariate parametric families of copulas that we can utilize.

Moving forward on vines, now that we have a general idea of what they are: there are three types of vine copulas. The most general representation is called an R-vine, where R stands for regular. R-vines have several defining properties, but I won't go into those here. From R-vines we get two special cases, called C-vines and D-vines (the C stands for canonical, and the D for drawable). You can think of a C-vine as kind of like a hub-and-spoke architecture.
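In fact, the three-variable factorization we just derived is an example of this. Written out (again, my reconstruction of what's on the slide), it is exactly a C-vine with variable 3 as the hub:

```latex
% Pair-copula factorization of a three-dimensional density; this
% particular pairing is a C-vine with variable 3 as the hub:
f(x_1, x_2, x_3) = f_1(x_1)\, f_2(x_2)\, f_3(x_3)
    \cdot c_{13}\bigl(F_1(x_1), F_3(x_3)\bigr)
    \cdot c_{23}\bigl(F_2(x_2), F_3(x_3)\bigr)
    \cdot c_{12|3}\bigl(F_{1|3}(x_1 \mid x_3),\, F_{2|3}(x_2 \mid x_3)\bigr).
```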
C-vines are what we're going to focus on today, because they're actually the simplest kind of vine copula. If we had three variables, we might represent the vine copula as shown right here. This diagram shows a vine copula where the first variable, 1, is the center node; we model the dependence between 1 and 2 and between 1 and 3 in tree 1, and we model the dependence between 2 and 3 conditioned on 1 in tree 2. In fact, the factorization up above is a C-vine copula as well, but with 3 as the center node instead of 1. It may be easier to see the vine structure in this diagram than in that one, but they're essentially equivalent, except for which variable is the center node.

Just to talk through a four-variable example: here we have four variables. One of the properties of a C-vine is that it has n minus 1 trees, where n is the number of variables. So in this four-variable example, the center node is 1, and tree 1 models the dependence between 1 and 2, 1 and 3, and 1 and 4; the conditional dependencies are shown in the other trees. Another note: I've arbitrarily decided that node 1 is the center node. That choice is really a structural constraint that the C-vine imposes, and it's useful for certain kinds of models or data sets where we might have a dominant variable, a dominant predictor that influences the other variables. That's when a C-vine copula might be useful. The center node can be swapped out, and a different structure can be used if we don't believe there is that sort of dominance relationship. But this is a simple structure to understand for our first example of vine copulas.

So that is the theoretical introduction to vine copulas. The nice thing about vine copulas is that they allow us to model high-dimensional datasets through a combination of bivariate copulas, using the conditional probability density construct. I want to end this lecture with an example of modeling with a C-vine copula. We'll follow standard time series modeling techniques for stock prices, but let's use crypto asset prices just for fun. I want to quickly put a disclaimer out there that this is in no way investment advice, and the techniques here are not being promoted as methods for trading or being used for trading. With that disclaimer out of the way, I just wanted to do this for fun. What we want to do is fit a C-vine copula to model the interactions between the prices of Bitcoin, Ethereum, and Filecoin. The underlying assumption is that the Bitcoin price is a primary driver for both Ethereum and Filecoin. I'm not saying that this is actually a true statement; what I'm saying is that the specific C-vine structure I've put together captures that assumption. So if we think this assumption is reasonable, let's fit the price returns data to this model and see how it does. That's the premise we're going with here.

To accomplish this, we're going to download data using Yahoo Finance: two years' worth of data containing the prices of Bitcoin, Ethereum, and Filecoin. Then, remember that I said earlier we'd do standard time series modeling for stocks; that's what we're going to do here.
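The notebook code itself isn't shown in the transcript, so here is a minimal sketch of the download step. It assumes the yfinance package and the Yahoo Finance tickers BTC-USD, ETH-USD, and FIL-USD, which are my guesses at what the lecture used:

```python
import yfinance as yf

# Assumed tickers; Yahoo Finance quotes crypto pairs against USD.
TICKERS = ["BTC-USD", "ETH-USD", "FIL-USD"]

# Download two years of daily data and keep the closing prices,
# dropping days where any of the three series is missing.
prices = yf.download(TICKERS, period="2y")["Close"].dropna()
print(prices.tail())
```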
We're going to take the closing prices, compute the log returns of that data, and plot them: this is the closing price of these three cryptocurrencies, and this is the log returns of those same currencies. Given this, and remembering that this is a time series, the first thing we do is fit a time series model to the data. Typically, people use something like an autoregressive moving average (ARMA) model, often with a volatility model such as GARCH for the standard deviation. What I've actually done here is use a time series forecasting model that Facebook released, called Prophet. You can use whatever model you want; I've just chosen this one. So I format the data to go into this model in the right way and compute the residuals, which are the errors between the actual log returns and the log returns the model predicts, and I store those for all of the tickers.

Then let's plot those. These are the model prediction residuals; again, the residuals are the errors between the actual log return and the model-predicted log return for each of these currencies. Now, to look at how these are related to each other, we can do a scatterplot of the residuals for Bitcoin versus Ethereum and for Bitcoin versus Filecoin. The reason we keep Bitcoin on one axis in both plots stems from our model assumption: we're assuming Bitcoin is the primary driver of the prices of Ethereum and Filecoin. Plotting these residuals, we see that the two pairs have somewhat different dependence structures.

So what we want to do is build a probability model that will help us predict the residuals, the errors of the time series model. Specifically, we want to build a C-vine copula from this data. This code handles the processing for that: it computes the pseudo-observations, and then we fit the copula to those pseudo-observations. As for the C-vine structure argument, (3, 2, 1), you can read the documentation for this package, but basically this array constrains the fitted C-vine to the structure shown right here. So we specify the structure we want and fit it to our data set of residuals transformed into pseudo-observations, which we talked about last time.

Once we fit the data, we can see which copulas were fit to each of these interactions. Between Bitcoin and Ethereum, a BB1 copula was fit. Between Bitcoin and Filecoin, it's actually an empirical (nonparametric) copula that was fit, which is what's denoted by TLL, and another empirical copula was fit for the remaining conditional pair. Just to do a quick eye test at the end of all of this, I'm not doing any deep financial analysis, but I want to take the fitted copula and generate some samples from it. We generate 500 samples from the copula and convert them back from pseudo-observations into the original residual scale.
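Continuing from the download sketch above, here is a hedged sketch of the modeling pipeline. It assumes Meta's prophet package and pyvinecopulib (whose BB1 and TLL families match the fitted copulas mentioned above); the exact Vinecop constructor and structure conventions may differ across pyvinecopulib versions, so treat this as an outline rather than the lecture's actual notebook:

```python
import numpy as np
import pandas as pd
from prophet import Prophet
import pyvinecopulib as pv

# --- 1. Log returns from the closing prices downloaded above ---
log_returns = np.log(prices).diff().dropna()

# --- 2. Fit a Prophet model per ticker and collect residuals ---
# (residual = actual log return - model-predicted log return)
residuals = {}
for t in TICKERS:
    df = pd.DataFrame({"ds": log_returns.index, "y": log_returns[t].values})
    m = Prophet()
    m.fit(df)
    forecast = m.predict(df[["ds"]])
    residuals[t] = df["y"].values - forecast["yhat"].values
resid = pd.DataFrame(residuals)

# --- 3. Pseudo-observations: map residuals to uniform ranks on (0, 1) ---
u = pv.to_pseudo_obs(resid.to_numpy())

# --- 4. Fit a C-vine with the (3, 2, 1) structure from the lecture ---
# The order mirrors the argument mentioned in the lecture; check the
# pyvinecopulib docs for which variable ends up as the hub node.
structure = pv.CVineStructure([3, 2, 1])
cop = pv.Vinecop(data=u, structure=structure)
print(cop)  # summary should list the fitted pair-copula families (BB1, TLL, ...)

# --- 5. Simulate from the fitted copula, map back to residual scale ---
# using the empirical quantiles of each residual series.
sim_u = cop.simulate(500)
sim_resid = np.column_stack(
    [np.quantile(resid.iloc[:, j], sim_u[:, j]) for j in range(resid.shape[1])]
)
```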
Then we plot them over the original residuals, and what we see, at least by the eye test, is that the residuals predicted by the copula model match the errors that would have been produced by the time series model. So this is one way that people use copulas to do financial prediction, and you can certainly read more about this; there's a lot of literature out there. This was just a quick example. I'll upload this Jupyter notebook to GitHub and link it. This example concludes our copula short course. Please feel free to drop some comments below on any additional items you'd like to see covered. Thanks so much for watching.