Transcript for:
Lecture on Financial Economics and Data Analysis

Okay, let us start. I will be talking on one particular aspect of economics, or of financial econometrics in particular, that is a very important one; that is why I have selected it. Before that, let us have some overview of econometrics. We distinguish econometrics based on data and based on economic fields. Based on data we have three types: cross-sectional econometrics, time series econometrics and panel data econometrics. I believe that all of you are familiar with these datasets: cross-section, time series and panel. The last two are very important in the context of financial econometrics, especially time series datasets; occasionally, or in some cases, we also use panel data in financial econometrics, but mostly time series. We have some separate sessions on time series econometrics, so today we will be dealing with general econometric techniques.

Now, based on economic fields, we also have two types: microeconometrics and macroeconometrics. Microeconometrics is usually characterized by the analysis of cross-section and panel data, and it usually focuses on individual consumers, firms and other micro-level decision making. In this particular field we have a number of theoretical tools taken from microeconomics, including utility maximization, profit maximization and the market equilibrium principle, and these we also apply in financial economics; hence microeconometrics is also significant for financial matters. Then, coming to macroeconometrics, here the analysis is based on time series and panel data, and as the name macro suggests, the variables we consider here are aggregates such as price levels, money supply, exchange rates, national output or national income, investment, economic growth and so on, a large number of macro variables; usually time series is associated with macroeconometrics. Financial econometrics also usually sits within macroeconometrics, but in financial econometrics we use panel data as well.

Now, actually the boundaries between microeconometrics and macroeconometrics in financial econometrics are not very sharp. We know that financial econometrics deals with time series data, and sometimes, as I told you, with panel data also, but usually there is a sharp focus on models of individual behaviour; in that respect, as long as we focus on individual behaviour, we have microeconometrics in financial econometrics. At the same time, we know the analysis of market returns and exchange rate behaviour cannot be exclusively macro or micro. So as far as financial econometrics is concerned, we can have both microeconometrics and macroeconometrics: we study time series data on firms, that is, the micro aspects of individual financial agents, and at the same time we also deal with the broad financial aggregates, as in macroeconometrics.

Now, these are some of the books directly dealing with financial econometrics. There is Chris Brooks, which is very famous, Introductory Econometrics for Finance; it is now in its fourth edition, and I think some earlier edition, even the fourth edition, of Chris Brooks is available on Google, so you can try that. Then there is another book by Campbell, Lo and MacKinlay; this also is a very famous textbook, called The Econometrics of Financial Markets. Then there is a book on financial econometrics by Peijie Wang; it is not as famous as the next one, by Tsay, that is, Analysis of Financial Time Series.
Another one is Financial Econometrics: From Basics to Advanced Modeling Techniques, written by a number of econometricians; it is a very, very good book, and a very simple one, though of course the Brooks book also is very simple. Then we have an edited book, High Frequency Financial Econometrics: Recent Developments. Of course, a large number of other books are also available, but I am just showing you these.

Now, financial econometrics is considered the science of modelling and forecasting financial data such as asset prices, asset returns, interest rates, financial ratios, defaults and recovery rates for debt obligations, and risk exposures, and it is also called the econometrics of financial markets. There are three fundamental enabling factors that make the development of financial econometrics possible. The first is the availability of data at any desired frequency, including at the transaction level. We know that today we are in a position to have very high frequency data; earlier, of course, we had daily data, but now we get not only hourly data but data for every moment, so the availability of that high frequency data was the first enabling factor for financial econometrics. The second is the availability of powerful desktop computers at an affordable cost. The third is the availability of econometric software; a large number of econometric software packages are available these days, including Stata and EViews, and of course we have free software such as R, Python and also gretl. I will be introducing gretl in the afternoon session today, and we will continue with it; that is the beauty of gretl. So actually it is the combination of these three factors which placed advanced econometrics within the reach of most financial firms such as banks and asset management firms, and that is why today we have this particular field, financial econometrics.

As I have already told you, financial econometrics is applied either to time series data, such as the returns of a stock, or to cross-sectional data, such as the market capitalization of all stocks in a given universe at a particular time. With the progressive diffusion of high frequency and ultra high frequency financial data, financial econometrics can now be applied to larger databases, making statistical analysis more accurate as well as providing the opportunity to investigate a wider range of interesting questions regarding financial markets and investment strategies.

Now, there are three key steps in applying financial econometrics at work; these three steps are there for econometric science in general. The three are: first, model selection; then model estimation; then model testing. Here I will be concentrating on the last one, that is, model testing, and I assume that you are already familiar with model selection. In model selection we choose a model with given statistical properties, and of course these models entail mathematical analysis to justify our particular choice. What we do here, for example, is use an econometric tool such as regression analysis to forecast stock returns based on some fundamental corporate financial data and macroeconomic variables. Then we have the second step, model estimation. In general, the models are embodied in mathematical expressions that include a number of parameters that have to be estimated from some data.
I believe that you are all familiar with statistics, especially the distinction between a population and a sample, the difference between parameters and statistics, and then estimation procedures and so on. If you have any doubt you can ask me; if you are not familiar with parameters and statistics, ask me and I shall explain.

Now, in this particular case, suppose that we have decided to model returns on a major stock market index, such as the S&P 500, with a regression model; in this particular case we have to estimate the corresponding regression coefficients based on historical data. For example, the market capitalization of a firm is easily observed; of course there are computations involved in finding the market capitalization, that is, we get market capitalization by multiplying the price of a stock by the number of outstanding shares, but the process of computing that market capitalization is essentially a process of direct observation. That is not the case in some other situations; that is, when we come to estimation, we are not able to directly observe the parameters that appear in the model. For example, consider a very simple model trying to estimate a linear relationship between the weekly return on General Electric stock and the return on the S&P 500. In terms of simple linear regression analysis we can write this particular relationship as follows:

return on GE stock in week t = alpha + beta * (return on S&P 500 in week t) + u_t

Here we have two parameters in this simple linear regression, alpha and beta, and they are referred to as regression coefficients. In this particular case the dependent variable, the return on GE stock, and the independent variable, the return on the S&P 500, are directly observable, but the two parameters alpha and beta, and the error term, are not directly observed. We have to estimate them using the historical data, and there comes the importance of estimation.
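As a hedged illustration of this estimation step, here is a minimal sketch in Python with statsmodels; the weekly return series below are simulated stand-ins, not actual GE or S&P 500 data, and the "true" alpha and beta values are assumptions chosen purely for illustration.

import numpy as np
import statsmodels.api as sm

# Simulated weekly returns, purely illustrative stand-ins for real data
rng = np.random.default_rng(0)
r_sp500 = rng.normal(0.002, 0.02, 200)                    # weekly S&P 500 returns (simulated)
r_ge = 0.001 + 1.2 * r_sp500 + rng.normal(0, 0.01, 200)   # assumed alpha = 0.001, beta = 1.2

X_ge = sm.add_constant(r_sp500)      # adds the column of ones for the intercept (alpha)
ols_ge = sm.OLS(r_ge, X_ge).fit()    # ordinary least squares estimates of alpha and beta
print(ols_ge.params)                 # [alpha_hat, beta_hat]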
Now we come to the last step, model testing. Usually, once we have estimated a particular model statistically, that is, econometrically, what we do is go for hypothesis testing on the coefficients of the model, that is, model significance testing: hypothesis tests on our estimated coefficients based, of course, on our null hypotheses for the parameters of the model, and then we report the results. That is all; but actually that is not the correct procedure. Model testing also includes something else, and that is more important than this so-called model significance testing: we need to see whether the assumptions that we have used for the estimation are met or not. Actually, in respect of only one method in time series econometrics, that is, in ARIMA modelling, this practice is still followed, but unfortunately in most other cases, whether it is cross-section data, time series or panel data, investigators usually do not care for this particular aspect. But this is a very, very important part, because the estimation is possible only under certain conditions, and whether the estimates are valid or not depends on meeting these assumptions. So we have to see whether these assumptions are met or not. I am now going to explain this particular model testing, that is, what is usually called the model adequacy tests, model adequacy in econometrics or financial econometrics.

So before going for hypothesis testing on the estimated regression equation, we have to see whether our model is adequate or not. If the model is not adequate then it is useless; we cannot use it. That means, before we go for the significance tests of the estimated model, we first have to see whether our model is adequate, and this is the case that I am discussing today. This is very, very important, because these days, unfortunately, people just ignore this particular aspect, so we have to take the reported research results with a pinch of salt. It is doubtful whether the estimates are really valid unless the model adequacy tests are reported; otherwise the estimation procedure and the results are under a cloud, and we cannot accept them without checking.

Now I shall start with multiple regression analysis. Suppose we have n observations on the dependent variable y, and we have k - 1 independent variables and k parameters; that means y is a function of k - 1 independent variables and a random error term. When we write it as a linear regression we have this equation:

y_i = beta_1 + beta_2*x_i2 + beta_3*x_i3 + ... + beta_k*x_ik + u_i

Remember, we have k parameters: beta_1, beta_2, beta_3 and so on up to beta_k are the parameters, and these we have to estimate using the sample information. There is no variable attached to beta_1, because beta_1 is the so-called intercept, or constant, of the linear regression; the coefficients beta_2, beta_3, ..., beta_k are associated with the independent variables. We have k - 1 independent variables and, remember, k parameters. If we write this expression for y_i in detail, where i runs from 1 to n, then we have one such equation for the first observation or the first unit, then for y_1, y_2, y_3 and so on. Remember, in all these equations the parameters remain constant: you have the same intercept, the same coefficient on x_2, the same coefficient on x_3 and so on, and it is these parameters that we have to estimate using the sample data. For estimating these parameters with the sample data on y and the x's, we usually use ordinary least squares, OLS, which is a very simple method; these days, of course, all methods are simple given the powerful computers, but traditionally we have used ordinary least squares.

Now, for a particular case, suppose we have a simple, that is, bivariate, model: we have only one independent variable and one dependent variable, like this:

y_i = alpha + beta*x_i + u_i

This is given as the population relationship between the two variables y_i and x_i. Based on this particular model, as I told you earlier, we have mathematical models, and these mathematical models capture the population relationship among the variables of our interest; using the sample data we estimate that particular population relationship. Suppose we have estimated a regression using the sample data, like

y_hat_i = alpha_hat + beta_hat*x_i

Now remember, whenever we put a hat on a particular variable, for example y_i, it becomes the estimated or computed value of that variable; so y_hat_i is an estimate, or estimator, of y_i, alpha_hat is the sample estimate of alpha, and beta_hat is the sample estimate of beta, and then we have the sample data x_i. So this gives us the estimated regression equation, the sample regression equation.
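As a hedged sketch of this multiple-regression set-up, the following Python lines simulate data for the k - 1 regressors (the data and the "true" parameter values are assumptions made purely for illustration) and compute the OLS estimates both from the closed-form matrix formula and with statsmodels, to show they coincide.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n, k = 100, 4                                     # n observations, k parameters (incl. intercept)
X = sm.add_constant(rng.normal(size=(n, k - 1)))  # column of ones plus k - 1 regressors
beta = np.array([1.0, 0.5, -0.3, 2.0])            # illustrative "true" parameter values
y = X @ beta + rng.normal(scale=0.5, size=n)      # y_i = beta_1 + beta_2*x_i2 + ... + u_i

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)      # OLS formula: (X'X)^(-1) X'y
print(beta_hat)
print(sm.OLS(y, X).fit().params)                  # statsmodels gives the same estimates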
Given these values, we can now find an estimate of the error u_i using y_i and y_hat_i: if we subtract y_hat_i from y_i we get an estimate of u_i, that is,

u_hat_i = y_i - y_hat_i, where y_hat_i = alpha_hat + beta_hat*x_i

This u_hat_i is an estimate of the error term, and it is obtained as what is left over from y_i after subtracting the fitted value; therefore this estimated error term is usually called the residual. So these are the residuals, and remember we have these estimates. Remember also, as I told you earlier, these estimates are obtained from the directly observable sample values of x_i and y_i, whereas the population values are not directly observed: we do not know anything about alpha, we do not know anything about beta, and the same is the case with the error. But using the sample data we can estimate them.
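Continuing the earlier sketch (reusing the simulated y and X and the statsmodels import from there, so the names are illustrative rather than from any actual data set), the residuals can be computed directly or read off the fitted model:

fitted = sm.OLS(y, X).fit()              # OLS fit from the earlier simulated example
y_hat = fitted.fittedvalues              # y_hat_i, the fitted (computed) values
u_hat = y - y_hat                        # u_hat_i = y_i - y_hat_i, the residuals
print(np.allclose(u_hat, fitted.resid))  # statsmodels stores the same residuals as .resid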
Now we come to a very famous theorem, the Gauss-Markov theorem. The theorem says that out of the class of all linear unbiased estimators, the OLS estimators have the smallest variance, that is, they are the most efficient. The OLS estimators are best in the sense that they have the minimum variance among all the linear unbiased estimators; that is, the OLS estimators are said to be BLUE, B for best, L for linear, U for unbiased, E for estimators: best linear unbiased estimators. So the OLS estimators are best, meaning most efficient, they are linear estimators, and they are unbiased estimators. All these BLUE properties are subject to certain assumptions, and these assumptions are called the Gauss-Markov assumptions. So OLS, ordinary least squares, is based on certain assumptions, and only if these assumptions are satisfied are the OLS estimators BLUE.

All these assumptions we can list as follows: the assumption of non-stochastic independent variables; then orthogonality of the independent variables and the error; then linearity; then independence of the errors (we will come to all these assumptions in detail); then the normality of the error; then equal variance, which is the so-called homoscedasticity. So in general we have the assumptions of OLS in terms of assumptions on the error u and assumptions on the independent variables; we can divide all the assumptions into these two groups.

Now, these are the assumptions on the error u. First, it has zero mean: on average the error is zero. Then the normality assumption: the disturbance, that is the error, is normally distributed. Then the variance parameters in the variance-covariance matrix of this error u are the same; that means this error u has a constant variance, and that constant variance is called homoscedasticity. Homo means the same or constant, and skedasticity is the Greek-derived name for variance, so instead of variance we sometimes say skedasticity; if we have a variance function then we also say that we have a skedastic function. Homoscedasticity means constant variance, and that simply means there is no heteroscedasticity; hetero means different, so heteroscedasticity means differing variance, the variances are not constant, so that with each value of the independent variable the error has a different variance. That is heteroscedasticity. The next assumption is the no-autocorrelation assumption: the error terms are not serially correlated, or autocorrelated, so there is no autocorrelation as such; we will study autocorrelation when we come to time series. This particular assumption is also called the independence of errors, that is, the error terms are independent of each other. Now see, this u includes u_1, u_2, u_3, u_4 and so on; u_1 is independent of u_2, u_1 is independent of u_3, u_1 is independent of u_4 and so on, and similarly u_2 is independent of u_3, u_2 is independent of u_4 and so on. That is, these u's are not correlated; they are independent, which means they have no serial correlation or autocorrelation. If these assumptions are satisfied, then our error term is called a spherical error; by a spherical error we mean that the error satisfies these assumptions except the normality one, because it need not be normally distributed. The normal distribution is also called a Gaussian distribution, so if the normality assumption is also satisfied then we have a Gaussian spherical error; otherwise, with just the zero mean, homoscedasticity and no autocorrelation, we have a spherical error.

Now we have some assumptions for the independent variables. The independent variables x are assumed to be non-stochastic, that is, they are not like the error; we assume that they are like fixed values, fixed in repeated samples. Of course we can relax this particular assumption and think of stochastic x also; that is a separate subject in econometrics. Then we have a very important assumption in the framework of multiple regression, that there is no perfect multicollinearity, that is, there is no perfect linear dependence between two or more explanatory variables. This is very important, because if there is any perfect multicollinearity then we will not be able to estimate that particular equation; that is the problem of multicollinearity. Remember, all these assumptions we will consider again when we come to model adequacy. The next one is orthogonality of x and the error: the explanatory variables and the error term are independent of each other, that is, they are not correlated; their covariance is zero, or their correlation is zero. That is called orthogonality; orthogonal simply means independent, so independent x and error. Then finally we have another assumption, that there should not be any specification error. Usually a specification error happens by excluding some relevant variables, including some irrelevant variables, incorrectly specifying the functional form, or committing some measurement errors; we will come to all these later. So these are the various assumptions, and only if these assumptions are satisfied do the OLS estimators become BLUE, that is, best linear unbiased estimators.

Now, how do we find out whether these assumptions are satisfied or not? We are coming to what are called the model adequacy tests, and these tests we carry out on the residuals that we get from our regression. You know the residuals: we get them by subtracting the estimated values from the original values. So we analyse these residuals to find whether our model is adequate or not; as I have already told you, the residuals are the estimates of the error term u_i, obtained as y_i minus y_hat_i.
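As a hedged illustration of what the test-based side of these checks can look like, here is a sketch using standard statsmodels routines applied to the residuals of the earlier simulated model (names reused from that sketch; the particular tests named below are common choices, not ones prescribed in the lecture).

from statsmodels.stats.diagnostic import het_breuschpagan
from statsmodels.stats.stattools import durbin_watson
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Homoscedasticity: Breusch-Pagan test (null hypothesis = constant error variance)
lm_stat, lm_pval, f_stat, f_pval = het_breuschpagan(u_hat, X)
print("Breusch-Pagan p-value:", lm_pval)

# No autocorrelation: Durbin-Watson statistic (values near 2 suggest no first-order autocorrelation)
print("Durbin-Watson:", durbin_watson(u_hat))

# No perfect multicollinearity: variance inflation factors for the k - 1 regressors
for j in range(1, X.shape[1]):
    print("VIF for regressor", j, ":", variance_inflation_factor(X, j))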
So we have to check the assumptions of regression by examining these residuals, and this we usually do using graphical analysis of the residuals; in addition to the graphical analysis we of course have certain statistical tests for each of the assumptions, so we will consider both the graphical analysis of residuals and these tests of the assumptions.

So we are coming to the model adequacy diagnosis, and as I have already told you, this is an important stage before hypothesis testing and forecasting. In 2001 I had a working paper on this; it is available on my ResearchGate page. I believe you are all familiar with ResearchGate; you can go there and find my page, where all my lecture series and my publications are available, and this working paper is also available there. It is important that you read it: it is Electricity Demand Analysis and Forecasting: The Tradition is Questioned, that is, I was questioning the tradition of not reporting the model adequacy diagnosis. So it is good that you read that particular working paper.

Now, what is this model adequacy? We say the estimated model is adequate if it explains the data set adequately, that is, if the estimated model contains, or captures, all the information in the data set; then that estimated model is said to be an adequate model. Saying this is equivalent to saying that if the residual, which we get by subtracting the fitted model from the actual data, does not contain any explainable non-randomness left over from the estimated model, then our model is adequate. Now what is this explainable non-randomness? Explainable non-randomness refers to some patterns, and patterns represent information; so if the residual actually has some information, some explainable information, then that particular model is not adequate. So the residual should be purely random, or what is called white noise in time series analysis, or in general a spherical error in econometrics. We have already seen that the error term u is said to be a spherical error; similarly, the residual that we get as the estimate of that error term must be purely random, it must be spherical, it should not contain any information at all, so that the estimated model contains every piece of information possible from the data. In that case we can say that all the OLS assumptions are satisfied for this model, and the model becomes adequate. That means it is the OLS assumptions that we have to consider, to see whether they are satisfied or not. So once we have estimated an OLS model, remember, OLS means ordinary least squares, we have to see that all the OLS assumptions are satisfied; only then is our model adequate, and only if our model is adequate can we go for hypothesis testing and, after that, for forecasting. If the model is not adequate then it is a useless model; we cannot use it.

Now, this is the residual analysis for linearity. What we do is take the residuals and plot them against the independent variable x; if they show some nonlinear pattern, then that means the relationship is not linear, but here, in the plots shown, we find that the residuals are linear. Then, whether the residuals are independent or not, that is the error independence, or no autocorrelation: if the residuals have a pattern, for example a cyclical pattern, then they are not independent; that means there is some autocorrelation in the residuals, some autocorrelation within the sequence, so the no-autocorrelation assumption on the error is violated. In this particular case we do not find any autocorrelation; there is no pattern, therefore here the residuals are independent of each other, there is no autocorrelation, and our assumption is satisfied.
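These graphical checks can be sketched as follows; a hedged illustration assuming matplotlib is available and reusing X and the residuals u_hat from the earlier simulated example (the lecture's own plots are not reproduced here).

import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 2, figsize=(10, 4))

# Residuals against an independent variable: a curved pattern would suggest non-linearity
axes[0].scatter(X[:, 1], u_hat)
axes[0].axhline(0, color="grey")
axes[0].set(xlabel="x_2", ylabel="residual", title="Linearity check")

# Residuals in observation order: a cyclical or trending pattern would suggest autocorrelation
axes[1].plot(u_hat)
axes[1].axhline(0, color="grey")
axes[1].set(xlabel="observation", ylabel="residual", title="Independence check")

plt.tight_layout()
plt.show()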
Then the normality assumption: whether the errors are normally distributed, that is, normally and independently distributed with zero mean and a constant standard deviation sigma. In this particular case the residuals are not normally distributed, because the skewness and kurtosis conditions are not satisfied: the skewness value is 5.17, which is very, very high. Actually, for a normally distributed error the skewness measure must be zero, around zero, or statistically zero, and the kurtosis must be three or close to three; here the kurtosis measure also is very high. We have the p-values given by the tests, and the p-values are much less than five percent, close to zero; therefore we reject the null hypothesis based on the skewness and the kurtosis, which means the errors, or the residuals, are not normally distributed, and the histogram of the residuals also shows that they are not normal.

Now, what are the implications of non-normally distributed errors? If the residuals do not follow the normality assumption, then the estimators, alpha_hat and beta_hat, are also not normally distributed. But remember, the estimates that we get from OLS are still BLUE, because this particular assumption is not essential for estimation: we are not using the normality assumption for estimating the OLS coefficients, so the estimators themselves are not affected. The problem with non-normal residuals is that we are not able to do our hypothesis tests; it is only the hypothesis tests which are affected by the violation of this assumption. That is, alpha_hat and beta_hat are normally distributed only if the residuals are normally distributed, and only then can we have parametric tests based on sampling distributions, that is, the t tests or the z tests. If they are not normally distributed, then we are not able to do the parametric tests; that is the problem. So the non-normality problem affects our parametric hypothesis tests.

Now, what are the causes of non-normally distributed errors? This is generally caused by a specification error, for example by omitting an important variable; it can also result from outliers in the data, or from having a wrong functional form. So it is essential that once we have obtained our data, we check whether there are any outliers, because outliers in the data will also affect our estimates. We know the mean is badly affected if outliers are there, so regression coefficients are also affected by outliers. So we have to check the data for outliers, for example by using box diagrams, the box-and-whiskers diagram; those will give us some idea whether there are any outliers, and if there are no outliers then it is safe to proceed with the data. Then, by using some wrong functional form we can also generate non-normally distributed residuals, that is, by using a linear form whereas the original relationship is nonlinear, or logarithmic.
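As a hedged sketch of these last two checks, reusing y and the residuals u_hat from the earlier simulated example: the Jarque-Bera routine in statsmodels reports the residual skewness and kurtosis together with a p-value, and a box-and-whiskers plot screens the data for outliers (the numbers quoted in the lecture, such as the skewness of 5.17, come from the lecturer's own data, not from this simulation).

from statsmodels.stats.stattools import jarque_bera
import matplotlib.pyplot as plt

jb_stat, jb_pval, skew, kurt = jarque_bera(u_hat)
print("skewness:", skew)                # around 0 for normally distributed residuals
print("kurtosis:", kurt)                # around 3 for normally distributed residuals
print("Jarque-Bera p-value:", jb_pval)  # a very small p-value means we reject normality

# Box-and-whiskers plot of the dependent variable to screen for outliers
plt.boxplot(y)
plt.title("Outlier check for y")
plt.show()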