Lecture 11

Sep 20, 2024

Lecture Notes on Bivariate Data and Regression Analysis

Recap of Previous Lecture

  • Overview of using R for calculations and descriptive statistics
    • Entering vectors and basic operations
    • Calculating mean, median, standard deviation
  • Introduction to bivariate data and correlation
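The R workflow recapped above can be sketched in a few lines; the vector values here are illustrative, not taken from the lecture.

```r
# Enter a data vector and compute basic descriptive statistics
x <- c(4, 8, 15, 16, 23, 42)

mean(x)    # arithmetic mean: 18
median(x)  # middle value of the sorted data: 15.5
sd(x)      # sample standard deviation (divides by n - 1)
```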

Bivariate Data

  • Definition: data on two paired variables, e.g., x and y with pairs (1, 10), (2, 9), etc.
  • Visual representation: scatter plot of the (x, y) pairs
  • Interpretation: identifying correlation (inverse in this example)
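A scatter plot of such data is one line in R. The first two pairs, (1, 10) and (2, 9), come from the lecture; the remaining points are assumed for illustration.

```r
# Bivariate data: y falls as x rises (an inverse relationship)
x <- c(1, 2, 3, 4, 5)
y <- c(10, 9, 7, 6, 4)

# Scatter plot of y against x
plot(x, y, xlab = "x", ylab = "y", pch = 19,
     main = "y vs. x: an inverse relationship")
```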

Correlation

  • Correlation coefficient (ρ for a population, r for a sample)
    • Formula: \( r = \frac{S_{xy}}{S_x S_y} \)
    • \( S_{xy} \): sample covariance of x and y; \( S_x \), \( S_y \): sample standard deviations
  • Interpreting the value:
    • Strong positive correlation → r ≈ +1
    • Strong negative correlation → r ≈ −1
    • Uncorrelated → r ≈ 0

Covariance

  • Formula: \( S_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{n-1} \)
  • Shortcut form: \( S_{xy} = \frac{\sum xy - n\bar{x}\bar{y}}{n-1} \)
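Both forms of the covariance formula give the same number, which is easy to verify in R. The data pairs extend the (1, 10), (2, 9) example from earlier; the later values are assumed.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(10, 9, 7, 6, 4)
n <- length(x)

# Definition: sum of cross-deviations, divided by n - 1
s_xy <- sum((x - mean(x)) * (y - mean(y))) / (n - 1)

# Shortcut form: (sum of xy minus n * xbar * ybar) / (n - 1)
s_xy_short <- (sum(x * y) - n * mean(x) * mean(y)) / (n - 1)

# Both equal R's built-in cov(x, y); the value here is -3.75
```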

Example Calculation

  • For given paired values of x and y:
    • Calculate \( S_{xy} \), then the correlation coefficient
    • Result: a negative (inverse) correlation
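A worked version of this calculation in R, using the illustrative pairs from the Bivariate Data section (only (1, 10) and (2, 9) appear in the notes; the rest are assumed):

```r
x <- c(1, 2, 3, 4, 5)
y <- c(10, 9, 7, 6, 4)

s_xy <- cov(x, y)               # sample covariance: -3.75 here
r    <- s_xy / (sd(x) * sd(y))  # correlation coefficient

r          # about -0.993: a strong negative correlation
cor(x, y)  # built-in function gives the same value
```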

Regression Analysis

  • Goal: Predict y given x using a line of best fit
  • Method: Least Squares Regression
    • Minimize the sum of squared errors between actual and predicted values
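In R, least-squares regression is a one-liner with `lm()`. The data below reuse the illustrative inverse-correlation example (assumed values, not from the lecture):

```r
x <- c(1, 2, 3, 4, 5)
y <- c(10, 9, 7, 6, 4)

fit <- lm(y ~ x)   # least-squares line of best fit
coef(fit)          # intercept a = 11.7, slope b = -1.5 for this data

# Predict y at a new x value
predict(fit, newdata = data.frame(x = 6))   # 11.7 - 1.5 * 6 = 2.7
```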

Linear Regression

  • Problem Statement:
    • Minimize the error \( E = \sum_i (y_i - (a + bx_i))^2 \) over the intercept a and slope b
  • Use of Calculus:
    • Minimize function via partial derivatives
    • Equations derived: \( \frac{\partial E}{\partial a} = 0 \), \( \frac{\partial E}{\partial b} = 0 \)
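Carrying the differentiation through gives the standard normal equations. With \( E = \sum_i (y_i - a - bx_i)^2 \):

```latex
\frac{\partial E}{\partial a} = -2\sum_i (y_i - a - bx_i) = 0
\quad\Rightarrow\quad a = \bar{y} - b\bar{x}

\frac{\partial E}{\partial b} = -2\sum_i x_i (y_i - a - bx_i) = 0
\quad\Rightarrow\quad b = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2} = \frac{S_{xy}}{S_x^2}
```

Substituting \( a = \bar{y} - b\bar{x} \) into the second equation and simplifying yields the slope formula; the \( n-1 \) factors in \( S_{xy} \) and \( S_x^2 \) cancel.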

Example of Function Minimization

  • Use of partial derivatives to minimize sum of squared deviations
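The closed-form solutions from the partial derivatives can be checked against R's `lm()`, using the same assumed data as above:

```r
x <- c(1, 2, 3, 4, 5)
y <- c(10, 9, 7, 6, 4)

# Slope and intercept from the calculus solution
b <- cov(x, y) / var(x)     # b = S_xy / S_x^2
a <- mean(y) - b * mean(x)  # a = ybar - b * xbar

fit <- lm(y ~ x)            # lm() minimizes the same squared error
c(a, b)                     # matches coef(fit): 11.7 and -1.5
```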

Conclusion

  • Discussion on correlation and regression analysis
  • Introduction to deriving regression equations using calculus
  • Preview of next lecture: Further exploration of regression and prediction methods