a dump of my plan + notes for studying for my finals for a class that i should be doing well but is not because i'm just not good at math and stats apparently. might be the most i've studied for a class ever in my life

go through

- notes
- hw
- quizzes
- pq1, q1
- pq2, q2
- pfinals

prediction

- distribution of SSE (sigma_hat)
- e(sse)
- show y_hat independent to residual
- distribution of beta_hat
- log reg why use logit? issues with linear model
- explain what hii is?
- why 0 < hii < 1
- what is stud(ei)?
- press

Outliers

- outlier in x: leverage (hii) > 3p/n
- outlier in y: discrepancy (studentized e) > t (n-1-p), 1-a/2 (outlier in
- both: influence (cooks distance) >4/n

multicollinearity

problems: inflated SE checks:

- swing/change sign coefficients in f-test
- correlation matrix
- VIF

solution

- drop
- feature engineer
- regularized regression
- dimensionality reduction
- partial least square

heteroskedasticity

- unbiased

detection

- residual plot

problem: no longer BLEU -> wrong SE(beta) and CI/PI widths solution

- log / square
- boxcox
- robust SE
- WLS

if ei is non linear, use nonparametric regression (knn, moving average)

non-normal

- still BLEU
- no inference,

detection

- histogram
- qq plot
- test for normality: shapiro

for normal: skewness: 0 (third moment) kurtosis: 3 (fourth moment)

- omnibus k2 test (want high p-value to reject)
- JB test

problems: unreliable t.test, wrong CI/PI

false assumption of linearity

- transform y -> may introduce hetero if homo
- transform x -> nice when only prob is non-linearity
- transform both

Model selection under: biased coefs + predictions (under/overstimate), overestimate sigma2

extra vars: unbiased, MSE has fewer degrees, wider CI and lower power

over(multicol): inflated SE for coefs, rank deficient

adjusted R 1 - MSE / SST = 1 - SSE / n-p / SST / n-1 takes into account the "cost" of losing DF

Mallows Cp

- identify subset where Cp is near k+1 where k is no of preds
- this means bias is small
- if all not near, missing predictor
- if a number of them, choose model with smallest

AIC, BIC

- estimates infomation lost in a model
- trade-off goodness in fit vs simplicity, penalized by no. of model params (p)
- larger penalty term in BIC than AIC : ln(n)p vs 2p

PRESS

- modified SEE, uses predicted value for ith obs from model fit on data excluding that point