(1) Chapter 2: 8+, 9* (Due 01/24) Filename: 3880hw-2-1.pdf
+8 Only submit an answer to 2.8.c.vi.
*9 Only submit an answer to 2.9.e (comments only, no plots) & 2.9.f.
Submit your code (or Rmarkdown code) for each entire problem as an appendix (at the end, after all answers).
Keep your answers to one page plus whatever you need for the code appendix.
(2) Chapter 3: 14+ (Due 02/07) Filename: 3880hw-3-1.pdf
+[g] Do not re-do parts (c)-(e); just use the 3 different models from those parts and answer whether the new obs is an outlier and/or a high-leverage point.
NOTE: Remove this new obs for parts (h)-(k) below (or reload the data starting again with set.seed(1) ...)
(h) For the model predicting y with x1 & x2, compute the variance inflation factors (VIFs) and state an interpretation of the values.
(i) Compute the VIFs "by hand" in R based on VIF_j = 1/[1 - R^2_{x_j|x_(-j)}] (see p. 102). Show your code for this.
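As a starting point for part (i), here is a minimal sketch of the "by hand" VIF computation, using the simulated predictors x1 and x2 from problem 3.14; the regression of x1 on x2 gives the R^2 that plugs into the formula.

```r
# Sketch: VIF "by hand" for x1 in a model with predictors x1 and x2.
set.seed(1)
x1 <- runif(100)
x2 <- 0.5 * x1 + rnorm(100) / 10        # correlated predictors, as in 3.14

r2 <- summary(lm(x1 ~ x2))$r.squared    # R^2 of x_j regressed on the others
vif_x1 <- 1 / (1 - r2)                  # VIF_j = 1/(1 - R^2_{x_j|x_(-j)})
vif_x1
```

Compare this value against the output of a packaged VIF function (e.g., car::vif) to check your work.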
(j) For the model predicting y with x1 & x2, how many obs may be outliers in the "x-direction"? Which observations do these correspond to? Identify these observations in the column space of the design matrix (the x1 vs. x2 plane) and describe their location.
(k) For the model predicting y with x1 alone, add a new data point that has high leverage but low Cook's distance.
(l) For the model predicting y with x1 alone, add a new data point that has moderate leverage but high Cook's distance (larger than any other observed value).
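For parts (k) and (l), a helper like the following can check a candidate point's leverage and Cook's distance before you commit to it. The function name and the candidate coordinates are placeholders you tune by trial and error; y and x1 are assumed to come from the problem's simulation.

```r
# Sketch: append a candidate point (x_new, y_new) to the x1-only model
# and report that point's leverage and Cook's distance.
check_point <- function(x1, y, x_new, y_new) {
  x <- c(x1, x_new)
  yy <- c(y, y_new)
  fit <- lm(yy ~ x)
  n <- length(x)
  c(leverage = hatvalues(fit)[n], cooks = cooks.distance(fit)[n])
}

# Part (k): a far-out x that sits near the fitted line gives high leverage
# but a small residual, hence low Cook's distance.
# Part (l): a moderately unusual x with a large residual gives moderate
# leverage but high Cook's distance.
```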
SUBMIT for 3.14: a (question answer only), f (no plots), g, h, i, j. (Do not submit anything for parts b-e.)
SUBMIT for 3.14: k & l (give the coordinates of the point, leverage, Cook's dist, and show a plot of Y vs X1 with the point highlighted)
SUBMIT an appendix of code for all parts (at the end)
Please keep your answers to 2 pages plus whatever you need for the code appendix.
(HW#3) Logistic regression and LDA (Due 02/21) Filename: 3880hw-4-1.pdf
(HW#4) Naive Bayes 1 (Due 03/21) Filename: 3880hw-4-2.pdf
(HW#5) Chapter 5: 5+, 6* (Due 03/28) Filename: 3880hw-5-1.pdf
+* See the modifications here
NOTE: boot.ci() can throw an error related to the BCa version. If this causes problems
when knitting an .Rmd file, you can use the argument type=c("norm","basic","perc"), or any
subset, for the Normal, Basic, and Percentile versions, avoiding the BCa version.
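A minimal sketch of that workaround, with a placeholder statistic (the mean of simulated data) standing in for whatever the problem asks you to bootstrap:

```r
library(boot)

# Placeholder statistic: replace with the one the problem requires.
boot.fn <- function(data, index) mean(data[index])

set.seed(1)
b <- boot(rnorm(100), boot.fn, R = 1000)

# Request only the Normal, Basic, and Percentile intervals, skipping BCa.
boot.ci(b, type = c("norm", "basic", "perc"))
```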
Chapter 8: X, 8+ (Due 04/14 - Monday) Filename: 3880hw-8-1.pdf
(X) Give a brief explanation (in your own words) of the cost complexity pruning referred to in step 2 of algorithm 8.1 in the text.
Recall that we have visualized results for sequences of trees as a function of T (R calls this "size") rather than of alpha (R calls this "k").
NOTE: Use a 50% split for your train/test set and set.seed(1) before calling sample().
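The prescribed split can be sketched as below; `dat` is a placeholder data frame, so substitute the chapter's data set.

```r
# Sketch: 50% train/test split with set.seed(1) before sample(), per the note.
dat <- data.frame(x = rnorm(100), y = rnorm(100))  # placeholder data

set.seed(1)
train <- sample(1:nrow(dat), nrow(dat) / 2)
test  <- setdiff(1:nrow(dat), train)
```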
NOTE: use set.seed(1) before each gbm() call
NOTE: if gbm fails to load, try resizing the console window (larger)
(f) Use boosting on the training set with a depth of 3 splits and the default shrinkage to find the test MSE.
(g) Repeat part (f) with a shrinkage of .01 and .02, reporting the test MSE in each case.
(h) Repeat part (g) using "stumps" and compare the test MSE to parts (f) & (g).
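Parts (f)-(h) can be sketched as follows. The data frame `dat`, the response `y`, and the formula are placeholders; swap in the chapter's data set, and remember the note above about calling set.seed(1) before each gbm() call.

```r
library(gbm)

# Placeholder data; replace with the assigned data set and train/test split.
dat <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
dat$y <- dat$x1 + rnorm(200)
train <- sample(1:200, 100)

# Part (f): depth-3 trees with the default shrinkage (0.1 in gbm).
set.seed(1)
fit <- gbm(y ~ ., data = dat[train, ], distribution = "gaussian",
           n.trees = 1000, interaction.depth = 3)
pred <- predict(fit, dat[-train, ], n.trees = 1000)
test_mse <- mean((pred - dat$y[-train])^2)

# Part (g): refit with shrinkage = 0.01 and shrinkage = 0.02.
# Part (h): refit with interaction.depth = 1 ("stumps") and compare.
```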
Submit HWs at the tinyurl until there is a fix for Brightspace.