Stat 3880: Statistical Learning
Final Project
For your project, you will apply several methods from the course to your dataset to answer the research questions that you have chosen to investigate. The goal is to compare the results of several methods for addressing the same research questions (e.g., you may need to dichotomize a dependent variable for one approach and treat it as numeric in another). Plan to use logistic regression (and/or LDA), decision trees, and various linear regression methods. You will work on the project in groups, and the work should be a collaboration in which decisions are made jointly at all levels. In the end, you and your partner will jointly create a 10-15 minute presentation and a final report.
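As a rough illustration of treating the same dependent variable two ways, the sketch below fits a linear regression to a numeric outcome and a logistic regression to a dichotomized version of it. Everything in it is an assumption made for illustration: the data are simulated, the names `dat`, `y`, `x1`, `x2`, and `y_high` are hypothetical, and the median cutoff is arbitrary. Your own dataset and research questions should drive these choices.

```r
# Simulated stand-in data (hypothetical): replace with your own dataset.
set.seed(3880)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 2 + 1.5 * x1 - 0.5 * x2 + rnorm(n, sd = 2)
dat <- data.frame(y, x1, x2)

# Approach 1: treat y as numeric and fit a linear regression.
fit_lm <- lm(y ~ x1 + x2, data = dat)

# Approach 2: dichotomize y at its median (an arbitrary illustrative cutoff)
# and fit a logistic regression to the resulting 0/1 outcome.
dat$y_high <- as.integer(dat$y > median(dat$y))
fit_glm <- glm(y_high ~ x1 + x2, data = dat, family = binomial)

# Compare the direction of each predictor's estimated effect across approaches.
coef_signs <- rbind(linear   = sign(coef(fit_lm))[c("x1", "x2")],
                    logistic = sign(coef(fit_glm))[c("x1", "x2")])
print(coef_signs)
```

The coefficients themselves are on different scales (units of y versus log-odds), so a comparison like this is limited to direction and relative importance unless you convert both models to predictions.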
Each group will need to identify a dataset from the sources listed below.
- Progress Report #1 (Due 4/04)
For this report (1 page maximum), you will submit a brief description of your proposed research questions, the dataset that you plan to use (with the ICPSR URL), and the variables of primary interest. Your report should be just a couple of paragraphs.
  - Submit this report as a PDF file named 3880proj1_NAMES.pdf at https://tinyurl.com/stat3880-work (one submission per group). "NAMES" should be replaced by your first names (listed alphabetically) separated by a dash (no spaces).
- Progress Report #2 (Due 4/11)
For this report (1 page maximum), you will clarify the information in your first report, with more clearly defined questions and sets of variables identified. Indicate any issues/challenges you have dealt with (or are dealing with). Submit this report as a PDF file named 3880proj2_NAMES.pdf.
- Progress Report #3 (Due 4/18)
For this report (2 pages maximum), you will submit an outline of your final report and an outline of the slides you plan to present (e.g., slide 1: research question and data subset; ...; slide 6: comparison of LDA and logistic regression results for variable X; slide 7: cross-validation error for predicting variable Y using variables X1-X3).
You can have between 10 and 15 slides total. Also, please note that your analyses do not need to be completed at this point; you have another week to finalize them.
  - Submit this report as a PDF file named 3880proj3_NAMES.pdf (one submission per group).
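For the kind of slide mentioned in the outline example (cross-validation error for predicting Y using X1-X3), a minimal base-R sketch of k-fold cross-validation is shown here. It is an illustration under stated assumptions, not a required approach: the data are simulated, `Y`, `X1`, `X2`, and `X3` are placeholder names, linear regression is used only as an example model, and k = 5 is an arbitrary choice.

```r
# Simulated stand-in data (hypothetical): Y and X1-X3 are placeholders.
set.seed(3880)
n   <- 150
dat <- data.frame(X1 = rnorm(n), X2 = rnorm(n), X3 = rnorm(n))
dat$Y <- 1 + 2 * dat$X1 - dat$X2 + rnorm(n)

# Randomly assign each row to one of k folds of equal size.
k     <- 5
folds <- sample(rep(1:k, length.out = n))

# For each fold: fit on the other folds, predict the held-out rows,
# and record the held-out mean squared error.
fold_mse <- sapply(1:k, function(i) {
  train <- dat[folds != i, ]
  test  <- dat[folds == i, ]
  fit   <- lm(Y ~ X1 + X2 + X3, data = train)
  mean((test$Y - predict(fit, newdata = test))^2)
})

# The cross-validation error estimate is the average over folds.
cv_error <- mean(fold_mse)
print(cv_error)
```

The same loop can be reused for other models by swapping the `lm` call and the error measure (for example, misclassification rate for a dichotomized outcome).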
- Final Report (Due 4/25)
Your final report and presentation should be at most three typed pages with at most three pages of figures and tables (labeled as Figure 1, 2, ... and Table 1, 2, ...) that you refer to in the report. Your R code should be included as an appendix. Submit this report as a PDF file named 3880proj4_NAMES.pdf.
The report should have the following sections:
  - a brief description of the dataset that you are using, the dependent variable(s) and how they are treated (continuous, dichotomous, ...), and the independent variables;
  - the research questions that you are addressing;
  - the methods that you are using to address your research questions;
  - a description of your results.
- Project Presentations (TBD)
- Project Critiques
  - Everyone will submit a critique of each of the projects (leaving your own blank). The comments should be based on the reports, not on the presentation or slides. The critiques should be written in complete sentences.