Project description

1 Timeline

Expected results

The three primary deliverables for the final project are

2 Project draft

The purpose of writing the draft at an early stage is to help you think about your analysis strategy. You can also prepare your presentation based on this structure.

In your draft, you should have:

2.1 Section 1 - Introduction

The introduction section includes

  • an introduction to the subject matter you’re investigating
  • the motivation for your research question (citing any relevant literature)
  • the general research question you wish to explore
  • your hypotheses regarding the research question of interest.

2.2 Section 2 - Data description

In this section, you will describe the data set you wish to explore. This includes

  • description of the observations in the data set,
  • description of how the data was originally collected (not how you found the data but how the original curator of the data collected it).

2.3 Section 3 - Analysis approach

In this section, you will provide a brief overview of your analysis approach. This includes:

  • Description of the response variable.
  • Visualization and summary statistics for the response variable.
  • List of variables that will be considered as predictors.
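
As a rough illustration of the summary statistics mentioned above, here is a minimal sketch in Python using only the standard library. The response values are made up, the helper name `summarize` is hypothetical, and `None` is assumed to mark a missing observation; the same summary can be produced with whatever tool you use for the report.

```python
import statistics


def summarize(values):
    """Summary statistics for a numeric response; None marks a missing value."""
    observed = [v for v in values if v is not None]
    return {
        "n": len(observed),
        "missing": len(values) - len(observed),
        "mean": statistics.mean(observed),
        "median": statistics.median(observed),
        "sd": statistics.stdev(observed),  # sample standard deviation
        "min": min(observed),
        "max": max(observed),
    }


# Made-up response values, for illustration only.
response = [12.0, 15.5, None, 9.0, 14.0, 11.5]
summary = summarize(response)

for name, value in summary.items():
    # Round floats to two decimals so the table stays readable.
    shown = f"{value:.2f}" if isinstance(value, float) else str(value)
    print(f"{name}: {shown}")
```

Reporting the number of missing values next to the other statistics makes it clear how many observations the summary is actually based on.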

2.4 Data dictionary (aka code book)

Submit a data dictionary for all the variables in your data set in the README of your project repo, in the data folder. Link to this file from your proposal.

2.5 Submission

Push all of your final changes to the GitHub repo, and submit the PDF of your proposal to Moodle.

3 Reproducibility of your repo

The GitHub repo should have the following structure:

  • README: Short project description and data dictionary

  • written-report.qmd & written-report.pdf: Final written report

  • /data: Folder that contains the data set for the final project.

  • /previous-work: Folder that contains the topic-ideas and project-proposal files.

  • /presentation: Folder with the presentation slides.

    • If your presentation slides are online, you can put a link to the slides in a README.md file in the presentation folder.

Evaluation will be based on the reproducibility of the written report and the organization of the project GitHub repo. The repo should be neatly organized as described above, with no extraneous files, and all text in the README should be easily readable.
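
One way to check the layout before submitting is a small script like the sketch below. It is only an illustration: the function names are hypothetical, the README is assumed to be named README.md, and hidden entries such as .git or .gitignore are assumed to be acceptable and are skipped.

```python
from pathlib import Path

# Expected top-level layout, taken from the list above.
# Assumption: the README is stored as README.md.
REQUIRED_FILES = ["README.md", "written-report.qmd", "written-report.pdf"]
REQUIRED_DIRS = ["data", "previous-work", "presentation"]


def missing_items(repo_root):
    """Return the required files and folders that are absent from repo_root."""
    root = Path(repo_root)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    missing += [name + "/" for name in REQUIRED_DIRS if not (root / name).is_dir()]
    return missing


def extraneous_items(repo_root):
    """Return top-level entries that are not part of the expected layout."""
    expected = set(REQUIRED_FILES) | set(REQUIRED_DIRS)
    root = Path(repo_root)
    return sorted(
        entry.name
        for entry in root.iterdir()
        # Assumption: hidden entries (.git, .gitignore, ...) are not extraneous.
        if entry.name not in expected and not entry.name.startswith(".")
    )


if __name__ == "__main__":
    # Run from the top level of the project repo.
    print("Missing:", missing_items("."))
    print("Extraneous:", extraneous_items("."))
```

The script only inspects the top level of the repo; it does not verify that the report renders or that the data dictionary is complete.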

3.1 README

For each repo, include the following:

3.1.1 Introduction and data

  • Describe the observations and the general characteristics being measured in the data

3.1.2 Research question

  • Describe a research question you’re interested in answering using this data.

3.1.3 Glimpse of data

  • Use the glimpse function to provide an overview of each data set.

4 Final presentation

4.1 Structure

The presentation is neatly prepared and organized, with clear section headers and appropriately sized figures with informative labels. Numerical results are displayed with a reasonable number of digits, and all visualizations are neatly formatted. All citations and links are properly formatted. All code, warnings, and messages are suppressed. The presentation stays within the allotted time.

You need to create presentation slides and showcase your project. Introduce your research question and data set, showcase visualizations, and discuss the primary conclusions. These slides should serve as a brief visual addition to your written report and will be graded for content and quality.

The slide deck should have no more than 6 content slides + 1 title slide. Here is a suggested outline as you think through the slides; you do not have to use this exact format for the 6 slides.

  • Title Slide
  • Slide 1: Introduce the topic and motivation
  • Slide 2: Previous research that is relevant to your hypotheses
  • Slide 3: Hypotheses and research questions
  • Slide 4: Introduce the data
  • Slide 5: Calculation
  • Slide n: Conclusions (3 take-home messages)

4.2 Reminder

  1. Include page numbers.
  2. State the unit of each axis in every figure.
  3. Use a text size larger than 20.
  4. Keep to 15 minutes and no more than 15 slides.
  5. Give 3 take-home messages on the last slide as the conclusion.

5 Project report

The purpose of the draft and peer review is to give you an opportunity to get early feedback on your analysis. Therefore, the draft and peer review will focus primarily on the exploratory data analysis, modeling, and initial interpretations.

Write the draft in the written-report.qmd file in your project repo. You do not need to submit the draft.

Below is a brief description of the sections to focus on in the draft:

5.1 Introduction and data

This section includes an introduction to the project motivation, data, and research question. Describe the data and definitions of key variables. It should also include some exploratory data analysis. All of the EDA won’t fit in the paper, so focus on the EDA for the response variable and a few other interesting variables and relationships.

5.2 Methodology

This section includes a brief description of your modeling process. Explain the reasoning for the type of model you’re fitting, predictor variables considered for the model including any interactions. Additionally, show how you arrived at the final model by describing the model selection process, any variable transformations (if needed), and any other relevant considerations that were part of the model fitting process.

5.3 Results

In this section, you will output the final model and include a brief discussion of the model assumptions, diagnostics, and any relevant model fit statistics.

This section also includes initial interpretations and conclusions drawn from the model.

6 Grading criteria

The research question and motivation are clearly stated in the introduction, including citations for the data source and any external research. The data are clearly described, including a description of how the data were originally collected and a concise definition of the variables relevant to understanding the report. The data cleaning process is clearly described, including any decisions made in the process (e.g., creating new variables, removing observations, etc.). The exploratory data analysis helps the reader better understand the observations in the data along with interesting and relevant relationships between the variables. It incorporates appropriate visualizations and summary statistics.

6.1 Methodology

This section includes a brief description of your modeling process.

6.1.1 Grading criteria

The analysis steps are appropriate for the data and research question.

6.2 Results

Describe the key results from the graph. Focus on the variables that help you answer the research question and that provide relevant context for the reader.

6.2.1 Grading criteria

The visualization is clearly assessed, and interesting findings are clearly described. Interpretations of statistical tests, rather than visual comparison alone, are used to support the key findings and conclusions.

6.3 Discussion + Conclusion

In this section you’ll include a summary of what you have learned about your research question along with statistical arguments supporting your conclusions. In addition, discuss the limitations of your analysis and provide suggestions on ways the analysis could be improved. Any potential issues pertaining to the reliability and validity of your data and appropriateness of the statistical analysis should also be discussed here. Lastly, this section will include ideas for future work.

6.3.1 Grading criteria

Overall conclusions from analysis are clearly described, and the statistical results are put into the larger context of the subject matter and original research question. There is thoughtful consideration of potential limitations of the data and/or analysis, and ideas for future work are clearly described.

6.4 Organization + formatting

This is an assessment of the overall presentation and formatting of the written report.