The goal of this culminating statistics project is to identify the factors that might influence a chosen topic or variable. You will collect and analyse data from a secondary source for three variables: one dependent variable and two independent variables.

For example, what are the factors that affect climate change? If climate change were your dependent variable, what could you pick for your independent variables? You will choose two factors, build a model for each, and compare the models to see whether either shows a strong correlation. Your final report can be submitted as a Word document and will include the following sections.

Dependent variable: Choose the main variable that you want to compare against other variables. This will go on the vertical axis of your scatter plots. Provide a rationale for why you chose this variable and brainstorm different variables that may affect it. Research the variable on the internet and collect data for at least 10 years (the more the better) or for at least 10 different places in the same year.

Note: Statistics Canada is our most reliable source for data and one of the best sources to get a wide variety of data. Be sure to cite your source in proper MLA or APA format. You may want to revisit your Portfolio entry from Unit 5 Activity 2 for the data that you found from Statistics Canada. You can also revisit the tutorial video in that activity.

If you are having trouble finding two variables, consider a broad, general variable like the ones in the statistics for 42 Countries.

Independent variables: Choose the independent variables that you want to measure against the dependent variable. You will find two different variables, each of which you can plot against the dependent variable with a line of best fit to test for correlation. This will be the biggest challenge of the assignment: the data for each independent variable must cover the same years or places as the data you chose for your dependent variable.

For example: If you found data for the years 1990 to 2010 for your dependent variable, you will need data for those same years for each of your independent variables. If you had data across the provinces and territories for your dependent variable for 2016, you will need data across the provinces and territories for 2016 for each of your independent variables.
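The overlap requirement can be checked with a short script. The sketch below is only an illustration: the three dictionaries hold made-up placeholder values, and the helper name `common_keys` is hypothetical. It keeps only the years that appear in all three data sets.

```python
# Made-up placeholder values keyed by year; replace them with your own data.
dependent = {1990: 5.1, 1991: 5.3, 1992: 5.0, 1993: 5.6}
independent_a = {1991: 12.0, 1992: 12.4, 1993: 13.1, 1994: 13.5}
independent_b = {1990: 0.8, 1991: 0.9, 1992: 0.7, 1993: 1.1}


def common_keys(*series):
    """Return the sorted keys (years or places) present in every series."""
    shared = set(series[0])
    for s in series[1:]:
        shared &= set(s)
    return sorted(shared)


years = common_keys(dependent, independent_a, independent_b)

# Build aligned lists so that position i refers to the same year in each list.
y_values = [dependent[year] for year in years]
a_values = [independent_a[year] for year in years]
b_values = [independent_b[year] for year in years]

print("Years available in all three data sets:", years)
```

The same approach works when the keys are provinces and territories instead of years. If fewer than 10 keys survive, the data sets do not overlap enough for this project.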

One variable measures with graphs and distributions: Describe each of your three variables (two independent and one dependent) by calculating and referring to the measures of central tendency and spread, and by creating a histogram and a box plot for each. Describe the shape of each distribution as well. As you state each measure, explain what it means for that variable.
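As a rough sketch of these calculations, the Python standard library's `statistics` module covers the measures of central tendency and spread. The data below are made-up placeholders, and the helper names `describe` and `histogram_counts` are hypothetical. Quartile conventions differ between textbooks and software, so the quartiles here (the module's "inclusive" method) may not match values found by taking the median of each half.

```python
import statistics


def describe(values):
    """Return common one-variable measures for a list of numbers."""
    # Assumption: "inclusive" quartiles; your course may use another convention.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "modes": statistics.multimode(values),
        "range": max(values) - min(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
    }


def histogram_counts(values, bins):
    """Count how many values fall into each of `bins` equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    if width == 0:  # every value is identical
        counts[0] = len(values)
        return counts
    for v in values:
        # The maximum value is placed in the last bin.
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    return counts


# Made-up placeholder data; replace with one of your three variables.
sample = [2, 4, 4, 5, 7, 9, 11, 12, 14, 22]
summary = describe(sample)
counts = histogram_counts(sample, bins=4)

for name, value in summary.items():
    print(f"{name}: {value}")
for count in counts:
    print("#" * count)  # a quick text histogram, one row per bin
```

The five values needed for a box plot are the minimum, Q1, the median, Q3, and the maximum. In this placeholder sample the mean is larger than the median and the bin counts shrink toward the high end, which is the pattern you would describe as skewed to the right.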

Two variable measures with linear models and correlations: Create two scatter plots, one for each of your independent variables, with the dependent variable on the vertical axis. Create a linear model for each to describe the relationship. Describe the strength of each correlation and the meaning of the slopes and y-intercepts.
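A spreadsheet or graphing tool will normally produce the line of best fit for you. The sketch below shows the underlying least-squares calculation, assuming made-up placeholder data and a hypothetical helper named `linear_fit`. It returns the slope, the y-intercept, and the correlation coefficient r.

```python
import math


def linear_fit(xs, ys):
    """Least-squares line y = slope * x + intercept, plus correlation r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


# Made-up placeholder data: xs is one independent variable, ys the dependent one.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept, r = linear_fit(xs, ys)
print(f"y = {slope:.2f}x + {intercept:.2f}, r = {r:.3f}, r^2 = {r * r:.3f}")
```

The slope is the predicted change in the dependent variable for each one-unit increase in the independent variable, and the y-intercept is the predicted value when the independent variable is zero. Values of r close to 1 or -1 indicate a strong linear correlation, and values close to 0 indicate a weak one.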

Identify and remove relevant points: Using residual plots, identify the one point on each scatter plot that has the greatest impact on the model. Using the mean, standard deviation, and z-score, describe the percentile the point would fall in if the data were normally distributed. Then remove the point, recalculate the linear model, and describe the point's impact by comparing the new model to the original.
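One possible way to carry out these steps is sketched below. It rests on several assumptions that your course may handle differently: the data are made-up placeholders, the point with the largest absolute residual is treated as the one with the greatest impact, the z-score is calculated on the dependent variable using the sample standard deviation, and the helper name `fit_line` is hypothetical.

```python
import statistics


def fit_line(xs, ys):
    """Least-squares slope and y-intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up placeholder data; the point (4, 14) sits well off the trend.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 6, 14, 10, 12]

slope, intercept = fit_line(xs, ys)

# Residual = observed value minus the value predicted by the model.
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]

# Assumption: largest absolute residual stands in for "greatest impact".
worst = max(range(len(xs)), key=lambda i: abs(residuals[i]))

# z-score and percentile of that point's y-value, assuming a normal model.
z = (ys[worst] - statistics.mean(ys)) / statistics.stdev(ys)
percentile = statistics.NormalDist().cdf(z) * 100

# Refit the model without the point and compare.
new_xs = xs[:worst] + xs[worst + 1:]
new_ys = ys[:worst] + ys[worst + 1:]
new_slope, new_intercept = fit_line(new_xs, new_ys)

print(f"Removed point: ({xs[worst]}, {ys[worst]})")
print(f"z-score: {z:.2f}, percentile: {percentile:.1f}")
print(f"Original model: y = {slope:.2f}x + {intercept:.2f}")
print(f"New model:      y = {new_slope:.2f}x + {new_intercept:.2f}")
```

Comparing the slope, y-intercept, and correlation before and after removal shows how much that single point was pulling the line of best fit.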

Classification of correlation type: For each of the models, classify the correlation type as cause and effect, reverse cause and effect, accidental or common cause. Explain your choice and brainstorm possible common cause factors where applicable.

Conclusions from linear models: Draw conclusions from the two-variable analysis, taking into consideration the removal of points that are far from the line of best fit. Discuss the overall impact, if any, that the independent variables have on the dependent variable by referring to other sections of the report.