Assignment #2
Due Date: November 15th, 11:59pm
This assignment is worth 10% of your final grade and has 4 sections. Please save all your code for this assignment as a .R file and name that file “assignment_2_lastname_firstname.R”. If your .R file is not correctly named, you will lose 0.5 mark. Be sure to save while you write your code for this assignment. Please number each part in your assignment in the form of a comment. At the top of your code, please comment your name, student number, and “Assignment #2”.
Section 1.
Part 1.1. We have created three lines of code that you will need to run at the beginning of your R code. Each line of code will produce 20 numbers and you will need to create a vector with these numbers and assign it the appropriate name (please see below). Please only run these lines of code once. Please note that each student will receive a random set of numbers (i.e. your numbers will not be the same as any of your classmates'). (0.75 mark)
• sample(160:200, 20, replace=TRUE)
  o Assign the above 20 numbers to the name ‘height’.
• sample(50:113, 20, replace=TRUE)
  o Assign the above 20 numbers to the name ‘weight’.
• sample(0:1260, 20, replace=TRUE)
  o Assign the above 20 numbers to the name ‘exercise_minutes’.
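Because every call to sample() draws fresh numbers, one way to make sure each line is effectively run only once is to store its result in a named vector right away. The sketch below assumes that reading of Part 1.1; the set.seed() call is an addition for reproducibility and is not part of the assignment.

```r
# Assumption: set.seed() is only here so this sketch gives the same
# numbers every time; the assignment itself does not ask for a seed.
set.seed(1)

# Store each draw immediately so the 20 numbers are generated once.
height <- sample(160:200, 20, replace = TRUE)
weight <- sample(50:113, 20, replace = TRUE)
exercise_minutes <- sample(0:1260, 20, replace = TRUE)

# Quick sanity checks on length and range.
length(height)
range(weight)
range(exercise_minutes)
```

An alternative reading is to run each sample() line once in the console and copy the printed numbers into c(...); either way the vectors should not be regenerated later in the script.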
Part 1.2. Using the three vectors produced in Part 1.1, create a dataframe and name this dataframe ‘df’. Once you create this dataframe, it will appear in your Global Environment, and it should contain 20 observations and 3 variables. (0.25 mark)
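As a minimal sketch of Part 1.2, the three vectors can be passed to data.frame(), which uses the vector names as column names. The vectors are regenerated here only so the block runs on its own, and the seed is an assumption added for reproducibility.

```r
# Assumption: the seed and the regenerated vectors are only here to make
# this block self-contained; in your script the vectors already exist.
set.seed(1)
height <- sample(160:200, 20, replace = TRUE)
weight <- sample(50:113, 20, replace = TRUE)
exercise_minutes <- sample(0:1260, 20, replace = TRUE)

# Column names are taken from the vector names.
df <- data.frame(height, weight, exercise_minutes)

dim(df)  # number of observations (rows) and variables (columns)
str(df)  # structure: one line per variable
```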
Section 2.
Part 2.1. Last week, you learned about covariance and correlation in the online lecture. You will now apply what you learned in this assignment. In the next parts, you will be calculating the correlation coefficient of the ‘weight’ and ‘exercise_minutes’ variables. (There are no marks for this part, this is just a description.)
Part 2.2. Plot a histogram of each variable in the ‘df’ dataframe, comment on its shape, and add any other relevant observations about each histogram. (2.25 marks)
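One possible way to draw the three histograms is base R's hist(), looping over the column names. This is only a sketch: the data are regenerated with an assumed seed so the block is self-contained, and the titles and axis labels are illustrative choices.

```r
# Assumption: seed and regenerated data are for illustration only.
set.seed(1)
df <- data.frame(
  height = sample(160:200, 20, replace = TRUE),
  weight = sample(50:113, 20, replace = TRUE),
  exercise_minutes = sample(0:1260, 20, replace = TRUE)
)

# Draw the three histograms side by side.
par(mfrow = c(1, 3))
for (v in names(df)) {
  hist(df[[v]], main = paste("Histogram of", v), xlab = v)
}
par(mfrow = c(1, 1))  # restore the default single-plot layout

# Example of the kind of comment expected:
# The weight histogram looks roughly flat, with no single clear peak.
```

Since sample() draws uniformly from each range, roughly flat histograms with some random bumps are plausible with only 20 values, but the comments should describe what your own plots show.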
Part 2.3. Calculate the mean, median, and standard deviation of each of the variables, provide these values as comments, and comment on how they relate to the histograms. (3 marks)
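The three summary statistics can be computed one variable at a time with mean(), median(), and sd(), or for all columns at once with sapply(). The sketch below takes the second route; the seed and the regenerated data are assumptions made so the block runs on its own, and the name ‘stats’ is hypothetical.

```r
# Assumption: seed and regenerated data are for illustration only.
set.seed(1)
df <- data.frame(
  height = sample(160:200, 20, replace = TRUE),
  weight = sample(50:113, 20, replace = TRUE),
  exercise_minutes = sample(0:1260, 20, replace = TRUE)
)

# One column of results per variable; rows are mean, median, and sd.
stats <- sapply(df, function(v) {
  c(mean = mean(v), median = median(v), sd = sd(v))
})
round(stats, 2)

# Example of the kind of comment expected:
# A mean noticeably above the median usually goes with a longer right
# tail in the histogram; a mean close to the median suggests symmetry.
```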
Part 2.4. Create a new variable (i.e. column) in the ‘df’ dataframe and name this column ‘new_variable1’. This variable should contain the values calculated using the following calculation: $(x_i - \bar{x})$, where $x_i$ is each value in the ‘weight’ variable (i.e. $x_1$ is the first value of the ‘weight’ variable) and $\bar{x}$ is the mean of all the values in the ‘weight’ variable. (0.25 mark)
Part 2.5. Create a new variable (i.e. column) in the ‘df’ dataframe and name this column ‘new_variable2’. This variable should contain the values calculated using the following calculation: $(y_i - \bar{y})$, where $y_i$ is each value in the ‘exercise_minutes’ variable (i.e. $y_3$ is the third value of the ‘exercise_minutes’ variable) and $\bar{y}$ is the mean of all the values in the ‘exercise_minutes’ variable. (0.25 mark)
Part 2.6. Create a new variable (i.e. column) in the ‘df’ dataframe and name this column ‘product_2_variables’. This variable should contain the values of the product between the ‘new_variable1’ and ‘new_variable2’ columns you created in Part 2.4 and Part 2.5. (Here is the equation: $(x_i - \bar{x})(y_i - \bar{y})$.) (0.25 mark)
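The deviation and product columns in Parts 2.4 to 2.6 can be illustrated on a tiny made-up dataset. The data frame ‘toy’ and its column names below are hypothetical and chosen so the arithmetic is easy to check by hand; the same pattern would apply to the columns of ‘df’.

```r
# Hypothetical toy data, not the assignment's data.
toy <- data.frame(x = c(60, 70, 80), y = c(300, 200, 100))

# Deviation of each value from its column mean (vectorised subtraction).
toy$dev_x <- toy$x - mean(toy$x)  # -10, 0, 10
toy$dev_y <- toy$y - mean(toy$y)  # 100, 0, -100

# Element-wise product of the two deviation columns.
toy$product <- toy$dev_x * toy$dev_y  # -1000, 0, -1000

toy
sum(toy$dev_x)  # deviations from the mean always sum to 0
```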
Part 2.7. In this part you will need to calculate the correlation coefficient of the ‘weight’ and ‘exercise_minutes’ variables. We have walked you through the first steps of how to calculate this coefficient in Parts 2.3-2.6. Now, using those parts, the knowledge that you gained from Lecture 7, and the equation presented in Lecture 7 Slide 21, calculate the correlation coefficient and provide this value as a comment. Please be sure to use R to calculate the coefficient and show all your steps in this component. (1.5 marks)
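The equation on Lecture 7 Slide 21 is not reproduced in this handout. The sketch below assumes it is the usual sample Pearson coefficient, i.e. the sum of the deviation products divided by (n - 1) times the two standard deviations; if the slide writes it in a different but equivalent form, follow the slide. The data frame ‘toy’ and all variable names here are hypothetical.

```r
# Hypothetical toy data, not the assignment's data.
toy <- data.frame(
  x = c(60, 70, 80, 90, 100),
  y = c(300, 100, 250, 50, 150)
)

# Step 1: deviations from the mean and their products.
dev_x <- toy$x - mean(toy$x)
dev_y <- toy$y - mean(toy$y)
products <- dev_x * dev_y

# Step 2: sample covariance = sum of products / (n - 1).
n <- nrow(toy)
covariance <- sum(products) / (n - 1)

# Step 3: divide by the product of the sample standard deviations.
# Assumed formula: r = sum((x - xbar) * (y - ybar)) / ((n - 1) * sx * sy)
r <- covariance / (sd(toy$x) * sd(toy$y))
r
```

Keeping each step in its own named object makes it easy to show the intermediate values as comments.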
Section 3.
Part 3.1. Using two built-in R functions, calculate both the covariance and correlation of the ‘weight’ and ‘exercise_minutes’ variables. Provide both values as comments. (1.0 mark)
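Base R provides cov() and cor() for these two quantities; by default cor() returns the Pearson coefficient. A brief sketch on hypothetical toy data (the names ‘toy’, ‘cov_xy’, and ‘cor_xy’ are illustrative):

```r
# Hypothetical toy data, not the assignment's data.
toy <- data.frame(
  x = c(60, 70, 80, 90, 100),
  y = c(300, 100, 250, 50, 150)
)

cov_xy <- cov(toy$x, toy$y)  # sample covariance
cor_xy <- cor(toy$x, toy$y)  # Pearson correlation (the default method)

cov_xy
cor_xy

# The two are linked: correlation is covariance rescaled by both sds.
cov_xy / (sd(toy$x) * sd(toy$y))
```

The last line should reproduce cor_xy, which is a handy check against a hand calculation.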
Section 4.
Part 4.1. Please describe the direction and effect size of your correlation coefficient calculated in Part 2.7 as comments in your code. (0.5 mark)
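To illustrate what "direction and effect size" could look like, the hypothetical helper below labels a coefficient by its sign and by Cohen's commonly cited cut-offs (about 0.1 small, 0.3 medium, 0.5 large). These thresholds are an assumption; if the lecture gives different ones, use those instead.

```r
# Hypothetical helper; thresholds follow Cohen's conventions (assumed,
# the course may define effect sizes differently).
describe_r <- function(r) {
  direction <- if (r > 0) {
    "positive"
  } else if (r < 0) {
    "negative"
  } else {
    "no direction"
  }

  size <- abs(r)
  effect <- if (size >= 0.5) {
    "large"
  } else if (size >= 0.3) {
    "medium"
  } else if (size >= 0.1) {
    "small"
  } else {
    "negligible"
  }

  paste(direction, effect, sep = ", ")
}

describe_r(-0.42)  # "negative, medium"
describe_r(0.05)   # "positive, negligible"
```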