Weekly Coding Checkup 3

A Semester Project Made of Custom-Designed Activities

We continue working on the Dr. B & Class project. Throughout the semester you will apply what we learned in class to a set of tasks and scenarios custom designed for you. Please remember that the code you submit must be your own and not somebody else's work. It is fine to make mistakes, but only by practicing in R can you get a better grasp of the software. I also want you to try building your document as an official report for a potential company (Dreaming Diamonds LLC) for which you are getting to know and exploring the diamonds dataset (e.g., spend time on storytelling, commenting on results, and providing insights and conclusions when possible).

Summary of this week's tasks:

This week we learn how to combine data manipulations using pipes. In an ideal world, you would receive a dataset ready for analysis. However, this is almost never the case. Cleaning and wrangling data are critical skills for data scientists and should be performed before moving on to modeling. So, let’s practice what we covered this week:

  • Manipulations using pipes

Tip

Use the diamonds dataset to solve all the tasks below.

Important

You must use the pipe (|>) to complete each task, and you must not create any intermediate objects.
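
As a rough illustration of what chaining with the pipe looks like, here is a small sketch on the built-in mtcars dataset (deliberately not diamonds, so it does not answer any task). It assumes R 4.1 or later, where the native pipe |> is available, and that dplyr is loaded (on its own or through tidyverse). The choice of columns and the cutoff of 3 rows are arbitrary.

```r
library(dplyr)

# Nested calls: correct, but they must be read from the inside out
head(arrange(filter(mtcars, cyl == 4), desc(mpg)), 3)

# The same steps with the pipe: each result feeds the next function,
# and no intermediate object is created along the way
mtcars |>
  filter(cyl == 4) |>      # keep only 4-cylinder cars
  arrange(desc(mpg)) |>    # sort from highest to lowest mpg
  head(3)                  # show the first three rows
```

Both versions should return the same three rows; the piped one simply lists the steps in the order they happen.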

Q1: Load the tidyverse package. (1 point)

Q2: Use the diamonds dataset and complete the following (3 points):

  1. Group the dataset using the cut variable.
  2. Compute the following descriptive statistics for the carat variable: minimum (min_carat), average (avg_carat), median (median_carat), maximum (max_carat), standard deviation (sd_carat).
  3. Produce the count (diamonds_count) of how many diamonds you have in each cut.

Warning

Make sure to respect the order given in the above instructions (both for the functions and the columns) and to match the column names exactly.
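
To show the general shape of a grouped summary with named output columns, here is a hedged sketch on mtcars rather than diamonds. The column names (min_mpg, avg_mpg, sd_mpg, cars_count) are made up for this illustration and are not the names required by the task.

```r
library(dplyr)

mtcars |>
  group_by(cyl) |>              # one group per number of cylinders
  summarise(
    min_mpg = min(mpg),         # name on the left, computation on the right
    avg_mpg = mean(mpg),
    sd_mpg = sd(mpg),
    cars_count = n()            # n() counts the rows in each group
  )
```

The output columns appear in the order they are written inside summarise(), after the grouping column, which is why the order of the instructions matters.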

Q2b1: What is the cut with the lowest number of observations in the dataset? (1 point)

Q2b2: What is the cut with the largest number of observations in the dataset? (1 point)

Q2b3: What is the cut with the highest average carat? (1 point)

Q2b4: What is interesting about this analysis? (0 points)

Q3: Use the diamonds dataset and complete the following (4 points):

  1. Keep in the diamonds dataset only the carat, cut and price columns.
  2. Sort the dataset from the highest to the lowest price.
  3. Compute a new column named price_per_carat and equal to price/carat.
  4. Keep in the diamonds data frame only the observations with price_per_carat above $10,000 and with a Fair cut.

Warning

Make sure to respect the order given in the above instructions (both for the functions and the columns) and to match the column names exactly. Moreover, in select() you must select columns by position (not by name).
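
Selecting by position and then sorting, mutating, and filtering can be sketched as follows, again on mtcars so that the task itself is left to you. The positions 1, 2, and 4 correspond to mpg, cyl, and hp in mtcars; the new column hp_per_cyl and the threshold of 25 are hypothetical choices made only for this example.

```r
library(dplyr)

mtcars |>
  select(1, 2, 4) |>                    # columns by position: mpg, cyl, hp
  arrange(desc(hp)) |>                  # highest to lowest horsepower
  mutate(hp_per_cyl = hp / cyl) |>      # new column built from existing ones
  filter(hp_per_cyl > 25, cyl == 8)     # both conditions must hold
```

Conditions separated by a comma inside filter() are combined with a logical AND. To find a column's position, names(mtcars) lists the columns in order.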

Q3b1: How many observations are left in the dataset? (1 point)

Q3b2: What is the highest price_per_carat for a diamond with a Fair cut? (1 point)

Q3b3: What is interesting about this analysis? (0 points)

Q4: Use the diamonds dataset and complete the following (4 points):

  1. Group the dataset using the color variable.
  2. Compute the following descriptive statistics for the price variable: minimum (min_price), average (avg_price), median (median_price), maximum (max_price), standard deviation (sd_price).
  3. Produce the count (diamonds_count) of how many diamonds you have for each color.
  4. Sort the data from the highest median_price to the lowest.

Warning

Make sure to respect the order given in the above instructions (both for the functions and the columns) and to match the column names exactly.

Q4b1: What is the color with the lowest number of observations in the dataset? (1 point)

Q4b2: What is the color with the largest number of observations in the dataset? (1 point)

Q4b3: What is the color with the highest median price? (1 point)

Q4b4: What is interesting about this analysis? (0 points)

Q5: Use the diamonds dataset and complete the following (7 points):

  1. Keep in the diamonds dataset only the clarity, price, x, y and z columns.
  2. Compute a new column named size and equal to x*y*z.
  3. Compute a new column named price_by_size and equal to price/size.
  4. Sort the data from the smallest to the largest price_by_size.
  5. Group the observations by clarity.
  6. Compute the median price_by_size (median_price_by_size) for each clarity.
  7. Keep in the dataset only observations with clarity equal to “IF” or “I1”.

Warning

Make sure to respect the order given in the above instructions (both for the functions and the columns) and to match the column names exactly. Moreover, in select() you must select columns by position (not by name).
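
Filtering a summarised table down to a few groups can be sketched like this, once more on mtcars. The summary column median_wt and the choice of gear values 3 and 5 are assumptions made purely for illustration.

```r
library(dplyr)

mtcars |>
  group_by(gear) |>
  summarise(median_wt = median(wt)) |>   # one row per gear value
  filter(gear %in% c(3, 5))              # keep rows whose gear is 3 or 5
```

The %in% operator checks membership in a set of values, which is one way to express "equal to A or B" in a single condition.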

Q5b1: What is the median price_by_size for diamonds with IF clarity? (1 point)

Q5b2: What is the median price_by_size for diamonds with I1 clarity? (1 point)

Q5b3: Does it make sense that the median price_by_size for the IF clarity is bigger than the one for the I1 clarity? Why? (0 points)

🛑 Don’t Click Submit Just Yet 🚧

Please read the information below carefully:

  • Once you have completed all the coding questions, and you are confident in your work, copy and paste your responses from the chunk into the form fields below each question.

  • You are responsible for correctly copying and pasting only the required code to solve each question. We will grade only what you have submitted!

  • We will only grade 1 submission per student so do not click Submit until you are confident in your responses.

  • By submitting this form you are certifying that you have followed the academic integrity guidelines available in the syllabus. The code and answers submitted are the results of your work and your work only!

  • Make sure you have completed all the questions and included all the required personal information (e.g., full name, email, zid) in the respective form’s fields.

  • Now you are ready to click the above “Submit” button. Congrats, you have completed your weekly coding checkup!