Weekly Coding Checkup 14

A Semester Project Made of Custom-Designed Activities

We continue working on the Dr. B & Class project. In this second part of the semester, you will apply what we learned in class to a set of tasks and scenarios. Please remember that the code you submit must be your own and not somebody else's work. It is fine to make mistakes; only by practicing in R can you get a better grasp of the software.

I also want you to build your document as an official report for a potential company (Dreaming Diamonds LLC), on whose behalf you are exploring the diamonds dataset (e.g., spend time on storytelling, commenting on results, and providing insights and conclusions when possible).

Summary of this week's tasks:

This week we will work on modeling workflows using tidymodels and on assessing models. Tidymodels is a unified, integrated collection of packages that supports a smooth approach to data modeling. The bundle includes individual packages that let you perform the necessary steps in a consistent, scalable, and flexible way. So, let’s practice what we have covered so far:

  • Checking correlations

  • Data Splitting

  • Running modeling pipelines

  • Evaluating models and identifying the best model

Tip

Use the diamonds dataset to solve all the tasks below. I have subset it to 15k observations to speed up the modeling process.

Q1:

Write the code necessary to perform the following (2 points):

  • Compute the correlation matrix for all the interval variables in the new diamonds dataset. Make sure to name the matrix diamonds_corr_matrix and to use only complete observations (complete.obs). Finally, write the code to check the values inside the correlation matrix.

Important

Make sure to select the numerical columns by column name.
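
As a reminder of the general pattern, here is a minimal sketch on the built-in mtcars data (not the diamonds data, so it is not a solution to Q1). The object names cars_numeric and cars_corr_matrix and the three columns chosen are assumptions made only for this illustration.

```r
# Illustration on mtcars: select numeric columns by name,
# then compute correlations using only complete rows.
cars_numeric <- mtcars[, c("mpg", "hp", "wt")]
cars_corr_matrix <- cor(cars_numeric, use = "complete.obs")

# Inspect the values, rounded for readability.
round(cars_corr_matrix, 2)
```

The result is a square, symmetric matrix with 1s on the diagonal; each off-diagonal cell is the correlation between the row variable and the column variable.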

Q2:

Write the code necessary to perform the following (1 point):

  • Visualize the correlation matrix created in the previous task with a correlation chart.

Q3:

What did you learn from the correlation analysis? Provide your interpretation of the correlation matrix values and the correlation matrix chart (assume that price is your dependent variable). (0 points)

Q4:

Write the code necessary to perform the following (3 points):

  • Set a seed for the data splitting task (use 007 as the seed value).

  • Then create an initial split that allocates 85% of the data to the training set and 15% to the test set. Assign it to an object named: “diamonds_split”.

  • Create the training and test sets. Make sure to name them “diamonds_train” and “diamonds_test”.
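
The three steps above follow a common rsample pattern. A minimal sketch on mtcars, assuming the rsample package is installed; the seed, the 75/25 proportion, and the cars_* names are arbitrary choices for this illustration and differ from what Q4 asks for.

```r
library(rsample)

# Fix the random number generator so the split is reproducible.
set.seed(123)

# Describe the split: 75% of rows go to training, the rest to testing.
cars_split <- initial_split(mtcars, prop = 0.75)

# Extract the two data frames from the split object.
cars_train <- training(cars_split)
cars_test <- testing(cars_split)

# Check how many rows landed in each set.
nrow(cars_train)
nrow(cars_test)
```

Printing the split object itself also reports the training, testing, and total row counts.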

Q4b:

How many observations do you have in your train and test sets? (0 points)

Q5:

Write the code necessary to create the following recipes:

Q5a Recipe 1

Write the code necessary to apply the following preprocessing steps (3 points):

  • Define a recipe with name “diamonds_recipe1”.
  • Make sure that price is your dependent variable and that you use only the carat and cut variables as predictors.
  • Standardize the carat variable (mean and sd).
  • Create dummy variables for cut.

Q5b Recipe 2

Write the code necessary to apply the following preprocessing steps (3 points):

  • Define a recipe with name “diamonds_recipe2”.
  • Make sure that price is your dependent variable and that you use all the other variables as predictors.
  • Log transform carat (use base 2).
  • Create dummy variables for cut, color and clarity (all nominal columns).
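
To recall how recipe steps chain together, here is a hedged sketch on mtcars, assuming the recipes package is installed. Because mtcars has no factor columns, cyl is converted to a factor first; the names cars, cars_recipe_a, and cars_recipe_b are made up for this illustration.

```r
library(recipes)

# mtcars has no nominal columns, so turn cyl into a factor for this sketch.
cars <- mtcars
cars$cyl <- factor(cars$cyl)

# Outcome on the left of ~, selected predictors on the right.
cars_recipe_a <- recipe(mpg ~ hp + cyl, data = cars) %>%
  step_normalize(hp) %>%   # center to mean 0, scale to sd 1
  step_dummy(cyl)          # one indicator column per non-reference level

# "." means every other column is a predictor.
cars_recipe_b <- recipe(mpg ~ ., data = cars) %>%
  step_log(hp, base = 2) %>%
  step_dummy(all_nominal_predictors())

# prep() estimates the steps; bake(new_data = NULL) returns the processed data.
baked_a <- bake(prep(cars_recipe_a), new_data = NULL)
baked_b <- bake(prep(cars_recipe_b), new_data = NULL)
names(baked_a)
```

Baking a recipe is a quick way to confirm that each step did what you expected before the recipe goes into a workflow.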

Q6:

Write the code necessary to define the following models:

Q6a Model 1

Specify a linear regression model named “linear_reg” with “lm” as engine and with “regression” as mode. (1.5 points)

Q6b Model 2

Specify a ridge regression model named “ridge_reg” with “glmnet” as engine and with “regression” as mode. (1.5 points)

Q6c Model 3

Specify a decision tree model named “decision_tree” with “rpart” as engine and with “regression” as mode. (1.5 points)
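
A parsnip model specification follows the same three-part pattern for every model type: pick the model function, set the engine, set the mode. The sketch below assumes tidymodels is installed and uses the hypothetical names lm_spec, ridge_spec, and tree_spec rather than the names required above. The penalty value of 0.1 is an arbitrary choice for illustration; as far as I know, the glmnet engine expects a single penalty value at fit time, and mixture = 0 is what makes the penalty a pure ridge (L2) penalty.

```r
library(tidymodels)

# Ordinary least squares via stats::lm().
lm_spec <- linear_reg() %>%
  set_engine("lm") %>%
  set_mode("regression")

# Ridge regression: mixture = 0 means pure L2 regularization.
# penalty = 0.1 is an arbitrary illustrative value.
ridge_spec <- linear_reg(penalty = 0.1, mixture = 0) %>%
  set_engine("glmnet") %>%
  set_mode("regression")

# Regression tree via rpart.
tree_spec <- decision_tree() %>%
  set_engine("rpart") %>%
  set_mode("regression")

lm_spec
```

A specification only describes the model; nothing is estimated until it is fit, either directly or inside a workflow.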

Q7:

Write the code necessary to create the following workflows:

Tip

Follow the steps in the order of the instructions.

Q7a Workflow 1

Define a “recipe1_lm_reg_workflow1” object. Make sure it is a workflow that uses diamonds_recipe1 and the linear_reg model, then fit the workflow. Finally, access the results of the fitted workflow in a tidy format. (3 points)

Q7b Workflow 2

Define a “recipe2_lm_reg_workflow2” object. Make sure it is a workflow that uses diamonds_recipe2 and the linear_reg model, then fit the workflow. Finally, access the results of the fitted workflow in a tidy format. (3 points)

Q7c Workflow 3

Define a “recipe1_ridge_reg_workflow3” object. Make sure it is a workflow that uses diamonds_recipe1 and the ridge_reg model, then fit the workflow. Finally, access the results of the fitted workflow in a tidy format. (3 points)

Q7d Workflow 4

Define a “recipe2_ridge_reg_workflow4” object. Make sure it is a workflow that uses diamonds_recipe2 and the ridge_reg model, then fit the workflow. Finally, access the results of the fitted workflow in a tidy format. (3 points)

Q7e Workflow 5

Define a “recipe1_decision_tree_workflow5” object. Make sure it is a workflow that uses diamonds_recipe1 and the decision_tree model, then fit the workflow. Finally, access the results of the fitted workflow. (3 points)

Q7f Workflow 6

Define a “recipe2_decision_tree_workflow6” object. Make sure it is a workflow that uses diamonds_recipe2 and the decision_tree model, then fit the workflow. Finally, access the results of the fitted workflow. (3 points)
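
All six workflows share one shape: start a workflow, add a recipe, add a model, fit, then inspect. A minimal sketch of that shape on mtcars, assuming tidymodels is installed; cars, cars_recipe, lm_spec, and cars_workflow are illustrative names, and fitting on the full mtcars data is a shortcut taken only to keep the example short.

```r
library(tidymodels)

# Small example data with one factor predictor.
cars <- mtcars
cars$cyl <- factor(cars$cyl)

cars_recipe <- recipe(mpg ~ hp + cyl, data = cars) %>%
  step_normalize(hp) %>%
  step_dummy(cyl)

lm_spec <- linear_reg() %>%
  set_engine("lm") %>%
  set_mode("regression")

# Bundle preprocessing and model, then estimate both in one call.
cars_workflow <- workflow() %>%
  add_recipe(cars_recipe) %>%
  add_model(lm_spec) %>%
  fit(data = cars)

# One row per coefficient: term, estimate, std.error, statistic, p.value.
tidy(cars_workflow)
```

For a tree fitted with rpart there is no coefficient table to tidy, which is why the last two workflows only ask you to access the results; printing the fitted workflow, or calling extract_fit_engine() on it, shows the fitted tree.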

Q8:

Write the code necessary to create predictions with all of the above workflows. Keep in mind the following instructions (4 points):

  • Use the workflows in the following order: recipe1_lm_reg_workflow1; recipe2_lm_reg_workflow2; recipe1_ridge_reg_workflow3; recipe2_ridge_reg_workflow4; recipe1_decision_tree_workflow5; recipe2_decision_tree_workflow6

  • Use the following name for the predictions: pred_recipe1_lm_reg_workflow1; pred_recipe2_lm_reg_workflow2; pred_recipe1_ridge_reg_workflow3; pred_recipe2_ridge_reg_workflow4; pred_recipe1_decision_tree_workflow5; pred_recipe2_decision_tree_workflow6

  • Make sure the predicted price values are added to the test set.

  • Finally, show only the diamonds’ actual price and the predicted price.

Important

Make sure to use each of the chunks and corresponding predictions in the exact order specified in the instructions above. Please do not rename the prediction column.
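
The predict-then-bind pattern looks the same for every fitted workflow. Here is a hedged sketch on mtcars, assuming tidymodels is installed; the formula-based workflow and the names cars_fit and pred_cars are assumptions chosen to keep the example self-contained.

```r
library(tidymodels)

set.seed(123)
cars_split <- initial_split(mtcars, prop = 0.75)
cars_train <- training(cars_split)
cars_test <- testing(cars_split)

# Fit a small workflow on the training set only.
cars_fit <- workflow() %>%
  add_formula(mpg ~ hp + wt) %>%
  add_model(linear_reg() %>% set_engine("lm")) %>%
  fit(data = cars_train)

# predict() returns a tibble with a .pred column, one row per test row.
# bind_cols() attaches it to the test set; select() keeps actual vs. predicted.
pred_cars <- predict(cars_fit, new_data = cars_test) %>%
  bind_cols(cars_test) %>%
  select(mpg, .pred)

pred_cars
```

Leaving the prediction column as .pred keeps it compatible with the yardstick metric functions used later.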

Q8a Prediction 1

Q8b Prediction 2

Q8c Prediction 3

Q8d Prediction 4

Q8e Prediction 5

Q8f Prediction 6

Q9:

Write the code necessary to perform the following (3 points):

  • Using the predictions above, compute the MAE, RMSE, and R-squared. Apply the following names: metrics_recipe1_lm_reg_workflow1; metrics_recipe2_lm_reg_workflow2; metrics_recipe1_ridge_reg_workflow3; metrics_recipe2_ridge_reg_workflow4; metrics_recipe1_decision_tree_workflow5; metrics_recipe2_decision_tree_workflow6

Important

Make sure to use each of the chunks and corresponding metrics in the exact order specified in the instructions above.

Tip

Make sure you can identify the model each set of metrics belongs to. Add a model column with the model information (e.g., “recipe1_lm_reg_workflow1”). Finally, make sure to print the metrics object (e.g., metrics_recipe1_lm_reg_workflow1).
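
One way to compute several metrics at once is a yardstick metric set. The sketch below runs on mtcars and assumes tidymodels is installed; reg_metrics, metrics_cars, and the label "cars_lm" are illustrative names, not the ones required in Q9.

```r
library(tidymodels)

set.seed(123)
cars_split <- initial_split(mtcars, prop = 0.75)
cars_train <- training(cars_split)
cars_test <- testing(cars_split)

cars_fit <- workflow() %>%
  add_formula(mpg ~ hp + wt) %>%
  add_model(linear_reg() %>% set_engine("lm")) %>%
  fit(data = cars_train)

pred_cars <- predict(cars_fit, new_data = cars_test) %>%
  bind_cols(cars_test)

# Bundle the three metrics into a single function.
reg_metrics <- metric_set(mae, rmse, rsq)

# truth = observed outcome, estimate = predicted outcome.
metrics_cars <- pred_cars %>%
  reg_metrics(truth = mpg, estimate = .pred) %>%
  mutate(model = "cars_lm")   # label the rows so they stay identifiable

metrics_cars
```

The result has one row per metric, with the metric name in .metric and its value in .estimate.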

Q9a Model Metric 1

Q9b Model Metric 2

Q9c Model Metric 3

Q9d Model Metric 4

Q9e Model Metric 5

Q9f Model Metric 6

Q10:

Write the code necessary to perform the following (3 points):

  • Put all the metrics in one metrics dataset named modeling_metrics_summary.

  • Sort the modeling_metrics_summary object first by metric name in alphabetical order and then by metric value from smallest to largest.
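
Stacking and sorting metric tables can be sketched as follows, again on mtcars and assuming tidymodels is installed. The helper score_model() is hypothetical and exists only to keep this example short; the two model labels are also made up.

```r
library(tidymodels)

set.seed(123)
cars_split <- initial_split(mtcars, prop = 0.75)
cars_train <- training(cars_split)
cars_test <- testing(cars_split)

reg_metrics <- metric_set(mae, rmse, rsq)

# Hypothetical helper: fit an lm workflow for a formula and
# return its labeled test-set metrics.
score_model <- function(formula, label) {
  workflow() %>%
    add_formula(formula) %>%
    add_model(linear_reg() %>% set_engine("lm")) %>%
    fit(data = cars_train) %>%
    predict(new_data = cars_test) %>%
    bind_cols(cars_test) %>%
    reg_metrics(truth = mpg, estimate = .pred) %>%
    mutate(model = label)
}

# Stack the per-model tables, then sort by metric name and metric value.
cars_metrics_summary <- bind_rows(
  score_model(mpg ~ hp, "cars_lm_hp"),
  score_model(mpg ~ hp + wt, "cars_lm_hp_wt")
) %>%
  arrange(.metric, .estimate)

cars_metrics_summary
```

When reading a table sorted this way, keep in mind that smaller is better for mae and rmse, while larger is better for rsq.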

Q10b:

Which one is the best model? Why? (0 points)

🛑 Don’t Click Submit Just Yet 🚧

Please read the information below carefully:

  • Once you have completed all the coding questions, and you are confident in your work (take advantage of the immediate feedback feature), copy and paste your responses from the chunk into the form fields below each question.

  • You are responsible for correctly copying and pasting only the code required to solve each question. We will grade only what you have submitted!

  • We will grade only 1 submission per student, so do not click Submit until you are confident in your responses.

  • By submitting this form you are certifying that you have followed the academic integrity guidelines available in the syllabus. The code and answers submitted are the results of your work and your work only!

  • Make sure you have completed all the questions and included all the required personal information (e.g., full name, email, zid) in the respective form’s fields.

  • Now you are ready to click the above “Submit” button. Congrats, you have completed your weekly coding checkup!