Weekly Coding Checkup 10

A Semester Project Made of Custom-Designed Activities

We continue working on the Dr. B & Class project. In this second part of the semester, you will apply what we learned in class to a set of tasks and scenarios. Please remember that the code you submit must be your own and not somebody else's work. It is fine to make mistakes; only by practicing in R can you get a better grasp of the software.

I also want you to build your document as an official report for a potential company (Dreaming Diamonds LLC), on whose behalf you are exploring the diamonds dataset. Spend time on storytelling: comment on your results and provide insights and conclusions whenever possible.

Summary of this week's tasks:

This week we will work on modeling workflows using tidymodels. Tidymodels is a collection of packages that share a unified interface for data modeling. The bundle includes individual packages (such as recipes, parsnip, and workflows) that let you perform the necessary steps in a consistent, scalable, and flexible way. So, let’s practice what we covered this week:

• Perform preprocessing steps with recipes

• Specify models with parsnip

• Streamline recipes and models with workflows

Tip

Use the diamonds dataset to solve all the tasks below. I have subset it to 15k observations to speed up the modeling process.

Q1:

Write the code necessary to apply the following preprocessing steps (3 points):

  • Define a recipe with name “diamonds_recipe1”.
  • Make sure that price is your dependent variable and that you use all the other variables as predictors.
  • Standardize the price and carat variables (mean of 0 and sd of 1).
  • Create dummy variables for cut, color and clarity (all nominal columns).
  • Create a “price_per_carat” column equal to price/carat.
  • Keep only the first 3000 rows.
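As an illustration of the kinds of steps listed above, here is a minimal sketch on the built-in iris data rather than the course's diamonds subset. The object name `iris_recipe`, the `petal_ratio` column, and the choice of 100 rows are placeholders invented for this example, and the use of the native pipe `|>` assumes R 4.1 or later.

```r
library(recipes)

# Stand-in data: iris, not the diamonds subset used in the course
iris_recipe <- recipe(Sepal.Length ~ ., data = iris) |>
  # center to mean 0 and scale to sd 1
  step_normalize(Sepal.Length, Petal.Length) |>
  # turn the nominal predictor Species into dummy columns
  step_dummy(all_nominal_predictors()) |>
  # derive a new column from existing ones (hypothetical name)
  step_mutate(petal_ratio = Petal.Length / Petal.Width) |>
  # keep only the first 100 rows
  step_slice(1:100)

iris_recipe
```

Steps run in the order they are written, so a column derived with step_mutate() after step_normalize() is computed from the already standardized values. Also note that in ggplot2's diamonds the cut, color, and clarity columns are ordered factors; if the course subset keeps them that way, step_dummy() will by default encode them with polynomial contrasts rather than plain 0/1 indicators.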

Q1b:

Write the code necessary to create a “preprocessed_rec1_diamonds” dataset. Make sure to check all the preprocessing changes that the diamonds_recipe1 recipe applied to the original diamonds dataset. What do you notice? (3 points)
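One common way to obtain a preprocessed dataset from a recipe is to estimate it with prep() and then apply it with bake(). The sketch below uses mtcars as a stand-in, and the names `rec`, `rec_prepped`, and `baked` are placeholders chosen for this example.

```r
library(recipes)

# Stand-in data: mtcars, not the diamonds subset used in the course
rec <- recipe(mpg ~ ., data = mtcars) |>
  step_normalize(mpg, wt) |>
  step_slice(1:10)

# prep() estimates what each step needs (here, means and sds) from the training data
rec_prepped <- prep(rec, training = mtcars)

# bake(new_data = NULL) returns the processed training data
baked <- bake(rec_prepped, new_data = NULL)

dim(baked)          # rows and columns after preprocessing
summary(baked$wt)   # inspect a transformed column
```

Comparing dim(), summary(), or a quick look at the first rows of the baked data against the original is a simple way to verify each step. Row-removing steps such as step_slice() and step_filter() default to skip = TRUE, so they take effect on the training data returned by bake(new_data = NULL) but are skipped when baking other data.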

Q2:

Write the code necessary to apply the following preprocessing steps (3 points):

  • Define a recipe with name “diamonds_recipe2”.
  • Make sure that price is your dependent variable and that you use all the other variables as predictors.
  • Log transform price and carat (use base 2).
  • Make sure infrequent values of all your nominal columns are combined in an “other” category.
  • Create a “volume” column equal to x*y*z.
  • Keep only the rows with price greater than $3000 [hint: ?step_filter].
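The steps in this list can be sketched on a small made-up data frame. Everything about `toy` (its columns `y`, `a`, `b`, `grp`), the 5% threshold passed to step_other(), and the filter cutoff of 3 are assumptions made only for this illustration, not values required by the task.

```r
library(recipes)

# Hypothetical toy data with two rare levels ("C" at 3% and "D" at 2%)
toy <- data.frame(
  y   = seq(1, 100, length.out = 200),
  a   = seq(1, 10, length.out = 200),
  b   = seq(2, 5, length.out = 200),
  grp = factor(rep(c("A", "B", "C", "D"), times = c(120, 70, 6, 4)))
)

toy_recipe <- recipe(y ~ ., data = toy) |>
  step_log(y, a, base = 2) |>                                   # log2 transform
  step_other(all_nominal_predictors(), threshold = 0.05) |>     # pool rare levels
  step_mutate(area = a * b) |>                                  # derived column
  step_filter(y > 3)                                            # row filter

baked <- toy_recipe |> prep() |> bake(new_data = NULL)

levels(baked$grp)
range(baked$y)
```

Because steps run in order, the condition in step_filter() is evaluated on whatever the column holds at that point. Here `y` has already been log-transformed, so `y > 3` corresponds to an original value above 8. Keep this in mind when deciding where a filter on the original scale should sit relative to a transformation.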

Q2b:

Write the code necessary to create a “preprocessed_rec2_diamonds” dataset. Make sure to check all the preprocessing changes that the diamonds_recipe2 recipe applied to the original diamonds dataset. What do you notice? (3 points)

Q3:

Write the code necessary to apply the following preprocessing steps (2.5 points):

  • Define a recipe with name “diamonds_recipe3”.
  • Make sure that price is your dependent variable and that you use just carat, x, y and z as predictors.
  • Scale price and carat to ensure that sd is equal to 1.
  • Create an interaction term among x, y, and z.
  • Keep only the last 3000 rows.
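A minimal sketch of scaling, a three-way interaction, and keeping the trailing rows, again on mtcars as a stand-in. The predictor choice, the object names, and the count of five rows are placeholders. The row indices are computed outside the recipe and injected with `!!`, which is one of several possible ways to select the last rows.

```r
library(recipes)

# Stand-in data: mtcars; indices of the last 5 rows computed up front
last_rows <- (nrow(mtcars) - 4):nrow(mtcars)

rec <- recipe(mpg ~ wt + disp + hp + drat, data = mtcars) |>
  step_scale(mpg, wt) |>                       # divide by sd, no centering
  step_interact(terms = ~ disp:hp:drat) |>     # product of the three columns
  step_slice(!!last_rows)                      # keep only the trailing rows

baked <- rec |> prep() |> bake(new_data = NULL)

names(baked)
nrow(baked)
```

Unlike step_normalize(), step_scale() leaves the mean where it is and only rescales the spread. The scaling factor is estimated from all training rows before the slice is applied, so the standard deviation of the rows that remain need not be exactly 1.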

Q3b:

Write the code necessary to create a “preprocessed_rec3_diamonds” dataset. Make sure to check all the preprocessing changes that the diamonds_recipe3 recipe applied to the original diamonds dataset. What do you notice? (3 points)

Q4:

Specify a linear regression model named “linear_reg” with “lm” as engine and with “regression” as mode. (1.5 points)
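For reference, a parsnip specification generally follows the pattern model function, then engine, then mode. The sketch below stores it under the placeholder name `lm_spec`.

```r
library(parsnip)

# A model specification only describes the model; nothing is fitted yet
lm_spec <- linear_reg() |>
  set_engine("lm") |>
  set_mode("regression")

lm_spec
```

Naming the object `linear_reg`, as the question asks, gives it the same name as the parsnip function. R looks for a function when a name is used in a call, so `linear_reg()` should keep working afterwards, but be aware that the bare name then refers to your specification.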

Q5:

Specify a ridge regression model named “ridge_reg” with “glmnet” as engine and with “regression” as mode. (1.5 points)
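In parsnip, ridge regression is usually expressed through linear_reg() with `mixture = 0` (pure ridge; `mixture = 1` would be lasso). The sketch below uses the placeholder name `ridge_spec` and an arbitrary penalty of 0.1, which is an assumption for illustration and not a recommended value.

```r
library(parsnip)

ridge_spec <- linear_reg(penalty = 0.1, mixture = 0) |>  # penalty value is arbitrary
  set_engine("glmnet") |>
  set_mode("regression")

ridge_spec
```

Fitting this specification later requires the glmnet package to be installed.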

Q6:

Specify a decision tree model named “decision_tree” with “rpart” as engine and with “classification” as mode. (1.5 points)
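A decision tree specification follows the same pattern. The name `tree_spec` below is a placeholder.

```r
library(parsnip)

tree_spec <- decision_tree() |>
  set_engine("rpart") |>
  set_mode("classification")

tree_spec
```

Unlike linear_reg(), decision_tree() supports more than one mode, so the mode has to be set explicitly. A classification-mode specification expects a factor outcome when it is fitted; a numeric outcome would call for the regression mode instead.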

Q7:

Define “lm_reg_workflow1”, and make sure this workflow uses diamonds_recipe1 and the linear_reg model. Write the code to fit lm_reg_workflow1 and access its results. (3 points)
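A workflow bundles a recipe and a model specification so they can be fitted in one call. The sketch below shows the general pattern on mtcars with placeholder names (`rec`, `spec`, `wf_fit`, `engine_fit`); it accesses results through the underlying lm object, which is only one of several ways to inspect a fitted workflow.

```r
library(recipes)
library(parsnip)
library(workflows)

# Stand-in data: mtcars, not the diamonds subset used in the course
rec <- recipe(mpg ~ wt + hp, data = mtcars) |>
  step_normalize(all_numeric_predictors())

spec <- linear_reg() |>
  set_engine("lm")

# Combine recipe and model, then fit: the recipe is prepped and applied internally
wf_fit <- workflow() |>
  add_recipe(rec) |>
  add_model(spec) |>
  fit(data = mtcars)

# Pull out the underlying lm object and inspect it
engine_fit <- extract_fit_engine(wf_fit)
coef(engine_fit)
summary(engine_fit)$r.squared

# The prepped recipe is also available
extract_recipe(wf_fit)
```

If the broom package is loaded, tidy() on the extracted fit returns the coefficients as a data frame, which can be more convenient in a report.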

Q8:

Define “lm_reg_workflow2”, and make sure this workflow uses diamonds_recipe2 and the linear_reg model. Write the code to fit lm_reg_workflow2 and access its results. (3 points)

Q9:

Define “lm_reg_workflow3”, and make sure this workflow uses diamonds_recipe3 and the linear_reg model. Write the code to fit lm_reg_workflow3 and access its results. (3 points)

Important

How challenging would it be to set up three additional workflows for the three recipes using the ridge regression model? And approximately how long would it take to do the same with the decision tree model? With the tidymodels framework, especially the recipes, parsnip, and workflows packages, running nine different models becomes fast and efficient. Just as importantly, adjusting them based on your results will be straightforward. Plus, anyone reviewing your code can easily understand and replicate your work, a feature that distinguishes tidymodels from many other packages and languages. Make sure you’re comfortable with everything we’ve covered so far, as we’ll be building on these skills going forward.
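To make the point about combining recipes and models concrete, here is a rough sketch that fits every recipe and model pairing in a loop. It uses mtcars, two recipes, and two models as stand-ins; all object names are hypothetical, and the tree is set to regression mode here only because the stand-in outcome is numeric.

```r
library(recipes)
library(parsnip)
library(workflows)

# Stand-in recipes and models (not the ones defined in the questions)
recipes_list <- list(
  raw    = recipe(mpg ~ wt + hp, data = mtcars),
  scaled = recipe(mpg ~ wt + hp, data = mtcars) |>
    step_normalize(all_numeric_predictors())
)
models_list <- list(
  lm   = linear_reg() |> set_engine("lm"),
  tree = decision_tree() |> set_engine("rpart") |> set_mode("regression")
)

# Every recipe/model combination
grid <- expand.grid(
  recipe = names(recipes_list),
  model  = names(models_list),
  stringsAsFactors = FALSE
)

# Build and fit one workflow per combination
fits <- Map(function(r, m) {
  workflow() |>
    add_recipe(recipes_list[[r]]) |>
    add_model(models_list[[m]]) |>
    fit(data = mtcars)
}, grid$recipe, grid$model)
names(fits) <- paste(grid$recipe, grid$model, sep = "_")

names(fits)
```

Adding a third recipe or model only means adding one entry to the corresponding list; the loop itself does not change.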

🛑 Don’t Click Submit Just Yet 🚧

Please read the information below carefully:

  • Once you have completed all the coding questions, and you are confident in your work (take advantage of the immediate feedback feature), copy and paste your responses from the chunk into the form fields below each question.

  • You are responsible for correctly copying and pasting only the code required to solve each question. We will grade only what you have submitted!

  • We will only grade 1 submission per student so do not click Submit until you are confident in your responses.

  • By submitting this form you are certifying that you have followed the academic integrity guidelines available in the syllabus. The code and answers submitted are the results of your work and your work only!

  • Make sure you have completed all the questions and included all the required personal information (e.g., full name, email, zid) in the respective form’s fields.

  • Now you are ready to click the above “Submit” button. Congrats, you have completed your weekly coding checkup!!!