Weekly Coding Checkup 10

A Semester Project Made of Custom-Designed Activities

We continue working on the Dr. B & Class project. In this second part of the semester, you will apply what we learned in class to a set of tasks and scenarios. Please remember that the code you submit must be your own and not somebody else's work. It is fine to make mistakes; only by practicing in R can you get a better grasp of the software.

I also want you to build your document as an official report for a potential client company (Dreaming Diamonds LLC), for which you are exploring the diamonds dataset (e.g., spend time on storytelling, commenting on results, and providing insights and conclusions when possible).

Summary of this week's tasks:

This week we will work on modeling workflows using tidymodels. Tidymodels is a unified collection of packages that supports a smooth, integrated approach to data modeling. The bundle includes individual packages that let you perform the necessary steps in a consistent, scalable, and flexible way. So, let’s practice what we covered this week:

• Perform preprocessing steps with recipes

• Specify models with parsnip

• Streamline recipes and models with workflows

Tip

Use the diamonds dataset to solve all the tasks below. I have subset it to 500 observations to speed up the modeling process.

Q1:

Write the code necessary to apply the following preprocessing steps (3 points):

  • Define a recipe with name “diamonds_recipe1”.
  • Make sure that price is your dependent variable and that you use all the other variables as predictors.
  • Standardize the price and carat variables (mean 0 and sd 1).
  • Create dummy variables for cut, color and clarity (all nominal columns).
  • Create a “price_per_carat” column equal to price/carat.
  • Keep only the first 300 rows.

Q1b:

Write the code necessary to create a “preprocessed_rec1_diamonds” dataset. Make sure to apply all the preprocessing changes specified in the diamonds_recipe1 to the original diamonds dataset. (2 points)
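As a reminder of the general pattern (not a solution to this question), here is a minimal sketch of how a recipe is trained with prep() and applied with bake(). It uses the built-in mtcars data rather than diamonds, and the names toy_cars, toy_recipe, toy_prepped, and toy_baked are hypothetical, chosen only for illustration.

```r
library(recipes)

# Toy data (assumption: mtcars stands in for the assignment data);
# cyl is turned into a factor so there is a nominal column to encode
toy_cars <- transform(mtcars, cyl = factor(cyl))

# Define the recipe: mpg is the outcome, everything else is a predictor
toy_recipe <- recipe(mpg ~ ., data = toy_cars) |>
  step_normalize(disp, hp) |>            # center to mean 0, scale to sd 1
  step_dummy(all_nominal_predictors())   # replace factor columns with dummies

# prep() estimates what each step needs (means, sds, factor levels)
toy_prepped <- prep(toy_recipe, training = toy_cars)

# bake() applies the trained steps; new_data = NULL returns the
# processed version of the data the recipe was trained on
toy_baked <- bake(toy_prepped, new_data = NULL)

head(toy_baked)
```

Comparing names(toy_cars) with names(toy_baked) is a quick way to check which columns were changed or replaced by the recipe.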

Q1c:

What do you notice? (0 points)

Q2:

Write the code necessary to apply the following preprocessing steps (3 points):

  • Define a recipe with name “diamonds_recipe2”.
  • Make sure that price is your dependent variable and that you use all the other variables as predictors.
  • Keep only the rows with price greater than $3000.
  • Log transform price and carat (use base 2).
  • Make sure infrequent values of all your nominal columns are combined in an “other” category.
  • Create a “volume” column equal to x*y*z.

Warning

What happens if you apply the row filter (keep only the rows with price greater than $3000) as the last preprocessing step? Please remember that the order of the steps matters!
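To see why step order matters, here is a small sketch on the built-in mtcars data (an assumption for illustration, not the assignment data). The same two steps are applied in opposite orders; the recipe names and the 100-horsepower threshold are hypothetical.

```r
library(recipes)

# Filter first: hp is still on its original scale when the filter runs
filter_then_log <- recipe(mpg ~ hp + wt, data = mtcars) |>
  step_filter(hp > 100) |>
  step_log(hp, base = 2)

# Log first: hp has already become log2(hp) when the filter runs,
# so the threshold of 100 is compared against much smaller values
log_then_filter <- recipe(mpg ~ hp + wt, data = mtcars) |>
  step_log(hp, base = 2) |>
  step_filter(hp > 100)

n_filter_first <- nrow(bake(prep(filter_then_log), new_data = NULL))
n_log_first    <- nrow(bake(prep(log_then_filter), new_data = NULL))

c(filter_first = n_filter_first, log_first = n_log_first)
```

Row-filtering steps such as step_filter() default to skip = TRUE in recipes, so they take effect when the recipe is prepped and when bake() is called with new_data = NULL, but they are skipped when baking other data.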

Q2b:

Write the code necessary to create a “preprocessed_rec2_diamonds” dataset. Make sure to apply all the preprocessing changes specified in the diamonds_recipe2 to the original diamonds dataset. (2 points)

Q2c:

What do you notice? (0 points)

Q3:

Write the code necessary to apply the following preprocessing steps (3 points):

  • Define a recipe with name “diamonds_recipe3”.
  • Make sure that price is your dependent variable and that you use just carat, x, y and z as predictors.
  • Scale price and carat to ensure that sd is equal to 1.
  • Create an interaction term between x, y and z.
  • Keep only the last 300 rows.

Q3b:

Write the code necessary to create a “preprocessed_rec3_diamonds” dataset. Make sure to apply all the preprocessing changes specified in the diamonds_recipe3 to the original diamonds dataset. (2 points)

Q3c:

What do you notice? (0 points)

Q4:

Specify a linear regression model named “linear_reg” with “lm” as engine and with “regression” as mode. (1.5 points)

Q5:

Specify a ridge regression model named “ridge_reg” with “glmnet” as engine and with “regression” as mode. (1.5 points)

Q6:

Specify a decision tree model named “decision_tree” with “rpart” as engine and with “classification” as mode. (1.5 points)
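The three specifications above all follow the same parsnip pattern: pick a model type, set an engine, set a mode. As a hedged illustration with a model that is not part of this checkup, the sketch below specifies a logistic regression; the name logit_spec is hypothetical.

```r
library(parsnip)

# A specification only describes the model; nothing is fitted yet
logit_spec <- logistic_reg() |>     # model type
  set_engine("glm") |>              # computational engine (stats::glm)
  set_mode("classification")        # kind of prediction problem

logit_spec
```

Keep in mind that the mode should match the outcome type: regression for numeric outcomes, classification for factor outcomes.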

Q7:

Define a “lm_reg_workflow1” object, make sure it is a workflow that uses the diamonds_recipe1 and the linear_reg model, then fit the workflow. Then access the results of the fitted workflow in a tidy format. (3 points)

Q8:

Define a “lm_reg_workflow2” object, make sure it is a workflow that uses the diamonds_recipe2 and the linear_reg model, then fit the workflow. Then access the results of the fitted workflow in a tidy format. (3 points)

Q9:

Define a “lm_reg_workflow3” object, make sure it is a workflow that uses the diamonds_recipe3 and the linear_reg model, then fit the workflow. Then access the results of the fitted workflow in a tidy format. (3 points)
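The workflow questions share one shape: bundle a recipe and a model, fit, then tidy. The sketch below shows that shape on the built-in mtcars data (an assumption for illustration); toy_recipe, toy_model, and toy_fit are hypothetical names, and the preprocessing is deliberately different from the recipes in Q1 to Q3.

```r
library(tidymodels)

# Preprocessing: normalize the two numeric predictors
toy_recipe <- recipe(mpg ~ wt + hp, data = mtcars) |>
  step_normalize(all_numeric_predictors())

# Model specification
toy_model <- linear_reg() |>
  set_engine("lm") |>
  set_mode("regression")

# Bundle recipe and model, then fit both in one call
toy_fit <- workflow() |>
  add_recipe(toy_recipe) |>
  add_model(toy_model) |>
  fit(data = mtcars)

# Coefficient table as a tibble (term, estimate, std.error, ...)
toy_results <- tidy(toy_fit)
toy_results
```

Fitting the workflow preps the recipe on the data and then fits the model on the processed result, so there is no separate prep() or bake() call here.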

Important

How challenging would it be to set up three additional workflows for the three recipes using the ridge regression model? And approximately how long would it take to do the same with the decision tree model? With the tidymodels framework, especially the recipes, parsnip, and workflows packages, running nine different models becomes fast and efficient. Just as importantly, adjusting them based on your results will be straightforward. Plus, anyone reviewing your code can easily understand and replicate your work, a feature that distinguishes tidymodels from many other packages and languages. Make sure you’re comfortable with everything we’ve covered so far, as we’ll be building on these skills going forward.
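One possible way to combine several recipes with several models is sketched below, again on mtcars and with hypothetical names (recipes_list, models_list, combos, fits). It uses two recipes and two engines instead of the three recipes and three models discussed above, and it is only one of several reasonable approaches.

```r
library(tidymodels)

# Two toy recipes (assumption: mtcars stands in for the assignment data)
recipes_list <- list(
  raw    = recipe(mpg ~ wt + hp, data = mtcars),
  scaled = recipe(mpg ~ wt + hp, data = mtcars) |>
    step_normalize(all_numeric_predictors())
)

# Two toy model specifications using engines that ship with base R
models_list <- list(
  lm  = linear_reg() |> set_engine("lm"),
  glm = linear_reg() |> set_engine("glm")
)

# Every recipe/model combination
combos <- expand.grid(
  recipe = names(recipes_list),
  model  = names(models_list),
  stringsAsFactors = FALSE
)

# Build and fit one workflow per combination
fits <- Map(
  function(r, m) {
    workflow() |>
      add_recipe(recipes_list[[r]]) |>
      add_model(models_list[[m]]) |>
      fit(data = mtcars)
  },
  combos$recipe, combos$model
)
names(fits) <- paste(combos$recipe, combos$model, sep = "_")

# One tidy coefficient table per fitted workflow
lapply(fits, tidy)
```

Adding a recipe or a model means adding one entry to the corresponding list. The workflowsets package, which is part of tidymodels, provides workflow_set() for the same purpose.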

🛑 Don’t Click Submit Just Yet 🚧

Please read the information below carefully:

  • Once you have completed all the coding questions, and you are confident in your work (take advantage of the immediate feedback feature), copy and paste your responses from the chunk into the form fields below each question.

  • You are responsible for correctly copying and pasting only the required code to solve each question. We will grade only what you have submitted!

  • We will only grade 1 submission per student, so do not click Submit until you are confident in your responses.

  • By submitting this form you are certifying that you have followed the academic integrity guidelines available in the syllabus. The code and answers submitted are the result of your work and your work only!

  • Make sure you have completed all the questions and included all the required personal information (e.g., full name, email, zid) in the respective form’s fields.

  • Now you are ready to click the “Submit” button above. Congrats, you have completed your weekly coding checkup!