For this lab you will create a .zip file called which contains the following:

Submit your lab (the .zip file) to the corresponding assignment on Canvas. You have unlimited attempts before the deadline. Your final submission before the deadline will be graded.


Grading of this lab will largely be based on the ability of the grader to access and run your code. That is, the grader should be able to unzip your file, open lab10.Rproj, then finally open and knit lab10.Rmd without any modification or errors. If they are able to do so, and the resulting lab10.html contains the graphics described below, you will receive at least nine of the ten possible points for the lab.
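Before zipping, it may help to confirm that the files the grader will open are actually in the project folder. The following is a minimal sketch, not part of the official instructions: the helper name `missing_lab_files` is hypothetical, and only the file names `lab10.Rproj` and `lab10.Rmd` come from the description above.

```r
# Hypothetical helper: report which expected lab files are absent from a folder.
# The file names come from the lab description; the function name is made up.
missing_lab_files <- function(project_dir = ".") {
  expected <- c("lab10.Rproj", "lab10.Rmd")
  expected[!file.exists(file.path(project_dir, expected))]
}

# Usage, assuming the working directory is the lab10 project folder:
# missing_lab_files()  # character(0) means both files were found
#
# Knitting can then be tried from the console with:
# rmarkdown::render("lab10.Rmd")
```

An empty result only shows that the files exist. Knitting in a fresh R session is still the real test that the document runs without modification.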


The following video describes how to create all of the files described above. It will also walk through each of the exercises and describe at least one valid solution.

Exercise 1 (Setup)

Before creating lab10.Rmd you should first create an RStudio Project named lab10. (The video above will demonstrate this.) This will also create a folder named lab10. Create lab10.Rmd and place it inside this folder.

Add the following code to your .Rmd file to load the tidyverse. Throughout this lab you may need functions from dplyr and ggplot2.

library(tidyverse)

Additionally, add the following code to your .Rmd file which will load the data needed for this lab:

mlb_pitches_2021 = as_tibble(readRDS(url("")))

This data originates from Baseball Savant. In particular, it comes from the Statcast data that MLB collects. Several transformations have been applied to the originally accessed data. Ultimately, the data contains information on the pitch type, velocity, and spin rate of every MLB pitch thrown in 2021.

Exercise 2 (Pitch Type Frequency)

The following video explains the various “pitch types” used in baseball:

The following table explains the abbreviations used by Statcast:

Pitch Type    Pitch Name
CH            Changeup
CS            Curveball
CU            Curveball
EP            Eephus
FA            Fastball
FC            Cutter
FF            4-Seam Fastball
FS            Split-Finger
KC            Knuckle Curve
KN            Knuckleball
SC            Screwball
SI            Sinker
SL            Slider

Create a bar plot that shows the frequency of each pitch type in 2021. Order the bars according to frequency.


# Drop pitches with no recorded pitch type before plotting
mlb_pitches_2021 %>% 
  filter(pitch_type != "") %>% 
  # fct_infreq() orders the bars from most to least frequent pitch type
  ggplot(aes(x = fct_infreq(pitch_type), fill = pitch_type)) + 
  geom_bar(show.legend = FALSE) +
  labs(title = "Frequency of MLB Pitch Types",
       subtitle = "2021 Season",
       caption = "Data Source: Baseball Savant") +
  xlab("Pitch Type") +
  ylab("Count")
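The bar ordering in the plot above relies on `fct_infreq()` from forcats, which is loaded with the tidyverse. As a small illustration of what that function does, here is a sketch on a made-up vector of pitch codes. The `toy_pitches` values are hypothetical and are not taken from the lab data.

```r
library(forcats)

# Hypothetical toy data: a handful of pitch-type codes, NOT the lab data
toy_pitches <- factor(c("SL", "FF", "FF", "CH", "FF", "SL"))

# By default, factor levels are sorted alphabetically
levels(toy_pitches)

# fct_infreq() reorders the levels from most to least frequent,
# which is the order ggplot2 then uses for the bars on the x-axis
ordered_pitches <- fct_infreq(toy_pitches)
levels(ordered_pitches)
```

Here FF appears three times, SL twice, and CH once, so the reordered levels run FF, SL, CH.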

Exercise 3 (Pitch Type Velocity and Spin)

Can you guess the type of pitch just by watching it?

Pitch types are much easier to tell apart by looking at velocity and spin rate. To see this, create a plot of spin rate versus velocity for Carlos Rodon. Use color and shape to indicate the pitch type.


# Keep only Carlos Rodon's pitches that have a recorded pitch type
mlb_pitches_2021 %>%
  filter(pitch_type != "") %>%
  filter(name == "Carlos Rodon") %>%
  # na.omit() drops every row that has a missing value in any column
  na.omit() %>% 
  ggplot(aes(
    x = release_speed,
    y = release_spin_rate,
    color = pitch_type,
    shape = pitch_type
  )) +
  geom_point() +
  # Giving color and shape the same label merges them into a single legend
  labs(title = "Spin Rate versus Velocity",
       subtitle = "Carlos Rodon, 2021",
       caption = "Data Source: Baseball Savant", 
       color = "Pitch Type", 
       shape = "Pitch Type") +
  xlab("Velocity") +
  ylab("Spin Rate") +
  scale_color_brewer(palette = "Set1")
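The solution above calls `na.omit()`, which removes a row if any column contains a missing value, not just the two columns being plotted. The sketch below contrasts it with `tidyr::drop_na()` limited to the plotted columns. The `toy` tibble and its `note` column are made up for illustration and are not real Statcast values.

```r
library(dplyr)
library(tidyr)

# Made-up toy rows for illustration only; these are NOT real Statcast values
toy <- tibble(
  pitch_type        = c("FF", "SL", "CH"),
  release_speed     = c(96, 86, NA),
  release_spin_rate = c(2300, 2400, 1700),
  note              = c(NA, "toy", "toy")  # hypothetical extra column
)

# na.omit() drops any row with a missing value in ANY column,
# so the first row is lost because of the unrelated `note` column
rows_na_omit <- nrow(na.omit(toy))

# drop_na() with named columns only checks the columns being plotted
rows_drop_na <- nrow(drop_na(toy, release_speed, release_spin_rate))

c(na_omit = rows_na_omit, drop_na = rows_drop_na)
```

On this toy data `na.omit()` keeps one row while the targeted `drop_na()` keeps two. Whether the difference matters for the lab data depends on which columns it contains, which is not shown here.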