For this lab you will create a .zip file called lab10.zip which contains the following:
- lab10.Rmd - An RMarkdown file.
- lab10.html - The results of knitting the RMarkdown file.
- lab10.Rproj - An RStudio project file.
Submit your lab (the .zip file) to the corresponding assignment on Canvas. You have unlimited attempts before the deadline. Your final submission before the deadline will be graded.
Grading of this lab will largely be based on the ability of the grader to access and run your code. That is, the grader should be able to unzip your lab10.zip file, open lab10.Rproj, then finally open and knit lab10.Rmd without any modification or errors. If they are able to do so, and the resulting lab10.html contains the graphics described below, you will receive at least nine of the ten possible points for the lab.
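A quick check along the following lines can catch a missing file before you build the .zip. This is only a sketch: `missing_lab_files()` is a hypothetical helper, not part of the lab, and it only confirms that the three required files exist, not that the .Rmd knits.

```r
# Hypothetical helper (not part of the lab): report which of the
# required lab files are absent from a project folder.
missing_lab_files = function(dir = ".") {
  required = c("lab10.Rmd", "lab10.html", "lab10.Rproj")
  required[!file.exists(file.path(dir, required))]
}

# Example on a throwaway folder that only contains the .Rmd file.
demo_dir = file.path(tempdir(), "lab10_demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, "lab10.Rmd"))
missing_lab_files(demo_dir)
```

An empty result means all three files are present. Knitting from a fresh R session, for example with `rmarkdown::render("lab10.Rmd")`, is a separate check that the document runs without errors.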
The following video describes how to create all of the files described above. It will also walk through each of the exercises and describe at least one valid solution.
Before creating lab10.Rmd you should first create an RStudio Project named lab10. (The video above will demonstrate this.) This will also create a folder named lab10. Create lab10.Rmd and place it inside this folder.
Add the following code to your .Rmd file, which will load the tidyverse. Throughout this lab you may need functions from dplyr and ggplot2.
library(tidyverse)
Additionally, add the following code to your .Rmd file, which will load the data needed for this lab:
mlb_pitches_2021 = as_tibble(readRDS(url("https://stat385.org/data/mlb_pitches_2021.rds")))
This data originates from Baseball Savant. In particular, it comes from the Statcast data that MLB collects. Several transformations have been applied to the originally accessed data. Ultimately, this data contains information on the pitch type, velocity, and spin rate of every MLB pitch thrown in 2021.
The following video explains the various “pitch types” used in baseball:
The following table explains the abbreviations used by Statcast:
Pitch Type | Pitch Name |
---|---|
CH | Changeup |
CS | Curveball |
CU | Curveball |
EP | Eephus |
FA | Fastball |
FC | Cutter |
FF | 4-Seam Fastball |
FS | Split-Finger |
KC | Knuckle Curve |
KN | Knuckleball |
SC | Screwball |
SI | Sinker |
SL | Slider |
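If you want full pitch names rather than abbreviations in your own output, one option is a named lookup vector built from the table above. The sketch below is an illustration only: `pitch_names` and `example_types` are hypothetical names, and the example abbreviations are made up rather than taken from the data.

```r
# Lookup vector built from the abbreviation table above.
pitch_names = c(
  CH = "Changeup", CS = "Curveball", CU = "Curveball", EP = "Eephus",
  FA = "Fastball", FC = "Cutter", FF = "4-Seam Fastball",
  FS = "Split-Finger", KC = "Knuckle Curve", KN = "Knuckleball",
  SC = "Screwball", SI = "Sinker", SL = "Slider"
)

# Made-up abbreviations standing in for the pitch_type column.
example_types = c("FF", "SL", "CH", "FF")

# Index the lookup vector by abbreviation to get the full names.
example_names = unname(pitch_names[example_types])
example_names
```

The same indexing could be applied to the `pitch_type` column of the lab data, though the exercises do not require it.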
Create a bar plot that shows the frequency of each pitch type in 2021. Order the bars according to frequency.
mlb_pitches_2021 %>%
  # drop pitches with a blank pitch type
  filter(pitch_type != "") %>%
  # fct_infreq() orders the bars from most to least frequent
  ggplot(aes(x = fct_infreq(pitch_type), fill = pitch_type)) +
  geom_bar(show.legend = FALSE) +
  labs(title = "Frequency of MLB Pitch Types",
       subtitle = "2021 Season",
       caption = "Data Source: Baseball Savant") +
  xlab("Pitch Type") +
  ylab("Count") +
  theme_bw()
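The bar ordering in the code above comes from `fct_infreq()`, which reorders factor levels by how often each level occurs. The small sketch below shows the effect in isolation; `toy_types` is a made-up vector used only for illustration, not a sample of the lab data.

```r
library(forcats)

# Made-up pitch types: FF appears three times, SL twice, CH once.
toy_types = factor(c("CH", "FF", "SL", "FF", "SL", "FF"))

# Default factor levels are sorted alphabetically.
default_levels = levels(toy_types)

# fct_infreq() reorders the levels from most to least frequent.
frequency_levels = levels(fct_infreq(toy_types))

default_levels
frequency_levels
```

Because ggplot2 draws discrete axes in level order, passing the reordered factor to `x` is what sorts the bars.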
Can you guess the type of pitch just by watching it?
To get a sense of how much easier this is when you look at velocity and spin rate, create a scatter plot of spin rate versus velocity for Carlos Rodon. Use color and shape to indicate the pitch type.
mlb_pitches_2021 %>%
  # keep only Carlos Rodon's pitches that have a recorded pitch type
  filter(pitch_type != "") %>%
  filter(name == "Carlos Rodon") %>%
  # drop rows with a missing value in any column
  na.omit() %>%
  ggplot(aes(
    x = release_speed,
    y = release_spin_rate,
    color = pitch_type,
    shape = pitch_type
  )) +
  geom_point() +
  # giving color and shape the same title merges them into one legend
  labs(title = "Spin Rate versus Velocity",
       subtitle = "Carlos Rodon, 2021",
       caption = "Data Source: Baseball Savant",
       color = "Pitch Type",
       shape = "Pitch Type") +
  xlab("Velocity") +
  ylab("Spin Rate") +
  scale_color_brewer(palette = "Set1") +
  theme_bw()
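One detail in the code above is worth knowing: `na.omit()` drops a row when any column is missing, not just the columns being plotted. The sketch below contrasts it with `tidyr::drop_na()` restricted to the two plotted columns. The data is made up, and `other_column` is a hypothetical stand-in for any column the plot does not use. Whether the difference matters for the real data depends on which columns contain missing values, which is not stated here.

```r
library(tibble)
library(tidyr)

# Made-up rows; other_column is a hypothetical stand-in for any
# column in the data that the plot does not use.
toy_pitches = tibble(
  pitch_type = c("FF", "SL", "CH"),
  release_speed = c(95.1, 85.3, NA),
  release_spin_rate = c(2200, 2400, 1700),
  other_column = c(NA, 1, 2)
)

# na.omit() drops a row if ANY column is missing.
rows_na_omit = nrow(na.omit(toy_pitches))

# drop_na() with named columns only checks those columns.
rows_drop_na = nrow(drop_na(toy_pitches, release_speed, release_spin_rate))

c(na_omit = rows_na_omit, drop_na = rows_drop_na)
```

Either approach is a valid way to avoid missing-value warnings from `geom_point()`; the targeted version simply keeps more rows when unrelated columns have gaps.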