# Analyzing Baseball’s Spin Rate Revolution

## Using data science to understand the increase in pitchers’ spin rates

In recent years, the spin rate of pitches in Major League Baseball has increased significantly, and it has been speculated that illegal foreign substances have played a role in this increase. Before getting into the statistics and data science of this topic, however, it is important to discuss what spin rate is and why it matters so much for a pitcher’s success at the highest level of the game. Why would pitchers risk punishment from MLB just to improve this one variable?

# What is the Spin Rate? Why is it Important?

Spin rate measures the rotation of the ball immediately after it is released from the pitcher’s hand, expressed in revolutions per minute (RPM). A higher spin rate results in more movement of the ball before it reaches home plate, meaning that the final pitch location ends up further away from its original straight-line trajectory.

Pitchers usually mix up their pitch types to confuse the batter and avoid being predictable, and different types of pitches are associated with different spin rates. Fastballs, for example, are thrown to overpower the batter with speed rather than a large amount of spin. Most other types of pitches are designed to fool the batter with movement. A curveball, for example, carries a large amount of downward spin that causes the ball to break straight down before it reaches home plate. A curveball (or any other breaking pitch) with a higher spin rate will be more effective than the same pitch with a lower spin rate: because breaking pitches are slower, they are quite vulnerable if the batter can easily recognize their final location over the plate.

This same principle also applies to fastballs, the type of pitch that I will focus on in this analysis. Imagine being a batter and facing a pitch that is both fast and spin-heavy… it would be overwhelming! Even if you are able to predict the final location of the pitch with all of its spin, it would be equally difficult to get the bat to the ball at the correct time (fastballs with a large amount of spin often break to the side or even upward rather than downward). The best pitchers in modern-day baseball can throw incredibly fast with their average fastball speeds ranging from 95–100 MPH and their average spin rates on these pitches ranging from 2400–2800 RPM. These numbers represent a dramatic increase from only a few years ago, so what has caused this phenomenon? Better pitching coaches? Differences in the production of baseballs? Or perhaps some chemical on those baseballs?…

# The Foreign Substance Scandal

Baseball has been plagued with cheating scandals recently, the most famous being the Houston Astros using a center field camera to steal opposing catchers’ signs and relay that information to their batters. However, pitchers are not so innocent themselves… abundant evidence suggests that, within MLB, there is an entire market of substances that give pitchers a better grip on the baseball, which increases the spin rate. In fact, Los Angeles Dodgers pitcher Trevor Bauer estimated that 70% of pitchers use some type of foreign substance for assistance (in this analysis, we will see that Bauer has likely not been so innocent himself since making that comment).

# Methods and Data Collection

Now, it is time to use data science tools to look into this matter! I will be filtering for individual pitches that are fastballs, that come from pitchers with at least 100 fastballs thrown that season, and that were thrown between the 2015 season and the current 2021 season through May 21st (Statcast pitch-level data such as spin rate was not available before 2015). I will be analyzing the distributions of spin rate during these years, the distributions of spin rate changes across seasons, the players most responsible for these changes, and a model that tells us how spin rate affects a player’s success. In Python, I will use Plotly, Matplotlib, and Seaborn to create visualizations and Scikit-learn to create machine learning models.
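The filtering step described above can be sketched with pandas. This is a minimal illustration rather than the code used in this analysis: the column names (`pitcher`, `game_year`, `pitch_type`, `release_spin_rate`) follow Statcast's export naming but are assumptions here, as is the choice to treat only four-seam fastballs (`"FF"`) as fastballs, and the toy data is made up.

```python
import pandas as pd

# Assumption: only four-seam fastballs count; other codes could be added here.
FASTBALL_TYPES = {"FF"}


def filter_fastballs(pitches: pd.DataFrame, min_pitches: int = 100) -> pd.DataFrame:
    """Keep fastballs from pitcher-seasons with at least `min_pitches` fastballs."""
    fastballs = pitches[pitches["pitch_type"].isin(FASTBALL_TYPES)]
    fastballs = fastballs.dropna(subset=["release_spin_rate"])
    # Count fastballs per (pitcher, season) and broadcast that count to every row.
    counts = fastballs.groupby(["pitcher", "game_year"])["pitch_type"].transform("count")
    return fastballs[counts >= min_pitches]


# Toy data (not real Statcast pitches), with a threshold of 2 to keep it small.
toy = pd.DataFrame({
    "pitcher": [1, 1, 1, 2, 2, 1],
    "game_year": [2020, 2020, 2020, 2020, 2020, 2021],
    "pitch_type": ["FF", "FF", "CU", "FF", "SL", "FF"],
    "release_spin_rate": [2400.0, 2450.0, 2700.0, 2300.0, 2500.0, 2420.0],
})
kept = filter_fastballs(toy, min_pitches=2)
print(kept)
```

Counting per pitcher and season (rather than per pitcher overall) matches the "at least 100 fastballs thrown that season" rule, so a pitcher can qualify in one year and not in another.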

Fortunately, there is a Python library called baseball-scraper (https://pypi.org/project/baseball-scraper/) that obtains Statcast data and makes publicly available every variable you could ever dream of relating to in-game data. My GitHub page with all of the code will be linked at the end of the article.

# Analyzing the Spin Rate Distribution

First, I will provide distributions of the fastball spin rate in each of the 7 seasons. I will pick a sample size of 5000 pitches in each season as 2020 was a shortened season and the 2021 season is still in progress. A smaller sample size will also allow us to more easily recognize differences between distributions both visually and when statistically testing.
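Drawing an equal-sized sample from each season could look roughly like the sketch below. It assumes a `game_year` column and uses synthetic spin rates in place of real pitches; the helper name `sample_per_season` is hypothetical, and the sample size is shrunk so the example runs on a small frame.

```python
import numpy as np
import pandas as pd


def sample_per_season(fastballs: pd.DataFrame, n: int = 5000, seed: int = 0) -> pd.DataFrame:
    """Draw the same number of pitches from every season, without replacement."""
    return fastballs.groupby("game_year").sample(n=n, random_state=seed)


# Synthetic stand-in data with unequal season sizes (not real Statcast pitches).
rng = np.random.default_rng(0)
fake = pd.DataFrame({
    "game_year": np.repeat([2019, 2020, 2021], [900, 300, 500]),
    "release_spin_rate": rng.normal(2300, 150, size=1700),
})
balanced = sample_per_season(fake, n=200)
print(balanced["game_year"].value_counts().sort_index())
```

Fixing the random seed keeps the sample reproducible, which matters when the same draw feeds both the plots and the statistical tests.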

At first look, these distributions appear very similar. However, a closer look will show us that the average spin rate increases slightly each year, as illustrated by the bar chart. As all distributions appear approximately normal to the naked eye, it may be useful to figure out which distributions are least normal in order to detect irregularities. For this task, we can use the **Shapiro-Wilk test of normality.** In this statistical test, the null hypothesis is that the sample comes from a normally distributed population. A p-value below the typical alpha value of 0.05 means that we can reject the null hypothesis and conclude that the sample comes from a non-normal distribution. For a given sample size, a lower test statistic corresponds to a lower p-value. The theory of the test can be found here: https://en.wikipedia.org/wiki/Shapiro%E2%80%93Wilk_test
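As a rough illustration of how the test behaves, the sketch below runs `scipy.stats.shapiro` on two synthetic samples: one drawn from a single normal distribution, and one with a small extra cluster near 3000 RPM mixed in. The means, spreads, and sample sizes are made-up assumptions chosen for illustration, not values from the Statcast data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Synthetic spin rates (assumed parameters, not real data).
typical = rng.normal(2250, 150, size=2000)
with_cluster = np.concatenate([
    rng.normal(2250, 150, size=1800),
    rng.normal(3000, 40, size=200),  # small separate high-spin group
])

results = {}
for label, sample in [("typical", typical), ("with_cluster", with_cluster)]:
    # shapiro returns the W statistic and the p-value for the normality null.
    w_stat, p_value = stats.shapiro(sample)
    results[label] = (w_stat, p_value)
    print(f"{label:>12}: W = {w_stat:.4f}, p = {p_value:.3g}")
```

The sample with the extra cluster should produce a visibly lower W statistic, which is the kind of comparison used here to rank seasons by how far they deviate from normality.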

None of the distributions are statistically normal, but it is more useful to observe which distributions deviate most from normality. Interestingly, the 2015 season is the least normal, though a scatter of extremely low spin rates might be causing this. The 2020 and 2021 seasons are the next least normal, possibly due to noticeable “spikes” of values around the 3000 RPM mark in both seasons. In previous years, there appear to have been few to no pitches above 2800 RPM. Let’s take a closer look at the upper ends of the distributions by comparing a more “normal” year (say, 2017) to the 2020 season. Note that the following plot will include **all** pitches greater than 2500 RPM in both seasons rather than a smaller sample size.

Now we can see something very strange… the 2020 (and 2021) distribution of spin rate is actually bimodal! When bimodal distributions appear within events that usually yield a normal-looking distribution, it often points to some extraneous factor at play. In 2017, pitches above 2850 RPM were basically non-existent, but now, there is a relatively small yet separate and significant “class” of pitches that average 3000 RPM. Who are the individual players responsible for these pitches?

Among pitchers who threw at least 100 pitches in 2020, 193 were able to reach the 2900 RPM mark. Among them is Trevor Bauer, who threw nearly 70 such pitches, more than twice as many as anybody else.
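A count like this can be produced with a simple threshold filter and a group-by. The sketch below is one possible reading of that tally, with hypothetical column names (`player_name`, `release_spin_rate`) and invented toy values.

```python
import pandas as pd


def high_spin_leaders(fastballs: pd.DataFrame, threshold: float = 2900.0) -> pd.Series:
    """Count pitches at or above `threshold` RPM per pitcher, most first."""
    high = fastballs[fastballs["release_spin_rate"] >= threshold]
    return high.groupby("player_name").size().sort_values(ascending=False)


# Toy data with made-up pitchers and spin rates.
toy = pd.DataFrame({
    "player_name": ["A", "A", "A", "A", "B", "B", "C"],
    "release_spin_rate": [2950.0, 3010.0, 2905.0, 2600.0, 2920.0, 2500.0, 2400.0],
})
leaders = high_spin_leaders(toy)
print(leaders)                      # pitches at or above the threshold, per pitcher
print(len(leaders), "pitchers reached the threshold")
```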

# Analyzing Pitchers’ Changes in Spin Rate

The distributions of fastball spin rates have illustrated that something strange has been occurring, especially throughout the past couple of years. Now, I will dig a bit deeper by analyzing the individual changes in spin rate across seasons. A pitcher who improves their average spin rate from 2300 RPM to 2500 RPM across seasons, for example, is a far more likely cheating suspect than a pitcher who simply has a consistently high average spin rate of 2550 RPM. I will create a violin plot to visualize these changes between seasons.
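One way to compute these season-over-season changes is sketched below: average each pitcher's spin rate per season, then take the difference from the previous season, keeping only consecutive years. The column names and the `season_spin_changes` helper are assumptions for illustration, and the toy numbers simply mirror the 2300 to 2500 RPM example above.

```python
import pandas as pd


def season_spin_changes(fastballs: pd.DataFrame) -> pd.DataFrame:
    """Change in each pitcher's average spin rate between consecutive seasons."""
    avg = (
        fastballs.groupby(["pitcher", "game_year"], as_index=False)["release_spin_rate"]
        .mean()
        .sort_values(["pitcher", "game_year"])
    )
    # Previous season and spin-rate difference, computed within each pitcher.
    avg["prev_year"] = avg.groupby("pitcher")["game_year"].shift()
    avg["spin_change"] = avg.groupby("pitcher")["release_spin_rate"].diff()
    # Drop first seasons and gaps (e.g. 2018 followed by 2020).
    consecutive = (avg["game_year"] - avg["prev_year"]) == 1
    return avg.loc[consecutive, ["pitcher", "game_year", "spin_change"]]


toy = pd.DataFrame({
    "pitcher": [1, 1, 1, 1, 2, 2],
    "game_year": [2019, 2019, 2020, 2020, 2018, 2020],
    "release_spin_rate": [2290.0, 2310.0, 2480.0, 2520.0, 2550.0, 2560.0],
})
changes = season_spin_changes(toy)
print(changes)
```

In this toy frame only pitcher 1 has two consecutive seasons, so the result holds a single row with a change of 200 RPM; pitcher 2 is dropped because 2019 is missing.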

The most noticeable element of this plot is the larger spread in the distributions of 2019–2020 and 2020–2021 pitcher spin rate changes. Many pitchers experienced large increases in their average spin rate across seasons, but many pitchers also experienced significant decreases (perhaps some players stopped cheating between seasons?). In fact, the median pitcher had their spin rate go slightly down in both 2020 and 2021! However, the increased number of pitchers who upped their spin rate by 100 RPM+ is skewing the distributions. Now, let’s look at which players improved their spin rates by 200 RPM+ across seasons, as Trevor Bauer claimed that 90 MPH pitches could increase this much with the use of pine tar.

Many players on this list increased their spin rate by 200 RPM+ in 2020, though none have achieved this (yet) in 2021. Trevor Bauer himself once again appears, increasing the spin rate of his fastball by an unbelievable figure of close to 375 RPM between the 2019 and 2020 seasons. Gerrit Cole, whom Bauer originally accused, appears on this list between 2017 and 2018. Cole went on to be a consistent top pitcher in the American League from 2018 through today, and Bauer went on to win a Cy Young award in 2020 and remains a dominant force on the Los Angeles Dodgers. However, does every pitcher see an improved performance come with an increase in their spin rate?

# The Effect of Spin Rate Change on ERA

Bauer and Cole may have been able to drastically improve their performance with an increase in their spin rate, but these players are two individuals who are already extremely talented. I will use a linear regression model to determine if both spin rate change and the transition in seasons improve a pitcher’s ERA. More specifically, my dependent variable will be the difference between a pitcher’s ERA in the previous year and their ERA in the current year. I am including pitchers who have thrown at least 100 fastballs in two consecutive given seasons.

A negative relationship between the two variables is visible, meaning that an increase in spin rate is associated with a decrease in the number of runs allowed, on average. The Pearson R coefficient is -0.11, pointing to a slight negative relationship. Next comes the linear regression model, which also includes the categorical year values.

As shown by the P>|t| column, there is a statistically significant negative relationship between the change in spin rate and the change in ERA, though the year doesn’t seem to make a difference. For every 100 RPM increase in spin rate across seasons, the pitcher’s ERA goes down by 0.36 on average.
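To make the interpretation of that coefficient concrete, here is a minimal sketch of a regression of ERA change on spin rate change plus season dummies. It is not the model fitted for this article: the data is synthetic, generated with an assumed true slope of -0.0036 ERA per RPM (that is, -0.36 per 100 RPM), ERA change is assumed to mean current-season ERA minus previous-season ERA, and plain NumPy least squares is used, so no p-values are produced.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 600
TRUE_SLOPE = -0.0036  # assumed: ERA change per 1 RPM of spin change

# Synthetic pitcher-seasons (not real data).
data = pd.DataFrame({
    "season": rng.choice([2019, 2020, 2021], size=n),
    "spin_change": rng.normal(0, 80, size=n),
})
# era_change = current-season ERA minus previous-season ERA (assumed sign convention).
data["era_change"] = TRUE_SLOPE * data["spin_change"] + rng.normal(0, 1.0, size=n)

# Design matrix: intercept, spin change, and season dummies (first season as baseline).
year_dummies = pd.get_dummies(data["season"], prefix="season", drop_first=True, dtype=float)
X = np.column_stack([
    np.ones(n),
    data["spin_change"].to_numpy(),
    year_dummies.to_numpy(),
])
coef, *_ = np.linalg.lstsq(X, data["era_change"].to_numpy(), rcond=None)
slope = coef[1]

# Pearson correlation between the two changes, for comparison with the slope.
r = np.corrcoef(data["spin_change"], data["era_change"])[0, 1]

print(f"Estimated ERA change per 100 RPM: {100 * slope:.2f}")
print(f"Pearson r: {r:.2f}")
```

The slope is expressed per single RPM, so multiplying by 100 gives the "per 100 RPM" figure quoted in the text. A small correlation and a meaningful slope can coexist: the slope describes the average shift, while the correlation also reflects how noisy ERA changes are from year to year.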

# Conclusions

- The spin rate of fastballs in the MLB has gone up consistently throughout the past several years, and foreign substance usage is a likely explanation.
- The distribution of fastball spin rate has been more “non-normal” during the past couple of years with some pitchers throwing a whopping 3000 RPM.
- The distribution of individual players’ spin rate differences across consecutive seasons has been very spread out, with many pitchers experiencing a large increase in their spin rate while others experience a large decrease.
- On average, an increase in spin rate across seasons is associated with an improved ERA, but not to the same degree as for Trevor Bauer and Gerrit Cole, the well-known examples who both became elite players alongside their drastic increases in spin rate.

In a future, more in-depth analysis, I may look for additional covariates with the spin rate that have a positive impact on the pitcher’s ERA. Perhaps Bauer and Cole were more likely to improve their performances further because their statistics were already above league-average? It will be interesting to see how this story unfolds as the MLB raises its awareness of this scandal.

The code used for my analysis can be found here: https://github.com/etsc9287/Spin-Rate-Project. I will create more baseball analyses such as these, so stay tuned!