Swing length has many root causes
The nerds are really diving into the new bat speed and swing length data, quickly building an understanding of what it means and what it can be used for. One important observation, which I first saw from Robert Orr, is that swing length is influenced by the contact point of a swing. Pitches contacted out in front of home plate require longer swings than pitches contacted behind home plate, and shorter swings are required to go the opposite way. Hitters vary in their pull/oppo tendencies, and thus the spray angle preference of a hitter is going to *cause* their longer or shorter swings, not the other way around.
A popular example is Isaac Paredes, who pulls lots of fly balls for home runs. He averages long swings *because* he pulls the ball. If he were to take a pitch the other way more often, his swings would necessarily shorten. There is likely a component of swing length that is due to a hitter’s mechanics, but the raw swing length metric from Baseball Savant is presenting a jumble of those hitter qualities combined with spray angle. ~10% of swing length being a function of spray angle is nothing to sneeze at:
Similarly, pitches lower and more outside tend to require longer swings, again in order to meet the ball where it’s pitched. From Kyle Bland:
When you look at a leader board of swing length, it’s telling you something about a hitter’s swing, but it’s also telling you about other aspects of batted balls, including contact point. Wouldn’t it be great to normalize hitters’ swings, so you could separate out the hitter-specific info from the rest? In other words, given a bunch of qualities of a pitch, how long was a hitter’s swing compared to how the average hitter would have swung? These normalized swing lengths could then be used as a less conflated input into other analyses. Let’s go!
Modeling expected swing length
Based on a great article by Jonathan Judge about catcher framing from 2014 (holy crap, 2014) I’m going to leverage mixed model linear regression to find the relationships between a bunch of inputs and swing length, then remove the impact of those other factors to arrive at an estimate of how much longer or shorter a swing was than you would have expected given the inputs. You could interpret this as a proxy for how far in front or behind a pitch was contacted. And the glory of a mixed model is that it will tell us about differences in hitter swings along the way.
library(lme4)
model <- lmer(swing_length ~ plate_x_adj + plate_z + bat_speed + launch_angle + spray_angle + (1|batter) + (1|pitch_type), data = bbes)
Here’s an explanation of the inputs and the resulting impact of each factor on swing length:
plate_x_adj and plate_z are the horizontal and vertical location of the pitch in the strike zone (normalized for batter handedness). Each horizontal foot further inside adds .42 feet to the swing length. Each vertical foot downward adds .49 feet to the swing length. Pretty meaningful!
bat_speed, while a function of the hitter, can vary pitch by pitch. Each 5 mph difference lengthens the swing by .25 feet.
launch_angle is the vertical angle of the ball off the bat. Each 25 degrees of higher launch angle adds .17 feet. My hunch is that this is due to higher launch angles being caused by contact underneath the ball and, as discussed above, lower swings have to be longer.
spray_angle is based on the batted ball location (and normalized for handedness: -45 degrees is pulled down the line, +45 degrees is down the opposite line.) Ideally you’d use the angle immediately off the bat to remove influences of spin and wind, but this is the best we have. Each 10 degrees towards the pull side adds .07 feet. 45 degrees has an impact of about .30 feet. (Note that a model that uses only spray_angle and hitter as inputs shows the same impact of spray angle.)
1|batter is a random effect for the specific hitter. More on hitters below.
1|pitch_type is a random effect for the pitch type. More on pitch types below.
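For anyone who wants to rebuild a handedness-normalized spray_angle like the one described above, here is a minimal Python sketch (the model itself was fit in R). It is not necessarily how the input was computed for this article. It assumes Statcast's hc_x and hc_y hit coordinates and the stand column, and it places home plate at (125.42, 198.27), a commonly used community approximation rather than anything stated here.

```python
import math

# Assumed home-plate location in Statcast hit-coordinate units.
# These constants are a common approximation, not taken from this article.
HOME_X = 125.42
HOME_Y = 198.27


def spray_angle(hc_x: float, hc_y: float, stand: str) -> float:
    """Spray angle in degrees: negative = pulled, positive = opposite field."""
    # Angle measured from the line through home plate and second base.
    # hc_y shrinks as the ball travels away from home, hence HOME_Y - hc_y.
    raw = math.degrees(math.atan2(hc_x - HOME_X, HOME_Y - hc_y))
    # Left field (negative raw angle) is the pull side for a righty,
    # so flip the sign for left-handed hitters.
    return raw if stand == "R" else -raw
```

With this convention, a ball hit down the left field line comes out near -45 for a right-handed hitter and near +45 for a left-handed hitter.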
Note that, at the aggregated hitter level, the correlation of *just* spray angle and batter to swing length is .92, while this full model is .87, so these other factors do make a difference.
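To make the per-unit effects above concrete, here is a small Python sketch that adds them up for a single swing. It only restates the rounded numbers quoted in the list and assumes they combine additively, as they would in a linear model without interaction terms. The function and argument names are made up for illustration, and every argument is a difference from some reference swing, not a raw Statcast value.

```python
# Per-unit effects on swing length (feet), as quoted in the list above.
FT_PER_FOOT_INSIDE = 0.42        # per foot further inside
FT_PER_FOOT_LOWER = 0.49         # per foot lower in the zone
FT_PER_MPH_BAT_SPEED = 0.25 / 5  # .25 feet per 5 mph of bat speed
FT_PER_DEG_LAUNCH = 0.17 / 25    # .17 feet per 25 degrees of launch angle
FT_PER_DEG_PULL = 0.07 / 10      # .07 feet per 10 degrees toward the pull side


def expected_shift(feet_inside=0.0, feet_lower=0.0, bat_speed_mph=0.0,
                   launch_deg=0.0, pull_deg=0.0):
    """Expected change in swing length (feet) versus a reference swing."""
    return (FT_PER_FOOT_INSIDE * feet_inside
            + FT_PER_FOOT_LOWER * feet_lower
            + FT_PER_MPH_BAT_SPEED * bat_speed_mph
            + FT_PER_DEG_LAUNCH * launch_deg
            + FT_PER_DEG_PULL * pull_deg)


def adjusted_length(actual_ft, **differences):
    """Swing length with the expected shift from the listed factors removed."""
    return actual_ft - expected_shift(**differences)
```

For example, expected_shift(feet_lower=0.5, pull_deg=20, bat_speed_mph=5) comes to roughly .64 feet, and expected_shift(pull_deg=45) gives about .32 feet, in line with the "about .30 feet" quoted for 45 degrees.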
Here are some other factors I tried that didn’t seem to have a meaningful impact (only dropped the correlation to 0.86 with everything included):
two_strikes is a flag for two-strike counts. Apparently hitters shorten their swings by 1 inch with two strikes, all else being equal. No, that’s not much. Others have found signals of swing changes in two-strike counts, but they are likely tied more to swinging slower, not shorter. (And that combination seems like a bad idea.)
release_speed is the initial pitch velocity. Little impact beyond what spray angle explains. Worth .05 feet per 10 mph.
plat_adv is a flag for the hitter having the platoon advantage. Worth only .04 feet.
launch_speed is exit velocity. This didn’t have a large impact, and is likely redundant due to pitch speed and launch angle. Worth .05 feet per 10 mph (matches pitch speed, coincidence?)
I also tried something I called shoulder_distance, which is the distance of the point of contact from the hitter’s shoulders, because hitters reach for the ball from their shoulders. It’s a two-dimensional combination of plate_x_adj and plate_z. (Hello, Pythagorean Theorem.) I tried a basic model assuming the shoulders were located at the top of the average strike zone and a foot inside the edge of home plate. I also tried a version that moved the shoulders depending on the height of the batter, using their unique top of strike zone and a percentage of the difference between their zone and the average zone added to the horizontal distance from the plate. It didn’t improve anything, pulling much of the descriptive power from the plate_x_adj variable and a little from plate_z. Maybe it would become more explanatory if the location estimate were more exact? Dunno, but if it had gone anywhere, I definitely would have called it Armpit+.
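Here is a rough Python sketch of the basic shoulder_distance idea, in case anyone wants to poke at it. Several things are assumptions for illustration only: that positive plate_x_adj points toward the batter, that the top of the average strike zone sits at 3.4 feet (a placeholder, not a measured value), and the pct parameter in the batter-specific version, since the percentage used isn't given above.

```python
import math

PLATE_HALF_WIDTH_FT = 17 / 12 / 2          # home plate is 17 inches wide
SHOULDER_X_FT = PLATE_HALF_WIDTH_FT + 1.0  # a foot inside the edge of the plate
AVG_ZONE_TOP_FT = 3.4                      # placeholder for the average zone top


def shoulder_distance(plate_x_adj, plate_z,
                      shoulder_x=SHOULDER_X_FT, shoulder_z=AVG_ZONE_TOP_FT):
    """Straight-line distance (feet) from the assumed shoulder point to the pitch."""
    return math.hypot(plate_x_adj - shoulder_x, plate_z - shoulder_z)


def batter_shoulder(sz_top, pct):
    """Batter-specific shoulder point as (x, z).

    The height is the batter's own zone top; the horizontal offset grows by
    pct times the gap between that zone top and the average one.
    """
    return SHOULDER_X_FT + pct * (sz_top - AVG_ZONE_TOP_FT), sz_top
```

The second function feeds the first: shoulder_distance(x, z, *batter_shoulder(sz_top, pct)) gives the batter-specific version.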
Hitter Results
Ok, so now that we can adjust individual swing lengths to account for all the factors listed above, one main thing that remains is a hitter’s individual swing style. Here are the adjusted longest and shortest swings (relative to average swing length):
Perhaps more interestingly, here are the hitters whose swing lengths change the most when looking at the model instead of their actual swing lengths. These are likely the hitters who are pitched a certain way, or have a preference (or anti-preference) to pull the ball. Removing all that noise (some of which is under the control of the hitter) should better isolate personal swing path length:
Some of the biggest shifts are from hitters already at the extremes, who are regressed towards the mean. But many others end up near an average swing or even cross that threshold. Cole Tucker (longer) and Oneil Cruz (shorter) are two big name examples.
Data for all hitters with at least 25 BBEs as of May 20th can be found here.
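If you want to compute a similar "how much did this hitter move" comparison yourself, the Python sketch below shows one simplified way. It assumes each swing already has an expected length for an average hitter (from the pitch-level factors only), and it compares the hitter's raw deviation from league average against the mean of actual minus expected. This is a plain average, so unlike the batter random effect in the mixed model it does not regress small samples toward the mean. The tuple layout and function name are made up for illustration.

```python
from collections import defaultdict
from statistics import mean


def hitter_shifts(swings, min_bbe=25):
    """Rank hitters by how far the adjusted number moves from the raw one.

    swings: iterable of (batter, actual_ft, expected_ft) tuples.
    Returns (batter, raw, adjusted, shift) tuples, biggest |shift| first.
    """
    swings = list(swings)
    league_avg = mean(actual for _, actual, _ in swings)

    by_batter = defaultdict(list)
    for batter, actual, expected in swings:
        by_batter[batter].append((actual, expected))

    rows = []
    for batter, pairs in by_batter.items():
        if len(pairs) < min_bbe:
            continue  # skip hitters without enough batted ball events
        raw = mean(a for a, _ in pairs) - league_avg
        adjusted = mean(a - e for a, e in pairs)
        rows.append((batter, raw, adjusted, adjusted - raw))

    return sorted(rows, key=lambda row: abs(row[3]), reverse=True)
```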
Pitch Type Results
The second effect that remains is pitch type. There are many fewer pitch types than hitters, so we can view them all in one chart:
In general, offspeed pitches garner longer swings, perhaps because hitters are adjusting after being fooled, and also have the time to adjust? (The top four pitches above are splitters, traditional sliders, changeups, and sweepers.) Fastballs, especially sinkers, elicit shorter swings. The impact isn’t large, under .20 feet difference between the extremes. The model already accounts for pitch location, so these effects go beyond the fact that non-sinker fastballs are more often high in the zone and offspeed stuff is more often located down and/or away. (A full list of Savant pitch type codes can be found here.)
What’s next?
I’m a modeling green belt. I’m curious what the black belts would do with this, or what they would yell at me about.
Is there a better way to approximate the armpit location? Is there any value there?
You could aggregate by some sort of pitch quality, such as pitch type or location, and see which hitters have longer/shorter swings on that subset.
For what purposes could you use normalized swing length as an input for other things?
If you interpret this as a proxy for contact point, how could that be used for other things?
The main questions here were “how far out in front/behind home plate was contact made?” and “what is a hitter’s normalized swing length?” There are probably other questions you could ask, where a model like this is useful—what are they? (The inputs might change.)
Feedback and questions are, as always, appreciated! Leave a comment or find me on Twitter.
Nice article! I did chuckle when you referred to Cole Tucker as a big name lol