Converting Stuff+ to ERA and wOBA

Making sense of the Stuff+ scale

May 22, 2024

Stuff+ is taking over, or at least that’s my perspective as someone who’s actively seeking it out. A bit of a tautology there, I realize. But Stuff+ offers powerful insight into aspects of pitching that have historically been the realm of scouting.

How fast is a four-seamer at the plate? How much does it rise? How much does a slider break? How well does it play off the pitcher’s fastball?

Those are all pitch qualities measured by Stuff+. (If instead you consider pitch location and ignore the Stuff+ qualities, you’ve got Location+. And if you consider all of it, you’ve got Pitching+.)

I’ve recently started seeing reporting of Stuff+ faced by hitters, which is pretty cool. Eno Sarris wrote a whole article about it, and Thomas Nestico has been sharing snippets on Twitter. Here are the five hitters who have faced the toughest and easiest pitches, as judged by Nestico’s Stuff+ metric (tjStuff+), as of May 15th:

That’s fun! Victor Robles has a valid reason to stuff the complaint box. Tucker Barnhart is having his best year since 2021, but is he just feasting on easy pitches? What does his 96.8 Stuff+ faced actually mean? Wouldn’t it be nice to translate Stuff+ onto a scale we’re more familiar with, like wOBA? I think so. Let’s do that!

What the Stuff+ scale means

Many Stuff+ scores are put on a scale where 100 means average and higher is better (for pitchers). But the calculations begin with run values. Based on the outcome of a pitch, whether it’s a strike or ball (smaller changes) or a home run or strikeout (larger changes), we know the value of the pitch as measured in runs. Statistical models can then compare that run value to the qualities of every pitch by every pitcher, finding the qualities that tend to produce good results (e.g., higher velocity, painting the corners).

Using that league-wide model, all of an individual pitcher’s pitches are given run value estimates. Those run values are combined, across pitch types or across the whole repertoire, and converted to a rate stat: expected run value per 100 pitches (xRuns/100). The run values are then often converted to the 100 scale to make things easier to interpret. The standard deviation of the xRuns/100 distribution is also usually set to a convenient number, such as 10 or 50.

For tjStuff+, 100 represents the average pitch thrown and every 10 points represents one standard deviation, which is worth 0.8 runs better than average per 100 pitches. We can thus convert a Stuff+ score to runs by undoing all the work Thomas did to give us the Stuff+ score in the first place: (Stuff+ - 100) / 10 * 0.8 * num_pitches / 100. For Robles, that becomes (104.3 - 100) / 10 * 0.8 * 113 / 100 = 0.39 runs that tougher-than-average pitches have cost him.
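Here is a minimal Python sketch of that conversion. The function and parameter names are my own for illustration, not anything from tjStuff+, and the defaults (mean 100, SD 10, 0.8 runs per SD per 100 pitches) are just the tjStuff+ figures quoted above.

```python
def stuff_plus_to_runs(stuff_plus, num_pitches,
                       mean=100.0, sd=10.0, runs_per_sd=0.8):
    """Convert an average Stuff+ over num_pitches to runs vs. average.

    runs_per_sd is the run value of one standard deviation per 100 pitches
    (0.8 for tjStuff+, per the text; other models will differ).
    """
    z = (stuff_plus - mean) / sd           # standard deviations from average
    runs_per_100 = z * runs_per_sd         # runs per 100 pitches
    return runs_per_100 * num_pitches / 100.0


# Robles: 104.3 Stuff+ faced over 113 pitches
robles_runs = stuff_plus_to_runs(104.3, 113)
print(round(robles_runs, 2))  # 0.39
```

Swapping in a different model should only require changing mean, sd, and runs_per_sd.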

The next step is to calculate runs per plate appearance and convert to points of wOBA. For Robles that’s 0.39 / 23 * 1.278 = 0.022. The 1.278 figure is a conversion value that changes with the run-scoring environment and can be found on the FanGraphs Guts page. (I don’t know whether the conversion factor should also vary by team and ballpark, but in a perfect world it seems like it would.)
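The same step as a small Python sketch. The function name is hypothetical, and the default of 1.278 is simply the conversion value quoted above; you would look up the right value for the season you care about.

```python
def runs_to_woba_points(runs, plate_appearances, woba_conversion=1.278):
    """Convert a run total over some plate appearances to points of wOBA.

    woba_conversion is the season-specific runs-to-wOBA factor
    (1.278 is the value used in this post).
    """
    runs_per_pa = runs / plate_appearances
    return runs_per_pa * woba_conversion


# Robles: 0.39 runs over 23 plate appearances
robles_woba_points = runs_to_woba_points(0.39, 23)
print(round(robles_woba_points, 3))  # 0.022
```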


In other words, instead of a 0.241 wOBA, Robles deserves more like a 0.263 wOBA. Is that a lot? I’d say it’s enough to matter, but not enough to scream from the rooftops that wOBA is useless, especially considering Robles is the most extreme example. (Note that Baseball Prospectus’s DRA and DRC metrics do adjust for quality of opponents faced, which is both cool and uncommon in the nerdosphere.)

However… tjStuff+ is self-professed to be more conservative than other models. In addition, since Stuff+ does not include pitch location, looking at Pitching+ instead might find some hitters who have been doubly unlucky, facing high-quality stuff thrown in high-quality locations. I wouldn’t be surprised if other models show the unluckiest hitter with more like a 40-50 point difference in wOBA, although that anti-luck pace is very unlikely to continue for a full season. (Everyone gets to hit against the Angels at some point, after all.)

Note: facing an average Stuff+ of 103 should NOT be interpreted as facing pitches 3% harder than average. Because the values of 100 and 10 are arbitrarily chosen, they could just as well have been 1000 and 10 (in which case the same pitches would score 1003 and appear only 0.3% harder than average) or 10 and 5 (a score of 11.5, or seemingly 15% harder than average). You really need to talk about these things in terms of standard deviations or run values.
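To see why the percentage reading falls apart, here is a quick Python sketch that puts the same 103 on each of those scales. The helper names are made up for this example. The "percent above mean" changes with the arbitrary scale, while the number of standard deviations does not.

```python
def rescale(score, old_mean, old_sd, new_mean, new_sd):
    """Move a score to a new (mean, sd) scale, keeping its z-score fixed."""
    z = (score - old_mean) / old_sd
    return new_mean + z * new_sd


def pct_above_mean(score, mean):
    """The naive 'X% harder than average' reading of a plus-stat."""
    return (score / mean - 1.0) * 100.0


for new_mean, new_sd in [(100, 10), (1000, 10), (10, 5)]:
    score = rescale(103, 100, 10, new_mean, new_sd)
    z = (score - new_mean) / new_sd
    print(f"mean={new_mean}, sd={new_sd}: score={score:.1f}, "
          f"naive pct={pct_above_mean(score, new_mean):.1f}%, z={z:.1f}")
```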

The impact on teams

If you expand the impact of Stuff+ faced to entire teams, you observe a more significant impact. Again from Thomas, here are the Stuff+ values faced by teams through May 12th:

Using most of the formulas from above, we can get to the point where the Rays have lost 13.7 runs through 44 games. Prorating that to a 162-game season and dividing by 9.55 runs per win, you get about 5.3 wins. Holy crap! Instead of being on an 81-win pace, they’re really more like an 86-win-pace team. Add in some potential improvement and farm system call-ups, and Rays fans should be optimistic about the rest of the season.
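The proration is simple enough to sketch in Python. The function name is hypothetical, the 9.55 runs per win is the figure used above, and the result is only as good as the runs and games-played inputs you feed it.

```python
def runs_to_season_wins(runs, games_played,
                        season_games=162, runs_per_win=9.55):
    """Prorate a run total to a full season and convert it to wins."""
    season_runs = runs * season_games / games_played
    return season_runs / runs_per_win


# Rays inputs as quoted in the text: 13.7 runs through 44 games
rays_wins = runs_to_season_wins(13.7, 44)
print(round(rays_wins, 1))
```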

On the other hand, the Rockies offense is probably overrated to the tune of 3.1 wins. Of course, the Rockies are a special case: they will likely always face worse Stuff because of what Coors does to opposing pitchers, but that also bites their own pitchers, making them appear worse than they actually are.

How to convert Stuff+ to ERA for pitchers

You can also use a Stuff+ conversion to help understand the impact on pitchers’ ERAs. Why would you want to do this? Some examples…

If a pitcher improves their slider from a 90 Stuff+ to a 110 and throws 20 sliders per start, how much better will their ERA be?
Or if a 90-grade pitch is replaced by a new 110-grade pitch?
Or a 90-grade pitch is thrown less and a 110-grade pitch is thrown more?

These all imply the same improvement. Start by converting to runs, just as we did with hitters: (110 - 90) / 10 * 0.8 * 20% = 0.32 runs saved per 100-pitch game in which 20 of the pitches are the improved slider. Let’s assume 100 pitches gets a starter through 6 innings. (That’s not an awful assumption, as there tend to be just under 150 pitches per team per game, although you could certainly use more exact numbers.) 0.32 runs per six innings is 0.48 runs per nine, or an ERA improvement from, say, 4.00 to 3.52. Pretty good!
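Here is that pitcher calculation as a rough Python sketch. The names are my own, the defaults are the tjStuff+ scale (SD of 10, 0.8 runs per SD per 100 pitches), and it bakes in the same simplifying assumptions as the text: the affected pitches belong to a start that lasts 6 innings, and every run saved counts as an earned run.

```python
def era_improvement(old_stuff, new_stuff, pitches_affected,
                    innings_per_start=6.0, sd=10.0, runs_per_sd=0.8):
    """Estimate the ERA drop from upgrading some pitches in a start."""
    # Run value gained per 100 pitches from the Stuff+ improvement
    runs_per_100 = (new_stuff - old_stuff) / sd * runs_per_sd
    # Scale down to the number of pitches actually affected per start
    runs_saved_per_start = runs_per_100 * pitches_affected / 100.0
    # Convert runs per start to runs per nine innings
    return runs_saved_per_start * 9.0 / innings_per_start


drop = era_improvement(90, 110, pitches_affected=20)
print(round(drop, 2))         # 0.48
print(round(4.00 - drop, 2))  # 3.52
```

Changing innings_per_start is the easy way to test how sensitive the answer is to the 100-pitches-per-6-innings assumption.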

Other Stuff+ models

As previously mentioned, there are many Stuff+ models available publicly. Many include Stuff, Location and Pitching. Some only provide a subset. Each seems to use a unique combination of average and standard deviation. Here’s my attempt to summarize the info you’d need to perform your own transformations between Stuff+ metrics and common hitting or pitching metrics. If you notice any errors, or think I missed a public model, please let me know.

Table updated 6/7/24 with three additional models.

(PLV and PitchPro don’t bother scaling their metrics by standard deviation, although the standard deviations obviously exist. They get right to the point and use run values as their scale, so 1 unit = 1 run.)

Extra notes

PitcherProfile is the only model I’ve found that gives Stuff+ numbers for all pitch types ignoring the relationship with the pitcher’s primary fastball. I like this version for answering questions about a pitch ignorant of a pitcher’s repertoire. It aligns less with actual results, but is better in a vacuum. They are also the only site showing Stuff+ by platoon split. This is useful for answering questions like, “what would the Stuff+ of this slider be if it stopped being thrown to lefties” or “how good is this pitch against lefties?”

PitchPro is the only one of these models where lower is better (which makes intuitive sense, because better Stuff results in fewer runs allowed). BPro does not have a Loc+ metric, but you can sort of infer it. They do not aggregate across pitch types for a pitcher, although you can download the leaderboard and do that yourself. Here’s some more precise info on PitchPro StDevs. I like how they show a few descriptive metrics in addition to run values:

Swing Rate is how often a hitter is expected to swing at that pitch
Whiff Rate is how often a hitter is expected to whiff when swinging at that pitch
RV/100BIP is the expected run value of pitches put into play (lower is better)

Fangraphs has both enoStuff and botStuff, which is nice. You can add a custom table on player pages above the dashboard to show numbers from either model. (You have to be signed in, and maybe a member?)

tjStuff is a great Twitter follow, sharing multiple pitcher scorecards every day as starters finish their outing, a recap of the best Stuff from the previous day, and other ad hoc observations. The Patreon gets you a bunch of interactive tools.

I’m not very familiar with Drummey’s Stuff model, but it includes an arsenal feature, measuring the interaction effects between individual pitches.

PLV is analogous to Pitching+ (not Stuff+), although they have a Stuff+ metric, too. Kyle puts out tons of great content at PitcherList, some of it behind the paywall—apps, leaderboards, intense player pages, analysis, projections, etc. Also an excellent Twitter follow.

cStuff+ is open source! You could use it as a starting point for your own model, or, as I’d like to do, just use it as-is as input for other studies.

Orr’s Pitch Quality is a two-part metric: non-BIP pitches are graded on run values, while BIPs are graded on outcome probabilities, which makes it tough to put the whole thing on the runs scale. If you assumed it’s similar to other models per SD, you probably wouldn’t be super wrong. On a pitch-level basis, the standard deviation is set to 50, but that gets squooshed (technical term) when aggregating by pitcher across seasons.

