Sunday, May 05, 2024

This is part of the Z-Book, an ongoing compilation of new and refreshed pieces. It's part of the 2020 Mastersball Platinum subscription, available for just $39.95, featuring the industry's earliest and most comprehensive set of player projections.

 

Statcast data and some very clever colleagues have fueled several improvements in the manner Mastersball projects batters and pitchers. Here is the detailed description of the hitting process, coined The Hitting Zystem.

1. Convert major league performance to neutral

Aging curves and park factors are applied. Aging is applied to all skills while the following are subject to park factors:

• Hits*
• Doubles*
• Triples*
• Homers*
• Strikeouts
• Walks
• RBI
• Runs

*Distinguished for left-handed, right-handed and switch hitters

Neutralized singles are neutralized hits minus neutralized doubles, triples and homers.

The stats for a specific season are neutralized via the indices for that season. This is opposed to using a three-year average (which will be incorporated later). I’ve run the numbers both ways, using the associated season and the current three-year average and prefer the former. However, there are occasional instances where a player changes teams and there’s an extreme in park factors on one of the two clubs skewing translations where some personal massaging is necessary. When the player is on the same team all three years, the extreme factor is baked into the three-year average and everything comes out in the wash.

The conventional method for park factors is taking the specific factor and averaging it with 100, then adjusting. The reason is it’s assumed a batter splits his playing time evenly home and on the road with the further presumption the aggregate away factors are 100. While the former isn’t always the case, there’s no way to predict the home and away plate appearances. However, something can be done about the aggregate away parks not always being 100. I employ composite park factors, a weighted average of every park on that team’s schedule. The composite factor doesn’t need to be averaged with 100 since home and away are already baked in. Look for a separate essay reviewing composite park factors.

All other stats (hit by pitch, sacrifice, sacrifice fly, grounded into double play) are carried through without adjustment.
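As a rough illustration of the difference between the conventional and composite park factors, here is a minimal Python sketch. The function names, the games-based weighting, the sample schedule and the divide-by-factor neutralization are all assumptions made for the example, not the actual Mastersball code or data.

```python
def conventional_factor(home_factor):
    """Average the home park factor with a neutral 100 (assumes half the games are on the road)."""
    return (home_factor + 100.0) / 2.0


def composite_factor(schedule):
    """Weighted average of every park on the schedule, home park included.

    schedule: list of (park_factor, games) pairs. Weighting by games is an
    assumption; the article only says the average is weighted.
    """
    total_games = sum(games for _, games in schedule)
    return sum(factor * games for factor, games in schedule) / total_games


def neutralize(raw_stat, factor):
    """Scale a raw total to a neutral (100) environment. Assumed form of the adjustment."""
    return raw_stat * 100.0 / factor


# Hypothetical schedule: 81 home games at 115, road games in parks rated 95 and 97.
schedule = [(115, 81), (95, 41), (97, 40)]
conventional = conventional_factor(115)   # presumes the road parks average 100
composite = composite_factor(schedule)    # uses the road parks actually played in
neutral_hr = neutralize(30, composite)
```

In this made-up schedule the road parks play below 100, so the composite factor comes out lower than the conventional one and fewer homers are stripped out during neutralization.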

2. Convert minor league performance to neutral

Way back in the 1985 Baseball Abstract, Bill James first introduced the concept of major league equivalencies (MLEs). In short, an MLE translates minor league numbers to how the player would have performed in the major leagues, exhibiting the same skill level. They’re adjusted for park, overall hitting environment and quality of competition. The skeletal process has remained the same for almost 40 years, but the conversions have been refined as the available data has improved. MLEs have good predictive value and hence serve as a surrogate for MLB performance. They’re not perfect as there’s selection bias with respect to the players being good enough to make the Show, but they’re sufficiently accurate relative to the general reliability of projections.

MLEs are only trustworthy for Double-A and Triple-A, so every batter playing at those levels has his numbers translated. If he appears at both levels or is traded, the MLE for each individual team is determined.

A recent trend is for some top prospects to be promoted to the major league level with just a little action in Double-A. That action is usually outstanding in nature, or they wouldn’t have been advanced. Even with the MLE tempering the translation, these cases result in an impractical MLB projection, so I’ll sum up the numbers at the lower levels and apply the average MLEs from all the Double-A teams. It’s not perfect but it’s better than using a hyperbolic translation.

Foreign players are also subject to MLEs. It’s not exact, but I’ll apply the average Triple-A MLEs to those coming from Japan (Nippon Professional Baseball) while those coming from Cuba and South Korea (Korea Baseball Organization) get adjusted using Double-A MLEs.

3. Regress homers

Major league stats and MLEs are treated differently since the necessary data isn’t as accessible for the minor leagues. The described treatment is for MLB numbers.

Hat tip to FanGraphs’ Mike Podhorzer for this research. Home runs correlate quite well with average fly ball distance. This will be described in detail in a future site piece, but the correlation is stronger than the one with average home run distance. It’s also stronger than adjusting both average fly ball distance and average home run distance by park factors.

The process involves using the results from qualified players and determining an expected home run level based on average fly ball distance. The player’s actual homers are regressed to the expected amount using regression levels discussed previously. The starting point is 50%, which is just an average of expected and actual. The regressed homer total is the number carried through the rest of the process.
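The regression itself is simple arithmetic, sketched below in Python. The function name and the sample totals are hypothetical, and the expected homer figure is taken as an input because the fit against fly ball distance is not spelled out here.

```python
def regress_homers(actual_hr, expected_hr, regression=0.5):
    """Move actual homers toward the expected total.

    regression=0.5 is the stated starting point (a plain average of the two);
    0.0 keeps the actual total and 1.0 replaces it with the expected total.
    """
    return actual_hr + regression * (expected_hr - actual_hr)


# Hypothetical hitter: 30 actual homers, 24 expected from fly ball distance.
regressed_hr = regress_homers(30, 24)
```

With the default 50% setting, the hypothetical hitter lands halfway between his actual and expected totals.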

4. Neutralized stats are summed using a weighted average

Different sources use varying numbers of seasons and assorted weightings. After some back-testing, it’s been decided to use three years’ worth of data, weighted 11:7:4 with 11 applied to the most recent season. Each neutralized stat is multiplied by the associated coefficient, then the yearly results are summed and divided by the total of the applicable coefficients. For example, if the player was active all three seasons, the denominator is 22.

The weighted average is also carried out on plate appearances as well as catcher interference. The reason for the latter is the number of projected at bats is plate appearances less the sum of walks, hit by pitch, sacrifice flies, sacrifice bunts and catcher interference. It doesn’t affect many hitters, and even then most are just a couple of at bats, but it helps with bookkeeping and logical checks to make sure everything is coded correctly.
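The weighting and the at-bat bookkeeping can be sketched in a few lines of Python. The function names and sample numbers are made up for illustration, and using None to mark a missed season is an assumption about how inactive years might be represented.

```python
WEIGHTS = (11, 7, 4)  # most recent season first


def weighted_average(values, weights=WEIGHTS):
    """Weighted average of per-season neutralized totals, most recent first.

    A None entry marks a season the player missed; its coefficient is left
    out of the denominator, so only the applicable coefficients are summed.
    """
    pairs = [(v, w) for v, w in zip(values, weights) if v is not None]
    if not pairs:
        raise ValueError("no seasons to average")
    return sum(v * w for v, w in pairs) / sum(w for _, w in pairs)


def at_bats(pa, bb, hbp, sf, sh, ci):
    """At bats are plate appearances less walks, HBP, sac flies, sac bunts and catcher interference."""
    return pa - (bb + hbp + sf + sh + ci)


three_years = weighted_average([30, 24, 20])   # denominator 22
two_years = weighted_average([30, None, 20])   # denominator 15
ab = at_bats(pa=600, bb=60, hbp=5, sf=4, sh=1, ci=0)
```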

5. Determine adjusted hits

The nHits determined from Step 4 need to be regressed using xBABIP (expected batting average on balls in play). This is adjusted hits.

I deployed a new procedure for xBABIP this season. Previously, I determined it by breaking batted balls into grounders, infield line drives, outfield line drives, pop-ups, bunts and fly balls. Further, each was classified into hard, medium and soft hit. The expected hits for each was calculated using the league BABIP for the respective components. The primary shortcoming is this method didn’t account for the player’s speed, so the BABIP on grounders for some is better or worse than league average.

The new method uses Statcast’s xBA (expected batting average). This is determined by comparing the exit velocity, launch angle and runner’s sprint speed of a specific batted ball to the outcome of all similar batted balls. The result is a probability of being a hit. By way of example, if a specific batted ball was deemed 79% likely to be a hit, the batter is credited with 0.79 hits. Summing these credits over all batted balls yields expected hits.

The caveat here is xBA is not park corrected. This is fine for the ensuing analysis since we’re working with neutralized numbers, but there are a lot of folks incorrectly using xBA (and xwOBA) as a means to identify lucky or unlucky players simply by looking at the difference between xBA (or xwOBA) and the actual number. The expected number needs to be park-corrected before the comparison is made. The reason is the Statcast data is lumped together with all parks included in the comparison. A simplified example could be a certain batted ball would be a homer in Yankee Stadium 100% of the time and an out in Oracle Park 100% of the time. If these were the only two venues, and an equal number of this type of batted ball occurred in each park, the hit probability is 50%. Obviously, this is an oversimplification of what occurs within all 30 ballparks. I’m getting off on a tangent here, but a Colorado hitter’s wOBA should be higher than his xwOBA while a San Francisco batter should sport a wOBA lower than expected. The difference does not render the Rockies guy lucky and the Giants dude unlucky.

Getting back on point, knowing xBA, xBABIP can be extrapolated by determining the number of xHits and plugging it into the standard BABIP formula. The xBABIP for each season is determined and carried through Step 4 as its own entity.
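To make the extrapolation concrete, here is a small Python sketch. The standard BABIP formula is (H - HR) / (AB - K - HR + SF); the function names and the sample inputs are hypothetical, and keeping the hitter's homer total unchanged when plugging in expected hits is an assumption.

```python
def expected_hits(hit_probabilities):
    """Sum the per-batted-ball hit probabilities (0.79 credits 0.79 hits)."""
    return sum(hit_probabilities)


def babip(hits, hr, ab, k, sf):
    """Standard BABIP: non-homer hits divided by balls in play."""
    return (hits - hr) / (ab - k - hr + sf)


# Three hypothetical batted balls with hit probabilities of 79%, 10% and 45%.
x_hits_sample = expected_hits([0.79, 0.10, 0.45])

# Hypothetical season line: 150 expected hits, 25 HR, 550 AB, 120 K, 5 SF.
x_babip = babip(hits=150, hr=25, ab=550, k=120, sf=5)
```

The same babip function serves for nBABIP when it is fed the neutralized totals from Step 4 instead of expected hits.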

Neutralized BABIP (nBABIP) is determined from the standard BABIP formula, using the neutralized stats from Step 4.

To get the target BABIP (tBABIP), nBABIP is regressed to xBABIP using a regression lever with the default set to 50%.

The adjusted hits are determined using the tBABIP as follows:

Adjusted hits = tBABIP x (nAB – nHR – nK + nSF) + nHR

Singles can be derived from adjusted hits by subtracting extra base hits. This isn’t perfect; it assumes all lucky/unlucky hits are singles, which obviously isn’t the case, but the majority are, so it serves as a viable proxy.
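Putting the last few paragraphs together, the adjusted hits calculation might be sketched as follows. The function names and the sample stat line are hypothetical; the formulas follow the text above.

```python
def target_babip(n_babip, x_babip, regression=0.5):
    """Regress neutralized BABIP toward expected BABIP; 0.5 is the default lever setting."""
    return n_babip + regression * (x_babip - n_babip)


def adjusted_hits(t_babip, n_ab, n_hr, n_k, n_sf):
    """Adjusted hits = tBABIP x (nAB - nHR - nK + nSF) + nHR."""
    return t_babip * (n_ab - n_hr - n_k + n_sf) + n_hr


def adjusted_singles(adj_hits, n_2b, n_3b, n_hr):
    """Singles are adjusted hits minus extra base hits."""
    return adj_hits - (n_2b + n_3b + n_hr)


# Hypothetical hitter: nBABIP .320, xBABIP .300, 550 AB, 25 HR, 120 K, 5 SF.
t_babip = target_babip(0.320, 0.300)
adj_hits = adjusted_hits(t_babip, n_ab=550, n_hr=25, n_k=120, n_sf=5)
singles = adjusted_singles(adj_hits, n_2b=30, n_3b=3, n_hr=25)
```

Here the hitter's BABIP was 20 points above expectation, so half of that gap is removed and the lost hits all come out of his singles.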

6. Determine RBI and Runs Index

I’ve developed an xRBI and xRuns formula using the same principle as wOBA. All the factors contributing to an RBI (single, double, triple, homer, sacrifice fly) and run (single, double, triple, homer, hit by pitch, stolen base, caught stealing) are assigned a corresponding coefficient in accordance with the run-scoring matrix. These coefficients use aggregate stats.

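Each player has a park-adjusted number of RBI and runs calculated from Step 4, nRBI and nRuns. The respective indices are nRBI/xRBI and nRuns/xRuns, so an index above one means the player drove in or scored more runs than his component stats alone would suggest.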

Players on better teams often possess indices greater than one. Place in the batting order also influences the indices. Leadoff and two-hole hitters often have a run index greater than one and an RBI index less than one. Batters hitting in the meat of the order probably sport an RBI index above one.
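A rough Python sketch of the index idea follows. It reads the index as the neutralized total divided by the formula total, which is the orientation implied by multiplying the formula output by the index in Step 8. The coefficients below are placeholders chosen only to keep the arithmetic easy to follow; they are not the actual Mastersball values, and the function names are hypothetical.

```python
def expected_from_components(events, coefficients):
    """Linear-weights style total: each event count times its coefficient, summed."""
    return sum(coefficients[name] * count for name, count in events.items())


def context_index(neutral_total, expected_total):
    """Above one means more production than the component stats alone imply."""
    return neutral_total / expected_total


# PLACEHOLDER coefficients for illustration only, not the real run-scoring values.
rbi_coefficients = {"1B": 0.2, "2B": 0.4, "3B": 0.6, "HR": 1.6, "SF": 1.0}

# Hypothetical neutralized line for a middle-of-the-order hitter.
events = {"1B": 100, "2B": 30, "3B": 3, "HR": 25, "SF": 5}

x_rbi = expected_from_components(events, rbi_coefficients)
rbi_index = context_index(neutral_total=90, expected_total=x_rbi)
```

With 90 neutralized RBI against a formula total below that, the hypothetical hitter's index lands above one, as expected for someone batting in the meat of the order.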

7. Determine Stolen Base Opportunity and Success Rate

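Unadjusted stats at the MLB level are used to compute the SBO (stolen base opportunity) and success rate (SB%). An MLE is applied to SB% for minor leaguers. SBO is expressed as attempts per time on first base, so that multiplying it by projected times on first in Step 8 yields projected attempts. The formula for SBO is

(stolen bases + caught stealing)/(singles + walks + hit by pitch)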

8. Convert Neutral stats to Projected Stats

Some stats will be grouped together since the same operation is conducted on each. The action applied to every individual stat is multiplying the neutralized stat by the projected plate appearances then dividing by the neutralized plate appearances.
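The scaling step is a straight proportion, sketched here in Python with a hypothetical function name and made-up numbers.

```python
def project_stat(neutral_stat, neutral_pa, projected_pa):
    """Scale a neutralized stat to the projected playing time."""
    return neutral_stat * projected_pa / neutral_pa


# Hypothetical hitter: 25 neutralized homers over 500 neutralized PA, projected for 600 PA.
projected_hr = project_stat(25, neutral_pa=500, projected_pa=600)
```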

Playing time in general is worthy of its own discussion and will be reviewed in upcoming essays.

Aging is also universally applied to everything.

All results are rounded off to an integer.

A. Park-corrected stats

Hits, doubles, triples, homers, walks and strikeouts are park-adjusted using the three-year composite average.

B. Non-park corrected stats

Hit by pitch, sacrifice, sacrifice fly, catcher interference and grounded into double play are projected without park adjustment.

C. RBI and Runs

The number of RBI and runs are projected using the aforementioned formulas then multiplied by their respective indices from Step 6.

D. Stolen bases

The number of attempts is calculated by summing projected singles, walks and HBP and multiplying by SBO from Step 7. Successful steals are the number of attempts multiplied by SB%. Caught stealing is attempts less successful steals, rounded to an integer.
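The stolen base arithmetic can be sketched as below. The sketch treats SBO as attempts per time on first base, so that multiplying by projected times on first yields attempts as this step describes. The function names, the sample numbers and the exact point at which rounding happens are assumptions.

```python
def sbo(singles, walks, hbp, sb, cs):
    """Attempt rate: steal attempts per time on first base."""
    return (sb + cs) / (singles + walks + hbp)


def success_rate(sb, cs):
    """SB%: successful steals per attempt (0.0 when there were no attempts)."""
    attempts = sb + cs
    return sb / attempts if attempts else 0.0


def project_steals(proj_singles, proj_walks, proj_hbp, sbo_rate, sb_pct):
    """Return (stolen bases, caught stealing), each rounded to an integer."""
    attempts = (proj_singles + proj_walks + proj_hbp) * sbo_rate
    steals = round(attempts * sb_pct)
    caught = round(attempts - steals)
    return steals, caught


# Hypothetical history: 100 singles, 50 walks, 5 HBP, 20 steals, 5 caught.
rate = sbo(100, 50, 5, sb=20, cs=5)
pct = success_rate(sb=20, cs=5)

# Hypothetical projection: 110 singles, 55 walks, 5 HBP.
steals, caught = project_steals(110, 55, 5, rate, pct)
```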

E. Batting Average and Slugging Percentage

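At bats are plate appearances minus the sum of walks, HBP, sacrifice flies, sacrifice bunts and catcher interference, matching the bookkeeping in Step 4. Batting average is hits divided by at bats, and slugging percentage is total bases divided by at bats.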

F. On base percentage

The denominator in OBP isn’t projected PA since those include catcher interference and sacrifice bunts. To get OBP, those need to be subtracted from projected plate appearances, leaving at bats plus walks, HBP and sacrifice flies.
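For the bookkeeping-minded, the three rate stats can be sketched together in Python. The at-bat definition is the one given in Step 4; the function name and the projected line are hypothetical.

```python
def rate_stats(pa, bb, hbp, sf, sh, ci, singles, doubles, triples, hr):
    """Return (AVG, OBP, SLG) from a projected counting-stat line."""
    hits = singles + doubles + triples + hr
    ab = pa - (bb + hbp + sf + sh + ci)
    total_bases = singles + 2 * doubles + 3 * triples + 4 * hr
    # The OBP denominator drops sacrifice bunts and catcher interference
    # from PA, which leaves AB + BB + HBP + SF.
    obp_denominator = pa - sh - ci
    avg = hits / ab
    obp = (hits + bb + hbp) / obp_denominator
    slg = total_bases / ab
    return avg, obp, slg


# Hypothetical projected line.
avg, obp, slg = rate_stats(pa=600, bb=60, hbp=5, sf=4, sh=1, ci=0,
                           singles=95, doubles=30, triples=3, hr=25)
```

A quick logic check on any line is that PA minus sacrifice bunts and catcher interference equals at bats plus walks, HBP and sacrifice flies.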

There you have it friends, the Mastersball Hitting Zystem. It's a lot to digest, so please feel free to pose questions, comments and criticisms on the message forum.

Todd Zola is the Primary Owner and Lead Content Provider for Mastersball. He’s the defending Great Fantasy Baseball Invitational champion, besting 314 of the industry’s finest. Todd is a former Tout Wars and LABR champion as well as a multi-time NFBC league winner.