The Mastersball Projection Process: Hitters

I don’t usually say things like this, but I doubt there’s anyone in the industry with a more thorough understanding of projection and valuation methodology than your normally humble author. That said, the more deeply I understand the concepts, the more I realize it’s what you do with them that matters. It’s an oversimplification, but projections beget rankings, which spawn a cheat sheet. However, drafting isn’t simply taking the top name off the list. Among other things, appreciating where a projection and ranking come from helps direct your draft pick or auction bid.

The Mastersball projection and valuation methodologies have always been available to Platinum subscribers. Since both need a refresh, and the site recently underwent a facelift of its own, I decided it was time to take these processes out from behind the firewall. Yes, it’s a thinly veiled attempt at recruiting more subscribers, but I think there’s probably some interest in these topics regardless.

The rest of the discussion will detail how I generate hitting projections. A follow-up piece will describe the process for pitchers. As always, I’ll be happy to address questions in the comments or on the newly revamped message forums.

In a nutshell, past performance is distilled to a neutral per plate appearance basis. Regression, aging and team context considerations are applied, then the projection is generated by multiplying by projected plate appearances.

Let’s start with past performance. I’ve looked at how many years provide the best baseline and have landed on the same number many others employ: three seasons. Others use more; I doubt any use fewer. Within this three-year spread, the most recent performance carries the greatest weight, with the oldest contributing the least. Again, nothing ground-breaking here, though we all may deploy different weighted averages. With the recent power surge, I looked at my current weightings to see if they’re still optimal and ultimately opted to keep the status quo, with the option of overriding when warranted. In fact, the older I get, the more comfortable I am massaging here and there without the threat of being struck by lightning.

Most of you are likely aware of Major League Equivalencies (MLE), used to project players spending time in the minors within the three-year span. For those not familiar, an MLE takes the prospect’s numbers and translates them to what they would have been in the majors. They’re adjusted for park, age and league environment. Unless forced by a lack of other data, only Double-A and Triple-A numbers are included. MLEs aren’t perfect, but they’re useful because they add another data point. I use a pseudo-MLE for players coming from Japan, Korea, Cuba, etc.

One area where I’m sure I differ from everyone else is my procedure for applying park factors. The means of determining park effects goes beyond the scope of this discussion. Platinum subscribers can access a lengthy essay on the subject in the Z Book. In short, the calculation is designed to filter out all team factors and biases, strictly measuring the venue. Since this is impossible to do perfectly, a three-year average is used when taking the neutral projection and accounting for home venue.

Park factors for batters are computed for left-handed and right-handed swingers. For switch hitters, the factor is a weighted average based on the average number of times a switch hitter bats from each side of the plate. Currently, switch-hitters hit left-handed 73 percent of the time.
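For those who like to see the arithmetic, here is a minimal sketch of the switch-hitter blend. The function name and the example factors (110 from the left side, 100 from the right) are hypothetical; only the 73 percent share comes from the text.

```python
def switch_hitter_factor(lh_factor, rh_factor, lh_share=0.73):
    """Blend left- and right-handed park factors by the share of PA from each side."""
    return lh_share * lh_factor + (1.0 - lh_share) * rh_factor


# Hypothetical park: 110 for lefties, 100 for righties.
print(round(switch_hitter_factor(110, 100), 1))  # 107.3
```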

The conventional manner to apply factors is as follows. Let’s say the park-neutral homer projection is 20 and a hitter’s home park has a factor of 120. This means that venue increases a neutral projection by 20 percent. However, since the hitter only plays half his games at home, he realizes only half of that, or ten percent, yielding an actual projection of 22 homers.

Implicit in this is the assumption that the road venues, taken together, are neutral. Here’s where my treatment differs, as I utilize composite park factors. What I do is use a weighted average for all 162 games. The result is close to the normal method, with sufficient differences to make the effort worthwhile.
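The difference between the two approaches can be sketched as follows. This is an illustration under stated assumptions, not the actual Mastersball calculation: the function names and the road schedule are made up, and a composite factor is assumed here to be a simple games-weighted average of each park's factor.

```python
def conventional_factor(home_factor):
    """Half the games at home, with the road assumed neutral (100)."""
    return (home_factor + 100.0) / 2.0


def composite_factor(schedule):
    """Games-weighted average factor over (park_factor, games) pairs."""
    total_games = sum(games for _, games in schedule)
    return sum(factor * games for factor, games in schedule) / total_games


neutral_hr = 20

# Conventional method from the text: a 120 home park turns 20 homers into 22.
print(neutral_hr * conventional_factor(120) / 100)

# Hypothetical 162-game schedule: 81 home games plus two groups of road parks.
schedule = [(120, 81), (95, 41), (102, 40)]
print(round(neutral_hr * composite_factor(schedule) / 100, 2))
```

With this made-up schedule the road parks lean slightly pitcher-friendly, so the composite result lands a bit below the conventional 22.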

The first step in the neutralization process is taking the actual production and removing the park effects specific to that season (not the three-year average). Most are familiar with factors for homers, hits and runs. They exist for everything. I apply park effects to doubles, triples, walks and strikeouts as well as the others. Players on multiple teams in a season have their numbers adjusted using each team’s factor. This can cause issues since the away games don’t match the composite factor, but the same error is present with the usual method, since the smaller sample of games could skew the away parks away from neutral.

At this point, I have the park-neutral performance for all major leaguers as well as the MLE for all minor leaguers. All the yearly components are summed to a neutral performance for that season and the weighted average is applied to each, culminating in a single stat line. The number of plate appearances for each season is carried through the calculation. Here’s an example, using homers and my 11:7:4 weights.

2017: 20 HR in 500 PA

2016: 16 HR in 480 PA

2015: 12 HR in 400 PA

HR: ((20 x 11) + (16 x 7) + (12 x 4)) / (11+7+4) = 17.3

PA: ((500 x 11) + (480 x 7) + (400 x 4)) / (11+7+4) = 475.5

So, the neutral projection for this player is 17.3 HR in 475.5 PA. This is done for all the stats involved in the projection.
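The same weighted average can be written as a short sketch. The 11:7:4 weights and the sample seasons come from the example above; the function and variable names are just for illustration.

```python
WEIGHTS = (11, 7, 4)  # most recent season first


def weighted_average(values, weights=WEIGHTS):
    """Weighted average of per-season totals, newest season first."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


hr = weighted_average([20, 16, 12])     # 2017, 2016, 2015
pa = weighted_average([500, 480, 400])
print(round(hr, 2), round(pa, 2))  # 17.27 475.45
```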

Now it’s time to go through the individual stat projections. The final projection involves a lot more than just taking the above and applying aging and park factors.

Home runs

There are two components to a home run projection: home run per fly ball (HR/FB) and fly ball percent (FB%). A hitter develops his own baseline for each. There’s a skill and luck aspect to each, though there’s thought to be more luck associated with HR/FB.

There’s a lot out there with respect to lucky versus deserved homers. Some studies look at fly ball distance. Others use park overlays or scrape Statcast data and use exit velocity and launch angle to derive an expected home run calculation. While I’m aware of the current research landscape, I don’t presently incorporate an algorithm to adjust for lucky homers. I’ll note players others have recognized as outliers and put them under the microscope, adjusting on an individual basis.

The subset I struggle with most is hitters known to have added loft to their swing. This is a recent phenomenon, so the conundrum is whether last season, and possibly 2016 as well, should carry even more weight than the seasons before the adjustment was made. One of the elegant aspects of the three-year weighted average is that this happens organically. That is, the chance the player can’t sustain the change is accounted for. There are a handful of batters for whom I decided to buy into the notion of a consciously elevated launch angle, and those projections reflect that. It’s my expectation this will soon be handled with a refined expected home run algorithm, once more data is available. That said, the likely juiced ball is skewing things too. It isn’t skewing the calculations themselves, but if the ball doesn’t jump off the bat as much going forward, a hitter’s exit velocity could drop below what’s expected based on the previous season’s measurements.

Hits (Part One)

You’re no doubt familiar with batting average on balls in play (BABIP). A player has some control over how hard they hit the ball (now called exit velocity) and the hit type (now called launch angle), but there’s little, if any, control over exactly where it goes and whether there’s a fielder in range to make a play.

Hit types can be parsed into several classifications. I have data for groundballs, fly balls and line drives, all hit hard, medium and soft along with bunts and pop-ups. That’s eleven classifications. I determine a global BABIP for each, then based on each hitter’s distribution, calculate an expected BABIP (xBABIP). This is done for the same three-year spread of data fueling the projection.

There are several reasons a player’s BABIP differs from their xBABIP. The notion of luck (good and bad) was just suggested. Speedy players can beat out grounders at a clip higher than league average while plodding hitters get fewer infield hits. The shift is also influencing BABIP.

Since I don’t know the exact reasons for a hitter’s delta between BABIP and xBABIP, I regress BABIP towards xBABIP, with .5 the initial setting. This amounts to an average of the two. It’s not perfect, as there are some park effects that can’t be removed since that requires a ton of work at the granular level, but I’ve found this treatment to be effective. Of course, I reserve the right to override the regression and force the projected BABIP to be more like the player’s baseline when apropos, usually for speedsters.
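As a rough sketch of the xBABIP and regression steps: the league BABIP values and the three batted-ball classes below are invented for illustration (the real process uses eleven classes and values not given here), and the function names are hypothetical.

```python
# Illustrative league BABIP by batted-ball class. These are NOT real values.
LEAGUE_BABIP = {"gb_medium": 0.24, "ld_medium": 0.65, "fb_medium": 0.12}


def expected_babip(ball_counts, league_babip=LEAGUE_BABIP):
    """Weight each class's league BABIP by the hitter's batted-ball distribution."""
    total = sum(ball_counts.values())
    return sum(n * league_babip[cls] for cls, n in ball_counts.items()) / total


def regress(babip, xbabip, weight=0.5):
    """Move BABIP toward xBABIP; a weight of 0.5 is a straight average of the two."""
    return babip + weight * (xbabip - babip)


counts = {"gb_medium": 45, "ld_medium": 20, "fb_medium": 35}  # hypothetical hitter
xbabip = expected_babip(counts)
print(round(xbabip, 3), round(regress(0.320, xbabip), 3))  # 0.28 0.3
```

Setting the weight below 0.5 keeps the projection closer to the player’s own baseline, which is one way to express the speedster override described above.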

The equation for BABIP is

BABIP = (Hits - HR) / (At Bats – HR – K + Sac Fly)

Solving for Hits:

Hits = (AB – HR – K + SF) x BABIP + HR

So, to get the adjusted park neutral hits, a little more information is necessary. Let’s put this section on hold.

At Bats

I project plate appearances then determine at bats by subtracting out the non-AB components. Specifically,

PA = AB + BB + HBP + SF

AB = PA – BB – HBP – SF

There’s a component missing and that’s catcher’s interference (CI). Jacoby Ellsbury’s record-setting numbers aside, there aren’t enough instances to worry about. To wit, in 2017, there were 43 CI calls, with Houston leading the way with nine. Ellsbury would approach that number by himself in his prime, but even then, I didn’t make any adjustments.

Bases on Balls

As alluded to earlier, walks are influenced by park effects. Foul territory, batter’s eye and atmospheric conditions contribute to how a venue affects walks. The composite BB factor is applied to yield the park neutral walks.

Hit by Pitch

No park factor is applied so this is just the weighted average of the three years in question.

Sacrifice Fly

Ditto.

We now have all that’s necessary to compute the number of projected AB. There’s only one component of BABIP left.

Strikeouts

The same thinking applied to walks is germane here. Punch outs are influenced by venue, so the park neutral adjustment is made.

Hits (Part Two)

OK, so now we have everything needed for the hits formula:

Hits = (AB – HR – K + SF) x BABIP + HR

This is the park-neutral, regressed hits projection.
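Putting the at-bat and hits formulas together, a minimal sketch looks like this. The formulas are the ones given above; the stat line is invented for illustration.

```python
def at_bats(pa, bb, hbp, sf):
    """AB = PA - BB - HBP - SF (catcher's interference ignored, as in the text)."""
    return pa - bb - hbp - sf


def hits_from_babip(ab, hr, k, sf, babip):
    """Hits = (AB - HR - K + SF) x BABIP + HR."""
    return (ab - hr - k + sf) * babip + hr


# Hypothetical hitter: 600 PA, 60 BB, 5 HBP, 5 SF, 25 HR, 120 K, .300 BABIP.
ab = at_bats(pa=600, bb=60, hbp=5, sf=5)
hits = hits_from_babip(ab, hr=25, k=120, sf=5, babip=0.300)
print(ab, round(hits))  # 530 142
```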

Doubles and Triples

As mentioned previously, there are park factors for doubles and triples. As such, I treat them like every other projected stat and adjust using composite park factors. It’s not perfect, since the regressed BABIP influences doubles and triples. I don’t sweat triples since the number is usually the same after rounding off. I’ll inspect the projection for hitters swatting a lot of two-baggers to make sure the number passes the sniff test. To be honest, BABIP and xBABIP are usually close enough that doubles come out the same after rounding.

Let’s take a moment to review all the stats projected to this point, all park-neutral: Hits, 2B, 3B, HR, BB, K, HBP and SF.

Still left are singles, stolen bases, caught stealing, runs and RBI. Singles are necessary to project the others so let’s attack that next.

Singles

The singles projection is hits minus extra base hits. To get there, we need to take a step back and explain how to go from the park-neutral projection to the final projection. If you recall, plate appearances were carried through the weighted average, just like all the stats. As such, each can be presented per PA. Sticking with homers, say a hitter’s park-neutral line is 26.2 HR in 475.5 PA, which equates to .055 HR/PA.

Sticking with HR but keeping in mind this is the process for all skills-based stats, an aging factor is applied. Let’s say the player’s age factor is 1.02. The .055 HR/PA is multiplied by 1.02, yielding .0561.

The next step is applying the corresponding composite park factors. Remember, to get the park-neutral adjustment, the factor for just that season was used. Here, a three-year average park factor is employed since it’s the best reflection of how the venue will play in the upcoming season. For the sake of this example, let’s say the appropriate composite HR factor for a hitter of this handedness is 106. We take .0561 and multiply it by 1.06, rendering .0595.

The final step is multiplying .0595 by the projected PA. I’ll discuss that process later, but for the sake of an example, let’s say I’m projecting 517 PA. So, 517 x .0595 = 30.76. I round all projections to the nearest integer, so our mythical hitter is projected for 31 long balls.
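The chain from neutral rate to final number can be sketched in a few lines. The inputs are the ones from the example (26.2 HR in 475.5 PA, a 1.02 age factor, a 106 composite park factor and 517 projected PA); the function name is hypothetical. Because the sketch doesn’t round at each intermediate step, the unrounded result differs slightly from the hand calculation, but it rounds to the same 31.

```python
def project_stat(neutral_stat, neutral_pa, age_factor, park_factor, projected_pa):
    """Per-PA rate -> aging adjustment -> composite park adjustment -> scale to PA."""
    rate = neutral_stat / neutral_pa
    rate *= age_factor
    rate *= park_factor / 100.0
    return rate * projected_pa


hr = project_stat(26.2, 475.5, age_factor=1.02, park_factor=106, projected_pa=517)
print(round(hr))  # 31
```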

As stated, this is done for all the stats, so to get singles, all the projected extra base hits are subtracted from the projected hits.

Stolen Bases

Using the actual stats, not park-neutral, I calculate stolen base opportunities (SBO) as attempts (SB + CS) divided by the number of times the hitter reached first base (1B + BB + HBP). I realize this omits steals of third, but it serves as an adequate proxy for this purpose. I also calculate the success rate (SB%). Both are carried through the three-year weighted average formula, generating a player’s projected SBO and SB%.

You can probably figure out the rest. The number of opportunities based on the projected 1B, BB and HBP is determined and multiplied by SBO to yield the number of attempts. From there, the number of swipes is calculated using SB%.
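For anyone who would rather not figure it out, here is a minimal sketch. The function names and the sample numbers are hypothetical, and for brevity it uses a single season’s rates rather than the three-year weighted average.

```python
def sb_rates(sb, cs, singles, bb, hbp):
    """Return (SBO, SB%): attempt rate per time on first, and success rate."""
    attempts = sb + cs
    times_on_first = singles + bb + hbp
    sbo = attempts / times_on_first
    sb_pct = sb / attempts if attempts else 0.0
    return sbo, sb_pct


def project_steals(sbo, sb_pct, proj_singles, proj_bb, proj_hbp):
    """Projected opportunities x SBO = attempts; attempts x SB% = steals."""
    attempts = sbo * (proj_singles + proj_bb + proj_hbp)
    steals = attempts * sb_pct
    return round(steals), round(attempts - steals)


sbo, sb_pct = sb_rates(sb=24, cs=6, singles=140, bb=55, hbp=5)
print(sbo, sb_pct)                                # 0.15 0.8
print(project_steals(sbo, sb_pct, 100, 55, 5))    # (19, 5)
```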

Since steals are so team oriented, if a player switches teams, I may adjust SBO as I see fit. I’ll also make tweaks for someone expected to hit in a different part of the order.

Runs and RBI

The process for runs and RBI is the same. What I do is determine an index for each, using a formula I derived for expected runs and another for expected RBI. These do not consider place in the order or team context; they’re essentially how many runs and RBI, on average, a player would generate with their performance.

I determine the xRuns and xRBI using the park adjusted stats for each season. I then adjust their actual runs and RBI using the corresponding composite runs park factor. The index is expected/adjusted. This is then carried through the three-year average formula like the other stats.

Next, I go to the final projection and calculate expected runs and RBI, then adjust by dividing by the appropriate index. The team context and batting order position are baked into the index, so if the player’s situation is similar, nothing needs to be done. However, if the player’s role has changed, he’s switched teams, or even if he stays on the same club but the lineup is better or worse, like the Miami Marlins this season, I can tweak the indices to bring the runs and RBI in line with the new scenario.
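The index mechanics can be sketched as below. The expected-runs formula itself isn’t given here, so expected runs are treated as an input. The sketch also assumes that adjusting actual runs for park means dividing by the composite factor; the function names and numbers are hypothetical.

```python
def run_index(expected_runs, actual_runs, park_factor):
    """Index = expected / park-adjusted actual (assumes dividing by the factor)."""
    adjusted_actual = actual_runs / (park_factor / 100.0)
    return expected_runs / adjusted_actual


def apply_index(expected_projection, index):
    """Divide the projection's expected runs by the index to restore context."""
    return expected_projection / index


# Hypothetical hitter: 72 expected runs, 84 actual runs, composite runs factor of 105.
index = run_index(expected_runs=72, actual_runs=84, park_factor=105)
print(round(index, 3))                   # 0.9
print(round(apply_index(75, index), 1))  # 83.3
```

An index below 1 means the hitter scored more than his raw performance implies, typically thanks to lineup spot or teammates. Nudging the index toward 1 is one way to represent a move to a weaker lineup.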

Occasionally, a player’s batting average with runners in scoring position (BAwRISP) is out of sync with his overall average. I don’t believe in the notion of a clutch player. A high BAwRISP is simply a cluster of hits with ducks on the pond. Globally, BAwRISP is a little higher than overall average, likely due to pitchers working from the stretch with RISP, hence their skills drop a bit. I run a scan of players’ BAwRISP, and if there’s an outlier which isn’t softened by the three-year average, I’ll adjust the RBI index.

We now have all we need for a hitter’s projection. All that’s left is detailing the process and philosophy for generating plate appearances.

Plate Appearances

There was a time I was a stickler for projecting exactly as many plate appearances per team as are likely available. I have since come to realize this is impractical. The primary reason is that injuries are predictable to a degree, but there’s still so much unpredictable lost time that has to go somewhere. I call this the Ty Wigginton theory. For years, we didn’t know where Wigginton would pick up his 400 PA, we just knew he would. I could either project Wigginton for the 200 PA I could account for, or project 400 and take 200 from elsewhere, even though I had no clue where it would eventually come from.

The key is this is fantasy baseball and my job is to best prepare you for your drafts and auctions. Projecting Wigginton for 200 PA would eliminate you from drafting him. In deeper formats, this is a useful player. As such, I wasn’t doing my job by taking you out of the running for Wigginton and other like players. Now, my approach is simply being honest with each playing time appraisal, even if it means projecting extra plate appearances for a team’s outfield or any other position.

Additionally, with the advent of the deep draft and hold format such as the National Fantasy Baseball Championship Draft Champions competition, I need to have a lot more names out there for consideration. This format consists of 15 teams with 50 roster spots, or 750 roster spots. Coincidentally, this is exactly how many players break camp and are on opening day MLB rosters. The thing is, the NFBC DC drafts from a pool that extends beyond this 750. As such, I need to project everyone with a plausible pathway to 2018 MLB playing time. Obviously, this entails over-projecting the expected PA for each MLB team.

However, through it all, I make sure the playing time for the draft-worthy hitters is as practical as possible. To facilitate this, I employ a nifty double-grid method.

The first grid assigns the percentage of playing time each hitter should get per position, including allowances for pinch hitting. The second allots the percentage of time expected in each spot in the batting order. The two are crosschecked to make sure everything is accounted for.

Since teams with more prolific offenses turn the order over more, the hitter projected for 90 percent of the leadoff PA on the highest scoring team should get more PA than the like batter on the lowest scoring club. As such, I use a three-year weighted average of each squad’s PA per spot in the batting order as the target. When necessary, I’ll override if the upcoming season’s outlook is significantly better or worse than recent seasons. Two examples for 2018 are raising the targets for the Angels while lowering them for the Marlins.
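A minimal sketch of how the batting order grid might translate into plate appearances, with a simple crosscheck against the position grid. Everything here is an assumption for illustration: the function names, the team PA targets, and the reading of the crosscheck as "a hitter’s total share should match across the two grids."

```python
def projected_pa(order_share, team_pa_by_spot):
    """Sum of (share of a lineup spot) x (team PA target for that spot)."""
    return sum(share * team_pa_by_spot[spot] for spot, share in order_share.items())


def grids_agree(position_share, order_share, tolerance=0.01):
    """Crosscheck: total playing-time share should match between the two grids."""
    return abs(sum(position_share.values()) - sum(order_share.values())) <= tolerance


# Hypothetical leadoff PA targets for a high-scoring and a low-scoring club.
high_offense = {1: 760}
low_offense = {1: 720}
leadoff_share = {1: 0.90}

print(round(projected_pa(leadoff_share, high_offense)))  # 684
print(round(projected_pa(leadoff_share, low_offense)))   # 648

# Hypothetical hitter: 70% of CF and 20% of LF, batting first 60% and second 30%.
print(grids_agree({"CF": 0.70, "LF": 0.20}, {1: 0.60, 2: 0.30}))  # True
```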

Well friends, we’ve finally reached the end. I’ve been as transparent as possible. Every season I look at the process and tweak where necessary. As discussed, I envision home run projections being refined over the next several seasons, but am confident there are ample checks and balances currently in place to capture power-hitting outliers and adjust accordingly. All that’s left is to remind you the pitching process will be next, followed by valuation theory. Well, that and reiterating I’m happy to address questions in the comments, or better yet, the newly revamped forums.
