Organized Chaos

Thursday, May 09, 2024

Details: Written by: Todd Zola; Category: Organized Chaos; Published: 14 December 2019

This is part of the Z-Book, an ongoing compilation of new and refreshed pieces. It's included in the 2020 Mastersball Platinum subscription, available for just $39.95, featuring the industry's earliest and most comprehensive set of player projections.

Statcast data and some very clever colleagues have fueled several improvements in the way Mastersball projects batters and pitchers. Here is a detailed description of the hitting process, dubbed The Hitting Zystem.

1. Convert major league performance to neutral

Aging curves and park factors are applied. Aging is applied to all skills while the following are subject to park factors:

• Hits*
• Doubles*
• Triples*
• Homers*
• Strikeouts
• Walks
• RBI
• Runs

*Distinguished for left-handed, right-handed and switch hitters

Neutralized singles are neutralized hits minus neutralized doubles, triples and homers.

The stats for a specific season are neutralized via the indices for that season, as opposed to using a three-year average (which will be incorporated later). I’ve run the numbers both ways, using the associated season and the current three-year average, and prefer the former. However, there are occasional instances where a player changes teams and one of the two clubs has an extreme park factor that skews the translation, so some personal massaging is necessary. When the player is on the same team all three years, the extreme factor is baked into the three-year average and everything comes out in the wash.

The conventional method for park factors is taking the specific factor and averaging it with 100, then adjusting. The reason is it’s assumed a batter splits his playing time evenly between home and road, with the further presumption that the aggregate road factor is 100. While the former isn’t always the case, there’s no way to predict the split of home and road plate appearances. However, something can be done about the aggregate road parks not always averaging 100. I employ composite park factors, a weighted average of every park on that team’s schedule. The composite factor doesn’t need to be averaged with 100 since home and road are already baked in. Look for a separate essay reviewing composite park factors.

All other stats (hit by pitch, sacrifice, sacrifice fly, grounded into double play) are carried through without adjustment.
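Here’s a minimal Python sketch of the two park-factor approaches described above. It assumes factors on a 100 scale and that neutralizing a stat means dividing it by its factor over 100. The team labels, games counts and factor values are hypothetical, and the actual composite calculation is left to the separate essay.

```python
def conventional_factor(home_pf: float) -> float:
    """Classic half-home, half-road factor, assuming the road parks average 100."""
    return (home_pf + 100.0) / 2.0


def composite_factor(games_by_park: dict, pf_by_park: dict) -> float:
    """Games-weighted average of every park on the team's schedule (home included)."""
    total_games = sum(games_by_park.values())
    weighted = sum(pf_by_park[park] * games for park, games in games_by_park.items())
    return weighted / total_games


def neutralize(stat: float, factor: float) -> float:
    """Remove the park effect, e.g. a factor of 110 divides the stat by 1.10."""
    return stat / (factor / 100.0)


def neutral_singles(n_hits: float, n_doubles: float, n_triples: float, n_hr: float) -> float:
    """Singles are backed out of the neutralized hit components."""
    return n_hits - n_doubles - n_triples - n_hr


# Hypothetical schedule: 81 home games plus two road parks.
games = {"HOME": 81, "ROAD_A": 40, "ROAD_B": 41}
factors = {"HOME": 110.0, "ROAD_A": 95.0, "ROAD_B": 100.0}
print(conventional_factor(110.0))                  # 105.0
print(round(composite_factor(games, factors), 1))  # 103.8
```

Because the composite already blends home and road games, it isn’t averaged with 100 a second time.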

2. Convert minor league performance to neutral

Way back in the 1985 Baseball Abstract, Bill James introduced the concept of major league equivalencies (MLE). In short, an MLE translates a player’s minor league numbers into how he would have performed in the major leagues, exhibiting the same skill level. They’re adjusted for park, overall hitting environment and quality of competition. The skeletal process has remained the same for almost 40 years, but the conversions have been refined as the available data has improved. MLEs have good predictive value and hence serve as a surrogate for MLB performance. They’re not perfect, as there’s selection bias with respect to the players being good enough to make the Show, but they’re sufficiently accurate relative to the general reliability of projections.

MLEs are only trustworthy for Double-A and Triple-A, so every batter playing at those levels has his numbers translated. If he appears at both levels or is traded, the MLE for each individual team is determined.

A recent trend is for some top prospects to be promoted to the major leagues after just a little action in Double-A. That action is usually outstanding; otherwise they wouldn’t have been advanced. Even with the MLE tempering the translation, these cases result in an impractical MLB projection, so I’ll sum up the numbers at the lower levels and apply the average MLEs from all the Double-A teams. It’s not perfect, but it’s better than using a hyperbolic translation.

Foreign players are also subject to MLEs. It’s not exact, but I’ll apply the average Triple-A MLEs to those coming from Japan (Nippon Professional Baseball) while those coming from Cuba and South Korea (Korea Baseball Organization) get adjusted using Double-A MLEs.
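A rough sketch of the MLE bookkeeping above, written under a big simplifying assumption: each MLE is treated as a set of per-stat multipliers. Real MLEs also fold in park, league environment and quality of competition, and the multipliers, stat keys and league codes below are hypothetical placeholders, not Mastersball’s actual factors.

```python
# Hypothetical mapping of foreign leagues to the minor league level whose MLEs they borrow.
FOREIGN_LEVEL = {"NPB": "AAA", "KBO": "AA", "CUBA": "AA"}


def apply_mle(line: dict, mle: dict) -> dict:
    """Translate one team's stat line; stats without a multiplier pass through unchanged."""
    return {stat: value * mle.get(stat, 1.0) for stat, value in line.items()}


def sum_lines(lines: list) -> dict:
    """Add stat lines together key by key."""
    total: dict = {}
    for line in lines:
        for stat, value in line.items():
            total[stat] = total.get(stat, 0.0) + value
    return total


def translate_by_team(stints: list) -> dict:
    """Normal case: each (line, team_mle) stint is translated separately, then summed."""
    return sum_lines([apply_mle(line, mle) for line, mle in stints])


def average_mle(team_mles: list) -> dict:
    """Level-average multipliers, e.g. across all Double-A teams."""
    stats = team_mles[0].keys()
    return {s: sum(m[s] for m in team_mles) / len(team_mles) for s in stats}


def translate_small_sample(lines: list, level_mles: list) -> dict:
    """Thin-sample case: pool the raw lines first, then apply the level-average MLE."""
    return apply_mle(sum_lines(lines), average_mle(level_mles))
```

Pooling before translating keeps a tiny, outstanding Double-A sample from being blown up by one team’s factors.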

3. Regress homers

Major league stats and MLEs are treated differently since the necessary data isn’t as accessible for the minor leagues. The described treatment is for MLB numbers.

Hat tip to FanGraphs’ Mike Podhorzer for this research. Home runs correlate quite well with average fly ball distance. This will be described in detail in a future site piece, but the correlation is stronger than the one with average home run distance. It’s also stronger than the correlation when both average fly ball distance and average home run distance are adjusted by park factors.

The process involves using the results from qualified players to determine an expected home run level based on average fly ball distance. The player’s actual homers are regressed to the expected amount using the regression levers discussed previously. The starting point is 50%, which is just an average of expected and actual. This is the number carried through the rest of the process.
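Below is a hedged sketch of the homer regression. The model itself isn’t published, so two assumptions are labeled in the code: the expected rate is a simple least-squares line of homers per fly ball against average fly ball distance across qualified players, and the regression lever moves the actual total toward the expected one in proportion to its setting. The sample data is made up.

```python
def fit_line(xs: list, ys: list) -> tuple:
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def expected_homers(avg_fb_distance: float, fly_balls: int, slope: float, intercept: float) -> float:
    """ASSUMPTION: expected HR = modeled HR-per-fly-ball rate times fly balls hit."""
    return (slope * avg_fb_distance + intercept) * fly_balls


def regress(actual: float, expected: float, lever: float = 0.5) -> float:
    """lever = 0 keeps the actual total, 1 moves all the way to expected, 0.5 splits it."""
    return actual + lever * (expected - actual)


# Hypothetical qualified-player data: average FB distance (feet) and HR per fly ball.
distances = [300.0, 320.0, 340.0]
hr_per_fb = [0.10, 0.14, 0.18]
slope, intercept = fit_line(distances, hr_per_fb)
x_hr = expected_homers(330.0, 100, slope, intercept)  # about 16
print(round(regress(20, x_hr), 1))                    # 18.0
```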

4. Neutralized stats are summed using a weighted average

Different sources use varying numbers of seasons and assorted weighting. After some back-testing, it’s been decided to use three years’ worth of data, weighted 11:7:4 with 11 for the most recent. Each neutralized stat is multiplied by the associated coefficient, then the yearly results are summed and divided by the total of the applicable coefficients. For example, if the player was active all three seasons, the denominator is 22 (11 + 7 + 4).

The weighted average is also carried out on plate appearances and catcher interference. The reason for the latter is the number of projected at bats is plate appearances less the sum of walks, hit by pitch, sacrifice flies, sacrifice bunts and catcher interference. It doesn’t affect many hitters, and even for those it’s usually just a couple of at bats, but it helps with bookkeeping and logical checks to make sure everything is coded correctly.
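A minimal Python sketch of the 11:7:4 weighting, assuming each season arrives as a dictionary of already-neutralized counting stats (plate appearances and catcher interference included), most recent season first. The function name and data layout are illustrative only.

```python
WEIGHTS = (11, 7, 4)  # most recent season first


def weighted_line(seasons: list) -> dict:
    """Weighted average of neutralized season lines (most recent first).

    A season the player missed is passed as None, so its coefficient drops
    out of the denominator: 22 for all three seasons, 18 for the last two, etc.
    Every supplied line is assumed to carry the same stat keys.
    """
    totals: dict = {}
    denominator = 0
    for weight, line in zip(WEIGHTS, seasons):
        if line is None:
            continue
        denominator += weight
        for stat, value in line.items():
            totals[stat] = totals.get(stat, 0.0) + weight * value
    if denominator == 0:
        raise ValueError("no seasons to average")
    return {stat: total / denominator for stat, total in totals.items()}


three = weighted_line([{"PA": 600, "HR": 30}, {"PA": 550, "HR": 20}, {"PA": 500, "HR": 10}])
# HR: (11*30 + 7*20 + 4*10) / 22 = 510 / 22
```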

5. Determine adjusted hits

The nHits determined from Step 4 need to be regressed using xBABIP (expected batting average on balls in play). The result is adjusted hits.

I deployed a new procedure for xBABIP this season. Previously, I determined it by breaking batted balls into grounders, infield line drives, outfield line drives, pop-ups, bunts and fly balls. Further, each was classified as hard, medium or soft hit. The expected hits for each were calculated using the league BABIP for the respective components. The primary shortcoming is this method didn’t account for the player’s speed, so it missed that some batters post a better or worse BABIP on grounders than league average.

The new method uses Statcast’s xBA (expected batting average). This is determined by comparing the exit velocity, launch angle and runner’s sprint speed of a specific batted ball to the outcome of all similar batted balls. The result is a probability of being a hit. For example, if a specific batted ball was deemed 79% likely to be a hit, the batter is credited with .79 hits. Summing these yields expected hits.

The caveat here is xBA is not park corrected. This is fine for the ensuing analysis since we’re working with neutralized numbers, but a lot of folks incorrectly use xBA (and xwOBA) as a means to identify lucky or unlucky players simply by looking at the difference between xBA (or xwOBA) and the actual number. The expected number needs to be park-corrected before the comparison is made. The reason is the Statcast data lumps all parks together in the comparison. A simplified example could be a certain batted ball that would be a homer in Yankee Stadium 100% of the time and an out in Oracle Park 100% of the time. If these were the only two venues, and an equal number of this type of batted ball occurred in each park, the hit probability is 50%. Obviously, this is an oversimplification of what occurs within all 30 ballparks. I’m getting off on a tangent here, but a Colorado hitter’s wOBA should be higher than his xwOBA, while a San Francisco batter should sport a wOBA lower than expected. The difference does not render the Rockies guy lucky and the Giants dude unlucky.
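Two small illustrations of the points above, with made-up probabilities. The first credits fractional hits per batted ball to get expected hits; the second reproduces the simplified Yankee Stadium/Oracle Park case, showing how pooling every park yields a 50% hit probability even though the outcome is certain in each individual park. The function names are hypothetical, and Statcast’s actual xBA model is far richer than this.

```python
def expected_hits(hit_probabilities: list) -> float:
    """Each batted ball contributes its hit probability, e.g. a 79% ball adds .79 hits."""
    return sum(hit_probabilities)


def pooled_hit_probability(outcomes_by_park: dict) -> float:
    """Hit rate for one batted-ball profile with every park lumped together.

    outcomes_by_park maps a park to a list of 1 (hit) / 0 (out) results.
    """
    outcomes = [o for results in outcomes_by_park.values() for o in results]
    return sum(outcomes) / len(outcomes)


def park_hit_probability(outcomes_by_park: dict, park: str) -> float:
    """The same profile measured in a single park."""
    results = outcomes_by_park[park]
    return sum(results) / len(results)


# The simplified example: always a homer in one park, always an out in the other.
same_ball = {"Yankee Stadium": [1, 1, 1, 1], "Oracle Park": [0, 0, 0, 0]}
print(pooled_hit_probability(same_ball))                  # 0.5
print(park_hit_probability(same_ball, "Yankee Stadium"))  # 1.0
```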

Getting back on point, with xBA in hand, xBABIP can be derived by determining the number of xHits and plugging it into the standard BABIP formula. The xBABIP for each season is determined and carried through Step 4 as its own entity.

Neutralized BABIP (nBABIP) is determined from the standard BABIP formula, using the neutralized stats from Step 4.

To get the target BABIP (tBABIP), nBABIP is regressed to xBABIP using a regression lever with the default set to 50%.
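Here’s a hedged Python sketch of this step. The BABIP formula is the standard one, (H - HR)/(AB - K - HR + SF), matching the denominator in the adjusted-hits formula that follows. Two assumptions are mine: xHits are taken as xBA times at bats, and the player’s actual homers are used when converting xHits to xBABIP. The sample line is hypothetical.

```python
def babip(hits: float, hr: float, ab: float, k: float, sf: float) -> float:
    """Standard batting average on balls in play."""
    return (hits - hr) / (ab - k - hr + sf)


def x_babip(xba: float, hr: float, ab: float, k: float, sf: float) -> float:
    """ASSUMPTION: expected hits = xBA * AB, then run through the standard formula."""
    return babip(xba * ab, hr, ab, k, sf)


def target_babip(n_babip: float, xbabip: float, lever: float = 0.5) -> float:
    """Regress neutralized BABIP toward xBABIP; the default lever splits the difference."""
    return n_babip + lever * (xbabip - n_babip)


# Hypothetical neutralized line: 150 H, 20 HR, 550 AB, 120 K, 5 SF, .280 xBA.
n = babip(150, 20, 550, 120, 5)      # 130 / 415
x = x_babip(0.280, 20, 550, 120, 5)  # 134 / 415
t = target_babip(n, x)               # 132 / 415, about .318
```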

The adjusted hits are determined using the tBABIP as follows:

Adjusted hits = tBABIP x (nAB – nHR – nK + nSF) + nHR

Singles can be derived from adjusted hits by subtracting extra-base hits. This isn’t perfect since it assumes all lucky or unlucky hits are singles, which obviously isn’t the case, but the majority are, so it serves as a viable proxy.
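The adjusted-hits formula and the singles shortcut translate directly into code. A short sketch with hypothetical function names and inputs; as noted, the entire BABIP correction lands on singles.

```python
def adjusted_hits(t_babip: float, n_ab: float, n_hr: float, n_k: float, n_sf: float) -> float:
    """Adjusted hits = tBABIP x (nAB - nHR - nK + nSF) + nHR."""
    return t_babip * (n_ab - n_hr - n_k + n_sf) + n_hr


def adjusted_singles(adj_hits: float, n_2b: float, n_3b: float, n_hr: float) -> float:
    """Extra-base hits stay as neutralized, so luck adjustments fall entirely on singles."""
    return adj_hits - n_2b - n_3b - n_hr


hits = adjusted_hits(0.300, 550, 20, 120, 5)  # 0.300 x 415 + 20 = 144.5
singles = adjusted_singles(hits, 30, 3, 20)   # 91.5
```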

6. Determine RBI and Runs Index

I’ve developed xRBI and xRuns formulas using the same principle as wOBA. All the factors contributing to an RBI (single, double, triple, homer, sacrifice fly) and a run (single, double, triple, homer, hit by pitch, stolen base, caught stealing) are assigned a coefficient in accordance with the run-scoring matrix. These coefficients use aggregate stats.

Each player has a park-adjusted number of RBI and runs calculated from Step 4, nRBI and nRuns. The respective indices are nRBI/xRBI and nRuns/xRuns.

Players on better teams often possess indices greater than one. Place in the batting order also influences the indices. Leadoff and two-hole hitters often have a run index greater than one and an RBI index less than one. Batters hitting in the meat of the order probably sport an RBI index above one.
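A sketch of the index mechanics. The coefficients are deliberately left as an input since they aren’t published here, and the stat keys are hypothetical. The index is written as actual over expected, so a favorable context (better team, better lineup slot) yields a value above one, and the context-free projection can simply be multiplied by it later.

```python
RBI_EVENTS = ("1B", "2B", "3B", "HR", "SF")
RUN_EVENTS = ("1B", "2B", "3B", "HR", "HBP", "SB", "CS")


def expected_total(counts: dict, coefficients: dict, events: tuple) -> float:
    """wOBA-style linear weights: sum of coefficient x count over the listed events."""
    return sum(coefficients[e] * counts.get(e, 0) for e in events)


def context_index(actual: float, expected: float) -> float:
    """Above 1.0 means the player produced more than his components alone imply."""
    return actual / expected


def project_with_index(projected_expected: float, index: float) -> float:
    """Re-apply the player's context to a context-free projection."""
    return projected_expected * index
```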

7. Determine Stolen Base Opportunity and Success Rate

Unadjusted stats at the MLB level are used to compute the SBO (stolen base opportunity) and success rate (SB%). An MLE is applied to SB% for minor leaguers. The formula for SBO is:

(stolen bases + caught stealing)/(singles + walks + hit by pitch)
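A small sketch of the two rates. SBO is computed as attempts per opportunity (times reaching first via a single, walk or HBP), the orientation that lets projected opportunities be multiplied by SBO to get attempts. The function names are hypothetical.

```python
def sb_opportunity_rate(sb: int, cs: int, singles: int, bb: int, hbp: int) -> float:
    """Stolen base attempts per opportunity."""
    opportunities = singles + bb + hbp
    return (sb + cs) / opportunities if opportunities else 0.0


def sb_success_rate(sb: int, cs: int) -> float:
    """SB% = SB / (SB + CS); zero attempts returns 0.0 rather than dividing by zero."""
    attempts = sb + cs
    return sb / attempts if attempts else 0.0


sbo = sb_opportunity_rate(20, 5, 100, 40, 10)  # 25 / 150
print(sb_success_rate(20, 5))                  # 0.8
```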

8. Convert Neutral stats to Projected Stats

Some stats will be grouped together since the same operation is conducted on each. The action applied to every individual stat is multiplying the neutralized stat by the projected plate appearances then dividing by the neutralized plate appearances.

Playing time in general is worthy of its own discussion and will be reviewed in upcoming essays.

Aging is also universally applied to everything.

All results are rounded off to an integer.
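A minimal sketch of the generic Step 8 operation, assuming aging arrives as a single multiplier (the aging curves themselves aren’t spelled out here, so `aging` is a hypothetical parameter). Rounding happens once at the end and goes half up; Python’s built-in round() would instead send exact halves to the nearest even integer.

```python
import math


def round_half_up(x: float) -> int:
    """Round to the nearest integer, with .5 going up (valid for non-negative stats)."""
    return math.floor(x + 0.5)


def project_stat(neutral_stat: float, neutral_pa: float, projected_pa: float,
                 aging: float = 1.0) -> float:
    """Neutralized stat x projected PA / neutralized PA, times an aging multiplier."""
    return neutral_stat * projected_pa / neutral_pa * aging


raw = project_stat(20, 500, 600, aging=0.95)  # about 22.8 before rounding
print(round_half_up(raw))                     # 23
```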

A. Park-corrected stats

Hits, doubles, triples, homers, walks and strikeouts are park-adjusted using the three-year composite average.

B. Non-park corrected stats

Hit by pitch, sacrifice, sacrifice fly, catcher interference and grounded into double play are projected without park adjustment.
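A sketch of how the A/B split might look in code, assuming the projection is a dictionary of neutral values and the composite factors (100 scale) are supplied per stat; the stat keys are hypothetical. Only the park-corrected group is multiplied by its factor, and everything else passes through untouched.

```python
PARK_ADJUSTED = {"H", "2B", "3B", "HR", "BB", "SO"}


def apply_park(neutral_projection: dict, composite_pf: dict) -> dict:
    """Re-apply three-year composite park factors to the park-corrected stats only."""
    adjusted = {}
    for stat, value in neutral_projection.items():
        if stat in PARK_ADJUSTED:
            adjusted[stat] = value * composite_pf.get(stat, 100.0) / 100.0
        else:
            adjusted[stat] = value  # HBP, SH, SF, CI, GIDP: no park adjustment
    return adjusted


print(apply_park({"HR": 20.0, "HBP": 5.0}, {"HR": 110.0}))  # {'HR': 22.0, 'HBP': 5.0}
```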

C. RBI and Runs

The number of RBI and runs are projected using the aforementioned formulas then multiplied by their respective indices from Step 6.

D. Stolen bases

The number of attempts is calculated by summing projected singles, walks and HBP and multiplying by SBO from Step 7. The successful steals take that number and multiply by SB%. Caught stealing are attempts less successful tries, rounded to an integer.
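A short sketch of the stolen base arithmetic, assuming SBO is expressed as attempts per opportunity. The inputs are hypothetical; note that rounding SB and CS separately can leave their sum one off the rounded attempt total.

```python
import math


def round_half_up(x: float) -> int:
    """Round to the nearest integer, with .5 going up (valid for non-negative stats)."""
    return math.floor(x + 0.5)


def project_steals(singles: float, bb: float, hbp: float, sbo: float, sb_pct: float) -> tuple:
    """Return (SB, CS): attempts = opportunities x SBO, SB = attempts x SB%, CS = the rest."""
    attempts = (singles + bb + hbp) * sbo
    successful = attempts * sb_pct
    return round_half_up(successful), round_half_up(attempts - successful)


print(project_steals(100, 40, 10, 0.2, 0.8))  # (24, 6)
```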

E. Batting Average and Slugging Percentage

At bats are plate appearances minus the sum of walks, HBP, sacrifice flies, sacrifice bunts and catcher interference.
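For completeness, a sketch of the at-bat bookkeeping and the two rate stats it feeds, using the standard definitions. The numbers are a hypothetical projected line.

```python
def at_bats(pa: int, bb: int, hbp: int, sf: int, sh: int, ci: int) -> int:
    """PA minus walks, HBP, sacrifice flies, sacrifice bunts and catcher interference."""
    return pa - (bb + hbp + sf + sh + ci)


def batting_average(hits: int, ab: int) -> float:
    return hits / ab


def slugging(singles: int, doubles: int, triples: int, hr: int, ab: int) -> float:
    """Total bases per at bat."""
    return (singles + 2 * doubles + 3 * triples + 4 * hr) / ab


ab = at_bats(650, 60, 6, 5, 1, 0)   # 578
avg = batting_average(160, ab)      # 160 hits = 100 1B + 30 2B + 3 3B + 27 HR
slg = slugging(100, 30, 3, 27, ab)  # 277 total bases
```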

F. On base percentage

The denominator in OBP isn’t projected PA, since PA include catcher interference and sacrifice bunts, neither of which counts toward OBP (sacrifice flies do count). To get the OBP denominator, those need to be subtracted from projected plate appearances.
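A quick sketch confirming the denominator bookkeeping with the standard OBP definition, (H + BB + HBP)/(AB + BB + HBP + SF). The projected line is hypothetical.

```python
def on_base_percentage(hits: int, bb: int, hbp: int, ab: int, sf: int) -> float:
    """Standard OBP: times on base over AB + BB + HBP + SF."""
    return (hits + bb + hbp) / (ab + bb + hbp + sf)


def obp_denominator_from_pa(pa: int, sh: int, ci: int) -> int:
    """Same denominator, reached by removing sacrifice bunts and catcher interference from PA."""
    return pa - sh - ci


pa, bb, hbp, sf, sh, ci = 650, 60, 6, 5, 1, 0
ab = pa - (bb + hbp + sf + sh + ci)                               # 578
assert ab + bb + hbp + sf == obp_denominator_from_pa(pa, sh, ci)  # both 649
obp = on_base_percentage(160, bb, hbp, ab, sf)                    # 226 / 649
```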

There you have it, friends: the Mastersball Hitting Zystem. It's a lot to digest, so please feel free to pose questions, comments and criticisms on the message forum.

Todd Zola is the Primary Owner and Lead Content Provider for Mastersball. He’s the defending Great Fantasy Baseball Invitational champion, besting 314 of the industry’s finest. Todd is a former Tout Wars and LABR champion as well as a multi-time NFBC league winner.

Details: Written by: Todd Zola; Category: Organized Chaos; Published: 11 December 2019

One of the things I take pride in is the transparency of my methods and thought processes. Be it projection, valuation, game theory, etc., I’m an open book. Some feel it hurts my game play; maybe they’re right. It doesn’t matter. Once this became my job, be it through one of the companies I work for on a freelance basis or Mastersball subscribers, you get an honest evaluation, unbiased analysis and sincere advice, regardless of whether it can be used against me competitively.

Since I began doing this, I’ve publicly detailed how I generate projections and rank players. With the proliferation of new metrics and analysis, mostly Statcast but also elegant research by colleagues, I’ve improved my projection methodology. As such, it’s time to refresh the public-facing description of the Mastersball projection process, henceforth known as The Zystem.

Truth be told, there will be multiple Zystems: projection, valuation and DFS to name a few. Call it my lame attempt at marketing.

The initial focus will be on The Projection Zystem. The rest of this essay will set the stage for the nitty gritty. Over the next week or so, the nuts and bolts regarding hitting and pitching projections will be shared.

Big picture, the Projection Zystem works as follows:

1. Convert past performance to a neutral environment

MLB players have their numbers normalized using age and park factors. Minor league players and those coming from foreign leagues are converted to an MLE (major league equivalency) which is used as a surrogate for performance.

2. Distill skills to a per-plate-appearance or per-inning-pitched level

Self-explanatory, more of a bookkeeping process than anything.

3. Apply appropriate regressions

This will be one of the chief cruxes of the ensuing hitting and pitching projection methods. For the purpose of The Zystem, regression will specifically refer to elements of performance out of the player’s control as opposed to a change of skills. For example, a batter can hit a ball at a certain exit velocity and launch angle with a multitude of outcomes. The result is affected by park, weather, atmospheric conditions, men on base, etc. All the hitter controls is the manner in which he struck the ball.

The adjustments are made via regression levers. All the different levers will be detailed in future pieces, but here’s an idea of how they work. In each case where regression applies, there’s an expected result. In statistical terms, it’s the mean. As mentioned, skills do not always manifest in the expected outcome. Regression is adjusting the actual outcome toward the expected one. The regression levers set the extent of regression.

It’s a bit presumptuous to consider the methodology behind determining expected results completely scientifically accurate. Think about Voros McCracken’s DIPS theory. Initially, all pitchers were regressed to the same BABIP (batting average on balls in play). Then, this was refined to differentiate ground ball from fly ball pitchers. We’re still trying to discern the extent each pitcher controls different batted ball types and the quality of contact.

The point is the original concept of lucky/unlucky centered around how far a pitcher was from the league average BABIP. There wasn’t any accounting for a pitcher’s skill, because it couldn’t be identified. We’re still at the point where the player is driving some of the difference between actual and expected outcomes, but we’re not sure to what extent.

Setting the regression lever at 100% brings the projected skill to the expected outcome. Leaving it at zero pegs the skill to the actual level. By default, the regression levers are all set at 50%, which can then be overridden based on each individual scenario. Admittedly, this removes some objectivity, which should be a tenet of projection methodology, but it allows for personal seasoning.

Another way to look at it is expected and actual skills define a range, and it’s plausible for a player to land anywhere in between. Using 50% regression splits the difference. Overriding the default is a subjective determination of where the player will land.
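The lever boils down to a linear interpolation between actual and expected. A minimal sketch, with the lever expressed as a fraction (0.5 for the 50% default); the function name and the BABIP values in the example are hypothetical.

```python
def apply_lever(actual: float, expected: float, lever: float = 0.5) -> float:
    """0.0 pegs the actual level, 1.0 lands on the expected outcome, 0.5 splits the difference."""
    if not 0.0 <= lever <= 1.0:
        raise ValueError("lever must be between 0 and 1")
    return actual + lever * (expected - actual)


# A hitter with a .330 BABIP whose expected BABIP is .300:
for lever in (0.0, 0.5, 1.0):
    print(lever, round(apply_lever(0.330, 0.300, lever), 3))
```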

Regression levers are utilized for the following:

Hitters

Hits
Homers
RBI
Runs
Stolen base opportunities

Pitchers

Hits
Homers
Strikeouts
Walks
ERA

4. Determine projected skill levels using weighted average of three years’ worth of neutralized and regressed past performances

Marcels uses a three-year weighted average of 5:4:3 with 5 for the most recent. Based on some back-testing, The Zystem uses 11:7:4. With the tests showing the 2019 baseball was subject to reduced drag, serious consideration was given to changing the weights so the 2019 data counted for more than 50%. However, it was decided the 2017 ball was close enough to last year’s that it served as the adjustor, with 2018 being the hedge in case the ball is changed.
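The arithmetic behind the 50% remark: 11:7:4 already puts exactly half the weight on the most recent season, versus about 42% under Marcel’s 5:4:3. A quick check in Python using exact fractions (the helper name is hypothetical):

```python
from fractions import Fraction


def weight_shares(weights: tuple) -> list:
    """Share of the total weight carried by each season, most recent first."""
    total = sum(weights)
    return [Fraction(w, total) for w in weights]


print(weight_shares((5, 4, 3)))   # Marcel: most recent season gets 5/12
print(weight_shares((11, 7, 4)))  # Zystem: most recent season gets 11/22 = 1/2
```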

5. Apply appropriate park indices

Self-explanatory, noting the following stats are park-adjusted:

Hits
Doubles
Triples
Homers
Strikeouts
Walks

6. Multiply by projected playing time

Self-explanatory, with the caveat that playing time projections are even more important than skills. This is often overlooked and is usually the difference between projection sources. As such, projecting playing time deserves its own treatment and will be included in the forthcoming essays.

There you have it, an overview of the Projection Zystem. Please follow social media (@Mastersball, @toddzola on Twitter, Mastersball on Facebook) for notification when follow-up pieces are posted.

Details: Written by: Todd Zola; Category: Organized Chaos; Published: 18 November 2019

This is a simple game. You project the player, you rank the player, you draft the player.

With a hat tip to Durham Bulls Manager Skip Riggins, that’s fantasy baseball in a nutshell. Sure, there is a plethora of formats with all sorts of rules. Regardless, it all begins with how one feels a player will perform.

Player expectations aren’t all formulaic. To be honest, most aren’t. They aren’t all a specific stat line either. Some simply frame a guy as about .280 with 20-something homers and 80-90 runs with similar RBI plus a handful of steals in around 500 at bats.

Call it a projection, an expectation, or whatever you want. If you play fantasy baseball, everything is based on how you feel everyone will perform.

Personally, I prefer using an algorithm-driven method. It’s proven effective and it’s my nature. The key is understanding the limitations and not being married to the result. It’s not the projection itself, it’s what you do with it. To be honest, my projection of .282 with 24 HR, 83 RBI, 86 runs and 4 SB in 517 at bats isn’t any better than that referenced earlier. I know that and believe it. Yet, I strive to produce the best foundation for drafting in the industry.

I see a projection as an average of all plausible outcomes. One way to look at it is the average of all outcomes if a season were played a gazillion times. These would include some instances of the player getting hurt on Opening Day and others where he plays nearly every game. Over a gazillion years, a lot can happen.

Here’s an example, albeit oversimplified.

A. 5% chance the player gets hurt early
.284 with 5 HR, 16 RBI, 17 runs and a steal in 100 AB

B. 15% chance the player has a career year and gets more playing time than ever
.298 with 35 HR, 108 RBI, 110 runs and 7 SB in 625 AB

C. 30% chance the player has an IL stint or two, with lower than normal numbers
.269 with 19 HR, 69 RBI, 72 runs and 3 SB in 478 AB

D. 50% chance the player comes close to the last couple of season’s numbers
.285 with 26 HR, 91 RBI, 94 runs and 4 SB in 550 AB

Determining the weighted average of the above yields the exact line mentioned above: .282 with 24 HR, 83 RBI, 86 runs, 4 SB in 517 AB.
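For anyone who wants to check the blend, here’s a short Python sketch using the four scenarios exactly as listed. Run as written, it reproduces the 24 HR, 83 RBI, 86 runs, 4 SB and 517 AB. Weighting the four batting averages by probability alone gives the .282; weighting hits and at bats separately nudges it to about .283, since the higher-AB scenarios then carry more weight.

```python
# (probability, AVG, HR, RBI, R, SB, AB) for scenarios A through D above
SCENARIOS = [
    (0.05, 0.284, 5, 16, 17, 1, 100),
    (0.15, 0.298, 35, 108, 110, 7, 625),
    (0.30, 0.269, 19, 69, 72, 3, 478),
    (0.50, 0.285, 26, 91, 94, 4, 550),
]


def blended_line(scenarios: list) -> dict:
    """Probability-weighted (expected) stat line across all scenarios."""
    assert abs(sum(s[0] for s in scenarios) - 1.0) < 1e-9, "probabilities must sum to 1"
    line = {}
    for i, stat in enumerate(("HR", "RBI", "R", "SB", "AB"), start=2):
        line[stat] = round(sum(s[0] * s[i] for s in scenarios))
    # Expected hits over expected at bats, so high-AB scenarios count for more.
    exp_hits = sum(p * avg * ab for p, avg, *_, ab in scenarios)
    exp_ab = sum(p * ab for p, *_, ab in scenarios)
    line["AVG"] = round(exp_hits / exp_ab, 3)
    # Simple probability-weighted average of the four batting averages.
    line["AVG_simple"] = round(sum(p * avg for p, avg, *_ in scenarios), 3)
    return line


print(blended_line(SCENARIOS))
```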

Even this simple example illustrates why the projection is only the starting point. Some touch-and-feel drafters may have an inkling about the player and are willing to pay for B, the career year scenario. Someone else may not weigh the downside risk of A and C and pay for D, which is a bit better than the final projection, which factors in the risk.

You’re not drafting the projection; you’re drafting the player with a wide range of plausible outcomes. For me, it’s about knowing what the projection represents and applying that contextually to team needs.

There will be some instances where I’ll be willing to bet on the come and jump the player up my cheat sheet. The above isn’t the ideal example, but for a more injury-prone player, I may hedge more toward the lower playing time outcomes and require a decent discount at the draft table. It all depends on my team at the time, or maybe the general strategy I wish to deploy.

A winning team is all about balance. Most think of power versus speed or hitting versus pitching in this realm. The balance here is paying for the 90th percentile projection versus needing a discount to invest. Both types of players contribute to a winning roster, with the fulcrum often being league context.

Many fantasy baseball enthusiasts say the reason they don’t do their own projections is not having enough time. What they really mean is the reason they don’t do their own spreadsheet/database driven projections is time. As explained, if you draft a player, you did it based on some level of expectation.

There’s a simple way one can generate their own projections. While there will be skill differences between sources of projections, they’ll usually all be in the same ballpark. The sources may disagree, but they’re within the variance intrinsic to the process.

Playing time is the differentiator. Do you want to generate your own projections? Assemble a couple of different sets and drill each stat down to per PA and/or per IP, then season with your own estimation of PA or IP. Voila, you have your own projection!
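A minimal sketch of that do-it-yourself approach for hitters, assuming each source provides PA plus counting stats. The sources, stat keys and 550 PA are hypothetical, and a simple unweighted mean of the per-PA rates is just one reasonable way to blend sets.

```python
def per_pa_rates(sources: list, stats: tuple) -> dict:
    """Average each stat's per-PA rate across several projection sets."""
    return {s: sum(src[s] / src["PA"] for src in sources) / len(sources) for s in stats}


def season_with_my_pa(rates: dict, my_pa: float) -> dict:
    """Scale the blended rates by your own playing-time estimate."""
    return {s: rate * my_pa for s, rate in rates.items()}


# Two hypothetical projection sets for the same hitter.
sources = [
    {"PA": 600, "HR": 30, "R": 90},
    {"PA": 500, "HR": 20, "R": 80},
]
rates = per_pa_rates(sources, ("HR", "R"))
mine = season_with_my_pa(rates, 550)  # HR: 0.045 x 550 = 24.75
```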

The caveat is, everyone (including yours truly) usually over-projects playing time if the model is the average of a gazillion seasons. That said, if everyone is assumed to play more than they should, the relative ranking for fantasy purposes doesn’t change. What changes is the delta between the players, not the order. Generally, the players at the top are assigned more playing time than they should be. The effect is more downside risk, which is more relevant in an auction, where you can plan accordingly. In a draft you’re forced to take on the risk at the top. Also, a compiler (someone as reliant on volume as on skills, if not more so) can leapfrog a higher-skilled player with adequate playing time.

This example was better before this past season, but Francisco Lindor was the perfect player to cite. In his first three full years, Lindor played an average of 158 games, amassing 684 PA in 2016, followed by 723 then 745. It’s extremely rare to repeat 700-plus PA campaigns, much less three-peat. Yet, we all pegged Lindor for another 700-something in 2019. Of course, Lindor was injured early and finished with “only” 654. He was also hurt before the bulk of drafts, so the injury could be factored in. What if he got hurt the first week of the season instead? His ranking incorporated the huge level of PA. The key with Lindor is while his skills are obviously excellent, he’s still a compiler, needing volume for a top-5 or top-10 ranking. Giving Mike Trout, Ronald Acuna Jr. or Christian Yelich that level of plate appearances makes them $55-plus players.

Note: Look for a refresh of a piece I did on projecting playing time to be posted soon.

Future chapters will detail the Mastersball Projection Process. Hopefully this provides a backdrop for a projection and how to look at it when assembling a roster.

Details: Written by: Todd Zola; Category: Organized Chaos; Published: 20 December 2018

As you likely know by now, we lost Lawr Michaels on Wednesday morning. After a literal lifetime of looking death in the eyes and blowing smoke in its face, he's moved onto a better place. Those of us here in his former place are so much better off for the all too short time we were blessed by his grace, warmth, sincerity and exuberance.

I've spent most of the past day pondering how to express my feelings and love for my mate. Lawr used to tell me one of the things he liked about me was I could find a way to write 1200 words on anything. The problem is, it was anything baseball. Funny, I write words for a living and I can't come up with the right ones. Maybe because they don't exist. But, I'll try anyway.

I think the reason I'm struggling is I'm not very good at this sort of thing, whereas Lawr was a natural. I'm introverted, especially in public. Lawr was as extroverted as they come, but in a good way. He was the life of the party, but he was never looking to be the center of attention; it would just naturally happen. He could carry on conversations with both sides of the table without missing a beat. To his left, he'd quote lines from "The Simpsons," while to his right he'd talk about the time he saw the Kinks live, recounting their set song-by-song.

Lawr and I were polar opposites in other ways. He was tall, thin, had long flowing hair and did yoga. I'm short, have consumed far too much pizza, sport a thinning buzz cut and get winded jogging my memory. Yet we cared for each other like brothers.

Lawr's favorite way of referring to us was as the artist and scientist. We called ourselves, "Zen and Now," and were forever talking about launching a podcast with that name.

Mainly because we were in business together, I saw a side of Lawr not many knew. Our differing approaches often clashed. However, we kept it in the family, always making up. I'd like to think we each taught the other something, Lawr picking up some science while I grew an appreciation for art.

That said, while I owe Lawr a huge debt of gratitude for helping me get back on my fantasy feet, first housing Mastersball then graciously agreeing to give up his beloved Creative Sports brand when we merged, it's his friendship I cherish the most. I've never met a more genuine, caring, loving individual.

Family was always first. I had the pleasure of meeting his late wife Cathy and departed son Joey, both of whom also left us too soon. Later, I got to know his wife Diane, who was lovingly by his side, comforting him to the end. Lawr took pride in telling people I was one of the few who met all three. While writing that sentence, I think I realized why. It was his way of expressing his fondness for me. Like I said, I'm not very good at this stuff. I should have figured that out a long time ago and figured out a way to reciprocate.

Lawr's influence on the fantasy industry and more importantly our lives has been expressed eloquently by a bevy of friends and colleagues much better at this than I. Below are the links. Plus, if you're on Twitter, grab a box of tissues and search @LawrMichaels.

Brian Walton: The Man and His Brand

Joe Sheehan: Lawr Michaels

Jason Grey, via Jeff Erickson's Twitter

Ron Shandler: Facebook Post

Steve Gardner: Remembering Lawr Michaels

Justin Mason: Unabashedly Lawr

You're not too old to rock but you're too young to roll.

Way, way, too young.

RIP mate.

Details: Written by: Todd Zola; Category: Organized Chaos; Published: 16 November 2018

Ladies and gentlemen, please meet Ziddy. Who's Ziddy? Other than my dormant, edgy alter-ego that used to frequent the Boston bar and club scene in the '80s and '90s, Ziddy is an Excel spreadsheet designed to facilitate managing your fantasy baseball teams as well as serving as a research tool in the off-season.

Ziddy will be updated daily for 2019 Platinum subscribers. However, to wet your whistle, the 2018 version is available for free download (see below).

In short, you enter the player and the date range you want to look at and Ziddy will provide an array of stats for that period. You can enter as many players as you want. Here's an example of the whole season data for the recently named MVP awardees:

And one for the 2018 Cy Young award winners:

Now let's change the dates to an arbitrary slice of the season:

The player pool and dates are easily entered via a pull down menu.

This is a beta version. I already have a couple of tweaks planned for the upcoming season. For example, HR/FB will be added for both hitters and pitchers. The wOBA for hitters isn't accurate, as it doesn't account for intentional walks. I'll clean that up for the daily-updated 2019 version.

In addition, there will be three different formats. The first will be exactly as displayed here where you can customize the dates. The second will limit the number of players to five, but you'll be able to customize several sets of dates. The idea here is to help with evaluating free agents or making roster moves. Below is a screenshot of the hitting version. The pitching one is similar.

I don't have a sample of the third version yet (primarily because I just thought of it). The third one will be dedicated to points leagues. I'll have preset options for the house settings on the major sites along with some DFS options. You'll also be able to input the scoring system for your league, so long as I'm tracking the scoring category.

Being a beta-version, I'm happy to take suggestions to incorporate into the 2019 version. Please post these on the site forum. In addition, if you need help manipulating the data in Excel, or any other questions about Ziddy, I'll offer assistance when able, also on the forum. Platinum subscribers will get priority.

For those considering a Platinum subscription, Ziddy is only a small part of the content. You get:

Projections and rankings updated weekly
Over 400 player profiles
START (STandings And Roster Tracker)
Rankings for points leagues including NFBC's Cutline and Fantrax Best Ball
Strategy articles including my contributions to Rotowire as well as all archived content
Minor league rankings

Six of the last seven NFBC Main Event Champions are Platinum subscribers. Three of 2018's NFBC Overall champions subscribed (Main Event, Rotowire Online Championship, Draft Championship).

Click HERE for a free download of the final version of Ziddy for the 2018 season.

The Hitting Zystem

Ladies and Gentlemen: Meet The Zystem

What is a Projection?

Not Too Old to Rock but Too Young to Roll

Introducing Ziddy
