Friday, April 6, 2007

The Win/Loss Fallacy

I briefly mentioned the error of using Win/Loss (W-L) records to judge a pitcher in an earlier post by referencing Horacio Ramirez’s 2003 campaign (to recap, he went 12-4 with over 9 runs of support per win). I thought it might be useful to look at the significance of these records over a larger sample and not just one player in one season, since we are constantly bombarded (by announcers, writers, co-workers, roommates, etc.) with how many games a pitcher won as a measure of his worth.

Before looking at it mathematically, the reasons a W-L record is irrelevant should be fairly obvious. No matter how well a pitcher throws, he is completely reliant upon his offense for help in getting the win. Even in the case of a shutout, the pitcher requires at least one run for the W (although in the NL, of course, the pitcher could potentially hit a home run to provide his own offense).

To test the validity of W-L records, I ran regressions of several pitching metrics against Winning Percentage (W%) to try to gauge the effect that pitchers have directly on their own records. A few notes on the model:

  • The pitching metrics used in the models are Strikeouts per 9 innings pitched (K/9), Home runs allowed per 9 innings pitched (HR/9), Walks per 9 innings pitched (BB/9), and Earned Run Average (ERA).
  • The ERA metric is used as more of an “anti-metric” to the first three, as it is not nearly as good an indicator of actual pitcher performance. Explaining/proving this would make this way too long, so I would suggest reading JC Bradbury’s paper “Does the Baseball Labor Market Properly Value Pitchers?”, as it does a great job of breaking down the deceptions of ERA.
  • I used the performance records of every pitcher season of over 160 innings from the past 7 years (yielding 628 observations). I omitted relievers because the rules for getting a win are ridiculous enough as they are, without taking into account that relievers can actually give up the lead and still obtain the win under the right conditions.
  • I ran regressions of these metrics against Winning Percentage instead of number of wins to account for the bias that would occur among pitchers with different numbers of innings pitched. If two pitchers of the same caliber (and with the same teams and luck) pitch different numbers of innings (assuming that this is due to one having more starts), then the pitcher with more innings is likely to have more wins.
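As a rough sketch of how the inputs described in these notes are computed (the function names are my own, and the stat line is a hypothetical one used only for illustration), each rate stat is the raw count scaled to 9 innings, and W% is wins divided by decisions. The innings helper assumes the usual box-score notation, where ".1" and ".2" mean one and two outs rather than tenths of an inning.

```python
def innings_to_float(ip_notation):
    """Convert box-score innings (e.g. "183.1" means 183 1/3) to a real number."""
    whole, _, outs = str(ip_notation).partition(".")
    return int(whole) + (int(outs) if outs else 0) / 3.0


def per_nine(count, innings):
    """Scale a counting stat (K, HR, BB) to a rate per 9 innings."""
    return 9.0 * count / innings


def winning_pct(wins, losses):
    """Wins divided by decisions; no-decisions are ignored."""
    return wins / (wins + losses)


# HYPOTHETICAL stat line, for illustration only (not a real pitcher).
innings = innings_to_float("200.0")
print("K/9 :", per_nine(150, innings))
print("HR/9:", per_nine(20, innings))
print("BB/9:", per_nine(60, innings))

# The 12-4 record mentioned at the top of the post.
print("W%  :", winning_pct(12, 4))
```

Using rates instead of counts is what keeps a 220-inning pitcher and a 165-inning pitcher comparable in the same regression.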

After attempting several different models, the one that best explained Winning Percentage with pitcher-controlled metrics was:

W% = β⁰ + β¹(HR/9) + β²(K/9) + β³(BB/9) + ε

(I know that these should be subscript by the Betas, but I can’t figure out how to do that in Word, so I’m using superscript instead)

Per the t-stats, all of these variables are significant given a 5% significance level and the degrees of freedom for this model. Additionally, the signs of the coefficients are as predicted: HR/9 and BB/9 have a negative effect on W%, while K/9 has a positive effect. Having said that, the model doesn’t explain all that much.

The R² of .2289 shows that only ~23% of the variance in W% is explained by outcomes that pitchers directly have control over. Individually, K/9 has the strongest correlation with W% (.346). Coincidentally, this is probably the best single stat to measure a pitcher by, as it reflects an ability to get batters out, and is the least volatile (from year to year) of the metrics. BB/9 has the weakest correlation (negative, of course), at -.174…maybe this is why Russ Ortiz still has a job.
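For anyone who wants to try a fit like this themselves, here is a minimal ordinary least squares sketch in plain Python. It is not the tool I used for the numbers in this post; the function names are my own, and the six data rows are MADE UP purely so the code runs, so the output says nothing about real pitchers. It solves the normal equations and reports R² as the share of variance in W% that the fit explains.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so row operations apply to both sides at once.
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, y):
    """Return (coefficients, r_squared); coefficients[0] is the intercept."""
    x = [[1.0] + list(r) for r in rows]
    k = len(x[0])
    xtx = [[sum(xi[p] * xi[q] for xi in x) for q in range(k)] for p in range(k)]
    xty = [sum(xi[p] * yi for xi, yi in zip(x, y)) for p in range(k)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, xi)) for xi in x]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return beta, 1.0 - ss_res / ss_tot


# MADE-UP rows of (HR/9, K/9, BB/9) and W%, for illustration only.
demo_rows = [(1.0, 6.0, 3.0), (1.2, 7.5, 2.5), (0.8, 5.0, 4.0),
             (1.5, 8.0, 3.5), (0.9, 9.0, 2.0), (1.1, 4.5, 3.2)]
demo_w_pct = [0.55, 0.60, 0.45, 0.50, 0.65, 0.48]

coefficients, r_squared = fit_ols(demo_rows, demo_w_pct)
print("intercept, HR/9, K/9, BB/9:", coefficients)
print("R^2:", r_squared)
```

With real data, `demo_rows` would hold one row per qualifying pitcher season. This sketch does not compute t-stats, so it cannot reproduce the significance tests mentioned above.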

With coefficients in place, this is how the model looks:

W% = .587 - .104(HR/9) + .023(K/9) - .028(BB/9) + ε
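To put those coefficients on a more familiar scale, the sketch below converts each one into wins over a season. The 25 decisions is an assumption of mine, a round number chosen for illustration rather than anything taken from the data, and the function name is hypothetical.

```python
# Coefficients copied from the fitted model above.
COEFFICIENTS = {"HR/9": -0.104, "K/9": 0.023, "BB/9": -0.028}

# ASSUMPTION: 25 decisions in a season, a round number for illustration.
ASSUMED_DECISIONS = 25


def wins_per_unit_change(metric, decisions=ASSUMED_DECISIONS):
    """Expected change in wins when `metric` rises by 1.0, all else equal."""
    return COEFFICIENTS[metric] * decisions


for name in COEFFICIENTS:
    print(f"+1.0 {name}: {wins_per_unit_change(name):+.2f} wins")
```

Under that assumption, a full extra strikeout per 9 innings is worth a bit more than half a win, which gives a sense of how small these effects are next to the swings in actual records.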

Again, the low correlations and R² illustrate just how little the metrics a pitcher controls have to do with W-L records. This is not an argument against these metrics, but rather against the notion of judging a player by W%.

To contrast, we can run a regression to determine how much of W% is explained by a pitcher’s ERA. Referring back to JC Bradbury’s paper, remember that ERA is highly dependent on luck and the quality of the defense behind the pitcher. At any rate, the model (much like the first) looks like this:

W% = β⁰ + β¹(ERA) + ε

(In fairness, it should be noted that I didn’t adjust for park factor or league in this model. I’m assuming that it would make a negligible difference.)

As expected, since ERA takes some luck and defense into account as well, it provides more explanation for the variance of W%. The t-stat is significant under the same conditions as above (save for a slight change in degrees of freedom), the sign of the coefficient is as expected (negative), and the R² of .351 means that ERA explains ~35% of the variance in W%. While this is an improvement, it still does not serve as a particularly accurate model. However, it does reinforce the point: even when taking some luck and team defensive skill into account with ERA, ~65% of the variance in W% is left unexplained.

The model with coefficients:

W% = .915 - .089(ERA)

(Note that this is saying that even if a pitcher never gives up an earned run, he will still lose ~8.5% of the time. This could be explained by errors and unearned runs…those are flawed for different reasons, though.)
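A few quick calculations with the ERA model, as a sketch (the function names are my own). It evaluates the fitted line at a couple of ERAs, solves for the ERA at which the line predicts a .500 record, and backs out the correlation implied by the R², using the fact that in a one-variable regression R² is the square of the correlation.

```python
import math

# Values copied from the ERA model above.
INTERCEPT = 0.915
ERA_SLOPE = -0.089
R_SQUARED = 0.351


def predicted_w_pct(era):
    """W% predicted by the fitted ERA line."""
    return INTERCEPT + ERA_SLOPE * era


def break_even_era():
    """ERA at which the fitted line predicts exactly a .500 record."""
    return (0.5 - INTERCEPT) / ERA_SLOPE


def implied_correlation():
    """Correlation between ERA and W%; negative because the slope is."""
    return -math.sqrt(R_SQUARED)


print(f"W% at 0.00 ERA: {predicted_w_pct(0.0):.3f}")
print(f"W% at 4.00 ERA: {predicted_w_pct(4.0):.3f}")
print(f"Break-even ERA: {break_even_era():.2f}")
print(f"Implied correlation: {implied_correlation():.3f}")
```

Keep in mind that the line is being stretched well outside the data at a 0.00 ERA, so the intercept is better read as a feature of the fit than as a real prediction.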

Summing up, pitchers actually play only a small part in determining their W-L record. Defense, offensive run support, and dumb, blind luck play a much larger role in deciding what record is attached to their name than they do. As simplistic as it may sound, the best thing a pitcher can do is to strike an opposing hitter out. Not only does this project well from year to year, but if a pitcher strikes out enough hitters consistently, eventually the Yankees will sign him, and he’ll have all the run support he needs.

A couple of interesting results from the models run on the data from the past 7 seasons:

  • Zack Greinke had the unluckiest year of all in 2005. In that season, he posted rates of 5.61 K/9, 1.13 HR/9, and 2.61 BB/9. Not great, but certainly warranting more than the 5-17 record he was saddled with.
  • On the opposite end of the spectrum, Paul Abbott enjoyed the luckiest year out of everyone in 2001. Abbott posted rates of 6.5 K/9, 1.16 HR/9, and 4.80 BB/9. Except for the walk rate (which was actually almost twice as bad), these aren't all that different from Greinke's, and Abbott was rewarded with a 17-4 record.
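Here is a sketch of how "lucky" and "unlucky" can be measured. It plugs each pitcher's rates into the three-metric model from earlier in the post (W% = .587 - .104(HR/9) + .023(K/9) - .028(BB/9)) and compares the prediction with the actual record. The function names are my own, and I am assuming that luck here simply means actual W% minus predicted W%.

```python
def predicted_w_pct(hr9, k9, bb9):
    """W% predicted by the three-metric model from the post."""
    return 0.587 - 0.104 * hr9 + 0.023 * k9 - 0.028 * bb9


def luck(wins, losses, hr9, k9, bb9):
    """Actual W% minus predicted W%; positive means luckier than the model expects."""
    actual = wins / (wins + losses)
    return actual - predicted_w_pct(hr9, k9, bb9)


# Records and rates as quoted in the two bullets above.
seasons = {
    "Greinke 2005": dict(wins=5, losses=17, hr9=1.13, k9=5.61, bb9=2.61),
    "Abbott 2001": dict(wins=17, losses=4, hr9=1.16, k9=6.5, bb9=4.80),
}

for label, s in seasons.items():
    expected = predicted_w_pct(s["hr9"], s["k9"], s["bb9"])
    print(f"{label}: predicted {expected:.3f}, luck {luck(**s):+.3f}")
```

The model expects both pitchers to land near .500, with Greinke slightly ahead of Abbott, yet their actual winning percentages differ by nearly .600.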

3 comments:

WC said...

What database do you use to get your stats from...particularly historical stats? Is there any special export function into Excel? I like using WHIP instead of ERA (You already had the BB/IP) but I am not sure how that affects things like HR/9 or if it makes that stat less relevant. I would be interested to see if there were any correlation between something like first pitch strikes and W%. Anyways, nice work.

bstewart said...

There are several sites to get the stats from; ESPN.com and MLB.com are obvious ones, but baseballreference.com is outstanding...haven't found a way to export them smoothly, so I generally just copy and paste. I left out WHIP due to the findings that suggest that the pitcher has little control over balls put in play due to reliance on fielders (JC Bradbury mentions this in his paper). The effort here was to isolate things that only the pitcher has control over. I don't really like the ERA stat, but again I used that to show that even taking into account some luck didn't explain W% all that well. I think you're right, though, that including WHIP while HR/9 and BB/9 are in the model would yield one insignificant. Seems that you'd be capturing some of the same information multiple times.

bstewart said...

Check that...subscribing to Baseball Prospectus lets you export stats.