Ode to the Great Bambino

How the Best of the Best Performed Relative to Their Time Period

By Keith Gladstone

Only the best players of a given era are inducted into the National Baseball Hall of Fame in Cooperstown, from classic names like Babe Ruth and Lou Gehrig to the most recent inductees, Mike Piazza and Ken Griffey Jr. We are in a Hall of Fame era tainted by PEDs, with unthinkable, sky-high hitting totals.


Here, I normalize the career HR totals of all Hall of Famers based on their historical era. Babe Ruth held the career home run record at 714 upon retiring in 1935. Hank Aaron shattered the record almost 40 years later, but what does this actually mean? Was Hank Aaron better than Babe Ruth?

I calculated a new statistic to measure a player’s HR performance relative to the era in which they played. I call it the “Home Runs to Benchmark Ratio.”

HR to Benchmark Ratio = Annual Career HR Average / HR Era Benchmark

*see appendix for technical details

  • A ratio of 1 means the player was an average home run hitter in his own era.
  • A ratio of 2 means the player hit twice as many HR as the average player.

Pitching dominated the game in the “Dead Ball Era,” which ended upon the emergence of Babe Ruth and the Bronx Bombers in the 1920s. 714 HR in an era when the average player hit only 100 HR in a career underscores how impressive Ruth’s prowess was.

The Home Run to Benchmark Ratio rankings below confirm this, with Babe Ruth miles above the rest, followed by other classic Yankee heroes Lou Gehrig and Joe DiMaggio. Stunningly, Hank Aaron does not even crack the top ten: his ratio of 2.48 leaves him 26th overall. The HR performances of The Great Bambino, The Iron Horse, and The Yankee Clipper relative to their contemporaries show just how incredible they must have been to watch.

MLB HOF All-time HR Rankings – Normalized by Era

Rank MidYear Name Career HR HR to Benchmark Ratio
1 1924 Babe Ruth 714 7.13
2 1931 Lou Gehrig 493 4.76
3 1944 Joe DiMaggio 361 4.45
25 1964 Harmon Killebrew 573 2.49
26 1965 Hank Aaron 755 2.48
27 1950 Ted Williams 521 2.43
28 1935 Earl Averill 238 2.37
29 1960 Mickey Mantle 536 2.33
30 1975 Johnny Bench 389 2.31


Below is a graph of career HR per game against the average HR per game in that era. Names that appear above the line toward the top-left have higher ratios. Babe Ruth is the top-left point. Player names are revealed when you mouse over them.

Explore the graph.


The following assumptions were made for data collection and analysis:

  • Player performance is symmetrical over time with a peak in the middle of the player’s career
  • League averages are decent estimates of the “benchmark” against which a player can be measured
  • This analysis considers only the modern era (Hall of Famers whose careers occurred mostly after 1900) and only players with career batting averages above .250
  • Since Hall of Famers had relatively long careers, their statistics are reliable estimates of their abilities

Using “Midyear” as a barometer for a player’s peak

Since the number of players in this dataset is so large, we need a simplified way to capture a player’s top-performing year. For this analysis, we take the player’s career totals and divide by the number of years played to get a yearly average, then measure this average against the benchmark for a single year (the middle year of the player’s career). While this analysis is therefore not perfectly rigorous, it still serves as a useful method for comparing players from different eras. Put another way, the performance benchmark in 1995 should be similar enough to that of 1997, and the benchmarks in the 1990s are different enough from those in the 1920s, that a benchmark a few years off wouldn’t be a significant issue.
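To make the midyear approach concrete, here is a minimal Python sketch of the ratio calculation. The function names and the benchmark values in `example_benchmarks` are hypothetical placeholders for illustration, not the league averages used in the analysis; the career spans (Ruth 1914 to 1935, Aaron 1954 to 1976) are the players' actual first and last seasons.

```python
def mid_year(first_season: int, last_season: int) -> int:
    """Middle season of a career, rounded down to a whole year."""
    return (first_season + last_season) // 2


def hr_to_benchmark_ratio(career_hr, first_season, last_season, benchmark_by_year):
    """Annual career HR average divided by the benchmark for the career's middle year."""
    seasons_played = last_season - first_season + 1
    annual_average = career_hr / seasons_played
    benchmark = benchmark_by_year[mid_year(first_season, last_season)]
    return annual_average / benchmark


# Hypothetical benchmark values (HR per season for an average player).
# These are made-up numbers for illustration, NOT the article's data.
example_benchmarks = {1924: 4.5, 1965: 13.0}

ruth = hr_to_benchmark_ratio(714, 1914, 1935, example_benchmarks)
aaron = hr_to_benchmark_ratio(755, 1954, 1976, example_benchmarks)
print(f"Ruth: {ruth:.2f}, Aaron: {aaron:.2f}")
```

With any realistic benchmarks the pattern is the same: Aaron's yearly average is about the same as Ruth's, but he is measured against a much higher bar.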

Data Sources


The Mets’ World Series offensive collapse was inevitable

By Ben Ulene

After this year’s World Series ended in a Game 5 comeback win for the Royals, plenty of questions remain about what caused the Mets – who almost nobody [1] predicted would go home after just five games – to lose so quickly. While sloppy defense certainly contributed to their collapse, an even bigger liability was their offense, which only managed a meager 7 extra-base hits in the series.

Should we be surprised that the same team that had excelled at the plate during the NLCS, putting up 21 runs in a four-game sweep of the Cubs [2], could only manage 10 runs over their four losses to Kansas City? Probably not; as the statistics show, the Mets not only came into the World Series with a historically weak offense, but they also were up against a Kansas City bullpen that dominated games like perhaps no other bullpen before.

2015 Mets Offense
Statistic Value All-Time Rank (out of 202 WS teams since 1914)
BA .244 200th
SO 1290 203rd
R / Game 4.22 177th
OPS+ 97 181st


First, the Mets’ offense, for a pennant-winning team, had been weak throughout the regular season. The team’s .244 regular season batting average was the fourth-worst of any World Series team since 1914; on top of that, their 1,290 regular season strikeouts were more than any other pennant-winner aside from the 2013 Red Sox (who more than compensated with a .277 regular season team average).

The Mets’ regular season mark of 4.22 runs per game was also the third-lowest of any World Series team in the last twenty years – and the only two to score less played each other (the 2014 Royals and Giants).

Perhaps most strikingly, the team’s OPS+ for the season – a statistic that measures a team’s OPS (on-base percentage + slugging percentage) relative to the rest of the league, with 100 being the league average – was 97, putting it below average in the big leagues this year. Only 23 other teams have ever made it to the World Series with an OPS+ of 97 or lower; of those, only 9 managed to win the series, and none since the 1997 Florida Marlins.

All in all, this was not an offense that anybody should have expected to put up huge numbers against any pitching staff in the World Series.

2015 Royals Bullpen
Statistic Value All-Time Rank (out of 202 WS teams since 1914)
Innings 539 2/3 1st
ERA 2.72 22nd
BAA .214 8th
K/BB 2.63 6th
WHIP 1.13 12th
tOPS+ 78 4th
sOPS+ 80 29th


The Mets weren’t just facing any ordinary pitching unit in the World Series, however, but rather one with a historically dominant bullpen for a World Series team.

Not only did the Royals bullpen hold opposing batters to a .214 average during the regular season, the 8th lowest for any pennant-winning club, but it also posted a 2.63 strikeout-to-walk ratio, the 6th best regular season mark for a World Series team. The bullpen also maintained a 2.72 ERA during the regular season, the lowest for any World Series team since the 1990 Oakland A’s.

More complex statistics also reflect the dominance of the Royals’ bullpen. Its tOPS+ against – which reflects opposing hitters’ OPS relative to how they hit against starting pitching – was 78 (the 4th lowest for a World Series bullpen), making the Royals’ bullpen one of the best all-time at shutting down opposing offenses mid-game. And the bullpen’s sOPS+ against – which reflects opposing hitters’ OPS relative to the average OPS of hitters across the league – was 80, highlighting the bullpen’s excellence at shutting down hitters entirely.

While all of these numbers are impressive, what will go in the history books is how much manager Ned Yost used his bullpen. The Royals’ bullpen pitched 539 2/3 innings this season, more than that of any other pennant-winning team in history. It’s not surprising that winning teams generally pitch their bullpens less than average, since more bullpen innings generally signify bad starting pitching; in the Royals’ case, however, their bullpen was just really effective.

During the World Series, Royals relievers pitched 23 2/3 innings, compared to their starters’ 28 1/3. Take away Franklin Morales’s 6th inning implosion in Game 3, and the numbers are staggering: 1 run and 14 hits in just over 23 innings (an ERA of 0.39), with 4 walks and 30 strikeouts. And given just how dominant those relievers had been all year – and how susceptible to offensive slumps the Mets had been – the Royals’ dominant and decisive showing might just have been a foregone conclusion.
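For readers who want to check the ERA figure, here is a small Python sketch of the arithmetic. It assumes the single run was earned and that “just over 23 innings” means 23 1/3 innings; both are assumptions made for illustration, and the function name is hypothetical.

```python
from fractions import Fraction


def earned_run_average(earned_runs, innings_pitched):
    """ERA = earned runs allowed per nine innings pitched."""
    return 9 * earned_runs / float(innings_pitched)


# Assumption: 23 1/3 innings and 1 earned run for the relievers, Morales's inning excluded.
relief_innings = Fraction(23) + Fraction(1, 3)
relief_era = earned_run_average(1, relief_innings)
print(f"{relief_era:.2f}")
```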

[1] http://espn.go.com/mlb/playoffs2015/story/_/page/playoffs15_worldseriespredictions/espn-experts-make-their-world-series-predictions

[2] http://www.baseball-reference.com/postseason/2015_WS.shtml

The Hot Hand: NBA Shot Streaks and the Geometric Distribution

By Neil Rangwani

Each year, as the NBA season kicks off, the “hot hand” debate (or, according to Wikipedia, the hot hand fallacy) resurfaces – are streaks of made shots indicative of a player getting hot, or are they just random occurrences? Here at Princeton Sports Analytics, we’re not happy discussing this with just anecdotal evidence (I mean, did you see Steph last night?), so we did some analysis(!). It turns out that (surprise) the data show that NBA superstars Steph Curry, LeBron James, and James Harden don’t get hot any more than a coin that lands on heads 5 times in a row does.

The argument that shot streaks are random is based on probability. Any time an event (with two outcomes) is repeated many times, streaks are bound to occur. To study whether NBA shot streaks are random, or if players have disproportionately long “hot” and “cold” streaks, I used shot-by-shot data from the 2014-2015 season (from nbasavant.com) and applied a geometric distribution framework.

The geometric distribution is a probability distribution that is used to model repeated trials of events that have two distinct outcomes, each of which occurs with a constant probability. NBA shots roughly fit these requirements – there are clearly two outcomes (makes and misses), and I’ll assume that a player’s season-long field goal percentage is the “true” probability that they make any given shot.

Using the geometric distribution, we can model the number of made shots in a row. Essentially, a streak means that a player makes a certain number of shots in a row and then misses the next. Mathematically, this is the probability of making k shots in a row (p^k) multiplied by the probability of missing the next one (1 – p).

P(X = k) = p^k * (1 – p)
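As a rough sketch of how this works in practice, the Python below computes the streak probability from the formula and counts streaks in a made/missed sequence. The function names are hypothetical, p = 0.487 is an illustrative field goal percentage, and the short shot list is made-up data. Streaks still in progress when the data ends are ignored here, which is a simplifying assumption.

```python
from collections import Counter


def streak_probability(p: float, k: int) -> float:
    """P(X = k): make k shots in a row, then miss the next one."""
    return p ** k * (1 - p)


def count_streaks(shots):
    """Count streaks of consecutive makes, each ended by a miss.

    `shots` is a sequence of booleans (True = made shot).
    A run of makes that is still open at the end of the data is not counted.
    """
    streaks = Counter()
    run = 0
    for made in shots:
        if made:
            run += 1
        else:
            if run > 0:
                streaks[run] += 1
            run = 0
    return streaks


p = 0.487  # illustrative season-long field goal percentage
for k in range(1, 4):
    print(k, f"{streak_probability(p, k):.2%}")

# Made-up sequence: make, make, miss, make, miss, miss, make
example = [True, True, False, True, False, False, True]
print(count_streaks(example))
```

Multiplying each probability by the number of opportunities gives the expected count for that streak length, which can then be compared with the observed count.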

Next, I applied this framework to the shot-by-shot data from last season for Steph Curry, LeBron James, and James Harden. Using the data and the geometric distribution, here are the expected and observed shot streaks.

Curry James Harden
Streak Length Probability Expected Observed Probability Expected Observed Probability Expected Observed
1 24.98% 178.09 184 24.99% 174.15 176 24.64% 205.03 207
2 12.16% 86.69 85 12.19% 84.95 96 10.84% 90.17 88
3 5.92% 42.20 47 5.95% 41.44 38 4.77% 39.66 41
4 2.88% 20.54 17 2.90% 20.21 19 2.10% 17.44 18
5 1.40% 10.00 7 1.41% 9.86 7 0.92% 7.67 7
6 0.68% 4.87 4 0.69% 4.81 2 0.41% 3.37 5
7 0.33% 2.37 2 0.34% 2.35 1 0.18% 1.48 0
8 0.16% 1.15 1 0.16% 1.14 1 0.08% 0.65 0
9+ 0.15% 1.09 0 0.16% 1.09 0 0.06% 0.51 0

Here’s a visual version of the same data:

Stephen Curry Shot Streaks vs Expectation

LeBron James Shot Streaks vs Expectation

James Harden Shot Streaks vs Expectation

While it seems that the observed and expected distributions are close, we can actually quantify whether they are. We’ll use a chi-squared goodness-of-fit test to tell us whether the observed data fits the geometric distribution.

Player p-value Chi-Squared Test Conclusion
Curry 0.89 Fails to reject Random
James 0.63 Fails to reject Random
Harden 0.89 Fails to reject Random

The p-value of a chi-squared test tells us how likely it would be to see a gap at least this large between the observed and expected counts if the shots really did follow the theoretical distribution. With p-values this high, we can’t reject the idea that the observed shot streaks are explained by the geometric distribution.
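Here is a minimal sketch of the goodness-of-fit calculation using Curry's observed and expected counts from the streak table. It uses only the standard library: the p-value comes from the closed-form chi-squared tail probability that holds for an even number of degrees of freedom. Using 8 degrees of freedom (9 streak-length bins minus 1) is an assumption, as are the function names. Under that assumption the result comes out at about 0.89.

```python
from math import exp, factorial


def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all bins."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_p_value_even_df(stat: float, df: int) -> float:
    """Upper-tail probability of a chi-squared variable; valid only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive, even df")
    half = stat / 2
    return exp(-half) * sum(half ** i / factorial(i) for i in range(df // 2))


# Curry's counts for streak lengths 1 through 9+, copied from the table
curry_observed = [184, 85, 47, 17, 7, 4, 2, 1, 0]
curry_expected = [178.09, 86.69, 42.20, 20.54, 10.00, 4.87, 2.37, 1.15, 1.09]

stat = chi_square_statistic(curry_observed, curry_expected)
p_value = chi_square_p_value_even_df(stat, df=8)  # df = 8 is an assumption
print(f"chi-squared = {stat:.2f}, p-value = {p_value:.2f}")
```

One caveat: several of the longer-streak bins have expected counts below 5, and a common practice is to pool such bins before running the test. This sketch leaves the bins as they appear in the table.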

Bringing this back to basketball… what we get from this analysis is that shot streaks match what we’d expect if they were truly random. Even NBA superstars don’t “get hot” – instead, since they tend to have higher field goal percentages in general, they naturally have longer streaks of made shots.

One thing to keep in mind, however, is that the model doesn’t account for timing. It’s totally possible that a player’s shot streaks are geometrically distributed overall, but when you isolate playoffs or overtime games, as examples, they tend to have longer streaks than expected. I mean, did you see LeBron in that Pistons game?

3v3 Overtime is Working

By Antonio Papa

This season, the NHL has initiated a rule change to create more overtime goals and fewer shootouts. Now, overtime play will be 3-on-3, instead of 4-on-4. A quick statistical analysis shows us that the new rule has increased overtime scoring, and will likely continue to.

Shootouts were added after the 2004-05 lockout as an alternative to ties in the regular season, but they have been criticized as essentially flipping a coin to decide the winner. 3-on-3 play, in contrast, gives stronger teams an increased chance of scoring goals. In the early 1980s, the Edmonton Oilers even told their defenders to get into mutual roughing penalties on purpose so that the game would become 4-on-4 or 3-on-3. Then, Wayne Gretzky, Jari Kurri and Mark Messier would take over on the open ice. This was effective because players with superior skating ability gain an upper hand in 4-on-4 and 3-on-3 situations, resulting in more goals scored.

The NHL instituted the “Gretzky rule” in 1985 as a direct response to these shenanigans. The “Gretzky Rule” created the concept of coincidental minor penalties and allowed full strength play for offsetting penalties. A few years later, the NHL reversed the change in an attempt to reclaim some of that high-scoring open play. Expect this year’s 3-on-3 overtime to benefit top-heavy teams, like the Pittsburgh Penguins, who are sure to take advantage of the situation with skaters like Sidney Crosby, Evgeni Malkin and Phil Kessel.

Over the past eight seasons, 43% of overtime games had a goal and the other 57% needed a shootout (2,227 games). In this preseason’s overtime games with the rule change in place, 72% of overtime games had a goal and only 28% needed a shootout (24 games). Even with the small sample size, we can use a T-test (difference of means) to determine whether this change is statistically significant. The standard binomial error is σ = .0105 for the regular season set and σ = .0926 for the preseason set. The result is that we are 99% confident that the new rule decreases the proportion of shootouts in overtime by 40%-58% (about half) and should lead to high-octane teams winning more games in overtime.
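The comparison can be sketched in a few lines of Python. This version is a two-proportion z-test with unpooled standard errors, swapped in as a simple stand-in for the test named above, and the function names are hypothetical. Because the inputs are the rounded percentages quoted in the paragraph, the results can differ slightly from the figures in the text.

```python
from math import sqrt
from statistics import NormalDist


def binomial_se(p: float, n: int) -> float:
    """Standard error of a sample proportion p observed over n games."""
    return sqrt(p * (1 - p) / n)


def two_proportion_z(p1: float, n1: int, p2: float, n2: int) -> float:
    """z-statistic for the difference p1 - p2, using unpooled standard errors."""
    combined_se = sqrt(binomial_se(p1, n1) ** 2 + binomial_se(p2, n2) ** 2)
    return (p1 - p2) / combined_se


# Share of overtime games that needed a shootout
old_rate, old_games = 0.57, 2227   # past eight regular seasons
new_rate, new_games = 0.28, 24     # this preseason, 3-on-3

z = two_proportion_z(old_rate, old_games, new_rate, new_games)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided

print(f"SE (old) = {binomial_se(old_rate, old_games):.4f}")
print(f"SE (new) = {binomial_se(new_rate, new_games):.4f}")
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

A z-statistic beyond about 2.58 corresponds to significance at the 99% level in a two-sided test, so even 24 preseason games are enough to register a drop this large.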

[Editor’s Note: The last paragraph was edited post-publication to clarify the statistical test used]

AL Wild Card Live Probability Tracker

By Patrick Harrel

With three days left in the MLB season, there is still a lot to settle. The Astros looked strong as they sat atop the AL West for much of the year, but are now just trying to hold onto the second wild card spot. Meanwhile, the Angels have surged in September, and the Twins have also stayed in the race. With three games left, the Astros are holding on by 1 game over the Twins and Angels, and will look to stay ahead as they play in Arizona this weekend.

But how will the season finish? We will be tracking just that here, using live probabilities from FanGraphs’ win expectancy model. Follow here and see how your team’s chances at the playoffs change as the games go on tonight and into the weekend.

KO With A Side of Marrone to GO

By Dana Fesjian


In the last two weeks of the Buffalo Bills season I have cried, hollered, cheered, pouted, and smiled. The Bills have had such a tumultuous end to 2014 – it is just very emotional. Bottom line, they lost a lot: an owner, a head coach, a quarterback, a game against the 2-12 Raiders, and a chance at a playoff spot.

But there is one thing they haven’t lost: their fans. I am more happy and excited about the Bills than ever before. Seeing their first winning season since I became a die-hard Bills fan is exciting. Considering the challenges the Bills faced as a team this year, this is an unbelievable feat. So here is my year in review.

The Saga Begins

Let’s commence with that tragic July 2nd when I got that Bills app update that Kiko Alonso hurt his knee. My first thought was just “no.” When I found out he tore his ACL, I was speechless. Can you imagine how powerful the defensive line would have been with both Kiko and Brandon Spikes?


Then came the disappointing preseason, with three straight losses going into the regular season, and I expected another 6-10 season or worse. The next few weeks came as a nice surprise, though: two wins with EJ! But the two losses afterwards led them to put KO in. He did well, but we lost Fred Jackson and CJ Spiller to injuries in the process.

The Saga Continues

After the bye week, we had two disappointing losses in games the Bills should have won. And then there came another obstacle they had to go through: Mother Nature. Buffalo got about 8 feet of snow and I got a ScoreCenter update asking me to call a number to come shovel Ralph Wilson Stadium (aka THE best invitation ever). Then the Bills beat the Jets 38-3 as if there had never been a snowstorm in the first place.



CJ returned against Oakland and, with the way Sammy had been playing all season, I was ready for the playoff push to keep going. After that game there was an afternoon of tears. The Bills lost to the Raiders 26-24. They almost made a comeback, but the Oakland defense was just too good that Sunday. Playoff chances were gone, but my hope was definitely not.

The Perks

The main things that kept me devoted to the 2014 season were Sammy Watkins, Dan Carpenter, and the defensive line. The All-Pro duo and the defensive line were stellar this year, and Dan Carpenter made a career-high 34 field goals, setting the Buffalo Bills record. Sammy was just Sammy and set some rookie records all across the board.


I have a lot to be upset about because the Bills didn’t make it into the playoffs, but I am also so happy with this Bills season. The Bills will keep improving even without Doug Marrone and KO because Doug Marrone was not the defensive coordinator and KO could have been better.

I see great things in store for 2015 and we shall see if Rex Ryan does become the new head coach. Boy, will I have a lot of things to say about that.