
A New Metric to Analyze Viewer Experience in Pro Tennis: Full Paper

By Rohan Rao


Recently, organizations such as the NCAA have tried to increase tennis viewership by implementing rule changes that shorten individual matches. The logic behind these changes is to increase the relative importance of each point, making the overall experience more exciting. I think this is a particularly interesting problem for tennis, which is currently fighting falling ratings (the men’s U.S. Open final lost 1.4 million viewers this year). But is increasing the uncertainty of games the best way to gain viewership or increase the excitement of the sport? Determining which rule changes lead to more viewers is complicated; however, I would argue that by statistically examining shot selection across a variety of tournaments and players, we can obtain an alternative and useful metric for how exciting or interesting a match is, which could provide some insight into the issue.

View Rao’s “A New Metric to Analyze Viewer Experience in Pro Tennis” full paper here.

Age in the NBA: Do older teams “find ways to win games”?

By: Patrick Harrel

Tune in to any NBA team’s local broadcast, and you will be sure to hear a litany of clichés from the commentators. Most are quick to praise older players, noting on occasion that “veteran teams just know how to win games.” The San Antonio Spurs, for example, are lauded for their ability to defy expectations, winning another NBA title last year despite Tim Duncan and Manu Ginobili closing in on 40 and Tony Parker getting well into his 30s. Like most musings from broadcasters, these assertions are driven by little more than perception: completely unfounded points made to maintain conversation in a long season.

But could there be some merit to these thoughts? NBA teams are undoubtedly looking for any edge they can get, so do veteran players give them an advantage? Do older teams in the NBA truly “find ways to win games,” as so many claim? That’s what we aimed to find out in this research.

To look at this issue, we consider an expected wins model first constructed by Bill James for baseball and later adapted by Daryl Morey for use in basketball. James dubbed his formula for expected wins the “Pythagorean expectation” because of its similarity to the Pythagorean theorem, and it relies on the well-established principle that point differential is a better estimator of team performance than raw win-loss data. James’ formula for the Pythagorean expectation of winning percentage is as follows:

\[ \text{Win\%} = \frac{\text{Runs Scored}^{2}}{\text{Runs Scored}^{2} + \text{Runs Allowed}^{2}} \]

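The adapted NBA formula, which we have used, is the following:

\[ \text{Win\%} = \frac{\text{Points For}^{13.91}}{\text{Points For}^{13.91} + \text{Points Against}^{13.91}} \]
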
Using this formula, we built a database of every team-season from the last 20 years (excluding lockout years, which had odd statistical trends) along with its Pythagorean expectation of wins. We then compared each team’s actual wins to its expectation and generated a residual for each individual team-season. A positive residual means that a team won more games than its point differential would forecast, and a negative residual means the opposite.
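As a rough illustration of the residual calculation described above, here is a minimal sketch. The function names, the 82-game default, and the sample inputs are assumptions made for illustration; the default exponent of 13.91 is the value usually attributed to Morey’s adaptation and is likewise treated as an assumption here.

```python
def pythagorean_win_pct(points_for, points_against, exponent=13.91):
    """Expected winning percentage from points scored and points allowed.

    Algebraically equal to PF**k / (PF**k + PA**k), written as a ratio
    so that very large powers are never computed directly.
    The exponent default is an assumption (see the note above).
    """
    if points_for <= 0 or points_against <= 0:
        raise ValueError("point totals must be positive")
    return 1.0 / (1.0 + (points_against / points_for) ** exponent)


def residual_wins(actual_wins, points_for, points_against,
                  games=82, exponent=13.91):
    """Actual wins minus Pythagorean expected wins for one team-season."""
    expected = games * pythagorean_win_pct(points_for, points_against, exponent)
    return actual_wins - expected


if __name__ == "__main__":
    # Hypothetical team-season: 8,500 points scored, 8,200 allowed, 55 wins.
    print(round(residual_wins(55, 8500, 8200), 2))
```

A positive return value corresponds to a team that won more games than its point differential would forecast.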

The other major metric used in this section of the research is minutes-weighted age. Averaging the ages of all players on the roster does not give an accurate representation of a team’s effective age, as it fails to distinguish the impact of a player like Tim Duncan, who played over 2000 minutes for the Spurs in 2013-14 at the age of 37, from that of Steve Nash, who played all of 313 minutes in that same season. By weighting the average by the number of minutes each player has played, we get a much stronger metric for the effective age of an NBA team. We then centered the weighted age values on zero by subtracting the league average for each season, to account for league-wide age changing from year to year.
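The weighting and centering steps can be sketched as follows. The data layout, the function names, and the choice of a simple mean across teams as the “league average” are assumptions for illustration, not a description of the original database.

```python
def minutes_weighted_age(players):
    """Average age weighted by minutes played.

    `players` is a list of (age, minutes) pairs for one team-season.
    """
    total_minutes = sum(minutes for _, minutes in players)
    if total_minutes == 0:
        raise ValueError("team has no recorded minutes")
    return sum(age * minutes for age, minutes in players) / total_minutes


def center_by_season(team_ages):
    """Subtract each season's league mean from every team's weighted age.

    `team_ages` maps (season, team) -> minutes-weighted age. The league
    average is assumed to be the simple mean across teams in that season.
    """
    by_season = {}
    for (season, _team), age in team_ages.items():
        by_season.setdefault(season, []).append(age)
    season_mean = {s: sum(v) / len(v) for s, v in by_season.items()}
    return {key: age - season_mean[key[0]] for key, age in team_ages.items()}


if __name__ == "__main__":
    # Made-up roster: a heavy-minutes veteran, a rarely used veteran,
    # and a younger starter.
    roster = [(37, 2000), (40, 300), (25, 1700)]
    print(minutes_weighted_age(roster))  # lower than the simple mean of 34
```

In this made-up roster the rarely used veteran barely moves the weighted figure, which is the behavior the paragraph above is after.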

With these two metrics in mind, we compared the residual values of expected versus actual wins to a team’s age vs. the league average, using a linear regression, and came up with the following results:


With the regression model above, we found no significant relationship between a team’s minutes-weighted age and its residual wins. Because the residual wins value was centered on 0, there is no intercept term, and the only thing that remains in the regression is the slope times the explanatory variable, age. That slope was essentially zero, with no statistical significance whatsoever.
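For readers who want to try a regression of this shape on their own data, here is a minimal sketch of an ordinary least squares fit with no intercept term. It is not the authors’ code; the function name and sample numbers are hypothetical, and the standard error uses the usual n − 1 degrees of freedom for a one-parameter model.

```python
import math


def fit_through_origin(x, y):
    """Least-squares fit of y = slope * x with no intercept.

    Returns (slope, standard_error, t_statistic). The t-statistic is
    None when the fit is exact and the standard error is zero.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    sxx = sum(xi * xi for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation around zero")
    slope = sum(xi * yi for xi, yi in zip(x, y)) / sxx
    # Residual sum of squares; one parameter estimated, so n - 1 d.o.f.
    sse = sum((yi - slope * xi) ** 2 for xi, yi in zip(x, y))
    std_err = math.sqrt(sse / (len(x) - 1) / sxx)
    t_stat = slope / std_err if std_err > 0 else None
    return slope, std_err, t_stat


if __name__ == "__main__":
    # Hypothetical centered ages (x) and residual wins (y).
    ages = [-2.0, -1.0, 1.0, 2.0]
    residuals = [1.0, -1.0, 1.0, -1.0]
    print(fit_through_origin(ages, residuals))
```

A t-statistic close to zero, as reported in the text, means the slope cannot be distinguished from no relationship at all.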


If you are interested, the data was as follows:


As you can see, the data yielded no statistically significant relationship between a team’s age and outperforming or underperforming its expected wins. In fact, in publicly available data for the last 20 years, we found no apparent trends among teams that outperformed their expected wins. Teams that played at a faster pace, shot a lot of threes, won a lot of games, or got to the free throw line a lot did not outpace their Pythagorean expectation by a statistically significant margin.

Perhaps this is not a groundbreaking result, but it highlights the effectiveness of the Pythagorean expectation: not only is it an unbiased estimator of a team’s winning percentage regardless of age, it is unbiased regardless of virtually any factor you can check. Teams that scored a lot, very little, or somewhere in the middle all tended to match their Pythagorean expectation on average. This unbiased nature of the Pythagorean estimator has its roots in its derivation. Research has shown point differential to be a better indicator of a team’s performance than winning percentage, and this further investigation supports that research.

This is just one way of evaluating whether veteran teams get an edge, but at least in this part of our research, we found no evidence that older teams have any advantage. Older teams might play slower, shoot more threes, or dunk less, but they will match their Pythagorean expectation over time.

Using SportVU to find the most overrated defenders in the NBA

By: Patrick Harrel

In the quest for advanced statistics capable of accurately quantifying defense, NBA analysts have always faced an uphill battle. Unlike offense, which has easily quantifiable measures of success, defense has long resisted measurement: readily available statistics came nowhere close to establishing how effective a defensive player was on the floor. If a player blocked a lot of shots, he was often lauded as a tremendous defender, but what if those blocks came at the cost of missed rotations and wide open layups on failed attempts? Until very recently, we couldn’t dream of answering a question like that comprehensively.

When the NBA announced this year that they would be making the SportVU data available to the public for the 2013-14 season, the news was met with raucous applause from all circles involved with basketball. Writers loved it, fans loved it, and statisticians, who had always only been able to make educated guesses about certain factors, adored it. At Princeton Sports Analytics, we are going to make the data more accessible to you in a bi-weekly column, with each entry dedicated to a specific aspect of what is going on in the NBA.

If you are unfamiliar with SportVU, it is a system that is now installed in all 29 NBA arenas that tracks the movement of all 10 players on the court, the 3 referees, and the ball, and automatically generates an incredible amount of data about the various outcomes on the floor. It tracks average speed of every player, how many touches any given player gets per game, and much more.

Today, we’re going to discuss the ability to better quantify defense. Specifically, we will look at some of the players who have been surprisingly poor interior defenders this season. SportVU measures how well players defend inside by charting every shot attempt that an offensive player takes when a defender is both within five feet of the basket and within five feet of the offensive player. It then measures what percentage of those shots are made against that defender.
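The interior defense measure described above can be sketched as a simple filter and ratio. The record layout and field names below are hypothetical and do not reflect the actual SportVU data format; treating “within five feet” as less than or equal to five feet is also an assumption.

```python
from collections import defaultdict


def rim_fg_pct_allowed(shots, max_dist_ft=5.0):
    """Share of qualifying shots made against each defender.

    `shots` is an iterable of dicts with hypothetical keys:
      defender               - defender's name
      defender_to_basket_ft  - defender's distance from the basket
      defender_to_shooter_ft - defender's distance from the shooter
      made                   - True if the shot went in
    A shot qualifies only if both distances are within `max_dist_ft`.
    """
    attempts = defaultdict(int)
    makes = defaultdict(int)
    for shot in shots:
        near_basket = shot["defender_to_basket_ft"] <= max_dist_ft
        near_shooter = shot["defender_to_shooter_ft"] <= max_dist_ft
        if near_basket and near_shooter:
            attempts[shot["defender"]] += 1
            makes[shot["defender"]] += int(shot["made"])
    return {name: makes[name] / attempts[name] for name in attempts}


if __name__ == "__main__":
    # Made-up shot log for two defenders.
    sample = [
        {"defender": "A", "defender_to_basket_ft": 3.0,
         "defender_to_shooter_ft": 2.0, "made": True},
        {"defender": "A", "defender_to_basket_ft": 4.0,
         "defender_to_shooter_ft": 4.5, "made": False},
        {"defender": "A", "defender_to_basket_ft": 12.0,
         "defender_to_shooter_ft": 3.0, "made": True},  # too far from rim
        {"defender": "B", "defender_to_basket_ft": 2.0,
         "defender_to_shooter_ft": 1.0, "made": False},
    ]
    print(rim_fg_pct_allowed(sample))
```

A lower percentage means fewer shots were made against that defender near the rim; defenders with no qualifying attempts are simply left out of the result.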
