Category: NBA

Theory of the Big 3: Predicting NBA Team Win % from Individual Performance: Full Paper

By Gene Li


Nowhere is the concept of the “Big 3” more relevant than in basketball. As a relatively star-dominated game compared to football, soccer, etc., NBA games are determined by the performance of a few players who can deliver offensive firepower. NBA fans often view their team’s success as driven by the top three players on each team. Just last season, we saw the trio of Stephen Curry, Klay Thompson, and Draymond Green from the Golden State Warriors face off against LeBron James, Kyrie Irving, and Kevin Love of the Cleveland Cavaliers. Historic “Big Threes” include the infamous James-Wade-Bosh trio in Miami, and the Duncan-Parker-Ginobili Spurs offense that won four championships over 13 years. But just how much can a team’s performance be attributed to its top three players?

View Li’s “Theory of the Big 3: Predicting NBA Team Win % from Individual Performance” full paper here.


Does Order Matter? An Analysis of Round 1 vs Round 2 Picks in the NBA

By Alex Vukasin

Are teams really making use of their first round picks? Have scouts been able to pinpoint the best talent with their first round picks, or is the draft round not a significant indicator of the talent and future of players in the league?


To answer this question, I analyzed the first and second round players drafted to the NBA from 2005 to 2014. All players who were drafted and played at least one game were included in the analysis, in order to identify only the players who have had NBA experience.

Performance Variables

The response variable throughout the analysis was the draft round, while the explanatory variables were games played, years played, minutes played in total, total rebounds, field goal percentage, three-point percentage, free throw percentage, minutes per game, points, points per game, rebounds per game, assists, and assists per game.

Advanced statistics used as explanatory variables in this study were “win shares” (the number of wins attributed to a player), win shares per 48 minutes, “box plus-minus” (the number of points per 100 possessions that a player contributes to his team above the average player), and “value over replacement player” (the number of points per 100 team possessions that a player contributes above a replacement-level player, translated to an average team over the 82-game schedule).

These variables all share a common rule: a higher value indicates a better career, while a lower value indicates a less successful one. Below is a summary of all the variables.


Correlation Analysis: Positive among Performance Indicators, Negative with Round

The method used to test whether there was a relationship between “round” and these explanatory variables began with the correlation matrix of all the variables, computed in STATA (Figures 1a and 1b). Although every variable is negatively correlated with “round”, none of the correlations is strong: no coefficient exceeds 0.5 in magnitude. There are also positive correlations among many of the explanatory variables, so it was not possible to include them all in a single regression without multicollinearity issues.
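To make the correlation step concrete, here is a minimal sketch in Python rather than STATA. The function name `pearson` and the sample values are hypothetical and are not data from this study; the sketch only shows how a single entry of the correlation matrix is computed.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical sample: draft round (1 or 2) and career minutes per game.
draft_round = [1, 1, 1, 1, 2, 2, 2, 2]
minutes_per_game = [31.2, 24.5, 12.3, 27.8, 18.0, 9.4, 22.1, 6.7]

r = pearson(draft_round, minutes_per_game)
print(f"correlation(round, MPG) = {r:.2f}")  # negative for this made-up sample
```

A negative value here means that, in the sample, second-round picks tend to have lower minutes per game, which is the pattern the correlation matrix is checking for.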

Regression Analysis: Performance Variables to Predict Round

Next, I ran some regressions to test which performance variables could help predict the player’s draft round, which would indeed suggest a relationship between the draft round and the player’s career performance.

In Figure 2a, the coefficients on field goal percentage, minutes per game, rebounds per game, and points per game are all negative, meaning each tends to be lower for second-round picks, but minutes per game is the only one that is statistically significant (at the 0.05 level). This result is notable because it agrees with the negative correlations between the explanatory variables and “round”, and because minutes per game also had the strongest correlation with “round”. Simple regressions were then run between each explanatory variable and the response variable “round”, to sidestep the high correlation among the explanatory variables (these regressions are not shown). Every one of these regressions yields a negative coefficient that is significant at the 0.05 level.

Because all of these factors have negative but small correlation coefficients with “round”, and a significant negative coefficient in each simple linear regression, it seems probable that the round in which a player is picked has a slight relationship with how his career will turn out. Although the correlations are not extremely high, the fact that all of these variables share a negative relationship with “round” leads me to believe that there could be other indicators of success, besides those listed, that are strongly correlated with “round” and that I could study in another analysis.


Figure 1a: Performance variables negatively correlated with Round


Figure 1b: Performance variables positively correlated with each other


Figure 2a: Regression of Round with Field Goal Percent, Minutes per Game, Rebounds per Game and Points per Game


Editor’s Note: Edits have been made for clarity.

The Hot Hand: NBA Shot Streaks and the Geometric Distribution

By Neil Rangwani

Each year, as the NBA season kicks off, the “hot hand” debate (or, according to Wikipedia, the hot hand fallacy) resurfaces – are streaks of made shots indicative of a player getting hot, or are they just random occurrences? Here at Princeton Sports Analytics, we’re not happy discussing this with just anecdotal evidence (I mean, did you see Steph last night?), so we did some analysis(!). It turns out that (surprise) the data show that NBA superstars Steph Curry, LeBron James, and James Harden don’t get hot any more than a coin that lands on heads 5 times in a row does.

The argument that shot streaks are random is based on probability. Any time an event (with two outcomes) is repeated many times, streaks are bound to occur. To study whether NBA shot streaks are random, or if players have disproportionately long “hot” and “cold” streaks, I used shot-by-shot data from the 2014-2015 season and applied a geometric distribution framework.

The geometric distribution is a probability distribution that is used to model repeated trials of events that have two distinct outcomes, each of which occurs with a constant probability. NBA shots roughly fit these requirements – there are clearly two outcomes (makes and misses), and I’ll assume that a player’s season-long field goal percentage is the “true” probability that they make any given shot.

Using the geometric distribution, we can model the number of made shots in a row. Essentially, a streak means that a player makes a certain number of shots in a row and then misses the next. Mathematically, this is the probability of making k shots in a row, p^k, multiplied by the probability of missing, (1 - p).

P(X = k) = p^k * (1 - p)
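As a rough illustration of the formula, the sketch below turns a field goal percentage into expected streak counts. The function names and the inputs (a 48.7% shooter and 713 streak opportunities) are assumptions chosen for illustration, not figures taken from the underlying data set.

```python
def streak_probability(p, k):
    """P(X = k): make exactly k shots in a row, then miss the next one."""
    return p ** k * (1 - p)

def expected_streak_counts(p, n_streaks, max_len=8):
    """Expected number of streaks of each length 1..max_len, plus a tail bucket."""
    counts = {k: n_streaks * streak_probability(p, k) for k in range(1, max_len + 1)}
    # P(X > max_len) = p ** (max_len + 1), so all longer streaks share one bucket.
    counts[f"{max_len + 1}+"] = n_streaks * p ** (max_len + 1)
    return counts

# Assumed inputs for illustration only.
for length, count in expected_streak_counts(p=0.487, n_streaks=713).items():
    print(f"streak length {length}: expected {count:.2f}")
```

Note that a streak of length 0 (a miss that directly follows another miss) has probability 1 - p, which is why the probabilities for lengths 1 and up do not sum to 100%.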

Next, I applied this framework to the shot-by-shot data from last season for Steph Curry, LeBron James, and James Harden. Using the data and the geometric distribution, here’s the expected and observed shot streaks.

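| Streak Length | Curry: Probability | Curry: Expected | Curry: Observed | James: Probability | James: Expected | James: Observed | Harden: Probability | Harden: Expected | Harden: Observed |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 24.98% | 178.09 | 184 | 24.99% | 174.15 | 176 | 24.64% | 205.03 | 207 |
| 2 | 12.16% | 86.69 | 85 | 12.19% | 84.95 | 96 | 10.84% | 90.17 | 88 |
| 3 | 5.92% | 42.20 | 47 | 5.95% | 41.44 | 38 | 4.77% | 39.66 | 41 |
| 4 | 2.88% | 20.54 | 17 | 2.90% | 20.21 | 19 | 2.10% | 17.44 | 18 |
| 5 | 1.40% | 10.00 | 7 | 1.41% | 9.86 | 7 | 0.92% | 7.67 | 7 |
| 6 | 0.68% | 4.87 | 4 | 0.69% | 4.81 | 2 | 0.41% | 3.37 | 5 |
| 7 | 0.33% | 2.37 | 2 | 0.34% | 2.35 | 1 | 0.18% | 1.48 | 0 |
| 8 | 0.16% | 1.15 | 1 | 0.16% | 1.14 | 1 | 0.08% | 0.65 | 0 |
| 9+ | 0.15% | 1.09 | 0 | 0.16% | 1.09 | 0 | 0.06% | 0.51 | 0 |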
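For readers who want to build the “Observed” columns themselves, here is one way streaks could be counted from a make/miss sequence. This is a sketch under stated assumptions: the function name is hypothetical, and how the original analysis treated streaks that span games or were still running when the data ended is not stated, so the choices below are mine.

```python
from collections import Counter

def count_streaks(shots):
    """Count runs of consecutive makes, each one ended by a miss.

    `shots` is a sequence of booleans (True = make). A miss that opens the
    sequence or directly follows another miss counts as a streak of length 0.
    A run still in progress when the data ends is not counted.
    """
    streaks = Counter()
    run = 0
    for made in shots:
        if made:
            run += 1
        else:
            streaks[run] += 1
            run = 0
    return streaks

# Made-up sequence: make, make, miss, miss, make, miss, make, make, make, miss, make
sample = [True, True, False, False, True, False, True, True, True, False, True]
print(count_streaks(sample))
```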

Here’s a visual version of the same data:

Stephen Curry Shot Streaks vs Expectation

LeBron James Shot Streaks vs Expectation

James Harden Shot Streaks vs Expectation

While it seems that the observed and expected distributions are close, we can actually quantify whether they are. We’ll use a chi-squared goodness-of-fit test to tell us whether the observed data fits the geometric distribution.

| Player | p-value | Chi-Squared Test | Conclusion |
|---|---|---|---|
| Curry | 0.89 | Fails to reject | Random |
| James | 0.63 | Fails to reject | Random |
| Harden | 0.89 | Fails to reject | Random |

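The p-value of a chi-squared test tells us how likely it is to see a mismatch at least this large between the observed and expected counts if the shots really did follow the theoretical distribution. With p-values this high, the test fails to reject the geometric model: the observed shot streaks are consistent with what the geometric distribution predicts.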
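The test can be sketched in plain Python using the expected and observed counts from the streak table. Two things here are assumptions rather than details given in the article: the nine streak-length buckets are used as-is, and the degrees of freedom are taken to be 8 (nine buckets minus one). The upper-tail formula used is a closed form that holds only for an even number of degrees of freedom.

```python
import math

def chi2_statistic(observed, expected):
    """Pearson chi-squared statistic: sum of (O - E)^2 / E over all buckets."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X >= x) for a chi-squared variable, even df only."""
    if df % 2 != 0:
        raise ValueError("this closed form requires an even number of degrees of freedom")
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))

# (expected, observed) counts for streak lengths 1..8 and 9+, copied from the table.
table = {
    "Curry": ([178.09, 86.69, 42.20, 20.54, 10.00, 4.87, 2.37, 1.15, 1.09],
              [184, 85, 47, 17, 7, 4, 2, 1, 0]),
    "James": ([174.15, 84.95, 41.44, 20.21, 9.86, 4.81, 2.35, 1.14, 1.09],
              [176, 96, 38, 19, 7, 2, 1, 1, 0]),
    "Harden": ([205.03, 90.17, 39.66, 17.44, 7.67, 3.37, 1.48, 0.65, 0.51],
               [207, 88, 41, 18, 7, 5, 0, 0, 0]),
}

p_values = {}
for player, (expected, observed) in table.items():
    stat = chi2_statistic(observed, expected)
    p_values[player] = chi2_sf_even_df(stat, df=len(expected) - 1)
    print(f"{player}: chi-squared = {stat:.2f}, p-value = {p_values[player]:.2f}")
```

Under these assumptions the p-values round to 0.89, 0.63, and 0.89, matching the reported figures. Several expected counts are below 5, where the chi-squared approximation is less reliable, so the exact values should be read with some caution.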

Bringing this back to basketball… what we get from this analysis is that shot streaks match what we’d expect if they were truly random. Even NBA superstars don’t “get hot” – instead, since they tend to have higher field goal percentages in general, they naturally have longer streaks of made shots.

One thing to keep in mind, however, is that the model doesn’t account for timing. It’s totally possible that a player’s shot streaks are geometrically distributed overall, but when you isolate playoffs or overtime games, as examples, they tend to have longer streaks than expected. I mean, did you see LeBron in that Pistons game?

There is No Place Like Home

By Jeffrey Gleason

Nine weeks into the NFL season, no teams remain unbeaten. This could’ve actually been said after eight weeks, after seven weeks, and after six weeks as well. Week 5 was the last time an unbeaten team remained, when both the Cardinals and Bengals were sitting at 3-0.

However, after these same nine weeks, five teams remain unbeaten at home. The Patriots, Broncos, Eagles, Packers, and Cardinals have yet to lose on their own turf.

Home field advantage is a phenomenon that gets a lot of traction in sports. Experts often use it to justify their predictions and betting lines usually reflect the perceived advantage of the home side. However, people often generalize home field advantage with a “one size fits all” approach, acknowledging its presence, but assuming it displays a constant impact across different situations.

With five unbeaten NFL home teams and the recent impetus of a road team finally winning Game 7 of the World Series (the Giants topped the Royals on October 29th to capture their third championship in five years), I was interested in how home field advantage was quantitatively different in different situations. How does it vary across sports? Do both good teams and bad teams experience the same advantage? Is it magnified in the postseason? What about differences in earlier eras? These are the questions I set out to resolve.


Age in the NBA: Do older teams “find ways to win games”?

By: Patrick Harrel

Tune in to any NBA team’s local broadcast, and you will be sure to hear a litany of clichés from the commentators. Most are quick to praise older players, noting on occasion that “veteran teams just know how to win games.” The San Antonio Spurs, for example, are lauded for their ability to defy expectations, winning another NBA title last year despite Tim Duncan and Manu Ginobili closing in on 40 and Tony Parker getting well into his 30’s. Like most musings from broadcasters, these assertions are driven by little more than perception—completely unfounded points made to maintain conversation in a long season.

But could there be some merit to these thoughts? There is no doubt that NBA teams are looking for any edge they can get, and do veteran players give them an advantage? Do older teams in the NBA truly “find ways to win games” like so many claim? That’s what we aimed to find out in this research.

To look at this issue, we consider an expected wins model first constructed by Bill James for baseball and later adapted by Daryl Morey for use in basketball. James dubbed his formula for expected wins the “Pythagorean expectation” because of its similarity to the Pythagorean theorem, and it relies on the well-established principle that point differential is a better estimator of team performance than raw Win-Loss data. James’ formula for the Pythagorean expectation of winning percentage is as follows:


The adapted NBA formula which we have used is the following:


Using this formula, we built a database with each team season for the last 20 years (excluding lockout years, which had odd statistical trends) with their Pythagorean expectation of wins. Using that figure, we compared the actual wins for every team to their expectation, and generated a residual figure for each individual team season. A positive residual means that a team won more games than their point differential would forecast, and a negative residual means the opposite.
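The residual calculation can be sketched as follows. James’ baseball formula raises runs scored and runs allowed to the power 2; for the NBA version, the exponent 13.91 used below is the value commonly attributed to Morey and is an assumption here, since the formula actually used is not shown in the text. The team figures are hypothetical.

```python
def pythagorean_win_pct(points_for, points_against, exponent):
    """Expected winning percentage from points scored and points allowed."""
    pf = points_for ** exponent
    pa = points_against ** exponent
    return pf / (pf + pa)

def residual_wins(actual_wins, points_for, points_against, games=82, exponent=13.91):
    """Actual wins minus Pythagorean expected wins (positive = overperformed)."""
    expected = games * pythagorean_win_pct(points_for, points_against, exponent)
    return actual_wins - expected

# Hypothetical team: 50 wins, scoring 102.0 and allowing 99.0 points per game.
print(f"residual wins: {residual_wins(50, 102.0, 99.0):+.2f}")
```

Per-game or season-total points give the same answer, because only the ratio of points scored to points allowed matters.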

The other major metric used in this section of the research is minutes-weighted age. Averaging the ages of all players on the roster does not give an accurate representation of a team’s effective age, as it fails to distinguish the impact of a player like Tim Duncan, who played over 2000 minutes for the Spurs in 2013-14 at the age of 37, from that of Steve Nash, who played all of 313 minutes in that same season. However, by weighting the average by the number of minutes each player has played, we get a much stronger metric for the effective age of an NBA team. We then centered the weighted age values on zero by subtracting the league average for each season, to account for changing ages in any given year.
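A minimal sketch of the minutes-weighted age calculation, with a hypothetical roster and a hypothetical league average (the function name and all numbers are illustrative):

```python
def minutes_weighted_age(roster):
    """Average age weighted by minutes played; `roster` is a list of (age, minutes)."""
    total_minutes = sum(minutes for _, minutes in roster)
    return sum(age * minutes for age, minutes in roster) / total_minutes

# Hypothetical roster: a 37-year-old starter, a 40-year-old who barely plays,
# and two younger rotation players.
roster = [(37, 2100), (40, 300), (26, 2500), (23, 1800)]
team_age = minutes_weighted_age(roster)
simple_age = sum(age for age, _ in roster) / len(roster)

league_average_age = 26.8  # assumed league-wide value for the same season
relative_age = team_age - league_average_age

print(f"simple average age:   {simple_age:.2f}")
print(f"minutes-weighted age: {team_age:.2f}")
print(f"relative to league:   {relative_age:+.2f}")
```

The 40-year-old pulls the simple average up noticeably but has little effect on the weighted figure, which is the point of the metric.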

With these two metrics in mind, we compared the residual values of expected versus actual wins to a team’s age vs. the league average, using a linear regression, and came up with the following results:


With the regression model above, we found that there was no significant relationship between a team’s minutes weighted age and their residual wins. Because the residual wins value was centered on 0, there is no intercept term, and the only thing that remains in the regression is the slope value times the explanatory variable, age. That slope value was essentially zero, with no statistical significance whatsoever.
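A regression with no intercept term has a simple closed form for the slope, sketched below. The data points are hypothetical stand-ins for (relative age, residual wins) pairs; they are not the values used in this research.

```python
def slope_through_origin(xs, ys):
    """Least-squares slope b for the model y = b * x (no intercept term)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# Hypothetical team seasons: age relative to league average, and residual wins.
relative_age = [-2.1, -0.8, 0.0, 0.9, 1.7, 2.4]
residual_wins = [1.5, -2.0, 0.3, 2.2, -1.8, 0.1]

slope = slope_through_origin(relative_age, residual_wins)
print(f"extra wins per year of relative age: {slope:.3f}")
```

A slope near zero, as reported above, would mean that a year of additional team age is associated with essentially no change in wins beyond the Pythagorean expectation.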


If you are interested, the data was as follows:


As you can see, the data yielded no statistically significant relationship between a team’s age and whether it outperformed or underperformed its expected wins. In fact, looking at publicly available data for the last 20 years, there were no apparent trends for teams outperforming their expected wins. Teams that played at a faster pace, shot a lot of threes, won a lot of games, or got to the free throw line a lot did not outpace their Pythagorean expectation by a statistically significant margin.

Perhaps this is not a groundbreaking result, but it highlights the effectiveness of the Pythagorean expectation: not only is it an unbiased estimator of a team’s winning percentage regardless of age, it is unbiased regardless of virtually any factor you can check. Teams that scored a lot, very little, or in the middle all tended to match their Pythagorean expectation on average. This unbiased nature of the Pythagorean estimator has its roots in its derivation. Research has shown point differential to be a better indicator of a team’s performance than winning percentage, and this further investigation supports that research.

This is just one way of evaluating whether veteran teams get an edge, but at least in this sector of our research, it is clear that older teams do not have any advantage. Older teams might play slower, shoot more threes, or dunk less, but they will match their Pythagorean expectation over time.

Catching Kareem

By Neil Rangwani

With opening night for the NBA regular season one week away, one storyline that isn’t getting much attention is Kobe Bryant’s pursuit of greatness. Already one of the greatest players of all time, Kobe enters this season with five championships, two Finals MVP Awards, a regular season MVP Award, fifteen All-NBA selections, two scoring championships, and innumerable comparisons to the G.O.A.T. However, one often overlooked career milestone is total points, in which Kobe is fourth, all-time, with 31,700 career points. The all-time leader, of course, is Kareem Abdul-Jabbar, with 38,387 points. With no top-tier teammates this year and in the foreseeable future to share the ball with, Kobe is uniquely positioned to make a run at the points record.

However, this past season certainly did not go according to plan for Kobe, who played in only 6 games as he recovered from injury. Now 35 years old, with 18 NBA seasons under his belt, and still recovering from a series of injuries, popular opinion is that Kobe’s chances of catching Kareem are slim. After reading this article, I decided to analyze Kobe’s chances of catching Kareem.

For reference, here’s a table of some of the top scorers in NBA history:


Although Kobe is pretty far from Kareem, he’s closing in on Michael Jordan, so I added Jordan’s 32,392 points as a benchmark in the analysis. I’ve also included some of the other leading scorers in the NBA: LeBron James, Carmelo Anthony, and Kevin Durant, to see if they have any chance of reaching the upper echelon of NBA scorers.
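The size of the gaps can be worked out directly from the point totals quoted above. The scoring rate of 25 points per game is a hypothetical input chosen for illustration; it is not a projection from this article.

```python
import math

KOBE_POINTS = 31_700
JORDAN_POINTS = 32_392
KAREEM_POINTS = 38_387

def games_needed(current, target, points_per_game):
    """Number of games needed to reach `target` at a constant scoring rate."""
    return math.ceil((target - current) / points_per_game)

assumed_ppg = 25.0  # hypothetical scoring average

for name, target in (("Jordan", JORDAN_POINTS), ("Kareem", KAREEM_POINTS)):
    gap = target - KOBE_POINTS
    games = games_needed(KOBE_POINTS, target, assumed_ppg)
    print(f"{name}: {gap} points away, about {games} games at {assumed_ppg} PPG")
```

At that assumed rate, the gap to Jordan is 692 points (28 games) and the gap to Kareem is 6,687 points (268 games, or more than three full 82-game seasons).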


Assessing NBA Scoring Champions Relative to League Average

A Historical Study

by Aqeel Phillips

With just a few weeks left in the regular season, some of us are left without much to root for anymore. HEAT fans remain optimistic in the surprisingly competitive battle for the first seed, and Suns, Mavs, and Grizzlies fans are biting their nails short in hopes that their teams can grab a playoff spot. However, a good percentage of us basketball fans now realize we have little to root for anymore (or if you’re a Sixers fan like me, you realized in about August), and are just waiting to see the final playoff seedings and end-of-season awards before the playoffs get underway. Besides the MVP, one of the most notable awards each year is the Scoring Title. Last season, we were treated with a thrilling ending as the battle for the Scoring Title came down to the wire between Kevin Durant and Carmelo Anthony.

This season, Kevin Durant, aka the Slim Reaper, has made things less interesting, currently scoring 32.2 points per game (PPG) to second-place Melo’s 28.0 PPG. Durant is the first player to average 30 points since he did so himself in the 2009-10 season. The NBA has had a notable drop in scoring lately, a trend that started when hand-checking rules were instituted in the early 2000s and continued as many teams embraced sharing the ball in order to find open looks, namely threes, rather than relying on singular scorers. Durant’s current season widens eyes at first glance; averaging 4 points more than his next closest competitor will do that. But I find that PPG by itself doesn’t tell the full picture. Elgin Baylor averaged over 38 points in 1961-62, but that was over 50 years ago in a completely different league. So who had the most impressive season: 2014 Durant? 1962 Baylor? 2006 Kobe? We’ve witnessed plenty of monstrous seasons, and this study examines them in relation to the rest of the league at the time to contextualize the simple PPG marks.

League Scoring Average (Season)

To get a better comparison between scoring performances, we can divide a player’s PPG by his minutes per game (MPG) to see how he scores relative to the opportunities he is given. This is especially useful in calculating a league average scoring mark. We don’t want end-of-bench players who average 0.6 PPG to drag down the entire league scoring average, most importantly because they outnumber the talented, 20+ PPG scorers in the league. Dividing PPG by MPG for each player levels the playing field, and also accounts for the possibility that in any given season the league as a whole played significantly more or fewer bench or low-scoring players for whatever reason (for example, in the ’60s there were far fewer players in the league and more minutes and points to go around).

For reference, here are the Points Per Minute values for the current league leaders in scoring:

League Leaders

(For those wondering about a full list of the league leaders in PPM, see the appendix)

In terms of points scored per time played, you can see that Durant is not just scoring at an average rate while playing more minutes, he is scoring more efficiently than the players below him on the list (shown by a higher PPM value than his competitors). It’s interesting to note that Melo averages more minutes than Durant, but Durant makes much better use of his time, scoring-wise, than Melo (Durant is also more efficient with his shot attempts – averaging 20.7 field goal attempts per game to Melo’s 21.5). This gives more evidence to Durant’s case for “best scorer in the league” – not only does he have the sheer output, but he also has the efficiency.

Next, we’ll calculate the average PPM value for the entire league, and compare each individual player to that average, to see how much better they score than the average replacement.

Unlike other studies I’ve done, I haven’t artificially subtracted out all of the players that aren’t contributing much (<20 MPG, <30 GP in previous articles), as using PPM should even out all contributions.
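The per-minute comparison described above can be sketched as follows. Only the 32.2 PPG figure comes from the article; the minutes values, the other players, and the choice of an unweighted mean across players are assumptions made for illustration.

```python
def points_per_minute(ppg, mpg):
    """Scoring rate: points per game divided by minutes per game."""
    return ppg / mpg

def league_average_ppm(players):
    """Unweighted mean of each player's PPM; `players` is a list of (ppg, mpg)."""
    rates = [points_per_minute(ppg, mpg) for ppg, mpg in players if mpg > 0]
    return sum(rates) / len(rates)

# Hypothetical mini-league of (PPG, MPG) pairs, including an end-of-bench player.
league = [(32.2, 38.5), (28.0, 38.9), (12.4, 26.0), (0.6, 3.1)]

leader_ppm = points_per_minute(*league[0])
average_ppm = league_average_ppm(league)
print(f"leader PPM:         {leader_ppm:.3f}")
print(f"league average PPM: {average_ppm:.3f}")
print(f"leader vs. average: {leader_ppm / average_ppm:.2f}x")
```

Because the bench player’s 0.6 PPG is divided by his 3.1 minutes, he enters the average with a rate of about 0.19 rather than dragging it toward zero.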


Using Weighted Player Efficiency Rating to Predict the NBA Playoffs

By Neil Rangwani

This time of year means a few things in the world of sports: March Madness highlights take over ESPN, baseball stadiums start to fill up, and Knicks fans await their inevitable disappointment.

This NBA season looks remarkably competitive: the top of the league is crowded with legitimate contenders. The defending champion Heat and the Pacers, although sliding a bit recently, look to be the favorites in a weak East, while the Thunder, Clippers, and an extremely hot Spurs team each look like they could win the West.

In order to take a closer look at the playoff picture, we wanted to rank teams according to a metric that took into account various facets of a player’s game, so we decided to calculate a team equivalent of Player Efficiency Rating (PER). We took a relatively simple approach, since PER encompasses a number of basic statistics.

Introducing Weighted Player Efficiency Rating (WPER)

Using data for each player over the past four NBA seasons, we weighted each player’s PER by their playing time as a fraction of their team’s total playing time in order to account for a player’s actual usage. Then, we found each team’s Weighted Player Efficiency Rating (WPER) by summing the values for each player on each team.
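The weighting just described amounts to a minutes-weighted sum of PER values. A minimal sketch, with a hypothetical roster and function name:

```python
def team_wper(players):
    """Sum of each player's PER weighted by his share of team minutes.

    `players` is a list of (per, minutes) pairs for one team.
    """
    total_minutes = sum(minutes for _, minutes in players)
    return sum(per * minutes / total_minutes for per, minutes in players)

# Hypothetical roster of (PER, minutes played).
roster = [(27.5, 2900), (19.0, 2600), (15.2, 2400), (11.0, 1500), (8.5, 600)]
print(f"team WPER: {team_wper(roster):.2f}")
```

A high-PER player who rarely plays contributes little to the team figure, which is the reason for weighting by playing time rather than averaging PER across the roster.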


Dishes and Dimes Part II – Passing Efficiency

by Aqeel Phillips

Halfway through the current NBA season, fans have celebrated and lamented the position of their teams as the contenders and lottery teams separate themselves from the pack. On the flip side, NBA stat geeks have begun universally celebrating as the SportVU player tracking system has filled up with an ample pool of data and now possesses a respectable sample size. More than 41 games into the season, we can not only start to project playoff seeding and start pondering matchups, but we can also begin to accept players’ performances so far as an expectation of how they will finish the season as well (barring injury or possible team-afflicting swaps at the trade deadline). SportVU allows us to take a deeper look at these performances, past the simple statlines of points, rebounds, and assists, and really get our hands dirty in finding out what makes each team and player special.

A Revisit

To start, I’d like to revisit my previous article with a few revisions. A reader pointed out that the passing player’s free throws were not being subtracted from the team free throws, so players like LeBron James and Russell Westbrook benefitted from taking many free throws. In addition, it appears that Assist Percentage is a more helpful stat to use than Assist Rate for calculating free throws. The former is simply a percentage created by the amount of field goals assisted by a player out of the total team field goals made, while Assist Rate is a more involved metric that counts assists versus possessions in a game. Lastly, player minutes need to be factored in as well. Team points from free throws are tallied over the entire game, but a player is only on the court for a fraction of the game to assist on those free throws. As a result, we need to multiply the team free throws per game by the fraction of the game that a player is on the court.
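One possible reading of the three revisions, combined into a single estimate, is sketched below. The original formula lives in the previous article and is not reproduced here, so the function, its name, and the way the pieces are multiplied together are assumptions, and the inputs are hypothetical.

```python
def projected_ft_points_from_passing(team_ft_points, player_ft_points,
                                     assist_pct, minutes, game_minutes=48.0):
    """Estimate free-throw points per game created by a player's passing.

    Assumed steps, following the revisions described in the text:
      1. remove the passer's own free throws from the team total,
      2. apply the passer's Assist Percentage,
      3. scale by the fraction of the game the passer is on the court.
    """
    teammates_ft_points = team_ft_points - player_ft_points
    court_fraction = minutes / game_minutes
    return teammates_ft_points * assist_pct * court_fraction

# Hypothetical inputs: team makes 20 FT points per game, 4 of them by the passer,
# who assists on half of teammates' baskets and plays 24 of 48 minutes.
print(projected_ft_points_from_passing(20.0, 4.0, 0.50, 24.0))  # 4.0
```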

Here is a comparison of my formula (specified in previous article) compared to the concrete data that SportVU provides this season, using this season’s data rather than the 2012-13 data I used previously.


The formula has its flaws; specifically, it tends to overestimate the number of free throws catalyzed by a player’s passing. For example, the formula assumes that Chris Paul’s ridiculous 53.8% assist percentage also applies to the number of free throws shot while he is on the floor. The formula projects him to catalyze 5.8 FTs per game, while the SportVU data report that he catalyzes only 0.9 per game (almost the full difference between his projected points and his contributed points). Overall, I believe it still gives a fairly good projection of how many total points a player is contributing, and it can still be a valuable tool for getting a picture of players’ contributions before SportVU was available.

(Note: AST+ is not available for this season, so I was forced to calculate it myself. A full explanation can be found after the conclusion of the article).

Introducing Passing Efficiency

SportVU has been tracking two pieces of player data never readily available before: Passes per Game and Points Created by Assist per Game (as mentioned previously). The points are a combination of passes leading to two-pointers, threes, free throws, and passes leading to assists (“Hockey assists”). To get a picture, here are the current top five in Passes per Game and Points Created by Assist per Game (which is desperately in need of a fancy acronym).


Return to old format to affect length, not champion, of NBA Finals

by Gina Talt


The 2013-14 NBA season is officially underway, and more has changed than just the Brooklyn Nets and Dwight Howard’s jersey.  Last week the NBA owners voted unanimously to return to the 2-2-1-1-1 format of the pre-1985 Finals after 29 years of playing under the 2-3-2 format. This change will take place immediately, beginning with the 2014 Finals.

How will the new format affect future Finals series?

According to Commissioner David Stern, the format change reflected a sentiment among teams that the higher seed was not sufficiently rewarded for its better season record under the old 2-3-2 format. They believed it was unfair for the team with home-court advantage to have to play a crucial Game 5 on the road. Over the last 29 years, the higher seed has been down two games to three after Game 5 nine times. On closer inspection, however, this situation has become more common recently: the higher seed has trailed the lower seed after Game 5 in four of the previous eight Finals and three of the last four. While the higher seed had home-court advantage in Games 6 and 7, it needed to win both games to come out on top. Yet history shows that the lower seed had a slight advantage at this point: it ended up with the Larry O’Brien Trophy five out of the nine times it held a one-game lead going into Game 6.

The data seem to back the NBA’s rationale for the change, but how likely was the same scenario under the new 2-2-1-1-1 format which was in place for 28 years before 1985?
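As a side illustration of how the two formats differ, the sketch below computes the higher seed’s chance of winning the series and the expected number of games under each schedule. It is not the historical analysis from this article: it assumes games are independent and that the higher seed has a fixed win probability at home and on the road, and the probabilities used are hypothetical.

```python
def series_outcome(schedule, p_home, p_away, wins_needed=4):
    """Return (P(higher seed wins the series), expected number of games).

    schedule: string of 'H'/'A' venues from the higher seed's point of view.
    p_home / p_away: assumed chance the higher seed wins a home / road game.
    """
    states = {(0, 0): 1.0}  # (higher-seed wins, lower-seed wins) -> probability
    p_series = 0.0
    expected_games = 0.0
    for game_number, venue in enumerate(schedule, start=1):
        p_win = p_home if venue == "H" else p_away
        next_states = {}
        for (wins, losses), prob in states.items():
            for w, l, step in ((wins + 1, losses, p_win), (wins, losses + 1, 1 - p_win)):
                q = prob * step
                if w == wins_needed or l == wins_needed:
                    # Series ends with this game.
                    expected_games += game_number * q
                    if w == wins_needed:
                        p_series += q
                else:
                    next_states[(w, l)] = next_states.get((w, l), 0.0) + q
        states = next_states
    return p_series, expected_games

FORMAT_2_3_2 = "HHAAAHH"      # higher seed hosts Games 1, 2, 6, 7
FORMAT_2_2_1_1_1 = "HHAAHAH"  # higher seed hosts Games 1, 2, 5, 7

for label, schedule in (("2-3-2", FORMAT_2_3_2), ("2-2-1-1-1", FORMAT_2_2_1_1_1)):
    p_series, length = series_outcome(schedule, p_home=0.65, p_away=0.45)
    print(f"{label}: P(higher seed wins) = {p_series:.4f}, expected games = {length:.3f}")
```

Under these assumptions the higher seed’s chance of winning the series is identical in both formats, because each gives it four home games out of seven, while the expected length of the series differs.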