So, do you disagree with the Wikipedia entry on R squared when it says "It is the proportion of variability in a data set that is accounted for by the statistical model", or that "An interior value such as R2 = 0.7 may be interpreted as follows: 'Approximately seventy percent of the variation in the response variable can be explained by the explanatory variable. The remaining thirty percent can be explained by unknown, lurking variables or inherent variability.'"?
[rquoter] R2 is often interpreted as the proportion of response variation "explained" by the regressors in the model. Thus, R2 = 1 indicates that the fitted model explains all variability in y, while R2 = 0 indicates no 'linear' relationship (for straight line regression, this means that the straight line model is a constant line (slope=0, intercept=\bar{y}) between the response variable and regressors). An interior value such as R2 = 0.7 may be interpreted as follows: "Approximately seventy percent of the variation in the response variable can be explained by the explanatory variable. The remaining thirty percent can be explained by unknown, lurking variables or inherent variability." A caution that applies to R2, as to other statistical descriptions of correlation and association is that "correlation does not imply causation." In other words, while correlations may provide valuable clues regarding causal relationships among variables, a high correlation between two variables does not represent adequate evidence that changing one variable has resulted, or may result, from changes of other variables. [/rquoter]
Wow, you have a nice way of quoting stuff; I've got to learn that trick. Good that you quoted the wiki; hopefully now you can see that you interpreted your own graph wrong. Oh, you do read the full paragraph, not just the key words, right?
"Approximately seventy percent of the variation in the response variable can be explained by the explanatory variable" is not the same as "70% variation". The quote explains what R squared means, but it does not say that people write 30% for an R squared of 0.3. I use R squared calculations all the time and have never seen anyone use a percentage to report this statistic. It's like how nobody writes 530% for $5.30, even though the two are mathematically equivalent.
Let's not be rude. I'm not interested in getting into a heated argument on this topic. Let's stick to what each of us has said, rather than make tiresome assumptions about each other's agenda or competence. Are you agreeing or disagreeing here that one can interpret R squared as the percent of variation in one variable explained by another? You wrote that I didn't understand what R squared means, but what were the particular words I used that you took exception to? The article says that correlations "may provide valuable clues regarding causal relationships", though a high correlation cannot be considered adequate evidence that changes in the variables are interdependent. I feel this is in keeping with the words I used:
I don't understand. I'll quote myself again: How is that different from: "Approximately X percent of the variation in the response variable can be explained by the explanatory variable"?
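For what it's worth, here is a minimal Python sketch of why the two readings coincide. The numbers are made up for illustration (they are not the Rockets data), and the function names are my own. For a straight-line fit, R squared computed as 1 - SS_res/SS_tot comes out equal to the squared correlation, and printing it as a decimal or as a percent is the same quantity.

```python
# Illustrative only: x and y are made-up numbers, not data from this thread.

def mean(values):
    return sum(values) / len(values)

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def r_squared_from_fit(x, y):
    """R^2 = 1 - SS_res / SS_tot for the least-squares line y = a + b*x."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1 - ss_res / ss_tot

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 4.2, 3.8, 6.1, 5.7]

r = pearson_r(x, y)
r2 = r_squared_from_fit(x, y)

# The same number shown three ways: squared correlation, fit-based R^2, percent.
print(f"r = {r:.3f}, r^2 = {r * r:.3f}, R^2 from fit = {r2:.3f} ({r2:.1%})")
```

Whether to print the result as a fraction or a percent is only a formatting choice; the interpretation as "share of variation explained" does not change.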
Durvasa, Thanks for putting this together. This sounds like stuff we already know or kind of assumed. Coach doesn't trust Lin (inconsistency) and Parsons is overrated. Savvy beyond years?...only on here.
I agree. How many playoff games has McHale coached for any team? The reality is that the whole team and the coaches need time to learn and grow, so we'd better keep our emotional reactions to wins and losses in check a little bit based on that. PS: I have all the respect in the world for McHale as an NBA Hall of Fame player, but, to quote him, "I don't know" about McHale as the head coach of a title-contending team.
Durvasa, I appreciate the effort you put into this analysis to help us enjoy the NBA games more with the left side of our brain.
As always, great post. I think as a casual eye test observation (even though I like to avoid those), Asik's minutes seem to vary based on his fatigue as well. Even when he's performing well McHale may sub him out due to that factor. He seems to tire out more quickly than the other rotation guys.
I thought it might be interesting to zero in on how the metrics of players at the same position relate to each other. I did this with Lin and Douglas:

Correlation Table
-------------------------------------------------------------------------------
                                Lin                          Douglas
                          Min GmScr/min   +/-/min       Min GmScr/min   +/-/min
-------------------------------------------------------------------------------
         Min           +1.000    +0.454    +0.391    -0.448    -0.322    -0.168
Lin      GmScr/min     +0.454    +1.000    +0.525    +0.188    -0.212    -0.011
         +/-/min       +0.391    +0.525    +1.000    -0.159    -0.238    -0.137
-------------------------------------------------------------------------------
         Min           -0.448    +0.188    -0.159    +1.000    +0.448    +0.198
Douglas  GmScr/min     -0.322    -0.212    -0.238    +0.448    +1.000    +0.467
         +/-/min       -0.168    -0.011    -0.137    +0.190    +0.467    +1.000
-------------------------------------------------------------------------------

R squared Table
-------------------------------------------------------------------------------
                                Lin                          Douglas
                          Min GmScr/min   +/-/min       Min GmScr/min   +/-/min
-------------------------------------------------------------------------------
         Min           100.0%     20.6%     15.3%     20.1%     10.4%      2.8%
Lin      GmScr/min      20.6%    100.0%     27.6%      3.5%      4.5%      0.0%
         +/-/min        15.3%     27.6%    100.0%      2.5%      5.7%      1.9%
-------------------------------------------------------------------------------
         Min            20.1%      3.5%      2.5%    100.0%     20.1%      3.6%
Douglas  GmScr/min      10.4%      4.5%      5.7%     20.1%    100.0%     21.8%
         +/-/min         2.8%      0.0%      1.9%      3.6%     21.8%    100.0%
-------------------------------------------------------------------------------

So, based on the sample of games in which both players played this season, about 20.1% of the variation in one player's minutes can be explained by the other player's minutes (though, of course, the two are negatively correlated), which is roughly on the same scale as how much of the variation in their minutes is explained by their own per-minute statistical performance.
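In case it helps anyone follow along, the R squared table is just the correlation table squared entry by entry. A small Python sketch using the first row of the correlation table above (Lin's minutes against the other five columns); the dictionary and its labels are my own naming, the values are copied from the table:

```python
# Correlations of Lin's minutes with the other columns, copied from the
# first correlation table (all games). Dictionary keys are illustrative labels.
lin_min_corr = {
    "Lin GmScr/min": 0.454,
    "Lin +/-/min": 0.391,
    "Douglas Min": -0.448,
    "Douglas GmScr/min": -0.322,
    "Douglas +/-/min": -0.168,
}

# R squared for a pairwise relationship is the squared correlation, so the
# sign (direction) is dropped and only the strength of the link remains.
lin_min_r2_pct = {name: round(r * r * 100, 1) for name, r in lin_min_corr.items()}

for name, pct in lin_min_r2_pct.items():
    print(f"{name:>18}: r = {lin_min_corr[name]:+.3f} -> R^2 = {pct:.1f}%")
```

Because squaring discards the sign, the R squared table alone cannot tell you that the two players' minutes move in opposite directions; you need the correlation table for that.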
However, discarding the San Antonio game that was played without Harden (which, I had a hunch, distorted these relationships) changes the picture:

Correlation Table
-------------------------------------------------------------------------------
                                Lin                          Douglas
                          Min GmScr/min   +/-/min       Min GmScr/min   +/-/min
-------------------------------------------------------------------------------
         Min           +1.000    +0.395    +0.419    -0.675    -0.347    -0.169
Lin      GmScr/min     +0.395    +1.000    +0.616    -0.040    -0.265    -0.002
         +/-/min       +0.419    +0.616    +1.000    -0.152    -0.236    -0.138
-------------------------------------------------------------------------------
         Min           -0.675    -0.040    -0.152    +1.000    +0.484    +0.228
Douglas  GmScr/min     -0.347    -0.265    -0.236    +0.484    +1.000    +0.469
         +/-/min       -0.169    -0.002    -0.138    +0.228    +0.469    +1.000
-------------------------------------------------------------------------------

R squared Table
-------------------------------------------------------------------------------
                                Lin                          Douglas
                          Min GmScr/min   +/-/min       Min GmScr/min   +/-/min
-------------------------------------------------------------------------------
         Min           100.0%     15.6%     17.5%     45.5%     12.0%      2.9%
Lin      GmScr/min      15.6%    100.0%     38.0%      0.2%      7.0%      0.0%
         +/-/min        17.5%     38.0%    100.0%      2.3%      5.6%      1.9%
-------------------------------------------------------------------------------
         Min            45.5%      0.2%      2.3%    100.0%     23.4%      5.2%
Douglas  GmScr/min      12.0%      7.0%      5.6%     23.4%    100.0%     22.0%
         +/-/min         2.9%      0.0%      1.9%      5.2%     22.0%    100.0%
-------------------------------------------------------------------------------
Over all the games, it seems that the minutes of the other player at the position don't do any better at explaining the variation in a player's minutes than his own individual performance does (both ~20%), which I thought was a curious result. However, if we discard the San Antonio game, when both players were forced to play big minutes due to other circumstances, the picture aligns more with what one would expect. Now ~45% of the variation in their minutes can be explained by the minutes of the other player, while individual performance explains ~16% and ~23% for Lin and Douglas, respectively (which is still higher than for the other 4 players). The fact that removing a single game from the sample can change the correlation so dramatically also highlights that this sort of analysis over a 30-35 game sample should probably be taken with a grain of salt. Again, just food for thought.
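To illustrate how a single game can move a correlation in a small sample, here is a rough Python sketch. The minute totals are invented for illustration (they are not the actual Lin/Douglas box scores), and the names guard_a and guard_b are hypothetical. The last game mimics a night when both guards were forced to play big minutes.

```python
# Hypothetical minutes for two guards sharing a position; NOT real box-score data.
# In the first ten games their minutes roughly trade off; in the last game
# both play heavy minutes, as if another guard were unavailable.
guard_a = [34, 30, 36, 28, 33, 25, 37, 31, 29, 35, 40]
guard_b = [18, 22, 15, 24, 19, 27, 14, 21, 23, 16, 33]

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

r_all = pearson_r(guard_a, guard_b)                 # every game
r_trimmed = pearson_r(guard_a[:-1], guard_b[:-1])   # unusual last game dropped

print(f"all games:       r = {r_all:+.3f}, R^2 = {r_all ** 2:.1%}")
print(f"without outlier: r = {r_trimmed:+.3f}, R^2 = {r_trimmed ** 2:.1%}")
```

With so few games, one unusual night carries a lot of weight in the sums that define the correlation, which is why the trimmed and untrimmed figures can differ so much.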
I have a short memory, but did McHale exhibit the same lack of trust in Dragic/Lowry last year? That might explain why both point guards were discontented at some point during the season. It's possible McHale just isn't a good coach for point guards.
Nice work. Since you first explained the three stats in the order Min, GmScr/min, +/-/min, shouldn't you reverse the blue and red metrics in the graph to avoid confusion? I had to keep looking back and forth to understand the order and meaning of the bars.
The bars don't correspond to those 3 stats. They correspond to the relationships between Min and GmScr/min, Min and +/-/min, and GmScr/min and +/-/min, in that order.
Thanks for sticking up for statistical science, especially "correlation does not equal causation". And, Durvasa, thanks for the original analysis which has led to a very interesting discussion.
For last year, I needed to distinguish games when each player started versus came off the bench.

Lowry: in 38 games as a starter, the variation in his minutes had very little connection to his individual production (2.1%, and the correlation was actually slightly negative) or to how the team was doing with him (4.3%). In 9 games as a reserve at the end of the season (caution: very small sample size), his GmScr/min explained 53.1% of the variation in his minutes and his +/-/min explained 0.7%.

Dragic: in 38 games as a reserve, his GmScr/min explained 12.8% of the variation in his minutes, and his +/-/min explained 19.8% of that variation. As a starter (28 games), his GmScr/min explained only 6.8% of the variation in his minutes, and his +/-/min explained 24.8%.

In terms of "trust factor", I think McHale trusted Lowry for the first half of the season when he was a starter. Understandably so. Lowry was essentially our leader and one of the more experienced players on the team. He had a very good start to the season, and Dragic was somewhat shaky as a reserve. That's how I'd interpret the fact that the correlations between his minutes and in-game factors like GmScr/min and +/-/min were so low.