Data analysts and statisticians, NBA data trend

grt004 · Nov 27, 2015

I am looking to maximize daily fantasy points in the NBA. One of the variables I am considering is whether the number of rest days between games affects how a player performs. I pulled all the box score data into Excel. How do I go about confirming my hypothesis that the more rest days a player has, the better he performs?

I was thinking about using a one-way ANOVA test, with the p value and F value, to determine this. Am I headed in the right direction?

I used 2014-2015 Carmelo Anthony data for games in which he played more than 25 minutes: he had 8 back-to-back games, 22 games with 1 full day of rest, and 10 games with 2 or more days of rest.

I found the mean and std for the following groups:
Back to Back: 31, 9
1 full day rest: 39, 9.6
2 or more rest: 38, 8

I ran a one-way ANOVA test and received a p value of less than .01 and an F value of 2.5.
From these numbers there seems to be a slight correlation. How do I generalize this to the whole NBA?
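As a rough cross-check, a one-way ANOVA can be recomputed from the posted group sizes, means, and standard deviations alone. The sketch below assumes each pair above is (mean, std) of the same per-game stat, and it uses the rounded values, so it will not match a spreadsheet run on the raw games exactly. The function names are made up for illustration, and the closed-form p value only applies when there are exactly three groups.

```python
def anova_from_summary(groups):
    """One-way ANOVA from (n, mean, sd) triples. Returns (F, df_between, df_within)."""
    n_total = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / n_total
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def p_value_three_groups(f_stat, df_within):
    """Upper-tail p value of F(2, df_within); this closed form needs df_between == 2."""
    return (df_within / (df_within + 2.0 * f_stat)) ** (df_within / 2.0)


# (number of games, mean, std) as posted: back to back, 1 full day rest, 2 or more rest
melo = [(8, 31.0, 9.0), (22, 39.0, 9.6), (10, 38.0, 8.0)]

f_stat, df_b, df_w = anova_from_summary(melo)
p = p_value_three_groups(f_stat, df_w)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}, p = {p:.3f}")
```

With these rounded inputs the F statistic lands close to the posted 2.5, but the p value comes out around 0.11 rather than below .01, so the posted p value is probably worth rechecking before generalizing.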

Cohete Rojo · Nov 27, 2015

So you like to gamble?

CCorn · Nov 27, 2015

Mean std? They make a pill for that.

grt004 · Nov 27, 2015

Cohete Rojo said: ↑

So you like to gamble?

More like a website homie

grt004 · Nov 27, 2015

CCorn said: ↑

Mean std? They make a pill for that.

standard deviation

moestavern19 · Nov 27, 2015

Fantasylabs.com

RedRedemption · Nov 27, 2015

Statistics doesn't hold in sports because you don't have a control. If Melo gets 3 days of rest and plays against the Sixers, as opposed to the tail end of a back-to-back against, say, the Warriors...

Not to mention injuries, off the court issues, etc.

Too many variables to really make any sort of meaningful statistical model for sports.

grt004 · Nov 27, 2015

I know there is a large amount of variance. However, throughout the course of the season those variances even out.

grt004 · Nov 27, 2015

I just want to reach a general statement: in general, does a back-to-back affect a player's performance? There are many stats tests you can run. I just don't know which one, if any at all, can be applied to this situation. I am just looking for stats advice. Thanks for the constructive input!

grt004 · Nov 27, 2015

I am looking at splitting all NBA teams' games into three groups: back to back, 1 full day of rest, and 2 or more full days of rest. Group 1 will only include the mean for back-to-back games. Group 2 will use the mean for games after 1 full day of rest. Lastly, group 3 will use the mean for games after 2 or more days of rest.

From the results, the F value will be more important than the p value; as long as it is above .2, I am happy. The last test I will do is a post hoc analysis to measure the variance between each pair of groups, i.e. 1 and 3, 2 and 3, 1 and 2.

I just want to reach a general conclusion. Nothing less, nothing more.

Mr. Brightside · Nov 27, 2015

Is this gambling? If so, it is a sin and the Lord and government frowns down upon you.

durvasa · Nov 27, 2015

You should post your question in the following forum rather than Clutchfans Hangout:

http://apbr.org/metrics/viewforum.php?f=2

grt004 · Nov 27, 2015

Repped

fallenphoenix · Nov 27, 2015

Run a regression and check the adjusted R-squared.

professorjay · Nov 27, 2015

I believe Haralabos Voulgaris said so and he's broken all this stuff down into a science. I think he has a PhD statistician on staff.

You can Google around to see if he's been quoted or just ask on Twitter.

Cohete Rojo · Nov 27, 2015

grt004 said: ↑

More like a website homie

Why don't you just break everything down into an RDB and start SQLing?

Northside Storm · Nov 28, 2015

What you could do is go with a new model, a Gaussian Naive Bayes classifier, and spit out a probability for a range of fantasy points (assuming a combination of steals/points/ASTs/rebounds) depending on a set of previously observed variables. You could predict from # hours of rest (getting even more granular than days), but other interesting variables to consider would be the opposing team's current defensive rating and the pace of the team in question (a proxy for in-game exertion, which is hard to define with the standard of minutes played: playing 30 minutes with the slowest-paced team in the NBA could be a lot less tiring than with the fastest-paced).

Bayes > Frequentist for this

Here's a tutorial (ignore the math; you can implement it with scikit-learn/Python without knowing any of the equations if you know Python or have some programming experience; if not, skip this section):

http://www.autonlab.org/tutorials/gaussbc12.pdf

The programming implementation can be just a few lines if you know Python:

http://scikit-learn.org/stable/modules/naive_bayes.html
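To give a feel for what the classifier does, here is a minimal from-scratch sketch using only the standard library. scikit-learn's `GaussianNB` class covers the same idea in a few lines; this version is just meant to run anywhere. The feature columns (hours of rest, opponent defensive rating, team pace), the labels, and every number in the training rows are made up for illustration, and the helper names are hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(rows, labels):
    """Estimate each class prior plus a per-feature mean and variance."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    model = {}
    for label, members in by_class.items():
        stats = []
        for column in zip(*members):
            mean = sum(column) / len(column)
            var = sum((x - mean) ** 2 for x in column) / len(column)
            stats.append((mean, var + 1e-9))  # small floor avoids division by zero
        model[label] = (len(members) / len(rows), stats)
    return model


def predict_proba(model, row):
    """Posterior probability of each class, treating features as independent Gaussians."""
    log_scores = {}
    for label, (prior, stats) in model.items():
        log_p = math.log(prior)
        for x, (mean, var) in zip(row, stats):
            log_p += -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
        log_scores[label] = log_p
    top = max(log_scores.values())  # subtract the max before exponentiating for stability
    weights = {label: math.exp(score - top) for label, score in log_scores.items()}
    total = sum(weights.values())
    return {label: weight / total for label, weight in weights.items()}


# Made-up training rows: (hours of rest, opponent defensive rating, team pace)
games = [(20, 108, 98), (22, 105, 97), (24, 110, 99), (21, 103, 96),
         (48, 104, 96), (70, 109, 100), (46, 107, 98), (72, 111, 99)]
# Made-up labels: did the player reach 20+ fantasy points in that game?
labels = ["under20", "under20", "over20", "under20",
          "over20", "over20", "over20", "over20"]

model = fit_gaussian_nb(games, labels)
probs = predict_proba(model, (23, 104, 97))  # a hypothetical upcoming game
print(probs)
```

The output is a probability per class rather than a point estimate of fantasy points, so the labels have to be bucketed (here 20+ versus under 20) before training.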

If you don't know Python, Bayes, or machine learning:

http://learnpythonthehardway.org/

then

http://www.diveintopython.net/

then

https://github.com/hangtwenty/dive-into-machine-learning

then

https://camdavidsonpilon.github.io/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/

------------------------------------------------

If that's not the path you want to take, I'd probably pay up for https://downloads.nbastuffer.com/nba-team-data-sets?ap_id=nbastuffer or find a way to get data from http://www.basketball-reference.com/ and continue the work you've been doing. That paid data set looks like it has everything you'd need (# days of rest, pace, opposing team defensive efficiency, etc.), and I know Basketball Reference has everything you'd need, though I don't know what export options it offers.

With larger data sets that still fit in the memory of your computer, you may want to consider switching to Pandas/Python code to read CSVs/spreadsheet documents instead of using Excel, since Excel has a lot of overhead for even reasonably sized spreadsheets.

http://jvns.ca/blog/2013/12/22/cooking-with-pandas/

Be aware that if the mean and STD are for minutes played and not daily fantasy points generated, what you've just figured out is that Carmelo Anthony plays significantly more when the Knicks are not in a back-to-back situation (this is probably due to the coaches deciding to rest him). That isn't what you're looking for unless minutes are strongly correlated with the number of fantasy points (i.e., for every % increase in minutes, there is a corresponding proportional % increase in fantasy points).

In that case, if you generalized your current model to all players, you'd just be figuring out that players tend to play fewer minutes in back-to-backs, not that they'd be better players as a whole. It's a reasonable assumption to say the two may be correlated, but you're not looking for exactly what you need to look for to figure that out. You need to use daily fantasy points accrued to make that measure, which you could do by applying your league rules to previous box scores, and then measuring for variance between the three conditions you used with the same test.
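One quick way to check how tightly minutes and fantasy points move together is a Pearson correlation, plus a per-minute rate to see whether production holds up once playing time is factored out. This is only a sketch: the per-game numbers are invented, and `pearson_r` is a hand-rolled helper, not something from the thread.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up per-game values for one player
minutes = [28, 31, 34, 36, 38, 40]
fantasy_points = [27.5, 30.0, 36.5, 35.0, 41.0, 44.5]

r = pearson_r(minutes, fantasy_points)
# Fantasy points per minute, to compare games independent of playing time
per_minute = [fp / m for fp, m in zip(fantasy_points, minutes)]
print(f"r = {r:.2f}")
print([round(rate, 2) for rate in per_minute])
```

A correlation near 1 would suggest a minutes-based comparison and a fantasy-points comparison tell roughly the same story; a weak one means the rest-day test has to be rerun on fantasy points directly.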

(random question, why did you decide to make the cutoff 25 minutes? Was that a certain significance level for the mean of minutes Carmelo plays in a "normal, injury-free" situation, or an arbitrary metric?)

grt004 · Nov 28, 2015

Great information. I will look into the suggested material to further analyze the information. I retrieved all the box score information by using Python to extract all the player IDs from ESPN, since that is the only thing that differentiates between players. Basketball Reference was not so easy... I am more of an Excel/VBA person than a web scraper. I put this format (http://espn.go.com/nba/player/gamelog/_/id/[playerID]/year/2015/) into an .iqy file. I used that file as a reference for collecting external data. I used Carmelo as an example. His player ID is 1975. This method is not robust, but it got the job done.

You are right. I need to see how many minutes were played during back to backs. Then, make a correlation between minutes and fantasy points. I chose 25 minutes just as a general marker.

I am not sure what you mean by, "You need to use daily fantasy points accrued to make that measure, which you could do by applying your league rules to previous box scores, and then measuring for variance between the three conditions you used with the same test." Could you possibly ELI5? lol

Northside Storm · Nov 28, 2015

Hope it helps. Naive Bayes isn't the best for this, especially if you're looking to estimate a rock solid probability of fantasy points as opposed to classifying what sort of performance (chance of 20+ or 20- fantasy points) you would expect, but it's fairly easy to get up and running.

With your current scraping method, hopefully you have a loop that collects data across a large range of player pages, and ESPN doesn't have a weird method of distributing player profiles. I can see that becoming pretty heavy if you were looking to do an aggregate of all players.
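A loop like that could start from the URL format quoted earlier in the thread. The sketch below only builds the game-log URLs; it assumes that format still resolves, and it leaves the actual fetching and parsing out. `gamelog_urls` is a hypothetical helper name.

```python
# URL format as posted in the thread, with the player ID and year as placeholders.
URL_TEMPLATE = "http://espn.go.com/nba/player/gamelog/_/id/{player_id}/year/{year}/"


def gamelog_urls(player_ids, year=2015):
    """Build one game-log URL per player ID."""
    return [URL_TEMPLATE.format(player_id=pid, year=year) for pid in player_ids]


# 1975 is the Carmelo Anthony ID mentioned in the thread; append other IDs as needed.
player_ids = [1975]
for url in gamelog_urls(player_ids):
    print(url)
```

If you do fetch these in bulk, pausing between requests keeps the load on the site reasonable.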

This might help, they have a free trial going: https://probasketballapi.com/
(An API or way to automatically source data on NBA stats, including the advanced stuff they're collecting on player motion in-game, so you can run your analysis on an aggregated collection of player data like we talked about, and maybe add cool variables like the number of meters the player actually runs in-game).

With regard to the ELI5:

I am not sure what you mean by, "You need to use daily fantasy points accrued to make that measure, which you could do by applying your league rules to previous box scores, and then measuring for variance between the three conditions you used with the same test."

What I meant was you could do the exact same thing you did with minutes, a one-way ANOVA to differentiate between distributions of a variable (in your case minutes played) in three conditions (back-to-back, 1 full day of rest, 2 or more days of rest), and do it with fantasy points. I.e., get the mean and STD of fantasy points in those three situations and run the exact same test.

You could get the number of fantasy points under each situation by creating an extra column sourced from the points/reb/ast or whatever forms your fantasy pool rules (e.g., if a point is worth 1 point, a rebound is worth 2, and an AST is worth 3, then 10 points, 2 rebounds, and 3 assists add up to 23 fantasy points for that game). Make that column of fantasy points, then run the same analysis you did with minutes played.

You wouldn't even have to do the extra step of calculating minutes played then the correlation to fantasy points if you did this.
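Put together, the steps might look something like the sketch below: score each box-score line, bucket each game by days of rest, then compute the one-way ANOVA F statistic on fantasy points. The weights are the example ones from this post (your league's rules will differ), the game log is invented, and the function names are hypothetical.

```python
from datetime import date

# Example weights from the post above: 1 per point, 2 per rebound, 3 per assist.
WEIGHTS = {"pts": 1, "reb": 2, "ast": 3}


def fantasy_points(box):
    """Apply the scoring weights to one box-score line."""
    return sum(weight * box[stat] for stat, weight in WEIGHTS.items())


def rest_bucket(previous_game, this_game):
    """Classify a game by the number of full days off since the previous game."""
    full_days_off = (this_game - previous_game).days - 1
    if full_days_off <= 0:
        return "back_to_back"
    return "1_day" if full_days_off == 1 else "2_plus"


def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA over lists of raw values."""
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Made-up game log: (game date, box-score line)
games = [
    (date(2015, 1, 2), {"pts": 10, "reb": 2, "ast": 3}),
    (date(2015, 1, 3), {"pts": 18, "reb": 5, "ast": 2}),
    (date(2015, 1, 5), {"pts": 27, "reb": 7, "ast": 4}),
    (date(2015, 1, 8), {"pts": 25, "reb": 8, "ast": 3}),
    (date(2015, 1, 9), {"pts": 19, "reb": 4, "ast": 3}),
    (date(2015, 1, 11), {"pts": 30, "reb": 6, "ast": 3}),
    (date(2015, 1, 14), {"pts": 24, "reb": 9, "ast": 5}),
    (date(2015, 1, 15), {"pts": 16, "reb": 6, "ast": 1}),
    (date(2015, 1, 17), {"pts": 28, "reb": 5, "ast": 4}),
    (date(2015, 1, 20), {"pts": 26, "reb": 7, "ast": 2}),
]

# The first game has no previous game, so it is only used as a reference date.
by_bucket = {"back_to_back": [], "1_day": [], "2_plus": []}
for (prev_day, _), (day, box) in zip(games, games[1:]):
    by_bucket[rest_bucket(prev_day, day)].append(fantasy_points(box))

f_stat = one_way_anova_f(list(by_bucket.values()))
print(by_bucket)
print(round(f_stat, 2))
```

In Excel the equivalent is one extra column with the weighted sum, one column for the rest category, and the same ANOVA tool pointed at the fantasy-points column instead of minutes.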
