I am looking to maximize daily fantasy points in the NBA. One of the variables I am considering is whether the number of rest days between games affects how a player performs. I pulled all the box score data into Excel. How do I go about confirming my hypothesis that the more rest days a player has, the better he performs? I was thinking about using a one-way ANOVA test, with the p-value and F value, to determine this. Am I headed in the right direction?
I used Carmelo Anthony's 2014-2015 data for games in which he played more than 25 minutes. He had 8 back-to-back games, 22 games with 1 full day of rest, and 10 games with 2 or more days of rest. I found the mean and standard deviation for each group:
Back to back: 31, 9
1 full day of rest: 39, 9.6
2 or more days of rest: 38, 8
I ran a one-way ANOVA test and received a p-value of less than .01 and an F value of 2.5. From these numbers there seems to be a slight correlation. How do I generalize this for the whole NBA?
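For anyone who wants to check the numbers, a one-way ANOVA can be rebuilt from just the group sizes, means, and standard deviations quoted above. This is only a rough sketch in plain Python: the function names are mine, the standard deviations are assumed to be sample standard deviations, and the closed-form p-value only works for three groups (2 degrees of freedom between groups).

```python
# One-way ANOVA from summary statistics: (n games, mean, sample std).
# The numbers are the rounded figures quoted in the post.
groups = {
    "back_to_back": (8, 31.0, 9.0),
    "one_day_rest": (22, 39.0, 9.6),
    "two_plus_rest": (10, 38.0, 8.0),
}


def anova_from_summary(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    total_n = sum(n for n, _, _ in groups.values())
    grand_mean = sum(n * m for n, m, _ in groups.values()) / total_n
    # Spread of the group means around the grand mean.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups.values())
    # Spread of the games inside each group.
    ss_within = sum((n - 1) * s ** 2 for n, _, s in groups.values())
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def f_pvalue_three_groups(f_stat, df_within):
    """Upper-tail p-value of F. Closed form, valid only when df_between == 2."""
    return (1 + 2 * f_stat / df_within) ** (-df_within / 2)


f_stat, df_b, df_w = anova_from_summary(groups)
p_value = f_pvalue_three_groups(f_stat, df_w)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}, p = {p_value:.3f}")
```

On these rounded figures the F value lands close to the 2.5 reported, but the p-value comes out around 0.1 rather than below .01, so the p-value from the spreadsheet is worth rechecking against the raw game data.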
Statistics doesn't hold in sports because you don't have a control. If Melo gets 3 days of rest and plays against the Sixers, as opposed to the tail end of a back-to-back against, say, the Warriors... Not to mention injuries, off-the-court issues, etc. There are too many variables to build any sort of meaningful statistical model for sports.
I know there is a large amount of variance. However, over the course of a season those variances even out.
I just want to reach a general conclusion: in general, does playing a back-to-back affect a player's performance? There are many stats tests you can run. I just don't know which one, if any, can be applied to this situation. I am just looking for stats advice. Thanks for the constructive input!
I am looking at randomizing all NBA teams into three groups: back to back, 1 full day of rest, and 2 or more full days of rest. Group 1 will use only the mean for back-to-back games, group 2 will use the mean for games after 1 full day of rest, and group 3 will use the mean for games after 2 or more days of rest. From the results, the F value will be more important than the p-value; as long as it is above .2, I am happy. The last test I will do is a post hoc analysis to measure the variance between each pair of groups, i.e. 1 and 3, 2 and 3, 1 and 2. I just want to reach a general conclusion. Nothing less, nothing more.
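For the post hoc step, one simple option is to compare each pair of groups and tighten the significance cutoff with a Bonferroni correction. Tukey's HSD is the more usual follow-up to an ANOVA, but it needs the raw games. The sketch below assumes SciPy is installed and reuses the Carmelo summary numbers from the first post; the choice of Welch's t-test (unequal variances) and the 0.05 starting cutoff are my assumptions, not part of the plan above.

```python
from itertools import combinations

from scipy.stats import ttest_ind_from_stats

# (mean, sample std, n games) per rest group, from the first post.
groups = {
    "back_to_back": (31.0, 9.0, 8),
    "one_day_rest": (39.0, 9.6, 22),
    "two_plus_rest": (38.0, 8.0, 10),
}

pairs = list(combinations(groups, 2))
# Bonferroni: split the overall 0.05 cutoff across the three comparisons.
alpha = 0.05 / len(pairs)

results = {}
for a, b in pairs:
    mean_a, std_a, n_a = groups[a]
    mean_b, std_b, n_b = groups[b]
    # Welch's t-test computed from summary statistics only.
    t_stat, p_value = ttest_ind_from_stats(
        mean_a, std_a, n_a, mean_b, std_b, n_b, equal_var=False
    )
    results[(a, b)] = (t_stat, p_value)
    verdict = "differs" if p_value < alpha else "no clear difference"
    print(f"{a} vs {b}: t = {t_stat:.2f}, p = {p_value:.3f} ({verdict})")
```

With only 8 to 22 games per group, a single player's season is unlikely to clear a corrected cutoff, which is one more reason to pool many players before drawing a general conclusion.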
You should post your question in the following forum rather than Clutchfans Hangout: http://apbr.org/metrics/viewforum.php?f=2
I believe Haralabos Voulgaris said so and he's broken all this stuff down into a science. I think he has a PhD statistician on staff. You can Google around to see if he's been quoted or just ask on Twitter.
What you could do is go with a different model: a Gaussian Naive Bayes classifier that spits out a probability for each range of fantasy points (assuming some combination of steals/points/assists/rebounds), given a set of previously observed variables. You could predict from hours of rest (more granular than days), but other interesting variables to consider would be the opposing team's current defensive rating and the pace of the team in question. Pace is a proxy for in-game exertion, which is hard to capture with minutes played alone: playing 30 minutes for the slowest-paced team in the NBA could be a lot less tiring than playing 30 for the fastest-paced.
Bayes > Frequentist for this.
Here's a tutorial (ignore the math; if you know Python or have some programming experience you can implement it with scikit-learn without knowing any of the equations, and if not, skip this section): http://www.autonlab.org/tutorials/gaussbc12.pdf
The programming implementation can be just a few lines if you know Python: http://scikit-learn.org/stable/modules/naive_bayes.html
If you don't know Python, Bayes, or machine learning, go through these in order:
http://learnpythonthehardway.org/
http://www.diveintopython.net/
https://github.com/hangtwenty/dive-into-machine-learning
https://camdavidsonpilon.github.io/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/
------------------------------------------------
If that's not the path you want to take, I'd probably pay up for https://downloads.nbastuffer.com/nba-team-data-sets?ap_id=nbastuffer or find a way to get data from http://www.basketball-reference.com/ and continue the work you've been doing. That paid data set looks like it has everything you'd need (days of rest, pace, opposing team defensive efficiency, etc.), and I know Basketball Reference has everything you'd need, though I don't know what export options it offers.
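To give a feel for how short the Gaussian Naive Bayes route is, here is a rough sketch with scikit-learn. Everything in it is an assumption for illustration: the data is randomly generated rather than real box scores, the three features (hours of rest, opponent defensive rating, team pace) are just the ones suggested above, and the fantasy point buckets (under 30, 30 to 40, 40 and up) are arbitrary.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
n_games = 200

# Made-up features, one row per game:
# hours of rest, opponent defensive rating, team pace.
X = np.column_stack([
    rng.uniform(20, 96, n_games),
    rng.normal(105, 4, n_games),
    rng.normal(96, 3, n_games),
])

# Made-up fantasy points with a small built-in rest effect plus noise.
fantasy_points = 30 + 0.05 * X[:, 0] + rng.normal(0, 8, n_games)

# Turn points into classes: 0 = under 30, 1 = 30 to 40, 2 = 40 and up.
labels = ["<30", "30-40", "40+"]
y = np.digitize(fantasy_points, [30, 40])

model = GaussianNB().fit(X, y)

# Hypothetical upcoming game: 24 hours of rest, strong defense, fast pace.
upcoming = np.array([[24.0, 101.0, 98.0]])
probs = model.predict_proba(upcoming)[0]
for cls, prob in zip(model.classes_, probs):
    print(f"P({labels[cls]} fantasy points) = {prob:.2f}")
```

With real data you would replace the generated arrays with columns from your spreadsheet and hold some games back to check how well the probabilities match what actually happened.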
With larger data sets that still fit in your computer's memory, you may want to consider switching to Pandas/Python code to read CSVs/spreadsheet documents instead of using Excel, since Excel has a lot of overhead for even reasonably sized spreadsheets. http://jvns.ca/blog/2013/12/22/cooking-with-pandas/
Be aware that if the mean and STD are for minutes played and not daily fantasy points generated, what you've just figured out is that Carmelo Anthony plays significantly more when the Knicks are not in a back-to-back situation (probably because the coaches decide to rest him). That isn't what you're looking for unless minutes are strongly correlated with fantasy points (i.e., for every % increase in minutes, there is a corresponding proportional % increase in fantasy points). In that case, if you generalized your current model to all players, you'd just be figuring out that players tend to play fewer minutes in back-to-backs, not that they perform better with rest. It's a reasonable assumption that the two may be correlated, but you're not measuring what you need to measure to figure that out. You need to use daily fantasy points accrued, which you could get by applying your league rules to previous box scores, and then test for differences between the three conditions you used with the same test.
(Random question: why did you decide to make the cutoff 25 minutes? Was that based on the mean of the minutes Carmelo plays in a "normal, injury-free" situation, or is it an arbitrary cutoff?)
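On the Pandas suggestion, a game log can be read and tagged with rest days in a few lines. This is a sketch only: the CSV below is invented data typed inline (in practice you would pass a file path to `pd.read_csv`), and the column names and group labels are mine.

```python
import io

import pandas as pd

# Invented game log standing in for an exported spreadsheet.
csv_text = """date,minutes,points
2014-11-02,38,28
2014-11-04,36,25
2014-11-05,31,18
2014-11-07,37,30
2014-11-08,30,20
2014-11-12,39,33
2014-11-14,36,27
2014-11-15,29,17
2014-11-18,38,29
"""

games = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"]).sort_values("date")

# Full days off before each game: games on consecutive days give 0.
# The first game has no previous game, so its value is NaN.
games["rest_days"] = games["date"].diff().dt.days - 1

# Bucket into the three conditions used in this thread.
games["rest_group"] = pd.cut(
    games["rest_days"],
    bins=[-1, 0, 1, float("inf")],
    labels=["back_to_back", "one_day", "two_plus"],
)

# Count, mean and std of minutes per condition (the NaN row is dropped).
summary = games.groupby("rest_group", observed=True)["minutes"].agg(
    ["count", "mean", "std"]
)
print(summary)
```

Swapping `"minutes"` for a fantasy points column gives the table that actually answers the original question.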
Great information. I will look into the suggested material to analyze this further. I retrieved all the box score information by using Python to extract all the player IDs from ESPN, since that is the only thing that differentiates between players. Basketball Reference was not so easy... I am more of an Excel/VBA person than a web scraper. I put this format (http://espn.go.com/nba/player/gamelog/_/id/[playerID]/year/2015/) into an .iqy file and used that file as the reference for collecting external data. I used Carmelo as an example. His player ID is 1975. This method isn't robust, but it got the job done.
You are right. I need to see how many minutes were played during back-to-backs, then work out the correlation between minutes and fantasy points. I chose 25 minutes just as a general marker.
I am not sure what you mean by, "You need to use daily fantasy points accrued to make that measure, which you could do by applying your league rules to previous box scores, and then measuring for variance between the three conditions you used with the same test." Could you possibly ELI5? lol
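For reference, the URL format above turns into a per-player list of pages with a short loop. This sketch only builds the URLs and does not download anything; the function name is hypothetical, and whether ESPN still serves this address is not something I have checked.

```python
# URL pattern quoted in the post, with placeholders for the player ID and year.
GAMELOG_URL = "http://espn.go.com/nba/player/gamelog/_/id/{player_id}/year/{year}/"


def gamelog_urls(player_ids, year=2015):
    """Map each ESPN player ID to its game log URL for the given year."""
    return {
        player_id: GAMELOG_URL.format(player_id=player_id, year=year)
        for player_id in player_ids
    }


# 1975 is Carmelo Anthony's ID per the post; add the other scraped IDs here.
player_ids = [1975]
urls = gamelog_urls(player_ids)
for player_id, url in urls.items():
    print(player_id, url)
```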
Hope it helps. Naive Bayes isn't the best for this, especially if you're looking to estimate a rock-solid probability of fantasy points as opposed to classifying what sort of performance you would expect (e.g. the chance of 20+ or under 20 fantasy points), but it's fairly easy to get up and running.
As for your current method of scraping, hopefully you have a loop to collect pages for a large number of players from a range of IDs, and ESPN doesn't have a weird method of distributing player profiles. I can see that becoming pretty heavy if you want to aggregate all players. This might help, they have a free trial going: https://probasketballapi.com/ (an API, i.e. a way to automatically source data on NBA stats, including the advanced stuff they're collecting on player motion in-game, so you can run your analysis on an aggregated collection of player data like we talked about, and maybe add cool variables like the number of meters the player actually runs in-game).
With regards to the ELI5: what I meant was you could do the exact same thing you did with minutes, a one-way ANOVA to compare the distributions of a variable (in your case minutes played) in three conditions (back-to-back, 1 full day of rest, 2 or more days of rest), and do it with fantasy points. I.e., get the mean and STD of fantasy points in those three situations and run the exact same test. You could get the number of fantasy points in each situation by creating an extra column that draws on the points/rebounds/assists or whatever forms your fantasy pool rules (e.g. if a point is worth 1 fantasy point, a rebound is worth 2, and an assist is worth 3, then 10 points, 2 rebounds, and 3 assists sum to 23 fantasy points for that game). Make that column of fantasy points, then run the same analysis you did with minutes played. You wouldn't even have to do the extra step of calculating minutes played and then the correlation to fantasy points if you did this.
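In code, the extra fantasy points column and the per-condition mean and STD look roughly like this. The scoring weights are the example ones above (1 per point, 2 per rebound, 3 per assist), not any real league's rules, and the handful of games is invented just to make the sketch run.

```python
from statistics import mean, stdev

# Example weights from the post; replace with your league's scoring rules.
SCORING = {"points": 1, "rebounds": 2, "assists": 3}


def fantasy_points(box_score):
    """Apply the scoring weights to one game's box score."""
    return sum(weight * box_score[stat] for stat, weight in SCORING.items())


# Invented games, each tagged with its rest condition.
games = [
    {"rest": "back_to_back", "points": 10, "rebounds": 2, "assists": 3},
    {"rest": "back_to_back", "points": 18, "rebounds": 5, "assists": 2},
    {"rest": "one_day", "points": 27, "rebounds": 7, "assists": 4},
    {"rest": "one_day", "points": 22, "rebounds": 6, "assists": 3},
    {"rest": "two_plus", "points": 30, "rebounds": 8, "assists": 2},
    {"rest": "two_plus", "points": 24, "rebounds": 5, "assists": 5},
]

# The "extra column": fantasy points per game, collected by rest condition.
by_rest = {}
for game in games:
    by_rest.setdefault(game["rest"], []).append(fantasy_points(game))

for rest, values in by_rest.items():
    print(f"{rest}: n = {len(values)}, mean = {mean(values):.1f}, std = {stdev(values):.1f}")
```

The three lists in `by_rest` are what you would feed into the same one-way ANOVA you ran on minutes.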