Data analyst and statisticians, nba data trend

Discussion in 'BBS Hangout' started by grt004, Nov 27, 2015.

  1. grt004

    grt004 Member

    Joined:
    Jun 26, 2012
    Messages:
    123
    Likes Received:
    9
    I am looking to maximize daily fantasy points in the NBA. One of the variables I am considering is whether the number of rest days between games affects how a player performs. I pulled all the box score data into Excel. How do I go about confirming my hypothesis that the more rest days a player has, the better he performs?

    I was thinking about using a one-way ANOVA test, with its p-value and F value, to determine this. Am I headed in the right direction?

    I used Carmelo Anthony's 2014-2015 games in which he played more than 25 minutes: 8 back-to-back games, 22 games with 1 full day of rest, and 10 games with 2 or more days of rest.

    I found the mean and std for the following groups:
    Back to back: 31, 9
    1 full day rest: 39, 9.6
    2 or more rest: 38, 8

    I ran a one-way ANOVA test and got a p-value of less than .01 and an F value of 2.5.
    From these numbers there seems to be a slight correlation. How do I generalize this for the whole NBA?
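
As a cross-check on the figures above, here is a minimal sketch that rebuilds the one-way ANOVA F statistic from the quoted group sizes, means, and standard deviations alone. It assumes the reported std values are sample standard deviations (n - 1 in the denominator); the function name `anova_from_summary` is made up for illustration.

```python
# One-way ANOVA F statistic from per-group summary statistics.
# Sizes, means, and stds are the rounded figures quoted in the post.
groups = {
    "back_to_back": {"n": 8, "mean": 31.0, "std": 9.0},
    "one_day_rest": {"n": 22, "mean": 39.0, "std": 9.6},
    "two_plus_rest": {"n": 10, "mean": 38.0, "std": 8.0},
}


def anova_from_summary(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    total_n = sum(g["n"] for g in groups.values())
    grand_mean = sum(g["n"] * g["mean"] for g in groups.values()) / total_n
    # Between-group sum of squares: distance of each group mean from the grand mean.
    ss_between = sum(g["n"] * (g["mean"] - grand_mean) ** 2 for g in groups.values())
    # Within-group sum of squares, recovered from the sample standard deviations.
    ss_within = sum((g["n"] - 1) * g["std"] ** 2 for g in groups.values())
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


f_stat, df_between, df_within = anova_from_summary(groups)
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")  # F(2, 37) = 2.31
```

With the rounded inputs this lands near the reported F of 2.5. Standard F tables put the 5% critical value for 2 and 37 degrees of freedom at about 3.25, so an F in this range would normally go with a p-value above .05, which suggests the "less than .01" figure is worth rechecking.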
     
  2. Cohete Rojo

    Cohete Rojo Contributing Member

    Joined:
    Oct 29, 2009
    Messages:
    10,344
    Likes Received:
    1,203
    So you like to gamble?
     
  3. CCorn

    CCorn Member

    Joined:
    Dec 26, 2010
    Messages:
    21,439
    Likes Received:
    21,234
    Mean std? They make a pill for that.
     
  4. grt004

    grt004 Member

    Joined:
    Jun 26, 2012
    Messages:
    123
    Likes Received:
    9
    More like a website homie
     
  5. grt004

    grt004 Member

    Joined:
    Jun 26, 2012
    Messages:
    123
    Likes Received:
    9
    standard deviation
     
  6. moestavern19

    moestavern19 Member

    Joined:
    Dec 8, 1999
    Messages:
    39,003
    Likes Received:
    3,637
    Fantasylabs.com
     
  7. RedRedemption

    RedRedemption Contributing Member

    Joined:
    Jul 21, 2009
    Messages:
    32,470
    Likes Received:
    7,648
    Statistics doesn't hold in sports because you don't have a control. Say Melo gets 3 days of rest and plays against the Sixers, as opposed to the tail end of a back-to-back against, say, the Warriors...

    Not to mention injuries, off the court issues, etc.

    Too many variables to really make any sort of meaningful statistical model for sports.
     
  8. grt004

    grt004 Member

    Joined:
    Jun 26, 2012
    Messages:
    123
    Likes Received:
    9
    I know there is a large amount of variance. However, throughout the course of the season those variances even out.
     
  9. grt004

    grt004 Member

    Joined:
    Jun 26, 2012
    Messages:
    123
    Likes Received:
    9
    I just want to reach a general conclusion: does playing a back-to-back affect a player's performance? There are many stats tests you can run. I just don't know which one, if any, can be applied to this situation. I am just looking for stats advice. Thanks for the constructive input!
     
  10. grt004

    grt004 Member

    Joined:
    Jun 26, 2012
    Messages:
    123
    Likes Received:
    9
    I am looking at randomly assigning all NBA teams to three groups: back to back, 1 full day of rest, and 2 or more full days of rest. Group 1 will only use the mean for back-to-back games, group 2 the mean for games after 1 full day of rest, and group 3 the mean for games after 2 or more days of rest.

    From the results, the F value will be more important than the p-value; as long as it is above .2, I am happy. The last test I will do is a post hoc analysis to measure the variance between each pair of groups, i.e. 1 and 3, 2 and 3, 1 and 2.

    I just want to reach a general conclusion. Nothing less, nothing more.
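
One way to sketch the post hoc step described above is to compare every pair of groups and adjust for the number of comparisons. The example below swaps in pairwise permutation tests with a Bonferroni correction (not Tukey's HSD or any specific test named in the thread), uses only the standard library, and runs on made-up per-game values; the function names are hypothetical.

```python
import itertools
import random
import statistics


def permutation_p_value(a, b, n_resamples=5000, seed=0):
    """Two-sided permutation test for a difference in means between a and b."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:len(a)]) - statistics.fmean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_resamples + 1)


def pairwise_post_hoc(samples, **kwargs):
    """Compare every pair of groups; Bonferroni-adjust for the number of pairs."""
    pairs = list(itertools.combinations(samples, 2))
    results = {}
    for name_a, name_b in pairs:
        p = permutation_p_value(samples[name_a], samples[name_b], **kwargs)
        results[(name_a, name_b)] = min(1.0, p * len(pairs))
    return results


# Made-up per-game values, only to show the call pattern.
samples = {
    "back_to_back": [22, 31, 27, 35, 29, 24, 38, 30],
    "one_day_rest": [41, 36, 44, 39, 33, 47, 40, 38],
    "two_plus_rest": [37, 42, 35, 40, 45, 39, 36, 41],
}
for pair, p_adj in pairwise_post_hoc(samples).items():
    print(pair, round(p_adj, 3))
```

A small adjusted p-value for a pair suggests those two rest conditions differ; a value near 1 means the data cannot tell them apart.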
     
  11. Mr. Brightside

    Mr. Brightside Contributing Member

    Joined:
    Mar 27, 2005
    Messages:
    18,950
    Likes Received:
    2,137
    Is this gambling? If so, it is a sin and the Lord and government frowns down upon you.
     
  12. durvasa

    durvasa Contributing Member

    Joined:
    Feb 11, 2006
    Messages:
    37,997
    Likes Received:
    15,461
    You should post your question in the following forum rather than Clutchfans Hangout:

    http://apbr.org/metrics/viewforum.php?f=2
     
  13. grt004

    grt004 Member

    Joined:
    Jun 26, 2012
    Messages:
    123
    Likes Received:
    9
  14. fallenphoenix

    fallenphoenix Contributing Member

    Joined:
    Jun 20, 2009
    Messages:
    9,821
    Likes Received:
    1,619
    run a regression and check the adjusted R square
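
For readers who want to try the regression suggestion, here is a minimal sketch of a one-predictor least-squares fit that reports R squared and adjusted R squared. The choice of rest days as the predictor and fantasy points as the response is an assumption, and the sample values are invented.

```python
def simple_regression(x, y):
    """Least-squares fit of y = intercept + slope * x, with R^2 and adjusted R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual and total sums of squares.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    k = 1  # number of predictors
    # Adjusted R^2 penalizes R^2 for the number of predictors used.
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    return slope, intercept, r2, adj_r2


# Invented data: days of rest before a game and fantasy points in that game.
rest_days = [0, 0, 1, 1, 1, 2, 2, 3]
points = [29.0, 33.5, 38.0, 41.5, 36.0, 40.0, 37.5, 39.0]
slope, intercept, r2, adj_r2 = simple_regression(rest_days, points)
print(f"slope={slope:.2f} intercept={intercept:.2f} R2={r2:.3f} adjusted R2={adj_r2:.3f}")
```

With several predictors (rest, opponent, pace), adjusted R squared is the more useful of the two because plain R squared never decreases when a variable is added.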
     
  15. professorjay

    professorjay Contributing Member

    Joined:
    Oct 20, 2006
    Messages:
    9,676
    Likes Received:
    388
    I believe Haralabos Voulgaris said so and he's broken all this stuff down into a science. I think he has a PhD statistician on staff.

    You can Google around to see if he's been quoted or just ask on Twitter.
     
  16. Cohete Rojo

    Cohete Rojo Contributing Member

    Joined:
    Oct 29, 2009
    Messages:
    10,344
    Likes Received:
    1,203
    Why don't you just break everything down into an RDB and start SQLing?
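
A rough sketch of what the relational-database route could look like, using Python's built-in `sqlite3` module with an in-memory database. The table name, column names, and rows are all hypothetical, and `rest_days` is assumed to count full days off before the game (0 meaning a back-to-back).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE box_scores ("
    "player TEXT, game_date TEXT, rest_days INTEGER, fantasy_points REAL)"
)

# Invented rows, only to make the query runnable.
rows = [
    ("Player A", "2015-01-01", 0, 28.0),
    ("Player A", "2015-01-03", 1, 40.0),
    ("Player A", "2015-01-06", 2, 36.0),
    ("Player B", "2015-01-02", 0, 30.0),
    ("Player B", "2015-01-04", 1, 38.0),
    ("Player B", "2015-01-08", 3, 42.0),
]
conn.executemany("INSERT INTO box_scores VALUES (?, ?, ?, ?)", rows)

# Bucket games by rest and average fantasy points within each bucket.
query = """
SELECT CASE
         WHEN rest_days = 0 THEN 'back_to_back'
         WHEN rest_days = 1 THEN 'one_day_rest'
         ELSE 'two_plus_rest'
       END AS rest_group,
       COUNT(*) AS games,
       AVG(fantasy_points) AS avg_points
FROM box_scores
GROUP BY rest_group
ORDER BY rest_group
"""
summary = conn.execute(query).fetchall()
for rest_group, games, avg_points in summary:
    print(rest_group, games, avg_points)
```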
     
  17. Northside Storm

    Northside Storm Contributing Member

    Joined:
    Dec 24, 2007
    Messages:
    11,262
    Likes Received:
    450
    What you could do is go with a new model: a Gaussian Naive Bayes classifier that spits out a probability for a range of fantasy points (assuming a combination of steals/points/ASTs/rebounds) given a set of previously observed variables. You could predict from # hours of rest (getting even more granular than days), but other interesting variables to consider would be the opposing team's current defensive rating and the pace of the team in question. Pace is a proxy for in-game exertion, which is hard to capture with minutes played alone: playing 30 minutes with the slowest-paced team in the NBA could be a lot less tiring than playing 30 with the fastest-paced.

    Bayes > Frequentist for this

    Here's a tutorial (ignore the math; you can implement it with scikit-learn/Python without knowing any of the equations if you know Python or have some programming experience. If not, skip this section):

    http://www.autonlab.org/tutorials/gaussbc12.pdf

    The programming implementation can be just a few lines if you know Python:

    http://scikit-learn.org/stable/modules/naive_bayes.html
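
To make the idea concrete, here is a small from-scratch sketch of Gaussian Naive Bayes using only the standard library, so the mechanics are visible; scikit-learn's `GaussianNB` covers the same ground in a few lines. The class name `TinyGaussianNB`, the three features (hours of rest, opponent defensive rating, team pace), the two labels, and every data value are assumptions for illustration.

```python
import math
import statistics
from collections import defaultdict


class TinyGaussianNB:
    """Minimal Gaussian Naive Bayes: one normal distribution per class per feature."""

    def fit(self, X, y):
        by_class = defaultdict(list)
        for row, label in zip(X, y):
            by_class[label].append(row)
        self.priors = {c: len(rows) / len(X) for c, rows in by_class.items()}
        # Per class, store (mean, variance) for each feature column.
        # The small constant avoids division by zero for constant features.
        self.params = {
            c: [(statistics.fmean(col), statistics.pvariance(col) + 1e-9)
                for col in zip(*rows)]
            for c, rows in by_class.items()
        }
        return self

    def predict_proba(self, row):
        log_post = {}
        for c, feats in self.params.items():
            total = math.log(self.priors[c])
            for value, (mu, var) in zip(row, feats):
                # Log of the normal density for this feature under class c.
                total += -0.5 * math.log(2 * math.pi * var) - (value - mu) ** 2 / (2 * var)
            log_post[c] = total
        # Normalize in log space for numerical stability.
        top = max(log_post.values())
        weights = {c: math.exp(v - top) for c, v in log_post.items()}
        norm = sum(weights.values())
        return {c: w / norm for c, w in weights.items()}


# Invented training rows: [hours_of_rest, opponent_def_rating, team_pace].
X = [
    [24, 101, 95], [22, 100, 96], [26, 102, 94],
    [48, 108, 100], [50, 110, 101], [72, 109, 99],
]
y = ["under_20", "under_20", "under_20", "20_plus", "20_plus", "20_plus"]

model = TinyGaussianNB().fit(X, y)
print(model.predict_proba([60, 109, 100]))
```

The output is a probability per class, which matches the "chance of 20+ or under 20 fantasy points" framing better than a single point estimate.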

    if you don't know Python, Bayes, or machine learning:

    http://learnpythonthehardway.org/

    then

    http://www.diveintopython.net/

    then

    https://github.com/hangtwenty/dive-into-machine-learning

    then

    https://camdavidsonpilon.github.io/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/

    ------------------------------------------------

    if that's not the path you want to take, I'd probably pay up for https://downloads.nbastuffer.com/nba-team-data-sets?ap_id=nbastuffer or find a way to get data from http://www.basketball-reference.com/ and continue the work you've been doing. That paid data set looks like it has everything you'd need (# days of rest, pace, opposing team defensive efficiency etc.), and I know basketball reference has everything you'd need, though I don't know what export options there are with Basketball Reference.

    With larger data sets that still fit in the memory of your computer, you may want to consider switching to Pandas/Python code to read CSVs/spreadsheet documents instead of using Excel, since Excel has a lot of overhead for even reasonably sized spreadsheets.

    http://jvns.ca/blog/2013/12/22/cooking-with-pandas/

    Be aware that if the mean and STD are for minutes played and not daily fantasy points generated, what you've just figured out is that Carmelo Anthony plays significantly more when the Knicks are not in a back-to-back situation (probably because the coaches decide to rest him). That isn't what you're looking for unless minutes are strongly correlated with the number of fantasy points (i.e., for every % increase in minutes, there is a corresponding proportional % increase in fantasy points).

    In that case, if you generalized your current model to all players, you'd just be figuring out that players tend to play fewer minutes in back-to-backs, not that they perform better with more rest. It's a reasonable assumption that the two are correlated, but you're not measuring exactly what you need to measure to figure that out. You need to use daily fantasy points accrued to make that measure, which you could do by applying your league rules to previous box scores, and then measuring for variance between the three conditions you used with the same test.

    (random question, why did you decide to make the cutoff 25 minutes? Was that a certain significance level for the mean of minutes Carmelo plays in a "normal, injury-free" situation, or an arbitrary metric?)
     
    #17 Northside Storm, Nov 28, 2015
    Last edited: Nov 28, 2015
  18. grt004

    grt004 Member

    Joined:
    Jun 26, 2012
    Messages:
    123
    Likes Received:
    9
    Great information. I will look into the suggested material to further analyze the information. I retrieved all the box score information by using Python to extract all the player IDs from ESPN, since that is the only thing that differentiates between players. Basketball Reference was not so easy... I am more of an Excel/VBA person than a web scraper. I put this URL format (http://espn.go.com/nba/player/gamelog/_/id/[playerID]/year/2015/) into an .iqy file and used that file as the reference for collecting external data. I used Carmelo as an example; his player ID is 1975. This method isn't robust, but it got the job done.

    You are right. I need to see how many minutes were played during back to backs. Then, make a correlation between minutes and fantasy points. I chose 25 minutes just as a general marker.

    I am not sure what you mean by, "You need to use daily fantasy points accrued to make that measure, which you could do by applying your league rules to previous box scores, and then measuring for variance between the three conditions you used with the same test." Could you possibly ELI5? lol
     
  19. Northside Storm

    Northside Storm Contributing Member

    Joined:
    Dec 24, 2007
    Messages:
    11,262
    Likes Received:
    450
    Hope it helps. Naive Bayes isn't the best for this, especially if you're looking to estimate a rock solid probability of fantasy points as opposed to classifying what sort of performance (chance of 20+ or 20- fantasy points) you would expect, but it's fairly easy to get up and running.

    As for your current scraping method, hopefully you have a loop that collects pages across a large range of players, and ESPN doesn't have a weird way of distributing player profiles. I can see that becoming pretty heavy if you were looking to aggregate all players.
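
A minimal sketch of such a loop, limited to building the list of pages to fetch from the game-log URL pattern quoted earlier in the thread. It makes no network requests; the helper name `gamelog_urls` is hypothetical, and apart from 1975 (the Carmelo Anthony ID from the thread) the player IDs are placeholders.

```python
# URL pattern from the thread, with the player ID and season as fields.
GAMELOG_URL = "http://espn.go.com/nba/player/gamelog/_/id/{player_id}/year/{year}/"


def gamelog_urls(player_ids, year=2015):
    """Build one game-log URL per player ID, skipping duplicates but keeping order."""
    seen = set()
    urls = []
    for player_id in player_ids:
        if player_id in seen:
            continue
        seen.add(player_id)
        urls.append(GAMELOG_URL.format(player_id=player_id, year=year))
    return urls


# 1975 comes from the thread; 1001 and 1002 are placeholder IDs.
for url in gamelog_urls([1975, 1975, 1001, 1002]):
    print(url)
```

Each URL could then be fetched and parsed in turn; pausing between requests keeps the load on the site reasonable when the ID list grows to the whole league.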

    This might help, they have a free trial going: https://probasketballapi.com/
    (An API or way to automatically source data on NBA stats, including the advanced stuff they're collecting on player motion in-game, so you can run your analysis on an aggregated collection of player data like we talked about, and maybe add cool variables like the number of meters the player actually runs in-game).

    With regards to ELI5:

    What I meant was you could do the exact same thing you did with minutes, a one-way ANOVA to differentiate between distributions of a variable (in your case minutes played) in three conditions (back-to-back, 1 full day of rest, 2 or more days of rest), and do it with fantasy points. I.e., get the mean and STD of fantasy points in those three situations and run the exact same test.

    You could get the number of fantasy points for each game by creating an extra column computed from the points/reb/ast or whatever makes up your fantasy pool rules (e.g., if a point is worth 1, a rebound is worth 2, and an AST is worth 3, then 10 points, 2 rebounds, and 3 assists add up to 23 fantasy points for that game). Make that column of fantasy points, then run the same analysis you did with minutes played.

    You wouldn't even have to do the extra step of calculating minutes played then the correlation to fantasy points if you did this.
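
The extra-column idea can be sketched as follows, using the example weights from the post (1 per point, 2 per rebound, 3 per assist). The game rows are invented, the function names are hypothetical, and `rest_days` is assumed to count full days off before the game (0 meaning a back-to-back).

```python
import statistics
from collections import defaultdict

# Example scoring weights from the post; real league rules will differ.
WEIGHTS = {"pts": 1, "reb": 2, "ast": 3}


def fantasy_points(game, weights=WEIGHTS):
    """Weighted sum of the box score stats that count toward fantasy scoring."""
    return sum(weights[stat] * game[stat] for stat in weights)


def rest_group(rest_days):
    if rest_days == 0:
        return "back_to_back"
    if rest_days == 1:
        return "one_day_rest"
    return "two_plus_rest"


def summarize(games):
    """Mean and sample std of fantasy points per rest group."""
    grouped = defaultdict(list)
    for game in games:
        grouped[rest_group(game["rest_days"])].append(fantasy_points(game))
    return {
        name: (statistics.mean(vals), statistics.stdev(vals) if len(vals) > 1 else 0.0)
        for name, vals in grouped.items()
    }


# Invented box score rows, only to show the shape of the data.
games = [
    {"rest_days": 0, "pts": 18, "reb": 5, "ast": 2},
    {"rest_days": 0, "pts": 22, "reb": 4, "ast": 3},
    {"rest_days": 1, "pts": 27, "reb": 7, "ast": 4},
    {"rest_days": 1, "pts": 24, "reb": 6, "ast": 5},
    {"rest_days": 3, "pts": 25, "reb": 8, "ast": 3},
]

print(fantasy_points({"pts": 10, "reb": 2, "ast": 3}))  # 23, matching the post's example
print(summarize(games))
```

The per-group means and standard deviations that come out of `summarize` are the same kind of inputs the original one-way ANOVA was run on, only now measured in fantasy points instead of minutes.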
     
    #19 Northside Storm, Nov 28, 2015
    Last edited: Nov 28, 2015
