
Jeremy Lin out 2 weeks with grade one knee sprain (UPDATE: Will return vs. Portland)

Discussion in 'Houston Rockets: Game Action & Roster Moves' started by thesonofsam, Nov 29, 2013.

  1. jtr

    jtr Contributing Member

    Joined:
    Dec 4, 2011
    Messages:
    7,470
    Likes Received:
    275
    On sample size:

    No one has a definitive definition of how many player minutes comprise a statistically significant sample size.

    Any mathematically based post about sample size will be subject to ridicule from posters who struggled with college algebra.

    Verifying sample size significance is impossible with the amount of data the NBA generates. Analysts struggle to attain a 95% confidence level using a full season's data. So, obviously, this young season will generate ample discussion about sample size.
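To put a rough number on why even a full season can fall short, here is a minimal sketch using the textbook normal-approximation formula for estimating a proportion, n = z^2 * p(1 - p) / E^2, applied to a shooting percentage. The function name, the 35% three-point rate, and the +/-3-point margin are illustrative assumptions, not figures from the thread.

```python
import math
from statistics import NormalDist


def attempts_needed(p: float, margin: float, confidence: float = 0.95) -> int:
    """Shots needed to pin a make-rate near p to within +/- margin.

    Normal approximation to the binomial: n = z^2 * p * (1 - p) / margin^2.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


# Hypothetical example: a 35% three-point shooter, pinned to within 3 percentage points.
print(attempts_needed(0.35, 0.03, 0.95))  # 972
print(attempts_needed(0.35, 0.03, 0.80))  # 416
```

Under these assumptions, nailing down a 35% shooter to within +/-3 points at 95% confidence takes roughly a thousand attempts, more three-pointers than most players take in a season.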

    I can say that CF members should not post their opinion about sample size significance unless they have at least a B.S. in math. Or the equivalent.

    Over the last few years, new NBA data-collection sites have appeared that provide insights into specific performance metrics of NBA players. Many even have their own embedded sample size filters. Use those whenever possible.

    Good luck to us all this season. We will revisit this topic.
     
  2. meh

    meh Contributing Member

    Joined:
    Jun 16, 2002
    Messages:
    15,386
    Likes Received:
    2,259
    Interesting. But shouldn't there be differences depending on the type of stat? I know 3 pointers in particular has been a kind of stat that can fluctuate from year to year because many players don't take that many. But stats like rebounds have generally been pretty stable for players.

    Also, since NBA teams like the Rockets do need to make decisions based on in-season stats, what kind of confidence level are they looking at here? Suppose they look at Portland and its hot start, say, Matthews' shooting this year. How would the Rockets go about determining whether Matthews' incredible shooting is just a lucky start and not him becoming better?
     
  3. lfw

    lfw Rookie

    Joined:
    Nov 15, 2012
    Messages:
    1,221
    Likes Received:
    33
    But if you wanted to, you could prove it, right?
     
  4. Air Langhi

    Air Langhi Contributing Member

    Joined:
    Aug 26, 2000
    Messages:
    21,625
    Likes Received:
    6,257
    Use the Student's t-distribution. It was made for small sample sizes.
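A minimal standard-library sketch of why the t-distribution suits small samples: its critical values are larger than the normal's 1.96 when there are few data points, which widens the interval to account for estimating the standard deviation from the same data. The numerical integration (Simpson's rule) and bisection are my own choices for illustration; in practice a library routine such as SciPy's `scipy.stats.t.ppf` would supply the quantile.

```python
import math
from statistics import NormalDist, mean, stdev


def t_cdf(x: float, df: int, steps: int = 1000) -> float:
    """CDF of Student's t via Simpson's rule on [0, |x|], using symmetry about 0."""
    norm = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def density(u: float) -> float:
        return norm * (1 + u * u / df) ** (-(df + 1) / 2)

    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = density(0.0) + density(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(q: float, df: int) -> float:
    """Upper quantile (0.5 < q < 1) found by bisection on the CDF."""
    if not 0.5 < q < 1:
        raise ValueError("q must be between 0.5 and 1")
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def t_interval(data: list[float], confidence: float = 0.95) -> tuple[float, float]:
    """Small-sample confidence interval for the mean: mean +/- t * s / sqrt(n)."""
    n = len(data)
    t = t_quantile(0.5 + confidence / 2, n - 1)
    half_width = t * stdev(data) / math.sqrt(n)
    center = mean(data)
    return center - half_width, center + half_width


z = NormalDist().inv_cdf(0.975)
for n in (5, 19, 100):
    print(f"n={n}: t={t_quantile(0.975, n - 1):.3f} vs z={z:.3f}")
```

The t critical value shrinks toward the normal value of about 1.96 as the sample grows, so the correction matters most for samples of a couple dozen games or fewer.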

    I think Beverley sucks and would actually prefer Lin, but I can't believe how much LOF fans try to discredit Bev. Stats can be interpreted any way you want.
     
  5. lfw

    lfw Rookie

    Joined:
    Nov 15, 2012
    Messages:
    1,221
    Likes Received:
    33
    LMAO. Sample size was the primary argument against Lin's Linsanity stats last season. Fair is fair.
     
  6. gene18

    gene18 Rookie

    Joined:
    Dec 29, 2012
    Messages:
    990
    Likes Received:
    23
    Be careful. The last time I looked at the 82games website (yesterday), it only contained data up to 11/27/13.
     
  7. jtr

    jtr Contributing Member

    Joined:
    Dec 4, 2011
    Messages:
    7,470
    Likes Received:
    275
    The answer is yes. All NBA players other than young, improving players regress toward their mean. Thus we can be confident that George's growth as a player is more significant than Monta's jump in performance this season.
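One common way to build regression to the mean into an estimate is to shrink an observed rate toward a league-average prior, counting the prior as if it were k extra attempts. This is a sketch of that general shrinkage idea, not a claim about how any team evaluates players; the prior weight k = 300 and the shooting numbers are hypothetical.

```python
def shrunk_rate(makes: int, attempts: int, league_rate: float, k: float) -> float:
    """Blend observed makes/attempts with a league-average prior worth k pseudo-attempts."""
    return (makes + k * league_rate) / (attempts + k)


# Hypothetical hot start: 45 of 100 threes, league average .360, prior worth 300 attempts.
early = shrunk_rate(45, 100, 0.36, k=300)
# The same 45% shooting sustained over 600 attempts (also hypothetical).
late = shrunk_rate(270, 600, 0.36, k=300)

print(round(early, 4))  # 0.3825
print(round(late, 4))   # 0.42
```

The same 45% shooting is pulled much harder toward the league average after 100 attempts than after 600, which is the sense in which a hot start should be discounted until the sample grows.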

    Rebounds are an intriguing stat, one that has been shown to be relevant to how a college player will transition to the NBA. I honestly have no idea why that is true, but it exhibits a high confidence level.

    How do the Rockets scout teams? I honestly have no idea. But I would imagine they put significant weight on the last 10 and last 3 games or so of player performance. Predicting hot streaks, momentum, or whatever you call them is an intractable mathematical problem. However, scheming for them after the fact is simple.
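If a team did weight the last 10 and last 3 games, the simplest version is a pair of trailing averages blended together. A minimal sketch; the game log, the function names, and the 50/50 blend are hypothetical choices for illustration.

```python
from statistics import mean


def trailing_average(values: list[float], window: int) -> float:
    """Average of the last `window` games (all games if fewer have been played)."""
    return mean(values[-window:])


def recent_form(values: list[float], weight_last10: float = 0.5) -> float:
    """Blend the last-10 and last-3 averages; the 50/50 default is an arbitrary choice."""
    return (weight_last10 * trailing_average(values, 10)
            + (1 - weight_last10) * trailing_average(values, 3))


game_log = [10, 10, 10, 10, 10, 10, 10, 20, 20, 20, 30, 30, 30]  # hypothetical points per game
print(trailing_average(game_log, 10))  # 19
print(trailing_average(game_log, 3))   # 30
print(recent_form(game_log))           # 24.5
```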

    But what data is Morey using to sign someone like Casspi? Morey has a stats degree and I only have a math degree, a big difference. But I would imagine it involves a comprehensive analysis of the system a player is currently constrained by, along with a stats-based analysis of the player's strengths. Inefficiencies occur when a player's strengths are minimized by the system he is playing within.

    I hope I have been informative.
     
  8. JustAGuy

    JustAGuy Member

    Joined:
    Dec 17, 2012
    Messages:
    1,464
    Likes Received:
    70
    I only minored in math, so according to the JTR rule :) I shouldn't be saying this, but yes, there are differences depending on the stat. Look at your very next sentence: "I know 3 pointers in particular has been a kind of stat that can fluctuate from year to year..." And you also know that eFG% has 3-pointers rolled into it. That should tell you that sample size can matter for that stat.
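eFG% gives a made three 1.5 times the credit of a made two, which is why swings in three-point shooting feed straight into it. A minimal sketch of the standard formula, eFG% = (FGM + 0.5 * 3PM) / FGA; the stat lines are hypothetical.

```python
def effective_fg_pct(fgm: int, three_pm: int, fga: int) -> float:
    """eFG% = (FGM + 0.5 * 3PM) / FGA; a made three counts as 1.5 made twos."""
    if fga <= 0:
        raise ValueError("eFG% is undefined without field-goal attempts")
    return (fgm + 0.5 * three_pm) / fga


# Hypothetical games with identical 5-of-10 shooting; the only difference is
# whether two of the makes were threes.
print(effective_fg_pct(5, 2, 10))  # 0.6
print(effective_fg_pct(5, 0, 10))  # 0.5
```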
     
  9. eslate22

    eslate22 Member

    Joined:
    Nov 9, 2009
    Messages:
    819
    Likes Received:
    746
     
  10. jtr

    jtr Contributing Member

    Joined:
    Dec 4, 2011
    Messages:
    7,470
    Likes Received:
    275
    Stats interpretation is constrained by knowledge. The more a person understands about stats the more they pose relevant questions about a statistical analysis. And of course the reverse is true. The less a person understands about stats the more they pose nonsensical questions about an analysis.
     
  11. jtr

    jtr Contributing Member

    Joined:
    Dec 4, 2011
    Messages:
    7,470
    Likes Received:
    275
    Hey. The difference between me minoring in math and obtaining a B.S. was 5 senior level math courses. None in stats. I did say the equivalent didn't I?
     
    #851 jtr, Dec 12, 2013
    Last edited: Dec 12, 2013
  12. jtr

    jtr Contributing Member

    Joined:
    Dec 4, 2011
    Messages:
    7,470
    Likes Received:
    275
    Arggh. Double post.
     
  13. kuku

    kuku Contributing Member

    Joined:
    Jul 17, 2012
    Messages:
    2,158
    Likes Received:
    125
    Brooks' Opponent Counterpart eFG% on 82games.com is amazingly low at 35.5%. Does that mean he is our best defensive PG? Most people will agree Brooks is our worst defensive PG, yet his opponents' eFG% is the best among the three.

    When evaluating a statistic, the first questions to ask are about its reliability and its validity against the eye test and other advanced stats. In this case, those checks, along with conventional wisdom, indicate that Opponent Counterpart eFG% is NOT a good advanced metric of a player's defensive production.
     
  14. JustAGuy

    JustAGuy Member

    Joined:
    Dec 17, 2012
    Messages:
    1,464
    Likes Received:
    70
    You can spend money and time on a hobby and be happy, but when it becomes a job, they'd better pay you what your time is worth. I doubt an NBA scouting job pays well enough to be worth it for torocan.
     
  15. lightningbolt

    lightningbolt Member

    Joined:
    Dec 15, 2011
    Messages:
    105
    Likes Received:
    25
    Sweet, I have a degree in Math from Courant. My opinion on sample size significance is that it sold out and is too mainstream, but I was down with it when it was more underground.
     
  16. JustAGuy

    JustAGuy Member

    Joined:
    Dec 17, 2012
    Messages:
    1,464
    Likes Received:
    70
    Yes and no. There is knowledge of statistics and there is knowledge of the domain. You need both to make intelligent statistical observations in a given field, but you can poke holes in an argument with knowledge in only one of the two things.
     
  17. torocan

    torocan Member

    Joined:
    Oct 15, 2012
    Messages:
    4,228
    Likes Received:
    436
    Okay, once again I apologize for the length of this post, but there's no easy way to explain this briefly. To keep it simpler, I will not include the equations and mathematical reasoning behind standard statistical calculations and assumptions. If folks would like to learn more about basic statistics, including mean/average, confidence level, and variation/volatility, feel free to search the Internet for a good statistics primer.

    Now, on to your comments...

    You're right. I was thinking of the estimated return time of 10-14 days and forgot that Beverley returned early.

    At the time of your original post (Dec 9), Beverley had missed 1 week. So he had a sample size of 19 games totaling 1322 minutes.

    So, your original statement was...

    And my reply was....

    At the time of your post, I noted the stats across the existing PGs in the lineup, mainly to show that the data wasn't confirming the hypothesis that Beverley was the superior defender, and that the data may be flawed due to sample size.

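    The reason sample size is important is that it determines how much an average computed from the sample can bounce around by chance. That bounce is the standard error: the standard deviation of the individual results divided by the square root of the sample size. Standard deviation itself measures how spread out the individual results are; it doesn't eliminate random noise, but it tells you how much noise there is.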

    The idea is that over a small number of events, you can get wildly varying results, which makes that data more subject to pure luck than to any predictable outcome. For example, if you flip a fair coin 2x, 25% of the time you'll get 2 heads, 25% of the time you'll get 2 tails, and 50% of the time you'll get 1 head and 1 tail. If you flipped a fair coin 2x, saw 2 heads, and concluded that it was a 2-headed coin, you would be wrong, and a fair coin hands you that misleading result 1 time in 4.
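The coin-flip percentages can be checked by listing every equally likely outcome of two fair flips. A minimal sketch using exact fractions:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# HH, HT, TH, TT: each equally likely for a fair coin.
outcomes = list(product("HT", repeat=2))
heads_counts = Counter(flips.count("H") for flips in outcomes)
probs = {heads: Fraction(count, len(outcomes)) for heads, count in sorted(heads_counts.items())}

print(probs)  # {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
```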

    As you increase the sample and the results normalize (fewer outliers and high-variance outcomes), the standard error of the average decreases. In other words, the more times you flip the coin, the more likely your result is to reflect the true nature of the coin.
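What shrinks as the sample grows is the standard error of the average, sqrt(p(1 - p)/n) for a coin with heads probability p, while the spread of individual flips stays the same. A minimal sketch comparing that formula with a seeded simulation of fair-coin runs; the trial count and seed are arbitrary choices.

```python
import math
import random
from statistics import pstdev


def standard_error(p: float, n: int) -> float:
    """Theoretical spread of the heads fraction after n flips: sqrt(p(1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)


def simulated_spread(n: int, trials: int = 1000, seed: int = 0) -> float:
    """Empirical spread of the heads fraction across many runs of n fair-coin flips."""
    rng = random.Random(seed)
    fractions = [sum(rng.random() < 0.5 for _ in range(n)) / n for _ in range(trials)]
    return pstdev(fractions)


for n in (2, 10, 100, 400):
    print(f"{n:>4} flips: theory {standard_error(0.5, n):.3f}, simulated {simulated_spread(n):.3f}")
```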

    Let's make a basic list of the opposing PGs he has faced (we'll assume he played 100% of their minutes, since it's too much work to extract individual lineup times and pull video on who was defending whom).

    These are the games Beverley has played in; the data was pulled from Basketball-Reference box scores/advanced stats.

    Kemba Walker - 12 points, .500 eFG%
    Damian Lillard - 22 points, .529 eFG%
    Steve Blake - 14 points, .500 eFG%
    Chris Paul - 14 points, .423 eFG%
    Kyle Lowry - 16 points, .400 eFG%

    Tony Wroten - 18 points, .523 eFG%
    Raymond Felton - 8 points, .333 eFG%
    Ty Lawson - 28 points, .632 eFG%
    Jordan Crawford - 6 points, .188 eFG%
    Jose Calderon - 13 points, .688 eFG%

    Ricky Rubio - 19 points, .389 eFG%
    Mike Conley - 10 points, .179 eFG%
    Jeff Teague - 4 points, 0 eFG%
    Shaun Livingston - 0 points, 0 eFG%
    Tony Parker - 27 points, .473 eFG%

    Trey Burke - 21 points, .583 eFG%
    Goran Dragic - 19 points, .636 eFG%
    Steph Curry - 22 points, .393 eFG%
    Jameer Nelson - 15 points, .583 eFG%

    So, using a standard deviation calculator like this one... http://easycalculation.com/statistics/standard-deviation.php

    You have the following outcome for points...

    Total numbers - 19
    Mean (Average) - 15.16
    Standard Deviation - 7.47
    Variance (sample) - 55.81
    Population Standard Deviation - 7.27
    Variance (population) - 52.87
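These figures can be reproduced with Python's standard `statistics` module from the 19 opponent point totals listed above. A minimal sketch; the sample functions divide by n - 1 and the population functions by n. The mean works out to 288/19, about 15.16.

```python
from statistics import mean, pstdev, pvariance, stdev, variance

# Opponent PG points in the 19 games listed in the post.
opp_points = [12, 22, 14, 14, 16, 18, 8, 28, 6, 13, 19, 10, 4, 0, 27, 21, 19, 22, 15]

print(len(opp_points))                  # 19
print(round(mean(opp_points), 2))       # 15.16
print(round(stdev(opp_points), 2))      # 7.47  (sample: divides by n - 1)
print(round(variance(opp_points), 2))   # 55.81
print(round(pstdev(opp_points), 2))     # 7.27  (population: divides by n)
print(round(pvariance(opp_points), 2))  # 52.87
```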

    We won't discuss variance or population standard deviation, as those are more technical and aren't necessary for our discussion.

    Standard deviation is basically a derived statistic that attempts to measure how much variation there is in the data, in other words, how random that data is over the sample. If your results were 5, 6, 5, 4 then your standard deviation would be small. If your results were 1, 4, 10, 0 then your standard deviation would be large, in other words highly random in nature.
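The two toy result sets from that paragraph, run through the sample standard deviation:

```python
from statistics import mean, stdev

steady = [5, 6, 5, 4]
erratic = [1, 4, 10, 0]

for label, results in (("steady", steady), ("erratic", erratic)):
    print(label, mean(results), round(stdev(results), 3))
# steady 5 0.816
# erratic 3.75 4.5
```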

    Let's say we then do the same basic standard deviation calculation for eFG% (I know, it's methodologically flawed since we haven't weighted by minutes, but let's just say for argument's sake)...

    Total numbers - 19
    Mean (Average) - .419
    Standard Deviation - .201
    Variance (sample) - .0405
    Population Standard Deviation - .196
    Variance (population) - .0384

    This all assumes a normal distribution (bell curve), which is an entirely different discussion. That's something we're not going to get into, as debates over distributions make stat heads' eyes go wonky.

    So what about this confidence level thing? Well, the confidence level is how often the method used to build the interval would capture the true value if you repeated it on many samples. I'm not going to get into the math, but let's just say that the confidence level affects how wide the resulting interval is, not the variance of the data itself.

    This is NOT to be confused with the confidence interval, which is the range of values around the sample average that the chosen confidence level covers.

    In essence, the higher the confidence level, the more reliably your interval (margin of error) will catch the true value rather than miss it, and the wider the interval has to be to do so. Lower confidence levels give narrower intervals, but with a higher probability that the true value falls outside them purely by chance.
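For a normal-approximation interval, the confidence level sets the multiplier z in mean +/- z * standard error; a higher level means a larger z and a wider interval. A minimal sketch of those multipliers (roughly 1.28, 1.44, 1.64 and 1.96 for 80%, 85%, 90% and 95%):

```python
from statistics import NormalDist


def z_critical(confidence: float) -> float:
    """Two-sided critical value: the interval is mean +/- z * standard error."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


for level in (0.80, 0.85, 0.90, 0.95):
    print(f"{level:.0%}: z = {z_critical(level):.3f}")
```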

    So, let's say we want to muck with the confidence level... we might go to a basic calculator like this... http://www.mccallum-layton.co.uk/stats/ConfidenceIntervalCalc.aspx

    This will affect the confidence interval (or range of error). So let's take a look at different values with different confidence levels...

    Opponent PPG

    95% - CI 11.521 - 18.719
    90% - CI 12.155 - 18.085
    85% - CI 12.65 - 17.59
    80% - CI 12.92 - 17.32

    Opponent eFG%

    95% - CI .328 - .509
    90% - CI .343 - .495
    85% - CI .352 - .485
    80% - CI .360 - .478
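A minimal sketch that recomputes intervals like the ones in these tables directly from the 19 listed games, using a plain normal approximation (mean +/- z * s / sqrt(n)) and, like the post, no weighting by minutes. The results may differ slightly from the online calculator's, which appears to have used larger t-based multipliers for some rows.

```python
import math
from statistics import NormalDist, mean, stdev

# The 19 games listed in the post: opponent PG points and eFG%.
opp_points = [12, 22, 14, 14, 16, 18, 8, 28, 6, 13, 19, 10, 4, 0, 27, 21, 19, 22, 15]
opp_efg = [0.500, 0.529, 0.500, 0.423, 0.400, 0.523, 0.333, 0.632, 0.188, 0.688,
           0.389, 0.179, 0.0, 0.0, 0.473, 0.583, 0.636, 0.393, 0.583]


def normal_ci(data: list[float], confidence: float) -> tuple[float, float]:
    """Normal-approximation interval for the mean: mean +/- z * s / sqrt(n)."""
    center = mean(data)
    se = stdev(data) / math.sqrt(len(data))
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return center - z * se, center + z * se


for level in (0.95, 0.90, 0.85, 0.80):
    p_lo, p_hi = normal_ci(opp_points, level)
    e_lo, e_hi = normal_ci(opp_efg, level)
    print(f"{level:.0%}: PPG {p_lo:.2f}-{p_hi:.2f}, eFG% {e_lo:.3f}-{e_hi:.3f}")
```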

    So how unreliable is this 19-game sample? Just looking at the point values, the true defensive result could be anywhere from 12.92 to 17.32 points per game, with an opponent eFG% anywhere from .360 to .478, at a confidence level of 80%.

    In other words, you could argue almost as easily that Beverley's true opponent PG scoring average is 12 ppg as that it is 18 ppg, and almost as easily that his true opponent eFG% is .360 as that it is .478.

    Given that the league average eFG% is .494 (http://www.basketball-reference.com/leagues/NBA_2014.html) and last year's league average eFG% was .487 (I haven't calculated this year's eFG% by position, and sadly Hoopdata is not doing this year's stats), the range that could be explained by random noise (variance) is large, and even then there is a 20% chance that the true value falls outside these ranges.

    The difference is that at one end of the range you have All-Star-level defense, and at the other end you have slightly above-average defense, at a confidence level of 80%.

    So, in conclusion, based on standard statistical methods, the sample is too small and too variable, and its difference from NBA-average defense is too small, to proclaim much of anything.

    Note that when I say the variation is too small, I mean this: if Beverley had an average opponent PPG of 5 and an opponent eFG% of, let's say, .150, then you could argue that despite the small sample size, the result was SO far outside the normal deviation that it would be unlikely he had merely average defense. And if opponent PPG was, let's say, 1 and opponent eFG% was .050, then you would have the basis of an argument that he was at least above average on defense, and potentially better, despite the small sample size.
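The "how far outside" idea can be made concrete with a z-score (how many standard errors the sample average sits from the league average) plus a two-sided p-value. A minimal sketch using the normal approximation and the .494 league figure from the post; the .150 case is the post's hypothetical and assumes the same game-to-game spread as the real sample.

```python
import math
from statistics import NormalDist, mean, stdev

# Opponent PG eFG% in the 19 games listed in the post.
opp_efg = [0.500, 0.529, 0.500, 0.423, 0.400, 0.523, 0.333, 0.632, 0.188, 0.688,
           0.389, 0.179, 0.0, 0.0, 0.473, 0.583, 0.636, 0.393, 0.583]
LEAGUE_EFG = 0.494  # league average eFG% quoted in the post


def z_vs_league(sample_mean: float, sd: float, n: int, league: float = LEAGUE_EFG) -> float:
    """How many standard errors the sample mean sits from the league average."""
    return (sample_mean - league) / (sd / math.sqrt(n))


def two_sided_p(z: float) -> float:
    """Chance of a result at least this far from league average if the true mean were league average."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


sd, n = stdev(opp_efg), len(opp_efg)
actual_z = z_vs_league(mean(opp_efg), sd, n)
# Hypothetical from the post: an opponent eFG% of .150 with the same spread.
extreme_z = z_vs_league(0.150, sd, n)

print(f"actual:  z = {actual_z:.2f}, p = {two_sided_p(actual_z):.3f}")
print(f"extreme: z = {extreme_z:.2f}, p = {two_sided_p(extreme_z):.1e}")
```

With these inputs the real sample lands about 1.6 standard errors below league average (p around 0.10), short of conventional significance, while the hypothetical .150 lands more than 7 standard errors away.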

    So, a small sample size tends to undermine statistical value, especially if there's high variation, especially if you haven't included context, and especially if you have no corroborating data or secondary modeling to indicate that your primary data is trending in the same direction.

    If you require any further clarification feel free to ask...
     
  18. gene18

    gene18 Rookie

    Joined:
    Dec 29, 2012
    Messages:
    990
    Likes Received:
    23
    I did not say that it was not an impressive performance. I thought it was. I was responding to a poster who thought it was impressive. I then calculated the percentage decrease from his average and found it was only a 4% decrease. I then questioned when a decrease of 4% is statistically significant; an effect size of this magnitude did not seem to be. If I had the standard error of the mean, it would give me a better handle on this question. However, I am writing this on the spur of the moment and did not have time to compute this important measure.

    You are misinterpreting my statement. All measures taken from a sample have what is called error variance, that is, variation in the measure that is unaccounted for. That's why we use tests of significance, and why we use the term statistic for these measures instead of parameter, which is derived from the entire population. FG% is no exception.

    Most if not all students of the game use FG% or a variant, such as eFG% and TS%. I have not computed the correlations among these measures, but I am sure they are highly correlated. In my experience, if one measurement is incorporated into another (as these are), they are usually highly correlated. That's why we usually use factor analysis with orthogonal rotations (non-orthogonal rotations are really impossible to interpret) to determine the underlying factors in a correlation matrix. If you formed a correlation matrix of these measures, I am almost positive only one general factor would logically emerge, as they measure essentially the same thing (IMO).

    BTW, the NBA stats site gives FG%, eFG%, and TS% as the measures of shooting efficiency in its shooting efficiency section. Why should I be any different? Also, who said that FG% is notoriously deceptive, and more so than other measures? As I mentioned, the measures that I stated above do not seem to me to be substantially more accurate, because they are highly correlated with each other.

    To answer your question about using FG% for this calculation: simple. It is the only one that I had available at the time, and as I said, the variants of this measure, eFG% and TS%, are very highly correlated with FG% and will most likely yield the same results. If I had others, I would use them.
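To put numbers on the standard-error question, here is a minimal sketch that treats each shot as an independent make or miss at the player's season rate (a simplifying assumption) and reads "4% decrease" as a relative drop. The 50% season rate and 20 attempts are hypothetical, not Tony Parker's actual figures.

```python
import math
from statistics import NormalDist


def fg_pct_standard_error(true_pct: float, attempts: int) -> float:
    """Binomial standard error of one game's FG%: sqrt(p * (1 - p) / n)."""
    return math.sqrt(true_pct * (1 - true_pct) / attempts)


def relative_drop_test(season_pct: float, relative_drop: float, attempts: int) -> tuple[float, float]:
    """z-score and two-sided p-value for a game FG% that is `relative_drop` below the season rate."""
    game_pct = season_pct * (1 - relative_drop)
    z = (game_pct - season_pct) / fg_pct_standard_error(season_pct, attempts)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical inputs: a 50% shooter, a 4% relative drop, 20 shots in the game.
z, p = relative_drop_test(0.50, 0.04, 20)
print(f"z = {z:.2f}, p = {p:.2f}")
```

Under these assumptions, a 4% relative dip over 20 shots is well under one standard error, which fits the hunch that it is not statistically significant.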
     
  19. gene18

    gene18 Rookie

    Joined:
    Dec 29, 2012
    Messages:
    990
    Likes Received:
    23
    Correction of second sentence: ... a poster who thought it was not impressive. If a 4% deviation from Tony Parker's average FG% is not statistically significant, then it was an impressive performance, since his average performance is considered impressive. I hope my basic hypothesis is clear to you.
     
  20. munsteur

    munsteur Member

    Joined:
    Jan 9, 2013
    Messages:
    516
    Likes Received:
    16
    Thanks, Prof Torocan, for this incredibly well-done Stats 101 lecture. Where do you teach in real life?

    I now feel that I have a better practical understanding of these stats tools, a year after I took the course.
     
