MLB FORUM

Comments

  • ballslaughter

    I just wanted to get some opinions on how you all weigh last year’s stats versus this year’s. As part of my MLB research, next to each team name I write where they rank in the league in runs scored, home runs, etc. Using 2019 would seem like the instinctive route, but this early in the season the sample size is limited. So when I checked last year’s numbers instead, with more than double the sample size, there were some sizable disparities. Last year Cleveland was 3rd in runs & 6th in HRs; this year, 24th & 25th.
    I realize there can/will be lineup differences… As a quick fix, my thought was to take the # of games played last year & this year to get a ratio, & then assign each team my own ranking based on that. For example, if a team played 162 games last year & 54 this year, that’s 3 to 1. Then say the gap between last year’s home run rank and this year’s is 12: last year they were 7th, this year they’re 19th. Weighted 3 to 1, I’d rank them 10th (7 + 12/4, since this year is 1 of the 4 total parts).
    Doing it that way puts all the emphasis on quantity & none on recency, though, so maybe some extra math is needed to shift it slightly toward the current year’s stats…
    Also, for anyone who does use previous years’ stats: at what point in a season would you switch to focusing only on current stats? Halfway? Three quarters?
    I’m also curious how to rank teams defensively. Some sites simply rank teams by runs allowed, and others use things like fielding percentage, errors, & so on.
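A minimal sketch of the games-played weighting described in the first post, assuming you already have each team’s rank in both seasons. `recency_boost` is a hypothetical knob (not something from the thread) that multiplies this season’s weight, which is one simple way to lean toward recency as the season goes on.

```python
def blended_rank(last_rank: float, this_rank: float,
                 last_gp: int, this_gp: int,
                 recency_boost: float = 1.0) -> float:
    """Weighted average of two league ranks, weighting each season by games played.

    recency_boost is a hypothetical tuning knob: values above 1.0 inflate
    the current season's weight so the blend leans toward recent results.
    """
    w_last = float(last_gp)
    w_this = this_gp * recency_boost
    return (w_last * last_rank + w_this * this_rank) / (w_last + w_this)


# Example from the post: 7th over 162 games last year, 19th over 54 this year.
print(blended_rank(7, 19, 162, 54))                     # 10.0
print(blended_rank(7, 19, 162, 54, recency_boost=2.0))  # leans toward 19th
```

With the default boost the blend lands at 10.0, a quarter of the way from 7 to 19, because a 3:1 weighting gives this year 1 of 4 parts rather than 1 of 3. The blended values are not whole ranks, so you would re-sort all 30 teams by them to get a final ordering.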

  • Jvanspro

    For team stats I work exclusively off this season. Team rosters can change considerably from season to season, and I believe using last season is a massive mistake. Take Cleveland, for example: they have only 3 players from last season in the lineup most days. Brantley, E5, Alonso, Naquin, Gomes and Chisenhall are all gone. That is going to massively skew your data.

    For the individual players I do combine last season and this season though.

  • sochoice

    • 2017 DraftKings FBWC Finalist

    • 2017 FanDuel WFFC Champion

    At this point, this season exclusively.

  • rannas23

    Doesn’t Noto use last year’s and this year’s data in his First Look data tables? I understand the reasoning for doing this for the first 2 months, until there’s enough data, but I kind of wish it would switch to this year’s data exclusively as the season moves along, which would seem more accurate. Or have two lines under each: one for 18/19 data and one for 19 data only.

  • Jvanspro

    @sochoice said...

    At this point, this season exclusively.

    I would argue that for a player you need at least 500 AB to get a true representation of the numbers. This season’s data alone can’t show that yet, since the league leaders in AB are just approaching 200. It’s a different story if we’re talking team stats, though.
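To put rough numbers on the sample-size point, the sketch below computes the binomial standard error of a batting average. It assumes every at-bat is an independent trial with the same hit probability, which real hitters only approximate, and .270 is an arbitrary illustrative true average.

```python
import math


def batting_avg_standard_error(true_avg: float, at_bats: int) -> float:
    """Binomial standard error of an observed batting average.

    Assumes every at-bat is an independent trial with the same hit
    probability, which is only a rough approximation for real hitters.
    """
    return math.sqrt(true_avg * (1.0 - true_avg) / at_bats)


for ab in (200, 500):
    se = batting_avg_standard_error(0.270, ab)
    # Roughly 95% of observed averages land within about 2 standard errors.
    print(f"{ab} AB: +/- {2 * se:.3f}")
```

At .270 that works out to roughly ±.063 at 200 AB versus ±.040 at 500 AB, so even 500 AB leaves a fairly wide band.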

  • ballslaughter

    I’m glad I asked!! Thank you guys.
    How about the defensive aspect? I’m wondering if fielding percentage wouldn’t be more accurate than just runs scored against per game…
    And oh yeah…
    What’s your go-to place for park rankings on hitter friendliness/pitcher friendliness? I see some disparity between sources on that subject too. Bleacher Report ranks them using 5 or 6 different factors, which sounded good, but…
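Fielding percentage is conventionally (putouts + assists) / (putouts + assists + errors). The sketch below ranks teams on both that and runs allowed per game so the two orderings can be compared side by side; the team names and totals are placeholder numbers, not real 2019 data.

```python
from dataclasses import dataclass


@dataclass
class TeamDefense:
    name: str
    putouts: int
    assists: int
    errors: int
    runs_allowed: int
    games: int

    @property
    def fielding_pct(self) -> float:
        # Conventional definition: (PO + A) / (PO + A + E)
        chances = self.putouts + self.assists + self.errors
        return (self.putouts + self.assists) / chances

    @property
    def runs_allowed_per_game(self) -> float:
        return self.runs_allowed / self.games


def rank_by(teams, key, best_is_high: bool):
    """Return {team name: rank}, where rank 1 is the best team on `key`."""
    ordered = sorted(teams, key=key, reverse=best_is_high)
    return {t.name: i + 1 for i, t in enumerate(ordered)}


# Hypothetical totals for illustration only.
teams = [
    TeamDefense("Team A", 1450, 560, 20, 230, 54),
    TeamDefense("Team B", 1440, 590, 35, 210, 54),
    TeamDefense("Team C", 1460, 540, 28, 260, 54),
]

fpct_rank = rank_by(teams, lambda t: t.fielding_pct, best_is_high=True)
ra_rank = rank_by(teams, lambda t: t.runs_allowed_per_game, best_is_high=False)
for t in teams:
    print(t.name, "FPCT rank:", fpct_rank[t.name], "RA/G rank:", ra_rank[t.name])
```

In this made-up example the two orderings disagree (Team B has the worst fielding percentage but allows the fewest runs), which is expected: runs allowed also reflects pitching, while fielding percentage only counts errors.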

  • d735123

    @ballslaughter said...

    I’m glad I asked!! Thank you guys.
    How about the defensive aspect? I’m wondering if fielding percentage wouldn’t be more accurate than just runs scored against per game…
    And oh yeah…
    What’s your go-to place for park rankings on hitter friendliness/pitcher friendliness? I see some disparity between sources on that subject too. Bleacher Report ranks them using 5 or 6 different factors, which sounded good, but…

    What is your ultimate goal? All this stuff (and much more) will already be reflected in the Vegas line for any particular game.

  • rgriffin37

    For what it’s worth, for individual hitters I like to look at both the start of last year through the present and this year only. I feel like it helps identify guys who have improved and/or regressed this year. Determining whether the changes are sustainable or will even out as the season progresses is another story, but I like to have the info.

    As some others have said, I’d go with this year’s stats for teams as some teams have changed drastically from last year to this year.

  • TheDataDetective

    • Blogger of the Month

    I would be skeptical about using team stats from previous seasons because so many things change about a team (most notably the roster) from year to year.

    As for individual stats, I think you absolutely need to look beyond this season if you want a statistically significant sample size. My app uses the last 3 seasons when building its player projections, but I overweight recent performance (the last 30 days weigh the most, the previous 60 days the next most, etc.) to account for hot/cold streaks and player evolution. I have run large-sample back-testing demonstrating that this improves predictive accuracy (measured by the correlation between my app’s projections and actual player performance).
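TheDataDetective doesn’t publish his weights, so the numbers below are placeholders. This is a sketch of the general idea only: per-window totals (last 30 days, the 60 days before that, the rest of a 3-season window) combined with recency weights, plus the Pearson correlation he mentions as a back-testing metric.

```python
import math


def recency_weighted_rate(windows):
    """Fantasy points per plate appearance across several time windows.

    `windows` holds (points, plate_appearances, weight) tuples. Each window
    counts in proportion to weight * PA, so its influence grows with both
    its recency weight and its sample size.
    """
    num = sum(points * weight for points, _pa, weight in windows)
    den = sum(pa * weight for _points, pa, weight in windows)
    return num / den


def pearson(xs, ys):
    """Pearson correlation, e.g. projections vs. actual fantasy points."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Placeholder windows: last 30 days, previous 60 days, rest of 3 seasons.
windows = [(60.0, 110, 3.0), (100.0, 230, 2.0), (700.0, 1500, 1.0)]
print(round(recency_weighted_rate(windows), 3))

# Placeholder back-test: projected vs. actual points for five hitter-games.
projected = [9.5, 7.0, 12.0, 6.5, 10.0]
actual = [12.0, 3.0, 15.5, 9.0, 6.0]
print(round(pearson(projected, actual), 3))
```

A real back-test would run the correlation over thousands of hitter-games; five are shown only to keep the example short.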

  • ballslaughter

    @d735123 said...

    What is your ultimate goal? All this stuff (and much more) will already be reflected in the Vegas line for any particular game.

    But seriously… Vegas could miss something, man. Have you ever been there? It’s a busy place. Lol.
    No, I’m sure you have a point. As a single, childless guy, I have a fair amount of time for researching, which, as a math guy, I enjoy. And I feel like the more you can know about the games, the better.
    It might not happen super often, but I’m sure there have been situations where I was comparing two pitchers or hitter stacks that seemed fairly similar, & the deciding factor was something small, like one being in the 4th-best hitter park vs the 28th-best. Or two pitchers facing offenses with similar run totals, but one has the 18th-ranked defense behind him and the other the 3rd… that kind of stuff happens.

  • d735123

    @ballslaughter said...

    But seriously… Vegas could miss something, man. Have you ever been there? It’s a busy place. Lol.
    No, I’m sure you have a point. As a single, childless guy, I have a fair amount of time for researching, which, as a math guy, I enjoy. And I feel like the more you can know about the games, the better.
    It might not happen super often, but I’m sure there have been situations where I was comparing two pitchers or hitter stacks that seemed fairly similar, & the deciding factor was something small, like one being in the 4th-best hitter park vs the 28th-best. Or two pitchers facing offenses with similar run totals, but one has the 18th-ranked defense behind him and the other the 3rd… that kind of stuff happens.

    Fair points. When I started, I spent a lot of time doing the same. And I came to the conclusion that spending so much time researching baseball stats was mostly a waste of time. Just knowing, for example, that a batter was hitting third on a team with an implied run total of 5 told me almost everything I’d learn by digging into more detailed stats. And what little extra information could be gained was basically useless because a) others could see that information too, with free stats available on multiple sites, and b) that small information gain is more than washed out by normal day-to-day variability. (As another poster here often says, Mike Trout is always a top batter, but it’s no surprise if he goes 0 for 4 on any given night.)

    Time is much better spent researching not baseball but your real competition: other DFS players. Lots of people can tell you, for example, how a left-handed batter fares against right-handed pitching, or which stadium favors right-handed power hitters. But few can tell you what percentage of lineups are 5-man stacks vs. non-stacks and how well the former fare against the latter, how often the winning lineup includes a min-salary guy, or what percentage of lineups will stack the night’s highest run-total favorite, etc. All of that is much more useful than any batting, pitching, or team stats.
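The contest-level questions raised in that post (for example, what share of lineups are 5-man stacks) are simple to compute once you have the lineups. This sketch assumes a hypothetical data format where each lineup is just the list of teams its hitters play for, pitchers excluded, and treats "5-man stack" as at least five hitters from one team.

```python
from collections import Counter


def largest_stack(hitter_teams):
    """Size of the biggest group of hitters from the same team."""
    return max(Counter(hitter_teams).values())


def stack_share(lineups, min_stack=5):
    """Fraction of lineups with at least `min_stack` hitters from one team."""
    stacked = sum(1 for lu in lineups if largest_stack(lu) >= min_stack)
    return stacked / len(lineups)


# Hypothetical lineups: the team of each of 8 hitters.
lineups = [
    ["COL", "COL", "COL", "COL", "COL", "NYY", "HOU", "ATL"],
    ["NYY", "NYY", "NYY", "NYY", "LAD", "LAD", "HOU", "ATL"],
    ["HOU", "HOU", "HOU", "HOU", "HOU", "BOS", "BOS", "SEA"],
    ["BOS", "LAD", "SEA", "NYY", "COL", "HOU", "ATL", "MIN"],
]
print(stack_share(lineups))  # 0.5
```

The same `largest_stack` helper could be joined with contest results to compare how stacked and non-stacked lineups finished.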

  • Jvanspro

    @d735123 said...

    Fair points. When I started, I spent a lot of time doing the same. And I came to the conclusion that spending so much time researching baseball stats was mostly a waste of time. Just knowing, for example, that a batter was hitting third on a team with an implied run total of 5 told me almost everything I’d learn by digging into more detailed stats. And what little extra information could be gained was basically useless because a) others could see that information too, with free stats available on multiple sites, and b) that small information gain is more than washed out by normal day-to-day variability. (As another poster here often says, Mike Trout is always a top batter, but it’s no surprise if he goes 0 for 4 on any given night.)

    Time is much better spent researching not baseball but your real competition: other DFS players. Lots of people can tell you, for example, how a left-handed batter fares against right-handed pitching, or which stadium favors right-handed power hitters. But few can tell you what percentage of lineups are 5-man stacks vs. non-stacks and how well the former fare against the latter, how often the winning lineup includes a min-salary guy, or what percentage of lineups will stack the night’s highest run-total favorite, etc. All of that is much more useful than any batting, pitching, or team stats.

    I would disagree with the Trout take. While he may be the best player in the game, he is nowhere near the top play every night. There are plenty of nights he’s not in my top 10 or 20 plays, and there are slates where I cross him off completely. Every night is different, but I do feel like looking only at Vegas and batting order is a mistake.

  • DFSx42

    The only reasons ever not to play Trout in cash are pricing, weather and incompatible team builds.

    I don’t think he’s been outside the top 5 of any consensus rankings a single night all year. If he’s regularly not even in your top 20, you are either a psychic or using a wildly biased projection model.

  • TheDataDetective

    • Blogger of the Month

    @d735123 said...

    Fair points. When I started, I spent a lot of time doing the same. And I came to the conclusion that spending so much time researching baseball stats was mostly a waste of time. Just knowing, for example, that a batter was hitting third on a team with an implied run total of 5 told me almost everything I’d learn by digging into more detailed stats. And what little extra information could be gained was basically useless because a) others could see that information too, with free stats available on multiple sites, and b) that small information gain is more than washed out by normal day-to-day variability. (As another poster here often says, Mike Trout is always a top batter, but it’s no surprise if he goes 0 for 4 on any given night.)

    Time is much better spent researching not baseball but your real competition: other DFS players. Lots of people can tell you, for example, how a left-handed batter fares against right-handed pitching, or which stadium favors right-handed power hitters. But few can tell you what percentage of lineups are 5-man stacks vs. non-stacks and how well the former fare against the latter, how often the winning lineup includes a min-salary guy, or what percentage of lineups will stack the night’s highest run-total favorite, etc. All of that is much more useful than any batting, pitching, or team stats.

    I strongly disagree with this sentiment. There are so many factors beyond implied run total and batting position that affect how a hitter will fare (e.g. BvP matchup, weather, opposing bullpen, hot/cold streaks, etc.), and his price also determines whether he is a good value. Besides, if you only look at implied run totals and batting position, you will likely follow the herd and won’t have an edge on the competition. While it’s true that everyone has access to the same data, it’s incredibly difficult for the average casual player to process all this information effectively without a spreadsheet or app to crunch the numbers and distill them into simple metrics (e.g. projections) that can be used when building lineups. It’s true that variance can rear its ugly head and screw up even the best strategy on any given slate, but over the long haul a good projections model will lead to better results. I’m living proof: I have been successful thanks to my app despite knowing very little about the sport.

  • AlexSonty

    • Blogger of the Month

    @Jvanspro said...

    For team stats I work exclusively off this season. Team rosters can change considerably from season to season, and I believe using last season is a massive mistake. Take Cleveland, for example: they have only 3 players from last season in the lineup most days. Brantley, E5, Alonso, Naquin, Gomes and Chisenhall are all gone. That is going to massively skew your data.

    For the individual players I do combine last season and this season though.

    FanGraphs has an active roster box you can click. You should use that for team stats.

  • AlexSonty

    • Blogger of the Month

    @Jvanspro said...

    I would argue that for a player you need at least 500 AB to get a true representation of the numbers. This season’s data alone can’t show that yet, since the league leaders in AB are just approaching 200. It’s a different story if we’re talking team stats, though.

    500 is still liberal, but for platoon stats, especially versus lefties, we don’t have much choice but to use small samples.

  • AlexSonty

    • Blogger of the Month

    @ballslaughter said...

    I’m glad I asked!! Thank you guys.
    How about the defensive aspect? I’m wondering if fielding percentage wouldn’t be more accurate than just runs scored against per game…
    And oh yeah…
    What’s your go-to place for park rankings on hitter friendliness/pitcher friendliness? I see some disparity between sources on that subject too. Bleacher Report ranks them using 5 or 6 different factors, which sounded good, but…

    I use Baseball Prospectus’ three-year data for park factors. Swish Analytics is good for tournaments; their data is pretty aggressive both ways.
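For reference, one common simple form of a runs park factor divides runs per game (both teams) in a club’s home games by runs per game in its road games, pooling several seasons to steady the estimate. Baseball Prospectus uses a more involved method; this is only a sketch with made-up totals.

```python
def runs_park_factor(seasons):
    """Crude runs park factor pooled over several seasons.

    Each season is a dict with total runs by both teams and games played,
    split into the club's home and road games. Above 1.0 means more scoring
    at home than on the road (hitter-friendly by this measure).
    """
    home_total = sum(s["runs_at_home"] for s in seasons)
    home_games = sum(s["home_games"] for s in seasons)
    road_total = sum(s["runs_on_road"] for s in seasons)
    road_games = sum(s["road_games"] for s in seasons)
    return (home_total / home_games) / (road_total / road_games)


# Made-up totals for three seasons of one club.
seasons = [
    {"runs_at_home": 820, "home_games": 81, "runs_on_road": 700, "road_games": 81},
    {"runs_at_home": 790, "home_games": 81, "runs_on_road": 730, "road_games": 81},
    {"runs_at_home": 810, "home_games": 81, "runs_on_road": 715, "road_games": 81},
]
print(round(runs_park_factor(seasons), 3))
```

Pooling three seasons is the same idea as the three-year data mentioned above: one season of home games is a noisy sample on its own.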

  • d735123

    @TheDataDetective said...

    I strongly disagree with this sentiment. There are so many factors beyond implied run total and batting position that impact how a hitter will fare (e.g. BvP matchup, weather, opposing bullpen, hot/cold streaks, etc) and his price also influences whether he is a good value. Besides, of you only look at implied run totals and batting position then you will likely follow the herd and won’t have an edge on the competition. While it’s true that everyone has access to the same data, it’s incredibly difficult for the average casual player to effectively process all this information without the help of a spreadsheet or app to crunch the numbers and distill them into simple metrics (e.g. projections) that can be used when building lineups. It’s true that variance can rear its ugly head and screw up even the best strategy on any given slate, but over the long haul a good projections model will lead to better results. I’m living proof as I have been successful thanks to my app despite knowing very little about the sport.

    I offer you a friendly challenge. You use your model. I will use only implied run total + batting order. Pick 3 batters a night. Forget salary for the sake of simplicity. Let’s see where we are after 100 nights.
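The "implied run total + batting order" baseline from this challenge can be written in a few lines. This sketch assumes a hypothetical slate format (a dict of team to implied total, and hitter records with team and lineup spot), and it treats spots 1 to 3 as the "best" spots, which the thread does not specify.

```python
from dataclasses import dataclass


@dataclass
class Hitter:
    name: str
    team: str
    order: int  # batting-order spot, 1-9


def baseline_picks(implied_totals, hitters, spots=(1, 2, 3)):
    """Hitters in `spots` from the team with the highest implied run total."""
    top_team = max(implied_totals, key=implied_totals.get)
    picks = [h for h in hitters if h.team == top_team and h.order in spots]
    return sorted(picks, key=lambda h: h.order)


# Hypothetical slate.
implied = {"COL": 6.1, "NYY": 5.4, "SEA": 3.9}
hitters = [
    Hitter("C4", "COL", 4), Hitter("C1", "COL", 1), Hitter("N1", "NYY", 1),
    Hitter("C2", "COL", 2), Hitter("C3", "COL", 3),
]
print([h.name for h in baseline_picks(implied, hitters)])
```

Changing `spots` (for example to `(2, 3, 4)`) is the obvious knob if "best" means something other than the top of the order.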

  • TheDataDetective

    • Blogger of the Month

    @d735123 said...

    I offer you a friendly challenge. You use your model. I will use only implied run total + batting order. Pick 3 batters a night. Forget salary for the sake of simplicity. Let’s see where we are after 100 nights.

    But 3 high-priced hitters don’t equate to a lineup. The key to a winning lineup is finding value somewhere to complement the blue-chip players you pay up for. It doesn’t take a rocket scientist to play Arenado at Coors, etc. I accept your challenge, but how about making it a daily FD head-to-head?

  • d735123

    No need, I just ran the experiment. I went through your daily blog posts going back to May 1 and tabulated the FD points for your top 3 highest projections. Then I did the same for the best 3 batting order spots on the team with the highest implied run total. Your model’s top 3 guys averaged 14.7 points. Just going with batting order + run total averaged 16.5. And it’s far, far worse for you if you adjust for salary, because your top picks are always top-dollar guys, whereas the simple method’s guys are not always. Standard deviation was about the same (yours was slightly higher, 13.7 vs. 12.7).

    So where is the added value of those factors beyond the Vegas line that you mention? If there were value, it would be incorporated into the line already. If it is not incorporated into the line, then it has no value. In other words, the line is efficient. Processing publicly available information every which way till Sunday will not improve on that (as this little experiment shows).
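The summary numbers quoted in that post are enough for a rough check of whether a 1.8-point gap could be noise. The sketch computes Welch’s t statistic from means and standard deviations; the thread never reports how many picks were tabulated, so `n` is an assumption and the loop tries a few values.

```python
import math


def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Welch's t statistic for the difference of two means (b minus a)."""
    se = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_b - mean_a) / se


# Summary stats quoted in the thread: model 14.7 (SD 13.7), baseline 16.5 (SD 12.7).
for n in (60, 90, 120):  # n is an assumption; the thread does not report it
    print(n, round(welch_t(14.7, 13.7, n, 16.5, 12.7, n), 2))
```

For any n in that range the t statistic stays around 1 or below, well short of the ~2 usually needed to call a difference significant. Picks from the same night (and, for the baseline, the same team) are also correlated, so the effective sample is smaller than n suggests.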

  • DFSx42

    Pretty weak to offer a challenge, have it accepted (on more reasonable terms), and then say there’s no need and, without showing the actual data (a Google Sheet), just issue a blanket statement that you are right and he is wrong.

  • TheDataDetective

    • Blogger of the Month

    @d735123 said...

    No need, I just ran the experiment. I went through your daily blog posts going back to May 1 and tabulated the FD points for your top 3 highest projections. Then I did the same for the best 3 batting order spots on the team with the highest implied run total. Your model’s top 3 guys averaged 14.7 points. Just going with batting order + run total averaged 16.5. And it’s far, far worse for you if you adjust for salary, because your top picks are always top-dollar guys, whereas the simple method’s guys are not always. Standard deviation was about the same (yours was slightly higher, 13.7 vs. 12.7).

    So where is the added value of those factors beyond the Vegas line that you mention? If there were value, it would be incorporated into the line already. If it is not incorporated into the line, then it has no value. In other words, the line is efficient. Processing publicly available information every which way till Sunday will not improve on that (as this little experiment shows).

    That’s interesting, but it’s based on a very small sample, and looking only at the top 3 highest-projected hitters on a slate is rather arbitrary. I.e., it doesn’t really prove much about identifying players with high value relative to salary (which is the strength of my model) or about helping to construct successful lineups. To do that, you really need to look at all players from top to bottom. My app is able to consistently identify lower-cost (and often lower-owned) players who do well, and they’re often on teams with low Vegas run totals and/or batting low in the order. A great example from this past week was my killer Diamondbacks stack on May 25 (https://rotogrinders.com/blog-posts/mlb-data-detective-best-player-values-may-25-early-slate-3037772); your approach would have completely overlooked them due to the low Vegas run total.

    In any case, there’s more than one way to skin a cat, so if that simple approach works for you then more power to you.
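"Value relative to salary" is commonly expressed as projected points per $1,000 of salary. A minimal sketch, assuming a hypothetical list of (name, projection, salary) tuples with made-up numbers, ranks every player that way instead of looking only at the top three raw projections.

```python
def value_per_1k(projection: float, salary: int) -> float:
    """Projected fantasy points per $1,000 of salary."""
    return projection / (salary / 1000)


def rank_by_value(players):
    """Sort (name, projection, salary) tuples by points per $1K, best first."""
    return sorted(players, key=lambda p: value_per_1k(p[1], p[2]), reverse=True)


# Made-up projections and salaries.
players = [
    ("Star", 14.0, 4500),
    ("Mid", 10.5, 3000),
    ("Cheap", 7.5, 2000),
]
for name, proj, sal in rank_by_value(players):
    print(name, round(value_per_1k(proj, sal), 2))
```

Here the player with the highest raw projection has the lowest value per $1K, which is the distinction being drawn between top projections and top values.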

  • d735123

    @DFSx42 said...

    Pretty weak to offer a challenge, have it accepted (on more reasonable terms), and then say there’s no need and, without showing the actual data (a Google Sheet), just issue a blanket statement that you are right and he is wrong.

    What? I provided the data. You mean the raw data? What is that going to tell you beyond the means and SDs I provided? I’m not wasting two hours typing names and scores into a spreadsheet. If you doubt the veracity of my numbers, it’s all publicly available info, so anyone who’s really interested can verify them.

    I have little interest in a daily h2h. My challenge was to compare batter projections generated by the model against the implied run total to see whether there really is useful information that is not already reflected in the line. An h2h doesn’t test that. If DataDetective wants to proceed with the original challenge, fine. You already know who my picks are every night; I don’t even have to post them. DataDetective can post his picks here. Based on what I’ve seen, though, this will be a waste of time.

  • d735123

    @TheDataDetective said...

    That’s interesting, but it’s based on a very small sample, and looking only at the top 3 highest-projected hitters on a slate is rather arbitrary. I.e., it doesn’t really prove much about identifying players with high value relative to salary (which is the strength of my model) or about helping to construct successful lineups. To do that, you really need to look at all players from top to bottom. My app is able to consistently identify lower-cost (and often lower-owned) players who do well, and they’re often on teams with low Vegas run totals and/or batting low in the order. A great example from this past week was my killer Diamondbacks stack on May 25 (https://rotogrinders.com/blog-posts/mlb-data-detective-best-player-values-may-25-early-slate-3037772); your approach would have completely overlooked them due to the low Vegas run total.

    In any case, there’s more than one way to skin a cat, so if that simple approach works for you then more power to you.

    This simple approach doesn’t work for me. It won’t work for anyone. That’s part of my point. The other, main part, is that a model using a whole bunch of data supposedly not reflected in the line doesn’t do better than a simple line-based approach.

  • DFSx42

    @d735123 said...

    What? I provided the data. You mean the raw data? What is that going to tell you beyond the means and SDs I provided? I’m not wasting two hours typing names and scores into a spreadsheet. If you doubt the veracity of my numbers, it’s all publicly available info, so anyone who’s really interested can verify them.

    I have little interest in a daily h2h. My challenge was to compare batter projections generated by the model against the implied run total to see whether there really is useful information that is not already reflected in the line. An h2h doesn’t test that. If DataDetective wants to proceed with the original challenge, fine. You already know who my picks are every night; I don’t even have to post them. DataDetective can post his picks here. Based on what I’ve seen, though, this will be a waste of time.

    So you just did the math analyzing it all, including standard deviation, all in your head, and never once wrote anything down, never used Excel or R for any of it, where you could easily export the results to a shareable Google Sheet… How stupid do you think we are, that you did that for 80 slates all in your head and would now need to write it down for the first time??? Seriously, have you even thought through the implications of what you’ve said?

    I lean more towards your system, to be honest. The second I saw his BvP I cringed a little, but while I think DataDetective reaches a little too far into the realm of noise, he’s a legit quantitative analyst. When I see his work, I see the work of an intelligent and trained professional. I wish I had the formalized data skills he possesses. For example, he would never just pull out of his ass that he magically computed something in his head that you yourself admitted would take hours to write down. How is it again that it would take you hours to write down, yet you can magically compute it all in your head?

    While I can’t abide DataDetective using things like BvP, he has my respect because he’d never attempt some outright nonsense like you are right now.

    You’re taking something reasonable and making it absurd. When he accepted your challenge, you could have just said you’d changed your mind and there’s no need to go heads-up for rollz.
    Instead you just say there’s no need, claim you magically computed it all in your head, and declare his system proven worse. Are you 12?

    You overextended yourself in a dick-measuring contest and didn’t expect him to agree and pull out the ruler. This happens to the best of us; just back out with dignity next time.

  • TheDataDetective

    • Blogger of the Month

    @DFSx42 said...

    I lean more towards your system, to be honest. The second I saw his BvP I cringed a little, but while I think DataDetective reaches a little too far into the realm of noise, he’s a legit quantitative analyst. When I see his work, I see the work of an intelligent and trained professional. I wish I had the formalized data skills he possesses. For example, he would never just pull out of his ass that he magically computed something in his head that you yourself admitted would take hours to write down. How is it again that it would take you hours to write down, yet you can magically compute it all in your head?

    While I can’t abide DataDetective using things like BvP, he has my respect because he’d never attempt some outright nonsense like you are right now.

    Thanks, I appreciate the kind words! Just to clarify, when I mentioned “BvP” I meant it in a more general “batter vs pitcher matchup” sense, as opposed to actual BvP stats. In other words, I place a minuscule weight on the past history of batter X vs pitcher Y, because I weight everything by # of plate appearances and these samples tend to be tiny. What I meant is that I look at how batter X has fared against pitchers similar to pitcher Y (e.g. same handedness) and how pitcher Y has fared against hitters similar to batter X. These sample sizes tend to be large enough to be statistically significant.

    In the interest of improving my model, I’d love to better understand what types of noise you’re referring to. Besides variance, which impacts us all, my biggest issue is that historical data (and therefore my app) can’t identify fundamental changes in a player’s game, e.g. when a pitcher changes his pitch repertoire or a hitter adjusts his swing. I do overweight recent performance, but in some cases this might not be enough. A good example is Steven Matz, who had an awful 2017 season that in hindsight appears to be an anomaly in the context of his larger body of work; since I look back 3 seasons, that one bad season still impacts my projections.
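One way to read "I weight everything by # of plate appearances" is a PA-weighted average across evidence sources, so a handful of direct batter-vs-pitcher PAs barely moves the estimate next to a large handedness split. The sketch below is a hypothetical illustration of that idea, not TheDataDetective’s actual model; the rates and sample sizes are made up.

```python
def pa_weighted_rate(sources):
    """Average several (rate, plate_appearances) estimates, weighting by PA."""
    total_pa = sum(pa for _rate, pa in sources)
    return sum(rate * pa for rate, pa in sources) / total_pa


# Made-up example: on-base rate from direct batter-vs-pitcher history
# (8 PA) and from the batter's split vs. same-handed pitchers (900 PA).
bvp_history = (0.600, 8)
handedness_split = (0.320, 900)
print(round(pa_weighted_rate([bvp_history, handedness_split]), 4))
```

The 8 direct PAs move the estimate only from .320 to about .322, which is why raw BvP history ends up with a minuscule weight under this scheme.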

RotoGrinders.com is the home of the daily fantasy sports community. Our content, rankings, member blogs, promotions and forum discussion all cater to the players that like to create a new fantasy team every day of the week. Our goal is to help all of our members make more money playing daily fantasy sports!

Disclosures: All RotoGrinders content contributors are active DFS players. Contributor screen names can be found on their respective RotoGrinders profile pages. Contributors reserve the right to use players or strategies not discussed in their content on RotoGrinders.