PGA Model and Introduction
This is intended to be a reference page that will be linked at the beginning of each future entry. It’s essentially an FAQ more than anything and will expand over time as more questions come in.
Analytically minded person who isn’t professionally trained in the dark arts of data has built his own models, which he applies to DFS (daily fantasy sports). Since golf is by far the greatest struggle, I blog about it here, because being forced to verbalize my thoughts in a coherent manner, even if I’m only going skin deep, really helps me understand the problem and think it through. Each time I write a new entry, I’m left full of new thoughts and, more importantly, a plan of action for how I will adapt my strategies going forward.
In short, while I will give my player pool, I won’t be giving picks or lineups, or even recommending you play any of the guys I’m playing. I may mention my model really liked a guy but I probably won’t disclose why. It’s more about the journey and the thought process than the actual decisions or the results themselves.
Who are you?
I’m a former poker player turned professional pedant. I like sports and data, so DFS was a pretty natural undertaking for me even though I was a bit of a late arrival. While I’m by no means a data scientist or even an engineer, I know enough to carefully and slowly piece a model together. There’s a lot of trial and even more error involved, since I am not formally trained in any of this. I do, however, work heavily in technology and data for a career, so a lot of it has seeped in through osmosis. This has been more of a struggle than a process as a result.
I actually have models for all the sports I play, but golf has a bit of seduction to it because we’re dealing with the edges of the map, where the sea monsters lie and nothing is solidly filled in. There is something romantic about chasing elusive and mysterious golf data that attracts me greatly.
While I still have a positive ROI, golf is actually my worst DFS sport, and by a decent margin. I also can’t rule out that luck is responsible, given that I’ve only played 10 lineups a slate for 2 years and the sample size is lacking. Furthermore, while I am learning more by the day, when I first started I knew nothing. I didn’t even know the tournaments were multiple-day events or that they had a cut. I have golfed so few times I’m still unsure if I’m a lefty or a righty (I’m a righty who plays most stick sports as a lefty).
When I read golf analysis, whether on a forum or by a tout, I’m often laughing silently at some of the absurd conclusions they draw. They’ll say “this guy swapped caddies and missed two cuts since” and I’ll laugh at the fact that they never bothered to check whether the theory is real or some false narrative they just fabricated. People switch caddies, people miss cuts, and both these things will happen naturally without necessarily being related. Like most of the science behind golfalytics, it’s just as easily reversed: “this guy swapped caddies and after two events they are finally ready to gel and win a tourney together.” While the caddy obviously plays a role, unless we define that role and how much influence it may have, everything else is needless speculation. Unless someone does a study showing some correlation or another, this is the kind of noise I’ll readily ignore, and it is low enough on the totem pole that I probably won’t investigate the truth behind caddy influence for some time.
But then again, despite how I laughed off the caddy comment earlier, the next night I won’t be able to sleep because I’ll be wondering if maybe I should swap that guy out because “what if the caddy problem is real?” This is golfalytics at its finest. Every instinct you have says to ignore that information, but given how little is a known known in the world of DFS golf, we dwell on these things whether they are real or imaginary. This has seduced me into really looking forward to each new slate and trying to crack the code, if you will.
How does the model work?
I’m not going to get into too many details. If I ever do disclose something specific, it’s because it’s either so obvious I don’t believe it represents any possible edge, or I know for a fact that it doesn’t work and nobody else is making that mistake, so pointing out that it doesn’t work doesn’t exactly wise up the dumb money, so to speak.
What I will do is lay out my general process. I gather data (all publicly available, which anyone could scrape or copy and paste into Excel if they just spent some time doing it) and then create separate algorithms to optimize lineups from that data. This started as a couple of very simple lineup formulas; from there I picked the winners and iterated. If a lineup model isn’t a winner, I don’t play it, but I still keep producing its lineups for data-gathering purposes. Maybe what doesn’t work in 2019 will be gold in 2022. I really have no idea, so when in doubt, I collect everything. At this point there are more than two dozen competing models pumping out lineups.
I’ve also found that some metrics, while very poor as standalone metrics, are very helpful when combined with others, so even if some lineups have been retired, parts of them may live on in other lineups that do actually get played for money. These dud lineups also help fill out my starting player pool. Even though they won’t be played themselves, they help me decide which of the other lineups to put money on, or simply track each slate.
Lineup A could be something as simple as a single metric, while Lineup B could be 8 separate factors combined. Some are highly correlated, and there could be 3 different models that basically produce slight variations of each other. My end goal is to locate the one metric (or handful of metrics) that matters and then focus lineup construction on iterations of that model. That, however, has proven difficult to find: while the majority of the non-retired models are profitable, none really stands out from the rest. Lineup A may hit in weeks 1, 3 and 5 and Lineup B in weeks 2, 4 and 6, and before you ask, the differences can’t be explained by course history, course fit and so on. Maybe with time I’ll have that Eureka moment, but my guess is the best I can hope for is what I have now: something that slightly outperforms the field on the whole and gives consistent single-digit ROI.
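To make the single-metric versus multi-factor idea concrete without giving away anything real, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the metric names (recent_form, driving), the weights, and the choice to standardize each metric as a z-score before blending are mine for this example and say nothing about what the actual models use.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize a list of numbers so metrics on different scales can be blended."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # Every player has the same value, so the metric carries no information.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def composite_scores(players, weights):
    """Return {player: weighted sum of z-scored metrics}.

    players: {name: {metric: value}}  (hypothetical layout)
    weights: {metric: weight}         (hypothetical weights)
    """
    names = list(players)
    totals = {name: 0.0 for name in names}
    for metric, weight in weights.items():
        column = [players[name][metric] for name in names]
        for name, z in zip(names, z_scores(column)):
            totals[name] += weight * z
    return totals


# Made-up numbers for three placeholder players.
players = {
    "Player A": {"recent_form": 1.0, "driving": 3.0},
    "Player B": {"recent_form": 2.0, "driving": 2.0},
    "Player C": {"recent_form": 3.0, "driving": 1.0},
}

# "Lineup A" style: rank on one metric only.
single = composite_scores(players, {"recent_form": 1.0})

# "Lineup B" style: blend several metrics with weights.
blend = composite_scores(players, {"recent_form": 0.5, "driving": 0.5})
```

In this toy data the two metrics rank the players in opposite order, so the single-metric score clearly prefers Player C while the equal-weight blend scores everyone the same. That is the flavor of the problem: two models built from overlapping ingredients can disagree sharply or barely at all depending on the weights.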
The art within the science
Once I have my raw lineups, I’ll first check all the data and make sure I didn’t make any mistakes. This happens far more than I’d like to admit. Then, if everything is on the up and up, I’ll eliminate the newer ones that haven’t been fully scrutinized and the weaker performers and then see what the remaining player pool is and what kind of exposure I have remaining.
Let’s say I have 12 lineups left and need to trim 2 more to get down to 10. Among those lineups, I have 8 shares of Dustin Johnson, 1 share of Molinari and 3 shares of Rose. I’ll make my best effort to preserve the Rose lineups and dump two of the Dustin Johnson lineups, so I end up with 60% exposure to him instead of 70% or 80%. However, I will also try to rid myself of the Molinari lineup. Ideally DJ is in that lineup. Otherwise, I’m more apt to favor 7 DJs and no Molinari than 6 and 1 respectively. While I do use the historical ROIs for each lineup model when deciding on the trim, it’s very much an art rather than a science.
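The exposure arithmetic in that example can be checked with a few lines of Python. This is only a sketch of the counting: the lineups below list just the three golfers mentioned, with a placeholder "Other" for the one lineup that has none of them, and it assumes the lucky case where the Molinari lineup also contains DJ.

```python
from collections import Counter


def exposure(lineups):
    """Return {player: fraction of lineups that contain the player}."""
    counts = Counter(player for lineup in lineups for player in set(lineup))
    return {player: n / len(lineups) for player, n in counts.items()}


# 12 lineups: 8 with Dustin Johnson (one of which also has Molinari),
# 3 with Rose, and 1 with none of the three.
lineups = (
    [["Dustin Johnson", "Molinari"]]
    + [["Dustin Johnson"] for _ in range(7)]
    + [["Rose"] for _ in range(3)]
    + [["Other"]]
)

before = exposure(lineups)

# Trim two DJ lineups, one of them being the Molinari lineup.
trimmed = lineups[2:]
after = exposure(trimmed)

print(f"DJ before: {before['Dustin Johnson']:.0%}, after: {after['Dustin Johnson']:.0%}")
print(f"Rose after: {after['Rose']:.0%}, Molinari after: {after.get('Molinari', 0):.0%}")
```

Cutting two non-DJ lineups instead would leave 8 of 10 lineups with DJ, which is the 80% case the trim is trying to avoid.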
I’ll also have weather-contingent models that I run if the weather looks like a large enough factor to impact play (only used once, and the results were great, but the sample size is suspect). I also employ a few homer lineups, where I’ll lock in a player or two, or exclude a player or two that I have too much exposure to, and then run the model again. This way I can still jam in my favorites for each slate that my model didn’t quite agree with, or reduce my dependency on some higher-owned players. Once my model put Jim Furyk in every lineup. I thought sure, why not. Never again. Each slate I make countless mistakes like that Furykapocalypse, and this blog is here to help me verbalize my thoughts so I don’t repeat those mistakes.
Although I’m confident enough in the model to put a not insignificant amount into play each slate, it’s still very much a work in progress. Since starting this blog, I’ve actually had a number of breakthroughs, not necessarily in the results themselves, but I’ll think of a new data angle to research or come up with a more efficient way to gather the data or produce the lineups. For example, I’ve specifically gone down a rabbit hole of Strokes Gained academia since writing about how sketchy that data is.
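For anyone wondering what "lock a player, exclude a player, run it again" looks like mechanically, here is a toy sketch in Python. It is not my optimizer: the brute-force search, the 3-man lineup, the salary cap of 25 and the projection numbers are all made up for illustration, and a real slate would need something smarter than trying every combination.

```python
from itertools import combinations


def best_lineup(pool, size, cap, lock=(), exclude=()):
    """Return the highest-scoring lineup under the cap, or None if none fits.

    pool: {name: (salary, projected_score)}  (hypothetical layout)
    lock: players forced into the lineup; exclude: players kept out of it.
    """
    lock = set(lock)
    exclude = set(exclude)
    open_slots = size - len(lock)
    if open_slots < 0:
        raise ValueError("more locked players than lineup slots")
    candidates = [n for n in pool if n not in lock and n not in exclude]

    best, best_score = None, float("-inf")
    for combo in combinations(candidates, open_slots):
        names = tuple(sorted(lock)) + combo
        salary = sum(pool[n][0] for n in names)
        score = sum(pool[n][1] for n in names)
        if salary <= cap and score > best_score:
            best, best_score = names, score
    return best


# Made-up salaries and projections for five placeholder players.
pool = {
    "Player A": (10, 50),
    "Player B": (9, 45),
    "Player C": (8, 44),
    "Player D": (7, 30),
    "Player E": (6, 28),
}

base = best_lineup(pool, size=3, cap=25)
homer = best_lineup(pool, size=3, cap=25, lock=["Player E"])
fade = best_lineup(pool, size=3, cap=25, exclude=["Player A"])
```

With these numbers the unconstrained run picks Players A, C and D, locking Player E shifts it to A, B and E, and excluding Player A gives B, C and D. The point of the exercise is just that a lock or an exclusion reshuffles the rest of the lineup, not only the one slot you touched.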