Probability Theory in the Context of DFS

A random variable is a variable whose value is subject to variations due to chance (i.e. randomness, in a mathematical sense). As we’ll see shortly, random variables abound in daily fantasy sports.

A large part of the skill in DFS involves dealing with random variables. There isn’t room in this course for a complete discussion of probability theory, but there are certain parts a player absolutely must know to be successful.

H3. Discrete Random Variables

There are two types of random variables, discrete and continuous. Discrete random variables usually represent one of a finite set of possibilities. For example, a roll of a pair of dice results in a total between 2 and 12.

A discrete random variable has a probability mass function, which specifies the probability for each of the possible outcomes. For example, for the pair of dice, the probability mass function is

$$P(X = k) = \frac{6 - |k - 7|}{36}, \qquad k = 2, 3, \dots, 12$$
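As a quick check on the dice example, here is a minimal Python sketch that builds the probability mass function by counting all 36 equally likely rolls. It assumes two fair six-sided dice, and the function name `two_dice_pmf` is just an illustrative choice.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def two_dice_pmf():
    """Return {total: probability} for the sum of two fair six-sided dice."""
    # Count how many of the 36 equally likely (a, b) rolls give each total.
    counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
    return {total: Fraction(n, 36) for total, n in sorted(counts.items())}


pmf = two_dice_pmf()
for total, prob in pmf.items():
    print(f"{total:2d}: {prob} ({float(prob):.4f})")
```

The probabilities sum to 1, as any probability mass function must, and 7 is the most likely total.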

H3. Continuous Random Variables

A continuous random variable can take on any value in a range, usually a range of real numbers. For example, the height of an NBA player measured in inches is a continuous random variable.

A continuous random variable has a probability density function. For example, the familiar standardized Gaussian bell-shaped curve has the probability density function

$$f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$$
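A density is not itself a probability: for a continuous random variable, probabilities come from the area under the curve over an interval. The sketch below evaluates the standard Gaussian density and the probability of landing between two values, using only Python's standard library. The function names are illustrative.

```python
import math


def standard_normal_pdf(x):
    """Density of the standard Gaussian at x (a density, not a probability)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def standard_normal_prob_between(a, b):
    """P(a < X < b) for a standard Gaussian, computed with the error function."""
    def cdf(t):
        return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))
    return cdf(b) - cdf(a)


print(standard_normal_pdf(0.0))                    # height of the curve at its peak
print(standard_normal_prob_between(-1.96, 1.96))   # area within 1.96 of the mean
```

The second call returns a value very close to 0.95, which is where the familiar 1.96 multiplier for 95 percent intervals comes from.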

Some random variables we see in daily fantasy sports:

H2. The Bernoulli and Binomial Distributions

The last entry in the list above — whether a lineup cashed or not — is an example of a Bernoulli distribution. A Bernoulli random variable has two possible outcomes, which in games we usually refer to as “win” and “lose”.

To make calculations easier, we’ll use “1” for win and “0” for lose. The probability of a win is usually denoted by the letter p. The probability of a loss is usually denoted by the letter q; p + q = 1 and q = 1 – p.

Bernoulli variables aren’t very interesting; we wouldn’t just enter one lineup in one contest and walk away forever. So we need a random variable that models how many times we cash over a number of contests. And that’s a binomial random variable.

A binomial random variable has an underlying Bernoulli random variable with parameters p and q. We ask the question, “If we enter N contests, what’s the probability that we win none, one, two, and so on up to N?” The answer is the probability mass function for the binomial.

If we know N and we know p, we can compute the probability of winning exactly k contests out of N tries. That probability is

$$P(X = k) = \binom{N}{k}\, p^k\, q^{N-k}$$

where $\binom{N}{k}$ is the number of combinations of N things taken k at a time. That’s interesting, but it doesn’t solve our problem. We know N, how many contests we entered, and we know k, how many we won. But we don’t know p, and we need p to calculate expected values.
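To make the binomial probability concrete, here is a minimal Python sketch of the probability of winning exactly k contests out of N when p is known. The function name `binomial_pmf` and the example numbers (10 contests, p = 0.5) are illustrative assumptions, not figures from the course.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k wins in n independent contests, each won with probability p."""
    q = 1.0 - p
    return comb(n, k) * p**k * q**(n - k)


# Hypothetical example: 10 contests, each a coin flip.
n, p = 10, 0.5
for k in range(n + 1):
    print(f"P(win exactly {k:2d} of {n}) = {binomial_pmf(k, n, p):.4f}")
```

Summing the function over k = 0, 1, ..., N gives 1, since exactly one of those outcomes must occur.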

It turns out we can estimate p easily. The estimate of p is just

$$p_{\text{est}} = \frac{k}{N}$$

So if I entered 100 50/50 contests and cashed 60 of them, the estimate of p is 0.6 and the estimate of q is 0.4.

H2. Confidence Interval For p

Before we move on to expectations, there’s one more tool we’ll need. It turns out that not only can we estimate p, we can compute a confidence interval for p.

We want to say, “there’s a 95% probability that the real value of p is between $p_{\text{lower}}$ and $p_{\text{upper}}$”. As the Wikipedia article on this topic notes, there are a number of options for doing this and all have certain limitations. For our purposes, the simplest one that we can copy and paste into a spreadsheet will do. In the equations, $p_{\text{est}}$ is the estimate of p we computed above and $q_{\text{est}} = 1 - p_{\text{est}}$.

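$$p_{\text{lower}} = p_{\text{est}} - 1.96\sqrt{\frac{p_{\text{est}}\, q_{\text{est}}}{N}}, \qquad p_{\text{upper}} = p_{\text{est}} + 1.96\sqrt{\frac{p_{\text{est}}\, q_{\text{est}}}{N}}$$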
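For readers who would rather check the numbers outside a spreadsheet, here is a minimal Python sketch of the estimate and its interval. It assumes the normal-approximation (Wald) form with a multiplier of 1.96 for 95 percent, which is the simplest of the available options. The function name and the clamping of the bounds to the range 0 to 1 are illustrative choices.

```python
import math


def estimate_p_with_interval(wins, contests, z=1.96):
    """Return (p_est, p_lower, p_upper) using the normal-approximation interval.

    Assumes z = 1.96 for a 95 percent interval; bounds are clamped to [0, 1].
    """
    p_est = wins / contests
    q_est = 1.0 - p_est
    half_width = z * math.sqrt(p_est * q_est / contests)
    p_lower = max(0.0, p_est - half_width)
    p_upper = min(1.0, p_est + half_width)
    return p_est, p_lower, p_upper


# The example from the text: 60 cashes in 100 50/50 contests.
p_est, p_lower, p_upper = estimate_p_with_interval(60, 100)
print(f"p_est = {p_est:.3f}, 95% interval = ({p_lower:.3f}, {p_upper:.3f})")
```

With 60 cashes in 100 contests, the interval runs from roughly 0.50 to roughly 0.70. This approximation is known to behave poorly for small N or for p near 0 or 1, so treat the bounds as rough in those cases.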

H3. Expectations

Now that we have an estimate and a confidence interval for p, we can estimate how much we expect to win or lose per dollar of entry fees. For a $1 50/50, we pay a dollar to enter. If we win, we get $1.80 back, so we win $0.80. If we lose, we lose the dollar. The estimated expectation per dollar is

$$EV_{\text{est}} = 0.80\, p_{\text{est}} - 1.00\, q_{\text{est}}$$

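In general, if F is the entry fee in dollars and C is the cash paid for a win in dollars, then a win nets C − F and a loss costs F,

and the estimated expectation $EV_{\text{est}}$ is

$$EV_{\text{est}} = (C - F)\, p_{\text{est}} - F\, q_{\text{est}}$$

with confidence interval

$$\left[\,(C - F)\, p_{\text{lower}} - F\,(1 - p_{\text{lower}}),\;\; (C - F)\, p_{\text{upper}} - F\,(1 - p_{\text{upper}})\,\right]$$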

In the spreadsheets, we’ll do this calculation for $p_{\text{est}}$, $p_{\text{lower}}$ and $p_{\text{upper}}$, generating a 95 percent confidence interval for EV.
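The same calculation can be sketched in Python. This is an illustration under stated assumptions, not the course spreadsheet: it takes the expectation per entry to be the net win times p minus the entry fee times q, and it reuses a normal-approximation interval for p with a 1.96 multiplier. All function and parameter names are hypothetical.

```python
import math


def ev_per_entry(p, fee, payout):
    """Expected profit per entry: win (payout - fee) with probability p, lose fee otherwise."""
    return (payout - fee) * p - fee * (1.0 - p)


def ev_with_interval(wins, contests, fee, payout, z=1.96):
    """Return (ev_lower, ev_est, ev_upper) from a normal-approximation interval for p."""
    p_est = wins / contests
    half_width = z * math.sqrt(p_est * (1.0 - p_est) / contests)
    p_lower = max(0.0, p_est - half_width)
    p_upper = min(1.0, p_est + half_width)
    # EV increases with p, so the bounds on p map directly to bounds on EV.
    return tuple(ev_per_entry(p, fee, payout) for p in (p_lower, p_est, p_upper))


# The running example: 60 cashes in 100 contests, $1 entry, $1.80 paid for a win.
ev_lower, ev_est, ev_upper = ev_with_interval(60, 100, fee=1.00, payout=1.80)
print(f"EV per $1 entry: {ev_est:+.3f} (95% interval {ev_lower:+.3f} to {ev_upper:+.3f})")
```

In this example the estimated expectation is about 8 cents per dollar, but the interval still reaches below zero, so 100 contests is not enough to rule out a losing record at these payouts.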
