In
order to make statistical predictions about the long-term results of a random
process, it is often useful to do a simulation based on one's understanding of
the underlying probabilities. This
procedure is referred to as the Monte
Carlo method.
As an example, consider a casino game in which a player bets against the house and the house wins 51% of the time. The question is, how many games have to be played before the house is reasonably sure of coming out ahead. This scenario is common enough that mathematicians long ago figured out very precisely what the statistics are, but here we want to illustrate how to get a good idea of what can happen in practice without having to absorb a lot of mathematics.
First
we construct an expression that computes the net revenue to the house for a
single game, based on a random number chosen between 0 and 1 by the MATLAB
function rand. If the random number is less than or equal
to 0.51, the house wins one betting unit, whereas if the number exceeds 0.51,
the house loses one unit. (In a
high-stakes game, each bet may be worth $1000 or more. Thus it is important for the casino to know
how bad a losing streak it may have to weather in order to turn a profit — so
that it doesn't go bankrupt first!)
Here is an expression that returns 1 if the output of rand is less than 0.51 and –1 if
the output of rand is greater than 0.51 (it will also return 0 if the output of
rand is exactly 0.51, but this
is extremely unlikely).
-1
In
the game simulated above, the house lost.
To simulate several games at once, say 10 games, we can generate a
vector of 10 random numbers with the command rand(1,10) and then apply the same
operation.
revenues = sign(0.51 - rand(1, 10))
1 -1 1 -1 -1 1 1 -1 1 -1
In this case the house won 5 times and lost 5
times, for a net profit of 0 units. For
a larger number of games, say 100, we can let MATLAB sum the revenue from the
individual bets as follows:
profit = sum(sign(0.51 - rand(1, 100)))
-4
For
this trial, the house had a net loss of 4 units after 100 games. On average, every 100 games the house should
win 51 times and the player(s) should win 49 times, so the house should make a
profit of 2 units (on average). Let's
see what happens in a few trial runs.
profits = sum(sign(0.51 - rand(100, 10)))
14 -12 6 2 -4 0 -10 12 0 12
We
see that the net profit can fluctuate significantly from one set of 100 games
to the next, and there is a sizable probability that the house has lost money
after 100 games. To get an idea of how
the net profit is likely to be distributed in general, we can repeat the
experiment a large number of times and make a histogram of the results. The following function computes the net
profits for k different trials of n games each.
profits = inline('sum(sign(0.51 - rand(n, k)))', 'n', 'k')
Inline function:
profits(n,k) = sum(sign(0.51 - rand(n, k)))
What
this function does is to generate an n´k matrix of random numbers
and then perform the same operations as above on each entry of the matrix to
obtain a matrix with entries 1 for bets the house won and –1 for bets it lost. Finally it sums the columns of the matrix to
obtain a row vector of k elements,
each of which represents the total profit from a column of n bets.
Now
we make a histogram of the output of profits using n = 100 and k = 100. Theoretically the house could win or lose up to 100 units, but in
practice we find that the outcomes are almost always within 30 or so of 0. Thus we let the bins of the histogram range
from –40 to 40 in increments of 2 (since the net profit is always even afer 100
bets).
hist(profits(100, 100), -40:2:40); axis tight
The
histogram confirms our impression that there is a wide variation in the
outcomes after 100 games. It looks like
the house is about as likely to have lost money as to have profited. However, the distribution shown above is
irregular enough to indicate that we really should run more trials to see a
better approximation to the actual distribution. Let's try 1000 trials.
hist(profits(100, 1000), -40:2:40); axis tight
According
to the Central Limit Theorem, when
both n and k are large, the histogram should be shaped like a "bell
curve", and we begin to see this shape emerging above. Let's move on to 10,000 trials.
hist(profits(100, 10000), -40:2:40); axis tight
Here
we see very clearly the shape of a bell curve.
Though we haven't gained that much in terms of knowing how likely the
house is to be behind after 100 games, and how large its net loss is likely to
be in that case, we do gain confidence that our results after 1000 trials are a
good depiction of the distribution of possible outcomes.
Now we consider the net profit after 1000 games. We expect on average the house to win 510 games and the player(s) to win 490, for a net profit of 20 units. Again we start with just 100 trials.
hist(profits(1000, 100), -100:10:150); axis tight
Though
the range of values we observe for the profit after 1000 games is larger than
the range for 100 games, the range of possible values is 10 times as large, so
that relatively speaking the outcomes are closer together than before. This reflects the theoretical principle
(also a consequence of the Central Limit
Theorem) that the average "spread" of outcomes after a large
number of trials should be proportional to the square root of n, the number of games played in each
trial. This is important for the
casino, since if the spread were proportional to n, then the casino could never be too sure of making a profit. When we increase n by a factor of 10, the spread should only increase by a factor of
Ö10, or a little more than 3.
Note
that after 1000 games, the house is definitely more likely to be ahead than
behind. However, the chances of being
behind are still sizable. Let's repeat
with 1000 trials to be more certain of our results.
hist(profits(1000, 1000), -100:10:150); axis tight
We
see the bell curve shape emerging again.
Though it is unlikely, the chances are not insignificant that the house
is behind by more than 50 units after 1000 games. If each unit is worth $1000, then we might advise the casino to
have at least $100,000 cash on hand to be prepared for this possibility. Maybe even that is not enough—to see we
would have to experiment further.
Finally, let's see what happens after 10,000 games. We expect on average the house to be ahead by 200 units at this point, and based on our earlier discussion the range of values we use to make the histogram need only go up by a factor of 3 or so from the previous case. Even 100 trials will take a while to run now, but we have to start somewhere.
hist(profits(10000, 100), -200:25:600); axis tight
It
seems that turning a profit after 10,000 games is highly likely, though with
only 100 trials we do not get such a good idea of the worst-case scenario. Though it will take a good bit of time, we
should certainly do 1000 trials or more if we are considering putting our money
into such a venture.
hist(profits(10000, 1000), -200:25:600); axis tight
???Error using ==> inlineeval
Error in inline expression ==> sum(sign(0.51 – rand(n, k))
Out of memory. Type HELP MEMORY for your options.
Error in ==> C:\MATLABR12\toolbox\matlab\funfun\@inline\subsref.m
On line 25 ==> INLINE_OUT_
= inlineeval(INLINE_INPUTS_,
INLINE_OBJ_.inputExpr, INLINE_OBJ_.expr);
This
error message illustrates a potential hazard in using MATLAB's vector and
matrix operations in place of a loop, in this case that the matrix rand(n,k) generated within the profits function must fit in the
memory of the computer. Since n is
10,000 and k is 1000 in our most
recent attempt to run this function, we requested a matrix of 10,000,000 random
numbers. Each floating point number in
MATLAB takes up 8 bytes of memory, so the matrix would have required 80MB to
store, which is too much for some computers.
Since k represents a number of
trials that can be done independently, a solution to the memory problem is to
break the 1000 trials into 10 groups of 100, using a loop to run 100 trials 10
times and assemble the results.
for i = 1:10
profitvec = [profitvec
profits(10000, 100)];
end
hist(profitvec, -200:25:600); axis tight
Though the chances of a loss after 10,000 games is quite small, the possibility cannot be ignored, and we might judge that the house should not rule out being behind at some point by 100 or more units. However, the overall upward trend seems clear, and we may expect that after 100,000 games the casino is overwhelmingly likely to have made a profit. Based on our previous observations of the growth of the spread of outcomes, we expect that most of the time the net profit will be within 400 of the expected value of 2200. We show the results of 10 trials of 100,000 games below.
Columns 1 through 5
2538 1830 1654
1936 2116
Columns 6 through 10
1636 1974 1968 2054 1806