Monte Carlo Simulation

 

In order to make statistical predictions about the long-term results of a random process, it is often useful to do a simulation based on one's understanding of the underlying probabilities.  This procedure is referred to as the Monte Carlo method.

 

As an example, consider a casino game in which a player bets against the house and the house wins 51% of the time.  The question is, how many games have to be played before the house is reasonably sure of coming out ahead.  This scenario is common enough that mathematicians long ago figured out very precisely what the statistics are, but here we want to illustrate how to get a good idea of what can happen in practice without having to absorb a lot of mathematics.

 

First we construct an expression that computes the net revenue to the house for a single game, based on a random number chosen between 0 and 1 by the MATLAB function rand.  If the random number is less than or equal to 0.51, the house wins one betting unit, whereas if the number exceeds 0.51, the house loses one unit.  (In a high-stakes game, each bet may be worth $1000 or more.  Thus it is important for the casino to know how bad a losing streak it may have to weather in order to turn a profit — so that it doesn't go bankrupt first!)  Here is an expression that returns 1 if the output of rand is less than 0.51 and –1 if the output of rand is greater than 0.51 (it will also return 0 if the output of rand is exactly 0.51, but this is extremely unlikely).

 

revenue = sign(0.51 - rand)  

 

revenue =

    -1  

 

In the game simulated above, the house lost.  To simulate several games at once, say 10 games, we can generate a vector of 10 random numbers with the command rand(1,10) and then apply the same operation.

 

revenues = sign(0.51 - rand(1, 10))  

 

revenues =

     1    -1     1    -1    -1     1     1    -1     1    -1  

 

In this case the house won 5 times and lost 5 times, for a net profit of 0 units.  For a larger number of games, say 100, we can let MATLAB sum the revenue from the individual bets as follows:

 

profit = sum(sign(0.51 - rand(1, 100)))

 

profit =

    -4  

 

For this trial, the house had a net loss of 4 units after 100 games.  On average, every 100 games the house should win 51 times and the player(s) should win 49 times, so the house should make a profit of 2 units (on average).  Let's see what happens in a few trial runs.

 

profits = sum(sign(0.51 - rand(100, 10)))  

 

profits =

    14   -12     6     2    -4     0   -10    12     0    12  

 

We see that the net profit can fluctuate significantly from one set of 100 games to the next, and there is a sizable probability that the house has lost money after 100 games.  To get an idea of how the net profit is likely to be distributed in general, we can repeat the experiment a large number of times and make a histogram of the results.  The following function computes the net profits for k different trials of n games each.

 

profits = inline('sum(sign(0.51 - rand(n, k)))', 'n', 'k')  

 

profits =

     Inline function:

     profits(n,k) = sum(sign(0.51 - rand(n, k)))  

 

What this function does is to generate an n´k matrix of random numbers and then perform the same operations as above on each entry of the matrix to obtain a matrix with entries 1 for bets the house won and –1 for bets it lost.  Finally it sums the columns of the matrix to obtain a row vector of k elements, each of which represents the total profit from a column of n bets.

 

Now we make a histogram of the output of profits using n = 100 and k = 100.  Theoretically the house could win or lose up to 100 units, but in practice we find that the outcomes are almost always within 30 or so of 0.  Thus we let the bins of the histogram range from –40 to 40 in increments of 2 (since the net profit is always even afer 100 bets).

 

hist(profits(100, 100), -40:2:40); axis tight  

 

 

The histogram confirms our impression that there is a wide variation in the outcomes after 100 games.  It looks like the house is about as likely to have lost money as to have profited.  However, the distribution shown above is irregular enough to indicate that we really should run more trials to see a better approximation to the actual distribution.  Let's try 1000 trials.

 

hist(profits(100, 1000), -40:2:40); axis tight  

 

According to the Central Limit Theorem, when both n and k are large, the histogram should be shaped like a "bell curve", and we begin to see this shape emerging above.  Let's move on to 10,000 trials.

 

hist(profits(100, 10000), -40:2:40); axis tight  

 

 

Here we see very clearly the shape of a bell curve.  Though we haven't gained that much in terms of knowing how likely the house is to be behind after 100 games, and how large its net loss is likely to be in that case, we do gain confidence that our results after 1000 trials are a good depiction of the distribution of possible outcomes.

 

Now we consider the net profit after 1000 games.  We expect on average the house to win 510 games and the player(s) to win 490, for a net profit of 20 units.  Again we start with just 100 trials. 

 

hist(profits(1000, 100), -100:10:150); axis tight  

 

 

Though the range of values we observe for the profit after 1000 games is larger than the range for 100 games, the range of possible values is 10 times as large, so that relatively speaking the outcomes are closer together than before.  This reflects the theoretical principle (also a consequence of the Central Limit Theorem) that the average "spread" of outcomes after a large number of trials should be proportional to the square root of n, the number of games played in each trial.  This is important for the casino, since if the spread were proportional to n, then the casino could never be too sure of making a profit.  When we increase n by a factor of 10, the spread should only increase by a factor of Ö10, or a little more than 3.

 

Note that after 1000 games, the house is definitely more likely to be ahead than behind.  However, the chances of being behind are still sizable.  Let's repeat with 1000 trials to be more certain of our results.

 

hist(profits(1000, 1000), -100:10:150); axis tight  

 

 

We see the bell curve shape emerging again.  Though it is unlikely, the chances are not insignificant that the house is behind by more than 50 units after 1000 games.  If each unit is worth $1000, then we might advise the casino to have at least $100,000 cash on hand to be prepared for this possibility.  Maybe even that is not enough—to see we would have to experiment further.

 

Finally, let's see what happens after 10,000 games.  We expect on average the house to be ahead by 200 units at this point, and based on our earlier discussion the range of values we use to make the histogram need only go up by a factor of 3 or so from the previous case.  Even 100 trials will take a while to run now, but we have to start somewhere.

 

hist(profits(10000, 100), -200:25:600); axis tight  

 

 

It seems that turning a profit after 10,000 games is highly likely, though with only 100 trials we do not get such a good idea of the worst-case scenario.  Though it will take a good bit of time, we should certainly do 1000 trials or more if we are considering putting our money into such a venture.

 

hist(profits(10000, 1000), -200:25:600); axis tight  

 

???Error using ==> inlineeval

Error in inline expression ==> sum(sign(0.51 – rand(n, k))

Out of memory. Type HELP MEMORY for your options.

 

Error in ==> C:\MATLABR12\toolbox\matlab\funfun\@inline\subsref.m

On line 25 ==>     INLINE_OUT_ = inlineeval(INLINE_INPUTS_,

INLINE_OBJ_.inputExpr, INLINE_OBJ_.expr);

 

This error message illustrates a potential hazard in using MATLAB's vector and matrix operations in place of a loop, in this case that the matrix rand(n,k) generated within the profits function must fit in the memory of the computer.  Since n is 10,000 and k is 1000 in our most recent attempt to run this function, we requested a matrix of 10,000,000 random numbers.  Each floating point number in MATLAB takes up 8 bytes of memory, so the matrix would have required 80MB to store, which is too much for some computers.  Since k represents a number of trials that can be done independently, a solution to the memory problem is to break the 1000 trials into 10 groups of 100, using a loop to run 100 trials 10 times and assemble the results.

 

profitvec = [];

for i = 1:10

  profitvec = [profitvec profits(10000, 100)];

end

hist(profitvec, -200:25:600); axis tight  

 

Though the chances of a loss after 10,000 games is quite small, the possibility cannot be ignored, and we might judge that the house should not rule out being behind at some point by 100 or more units.  However, the overall upward trend seems clear, and we may expect that after 100,000 games the casino is overwhelmingly likely to have made a profit.  Based on our previous observations of the growth of the spread of outcomes, we expect that most of the time the net profit will be within 400 of the expected value of 2200.  We show the results of 10 trials of 100,000 games below.

 

profits(100000, 10)  

 

ans =

  Columns 1 through 5

        2538        1830        1654        1936        2116

  Columns 6 through 10

        1636        1974        1968        2054        1806