Probability of Informed Trade (PIN) - Easley et al. model

The assumptions made in the computation of the PIN include:
1) the arrival rate of uninformed buy orders (epsilon) is the same as that of uninformed sell orders
2) order intensities from informed and uninformed traders are described by simple independent Poisson processes.
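As an illustration of what these two assumptions imply, here is a minimal Python sketch of the single-day likelihood in its standard textbook form, together with the usual PIN formula, alpha*mu / (alpha*mu + 2*epsilon). It is not the code used to produce the posted estimates. The function names are hypothetical, delta is taken to be the probability that news is bad (the usual convention), and all parameters are assumed to lie strictly inside their bounds. Working in logs is one common way to sidestep numerical overflow.

```python
import math

def log_poisson(k, rate):
    """Log of the Poisson probability of k arrivals at the given rate."""
    return k * math.log(rate) - rate - math.lgamma(k + 1)

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))

def daily_log_likelihood(buys, sells, alpha, delta, epsilon, mu):
    """Log-likelihood of one day's (buys, sells) under the basic model.

    Assumes 0 < alpha < 1, 0 < delta < 1, epsilon > 0 and mu > 0.
    """
    # No news: only uninformed traders, on both sides of the market.
    no_news = log_poisson(buys, epsilon) + log_poisson(sells, epsilon)
    # Bad news: informed traders add to the sell side.
    bad_news = log_poisson(buys, epsilon) + log_poisson(sells, epsilon + mu)
    # Good news: informed traders add to the buy side.
    good_news = log_poisson(buys, epsilon + mu) + log_poisson(sells, epsilon)
    return log_sum_exp([
        math.log(1 - alpha) + no_news,
        math.log(alpha * delta) + bad_news,
        math.log(alpha * (1 - delta)) + good_news,
    ])

def pin(alpha, epsilon, mu):
    """PIN when buy and sell epsilon are equal."""
    return alpha * mu / (alpha * mu + 2 * epsilon)
```

Summing `daily_log_likelihood` over the trading days in a period gives the objective that an optimizer would maximize over (alpha, delta, epsilon, mu).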

As those of you who have estimated PINs yourself are doubtless aware, the basic model does not describe the data terribly well, and some of the resulting parameter estimates are often suspect (i.e. corner solutions are found). Also, computing the likelihood function often runs into numerical-overflow problems.

To get around these issues and to reduce the impact of "outliers" in the data, I have computed the PINs here using a "robust regression" approach in which a lower weight is placed on extremely unlikely observations. For example, the basic EKO model implies that we are very unlikely to observe high levels of both buys and sells on the same day. In practice, however, such days often occur, e.g. around earnings announcements. Therefore, in maximizing the likelihood function, I adjust the raw likelihood of a single observation whenever it is less than 10e-22 times the maximum value the likelihood function could take given the current parameters, i.e. its value for buys = epsilon, sells = epsilon on a no-news day.

Using this low-weighting technique is clearly somewhat arbitrary, but it enables the optimization process to converge for more large firms than it does with the raw likelihood function.
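The text above does not spell out the exact form of the adjustment, so the following Python sketch shows only one possible reading: the per-observation log-likelihood is floored at the threshold. The flooring rule, the function names and the use of the gamma function to handle a non-integer epsilon are all assumptions made for illustration. The factor is written as 10e-22 to match the text.

```python
import math

FLOOR_FACTOR = 10e-22  # threshold factor, written exactly as in the text

def log_poisson(k, rate):
    """Log Poisson probability; lgamma allows a non-integer k."""
    return k * math.log(rate) - rate - math.lgamma(k + 1)

def reference_log_likelihood(epsilon):
    """Log-likelihood of buys = sells = epsilon on a no-news day.

    Assumption: the no-news probability weight is left out of the reference.
    """
    return 2 * log_poisson(epsilon, epsilon)

def floored_log_likelihood(raw_log_lik, epsilon, factor=FLOOR_FACTOR):
    """Floor one observation's log-likelihood at factor * reference (assumed rule)."""
    floor = math.log(factor) + reference_log_likelihood(epsilon)
    return max(raw_log_lik, floor)
```

Under this reading, a day with very high buys and sells contributes at most a bounded penalty to the objective instead of dominating it.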

However, a far better way to deal with days on which both buys and sells are high is to use a more flexible model that appears to describe the market's microstructure more accurately, namely the extended PIN model proposed by Venter and de Jongh. For a description of this extended model and a discussion of the extent to which it improves on the basic model, see "How Disclosure Quality Affects the Level of Information Asymmetry", Review of Accounting Studies, 2007, co-authored with Stephen A. Hillegeist.

In fact, as discussed in that paper, the PIN estimates arising from the basic model are reasonably close to those estimated using the extended model (which is why I am prepared to leave these PIN estimates posted). However, the estimates of the underlying parameters (particularly alpha and mu) are not reliable. It is because of the unreliability of these parameters in the basic model that I do not make them available. I fear that any research carried out using those parameter estimates may generate spurious results.

********************************************************************

In each of the files, the data are:
permno - the CRSP permno company identifier
yyyyq - the period identifier: year and calendar quarter (e.g. 19931 is the first quarter of 1993)
pinsas - the PIN estimate, as computed in SAS using the basic EKO model.

Corner solutions in the computation of PIN, which indicate that an estimate may be unreliable, are flagged in the three columns probacn, probdcn and probecn.

probacn - a corner solution for the alpha parameter (i.e. less than 0.02 or more than 0.98)
probdcn - a corner solution for the delta parameter (i.e. less than 0.02 or more than 0.98)
probecn - odd parameter estimates for epsilon and mu (i.e. epsilon > 50 * mu, or mu > 50 * epsilon)
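For anyone working from a set of parameter estimates, the three rules above can be written as a small Python function. This is a sketch only: the function name is hypothetical, and the thresholds are simply the ones quoted in the column descriptions.

```python
def corner_flags(alpha, delta, epsilon, mu):
    """Recompute the three corner-solution flags from parameter estimates."""
    return {
        # alpha at (or very near) either boundary of [0, 1]
        "probacn": int(alpha < 0.02 or alpha > 0.98),
        # delta at (or very near) either boundary of [0, 1]
        "probdcn": int(delta < 0.02 or delta > 0.98),
        # epsilon and mu differ by more than a factor of 50
        "probecn": int(epsilon > 50 * mu or mu > 50 * epsilon),
    }
```

For example, `corner_flags(0.01, 0.5, 10, 10)` flags only alpha.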

In addition, probl is set equal to 1 if the minimum value of the likelihood function that SAS produces is greater than approximately 0.05. (In such cases, to reach such a high probability, something has almost surely gone wrong in the optimization process!)

extract from the tab delimited file:
permno yyyyq     pinsas  probacn  probdcn   probecn   probl
10001  19931  .70343606        0        0         0       0
10001  19932  .41210478        0        1         0       0
10001  19933  .53451005        1        0         0       0
10001  19934  .25426439        0        0         0       0
10001  19941  .16973552        0        1         0       0
10001  19942  .35025038        0        1         0       0
10001  19943  .44403904        0        1         0       0
10001  19944  .11238877        0        0         0       0
10001  19951  .20877597        0        1         0       0
10001  19952  .12379615        0        1         0       0
10001  19953  .37591511        1        0         0       0
10001  19954  .30323397        0        1         0       0
10001  19961  .19112218        0        0         0       0
10001  19962  .52507565        1        0         0       0
10001  19963  .16656038        0        1         0       0
10001  19964   .3551755        1        0         0       0
10001  19971  .46380882        1        0         0       0
10001  19972  .24875999        0        0         0       0
10001  19973  .21581681        0        0         0       0
10001  19974  .27371207        0        0         0       0
10001  19981  .28971531        0        1         0       0
10001  19982  .11273631        0        1         0       0
10001  19983          0        1        0         0       0
10001  19984   .3424062        0        0         0       0

It so happens that 16 of the 24 PIN estimates for firm 10001 (Great Falls Gas Company) listed above gave problems. However, in the whole sample, approximately 3% of the estimates of alpha are corner solutions, 14% of the estimates of delta are corner solutions and less than 1% of the estimates of epsilon/mu and minimum likelihood were suspicious. Nevertheless, be aware that all these PIN estimates are suspicious.
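If you prefer to drop flagged estimates before using the data, something along the following lines may help. It is a Python sketch that assumes the file has a header row with the column names shown in the extract and is strictly tab-delimited. The function names are hypothetical, and the sample is the first four rows of the extract.

```python
import csv
import io

FLAG_COLUMNS = ("probacn", "probdcn", "probecn", "probl")

def read_pin_rows(handle):
    """Parse tab-delimited PIN data into dicts with typed fields."""
    rows = []
    for record in csv.DictReader(handle, delimiter="\t"):
        row = {
            "permno": int(record["permno"]),
            "yyyyq": int(record["yyyyq"]),
            "pinsas": float(record["pinsas"]),
        }
        for name in FLAG_COLUMNS:
            row[name] = int(record[name])
        rows.append(row)
    return rows

def is_flagged(row):
    """True if any of the four problem indicators is set."""
    return any(row[name] == 1 for name in FLAG_COLUMNS)

# First four rows of the extract, rewritten with explicit tabs.
SAMPLE = (
    "permno\tyyyyq\tpinsas\tprobacn\tprobdcn\tprobecn\tprobl\n"
    "10001\t19931\t.70343606\t0\t0\t0\t0\n"
    "10001\t19932\t.41210478\t0\t1\t0\t0\n"
    "10001\t19933\t.53451005\t1\t0\t0\t0\n"
    "10001\t19934\t.25426439\t0\t0\t0\t0\n"
)

rows = read_pin_rows(io.StringIO(SAMPLE))
unflagged = [row for row in rows if not is_flagged(row)]
```

For the gzipped files, the same function should accept a handle opened in text mode with `gzip.open(path, "rt")`, where `path` is wherever you saved the download.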

The files are:

tab delimited file of data up to 2006 (10MB):
gzipped tab delimited file of data up to 2006 (2.7MB):

Files added May 2012 -

The following files include parameter estimates (alpha, delta, epsilon, mu) so that you can draw your own conclusions about whether outliers and corner solutions should be included. However, see Brown and Hillegeist (2007) about the dangers of using these parameter estimates. The optimization process generates biased estimates of alpha and mu, but the biases tend to offset one another, so the resulting PIN estimate is not too far from that obtained from the more general Venter and de Jongh (VdJ) model.

gzipped tab delimited file of annual data from 1993 to 2010 (3MB):
gzipped sas file of annual data from 1993 to 2010 (8MB):
gzipped tab delimited file of quarterly data from 1993 to 2010 (12MB):
gzipped sas file of quarterly data from 1993 to 2010 (26MB):


If you do find the above data useful in your work, please let me know.

Thank you.



Last update: 22 May, 2012