Probability of Informed Trade - a measure of information asymmetry

This page primarily contains links to the PINs computed using the Venter and de Jongh model (Venter, J.H., de Jongh, D., 2006. Extending the EKOP model to estimate the probability of informed trading. Studies in Economics and Econometrics 30, 25-39). At the bottom of the page are links to the PINs computed using the basic EKO model.

The files accessible from this page contain PINs computed similarly to those used in "How Disclosure Quality Affects the Level of Information Asymmetry", Review of Accounting Studies, Vol. 12 (2-3), 2007, co-authored with Stephen A. Hillegeist. (Also available here.) See the discussion in Brown and Hillegeist (2007) as to why these PINs are much more robust than the basic EKO PINs and how they allow for the very strong positive correlation that we observe between buys and sells in the data. If the basic model correctly described the data, this correlation would be negative. As discussed in our RAST paper, the basic EKO model is a special case of the general VdJ model in which the parameter psi is infinite. (In my formulation, I use the inverse of this parameter, so the EKO model corresponds to invpsi (= 1/psi) = 0.)

The data are contained in two SAS files (and two equivalent ascii files), computed by calendar quarter (yyyyq) and by calendar year respectively, and cover the period 1993 to 2010.

The files are:

data by calendar quarter (gzipped SAS file, 27MB)
data by calendar year (gzipped SAS file, 7MB)
data by calendar quarter (gzipped ascii file, 11MB)
data by calendar year (gzipped ascii file, 3MB)
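If you use the gzipped ascii files, the following minimal SAS sketch reads the quarterly file. The file name is hypothetical, the column order is my assumption based on the layout shown below, and the GZIP option of the ZIP access method requires SAS 9.4M5 or later:

    filename pinfile zip "pin_quarterly.txt.gz" gzip;  /* hypothetical file name */
    data pinq;
      infile pinfile;
      input permno yyyyq alphasas deltasas epsisas musas invpsi
            minlik numdays maxgrd terminat $ pinsas;
    run;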

Each file contains data along the following lines:

Obs  permno  yyyyq  alphasas  deltasas  epsisas   musas  invpsi    minlik  numdays    maxgrd  terminat  pinsas

  1   10001  20031   1.00000   0.44419     3.41    3.56  0.89806  -13.815       59  2.77E-08  ABSGTOL    0.343
  2   10001  20032   0.47924   0.74004     8.55   11.86  1.13721  -16.818       62  1.21E-11  GTOL       0.250
  3   10001  20033   0.43950   0.67354     3.12    5.66  0.28189   -9.927       62  2.20E-09  GTOL       0.285
  4   10001  20034   1.00000   0.83057     2.25    3.64  0.60876  -10.055       61  0.00E+00  GTOL       0.447
  5   10001  20041   0.47333   0.54494     3.28    6.02  0.41500   -7.783       61  6.39E-07  GTOL       0.303
  6   10001  20042   1.00000   0.70410     1.53    2.00  0.61685  -11.717       50  2.81E-08  GTOL       0.396
  7   10001  20043   1.00000   0.59562     2.24    4.16  0.81434   -9.222       47  5.32E-13  GTOL       0.482

In the quarterly (annual) file, the key fields are permno and yyyyq (permno and year). The other variables are as follows:

alphasas - probability of an information event
deltasas - probability of information event being bad news
epsisas - trading intensity of uninformed traders (trades per day)
musas - trading intensity of informed traders (trades per day)
invpsi - inverse of the psi parameter. invpsi = 0 implies that the data are described by the basic EKO model.
minlik - the minimum value of the log-likelihood for the data within the period
numdays - the number of days used in the estimation process. (Days with zero trades are excluded.) You may wish to exclude observations where the number of days is less than 30.
maxgrd - the gradient of the objective function at the optimum.
terminat - termination condition of the SAS NLP procedure (proc nlp). If this variable is "PROBLEMS", you may wish to drop the observation.
pinsas - the computed PIN, i.e. PIN = (alpha * mu) / (alpha * mu + 2 * epsi)
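As a quick check, for observation 2 in the extract above: alpha * mu = 0.47924 * 11.86 = 5.684 and 2 * epsi = 17.10, so PIN = 5.684 / (5.684 + 17.10) = 0.250, matching pinsas. A minimal SAS sketch that recomputes PIN from the parameter estimates, assuming the quarterly file has been read into a dataset called PINQ (e.g. via the sketch above):

    data pincheck;
      set pinq;
      /* recompute PIN from the estimated parameters */
      pin_check = (alphasas * musas) / (alphasas * musas + 2 * epsisas);
      diff = abs(pin_check - pinsas);  /* should be zero up to rounding */
    run;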

As can be seen from the data, the VdJ model generates many (approximately 30%) corner solutions of alpha = 1, i.e. the model concludes that every day is a private-information day. In almost all such cases, though, PIN itself is not at a corner solution. These corner solutions arise because the assumption of a Poisson arrival rate is too restrictive given the observed trading data: for a Poisson distribution, the variance is equal to the mean, whereas in the actual trade data we observe considerably greater variance than that.
Having said that, I have no doubt at all that the VdJ model is a better model than the basic EKO model. (The optimization procedure was free to conclude that invpsi = 0 had that value fitted the data better.)
The minimum values of the log-likelihood function are small - but not nearly as small as those that arise from the basic EKO model. In the latter case, the model typically only converges if the log-likelihoods are limited (say, to somewhere in the range -500 to -40). Without such a limit, the SAS optimization procedure typically fails because of numerical over/underflow when the number of buys and sells exceeds 3,000 a day.
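To see this overdispersion directly, one can compare the mean and variance of daily buys for each stock-quarter; under the Poisson assumption the variance-to-mean ratio should be close to one. A minimal SAS sketch, assuming a dataset BUYSELL sorted by permno and yyyyq with a variable buys holding the daily buy count:

    proc means data=buysell noprint;
      by permno yyyyq;
      var buys;
      output out=disp mean=mean_buys var=var_buys;
    run;

    data disp;
      set disp;
      /* a ratio well above 1 indicates more variance than Poisson allows */
      vmr = var_buys / mean_buys;
    run;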

You are very welcome to use these data for research purposes and I should be grateful if you would let me know if you download them and/or find them useful. If you compute your own estimates - using the programs below or otherwise - please let me know. I regularly receive requests for data updated beyond 2010 and I am sure you will increase the number of citations to your own work if you share your own computed estimates.



Computing PINs yourself

In case you wish to compute PIN metrics yourself, the following programs should enable you to do so, provided you have files of daily buys and sells based on TAQ data:
buysell.sas7bdat - contains some sample buy and sell data
p4b.sas - the main program to run to produce the output
pinmacros.sas - a file that contains the various macros used to estimate PIN

If you put all three files into a single directory and run p4b.sas, the files should be self-contained and you should be able to see what is going on. As the program is currently set up, two output files are created: the first contains the parameter estimates; the second contains individual likelihood estimates for each observation at the estimated optimum parameters.
When you have 'lots' of observations, you will probably want to feed them a batch at a time into the pincalc macro and write out the results periodically (see the sketch below). I have programs that will do that and you are welcome to them (just ask). However, I imagine it would be easier for you to write your own program that wraps around the basic macro rather than trying to understand my programs.
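For illustration only, here is a minimal sketch of such a wrapper. It assumes the input data carry a (hypothetical) batch-number variable batchno, that %pincalc takes its input dataset via a data= parameter, and that it writes its estimates to a dataset called pinresults - the macro's actual interface is defined in pinmacros.sas, so check there first:

    %include "pinmacros.sas";   /* load the estimation macros */

    %macro runbatches(nbatches);
      %do b = 1 %to &nbatches;
        /* select one batch of observations */
        data batch;
          set buysell;
          where batchno = &b;   /* batchno is a hypothetical batching variable */
        run;
        %pincalc(data=batch);   /* the data= parameter name is an assumption */
        /* append this batch's estimates to a running results dataset */
        proc append base=allpins data=pinresults force;
        run;
      %end;
    %mend runbatches;

    %runbatches(20);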

As stated above, I do have a number of significant reservations about the PIN measure, in particular:
1) The number of corner solutions found is suspiciously high.
2) If we use millisecond-stamped data from TAQ, the basic Lee-Ready algorithm is suspect because, in the presence of high-frequency trading, it is not clear that trades and quotes are matched up sequentially (a sketch of the signing step appears after this list). BUT
3) If we use 'to the second' stamped data from TAQ, there is no unambiguous way of matching trades and quotes when there are multiple trades and quotes with different prices but simultaneous time stamps.
Having said that, I have no doubt this model is vastly superior to the basic EKO model for PIN.
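For reference, reservations 2) and 3) concern the trade-signing step. Below is a minimal sketch of Lee-Ready classification in SAS, assuming a dataset TRADES sorted by symbol and time in which each trade has already been matched to the prevailing bid and ask - and that matching is exactly where the ambiguity lies. (A full implementation would also use the last price _change_ for the tick test, rather than just the immediately preceding trade.)

    data signed;
      set trades;
      by symbol;
      prevprice = lag(price);              /* previous trade price, for the tick test */
      if first.symbol then prevprice = .;
      mid = (bid + ask) / 2;
      if price > mid then side = 'B';      /* above the midpoint: buy */
      else if price < mid then side = 'S'; /* below the midpoint: sell */
      else if prevprice ne . and price > prevprice then side = 'B';  /* at midpoint: tick test */
      else if prevprice ne . and price < prevprice then side = 'S';
    run;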

To obtain the number of daily buys and sells for each stock, I am aware that WRDS now makes these data available. However, I have very severe reservations about how reliable those figures are. WRDS do not publish the programs they used to generate these data from the raw TAQ files, but it seems to me that in determining the sign of each trade (using the Lee-Ready algorithm) their program may throw _all_ trades and _all_ quotes together - IRRESPECTIVE OF EXCHANGE - into the mix. They are therefore potentially signing a trade occurring on NYSE against a quote given on NASDAQ - or even on a small city exchange such as Philadelphia. It is not clear to me that doing so is valid. Also, I cannot see that they have adopted the Hasbrouck (1988) adjustment - which treats any trade occurring within 5 seconds of the previous trade as part of that trade and NOT as a separate transaction. Having said that, in recent years, as trading volumes have increased, it is not at all clear to me that such an assumption is still warranted.
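For concreteness, here is a minimal sketch of that 5-second adjustment, assuming a TRADES dataset sorted by symbol and time (measured in seconds). This is my reading of the adjustment, not WRDS's or Hasbrouck's actual code:

    data adjusted;
      set trades;
      by symbol;
      prevtime = lag(time);
      if first.symbol then prevtime = .;
      /* a trade within 5 seconds of the previous trade is treated as a
         continuation of that trade, not as a separate transaction */
      if prevtime = . then newtrade = 1;
      else newtrade = (time - prevtime > 5);
    run;

One would then collapse consecutive continuation rows (newtrade = 0) into the preceding trade before signing and counting buys and sells.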


Basic PINs

The basic PINs (computed under the restriction that invpsi = 0, and as used in Brown, Hillegeist and Lo (2004), "Conference Calls and Information Asymmetry", Journal of Accounting & Economics, Vol. 37 (2)) can be downloaded here.

As discussed in Brown and Hillegeist (2007), PINs computed using the basic Poisson model do not fit the observed data at all well and the VdJ model is undoubtedly better.

Alternate Sources of PIN

Estimates of the basic (EKO) PIN are also available at Soeren Hvidkjaer's website. The PINs at Professor Hvidkjaer's site are computed annually, for NYSE and AMEX firms only, but use ISSM data as well as TAQ data to cover the period 1983 to 2001.



Last update: 6 August 2021