Doug Oard's Research Software Page
The research software available on this page was produced
as a byproduct of the work described on my main research page. I am making it available
here to help others avoid reinventing the wheel and to make life
easier for anyone interested in duplicating my results.
My students and I have tried to make it as easy as possible to
reproduce the software environments that we have used, but path names
and other system-specific information will need to be manually edited
to get any of this to work on your system. The documentation for
these routines ranges from terse to nonexistent, but I'll be happy to
answer questions by email.
This software may be used for research purposes in a manner consistent
with the SMART and SVDPACK distribution statements. Requests for
other uses of this software should be referred to the author so that we can sort out
the tangled interconnection of authorship in these files. Persons
interested in commercial applications of these techniques should be
aware that Bellcore holds two patents on Latent Semantic Indexing.
SMART Modifications
These modifications are posted with the permission of the SMART project
at Cornell University. In order to use them you should first download
a clean copy of SMART version 11.0 from the SMART FTP site at ftp://ftp.cs.cornell.edu/pub/smart/
and then untar the files you need on top of it in the appropriate
order.
- Solaris Patch
- An example of how to modify the SMART 11.0 distribution to run
on Solaris 2.x (SunOS 5.x). There are some useful instructions on how
to do this on the smart-people mailing list, but I find it always helps to have an example in which all the pieces are put together.
- ISO 8859-1 Character Set Patch
- A simple and fairly well documented patch that provides 8-bit
clean processing for characters in the Latin-1 (ISO 8859-1) character
set with appropriate recognition. We used this to run experiments in
Spanish. Stemming for other languages is NOT included, but there is a
Spanish stopword list available on the SMART FTP site.
- Adaptive Multilingual Filtering Patch
- This tar file contains all of the modified and new SMART
routines that I used for my thesis research. Components are included
which compute Latent Semantic Indexing (LSI) feature vectors, perform
adaptive text filtering using those vectors, use that as the basis of
three different techniques for cross-language text filtering, and
evaluate information filtering results. There is also some code there
for Gaussian user modeling, a technique that proved no more effective
than LSI in my experiments. My dissertation
contains the best description of what all these parts do and how I
used them. The dissertation also describes the test collections that
I used and tells you how they can be obtained, but they cannot be
redistributed with these routines because almost all of the data and
the tools used to preprocess it are proprietary. I'll be happy to
provide the shell scripts and C programs that I used to control the
preprocessing to anyone who wants them, but I have not included them
in this tar file because they are so specific to the data formats
produced by the tools that I used and to the structure of our file
system that I doubt that they would be of much use to most people. My
internal documentation in these files is abysmal, as are some of the
design decisions about where in the SMART hierarchy to place things.
But it should provide a useful starting point for people who are
interested in adapting some of the components that I have implemented
for other purposes.
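For readers unfamiliar with what "8-bit clean" processing means in the ISO 8859-1 patch listed above, the following is a rough illustration in Python. It is not the patch's code (the patch modifies SMART's C sources), and the function names and the tiny stopword sample are hypothetical, chosen only for this sketch. It contrasts a tokenizer that keeps accented Latin-1 letters inside words with a 7-bit tokenizer that splits words at every high-bit byte.

```python
import re

# Hypothetical stopword sample for illustration only; a real Spanish
# stopword list is the one mentioned above on the SMART FTP site.
SAMPLE_STOPWORDS = {"el", "la", "de", "en", "y", "que"}

# Matches runs of Unicode word characters other than decimal digits and
# "_", so accented Latin-1 letters such as "ñ" or "é" stay inside a token.
WORD_RE = re.compile(r"[^\W\d_]+")


def tokenize_latin1(raw: bytes, stopwords=SAMPLE_STOPWORDS):
    """Decode ISO 8859-1 bytes and return lowercase non-stopword tokens."""
    text = raw.decode("latin-1")  # every byte value maps to one character
    tokens = [t.lower() for t in WORD_RE.findall(text)]
    return [t for t in tokens if t not in stopwords]


def tokenize_ascii_only(raw: bytes):
    """7-bit behaviour for contrast: high-bit bytes break words apart."""
    text = raw.decode("ascii", errors="replace")
    return [t.lower() for t in re.findall(r"[A-Za-z]+", text)]
```

Given the bytes for "La niña está en España" encoded as Latin-1, the first function keeps "niña", "está", and "españa" as whole words, while the second fragments them into pieces such as "ni" and "espa". No stemming is attempted, in keeping with the patch description.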
Last modified: Wed Apr 16 08:18:55 2003
Doug Oard
oard@glue.umd.edu