CSSR has been implemented and successfully tested on a range of different discrete time series and sequential data streams. This page provides the full source code (released under the GNU General Public License), documentation, and links to scientific papers that use the CSSR algorithm.
The theory behind the algorithm was developed by Cosma Shalizi and Kristina Klinkner, under the sponsorship of James Crutchfield. The code for this implementation was written by Kristina Klinkner. For full credits, see the comments to the code, and the paper "Blind Construction of Optimal Nonlinear Recursive Predictors for Discrete Sequences".
CSSR-v0.1.1.tar.gz (268k) is the complete C++ source code for CSSR version 0.1.1, plus documentation and license. The last update was on 2 May 2008.
As mentioned, CSSR is released under the GNU General Public License. You may want to read the Copyright, license and warranty statement (plain text, 16k) before downloading. (A copy is included in the distribution.) Please note especially that CSSR is provided with ABSOLUTELY NO WARRANTY.
The ReadMe file is available in gzipped Postscript, PDF, and HTML. (All these versions, plus plain ASCII, are included in the distribution.) The documentation was last updated on 2 May 2008. The ReadMe gives a brief overview of what CSSR does and how it works, describes usage, offers some hints on parameters, and explains the known issues with this implementation of the basic algorithm.
There is a thorough explanation of the theory behind CSSR in the paper "Blind Construction of Optimal Nonlinear Recursive Predictors for Discrete Sequences" (Cosma Rohilla Shalizi and Kristina Lisa Shalizi, pp. 504--511 in Max Chickering and Joseph Halpern (eds.), Uncertainty in Artificial Intelligence: Proceedings of the Twentieth Conference (Arlington, Virginia: AUAI Press, 2004) = arxiv:cs.LG/0406011). For reasons of space, that paper omits certain details, which can be found in the technical report "An Algorithm for Pattern Discovery in Time Series" (Cosma Rohilla Shalizi, Kristina Lisa Shalizi and James P. Crutchfield, Santa Fe Institute Working Paper 02-10-060 = arxiv:cs.LG/0210025).
The following instructions are for Unix-based systems, and are included in the ReadMe file.
Download CSSR-v0.1.1.tar.gz. When gunzipped and untarred, this will produce a directory called CSSR-v0.1.1, containing all the necessary header and source code files, a copy of the documentation, the release license, and a makefile. Running make inside that directory will produce an executable, which should be moved to somewhere in your command path. On most Unix systems, the following sequence of commands will create the executable and put it in your bin directory, which is usually part of your command path.
gunzip CSSR-v0.1.1.tar.gz
tar xvf CSSR-v0.1.1.tar
cd CSSR-v0.1.1
make
cp CSSR ~/bin/
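The last step above assumes that ~/bin already exists and is on your command path. As a minimal sketch for cases where that is uncertain, assuming a POSIX shell and that you are inside the CSSR-v0.1.1 directory after running make, the copy could be guarded as follows. The helper name on_path is hypothetical and is not part of the CSSR distribution.

```shell
# on_path DIR: succeed if DIR is one of the colon-separated entries of $PATH.
# (Hypothetical helper for illustration; not part of CSSR.)
on_path() {
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *)        return 1 ;;
  esac
}

# Install the executable only if make actually produced one here.
if [ -x ./CSSR ]; then
  mkdir -p "$HOME/bin"      # create ~/bin if it is missing
  cp ./CSSR "$HOME/bin/"
fi

# Warn if ~/bin is not on the command path.
if ! on_path "$HOME/bin"; then
  echo "note: $HOME/bin is not on your PATH" >&2
fi
```

If the note is printed, adding ~/bin to PATH in your shell start-up file is the usual remedy; the exact file depends on your shell.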
The code has been successfully compiled with gcc 3.1, 3.3 and 4.0 on Macintosh OS X (10.2, 10.3 and 10.4, respectively), with gcc 3.2 on Linux (Red Hat 9), and with Microsoft Visual C++ on Windows 98 and Windows XP. (If using Visual C++, be sure to set the project type to Windows Console Application.) If you get CSSR to work on a different system, please let us know.
On some systems, compilation may produce warnings about escape sequences or the use of deprecated headers. These can be safely ignored.
As mentioned, the principal paper on CSSR is "Blind Construction of Optimal Nonlinear Recursive Predictors for Discrete Sequences", arxiv:cs.LG/0406011. This describes the algorithm and its theoretical basis in detail, gives results on time complexity and on the convergence of the reconstructed model, and compares its performance to other approaches, including the EM algorithm and cross-validation. Some details omitted from the "Blind Construction" paper can be found in the preliminary technical report, "An Algorithm for Pattern Discovery in Time Series". The latest version at arxiv.org, arxiv:cs.LG/0210025, supersedes all earlier drafts.
If you use CSSR in a scientific publication, please cite the "Blind Construction" paper. A sample BibTeX entry:
@inproceedings{CSSR-UAI-2004,
author = "Cosma Rohilla Shalizi and Kristina Lisa Klinkner",
title = "Blind Construction of Optimal Nonlinear Recursive Predictors for Discrete Sequences",
editor = "Max Chickering and Joseph Y. Halpern",
booktitle = "Uncertainty in Artificial Intelligence: Proceedings of the Twentieth Conference (UAI 2004)",
publisher = "AUAI Press",
address = "Arlington, Virginia",
year = 2004,
pages = "504--511",
url = "http://arxiv.org/abs/cs.LG/0406011"}
Please write to us if you use CSSR in an application (especially a paper), compile it on a new platform, modify it, wish to be kept informed of future developments, or have a bug or other problem to report. However, before sending bug reports, please read the documentation carefully, and if possible the paper too, so that you're sure that what you're experiencing is a bug. Since CSSR is released under the GNU General Public License, feel free to write your own fix for the bug, and tell us about it!