Non-Linear Suffix Tree Deltas
The NSTD library is a novel means of generating a delta between to revisions of
a file (or any sequence) using a Suffix Tree so that the algorithm is O(n). The
primary difference between the NSTD algorithm and others such as Levenshtien or
Ratcliff-Obershelp is that it is not aligned. By removing the requirement for
alignment it’s possible to only have two records-INSERT and MOVE-and to
create smaller deltas when the changes are relatively few. This algorithm is
fairly fast, but uses quite a bit of memory (20 times the size of the original
file plus the size of the new file). This is contrast to Levenshtien which
usually requires N^2 in memory size. NSTD is excellent for processing binary
files especially ones with few changes between revisions. It also works very
well when the changes consist of mostly moving contents around since this only
requires one record. The results of the NSTD algorithm are pretty horrible for
a human to read, but they are still possible to understand. This release of the
NSTD algorithm (0.3) has been thoroughly tested and has no memory leaks or
buffer overflows thanks to Valgrind. It also includes an initial implementation
of the Ratcliff-Obershelp algorithm and merging. There is a document which
describes the algorithm as well. This version is being used extensively in the
FastCST project.
Source Release [tar.bz2] and Algorithm Document [PDF].