Latest News >> 2008-07-20 2008-06-25

I’ve been completely fed up with news/feed/rss/atom readers these days. I use Linux as my primary operating system, and I only have a few feeds that I want to rip through quick so I can get to reading the content. Yet, trying to find a reader that doesn’t suck donkey balls has been a chore.

2008-06-21

Wanna know what all the Ruby vulnerabilities are? Or at least have a fun look at how to search through code for clues? It’s a blast.

2008-06-13

I’m dropping a large blog post on everyone to just say that I haven’t died, I’ve just been busy working on my book for A/W about Mongrel. I had contracted with them to do a book about deploying Mongrel, but then decided it wouldn’t be a very good book since we’d already done one about that topic and there wasn’t too much more to say.

Non-Linear Suffix Tree Deltas

The NSTD library is a novel means of generating a delta between to revisions of a file (or any sequence) using a Suffix Tree so that the algorithm is O(n). The primary difference between the NSTD algorithm and others such as Levenshtien or Ratcliff-Obershelp is that it is not aligned. By removing the requirement for alignment it’s possible to only have two records-INSERT and MOVE-and to create smaller deltas when the changes are relatively few. This algorithm is fairly fast, but uses quite a bit of memory (20 times the size of the original file plus the size of the new file). This is contrast to Levenshtien which usually requires N^2 in memory size. NSTD is excellent for processing binary files especially ones with few changes between revisions. It also works very well when the changes consist of mostly moving contents around since this only requires one record. The results of the NSTD algorithm are pretty horrible for a human to read, but they are still possible to understand. This release of the NSTD algorithm (0.3) has been thoroughly tested and has no memory leaks or buffer overflows thanks to Valgrind. It also includes an initial implementation of the Ratcliff-Obershelp algorithm and merging. There is a document which describes the algorithm as well. This version is being used extensively in the FastCST project.

Source Release [tar.bz2] and Algorithm Document [PDF].