Posted on 2018-02-21 18:00 vtools
The original vdiff, as written by asciilifeform, is a tiny awk script that calls out to the standard[1] sha512sum and diff. It was a great illustration of the concept, but it inadvertently suffered from in-band issues and poor portability across the unixes in use by the republic. This post introduces the vtools project, which is meant to address some of these shortcomings as well as deliver some long-expected features.
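To make the in-band problem concrete, consider a filter that rewrites diff's header lines after diff has already produced them. The sketch below assumes, as a simplification, that such a filter recognizes headers purely by a leading `---` or `+++`; the function names are hypothetical. Under that rule, removing an Ada comment line such as `-- init` produces the unified-diff line `--- init`, which the filter cannot tell apart from a file header. A differ that writes the headers itself never has to guess.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical header test used by a post-processing filter:
   any line starting with "---" or "+++" is taken to be a file header. */
bool naive_is_header(const char *line)
{
    return strncmp(line, "---", 3) == 0 || strncmp(line, "+++", 3) == 0;
}

/* In unified output a removed line is printed as '-' followed by the
   original text. Returns true when that printed line would fool the
   naive header test above. */
bool removed_line_fools_filter(const char *original_text)
{
    char printed[256];
    snprintf(printed, sizeof printed, "-%s", original_text);
    return naive_is_header(printed);
}
```

For instance, `removed_line_fools_filter("-- init")` is true, while `removed_line_fools_filter("x := 1;")` is false.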
The first release consists of two patches: the genesis, a stripped-down version of diff,[2] and SHA-512 code bolted onto the differ. I've decided to keep the two parts separate, since the next release will be explicitly about replacing SHA-512 with Keccak, and because the coupling between the two parts might be educational for anyone who wants to hack on this differ. Lacking a republican SHA, I'm using one lifted from Busybox.
The result is feature-equivalent to the current vdiff and should produce equivalent patches on the same code base.[3]
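The SHA-512 half meets the differ in the file headers. As a hedged sketch, this assumes the vpatch convention that diff's timestamp is replaced by the file's SHA-512 in lowercase hex, that the word `false` stands in for a file absent on that side, and that single spaces separate the fields. The function name and signature are illustrative only, and the digest is taken as already computed (in vtools, by the Busybox-derived SHA-512 code).

```c
#include <stddef.h>
#include <stdio.h>

#define SHA512_DIGEST_LEN 64

/* Hypothetical helper: format one vpatch header line into out.
   marker is "---" or "+++". digest is NULL when the file does not exist
   on this side (the -N case), in which case "false" is written in place
   of a hash. Returns the line length, or -1 if out is too small. */
int format_vpatch_header(char *out, size_t outsz, const char *marker,
                         const char *path, const unsigned char *digest)
{
    static const char digits[] = "0123456789abcdef";
    char hex[2 * SHA512_DIGEST_LEN + 1];

    if (digest == NULL) {
        snprintf(hex, sizeof hex, "false");
    } else {
        /* Each digest byte becomes two lowercase hex characters. */
        for (size_t i = 0; i < SHA512_DIGEST_LEN; i++) {
            hex[2 * i] = digits[digest[i] >> 4];
            hex[2 * i + 1] = digits[digest[i] & 0x0f];
        }
        hex[2 * SHA512_DIGEST_LEN] = '\0';
    }

    int n = snprintf(out, outsz, "%s %s %s", marker, path, hex);
    return (n < 0 || (size_t)n >= outsz) ? -1 : n;
}
```

For a newly added file, the `---` line would carry `false` and the `+++` line the hash of the new contents; a deleted file is the mirror image.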
I took diff from GNU diffutils 3.6 and stripped it down to the parts absolutely necessary for vdiff to function. Specifically, awk vdiff passes the -rNu flags, which make the operation recursive, produce diffs for missing files (treating them as empty), and generate output in unified format.
The diff codebase is split between lib and src. The former holds copy-pasted code that's shared between GNU projects. There's a lot of redundancy there; diffutils carries an entire compatibility substrate with it, and I could eliminate a lot of that code at the expense of "portability". It's unclear to me how much the result has suffered, since the functionality in the lib folder lacks internal consistency: code that could theoretically run on DOS shares space with code that has hard unix assumptions baked in. The other directory, src, is where diff proper lives. The distinction is arbitrary, but I've kept it for now because it aids me in exploring the code.
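Taken together, -r and -N mean the differ visits every name present in either tree and, when one side lacks the file, diffs against an empty file rather than printing an "Only in" message. Below is a minimal sketch of the pairing step for a single directory level, assuming both listings have already been read and sorted with strcmp; `pair_listings` and `enum side` are hypothetical names, not vdiff's.

```c
#include <stddef.h>
#include <string.h>

/* Which tree(s) a name was found in. */
enum side { ONLY_A, ONLY_B, BOTH };

/* Merge two strcmp-sorted directory listings a[] and b[].
   Each name present on either side is reported exactly once, in order,
   into names[] / where[] (both must hold at least na + nb entries).
   With -N semantics an ONLY_A or ONLY_B entry is later diffed against an
   empty file instead of yielding an "Only in ..." message.
   Returns the number of entries written. */
size_t pair_listings(const char **a, size_t na,
                     const char **b, size_t nb,
                     const char **names, enum side *where)
{
    size_t i = 0, j = 0, k = 0;

    while (i < na || j < nb) {
        int c;
        if (i == na)
            c = 1;                      /* only b's names remain */
        else if (j == nb)
            c = -1;                     /* only a's names remain */
        else
            c = strcmp(a[i], b[j]);     /* byte order, locale-free */

        if (c < 0) {
            names[k] = a[i++];
            where[k++] = ONLY_A;
        } else if (c > 0) {
            names[k] = b[j++];
            where[k++] = ONLY_B;
        } else {
            names[k] = a[i];
            where[k++] = BOTH;
            i++;
            j++;
        }
    }
    return k;
}
```

A real implementation would recurse when both entries are directories and hand each file pair to the differ. The merge itself is the usual two-pointer walk over sorted lists, which is why the sort order decides the order of files in the resulting vpatch.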
The project has a top-level Makefile, which will from now on build all the different tools. Usage ought to be self-evident: press the tree, run make at the top level, and you get a vdiff executable. Vdiff accepts some extra arguments left over from diff; I pruned them down to only the ones that are still at all relevant, but their availability shouldn't be relied on.
Testing, comments and hate are welcome.[4]
It takes about 9.5s to generate the entire set of patches from the current trb stable using awk vdiff; this implementation takes 1.2s on my machine. For reference, it takes about 0.2s simply to diff those files (producing broken vpatches). To test this, I generated all the intermediate presses (test1.tbz2) for trb from genesis to makefiles, and then diffed all those presses against each other.[5]
cloc on diffutils-3.6
------------------------------------------------------------
Language                 files     blank   comment      code
------------------------------------------------------------
C                          338      9412     12349     49897
Bourne Shell                96      9135      6533     40712
PO File                     32      9082     13029     29382
C/C++ Header               166      4272      7421     22080
m4                         197      1283      1387     20227
TeX                          1       812      3694      7175
make                        13      1760      1469      3875
Perl                         1       103       117       451
sed                          2         0         0        16
------------------------------------------------------------
SUM:                       846     35859     45999    173815
------------------------------------------------------------
cloc on the fresh press of vdiff
------------------------------------------------------------
Language                 files     blank   comment      code
------------------------------------------------------------
C                           15       784       951      2740
C/C++ Header                12       280       351       609
make                         1         5         0        34
------------------------------------------------------------
SUM:                        28      1069      1302      3383
------------------------------------------------------------
Lines of code are a somewhat meaningless metric in this case, since vdiff is not a replacement for diff proper. A more relevant comparison would have been against a vdiff written from scratch, but lacking one we can marvel at the great savings.
strcmp to sort names the same for everyone. Well, it turns out that en_US.UTF-8 places dot before dash, while the C locale, whose ordering is what strcmp's byte comparison gives, places it after. This means that, for example, mpi, which contains files named mpi.h and mpi-internal.h, will have its files in a different order when produced by C vdiff as opposed to awk vdiff. This might have been prevented if the latter had LC_ALL=C set, but as it stands most of the extant vpatches have been produced under whatever the system locale happened to be.
This is easy to demonstrate:
# mkdir a b
# echo foo > b/mpi.h
# echo foo > b/mpi-internal.h
# LC_ALL=C diff -ruN a b | grep '^diff'
diff -ruN a/mpi-internal.h b/mpi-internal.h
diff -ruN a/mpi.h b/mpi.h
# LC_ALL=en_US.UTF-8 diff -ruN a b | grep '^diff'
diff -ruN a/mpi.h b/mpi.h
diff -ruN a/mpi-internal.h b/mpi-internal.h
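The C-locale order is plain byte order: strcmp compares characters as unsigned char, and '-' is 0x2D while '.' is 0x2E, so mpi-internal.h sorts before mpi.h no matter what the user's locale is. A minimal sketch of a locale-independent name sort follows; `sort_names` and `cmp_names` are illustrative helpers, not vdiff's code.

```c
#include <stdlib.h>
#include <string.h>

/* qsort comparator over an array of C strings. strcmp compares bytes
   as unsigned char, so the result does not depend on LC_ALL or
   LC_COLLATE, unlike strcoll. */
static int cmp_names(const void *x, const void *y)
{
    const char *const *a = x;
    const char *const *b = y;
    return strcmp(*a, *b);
}

/* Sort n names in place into byte (C locale) order. */
void sort_names(const char **names, size_t n)
{
    qsort(names, n, sizeof *names, cmp_names);
}
```

Sorting {"mpi.h", "mpi-internal.h", "mpi.c"} this way yields mpi-internal.h, mpi.c, mpi.h. Swapping strcmp for strcoll after setlocale(LC_ALL, "") would typically bring back the locale-dependent order seen with en_US.UTF-8.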
Now would be a good time to introduce standard republican alphabetic order.
As with internationalization, there are potentially some changes that are a result of the cut, but these should be considered bugs. There is one explicit change, though, related to diagnostic output. What a differ does when it encounters files it can't produce a diff for is standardized by POSIX: it is expected to produce messages such as "File foo is a directory while file bar is a regular file" and output them in band. For diff this is perhaps useful behavior, but the vpatch format doesn't recognize these messages, so the patch author has to remove them by hand. These messages are notoriously hard to spot; during testing I found a leftover "Binary files a/logbot/logbot.fasl and b/logbot/logbot.fasl differ" in the published logbot vpatch. So all diagnostic messages now go to stderr, and vdiff's output should be a standard-format vpatch. [↩]
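A minimal sketch of the idea, assuming a NUL-byte check as the binary heuristic (similar in spirit to GNU diff's test, but simplified) and using hypothetical function names: diagnostics get their own FILE handle, so nothing a human would later have to strip ever lands in the patch stream.

```c
#include <stdio.h>
#include <string.h>

/* Crude heuristic (an assumption, simpler than GNU diff's):
   a buffer containing a NUL byte is treated as binary. */
int looks_binary(const unsigned char *buf, size_t len)
{
    return memchr(buf, '\0', len) != NULL;
}

/* Returns 1 if the pair cannot go into a vpatch (binary and different),
   after writing the explanation to diag, which is stderr in real use.
   Returns 0 if the pair is identical or can be diffed normally.
   Nothing is ever written to the patch stream here. */
int check_pair(FILE *diag,
               const char *name_a, const unsigned char *a, size_t alen,
               const char *name_b, const unsigned char *b, size_t blen)
{
    if (alen == blen && memcmp(a, b, alen) == 0)
        return 0;               /* identical: no hunk, no message */
    if (looks_binary(a, alen) || looks_binary(b, blen)) {
        fprintf(diag, "Binary files %s and %s differ\n", name_a, name_b);
        return 1;
    }
    return 0;
}
```

In use, the caller passes stderr as diag and writes hunks to stdout, so redirecting stdout to a .vpatch file captures only patch content.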
wget http://btcbase.org/data/vtools/test1.tbz2
tar xjf test1.tbz2
cd test1
bash run.bash <path to vdiff>
[↩]
[...] a choice of vdiff tools to use, since phf conveniently just published the first part of his work on vtools. I can therefore happily report that his patches press fine and his resulting vdiff worked on this [...]
Posted 2018-08-07 09:02 by EuCrypt Chapter 11: Serpent « Ossasepia
[...] and Ada programming languages use "--" as comment marker. This was part of the motivation behind vtools, which took the approach of avoiding the system's existing "diff" program in favor of a [...]
Posted 2020-03-31 14:02 by Adventures in the forest of V « Fixpoint