vtools keccak « BARKS IN THE WIND

Posted on 2018-02-28 23:55 technical, vtools

Last week's vtools genesis received a warm welcome, and I appreciate all the testing people did. And with testing came bugs. One of the compilation warnings was quietly resolved in a regrind, before anyone but mod6 had a chance to look. Meanwhile hanbot found another two issues.

One is related to compilation, a C11 feature, a _Noreturn attribute on one of the functions, prevented vtools from compiling on older version of GCC¹. Even though in the makefile I specified that the source code is C99 conformant, what I was missing is -pedantic flag², that would've warned me that _Noreturn is not available in C99. The fix was trivial and that was to use a GCC specific __attribute__((noreturn))³.

The other bug is a lot more serious. Unix diff is a line oriented tool, and according to POSIX a line "is a sequence of zero or more non- <newline> characters plus a terminating <newline> character." A file that doesn't end in newline is not even consider a proper text file⁴. Never the less this is a convention that's not often followed, nor do I think it should be, but the support for the alternative needs special case handling. Unified Format that we use puts \ No newline at end of file in the appropriate places. Fuzzy patches can afford to be sloppy with newlines, but vpatch makes hard claims by including the hash of the source and target files. Lacking the marker patch adds extra newline on press invaliding the hash. This is exactly what happened with the current vdiff⁵, I took a wrong turn at conditional flag, and instead of producing correct output, vdiff was simply warning the operator about the missing newlines.

A fix for both of these issues is available in two identical patches, one against the genesis and the other one is on top of SHA release from last week. The reason for two separate patches is that this week's vtools is diverging into SHA and Keccak versions. The SHA release I intend to support until we have the necessary tooling to both press and display Keccak patches. As such people who want to continue testing vdiff in their current workflow should grab vdiff_sha_fixes_newline_gcc to fix the bugs that came up last week.

Keccak support turned out to be a lot trickier than I expected. I'm using Diana Coman's smg_keccak, which is written in Ada.

I ran into two problems, one is that the interface to smg_keccak assumes stateless operation. You pass input and output buffers to Sponge procedure. And you get the immediate result in a single pass. Diff on the other hand reads files block by block, so a conventional hashing interface would be more appropriate. You setup a context, then you incrementally feed new buffers to the context, and eventually you end the process by closing the context and producing the hash result. This is the technique I ended up implementing,

385   -- state based Sponge
386   type Keccak_Context (Block_Len: Keccak_Rate := Default_Bitrate) is
387      record
388         Internal: State := (others => (others => 0));
389         Block: Bitstream(1..Block_Len) := (others => 0);
390         Pos: Natural;
391      end record;
392
393   procedure KeccakBegin(Ctx: in out Keccak_Context);
394   procedure KeccakHash(Ctx: in out Keccak_Context;
395                        Input: Bitstream);
396   procedure KeccakEnd(Ctx: in out Keccak_Context;
397                       Output: out Bitstream);
398

This is essentially Sponge procedure, with the state externalized into Keccak_Context record. KeccakBegin zeroes out the context. There is some complexity, which I handled somewhat inelegantly: KeccakHash treats context's Block as a circular buffer, mapping Input at a moving index. When index gets to a boundary, i.e. Block bitstream is filled, the whole block is fed to AbsorbBlock and the Keccak_Function is applied. This is identical to what Sponge does, except my code for handling the circular buffer is dirty. Ada turns out allows you to number a sequence from an arbitrary base. Diana has some elegant slicing going in her Sponge but I fell back to potato programming with variables being carefully updated in a goto loop. Finally KeccakEnd pads whatever's left in block, and finishes the hash. Experienced Ada programmers are invited to read the procedures and suggest improvements.

Second problem I ran into is that the C interoperability code that Diana Coman kindly provided no longer worked for me. I wrote a separate file that exposes Keccak_Context to C. You can guess the purpose of that code, by the very frequent appearance of letter "C".

281 package Keccak_C is
282    subtype C_Context is Keccak_Context(Block_Len=>Default_Bitrate);
283    type C_Context_Access is access C_Context;
284    procedure C_Get_Size(Size: out Interfaces.C.size_t);
285    pragma Export (C, C_Get_Size, "keccak_get_ctx_byte_size");
286    function C_Begin return C_Context_Access;
287    pragma Export (C, C_Begin, "keccak_begin");
288    procedure C_Hash(Ctx: C_Context_Access;
289                     Input: Interfaces.C.Char_Array;
290                     Len: Interfaces.C.Size_T);
291    pragma Export (C, C_Hash, "keccak_hash");
292    procedure C_End(Ctx: C_Context_Access;
293                    Output: out Interfaces.C.Char_Array;
294                    Len: Interfaces.C.Size_T);
295    pragma Export (C, C_End, "keccak_end");
296    procedure C_Deallocate(Ctx: in out C_Context_Access);
297    pragma Export (C, C_Deallocate, "keccak_free");
298 end Keccak_C;

With corresponding C side headers,

171 extern void *keccak_begin();
172 extern void keccak_get_ctx_byte_size(size_t *size);
173 extern void keccak_hash(void *ctx, char *array, size_t size);
174 extern void keccak_end(void *ctx, char *out, size_t size);
175 extern void keccak_free(void **ctx);
176 extern void adainit();
177 extern void adafinal();

The relevant code uses two techniques for passing data back and forth. C_Context_Access is what Ada calls access type, or what in other languages is called a pointer. From the perspective of C it's an opaque pointer that we get from Ada, and simply pass around through the lifetime of the operation. The underlying record is dynamically allocated with keccak_begin, and it's the users responsibility to deallocate the memory with keccak_free. Note that the later takes a pointer to a pointer, because that's how Ada seems to demand it, and after freeing the pointer itself is set to NULL. I think this is a much cleaner convention than what's standard in the C world, i.e. freeing without also updating the pointer.

The other technique has already been demonstrated in Diana's Hash function and it's used in both keccak_hash and keccak_end. We use a pointer to a character buffer, and explicitly pass the size of that buffer. On the Ada side we then allocate size amount storage and populate it with data from buffer. Similar but reverse technique is used to get the data out. Ada's array interoperability seems to be limited to character arrays⁶, and by default the interning functions assume that the data is NULL terminated the way C string would be. So it's important to explicitly tell Interfaces.C.To_Ada and Interfaces.C.To_C to avoid NULL handling by passing Trim_Nul => False and Append_Nul => False respectively.

Writing the necessary Ada interoperability code was frankly a pain in the ass, but I don't think it was Ada's fault at any point. Every single change at my hands would produce a boundary check warning or an explicit exception, but once the issue was resolved resultant code would ultimately make sense, as opposed to being an arbitrary hack the way it is sometimes done in other languages. The result seems to work, and even produce testable results, but I'm not sure of its overall reliability. I expect there to be bugs.

Keccak patch gets its own subtree in the vtools graph. As I mentioned I'm applying fixes on top of genesis, but vdiff's code is also dependent on a creatively named "keccak" vpatch, which is smg_keccak.ads and smg_keccak.adb taken verbatim from eucrypt project. This way while reading vdiff_keccak it would be obvious what changes I've introduced. Still lacking rename and other similar operations in v machinery, I copied the smg files into the vtools hierarchy, otherwise they are identical to their source.

make can still be used to produce a vdiff file, but it's thin wrapper on top of grpbuild build project. Resultant vdiff works as expected, except it produces a keccak hash⁷.

anything before 4.7 [↩]
go figure, it's not enough to just ask for a standard, you also need to declare yourself a pedant [↩]
One possible exercise would be to attempt to compile vtools on a minimalist compiler, like PCC, and see how many of these GCC-isms will fall out [↩]
POSIX again says that a text file is A file that contains characters organized into zero or more lines. [↩]

% echo -n test > foo
% hexdump foo
0000000 6574 7473
0000004
% shasum -a 512 foo
ee26b0dd4af7e749aa1a8ee3c10ae9923f618980772e473f8819a5d4940e0db27ac185f8a0e1d5f84f88bc887fd67b143732c304cc5fa9ad8e6f57f50028a8ff  foo
% vdiff bar foo|tee p1
vdiff: foo: No newline at end of file

--- bar false
+++ foo ee26b0dd4af7e749aa1a8ee3c10ae9923f618980772e473f8819a5d4940e0db27ac185f8a0e1d5f84f88bc887fd67b143732c304cc5fa9ad8e6f57f50028a8ff
@@ -0,0 +1 @@
+test
% patch<p1
% hexdump bar
0000000 6574 7473 000a
0000005
% shasum -a 512 bar
0e3e75234abc68f4378a86b3f4b32a198ba301845b0cd6e50106e874345700cc6663a86c1ea125dc5e92be17c98f9a0f85ca9d5f595db2012f7cc3571945c123  bar

Correct behavior

% ./vdiff qux foo|tee p2
--- qux false
+++ foo ee26b0dd4af7e749aa1a8ee3c10ae9923f618980772e473f8819a5d4940e0db27ac185f8a0e1d5f84f88bc887fd67b143732c304cc5fa9ad8e6f57f50028a8ff
@@ -0,0 +1 @@
+test
\ No newline at end of file
% patch < p2
% shasum -a 512 qux
ee26b0dd4af7e749aa1a8ee3c10ae9923f618980772e473f8819a5d4940e0db27ac185f8a0e1d5f84f88bc887fd67b143732c304cc5fa9ad8e6f57f50028a8ff  qux

[↩]

corrections are welcome [↩]
Apparently keccak allows arbitrary sized hashes, but the one in vdiff is set to 64 so it matches SHA output. Another special constant is the bitrate that's set to eucrypt's 1344. The implications of both elude me, but I'm sure necessary adjustments will become obvious with reflection and use. [↩]

One way to think of the bitrate is as working capacity of the Sponge: how many bits it can "absorb" or "squeeze" *between two state scrambles* (the whole cycle goes: absorb bits until full or out of bits; scramble; repeat until out of bits). The scramble touches *all* bits in the state, but the absorb/squeeze touches only bitrate bits of the state.

The bitrate has to be between 1 and the total bits of the Keccak state (currently 1600 although it can change if you change the values of constants in smg_keccak.ads). The higher the bitrate, the faster the hashing goes but in principle the "less secure" the hashing (fewer scrambles and also fewer or potentially none at all bits that remain "secret"). And the other way around: the lower the bitrate, the "more secure" aka more scrambles but also overall slower.

Posted 2018-03-0310:04 by Diana Coman

Thank you for clarification! That's what I figured from reading you post on the subject, but the bug in the bitrate threw me off. I thought maybe there's some hidden constraint on bitrate that I didn't understand. With the fix in place, and with your explanation the behavior is a lot more obvious.

Posted 2018-03-0513:20 by phf

[...] to prompt use of Keccak as hashing method in the new vtools by phf , an error has been discovered and neatly reported1 about 8 hours ago. Thanks to the very helpful [...]

Posted 2018-08-0709:01 by EuCrypt: Correcting Bounds Error in Keccak (Word-Level) « Ossasepia

BARKS IN THE WIND: vtools keccak