BARKS IN THE WIND: vtools C interop, other fixes

Posted on 2018-03-08 02:15 technical, vtools

The original plan to get vpatch released this week fell through, instead there's more bug fixes.

The exercise of trimming GNU diff left a bad taste in my mouth, the end result, while significantly reduced and thus easier to study, is still a significant chunk of "clever" C code clocking at 3383 lines. But the exercise was worthwhile1 in that it allowed me to explicitly preserve diff's quirks when it comes to hunk construction, in order to be able to replicate existing vpatches.

Same consideration doesn't apply to a patcher, since a patcher is entirely dumb machinery, a kind of player piano, executing instructions from pre-recorded tape. As I've been enjoying my brief foray into Ada programming, thrust as it was on me by the republic, I decided to stick to the environment and use it to implement the patcher also. There is some rational reasons for using Ada for patcher instead of C. Where differ can afford to be sloppy in operation-- an operator can identify issues by reading the patch-- a press absolutely must result in a tree of files claimed by the press chain, or fail explicitly.

Current version of ada patcher was modeled on btcbase's internal Lisp implementation, and at 490 lines, can successfully press trb's "stable" branch. Unfortunately in the process of testing the patcher I've discovered another bug in the current keccak differ, that ate up the rest of my allocated time.

The possibility of that bug was hinted at in the recent conversation with diana_coman on the subject of Ada and C interoperability. Vtools interfaces to SMG's Keccak has two functions that among other things transfer arrays of characters between diff's C code and Keccak's Ada, C_Hash and C_End. The first takes in the text of the files under the consideration, in chunks, and the later sends back the final hash value, encoded as an array of bits, and on the C side represented as a string. Last week's vdiff uses Interfaces.C. Char_Array type to point to a shared char buffer, and standard functions Interfaces.C. To_Ada / Interfaces.C. To_C to convert the data between languages.

Well, in our conversation, Diana mentioned that in her experiments with To_Ada it sometime stoped too soon and failed to copy the entire contents of buffer. My reaction at the time was essentially "well, works for me"2, except now it doesn't, and on the most recent pass the issue came back with vengeance, almost entirely failing to transfer the data3. The test environment ostensibly stayed the same, so I'm completely mystified by the behavior. I'll attempt a dive into Ada's code to at least understand what's going on, but meanwhile, I rewrote the offending bits using yet another C to Ada copy method.

This is not the approach that diana_coman took in her code, which still uses Interfaces.C. Char_Array for data type, but instead of using To_C/Ada her functions explicitly walk the buffer in a loop, copying each character one by one. I instead listened to the siren's call of Ada's documentation4 and went with Interfaces.C. Pointers, which even provides a helpful example of a roundtrip at the end of the section. The details of the implementation can be seen in the patch, but they follow almost directly the blueprint in the documentation. The approach is similar to what diana_coman does, in that characters are copied in an explicit loop, but instead of Char_Array I'm now using a pointer abstraction, which mimics C's behavior and requires explicit advancement.

The method that I've implemented turned out to have already been condemned by ave1. Apparently he wrote what looks like an extensive investigation into dealing with char array. All in all much back to the drawing board.

The patch also includes a backport of bounds error fix for smg_keccak, documented in extensive detail on diana_coman's blog.

  1. besides, that is, personal educational value []
  2. clearly there's a need for a better stress testing environment on my part []
  3. although when it comes to hashing, "almost" entirely is the proverbial "shit soup" []
  4. standard!!1 []

Yes, I did condemn the use for char_ptr, it will work tough for properly null terminated strings.

It does have a two problems. Parts of the code is Ada code written as C (implementation of addition for char pointers, so that the code can walk over the array). And more problematic, all based on a custom strlen without any limit checks (should have been more strnlen like).

That said, using the GNAT internal buffer structures in C is also way too fragile and I would not recommend it.
I guess, a custom char_ptr / char_array based module will be necessary in the end.

Posted 2018-03-0805:45 by ave1

You know rereading your blog post, I misread the first time. I think I might've used yet another method, that's not mentioned in your post, unless char_ptr uses it internally. Have you looked at Interfaces.C.Pointers? It's C style pointer arithmetic done using procedures and wrapper types on the Ada side. It comes closest so far to how I think Ada to C interoperability should work: if you're going to work with C, you might as well make what is happening as explicit as possible, rather than rely on smart methods to do things for you. The only advantage of something like char_array would be some sort of transparent elimination of explicit cross boundery copy, but that requires GNAT's internal support. Do you know of char_array actually copies bits, or if all that code ensures that we just cast C's ptr to some bogus Ada type?

Posted 2018-03-0812:04 by phf

I've read it now, I see there is simply no way around it, to call from C to Ada, you'll need these modules. Pointer does not seem to support a copy from ptr to an array, but that's not hard to implement. For strings, the C.Strings is probably best here.

The procedures in the C.Strings module copy the bytes one by one. So, a possible scheme to call from C to Ada seems to need on the Ada side, to first use a "char_ptr" to "char_array" procedure. And then "char_array" to "String" (using the To_ADA call). To pass the string out, do the reverse.

For calling C from Ada you can simply use char_arrays, or even Strings. The char_array / String difference is mainly conceptual on GNAT ADA. I.E. in theory an Ada character could be something different then a C char, in practice it never is. Using Strings (for calling C code from Ada, not the reverse), will eliminate all the conversion steps etc, but may make the code less portable across Ada implementations.

In short, there does not seem to be a clean way.

Posted 2018-03-0915:48 by ave1

Post a comment

You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>