# A practical look at JEP-389 in JDK16 with libsodium

JDK 17 will be released on September 14th, 2021. It will ship another incubator, JEP-412, for foreign function and foreign memory; see the updated article for JDK 17.

JDK 16 is coming, and with it the incubating JEP-389 (Foreign Linker API).

The Foreign Linker API is a very convenient and attractive way to connect to the native world. Let’s have a practical look at this API, which should eventually supersede JNI. To do so, I wanted Java code to interact with the famous libsodium.

First I will focus on using the foreign linker API, then I will show how to use `jextract` in its current state (it is still being actively developed).

Note that JEP-389 is still incubating, so the examples below are likely to become obsolete with the next JDK as the API and its behavior are further refined.
The following examples are based on JDK 16 release candidate build 36 (2021/2/8).

Thanks to Jean-Philippe Bempel for the review, and in particular for spotting errors.

## The crypto sealed box example

Let’s try to reproduce the following example from the libsodium sealed box documentation. That page has a simple code snippet that is interesting to reproduce in Java.

Crypto sealed box example
``````
#define MESSAGE (const unsigned char *) "Message"
#define MESSAGE_LEN 7
#define CIPHERTEXT_LEN (crypto_box_SEALBYTES + MESSAGE_LEN)

/* Recipient creates a long-term key pair */
unsigned char recipient_pk[crypto_box_PUBLICKEYBYTES];
unsigned char recipient_sk[crypto_box_SECRETKEYBYTES];
crypto_box_keypair(recipient_pk, recipient_sk);

/* Anonymous sender encrypts a message using an ephemeral key pair
 * and the recipient's public key */
unsigned char ciphertext[CIPHERTEXT_LEN];
crypto_box_seal(ciphertext, MESSAGE, MESSAGE_LEN, recipient_pk);

/* Recipient decrypts the ciphertext */
unsigned char decrypted[MESSAGE_LEN];
if (crypto_box_seal_open(decrypted, ciphertext, CIPHERTEXT_LEN,
                         recipient_pk, recipient_sk) != 0) {
    /* message corrupted or not intended for this recipient */
}
``````

### Testing the idea in `jshell`

One of the cool things with `jshell` is that you can try small ideas with a rapid feedback loop. With the right configuration, it is also possible to play with the foreign linker.

Allow jshell to use the foreign module
``$ jshell --add-modules jdk.incubator.foreign -R-Dforeign.restricted=permit``

Then within jshell, let’s try out a simple smoke test.

Smoke testing the foreign module
``````
jshell> import java.lang.invoke.*;

jshell> import jdk.incubator.foreign.*;

jshell> var getpid = CLinker.getInstance()
   ...>     .downcallHandle(
   ...>         LibraryLookup.ofDefault().lookup("getpid").get(),
   ...>         MethodType.methodType(long.class),
   ...>         FunctionDescriptor.of(CLinker.C_LONG)
   ...>     );
getpid ==> MethodHandle()long

jshell> (long) getpid.invokeExact();
$4 ==> 53699

jshell> ProcessHandle.current().pid()
$5 ==> 53699
``````

Yes it works! It really is easy to try things out almost for free, without leaving Java, which is really neat.

Now I would like to focus on the small example with libsodium within a project. I’ll explain how to use the API along the way.

### Configuring Gradle

The incubating modules are not on the default module path. Hence, it is required to add the `jdk.incubator.foreign` module when invoking the compilation command.

``$ javac --add-modules jdk.incubator.foreign ...``

This module also needs to be declared when running this code, along with the `foreign.restricted` property, to be able to invoke native code.

``$ java -Dforeign.restricted=permit --add-modules jdk.incubator.foreign ...``

If you like to play with `jshell`, it will be necessary to use these two as well.

``$ jshell -R-Dforeign.restricted=permit --add-modules jdk.incubator.foreign ...``

Then comes the question of configuring the build tool. I am using Gradle; the configuration is likely similar for other build tools.

``````
// ...

java {
    toolchain {
        languageVersion.set(JavaLanguageVersion.of(16))
    }
}

tasks {
    withType<JavaCompile>().configureEach {
        options.forkOptions.jvmArgs = listOf(
            // (1) flags opening the JDK modules Gradle needs, see gradle/gradle#15538
        )

        options.compilerArgs = listOf(
            "--add-modules", "jdk.incubator.foreign" // (2)
        )
        options.release.set(16)
    }

    withType<JavaExec>().configureEach {
        jvmArgs("-Dforeign.restricted=permit",
                "--add-modules", "jdk.incubator.foreign") // (3)
    }

    withType<Test>().configureEach {
        useJUnitPlatform()
        jvmArgs("-Dforeign.restricted=permit",
                "--add-modules", "jdk.incubator.foreign") // (4)
    }
}
``````

1. Gradle itself can run on a different JDK, but the code needs to be compiled with JDK 16. At this time Gradle 6.8.2 does not support the new module restrictions introduced with JDK 16 by default, hence it is necessary to explicitly open modules. See gradle/gradle#15538.
2. Lets the compiler know about the `jdk.incubator.foreign` module.
3. Configures the tasks that execute a main class. While this is not immediately useful, IntelliJ IDEA will pick up this configuration when you run a `main` method from the editor.
4. Configures the test tasks to be able to run `jdk.incubator.foreign` tests.

### The first and minimal call `crypto_box_sealbytes`

The first lines make use of a few macros (the lines starting with `#define`). We can assume that `MESSAGE` will be a method parameter, that `MESSAGE_LEN` will be derived from the message parameter, and that `CIPHERTEXT_LEN` is also derived from the message but needs another constant, `crypto_box_SEALBYTES`.

The first thing needed is to acquire the `crypto_box_SEALBYTES` constant. Looking at `crypto_box.h`, there’s a function `size_t crypto_box_sealbytes(void);` that returns this constant.

It’s simple, and it will be the first method I will present here.

The first challenge is to map the return type `size_t`, an unsigned integer type. Since the constant is well below the integer max value, and since I’d like to use it as an array size, I will map it to an `int`.
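
An alternative to mapping `size_t` straight to an `int` is to carry it in a Java `long` and narrow it explicitly. The following is a minimal plain-Java sketch of that narrowing, with no foreign API involved; `SizeT` and `toArraySize` are hypothetical names, and the value `48` is only an illustrative input.

```java
class SizeT {
    /**
     * Narrows a native size_t value, carried in a Java long,
     * to an int that can be used as a Java array size.
     */
    static int toArraySize(long sizeT) {
        // A size_t above Long.MAX_VALUE would show up as a negative long.
        if (sizeT < 0 || sizeT > Integer.MAX_VALUE) {
            throw new ArithmeticException(
                "size_t value does not fit in an int: " + Long.toUnsignedString(sizeT));
        }
        return (int) sizeT;
    }

    public static void main(String[] args) {
        System.out.println(toArraySize(48L));
    }
}
```

The explicit check makes the assumption "this constant is small" fail loudly instead of silently truncating.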

crypto_box_sealbytes (.java)
``````
// libsodiumLookup is the LibraryLookup used to find libsodium symbols
MethodHandle crypto_box_sealbytes =
    CLinker.getInstance()
           .downcallHandle(
               libsodiumLookup.lookup("crypto_box_sealbytes").get(),
               MethodType.methodType(int.class), // Java signature: () -> int
               FunctionDescriptor.of(C_INT)      // C descriptor: returns int
           );

var crypto_box_SEALBYTES = (int) crypto_box_sealbytes.invokeExact();
``````

The Java type and the C descriptor must match, otherwise the call will fail at runtime with an `IllegalArgumentException`.

Example 1. Carrier mismatch long != b32

If the java method type used `long.class`, and the C descriptor was `C_INT`, the code would have failed with a carrier mismatch.

``java.lang.IllegalArgumentException: Carrier size mismatch: long != b32[abi/kind=INT]``

Example 2. Carrier mismatch int != b64

If the java method type used `int.class`, and the C descriptor was `C_LONG`, the code would have failed with a carrier mismatch.

``java.lang.IllegalArgumentException: Carrier size mismatch: int != b64[abi/kind=LONG]``

For reference, `CLinker.C_INT` is actually a `MemoryLayout`, a layout is used to model native memory.
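
To make the two mismatch examples above more concrete, here is a small plain-Java illustration of the idea behind the check: the bit width of the Java carrier type must equal the bit width of the C layout. This is not the JDK’s implementation; `CarrierCheck` and its methods are hypothetical names, and only `int` and `long` carriers are modelled.

```java
import java.util.Map;

class CarrierCheck {
    // Bit widths of the Java primitive carriers used in this article.
    private static final Map<Class<?>, Integer> CARRIER_BITS = Map.of(
        int.class, Integer.SIZE,   // 32
        long.class, Long.SIZE      // 64
    );

    /** Returns true when the Java carrier is exactly as wide as the native layout. */
    static boolean matches(Class<?> carrier, int layoutBits) {
        Integer bits = CARRIER_BITS.get(carrier);
        return bits != null && bits == layoutBits;
    }

    /** Fails in the same spirit as the linker does when the widths differ. */
    static void check(Class<?> carrier, int layoutBits) {
        if (!matches(carrier, layoutBits)) {
            throw new IllegalArgumentException(
                "Carrier size mismatch: " + carrier.getName() + " != b" + layoutBits);
        }
    }

    public static void main(String[] args) {
        check(int.class, 32);       // passes, like int.class with C_INT
        try {
            check(long.class, 32);  // fails, like long.class with C_INT
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```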

### Then a more interesting case, passing argument pointers

The next part of the example is a little more involved: the `crypto_box_keypair` function takes two array pointers, `recipient_pk` and `recipient_sk`, and the generated key pair is written to the given byte arrays.

crypto_box_keypair (.c)
``````unsigned char recipient_pk[crypto_box_PUBLICKEYBYTES];
unsigned char recipient_sk[crypto_box_SECRETKEYBYTES];
crypto_box_keypair(recipient_pk, recipient_sk);``````

In order to initialize the size of these arrays, the code needs two constants, `crypto_box_PUBLICKEYBYTES` and `crypto_box_SECRETKEYBYTES`. Accessing these two works the same way as for `crypto_box_SEALBYTES`.

The C mapping is easy to get: a void method that takes 2 pointers, `FunctionDescriptor.ofVoid(C_POINTER, C_POINTER)`. In Java the method type requires a type called `MemoryAddress`, which represents the pointer address.

The pointers need to point to some memory. That’s what the `MemorySegment` type is for. Before invoking the method the necessary memory is allocated via `MemorySegment::allocateNative`, and the address of each memory segment is passed.

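crypto_box_keypair (.java)
``````
MethodHandle crypto_box_keypair =
    CLinker.getInstance().downcallHandle(
        libsodiumLookup.lookup("crypto_box_keypair").get(),
        MethodType.methodType(
            void.class,
            MemoryAddress.class, // pointer to the public key buffer
            MemoryAddress.class  // pointer to the secret key buffer
        ),
        FunctionDescriptor.ofVoid(C_POINTER, C_POINTER)
    );

var recipientPublicKey = MemorySegment.allocateNative(crypto_box_publickeybytes());
var recipientSecretKey = MemorySegment.allocateNative(crypto_box_secretkeybytes());

// libsodium writes the generated keys into the two native segments
crypto_box_keypair.invokeExact(recipientPublicKey.address(),
                               recipientSecretKey.address());

var kp = new CryptoBoxKeyPair(
    recipientPublicKey.toByteArray(),
    recipientSecretKey.toByteArray()
);
``````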
This code works, but there is something that must be taken care of: the native segment lifecycle.

The above code snippet never deallocates native memory. Fortunately, in JDK 16 the `MemorySegment` class implements `AutoCloseable`, so declaring the segments in a try-with-resources block solves the issue.

`MemorySegment` lifecycle
```
try (var recipientPublicKey = MemorySegment.allocateNative(crypto_box_publickeybytes());
     var recipientSecretKey = MemorySegment.allocateNative(crypto_box_secretkeybytes())) {
    crypto_box_keypair.invokeExact(recipientPublicKey.address(),
                                   recipientSecretKey.address());

    // Copy the keys to the heap before the segments are closed
    return new CryptoBoxKeyPair(
        recipientPublicKey.toByteArray(),
        recipientSecretKey.toByteArray()
    );
}
```

However, JEP-389 comes with the concept of scopes, which allow expressing the temporal bounds of these segments. In JDK 16, look for the `NativeScope` class: it allows opening a scope around a code section and allocating segments anywhere in this section, all of which are released when the scope is closed.

crypto_box_keypair with `NativeScope` (.java)
``````
try (var scope = NativeScope.unboundedScope()) {
    var recipientPublicKey = scope.allocate(crypto_box_publickeybytes());
    var recipientSecretKey = scope.allocate(crypto_box_secretkeybytes());

    crypto_box_keypair.invokeExact(recipientPublicKey.address(),
                                   recipientSecretKey.address());

    return new CryptoBoxKeyPair(
        recipientPublicKey.toByteArray(),
        recipientSecretKey.toByteArray()
    );
}
``````

In order to get back the off-heap content into Java types, the code can call any of the `to{The Java Type}` methods on the `MemorySegment` instance, they will take care of the conversion.

### Next invoking the sealing method

The next method to call is `crypto_box_seal`, which also takes pointers and a message length.

crypto_box_seal (.c)
``````unsigned char ciphertext[CIPHERTEXT_LEN];
crypto_box_seal(ciphertext, MESSAGE, MESSAGE_LEN, recipient_pk);``````

When looking at the C signature however we notice something unusual for Java developers: the message length argument is of type `unsigned long long`!

In C or C++, `long long` means the type is at least 8 bytes (64 bits) wide, so a Java `long` is what is needed to carry it.

In particular here’s a breakdown of the signed integers. It is incomplete as they can be declared differently (e.g. `long` is the same as `long int`, and `long long` is the same as `long long int`); this Wikipedia page has a more complete overview of C data types.

• `int`: a signed integer type with "the natural size suggested by the architecture of the execution environment", with a minimum of 2 bytes (16 bits, [-32767; +32767]). On 64-bit Linux and macOS, `int` is 4 bytes and the range becomes [-2147483648; +2147483647].

• `long`: a signed integer type that is at least 4 bytes ([-2147483647; +2147483647]). On 64-bit Linux and macOS, `long` is 8 bytes and the range becomes [−9223372036854775808; +9223372036854775807].

• `long long`: a signed integer type that is at least 8 bytes ([−9223372036854775807; +9223372036854775807]). On a 64-bit CPU, `long long` is still 8 bytes long.

When you start to study these C data types a bit more, you’ll notice two things that just don’t match with Java types:

• `unsigned` integers: while they have the same width as their signed counterparts, their math is different as their range is different. The range of a 4-byte unsigned integer is [0; +4294967295], and the range of an 8-byte one, such as `unsigned long long` (or `unsigned long` on 64-bit Linux and macOS), is [0; +18446744073709551615].

• `long double`s are larger than 64 bits. I never had to use those, but it seems they can be as big as 128 bits (16 bytes).

As a reminder `size_t` is unsigned.

crypto_box_seal definition (.c)
``````
SODIUM_EXPORT
int crypto_box_seal(unsigned char *c, const unsigned char *m,
                    unsigned long long mlen, const unsigned char *pk)
            __attribute__ ((nonnull(1, 4)));
``````
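
Java has no unsigned primitive types, but the JDK offers helpers to reinterpret the bits of an `int` or a `long` as unsigned. The sketch below is plain Java and independent of the foreign API; `UnsignedDemo` is a hypothetical class name. It shows how a value coming back from an `unsigned long long` could be read correctly even when its top bit is set.

```java
class UnsignedDemo {
    /** Reads the 64 bits of a Java long as a C unsigned long long. */
    static String asUnsignedString(long value) {
        return Long.toUnsignedString(value);
    }

    /** Reads the 32 bits of a Java int as a 4-byte unsigned integer. */
    static long asUnsignedLong(int value) {
        return Integer.toUnsignedLong(value);
    }

    public static void main(String[] args) {
        // All bits set: -1 when signed, the maximum value when unsigned
        System.out.println(asUnsignedString(-1L)); // 18446744073709551615
        System.out.println(asUnsignedLong(-1));    // 4294967295
        // Unsigned comparison: all-ones is greater than 1
        System.out.println(Long.compareUnsigned(-1L, 1L) > 0); // true
    }
}
```

For a message length derived from a Java `String` or array, the value is always a non-negative `int`, so the signed and unsigned readings agree.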

Also, for this post, I intend to pass a short `String` message, which is backed by an array whose length can only be an `int`.

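crypto_box_seal (.java)
``````
var crypto_box_seal = CLinker.getInstance().downcallHandle(
    libsodiumLookup.lookup("crypto_box_seal").get(),
    MethodType.methodType(int.class,
                          MemoryAddress.class, // cipher text (output buffer)
                          MemoryAddress.class, // message
                          long.class,          // message length
                          MemoryAddress.class  // recipient public key
    ),
    FunctionDescriptor.of(C_INT,
                          C_POINTER,
                          C_POINTER,
                          C_LONG_LONG,
                          C_POINTER)
);

// message is a String, publicKey a byte[] (parameter names assumed here)
try (var scope = NativeScope.unboundedScope()) {
    var cipherText = scope.allocate(crypto_box_sealbytes() + message.length());
    var ret = (int) crypto_box_seal.invokeExact(
        cipherText.address(),
        CLinker.toCString(message, StandardCharsets.US_ASCII, scope).address(),
        (long) message.length(),
        scope.allocateArray(C_CHAR, publicKey).address()
    );
    return cipherText.toByteArray();
}
``````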

There are a few things to notice:

1. I am specifically passing the `US_ASCII` charset, as I know that the byte array representation of the string will then be 1 byte per `char`, which means I can use the `String::length` method. If the string used characters that do not fit in a single byte, I would have needed to encode it to a byte array with the `UTF-8` charset first and use the length of that byte array instead.

2. The `var ret` is not used, however due to the dynamic nature of `invokeExact`, the compiler needs the exact signature on the call-site, that’s why the result of this invocation is assigned to an `int` variable even if it is not used.

Without this assignment the JVM would have raised a `WrongMethodTypeException`, in this case the exception message helps to identify the type differences in the signature:

``java.lang.invoke.WrongMethodTypeException: expected (MemoryAddress,MemoryAddress,long,MemoryAddress)int but found (MemoryAddress,MemoryAddress,long,MemoryAddress)void``
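
The strictness of `invokeExact` is not specific to the foreign linker; it can be reproduced with any `MethodHandle`. Here is a minimal sketch using a plain static Java method instead of a native function; `InvokeExactDemo` and `add` are hypothetical names chosen for the illustration.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.WrongMethodTypeException;

class InvokeExactDemo {
    static int add(int a, int b) {
        return a + b;
    }

    /** Looks up a handle of type (int,int)int on the method above. */
    static MethodHandle addHandle() {
        try {
            return MethodHandles.lookup().findStatic(
                InvokeExactDemo.class, "add",
                MethodType.methodType(int.class, int.class, int.class));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** The cast makes the call site (int,int)int, which matches the handle. */
    static int callWithCast() {
        try {
            int ret = (int) addHandle().invokeExact(2, 3);
            return ret;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    /** Without a cast the call site is (int,int)void, so the call is rejected. */
    static boolean failsWithoutCast() {
        try {
            addHandle().invokeExact(2, 3);
            return false;
        } catch (WrongMethodTypeException e) {
            System.out.println(e.getMessage());
            return true;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(callWithCast());
        System.out.println(failsWithoutCast());
    }
}
```

The compiler derives the call-site signature from the argument types and from the cast on the result, which is why discarding the result changes the signature to `void`.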

### Ending the crypto box example

The last method call of this snippet ends the libsodium crypto box example. The method `crypto_box_seal_open` takes pointers and a ciphered text length, so let’s apply again what has been done for `crypto_box_seal`.

crypto_box_seal_open (.c)
``````unsigned char decrypted[MESSAGE_LEN];
if (crypto_box_seal_open(decrypted, ciphertext, CIPHERTEXT_LEN,
recipient_pk, recipient_sk) != 0) {
/* message corrupted or not intended for this recipient */
}``````

Which translates to

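crypto_box_seal_open (.java)
``````
var crypto_box_seal_open = CLinker.getInstance().downcallHandle(
    libsodiumLookup.lookup("crypto_box_seal_open").get(),
    MethodType.methodType(int.class,
                          MemoryAddress.class, // deciphered text (output buffer)
                          MemoryAddress.class, // cipher text
                          long.class,          // cipherText.length
                          MemoryAddress.class, // recipient public key
                          MemoryAddress.class  // recipient secret key
    ),
    FunctionDescriptor.of(C_INT,
                          C_POINTER,
                          C_POINTER,
                          C_LONG_LONG,
                          C_POINTER,
                          C_POINTER
    )
);

// cipherText, publicKey and secretKey are byte[] (parameter names assumed here)
try (var scope = NativeScope.unboundedScope()) {
    var decipheredText = scope.allocateArray(C_CHAR, cipherText.length - crypto_box_sealbytes());
    var ret = (int) crypto_box_seal_open.invokeExact(
        decipheredText.address(),
        scope.allocateArray(C_CHAR, cipherText).address(),
        (long) cipherText.length,
        scope.allocateArray(C_CHAR, publicKey).address(),
        scope.allocateArray(C_CHAR, secretKey).address()
    );

    return CLinker.toJavaString(decipheredText, StandardCharsets.US_ASCII);
}
``````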

Yet running this code raises an error:

``````
java.lang.IndexOutOfBoundsException: Out of bound access on segment MemorySegment{ id=0x6f11d841 limit: 20 }; new offset = 20; new length = 1
    at jdk.incubator.foreign/jdk.internal.foreign.AbstractMemorySegmentImpl.outOfBoundException(AbstractMemorySegmentImpl.java:495)
    at jdk.incubator.foreign/jdk.internal.foreign.AbstractMemorySegmentImpl.checkBoundsSmall(AbstractMemorySegmentImpl.java:465)
    at jdk.incubator.foreign/jdk.internal.foreign.AbstractMemorySegmentImpl.checkBounds(AbstractMemorySegmentImpl.java:446)
    at jdk.incubator.foreign/jdk.internal.foreign.AbstractMemorySegmentImpl.checkAccess(AbstractMemorySegmentImpl.java:401)
    at java.base/java.lang.invoke.MemoryAccessVarHandleByteHelper.get(MemoryAccessVarHandleByteHelper.java:113)
    at jdk.incubator.foreign/jdk.incubator.foreign.MemoryAccess.getByteAtOffset(MemoryAccess.java:105)
    at jdk.incubator.foreign/jdk.internal.foreign.abi.SharedUtils.strlen(SharedUtils.java:259)
    at jdk.incubator.foreign/jdk.internal.foreign.abi.SharedUtils.toJavaStringInternal(SharedUtils.java:249)
``````

I didn’t get why this code failed at first.

`CLinker::toJavaString` is the mirror function of the `CLinker::toCString`, so it looked correct.

The exception message indicates the segment has a size of 20, which is the length of the string `Hello foreign code !`. Then there’s `new offset = 20`, indicating the segment was read up to the 20th byte / character, and there is `new length = 1`, which suggests `toJavaString` needs to read an additional character but can’t.

The javadoc of `toJavaString` says (emphasis is mine):

Converts a *null-terminated* C string stored at given address into a Java string, using the platform’s default charset.

This immediately clicked: libsodium does not assume the message is a string. Its API takes a pointer to a memory region and the length to read in that memory region. For that matter, the message could be any binary payload.

Let’s look at the string `Hello`

1. The libsodium seal method will be passed the following byte array: `CLinker.toCString("Hello", StandardCharsets.US_ASCII).toByteArray()` → `48656C6C6F00`

2. But since the code is using `String::length`, libsodium will only seal up to 5 bytes : `48656C6C6F`.

3. Then opening the seal, the content of the `MemorySegment` that contains the decrypted message will be `48656C6C6F`

4. But `CLinker.toJavaString(decipheredText, StandardCharsets.US_ASCII)` expects the memory segment to be a valid C string, terminated by the `\0` character. And since the actual decrypted memory segment is not terminated by `\0`, the code raises an error.

This suggests the code to use is `new String(decipheredText.toByteArray(), StandardCharsets.US_ASCII)`. There are other possibilities, like not using `CLinker::toCString` with the `crypto_box_seal` method and passing the bytes of `String::getBytes` instead, or incrementing the length by 1 when `CLinker::toCString` is used.

For reference here are the bytes returned by `String::getBytes` and `CLinker::toCString`.

• `"Hello".getBytes(US_ASCII)` → `48656C6C6F`

• `CLinker.toCString("Hello", US_ASCII).toByteArray()` → `48656C6C6F00`
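
The difference between the two byte sequences can be reproduced without the foreign API. The following is a plain-Java sketch under stated assumptions: `toCStringBytes` and `strlen` are hypothetical helpers that mimic what a C string conversion and a terminator search do; they are not the JDK’s implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class CStringBytes {
    /** ASCII bytes of the string followed by a single 0 byte, like a C string. */
    static byte[] toCStringBytes(String s) {
        byte[] raw = s.getBytes(StandardCharsets.US_ASCII);
        return Arrays.copyOf(raw, raw.length + 1); // the extra byte is zero
    }

    /** Searches the terminator, failing if it is not inside the buffer. */
    static int strlen(byte[] buffer) {
        for (int i = 0; i < buffer.length; i++) {
            if (buffer[i] == 0) {
                return i;
            }
        }
        throw new IndexOutOfBoundsException(
            "no terminator within " + buffer.length + " bytes");
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] raw = "Hello".getBytes(StandardCharsets.US_ASCII);
        byte[] cString = toCStringBytes("Hello");
        System.out.println(hex(raw));        // 48656C6C6F
        System.out.println(hex(cString));    // 48656C6C6F00
        System.out.println(strlen(cString)); // 5
        try {
            strlen(raw); // the sealed and opened payload has no terminator
        } catch (IndexOutOfBoundsException e) {
            System.out.println("failed: " + e.getMessage());
        }
        // Outside ASCII, String::length and the encoded length differ
        String accented = "h\u00e9llo";
        System.out.println(accented.length());                                // 5
        System.out.println(accented.getBytes(StandardCharsets.UTF_8).length); // 6
    }
}
```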

For this blog post I’d like to keep the assumption the sealed message is a `String`, which leads to the following correct code :

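``````
try (var scope = NativeScope.unboundedScope()) {
    var decipheredText = scope.allocateArray(C_CHAR, cipherText.length - crypto_box_sealbytes());
    var ret = (int) crypto_box_seal_open.invokeExact(
        decipheredText.address(), scope.allocateArray(C_CHAR, cipherText).address(),
        (long) cipherText.length,
        scope.allocateArray(C_CHAR, publicKey).address(), scope.allocateArray(C_CHAR, secretKey).address());

    // No terminator expected: build the String from the raw bytes
    return new String(decipheredText.toByteArray(), StandardCharsets.US_ASCII);
}
``````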

Also, I have intentionally left out the returned status of `crypto_box_seal_open` to focus on the foreign module API, but it would make sense to check the returned value before returning the buffer, as suggested in the libsodium documentation.

### Wrap up on manually using the Foreign Linker API

I didn’t cover everything this API has to offer, like the upcall stubs, which are a way to pass a function pointer to the native code, nor did I cover every feature of JEP-389, like the `MemorySegment` or `MemoryLayout` APIs.

At this time I find this API a pleasure to use compared to JNI. Note that I don’t have experience with JNA, so I may be lacking perspective there.

There are a few pitfalls, like `CLinker::toJavaString` or the `MemorySegment` lifecycles, which get more complicated if those segments are shared between threads. I found the API well-designed and well documented, but if you’re a novice in this area, you’ll likely need other materials. A package-wide documentation, in `jdk.incubator.foreign`, should definitely fill this gap in my opinion.

The chosen example was concise in native code, but writing the stubs in Java quickly becomes tedious and verbose. JDK developers felt the same way, as they are also investing energy in a tool named `jextract` whose goal is to reduce the amount of tedious work. I’ll show in a section below what can be done with the current state of `jextract`.

## Remarks about `MemorySegment`s memory mapping

`MemorySegment`s have the same constraints as `DirectByteBuffer`s, i.e. by default the size of a segment can’t go over `Runtime.getRuntime().maxMemory()`.

Allocating a segment bigger than `maxMemory`
``Exception in thread "main" java.lang.OutOfMemoryError: Cannot reserve 2147483648 bytes of direct buffer memory (allocated: 8192, limit: 522190848)``

This limit is configurable by setting the `-XX:MaxDirectMemorySize={size}` flag.

``var memorySegment = MemorySegment.allocateNative(nativeSegmentSize);``

There’s one interesting thing with this API: it is possible to access the address, via `MemorySegment::address`, and one can get its hexadecimal representation via `Long.toHexString(memorySegment.address().toRawLongValue())`.

``MemoryAddress{ base: null offset=0x7fc513fff010 }``

If you are on Linux, you can use `pmap` from the procps package to inspect the memory mappings of the JVM.

pmap output of a 2GiB native segment
``````151:   java -Dforeign.restricted=permit --add-modules jdk.incubator.foreign -XX:MaxDirectMemorySize=2100m MemorySegments.java
...
0000557635ba1000       4       0       0 r-x-- java
0000557635ba3000       4       0       0 r---- java
0000557635ba4000       4       0       0 rw--- java
0000557636d4b000     132      16      16 rw---   [ anon ]
00007fc513fff000 2097156 1811456 1811456 rw---   [ anon ] (1)
00007fc594000000     132       0       0 rw---   [ anon ]
00007fc594021000   65404       0       0 -----   [ anon ]
...``````
 1 This is the allocated segment: 2 GiB ⇐⇒ 2097152 KiB, and this mapping is larger by one page (4 KiB). In fact the base address of the segment is `0x7fc513fff010`, which is 16 bytes past the start of the mapping.

In this case the difference does not seem related to alignment, although that can happen. What is important is that the address of a `MemorySegment` may be contained in a larger memory mapping.
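
The numbers in the `pmap` output can be checked with a bit of arithmetic. The sketch below is plain Java and only uses the values visible above (mapping start, mapping size in KiB, segment base address); `PmapArithmetic` is a hypothetical class name.

```java
class PmapArithmetic {
    static final long KIB = 1024L;
    static final long SEGMENT_BYTES = 2L * 1024 * 1024 * 1024; // the 2 GiB segment
    static final long MAPPING_START = 0x7fc513fff000L;         // from pmap
    static final long MAPPING_KIB = 2_097_156L;                // from pmap
    static final long SEGMENT_BASE = 0x7fc513fff010L;          // from MemorySegment::address

    static long segmentKiB() {
        return SEGMENT_BYTES / KIB;
    }

    /** How much larger the mapping is than the segment, in KiB. */
    static long extraKiB() {
        return MAPPING_KIB - segmentKiB();
    }

    /** Offset of the segment base inside the mapping, in bytes. */
    static long baseOffset() {
        return SEGMENT_BASE - MAPPING_START;
    }

    static long mappingEnd() {
        return MAPPING_START + MAPPING_KIB * KIB;
    }

    /** The whole segment lies inside the mapping. */
    static boolean segmentFits() {
        return SEGMENT_BASE >= MAPPING_START
            && SEGMENT_BASE + SEGMENT_BYTES <= mappingEnd();
    }

    public static void main(String[] args) {
        System.out.println(segmentKiB());                    // 2097152
        System.out.println(extraKiB());                      // 4
        System.out.println(baseOffset());                    // 16
        System.out.println(Long.toHexString(mappingEnd()));  // 7fc594000000
        System.out.println(segmentFits());                   // true
    }
}
```

The computed end of the mapping, `7fc594000000`, is also the start address of the next line in the `pmap` output.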

One important and useful distinction with `DirectByteBuffer`s is the presence of a `MemorySegment::close` method, that will immediately free the native mapping when called. `DirectByteBuffer` used to be challenging because they had no explicit method to free the native mapping, and as such had to wait for the GC to kick in order to be freed.

Initialization

Another thing to keep in mind is that the memory mapping is zeroed, which means a big segment will take a noticeable time to get initialized. As with `DirectByteBuffer`s this pattern is interesting when inspecting off-heap memory.

Scope

Usually it is more practical to use the `NativeScope` API as it is easier to reason about the boundaries of the involved memory mapping. Using a larger `MemorySegment` could be interesting when it has to be sliced and shared among various threads. Also, given the high initialization cost of large segments, such a segment is likely to have the same lifecycle as the application. Typically, in a few years, Netty, Aeron, Kafka, Cassandra, …​ could make use of this API!

Slices

One thing that caught me off-guard is that closing a slice (created by `MemorySegment::asSlice`) also closes the underlying segment.

Multiple allocations

Finally, when the code requires a new native allocation, the JVM appears to be able to grow native mappings. In short the JVM tries to put these segments in a bigger memory mapping.

Access modes

The access modes define a set of permissions on the `MemorySegment`; by default all permissions are given. In the example below the segment is not readable.

``````var ms = MemorySegment.allocateNative(segmentSize)
.withAccessModes(MemorySegment.WRITE | MemorySegment.CLOSE);

ms.asByteBuffer().getLong(); (1)``````
 1 Throws UnsupportedOperationException: Required access mode READ ; current access modes: [WRITE, CLOSE]

I am not quite sure how to use these at this time. It certainly would be useful to prevent a slice from being closed though.

Also, the `WRITE` and `READ` permissions only apply to the Java object; the native memory mapping isn’t affected, which is expected since it can hold multiple `MemorySegment`s.

From a file

Until JEP-389, we used a `FileChannel` and a `MappedByteBuffer` to memory map a file. JEP-389 also takes care of this use case, with the `mapFile` factory method.

``````
try (var mmaped = MemorySegment.mapFile(
        path,                          // (1)
        0,                             // (2)
        Files.size(path),              // (3)
        FileChannel.MapMode.READ_ONLY  // (4) READ_ONLY is assumed here
)) {
    // ...
}
``````

1. A path, e.g. `Path.of("…​")`
2. The base offset
3. The size of the mapping, here the complete file
4. The mapping mode

What is really nice here is that the `MemorySegment` is also immediately freed when the code leaves the try-with-resources block.
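
For comparison, the pre-JEP-389 approach mentioned above can be sketched with the standard `FileChannel` and `MappedByteBuffer` classes. This is a minimal illustration, not code from the project; `MappedFileDemo`, `readMapped` and `roundTrip` are hypothetical names, and the content is assumed to be ASCII text.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class MappedFileDemo {
    /** Maps the whole file read-only and copies its content to a String. */
    static String readMapped(Path path) {
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            MappedByteBuffer buffer =
                channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            byte[] content = new byte[buffer.remaining()];
            buffer.get(content);
            // Closing the channel does not unmap the buffer: the mapping
            // stays in place until the buffer is garbage collected.
            return new String(content, StandardCharsets.US_ASCII);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes the text to a temporary file and reads it back through a mapping. */
    static String roundTrip(String text) {
        try {
            Path tmp = Files.createTempFile("mapped", ".txt");
            tmp.toFile().deleteOnExit();
            Files.write(tmp, text.getBytes(StandardCharsets.US_ASCII));
            return readMapped(tmp);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("Hello foreign code !"));
    }
}
```

The contrast with `MemorySegment.mapFile` is the lifecycle: here nothing in the API releases the mapping deterministically.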

## JEP-389 is still incubating

I mentioned that `MemorySegment` implements `AutoCloseable`; it won’t be the case in the next JDK release. In the same manner I mentioned `NativeScope` earlier, which is a JDK 16 API, but in the current state of Panama it will be replaced by a slightly different construct.

``````
try (ResourceScope scope = ResourceScope.ofConfined()) {
    MemorySegment.allocateNative(layout, scope);
    MemorySegment.mapFile(… , scope);
}
``````

Given the current state I have doubts JEP-389 will get out of incubating for JDK 17. JEP-389 is working well, but I think the developers may need more time to get this API right. They are doing a fantastic job in my opinion.

## `jextract`

`jextract` is still being baked and was not ready to be included in JDK 16 for incubation, but since it complements JEP-389, I wanted to give it a try and showcase its usefulness.

This tool leverages the native `libclang` and as such the `jdk.incubator.foreign` module.

In order to be able to use it, one should download the Panama JDK here: https://jdk.java.net/panama/. Don’t be scared by early access, JDK 17 (very early at this stage) or the other warnings; you just need `jextract`, not the Panama JDK itself.

When I started to bootstrap work on JDK 16 and libsodium, the built Panama JDK didn’t contain `jextract`. As I wasn’t sure, I voiced this on Twitter, and Oracle engineers confirmed to me that this was a bug in the release (JDK-8261733). If this ever happens again, or if you want to try the latest `jextract`, you’ll need to build the Panama JDK.

Again, the `jextract` tool is still being baked at this time. That means that everything below can become obsolete at any time.

### Extracting Java linking code from the Libsodium headers

The first thing I need is to get the headers of libsodium, and for that I cloned the repo. Then checked out the 1.0.18 tag as I intend to target this released binary.

Get the libsodium source
``````
$ git clone https://github.com/jedisct1/libsodium.git
Cloning into 'libsodium'...
remote: Enumerating objects: 151, done.
remote: Counting objects: 100% (151/151), done.
remote: Compressing objects: 100% (105/105), done.
remote: Total 32369 (delta 74), reused 86 (delta 41), pack-reused 32218
Receiving objects: 100% (32369/32369), 8.24 MiB | 10.52 MiB/s, done.
Resolving deltas: 100% (19205/19205), done.
$ git checkout 1.0.18
``````

Headers are located in the folder `src/libsodium/include`. Now let’s use `jextract`.

First contact with `jextract`
``````
$ jextract -d libsodium-jextract \                (1)
    -l sodium \                                   (2)
    --target-package com.github.bric3.sodium \    (3)
    -I src/libsodium/include/ \                   (4)
    -I src/libsodium/include/sodium \             (4)
    --filter sodium.h \                           (5)
    src/libsodium/include/sodium.h                (6)
src/libsodium/include/sodium/export.h:5:10: fatal error: 'stddef.h' file not found
``````

1. Destination of the generated files
2. Name without the JNI prefix and suffix (or path) of the library to load
3. Indicates the target package of the generated files
4. Includes of the library (some files include others in the library)
5. Only includes symbols from the given file, otherwise symbols of other includes may be extracted
6. The C header file

Obviously the standard C headers are not discovered by `jextract`. I tried to solve this by declaring the system includes in `/usr/include` and `/usr/include/linux` (`/usr/include/linux/stddef.h`), but the error only went a bit further, with `unknown type name 'size_t'`. It is a known issue that on some platforms `jextract` has trouble finding the system headers (JDK-8262127).

`size_t` is a standard C alias representing an unsigned integer type. I found help in this old thread from November 2018. Instead of using the includes under `/usr/include`, it is necessary to use the includes of the compiler; on my docker image they were located here: `/usr/lib/gcc/x86_64-redhat-linux/8/include`.

Also I noticed that `jextract` generates classes by default, but you can pass a `--source` option to configure it to generate sources instead.

On the next run of `jextract` the extraction process stopped on the file `version.h`.

Includes the compiler headers
``````
$ jextract \
    -d libsodium-jextract \
    -l sodium \
    --source \                                         (1)
    --target-package com.github.bric3.sodium \
    -I /usr/lib/gcc/x86_64-redhat-linux/8/include \    (2)
    -I src/libsodium/include/ \
    -I src/libsodium/include/sodium \
    --filter sodium.h \
    src/libsodium/include/sodium.h
``````

1. Generates the sources
2. The compiler includes installed on this Linux image

In the libsodium repository there’s a file named `version.h.in`, and upon inspection of its content I noticed placeholders that suggest a preliminary phase in the libsodium build generates the final `version.h`. In native sources this usually happens via a combination of `./autogen.sh` and `./configure`.

Let’s prepare the code base.

Configure libsodium codebase
``````
$ ./autogen.sh
autoreconf: Entering directory `.'
autoreconf: configure.ac: not using Gettext
autoreconf: running: aclocal --force -I m4
autoreconf: configure.ac: tracing
autoreconf: configure.ac: creating directory build-aux
autoreconf: running: libtoolize --copy --force
libtoolize: putting auxiliary files in AC_CONFIG_AUX_DIR, 'build-aux'.
libtoolize: copying file 'build-aux/ltmain.sh'
libtoolize: putting macros in AC_CONFIG_MACRO_DIRS, 'm4'.
libtoolize: copying file 'm4/libtool.m4'
libtoolize: copying file 'm4/ltoptions.m4'
libtoolize: copying file 'm4/ltsugar.m4'
libtoolize: copying file 'm4/ltversion.m4'
libtoolize: copying file 'm4/lt~obsolete.m4'
autoreconf: running: /usr/bin/autoconf --force
autoreconf: configure.ac: not using Autoheader
autoreconf: running: automake --add-missing --copy --force-missing
configure.ac:75: installing 'build-aux/compile'
configure.ac:9: installing 'build-aux/config.guess'
configure.ac:9: installing 'build-aux/config.sub'
configure.ac:10: installing 'build-aux/install-sh'
configure.ac:10: installing 'build-aux/missing'
src/libsodium/Makefile.am: installing 'build-aux/depcomp'
parallel-tests: installing 'build-aux/test-driver'
autoreconf: Leaving directory `.'
Downloading config.guess and config.sub... Done.
$ ./configure
checking build system type... x86_64-pc-linux-gnu
checking host system type... x86_64-pc-linux-gnu
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /usr/bin/mkdir -p
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
checking whether UID '0' is supported by ustar format... yes
checking whether GID '0' is supported by ustar format... yes
checking how to create a ustar tar archive... gnutar
checking whether make supports nested variables... (cached) yes
checking whether to enable maintainer-specific portions of Makefiles... no
checking whether make supports the include directive... yes (GNU style)
checking for gcc... gcc
...
configure: creating ./config.status
config.status: creating Makefile
config.status: creating builds/Makefile
config.status: creating contrib/Makefile
config.status: creating dist-build/Makefile
config.status: creating libsodium.pc
config.status: creating libsodium-uninstalled.pc
config.status: creating msvc-scripts/Makefile
config.status: creating src/Makefile
config.status: creating src/libsodium/Makefile
config.status: creating src/libsodium/include/Makefile
config.status: creating src/libsodium/include/sodium/version.h (1)
config.status: creating test/default/Makefile
config.status: creating test/Makefile
config.status: executing depfiles commands
config.status: executing libtool commands``````
 1 Configuring `version.h` with version values

Finally, this time `jextract` worked as expected.

Working jextract command
``````
$ jextract \
    -d libsodium-jextract \
    -l sodium \
    --source \
    --target-package com.github.bric3.sodium \
    -I /usr/lib/gcc/x86_64-redhat-linux/8/include \
    -I src/libsodium/include/ \
    -I src/libsodium/include/sodium \
    --filter sodium.h \
    src/libsodium/include/sodium.h
``````

However, when I opened `sodium_h.java` it was empty.

``````
public final class sodium_h {

    /* package-private */ sodium_h() {}
}
``````

In the 1.x tree the `sodium.h` file only includes the declaration of other headers. When I explicitly filtered on `sodium.h`, `jextract` evicted the symbols of the includes. How to keep the declarations of the other headers?

At this time the `jextract` help is a bit vague.

Jextract help
``````
$ jextract --help
Non-option arguments:

Option                         Description
------                         -----------
-?, -h, --help                 print help
-C <String>                    pass through argument for clang
-I <String>                    specify include files path
-d <String>                    specify where to place generated files
--filter <String>              header files to filter
-l <String>                    specify a library
--source                       generate java sources
-t, --target-package <String>  target package for specified header files``````

Looking at the `jextract` source code was the way to go. First, the code suggests that it’s possible to pass multiple filters (`--filter`), just like it is possible to pass multiple include paths (`-I`). Passing many values is not very practical though; isn’t it possible to pass a pattern?

This is answered in this document (Using the `jextract` tool) and in the source code of the `Filter` class: `--filter` accepts a part of the path, and the current code only verifies that this string is contained in the header path.

Concretely I can use the string `sodium` as a filter to include headers located in `include/sodium/` folder.
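The matching rule described above can be sketched in a few lines. This is a hypothetical stand-in for illustration, not the actual `Filter` class from `jextract`: the names `HeaderFilterSketch` and `matches` are assumptions, and only the "contains" check comes from the description above.

```java
import java.util.List;

// Hypothetical sketch of substring-based header filtering; not jextract's real code.
final class HeaderFilterSketch {

    // A header is kept when its path contains at least one of the filter strings.
    static boolean matches(List<String> filters, String headerPath) {
        return filters.stream().anyMatch(headerPath::contains);
    }

    public static void main(String[] args) {
        String nested = "src/libsodium/include/sodium/crypto_box.h";

        // Filtering on "sodium.h" misses the headers that sodium.h includes.
        System.out.println(matches(List.of("sodium.h"), nested));
        // Filtering on "sodium" keeps everything under include/sodium/.
        System.out.println(matches(List.of("sodium"), nested));
        // A system header is still left out.
        System.out.println(matches(List.of("sodium"), "/usr/include/stdlib.h"));
    }
}
```

Under this rule the first call explains the empty `sodium_h` class obtained earlier, while the second one shows why the shorter string is enough.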

Correct jextract command
``````$ jextract \
  -d libsodium-jextract \ (1)
  --source \ (2)
  --target-package com.github.bric3.sodium \ (3)
  -l sodium \ (4)
  -I /usr/lib/gcc/x86_64-redhat-linux/8/include \ (5)
  -I src/libsodium/include/ \ (6)
  -I src/libsodium/include/sodium \ (6)
  --filter sodium \ (7)
  src/libsodium/include/sodium.h (8)``````
 1 Destination of the generated sources
 2 Generates sources instead of classes
 3 Indicates the target package of the generated sources
 4 Name without the JNI prefix and suffix (or path) of the library to load
 5 Includes C definitions or includes like `size_t`, `stddef.h` etc.
 6 Includes of the library (some files include others in the library)
 7 Only includes symbols from headers whose path contains this string, otherwise symbols of other includes may be extracted
 8 The C header file

Generated files
``````$ ls -lh libsodium-jextract-f/com/github/bric3/sodium/
total 956K
-rw-r--r--. 1 root root  557 Feb 16 14:10 C.java
-rw-r--r--. 1 root root 8.8K Feb 16 14:10 RuntimeHelper.java
-rw-r--r--. 1 root root 350K Feb 16 14:10 sodium_h.java
-rw-r--r--. 1 root root 124K Feb 16 14:10 sodium_h_0.java
-rw-r--r--. 1 root root 329K Feb 16 14:10 sodium_h_constants_0.java
-rw-r--r--. 1 root root 131K Feb 16 14:10 sodium_h_constants_1.java``````

### Invoking the library

Let’s have a look at what `jextract` generated. The entry point is the class `sodium_h`. In particular let’s compare the method stubs to those I wrote earlier:

• `crypto_box_sealbytes`

• `crypto_box_keypair`

• `crypto_box_seal`

• `crypto_box_seal_open`

The libsodium headers declare a function named `crypto_box_sealbytes`, whose role is to return the constant `crypto_box_SEALBYTES`. This constant is defined with the C preprocessor directive `#define`, so it is not visible as a symbol when performing a library lookup. The native `crypto_box_sealbytes` function compensates for this limitation.

`jextract`, however, reads the headers, and in doing so it actually extracts the constant `crypto_box_SEALBYTES`. It is still exposed as a method, and it is declared in a different class: `sodium_h_0#crypto_box_SEALBYTES`.

Note that `sodium_h` extends `sodium_h_0`, so one will write

``sodium_h.crypto_box_SEALBYTES()``

Behind the scenes this call invokes `sodium_h_constants_1#crypto_box_SEALBYTES`. As with `sodium_h`, the constants are split across two classes because of class file size limits: `sodium_h_constants_1` extends `sodium_h_constants_0`.
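The layering can be pictured with a small sketch. The class names below are hypothetical stand-ins for the generated `sodium_h`, `sodium_h_0` and `sodium_h_constants_*` classes, and the returned value is only illustrative; the real generated code is much larger.

```java
// Hypothetical stand-ins for the generated classes; names and value are illustrative only.
class StaticInheritanceSketch {
    public static void main(String[] args) {
        // Static methods are inherited, so the call can be written against the entry point.
        System.out.println(Header.SEALBYTES());
    }
}

// First half of the constants holder.
class ConstantsPart0 {
}

// Second half of the constants holder, extending the first one.
class ConstantsPart1 extends ConstantsPart0 {
    static long SEALBYTES() { return 48L; } // illustrative value
}

// Public accessors that delegate to the constants holder.
class HeaderPart0 {
    public static long SEALBYTES() { return ConstantsPart1.SEALBYTES(); }
}

// The entry point inherits the static accessors of HeaderPart0.
final class Header extends HeaderPart0 {
    private Header() {}
}
```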

#### First hiccup

When I accessed this constant for the first time, I got this error :

``````java.lang.ExceptionInInitializerError
at com.github.bric3.sodium.sodium_h_0.crypto_box_PUBLICKEYBYTES(sodium_h_0.java:1511)
at com.github.bric3.sodium.Libsodium$JextractedLibsodium.crypto_box_keypair(Libsodium.java:263)
at com.github.bric3.sodium.LibsodiumTest.can_invoke_crypto_box_keypair(LibsodiumTest.java:44)
Caused by: java.lang.IllegalArgumentException: Library not found: sodium
at jdk.incubator.foreign/jdk.internal.foreign.LibrariesHelper.lookup(LibrariesHelper.java:94)
at jdk.incubator.foreign/jdk.internal.foreign.LibrariesHelper.loadLibrary(LibrariesHelper.java:60)
at jdk.incubator.foreign/jdk.incubator.foreign.LibraryLookup.ofLibrary(LibraryLookup.java:150)
at com.github.bric3.sodium.RuntimeHelper.lambda$libraries$0(RuntimeHelper.java:46)
at com.github.bric3.sodium.RuntimeHelper.libraries(RuntimeHelper.java:49)
at com.github.bric3.sodium.sodium_h_constants_0.<clinit>(sodium_h_constants_0.java:14)``````

The stacktrace points to this code:

sodium_h_constants_0.LIBRARIES
``````static final LibraryLookup[] LIBRARIES = RuntimeHelper.libraries(new String[] {
    "sodium", (1)
});``````
 1 This is the value I passed to the `jextract` command.

`RuntimeHelper::libraries` can load a library from a name (using JNI conventions, `JNI_LIB_PREFIX` and `JNI_LIB_SUFFIX`) or a path. The value above is the value I used in the `-l sodium` option of `jextract`, yet this value is obviously incorrect for my use case.

Work around 1: with `jextract`

It is not yet clear from the `jextract` usage description at this time, but one can pass to the `-l` option:

1. A library name, which has to be available on one of the paths declared in the JVM system property `java.library.path`:

• Linux: `/usr/java/packages/lib:/usr/lib64:/lib64:/lib:/usr/lib`

• macOS: `/Users/bric3/Library/Java/Extensions:/Library/Java/Extensions:/Network/Library/Java/Extensions:/System/Library/Java/Extensions:/usr/lib/java:.`

The library file name must conform to JNI conventions; `libsodium.23.dylib` or `libtasn1.so.6.5.5` won’t work as they contain version numbers.

2. Or an absolute path, e.g. `/usr/local/opt/libsodium/lib/libsodium.23.dylib`.
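To see what the JNI naming convention means on a given machine, the standard `System.mapLibraryName` method and the `java.library.path` property can be printed. This is a minimal sketch, assuming the lookup by name follows the same naming rules as `System.loadLibrary`; the class name `LibraryNameSketch` and its helper are made up for the example.

```java
import java.io.File;

// Sketch: how a bare library name relates to a platform file name and to java.library.path.
final class LibraryNameSketch {

    // Platform-specific file name for a bare name, e.g. "libsodium.so" on Linux.
    static String platformFileName(String bareName) {
        return System.mapLibraryName(bareName);
    }

    public static void main(String[] args) {
        System.out.println("mapped name: " + platformFileName("sodium"));

        // Directories searched when a library is loaded by name.
        String searchPath = System.getProperty("java.library.path", "");
        for (String dir : searchPath.split(File.pathSeparator)) {
            System.out.println("searched: " + dir);
        }
    }
}
```

A versioned file such as `libsodium.23.dylib` never equals the mapped name, which is why only the absolute path form works for it.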
However, the actual library path depends on the system, on the library version and on the installation mechanism. I could have used `jextract` with `-l /usr/local/opt/libsodium/lib/libsodium.23.dylib`, but then the generated code could not run on Linux without modifications. My final objective for this code is to declare the libsodium bindings in Java, and link with the actual libsodium on the platform, macOS or Linux.

Work-around 2: Modify generated code

`LIBRARIES` is a static final variable that is used by other static variables in the same class. While it is possible to edit the `sodium_h_constants_0` class, it is still difficult to make this `LibraryLookup` code configurable without a significant refactoring. Oracle engineers are aware of this problem (JDK-8262126), so we might see it fixed in the final JEP-389 release.

For this article the easiest solution is to declare the local libsodium path in the code, as I did in the first section of this blog.

``````static final LibraryLookup[] LIBRARIES = RuntimeHelper.libraries(new String[] {
    "/usr/local/opt/libsodium/lib/libsodium.23.dylib"
});``````

In the end I’ll rework this initialization later with custom code to find the actual libsodium on the current platform.

#### Now implementing the other functions

Now let’s profit from the generated function calls. Following the same order, I’d like to use `crypto_box_keypair` first; this is straightforward. The arguments are still carrier types like `MemorySegment`, which means we still need to take care of the scope / lifecycle of these allocations.
crypto_box_keypair
``````try (var scope = NativeScope.unboundedScope()) {
    var recipientPublicKey = scope.allocate(sodium_h.crypto_box_PUBLICKEYBYTES());
    var recipientSecretKey = scope.allocate(sodium_h.crypto_box_SECRETKEYBYTES());
    sodium_h.crypto_box_keypair(recipientPublicKey, recipientSecretKey); (1)
    return new CryptoBoxKeyPair(
        recipientPublicKey.toByteArray(),
        recipientSecretKey.toByteArray()
    );
}``````
 1 the jextracted method

The IDE might suggest a method named `crypto_box_keypair$MH`; the suffix `$MH` simply indicates that it returns the method handle for this native method, which is basically what I showed in the first part of this blog post.

As a reflex, I always like to navigate the code I’m invoking. The methods we are invoking are just the public API methods; they check for null and declare a correct call site (correct return type, correct argument types).

sodium_h.crypto_box_keypair
``````public static MethodHandle crypto_box_keypair$MH() {
    return RuntimeHelper.requireNonNull(sodium_h_constants_0.crypto_box_keypair$MH(), "unresolved symbol: crypto_box_keypair");
}
public static int crypto_box_keypair ( Addressable pk, Addressable sk) {
    var mh$ = RuntimeHelper.requireNonNull(sodium_h_constants_0.crypto_box_keypair$MH(), "unresolved symbol: crypto_box_keypair");
    try {
        return (int)mh$.invokeExact(pk.address(), sk.address());
    } catch (Throwable ex$) {
        throw new AssertionError("should not reach here", ex$);
    }
}``````
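The shape of this generated wrapper (a handle held in a static field, a null check, then `invokeExact` behind a typed method) can be reproduced with plain `java.lang.invoke`, without any native library. The sketch below is only an analogy: `WrapperSketch`, `add` and `addViaHandle` are hypothetical names, and an ordinary static Java method stands in for the native function.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.Objects;

// Analogy of the generated wrapper, using a plain Java method instead of a native symbol.
final class WrapperSketch {

    // Stand-in for the native function.
    static int add(int a, int b) {
        return a + b;
    }

    // Handle resolved once; null mirrors an unresolved symbol.
    static final MethodHandle add$MH_ = lookupAdd();

    private static MethodHandle lookupAdd() {
        try {
            return MethodHandles.lookup().findStatic(
                    WrapperSketch.class, "add",
                    MethodType.methodType(int.class, int.class, int.class));
        } catch (ReflectiveOperationException e) {
            return null;
        }
    }

    // Typed wrapper: null check, then an exact call site (int, int) -> int.
    static int addViaHandle(int a, int b) {
        var mh = Objects.requireNonNull(add$MH_, "unresolved symbol: add");
        try {
            return (int) mh.invokeExact(a, b);
        } catch (Throwable ex) {
            throw new AssertionError("should not reach here", ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(addViaHandle(2, 3));
    }
}
```

The cast and the argument types at the `invokeExact` call site must match the handle type exactly, which is what the generated wrappers guarantee for each native function.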

Going further down to see how the `MethodHandle` is declared:

sodium_h_constants_0.crypto_box_keypair$MH
``````static final FunctionDescriptor crypto_box_keypair$FUNC_ = FunctionDescriptor.of(
    C_INT,
    C_POINTER,
    C_POINTER
);

static final MethodHandle crypto_box_keypair$MH_ = RuntimeHelper.downcallHandle(
    LIBRARIES, "crypto_box_keypair",
    "(Ljdk/incubator/foreign/MemoryAddress;Ljdk/incubator/foreign/MemoryAddress;)I", (1)
    crypto_box_keypair$FUNC_, false
);
static final java.lang.invoke.MethodHandle crypto_box_keypair$MH() { return crypto_box_keypair$MH_; }``````
 1 Note that the Java method signature is declared with a String instead of the Java API `MethodType`.

This code creates the down-call stub. The only difference from the handcrafted handle in the section above is that the method signature is declared as a `String`.

`(Ljdk/incubator/foreign/MemoryAddress;Ljdk/incubator/foreign/MemoryAddress;)I` breakdown
• `Ljdk/incubator/foreign/MemoryAddress;` ⇒ arg0

• `Ljdk/incubator/foreign/MemoryAddress;` ⇒ arg1

• `I` ⇒ `int` return type
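The relation between such a descriptor string and a `MethodType` can be checked with the standard `MethodType.fromMethodDescriptorString` method. In this minimal sketch `java.lang.Object` stands in for `MemoryAddress`, so that no incubator module is needed; the class name `DescriptorSketch` and its `parse` helper are made up for the example.

```java
import java.lang.invoke.MethodType;

// Sketch: a JVM method descriptor string and the equivalent MethodType.
final class DescriptorSketch {

    // Resolves the class names of the descriptor with this class's loader.
    static MethodType parse(String descriptor) {
        return MethodType.fromMethodDescriptorString(
                descriptor, DescriptorSketch.class.getClassLoader());
    }

    public static void main(String[] args) {
        // Two reference arguments and an int return, the same shape as the generated descriptor.
        MethodType fromString = parse("(Ljava/lang/Object;Ljava/lang/Object;)I");
        MethodType fromApi = MethodType.methodType(int.class, Object.class, Object.class);

        // Both describe the same call site type.
        System.out.println(fromString.equals(fromApi));
        // And the conversion works in the other direction too.
        System.out.println(fromApi.toMethodDescriptorString());
    }
}
```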

The other two methods in this example, `crypto_box_seal` and `crypto_box_seal_open`, are similar and don’t require writing the tedious handle declaration by hand.

Their C signatures use unsigned types such as `unsigned long long`, which raised a few questions about how to map them in Java in the first section, where I used `jdk.incubator.foreign` manually. There is also a statement, at this time, about `jextract` not supporting some wide types:

• jextract does not support certain C types bigger than 64 bits (e.g. `long double`).

How does it handle these unsupported types? The answer is in the source code.

In here we learn that unsigned types are represented with their signed counterpart and the types wider than 64 bits are represented with a specific unsupported layout during headers processing. The symbols with unsupported layouts won’t be generated as the JEP-389 linker won’t be able to link them.

Some details on `jextract`'s handling of primitive types

The enum below in jextract shows how native primitive types are mapped to their respective memory layouts, whether they are supported or not.

``````enum Kind {
/**
* {@code void} type.
*/
Void("void", null),
/**
* {@code Bool} type.
*/
/**
* {@code char} type.
*/
/**
* {@code char16} type.
*/
Char16("char16", UnsupportedLayouts.CHAR16),
/**
* {@code short} type.
*/
/**
* {@code int} type.
*/
/**
* {@code long} type.
*/
/**
* {@code long long} type.
*/
/**
* {@code int128} type.
*/
Int128("__int128", UnsupportedLayouts.__INT128),
/**
* {@code float} type.
*/
/**
* {@code double} type.
*/
/**
* {@code long double} type.
*/
LongDouble("long double", UnsupportedLayouts.LONG_DOUBLE),
/**
* {@code float128} type.
*/
Float128("float128", UnsupportedLayouts._FLOAT128),
/**
* {@code float16} type.
*/
HalfFloat("__fp16", UnsupportedLayouts.__FP16),
/**
* {@code wchar} type.
*/
WChar("wchar_t", UnsupportedLayouts.WCHAR_T);``````

Those types can be qualified, in particular integer types can be unsigned:

jdk.internal.jextract.impl.TypeMaker#makeTypeInternal
``````case UShort: {
Type chType = Type.primitive(Primitive.Kind.Short);
return Type.qualified(Delegated.Kind.UNSIGNED, chType);
}
case UInt: {
Type chType = Type.primitive(Primitive.Kind.Int);
return Type.qualified(Delegated.Kind.UNSIGNED, chType);
}
case ULong: {
Type chType = Type.primitive(Primitive.Kind.Long);
return Type.qualified(Delegated.Kind.UNSIGNED, chType);
}
case ULongLong: {
Type chType = Type.primitive(Primitive.Kind.LongLong);
return Type.qualified(Delegated.Kind.UNSIGNED, chType);
}
case UChar: {
Type chType = Type.primitive(Primitive.Kind.Char);
return Type.qualified(Delegated.Kind.UNSIGNED, chType);
}``````

Going further we can see that signed and unsigned integers use the same memory layout, eg. `long long` and `unsigned long long` use the same layout `C_LONG_LONG`.

``````public static MemoryLayout getLayout(Type t) {
Supplier<UnsupportedOperationException> unsupported = () ->
new UnsupportedOperationException("unsupported: " + t.kind());
switch(t.kind()) {
case UChar, Char_U:
case SChar, Char_S:
return Primitive.Kind.Char.layout().orElseThrow(unsupported);
case Short:
case UShort:
return Primitive.Kind.Short.layout().orElseThrow(unsupported);
case Int:
case UInt:
return Primitive.Kind.Int.layout().orElseThrow(unsupported);
case ULong:
case Long:
return Primitive.Kind.Long.layout().orElseThrow(unsupported);
case ULongLong:
case LongLong:
return Primitive.Kind.LongLong.layout().orElseThrow(unsupported); (1)
case UInt128:
case Int128:
return Primitive.Kind.Int128.layout().orElseThrow(unsupported); (2)
case Enum:
return valueLayoutForSize(t.size() * 8).layout().orElseThrow(unsupported);
case Bool:
return Primitive.Kind.Bool.layout().orElseThrow(unsupported);
case Float:
return Primitive.Kind.Float.layout().orElseThrow(unsupported);
case Double:
return Primitive.Kind.Double.layout().orElseThrow(unsupported);
case LongDouble:
return Primitive.Kind.LongDouble.layout().orElseThrow(unsupported);
case Complex:
throw new UnsupportedOperationException("unsupported: " + t.kind());
case Record:
return getRecordLayout(t);
case Vector:
return MemoryLayout.ofSequence(t.getNumberOfElements(), getLayout(t.getElementType()));
case ConstantArray:
return MemoryLayout.ofSequence(t.getNumberOfElements(), getLayout(t.getElementType()));
case IncompleteArray:
return MemoryLayout.ofSequence(getLayout(t.getElementType()));
case Unexposed:
Type canonical = t.canonicalType();
if (canonical.equalType(t)) {
throw new TypeMaker.TypeException("Unknown type with same canonical type: " + t.spelling());
}
return getLayout(canonical);
case Typedef:
case Elaborated:
return getLayout(t.canonicalType());
case Pointer:
case BlockPointer:
return C_POINTER;
default:
throw new UnsupportedOperationException("unsupported: " + t.kind());
}
}``````
 1 `C_LONG_LONG` will be used for both `long long` and `unsigned long long`.
 2 Native types longer than 64 bits are still represented internally by jextract.
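Since signed and unsigned C integers share one layout, a value coming back from native code arrives in a signed Java primitive, and the unsigned reading has to be done explicitly on the Java side. A minimal sketch using only the standard `Long` and `Byte` helpers (the class and method names are made up for the example):

```java
// Sketch: reading a signed Java value as the unsigned C value sharing the same bits.
final class UnsignedSketch {

    // Decimal text of a Java long interpreted as a C `unsigned long long`.
    static String asUnsignedLongLong(long raw) {
        return Long.toUnsignedString(raw);
    }

    // Value of a Java byte interpreted as a C `unsigned char`.
    static int asUnsignedChar(byte raw) {
        return Byte.toUnsignedInt(raw);
    }

    public static void main(String[] args) {
        // All 64 bits set: -1 as signed, the maximum value as unsigned.
        System.out.println(asUnsignedLongLong(-1L));
        // All 8 bits set: -1 as signed, 255 as unsigned.
        System.out.println(asUnsignedChar((byte) 0xFF));
        // Comparisons need the unsigned variants as well.
        System.out.println(Long.compareUnsigned(-1L, 1L) > 0);
    }
}
```

For the message lengths used in this article the distinction does not matter in practice, as they are small positive numbers.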

`jextract` identifies unsupported types and represents them correctly during the C header processing, but the symbols that use them are skipped during the Java generation.

``````private static final String ATTR_LAYOUT_KIND = "jextract.abi.unsupported.layout.kind";

public static final ValueLayout __INT128 = MemoryLayout.ofValueBits(128, ByteOrder.nativeOrder()).
withAttribute(ATTR_LAYOUT_KIND, "__int128");

public static final ValueLayout LONG_DOUBLE = MemoryLayout.ofValueBits(128, ByteOrder.nativeOrder()).
withAttribute(ATTR_LAYOUT_KIND, "long double");

public static final ValueLayout _FLOAT128 = MemoryLayout.ofValueBits(128, ByteOrder.nativeOrder()).
withAttribute(ATTR_LAYOUT_KIND, "_float128");

public static final ValueLayout __FP16 = MemoryLayout.ofValueBits(16, ByteOrder.nativeOrder()).
withAttribute(ATTR_LAYOUT_KIND, "__fp16");

public static final ValueLayout CHAR16 = MemoryLayout.ofValueBits(16, ByteOrder.nativeOrder()).
withAttribute(ATTR_LAYOUT_KIND, "char16");

public static final ValueLayout WCHAR_T = MemoryLayout.ofValueBits(16, ByteOrder.nativeOrder()).
withAttribute(ATTR_LAYOUT_KIND, "wchar_t");

static boolean isUnsupported(MemoryLayout vl) { (1)
return vl.attribute(ATTR_LAYOUT_KIND).isPresent();
}

static String getUnsupportedTypeName(MemoryLayout vl) {
return (String)
vl.attribute(ATTR_LAYOUT_KIND).orElseThrow(IllegalArgumentException::new);
}``````
 1 Invoked during java representation generation.

### Wrapping up on `jextract`

In the end `jextract` is useful, but there are a few little hiccups along the way. The generated code currently lacks some usability. It is also a tad verbose; I wish there were a way to eliminate some unneeded generated methods. Using `jextract` is a bit obscure as well: there are a few pitfalls there too, and it may require peeking at the `jdk.incubator.jextract` source code (in the panama repository).

While I mention these points, this should not diminish the work done on this tool and what it could become. When ready, it could be leveraged by Gradle, JetBrains IntelliJ IDEA, etc.

## Closing words

Cool part

In JDK16 the foreign module is really easy to use, despite the `javac` and `java` command-line requirements. The API is well designed. I also appreciated the idea of scoped segments, a bit like what was implemented in the Rust language. There’s also the coolness of being able to free memory segments at will, without depending on the GC.

This is the third incubator and there’s still planned API work. Some of this blog post’s content will eventually become incorrect when JDK17 comes out. `jextract` looks like a very practical tool, yet it is still being cooked.