A practical look at JEP-412 in JDK17 with libsodium

This article is an updated version of an earlier article on JEP-389, which shipped with JDK 16. It merely reflects the changes that appeared in JEP-412, along with an update to the Gradle section.

JDK 17 will be released on September 14th 2021 with JEP-412, yet another incubator of the Foreign Linker API.

The Foreign Linker API is a very convenient and attractive way to connect to the native world. Let’s have a practical look at this API that should supersede JNI. In order to do so, I wanted Java code to interact with the infamous libsodium.

First I will focus on using the foreign linker API, then I will show how to use jextract in its current state (it is still being actively developed).

Note that JEP-412 is still incubating, so the examples below are likely to become obsolete with the next JDK as the API and its behavior are further refined.
The following examples are based on JDK 17 release candidate build 35 (2021/8/6).

Here’s a quick recapitulation of what’s changed:

Table 1. Changes

JEP-389: -Dforeign.restricted=permit
JEP-412: --enable-native-access=ALL-UNNAMED
The new option allows fine-grained restriction over which modules are permitted to invoke native code. ALL-UNNAMED is a special value that refers to classpath-based code.

JEP-389: LibraryLookup.ofDefault()
JEP-412: CLinker.systemLookup()

JEP-389: LibraryLookup.ofLibrary(String) / LibraryLookup.ofPath(Path)
JEP-412: There’s no straight replacement, instead the code has to be a sequence of

System.load(String); // or System.loadLibrary(String);
SymbolLookup.loaderLookup();

JEP-389: NativeScope.unboundedScope()
JEP-412: ResourceScope.newConfinedScope()

JEP-389: MemorySegment.allocateNative(long)
JEP-412: MemorySegment.allocateNative(long, ResourceScope)

JEP-389: NativeScope.allocate* methods
JEP-412: SegmentAllocator.allocate*, the allocator has factory methods taking a ResourceScope, e.g. SegmentAllocator.ofScope(scope)

JEP-389: CLinker.toCString(String, Charset, ResourceScope), CLinker.toJavaString(MemorySegment, Charset)
JEP-412: There’s no equivalent, encoding/decoding a string to/from a given charset must be done manually.

And jextract in particular saw massive improvements.

Thanks to Jean-Philippe Bempel for the review, and in particular for spotting errors.

The crypto sealed box example

Let’s try to reproduce the following example from the libsodium sealed boxes documentation. That page has a simple code snippet that is interesting to reproduce in Java.

Crypto sealed box example
#define MESSAGE (const unsigned char *) "Message"
#define MESSAGE_LEN 7
#define CIPHERTEXT_LEN (crypto_box_SEALBYTES + MESSAGE_LEN)

/* Recipient creates a long-term key pair */
unsigned char recipient_pk[crypto_box_PUBLICKEYBYTES];
unsigned char recipient_sk[crypto_box_SECRETKEYBYTES];
crypto_box_keypair(recipient_pk, recipient_sk);

/* Anonymous sender encrypts a message using an ephemeral key pair
 * and the recipient's public key */
unsigned char ciphertext[CIPHERTEXT_LEN];
crypto_box_seal(ciphertext, MESSAGE, MESSAGE_LEN, recipient_pk);

/* Recipient decrypts the ciphertext */
unsigned char decrypted[MESSAGE_LEN];
if (crypto_box_seal_open(decrypted, ciphertext, CIPHERTEXT_LEN,
                         recipient_pk, recipient_sk) != 0) {
    /* message corrupted or not intended for this recipient */
}

Testing the idea in jshell

One of the cool things with jshell is that you can try small ideas with a rapid feedback loop. With the right configuration, it is also possible to play with the foreign linker.

Allow jshell to use the foreign module
$ jshell --add-modules jdk.incubator.foreign -R--enable-native-access=ALL-UNNAMED

Then within jshell, let’s try out a simple smoke test.

Smoke testing the foreign module
jshell> import java.lang.invoke.*;
jshell> import jdk.incubator.foreign.*;
jshell> var getpid = CLinker.getInstance()
   ...>                     .downcallHandle(
   ...>                             CLinker.systemLookup().lookup("getpid").get(),
   ...>                             MethodType.methodType(long.class),
   ...>                             FunctionDescriptor.of(CLinker.C_LONG)
   ...>                     );
getpid ==> MethodHandle()long

jshell> (long) getpid.invokeExact();
$4 ==> 53699

jshell> ProcessHandle.current().pid()
$5 ==> 53699

It works! It really is easy to try native things for almost free without leaving Java, which is really neat.

In this article I would like to focus on the small example with libsodium within a project. I’ll explain how to use the API along the way.

Configuring Gradle

Incubator modules are not resolved by default. Hence, the jdk.incubator.foreign module has to be added explicitly when invoking the compilation command.

$ javac --add-modules jdk.incubator.foreign ...

This module also needs to be declared when running this code, as well as another option --enable-native-access to permit modules to perform native operations.

$ java --enable-native-access=ALL-UNNAMED --add-modules jdk.incubator.foreign ...

If you like to play with jshell, it will be necessary to use these two as well

$ jshell -R--enable-native-access=ALL-UNNAMED --add-modules jdk.incubator.foreign ...

Then comes the question of configuring the build tool. I am using Gradle; the configuration is likely similar for other build tools.

The following lines assume Gradle 7.2.
build.gradle.kts
// ...

java {
    toolchain {
        languageVersion.set(JavaLanguageVersion.of(17))
    }
}

tasks {
    withType<JavaCompile>().configureEach {
        options.compilerArgs = listOf(
                "--add-modules", "jdk.incubator.foreign" (1)
        )
        options.release.set(17)
    }

    withType<JavaExec>().configureEach {
        jvmArgs("--enable-native-access=ALL-UNNAMED", (2)
                "--add-modules", "jdk.incubator.foreign")
        javaLauncher.set(project.javaToolchains.launcherFor(java.toolchain)) (3)
    }

    withType<Test>().configureEach {
        useJUnitPlatform()
        jvmArgs("--enable-native-access=ALL-UNNAMED", (4)
                "--add-modules", "jdk.incubator.foreign")
    }
}
1 Let the compiler know about the jdk.incubator.foreign module.
2 Configure the tasks that execute a main class. While this is not immediately useful, IntelliJ IDEA will pick up this configuration when you run a main method from the IDE.
3 Currently, the project toolchain is not the default value for some properties like the JavaExec task launcher, see gradle/gradle/issues#16791.
4 Configure test tasks to be able to run tests that use jdk.incubator.foreign.

The first and minimal call crypto_box_sealbytes

Lookup

The very first thing to set up is the native symbol lookup mechanism. In JDK 17 the nifty LibraryLookup is gone. In my opinion that API was better because it accepted a path, which is particularly useful when embedding native libraries in JARs.

Basically, in JDK 17 there are two options:

  • CLinker.systemLookup(): this mechanism finds symbols in the system libraries and in the libraries of the JVM itself; the path is defined in the sun.boot.library.path property.

    $ jshell -s - <<< "System.out.println(System.getProperty(\"sun.boot.library.path\"))"
    /Users/brice/.asdf/installs/java/openjdk-17/lib

    And it doesn’t seem to be related to the classloader.

  • SymbolLookup.loaderLookup(), on the other hand, appears to be based on the libraries loaded via System.load / System.loadLibrary, which are tied to the classloader. This mechanism looks up libraries in the directories listed in the java.library.path property.

    $ jshell -s - <<< "System.out.println(System.getProperty(\"java.library.path\"))"
    /Users/brice/Library/Java/Extensions:/Library/Java/Extensions:/Network/Library/Java/Extensions:/System/Library/Java/Extensions:/usr/lib/java:.

So which method to choose?

Assuming libsodium has been installed with homebrew (brew install libsodium), there should be a symbolic link at $(brew --prefix)/lib/libsodium.dylib (or /usr/local/lib/libsodium.dylib).

Basically there are two choices to consume this library, and they are very similar to what was needed with JNI.

  • either the runtime is altered via the environment variable JAVA_LIBRARY_PATH (or, more portably, the java.library.path system property), and the library can be loaded by its name with System.loadLibrary("sodium").

    env JAVA_LIBRARY_PATH=:/usr/local/lib java --enable-native-access=ALL-UNNAMED ...
  • or the code explicitly loads the library from a path with System.load("/usr/local/lib/libsodium.dylib"), without requiring any change to environment variables.
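To check which of the two choices applies on a given machine, here is a minimal sketch that searches the entries of java.library.path for the platform-specific file name of the library. The class and method names are mine, and the search only approximates what System.loadLibrary does (the JVM also consults sun.boot.library.path), so treat it as a diagnostic helper rather than as the actual loading algorithm.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

// Hypothetical helper, not part of the JDK or of libsodium
final class LibraryLocator {
    /** Platform-specific file name, e.g. libsodium.dylib on macOS or libsodium.so on Linux. */
    static String fileNameOf(String libName) {
        return System.mapLibraryName(libName);
    }

    /** Searches each directory of a path list (java.library.path syntax) for the library file. */
    static Optional<Path> locate(String libName, String searchPath) {
        String fileName = fileNameOf(libName);
        for (String dir : searchPath.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue; // skip empty entries such as the leading ':' in ":/usr/local/lib"
            }
            Path candidate = Path.of(dir, fileName);
            if (Files.isRegularFile(candidate)) { // follows symbolic links, as installed by homebrew
                return Optional.of(candidate.toAbsolutePath());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String searchPath = System.getProperty("java.library.path", "");
        Optional<Path> found = locate("sodium", searchPath);
        if (found.isPresent()) {
            System.out.println("System.loadLibrary(\"sodium\") should find " + found.get());
        } else {
            System.out.println(fileNameOf("sodium") + " is not on java.library.path, "
                    + "use System.load with an absolute path instead");
        }
    }
}
```

Running it with and without a modified library path shows quickly whether loading by name has a chance to work.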

In the code, however, the question remains: which lookup mechanism?

  • If it’s a library loaded via System::load or System::loadLibrary, then use SymbolLookup.loaderLookup().

  • If it is a system library with system symbols like printf or getpid, the code needs to use CLinker.systemLookup().

Let’s define the lookup this way for this article:

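// Lookup shared by every downcall handle in this article
static final SymbolLookup libsodiumLookup;

static {
    System.load("/usr/local/lib/libsodium.dylib");
    libsodiumLookup = SymbolLookup.loaderLookup();
}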

From C to Java

Going back to the snippet to translate, the first lines make use of a few macros (the lines starting with #define). We can assume that MESSAGE will be a method parameter, that MESSAGE_LEN will be derived from the message parameter, and that CIPHERTEXT_LEN is also derived from the message but needs another constant, crypto_box_SEALBYTES.

The first thing needed is to acquire the crypto_box_SEALBYTES constant. Looking at crypto_box.h, there’s a method size_t crypto_box_sealbytes(void); that returns this constant.

It’s simple, and it will be the first method I will present here.

The first challenge is to map the return type size_t, an unsigned integer type. Since the constant is far below the int maximum value and I’d like to use it as an array size, I will map it to an int.

crypto_box_sealbytes (.java)
MethodHandle crypto_box_sealbytes =
        CLinker.getInstance()
               .downcallHandle(
                       libsodiumLookup.lookup("crypto_box_sealbytes").get(),
                       MethodType.methodType(int.class),
                       FunctionDescriptor.of(CLinker.C_INT)
               );

var crypto_box_SEALBYTES = (int) crypto_box_sealbytes.invokeExact();

The Java type and the C descriptor must match, otherwise the call will fail at runtime with an IllegalArgumentException.

Example 1. Carrier mismatch long != b32

If the Java method type used long.class, and the C descriptor was C_INT, the code would have failed with a carrier mismatch.

java.lang.IllegalArgumentException: Carrier size mismatch: long != b32[abi/kind=INT]

Example 2. Carrier mismatch int != b64

If the Java method type used int.class, and the C descriptor was C_LONG, the code would have failed with a carrier mismatch.

java.lang.IllegalArgumentException: Carrier size mismatch: int != b64[abi/kind=LONG]

For reference, CLinker.C_INT is actually a MemoryLayout. A layout is used to model native memory, and it is particularly useful when modeling native data types like structs, unions, etc.

Then a more interesting case, passing argument pointers

The next part of the example is a little more involved: the crypto_box_keypair method takes two array pointers, recipient_pk and recipient_sk, and the generated key pair will be written to the given byte arrays.

crypto_box_keypair (.c)
unsigned char recipient_pk[crypto_box_PUBLICKEYBYTES];
unsigned char recipient_sk[crypto_box_SECRETKEYBYTES];
crypto_box_keypair(recipient_pk, recipient_sk);

In order to initialize the size of these arrays, the code needs two constants, crypto_box_PUBLICKEYBYTES and crypto_box_SECRETKEYBYTES. Accessing these two works the same way as for crypto_box_SEALBYTES.

The C mapping is easy to get: a void method that takes 2 pointers, FunctionDescriptor.ofVoid(C_POINTER, C_POINTER). In Java the method type requires a type called MemoryAddress, which represents the pointer address.

The pointers need to point to some memory. That’s what the MemorySegment type is for. Before invoking the method the necessary memory will be allocated via MemorySegment::allocateNative, and the respective memory segment address will be passed.

crypto_box_keypair (.java)
MethodHandle crypto_box_keypair =
        CLinker.getInstance().downcallHandle(
                libsodiumLookup.lookup("crypto_box_keypair").get(),
                MethodType.methodType(
                        void.class,
                        MemoryAddress.class, // pk
                        MemoryAddress.class  // sk
                ),
                FunctionDescriptor.ofVoid(C_POINTER, C_POINTER)
        );

var recipientPublicKey = MemorySegment.allocateNative(crypto_box_publickeybytes(), scope); (1)
var recipientSecretKey = MemorySegment.allocateNative(crypto_box_secretkeybytes(), scope); (1)
crypto_box_keypair.invokeExact(recipientPublicKey.address(),
                               recipientSecretKey.address());

var kp = new CryptoBoxKeyPair(
        recipientPublicKey.toByteArray(),
        recipientSecretKey.toByteArray()
);
1 The MemorySegment::allocateNative method takes the segment size and a ResourceScope.

JEP-389 already had the concept of bounded usage for memory segments with the NativeScope class, but it was still possible to write code that never deallocated native memory. The API in JEP-412 improves over JEP-389 and now requires the user to handle the native segment lifecycle through the ResourceScope type.

The above is completed by wrapping it in a try-with-resources block with a ResourceScope; the scope will take care of freeing the allocated memory segments upon block exit.

crypto_box_keypair with ResourceScope (.java)
MethodHandle crypto_box_keypair = ...

try (var scope = ResourceScope.newConfinedScope()) {
    var recipientPublicKey = MemorySegment.allocateNative(crypto_box_publickeybytes(), scope);
    var recipientSecretKey = MemorySegment.allocateNative(crypto_box_secretkeybytes(), scope);

    crypto_box_keypair.invokeExact(recipientPublicKey.address(),
                                   recipientSecretKey.address());

    return new CryptoBoxKeyPair(
            recipientPublicKey.toByteArray(),
            recipientSecretKey.toByteArray()
    );
}

In order to get the off-heap content back into Java types, the code can call any of the to{The Java Type} methods on the MemorySegment instance (e.g. toByteArray above), they will take care of the conversion.

There’s more to say about the allocation API in JEP-412, please refer to the section Remarks about MemorySegments memory mapping.

Next invoking the sealing method

The next method to call is crypto_box_seal, which also takes pointers and a message length.

crypto_box_seal (.c)
unsigned char ciphertext[CIPHERTEXT_LEN];
crypto_box_seal(ciphertext, MESSAGE, MESSAGE_LEN, recipient_pk);

However when looking at the C signature we notice something unusual for Java developers: the message length argument is of type unsigned long long!

In C or C++, this declaration means the type is at least 8 bytes (64 bits) wide, so a Java long is what is needed.

In particular here’s a breakdown of the signed integers. It is incomplete as they can be declared differently (e.g. long is the same as long int, and long long is the same as long long int); this Wikipedia page has a more complete overview of C data types.

Table 2. Signed integers

int

A signed integer type with the natural size suggested by the architecture of the execution environment, with a minimum of 2 bytes (16 bits, [-32767; +32767]).

On a 64-bit CPU, int is 4 bytes and the range becomes [-2147483647; +2147483647].

long

A signed integer type that is at least 4 bytes wide ([-2147483647; +2147483647]).

On a 64-bit CPU, long is 8 bytes on Linux and macOS (it stays at 4 bytes on 64-bit Windows) and the range becomes [-9223372036854775807; +9223372036854775807].

long long

A signed integer type that is at least 8 bytes wide ([-9223372036854775807; +9223372036854775807]).

On a 64-bit CPU, long long is still 8 bytes long.

When you start to study these C data types a bit more, you’ll notice two things that just don’t match with Java types:

  • unsigned integers: while they have the same width as their signed counterparts, their math is different because their range is different:

    • unsigned long's range is at least [0; +4294967295] (4 bytes), and it matches unsigned long long's where long is 8 bytes

    • unsigned long long's range is [0; +18446744073709551615] (on a 64-bit CPU)

  • long doubles can be wider than 64 bits. I never had to use those, but it seems they can be as big as 128 bits (16 bytes).

As a reminder size_t is unsigned.
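Java has no unsigned primitive types, but the JDK can reinterpret the bits of an int or a long as unsigned values. The sketch below, with class and method names of my own choosing, shows that reinterpretation and a defensive way to narrow a size_t received as a Java long down to an int usable as an array length.

```java
// Hypothetical helper illustrating unsigned handling with plain JDK methods
final class UnsignedValues {
    /** Reads the 32 bits of a Java int as a C unsigned int. */
    static long asUnsignedInt(int bits) {
        return Integer.toUnsignedLong(bits);
    }

    /** Renders the 64 bits of a Java long as a C unsigned long long. */
    static String asUnsignedLongLong(long bits) {
        return Long.toUnsignedString(bits);
    }

    /** Narrows a size_t received as a Java long to an int usable as an array length. */
    static int toArrayLength(long sizeT) {
        if (sizeT < 0) { // top bit set: the unsigned value exceeds Long.MAX_VALUE
            throw new ArithmeticException("size_t too large: " + Long.toUnsignedString(sizeT));
        }
        return Math.toIntExact(sizeT); // throws ArithmeticException above Integer.MAX_VALUE
    }

    public static void main(String[] args) {
        System.out.println(asUnsignedInt(-1));       // 4294967295
        System.out.println(asUnsignedLongLong(-1L)); // 18446744073709551615
        System.out.println(toArrayLength(48L));      // 48
    }
}
```

For the small sizes used in this article the narrowing never fails, which is why mapping them directly to int or long is good enough here.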

crypto_box_seal definition (.c)
SODIUM_EXPORT
int crypto_box_seal(unsigned char *c, const unsigned char *m,
                    unsigned long long mlen, const unsigned char *pk)
            __attribute__ ((nonnull(1, 4)));

For this post, I intend to pass a short String message, which is backed by an array, and array lengths in Java are limited to the non-negative values of an int ([0; +2147483647]).

crypto_box_seal (.java)
var crypto_box_seal = CLinker.getInstance().downcallHandle(
        libsodiumLookup.lookup("crypto_box_seal").get(),
        MethodType.methodType(int.class,
                              MemoryAddress.class, // cipherText, output buffer
                              MemoryAddress.class, // message
                              long.class,          // message length
                              MemoryAddress.class  // publicKey
        ),
        FunctionDescriptor.of(C_INT,
                              C_POINTER,
                              C_POINTER,
                              C_LONG_LONG,
                              C_POINTER)
);

try (var scope = ResourceScope.newConfinedScope()) {
    var segmentAllocator = SegmentAllocator.ofScope(scope);
    var nativeMessage = CLinker.toCString(message, scope);
    var cipherText = segmentAllocator.allocate(crypto_box_sealbytes() + nativeMessage.byteSize());
    var ret = (int) crypto_box_seal.invokeExact(
            cipherText.address(),
            nativeMessage.address(),
            (long) nativeMessage.byteSize(),
            segmentAllocator.allocateArray(C_CHAR, publicKey).address()
    );
    return cipherText.toByteArray();
}

There are a few things to notice:

  1. Compared to JEP-389 (JDK 16), the toCString method no longer takes a charset and always encodes the String to UTF-8. This change means paying attention to native APIs that may not understand characters like 中文 that require more than 1 byte to be encoded. Native APIs that need the length are affected by this detail too: UTF-8 encodes a character in one or more bytes, in other words don’t rely on String::length to count bytes.

    In the above snippet, the String is first encoded, then the length is taken from the memory segment with nativeMessage.byteSize().

    Alternatively the encoding could have been done with a charset via String::getBytes, and the actual size taken from the resulting byte array.

  2. The variable ret is not used. However, due to the dynamic nature of invokeExact, the compiler needs the exact signature at the call site, which is why the result of this invocation is cast and assigned to an int variable even though it is not used.

    Without this assignment the JVM would have raised a WrongMethodTypeException. In this case the exception message helps to identify the type differences in the signature:

    java.lang.invoke.WrongMethodTypeException: expected (MemoryAddress,MemoryAddress,long,MemoryAddress)int but found (MemoryAddress,MemoryAddress,long,MemoryAddress)void
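The call-site rule of invokeExact can be reproduced without any native code. The following sketch uses a plain method handle on Math.abs(int) as a stand-in for a downcall handle (the class and method names are mine): with the cast, the call site is (int)int and matches the handle; as a bare statement, the call site becomes (int)void and is rejected.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.WrongMethodTypeException;

final class InvokeExactDemo {
    // Handle on Math.abs(int); its type is (int)int
    static final MethodHandle ABS = lookupAbs();

    private static MethodHandle lookupAbs() {
        try {
            return MethodHandles.lookup().findStatic(
                    Math.class, "abs", MethodType.methodType(int.class, int.class));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** The cast makes the call site (int)int, which matches the handle exactly. */
    static int exactCall() {
        try {
            return (int) ABS.invokeExact(-5);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    /** As a bare statement the call site is (int)void, so the invocation is rejected. */
    static boolean discardedResultIsRejected() {
        try {
            ABS.invokeExact(-5);
            return false;
        } catch (WrongMethodTypeException e) {
            return true;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(exactCall());                 // 5
        System.out.println(discardedResultIsRejected()); // true
    }
}
```

When the result is genuinely irrelevant, MethodHandle::invoke is more lenient about the call-site type, at the price of possible conversions.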

Ending the crypto box example

The last method call of this snippet ends the libsodium crypto box example. The method crypto_box_seal_open takes pointers and a ciphertext length, so let’s apply again what has been done for crypto_box_seal.

crypto_box_seal_open (.c)
unsigned char decrypted[MESSAGE_LEN];
if (crypto_box_seal_open(decrypted, ciphertext, CIPHERTEXT_LEN,
    recipient_pk, recipient_sk) != 0) {
    /* message corrupted or not intended for this recipient */
}

Which translates to

crypto_box_seal_open (.java)
var crypto_box_seal_open = getInstance().downcallHandle(
        libsodiumLookup.lookup("crypto_box_seal_open").get(),
        MethodType.methodType(int.class,
                              MemoryAddress.class, // message
                              MemoryAddress.class, // cipherText
                              long.class,          // cipherText.length
                              MemoryAddress.class, // public key
                              MemoryAddress.class  // secret key
        ),
        FunctionDescriptor.of(C_INT,
                              C_POINTER,
                              C_POINTER,
                              C_LONG_LONG,
                              C_POINTER,
                              C_POINTER
        )
);

try (var scope = ResourceScope.newConfinedScope()) {
    var allocator = SegmentAllocator.ofScope(scope); (1)
    var decipheredText = allocator.allocateArray(C_CHAR,
                                                 cipherText.length - crypto_box_sealbytes());
    var ret = (int) crypto_box_seal_open.invokeExact(decipheredText.address(),
                                                     allocator.allocateArray(C_CHAR, cipherText).address(),
                                                     (long) cipherText.length,
                                                     allocator.allocateArray(C_CHAR, publicKey).address(),
                                                     allocator.allocateArray(C_CHAR, secretkey).address());

    return CLinker.toJavaString(decipheredText); (2)
}
1 MemorySegment offers an API to allocate segments, but SegmentAllocator offers a better API to allocate arrays.
2 In JDK 16, using toJavaString raised an IndexOutOfBoundsException with the message Out of bound access on segment MemorySegment{ id=0x6f11d841 limit: 20 }; new offset = 20; new length = 1.

Indeed, during my first use of the foreign linker API in JDK 16, I used String::length to indicate the number of bytes to seal, and a Java String length doesn’t include the null character \0 that terminates a C string. This caused the bound issue during the reverse operation, toJavaString.

The seal example with the JDK 17 API uses the memory segment’s length, which prevents this issue from happening.

This reminds us that one has to be careful with String and encodings.
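To make the difference between characters and bytes concrete, here is a small sketch in plain Java, without the foreign API, that encodes a String as a null-terminated C string with an explicit charset and decodes it back. The helper names are hypothetical, and the single trailing zero byte is only a valid terminator for charsets such as ASCII, ISO-8859-1 or UTF-8 (not for UTF-16).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper; manual counterpart of toCString / toJavaString with a charset
final class CStrings {
    /** Encodes a String as a NUL-terminated C string in the given charset. */
    static byte[] toCString(String s, Charset charset) {
        byte[] encoded = s.getBytes(charset);
        return Arrays.copyOf(encoded, encoded.length + 1); // copyOf pads with a trailing 0
    }

    /** Decodes the bytes up to, and not including, the first NUL. */
    static String fromCString(byte[] bytes, Charset charset) {
        int end = 0;
        while (end < bytes.length && bytes[end] != 0) {
            end++;
        }
        return new String(bytes, 0, end, charset);
    }

    public static void main(String[] args) {
        String message = "\u4e2d\u6587"; // the two characters 中文
        byte[] cString = toCString(message, StandardCharsets.UTF_8);

        System.out.println(message.length()); // 2 chars
        System.out.println(cString.length);   // 7 bytes: 3 + 3 + the terminator
        System.out.println(fromCString(cString, StandardCharsets.UTF_8).equals(message)); // true
    }
}
```

The resulting byte array could then be copied to native memory with SegmentAllocator::allocateArray, as done for the keys above.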

As a side note, in this snippet too I have intentionally ignored the status returned by crypto_box_seal_open in order to focus on the foreign module API, but it would make sense to check the returned value before returning the buffer, just as suggested in the libsodium documentation.
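Such a check could be as small as the sketch below. The helper is hypothetical (it is not part of libsodium nor of the JDK) and only relies on the convention visible in the C example, where any status other than 0 means the message is corrupted or not intended for this recipient.

```java
// Hypothetical helper for checking libsodium status codes
final class SodiumStatus {
    /** Throws when a libsodium function reports a non-zero status. */
    static void requireSuccess(String function, int status) {
        if (status != 0) {
            throw new IllegalStateException(function + " failed with status " + status);
        }
    }

    public static void main(String[] args) {
        requireSuccess("crypto_box_seal_open", 0); // passes silently
        try {
            requireSuccess("crypto_box_seal_open", -1);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // crypto_box_seal_open failed with status -1
        }
    }
}
```

In the decryption snippet it would be called with ret right after invokeExact, before converting the deciphered segment to a String.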

More interestingly this example introduces the SegmentAllocator of JEP-412, which offers a richer API that can use layouts; in particular it can be used for array allocation.

SegmentAllocator provides different allocation strategies.

Table 3. Different segment allocators

SegmentAllocator.ofScope(ResourceScope)

It is a regular allocator for native memory. It uses a standard malloc call. The newly allocated segments will all be cleaned up when the scope closes.

SegmentAllocator.ofSegment(MemorySegment)

This allows to reuse, or recycle, the same memory segment.

Allocated segments are all sub-parts of this parent memory segment. This is useful to limit malloc operations, as they are known to be expensive.

SegmentAllocator.arenaAllocator(scope)

This allocator does region-based memory management.

The short version of arena memory management is: the allocator allocates a chunk of memory and either uses a slice of that chunk, or allocates a new chunk of memory to satisfy the allocation request. Since segments are scoped inside a ResourceScope, they are freed, and their slices can be used again. This allocator is useful to limit costly malloc operations, yet allows more flexibility than the alternative segment recycling.

The factory has an overload that takes a size; in this case allocations are possible until the arena is exhausted, i.e. it won’t add a new underlying chunk of memory.

All allocators are thread safe, but a confined scope will restrict the allocation to the owner thread.
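To illustrate the idea behind the bounded arena, here is a toy bump allocator written with plain ByteBuffers. It is not how the JDK implements SegmentAllocator, just a sketch under my own naming: one block is allocated upfront, each allocation returns a slice of it, and once the block is exhausted no further allocation is possible. Unlike the real allocators, this toy is not thread safe.

```java
import java.nio.ByteBuffer;

// Toy illustration of region-based allocation, not the JDK implementation
final class BumpAllocator {
    private final ByteBuffer block; // the single chunk allocated upfront
    private int offset;             // index of the first free byte in the chunk

    BumpAllocator(int capacity) {
        this.block = ByteBuffer.allocateDirect(capacity);
    }

    /** Returns a view over the next size bytes of the chunk. */
    ByteBuffer allocate(int size) {
        if (size < 0 || size > remaining()) {
            throw new IllegalStateException(
                    "chunk exhausted: requested " + size + ", remaining " + remaining());
        }
        ByteBuffer view = block.duplicate(); // shares the memory, has its own position and limit
        view.position(offset);
        view.limit(offset + size);
        offset += size;                      // bump the pointer, no new native allocation
        return view.slice();
    }

    int remaining() {
        return block.capacity() - offset;
    }

    public static void main(String[] args) {
        BumpAllocator allocator = new BumpAllocator(32);
        ByteBuffer publicKey = allocator.allocate(8);
        ByteBuffer secretKey = allocator.allocate(8);
        publicKey.put(0, (byte) 1);             // writes go to the shared chunk
        System.out.println(secretKey.get(0));   // 0, the slices do not overlap
        System.out.println(allocator.remaining()); // 16
    }
}
```

The unbounded arena differs in that it would grab a new chunk instead of failing when the current one is exhausted.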

Wrap up on manually using the Foreign Linker API

I didn’t cover everything this API has to offer, like upcall stubs, which are a way to pass a function pointer to native code, nor did I cover every feature of JEP-412, like the MemorySegment or MemoryLayout APIs.

At this time I find this API a pleasure to use compared to JNI. Note that I don’t have experience with JNA, so I may be lacking perspective there.

There are a few pitfalls to be aware of when using APIs that take pointers or references: String encoding is of particular interest, and MemorySegment lifecycles get more complicated if those segments are shared between threads. Overall I found the API well-designed and well documented, but if you’re a novice in this area, you’ll likely need other reading materials. Package-wide documentation in jdk.incubator.foreign would definitely fill this gap in my opinion.

The chosen example was concise in native code, but writing the stubs in Java quickly becomes tedious and verbose. JDK developers felt the same way, as they are also investing energy in a tool named jextract whose goal is to reduce the amount of tedious work. I’ll show in a section below what can be done with the current state of jextract.

Remarks about MemorySegments memory mapping

MemorySegments have the same constraints as DirectByteBuffers, i.e. a segment can’t go over Runtime.getRuntime().maxMemory().

Allocating a segment much bigger than maxMemory
Exception in thread "main" java.lang.OutOfMemoryError: Cannot reserve 2147483648 bytes of direct buffer memory (allocated: 8192, limit: 522190848)

This limit is configurable by setting the -XX:MaxDirectMemorySize={size} flag.

var memorySegment = MemorySegment.allocateNative(nativeSegmentSize, scope);

One interesting thing with this API is that the address is accessible via MemorySegment::address, and one can get its hexadecimal representation via Long.toHexString(memorySegment.address().toRawLongValue()).

MemoryAddress::toString
MemoryAddress{ base: null offset=0x7fc513fff010 }

If you are on Linux, you can use pmap from the procps package to inspect the memory mappings of the JVM.

pmap output of a 2 GiB native segment
151:   java --enable-native-access=ALL-UNNAMED --add-modules jdk.incubator.foreign -XX:MaxDirectMemorySize=2100m MemorySegments.java
Address           Kbytes     RSS   Dirty Mode  Mapping
...
0000557635ba1000       4       0       0 r-x-- java
0000557635ba3000       4       0       0 r---- java
0000557635ba4000       4       0       0 rw--- java
0000557636d4b000     132      16      16 rw---   [ anon ]
00007fc513fff000 2097156 1811456 1811456 rw---   [ anon ] (1)
00007fc594000000     132       0       0 rw---   [ anon ]
00007fc594021000   65404       0       0 -----   [ anon ]
...
1 This is the allocated segment: 2 GiB ⇐⇒ 2097152 KiB, and the mapping is larger than that by one page (4 KiB). In fact the base address of the segment is 0x7fc513fff010, 16 bytes after the start of the mapping.

In this case the extra page is not related to alignment, although alignment can have this effect. What is important is that the address of a MemorySegment may be contained in a larger memory mapping.
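The arithmetic behind this observation, spelled out with the values of the pmap output and of the MemoryAddress above (the class and method names are only there for illustration):

```java
// Worked arithmetic on the values shown in the pmap output
final class MappingArithmetic {
    static final long KIB = 1024L;

    /** Distance in bytes between the segment base address and the start of its mapping. */
    static long offsetInMapping(long segmentAddress, long mappingStart) {
        return segmentAddress - mappingStart;
    }

    static long toKiB(long bytes) {
        return bytes / KIB;
    }

    public static void main(String[] args) {
        long mappingStart = 0x00007fc513fff000L;     // Address column of pmap
        long segmentAddress = 0x7fc513fff010L;       // MemoryAddress offset
        long mappingKiB = 2_097_156L;                // Kbytes column of pmap
        long segmentBytes = 2L * 1024 * 1024 * 1024; // the 2 GiB segment

        System.out.println(offsetInMapping(segmentAddress, mappingStart)); // 16
        System.out.println(toKiB(segmentBytes));                           // 2097152
        System.out.println(mappingKiB - toKiB(segmentBytes));              // 4
    }
}
```

So the segment starts 16 bytes into a mapping that is exactly one 4 KiB page larger than the requested size.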

One important and useful distinction with DirectByteBuffers is that the ResourceScope owning a MemorySegment can be closed explicitly (ResourceScope::close), which immediately frees the native mapping. DirectByteBuffers used to be challenging because they have no explicit method to free the native mapping, and as such have to wait for the GC to kick in to be freed.

Initialization

Another thing to keep in mind is that the memory mapping is zeroed, which means a big segment will take a noticeable time to initialize. As with DirectByteBuffers, this pattern is interesting to know when inspecting off-heap memory.

Scope

Usually it is more practical to use the ResourceScope API, as it makes it easier to reason about the boundaries of the involved memory mapping. Using a larger MemorySegment could be interesting when it has to be sliced and shared among various threads. Also, given the high initialization cost of large segments, such a segment is likely to have the same lifecycle as the application. Typically, in a few years, Netty, Aeron, Kafka, Cassandra, … could make use of this API!

Slices

One thing that caught me off-guard with JEP-389 is that closing a slice (created by MemorySegment::asSlice) also closed the underlying segment. This is no longer the case with JEP-412, since MemorySegment is no longer AutoCloseable. Problem solved.

Access modes

The READ, WRITE, CLOSE access modes and the related API disappeared from MemorySegment; now the only choice is to return a read-only view of the segment via MemorySegment::asReadOnly, which is more limited but way more intuitive to use.

File API

Until JEP-389, we used a FileChannel and a MappedByteBuffer to memory map a file. JEP-389 also takes care of this use case with the mapFile factory method. JEP-412 amends this API with a ResourceScope parameter.

try (var scope = ResourceScope.newConfinedScope()) {
    MemorySegment.mapFile(path, (1)
                          0, (2)
                          Files.size(path), (3)
                          FileChannel.MapMode.READ_ONLY, (4)
                          scope);
  // ...
}
1 A path e.g. Path.of("…​")
2 The base offset
3 The size of the mapping, here the complete file
4 The mapping mode

The MemorySegment is no longer AutoCloseable; instead, the mapping is freed immediately when the code leaves the try-with-resources block and the scope closes.

Also, with JEP-412, MemorySegment gains some API (MemorySegment::load, MemorySegment::unload, MemorySegment::force) to force IO operations. The force method looks particularly useful to force a write operation (fsync) so that modified pages are flushed to a colder storage such as a disk.
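For comparison, here is a minimal sketch of the older FileChannel and MappedByteBuffer route mentioned at the beginning of this section, which has similar load and force operations. The file is a temporary one created for the demonstration and the class name is mine. Note that the mapping itself has no explicit release: closing the channel doesn't unmap it, only the GC does (which can prevent the file deletion on Windows).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MappedFileDemo {
    /** Writes a temp file, capitalizes its first byte through a mapping, reads it back. */
    static String roundTrip() {
        try {
            Path path = Files.createTempFile("mapped", ".txt");
            try {
                Files.writeString(path, "sealed", StandardCharsets.US_ASCII);
                try (FileChannel channel = FileChannel.open(path,
                        StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                    MappedByteBuffer mapped =
                            channel.map(FileChannel.MapMode.READ_WRITE, 0, channel.size());
                    mapped.load();  // hint to bring the content into physical memory
                    byte first = mapped.get(0);
                    mapped.put(0, (byte) Character.toUpperCase((char) first));
                    mapped.force(); // write the modified pages back to the storage device
                }
                return Files.readString(path, StandardCharsets.US_ASCII);
            } finally {
                Files.deleteIfExists(path);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip()); // Sealed
    }
}
```

With MemorySegment.mapFile the same steps apply, except that the lifetime of the mapping is bound to the ResourceScope.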

JEP-389, now JEP-412 foreign functions and memory is still incubating

In JDK 17, MemorySegment dropped AutoCloseable, NativeScope was replaced by ResourceScope, LibraryLookup was lost in favor of the SymbolLookup API, which has a different scope, and SegmentAllocator appeared. jextract saw very good improvements; it seems mature enough to be featured in a standard JDK, yet it is not part of the incubator module. In fact, jextract might never be part of the JDK itself as it might be distributed by other mechanisms, see this discussion.

Given all this, I am not sure JEP-412 will get out of incubation for JDK 18 either. JEP-412 is working well and shows great refinements, but to me the developers are still working on getting the API right; indeed a broken API could lead to broken applications. As with the previous incubator, I think they are doing a fantastic job.

jextract

jextract is still being baked and was not included in JDK 17 for incubation, but since it complements JEP-412, I wanted to give it a try and showcase its usefulness.

The jextract version used in this entry comes from the build 17-panama+3-167 that can be downloaded here.

This tool leverages the native libclang and the jdk.incubator.foreign module.

In order to be able to use it, one should download the panama JDK here: https://jdk.java.net/panama/. Don’t be scared by early access, JDK 17 (very early at this stage) or the other warnings, you just need to use jextract, not the panama JDK.

Again, the jextract tool is still being baked at this time. That means that everything below can become obsolete at any time.

Extracting Java linking code from the libsodium headers

The first thing I need is to get the headers of libsodium: either use the headers installed by homebrew with symbolic links placed in /usr/local/include (or $(brew --prefix)/include), or clone the repo (make sure to check out the correct tag for the installed binary library, 1.0.18 at this time).

First contact with jextract

jextract first use
$ jextract \
  -d src/main/java \ (1)
  -l sodium \ (2)
  --target-package com.github.bric3.sodium \ (3)
  -I $(brew --prefix)/include/sodium \ (4)
  $(brew --prefix)/include/sodium.h (5)
WARNING: Using incubator modules: jdk.incubator.foreign, jdk.incubator.jextract
/usr/local/include/sodium/crypto_hash_sha512.h:13:10: fatal error: 'stdlib.h' file not found
1 Destination of the generated sources
2 Specifies the name of the library. This option is important as it drives the way the library is loaded: with -l sodium the library has to be available on the java.library.path.
3 Indicates the target package of the generated source
4 Includes of the library (some files include others in the library)
5 The C header file

Obviously some standard C headers are not discovered by jextract.

macOS

On macOS the solution is to use the headers installed by Xcode, at this location:

/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include

Linux

On Linux the above command used to fail for an equivalent reason, and I had to locate the local compiler includes, e.g. /usr/lib/gcc/x86_64-redhat-linux/8/include on Fedora. Now, with the build 17-panama+3-167, jextract worked fine.

This issue is tracked by the ticket JDK-8262127.

Also, I noticed that jextract generates classes by default, but you can pass the --source option to make it generate sources instead.

Possible problems when working with libsodium repository clone

jextract might fail the extraction process on the file version.h.

Reminder, in the libsodium repository, headers are located in this folder src/libsodium/include.

Includes the compiler headers
$ jextract \
  -d src/main/java \
   -l sodium \
   --source \ (1)
   --target-package com.github.bric3.sodium \
   -I /usr/lib/gcc/x86_64-redhat-linux/8/include \ (2)
   -I src/libsodium/include/ \
   -I src/libsodium/include/sodium \
   src/libsodium/include/sodium.h
src/libsodium/include/sodium.h:5:10: fatal error: 'sodium/version.h' file not found
1 Generates the sources
2 the compiler includes installed on this linux image

In the libsodium repository there’s a file named version.h.in, and upon inspection of its content I noticed placeholders that suggest a preliminary phase of the libsodium build generates the final version.h. In native sources this usually happens via a combination of ./autogen.sh and ./configure.

Let’s prepare the code base.

Configure libsodium codebase
$ ./autogen.sh
autoreconf: Entering directory `.'
autoreconf: configure.ac: not using Gettext
autoreconf: running: aclocal --force -I m4
autoreconf: configure.ac: tracing
autoreconf: configure.ac: creating directory build-aux
autoreconf: running: libtoolize --copy --force
libtoolize: putting auxiliary files in AC_CONFIG_AUX_DIR, 'build-aux'.
libtoolize: copying file 'build-aux/ltmain.sh'
libtoolize: putting macros in AC_CONFIG_MACRO_DIRS, 'm4'.
libtoolize: copying file 'm4/libtool.m4'
libtoolize: copying file 'm4/ltoptions.m4'
libtoolize: copying file 'm4/ltsugar.m4'
libtoolize: copying file 'm4/ltversion.m4'
libtoolize: copying file 'm4/lt~obsolete.m4'
autoreconf: running: /usr/bin/autoconf --force
autoreconf: configure.ac: not using Autoheader
autoreconf: running: automake --add-missing --copy --force-missing
configure.ac:75: installing 'build-aux/compile'
configure.ac:9: installing 'build-aux/config.guess'
configure.ac:9: installing 'build-aux/config.sub'
configure.ac:10: installing 'build-aux/install-sh'
configure.ac:10: installing 'build-aux/missing'
src/libsodium/Makefile.am: installing 'build-aux/depcomp'
parallel-tests: installing 'build-aux/test-driver'
autoreconf: Leaving directory `.'
Downloading config.guess and config.sub...
Done.

$ ./configure
checking build system type... x86_64-pc-linux-gnu
checking host system type... x86_64-pc-linux-gnu
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /usr/bin/mkdir -p
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
checking whether UID '0' is supported by ustar format... yes
checking whether GID '0' is supported by ustar format... yes
checking how to create a ustar tar archive... gnutar
checking whether make supports nested variables... (cached) yes
checking whether to enable maintainer-specific portions of Makefiles... no
checking whether make supports the include directive... yes (GNU style)
checking for gcc... gcc
...
configure: creating ./config.status
config.status: creating Makefile
config.status: creating builds/Makefile
config.status: creating contrib/Makefile
config.status: creating dist-build/Makefile
config.status: creating libsodium.pc
config.status: creating libsodium-uninstalled.pc
config.status: creating msvc-scripts/Makefile
config.status: creating src/Makefile
config.status: creating src/libsodium/Makefile
config.status: creating src/libsodium/include/Makefile
config.status: creating src/libsodium/include/sodium/version.h (1)
config.status: creating test/default/Makefile
config.status: creating test/Makefile
config.status: executing depfiles commands
config.status: executing libtool commands
1 Configuring version.h with version values

Finally, this time jextract worked as expected.

Narrowing down the extraction

Looking at the generated classes, there’s a bag of 288 files, not even mentioning the symbols in these types.

When I looked at jextract during my review of JEP 389, it had a --filter option that was supposed to only emit the symbols of a specific file. At the time of writing, this option is gone and has been replaced by a different mechanism.

The previous mechanism filtered headers by their path, whereas the new mechanism allows filtering by symbol kind and name; see these options in the help message.

include-(function|macro|struct|typedef|union|var) options
--include-function <String>    name of function to include
--include-macro <String>       name of constant macro to include
--include-struct <String>      name of struct definition to include
--include-typedef <String>     name of type definition to include
--include-union <String>       name of union definition to include
--include-var <String>         name of global variable to include

At first, listing every symbol (functions, data types, variables, etc.) looks like a huge effort, but there’s a nifty trick: jextract comes with --dump-includes. This option alters jextract’s behavior: it won’t generate source or class bindings, but will instead dump the symbols in the given file.

dumping symbols configuration
jextract \
  -d src/main/java \
  -l sodium \
  --source \
  --target-package com.github.bric3.sodium \
  -I /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include \
  -I $(brew --prefix)/include/sodium \
  --dump-includes sodium.conf \ (1)
  $(brew --prefix)/include/sodium.h
WARNING: Using incubator modules: jdk.incubator.jextract, jdk.incubator.foreign
WARNING: skipping strtold because of unsupported type usage: long double
WARNING: Layout size not available for sys_errlist
1 the dump option
sodium.conf
#### Extracted from: /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/AvailabilityVersions.h

--include-macro MAC_OS_VERSION_11_0         # header: /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/AvailabilityVersions.h
--include-macro MAC_OS_X_VERSION_10_0       # header: /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/AvailabilityVersions.h
--include-macro MAC_OS_X_VERSION_10_1       # header: /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/AvailabilityVersions.h
--include-macro MAC_OS_X_VERSION_10_10

...

#### Extracted from: /usr/local/include/sodium/core.h

--include-function sodium_init               # header: /usr/local/include/sodium/core.h
--include-function sodium_misuse             # header: /usr/local/include/sodium/core.h
--include-function sodium_set_misuse_handler # header: /usr/local/include/sodium/core.h

...

When looking at the generated file (sodium.conf), we notice that jextract actually wrote the --include-(function|macro|struct|typedef|union|var) options for each symbol it found; moreover, jextract indicates in which header each symbol was found.

The final part of this trick is that this file can be passed on the command line:

jextract \
  -d src/main/java \
  -l sodium \
  --source \
  --target-package com.github.bric3.sodium \
  -I /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include \
  -I $(brew --prefix)/include/sodium \
  @sodium.conf \ (1)
  $(brew --prefix)/include/sodium.h
1 Pass the option file into jextract, notice the preceding @.

By editing the sodium.conf file and removing everything unrelated to libsodium, it was possible to cut down the generated bindings by more than half. Depending on the required API usage, it is of course possible to remove even more by selecting the symbols more aggressively.
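The manual clean-up of sodium.conf can also be scripted. Here is a minimal sketch, assuming the dump keeps the trailing "# header:" comment shown above; IncludeFilter and keepHeadersUnder are names I made up for illustration, they are not part of jextract.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper (not part of jextract): keeps the dumped --include-* options
// whose trailing "# header:" comment points into the given directory.
class IncludeFilter {
    private static final String HEADER_MARKER = "# header:";

    static List<String> keepHeadersUnder(List<String> lines, String directory) {
        return lines.stream()
                .filter(line -> line.startsWith("--include-"))
                .filter(line -> {
                    int marker = line.indexOf(HEADER_MARKER);
                    // Options without a header comment cannot be attributed, so they are dropped.
                    return marker >= 0
                            && line.substring(marker + HEADER_MARKER.length()).trim().startsWith(directory);
                })
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> dumped = List.of(
                "#### Extracted from: /usr/local/include/sodium/core.h",
                "--include-macro MAC_OS_VERSION_11_0 # header: /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/AvailabilityVersions.h",
                "--include-function sodium_init # header: /usr/local/include/sodium/core.h",
                "--include-function sodium_misuse # header: /usr/local/include/sodium/core.h");
        // Only the two libsodium functions remain.
        keepHeadersUnder(dumped, "/usr/local/include/sodium/").forEach(System.out::println);
    }
}
```

In practice the lines would come from Files.readAllLines(Path.of("sodium.conf")) and the result would be written to the trimmed option file, but the filtering rule stays the same.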

One could even go further and move the other options (-d, -l, --source, --target-package, etc.) into this option file, making the command even simpler:

$ jextract @sodium-only.conf $(brew --prefix)/include/sodium.h

Even the last argument, $(brew --prefix)/include/sodium.h, can be appended to the configuration file, which reduces the command to its simplest form: jextract @sodium-only.conf.

Remember that shell expansions such as $(brew --prefix) are not performed inside the option file; such paths must be expanded manually.

This work was part of the following ticket JDK-8260976.

Generated files
$ \ls -lh src/main/java/com/github/bric3/sodium
total 1944
-rw-r--r--  1 brice  staff   8.9K Sep  4 14:50 RuntimeHelper.java
-rw-r--r--  1 brice  staff   1.9K Sep  4 14:50 constants$0.java
-rw-r--r--  1 brice  staff   2.2K Sep  4 14:50 constants$1.java
...
-rw-r--r--  1 brice  staff    14K Sep  4 14:50 randombytes_implementation.java
-rw-r--r--  1 brice  staff   398K Sep  4 14:50 sodium_h.java
-rw-r--r--  1 brice  staff   1.1K Sep  4 14:50 sodium_set_misuse_handler$handler.java

Invoking the library

Let’s have a look at what jextract generated. The entry point is the class sodium_h. In particular, let’s compare the method stubs to the ones I wrote earlier:

  • crypto_box_sealbytes

  • crypto_box_keypair

  • crypto_box_seal

  • crypto_box_seal_open

The libsodium headers declare a function named crypto_box_sealbytes, whose role is to return the constant crypto_box_SEALBYTES. This constant is defined with the C preprocessor directive #define, so it is not visible as a symbol when performing a library lookup. The native crypto_box_sealbytes function compensates for this limitation.

jextract, however, reads the headers, and in doing so it actually extracts the constant crypto_box_SEALBYTES. The constant is still exposed as a method, though.

I noticed that if the library has lots of symbols, the jextract bindings use inheritance: there’s a single entry point, the public type sodium_h, and this type inherits from package-visible classes like sodium_h_0, sodium_h_1 and so on. The members of these package-visible classes are public, and by inheritance these members are accessible via the public entry point.

sodium_h.crypto_box_SEALBYTES()
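To make the inheritance trick more concrete, here is a small sketch of the same layout in plain Java. All names and values are hypothetical (bindings_h, GREETING_BYTES, answer); they only mimic the shape of the generated sources and are not taken from them.

```java
// Demo entry point, declared first so the file can be launched directly with `java`.
class StaticInheritanceDemo {
    public static void main(String[] args) {
        // Both static members are reached through the single entry point.
        System.out.println(bindings_h.GREETING_BYTES());
        System.out.println(bindings_h.answer());
    }
}

// Package-visible holder with public static members (hypothetical content).
class bindings_h_0 {
    public static long GREETING_BYTES() {
        return 48L;
    }
}

// Second holder, chained to the first one.
class bindings_h_1 extends bindings_h_0 {
    public static int answer() {
        return 42;
    }
}

// Entry point: declares nothing itself, static members are inherited from the chain.
// In generated bindings this type would be public; it is package-visible here only
// to keep everything in a single source file.
final class bindings_h extends bindings_h_1 {
    private bindings_h() {
    }
}
```

Splitting the members over several classes keeps each class file under the JVM size limits while callers only ever see one type.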

Library loading

Remember the -l sodium option passed to jextract: it makes the generated code load the library via the well-known System.loadLibrary("sodium") when the generated type (sodium_h) is loaded.

This operation expects the library to be available on the java library path, the one set via this property System.getProperty("java.library.path"), or amended via JAVA_LIBRARY_PATH.

If the library was installed in one of the lookup paths there’s no issue, but if it wasn’t you need to alter the Java library path.

linux

/usr/java/packages/lib:/usr/lib64:/lib64:/lib:/usr/lib

macOs

/Users/bric3/Library/Java/Extensions:/Library/Java/Extensions:/Network/Library/Java/Extensions:/System/Library/Java/Extensions:/usr/lib/java:.

Otherwise, the code will fail with the following stacktrace

no sodium in java.library.path: /Users/brice/Library/Java/Extensions:/Library/Java/Extensions:/Network/Library/Java/Extensions:/System/Library/Java/Extensions:/usr/lib/java:.
java.lang.UnsatisfiedLinkError: no sodium in java.library.path: /Users/brice/Library/Java/Extensions:/Library/Java/Extensions:/Network/Library/Java/Extensions:/System/Library/Java/Extensions:/usr/lib/java:.
	at java.base/java.lang.ClassLoader.loadLibrary(ClassLoader.java:2429)
	at java.base/java.lang.Runtime.loadLibrary0(Runtime.java:818)
	at java.base/java.lang.System.loadLibrary(System.java:1989)
	at com.github.bric3.libsodium.sodium_h.<clinit>(sodium_h.java:13)
	at com.github.bric3.sodium.Libsodium$JextractedLibsodium.crypto_box_keypair(Libsodium.java:283)
	at com.github.bric3.sodium.LibsodiumTest.can_invoke_crypto_box_keypair(LibsodiumTest.java:45)
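
When this error shows up, it helps to see which files the JVM was actually looking for. The following is a rough diagnostic sketch; LibraryPathCheck is a made-up name, and the listing is only an approximation since the JVM also searches its own boot library path.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper: lists the files that a System.loadLibrary(name)
// call could match on a given java.library.path value.
class LibraryPathCheck {
    static List<Path> candidates(String libraryPath, String name) {
        // Platform specific file name, e.g. libsodium.so on Linux, libsodium.dylib on macOS.
        String fileName = System.mapLibraryName(name);
        List<Path> result = new ArrayList<>();
        for (String directory : libraryPath.split(File.pathSeparator)) {
            if (!directory.isEmpty()) {
                result.add(Path.of(directory, fileName));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String libraryPath = System.getProperty("java.library.path", "");
        for (Path candidate : candidates(libraryPath, "sodium")) {
            System.out.println((Files.exists(candidate) ? "found   " : "missing ") + candidate);
        }
    }
}
```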

This is a nice improvement over my previous try of jextract generated code: before, the stacktrace was a bit less obvious and the code harder to change, because the loading mechanism was nested deep in the generated code.

But if one needs to load the library from a custom path, e.g. from a jar that packs native libraries (and extracts them in some temporary folder), it’s possible to drop the -l sodium option; in this case the generated code simply won’t emit the System::loadLibrary call in the static initialization of sodium_h. Instead, it becomes necessary to load the library manually, according to your needs.

System.load("/tmp/path/to/libsodium.so"); (1)
sodium_h.crypto_kdf_blake2b_keybytes(); (2)
1 Load the library
2 Simply use the library bindings

This is a direct improvement (see JDK-8262126) over my previous use of jextract, where loading a library from a specific location was difficult to do.
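
For the jar-packaged case mentioned above, the extraction step could look like the sketch below. It rests on assumptions of mine: the resource location /native/<file name> and the NativeLibraryExtractor class are hypothetical, and real projects usually add checks (architecture, checksum, clean-up) that are left out here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: copies a bundled native library to a temporary folder
// so that it can be handed to System.load.
class NativeLibraryExtractor {
    static Path extract(InputStream library, String fileName) {
        try {
            Path directory = Files.createTempDirectory("native-libs");
            Path target = directory.resolve(fileName);
            Files.copy(library, target, StandardCopyOption.REPLACE_EXISTING);
            // Deletion happens in reverse registration order: the file first, then the directory.
            directory.toFile().deleteOnExit();
            target.toFile().deleteOnExit();
            return target.toAbsolutePath();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        String fileName = System.mapLibraryName("sodium");
        // Assumption: the library is packaged under /native/ inside the jar.
        try (InputStream in = NativeLibraryExtractor.class.getResourceAsStream("/native/" + fileName)) {
            if (in == null) {
                System.out.println("No bundled library at /native/" + fileName);
                return;
            }
            Path extracted = extract(in, fileName);
            System.load(extracted.toString()); // System.load expects an absolute path
            // From here, bindings generated without the -l option can be used.
        }
    }
}
```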

Now implementing the other functions

Now let’s profit from the generated function calls. In the same order as before, I’d like to start with crypto_box_keypair, which is straightforward. The arguments are still carrier types like MemorySegment, which means we still need to take care of the scope and lifecycle of these allocations.

crypto_box_keypair
try (var scope = ResourceScope.newConfinedScope()) {
    var segmentAllocator = SegmentAllocator.ofScope(scope);
    var recipientPublicKey = segmentAllocator.allocate(sodium_h.crypto_box_PUBLICKEYBYTES());
    var recipientSecretKey = segmentAllocator.allocate(sodium_h.crypto_box_SECRETKEYBYTES());
    sodium_h.crypto_box_keypair(recipientPublicKey, recipientSecretKey); (1)
    return new CryptoBoxKeyPair(
            recipientPublicKey.toByteArray(),
            recipientSecretKey.toByteArray()
    );
}
1 Use the jextracted method

The IDE might suggest a method named crypto_box_keypair$MH ; the suffix $MH simply indicates this returns the Method Handle for this native method which is basically what I showed in the first part of this blog post.

As a reflex, I always like to navigate the code I’m invoking. The methods we invoke are just the public API methods: they check for null and declare a correct call site (correct return type, correct argument types).

sodium_h.crypto_box_keypair
public static MethodHandle crypto_box_keypair$MH() {
    return RuntimeHelper.requireNonNull(constants$22.crypto_box_keypair$MH,
                                        "crypto_box_keypair");
}
public static int crypto_box_keypair ( Addressable pk,  Addressable sk) {
    var mh$ = RuntimeHelper.requireNonNull(constants$22.crypto_box_keypair$MH,
                                           "crypto_box_keypair");
    try {
        return (int)mh$.invokeExact(pk.address(), sk.address());
    } catch (Throwable ex$) {
        throw new AssertionError("should not reach here", ex$);
    }
}
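
The same wrapper shape can be reproduced with a plain JDK method handle, which makes it easy to experiment without the incubator module. This is only a sketch: Math.max stands in for the native function, and HandleWrapperDemo is a name I chose for the example.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class HandleWrapperDemo {
    // Resolved once and kept in a constant, like the generated constants holder does.
    static final MethodHandle MAX$MH = lookupMax();

    private static MethodHandle lookupMax() {
        try {
            return MethodHandles.lookup().findStatic(
                    Math.class, "max", MethodType.methodType(int.class, int.class, int.class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Typed wrapper: invokeExact requires the call site to match (int, int) -> int exactly,
    // hence the explicit cast of the result.
    static int max(int a, int b) {
        try {
            return (int) MAX$MH.invokeExact(a, b);
        } catch (Throwable ex) {
            throw new AssertionError("should not reach here", ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(max(3, 7));
    }
}
```

The typed wrapper is what keeps callers away from invokeExact, whose signature checks only fail at run time.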

Going further down to see how the MethodHandle is declared:

constants$22.crypto_box_keypair$MH
static final FunctionDescriptor crypto_box_keypair$FUNC = FunctionDescriptor.of(
    C_INT,
    C_POINTER,
    C_POINTER
);

static final MethodHandle crypto_box_keypair$MH = RuntimeHelper.downcallHandle(
    sodium_h.LIBRARIES,
    "crypto_box_keypair",
    "(Ljdk/incubator/foreign/MemoryAddress;Ljdk/incubator/foreign/MemoryAddress;)I", (1)
    constants$22.crypto_box_keypair$FUNC,
    false
);
1 Note that the Java method signature is declared with a String instead of the Java API MethodType.

This code creates the down-call stub. The only difference with the handcrafted handle in the section above is that the method signature is declared as a String.

Breakdown of (Ljdk/incubator/foreign/MemoryAddress;Ljdk/incubator/foreign/MemoryAddress;)I
  • Ljdk/incubator/foreign/MemoryAddress; ⇒ arg0

  • Ljdk/incubator/foreign/MemoryAddress; ⇒ arg1

  • I ⇒ int return type
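
The standard MethodType API can convert between such descriptor strings and types, which is handy to double-check a signature. A small sketch follows; it uses String parameters rather than MemoryAddress so that it runs without adding the incubator module, and DescriptorDemo is just an illustrative name.

```java
import java.lang.invoke.MethodType;

class DescriptorDemo {
    // Parses a JVM method descriptor into a MethodType, resolving classes with this class's loader.
    static MethodType parse(String descriptor) {
        return MethodType.fromMethodDescriptorString(descriptor, DescriptorDemo.class.getClassLoader());
    }

    public static void main(String[] args) {
        MethodType type = parse("(Ljava/lang/String;Ljava/lang/String;)I");
        System.out.println(type.parameterList()); // the two argument types
        System.out.println(type.returnType());    // the return type, int here

        // And the other way around, from types to the descriptor string.
        System.out.println(
                MethodType.methodType(int.class, String.class, String.class).toMethodDescriptorString());
    }
}
```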

The other two methods in this example, crypto_box_seal and crypto_box_seal_open, are similar and don’t require the tedious handle declaration either.

Their signatures use unsigned types such as unsigned long long, which raised a few questions about how to map them in Java in the first section, where I used jdk.incubator.foreign manually. Also, there is a statement at this time about jextract not supporting some wide types:

  • jextract does not support certain C types bigger than 64 bits (e.g. long double).

How does it handle these unsupported types? The answer is in the source code.

There we learn that unsigned types are represented with their signed counterpart, and that types wider than 64 bits are represented with a specific unsupported layout during header processing. The symbols with unsupported layouts won’t be generated, as the JEP-412 linker won’t be able to link them.

Some details on jextract’s primitive type handling

The enum below, from jextract, shows how native primitive types are mapped to their respective memory layouts, whether they are supported or not.

enum Kind {
    /**
     * {@code void} type.
     */
    Void("void", null),
    /**
     * {@code Bool} type.
     */
    Bool("_Bool", CLinker.C_CHAR),
    /**
     * {@code char} type.
     */
    Char("char", CLinker.C_CHAR),
    /**
     * {@code char16} type.
     */
    Char16("char16", UnsupportedLayouts.CHAR16),
    /**
     * {@code short} type.
     */
    Short("short", CLinker.C_SHORT),
    /**
     * {@code int} type.
     */
    Int("int", CLinker.C_INT),
    /**
     * {@code long} type.
     */
    Long("long", CLinker.C_LONG),
    /**
     * {@code long long} type.
     */
    LongLong("long long", CLinker.C_LONG_LONG),
    /**
     * {@code int128} type.
     */
    Int128("__int128", UnsupportedLayouts.__INT128),
    /**
     * {@code float} type.
     */
    Float("float", CLinker.C_FLOAT),
    /**
     * {@code double} type.
     */
    Double("double",CLinker.C_DOUBLE),
    /**
      * {@code long double} type.
      */
    LongDouble("long double", UnsupportedLayouts.LONG_DOUBLE),
    /**
     * {@code float128} type.
     */
    Float128("float128", UnsupportedLayouts._FLOAT128),
    /**
     * {@code float16} type.
     */
    HalfFloat("__fp16", UnsupportedLayouts.__FP16),
    /**
     * {@code wchar} type.
     */
    WChar("wchar_t", UnsupportedLayouts.WCHAR_T)

    // ...
}

Those types can be qualified, in particular integer types can be unsigned:

case UShort: {
    Type chType = Type.primitive(Primitive.Kind.Short);
    return Type.qualified(Delegated.Kind.UNSIGNED, chType);
}
case UInt: {
    Type chType = Type.primitive(Primitive.Kind.Int);
    return Type.qualified(Delegated.Kind.UNSIGNED, chType);
}
case ULong: {
    Type chType = Type.primitive(Primitive.Kind.Long);
    return Type.qualified(Delegated.Kind.UNSIGNED, chType);
}
case ULongLong: {
    Type chType = Type.primitive(Primitive.Kind.LongLong);
    return Type.qualified(Delegated.Kind.UNSIGNED, chType);
}
case UChar: {
    Type chType = Type.primitive(Primitive.Kind.Char);
    return Type.qualified(Delegated.Kind.UNSIGNED, chType);
}

Going further we can see that signed and unsigned integers use the same memory layout, e.g. long long and unsigned long long use the same layout C_LONG_LONG.

public static MemoryLayout getLayout(Type t) {
    Supplier<UnsupportedOperationException> unsupported = () ->
            new UnsupportedOperationException("unsupported: " + t.kind());
    switch(t.kind()) {
        case UChar, Char_U:
        case SChar, Char_S:
            return Primitive.Kind.Char.layout().orElseThrow(unsupported);
        case Short:
        case UShort:
            return Primitive.Kind.Short.layout().orElseThrow(unsupported);
        case Int:
        case UInt:
            return Primitive.Kind.Int.layout().orElseThrow(unsupported);
        case ULong:
        case Long:
            return Primitive.Kind.Long.layout().orElseThrow(unsupported);
        case ULongLong:
        case LongLong:
            return Primitive.Kind.LongLong.layout().orElseThrow(unsupported); (1)
        case UInt128:
        case Int128:
            return Primitive.Kind.Int128.layout().orElseThrow(unsupported); (2)
        case Enum:
            return valueLayoutForSize(t.size() * 8).layout().orElseThrow(unsupported);
        case Bool:
            return Primitive.Kind.Bool.layout().orElseThrow(unsupported);
        case Float:
            return Primitive.Kind.Float.layout().orElseThrow(unsupported);
        case Double:
            return Primitive.Kind.Double.layout().orElseThrow(unsupported);
        case LongDouble:
            return Primitive.Kind.LongDouble.layout().orElseThrow(unsupported);
        case Complex:
            throw new UnsupportedOperationException("unsupported: " + t.kind());
        case Record:
            return getRecordLayout(t);
        case Vector:
            return MemoryLayout.sequenceLayout(t.getNumberOfElements(), getLayout(t.getElementType()));
        case ConstantArray:
            return MemoryLayout.sequenceLayout(t.getNumberOfElements(), getLayout(t.getElementType()));
        case IncompleteArray:
            return MemoryLayout.sequenceLayout(getLayout(t.getElementType()));
        case Unexposed:
            Type canonical = t.canonicalType();
            if (canonical.equalType(t)) {
                throw new TypeMaker.TypeException("Unknown type with same canonical type: " + t.spelling());
            }
            return getLayout(canonical);
        case Typedef:
        case Elaborated:
            return getLayout(t.canonicalType());
        case Pointer:
        case BlockPointer:
            return C_POINTER;
        default:
            throw new UnsupportedOperationException("unsupported: " + t.kind());
    }
}
1 C_LONG_LONG will be used for both long long and unsigned long long.
2 Native types longer than 64 bits are still represented internally by jextract.
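
Since unsigned native values arrive in Java through their signed counterpart, the caller has to reinterpret them when the high bit may be set. The JDK already ships helpers for that; here is a short sketch (UnsignedDemo is an illustrative name, the values are arbitrary).

```java
class UnsignedDemo {
    // An `unsigned char` read into a Java byte.
    static int unsignedChar(byte value) {
        return Byte.toUnsignedInt(value);
    }

    // An `unsigned int` read into a Java int.
    static long unsignedInt(int value) {
        return Integer.toUnsignedLong(value);
    }

    // An `unsigned long long` read into a Java long: no wider primitive exists,
    // so it is rendered as text or compared with the unsigned helpers.
    static String unsignedLongLong(long value) {
        return Long.toUnsignedString(value);
    }

    public static void main(String[] args) {
        System.out.println(unsignedChar((byte) 0xFF));       // 255
        System.out.println(unsignedInt(-1));                 // 4294967295
        System.out.println(unsignedLongLong(-1L));           // 18446744073709551615
        System.out.println(Long.compareUnsigned(-1L, 1L) > 0); // true
    }
}
```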

jextract identifies unsupported types and represents them correctly during the C header processing, but the symbols that use them are skipped during the Java generation.

private static final String ATTR_LAYOUT_KIND = "jextract.abi.unsupported.layout.kind";

public static final ValueLayout __INT128 = MemoryLayout.valueLayout(128, ByteOrder.nativeOrder()).
        withAttribute(ATTR_LAYOUT_KIND, "__int128");

public static final ValueLayout LONG_DOUBLE = MemoryLayout.valueLayout(128, ByteOrder.nativeOrder()).
        withAttribute(ATTR_LAYOUT_KIND, "long double");

public static final ValueLayout _FLOAT128 = MemoryLayout.valueLayout(128, ByteOrder.nativeOrder()).
        withAttribute(ATTR_LAYOUT_KIND, "_float128");

public static final ValueLayout __FP16 = MemoryLayout.valueLayout(16, ByteOrder.nativeOrder()).
        withAttribute(ATTR_LAYOUT_KIND, "__fp16");

public static final ValueLayout CHAR16 = MemoryLayout.valueLayout(16, ByteOrder.nativeOrder()).
        withAttribute(ATTR_LAYOUT_KIND, "char16");

public static final ValueLayout WCHAR_T = MemoryLayout.valueLayout(16, ByteOrder.nativeOrder()).
        withAttribute(ATTR_LAYOUT_KIND, "wchar_t");

static boolean isUnsupported(MemoryLayout vl) { (1)
    return vl.attribute(ATTR_LAYOUT_KIND).isPresent();
}

static String getUnsupportedTypeName(MemoryLayout vl) {
    return (String)
            vl.attribute(ATTR_LAYOUT_KIND).orElseThrow(IllegalArgumentException::new);
}
1 Invoked during java representation generation.

To be part of the JDK or not?

It has been brought to my attention that jextract may never be part of a standard JDK. This is still being debated.

The main motivation is the substantial weight of jextract: it is based on libclang, which is about 81 MiB on macOS and 92 MiB on Linux. Panama developers don’t want to put this much weight on the JDK. Moreover, this tool is likely to be confined to a small audience of Java developers.

Instead, jextract could be delivered separately, the way JMH (Java Microbenchmark Harness) or JDK Mission Control are.

Wrapping up on jextract for JEP-412 / build 17-panama+3-167

This iteration showed massive improvements of jextract, for my usage the pitfalls present at the time of JEP-389 (JDK 16) are gone. I tend to think the generated code is still a bit verbose, but it got better.

Most welcome is the precise inclusion of symbols, which is based on a two-phase approach: dump the symbol include options, then load them as a configuration file. This mechanism is very useful; the sheer number of dumped symbols can be a tad intimidating, but the configuration file makes it easy to manage.

If there’s something that needs improvement, it’s the help. But I’m sure it will be fixed before the final release.

When a final version is released, this could be leveraged by Gradle or JetBrains IntelliJ IDEA, etc.

Closing words

Cool part

In JDK 17 the foreign module is even easier and notably safer to use, despite the extra javac and java command line options it requires. The API is well-designed and easy to use. I also appreciated the idea of scoped segments, a bit like what was implemented in the Rust language. There’s also the coolness of being able to free a memory segment (in particular for mapped files) at will, without depending on the GC.

Sad part

This is yet another incubator with slight API changes. It’s not unlikely that the API gets refined again, e.g. to prevent unsafe usage. Some of this blog post’s content will eventually become incorrect when the next JDK comes out. Also, jextract solidifies its position as a very practical tool; too bad it isn’t included in the JDK yet, but the safer approach wins here.

Overall

JEP-412 is yet another solid stepping stone toward what looks like the replacement (in terms of usage) of JNI or JNA. As before I can only applaud the work done! My only regret is that it’s not available yet. That said, as a developer I support the idea of not shipping until ready.


You might also be interested in these two podcasts (thanks to David Delabassée)