A practical look at JEP-389 in JDK16 with libsodium
JDK 17 will be released on September 14th, 2021. It will come with another incubator, JEP-412, for foreign functions and foreign memory; go to the updated article for JDK 17.
JDK 16 is coming, and with it the incubating JEP-389 (Foreign Linker API).
The Foreign Linker API is a very convenient and attractive way to connect to the native world. Let’s have a practical look at this API that should supersede JNI. In order to do so I wanted Java code to interact with the infamous libsodium.
First I will focus on using the foreign linker API, then I will show how to use jextract in its current state (it is still being actively developed).
Note that JEP-389 is still incubating, therefore the examples below are likely to become obsolete with the next JDK as the API and its behavior are further refined.
The following examples were based on JDK 16 release candidate build 36 (2021/2/8).
Thanks to Jean-Philippe Bempel for the review and in particular for spotting errors.
The crypto sealed box example
Let’s try to reproduce the following example from the libsodium sealed box documentation. That page has a simple code snippet that is interesting to reproduce in Java.
#define MESSAGE (const unsigned char *) "Message"
#define MESSAGE_LEN 7
#define CIPHERTEXT_LEN (crypto_box_SEALBYTES + MESSAGE_LEN)
/* Recipient creates a long-term key pair */
unsigned char recipient_pk[crypto_box_PUBLICKEYBYTES];
unsigned char recipient_sk[crypto_box_SECRETKEYBYTES];
crypto_box_keypair(recipient_pk, recipient_sk);
/* Anonymous sender encrypts a message using an ephemeral key pair
* and the recipient's public key */
unsigned char ciphertext[CIPHERTEXT_LEN];
crypto_box_seal(ciphertext, MESSAGE, MESSAGE_LEN, recipient_pk);
/* Recipient decrypts the ciphertext */
unsigned char decrypted[MESSAGE_LEN];
if (crypto_box_seal_open(decrypted, ciphertext, CIPHERTEXT_LEN,
recipient_pk, recipient_sk) != 0) {
/* message corrupted or not intended for this recipient */
}
Testing the idea in jshell
One of the cool things with jshell is that you can try small ideas with a rapid feedback loop. With the right configuration, it is also possible to play with the foreign linker.
$ jshell --add-modules jdk.incubator.foreign -R-Dforeign.restricted=permit
Then within jshell, let’s try out a simple smoke test.
jshell> import java.lang.invoke.*;
jshell> import jdk.incubator.foreign.*;
jshell> var getpid = CLinker.getInstance()
...> .downcallHandle(
...> LibraryLookup.ofDefault().lookup("getpid").get(),
...> MethodType.methodType(long.class),
...> FunctionDescriptor.of(CLinker.C_LONG)
...> );
getpid ==> MethodHandle()long
jshell> (long) getpid.invokeExact();
$4 ==> 53699
jshell> ProcessHandle.current().pid()
$5 ==> 53699
Yes, it works! It really is easy to try things almost for free, without leaving Java; this is really neat. Now I would like to focus on the small example with libsodium within a project. I’ll explain how to use the API along the way.
Configuring Gradle
The incubating modules are not on the default module path. Hence, it is required to add the jdk.incubator.foreign module when invoking the compilation command.
$ javac --add-modules jdk.incubator.foreign ...
This module also needs to be declared when running this code, as well as another property, foreign.restricted, to be able to invoke native code.
$ java -Dforeign.restricted=permit --add-modules jdk.incubator.foreign ...
If you like to play with jshell, it will be necessary to use these two as well:
$ jshell -R-Dforeign.restricted=permit --add-modules jdk.incubator.foreign ...
Then comes the question of configuring the build tool. I am using Gradle; the configuration is likely similar for other build tools.
// ...
java {
    toolchain {
        languageVersion.set(JavaLanguageVersion.of(16))
    }
}
tasks {
    withType<JavaCompile>().configureEach {
        options.forkOptions.jvmArgs = listOf(
            "--add-opens", "jdk.compiler/com.sun.tools.javac.code=ALL-UNNAMED" // (1)
        )
        options.compilerArgs = listOf(
            "--add-modules", "jdk.incubator.foreign" // (2)
        )
        options.release.set(16)
    }
    withType<JavaExec>().configureEach {
        jvmArgs("-Dforeign.restricted=permit", // (3)
                "--add-modules", "jdk.incubator.foreign")
    }
    withType<Test>().configureEach {
        useJUnitPlatform()
        jvmArgs("-Dforeign.restricted=permit", // (4)
                "--add-modules", "jdk.incubator.foreign")
    }
}
(1) Gradle itself can run on a different JDK, but the code needs to be compiled with JDK16. At this time Gradle 6.8.2 does not support the new module restriction introduced with JDK16 by default, hence it is necessary to explicitly open modules. See gradle/gradle#15538.
(2) Lets the compiler know about the jdk.incubator.foreign module.
(3) Configures the tasks that execute a main class. While this is not immediately useful, IntelliJ IDEA will pick up this configuration when you run a main method.
(4) Configures test tasks to be able to run jdk.incubator.foreign tests.
The first and minimal call crypto_box_sealbytes
The first lines make use of a few macros (the lines starting with #define). We can assume that MESSAGE will be a method parameter, MESSAGE_LEN will be derived from the message parameter, and CIPHERTEXT_LEN is also derived from the message but needs another constant, crypto_box_SEALBYTES.
The first thing needed is to acquire the crypto_box_SEALBYTES constant. Looking at crypto_box.h, there’s a function, size_t crypto_box_sealbytes(void);, that returns this constant. It’s simple, and it will be the first function I present here.
The first challenge is to map the return type size_t, an unsigned integer type. Since the constant is lower than the integer max value and I’d like to use it as an array size, I will map it to an int.
MethodHandle crypto_box_sealbytes =
    CLinker.getInstance()
           .downcallHandle(
               libsodiumLookup.lookup("crypto_box_sealbytes").get(),
               MethodType.methodType(int.class),
               FunctionDescriptor.of(CLinker.C_INT)
           );
var crypto_box_SEALBYTES = (int) crypto_box_sealbytes.invokeExact();
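The int mapping relies on the constant being small. As a minimal sketch, assuming the native size were instead read as a Java long, the narrowing to an array length could be made explicit with plain Java. SizeTMapping and toArrayLength are hypothetical names, and 48 is only an illustrative value.

```java
final class SizeTMapping {
    /** Narrows a native size carried in a long to an int array length, or fails loudly. */
    static int toArrayLength(long nativeSize) {
        // size_t is unsigned: a negative Java long would mean a value above Long.MAX_VALUE.
        if (nativeSize < 0 || nativeSize > Integer.MAX_VALUE) {
            throw new ArithmeticException(
                "native size does not fit in an int: " + Long.toUnsignedString(nativeSize));
        }
        return (int) nativeSize;
    }

    public static void main(String[] args) {
        // 48 is an illustrative small size, not a value read from the library.
        byte[] buffer = new byte[toArrayLength(48L)];
        System.out.println(buffer.length);
    }
}
```

The explicit range check is a design choice: a plain (int) cast would silently truncate a size that does not fit.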
The Java type and the C descriptor must match, otherwise the call will fail at runtime with an IllegalArgumentException.
If the Java method type used long.class, and the C descriptor was C_INT, the code would have failed with a carrier mismatch.
java.lang.IllegalArgumentException: Carrier size mismatch: long != b32[abi/kind=INT]
If the Java method type used int.class, and the C descriptor was C_LONG, the code would have failed with a carrier mismatch.
java.lang.IllegalArgumentException: Carrier size mismatch: int != b64[abi/kind=LONG]
For reference, CLinker.C_INT is actually a MemoryLayout; a layout is used to model native memory.
Then a more interesting case, passing argument pointers
The next part of the example is a little more involved: the crypto_box_keypair function takes two array pointers, recipient_pk and recipient_sk, and the generated key pair will be written to the given byte arrays.
unsigned char recipient_pk[crypto_box_PUBLICKEYBYTES];
unsigned char recipient_sk[crypto_box_SECRETKEYBYTES];
crypto_box_keypair(recipient_pk, recipient_sk);
In order to initialize the size of these arrays, the code needs two constants, crypto_box_PUBLICKEYBYTES and crypto_box_SECRETKEYBYTES. Accessing these two works the same way as for crypto_box_SEALBYTES.
The C mapping is easy to get: a void function that takes 2 pointers, FunctionDescriptor.ofVoid(C_POINTER, C_POINTER). In Java the method type requires a type called MemoryAddress, which represents the pointer address.
The pointers need to point to some memory. That’s what the MemorySegment type is for. Before invoking the method, the necessary memory will be allocated via MemorySegment::allocateNative, and the respective memory segment addresses will be passed.
MethodHandle crypto_box_keypair =
    CLinker.getInstance().downcallHandle(
        libsodiumLookup.lookup("crypto_box_keypair").get(),
        MethodType.methodType(
            void.class,
            MemoryAddress.class, // pk
            MemoryAddress.class  // sk
        ),
        FunctionDescriptor.ofVoid(C_POINTER, C_POINTER)
    );
var recipientPublicKey = MemorySegment.allocateNative(crypto_box_publickeybytes());
var recipientSecretKey = MemorySegment.allocateNative(crypto_box_secretkeybytes());
crypto_box_keypair.invokeExact(recipientPublicKey.address(),
                               recipientSecretKey.address());
var kp = new CryptoBoxKeyPair(
    recipientPublicKey.toByteArray(),
    recipientSecretKey.toByteArray()
);
This code works, but there is something that must be taken care of: the native segment lifecycle.
The above code snippet never deallocates native memory. Fortunately, in JDK 16 the MemorySegment class implements AutoCloseable; declaring it in a try-with-resources block will solve the issue.
MemorySegment lifecycle
try (var recipientPublicKey = MemorySegment.allocateNative(crypto_box_publickeybytes());
     var recipientSecretKey = MemorySegment.allocateNative(crypto_box_secretkeybytes())) {
    crypto_box_keypair.invokeExact(recipientPublicKey.address(),
                                   recipientSecretKey.address());
    return new CryptoBoxKeyPair(
        recipientPublicKey.toByteArray(),
        recipientSecretKey.toByteArray()
    );
}
However, JEP-389 comes with the concept of scopes, which allows expressing the temporal bounds of these segments. In JDK16 look for the NativeScope class: it allows registering segments in a code section and allocating segments anywhere in this section.
NativeScope
try (var scope = NativeScope.unboundedScope()) {
    var recipientPublicKey = scope.allocate(crypto_box_publickeybytes());
    var recipientSecretKey = scope.allocate(crypto_box_secretkeybytes());
    crypto_box_keypair.invokeExact(recipientPublicKey.address(),
                                   recipientSecretKey.address());
    return new CryptoBoxKeyPair(
        recipientPublicKey.toByteArray(),
        recipientSecretKey.toByteArray()
    );
}
In order to get the off-heap content back into Java types, the code can call any of the to{The Java Type} methods on the MemorySegment instance; they will take care of the conversion.
Next invoking the sealing method
The next method to call is crypto_box_seal, which also takes pointers and a message length.
unsigned char ciphertext[CIPHERTEXT_LEN];
crypto_box_seal(ciphertext, MESSAGE, MESSAGE_LEN, recipient_pk);
When looking at the C signature, however, we notice something unusual for Java developers: the message length argument is of type long long (unsigned long long, to be exact)! In C or C++, long long means the type is at least 8 bytes (64 bits), so a Java long is what is needed.
In particular, here’s a breakdown of the signed integers. It is incomplete, as they can be declared differently (e.g. long is the same as long int, or long long is the same as long long int); this Wikipedia page has a more complete overview of C data types.
int | A signed integer type with "the natural size suggested by the architecture of the execution environment". On a 64-bit CPU, …
long | A signed integer type that is at least 4 bytes ([-2147483647; +2147483647]). On a 64-bit CPU, …
long long | A signed integer type that is at least 8 bytes ([-9223372036854775807; +9223372036854775807]). On a 64-bit CPU, …
When you start to study these C data types a bit more, you’ll notice two things that just don’t match with Java types:
As a reminder, the C signature:
SODIUM_EXPORT
int crypto_box_seal(unsigned char *c, const unsigned char *m,
unsigned long long mlen, const unsigned char *pk)
__attribute__ ((nonnull(1, 4)));
Also, for this post I intend to pass a short String message, which is backed by an array whose length can only be an int.
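One mismatch worth illustrating is that mlen is an unsigned long long while a Java long is signed. The following is a minimal sketch using only standard java.lang.Long helpers; UnsignedLength and its methods are hypothetical names introduced for illustration.

```java
final class UnsignedLength {
    /** Widens a Java array or String length to the long passed for an unsigned long long parameter. */
    static long toNativeLength(int javaLength) {
        if (javaLength < 0) {
            throw new IllegalArgumentException("length must not be negative: " + javaLength);
        }
        return javaLength; // widening int -> long never loses information
    }

    /** Renders the 64 bits of a long as the unsigned value native code would see. */
    static String asUnsigned(long raw) {
        return Long.toUnsignedString(raw);
    }

    public static void main(String[] args) {
        System.out.println(toNativeLength("Message".length())); // 7
        System.out.println(asUnsigned(-1L));                     // 18446744073709551615
    }
}
```

Since Java lengths are non-negative ints, the signed and unsigned interpretations coincide for every value this article passes.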
var crypto_box_seal = CLinker.getInstance().downcallHandle(
    libsodiumLookup.lookup("crypto_box_seal").get(),
    MethodType.methodType(int.class,
        MemoryAddress.class, // cipherText, output buffer
        MemoryAddress.class, // message
        long.class,          // message length
        MemoryAddress.class  // publicKey
    ),
    FunctionDescriptor.of(C_INT,
        C_POINTER,
        C_POINTER,
        C_LONG_LONG,
        C_POINTER)
);
try (var scope = NativeScope.unboundedScope()) {
    var cipherText = scope.allocate(crypto_box_sealbytes() + message.length());
    var ret = (int) crypto_box_seal.invokeExact(
        cipherText.address(),
        CLinker.toCString(message, StandardCharsets.US_ASCII, scope).address(),
        (long) message.length(),
        scope.allocateArray(C_CHAR, publicKey).address()
    );
    return cipherText.toByteArray();
}
There are a few things to notice:
- I am specifically passing the US_ASCII charset, as I know that the byte array representation of the string will be 1 byte per char, implying I can use the String::length method. If the string used characters that do not fit in a single byte, I would have needed to extract the byte array using the UTF-8 charset encoder first and use the length of the byte array instead.
- The var ret is not used; however, due to the dynamic nature of invokeExact, the compiler needs the exact signature at the call site. That’s why the result of this invocation is cast to int and assigned to a variable even if it is not used.
  Without this, the JVM would have raised a WrongMethodTypeException; in this case the exception message helps to identify the type differences in the signature:
  java.lang.invoke.WrongMethodTypeException: expected (MemoryAddress,MemoryAddress,long,MemoryAddress)int but found (MemoryAddress,MemoryAddress,long,MemoryAddress)void
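The first point can be checked with plain Java. This is a minimal sketch, not part of the libsodium binding: EncodedLength and byteLength are hypothetical names, and the accented string is only an illustrative input.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodedLength {
    /** Number of bytes the native side would have to read for this message in the given charset. */
    static int byteLength(String message, Charset charset) {
        return message.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String ascii = "Message";
        String accented = "h\u00e9llo"; // 'é' needs two bytes in UTF-8

        // For pure ASCII text, String::length and the encoded length agree.
        System.out.println(ascii.length() + " / " + byteLength(ascii, StandardCharsets.US_ASCII));
        // Outside ASCII they diverge, so the encoded length is the one to pass.
        System.out.println(accented.length() + " / " + byteLength(accented, StandardCharsets.UTF_8));
    }
}
```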
Ending the crypto box example
The last method call of this snippet ends the libsodium crypto box example.
The crypto_box_seal_open function takes pointers and a ciphertext length, so let’s apply again what has been done for crypto_box_seal.
unsigned char decrypted[MESSAGE_LEN];
if (crypto_box_seal_open(decrypted, ciphertext, CIPHERTEXT_LEN,
recipient_pk, recipient_sk) != 0) {
/* message corrupted or not intended for this recipient */
}
Which translates to
var crypto_box_seal_open = getInstance().downcallHandle(
    libsodiumLookup.lookup("crypto_box_seal_open").get(),
    MethodType.methodType(int.class,
        MemoryAddress.class, // message
        MemoryAddress.class, // cipherText
        long.class,          // cipherText.length
        MemoryAddress.class, // public key
        MemoryAddress.class  // secret key
    ),
    FunctionDescriptor.of(C_INT,
        C_POINTER,
        C_POINTER,
        C_LONG_LONG,
        C_POINTER,
        C_POINTER
    )
);
try (var scope = NativeScope.unboundedScope()) {
    var decipheredText = scope.allocateArray(C_CHAR, cipherText.length - crypto_box_sealbytes());
    var ret = (int) crypto_box_seal_open.invokeExact(decipheredText.address(),
        scope.allocateArray(C_CHAR, cipherText).address(),
        (long) cipherText.length,
        scope.allocateArray(C_CHAR, publicKey).address(),
        scope.allocateArray(C_CHAR, secretkey).address());
    return CLinker.toJavaString(decipheredText, StandardCharsets.US_ASCII);
}
Yet running this code raises an error:
java.lang.IndexOutOfBoundsException: Out of bound access on segment MemorySegment{ id=0x6f11d841 limit: 20 }; new offset = 20; new length = 1
at jdk.incubator.foreign/jdk.internal.foreign.AbstractMemorySegmentImpl.outOfBoundException(AbstractMemorySegmentImpl.java:495)
at jdk.incubator.foreign/jdk.internal.foreign.AbstractMemorySegmentImpl.checkBoundsSmall(AbstractMemorySegmentImpl.java:465)
at jdk.incubator.foreign/jdk.internal.foreign.AbstractMemorySegmentImpl.checkBounds(AbstractMemorySegmentImpl.java:446)
at jdk.incubator.foreign/jdk.internal.foreign.AbstractMemorySegmentImpl.checkAccess(AbstractMemorySegmentImpl.java:401)
at java.base/java.lang.invoke.MemoryAccessVarHandleByteHelper.checkAddress(MemoryAccessVarHandleByteHelper.java:80)
at java.base/java.lang.invoke.MemoryAccessVarHandleByteHelper.get(MemoryAccessVarHandleByteHelper.java:113)
at jdk.incubator.foreign/jdk.incubator.foreign.MemoryAccess.getByteAtOffset(MemoryAccess.java:105)
at jdk.incubator.foreign/jdk.internal.foreign.abi.SharedUtils.strlen(SharedUtils.java:259)
at jdk.incubator.foreign/jdk.internal.foreign.abi.SharedUtils.toJavaStringInternal(SharedUtils.java:249)
at jdk.incubator.foreign/jdk.incubator.foreign.CLinker.toJavaString(CLinker.java:342)
I didn’t get why this code failed at first. CLinker::toJavaString is the mirror function of CLinker::toCString, so it looked correct.
The exception message indicates the segment has a size of 20, which is the length of the string Hello foreign code !; there’s new offset = 20, indicating the segment was read up to the 20th byte / character; and there is new length = 1, which suggests toJavaString needs to read an additional character but can’t.
The javadoc of toJavaString says (emphasis is mine):
Converts a null-terminated C string stored at given address into a Java string, using the platform’s default charset.
This immediately clicked: libsodium does not assume the message is a string. Its API takes a pointer to a memory region and the length to read in that memory region. For that matter, the message could be any binary payload.
Let’s look at the string Hello:
- The libsodium seal method will be passed the following byte array: CLinker.toCString("Hello", StandardCharsets.US_ASCII).toByteArray() ⇒ 48656C6C6F00
- But since the code is using String::length, libsodium will only seal up to 5 bytes: 48656C6C6F.
- Then, when opening the seal, the content of the MemorySegment that contains the decrypted message will be 48656C6C6F.
- But CLinker.toJavaString(decipheredText, StandardCharsets.US_ASCII) expects the memory segment to be a valid C string, terminated by the \0 character. Since the actual decrypted memory segment is not terminated by \0, the code emits an error.
This suggests the code to use is new String(decipheredText.toByteArray(), StandardCharsets.US_ASCII). There are other possibilities, like not using CLinker::toCString with the crypto_box_seal method, or incrementing the length by 1 when the CLinker::toCString segment is passed.
For reference, here are the bytes returned by String::getBytes and CLinker::toCString.
- "Hello".getBytes(US_ASCII) ⇒ 48656C6C6F
- CLinker.toCString("Hello", US_ASCII).toByteArray() ⇒ 48656C6C6F00
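The difference between the two byte sequences, and why a C-string reader fails on the shorter one, can be reproduced without the incubator module. This is a minimal sketch in plain Java: CStringBytes, toCStringBytes and strlen are hypothetical helpers that only imitate the behaviour described here, not the CLinker implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class CStringBytes {
    /** ASCII bytes followed by a terminating NUL, the layout of a C string. */
    static byte[] toCStringBytes(String s) {
        byte[] raw = s.getBytes(StandardCharsets.US_ASCII);
        return Arrays.copyOf(raw, raw.length + 1); // the extra trailing byte is 0
    }

    /** strlen-like scan: fails when no terminator is found inside the buffer. */
    static int strlen(byte[] bytes) {
        for (int i = 0; i < bytes.length; i++) {
            if (bytes[i] == 0) {
                return i;
            }
        }
        throw new IndexOutOfBoundsException("no NUL terminator within " + bytes.length + " bytes");
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex("Hello".getBytes(StandardCharsets.US_ASCII))); // 48656C6C6F
        System.out.println(hex(toCStringBytes("Hello")));                     // 48656C6C6F00
        System.out.println(strlen(toCStringBytes("Hello")));                  // 5
    }
}
```

Calling strlen on the unterminated array throws, which mirrors the out-of-bounds read seen in the stack trace.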
For this blog post I’d like to keep the assumption that the sealed message is a String, which leads to the following correct code:
try (var scope = NativeScope.unboundedScope()) {
    var decipheredText = scope.allocateArray(C_CHAR, cipherText.length - crypto_box_sealbytes());
    var ret = (int) crypto_box_seal_open.invokeExact(decipheredText.address(),
        scope.allocateArray(C_CHAR, cipherText).address(),
        (long) cipherText.length,
        scope.allocateArray(C_CHAR, publicKey).address(),
        scope.allocateArray(C_CHAR, secretkey).address());
    return new String(decipheredText.toByteArray(), StandardCharsets.US_ASCII);
}
Also, I have intentionally left out the returned status of crypto_box_seal_open to focus on the foreign module API, but it would make sense to check the returned value before returning the buffer, as suggested in the libsodium documentation.
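Such a check could look like the following minimal sketch. It only relies on the convention visible in the C snippet (a non-zero return means failure); SealOpenStatus and requireOpened are hypothetical names, and the choice of exception type is an assumption.

```java
final class SealOpenStatus {
    /**
     * Returns the decrypted bytes only when the native call reported success.
     * The status is the int returned by the crypto_box_seal_open downcall.
     */
    static byte[] requireOpened(int status, byte[] decrypted) {
        if (status != 0) {
            throw new IllegalStateException(
                "crypto_box_seal_open returned " + status
                + ": message corrupted or not intended for this recipient");
        }
        return decrypted;
    }

    public static void main(String[] args) {
        byte[] ok = requireOpened(0, new byte[] {0x48, 0x69});
        System.out.println(ok.length); // 2
    }
}
```

In the snippet above, the call would wrap decipheredText.toByteArray() with ret as the status, before building the String.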
Wrap up on manually using the Foreign Linker API
I didn’t cover everything this API has to offer, like upcall stubs, which are a way to pass a function pointer to native code, nor did I cover every feature of JEP-389, like the MemorySegment or MemoryLayout APIs.
At this time I find this API a pleasure to use compared to JNI. Note that I don’t have experience with JNA, so I may be lacking perspective there.
There are a few pitfalls, like CLinker::toJavaString or the MemorySegment lifecycles, which get more complicated if those segments are shared between threads. I found the API well-designed and well documented, but if you’re a novice in this area, you’ll likely need other materials. Package-wide documentation in jdk.incubator.foreign would definitely fill this gap, in my opinion.
The chosen example was concise in native code, but writing the stubs in Java quickly becomes tedious and verbose. JDK developers felt the same way, as they are also investing energy in a tool named jextract whose goal is to reduce the amount of tedious work. I’ll show in a section below what can be done with the current state of jextract.
Remarks about MemorySegments memory mapping
MemorySegments have the same constraints as DirectByteBuffers, i.e. by default the size of the segment can’t go over Runtime.getRuntime().maxMemory().
Exception in thread "main" java.lang.OutOfMemoryError: Cannot reserve 2147483648 bytes of direct buffer memory (allocated: 8192, limit: 522190848)
This limit is configurable by setting the -XX:MaxDirectMemorySize={size} flag.
var memorySegment = MemorySegment.allocateNative(nativeSegmentSize);
There’s one interesting thing with this API: it is possible to access the address, via MemorySegment::address, and one can get its hexadecimal representation via Long.toHexString(memorySegment.address().toRawLongValue()).
MemoryAddress{ base: null offset=0x7fc513fff010 }
If you are on Linux, you can use pmap from the procps package to inspect the memory mappings of the JVM.
151: java -Dforeign.restricted=permit --add-modules jdk.incubator.foreign -XX:MaxDirectMemorySize=2100m MemorySegments.java
Address Kbytes RSS Dirty Mode Mapping
...
0000557635ba1000 4 0 0 r-x-- java
0000557635ba3000 4 0 0 r---- java
0000557635ba4000 4 0 0 rw--- java
0000557636d4b000 132 16 16 rw--- [ anon ]
00007fc513fff000 2097156 1811456 1811456 rw--- [ anon ] (1)
00007fc594000000 132 0 0 rw--- [ anon ]
00007fc594021000 65404 0 0 ----- [ anon ]
...
(1) This is the allocated segment: 2 GiB ⇐⇒ 2097152 KiB, and this mapping is a bit larger, by one page (4 KiB). In fact the base address of the segment is 0x7fc513fff010.
In this case it is not related to alignment, though it could be in other cases. What is important is that the address of a MemorySegment may be contained in a larger memory mapping.
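The numbers in the pmap listing can be cross-checked with a little arithmetic. This is a minimal sketch in plain Java using the values printed above; MappingArithmetic is a hypothetical name, and the 2 GiB segment size is taken from the callout.

```java
final class MappingArithmetic {
    static final long MAPPING_START = 0x7fc513fff000L;        // address column of pmap
    static final long MAPPING_KIB = 2_097_156L;               // Kbytes column of pmap
    static final long SEGMENT_BASE = 0x7fc513fff010L;         // MemorySegment::address
    static final long SEGMENT_BYTES = 2L * 1024 * 1024 * 1024; // 2 GiB segment

    /** Offset of the segment base inside the mapping, in bytes. */
    static long offsetInMapping() {
        return SEGMENT_BASE - MAPPING_START;
    }

    /** How much larger than the segment the mapping is, in KiB. */
    static long extraKiB() {
        return MAPPING_KIB - SEGMENT_BYTES / 1024;
    }

    /** True when the whole segment lies inside the mapping. */
    static boolean segmentFitsInMapping() {
        long mappingEnd = MAPPING_START + MAPPING_KIB * 1024;
        long segmentEnd = SEGMENT_BASE + SEGMENT_BYTES;
        return SEGMENT_BASE >= MAPPING_START && segmentEnd <= mappingEnd;
    }

    public static void main(String[] args) {
        System.out.println(Long.toHexString(SEGMENT_BASE)); // 7fc513fff010
        System.out.println(offsetInMapping());              // 16
        System.out.println(extraKiB());                     // 4
        System.out.println(segmentFitsInMapping());         // true
    }
}
```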
One important and useful distinction with DirectByteBuffers is the presence of a MemorySegment::close method, which immediately frees the native mapping when called.
DirectByteBuffers used to be challenging because they had no explicit method to free the native mapping, and as such had to wait for the GC to kick in to be freed.
Another thing to keep in mind is that the memory mapping is zeroed, which means a big segment will take a noticeable time to get initialized. As with DirectByteBuffers, this pattern is interesting when inspecting off-heap memory.
Usually it is more practical to use the NativeScope API as it is easier to reason about boundaries of the involved memory mapping.
Using a larger MemorySegment could be interesting when it has to be sliced and shared among various threads. Also, given the high initialization cost for large segments, it’s likely to have the same lifecycle as the application. Typically, in a few years, Netty, Aeron, Kafka, Cassandra, … could make use of this API!
One thing that caught me off-guard is that closing a slice (created by MemorySegment::asSlice) also closes the underlying segment.
Finally, when the code requires a new native allocation, the JVM appears to be able to grow native mappings. In short, the JVM tries to put these segments in a bigger memory mapping.
The access modes allow defining a set of permissions on the MemorySegment; by default all permissions are given. In the example below this segment won’t be readable:
var ms = MemorySegment.allocateNative(segmentSize)
                      .withAccessModes(MemorySegment.WRITE | MemorySegment.CLOSE);
ms.asByteBuffer().getLong(); // (1)
(1) Throws UnsupportedOperationException: Required access mode READ ; current access modes: [WRITE, CLOSE]
I am not quite sure how to use these at this time. It certainly would be useful to prevent a slice from being closed though.
Also, the WRITE and READ permissions only apply to the Java object; the native memory mapping isn’t affected, which is expected since it can hold multiple MemorySegments.
Until JEP-389, we used a FileChannel and a MappedByteBuffer to memory map a file. JEP-389 also takes care of this use case, with the mapFile factory method.
try (var mmaped = MemorySegment.mapFile(
        path,                         // (1)
        0,                            // (2)
        Files.size(path),             // (3)
        FileChannel.MapMode.READ_ONLY // (4)
)) {
    // ...
}
(1) A path, e.g. Path.of("…")
(2) The base offset
(3) The size of the mapping, here the complete file
(4) The mapping mode
What is really nice here is that the MemorySegment is also immediately freed when the code leaves the try-with-resources block.
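For comparison, here is a minimal sketch of the older FileChannel and MappedByteBuffer approach, using only standard java.nio APIs. LegacyFileMapping and readMapped are hypothetical names, and the temporary file in main is only there to make the example self-contained.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class LegacyFileMapping {
    /** Maps the whole file read-only and decodes its content as ASCII text. */
    static String readMapped(Path path) throws IOException {
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            byte[] content = new byte[buffer.remaining()];
            buffer.get(content);
            // Closing the channel does not unmap the buffer: the mapping is only
            // released once the buffer itself is garbage collected.
            return new String(content, StandardCharsets.US_ASCII);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("mapped", ".txt");
        file.toFile().deleteOnExit();
        Files.write(file, "Message".getBytes(StandardCharsets.US_ASCII));
        System.out.println(readMapped(file)); // Message
    }
}
```

The try-with-resources here only closes the channel, which is exactly the contrast with the mapFile segment.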
JEP-389 is still incubating
I mentioned that MemorySegment implements AutoCloseable; that won’t be the case in the next JDK release.
In the same manner, I mentioned NativeScope earlier, which is a JDK16 API, but in the current panama state it will be replaced by a slightly different construct.
try (ResourceScope scope = ResourceScope.ofConfined()) {
    MemorySegment.allocateNative(layout, scope);
    MemorySegment.mapFile(… , scope);
    CLinker.upcallStub(… , scope);
}
Given the current state I have doubts JEP-389 will leave incubation in JDK 17. JEP-389 is working well, but I think the developers may need more time to get this API right. They are doing a fantastic job in my opinion.
jextract
jextract is still being baked and was not ready to be included in JDK 16 for incubation, but since it complements JEP-389, I wanted to give it a try and showcase its usefulness.
This tool leverages the native libclang and as such the jdk.incubator.foreign module.
In order to be able to use it, one should download the panama JDK here: https://jdk.java.net/panama/. Don’t be scared by early access, JDK 17 (very early at this stage) or the other warnings, you just need to use jextract, not the panama JDK.
When I started to bootstrap work on JDK16 and libsodium, the built panama JDK didn’t contain jextract. As I wasn’t sure, I voiced this on Twitter, and Oracle engineers confirmed to me this was a bug in the release (JDK-8261733). If this ever happens again, or you want to try the latest jextract, you’ll need to build the panama JDK.
Again, the jextract tool is still being baked at this time. That means that everything below can become obsolete at any time.
Extracting Java linking code from the libsodium headers
The first thing I need is to get the headers of libsodium, and for that I cloned the repo. Then checked out the 1.0.18 tag as I intend to target this released binary.
$ git clone https://github.com/jedisct1/libsodium.git
Cloning into 'libsodium'...
remote: Enumerating objects: 151, done.
remote: Counting objects: 100% (151/151), done.
remote: Compressing objects: 100% (105/105), done.
remote: Total 32369 (delta 74), reused 86 (delta 41), pack-reused 32218
Receiving objects: 100% (32369/32369), 8.24 MiB | 10.52 MiB/s, done.
Resolving deltas: 100% (19205/19205), done.
$ git checkout 1.0.18
Headers are located in the src/libsodium/include folder. Now let’s use jextract.
$ jextract \
-d libsodium-jextract \ (1)
-l sodium \ (2)
--target-package com.github.bric3.sodium \ (3)
-I src/libsodium/include/ \ (4)
-I src/libsodium/include/sodium \ (4)
--filter sodium.h \ (5)
src/libsodium/include/sodium.h (6)
src/libsodium/include/sodium/export.h:5:10: fatal error: 'stddef.h' file not found
(1) Destination of the generated files
(2) Name, without the JNI prefix and suffix (or path), of the library to load
(3) Indicates the target package of the generated source
(4) Includes of the library (some files include others in the library)
(5) Only includes symbols from the given file, otherwise symbols of other includes may be extracted
(6) The C header file
Obviously the standard C headers are not discovered by jextract. I tried to solve this by declaring the system includes in /usr/include and /usr/include/linux (/usr/include/linux/stddef.h), but the error just moved a bit further with unknown type name 'size_t'. This is a known issue: on some platforms jextract has trouble finding the system headers (JDK-8262127).
size_t is a standard C alias representing an unsigned integer type.
I found help in this old thread from November 2018. Instead of using the includes under /usr/include, it is necessary to use the includes of the compiler; on my docker image they were located here: /usr/lib/gcc/x86_64-redhat-linux/8/include.
Also I noticed that jextract generates classes by default, but you can pass a --source option to configure it to generate sources instead.
On the next run of jextract the extraction process stopped on the file version.h.
$ jextract \
-d libsodium-jextract \
-l sodium \
--source \ (1)
--target-package com.github.bric3.sodium \
-I /usr/lib/gcc/x86_64-redhat-linux/8/include \ (2)
-I src/libsodium/include/ \
-I src/libsodium/include/sodium \
--filter sodium.h \
src/libsodium/include/sodium.h
src/libsodium/include/sodium.h:5:10: fatal error: 'sodium/version.h' file not found
(1) Generates the sources
(2) The compiler includes installed on this Linux image
In the libsodium repository there’s a file named version.h.in, and upon inspection of its content I noticed placeholders that suggest a preliminary phase in the libsodium build will generate the final version.h. In native sources this usually happens via a combination of ./autogen.sh and ./configure.
Let’s prepare the code base.
$ ./autogen.sh
autoreconf: Entering directory `.'
autoreconf: configure.ac: not using Gettext
autoreconf: running: aclocal --force -I m4
autoreconf: configure.ac: tracing
autoreconf: configure.ac: creating directory build-aux
autoreconf: running: libtoolize --copy --force
libtoolize: putting auxiliary files in AC_CONFIG_AUX_DIR, 'build-aux'.
libtoolize: copying file 'build-aux/ltmain.sh'
libtoolize: putting macros in AC_CONFIG_MACRO_DIRS, 'm4'.
libtoolize: copying file 'm4/libtool.m4'
libtoolize: copying file 'm4/ltoptions.m4'
libtoolize: copying file 'm4/ltsugar.m4'
libtoolize: copying file 'm4/ltversion.m4'
libtoolize: copying file 'm4/lt~obsolete.m4'
autoreconf: running: /usr/bin/autoconf --force
autoreconf: configure.ac: not using Autoheader
autoreconf: running: automake --add-missing --copy --force-missing
configure.ac:75: installing 'build-aux/compile'
configure.ac:9: installing 'build-aux/config.guess'
configure.ac:9: installing 'build-aux/config.sub'
configure.ac:10: installing 'build-aux/install-sh'
configure.ac:10: installing 'build-aux/missing'
src/libsodium/Makefile.am: installing 'build-aux/depcomp'
parallel-tests: installing 'build-aux/test-driver'
autoreconf: Leaving directory `.'
Downloading config.guess and config.sub...
Done.
$ ./configure
checking build system type... x86_64-pc-linux-gnu
checking host system type... x86_64-pc-linux-gnu
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /usr/bin/mkdir -p
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
checking whether UID '0' is supported by ustar format... yes
checking whether GID '0' is supported by ustar format... yes
checking how to create a ustar tar archive... gnutar
checking whether make supports nested variables... (cached) yes
checking whether to enable maintainer-specific portions of Makefiles... no
checking whether make supports the include directive... yes (GNU style)
checking for gcc... gcc
...
configure: creating ./config.status
config.status: creating Makefile
config.status: creating builds/Makefile
config.status: creating contrib/Makefile
config.status: creating dist-build/Makefile
config.status: creating libsodium.pc
config.status: creating libsodium-uninstalled.pc
config.status: creating msvc-scripts/Makefile
config.status: creating src/Makefile
config.status: creating src/libsodium/Makefile
config.status: creating src/libsodium/include/Makefile
config.status: creating src/libsodium/include/sodium/version.h (1)
config.status: creating test/default/Makefile
config.status: creating test/Makefile
config.status: executing depfiles commands
config.status: executing libtool commands
(1) Configuring version.h with version values
Finally, this time jextract worked as expected.
$ jextract \
-d libsodium-jextract \
-l sodium \
--source \
--target-package com.github.bric3.sodium \
-I /usr/lib/gcc/x86_64-redhat-linux/8/include \
-I src/libsodium/include/ \
-I src/libsodium/include/sodium \
--filter sodium.h \
src/libsodium/include/sodium.h
However, when I opened sodium_h.java it was empty.
public final class sodium_h {
/* package-private */ sodium_h() {}
}
In the 1.x tree, the sodium.h file only includes other headers. Because I explicitly filtered on sodium.h, jextract discarded the symbols declared in the included headers.
How can I keep the declarations of the other headers?
At this time the jextract help is a bit vague.
$ jextract --help
Non-option arguments:
[String] -- header file
Option Description
------ -----------
-?, -h, --help print help
-C <String> pass through argument for clang
-I <String> specify include files path
-d <String> specify where to place generated files
--filter <String> header files to filter
-l <String> specify a library
--source generate java sources
-t, --target-package <String> target package for specified header files
Looking at the jextract source code was the way to go. First, the code suggests that it’s possible to pass multiple filters (--filter), just like it is possible to pass multiple includes (-I).
That is not very practical with many values though; isn’t it possible to pass a pattern?
This is answered in this document (Using the jextract tool) or in the source code, in the Filter class: the value passed to --filter can be a part of the path, since the current code just verifies that this string is contained in the header path.
Concretely, I can use the string sodium as a filter to include the headers located in the include/sodium/ folder.
$ jextract \
-d libsodium-jextract \ (1)
--source \ (2)
--target-package com.github.bric3.sodium \ (3)
-l sodium \ (4)
-I /usr/lib/gcc/x86_64-redhat-linux/8/include \ (5)
-I src/libsodium/include/ \ (6)
-I src/libsodium/include/sodium \ (6)
--filter sodium \ (7)
src/libsodium/include/sodium.h (8)
1 | Destination of the generated sources |
2 | Generates Java sources instead of compiled classes |
3 | Indicates the target package of the generated source |
4 | Name without the JNI prefix and suffix (or path) of the library to load |
5 | Includes C definitions or includes like size_t , stddef.h etc. |
6 | Includes of the library (some files include others in the library) |
7 | Only includes symbols from headers whose path contains the given string, otherwise symbols of other includes may be extracted |
8 | The C header file |
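The substring matching behind --filter can be sketched in plain Java. This is only an illustration of the "contains" check described above, not jextract's actual Filter class; the class name HeaderFilterSketch, the method keep and the header paths are assumptions made for the example.

```java
import java.util.List;
import java.util.stream.Collectors;

final class HeaderFilterSketch {
    // Hypothetical helper: keeps the headers whose path contains the filter string.
    static List<String> keep(List<String> headerPaths, String filter) {
        return headerPaths.stream()
                .filter(path -> path.contains(filter))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Illustrative paths, modelled on the include options used above.
        List<String> headers = List.of(
                "src/libsodium/include/sodium.h",
                "src/libsodium/include/sodium/crypto_box.h",
                "/usr/include/stddef.h");

        // Filtering on "sodium.h" only matches the umbrella header.
        System.out.println(keep(headers, "sodium.h"));
        // Filtering on "sodium" also matches the headers under include/sodium/.
        System.out.println(keep(headers, "sodium"));
    }
}
```

With a filter of sodium.h only the umbrella header matches, which is consistent with the empty sodium_h class obtained earlier.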
$ ls -lh libsodium-jextract-f/com/github/bric3/sodium/
total 956K
-rw-r--r--. 1 root root 557 Feb 16 14:10 C.java
-rw-r--r--. 1 root root 8.8K Feb 16 14:10 RuntimeHelper.java
-rw-r--r--. 1 root root 350K Feb 16 14:10 sodium_h.java
-rw-r--r--. 1 root root 124K Feb 16 14:10 sodium_h_0.java
-rw-r--r--. 1 root root 329K Feb 16 14:10 sodium_h_constants_0.java
-rw-r--r--. 1 root root 131K Feb 16 14:10 sodium_h_constants_1.java
Invoking the library
Let’s have a look at what jextract generated. The entry point is the class sodium_h. In particular, let’s compare the method stubs to those I wrote earlier:
- crypto_box_sealbytes
- crypto_box_keypair
- crypto_box_seal
- crypto_box_seal_open
The libsodium headers declare a function named crypto_box_sealbytes, whose role is to return the constant crypto_box_SEALBYTES. This constant is defined with the C preprocessor directive #define, so it is not visible as a symbol when performing a library lookup. The native crypto_box_sealbytes function compensates for this limitation.
jextract, however, reads the headers, and in doing so it actually extracts the constant crypto_box_SEALBYTES. It is still exposed as a method, and it is declared in a different class: sodium_h_0#crypto_box_SEALBYTES.
Note that sodium_h extends sodium_h_0, so one will write
sodium_h.crypto_box_SEALBYTES()
Behind the scenes this call invokes sodium_h_constants_1#crypto_box_SEALBYTES. Like sodium_h, the constants are split in two classes due to class file limits: sodium_h_constants_1 extends sodium_h_constants_0.
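The reason sodium_h.crypto_box_SEALBYTES() compiles although the method lives in sodium_h_0 is plain Java static member inheritance. Here is a minimal sketch with stand-in classes; the names Bindings_0 and Bindings and the returned value are placeholders for illustration, not the generated code.

```java
// Stand-in for the "overflow" class (sodium_h_0 in the generated code).
class Bindings_0 {
    // Placeholder value, used only to make the sketch observable.
    static long crypto_box_SEALBYTES() {
        return 48L;
    }
}

// Stand-in for the entry point class (sodium_h in the generated code).
final class Bindings extends Bindings_0 {
    public static void main(String[] args) {
        // The static method declared in the parent class is reachable through the subclass name.
        System.out.println(Bindings.crypto_box_SEALBYTES());
    }
}
```

Callers therefore only need to know the entry point class, whatever the number of classes the generator had to split the declarations into.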
First hiccup
When I accessed this constant for the first time, I got this error :
java.lang.ExceptionInInitializerError
at com.github.bric3.sodium.sodium_h_0.crypto_box_PUBLICKEYBYTES(sodium_h_0.java:1511)
at com.github.bric3.sodium.Libsodium$JextractedLibsodium.crypto_box_keypair(Libsodium.java:263)
at com.github.bric3.sodium.LibsodiumTest.can_invoke_crypto_box_keypair(LibsodiumTest.java:44)
Caused by: java.lang.IllegalArgumentException: Library not found: sodium
at jdk.incubator.foreign/jdk.internal.foreign.LibrariesHelper.lookup(LibrariesHelper.java:94)
at jdk.incubator.foreign/jdk.internal.foreign.LibrariesHelper.loadLibrary(LibrariesHelper.java:60)
at jdk.incubator.foreign/jdk.incubator.foreign.LibraryLookup.ofLibrary(LibraryLookup.java:150)
at com.github.bric3.sodium.RuntimeHelper.lambda$libraries$0(RuntimeHelper.java:46)
at com.github.bric3.sodium.RuntimeHelper.libraries(RuntimeHelper.java:49)
at com.github.bric3.sodium.sodium_h_constants_0.<clinit>(sodium_h_constants_0.java:14)
The stacktrace points to this code:
static final LibraryLookup[] LIBRARIES = RuntimeHelper.libraries(new String[] {
"sodium", (1)
});
1 | This is the value I passed to the jextract command. |
RuntimeHelper::libraries can load a library from a name (using JNI conventions, JNI_LIB_PREFIX and JNI_LIB_SUFFIX) or a path.
The value above is the one I passed to jextract with the -l sodium option, yet this value is obviously incorrect for my use case.
jextract
It is not yet clear in the jextract usage description at this time, but one can pass to the -l option:
- A library name, which has to be available on one of the paths declared in the JVM system property java.library.path
  - linux: /usr/java/packages/lib:/usr/lib64:/lib64:/lib:/usr/lib
  - macOs: /Users/bric3/Library/Java/Extensions:/Library/Java/Extensions:/Network/Library/Java/Extensions:/System/Library/Java/Extensions:/usr/lib/java:.
  The library must conform to JNI conventions: libsodium.23.dylib or libtasn1.so.6.5.5 won’t work as they contain version numbers.
- Or an absolute path, e.g. /usr/local/opt/libsodium/lib/libsodium.23.dylib.
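To check what a plain library name resolves to on a given machine, the JNI naming convention and the java.library.path lookup can be reproduced with standard JDK calls. This is a rough sketch under the assumption that the lookup only decorates the name and scans the path entries; the class LibraryNameSketch and its find method are hypothetical helpers.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

final class LibraryNameSketch {
    // Hypothetical helper: scans a java.library.path style string for the decorated library file name.
    static Optional<Path> find(String libraryName, String libraryPath) {
        // e.g. "sodium" becomes "libsodium.so" on Linux and "libsodium.dylib" on macOS
        String fileName = System.mapLibraryName(libraryName);
        for (String dir : libraryPath.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue;
            }
            Path candidate = Paths.get(dir, fileName);
            if (Files.isRegularFile(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String libraryPath = System.getProperty("java.library.path", "");
        System.out.println("decorated name: " + System.mapLibraryName("sodium"));
        System.out.println("found at: " + find("sodium", libraryPath));
    }
}
```

A versioned file such as libsodium.23.dylib never matches the decorated name, which is why the name form of -l fails in that case.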
However, the actual library path depends on the system, on the library version and on the installation mechanism. I could have used jextract with -l /usr/local/opt/libsodium/lib/libsodium.23.dylib, but then the generated code could not run on Linux without modifications.
My final objective for this code is to declare the libsodium bindings in java, and link with the actual libsodium on the platform macOs or Linux.
LIBRARIES is a static final variable that is used by other static variables in the same class. While it is possible to edit the sodium_h_constants_0 class, it is still difficult to make this LibraryLookup code configurable without a significant refactoring.
Oracle engineers are aware of this problem (JDK-8262126), so we might see it fixed in the final JEP-389 release.
For this article the easiest solution is to declare the local libsodium path in the code, as I did in the first section of this blog.
static final LibraryLookup[] LIBRARIES = RuntimeHelper.libraries(new String[] {
"/usr/local/opt/libsodium/lib/libsodium.23.dylib"
});
I’ll rework this initialization later with custom code that finds the actual libsodium on the current platform.
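One possible shape for such a platform-aware lookup is to probe a list of candidate paths per operating system and keep the first one that exists. The sketch below is an assumption about how this could be done, not the code used in this article; the class LibsodiumLocator and every candidate path except the Homebrew one shown above are hypothetical and depend on the actual installation.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

final class LibsodiumLocator {
    // Candidate locations are assumptions for illustration; adjust them to the real installation.
    static List<String> candidatesFor(String osName) {
        String os = osName.toLowerCase(Locale.ROOT);
        if (os.contains("mac")) {
            return List.of(
                    "/usr/local/opt/libsodium/lib/libsodium.23.dylib",
                    "/opt/homebrew/opt/libsodium/lib/libsodium.23.dylib");
        }
        if (os.contains("linux")) {
            return List.of(
                    "/usr/lib64/libsodium.so.23",
                    "/usr/lib/x86_64-linux-gnu/libsodium.so.23");
        }
        return List.of();
    }

    // Returns the first candidate that exists as a regular file, if any.
    static Optional<Path> firstExisting(List<String> candidates) {
        return candidates.stream()
                .map(candidate -> Paths.get(candidate))
                .filter(Files::isRegularFile)
                .findFirst();
    }

    public static void main(String[] args) {
        List<String> candidates = candidatesFor(System.getProperty("os.name", ""));
        System.out.println(firstExisting(candidates)
                .map(Path::toString)
                .orElse("libsodium not found, candidates were " + candidates));
    }
}
```

The resulting absolute path is the kind of value that can be handed to RuntimeHelper.libraries in place of the hardcoded string.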
Now implementing the other functions
Now let’s take advantage of the generated functions, in the same order as before. I’d like to use crypto_box_keypair first, and this is straightforward.
The arguments are still carrier types like MemorySegment, which means we still need to take care of the scope and lifecycle of these allocations.
try (var scope = NativeScope.unboundedScope()) {
var recipientPublicKey = scope.allocate(sodium_h.crypto_box_PUBLICKEYBYTES());
var recipientSecretKey = scope.allocate(sodium_h.crypto_box_SECRETKEYBYTES());
sodium_h.crypto_box_keypair(recipientPublicKey, recipientSecretKey); (1)
return new CryptoBoxKeyPair(
recipientPublicKey.toByteArray(),
recipientSecretKey.toByteArray()
);
}
1 | the jextracted method |
The IDE might suggest a method named crypto_box_keypair$MH; the $MH suffix simply indicates that it returns the method handle for this native function, which is basically what I showed in the first part of this blog post.
As a reflex, I always like to navigate the code I’m invoking. The methods we are invoking are just the public API methods: they check for null and declare a correct call site (correct return type, correct argument types).
public static MethodHandle crypto_box_keypair$MH() {
return RuntimeHelper.requireNonNull(sodium_h_constants_0.crypto_box_keypair$MH(),
"unresolved symbol: crypto_box_keypair");
}
public static int crypto_box_keypair ( Addressable pk, Addressable sk) {
var mh$ = RuntimeHelper.requireNonNull(sodium_h_constants_0.crypto_box_keypair$MH(),
"unresolved symbol: crypto_box_keypair");
try {
return (int)mh$.invokeExact(pk.address(), sk.address());
} catch (Throwable ex$) {
throw new AssertionError("should not reach here", ex$);
}
}
Going further down to see how the MethodHandle is declared:
static final FunctionDescriptor crypto_box_keypair$FUNC_ = FunctionDescriptor.of(
C_INT,
C_POINTER,
C_POINTER
);
static final MethodHandle crypto_box_keypair$MH_ = RuntimeHelper.downcallHandle(
LIBRARIES,
"crypto_box_keypair",
"(Ljdk/incubator/foreign/MemoryAddress;Ljdk/incubator/foreign/MemoryAddress;)I", (1)
crypto_box_keypair$FUNC_, false
);
static final java.lang.invoke.MethodHandle crypto_box_keypair$MH() { return crypto_box_keypair$MH_; }
1 | Note that the Java method signature is declared with a String instead of the Java API MethodType. |
This code creates the down-call stub; the only difference with the handcrafted handle in the section above is the signature of the method, declared as a String.
(Ljdk/incubator/foreign/MemoryAddress;Ljdk/incubator/foreign/MemoryAddress;)I breakdown:
- Ljdk/incubator/foreign/MemoryAddress ⇒ arg0
- Ljdk/incubator/foreign/MemoryAddress ⇒ arg1
- I ⇒ int return type
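The link between such a descriptor string and a MethodType can be checked with the standard java.lang.invoke API. In this small sketch String stands in for MemoryAddress so that it runs without the incubator module; the class name DescriptorSketch is made up for the example.

```java
import java.lang.invoke.MethodType;

final class DescriptorSketch {
    // Same shape as the generated handle: two reference arguments, int return type.
    // String is a stand-in for jdk.incubator.foreign.MemoryAddress.
    static MethodType type() {
        return MethodType.methodType(int.class, String.class, String.class);
    }

    // Renders the MethodType in JVM descriptor syntax.
    static String descriptor() {
        return type().toMethodDescriptorString();
    }

    // Parses a descriptor string back into a MethodType.
    static MethodType parse(String descriptor) {
        return MethodType.fromMethodDescriptorString(descriptor, DescriptorSketch.class.getClassLoader());
    }

    public static void main(String[] args) {
        System.out.println(descriptor()); // prints (Ljava/lang/String;Ljava/lang/String;)I
        System.out.println(parse(descriptor()).equals(type())); // prints true
    }
}
```

The String form and the MethodType form carry the same information, so the generated code and the handcrafted handle describe the same call site.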
The other two methods in this example, crypto_box_seal and crypto_box_seal_open, are similar and don’t require the tedious handle declaration.
Both take a length argument declared as unsigned long long. This type raised a few questions about how to map it in Java in the first section, where I used jdk.incubator.foreign manually. Also, there is a statement at this time about jextract not supporting some wide types.
jextract does not support certain C types bigger than 64 bits (e.g. long double).
How does it handle these unsupported types? The answer is in the source code.
There we learn that unsigned types are represented with their signed counterpart, and that the types wider than 64 bits are represented with a specific unsupported layout during header processing. The symbols with unsupported layouts won’t be generated, as the JEP-389 linker won’t be able to link them.
Some details on jextract’s primitive types handling
The enum below in jextract shows how native primitive types are mapped to their respective memory layout, whether they are supported or not.
enum Kind {
/**
* {@code void} type.
*/
Void("void", null),
/**
* {@code Bool} type.
*/
Bool("_Bool", CLinker.C_CHAR),
/**
* {@code char} type.
*/
Char("char", CLinker.C_CHAR),
/**
* {@code char16} type.
*/
Char16("char16", UnsupportedLayouts.CHAR16),
/**
* {@code short} type.
*/
Short("short", CLinker.C_SHORT),
/**
* {@code int} type.
*/
Int("int", CLinker.C_INT),
/**
* {@code long} type.
*/
Long("long", CLinker.C_LONG),
/**
* {@code long long} type.
*/
LongLong("long long", CLinker.C_LONG_LONG),
/**
* {@code int128} type.
*/
Int128("__int128", UnsupportedLayouts.__INT128),
/**
* {@code float} type.
*/
Float("float", CLinker.C_FLOAT),
/**
* {@code double} type.
*/
Double("double",CLinker.C_DOUBLE),
/**
* {@code long double} type.
*/
LongDouble("long double", UnsupportedLayouts.LONG_DOUBLE),
/**
* {@code float128} type.
*/
Float128("float128", UnsupportedLayouts._FLOAT128),
/**
* {@code float16} type.
*/
HalfFloat("__fp16", UnsupportedLayouts.__FP16),
/**
* {@code wchar} type.
*/
WChar("wchar_t", UnsupportedLayouts.WCHAR_T);
Those types can be qualified, in particular integer types can be unsigned:
case UShort: {
Type chType = Type.primitive(Primitive.Kind.Short);
return Type.qualified(Delegated.Kind.UNSIGNED, chType);
}
case UInt: {
Type chType = Type.primitive(Primitive.Kind.Int);
return Type.qualified(Delegated.Kind.UNSIGNED, chType);
}
case ULong: {
Type chType = Type.primitive(Primitive.Kind.Long);
return Type.qualified(Delegated.Kind.UNSIGNED, chType);
}
case ULongLong: {
Type chType = Type.primitive(Primitive.Kind.LongLong);
return Type.qualified(Delegated.Kind.UNSIGNED, chType);
}
case UChar: {
Type chType = Type.primitive(Primitive.Kind.Char);
return Type.qualified(Delegated.Kind.UNSIGNED, chType);
}
Going further we can see that signed and unsigned integers use the same memory layout, e.g. long long and unsigned long long use the same layout C_LONG_LONG.
public static MemoryLayout getLayout(Type t) {
Supplier<UnsupportedOperationException> unsupported = () ->
new UnsupportedOperationException("unsupported: " + t.kind());
switch(t.kind()) {
case UChar, Char_U:
case SChar, Char_S:
return Primitive.Kind.Char.layout().orElseThrow(unsupported);
case Short:
case UShort:
return Primitive.Kind.Short.layout().orElseThrow(unsupported);
case Int:
case UInt:
return Primitive.Kind.Int.layout().orElseThrow(unsupported);
case ULong:
case Long:
return Primitive.Kind.Long.layout().orElseThrow(unsupported);
case ULongLong:
case LongLong:
return Primitive.Kind.LongLong.layout().orElseThrow(unsupported); (1)
case UInt128:
case Int128:
return Primitive.Kind.Int128.layout().orElseThrow(unsupported); (2)
case Enum:
return valueLayoutForSize(t.size() * 8).layout().orElseThrow(unsupported);
case Bool:
return Primitive.Kind.Bool.layout().orElseThrow(unsupported);
case Float:
return Primitive.Kind.Float.layout().orElseThrow(unsupported);
case Double:
return Primitive.Kind.Double.layout().orElseThrow(unsupported);
case LongDouble:
return Primitive.Kind.LongDouble.layout().orElseThrow(unsupported);
case Complex:
throw new UnsupportedOperationException("unsupported: " + t.kind());
case Record:
return getRecordLayout(t);
case Vector:
return MemoryLayout.ofSequence(t.getNumberOfElements(), getLayout(t.getElementType()));
case ConstantArray:
return MemoryLayout.ofSequence(t.getNumberOfElements(), getLayout(t.getElementType()));
case IncompleteArray:
return MemoryLayout.ofSequence(getLayout(t.getElementType()));
case Unexposed:
Type canonical = t.canonicalType();
if (canonical.equalType(t)) {
throw new TypeMaker.TypeException("Unknown type with same canonical type: " + t.spelling());
}
return getLayout(canonical);
case Typedef:
case Elaborated:
return getLayout(t.canonicalType());
case Pointer:
case BlockPointer:
return C_POINTER;
default:
throw new UnsupportedOperationException("unsupported: " + t.kind());
}
}
1 | C_LONG_LONG will be used for both long long and unsigned long long . |
2 | Native types longer than 64 bits are still represented internally by jextract. |
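Since an unsigned long long ends up in a signed Java long, the caller is responsible for interpreting the bits as unsigned when it matters. The standard library has helpers for that, as this short sketch shows; the class name UnsignedSketch and the sample values are arbitrary.

```java
final class UnsignedSketch {
    // Renders the 64 bits of a Java long as an unsigned decimal number.
    static String asUnsigned(long raw) {
        return Long.toUnsignedString(raw);
    }

    public static void main(String[] args) {
        long raw = -1L; // same bit pattern as the largest unsigned long long

        System.out.println(asUnsigned(raw));                // prints 18446744073709551615
        System.out.println(Long.compareUnsigned(raw, 1L));  // positive: raw is larger when read as unsigned
        System.out.println(Integer.toUnsignedLong(-1));     // prints 4294967295
        System.out.println(Byte.toUnsignedInt((byte) 0xFF)); // prints 255
    }
}
```

For lengths such as the message length passed to crypto_box_seal, the values stay far below the sign bit, so the signed and unsigned readings coincide.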
jextract identifies unsupported types and represents them correctly during the C header processing, but the symbols that use them are skipped during the Java generation.
private static final String ATTR_LAYOUT_KIND = "jextract.abi.unsupported.layout.kind";
public static final ValueLayout __INT128 = MemoryLayout.ofValueBits(128, ByteOrder.nativeOrder()).
withAttribute(ATTR_LAYOUT_KIND, "__int128");
public static final ValueLayout LONG_DOUBLE = MemoryLayout.ofValueBits(128, ByteOrder.nativeOrder()).
withAttribute(ATTR_LAYOUT_KIND, "long double");
public static final ValueLayout _FLOAT128 = MemoryLayout.ofValueBits(128, ByteOrder.nativeOrder()).
withAttribute(ATTR_LAYOUT_KIND, "_float128");
public static final ValueLayout __FP16 = MemoryLayout.ofValueBits(16, ByteOrder.nativeOrder()).
withAttribute(ATTR_LAYOUT_KIND, "__fp16");
public static final ValueLayout CHAR16 = MemoryLayout.ofValueBits(16, ByteOrder.nativeOrder()).
withAttribute(ATTR_LAYOUT_KIND, "char16");
public static final ValueLayout WCHAR_T = MemoryLayout.ofValueBits(16, ByteOrder.nativeOrder()).
withAttribute(ATTR_LAYOUT_KIND, "wchar_t");
static boolean isUnsupported(MemoryLayout vl) { (1)
return vl.attribute(ATTR_LAYOUT_KIND).isPresent();
}
static String getUnsupportedTypeName(MemoryLayout vl) {
return (String)
vl.attribute(ATTR_LAYOUT_KIND).orElseThrow(IllegalArgumentException::new);
}
1 | Invoked during java representation generation. |
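The marker attribute approach can be illustrated with a toy model that does not depend on the incubator module. Everything in this sketch is hypothetical (the Layout and NativeFunction classes, the attribute name, the function names); it only mimics the idea of tagging a layout as unsupported and skipping the declarations that use it.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class UnsupportedLayoutSketch {
    // Hypothetical attribute name, standing in for jextract's ATTR_LAYOUT_KIND.
    static final String UNSUPPORTED_KIND = "unsupported.layout.kind";

    // Toy layout: a bit size plus free-form attributes.
    static final class Layout {
        final int bits;
        final Map<String, String> attributes;

        Layout(int bits, Map<String, String> attributes) {
            this.bits = bits;
            this.attributes = attributes;
        }

        boolean isUnsupported() {
            return attributes.containsKey(UNSUPPORTED_KIND);
        }
    }

    // Toy declaration: a function name and the layouts of its return and argument types.
    static final class NativeFunction {
        final String name;
        final List<Layout> layouts;

        NativeFunction(String name, List<Layout> layouts) {
            this.name = name;
            this.layouts = layouts;
        }
    }

    static final Layout C_INT = new Layout(32, Map.of());
    static final Layout LONG_DOUBLE = new Layout(128, Map.of(UNSUPPORTED_KIND, "long double"));

    // Keeps only the functions whose layouts are all supported.
    static List<String> generatedSymbols(List<NativeFunction> functions) {
        return functions.stream()
                .filter(f -> f.layouts.stream().noneMatch(Layout::isUnsupported))
                .map(f -> f.name)
                .collect(Collectors.toList());
    }

    static List<String> demo() {
        return generatedSymbols(List.of(
                new NativeFunction("add_ints", List.of(C_INT, C_INT, C_INT)),
                new NativeFunction("scale_long_double", List.of(LONG_DOUBLE, LONG_DOUBLE, C_INT))));
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [add_ints]
    }
}
```

The wide type is still modelled with its real size, so header processing stays accurate, and only the generation step drops the affected symbols.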
Wrapping up on jextract
In the end jextract is useful, but there are a few little hiccups along the way. The generated code is currently lacking in usability. It is also a tad verbose; I wish there was a way to eliminate some unneeded generated methods. Using jextract is a bit obscure as well: there are a few pitfalls there too, and it may require peeking at the jdk.incubator.jextract source code (in the panama repository).
While I mention these points, this should not diminish the work done on this tool and what it could become. When ready, it could be leveraged by Gradle, JetBrains IntelliJ IDEA, etc.
Closing words
In JDK16 the foreign module is really easy to use, despite the javac and java command line requirements. The API is well-designed. I also appreciated the idea of scoped segments, a bit like what was implemented in the Rust language. There’s also the coolness of being able to free memory segments at will, without depending on the GC.
This is the third incubator and there are still API changes planned. Some of this blog post content will eventually become incorrect when JDK17 comes out.
jextract looks like a very practical tool, yet it is still being cooked.
JEP-389 looks like a solid replacement for JNI or JNA. I can only applaud the work done! My only regret is that it’s not available yet. That said, as a developer I support the idea of not shipping until ready.
You might also be interested in these two podcasts (thanks to David Delabassée)