Using Linux’s memfd_secret syscall from the JVM with JEP-419 JEP-454
UPDATED: 2026-04-02
| When this article was written, JEP-419 had been released as part of JDK 18; at that time the API was still incubating and subject to change. The FFM API was finalized in JDK 22 as part of JEP-454, and this article has been updated to use the final public API. (Original Article) |
Linux 5.14 brought a new system call memfd_secret in order to mitigate
speculative attacks by preventing the kernel from being able to peek at memory
segments created by this system call.
In this article I will leverage the FFM API introduced by JEP-454 (Project Panama). This API has been delivered as part of JDK 22.
For those unfamiliar with it, Project Panama is a project that aims to provide an easy, secure, and efficient way to call native methods from Java. It is the fruit of work started in 2014, from ideas even older. You can look at my previous articles on the earlier versions of the project, articles on foojay.io by Carl Dea, or those on inside.java.
| The following examples are based on JDK 22 APIs. The original 2022 article used Fedora release 35 with the kernel 5.16.20-200.fc35.x86_64. The examples were later adapted to a Fedora 43 running on Orbstack on an M1 MacBook, with a kernel 6.17.8. |
While this article focuses on Linux, the same concepts apply to other OSes
(and CPUs, as we’ll see). Also, this article is not an introduction to the FFM API,
which is covered in other material; it assumes the JVM is started with the right flags,
usually --enable-native-access=ALL-UNNAMED.
Now, to know more about the whole deal, read on.
What is a system call (syscall)?
Before jumping to memfd_secret, let’s first understand how to make a system call.
And even before that, let’s see what a system call is.
For those not interested in this part, you can jump to the memfd_secret section.
To do something useful, a program has to interact with some resources, memory, disk, network, terminal, etc. On a computer, these resources are handled by very complex and critical software, the Operating System.
To use these resources, a program has to make system calls like
read, wait, write, exit, etc. The standard malloc, the native allocator,
has to actually place a request to the OS to get memory via a mmap syscall.
As expected the JVM does plenty of syscalls too, e.g. when logging something
on stdout or persisting a (unified) log file.
Essentially,
a system call is a way of requesting the kernel to do something for the program.
Why do system calls have to be in the kernel and not in the user space like in a standard library? As mentioned earlier, the reasoning is that system calls are a way to interact with, or involve, a resource like devices, file system, network, processes, etc. These resources are managed by privileged software: the OS or kernel.
When a system call happens, the program doesn’t simply invoke code that resides at some address; a system call is actually making the CPU switch to Kernel mode because the kernel is privileged software.
On most modern processors there is a security model that allows limiting the scope of what a program can do. In particular on Intel-based CPUs, the model is known as processor protection ring (or hierarchical protection domains).
|
It seems that Rings 1 and 2 are rarely used: paging (the way that the OS handles memory, see my blog post on [off-heap memory]) only distinguishes privileged from unprivileged, which minimizes the actual benefit of those rings, according to Evan Teran's answer on SO. |
When a processor executes some code (in a thread), it knows the current mode; this lets it gate memory accesses, e.g. a Ring 3 (user-land) program cannot access memory from Ring 0 (the kernel). This is yet another feature of the virtual memory abstraction. The processor can also restrict some instructions and registers to the software running in Ring 0.
Out of scope: there are even negative rings on some CPU architectures for
hypervisor, or CPU System management, up to Ring -3.
The CPU enforces these restrictions, so to do its job a user-land program needs to place a request to the kernel. This mechanism is called a syscall, and it allows transitioning between rings.
During a mode switch a lot happens: saving and restoring registers, putting the CPU in a specific mode (user vs kernel), etc. And of course the reverse happens once the request is handled, whether it succeeded or failed.
Privilege context switches are costly enough that most libraries
try to avoid them. For example, reading 8 KiB at a time instead of 256 bytes is a good
idea, as it drastically reduces the number of syscalls and therefore of mode switches.
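To put numbers on the buffer-size argument, here is a minimal sketch. It does not perform real syscalls: the hypothetical CountingStream wrapper counts bulk read calls on an in-memory stream, under the assumption that each such call would map to one read syscall on a real file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class ReadBatching {
    /** Counts read calls; in this sketch each call stands in for one read syscall (an assumption). */
    static final class CountingStream extends InputStream {
        private final InputStream delegate;
        int calls;

        CountingStream(InputStream delegate) {
            this.delegate = delegate;
        }

        @Override
        public int read() throws IOException {
            calls++;
            return delegate.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return delegate.read(b, off, len);
        }
    }

    /** Drains totalBytes through a buffer of bufferSize; returns the number of read calls, final EOF probe included. */
    static int readCalls(int totalBytes, int bufferSize) {
        var in = new CountingStream(new ByteArrayInputStream(new byte[totalBytes]));
        byte[] buffer = new byte[bufferSize];
        try {
            while (in.read(buffer, 0, buffer.length) != -1) {
                // the data is discarded, only the number of calls matters here
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return in.calls;
    }

    public static void main(String[] args) {
        System.out.println("256 B buffer: " + readCalls(64 * 1024, 256) + " calls");
        System.out.println("8 KiB buffer: " + readCalls(64 * 1024, 8 * 1024) + " calls");
    }
}
```

For 64 KiB of data, the 256-byte buffer needs 257 calls (256 reads plus the one that reports end of stream), while the 8 KiB buffer needs only 9.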
|
What does the documentation say about syscalls?
Now let’s get practical.
Looking at man 2 syscall,
the manpage sheds some light on how to make the call, specifically in the
Architecture calling conventions section. Those details are in assembly, e.g.
- processor interrupt 0x80 for i386 processors (32 bits), then specific registers
- syscall instruction for x86_64 processors (64 bits), then specific registers
The calling convention of other architectures is also described e.g.,
on ARM processors, the system call is performed by a swi 0x0 instruction,
on aarch64 by svc #0.
| Readers not familiar with what a calling convention is should read at least this wikipedia article on x86 calling convention. In short, a calling convention defines how and where parameters should be placed in order to call the code: how parameters are passed (registers and/or stack), how values are returned, etc. |
This manual page also gives an important difference with regular functions, while
we look up system calls by their names: write, read, execve, exit, mmap,
memfd_create etc. The programs and the kernel actually know them by numbers.
Why numbers? The reason is that syscalls are like messages that are passed down, and these numbers are somewhat like enum ordinals indicating the type of message. These numbers are part of the syscall ABI (Application Binary Interface), and as such they are stable for a CPU architecture, although unbounded (new syscalls can be added).
|
Outside of this scope: not all syscalls are made equal nowadays. Some syscalls, usually the most used ones, are exported into user-space memory to avoid the cost of switching to kernel mode. In practice, the vDSO (virtual dynamic shared object) is like a library: it is loaded in memory so that it can be accessed from the program memory (glibc knows about this memory region and will use it). pmap -X {pid}
To read more about it, one should read the relevant manual page ( E.g |
The syscall numbers are different between architectures! On Linux
one can look at their definition in the /include/asm-/unistd-.h files.
|
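Because the numbers differ per architecture, code that makes raw syscalls has to pick them per platform. The following is a small sketch of such a lookup table; the class and record names are hypothetical, and the values are the write and exit numbers used by the assembly examples of this article.

```java
import java.util.Map;

public class SyscallNumbers {
    /** The write and exit syscall numbers of one architecture. */
    record Numbers(int write, int exit) {}

    // Values as used in the assembly examples of this article (Linux only).
    static final Map<String, Numbers> BY_ARCH = Map.of(
            "x86_64", new Numbers(1, 60),    // 60 is 0x3c
            "i386", new Numbers(4, 1),
            "aarch64", new Numbers(64, 93)
    );

    /** Returns the numbers for the given architecture name, failing loudly on anything unknown. */
    static Numbers forArch(String arch) {
        Numbers numbers = BY_ARCH.get(arch);
        if (numbers == null) {
            throw new IllegalArgumentException("Unsupported arch: " + arch);
        }
        return numbers;
    }

    public static void main(String[] args) {
        BY_ARCH.forEach((arch, numbers) -> System.out.println(arch + " -> " + numbers));
    }
}
```

Failing on an unknown architecture is deliberate: passing the wrong number to the kernel would silently invoke a different syscall.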
From the syscall manpage, the syscall calling convention varies by architecture:
x86_64
- Set the registers
  - rax ← System Call number
  - rdi ← First argument
  - rsi ← Second argument
  - rdx ← Third argument
- Make the syscall
  - execute the syscall processor instruction
The actual syscall numbers (for 64-bit programs) are usually defined in /usr/include/asm/unistd_64.h
i386
- Set the registers
  - eax ← System Call Number
  - ebx ← First Argument
  - ecx ← Second Argument
  - edx ← Third Argument
- Make the syscall
  - Place a processor interrupt int 0x80
The actual syscall numbers (for 32-bit programs) are usually defined in /usr/include/asm/unistd_32.h.
aarch64
- Set the registers
  - x8 ← System Call number
  - x0 ← First argument
  - x1 ← Second argument
  - x2 ← Third argument
- Make the syscall
  - execute the svc #0 processor instruction (supervisor call)
The actual syscall numbers are usually defined in /usr/include/asm/unistd.h (Linux),
or $(xcrun --show-sdk-path)/usr/include/sys/syscall.h (macOS SDK).
The hardware only defines svc as the trap mechanism — which register holds the syscall number
is an ABI decision made by the OS kernel, not mandated by the ARM spec.
Linux AArch64 uses x8 (Linux ARM64 syscall ABI), while macOS AArch64 uses x16 (Darwin/XNU ABI).
The CPU executes svc #0 and traps into the kernel, which then reads whichever register it expects.
|
My first syscall
In order to quickly practice a syscall, let’s do a basic hello world. The example is in assembly; I promise this is the only part with assembly snippets, and after that I’ll be back with Java and Panama.
-
/usr/include/asm/unistd_64.h
global _start ; define entrypoint
section .text
_start:
mov rax, 0x1 ; syscall number for write (1)
mov rdi, 0x1 ; int fd (2)
mov rsi, msg ; const void* buf
mov rdx, mlen ; size_t count
syscall ; make the call (3)
mov rax, 0x3c ; syscall number for exit (1)
mov rdi, 0x1 ; int status (2)
syscall ; make the call (3)
section .rodata
msg: db "Hello Linux syscalls!",0x0a, 0x0d ; message string, terminated by a new line (0A, 0D)
mlen: equ $-msg ; calculate the length of the message
| 1 | At this place this register will hold the selected syscall (a number).
Note the number comes from /usr/include/asm/unistd_64.h. |
| 2 | Syscall arguments are placed in the next registers. |
| 3 | Make the syscall with the syscall instruction. |
nasm -w+all -f elf64 -o hello_syscall.o hello_syscall.asm (1)
ld -o hello_syscall hello_syscall.o
./hello_syscall
| 1 | Note the elf64 format for 64 bits. |
global _start ; define entrypoint
section .text
_start:
mov eax, 4 ; syscall number: write (1)
mov ebx, 1 ; stdout (2)
mov ecx, str ; buffer address
mov edx, str_len ; buffer length
int 0x80 ; make the call (3)
mov eax, 1 ; syscall number: exit (1)
mov ebx, 0 ; exit status (2)
int 0x80 ; make the call (3)
section .rodata
str: db "Hello Linux!", 0Ah ; message string, terminated by a new line (0A)
str_len: equ $ - str ; calculate the length of the message
| 1 | At this place this register will hold the selected syscall (a number).
Note the number comes from /usr/include/asm/unistd_32.h. |
| 2 | Syscall arguments are placed in the next registers. |
| 3 | Make the syscall with interrupt 0x80. |
nasm -w+all -f elf32 -o hello_syscall_via_int80.o hello_syscall_via_int80.asm (1)
ld -m elf_i386 -o hello_syscall_via_int80 hello_syscall_via_int80.o (2)
./hello_syscall_via_int80
| 1 | Note the elf32 format for 32 bits. |
| 2 | Note the linker emulation option for i386 |
The following is using GNU Assembler (GAS) syntax (.s extension), which
is the way to go for AArch64, unlike NASM which is x86-focused.
.global _start
.section .text
_start:
mov x8, #64 // syscall number for write (1)
mov x0, #1 // int fd: stdout (2)
adr x1, msg // const void* buf
mov x2, #mlen // size_t count
svc #0 // make the call (3)
mov x8, #93 // syscall number for exit (1)
mov x0, #0 // int status (2)
svc #0 // make the call (3)
.section .rodata
msg:
.ascii "Hello Linux syscalls!\n"
.equ mlen, . - msg
| 1 | At this place this register will hold the selected syscall (a number).
Note the number comes from /usr/include/asm/unistd.h. |
| 2 | Syscall arguments are placed in the next registers. |
| 3 | Make the syscall with the svc #0 supervisor call instruction. |
as -o hello_syscall_arm64.o hello_syscall_arm64.s (1)
ld -o hello_syscall_arm64 hello_syscall_arm64.o
./hello_syscall_arm64
| 1 | as is available via binutils; when cross-compiling on x86-64, use aarch64-linux-gnu-as. |
On macOS the syscall convention differs from Linux AArch64:
the syscall number goes in x16 (not x8), BSD syscall numbers carry a 0x2000000 class prefix,
the SVC immediate is 0x80, and sections use Mach-O names.
.global _start
.section __TEXT,__text
_start:
movz x16, #4 // syscall: write (BSD #4) (1)
movk x16, #0x200, lsl #16 // + BSD class prefix 0x2000000
mov x0, #1 // int fd: stdout (2)
adrp x1, msg@PAGE // const void* buf (page) (3)
add x1, x1, msg@PAGEOFF // + page offset
mov x2, #mlen // size_t count
svc #0x80 // make the call (4)
movz x16, #1 // syscall: exit (BSD #1) (1)
movk x16, #0x200, lsl #16 // + BSD class prefix 0x2000000
mov x0, #0 // int status (2)
svc #0x80 // make the call (4)
.section __TEXT,__cstring,cstring_literals
msg:
.ascii "Hello macOS syscalls!\n"
.equ mlen, . - msg
| 1 | BSD syscall numbers are prefixed with 0x2000000 (the BSD class).
Refer to $(xcrun --show-sdk-path)/usr/include/sys/syscall.h. |
| 2 | Syscall arguments use the same registers as Linux (x0–x5). |
| 3 | Mach-O uses page-relative addressing (adrp + add) instead of the single adr used on Linux ELF. |
| 4 | macOS BSD syscalls use svc #0x80 (not svc #0). |
as -arch arm64 -o hello_syscall_macos.o hello_syscall_macos.s
ld -arch arm64 -o hello_syscall_macos hello_syscall_macos.o \
-lSystem -syslibroot $(xcrun --show-sdk-path) -e _start
./hello_syscall_macos
When looking at this very simplistic code, something immediately stands out: From the application point of view (user land), a syscall is just like an atomic pseudo machine instruction. I believe this example is more striking than the figure above on syscall ring transitions.
We saw what exactly a syscall is and how to make one using assembly. In general, though, it’s rare to invoke a syscall directly, as the standard library exposes wrappers that handle everything for most syscalls.
Because the memfd_secret syscall was introduced only recently, there is no wrapper function for it
in the C standard library; hence we’ll need to make the system call ourselves.
Making syscalls from the JVM
The work of the Panama project doesn’t allow us to directly write assembly code and execute it. Fortunately!
And the libc already exposes a syscall function that takes care of
the calling convention as mentioned in
man 2 syscall, i.e. it
will place the arguments in the right CPU registers.
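#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    pid_t pid = syscall(SYS_getpid); /* raw getpid through the generic syscall wrapper */
    printf("pid: %ld\n", (long) pid);
    return 0;
}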
So, basically, to make a syscall using the FFM API, I only have to look up
the syscall function. Since it’s part of the standard libc, this
just needs Linker.nativeLinker().
/*
On linux (Intel x86_64) in
- /usr/include/asm/unistd_64.h
#define __NR_getpid 39
On macOS (Intel x86_64) in either :
- /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/sys/syscall.h
- /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/sys/syscall.h
#define SYS_getpid 20
*/
final static int SYS_getpid = 20; (1)
var linker = Linker.nativeLinker();
MethodHandle syscall = linker.downcallHandle(
linker.defaultLookup().find("syscall").orElseThrow(),
FunctionDescriptor.of(
ValueLayout.JAVA_INT, (2)
ValueLayout.JAVA_INT (3)
)
);
int pid = (int) syscall.invoke(SYS_getpid); (4)
System.out.println("pid: " + pid);
| 1 | The syscall number. |
| 2 | The return type of the syscall function. |
| 3 | The first argument is the syscall number. |
| 4 | Making the syscall. |
That’s it; we’ve made our first direct syscall using FFM-API.
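As a sanity check, the value returned by the raw syscall can be compared with what the JDK itself reports. The following is a sketch under stated assumptions: it targets Linux only, uses the getpid numbers 39 (x86_64) and 172 (aarch64), declares the number and return value as long to match the C prototype of syscall, and needs a JDK 22+ VM. The class name is hypothetical.

```java
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.invoke.MethodHandle;

import static java.lang.foreign.ValueLayout.JAVA_LONG;

public class GetpidSyscall {
    /** Linux getpid syscall number for the current CPU architecture. */
    static long getpidNumber() {
        String arch = System.getProperty("os.arch");
        return switch (arch) {
            case "amd64", "x86_64" -> 39L;
            case "aarch64", "arm64" -> 172L;
            default -> throw new IllegalStateException("Unsupported arch: " + arch);
        };
    }

    /** Calls getpid through the libc syscall function. */
    static long getpid() {
        Linker linker = Linker.nativeLinker();
        MethodHandle syscall = linker.downcallHandle(
                linker.defaultLookup().find("syscall").orElseThrow(),
                FunctionDescriptor.of(JAVA_LONG, JAVA_LONG) // long syscall(long number)
        );
        try {
            return (long) syscall.invokeExact(getpidNumber());
        } catch (Throwable t) {
            throw new IllegalStateException("syscall(getpid) failed", t);
        }
    }

    public static void main(String[] args) {
        System.out.println("syscall pid:       " + getpid());
        System.out.println("ProcessHandle pid: " + ProcessHandle.current().pid());
    }
}
```

Both lines should print the same process id when run on Linux.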
Simple right? Let’s try to use that knowledge for memfd_secret syscall.
memfd_secret
The memfd_secret syscall was introduced in this commit.
Fortunately, Linux has good commit messages, so we can read and learn more about
how to create "secret" memory areas.
The following example demonstrates creation of a secret mapping (error handling is omitted):
fd = memfd_secret(0); ftruncate(fd, MAP_SIZE); ptr = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
Basically, we need to create the secret file descriptor, truncate it to the desired size, and then memory map it.
- First, get a file descriptor with memfd_secret.
memfd_secret syscall
/*
On linux (Intel x86_64) in /usr/include/asm/unistd_64.h
#define __NR_memfd_secret 447
*/
final static int SYS_memfd_secret = 447; (1)
var linker = Linker.nativeLinker();
MethodHandle syscall = linker.downcallHandle(
    linker.defaultLookup().find("syscall").orElseThrow(),
    FunctionDescriptor.of(
        ValueLayout.JAVA_INT, (2)
        ValueLayout.JAVA_INT, (3)
        ValueLayout.JAVA_INT (4)
    )
);
int secret_fd = (int) syscall.invoke(SYS_memfd_secret, 0); (5)
| 1 | The memfd_secret number. |
| 2 | The return type of the syscall function. |
| 3 | The first argument is the syscall number. |
| 4 | The flags passed to memfd_secret, currently the only supported flag is O_CLOEXEC according to this LWN article by Jonathan Corbet. |
| 5 | Making the syscall, not using any flags, the returned value is a file descriptor. |
We can proceed with the rest of the process.
- Then set the desired size.
// int ftruncate(int fd, off_t length);
var linker = Linker.nativeLinker();
MethodHandle ftruncate = linker.downcallHandle(
    linker.defaultLookup().find("ftruncate").orElseThrow(),
    FunctionDescriptor.of(
        ValueLayout.JAVA_INT,
        ValueLayout.JAVA_INT, // fd
        ValueLayout.JAVA_LONG // length
    )
);
var res = (int) ftruncate.invoke( (1)
    secret_fd,
    secret.length()
);
| 1 | Invoke ftruncate from the libc on the file descriptor with the wanted size. |
- Finally, memory map this file descriptor. This operation has the effect of unmapping this memory segment from the kernel pages (in Ring 0), so only the user process can read these memory pages.
// in /usr/include/bits/mman-linux.h
// #define PROT_READ 0x1 /* Page can be read. */
// #define PROT_WRITE 0x2 /* Page can be written. */
final int PROT_READ = 1;
final int PROT_WRITE = 2;
// #define MAP_SHARED 0x01 /* Share changes. */
final int MAP_SHARED = 1;
// in /usr/include/sys/mman.h
// extern void *mmap (void *__addr, size_t __len, int __prot,
//                    int __flags, int __fd, __off_t __offset) __THROW;
var linker = Linker.nativeLinker();
MethodHandle mmap = linker.downcallHandle(
    linker.defaultLookup().find("mmap").orElseThrow(),
    FunctionDescriptor.of(
        ValueLayout.ADDRESS, // return: mapped address
        ValueLayout.ADDRESS, // addr
        ValueLayout.JAVA_LONG, // size
        ValueLayout.JAVA_INT, // protection modes
        ValueLayout.JAVA_INT, // flags
        ValueLayout.JAVA_INT, // fd
        ValueLayout.JAVA_LONG // offset
    )
);
var segmentAddress = (MemorySegment) mmap.invoke( (1)
    MemorySegment.NULL,
    secret.length(),
    PROT_READ | PROT_WRITE,
    MAP_SHARED,
    secret_fd,
    0
);
| 1 | Memory-map the file descriptor, using the same wanted size, and use the right protection modes (read & write), and flags. |
- Once the memory segment is mapped, we can actually get access to it via the MemorySegment API.
var roSecretSegment = segmentAddress.reinterpret(length, arena, null) (1)
    .copyFrom(MemorySegment.ofArray(secretBytes)) (2)
    .asReadOnly(); (3)
| 1 | Reinterpret the zero-length MemorySegment returned by mmap to the actual size, scoped to the current Arena. |
| 2 | Since secretSegment is actually an off-heap MemorySegment, the source secret array has to be wrapped first into an on-heap MemorySegment before being copied to the target secret memory mapping. |
| 3 | Eventually make the secret segment read-only. |
Later, to read the secret, extract the byte array from the memory segment.
var bytes = roSecretSegment.toArray(ValueLayout.JAVA_BYTE);
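The copy and read-back steps can be tried in isolation, without any native call. In the sketch below an ordinary Arena allocation stands in for the mmap-ed secret segment (an assumption made only to keep the example runnable anywhere with JDK 22+); the class and method names are hypothetical.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ReadOnlySegmentCopy {
    /** Copies the bytes into an off-heap segment owned by the arena and returns a read-only view of it. */
    static MemorySegment store(Arena arena, byte[] secretBytes) {
        return arena.allocate(secretBytes.length)              // stand-in for the mmap-ed segment
                .copyFrom(MemorySegment.ofArray(secretBytes))  // on-heap source, off-heap target
                .asReadOnly();
    }

    /** Stores the secret, reads it back, and checks that nothing was lost on the way. */
    static boolean roundTrips(String secret) {
        byte[] secretBytes = secret.getBytes(StandardCharsets.UTF_8);
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment roSecretSegment = store(arena, secretBytes);
            byte[] bytes = roSecretSegment.toArray(ValueLayout.JAVA_BYTE);
            return roSecretSegment.isReadOnly() && Arrays.equals(secretBytes, bytes);
        }
    }

    public static void main(String[] args) {
        System.out.println("round trip ok: " + roundTrips("super secret decryption key"));
    }
}
```

Note that toArray creates a new on-heap copy, so from that point on the secret also lives in the Java heap.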
With this you have a complete working example of how to use the memfd_secret
from Java using Panama (JEP-454).
…or not!
Indeed, running this will make the JVM seg-fault!
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007f561919ffd7, pid=4798, tid=4799
#
# JRE version: OpenJDK Runtime Environment 22.3 (18.0+37) (build 18+37)
# Java VM: OpenJDK 64-Bit Server VM 22.3 (18+37, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# v ~StubRoutines::jbyte_disjoint_arraycopy
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h" (or dumping to /home/bob/opensource/core.4798)
#
# An error report file with more information is saved as:
# /home/bob/opensource/hs_err_pid4798.log
#
# If you would like to submit a bug report, please visit:
# https://bugzilla.redhat.com/enter_bug.cgi?product=Fedora&component=java-latest-openjdk&version=35
#
So, what did happen? The problematic frame isn’t helpful if you’re not familiar with JVM internals.
Opening hs_err_pid4798.log is more helpful.
...
Stack: [0x00007f734ae3d000,0x00007f734af3e000], sp=0x00007f734af3c430, free space=1021k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
v ~StubRoutines::jbyte_disjoint_arraycopy
V [libjvm.so+0xe66d70] Unsafe_CopyMemory0+0xd0
j jdk.internal.misc.Unsafe.copyMemory0(Ljava/lang/Object;JLjava/lang/Object;JJ)V+0 java.base@18
j jdk.internal.misc.Unsafe.copyMemory(Ljava/lang/Object;JLjava/lang/Object;JJ)V+29 java.base@18
j jdk.internal.misc.ScopedMemoryAccess.copyMemoryInternal(Ljdk/internal/misc/ScopedMemoryAccess$Scope;Ljdk/internal/misc/ScopedMemoryAccess$Scope;Ljava/lang/Object;JLjava/lang/Object;JJ)V+32 java.base@18
j jdk.internal.misc.ScopedMemoryAccess.copyMemory(Ljdk/internal/misc/ScopedMemoryAccess$Scope;Ljdk/internal/misc/ScopedMemoryAccess$Scope;Ljava/lang/Object;JLjava/lang/Object;JJ)V+12 java.base@18
j jdk.incubator.foreign.MemorySegment.copy(Ljdk/incubator/foreign/MemorySegment;Ljdk/incubator/foreign/ValueLayout;JLjdk/incubator/foreign/MemorySegment;Ljdk/incubator/foreign/ValueLayout;JJ)V+202 jdk.incubator.foreign@18
j jdk.incubator.foreign.MemorySegment.copy(Ljdk/incubator/foreign/MemorySegment;JLjdk/incubator/foreign/MemorySegment;JJ)V+13 jdk.incubator.foreign@18
j jdk.incubator.foreign.MemorySegment.copyFrom(Ljdk/incubator/foreign/MemorySegment;)Ljdk/incubator/foreign/MemorySegment;+10 jdk.incubator.foreign@18 (1)
j io.github.bric3.panama.f.syscalls.LinuxSyscall.memfd_secret_external()V+48
j io.github.bric3.panama.f.syscalls.LinuxSyscall.main([Ljava/lang/String;)V+99
v ~StubRoutines::call_stub
V [libjvm.so+0x81420a] JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, JavaThread*)+0x30a
V [libjvm.so+0x8a2111] jni_invoke_static(JNIEnv_*, JavaValue*, _jobject*, JNICallType, _jmethodID*, JNI_ArgumentPusher*, JavaThread*) [clone .isra.174] [clone .constprop.397]+0x351
V [libjvm.so+0x8a4a05] jni_CallStaticVoidMethod+0x145
C [libjli.so+0x47a9] JavaMain+0xd19
C [libjli.so+0x7d69] ThreadJavaMain+0x9
...
siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0xffffffffffffffff (2)
...
| 1 | This happened while doing the MemorySegment::copyFrom call. |
| 2 | Moreover, the segmentation fault appears to have been caused by memory access
to non mapped memory address SEGV_MAPERR. The most common other reason for segfault
is SEGV_ACCERR, which is caused by accessing a memory address with wrong permissions. |
So what happened? Actually, the value of the file descriptor was -1, which, of course,
is not a valid file descriptor. The call to ftruncate copes with an invalid
file descriptor: it simply fails without crashing.
The call to mmap on that file descriptor also returns -1, a value that is then used as
the memory segment address (hence the si_addr 0xffffffffffffffff in the log above).
So why did this happen? When invoking native functions, syscalls in particular, one needs to be aware of their error-handling convention.
errno
Indeed, when developing in C/C++, a return value of -1 usually means
that something went wrong, and that the result is invalid.
Moreover, errno is a global variable (in practice a thread-local one) that is set by system
calls and some library functions, see the relevant
man 3 errno.
Its declaration depends on the system.
errno
- /usr/include/asm-generic/errno.h
- /usr/include/asm-generic/errno-base.h
extern int *__errno_location (void) __THROW __attribute_const__;
# define errno (*__errno_location ())
...
/*
* This error code is special: arch syscall entry code will return
* -ENOSYS if users try to call a syscall that doesn't exist. To keep
* failures of syscalls that really do exist distinguishable from
* failures due to attempts to use a nonexistent syscall, syscall
* implementations should refrain from returning -ENOSYS.
*/
#define ENOSYS 38 /* Invalid system call number */
...
errno
- /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/sys/errno.h
- /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/sys/errno.h
extern int * __error(void);
#define errno (*__error())
...
#define ENOLCK 77 /* No locks available */
#define ENOSYS 78 /* Function not implemented */
...
So we’ll need to check the errors after each call in our case, as each of these calls is a system call underneath.
On Linux we can see that errno definition is actually a call to a function
that returns a pointer : *__errno_location (), which is OS-specific.
Before the finalization of the FFM API, one had to look up that function
manually and adapt per OS. The final API offers a portable alternative:
passing Linker.Option.captureCallState("errno") when creating the downcall handle.
errno
var captureStateLayout = Linker.Option.captureStateLayout(); (1)
long errnoOffset = captureStateLayout.byteOffset(
MemoryLayout.PathElement.groupElement("errno")); (2)
var linker = Linker.nativeLinker();
MethodHandle syscall = linker.downcallHandle(
linker.defaultLookup().find("syscall").orElseThrow(),
FunctionDescriptor.of(ValueLayout.JAVA_INT, ValueLayout.JAVA_INT),
Linker.Option.captureCallState("errno") (3)
);
var captureState = arena.allocate(captureStateLayout); (4)
int result = (int) syscall.invoke(captureState, SYS_getpid); (5)
int errno = captureState.get(ValueLayout.JAVA_INT, errnoOffset); (6)
| 1 | The layout of the capture state struct (contains an errno field). |
| 2 | Pre-compute the byte offset of the errno field once. |
| 3 | Request that the linker capture errno after the call. |
| 4 | Allocate the capture state segment scoped to the current arena. |
| 5 | The capture state segment is always passed as the first argument. |
| 6 | Read the captured errno value directly via the pre-computed offset. |
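To see the capture mechanism report an actual error, one option is to make a call that is guaranteed to fail. The sketch below closes the invalid file descriptor -1 through the libc close function, which is expected to fail with EBADF (9 on both Linux and macOS). The class and method names are hypothetical, and a JDK 22+ VM is assumed.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemoryLayout;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.StructLayout;
import java.lang.invoke.MethodHandle;

import static java.lang.foreign.ValueLayout.JAVA_INT;

public class ErrnoCapture {
    static final int EBADF = 9; // "Bad file descriptor"

    /** Calls close(-1) and returns the captured errno, or 0 if the call unexpectedly succeeded. */
    static int errnoAfterClosingInvalidFd() {
        Linker linker = Linker.nativeLinker();
        StructLayout stateLayout = Linker.Option.captureStateLayout();
        long errnoOffset = stateLayout.byteOffset(MemoryLayout.PathElement.groupElement("errno"));
        MethodHandle close = linker.downcallHandle(
                linker.defaultLookup().find("close").orElseThrow(),
                FunctionDescriptor.of(JAVA_INT, JAVA_INT),   // int close(int fd)
                Linker.Option.captureCallState("errno")
        );
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment state = arena.allocate(stateLayout);
            int result = (int) close.invokeExact(state, -1); // capture state comes first
            int errno = state.get(JAVA_INT, errnoOffset);
            return result == -1 ? errno : 0;
        } catch (Throwable t) {
            throw new IllegalStateException("close downcall failed", t);
        }
    }

    public static void main(String[] args) {
        System.out.println("errno after close(-1): " + errnoAfterClosingInvalidFd());
    }
}
```

The captured value is only meaningful when the call itself reports a failure, which is why the sketch checks the return value before trusting errno.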
On Linux the moreutils package has a tool called errno that can be used
to list all the error codes: errno -l.
Additionally, there is a function strerror that returns a string from an error
code. Alternatively, one can also use strerror_r function.
var linker = Linker.nativeLinker();
MethodHandle strerror = linker.downcallHandle(
linker.defaultLookup().find("strerror").orElseThrow(),
FunctionDescriptor.of(ValueLayout.ADDRESS, ValueLayout.JAVA_INT)
);
String errmsg = ((MemorySegment) strerror.invoke(errno)) (1)
.reinterpret(Long.MAX_VALUE) (2)
.getString(0); (3)
| 1 | Pass the errno code |
| 2 | Materialize the address as a segment, the string is null-terminated, so we need
to reinterpret the segment with a large size. Don’t go over the null character, otherwise…
It might be possible to have a better value here like POSIX’s LINE_MAX. |
| 3 | Read the string from the segment, offset is 0. |
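Wrapped in a helper, the strerror lookup might look like the sketch below. It follows the idea from callout 2 of bounding the reinterpreted size instead of using Long.MAX_VALUE; the 1024-byte bound is an arbitrary assumption of this sketch, not a documented limit, and the class name is hypothetical.

```java
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.invoke.MethodHandle;

import static java.lang.foreign.ValueLayout.ADDRESS;
import static java.lang.foreign.ValueLayout.JAVA_INT;

public class StrerrorSketch {
    // Assumed upper bound for an error message; getString stops at the first NUL anyway.
    static final long MAX_MESSAGE_BYTES = 1024;

    /** Returns the libc description of an errno value. */
    static String strerror(int errno) {
        Linker linker = Linker.nativeLinker();
        MethodHandle strerror = linker.downcallHandle(
                linker.defaultLookup().find("strerror").orElseThrow(),
                FunctionDescriptor.of(ADDRESS, JAVA_INT)     // char *strerror(int errnum)
        );
        try {
            MemorySegment message = (MemorySegment) strerror.invokeExact(errno);
            return message.reinterpret(MAX_MESSAGE_BYTES).getString(0);
        } catch (Throwable t) {
            throw new IllegalStateException("strerror downcall failed", t);
        }
    }

    public static void main(String[] args) {
        System.out.println("errno 2: " + strerror(2));
    }
}
```

The exact wording of the message depends on the libc and on the process locale, so it is better suited to logs than to programmatic checks.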
So, placing this check after the memfd_secret syscall looked like a good bet.
Doing something similar after each call is a good idea as well; it
kinda looks like the Go way of checking errors.
var captureState = arena.allocate(captureStateLayout); (1)
fd = (int) sys_memfd_secret.invoke(captureState, 0); (2)
if (fd == -1) {
int errno = captureState.get(ValueLayout.JAVA_INT, errnoOffset); (3)
System.err.println(errno == ENOSYS ?
"tried to call a syscall that doesn't exist (errno=ENOSYS), may need to set the 'secretmem.enable=1' kernel boot option" :
"syscall memfd_secret failed, errno: " + errno + ", " + strerror(errno));
return Optional.empty();
}
| 1 | Allocate the capture state segment (reusing captureStateLayout from above). |
| 2 | Pass it as the first argument; sys_memfd_secret must have been created with Linker.Option.captureCallState("errno"). |
| 3 | Read the captured errno directly via the pre-computed errnoOffset. |
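Putting the pieces together, the whole flow with a check after each native call could be sketched as follows. This is not the article's MemfdSecretDemo program: the class and helper names are hypothetical, it assumes Linux on x86_64 or aarch64 (syscall number 447) and a JDK 22+ VM, and it returns an empty Optional whenever a step fails, for example when the kernel does not provide memfd_secret.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemoryLayout;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.StructLayout;
import java.lang.invoke.MethodHandle;
import java.util.Optional;

import static java.lang.foreign.ValueLayout.*;

public class CheckedSecretMemory {
    static final long SYS_memfd_secret = 447L; // Linux x86_64 and aarch64
    static final int PROT_READ = 1, PROT_WRITE = 2, MAP_SHARED = 1;

    static final Linker LINKER = Linker.nativeLinker();
    static final StructLayout STATE_LAYOUT = Linker.Option.captureStateLayout();
    static final long ERRNO_OFFSET =
            STATE_LAYOUT.byteOffset(MemoryLayout.PathElement.groupElement("errno"));

    /** Looks up a libc function and asks the linker to capture errno after each call. */
    static MethodHandle libc(String name, FunctionDescriptor descriptor) {
        return LINKER.downcallHandle(LINKER.defaultLookup().find(name).orElseThrow(),
                descriptor, Linker.Option.captureCallState("errno"));
    }

    /** Reports the failing step with its errno and yields an empty result. */
    static Optional<byte[]> failed(String step, MemorySegment state) {
        System.err.println(step + " failed, errno: " + state.get(JAVA_INT, ERRNO_OFFSET));
        return Optional.empty();
    }

    /** Stores the secret in a memfd_secret mapping and reads it back; empty if any step fails. */
    static Optional<byte[]> roundTrip(byte[] secret) {
        if (!System.getProperty("os.name").startsWith("Linux")) {
            return Optional.empty(); // 447 means something else on other systems
        }
        var syscall = libc("syscall", FunctionDescriptor.of(JAVA_LONG, JAVA_LONG, JAVA_LONG));
        var ftruncate = libc("ftruncate", FunctionDescriptor.of(JAVA_INT, JAVA_INT, JAVA_LONG));
        var mmap = libc("mmap", FunctionDescriptor.of(
                ADDRESS, ADDRESS, JAVA_LONG, JAVA_INT, JAVA_INT, JAVA_INT, JAVA_LONG));
        var munmap = libc("munmap", FunctionDescriptor.of(JAVA_INT, ADDRESS, JAVA_LONG));
        var close = libc("close", FunctionDescriptor.of(JAVA_INT, JAVA_INT));

        try (Arena arena = Arena.ofConfined()) {
            MemorySegment state = arena.allocate(STATE_LAYOUT);
            long size = secret.length;

            long rawFd = (long) syscall.invokeExact(state, SYS_memfd_secret, 0L);
            int fd = (int) rawFd;
            if (fd == -1) return failed("memfd_secret", state);
            try {
                int truncated = (int) ftruncate.invokeExact(state, fd, size);
                if (truncated == -1) return failed("ftruncate", state);

                MemorySegment mapped = (MemorySegment) mmap.invokeExact(
                        state, MemorySegment.NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0L);
                if (mapped.address() == -1L) return failed("mmap", state); // MAP_FAILED is (void *) -1

                MemorySegment segment = mapped.reinterpret(size);
                segment.copyFrom(MemorySegment.ofArray(secret));
                byte[] copy = segment.toArray(JAVA_BYTE);
                int unmapped = (int) munmap.invokeExact(state, mapped, size);
                return Optional.of(copy);
            } finally {
                int closed = (int) close.invokeExact(state, fd);
            }
        } catch (Throwable t) {
            throw new IllegalStateException("native call failed", t);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("super secret decryption key".getBytes())
                .map(String::new).orElse("secret memory not available"));
    }
}
```

Returning early on the first failed step keeps the -1 results from propagating into mmap and the copy, which is what caused the earlier crash.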
While reviewing the memfd_secret commit, we can see there’s a check that
returns ENOSYS when a
condition is not met.
So to make the whole thing work, we need to make sure the OS has the memfd_secret
syscall available.
Linux availability
From Linux 6.5 onward
When I wrote this article, the Linux Kernel was in the 5.x line, and tools were not using the latest 5.14 kernel.
Fortunately, we’re now in 2026, and I discovered that starting from Linux 6.5,
memfd_secret is enabled by default,
albeit only if the Kernel is built with the CONFIG_SECRETMEM
config option.
The good news is that, as of today, most of the distributions have it. And the better news is that tools like Docker Desktop, or Orbstack are running a later version of the Linux Kernel.
Hibernation is inhibited for as long as any memfd_secret() descriptions exist,
to prevent secrets from leaking into the hibernation image, which likely makes them inappropriate
for laptops and other mobile devices.
|
Validating the presence of the syscall
In fact, this can be validated quickly without running the full MemfdSecretDemo program:
a short jshell snippet (assuming a Java 22+ VM) is enough to check for the presence
of the syscall.
jshell snippet
jshell -q -s <<EOF
import java.lang.foreign.*;
import java.lang.invoke.*;
import static java.lang.foreign.ValueLayout.*;
var linker = Linker.nativeLinker();
var libc = linker.defaultLookup();
long SYS_memfd_secret = switch (System.getProperty("os.arch")) {
case "amd64", "x86_64", "aarch64", "arm64" -> 447L;
default -> throw new IllegalStateException("Unsupported arch: " + System.getProperty("os.arch"));
};
var syscall = linker.downcallHandle(
libc.find("syscall").orElseThrow(),
FunctionDescriptor.of(JAVA_LONG, JAVA_LONG, JAVA_LONG)
);
long fd = (long) syscall.invokeExact(SYS_memfd_secret, 0L); (1)
System.out.println("fd=" + fd);
/exit
EOF
| 1 | If the syscall works, we’ll have a valid file descriptor (positive number),
otherwise, we’ll get a -1 value. |
The bootloader flag from Linux 5.14 to 6.4
Linux is gating the memfd_secret syscall by a flag named
secretmem_enable. That may be why memfd_secret is not listed when looking at
man 2 syscalls.
It’s not quite clear from the commit
that introduced memfd_secret but in order to work, the machine boot has to
be configured with the flag secretmem.enable=1.
| DISCLAIMER: I am not responsible if something goes wrong on your machines / OS. The following actually changes the Linux bootloader configuration, and as such, any misconfiguration could make the system non-bootable! Please read and understand the documentation of your system before proceeding. |
| Enabling this prevents hibernation whenever there are active secret memory users. |
My test machine is a Fedora 35, let’s read their page on the GRUB2 bootloader.
From this page, it seems there’s a fairly simple way to change the bootloader configuration.
Add the secretmem.enable=1 flag
sudo grubby --update-kernel=ALL --args="secretmem.enable=1"
sudo grubby --info=ALL
Remove the secretmem.enable=1 flag
sudo grubby --update-kernel=ALL --remove-args="secretmem.enable=1"
Notice the actual flag name is secretmem.enable, not secretmem_enable !
Then reboot the OS. Now if the configuration was properly applied,
memfd_secret should return a valid file descriptor.
The result
Assuming the host OS has the syscall, the program should run just fine:
$ java --enable-native-access=ALL-UNNAMED MemfdSecret.java
Secret mem fd: 4 (1)
Secret: super secret decryption key
| 1 | memfd_secret here returned the file descriptor 4 |
Typically, this secret storage could be used to hold a decryption key loaded during startup, which is later used to decrypt encrypted payloads. Of course, care must be taken to prevent this data from leaving this memory, which might not be possible under many circumstances, e.g. with a library that takes a Java String, in which case the secret is copied elsewhere in the heap.
Improvements
Trying to replace most panama calls by JDK types
So apart from the memfd_secret syscall, do the other calls look to be
replaceable?
FileChannel.map(MapMode, long, long, Arena) — added in JDK 22 as part of JEP-454 —
looks like a good bet to replace mmap.
FileChannel::map usage
try (var channel = FileChannel.open(path, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
MemorySegment segment = channel.map(FileChannel.MapMode.READ_WRITE, 0, size, arena);
}
However, this API requires a Path and a FileChannel, and the mapping is limited
to a single MapMode:
- A file descriptor value is a number, here 4, accessible on Linux via procfs under /dev/fd/4 or /proc/self/fd/4, but even these don’t work as a Path.
- And we could not map this segment as write then read via this API; performing this operation twice, once in write-only mode and once in read-only mode, would not work, as this special file descriptor would be closed after the first memory mapping.
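For comparison, here is what the FileChannel route looks like when a real Path is available. This is a minimal sketch on a temporary regular file, not on the secret file descriptor; the class name and the roundTrip helper are invented for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MapRegularFile {
    /** Maps a temporary file, writes the text into the mapping and reads it back. */
    public static String roundTrip(String text) {
        try {
            Path path = Files.createTempFile("mapped", ".bin");
            try (Arena arena = Arena.ofConfined();
                 FileChannel channel = FileChannel.open(
                         path, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                // Room for the UTF-8 bytes plus the terminating NUL written by setString
                long size = text.getBytes(StandardCharsets.UTF_8).length + 1;
                // In READ_WRITE mode the file is extended to cover the mapped region
                MemorySegment segment =
                        channel.map(FileChannel.MapMode.READ_WRITE, 0, size, arena);
                segment.setString(0, text);
                return segment.getString(0);
            } finally {
                Files.deleteIfExists(path);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("meow"));
    }
}
```

The mapping's lifetime is tied to the Arena, so the segment is unmapped when the confined arena is closed.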
There are some interesting bits in FileOutputStream / FileInputStream: they
can be created from a JDK FileDescriptor, and they give access to the underlying
FileChannel, which in turn allows calling map() to get a memory mapping. However,
the FileDescriptor class does not have a public constructor that takes a file
descriptor number, and even hacking FileDescriptor (with
--add-opens=java.base/java.io=ALL-UNNAMED) is not enough: we end up in the same
situation as above, because it’s only possible to have a read-only or a
write-only mapping.
Basically, we’re stuck with using the native mmap function to do what’s
necessary. I don’t know if this is out of scope for JEP-419 or the following
JEP-424, but I think it would be good to support a MemorySegment backed by an
arbitrary file descriptor. In particular, for programs that run on the
command line, this could enable things like
java Main.java <(cat neko | grep meow).
Finally, I don’t believe there’s something equivalent available in JDK for the
ftruncate function.
Improving our syscall API
In the snippet above, we’ve declared a MethodHandle to the syscall function;
if there are multiple syscalls, we’ll need to pass the syscall number as the
first argument each time. The MethodHandles API allows creating a partially
applied function by binding that argument up front.
var linker = Linker.nativeLinker();
var syscallAddress = linker.defaultLookup().find("syscall").orElseThrow();
var syscall = linker.downcallHandle(
syscallAddress,
FunctionDescriptor.of(
ValueLayout.JAVA_INT,
ValueLayout.JAVA_INT (1)
)
);
var sys_getpid = MethodHandles.insertArguments(syscall, 0, SYS_getpid); (2)
sys_getpid.invoke(); (3)
| 1 | The first argument is the syscall number. |
| 2 | Capture the syscall number and create a "partial function", here named sys_getpid. |
| 3 | Just invoke the new MethodHandle; the partial function no longer needs the syscall
number argument (at position 0). |
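The binding done by MethodHandles.insertArguments is not specific to native calls. As a minimal sketch that runs without native access, the example below applies it to a plain static Java method; describe is a made-up stand-in for a function whose first parameter is a number, and 39 is just an example value.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class PartialApplication {
    // Stand-in for a function taking a number as its first argument
    public static String describe(int number, String name) {
        return name + "=" + number;
    }

    public static String demo() {
        try {
            MethodHandle describe = MethodHandles.lookup().findStatic(
                    PartialApplication.class,
                    "describe",
                    MethodType.methodType(String.class, int.class, String.class));
            // Bind the argument at position 0; the new handle has type (String)String
            MethodHandle describe39 = MethodHandles.insertArguments(describe, 0, 39);
            return (String) describe39.invoke("getpid");
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints getpid=39
    }
}
```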
Now, if a syscall has a different arity, FunctionDescriptor::appendArgumentLayouts
has us covered: we can use the basic syscall descriptor as a template
and build on top of it to get a specific handle for each syscall.
var linker = Linker.nativeLinker();
var syscallAddress = linker.defaultLookup().find("syscall").orElseThrow();
var basicSyscallDescriptor = FunctionDescriptor.of(
ValueLayout.JAVA_INT,
ValueLayout.JAVA_INT
);
var sys_memfd_secret = MethodHandles.insertArguments(
linker.downcallHandle(
syscallAddress,
basicSyscallDescriptor.appendArgumentLayouts(ValueLayout.JAVA_INT) (1)
),
0,
SYS_memfd_secret (2)
);
int fd = (int) sys_memfd_secret.invoke(0); (3)
| 1 | Append arguments to the function descriptor; this returns a new FunctionDescriptor. |
| 2 | Capture the syscall number and create a "partial function". |
| 3 | Invoke the call, passing only the required arguments at the call site. |
MethodHandles offers other combinators that can be handy with Panama,
but they are out of scope for this blog post. Check the API.
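To give one small taste, MethodHandles.filterReturnValue can wrap a handle so that a -1 return value becomes an exception. The sketch below uses a plain Java method, fakeSyscall, as a stand-in for a native downcall so that it runs without native access; all names here are invented for the illustration.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class CheckedReturn {
    // Stand-in for a native call that returns -1 on failure
    public static int fakeSyscall(int arg) {
        return arg < 0 ? -1 : arg + 3;
    }

    // Filter applied to the return value of the target handle
    public static int failOnMinusOne(int result) {
        if (result == -1) {
            throw new IllegalStateException("call failed");
        }
        return result;
    }

    private static final MethodHandle CHECKED = buildChecked();

    private static MethodHandle buildChecked() {
        try {
            var lookup = MethodHandles.lookup();
            var intToInt = MethodType.methodType(int.class, int.class);
            MethodHandle target = lookup.findStatic(CheckedReturn.class, "fakeSyscall", intToInt);
            MethodHandle filter = lookup.findStatic(CheckedReturn.class, "failOnMinusOne", intToInt);
            // The resulting handle calls target, then passes its result through filter
            return MethodHandles.filterReturnValue(target, filter);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static int call(int arg) {
        try {
            return (int) CHECKED.invokeExact(arg);
        } catch (RuntimeException e) {
            throw e;
        } catch (Throwable t) {
            throw new AssertionError("should not reach here", t);
        }
    }

    public static void main(String[] args) {
        System.out.println(call(1)); // prints 4
    }
}
```

The same filter could be applied to a downcall handle whose return type is int.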
Generating the MethodHandles with jextract
The JDK Panama team also created a tool known as jextract whose job is to
do most of the work required to generate the MethodHandles.
As I mentioned in other blog posts and conference talks, jextract is now
a separate tool. Unlike when I first wrote this article, there are now
early-access binaries for it, so building it
from source is optional. If you want the latest bits or need to match a
specific JDK line, the jextract project
page explains how to build it. My test machine runs Fedora, so adapt the
commands and the JDK distribution to your needs.
|
As of March 26, 2026, jdk.java.net/jextract
lists the early-access build |
jextract
sudo dnf install java-21-openjdk-devel (1)
sudo dnf install java-25-openjdk-devel java-25-openjdk-jmods (2)
curl -LO https://github.com/llvm/llvm-project/releases/download/llvmorg-22.1.2/LLVM-22.1.2-Linux-ARM64.tar.xz (3)
tar xf LLVM-22.1.2-Linux-ARM64.tar.xz --directory /clang/extract/path/
git clone --depth 1 https://github.com/openjdk/jextract.git
cd jextract
sh ./gradlew \
-Pjdk_home=/path/to/jdk25 \
-Pllvm_home=/clang/extract/path/LLVM-22.1.2-Linux-ARM64/ \
clean verify (4)
| 1 | Install a JDK to run Gradle; at the time of writing, this project uses Gradle 8.11.1, which is incompatible with JDK 25. |
| 2 | Install a recent JDK with jmods; on some distributions jmods are
packaged separately. |
| 3 | One possible LLVM distribution, here for Linux arm64; any compatible libclang
installation works. |
| 4 | Finally, run the documented build command with the required JDK and LLVM home directories. |
If everything went well, you can now use jextract. The output below is the one I
got from commit ad6430f83085c87f0f226a47c46c593d87d26376.
jextract Version
$ ./build/jextract/bin/jextract --version
jextract 23
JDK version 25.0.2+10-LTS
LibClang version clang version 22.1.2 (https://github.com/llvm/llvm-project 1ab49a973e210e97d61e5db6557180dcb92c3e98)
The basic usage is jextract <options> <header file>. Since there are
multiple headers, the trick is to specify a handcrafted header that includes
every required one.
memfd_secret_header.h
#include <unistd.h>
#include <sys/mman.h>
#include <sys/syscall.h>
|
When I first wrote this article, I had to specify the |
|
This would be really neat if jextract with here-doc, for multiple headers
Of course nowadays, |
Also, note that jextract supports passing its options in an options file.
So we can pass output options, like the target package and class name,
but also the symbols we’d like.
memfd_secret_header.jextract.options
--source (1)
--output build/generated/sources/jextract-syscall/java (2)
--target-package linux (3)
--header-class-name syscall_h (4)
--include-function syscall
--include-constant SYS_memfd_secret
--include-function close
--include-function ftruncate
--include-function mmap
--include-function munmap
--include-constant PROT_READ
--include-constant PROT_WRITE
--include-constant MAP_SHARED
--include-function strerror
| 1 | Tells jextract to generate the source code, instead of classes. |
| 2 | Specifies the output directory. |
| 3 | Specifies the package name. |
| 4 | Specifies the class name. |
jextract
$ jextract @memfd_secret_header.jextract.options memfd_secret_header.h (1)
| 1 | Assuming jextract is in the PATH, or there’s an alias. |
So once done, we’ll have a file with all the symbols we need.
// ...
public class syscall_h extends syscall_h$shared {
syscall_h() {
// Should not be called directly
}
static final Arena LIBRARY_ARENA = Arena.ofAuto();
static final SymbolLookup SYMBOL_LOOKUP = SymbolLookup.loaderLookup()
.or(Linker.nativeLinker().defaultLookup());
private static final int PROT_READ = (int)1L;
/**
* {@snippet lang=c :
* #define PROT_READ 1
* }
*/
public static int PROT_READ() {
return PROT_READ;
}
private static final int PROT_WRITE = (int)2L;
/**
* {@snippet lang=c :
* #define PROT_WRITE 2
* }
*/
public static int PROT_WRITE() {
return PROT_WRITE;
}
private static final int MAP_SHARED = (int)1L;
/**
* {@snippet lang=c :
* #define MAP_SHARED 1
* }
*/
public static int MAP_SHARED() {
return MAP_SHARED;
}
// ...
/**
* {@snippet lang=c :
* extern int close(int __fd)
* }
*/
public static int close(int __fd) {
// ...
}
//...
/**
* {@snippet lang=c :
* extern int ftruncate(int __fd, __off_t __length)
* }
*/
public static int ftruncate(int __fd, long __length) {
// ...
}
/**
* Variadic invoker class for:
* {@snippet lang=c :
* extern long syscall(long __sysno, ...)
* }
*/
public static class syscall {
private static final FunctionDescriptor BASE_DESC = FunctionDescriptor.of(
syscall_h.C_LONG,
syscall_h.C_LONG
);
private static final MemorySegment ADDR = SYMBOL_LOOKUP.findOrThrow("syscall");
private final MethodHandle handle;
private final FunctionDescriptor descriptor;
private final MethodHandle spreader;
private syscall(MethodHandle handle, FunctionDescriptor descriptor, MethodHandle spreader) {
this.handle = handle;
this.descriptor = descriptor;
this.spreader = spreader;
}
/**
* Variadic invoker factory for:
* {@snippet lang=c :
* extern long syscall(long __sysno, ...)
* }
*/
public static syscall makeInvoker(MemoryLayout... layouts) {
FunctionDescriptor desc$ = BASE_DESC.appendArgumentLayouts(layouts);
Linker.Option fva$ = Linker.Option.firstVariadicArg(BASE_DESC.argumentLayouts().size());
var mh$ = Linker.nativeLinker().downcallHandle(ADDR, desc$, fva$);
var spreader$ = mh$.asSpreader(Object[].class, layouts.length);
return new syscall(mh$, desc$, spreader$);
}
/**
* {@return the address}
*/
public static MemorySegment address() {
return ADDR;
}
/**
* {@return the specialized method handle}
*/
public MethodHandle handle() {
return handle;
}
/**
* {@return the specialized descriptor}
*/
public FunctionDescriptor descriptor() {
return descriptor;
}
public long apply(long __sysno, Object... x1) {
try {
if (TRACE_DOWNCALLS) {
traceDowncall("syscall", __sysno, x1);
}
return (long) spreader.invokeExact(__sysno, x1);
} catch(IllegalArgumentException | ClassCastException ex$) {
throw ex$; // rethrow IAE from passing wrong number/type of args
} catch (Throwable ex$) {
throw new AssertionError("should not reach here", ex$);
}
}
}
// ...
/**
* {@snippet lang=c :
* extern void *mmap(void *__addr, size_t __len, int __prot, int __flags, int __fd, __off_t __offset)
* }
*/
public static MemorySegment mmap(MemorySegment __addr, long __len, int __prot, int __flags, int __fd, long __offset) {
var mh$ = mmap.HANDLE;
try {
if (TRACE_DOWNCALLS) {
traceDowncall("mmap", __addr, __len, __prot, __flags, __fd, __offset);
}
return (MemorySegment)mh$.invokeExact(__addr, __len, __prot, __flags, __fd, __offset);
} catch (Error | RuntimeException ex) {
throw ex;
} catch (Throwable ex$) {
throw new AssertionError("should not reach here", ex$);
}
}
// ...
/**
* {@snippet lang=c :
* extern int munmap(void *__addr, size_t __len)
* }
*/
public static int munmap(MemorySegment __addr, long __len) {
// ...
}
private static final int SYS_memfd_secret = (int)447L;
/**
* {@snippet lang=c :
* #define SYS_memfd_secret 447
* }
*/
public static int SYS_memfd_secret() {
return SYS_memfd_secret;
}
}
What’s nice is that the arguments are named, e.g. __sysno, __addr, __fd, etc.
Once you have done your research on which symbols you need, it’s really nice to
let jextract generate the code for you, which is likely to be up to date, with
the best practices baked in.
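The generated syscall binding follows the C prototype `long syscall(long, ...)`: it uses a long for both the return value and the syscall number, and it passes Linker.Option.firstVariadicArg, which tells the linker where the variadic arguments start (some ABIs pass them differently). A hand-written equivalent could look like the sketch below, which assumes Linux. The getpid syscall numbers (39 on x86_64, 172 on aarch64) are hard-coded assumptions rather than values extracted from the headers, and the class name is invented.

```java
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;

public class VariadicSyscall {
    /** Assumption: Linux syscall number of getpid for the current architecture. */
    static long sysGetpidNumber() {
        String arch = System.getProperty("os.arch");
        return switch (arch) {
            case "amd64", "x86_64" -> 39L;
            case "aarch64" -> 172L;
            default -> throw new UnsupportedOperationException("Unknown arch: " + arch);
        };
    }

    public static long getpid() {
        Linker linker = Linker.nativeLinker();
        MethodHandle syscall = linker.downcallHandle(
                linker.defaultLookup().find("syscall").orElseThrow(),
                // long syscall(long number, ...)
                FunctionDescriptor.of(ValueLayout.JAVA_LONG, ValueLayout.JAVA_LONG),
                // Variadic arguments would start at index 1; none are passed here
                Linker.Option.firstVariadicArg(1));
        try {
            return (long) syscall.invokeExact(sysGetpidNumber());
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println("pid via syscall: " + getpid());
        System.out.println("pid via JDK:     " + ProcessHandle.current().pid());
    }
}
```

As with the other native examples, run it with --enable-native-access=ALL-UNNAMED to avoid the restricted-method warning.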
There’s one area where this is a bit suboptimal: errno handling is not supported
by jextract at this time, so the generated code has to be patched manually.
When I first wrote this article, the FFM API didn’t have the linker option to capture the
errno value, so I had to use OS-specific tricks to get it; on Linux, I could use
the __errno_location function.
That’s not the only unsupported thing, so watch out for those.
That being said, it’s not a deal-breaker, just something to be aware of.
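With the finalized API, the linker option in question is Linker.Option.captureCallState("errno"). The following is a minimal hand-written sketch, not what jextract generates: it calls close on an invalid file descriptor and reads the captured errno. The class and method names are invented for the example, and it assumes close can be found through the default lookup, as on Linux.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemoryLayout;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.StructLayout;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.VarHandle;

public class ErrnoCapture {
    /** Calls close(fd) and returns the captured errno, or 0 when the call succeeded. */
    public static int errnoOfClosing(int fd) {
        Linker linker = Linker.nativeLinker();
        // Layout of the memory region the linker fills in right after the call
        StructLayout captureLayout = Linker.Option.captureStateLayout();
        VarHandle errnoHandle =
                captureLayout.varHandle(MemoryLayout.PathElement.groupElement("errno"));
        MethodHandle close = linker.downcallHandle(
                linker.defaultLookup().find("close").orElseThrow(),
                FunctionDescriptor.of(ValueLayout.JAVA_INT, ValueLayout.JAVA_INT),
                Linker.Option.captureCallState("errno"));
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment capturedState = arena.allocate(captureLayout);
            // With captureCallState, the capture segment becomes the first argument
            int result = (int) close.invokeExact(capturedState, fd);
            return result == -1 ? (int) errnoHandle.get(capturedState, 0L) : 0;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        // Closing an invalid descriptor fails with EBADF, which is 9 on Linux
        System.out.println("errno: " + errnoOfClosing(-1));
    }
}
```

The captured value could then be passed to strerror, which is already part of the symbols listed in the options file above.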
Closing words
memfd_secret
Originally, I heard about this feature coming in Linux 5.14, and I was hoping to test it, at least from a developer perspective, in the wake of the Spectre-style attacks. The first requirement is a Linux kernel of at least that version. Back then it was a bit tedious to make it work: Docker Desktop ran a Linux 5.10 kernel, and my personal machine required a kernel flag to enable the syscall. Fortunately, those days are over, and most distributions are now built with that syscall; that said, better check the OS you’re running. On a regular laptop, the fact that live secret memory descriptors disable hibernation is almost a deal-breaker. Also, personally, if an application does not keep very tight control over how secrets are used after reading them from this memory, I fail to see the value of such a feature.
Yet again, Project Panama, embodied by multiple JEPs and released in JDK 22, delivers!
It’s possible to interact with the operating system and do so with some ease,
without having to deal with different build systems. I have almost nothing relevant
to mention here.
I missed the possibility of creating a MemorySegment from a file descriptor,
but this might be a rare case, especially on the topic at hand.
It’s worth mentioning that the mandatory use of --enable-native-access=ALL-UNNAMED
is impractical. However, the JVM is moving toward integrity by default,
which affects other aspects as well, including regular JNI (JEP-472,
JEP-471, JEP-451), so I understand
the purpose of this flag, and anyway there’s just no way around it.
If this flag is not specified, users will get a warning for the first call to
a restricted method (one warning per module).
Yet again, I’m happy to see Project Panama landing in the JDK to bridge the gap to the native world without a third-party library. I wished it had landed in an LTS release, as folks tend to prefer those, but now that JDK 25 has also landed, that should be good enough for many projects.
-
Athijegannathan Sundararajan and the Panama team