Cross posting from https://programming.dev/post/30565644
Hi everyone,
Yesterday we released iceoryx2 v0.6, an ultra-low-latency inter-process communication framework for Rust, C, and C++. Python support is also on the horizon. The main new feature is Request-Response-Stream, but there’s much more.
If you are into robotics, embedded real-time systems (especially safety-critical), autonomous vehicles or just want to hack around, iceoryx2 is built with you in mind.
Check out our release announcement for more details: https://ekxide.io/blog/iceoryx2-0-6-release
And the link to the project: https://github.com/eclipse-iceoryx/iceoryx2
If it’s for critical systems, why not Ada?
It would be nice to have a more technical description of exactly how it works, both for the shared-memory operations and for what happens with complicated data structures.
I presume there are the usual locks/futexes, but it could be way cool to also have software transactional memory (STM), if that’s not already there.
At the end of the day, if you really want super low latency, you probably have to run on a real-time CPU intended for such stuff, rather than a general-purpose CPU under a full-blown OS. Otherwise, cache misses can be almost as bad as page faults used to be.
Hi solrize,
Those are some great questions. Let me answer them one by one.
Our background is in automotive and iceoryx classic is written in C++. It was meant to be certified to ISO 26262 ASIL-D for autonomous driving. Due to some early design decisions, we hit a dead end and a major refactoring was necessary. At that time, we were already convinced that C++ is not a good fit for safety-critical software. Instead of a major refactoring in C++, which would basically be a rewrite, we opted for Rust. Rust was chosen because we still want iceoryx2 to become the backbone of communication in cars, and automotive OEMs are moving to Rust, not Ada. While Ada is widely used in aviation, in the automotive sector, we are competing with C and C++. If we had chosen Ada, we would have had to fight an uphill battle to convince OEMs to use Ada, which we would not have been able to win as a bunch of OSS developers.
Yeah, we need to create more documentation; it will be part of the certification process. We are currently a small team and there is a ton of work to do. On top of that, we created our own company to pursue our vision of open-source software, and the German bureaucracy is a nightmare, almost as if Germany does not want new companies to succeed. But I digress.
We are basically using shared memory with lock-free algorithms for data transport. For the event mechanism, one can choose between semaphores and Unix domain sockets. We currently have three messaging patterns: publish-subscribe, request-response, and event. The first two will not block by default and also do not involve syscalls; everything happens with shared memory and lock-free algorithms without the OS. The last one, the event messaging pattern, can be used in combination with the first two to have blocking waits for data. That’s also the only blocking call in the iceoryx2 API (well, the publisher can be configured to block if the subscriber is too slow to process the data, but that is not turned on by default and also not recommended for safety-critical systems). With the lessons learned from iceoryx classic, iceoryx2 does not use a central broker or any background threads. We tried to be as unopinionated as possible and create building blocks so that iceoryx2 can be integrated as a transport layer into other frameworks.
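To give a feel for the API, here is a rough publish-subscribe sketch in Rust, loosely based on the examples in the repository. The service name and payload are made up, publisher and subscriber would normally live in separate processes, and details may differ slightly between releases:

```rust
use core::time::Duration;
use iceoryx2::prelude::*;

const CYCLE_TIME: Duration = Duration::from_millis(100);

fn main() -> Result<(), Box<dyn core::error::Error>> {
    // every process creates its own node; there is no central broker or daemon
    let node = NodeBuilder::new().create::<ipc::Service>()?;

    // open the service if it already exists, otherwise create it
    let service = node
        .service_builder(&"Example/Radar/Distance".try_into()?)
        .publish_subscribe::<u64>()
        .open_or_create()?;

    let publisher = service.publisher_builder().create()?;
    let subscriber = service.subscriber_builder().create()?;

    while node.wait(CYCLE_TIME).is_ok() {
        // zero-copy: loan a sample that lives in shared memory, write it, send it
        let sample = publisher.loan_uninit()?;
        let sample = sample.write_payload(42);
        sample.send()?;

        // non-blocking poll; receive() returns None once the queue is empty
        while let Some(sample) = subscriber.receive()? {
            println!("received: {}", *sample);
        }
    }

    Ok(())
}
```

The loan/send pair is what keeps the hot path free of syscalls: the sample is written directly into shared memory and sending only pushes an entry into a lock-free queue.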
For complex data structures, we provide some fixed-size containers that fulfill the requirement of being placed in shared memory, which basically means that they are self-contained and relocatable. If that’s not the case, then a serialization step is required, or frameworks like FlatBuffers can be used. We plan to have containers that are ABI compatible between Rust and C++, but there is still quite a bit of work to do.
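As a rough illustration of what self-contained and relocatable means in practice (the type here is just a made-up example, not something shipped with iceoryx2):

```rust
// Suitable for shared memory: fixed layout, no pointers; the whole payload
// travels inside the struct itself, so it is valid in any process that maps it.
#[repr(C)]
#[derive(Debug, Clone, Copy)]
pub struct LidarScan {
    pub timestamp_ns: u64,
    pub num_valid: u32,
    pub ranges: [f32; 1024], // fixed-size buffer instead of a heap allocation
}

// Not suitable as-is: the Vec stores a pointer into the sending process's heap,
// which is meaningless in the receiver's address space. This would need either
// a fixed-size container or a serialization step (e.g. FlatBuffers).
pub struct LidarScanDynamic {
    pub timestamp_ns: u64,
    pub ranges: Vec<f32>,
}
```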
We use neither of those. iceoryx2 uses lock-free queues to share data between processes. Locks cannot be used since they can lead to deadlocks, especially if the application holding the lock crashes. Besides the lock-free queues, iceoryx2 also tracks the data that is in flight and can reclaim the resources of crashed applications. In the end, we want to make shared-memory communication even simpler than using Unix domain sockets by abstracting all the complexity away behind a nice-to-use API.
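To make the "only blocking call" mentioned above concrete: if you do not want to poll, you attach an event service and block on its listener instead of on a lock held by user code. A rough sketch, again loosely following the repository examples; the exact method names may not match the current release:

```rust
use core::time::Duration;
use iceoryx2::prelude::*;

fn main() -> Result<(), Box<dyn core::error::Error>> {
    let node = NodeBuilder::new().create::<ipc::Service>()?;

    let event = node
        .service_builder(&"Example/Radar/DataReady".try_into()?)
        .event()
        .open_or_create()?;

    // the sending side would call notifier.notify()? after publishing a sample
    let _notifier = event.notifier_builder().create()?;

    // the receiving side blocks (here with a timeout) until a notification
    // arrives, then drains the publish-subscribe queue with non-blocking receive()
    let listener = event.listener_builder().create()?;
    if let Some(event_id) = listener.timed_wait_one(Duration::from_secs(1))? {
        println!("woken up by event {:?}", event_id);
    }

    Ok(())
}
```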
Agreed, for safety-critical systems, iceoryx2 needs to run on real-time OSes and CPUs. While we see an average latency of around 100 ns in polling mode on modern CPUs under Linux, every now and then the OS kicks in and schedules another process while we are publishing data. This leads to latency spikes, which are unavoidable on non-real-time OSes and CPUs.
I’m currently working on a port to VxWorks, and other real-time OSes will follow, together with ports to ARM R-cores. But, as mentioned above, we are a small team and not VC-funded, so we need customers who are willing to fund our open-source work. Our current modus operandi is to find a customer who is almost satisfied with the present feature set and is willing to pay for one or two additional features. Once that is done, we repeat the search with the new feature set. It’s cumbersome, but it currently looks like it might work out well. We just need to improve our sales skills; after all, all of us are software developers. So, if you know someone who is in dire need of a framework like iceoryx2, don’t hesitate to recommend us :)