Using kqueue for simple async I/O - iOS

How does one actually use kqueue() for doing simple async r/w's?
Its inception seems to be as a replacement for epoll() and select(), and thus the problem it is trying to solve is scaling to listening on a large number of file descriptors for changes.
However, if I want to do something like: read data from descriptor X, let me know when the data is ready - how does the API support that? Unless there is a complementary API for kicking off non-blocking r/w requests, I don't see a way other than managing a thread pool myself, which defeats the purpose.
Is this simply the wrong tool for the job? Stick with aio?
Aside: I'm not savvy with how modern BSD-based OS internals work - but is kqueue() built on aio or vice versa? I would imagine it would depend on whether the OS I/O subsystem is fundamentally interrupt-driven or polling.

None of the APIs you mention, aside from aio itself, has anything to do with asynchronous IO, as such.
None of select(), poll(), epoll(), or kqueue() are helpful for reading from file systems (or "vnodes"). File descriptors for file system items are always "ready", even if the file system is network-mounted and there is network latency such that a read would actually block for a significant time. Your only choice there to avoid blocking is aio or, on a platform with GCD, dispatch IO.
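To make the dispatch IO option concrete, here is a minimal sketch in C under GCD (the file path, 64 KiB length, and queue choice are arbitrary placeholders; error handling is trimmed). The handler fires on the queue once the data has been read, so the calling thread never blocks:

    #include <dispatch/dispatch.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <stdio.h>

    /* Sketch only: read a plain file without blocking the caller. */
    void read_file_async(void)
    {
        int fd = open("/tmp/example.dat", O_RDONLY);
        if (fd < 0) return;

        dispatch_queue_t q = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);

        /* The handler runs on q when the read completes (or fails). */
        dispatch_read(fd, 64 * 1024, q, ^(dispatch_data_t data, int error) {
            if (error == 0)
                printf("got %zu bytes without blocking this thread\n",
                       dispatch_data_get_size(data));
            close(fd);
        });
    }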
The use of kqueue() and the like is for other kinds of file descriptors such as sockets, pipes, etc. where the kernel maintains buffers and there's some "event" (like the arrival of a packet or a write to a pipe) that changes when data is available. Of course, kqueue() can also monitor a variety of other input sources, like Mach ports, processes, etc.
(You can use kqueue() for reads of vnodes, but then it only tells you when the file position is not at the end of the file. So, you might use it to be informed when a file has been extended or truncated. It doesn't mean that a read would not block.)
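For the socket/pipe case, the pattern is roughly as follows (a hedged sketch, assuming sock_fd is an already-connected, non-blocking socket; error handling trimmed): register an EVFILT_READ filter, then call kevent() to sleep until the kernel has buffered data.

    #include <sys/types.h>
    #include <sys/event.h>
    #include <sys/time.h>
    #include <unistd.h>
    #include <stdio.h>

    void wait_and_read(int sock_fd)
    {
        int kq = kqueue();

        /* Register interest in "data available to read" on this descriptor. */
        struct kevent change;
        EV_SET(&change, sock_fd, EVFILT_READ, EV_ADD | EV_ENABLE, 0, 0, NULL);
        kevent(kq, &change, 1, NULL, 0, NULL);

        for (;;) {
            struct kevent event;
            /* Blocks until the kernel has data buffered for sock_fd. */
            int n = kevent(kq, NULL, 0, &event, 1, NULL);
            if (n <= 0)
                break;

            /* event.data holds the number of bytes currently readable. */
            char buf[4096];
            ssize_t got = read((int)event.ident, buf, sizeof buf);
            if (got <= 0)
                break;            /* EOF or error */
            printf("read %zd bytes\n", got);
        }
        close(kq);
    }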
I don't think either kqueue() or aio is built on the other. Why would you think they were?

I used kqueues to adapt a Linux proxy server (based on epoll) to BSD. I set up separate GCD async queues, each using a kqueue to listen on a set of sockets. GCD manages the threads for you.
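For reference, libdispatch can also do the kqueue plumbing for you via a dispatch source (dispatch sources are implemented on top of kqueue). A rough sketch, assuming sock_fd is a non-blocking socket and with a made-up queue label:

    #include <dispatch/dispatch.h>
    #include <stdint.h>
    #include <unistd.h>
    #include <stdio.h>

    void watch_socket(int sock_fd)
    {
        dispatch_queue_t q = dispatch_queue_create("com.example.reader", NULL);
        dispatch_source_t src =
            dispatch_source_create(DISPATCH_SOURCE_TYPE_READ, (uintptr_t)sock_fd, 0, q);

        /* Called on q whenever the kernel reports readable data on sock_fd. */
        dispatch_source_set_event_handler(src, ^{
            char buf[4096];
            ssize_t got = read(sock_fd, buf, sizeof buf);
            if (got > 0)
                printf("read %zd bytes\n", got);
            else
                dispatch_source_cancel(src);   /* EOF or error; fd cleanup omitted */
        });
        dispatch_resume(src);
    }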

When to write a Custom Kernel Module

Problem Statement:
I have a very high bandwidth data link that is UDP based. The source of this data is not configurable, and sends a stream of datagrams over UDP. We have code that uses the standard methods for receiving data on the UDP socket, and it works adequately. I wanted to know:
Does there exist an interface to extract multiple UDP datagrams at a time, to improve efficiency?
If one doesn't exist, does it make sense to create a kernel module to provide the capability?
I am a novice, and I wanted to understand what thought process has to happen before writing your own kernel module seems appropriate. I know that such a surgical procedure isn't meant to be done lightly, but there must be a set of criteria under which that action is prudent. Maybe not in my case, but in general.
HW / Kernel Module Perspective
A typical network adapter these days is capable of distributing received packets across multiple hardware Rx queues, thus letting the host run multiple software Rx queues bound to different CPU cores, reading out packets in parallel. From a single HW/SW queue perspective, the host may poll it for new packets (see Linux NAPI), with each poll ideally yielding a batch of packets; alternatively, the host may still use an interrupt-driven approach for Rx signalling, with interrupt coalescing turned on for improved efficiency.
Existing NIC drivers in Linux kernel strive to stick with the most performant techniques, and the kernel itself should be able to leverage all of that properly.
Userland / Application Perspective
There's the PACKET_MMAP interface provided by the Linux kernel for improved Rx/Tx efficiency on the application side. Long story short, an application can set up a memory buffer shared between kernel- and userspace and read incoming packets from it, ideally in batches, or blocks, thus avoiding the costly kernel-to-userspace copies and context switches so customary when using regular methods.
For added efficiency, the application may have multiple sockets bound to the NIC in separate threads / processes and demand that packet reception be load balanced across these sockets (see AF_PACKET fanout mode description).
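To make the PACKET_MMAP and fanout ideas concrete, here is a hedged sketch in C of a TPACKET_V3 receive ring (the block/frame sizes and fanout group id are arbitrary example values, and error handling is trimmed; see Documentation/networking/packet_mmap in the kernel tree for the authoritative version). One poll() wakeup hands the application a whole block of packets:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <poll.h>
    #include <sys/socket.h>
    #include <sys/mman.h>
    #include <arpa/inet.h>
    #include <linux/if_packet.h>
    #include <linux/if_ether.h>

    int main(void)
    {
        /* Raw packet socket; requires CAP_NET_RAW. */
        int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));

        /* Ask for the block-oriented TPACKET_V3 ring format. */
        int version = TPACKET_V3;
        setsockopt(fd, SOL_PACKET, PACKET_VERSION, &version, sizeof(version));

        /* Ring geometry: 64 blocks of 4 MiB (arbitrary example values). */
        struct tpacket_req3 req;
        memset(&req, 0, sizeof(req));
        req.tp_block_size = 1 << 22;
        req.tp_block_nr   = 64;
        req.tp_frame_size = 2048;
        req.tp_frame_nr   = (req.tp_block_size / req.tp_frame_size) * req.tp_block_nr;
        req.tp_retire_blk_tov = 60;   /* ms before a partially filled block is handed over */
        setsockopt(fd, SOL_PACKET, PACKET_RX_RING, &req, sizeof(req));

        /* Optional: join a fanout group so several sockets/threads share the load. */
        int fanout = 7 /* arbitrary group id */ | (PACKET_FANOUT_HASH << 16);
        setsockopt(fd, SOL_PACKET, PACKET_FANOUT, &fanout, sizeof(fanout));

        /* Map the ring; the kernel writes packets into it, userspace only reads. */
        size_t ring_len = (size_t)req.tp_block_size * req.tp_block_nr;
        uint8_t *ring = mmap(NULL, ring_len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        for (unsigned block = 0;; block = (block + 1) % req.tp_block_nr) {
            struct tpacket_block_desc *desc =
                (struct tpacket_block_desc *)(ring + (size_t)block * req.tp_block_size);

            /* Sleep until the kernel retires this block to userspace. */
            while (!(desc->hdr.bh1.block_status & TP_STATUS_USER))
                poll(&pfd, 1, -1);

            /* Walk every packet in the block: one wakeup, many datagrams. */
            struct tpacket3_hdr *pkt = (struct tpacket3_hdr *)
                ((uint8_t *)desc + desc->hdr.bh1.offset_to_first_pkt);
            for (unsigned i = 0; i < desc->hdr.bh1.num_pkts; i++) {
                /* Packet bytes live at (uint8_t *)pkt + pkt->tp_mac, length pkt->tp_snaplen. */
                pkt = (struct tpacket3_hdr *)((uint8_t *)pkt + pkt->tp_next_offset);
            }

            desc->hdr.bh1.block_status = TP_STATUS_KERNEL;   /* hand the block back */
        }
    }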
DPDK Perspective
DPDK is a kernel-bypass framework that allows an application to seize full control of a network adapter by means of a vendor-specific poll-mode driver, or PMD, effectively running in userspace as part of the application and by its very nature not needing any kernel-to-userspace copies, context switches and, most likely, locking. Multi-queue receive operation, load balancing (round robin, RSS, you name it) and more cutting-edge offloads are likely to be available, too (it's vendor specific).
Summary
The short of it is that, given that multiple network acceleration techniques already exist, one need never write their own kernel module to solve the problem in question. By the looks of it, your application, which, as you say, uses standard methods, is not aware of the PACKET_MMAP technique, so I'd be tempted to suggest looking at that one closely. The DPDK approach might require that the application be effectively re-implemented from scratch, so I would first go for PACKET_MMAP as the low-hanging fruit.

NIF to wrap my multi-threaded C++ code

I have C++ code that implements a special protocol over the serial port. The code is multi-threaded, internally polls the serial port, and does its own cyclic processing. I would like to call this driver from Erlang and also receive events from it. My concern is that this C++ code is multi-threaded and also stateful, meaning that when I call a certain function on the driver, it caches things internally which will be used/required on subsequent calls. My questions are:
1. Does a NIF run in the same OS process as the rest of my Erlang processes, or is it launched in a separate OS process?
2. Does it make sense to wrap this multi-threaded, stateful C++ code with a NIF?
3. If a NIF is not the right approach, what is the better way to make Erlang talk back and forth with this C++ code? I would also prefer my C++ code to be inside the same OS process as the rest of my Erlang processes, and it looks like linked-in drivers are an option, but I'm not sure whether the multi-threaded nature of my C++ code fits that model. Plus, I hear they can mess up the Erlang scheduler?
Unlike ports, NIFs run within the Erlang VM process, similar to drivers. Because of that, any NIF crash will bring the VM down as well. And, answering your last question in advance: NIFs, like drivers, may block your scheduler.
That depends on the functionality you are implementing in this C++ code. Because of answer 1), you probably want to avoid concurrency in the C++ part, since it's a potential source of errors. That's not always possible, of course. But if you are implementing, say, some worker pool, go ahead and implement single-threaded code, spawning it as many times as you need.
Drivers can be multi-threaded too, with the same potential problems and quite similar performance (well, still slightly faster than NIFs). If you are not completely sure about your C++ code's stability, use it as an Erlang port.
Speaking of the difference between NIFs and drivers, the former is synchronous natively, and the latter can be asynchronous (which can be really a huge advantage if you don't want to receive any answers for most of the commands). Drivers are easier to mess up and harder to implement (but once you grasp the main patterns and problems, they seem okay, actually).
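For what it's worth, here is the bare skeleton a NIF reduces to (a hedged sketch; "my_driver" and "poll_port" are made-up names). It makes point 1) tangible: this compiles to a shared library that is loaded straight into the VM's address space, so anything it does wrong is done to the VM itself.

    #include <erl_nif.h>

    /* A blocking or long-running body here would tie up an Erlang scheduler
       thread -- the "mess up the scheduler" problem mentioned above. */
    static ERL_NIF_TERM poll_port(ErlNifEnv *env, int argc, const ERL_NIF_TERM argv[])
    {
        return enif_make_atom(env, "ok");
    }

    static ErlNifFunc nif_funcs[] = {
        {"poll_port", 0, poll_port}
    };

    ERL_NIF_INIT(my_driver, nif_funcs, NULL, NULL, NULL, NULL)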
Here's a good start for drivers:
http://www.erlang.org/doc/apps/erts/driver.html
And something similar (behold the difference in complexity) for NIFs:
http://www.erlang.org/doc/tutorial/nif.html

Sharing data system-wide

Good evening.
I'm looking for a method to share data from my application system-wide, so that other applications could read that data and then do whatever they want with it (e.g. format it for display, use it for logging, etc). The data needs to be updated dynamically in the method itself.
WMI came to mind first, but then you've got the issue of applications pausing while reading from WMI. Additionally, I've no real idea how to set up my own namespace or classes, if that's even possible in Delphi.
Using files is another idea, but that could get disk heavy, and it's a really awful method to use for realtime data.
Using a driver would probably be the best option, but that's a little too intrusive on the user's end for my liking, and I've no idea where to even start with it.
WM_COPYDATA would be great, but I'm not sure if that's dynamic enough, and whether it'll be heavy on resources or not.
Using TCP/IP would be the best choice for over the network, but obviously is of little use when run on a single system with no networking requirement.
As you can see, I'm struggling to figure out where to go with this. I don't want to commit to one method only to find that it's not going to work out in the end. Essentially, I want something like a service, or background process, to record data and then allow other applications to read that data. I'm just unsure of the method. I'd prefer NOT to need elevation/UAC to do this, but if need be, I'll settle for it.
I'm using Delphi 2010 for this exercise.
Any ideas?
You want to create some client-server architecture, i.e. some form of IPC (inter-process communication).
Using WM_COPYDATA is a very good idea. I found it to be very fast, lightweight, and efficient on a local machine. And it can be broadcast over the system, to all applications at once (to be used with care if some application does not handle it correctly).
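For illustration, the sending side boils down to a FindWindow + SendMessage pair (shown here with the raw Win32 calls in C, which map one-to-one to Delphi; "SharedDataWindow" is a made-up window class name that the receiving application would have to register):

    #include <windows.h>
    #include <string.h>

    /* Sketch: push a block of bytes to whichever process owns the target window. */
    static BOOL send_shared_data(const char *payload)
    {
        HWND target = FindWindowA("SharedDataWindow", NULL);
        if (!target)
            return FALSE;

        COPYDATASTRUCT cds;
        cds.dwData = 1;                            /* application-defined tag      */
        cds.cbData = (DWORD)(strlen(payload) + 1); /* include the terminating NUL  */
        cds.lpData = (PVOID)payload;

        /* wParam is normally the sender's HWND; SendMessage blocks until the
           receiver's WM_COPYDATA handler returns, and Windows copies the bytes
           into the receiving process for you. */
        return (BOOL)SendMessageA(target, WM_COPYDATA, 0, (LPARAM)&cds);
    }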
You can also share some memory, using memory-mapped files. This may be the fastest IPC option around for huge amounts of data, but synchronization is a bit complex (if you want to share more than one buffer at once).
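A memory-mapped file is similarly small to set up (again sketched with the Win32 calls in C; the mapping name and the SharedBlock layout are invented for the example, and a named mutex would be needed to synchronize real concurrent access):

    #include <windows.h>
    #include <stdio.h>

    /* Both processes must agree on this layout. */
    typedef struct {
        LONG counter;
        char text[256];
    } SharedBlock;

    int main(void)
    {
        /* A "Local\" mapping backed by the page file needs no elevation. */
        HANDLE map = CreateFileMappingA(INVALID_HANDLE_VALUE, NULL, PAGE_READWRITE,
                                        0, sizeof(SharedBlock), "Local\\MySharedData");
        if (!map) return 1;

        SharedBlock *data = (SharedBlock *)MapViewOfFile(map, FILE_MAP_ALL_ACCESS,
                                                         0, 0, sizeof(SharedBlock));
        if (!data) return 1;

        /* Writer side: readers open the same name and see the updates. */
        InterlockedIncrement(&data->counter);
        snprintf(data->text, sizeof data->text, "update #%ld", (long)data->counter);

        UnmapViewOfFile(data);
        CloseHandle(map);
        return 0;
    }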
Named pipes are a good candidate locally. They tend to be difficult to implement/configure over a network, due to security issues on modern Windows versions (and they use TCP/IP for network communication anyway - so you'd be better off using TCP/IP directly instead).
My personal advice is to implement your data sharing with abstract classes that can be backed by several implementations. You may use WM_COPYDATA first, then switch to named pipes, TCP/IP or HTTP in order to spread your application over a network.
For our Open Source Client-Server ORM, we implemented several protocols, including WM_COPYDATA, named pipes, HTTP, and direct in-process access. You can take a look at the source code provided for implementation patterns. Here are some benchmarks, to give you data from real implementations:
Client server access:
- Http client keep alive: 3001 assertions passed
first in 7.87ms, done in 153.37ms i.e. 6520/s, average 153us
- Http client multi connect: 3001 assertions passed
first in 151us, done in 305.98ms i.e. 3268/s, average 305us
- Named pipe access: 3003 assertions passed
first in 78.67ms, done in 187.15ms i.e. 5343/s, average 187us
- Local window messages: 3002 assertions passed
first in 148us, done in 112.90ms i.e. 8857/s, average 112us
- Direct in process access: 3001 assertions passed
first in 44us, done in 41.69ms i.e. 23981/s, average 41us
Total failed: 0 / 15014 - Client server access PASSED
As you can see, the fastest is direct access, then WM_COPYDATA, then named pipes, then HTTP (i.e. TCP/IP). The message was around 5 KB of JSON data containing 113 rows, retrieved from the server, then parsed on the client, 100 times (yes, our framework is fast :) ). For huge blocks of data (like 4 MB), WM_COPYDATA is slower than named pipes or HTTP over TCP/IP.
There are several IPC (inter-process communication) methods in Windows. Your question is rather general; I can suggest memory-mapped files to store your shared data and message broadcasting via PostMessage to inform other applications that the shared data has changed.
If you don't mind running another process, you could use one of the NoSQL databases.
I'm pretty sure that a lot of them won't have Delphi drivers, but some of them have REST drivers and hence can be driven from pretty much anything.
Memcached is an easy way to share data between applications. Memcached is an in-memory key-value store for small chunks of arbitrary data (strings, objects).
A Delphi 2010 client for Memcached can be found on google code:
http://code.google.com/p/delphimemcache/
related question:
Are there any Caching Frameworks for Delphi?
Googling for 'delphi interprocess communication' will give you lots of pointers.
I suggest you take a look at http://madshi.net/, especially MadCodeHook (http://help.madshi.net/madCodeHook.htm)
I have good experience with the product.

Best approach for Comet? (Non Blocking IO vs Erlang)

Perhaps the question isn't that simple to answer... but what is your opinion? Should I use non-blocking approaches (libevent, for example) or Erlang lightweight processes to:
Achieve as many connections as possible with a given amount of RAM
Achieve as much throughput as possible with a given amount of CPU
The background is that I am planning to code a pub/sub server, and I cannot decide which approach I should use.
There is an article about making A Million-user Comet Application with Mochiweb that is worth reading. But I think stability, flexibility and maintainability will be more important most of the time. Keeping this in mind, I would not consider anything other than Erlang, even if there were some better-performing solution.
Under the hood, the Erlang VM uses non-blocking IO. If your Erlang lightweight process blocks, the VM does not really do a kernel-level thread context switch. Most of the time, it will just wake up another LWP on the same OS thread (thus, it's not "blocking" in the usual sense of the word).
You can even start the VM using the +A argument and specify how many async I/O threads you would like to allocate (AFAIK, Node.js is still single-threaded, and if a callback function hangs, your VM is done for).

0MQ with green threads?

I've grown to like Erlang, and it's a great (cough) architectural fit for my problem. Meanwhile I still like to imagine that I can kludge Erlang processes & asynchronous message passing in Python (I am currently in therapy to rid myself of this obsession).
During a recent binge I came across 0MQ and I like its messaging features. These may be self-evident to an Erlang/OTP expert, but I'm just a humble Python programmer (my shrink will no doubt get to read this clever argument). The 0MQ user guide states that it uses native OS threads, and not virtual "green" threads.
Is there a way to make 0MQ work with say eventlet/gevent?
Or, should I avoid the green-eyed monster and stick to a single Python app thread, with non-blocking I/O handled by 0MQ's message queuing & its own (skilled) use of native threads?
Or, check out of rehab & go back to erlang?
Responding to a stale thread because I am kind of in the same boat. Thought I would share my thoughts.
1: It looks like all the heavy lifting has already been done: https://github.com/traviscline/gevent-zeromq has integrated the gevent loop with a non-blocking zmq socket and even some CPython speedups. It also seems to be (at the time of this writing) reasonably well maintained.
2: It depends; if you are writing something that can use zmq without a ton of external event logic, then you should just use zmq. If, OTOH, you need to integrate with other protocols, you may want to use gevent (or perhaps Twisted, although it has no workable zmq at all right now). My projects generally require multiple protocols (i.e. private queue manager, public http, public https, private memcache, etc), so I am investigating switching to gevent for quicker project turnaround than my current favorite: Twisted.
3: You may want to skip zmq entirely and integrate with an existing Erlang-based solution like RabbitMQ; the performance advantages of zmq may not be as important as you think, and then you have an Erlang message queue that easily integrates with Python via existing libraries.
Also see: Message Queue comparison at the Second Life wiki
ZeroMQ now works with Eventlet:
https://lists.secondlife.com/pipermail/eventletdev/2010-October/000907.html
