Erlang Kernel Space Polling

I just want to know whether kqueue on FreeBSD, epoll on Linux, and other kernel-space polling facilities are the default OS behaviour for handling sockets and connections.
I ask because I have read the source code of the Erlang network driver section that handles TCP requests, and there was no kqueue, epoll, or anything like that polling for events coming from the sockets.

According to the blog post I/O polling options in OTP 21, starting in Erlang/OTP 21.0, whether kernel poll is used is decided at compile time (in earlier versions you could explicitly activate it by passing the +K command line option). As I understand it, it defaults to on as long as your system supports it.
You can check whether kernel poll is activated with erlang:system_info/1:
> erlang:system_info(kernel_poll).
true

When to write a Custom Kernel Module

Problem Statement:
I have a very high-bandwidth data link that is UDP based. The source of this data is not configurable, and sends a stream of datagrams over UDP. We have code that uses the standard methods for receiving data on the UDP socket, and it works adequately. I wanted to know:
Does there exist an interface to extract multiple UDP datagrams at a time, to improve efficiency?
If one doesn't exist, does it make sense to create a kernel module to provide that capability?
I am a novice, and I wanted to understand the thought process behind deciding when writing your own kernel module is appropriate. I know that such a surgical procedure isn't meant to be done lightly, but there must be a set of criteria under which that action is prudent. Maybe not in my case, but in general.
HW / Kernel Module Perspective
A typical network adapter these days is capable of distributing received packets across multiple hardware Rx queues, thus letting the host run multiple software Rx queues bound to different CPU cores, reading out packets in parallel. From a single HW/SW queue perspective, the host may poll it for new packets (see Linux NAPI), with each poll ideally yielding a batch of packets; alternatively, the host may still use an interrupt-driven approach for Rx signalling, with interrupt coalescing turned on for improved efficiency.
Existing NIC drivers in Linux kernel strive to stick with the most performant techniques, and the kernel itself should be able to leverage all of that properly.
Userland / Application Perspective
There's the PACKET_MMAP interface provided by the Linux kernel for improved Rx/Tx efficiency on the application side. Long story short, an application can set up a memory buffer shared between kernel- and userspace and read incoming packets from it, ideally in batches, or blocks, thus avoiding the costly kernel-to-userspace copies and context switches so customary with the regular methods.
For added efficiency, the application may have multiple sockets bound to the NIC in separate threads / processes and demand that packet reception be load balanced across these sockets (see the AF_PACKET fanout mode description).
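A minimal sketch of the Rx-ring setup just described, assuming the default TPACKET_V1 ring format, an interface named eth0, and arbitrarily chosen ring dimensions (error handling omitted; requires CAP_NET_RAW):

/* Sketch: PACKET_MMAP receive ring (Linux). */
#include <sys/socket.h>
#include <arpa/inet.h>        /* htons */
#include <linux/if_packet.h>  /* tpacket_req, sockaddr_ll, TP_STATUS_* */
#include <net/ethernet.h>     /* ETH_P_ALL */
#include <net/if.h>           /* if_nametoindex */
#include <poll.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
    int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));

    /* Describe the shared ring: 64 blocks of 4 KiB, two 2 KiB frame slots each. */
    struct tpacket_req req;
    memset(&req, 0, sizeof req);
    req.tp_block_size = 4096;
    req.tp_block_nr   = 64;
    req.tp_frame_size = 2048;
    req.tp_frame_nr   = req.tp_block_nr * (req.tp_block_size / req.tp_frame_size);
    setsockopt(fd, SOL_PACKET, PACKET_RX_RING, &req, sizeof req);

    /* Map the ring into userspace; the kernel writes packets straight into it. */
    unsigned char *ring = mmap(NULL, (size_t)req.tp_block_size * req.tp_block_nr,
                               PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    /* Bind the socket to one interface ("eth0" here). */
    struct sockaddr_ll ll;
    memset(&ll, 0, sizeof ll);
    ll.sll_family   = AF_PACKET;
    ll.sll_protocol = htons(ETH_P_ALL);
    ll.sll_ifindex  = if_nametoindex("eth0");
    bind(fd, (struct sockaddr *)&ll, sizeof ll);

    /* Consume frames: a slot with TP_STATUS_USER set is ours to read;
     * resetting it to TP_STATUS_KERNEL hands it back to the kernel. */
    unsigned int slot = 0;
    for (;;) {
        struct tpacket_hdr *hdr =
            (struct tpacket_hdr *)(ring + (size_t)slot * req.tp_frame_size);
        if (!(hdr->tp_status & TP_STATUS_USER)) {
            struct pollfd pfd = { .fd = fd, .events = POLLIN };
            poll(&pfd, 1, -1);   /* wait for the kernel to fill slots */
            continue;
        }
        printf("got a %u-byte frame\n", hdr->tp_len);
        hdr->tp_status = TP_STATUS_KERNEL;
        slot = (slot + 1) % req.tp_frame_nr;
    }
}

The fanout mode mentioned above is enabled separately, with a PACKET_FANOUT setsockopt on each socket in the group.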
DPDK Perspective
DPDK is a kernel-bypass framework that allows an application to seize full control of a network adapter by means of a vendor-specific poll-mode driver, or PMD, which effectively runs in userspace as part of the application and by its very nature needs no kernel-to-userspace copies, context switches and, most likely, no locking. Multi-queue receive operation, load balancing (round robin, RSS, you name it) and more cutting-edge offloads are likely to be available, too (it's vendor specific).
Summary
The short of it: given that multiple network acceleration techniques already exist, one need never write one's own kernel module to solve the problem in question. By the looks of it, your application, which, as you say, uses the standard methods, is not aware of the PACKET_MMAP technique, so I'd be tempted to suggest looking at that one closely. The DPDK approach might require that the application be effectively re-implemented from scratch, so I would first go for PACKET_MMAP as the low-hanging fruit.
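It may also be worth checking, before re-architecting anything, whether plain batched receive is already enough: Linux provides recvmmsg(2), which reads multiple UDP datagrams in one system call. A minimal sketch (the port, batch size and buffer sizes are arbitrary illustration values; error handling omitted):

/* Sketch: batched UDP receive with recvmmsg(2) (Linux-specific). */
#define _GNU_SOURCE           /* exposes recvmmsg and struct mmsghdr */
#include <sys/socket.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>

#define BATCH     32          /* arbitrary batch size */
#define DGRAM_MAX 2048        /* arbitrary per-datagram buffer size */

int main(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family      = AF_INET;
    addr.sin_port        = htons(5000);   /* arbitrary port */
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    bind(fd, (struct sockaddr *)&addr, sizeof addr);

    /* Point each mmsghdr slot at its own receive buffer. */
    static char bufs[BATCH][DGRAM_MAX];
    struct iovec iov[BATCH];
    struct mmsghdr msgs[BATCH];
    memset(msgs, 0, sizeof msgs);
    for (int i = 0; i < BATCH; i++) {
        iov[i].iov_base = bufs[i];
        iov[i].iov_len  = DGRAM_MAX;
        msgs[i].msg_hdr.msg_iov    = &iov[i];
        msgs[i].msg_hdr.msg_iovlen = 1;
    }

    /* One system call can return up to BATCH datagrams. */
    int n = recvmmsg(fd, msgs, BATCH, 0, NULL);
    for (int i = 0; i < n; i++)
        printf("datagram %d: %u bytes\n", i, msgs[i].msg_len);
}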

How does an Erlang process bind to a specific scheduler?

Currently, processes are not bound to specific schedulers (though you can force it via undocumented functions; not recommended). Scheduler threads may be bound to logical processors using the CPU topology and binding types. The VM does use some of this information to enhance performance in its normal scheduling scheme.
Reading from an old mail from Kenneth Lundin:
The Erlang VM without SMP support has 1 scheduler which runs in the
main process thread. The scheduler picks runnable Erlang processes
and IO-jobs from the run-queue and there is no need to lock data
structures since there is only one thread accessing them.
The Erlang VM with SMP support can have 1 to many schedulers which are
run in 1 thread each. The schedulers pick runnable Erlang processes
and IO-jobs from one common run-queue. In the SMP VM all shared data
structures are protected with locks, the run-queue is one example of
a data structure protected with locks.
From OTP R12B the SMP version of the VM is automatically started as
default if the OS reports more than 1 CPU (or Core) and with the same
number of schedulers as CPU's or Cores.
Not sure if this answers your question. Could you expand a bit more?

When does a UDP sendto() block?

While using the default (blocking) behavior on a UDP socket, in which cases will a call to sendto() block? I'm interested essentially in the Linux behavior.
For TCP I understand that congestion control makes the send() call block if the sending window is full, but what about UDP? Does it ever block, or does it just let packets get discarded at lower layers?
This can happen if you have filled up your socket buffer, but it is highly operating-system dependent. Since UDP does not provide any guarantees, your operating system can decide to do whatever it wants when your socket buffer is full: block or drop. You can try to increase SO_SNDBUF for temporary relief.
This can even depend on the fine-tuning of your system; for instance, it can also depend on the size of the TX ring in the driver of your network interface. There are a few discussions about this on the iperf mailing list, but you really want to discuss this with the developers of your operating system. Pay special attention to O_NONBLOCK and EAGAIN / EWOULDBLOCK.
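If you would rather get an error than a block when the buffer is full, you can set O_NONBLOCK and handle the would-block case yourself. A minimal sketch (the destination address and port are placeholders; error handling mostly omitted):

/* Sketch: non-blocking UDP send, handling a full socket buffer. */
#include <sys/socket.h>
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    /* Switch the socket to non-blocking mode. */
    fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);

    struct sockaddr_in dst;
    memset(&dst, 0, sizeof dst);
    dst.sin_family = AF_INET;
    dst.sin_port   = htons(9000);                    /* placeholder port */
    inet_pton(AF_INET, "192.0.2.1", &dst.sin_addr);  /* placeholder address */

    char payload[512] = {0};
    ssize_t n = sendto(fd, payload, sizeof payload, 0,
                       (struct sockaddr *)&dst, sizeof dst);
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
        /* Socket buffer full: with O_NONBLOCK we get an error instead of
         * blocking; wait for writability (poll/select) or drop the datagram. */
    }
    return 0;
}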
This may be because your operating system is attempting to perform an ARP request in order to get the hardware address of the remote host.
Basically, whenever a packet goes out, the headers require the IP address of the remote host and the MAC address of the remote host (or of the first gateway used to reach it), e.g. 192.168.1.34 and AB:32:24:64:F3:21.
Your "blocking" behavior could simply be ARP at work.
I've heard that in older versions of Windows (Windows 2000, I think) the first packet would sometimes get discarded if the request took too long and you were sending out too much data. A service pack has probably fixed that since then.

Erlang's maximum number of simultaneous open ports?

Does the Erlang TCP/IP library have some limitations? I've done some searching but can't find any definitive answers.
I have set the ERL_MAX_PORTS environment variable to 12000 and configured Yaws to use unlimited connections.
I've written a simple client application that connects to an appmod I've written for Yaws, and I am testing the number of simultaneous connections by launching X clients at the same time.
I find that when I get to about 100 clients, the Yaws server stops accepting more TCP connections and the client errors out with
Error in process with exit value: {{badmatch,{error,socket_closed_remotely}}
I know there must be a limit to the number of open simultaneous connections, but 100 seems really low. I've looked through all the yaws documentation and have removed any limit on connections.
This is on a 2.16Ghz Intel Core 2 Duo iMac running Snow Leopard.
A quick test on a Vista Machine shows that I get the same problems at about 300 connections.
Is my test unreasonable? I.e. is it silly to open 100+ connections simultaneously to test Yaws' concurrency?
Thanks.
It seems you have hit a system limit; try increasing the maximum number of open files using
$ ulimit -n 500
See also: Python on Snow Leopard, how to open >255 sockets?
Erlang itself has a limit of 1024:
From http://www.erlang.org/doc/man/erlang.html
The maximum number of ports that can be open at the same time is 1024 by default, but can be configured by the environment variable ERL_MAX_PORTS.
EDIT:
The system call listen() has a backlog parameter which determines how many pending connection requests can be queued; please check whether a delay between connection attempts helps. This could be your problem.
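For illustration only (Yaws manages its own listen socket; this just shows the system call itself), the backlog is the second argument to listen(2), and a server expecting bursts of simultaneous connects can pass a larger value, subject to the kernel cap (net.core.somaxconn on Linux). A sketch with placeholder values:

/* Sketch: a listening TCP socket with a larger accept backlog. */
#include <sys/socket.h>
#include <netinet/in.h>
#include <string.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family      = AF_INET;
    addr.sin_port        = htons(8080);   /* placeholder port */
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    bind(fd, (struct sockaddr *)&addr, sizeof addr);

    /* A burst of connects beyond the backlog can be dropped or refused. */
    listen(fd, 512);
    return 0;
}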
All Erlang system limits are reported in the Erlang Efficiency Guide:
http://erlang.org/doc/efficiency_guide/advanced.html#id2265856
Reading from the open ports section:
The maximum number of simultaneously open Erlang ports is by default 1024. This limit can be raised up to at most 268435456 at startup (see environment variable ERL_MAX_PORTS in erlang(3)). The maximum limit of 268435456 open ports will at least on a 32-bit architecture be impossible to reach due to memory shortage.
After trying out everybody's suggestions and scouring the Erlang docs, I've come to the conclusion that my problem is Yaws not being able to keep up with the load.
On the same machine, an Apache HttpComponents web server (non-blocking I/O) does not have the same problem handling connections at the same thresholds.
Thanks for all your help. I'm going to move on to other Erlang-based web servers, like Mochiweb.

How to Enable Wake on LAN programmatically

Is there a way to programmatically reach into the BIOS and turn on the Wake on LAN capability for those machines that support it?
Ideally, the solution would be cross-BIOS, but hitting each of the major vendors with separate solutions would be okay, too.
BIOS configuration is something that the OS intentionally limits to avoid virus problems (there were lots of BIOS viruses back in the day!).
You need to look at the system management interface to see if it's available generally. You'll probably need to work in ring 0 on Windows (or as root/in the kernel on Linux). Additionally, you'll likely need to learn how to do this by accessing the hardware directly, building and maintaining a database of the most common BIOS manufacturers and types, and even then you won't be able to cover all of them.
SMBIOS might help?
I know we had a utility to read the BIOS from a regular windoze program once, at my previous job.
I think you're going to find that Wake on LAN is a CMOS Setup option, and so not programmable via hardware-agnostic OS interfaces.
Dell Inc. provides customers the OpenManage suite of utilities for remotely manipulating Setup settings on its client machines. Some links:
Dell OpenManage
Wikipedia article
There are several steps to enabling Wake on LAN: first it must be enabled in the BIOS, and second it must be enabled on the network card itself.
In Windows, you can find the settings in the advanced options dialog box for your network adapter. On Linux, you can use the ethtool command.
Use ethtool eth0 to display the current status of the eth0 interface:
Settings for eth0:
Wake-on: g
Use ethtool -s eth0 wol XYZ to set the option, but remember that not all cards support all WoL methods, and that some cards remember the setting across reboots while others do not (in which case you need to add this command to your startup scripts). The supported option characters are listed below (a C equivalent is sketched after the option list):
wol p|u|m|b|a|g|s|d...
Sets Wake-on-LAN options. Not all devices support this. The argument to this option is a string of characters specifying which options to enable.
p Wake on phy activity
u Wake on unicast messages
m Wake on multicast messages
b Wake on broadcast messages
a Wake on ARP
g Wake on MagicPacket(tm)
s Enable SecureOn(tm) password for MagicPacket(tm)
d Disable (wake on nothing). This option clears all previous options.
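To do the same from a program rather than the shell, the equivalent of ethtool -s eth0 wol g on Linux is the SIOCETHTOOL ioctl. A minimal sketch, assuming an interface named eth0 and the magic-packet (WAKE_MAGIC) mode; requires root / CAP_NET_ADMIN, error handling omitted:

/* Sketch: enable Wake-on-LAN (magic packet) via the ethtool ioctl. */
#include <sys/socket.h>
#include <linux/ethtool.h>    /* ethtool_wolinfo, ETHTOOL_GWOL/SWOL, WAKE_MAGIC */
#include <linux/sockios.h>    /* SIOCETHTOOL */
#include <net/if.h>           /* struct ifreq, IFNAMSIZ */
#include <string.h>
#include <sys/ioctl.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);  /* any socket works as an ioctl handle */

    struct ethtool_wolinfo wol = { .cmd = ETHTOOL_GWOL };
    struct ifreq ifr;
    memset(&ifr, 0, sizeof ifr);
    strncpy(ifr.ifr_name, "eth0", IFNAMSIZ - 1);
    ifr.ifr_data = (char *)&wol;

    ioctl(fd, SIOCETHTOOL, &ifr);             /* read current/supported WoL modes */

    if (wol.supported & WAKE_MAGIC) {         /* the equivalent of "wol g" */
        wol.cmd     = ETHTOOL_SWOL;
        wol.wolopts = WAKE_MAGIC;
        ioctl(fd, SIOCETHTOOL, &ifr);         /* write the new setting */
    }
    return 0;
}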
