I want to know how Erlang's VM preempts the running code and switches stacks. How can it be done in a language such as C?
The trick is that the Erlang runtime has control over the VM, so it can, entirely in userspace, keep track of how many VM instructions it has already executed; or, better yet, an estimate of the actual physical computation those instructions required, known as "reductions" in Erlang VM parlance. If that number exceeds some threshold, it immediately swaps around process pointers/structs/whatever and resumes the execution loop.
Think of it something like this (in kind of a pseudo-C that may or may not actually be C, but I wouldn't know because I ain't a C programmer, but you asked how you'd go about it in C so I'll try my darndest):
void proc_execute(Proc* proc)
{
    /* I don't recall if Erlang's VM supports different
       reduction limits for different processes, but if it
       did, it'd be a rather intuitive way to define process
       priorities, i.e. making sure higher-priority processes
       get more reductions to spend */
    int rds = proc->max_reductions;
    while (rds > 0) {
        /* Different virtual instructions might execute different numbers
           of physical instructions, so vm_execute_next_instruction returns
           however many reductions are left after executing that virtual
           instruction; no separate decrement is needed here. */
        rds = vm_execute_next_instruction(proc, rds);
        if (proc->exited) break;
    }
}

void vm_loop(Scheduler* sched)
{
    Proc *proc;
    for (;;) {
        proc = sched_next_in_queue(sched);
        /* we'll assume that the proc will be null if the
           scheduler doesn't have any processes left in its
           list */
        if (!proc) break;
        proc_execute(proc);
    }
}

Proc* sched_next_in_queue(Scheduler* sched)
{
    if (sched->current_proc && !sched->current_proc->exited) {
        /* If the process hasn't exited yet, re-add it to the
           end of the queue so we can resume running it
           later */
        shift(sched->queue, sched->current_proc);
    }
    sched->current_proc = pop(sched->queue);
    return sched->current_proc;
}
This is obviously quite simplified (notably excluding/eliding a lot of important stuff like how VM instructions are implemented and how messages get passed), but hopefully it illustrates how (if I'm understanding right, at least) Erlang's preemptive scheduler and process model works on a basic level.
All Erlang code compiles to operation codes for Erlang's VM, and the VM executes those opcodes on OS threads that are created at the VM's startup.
Erlang code runs on virtual CPUs that are controlled by Erlang's VM, and the VM treats IO as an interrupt of those virtual CPUs. So Erlang's VM implements a machine and a scheduler, much like an OS does. Because execution proceeds one opcode at a time and all IO is non-blocking, preemption can be implemented inside the VM in the C language.
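To make the "IO as an interrupt" part concrete, here is a minimal sketch in the same pseudo-C spirit as above; Proc, requeue, and the fixed-size arrays are made-up stand-ins, not the real BEAM structures. Between time slices the scheduler polls, without blocking, for file descriptors that have become ready and moves the processes parked on them back onto the run queue:

#include <poll.h>

#define MAX_WAITING 64

typedef struct Proc Proc;           /* as in the sketch above */

typedef struct {
    struct pollfd fds[MAX_WAITING]; /* one fd per IO-blocked process */
    Proc *waiting[MAX_WAITING];     /* which process is parked on each fd */
    int nfds;
} IoWaitList;

void requeue(Proc *proc);           /* assumed: put proc back on the run queue */

/* Called between time slices. The timeout of 0 means "don't block":
   the scheduler only collects fds that are already readable/writable,
   which is what makes IO look like an interrupt to the processes. */
void check_io(IoWaitList *io)
{
    if (poll(io->fds, io->nfds, 0) <= 0)
        return;                     /* nothing ready; keep scheduling */

    for (int i = 0; i < io->nfds; i++)
        if (io->fds[i].revents & (POLLIN | POLLOUT))
            requeue(io->waiting[i]);
}

The real BEAM uses kernel pollers such as epoll/kqueue and dedicated poller threads, but the principle is the same: a process waiting on IO never blocks the scheduler thread.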
Hi, I'm relatively new to kernel programming (I've got a lot of C++ development experience, though) and have a goal that I want to achieve:
Detecting and conditionally blocking attempts from userland programs to write or read specific memory addresses located in my own userland process. This has to be done from a driver.
I've set up a development environment (a virtual machine running the latest Windows 10 + VirtualKD + WinDbg) and already successfully deployed a small KMDF test driver via the Visual Studio integration (over LAN).
So my question is now:
How do I detect/intercept ReadProcessMemory/WriteProcessMemory calls to my ring3 application? Simply blocking handles isn't enough here.
It would be nice if someone could point me in the right direction, either by linking a (non-outdated) example or just by telling me how to do this.
Update:
I've read a lot about filter drivers and hooking Windows APIs from kernel mode, but I really don't want to mess with PatchGuard and don't really know how to filter ReadProcessMemory calls from userland. It's not important to protect my program from drivers, only from ring3 applications.
Thank you :)
This code from here should do the trick.
OB_PREOP_CALLBACK_STATUS PreCallback(PVOID RegistrationContext,
                                     POB_PRE_OPERATION_INFORMATION OperationInformation)
{
    UNREFERENCED_PARAMETER(RegistrationContext);

    PEPROCESS OpenedProcess = (PEPROCESS)OperationInformation->Object,
              CurrentProcess = PsGetCurrentProcess();

    PsLookupProcessByProcessId(ProtectedProcess, &ProtectedProcessProcess); // Getting the PEPROCESS using the PID
    PsLookupProcessByProcessId(Lsass, &LsassProcess);                       // Getting the PEPROCESS using the PID
    PsLookupProcessByProcessId(Csrss1, &Csrss1Process);                     // Getting the PEPROCESS using the PID
    PsLookupProcessByProcessId(Csrss2, &Csrss2Process);                     // Getting the PEPROCESS using the PID

    if (OpenedProcess == Csrss1Process)           // Making sure not to strip csrss's handle, which would cause a BSOD
        return OB_PREOP_SUCCESS;
    if (OpenedProcess == Csrss2Process)           // Making sure not to strip csrss's handle, which would cause a BSOD
        return OB_PREOP_SUCCESS;
    if (OpenedProcess == CurrentProcess)          // Make sure the driver isn't getting stripped (even though we have a second check)
        return OB_PREOP_SUCCESS;
    if (OpenedProcess == ProtectedProcessProcess) // Making sure that the game can open a process handle to itself
        return OB_PREOP_SUCCESS;
    if (OperationInformation->KernelHandle)       // Allow drivers to get a handle
        return OB_PREOP_SUCCESS;

    // PsGetProcessId((PEPROCESS)OperationInformation->Object) is the PID the handle
    // is being opened for, so if it matches the protected process's PID, strip the handle.
    if (PsGetProcessId((PEPROCESS)OperationInformation->Object) == ProtectedProcess)
    {
        if (OperationInformation->Operation == OB_OPERATION_HANDLE_CREATE) // Stripping the handle
        {
            OperationInformation->Parameters->CreateHandleInformation.DesiredAccess =
                (SYNCHRONIZE | PROCESS_QUERY_LIMITED_INFORMATION);
        }
        else
        {
            OperationInformation->Parameters->DuplicateHandleInformation.DesiredAccess =
                (SYNCHRONIZE | PROCESS_QUERY_LIMITED_INFORMATION);
        }
    }

    return OB_PREOP_SUCCESS; // a pre-operation callback must always return OB_PREOP_SUCCESS
}
This code, once registered with ObRegisterCallbacks, will detect when a new handle to your protected process is created and will strip its access rights if the request is not coming from Lsass, Csrss, or the protected process itself. Those exemptions prevent blue screens caused by a critical process being denied a handle to your application.
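For completeness, here is a rough sketch of how such a callback is typically registered, e.g. from DriverEntry; the altitude string is a placeholder, and note that ObRegisterCallbacks requires the driver to be linked with /INTEGRITYCHECK:

#include <ntddk.h>

OB_PREOP_CALLBACK_STATUS PreCallback(PVOID RegistrationContext,
                                     POB_PRE_OPERATION_INFORMATION OperationInformation);

PVOID g_ObCallbackHandle = NULL;

NTSTATUS RegisterProtectionCallback(void)
{
    OB_OPERATION_REGISTRATION opReg = { 0 };
    OB_CALLBACK_REGISTRATION cbReg = { 0 };

    opReg.ObjectType = PsProcessType;  // watch process handles
    opReg.Operations = OB_OPERATION_HANDLE_CREATE | OB_OPERATION_HANDLE_DUPLICATE;
    opReg.PreOperation = PreCallback;  // the callback shown above

    cbReg.Version = OB_FLT_REGISTRATION_VERSION;
    cbReg.OperationRegistrationCount = 1;
    RtlInitUnicodeString(&cbReg.Altitude, L"321000"); // placeholder altitude
    cbReg.RegistrationContext = NULL;
    cbReg.OperationRegistration = &opReg;

    // Remember to call ObUnRegisterCallbacks(g_ObCallbackHandle) on unload.
    return ObRegisterCallbacks(&cbReg, &g_ObCallbackHandle);
}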
I am using Chumak in Erlang, opening a ROUTER socket.
I have a handful (4 or so) clients that use the Python zmq library to send REQ requests to this server.
Things work fine most of the time, but sometimes a client will have disconnect issues (automatic reconnection is in the client code, and it works). I've found that when an error occurs in one client connection, it seems to spread to the other connections as well, and I get a lot of
** {{noproc,{gen_server,call,[<0.31596.16>,incomming_queue_out]}},
on the server.
On the server side, I'm just opening one chumak socket and looping:
{ok, Sock} = chumak:socket( router ),
{ok, _} = chumak:bind( Sock, tcp, "0.0.0.0", ?PORT ),
spawn_link( fun() -> loop( Sock ) end ),
...
loop( Sock ) ->
    {ok, [Identity, <<>>, Data]} = chumak:recv_multipart( Sock ),
    ...
The ZeroMQ docs seem to imply that one listening socket is enough unless I have many clients.
Do I misunderstand them?
No, there is no need to increase the number of Socket instances.
Abstractions are great for reducing the need to understand all the details under the hood for a typical user. That ease of life stops whenever such a user has to go into performance tuning or debugging incidents.
Let's step through it this way:
- unless some mastodon-sized data payloads have to be moved through, a single ROUTER access point into one Socket instance is quite enough for, say, tens, hundreds, or thousands of REQ access points on the client side(s).
- yet such numbers will increase the performance envelope requirements for the ROUTER-side Context instance, so that it remains capable of handling all the prescribed Scalable Formal Communication Archetype handling in due time and in a fair fashion.
This means one can soon realise benefits from spawning a Context instance with more than its initial default solo IO-thread. In all my high-performance setups I advocate using zmq.AFFINITY mappings, so as to squeeze maximum performance out of the highest-priority Socket instances, while leaving non-critical resources to share a common subset of the Context instance's IO-thread pool.
Next comes RAM
Yes, the toys occupy memory.
Check all the .{RCV|SND}BUF, .MAXMSGSIZE, .{SND|RCV}HWM, .BACKLOG, .CONFLATE
Next comes LINK-MANAGEMENT
Do not hesitate to optimise .IMMEDIATE, .{RCV|SND}BUF, .RECONNECT_IVL, .RECONNECT_IVL_MAX, .TCP_KEEPALIVE, .TCP_KEEPALIVE_CNT, .TCP_KEEPALIVE_INTVL, .TCP_KEEPALIVE_IDLE
Always set .LINGER right at instantiation, so that connection drop-outs cease to be lethal.
Next may come a few defensive and performance helper tools:
.PROBE_ROUTER, .TCP_ACCEPT_FILTER, .TOS, .HANDSHAKE_IVL
Next step?
If no memory-related troubles remain in the game, then, since you mention reconnections, my suspicion would rather be to go and set up .IMMEDIATE, plus possibly let the ROUTER benefit from explicit .PROBE_ROUTER signalling.
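chumak exposes only a subset of these options, so purely as a hedged illustration, here is roughly what the tuning looks like against the plain C libzmq API; the endpoint and the option values are placeholders, not recommendations:

#include <zmq.h>

int main(void)
{
    void *ctx = zmq_ctx_new();

    /* give a busy ROUTER more than the default single IO thread
       (must be set before the first socket is created) */
    zmq_ctx_set(ctx, ZMQ_IO_THREADS, 2);

    void *router = zmq_socket(ctx, ZMQ_ROUTER);
    int linger = 0; /* don't let undelivered messages block shutdown */
    zmq_setsockopt(router, ZMQ_LINGER, &linger, sizeof linger);
    zmq_bind(router, "tcp://0.0.0.0:5555");

    /* reconnect tuning, IMMEDIATE, and PROBE_ROUTER matter on the
       connecting side; shown here on a DEALER peer */
    void *peer = zmq_socket(ctx, ZMQ_DEALER);
    int immediate = 1;  /* queue only to fully-established connections */
    int probe = 1;      /* announce ourselves to the ROUTER with an empty message */
    int ivl = 100;      /* base reconnect interval, ms */
    int ivl_max = 5000; /* cap on the reconnect back-off, ms */
    zmq_setsockopt(peer, ZMQ_LINGER, &linger, sizeof linger);
    zmq_setsockopt(peer, ZMQ_IMMEDIATE, &immediate, sizeof immediate);
    zmq_setsockopt(peer, ZMQ_PROBE_ROUTER, &probe, sizeof probe);
    zmq_setsockopt(peer, ZMQ_RECONNECT_IVL, &ivl, sizeof ivl);
    zmq_setsockopt(peer, ZMQ_RECONNECT_IVL_MAX, &ivl_max, sizeof ivl_max);
    zmq_connect(peer, "tcp://127.0.0.1:5555");

    /* ... exchange multipart frames ... */

    zmq_close(peer);
    zmq_close(router);
    zmq_ctx_term(ctx);
    return 0;
}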
This is probably not of major importance; however, I have noticed during testing that the performance of the print statement and also of stdout is much faster in the Dart Editor than from the command line. From the command line, print takes around 36% longer than stdout. However, when running the program from within the editor, stdout takes around 900% longer than print, though both are considerably faster than from the command line; i.e. print from a program running in the editor takes around 2.65% of the time it takes from the command line.
Some relative timings based on average performance from my test:

Running program from command line (5000 iterations):
print : 1700 milliseconds
stdout : 1245 milliseconds

Running program within Dart-Editor (5000 iterations):
print : 45 milliseconds
stdout : 447 milliseconds
Can someone explain to me the reason for these differences – in particular why performance in the Dart-Editor is so much faster? Also, is it acceptable practice to use stdout and what are the pros and cons versus using print?
Why is the Dart Editor faster?
Because output handling by the command-line console is just really slow; this blocks the output stream, and subsequently the call to print/stdout.
You can test this for yourself - test the following java program (with your own paths, of course):
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class DartOutputTest {
    public static void main(String[] args) {
        try {
            // the dart file does print and stdout in a loop
            Process p = Runtime.getRuntime().exec("C:\\eclipse\\dart-sdk\\bin\\dart.exe D:\\DEVELOP\\Dart\\Console_Playground\\bin\\console_playground.dart");
            BufferedReader in = new BufferedReader(new InputStreamReader(p.getInputStream()));
            StringBuffer buf = new StringBuffer();
            String line;
            while ((line = in.readLine()) != null) {
                buf.append(line + "\r\n");
            }
            System.out.print(buf.toString());
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
On my machine, this is even slightly faster than the Dart Editor (which probably does something like buffering the input and rendering it periodically, but I don't really know).
You will also see that adding a Thread.sleep(1); into the loop will severely impact the performance of the dart program, because the stream is blocked.
Should stdout be used?
I think that's highly subjective. I, for one, do whatever lets me write code more quickly. When I just want to dump a variable, I use print(myvar);. But with stdout, you can do neat stuff like this: stdout.addStream(new File(r"D:\test.csv").openRead());. Of course, if performance is an issue, it depends on how your application will be used; for example, called by another program (where print is faster) vs. run from the command line (where stdout is faster, for some reason).
Why is stdout faster in command line?
I have no idea, sorry. It's the only environment I tested where print() is slower, so I'd guess it has something to do with how the console handles incoming data.
On Solaris, processor_bind is used to set affinity for threads. You need to know the LWPID of the target thread or use the constant P_MYID to refer to yourself.
I have a function that looks like this:
void set_affinity(pthread_t thr, int cpu_number)
{
    id_t lwpid = what_do_I_call_here(thr);
    processor_bind(P_LWPID, lwpid, cpu_number, NULL);
}
In reality my function has a bunch of cross platform stuff in it that I've elided for clarity.
The key point is that I'd like to set the affinity of an arbitrary pthread_t so I can't use P_MYID.
How can I achieve this using processor_bind or an alternative interface?
Following up on this, and due to my confusion:
The lwpid is what is created by
pthread_create(&lwpid, NULL, some_func, NULL);
Thread data is available externally, even to a process other than the one making the pthread_create() call, via the /proc interface:
/proc/<pid>/lwp/<lwpid>/
lwpid == 1 is the main thread; 2 .. n are the lwpids of the threads created as in the example above.
But this tells you almost nothing about which thread you are dealing with, except that it is the lwpid in the example above.
/proc/<pid>/lwp/<lwpid>/lwpsinfo
can be read into a struct lwpsinfo, which has some more information from which you might be able to ascertain whether you are looking at the thread you want. See /usr/include/sys/procfs.h, or man -s 4 proc.
The Solaris 11 kernel has a critical threads optimization. You set up which threads require special care, and the kernel does the rest. This appears to be what you want. Please read this short explanation to see if I understood what you want.
https://blogs.oracle.com/observatory/entry/critical_threads_optimization
The above is an alternative. It may not fly at all for you, but it is the preferred mechanism, per Oracle.
For Solaris 10, use the pthread_t tid as the LWP id, with an idtype_t of P_LWPID, in your call to processor_bind. This works in Solaris 8 through 11. It works ONLY for LWPs in the calling process. It is not clear to me if that is your model.
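Concretely, under that assumption (the 1:1 model, where a pthread_t in the calling process names its LWP), the set_affinity function from the question collapses to a sketch like this:

#include <sys/types.h>
#include <sys/processor.h>
#include <sys/procset.h>
#include <pthread.h>

/* A sketch only: assumes Solaris's 1:1 thread model, where a pthread_t
   in the calling process names the same LWP that P_LWPID expects. */
int set_affinity(pthread_t thr, int cpu_number)
{
    return processor_bind(P_LWPID, (id_t)thr, (processorid_t)cpu_number,
                          NULL /* previous binding not needed */);
}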
HTH
I'm making an application in Erlang, with a GUI in Java.
I've managed to establish a connection between the two languages, but now I need to (I guess) send a message from Java to Erlang every time I e.g. press a button.
Is that the right way to go?
How would such a message look?
I've found a few good sites about this form of integration, but I feel like I'm not getting everything.
http://www.trapexit.org/How_to_communicate_java_and_erlang
Besides classic Java-Erlang communication via OTP jinterface, you can look into methods such as:
- thrift
- ice from zeroC (no official erlang binding)
- maybe two http servers on both sides (I like this approach)
- protocol buffers (probably not; it is better suited for larger data transfers)
You need to learn the shape of your traffic and choose the best solution.
Jinterface is not so bad, though (here is the official doc: http://www.erlang.org/doc/apps/jinterface/jinterface_users_guide.html).
If jinterface is too complicated, you might just use the packet option on open_port. To read packets from Erlang, use:
byte[] in_buf = new byte[256];
byte[] out_buf = new byte[256];

// with {packet, 1}, the first byte on stdin is the packet length
int in_count = System.in.read();
int offset = 0;
do
{
    int c = System.in.read(in_buf, offset, in_count - offset);
    offset += c;
}
while (offset < in_count);
and to write them, use:
System.out.write(out_count);             // length byte first ({packet, 1})
System.out.write(out_buf, 0, out_count); // then the payload
System.out.flush();                      // make sure the packet actually leaves the buffer
On the Erlang side, this would match with
open_port({spawn, "<path-to-java> -cp <classpath> your-java-prog"},
          [{packet, 1}]).
If you need larger packets use {packet, 2} or {packet, 4} and adapt the java.
Inside the packets you can run whatever protocol you like on both sides.
I am working on an application similar to yours: a C++ GUI and an Erlang server. I use TCP sockets to exchange messages between the GUI and the server, and Erlang server patterns for handling requests (I may have more than one GUI hooked up to the server at the same time).