How to find out the PID of the Flink execution process? - perf

I want to measure Flink's performance with performance counters (perf). My code:
var text = env.readTextFile("<filename>")
var counts = text.flatMap { _.toLowerCase.split("\\W+") }.map { (_, 1) }.groupBy(0).sum(1)
counts.writeAsText("<filename_result>", WriteMode.OVERWRITE)
env.execute()
I know the PID of the jobmanager. I can also see the TID of the thread (CHAIN DataSource) that runs the execute() command during execution. But the TID changes for every execution, so working with the TID won't do. Is there a way to figure out the PID of the jobmanager's child process that runs the execute() command? And are there different child processes for every transformation (e.g. flatMap) of the rdd? If so, is it possible to find out their distinct PIDs?

The individual operators are not executed in distinct processes. The JobManager and the TaskManagers are started as Java processes. Each TaskManager then runs a set of parallel tasks (corresponding to the operators), and each parallel task is executed in its own thread. When you start Flink, the system creates the files /tmp/your-name-taskmanager.pid and /tmp/your-name-jobmanager.pid, which contain the PIDs of these processes.
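If the goal is to attach perf to the TaskManager process (where the operator threads actually run), a minimal sketch could look like the following. It assumes the PID file location mentioned above, a single local TaskManager, and perf being installed; perf stat -p aggregates counters over all threads of the process, so the changing TIDs don't matter.
import subprocess

pid_file = "/tmp/your-name-taskmanager.pid"  # adjust to your installation

with open(pid_file) as f:
    taskmanager_pid = f.read().split()[0]  # the file may list one PID per TaskManager

# Attach perf to the whole TaskManager process for a 30-second window;
# the counters cover all of its threads, including the CHAIN DataSource thread.
subprocess.run(["perf", "stat", "-p", taskmanager_pid, "sleep", "30"])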

Related

waitForCompletion(timeout) in Abaqus API does not actually kill the job after timeout passes

I'm doing a parametric sweep of some Abaqus simulations, so I'm using the waitForCompletion() function to prevent the script from moving on prematurely. However, occasionally the combination of parameters causes the simulation to hang on one or two of the parameters in the sweep for something like half an hour to an hour, whereas most parameter combinations only take ~10 minutes. I don't need all the data points, so I'd rather sacrifice one or two results to power through more simulations in that time. I therefore tried to use waitForCompletion(timeout) as documented here, but it doesn't work: it ends up functioning just like an indefinite waitForCompletion, regardless of how low I set the wait time. I am using Abaqus 2017, and I was wondering if anyone else has gotten this function to work, and if so, how?
While I could use a workaround like adding a custom timeout function and using the kill() function on the job, I would prefer to use the built-in functionality of the Abaqus API, so any help is much appreciated!
It seems that, starting from a certain version, the timeOut optional argument was removed from this method: compare the "Scripting Reference Manual" entries in the documentation of v6.7 and v6.14.
You have a few options:
From the Abaqus API: checking whether the my_abaqus_script.023 file still exists during the simulation:
import os, time

timeOut = 600
total_time = 60
time.sleep(60)
# wait until the job is completed
while os.path.isfile('my_job_name.023'):
    if total_time > timeOut:
        my_job.kill()
    total_time += 60
    time.sleep(60)
From outside: launching the job using the subprocess module.
Note: don't use the interactive keyword in your command, because it blocks the execution of the script while the simulation process is active.
import subprocess, os, time

my_cmd = 'abaqus job=my_abaqus_script analysis cpus=1'
log_file = open('my_study.log', 'w')
err_file = open('my_study.err', 'w')
proc = subprocess.Popen(
    my_cmd,
    cwd=my_working_dir,
    stdout=log_file,   # Popen expects file objects, not file names
    stderr=err_file,
    shell=True
)
and checking the return code of the child process using poll() (see also returncode):
timeOut = 600
total_time = 60
time.sleep(60)
# wait until the job is completed
while proc.poll() is None:
    if total_time > timeOut:
        proc.terminate()
    total_time += 60
    time.sleep(60)
or waiting until the timeOut is reached using wait():
timeOut = 600
try:
    proc.wait(timeOut)
except subprocess.TimeoutExpired:
    print('TimeOut reached!')
Note: I know that the terminate() and wait() methods should work in theory, but I haven't tried this solution myself, so there may be some additional complications (like having to look for all child processes created by Abaqus using psutil.Process(proc.pid)).
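For that last caveat, a minimal, untested sketch using psutil (an extra dependency, not part of the Abaqus API; proc is the Popen object from above) could look like this:
import psutil

parent = psutil.Process(proc.pid)
# terminate the solver processes spawned by the launcher first, then the launcher itself
for child in parent.children(recursive=True):
    child.terminate()
parent.terminate()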

Is there an equivalent to Akka Streams' `conflate` and/or `batch` operators in Reactor?

I am looking for an equivalent of the batch and conflate operators from Akka Streams in Project Reactor, or some combination of operators that mimic their behavior.
The idea is to aggregate upstream items when the downstream backpressures in a reduce-like manner.
Note that this is different from this question because the throttleLatest / conflate operator described there is different from the one in Akka Streams.
Some background regarding what I need this for:
I am watching a change stream on a MongoDB and for every change I run an aggregate query on the MongoDB to update some metric. When lots of changes come in, the queries can't keep up and I'm getting errors. As I only need the latest value of the aggregate query, it is fine to aggregate multiple change events and run the aggregate query less often, but I want the metric to be as up-to-date as possible so I want to avoid waiting a fixed amount of time when there is no backpressure.
The closest I have come so far is this:
changeStream
    .window(Duration.ofSeconds(1))
    .concatMap { it.reduce(setOf<String>(), { applicationNames, event -> applicationNames + event.body.sourceReference.applicationName }) }
    .concatMap { Flux.fromIterable(it) }
    .concatMap { taskRepository.findTaskCountForApplication(it) }
but this would always wait for 1 second regardless of backpressure.
What I would like is something like this:
changeStream
    .conflateWithSeed({ setOf(it.body.sourceReference.applicationName) }, { applicationNames, event -> applicationNames + event.body.sourceReference.applicationName })
    .concatMap { Flux.fromIterable(it) }
    .concatMap { taskRepository.findTaskCountForApplication(it) }
I assume you always run only one query at a time, with no parallel execution. My idea is to buffer elements in a list (which can easily be aggregated) as long as the query is running. As soon as the query finishes, the next list is processed.
I tested it with the following code:
boolean isQueryRunning = false;

Flux.range(0, 1000000)
    .delayElements(Duration.ofMillis(10))
    .bufferUntil(aLong -> !isQueryRunning)
    .doOnNext(integers -> isQueryRunning = true)
    .concatMap(integers -> Mono.fromCallable(() -> {
            int sleepTime = new Random().nextInt(10000);
            System.out.println("processing " + integers.size() + " elements. Sleep time: " + sleepTime);
            Thread.sleep(sleepTime);
            return "";
        })
        .subscribeOn(Schedulers.elastic())
    )
    .doOnNext(s -> isQueryRunning = false)
    .subscribe();
Which prints
processing 1 elements. Sleep time: 4585
processing 402 elements. Sleep time: 2466
processing 223 elements. Sleep time: 2613
processing 236 elements. Sleep time: 5172
processing 465 elements. Sleep time: 8682
processing 787 elements. Sleep time: 6780
It's clearly visible that the size of the next batch is proportional to the previous query's execution time (the sleep time).
Note that this is not a "real" backpressure solution, just a workaround. It is also not suited to parallel execution, and it might require some tuning to avoid running queries for empty batches.

How to queue dask delayed tasks on each worker to allow sequential execution of a process?

I need each worker to process a single task at a time and to finish the current process before starting a new one. I cannot manage to (1) have at most one task running at any moment on each worker, and (2) make a worker finish a procedure before starting a new one (atomic transactions).
I use a dask.distributed Client on a cluster with 40 nodes (4 cores and 15 GB RAM each). The pipeline I process has tasks of around 8-10 GB, so having two tasks on a worker will lead to failure of the application.
I tried to set up my workers' resources and task allocation with dask-worker scheduler-ip:port --nprocs 1 --resources process=1 and futures = [client.submit(func, f, resources={'process': 1}) for f in futures], but without success.
My code is as follows:
import dask
from dask.distributed import Client

@dask.delayed
def load(path):
    ...

@dask.delayed
def foo(img):
    ...

@dask.delayed
def save(img, output_filename):
    ...

client = Client('scheduler-ip:port')  # placeholder scheduler address

# Process each file from a given list of paths
paths = ['list', 'of', 'path']

results = []
for path in paths:
    img = load(path)
    for _ in range(n):  # n: number of foo() passes, defined elsewhere
        img = foo(img)
    results.append(save(img, output_filename))  # output_filename: placeholder

client.scatter(results)
futures = client.compute(results)

def identity(x):
    return x

client.scatter(futures)
futures = [client.submit(identity, f, resources={'process': 1}) for f in futures]
client.gather(futures)
As of right now I have two cases:
1. I run all my inputs and the application terminates with a MemoryError.
2. I run a subsample; however, it runs as follows:
load(img-1)->load(img-2)->foo(img-1)->load(img-3)->...->save(img-1)->save(img-2)->...
TLDR: this is what I want to do on each worker:
load(img-1)->foo(img-1)->save(img-1)->load(img-7)->...
The simplest thing here would probably be to start your workers with only one thread:
dask-worker ... --nthreads 1
Then that worker will only start one thing at a time
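If the goal is also to keep each image's load -> foo -> save chain together on a single worker (the TL;DR above), one possible pattern, sketched here as an assumption rather than something from the answer itself, is to wrap the whole per-image pipeline in one plain function and submit that; combined with --nthreads 1, each worker then processes one image end-to-end before taking the next. The process function and output_filenames list are hypothetical names introduced for this sketch, and load/foo/save are assumed to be plain (non-delayed) functions here.
from dask.distributed import Client

def process(path, n, output_filename):
    # run the whole per-image pipeline as one task
    img = load(path)
    for _ in range(n):
        img = foo(img)
    return save(img, output_filename)

client = Client('scheduler-ip:port')  # placeholder scheduler address
# one future per image; with workers started via `dask-worker ... --nthreads 1`,
# each worker runs exactly one process() call at a time, start to finish
futures = [client.submit(process, p, n, out) for p, out in zip(paths, output_filenames)]
client.gather(futures)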

GCD concurrent queue not starting tasks in FIFO order [duplicate]

This question already has answers here:
iOS GCD custom concurrent queue execution sequence
(2 answers)
Closed 5 years ago.
I have a class which contains two methods as per the example in Mastering Swift by Jon Hoffman. The class is as below:
class DoCalculation {
    func doCalc() {
        var x = 100
        var y = x * x
        _ = y / x
    }

    func performCalculation(_ iterations: Int, tag: String) {
        let start = CFAbsoluteTimeGetCurrent()
        for _ in 0..<iterations {
            self.doCalc()
        }
        let end = CFAbsoluteTimeGetCurrent()
        print("time for \(tag): \(end - start)")
    }
}
Now, in the viewDidLoad() of the ViewController from the single-view template, I create an instance of the above class and then create a concurrent queue. I then add blocks executing the performCalculation(_:tag:) method to the queue:
cqueue.async {
    print("Starting async1")
    calculation.performCalculation(10000000, tag: "async1")
}
cqueue.async {
    print("Starting async2")
    calculation.performCalculation(1000, tag: "async2")
}
cqueue.async {
    print("Starting async3")
    calculation.performCalculation(100000, tag: "async3")
}
Every time I run the application on the simulator, I get random output for the "Starting" statements. Example outputs are below:
Example 1:
Starting async1
Starting async3
Starting async2
time for async2: 4.1961669921875e-05
time for async3: 0.00238299369812012
time for async1: 0.117094993591309
Example 2:
Starting async3
Starting async2
Starting async1
time for async2: 2.80141830444336e-05
time for async3: 0.00216799974441528
time for async1: 0.114436984062195
Example 3:
Starting async1
Starting async3
Starting async2
time for async2: 1.60336494445801e-05
time for async3: 0.00220298767089844
time for async1: 0.129496037960052
I don't understand why the blocks don't start in FIFO order. Can somebody please explain what I am missing here?
I know they will be executed concurrently, but it is stated that a concurrent queue will respect FIFO order when starting the execution of tasks, while not guaranteeing which one completes first. So at least the starting statements should have appeared as
Starting async1
Starting async2
Starting async3
with only the completion statements in some random order, e.g.:
time for async2: 4.1961669921875e-05
time for async3: 0.00238299369812012
time for async1: 0.117094993591309
A concurrent queue runs the jobs you submit to it concurrently. That's what it's for.
If you want a queue that runs jobs in FIFO order, you want a serial queue.
I see what you're saying about the docs claiming that the jobs will be submitted in FIFO order, but your test doesn't really establish the order in which they're run. If the concurrent queue has 2 threads available but only one processor to run those threads on, it might swap out one of the threads before it gets a chance to print, run the other job for a while, and then go back to running the first job. There's no guarantee that a job runs to the end before getting swapped out.
I don't think a print statement gives you reliable information about the order in which the jobs are started.
cqueue is a concurrent queue, which dispatches your blocks of work to three different threads (it actually depends on thread availability) at almost the same time, but you cannot control when each thread completes its work.
If you want to perform tasks serially on a background queue, you are much better off using a serial queue.
let serialQueue = DispatchQueue(label: "serialQueue")
A serial queue will start the next task in the queue only when the previous task has completed.
"I don't understand why the blocks don't start in FIFO order" How do you know they don't? They do start in FIFO order!
The problem is that you have no way to test that. The notion of testing it is, in fact, incoherent. The soonest you can test anything is the first line of each block — and by that time, it is perfectly legal for another line of code from another block to execute, because these blocks are asynchronous. That is what asynchronous means.
So, they start in FIFO order, but there is no guarantee about the order in which, given multiple asynchronous blocks, their first lines will be executed.
With a concurrent queue, you are effectively specifying that they can run at the same time. So while they're added in FIFO manner, you have a race condition between these various worker threads, and thus you have no assurance which will hit its respective print statement first.
So this raises the question: why do you care which order they hit their respective print statements? If order is really important, you shouldn't be using a concurrent queue. Or, the other way of saying that: if you want to use a concurrent queue, write code that isn't dependent upon the order in which the tasks run.
You asked:
Would you suggest some way to get the info when a Task is dequeued from the queue so that I can log it to get the FIFO order.
If you're asking how to enjoy FIFO starting of the tasks on concurrent queue in real-world app, the answer is "you don't", because of the aforementioned race condition. When using concurrent queues, never write code that is strictly dependent upon the FIFO behavior.
If you're asking how to verify this empirically for purely theoretical purposes, just do something that ties up the CPUs and frees them up one by one:
// utility function to spin for a certain amount of time
func spin(for seconds: TimeInterval, message: String) {
    let start = CACurrentMediaTime()
    while CACurrentMediaTime() - start < seconds { }
    os_log("%@", message)
}

// my concurrent queue
let queue = DispatchQueue(label: label, attributes: .concurrent)

// just something to occupy the CPUs, with varying
// lengths of time; don't worry about these re FIFO behavior
for i in 0 ..< 20 {
    queue.async {
        spin(for: 2 + Double(i) / 2, message: "\(i)")
    }
}

// Now, add three tasks on the concurrent queue, demonstrating FIFO
queue.async {
    os_log(" 1 start")
    spin(for: 2, message: " 1 stop")
}
queue.async {
    os_log(" 2 start")
    spin(for: 2, message: " 2 stop")
}
queue.async {
    os_log(" 3 start")
    spin(for: 2, message: " 3 stop")
}
You'll be able to see that those last three tasks are started in FIFO order.
The other approach, if you want to confirm precisely what GCD is doing, is to refer to the libdispatch source code. It's admittedly pretty dense code, so it's not exactly obvious, but it's something you can dig into if you're feeling ambitious.

Can someone explain the structure of a Pid (Process Identifier) in Erlang?

Can someone explain the structure of a Pid in Erlang?
Pids look like this: <A.B.C>, e.g. <0.30.0>, but I would like to know what the meaning of these three "bits" is: A, B and C.
A seems to be always 0 on a local node, but this value changes when the Pid's owner is located on another node.
Is it possible to directly send a message on a remote node using only the Pid? Something like that: <4568.30.0> ! Message, without having to explicitly specify the name of the registered process and the node name ( {proc_name, Node} ! Message)?
Printed process ids <A.B.C> are composed of:
A, the node number (0 is the local node, an arbitrary number for a remote node)
B, the first 15 bits of the process number (an index into the process table)
C, bits 16-18 of the process number (the same process number as B)
Internally, the process number is 28 bits wide on the 32-bit emulator. The odd definition of B and C comes from R9B and earlier versions of Erlang, in which B was a 15-bit process ID and C was a wrap counter incremented when the maximum process ID was reached and lower IDs were reused.
In the Erlang distribution, PIDs are a little larger, as they include the node atom as well as the other information. (Distributed PID format)
When an internal PID is sent from one node to the other, it's automatically converted to the external/distributed PID form, so what might be <0.10.0> (inet_db) on one node might end up as <2265.10.0> when sent to another node. You can just send to these PIDs as normal.
% get the PID of the user server on OtherNode
RemoteUser = rpc:call(OtherNode, erlang,whereis,[user]),
true = is_pid(RemoteUser),
% send message to remote PID
RemoteUser ! ignore_this,
% print "Hello from <nodename>\n" on the remote node's console.
io:format(RemoteUser, "Hello from ~p~n", [node()]).
For more information see: Internal PID structure,
Node creation information,
Node creation counter interaction with EPMD
If I remember correctly, the format is <nodeid, serial, creation>.
0 is the current node, much like a computer always has the hostname "localhost" to refer to itself. This is from old memory, so it might not be 100% correct though.
But yes. You could build the pid with list_to_pid/1 for example.
PidString = "<0.39.0>",
list_to_pid(PidString) ! message.
Of course. You just use whatever method you need to build your PidString. You could probably write a function that generates it and use that instead of PidString, like so:
list_to_pid( make_pid_from_term({proc_name, Node}) ) ! message
Process id <A.B.C> is composed of:
A, the node id, which is not arbitrary but the internal index for that node in dist_entry (it is actually the atom slot integer for the node name).
B, the process index, which refers to the internal index in the proctab (0 -> MAXPROCS).
C, the serial, which increases every time MAXPROCS has been reached.
The creation tag of 2 bits is not displayed in the pid but is used internally and increases every time the node restarts.
The PID refers to a process and a node table. So you can only send a message directly to a PID if it is known in the node from which you do the call.
It is possible that this will work if the node you do the call from already knows about the node on which the process is running.
Apart from what others have said, you may find this simple experiment useful to understand what is going on internally:
1> node().
nonode@nohost
2> term_to_binary(node()).
<<131,100,0,13,110,111,110,111,100,101,64,110,111,104,111,
115,116>>
3> self().
<0.32.0>
4> term_to_binary(self()).
<<131,103,100,0,13,110,111,110,111,100,101,64,110,111,104,
111,115,116,0,0,0,32,0,0,0,0,0>>
So, you can see that the node name is internally stored in the pid. More info in this section of Learn You Some Erlang.
