How many levels of thread priorities are there? - pthreads

I was wondering how many levels of thread priorities there are. Does it depend on the OS's POSIX implementation?

It depends on the implementation used by your OS. You should use
int sched_get_priority_min(int policy);
int sched_get_priority_max(int policy);
to find the range for a particular scheduling policy on your platform.

It depends on the scheduling policy: each policy (e.g. SCHED_FIFO, SCHED_RR, SCHED_OTHER) defines its own priority range.

Related

Why does Temporal's task queue design have both numTaskqueueWritePartitions and numTaskqueueReadPartitions?

Can anyone explain why Temporal needs two parameters, numTaskqueueWritePartitions and numTaskqueueReadPartitions?
I would have thought a single numTaskqueuePartitions parameter covering both reads and writes would be enough.
It is needed to support shrinking the number of partitions without losing tasks. Set numTaskqueueWritePartitions < numTaskqueueReadPartitions to drain the backlog from the partitions to be removed. Once no messages are left in those partitions, set numTaskqueueReadPartitions to the value of numTaskqueueWritePartitions.
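As a sketch, the drain step might look like this in the server's dynamic configuration (the YAML shape, the matching.* key prefix, and the taskQueueName constraint are assumptions based on Temporal's dynamic config conventions; check the docs for your server version):

```yaml
# Drain step: write to fewer partitions than we still read from,
# so the extra read partitions can empty out before being removed.
matching.numTaskqueueWritePartitions:
  - value: 4
    constraints:
      taskQueueName: "my-task-queue"   # hypothetical queue name
matching.numTaskqueueReadPartitions:
  - value: 8
    constraints:
      taskQueueName: "my-task-queue"
```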

How to monitor Flux.onBackpressureBuffer() queue size

In my reactive application I have a hot Publisher with a slow Subscriber. To handle the lack of demand I am using onBackpressureBuffer, but possible overflow errors are kinda scary.
How can I monitor the number of elements in the queue created by Flux.onBackpressureBuffer(maxSize)? Preferably with the built-in Reactor metrics() method. I am using Spring Boot + Micrometer if it makes any difference.
We didn't find an easy way to do this in Reactor, but we found a somewhat "hacky" one. Here it is: https://github.com/allegro/envoy-control/blob/master/envoy-control-core/src/main/kotlin/pl/allegro/tech/servicemesh/envoycontrol/utils/ReactorUtils.kt#L34
This function measures the buffer size of various Flux operators. It is not guaranteed to work on every operator, but it was tested on onBackpressureBuffer with positive results.
It is written in Kotlin, but it should be very easy to port to Java.
The essence of the code, in the onBackpressureBuffer case, is to cast the Subscription to Scannable and then read its BUFFERED attribute:
flux
    .onBackpressureBuffer(maxSize)
    .doOnSubscribe { subscription ->
        // ...
        // BUFFERED reports how many elements the operator currently holds
        val queueSize = Scannable.from(subscription).scan(Scannable.Attr.BUFFERED)
        // ...
    }

Is it safe for an OpenCL kernel to randomly write to a __global buffer?

I want to run an instrumented OpenCL kernel to get some execution metrics. More specifically, I have added a hidden global buffer which will be initialized from the host code with N zeros. Each of the N values are integers and they represent a different metric, which each kernel instance will increment in a different manner, depending on its execution path.
A simplistic example:
__kernel void test(__global int *a, __global int *hiddenCounter) {
    if (get_global_id(0) == 0) {
        // do stuff and then increment the appropriate counter (random numbers here)
        hiddenCounter[0] += 3;
    } else {
        // do stuff...
        hiddenCounter[1] += 5;
    }
}
After the kernel execution is complete, I need the host code to aggregate (a simple element-wise vector addition) all the hiddenCounter buffers and print the appropriate results.
My question is whether there are race conditions when multiple kernel instances try to write to the same index of the hiddenCounter buffer (which will definitely happen in my project). Do I need to enforce some kind of synchronization? Or is this impossible with __global arguments and I need to change it to __private? Will I be able to aggregate __private buffers from the host code afterwards?
My question is whether there are race conditions when multiple kernel instances try to write to the same index of the hiddenCounter buffer
The answer to this is emphatically yes: your code will be vulnerable to race conditions as currently written.
Do I need to enforce some kind of synchronization?
Yes, you can use global atomics for this purpose. All but the most ancient GPUs will support this. (anything supporting OpenCL 1.2, or cl_khr_global_int32_base_atomics and similar extensions)
Note that this will have a non-trivial performance overhead. Depending on your access patterns and frequency, collecting intermediate results in private or local memory and writing them out to global memory at the end of the kernel may be faster. (In the local case, the whole work group would share just one global atomic call for each updated cell - you'll need to use local atomics or a reduction algorithm to accumulate the values from individual work items across the group though.)
Another option is to use a much larger global memory buffer, with counters for each work item or group. In that case, you will not need atomics to write to them, but you will subsequently need to combine the values on the host. This uses much more memory, obviously, and likely more memory bandwidth too - modern GPUs should cache accesses to your small hiddenCounter buffer. So you'll need to work out/try which is the lesser evil in your case.
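To make the first option concrete, here is a sketch of the question's kernel rewritten with global atomics (assuming the atomic_add built-in available since OpenCL 1.1, or via cl_khr_global_int32_base_atomics on older platforms; the increments are the question's placeholder values):

```c
__kernel void test(__global int *a, volatile __global int *hiddenCounter) {
    if (get_global_id(0) == 0) {
        // atomic read-modify-write: safe even when several work items
        // target the same counter cell concurrently
        atomic_add(&hiddenCounter[0], 3);
    } else {
        atomic_add(&hiddenCounter[1], 5);
    }
}
```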

Difference in "Initializing Each Device"

Everyone, I am learning LDD3 and have a question about the statement below:
"Note that struct net_device is always put together at runtime; it cannot be set up at compile time in the same manner as a file_operations or block_device_operations structure"
So what is the root cause of this difference? The different behaviour between network drivers and char drivers? Could anyone explain? Thanks.
The root cause of the difference is the nature of the data stored in these structures.
file_operations is some kind of global set of callbacks for a particular device which have a well-defined purpose (such as .open, .mmap, etc.) and are available (known) at compile time.
It doesn't assume any volatile data fields which could change throughout the process of module usage. So, merely a set of callbacks known at compile time.
net_device, in turn, is a structure intended to keep data amenable to lots of runtime changes. Suffice it to say, such fields as state, mtu and many others are self-explanatory. They are not known at compile time and need to be initialised/changed throughout runtime.
In other words, the structure obeys a strict interface of device probe/configuration/start order, and the corresponding fields are initialised at the corresponding steps. Say, nobody knows the number of Rx or Tx queues at compile time. The driver calculates these values on initialisation based on restrictions and demands from the OS. For example, the same network adapter may find it feasible to allocate 1 Rx queue and 1 Tx queue on a single-core CPU system, whilst it will likely decide to set up 4 queues on a quad-core system. There is no point predicting this at compile time and initialising the struct to values which will be changed anyway.
the different behavior btw network driver and char driver??
So, to put it together, it is the difference between the purposes of the two kinds of structures: keeping static information about callbacks in one case, and maintaining volatile device state (name, tunable properties, counters, etc.) in the other. I hope you find it clear.
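The contrast can be sketched in kernel-style C (an illustrative fragment, not a complete driver; my_open, my_read, and my_priv are placeholder names):

```c
/* Compile-time table of callbacks: everything is known when the
   module is built, so a static initialiser works. */
static const struct file_operations my_fops = {
    .owner = THIS_MODULE,
    .open  = my_open,   /* placeholder callbacks */
    .read  = my_read,
};

/* net_device, by contrast, is allocated and filled in at runtime,
   typically in the driver's probe path: */
struct net_device *dev = alloc_etherdev(sizeof(struct my_priv));
if (dev) {
    dev->mtu = 1500;    /* example: a value known only at runtime */
    register_netdev(dev);
}
```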

How to increase the range of POSIX thread priorities?

I'd like to know how to increase the range of POSIX thread priorities beyond 1-99 for SCHED_RR. I've called sched_get_priority_min and sched_get_priority_max to verify the 1-99 range for SCHED_RR, but I'm porting code written for another operating system which uses more priority levels. I want each thread to have the same relative priority but do not want to force threads to share the same priority when they should be different.
You have probably read them already, but the man pages seem clear enough:
The range of scheduling priorities may vary on other POSIX systems, thus it is a good idea for portable applications to use a virtual priority range and map it to the interval given by sched_get_priority_max() and sched_get_priority_min(). POSIX.1-2001 requires a spread of at least 32 between the maximum and the minimum values for SCHED_FIFO and SCHED_RR.
I rather doubt there is a knob to twist to change this, so my guess is you are stuck with either changing the kernel or scaling the priorities as suggested. I frankly have to wonder whether more than 99 priorities really amounts to a hill of beans in actual performance.