bpftrace doesn’t recognise a syscall argument as negative - comparison

Here is a simple bpftrace script:
#!/usr/bin/env bpftrace
tracepoint:syscalls:sys_enter_kill
{
    $tpid = args->pid;
    printf("%d %d %d\n", $tpid, $tpid < 0, $tpid >= 0);
}
It traces kill syscalls, prints the target PID and two additional values: whether it is negative, and whether it is non-negative.
Here is the output that I get:
# ./test.bt
Attaching 1 probe...
-1746 0 1
-2202 0 1
4160 0 1
4197 0 1
4197 0 1
-2202 0 1
-1746 0 1
Weirdly, both positive and negative PIDs appear to be non-negative to the comparison operators.
Just as a sanity check, if I replace the assignment line with:
$tpid = -10;
what I get is exactly what I expect:
# ./test.bt
Attaching 1 probe...
-10 1 0
-10 1 0
-10 1 0
What am I doing wrong?

As you've discovered, bpftrace assigns a u64 type to your $tpid variable. Yet, according to the tracepoint format description, args->pid should be of type pid_t, i.e. int.
# cat /sys/kernel/debug/tracing/events/syscalls/sys_enter_kill/format
name: sys_enter_kill
ID: 185
format:
field:unsigned short common_type; offset:0; size:2; signed:0;
field:unsigned char common_flags; offset:2; size:1; signed:0;
field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
field:int common_pid; offset:4; size:4; signed:1;
field:int __syscall_nr; offset:8; size:4; signed:1;
field:pid_t pid; offset:16; size:8; signed:0;
field:int sig; offset:24; size:8; signed:0;
print fmt: "pid: 0x%08lx, sig: 0x%08lx", ((unsigned long)(REC->pid)), ((unsigned long)(REC->sig))
The bpftrace function that assigns this type is TracepointFormatParser::adjust_integer_types(). This change was introduced by commit 42ce08f to address issue #124.
For the above tracepoint description, bpftrace generates the following structure:
struct _tracepoint_syscalls_sys_enter_kill
{
    unsigned short common_type;
    unsigned char common_flags;
    unsigned char common_preempt_count;
    int common_pid;
    int __syscall_nr;
    u64 pid;
    s64 sig;
};
When it should likely generate:
struct _tracepoint_syscalls_sys_enter_kill
{
    unsigned short common_type;
    unsigned char common_flags;
    unsigned char common_preempt_count;
    int common_pid;
    int __syscall_nr;
    u32 pad1;
    pid_t pid;
    u32 pad2;
    int sig;
};
bpftrace seems to be confused by the size parameter that doesn't match the type in the above description. All syscall arguments get size 8 (on 64-bit at least), but that doesn't mean all 8 bytes are used. I think it would be worth opening an issue on bpftrace.
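To illustrate the effect, here is a minimal C++ sketch of my understanding (not bpftrace internals; it assumes a little-endian machine where the 32-bit pid occupies the low 4 bytes of the 8-byte field). Once the field is read as an unsigned 64-bit integer, a comparison with 0 can never come out negative, while reinterpreting the low 32 bits as a signed int recovers the expected sign:
#include <cstdint>
#include <cstdio>
#include <cstring>

int main()
{
    // Model of the 8-byte tracepoint field whose low 4 bytes hold a signed pid
    // (little-endian layout assumed).
    int32_t pid = -1746;
    uint64_t field = 0;
    std::memcpy(&field, &pid, sizeof(pid));

    // bpftrace's current view: the whole field is a u64, so "< 0" is never true.
    printf("u64 view: %d %d\n", (int)(field < 0), (int)(field >= 0));             // 0 1

    // The intended view: take the low 32 bits as a signed int (pid_t).
    int32_t spid = (int32_t)field;
    printf("s32 view: %d %d %d\n", spid, (int)(spid < 0), (int)(spid >= 0));      // -1746 1 0
    return 0;
}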

There is something strange going on with integer types in bpftrace (see #554, #772, #834 for details).
It seems that in my case args->pid gets treated as a 64-bit value by default, while it is actually a 32-bit signed integer. So the solution is to cast it explicitly:
$tpid = (int32)args->pid;
And now it works as expected:
# bpftrace test.bt
Attaching 1 probe...
-2202 1 0
-1746 1 0
-2202 1 0
4160 0 1
4197 0 1


How can I cast a value into two bytes in Dafny?

I want to convert an integer from 0 to 65535, and for that I need a two-byte representation. I'm trying to divide it by 2 eight times, summing the powers of 2 whenever the remainder is 1, and then cast that integer to a byte, but I'm having problems meeting the restrictions of a byte (< 256). The second byte will be the remainder of the 8th division, and I'm having problems casting that as a byte too.
The following is my code for the previously described function method:
method convertBin(i:int) returns (b:seq<byte>)
    requires 0<=i<=65535;
{
    var b1:=0;
    var q:=i;
    var j:=0;
    while j<8
        invariant 0<=j<=8 && (b1 as int)< power(2,j)
        decreases 8-j
    {
        var p:int;
        if(q%2==1){
            p:=power(2, j);
            b1:=b1 + p;
            q:=q/2;
        }
        j:=j+1;
    }
    b1:=b1 as byte;
    b:=[b1]+[q as byte];
}
To complete your example, you need stronger loop invariants. But you don't need a loop at all, since there's no reason to divide only by 2.
Here's doing it with byte as a subset type:
type byte = x | 0 <= x < 256
method convertBin(i: int) returns (b1: byte, b0: byte)
    requires 0 <= i < 0x1_0000
    ensures i == 256 * b1 + b0
{
    b1, b0 := i / 256, i % 256;
}
And here's the same program, but with byte being a newtype:
newtype byte = x | 0 <= x < 256
method convertBin(i: int) returns (b1: byte, b0: byte)
    requires 0 <= i < 0x1_0000
    ensures i == 256 * b1 as int + b0 as int
{
    b1, b0 := (i / 256) as byte, (i % 256) as byte;
}
Rustan
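If it helps to see the same arithmetic outside Dafny, here is a rough C++ sketch of the / 256 and % 256 split with a round-trip check (my own illustration, not part of the question; byte here is just uint8_t):
#include <cassert>
#include <cstdint>

// Split a value in [0, 65535] into a high byte and a low byte.
void convertBin(int i, uint8_t &b1, uint8_t &b0)
{
    assert(0 <= i && i < 0x10000);
    b1 = (uint8_t)(i / 256);  // most significant byte
    b0 = (uint8_t)(i % 256);  // least significant byte
}

int main()
{
    uint8_t hi = 0, lo = 0;
    convertBin(0xABCD, hi, lo);
    assert(hi == 0xAB && lo == 0xCD);
    assert(0xABCD == 256 * hi + lo);  // mirrors the Dafny ensures clause
    return 0;
}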

Integer overflow in Fibonacci number

I was solving this CodeChef problem on Fibonacci numbers. It says the number can have up to 1000 digits, so why does it not cause an integer overflow in the tester's solution when the input is read and stored in an unsigned long long int? I can't understand how the solution works. Below are the problem and the tester's solution.
The Head Chef has been playing with Fibonacci numbers for long. He has learnt several tricks related to Fibonacci numbers. Now he wants to test his chefs in the skills.
A fibonacci number is defined by the recurrence:
f(n) = f(n-1) + f(n-2) for n > 2
and f(1) = 0
and f(2) = 1.
Given a number A, determine if it is a fibonacci number.
Input
The first line of the input contains an integer T denoting the number of test cases. The description of T test cases follows.
The only line of each test case contains a single integer A denoting the number to be checked.
Output
For each test case, output a single line containing "YES" if the given number is a fibonacci number, otherwise output a single line containing "NO".
Constraints
1 ≤ T ≤ 1000
1 ≤ number of digits in A ≤ 1000
The sum of number of digits in A in all test cases <= 10000.
Example
Input:
3
3
4
5
Output:
YES
NO
YES
Tester's solution:
#include <iostream>
#include <cstdio>
#include <algorithm>
#include <set>
#include <cstring>
using namespace std;
int const mx = 6666;
set <unsigned long long> f;
unsigned long long fib[mx + 10];
char s[mx + 1];
int main(){
    // freopen("input.txt", "r", stdin);
    // freopen("output.txt", "w", stdout);
    fib[0] = 0;
    fib[1] = 1;
    f.insert(1);
    f.insert(0);
    int i;
    for (i = 2; i <= mx; i++){
        fib[i] = fib[i - 1] + fib[i - 2];
        f.insert(fib[i]);
    }
    int tc;
    cin>>tc;
    while (tc--){
        unsigned long long n = 0, ten = 10;
        cin>>s;
        int len = strlen(s);
        for (i = 0; i < len; i++){
            char q = s[i];
            unsigned long long a = q - '0';
            n = n * ten + a;
        }
        if (f.find(n) == f.end()) printf("NO\n");
        else printf("YES\n");
    }
    return 0;
}
From cplusplus you will see that:
ULLONG_MAX: Maximum value for an object of type unsigned long long int is 18446744073709551615 (2^64 - 1) or greater. The actual value depends on the particular system and library implementation, but shall reflect the limits of these types in the target platform.
The above information is just to let you know that it's a BIG number. However, the cause of not getting an overflow is not the limit I mentioned.
Most probably, the judge's input file does not contain any input that would cause an overflow.
It is still possible to construct such an input while fulfilling the conditions:
1 ≤ T ≤ 1000
1 ≤ number of digits in A ≤ 1000
The sum of number of digits in A in all test cases <= 10000.
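To see the wrap-around in isolation, here is a small sketch (the input value is mine, chosen only because it exceeds ULLONG_MAX): the same digit-by-digit loop silently reduces the number modulo 2^64, so two different 1000-digit inputs can collapse onto the same unsigned long long value and give a wrong YES/NO.
#include <cstdio>
#include <cstring>

int main()
{
    // 2^64 - 1 has 20 digits, so anything longer is guaranteed to wrap.
    // This is 10 * 2^64 written out in decimal, which wraps to 0.
    const char *s = "184467440737095516160";
    unsigned long long n = 0;
    for (size_t i = 0; i < strlen(s); i++)
        n = n * 10 + (unsigned long long)(s[i] - '0');
    printf("%llu\n", n);  // prints 0, not the 21-digit value
    return 0;
}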

Bitwise right shift >> in Objective-C

I have a variable (unsigned int) part_1.
If I do this (printing the unsigned value and the hex value):
NSLog(@"%u %08x", part_1, part_1);
it outputs:
2063597568 7b000000
(only the first two hex digits are non-zero).
I want to convert this to
0000007b
So I've tried doing:
unsigned int part_1b = part_1 >> 6 (and lots of variations)
But this outputs:
32243712 01ec0000
Where am I going wrong?
You want to shift by 6*4 = 24 bits, not just 6 bits. Each '0' in the hex printf represents 4 bits.
unsigned int part_1b = part_1 >> 24;
^^
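To make the arithmetic concrete, here is a tiny sketch (plain C-style code, using the value from the question) showing the two shifts side by side:
#include <cstdio>

int main()
{
    unsigned int part_1 = 0x7b000000;   // the value from the question
    printf("%08x\n", part_1 >> 6);      // 01ec0000 - only 6 bits moved
    printf("%08x\n", part_1 >> 24);     // 0000007b - six hex digits (24 bits) moved
    return 0;
}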

PGMidi changing pitch sendBytes example

I'm trying the second day to send a midi signal. I'm using following code:
int pitchValue = 8191; // or -8192
int msb = ?;
int lsb = ?;
UInt8 midiData[] = { 0xe0, msb, lsb};
[midi sendBytes:midiData size:sizeof(midiData)];
I don't understand how to calculate msb and lsb. I tried pitchValue << 8, but it works incorrectly: when I look at the events using a MIDI tool I see a minimum of -8192 and a maximum of +8064. I want to get -8192 and +8191.
Sorry if question is simple.
Pitch bend data is offset to avoid any sign bit concerns. The maximum negative deviation is sent as a value of zero, not -8192, so you have to compensate for that, something like this Python code:
def EncodePitchBend(value):
    ''' return a 2-tuple containing (msb, lsb) '''
    if (value < -8192) or (value > 8191):
        raise ValueError
    value += 8192
    return (((value >> 7) & 0x7F), (value & 0x7f))
Since MIDI data bytes are limited to 7 bits, you need to split pitchValue into two 7-bit values:
int msb = (pitchValue + 8192) >> 7 & 0x7F;
int lsb = (pitchValue + 8192) & 0x7F;
Edit: as @bgporter pointed out, pitch wheel values are offset by 8192 so that "zero" (i.e. the center position) is at 8192 (0x2000), so I edited my answer to offset pitchValue by 8192.
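For a quick sanity check of the offset-and-split logic, here is a small C++ sketch (the function names are mine, not part of PGMidi) that encodes a pitch value into the two data bytes and decodes it back:
#include <cassert>

// Split a pitch bend value in [-8192, 8191] into two 7-bit MIDI data bytes.
void encodePitchBend(int pitchValue, unsigned char &msb, unsigned char &lsb)
{
    int v = pitchValue + 8192;   // offset so the range becomes [0, 16383]
    msb = (v >> 7) & 0x7F;       // upper 7 bits
    lsb = v & 0x7F;              // lower 7 bits
}

int decodePitchBend(unsigned char msb, unsigned char lsb)
{
    return ((msb << 7) | lsb) - 8192;   // undo the split and the offset
}

int main()
{
    unsigned char msb = 0, lsb = 0;
    for (int v = -8192; v <= 8191; ++v) {
        encodePitchBend(v, msb, lsb);
        assert(decodePitchBend(msb, lsb) == v);   // round-trips over the full range
    }
    return 0;
}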

How can I access my constant memory in my kernel?

I can't manage to access the data in my constant memory and I don't know why. Here is a snippet of my code:
#define N 10
__constant__ int constBuf_d[N];
__global__ void foo( int *results, int *constBuf )
{
    int tdx = threadIdx.x;
    int idx = blockIdx.x * blockDim.x + tdx;
    if( idx < N )
    {
        results[idx] = constBuf[idx];
    }
}

// main routine that executes on the host
int main(int argc, char* argv[])
{
    int *results_h = new int[N];
    int *results_d = NULL;
    cudaMalloc((void **)&results_d, N*sizeof(int));
    int arr[10] = { 16, 2, 77, 40, 12, 3, 5, 3, 6, 6 };
    int *cpnt;
    cudaError_t err = cudaGetSymbolAddress((void **)&cpnt, "constBuf_d");
    if( err )
        cout << "error!";
    cudaMemcpyToSymbol((void**)&cpnt, arr, N*sizeof(int), 0, cudaMemcpyHostToDevice);
    foo <<< 1, 256 >>> ( results_d, cpnt );
    cudaMemcpy(results_h, results_d, N*sizeof(int), cudaMemcpyDeviceToHost);
    for( int i=0; i < N; ++i )
        printf("%i ", results_h[i] );
}
For some reason, I only get "0" in results_h. I'm running CUDA 4.0 on a card with compute capability 1.1.
Any ideas? Thanks!
If you add proper error checking to your code, you will find that the cudaMemcpyToSymbol call is failing with an invalid device symbol error. You either need to pass the symbol by name, or use cudaMemcpy instead. So this:
cudaGetSymbolAddress((void **)&cpnt, "constBuf_d");
cudaMemcpy(cpnt, arr, N*sizeof(int), cudaMemcpyHostToDevice);
or
cudaMemcpyToSymbol("constBuf_d", arr, N*sizeof(int), 0, cudaMemcpyHostToDevice);
or
cudaMemcpyToSymbol(constBuf_d, arr, N*sizeof(int), 0, cudaMemcpyHostToDevice);
will work. Having said that, passing a constant memory address as an argument to a kernel is the wrong way to use constant memory - it prevents the compiler from generating instructions that access memory via the constant memory cache. Compare the compute capability 1.2 PTX generated for your kernel:
.entry _Z3fooPiS_ (
.param .u32 __cudaparm__Z3fooPiS__results,
.param .u32 __cudaparm__Z3fooPiS__constBuf)
{
.reg .u16 %rh<4>;
.reg .u32 %r<12>;
.reg .pred %p<3>;
.loc 16 7 0
$LDWbegin__Z3fooPiS_:
mov.u16 %rh1, %ctaid.x;
mov.u16 %rh2, %ntid.x;
mul.wide.u16 %r1, %rh1, %rh2;
cvt.s32.u16 %r2, %tid.x;
add.u32 %r3, %r2, %r1;
mov.u32 %r4, 9;
setp.gt.s32 %p1, %r3, %r4;
#%p1 bra $Lt_0_1026;
.loc 16 14 0
mul.lo.u32 %r5, %r3, 4;
ld.param.u32 %r6, [__cudaparm__Z3fooPiS__constBuf];
add.u32 %r7, %r6, %r5;
ld.global.s32 %r8, [%r7+0];
ld.param.u32 %r9, [__cudaparm__Z3fooPiS__results];
add.u32 %r10, %r9, %r5;
st.global.s32 [%r10+0], %r8;
$Lt_0_1026:
.loc 16 16 0
exit;
$LDWend__Z3fooPiS_:
} // _Z3fooPiS_
with this kernel:
__global__ void foo2( int *results )
{
    int tdx = threadIdx.x;
    int idx = blockIdx.x * blockDim.x + tdx;
    if( idx < N )
    {
        results[idx] = constBuf_d[idx];
    }
}
which produces
.entry _Z4foo2Pi (
.param .u32 __cudaparm__Z4foo2Pi_results)
{
.reg .u16 %rh<4>;
.reg .u32 %r<12>;
.reg .pred %p<3>;
.loc 16 18 0
$LDWbegin__Z4foo2Pi:
mov.u16 %rh1, %ctaid.x;
mov.u16 %rh2, %ntid.x;
mul.wide.u16 %r1, %rh1, %rh2;
cvt.s32.u16 %r2, %tid.x;
add.u32 %r3, %r2, %r1;
mov.u32 %r4, 9;
setp.gt.s32 %p1, %r3, %r4;
#%p1 bra $Lt_1_1026;
.loc 16 25 0
mul.lo.u32 %r5, %r3, 4;
mov.u32 %r6, constBuf_d;
add.u32 %r7, %r5, %r6;
ld.const.s32 %r8, [%r7+0];
ld.param.u32 %r9, [__cudaparm__Z4foo2Pi_results];
add.u32 %r10, %r9, %r5;
st.global.s32 [%r10+0], %r8;
$Lt_1_1026:
.loc 16 27 0
exit;
$LDWend__Z4foo2Pi:
} // _Z4foo2Pi
Note that in the second case, constBuf_d is accessed via ld.const.s32 rather than ld.global.s32, so the constant memory cache is used.
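Putting the pieces together, a corrected host-side version of the question's code might look roughly like this (a sketch only, with minimal error checking, using the foo2 kernel above and copying straight to the symbol):
#include <cstdio>

#define N 10
__constant__ int constBuf_d[N];

// Kernel reads directly from the __constant__ symbol, as in foo2 above.
__global__ void foo2(int *results)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < N)
        results[idx] = constBuf_d[idx];
}

int main()
{
    int results_h[N];
    int *results_d = NULL;
    cudaMalloc((void **)&results_d, N * sizeof(int));

    int arr[N] = { 16, 2, 77, 40, 12, 3, 5, 3, 6, 6 };

    // Copy to the symbol directly; no cudaGetSymbolAddress needed.
    cudaError_t err = cudaMemcpyToSymbol(constBuf_d, arr, N * sizeof(int));
    if (err != cudaSuccess)
        printf("cudaMemcpyToSymbol failed: %s\n", cudaGetErrorString(err));

    foo2<<<1, 256>>>(results_d);

    cudaMemcpy(results_h, results_d, N * sizeof(int), cudaMemcpyDeviceToHost);
    for (int i = 0; i < N; ++i)
        printf("%i ", results_h[i]);

    cudaFree(results_d);
    return 0;
}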
Excellent answer, @talonmies. But I would like to mention that there have been changes in CUDA 5: the char * argument to cudaMemcpyToSymbol() is no longer supported.
The CUDA 5 release notes read:
** The use of a character string to indicate a device symbol, which was possible with certain API functions, is no longer supported. Instead, the symbol should be used directly.
Instead, the copy to constant memory has to be made as follows:
cudaMemcpyToSymbol( dev_x, x, N * sizeof(float) );
In this case "dev_x" is pointer to constant memory and "x" is pointer to host memory which needs to be copied into dev_x.
