I need to create an OpenCL application that instruments the code of the OpenCL kernel it receives as input, for some exotic profiling purposes (I haven't found an existing tool that does what I need, so I want to do it myself).
I want to compile the kernel to an intermediate representation (LLVM IR for now), instrument it (using the LLVM C++ bindings), translate the instrumented code to SPIR-V, and then create the program in the host code with clCreateProgramWithIL().
For now, I am just compiling a simple OpenCL kernel that adds 2 vectors, without instrumentation:
__kernel void vadd(
__global float* a,
__global float* b,
__global float* c,
const unsigned int count)
{
int i = get_global_id(0);
if(i < count) c[i] = a[i] + b[i];
}
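Because the instrumented kernel must still compute the same result, a host-side reference is handy for checking its output. The sketch below is a plain-C equivalent of what each work-item computes, with the loop index standing in for get_global_id(0). The names vadd_reference and first_mismatch, and the global_size parameter, are hypothetical and not part of the question's code.

```c
#include <stddef.h>

/* Hypothetical host-side reference for the vadd kernel: iteration i does what
   the work-item with get_global_id(0) == i does. global_size may exceed count
   (e.g. when the NDRange is rounded up), so the kernel's bounds check is kept. */
void vadd_reference(const float *a, const float *b, float *c,
                    unsigned int count, size_t global_size) {
    for (size_t i = 0; i < global_size; i++) {
        if (i < count) c[i] = a[i] + b[i];
    }
}

/* Returns the index of the first element where the device result differs
   from the reference, or count if all elements match exactly. */
unsigned int first_mismatch(const float *device, const float *reference,
                            unsigned int count) {
    for (unsigned int i = 0; i < count; i++) {
        if (device[i] != reference[i]) return i;
    }
    return count;
}
```

Exact comparison is reasonable here only because each element is a single float addition, assuming the device rounds it correctly; kernels with longer arithmetic chains would need a tolerance.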
For compiling the above to LLVM IR, I use the following command:
clang -c -emit-llvm -O0 -x cl -include libclc/generic/include/clc/clc.h -I libclc/generic/include/ vadd.cl -o vadd.bc
Afterwards, I translate vadd.bc to vadd.spv with the llvm-spirv tool (from the Khronos SPIRV-LLVM-Translator project).
Finally, I try to build the program from the C host code like this:
...
cl_program program = clCreateProgramWithIL(context, binary_data->data, binary_data->size, &err);
err = clBuildProgram(program, 1, &device_id, NULL, NULL, NULL);
...
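The question elides how binary_data is filled. As one possibility, here is a minimal C sketch that reads the .spv file written by llvm-spirv and checks the SPIR-V magic number (0x07230203) before the bytes are handed to clCreateProgramWithIL(). struct binary_blob, load_spirv and free_blob are hypothetical names, chosen only to match the binary_data->data / binary_data->size fields used above.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical container matching binary_data->data / binary_data->size;
   the question does not show the real definition. */
struct binary_blob {
    unsigned char *data;
    size_t size;
};

#define SPIRV_MAGIC         0x07230203u /* first word of every SPIR-V module */
#define SPIRV_MAGIC_SWAPPED 0x03022307u /* same word written big-endian */

/* Reads a whole file into memory. Returns NULL on I/O failure, if the file is
   shorter than the 5-word SPIR-V header, is not a whole number of 32-bit
   words, or does not start with the magic number in either byte order. */
struct binary_blob *load_spirv(const char *path) {
    FILE *f = fopen(path, "rb");
    if (!f) return NULL;
    if (fseek(f, 0, SEEK_END) != 0) {
        fclose(f);
        return NULL;
    }
    long len = ftell(f);
    if (len < 20 || len % 4 != 0 || fseek(f, 0, SEEK_SET) != 0) {
        fclose(f);
        return NULL;
    }
    struct binary_blob *blob = malloc(sizeof *blob);
    unsigned char *data = malloc((size_t)len);
    if (!blob || !data || fread(data, 1, (size_t)len, f) != (size_t)len) {
        free(blob);
        free(data);
        fclose(f);
        return NULL;
    }
    fclose(f);
    /* Assemble the first word as little-endian and compare against both
       byte orders, since SPIR-V uses the magic number to signal endianness. */
    uint32_t word = (uint32_t)data[0] | (uint32_t)data[1] << 8 |
                    (uint32_t)data[2] << 16 | (uint32_t)data[3] << 24;
    if (word != SPIRV_MAGIC && word != SPIRV_MAGIC_SWAPPED) {
        free(blob);
        free(data);
        return NULL;
    }
    blob->data = data;
    blob->size = (size_t)len;
    return blob;
}

void free_blob(struct binary_blob *blob) {
    if (blob) {
        free(blob->data);
        free(blob);
    }
}
```

blob->data and blob->size would then be passed to clCreateProgramWithIL(). A passing magic check only shows that the file is a SPIR-V module; it says nothing about the unresolved get_global_id symbol.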
After running the host code, I receive the following error from the clBuildProgram call:
CL_BUILD_PROGRAM_FAILURE
error: undefined reference to `get_global_id()'
error: backend compiler failed build.
It seems that vadd.spv is not linked against the OpenCL kernel library. Any idea how to achieve this?
In a related question, "How to trap floating-point exceptions on M1 Macs?", someone wanted to know how to make the following code work natively on macOS running on a machine with an M1 processor:
#include <cmath> // for sqrt()
#include <csignal> // for signal()
#include <cstdlib> // for exit()
#include <iostream>
#include <xmmintrin.h> // for _mm_setcsr
void fpe_signal_handler(int /*signal*/) {
std::cerr << "Floating point exception!\n";
exit(1);
}
void enable_floating_point_exceptions() {
_mm_setcsr(_MM_MASK_MASK & ~_MM_MASK_INVALID);
signal(SIGFPE, fpe_signal_handler);
}
int main() {
const double x{-1.0};
std::cout << sqrt(x) << "\n";
enable_floating_point_exceptions();
std::cout << sqrt(x) << "\n";
}
I am looking at this from another angle, and want to understand why it doesn't work using Rosetta 2. I compiled it using the following command:
clang++ -g -std=c++17 -arch x86_64 -o fpe fpe.cpp
When I run it, I see the following output:
nan
nan
Mind you, when I do the same thing on an Intel-based Mac, I see the following output:
nan
Floating point exception!
Does anyone know if it is possible to trap floating-point exceptions on Rosetta 2?
Considering the difference in trapping on Intel using:
_mm_setcsr(_MM_MASK_MASK & ~_MM_MASK_INVALID);
and trapping on Apple Silicon using:
fegetenv(&env);
env.__fpcr = env.__fpcr | __fpcr_trap_invalid;
fesetenv(&env);
it seems likely that this is a bug in Rosetta's implementation.
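For comparison, here is a C sketch that packages both recipes behind one hypothetical function, enable_invalid_trap. Two assumptions to flag: the x86-64 branch does a read-modify-write of MXCSR instead of the question's _MM_MASK_MASK & ~_MM_MASK_INVALID (which also resets the rounding-mode and other control bits), and the Apple Silicon branch relies on the __fpcr field and __fpcr_trap_invalid constant from the macOS arm64 <fenv.h>, exactly as in the snippet above. An x86_64 binary running under Rosetta 2 takes the x86-64 branch, so this does not work around the behavior described in the question.

```c
#include <fenv.h>
#if defined(__x86_64__)
#include <xmmintrin.h> /* _mm_getcsr, _mm_setcsr, _MM_MASK_INVALID */
#endif

/* Hypothetical helper: enables trapping on the IEEE "invalid operation"
   exception (e.g. sqrt(-1.0)) for the calling thread. Returns 0 on success,
   -1 if this sketch has no recipe for the current target. Whether a SIGFPE
   is actually delivered is still up to the OS (or the translator). */
int enable_invalid_trap(void) {
#if defined(__APPLE__) && defined(__aarch64__)
    /* Native Apple Silicon: set the trap-enable bit in FPCR via fenv_t. */
    fenv_t env;
    if (fegetenv(&env) != 0) return -1;
    env.__fpcr |= __fpcr_trap_invalid;
    return fesetenv(&env) == 0 ? 0 : -1;
#elif defined(__x86_64__)
    /* x86-64: clear only the invalid-operation mask bit in MXCSR, keeping the
       rounding mode and the other masks. This affects SSE arithmetic only;
       x87 (long double) has its own control word. */
    _mm_setcsr(_mm_getcsr() & ~_MM_MASK_INVALID);
    return 0;
#else
    return -1;
#endif
}
```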
I am trying to add labels to C source code (for instrumentation). I have only a little experience with assembly, and the compiler is clang. I get strange behavior with __asm__ and labels in case statements!
Here is what I have tried:
// Compiles successfully.
int main()
{
volatile unsigned long long a = 3;
switch(8UL)
{
case 1UL:
//lbl:;
__asm__ ("movb %%gs:%1,%0": "=q" (a): "m" (a));
a++;
}
return 0;
}
and this:
// Compiles successfully.
int main()
{
volatile unsigned long long a = 3;
switch(8UL)
{
case 1UL:
lbl:;
//__asm__ ("movb %%gs:%1,%0": "=q" (a): "m" (a));
a++;
}
return 0;
}
command:
clang -c examples/a.c
examples/a.c:5:14: warning: no case matching constant switch condition '8'
switch(8UL)
^~~
1 warning generated.
But this does not:
// Does not compile.
int main()
{
volatile unsigned long long a = 3;
switch(8UL)
{
case 1UL:
lbl:;
__asm__ ("movb %%gs:%1,%0": "=q" (a): "m" (a));
a++;
}
return 0;
}
the error (printed after the same warning as above):
examples/a.c:9:22: error: invalid operand for instruction
__asm__ ("movb %%gs:%1,%0": "=q" (a): "m" (a));
^
<inline asm>:1:21: note: instantiated into assembly here
movb %gs:-16(%rbp),%rax
^~~~
1 warning and 1 error generated.
I am using:
clang --version
clang version 9.0.0-2 (tags/RELEASE_900/final)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
Importantly, this compiles successfully with gcc:
gcc --version
gcc (Ubuntu 9.2.1-9ubuntu2) 9.2.1 20191008
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
I am working on 64-bit Ubuntu 19.
Any help is appreciated.
EDIT
Based on the accepted answer below:
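The __asm__ statement itself causes the error (mismatched operand sizes).
Without the label, the __asm__ statement is unreachable, because no case matches switch(8UL).
Adding a label before the statement makes it reachable (it could be the target of a goto).
GCC apparently still discards it, so the bad instruction is never assembled.
Clang does not discard it, so the assembler reports the error.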
movb has an 8-bit operand size, but %rax is 64-bit because you used unsigned long long. Either use mov to do a load of the same width as the output variable, or use movzbl %%gs:%1, %k0 to zero-extend to 64 bits: movzbl zero-extends explicitly to 32 bits, and writing the 32-bit low half of the 64-bit register (the k modifier in %k0) implicitly zeroes the upper 32 bits.
I'm surprised GCC doesn't reject that as well; maybe GCC removes it as dead code because of the unreachable case in switch(8). If you look at GCC's asm output (gcc -S), it probably doesn't contain that instruction.
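A minimal sketch of the two fixes from the answer, with hypothetical function names. It leaves out the %gs: segment override from the question so that it is an ordinary load of the variable (the operand-size reasoning is the same with the override), uses the plain "=r" constraint, and falls back to a C load on targets other than x86-64.

```c
/* Zero-extending byte load: movzbl reads 8 bits and writes the 32-bit
   register named by %k0; writing a 32-bit register clears bits 32..63,
   so the full 64-bit output is the zero-extended byte. */
static inline unsigned long long load_byte_zext(const volatile unsigned char *p) {
    unsigned long long out;
#if defined(__x86_64__)
    __asm__("movzbl %1, %k0" : "=r"(out) : "m"(*p));
#else
    out = *p;
#endif
    return out;
}

/* Same-width load: a 64-bit register destination and a 64-bit memory operand,
   so the assembler infers a 64-bit mov and the operand sizes agree. */
static inline unsigned long long load_qword(const volatile unsigned long long *p) {
    unsigned long long out;
#if defined(__x86_64__)
    __asm__("mov %1, %0" : "=r"(out) : "m"(*p));
#else
    out = *p;
#endif
    return out;
}
```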
I want to apply clang optimizations to a source file then generate its AST.
I tried passing -O3 flag but it seems that it is ignored.
For example, take this snippet:
#include <stdio.h>
int main(void) {
int a = 5 + 5;
for (int i = 0; i < 10; i++) { }
printf("%i\n", a);
return 0;
}
Many optimizations can be applied, like removing the for loop or converting 5 + 5 to 10.
When I dump the AST using clang -O3 -Xclang -ast-dump -fsyntax-only a.c I get the same AST without the optimization flag.
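My goal is to create a TranslationUnit with the optimization flags applied.
Optimizations don't produce a different AST: the AST represents the source as written, and optimizations such as folding 5 + 5 or deleting the empty loop run later, on LLVM IR. See whether what you are looking for is the optimized IR (clang -O3 -S -emit-llvm a.c -o a.ll) or the IR dump after each LLVM pass (add -mllvm -print-after-all).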
I want to use Frama-C to slice away the unused variables shown below, but I don't know which command line would remove all of them in one go.
Last login: Thu Nov 9 20:48:42 on ttys000
Recep-MacBook-Pro:~ recepinanir$ cd desktop
Recep-MacBook-Pro:desktop recepinanir$ cat hw.c
#include <stdio.h>
int main()
{
int x= 10;
int y= 24;
int z;
printf("Hello World\n");
return 0;
}
Recep-MacBook-Pro:desktop recepinanir$ clang hw.c
Recep-MacBook-Pro:desktop recepinanir$ ./a.out
Hello World
Recep-MacBook-Pro:desktop recepinanir$ clang -Wall hw.c -o result
hw.c:5:9: warning: unused variable 'x' [-Wunused-variable]
int x= 10;
^
hw.c:6:9: warning: unused variable 'y' [-Wunused-variable]
int y= 24;
^
hw.c:7:9: warning: unused variable 'z' [-Wunused-variable]
int z;
^
3 warnings generated.
Recep-MacBook-Pro:desktop recepinanir$
As mentioned on https://frama-c.com/slicing.html, slicing is always relative to some criterion, and the goal is to produce a program that is smaller than the original one while presenting the same behavior with respect to that criterion. The Slicing plug-in itself gives several ways to build such criteria, but it seems that you are interested in the result of the Sparecode plug-in (https://frama-c.com/sparecode.html): this is a specialized version of slicing where the criterion is the program state at the end of the entry point of your analysis (i.e. main in your case). In other words, Sparecode removes everything that does not contribute to the final result of the code under analysis. In your case, frama-c -sparecode-analysis hw.c gives the following result. Note that the call to printf has been rewritten by the Variadic plug-in, and that its argument is not considered useful for the final state of main. If this is an issue, you'd need to provide more specialized output functions, with an ACSL specification indicating that they have an impact on some global variable.
/* Generated by Frama-C */
#include "stdio.h"
/*# assigns \result, __fc_stdout->__fc_FILE_data;
assigns \result
\from (indirect: __fc_stdout->__fc_FILE_id),
__fc_stdout->__fc_FILE_data;
assigns __fc_stdout->__fc_FILE_data
\from (indirect: __fc_stdout->__fc_FILE_id),
__fc_stdout->__fc_FILE_data;
*/
int printf_va_1(void);
int main(void)
{
int __retres;
printf_va_1();
__retres = 0;
return __retres;
}
Finally, note that in the general case, Slicing (hence Sparecode) gives an overapproximation: it will only remove statements for which it is certain that they have no impact on the criterion.
Hello, I have to parse some LLVM IR for a compiler course, and I am very new to LLVM.
I have clang and LLVM on my computer, and when I compile a simple C program:
#include <stdio.h>
int main(int argc, char *argv[])
{
for (int i = 0; i < 10; i++) {
printf("Stuff!\n");
}
return 0;
}
using the command clang -cc1 test.c -emit-llvm,
I get LLVM IR with what I believe are called implicit blocks:
; <label>:4 ; preds = %9, %0
However, my parser also needs to handle LLVM IR with textual labels:
for.cond: ; preds = %for.inc, %entry
My problem is that I do not know how to generate such IR, and I was hoping someone could show me how.
I tried Google and such, but I couldn't find appropriate information. Thanks in advance.
The accepted answer is no longer valid, nor is it a good way to achieve the stated goal.
In case someone stumbles upon this question, like I did, I'm providing the answer.
clang-8 -S -fno-discard-value-names -emit-llvm test.c
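On the parser side, a minimal C sketch of recognizing a basic-block label in either spelling. ir_block_label is a hypothetical helper; it assumes labels start in column 0 and instructions are indented (as in clang's printed IR), uses LLVM's unquoted identifier characters, and does not handle quoted names such as "my label":. It also accepts a bare 4: in case your LLVM version prints unnamed blocks that way.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* If `line` starts a basic block, copy the block's name (without ':') into
   `out` and return 1; otherwise return 0. Recognized forms:
     "; <label>:4    ; preds = %9, %0"          unnamed block, comment style
     "4:             ; preds = ..."             unnamed block, bare number
     "for.cond:      ; preds = %for.inc, %entry" named block */
int ir_block_label(const char *line, char *out, size_t out_size) {
    const char *start;
    const char *end;
    if (out_size == 0) return 0;
    if (strncmp(line, "; <label>:", 10) == 0) {
        start = line + 10;
        end = start;
        while (isdigit((unsigned char)*end)) end++;
        if (end == start) return 0;
    } else {
        /* Unquoted LLVM names use [-a-zA-Z0-9$._]; a label is such a name
           at column 0 followed immediately by ':'. */
        start = line;
        end = start;
        while (isalnum((unsigned char)*end) || *end == '.' || *end == '_' ||
               *end == '$' || *end == '-') {
            end++;
        }
        if (end == start || *end != ':') return 0;
    }
    size_t len = (size_t)(end - start);
    if (len >= out_size) return 0; /* name does not fit in out */
    memcpy(out, start, len);
    out[len] = '\0';
    return 1;
}
```

Lines such as define ..., declare ..., global variables (@...), metadata (!...) and indented instructions are rejected because they either start with a character outside the name set or have no ':' right after the name.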
Alternatively, use this site with "Show detailed bytecode analysis" checked:
http://ellcc.org/demo/index.cgi