How to build and link to CGLM from Zig with or without SIMD intrinsics - sse

I would like to link against and use the cglm C library. I'm working on Windows without MSVC (so targeting the GNU C ABI) with Zig 0.7.1 and Zig 0.8.0 (master), without any luck.
I have been able to build the CGLM static library from Zig's build.zig, but I have had no luck linking against it from a Zig program: errors about CGLM's SIMD intrinsics optimizations are reported.
const Builder = @import("std").build.Builder;
pub fn build(b: *Builder) void {
// Standard target options allows the person running `zig build` to choose
// what target to build for. Here we do not override the defaults, which
// means any target is allowed, and the default is native. Other options
// for restricting supported target set are available.
const target = b.standardTargetOptions(.{});
// Standard release options allow the person running `zig build` to select
// between Debug, ReleaseSafe, ReleaseFast, and ReleaseSmall.
const mode = b.standardReleaseOptions();
// cglm
const cglmLibBasePath = "vendor/cglm/";
const cglmFiles = &[_][]const u8{
cglmLibBasePath ++ "src/euler.c",
cglmLibBasePath ++ "src/affine.c",
cglmLibBasePath ++ "src/io.c",
cglmLibBasePath ++ "src/quat.c",
cglmLibBasePath ++ "src/cam.c",
cglmLibBasePath ++ "src/vec2.c",
cglmLibBasePath ++ "src/vec3.c",
cglmLibBasePath ++ "src/vec4.c",
cglmLibBasePath ++ "src/mat2.c",
cglmLibBasePath ++ "src/mat3.c",
cglmLibBasePath ++ "src/mat4.c",
cglmLibBasePath ++ "src/plane.c",
cglmLibBasePath ++ "src/frustum.c",
cglmLibBasePath ++ "src/box.c",
cglmLibBasePath ++ "src/project.c",
cglmLibBasePath ++ "src/sphere.c",
cglmLibBasePath ++ "src/ease.c",
cglmLibBasePath ++ "src/curve.c",
cglmLibBasePath ++ "src/bezier.c",
cglmLibBasePath ++ "src/ray.c",
cglmLibBasePath ++ "src/affine2d.c",
};
const cglmLib = b.addStaticLibrary("cglm", null);
cglmLib.setBuildMode(mode);
cglmLib.setTarget(target);
cglmLib.defineCMacro("CGLM_STATIC");
cglmLib.defineCMacro("WIN32");
cglmLib.addIncludeDir(cglmLibBasePath ++ "src/");
for (cglmFiles) |cglmFile| {
cglmLib.addCSourceFile(cglmFile, &[_][]const u8 {
"-std=c11",
"-Wall",
"-Werror",
"-O3",
});
}
cglmLib.linkLibC();
cglmLib.install();
const exe = b.addExecutable("app-zig-cglm", "src/main.zig");
exe.setBuildMode(mode);
exe.setTarget(target);
// cglm
exe.linkLibrary(cglmLib);
exe.addSystemIncludeDir(cglmLibBasePath ++ "include/");
// C and win32
exe.linkLibC();
exe.install();
const run_cmd = exe.run();
run_cmd.step.dependOn(b.getInstallStep());
if (b.args) |args| {
run_cmd.addArgs(args);
}
const run_step = b.step("run", "Run the app");
run_step.dependOn(&run_cmd.step);
}
When I try to build/run the exe with the generated library linked, the following error is reported (similar in both Zig 0.7.1 and 0.8.0 master). Example from Zig 0.7.1:
alagt@LAPTOP-HS5L5VEH MINGW64 /c/Dev/Projects/test-bed/app-zig-cglm (master)
$ zig build -Dtarget=x86_64-windows-gnu
.\zig-cache\o\1bc9e6dc93c2ab6590b8006381a9000e\cimport.zig:4255:26: error: unable to translate function
pub const glm_mul_sse2 = @compileError("unable to translate function"); // C:\Dev\Projects\test-bed\app-zig-cglm\vendor\cglm\include\cglm/simd/sse2/affine.h:17:1
^
.\zig-cache\o\1bc9e6dc93c2ab6590b8006381a9000e\cimport.zig:4264:5: note: referenced here
glm_mul_sse2(m1, m2, dest);
^
.\zig-cache\o\1bc9e6dc93c2ab6590b8006381a9000e\cimport.zig:657:20: error: unable to resolve typedef child type
pub const __m128 = @compileError("unable to resolve typedef child type"); // C:\Dev\Tools\zig-0.7.1\lib\include\xmmintrin.h:17:15
^
.\zig-cache\o\1bc9e6dc93c2ab6590b8006381a9000e\cimport.zig:1255:48: note: referenced here
pub fn _mm_store_ps(arg___p: [*c]f32, arg___a: __m128) callconv(.C) void {
^
.\zig-cache\o\1bc9e6dc93c2ab6590b8006381a9000e\cimport.zig:3557:5: note: referenced here
_mm_store_ps(&dest[@intCast(c_uint, @as(c_int, 0))], _mm_load_ps(&mat[@intCast(c_uint, @as(c_int, 0))]));
^
app-zig-cglm...The following command exited with error code 1:
C:\Dev\Tools\zig-0.7.1\zig.exe build-exe C:\Dev\Projects\test-bed\app-zig-cglm\src\main.zig C:\Dev\Projects\test-bed\app-zig-cglm\zig-cache\o\dfacf78e7514990119de31b967d43202\cglm.lib --library c --cache-dir C:\Dev\Projects\test-bed\app-zig-cglm\zig-cache --global-cache-dir C:\Users\alagt\AppData\Local\zig --name app-zig-cglm -target x86_64-windows-gnu -isystem C:\Dev\Projects\test-bed\app-zig-cglm\vendor\cglm\include --enable-cache
error: the following build command failed with exit code 1:
C:\Dev\Projects\test-bed\app-zig-cglm\zig-cache\o\64a99c599e98cac00a19cffa57c71a68\build.exe C:\Dev\Tools\zig-0.7.1\zig.exe C:\Dev\Projects\test-bed\app-zig-cglm C:\Dev\Projects\test-bed\app-zig-cglm\zig-cache C:\Users\alagt\AppData\Local\zig -Dtarget=x86_64-windows-gnu
The line that cannot be resolved, from xmmintrin.h, is the following:
typedef float __m128 __attribute__((__vector_size__(16), __aligned__(16)));
I would like to know whether C libraries using intrinsics can work as of now, and if not, I would like a way to disable the intrinsics features (SSE/AVX/...) from build.zig so that CGLM can be built and linked without SIMD optimizations.
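C math headers usually decide whether to use SSE intrinsics at compile time, by checking the compiler's predefined feature macros such as __SSE__ and falling back to scalar code when they are absent. The following is a minimal sketch of that pattern, not cglm's actual code; vec4_add and vec4_backend are hypothetical names. With Clang (which Zig uses for C), removing the sse CPU feature should leave __SSE__ undefined, so only the portable loop is compiled.

```c
#include <assert.h>
#include <stddef.h>

/* Pick the SSE path only when the compiler says SSE is available.
 * Removing the target's "sse" feature undefines __SSE__, which
 * silently switches this file to the scalar loop. */
#if defined(__SSE__)
#include <xmmintrin.h>
#define VEC4_USE_SSE 1
#else
#define VEC4_USE_SSE 0
#endif

/* dest = a + b for 4-float vectors (hypothetical helper, not cglm's API). */
void vec4_add(const float a[4], const float b[4], float dest[4]) {
    assert(a != NULL && b != NULL && dest != NULL);
#if VEC4_USE_SSE
    /* Unaligned load/store, so callers need no 16-byte alignment. */
    _mm_storeu_ps(dest, _mm_add_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
#else
    for (size_t i = 0; i < 4; ++i) {
        dest[i] = a[i] + b[i];
    }
#endif
}

/* Reports which path was compiled in; useful when checking build flags. */
const char *vec4_backend(void) {
    return VEC4_USE_SSE ? "sse" : "scalar";
}
```

Both paths produce the same results, so code calling vec4_add does not need to know which one was built.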
Edit (Disabled SIMD features from build.zig)
I have dug a bit into lib/zig/std/build.zig to learn how to disable SIMD features, and finally I was able to disable them with the following code in my build.zig:
// x86 feature names live in std.Target.x86
const x86 = @import("std").Target.x86;
var target = b.standardTargetOptions(.{});
target.cpu_features_sub = x86.featureSet(&[_]x86.Feature{
x86.Feature.avx,
x86.Feature.avx2,
x86.Feature.avx512bf16,
x86.Feature.avx512bitalg,
x86.Feature.avx512bw,
x86.Feature.avx512cd,
x86.Feature.avx512dq,
x86.Feature.avx512er,
x86.Feature.avx512f,
x86.Feature.avx512ifma,
x86.Feature.avx512pf,
x86.Feature.avx512vbmi,
x86.Feature.avx512vbmi2,
x86.Feature.avx512vl,
x86.Feature.avx512vnni,
x86.Feature.avx512vp2intersect,
x86.Feature.avx512vpopcntdq,
x86.Feature.sse,
x86.Feature.sse_unaligned_mem,
x86.Feature.sse2,
x86.Feature.sse3,
x86.Feature.sse4_1,
x86.Feature.sse4_2,
x86.Feature.sse4a,
x86.Feature.ssse3,
});
State:
Now I'm able to build (without SIMD optimizations, which is not ideal) and link the CGLM lib from Zig code, but a really concerning problem arises: code calling the standard C tanf function computes badly wrong results.
Here is the Zig code generated for the glm_perspective function:
pub fn glm_perspective(arg_fovy: f32, arg_aspect: f32, arg_nearVal: f32, arg_farVal: f32, arg_dest: [*c]vec4) callconv(.C) void {
var fovy = arg_fovy;
var aspect = arg_aspect;
var nearVal = arg_nearVal;
var farVal = arg_farVal;
var dest = arg_dest;
var f: f32 = undefined;
var @"fn": f32 = undefined;
glm_mat4_zero(dest);
f = (1 / tanf((fovy * 0.5)));
@"fn" = (1 / (nearVal - farVal));
dest[@intCast(c_uint, @as(c_int, 0))][@intCast(c_uint, @as(c_int, 0))] = (f / aspect);
dest[@intCast(c_uint, @as(c_int, 1))][@intCast(c_uint, @as(c_int, 1))] = f;
dest[@intCast(c_uint, @as(c_int, 2))][@intCast(c_uint, @as(c_int, 2))] = ((nearVal + farVal) * @"fn");
dest[@intCast(c_uint, @as(c_int, 2))][@intCast(c_uint, @as(c_int, 3))] = -1;
dest[@intCast(c_uint, @as(c_int, 3))][@intCast(c_uint, @as(c_int, 2))] = (((2 * nearVal) * farVal) * @"fn");
}
Executing the line f = (1 / tanf((fovy * 0.5))); with arg_fovy = 0.785398185 (45 deg in radians) returns 134217728, which is totally wrong. The correct result should be approx. 2.4142134.
VSCode's cppvsdbg does not allow me to step into tanf to understand what could be wrong with its implementation.
Edit (building and compiling min C code with Zig compiler):
I have compiled a minimal C case using tanf from math.h and built it with the Zig build mechanism.
build.zig
const Builder = @import("std").build.Builder;
const x86 = @import("std").Target.x86;
pub fn build(b: *Builder) void {
// Standard target options allows the person running `zig build` to choose
// what target to build for. Here we do not override the defaults, which
// means any target is allowed, and the default is native. Other options
// for restricting supported target set are available.
var target = b.standardTargetOptions(.{});
target.cpu_features_sub = x86.featureSet(&[_]x86.Feature{
x86.Feature.avx,
x86.Feature.avx2,
x86.Feature.avx512bf16,
x86.Feature.avx512bitalg,
x86.Feature.avx512bw,
x86.Feature.avx512cd,
x86.Feature.avx512dq,
x86.Feature.avx512er,
x86.Feature.avx512f,
x86.Feature.avx512ifma,
x86.Feature.avx512pf,
x86.Feature.avx512vbmi,
x86.Feature.avx512vbmi2,
x86.Feature.avx512vl,
x86.Feature.avx512vnni,
x86.Feature.avx512vp2intersect,
x86.Feature.avx512vpopcntdq,
x86.Feature.sse,
x86.Feature.sse_unaligned_mem,
x86.Feature.sse2,
x86.Feature.sse3,
x86.Feature.sse4_1,
x86.Feature.sse4_2,
x86.Feature.sse4a,
x86.Feature.ssse3,
});
// Standard release options allow the person running `zig build` to select
// between Debug, ReleaseSafe, ReleaseFast, and ReleaseSmall.
const mode = b.standardReleaseOptions();
const exeC = b.addExecutable("app-c", "src/main.c");
exeC.setBuildMode(mode);
exeC.setTarget(target);
// C and win32
exeC.linkLibC();
exeC.install();
const run_cmd = exeC.run();
run_cmd.step.dependOn(b.getInstallStep());
if (b.args) |args| {
run_cmd.addArgs(args);
}
const run_step = b.step("run", "Run the app");
run_step.dependOn(&run_cmd.step);
}
src/main.c
#include <stdio.h>
#include <math.h>
int main(int argc, char **argv) {
float fovDeg = (argc == 1 ? 45.0f : 50.0f);
while (1) {
float fovRad = fovDeg * (M_PI / 180.0f);
float tanFov = tanf(fovRad);
fprintf(stdout, "fovDeg=%f, fovRad=%f, tanFov=%f\n", fovDeg, fovRad, tanFov);
}
return 0;
}
If I build without the SIMD exclusions, or just set the target to native, the code executes correctly with the expected results.
$ zig build -Dtarget=native-windows-gnu run -- 1
fovDeg=50.000000, fovRad=0.872665, tanFov=1.191754
If I build with the SIMD exclusions and execute it, tanf returns 0:
$ zig build -Dtarget=x86_64-windows-gnu run -- 1
fovDeg=50.000000, fovRad=0.872665, tanFov=0.000000
Current status
It seems to me that:
Zig's mingw64 or msvcrt does not support basic math functions and has floating-point errors with the x86_64 SIMD features disabled.
Zig's mingw64 does not support SIMD C code.
CGLM can't be compiled by Zig in a workable manner.
If anybody can shed some light on what I might be missing, it would be appreciated.
Thanks in advance.

Perhaps I misunderstand your problem, but this worked for me with no errors. Add to your build.zig:
exe.addIncludeDir("path_to_your_cglm/cglm/include");
exe.addLibPath("path_to_your_cglm/cglm/win/Release");
exe.linkSystemLibrary("cglm");
Also, the Discord and Reddit Zig communities are very active, so consider posting there. Have a look at this page: https://github.com/ziglang/zig/wiki/Community.
There is also the Zig Forum.

Related

triSYCL throws non_cl_error when trisycl::device::~device is called

I'm trying to run a parallel for loop with triSYCL. This is my code:
#define TRISYCL_OPENCL
#define OMP_NUM_THREADS 8
#define BOOST_COMPUTE_USE_CPP11
// standard libraries
#include <iostream>
#include <functional>
//deps
#include "CL/sycl.hpp"
struct Color
{
float r, g, b, a;
friend std::ostream& operator<<(std::ostream& os, const Color& c)
{
os << "(" << c.r << ", " << c.g << ", " << c.b << ", " << c.a << ")";
return os;
}
};
struct Vertex
{
float x, y;
Color color;
friend std::ostream& operator<<(std::ostream& os, const Vertex& v)
{
os << "x: " << v.x << ", y: " << v.y << ", color: " << v.color;
return os;
}
};
template<typename T>
T mapNumber(T x, T a, T b, T c, T d)
{
return (x - a) / (b - a) * (d - c) + c;
}
int windowWidth = 640;
int windowHeight = 720;
int main()
{
auto exception_handler = [](cl::sycl::exception_list exceptions) {
for (std::exception_ptr const& e : exceptions)
{
try
{
std::rethrow_exception(e);
} catch (cl::sycl::exception const& e)
{
std::cout << "Caught asynchronous SYCL exception: " << e.what() << std::endl;
}
}
};
cl::sycl::default_selector defaultSelector;
cl::sycl::context context(defaultSelector, exception_handler);
cl::sycl::queue queue(context, defaultSelector, exception_handler);
auto* pixelColors = new Color[windowWidth * windowHeight];
{
cl::sycl::buffer<Color, 2> color_buffer(pixelColors, cl::sycl::range<2>{(unsigned long) windowWidth,
(unsigned long) windowHeight});
cl::sycl::buffer<int, 1> b_windowWidth(&windowWidth, cl::sycl::range<1>{1});
cl::sycl::buffer<int, 1> b_windowHeight(&windowHeight, cl::sycl::range<1>{1});
queue.submit([&](cl::sycl::handler& cgh) {
auto color_buffer_acc = color_buffer.get_access<cl::sycl::access::mode::write>(cgh);
auto width_buffer_acc = b_windowWidth.get_access<cl::sycl::access::mode::read>(cgh);
auto height_buffer_acc = b_windowHeight.get_access<cl::sycl::access::mode::read>(cgh);
cgh.parallel_for<class init_pixelColors>(
cl::sycl::range<2>((unsigned long) width_buffer_acc[0], (unsigned long) height_buffer_acc[0]),
[=](cl::sycl::id<2> index) {
color_buffer_acc[index[0]][index[1]] = {
mapNumber<float>(index[0], 0.f, width_buffer_acc[0], 0.f, 1.f),
mapNumber<float>(index[1], 0.f, height_buffer_acc[0], 0.f, 1.f),
0.f,
1.f};
});
});
std::cout << "cl::sycl::queue check - selected device: "
<< queue.get_device().get_info<cl::sycl::info::device::name>() << std::endl;
}//here the error appears
delete[] pixelColors;
return 0;
}
I'm building it with this CMakeLists.txt file:
cmake_minimum_required(VERSION 3.16.2)
project(acMandelbrotSet_stackoverflow)
set(CMAKE_CXX_STANDARD 17)
set(SRC_FILES
path/to/main.cpp
)
find_package(OpenCL REQUIRED)
set(Boost_INCLUDE_DIR path/to/boost)
include_directories(${Boost_INCLUDE_DIR})
include_directories(path/to/SYCL/include)
set(LIBS PRIVATE ${Boost_LIBRARIES} OpenCL::OpenCL)
add_executable(${PROJECT_NAME} ${SRC_FILES})
set_target_properties(${PROJECT_NAME} PROPERTIES DEBUG_POSTFIX _d)
target_link_libraries(${PROJECT_NAME} ${LIBS})
When I try to run it, I get this message: libc++abi.dylib: terminating with uncaught exception of type trisycl::non_cl_error from path/to/SYCL/include/triSYCL/command_group/detail/task.hpp line: 278 function: trisycl::detail::task::get_kernel, the message was: "Cannot use an OpenCL kernel in this context".
I've tried to create a lambda of mapNumber in the kernel but that didn't make any difference. I've also tried to use this before the end of the scope to catch errors:
try
{
queue.wait_and_throw();
} catch (cl::sycl::exception const& e)
{
std::cout << "Caught synchronous SYCL exception: " << e.what() << std::endl;
}
but nothing was printed to the console except the error from before. And I've also tried to make an event of the queue.submit call and then call event.wait() before the end of the scope but again the exact same output.
Does any body have an idea what else I could try?
The problem is that triSYCL is a research project that looks deeper at some aspects of SYCL without providing general-purpose SYCL support for end users. I have just clarified this in the project's README. :-(
Probably the problem here is that the OpenCL SPIR kernel has not been generated.
So you first need to compile the specific (old) Clang & LLVM from triSYCL: https://github.com/triSYCL/triSYCL/blob/master/doc/architecture.rst#trisycl-architecture-for-accelerator. Unfortunately, there is no simple Clang driver that uses all the specific Clang & LLVM pieces to generate the kernels from the SYCL source. Right now it is done with some ad-hoc, awful Makefiles (look around https://github.com/triSYCL/triSYCL/blob/master/tests/Makefile#L360) and, even if you survive this, you might encounter some bugs...
The good news is that there are now several other implementations of SYCL that are much easier to use, more complete, and less buggy! :-) Look at ComputeCpp, DPC++ and hipSYCL, for example.

Why does returning an element of a copied Matrix3d result in incorrect output when using Clang 3.9?

Compiling the following example with -O2 on Clang 3.9 results in the reproFunction returning garbage (1.9038e+185) when called in main:
Code
#include <Eigen/Dense>
#include <iostream>
double reproFunction(const Eigen::Matrix3d& R_in)
{
const Eigen::Matrix3d R = R_in;
Eigen::Matrix3d Q = R.cwiseAbs();
if(R(1,2) < 2) {
Eigen::Vector3d n{0, 1, R(1, 2)};
double s2 = R(1,2);
s2 /= n.norm();
}
return R(1, 2);
}
int main() {
Eigen::Matrix3d R;
R = Eigen::Matrix3d::Zero(3,3);
// This fails: reproFunction(R) does not return 0.7
R(1, 2) = 0.7;
double R12 = reproFunction(R);
bool are_they_equal = (R12 == R(1,2));
std::cout << "R12 == R(1,2): " << are_they_equal << std::endl;
std::cout << "R12: " << R12 << std::endl;
std::cout << "R(1, 2): " << R(1, 2) << std::endl;
}
Output
R12 == R(1,2): 0
R12: 1.9036e+185
R(1, 2): 0.7
reproFunction initializes R (which is const) from R_in and returns R(1, 2). Between the initialization and the return, reproFunction uses R in several operations, but none of them should be able to change R. Removing any of those operations results in reproFunction returning the correct value.
This behavior does not appear in any of the following cases:
The program is compiled with Clang 3.5, Clang 4.0, or g++ 5.4.
The optimization level is -O1 or lower
Eigen 3.2.10 is used instead of Eigen 3.3.3
Now the question: Is this behavior due to a bug I've missed in the code above, a bug in Eigen 3.3.3, or a bug in Clang 3.9?
A self-contained reproduction example can be found at https://github.com/avalenzu/eigen-clang-weirdness.
I could reproduce this with clang 3.9, but not with clang 3.8. I bisected the issue on Eigen's side to this commit from 2016-05-24 21:54:
Bug 256: enable vectorization with unaligned loads/stores. This concerns all architectures and all sizes. This new behavior can be disabled by defining EIGEN_UNALIGNED_VECTORIZE=0
That commit enables vectorized operations on unaligned data.
I still think this is a bug in clang, but you can work around it by compiling with
-D EIGEN_UNALIGNED_VECTORIZE=0
Also, Eigen could be 'fixed' by automatically disabling this feature if clang 3.9 is detected as compiler.

Boxed Fn requires lifetime 'static only when testing?

Using rustc 1.10.0, I'm trying to write some code that passes around boxed closures; the eventual goal is to procedurally generate an animation of fractals. Right now I have some function signatures like this:
pub fn interpolate_rectilinear(width: u32, height: u32, mut min_x: f64, mut max_x: f64, mut min_y: f64, mut max_y: f64)
-> Box<Fn(u32, u32) -> Complex64 + Send + Sync + 'static> { ... }
pub fn interpolate_stretch(width: u32, height: u32, mut min_x: f64, mut max_x: f64, mut min_y: f64, mut max_y: f64)
-> Box<Fn(u32, u32) -> Complex64 + Send + Sync + 'static> { ... }
pub fn parallel_image<F>(width: u32, height: u32, function: &F, interpolate: &Box<Fn(u32, u32) -> Complex64 + Send + Sync>, threshold: f64)
-> ImageBuffer<image::Luma<u8>, Vec<u8>>
where F: Sync + Fn(Complex64) -> Complex64
{ ... }
pub fn sequential_image<F>(width: u32, height: u32, function: &F, interpolate: &Box<Fn(u32, u32) -> Complex64>, threshold: f64)
-> ImageBuffer<image::Luma<u8>, Vec<u8>>
where F: Fn(Complex64) -> Complex64
{ ... }
Running this code for one image at a time in a binary works without problems:
let interpolate = interpolate_rectilinear(width, height, -1.0, 1.0, -1.0, 1.0);
let image = parallel_image(width * 2, height * 2, &default_julia, &interpolate, 2.0);
However, I wanted to ensure my serial and parallel image-production were both producing the same results, so I wrote the following test function:
#[test]
fn test_serial_parallel_agree() {
let (width, height) = (200, 200);
let threshold = 2.0;
let interpolate = interpolate_stretch(width, height, -1.0, 1.0, -1.0, 1.0);
assert!(parallel_image(width, height, &default_julia, &interpolate, threshold)
.pixels()
.zip(sequential_image(width, height, &default_julia, &interpolate, threshold)
.pixels())
.all(|(p, s)| p == s));
}
This refuses to compile, and I just can't figure it out. The error it gives is as follows:
> cargo test
Compiling julia-set v0.3.0
src/lib.rs:231:66: 231:78 error: mismatched types [E0308]
src/lib.rs:231 .zip(sequential_image(width, height, &default_julia, &interpolate, threshold)
^~~~~~~~~~~~
src/lib.rs:229:9: 233:36 note: in this expansion of assert! (defined in <std macros>)
src/lib.rs:231:66: 231:78 help: run `rustc --explain E0308` to see a detailed explanation
src/lib.rs:231:66: 231:78 note: expected type `&Box<std::ops::Fn(u32, u32) -> num::Complex<f64> + 'static>`
src/lib.rs:231:66: 231:78 note: found type `&Box<std::ops::Fn(u32, u32) -> num::Complex<f64> + Send + Sync>`
error: aborting due to previous error
Build failed, waiting for other jobs to finish...
error: Could not compile `julia-set`.
I really don't know what's going on there. I don't know why I'm required to manually mark Send and Sync in the boxed return types of the interpolation functions, when the compiler typically derives those traits automatically. Still, I just kept adding in markers that the compiler suggested until things worked.
The real problem is that, while I think I have a pretty good guess why you can't just mark a boxed closure 'static, I don't know what's requiring that lifetime in this case or how to fix it.
I did guess that the issue might be that I was trying to reference the closure from two read-borrows at once (which should be OK, but I was desperate); at any rate, wrapping interpolate in an Rc gives exactly the same error, so that wasn't the problem.
The problem is actually here:
pub fn sequential_image<F>(
...,
interpolate: &Box<Fn(u32, u32) -> Complex64>,
...) -> ...
The interpolate parameter doesn't accept a &Box<Fn(u32, u32) -> Complex64 + Send + Sync>, and Rust is pretty bad at handling variance through all of this complexity.
One solution is to do the cast where it's called:
sequential_image(width, height, &default_julia,
&(interpolate as Box<Fn(u32, u32) -> Complex64>),
threshold)
but this requires a by-value cast of the box at the sequential_image call site and is pretty damn ugly.
A nicer way is to change the parameter of sequential_image to something both more general and easier for the compiler to reason about: a plain trait-object reference.
pub fn sequential_image<F>(
...,
interpolate: &Fn(u32, u32) -> Complex64,
...) -> ...
Now you can call it with just
sequential_image(width, height, &default_julia,
&*interpolate,
threshold)
and the compiler can do all of the variance magic itself.

The performance of Erlang numeric calculation

C version:
#include <stdio.h>
#include <stdlib.h>
unsigned int test(unsigned int n_count) {
unsigned int c = 1;
unsigned int i;
for (i=0; i< n_count;i++) {
c += 2 * 34 + 1;
c /= 2;
c *= 39;
}
return c;
}
int main(int argc, char* argv[])
{
printf("%u\n", test(atoi(argv[1])));
}
Result:
$ gcc p2.c
$ time ./a.out 100000000
563970997
real 0m0.865s
user 0m0.864s
sys 0m0.004s
Erlang version:
-module(test2).
-export([main/1]).
-mode(compile).
calc(Cnt, Total) when Cnt > 0 ->
if Total >= 4294967296 -> Total2 = Total rem 4294967296;
true -> Total2 = Total end,
calc(Cnt - 1, trunc((Total2 + 2 * 34 + 1) / 2) * 39);
calc(0, Total)->
if Total >= 4294967296 -> Total2 = Total rem 4294967296;
true -> Total2 = Total end,
io:format("~p ~n", [Total2]),
ok.
main([A])->
Cnt = list_to_integer(A),
calc(Cnt, 1).
Result:
$ erlc +native +"{hipe, [to_llvm]}" test2.erl
$ time escript test2.beam 100000000
563970997
real 0m4.940s
user 0m4.892s
sys 0m0.056s
$ erlc +native test2.erl
$ time escript test2.beam 100000000
563970997
real 0m5.381s
user 0m5.320s
sys 0m0.064s
$ erlc test2.erl
$ time escript test2.beam 100000000
563970997
real 0m9.868s
user 0m9.808s
sys 0m0.056s
How can I improve the performance of the Erlang version?
In Erlang I have to simulate the integer overflow case; is there a better way?
And even with HiPE, the performance is far from C.
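In C, unsigned int arithmetic wraps modulo 2^32 on its own; simulating it means reducing after each step that can overflow, and for non-negative values x & 0xFFFFFFFF equals x rem 4294967296. The C sketch below (test_wrapping and test_masked are hypothetical names) checks that a masked version on a wider integer matches the original loop. In Erlang the mask would be written X band 16#FFFFFFFF; whether that is faster than rem under HiPE has not been measured here.

```c
#include <assert.h>
#include <stdint.h>

#define U32_MASK 0xFFFFFFFFu

/* The original loop: uint32_t arithmetic wraps modulo 2^32 automatically. */
uint32_t test_wrapping(uint32_t n_count) {
    uint32_t c = 1;
    for (uint32_t i = 0; i < n_count; i++) {
        c += 2 * 34 + 1;
        c /= 2;
        c *= 39;
    }
    return c;
}

/* The same loop on a 64-bit integer, reducing modulo 2^32 with a bit mask
 * after every step that can exceed 32 bits. */
uint32_t test_masked(uint32_t n_count) {
    uint64_t c = 1;
    for (uint32_t i = 0; i < n_count; i++) {
        c = (c + 2 * 34 + 1) & U32_MASK;
        c /= 2;
        c = (c * 39) & U32_MASK;
    }
    assert(c <= U32_MASK);
    return (uint32_t)c;
}
```

Note that the Erlang version above only reduces after the multiplication, so it skips the wrap of the addition step; the masked C version reduces after both.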
Edit:
Python version:
def test(n_count):
c = 1
for i in xrange(n_count):
c += 2 * 34 + 1
c /= 2
c *= 39
if c >= 4294967296:
c = c % 4294967296
return c
print test(100000000)
Result:
$ time python p2.py
563970997
real 0m17.813s
user 0m17.808s
sys 0m0.008s
$ time pypy p2.py
563970997
real 0m1.852s
user 0m0.508s
sys 0m0.128s
I think the following link may be especially helpful; it shows how to 'bake' your C code into your Erlang application:
http://www.erlang.org/doc/tutorial/c_port.html
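For reference, the C side of a port is a loop over stdin and stdout. This sketch assumes the Erlang side opens it with open_port({spawn, "./p2port"}, [{packet, 2}, binary]), so each message carries a 2-byte big-endian length header; the program name, serve, test_loop, and the 4-byte request/reply format are all assumptions made for this example. A real port program would call serve(stdin, stdout) from main, and the Erlang side would send <<N:32>> and receive the result as a 4-byte binary.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* The hot loop from the question; uint32_t arithmetic wraps mod 2^32. */
static uint32_t test_loop(uint32_t n_count) {
    uint32_t c = 1;
    for (uint32_t i = 0; i < n_count; i++) {
        c += 2 * 34 + 1;
        c /= 2;
        c *= 39;
    }
    return c;
}

/* Reads exactly len bytes; returns 1 on success, 0 on EOF or error. */
static int read_exact(FILE *in, unsigned char *buf, size_t len) {
    return fread(buf, 1, len, in) == len;
}

/* Serves {packet, 2} requests until EOF: each request is a 4-byte
 * big-endian n_count, each reply is the 4-byte big-endian result.
 * Returns the number of requests answered. */
int serve(FILE *in, FILE *out) {
    assert(in != NULL && out != NULL);
    unsigned char hdr[2], req[4], reply[6];
    int served = 0;
    while (read_exact(in, hdr, sizeof hdr)) {
        size_t len = ((size_t)hdr[0] << 8) | hdr[1];
        if (len != sizeof req || !read_exact(in, req, len)) {
            break; /* unexpected message size: stop serving */
        }
        uint32_t n = ((uint32_t)req[0] << 24) | ((uint32_t)req[1] << 16) |
                     ((uint32_t)req[2] << 8) | (uint32_t)req[3];
        uint32_t r = test_loop(n);
        reply[0] = 0;
        reply[1] = 4; /* length header: 4 payload bytes */
        reply[2] = (unsigned char)(r >> 24);
        reply[3] = (unsigned char)(r >> 16);
        reply[4] = (unsigned char)(r >> 8);
        reply[5] = (unsigned char)r;
        if (fwrite(reply, 1, sizeof reply, out) != sizeof reply) {
            break;
        }
        fflush(out); /* the Erlang side waits for the whole packet */
        ++served;
    }
    return served;
}
```

Since one request runs the whole 100000000-iteration loop, the per-message port overhead is negligible for this benchmark.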
Erlang is really not good at number-crunching tasks. It is good if you want to take bytes and send them.
A typical serious Erlang development cycle includes a final optimization phase, in which you rewrite some bottleneck modules in native code.
Yes, Erlang can look like a good calculator (and projects like Wings3D show that), but maybe you should choose another tool?

Fortran preprocessing with Portland compiler

I am trying to pre-process a Fortran module (pmu.F90) with pgf90. The module is as follows:
module pmu
module variables
contains
include 'file.F90'
end module
file.F90 is a subroutine which contains the following lines:
#ifdef PART
startm1 = xstart - 1
startm2 = xstart - 2
endp1 = xend + 1
endp2 = xend + 2
#else
startm1 = xstart - 1
startm2 = xstart - 1
endp1 = xend + 1
endp2 = xend + 1
#endif
If I compile with:
pgf90 -DPART -Mfree -Mbounds -Msave -Mdclchk -r8 -Mpreprocess -I/data/users/mrosso/fftw3/include -c pmu.F90
I get
PGF90-S-0021-Label field of continuation line is not blank.
Well, the included file contains no procedures, which is what you need between the "contains" and "end module" statements in the module pmu file.
Another issue with using CPP with Fortran is that the Fortran include statement is not the same as the CPP #include. In particular, their interaction is not specified. That is, if you're including a file which itself contains CPP directives, it's one less thing that can go wrong if you use #include instead.
