Implicit vector conversion in ImGui (ImVec <--> glm::vec)

I am trying to get implicit conversion working between ImGui's vector types (ImVec) and glm's (glm::vec).
I read here that I have to change the following lines in the imconfig.h file:
#define IM_VEC2_CLASS_EXTRA \
constexpr ImVec2(const MyVec2& f) : x(f.x), y(f.y) {} \
operator MyVec2() const { return MyVec2(x,y); }
#define IM_VEC4_CLASS_EXTRA \
constexpr ImVec4(const MyVec4& f) : x(f.x), y(f.y), z(f.z), w(f.w) {} \
operator MyVec4() const { return MyVec4(x,y,z,w); }
The first line makes sense to me, but I don't see the point of the second, which seems to make a new constructor for MyVec.
Since I really have no idea what is going on here, I just tried replacing MyVecN with either glm::vecN or vecN, but neither works.
I also don't get why there are backslashes; I guess they're there to comment something out? Either way, I removed them, and it still didn't work.
The compiler ends up throwing tons of errors, so I can't tell where the problem is.

You have to define (or include) your struct before including imgui:
// define glm::vecN or include it from another file
namespace glm
{
struct vec2
{
float x, y;
vec2(float x, float y) : x(x), y(y) {};
};
struct vec4
{
float x, y, z, w;
vec4(float x, float y, float z, float w) : x(x), y(y), z(z), w(w) {};
};
}
// define extra conversion here before including imgui, don't do it in the imconfig.h
#define IM_VEC2_CLASS_EXTRA \
constexpr ImVec2(const glm::vec2& f) : x(f.x), y(f.y) {} \
operator glm::vec2() const { return glm::vec2(x, y); }
#define IM_VEC4_CLASS_EXTRA \
constexpr ImVec4(const glm::vec4& f) : x(f.x), y(f.y), z(f.z), w(f.w) {} \
operator glm::vec4() const { return glm::vec4(x,y,z,w); }
#include "imgui/imgui.h"
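The backslashes (\) are line continuations: each one joins the next line onto the #define, so the whole macro is a single definition. Removing them ends the macro at its first line and leaves the remaining lines as stray code.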
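To see the mechanism in isolation, here is a minimal sketch that does not use ImGui or glm at all. MyVec2, DemoVec2 and DEMO_VEC2_CLASS_EXTRA are hypothetical stand-ins for glm::vec2, ImVec2 and IM_VEC2_CLASS_EXTRA. It also shows that the second macro line is not a constructor for MyVec2: it is a conversion operator, a member of the ImGui-side type that lets it turn itself into a MyVec2.

```cpp
#include <cassert>

// Hypothetical stand-in for a user math type such as glm::vec2.
struct MyVec2 {
    float x, y;
    constexpr MyVec2(float x_, float y_) : x(x_), y(y_) {}
};

// Each trailing backslash continues the macro onto the next line,
// so both members below belong to one definition.
#define DEMO_VEC2_CLASS_EXTRA \
    constexpr DemoVec2(const MyVec2& f) : x(f.x), y(f.y) {} \
    operator MyVec2() const { return MyVec2(x, y); }

// Hypothetical stand-in for ImVec2: the macro is expanded inside the class body.
struct DemoVec2 {
    float x, y;
    constexpr DemoVec2(float x_, float y_) : x(x_), y(y_) {}
    DEMO_VEC2_CLASS_EXTRA
};

// Takes the ImGui-side type; a MyVec2 argument is converted by the constructor.
inline float sum_xy(const DemoVec2& v) { return v.x + v.y; }

// MyVec2 -> DemoVec2 uses the constructor, DemoVec2 -> MyVec2 uses the operator.
inline MyVec2 round_trip(const MyVec2& in) {
    DemoVec2 d = in;
    return d;
}
```

Calling sum_xy(MyVec2(1.0f, 2.0f)) compiles only because of the first macro line, and the return statement in round_trip compiles only because of the second.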

Related

Synchronizing Statically Allocated Struct Instances between CPU and GPU

I have a struct that contains an array, and I want to copy the contents from an instance of that struct in CPU memory to another instance in GPU memory.
My question is similar to this one. There are two big differences between this question and the one from the link:
I'm not using an array of structs. I just need one.
All instances of the struct are statically allocated.
In an attempt to answer my own question, I tried modifying the code in the answer as follows:
#include <stdio.h>
#include <stdlib.h>
#define cudaCheckError() { \
cudaError_t err = cudaGetLastError(); \
if(err != cudaSuccess) { \
printf("Cuda error: %s:%d: %s\n", __FILE__, __LINE__, cudaGetErrorString(err)); \
exit(1); \
} \
}
struct Test {
char array[5];
};
__global__ void kernel(Test *dev_test) {
for(int i=0; i < 5; i++) {
printf("Kernel[0][i]: %c \n", dev_test[0].array[i]);
}
}
__device__ Test dev_test; //dev_test is now global, statically allocated, and one instance of the struct
int main(void) {
int size = 5;
Test test; //test is now statically allocated and one instance of the struct
char temp[] = { 'a', 'b', 'c', 'd' , 'e' };
memcpy(test.array, temp, size * sizeof(char));
cudaCheckError();
cudaMemcpy(&dev_test, &test, sizeof(Test), cudaMemcpyHostToDevice);
cudaCheckError();
kernel<<<1, 1>>>(&dev_test);
cudaCheckError();
cudaDeviceSynchronize();
cudaCheckError();
// memory free
return 0;
}
But this code throws a runtime error:
Cuda error: HelloCUDA.cu:34: invalid argument
Is there a way to copy test into dev_test?
When using a statically allocated __device__ variable:
We don't use the cudaMemcpy API. We use the cudaMemcpyToSymbol (or cudaMemcpyFromSymbol) API instead.
We don't pass __device__ variables as kernel arguments. They are at global scope. You just use them in your kernel code.
The following code has these issues addressed:
$ cat t10.cu
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define cudaCheckError() { \
cudaError_t err = cudaGetLastError(); \
if(err != cudaSuccess) { \
printf("Cuda error: %s:%d: %s\n", __FILE__, __LINE__, cudaGetErrorString(err)); \
exit(1); \
} \
}
struct Test {
char array[5];
};
__device__ Test dev_test; //dev_test is now global, statically allocated, and one instance of the struct
__global__ void kernel() {
for(int i=0; i < 5; i++) {
printf("Kernel[0][i]: %c \n", dev_test.array[i]);
}
}
int main(void) {
int size = 5;
Test test; //test is now statically allocated and one instance of the struct
char temp[] = { 'a', 'b', 'c', 'd' , 'e' };
memcpy(test.array, temp, size * sizeof(char));
cudaCheckError();
cudaMemcpyToSymbol(dev_test, &test, sizeof(Test));
cudaCheckError();
kernel<<<1, 1>>>();
cudaCheckError();
cudaDeviceSynchronize();
cudaCheckError();
// memory free
return 0;
}
$ nvcc -o t10 t10.cu
$ cuda-memcheck ./t10
========= CUDA-MEMCHECK
Kernel[0][i]: a
Kernel[0][i]: b
Kernel[0][i]: c
Kernel[0][i]: d
Kernel[0][i]: e
========= ERROR SUMMARY: 0 errors
$
(your array usage in kernel code also didn't make sense. dev_test is not an array, therefore you cannot index into it: dev_test[0]....)

How to sum all 32-bit or 64-bit sub-registers in an SSE XMM, or AVX YMM, and ZMM register?

Say your task results in a subtotal in each floating-point subregister. I'm not seeing an instruction that would sum the subtotals down to one floating-point total. Do I need to store the MM register in plain old memory then do the sum with simple instructions?
(It's unresolved whether these will be double or single-precision, and I plan on coding for every CPU variation up to the forthcoming (?) 512-bit AVX version if I can find the opcodes.)
wget http://www.agner.org/optimize/vectorclass.zip
unzip vectorclass.zip -d vectorclass
cd vectorclass/
This code is GPLv3.
SSE
grep -A11 horizontal_add vectorf128.h
static inline float horizontal_add (Vec4f const & a) {
#if INSTRSET >= 3 // SSE3
__m128 t1 = _mm_hadd_ps(a,a);
__m128 t2 = _mm_hadd_ps(t1,t1);
return _mm_cvtss_f32(t2);
#else
__m128 t1 = _mm_movehl_ps(a,a);
__m128 t2 = _mm_add_ps(a,t1);
__m128 t3 = _mm_shuffle_ps(t2,t2,1);
__m128 t4 = _mm_add_ss(t2,t3);
return _mm_cvtss_f32(t4);
#endif
--
static inline double horizontal_add (Vec2d const & a) {
#if INSTRSET >= 3 // SSE3
__m128d t1 = _mm_hadd_pd(a,a);
return _mm_cvtsd_f64(t1);
#else
__m128 t0 = _mm_castpd_ps(a);
__m128d t1 = _mm_castps_pd(_mm_movehl_ps(t0,t0));
__m128d t2 = _mm_add_sd(a,t1);
return _mm_cvtsd_f64(t2);
#endif
}
AVX
grep -A6 horizontal_add vectorf256.h
static inline float horizontal_add (Vec8f const & a) {
__m256 t1 = _mm256_hadd_ps(a,a);
__m256 t2 = _mm256_hadd_ps(t1,t1);
__m128 t3 = _mm256_extractf128_ps(t2,1);
__m128 t4 = _mm_add_ss(_mm256_castps256_ps128(t2),t3);
return _mm_cvtss_f32(t4);
}
--
static inline double horizontal_add (Vec4d const & a) {
__m256d t1 = _mm256_hadd_pd(a,a);
__m128d t2 = _mm256_extractf128_pd(t1,1);
__m128d t3 = _mm_add_sd(_mm256_castpd256_pd128(t1),t2);
return _mm_cvtsd_f64(t3);
}
AVX512
grep -A3 horizontal_add vectorf512.h
static inline float horizontal_add (Vec16f const & a) {
#if defined(__INTEL_COMPILER)
return _mm512_reduce_add_ps(a);
#else
return horizontal_add(a.get_low() + a.get_high());
#endif
}
--
static inline double horizontal_add (Vec8d const & a) {
#if defined(__INTEL_COMPILER)
return _mm512_reduce_add_pd(a);
#else
return horizontal_add(a.get_low() + a.get_high());
#endif
}
get_high() and get_low()
Vec8f get_high() const {
return _mm256_castpd_ps(_mm512_extractf64x4_pd(_mm512_castps_pd(zmm),1));
}
Vec8f get_low() const {
return _mm512_castps512_ps256(zmm);
}
Vec4d get_low() const {
return _mm512_castpd512_pd256(zmm);
}
Vec4d get_high() const {
return _mm512_extractf64x4_pd(zmm,1);
}
For integers look for horizontal_add in vectori128.h, vectori256.h, and vectori512.h.
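As a standalone illustration of the non-SSE3 branch of the Vec4f version above, here is a minimal sketch using only SSE1 intrinsics on a raw float pointer. The function name hsum4 and the scalar fallback for non-x86 targets are assumptions made for this example and are not part of VCL.

```cpp
#include <cassert>
#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define DEMO_HAVE_SSE 1
#endif

// Sums the four floats at p (hypothetical helper, not a VCL function).
inline float hsum4(const float* p) {
#ifdef DEMO_HAVE_SSE
    __m128 v  = _mm_loadu_ps(p);            // [v0, v1, v2, v3]
    __m128 hi = _mm_movehl_ps(v, v);        // [v2, v3, v2, v3]
    __m128 s  = _mm_add_ps(v, hi);          // [v0+v2, v1+v3, ...]
    __m128 s1 = _mm_shuffle_ps(s, s, 1);    // move element 1 into position 0
    return _mm_cvtss_f32(_mm_add_ss(s, s1)); // (v0+v2) + (v1+v3)
#else
    // Scalar fallback so the sketch still builds without SSE.
    return p[0] + p[1] + p[2] + p[3];
#endif
}
```

The whole reduction stays in registers, so there is no need to store the vector to memory and add the elements one by one.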
You can also use the Vector Class Library (VCL) directly
#include <stdio.h>
#define MAX_VECTOR_SIZE 512
#include "vectorclass.h"
int main(void) {
float x[16]; for(int i=0;i<16;i++) x[i]=i+1;
Vec4f v4 = Vec4f().load(x);
Vec8f v8 = Vec8f().load(x);
Vec16f v16 = Vec16f().load(x);
printf("%f %d\n", horizontal_add(v4), 4*5/2);
printf("%f %d\n", horizontal_add(v8), 8*9/2);
printf("%f %d\n", horizontal_add(v16), 16*17/2);
}
Compile like this (GCC only; my KNL is too old for AVX512):
SSE2: g++ -O3 test.cpp
AVX: g++ -O3 -mavx test.cpp
AVX512ER: icpc -O3 -xMIC-AVX512 test.cpp
output
10.000000 10
36.000000 36
136.000000 136
One nice thing with the VCL library is that if you use e.g. Vec8f with a system that only has SSE2 it will emulate AVX using SSE twice.
See the section "Instruction sets and CPU dispatching" in the vectorclass.pdf manual for how to compile for different instruction sets with MSVC, ICC, Clang, and GCC.
I have implemented the following inline function for AVX2. It sums all elements and returns the result. You can treat it as a suggestion for developing your own function for this purpose.
Note: _mm256_extract_epi32 is not available for AVX; you can use your own method built on vmovss, such as float _mm256_cvtss_f32 (__m256 a), and develop your horizontal addition functions from that.
// my horizontal addition of epi32
inline int _mm256_hadd2_epi32(__m256i a)
{
__m256i a_hi;
a_hi = _mm256_permute2x128_si256(a, a, 1); // imm8 = 1 swaps the two 128-bit halves
a = _mm256_hadd_epi32(a, a_hi);
a = _mm256_hadd_epi32(a, a);
a = _mm256_hadd_epi32(a, a);
return _mm256_extract_epi32(a,0);
}
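Because _mm256_hadd_epi32 adds pairs separately within each 128-bit half, it is not obvious that three of them after a half swap produce the full total. The following scalar model, a sketch with hypothetical helper names that uses no intrinsics, walks through the same four steps on a plain array of eight ints so the data movement can be checked on any machine.

```cpp
#include <array>
#include <cassert>

typedef std::array<int, 8> Lanes8;

// Scalar model of _mm256_permute2x128_si256(a, a, 1): swap the two 128-bit halves.
inline Lanes8 swap_halves(const Lanes8& a) {
    Lanes8 r{};
    for (int i = 0; i < 8; ++i) r[i] = a[(i + 4) % 8];
    return r;
}

// Scalar model of _mm256_hadd_epi32(a, b): pairwise sums, done per 128-bit half.
inline Lanes8 hadd_model(const Lanes8& a, const Lanes8& b) {
    Lanes8 r{};
    for (int half = 0; half < 2; ++half) {
        const int o = 4 * half;
        r[o + 0] = a[o + 0] + a[o + 1];
        r[o + 1] = a[o + 2] + a[o + 3];
        r[o + 2] = b[o + 0] + b[o + 1];
        r[o + 3] = b[o + 2] + b[o + 3];
    }
    return r;
}

// Same sequence of steps as the intrinsic version, returning element 0.
inline int hsum_model(Lanes8 a) {
    Lanes8 a_hi = swap_halves(a);
    a = hadd_model(a, a_hi);
    a = hadd_model(a, a);
    a = hadd_model(a, a);
    return a[0];
}
```

After the first step the low half holds the four pair sums of the whole vector, and the two remaining steps fold those four values into element 0.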

How to write equation of the form x'=f(x,t) where t appears explicitly in odeint

I'm trying to use odeint to solve a differential equation of the form:
y[0]'(r) = y[1],
y[1]'(r) = f(y, r)
where r appears explicitly. How do I write "r" in the code for the equation?
See example below
typedef std::vector< double > state_type;
class phieq{
double lambda, mu, g, sigma, rv;
public:
phieq(double mlambda, double mmu, double mg, double msigma, double mrv) : lambda(mlambda), mu(mmu), g(mg), sigma(msigma), rv(mrv) {}
void operator() (const state_type &y , state_type &dydr , const double /* t */)
{ dydr[0] = y[1];
dydr[1] = -((2.0*y[1])/r)+lambda*y[0]*y[0]*y[0]-(mu*mu)*y[0];
}
};
r is your independent variable in this case. odeint originates from dynamical systems, so its examples use t (for time) for the independent variable. In your case you should write
void operator() (const state_type &y , state_type &dydr , const double r)
and then you can use r in the expressions below.
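A minimal sketch of the corrected functor follows. To keep it free of dependencies it does not call odeint; euler_step is a hypothetical explicit Euler helper that only stands in for a real stepper to show how the third argument reaches the right-hand side. The unused members g, sigma and rv from the question are dropped here for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<double> state_type;

class phieq {
    double lambda, mu;
public:
    phieq(double mlambda, double mmu) : lambda(mlambda), mu(mmu) {}

    // The third parameter is the independent variable; naming it r makes it usable below.
    void operator()(const state_type& y, state_type& dydr, const double r) const {
        dydr[0] = y[1];
        dydr[1] = -((2.0 * y[1]) / r) + lambda * y[0] * y[0] * y[0] - (mu * mu) * y[0];
    }
};

// Hypothetical helper (not part of odeint): one explicit Euler step from r to r + dr.
template <class System>
void euler_step(const System& sys, state_type& y, double r, double dr) {
    state_type dydr(y.size());
    sys(y, dydr, r);  // the stepper passes the current r to the system
    for (std::size_t i = 0; i < y.size(); ++i) y[i] += dr * dydr[i];
}
```

Since the right-hand side divides by r, the integration has to start at some r greater than 0 rather than at r = 0.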

Texture2D as function parameter

This is in a compute shader, but I think it's a general hlsl thing. Here's a snippet:
Texture2D<float> Ground : register(t1);
Texture2D<float> Water : register(t2);
SamplerState LinearSampler
{
Filter = MIN_MAG_MIP_LINEAR;
AddressU = Clamp;
AddressV = Clamp;
};
float4 Get(Texture2D source, float x, float y)
{
return source.SampleLevel(LinearSampler, float2(x * dimension.z, y * dimension.w), 0);
}
[numthreads(32, 32, 1)]
void main(uint3 threadID : SV_DispatchThreadID, uint3 groupThreadID : SV_GroupThreadID, uint3 blockID : SV_GroupID)
{
float4 g = Get(Ground, 0, 0);
Output[threadID.xy] = g.z;
}
Any calls to the Get method give me:
error X3017: 'Get': cannot implicitly convert from 'const Texture2D' to 'Texture2D'
I assume the lack of code formatting caused the angle brackets to be dropped, and the error should read: error X3017: 'Get': cannot implicitly convert from 'const Texture2D<float>' to 'Texture2D<float4>'. Right? That makes sense, since a bare Texture2D is implicitly Texture2D<float4> (4-channel), while your global textures are Texture2D<float> (single-channel). The compiler doesn't expand types implicitly (though it will truncate with a warning). This holds whether the type is a Texture2D or a plain old float4. To fix your code, make sure your source textures really are single-channel, then make the function argument match (i.e. Texture2D<float> source).

Extracting X and Y from a GLKVector3 ? iOS

Say I have a GLKVector3 and want to read only the x and y values as a CGPoint. How can I do this?
In the GLKVector3 doc, there is this type definition:
union _GLKVector3
{
struct { float x, y, z; };
struct { float r, g, b; };
struct { float s, t, p; };
float v[3];
};
typedef union _GLKVector3 GLKVector3;
Therefore there are several options:
GLKVector3's v member, which is a float[3] array of {x,y,z}, i.e.:
GLKVector3 vector;
...
float x = vector.v[0];
float y = vector.v[1];
float z = vector.v[2];
CGPoint p = CGPointMake(x,y);
Then there are also the float members x,y,z (or the less relevant r,g,b or s,t,p, meant for other uses of the vector type):
CGPoint p = CGPointMake(vector.x,vector.y);
GLKVector3 is declared as
union _GLKVector3
{
struct { float x, y, z; };
struct { float r, g, b; };
struct { float s, t, p; };
float v[3];
};
typedef union _GLKVector3 GLKVector3;
So the easiest and most readable way to convert is:
GLKVector3 someVector;
…
CGPoint somePoint = CGPointMake(someVector.x,someVector.y);
Note, however, that CGPoint consists of CGFloats, which may be doubles in 64-bit environments.

Resources