Copy complex numbers from host to device using ArrayFire

I am trying to copy an array of complex numbers from host to device using the ArrayFire framework:
std::complex<float> hostArray[131072];
array deviceArray (131072, hostArray);
But it gives a compilation error due to data type incompatibility. What am I doing wrong?
I could copy the real and imaginary parts separately to the device and assemble the complex numbers in GPU memory, but that is costly, and I also don't know how to construct a complex number from two real numbers in the ArrayFire framework.
I would be grateful if someone could help me with the ArrayFire framework in this matter.

ArrayFire uses cuComplex (cfloat in ArrayFire 2.0RC) to store complex numbers. cuComplex is defined internally as a float2, which is a struct with two elements.
std::complex should have the same layout, so you may be able to use a reinterpret_cast to change the type of the variable without copying the data into a different data structure. On my machine (Linux Mint with g++ 4.7.1) I was able to create an ArrayFire array from std::complex data with the following code:
const int count = 10;
std::complex<float> host_complex[count];
for (int i = 0; i < count; i++) {
    // std::real()/std::imag() return by value and are not assignable;
    // construct each element directly instead.
    host_complex[i] = std::complex<float>(i, i * 2);
}
array af_complex(count, reinterpret_cast<cuComplex*>(host_complex));
print(af_complex);
Output:
af_complex =
0.0000 + 0.0000i
1.0000 + 2.0000i
2.0000 + 4.0000i
3.0000 + 6.0000i
4.0000 + 8.0000i
5.0000 + 10.0000i
6.0000 + 12.0000i
7.0000 + 14.0000i
8.0000 + 16.0000i
9.0000 + 18.0000i
Caveat
C++03 does not specify the size or data layout of the std::complex type, so strictly speaking this approach is not guaranteed to be portable there; C++11 and later do guarantee that std::complex<T> is layout-compatible with an array of two Ts. If you need to support older compilers, I would suggest storing your complex data in a struct like float2/cfloat to avoid compiler-related issues.
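Regarding the asker's follow-up about building a complex value from two real numbers: ArrayFire also has a complex() function that combines two real arrays on the device. A minimal sketch, using the ArrayFire 3.x spellings af::cfloat, af::complex, and af_print (the 2.0RC names differ slightly, as noted above):
#include <arrayfire.h>
#include <vector>

int main() {
    const int count = 10;

    // Portable route: keep host data in ArrayFire's own two-float struct,
    // so no reinterpret_cast of std::complex is needed.
    std::vector<af::cfloat> host(count);
    for (int i = 0; i < count; i++)
        host[i] = af::cfloat(float(i), float(2 * i)); // (real, imag)
    af::array dev(count, host.data());                // host-to-device copy

    // Device-side construction from two real arrays.
    std::vector<float> re_h(count), im_h(count);
    for (int i = 0; i < count; i++) { re_h[i] = i; im_h[i] = 2.0f * i; }
    af::array re(count, re_h.data());
    af::array im(count, im_h.data());
    af::array dev2 = af::complex(re, im); // combine into one complex array

    af_print(dev2); // should match the output above
    return 0;
}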

Related

C++ vector with iterator can't be destroyed?

Recently I have faced a problem where memory keeps increasing while the program runs, and when the program is closed the memory returns to a normal level. Obviously, it's a memory leak. After some work, I have located the code responsible, but I don't know why it leaks. The program's workflow is simple:
first use the lidar API to get point cloud and image data;
then pass the data to the next tbb flow graph node to process it;
finally use the open3d API to visualize it.
In the first step, the lidar's own API uses asio to asynchronously invoke callback functions that deliver the data, so I create some tbb concurrent_queues to store it, plus an align function to match clouds and images by timestamp. The problem is in the align function. In that function, I create a vector<shared_ptr<open3d::..::PointCloud>> and use an iterator to store the point cloud elements. However, I found that when the function completes, the shared_ptr use count does not decrease. A similar but simpler example:
std::pair<std::shared_ptr<int>, int> helper() {
    auto a = std::make_shared<int>(90);
    auto c = 100;
    std::vector<std::pair<std::shared_ptr<int>, int>> container;
    container.reserve(5);          // capacity changes, but size() is still 0
    auto iter = container.begin(); // == end() for an empty vector
    for (int i = 0; i < 3; i++) {
        *iter = std::make_pair(a, c); // writes past end(): undefined behavior
        iter++;
    }
    return *(iter - 1);
}

int main() {
    auto b = helper();
    std::cout << "shared_ptr use count: " << std::get<0>(b).use_count() << std::endl;
    return 0;
}
On Ubuntu 20.04 + gcc 9.4, the printed result is shared_ptr use count: 4.
Why isn't the vector automatically destroyed when the function completes? I hope someone can kindly explain this problem.
Thanks @Retired Ninja! The root of the problem is that vector::reserve only reserves capacity; it does not change the vector's size, which stays 0. The subsequent iterator operations therefore write into memory the vector does not consider occupied, which is undefined behavior. While the result happens to make it back to main without visible corruption, the vector never runs destructors for those slots when it is destroyed, so the shared_ptr use count cannot drop back to 1 after the call.
To solve the problem, one can change reserve to resize, which actually creates the elements so the iterator points to valid objects. Or avoid the iterator entirely: just use push_back and return back().
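For completeness, a minimal sketch of the corrected helper along the lines suggested above, using push_back so every stored pair is a real, destructible element of the vector:
#include <iostream>
#include <memory>
#include <utility>
#include <vector>

std::pair<std::shared_ptr<int>, int> helper() {
    auto a = std::make_shared<int>(90);
    auto c = 100;
    std::vector<std::pair<std::shared_ptr<int>, int>> container;
    container.reserve(5);                 // optional: just avoids reallocation
    for (int i = 0; i < 3; i++)
        container.push_back(std::make_pair(a, c)); // size() grows to 3
    return container.back();              // copy out the last element
}

int main() {
    auto b = helper();
    // container and `a` were destroyed when helper() returned, so the
    // only remaining owner is `b`: this prints 1.
    std::cout << "shared_ptr use count: " << b.first.use_count() << std::endl;
    return 0;
}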

What's the reason for this time difference?

There is some problem with my code.
In the following code:
GainDetailMatI is a Mat holding a 9792x2448 matrix
ContrastGainBound4096x and ContrastGainLayerI are int
Platform: Android 4.4, NDK gcc 4.9
A:
Mat plus = ContrastGainLayerI * min(ContrastGainBound4096x, max(0, GainDetailMatI - 4096.0));
B:
Mat t=max(0, GainDetailMatI - 4096.0);
Mat plus = ContrastGainLayerI * min(ContrastGainBound4096x, t);
A uses 13 milliseconds more than B.
I disabled gcc optimization by setting APP_OPTIM := debug in Application.mk.
Does anyone know the reason?
I think maybe max(0, GainDetailMatI - 4096.0) returns with type MatExpr,
and t = max(0, GainDetailMatI - 4096.0); converts the MatExpr to a Mat.
Maybe this is the reason?
Thanks a lot!
In example B you first store the object in t and then retrieve it for use in the second part of your code. In example A you skip the store and retrieve, which would normally make the code more efficient. While this suggests that putting all your code on one line can make it more efficient, keep in mind that readability has a lot of value. More info on Java performance can be found on the wiki: https://en.wikipedia.org/wiki/Java_performance#Compressed_Oops
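The questioner's MatExpr hypothesis refers to OpenCV's lazy expression templates: operations like max() and operator- return a MatExpr that records the operation, and evaluation (plus an allocation) happens when the expression is converted to a Mat. A minimal sketch of the distinction, using hypothetical small matrices:
#include <opencv2/core/core.hpp>
using namespace cv;

int main() {
    Mat img = Mat::ones(4, 4, CV_32F) * 5000.0;

    // Lazy: this builds an expression object; no per-pixel work yet.
    MatExpr e = img - 4096.0;

    // Forced: assigning to Mat allocates a matrix and evaluates e into it.
    Mat t = e;

    // Chained form: the min/max overloads here take Mat arguments, so each
    // intermediate MatExpr is materialized into a temporary along the way;
    // in a debug build those temporaries are comparatively costly.
    Mat plus = 2.0 * min(100.0, max(0.0, img - 4096.0));
    return 0;
}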

Pattern matching a binary in Erlang

I'm trying to pattern match a binary against this:
<<_:(A * ?N + A + B)/binary,T:1/binary,_/binary>>
However, Erlang throws an error saying that variable T is unbound. A quick explanation: I want to skip a certain number of bytes, then read one byte, and then ignore the remaining bytes. How can I achieve this?
In bit syntax we can't use runtime expressions as a bit size.
We can use only constants, compile-time expressions like _:(4*8)/binary, and variables: _:Var/binary.
In your case, the solution is to bind A * ?N + A + B to a variable first:
IgnoredBytes = A * ?N + A + B,
<<_:IgnoredBytes/binary,T:1/binary,_/binary>> = SomeBinary,
T.
This is explained in more detail in an answer from [erlang-questions].

mlpack sparse coding solution not found

I am trying to learn how to use the sparse coding algorithm with the mlpack library. When I call Encode() on my instance of mlpack::sparse_coding::SparseCoding, I get the error
[WARN] There are 63 inactive atoms. They will be reinitialized randomly.
error: solve(): solution not found
Is it simply that the algorithm cannot learn a latent representation of the data? Or perhaps it is my usage? The relevant section follows.
EDIT: One line was modified to fix an unrelated error, but the original error remains.
double* Application::GetSparseCodes(arma::mat* trainingExample, int atomCount)
{
    double* latentRep = new double[atomCount];
    mlpack::sparse_coding::SparseCoding<mlpack::sparse_coding::DataDependentRandomInitializer> sc(*trainingExample, Utils::ATOM_COUNT, 1.0);
    sc.Encode(Utils::MAX_ITERATIONS);
    arma::mat& latentRepMat = sc.Codes();
    for (int i = 0; i < atomCount; i++)
        latentRep[i] = latentRepMat.at(i, 0);
    return latentRep;
}
Some relevant parameters
const static int IMAGE_WIDTH = 20;
const static int IMAGE_HEIGHT = 20;
const static int PIXEL_COUNT = IMAGE_WIDTH * IMAGE_HEIGHT;
const static int ATOM_COUNT = 64;
const static int MAX_ITERATIONS = 100000;
This could be one of a handful of issues, and given the description it's a little difficult to tell which of these it is (or if it is something else entirely). However, these three ideas should provide a good place to start:
Matrices in mlpack are column-major, meaning each observation should be a column. If you use mlpack::data::Load() to load, e.g., a CSV file (which is generally one row per observation), it will automatically transpose the dataset. SparseCoding will act oddly if you pass it transposed data. See also http://www.mlpack.org/doxygen.php?doc=matrices.html.
If there are 63 inactive atoms, then only one atom is actually active (given that ATOM_COUNT is 64). This means the algorithm has found that the best way to represent the dictionary (at a given step) uses only one atom. This could happen if the matrix you are passing consists of all zeros.
mlpack can provide verbose output, which may also be helpful for debugging. Usually this is enabled by using mlpack's CLI class to parse command-line input, but you can also enable verbose output with mlpack::Log::Info.ignoreInput = false, as in the sketch below. You may get a lot of output that way, but it will give a better look at what is going on.
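A minimal sketch of enabling that verbose output outside the CLI wrapper (assuming the mlpack-era API used in the code above):
#include <mlpack/core.hpp>

// Call once at startup: routes mlpack's [INFO ] log stream to stdout so
// SparseCoding::Encode() reports its per-iteration progress.
void EnableMlpackVerboseOutput()
{
    mlpack::Log::Info.ignoreInput = false;
}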
By the way, the mlpack project has its own mailing list, where you may get a quicker or more comprehensive response.

Moving Average across Variables in Stata

I have a panel data set for which I would like to calculate moving averages across years.
Each year is a variable with one observation per state, and I would like to create a new variable holding the average of each three-year period.
For example:
P1947=rmean(v1943 v1944 v1945), P1948=rmean(v1944 v1945 v1946)
I figured I should use a foreach loop with the egen command, but I'm not sure how to refer to the different variables within the loop.
I'd appreciate any guidance!
This data structure is quite unfit for purpose. Assuming an identifier id you need to reshape, e.g.
reshape long v, i(id) j(year)
tsset id year
Then a moving average is easy. Use tssmooth or just generate, e.g.
gen mave = (L.v + v + F.v)/3
or (better)
gen mave = 0.25 * L.v + 0.5 * v + 0.25 * F.v
More on why your data structure is quite unfit: not only would calculating a moving average need a loop (not necessarily involving egen), but you would be creating several new extra variables. Using those in any subsequent analysis would be somewhere between awkward and impossible.
EDIT: I'll give a sample loop, while not moving from my stance that it is poor technique. I don't see a reason behind your naming convention whereby P1947 is a mean for 1943-1945; I assume that's just a typo. Let's suppose that we have data for 1913-2012. For means of 3 years, we lose one year at each end.
forval j = 1914/2011 {
    local i = `j' - 1
    local k = `j' + 1
    gen P`j' = (v`i' + v`j' + v`k') / 3
}
That could be written more concisely, at the expense of a flurry of macros within macros. Using unequal weights is easy, as above. The only reason to use egen is that it doesn't give up if there are missings, which the above will do.
FURTHER EDIT
As a matter of completeness, note that it is easy to handle missings without resorting to egen.
The numerator
(v`i' + v`j' + v`k')
generalises to
(cond(missing(v`i'), 0, v`i') + cond(missing(v`j'), 0, v`j') + cond(missing(v`k'), 0, v`k'))
and the denominator
3
generalises to
!missing(v`i') + !missing(v`j') + !missing(v`k')
If all values are missing, this reduces to 0/0, or missing. Otherwise, if any value is missing, we add 0 to the numerator and 0 to the denominator, which is the same as ignoring it. Naturally the code is tolerable as above for averages of 3 years, but either for that case or for averaging over more years, we would replace the lines above by a loop, which is what egen does.
There is a user-written program that can do this very easily for you. It is called mvsumm and can be found through findit mvsumm:
xtset id time
mvsumm observations, stat(mean) win(t) gen(new_variable) end
