ArrayFire seq to int c++ - arrayfire

Imagine a gfor with a seq j...
If I need to use the value of the instance j as a index, who can I do that?
something like:
vector<double> a(n);
gfor(seq j, n){
//Do some calculation and save this on someValue
a[j] = someValue;
}
Someone can help me (again) ?
Thanks.

I've found a solution for this...
if someone had a better option, feel free to post...
First, create a seq with the same size of your gfor instances.
Then, convert that seq in a array.
Now, take the value of that line on array (it's equals the index)
seq sequencia(0, 200);
af::array sqc = sequencia;
//Inside the gfor loop
countLoop = (int) sqc(j).scalar<float>();

Your approach works, but breaks gfors parallelization as converting the index to a scalar forces it to be written from the gpu back to the host, slamming the breaks on the GPU.
You want to do it more like this :
af::array a(200);
gfor(seq j, 200){
//Do some calculation and save this on someValue
a[j] = af::array(someValue); // for someValue a primitive type, say float
}
// ... Now we're safe outside the parallel loop, let's grab the array results
float results[200];
a.host(results) // Copy array from GPU to host, populating a c-type array

Related

How to make an operation similar to _mm_extract_epi8 with non-immediate input?

What I want is extracting a value from vector using a variable scalar index.
Like _mm_extract_epi8 / _mm256_extract_epi8 but with non-immediate input.
(There are some results in the vector, the one with the given index is found out to be the true result, the rest are discarded)
Especially, if index is in a GPR, the easiest way is probably to store val to memory and then movzx it into another GPR. Sample implementation using C:
uint8_t extract_epu8var(__m256i val, int index) {
union {
__m256i m256;
uint8_t array[32];
} tmp;
tmp.m256 = val;
return tmp.array[index];
}
Godbolt translation (note that a lot of overhead happens for stack alignment -- if you don't have an aligned temporary storage area, you could just vmovdqu instead of vmovdqa): https://godbolt.org/z/Gj6Eadq9r
So far the best option seem to be using _mm_shuffle_epi8 for SSE
uint8_t extract_epu8var(__m128i val, int index) {
return (uint8_t)_mm_cvtsi128_si32(
_mm_shuffle_epi8(val, _mm_cvtsi32_si128(index)));
}
Unfortunately this does not scale well for AVX. vpshufb does not shuffle across lanes. There is a cross lane shuffle _mm256_permutevar8x32_epi32, but the resulting stuff seem to be complicated:
uint8_t extract_epu8var(__m256i val, int index) {
int index_low = index & 0x3;
int index_high = (index >> 2);
return (uint8_t)(_mm256_cvtsi256_si32(_mm256_permutevar8x32_epi32(
val, _mm256_zextsi128_si256(_mm_cvtsi32_si128(index_high))))
>> (index_low << 3));
}

Destructured iteration over variadic arguments like a tuple sequence in D

Let's say I want to process a variadic function which alternately gets passed start and end values of 1 or more intervals and it should return a range of random values in those intervals. You can imagine the input to be a flattened sequence of tuples, all tuple elements spread over one single range.
import std.meta; //variadic template predicates
import std.traits : isFloatingPoint;
import std.range;
auto randomIntervals(T = U[0], U...)(U intervals)
if (U.length/2 > 0 && isFloatingPoint!T && NoDuplicates!U.length == 1) {
import std.random : uniform01;
T[U.length/2] randomValues;
// split and iterate over subranges of size 2
foreach(i, T start, T end; intervals.chunks(2)) { //= intervals.slide(2,2)
randomValues[i] = uniform01 * (end - start) + start,
}
return randomValues.dup;
}
The example is not important, I only use it for explanation. The chunk size could be any finite positive size_t, not only 2 and changing the chunk size should only require changing the number of loop-variables in the foreach loop.
In this form above it will not compile since it would only expect one argument (a range) to the foreach loop. What I would like is something which rather automatically uses or infers a sliding-window as a tuple, derived from the number of given loop-variables, and fills the additional variables with next elements of the range/array + allows for an additional index, optionally. According to the documentation a range of tuples allows destructuring of the tuple elements in place into foreach-loop-variables so the first thing, I thought about, is turning a range into a sequence of tuples but didn't find a convenience function for this.
Is there a simple way to loop over destructured subranges (with such a simplicity as shown in my example code) together with the index? Or is there a (standard library) function which does this job of splitting a range into enumerated tuples of equal size? How to easily turn the range of subranges into a range of tuples?
Is it possible with std.algorithm.iteration.map in this case (EDIT: with a simple function argument to map and without accessing tuple elements)?
EDIT: I want to ignore the last chunk which doesn't fit into the entire tuple. It just is not iterated over.
EDIT: It's not, that I couldn't program this myself, I only hope for a simple notation because this use case of looping over multiple elements is quite useful. If there is something like a "spread" or "rest" operator in D like in JavaScript, please let me know!
Thank you.
(Added as a separate answer because it's significantly different from my previous answer, and wouldn't fit in a comment)
After reading your comments and the discussion on the answers thus far, it seems to me what you seek is something like the below staticChunks function:
unittest {
import std.range : enumerate;
size_t index = 0;
foreach (i, a, b, c; [1,2,3,1,2,3].staticChunks!3.enumerate) {
assert(a == 1);
assert(b == 2);
assert(c == 3);
assert(i == index);
++index;
}
}
import std.range : isInputRange;
auto staticChunks(size_t n, R)(R r) if (isInputRange!R) {
import std.range : chunks;
import std.algorithm : map, filter;
return r.chunks(n).filter!(a => a.length == n).map!(a => a.tuplify!n);
}
auto tuplify(size_t n, R)(R r) if (isInputRange!R) {
import std.meta : Repeat;
import std.range : ElementType;
import std.typecons : Tuple;
import std.array : front, popFront, empty;
Tuple!(Repeat!(n, ElementType!R)) result;
static foreach (i; 0..n) {
result[i] = r.front;
r.popFront();
}
assert(r.empty);
return result;
}
Note that this also deals with the last chunk being a different size, if only by silently throwing it away. If this behavior is undesirable, remove the filter, and deal with it inside tuplify (or don't, and watch the exceptions roll in).
chunks and slide return Ranges, not tuples. Their last element can contain less than the specified size, whereas tuples have a fixed compile time size.
If you need destructuring, you have to implement your own chunks/slide that return tuples. To explicitly add an index to the tuple, use enumerate. Here is an example:
import std.typecons, std.stdio, std.range;
Tuple!(int, int)[] pairs(){
return [
tuple(1, 3),
tuple(2, 4),
tuple(3, 5)
];
}
void main(){
foreach(size_t i, int start, int end; pairs.enumerate){
writeln(i, ' ', start, ' ', end);
}
}
Edit:
As BioTronic said using map is also possible:
foreach(i, start, end; intervals
.chunks(2)
.map!(a => tuple(a[0], a[1]))
.enumerate){
Your question has me a little confused, so I'm sorry if I've misunderstood. What you're basically asking is if foreach(a, b; [1,2,3,4].chunks(2)) could work, right?
The simple solution here is to, as you say, map from chunk to tuple:
import std.typecons : tuple;
import std.algorithm : map;
import std.range : chunks;
import std.stdio : writeln;
unittest {
pragma(msg, typeof([1,2].chunks(2).front));
foreach(a, b; [1,2,3,4].chunks(2).map!(a => tuple(a[0], a[1]))) {
writeln(a, ", ", b);
}
}
At the same time with BioTronic, I tried to code some own solution to this problem (tested on DMD). My solution works for slices (BUT NOT fixed-size arrays) and avoids a call to filter:
import std.range : chunks, isInputRange, enumerate;
import std.range : isRandomAccessRange; //changed from "hasSlicing" to "isRandomAccessRange" thanks to BioTronics
import std.traits : isIterable;
/** turns chunks into tuples */
template byTuples(size_t N, M)
if (isRandomAccessRange!M) { //EDITED
import std.meta : Repeat;
import std.typecons : Tuple;
import std.traits : ForeachType;
alias VariableGroup = Tuple!(Repeat!(N, ForeachType!M)); //Tuple of N repititions of M's Foreach-iterated Type
/** turns N consecutive array elements into a Variable Group */
auto toTuple (Chunk)(Chunk subArray) #nogc #safe pure nothrow
if (isInputRange!Chunk) { //Chunk must be indexable
VariableGroup nextLoopVariables; //fill the tuple with static foreach loop
static foreach(index; 0 .. N) {
static if ( isRandomAccessRange!Chunk ) { // add cases for other ranges here
nextLoopVariables[index] = subArray[index];
} else {
nextLoopVariables[index] = subArray.popFront();
}
}
return nextLoopVariables;
}
/** returns a range of VariableGroups */
auto byTuples(M array) #safe pure nothrow {
import std.algorithm.iteration : map;
static if(!isInputRange!M) {
static assert(0, "Cannot call map() on fixed-size array.");
// auto varGroups = array[].chunks(N); //fixed-size arrays aren't slices by default and cannot be treated like ranges
//WARNING! invoking "map" on a chunk range from fixed-size array will fail and access wrong memory with no warning or exception despite #safe!
} else {
auto varGroups = array.chunks(N);
}
//remove last group if incomplete
if (varGroups.back.length < N) varGroups.popBack();
//NOTE! I don't know why but `map!toTuple` DOES NOT COMPILE! And will cause a template compilation mess.
return varGroups.map!(chunk => toTuple(chunk)); //don't know if it uses GC
}
}
void main() {
testArrayToTuples([1, 3, 2, 4, 5, 7, 9]);
}
// Order of template parameters is relevant.
// You must define parameters implicitly at first to be associated with a template specialization
void testArrayToTuples(U : V[], V)(U arr) {
double[] randomNumbers = new double[arr.length / 2];
// generate random numbers
foreach(i, double x, double y; byTuples!2(arr).enumerate ) { //cannot use UFCS with "byTuples"
import std.random : uniform01;
randomNumbers[i] = (uniform01 * (y - x) + x);
}
foreach(n; randomNumbers) { //'n' apparently works despite shadowing a template parameter
import std.stdio : writeln;
writeln(n);
}
}
Using elementwise operations with the slice operator would not work here because uniform01 in uniform01 * (ends[] - starts[]) + starts[] would only be called once and not multiple times.
EDIT: I also tested some online compilers for D for this code and it's weird that they behave differently for the same code. For compilation of D I can recommend
https://run.dlang.io/ (I would be very surprised if this one wouldn't work)
https://www.mycompiler.io/new/d (but a bit slow)
https://ideone.com (it works but it makes your code public! Don't use with protected code.)
but those didn't work for me:
https://tio.run/#d2 (didn't finish compilation in one case, otherwise wrong results on execution even when using dynamic array for the test)
https://www.tutorialspoint.com/compile_d_online.php (doesn't compile the static foreach)

Search for sequence in Uint8List

Is there a fast (native) method to search for a sequence in a Uint8List?
///
/// Return index of first occurrence of seq in list
///
int indexOfSeq(Uint8List list, Uint8List seq) {
...
}
EDIT: Changed List<int> into Uint8List
No. There is no built-in way to search for a sequence of elements in a list.
I am also not aware of any dart:ffi based implementations.
The simplest approach would be:
extension IndexOfElements<T> on List<T> {
int indexOfElements(List<T> elements, [int start = 0]) {
if (elements.isEmpty) return start;
var end = length - elements.length;
if (start > end) return -1;
var first = elements.first;
var pos = start;
while (true) {
pos = indexOf(first, pos);
if (pos < 0 || pos > end) return -1;
for (var i = 1; i < elements.length; i++) {
if (this[pos + i] != elements[i]) {
pos++;
continue;
}
}
return pos;
}
}
}
This has worst-case time complexity O(length*elements.length). There are several more algorithms with better worst-case complexity, but they also have larger constant factors and more expensive pre-computations (KMP, BMH). Unless you search for the same long list several times, or do so in a very, very long list, they're unlikely to be faster in practice (and they'd probably have an API where you compile the pattern first, then search with it.)
You could use dart:ffi to bind to memmem from string.h as you suggested.
We do the same with binding to malloc from stdlib.h in package:ffi (source).
final DynamicLibrary stdlib = Platform.isWindows
? DynamicLibrary.open('kernel32.dll')
: DynamicLibrary.process();
final PosixMalloc posixMalloc =
stdlib.lookupFunction<Pointer Function(IntPtr), Pointer Function(int)>('malloc');
Edit: as lrn pointed out, we cannot expose the inner data pointer of a Uint8List at the moment, because the GC might relocate it.
One could use dart_api.h and use the FFI to pass TypedData through the FFI trampoline as Dart_Handle and use Dart_TypedDataAcquireData from the dart_api.h to access the inner data pointer.
(If you want to use this in Flutter, we would need to expose Dart_TypedDataAcquireData and Dart_TypedDataReleaseData in dart_api_dl.h https://github.com/dart-lang/sdk/issues/40607 I've filed https://github.com/dart-lang/sdk/issues/44442 to track this.)
Alternatively, could address https://github.com/dart-lang/sdk/issues/36707 so that we could just expose the inner data pointer of a Uint8List directly in the FFI trampoline.

How does return i++ work?

This is a question about the ++ operator. It is a question that doesn't solve a problem but I believe that it would be very informative to know the answer. So, while StackOverflow is a source of information, I would like to append the available knowledge here!
So, the problem:
int i = 0;
int getNextInt(){
return i++;
}
This is it. The question is how the ++ operator is implemented in the function call stack and where the increment actually happens when we call that function in our code.
I am really wondering a long time!
It happens after the function returns it's value. In this case, it returns 0 and then increments the variable, so the next time it will return 1
I have no idea what the compiler actually does with regard to the stack, but if it chose to, the compiler could easily rearrange it to this:
int getNextInt() {
int temp = i;
i++;
return temp;
}
i++ is the same as saying i = i+1
It is just a shorthand way of redefining the variable.

Java 2ME Swapping Vector elements

I want to sort my vector but I need to use a swap function in order to do that... Is there any pre-defined methods like Collections.swap(vector, index1, index2) in java2me?
Thats pretty simple actually
private static void swap(Vector src, int i, int j)
{
Object tmp = src.elementAt(i);
src.setElementAt(src.elementAt(j), i);
src.setElementAt(tmp, j);
}
Well, there's certainly a Collections.sort()
http://download.oracle.com/javame/config/cdc/ref-impl/cdc1.1.2/jsr218/java/util/Collections.html
Edit: And since I'm apparently blind, as Bala R points out below, there's a swap as well, exactly as you describe.

Resources