Get/Modify single entries in non-contiguous submatrix views - armadillo

I would like to access and modify single entries in a non-contiguous submatrix view. I tried it like this:
#include <armadillo> // version 5.200.2
int main()
{
arma::mat A(4, 4, arma::fill::zeros);
arma::uvec b(4);
b << 2 << 3;
auto view = A.elem(b, b);
view(0, 0) = 1.0; // Error: No operator()
}
This doesn't work because the expression returned by A.elem(b, b) appears to have no operator() defined. The same access works with contiguous views such as submat(). Is there a solution or workaround for this, or is it simply not possible in the non-contiguous case?
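One possible workaround is to skip the view object and translate view-local indices to the parent matrix yourself, since entry (i, j) of such a view corresponds to A(b(i), b(j)). The sketch below only illustrates that index mapping with standard-library types: Matrix and IndexedView are hypothetical stand-ins, not Armadillo classes, and the column-major storage is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a dense column-major matrix (not an Armadillo type).
struct Matrix {
    std::size_t rows, cols;
    std::vector<double> data;
    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}
    double& operator()(std::size_t i, std::size_t j) { return data[i + j * rows]; }
};

// Hypothetical non-contiguous view: forwards (i, j) to parent(rowIdx[i], colIdx[j]).
struct IndexedView {
    Matrix& parent;
    std::vector<std::size_t> rowIdx, colIdx;
    double& operator()(std::size_t i, std::size_t j) {
        assert(i < rowIdx.size() && j < colIdx.size());
        return parent(rowIdx[i], colIdx[j]);
    }
};
```

With IndexedView view{A, {2, 3}, {2, 3}}, writing view(0, 0) = 1.0 updates A(2, 2). In Armadillo itself the same mapping would read A(b(0), b(0)) = 1.0.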

Related

About extending a Look Up Table at compile time

I'd like to extend my instrumenting profiler so that it affects performance as little as possible.
In my current implementation, I'm using a ProfilerHelper that takes one string and can be placed wherever you want in the f() being profiled.
The ctor starts the measurement and the dtor stops it, logging the delta in an unordered_map entry whose key is the string.
Now, I'd like to make all of that faster.
First of all, I'd like to create a string LUT (Look Up Table) containing the f() names at compile time, and turn the unordered_map into a plain vector indexed in parallel with the string LUT.
Now the question is: I've managed to create a LUT of std::string_view, but I cannot find a way to extend it at compile time.
A first rough attempt looks like this:
#include <array>
#include <iostream>
#include <string_view>
template<unsigned N>
constexpr auto LUT() {
std::array<std::string_view, N> Strs{};
for (unsigned n = 0; n < N; n++) {
Strs[n] = "";
}
return Strs;
};
constexpr std::array<std::string_view, 0> StringsLUT { LUT<0>() };
constexpr auto AddString(std::string_view const& Str)
{
constexpr auto Size = StringsLUT.size();
std::array<std::string_view, Size + 1> Copy{};
for (auto i = 0; i < Size; ++i)
Copy[i] = StringsLUT[i];
Copy[Size] = Str;
return Copy;
};
int main()
{
constexpr auto Strs = AddString(__builtin_FUNCTION());
//for (auto const Str : Strs)
std::cout << Strs[0] << std::endl;
}
So my idea is to call AddString wherever needed in the f()s to be profiled, extending this list at compile time.
But of course I would have to take the returned Copy and replace StringsLUT every time, ending up with a final StringsLUT that contains all the f() names.
Is there a way to do that at compile time?
Sorry, but I've only just entered the magic "new" world of constexpr applied to LUTs.
Thanks in advance for your support.
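A constexpr variable cannot be reassigned, so AddString as written cannot grow StringsLUT from separate call sites. One alternative, assuming it is acceptable to list all names in a single place and that C++17 is available, is to build the table once and resolve each name to its slot at compile time. In the sketch below, MakeLUT, IndexOf, SlotOf and the three example names are hypothetical and only illustrate the idea.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper: builds the table from names listed in one place.
template <typename... Names>
constexpr auto MakeLUT(Names... names) {
    return std::array<std::string_view, sizeof...(Names)>{std::string_view{names}...};
}

// Hypothetical example names; a real profiler would list its own functions.
inline constexpr auto StringsLUT = MakeLUT("Load", "Parse", "Render");

// Compile-time lookup: returns the slot of Name, or StringsLUT.size() if absent.
constexpr std::size_t IndexOf(std::string_view Name) {
    for (std::size_t i = 0; i < StringsLUT.size(); ++i)
        if (StringsLUT[i] == Name) return i;
    return StringsLUT.size();
}

// The lookup is usable in constant expressions, e.g. as an array index.
static_assert(IndexOf("Parse") == 1);

// Runtime-checked variant for call sites where a missing name is a bug.
inline std::size_t SlotOf(std::string_view Name) {
    const std::size_t i = IndexOf(Name);
    assert(i < StringsLUT.size() && "name not registered in StringsLUT");
    return i;
}
```

The timings could then live in a std::array<double, StringsLUT.size()> indexed by IndexOf("..."), which replaces the unordered_map lookup with a constant index.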

How to make an operation similar to _mm_extract_epi8 with non-immediate input?

What I want is extracting a value from vector using a variable scalar index.
Like _mm_extract_epi8 / _mm256_extract_epi8 but with non-immediate input.
(There are some results in the vector, the one with the given index is found out to be the true result, the rest are discarded)
Especially, if index is in a GPR, the easiest way is probably to store val to memory and then movzx it into another GPR. Sample implementation using C:
#include <immintrin.h> // __m256i and the AVX intrinsics
#include <stdint.h>
uint8_t extract_epu8var(__m256i val, int index) {
union {
__m256i m256;
uint8_t array[32];
} tmp;
tmp.m256 = val;
return tmp.array[index];
}
Godbolt translation (note that a lot of overhead happens for stack alignment -- if you don't have an aligned temporary storage area, you could just vmovdqu instead of vmovdqa): https://godbolt.org/z/Gj6Eadq9r
So far the best option seems to be using _mm_shuffle_epi8 for SSE:
uint8_t extract_epu8var(__m128i val, int index) {
return (uint8_t)_mm_cvtsi128_si32(
_mm_shuffle_epi8(val, _mm_cvtsi32_si128(index)));
}
Unfortunately this does not scale well for AVX. vpshufb does not shuffle across lanes. There is a cross-lane shuffle, _mm256_permutevar8x32_epi32, but the result seems complicated:
uint8_t extract_epu8var(__m256i val, int index) {
int index_low = index & 0x3;
int index_high = (index >> 2);
return (uint8_t)(_mm256_cvtsi256_si32(_mm256_permutevar8x32_epi32(
val, _mm256_zextsi128_si256(_mm_cvtsi32_si128(index_high))))
>> (index_low << 3));
}
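To make the index arithmetic in the AVX2 variant easier to follow, here is a scalar model of it using only the standard library. It is not a replacement for the intrinsics: extract_epu8var_model is a hypothetical name, the vector is represented as 32 plain bytes, and a little-endian target such as x86 is assumed.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Scalar model of the permute-based variant: the 256-bit value is viewed as
// eight 32-bit elements (little-endian byte order assumed).
inline std::uint8_t extract_epu8var_model(const std::array<std::uint8_t, 32>& val, int index) {
    assert(index >= 0 && index < 32);
    const int index_low = index & 0x3;   // byte position inside the 32-bit element
    const int index_high = index >> 2;   // which of the eight 32-bit elements
    std::uint32_t dword;
    // Stands in for the cross-lane permute that moves the selected element to position 0.
    std::memcpy(&dword, val.data() + 4 * index_high, sizeof dword);
    // Shift the wanted byte down to bits 0..7 and truncate.
    return static_cast<std::uint8_t>(dword >> (index_low << 3));
}
```

The high bits of the index pick the 32-bit element and the low two bits pick the byte within it, which is why the intrinsic version needs one permute followed by one scalar shift.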

Destructured iteration over variadic arguments like a tuple sequence in D

Let's say I want to process a variadic function which alternately gets passed start and end values of 1 or more intervals and it should return a range of random values in those intervals. You can imagine the input to be a flattened sequence of tuples, all tuple elements spread over one single range.
import std.meta; //variadic template predicates
import std.traits : isFloatingPoint;
import std.range;
auto randomIntervals(T = U[0], U...)(U intervals)
if (U.length/2 > 0 && isFloatingPoint!T && NoDuplicates!U.length == 1) {
import std.random : uniform01;
T[U.length/2] randomValues;
// split and iterate over subranges of size 2
foreach(i, T start, T end; intervals.chunks(2)) { //= intervals.slide(2,2)
randomValues[i] = uniform01 * (end - start) + start;
}
return randomValues.dup;
}
The example is not important, I only use it for explanation. The chunk size could be any finite positive size_t, not only 2 and changing the chunk size should only require changing the number of loop-variables in the foreach loop.
In this form above it will not compile since it would only expect one argument (a range) to the foreach loop. What I would like is something which rather automatically uses or infers a sliding-window as a tuple, derived from the number of given loop-variables, and fills the additional variables with next elements of the range/array + allows for an additional index, optionally. According to the documentation a range of tuples allows destructuring of the tuple elements in place into foreach-loop-variables so the first thing, I thought about, is turning a range into a sequence of tuples but didn't find a convenience function for this.
Is there a simple way to loop over destructured subranges (with such a simplicity as shown in my example code) together with the index? Or is there a (standard library) function which does this job of splitting a range into enumerated tuples of equal size? How to easily turn the range of subranges into a range of tuples?
Is it possible with std.algorithm.iteration.map in this case (EDIT: with a simple function argument to map and without accessing tuple elements)?
EDIT: I want to ignore the last chunk which doesn't fit into the entire tuple. It just is not iterated over.
EDIT: It's not, that I couldn't program this myself, I only hope for a simple notation because this use case of looping over multiple elements is quite useful. If there is something like a "spread" or "rest" operator in D like in JavaScript, please let me know!
Thank you.
(Added as a separate answer because it's significantly different from my previous answer, and wouldn't fit in a comment)
After reading your comments and the discussion on the answers thus far, it seems to me what you seek is something like the below staticChunks function:
unittest {
import std.range : enumerate;
size_t index = 0;
foreach (i, a, b, c; [1,2,3,1,2,3].staticChunks!3.enumerate) {
assert(a == 1);
assert(b == 2);
assert(c == 3);
assert(i == index);
++index;
}
}
import std.range : isInputRange;
auto staticChunks(size_t n, R)(R r) if (isInputRange!R) {
import std.range : chunks;
import std.algorithm : map, filter;
return r.chunks(n).filter!(a => a.length == n).map!(a => a.tuplify!n);
}
auto tuplify(size_t n, R)(R r) if (isInputRange!R) {
import std.meta : Repeat;
import std.range : ElementType;
import std.typecons : Tuple;
import std.array : front, popFront, empty;
Tuple!(Repeat!(n, ElementType!R)) result;
static foreach (i; 0..n) {
result[i] = r.front;
r.popFront();
}
assert(r.empty);
return result;
}
Note that this also deals with the last chunk being a different size, if only by silently throwing it away. If this behavior is undesirable, remove the filter, and deal with it inside tuplify (or don't, and watch the exceptions roll in).
chunks and slide return Ranges, not tuples. Their last element can contain less than the specified size, whereas tuples have a fixed compile time size.
If you need destructuring, you have to implement your own chunks/slide that return tuples. To explicitly add an index to the tuple, use enumerate. Here is an example:
import std.typecons, std.stdio, std.range;
Tuple!(int, int)[] pairs(){
return [
tuple(1, 3),
tuple(2, 4),
tuple(3, 5)
];
}
void main(){
foreach(size_t i, int start, int end; pairs.enumerate){
writeln(i, ' ', start, ' ', end);
}
}
Edit:
As BioTronic said using map is also possible:
foreach(i, start, end; intervals
.chunks(2)
.map!(a => tuple(a[0], a[1]))
.enumerate){
Your question has me a little confused, so I'm sorry if I've misunderstood. What you're basically asking is if foreach(a, b; [1,2,3,4].chunks(2)) could work, right?
The simple solution here is to, as you say, map from chunk to tuple:
import std.typecons : tuple;
import std.algorithm : map;
import std.range : chunks;
import std.stdio : writeln;
unittest {
pragma(msg, typeof([1,2].chunks(2).front));
foreach(a, b; [1,2,3,4].chunks(2).map!(a => tuple(a[0], a[1]))) {
writeln(a, ", ", b);
}
}
At the same time as BioTronic, I tried to code my own solution to this problem (tested on DMD). My solution works for slices (BUT NOT fixed-size arrays) and avoids a call to filter:
import std.range : chunks, isInputRange, enumerate;
import std.range : isRandomAccessRange; //changed from "hasSlicing" to "isRandomAccessRange" thanks to BioTronic
import std.traits : isIterable;
/** turns chunks into tuples */
template byTuples(size_t N, M)
if (isRandomAccessRange!M) { //EDITED
import std.meta : Repeat;
import std.typecons : Tuple;
import std.traits : ForeachType;
alias VariableGroup = Tuple!(Repeat!(N, ForeachType!M)); //Tuple of N repetitions of M's Foreach-iterated Type
/** turns N consecutive array elements into a Variable Group */
auto toTuple (Chunk)(Chunk subArray) @nogc @safe pure nothrow
if (isInputRange!Chunk) { //Chunk must be indexable
VariableGroup nextLoopVariables; //fill the tuple with static foreach loop
static foreach(index; 0 .. N) {
static if ( isRandomAccessRange!Chunk ) { // add cases for other ranges here
nextLoopVariables[index] = subArray[index];
} else {
nextLoopVariables[index] = subArray.front; // popFront() returns void, so read front first
subArray.popFront();
}
}
return nextLoopVariables;
}
/** returns a range of VariableGroups */
auto byTuples(M array) @safe pure nothrow {
import std.algorithm.iteration : map;
static if(!isInputRange!M) {
static assert(0, "Cannot call map() on fixed-size array.");
// auto varGroups = array[].chunks(N); //fixed-size arrays aren't slices by default and cannot be treated like ranges
//WARNING! invoking "map" on a chunk range from fixed-size array will fail and access wrong memory with no warning or exception despite @safe!
} else {
auto varGroups = array.chunks(N);
}
//remove last group if incomplete
if (varGroups.back.length < N) varGroups.popBack();
//NOTE! I don't know why but `map!toTuple` DOES NOT COMPILE! And will cause a template compilation mess.
return varGroups.map!(chunk => toTuple(chunk)); //don't know if it uses GC
}
}
void main() {
testArrayToTuples([1, 3, 2, 4, 5, 7, 9]);
}
// Order of template parameters is relevant.
// You must define parameters implicitly at first to be associated with a template specialization
void testArrayToTuples(U : V[], V)(U arr) {
double[] randomNumbers = new double[arr.length / 2];
// generate random numbers
foreach(i, double x, double y; byTuples!2(arr).enumerate ) { //cannot use UFCS with "byTuples"
import std.random : uniform01;
randomNumbers[i] = (uniform01 * (y - x) + x);
}
foreach(n; randomNumbers) { //'n' apparently works despite shadowing a template parameter
import std.stdio : writeln;
writeln(n);
}
}
Using elementwise operations with the slice operator would not work here because uniform01 in uniform01 * (ends[] - starts[]) + starts[] would only be called once and not multiple times.
EDIT: I also tested some online compilers for D for this code and it's weird that they behave differently for the same code. For compilation of D I can recommend
https://run.dlang.io/ (I would be very surprised if this one wouldn't work)
https://www.mycompiler.io/new/d (but a bit slow)
https://ideone.com (it works but it makes your code public! Don't use with protected code.)
but those didn't work for me:
https://tio.run/#d2 (didn't finish compilation in one case, otherwise wrong results on execution even when using dynamic array for the test)
https://www.tutorialspoint.com/compile_d_online.php (doesn't compile the static foreach)

How do I find the SourceLocation of the commas between function arguments using libtooling?

My main goal is trying to get macros (or even just the text) before function parameters. For example:
void Foo(_In_ void* p, _Out_ int* x, _Out_cap_(2) int* y);
I need to gracefully handle things like macros that declare parameters (by ignoring them).
#define Example _In_ int x
void Foo(Example);
I've looked at Preprocessor record objects and used Lexer::getSourceText to get the macro names _In_, _Out_, etc., but I don't see a clean way to map them back to the function parameters.
My current solution is to record all the macro expansions in the file and then compare their SourceLocation to the ParamVarDecl SourceLocation. This mostly works except I don't know how to skip over things after the parameter.
void Foo(_In_ void* p _Other_, _In_ int y);
Getting the SourceLocation of the comma would work, but I can't find that anywhere.
The title of the question asks for libclang, but as you use Lexer::getSourceText I assume that it's libTooling. The rest of my answer is viable only in terms of libTooling.
Solution 1
Lexer works on the level of tokens. Comma is also a token, so you can take the end location of a parameter and fetch the next token using Lexer::findNextToken.
Here are ParmVarDecl (for function parameters) and CallExpr (for function arguments) visit functions that show how to use it:
template <class T> void printNextTokenLocation(T *Node) {
auto NodeEndLocation = Node->getSourceRange().getEnd();
auto &SM = Context->getSourceManager();
auto &LO = Context->getLangOpts();
auto NextToken = Lexer::findNextToken(NodeEndLocation, SM, LO);
if (!NextToken) {
return;
}
auto NextTokenLocation = NextToken->getLocation();
llvm::errs() << NextTokenLocation.printToString(SM) << "\n";
}
bool VisitParmVarDecl(ParmVarDecl *Param) {
printNextTokenLocation(Param);
return true;
}
bool VisitCallExpr(CallExpr *Call) {
for (auto *Arg : Call->arguments()) {
printNextTokenLocation(Arg);
}
return true;
}
For the following code snippet:
#define FOO(x) int x
#define BAR float d
#define MINUS -
#define BLANK
void foo(int a, double b ,
FOO(c) , BAR) {}
int main() {
foo( 42 ,
36.6 , MINUS 10 , BLANK 0.0 );
return 0;
}
it produces the following output (six locations for commas and two for parentheses):
test.cpp:6:15
test.cpp:6:30
test.cpp:7:19
test.cpp:7:24
test.cpp:10:17
test.cpp:11:12
test.cpp:11:28
test.cpp:11:43
This is quite a low-level and error-prone approach though. However, you can change the way you solve the original problem.
Solution 2
Clang stores information about expanded macros in its source locations. You can find related methods in SourceManager (for example, isMacroArgExpansion or isMacroBodyExpansion). As a result, you can visit ParmVarDecl nodes and check their locations for macro expansions.
I would strongly advise moving in the second direction.
I hope this information will be helpful. Happy hacking with Clang!
UPD: Speaking of attributes, unfortunately you won't have a lot of choices. Clang ignores any unknown attribute and this behaviour is not tweakable. If you don't want to patch Clang itself and add your attributes to Attrs.td, then you are indeed limited to tokens and the first approach.

Read cv::Mat pixel without knowing its pixel format

I am aware there are several ways to read and write a pixel value of an OpenCV cv::Mat image/matrix.
A common one is the .at<typename T>(int, int) method http://opencv.itseez.com/2.4/modules/core/doc/basic_structures.html#mat-at .
However, this requires the typename to be known, for instance .at<double>.
The same thing applies to more direct pointer access (see "OpenCV get pixel channel value from Mat image").
How can I read a pixel value without knowing its type? For instance, it would be ok to receive a more generic CvScalar value in return. Efficiency is not an issue, as I would like to read rather small matrices.
Kind of. You can construct cv::Mat_ and provide explicit type for elements, after that you don't have to write element type each time. Quoting opencv2/core/mat.hpp
While Mat is sufficient in most cases, Mat_ can be more convenient if you use a lot of element
access operations and if you know matrix type at the compilation time. Note that
Mat::at(int y,int x) and Mat_::operator()(int y,int x) do absolutely the same
and run at the same speed, but the latter is certainly shorter.
Mat_ and Mat are very similar. Again quote from mat.hpp:
The class Mat_<_Tp> is a thin template wrapper on top of the Mat class. It does not have any
extra data fields. Nor this class nor Mat has any virtual methods. Thus, references or pointers to
these two classes can be freely but carefully converted one to another.
You can use it like this
Mat_<Vec3b> dummy(3,3);
dummy(1, 2)[0] = 10;
dummy(1, 2)[1] = 20;
dummy(1, 2)[2] = 30;
cout << dummy(1, 2) << endl;
Why did I say 'kind of' in the first place? Because if you want to pass this Mat_ somewhere, you have to specify its type. Like this:
void test(Mat_<Vec3b>& arr) {
arr(1, 2)[0] = 10;
arr(1, 2)[1] = 20;
arr(1, 2)[2] = 30;
cout << arr(1, 2) << endl;
}
...
Mat_<Vec3b> dummy(3,3);
test(dummy);
Technically, you are not specifying your type during a pixel read, but actually you still have to know it and cast the Mat to the appropriate type beforehand.
I guess you can find a way around this using some low-level hacks (for example make a method that reads Mat's type, calculates element size and stride, and then accesses raw data using pointer arithmetic and casting...). But I don't know any 'clean' way to do this using OpenCV's functionality.
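The low-level route mentioned above (read the element type, compute element size and stride, then index the raw bytes) could look roughly like the following. This is a standard-library sketch of the idea, not OpenCV code: Depth, RawImage, elemSize1, loadAs and this valueAt are hypothetical names, only four element types are modelled, and at most four channels are assumed.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical element-type tag, standing in for the depth part of a matrix type.
enum class Depth { U8, S16, F32, F64 };

// Hypothetical raw image description (not an OpenCV type).
struct RawImage {
    int rows, cols, channels;          // channels is assumed to be 1..4
    Depth depth;
    std::size_t step;                  // bytes per row, may include padding
    std::vector<unsigned char> data;
};

// Size in bytes of a single channel value.
inline std::size_t elemSize1(Depth d) {
    switch (d) {
        case Depth::U8:  return 1;
        case Depth::S16: return 2;
        case Depth::F32: return 4;
        case Depth::F64: return 8;
    }
    return 0;
}

// Reads one channel value of type T from raw bytes and widens it to double.
template <typename T>
double loadAs(const unsigned char* p) {
    T v;
    std::memcpy(&v, p, sizeof v);      // memcpy avoids alignment and aliasing problems
    return static_cast<double>(v);
}

// Returns up to four channel values at (row, col); unused slots stay 0.
inline std::array<double, 4> valueAt(const RawImage& img, int row, int col) {
    assert(row >= 0 && row < img.rows && col >= 0 && col < img.cols);
    assert(img.channels >= 1 && img.channels <= 4);
    const std::size_t es1 = elemSize1(img.depth);
    const unsigned char* px = img.data.data()
        + static_cast<std::size_t>(row) * img.step
        + static_cast<std::size_t>(col) * static_cast<std::size_t>(img.channels) * es1;
    std::array<double, 4> out{};
    for (int c = 0; c < img.channels; ++c) {
        const unsigned char* p = px + static_cast<std::size_t>(c) * es1;
        switch (img.depth) {
            case Depth::U8:  out[c] = loadAs<std::uint8_t>(p); break;
            case Depth::S16: out[c] = loadAs<std::int16_t>(p); break;
            case Depth::F32: out[c] = loadAs<float>(p);        break;
            case Depth::F64: out[c] = loadAs<double>(p);       break;
        }
    }
    return out;
}
```

The type is still inspected, but only once inside valueAt, so callers never have to name it. With a real cv::Mat the same quantities would come from the matrix header rather than from this stand-in struct.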
If you already know the type, you can use Mat_<> type for easy access. If you don't know the type, you can:
convert the data to double, so data won't be truncated in any case
switch over the number of channels to access the double matrix correctly. Note that you can have at most 4 channels, since Scalar has at most 4 elements.
The following code will convert only the selected element of the source matrix to a double value (with N channels).
You get a Scalar containing the value at position row, col in the source matrix.
#include <opencv2/opencv.hpp>
#include <iostream>
using namespace std;
using namespace cv;
Scalar valueAt(const Mat& src, int row, int col)
{
Mat dst;
src(Rect(col, row, 1, 1)).convertTo(dst, CV_64F);
switch (dst.channels())
{
case 1: return dst.at<double>(0);
case 2: return dst.at<Vec2d>(0);
case 3: return dst.at<Vec3d>(0);
case 4: return dst.at<Vec4d>(0);
}
return Scalar();
}
int main()
{
Mat m(3, 3, CV_32FC3); // You can use any type here
randu(m, Scalar(0, 0, 0, 0), Scalar(256, 256, 256, 256));
Scalar val = valueAt(m, 1, 2);
cout << val << endl;
return 0;
}
