Search for sequence in Uint8List - dart

Is there a fast (native) method to search for a sequence in a Uint8List?
///
/// Return index of first occurrence of seq in list
///
int indexOfSeq(Uint8List list, Uint8List seq) {
...
}
EDIT: Changed List<int> into Uint8List

No. There is no built-in way to search for a sequence of elements in a list.
I am also not aware of any dart:ffi based implementations.
The simplest approach would be:
extension IndexOfElements<T> on List<T> {
int indexOfElements(List<T> elements, [int start = 0]) {
if (elements.isEmpty) return start;
var end = length - elements.length;
if (start > end) return -1;
var first = elements.first;
var pos = start;
while (true) {
pos = indexOf(first, pos);
if (pos < 0 || pos > end) return -1;
for (var i = 1; i < elements.length; i++) {
if (this[pos + i] != elements[i]) {
pos++;
continue;
}
}
return pos;
}
}
}
This has worst-case time complexity O(length*elements.length). There are several more algorithms with better worst-case complexity, but they also have larger constant factors and more expensive pre-computations (KMP, BMH). Unless you search for the same long list several times, or do so in a very, very long list, they're unlikely to be faster in practice (and they'd probably have an API where you compile the pattern first, then search with it.)

You could use dart:ffi to bind to memmem from string.h as you suggested.
We do the same with binding to malloc from stdlib.h in package:ffi (source).
final DynamicLibrary stdlib = Platform.isWindows
? DynamicLibrary.open('kernel32.dll')
: DynamicLibrary.process();
final PosixMalloc posixMalloc =
stdlib.lookupFunction<Pointer Function(IntPtr), Pointer Function(int)>('malloc');
Edit: as lrn pointed out, we cannot expose the inner data pointer of a Uint8List at the moment, because the GC might relocate it.
One could use dart_api.h and use the FFI to pass TypedData through the FFI trampoline as Dart_Handle and use Dart_TypedDataAcquireData from the dart_api.h to access the inner data pointer.
(If you want to use this in Flutter, we would need to expose Dart_TypedDataAcquireData and Dart_TypedDataReleaseData in dart_api_dl.h https://github.com/dart-lang/sdk/issues/40607 I've filed https://github.com/dart-lang/sdk/issues/44442 to track this.)
Alternatively, could address https://github.com/dart-lang/sdk/issues/36707 so that we could just expose the inner data pointer of a Uint8List directly in the FFI trampoline.

Related

collecting Stream results in ClassCastException

I'm experiencing a ClassCastException with the following piece of code.
var temp = set.stream().flatMap(Arrays::stream).toArray(Token[]::new);
I also tried collecting into a set, but I got the same error.
var temp = set.stream().flatMap(Arrays::stream).collect(Collectors.toSet());
set is declared as a Set<Token[]>, but when debugging it says Set<Object[]>.
Edit:
I've narrowed down the issue to the method subsetsWithMinSize. Eventhough I call it with:
new Sets<Token>().subsetsWithMinSize(value, x);
it seems to return a Set<Object[]>. Can someone tell me why that is and how to fix it?
Here is the method subsetsWithMinSize:
public Set<T[]> subsetsWithMinSize(List<T> list, int min) {
Set<T[]> res = new HashSet<>();
for (int i = 0; i <= list.size() - min; i++)
for (int j = 0; list.size() - j >= i + min; j++) {
List<T> temp = new ArrayList<>();
for (int k = i; k < list.size() - j; k++) temp.add(list.get(k));
res.add((T[]) temp.toArray());
}
return res;
}
So after you guys helped me to minimise the search scope I found the issue.
When I was using (T[]) to cast it wouldn't work, because after type erasure T would just be cast to an Object. Here's the thread I found the answer on: How to properly return generic array in Java generic method?.
So to finally correct the code I had to change the method a little by using Array.newInstance() and put in Class<T> to make it work:
public Set<T[]> subsetsWithMinSize(Class<T> clazz, List<T> list, int min) {
Set<T[]> res = new HashSet<>();
for (int i = 0; i <= list.size() - min; i++)
for (int j = 0; list.size() - j >= i + min; j++) {
List<T> temp = new ArrayList<>();
for (int k = i; k < list.size() - j; k++)
temp.add(list.get(k));
var arr = (T[]) Array.newInstance(clazz, temp.size());
IntStream.range(0, temp.size()).forEach(k -> arr[k] = temp.get(k));
res.add(arr);
}
return res;
}
It uses Array.newInstance() to make T[] (cast is still necessary for the compiler) and then just copies temp into that array.
Another option would be to use lists instead of arrays, which is the better way.
Reference from Effective Java:
In Summary, arrays and generics have very different type rules. Arrays are covariant and reified; generics are invariant and erased. As a consequcne, arrays provide runtime type safety but not compile-time type safety and vice versa for generics. Generally speaking, arrays and generics don’t mix well. If you find yourself mixing them and getting compile-time error or warnings, your first impulse should be to replace the arrays with lists.
The issue with that is that, for my use at least, lists use an unnecessarily big amount of space. Combined with the fact that I am making a powerset of the results later everything would just run super slow.

How to ask Clang++ not to cache function result during -O3 optimization?

This is my code:
int foo(int x) {
return x + 1; // I have more complex code here
}
int main() {
int s = 0;
for (int i = 0; i < 1000000; ++i) {
s += foo(42);
}
}
Without -O3 this code works for a few minutes. With -O3 it returns the same result in no time. Clang++, I believe, caches the value of foo(42) (it's a pure function) and doesn't call it a million times. How can I instruct it NOT to apply this particular optimization for this particular function call?
Out of curiosity, can you share why you would want to disable that optimization?
Anyway, about your question:
In your example code, s is never read after the loop, so the compiler would throw the whole loop away. So let's assume that s is used after the loop.
I'm not aware of any pragmas or compiler options to disable a particular optimization in a particular section of code.
Is changing the code an option?
To prevent that optimization in a portable manner, you can look for a creative way to compute the function call argument in a way such that the compiler is no longer able to treat the argument as constant. Of course the challenge here is to actually use a trick that does not rely on undefined behavior and that cannot be "outsmarted" by a newer compiler version.
See the commented example below.
pro: you use a trick that uses only the language that you can apply selectively
con: you get an additional memory access in every loop iteration; however, the access will be satisfied by your CPU cache most of the time
I verified the generated assembly for your particular example with clang++ -O3 -S. The compiler now generates your loop and no longer caches the result. However, the function gets inlined. If you want to prevent that as well, you can declare foo with __attribute__((noinline)), for example.
int foo(int x) {
return x + 1; // I have more complex code here
}
volatile int dummy = 0; // initialized to 0 and never changed
int main() {
int s = 0;
for (int i = 0; i < 1000000; ++i) {
// Because of the volatile variable, the compiler is forced to assume
// that the function call argument is different for each loop
// iteration and it is no longer able to use a cached result.
s += foo(42 + dummy);
}
}

Destructured iteration over variadic arguments like a tuple sequence in D

Let's say I want to process a variadic function which alternately gets passed start and end values of 1 or more intervals and it should return a range of random values in those intervals. You can imagine the input to be a flattened sequence of tuples, all tuple elements spread over one single range.
import std.meta; //variadic template predicates
import std.traits : isFloatingPoint;
import std.range;
auto randomIntervals(T = U[0], U...)(U intervals)
if (U.length/2 > 0 && isFloatingPoint!T && NoDuplicates!U.length == 1) {
import std.random : uniform01;
T[U.length/2] randomValues;
// split and iterate over subranges of size 2
foreach(i, T start, T end; intervals.chunks(2)) { //= intervals.slide(2,2)
randomValues[i] = uniform01 * (end - start) + start,
}
return randomValues.dup;
}
The example is not important, I only use it for explanation. The chunk size could be any finite positive size_t, not only 2 and changing the chunk size should only require changing the number of loop-variables in the foreach loop.
In this form above it will not compile since it would only expect one argument (a range) to the foreach loop. What I would like is something which rather automatically uses or infers a sliding-window as a tuple, derived from the number of given loop-variables, and fills the additional variables with next elements of the range/array + allows for an additional index, optionally. According to the documentation a range of tuples allows destructuring of the tuple elements in place into foreach-loop-variables so the first thing, I thought about, is turning a range into a sequence of tuples but didn't find a convenience function for this.
Is there a simple way to loop over destructured subranges (with such a simplicity as shown in my example code) together with the index? Or is there a (standard library) function which does this job of splitting a range into enumerated tuples of equal size? How to easily turn the range of subranges into a range of tuples?
Is it possible with std.algorithm.iteration.map in this case (EDIT: with a simple function argument to map and without accessing tuple elements)?
EDIT: I want to ignore the last chunk which doesn't fit into the entire tuple. It just is not iterated over.
EDIT: It's not, that I couldn't program this myself, I only hope for a simple notation because this use case of looping over multiple elements is quite useful. If there is something like a "spread" or "rest" operator in D like in JavaScript, please let me know!
Thank you.
(Added as a separate answer because it's significantly different from my previous answer, and wouldn't fit in a comment)
After reading your comments and the discussion on the answers thus far, it seems to me what you seek is something like the below staticChunks function:
unittest {
import std.range : enumerate;
size_t index = 0;
foreach (i, a, b, c; [1,2,3,1,2,3].staticChunks!3.enumerate) {
assert(a == 1);
assert(b == 2);
assert(c == 3);
assert(i == index);
++index;
}
}
import std.range : isInputRange;
auto staticChunks(size_t n, R)(R r) if (isInputRange!R) {
import std.range : chunks;
import std.algorithm : map, filter;
return r.chunks(n).filter!(a => a.length == n).map!(a => a.tuplify!n);
}
auto tuplify(size_t n, R)(R r) if (isInputRange!R) {
import std.meta : Repeat;
import std.range : ElementType;
import std.typecons : Tuple;
import std.array : front, popFront, empty;
Tuple!(Repeat!(n, ElementType!R)) result;
static foreach (i; 0..n) {
result[i] = r.front;
r.popFront();
}
assert(r.empty);
return result;
}
Note that this also deals with the last chunk being a different size, if only by silently throwing it away. If this behavior is undesirable, remove the filter, and deal with it inside tuplify (or don't, and watch the exceptions roll in).
chunks and slide return Ranges, not tuples. Their last element can contain less than the specified size, whereas tuples have a fixed compile time size.
If you need destructuring, you have to implement your own chunks/slide that return tuples. To explicitly add an index to the tuple, use enumerate. Here is an example:
import std.typecons, std.stdio, std.range;
Tuple!(int, int)[] pairs(){
return [
tuple(1, 3),
tuple(2, 4),
tuple(3, 5)
];
}
void main(){
foreach(size_t i, int start, int end; pairs.enumerate){
writeln(i, ' ', start, ' ', end);
}
}
Edit:
As BioTronic said using map is also possible:
foreach(i, start, end; intervals
.chunks(2)
.map!(a => tuple(a[0], a[1]))
.enumerate){
Your question has me a little confused, so I'm sorry if I've misunderstood. What you're basically asking is if foreach(a, b; [1,2,3,4].chunks(2)) could work, right?
The simple solution here is to, as you say, map from chunk to tuple:
import std.typecons : tuple;
import std.algorithm : map;
import std.range : chunks;
import std.stdio : writeln;
unittest {
pragma(msg, typeof([1,2].chunks(2).front));
foreach(a, b; [1,2,3,4].chunks(2).map!(a => tuple(a[0], a[1]))) {
writeln(a, ", ", b);
}
}
At the same time with BioTronic, I tried to code some own solution to this problem (tested on DMD). My solution works for slices (BUT NOT fixed-size arrays) and avoids a call to filter:
import std.range : chunks, isInputRange, enumerate;
import std.range : isRandomAccessRange; //changed from "hasSlicing" to "isRandomAccessRange" thanks to BioTronics
import std.traits : isIterable;
/** turns chunks into tuples */
template byTuples(size_t N, M)
if (isRandomAccessRange!M) { //EDITED
import std.meta : Repeat;
import std.typecons : Tuple;
import std.traits : ForeachType;
alias VariableGroup = Tuple!(Repeat!(N, ForeachType!M)); //Tuple of N repititions of M's Foreach-iterated Type
/** turns N consecutive array elements into a Variable Group */
auto toTuple (Chunk)(Chunk subArray) #nogc #safe pure nothrow
if (isInputRange!Chunk) { //Chunk must be indexable
VariableGroup nextLoopVariables; //fill the tuple with static foreach loop
static foreach(index; 0 .. N) {
static if ( isRandomAccessRange!Chunk ) { // add cases for other ranges here
nextLoopVariables[index] = subArray[index];
} else {
nextLoopVariables[index] = subArray.popFront();
}
}
return nextLoopVariables;
}
/** returns a range of VariableGroups */
auto byTuples(M array) #safe pure nothrow {
import std.algorithm.iteration : map;
static if(!isInputRange!M) {
static assert(0, "Cannot call map() on fixed-size array.");
// auto varGroups = array[].chunks(N); //fixed-size arrays aren't slices by default and cannot be treated like ranges
//WARNING! invoking "map" on a chunk range from fixed-size array will fail and access wrong memory with no warning or exception despite #safe!
} else {
auto varGroups = array.chunks(N);
}
//remove last group if incomplete
if (varGroups.back.length < N) varGroups.popBack();
//NOTE! I don't know why but `map!toTuple` DOES NOT COMPILE! And will cause a template compilation mess.
return varGroups.map!(chunk => toTuple(chunk)); //don't know if it uses GC
}
}
void main() {
testArrayToTuples([1, 3, 2, 4, 5, 7, 9]);
}
// Order of template parameters is relevant.
// You must define parameters implicitly at first to be associated with a template specialization
void testArrayToTuples(U : V[], V)(U arr) {
double[] randomNumbers = new double[arr.length / 2];
// generate random numbers
foreach(i, double x, double y; byTuples!2(arr).enumerate ) { //cannot use UFCS with "byTuples"
import std.random : uniform01;
randomNumbers[i] = (uniform01 * (y - x) + x);
}
foreach(n; randomNumbers) { //'n' apparently works despite shadowing a template parameter
import std.stdio : writeln;
writeln(n);
}
}
Using elementwise operations with the slice operator would not work here because uniform01 in uniform01 * (ends[] - starts[]) + starts[] would only be called once and not multiple times.
EDIT: I also tested some online compilers for D for this code and it's weird that they behave differently for the same code. For compilation of D I can recommend
https://run.dlang.io/ (I would be very surprised if this one wouldn't work)
https://www.mycompiler.io/new/d (but a bit slow)
https://ideone.com (it works but it makes your code public! Don't use with protected code.)
but those didn't work for me:
https://tio.run/#d2 (didn't finish compilation in one case, otherwise wrong results on execution even when using dynamic array for the test)
https://www.tutorialspoint.com/compile_d_online.php (doesn't compile the static foreach)

Is there anything like a struct in dart?

In javascript it always bothered me people use objects as vectors like {x: 1, y: 2} instead of using an array [1,2]. Access time for the array is much faster than the object but accessing by index is more confusing especially if you need a large array. I know dart has fixed arrays but is there a way to name the offsets of an array like you would a struct or a tuple/record in another language? Define enum/constants maybe?
I'd want something like
List<int> myVector = new List([x,y]);
myVector.x = 5;
is there an equivalent or idiomatic way to do this?
That sounds like a class.
class MyVector {
int x;
int y;
MyVector(this.x, this.y);
}
There is no simpler and more efficient way to create a name-indexed structure at runtime. For simplicity you could usually use a Map, but it's not as efficient as a real class.
A class should be at least as efficient (time and memory) as a fixed length list, after all it doesn't have to do an index bounds check.
In Dart 3.0, the language will introduce records. At that point, you can use a record with named fields instead of creating a primitive class:
var myVector = (x: 42, y: 37);
print(myVector.x);
A record is unmodifiable, so you won't be able to update the values after it has been created.
For me, i see 2 way to do this. I will sort by best in my point of view
Class based method
Here, the approach is to encapsulate your need, in a dedicated object
Pros:
It's encapsultate
You can propose several way to access variable, depend of the need
You can extend functionality without break everything
I love it :p
Cons
More time spend to create class, etc.
Do you really need what i say in pros ?
Maybe weird for js people
example :
class Vector {
int x;
int y;
static final String X = "x";
static final String Y = "y";
Vector({this.x, this.y});
Vector.fromList(List<int> listOfCoor) {
this.x = listOfCoor[0];
this.y = listOfCoor[1];
}
// Here i use String, but you can use [int] an redefine static final member
int operator[](String coor) {
if (coor == "x") {
return this.x;
} else if (coor == "y") {
return this.y;
} else {
// Need to be change by a more adapt exception :)
throw new Exception("Wrong coor");
}
}
}
void main() {
Vector v = new Vector(x: 5, y: 42);
Vector v2 = new Vector.fromList([12, 24]);
print(v.x); // print 5
print(v["y"]); // print 42
print(v2.x); // print 12
print(v2[Vector.Y]); // print 24
}
Enum based method:
You can also defined a "enum" (actually not really implement but will be in the future version) that will contains "shortcut" to your value
Pros
More simple to implement
Is more like your example ;p
Cons
Less extendable
i think is not very pretty
Not OOP think
example:
class Vector {
static final int x = 0;
static final int y = 1;
}
void main() {
List<int> myVector = new List(2);
myVector[Vector.x] = 5;
myVector[Vector.y] = 42;
}
Make your choice ;p
This is only possible with a class in Dart.
There are some open feature requests at http://dartbug.com
introduce struct (lightweight class)
Give us a way to structure Bytedata
If you have reasonably big data structure, you can use "dart:typed_data" as a model and provide lightweight view for the stored data. This way the overhead should be minimal.
For example, if you need 4X4 matrix of Uint8 values:
import "dart:typed_data";
import "dart:collection";
import "package:range/range.dart";
class Model4X4Uint8 {
final Uint8List _data;
static const int objectLength = 4 * 4;
final Queue<int> _freeSlotIndexes;
Model4X4Uint8(int length): _data = new Uint8List((length) * objectLength),
_freeSlotIndexes = new Queue<int>.from(range(0, length));
int get slotsLeft => _freeSlotIndexes.length;
num operator [](int index) => _data[index];
operator []=(int index, int val) => _data[index] = val;
int reserveSlot() =>
slotsLeft > 0 ? _freeSlotIndexes.removeFirst() : throw ("full");
void delete(int index) => _freeSlotIndexes.addFirst(index);
}
class Matrix4X4Uint8 {
final int offset;
final Model4X4Uint8 model;
const Matrix4X4Uint8(this.model, this.offset);
num operator [](int index) => model[offset + index];
operator []=(int index, int val) => model[offset + index] = val;
void delete() => model.delete(offset);
}
void main() {
final Model4X4Uint8 data = new Model4X4Uint8(100);
final Matrix4X4Uint8 mat = new Matrix4X4Uint8(data, data.reserveSlot())
..[14] = 10
..[12] = 256; //overlow;
print("${mat[0]} ${mat[4]} ${mat[8]} ${mat[12]} \n"
"${mat[1]} ${mat[5]} ${mat[9]} ${mat[13]} \n"
"${mat[2]} ${mat[6]} ${mat[10]} ${mat[14]} \n"
"${mat[3]} ${mat[7]} ${mat[11]} ${mat[15]} \n");
mat.delete();
}
But this is very low level solution and can easily create sneaky bugs with memory management and overflows.
You could also use an extension on List to create aliases to specific indexes.
Although it will be difficult to set up mutually exclusive aliases, in some cases, it may be a simple solution.
import 'package:test/test.dart';
extension Coordinates<V> on List<V> {
V get x => this[0];
V get y => this[1];
V get z => this[2];
}
void main() {
test('access by property', () {
var position = [5, 4, -2];
expect(position.x, 5);
expect(position.y, 4);
expect(position.z, -2);
});
}
The Tuple package https://pub.dev/packages/tuple might be what you are looking for when a class is too heavy.
import 'package:tuple/tuple.dart';
const point = Tuple2<int, int>(1, 2);
print(point.item1); // print 1
print(point.item2); // print 2

I cannot understand the effectiveness of an algorithm in the Dart SDK

I cannot understand the effectiveness of an algorithm in the Dart SDK.
Here is the algorithm (List factory in dart:core, file list.dart)
factory List.from(Iterable other, { bool growable: true }) {
List<E> list = new List<E>();
for (E e in other) {
list.add(e);
}
if (growable) return list;
int length = list.length;
List<E> fixedList = new List<E>(length);
for (int i = 0; i < length; i ) {
fixedList[i] = list[i];
}
return fixedList;
}
If growable is false then both lists will be created.
List<E> list = new List<E>();
List<E> fixedList = new List<E>(length);
But the creation of list #1 in this case is redundant because it's a duplicate of Iterable other. It just wastes CPU time and memory.
In this case this algorithm will be more efficient because it wont create an unnecessary list # 1 (growable is false).
factory List.from(Iterable other, { bool growable: true }) {
if(growable) {
List<E> list = new List<E>();
for (E e in other) {
list.add(e);
}
return list;
}
List<E> fixedList = new List<E>(other.length);
var i = 0;
for (E e in other) {
fixedList[i++] = e;
}
return fixedList;
}
Or am I wrong and missed some subtleties of programming?
We usually avoid invoking the length getter on iterables since it can have linear performance and side-effects. For Example:
List list = [1, 2, 3];
Iterable iterable1 = list.map((x) {
print(x);
return x + 1;
});
Iterable iterable2 = iterable1.where((x) => x > 2);
var fixedList = new List.from(iterable2, growable: false);
If List.from invoked the length getter it would run over all elements twice (where does not cache its result). It would furthermore execute the side-effect (printing 1, 2, 3) twice. For more information on Iterables look here.
Eventually we want to change the List.from code so that we avoid the second allocation and the copying. To do this we need (internal) functionality that transforms a growable list into a fixed-length list. Tracking bug: http://dartbug.com/9459
It looks like it was just an incremental update to the existing function.
See this commit and this diff
The function started just with
List<E> list = new List<E>();
for (E e in other) {
list.add(e);
}
and had some more bits added as part of a fairly major refactoring of numerous libraries.
I would say that the best thing to do is to raise a bug report on dartbug.com, and either add a patch, or commit a CL - see instructions here: https://code.google.com/p/dart/wiki/Contributing (Note, you do need to jump through some hoops first, but once you're set up, it's all good).
It might also be worth dropping a note to one of the committers or reviewers from the original commit to let them know your plans.

Resources