Nested fixed-size arrays/records - z3

I want to model fixed-size arrays that can contain records and other fixed-size arrays. I then want to model store and select accesses to them.
I currently use ArraySorts for the arrays and Datatypes for the records. However, according to this answer (arrays) and this answer (records), these aren't really intended for my use case. Is there anything else in Z3 I could use instead?
Background: I want to model pointers as they occur in the LLVM IR. For this, each pointer has a data array that represents the memory buffer into which it is pointing and an indices array that represents the indices used in getelementptr calls. Since the memory buffer could contain pointers or structs, I need to be able to nest the arrays (or store the records in arrays).
An example (in z3py) looks like this:
Vec3 = z3.Datatype("Vec3")
Vec3.declare("Vec3__init",
    ("x", z3.IntSort()),
    ("y", z3.IntSort()),
    ("z", z3.IntSort())
)
Vec3 = Vec3.create()

PointerVec3 = z3.Datatype("Pointer__Vec3")
PointerVec3.declare("Pointer__Vec3__init",
    ("data", z3.ArraySort(z3.BitVecSort(32), Vec3)),
    ("nindices", z3.IntSort()),
    ("indices", z3.ArraySort(z3.IntSort(), z3.BitVecSort(32)))
)
PointerVec3 = PointerVec3.create()

Arrays and records are the only way to model these things in z3/SMT-Lib. And they can be nested as you wish.
It is true that SMTLib arrays are not quite like the arrays you find in regular programming languages. But it is not true that they are always unbounded: their size exactly matches the cardinality of their domain. If you want a bounded array, I recommend using an appropriate BitVec sort for your index. For an n-bit bitvector index, the array has exactly 2^n elements. Pick a small enough n and treat that as your array; this typically matches what you see in practice, since most such internal arrays have power-of-two sizes anyhow. (Moreover, solvers usually don't do well with large arrays, so sticking to a small power of two is a good idea, at least to start with.)
So, these are your only options. Stack Overflow works best if you try them out first and then ask about the specific problems you run into.

Related

Z3: Should I use Arrays, IntVectors, or something else?

I am wondering what datatype I should be using for my z3 application. My understanding is that the only options for integer-array-like data structures are Array(IntSort(), IntSort()) and IntVector().
Reasons I think Arrays are overkill: Each array element is only written once, I'm not doing anything like Store((Store(X, y, z1)), y, z2). In addition, each array has a predefined length of <= 256 (and each integer in the array is between 0 and 63).
Reasons I think IntVectors won't work: I want to use Int variables to index into the arrays. For instance, I might have z = Int('z'), some clauses constraining z, and then Or(arr[z] == 2, arr[z + 1] == 2). My understanding after playing around with z3 and reading up is that vectors don't support this.
Is there a way I can get the power of variable-indexing without having to use expensive Array operations?
If you have small arrays of fixed length with no symbolic index access, then I'd strongly recommend using an IntVector (see https://z3prover.github.io/api/html/namespacez3py.html#a7e166f891df8f17fd23290bec963b03c)
Note the important thing here is whether you need access with a symbolic index. (That is, do you always address your array with known constant indices, or do you need the ability to read/write to a symbolically addressed location.) From your description, it appears you always statically know the address, so IntVector is your best choice. If addresses can be symbolic, then you have to use good old SMTLib arrays, which are more costly.

Is an array of arrays cache local?

If an array
[1,2,3,4,5,6]
exists in a contiguous block of memory, does an array of arrays
[[1,2,3],[4,5,6]]
necessarily have the same cache locality?
In the array of arrays case, the arrays themselves will have the same locality properties that an array of elements has because this case is no different - the "elements" here are arrays. However, whether or not the elements of each sub-array are contiguous in memory to the elements of another sub-array depends on the implementation of the sub-arrays.
Since you didn't specify a language, I'll use C++ syntax to demonstrate, but this is fundamentally language-agnostic, as it deals with aspects of the hardware. If your array-of-arrays is the equivalent of a C++ std::vector<std::vector<int>> (a flexible container of containers that can grow and shrink as needed), each inner std::vector will be contiguous in memory with respect to the others, but a std::vector doesn't contain its elements directly. Rather, it's a wrapper around a dynamically allocated array, typically holding a few pointers to the underlying data. This means that the wrappers will be contiguous, but the elements won't necessarily be. Here's a live demo demonstrating this.
However, consider the case where your array-of-arrays is a std::array<std::array<int, 3>, 2> instead: simply an array of 2 elements, where each element is a fixed-size array of 3 ints. A std::array is a wrapper around a fixed-size C array which is not dynamically allocated, in contrast to its std::vector counterpart. In this case you get the same locality property as in the std::vector case, in that each std::array is contiguous with respect to the others, but here you also get something more. Since a std::array actually contains the underlying data rather than just a few pointers to it, the elements of each sub-array are also contiguous with respect to each other. This, too, can be seen here.
What this boils down to from a hardware perspective is simply how your objects are laid down in memory. In the former case, each wrapper is contiguous, but the data isn't, because the wrappers simply hold pointers to the data. In the latter case, each wrapper is contiguous, and so is the data, because each wrapper actually contains the data.
Perhaps if you specify which language you're referring to, I could help you with your specific case.

Increasing the length of a tuple in Erlang

How can I increase the length of a tuple in Erlang? For example, suppose Tup={1,2,3}, and now I want to add another element to it. Is there any way to do this?
Tuples are not supposed to be a flexible data structure. If you are resizing one often, you should consider other Erlang data structures such as lists, maps, or sets, depending on your needs. Here is a nice introduction to key-value stores.
But if you really have to extend that tuple, then you can use erlang:append_element/2:
{1,2,3,4} = erlang:append_element({1,2,3}, 4).
Tuples aren't mutable so you can't, strictly speaking, increase the length.
Generally, if you want a variable-number-of-things datatype, a tuple will be very inconvenient. For example, iterating over all elements of a list is highly idiomatic, while iterating over all elements of a tuple whose size is unknown at compile-time is a pain.
However, a common pattern is to get a tuple as a result from some function and return elements of that tuple plus additions.
country_coords(Name) ->
    {Lat, Lng} = find_address(Name),
    {_Street, _City, _Zip, Country} = geocode(Lat, Lng),
    {ok, Lat, Lng, Country}.
erlang:append_element(TupleToExtend, NewElement).

This is the built-in function, but tuples are not meant to be resized, so avoid using it unless there is no other way.

Optimal storage for string/integer pairs with fast lookup of strings?

I need to maintain correspondence between strings and integers, then lookup the string value and return the integer. What's the best structure to store this info that meets the following requirements:
Speed and memory size are important, in that order.
I don't want to reinvent the wheel and write my own sorting routine. A call to Sort(CompareFunction) is fine of course.
Conditions:
The integers are not guaranteed to be sequential, neither is there a 'start value' like 0 or 1
Number of data pairs can vary from 100 to 100000
The data are all read in at the beginning, there's no subsequent additions/deletions/modifications
FWIW the strings are the hex entry ID's that Outlook (MAPI?) uses to identify entries. Example: 00000000FE42AA0A18C71A10E8850B651C24000003000000040000000000000018000000000000001E7FDF4152B0E944BA66DFBF2C6A6416E4F52000487F22
There's so many options (TStringList (with objects or name/value pairs), TObjectList, TDictionary, ...) that I'd better ask for advice first...
I have read How can I search faster for name/value pairs in a Delphi TStringList? which suggest TDictionary for string/string pairs, and Sorting multidimensional array in Delphi 2007 which suggest TStringlist objects for string/integer but where sorting is done on the integers.
The second link that you include in the question is not applicable. That is a question concerning sorting rather than efficient lookup. Although you discuss sorting a number of times in your question, you do not have a requirement to sort. Your requirement is simply a dictionary, also known as an associative array. Of course, you can implement that by sorting an array and using binary search for your lookup, but sorting is not a requirement. You simply need an efficient dictionary.
Out of the box, the most efficient and convenient data structure for your problem is TDictionary<string, Integer>. This has lookup complexity of O(1) and so scales well for large collections. For smaller collections a binary search based lookup with lookup complexity of O(log n) can be competitive and can indeed outperform a dictionary.
Cosmin Prund wrote an excellent answer here on SO where he compared the performance of dictionary lookup against binary search based lookup. I recommend you have a read. I would say that for small containers, performance is probably not that big a problem for you. So even though binary search may be quicker, it probably does not matter because your performance is good either way. But performance probably becomes an issue for larger containers and that's where the dictionary is always stronger. For large enough containers, the performance of binary search may become unacceptable.
I'm sure that it is possible to produce more efficient implementations of dictionaries than the Embarcadero one, but I'd also say that the Embarcadero implementation is perfectly solid. It uses a decent hash function and does not have any glaring weaknesses.
In terms of memory complexity, there's little to choose between a dictionary and a sorted array. It's not possible to improve on a sorted array for memory use.
I suggest that you start with TDictionary<string, Integer> and only look beyond that if your performance requirements are not met.
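The trade-off is language-agnostic; a small Python sketch of the two strategies (hash-based dictionary vs. binary search over a sorted array, with made-up sample keys) may make it concrete:

```python
import bisect

pairs = [("00A1", 7), ("00B2", 3), ("00C3", 9)]  # string/integer pairs

# Strategy 1: hash-based dictionary, O(1) average-case lookup.
table = dict(pairs)

# Strategy 2: sort once at load time, then binary-search, O(log n) lookup.
sorted_pairs = sorted(pairs)
keys = [k for k, _ in sorted_pairs]

def lookup(key):
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return sorted_pairs[i][1]
    return None

print(table["00B2"], lookup("00B2"))  # 3 3
```

Since the question says all data is read in up front with no later modifications, the one-time sort cost of the second strategy is paid only once.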
It seems you are going to look up long, evenly distributed strings. One of the fastest data structures for this kind of problem is a trie.
But your dataset size is rather small, and ready-to-use Delphi solutions like THashedStringList or TDictionary (more convenient) would provide a fairly high speed.

F#: how can I set up array length in a type declaration

I can do (x : int array)
But I need only a 300-element array, so how do I write (x : int[300])?
I can't find this information on MSDN.
@Marcelo Cantos: No reason, but I've always used sized arrays. Why not?
No. The F# type system does not support types such as "array of size 300", and even if it did, using the type system to check potential array overflows at compile time is too impractical to implement.
Besides, "has exactly 300 elements" is a useless property in F# in almost all situations, because there is a wealth of functions and primitives that work on arrays of arbitrary size without any risk of overflow (map or iter, for instance). Why write code that works for 300 elements when you can just as easily write code that works for any number of elements?
If you really need to represent the "has exactly 300 elements" property, the simplest thing you could do is create a wrapper type around the native array type. This lets you restrict those operations that return arrays to only operations that respect the 300-element invariant (such as a map from another 300-element array, or a create where the length property is always 300). I'm afraid this isn't as simple as you hoped, but since F# does not natively support 300-element arrays, you will need to describe all the function invariants yourself.
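The wrapper idea itself is language-agnostic; here is a rough Python sketch of the shape such a type takes (an F# version would wrap the native array the same way, with the class name and operations chosen for illustration):

```python
class Fixed300:
    """Wrapper that maintains the 'exactly 300 elements' invariant."""
    SIZE = 300

    def __init__(self, items):
        items = list(items)
        # The invariant is checked once here; every other operation preserves it.
        if len(items) != self.SIZE:
            raise ValueError("Fixed300 requires exactly 300 elements")
        self._items = items

    @classmethod
    def create(cls, value):
        # Always produces SIZE elements by construction.
        return cls([value] * cls.SIZE)

    def map(self, f):
        # Mapping preserves length, so the result is again a valid Fixed300.
        return Fixed300([f(x) for x in self._items])

    def __getitem__(self, i):
        return self._items[i]

v = Fixed300.create(0).map(lambda x: x + 1)
print(v[0], len(v._items))  # 1 300
```

Only operations that provably preserve the length (create, map, indexing) are exposed, which is exactly the restriction the answer describes.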