Checking whether an array is a suffix array for any binary string

I'm currently trying to figure out whether a given array, which is a permutation of the numbers 1 to n, is a suffix array of any binary string.
For example, for n = 3, A = {2, 3, 1} is valid, since there exists the binary string [101] with [01] < [1] < [101] (using lexicographical ordering). However, {2, 1, 3} is not a valid suffix array, as there is no binary string for which that lexicographical order holds.
My current approach simply enumerates all binary strings of length n and checks whether the given array correctly orders the suffixes of each string. This is obviously rather slow, as there are 2^n candidate binary strings, each checked in O(n).
The obvious approach to this kind of problem would be to look for similarities among the valid suffix arrays; however, I've thus far only been able to deduce one property:
If A is a valid suffix array for a binary string, then prefixing A with 1 and incrementing all values of A yields a valid suffix array as well.
This property, however, does not help me validate suffix arrays that do not start with 1.
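For reference, here is a minimal Python sketch of the brute-force check described above (the function name is mine; 1-based suffix positions as in the examples):
from itertools import product

def is_valid_suffix_array(A):
    # Try every binary string of length n and test whether A
    # (a list of 1-based suffix start positions) orders its suffixes.
    n = len(A)
    for bits in product('01', repeat=n):
        s = ''.join(bits)
        suffixes = [s[i - 1:] for i in A]  # suffixes in the order A claims
        if all(suffixes[j] < suffixes[j + 1] for j in range(n - 1)):
            return True  # found a witness string
    return False

print(is_valid_suffix_array([2, 3, 1]))  # True ("101" is a witness)
print(is_valid_suffix_array([2, 1, 3]))  # False
With naive string comparisons each candidate costs up to O(n^2); the O(n) bound mentioned above requires a rank-based verification instead.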

Related

Writing variable-length sequence to a compound array

I am using compound datatypes with h5py, with some elements being variable-length arrays. I can't find a way to set such an item. The following MWE shows six different ways to do it (sequential indexing, which would not work in h5py anyway; fused indexing; read-modify-commit for columns/rows), none of which works.
What is the correct way? And why does h5py say Cannot change data-type for object array when writing an integer list to an int32 list?
import h5py
import numpy as np

with h5py.File('/tmp/test-vla.h5', 'w') as h5:
    dt = np.dtype([('a', h5py.vlen_dtype(np.dtype('int32')))])
    dset = h5.create_dataset('test', (5,), dtype=dt)
    dset['a'][2] = [1, 2, 3]  # does not write the value back
    dset[2]['a'] = [1, 2, 3]  # does not write the value back
    dset['a', 2] = [1, 2, 3]  # Cannot change data-type for object array
    dset[2, 'a'] = [1, 2, 3]  # Cannot change data-type for object array
    tmp = dset['a']; tmp[2] = [1, 2, 3]; dset['a'] = tmp  # Cannot change data-type for object array
    tmp = dset[2]; tmp['a'] = [1, 2, 3]; dset[2] = tmp  # 'list' object has no attribute 'dtype'
When working with compound datasets, I've discovered it's best to add all row data in a single statement. I tweaked your code to show how to add 3 rows of data (each of a different length). Note how I: 1) define each row of data with a tuple; 2) define the list of integers with np.array(); and 3) don't reference the field name ['a'].
import h5py
import numpy as np

with h5py.File('test-vla.h5', 'w') as h5:
    dt = np.dtype([('a', h5py.vlen_dtype(np.dtype('int32')))])
    dset = h5.create_dataset('test', (5,), dtype=dt)
    print(dset.dtype, dset.shape)
    dset[0] = (np.array([0, 1, 2]),)       # each row is a tuple; the vlen field is an np.array
    dset[1] = (np.array([1, 2, 3, 4]),)
    dset[2] = (np.array([0, 1, 2, 3, 4]),)
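Reading the rows back is a quick way to confirm the writes (a small check against the file created above):
with h5py.File('test-vla.h5', 'r') as h5:
    dset = h5['test']
    for i in range(3):
        print(i, dset[i]['a'])  # prints [0 1 2], then [1 2 3 4], then [0 1 2 3 4]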
For more info, take a look at this post on the HDF Group Forum under HDF5 Ancillary Tools / h5py:
Compound datatype with int, float and array of floats

How can we check whether duplicate elements are present in a table in Lua in minimum time complexity

Given a table of N integers, how do I check whether any element repeats? If one does, a message should be shown that the table has repeating elements. How can this be achieved in minimum time complexity?
A hash table is the way to go (i.e. a normal Lua table). Just loop over the integers and insert each one into the table as a key, but first check whether that key already exists. If it does, you have a repeated value. So something like:
local values = { 1, 2, 3, 4, 5, 1 } -- input values
local htab = {}
for _, v in ipairs(values) do
    if htab[v] then print('duplicate value: ' .. v)
    else htab[v] = true end
end
With small integer values the keys live in the table's array part, so access is O(1). With larger and therefore sparser values the keys end up in the hash part of the table, which can also be assumed to be O(1). And since you have N values to insert, the whole check is O(N).
Getting faster than O(N) should not be possible since you have to visit each value in the list at least once.

Query jsonb array for integer member

Background: we use PaperTrail to keep the history of our changing models. Now I want to query for an Item that belonged to a certain customer. PaperTrail optionally stores the object_changes, and I need to query this field to find out when something was created with this ID or changed to this ID.
My table looks simplified like this:
item_type | object_changes
----------|----------------------------------------------------------
"Item" | {"customer_id": [null, 5], "other": [null, "change"]}
"Item" | {"customer_id": [4, 5], "other": ["unrelated", "change"]}
"Item" | {"customer_id": [5, 6], "other": ["asht", "asht"]}
How do I query for elements changed from or to ID 5 (so all rows above)? I tried:
SELECT * FROM versions WHERE object_changes->'customer_id' ? 5;
Which got me:
ERROR: operator does not exist: jsonb ? integer
LINE 1: ...T * FROM versions WHERE object_changes->'customer_id' ? 5;
^
HINT: No operator matches the given name and argument type(s).
You might need to add explicit type casts.
For jsonb the contains operator @> does what you ask for:
Get all rows where the number 5 is an element of the "customer_id" array:
SELECT *
FROM versions
WHERE object_changes->'customer_id' @> '5';
The @> operator expects jsonb as its right operand, or a string literal that is valid for jsonb (while ? expects text). The unquoted numeric literal in your example (5) can be coerced to neither jsonb nor text; it defaults to integer. Hence the error message. Related:
No function matches the given name and argument types
PostgreSQL ERROR: function to_tsvector(character varying, unknown) does not exist
This can be supported with different index styles. For my query suggested above, use an expression index (specialized, small and fast):
CREATE INDEX versions_object_changes_customer_id_gin_idx ON versions
USING gin ((object_changes->'customer_id'));
This alternative query works, too:
SELECT * FROM versions WHERE object_changes @> '{"customer_id": [5]}';
And can be supported with a general index (more versatile, bigger, slower):
CREATE INDEX versions_object_changes_gin_idx ON versions
USING gin (object_changes jsonb_path_ops);
Related:
Index for finding an element in a JSON array
Query for array elements inside JSON type
According to the manual, the operator ? searches for any top-level key within the JSON value. Testing indicates that strings in arrays are considered "top-level keys", but numbers are not (keys have to be strings after all). So while this query would work:
SELECT * FROM versions WHERE object_changes->'other' ? 'asht';
Your query looking for a number in an array will not work (even when you quote the input string literal properly). It would only find the (quoted!) string "5", classified as a key, not the (unquoted) number 5, classified as a value.
Aside: standard JSON knows only 4 primitives: string, number, boolean and null. There is no integer primitive (even if I have heard of software adding one); integer is just a subset of number, which is implemented as numeric in Postgres:
https://www.postgresql.org/docs/current/static/datatype-json.html#JSON-TYPE-MAPPING-TABLE
So your question title is slightly misleading as there are no "integer" members, strictly speaking.
Use a lateral join and the jsonb_array_elements_text function to process each row's object_changes:
SELECT DISTINCT v.* FROM versions v
JOIN LATERAL jsonb_array_elements_text(v.object_changes->'customer_id') ids ON TRUE
WHERE ids.value::int = 5;
The DISTINCT is only necessary if the customer_id you're looking for could appear multiple times in the array (if a different field changed but customer_id is tracked anyway).

Understanding GameMonkey Script Mixed Arrays

I was just reading some introductory material on GameMonkey Script at https://www.gamedev.net/articles/programming/engines-and-middleware/introduction-to-gamemonkey-script-r3297/ . When they explain Mixed Arrays, they say you can access the elements using an index or a key, depending on how the value was declared. So, for example, if I have the following array
myMixedArray = table( 1, 3, 4, KeyV = "Test", 33);
then I can access 1, 3, 4 and 33 using the indices 0, 1, 2 and 3, and
to access "Test" I'll do it like this:
myMixedArray["KeyV"] // yields "Test"
Now, according to the image in the link above, the number expected at myTest[3] is 7, but that would mean that regular values and key-value elements are not really separated in the array.
If they really are separated, why would 7 be at index 3 of the array?
While you can treat a gm Table as an Array or Map, you can't effectively do both at the same time.
Internally, the Table is just a hash table, and your index access method is a bit like an iterator.
In your example, because value "Test" is assigned to key 'KeyV', it messes up the otherwise contiguous index order.
Hopefully that gives you an idea of the cause. Try iterating a table with no keys and again one with all key-value pairs, and observe the different behavior.
If you are serious about arrays, you may be better off using a binding to create an Array type with the behavior you want. The GM source has an example of an array container.

Rails method for checking if a number in a range appears in an array

I have an array (or possibly a set) of integers (potentially non-sequential but all unique) and a range of (sequential) numbers. If I wanted to check whether any of the numbers in the range exist in the array, what would be the most efficient way of doing this?
array = [1, 2, 5, 6, 7]
range = 3..5
I could iterate over the range and use include? to check each element against the array, but this seems wasteful; both the array and the range could easily be quite large.
Are there any methods I could use to do some sort of array.include_any?(range), or should I look for efficient search algorithms?
I would do
(array & range.to_a).present?
or
array.any? { |element| range.cover?(element) }
I would choose a version depending on the size of the range. If the range is small, the first version is probably faster, because it creates the intersection once and doesn't need to check cover? for every single element of the array. If the range is huge (but the array is small), the second version might be faster, because a few comparisons can be quicker than generating an array out of a huge range and building the intersection.
([1, 2, 5, 6, 7] & (3..5).to_a).any?
# => true
Don't need no stinkin' &:
array.uniq.size + range.size > (array + range.to_a).uniq.size
This works because the union on the right deduplicates: the left side exceeds it exactly when the array and the range share at least one element.
