Writing variable-length sequence to a compound array - hdf5

I am using compound datatypes with h5py, with some elements being variable-length arrays. I can't find a way to set an item. The following MWE shows six different ways to do that (sequential indexing, which would not work in h5py anyway; fused indexing; and read-modify-commit for columns/rows), none of which works.
What is the correct way? Why is h5py saying "Cannot change data-type for object array" when writing an integer list to an int32 list?
import h5py
import numpy as np
with h5py.File('/tmp/test-vla.h5','w') as h5:
    dt=np.dtype([('a',h5py.vlen_dtype(np.dtype('int32')))])
    dset=h5.create_dataset('test',(5,),dtype=dt)
    dset['a'][2]=[1,2,3] # does not write the value back
    dset[2]['a']=[1,2,3] # does not write the value back
    dset['a',2]=[1,2,3] # Cannot change data-type for object array
    dset[2,'a']=[1,2,3] # Cannot change data-type for object array
    tmp=dset['a']; tmp[2]=[1,2,3]; dset['a']=tmp # Cannot change data-type for object array
    tmp=dset[2]; tmp['a']=[1,2,3]; dset[2]=tmp # 'list' object has no attribute 'dtype'

When working with compound datasets, I've discovered it's best to add all row data in a single statement. I tweaked your code to show how to add 3 rows of data (each of a different length). Note how I: 1) define the row of data with a tuple; 2) define the list of integers with np.array(); and 3) don't reference the field name ['a'].
import h5py
import numpy as np
with h5py.File('test-vla.h5','w') as h5:
    dt=np.dtype([('a',h5py.vlen_dtype(np.dtype('int32')))])
    dset=h5.create_dataset('test',(5,),dtype=dt)
    print(dset.dtype, dset.shape)
    # each row is a 1-tuple: one item per field of the compound type
    dset[0] = ( np.array([0,1,2]), )
    dset[1] = ( np.array([1,2,3,4]), )
    dset[2] = ( np.array([0,1,2,3,4]), )
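Why do the first two attempts in the question silently do nothing? Indexing an h5py dataset (dset['a'] or dset[2]) reads the selection from the file into a new in-memory NumPy array, so assigning into that result only changes the copy. The sketch below reproduces that behaviour with plain NumPy so it runs without a file. It assumes a bare object field as a stand-in for h5py.vlen_dtype(np.dtype('int32')), and buf and field_copy are hypothetical names. It also shows the whole-row tuple assignment used in the answer.

```python
import numpy as np

# Stand-in for the compound type: one object field holding int32 arrays.
# (In h5py the field would be h5py.vlen_dtype(np.dtype('int32')).)
dt = np.dtype([('a', object)])

# In-memory buffer shaped like the 5-row dataset; object fields start as None.
buf = np.empty(5, dtype=dt)
for i in range(len(buf)):
    buf['a'][i] = np.zeros(0, dtype=np.int32)  # start every row empty

# Whole-row assignment with a 1-tuple: one item per compound field.
buf[0] = (np.array([0, 1, 2], dtype=np.int32),)
buf[1] = (np.array([1, 2, 3, 4], dtype=np.int32),)

# On an in-memory array, buf['a'] is a view, so this does update buf...
buf['a'][2] = np.array([0, 1, 2, 3, 4], dtype=np.int32)

# ...whereas a copy (which is what reading dset['a'] from h5py gives you)
# only changes itself; buf is left untouched.
field_copy = buf['a'].copy()
field_copy[3] = np.array([9, 9], dtype=np.int32)

print([len(v) for v in buf['a']])
print(len(field_copy[3]), len(buf['a'][3]))
```

With h5py, a buffer created with dtype=dset.dtype could presumably be filled this way and written back in one statement with dset[...] = buf; that step is not exercised here.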
For more info, take a look at this post on the HDF Group Forum under HDF5 Ancillary Tools / h5py:
Compound datatype with int, float and array of floats


Sorting rows in a Grid column of String type based on corresponding integer values

I have the following structure for a Grid and wanted to know how to sort a column based on the Integer values of the Strings. The data provider is not flexible to change, so I have to sort with some kind of intermediary step:
Grid<String[]> grid = new Grid<>();
...
grid.addColumn(str -> str[columnIndex]).setHeader("sample").setKey("integercolumn").setSortable(true);
...
GridSortOrder<String[]> order = new GridSortOrder<>(grid.getColumnByKey("integercolumn"), SortDirection.DESCENDING);
grid.sort(Arrays.asList(order));
This sorts two-digit numbers properly, but not one-digit numbers or numbers with three or more digits.
You can define a custom comparator on the column that is used for sorting its values. In your case the comparator needs to extract the column-specific value from the array, and then convert it to an int:
grid.addColumn(row -> row[columnIndex])
    .setHeader("sample")
    .setKey("integercolumn")
    .setSortable(true)
    .setComparator(row -> Integer.parseInt(row[columnIndex]));
See https://vaadin.com/docs/latest/components/grid/#specifying-the-sort-property.
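The symptom in the question comes from comparing the column values as strings. A short Python sketch with hypothetical sample values (standing in for row[columnIndex]) shows the difference between character-by-character (lexicographic) ordering and ordering by the parsed integer, which is what the comparator above switches to:

```python
# Hypothetical column values, as they would come out of the String[] rows.
values = ["7", "42", "100", "3", "15"]

# Default string comparison works character by character ("lexicographic").
lexicographic = sorted(values, reverse=True)

# Comparing the parsed integers instead gives numeric order.
numeric = sorted(values, key=int, reverse=True)

print(lexicographic)
print(numeric)
```

Only values with the same number of digits end up in the right relative order under the string comparison, which matches what the question observes.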

Lua: Sort table of numbers with multiple dots

I have a table of strings like this:
{
"1",
"1.5",
"3.13",
"1.2.5.7",
"2.5",
"1.3.5",
"2.2.5.7.10",
"1.17",
"1.10.5",
"2.3.14.9",
"3.5.21.9.3",
"4"
}
And would like to sort that like this:
{
"1",
"1.2.5.7",
"1.3.5",
"1.5",
"1.10.5",
"1.17",
"2.2.5.7.10",
"2.3.14.9",
"2.5",
"3.5.21.9.3",
"3.13",
"4"
}
How do I sort this in Lua? I know that table.sort() will be used, I just don't know the function (second parameter) to use for comparison.
Given your requirements, you probably want something like natural sort order. I described several possible solutions, as well as their impact on the results, in a blog post.
The simplest solution may look like this (below), but the post lists five different solutions of varying complexity, along with their results:
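function alphanumsort(o)
  -- Prefix every run of digits with its length (zero-padded to 3 digits),
  -- so plain string comparison puts shorter numbers before longer ones
  local function padnum(d) return ("%03d%s"):format(#d, d) end
  table.sort(o, function(a,b)
    return tostring(a):gsub("%d+",padnum) < tostring(b):gsub("%d+",padnum) end)
  return o
end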
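To check the padding trick outside Lua, here is a Python translation of alphanumsort. The names mirror the Lua version; this is a sketch, not the blog post's code. Each run of digits is replaced with its length as a zero-padded 3-digit number followed by the digits, so a plain string comparison ranks shorter numbers before longer ones and equal-length numbers digit by digit:

```python
import re

def padnum(match):
    # Mirror ("%03d%s"):format(#d, d): 3-digit length prefix, then the digits.
    digits = match.group()
    return "%03d%s" % (len(digits), digits)

def alphanum_key(value):
    # Like tostring(a):gsub("%d+", padnum) in the Lua version.
    return re.sub(r"\d+", padnum, str(value))

def alphanumsort(items):
    items.sort(key=alphanum_key)  # in-place, like table.sort
    return items

data = ["1", "1.5", "3.13", "1.2.5.7", "2.5", "1.3.5",
        "2.2.5.7.10", "1.17", "1.10.5", "2.3.14.9", "3.5.21.9.3", "4"]
print(alphanumsort(data))
```

Like the Lua original, it assumes no run of digits is longer than 999 characters.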
table.sort sorts in ascending order by default, so for a plain ascending sort you don't have to provide a second parameter. But since you're sorting strings, Lua compares them character by character, which puts "1.10.5" before "1.2.5.7". Hence you must implement a comparison function that tells Lua which element comes first.
"I just don't know the function (second parameter) to use for comparison."
That's why people wrote the Lua Reference Manual
table.sort (list [, comp])
Sorts the list elements in a given order, in-place, from list[1] to list[#list]. If comp is given, then it must be a function that receives two list elements and returns true when the first element must come before the second in the final order, so that, after the sort, i <= j implies not comp(list[j],list[i]). If comp is not given, then the standard Lua operator < is used instead.
The comp function must define a consistent order; more formally, the function must define a strict weak order. (A weak order is similar to a total order, but it can equate different elements for comparison purposes.)
The sort algorithm is not stable: Different elements considered equal by the given order may have their relative positions changed by the sort.
Think about how you would do it with pen and paper. You would compare the numbers segment by segment. As soon as a segment is smaller than the corresponding segment of the other string, you know that string comes first; if all shared segments are equal, the one with fewer segments comes first.
So a solution would require you to split the strings into those segments and convert them to numbers so you can compare their values.
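The pen-and-paper procedure can be written as an explicit "less than" function, the same shape table.sort expects for comp. The sketch is in Python; less and compare are hypothetical names, and functools.cmp_to_key adapts the boolean comparison to sorted(). It assumes every segment is a plain non-negative integer:

```python
from functools import cmp_to_key

def less(a, b):
    """Return True when version string a must come before b (like a Lua comp)."""
    pa = [int(part) for part in a.split(".")]
    pb = [int(part) for part in b.split(".")]
    for x, y in zip(pa, pb):
        if x != y:
            return x < y  # the first differing segment decides
    return len(pa) < len(pb)  # all shared segments equal: fewer segments first

def compare(a, b):
    # Turn the boolean "less than" into the -1/0/1 result cmp_to_key expects.
    if less(a, b):
        return -1
    if less(b, a):
        return 1
    return 0

data = ["1", "1.5", "3.13", "1.2.5.7", "2.5", "1.3.5",
        "2.2.5.7.10", "1.17", "1.10.5", "2.3.14.9", "3.5.21.9.3", "4"]
ordered = sorted(data, key=cmp_to_key(compare))
print(ordered)
```

In Lua, the body of less is what you would pass as the second argument to table.sort; it satisfies the strict weak order the manual requires.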

Kdb+/q: How to bulk insert into a KDB+ table with an index?

I am trying to bulk insert multiple records simultaneously into a KDB+ database:
> trades:([]time:`datetime$();side:`symbol$();qty:`float$();price:`float$();exch:`symbol$();sym:`symbol$())
> t: .z.z / intentionally the same time
> `trades insert (t t;`buy `sell;10 10;10 10;`exch `exch;`sym `sym)
However, it raises an error at the sym column:
'sym
[0] `depths insert (t t;`buy `sell;10 10;10 10; `exch `exch;`sym `sym)
^
I have no idea what I could be doing wrong here, but it seems to be value-invariant, i.e. it always raises an error on the last column irrespective of the value provided.
Could someone please advise me how I should go about inserting bulk records into kdb+ with a time index as depicted above?
Thanks
In your original insert statement, you had spaces between `sym `sym, `exch `exch and `buy `sell. The spaces between the symbols make q treat the expression as an apply or index instead of the list you want.
Additionally, because you have specified qty and price as float, you have to specify the numbers as floats when you insert them into the trades table.
The following line should accomplish what you are intending to do:
`trades insert (2#t;`buy`sell;10 10f;10 10f;`exch`exch;`sym`sym)
Lastly, I would recommend changing the type of the qty column to int/long, as quantity generally does not require decimal points.
Hope this helps!
Daniel is on the money. To expand on his answer: q collates space-separated items into a single list only for numeric values, and even then the type suffix may appear only on the last item. Further details on list creation can be found here.
q)a:10f 10f
'10f
q)a:10 10f
Secondly, it's common for those learning kdb to encounter type errors when appending to tables. The problem in this case is that kdb does not promote a list of homogeneous atoms to a wider type (which is expected behaviour). The following small lambda tells you where you are going wrong when performing insert or upsert operations:
q)trades:([]time:`datetime$();side:`symbol$();qty:`float$();price:`float$();exch:`symbol$();sym:`symbol$())
q)rows:(t,t;`buy`sell;10 10;10 10;`exch`exch;`sym`sym)
q)insertTest:{[tab;rows] m:0!meta tab; wh: where not m[`t] ~' rt:.Q.ty each rows; @[flip;;enlist] `item`currType`expectedType!(m[`c] wh;rt wh; m[`t] wh)}
q)insertTest[trades;rows]
item currType expectedType
---------------------------
qty j f
price j f
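The idea behind insertTest (compare each column's declared type with the type of the values being inserted, and report only the mismatches) is not specific to q. As a rough Python analogue, assuming a hypothetical schema dict standing in for meta trades and exact type matching in place of q's type characters:

```python
from datetime import datetime

# Hypothetical schema standing in for `meta trades`: column name -> expected type.
schema = {"time": datetime, "side": str, "qty": float,
          "price": float, "exch": str, "sym": str}

def insert_test(schema, rows):
    """Return (column, supplied types, expected type) for each mismatching column."""
    mismatches = []
    for (column, expected), values in zip(schema.items(), rows):
        supplied = {type(v) for v in values}
        # No implicit promotion: ints are reported even though a float would do.
        if supplied != {expected}:
            mismatches.append((column, sorted(t.__name__ for t in supplied),
                               expected.__name__))
    return mismatches

t = datetime.now()
rows = [[t, t], ["buy", "sell"], [10, 10], [10, 10], ["exch", "exch"], ["sym", "sym"]]
fixed_rows = [[t, t], ["buy", "sell"], [10.0, 10.0], [10.0, 10.0],
              ["exch", "exch"], ["sym", "sym"]]
print(insert_test(schema, rows))
```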

Active Record Array array query - to check records that are present in an array

I have an Objective model with an attribute called labels whose values are of array data type. I need to query all the Objectives whose labels attribute has values that are present in some particular array.
For Example:
I have an array
a = ["textile", "blazer"]
the Objective.labels may have values such as ["textile", "ramen"]
I need to return all objectives that might have either "textile" or "blazer" as one of their labels array values
I tried the following:
Objective.where("labels @> ARRAY[?]::varchar[]", ["textile"])
This returns some records. Now when I try
Objective.where("labels @> ARRAY[?]::varchar[]", ["textile", "Blazer"])
I expect it to return all Objectives which contain at least one of the labels textile or blazer.
However, it returns an empty array. Any Solutions?
Try the && (overlap) operator, which returns true when the two arrays have elements in common:
Objective.where("labels && ARRAY[?]::varchar[]", ["textile", "Blazer"])
If you have many rows, a GIN index on the labels column can speed it up.
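The empty result in the question comes from the operator semantics: @> (contains) is true only if every element of the right-hand array is in labels, while && (overlap) needs just one shared element. A Python sketch of the two predicates on hypothetical data (objectives and wanted are made-up names, and SQL NULL handling is not modelled):

```python
def contains(labels, wanted):
    # labels @> wanted: every element of wanted must appear in labels.
    return set(wanted) <= set(labels)

def overlaps(labels, wanted):
    # labels && wanted: at least one element in common.
    return not set(labels).isdisjoint(wanted)

objectives = {1: ["textile", "ramen"], 2: ["blazer"], 3: ["ramen"]}
wanted = ["textile", "blazer"]

contains_ids = [k for k, labels in objectives.items() if contains(labels, wanted)]
overlap_ids = [k for k, labels in objectives.items() if overlaps(labels, wanted)]
print(contains_ids, overlap_ids)
```

Element comparison is case-sensitive in both cases, so a search value of "Blazer" will not match a stored "blazer".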

How do I collect and combine multiple arrays for calculation?

I am collecting the values for a specific column from a named_scope as follows:
a = survey_job.survey_responses.collect(&:base_pay)
This gives me a numeric array, for example [1, 2, 3, 4, 5]. I can then pass this array into various functions I have created to retrieve the mean, median and standard deviation of the number set. This all works fine; however, I now need to start combining multiple columns of data to carry out the same types of calculation.
I need to collect the details of perhaps three fields as follows:
survey_job.survey_responses.collect(&:base_pay)
survey_job.survey_responses.collect(&:bonus_pay)
survey_job.survey_responses.collect(&:overtime_pay)
This will give me 3 arrays. I then need to combine these into a single array by adding each of the matching values together - i.e. add the first result from each array, the second result from each array and so on so I have an array of the totals.
How do I create a method which will collect all of this data together and how do I call it from the view template?
Really appreciate any help on this one...
Thanks
Simon
s = survey_job.survey_responses
pay = s.collect(&:base_pay).zip(s.collect(&:bonus_pay), s.collect(&:overtime_pay))
pay.map{|i| i.compact.inject(&:+) }
Do that, but with meaningful variable names and I think it will work.
Define a normal method in app/helpers/_helper.rb and it will work in the view.
Edit: now it works if the arrays contain nil or are of different sizes (as long as the longest array is the one on which zip is called).
Here's a method that will combine an arbitrary number of arrays by taking the sum at each index. It'll allow each array to be of different length, too.
def combine(*arrays)
  # Get the length of the largest array; that's the number of iterations needed
  maxlen = arrays.map(&:length).max
  out = []
  maxlen.times do |i|
    # Push the sum of all array elements at a given index to the result array,
    # treating missing (nil) entries as 0
    out.push(arrays.map { |a| a[i] }.inject(0) { |memo, value| memo + (value || 0) })
  end
  out
end
Then, in the controller, you could do
base_pay = survey_job.survey_responses.collect(&:base_pay)
bonus_pay = survey_job.survey_responses.collect(&:bonus_pay)
overtime_pay = survey_job.survey_responses.collect(&:overtime_pay)
@total_pay = combine(base_pay, bonus_pay, overtime_pay)
And then refer to @total_pay as needed in your view.
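For comparison, the same index-by-index sum in Python, using itertools.zip_longest to pad shorter lists; like the Ruby combine, missing (None) entries count as 0. This is a sketch with illustrative data, not a drop-in for the Rails code:

```python
from itertools import zip_longest

def combine(*arrays):
    """Sum the arrays index by index; shorter arrays and None entries count as 0."""
    return [sum(v or 0 for v in column)
            for column in zip_longest(*arrays, fillvalue=0)]

base_pay = [1, 2, 3]
bonus_pay = [10, None]
overtime_pay = [100, 200, 300, 400]
print(combine(base_pay, bonus_pay, overtime_pay))
```

zip_longest walks to the end of the longest input, so, unlike Ruby's zip, it does not matter which array comes first.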
