Kdb+/q: How to bulk insert into a KDB+ table with an index?

I am trying to bulk insert multiple records simultaneously into a KDB+ database:
> trades:([]time:`datetime$();side:`symbol$();qty:`float$();price:`float$();exch:`symbol$();sym:`symbol$())
> t: .z.z / intentionally the same time
> `trades insert (t t;`buy `sell;10 10;10 10;`exch `exch;`sym `sym)
However, it raises an error at the sym column:
'sym
[0] `trades insert (t t;`buy `sell;10 10;10 10; `exch `exch;`sym `sym)
^
I have no idea what I could be doing wrong here, but the error seems to be value-invariant, i.e. it is always raised on the last column irrespective of the value provided.
Could someone please advise me how I should go about inserting bulk records into kdb+ with a time index as depicted above.
Thanks

In your original insert statement, you had spaces between `sym `sym, `exch `exch and `buy `sell. A space between two symbols makes the expression an apply/index rather than the single symbol list you want.
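Written without spaces, the juxtaposed symbols parse as a single two-item list, which you can verify in a q session:
q)type `buy`sell    / no space: a 2-item symbol list
11h
q)count `buy`sell
2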
Additionally, because you have specified your qty and price columns as float, you have to supply the numbers as floats when inserting into the trades table.
The following line should accomplish what you are intending to do:
`trades insert (2#t;`buy`sell;10 10f;10 10f;`exch`exch;`sym`sym)
Lastly, I would recommend changing the schema for the qty column to int/long, as quantity generally does not require decimal points.
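For instance, a sketch of the revised schema - with qty as long, plain integer literals then insert cleanly:
q)trades:([]time:`datetime$();side:`symbol$();qty:`long$();price:`float$();exch:`symbol$();sym:`symbol$())
q)`trades insert (2#t;`buy`sell;10 10;10 10f;`exch`exch;`sym`sym)
0 1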
Hope this helps!

Daniel is on the money. To expand on his answer, q will collate space-separated numeric literals into a single list object, and even then the type specifier must appear only on the last item. Further details on list creation can be found here.
q)a:10f 10f
'10f
q)a:10 10f
Secondly, it's common for those learning kdb to encounter type errors when appending to tables. The problem in this case is that kdb does not promote a list of homogeneous atoms to a wider type (which is expected behaviour). The following is a useful little lambda for letting you know where you are going wrong when performing insert or upsert operations:
q)trades:([]time:`datetime$();side:`symbol$();qty:`float$();price:`float$();exch:`symbol$();sym:`symbol$())
q)rows:(t,t;`buy`sell;10 10;10 10;`exch`exch;`sym`sym)
q)insertTest:{[tab;rows] m:0!meta tab; wh: where not m[`t] ~' rt:.Q.ty each rows; #[flip;;enlist] `item`currType`expectedType!(m[`c] wh;rt wh; m[`t] wh)}
q)insertTest[trades;rows]
item  currType expectedType
---------------------------
qty   j        f
price j        f
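Once the mismatching columns are identified, casting them fixes the insert - for example, using amend-at to cast items 2 and 3 of rows to float:
q)`trades insert @[rows;2 3;`float$]
0 1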

Related

Denodo: How to aggregate varchar data types?

I'm creating an aggregate from an anstime column in a view table in Denodo, and I'm using a cast to convert it to float. It works only for numbers with a period (example 123.123) but does not work for numbers without a period (example 123). Here's my code, which only works for numbers with a period:
SELECT row_date,
       case
         when sum(cast(anstime as float)) is null or sum(cast(anstime as float)) = 0
         then 0
         else sum(cast(anstime as float))
       end as xans
FROM table where anstime like '%.%'
group by row_date
Can someone please help me handle the values without a period?
My guess is you've got values in anstime which are not numeric, which is why removing the where anstime like '%.%' predicate causes a failure, as has been mentioned in other comments.
You could try adding an intermediate view before this one which strips out any non-numeric characters (leaving the decimal point character, of course); that might then allow you to drop the where anstime like '%.%' filter.
Perhaps the REGEXP function would help there.
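A minimal sketch of what that intermediate view might select, assuming a regexp(input, pattern, replacement) function is available (check the VQL guide for your Denodo version; anstime_clean is a hypothetical alias):
SELECT row_date,
       cast(regexp(anstime, '[^0-9.]', '') as float) as anstime_clean
FROM table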
Your where anstime like '%.%' clause is going to restrict possible responses to places where anstime has a period in it. Remove that if you want to allow all values.
I appreciate those who responded to my concern. In the end we had to reach out to our developers to fix the column's data type from varchar to float rather than doing a workaround.

InfluxDB mixing aggregation functions with non-aggregate fields/values

I have the following issue:
I need to calculate the difference between consecutive points where some arbitrary ID is equal. The following:
SELECT difference(value_field) FROM mesurementName WHERE "IdField" = '10'
works and returns the difference between each consecutive point with that IdField, BUT IdField is lost (only time is propagated to the query result). In my case time is not unique (i.e. a measurement may contain many points with the same timestamp but different IdField). So I tried:
SELECT difference(value_field), IdField FROM mesurementName WHERE "IdField" = '10'
which yields:
error parsing query: mixing aggregate and non-aggregate queries is not supported!!
My next attempt was using a sub-query:
SELECT IdField, diff
FROM (
    SELECT
        difference(flow_val) as diff
    FROM
        mesurementA
    WHERE "IdField" = '10'
)
which resulted in an always-null value in IdField.
I'd like to ask you for help or a suggestion on how to solve this issue. By the way, we are using InfluxDB 1.3, which no longer supports JOIN.
If anyone gets stuck as I was, the solution is the following:
SELECT difference(value_field) FROM mesurementName GROUP BY "IdField"
The above implicitly adds "IdField" to the result series, and it is propagated to the resulting measurements when written with an INTO clause.
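For example, a sketch of writing those per-ID differences to a new measurement (diffMeasurement is a hypothetical target name):
SELECT difference(value_field) INTO diffMeasurement FROM mesurementName GROUP BY "IdField"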

How do query expression joins depend on the order of keys?

In the documentation for query expressions, I found:
Note that the order of the keys around the = sign in a join expression is significant.
I can't, however, find any information about how exactly the order is significant, what difference it makes, or what the rationale was for making an equality operator non-symmetric.
Can anyone either explain or point me to some better documentation?
This is important for joins. For example, if you look at the sample for leftOuterJoin:
query {
    for student in db.Student do
    leftOuterJoin selection in db.CourseSelection on
        (student.StudentID = selection.StudentID) into result
    for selection in result.DefaultIfEmpty() do
    select (student, selection)
}
The order determines what happens when "missing" values occur. The key is this line in the docs:
If any group is empty, a group with a single default value is used instead.
With the current order, every StudentID within db.Student will be represented, even if db.CourseSelection doesn't have a matching element. If you reverse the order, the opposite is true - every "course selection" will be represented, with missing students getting the default value. This means that, in the above, if you switched the order, any students without a course selection would have no representation in the results, whereas the current order always shows every student.
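To make that concrete, here is the sample rewritten with the join direction (and therefore the key order) reversed - a sketch derived from the example above, not from the docs:
query {
    for selection in db.CourseSelection do
    leftOuterJoin student in db.Student on
        (selection.StudentID = student.StudentID) into result
    for student in result.DefaultIfEmpty() do
    select (student, selection)
}
// Every CourseSelection now appears; a selection with no matching
// student yields a default Student value instead.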
The expression on the left of the operator must be derived from the "outer" thing being joined and the expression on the right must be derived from the "inner" thing (as you mention in your comment on Reed's answer). This is because of the LINQ API - the actual method that is invoked to build the query looks like this:
static member Join<'TOuter, 'TInner, 'TKey, 'TResult> :
    outer:IQueryable<'TOuter> *
    inner:IEnumerable<'TInner> *
    outerKeySelector:Expression<Func<'TOuter, 'TKey>> *
    innerKeySelector:Expression<Func<'TInner, 'TKey>> *
    resultSelector:Expression<Func<'TOuter, 'TInner, 'TResult>> -> IQueryable<'TResult>
So you can't join on arbitrary boolean expressions (which you can do in SQL - something like JOIN ON a.x + b.y - 7 > a.w * b.z is fine in SQL but not in LINQ), you can only join based on an equality condition between explicit projections of the outer and inner tables. In my opinion this is a very unfortunate design decision, but it's been carried forward from LINQ into F#.
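If you do need an arbitrary condition like that, a common workaround is a plain cross product filtered with where, since where accepts any boolean expression - a sketch using hypothetical tables db.A and db.B with the fields from the SQL example above:
query {
    for a in db.A do
    for b in db.B do
    where (a.x + b.y - 7 > a.w * b.z)
    select (a, b)
}
Whether the provider translates this into an efficient plan is another matter; it may well produce a cross join with a filter.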

Return every nth row from database using ActiveRecord in rails

Ruby 1.9.2 / Rails 3.1 / deployed onto Heroku --> PostgreSQL
Hi. Once the number of rows relating to an object goes over a certain amount, I wish to pull back every nth row instead. It's simply because the rows are used (in part) to display data for graphing, so once the number of rows returned goes above, say, 20, it's good to return every second one, and so forth.
This question seemed to point in the right direction:
ActiveRecord Find - Skipping Records or Getting Every Nth Record
Doing a mod on the row number makes sense, but basically using:
@widgetstats = self.widgetstats.find(:all, :conditions => 'MOD(ROW_NUMBER(),3) = 0')
doesn't work; it returns an error:
PGError: ERROR: window function call requires an OVER clause
And any attempt to solve that by e.g. basing my OVER clause syntax on things I saw in the answer to this question:
Row numbering in PostgreSQL
ends in syntax errors and I can't get a result.
Am I missing a more obvious way of efficiently returning every nth row, or, if I'm on the right track, are there any pointers on the way to go? Obviously returning all the data and filtering it in Rails afterwards is possible, but terribly inefficient.
Thank you!
I think you are looking for a query like this one:
SELECT * FROM (SELECT widgetstats.*, row_number() OVER () AS rownum FROM widgetstats ORDER BY id) stats WHERE mod(rownum,3) = 0
This is difficult to build using ActiveRecord, so you might be forced to do something like:
@widgetstats = self.widgetstats.find_by_sql(
  %{
    SELECT * FROM
    (
      SELECT widgetstats.*, row_number() OVER () AS rownum FROM widgetstats ORDER BY id
    ) AS stats
    WHERE mod(rownum,3) = 0
  }
)
You'll obviously want to change the ordering used and add any WHERE clauses or other modifications to suit your needs.
Were I to solve this, I would either write the SQL myself, like the SQL in the question you linked to - you can do this with
my_model.connection.execute('...')
or just get the id numbers and find by id:
ids = (1..30).step(2).to_a
my_model.where(:id => ids)

ISQL Perform instruction: after editadd editupdate of table vs. after add update of table

INFORMIX-SQL 7.3 Perform Screens:
According to the documentation, in an "after editadd editupdate of table" control block, its instructions are executed before the row is added to or updated in the table, whereas in an "after add update of table" control block, its instructions are executed after the row has been added or updated. Supposedly, this would mean that any instructions which alter values of field-tags linked to table.columns would not be committed to the table, but field-tags linked to displayonly fields would change?
However, when using "after add update of table", I placed instructions which alter values for field-tags linked to table.columns, and their displayed and committed values also changed! I would have thought that an "after add update of table" block would only alter displayonly fields.
TABLES
customer
transaction
branch
interest
dates
ATTRIBUTES
[...]
q = transaction.trx_type, INCLUDE=("E","C","V","P","T"), ...;
tb = transaction.trx_int_table,
     LOOKUP f1 = ta_days1_f,
            t1 = ta_days1_t,
            i1 = ta_int1,
            [...]
     JOINING *interest.int_table, ...;
[...]
INSTRUCTIONS
customer MASTER OF transaction
transaction MASTER OF customer
delimiters ". ";
AFTER QUERY DISPLAY ADD UPDATE OF transaction
    if z = "E" then let q = "E"
    if z = "C" then let q = "C"
    if z = "1" then let q = "E"
    [...]
END
Is 'z' a column in the transaction table?
Is the trouble that the value in 'z' is causing a change in the value of 'q' (aka transaction.trx_type), and the modified value is being stored in the database?
Is the value in 'z' part of the transaction table?
Have you verified that the value in the DB is indeed changed - using the Query Language option or a simple (default) form?
It might look as if it is because the instruction is also used AFTER DISPLAY, so when the values are retrieved from the DB, the value displayed in 'q' would be the mapped value corresponding to the value stored in 'z'. You would have to inspect the raw data to see past that mapping.
If this is not the problem, please:
Amend the question to show where 'z' comes from.
Also describe exactly what you do and see.
Confirm that the data in the database, as opposed to on the screen, is amended.
Please can you see whether this table plus form behaves the same for you as it does for me?
Table Transaction
CREATE TABLE TRANSACTION
(
    trx_id        SERIAL NOT NULL,
    trx_type      CHAR(1) NOT NULL,
    trx_last_type CHAR(1) NOT NULL,
    trx_int_table INTEGER NOT NULL
);
Form
DATABASE stores
SCREEN SIZE 24 BY 80
{
trx_id [f000]
trx_type [q]
trx_last_type [z]
trx_int_table [f001 ]
}
END
TABLES
transaction
ATTRIBUTES
f000 = transaction.trx_id;
q = transaction.trx_type, UPSHIFT, AUTONEXT,
    INCLUDE=("E","C","V","P","T");
z = transaction.trx_last_type, UPSHIFT, AUTONEXT,
    INCLUDE=("E","C","V","P","T","1");
f001 = transaction.trx_int_table;
INSTRUCTIONS
AFTER ADD UPDATE DISPLAY QUERY OF transaction
    IF z = "E" THEN LET q = "E"
    IF z = "C" THEN LET q = "C"
    IF z = "1" THEN LET q = "E"
END
Experiments
[The parenthesized number is automatically generated by IDS/Perform.]
Add a row with data (1), V, E, 23.
Observe that the display is: 1, E, E, 23.
Exit the form.
Observe that the data in the table is: 1, V, E, 23.
Reenter the form and query the data.
Update the data to: (1), T, T, 37.
Observe that the display is: 1, T, T, 37.
Exit the form.
Observe that the data in the table is: 1, T, T, 37.
Reenter the form and query the data.
Update the data to: (1), P, 1, 49.
Observe that the display is: 1, E, 1, 49.
Exit the form.
Observe that the data in the table is: 1, P, 1, 49.
Reenter the form and query the data.
Observe that the display is: 1, E, 1, 49.
Choose 'Update', and observe that the display changes to: 1, P, 1, 49.
I did the 'Observe that the data in the table is' steps using:
sqlcmd -d stores -e 'select * from transaction'
This generated lines like these (reflecting different runs):
1|V|E|23
1|P|1|49
That is my SQLCMD program, not Microsoft's upstart of the same name. You can do more or less the same thing with DB-Access, except it is noisier (13 extraneous lines of output) and you would be best off writing the SELECT statement in a file and providing that as an argument:
$ echo "select * from transaction" > check.sql
$ dbaccess stores check
Database selected.
trx_id trx_type trx_last_type trx_int_table
1 P 1 49
1 row(s) retrieved.
Database closed.
$
Conclusions
This is what I observed on Solaris 10 (SPARC) using ISQL 7.50.FC1; it matches what the manual describes, and is also what I suggested in the original part of the answer might be the trouble - what you see on the form is not what is in the database (because of the INSTRUCTIONS section).
Do you see something different? If so, then there could be a bug in ISQL that has been fixed since. Technically, ISQL 7.30 is out of support, I believe. Can you upgrade to a more recent version than that? (I'm not sure whether 7.32 is still supported, but you should really upgrade to 7.50; the current release is 7.50.FC4.)
Transcribing commentary before deleting it:
Up to a point, it is good that you replicate my results. The bad news is that in the bigger form we have different behaviour. I hope that ISQL validates all limits - things like number of columns etc. However, there is a chance that they are not properly validated, given the bug, or maybe there is a separate problem that only shows with the larger form. So, you need to ensure you have a supported version of the product and that the problem reproduces in it. Ideally, you will have a smaller version of the table (or, at least, of the form) that shows the problem, and maybe a still smaller (but not quite as small as my example) version that shows the absence of the problem.
With the test case (table schema and Perform screen that shows the problem) in hand, you can then go to IBM Tech Support with "Look - this works correctly when the form is small; and look, it works incorrectly when the form is large". The bug should then be trackable. You will need to include instructions on how to reproduce the bug similar to those I gave you. And there is no problem with running two forms - one simple and one more complex and displaying the bug - in parallel to show how the data is stored vs displayed. You could describe the steps in terms of 'Form A' and 'Form B', with Form A being Absolutely OK and Form B being Believed to be Buggy. So, add a record with certain values in Form B; show what is displayed in Form B after; show what is stored in the database in Form A after too; show that they are not different when they should be.
Please bear in mind that those who will be fixing the issue have less experience with the product than either you or me - so keep it as simple as possible. Remove as many attributes as you can; leave comments to identify data types etc.
