I am trying to bulk insert multiple records simultaneously into a KDB+ database:
> trades:([]time:`datetime$();side:`symbol$();qty:`float$();price:`float$();exch:`symbol$();sym:`symbol$())
> t: .z.z / intentionally the same time
> `trades insert (t t;`buy `sell;10 10;10 10;`exch `exch;`sym `sym)
However, it raises an error at the sym column:
'sym
[0] `trades insert (t t;`buy `sell;10 10;10 10; `exch `exch;`sym `sym)
^
I have no idea what I could be doing wrong here, but it seems to be value-invariant, i.e. it always raises an error on the last column irrespective of the value provided.
Could someone please advise me on how I should go about inserting bulk records into kdb+ with a time index as depicted above?
Thanks
In your original insert statement, you had spaces between `sym `sym, `exch `exch and `buy `sell. The space between the symbols makes each of them an apply/index expression instead of the list you want.
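For instance, a quick sanity check in a q session (purely illustrative, using the same values):
q)type `buy`sell    / 11h - a simple symbol list
q)type 10 10f       / 9h - a simple float list
Writing `buy `sell with a space instead parses as `buy applied to `sell, which is not a two-item list.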
Additionally, because you have specified qty and price as float columns, you have to supply the numbers as floats when inserting into the trades table.
The following line should accomplish what you are intending to do:
`trades insert (2#t;`buy`sell;10 10f;10 10f;`exch`exch;`sym`sym)
Lastly, I would recommend changing the schema for the qty column to int/long, as quantity generally does not require decimal points.
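If you do change qty to a long column, the schema and insert would look something like this (a sketch; t is assumed to still hold the timestamp from your question):
q)trades:([]time:`datetime$();side:`symbol$();qty:`long$();price:`float$();exch:`symbol$();sym:`symbol$())
q)`trades insert (2#t;`buy`sell;10 10;10 10f;`exch`exch;`sym`sym)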
Hope this helps!
Daniel is on the money. To expand on his answer, q will only collate space-separated values into a single list for numeric values, and even then the type specification must only be present on the last item. Further details on list creation can be found here.
q)a:10f 10f
'10f
q)a:10 10f
Secondly, it's common for those learning kdb+ to encounter type errors when appending to tables. The problem in this case is that kdb+ will not promote a list of homogeneous atoms to a wider type (which is expected behaviour). The following is a useful little lambda that lets you know where you are going wrong when performing insert or upsert operations:
q)trades:([]time:`datetime$();side:`symbol$();qty:`float$();price:`float$();exch:`symbol$();sym:`symbol$())
q)rows:(t,t;`buy`sell;10 10;10 10;`exch`exch;`sym`sym)
q)insertTest:{[tab;rows] m:0!meta tab; wh: where not m[`t] ~' rt:.Q.ty each rows; #[flip;;enlist] `item`currType`expectedType!(m[`c] wh;rt wh; m[`t] wh)}
q)insertTest[trades;rows]
item currType expectedType
---------------------------
qty j f
price j f
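Once the mismatched columns are identified, one way to fix the insert (a sketch, reusing the rows variable above) is to cast the offending items to float before inserting:
q)`trades insert @[rows;2 3;"f"$]    / cast qty and price (items 2 and 3) to float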
I have the following situation: a loop (stacked data) with only one index variable and with multiple items corresponding to the statements, as in the picture below (sorry it is Excel, but it is the same as in SPSS):
stacked data: cases on multiple lines, but never filling all the columns for one respondent
I want to reach the following situation, but without using CASESTOVARS to restructure, because that creates a lot of empty variables. I remember that in older versions there was a command like UPDATE which moved the cases up, to reach the following result:
reducing the cases per respondent
Like starting from this:
ID Index Q1_1 Q1_2 Q1_3 Q1_4 Q1_5 Q1_6
1 1 1 1
1 2 1 1
1 3 1 1
To reach to this:
ID Q1_1 Q1_2 Q1_3 Q1_4 Q1_5 Q1_6
1 1 1 1 1 1 1
But without using casestovars. Is there any command in SPSS syntax for this?
Thank you very much, have a nice day!
I'm not entirely sure how variable your data structure is likely to be in reality, but if, as demoed, you have only a single response for each of q1_1 to q1_6 per respondent ID, then the below would be sufficient:
dataset declare dsAgg.
aggregate outfile="dsAgg" /break=respid /q1_1 to q1_6=max(q1_1 to q1_6).
I'm also not sure of the significance of duplicate index values within the same respondent ID, and whether this was intended or not.
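To inspect the aggregated result afterwards, you can then activate the new dataset (a small addition, assuming the dataset name above):
dataset activate dsAgg.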
The following syntax could do the job:
* first we'll recreate your example data.
data list list/respid index q1_1 to q1_6.
begin data
1,1,1,,,,,
1,2,,2,,,,
1,3,,,1,,,
1,4,,,,2,,
1,5,,,,,1,
1,6,,,,,,2
2,1,3,,,,,
2,1,,4,,,,
2,2,,,5,,,
2,2,,,,4,,
2,3,,,,,3,
2,3,,,,,,2
end data.
* now to work: first thing is to make sure the data from each ID are together.
sort cases by respid index.
* the loop will fill down the data to the last line of each ID.
do repeat qq=q1_1 to q1_6.
if respid=lag(respid) and missing(qq) qq=lag(qq).
end repeat.
* the following lines will help recognize the last line for each ID and select it.
compute lineNR=$casenum.
aggregate /outfile=* mode=ADDVARIABLES/break=respid/MXlineNR=max(lineNR).
select if lineNR=MXlineNR.
exe.
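If helpful, a quick way to eyeball the reduced file (a sketch using the same variable names) is:
list respid q1_1 to q1_6.
Each respondent should now occupy a single row.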
I have the following issue:
I need to calculate difference between consecutive points where some arbitrary ID is equal. The following:
SELECT difference(value_field) FROM mesurementName WHERE "IdField" = '10'
This works and returns the difference between each pair of consecutive points with that IdField, BUT IdField itself is lost (only time is propagated to the query result). In my case time is not unique (i.e. a measurement may contain many points with the same timestamp but different IdField values). So I tried:
SELECT difference(value_field), IdField FROM mesurementName WHERE "IdField" = '10'
which yields:
error parsing query: mixing aggregate and non-aggregate queries is not supported!!
My next attempt was using sub-query:
SELECT IdField, diff
FROM (
    SELECT
        difference(value_field) as diff
    FROM
        mesurementName
    WHERE "IdField" = '10'
)
This resulted in an always-null value in IdField.
I'd like to ask for help or a suggestion on how to solve this issue. By the way, we are using InfluxDB 1.3, which no longer supports JOIN.
In case anyone gets stuck as I was, the solution is the following:
SELECT difference(value_field) FROM mesurementName GROUP BY "IdField"
The above implicitly adds "IdField" to the result series, and it is propagated to the resulting measurement when combined with an INTO clause.
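For example, something along these lines (a sketch; diff_by_id is a hypothetical target measurement) writes the differences out with IdField preserved as a tag:
SELECT difference(value_field) INTO diff_by_id FROM mesurementName GROUP BY "IdField"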
I have a series, disk, that contains a path (/mnt/disk1, /mnt/disk2, etc.) and the total space of a disk. It also includes free and used values. These values are updated at a specified interval. What I would like to do is query the sum of the last() total of each path. I would also like to do the same for free and used, to get an aggregate of the total size, free space, and used space of all of the disks on my server.
I have a query here that will get me the last(total) of all the disks, grouped by its path (for distinction):
select last(total) as total from disk where path =~ /(mnt\/disk).*/ group by path
Currently, this returns 5 series, each containing one row (the latest) with the value of its total. I then want to take the sum across those series, but I cannot simply wrap last(total) in a sum() call. Is there a way to do this that I am missing?
Carrying on from my comment above about nested functions.
Building a toy example:
CREATE DATABASE FOO
USE FOO
Assuming your data is updated at intervals longer than[1] one minute:
CREATE CONTINUOUS QUERY disk_sum_total ON FOO
BEGIN
SELECT sum("total") AS "total_1m" INTO disk_1m_total FROM "disk"
GROUP BY time(1m)
END
Then push some values in:
INSERT disk,path="/mnt/disk1" total=30
INSERT disk,path="/mnt/disk2" total=32
INSERT disk,path="/mnt/disk3" total=33
And wait more than a minute. Then:
INSERT disk,path="/mnt/disk1" total=41
INSERT disk,path="/mnt/disk2" total=42
INSERT disk,path="/mnt/disk3" total=43
And wait a minute+ again. Then:
SELECT * FROM disk_1m_total
name: disk_1m_total
-------------------
time total_1m
1476015300000000000 95
1476015420000000000 126
The two values are 30+32+33=95 and 41+42+43=126.
From there, it's trivial to query:
SELECT last(total_1m) FROM disk_1m_total
name: disk_1m_total
-------------------
time last
1476015420000000000 126
Hope that helps.
[1] Picking a CQ interval smaller than the update interval prevents minor timing jitter from causing the data for a given group to be accidentally summed twice. There might be some "zero update" intervals, but no "double counting" intervals. I typically run the query twice as fast as the updates. If the CQ sees no data for a window, no CQ result is written for that window, so last() will still give the correct answer. For example, I left the CQ running overnight and pushed no new data in: last(total_1m) gives the same answer, not zero for "no new data".
I want an alternative to running FREQUENCIES for string variables, because I also want to get a case number for each of the string values (I have a separate variable for case ID).
After reviewing the string values I will need to find them in order to recode them, which is why I need to know the case number.
I know the PRINT command should do what I want, but I get an error. Is there any alternative?
PRINT / id var2 .
EXECUTE.
>Error # 4743. Command name: PRINT
>The line width specified exceeds the output page width or the record length or
>the maximum record length of 2147483647. Reduce the number of variables or
>split the output line into several records.
>Execution of this command stops.
Try the LIST command.
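In its simplest form (using the variables from the question) that would be:
LIST id var2.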
I often use the TEMPORARY command prior to the LIST command, as often there is only a small selection of records of interest that I want to "list"/investigate.
For example, the below lists only the records where VAR2 is not a blank string:
TEMP.
SELECT IF (len(VAR2)>0).
LIST ID VAR2.
Alternatively (dependent on having the CUSTOM TABLES add-on module), you could do something like the below, which gets the results into a tabular format (which may be preferable if then exporting to Excel, for example):
CTABLES /VLABELS VARIABLES=ALL DISPLAY=NONE
 /TABLE A[C]>B[C]
 /CATEGORIES VARIABLES=ALL EMPTY=EXCLUDE.