How to perform a table join on two DFS tables and update the left table based on its result? - join

Two DFS tables, pt and pt1, with the same schema were created in the database dfs://db using the following scripts.
n=10000
ID=rand(100, n)
dates=2022.08.07..2022.08.11
date=rand(dates, n)
vol=rand(1..10 join int(), n)
t=table(ID, date, vol)
if(existsDatabase("dfs://db1")){
dropDatabase("dfs://db1")
}
db=database(directory="dfs://db1", partitionType=RANGE, partitionScheme=0 50 100)
pt=db.createPartitionedTable(table=t, tableName=`pt, partitionColumns=`ID)
pt.append!(t)
pt1=db.createPartitionedTable(table=t, tableName=`pt1, partitionColumns=`ID)
pt1.append!(t)
I want to update the left table (table pt1) after performing a left semi-join on pt1 and pt.
I tried the following two methods:
update pt1 set date = pt_date from lsj(pt1, pt, `ID)
x = select * from lsj(pt1, pt, `ID)
update pt1 set date = x.pt_date from x
But both ways failed and raised the errors as follows:

Currently, your update issue cannot be solved. You can get it cracked using the first method after the release of version 1.30.21/2.00.9.
In addition, the second error thrown did not exactly explain the problem. The error is reported when updating each partition. We will improve this error message in the subsequent versions.

Related

Neo4j generate heatmap data from X and Y coordinates

I need to generate heatmap data in CSV format, something like this:
X,Y,OCCURRENCES
269,697,41
199,493,8
125,318,2
205,526,24
261,572,2
My neo4j database has an entity called "Point" that contains a date, an X and a Y coordinate and it looks like this:
Point: {
"at": "2018-06-26T06:54:42.671141000+12:00"
"locationPlanX": 367,
"locationPlanY": 716
}
I have a query that gives the desired output, it works well with a few thousands of points but it starts to struggle with millions.
Query:
MATCH (point:Point)
WHERE datetime("2018-06-22T15:00:00.000000+12:00") <= point.at < datetime("2018-06-23T16:00:00.000000+12:00")
AND point.locationPlanX >= 0
AND point.locationPlanY >= 0
WITH point.locationPlanX as x, point.locationPlanY as y, COUNT(point) AS occurrences
RETURN x, y, occurrences
As I said before, the query works well for an hour of data, but it starts to struggle with days/weeks.
Is there any other thing I can do to improve my query? Or any other way to do it?
UPDATE: The 3 properties in the node are indexed.
You should create an index on :Point(at):
CREATE INDEX ON :Point(at);
That would allow your query to avoid scanning through every Point node to find the ones with acceptable at values. This should greatly speed up your query.
Also, if it is not necessary to test locationPlanX and locationPlanY for non-negativity, eliminate those tests.

Kdb+/q: How to bulk insert into a KDB+ table with an index?

I am trying to bulk insert multiple records simultaneously into a KDB+ database:
> trades:([]time:`datetime$();side:`symbol$();qty:`float$();price:`float$();exch:`symbol$();sym:`symbol$())
> t: .z.z / intentionally the same time
> `trades insert (t t;`buy `sell;10 10;10 10;`exch `exch;`sym `sym)
However It raises an error at the sym column
'sym
[0] `depths insert (t t;`buy `sell;10 10;10 10; `exch `exch;`sym `sym)
^
Have no Idea what I could be doing wrong here, but it seems to be value invariant i.e. it always raises an error on the last column irrespective of the value provided.
Could someone please advise me how I should go about inserting bulk records into kdb+ with an time index as depicted above.
Thanks
In your original insert statement, you had spaces between
`sym `sym
,
`exch `exch
and `buy `sell. The spaces between the symbols makes it an apply or index instead of a list which you desire.
Additionally, because you have specified your qty and price as
float
, you would have to specify the numbers as float when you are inserting to the
trades
table.
The following line should accomplish what you are intending to do:
`trades insert (2#t;`buy`sell;10 10f;10 10f;`exch`exch;`sym`sym)
Lastly, I would recommend changing the schema for the qtycolumn to int/long, as quantity generally does not require decimal points.
Hope this helps!
Daniel is on the money. To expand on his answer, q will collate space-separated lists into a single object for numeric values, and even then the type specification must be only present for the last item. Further details on list creation can be found here.
q)a:10f 10f
'10f
q)a:10 10f
Secondly, it's common for those learning kdb to often encounter type errors when appending to tables. The problem in this case is that kdb is not promoting a list of homogeneous atoms to a wider type (which is expected behaviour). The following is a useful little lambda for letting you know where you are going wrong when performing insert or upsert operations:
q)trades:([]time:`datetime$();side:`symbol$();qty:`float$();price:`float$();exch:`symbol$();sym:`symbol$())
q)rows:(t,t;`buy`sell;10 10;10 10;`exch`exch;`sym`sym)
q)insertTest:{[tab;rows] m:0!meta tab; wh: where not m[`t] ~' rt:.Q.ty each rows; #[flip;;enlist] `item`currType`expectedType!(m[`c] wh;rt wh; m[`t] wh)}
item currType expectedType
---------------------------
qty j f
price j f

InfluxDB mixing agregation function with non-aggregat fields/values

I have a following issue:
I need to calculate difference between consecutive points where some arbitrary ID is equal. The following:
SELECT difference(value_field) FROM mesurementName WHERE "IdField" = '10'
Works, returns difference between each consecutive point with IdField BUT IdField is lost (only time is propagated to query result). In my case time is not unique (i.e. measurement may contain many points with same timestamp, but different IdField). So I tried:
SELECT difference(value_field), IdField FROM mesurementName WHERE "IdField" = '10'
which yields:
error parsing query: mixing aggregate and non-aggregate queries is not supported!!
My next attempt was using sub-query:
SELECT IdField, diff
FROM (
SELECT
difference(flow_val) as diff
FROM
mesurementA
WHERE "IdField" = '10'
)
Which resulted in always null value in IdField.
I'd like to ask you for help or suggestion how to solve issue. By the way, we are using InfluxDB 1.3, which is not supporting JOIN anymore
If anyone would stuck as I was, then solution is following:
SELECT difference(value_field) FROM mesurementName GROUP BY "IdField"
Above somehow implicitly add "IdField" to result series and is propagated to resulting measurements with INTO clause

searching for closest number in a rails app

Give two parameters which correspond to two attributes on an object how can one find 20 records in a database that are closest to those two numbers.
The parameters you have are x, and y. The object also has those attributes. For example. x = 1, and y = 9999. You need to find the record that is the closest to x and y.
That depends on how you define the distance between two points. If you are using a two-dimensional cartesian coordinate system, this SQL statement will work:
SELECT id, x, y FROM points ORDER BY SQRT(POWER((X-x),2)+POWER((Y-y),2)) ASC LIMIT 20;
Where X,Y are the inputs.
It sounds like you're using geolocated data. If your database backend is Postgres, check to see if you have or can install the PostGIS extensions. This gives you very fast tools which give you searches like 'search for the nearest thing to this point', 'search for everything within this circle', 'search for everything within this square', and so on.
http://postgis.refractions.net/
You would do something like this:
CREATE INDEX [indexname] ON [tablename] USING GIST ( [geometrycolumn] gist_geometry_ops);
Then you can do something like this - find everything within 100 metres of a point:
SELECT * FROM GEOTABLE WHERE
GEOM && GeometryFromText(’BOX3D(900 900,1100 1100)’,-1) AND
Distance(GeometryFromText(’POINT(1000 1000)’,-1),GEOM) < 100;
Examples from the manual.

ISQL Perform instruction: after editadd editupdate of table vs. after add update of table

INFORMIX-SQL 7.3 Perform Screens:
According to documentation, in an "after editadd editupdate of table" control block, its instructions are executed before the row is added or updated to the table, whereas in an "after add update of table" control block, its instructions are executed after the row has been added or updated to the table. Supposedly, this would mean that any instructions which would alter values of field-tags linked to table.columns would not be committed to the table, but field-tags linked to displayonly fields will change?
However, when using "after add update of table", I placed instructions which alter values for field-tags linked to table.columns and their displayed and committed values also changed! I would have thought that an "after add update of table" would only alter displayonly fields.
TABLES
customer
transaction
branch
interest
dates
ATTRIBUTES
[...]
q = transaction.trx_type, INCLUDE=("E","C","V","P","T"), ...;
tb = transaction.trx_int_table,
LOOKUP f1 = ta_days1_f,
t1 = ta_days1_t,
i1 = ta_int1,
[...]
JOINING *interest.int_table, ...;
[...]
INSTRUCTIONS
customer MASTER OF transaction
transaction MASTER OF customer
delimiters ". ";
AFTER QUERY DISPLAY ADD UPDATE OF transaction
if z = "E" then let q = "E"
if z = "C" then let q = "C"
if z = "1" then let q = "E"
[...]
END
Is 'z' a column in the transaction table?
Is the trouble that the value in 'z' is causing a change in the value of 'q' (aka transaction.trx_type), and the modified value is being stored in the database?
Is the value in 'z' part of the transaction table?
Have you verified that the value in the DB is indeed changed - using the Query Language option or a simple (default) form?
It might look as if it is because the instruction is also used AFTER DISPLAY, so when the values are retrieved from the DB, the value displayed in 'q' would be the mapped values corresponding to the value stored in 'z'. You would have to inspect the raw data to hide that mapping.
If this is not the problem, please:
Amend the question to show where 'z' comes from.
Also describe exactly what you do and see.
Confirm that the data in the database, as opposed to on the screen, is amended.
Please can you see whether this table plus form behaves the same for you as it does for me?
Table Transaction
CREATE TABLE TRANSACTION
(
trx_id SERIAL NOT NULL,
trx_type CHAR(1) NOT NULL,
trx_last_type CHAR(1) NOT NULL,
trx_int_table INTEGER NOT NULL
);
Form
DATABASE stores
SCREEN SIZE 24 BY 80
{
trx_id [f000]
trx_type [q]
trx_last_type [z]
trx_int_table [f001 ]
}
END
TABLES
transaction
ATTRIBUTES
f000 = transaction.trx_id;
q = transaction.trx_type, UPSHIFT, AUTONEXT,
INCLUDE=("E","C","V","P","T");
z = transaction.trx_last_type, UPSHIFT, AUTONEXT,
INCLUDE=("E","C","V","P","T","1");
f001 = transaction.trx_int_table;
INSTRUCTIONS
AFTER ADD UPDATE DISPLAY QUERY OF transaction
IF z = "E" THEN LET q = "E"
IF z = "C" THEN LET q = "C"
IF z = "1" THEN LET q = "E"
END
Experiments
[The parenthesized number is automatically generated by IDS/Perform.]
Add a row with data (1), V, E, 23.
Observe that the display is: 1, E, E, 23.
Exit the form.
Observe that the data in the table is: 1, V, E, 23.
Reenter the form and query the data.
Update the data to: (1), T, T, 37.
Observe that the display is: 1, T, T, 37.
Exit the form.
Observe that the data in the table is: 1, T, T, 37.
Reenter the form and query the data.
Update the data to: (1), P, 1, 49
Observe that the display is: 1, E, 1, 49.
Exit the form.
Observe that the data in the table is: 1, P, 1, 49.
Reenter the form and query the data.
Observe that the display is: 1, E, 1, 49.
Choose 'Update', and observe that the display changes to: 1, P, 1, 49.
I did the 'Observe that the data in the table is' steps using:
sqlcmd -d stores -e 'select * from transaction'
This generated lines like these (reflecting different runs):
1|V|E|23
1|P|1|49
That is my SQLCMD program, not Microsoft's upstart of the same name. You can do more or less the same thing with DB-Access, except it is noisier (13 extraneous lines of output) and you would be best off writing the SELECT statement in a file and providing that as an argument:
$ echo "select * from transaction" > check.sql
$ dbaccess stores check
Database selected.
trx_id trx_type trx_last_type trx_int_table
1 P 1 49
1 row(s) retrieved.
Database closed.
$
Conclusions
This is what I observed on Solaris 10 (SPARC) using ISQL 7.50.FC1; it matches what the manual describes, and is also what I suggested in the original part of the answer might be the trouble - what you see on the form is not what is in the database (because of the INSTRUCTIONS section).
Do you see something different? If so, then there could be a bug in ISQL that has been fixed since. Technically, ISQL 7.30 is out of support, I believe. Can you upgrade to a more recent version than that? (I'm not sure whether 7.32 is still supported, but you should really upgrade to 7.50; the current release is 7.50.FC4.)
Transcribing commentary before deleting it:
Up to a point, it is good that you replicate my results. The bad news is that in the bigger form we have different behaviour. I hope that ISQL validates all limits - things like number of columns etc. However, there is a chance that they are not properly validated, given the bug, or maybe there is a separate problem that only shows with the larger form. So, you need to ensure you have a supported version of the product and that the problem reproduces in it. Ideally, you will have a smaller version of the table (or, at least, of the form) that shows the problem, and maybe a still smaller (but not quite as small as my example) version that shows the absence of the problem.
With the test case (table schema and Perform screen that shows the problem) in hand, you can then go to IBM Tech Support with "Look - this works correctly when the form is small; and look, it works incorrectly when the form is large". The bug should then be trackable. You will need to include instructions on how to reproduce the bug similar to those I gave you. And there is no problem with running two forms - one simple and one more complex and displaying the bug - in parallel to show how the data is stored vs displayed. You could describe the steps in terms of 'Form A' and 'Form B', with Form A being Absolutely OK and Form B being Believed to be Buggy. So, add a record with certain values in Form B; show what is displayed in Form B after; show what is stored in the database in Form A after too; show that they are not different when they should be.
Please bear in mind that those who will be fixing the issue have less experience with the product than either you or me - so keep it as simple as possible. Remove as many attributes as you can; leave comments to identify data types etc.

Resources