Talend: tMap not accepting lookup connection

I have database queries which return two sets of data from the same table. I need to map the results of those queries to each other based on the values they have in common. This is how far I got:
My original idea was to use the output from tLogRow_2 as a lookup, but Talend won't let me do that. What am I doing wrong?

You can't have a closed loop in Talend. You need to remove the 2nd OnComponentOk link (to tMysqlInput_1) if you want to use that output as a lookup. You only need to hook the OnComponentOk trigger to the first component of your next subjob (a startable component).

Related

"Transactional safety" in influxDB

We have a scenario where we want to frequently change the tag of a (single) measurement value.
Our goal is to create a database that stores prognosis values. It should never lose data, and it should track changes to already-written data, such as updates or overwrites.
Our current plan is to have an additional field "write_ts", which indicates at which point in time the measurement value was inserted or changed, and a tag "version" which is updated with each change.
Furthermore, version '0' should always contain the latest value.
name: temperature
-----------------
time                  write_ts (val)  current_mA (val)  version (tag)  machine (tag)
2015-10-21T19:28:08Z  1445506564      25                0              injection_molding_1
Now let's assume I have an updated prognosis value for this example.
So, I do:
SELECT curr_measurement
INSERT curr_measurement with new tag (version = 1)
DROP curr_measurement
//then
INSERT new_measurement with version = 0
Now my question:
If I lose the connection for whatever reason between the SELECT, INSERT, and DROP:
I would get double records.
(Or if I do SELECT, DROP, INSERT: I lose data.)
Is there any method to prevent that?
Transactions don't exist in InfluxDB
InfluxDB is a time-series database, not a relational database. Its main use case is not one where users are editing old data.
In a relational database that supports transactions, you are protecting yourself during UPDATE and similar operations: data comes in, existing data gets changed, and you need to read those updates reliably.
The main use case in time-series databases is a lot of raw data coming in, followed by some filtering or transforming to other measurements or databases. Picture a one-way data stream. In this scenario, there isn't much need for transactions, because old data isn't getting updated much.
How you can use InfluxDB
In cases like yours, where there is additional data being calculated based on live data, it's common to place this new data in its own measurement rather than as a new field in a "live data" measurement.
As for version tracking and reliably getting updates:
1) Does the version number tell you anything the write_ts number doesn't? Consider not using it, if it's simply a proxy for write_ts. If version only ever increases, it might be duplicating the info given by write_ts, minus the usefulness of knowing when the change was made. If version is expected to decrease from time to time, then it makes sense to keep it.
2) Similarly, if you're keeping old records: does write_ts tell you anything that the time value doesn't?
3) Logging. Do you need to over-write (update) values? Or can you get what you need by adding new lines, increasing write_ts or version as appropriate? The latter is the more "InfluxDB-ish" approach.
4) Reading values. If you log changes instead of over-writing, you can still read every version of a value as it changes. If a client app only needs to know the latest value of something that's being updated (and the time it was updated), querying becomes something like:
SELECT LAST(write_ts), current_mA, machine FROM temperature
You could also try grouping the machine values together:
SELECT LAST(*) FROM temperature GROUP BY machine
So what happens instead of transactions?
In InfluxDB, inserting a point with the same measurement, tag set, and timestamp over-writes the existing values of any matching field keys and adds any new field keys. So when duplicate points are written, the last write "wins".
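For example, writing two points to the same series (measurement plus tag set) at the same timestamp, using the schema from the question (values illustrative):
INSERT temperature,machine=injection_molding_1,version=0 write_ts=1445506564,current_mA=25 1445455688000000000
INSERT temperature,machine=injection_molding_1,version=0 write_ts=1445506999,current_mA=26 1445455688000000000
After the second INSERT, only one point remains at 2015-10-21T19:28:08Z: current_mA is 26 and write_ts is 1445506999. The second write silently replaced the first.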
So instead of the traditional SELECT-then-UPDATE approach, it's more like: SELECT from A, calculate on the result, and INSERT it into B, possibly with a new timestamp.
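With the schema from the question, that pattern might look like this (the prognosis measurement name is a hypothetical placeholder):
SELECT current_mA FROM temperature WHERE machine = 'injection_molding_1'
Then calculate the new prognosis value in your application, and write it to its own measurement:
INSERT temperature_prognosis,machine=injection_molding_1 current_mA=26.5,write_ts=1445507000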
Personally, I've found InfluxDB excellent for its ability to accept streams of data from all directions, and its simple protocol and schema-free storage mean that new data sources are almost trivial to add. But if my use case has old data being regularly updated, I use a relational database.
Hope that clears up the differences.

How to SELECT, modify and INSERT data in InfluxDB?

I'm new to InfluxDB. I currently have two databases in InfluxDB. I now want to copy certain data points from a measurement in database1, introduce a couple of field sets manually, modify the few field values that I copied, and finally insert the changed data points into database2 under a different measurement.
I can use the SELECT ... INTO statement; however, that will not allow me to make changes to the data points.
If I use the INSERT command, I will have to type all the field sets and tag sets individually.
One solution is to use Python to query the data points, manipulate the data, and then insert it back. However, that would be a lengthy process.
Is there any easy way to accomplish this task?
Thanks.
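For what it's worth, the Python route mentioned in the question might look something like this; a minimal sketch using the influxdb client library, with all measurement and field names as hypothetical placeholders:
from influxdb import InfluxDBClient

client = InfluxDBClient(host='localhost', port=8086)

# Copy the data points of interest from database1.
result = client.query('SELECT * FROM "source_measurement"', database='database1')

points = []
for p in result.get_points():
    ts = p.pop('time')
    # Note: tags come back merged with the fields here; move any tag
    # keys into a separate 'tags' dict if you need them kept as tags.
    p['extra_field'] = 42                       # introduce a field set manually
    p['copied_field'] = p['copied_field'] * 2   # modify a copied field value
    points.append({
        'measurement': 'target_measurement',    # a different measurement
        'time': ts,
        'fields': p,
    })

# Insert the changed data points into database2.
client.write_points(points, database='database2')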

Assign Key Field Value Only If Corresponding Lookup Result Value Exists

I have ten master tables and one transaction table. In my transaction table (an in-memory table, like a ClientDataSet) there are ten lookup fields pointing to my ten master tables.
Now I am trying to dynamically assign key field values to all the lookup key fields of the transaction table from a different server (the data comes in as SOAP XML). Before assigning these values, I need to check whether the corresponding result value is valid in the master tables. I am using a filter (e.g. status = 1) to check whether it is valid or not.
Currently, before assigning each key field value, we filter the master table using this filter and use the Locate function to check whether the value is there; if it is located, we assign its key field value.
This works fine if there are only a few records in my master tables. But consider my master tables having fifty thousand records each (yes, the customer has that much data); this leads to a big performance issue.
Could you please help me handle this situation?
Thanks
Basil
The only way to know if it is slow, why, where, and what solution works best is to profile.
Don't make a priori assumptions.
That being said, minimizing round trips to the server and the amount of data transferred is often a good thing to try.
For instance, if your master tables are on the server (not 100% clear from your question), sending a single query (or stored procedure call) that passes all the values to check at once as parameters, does a bunch of "IF EXISTS..." checks, and returns all the answers at once (either as output parameters or as a one-record dataset) would be a good start.
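A sketch of that idea with a TADOQuery (table, field, and variable names here are hypothetical; the pattern extends to all ten masters):
// One round trip checks all lookup values at once.
qry.SQL.Text :=
  'SELECT' +
  ' (SELECT COUNT(*) FROM master1 WHERE key_field = :k1 AND status = 1) AS ok1,' +
  ' (SELECT COUNT(*) FROM master2 WHERE key_field = :k2 AND status = 1) AS ok2';
qry.Parameters.ParamByName('k1').Value := Value1;
qry.Parameters.ParamByName('k2').Value := Value2;
qry.Open;
try
  // Assign a key field only when its master row exists and is valid.
  if qry.FieldByName('ok1').AsInteger > 0 then
    TransData.FieldByName('master1_key').Value := Value1;
  if qry.FieldByName('ok2').AsInteger > 0 then
    TransData.FieldByName('master2_key').Value := Value2;
finally
  qry.Close;
end;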
And 50,000 records is not much, so, as I said initially, you may not even have a performance problem. Check it first!

VirtualStringTree - Any way to determine when a collection of nodes have been checked?

I have a VST using TriStateChecking. It is connected to a database table, so when the user checks a node, its checked field is updated in the database. I would like this to be invisible to the end user; that is, no 'Save' button.
I am currently using the OnChecked() event to update the database. The problem is that when checking a large number of nodes, it essentially executes one SQL UPDATE statement per checked node. What I would like to do is capture/be notified when all the tri-state checking is complete, so I can simply scan the tree and construct one SQL UPDATE statement.
Is there an event I could use once all tri-state checking has completed?
No, iterating is the only way to do it. Even the CheckedCount property works this way.
Just keep an internal list of checked nodes and update it in the OnChecked event. When a large number of nodes has been checked, iterate through your list and construct the SQL statement.
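A sketch of that approach, assuming FChecked is a TList<PVirtualNode> (Generics.Collections) created on the form; the node data record and table/field names are hypothetical:
procedure TForm1.VSTChecked(Sender: TBaseVirtualTree; Node: PVirtualNode);
begin
  // Collect the node instead of updating the database right away.
  if FChecked.IndexOf(Node) = -1 then
    FChecked.Add(Node);
end;

procedure TForm1.FlushChecked;
var
  Node: PVirtualNode;
  Ids: string;
begin
  // One UPDATE statement for everything collected since the last flush.
  Ids := '';
  for Node in FChecked do
  begin
    if Ids <> '' then
      Ids := Ids + ',';
    // PNodeData is the (hypothetical) record type stored in each node.
    Ids := Ids + IntToStr(PNodeData(VST.GetNodeData(Node))^.Id);
  end;
  if Ids <> '' then
  begin
    ADOQuery1.SQL.Text :=
      'UPDATE items SET checked = 1 WHERE id IN (' + Ids + ')';
    ADOQuery1.ExecSQL;
  end;
  FChecked.Clear;
end;
Calling FlushChecked from, say, OnMouseUp then turns many per-node updates into a single statement.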
Just did some tests, and it looks like I can simply use the OnMouseUp() event. I probably should have checked before, oops.

TADOQuery filter and an expression always true

I am trying to filter some records from a TADOQuery. I set the Filtered property to True, and when I set the Filter to field='value', all works fine. I would like to dynamically build this filter by appending
<space>AND field='value'
to an expression that is always true, and I thought 1=1 would do the trick. So I would have 1=1 as the default filter and then just append AND field='value' to it as necessary.
This, however, does not work. The error message reads:
Arguments are of the wrong type, are out of acceptable range, or are in conflict with one another.
Could anyone please tell me what could I use as a versatile always-true expression for this filter?
I suppose it goes without saying, but it depends on the OLE DB provider whether or not this works. When you set a filter on an existing recordset, it ends up going through a different OLE DB interface (IViewFilter, if I remember correctly). So even if a filter works in the WHERE clause of an SQL statement, that does not necessarily mean it will work as a filter. The filter you set gets parsed apart into its component pieces and then passed to the OLE DB interface. It may be that the provider's implementation is not expecting a filter of the form "constant = constant". As a workaround, you might try setting it all in the WHERE clause of the SQL statement.
You have to set the Filtered property to False if you are not filtering anything, and set it to True together with your condition when you want the result set to be filtered.
I would dynamically build the correct SQL property, though, so that you always know exactly what is being sent to the database (and you are sure that only the records you want are received by your program).
The 1=1 trick works fine in the where clause of a query, but not in the filtered property. If you want to disable the filter, set filtered to false and all records will be returned.
The problem with filtering is that it is done client-side. If you are using a database engine such as SQL Server and expect a large set of records to filter, then you're better served by changing the SQL query, which allows the database server to return only the records requested. Just remember to close your TADOQuery first, change the SQL, then re-open it.
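A sketch of that (table and field names are hypothetical); note that the 1=1 trick from the question works fine here, because the server evaluates the WHERE clause:
// Filter on the server instead of via the Filter property.
ADOQuery1.Close;
ADOQuery1.SQL.Text := 'SELECT * FROM customers WHERE 1=1';
if CustomerName <> '' then
  ADOQuery1.SQL.Text := ADOQuery1.SQL.Text +
    ' AND name = ' + QuotedStr(CustomerName);
ADOQuery1.Open;
Parameters would be safer than string concatenation in production code, but this keeps the example close to the filter-building described in the question.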
A trick I use to avoid returning the entire dataset (for large datasets) is to pick a maximum number of records I want to display, then use the SQL TOP syntax to request one more record, n, than that maximum. If I get n records back, I notify the user that there were more than n-1 matches and to adjust the search/filter criteria.
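For example, with a display maximum of 100 (hypothetical table and search field; TOP is SQL Server syntax):
// Request one row more than we intend to display.
ADOQuery1.Close;
ADOQuery1.SQL.Text := 'SELECT TOP 101 * FROM customers WHERE name LIKE :p';
ADOQuery1.Parameters.ParamByName('p').Value := SearchText + '%';
ADOQuery1.Open;
if ADOQuery1.RecordCount > 100 then
  ShowMessage('More than 100 matches; please refine the search.');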
