DD_INSERT in the update strategy transformation

Informatica automatically inserts all the rows into the target... so why do we have to use the Update Strategy transformation (DD_INSERT) to insert records?

Good question, and you don't have to in an insert-only or update-only case.
The assumption that Informatica automatically inserts all rows into the target is not always true. Yes, by default it inserts, but there are many cases where we want to only update, only insert, delete, insert-or-update, or reject the data.
The Update Strategy transformation is used to control those scenarios.
DD_INSERT - flags the row for insert only.
DD_UPDATE - flags the row for update, based on the key defined in the target.
DD_REJECT - rejects the row.
DD_DELETE - flags the row for delete, based on the key defined in the target.
For these flags to take effect, the session's Treat Source Rows As property must be set to Data Driven.
So, in your case, you can either use DD_INSERT or simply set the session properties to insert only.
If you want to insert new data but update old data, you need both DD_INSERT and DD_UPDATE, flagging each row with one or the other.
If you want to insert new data but ignore old data, you need both DD_INSERT and DD_REJECT.
If you want to only update old data, you can use DD_UPDATE alone or set the session properties to update only. A typical flagging expression is sketched below.
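For example, a common insert-or-update expression inside the Update Strategy transformation looks something like this (a sketch; lkp_customer_id is a hypothetical port returned by a Lookup on the target key):

IIF(ISNULL(lkp_customer_id), DD_INSERT, DD_UPDATE)

Rows whose key is not found in the target are flagged for insert, and rows whose key is found are flagged for update.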

The Update Strategy transformation is used when you want ready-made logic for deciding whether to insert, update, or delete records by comparing source key columns with target key columns. For each row passing through the Update Strategy transformation, you can choose how you want to process your data. The examples at the link below, from Informatica's documentation, will help you understand this:
https://docs.informatica.com/data-integration/powercenter/10-5/transformation-language-reference/constants/dd_insert/examples.html
You can achieve the same thing by using a Lookup, setting an IUD flag in an Expression transformation, and then routing each type of record (ins, upd, del) to a different target based on its flag, updating the target in different ways.
Informatica is giving you ready-to-use logic in the form of the Update Strategy transformation, and DD_INSERT is one part of that.

While it is true that insert is the default behavior, it can be changed in the session properties. See the Treat Source Rows As property documentation.
Only when you want to build logic that decides what to do with each particular row should you use the Data Driven property value, and then use DD_INSERT and the other constants according to your needs.


"Transactional safety" in influxDB

We have a scenario where we want to frequently change the tag of a (single) measurement value.
Our goal is to create a database that stores prognosis values. It should never lose data, and it should track changes to already-written data, such as modifications or overwrites.
Our current plan is to have an additional field "write_ts", which indicates at which point in time the measurement value was inserted or changed, and a tag "version" which is updated with each change.
Furthermore, version '0' should always contain the latest value.
name: temperature
-----------------
time                  write_ts (val)  current_mA (val)  version (tag)  machine (tag)
2015-10-21T19:28:08Z  1445506564      25                0              injection_molding_1
So let's assume I have an updated prognosis value for the example above.
So, I do:
SELECT curr_measurement
INSERT curr_measurement with new tag (version = 1)
DROP curr_measurement
//then
INSERT new_measurement with version = 0
Now my question:
If I lose the connection for whatever reason between the SELECT, INSERT, and DROP:
I would get duplicate records.
(Or if I do SELECT, DROP, INSERT: I lose data.)
Is there any method to prevent that?
Transactions don't exist in InfluxDB
InfluxDB is a time-series database, not a relational database. Its main use case is not one where users are editing old data.
In a relational database that supports transactions, you are protecting yourself against UPDATE and similar operations. Data comes in, existing data gets changed, you need to reliably read these updates.
The main use case in time-series databases is a lot of raw data coming in, followed by some filtering or transforming to other measurements or databases. Picture a one-way data stream. In this scenario, there isn't much need for transactions, because old data isn't getting updated much.
How you can use InfluxDB
In cases like yours, where there is additional data being calculated based on live data, it's common to place this new data in its own measurement rather than as a new field in a "live data" measurement.
As for version tracking and reliably getting updates:
1) Does the version number tell you anything the write_ts number doesn't? Consider not using it, if it's simply a proxy for write_ts. If version only ever increases, it might be duplicating the info given by write_ts, minus the usefulness of knowing when the change was made. If version is expected to decrease from time to time, then it makes sense to keep it.
2) Similarly, if you're keeping old records: does write_ts tell you anything that the time value doesn't?
3) Logging. Do you need to over-write (update) values? Or can you get what you need by adding new lines, increasing write_ts or version as appropriate? The latter is a more "InfluxDB-ish" approach.
4) Reading values. You can read all values as they change with updates. If a client app only needs to know the latest value of something that's being updated (and the time it was updated), querying becomes something like:
SELECT LAST(write_ts), current_mA, machine FROM temperature
You could also try grouping the machine values together:
SELECT LAST(*) FROM temperature GROUP BY machine
So what happens instead of transactions?
In InfluxDB, inserting a point with the same tag keys and timestamp over-writes any existing data with the same field keys, and adds new field keys. So when duplicate entries are written, the last write "wins".
So instead of the traditional SELECT-then-UPDATE approach, it's more like: SELECT A, calculate on A, and INSERT the results as B, possibly with a new timestamp.
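As a concrete illustration, here is a minimal sketch of that pattern using the influxdb Python client (the database name, the 5% adjustment, and the temperature_prognosis measurement are assumptions for this example):

from influxdb import InfluxDBClient

client = InfluxDBClient(host='localhost', port=8086, database='prognosis')

# SELECT A: read the latest value of the live measurement.
result = client.query('SELECT LAST("current_mA") FROM "temperature"')
point = list(result.get_points())[0]

# Calculate on A (a made-up adjustment for illustration).
adjusted = point['last'] * 1.05

# INSERT B: write the result as a new point in its own measurement;
# the original data stays untouched.
client.write_points([{
    "measurement": "temperature_prognosis",
    "tags": {"machine": "injection_molding_1"},
    "fields": {"current_mA": adjusted},
}])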
Personally, I've found InfluxDB excellent for its ability to accept streams of data from all directions, and its simple protocol and schema-free storage mean that new data sources are almost trivial to add. But if my use case has old data being regularly updated, I use a relational database.
Hope that clears up the differences.

Is there any way to update multiple child values at once?

I need to set all the child distance values to 0 in one operation. Is there any way I can do this? The usual Firebase update function works for only one userID at a time.
To write a value, the client must specify the complete path to that value in the database. Firebase does not support the equivalent of SQL's update queries.
So you will need to first load the data, and then update each child. You can perform those updates in a big batch if you want, using multi-location updates. For more on those, see the blog post introducing them and the answer here: Firebase - atomic write of multiple values to multiple locations
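For instance, a minimal sketch with the firebase_admin Python SDK (the /users/<uid>/distance structure, the key file, and the database URL are assumptions for illustration):

import firebase_admin
from firebase_admin import credentials, db

cred = credentials.Certificate('serviceAccountKey.json')  # placeholder key file
firebase_admin.initialize_app(cred, {'databaseURL': 'https://your-app.firebaseio.com'})

# First load the data to find every child that needs the new value.
users = db.reference('users').get()

# Build one multi-location update: each key is a full path from the root.
updates = {'users/' + uid + '/distance': 0 for uid in users}

# Applied as a single atomic write.
db.reference().update(updates)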

How to SELECT, modify and INSERT data in Influx DB?

I'm new to InfluxDB. I currently have two databases in InfluxDB. I want to copy certain data points from a measurement in database1, introduce a couple of field sets manually, modify a few of the field values that I copied, and finally insert the changed data points into database2 under a different measurement.
I can use the SELECT ... INTO statement; however, that will not allow me to make changes to the data points.
If I use the INSERT command, I will have to type all the field sets and tag sets individually.
One solution is to use Python to query the data points, manipulate the data, and then insert it back (see the sketch below). However, that would be a lengthy process.
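For reference, the Python route might look roughly like this (a sketch with the influxdb client; the measurement and field names are hypothetical):

from influxdb import InfluxDBClient

client = InfluxDBClient(host='localhost', port=8086)

# Read the points to copy from the source database.
client.switch_database('database1')
points = client.query('SELECT * FROM "source_measurement"').get_points()

out = []
for p in points:
    out.append({
        "measurement": "new_measurement",
        "time": p['time'],
        "fields": {
            "value": p['value'] * 2,   # modified copy of an existing field
            "note": "manual",          # manually introduced field set
        },
    })

# Write the modified points into the target database.
client.switch_database('database2')
client.write_points(out)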
Is there any easy way to accomplish this task?
Thanks.

Erlang ets:insert_new for bag

In my code I want to take advantage of ETS's bag type, which can store multiple values for a single key. However, it would be very useful to know whether an insertion actually inserted a new value or not (i.e. whether the inserted key-value pair was or was not already present in the bag).
With the set type of ETS I could use ets:insert_new/2, but the semantics are different for bag (emphasis mine):
This function works exactly like insert/2, with the exception that instead of overwriting objects with the same key (in the case of set or ordered_set) or adding more objects with keys already existing in the table (in the case of bag and duplicate_bag), it simply returns false.
Is there a way to achieve such functionality with one call? I understand it can be achieved by a lookup followed by an optional insert, but I am afraid that might hurt the performance of concurrent access.

Amazon SimpleDB Identity Seed equivalent

Is there an equivalent to an identity Seed in SimpleDB?
If the answer is no, how do you handle creating something like a customer number or order number in a way that prevents the creation of duplicate numbers?
My experience is mainly from SQL Server in which I would either create a primary key with an identity seed or use transactions in a stored procedure to increment the number.
Thanks for your help!
You can create unique keys using conditional writes. Just do a PutAttributes call with the next customer number you want to use and the data you want to store. You can't add a condition on the actual item name, but you can use an attribute that always exists (like creation date or user group).
Set the conditions:
Expected.1.Name=creation_date
Expected.1.Exists=false
The call will succeed only if there is no creation_date in an item with that item name. If you always write the creation_date, then you get the effect of optimistic locking on the new item name. Of course, you can use any attribute you want, as long as you always include it in that first conditional put.
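In Python with boto3, the conditional put might look like this (a sketch; the domain, item name, and attributes are hypothetical):

import boto3
from datetime import datetime, timezone

sdb = boto3.client('sdb', region_name='us-east-1')

# Try to claim customer number 1001; this succeeds only if no creation_date
# exists yet under this item name. A ConditionalCheckFailed error means the
# number was already taken, so retry with the next number.
sdb.put_attributes(
    DomainName='customers',
    ItemName='customer_1001',
    Attributes=[
        {'Name': 'creation_date',
         'Value': datetime.now(timezone.utc).isoformat(),
         'Replace': True},
    ],
    Expected={'Name': 'creation_date', 'Exists': False},
)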
The performance of a conditional write is the same as that of a normal write in most situations, but when SimpleDB is under heavy load or experiencing high internal network latencies, these calls will take longer compared to normal writes. During rare failure scenarios inside SimpleDB, conditional writes will fail completely for a period of time.
If you can't tolerate this, you will have to code some alternate way to get your unique keys during outages. A different SimpleDB region could be used for key generation only, since SimpleDB will still accept normal writes (non-conditional PutAttributes) during outages.
If you don't already have something unique that will work, using a GUID for the Item is probably the typical solution.
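For example, in Python (a sketch of the GUID approach):

import uuid

# A random GUID is unique for all practical purposes, with no coordination needed.
item_name = str(uuid.uuid4())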
