METADATA$ROW_ID outside a stream? - stream

(1) In Snowflake, is there a way to access METADATA$ROW_ID outside of a Snowflake table stream?
(2) Am I correct in thinking that this ROW_ID is actually necessary to process UPDATEs correctly? (The docs make it seem rather optional.)
Somewhat related,
(3) When a stream is exported and multiple files get created, is there any guarantee that each DELETE-INSERT pair will end up in the same file?

1) No, you'd need to store it in your downstream tables in order to use it later (see the sketch after this list).
2) Necessary only if you don't have a natural key in your data to use to UPDATE/MERGE on.
3) No, there isn't a way to guarantee which records end up in which files on a COPY INTO <location> statement. The only workaround is to force a single output file (the SINGLE copy option), which might not be feasible and is a slower process.
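For point 1, a minimal sketch of what "store it downstream" could look like, using the Snowflake JDBC driver. The CHANGE_LOG table, the SRC_STREAM stream, the ID/VAL columns, and the connection details are all placeholders, not anything from the original question:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;
    import java.util.Properties;

    public class PersistStreamMetadata {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("user", System.getenv("SNOWFLAKE_USER"));
            props.put("password", System.getenv("SNOWFLAKE_PASSWORD"));

            // Account, database, schema and warehouse are placeholders;
            // requires the snowflake-jdbc driver on the classpath.
            String url = "jdbc:snowflake://myaccount.snowflakecomputing.com/"
                       + "?db=MYDB&schema=PUBLIC&warehouse=MYWH";

            try (Connection conn = DriverManager.getConnection(url, props);
                 Statement stmt = conn.createStatement()) {
                // Copy the stream's rows together with its metadata columns into a
                // plain table, so METADATA$ROW_ID survives after the stream offset
                // advances. ID and VAL stand in for your real columns.
                stmt.executeUpdate(
                    "INSERT INTO CHANGE_LOG (ID, VAL, ROW_ID, ACTION, IS_UPDATE) " +
                    "SELECT ID, VAL, METADATA$ROW_ID, METADATA$ACTION, METADATA$ISUPDATE " +
                    "FROM SRC_STREAM");
            }
        }
    }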

Dynamic query usage while streaming with Google Dataflow?

I have a Dataflow pipeline that is set up to receive information (JSON), transform it into a DTO, and then insert it into my database. This works great for inserts, but where I am running into issues is with handling delete records. The JSON I receive contains a deleted tag to specify when a record is actually being deleted. After some research and experimenting, I am at a loss as to whether or not this is possible.
My question: Is there a way to dynamically choose (or change) which SQL statement the pipeline uses while streaming?
To achieve this with Dataflow you need to think more in terms of water flowing through pipes than in terms of if-then-else coding.
You need to classify your records into INSERTs and DELETEs and route each set to a different sink that will do what you tell it to. You can use tags for that.
In this pipeline design example, instead of startsWithATag and startsWithBTag you can use tags for Insert and Delete.
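A rough sketch of that routing with the Apache Beam Java SDK. The class name, the Create.of stand-in source, and the crude string test for the deleted flag are all assumptions; in a real pipeline you would parse the JSON properly and attach a sink such as JdbcIO to each branch:

    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.transforms.Create;
    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.transforms.ParDo;
    import org.apache.beam.sdk.values.PCollection;
    import org.apache.beam.sdk.values.PCollectionTuple;
    import org.apache.beam.sdk.values.TupleTag;
    import org.apache.beam.sdk.values.TupleTagList;

    public class RouteInsertsAndDeletes {

        // Tags naming the two branches of the pipeline.
        static final TupleTag<String> INSERTS = new TupleTag<String>() {};
        static final TupleTag<String> DELETES = new TupleTag<String>() {};

        public static void main(String[] args) {
            Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

            // Stand-in for the real streaming source; each element is a JSON record.
            PCollection<String> records = p.apply(Create.of(
                    "{\"id\":1,\"deleted\":false}",
                    "{\"id\":2,\"deleted\":true}"));

            PCollectionTuple routed = records.apply("RouteByAction",
                ParDo.of(new DoFn<String, String>() {
                    @ProcessElement
                    public void processElement(ProcessContext c) {
                        // Crude check of the "deleted" flag; use a real JSON parser in practice.
                        if (c.element().contains("\"deleted\":true")) {
                            c.output(DELETES, c.element());
                        } else {
                            c.output(c.element()); // main output = INSERTS
                        }
                    }
                }).withOutputTags(INSERTS, TupleTagList.of(DELETES)));

            // Each branch can now get its own sink, e.g. a JdbcIO.write() configured
            // with an INSERT statement for one and a DELETE statement for the other.
            PCollection<String> inserts = routed.get(INSERTS);
            PCollection<String> deletes = routed.get(DELETES);

            p.run().waitUntilFinish();
        }
    }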

Find changes quickly in larger SQL database?

There is a Java Swing application which uses an Informix database. I have user rights granted for the Swing application (i.e. no source code), and read-only access to a mirror of the database.
Sometimes I need to find the database column that is backing a GUI element (TextBox, TableField, Label...). What would be the best approach to find out which table and column hold the data shown, e.g., in a TextBox?
My general approach is to capture the state of the database, commit a change through the GUI, and then capture the state of the database again. Then I need to examine the difference. I've already tried:
Use the nrows field of systables: didn't work, because the number in nrows does not seem to be a real-time representation of the row count.
Create a script with SELECT COUNT(*) ... for all tables: didn't work because there are too many tables (> 5000). I also tried to optimize by removing empty tables, but there are still too many left.
Is there a simple solution that I'm missing?
Please look at the Change Data Capture API and check if it suits your needs.
There probably isn't a simple solution.
You probably need to build yourself a map of the database, or a data dictionary for it. It sounds as though you can eliminate many of the tables from consideration, since they're empty, at least for a preliminary pass. If you're dealing with information in a text box, the chances are it is some sort of character data; you can analyze which (non-empty) tables contain longer character strings, and they'd be the primary targets of your searches. If the schema is badly designed, with lots of VARCHAR(255) columns even though the columns normally only hold short strings, life is more difficult. Over time, you can begin to classify tables and columns so that you end up knowing where to look for parts of the application.
One problem to beware of: the tabid in informix.systables isn't necessarily as stable as you'd like. Your data dictionary needs to record its own dd_tabid for the table it describes, and can store the last known tabid from informix.systables, but it needs to be ready to find a new tabid value on occasion. You should probably only mark data in your dictionary for logical deletion.
To some extent, this assumes you can create a database in which to record this information. If you can't create an Informix database, you may have to use something else (MySQL, or SQLite, perhaps) to store the data dictionary. Alternatively, go to your DBA team and ask them for the information. Unless you're trying something self-evidently untoward, they're likely to help (but politics can get in the way — I've no idea how collegial your teams are).
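If it helps, here is a hedged JDBC sketch of the "look at the character columns" idea above: list the CHAR/VARCHAR columns of the user tables from the system catalog, then probe each one for the exact value you just typed into the GUI. The connection string, credentials, and coltype codes are assumptions to verify against your Informix version:

    import java.sql.*;
    import java.util.ArrayList;
    import java.util.List;

    public class FindColumnForValue {
        public static void main(String[] args) throws Exception {
            String needle = args[0]; // the exact value typed into the TextBox

            // Requires the Informix JDBC driver on the classpath.
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:informix-sqli://dbhost:9088/mydb:INFORMIXSERVER=myserver",
                    "readonly_user", "secret")) {

                // CHAR (0) and VARCHAR (13) columns of user tables; the nullable
                // flag adds 256 to coltype, hence the MOD. Adjust the codes
                // (LVARCHAR, NCHAR, ...) to match your server version.
                List<String[]> candidates = new ArrayList<>();
                try (Statement cat = conn.createStatement();
                     ResultSet rs = cat.executeQuery(
                         "SELECT t.tabname, c.colname" +
                         " FROM systables t JOIN syscolumns c ON c.tabid = t.tabid" +
                         " WHERE t.tabtype = 'T' AND t.tabid >= 100" +
                         "   AND MOD(c.coltype, 256) IN (0, 13)")) {
                    while (rs.next()) {
                        candidates.add(new String[] {rs.getString(1), rs.getString(2)});
                    }
                }

                // Probe every candidate column for the value.
                for (String[] tc : candidates) {
                    String probe = "SELECT FIRST 1 1 FROM " + tc[0] + " WHERE " + tc[1] + " = ?";
                    try (PreparedStatement ps = conn.prepareStatement(probe)) {
                        ps.setString(1, needle);
                        try (ResultSet rs = ps.executeQuery()) {
                            if (rs.next()) {
                                System.out.println(tc[0] + "." + tc[1]);
                            }
                        }
                    } catch (SQLException skip) {
                        // Some tables/columns may not be accessible; ignore and continue.
                    }
                }
            }
        }
    }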

After clearing the Neo4j database, when creating a new node it starts counting from where the increment was before [duplicate]

Is there a way to reset the indices once I have deleted the nodes, just as if I had deleted the whole folder manually?
I am deleting the whole database with node.delete() and relation.delete(), and I just want the indices to start at 1 again and not where I had actually stopped...
I assume you are referring to the node and relationship IDs rather than the indexes?
Quick answer: You cannot explicitly force the counter to reset.
Slightly longer answer: Generally speaking, these IDs should not carry any relevance within your application. There have been a number of discussions about this on the Neo4j mailing list and Stack Overflow, as the ID is an internal artifact and should not be used like a primary key. Its purpose is more akin to an in-memory address, and if you require unique identifiers, you are better off considering something like a UUID.
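For illustration, a small sketch of the UUID approach using the current Neo4j Java driver over Bolt (not the embedded API from the question); the URL, credentials, label, and property names are placeholders:

    import org.neo4j.driver.AuthTokens;
    import org.neo4j.driver.Driver;
    import org.neo4j.driver.GraphDatabase;
    import org.neo4j.driver.Session;
    import org.neo4j.driver.Values;

    import java.util.UUID;

    public class CreateNodeWithUuid {
        public static void main(String[] args) {
            try (Driver driver = GraphDatabase.driver("bolt://localhost:7687",
                     AuthTokens.basic("neo4j", "password"));
                 Session session = driver.session()) {

                // Give every node its own stable identifier instead of relying on
                // the internal node ID, which Neo4j may reuse after deletions.
                String uuid = UUID.randomUUID().toString();
                session.run("CREATE (n:Item {uuid: $uuid, name: $name})",
                        Values.parameters("uuid", uuid, "name", "example"));
            }
        }
    }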
You can stop your database, delete all the files in the database folder, and start it again.
This way, the ID generation will start back from 1.
This procedure completely wipes your data, so handle with care.
Now you certainly can do this using Python.
see https://stackoverflow.com/a/23310320

Trimming BOLD_CLOCKLOG table

I am doing some maintenance on a database for an application that uses the Bold for Delphi object persistence framework. This database has been in production for several years, and several of the tables have grown quite large. One of them is BOLD_CLOCKLOG, which has something to do with Bold's transaction management.
I want to trim this table (it is up to 1.2GB, with entries from Jan 2006).
Can anyone confirm the system does not need this old information?
From the Bold documentation:
BOLD_CLOCKLOG
To be able to map the transaction numbers used in the TimeStamp columns to the corresponding physical time (such as 2001-01-01 12:34) the persistence mapper will store a log with timestamps and times. Normally, this log is written for each database operation, but if the traffic to the database is very intensive, it is possible to restrict how often this log is written by setting the property ClockLogGranularity. The event OnGetCurrentTime should also be implemented to ensure that all clients have the same time. The usage of this table can be controlled with the tagged value: Model.UseClockLog
So I believe this is used for versioning Bold objects; see the Object Versioning Extension in the Bold documentation. If your application doesn't need this, you can drop the table from the database.
In our Bold application we don't use that feature. Why not simply turn off Bold_ClockLog in the model, drop that big table, and try your application? I'm pretty sure that if something is wrong it will say so at once.
I can also mention that we have our own custom object history. It is simply a big string (as TStringList.DelimitedText) in an ObjectHistory class that has a time, a user, and a note about the action. This suits our needs better than Bold's built-in object history. The disadvantage, of course, is that we need to add calls in the code wherever something should be logged to the history.
Bold_ClockLog is an optional table; its purpose is to store the mapping between integer timestamps and the corresponding DateTime values.
This allows you to find out datetime of the last modification to any object.
If you don't need this feature feel free to empty the table, it won't cause any problems.
In addition to Bold_ClockLog, Bold_XFiles is another optional table that tends to grow large. But unlike Bold_ClockLog, Bold_XFiles cannot be emptied.
Both of these tables can be turned on/off via tagged values in the model.

Amazon SimpleDB Identity Seed equivalent

Is there an equivalent to an identity Seed in SimpleDB?
If the answer is no, how do you handle creating something like a customer number or order number in a way that prevents duplicate numbers?
My experience is mainly from SQL Server in which I would either create a primary key with an identity seed or use transactions in a stored procedure to increment the number.
Thanks for your help!
You can create unique keys using conditional writes. Just do a PutAttributes with the next customer number you want to use and the data you want to store. You can't add a condition on the actual item name, but you can use an attribute that always exists (like creation date or user group).
Set the conditions:
Expected.1.Name=creation_date
Expected.1.Exists=false
The call will succeed only if there is no creation_date in an item with that item name. If you always write the creation_date, then you get the effect of optimistic locking on the new item name. Of course you can use any attribute you want, so long as you always include it in that first conditional put.
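With the AWS SDK for Java (v1), the same conditional put looks roughly like this; the domain, item name, and attribute values are placeholders, and the UpdateCondition corresponds to the Expected parameters above:

    import com.amazonaws.AmazonServiceException;
    import com.amazonaws.services.simpledb.AmazonSimpleDB;
    import com.amazonaws.services.simpledb.AmazonSimpleDBClientBuilder;
    import com.amazonaws.services.simpledb.model.PutAttributesRequest;
    import com.amazonaws.services.simpledb.model.ReplaceableAttribute;
    import com.amazonaws.services.simpledb.model.UpdateCondition;

    import java.util.Arrays;
    import java.util.List;

    public class ClaimCustomerNumber {
        public static void main(String[] args) {
            AmazonSimpleDB sdb = AmazonSimpleDBClientBuilder.defaultClient();

            String itemName = "customer-000123"; // the number you are trying to claim
            List<ReplaceableAttribute> attrs = Arrays.asList(
                    new ReplaceableAttribute("creation_date", "2013-05-01T12:00:00Z", true),
                    new ReplaceableAttribute("name", "ACME Corp", true));

            // Succeeds only if no item with this name has a creation_date yet,
            // i.e. only the first writer wins the item name.
            UpdateCondition onlyIfNew = new UpdateCondition()
                    .withName("creation_date")
                    .withExists(false);

            try {
                sdb.putAttributes(new PutAttributesRequest("customers", itemName, attrs, onlyIfNew));
                System.out.println("claimed " + itemName);
            } catch (AmazonServiceException e) {
                // A conditional-check failure means someone else already used this
                // number; pick the next number and retry.
                System.out.println("already taken: " + e.getErrorCode());
            }
        }
    }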
The performance of a conditional write is the same as a normal write in most situations, but when SimpleDB is under heavy load or experiencing high internal network latencies, these calls will take longer compared to normal writes.
If you can't tolerate this, you will have to code some sort of alternate way to get your unique keys during outages. A different SimpleDB region could be used for key generation only, since SimpleDB will still accept the normal writes (non-conditional PutAttributes) during outages.
If you don't already have something unique that will work, using a GUID for the item name is probably the typical solution.

Resources