How can I replace a ksqlDB table while keeping the same state store? - ksqldb

Using the ksqlDB bundled with Confluent Platform 5.5.1 (which corresponds to ksqlDB 0.7.1, I think), I created an aggregated table:
CREATE TABLE xxx WITH (KAFKA_TOPIC = 'xxx') AS
SELECT xxx
FROM xxx
GROUP BY xxx
EMIT CHANGES;
Let's say I have to add an attribute yyy to the query. I have to DROP the table and re-create it, but in doing so a new state store is created and I lose all the old aggregated values.
One workaround would be to use infinite retention and re-create the table with 'auto.offset.reset' = 'earliest', but I'm looking for a better solution that doesn't involve infinite retention.
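For reference, a minimal sketch of that workaround, reusing the placeholder names from above (the only changes are the session property and the added attribute):
-- Re-read the source topic from the beginning so the re-created table
-- rebuilds its state from all retained records.
SET 'auto.offset.reset' = 'earliest';
CREATE TABLE xxx WITH (KAFKA_TOPIC = 'xxx') AS
SELECT xxx, yyy
FROM xxx
GROUP BY xxx
EMIT CHANGES;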
Any idea? It would be great to be able to do REPLACE TABLE xxx AS ....

This is not currently possible, but is being worked on: https://github.com/confluentinc/ksql/blob/master/design-proposals/klip-28-create-or-replace.md
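For what it's worth, the KLIP proposes syntax along these lines (a sketch based on the design document, so details may change before it ships):
CREATE OR REPLACE TABLE xxx WITH (KAFKA_TOPIC = 'xxx') AS
SELECT xxx, yyy
FROM xxx
GROUP BY xxx
EMIT CHANGES;
The intent is that the replaced query keeps its query ID and therefore its existing state store, so the old aggregate values are preserved.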

Related

What are best practices for deleting/altering Cassandra columns of collection data types?

In our Cassandra table, every time we change the data type of a collection-type column it starts causing issues. For example:
To change a column's data type from text to map<text, float> we do this:
drop the existing column
wait for Cassandra to assimilate this change
add a column with the same name but the different data type
This reflects fine in all nodes, but Cassandra logs start complaining during compaction with:
RuntimeException: 6d6...73 is not defined as a collection
I figured out that the comparator entries in the "system.schema_columnfamilies" table are not correct. Dropping the table and recreating it fixes the problem, but that is not always possible.
Are there any best practices for dealing with collection-type columns in situations like the above?
Database version: DataStax Enterprise 4.7.1, Cassandra 2.1.8.621, cqlsh 5.0.1
I guess you stumbled upon one of those WAT moments in Cassandra. It is bad practice to give a new column the same name as a previously dropped one; sometimes it doesn't even work.
Regarding schema (or data) migrations, take a look at our tool. It can help you execute schema updates while keeping data and populating fields.
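A minimal CQL sketch of the safer pattern implied above (the keyspace, table, and column names are hypothetical):
-- Instead of dropping the text column and re-adding a map under the same name,
-- add the collection under a fresh name and migrate the data in the application.
ALTER TABLE shop.products ADD price_by_currency map<text, float>;
-- ...backfill price_by_currency from the old price column in application code...
ALTER TABLE shop.products DROP price;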

Soft deletes with append only database in Ruby on Rails

Using Ruby on Rails and an append-only database (Google BigQuery), what is the best practice for soft deletes? The pattern I'm considering is to append a new row for each update/delete and only collect the most recent record, but I'm not sure of a clean way to do that with Active Record. Any other suggested patterns / best practices?
BigQuery is designed for analytics against massive datasets.
If that is your case, you can ignore the slowness that appending new update/delete rows and keeping the historical rows might introduce.
In BigQuery it is quite simple to get the most recent version of each row using a window function.
For example, assuming "id" is the primary key identifying a record/row and "ts" is its timestamp:
SELECT <fields list>
FROM (
  SELECT <fields list>,
         ROW_NUMBER() OVER (PARTITION BY id ORDER BY ts DESC) AS num
  FROM YourTable
)
WHERE num = 1
If you need to do historical analysis, this also works well: it is easy to write a query that returns each row as it looked at a given point in time.
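A sketch of such an "as of" query, reusing the hypothetical id and ts columns from above (the cutoff timestamp is just an example):
SELECT <fields list>
FROM (
  SELECT <fields list>,
         ROW_NUMBER() OVER (PARTITION BY id ORDER BY ts DESC) AS num
  FROM YourTable
  WHERE ts <= TIMESTAMP('2016-01-01 00:00:00')  -- the "as of" point in time
)
WHERE num = 1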
If you do not need historical versions, you can periodically clean up old rows. For this it is better to keep your data partitioned by day (or month, or whatever other dimension fits your case better).
BigQuery has excellent support for querying partitioned tables - see the table wildcard functions.
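For example, a legacy SQL sketch using TABLE_DATE_RANGE to read only the last seven daily tables of a hypothetical events_YYYYMMDD series (the dataset and table prefix are assumptions):
SELECT <fields list>
FROM (TABLE_DATE_RANGE([mydataset.events_],
                       DATE_ADD(CURRENT_TIMESTAMP(), -7, 'DAY'),
                       CURRENT_TIMESTAMP()))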
If you want to stick with BigQuery, this is a good approach, and I recommend exploring it further.
There are two things to consider here: how authentic do you want your revision history to be, and how important is performance?
The quick and dirty way to do this is to save a JSON copy of your record to a history table. This is easy to do, but there is no guarantee that the data will be schema-compatible with future versions of the table, that is, you may not be able to restore it easily.
Don't pollute your primary table with deleted or historical versions. That leads to nothing but trouble and makes querying brutally slow.

Most effective, secure way to delete Postgres column data

I have a column in my Postgres table whose data I want to remove for expired rows. What's the best way to do this securely? It's my understanding that simply writing 0's to those columns is ineffective, because Postgres creates a new row version on UPDATE and marks the old row as dead.
Is the best way to set the column to null and manually vacuum to clean up the old records?
I will first say that it is bad practice to alter data like this - you are changing history. Also, the below is only ONE way to do it (a quick and dirty way, not to be recommended):
1. Back up your database first.
2. Open pgAdmin, select the database, and open the Query Editor.
3. Run something like this:
UPDATE <table_name>
SET <column_name> = <new value, e.g. NULL>
WHERE <condition identifying the dead rows>
The WHERE part is for you to figure out, based on how you identify which rows are dead (e.g. is_removed = true or is_deleted = true are common for soft-deleted records).
Obviously you would have to run this script regularly. The better way would be to update your application to do this job instead.
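Coming back to the dead-row concern in the question, a rough sketch (the table and column names are hypothetical) that nulls the column and then reclaims the dead row versions left behind by the UPDATE:
-- Null out the sensitive column for expired rows.
UPDATE user_sessions
SET auth_token = NULL
WHERE expires_at < now();
-- Reclaim the dead tuples created by the UPDATE. Plain VACUUM makes the space
-- reusable; VACUUM FULL rewrites the table and returns space to the OS,
-- but takes an exclusive lock while it runs.
VACUUM user_sessions;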

Unable to edit data created on primary node in mongodb after the node became secondary

I'm using a MongoDB replica set in my Rails application, with one primary (node A) and one secondary (node B).
It was working perfectly fine until I added one more node (node C) and made node C the primary. The new primary (node C) has all the content, but from what I can see, content created on the previous primary (node A) can now only be read, not edited or destroyed. As I understand it, data can only be written to the primary node, so I guess data from the secondary (node A, the earlier primary) can only be read when accessed.
Is this common behaviour, or am I missing something?
EDIT:
I took a database dump of the replica set from the primary node (node C), ran db.dropDatabase(), and then ran mongorestore. I found data missing in some collections. Can anyone explain what the issue could be?
In a mongodb replica set you can only write (modify, create, destroy) on the primary node. Writes are then propagated to other (secondary) nodes in the replica set. Note that this propagation may not be immediate.
However, when the primary changes you should still be able to write to data previously written by another primary.
Note that when you add a node to a replica set, it is preferable to load the latest database backup onto that node first. The replication process is based on an oplog, shared between the nodes, that records creations, deletions and updates; this oplog has a limited number of entries, so earlier entries may no longer be available to your new primary ...

Importing data from oracle to neo4j using java API

Can you please share any links / sample source code for generating a graph in Neo4j from Oracle database table data?
My use case: Oracle schema table names become nodes and their columns become properties. I also need to generate the graph as a tree structure.
Make sure you commit the transaction after creating the nodes with tx.success(), tx.finish().
If you still don't see the nodes, please post your code and/or any exceptions.
Use JDBC to extract your Oracle data, then use the Neo4j Java API to build the corresponding nodes:
GraphDatabaseService db; // initialised elsewhere, e.g. via GraphDatabaseFactory
try (Transaction tx = db.beginTx()) {
    // One node per row, labelled with the table name (Labels is your own enum implementing Label).
    Node datanode = db.createNode(Labels.TABLENAME);
    datanode.setProperty("column name", "column value"); // do this for each column
    tx.success(); // mark the transaction successful so it commits when the block closes
}
Also remember to scale your transactions. I tend to use around 1500 creates per transaction and it works fine for me, but you might have to play with it a little bit.
Page through the source table in batches: fetch, say, 1000 records at a time, keep them in a collection, build nodes from them, and repeat until you have handled every record. Note that Oracle does not support LIMIT/OFFSET; use ROWNUM or, on Oracle 12c and later, the OFFSET ... FETCH syntax.
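A sketch of such a paging query in Oracle 12c syntax (the table and column names are hypothetical):
-- Bind the offset (batches already processed * 1000) from JDBC.
SELECT *
FROM employees
ORDER BY employee_id
OFFSET ? ROWS FETCH NEXT 1000 ROWS ONLY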
Not sure what you mean by "I also need to generate the graph as a tree structure". If you mean you'd like to convert foreign keys into relationships: index the key and, instead of adding the FK as a property, create a relationship to the referenced node, which you can find with an index lookup. Or you could build your own little in-memory index with a HashMap. But since you're already storing 1000 SQL records in memory, plus you are building up the transaction, you need to be a bit careful with memory depending on your JVM settings.
You need to code this ETL process yourself. Follow the steps below:
Write your first Neo4j example by following this article.
Understand how to model with graphs.
There are multiple ways of talking to Neo4j using Java. Choose the one that suits your needs.
