What is the method for removing inactive, unwanted node labels in a Neo4j database (community edition version 2.2.2)?
I've seen this question in the past but for whatever reason it gets many interpretations, such as clearing browser cache, etc.
I am referring here to labels actually contained in the database, such that the REST command
GET /db/data/labels
will produce them in its output. The labels have been removed from any nodes and there are not active constraints attached to them.
I am aware this question has been asked in the past and there is a cumbersome way of solving it, which is basically, dump and reload the database. The dump command doesn't even contain scattered commit statements and thus needs to be edited before executing it back. Of course this takes forever with big databases. There has to be a better way, or at least there is a feature in the queue of requirements waiting to be implemented. Can someone clarify?
If you delete the last node with a certain label - as you've observed - the label itself does not get deleted. As of today there is no way to delete a label.
However you can copy over the datastore in offline mode using e.g. Michael's store copy tool to achieve this.
The new store is then aware of only those labels which actually are used.
Related
In neo4j I have an application where an API endpoint does CRUD operations on the graph, then I materialize reachable parts of the graph starting at known nodes, and finally I send out the materialized subgraphs to a bunch of other machines that don’t know how to query neo4j directly. However, the materialized views are moderately large, and within a given minute only small parts of each one will change, so I’d like to be able to query “what has changed since last time I checked” so that I only have to send the deltas. What’s the best way to do that? I’m not sure if it helps, but my data doesn’t contain arbitrary-length paths — if needed I can explicitly write each node and edge type into my query.
One possibility I imagined was adding a “last updated” timestamp as a property on every node and edge, and instead of deleting things directly, just add a “deleted” boolean property and update the timestamp, and then use some background process to actually delete a few minutes later (after the deltas have been sent out). Then in my query, select all reachable nodes and edges and filter them based on the timestamp property. However:
If there’s clock drift between two different neo4j write servers and the Raft leader changes from one to the other, can the timestamps go back in time? Or even worse, will two concurrent writes always give me a transaction time that is in commit order, or can they be reordered within a single box? I would rather use a graph-wide monotonically-increasing integer like
the write commit ID, but I can’t find a function that gives me that.
Or theoretically I could use the cookie used for causal consistency,
but since you only get that after the transaction is complete, it’d
be messy to have to do every write as two separate transactions.
Also, it just sucks to use deletion markers because then you have to explicitly remove deleted edges / nodes in every other query you do.
Are there other better patterns here?
the code i'm working on makes heavy usage of TFDMemTables, and clones of those tables using CloneCursor.
Sometimes, under specific conditions which I am unable to identify, the source table and its clone become out of sync: the data between them may be different, the record count as well.
Calling Refresh on the cloned table puts things back in order.
From my understanding, CloneCursor is used to address the same underlying memory where data is stored, meaning alterations to the underlying data from any of the two pointers should reflect on the other table, yet allow the user to have separate filter / record positioning per "view". so how can it possibly go out of sync?
I built a small simulator, where I can insert / delete / filter records in either the table or its clone, and observe the impact on the other one. Changes were reflected correctly.
Another downside of Refresh is that it slows the execution tremendously, if overused.
Has anyone faced similar issues or found explanations / documentation regarding this matter?
Edit:
to clarify what I mean by "out of sync", it means reading a value from the table using FieldByName will return X prior to Refresh, and Y post-refresh. I was not able to reproduce this behavior on the simulator mentioned above.
Before you link me to a duplicate, please read what I'm asking..
I'm building an app which basically has a list of about 5000 teams. These teams are fairly static (they don't change very often). I would like to observe any time one is changed though as it's essential it get's updated in the app ASAP.
If I include dbTeams.ref.observe(.childAdded, with: {}), it runs each time the app starts, loading over all 5000 records despite having them in the persistent storage already (I have enabled persistence).
Now the documentation says this will happen, I know, but with 5000 records (and potentially way more in the future), I can't have this happen.
My options so far (from what I've found and tried) are:
Add a timestamp to each record and create a custom query to call .childAdded after the last timestamp... This is inefficient. Storing a timestamp for soccer teams which will hardly ever change, is silly. It also means keeping a copy of the last time it was checked.
Create a sub-list within the Teams list. This too is silly as you may as well call .value and get the whole bunch of data in one go.
Just live with it... Fine - until it scales to tens of thousands of records. Not clever either.
It just seems weird that all the other event listeners only fire when they are "supposed to" except this one.
Any help would be appreciated - how do I achieve what I need?
The idea is to have git or a git-like system (users, revision tracking, branches, forks, etc) store the 'master copy' of objects and relationships.
Since the master copy is on the filesystem, any changes can be checked in, tracked, and backed up. Neo4j could then import the files and serve queries. This also gives freedom since node and connection files can be imported to any other database.
Changes in Neo4j can be written to these files as part of the query
Nodes and connections can be added by other means (like copying from a seed dataset)
Nodes and connections are rarely created/updated/deleted by users
Most of the usage is where Neo4j shines: querying
Due to these two, the performance penalty on importing can be safely ignored
What's the best way to set this up?
If this isn't wise; how come?
It's possible to do that, but it will be lot of work which would not have a real value. IMHO.
With unmanaged extension for Transaction Event API you are able to store information about each transaction onto disk in your common file format.
Here is the some information about Transaction Event API - http://graphaware.com/neo4j/transactions/2014/07/11/neo4j-transaction-event-api.html
Could you please tell us more about the use case and how would design that system?
In general nothing keeps you from just keeping neo4j database files around (zipped).
Otherwise I would probably use a format which can be quickly exported / imported and diffed too.
So very probably csv files with node-file per label ordered by a sensible key
And then relationship-files between pairs of nodes, with neo4j-import you can recover that data quickly into a graph again.
If you want to write changes to the files you have to make sure they are replayable (appends + updates + deletes) , i.e. you have to chose a format which is more or less a transaction-log (which Neo4j already has).
If you want to do it yourself the TransactionHandler is what you want to look at. Alternatively you could dump the full database to a snapshot at times you request.
There are plans to add point-in-time recovery on the existing tx-logs, which I think would also address your question.
Backstory
This afternoon, I replied to a text from my girlfriend, then apparently neglected to sleep my phone before putting it back in my pocket. When I pulled it back out a few minutes later, my phone had decided to hit "Edit->Clear All" on the conversation, vaporizing two years and two phones worth of SMS history with her. While I have a backup of the phone, it's close to three weeks old at this point, and there's enough solid discussion that I'd like to reconstruct; I've already grabbed a copy of sms.db, but I think the method I used vacuumed the file, so there are no soft-deleted texts in it.
Meat of the Question
I have a three-week old backup of my sms.db, and have access to date copy of her sms.db. I'd like to
export the texts she has but I don't (easy, at least to CSV)
change the "perspective" info (the address field and the sent/received/deleted/unknown field), keeping the timestamp and text
import/merge these new entries into my old sms.db backup
merge this updated backup with my current sms.db (optional/there seems to be an online utility for that)
I don't really know SQL but would be willing to learn; the problem I have is that from what I understand, the tables within sms.db have become more interdependent over the OS's lifespan, and the triggers now call C functions that don't exist outside the phone, so it's not a simple matter of calling a single trigger on multiple entries. Does anyone know of any ways to work around this complexity, or even better, any utilities that have already figured out how to import individual entries into sms.db?
Edit:
I've been examining sms.db, and from what I can tell, the relationships are pretty straightforward:
for message, I need to mostly make sure that the ROWID of any added messages are higher than the current highest ROWID
msg_group holds the message:ROWID of the last message for each contact; I can lookup the correct address within group_member; group_member:group_id corresponds with msg_group:ROWID
msg_group has a hash column; this will probably be the hardest thing to update, since I'm not immediately sure what it's updating, or what hash to use
sqlite_sequencedoesn't seem like it's quite up-to-date; its entries seem to all be smaller than the actual ROWIDs, but I assume this means I won't have to mess with it very much.
I'm not really sure that I'll be able to change msg_pieces at all: it's the table in charge of handling the multiple parts of an MMS message.
Hey did you get this sorted out? if you haven't I suggest taking a look at http://smsmerge.homedns.org/
I have been in a similar position as you have, but I was lucky and had a more recent backup than that.
Let me know if you need a hand with it