beatbox bulk delete: Getting MALFORMED_ID

Just like upsert, I want to bulk delete records of a particular custom object using beatbox. Is there any way to do this?
I am getting MALFORMED_ID when I try it.

The delete command in beatbox is based on the delete() SOAP API call. It requires the primary key (Id) of every record being deleted; there is no way to use an external ID, because exactly what is deleted must be known beforehand. (Example for the Contact object:)
sql = "SELECT Id FROM Contact WHERE my_external_id__c in ({})".format(
', '.join("'{}'".format(x) for x in external_ids)
)
svc.delete([x['Id'] for x in soap.query(sql)])
You can see in the nearby docs that the update() and upsert() calls do support external IDs.
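For completeness, a minimal end-to-end sketch, assuming beatbox's PythonClient; the credentials and external ID values are placeholders (Salesforce SOAP logins usually expect the security token appended to the password):
import beatbox

svc = beatbox.PythonClient()
svc.login('user@example.com', 'passwordPLUSsecuritytoken')  # placeholder credentials

external_ids = ['EXT-001', 'EXT-002']  # hypothetical external ID values
sql = "SELECT Id FROM Contact WHERE my_external_id__c IN ({})".format(
    ', '.join("'{}'".format(x) for x in external_ids)
)
# fetch the Ids that match the external IDs, then delete by primary key
svc.delete([x['Id'] for x in svc.query(sql)])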

Designing safe and efficient API for item state updates via events

Recently I've been working on a simple state-tracking system. Its main purpose is to persist updates, sent periodically from a mobile client, in a relational database for further analysis/presentation.
The mobile client uses JWTs issued by AAD to authenticate against our APIs. I need to find a way to verify whether a user has permission to send an update for a certain Item (at the moment only its creator should be able to do that).
We assume that these updates could be sent by many clients at short intervals (15-30 seconds). We will only have one Item in the active state per user.
The backend application is based on Spring Boot and uses Spring Security with the MS AAD starter and Spring Data JPA.
Obviously we could just do the following:
User_1 creates Item_1
User_1 sends an Update for Item_1
The Item has an owner_ID field; before inserting an Update we simply check whether Item_1.owner_ID = User_1.ID - this means we need to fetch the original Item before every insert.
I was wondering if there is a more elegant approach to solving this kind of problem. Should we just use some kind of caching solution to keep allowed ID pairs, e.g. {User_1, Item_1}?
WHERE clause
You can include it as a condition in your WHERE clause. For example, if you are updating record X you might have started with:
UPDATE table_name SET column1 = value1 WHERE id = X
However, you can instead do:
UPDATE table_name SET column1 = value1 WHERE id = X AND owner_id = Y
If the owner isn't Y, then the value won't get updated. You can introduce a method in your Spring Data repository that looks up the Spring Security value:
@Modifying
@Query("UPDATE table_name SET column1 = :value1 WHERE id = :id AND owner_id = ?#{principal.ownerId}")
public int updateValueById(@Param("value1") String value1, @Param("id") String id);
where principal is whatever is returned from Authentication#getPrincipal.
Cache
You are correct that technically a cache would prevent the first database call, but it would introduce other complexities. Keeping a cache fresh is enough of a challenge that I would try it only when it's obvious that introducing the complexity of a cache brings the required, observed performance gains.
@PostAuthorize
Alternatively, you can make the extra call and use the framework to simplify the boilerplate. For example, you can use the @PostAuthorize annotation, like so, in your controller:
@PutMapping("/updatevalue")
@Transactional
@PostAuthorize("returnObject?.ownerId == authentication.principal.ownerId")
public MyWidget update(String value1, String id) {
    // findById returns an Optional in Spring Data JPA
    MyWidget widget = this.repository.findById(id).orElseThrow();
    widget.setColumn1(value1);
    return widget;
}
With this arrangement, Spring Security will check the return value's ownerId against the logged-in user. If it fails, then the transaction will be rolled back, and the changes won't make it into the database.
For this to work, ensure that Spring's transaction interceptor is placed before Spring Security's post-authorize interceptor, like so:
@EnableMethodSecurity
@EnableTransactionManagement(order = -1)
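For context, these two annotations belong on a configuration class; a minimal sketch (the class name is hypothetical):
import org.springframework.context.annotation.Configuration;
import org.springframework.security.config.annotation.method.configuration.EnableMethodSecurity;
import org.springframework.transaction.annotation.EnableTransactionManagement;

@Configuration
@EnableMethodSecurity
@EnableTransactionManagement(order = -1)  // run the transaction interceptor before the post-authorize interceptor
public class MethodSecurityConfig {
}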
The downside to this solution is that it still makes the same two DB calls. I like it because it lets the framework enforce the authorization rule. To learn more, take a look at this sample application that follows this pattern.

Dexie eachUniqueKey and Where Clause

I'm developing an application in Quasar/Electron and using Dexie/IndexedDB for my database. I want to find all distinct records in the database that contain both my Event ID and a Dog ID (both key-indexed fields). I am able to do this with the following code:
await myDB.runTable
  .orderBy('[fk_event+fk_dog]')
  .eachUniqueKey((theDuo) => {
    this.runsArray.push({eventID: theDuo[0], dogID: theDuo[1]})
  })
I'm using a compound key, which is working well. However, I need more of each record than just the keys - I need a few more fields. Is this possible?
I was trying to get records with the unique-key function while also using the where function, but that doesn't seem to work.
I need to get all the unique (distinct?) dogs in the table that are in a particular event, along with their corresponding information. I'm not sure if there is a better, more efficient way to do this. I can always pull out all the records and loop through them to build a custom array; I was just hoping to do it at the table-read level. (Yeah, I'm still thinking in tables/records even though these are collections etc. :p )
Even the above code gives me all the events, and I can pull out what I need with a filter. I just thought it would be faster and more efficient to do it at the read level.
this.enteredRuns = this.runsArray.filter((theEvent) => {
  return ( theEvent.eventID == this.currentEventID )
})
Try
await myDB.runTable
  .orderBy('[fk_event+fk_dog]')
  .clone({unique: "unique"})
  .toArray()
I know this isn't documented, but it should do the trick: it uses a unique cursor while still extracting the whole objects and not just the keys. You cannot combine it with where(), but you can use .filter(). Just be aware that not all records will be scanned - it jumps over records with the same key, selecting only the first record visited for each key.
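For example, a sketch of restricting the unique scan to one event with .filter(), using the property and variable names from the question's schema:
const dogsInEvent = await myDB.runTable
  .orderBy('[fk_event+fk_dog]')
  .clone({unique: "unique"})
  // keep only the first record per unique key that belongs to the current event
  .filter((run) => run.fk_event === this.currentEventID)
  .toArray()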

create if not exists... with multiple properties (and unique ID)

First, sorry for my pretty bad English, I'm French :p
I'm currently switching from MySQL to Neo4j and I have a little question about my scripts.
I have artists and music albums, each of them linked (where needed) as (artist)-[:OWNS]->(album).
Now I'm developing the API for updating the information and I have a little "bug" with this:
How can I get an existing node, and create it if it doesn't exist?
For another part, I'm doing it like this:
MATCH (u:User) WHERE u.id='83cac821-1607-49a3-e124-07431ef375ce'
MERGE (c:Country {name:'France'})
CREATE UNIQUE (u)-[:FROM]->(c)
RETURN u,c;
So, if the country "France" already exists, Neo4j will not create a second one... Perfect, because my countries don't have IDs...
But for artists and albums, I need a unique identifier, and I can't write my query:
MATCH (ar:Artist) WHERE ar.id='83cac821-1607-49a3-e124-07431ef375ce'
MERGE (al:Album {name:'Title01', id:'31efc821-1607-49a3-e124-074383ca75ce'})
CREATE UNIQUE (ar)-[:OWNS]->(al)
RETURN ar,al;
This way, I need to know the album's ID beforehand (and in my API, I don't!). In fact, I need Neo4j to get the album "Title01" if it exists, and create it (with a fresh new ID) if not. In my example, if I don't give the ID, it can get the album if it exists, but if not, it will create a new one without an ID... And if I do send an ID, Neo4j will never match the existing node (because the title already exists, but not with this particular ID).
(Before, in MySQL, I used multiple queries: 1. search for the album - if it exists return its ID, otherwise create it and return the new ID; 2. the same for the artist; 3. create the link between them...)
Thanks for your help!
The MERGE command can be extended with ON MATCH and ON CREATE, see http://docs.neo4j.org/chunked/stable/query-merge.html#_use_on_create_and_on_match. I guess you have to do something like
MATCH (ar:Artist) WHERE ar.id='83cac821-1607-49a3-e124-07431ef375ce'
MERGE (al:Album {name:'Title01'})
ON CREATE SET al.id = '31efc821-1607-49a3-e124-074383ca75ce'
CREATE UNIQUE (ar)-[:OWNS]->(al) RETURN ar,al
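As an aside, a sketch of the same pattern on modern Neo4j (4.x and later), where CREATE UNIQUE has been removed and the built-in randomUUID() function can mint the fresh ID server-side - treat the version and function availability as assumptions, not part of the 2.x answer above:
MATCH (ar:Artist) WHERE ar.id = '83cac821-1607-49a3-e124-07431ef375ce'
MERGE (al:Album {name: 'Title01'})
ON CREATE SET al.id = randomUUID()  // only set a fresh ID when the album is newly created
MERGE (ar)-[:OWNS]->(al)
RETURN ar, al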
Here's a page that shows how to create a node if it doesn't exist: Link

Neo4j indexes and legacy data

I have a legacy dataset (ENRON data represented as GraphML) that I would like to query. In a comment on a related question, @StefanArmbruster suggests that I use Cypher to query the database. My query use case is simple: given a message id (a property of the Message node), retrieve the node that has that id, and also retrieve the sender and recipient nodes of that message.
It seems that to do this in Cypher, I first have to create an index of the nodes. Is there a way to do this automatically when the data is loaded from the graphML file? (I had used Gremlin to load the data and create the database.)
I also have an external Lucene index of the data (I need it for other purposes). Does it make sense to have two indexes? I could, for example, index the Neo4j node ids into my external index, and then query the graph based on those ids. My concern is the persistence of these ids. (By analogy, Lucene document ids should not be treated as persistent.)
So, should I:
Index the Neo4j graph internally to query on message ids using Cypher? (If so, what is the best way to do that: regenerate the database with some suitable incantation to get the index built? Build the index on the already-existing db?)
Store Neo4j node ids in my external Lucene index and retrieve nodes via these stored ids?
UPDATE
I have been trying to get auto-indexing to work with Gremlin and an embedded server, but with no luck. In the documentation it says
The underlying database is auto-indexed, see Section 14.12, “Automatic Indexing” so the script can return the imported node by index lookup.
But when I examine the graph after loading a new database, no indexes seem to exist.
The Neo4j documentation on auto indexing says that a bunch of configuration is required. In addition to setting node_auto_indexing = true, you have to configure it:
To actually auto index something, you have to set which properties should get indexed. You do this by listing the property keys to index on. In the configuration file, use the node_keys_indexable and relationship_keys_indexable configuration keys. When using embedded mode, use the GraphDatabaseSettings.node_keys_indexable and GraphDatabaseSettings.relationship_keys_indexable configuration keys. In all cases, the value should be a comma separated list of property keys to index on.
So is Gremlin supposed to set the GraphDatabaseSettings parameters? I tried passing a map into the Neo4jGraph constructor like this:
Map<String,String> config = [
    'node_auto_indexing': 'true',
    'node_keys_indexable': 'emailID'
]
Neo4jGraph g = new Neo4jGraph(graphDB, config);
g.loadGraphML("../databases/data.graphml");
but that had no apparent effect on index creation.
UPDATE 2
Rather than configuring the database through Gremlin, I used the examples given in the Neo4j documentation so that my database creation was like this (in Groovy):
protected Neo4jGraph getGraph(String graphDBName, String databaseName) {
    boolean populateDB = !new File(graphDBName).exists();
    if (populateDB)
        println "creating database";
    else
        println "opening database";
    GraphDatabaseService graphDB = new GraphDatabaseFactory().
        newEmbeddedDatabaseBuilder( graphDBName ).
        setConfig( GraphDatabaseSettings.node_keys_indexable, "emailID" ).
        setConfig( GraphDatabaseSettings.node_auto_indexing, "true" ).
        setConfig( GraphDatabaseSettings.dump_configuration, "true" ).
        newGraphDatabase();
    Neo4jGraph g = new Neo4jGraph(graphDB);
    if (populateDB) {
        println "Populating graph"
        g.loadGraphML(databaseName);
    }
    return g;
}
and my retrieval was done like this:
ReadableIndex<Node> autoNodeIndex = graph.rawGraph.index()
    .getNodeAutoIndexer()
    .getAutoIndex();
def node = autoNodeIndex.get( "emailID", "<2614099.1075839927264.JavaMail.evans@thyme>" ).getSingle();
And this seemed to work. Note, however, that the getIndices() call on the Neo4jGraph object still returned an empty list. So the upshot is that I can exercise the Neo4j API correctly, but the Gremlin wrapper seems to be unable to reflect the indexing state. The expression g.idx('node_auto_index') (documented in Gremlin Methods) returns null.
The auto indexes are created lazily. That is, when you have enabled auto-indexing, the actual index is first created when you index your first property. Make sure you are inserting data before checking for the existence of the index; otherwise it might not show up.
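A minimal sketch of that ordering against the embedded 1.x API, reusing the graphDB handle from the snippet above (the property value is a placeholder):
Transaction tx = graphDB.beginTx();
try {
    Node n = graphDB.createNode();
    // the first write to an indexable property is what materializes the auto index
    n.setProperty("emailID", "<placeholder-message-id>");
    tx.success();
} finally {
    tx.finish();
}
// only after that first write does the index lookup succeed
assert graphDB.index().existsForNodes("node_auto_index");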
For some auto-indexing code (using programmatic configuration), see e.g. https://github.com/neo4j-contrib/rabbithole/blob/master/src/test/java/org/neo4j/community/console/IndexTest.java (this works with Neo4j 1.8).
/peter
Have you tried the automatic index feature? It's basically the use case you're looking for - unfortunately it needs to be enabled before you import the data. (Otherwise you have to remove and re-add the properties to reindex them.)
http://docs.neo4j.org/chunked/milestone/auto-indexing.html

Google Contacts: Unique Contacts?

I am building an application in which I will need to distinguish Google Contacts from each other. I am just wondering: given that Google sends contacts as First Name/Last Name/mail, etc. (Example) without a unique ID, what would be a good approach to distinguishing the contacts?
1) Should I create an ID based on the user's fields? -> Even a minimal change would break it.
2) Should I create an ID based on First Name + Last Name? -> But many people have duplicate contacts on their page; would that be a problem? Or married contacts, which can create a little mess.
The reason I am asking is that I am trying to create relations, and I need to store data in a form like [person=Darth Vader, subject=Luke Skywalker, type=father(or son)], so I need a fast algorithm that can build a mapping for each contact and retrieve the related contacts quickly.
I believe they do send back an ID. From the return schema:
<link rel='self' type='application/atom+xml' href='https://www.google.com/m8/feeds/contacts/userEmail/full/contactId'/>
You could use the full HREF value as the ID, or parse the contactId out of the end of the URL, whichever you like better.
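For instance, a small sketch of pulling the trailing contactId off that self link (the href is the example value from the schema above):
from urllib.parse import urlparse

href = 'https://www.google.com/m8/feeds/contacts/userEmail/full/contactId'
# take the last path segment of the self link as the stable contact identifier
contact_id = urlparse(href).path.rsplit('/', 1)[-1]
print(contact_id)  # -> 'contactId'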
