Implementation of object cache in Neo4j 3.2.3

I have read the post "Understanding of Neo4j object cache", but I can't find 'NodeImpl' anywhere in the source code of Neo4j 3.2.3.
I wrote some code to track down the implementation in Neo4j, but failed to find access to any cache other than the page cache. I read the same property of the same node twice, expecting the second read to hit the cache:
try (Transaction tx = db.beginTx()) {
    Node n = db.getNodeById(0);
    n.getProperty("name");                        // first read
    String name = (String) n.getProperty("name"); // second read of the same property
    System.out.println("name: " + name);
    tx.success();
}
There are a lot of 'InstanceCache' fields inside 'StoreStatement', but as the comment implies, an instance cache holds a single object; it is not used for the connections between nodes and relationships described in 'An overview of Neo4j Internals'.
My questions are:
What is the implementation of the object cache inside Neo4j 3.2.3?
Is there anything newer on the internals of Neo4j? The slides I found were published six years ago.

The object cache doesn't exist anymore in Neo4j (since version 3.0, from what I remember); there is only the page cache.
The slides from Tobias that explain the graph storage are still correct.
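Since the page cache is now the only cache, its size is the main tuning knob. As a minimal sketch (not from the answer above), this is roughly how you could set an explicit page cache size for an embedded Neo4j 3.x database; the store path and the 512M value are placeholders, and for a server deployment the equivalent setting is dbms.memory.pagecache.size in neo4j.conf:
import java.io.File;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;
import org.neo4j.graphdb.factory.GraphDatabaseSettings;

public class PageCacheExample {
    public static void main(String[] args) {
        // Embedded Neo4j 3.x with an explicit page cache size; repeated
        // property reads are served from this page cache, not an object cache.
        GraphDatabaseService db = new GraphDatabaseFactory()
                .newEmbeddedDatabaseBuilder(new File("data/graph.db"))
                .setConfig(GraphDatabaseSettings.pagecache_memory, "512M")
                .newGraphDatabase();
        db.shutdown();
    }
}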

Related

Can we use Stardog to query .ttl files?

I'm asking myself a question: I have a .ttl file stored somewhere on the internet (let's say http://www.example/org/myFile) and I want to query it.
Can I use Stardog to query it? Something like this (in Node.js):
const stardog = new Stardog({
  endpoint: 'http://www.example.org'
});
and then query it with a SPARQL command line?
I'm asking this because I think the .ttl file needs to be stored in a Stardog instance first (and then http://www.example.org would have to be a Stardog instance!).
Thanks,
Clément
It is true that you cannot query a Turtle file directly. You first need to load it into a Stardog database. See the Known Issues section in the Stardog documentation:
Queries with FROM NAMED with a named graph that is not in Stardog will not cause Stardog to download the data from an arbitrary HTTP URL and include it in the query.
If you have data stored in another SPARQL endpoint, you can query it using SPARQL's federated query functionality (the SERVICE keyword) without loading the data into Stardog.
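For reference, a federated query of the kind described above might look roughly like this; the endpoint URL is only a placeholder, and the remote server has to be an actual SPARQL endpoint, not a plain .ttl file:
SELECT ?s ?p ?o
WHERE {
  # Delegate this pattern to the remote SPARQL endpoint
  SERVICE <http://www.example.org/sparql> {
    ?s ?p ?o
  }
}
LIMIT 10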

How do I create a spatial index in neo4j using only Cypher?

I want to play with neo4j and spatial indexes. I can't find any documentation that demonstrates how to do this through Cypher, only through the REST API.
Is it possible to create spatial indexes through Cypher, say in the neo4j web console?
There is currently no way to create a spatial index using Cypher. You can either use the Java API or a REST call; see the docs at http://neo4j-contrib.github.io/spatial/#rest-api-create-a-spatial-index for details. Since the Neo4j browser allows you to send HTTP POST requests, you can type this there:
:POST /db/data/index/node {"name":"geom", "config":
  {"provider":"spatial", "geometry_type":"point", "lat":"lat", "lon":"lon"}
}
Alternatively, you can use the index command within neo4j-shell.
Update for Neo4j 3.0
Neo4j Spatial for 3.0 provides stored procedures to manage the spatial index, so everything can be done through Cypher. See https://github.com/neo4j-contrib/spatial/blob/master/src/main/java/org/neo4j/gis/spatial/procedures/SpatialProcedures.java.
Note: this version is not yet released, so you have to build it from source yourself.
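As a rough illustration of what those procedures look like from Cypher (the procedure names are taken from the SpatialProcedures class linked above, but exact names and signatures may differ between releases of the spatial plugin):
// Create a simple point layer
CALL spatial.addPointLayer('geom');

// Add a node with latitude/longitude properties to the layer
CREATE (n:Place {name: 'Malmo', latitude: 55.6, longitude: 13.0})
WITH n
CALL spatial.addNode('geom', n) YIELD node
RETURN node;

// Query the layer, e.g. everything within 10 km of a point
CALL spatial.withinDistance('geom', {latitude: 55.6, longitude: 13.0}, 10);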

Updating core data performance

I'm creating an app that uses Core Data to store information from a web server. When there's an internet connection, the app will check whether there are any changes to the entries and update them. Now, I'm wondering which is the best way to go about it. Each entry in my database has a last-updated timestamp. Which of these two will be more efficient:
1. Go through all entries and check the timestamp to see which entries need to be updated.
2. Delete the whole entity and re-download everything again.
Sorry if this seems like an obvious question and thanks!
I'd say option 1 would be most efficient, as there is rarely a case where downloading everything (especially in a large database with large amounts of data) is more efficient than only downloading the parts that you need.
I recently did something similar.
I solved the problem by assigning a unique ID and a global 'updated' timestamp to each entry, and by thinking in terms of 'delta' changes.
To explain a bit better: I keep a global 'latest update' variable stored in user preferences, with a default value of 01/01/2010.
This is roughly my JSON service:
response: {
  metadata: {latestUpdate: 2013... etc.}
  entities: {....}
}
Then, this is what's going on:
pass the 'latest update' to the web service and retrieve the list of changed entities
update the Core Data store
if everything went fine with Core Data, the 'latestUpdate' from the service metadata becomes the new 'latest update' variable stored in user preferences
That's it. I only retrieve the changes that are needed, and of course the web service is structured to deliver a proper list. In other words, a web service backed by a database can deal with this quite well and leaves the iPhone as a 'simple client' only.
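A minimal sketch of that delta step on the client side, written in Swift for brevity; the Entry entity, its remoteID/name attributes, and the EntryDTO type are illustrative names, not something from the original answer:
import CoreData
import Foundation

// Hypothetical shape of one parsed entity from the service response.
struct EntryDTO {
    let id: String
    let name: String
}

let lastSyncKey = "latestUpdate"

// Apply a delta from the server: upsert the changed entries by unique ID,
// then advance the stored 'latest update' only if the save succeeded.
func applyDelta(_ entities: [EntryDTO],
                latestUpdate: String,
                context: NSManagedObjectContext) {
    for dto in entities {
        let request = NSFetchRequest<NSManagedObject>(entityName: "Entry")
        request.predicate = NSPredicate(format: "remoteID == %@", dto.id)
        let object = (try? context.fetch(request))?.first
            ?? NSEntityDescription.insertNewObject(forEntityName: "Entry", into: context)
        object.setValue(dto.id, forKey: "remoteID")
        object.setValue(dto.name, forKey: "name")
    }
    do {
        try context.save()
        UserDefaults.standard.set(latestUpdate, forKey: lastSyncKey)
    } catch {
        // Leave the stored timestamp untouched so the next sync retries this delta.
    }
}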
But I have to say that for small amounts of data, it is still quite performant (and less bug-prone) to download the whole list on each request.
As per our discussion in the comments above, you can model your Core Data entries with per-field version control like this:
CoreDataEntityPerson:
  name : String
  name_version : int
  image : BinaryData
  image_version : int
You can now model the server XML in the following way:
<person>
  <name>michael</name>
  <name_version>1</name_version>
  <image>string_converted_imageData</image>
  <image_version>1</image_version>
</person>
Now, you can follow these steps:
When the response arrives and you parse it, you initially create a new object from the entity and fill in the data directly.
The next time you perform an update on the server, you increase the version count of the changed entry by 1 and store it.
E.g. let's say the name michael is now changed to abraham; the version count of name_version on the server will then be 2.
This updated version count comes back in the response data.
Now, while storing the data in the same object, if you find the version count to be the same, the update of that entry can be skipped; but if you find the version count has changed, that entry needs to be updated.
This way you can efficiently check each entry and update only the entries that changed.
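A small sketch of that version check, in Swift, assuming the parsed server record is a dictionary and the Core Data object has the name/name_version attributes modelled above (the names are illustrative):
import CoreData

// Only touch an attribute when the server's version counter is newer than
// the one stored locally, so unchanged fields are skipped entirely.
func updateName(of person: NSManagedObject, from record: [String: Any]) {
    let localVersion = person.value(forKey: "name_version") as? Int ?? 0
    let serverVersion = record["name_version"] as? Int ?? 0
    if serverVersion > localVersion {
        person.setValue(record["name"] as? String, forKey: "name")
        person.setValue(serverVersion, forKey: "name_version")
    }
}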
Advice:
The above approach works best when you're dealing with a large amount of data to update.
In the case of simple text entries on an object, simply overwriting the data on all entries is efficient enough, and it also keeps the response model simple.

Dart: indexed_db open version

I need to use IndexedDB for local storage.
When opening an IndexedDB database, a version is passed, and I presume that indicates whether an upgrade is needed. Can someone please explain what happens here, and in particular the significance of the version, where the version comes from, and what an upgrade is?
For example:
import 'dart:html';
import 'dart:indexed_db' as idb;

final int _iDbVersion = 1;

void fOpenDb(String sDbName) {
  var request = window.indexedDB.open(sDbName, _iDbVersion);
  request.on.success.add((e) => fDbOnOpened(request.result));
  request.on.error.add(fDbOnOpenError);
  request.on.upgradeNeeded.add((e) => fDbOnUpgradeNeeded(request.transaction));
}
I found this interesting description, which appears to me to be largely correct:
IndexedDb:
DATABASE
For every origin you can create an infinite number of databases. The only thing you need to create a database is a unique name. A database also has a version, and this version is used to determine the structure of the database. When a database is created for the first time, the version will be an empty string. Each database can only have one version at a time, which means the database can't exist in multiple versions at once.
VERSION
The set of object stores can be changed, but only by using a version_change transaction. This transaction changes the version of the database and changes the set of object stores as you define them.
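For what it's worth, with the current dart:indexed_db API the version/upgrade handshake looks roughly like this (the database and store names are made up, and the exact signatures are worth checking against the SDK docs):
import 'dart:html';
import 'dart:indexed_db' as idb;

const int dbVersion = 2; // bump this whenever the set of object stores changes

Future<idb.Database> openDb() {
  return window.indexedDB!.open('myDb',
      version: dbVersion,
      onUpgradeNeeded: (e) {
        // Runs only when the version stored by the browser is lower than
        // dbVersion, including the very first open when nothing exists yet.
        final db = (e.target as idb.Request).result as idb.Database;
        db.createObjectStore('entries', autoIncrement: true);
      });
}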

Entity Framework 4.0 with Sql Compact Edition 4.0 performance issue

I have a web application with an EF model that I originally designed against a SQL Server 2008 backend. Later on, I decided to use SQL CE for portability, so I converted the model to target SQL CE 4.0. However, I am running into serious performance issues when running this app.
For example, I have a portion in code that retrieves an entity from the database:
Trace.Write("Retrieving node from database", "Application");
var name = value.ToString();
var node = DataContext.Entities.Nodes
.SingleOrDefault(n => n.Name == name);
Trace.Write("Node retrieved from database ", "Application");
When I look at the trace information (trace.axd), those lines of code take a whopping 0.6 seconds!
Trace Information
Category      Message                         From First (s)        From Last (s)
Application   Retrieving node from database   0.00057591118424269   0.000576
Application   Node retrieved from database    0.595122564460008     0.594547
And this happens everywhere in my application where I query by Name.
Any ideas? I'm guessing I have to define an index on the column, but how would I do that in the EF model?
I have a sample with some performance advice here, using global.asax: http://erikej.blogspot.com/2011/01/entity-framework-with-sql-server.html and other performance tips here: http://blogs.msdn.com/b/wriju/archive/2011/03/15/ado-net-entity-framework-performance-tips.aspx
I've used older versions of SQL CE in the past, and I've found that one major bottleneck is opening the connection.
You might try managing the open connection on your own and passing it into the data context by hand, rather than allowing the context to manage the connection automatically.
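A rough sketch of that idea against an EF 4 ObjectContext, assuming the generated context is called Entities as in the question (the 'name=Entities' connection-string name and the repository wrapper are illustrative, and lifetime management of the connection is left out):
using System.Data.EntityClient;
using System.Linq;

public class NodeRepository
{
    private readonly EntityConnection _connection;
    private readonly Entities _context;

    public NodeRepository()
    {
        // Open the EntityConnection once, up front; with SQL CE, opening the
        // database file is a large part of the per-query cost.
        _connection = new EntityConnection("name=Entities");
        _connection.Open();

        // The generated ObjectContext has an overload that accepts an
        // already-open EntityConnection and reuses it for every query.
        _context = new Entities(_connection);
    }

    public Node FindByName(string name)
    {
        return _context.Nodes.SingleOrDefault(n => n.Name == name);
    }
}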

Resources