How to update parsed data in rdflib ConjunctiveGraphs? - rdflib

I am merging RDF data from various remote sources using ConjunctiveGraph.parse().
Now I would like to have a way to update the data of individual sources, without affecting the other ones and the relations between them.
The triples from the various sources might overlap, so it has to be ensured that only the triples coming from a specific source get deleted before the update.

Each graph in a ConjunctiveGraph has its own ID - whether you explicitly set it or not. Just update the particular graph you want, and export the graphs individually.
If you want to do something more complex than this, such as keeping track of the origin of new data you've created (perhaps in the default graph, the unnamed graph you get automatically), you're going to need some other method of tracking triples. Look up "reification" for how to annotate triples with more information.

Related

What is the difference between GetItems and GetExtendedItems in TFS SDK

I'm at my first tries with the TFS SDK (Microsoft.TeamFoundation.VersionControl.Client), and when it came time to retrieve objects, I got confused about why and when I should use VersionControlServer.GetItems vs VersionControlServer.GetExtendedItems. What are the differences? Performance? Features?
Thank you! :)
Yes, you have a tradeoff between performance and features. You can imagine that GetItems is a simple query, whereas GetExtendedItems is a join on another table (or tables), and less efficient.
An Item, for example, contains information about an item at a particular version. An ExtendedItem adds in information about your version of that file as it exists in the workspace that you've specified in the query. If you have done a Get on that file then fields will be populated with the version that exists on your local disk and any pending changes that you've made on it.
ExtendedItems largely exist for the Source Control Explorer view; it can display information about both the items on the server and their status in your local repository in a single query. This reduces the number of round-trips that view makes, but the ExtendedItems query is more expensive than a query for simple Items.
If GetItems will give you the data that you need, you should prefer that. If not, use GetExtendedItems.

database solution for multiple isolated graphs

I have an interesting problem that I don't know how to solve.
I have collected a large dataset of 80 million graphs (they are CFG as in Control Flow Graph produced by programs I have analysed from Github) which I need to be able to search efficiently.
I looked into existing solutions like Neo4j but they are all designed to store a global single graph.
In my case it is the opposite: all graphs are independent - like rows in a table - but I need to search through all of them efficiently.
For example, I want to find all CFGs that have a particular IF condition or a WHILE loop with a particular condition.
What's the best database for this use case?
I don't think that there's a reason not to simply store all those graphs in a single graph, whether it's Neo4j or a different graph database. It's not a problem to have many disparate graphs in a single graph where the disparate graphs are disconnected from one another.
As for searching them efficiently, you would either (1) identify properties in your CFGs that you want to search on and convert them to some indexed value of the graph or (2) introduce some graph structure (additional vertices/edges) between the CFGs that will allow you to do the searches you want via graph traversal.
Depending on what you need to search on, approach 1 may not be flexible enough, especially if what you intend to search on is not completely known at the time of loading the data. Also, it is important to note that with approach 2 you do not really lose the fact that you have 80 million distinct graphs just because you provided some connection between them. Those physical connections don't change that basic logical fact. You just need to consider those additional connections when you write traversals that you expect to stay within a single CFG.
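Approach 1 can be illustrated with a plain-Python toy, independent of any particular graph database: extract searchable features from each CFG at load time and keep an inverted index from feature to graph IDs (the feature strings here are made up for illustration):

```python
from collections import defaultdict

# Toy CFGs: id -> set of extracted features.  In practice these would
# be derived from the graph structure, e.g. normalized condition strings.
cfgs = {
    "cfg1": {"if:x>0", "while:i<n"},
    "cfg2": {"if:x>0"},
    "cfg3": {"while:i<n"},
}

# Inverted index built at load time: feature -> ids of CFGs containing it.
index = defaultdict(set)
for cfg_id, features in cfgs.items():
    for feature in features:
        index[feature].add(cfg_id)

# Finding all CFGs with a particular IF condition is now a lookup,
# not a scan over 80 million graphs.
print(sorted(index["if:x>0"]))
```

The limitation noted above is visible here: only queries expressible in terms of the pre-extracted features are fast, which is why approach 2 (connecting structure) can be the more flexible choice.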
I'm not sure what Neo4j supports in this area, but with Apache TinkerPop (an open source graph processing framework that lets you write vendor agnostic code over different graph databases, including Neo4j), you might consider doing some form of graph partitioning to help with approach 2. Or you might subgraph() the larger graph to only contain the CFG and then operate with that purely in memory when querying. Both of these approaches will help you to blind your query to just the individual CFG you want to traverse.
Ultimately, however, I see this issue as a modelling problem. You will just need to make some choices on how to best establish the schema for your use case and virtually any graph database should be able to support that.

How to use Neo4J for temporary graph calculations?

I'm completely new to Neo4J and I am struggling with a design/architecture question.
Setup
I have a given graph with different nodes. That could be a company graph with customers, products, projects, sales and so on (like in the movie example https://neo4j.com/developer/get-started/). This graph can change from time to time.
In my use case I would like to take this graph, adapt it and test some scenarios. E.g. I would add a new product, define a new sales person with responsibilities or increase the price of a product. I will then "ask questions" of the extended graph, or in other words, use graph algorithms to extract information. The changes I made shouldn't affect the original graph.
Requirements
I do not wanna write my changes to the original graph, because the original graph should be the base for every analysis. Also because changing and analysing the graph can happen concurrently for different users.
I still wanna use the power of Cypher for the analysis, so having the graph only in memory wouldn't do it.
Problem
On the one hand I do not wanna change the original graph, on the other I wanna add and change information temporarily for a specific user. Using a relational DB I would just point with an ID to the "static" part of the data or I would do the calculation in Code instead of SQL.
Questions
Any best practices for that?
Can I use Cypher directly in code (non-persistent, directly on the data in memory)?
Should I make a copy of the graph whenever I use it (not really, right?)?
Is there a concept to link user specific data to a static graph?
I am happy about all ideas, concepts and tricks! It's more about graph databases in general... Neo4j was so far my first choice.
Cheers
Chris
What about using feature flags in your graph by using different relationship types?
For example, let's say you have a User that likes 10 movies in your original graph.
(user)-[:LIKES]->(movies)
Then for your experiments, you can have
(user)-[:LIKES_EXPERIMENT]->(othermovies)
This offers you the possibility to traverse the graph in the original way without losing performance, by just enforcing the relationship types. On the other hand, it also offers you the possibility to use only the experiments, or to combine original data with experiments by specifying both relationship types in your traversals.
The same goes for properties: you could prefix properties with experiment_, for example. And finally you could also play with different labels. There are tons of possibilities before having to use different graph data stores.
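The selection logic behind this pattern can be sketched in plain Python, independent of Neo4j (the edge data is invented for illustration): a traversal restricted to LIKES sees only the original data, while one allowing both types sees the experiment as well:

```python
# Toy edge list mimicking typed relationships; no Neo4j required.
edges = [
    ("user1", "LIKES", "movie1"),
    ("user1", "LIKES", "movie2"),
    ("user1", "LIKES_EXPERIMENT", "movie3"),
]

def liked(user, rel_types):
    """Targets reachable from `user` via any of the given relationship types."""
    return sorted(t for s, r, t in edges if s == user and r in rel_types)

print(liked("user1", {"LIKES"}))                      # original graph only
print(liked("user1", {"LIKES", "LIKES_EXPERIMENT"}))  # original + experiment
```

In Cypher the same filtering happens by naming the relationship types in the pattern, so the experimental edges are invisible to any query that only asks for the original type.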
Another possibility is to use some kind of versioning, as described here (but without the time factor):
http://iansrobinson.com/2014/05/13/time-based-versioned-graphs/
There is also a nice plugin for it https://github.com/h-omer/neo4j-versioner-core
My suggestion is:
Copy the data folder of the original database to a new location: sudo cp -r /path/to/original/data/folder ~/neo4j
Run a Docker container mapping the copy of data folder as the container data folder.
Something like this:
docker run \
--publish=7474:7474 --publish=7687:7687 \
--volume=$HOME/neo4j/data:/data \
neo4j
You can specify other ports if :7474 and :7687 are already in use.
Work over this copy.
You can put these instructions in a .sh file to automate the process.
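The run step can also be scripted; here is a minimal Python sketch (the paths and alternative ports are assumptions) that builds the docker run command without executing it, so alternative port mappings stay in one place:

```python
import shlex

def docker_run_command(data_dir, http_port=7474, bolt_port=7687):
    """Build the `docker run` argv for a throwaway Neo4j over a copied data dir."""
    return [
        "docker", "run",
        f"--publish={http_port}:7474",
        f"--publish={bolt_port}:7687",
        f"--volume={data_dir}:/data",
        "neo4j",
    ]

# Hypothetical copy location; use non-default host ports if 7474/7687 are busy.
cmd = docker_run_command("/home/chris/neo4j/data", http_port=7475, bolt_port=7688)
print(shlex.join(cmd))
# Pass `cmd` to subprocess.run(cmd) after copying the data folder.
```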

Load Entire Neo4j Database into Linkurious's SigmaJS

How can I load an entire Neo4j database into Linkurious's SigmaJS Graph API? On that page, I don't see any methods that describe how to import a database in its entirety -- only how to build out a graph manually by adding nodes and edges. I suspect that the read() function almost does what I want (reading in an object), but it is unclear in what format I must supply this object.
It would be great to be able to simply pass in the graph.db folder within my Neo4j folder.
I think you've got the idea of the library correct. It's a general-purpose library to display graph visualizations and not specific to any graph database. I also suspect that it's not going to effectively hold your entire database (it depends on the size). The idea is to load in the required subset of the data and make it easy to display and work with that data.
The Linkurious team can correct me if I'm wrong here, though ;)

Core Data many-to-many relationship & data integrity

I'm working with Core Data and a many-to-many relationship: a building can have multiple departments, and a department can be in multiple buildings. Having worked with databases before, I was unsure of how to implement this in Core Data, but I found this in the Core Data Programming Guide:
If you have a background in database management and this causes you
concern, don't worry: if you use a SQLite store, Core Data
automatically creates the intermediate join table for you.
However, there's not really any data integrity. I've tried inserting a few building objects, which for now only have one attribute (number), each time setting the department object (relationship) it relates to. This results in the database containing multiple building objects with the same building number, each relating to a different department object. Ideally, there would be one object per building number, containing all the different departments that are located in it.
So, my question is: can Core Data maintain data integrity somehow, or should I check to see if a building object with that number already exists before inserting it? It looks like I'll have to manually check it, but it would be cool if Core Data could do this for me.
What melsam wrote is right. In addition to his answer, I suggest you use inverse relationships. About inverses, Apple says:
You should typically model relationships in both directions, and
specify the inverse relationships appropriately. Core Data uses this
information to ensure the consistency of the object graph if a change
is made (see “Manipulating Relationships and Object Graph Integrity”).
For a discussion of some of the reasons why you might want to not
model a relationship in both directions, and some of the problems that
might arise if you don’t, see “Unidirectional Relationships.”
A key point to understand is that when you work with Core Data, you work with objects. So, integrity criteria are resolved when you save the context or when you explicitly tell the context to process pending changes (see the processPendingChanges method).
About your question, I guess you have to create a fetch request and retrieve the object(s) you are looking for (e.g. you could give each object a specific id and set up a predicate with the id you want).
If the fetch request retrieves some objects, then you can update them. If not, create a new object with insertNewObjectForEntityForName:inManagedObjectContext:. Finally, save the context.
I suggest you read about Efficiently Importing Data.
Hope it helps.
Core Data maintains data integrity for you. I can assure you (from lots of experience with Core Data) that you do not have to manually check integrity. Double-check how your relationships and delete rules are set up in Xcode's Core Data Model Editor. I can't tell exactly what may be wrong from the details you've provided, but you'll find it if you poke around.
