Jena reasoner: delete statements from infModel

I have an ontology on which I run the Jena reasoner with custom rules. Now I want to add new data (experimental results) to the model and delete some old data if the model gets too big (due to memory issues), while keeping the infModel updated.
Adding data isn't much of a problem, since I can simply add it to my OntModel and the reasoning unit adds the conclusions to the infModel.
Removing a "result" is more of a problem for me. I have to delete them from the infModel. At the moment I simply remove all the statements which the "result" is part of. This approach is very slow. It seems that every removed statement triggers a reasoning for the possible changes in the InfModel. In my example removing a "result" which is part of a lot of statements can take up to 12 times the time of the initial reasoning.
I found a possible solution here:
Toggle Jena Reasoner
My question is: is there a solution that doesn't require creating a second model without a reasoner and rebinding the changes to the infModel?
Or is there another way to remove data from the infModel which only triggers the reasoning once?
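A minimal sketch of the rebind-based idea from the linked answer, assuming a GenericRuleReasoner with custom rules (the rule, prefix and URIs here are invented for illustration): remove the statements from the raw model underneath the infModel, so no per-statement retraction fires, then call rebind() to run the reasoner exactly once.

    import java.util.List;

    import org.apache.jena.rdf.model.InfModel;
    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.rdf.model.ModelFactory;
    import org.apache.jena.rdf.model.RDFNode;
    import org.apache.jena.rdf.model.Resource;
    import org.apache.jena.rdf.model.Statement;
    import org.apache.jena.reasoner.Reasoner;
    import org.apache.jena.reasoner.rulesys.GenericRuleReasoner;
    import org.apache.jena.reasoner.rulesys.Rule;
    import org.apache.jena.util.PrintUtil;

    public class RemoveThenRebind {
        public static void main(String[] args) {
            // Register the prefix used inside the rule string.
            PrintUtil.registerPrefix("ex", "http://example.org/");

            Model raw = ModelFactory.createDefaultModel();
            Reasoner reasoner = new GenericRuleReasoner(
                    Rule.parseRules("[mirror: (?a ex:p ?b) -> (?b ex:q ?a)]"));
            InfModel inf = ModelFactory.createInfModel(reasoner, raw);

            Resource result = raw.createResource("http://example.org/result1");

            // Collect every statement the "result" takes part in, as subject or object.
            List<Statement> doomed = raw.listStatements(result, null, (RDFNode) null).toList();
            doomed.addAll(raw.listStatements(null, null, result).toList());

            // Remove them from the raw model, so the reasoner sees none of the removals...
            raw.remove(doomed);

            // ...then reconsult the raw data in a single reasoning pass.
            inf.rebind();
        }
    }

Whether this beats per-statement removal depends on the rule set and reasoner configuration, so it is worth profiling both ways.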

Related

CoreData Selecting All Rows in Association One-by-one When Persisting to Disk

I'm using a parent-child MOC architecture as described by Marcus Zarra in his blog post and talk.
It's generally working great, but I have an ordered one-to-many relationship where the "many" accumulates a lot of records over time. The issue is, in the process of saving the private context to disk, CoreData runs a select query for what appears to be every single object in the association, one at a time, even if it hasn't been touched. As you can imagine, this is incredibly slow.
Any ideas on how to eliminate this or at least make it batch it into one query?
Ordered relationships are problematic for a number of reasons, but this is out of scope for this question.
One obvious solution attempt is to replicate the ordered property yourself by introducing your own attribute to keep track of the order. This is the path I have taken in all my projects where I had this use case. Your own ordering logic gives you much more granular control over the expensive processes, such as inserting an element into the series (rather than just appending it at the end).
Please note that in many applications the ordered property can be modeled differently, e.g. with a time stamp, which in some cases can spare you the necessity of modifying the whole chain.
As for the problem of using "one query": you could fetch all objects that need to be reordered, modify their order (e.g. by adding them one by one to the parent object), and save.
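To illustrate the hand-rolled order attribute, here is a framework-agnostic sketch (in plain Java, since the pattern itself has nothing CoreData-specific; the names and gap size are invented): by spacing the indices out, inserting between two neighbours writes a single new value instead of rewriting the whole chain.

    import java.util.*;

    // Sketch of replacing an ordered relationship with your own order attribute.
    class OrderedItem {
        final String name;
        double orderIndex; // the persisted attribute that encodes the order

        OrderedItem(String name, double orderIndex) {
            this.name = name;
            this.orderIndex = orderIndex;
        }
    }

    public class OrderDemo {
        static final double GAP = 1024.0; // leave room between neighbours for later inserts

        public static void main(String[] args) {
            List<OrderedItem> items = new ArrayList<>();
            items.add(new OrderedItem("a", 1 * GAP)); // appending touches only the new row
            items.add(new OrderedItem("b", 2 * GAP));

            // Inserting between "a" and "b" assigns one new index; nothing else moves.
            double mid = (items.get(0).orderIndex + items.get(1).orderIndex) / 2;
            items.add(new OrderedItem("c", mid));

            items.sort(Comparator.comparingDouble(i -> i.orderIndex));
            items.forEach(i -> System.out.println(i.name + " @ " + i.orderIndex));
        }
    }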

How to filter down a large Jena Model in TDB

I have a large RDF model that doesn't fit in memory. I am currently loading the entire thing into TDB, but I would like to instead filter it down by focusing on only a subgraph (all properties of all resources that are a subClassOf or type of some "root" concept).
What I have tried is to execute a DESCRIBE statement against the full TDB model which describes the subset of the graph I am interested in ({ ?x rdf:type/rdfs:subClassOf* ?type }). The problem I have is twofold:
On a smaller [sample] dataset, the DESCRIBE statement completes, but I can't figure out how to write the resulting Model back into TDB (I want to throw away all the other data). I tried to call tdbModel.setDefaultModel() but it throws an exception. So what I am doing now is to create a second TDB location, get its default model, and then add the result of the DESCRIBE statement into this other model. Is there a better way?
On the full dataset, I think the DESCRIBE statement would result in over 500k triples, and it's been running for a couple of hours without completing. Is there a more efficient way to do this?
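For reference, a rough sketch of the two-dataset copy, assuming Jena's TDB1 API (the paths and the root URI are placeholders); transactions keep the write batched:

    import org.apache.jena.query.*;
    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.tdb.TDBFactory;

    public class FilterIntoSecondTdb {
        public static void main(String[] args) {
            Dataset source = TDBFactory.createDataset("/data/tdb-full");   // placeholder path
            Dataset target = TDBFactory.createDataset("/data/tdb-subset"); // placeholder path

            String describe =
                "PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n" +
                "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n" +
                "DESCRIBE ?x WHERE { ?x rdf:type/rdfs:subClassOf* <http://example.org/Root> }";

            source.begin(ReadWrite.READ);
            target.begin(ReadWrite.WRITE);
            try (QueryExecution qe = QueryExecutionFactory.create(describe, source)) {
                Model subset = qe.execDescribe();      // materializes the subgraph in memory
                target.getDefaultModel().add(subset);  // copy only the described triples
                target.commit();
            } finally {
                target.end();
                source.end();
            }
        }
    }

Note that execDescribe() materializes the result in memory; for the full-dataset case a CONSTRUCT run in chunks may be gentler.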

entity framework ref, deref, createref [duplicate]

This question is about why I would use the above keywords. I've found plenty of MSDN pages that explain how. I'm looking for the why.
What query would I be trying to write that means I need them? I ask because the examples I have found appear to be achievable in other ways...
To try and figure it out myself, I created a very simple entity model using the Employee and EmployeePayHistory tables from the AdventureWorks database.
One example I saw online demonstrated something similar to the following Entity SQL:
SELECT VALUE
DEREF(CREATEREF(AdventureWorksEntities3.Employee, row(h.EmployeeID))).HireDate
FROM
AdventureWorksEntities3.EmployeePayHistory as h
This seems to pull back the HireDate without having to specify a join?
Why is this better than the SQL below (that appears to do exactly the same thing)?
SELECT VALUE
h.Employee.HireDate
FROM
AdventureWorksEntities3.EmployeePayHistory as h
Looking at the above two statements, I can't work out what extra the CREATEREF/DEREF bit is adding, since I appear to be able to get at what I want without it.
I'm assuming I have just not found the scenarios that demonstrate the purpose, and that there are scenarios where using these keywords is either simpler or the only way to accomplish the required result.
What I can't find is the scenarios....
Can anyone fill in the gap? I don't need entire sets of SQL. I just need a starting point to play with i.e. a brief description of a scenario or two... I can expand on that myself.
Look at this post
One of the benefits of references is that a reference can be thought of as a ‘lightweight’ entity for which we don’t need to spend resources creating and maintaining the full entity state/values until it is really necessary. Once you have a ref to an entity, you can dereference it by using the DEREF expression or by just invoking a property of the entity.
TL;DR - REF/DEREF are similar to C++ pointers. They are references to persisted entities (not entities which have not been saved to a data source).
Why would you use such a thing? A reference to an entity uses less memory than having the DEREF'ed (or expanded, filled, instantiated) entity. This may come in handy if you have a bunch of records that carry image metadata plus image data (4 GB files stored in the database). If you didn't use a REF and you pulled back 10 of these entities just to get the image metadata, then you'd quickly fill up your memory.
I know, I know. It'd be easier just to pull back the metadata in your query, but then you lose the point of what REF is good for :-D
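As a concrete starting point, two Entity SQL sketches against the same hypothetical AdventureWorksEntities3 model used above; the first materializes only lightweight references, the second expands a reference back into the entity where needed:

    -- Pull back only lightweight references, not full Employee rows:
    SELECT VALUE REF(e)
    FROM AdventureWorksEntities3.Employee AS e

    -- Expand a reference into the full entity only when you need a property:
    SELECT VALUE DEREF(REF(e)).HireDate
    FROM AdventureWorksEntities3.Employee AS e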

EF4 QueryView or DefiningQuery?

I am in the middle of trying to complete a design for a project and have basically come to a fork in the road. I have made up my mind that I want to use EF4 as my data persistence layer, but my existing database is causing me some pains. Changing or augmenting the database is not an option. I have a single table that really serves multiple purposes and contains 120 columns (I didn't design this table!!! - it is a DB2 carryover after a SQL Server conversion long ago). I have designed a class diagram that creates five entities from this table, at varying levels of aggregation. In my research of what to do in these situations, I have narrowed it down to either using a “QueryView” in my MSL layer or a “DefiningQuery” in my SSDL layer to create the entities I need from this monolith table. The resulting data only needs to be read-only. I’d prefer getting back a proper entity, but anonymous types or DbDataRecord would be okay.
I have attempted to use a QueryView in the MSL with my entity defined in the CSDL, but the MSL keeps getting regenerated and my changes are lost when I compile. Why?
Can anyone provide input as to what I should do here? Is using a DefiningQuery or a QueryView preferable in this situation? Any input on keeping these changes after updating my model from the database or compiling would also be very appreciated.
A QueryView should not be regenerated. I'm not sure how a QueryView behaves when you do an update from the database. I am sure that a DefiningQuery will be deleted when doing Update from database, because the DefiningQuery is defined in the SSDL, which is completely deleted during Update from database. I have a workaround for custom DefiningQueries that uses two different EDMXs - one just for queries and a second for entities updated from the database. The general concept is described here.
The difference between QueryView and DefiningQuery is the layer where these constructs live. A QueryView is an MSL element built as a custom ESQL query on top of an existing entity, so your 120-column entity must exist in the EDMX. For some unknown reason, QueryView has no support for aggregations. A DefiningQuery is an SSDL element built as a custom SQL query. It is used by default for database views (btw, probably the best choice for you).
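To make the layering concrete, here is roughly what each construct looks like inside the EDMX (all entity, container and table names here are invented; treat this as the shape, not a drop-in):

    <!-- MSL: QueryView, an ESQL projection over a table already mapped in the SSDL -->
    <EntitySetMapping Name="EmployeeSummaries">
      <QueryView>
        SELECT VALUE MyModel.EmployeeSummary(t.Id, t.Name)
        FROM MyModelStoreContainer.MonolithTable AS t
      </QueryView>
    </EntitySetMapping>

    <!-- SSDL: DefiningQuery, raw SQL acting as a read-only virtual table -->
    <EntitySet Name="EmployeeSummaries" EntityType="MyModel.Store.EmployeeSummary">
      <DefiningQuery>
        SELECT Id, Name FROM dbo.MonolithTable
      </DefiningQuery>
    </EntitySet>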

SPROC to update record: how to handle unchanged values

I'm calling an update SPROC from my DAL, passing in all(!) fields of the table as parameters. For the biggest table this is a total of 78.
I pass all these parameters, even if maybe just one value changed.
This seems rather inefficient to me, and I wondered how to do it better.
I could define all parameters as optional and only pass the ones that changed, but my DAL does not know which values changed, because I'm just passing it the model object.
I could do a select on the table before updating and compare the values to find out which ones changed, but that is probably way too much overhead, too(?)
I'm kinda stuck here ... I'm very interested what you think of this.
edit: forgot to mention: I'm using C# (Express Edition) with SQL Server 2008 (also Express). I wrote the DAL "myself" (using this article).
It's maybe not the latest state-of-the-art way of doing it (since it's from 2006, "pre-LINQ" so to say, but LINQ works only for local SQL instances in Express anyway), but my main goal was learning C#, so I guess this isn't too bad.
If you can change the DAL (without the changes being discarded once the layer is "regenerated" from the new schema), I would recommend passing one structure containing the changed columns and their values, and another structure containing the key columns and values for the update.
This can be done using hashtables, and if the schema is known, it should be fairly easy to handle this in the "new" update function.
If this is an automated DAL, this is one of the drawbacks of using DALs.
You could implement journalized change tracking in your model objects. This way you can keep track of any changes to your objects by saving the previous value of a property every time a new value is set. This information could be stored in one of two ways:
As part of each object's own private state
Centrally in a "manager" class.
In the first solution, you could easily implement this functionality in a base class and have it run in all model objects through inheritance.
In the second solution, you need to create some kind of container class that keeps a reference and a unique identifier for any model object that is created, and records all changes to its state in a central store. This is similar to the way many ORM (Object-Relational Mapping) frameworks achieve this kind of functionality.
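A bare-bones sketch of the first variant (a per-object journal in a base class; written in Java for brevity, though the same shape ports directly to C#; all names are invented):

    import java.util.*;

    // Base class that journals every property change; model objects inherit from it.
    abstract class TrackedModel {
        private final Map<String, Object> changed = new LinkedHashMap<>();

        // Every setter funnels through here, recording the new value per column.
        protected void record(String column, Object value) { changed.put(column, value); }

        public Map<String, Object> changedColumns() { return Collections.unmodifiableMap(changed); }
        public void markSaved() { changed.clear(); }
    }

    class Employee extends TrackedModel {
        private String name;
        public void setName(String name) { this.name = name; record("Name", name); }
    }

    public class Dal {
        // Builds an UPDATE touching only the dirty columns instead of all 78 parameters.
        static String updateSql(String table, String keyColumn, TrackedModel m) {
            StringJoiner sets = new StringJoiner(", ");
            m.changedColumns().keySet().forEach(c -> sets.add(c + " = @" + c));
            return "UPDATE " + table + " SET " + sets + " WHERE " + keyColumn + " = @" + keyColumn;
        }

        public static void main(String[] args) {
            Employee e = new Employee();
            e.setName("Alice");
            System.out.println(updateSql("Employee", "EmployeeID", e));
            // -> UPDATE Employee SET Name = @Name WHERE EmployeeID = @EmployeeID
        }
    }

In real code the values would of course be bound as command parameters rather than concatenated into the SQL.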
There are off the shelf ORMs that support these kinds of scenarios relatively well. Writing your own ORM will leave you without many features like this.
I find the "object.Save()" pattern leads to this kind of behavior, but there is no reason you need to follow that pattern (while I'm not personally a fan of object.Save(), I feel like I'm in the minority).
There are multiple ways your data layer can know what changed and most of them are supported by off the shelf ORMs. You could also potentially make the UI and/or business layers smart enough to pass that knowledge into the data layer.
Two options that I prefer:
Generating/hand-coding update methods that only take the set of parameters that tend to change.
Generating the update statements completely on the fly.
