Is the Triple store commonly used for CRUD operations? [closed] - jena

I'm learning semantic-web technologies and the power of linked data. The use of RDF, RDFS, and OWL inference could come in really handy, and SPARQL queries to read linked data from a triple store are clean and seamless. As I think more about practical use, I wonder whether a triple store is a good fit for full-blown transactional CRUD usage. SPARQL supports insert and update operations, but is it practically adopted for this? Any best-practice guidance?

Virtuoso supports full ACID operations for (C)reate, (R)ead, (U)pdate, (D)elete operations.
It pulls this off by being a multi-model DBMS that fuses the relational operational features of SQL with SPARQL.
As already indicated, you can issue a SQL query to Virtuoso that includes SPARQL (via the FROM clause). Even better, you can add FOR UPDATE to that SQL to trigger full ACID behavior.
Links
Virtuoso CRUD Post that includes FOR UPDATE usage
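For concreteness, here is a rough sketch of what that hybrid SQL/SPARQL usage could look like from a client program. This is not taken from the linked post: the DSN, credentials, and graph IRI are invented, Python with pyodbc is used purely for illustration, and the exact SPARQL-in-FROM-clause and FOR UPDATE syntax should be verified against the Virtuoso documentation.

```python
# A minimal sketch, assuming a Virtuoso ODBC DSN named "VOS" is configured.
# Credentials, graph IRI, and the exact hybrid syntax are assumptions;
# check the Virtuoso docs for the authoritative form.
import pyodbc

conn = pyodbc.connect("DSN=VOS;UID=dba;PWD=dba")  # placeholder credentials
cur = conn.cursor()

# A SQL query whose FROM clause is a SPARQL subquery; FOR UPDATE requests
# full ACID (locking) behavior for the rows it touches.
cur.execute("""
    SELECT t.s, t.label
    FROM (SPARQL
          SELECT ?s ?label
          WHERE { GRAPH <urn:example:people> { ?s rdfs:label ?label } }
         ) AS t
    FOR UPDATE
""")
for row in cur.fetchall():
    print(row.s, row.label)
conn.close()
```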

I'm not sure exactly what you want to know, but I'll answer as best as I understand your question (perhaps you could improve it a bit and state the exact problem you want to solve?):
SPARQL 1.1 Update (formerly known as SPARUL or SPARQL Update in SPARQL 1.0) allows creating, reading, updating, and deleting resources.
In contrast to the relational database world, where databases commonly offer read and write access but only to a select few behind some method of authentication (data silos), it is very common in the Semantic Web world to publish data over public SPARQL endpoints. Unlike some other forms of open data sharing, such as Wikipedia, those endpoints are read-only in all cases I know of.
However, it is still absolutely a common use case to allow SPARQL 1.1 Update queries over a protected connection, separate from the public SPARQL endpoint interface.
For example, one could have a CRUD application such as OntoWiki installed on the same server as a Virtuoso SPARQL endpoint, connecting to the endpoint over ISQL on the network; Virtuoso ISQL supports SPARQL queries, including updates, via the SPARQL keyword in the first line of the ISQL query.
If you only rarely need to perform specific SPARQL 1.1 Update queries and don't need a separate CRUD editor for that, then with Virtuoso you can also run those queries in the SQL tab of the Conductor web interface.
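As a concrete illustration of such a protected update, here is a minimal Python sketch using the SPARQLWrapper library. The endpoint URL, credentials, and IRIs are placeholders; Virtuoso installations commonly expose an authenticated endpoint at /sparql-auth with digest authentication, but check your own setup.

```python
# A minimal sketch; endpoint URL, credentials, and graph/resource IRIs
# are placeholders for your own installation.
from SPARQLWrapper import SPARQLWrapper, POST, DIGEST

endpoint = SPARQLWrapper("http://localhost:8890/sparql-auth")  # assumed URL
endpoint.setHTTPAuth(DIGEST)
endpoint.setCredentials("dba", "dba")   # placeholder credentials
endpoint.setMethod(POST)

# SPARQL 1.1 Update: the "U" in CRUD is typically a DELETE/INSERT pair.
endpoint.setQuery("""
PREFIX ex: <http://example.org/>
WITH <http://example.org/graph>
DELETE { ex:alice ex:email ?old }
INSERT { ex:alice ex:email "alice@example.org" }
WHERE  { OPTIONAL { ex:alice ex:email ?old } }
""")
endpoint.query()
```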
However, most SPARQL endpoints (often excepting Virtuoso, which may or may not behave as described, depending on various settings and the specific methods and patterns of interaction) do not preserve data integrity beyond the triple level: as far as they are concerned, they only store graphs, which are sets of triples. Integrity conditions described at a higher level (for example using OWL, RDFS, or SHACL) are not checked and thus not preserved by such a SPARQL endpoint. This includes (a validation sketch follows this list):
domain and range restrictions (every Mother must be Human and Female)
cardinality (every child must have exactly one Father and one Mother)
non-binary relationships such as OWL axioms that are expressed using multiple helper triples connected to a single relationship resource.
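If your application needs such conditions enforced, one option is to validate candidate data before writing it to the store. Below is a minimal sketch using the pyshacl library against the Mother example above; the shape and data are invented for illustration, and built-in SHACL support varies by store.

```python
# A minimal sketch: checking the "every Mother must be Female" and
# cardinality conditions with SHACL before writing to the store.
# Shapes and data are invented for illustration.
from rdflib import Graph
from pyshacl import validate

shapes = Graph().parse(data="""
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix ex: <http://example.org/> .

ex:MotherShape a sh:NodeShape ;
    sh:targetClass ex:Mother ;
    sh:property [ sh:path ex:sex ; sh:hasValue ex:Female ;
                  sh:minCount 1 ; sh:maxCount 1 ] .
""", format="turtle")

data = Graph().parse(data="""
@prefix ex: <http://example.org/> .
ex:alice a ex:Mother .   # missing ex:sex, so it violates the shape
""", format="turtle")

conforms, _, report = validate(data, shacl_graph=shapes)
print(conforms)   # False: reject the update instead of corrupting the graph
print(report)
```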
For some use cases it may make sense to use a traditional relational database with a CRUD interface for specific user input and later transform it to RDF, e.g. using R2RML. Thanks to its hybrid nature, Virtuoso can serve both of these functions, among others.

Related

F# On Server and Database [closed]

I'm starting a new Web API project.
For the domain implementation I'm going to use F#.
To store data I'm going to use a database, which I haven't decided on yet.
ORM
If it were C#, I would use Entity Framework with no question in my mind.
But using F# + Entity Framework, I have the following questions:
1. Because my domain is in F#, all my entities and value objects will be in F#. Will EF Core work with them (e.g. discriminated unions)?
2. If not, what other options do I have?
3. What options do you use?
Feel free to share sample projects.
I don't want to use type providers, because I don't want to manage the DB manually (not at the start).
It is possible to use EF with F#, though there's no support for DUs (what would a DU even look like in SQL Server or Postgres?). I do it in a pet project, but I have a ton of mappers that convert between the immutable records of my domain model and the mutable change-tracked entities. I don't necessarily believe this is the best approach. It may be possible to map a table to an F# record, but I'm not sure there are significant advantages to that, given the loss of mutability: EF uses change tracking to create updates. If you're doing a bunch of "get only" calls, Dapper is significantly simpler and easier to use.
If you're trying to avoid directly managing a database, e.g. creating and editing tables, columns, PKs, and FKs, there are plenty of tools that make that easy, like SQL Server Management Studio or Navicat. I've rarely had to drop into SQL to tweak the schema. Combined with mssql-scripter or pg_dump, versioning your DB is simple. I'm assuming you're using MSSQL or Postgres, of course.

Does GraphQL negate the need for graph databases?

Most of the reasons for using a graph database seem to be that relational databases are slow when making graph like queries.
However, if I am using GraphQL with a data loader, all my queries are flattened and combined by the data loader, so I end up making simpler SELECT * FROM X queries instead of doing any heavy joins. I might even be using a NoSQL database, which is usually pretty fast at these kinds of flat queries.
If this is the case, is there still a use case for graph databases when combined with GraphQL? Neo4j seems to be promoting GraphQL. I'd like to understand the advantages, if any.
GraphQL doesn't negate the need for graph databases at all; the combination is very powerful and makes GraphQL more performant and more capable.
You mentioned:
However, if I am using GraphQL with a data loader, all my queries are flattened and combined using the data loader, so you end up making simpler SELECT * FROM X type queries instead of doing any heavy joins.
This is a curious point, because if you do a lot of SELECT * FROM X and the data is connected by a graph loader, you're still doing the joins; you're just doing them in software outside of the database, at another layer, by another means. And if even that software layer isn't joining anything, then what you gain by not doing joins in the database you lose by executing many queries against the database, plus the overhead of the additional layer. Look into the performance profile of sequencing a series of those individual "easy selects". By not doing those joins, you may have lost 30 years' worth of computer science research: rather than letting the RDBMS optimize the query execution path, the software layer above it forces a particular path by choosing which selects to execute, in which order, at which time.
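To see where the join went, here is a stripped-down sketch of what a data loader effectively does; the schema, data, and class are invented for illustration, and real loader libraries add per-request caching on top of this batching.

```python
import sqlite3

# In-memory demo database with an invented schema.
db = sqlite3.connect(":memory:")
db.row_factory = sqlite3.Row
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
db.executemany("INSERT INTO users VALUES (?, ?)",
               [(1, "ada"), (2, "grace"), (3, "alan")])

class UserLoader:
    """Collects individual lookups and batches them into one flat SELECT."""
    def __init__(self, db):
        self.db = db
        self.pending = []          # keys queued during one resolver pass

    def load(self, user_id):
        self.pending.append(user_id)

    def dispatch(self):
        placeholders = ",".join("?" * len(self.pending))
        rows = self.db.execute(
            f"SELECT * FROM users WHERE id IN ({placeholders})",
            self.pending).fetchall()
        # The "join" now happens here, in application memory,
        # out of reach of the database's query planner.
        by_id = {row["id"]: row for row in rows}
        return [by_id.get(k) for k in self.pending]

loader = UserLoader(db)
for uid in (1, 3):
    loader.load(uid)
print([dict(r) for r in loader.dispatch()])   # two lookups, one flat query
```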
It stands to reason that if you don't have to go through a layer of formalism transformation (relational to graph), you're going to be in a better position, because that translation is a cost you must pay on every query, no exceptions. This is roughly equivalent to the observation that XML databases will be better at executing XPath expressions than relational databases with some XPath abstraction on top. The computer science here is straightforward: purpose-built data structures for a task typically outperform generic data structures adapted to a new task.
I recommend Jim Webber's article on the motivations for a native graph database if you want to go deeper on why the storage format and query processing approach matters.
What if it's not a native graph database? If you have a graph abstraction on top of an RDBMS, and then you use GraphQL to do graph queries against that, then you've shifted where and how the graph traversal happens, but you still can't get around the fact that the underlying data structure (tables) isn't optimized for that, and you're incurring extra overhead in translation.
So for all of these reasons, a native graph database + GraphQL is going to be the most performant option, and as a result I'd conclude that GraphQL doesn't make graph databases unnecessary, it's the opposite, it shows where they shine.
They're like chocolate and peanut butter. Both great, but really fantastic together. :)
Yes, GraphQL allows you to make a kind of graph query: you can start from one entity, then explore its neighborhood, and so on.
But if you need performance in graph queries, you need a native graph database.
With GraphQL you give a lot of power to the end user, who can issue arbitrarily deep GraphQL queries.
If you have a SQL database, you have two choices:
compute one big SQL query with a lot of joins (a really bad idea)
make a lot of SQL queries to retrieve the neighborhood of the neighborhood, and so on
If you have a native graph database, it's just one query, with good performance! It's a graph traversal, and native graph databases are made for this.
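Both choices can be sketched concretely. The schema and data below are invented; the Cypher line at the end shows what the same three-hop traversal looks like as a single query against a native graph database (Neo4j syntax).

```python
import sqlite3

# Invented schema: a single edge table of friendships.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE friendships (user_id INTEGER, friend_id INTEGER)")
db.executemany("INSERT INTO friendships VALUES (?, ?)",
               [(1, 2), (2, 3), (3, 4)])

# Choice 1: one big SQL query, with one extra self-join per level of depth.
BIG_JOIN = """
SELECT f3.friend_id FROM friendships f1
JOIN friendships f2 ON f2.user_id = f1.friend_id
JOIN friendships f3 ON f3.user_id = f2.friend_id
WHERE f1.user_id = ?
"""

# Choice 2: one round-trip to the database per level of the neighborhood.
def neighborhood(user_id, depth):
    frontier = {user_id}
    for _ in range(depth):
        marks = ",".join("?" * len(frontier))
        rows = db.execute(
            f"SELECT friend_id FROM friendships WHERE user_id IN ({marks})",
            tuple(frontier)).fetchall()
        frontier = {r[0] for r in rows}
    return frontier

print(db.execute(BIG_JOIN, (1,)).fetchall())  # [(4,)]
print(neighborhood(1, 3))                     # {4}

# In a native graph database, the same traversal is a single query, e.g.
# Cypher (Neo4j): MATCH (u:User {id: 1})-[:FRIEND*3]->(f) RETURN DISTINCT f
```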
Moreover, if you use GraphQL, you already consider your data model as a graph, so storing it as a graph seems obvious and gives you fewer headaches :)
I recommend you read this post: The Motivation for Native Graph Databases
Answer for Graph Loader
With a graph loader you still make a lot of small queries (the second choice in my answer above), but with batching and caching on top: graph loaders just batch and cache.
By comparison:
you need to add another library and implement its logic (more code)
you need to manage the cache, and there is a lot of documentation on this topic (more memory and complexity)
because of the SELECT * in loaders, you always fetch more data than needed. Example: I only want the id and name of a user, not their email, birthday, ... (less performant)
...
The answer from FrobberOfBits is very good. There are many reasons to add (or avoid) using GraphQL, whether or not a graph database is involved. I wanted to add a small consideration against putting GraphQL in front of a graph. Of course, this is just one of what ought to be many other considerations involved with making a decision.
If the starting point is a relational database, then GraphQL (in front of that database) can provide a lot of flexibility to the caller: great for apps, clients, etc. to interact with data. But in order to do that, GraphQL needs to be aligned closely with the database behind it, and specifically with the database schema. The database schema is, in effect, "projected out" to apps, clients, etc. through GraphQL.
However, if the starting point is a native graph database (Neo4j, etc.) there's a world of schema flexibility available to you because it's a graph. No more database migrations, schema updates, etc. If you have new things to model in the data, just go ahead and do it. This is a really, really powerful aspect of graphs. If you were to put GraphQL in front of a graph database, you also introduce the schema concept – GraphQL needs to be shown what is / isn't allowed in the data. While your graph database would allow you to continue evolving your data model as product needs change and evolve, your GraphQL interactions would need to be updated along the way to "know" about what new things are possible. So there's a cost of less flexibility, and something else to maintain over time.
It might be great to use a graph + GraphQL, or it might be great to just use a graph by itself. Of course, like all things, this is a question of trade-offs.

What is the benefit of storing data in databases like SQL? [closed]

This is a very elementary question, but why does a framework like Rails use ActiveRecord to run SQL commands to get data from a DB? I heard that you can cache data on the Rails server itself, so why not just store all the data on the server instead of in the DB? Is it because space on the server is a lot more expensive/valuable than on the DB? If so, why is that? Or is the reason that you want an ORM in the DB and that just takes too much code to set up on the Rails server? Sorry if this question sounds dumb, but I don't know where else to go for an answer.
What if some other program or person wants to access this data and for some reason cannot use your Rails application? What if in the future you decide to discontinue Rails and move to some other technology for the front end, but want to keep the data? In these cases, having a separate database helps. Also, could you run complex join queries on data cached on the Rails server?
Databases hold substantial advantages over other forms of data storage. Some of them are listed below:
Data integrity is maximised and data redundancy is minimised: a single storage place for all the data implies that a given set of data has only one primary record, which helps keep the data accurate and consistent and enhances reliability.
Generally greater data security: a single data storage location means only one possible place from which the database can be attacked and sets of data can be stolen or tampered with.
Better data preservation than other types of storage, due to the often-included fault-tolerant setup.
Easier use for the end user, due to the simplicity of a single database design.
Generally easier data portability and database administration, and more cost effective than other types of database systems, as labour, power supply, and maintenance costs are all minimised.
Data kept in the same location is easier to change, reorganise, mirror, or analyse.
All the information can be accessed at the same time from the same location.
Updates to any given set of data are immediately visible to every end user.

How to store RDF graphs within a data storage?

I want to write a web app with Rails that uses RDF to represent linked data, but I really don't know the best approach to storing RDF graphs in a database for persistence. I also want to use something like paper_trail to provide versioning of database objects.
I've read about RDF.rb and ActiveRDF. RDF.rb does not include a layer to store data in a database; what about ActiveRDF?
I'm new to RDF. What is the best approach to handling large RDF graphs with Rails?
Edit:
I found 4Store and AllegroGraph, which fit Ruby on Rails. I read that 4Store is entirely free, while AllegroGraph is limited to 50 million triples in the free version. What are the advantages of each of them?
Thanks.
Your database survey is quite incomplete. There are also BigData, OWLIM, Stardog, Virtuoso, Sesame, Mulgara, and TDB or SDB, the last two provided by Jena.
To clarify, Fuseki is just a server component for a backend that supports the Jena API, providing support for the SPARQL protocol. Generally, since you're using Ruby, this is how you will interact with a database: via HTTP, using the SPARQL protocol. Probably every single database supports the SPARQL HTTP protocol for querying, and many will support something in the ballpark of the SPARQL update protocol, the graph store protocol, or a similar custom HTTP protocol for handling updates.
So if you're set on using Rails, your best bet is to pick a database, work out a simple wrapper for the HTTP protocol (perhaps forking support into an existing Ruby library if one exists), and build your application on that support.
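That wrapper really can be simple, because the SPARQL protocol is plain HTTP. A minimal sketch follows, written in Python for illustration since the protocol calls are identical in Ruby (e.g. with Net::HTTP); the endpoint URLs are placeholders, here following Fuseki's typical /ds/sparql and /ds/update layout.

```python
# A minimal SPARQL-protocol wrapper. Endpoint URLs are placeholders;
# Python is used for illustration, but the HTTP shape is language-agnostic.
import requests

class SparqlClient:
    def __init__(self, query_url, update_url=None):
        self.query_url = query_url
        self.update_url = update_url

    def select(self, query):
        # SPARQL protocol query binding: GET with a "query" parameter.
        r = requests.get(self.query_url,
                         params={"query": query},
                         headers={"Accept": "application/sparql-results+json"})
        r.raise_for_status()
        return r.json()["results"]["bindings"]

    def update(self, update):
        # SPARQL 1.1 Update binding: form-encoded POST with an "update" field.
        r = requests.post(self.update_url, data={"update": update})
        r.raise_for_status()

client = SparqlClient("http://localhost:3030/ds/sparql",
                      "http://localhost:3030/ds/update")
for row in client.select("SELECT ?s WHERE { ?s ?p ?o } LIMIT 5"):
    print(row["s"]["value"])
```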
Versioning is something that's not readily supported in a lot of systems. I think there is still a lot of thought going into how to do it properly in an RDF database. So likely, if you want versioning in your application, you're going to have to do something custom.

To go API or not [closed]

My company has this Huge Database that gets fed with (many) events from multiple sources, for monitoring and reporting purposes. So far, every new dashboard or graphic from the data is a new Rails app with extra tables in the Huge Database and full access to the database contents.
Lately, there has been an idea floating around of giving external clients (as in, not our company but sister companies) access to our data, and it has been decided that we should expose a read-only RESTful API for consulting it.
My point is - should we use an API for our own projects too? Is it overkill to access a RESTful API, even for "local" projects, instead of direct access to the database? I think it would pay off in terms of unifying our team's access to the data - but is it worth the extra round-trip? And can a RESTful API keep up with the demands of running 20 or so queries per second and exposing the results via JSON?
Thanks for any input!
I think there's a lot to be said for consistency. If you're providing an API for your clients, then by using the same API yourself you'll understand it better with respect to supporting it for your clients, you'll be testing it regularly (beyond your regression tests), and you're sending a message that it's good enough for you to use, so it should be fine for your clients.
By hiding everything behind the API, you're at liberty to change the database representations and not have to change both API interface code (to present the data via the API) and the database access code in your in-house applications. You'd only change the former.
Finally, such performance questions can really only be addressed by trying it and measuring. Perhaps it's worth knocking together a prototype API and studying it under load?
I would definitely go down the API route. It presents an easy-to-maintain interface to ALL the applications that will talk to your application, including validation etc. Sure, you can ensure database integrity with column restrictions and stored procedures, but why maintain those as well?
Don't forget: you can also cache the API calls in the file system, in memory, or using memcached (or any other service). Where datasets have not changed (check with updated_at or ETags), you can simply return cached versions for tremendous speed improvements. Adding ETags to a recent application I developed took HTML load time from 1.6 seconds to 60 ms.
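For illustration, here is roughly what that ETag handshake looks like in code. This is a hedged sketch, not the application mentioned above: the framework choice (Flask), the stubbed record lookup, and the field names are all invented.

```python
# A minimal sketch of ETag-based conditional GETs; the record lookup and
# the updated_at field are placeholders for your own models.
from flask import Flask, jsonify, request

app = Flask(__name__)

def load_record(record_id):
    # Placeholder: fetch from your database instead.
    return {"id": record_id, "name": "example",
            "updated_at": "2010-01-02T03:04:05"}

@app.route("/records/<int:record_id>")
def show(record_id):
    record = load_record(record_id)
    etag = f'"{record["id"]}-{record["updated_at"]}"'  # derived from updated_at
    if request.headers.get("If-None-Match") == etag:
        return "", 304                      # client's cached copy is fresh
    response = jsonify(record)
    response.headers["ETag"] = etag
    return response
```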
Off topic: An idea I have been toying with is dynamically loading API versions depending on the request. Something like this would give you the ability to dramatically alter the API while maintaining backwards compatibility. Since the different versions are in separate files it would be simple to maintain them separately.
Also, if you use the API internally, you should be able to reduce the amount of code you have to maintain, since you will just be maintaining the API rather than the API plus your own internal methods for accessing the data.
I've been thinking about the same thing for a project I'm about to start, whether I should build my Rails app from the ground up as a client of the API or not. I agree with the advantages already mentioned here, which I'll recap and add to:
Better API design: You become a user of your own API, so it will be a lot more polished when you decide to open it;
Database independence: with reduced coupling, you could later switch from an RDBMS to a Document Store without changing as much;
Comparable performance: Performance can be addressed with HTTP caching (although I'd like to see some numbers comparing both).
On top of that, you also get:
Better testability: your whole business logic is black-box testable with basic HTTP request/response. Headless browsers / Selenium become responsible only for application-specific behavior;
Front-end independence: you not only become free to change database representation, you become free to change your whole front-end, from vanilla Rails-with-HTML-and-page-reloads, to sprinkled-Ajax, to full-blown pure javascript (e.g. with GWT), all sharing the same back-end.
Criticism
One problem I originally saw with this approach was that it would make me lose all the amenities and flexibility that ActiveRecord provides, with associations, named_scopes and all. But using the API through ActiveResource brings a lot of the good stuff back, and it seems like you can also have named_scopes. Not sure about associations.
More Criticism, please
We've been all singing the glories of this approach but, even though an answer has already been picked, I'd like to hear from other people what possible problems this approach might bring, and why we shouldn't use it.
