Slow Berlin sparql benchmark queries in Neo4j


I am trying to run the Berlin SPARQL Benchmark (BSBM) queries in Neo4j. I created the Neo4j graph from triples using http://michaelbloggs.blogspot.de/2013/05/importing-ttl-turtle-ontologies-in-neo4j.html
To summarize the data loading, my graph has the following structure:
Subject => Node
Predicate => Relationship
Object => Node
If the object of a predicate is a primitive value (date, string, integer), a property is created on the node instead of a relationship.
The following queries are really slow in Neo4j.
Query 4: Feature with the highest ratio between price with that feature and price without that feature.
The corresponding SPARQL query:
prefix bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
prefix bsbm-inst: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
Select ?feature ((?sumF*(?countTotal-?countF))/(?countF*(?sumTotal-?sumF)) As ?priceRatio)
{
{ Select (count(?price) As ?countTotal) (sum(xsd:float(str(?price))) As ?sumTotal)
{
?product a <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/ProductType294> .
?offer bsbm:product ?product ;
bsbm:price ?price .
}
}
{ Select ?feature (count(?price2) As ?countF) (sum(xsd:float(str(?price2))) As ?sumF)
{
?product2 a <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/ProductType294> ;
bsbm:productFeature ?feature .
?offer2 bsbm:product ?product2 ;
bsbm:price ?price2 .
}
Group By ?feature
}
}
Order By desc(?priceRatio) ?feature
Limit 100
The Cypher query I created for this:
MATCH p1 = (offer1:Offer)-[r1:`product`]->(products1:ProductType294)
MATCH p2 = (offer2:Offer)-[r2:`product`]->(products2:ProductType294)-[:`productFeature`]->features
return (sum( DISTINCT offer2.price) * ( count( DISTINCT offer1.price) - count( DISTINCT offer2.price)) /(count(DISTINCT offer2.price)*(sum( DISTINCT offer1.price) - sum(DISTINCT offer2.price)))) AS cnt,features.__URI__ AS frui
ORDER BY cnt DESC,frui
This query is really slow. Please let me know whether I am formulating it the wrong way.
Another one is Query 5: Show the most popular products of a specific product type for each country, by review count.
prefix bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
prefix bsbm-inst: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/>
prefix rev: <http://purl.org/stuff/rev#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
Select ?country ?product ?nrOfReviews ?avgPrice
{
{ Select ?country (max(?nrOfReviews) As ?maxReviews)
{
{ Select ?country ?product (count(?review) As ?nrOfReviews)
{
?product a <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/ProductType403> .
?review bsbm:reviewFor ?product ;
rev:reviewer ?reviewer .
?reviewer bsbm:country ?country .
}
Group By ?country ?product
}
}
Group By ?country
}
{ Select ?product (avg(xsd:float(str(?price))) As ?avgPrice)
{
?product a <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/ProductType403> .
?offer bsbm:product ?product .
?offer bsbm:price ?price .
}
Group By ?product
}
{ Select ?country ?product (count(?review) As ?nrOfReviews)
{
?product a <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/ProductType403> .
?review bsbm:reviewFor ?product .
?review rev:reviewer ?reviewer .
?reviewer bsbm:country ?country .
}
Group By ?country ?product
}
FILTER(?nrOfReviews=?maxReviews)
}
Order By desc(?nrOfReviews) ?country ?product
The Cypher query I created for this is the following:
MATCH (products2:ProductType403)<-[:`reviewFor`]-(reviews:Review)-[:`reviewer`]->(rvrs)-[:`country`]->(countries)
with count(reviews) AS reviewcount,products2.__URI__ AS pruis, countries.__URI__ AS cntrs
MATCH (products1:ProductType403)<-[:`product`]-(offer:Offer)
with AVG(offer.price) AS avgPrice, MAX(reviewcount) AS maxrevs, cntrs
MATCH (products2:ProductType403)<-[:`reviewFor`]-(reviews:Review)-[:`reviewer`]->(rvrs)-[:`country`]->(countries)
with avgPrice, maxrevs,countries, count(reviews) AS rvs, countries.__URI__ AS curis, products2.__URI__ AS puris
where maxrevs=rvs
RETURN curis,puris,rvs,avgPrice
This query is also really slow. Am I formulating the queries correctly?
The dataset has 10M triples (Berlin benchmark dataset).
Every type predicate was converted into a label.
For Query 4, what I am trying to get is the feature with the highest ratio between the price with that feature and the price without that feature. Is this the right way to formulate the query?
For Query 4, I do get correct results.
If I don't compute the sum and count, the query executes really fast.
Thanks in advance :) The SPARQL queries and more information can be found at: http://wifo5-03.informatik.uni-mannheim.de/bizer/berlinsparqlbenchmark/spec/BusinessIntelligenceUseCase/index.html#queries

These look like global graph queries to me.
What is the size of your dataset?
You create a cartesian product between the two paths. Shouldn't those two paths be connected somehow?
Shouldn't there be a type property on a ProductType label, e.g. (:ProductType {type:"294"})?
If there were, you would have an index on :ProductType(type) and probably :Order(orderNo).
I don't really understand the calculation. Is it the delta of the counts of distinct prices, times the sum of distinct prices of offer2, divided by the count of distinct prices of offer2 times the delta of the sums of the two offer prices?
MATCH (offer1:Offer)-[r1:`product`]->(products1:ProductType294)
MATCH (offer2:Offer)-[r2:`product`]->(products2:ProductType294)-[:`productFeature`]->features
RETURN (sum( DISTINCT offer2.price) *
( count( DISTINCT offer1.price) - count( DISTINCT offer2.price))
/ (count(DISTINCT offer2.price)*
(sum( DISTINCT offer1.price) - sum(DISTINCT offer2.price))))
AS cnt,features.__URI__ AS frui
ORDER BY cnt DESC,frui
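To make the calculation concrete, here is a minimal Python sketch of the ratio from the SPARQL query in the question, run on made-up in-memory data. The function name, the offers, and the features are hypothetical and not taken from the benchmark dataset. The totals are computed once over all offers and the per-feature aggregates in a separate pass, which mirrors the two SPARQL subqueries rather than pairing every offer with every other offer.

```python
from collections import defaultdict


def price_ratios(offers, features_by_product):
    """offers: list of (product, price) pairs.
    features_by_product: dict mapping product -> set of features.
    Returns (feature, ratio) pairs sorted by ratio descending, then feature."""
    # Totals over all offers (first SPARQL subquery).
    count_total = len(offers)
    sum_total = sum(price for _, price in offers)

    # Per-feature count and sum (second SPARQL subquery, GROUP BY ?feature).
    count_f = defaultdict(int)
    sum_f = defaultdict(float)
    for product, price in offers:
        for feature in features_by_product.get(product, ()):
            count_f[feature] += 1
            sum_f[feature] += price

    ratios = {}
    for feature in count_f:
        without_count = count_total - count_f[feature]
        without_sum = sum_total - sum_f[feature]
        if without_count == 0 or without_sum == 0:
            continue  # no offers without the feature: ratio is undefined
        ratios[feature] = (sum_f[feature] * without_count) / (count_f[feature] * without_sum)
    return sorted(ratios.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical sample data.
offers = [("p1", 10.0), ("p1", 20.0), ("p2", 30.0), ("p3", 60.0)]
features_by_product = {"p1": {"f1"}, "p2": {"f1", "f2"}, "p3": {"f2"}}
for feature, ratio in price_ratios(offers, features_by_product):
    print(feature, ratio)
```

In this sample, f2 has an average price of 45 with the feature and 15 without it, so its ratio is 3.0 and it sorts first.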

Related

SPARQL get all the data before it reaches timeout

I am trying to get the names of all cities in every country in the world using the query below. Whenever I execute it, it returns the message "Query timeout limit reached".
Is there any way to get all the data before the timeout limit is reached?
SELECT ?country ?countryLabel ?city ?cityLabel
WHERE
{
?city wdt:P31/wdt:P279* wd:Q515;
wdt:P17 ?country .
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
ORDER BY ?country

I am not at all sure why, but this query works for me:
SELECT ?country ?countryLabel ?city ?cityLabel
WHERE
{
?city wdt:P31/wdt:P279* wd:Q515;
wdt:P17 ?country .
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
ORDER BY ?countryLabel
LIMIT 100000
The two differences from your original query are:
Ordering by countryLabel, which I am guessing is what you actually wanted instead of ordering by country. In my experience, ordering by label is sometimes faster too.
A LIMIT. The query appears to return the same number of results as it would without a limit, since the limit is higher than the actual number of results.

I've also posted this answer on the Open Data site, based on my comment: removing ORDER BY made the query go through.

Here is a query that works using our recently released Wikidata SPARQL Query Service endpoint.
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX bd: <http://www.bigdata.com/rdf#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT distinct ?country ?countryLabel ?city ?cityLabel
WHERE
{
?city wdt:P31/wdt:P279* wd:Q515;
wdt:P17 ?country ;
rdfs:label ?cityLabel .
FILTER (lang(?cityLabel) = "en")
?country rdfs:label ?countryLabel .
FILTER (lang(?countryLabel) = "en")
}
ORDER BY ?country

Here is a query that works.
SELECT DISTINCT ?cityID ?cityIDLabel ?countryID ?countryIDLabel WHERE
{
{
SELECT * WHERE
{
?cityID wdt:P31 ?cityInstance.
VALUES (?cityInstance) {
(wd:Q515)
(wd:Q5119)
}
OPTIONAL {
?cityID wdt:P17 ?countryID.
?countryID p:P31/ps:P31 wd:Q6256.
}
FILTER NOT EXISTS {
?cityID wdt:P17 "".
?countryID wdt:P30 "".
}
}
}
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
ORDER BY ?countryIDLabel

Get count of child objects in grails Criteria query

Let's say a domain class A has many class B objects. I need a criteria query that returns:
A.id
A.name
B.count (the number of B elements associated with A)
B.lastUpdated (the date of the most recent update among the B elements associated with A, given that every B element has a last_updated date)
The query should also be flexible enough to add conditions/restrictions on both the A and B domain objects.
Currently I have gotten as far as this:
A.createCriteria().list {
createAlias('b','b')
projections{
property('id')
property('gender')
property('dateOfBirth')
count('b.id')
property('publicId')
}
}
But the problem is that it returns only one object, and the count of child objects covers all elements of B instead of just those associated with A.

I was recently in a similar scenario: I needed a query in which one column stores the count of the "many" side of a one-to-many relationship.
Unlike in your scenario, I used a native SQL query to solve it.
The solution was to use derived tables (I do not know how to implement them with a criteria query).
In case you find it useful, here is the implementation, taken from a Grails service:
List<Map> resumeInMonth(final String monthName) {
final session = sessionFactory.currentSession
final String query = """
SELECT
t.id AS id,
e.full_name AS fullName,
t.subject AS issue,
CASE t.status
WHEN 'open' THEN 'open'
WHEN 'pending' THEN 'In progress'
WHEN 'closed' THEN 'closed'
END AS status,
CASE t.scheduled
WHEN TRUE THEN 'scheduled'
WHEN FALSE THEN 'non-scheduled'
END AS scheduled,
ifnull(d.name, '') AS device,
DATE(t.date_created) AS dateCreated,
DATE(t.last_updated) AS lastUpdated,
IFNULL(total_tasks, 0) AS tasks
FROM
tickets t
INNER JOIN
employees e ON t.employee_id = e.id
LEFT JOIN
devices d ON d.id = t.device_id
LEFT JOIN
(SELECT
ticket_id, COUNT(1) AS total_tasks
FROM
tasks
GROUP BY ticket_id) ta ON t.id = ta.ticket_id
WHERE
MONTHNAME(t.date_created) = :monthName
ORDER BY dateCreated DESC"""
final sqlQuery = session.createSQLQuery(query)
final results = sqlQuery.with {
resultTransformer = AliasToEntityMapResultTransformer.INSTANCE
setString('monthName', monthName)
list()
}
results
}
The part of interest is declaring a column in the main SELECT and then, in the FROM clause, joining the derived query that stores its result in a column with that same name:
SELECT ...
total_tasks -- add the count column to your select
FROM ticket t
JOIN (SELECT ticket_id, COUNT(1) AS total_tasks
FROM tasks
GROUP BY ticket_id) ta ON t.id = ta.ticket_id
...rest of query
This last example comes from the answer by the user Aaron Dietz to a question that I asked myself.
I hope it is useful to you.

It turns out I wasn't very far from the solution: I just needed to group by the right property, which is the foreign key column in the child table (b.a in this case). The following works now:
A.createCriteria().list {
createAlias('b','b')
projections{
property('id')
property('gender')
property('dateOfBirth')
count('b.id')
groupProperty('b.a')
property('publicId')
}
}
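The effect of the missing grouping can be illustrated with a small Python sketch over joined rows. The row data and function names are made up for illustration and are not Grails or Hibernate APIs. An aggregate without grouping collapses the whole join into one number, while grouping by the parent key yields one count per A.

```python
from collections import Counter


def ungrouped_count(rows):
    """Count child ids over all joined rows: one number for the whole result."""
    return sum(1 for _a_id, b_id in rows if b_id is not None)


def grouped_count(rows):
    """Count child ids per parent id: one number for each A."""
    counts = Counter()
    for a_id, b_id in rows:
        if b_id is not None:
            counts[a_id] += 1
    return dict(counts)


# Hypothetical (A.id, B.id) pairs produced by joining A with its B children.
rows = [(1, 10), (1, 11), (2, 12)]
print(ungrouped_count(rows))  # 3
print(grouped_count(rows))    # {1: 2, 2: 1}
```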

In the criteria you need to group by the properties that are not aggregates.
Try the following:
A.createCriteria().list {
createAlias('b','b')
projections{
groupProperty('id','id')
groupProperty('gender','gender')
groupProperty('dateOfBirth','dateOfBirth')
count('b.id','total')
groupProperty('publicId','publicId')
}
}
Or, if you want a list of map objects returned, you can add resultTransformer(CriteriaSpecification.ALIAS_TO_ENTITY_MAP):
A.createCriteria().list {
resultTransformer(CriteriaSpecification.ALIAS_TO_ENTITY_MAP)
createAlias('b','b')
projections{
groupProperty('id','id')
groupProperty('gender','gender')
groupProperty('dateOfBirth','dateOfBirth')
count('b.id','total')
groupProperty('publicId','publicId')
}
}
Hope this helps.

Neo4j Cypher pattern comprehension as object property

I have a Decision node with a collection of Tag:
@NodeEntity
public class Decision {
@Relationship(type = BELONGS_TO, direction = Relationship.OUTGOING)
private Set<Tag> tags;
....
}
Based on the issue described in the question "SDN4/OGM Cypher query and duplicates at Result", I created the following query to select a Decision together with its Tags:
MATCH (parentD)-[:CONTAINS]->(childD:Decision)
WHERE parentD.id = {decisionId}
WITH childD
SKIP 0 LIMIT 100
RETURN childD AS decision,
[ (childD)-[rdt:BELONGS_TO]->(t:Tag) | t ] AS tags
Is it possible to change the RETURN statement so that the tags are placed inside decision (as decision.tags) instead of having both at the same level?

Sure, that's easy. The result is then just not a node anymore, but a map.
MATCH (parentD:Decision)-[:CONTAINS]->(childD:Decision)
WHERE parentD.id = {decisionId}
WITH childD
SKIP 0 LIMIT 100
RETURN childD {.*, tags: [ (childD)-[:BELONGS_TO]->(t:Tag) | t ] } AS decision
This uses a map projection, where the map is constructed as:
variable { .property, .*, foo:"bar", bar:nested-expression }
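As a rough analogy only (this is Python, not Cypher), the shape of the returned value resembles copying all properties of a dict and adding one nested key. The property names and tag values below are hypothetical.

```python
def project_decision(decision_props, tags):
    """Return a new dict holding every decision property plus a nested 'tags' list.
    Loosely mirrors: childD {.*, tags: [...]} AS decision"""
    return {**decision_props, "tags": list(tags)}


decision = {"id": 1, "name": "Decision A"}  # hypothetical node properties
tags = [{"name": "t1"}, {"name": "t2"}]     # hypothetical Tag nodes
print(project_decision(decision, tags))
```

The original dict is left unchanged, just as the projection returns a new map rather than modifying the node.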

neo4j counting relations of multiple nodes

I am working with a Neo4j graph and wrote this query:
match (rec:Recipe) , (rec1:Recipe) , (rec)-[r:ContainsIngredient]->() , (rec1)- [r1:ContainsIngredient]->()
where rec.name = "a" AND rec1.name = "b"
return count(r) , count(r1)
It returns the same value for both counts, although Recipe "a" has three relationships and Recipe "b" has five.
Note: I noticed that it always returns the bigger value.

You aren't grouping by the recipe name. Try this:
MATCH (rec:Recipe)
WHERE rec.name = "a" OR rec.name = "b"
MATCH (rec)-[:ContainsIngredient]->()
RETURN rec.name, COUNT(*)

How to get genre info using Dbpedia ruby gem

I am trying to fetch artist info from Wikipedia using the Dbpedia gem: https://github.com/farbenmeer/dbpedia
But I am unable to figure out the genre of a result item.
Basically, I want to modify the following function to find out which result is an artist and then return its URL:
def self.get_slug(q)
results = Dbpedia.search(q)
result = # Do something to find out the result that is an artist
uri = result.uri rescue ""
return uri
end
As a last resort, I would scrape each result URL and decide whether it is an artist based on whether genre info is available.

You could use DBpedia's SPARQL endpoint rather than scraping all the results.
Suppose you want a list of everything that has a genre. You could query:
SELECT DISTINCT ?thing WHERE {
?thing dbpedia-owl:genre ?genre
}
LIMIT 1000
But say you don't want everything, you're looking just for artists. It could be a musician, a painter, an actor, etc.
SELECT DISTINCT ?thing WHERE {
?thing dbpedia-owl:genre ?genre ;
rdf:type dbpedia-owl:Artist
}
LIMIT 1000
Or maybe you just want musicians OR bands:
SELECT DISTINCT ?thing WHERE {
{
?thing dbpedia-owl:genre ?genre ;
rdf:type dbpedia-owl:Band
}
UNION
{
?thing dbpedia-owl:genre ?genre ;
a dbpedia-owl:MusicalArtist # `a` is a shortcut for `rdf:type`
}
}
LIMIT 1000
Ultimately, you want musicians or bands that have "mega" in their names, e.g. Megadeth or Megan White, along with the URL of the resource.
SELECT DISTINCT ?thing ?url ?genre WHERE {
?thing foaf:name ?name ;
foaf:isPrimaryTopicOf ?url .
?name bif:contains "'mega*'" .
{
?thing dbpedia-owl:genre ?genre ;
a dbpedia-owl:Band
}
UNION
{
?thing dbpedia-owl:genre ?genre ;
a dbpedia-owl:MusicalArtist
}
UNION
{
?thing a <http://umbel.org/umbel/rc/MusicalPerformer>
}
}
LIMIT 1000
Try these queries in DBpedia's SPARQL Query Editor.
The dbpedia gem you pointed out exposes the sparql-client in its API, so I think you will be able to run all these queries using the #query method:
Dbpedia.sparql.query(query_string)
Good luck!
