SPARQL: get all the data before the query reaches the timeout

I am trying to get the names of all cities in all countries of the world using the query below. Whenever I execute it, it returns the message "Query timeout limit reached".
Is there any other way to get all the data before the timeout limit is reached?
SELECT ?country ?countryLabel ?city ?cityLabel
WHERE
{
?city wdt:P31/wdt:P279* wd:Q515;
wdt:P17 ?country .
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
ORDER BY ?country

I'm not at all sure why, but this query works for me:
SELECT ?country ?countryLabel ?city ?cityLabel
WHERE
{
?city wdt:P31/wdt:P279* wd:Q515;
wdt:P17 ?country .
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
ORDER BY ?countryLabel
LIMIT 100000
The two differences from your original query are:
Ordering by ?countryLabel is, I'm guessing, what you actually wanted instead of ordering by ?country. In my experience, ordering by label is sometimes faster, too.
I set a LIMIT. The query appears to return the same number of results as it would without a limit, since the limit is higher than the actual number of results.

I've posted this answer on the Open Data site, based on my comment: removing ORDER BY made the query go through.
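Building on the "drop ORDER BY" idea, here is a minimal Python sketch (not from the original answers) that fetches the unordered results in LIMIT/OFFSET pages and sorts them on the client. The endpoint URL, page size and helper names are assumptions. Two caveats apply. Without ORDER BY, SPARQL guarantees no stable order between separate requests, so pages could in principle overlap or skip rows; the sketch only removes overlaps. Each page is still evaluated by the server, so staying under the timeout is not guaranteed.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, Iterator, List

# Assumed public Wikidata Query Service endpoint.
ENDPOINT = "https://query.wikidata.org/sparql"

# The question's query with ORDER BY removed; LIMIT/OFFSET are appended per page.
BASE_QUERY = """SELECT ?country ?countryLabel ?city ?cityLabel
WHERE
{
  ?city wdt:P31/wdt:P279* wd:Q515;
        wdt:P17 ?country .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}"""

Row = Dict[str, str]


def fetch_json(query: str) -> List[Row]:
    """Run one query and flatten the SPARQL JSON result bindings to plain dicts."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(url, headers={
        "Accept": "application/sparql-results+json",
        "User-Agent": "paging-sketch/0.1 (hypothetical)",
    })
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    return [{k: v["value"] for k, v in b.items()} for b in data["results"]["bindings"]]


def paged_rows(base: str, page_size: int,
               fetch: Callable[[str], List[Row]] = fetch_json) -> Iterator[Row]:
    """Yield rows page by page until a short (or empty) page comes back."""
    offset = 0
    while True:
        page = fetch(f"{base}\nLIMIT {page_size} OFFSET {offset}")
        yield from page
        if len(page) < page_size:
            return
        offset += page_size


def all_cities_sorted(page_size: int = 10000,
                      fetch: Callable[[str], List[Row]] = fetch_json) -> List[Row]:
    seen = set()
    rows: List[Row] = []
    for row in paged_rows(BASE_QUERY, page_size, fetch):
        key = (row.get("city"), row.get("country"))
        if key not in seen:  # drop rows that show up on more than one page
            seen.add(key)
            rows.append(row)
    # Sort on the client instead of with ORDER BY on the server.
    return sorted(rows, key=lambda r: r.get("countryLabel", ""))
```

The `fetch` parameter exists so the paging logic can be tried against a stub before it is pointed at the live endpoint.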

Here is a query that works using our recently released Wikidata SPARQL Query Service endpoint.
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX bd: <http://www.bigdata.com/rdf#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT distinct ?country ?countryLabel ?city ?cityLabel
WHERE
{
?city wdt:P31/wdt:P279* wd:Q515;
wdt:P17 ?country ;
rdfs:label ?cityLabel .
FILTER (lang(?cityLabel) = "en")
?country rdfs:label ?countryLabel .
FILTER (lang(?countryLabel) = "en")
}
ORDER BY ?country

Here is a query that works.
SELECT DISTINCT ?cityID ?cityIDLabel ?countryID ?countryIDLabel WHERE
{
{
SELECT * WHERE
{
?cityID wdt:P31 ?cityInstance.
VALUES (?cityInstance) {
(wd:Q515)
(wd:Q5119)
}
OPTIONAL {
?cityID wdt:P17 ?countryID.
?countryID p:P31/ps:P31 wd:Q6256.
}
FILTER NOT EXISTS {
?cityID wdt:P17 "".
?countryID wdt:P30 "".
}
}
}
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
ORDER BY ?countryIDLabel

Related

SPARQL wikidata get state/province of all cities of a particular country

I am trying to get all cities of a country along with their states/provinces. I got all cities of Canada successfully using the query below.
Cities Wikidata SPARQL -- It works
SELECT ?city ?cityLabel WHERE {
?city wdt:P17 wd:Q16;
(wdt:P31/(wdt:P279*)) wd:Q515;
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY ?cityLabel
I want to get the state/province of every city, so I tried the query below, but it didn't work.
Cities & State/Province Wikidata SPARQL -- Didn't work
SELECT ?city ?cityLabel ?stateLabel WHERE {
?city wdt:P17 wd:Q16;
(wdt:P31/(wdt:P279*)) wd:Q515;
wdt:P131* ?state . ?state wdt:P31 wd:Q11828004
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY ?cityLabel
LIMIT 5000

Neo4j LIMIT if parameter is set

Is there a way to set a LIMIT only if the parameter {limit} has a numeric value?
...
RETURN whatever
LIMIT {limit}
Maybe something like this (I know the following code does not work):
...
RETURN whatever
if({limit}>0)
LIMIT {limit}
Thanks!
You should handle this logic in your application layer by building the query dynamically.
Edit:
This can be done simply, as in the example below (in PHP, but it's possible in any language):
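public function doMatchQuery($limit = null)
{
$query = 'MATCH (n) RETURN n';
// only add a LIMIT when a positive numeric value was passed
if (is_numeric($limit) && (int) $limit > 0) {
// extend the query string; the (int) cast keeps anything else out of the query
$query .= ' LIMIT ' . (int) $limit;
}
// return the query string so the caller can send it to Neo4j
return $query;
}
// Calling your function
$matchAll = $this->doMatchQuery(); // query for all n elements in the db
$matchFirstTen = $this->doMatchQuery(10); // query for the n elements, limited to 10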
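The same application-layer idea can be sketched in Python. This is not from the answer, and the function name is hypothetical. Instead of concatenating the number into the string, it appends `LIMIT $limit` and returns the value as a query parameter. The `$param` syntax assumes Neo4j 3.0 or later; older versions use `{limit}` as in the question.

```python
from typing import Any, Dict, Tuple


def build_match_query(limit: Any = None) -> Tuple[str, Dict[str, int]]:
    """Return (cypher, params); LIMIT is added only for a positive numeric limit."""
    query = "MATCH (n) RETURN n"
    params: Dict[str, int] = {}
    try:
        n = int(limit)  # None or non-numeric strings raise and mean "no limit"
    except (TypeError, ValueError):
        n = 0
    if n > 0:
        query += " LIMIT $limit"
        params["limit"] = n
    return query, params
```

With the official Python driver, the pair could then be passed as `session.run(query, params)`.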

How to get genre info using Dbpedia ruby gem

I am trying to fetch artist info from Wikipedia using the Dbpedia gem https://github.com/farbenmeer/dbpedia
But I am unable to figure out what the genre of a result item is.
Basically, I want to modify the following function to find out which result is an artist and then return its URL:
def self.get_slug(q)
results = Dbpedia.search(q)
result = # Do something to find out the result that is an artist
uri = result.uri rescue ""
return uri
end
My last resort would be to scrape each result URL and decide whether it is an artist based on whether genre info is available.
You could leverage DBpedia's SPARQL endpoint rather than scraping all the results.
Suppose you want a list of everything that has a genre. You could query:
SELECT DISTINCT ?thing WHERE {
?thing dbpedia-owl:genre ?genre
}
LIMIT 1000
But say you don't want everything; you're looking only for artists. An artist could be a musician, a painter, an actor, etc.
SELECT DISTINCT ?thing WHERE {
?thing dbpedia-owl:genre ?genre ;
rdf:type dbpedia-owl:Artist
}
LIMIT 1000
Or maybe you just want musicians OR bands:
SELECT DISTINCT ?thing WHERE {
{
?thing dbpedia-owl:genre ?genre ;
rdf:type dbpedia-owl:Band
}
UNION
{
?thing dbpedia-owl:genre ?genre ;
a dbpedia-owl:MusicalArtist # `a` is a shortcut for `rdf:type`
}
}
LIMIT 1000
Ultimately, you want musicians or bands that have "mega" in their names, e.g. Megadeth or Megan White, along with the URL of the resource.
SELECT DISTINCT ?thing ?url ?genre WHERE {
?thing foaf:name ?name ;
foaf:isPrimaryTopicOf ?url .
?name bif:contains "'mega*'" .
{
?thing dbpedia-owl:genre ?genre ;
a dbpedia-owl:Band
}
UNION
{
?thing dbpedia-owl:genre ?genre ;
a dbpedia-owl:MusicalArtist
}
UNION
{
?thing a <http://umbel.org/umbel/rc/MusicalPerformer>
}
}
LIMIT 1000
Give these queries a try using DBpedia's SPARQL Query Editor.
The dbpedia gem you pointed out exposes the sparql-client in its API, so I think you will be able to run all these queries using the #query method:
Dbpedia.sparql.query(query_string)
Good luck!
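For the original "find out which result is an artist" step, here is a Python sketch (the question uses Ruby). It builds one query that asks which candidate URIs carry an artist type, then keeps the first search result in that set. If nothing matches, it returns an empty string, like the question's `rescue ""`. The class list and function names are assumptions. Running the query and collecting the `?thing` values is left to whatever SPARQL client you use, for example the gem's `Dbpedia.sparql.query`.

```python
from typing import Iterable, List, Set

# Assumed set of classes that count as "artist"; adjust to taste.
ARTIST_CLASSES = ("dbpedia-owl:Artist", "dbpedia-owl:MusicalArtist", "dbpedia-owl:Band")


def build_artist_query(uris: Iterable[str]) -> str:
    """SPARQL returning which of the given resources have one of the artist types."""
    values = " ".join(f"<{uri}>" for uri in uris)
    classes = " ".join(ARTIST_CLASSES)
    return (
        "PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>\n"
        "SELECT DISTINCT ?thing WHERE {\n"
        f"  VALUES ?thing {{ {values} }}\n"
        f"  VALUES ?type {{ {classes} }}\n"
        "  ?thing a ?type .\n"
        "}"
    )


def pick_artist(candidates: List[str], artist_uris: Set[str]) -> str:
    """First search result, in search order, that the query reported as an artist."""
    for uri in candidates:
        if uri in artist_uris:
            return uri
    return ""  # mirrors `result.uri rescue ""` when nothing matches
```

One round trip for all candidates avoids scraping each result page separately.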

Slow Berlin sparql benchmark queries in Neo4j

I am trying Berlin benchmark SPARQL queries in neo4j. I have created Neo4j graph from triples using http://michaelbloggs.blogspot.de/2013/05/importing-ttl-turtle-ontologies-in-neo4j.html
To summarize the data loading, my graph has the following structure:
Subject => Node
Predicate => Relationship
Object => Node
If the predicate is a date, string or integer (primitive), a property is created on the node instead of a relationship.
Now I am trying the following queries, which are really slow in Neo4j.
Query 4: Feature with the highest ratio between price with that feature and price without that feature.
The corresponding SPARQL query:
prefix bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
prefix bsbm-inst: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
Select ?feature ((?sumF*(?countTotal-?countF))/(?countF*(?sumTotal-?sumF)) As ?priceRatio)
{
{ Select (count(?price) As ?countTotal) (sum(xsd:float(str(?price))) As ?sumTotal)
{
?product a <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/ProductType294> .
?offer bsbm:product ?product ;
bsbm:price ?price .
}
}
{ Select ?feature (count(?price2) As ?countF) (sum(xsd:float(str(?price2))) As ?sumF)
{
?product2 a <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/ProductType294> ;
bsbm:productFeature ?feature .
?offer2 bsbm:product ?product2 ;
bsbm:price ?price2 .
}
Group By ?feature
}
}
Order By desc(?priceRatio) ?feature
Limit 100
The Cypher query I created for this:
MATCH p1 = (offer1:Offer)-[r1:`product`]->(products1:ProductType294)
MATCH p2 = (offer2:Offer)-[r2:`product`]->(products2:ProductType294)-[:`productFeature`]->(features)
RETURN (sum(DISTINCT offer2.price) * (count(DISTINCT offer1.price) - count(DISTINCT offer2.price))
/ (count(DISTINCT offer2.price) * (sum(DISTINCT offer1.price) - sum(DISTINCT offer2.price)))) AS cnt, features.__URI__ AS frui
ORDER BY cnt DESC, frui
This query is really slow. Please let me know whether I am formulating it the wrong way.
Another query is Query 5: Show the most popular products of a specific product type for each country, by review count:
prefix bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
prefix bsbm-inst: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/>
prefix rev: <http://purl.org/stuff/rev#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
Select ?country ?product ?nrOfReviews ?avgPrice
{
{ Select ?country (max(?nrOfReviews) As ?maxReviews)
{
{ Select ?country ?product (count(?review) As ?nrOfReviews)
{
?product a <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/ProductType403> .
?review bsbm:reviewFor ?product ;
rev:reviewer ?reviewer .
?reviewer bsbm:country ?country .
}
Group By ?country ?product
}
}
Group By ?country
}
{ Select ?product (avg(xsd:float(str(?price))) As ?avgPrice)
{
?product a <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/ProductType403> .
?offer bsbm:product ?product .
?offer bsbm:price ?price .
}
Group By ?product
}
{ Select ?country ?product (count(?review) As ?nrOfReviews)
{
?product a <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/ProductType403> .
?review bsbm:reviewFor ?product .
?review rev:reviewer ?reviewer .
?reviewer bsbm:country ?country .
}
Group By ?country ?product
}
FILTER(?nrOfReviews=?maxReviews)
}
Order By desc(?nrOfReviews) ?country ?product
The Cypher query I created for this is the following:
MATCH (products2:ProductType403)<-[:`reviewFor`]-(reviews:Review)-[:`reviewer`]->(rvrs)-[:`country`]->(countries)
with count(reviews) AS reviewcount,products2.__URI__ AS pruis, countries.__URI__ AS cntrs
MATCH (products1:ProductType403)<-[:`product`]-(offer:Offer)
with AVG(offer.price) AS avgPrice, MAX(reviewcount) AS maxrevs, cntrs
MATCH (products2:ProductType403)<-[:`reviewFor`]-(reviews:Review)-[:`reviewer`]->(rvrs)-[:`country`]->(countries)
with avgPrice, maxrevs,countries, count(reviews) AS rvs, countries.__URI__ AS curis, products2.__URI__ AS puris
where maxrevs=rvs
RETURN curis,puris,rvs,avgPrice
This query is really slow, too. Am I formulating the queries the correct way?
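To have a reference to check a Cypher rewrite against, here is a plain-Python sketch of what the SPARQL Query 5 computes over small in-memory tuples. It counts reviews per (country, product) and keeps the products whose count equals that country's maximum, ties included. It then joins each with its average offer price and orders the rows as the SPARQL does. The tuple shapes and function name are assumptions, and the inputs are assumed to be pre-filtered to ProductType403. Products without any offer drop out, as they would in the SPARQL join.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def most_reviewed_per_country(
    reviews: Iterable[Tuple[str, str]],   # one (country, product) pair per review
    offers: Iterable[Tuple[str, float]],  # one (product, price) pair per offer
) -> List[Tuple[str, str, int, float]]:
    # ?nrOfReviews: reviews per (country, product)
    counts = Counter(reviews)

    # ?maxReviews: the highest per-product review count within each country
    max_per_country: Dict[str, int] = {}
    for (country, _product), n in counts.items():
        max_per_country[country] = max(n, max_per_country.get(country, 0))

    # ?avgPrice: average offer price per product
    price_sum: Dict[str, float] = defaultdict(float)
    price_cnt: Dict[str, int] = defaultdict(int)
    for product, price in offers:
        price_sum[product] += price
        price_cnt[product] += 1

    # FILTER(?nrOfReviews = ?maxReviews); products with no offers drop out of the join
    rows = [
        (country, product, n, price_sum[product] / price_cnt[product])
        for (country, product), n in counts.items()
        if n == max_per_country[country] and price_cnt[product]
    ]
    # ORDER BY DESC(?nrOfReviews) ?country ?product
    rows.sort(key=lambda r: (-r[2], r[0], r[1]))
    return rows
```

Note that the maximum is taken per country before the price join, and the join is on product only. The Cypher version's second MATCH on `products1` is not tied to the reviewed products in this way.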
I have 10M triples (Berlin benchmark dataset).
Every type predicate was converted into a label.
(For Query 4) What I'm trying to get is the feature with the highest ratio between the price with that feature and the price without that feature. Is this the right way to formulate the query?
(For Query 4) I get correct results for this query. If I don't compute the sum and count, the query executes really fast.
Thanks in advance :) The SPARQL queries and more information can be found at: http://wifo5-03.informatik.uni-mannheim.de/bizer/berlinsparqlbenchmark/spec/BusinessIntelligenceUseCase/index.html#queries
These look like global graph queries to me?
What is the size of your dataset?
You create a cartesian product between the two paths?
Shouldn't those two paths be connected somehow?
Shouldn't there be a type property on a ProductType label, e.g. (:ProductType {type:"294"})?
And if there were, you'd have an index on :ProductType(type) and probably :Order(orderNo).
I don't really understand the calculation. Is it the sum of the distinct offer2 prices times the difference of the distinct-price counts, divided by the count of distinct offer2 prices times the difference of the two price sums?
MATCH (offer1:Offer)-[r1:`product`]->(products1:ProductType294)
MATCH (offer2:Offer)-[r2:`product`]->(products2:ProductType294)-[:`productFeature`]->(features)
RETURN (sum( DISTINCT offer2.price) *
( count( DISTINCT offer1.price) - count( DISTINCT offer2.price))
/ (count(DISTINCT offer2.price)*
(sum( DISTINCT offer1.price) - sum(DISTINCT offer2.price))))
AS cnt,features.__URI__ AS frui
ORDER BY cnt DESC,frui
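On the calculation: the SPARQL ratio (sumF * (countTotal - countF)) / (countF * (sumTotal - sumF)) rearranges to (sumF / countF) / ((sumTotal - sumF) / (countTotal - countF)). That is the average price of offers whose product has the feature, divided by the average price of all other offers. The plain-Python sketch below computes exactly that from in-memory data. The shapes and names are assumptions, and the offers are assumed to be pre-filtered to ProductType294. The SPARQL sums and counts every offer price, whereas the Cypher above uses DISTINCT prices, which can give different numbers when two offers share a price.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def price_ratios(
    offers: Iterable[Tuple[str, float]],  # one (product, price) pair per offer
    features: Dict[str, Set[str]],        # product -> its productFeature values
    limit: int = 100,
) -> List[Tuple[str, float]]:
    offer_list = list(offers)
    count_total = len(offer_list)                 # ?countTotal
    sum_total = sum(p for _, p in offer_list)     # ?sumTotal

    count_f: Dict[str, int] = defaultdict(int)    # ?countF per feature
    sum_f: Dict[str, float] = defaultdict(float)  # ?sumF per feature
    for product, price in offer_list:
        for feature in features.get(product, ()):
            count_f[feature] += 1
            sum_f[feature] += price

    ratios = []
    for feature, cf in count_f.items():
        count_without = count_total - cf
        sum_without = sum_total - sum_f[feature]
        if count_without == 0 or sum_without == 0:
            # SPARQL would leave ?priceRatio unbound here; this sketch skips the feature.
            continue
        # (avg price with feature) / (avg price without feature)
        ratios.append((feature, (sum_f[feature] * count_without) / (cf * sum_without)))

    # ORDER BY DESC(?priceRatio) ?feature LIMIT 100
    ratios.sort(key=lambda r: (-r[1], r[0]))
    return ratios[:limit]
```

Only one pass over the offers is needed for both aggregates. The SPARQL's two subqueries likewise aggregate once each, rather than pairing every offer with every other offer.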

Does Entity framework check for redundant OrderBy() methods

I have the following Repository model method:
public IQueryable<AccountDefinition> FindAccountDefinition(string q)
{
return entities.AccountDefinitions.Include(a => a.SDOrganization)
.Where (a=> String.IsNullOrEmpty(q) ||
a.ORG_NAME.ToUpper().StartsWith(q.ToUpper()))
.OrderBy(a=>a.ORG_NAME);
}
And the following action method:
[OutputCache(CacheProfile = "short", Location = OutputCacheLocation.Client,VaryByHeader="X-Requested-With", VaryByParam = "*")]
public ActionResult Index(string searchTerm=null, int page = 1)
{
//code goes here
var accountdefinition = repository.FindAccountDefinition(searchTerm== null ? null : searchTerm.Trim())
.OrderBy(a => a.ORG_NAME)
.ToPagedList(page, pagesize);
if (Request.IsAjaxRequest())
{
ViewBag.FromSearch = true;
return PartialView("_CustomerTable",accountdefinition);
}
return View(accountdefinition);
}
Currently I am doing the OrderBy(a => a.ORG_NAME) both inside the action method and inside the repository method, so how will Entity Framework and LINQ deal with this? I need to keep the OrderBy inside the repository method, since many other action methods call this repository method, and inside the action method I cannot apply ToPagedList unless I specify an OrderBy. So my questions are:
How will EF and LINQ deal with the duplicate OrderBy?
And will the OrderBy be done at the DB level or on the server?
Thanks
Edit
Here is the generated SQL statement from SQL Profiler; there are two ORDER BY clauses:
exec sp_executesql N'SELECT TOP (15)
[Project1].[C1] AS [C1],
[Project1].[ORG_ID] AS [ORG_ID],
[Project1].[LOG_LOGO] AS [LOG_LOGO],
[Project1].[HEAD_LOGO] AS [HEAD_LOGO],
[Project1].[ORG_NAME] AS [ORG_NAME],
[Project1].[HASATTACHMENT] AS [HASATTACHMENT],
[Project1].[LOGIN_URI] AS [LOGIN_URI],
[Project1].[SUPPORT_EMAIL] AS [SUPPORT_EMAIL],
[Project1].[DEFAULTSITEID] AS [DEFAULTSITEID],
[Project1].[ORG_ID1] AS [ORG_ID1],
[Project1].[NAME] AS [NAME],
[Project1].[CREATEDTIME] AS [CREATEDTIME],
[Project1].[DESCRIPTION] AS [DESCRIPTION]
FROM ( SELECT [Project1].[ORG_ID] AS [ORG_ID], [Project1].[LOG_LOGO] AS [LOG_LOGO], [Project1].[HEAD_LOGO] AS [HEAD_LOGO], [Project1].[ORG_NAME] AS [ORG_NAME], [Project1].[HASATTACHMENT] AS [HASATTACHMENT], [Project1].[LOGIN_URI] AS [LOGIN_URI], [Project1].[SUPPORT_EMAIL] AS [SUPPORT_EMAIL], [Project1].[DEFAULTSITEID] AS [DEFAULTSITEID], [Project1].[ORG_ID1] AS [ORG_ID1], [Project1].[NAME] AS [NAME], [Project1].[CREATEDTIME] AS [CREATEDTIME], [Project1].[DESCRIPTION] AS [DESCRIPTION], [Project1].[C1] AS [C1], row_number() OVER (ORDER BY [Project1].[ORG_NAME] ASC) AS [row_number]
FROM ( SELECT
[Extent1].[ORG_ID] AS [ORG_ID],
[Extent1].[LOG_LOGO] AS [LOG_LOGO],
[Extent1].[HEAD_LOGO] AS [HEAD_LOGO],
[Extent1].[ORG_NAME] AS [ORG_NAME],
[Extent1].[HASATTACHMENT] AS [HASATTACHMENT],
[Extent1].[LOGIN_URI] AS [LOGIN_URI],
[Extent1].[SUPPORT_EMAIL] AS [SUPPORT_EMAIL],
[Extent1].[DEFAULTSITEID] AS [DEFAULTSITEID],
[Extent2].[ORG_ID] AS [ORG_ID1],
[Extent2].[NAME] AS [NAME],
[Extent2].[CREATEDTIME] AS [CREATEDTIME],
[Extent2].[DESCRIPTION] AS [DESCRIPTION],
1 AS [C1]
FROM [dbo].[AccountDefinition] AS [Extent1]
INNER JOIN [dbo].[SDOrganization] AS [Extent2] ON [Extent1].[ORG_ID] = [Extent2].[ORG_ID]
WHERE (@p__linq__0 IS NULL) OR (( CAST(LEN(@p__linq__0) AS int)) = 0) OR (( CAST(CHARINDEX(UPPER(@p__linq__1), UPPER([Extent1].[ORG_NAME])) AS int)) = 1)
) AS [Project1]
) AS [Project1]
WHERE [Project1].[row_number] > 0
ORDER BY [Project1].[ORG_NAME] ASC',N'@p__linq__0 nvarchar(4000),@p__linq__1 nvarchar(4000)',@p__linq__0=NULL,@p__linq__1=NULL
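1) How will EF and LINQ deal with the duplicate OrderBy?
Since both calls sort by ORG_NAME, the result is the same as with a single OrderBy. A later OrderBy call starts a new primary ordering and supersedes the earlier one; the correct way to add further sort keys is OrderBy followed by ThenBy. That is why the generated SQL sorts on ORG_NAME only. The two ORDER BY clauses in the profiler output come from paging: one feeds row_number() and the other orders the returned page.
2) And will the OrderBy be done at the DB level or on the server?
At the DB level, since you apply it to the IQueryable<T> before calling ToPagedList, which executes the statement.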
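To illustrate what the profiler output does, here is a small Python analogy (not EF or C#; the names are hypothetical). The SQL numbers the ordered rows with row_number() OVER (ORDER BY ORG_NAME), keeps rows with row_number greater than the number to skip, and takes TOP (page size). For page 1 and what is presumably a pagesize of 15, the skip is 0, matching `row_number > 0` and `TOP (15)`. The sketch also shows why paging needs an ordering first: row numbers are only well defined over a fixed order.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def page_window(page: int, page_size: int) -> Tuple[int, int]:
    """(rows to skip, rows to take) for a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    return (page - 1) * page_size, page_size


def to_paged_list(rows: Sequence[T], key: Callable[[T], str],
                  page: int, page_size: int) -> List[T]:
    skip, take = page_window(page, page_size)
    ordered = sorted(rows, key=key)                   # ORDER BY [ORG_NAME] ASC
    numbered = enumerate(ordered, start=1)            # row_number() starts at 1
    kept = [row for n, row in numbered if n > skip]   # WHERE [row_number] > skip
    return kept[:take]                                # SELECT TOP (take)
```

For example, page 2 with a page size of 2 over five names returns the third and fourth names in sorted order.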
I suppose you have to call OrderBy in your controller because the ToPagedList method requires an IOrderedQueryable rather than an IQueryable as its parameter. So what about returning IOrderedQueryable<AccountDefinition> from your repository?
public IOrderedQueryable<AccountDefinition> FindAccountDefinition(string q) {
return entities.AccountDefinitions
.Include(a => a.SDOrganization)
.Where (a => String.IsNullOrEmpty(q) ||
a.ORG_NAME.ToUpper().StartsWith(q.ToUpper()))
.OrderBy(a=>a.ORG_NAME);
}
OrderBy already returns IOrderedQueryable<T>, so there is no overhead, and the other methods shouldn't be affected by this change.
