SPARQL wikidata get state/province of all cities of a particular country - timeout

I am trying to get all cities of a country along with their state/province. I successfully got all cities of Canada using the query below.
Cities Wikidata SPARQL -- It works
SELECT ?city ?cityLabel WHERE {
?city wdt:P17 wd:Q16;
(wdt:P31/(wdt:P279*)) wd:Q515;
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY ?cityLabel
Now I want to get the state/province of each city, so I tried the query below. But it didn't work: the query times out.
Cities & State/Province Wikidata SPARQL -- Didn't work
SELECT ?city ?cityLabel ?stateLabel WHERE {
?city wdt:P17 wd:Q16;
(wdt:P31/(wdt:P279*)) wd:Q515;
wdt:P131* ?state . ?state wdt:P31 wd:Q11828004
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY ?cityLabel
LIMIT 5000
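To make the cost of wdt:P131* concrete, here is a minimal Python sketch of what that property path means, run on a hypothetical toy graph (CityA, CountyX, ProvinceP and the rest are made-up names). Starting from a city, it follows "located in" (P131) edges zero or more times and keeps every node that is a province. On Wikidata the path has to be expanded like this for every Canadian city before the type filter can apply, which is one plausible reason the second query hits the timeout. This illustrates the semantics only, not how the query service actually evaluates it.

```python
from collections import deque

# Hypothetical toy graph: each key is "located in" (P131) the listed items.
LOCATED_IN = {
    "CityA": ["CountyX"],
    "CountyX": ["ProvinceP"],
    "ProvinceP": ["Canada"],
    "CityB": ["ProvinceQ"],
    "ProvinceQ": ["Canada"],
}
# Hypothetical stand-in for "?state wdt:P31 wd:Q11828004".
IS_PROVINCE = {"ProvinceP", "ProvinceQ"}


def provinces_of(city):
    """Follow P131 edges zero or more times (like wdt:P131*) and keep province nodes."""
    seen = {city}
    queue = deque([city])
    found = []
    while queue:
        node = queue.popleft()
        if node in IS_PROVINCE:
            found.append(node)
        for parent in LOCATED_IN.get(node, []):
            if parent not in seen:  # avoid revisiting nodes reachable by several paths
                seen.add(parent)
                queue.append(parent)
    return found


print(provinces_of("CityA"))  # ['ProvinceP']
print(provinces_of("CityB"))  # ['ProvinceQ']
```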


SPARQL get all the data before it reaches timeout

I am trying to get the names of all cities in all countries of the world using the query below. Whenever I execute it, it returns the message "Query timeout limit reached".
Is there any way to get all the data before it reaches the timeout limit?
SELECT ?country ?countryLabel ?city ?cityLabel
WHERE
{
?city wdt:P31/wdt:P279* wd:Q515;
wdt:P17 ?country .
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
ORDER BY ?country
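One common way to stay under the timeout is to fetch the result in pages using SPARQL's LIMIT and OFFSET solution modifiers. The Python sketch below only generates the paged query strings; the page size and the simplified SELECT (labels dropped) are assumptions. Each page still has to evaluate the whole pattern plus the ORDER BY that stable paging needs, so on its own this may not be enough.

```python
BASE_QUERY = """SELECT ?country ?city WHERE {
  ?city wdt:P31/wdt:P279* wd:Q515;
        wdt:P17 ?country .
}
ORDER BY ?city"""


def paged_queries(base_query, page_size, pages):
    """Yield one query per page by appending LIMIT/OFFSET solution modifiers."""
    for page in range(pages):
        offset = page * page_size
        yield f"{base_query}\nLIMIT {page_size}\nOFFSET {offset}"


queries = list(paged_queries(BASE_QUERY, page_size=10000, pages=3))
print(queries[2].splitlines()[-2:])  # ['LIMIT 10000', 'OFFSET 20000']
```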
I'm not at all sure why, but this query works for me:
SELECT ?country ?countryLabel ?city ?cityLabel
WHERE
{
?city wdt:P31/wdt:P279* wd:Q515;
wdt:P17 ?country .
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
ORDER BY ?countryLabel
LIMIT 100000
The two differences from your original query are:
Ordering by countryLabel is, I'm guessing, what you actually wanted instead of ordering by country. In my experience ordering by label is sometimes faster too.
I set a LIMIT. Since the limit is higher than the actual number of results, the query appears to return the same number of rows as it would without a limit.
I've also posted this as an answer on the Open Data site, based on my comment there: removing ORDER BY made the query go through.
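If dropping ORDER BY is what lets the query finish, the sorting can be done on the client instead. A minimal Python sketch, assuming the response is requested in the standard SPARQL 1.1 Query Results JSON format; the three bindings below are hypothetical sample data.

```python
import json

# Hypothetical response in the SPARQL 1.1 Query Results JSON format.
RESPONSE = """{
  "head": {"vars": ["countryLabel", "cityLabel"]},
  "results": {"bindings": [
    {"countryLabel": {"type": "literal", "value": "France"},
     "cityLabel": {"type": "literal", "value": "Lyon"}},
    {"countryLabel": {"type": "literal", "value": "Canada"},
     "cityLabel": {"type": "literal", "value": "Toronto"}},
    {"countryLabel": {"type": "literal", "value": "Canada"},
     "cityLabel": {"type": "literal", "value": "Montreal"}}
  ]}
}"""


def rows_sorted_by_country(raw_json):
    """Flatten the bindings to plain dicts and sort them locally instead of using ORDER BY."""
    bindings = json.loads(raw_json)["results"]["bindings"]
    rows = [{var: cell["value"] for var, cell in b.items()} for b in bindings]
    # Unbound variables are simply absent from a binding, hence .get() with a default.
    return sorted(rows, key=lambda r: (r.get("countryLabel", ""), r.get("cityLabel", "")))


for row in rows_sorted_by_country(RESPONSE):
    print(row["countryLabel"], row["cityLabel"])
```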
Here is a query that works using our recently released Wikidata SPARQL Query Service endpoint.
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX bd: <http://www.bigdata.com/rdf#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT distinct ?country ?countryLabel ?city ?cityLabel
WHERE
{
?city wdt:P31/wdt:P279* wd:Q515;
wdt:P17 ?country ;
rdfs:label ?cityLabel .
FILTER (lang(?cityLabel) = "en")
?country rdfs:label ?countryLabel .
FILTER (lang(?countryLabel) = "en")
}
ORDER BY ?country
Here is a query that works.
SELECT DISTINCT ?cityID ?cityIDLabel ?countryID ?countryIDLabel WHERE
{
{
SELECT * WHERE
{
?cityID wdt:P31 ?cityInstance.
VALUES (?cityInstance) {
(wd:Q515)
(wd:Q5119)
}
OPTIONAL {
?cityID wdt:P17 ?countryID.
?countryID p:P31/ps:P31 wd:Q6256.
}
FILTER NOT EXISTS {
?cityID wdt:P17 "".
?countryID wdt:P30 "".
}
}
}
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
ORDER BY ?countryIDLabel

Using alias in Laravel Query

I have a function on a model:
public function getAll()
{
$allusers = DB::table('users')->join('countries', 'countries.id', '=', 'users.country_id')->get();
return $allusers;
}
This works fine except the id of the user is replaced by the id of the country:
array(2) { [0]=> object(stdClass)#201 (20) { ["id"]=> int(42) ["name"]=> string(11) "Jim Elliott" ... and so on
The ID should be 1; 42 is the country_id.
Should I add an alias for the country ID and, if so, how? Or can I restrict the fields of the countries table to just the country and flag?
In the end I cheated with a view in which I joined the 2 tables and gave the country ID an alias. Not an elegant solution, but it works. I am sure there is a proper way to do this in Laravel without a view; for now, my function became:
$allusers = DB::table('v_users')->get();
return $allusers;
$users = DB::table('users')
    ->join('countries', 'countries.id', '=', 'users.country_id')
    // select() already emits SELECT, so the raw column list must not start with it
    ->select(DB::raw('users.*, countries.*, users.id AS user_id, countries.id AS country_id'))
    ->get();
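The underlying cause is that both tables have an id column, and when a joined row is turned into an object or associative array, the later id (countries.id) overwrites the earlier one, which is what the var_dump in the question shows. The Python/SQLite sketch below reproduces the same effect with an in-memory database and made-up rows, and shows that the aliased columns survive. It illustrates the mechanism only; it is not Laravel code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE countries (id INTEGER PRIMARY KEY, country TEXT, flag TEXT);
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, country_id INTEGER);
    INSERT INTO countries VALUES (42, 'Country 42', 'flag42.png');
    INSERT INTO users VALUES (1, 'Jim Elliott', 42);
""")


def fetch_as_dicts(sql):
    """Build one dict per row; a repeated column name keeps the value of the LAST column."""
    cur = conn.execute(sql)
    names = [d[0] for d in cur.description]
    return [dict(zip(names, row)) for row in cur.fetchall()]


plain = fetch_as_dicts(
    "SELECT users.*, countries.* FROM users "
    "JOIN countries ON countries.id = users.country_id"
)
aliased = fetch_as_dicts(
    "SELECT users.*, countries.*, users.id AS user_id, countries.id AS country_id "
    "FROM users JOIN countries ON countries.id = users.country_id"
)
print(plain[0]["id"])         # 42: countries.id overwrote users.id
print(aliased[0]["user_id"])  # 1: the alias keeps the user's own id
```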

How to select distinct graph nodes by property

I have a database including PlayStation games and it contains games from all regions and platforms. Some of the games from different regions have the same title and platform so I would like to filter out "duplicates". At this time, I don't have region information on each game so the best I can do is filter out by game name and platform.
Is it possible to select distinct nodes by property? I seem to remember that you can return distinct rows based on a column in SQL, but it seems that Cypher applies distinct to the entire row and not just a specific column.
I would like to achieve something like the following:
MATCH (game:PSNGame) RETURN game WHERE distinct game.TitleName, distinct game.Platforms
The above query, if it were valid, would return all PSNGame nodes with a distinct TitleName and Platforms combination. Since it is not valid Cypher, I have tried returning a list of distinct TitleName/Platforms pairs, where DISTINCT applies to both columns.
The query I have for returning the distinct TitleName/Platforms list looks like this:
MATCH (game:PSNGame) RETURN distinct game.TitleName, game.Platforms
The JSON response from Neo4j is similar to this:
[["God of War", ["PS3", "PSVITA"]], ["God of War II", ["PS3", "PSVITA"]]]
The problem I'm facing is that the JSON response is not really an object with properties. It's more of an array of arrays. If I could get the response to be more like an object, I could deserialize without issues. I tried to deserialize as an IList<PsnGame>, but haven't had much luck.
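Each row in that response is positional: element 0 is TitleName and element 1 is Platforms, in the order of the RETURN clause. A minimal Python sketch of mapping such rows onto named fields; the dataclass mirrors the C# PsnGame POCO below, but the function name is hypothetical and this is not Neo4jClient code.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class PsnGame:
    # Mirrors the C# POCO: one title and its platforms.
    title_name: str
    platforms: List[str]


def rows_to_games(raw_json):
    """Map positional [TitleName, Platforms] rows onto named fields."""
    return [PsnGame(title_name=row[0], platforms=list(row[1])) for row in json.loads(raw_json)]


games = rows_to_games('[["God of War", ["PS3", "PSVITA"]], ["God of War II", ["PS3", "PSVITA"]]]')
print(games[0])
```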
Here's my POCO for the IList<PsnGame> implementation:
public class PsnGame
{
public string TitleName { get; set; }
public string[] Platforms { get; set; }
}
EDIT:
Here is the simplest example of my Neo4jClient query:
// helper function for handling searching by name and platform
private ICypherFluentQuery BuildPSNGamesQuery(string gameName, string platform)
{
var query = client.Cypher
.Match("(g:PSNGame)");
if (!string.IsNullOrWhiteSpace(gameName))
{
query = query.Where($"g.TitleName =~ \"(?i).*{gameName}.*\"");
if (!string.IsNullOrWhiteSpace(platform) && platform.ToLower() != "all")
{
query = query.AndWhere($"\"{platform}\" in g.Platforms");
}
}
else
{
if (!string.IsNullOrWhiteSpace(platform) && platform.ToLower() != "all")
{
query = query.Where($"\"{platform}\" in g.Platforms");
}
}
return query;
}
Distinct games:
var distinctGames = await BuildPSNGamesQuery(gameName, platform)
.With("DISTINCT g.TitleName AS TitleName, g.Platforms AS Platforms")
.With("{ TitleName: TitleName, Platforms: Platforms } as Games")
.OrderBy("TitleName")
.Return<PsnGame>("Games")
.Skip((pageNumber - 1) * pageSize)
.Limit(pageSize)
.ResultsAsync;
All games (somehow need to filter based on previous query):
var results = await BuildPSNGamesQuery(gameName, platform)
.Return(g => new Models.PSN.Composite.PsnGame
{
Game = g.As<PsnGame>()
})
.OrderBy("g.TitleName")
.Skip((pageNumber - 1) * pageSize)
.Limit(pageSize)
.ResultsAsync;
By using a map, I'm able to return the TitleName/Platforms pairing that I want, but I suspect I'll need to do a collect on Platforms to get all platforms for a particular game title. Then I could filter the full games list by the distinctGames that I return. However, I would prefer to make a single request that merges both queries, to reduce HTTP traffic.
An example of duplicates can be seen on my website here:
https://www.gamerfootprint.com/#/games/ps
Also, the data for duplicates looks something like this:
MATCH (n:PSNGame)
WHERE n.TitleName = '1001 Spikes'
RETURN n.TitleName, n.Platforms LIMIT 25
JSON:
{
"columns":[
"n.TitleName",
"n.Platforms"
],
"data":[
{
"row":[
"1001 Spikes",
[
"PSVITA"
]
],
"graph":{
"nodes":[
],
"relationships":[
]
}
},
{
"row":[
"1001 Spikes",
[
"PS4"
]
],
"graph":{
"nodes":[
],
"relationships":[
]
}
}
],
"stats":{
"contains_updates":false,
"nodes_created":0,
"nodes_deleted":0,
"properties_set":0,
"relationships_created":0,
"relationship_deleted":0,
"labels_added":0,
"labels_removed":0,
"indexes_added":0,
"indexes_removed":0,
"constraints_added":0,
"constraints_removed":0
}
}
EDIT: 10-31-15
I was able to get distinct game title and platforms returning with the platforms for each game rolled up into a single collection. My new query is the following:
MATCH (game:PSNGame)
WITH DISTINCT game.TitleName as TitleName,
game.Platforms as coll UNWIND coll as Platforms
WITH TitleName as TitleName, COLLECT(DISTINCT Platforms) as Platforms
RETURN TitleName, Platforms
ORDER BY TitleName
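For readers less familiar with UNWIND and COLLECT(DISTINCT ...), this Python sketch spells out what the query above does to the two 1001 Spikes rows shown earlier: flatten each Platforms list, regroup by TitleName, drop repeated platforms and sort by title. It is an illustration only; Cypher does not guarantee the order of elements inside a collected list, whereas the sketch keeps first-seen order.

```python
# Rows as returned by the earlier "1001 Spikes" query: (TitleName, Platforms).
ROWS = [
    ("1001 Spikes", ["PSVITA"]),
    ("1001 Spikes", ["PS4"]),
]


def collapse_by_title(rows):
    """Group by title, unwinding each platform list and keeping distinct platforms."""
    grouped = {}
    for title, platforms in rows:
        bucket = grouped.setdefault(title, [])  # WITH TitleName ... (implicit grouping)
        for platform in platforms:              # UNWIND coll AS Platforms
            if platform not in bucket:          # COLLECT(DISTINCT Platforms)
                bucket.append(platform)
    return sorted(grouped.items())              # ORDER BY TitleName


print(collapse_by_title(ROWS))
```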
Here is a small subset of the results:
{
"columns":[
"TitleName",
"Platforms"
],
"data":[
{
"row":[
"1001 Spikes",
[
"PSVITA",
"PS4"
]
],
"graph":{
"nodes":[
],
"relationships":[
]
}
}
],
"stats":{
"contains_updates":false,
"nodes_created":0,
"nodes_deleted":0,
"properties_set":0,
"relationships_created":0,
"relationship_deleted":0,
"labels_added":0,
"labels_removed":0,
"indexes_added":0,
"indexes_removed":0,
"constraints_added":0,
"constraints_removed":0
}
}
Finally, 1001 Spikes is in the list once and has both PS VITA and PS4 listed as platforms. Now, I need to figure out how to grab the full game nodes and filter against the above query.
Try this one:
MATCH (game:PSNGame)
with game, collect([game.TitleName, game.Platforms]) as wow
return distinct(wow)
If I understand you correctly, you want to select different nodes by property and remove duplicates? If so, it would be something like this:
MATCH (game:PSNGame {property:'value'}) RETURN DISTINCT game.property
That should remove duplicates and return your node by property.

How to get genre info using Dbpedia ruby gem

I am trying to fetch artist info from Wikipedia using the Dbpedia gem https://github.com/farbenmeer/dbpedia
However, I can't figure out how to get the genre of a result item.
Basically, I want to modify the following function to find out which result is an artist and then return its URL:
def self.get_slug(q)
results = Dbpedia.search(q)
result = # Do something to find out the result that is an artist
uri = result.uri rescue ""
return uri
end
My last resort would be to scrape each result URL and decide whether it is an artist based on whether genre info is available.
You could leverage DBpedia's SPARQL endpoint rather than scraping all the results.
Suppose you want a list of everything that has a genre. You could query:
SELECT DISTINCT ?thing WHERE {
?thing dbpedia-owl:genre ?genre
}
LIMIT 1000
But say you don't want everything; you're only looking for artists. An artist could be a musician, a painter, an actor, etc.
SELECT DISTINCT ?thing WHERE {
?thing dbpedia-owl:genre ?genre ;
rdf:type dbpedia-owl:Artist
}
LIMIT 1000
Or maybe you just want musicians OR bands:
SELECT DISTINCT ?thing WHERE {
{
?thing dbpedia-owl:genre ?genre ;
rdf:type dbpedia-owl:Band
}
UNION
{
?thing dbpedia-owl:genre ?genre ;
a dbpedia-owl:MusicalArtist # `a` is a shortcut for `rdf:type`
}
}
LIMIT 1000
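If you end up needing more classes than Band and MusicalArtist (painters, actors, ...), the UNION branches can be generated instead of copied by hand. A small Python sketch; the function name and the fixed LIMIT are my own choices, and the dbpedia-owl: prefix is assumed to be predefined on the endpoint, as the queries above already assume.

```python
def union_by_class(classes, var="?thing"):
    """Build one UNION branch per class, each requiring a genre, like the query above."""
    branches = [
        f"  {{\n    {var} dbpedia-owl:genre ?genre ;\n      a {cls}\n  }}"
        for cls in classes
    ]
    body = "\n  UNION\n".join(branches)
    return f"SELECT DISTINCT {var} WHERE {{\n{body}\n}}\nLIMIT 1000"


print(union_by_class(["dbpedia-owl:Band", "dbpedia-owl:MusicalArtist"]))
```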
Ultimately, you want musicians or bands that have "mega" in their names, e.g. Megadeth or Megan White, along with the URL of the resource.
SELECT DISTINCT ?thing ?url ?genre WHERE {
?thing foaf:name ?name ;
foaf:isPrimaryTopicOf ?url .
?name bif:contains "'mega*'" .
{
?thing dbpedia-owl:genre ?genre ;
a dbpedia-owl:Band
}
UNION
{
?thing dbpedia-owl:genre ?genre ;
a dbpedia-owl:MusicalArtist
}
UNION
{
?thing a <http://umbel.org/umbel/rc/MusicalPerformer>
}
}
LIMIT 1000
Give these queries a try using DBpedia's SPARQL Query Editor.
The dbpedia gem you pointed out exposes the sparql-client in its API, so I think you will be able to run all these queries using the #query method:
Dbpedia.sparql.query(query_string)
Best of luck!
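If you want to test these queries outside Ruby, any HTTP client can send them using the standard SPARQL 1.1 Protocol: a GET request with the query in a query parameter and an Accept header asking for JSON results. The Python sketch below only builds the request; the endpoint URL https://dbpedia.org/sparql is an assumption based on DBpedia's public endpoint.

```python
import urllib.parse
import urllib.request

# Assumption: DBpedia's public SPARQL endpoint; adjust if it moves.
ENDPOINT = "https://dbpedia.org/sparql"


def build_sparql_request(query):
    """Build (but do not send) a SPARQL Protocol GET request asking for JSON results."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


req = build_sparql_request(
    "SELECT DISTINCT ?thing WHERE { ?thing dbpedia-owl:genre ?genre } LIMIT 10"
)
# To actually run it: urllib.request.urlopen(req) returns the JSON result document.
```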

Slow Berlin sparql benchmark queries in Neo4j

I am trying the Berlin benchmark SPARQL queries in Neo4j. I created a Neo4j graph from triples using http://michaelbloggs.blogspot.de/2013/05/importing-ttl-turtle-ontologies-in-neo4j.html
To summarize the data loading, my graph has the following structure:
Subject => Node
Predicate => Relationship
Object => Node
If the predicate's value is a primitive (date, string, integer), a property is created on the node instead of a relationship.
Now I am trying the following queries, which are really slow in Neo4j.
Query 4: Feature with the highest ratio between price with that feature and price without that feature.
The corresponding SPARQL query:
prefix bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
prefix bsbm-inst: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
Select ?feature ((?sumF*(?countTotal-?countF))/(?countF*(?sumTotal-?sumF)) As ?priceRatio)
{
{ Select (count(?price) As ?countTotal) (sum(xsd:float(str(?price))) As ?sumTotal)
{
?product a <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/ProductType294> .
?offer bsbm:product ?product ;
bsbm:price ?price .
}
}
{ Select ?feature (count(?price2) As ?countF) (sum(xsd:float(str(?price2))) As ?sumF)
{
?product2 a <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/ProductType294> ;
bsbm:productFeature ?feature .
?offer2 bsbm:product ?product2 ;
bsbm:price ?price2 .
}
Group By ?feature
}
}
Order By desc(?priceRatio) ?feature
Limit 100
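To sanity-check a translation of Query 4, it helps to pin down what the SPARQL computes. The Python sketch below reproduces it on hypothetical toy offers (each offer is assumed to already be for a ProductType294 product): count and sum all prices, count and sum the prices per feature, then apply the same ?priceRatio formula and ordering. Unlike the SPARQL, it drops features whose "without" group is empty instead of leaving the ratio unbound.

```python
from collections import defaultdict

# Hypothetical toy offers: (price, features of the offered product).
OFFERS = [
    (10.0, {"f1"}),
    (30.0, {"f1", "f2"}),
    (20.0, set()),
]


def price_ratios(offers):
    """priceRatio = (sumF*(countTotal-countF)) / (countF*(sumTotal-sumF)), per feature."""
    count_total = len(offers)
    sum_total = sum(price for price, _ in offers)
    count_f = defaultdict(int)
    sum_f = defaultdict(float)
    for price, features in offers:
        for feature in features:
            count_f[feature] += 1
            sum_f[feature] += price
    ratios = {}
    for feature in count_f:
        without_count = count_total - count_f[feature]
        without_sum = sum_total - sum_f[feature]
        # SPARQL would leave ?priceRatio unbound on division by zero; this sketch drops the feature.
        if without_count and without_sum:
            ratios[feature] = (sum_f[feature] * without_count) / (count_f[feature] * without_sum)
    # Order By desc(?priceRatio) ?feature
    return sorted(ratios.items(), key=lambda kv: (-kv[1], kv[0]))


print(price_ratios(OFFERS))  # [('f2', 2.0), ('f1', 1.0)]
```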
The Cypher query I created for this:
MATCH p1 = (offer1:Offer)-[r1:`product`]->(products1:ProductType294)
MATCH p2 = (offer2:Offer)-[r2:`product`]->(products2:ProductType294)-[:`productFeature`]->(features)
RETURN (sum(DISTINCT offer2.price) * (count(DISTINCT offer1.price) - count(DISTINCT offer2.price))
        / (count(DISTINCT offer2.price) * (sum(DISTINCT offer1.price) - sum(DISTINCT offer2.price)))) AS cnt,
       features.__URI__ AS frui
ORDER BY cnt DESC, frui
This query is really slow. Please let me know whether I am formulating it the wrong way.
Another one is Query 5: show the most popular products of a specific product type for each country, by review count:
prefix bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
prefix bsbm-inst: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/>
prefix rev: <http://purl.org/stuff/rev#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
Select ?country ?product ?nrOfReviews ?avgPrice
{
{ Select ?country (max(?nrOfReviews) As ?maxReviews)
{
{ Select ?country ?product (count(?review) As ?nrOfReviews)
{
?product a <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/ProductType403> .
?review bsbm:reviewFor ?product ;
rev:reviewer ?reviewer .
?reviewer bsbm:country ?country .
}
Group By ?country ?product
}
}
Group By ?country
}
{ Select ?product (avg(xsd:float(str(?price))) As ?avgPrice)
{
?product a <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/ProductType403> .
?offer bsbm:product ?product .
?offer bsbm:price ?price .
}
Group By ?product
}
{ Select ?country ?product (count(?review) As ?nrOfReviews)
{
?product a <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/ProductType403> .
?review bsbm:reviewFor ?product .
?review rev:reviewer ?reviewer .
?reviewer bsbm:country ?country .
}
Group By ?country ?product
}
FILTER(?nrOfReviews=?maxReviews)
}
Order By desc(?nrOfReviews) ?country ?product
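As with Query 4, a tiny worked example can serve as a reference when checking a Cypher translation. The Python sketch below follows the three SPARQL subqueries on hypothetical toy data: review counts per (country, product), the maximum of those counts per country, and the average offer price per product, joined and filtered where the count equals the country's maximum.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (product, reviewer country) per review, (product, price) per offer.
REVIEWS = [("p1", "DE"), ("p1", "DE"), ("p2", "DE"), ("p2", "US"), ("p3", "US")]
OFFERS = [("p1", 10.0), ("p1", 20.0), ("p2", 8.0), ("p3", 4.0)]


def most_reviewed_per_country(reviews, offers):
    """Per country, keep the product(s) with the maximum review count, plus their average price."""
    review_counts = Counter(reviews)  # (product, country) -> nrOfReviews
    max_reviews = defaultdict(int)    # country -> maxReviews
    for (product, country), n in review_counts.items():
        max_reviews[country] = max(max_reviews[country], n)
    prices = defaultdict(list)        # product -> list of offer prices
    for product, price in offers:
        prices[product].append(price)
    rows = [
        (country, product, n, sum(prices[product]) / len(prices[product]))
        for (product, country), n in review_counts.items()
        # FILTER(?nrOfReviews=?maxReviews); the join with ?avgPrice requires at least one offer
        if n == max_reviews[country] and prices[product]
    ]
    # Order By desc(?nrOfReviews) ?country ?product
    return sorted(rows, key=lambda r: (-r[2], r[0], r[1]))


for row in most_reviewed_per_country(REVIEWS, OFFERS):
    print(row)
```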
The Cypher query I created for this is the following:
MATCH (products2:ProductType403)<-[:`reviewFor`]-(reviews:Review)-[:`reviewer`]->(rvrs)-[:`country`]->(countries)
with count(reviews) AS reviewcount,products2.__URI__ AS pruis, countries.__URI__ AS cntrs
MATCH (products1:ProductType403)<-[:`product`]-(offer:Offer)
with AVG(offer.price) AS avgPrice, MAX(reviewcount) AS maxrevs, cntrs
MATCH (products2:ProductType403)<-[:`reviewFor`]-(reviews:Review)-[:`reviewer`]->(rvrs)-[:`country`]->(countries)
with avgPrice, maxrevs,countries, count(reviews) AS rvs, countries.__URI__ AS curis, products2.__URI__ AS puris
where maxrevs=rvs
RETURN curis,puris,rvs,avgPrice
This query is also really slow. Am I formulating the queries correctly?
I have 10M triples (Berlin benchmark dataset).
Every type (rdf:type) predicate was converted into a label.
(For Query 4) What I'm trying to get is the feature with the highest ratio between price with that feature and price without that feature. Is this the right way to formulate the query?
(For Query 4) I get correct results for this query.
If I don't compute the sum and count, the query executes really fast.
Thanks in advance :) The SPARQL queries and more information can be found at: http://wifo5-03.informatik.uni-mannheim.de/bizer/berlinsparqlbenchmark/spec/BusinessIntelligenceUseCase/index.html#queries
These look like global graph queries to me?
What is the size of your dataset?
You create a cartesian product between the two paths?
Shouldn't those two paths be connected somehow?
Shouldn't there be a property type on a ProductType label? (:ProductType {type:"294"})
And if there was you'd have an index on :ProductType(type) and probably :Order(orderNo)
I don't really understand the calculation:
(delta of the distinct-price counts) times (sum of the distinct prices of offer2),
divided by
(count of the distinct prices of offer2) times (delta of the sums of the two offers' prices)?
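For reference, the SPARQL formula in Query 4 can be rewritten as a ratio of two averages. With C_T = ?countTotal, S_T = ?sumTotal, C_F = ?countF and S_F = ?sumF, it is the average offer price for products with feature F divided by the average offer price of the remaining offers:

```latex
\mathrm{priceRatio}(F)
  = \frac{S_F\,(C_T - C_F)}{C_F\,(S_T - S_F)}
  = \frac{S_F / C_F}{(S_T - S_F) / (C_T - C_F)}
```

The SPARQL sums and counts every price, whereas count(DISTINCT ...) and sum(DISTINCT ...) in the Cypher collapse repeated prices, so the two can give different numbers when several offers share a price.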
MATCH (offer1:Offer)-[r1:`product`]->(products1:ProductType294)
MATCH (offer2:Offer)-[r2:`product`]->(products2:ProductType294)-[:`productFeature`]->(features)
RETURN (sum(DISTINCT offer2.price) *
        (count(DISTINCT offer1.price) - count(DISTINCT offer2.price))
        / (count(DISTINCT offer2.price) *
           (sum(DISTINCT offer1.price) - sum(DISTINCT offer2.price))))
       AS cnt, features.__URI__ AS frui
ORDER BY cnt DESC, frui
