How to use UNION/or in a SPARQL property path with arbitrary length?

I'm using the query below to find all properties with a domain of City (or a superclass of City) or a range of Country (or a superclass of Country) in the DBpedia ontology. With a fixed-length path there is no problem, but when I add * to allow paths of arbitrary length, I get this error:
Virtuoso 37000 Error SP031: SPARQL compiler: Variable
'_::trans_subj_6_4' is used in subexpressions of the query but not
assigned
My SPARQL query:
define sql:signal-void-variables 1
define input:default-graph-uri <http://dbpedia.org>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX res: <http://dbpedia.org/resource/>
PREFIX owl:<http://www.w3.org/2002/07/owl#>
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
select ?property where{
{ ?property rdfs:domain/^rdfs:subClassOf* dbo:City }
UNION
{ ?property rdfs:range/^rdfs:subClassOf* dbo:Country }
}
Also, when I put any number in place of *, I get the same error. I'm using Virtuoso as the DBpedia SPARQL endpoint.

Use VALUES instead of UNION (when you can)
The error Virtuoso is giving you has more to do with its implementation of property paths and UNION than with the SPARQL query itself. The SPARQL part of the query looks correct. (I can't speak to the Virtuoso-specific defines.)
In many places that required UNION in the original SPARQL standard, you can now use VALUES to specify the particular values that variables can have. It typically leads to more readable queries (at least, in my opinion), and some endpoints, such as Virtuoso, seem to handle it better.
Using VALUES (and the dbpedia-owl prefix that the web interface to the endpoint uses), your query becomes the following, and Virtuoso returns what you're looking for:
select ?property where {
values (?p ?v) { (rdfs:domain dbpedia-owl:City)
(rdfs:range dbpedia-owl:Country) }
?property ?p ?class .
?class ^rdfs:subClassOf* ?v .
}
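If the list of predicate/class pairs grows, the VALUES form is also easy to generate. The following is only a sketch in Python: the helper name values_query is made up for illustration, and it assumes the endpoint predefines the rdfs: and dbpedia-owl: prefixes, as the DBpedia web interface does.

```python
def values_query(pairs):
    """Build the VALUES-based query for (predicate, class) pairs.

    Hypothetical helper: each pair becomes one row of the VALUES block,
    replacing one branch of the original UNION.
    """
    rows = "\n".join(f"    ({pred} {cls})" for pred, cls in pairs)
    return (
        "select ?property where {\n"
        "  values (?p ?v) {\n"
        f"{rows}\n"
        "  }\n"
        "  ?property ?p ?class .\n"
        "  ?class ^rdfs:subClassOf* ?v .\n"
        "}"
    )


query = values_query([
    ("rdfs:domain", "dbpedia-owl:City"),
    ("rdfs:range", "dbpedia-owl:Country"),
])
print(query)
```

Adding a third condition is then one more tuple in the list rather than another UNION branch.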
Other Notes
Also, when I put any number in place of *, I get the same error. I'm using
Virtuoso as the DBpedia SPARQL endpoint.
Virtuoso accepts the {n,m} notation for property path lengths, but be aware that although this notation appeared in some drafts of property paths, it didn't actually make it into the SPARQL 1.1 standard. If you use it, you might not be able to run your query on other endpoints.
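If portability matters, a bounded path can be spelled out with the standard / (sequence) and | (alternative) operators instead. Here is a rough Python sketch of that expansion; expand_bounded_path is a hypothetical helper, and it only covers lower bounds of 1 or more.

```python
def expand_bounded_path(step, low, high):
    """Rewrite step{low,high} using only SPARQL 1.1 path operators.

    Hypothetical helper: emits one '/'-joined sequence per allowed
    length and combines them with '|'.
    """
    if low < 1 or high < low:
        raise ValueError("expected 1 <= low <= high")
    alternatives = ["/".join([step] * n) for n in range(low, high + 1)]
    return "(" + "|".join(alternatives) + ")"


# Stand-in for the non-standard ^rdfs:subClassOf{1,2}
path = expand_bounded_path("^rdfs:subClassOf", 1, 2)
print("?class " + path + " ?v .")
```

The expansion grows quickly with the upper bound, so this is only practical for small ranges.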

Related

Solr and Rails: [* TO *] value instead of nil (asterisk TO asterisk)

In my model's searchable block I index added_at as a time field.
In the search block I added with(:added_at, nil), reindexed, and now the search object contains:
<Sunspot::Search:{:fq=>["-added_at_d:[* TO *]"]...}>
What does [* TO *] mean? Did something go wrong?
By adding with(:added_at, nil) you narrow the search results to documents that have no value in the added_at field, so we might expect the corresponding filter query to be defined as:
fq=>["added_at_d:null"] # not valid
The problem is that the Solr Standard Query Parser does not support searching a field for an empty/null value. In this situation the filter has to be negated (excluding documents that have any value in the field) so that the query remains valid.
The - operator can be used to exclude the field, and the wildcard character * can be used to match any value, so now we might expect the filter query to look like:
fq=>["-added_at_d:*"]
However, although the above is valid for the query parser, a range query should be preferred to prevent inconsistent behavior when using wildcards within negative subqueries.
Range Queries allow one to match documents whose field(s) values are
between the lower and upper bound specified by the Range Query. Range
Queries can be inclusive or exclusive of the upper and lower bounds.
A * may be used for either or both endpoints to specify an open-ended range query.
In the end, there is nothing wrong with this filter, which ends up looking like:
fq=>["-added_at_d:[* TO *]"]
cf. Lucene Range Queries, Solr Standard Query Parser
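The rule described above can be written down as a pair of tiny string builders. This is just an illustrative Python sketch, not Sunspot's own code; the function names are invented, and the field name added_at_d is taken from the search object shown in the question.

```python
def any_value_filter(field):
    """Filter query matching documents that have some value in `field`."""
    return f"{field}:[* TO *]"


def missing_value_filter(field):
    """Filter query matching documents with no value in `field`.

    Solr cannot search for null directly, so the open-ended range
    query is negated with a leading '-'.
    """
    return "-" + any_value_filter(field)


fq = [missing_value_filter("added_at_d")]
print(fq)
```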

How to use parameter in the Cypher statement for variable length pattern matching?

In our project, People nodes are connected by KNOWS relationships. We need to query a person's friends up to depth n, where n is a parameter supplied by the user. We use Spring Data Neo4j to implement it.
public interface PeopleRepository extends GraphRepository<People>
{
@Query("MATCH (startnode{name:{name}})-[:KNOWS*1..{depth}]-(remote_friend) RETURN remote_friend.name")
List<People> getFriendsInDepth(@Param("name") String name, @Param("depth") Integer depth);
}
The above code won't work. But if I replace the {depth} parameter with a fixed integer value as follows:
@Query("MATCH (startnode{name:{name}})-[:KNOWS*1..2]-(remote_friend) RETURN remote_friend.name")
List<People> getFriendsInDepth(@Param("name") String name, @Param("depth") Integer depth);
it works. I know the problem is caused by the depth parameter. I have tried many ways to replace {depth}, for example toInt({depth}), but it still doesn't work. Does anyone know how to use a parameter in a Cypher statement for variable-length pattern matching?
Cypher does not allow you to parameterize the depth of a relationship, hence @Query won't support it either.
If you use Spring Data Neo4j 4, then perhaps you can translate your @Query to a set of org.neo4j.ogm.cypher.Filter.
Then you can use the Session.loadAll methods which accept Filters as well as a depth.
MusicIntegrationTest contains a couple of Filter examples.
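Another common workaround, not specific to Spring Data Neo4j, is to build the query text with the depth inlined while still passing the name as a real parameter. The Python sketch below only illustrates the idea; friends_in_depth_query and the max_depth cap are assumptions, and the strict integer check is there because the value ends up inside the query text.

```python
def friends_in_depth_query(depth, max_depth=10):
    """Return the Cypher text with a validated depth inlined.

    Hypothetical helper: {name} stays a query parameter, only the
    upper bound of the variable-length pattern is substituted.
    """
    if isinstance(depth, bool) or not isinstance(depth, int):
        raise TypeError("depth must be an integer")
    if not 1 <= depth <= max_depth:
        raise ValueError("depth must be between 1 and %d" % max_depth)
    return (
        "MATCH (startnode{name:{name}})-[:KNOWS*1.." + str(depth) + "]-(remote_friend) "
        "RETURN remote_friend.name"
    )


print(friends_in_depth_query(2))
```

Since each distinct depth produces a different query string, keeping the allowed range small limits how many distinct queries the server has to plan.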

store temp variables in neo4j

I have some cypher queries that I execute against my neo4j database. The query is in this form
MATCH p=(j:JOB)-[r:HAS|STARTS]->(s:URL)-[r1:VISITED]->(t:URL)
WHERE j.job_id =5000 and r1.origin='iframe' and r1.job_id=5000 AND NOT (t.netloc =~ 'VERY_LONG_LIST')
RETURN count(r1) AS number_iframes;
If it's unclear what I am doing, here is a much simpler query:
MATCH (s:WORD)
WHERE NOT (s.text=~"badword1|badword2|badword3")
RETURN s
I am basically trying to match some words against a specific list.
The problem is that this list is very large. As you can see, the query above is for job_id=5000, and I have more than 20000 jobs, so if my whitelist is 1 MB I will end up with very large queries. I tried 500 jobs and ended up with a 200 MB query file.
I was trying to execute these queries using transactions from py2neo, but this won't be feasible because the POST request will be very large and will time out. As a result, I thought of using
neo4j-shell -file <queries_file>
However, as you can see, the file is very large because of the large whitelist. So my question is: is there any way to store this "whitelist" in a variable in Neo4j using Cypher?
I wish there were something similar to this:
SAVE $whitelist="word1,word2,word3,word4,word5...."
MATCH p=(j:JOB)-[r:HAS|STARTS]->(s:URL)-[r1:VISITED]->(t:URL)
WHERE j.job_id =5000 and r1.origin='iframe' and r1.job_id=5000 AND NOT (t.netloc =~ $whitelist)
RETURN count(r1) AS number_iframes;
What datatype is your netloc?
If you have an index on netloc you can also use t.netloc IN {list} where {list} is a parameter provided from the outside.
Such large regular expressions will not be fast.
What exactly do your regexp and netloc format look like? Perhaps you can change that into a split + index-list lookup?
In general, you can also provide a regexp as an outside parameter.
You can also use "IN" + index for job_ids.
You can also run a separate job that tags the jobs within your whitelist with a label and use that label for additional filtering e.g. in the match already.
Why do you have to check this twice ? Isn't it enough that the job has id=5000?
j.job_id =5000 and r1.job_id=5000
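To make the parameter suggestion concrete, here is a rough Python sketch that builds one request body for the transactional HTTP endpoint, with the whitelist and the job ids sent as parameters. It rests on several assumptions: netloc is a plain string that can be compared with IN instead of a regexp, the query is regrouped per job_id, and the function name and sample values are invented. The body is only built here, not sent.

```python
import json

# The statement text stays small and constant; the large list travels
# once as a parameter instead of being pasted into every query.
STATEMENT = (
    "MATCH (j:JOB)-[r:HAS|STARTS]->(s:URL)-[r1:VISITED]->(t:URL) "
    "WHERE j.job_id IN {job_ids} AND r1.origin = 'iframe' "
    "AND r1.job_id = j.job_id AND NOT (t.netloc IN {whitelist}) "
    "RETURN j.job_id AS job_id, count(r1) AS number_iframes"
)


def iframe_count_payload(job_ids, whitelist):
    """Build the JSON-serializable body for a single parameterized statement."""
    return {
        "statements": [
            {
                "statement": STATEMENT,
                "parameters": {
                    "job_ids": list(job_ids),
                    "whitelist": list(whitelist),
                },
            }
        ]
    }


payload = iframe_count_payload([5000, 5001], ["example.com", "example.org"])
body = json.dumps(payload)
print(len(body))
```

With this shape the whitelist appears once per request rather than once per job, which is where the 200 MB query file came from.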

Neo4j: execute several statements

How is it possible to run a collection of queries like this (copied from a spreadsheet) directly in one Cypher query? Running them one by one works, but that takes 100 copy/pastes.
*******************************
MATCH (c:`alpha`)
where c.name = "a-01"
SET c.CP_PRI=1, c.TO_PRI=1, c.TA_PRI=2
return c ;
MATCH (c:`beta`)
where c.name = "a-02"
SET c.CP_PRI=1, c.TO_PRI=1, c.TA_PRI=0
return c ;
and 100 other lines ...
*********************************
You may try the UNION clause, which joins the results of queries into one big-honkin result set:
http://docs.neo4j.org/chunked/milestone/query-union.html
That said, what you are ultimately trying to do could use some details, since there may be a better way to write the query. You could use Excel to 'build' the unified query via calculations or macros, or you could possibly write a unified query that combines the rules you are trying to follow. There are a lot of options, but it's hard to know a starting direction without context.
As for the REST API, you can use the transactional endpoint in Neo4j 2.0, or the batch endpoint in Neo4j 1.x.
If you want to use the shell, have a look at the import page, in particular the neo4j-shell-tools, which import massive quantities of data by batching multiple queries.
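As a sketch of the transactional-endpoint route, the spreadsheet rows can be turned into one request body holding one parameterized statement per row. The Python below is illustrative only: the helper name and the rows list are made up, and because a label cannot be passed as a parameter it is checked against a simple pattern before being placed in the query text.

```python
import json
import re

# Labels are inlined into the query text, so only accept simple identifiers.
LABEL_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def update_statement(label, name, cp_pri, to_pri, ta_pri):
    """Build one parameterized SET statement for a spreadsheet row."""
    if not LABEL_RE.match(label):
        raise ValueError("unexpected label: %r" % (label,))
    return {
        "statement": (
            "MATCH (c:`" + label + "`) WHERE c.name = {name} "
            "SET c.CP_PRI = {cp}, c.TO_PRI = {to}, c.TA_PRI = {ta} RETURN c"
        ),
        "parameters": {"name": name, "cp": cp_pri, "to": to_pri, "ta": ta_pri},
    }


# Rows as they might be copied out of the spreadsheet.
rows = [
    ("alpha", "a-01", 1, 1, 2),
    ("beta", "a-02", 1, 1, 0),
]
payload = {"statements": [update_statement(*row) for row in rows]}
print(json.dumps(payload, indent=2))
```

All statements in one such request run in the same transaction, so either every row is updated or none is.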

Prevent EF from escaping wildcard character

I have something like this:
var query = repo.GetQuery(); // IQueryable
query.Where(item => item.FieldName.Contains("xxx%yyy"));
It results in the following statement on SQL Server:
exec sp_executesql N'SELECT
// clipped
WHERE ([Extent1].[FieldName] LIKE @p__linq__0 ESCAPE N''~'')',
N'@p__linq__0 nvarchar(4000)',@p__linq__0=N'%xxx~%yyy%'
@p__linq__0=N'%xxx~%yyy%' causes SQL Server to look for xxx%yyy with % as a literal (since it is escaped), whereas I would like it to match strings like xxx123yyy, xxxABCyyy, xxxANYTHINGyyy, xxxyyy, etc. The addition of the % prefix and suffix is fine, but I could do that manually if needed.
In the above example I have simplified things and written only one where condition, but I have dynamic logic that builds the predicate from many such keywords, and I would like to allow wildcards to be embedded inside the keywords. Is there a way to tell EF not to escape the % in the search keyword?
It is not possible. Contains("xxx") means that in SQL you want LIKE '%xxx%'. Neither LINQ to Entities nor any of its mapped String methods offers full wildcard searching: any wildcard character is always escaped. If you want to use wildcard searching, you must use Entity SQL.
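To see why the escaped parameter only matches a literal %, here is a small Python model of the behavior. It is a simplification written for illustration, not EF's or SQL Server's implementation: the set of escaped characters is an assumption, and the matcher ignores LIKE's [...] character classes.

```python
import re

ESCAPE = "~"  # escape character seen in the generated SQL


def contains_pattern(keyword):
    """Mimic the parameter built for Contains(keyword): escape, then wrap in %."""
    escaped = "".join(
        ESCAPE + ch if ch in "%_[" + ESCAPE else ch for ch in keyword
    )
    return "%" + escaped + "%"


def like(value, pattern, escape=ESCAPE):
    """Very small LIKE matcher supporting %, _ and an escape character."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == escape and i + 1 < len(pattern):
            parts.append(re.escape(pattern[i + 1]))  # escaped char is literal
            i += 2
            continue
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
        i += 1
    return re.fullmatch("".join(parts), value, re.DOTALL) is not None


escaped = contains_pattern("xxx%yyy")
print(escaped)
print(like("xxx123yyy", escaped), like("xxx123yyy", "%xxx%yyy%"))
```

In this model the escaped pattern matches only values containing the literal text xxx%yyy, while the unescaped pattern %xxx%yyy% also matches xxx123yyy, which is the behavior the question is after.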
