I'm attempting to parse medical research reports using Stanford NLP. I can get the GrammaticalRelation of every node except the first (root) node. How do I get this value?
I have written a Java program which parses reports by getting the dependency graph; it can get the child pairs of all the nodes except the root node.
public void DocAnnotationParse(String Input_text) {
    Annotation document = new Annotation(Input_text);
    Properties props = new Properties();
    //props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,parse");
    props.setProperty("annotators", "tokenize,ssplit,pos,parse");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    pipeline.annotate(document);
    int sentNum = 0;
    Map<String, Map<String, Map<String, IndexedWord>>> sentMap = new LinkedHashMap<>(); // a map containing a map for each sentence
    for (CoreMap sentence : document.get(CoreAnnotations.SentencesAnnotation.class)) {
        SemanticGraph dependencyParse = sentence.get(SemanticGraphCoreAnnotations.BasicDependenciesAnnotation.class);
        IndexedWord firstVertex = dependencyParse.getFirstRoot();
        Map<String, Map<String, IndexedWord>> outterMap = new LinkedHashMap<>();
        RecursiveChild(outterMap, dependencyParse, firstVertex, 0);
        sentMap.put(Integer.toString(++sentNum), outterMap);
        logger.debug("outtermap: " + outterMap);
    }
    logger.debug("all sentMaps: " + sentMap);
    PrettyPrintBySentence(sentMap);
}
public void RecursiveChild(Map<String, Map<String, IndexedWord>> outterMap,
                           SemanticGraph dependencyParse,
                           IndexedWord vertex, int hierLevel) {
    Map<String, IndexedWord> pairMap = new LinkedHashMap<>();
    pairMap.put("Root", vertex);
    List<IndexedWord> indxwdsL = dependencyParse.getChildList(vertex);
    List<Pair<GrammaticalRelation, IndexedWord>> childPairs = dependencyParse.childPairs(vertex);
    List<IndexedWord> nxtLevalAL = new ArrayList<>();
    if (!indxwdsL.isEmpty()) {
        ++hierLevel;
        for (Pair<GrammaticalRelation, IndexedWord> aPair : childPairs) { // at level hierLevel
            logger.debug(aPair);
            String grammRel = aPair.first.toString(); // grammatical relation
            IndexedWord indxwd = aPair.second;
            pairMap.put(grammRel, indxwd);
            List<Pair<GrammaticalRelation, IndexedWord>> childPairs2 = dependencyParse.childPairs(indxwd);
            if (!childPairs2.isEmpty()) {
                nxtLevalAL.add(indxwd);
            }
        }
    }
    String level = Integer.toString(hierLevel);
    outterMap.put(level, pairMap);
    // go down to each lower level
    for (IndexedWord nxtIwd : nxtLevalAL) {
        RecursiveChild(outterMap, dependencyParse, nxtIwd, hierLevel);
    }
}
The childPair for the root vertex does not contain a GrammaticalRelation, which is what I want. Looking at the dependency graph, there is no value, just the string root. How do I get the GrammaticalRelation for that node? For example, the simple sentence "I love French fries." gives the graph:
-> love/VBP (root)
  -> I/PRP (nsubj)
  -> fries/NNS (dobj)
    -> French/JJ (amod)
  -> ./. (punct)
Hi, I'm not a linguistics person, but my understanding is that there is simply a ROOT node outside of the SemanticGraph, and a root edge points from that ROOT node to one or more words in the sentence.
So in your example the ROOT node is attached to the word love with the root relation.
If you look at the code of SemanticGraph, it explicitly states:
 * The root is not at present represented as a vertex in the graph.
 * At present you need to get a root/roots from the separate roots variable and to know about it.
You can access the list of roots (I guess there can hypothetically be more than one?) with the getRoots() method. But I think all that means is that a root edge flows from the ROOT node into those words.
If you want an actual Java object to represent that relation rather than a String, there is edu.stanford.nlp.trees.GrammaticalRelation.ROOT, which represents this relationship between the "faked ROOT node" and the roots:
/**
 * The "root" grammatical relation between a faked "ROOT" node, and the root of the sentence.
 */
public static final GrammaticalRelation ROOT =
    new GrammaticalRelation(Language.Any, "root", "root", null);
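Putting that together, here is a minimal sketch of how you might label the root vertex in your own code. getRoots() and GrammaticalRelation.ROOT are real CoreNLP API; the class and method names below are just for illustration:
import edu.stanford.nlp.ling.IndexedWord;
import edu.stanford.nlp.semgraph.SemanticGraph;
import edu.stanford.nlp.trees.GrammaticalRelation;

public class RootRelationDemo {
    // Label each root vertex with the ROOT relation, since SemanticGraph
    // stores its roots outside the edge set and no edge carries the relation.
    public static void printRoots(SemanticGraph graph) {
        for (IndexedWord root : graph.getRoots()) {
            // GrammaticalRelation.ROOT.toString() yields "root", matching
            // the "(root)" label in the printed dependency tree.
            System.out.println(GrammaticalRelation.ROOT + " -> " + root.word());
        }
    }
}
In your RecursiveChild method you could use GrammaticalRelation.ROOT.toString() as the map key for the root vertex instead of the hard-coded "Root" string, so all levels of the map use relation names consistently.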
I am trying to create multiple nodes in Neo4j using Cypher by passing properties as parameters to an UNWIND clause, but I keep receiving the error Type mismatch: expected Collection<T> but was Map.
This happens even when using the following example from the Neo4j documentation (link):
UNWIND {
  props : [ {
    name : "Andres",
    position : "Developer"
  }, {
    name : "Michael",
    position : "Developer"
  } ]
} AS map
CREATE (n)
SET n = map
Can anyone point out what I am doing wrong here?
Note, the example above is not exactly as in the Neo4j documentation. Their example wraps the property names in double quotes, but this causes my instance of Neo4j to throw the error: Invalid input '"': expected whitespace...
UNWIND expects a collection, not a map as you're currently passing in. Try this instead (just remove the wrapping curly braces and the top-level props field):
UNWIND [ {
  name : "Andres",
  position : "Developer"
}, {
  name : "Michael",
  position : "Developer"
} ] AS map
CREATE (n)
SET n = map
Chris's answer is of course the correct one, but here's why your version doesn't work even though you think you're following the documentation: you're not copying the documentation exactly.
The documentation shows the use of a named parameter:
UNWIND { props } AS map
CREATE (n)
SET n = map
with props passed in the map of parameters, which would look like:
{
  "props" : [ {
    "name" : "Andres",
    "position" : "Developer"
  }, {
    "name" : "Michael",
    "position" : "Developer"
  } ]
}
if you displayed the map as JSON. It means the {props} placeholder will be replaced by the value of the props key, which is exactly what Chris did.
Here's what the Java code would look like:
GraphDatabaseService db = /* init */;

Map<String, Object> andres = new HashMap<>();
andres.put("name", "Andres");
andres.put("position", "Developer");
Map<String, Object> michael = new HashMap<>();
michael.put("name", "Michael");
michael.put("position", "Developer");
Map<String, Object> params = new HashMap<>();
params.put("props", Arrays.asList(andres, michael));

try (Transaction tx = db.beginTx()) {
    db.execute("UNWIND {props} AS map CREATE (n) SET n = map", params);
    tx.success();
}
I have a graph database with 150 million nodes and a few hundred million relationships.
There are two types of nodes in the network: account nodes and transaction nodes. Each account node has a public key, and each transaction node has a number (the amount of total bitcoin involved in that transaction).
There are also two types of relationships in the network. Each relationship connects an account node with a transaction node. One type of relationship is "send" and the other is "receive". Each relationship also has a number representing how much bitcoin it sends or receives.
This is an example:
(account: publickey = A)-[send: bitcoin=1.0]->(transaction :id = 1, Tbitcoin=1.0)-[receive: bitcoin=0.5]->(account: publickey = B)
(account: publickey = A)-[send: bitcoin=1.0]->(transaction :id = 1, Tbitcoin=1.0)-[receive: bitcoin=0.5]->(account: publickey = C)
As you can imagine, B or C can also send or receive bitcoins to or from other accounts which involves many different transactions.
What I want to do is find all paths of depth 4 between two accounts, e.g. A and C. I can do this with Cypher, although it is slow; it takes about 20 minutes. My Cypher is like this:
start src=node:keys(PublicKey="A"),dest=node:keys(PublicKey="C")
match p=src-->(t1)-->(r1)-->(t2)-->dest
return count(p);
However, when I try to do that using the Java API, I get an OutOfMemoryError. Here is my function:
public ArrayList<Path> getPathsWithConditionsBetweenNodes(String indexName, String sfieldName, String sValue1, String sValue2,
                                                          int depth, final double threshold, String relType) {
    ArrayList<Path> res = null;
    if (isIndexExistforNode(indexName)) {
        try (Transaction tx = graphDB.beginTx()) {
            IndexManager index = graphDB.index();
            Index<Node> accounts = index.forNodes(indexName);
            IndexHits<Node> hits = accounts.get(sfieldName, sValue1);
            Node src = null, dest = null;
            if (hits.iterator().hasNext())
                src = hits.iterator().next();
            hits = accounts.get(sfieldName, sValue2);
            if (hits.iterator().hasNext())
                dest = hits.iterator().next();
            if (src == null || dest == null) {
                System.out.println("Either src or dest node is not available.");
            }
            TraversalDescription td = graphDB.traversalDescription()
                    .depthFirst();
            if (relType.equalsIgnoreCase("send")) {
                td = td.relationships(Rels.Send, Direction.OUTGOING);
                td = td.relationships(Rels.Receive, Direction.OUTGOING);
            } else if (relType.equalsIgnoreCase("receive")) {
                td = td.relationships(Rels.Receive, Direction.INCOMING);
                td = td.relationships(Rels.Send, Direction.INCOMING);
            } else {
                System.out.println("Traversing without a type constraint because an unknown relationship type was provided to the function.");
            }
            td = td.evaluator(Evaluators.includingDepths(depth, depth))
                    .uniqueness(Uniqueness.RELATIONSHIP_PATH)
                    .evaluator(Evaluators.returnWhereEndNodeIs(dest));
            td = td.evaluator(new Evaluator() {
                @Override
                public Evaluation evaluate(Path path) {
                    if (path.length() == 0) {
                        return Evaluation.EXCLUDE_AND_CONTINUE;
                    } else {
                        Node node = path.endNode();
                        if (!node.hasProperty("TBitcoin"))
                            return Evaluation.INCLUDE_AND_CONTINUE;
                        double coin = (double) node.getProperty("TBitcoin");
                        if (threshold != Double.MIN_VALUE) {
                            if (coin <= threshold) {
                                return Evaluation.EXCLUDE_AND_PRUNE;
                            } else {
                                return Evaluation.INCLUDE_AND_CONTINUE;
                            }
                        } else {
                            return Evaluation.INCLUDE_AND_CONTINUE;
                        }
                    }
                }
            });
            res = new ArrayList<Path>();
            int i = 0;
            for (Path path : td.traverse(src)) {
                i++;
                //System.out.println(path);
                //res.add(path);
            }
            System.out.println();
            tx.success();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
    return res;
}
Can someone take a look at my function and give me some ideas on why it is so slow and causes an out-of-memory error? I set Xmx=15000m while running this program.
My $0.02 is that you shouldn't do this with Java; you should do it with Cypher. But your query needs some work. Here's your basic query:
start src=node:keys(PublicKey="A"),dest=node:keys(PublicKey="C")
match p=src-->(t1)-->(r1)-->(t2)-->dest
return count(p);
There are at least two problems with this:
The intermediate r1 could be the same as your original src or your original dest (which probably isn't what you want; you're looking for intermediaries)
You don't specify that the relationships around t1 and t2 are send or receive, which means you're forcing Cypher to match both kinds of edges, so Cypher has to look through a lot more data to give you your answer
Here's how to tighten your query so it should perform much better:
start src=node:keys(PublicKey="A"),dest=node:keys(PublicKey="C")
match p=src-[:send]->(t1:transaction)-[:receive]->(r1)-[:send]->(t2:transaction)-[:receive]->dest
where r1 <> src and
r1 <> dest
return count(p);
This should prune out a lot of possible edge and node traversals that you're currently doing, that you don't need to be doing.
If I have understood what you are trying to achieve, and because you have a direction on your relationships, I think you can get away with something quite simple:
MATCH (src:keys {publickey:'A'})-[r:SEND|RECEIVE*4]->(dest:keys {publickey:'C'})
RETURN COUNT(r)
Depending on your data set, @FrobberOfBits makes a good point regarding testing the equality of intermediaries, which you cannot do using this approach. However, with just the two transactions you are only testing for cases where a transaction source and destination are the same (r1 <> src and r1 <> dest), which may not even be valid in your model. If you were testing 3 or more transactions then things would get more interesting, as you might want to exclude paths like (A)-->(T1)-->(B)-->(T2)-->(A)-->(T3)-->(C)
Shameless theft:
MATCH path=(src:keys {publickey:'A'})-[r:SEND|RECEIVE*6]->(dest:keys {publickey:'C'})
WHERE ALL (n IN NODES(path)
           WHERE 1=LENGTH(FILTER(m IN NODES(path)
                                 WHERE m=n)))
RETURN COUNT(path)
Or a traversal (caveat: pseudo code, never used it):
PathExpander expander = PathExpanders.forTypesAndDirections(
        DynamicRelationshipType.withName("SEND"), Direction.OUTGOING,
        DynamicRelationshipType.withName("RECEIVE"), Direction.OUTGOING);
PathFinder<Path> finder = GraphAlgoFactory.allSimplePaths(expander, 6);
Iterable<Path> paths = finder.findAllPaths(src, dest);
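For what it's worth, here is a rough, untested sketch of how that traversal could be wired up for the question's depth-4 case, using the Neo4j 2.x embedded API and the legacy index lookup from the question; treat it as a starting point rather than a drop-in solution:
import org.neo4j.graphalgo.GraphAlgoFactory;
import org.neo4j.graphalgo.PathFinder;
import org.neo4j.graphdb.*;

public class SimplePathsDemo {
    // Untested sketch: all simple paths of length up to 4 between two accounts,
    // following SEND and RECEIVE relationships in the outgoing direction only.
    public static void printPaths(GraphDatabaseService graphDB) {
        try (Transaction tx = graphDB.beginTx()) {
            Node src = graphDB.index().forNodes("keys").get("PublicKey", "A").getSingle();
            Node dest = graphDB.index().forNodes("keys").get("PublicKey", "C").getSingle();
            PathExpander expander = PathExpanders.forTypesAndDirections(
                    DynamicRelationshipType.withName("SEND"), Direction.OUTGOING,
                    DynamicRelationshipType.withName("RECEIVE"), Direction.OUTGOING);
            // allSimplePaths already excludes paths that revisit a node,
            // which covers the r1 <> src / r1 <> dest concern above.
            PathFinder<Path> finder = GraphAlgoFactory.allSimplePaths(expander, 4);
            for (Path path : finder.findAllPaths(src, dest)) {
                System.out.println(path);
            }
            tx.success();
        }
    }
}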
I am starting to investigate the use of Neo4j using the Neo4jClient API.
I have created a basic database, and can query it using the web client. I am now trying to build a sample C# interface. I am having some problems with index lookups. My database consists of nodes with two properties: conceptID and fullySpecifiedName. Auto-indexing is enabled, and both node properties are listed in the node_keys_indexable property of neo4j.properties.
I keep getting IntelliSense errors in my C# when using the Node class. It appears to be defined as Node<T>, but I don't know what to supply as the value of the type. Consider this example from this forum...
var result = _graphClient
    .Cypher
    .Start(new
    {
        n = Node.ByIndexLookup("index_name", "key_name", "Key_value")
    })
    .Return((n) => new
    {
        N = n.Node<Item>()
    })
    .Results
    .Single();

var n = result.N;
Where does the "Item" in Node<Item> come from?
I have deduced that the index name I should use is node_auto_index, but I can't figure out a default node type.
Item is the type of the node you have stored in the DB. So if you're storing a class:
public class MyType
{
    public int conceptId { get; set; }
    public string fullySpecifiedName { get; set; }
}
You would be retrieving Node<MyType> back.
Simple flow:
//Store a 'MyType'
_graphClient.Create(new MyType { conceptId = 1, fullySpecifiedName = "Name" });

//Query MyType by index
var query =
    _graphClient.Cypher
        .Start(new { n = Node.ByIndexLookup("node_auto_index", "conceptId", 1) })
        .Return<Node<MyType>>("n");

Node<MyType> result = query.Results.Single();

//Get the MyType instance
MyType myType = result.Data;
You can bypass the result.Data step by doing .Return<MyType>("n") instead of Node<MyType> as you'll just get an instance of MyType in that case.
I defined this recursive domain class in Grails:
class Work {
    String code
    String title
    String description
    Work parentWork   // back-reference required by mappedBy below

    static hasMany = [subWorks: Work]
    static mappedBy = [subWorks: 'parentWork']

    Work getRootWork() {
        if (parentWork) return parentWork.getRootWork()
        else return this
    }

    boolean isLeafWork() {
        return subWorks.isEmpty()
    }

    boolean isRootWork() {
        return !parentWork
    }
}
I have a list of Works, but the hierarchical structure is not built yet. The list looks like:
def works = [new Work(code:'A', title:'TitleA'),
             new Work(code:'B', title:'TitleB'),
             new Work(code:'A.1', title:'Title A.1'),
             new Work(code:'B.1', title:'Title B.1'),
             new Work(code:'B.2', title:'Title B.2'),
             new Work(code:'B.3', title:'Title B.3'),
             new Work(code:'B.2.2', title:'Title B.2.2'),
             new Work(code:'B.2.3', title:'Title B.2.3'),
             new Work(code:'A.1.1', title:'Title A.1.1'),
             new Work(code:'A.1.2', title:'Title A.1.2')]
What I need is to build the hierarchical relationships among these works, based on the code property. E.g. A.1 is the first child work of A; B.2.2 is the first child of B.2, whose parent is work B. I know that Groovy supports recursive closures for building this kind of hierarchical structure. How do I achieve my goal using a Groovy recursive closure, such as the JN2515 Fibonacci number example in the official Groovy documentation?
Many thanks!
like this...?
def root = new Work(code:'*', title:'ROOT')

def build
build = { p, list ->
    // group by the code prefix one level deeper than the parent,
    // so that siblings land in separate groups
    int depth = (p == root) ? 1 : p.code.split('\\.').size() + 1
    list.groupBy {
        def segs = it.code.split('\\.')
        segs[0..<Math.min(depth, segs.size())].join('.')
    }.each { prefix, sublist ->
        def el = sublist[0]
        el.parentWork = p
        if (sublist.size() > 1) {
            build(el, sublist[1..-1])
        }
    }
}
build(root, works.sort { it.code.length() })
If I'm not mistaken, it may even work in this anonymous form:
def root = new Work(code:'*', title:'ROOT')

{ p, list ->
    int depth = (p == root) ? 1 : p.code.split('\\.').size() + 1
    list.groupBy {
        def segs = it.code.split('\\.')
        segs[0..<Math.min(depth, segs.size())].join('.')
    }.each { prefix, sublist ->
        def el = sublist[0]
        el.parentWork = p
        if (sublist.size() > 1) {
            call(el, sublist[1..-1])
        }
    }
}(root, works.sort { it.code.length() })
I am a bit rusty with Grails, but I seem to remember that it manages mapped collections intelligently, such that if you do work1.parentWork = work2, then work1 in work2.subWorks will hold true. If that's the case, all you need to do is set the parentWork for every work, and you don't need any complicated computation for this: the parent work of X.Y.Z is X.Y, and the parent work of X is none:
def works = [new Work(code:'A', title:'TitleA'),
             new Work(code:'B', title:'TitleB'),
             new Work(code:'A.1', title:'Title A.1'),
             new Work(code:'B.1', title:'Title B.1'),
             new Work(code:'A.1.1', title:'Title A.1.1')]

def worksByCode = works.collectEntries { [it.code, it] }

works.each {
    if (it.code.contains('.')) {
        def parentCode = it.code[0..it.code.lastIndexOf('.') - 1]
        it.parentWork = worksByCode[parentCode]
    }
}
Hi, I have data in a 3-level tree structure. Can I use a Solr JOIN to get the root node when the user searches a 3rd-level node?
For example:
PATIENT1
  -> FirstName1
  -> LastName1
  -> DOCUMENTS1_1
    -> document_type1_1
    -> document_description1_1
    -> document_value1_1
    -> CODE_ITEMS1_1_1
      -> Code_id1_1_1
      -> code1_1_1
    -> CODE_ITEMS1_1_2
      -> Code_id1_1_2
      -> code1_1_2
  -> DOCUMENTS1_2
    -> document_type1_2
    -> document_description1_2
    -> document_value1_2
    -> CODE_ITEMS1_2_1
      -> Code_id1_2_1
      -> code1_2_1
    -> CODE_ITEMS1_2_2
      -> Code_id1_2_2
      -> code1_2_2
PATIENT2
  -> FirstName2
  -> LastName2
  -> DOCUMENTS2_1
    -> document_type2_1
    -> document_description2_1
    -> document_value2_1
    -> CODE_ITEMS2_1_1
      -> Code_id2_1_1
      -> code2_1_1
    -> CODE_ITEMS2_1_2
      -> Code_id2_1_2
      -> code2_1_2
I want to search a CODE_ITEM and return all the patients that match the code item search criteria. How can this be done? Is it possible to implement a join twice, where the first join gives all the documents for the code_item search and the next join gives all the patients?
Something like this SQL query:
select * from patients where docID IN (select DOCID from DOCUMENTS where CODEID IN (select CODEID from CODE_ITEMS where CODE LIKE '%SEARCH_TEXT%'))
I really don't know how Solr joins work internally, but knowing that multiple RDB joins are extremely inefficient on large data sets, I'd probably end up writing my own org.apache.solr.handler.component.QueryComponent that would, after doing the normal search, fetch the root parent (of course, this approach requires that each child doc has a reference to its root patient).
If you choose to go this path I'll post some examples. I had a similar (more complex, ontology) problem in one of my previous Solr projects.
The simpler way to go (simpler when it comes to solving this problem, not the whole approach) is to completely flatten this part of your schema: store all information (documents and code items) in the parent patient document and just do a regular search. This is more in line with Solr; you have to look at a Solr schema in a different way, since it's nothing like your regular normalized RDB schema. Solr encourages data redundancy so that you can search blindingly fast without joins.
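For illustration, a rough SolrJ sketch of that flattened approach; the field names and core URL are made up, so adjust them to your schema:
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class FlattenedPatientIndexer {
    public static void main(String[] args) throws Exception {
        // one Solr document per patient; child values go into multi-valued fields
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/patients");
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "PATIENT1");
        doc.addField("firstName", "FirstName1");
        doc.addField("lastName", "LastName1");
        // multi-valued: one entry per document / code item of this patient
        doc.addField("document_type", "document_type1_1");
        doc.addField("code", "code1_1_1");
        doc.addField("code", "code1_1_2");
        solr.add(doc);
        solr.commit();
    }
}
A query for a code then returns the matching patient directly, with no joins involved.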
The third approach would be to do some join testing on representative data sets and see how search performance is affected.
In the end, it really depends on your whole setup and requirements (and test results, of course).
EDIT 1:
I did this a couple of years back, so you'll have to figure out whether things have changed in the meantime.
1. Create custom request handler
To do a completely clean job, I suggest you define your own request handler (in solrconfig.xml) by simply copying the whole section that starts with
<requestHandler name="/select" class="solr.SearchHandler">
...
...
</requestHandler>
and then changing the name to something meaningful to your users, e.g. /searchPatients.
Also, add this part inside:
<arr name="components">
  <str>patients</str>
  <str>facet</str>
  <str>mlt</str>
  <str>highlight</str>
  <str>stats</str>
  <str>debug</str>
</arr>
2. Create custom search component
Add this to your solrconfig:
<searchComponent name="patients" class="org.apache.solr.handler.component.PatientQueryComponent"/>
Create the PatientQueryComponent class:
The following source probably has errors, since I edited my original source in a text editor and posted it without testing, but the important thing is that you get the recipe, not a finished source, right? I threw out caching, lazy loading, and the prepare method, and left only the basic logic. You'll have to see how performance is affected and tweak the source if needed. My performance was fine, but I had a couple of million documents in total in my index.
public class PatientQueryComponent extends SearchComponent {
    ...

    @Override
    public void process(ResponseBuilder rb) throws IOException {
        SolrQueryRequest req = rb.req;
        SolrQueryResponse rsp = rb.rsp;
        SolrParams params = req.getParams();
        if (!params.getBool(COMPONENT_NAME, true)) {
            return;
        }

        searcher = req.getSearcher();

        // -1 as flag if not set
        long timeAllowed = (long) params.getInt(CommonParams.TIME_ALLOWED, -1);

        DocList initialSearchList = null;
        SolrIndexSearcher.QueryCommand cmd = rb.getQueryCommand();
        cmd.setTimeAllowed(timeAllowed);
        cmd.setSupersetMaxDoc(UNLIMITED_MAX_COUNT);

        // fire standard query
        SolrIndexSearcher.QueryResult result = new SolrIndexSearcher.QueryResult();
        searcher.search(result, cmd);
        initialSearchList = result.getDocList();

        // list that will hold patient IDs
        List<String> patientIds = new ArrayList<String>();
        DocIterator iterator = initialSearchList.iterator();
        int id;

        // loop through search results
        while (iterator.hasNext()) {
            // add your if logic (doc type, ...)
            id = iterator.nextDoc();
            Document doc = searcher.doc(id); // you can try lazy field loading and load only the patientID field value into the doc
            String patientId = doc.get("patientID"); // field that's in the child doc and points to its root parent - the patient
            patientIds.add(patientId);
        }

        // add all unique patient IDs to a TermsFilter
        TermsFilter termsFilter = new TermsFilter();
        Term term;
        for (String pid : patientIds) {
            term = new Term("patient_ID", pid); // field that's unique to the patient and holds the patientID
            termsFilter.addTerm(term);
        }

        // get all patients whose ID is in the TermsFilter
        DocList patientsList = null;
        patientsList = searcher.getDocList(new MatchAllDocsQuery(), searcher.convertFilter(termsFilter), null, 0, 1000);

        long totalSize = initialSearchList.size() + patientsList.size();
        logger.info("Total: " + totalSize);

        SolrDocumentList solrResultList = SolrPluginUtils.docListToSolrDocumentList(patientsList, searcher, null, null);
        SolrDocumentList solrInitialList = SolrPluginUtils.docListToSolrDocumentList(initialSearchList, searcher, null, null);

        // add patients to the end of the list
        for (SolrDocument parent : solrResultList) {
            solrInitialList.add(parent);
        }

        // replace initial results in the response
        SolrPluginUtils.addOrReplaceResults(rsp, solrInitialList);
        rsp.addToLog("hitsRef", patientsList.size());
        rb.setResult(result);
    }
}
Take a look at this post: http://blog.griddynamics.com/2013/12/grandchildren-and-siblings-with-block.html
Actually, you can do it in Solr 4.5.
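If you index code items as child documents of their patient, the block-join parent query parser described in that post maps matching children back to their parent patients. A rough SolrJ sketch; the doc_type and code field names here are assumptions for illustration:
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrDocument;

public class BlockJoinPatientSearch {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/patients");
        // {!parent} returns the parent (patient) of every child doc matching the inner query
        SolrQuery query = new SolrQuery("{!parent which=\"doc_type:patient\"}code:code1_1_1");
        for (SolrDocument patient : solr.query(query).getResults()) {
            System.out.println(patient.getFieldValue("id"));
        }
    }
}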