I have an image of a block diagram, and I want to extract all the triples from that image in the form HEAD RELATION OBJECT. How can I do that?
For example, from the given block diagram I want to extract triples like JOHN LIKES MANGO and ALLEN DISLIKES MANGO.
[image: block diagram]
I am completely new to Neo4j and am using it for the first time for my master's program. I've read the documentation and watched tutorials online, but I can't seem to figure out how to represent my nodes the way I want.
I have a dataframe with 3 columns: the first represents a page name, the second also represents a page name, and the third represents a similarity score between those two pages. How can I create a graph in Neo4j where the nodes are my unique page names and a relationship between two nodes is drawn only if there is a nonzero similarity score between them (so if the sim-score is 0, no relationship is drawn)? I want to show the similarity score as the text of the relationship.
Furthermore, I want to know if there is an easy way to figure out which node has the most relationships to other nodes.
I've added a screenshot of the header of my DF for clarity: https://imgur.com/a/pg0knh6. I hope someone can help me, thanks in advance!
Edit: What I have tried
LOAD CSV WITH HEADERS FROM 'file:///wiki-small.csv' AS line
MERGE (p:Page {name: line.First})
MERGE (p2:Page {name: line.Second})
MERGE (p)-[r:SIMILAR]->(p2)
ON CREATE SET r.similarity = toFloat(line.Sim)
Next block to remove the similarity relationships which are 0:
// note: the property is named similarity (as set above), not Sim
MATCH ()-[r:SIMILAR]->() WHERE r.similarity = 0
DELETE r
This works partially: it gives me the correct structure of the nodes, but it doesn't show the similarity scores as relationship labels. I also still need to figure out how to find the node with the most connections.
For the first question:
How can I create a graph in Neo4j where the nodes are my unique page names and a relationship between two nodes is drawn only if there is a nonzero similarity score between them (so if the sim-score is 0, no relationship is drawn)?
I think a better approach is to remove the rows with similarity = 0.0 in advance, before ingesting them into Neo4j. Would that be feasible? If your dataset is not too big, this is very fast to do in Python. Otherwise, the approach you describe of deleting the relationships after inserting the data is an option.
For a big dataset, it may be better to load the data using apoc.periodic.iterate or USING PERIODIC COMMIT.
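You can also skip the zero rows at load time with a WITH ... WHERE filter, so there is nothing to delete afterwards. A minimal sketch based on the query from the question (same file and header names assumed):
LOAD CSV WITH HEADERS FROM 'file:///wiki-small.csv' AS line
// keep only rows with a nonzero similarity score
WITH line WHERE toFloat(line.Sim) > 0
MERGE (p:Page {name: line.First})
MERGE (p2:Page {name: line.Second})
MERGE (p)-[r:SIMILAR]->(p2)
ON CREATE SET r.similarity = toFloat(line.Sim)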
Second question
I want to know if there is an easy way to figure out which node has the most relationships to other nodes?
This is an easy query. Again, you can do it with plain Cypher or using the APOC library:
// Plain Cypher
MATCH (n:Page)-[r:SIMILAR]->()
RETURN n.name, count(*) AS cnt
ORDER BY cnt DESC
// APOC
MATCH (n:Page)
RETURN n.name, apoc.node.degree(n, "SIMILAR>") AS output
ORDER BY output DESC;
EDIT
To display the similarity scores in Neo4j Desktop or in the other web interfaces: click on a SIMILAR arrow; the relationship types are shown at the top of the result frame; click on the SIMILAR marker; then at the bottom of the frame, to the right of Caption, select the property that you want to show (similarity in your case).
Then all the arrows are displayed with the similarity score.
On the visualization part of the question: I think you should keep a clear separation between the way you store data and the way you visualize it. Having the similarity score (a property of the SIMILAR edge) as a "label" is something that is best dealt with by using an adequate viz library or platform. Ours (Graphileon) could be such a platform, although there are also others.
We offer the possibility to "style" the edges with so-called selectors like
"label":"(%).properties.simScore", which would use the simScore as a label. On top of that you could do things like
"width":"evaluate((%).properties.simScore < 0.500 ? 3 : 10)"
or
"fillColor":"evaluate((%).properties.simScore < 0.500 ? grey : red)"
to visually distinguish high simScores.
Full disclosure: I work for Graphileon.
I am looking into the docs here but could not decipher much from them. Could someone please define, in simple terms, what a shape is and then what a pattern is?
Patterns are used to describe the shape of the data you’re looking for.
A shape is a visual representation of the pattern (graph).
Nodes are represented using circles and relationships are represented using arrows between them.
In the following query
MATCH (user)
RETURN user
LIMIT 1
The pattern is (user)
Shape for the same is: [diagram: a single circle labeled user]
And for the following query:
MATCH (me)-[:KNOWS]->(friend)
WHERE me.name = 'Filipa'
RETURN friend.name
The pattern is (me)-[:KNOWS]->(friend)
Shape for the same is: [diagram: two circles labeled me and friend, connected by a KNOWS arrow]
Imagine you want to draw a data model on a whiteboard. You'd probably use shapes like circles to represent nodes, and lines or arrows to represent relationships.
The Cypher language was designed to use patterns that look a bit like the shapes you'd draw on the board.
For example, instead of a circle shape for a node, the equivalent Cypher pattern would be something like this (if we wanted to refer to the node by the variable "a"):
(a)
And, instead of a line or arrow for a relationship between 2 nodes, in Cypher you could use one of these patterns:
(a)--(b)
(a)-->(b)
Patterns can be a lot more complex, but this is the basic idea.
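For example, a pattern can chain several relationships in a row. Here is a small illustrative sketch reusing the names from the earlier query (the KNOWS type and the name value are just examples):
// friends-of-friends that 'Filipa' does not already know
MATCH (me)-[:KNOWS]->(friend)-[:KNOWS]->(foaf)
WHERE me.name = 'Filipa' AND NOT (me)-[:KNOWS]->(foaf)
RETURN DISTINCT foaf.name
On a whiteboard this would be three circles connected by two arrows, which is exactly what the ASCII-art pattern mimics.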
I'm implementing abstractive summarization based on this paper, and I'm having trouble deciding on the best way to implement the graph so that it can be used for multi-domain analysis. Let's start with Twitter as an example domain.
For every tweet, each sentence would be graphed like this (ex: "#stackoverflow is a great place for getting help #graphsftw"):
(#stackoverflow)-[next]->(is)
-[next]->(a)
-[next]->(great)
-[next]->(place)
-[next]->(for)
-[next]->(getting)
-[next]->(help)
-[next]->(#graphsftw)
This would yield a graph similar to the one outlined in the paper.
To have a kind of domain layer for each word, I'm adding them to the graph like this (with properties including things like part of speech):
MERGE (w:Word:TwitterWord {orth: "word" }) ON CREATE SET ... ON MATCH SET ...
In the paper, they set a property on each word {SID:PID}, which describes the sentence id of the word (SID) and also the position of each word in the sentence (PID); so in the example sentence "#stackoverflow" would have a property of {1:1}, "is" would be {1:2}, "#graphsftw" {1:9}, etc. Each subsequent reference to the word in another sentence would add an element to the {SID:PID} property array: [{1:x}, {n:n}].
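As a rough sketch of that scheme (the property name positions is hypothetical, and since Neo4j properties cannot hold maps, the {SID:PID} pairs are encoded here as "SID:PID" strings in an array):
MERGE (w:Word:TwitterWord {orth: "stackoverflow"})
// first occurrence: sentence 1, position 1; later occurrences append to the array
ON CREATE SET w.positions = ["1:1"]
ON MATCH SET w.positions = w.positions + "1:1"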
Keeping the sentence and positional information as an array of elements inside a property of each node doesn't seem efficient, especially when dealing with multiple word-domains and sub-domains within each word layer.
For each word layer or domain like Twitter, what I want to do is get an idea of what's happening around specific domain/layer entities like mentions and hashtags; in this example, #stackoverflow and #graphsftw.
What is the best way to add subdomain layers on top of, for example, a 'Twitter' layer, such that different words are directed towards specific domain-entities like #hashtags and #mentions? I could use a separate label for each subdomain, like :Word:TwitterWord:Stackoverflow, but that would give my graph a ton of separate labels.
If I include the subdomain entities in a node property array, then it seems like traversal would become an issue.
Since all tweets and extracted entities like #mentions and #hashtags are being graphed as nodes/vertices prior to the word-graph step, I could have edges going from #hashtags and #mentions to words. Or, I could have edges going from tweets to words with the entities as an edge property. Basically, I'm looking for a structure that is the "cheapest" in terms of both storage and traversal.
Any input on how generally to structure this graph would be greatly appreciated. Thanks!
You could also put the domains / positions on the relationships (and perhaps also add a source-id).
OTOH you can also infer that information as long as your relationships represent the original sentence.
You could then either aggregate the relationships dynamically to compute the strengths or have a separate "composite" relationship that aggregates all the others into a counter or sum.
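A minimal sketch of that relationship-based layout (NEXT, sid, pid and source are hypothetical names, not taken from the paper):
// one NEXT relationship per adjacent word pair, carrying sentence, position and source ids
MERGE (w1:Word:TwitterWord {orth: "stackoverflow"})
MERGE (w2:Word:TwitterWord {orth: "is"})
CREATE (w1)-[:NEXT {sid: 1, pid: 1, source: "tweet:123"}]->(w2)
// aggregate dynamically: the strength of an edge is the number of parallel NEXT relationships
MATCH (a:Word)-[r:NEXT]->(b:Word)
RETURN a.orth, b.orth, count(r) AS strength
ORDER BY strength DESC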
I'm a Cypher newbie so I might be missing something obvious but after reading all the basic recommendation engine posts/tutorials I could find, I can't seem to be able to solve this so all help is appreciated.
I'm trying to make a recommendation function that recommends Places to a User based on Tags from previous Places she enjoyed. The User has a LIKES relationship to a Tag, which carries a weight property. Places have a CONTAINS relationship with Tags, but CONTAINS doesn't have any weights associated with it. Also, the more Tags with LIKES weighted above a certain threshold (0.85) a Place has, the higher it should be ordered, so this would add a SUM aggregator.
(User)-[:LIKES]->(Tag)<-[:CONTAINS]-(Place)
My problem is that I can't wrap my head around how to order Places based on the number of Tags pointing to them that have a LIKES relationship with the User, and then how to use the LIKES weights to order the Places.
Based on the following example Neo4j console: http://console.neo4j.org/r/klmu5l
The following query should do the trick:
MATCH (n:User {login:'freeman.williamson'})-[r:LIKES]->(tag)
MATCH (place:Place)-[:CONTAINS]->(tag)
WITH place, sum(r.weight) as weight, collect(tag.name) as tags
RETURN place, size(tags) as rate, weight
ORDER BY rate DESC, weight DESC
Which returns:
(42:Place {name:"Alveraville"}) 6 491767416
(38:Place {name:"Raynorshire"}) 5 491766715
(45:Place {name:"North Kristoffer"}) 5 491766069
(36:Place {name:"Orrinmouth"}) 5 491736638
(44:Place {name:"New Corachester"}) 5 491736001
Explanation:
I match the user and the tags he likes
I match the places containing at least one tag he likes
Then I use WITH to pipe the place, the sum of the relationship weights, and a collection of the tag names
Then I return those, except that I count the number of tags in the collection with size
All ordered in descending order
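To also apply the 0.85 threshold mentioned in the question, you can filter the LIKES relationships before aggregating; a sketch on top of the same query:
MATCH (n:User {login:'freeman.williamson'})-[r:LIKES]->(tag)
WHERE r.weight > 0.85
MATCH (place:Place)-[:CONTAINS]->(tag)
WITH place, sum(r.weight) as weight, collect(tag.name) as tags
RETURN place, size(tags) as rate, weight
ORDER BY rate DESC, weight DESC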
I'm facing the following task in ArcGIS - I'm using ArcMap 10.2
I have a polygon shapefile with counties of (say) a state in the US. From this shapefile, I create a layer which marks all counties in which there is at least 1 city of more than 50,000 inhabitants (I think of this as the treatment condition). Then I'm creating buffers around the polygons in my layer of counties with those large cities, i.e. I'm drawing a buffer of say 100km around every county that has at least one city with more than 50,000 inhabitants.
So far so good!
The final step of this exercise should be to create, for every polygon, a count of the number of buffers that touch that polygon. For instance, the buffers around counties B, C and D all touch county A. However, county A doesn't have a city of more than 50,000 inhabitants. Hence, I want the count for county A to be 3 (it's touched by B, C and D). I created the union of all my buffers, but I simply can't find the right way to create this count for every polygon.
I've done an extensive Google search and I apologize if I overlooked the obvious solution.
Any help is appreciated!
Michael Kaiser
[Staff Research Assistant UCSD]
If I understand what you want correctly, then creating the union of buffers won't help you - as it leaves you with a single object and you need the count of all buffered objects intersecting against every object in the original table.
In SQL I would join the original (all counties) layer to your new (filtered, buffered) layer using the STIntersects() method. Something like the following:
DECLARE @original TABLE
(
    [Original_Id] INT NOT NULL,
    [Original_Geom] GEOGRAPHY NOT NULL
);
DECLARE @filtered TABLE
(
    [Buffered_Id] INT NOT NULL,
    [Buffered_Geom] GEOGRAPHY NOT NULL
);
-- We'll pretend the above tables are filled with data
SELECT
    ORIGINAL.[Original_Id],
    COUNT(FILTERED.[Buffered_Id]) AS [NumberOfIntersections]
FROM
    @original AS ORIGINAL
JOIN
    @filtered AS FILTERED ON ORIGINAL.[Original_Geom].STIntersects(FILTERED.[Buffered_Geom]) = 1
GROUP BY
    ORIGINAL.[Original_Id]
Explanation:
In this example, the @original table would contain all of the counties in your given state, as they were before you buffered them. [Original_Id] would contain something that you can use to relate back to your data, and [Original_Geom] would contain the county's boundary.
The @filtered table would contain a subset of @original - in your case, only those counties with at least 1 city of 50,000 inhabitants. The [Buffered_Id] would match records in [Original_Id] (as an example, Orange County may have Id 32) and [Buffered_Geom] would contain the county's boundary, buffered by (as in your example) 100km.
To use my example exactly, you would need to get the required data out of your tables and into mine, but you should be able to keep your own tables and adjust the query to reference them.
NOTE: If you do not wish "Orange County" to count "Orange County (Buffered)" in the above query, you will need to add a WHERE clause to filter them out (for example WHERE ORIGINAL.[Original_Id] <> FILTERED.[Buffered_Id], given that the ids match as described above).
I haven't the data to hand to test this, but it should be mostly there. Hope it helps.