How to create nodes and edges for a sorghum gene list using gene IDs from Phytozome - cytoscape

I am using sorghum gene IDs from Phytozome in my RNA-seq analysis. I want to build a gene network in Cytoscape for a specific gene list. Unfortunately, I could not use STRING or other tools because the gene IDs I took from Phytozome 12 are not recognized. So my question is: how can I overcome this problem and create nodes and edges for my gene list? Thanks in advance.

I'm not sure which ID you are using. Phytozome transcript names (e.g. Sobic.008G038200.1) don't work, but I was able to get the "Sb" symbols to work just fine in STRING. You could also try mapping to the gene symbol (e.g. ACR4 for Sb08g003290). Beyond that, you'll just have to do the mapping between the transcript names and a more common identifier (e.g. Ensembl or Entrez) yourself.
-- scooter
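The mapping step described above could be sketched in Python roughly as follows. This is only a sketch under stated assumptions: the ID table (id_map) and the interaction pairs are hypothetical placeholders, not real sorghum mappings or interactions, and you would have to supply both yourself. The only thing taken from the answer is that a transcript name looks like a locus name plus a trailing ".N" suffix. The function names are made up for illustration.

```python
import csv
import io
import re


def transcript_to_locus(name):
    """Drop a trailing '.<number>' transcript suffix, if there is one."""
    return re.sub(r"\.\d+$", "", name)


def build_edge_rows(pairs, id_map):
    """Translate (source, target) transcript pairs into mapped-ID edges.

    Pairs where either end has no entry in id_map are skipped.
    """
    rows = []
    for src, dst in pairs:
        a = id_map.get(transcript_to_locus(src))
        b = id_map.get(transcript_to_locus(dst))
        if a is not None and b is not None:
            rows.append((a, b))
    return rows


def edges_to_tsv(rows):
    """Render edges as a tab-separated table with a source/target header."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["source", "target"])
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical placeholder data: NOT real gene mappings or interactions.
id_map = {"Sobic.001G000001": "GENE_A", "Sobic.001G000002": "GENE_B"}
pairs = [
    ("Sobic.001G000001.1", "Sobic.001G000002.2"),
    ("Sobic.001G000001.1", "Sobic.009G999999.1"),  # unmapped end, skipped
]
edge_table = edges_to_tsv(build_edge_rows(pairs, id_map))
print(edge_table)
```

A tab-separated table with a source and a target column like this one can then be imported into Cytoscape as a network; each distinct ID becomes a node and each row an edge.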

Related

Neo4j - Best practices for storing 40 million text nodes

I've been using Neo4j for some weeks and I think it's awesome.
I'm building an NLP application, and basically, I'm using Neo4j for storing the dependency graph generated by a semantic parser, something like this:
https://explosion.ai/demos/displacy?text=Hi%20dear%2C%20what%20is%20your%20name%3F&model=en_core_web_sm&cpu=1&cph=0
In the nodes, I store the single words contained in the sentences, and I connect them through relations with a number of different types.
For my application, I have the requirement to find all the nodes that contain a given word, so basically I have to search through all the nodes, finding those that contain the input word. Of course, I've already created an index on the word text field.
I'm working on a very big dataset:
On my laptop, the following query takes about 20 ms:
MATCH (t:token) WHERE t.text="avoid" RETURN t.text
Here are the details of the graph.db:
47,108,544 nodes
45,442,034 relationships
13.39 GiB db size
Index created on token.text field
PROFILE MATCH (t:token) WHERE t.text="switch" RETURN t.text
------------------------
NodeIndexSeek
251,679 db hits
---------------
Projection
251,678 db hits
--------------
ProduceResults
251,678 db hits
---------------
I wonder if I'm doing something wrong in indexing such a large number of nodes. At the moment, I create a new node for each word I encounter in the text, even if its text is the same as that of other nodes.
Should I create a new node only when a new word is encountered, managing the sentence structures through relationships?
Could you please help me with a suggestion or best practice to adopt for this specific case?
Thank you very much
For this use case, each of your :token nodes should be unique (one node per distinct word). When you create them, use MERGE instead of CREATE for the node itself, so that if a node with that text already exists it is reused rather than duplicated.
It may also help to add a unique constraint on the text property once you've cleaned up your data.

Display two separated records according to multiple values in properties neo4j

I need to do something a bit unusual. I have a query that can't be changed. It's a MATCH query for getting a record:
MATCH (j:journal) WHERE j.id in [12] RETURN j.`id` AS ID, j.`language` AS LANGUAGE
And I have a node that contains an array as a property. For example, it can be created like this: create (j:journal {id:12, language:["English", "Polish"]})
So, is there any possibility to display this node like two records with the same id, but with different language fields? Like the following:
ID | LANGUAGE
12 | English
12 | Polish
The important thing is that the MATCH query can't be changed at all.
But the node can be changed.
I know that I could add an UNWIND on the language field in the source query, but there is a requirement not to.
I couldn't find anything like this in the documentation or on the internet. I'm not sure it's even possible (but the consumer wants it), and I don't have much experience with Neo4j.
I understand that it may sound weird, but I need to know whether it can be implemented this way.
Thanks in advance.
If you can change the DB, you can change it so that each journal node contains a single language (as a scalar value, not in a list). However, this change might break any other queries that you might have.
If this conversion is acceptable, here is a query that should: (a) convert existing journal nodes to have a scalar language value, and (b) create new journal nodes as necessary for the remaining language values. The nodes that are spawned from an original journal node will share the same properties (except for language).
MATCH (j:journal)
WITH j, j.language[1..] AS langs
SET j.language = j.language[0]
WITH j, langs
UNWIND langs AS lang
CREATE (k:journal)
SET k = j, k.language = lang
If a node's language property had N values, you will end up with N nodes, each with the same properties -- except for the language property, which will contain a different language value (as a string). For efficiency, the original node is reused.
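To make the expected result concrete, here is a small in-memory model of that conversion in Python. It is only an illustration on plain dicts, not Neo4j code: the function name split_by_language is made up, and unlike the query above (which reuses the original node for the first value) this model simply returns a fresh copy for every language value.

```python
import copy


def split_by_language(journals):
    """Return one record per language value, copying all other properties.

    Every output record is a fresh copy; a scalar language value is
    treated as a one-element list.
    """
    result = []
    for journal in journals:
        langs = journal["language"]
        if not isinstance(langs, list):
            langs = [langs]
        for lang in langs:
            record = copy.deepcopy(journal)
            record["language"] = lang
            result.append(record)
    return result


rows = split_by_language([{"id": 12, "language": ["English", "Polish"]}])
for row in rows:
    print(row["id"], "|", row["language"])
```

For the journal with id 12 this yields two records, one per language, which is the shape the unchanged MATCH query would then return.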

Using Cypher, multiple nodes have a relationship with a single node, how to query matches of those multiple nodes to find matching root node

OK, let me set this up: I have created a Neo4j database with every vehicle trim in North America as a Vehicle node (every vehicle node has a :Vehicle label). I have also created 22 other labeled node types, each describing a feature. For example, I have a ":MDL" feature node, a ":YR" feature node, a ":DRIVE" feature node, and a ":DIV" feature node. Each of the feature nodes has a property called "value".
So, if I want to find all 2016 Chevrolet models that have 4WD, my Cypher query would be as follows:
MATCH
(v:Vehicle)--(:DIV{value:"Chevrolet"}),
(v)--(:DRIVE{value:"4WD"}),
(v)--(:YR{value:"2016"}),
(v)--(model:MDL)
return distinct(model.value)
And, this successfully returns the 8 Chevy models that offer 4WD (as opposed to AWD) as follows:
"Silverado 3500HD"
"Colorado"
"Silverado 2500HD"
"Silverado 1500"
"Silverado 3500HD Chassis"
"Tahoe"
"Suburban"
"Suburban 3500HD"
My question: looking at the profile plan, I don't think this is the most efficient way, because Cypher basically matches each pattern independently and then merges the results. I am trying to get Cypher to do this all in one step. Does anyone have any recommendations on how to make this more efficient?
What about
MATCH (v:Vehicle)--(model:MDL)
WHERE (v)--(:DIV{value:"Chevrolet"})
AND (v)--(:DRIVE{value:"4WD"})
AND (v)--(:YR{value:"2016"})
RETURN DISTINCT (model.value)
I'm not sure that'll change the profile a whole lot, but it does seem to express better what you are trying to accomplish.
Hope this helps !
Regards,
Tom

Adding relationship to existing nodes with Cypher doesn't work

I am working on the Panama dataset using the Neo4j graph database 1.1.5 web version. I identified Ion Sturza, former Prime Minister of Moldova, in the database and want to make a map of his related network. I used the following code to query using Cypher (creating a variable 'IonSturza'):
MATCH (IonSturza {name: "Ion Sturza"}) RETURN IonSturza
I noticed that the entity 'CONSTANTIN LUTSENKO' is linked to entities like 'Quade..' and 'Kinbo...' separately from a version of the same name written in mixed case, as in this picture. I therefore want to add a relationship 'SAME_COMPANY_AS' between the all-caps version and the mixed-case version. I tried the following code based on this answer by #StefanArmbruster:
MATCH (a:Officer {name :"Constantin Lutsenko"}),(b:Officer{name :
"CONSTANTIN LUTSENKO"})
where (a:Officer{name :"Constantin Lutsenko"})-[:SHAREHOLDER_OF]->
(b:Entity{id:'284429'})
CREATE (a)-[:SAME_COMPANY_AS]->(b)
Instead of indexing, I used a WHERE clause to specify the mixed-case version, which is linked only to the entity bearing id '284429'.
My code however shows the cartesian product error message:
This query builds a cartesian product between disconnected patterns. If a part of a query contains multiple disconnected patterns, this will build a cartesian product between all those parts. This may produce a large amount of data and slow down query processing. While occasionally intended, it may often be possible to reformulate the query that avoids the use of this cross product, perhaps by adding a relationship between the different parts or by using OPTIONAL MATCH (identifier is: (b))
Also when I execute, there are no changes, no rows!! What am I missing here? Can someone please help me with inserting this relationship between the nodes. Thanks in advance!
The cartesian product warning will appear whenever you're matching on two or more disconnected patterns. In this case, however, it's fine, because you're looking up both of them by what is likely a unique name, so your result should be one node each.
If each separate part of that pattern returned multiple nodes, then you would have (rows of a) x (rows of b), a cartesian product between the two result sets.
So in this particular case, don't mind the warning.
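To make the row arithmetic concrete, here is a tiny Python illustration. It has nothing to do with the Neo4j API; the row values are invented placeholders and cartesian_rows is a made-up helper that just pairs every row of one result set with every row of the other, which is what two disconnected patterns amount to.

```python
from itertools import product


def cartesian_rows(rows_a, rows_b):
    """Pair every row of a with every row of b."""
    return list(product(rows_a, rows_b))


# Unique lookups: one row on each side gives a single combined row.
unique_case = cartesian_rows(["a1"], ["b1"])

# Non-unique lookups: 3 rows x 4 rows gives 12 combined rows.
large_case = cartesian_rows(["a1", "a2", "a3"], ["b1", "b2", "b3", "b4"])

print(len(unique_case), len(large_case))  # prints: 1 12
```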
As for why you're not seeing changes, note that you're reusing variables for different parts of the graph: you're using variable b for both the uppercase version of the officer, and for the :Entity in your WHERE. There is no node that matches to both.
Instead, use different variables for each, and include the :Entity in your match. Also, once you match to nodes and bind them to variables, you can reuse the variable names later in your query without having to repeat its labels or properties.
Try this:
MATCH (a:Officer {name :"Constantin Lutsenko"})-[:SHAREHOLDER_OF]->
(:Entity{id:'284429'}),(b:Officer{name : "CONSTANTIN LUTSENKO"})
CREATE (a)-[:SAME_COMPANY_AS]->(b)
Though I'm not quite sure of what you're trying to do...is an :Officer a company? That relationship type doesn't quite seem right.
I tried the answer by #InverseFalcon and thanks to it, by modifying the property identifier from 'id' to 'name' and using the property for both 'a' and 'b', 4 relationships were created by the following code:
MATCH (a:Officer {name :"Constantin Lutsenko"})-[:SHAREHOLDER_OF]->
(:Entity{name:'KINBOROUGH PORTFOLIO LTD.'}),(b:Officer{name : "CONSTANTIN LUTSENKO"})
-[:SHAREHOLDER_OF]->(:Entity{name:'Chandler Group Holdings Ltd'})
CREATE (a)-[:SAME_NAME_AS]->(b)
Thank you so much #InverseFalcon!

Assumptions regarding Node ID strings in Neo4j - cypher

In my recent question, Modeling conditional relationships in neo4j v.2 (cypher), the answer has led me to another question regarding my data model and the Cypher syntax to represent it. Let's say that in my model there is a node CLT1 that I'll call the Source node. CLT1 has relationships to 286 other Target nodes. This is a model of a target node:
CREATE
(Abnormally_high:Label1:Label2:Label3:Label4:Label5:Label6:Label7:Label8:Label9:Label10
{Prop1:'x',Prop2:'y',Prop3:'z'})
Key point: I am assuming that the string after the CREATE clause is the ID of this target node, and that this ID is significant because its content has domain-specific meaning and is queryable.
In this case it's the phrase "Abnormally_high".
I made this assumption based on the movie database example.
CREATE (Keanu:Person {name:'Keanu Reeves', born:1964})
CREATE (Carrie:Person {name:'Carrie-Anne Moss', born:1967})
The first strings after CREATE definitely have domain-specific meaning!
In my earlier post I discussed Problem 2. I find that Problem 2 arises because among the 286 target nodes there are many instances where at least one other Target node shares the identical ID. In this instance, the ID is "Abnormally_high". The other Target nodes may differ in the value of any of Label1 - Label10 or the associated properties.
Apparently, Cypher doesn't like that. In Problem 2, I was discussing ways to deal with the fact that Cypher doesn't like using the same node ID multiple times even though the labels or properties are different.
My problem is my assumptions about the Target node ID.
AM I RIGHT?
I am now thinking that I could instead use this....
CREATE (CLT1_target_1:Label1:Label2:Label3:Label4:Label5:Label6:Label7:Label8:Label9:Label10
{name:'Abnormally_high',Prop2:'y',Prop3:'z'})
If indeed the first string after the CREATE clause is an ID, then all I have to do is put a unique target node identifier.... like CLT1_target_1 and increment up to CLT1_target_286. If I do this, then I can have the name as a property and change whatever label or property I want.
Do I have this right?
You are wrong. In Cypher, a node name (like "Abnormally_high") is just a variable name that exists for the lifetime of the query (and sometimes not even that long). The variable name used in a Cypher query is never persisted in any way, and can be any valid identifier. If you want the phrase to be stored and queryable, put it in a property (such as name), as in your second CREATE.
Also, in neo4j, the term "ID" has a specific meaning. The neo4j DB will automatically assign a (currently) unique integer ID to each new node. You have no control over the ID value assigned to a node. And when a node is deleted, neo4j can reassign its ID to a new node.
You should read the neo4j manual (available at docs.neo4j.org), especially the section on Cypher, to get a better understanding.
