Neo4j Query to Table Problems

My problem is that I put data into Neo4j from what was essentially a large spreadsheet, and now I want to get that data back out in a similar tabular format.
Let's say the notional spreadsheet of data that went in looked something like the following.
| Artist | Album | Song | Live | Filename | Genre | Year | Source | Label |
|--------|-------|------|------|----------|-------|------|--------|-------|
| .... | ..... | .... | .... | ........ | ..... | .... | ...... | ..... |
The spreadsheet was a listing of files with some metadata about each file. For analytic purposes it made more sense to put the Albums, rather than the files, at the center of the graph, so that every record in the table above would map to a handful of nodes and relationships. The data model for this might look something like this:
(Song)-[_IS_ON_]->(Album)
(Artist)-[_SINGS_]->(Song)
(Album)-[_IS_IN_]->(Genre)
(Song)-[_IS_IN_]->(Genre)
(Album)-[_IS_]->(Live)
(Album)-[_FROM_]-(Year)
(Album)-[_IS_ON_]->(Source)
(Label)-[_PRODUCED_]->(Album)
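To make the mapping concrete, a single row from the table would be loaded roughly like this (a hypothetical sketch; every name and value below is made up for illustration, as is the choice to store Year, Live, and Source as nodes with these property keys):
// Hypothetical load of one spreadsheet row; all values invented
MERGE (artist:Artist {name: "AC/DC"})
MERGE (album:Album {name: "Hells Bells"})
MERGE (song:Song {name: "Hells Bells", filename: "hells_bells.mp3"})
MERGE (genre:Genre {name: "Rock"})
MERGE (year:Year {value: 1980})
MERGE (live:Live {value: false})
MERGE (source:Source {name: "CD"})
MERGE (label:Label {name: "Albert Productions"})
MERGE (artist)-[:_SINGS_]->(song)
MERGE (song)-[:_IS_ON_]->(album)
MERGE (album)-[:_IS_IN_]->(genre)
MERGE (song)-[:_IS_IN_]->(genre)
MERGE (album)-[:_IS_]->(live)
MERGE (album)-[:_FROM_]->(year)
MERGE (album)-[:_IS_ON_]->(source)
MERGE (label)-[:_PRODUCED_]->(album)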
I am able to query a single record from my spreadsheet above using a query similar to this.
MATCH (a:Album {name: "Hells Bells"})-[r]-(b)
OPTIONAL MATCH (s:Song)<-[:_SINGS_]-(aa:Artist)
RETURN *
I have two questions here.
How do I make the above query return a table that looks similar to the original flat table? If I do RETURN b.filename, b.genre ..., I get a table that has a lot of null values. It would seem I need to do a DISTINCT on each of the fields, but I am still really new to Neo4j and am not positive I understand how to do this.
It would be great if there were a way to get all the fields in all the nodes without having to type them out in the query like RETURN b.filename, b.genre .... I think I figured this out once, but I stupidly didn't save it.
I hope this was clear enough. I can't share my graph model or data so I had to make this up on the fly.
TIA

Try the following (but, since you did not state how to get the filename, that value might be missing):
MATCH
(artist:Artist)-[:_SINGS_]->(song:Song)-[:_IS_ON_]->(album:Album {name: "Hells Bells"})-[:_FROM_]-(year:Year),
(album)-[:_IS_IN_]->(genre:Genre),
(album)-[:_IS_]->(live:Live),
(album)-[:_IS_ON_]->(source:Source),
(label:Label)-[:_PRODUCED_]->(album)
RETURN *
In a RETURN clause, if you specify a node or relationship (without a property name), it is returned as a map of all its properties. The above query, for example, would return such a map for each matched node.
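That also addresses your second question for a single node: the built-in properties() function (available in reasonably recent Neo4j versions) gives you every property without typing out the names. A minimal sketch:
// All of the album's properties as one map, no field list needed
MATCH (album:Album {name: "Hells Bells"})
RETURN properties(album) AS albumProps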
If you actually want to have a single merged map, you can use the APOC function apoc.map.mergeList. For example:
MATCH
(artist:Artist)-[:_SINGS_]->(song:Song)-[:_IS_ON_]->(album:Album {name: "Hells Bells"})-[:_FROM_]-(year:Year),
(album)-[:_IS_IN_]->(genre:Genre),
(album)-[:_IS_]->(live:Live),
(album)-[:_IS_ON_]->(source:Source),
(label:Label)-[:_PRODUCED_]->(album)
RETURN apoc.map.mergeList([artist,song,year,genre,live,source,label,album]) AS result
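One caveat: apoc.map.mergeList merges left to right, so if several of these nodes share a property key (they probably all have name), later maps in the list overwrite earlier ones. If that matters, keep the same MATCH as above and project each node into its own key instead; the property names used here (year.value, live.value) are guesses about the model:
// Distinct keys per node, so shared property names don't collide
RETURN {artist: artist.name, song: song.name, album: album.name,
        genre: genre.name, year: year.value, live: live.value,
        source: source.name, label: label.name} AS row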

Related

Splunk join with an in-memory record

Sorry for the lame question, I am new to Splunk.
What I am trying to do is join my search results with a fake record declared in the search body, something like
index=...
| join type=outer <column>
[ | <here declare a record to join with>
......
The idea is to make sure there is at least one record in the resulting search. The following cases are expected:
the original search returns records
the original search does not return anything because the result is filtered
the original search does not return anything because the source is empty
I need to distinguish cases 2 and 3, which is what the join is for. The fake record eliminates case 3, so I will only need to filter the result.
There's a better way to handle the case of no results returned. Use the appendpipe command to test for that condition and add fields needed in later commands.
| appendpipe [ stats count
    | eval column="The source is empty"
    | where count=0
    | fields - count ]
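To spell out the mechanics: appendpipe runs its subpipeline over the current result set and appends the output. stats count over zero events still produces one row, with count=0, so the sentinel row survives the where filter only when the original search returned nothing (your case 3). A sketch in context, where the index and filter are placeholders rather than anything from the question:
index=my_index sourcetype=my_sourcetype "some filter"
| appendpipe [ stats count
    | eval column="The source is empty"
    | where count=0
    | fields - count ]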

How to count and compare the number of regex matches

I want to use Sumo Logic to count how often different APIs are called. I want to have a table with API call name and value. My current query is like this:
_sourceCategory="my_category"
| parse regex "GET.+443 (?<getUserByUserId>/user/v1/)\d+" nodrop
| parse regex "GET.+443 (?<getUserByUserNumber>/user/v1/userNumber)\d+"
| count by getUserByUserId, getUserByUserNumber
This gets correct values, but they go to different columns. When I have more variables, the table becomes very wide and hard to read.
I figured it out: I need to use the same group name for all regexes. Like this:
_sourceCategory="my_category"
| parse regex "GET.+443 (?<endpoint>/user/v1/)\d+" nodrop
| parse regex "GET.+443 (?<endpoint>/user/v1/userNumber)\d+"
| count by endpoint

How to get the first elements of COLLECT without limiting the global query?

In a Twitter-like app, I would like to get only the last 3 USERS which have PUBLISHed a tweet for particular HASHTAGs (A,B,C,D,E).
START me=node(X), hashtag=node(A,B,C,D,E)
MATCH hashtag-[:USED_IN]->tweet<-[p:PUBLISH]-user-[:FRIEND_OF]->me
WITH p.date? AS date, hashtag, user ORDER BY date DESC
WITH hashtag, COLLECT(user.name) AS users
RETURN hashtag._id, users;
This is the result I get with this query. It is good, but if the friend list is big, I could have a very large array in the second column.
+------------------------------------+
| hashtag   | users                  |
+------------------------------------+
| "paradis" | ["Alexandre","Paul"]   |
| "hello"   | ["Paul"]               |
| "public"  | ["Alexandre"]          |
+------------------------------------+
If I add a LIMIT clause, at the end of the query, the entire result set is limited.
Because a user can have a very large number of friends, I do not want to get back all those USERs, but only the last 2 or 3 who have published in those hashtags.
Is there any solution with filter/reduce to get what I expect?
Running neo4j 1.8.2
Accessing sub-collections will be worked on; meanwhile you can use this workaround: http://console.neo4j.org/r/f7lmtk
start n=node(*)
where has(n.name)
with collect(n.name) as names
return reduce(a=[], x in names : a + filter(y in [x] : length(a)<2)) as two_names
reduce is used to build up the result list in the accumulator, and filter is used instead of the conditional case ... when ..., which is only available in 2.0.
filter(y in [x] : length(a)<2) returns a list with the element when the condition is true and an empty list when the condition is false; adding that result to the accumulator with reduce builds up the list incrementally.
Be careful, the newer filter syntax is:
filter(x IN a.array WHERE length(x) = 3)
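For anyone on a newer Neo4j: since Cypher 2.x you can slice a collection directly, which makes the reduce/filter trick unnecessary. A hedged sketch against the question's model (the labels and the name and date properties are assumptions):
// Keep only the 3 most recent publishers per hashtag
MATCH (hashtag:Hashtag)-[:USED_IN]->(tweet)<-[p:PUBLISH]-(user)-[:FRIEND_OF]->(me)
WITH hashtag, user, p.date AS date
ORDER BY date DESC
WITH hashtag, collect(user.name)[..3] AS last_three_users
RETURN hashtag.name, last_three_users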

Comparing values in two columns of two different Splunk searches

I am new to Splunk and facing an issue comparing values in two columns from two different queries.
Query 1
index="abc_ndx" source="*/jkdhgsdjk.log" call_id="**" A_to="**" A_from="**" | transaction call_id keepevicted=true | search "xyz event:" | table _time, call_id, A_from, A_to | rename call_id as Call_id, A_from as From, A_to as To
Query 2
index="abc_ndx" source="*/ jkdhgsdjk.log" call_id="**" B_to="**" B_from="**" | transaction call_id keepevicted=true | search " xyz event:"| table _time, call_id, B_from, B_to | rename call_id as Call_id, B_from as From, B_to as To
These are my two different queries. I want to compare each value in the A_from column with each value in the B_from column and, if a value matches, display those values of A_from.
Is it possible?
I have run the two queries separately, exported the results of each to CSV, and used the VLOOKUP function. But the problem is that at most 10000 rows of data can be exported, so I miss lots of data, as my search has more than 10000 records.
Any help?
Haven't got any data to test this on at the moment; however, the following should point you in the right direction.
When you have the table for the first query sorted out, you should 'pipe' the search string to an appendcols command with your second search string. This command allows you to run a subsearch and "import" its columns into your base search.
Once you have the two columns in the same table, you can use the eval command to create a new field which compares the two values and assigns a value as you desire.
Hope this helps.
http://docs.splunk.com/Documentation/Splunk/5.0.2/SearchReference/Appendcols
http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Eval
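A hedged sketch of how those two pieces could fit together, built from the question's own searches (note that appendcols aligns rows by position, not by any key, so this assumes both searches return rows in the same order):
index="abc_ndx" source="*/jkdhgsdjk.log" call_id="**" A_to="**" A_from="**"
| transaction call_id keepevicted=true
| search "xyz event:"
| table _time, call_id, A_from, A_to
| appendcols [
    search index="abc_ndx" source="*/jkdhgsdjk.log" call_id="**" B_to="**" B_from="**"
    | transaction call_id keepevicted=true
    | search "xyz event:"
    | table B_from, B_to ]
| eval match=if(A_from == B_from, A_from, null())
| where isnotnull(match)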
I'm not sure why there is a need to keep this as two separate queries. Everything is coming from the same sourcetype, and is using almost identical data. So I would do something like the following:
index="abc_ndx" source="*/jkdhgsdjk.log" call_id="**" (A_to="**" A_from="**") OR (B_to="**" B_from="**")
| transaction call_id keepevicted=true
| search "xyz event:"
| eval to=if(A_from == B_from, A_from, "no_match")
| table _time, call_id, to
This grabs all events from your specified sourcetype and index which have a call_id and either A_to and A_from or B_to and B_from. Then it transactions all of that and lets you filter based on "xyz event:" (whatever that is).
Then it creates a new field called 'to', which shows A_from when A_from == B_from; otherwise it shows "no_match" (a placeholder, since you didn't specify what should be done when they don't match).
There is also a way to potentially tackle this without using transactions. Although without more details into the underlying data, I can't say for sure. The basic idea is that if you have a common field (call_id in this case) you can just use stats to collect values associated with that field instead of an expensive transaction command.
For example:
index="abc_ndx" index="abc_ndx" source="*/jkdhgsdjk.log" call_id="**"
| stats last(_time) as earliest_time first(A_to) as A_to first(A_from) as A_from first(B_to) as B_to first(B_from) as B_from by call_id
Using first() or last() doesn't actually matter if there is only one value per call_id. (You could even use min(), max(), or avg() and you'd get the same thing.) Perhaps this will help you get to the output you need more easily.
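If the stats route fits your data, the comparison step from the transaction-based answer slots straight onto the end. A hypothetical continuation:
index="abc_ndx" source="*/jkdhgsdjk.log" call_id="**"
| stats first(A_from) as A_from first(B_from) as B_from by call_id
| eval to=if(A_from == B_from, A_from, "no_match")
| table call_id, to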

Using SharePoint's Data Query Webpart to link two lists

I have two SharePoint lists: A and B. List A has a column where the user can add multiple references (displayed as hyperlinks) for each entry to entries in B:
A:                      B:
... | RefB  | ...       Name | OtherColumns...
-----------------       ----------------------
... | B1    | ...       B1   |
... | B2,B3 | ...       B2   |
... | B1,B3 | ...       B3   |
Now I want to display all entries from list B that are referenced by a (specific) entry in A. I.e., I set the filter to [Entry 2] and the web part displays all the stuff from entries B2 and B3. Is this even possible?
I think the problem you've got, which is ruining some of the ways I'm thinking of solving it, is that the RefB column is multi-valued. You may have some joy doing filtering with the DataView, but it might get messy fast as you try to split RefB on the comma and compare against the resulting array of values.
I think the problem could be made easier by having only a single value in the RefB column.
Three solutions come to mind:
1. Have only one value in RefB per item in table A and repeat the other fields in table A. You'd have to accept some data redundancy and would need to be careful with data entry.
2. The normal relational-database way of solving your data redundancy problem would be to have a 3rd table joining table A to table B (a sketch follows this list). If you're not familiar with relational database techniques, there are lots of straightforward tutorials on data normalisation on the net. While it's some more work, it may lead to a cleaner solution. Be careful when trying to fake a relational database within SharePoint, though - it's not meant for relational data. You may be better off using a SQL database.
3. Put everything in one table, though I think you've already ruled this one out.
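For solution 2, the joining table might look something like this (a hypothetical sketch; the list name and columns are made up, and the rows mirror the sample data above):
AtoB:
AItem   | BItem
---------------
Entry 1 | B1
Entry 2 | B2
Entry 2 | B3
Entry 3 | B1
Entry 3 | B3
Filtering this list on AItem = [Entry 2] and joining the result to list B on BItem = Name would then yield exactly B2 and B3.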
