Splunk join with an in-memory record

Sorry for the lame question, I am new to Splunk.
What I am trying to do is to join my search result with a fake record declared in the search body, something like
index=...
| join type=outer <column>
[ | <here declare a record to join with>
......
The idea is to make sure there is at least one record in the resulting search. The following cases are expected:
1. the original search returns records
2. the original search does not return anything because the result is filtered
3. the original search does not return anything because the source is empty
I need to distinguish cases 2 and 3, which is what the join is for. The fake record will eliminate case 3, so I will only need to filter the result.

There's a better way to handle the case of no results returned. Use the appendpipe command to test for that condition and add fields needed in later commands.
| appendpipe [ stats count
| eval column="The source is empty"
| where count=0
| fields - count ]
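For context, here is a minimal end-to-end sketch of that approach; the index, sourcetype, and field names are hypothetical:
index=my_index sourcetype=my_sourcetype
| table _time, column
| appendpipe [ stats count
| eval column="The source is empty"
| where count=0
| fields - count ]
When the base search returns events, the subsearch's count is non-zero and the where clause discards the extra row; when the base search is empty, a single row with column="The source is empty" survives, which makes case 3 visible in the output.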

Obtain a count of hits in a query of regexes

I am searching for a list of regexes in a splunk alert like this:
... | regex "regex1|regex2|...|regexn"
Can I modify this query to get a table of the regexes found, along with their counts? The table shouldn't show rows with 0 counts.
regex2 17
regexn 3
The regex command merely filters events. All we know is each result passed the regular expression. There is no record or indication of why or how any event passed.
To do that, you'd have to extract a unique field or value from each regex and then test the resulting events to see which field or value was present. The regex command, however, does not extract anything. You'd need the rex command or the match function to do that.
Looks like the | regex line is not needed. This is working for me. Notice the extra parentheses.
| rex max_match=0 "(?P<countfields>((regex1)|(regex2)|..|(regexn)))"
| stats count by countfields
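If you would rather tag each event by which pattern it matched, the match function mentioned above is an alternative to rex; a sketch, with regex1/regex2/regexn standing in for your real patterns:
| eval countfields=case(match(_raw, "regex1"), "regex1", match(_raw, "regex2"), "regex2", match(_raw, "regexn"), "regexn")
| stats count by countfields
Since case() returns null when nothing matches and stats count by drops null values, zero-count rows never appear, which matches the requirement.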

Neo4j Query to Table Problems

My problem is that I put data into Neo4j from what was essentially a large spreadsheet. Now I want to be able to get that data back out in a similar tabular format.
Let's say I have some notional spreadsheet of data that went in looking something like the following.
| Artist | Album | Song | Live | Filename | Genre | Year | Source | Label |
|--------|-------|------|------|----------|-------|------|--------|-------|
| .... | ..... | .... | .... | ........ | ..... | .... | ...... | ..... |
The spreadsheet was a listing of files with some metadata about each file. For analytic purposes it made more sense not to have the file at the center of the graph, but rather the Albums, so that every record in the table above maps to a handful of nodes and relationships. The data model for this might look something like this:
(Song)-[_IS_ON_]->(Album)
(Artist)-[_SINGS_]->(Song)
(Album)-[IS_IN_]->(Genre)
(Song)-[_IS_IN_]->(Genre)
(Album)-[_IS_]->(Live)
(Album)-[_FROM_]-(Year)
(Album)-[_IS_ON_]->(Source)
(Label)-[_PRODUCED_]->(Album)
I am able to query a single record from my spreadsheet above using a query similar to this.
MATCH (a:Album {name: "Hells Bells"})-[r]-(b)
OPTIONAL MATCH (s:Song)<-[_SINGS_]-(aa:Artist)
RETURN *
I have two questions here.
1. How do I make the above query return a table that looks similar to the original normalized table? If I do RETURN b.filename, b.genre ..., I get a table that has a lot of null values. It would seem I need to do a DISTINCT on each of the fields, but I am still really new to Neo4j and am not positive I understand how to do this.
2. It would be great if there were a way to get all the fields in all the nodes without having to type them out in the query like RETURN b.filename, b.genre .... I think I figured this out once, but I stupidly didn't save it.
I hope this was clear enough. I can't share my graph model or data so I had to make this up on the fly.
TIA
Try the following (but, since you did not state how to get the filename, that value might be missing):
MATCH
(artist:Artist)-[:_SINGS_]->(song:Song)-[:_IS_ON_]->(album:Album {name: "Hells Bells"})-[:_FROM_]-(year:Year),
(album)-[:_IS_IN_]->(genre:Genre),
(album)-[:_IS_]->(live:Live),
(album)-[:_IS_ON_]->(source:Source),
(label:Label)-[:_PRODUCED_]->(album)
RETURN *
In a RETURN clause, if you specify a node/relationship (without a property name), it will generate a map of all its properties. The above query, for example, would return a map for each matched node.
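For instance, a bare node in the RETURN yields one property map per matched row (the property values below are made up):
MATCH (album:Album {name: "Hells Bells"})
RETURN album
// each row renders as a map such as {name: "Hells Bells", numTracks: 10}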
If you actually want to have a single merged map, you can use the APOC function apoc.map.mergeList. For example:
MATCH
(artist:Artist)-[:_SINGS_]->(song:Song)-[:_IS_ON_]->(album:Album {name: "Hells Bells"})-[:_FROM_]-(year:Year),
(album)-[:_IS_IN_]->(genre:Genre),
(album)-[:_IS_]->(live:Live),
(album)-[:_IS_ON_]->(source:Source),
(label:Label)-[:_PRODUCED_]->(album)
RETURN apoc.map.mergeList([artist,song,year,genre,live,source,label,album]) AS result
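Note that apoc.map.mergeList expects a list of maps, so if your APOC version does not coerce nodes automatically, wrapping each node in the built-in properties() function is a safe variant (same MATCH as above):
RETURN apoc.map.mergeList([properties(artist), properties(song), properties(year), properties(genre), properties(live), properties(source), properties(label), properties(album)]) AS result
Keep in mind that the merge resolves key collisions by letting later maps win, so list the node whose values should take precedence last.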

Auto-assigning objects to users based on priority in Postgres/Ruby on Rails

I'm building a rails app for managing a queue of work items. I have several types of users ("access levels") to whom I want to auto-assign these work items.
The end goal is an "Auto-assign" button on one of my views that will automatically grab the next work item based on a priority, which is defined by the user's access level.
I'm trying to set up a class method in my work_item model to automatically sort work items by type based on the user's access level. I am looking at something like this:
def self.auto_assign_next(access_level)
  case
  when access_level == 2
    where("completed = 'f'").order("requested_time ASC").limit(1)
  when access_level > 2
    where("completed = 'f'").order("CASE WHEN form='supervisor' THEN 1 WHEN form='installer' THEN 2 WHEN form='repair' THEN 3 WHEN form='mail' THEN 4 WHEN form='hp' THEN 5 ELSE 6 END").limit(1)
  end
end
This isn't very DRY, though. Ideally I'd like the sort order to be configurable by administrators, so maybe setting up a separate table on which the sort order is kept would be best. The problem with that idea is that I have no idea how to pass the priority order on that table to the PostgreSQL query. I'm new to SQL in general and somewhat lost with this one. Does anybody have any suggestions as to how this should be handled?
One fairly simple approach starts with turning your case statement into a new table, listing form values versus what precedence value they should be sorted by:
id | form       | precedence
----------------------------
 1 | supervisor | 1
 2 | installer  | 2
(etc)
Create a model for this, say, FormPrecedence (not a great name, but I don't totally grok your data model, so pick one that better describes it). Then your query can look like this (note: I'm assuming your current model is called WorkItem):
when access_level > 2
joins("LEFT JOIN form_precedences ON form_precedences.form = work_items.form")
.where("completed = 'f'")
.order("COALESCE(form_precedences.precedence, 6)")
.limit(1)
The way this works isn't as complicated as it looks. A "left join" in SQL simply takes all the rows of the table on the left (in this case, work_items) and, for each row, finds all the matching rows from the table on the right (form_precedences, where "matching" is defined by the bit after the "ON" keyword: form_precedences.form = work_items.form), and emits one combined row. If no match is found, a LEFT JOIN will still emit a row, but with all the right-hand values being NULL. A normal join would skip any rows with no right-hand match found.
Anyway, with the precedence data joined on to our work items, we can just sort by the precedence value. But, in case no match was found during the join above, that value will be NULL -- so, I use COALESCE (which returns the first of its arguments that's not NULL) to default to a precedence of 6.
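For reference, the chain above should generate SQL roughly along these lines (table and column names taken from the question; untested):
SELECT work_items.*
FROM work_items
LEFT JOIN form_precedences ON form_precedences.form = work_items.form
WHERE completed = 'f'
ORDER BY COALESCE(form_precedences.precedence, 6)
LIMIT 1;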
Hope that helps!

Return only results based on current object for dynamic menus

If I have an object that has_many, how would I go about getting back only the results that are related to the original result's ids?
Example:
tier_tbl
| id | name |
|----|------|
| 1  | low  |
| 2  | med  |
| 3  | high |

randomdata_tbl
| id | tier_id | name |
|----|---------|------|
| 1  | 1       | xxx  |
| 2  | 1       | yyy  |
| 3  | 2       | zzz  |
I would like to build a query that returns, in the case of the above example, only rows 1 and 2 from tier_tbl, because only tier_ids 1 and 2 exist in the data.
I'm new to ActiveRecord and, short of using a loop, don't know a good way of doing this. Does Rails allow for this kind of query building in an easier way?
The reasoning behind this is so that I can list only menu items that relate to the specific object I am dealing with. If the object I am dealing with has only the items contained in randomdata_tbl, there is no reason to display the 3rd tier name, so I'd like to omit it completely. I need to go this direction because of the way the models are set up. The example I'm dealing with is slightly more complicated.
Thanks
Let's call your first table tiers and your second table randoms.
If Tier has many randoms and you want to find all tiers whose id is present in the randoms table, you can do it this way:
# database query only
Tier.joins(:randoms).uniq
or
# with some ruby code
Tier.select{ |t| t.randoms.any? }
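A third option that stays entirely in the database is to filter with a subquery; a sketch, assuming the second model is called Randomdata and is backed by the randoms table (names are placeholders):
# generates: SELECT tiers.* FROM tiers WHERE tiers.id IN (SELECT tier_id FROM randoms)
Tier.where(id: Randomdata.select(:tier_id))
This avoids the duplicate rows the join produces, as well as loading every Tier into memory the way the select block does.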

Comparing values in two columns of two different Splunk searches

I am new to Splunk and facing an issue comparing values in two columns of two different queries.
Query 1
index="abc_ndx" source="*/jkdhgsdjk.log" call_id="**" A_to="**" A_from="**" | transaction call_id keepevicted=true | search "xyz event:" | table _time, call_id, A_from, A_to | rename call_id as Call_id, A_from as From, A_to as To
Query 2
index="abc_ndx" source="*/ jkdhgsdjk.log" call_id="**" B_to="**" B_from="**" | transaction call_id keepevicted=true | search " xyz event:"| table _time, call_id, B_from, B_to | rename call_id as Call_id, B_from as From, B_to as To
These are my two different queries. I want to compare each value in the A_from column with each value in the B_from column and, if the values match, display those values of A_from.
Is it possible?
I have run the two queries separately, exported the results of each into csv, and used the vlookup function. But the problem is that there is a limit of 10000 rows that can be exported, so I miss out on lots of data, as my search returns more than 10000 records.
Any help?
Haven't got any data to test this on at the moment, however, the following should point you in the right direction.
When you have the table for the first query sorted out, you should 'pipe' the search string to an appendcols command with your second search string. This command will allow you to run a subsearch and "import" its columns into your base search.
Once you have the two columns in the same table, you can use the eval command to create a new field which compares the two values and assigns a value as you desire.
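Putting that together, an untested sketch built from the two queries in the question:
index="abc_ndx" source="*/jkdhgsdjk.log" call_id="**" A_to="**" A_from="**"
| transaction call_id keepevicted=true
| search "xyz event:"
| table _time, call_id, A_from, A_to
| appendcols [ search index="abc_ndx" source="*/jkdhgsdjk.log" call_id="**" B_to="**" B_from="**"
| transaction call_id keepevicted=true
| search "xyz event:"
| table B_from, B_to ]
| eval matched=if(A_from == B_from, A_from, null())
| where isnotnull(matched)
| table _time, call_id, matched
One caveat: appendcols pastes columns together by row position, not by call_id, so this only lines up correctly when both searches return rows in the same order.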
Hope this helps.
http://docs.splunk.com/Documentation/Splunk/5.0.2/SearchReference/Appendcols
http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Eval
I'm not sure why there is a need to keep this as two separate queries. Everything is coming from the same sourcetype, and is using almost identical data. So I would do something like the following:
index="abc_ndx" source="*/jkdhgsdjk.log" call_id="**" (A_to="**" A_from="**") OR (B_to="**" B_from="**")
| transaction call_id keepevicted=true
| search "xyz event:"
| eval to=if(A_from == B_from, A_from, "no_match")
| table _time, call_id, to
This grabs all events from your specified sourcetype and index which have a call_id and either A_to and A_from or B_to and B_from. Then it transactions all of that and lets you filter based on the "xyz event:" string (whatever that is).
Then it creates a new field called 'to', which shows A_from when A_from == B_from; otherwise it shows "no_match" (a placeholder, since you didn't specify what should be done when they don't match).
There is also a way to potentially tackle this without using transactions, although without more details about the underlying data I can't say for sure. The basic idea is that if you have a common field (call_id in this case), you can just use stats to collect the values associated with that field instead of running an expensive transaction command.
For example:
index="abc_ndx" index="abc_ndx" source="*/jkdhgsdjk.log" call_id="**"
| stats last(_time) as earliest_time first(A_to) as A_to first(A_from) as A_from first(B_to) as B_to first(B_from) as B_from by call_id
Using first() or last() doesn't actually matter if there is only one value per call_id (you could even use min(), max(), or avg() and get the same thing). Perhaps this will help you get to the output you need more easily.
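To finish the comparison the original question asked for, a hedged continuation of that stats approach (field names as in the question):
index="abc_ndx" source="*/jkdhgsdjk.log" call_id="**"
| stats last(_time) as earliest_time first(A_to) as A_to first(A_from) as A_from first(B_to) as B_to first(B_from) as B_from by call_id
| eval matched_from=if(A_from == B_from, A_from, null())
| where isnotnull(matched_from)
| table earliest_time, call_id, matched_from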
