How to count and compare the number of regex matches - sumologic

I want to use Sumo Logic to count how often different APIs are called, and get a table with the API call name and its count. My current query is like this:
_sourceCategory="my_category"
| parse regex "GET.+443 (?<getUserByUserId>/user/v1/)\d+" nodrop
| parse regex "GET.+443 (?<getUserByUserNumber>/user/v1/userNumber)\d+"
| count by getUserByUserId, getUserByUserNumber
This returns the correct values, but each one goes into its own column. Once I have more variables, the table becomes very wide and hard to read.

I figured it out: I need to use the same capture group name for all the regexes, and keep nodrop on each parse so that messages matching only one of the patterns are not dropped. Like this:
_sourceCategory="my_category"
| parse regex "GET.+443 (?<endpoint>/user/v1/)\d+" nodrop
| parse regex "GET.+443 (?<endpoint>/user/v1/userNumber)\d+" nodrop
| count by endpoint
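As a possible refinement (an untested sketch; isEmpty is Sumo Logic's built-in empty/null check), messages that match neither pattern can be filtered out before counting, and the table sorted by frequency:
_sourceCategory="my_category"
| parse regex "GET.+443 (?<endpoint>/user/v1/)\d+" nodrop
| parse regex "GET.+443 (?<endpoint>/user/v1/userNumber)\d+" nodrop
| where !isEmpty(endpoint)
| count by endpoint
| sort by _count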

Related

[splunk]: Obtain a count of hits in a query of regexes

I am searching for a list of regexes in a Splunk alert like this:
... | regex "regex1|regex2|...|regexn"
Can I modify this query to get a table of the regexes found along with their counts? The table shouldn't show rows with zero counts, e.g.:
regex2 17
regexn 3
The regex command merely filters events. All we know is that each result matched the regular expression; there is no record or indication of why or how any event passed.
To do that, you'd have to extract a unique field or value from each regex and then test the resulting events to see which field or value was present. The regex command, however, does not extract anything. You'd need the rex command or the match function to do that.
It turns out the | regex line is not needed. The following works for me; note the extra parentheses around each alternative:
| rex max_match=0 "(?P<countfields>((regex1)|(regex2)|..|(regexn)))"
| stats count by countfields
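Note that stats count by countfields only produces rows for values that were actually extracted, so patterns with zero matches are omitted automatically, which is exactly what was asked for. Appending | sort - count would list the most frequent patterns first.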

Neo4j Query to Table Problems

My problem is that I put data into Neo4j from what was essentially a large spreadsheet. Now I want to be able to get that data back out in a similar tabular format.
Let's say I have some notional spreadsheet of data that went in looking something like the following.
| Artist | Album | Song | Live | Filename | Genre | Year | Source | Label |
|--------|-------|------|------|----------|-------|------|--------|-------|
| .... | ..... | .... | .... | ........ | ..... | .... | ...... | ..... |
The spreadsheet was a listing of files with some metadata about each file. For analytic purposes it made more sense to put not the files but the Albums at the center of the graph, so every record in the table above maps to a handful of nodes and relationships. The data model might look something like this:
(Song)-[_IS_ON_]->(Album)
(Artist)-[_SINGS_]->(Song)
(Album)-[_IS_IN_]->(Genre)
(Song)-[_IS_IN_]->(Genre)
(Album)-[_IS_]->(Live)
(Album)-[_FROM_]->(Year)
(Album)-[_IS_ON_]->(Source)
(Label)-[_PRODUCED_]->(Album)
I am able to query a single record from my spreadsheet above using a query similar to this.
MATCH (a:Album {name: "Hells Bells"})-[r]-(b)
OPTIONAL MATCH (s:Song)<-[:_SINGS_]-(aa:Artist)
RETURN *
I have two questions here.
How do I make the above query return a table that looks similar to the original normalized table? If I did RETURN b.filename, b.genre ... I get a table that has a lot of null values. It would seem I need to do a DISTINCT on each of the fields. But I am still really new to Neo4j and am not positive I understand how to do this.
It would be great if there were a way to get all the fields in all the nodes without having to type them out in the query like RETURN b.filename, b.genre .... I think I figured this out once, but I stupidly didn't save it.
I hope this was clear enough. I can't share my graph model or data so I had to make this up on the fly.
TIA
Try the following (but, since you did not state how to get the filename, that value might be missing):
MATCH
(artist:Artist)-[:_SINGS_]->(song:Song)-[:_IS_ON_]->(album:Album {name: "Hells Bells"})-[:_FROM_]-(year:Year),
(album)-[:_IS_IN_]->(genre:Genre),
(album)-[:_IS_]->(live:Live),
(album)-[:_IS_ON_]->(source:Source),
(label:Label)-[:_PRODUCED_]->(album)
RETURN *
In a RETURN clause, if you specify a node or relationship (without a property name), it generates a map of all of its properties. The above query, for example, would return a map for each matched node.
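For instance, a minimal sketch (assuming the same Album node as above) that returns just the album's properties as a map, using Cypher's built-in properties() function:
MATCH (album:Album {name: "Hells Bells"})
RETURN properties(album) AS albumProps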
If you actually want to have a single merged map, you can use the APOC function apoc.map.mergeList. For example:
MATCH
(artist:Artist)-[:_SINGS_]->(song:Song)-[:_IS_ON_]->(album:Album {name: "Hells Bells"})-[:_FROM_]-(year:Year),
(album)-[:_IS_IN_]->(genre:Genre),
(album)-[:_IS_]->(live:Live),
(album)-[:_IS_ON_]->(source:Source),
(label:Label)-[:_PRODUCED_]->(album)
RETURN apoc.map.mergeList([artist,song,year,genre,live,source,label,album]) AS result
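One caveat worth noting: apoc.map.mergeList merges the maps from left to right, so if two nodes share a property key (say, several of them have a name property), the value from the node later in the list wins. Order the list so that the node whose values should take precedence comes last.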

In Sumo Logic, how to search for logs matching a regular expression?

I'm trying to do a Sumo Logic search for logs matching the following regular expression:
"Authorization \d+ for story is not voided. Story not removed"
That is, the \d+ consists of one or more digits, but it doesn't matter what they are exactly.
Based on the search examples cheat sheet (https://help.sumologic.com/05Search/Search-Cheat-Sheets/General-Search-Examples-Cheat-Sheet), I've tried to use a * | parse regex pattern for this, but that doesn't work: I get a 'No capture group found in regex' error. I'm not actually interested in capturing the digits, though, just in matching the regular expression in my search. How can I achieve this?
I managed to get it to work in two ways. First, using the regular parse instead of parse regex:
* | parse "Authorization * for story is not voided. Story not removed" as id |
count by _sourceHost | sort by _count
or, when using a regular expression, by giving the capture a named group:
* | parse regex "Authorization (?<id>\d+) for story is not voided. Story not removed" |
count by _sourceHost | sort by _count
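The underlying reason is that parse regex requires at least one named capture group, so naming a group you never reference afterwards is enough to satisfy the parser even when you don't care about the captured value.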

Use both columns and previously defined values in fitnesse ColumnFixture

The rows in my test table all repeat the same values, except for two columns which are different for each row. I would like to reference values I defined earlier instead of repeating them in every row.
The fixture uploads files to FTP; each row in the test table currently has a username, password, host and so on, and these are always the same. Only the name of the file differs.
If your tests use Slim you can use constructor parameters to define the repeated values in the first (i.e. header) row of your table. In that case you only have to define the file names in the table's rows.
If your table is a 'decision table' based on a 'scenario' you can also supply repeated parameters in the header row (using a 'having' syntax). More details can be found in FitNesse's own acceptance tests. For instance:
|scenario |Division _ _ _|numerator, denominator, quotient?|
|setNumerator |#numerator |
|setDenominator|#denominator |
|$quotient= |quotient |
|Division |having|numerator|9|
|denominator|quotient? |
|3 |3.0 |
|2 |4.5 |
Another option, but this seems less appropriate when the values are really the same for ALL rows, is to use a baseline decision table where the first row defines values for all columns and subsequent rows only define the altered values.
You can use FitNesse variables:
!define username {bob}
!define password {secret}
|myfixture|
|username|password|other|stuff|
|${username}|${password}|a|b|
|${username}|${password}|c|d|
|${username}|${password}|x|y|
The answer by Fried Hoeben works for Slim; the following answer is for fit:
If your fixture class is a child of fit's Fixture, then you can define extra parameters by adding extra columns to the header row.
|!-UploadFileToFtps-! |ftpPassword=${password} | ftpUserName=${userName}|
|host |ftpDir |localFile |result? |
|${ftpHost}|${ftpSrc}|${folder1}${file1}.xlsx |File '${folder1}${file1}.xlsx' successfully uploaded|
|${ftpHost}|${ftpSrc}|${folder2}${file2}.xlsx |File '${folder2}${file2}.xlsx' successfully uploaded|
|${ftpHost}|${ftpSrc}|${folder2}${file3}.pdf |File '${folder2}${file3}.pdf' successfully uploaded |
You can access the values in those columns with getArgs(), which returns a String array.
I use key-value pairs separated by '=', which enables named parameters. Otherwise I would have to reference the parameters by position, which I think is the wrong approach.
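For illustration, here is a minimal sketch of what such a fixture might look like in fit. The class and column names mirror the table above; everything else is hypothetical and the actual upload is elided:
import fit.ColumnFixture;
import java.util.HashMap;
import java.util.Map;

public class UploadFileToFtps extends ColumnFixture {
    public String host;      // bound to the 'host' column
    public String ftpDir;    // bound to the 'ftpDir' column
    public String localFile; // bound to the 'localFile' column

    // Parse the extra header cells (e.g. "ftpPassword=...") into named parameters.
    private Map<String, String> namedArgs() {
        Map<String, String> named = new HashMap<String, String>();
        for (String arg : getArgs()) {          // one entry per extra header cell
            String[] pair = arg.split("=", 2);  // split "key=value" at the first '='
            if (pair.length == 2) {
                named.put(pair[0].trim(), pair[1].trim());
            }
        }
        return named;
    }

    // Bound to the 'result?' column.
    public String result() {
        String user = namedArgs().get("ftpUserName");
        String password = namedArgs().get("ftpPassword");
        // ... upload localFile to host/ftpDir using user and password ...
        return "File '" + localFile + "' successfully uploaded";
    }
}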

Comparing values in two columns of two different Splunk searches

I am new to Splunk and am facing an issue comparing values in two columns from two different queries.
Query 1
index="abc_ndx" source="*/jkdhgsdjk.log" call_id="**" A_to="**" A_from="**" | transaction call_id keepevicted=true | search "xyz event:" | table _time, call_id, A_from, A_to | rename call_id as Call_id, A_from as From, A_to as To
Query 2
index="abc_ndx" source="*/ jkdhgsdjk.log" call_id="**" B_to="**" B_from="**" | transaction call_id keepevicted=true | search " xyz event:"| table _time, call_id, B_from, B_to | rename call_id as Call_id, B_from as From, B_to as To
These are my two queries. I want to compare each value in the A_from column with each value in the B_from column, and if a value matches, display those matching values of A_from.
Is it possible?
I have run the two queries separately, exported each result to CSV, and used the VLOOKUP function. But exports are limited to a maximum of 10,000 rows, so I miss a lot of data, since my search returns more than 10,000 records.
Any help?
I haven't got any data to test this on at the moment; however, the following should point you in the right direction.
When you have the table for the first query sorted out, you should 'pipe' the search string to an appendcols command containing your second search string. This command allows you to run a subsearch and "import" its columns into your base search.
Once you have the two columns in the same table, you can use the eval command to create a new field that compares the two values and assigns a value as you desire.
Hope this helps.
http://docs.splunk.com/Documentation/Splunk/5.0.2/SearchReference/Appendcols
http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Eval
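A rough, untested sketch of what that could look like, reusing the two searches from the question (one caveat: appendcols pastes rows together by position rather than by call_id, so the comparison only lines up if both searches return their rows in the same order):
index="abc_ndx" source="*/jkdhgsdjk.log" call_id="**" A_to="**" A_from="**"
| transaction call_id keepevicted=true
| search "xyz event:"
| table _time, call_id, A_from, A_to
| appendcols [ search index="abc_ndx" source="*/jkdhgsdjk.log" call_id="**" B_to="**" B_from="**"
    | transaction call_id keepevicted=true
    | search "xyz event:"
    | table call_id, B_from, B_to ]
| eval matched_from=if(A_from == B_from, A_from, null())
| where isnotnull(matched_from)
| table _time, call_id, matched_from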
I'm not sure why this needs to be two separate queries: everything comes from the same index and source and uses almost identical data. So I would do something like the following:
index="abc_ndx" source="*/jkdhgsdjk.log" call_id="**" (A_to="**" A_from="**") OR (B_to="**" B_from="**")
| transaction call_id keepevicted=true
| search "xyz event:"
| eval to=if(A_from == B_from, A_from, "no_match")
| table _time, call_id, to
This grabs all events from your specified index and source which have a call_id and either A_to and A_from or B_to and B_from. It then builds transactions from all of that and lets you filter on "xyz event:" (whatever that is).
Then it creates a new field called 'to', which shows A_from when A_from == B_from and otherwise shows "no_match" (a placeholder, since you didn't specify what should happen when they don't match).
There is also a way to potentially tackle this without using transactions, although without more detail on the underlying data I can't say for sure. The basic idea is that if you have a common field (call_id in this case), you can use stats to collect the values associated with that field instead of running an expensive transaction command.
For example:
index="abc_ndx" index="abc_ndx" source="*/jkdhgsdjk.log" call_id="**"
| stats last(_time) as earliest_time first(A_to) as A_to first(A_from) as A_from first(B_to) as B_to first(B_from) as B_from by call_id
Whether you use first() or last() doesn't actually matter if there is only one value per call_id (you could even use min(), max(), or avg() and get the same result). Perhaps this will help you get to the output you need more easily.
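Building on that idea, an untested sketch of the full comparison without transaction might look like this:
index="abc_ndx" source="*/jkdhgsdjk.log" call_id="**"
| stats first(A_from) as A_from first(B_from) as B_from by call_id
| where A_from == B_from
| table call_id, A_from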
