I'm trying to parse logs from an rsyslog server and insert them into Elasticsearch.
My incoming log line is:
Feb 13 01:17:11 xxxx xxx-xxxx_error 2016/02/13 01:17:02 [error] 13689#0: *1956118 open() "xxxxxx" failed (2: No such file or directory), client: xx.xx.xx.xx, server: xxxxx.xx, request: "xxxxxxx HTTP/1.1", host: "xxxxx.xx"
I am extracting fields with the following Logstash filters:
grok {
  match => {
    "message" => [
      "(?<logstamp>\h{3} \d{2} \d{2}:\d{2}:\d{2}) %{WORD:hostname} (?<source>[^\s]+) (?<timestamp>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) %{GREEDYDATA:error_message}"
    ]
  }
}
date {
  locale => "en"
  match => [ "timestamp", "yyyy/MM/dd HH:mm:ss" ]
}
mutate {
  remove_field => [ "@version", "_score", "message", "host", "_type", "logstamp" ]
}
Based on http://grokdebug.herokuapp.com/, my syntax is sane.
I have two dates in the log line because the first one is when rsyslog received the line, and the second one is from nginx. What I want is to pass the second one to "timestamp".
The error I get in logstash is:
@metadata_accessors=#<LogStash::Util::Accessors:0x1d630482 @store={"path"=>"..."}, @lut={"[path]"=>[{"path"=>"..."},
"path"]}>, @cancelled=false>], :response=>{"create"=>{"_index"=>"...", "_type"=>"...", "_id"=>"...", "status"=>400, "error"=>{"type"=>"mapper_parsing_exception",
"reason"=>"failed to parse [timestamp]", "caused_by"=>{"type"=>"illegal_argument_exception",
"reason"=>"Invalid format: \"2016/02/16 12:25:16\" is malformed at \"/02/16 12:25:16\""}}}}, :level=>:warn}
(I clipped the output to make it shorter)
EDIT: WORKING CONFIG
I ended up converting the timestamp from the nginx log to a more standard format (as seen in the ruby part), and using that one in the date match as @timestamp.
grok {
  match => {
    "message" => [
      "(?<logstamp>\h{3} \d{2} \d{2}:\d{2}:\d{2}) %{WORD:hostname} (?<source>[^\s]+) (?<ngxstamp>[^\s]+ [^\s]+) %{GREEDYDATA:error_message}"
    ]
  }
}
ruby {
  code => "event['ngxstamp'] = event.timestamp.time.localtime.strftime('%Y-%m-%d %H:%M:%S')"
}
date {
  match => [ "ngxstamp", "yyyy-MM-dd HH:mm:ss" ]
  locale => "en"
}
mutate {
  remove_field => [ "@version", "_score", "message", "host", "_type", "logstamp" ]
}
Since the type of your timestamp field is strict_date_optional_time, the date pattern you should be using in your date filter is
yyyy-MM-dd HH:mm:ss
instead of
yyyy/mm/dd HH:mm:ss
So:
Use dashes instead of slashes in the date part
Use MM instead of mm for the months
There might still be an issue with the missing T between the date and time parts, since strict_date_optional_time mandates it, though.
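One way to follow that advice (an untested sketch; it assumes the grok from the question has already extracted the nginx time into the timestamp field) is to rewrite the slashes to dashes with mutate/gsub before the date filter:
mutate {
  # turn 2016/02/13 01:17:02 into 2016-02-13 01:17:02
  gsub => [ "timestamp", "/", "-" ]
}
date {
  match  => [ "timestamp", "yyyy-MM-dd HH:mm:ss" ]
  locale => "en"
}
The indexed timestamp field may still need the T between date and time (or to be removed before indexing) to satisfy strict_date_optional_time, as noted above.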
I want to generate dummy data in the Realtime Database with this structure:
{
  "0" : {
    "0c1592ca-0fa5-43b9-88d2-c9cd77b30611" : {
      "token" : "0cu9CJPb_DIUfbr-Ay8vh6:-KQXn....",
      "member_id" : "123456789102",
      "update_at" : "2021/06/14 08:08:08"
    },
    "<uid>" : {
      "token" : "167 random characters",
      "member_id" : "12 random numbers",
      "update_at" : "YYYY/mm/DD HH:mm:ss"
    }
  },
  "1" : {
    ....
  },
  "2" : {
    ....
  },
  .....
  "9" : {
    ....
  }
}
A record is like this:
"36 random characters" : {
"token" : "167 random characters",
"member_id" : "12 random numbers",
"update_at" : "YYYY/mm/DD HH:mm:ss"
}
I've tried importing JSON files from the Firebase console with a million records per node, but it crashed from the second node onward, as shown in the image below. I can't import as easily as before.
Is there another way to generate 10 million dummy child nodes like the above that is faster and more stable?
https://i.stack.imgur.com/nuBoM.png
The error message says that the JSON is invalid, so you might want to pass it through a JSON validator like https://jsonlint.com/.
Aside from that, I can imagine that your browser, the console, or the server runs into memory problems with this number of nodes in one write (see the documented limits). I recommend instead using the API to read the JSON file locally and then add it to Firebase in chunks (see the sketch below), or using a tool like https://github.com/FirebaseExtended/firebase-import.
Also see https://www.google.com/search?q=firebase+realtime+database+upload+large+JSON
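A minimal sketch of the chunked approach with the Node.js Admin SDK (untested; serviceAccount.json, the database URL, dummy.json, and the chunk size are placeholders you would replace with your own values):
// Read the dummy JSON locally and write it to the Realtime Database in chunks.
const fs = require('fs');
const admin = require('firebase-admin');

admin.initializeApp({
  credential: admin.credential.cert(require('./serviceAccount.json')),
  databaseURL: 'https://your-project-default-rtdb.firebaseio.com'
});

async function uploadInChunks(nodeKey, records, chunkSize = 5000) {
  const ref = admin.database().ref(nodeKey);
  const entries = Object.entries(records);
  for (let i = 0; i < entries.length; i += chunkSize) {
    // update() merges each chunk under the node instead of replacing the whole node
    await ref.update(Object.fromEntries(entries.slice(i, i + chunkSize)));
    console.log(`${nodeKey}: ${Math.min(i + chunkSize, entries.length)}/${entries.length}`);
  }
}

async function main() {
  const data = JSON.parse(fs.readFileSync('./dummy.json', 'utf8'));
  for (const nodeKey of Object.keys(data)) {   // "0" .. "9"
    await uploadInChunks(nodeKey, data[nodeKey]);
  }
  process.exit(0);
}

main();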
I'm trying to use the Aerospike bulk loader to seed a cluster with data from a tab-separated file.
The source data looks like this:
set key segments
segment 123 10,20,30,40,50
segment 234 40,50,60,70
The third column, 'segments', contains a comma-separated list of integers.
I created a JSON template:
{
  "version" : "1.0",
  "input_type" : "csv",
  "csv_style" : { "delimiter" : " ", "n_columns_datafile" : 3, "ignore_first_line" : true },
  "key" : { "column_name" : "key", "type" : "integer" },
  "set" : { "column_name" : "set", "type" : "string" },
  "binlist" : [
    { "name" : "segments",
      "value" : { "column_name" : "segments", "type" : "list" }
    }
  ]
}
... and ran the loader:
java -cp aerospike-load-1.1-jar-with-dependencies.jar com.aerospike.load.AerospikeLoad -c template.json data.tsv
When I query the records in aql, they seem to be a list of strings:
aql> select * from test
+--------------------------------+
| segments |
+--------------------------------+
| ["10", "20", "30", "40", "50"] |
| ["40", "50", "60", "70"] |
+--------------------------------+
The data I'm trying to store is a list of integers. Is there an easy way to convert the objects stored in this bin to a list of integers (possibly a Lua UDF) or perhaps there's a tweak that can be made to the bulk loader template?
Update:
I attempted to solve this by creating a Lua UDF to convert the list from strings to integers:
function convert_segment_list_to_integers(rec)
  for i=1, table.maxn(rec['segments']) do
    rec['segments'][i] = math.floor(tonumber(rec['segments'][i]))
  end
  aerospike:update(rec)
end
... registered it:
aql> register module 'convert_segment_list_to_integers.lua'
... and then tried executing against my set:
aql> execute convert_segment_list_to_integers.convert_segment_list_to_integers() on test.segment
I enabled some more verbose logging and noticed that the UDF is throwing an error. Apparently, it's expecting a table but was passed userdata:
Dec 04 2015 23:23:34 GMT: DEBUG (udf): (udf_rw.c:send_result:527) FAILURE when calling convert_segment_list_to_integers convert_segment_list_to_integers ...rospike/usr/udf/lua/convert_segment_list_to_integers.lua:2: bad argument #1 to 'maxn' (table expected, got userdata)
Dec 04 2015 23:23:34 GMT: DEBUG (udf): (udf_rw.c:send_udf_failure:407) Non-special LDT or General UDF Error(...rospike/usr/udf/lua/convert_segment_list_to_integers.lua:2: bad argument #1 to 'maxn' (table expected, got userdata))
It seems that maxn can't be applied to a userdata object.
Can you see what needs to be done to fix this?
To convert your lists of string values to lists of integer values, you can run the following record UDF:
function convert_segment_list_to_integers(rec)
  local list_with_ints = list()
  for value in list.iterator(rec['segments']) do
    local int_value = math.floor(tonumber(value))
    list.append(list_with_ints, int_value)
  end
  rec['segments'] = list_with_ints
  aerospike:update(rec)
end
When you edit your existing lua module, make sure to re-run register module 'convert_segment_list_to_integers.lua'.
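For reference, registering and running it from aql uses the same commands already shown in the question:
aql> register module 'convert_segment_list_to_integers.lua'
aql> execute convert_segment_list_to_integers.convert_segment_list_to_integers() on test.segment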
The cause of this issue is within the aerospike-loader tool: it will always assume and enforce strings, as you can see in the following Java code:
case LIST:
    /*
     * Assumptions
     * 1. Items are separated by a colon ','
     * 2. Item value will be a string
     * 3. List will be in double quotes
     *
     * No support for nested maps or nested lists
     */
    List<String> list = new ArrayList<String>();
    String[] listValues = binRawText.split(Constants.LIST_DELEMITER, -1);
    if (listValues.length > 0) {
        for (String value : listValues) {
            list.add(value.trim());
        }
        bin = Bin.asList(binColumn.getBinNameHeader(), list);
    } else {
        bin = null;
        log.error("Error: Cannot parse to a list: " + binRawText);
    }
    break;
Source on Github: http://git.io/vRAQW
If you prefer, you can modify this code and re-compile to always assume integer list values. Change lines 266 and 270 to something like this (untested):
List<Integer> list = new ArrayList<Integer>();
list.add(Integer.parseInt(value.trim()));
I have been searching online, but cannot quite figure this out.
grok {
  match => [ "message", "%{TIMESTAMP_ISO8601:log_timestamp} %{URIPATH:url}" ]
}
I need to extract parts of the URL and put them in Elasticsearch.
The logs have URLs like this:
URL 1 =
/NEED_A/Constant_A/Constant_B/Constant_C/Need_B/Constant_D/Need_C/Need_D
URL 2 =
/NEED_A/Constant_A /Constant_B/Constant_C/Need_B/Constant_D
URL 3 =
/Wierd_A
Need_A, NEED_B, NEED_C, Need_D, Wierd_A should go in respective fields.
I have been trying to write an if/else-if chain, but haven't really gotten anywhere yet:
grok {
  match => [ "message", "%{TIMESTAMP_ISO8601:log_timestamp} %{URIPATH:url}" ]
}
if [url] == "/*/Constant_A/Constant_B/Constant_C/*/Constant_D/*/*" {
  \/%{WORD:NEED_A}\/.*\/.*\/.*\/%{WORD:NEED_B}\/.*\/%{WORD:NEED_C}
}
else if [url] == "/*/Constant_A/Constant_B/Constant_C/*/Constant_D" {
  \/%{WORD:NEED_A}\/.*\/.*\/.*\/%{WORD:NEED_B}\/.*\/
}
else if ... {
  # something similar for URL 3
}
# move on if nothing matches
Any thoughts?
Logstash isn't a looping type of system.
If you want to run multiple patterns against your input, just list them:
grok {
  match => {
    "message" => [
      "%{PATTERN1}",
      "%{PATTERN2}"
    ]
  }
}
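Applied to the URL shapes in the question, that could look like the untested sketch below; Constant_A through Constant_D stand for your literal path segments, the field names follow the question, and the patterns run against the url field extracted by the earlier grok. Grok stops at the first pattern that matches (break_on_match defaults to true), so the most specific pattern goes first:
grok {
  match => {
    "url" => [
      "^/%{WORD:NEED_A}/Constant_A/Constant_B/Constant_C/%{WORD:NEED_B}/Constant_D/%{WORD:NEED_C}/%{WORD:NEED_D}$",
      "^/%{WORD:NEED_A}/Constant_A/Constant_B/Constant_C/%{WORD:NEED_B}/Constant_D$",
      "^/%{WORD:Wierd_A}$"
    ]
  }
}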
I am reading data from files by defining the path as *.log, etc.
File names are like app1a_test2_heep.log, cdc2a_test3_heep.log, etc.
How do I configure Logstash so that the part of the string before the first underscore (app1a, cdc2a, ...) is extracted and added to the host field, replacing the default host?
Eg:
fileName: app1a_test2_heep.log
host => app1a
Thanks in advance,
Ravi
The ruby filter can do what you want.
input {
  file {
    path => "/home/benlim/app1a_test2_heep.log"
  }
}
filter {
  ruby {
    code => "
      filename = event['path'].split('/').last
      event['host'] = filename.split('_').first
    "
  }
}
output {
  stdout { codec => rubydebug }
}
First, get the filename from the path. Then, get the hostname.
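If you would rather avoid the ruby filter, a grok against the path field can do the same job. A minimal, untested sketch (the pattern assumes the prefix you want is everything between the last / and the first _ of the file name):
filter {
  grok {
    # capture the text between the last "/" and the first "_" of the file name
    match     => { "path" => "/(?<host>[^/_]+)_[^/]+$" }
    overwrite => [ "host" ]   # replace the default host value
  }
}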
You can gsub (substitute) in the mutate filter. It takes three arguments: the field, what to substitute, and what to replace it with. You can use a regexp here.
filter {
  mutate {
    gsub => [
      # strip everything from the first underscore onward,
      # e.g. "app1a_test2_heep.log" -> "app1a"
      "host", "_.*", ""
    ]
  }
}
I am looking at the documentation for Neo4j and I see that I can use parameters when I create objects. Specifically, when I look at this page I see the code:
{
  "props" : {
    "position" : "Developer",
    "name" : "Andres"
  }
}
Query.
CREATE ({ props })
Yet when I use the web interface to access my Neo4j database on my local machine, I do not know how to specify the parameter. Simply copy/pasting that JSON object yields an error. I see on the page that
Exactly how to submit them depends on the driver in use.
but how does one use them on that command line/web interface?
Cypher supports queries with parameters, which are submitted as JSON. For example, the following shows REST API usage. For the Java embedded API, please refer to the following documentation: http://docs.neo4j.org/chunked/milestone/tutorials-cypher-parameters-java.html
MATCH (x { name: { startName }})-[r]-(friend)
WHERE friend.name = { name }
RETURN TYPE(r)
Example request
POST http://localhost:7474/db/data/cypher
Accept: application/json; charset=UTF-8
Content-Type: application/json
{
  "query" : "MATCH (x {name: {startName}})-[r]-(friend) WHERE friend.name = {name} RETURN TYPE(r)",
  "params" : {
    "startName" : "I",
    "name" : "you"
  }
}
Example response
200: OK
Content-Type: application/json; charset=UTF-8
{
  "columns" : [ "TYPE(r)" ],
  "data" : [ [ "know" ] ]
}
Parameters are not currently supported in regular Cypher statements in the Neo4j 2.0 browser. However, you can use the :POST syntax to achieve this.
Refer to the documentation for more information on Cypher queries via REST API.
http://docs.neo4j.org/chunked/milestone/rest-api-cypher.html
Update:
The following query allows you to accomplish this in the browser, although it is not an ideal experience:
:POST /db/data/transaction/commit {
  "statements": [
    {
      "statement": "MATCH (u:User {name:{username}}) RETURN u.name as username",
      "parameters": {
        "username": "my name"
      }
    }
  ]
}
The syntax to define params in the browser's command line is :params followed by the variable you would like to define. Just type :params and you will get a sense of how this command works from the resulting prompts.
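For example (a rough sketch; the exact parameter syntax in the query depends on your Neo4j version: $props in 3.x and later, {props} in older releases):
:params {props: {position: "Developer", name: "Andres"}}

// then, as a second browser command:
CREATE (n $props) RETURN n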
For the HTTP API, in the latest version (v3.5 as of this writing), use the following syntax, copied from https://neo4j.com/docs/http-api/current/http-api-actions/execute-multiple-statements/:
{
  "statements" : [ {
    "statement" : "CREATE (n) RETURN id(n)"
  }, {
    "statement" : "CREATE (n {props}) RETURN n",
    "parameters" : {
      "props" : {
        "name" : "My Node"
      }
    }
  } ]
}