Fluentd not parsing logs without a timestamp - fluentd

This is my current Fluentd configuration that parses JSON and standard logs with a timestamp:
<source>
@id fluentd-containers.log
@type tail
path /var/log/containers/*.log
pos_file /var/log/containers.log.pos
tag raw.test.*
read_from_head true
<parse>
@type multi_format
<pattern>
format json
time_key time
time_format %Y-%m-%dT%H:%M:%S.%NZ
</pattern>
<pattern>
format /^(?<time>.+) (?<stream>stdout|stderr) [^ ]* (?<log>.*)$/
time_format %Y-%m-%dT%H:%M:%S.%N%:z
</pattern>
</parse>
</source>
<match raw.test.**>
@id raw.test
@type detect_exceptions
remove_tag_prefix raw
message log
stream stream
multiline_flush_interval 5
max_bytes 500000
max_lines 1000
</match>
<filter **>
@id filter_concat
@type concat
key message
multiline_end_regexp /\n$/
separator ""
</filter>
<filter test.**>
@id filter_kubernetes_metadata
@type kubernetes_metadata
</filter>
<filter test.**>
@id filter_test
@type grep
<regexp>
key log
pattern ^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}(?:\.\d{3})?)\s+(ERROR|INFO|WARN|DEBUG|TRACE)\s?\s.*
</regexp>
</filter>
<filter test.**>
@id filter_severity
@type parser
key_name log
reserve_data true
<parse>
@type regexp
expression ^(?<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}(?:.\d{3})?)\s+(?<severity>ERROR|INFO|WARN|DEBUG|TRACE)\s+(?:--- +\[\s*(?<process>[^]]*)\] +(?<class>\S*) +: +(?<syslog_message>.*)|traceId=(?<traceId>[^\s]*)\sspanId=(?<spanId>[^\s]*)\ssampled=(?<sampled>[^\s]*)\s\[(?<class2>[^]]+)\]\s\((?<process2>[^)]+)\)\s(?<syslog_message2>.*))
</parse>
</filter>
<filter test.**>
@type record_transformer
enable_ruby
<record>
process ${record["process2"] != nil ? record["process2"] : record["process"]}
class ${record["class2"] != nil ? record["class2"] : record["class"]}
syslog_message ${record["syslog_message2"] != nil ? record["syslog_message2"] : record["syslog_message"]}
</record>
remove_keys process2,class2,syslog_message2
</filter>
<filter test.**>
@id filter_parser
@type parser
key_name log
reserve_data true
remove_key_name_field true
<parse>
@type multi_format
<pattern>
format json
</pattern>
<pattern>
format none
</pattern>
</parse>
</filter>
<match test.**>
@id elasticsearch
@type elasticsearch_dynamic
Can you please advise me on setting up a filter that would not ignore multiline logs without a timestamp? For example:
[ERROR] testAll{TestCase}[41] Time elapsed: 0.059 s <<< ERROR!
javax.ws.rs.NotFoundException: HTTP 404 Not Found
[ERROR] Test.testAll:225->restClientPut:292 » NotFound HTTP 404 Not Foun...
java.lang.RuntimeException: java.net.SocketTimeoutException: Connect timed out
[INFO] Results:
I've tried to create a filter with a regex that adds level and message fields:
/(?<level>[^\s]+)(?<message>.*)/
but Fluentd ignores it and does not send the logs.
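For reference, a minimal sketch of one possible direction (not from the thread, and assuming the grep filter_test above is what drops the untimestamped lines): relax its pattern so lines that start with a bracketed level such as [ERROR] or [INFO] are also kept, and let the earlier concat/detect_exceptions stages attach the remaining continuation lines.
<filter test.**>
  @id filter_test
  @type grep
  <regexp>
    key log
    # hypothetical relaxed pattern: keep timestamped lines and bracketed-level lines
    pattern /^(?:\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}(?:\.\d{3})?\s+|\[)(?:ERROR|INFO|WARN|DEBUG|TRACE)\]?/
  </regexp>
</filter>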

Related

Fluentd does not parse nested json array

I have the following JSON message on the Fluentd input:
{"foo":{"bar":{"abc":"[\n {\n \"ip\":\"192.168.1.1\",\n \"hostname\":\"pc\",\n \"mac\":\"01:02:03:04:05:06\"\n} \n]"}}}
And want to get the output message
{"foo":{"bar":{"abc":[{"ip":"192.168.1.1", "hostname":"pc", "mac":"01:02:03:04:05:06}]"}}}
I'm trying to parse it with the filter
<filter **>
@type parser
key_name foo
reserve_data true
remove_key_name_field false
<parse>
@type multi_format
<pattern>
format json
</pattern>
<pattern>
format none
</pattern>
</parse>
</filter>
But without any effect. The output is the same as the input.
When I try
<filter **>
@type parser
key_name $['foo']['bar']['abc']
<parse>
@type none
</parse>
</filter>
<filter **>
@type parser
key_name message
<parse>
@type json
</parse>
</filter>
it "works" but of course I get only
[{"ip":"192.168.1.1","hostname":"ps","mac":"01:02:03:04:05:06}]
Thank you for your suggestions.
You can use record_transformer by enabling Ruby like this:
<filter **>
@type record_transformer
enable_ruby true
<record>
message ${ record['message'].gsub(/(\\n)|(\\)|(\s+)/, '') }
</record>
</filter>
Output:
2021-05-17 16:22:14.568177376 +0500 test.*: {"message":"{\"foo\":{\"bar\":{\"abc\":\"[{\"ip\":\"192.168.1.1\",\"hostname\":\"pc\",\"mac\":\"01:02:03:04:05:06\"}]\"}}}"}
Output with puts:
{"foo":{"bar":{"abc":"[{"ip":"192.168.1.1","hostname":"pc","mac":"01:02:03:04:05:06"}]"}}}
That array enclosed in quotes is a JSON string value. To make it a valid JSON array, the surrounding quotes also have to be removed.
So, to do that, you'll use something like this:
s.gsub(/(\\n)|(\\)|(\s+)/, '').gsub(/\"\[/, '[').gsub(/\]\"/, ']')
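Plugged back into the record_transformer from above, that chain could look like this (just a sketch combining the two snippets):
<filter **>
  @type record_transformer
  enable_ruby true
  <record>
    # collapse escaped newlines, backslashes and whitespace, then strip the
    # quotes wrapping the array so "abc" holds a real JSON array
    message ${ record['message'].gsub(/(\\n)|(\\)|(\s+)/, '').gsub(/\"\[/, '[').gsub(/\]\"/, ']') }
  </record>
</filter>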
Here's the complete test:
fluent.conf
<source>
@type forward
bind 0.0.0.0
port 8080
tag test.*
</source>
<filter **>
@type record_transformer
enable_ruby true
<record>
# message ${ record['message'].gsub(/(\\n)|(\\)|(\s+)/, '') }
message ${ m = record['message'].gsub(/(\\n)|(\\)|(\s+)/, ''); puts m; m; }
</record>
</filter>
<match **.*>
@type stdout
</match>
Run fluentd:
fluentd -c fluent.conf
Send message to fluentd with fluent-cat:
echo '{"foo":{"bar":{"abc":"[\n {\n \"ip\":\"192.168.1.1\",\n \"hostname\":\"pc\",\n \"mac\":\"01:02:03:04:05:06\"\n} \n]"}}}' | fluent-cat -p 8080 -f none test
Output in the fluentd logs:
{"foo":{"bar":{"abc":"[{"ip":"192.168.1.1","hostname":"pc","mac":"01:02:03:04:05:06"}]"}}}
2021-05-17 16:22:14.568177376 +0500 test.*: {"message":"{\"foo\":{\"bar\":{\"abc\":\"[{\"ip\":\"192.168.1.1\",\"hostname\":\"pc\",\"mac\":\"01:02:03:04:05:06\"}]\"}}}"}
This parses the unparsed nested JSON:
<filter **>
@type parser
key_name $['foo']['bar']['abc']
hash_value_field parsed_abc
reserve_data true
remove_key_name_field true
<parse>
@type json
</parse>
</filter>
<filter **>
@type record_transformer
enable_ruby true
<record>
parsed_abc ${record['foo']['bar']['abc'] = record['parsed_abc']}
</record>
remove_keys parsed_abc
</filter>

How to prevent td-agent / fluentd from inserting undesirable metadata?

Please assist me in understanding how to prevent td-agent from inserting undesirable metadata.
It transforms a record of the form JSONBLOB to TIMESTAMP LOGNAME JSONBLOB.
I only want the JSON, not the timestamp and log name.
For example, td-agent transforms a log that looks like this:
{"log":"I0123 01:58:21.668297 1 nanny.go:108] dnsmasq[14]: 130404 192.168.178.209/44096 reply bitesize-docker-registry.s3.amazonaws.com.cluster.local is NXDOMAIN\n","stream":"stderr","docker":{"container_id":"52d1f2122ea4d144fb07835e1d8b7d210e2ac05c4c0bfd7d2e09237b597bf6a3"},"kubernetes":{"container_name":"dnsmasq","namespace_name":"kube-system","pod_name":"kube-dns-5c9464f66b-whljm"},"target_index":"kube-system-1970.01.01"}
to this:
1970-01-01T00:33:40+00:00 kubernetes.var.log.containers.kube-dns-5c9464f66b-whljm_kube-system_dnsmasq-52d1f2122ea4d144fb07835e1d8b7d210e2ac05c4c0bfd7d2e09237b597bf6a3.log {"log":"I0123 01:58:21.668297 1 nanny.go:108] dnsmasq[14]: 130404 192.168.178.209/44096 reply bitesize-docker-registry.s3.amazonaws.com.cluster.local is NXDOMAIN\n","stream":"stderr","docker":{"container_id":"52d1f2122ea4d144fb07835e1d8b7d210e2ac05c4c0bfd7d2e09237b597bf6a3"},"kubernetes":{"container_name":"dnsmasq","namespace_name":"kube-system","pod_name":"kube-dns-5c9464f66b-whljm"},"target_index":"kube-system-1970.01.01"}
My config looks like this:
<source>
@type tail
@id in_tail_container_logs
path /var/log/containers/*.log
exclude_path ["/var/log/containers/td-agent*.log"]
pos_file /var/log/td-agent/td-agent-containers.log.pos
tag kubernetes.*
read_from_head true
keep_time_key true
<parse>
@type json
json_parser json
time_format %Y-%m-%dT%H:%M:%S.%NZ
</parse>
</source>
<filter kubernetes.var.log.containers.**>
@type parser
<parse>
@type json
json_parser json
</parse>
replace_invalid_sequence true
emit_invalid_record_to_error false
key_name log
reserve_data true
</filter>
#Filter only kubernetes logs
<filter kubernetes.**>
@type grep
regexp kubernetes namespace_name
</filter>
<filter **>
@type record_transformer
enable_ruby
<record>
target_index system_log
</record>
</filter>
<filter kubernetes.**>
@type record_transformer
enable_ruby
<record>
target_index ${record["kubernetes"]["namespace_name"]}-${time.strftime('%Y.%m.%d')}
</record>
</filter>
<match fluent.**>
@type null
</match>
# relabel
<match kubernetes.**>
@type copy
<store>
@type relabel
@label @AWSES
</store>
<store>
@type relabel
@label @CCL
</store>
</match>
# AWS ElasticSearch logging
<label @AWSES>
<match kubernetes.**>
@type aws-elasticsearch-service
@log_level info
ssl_verify false
reload_connections false
time_key "time"
ssl_version TLSv1_2
resurrect_after 5s
target_index_key target_index
logstash_format true
include_tag_key false
type_name "access_log"
<buffer>
flush_mode interval
retry_type exponential_backoff
flush_thread_count 4
flush_interval 5s
retry_forever true
retry_max_interval 30
chunk_limit_size 32M
queue_limit_length 32
total_limit_size 5G
queued_chunks_limit_size 100
overflow_action block
disable_chunk_backup true
</buffer>
<endpoint>
url https://elasticsearch.idoug0122.us-east-2.dev:443
region us-east-2
</endpoint>
</match>
</label>
# CISO Centralized logging
<label @CCL>
<match kubernetes.**>
@type s3
s3_bucket loggingbucket-us-east-2-602604727914
s3_region us-east-2
path k8s/dev/idoug0122/
<buffer>
flush_mode interval
retry_type exponential_backoff
flush_thread_count 4
flush_interval 5s
retry_forever true
retry_max_interval 30
chunk_limit_size 32M
queue_limit_length 32
total_limit_size 5G
queued_chunks_limit_size 100
overflow_action block
disable_chunk_backup true
</buffer>
</match>
</label>
Any help would be appreciated!
The desired format is obtained by adding a <format> directive inside the relevant <match> (output) section:
<match kubernetes.**>
<format>
@type json
</format>
</match>
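As a sketch of where that lands in the existing config (assuming the s3 store from the question):
<label @CCL>
  <match kubernetes.**>
    @type s3
    s3_bucket loggingbucket-us-east-2-602604727914
    s3_region us-east-2
    path k8s/dev/idoug0122/
    # write each record as plain JSON instead of the default
    # time/tag/json line layout used by the out_file-style formatter
    <format>
      @type json
    </format>
  </match>
</label>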

Data starting with the same words in Sunspot Rails

My controller is
@search = Sunspot.search(User) do
fulltext params[:search]
end
@search_products = @search.results
I want it to also match names that start with the search params.
Create a new type in your Solr schema file (solr/conf/schema.xml). Add this to your schema.xml under the types tag:
<fieldType class="solr.TextField" name="text_pre" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="10" />
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
Here minGramSize is the minimum and maxGramSize is the maximum number of leading characters that will be indexed for matching.
For example, maxGramSize=8 would mean that a word like Kapil Kumar Yadav can be found with any of the prefixes "K", "Ka", "Kap", "Kapi", "Kapil", "Kapil ", "Kapil K", "Kapil Ku", but not with search params longer than 8 characters, so it's suggested to pick a maxGramSize large enough for all the fields you index.
Now create a new dynamic field definition in the same Solr schema file. Add this to your schema.xml under the fields tag:
<dynamicField name="*_textp" stored="false" type="text_pre" multiValued="true" indexed="true"/>
Finally, configure all text fields in the User class to index into your newly created dynamic field using the as option, like:
class User < ActiveRecord::Base
searchable do
text :name, as: :name_textp
text :address, as: :address_textp
# etc.
end
end
I found this here Matching substrings in fulltext search

How to return search results in Sunspot Solr (rails) where not all tokens are present?

I'm running Sunspot Solr in my rails app. I am using it to enable a user to search for different "articles" by using fulltext search on the :name attribute. At this point in time, I have Sunspot Solr configured and it's working nicely.
However, when I search for dog mouse cat (as an example), it only returns articles that contain all of the keywords. How can I configure Solr to show articles like 'The dog and the cat' - which contains only 2 of the 3 search keywords in the query example above?
My searchable block in the model:
searchable do
text :name
end
My current schema.xml for fulltext search looks like this:
<fieldType name="text" class="solr.TextField" omitNorms="false">
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StandardFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.PorterStemFilterFactory"/>
</analyzer>
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StandardFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" tokenizerFactory="solr.StandardTokenizerFactory"/>
<filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
<filter class="solr.PorterStemFilterFactory"/>
</analyzer>
</fieldType>
Using minimum_match inside the fulltext block:
Model.search do
fulltext text do
fields :name
minimum_match 2
end
end

How do you enable partial matching with "sunspot for rails"?

I have just finished setting up sunspot_rails and it seems to work well except for one thing.
After I made 3 records like below:
name=John
name=John2
name=John3
When I search with the keyword "John", only the 1st record shows up. It looks like exact matching.
I'd like to have all of them appear in the search results.
Is this supposed to happen by default, or did I set up something wrong?
If you want to match substrings in fulltext search, you can take a look at
https://github.com/sunspot/sunspot/wiki/Matching-substrings-in-fulltext-search
Also, you can add a file sunspot_solr.rb in myapp/config/initializers/ to configure pagination of results:
Sunspot.config.pagination.default_per_page = 100
This returns 100 results per page in this case.
Added:
Your schema.xml file is found in yourappfolder/solr/conf.
Also, you can add <filter class="solr.NGramFilterFactory"/> to match arbitrary substrings.
This is my particular config for schema.xml:
<fieldType name="text" class="solr.TextField" omitNorms="false">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StandardFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
<fieldtype class="solr.TextField" name="text_pre" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="10"/>
<filter class="solr.ISOLatin1AccentFilterFactory"/>
<filter class="solr.TrimFilterFactory" />
<filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="10"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.ISOLatin1AccentFilterFactory"/>
<filter class="solr.TrimFilterFactory" />
</analyzer>
</fieldtype>
For me it works fine with full strings and substrings for all keywords. Please do not forget to restart the server and reindex your models for the changes to take effect.
Regards!
Thanks!!!
Block from the girls controller (girls_controller.rb):
def index
@search = Girl.search do
fulltext params[:search]
end
@girls = @search.results
# @girls = Girl.all
#
# respond_to do |format|
# format.html # index.html.erb
# format.json { render json: @girls }
# end
end
Block from the Girl model (girl.rb):
searchable do
text :name_en, :name_es, :name_ja
end
