Logstash nginx parser for http_x_forwarded_for - parsing

I am sending nginx logs to Elasticsearch using Filebeat and Logstash. My logs have the following form:
000.000.000.000 - - [17/Oct/2022:08:25:18 +0000] "OPTIONS /favicon.svg HTTP/1.1" 405 559 "https://example.net/auth/login" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36" "111.111.111.111, 222.222.222.222"
I have the following configuration file for logstash:
input {
  beats {
    port => 5035
  }
}
filter {
  grok {
    match => [ "message" , "%{COMBINEDAPACHELOG}+%{GREEDYDATA:http_x_forwarded_for}" ]
  }
  mutate {
    convert => ["response", "integer"]
    convert => ["bytes", "integer"]
    convert => ["responsetime", "float"]
  }
  geoip {
    source => "clientip"
    target => "geoip"
    add_tag => [ "nginx-geoip" ]
  }
  date {
    match => [ "timestamp" , "dd/MMM/YYYY:HH:mm:ss Z" ]
  }
  useragent {
    source => "message"
  }
}
output {
  elasticsearch {
    hosts => "elasticsearch:9200"
    index => "weblogs-%{+YYYY.MM.dd}"
    document_type => "nginx_logs"
    user => "elastic"
    password => "changeme"
  }
  stdout { codec => rubydebug }
}
This pipeline saves the logs to elasticsearch in the following form:
"response" : 405,
"timestamp" : "17/Oct/2022:08:25:18 +0000",
"os_version" : "10",
"auth" : "-",
"verb" : "OPTIONS",
"clientip" : "000.000.000.000",
"httpversion" : "1.1",
"referrer" : "\"https://example.net/auth/login\"",
"geoip" : { },
"os" : "Windows",
"os_name" : "Windows",
"agent" : {
"version" : "7.17.6",
"hostname" : "0242869f2486",
"type" : "filebeat",
"id" : "4de3a108-35bf-4bd9-8b18-a5d8f9f2bc83",
"ephemeral_id" : "3a5f78b5-bae0-41f6-8d63-eea700df6c3c",
"name" : "0242869f2486"
},
"log" : {
"file" : {
"path" : "/var/log/nginx/access.log"
},
"offset" : 1869518
},
"bytes" : 559,
"ident" : "-",
"http_x_forwarded_for" : " \"111.111.111.111, 222.222.222.222\"",
"os_full" : "Windows 10",
"#timestamp" : "2022-10-17T08:25:18.000Z",
"request" : "/favicon.svg",
"device" : "Spider",
"name" : "favicon",
"input" : {
"type" : "log"
},
"host" : {
"name" : "0242869f2486"
},
"os_major" : "10",
"#version" : "1",
"message" : "000.000.000.000 - - [17/Oct/2022:08:25:18 +0000] \"OPTIONS /favicon.svg HTTP/1.1\" 405 559 \"https://example.net/auth/login\" \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36\" \"111.111.111.111, 222.222.222.222\"",
"tags" : [
"beats_input_codec_plain_applied",
"_geoip_lookup_failure"
]
However, my goal is to parse the first IP from the http_x_forwarded_for field, add it to a new field called real_client_ip, and save it to the index. Is there a way to achieve that?

You can add one more grok filter to your Logstash pipeline after the first grok filter.
filter {
  grok {
    match => [ "message" , "%{COMBINEDAPACHELOG}+%{GREEDYDATA:http_x_forwarded_for}" ]
  }
  grok {
    match => [ "http_x_forwarded_for" , "%{IP:real_client_ip}" ]
  }
  mutate {
    convert => ["response", "integer"]
    convert => ["bytes", "integer"]
    convert => ["responsetime", "float"]
  }
  geoip {
    source => "clientip"
    target => "geoip"
    add_tag => [ "nginx-geoip" ]
  }
  date {
    match => [ "timestamp" , "dd/MMM/YYYY:HH:mm:ss Z" ]
  }
  useragent {
    source => "message"
  }
}
PS: I have validated the grok pattern in Kibana, not by running the Logstash pipeline, but this should work for your use case.
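If you would rather avoid a second grok pass, here is a minimal mutate-based sketch (untested; it assumes http_x_forwarded_for always contains a quoted, comma-separated IP list exactly as in your sample event):
filter {
  mutate {
    # strip the surrounding quotes and whitespace, then split on the comma
    gsub  => ["http_x_forwarded_for", "[\" ]", ""]
    split => { "http_x_forwarded_for" => "," }
  }
  mutate {
    # the first array element is the originating client
    add_field => { "real_client_ip" => "%{[http_x_forwarded_for][0]}" }
  }
}
Note that this turns http_x_forwarded_for into an array; the grok approach leaves the original field untouched, which may be preferable.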

Related

Cypress: Is it possible to complete a test after failure, but as a failed test

Use case:
I want to automatically test all sitemap URLs (more than 1000) of our website after each new code release to see if the update broke any of them.
My problem:
The test fails as soon as a URL returns anything other than status 200 (even with failOnStatus: false). Instead, I need to save the results in a sort of callback, check them only at the end, and fail the test at the end (in order to find all the broken links).
My code:
describe('Validate sitemaps files', () => {
  let urls = [];
  let failed = [];  // added: referenced below but missing from the original snippet
  it("Should successfully load each url in the sitemap", () => {
    cy.fixture('sitemaps.json').then((data) => {
      for (var index in data) {
        cy.log(data[index].url)
        cy.request({
          //url: Cypress.config().baseUrl + data[index].url, failOnStatusCode: false,
          url: data[index].url, failOnStatus: false,
          headers: {
            "Content-Type": "text/xml; charset=utf-8",
            "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36",
          },
        }).as("sitemap").then((response) => {
          urls = Cypress.$(response.body)
            .find("loc")
            .toArray()
            .map((el) => el.innerText)
        })
      }
      urls.forEach((url) => {
        // check if the resource exists
        cy.request(url).its('status').then(status => {
          if (status !== 200) {
            failed.push(url)
          }
        }).then(() => {
          // check inside then to ensure loop has finished
          cy.log('Failed links: ' + `${failed.join(', ')}`)
          expect(failed.length).to.eq(0)
        })
        cy.wrap('passed').as('ctrl')
      })
    })
  })
})
Fixture (just an example to test my code):
[
  { "url": "https://gooogle.com" },
  { "url": "https://browserstack.com/notfound" },
  { "url": "https://yahoo.com" },
  { "url": "https://browserstack.com" }
]
Test result: (screenshot omitted)
I have already seen this answer ==> Google, but no success so far.
Any help is much appreciated.
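For reference, one way to defer the assertion looks like the sketch below (untested; note that failOnStatusCode is the documented Cypress option name, and failOnStatus above is assumed to be a typo):
describe('Validate sitemap URLs', () => {
  it('loads every URL and reports all failures at the end', () => {
    const failed = [];
    cy.fixture('sitemaps.json').then((data) => {
      data.forEach(({ url }) => {
        // failOnStatusCode: false keeps non-2xx responses from aborting the test
        cy.request({ url, failOnStatusCode: false }).then((response) => {
          if (response.status !== 200) failed.push(url);
        });
      });
    });
    // commands queued inside the .then above run before this callback,
    // so by this point every request has resolved
    cy.then(() => {
      expect(failed, `Failed links: ${failed.join(', ')}`).to.be.empty;
    });
  });
});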

Does POST /reviews-v1 using Crucible REST API allow us to add reviewers?

Is there a way to add reviewers when creating a code review through the REST API? I tried adding "reviewers" with an array of usernames, but that didn't work. I got this error:
Unrecognized field "reviewers" (Class com.atlassian.crucible.spi.data.ReviewData), not marked as ignorable
The request body example from their API docs is below:
{
  "reviewData" : {
    "projectKey" : "CR-FOO",
    "name" : "Example review.",
    "description" : "Description or statement of objectives for this example review.",
    "author" : {
      "userName" : "auth",
      "displayName" : "Jane Authy"
    },
    "moderator" : {
      "userName" : "scott",
      "displayName" : "Scott the Moderator"
    },
    "creator" : {
      "userName" : "joe",
      "displayName" : "Joe Krustofski"
    },
    "permaId" : {
      "id" : "CR-FOO-21"
    },
    "summary" : "some review summary.",
    "state" : "Review",
    "type" : "REVIEW",
    "allowReviewersToJoin" : true,
    "metricsVersion" : 4,
    "createDate" : "2022-06-20T09:37:11.621+0000",
    "dueDate" : "2022-06-23T09:37:11.621+0000",
    "reminderDate" : "2022-06-21T09:37:11.621+0000",
    "linkedIssues" : [ "DEF-456", "ABC-123", "GHI-789" ],
    "jiraIssueKey" : "ABC-123"
  },
  "patch" : "Index: emptytests/notempty/a.txt\n===================================================================\ndiff -u -N -r1.31 -r1.32\n--- emptytests/notempty/a.txt\t22 Sep 2004 00:38:15 -0000\t1.31\n+++ emptytests/notempty/a.txt\t5 Dec 2004 01:04:25 -0000\t1.32\n@@ -4,4 +4,5 @@\n hello there :D\n CRU-123\n http://madbean.com/blog/\n-!\n\\ No newline at end of file\n+!\n+foobie\n\\ No newline at end of file\nIndex: test/a.txt\n===================================================================\ndiff -u -N -r1.31 -r1.32\n--- test/a.txt\t22 Sep 2004 00:38:15 -0000\t1.31\n+++ test/a.txt\t5 Dec 2004 01:04:25 -0000\t1.32\n@@ -4,4 +4,5 @@\n hello there :D\n CRU-123\n http://madbean.com/blog/\n-!\n\\ No newline at end of file\n+!\n+foobie\n\\ No newline at end of file",
  "anchor" : {
    "anchorPath" : "/",
    "anchorRepository" : "REPO",
    "stripCount" : 2
  },
  "changesets" : {
    "changesetData" : [ {
      "id" : "63452"
    } ],
    "repository" : "REPO"
  }
}
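For what it's worth, a commonly cited workaround (unverified here, so treat it as an assumption) is to create the review first and then add reviewers through the separate reviewers sub-resource, which takes a comma-separated list of user names as the request body:
# assumption: POST /rest-service/reviews-v1/{id}/reviewers adds reviewers
# to an existing review; "alice,bob" and the host are placeholders
curl -u admin:admin \
  -X POST \
  -H "Content-Type: text/plain" \
  -d "alice,bob" \
  "https://crucible.example.com/rest-service/reviews-v1/CR-FOO-21/reviewers"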

Logstash cannot parse user_agent field of nginx

I have nginx logs with the following format:
192.168.0.1 - - [18/Jul/2022:11:20:28 +0000] "GET / HTTP/1.1" 200 15 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36" "-"
192.168.128.1 - - [18/Jul/2022:13:22:15 +0000] "GET / HTTP/1.1" 200 615 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36" "-"
I am using the following pipeline to parse them and store them into elasticsearch:
input {
  beats {
    port => 5044
  }
}
filter {
  grok {
    match => [ "message" , "%{COMBINEDAPACHELOG}+%{GREEDYDATA:extra_fields}" ]
  }
  mutate {
    convert => ["response", "integer"]
    convert => ["bytes", "integer"]
    convert => ["responsetime", "float"]
  }
  geoip {
    source => "clientip"
    target => "geoip"
    add_tag => [ "nginx-geoip" ]
  }
  date {
    match => [ "timestamp" , "dd/MMM/YYYY:HH:mm:ss Z" ]
  }
  useragent {
    source => "agent"
  }
}
output {
  elasticsearch {
    hosts => ["http://elasticsearch:9200"]
    index => "weblogs-%{+YYYY.MM.dd}"
    document_type => "nginx_logs"
    user => "elastic"
    password => "changeme"
  }
  stdout { codec => rubydebug }
}
However, the useragent part does not seem to work, since I cannot see its fields in the output:
{
  "httpversion" => "1.1",
  "clientip" => "192.168.0.1",
  "ident" => "-",
  "timestamp" => "18/Jul/2022:11:20:28 +0000",
  "verb" => "GET",
  "@timestamp" => 2022-07-18T11:20:28.000Z,
  "@version" => "1",
  "tags" => [
    [0] "beats_input_codec_plain_applied",
    [1] "_geoip_lookup_failure"
  ],
  "host" => {
    "name" => "9a852bd136fd"
  },
  "auth" => "-",
  "bytes" => 15,
  "referrer" => "\"-\"",
  "geoip" => {},
  "message" => "192.168.0.1 - - [18/Jul/2022:11:20:28 +0000] \"GET / HTTP/1.1\" 200 15 \"-\" \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36\" \"-\"",
  "response" => 200,
  "agent" => {
    "version" => "7.3.2",
    "ephemeral_id" => "0c38336d-1e30-4aaa-9ba8-20bd7bd8fb48",
    "type" => "filebeat",
    "hostname" => "9a852bd136fd",
    "id" => "8991142a-95df-4aed-a190-bda4649c04cd"
  },
  "input" => {
    "type" => "log"
  },
  "request" => "/",
  "extra_fields" => " \"-\"",
  "log" => {
    "file" => {
      "path" => "/var/log/nginx/access.log"
    },
    "offset" => 11021
  },
  "ecs" => {
    "version" => "1.0.1"
  }
}
What I need is a field containing the whole http_user_agent content. Any idea what is causing the error?
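A likely culprit (an assumption based on the output above, not something confirmed in this thread): %{COMBINEDAPACHELOG} captures the user-agent string into a field named agent, but Filebeat already ships its own agent metadata object (visible in the output), so the two collide and useragent { source => "agent" } has no string to parse. A minimal sketch that sidesteps the collision by renaming the Beats object before grok runs:
filter {
  # move the Filebeat metadata out of the way so grok can set [agent]
  mutate {
    rename => { "[agent]" => "[beat_agent]" }
  }
  grok {
    match => [ "message" , "%{COMBINEDAPACHELOG}+%{GREEDYDATA:extra_fields}" ]
  }
  useragent {
    source => "agent"
    target => "user_agent"   # keep the parsed UA fields under their own key
  }
}
Alternatively, the first pipeline on this page uses useragent { source => "message" } and its output does include the parsed UA fields (os, os_full, device), so that is another route.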

Logstash change type format

I have a RoR application in whose admin dashboard an admin can observe the locations of his employees. In my case, I use ELK to gather employee information containing latitude and longitude, which is sent to my map as an employee moves. My problem: Logstash creates a daily index based on a template, but recently I found that every field in my index has its type changed to text when the index is created.
This is the JSON that Logstash reads:
{"driver_id": 31,"driver_email": "ankith.ravindran#mailinator.com","location": {"latitude": "-35.2824767","longitude": "149.1326453"},"created_at": "2021-06-29 14:28:47", "required_matches": 1, "type": "location"}
This is my logstash.conf file:
input {
  file {
    path => ["/usr/share/logstash/MPD_LOCATION/*",
             "/usr/share/logstash/MPD_LOCATION/*/*",
             "/usr/share/logstash/MPD_LOCATION/*/*/*",
             "/usr/share/logstash/MPD_LOCATION/*/*/*/*",
             "/usr/share/logstash/MPD_LOCATION/*/*/*/*/*"]
    start_position => "beginning"
    type => "json"
    sincedb_path => "/dev/null"
  }
}
filter {
  mutate {
    gsub => ["message","/}+({)/", "}::{"]
  }
  mutate {
    gsub => ["message","/}+( )/", "}::"]
  }
  split {
    field => "message"
    terminator => "::"
  }
  json { source => "message" }
  mutate {
    add_field => { "uuid" => "D%{driver_id}T%{created_at}" }
    rename => {
      "[location][latitude]" => "[location][lat]"
      "[location][longitude]" => "[location][lon]"
    }
    convert => {
      "[location][lat]" => "float"
      "[location][lon]" => "float"
    }
  }
}
output {
  if ([type] == "location") {
    elasticsearch {
      hosts => "http://elasticsearch:9200"
      index => "live_locations_%{+YYYY_MM_dd}"
      # manage_template => true
      template => "/usr/share/logstash/Template/live_locations.json"
      template_name => "live_locations"
      # template_overwrite => true
      document_id => "%{uuid}"
    }
  } else if ([type] == "app_info") {
    elasticsearch {
      hosts => "http://elasticsearch:9200"
      index => "app_info_%{+YYYY_MM_dd}"
      document_id => "%{uuid}"
    }
  }
  stdout { codec => rubydebug }
}
This is my template file:
{
  "settings": {
    "index": {
      "number_of_shards": 5,
      "number_of_replicas": 1
    }
  },
  "mappings": {
    "properties": {
      "driver_id": { "type": "integer" },
      "email": { "type": "text" },
      "location": { "type": "geo_point" },
      "app-platform": { "type": "text" },
      "app-version": { "type": "text" },
      "created_at": { "type": "date", "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis" },
      "required_matches": { "type": "integer" }
    }
  }
}
For example, I defined created_at as date, but when the index is created the field comes back as text, and I can't understand what happened; likewise the location field comes back as float, so I cannot use it as a geo_point. I should add that I use ELK 7.13 on Docker.
Update: I have two types of JSON; one returns only the employee's location, and the other returns only the app_version and app_platform the employee used.
Update 2: I changed my input from Logstash to Filebeat, but I still have the same problem.
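One thing worth checking (an assumption from the snippet above, not something confirmed in this thread): a legacy 7.x index template must declare index_patterns, and the template file shown has none, so Elasticsearch would never apply it to the daily live_locations_* indices, and dynamic mapping would then infer text/float. A minimal sketch of the missing key:
{
  "index_patterns": ["live_locations_*"],
  "settings": { "index": { "number_of_shards": 5, "number_of_replicas": 1 } },
  "mappings": {
    "properties": {
      "location": { "type": "geo_point" },
      "created_at": { "type": "date", "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis" }
    }
  }
}
Uncommenting template_overwrite => true in the output block would then let Logstash replace any previously installed version of the template.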

How to check elasticsearch tokens after running a query in Rails?

My problem is the following:
I run an Elasticsearch query in a Rails app using specific settings for my index and my search analyzer. The problem is that it doesn't return any results in the app; on the other hand, when I run the analysis directly against my Elasticsearch Docker container, tokens are returned. If I use these tokens in my app query, I get results...
This is my Elasticsearch _analyze call:
curl -XGET 'localhost:9200/development-stoot-services/_analyze?analyzer=search_francais' -d 'cours de guitare'
{"tokens":[{"token":"cour","start_offset":0,"end_offset":5,"type":"<ALPHANUM>","position":1},{"token":"guitar","start_offset":9,"end_offset":16,"type":"<ALPHANUM>","position":3}]}
Here is the query my Rails app sends to Elasticsearch:
query = {
  "query" : {
    "bool" : {
      "must" : [
        {
          "range" : {
            "deadline" : {
              "gte" : "2016-05-26T10:27:19+02:00"
            }
          }
        },
        {
          "terms" : {
            "state" : [ "open" ]
          }
        },
        {
          "query_string" : {
            "query" : "cours de guitare",
            "default_operator" : "AND",
            "fields" : [
              "title",
              "description",
              "brand",
              "category_name"
            ]
          }
        }
      ]
    }
  },
  "filter" : {
    "and" : [
      {
        "geo_distance" : {
          "distance" : "40km",
          "location" : {
            "lat" : 48.855736,
            "lon" : 2.32927300000006
          }
        }
      }
    ]
  },
  "sort" : [
    { "created_at" : "desc" }
  ]
}
The last query does not return any results, but if I try a query with the tokens returned by Elasticsearch ('cour', 'guitar'), I get the expected results. So I guess there is a problem between Rails and Elasticsearch that I can't find...
Can anyone help with that?
Try to modify your query like this, i.e. you need to specify the search_francais analyzer in your query_string in order to analyze cours de guitare the same way you did with the _analyze endpoint:
...
{
  "query_string" : {
    "query" : "cours de guitare",
    "default_operator" : "AND",
    "analyzer" : "search_francais",      <--- add this line
    "fields" : [
      "title",
      "description",
      "brand",
      "category_name"
    ]
  }
},
...
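To double-check which terms a query actually produces, the _validate API with explain can help (a hedged example against the same index; the exact output format varies by Elasticsearch version):
curl -XGET 'localhost:9200/development-stoot-services/_validate/query?explain' -d '{
  "query" : {
    "query_string" : {
      "query" : "cours de guitare",
      "default_operator" : "AND",
      "analyzer" : "search_francais",
      "fields" : [ "title", "description", "brand", "category_name" ]
    }
  }
}'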
