Logstash grok - Apache access log - parsing

I'd like to parse my Apache access log, but I think there is something wrong with my grok pattern.
This is the kind of log entry I want to analyze:
"message": "221.251.246.139 - - [14/Sep/2020:04:56:04 +0000] \"POST /services/api/agent/liveview_image_upload HTTP/1.1\" 200 45 \"-\" \"Mozilla/5.0 (Windows NT 10.0; Trident/7.0; rv:11.0) like Gecko\" 6019 234 1618 \"-\""
Here is the grok pattern I've been using to parse that message:
%{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})) %{NUMBER:response} (?:%{NUMBER:bytes})
With this pattern, parsing works up to the first \"-\". I want to parse the rest of the message as well.
How can I parse the rest? And is what I've done so far correct?
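As a rough cross-check outside Logstash, the same line can be matched with a plain regular expression. The sketch below is in Python and is not a grok pattern; it only shows which fields follow the byte count in the sample (a quoted referrer, a quoted user agent, three numbers, and one more quoted value). The names extra1 to extra4 are placeholders, because the question does not say what those trailing columns mean. Note also that the pattern as posted has no closing double quote after the HTTP version, while the sample line does.

```python
import re

# Field names after "agent" (extra1..extra4) are placeholders: the question
# does not say what the trailing columns of this custom log format mean.
ACCESS_LINE = re.compile(
    r'^(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\w+) (?P<request>\S+) HTTP/(?P<httpversion>[\d.]+)" '
    r'(?P<response>\d+) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)" '
    r'(?P<extra1>\d+) (?P<extra2>\d+) (?P<extra3>\d+) "(?P<extra4>[^"]*)"$'
)


def parse_access_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = ACCESS_LINE.match(line)
    return match.groupdict() if match else None


# The sample message from the question, with the JSON escaping removed.
sample = (
    '221.251.246.139 - - [14/Sep/2020:04:56:04 +0000] '
    '"POST /services/api/agent/liveview_image_upload HTTP/1.1" 200 45 "-" '
    '"Mozilla/5.0 (Windows NT 10.0; Trident/7.0; rv:11.0) like Gecko" '
    '6019 234 1618 "-"'
)
fields = parse_access_line(sample)
```

Each named group here would correspond to one more capture appended to the grok pattern, in the same order.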

Related

Grails special characters in URL not working

I am trying to call some URLs that contain special characters, but it does not work.
This works:
GET .../rest/validation/checknameunique/?className=lomnido.Template&rename=true&name=Templaa%3Ea
This does not: PUT ../rest/template/rename/526/Templaa%3Ea
For that request I get a 400 back from Grails.
The NGINX log contains this entry:
213.162.73.171 - - [22/Apr/2022:13:16:32 +0000] "PUT /rest/template/rename/28484/Bla%3Eaa HTTP/1.1" 400 2307 "https://mytest.com/configuration/template/28484" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36"
When I debug this, the request does not reach the Security Interceptor (all requests go through this).
What is wrong here?
Best regards,
Peter

Splunk AWS ALB logs not properly parsing

I'm trying to ingest my AWS ALB logs into Splunk. I can search the ALB logs in Splunk, but the events are still not parsed properly. Has anyone had a similar issue, or does anyone have a suggestion?
Here is my props.conf:
[aws:alb:accesslogs]
SHOULD_LINEMERGE=false
FIELD_DELIMITER = whitespace
pulldown_type=true
FIELD_NAMES=type,timestamp,elb,client_ip,client_port,target,request_processing_time,target_processing_time,response_processing_time,elb_status_code,target_status_code,received_bytes,sent_bytes,request,user_agent,ssl_cipher,ssl_protocol,target_group_arn,trace_id
EXTRACT-elb = ^\s*(?P<type>[^\s]+)\s+(?P<timestamp>[^\s]+)\s+(?P<elb>[^\s]+)\s+(?P<client_ip>[0-9.]+):(?P<client_port>\d+)\s+(?P<target>[^\s]+)\s+(?P<request_processing_time>[^\s]+)\s+(?P<target_processing_time>[^\s]+)\s+(?P<response_processing_time>[^\s]+)\s+(?P<elb_status_code>[\d-]+)\s+(?P<target_status_code>[\d-]+)\s+(?P<received_bytes>\d+)\s+(?P<sent_bytes>\d+)\s+"(?P<request>.+)"\s+"(?P<user_agent>.+)"\s+(?P<ssl_cipher>[-\w]+)\s*(?P<ssl_protocol>[-\w\.]+)\s+(?P<target_group_arn>[^\s]+)\s+(?P<trace_id>[^\s]+)
EVAL-rtt = request_processing_time + target_processing_time + response_processing_time
Sample data
https 2020-08-20T12:40:00.274478Z app/my-aws-alb/e7538073dd1a6fd8 162.158.26.188:21098 172.0.51.37:80 0.000 0.004 0.000 405 405 974 424 "POST https://my-aws-alb-domain:443/api/ps/fpx/callback HTTP/1.1" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.2840.91 Safari/537.36" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2 arn:aws:elasticloadbalancing:ap-southeast-1:111111111111:targetgroup/my-aws-target-group/41dbd234b301e3d84 "Root=1-5f3e6f20-3fdasdsfffdsf" "api.mydomain.com" "arn:aws:acm:ap-southeast-1:11111111111:certificate/be4344424-a40f-416e-8434c-88a8a3b072f5" 0 2020-08-20T12:40:00.270000Z "forward" "-" "-" "172.0.51.37:80" "405" "-" "-"
Using transforms is pretty straightforward. Start with a stanza in transforms.conf.
[elb]
REGEX = ^\s*(?P<type>[^\s]+)\s+(?P<timestamp>[^\s]+)\s+(?P<elb>[^\s]+)\s+(?P<client_ip>[0-9.]+):(?P<client_port>\d+)\s+(?P<target>[^\s]+)\s+(?P<request_processing_time>[^\s]+)\s+(?P<target_processing_time>[^\s]+)\s+(?P<response_processing_time>[^\s]+)\s+(?P<elb_status_code>[\d-]+)\s+(?P<target_status_code>[\d-]+)\s+(?P<received_bytes>\d+)\s+(?P<sent_bytes>\d+)\s+"(?P<request>.+)"\s+"(?P<user_agent>.+)"\s+(?P<ssl_cipher>[-\w]+)\s*(?P<ssl_protocol>[-\w\.]+)\s+(?P<target_group_arn>[^\s]+)\s+(?P<trace_id>[^\s]+)
Then refer to the transform in props.conf
[aws:alb:accesslogs]
TIME_PREFIX = https\s
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%6N%Z
MAX_TIMESTAMP_LOOKAHEAD = 32
SHOULD_LINEMERGE=false
NO_BINARY_CHECK=true
TRANSFORMS-elb = elb
EVAL-rtt = request_processing_time + target_processing_time + response_processing_time
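Before deploying, the regex can be sanity-checked outside Splunk. The sketch below uses Python's re module, which accepts the (?P<name>...) syntax used in this pattern; it is not how Splunk evaluates the transform, only a quick way to see which substring lands in which field. The sample line is the one from the question, cut off after the domain name field, and the rtt sum mirrors the EVAL-rtt expression.

```python
import re

# The REGEX from the [elb] stanza, split across lines only for readability.
ALB_PATTERN = re.compile(
    r'^\s*(?P<type>[^\s]+)\s+(?P<timestamp>[^\s]+)\s+(?P<elb>[^\s]+)'
    r'\s+(?P<client_ip>[0-9.]+):(?P<client_port>\d+)\s+(?P<target>[^\s]+)'
    r'\s+(?P<request_processing_time>[^\s]+)\s+(?P<target_processing_time>[^\s]+)'
    r'\s+(?P<response_processing_time>[^\s]+)\s+(?P<elb_status_code>[\d-]+)'
    r'\s+(?P<target_status_code>[\d-]+)\s+(?P<received_bytes>\d+)\s+(?P<sent_bytes>\d+)'
    r'\s+"(?P<request>.+)"\s+"(?P<user_agent>.+)"\s+(?P<ssl_cipher>[-\w]+)'
    r'\s*(?P<ssl_protocol>[-\w\.]+)\s+(?P<target_group_arn>[^\s]+)\s+(?P<trace_id>[^\s]+)'
)

# Sample event from the question, truncated after the domain name field.
sample = (
    'https 2020-08-20T12:40:00.274478Z app/my-aws-alb/e7538073dd1a6fd8 '
    '162.158.26.188:21098 172.0.51.37:80 0.000 0.004 0.000 405 405 974 424 '
    '"POST https://my-aws-alb-domain:443/api/ps/fpx/callback HTTP/1.1" '
    '"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/32.0.2840.91 Safari/537.36" '
    'ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2 '
    'arn:aws:elasticloadbalancing:ap-southeast-1:111111111111:'
    'targetgroup/my-aws-target-group/41dbd234b301e3d84 '
    '"Root=1-5f3e6f20-3fdasdsfffdsf" "api.mydomain.com"'
)

match = ALB_PATTERN.match(sample)
fields = match.groupdict() if match else {}

# Same arithmetic as EVAL-rtt, done on the extracted strings.
rtt = sum(
    float(fields[name])
    for name in (
        "request_processing_time",
        "target_processing_time",
        "response_processing_time",
    )
) if fields else None

# trace_id is captured with [^\s]+, so it keeps its surrounding quotes.
for name, value in fields.items():
    print(f"{name} = {value}")
```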

Yandex-tank add cookie and Host headers to requests from access log

I have an nginx access.log that includes cookies:
99.20.231.22 www.carite.com - [01/Dec/2015:03:00:10 -0600] "GET /?mode=_ajax&_imod[]=i159330&make=Mercedes-Benz&_=1448960297171 HTTP/1.1" 200 1182 "http://www.carite.com/" "Mozilla/5.0 (iPhone; CPU iPhone OS 9_1 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 Mobile/13B143 Safari/601.1" "PHPSESSID=ebg5n89m9pc1iamekii1qra5k0; chooseStoreNotificationShown=1; dfa_visit=1448960180633603603; dfa_visitor=1448960180633796491; mod-compare-box=%7B%22vehicles%22%3A%7B%22v11279294%22%3A%7B%22vuid%22%3A%2211279294%22%2C%22isCompared%22%3Afalse%7D%7D%2C%22compareAll%22%3Atrue%2C%22cookieLifeTime%22%3A30%2C%22cookiePath%22%3A%22%5C%2F%22%7D; _ga=GA1.2.10339867.1448960182; _gali=make; _gat_a1=1; _gat_a2=1; _gat_a3=1; _gat_a4=1; usy46gabsosd=collserve__-2_1448960382693_8786" 80 0.295
Can I tell Yandex-tank to take the cookie from the access log and add it to every request?
I also need it to take the "Host:" header from the access log instead of specifying it in load.ini like this:
headers = [Host: www.carite.com]
You have two options:
1. Make the stepper read cookies along with the URI from access.log (it should be done around https://github.com/yandex/yandex-tank/blob/master/yandextank/stepper/missile.py#L213).
2. Make a separate file from access.log in the https://yandextank.readthedocs.org/en/latest/tutorial.html#uri-style-uris-in-file format. Headers are overridden on the go, so you can redefine headers anywhere.
For example it could be like this:
[Host: www.carite.com]
[Cookie: PHPSESSID=ebg5n89m9pc1iamekii1qra5k0; chooseStoreNotificationShown=1; dfa_visit=1448960180633603603; dfa_visitor=1448960180633796491; ...]
/?mode=_ajax&_imod[]=i159330&make=Mercedes-Benz&_=1448960297171
...
[Host: example.com]
[Cookie: myowncookie=1]
/something
...
I would advise using the second option, as it is the easiest.
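A minimal sketch of the second option (a separate uri-style file) in Python follows. It assumes the log layout seen in the single sample line from the question (host in the second column, cookie in the third quoted field after the status and byte count); the regex and the function name to_ammo_lines are illustrative and are not part of Yandex-tank. The sample line is a shortened version of the one in the question.

```python
import re

# Assumed layout, inferred from the single sample line in the question:
# ip host - [time] "METHOD uri PROTO" status bytes "referer" "agent" "cookie" ...
LOG_LINE = re.compile(
    r'^\S+ (?P<host>\S+) \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<uri>\S+) [^"]*" '
    r'\d{3} \S+ "[^"]*" "[^"]*" "(?P<cookie>[^"]*)"'
)


def to_ammo_lines(log_lines):
    """Yield uri-style ammo lines: header overrides followed by the URI."""
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match is None or match["method"] != "GET":
            continue  # this sketch keeps GET requests only
        yield "[Host: {}]".format(match["host"])
        if match["cookie"] not in ("", "-"):
            yield "[Cookie: {}]".format(match["cookie"])
        yield match["uri"]


# Shortened version of the log line from the question.
sample_log = [
    '99.20.231.22 www.carite.com - [01/Dec/2015:03:00:10 -0600] '
    '"GET /?mode=_ajax&make=Mercedes-Benz HTTP/1.1" 200 1182 '
    '"http://www.carite.com/" "Mozilla/5.0 (iPhone)" '
    '"PHPSESSID=ebg5n89m9pc1iamekii1qra5k0; _gat_a1=1" 80 0.295'
]
ammo = list(to_ammo_lines(sample_log))
print("\n".join(ammo))
```

Because headers stay in effect until they are redefined, a request logged without a cookie would inherit the previous Cookie header in this sketch; handling that case is left out.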

Weird characters in URL

On my web server, when a user requests a URL with weird characters, I remove those characters and the system logs the case. When I checked the sanitized cases I found the ones below. I'm curious what the purpose of these URLs might be.
I checked the IPs: these are real people who use the website normally. But about 1 in 20 of their URL requests has these weird characters at the end.
http://example.com/#%EF%BF%BD%EF%BF%BD%02?o=3&g=&s=&z=%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%3E?, agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0,
http://example.com/%60E%EF%BF%BD%02?o=3&g=&s=&z=%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%3E?, agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0
http://example.com/%60E%EF%BF%BD%02?o=3&g=&s=&z=%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%3E?, agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0
http://example.com/p%EF%BF%BD%1D%01?o=3&g=&s=&z=%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%3E?, agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0
http://example.com/%EF%BF%BDC%EF%BF%BD%02?o=3&g=&s=&z=%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%3E?, agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0
http://example.com/%EF%BF%BDR%EF%BF%BD%02?o=3&g=&s=&z=%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD`%EF%BF%BD%EF%BF%BD%7F, agent: Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36
http://example.com/%EF%BF%BDe%EF%BF%BDv8%01%EF%BF%BD?o=3&g=P%01%EF%BF%BD&s=&z=%EF%BF%BD%EF%BF%BD%15%01%EF%BF%BD%EF%BF%BD, agent: Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.114 Safari/537.36
http://en.wikipedia.org/wiki/Specials_(Unicode_block)
They are essentially malformed URLs. They can be generated by malware that is trying to exploit website vulnerabilities, by a malfunctioning browser plugin or extension, or by a bug in a JS file (e.g. tracking with Google Analytics) in combination with a specific browser version or operating system. In any case, you can't control what requests come from a client and there is nothing you can do to stop them, so if your generated HTML/JS code is correct, you have done your work.
If you want to correct those URLs for any reason, you can enable URL rewriting and set a rule with a regular expression filter to transform them into valid URLs. I don't recommend doing that, though: the web server should respond with a 404 "page not found" error, because that is the standard (it's a client error, after all), and in my opinion this is faster and safer than applying URL rewriting. (The rewriting procedure may contain bugs, which someone could try to exploit.)
For the sake of curiosity, you can easily decode those URLs with an online URL decoder of your choice, but essentially you will discover what you already know: there are a lot of UTF-8 replacement characters in those URLs.
In fact, %EF%BF%BD is the URL-encoded version of the hex representation of the 3 bytes (EF BF BD) of the UTF-8 replacement character. You can also see that character as � or EF BF BD or FFFD or ï ¿ ½, and so on, depending on the representation method you choose.
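The relationship between these representations can be checked with a few lines of Python. This is only an illustration using the standard library; the two invalid bytes at the end are an arbitrary example of input that is not valid UTF-8.

```python
from urllib.parse import quote, unquote

REPLACEMENT_CHAR = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER

# The character's UTF-8 encoding is the three bytes EF BF BD.
utf8_bytes = REPLACEMENT_CHAR.encode("utf-8")

# Percent-encoding those bytes gives the sequence seen in the logged URLs.
percent_encoded = quote(REPLACEMENT_CHAR)

# Decoding the sequence from the URLs gives the character back.
decoded = unquote("%EF%BF%BD")

# Decoding bytes that are not valid UTF-8 with errors="replace" is one way
# such characters come into existence in the first place.
lossy = b"\xff\xfe".decode("utf-8", errors="replace")

print(utf8_bytes.hex(" "), percent_encoded, decoded == REPLACEMENT_CHAR, lossy)
```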
You can also check for yourself how the client handles that character. Go here:
http://www.cogsci.ed.ac.uk/~richard/utf-8.cgi?input=%EF%BF%BD&mode=char
press the GO button and, using your browser developer tools, check what really happens: the browser is actually encoding the unknown character with %EF%BF%BD before sending it to the web server.
These look like corrupted URLs being inserted by a piece of Malware/Adware called "Adpeak".
Here are some details on Adpeak:
How to remove AdPeak lqw.me script from my web pages?
Adpeak has a client side component that sticks the following tag into web pages:
<script type="text/javascript" id="2f2a695a6afce2c2d833c706cd677a8e" src="http://d.lqw.me/xuiow/?g=7FC3E74A-AFDA-0667-FB93-1C86261E6E1C&s=4150&z=1385998326"></script>
Adpeak also sometimes uses the host names "d.sitespeeds.com", "d.jazzedcdn.com", "d.deliversuper.com", "d.blazeapi.com", "d.quikcdn.com", probably others. Here are a few more examples:
<script type="text/javascript" id="2f2a695a6afce2c2d833c706cd677a8e" src="http://d.deliversuper.com/xuiow/?o=3&g=823F0056-D574-7451-58CF-01151D4A9833&s=7B0A8368-1A6F-48A5-B236-8BD61816B3F9&z=1399243226"></script>
<script type="text/javascript" id="2f2a695a6afce2c2d833c706cd677a8e" src="http://d.jazzedcdn.com/xuiow/?o=3&g=B43EA207-C6AC-E01B-7865-62634815F491&s=B021CBBD-E38E-4F8C-8E93-6624B0597A23&z=1407935653"></script>
<SCRIPT id=2f2a695a6afce2c2d833c706cd677a8e type=text/javascript src="http://d.lqw.me/xuiow/?o=3&g=87B35A3E-C25D-041E-0A0F-C3E8E473A019&s=BBA5481A-926B-4561-BD79-249F618495E6&z=1393532281"></SCRIPT>
<SCRIPT id=2f2a695a6afce2c2d833c706cd677a8e type=text/javascript src="http://d.lqw.me/xuiow/?o=2&g=0AD3E5F2-B632-382A-0473-4C994188DBBA&s=9D0EB5E9-CCC9-4360-B7CA-3E645650CC53&z=1387549919"></SCRIPT>
The "id" is consistent: it's always "2f2a695a6afce2c2d833c706cd677a8e" in the cases we've seen.
There's always a "g", "s", and "z" parameter, and sometimes an "o" parameter with a value of 2 or 3.
We've noticed that with our pages, a certain version of this script is 100% correlated with seeing corrupted characters in the DOM: if "o" is omitted or set to 2, we'll see a Unicode FFFD injected near the end of the page or sometimes a Ux000E character, a.k.a. SHIFT OUT, which blows up standard JSON/XML serialization libraries, which is why we've been researching these URLs. We've never seen a corruption for "o=3"
However, sometimes it looks like Adpeak gets confused, and inserts junk like this:
<script type="text/javascript" id="2f2a695a6afce2c2d833c706cd677a8e" src="��?o=3&g=&s=&z=����������~?"></script>
Now, we don't know that this is Adpeak, because the URLs are mangled, but the "o=3", "g", "s", and "z" parameters are four smoking guns. The host is missing here, so it will resolve against our server, so these UxFFFDs will get sent up as UTF-8 hex-encoded "%EF%BF%BD" sequences, which are identical to what people have been seeing above.
If you're curious about how common this is, for a particular customer with high traffic and a wide demographic, we see Adpeak URLs injected into about 1.09% of their web pages, both well-formed Adpeak URLs as well as URLs with UxFFFD's. If you just look for Adpeak URLs with UxFFFD sequences, those appear in 0.053% of all web pages. And if you just look for Adpeak URLs that cause DOM corruptions (e.g., the valid URLs that contain "o=2" or no "o" parameter), that covers 0.20% of all web pages.
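To look for the mangled variant in your own request logs, a simple heuristic is to flag URLs that carry the "g", "s", and "z" parameters together with at least one %EF%BF%BD sequence. The Python sketch below does only that; the function name is made up for illustration, and a hit is a hint rather than proof that Adpeak is involved.

```python
from urllib.parse import parse_qs

REPLACEMENT_SEQ = "%EF%BF%BD"  # percent-encoded U+FFFD


def looks_like_mangled_adpeak_url(url):
    """True if the URL has g, s and z parameters plus a %EF%BF%BD sequence."""
    # Split on the first "?" by hand so a stray "#" in the path is ignored.
    _, sep, query = url.partition("?")
    if not sep:
        return False
    params = parse_qs(query, keep_blank_values=True)
    has_signature = {"g", "s", "z"}.issubset(params)
    return has_signature and REPLACEMENT_SEQ in url.upper()


urls = [
    # Shortened version of a mangled URL from the question.
    "http://example.com/%60E%EF%BF%BD%02?o=3&g=&s=&z=%EF%BF%BD%EF%BF%BD%3E?",
    # Well-formed Adpeak script URL: right parameters, no replacement chars.
    "http://d.lqw.me/xuiow/?g=7FC3E74A-AFDA-0667-FB93-1C86261E6E1C&s=4150&z=1385998326",
    # Replacement character but unrelated parameters.
    "http://example.com/%EF%BF%BD?page=2",
]
flags = [looks_like_mangled_adpeak_url(u) for u in urls]
print(flags)
```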
Your site's character set is probably not set to UTF-8, but when a page on the site is requested, the characters are assumed to be encoded in UTF-8. When it turns out that they are not, every character that cannot be decoded is replaced with the byte sequence EF BF BD (a placeholder character).
Make sure you use UTF-8 everywhere on your site by putting <meta charset="UTF-8"> in every page.
Another example for this in a different situation: Whats going on with this byte array?
You have to use regular expression functions; search for them on the official PHP site or on Google.
URLs in languages other than English are causing this problem.
The meta charset UTF-8 tag will not affect the URL, so it won't help. It only helps you display text in other languages on your web page, not in your URL.
Using PHP regex you can show even Chinese text in a URL.
Hope it works.
Just uncheck the EnableBrowserLink option in Visual Studio. Everything will work out of the box.

Malicious requests from search.live.com hitting production server

I've been checking my production.log today and there's a number of requests hitting my site that appear to be malicious, but I'm confused as to how they're even getting to us.
For example:
Processing PublicController#unknown_request (for 217.23.4.13 at 2009-11-09 09:15:52) [GET]
Parameters: {"anything"=>["results.aspx"], "action"=>"unknown_request", "first"=>"200", "controller"=>"public", "q"=>"\"bbs/cbbs.cgi?\" intitle:\"Book\" intext:\"2008\" site:.uz ", "count"=>"200", "FORM"=>"PERE"}
Completed in 16ms (View: 12, DB: 0) | 200 OK [http://search.live.com/results.aspx?q=%22bbs/cbbs.cgi%3F%22%20intitle%3A%22Book%22%20intext%3A%222008%22%20site%3A.uz%20&count=200&first=200&FORM=PERE]
These are happening every 30 seconds or so. PublicController#unknown_request is the controller action that handles my 404 errors.
The access log shows these requests as:
217.23.4.13 - - [09/Nov/2009:09:57:25 +1000] "GET http://search.live.com/results.aspx?q=%22en-gb.html%22%20intitle%3A%22Home%22%20intext%3A%222006%22%20site%3A.mn%20&count=200&first=400&FORM=PERE HTTP/1.1" 200 3626 "-" "Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.1$
How are these requests even hitting my site? Does anyone have any ideas?
I think this might be the same problem you're having: http://penguinpetes.com/b2evo/index.php?p=567&more=1&c=1&tb=1&pb=1
Basically, Live/Bing is doing some sort of testing that involves visiting your site as if someone had searched for something completely irrelevant to your content.
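One thing that sets these entries apart in the access log is that the request target is a full URL (http://search.live.com/...) rather than a path. A small Python sketch for listing such lines is shown below; it assumes the usual quoted "METHOD target HTTP/x.y" request field, and the second sample line (from 10.0.0.1) is made up to show an ordinary request for contrast.

```python
import re

# Request field of an access log entry: "METHOD target HTTP/x.y"
REQUEST_LINE = re.compile(r'"(?P<method>[A-Z]+) (?P<target>\S+) HTTP/[\d.]+"')


def absolute_url_requests(log_lines):
    """Yield (client_ip, target) for requests whose target is a full URL."""
    for line in log_lines:
        match = REQUEST_LINE.search(line)
        if match and match["target"].lower().startswith(("http://", "https://")):
            # The client address is the first space-separated column.
            yield line.split(" ", 1)[0], match["target"]


sample_log = [
    # Shortened version of the entry from the question.
    '217.23.4.13 - - [09/Nov/2009:09:57:25 +1000] '
    '"GET http://search.live.com/results.aspx?q=test&count=200 HTTP/1.1" '
    '200 3626 "-" "Mozilla/5.0"',
    # Made-up ordinary request for contrast.
    '10.0.0.1 - - [09/Nov/2009:09:57:26 +1000] '
    '"GET /index.html HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]
hits = list(absolute_url_requests(sample_log))
print(hits)
```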

Resources