In my webserver when user requests URLs with weird characters, I remove these characters. And system logs these cases. When I check sanitized cases I found these. I'm curious that what would be the objective of these URLs ?
I check the IPs and these are real people and uses website as a normal person. But 1 time in their 20 URL requets of these people, URL has these weird characters at last.
http://example.com/#%EF%BF%BD%EF%BF%BD%02?o=3&g=&s=&z=%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%3E?, agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0,
http://example.com/%60E%EF%BF%BD%02?o=3&g=&s=&z=%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%3E?, agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0
http://example.com/%60E%EF%BF%BD%02?o=3&g=&s=&z=%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%3E?, agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0
http://example.com/p%EF%BF%BD%1D%01?o=3&g=&s=&z=%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%3E?, agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0
http://example.com/%EF%BF%BDC%EF%BF%BD%02?o=3&g=&s=&z=%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%3E?, agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0
http://example.com/%EF%BF%BDR%EF%BF%BD%02?o=3&g=&s=&z=%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD`%EF%BF%BD%EF%BF%BD%7F, agent: Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36
http://example.com/%EF%BF%BDe%EF%BF%BDv8%01%EF%BF%BD?o=3&g=P%01%EF%BF%BD&s=&z=%EF%BF%BD%EF%BF%BD%15%01%EF%BF%BD%EF%BF%BD, agent: Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.114 Safari/537.36
http://en.wikipedia.org/wiki/Specials_(Unicode_block)
They are essentially malformed URLs. They can be generated from a specific malware that is trying to exploit web site vulnerabilities, from malfunctioning browser plugin or extension, or from a bug in a JS file (i.e. tracking with Google Analytics) in combination with a specific browser version/operating system. In any case, you can't actually control what requests will come from a client and there's nothing you can do to stop that so, if your generated HTML/JS code is correct, you have done your work.
If you like to correct those URLs for any reason, you can enable URL rewriting and set a rule with a regular expression filter to transform those URLs to valid URLs. Anyway, I don't suggest do that: the web server should respond with a error 404 page not found message, because that is the standard (it's a client error, after all), and this is in my opinion a faster and safer method than applying URL rewriting. (rewriting procedure may contains bugs, so someone can try to exploit that, etc, etc)
For sake of curiosity, you can easily decode those URLs with an online URL decoder of your choice (i.e. this), but essentially you will discover what you already know: there are a lot of UTF-8 replacement characters in those URLs.
In fact, %EF%BF%BD is the url-encoded version of the hex representation of the 3 bytes (EF BF BD) of the UTF-8 replacement character. You can see that character also as � or EF BF BD or FFFD or ï ¿ ½, and so on, depending of the representation method you choose.
Also, you can check by your own how the client handles that character. Go here:
http://www.cogsci.ed.ac.uk/~richard/utf-8.cgi?input=%EF%BF%BD&mode=char
press the GO button and, using your browser developer tools, check what really happens: the browser is actually encoding the unknown character with %EF%BF%BD before sending it to the web server.
These look like corrupted URLs being inserted by a piece of Malware/Adware called "Adpeak".
Here are some details on Adpeak:
How to remove AdPeak lqw.me script from my web pages?
Adpeak has a client side component that sticks the following tag into web pages:
<script type="text/javascript" id="2f2a695a6afce2c2d833c706cd677a8e" src="http://d.lqw.me/xuiow/?g=7FC3E74A-AFDA-0667-FB93-1C86261E6E1C&s=4150&z=1385998326"></script>
Adpeak also sometimes uses the host names "d.sitespeeds.com", "d.jazzedcdn.com", "d.deliversuper.com", "d.blazeapi.com", "d.quikcdn.com", probably others. Here are a few more examples:
<script type="text/javascript" id="2f2a695a6afce2c2d833c706cd677a8e" src="http://d.deliversuper.com/xuiow/?o=3&g=823F0056-D574-7451-58CF-01151D4A9833&s=7B0A8368-1A6F-48A5-B236-8BD61816B3F9&z=1399243226"></script>
<script type="text/javascript" id="2f2a695a6afce2c2d833c706cd677a8e" src="http://d.jazzedcdn.com/xuiow/?o=3&g=B43EA207-C6AC-E01B-7865-62634815F491&s=B021CBBD-E38E-4F8C-8E93-6624B0597A23&z=1407935653"></script>
<SCRIPT id=2f2a695a6afce2c2d833c706cd677a8e type=text/javascript src="http://d.lqw.me/xuiow/?o=3&g=87B35A3E-C25D-041E-0A0F-C3E8E473A019&s=BBA5481A-926B-4561-BD79-249F618495E6&z=1393532281"></SCRIPT>
<SCRIPT id=2f2a695a6afce2c2d833c706cd677a8e type=text/javascript src="http://d.lqw.me/xuiow/?o=2&g=0AD3E5F2-B632-382A-0473-4C994188DBBA&s=9D0EB5E9-CCC9-4360-B7CA-3E645650CC53&z=1387549919"></SCRIPT>
The "id" is consistent: it's always "2f2a695a6afce2c2d833c706cd677a8e" in the cases we've seen.
There's always a "g", "s", and "z" parameter, and sometimes a "o" parameter that has values of 2 or 3.
We've noticed that with our pages, a certain version of this script is 100% correlated with seeing corrupted characters in the DOM: if "o" is omitted or set to 2, we'll see a Unicode FFFD injected near the end of the page or sometimes a Ux000E character, a.k.a. SHIFT OUT, which blows up standard JSON/XML serialization libraries, which is why we've been researching these URLs. We've never seen a corruption for "o=3"
However, sometimes it looks like Adpeak gets confused, and inserts junk like this:
<script type="text/javascript" id="2f2a695a6afce2c2d833c706cd677a8e" src="��?o=3&g=&s=&z=����������~?"></script>
Now, we don't know that this is Adpeak, because the URLs are mangled, but the "o=3", "g", "s", and "z" parameters are four smoking guns. The host is missing here, so it will resolve against our server, so these UxFFFDs will get sent up as UTF-8 hex-encoded "%EF%BF%BD" sequences, which are identical to what people have been seeing above.
If you're curious about how common this is, for a particular customer with high traffic and a wide demographic, we see Adpeak URLs injected into about 1.09% of their web pages, both well-formed Adpeak URLs as well as URLs with UxFFFD's. If you just look for Adpeak URLs with UxFFFD sequences, those appear in 0.053% of all web pages. And if you just look for Adpeak URLs that cause DOM corruptions (e.g., the valid URLs that contain "o=2" or no "o" parameter), that covers 0.20% of all web pages.
Probably your site's character-set is not initialized to UTF-8, but when you request a page in the site it thinks that the character are encoded with utf-8. When it "understands" that the characters are not encoded in UTF-8 format, it replaces any character that it doesn't know with the bytes sequence EF BF BD ("character place keeper").
Make sure you use UTF-8 in everyplace in your site by using <meta charset="UTF-8"> in every page.
Another example for this in a different situation: Whats going on with this byte array?
You have to use Regular Expression Functions, Search for it in php official site or google it...
The url's which are in other languages rather than english are causing this problem,
Meta charset utf 8 will not affect the url,so it wont help..meta charset only helps you to display other languages text on your web page ,not your URL..
using php Regex you can shown even chinese text in url..
Hope it will work ..
just un-check the EnableBrowserLink option in visual studio. Every Thing will work out of box.
Related
I am trying to use the graph api upload feature (in my case it is using Windev Mobile 21).
The files are appearing in the appfolder. They are the right size and have the right extensions but they can not be opened
sCd1 is ANSI string = "Authorization: Bearer"+" "+gsTokenAcc
HTTPCreateForm("driveEnq")
sContentType is ANSI string = "image/jpeg"
HTTPAddFile("driveEnq","File",sAd,sContentType)=False
sEnding is ANSI string
sHTTPRes is string
sUserAgent is ANSI string = "'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.79 Safari/537.36 Edge/14.14393'"
IF HTTPSendForm("driveEnq",sEnding,httpPut,sUserAgent ,sCd1)=True THEN
bufResHTTP is Buffer = HTTPGetResult(httpResult)
I am convinced that this is something to do with the content type or the format by which the files are added
The key to this appears to be to add the file (or fragment) via an empty
HTTPAddParameter("form_name","",sFrag)
Adding the form as a application/XML appears to help with the boundaries of each field.
Hope this helps
i want to crawl an e commerce website using Html agility pack but i am having an issue the html agility pack is getting source of front web page only because when i try to get the source of inner or sub items in that website i am not having that bunch of code in the source that i get from html agility pack.when i click on items then i can see code of submenu items through firebug but not in the actual source that i have.so please guide me or tell me what to do
string url="";
WebClient client = new WebClient();
client.Headers[HttpRequestHeader.UserAgent] = "Mozilla/45.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.80 Safari/537.36";
html = client.DownloadString(url);
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html);
through this code using html agility pack i can only have code of first web page
tested this, it gets the code of the whole website against particular URL
In an application am using spring + thymeleaf. I want to get the user agent for including cetain files.
<%
String browser = request.getHeader("User-Agent")
%>
i need to get this done in a thymeleaf page.How can i do that. Any help will be appreciated
SHANib's answer didn't work for me. getRequest on ServletRequest has no parameters, at least on my version. However,
<span th:text="${#httpServletRequest.getHeader('User-Agent')}">Agent</span>
worked just fine.
you can access the HttpServletRequest object with #httpServletRequest
So, for example, you can print the user agent like this
<span th:text="${#httpServletRequest.getRequest('User-Agent')}">Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Trident/5.0)</span
Previously i was able to download YouTube videos as mp3 via youtube-mp3.org Using this method:
http://www.youtube-mp3.org/api/pushItem/?item=http%3A//www.youtube.com/watch%3Fv%3D<VIDEOID>&xy=_
Then it returned the video id and they started converting the video on their servers. Then this request would return a JSON string with info about the video and the current conversion status:
http://www.youtube-mp3.org/api/itemInfo/?video_id=<VIDEOID>&adloc=
After repeating the request until the value for status is 'serving' I then started the last request by taking the value for key h from the JSON response from the previous request, and this would download a the mp3 file.
http://www.youtube-mp3.org/get?video_id=<VIDEOID>&h=<JSON string value for h>
Now the first request always returns nothing. The second and third requests only succeed if the requested video is cached on their servers (like popular music videos). If thats not the case then the second request would return nil and so the 3rd request can't be started because of the missing hvalue from the second request. Could anybody help me with getting the website to start a conversion something needs to be wrong with the first URL i just dont know what. Thanks
I just tested it. For the first request, you need to send with it a header of:
Accept-Location: *
Otherwise, it will return a 500 (Internal Server Error). But with that header, it will return a string of the youtube video id, and you can use the 2nd api for checking the progress.
Here's the C# code I used for testing:
HttpWebRequest wr = (HttpWebRequest)WebRequest.Create("FIRST_API_URL");
wr.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.7 (KHTML, like Gecko) Chrome/16.0.912.75 Safari/535.7";
wr.Headers.Add("Accept-Location", "*");
string res = (new StreamReader(wr.GetResponse().GetResponseStream())).ReadToEnd();
Btw, you can keep track of the headers in the browser's Network (Chrome) debug tab.
Regards
Is there a way to get the following data from the Application_Error event in the Global.ascx file?
action error came from,
ipaddress error came from,
browser error came from,
browser version error came from,
hostName error came from
??
All that information is contained in the Context.Request property.
Context.Request.Url; // /controller/action?foo=bar so up to you to extract the action
Context.Request.UserHostAddress; // 123.456.789.0123
Context.Request.UserAgent // Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13
And once you are sick of parsing all this crap manually and repeating this code all over again among all your applications you might consider using ELMAH.
hostName error came from
Not sure what you mean here. Isn't that the IP of the client?