I am trying to read the contents of a web page using a Groovy script. The page contains the readings from one of my temperature sensors that I want to save regularly. I have tried the simplest variant:
def url = "https://measurements.mobile-alerts.eu/Home/MeasurementDetails?deviceid=021B5594EAB5&vendorid=60122a8b-b343-49cb-918b-ad2cdd6dff16&appbundle=eu.mobile_alerts.mobilealerts&fromepoch=1674432000&toepoch=1674518400&from=23.01.2023%2000:00&to=24.01.2023%2000:00&command=refresh"
def res = url.toURL().getText()
println( res)
The result is:
Caught: java.io.IOException: Server returned HTTP response code: 403 for URL: (my url)
In any browser, this URL works without problems.
I would be very grateful for any tips on how to solve this problem.
HTTP status 403 (Forbidden) means the server understood the request but refuses to fulfil it. Since the same URL works in a browser, the server is most likely rejecting clients that do not identify themselves as one. Setting a browser-like User-Agent request header usually gets around this.
For example:
def url = 'https://measurements.mobile-alerts.eu/Home/MeasurementDetails?deviceid=021B5594EAB5&vendorid=60122a8b-b343-49cb-918b-ad2cdd6dff16&appbundle=eu.mobile_alerts.mobilealerts&fromepoch=1674432000&toepoch=1674518400&from=23.01.2023%2000:00&to=24.01.2023%2000:00&command=refresh'
def res = url.toURL().getText(requestProperties:
['User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:47.0) Gecko/20100101 Firefox/47.0'])
println res
You can switch to other valid user-agent values.
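The same idea can be sketched outside Groovy. The following is a minimal illustration in Python, assuming only the standard library; the helper names build_request and fetch_text are hypothetical, and the UTF-8 decoding is an assumption about the page rather than something confirmed above.

```python
from urllib.request import Request, urlopen

# The User-Agent string is the one used in the answer above.
BROWSER_UA = "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:47.0) Gecko/20100101 Firefox/47.0"


def build_request(url, user_agent=BROWSER_UA):
    """Return a Request object that carries a browser-like User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})


def fetch_text(url, timeout=10):
    """Send the request and decode the body as UTF-8 (an assumption about the page)."""
    with urlopen(build_request(url), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


# Inspect the header without touching the network.
# urllib stores header names capitalized, hence "User-agent".
req = build_request("https://example.com/")
print(req.get_header("User-agent"))
```

Calling fetch_text with the measurement URL would perform the actual download; whether the server accepts this particular User-Agent is something only a live request can confirm.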
I've been using Radview's Webload IDE tool for a couple of test simulation projects and it has worked well. But in one scenario, where I have a client web session for a login screen, it always fails with a 500 response error for a particular HTTP POST as the page loads.
When I load the page manually in a browser, it works with no issues.
I cleared the browser cache and cookies during recording, with no luck. I've also tried many combinations of the settings under "Recording and Script Generation Options: Post Data".
/***** WLIDE - URL : http://192.168.2.2/ - ID:2 *****/
wlGlobals.GetFrames = false
wlGlobals.UserAgent = "Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; rv:11.0) like Gecko"
wlHttp.Get("http://192.168.2.2/")
// END WLIDE
/***** WLIDE - URL : http://192.168.2.2/Api.ashx?c=Images&action=GetSettings - ID:3 *****/
wlHttp.Header["Referer"] = "http://192.168.2.2/"
wlHttp.FormdataEncodingType = 1
wlHttp.ContentType = "application/x-www-form-urlencoded"
wlHttp.FormData["c"] = "Images"
wlHttp.FormData["action"] = "GetSettings"
wlHttp.Post("http://192.168.2.2/Api.ashx"+"?c=Images&action=GetSettings")
// END WLIDE
Can anybody with experience with Radview's Webload give me some suggestions?
I noticed that commenting out the FormData "c" and "action" lines works. Later, however, I hit a similar error on a request that needs a sessionID in the URL, so I'm not sure whether I can comment out the FormData "sessionID" line as well.
To call the API from Webload, you need to supply authorization if it's secured.
Using wlHttp.FormData is not the same as adding a parameter to the URL for a POST request.
FormData is sent in the body of the POST request, while parameters added to the URL are sent as a query string. Your server probably expects one form but not the other.
Contact RadView support if you can't get it to work and they'll help you.
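To make the difference concrete, here is a small sketch in Python (not Webload script) that builds the same POST in both ways without sending it. The URL and parameter names are taken from the recorded script in the question; everything else is illustrative. Note that the recorded script sends the parameters in both places at once.

```python
from urllib.parse import urlencode
from urllib.request import Request

params = {"c": "Images", "action": "GetSettings"}
encoded = urlencode(params)  # "c=Images&action=GetSettings"

# Variant A: parameters travel only in the query string; the POST body is empty.
query_only = Request(
    "http://192.168.2.2/Api.ashx?" + encoded,
    data=b"",
    method="POST",
)

# Variant B: parameters travel only in the form-encoded request body.
body_only = Request(
    "http://192.168.2.2/Api.ashx",
    data=encoded.encode("ascii"),
    headers={"Content-Type": "application/x-www-form-urlencoded"},
    method="POST",
)

print(query_only.full_url, query_only.data)
print(body_only.full_url, body_only.data)
```

Which variant the server expects is not stated in the question, so trying each one on its own is a reasonable way to narrow down the 500 error.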
I am using Sendgrid to send emails from my Rails app. Sendgrid sends HTTP POST requests back to my app when events occur on the emails that I send, such as when an email is opened.
Sendgrid requires a URL to which the POST requests are sent. Mine is
my_domain.com/contact_processor
My routes.
resources :contact_processor
I know I can define a specific route. However, I used resources and learned that the POST request was looking for a create action.
My terminal shows the params being received.
parameters: {"_json"=>[{"ip"=>"66.249.82.220",
"sg_event_id"=>"YWRREWM4ZmItMzY4YS00MjY1LWE3YTAtOTI0MzcwNTJhMTBj",
"sg_message_id"=>"YZ8_123AQzOXoILstbNB4Q.filter0018p1las1.11190.577E6B3116.0",
"useragent"=>"Mozilla/5.0 (Windows NT 5.1; rv:11.0) Gecko Firefox/11.0 (via
ggpht.com GoogleImageProxy)", "event"=>"open", "foo_id"=>"19",
"email"=>"test#mydomain.com", "timestamp"=>1467905735,
"bar_id"=>"23"}], "contact_processor"=>{}}
I want to access the foo_id, bar_id, and event values so I can use them to update attributes of objects within my app.
What appeared to be a fairly simple task has stumped me.
Any help on how to access these and a bit of an explanation as to what I'm dealing with here would be greatly appreciated.
You can access them in the controller action hit by the callback, just like normal.
1.9.3-p551 :024 > params['_json']
=> [{"ip"=>"66.249.82.220", "sg_event_id"=>"YWRREWM4ZmItMzY4YS00MjY1LWE3YTAtOTI0MzcwNTJhMTBj", "sg_message_id"=>"YZ8_123AQzOXoILstbNB4Q.filter0018p1las1.11190.577E6B3116.0", "useragent"=>"Mozilla/5.0 (Windows NT 5.1; rv:11.0) Gecko Firefox/11.0 (via \n ggpht.com GoogleImageProxy)", "event"=>"open", "foo_id"=>"19", "email"=>"test#mydomain.com", "timestamp"=>1467905735, "bar_id"=>"23"}]
1.9.3-p551 :025 > params['_json'].first['ip']
=> "66.249.82.220"
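As for what you are dealing with: the request body is a JSON array of event objects, and Rails places a JSON body whose top level is not an object under the `_json` key. Since the array can hold more than one event, it is safer to loop over it than to rely on `.first`. The following is a language-neutral illustration in Python rather than Rails controller code; the function name extract_updates is hypothetical and the sample body is trimmed to the keys of interest.

```python
import json

# Sample body modelled on the parameters logged above (trimmed to the relevant keys).
raw_body = '[{"event": "open", "foo_id": "19", "bar_id": "23", "timestamp": 1467905735}]'


def extract_updates(body):
    """Return (foo_id, bar_id, event) for every event in the posted JSON array."""
    events = json.loads(body)
    return [(e.get("foo_id"), e.get("bar_id"), e.get("event")) for e in events]


for foo_id, bar_id, event in extract_updates(raw_body):
    print(foo_id, bar_id, event)  # 19 23 open
```

In the controller, the equivalent would be iterating over params['_json'] and reading the same three keys from each element.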
On my web server, when a user requests a URL containing weird characters, I remove those characters and the system logs the case. When I checked the sanitized cases, I found the requests below. What could be the purpose of these URLs?
I checked the IPs: these are real people who use the website normally. But about 1 in 20 of their requests has a URL with these weird characters at the end.
http://example.com/#%EF%BF%BD%EF%BF%BD%02?o=3&g=&s=&z=%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%3E?, agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0,
http://example.com/%60E%EF%BF%BD%02?o=3&g=&s=&z=%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%3E?, agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0
http://example.com/%60E%EF%BF%BD%02?o=3&g=&s=&z=%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%3E?, agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0
http://example.com/p%EF%BF%BD%1D%01?o=3&g=&s=&z=%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%3E?, agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0
http://example.com/%EF%BF%BDC%EF%BF%BD%02?o=3&g=&s=&z=%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%3E?, agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0
http://example.com/%EF%BF%BDR%EF%BF%BD%02?o=3&g=&s=&z=%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD%EF%BF%BD`%EF%BF%BD%EF%BF%BD%7F, agent: Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36
http://example.com/%EF%BF%BDe%EF%BF%BDv8%01%EF%BF%BD?o=3&g=P%01%EF%BF%BD&s=&z=%EF%BF%BD%EF%BF%BD%15%01%EF%BF%BD%EF%BF%BD, agent: Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.114 Safari/537.36
http://en.wikipedia.org/wiki/Specials_(Unicode_block)
They are essentially malformed URLs. They can be generated by malware that is trying to exploit website vulnerabilities, by a malfunctioning browser plugin or extension, or by a bug in a JS file (e.g. Google Analytics tracking) in combination with a specific browser version or operating system. In any case, you can't control what requests a client sends and there is nothing you can do to stop them, so if your generated HTML/JS code is correct, you have done your job.
If you want to correct those URLs for any reason, you can enable URL rewriting and add a rule with a regular expression filter that transforms them into valid URLs. I don't recommend that, though: the web server should respond with a 404 Not Found error, because that is the standard (it's a client error, after all), and in my opinion this is faster and safer than URL rewriting. (A rewriting rule may contain bugs that someone could try to exploit.)
For the sake of curiosity, you can easily decode those URLs with an online URL decoder of your choice, but you will essentially discover what you already know: there are a lot of UTF-8 replacement characters in those URLs.
In fact, %EF%BF%BD is the URL-encoded form of the 3 bytes (EF BF BD) that encode the Unicode replacement character in UTF-8. You may also see that character written as � or EF BF BD or FFFD or ï ¿ ½, and so on, depending on the representation you choose.
You can also check for yourself how the client handles that character. Go here:
http://www.cogsci.ed.ac.uk/~richard/utf-8.cgi?input=%EF%BF%BD&mode=char
press the GO button and, using your browser developer tools, check what really happens: the browser is actually encoding the unknown character with %EF%BF%BD before sending it to the web server.
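The relationship between %EF%BF%BD and the replacement character can also be checked locally. This is a minimal sketch using Python's standard library; the two invalid bytes are an arbitrary example, not data from the logs.

```python
from urllib.parse import quote, unquote

# Percent-decoding the sequence yields the single code point U+FFFD.
decoded = unquote("%EF%BF%BD")
print(hex(ord(decoded)))  # 0xfffd

# Bytes that are not valid UTF-8 turn into U+FFFD when decoded leniently...
broken = b"\xff\xfe".decode("utf-8", errors="replace")

# ...and percent-encoding the result gives the sequences seen in the logs.
print(quote(broken))  # %EF%BF%BD%EF%BF%BD
```

This shows why the original bytes cannot be recovered from the logged URLs: by the time %EF%BF%BD appears, the invalid input has already been replaced.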
These look like corrupted URLs being inserted by a piece of Malware/Adware called "Adpeak".
Here are some details on Adpeak:
How to remove AdPeak lqw.me script from my web pages?
Adpeak has a client side component that sticks the following tag into web pages:
<script type="text/javascript" id="2f2a695a6afce2c2d833c706cd677a8e" src="http://d.lqw.me/xuiow/?g=7FC3E74A-AFDA-0667-FB93-1C86261E6E1C&s=4150&z=1385998326"></script>
Adpeak also sometimes uses the host names "d.sitespeeds.com", "d.jazzedcdn.com", "d.deliversuper.com", "d.blazeapi.com", "d.quikcdn.com", probably others. Here are a few more examples:
<script type="text/javascript" id="2f2a695a6afce2c2d833c706cd677a8e" src="http://d.deliversuper.com/xuiow/?o=3&g=823F0056-D574-7451-58CF-01151D4A9833&s=7B0A8368-1A6F-48A5-B236-8BD61816B3F9&z=1399243226"></script>
<script type="text/javascript" id="2f2a695a6afce2c2d833c706cd677a8e" src="http://d.jazzedcdn.com/xuiow/?o=3&g=B43EA207-C6AC-E01B-7865-62634815F491&s=B021CBBD-E38E-4F8C-8E93-6624B0597A23&z=1407935653"></script>
<SCRIPT id=2f2a695a6afce2c2d833c706cd677a8e type=text/javascript src="http://d.lqw.me/xuiow/?o=3&g=87B35A3E-C25D-041E-0A0F-C3E8E473A019&s=BBA5481A-926B-4561-BD79-249F618495E6&z=1393532281"></SCRIPT>
<SCRIPT id=2f2a695a6afce2c2d833c706cd677a8e type=text/javascript src="http://d.lqw.me/xuiow/?o=2&g=0AD3E5F2-B632-382A-0473-4C994188DBBA&s=9D0EB5E9-CCC9-4360-B7CA-3E645650CC53&z=1387549919"></SCRIPT>
The "id" is consistent: it's always "2f2a695a6afce2c2d833c706cd677a8e" in the cases we've seen.
There's always a "g", "s", and "z" parameter, and sometimes a "o" parameter that has values of 2 or 3.
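These observations could be turned into a rough log filter. The sketch below is a heuristic in Python built only on the parameter pattern described here; the function name looks_like_adpeak is hypothetical, and unrelated URLs that happen to use the same parameter names would be flagged too.

```python
from urllib.parse import parse_qs, urlsplit


def looks_like_adpeak(src):
    """Heuristic: 'g', 's' and 'z' are all present; 'o', if present, is 2 or 3."""
    # keep_blank_values=True so that empty parameters such as "g=" still count.
    query = parse_qs(urlsplit(src).query, keep_blank_values=True)
    if not {"g", "s", "z"}.issubset(query):
        return False
    return all(value in ("2", "3") for value in query.get("o", []))


sample = "http://d.lqw.me/xuiow/?g=7FC3E74A-AFDA-0667-FB93-1C86261E6E1C&s=4150&z=1385998326"
print(looks_like_adpeak(sample))                     # True
print(looks_like_adpeak("http://example.com/?q=1"))  # False
```

Because the check looks only at the query string, it also matches URLs whose host part has been lost or corrupted.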
We've noticed that, with our pages, a certain version of this script is 100% correlated with corrupted characters in the DOM: if "o" is omitted or set to 2, we see a U+FFFD character injected near the end of the page, or sometimes a U+000E character (SHIFT OUT), which breaks standard JSON/XML serialization libraries. That is why we've been researching these URLs. We've never seen a corruption for "o=3".
However, sometimes it looks like Adpeak gets confused, and inserts junk like this:
<script type="text/javascript" id="2f2a695a6afce2c2d833c706cd677a8e" src="��?o=3&g=&s=&z=����������~?"></script>
Now, we don't know for certain that this is Adpeak, because the URLs are mangled, but the "o=3", "g", "s", and "z" parameters are four smoking guns. The host is missing here, so the URL resolves against our server, and the U+FFFD characters get sent as percent-encoded UTF-8 "%EF%BF%BD" sequences, which are identical to what people have been seeing above.
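The resolution step can be reproduced with a short Python sketch. The page URL is a hypothetical stand-in for a page on the affected site, and the src value is a shortened version of the junk shown above.

```python
from urllib.parse import quote, urljoin, urlsplit

page_url = "http://example.com/"                 # hypothetical page carrying the injected tag
mangled_src = "\ufffd\ufffd?o=3&g=&s=&z=\ufffd"  # shortened version of the junk src value

# With no scheme or host, the src is treated as relative to the page URL.
resolved = urljoin(page_url, mangled_src)
parts = urlsplit(resolved)

print(parts.netloc)       # example.com
print(quote(parts.path))  # /%EF%BF%BD%EF%BF%BD
print(parts.query.startswith("o=3&g=&s=&z="))  # True
```

The request therefore lands on the site's own host with a path made of percent-encoded replacement characters, which matches the shape of the logged URLs.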
If you're curious about how common this is, for a particular customer with high traffic and a wide demographic, we see Adpeak URLs injected into about 1.09% of their web pages, counting both well-formed Adpeak URLs and URLs containing U+FFFD. If you just look for Adpeak URLs with U+FFFD sequences, those appear in 0.053% of all web pages. And if you just look for Adpeak URLs that cause DOM corruptions (e.g., the valid URLs that contain "o=2" or no "o" parameter), that covers 0.20% of all web pages.
Your site's character set is probably not set to UTF-8, but when a page is requested, the bytes are treated as if they were UTF-8-encoded. Any byte sequence that is not valid UTF-8 is then replaced with the byte sequence EF BF BD (the replacement character, which acts as a placeholder).
Make sure you use UTF-8 everywhere in your site, for example by putting <meta charset="UTF-8"> in every page.
Another example for this in a different situation: Whats going on with this byte array?
You have to use regular expression functions; look them up on the official PHP site.
URLs that contain non-English characters are what cause this problem.
A meta charset of UTF-8 does not affect the URL, so it won't help. It only helps you display text in other languages on your web page, not in your URL.
Using PHP regular expressions, you can show even Chinese text in a URL.
Hope it works.
Just uncheck the EnableBrowserLink option in Visual Studio. Everything will then work out of the box.
Previously I was able to download YouTube videos as mp3 via youtube-mp3.org using this method:
http://www.youtube-mp3.org/api/pushItem/?item=http%3A//www.youtube.com/watch%3Fv%3D<VIDEOID>&xy=_
Then it returned the video id and they started converting the video on their servers. Then this request would return a JSON string with info about the video and the current conversion status:
http://www.youtube-mp3.org/api/itemInfo/?video_id=<VIDEOID>&adloc=
After repeating the request until the value for status was 'serving', I made the last request using the value of key h from the JSON response of the previous request, and this would download the mp3 file.
http://www.youtube-mp3.org/get?video_id=<VIDEOID>&h=<JSON string value for h>
Now the first request always returns nothing. The second and third requests only succeed if the requested video is cached on their servers (like popular music videos). If that's not the case, the second request returns nil, so the third request can't be made because the h value from the second request is missing. Could anybody help me get the website to start a conversion? Something must be wrong with the first URL; I just don't know what. Thanks
I just tested it. For the first request, you need to send with it a header of:
Accept-Location: *
Otherwise, it will return a 500 (Internal Server Error). But with that header, it will return a string of the youtube video id, and you can use the 2nd api for checking the progress.
Here's the C# code I used for testing:
HttpWebRequest wr = (HttpWebRequest)WebRequest.Create("FIRST_API_URL");
wr.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.7 (KHTML, like Gecko) Chrome/16.0.912.75 Safari/535.7";
wr.Headers.Add("Accept-Location", "*");
string res = (new StreamReader(wr.GetResponse().GetResponseStream())).ReadToEnd();
Btw, you can keep track of the headers in the browser's Network (Chrome) debug tab.
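Once the first request succeeds, the polling step described in the question could look roughly like the Python sketch below. The HTTP call is replaced by a stand-in so the loop can run on its own; the function name wait_until_serving and the intermediate status value "converting" are hypothetical, and only 'serving' and the h key come from the question.

```python
import time


def wait_until_serving(fetch_info, max_attempts=10, delay_seconds=0.0):
    """Poll fetch_info() until it reports status 'serving', then return the 'h' value."""
    for _ in range(max_attempts):
        info = fetch_info()
        # An empty response (None) means the item is not known yet; keep polling.
        if info and info.get("status") == "serving":
            return info["h"]
        time.sleep(delay_seconds)
    raise TimeoutError("conversion did not reach status 'serving'")


# Stand-in for the second API call; real code would issue the HTTP request here.
responses = iter([None, {"status": "converting"}, {"status": "serving", "h": "abc123"}])
h_value = wait_until_serving(lambda: next(responses))
print(h_value)  # abc123
```

Capping the number of attempts keeps the loop from running forever when the first request never started a conversion, which is the failure described in the question.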
Regards