Special character encoding issue with ServerXMLHTTP

I am using the following piece of code to retrieve data from a site:
Dim htttpObj As Object
Set htttpObj = CreateObject("MSXML2.ServerXMLHTTP.6.0")
htttpObj.SetTimeouts 10000, 10000, 10000, 300000
htttpObj.Open "POST", url, False
htttpObj.setRequestHeader headerName, headerValue
htttpObj.Send ("func1=" & func1 & "&func2=" & func2 & "&username=" & login & "&psd=" & password)
answer = htttpObj.responseText
The code works great when retrieving strings like "Cat", "dog" or "Hello World!", but it doesn't work with strings like "Ações".
For instance, "Ações" comes back garbled: the accented letters are replaced by other, unrelated characters.
Does anyone know a solution for this issue?
P.S.: I don't get this result with the WinHttp method, but I still need to keep the ServerXMLHTTP method as a backup for WinHttp.
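Garbled accented letters like this usually mean the response bytes were decoded with a different charset than the one used to encode them. The minimal Python sketch below assumes (the question does not say) that the server sends UTF-8 and the client decodes it as Windows-1252. In ServerXMLHTTP terms, the analogous approach would be to take the undecoded bytes from the responseBody property and decode them with the charset the server really uses, instead of relying on responseText.

```python
# Hypothetical scenario: the server sends UTF-8, but the client decodes
# the bytes as Windows-1252 (cp1252). The real charsets are an assumption.
original = "Ações"
raw_bytes = original.encode("utf-8")   # what travels over the wire

wrong = raw_bytes.decode("cp1252")     # decoded with the wrong charset
right = raw_bytes.decode("utf-8")      # decoded with the charset used to encode

print(wrong)  # AÃ§Ãµes
print(right)  # Ações
```

Each two-byte UTF-8 sequence for ç and õ is shown as two separate Windows-1252 characters, which is why the garbled text is longer than the original.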

Related

Weird URL Encoding/Decoding for Non-English Characters

How and why is a non-English word converted to weird characters, e.g. پاکستان to Ù¾Ø§Ú©Ø³ØªØ§Ù†? Is there any way to get پاکستان back from Ù¾Ø§Ú©Ø³ØªØ§Ù†? It happens both in code shown in the browser and in requests received at the server.
Background:
I get a lot of requests at my non-English (Urdu) content website with URLs like
Ù¾Ø§Ú©Ø³ØªØ§Ù†
I tried to find out what that means, but search engines didn't help. I tried queries like
Decode this 'mystring'
What encoding is this 'mystring'
Based on this link, I thought it might be a corrupted/spam URL:
Weird characters in URL
Problem explanation/example
But when I viewed one of my JS files in the browser (while looking at a working JS file), it showed me the same weird characters, even on localhost:
'pakistan': {'eng': 'Pakistan', 'ur': 'Ù¾Ø§Ú©Ø³ØªØ§Ù†'},
// But the actual source code for the line above is:
'pakistan': {'eng': 'Pakistan', 'ur': 'پاکستان'},
My knowledge
To the best of my knowledge, the only encoding/decoding I know about seems unrelated here: encodeURI and decodeURI in JS, quote and unquote in Python, and their equivalents in other languages. All they do for me is convert
`پاکستان` to `%D9%BE%D8%A7%DA%A9%D8%B3%D8%AA%D8%A7%D9%86` and vice versa.
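For reference, a small Python sketch of what percent-encoding actually does: it turns the word's UTF-8 bytes into %XX escapes and back, and nothing else, so on its own it cannot produce characters like Ù or Ø.

```python
from urllib.parse import quote, unquote

word = "پاکستان"
encoded = quote(word)   # UTF-8 bytes, each written as %XX
print(encoded)          # %D9%BE%D8%A7%DA%A9%D8%B3%D8%AA%D8%A7%D9%86
print(unquote(encoded) == word)  # True
```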
Why is this needed?
I don't want to miss the requests received with those malformed URLs. There must be some way to undo this: all browsers (Chrome/Firefox/Edge) show those characters the same way, and if their conversion method and result are the same, there should be a technique to reverse it as well.
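The weird characters are consistent with the UTF-8 bytes of the word being read as Windows-1252 (cp1252), one character per byte. A minimal Python sketch of that round trip, assuming exactly one such mis-decoding step happened (the function names are hypothetical):

```python
def to_mojibake(text):
    # UTF-8 bytes misread as Windows-1252: what the browser appears to be doing
    return text.encode("utf-8").decode("cp1252")

def from_mojibake(garbled):
    # Undo both steps: recover the original bytes, then decode them as UTF-8
    return garbled.encode("cp1252").decode("utf-8")

word = "پاکستان"
garbled = to_mojibake(word)
print(garbled)                 # Ù¾Ø§Ú©Ø³ØªØ§Ù†
print(from_mojibake(garbled))  # پاکستان
```

This simple round trip fails for letters whose UTF-8 bytes include 0x81, 0x8D, 0x8F, 0x90 or 0x9D, which Python's cp1252 codec leaves undefined. For example, ہ (U+06C1) encodes as DB 81, so words containing it need special handling.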
Thanks to Giacomo Catenazzi, and I am grateful for the following answer:
How to decode cp1252 string?
A very custom and still imperfect solution to my problem.
This algorithm needs to be improved. I found only by experiment that it doesn't work for me when the string is long or contains - (hyphens).
So I adapted it to my requirements, and it works well enough that I can guess what the actual string was.
import itertools
import re

# Python 3: str is the text type, so no `unicode` import is needed


def specific_my_required_processing(received_string):
    # In my case, words that are still mis-encoded start with one of these characters
    starting_characters_in_encoded_string_in_my_case = ['Ø', 'Ã', 'Ù', 'Ù', 'Ú']
    # Decode each hyphen-separated word on its own; long strings fail otherwise
    arr = received_string.split('-')
    res = []
    missed = []
    for string_item in arr:
        decoded_string = guess_decode_string_without_hyphens(string_item)
        if decoded_string and decoded_string[:1] not in starting_characters_in_encoded_string_in_my_case:
            res.append(decoded_string)
        else:
            missed.append({string_item: decoded_string})
    resulting_urdu_string = '-'.join(res)
    print('\n\nResult', resulting_urdu_string)
    print('\nCould not be decoded', missed)


def guess_decode_string_without_hyphens(s):
    encodings = ['cp1251', 'cp1252', 'utf8']
    # An even number of alternating encode/decode steps ends with text again
    for steps in range(2, 10, 2):
        for encs in itertools.product(encodings, repeat=steps):
            r = s
            try:
                for enc in encs:
                    # str -> bytes when encoding, bytes -> str when decoding
                    r = r.encode(enc) if isinstance(r, str) else r.decode(enc)
            except (UnicodeEncodeError, UnicodeDecodeError):
                continue
            # Accept the first candidate made only of word characters/whitespace
            if re.match(r'^[\w\sа-яА-Я]+$', r):
                res = str(r)
                print('Encoding => ', encs, ' Conversion = ' + s + ' => ' + res)
                return res
    return None


sample_encoded_string = 'اسلام-آباد-Ûائیکورٹ-ای-ÙˆÛŒ-ایم-قانون-سازی-کالعدم-قرار-دینے-Ú©ÛŒ-درخواست-نامکمل-قرار'
specific_my_required_processing(sample_encoded_string)

Swift 4 OutputStream produces more output than input

I have been trying to use streams in Swift to interface with a Java socket server (I don't believe the Java server is my problem), but when I write with an OutputStream, my string includes a bunch of extra garbage that was not in the original string.
The code currently looks like this:
var maxWriteLength = 4096
func sendMessage(msg: String) {
    let encodedDataArray = [UInt8](msg.utf8)
    outputStream.write(encodedDataArray, maxLength: maxWriteLength)
}
However, when I give it an input of "hi", it returns an output of:
Echo: hi���8B��,rؾ�؇��allowCloudBackup؇��allowAppInstallation؇��safariForceFraudWarning�&��q���ޙTh�C��=wthread��&��q����������������8$��N��8$���0'}��#�
Echo: �'��q����������p�g�iYh�C���iYh�C��
Echo: D�#D�8״
Echo: pV���؇��requireAlphanumeric؇��allowCellularHDUploadsInternational-Key_2��
...and much, much more.
I have seen other posts suggesting to use encodedDataArray.count instead of maxWriteLength; however, when I do that, the OutputStream doesn't write anything.
Thanks in advance.
For future readers who struggle with this: it was indeed the way the server was handling messages (I went back to check). The server used Scanner.nextLine(), and my Swift code never sent a \n, so the line was never terminated. If you build a byte array, make sure you append "\n" to your string before converting it; otherwise no newline bytes are written. Passing encodedDataArray.count as maxLength also matters: with a fixed maxWriteLength of 4096, write(_:maxLength:) reads past the end of the short array, which is where the extra garbage came from. The final code looked like this:
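func sendMessage(msg: String) {
    // Scanner.nextLine() on the Java side waits for a newline, so terminate the message
    let finalMsg = msg + "\n"
    let encodedDataArray = [UInt8](finalMsg.utf8)
    // Write exactly the message bytes; a larger maxLength reads past the array's end
    outputStream.write(encodedDataArray, maxLength: encodedDataArray.count)
}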
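The same framing rule can be sketched in Python, with a local socket pair standing in for both the Swift client and the Java server (an assumption of this sketch; send_line and echo_once are hypothetical names). A line-based reader such as Scanner.nextLine() or Python's readline() only returns once it sees \n, and the sender should write exactly the bytes of the encoded message.

```python
import socket

def send_line(sock, msg):
    # Terminate the message so a line-based reader knows where it ends,
    # and send exactly the encoded bytes (no fixed-size buffer)
    sock.sendall((msg + "\n").encode("utf-8"))

def echo_once():
    client, server = socket.socketpair()  # connected pair, no network needed
    with client, server:
        send_line(client, "hi")
        with server.makefile("r", encoding="utf-8", newline="\n") as reader:
            return reader.readline()      # blocks until "\n" arrives

print(repr(echo_once()))  # 'hi\n'
```

Without the trailing "\n", readline() would keep waiting for more data, which matches the "nothing is written" symptom seen on the server side.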

How do I handle " and ' in an object saved on the server side and use it with JSON.parse() in JavaScript?

I have some strings in which the user can type both " and ', for example "some description like "abc" for '3 feet", "about 3'5 feet" or "wel"come". When I transform them into JSON, and especially when I want to parse them on the view side, for example:
var test = '${test as JSON}';
it breaks, because the output contains escapes like \".
And if the string only contains " and I use JSON.parse("[{"abc": "asda"das"}]"), it is still invalid, because the inner quotes need to be escaped.
So I really don't know what to do in these cases. I need an explanation of how to fix or avoid this.
JSON can be used directly as an object literal in JavaScript, so you can simply write it into the page without surrounding quotes and without JSON.parse():
var test = ${(test as JSON).toString()}
Optionally, you can pass true to toString() to pretty-print the JSON:
var test = ${(test as JSON).toString(true)}
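To see why the unquoted form is safe, here is a Python sketch using the standard json module (not Grails; the assumption is that Grails' `as JSON` converter escapes quotes the same way, which is not verified here). The serializer escapes " inside strings as \" and leaves ' alone, so the output is a valid literal as long as the template does not wrap it in another pair of quotes. The data values are made up.

```python
import json

# Hypothetical user input mixing both kinds of quotes
data = [{"abc": 'some description like "abc" for \'3 feet'}, {"abc": 'wel"come'}]

literal = json.dumps(data)
print(literal)
# [{"abc": "some description like \"abc\" for '3 feet"}, {"abc": "wel\"come"}]

# The serialized text parses back to the same data
assert json.loads(literal) == data
# The ' is left unescaped, so wrapping the output in '...' in a template
# would end the JavaScript string early; emitting it unquoted avoids that
assert "'" in literal
```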

URLConnection with arabic parameters

I'm trying to develop an Android application that works with Arabic data, and I've run into a problem:
URL twitter = new URL("http://10.0.2.2/WS/identi_el.php?id1="+nomm+"&id2="+pren+"&id3="+pa);
These parameters (nomm, pren and pa) are in Arabic, and the request doesn't return any result; however, when I put them in French, it returns results. Can anyone help me make URLConnection support Arabic letters?
Non-alphanumeric characters other than -, _ and . are known to cause issues in URLs; I bet you'll run into the same problem if you use a French word with an accent.
So stay on the safe side and encode all parameters before using them as query string parameters.
I modified the URL from
URL twitter = new URL("http://10.0.2.2/WS/identi_el.php?id1="+nomm+"&id2="+pren+"&id3="+pa);
to
url = new URL("http://10.0.2.2/WS/identi_el.php?id1=" + java.net.URLEncoder.encode(nomm, "UTF-8") + "&id2=" + java.net.URLEncoder.encode(pren, "UTF-8") + "&id3=" + java.net.URLEncoder.encode(pa, "UTF-8"));
I just wrapped each parameter in java.net.URLEncoder.encode(..., "UTF-8") and it works :)
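For comparison, Python's urllib.parse.urlencode applies a similar form encoding to Java's URLEncoder (UTF-8 bytes, then %XX escapes, with a space becoming +). This sketch shows what the encoded query string looks like; the Arabic values standing in for nomm, pren and pa are made up.

```python
from urllib.parse import urlencode

# Hypothetical Arabic values standing in for nomm, pren and pa
params = {"id1": "محمد", "id2": "علي", "id3": "مصر"}
query = urlencode(params)  # each value: UTF-8 bytes, then percent-encoded
url = "http://10.0.2.2/WS/identi_el.php?" + query
print(url)
```

Only ASCII characters end up in the URL, which is why the server receives the parameters intact and can decode them back to Arabic.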

Int32.Parse throws FormatException after web post

Update
I've found the problem: the exception came from a second field on the same form, which indeed should have triggered it (because it was empty). I was looking at an error I thought came from parsing one string, when in fact it came from parsing another. Sorry for wasting your time.
Original Question
I'm completely dumbfounded by this problem. I am basically running int.Parse("32") and it throws a FormatException. Here's the code in question:
private double BindGeo(string value)
{
    Regex r = new Regex(@"\D*(?<deg>\d+)\D*(?<min>\d+)\D*(?<sec>\d+(\.\d*))");
    Regex d = new Regex(@"(?<dir>[NSEW])");
    var numbers = r.Match(value);
    string degStr = numbers.Groups["deg"].ToString();
    string minStr = numbers.Groups["min"].ToString();
    string secStr = numbers.Groups["sec"].ToString();
    Debug.Assert(degStr == "32");
    var deg = int.Parse(degStr);
    var min = int.Parse(minStr);
    var sec = double.Parse(secStr);
    var direction = d.Match(value).Groups["dir"].ToString();
    var result = deg + (min / 60.0) + (sec / 3600.0);
    if (direction == "S" || direction == "W") result = -result;
    return result;
}
My input string is "32 19 17.25 N"
The above code runs on a .NET 4 web hosting service (aspspider) on an ASP.NET MVC 3 web application (with Razor as its view engine).
Note that the assertion degStr == "32" passes! Also, when I take the above code and run it in a console application, it works just fine. I've scoured the web for an answer and found nothing...
Any ideas?
UPDATE (stack trace)
[FormatException: Input string was not in a correct format.]
System.Number.StringToNumber(String str, NumberStyles options, NumberBuffer& number, NumberFormatInfo info, Boolean parseDecimal) +9586043
System.Number.ParseInt32(String s, NumberStyles style, NumberFormatInfo info) +119
System.Int32.Parse(String s) +23
ParkIt.GeoModelBinder.BindGeo(String value) in C:\MyProjects\ParkIt\ParkIt\GeoBinder.cs:42
Line 42 is var deg = int.Parse(degStr); and note that the exception is in System.Int32.Parse (not in System.Double as was suggested).
You are wrongly thinking that it is the following line that is throwing the exception:
int.Parse("32")
This line is unlikely to ever throw an exception.
In fact it is the following line:
var sec = double.Parse(secStr);
In this case, secStr is "17.25".
The reason for that is that your hosting provider uses a different culture in which the . is not a decimal separator.
You can specify the culture in your web.config file (inside the <system.web> section):
<globalization culture="en-US" uiCulture="en-US" />
If you don't do that, then auto is used. This means that the culture could be set based on the client browser preferences (which are sent with each request using the Accept-Language HTTP header).
Another possibility is to specify the culture when parsing:
var sec = double.Parse(secStr, CultureInfo.InvariantCulture);
This way you know for sure that . is the decimal separator for the invariant culture.
Testing this (via PowerShell):
PS [64] E:\dev #43> '32 19 17.25 N' -match "\D*(?<deg>\d+)\D*(?<min>\d+)\D*(?<sec>\d+(\.\d*))"
True
PS [64] E:\dev #44> $Matches
Name Value
---- -----
sec 17.25
deg 32
min 19
1 .25
0 32 19 17.25
So the regex works, with all three named captures getting a value, all of which will parse OK (i.e. it isn't a case of \d matching something like U+0660 ARABIC-INDIC DIGIT ZERO, which Int32.Parse doesn't handle).
But you do not check that the regex actually makes a match.
Therefore I suspect that the value passed to the function is not the input you expect. Put a breakpoint (or logging) at the start of the function and get the actual value of value.
I think what is happening is:
The value isn't what you think it is.
The regex fails to match.
The captures are empty.
Int32.Parse("") throws (just confirmed: it throws a FormatException, "Input string was not in a correct format.").
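The same steps as a Python sketch (a translation, not the original C#; bind_geo is a hypothetical name) that checks the match before parsing, so an unexpected input raises an error naming the actual value instead of a format error from an empty capture. re.ASCII keeps \d from matching other Unicode digits, and Python's float() always uses . as the decimal separator, much like parsing with CultureInfo.InvariantCulture.

```python
import re

# Same pattern as in the question; re.ASCII makes \d match only 0-9,
# not other Unicode digits such as U+0660 ARABIC-INDIC DIGIT ZERO
GEO = re.compile(r"\D*(?P<deg>\d+)\D*(?P<min>\d+)\D*(?P<sec>\d+(\.\d*))", re.ASCII)
DIRECTION = re.compile(r"(?P<dir>[NSEW])")

def bind_geo(value):
    m = GEO.search(value)  # like .NET Regex.Match: first match anywhere
    if m is None:
        # Fail loudly with the actual input instead of parsing empty captures
        raise ValueError(f"not a coordinate: {value!r}")
    deg = int(m.group("deg"))
    minutes = int(m.group("min"))
    sec = float(m.group("sec"))  # locale-independent: '.' is the separator
    result = deg + minutes / 60.0 + sec / 3600.0
    d = DIRECTION.search(value)
    if d and d.group("dir") in ("S", "W"):
        result = -result
    return result

print(bind_geo("32 19 17.25 N"))
```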
Addendum: I just noticed your comment on the assertion.
If things seem contradictory, go back to basics: at least one of your assumptions is wrong. For example, there could be an off-by-one in the exception's line number (editing the file before looking up that line number is very easy to do).
Stepping through with a debugger is by far the easiest approach in this case. Check everything on every expression.
If you cannot use a debugger, try to remove that restriction; if you can't, how about IntelliTrace? Otherwise, use some kind of logging (if your app doesn't have it, add it, as you'll need it in the future for things like this).
Try removing non-ASCII (possibly invisible) characters from the string:
string s = "søme string";
s = Regex.Replace(s, @"[^\u0000-\u007F]", string.Empty);
Edit:
Also, try looking at the string's hex values to see where the exception is happening:
BitConverter.ToString(System.Text.Encoding.UTF8.GetBytes(s));
This shows you the hex values so you can verify them.
Also, paste the value here so we can see it.
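A Python sketch of the same two diagnostics (strip_non_ascii and hex_dump are hypothetical names), using a made-up input that hides a ZERO WIDTH SPACE after the digits. Such a character is invisible when printed but shows up immediately in the hex dump.

```python
import re

def strip_non_ascii(s):
    # Same character class as the C# answer: drop everything outside U+0000-U+007F
    return re.sub(r"[^\x00-\x7F]", "", s)

def hex_dump(s):
    # UTF-8 bytes as space-separated hex, to reveal invisible characters
    return s.encode("utf-8").hex(" ")

suspicious = "32\u200b"      # "32" followed by an invisible ZERO WIDTH SPACE
print(hex_dump(suspicious))  # 33 32 e2 80 8b
print(strip_non_ascii(suspicious) == "32")  # True
```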
It turns out that this is a non-question. The problem was that the exception came from a second field on the same form, which indeed should have triggered it (because it was empty). I was looking at an error I thought came from parsing one string, when in fact it came from parsing another.
Sorry for wasting your time.
