Character set encoding for serial connection in PuTTY

I have connected a GSM/GPRS modem to a PuTTY terminal on Windows 7 for serial communication. The AT commands work, but the responses from the modem are displayed as unreadable special characters. Both the port and the modem are configured at 9600 baud. I changed the character encoding to UTF-8 (Window -> Translation -> Character set), but the results were the same.
Please help.

The character encoding of strings sent to and from the modem is controlled with the AT+CSCS command ("Select TE character set"). If you want UTF-8, run AT+CSCS="UTF-8". You can check the currently selected character set with AT+CSCS? and the supported sets with AT+CSCS=?. See the 3GPP 27.007 specification for more details.
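A typical exchange might look like this (a sketch; the list of supported sets varies by modem):

AT+CSCS=?
+CSCS: ("GSM","IRA","8859-1","UTF-8","UCS2")
OK
AT+CSCS?
+CSCS: "GSM"
OK
AT+CSCS="UTF-8"
OK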

Related

How to correctly read Latin-1 characters from a Postgres database using C++

I have imported a shapefile into a Postgres database that uses the Latin-1 character encoding. (The database cannot import it using the UTF-8 format.) When I retrieve a value using the PQgetvalue() method, some special characters come back incorrectly. For example, a field value "STURDEEÿAVENUE" is incorrectly converted to "STURDEEÃ¿AVENUE".
Since you are getting the data back as UTF-8, your client_encoding is probably wrong. It can be set per connection and controls the encoding with which strings are sent back to the client. By setting the variable to Latin-1 immediately after connecting, you can retrieve the strings in the desired encoding.
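For example, issue this immediately after connecting ('LATIN1' is PostgreSQL's name for the ISO 8859-1 encoding):

SET client_encoding TO 'LATIN1';

From C++ with libpq, the call PQsetClientEncoding(conn, "LATIN1") does the same.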

Tableau with Vertica accented characters not displaying from VARCHAR field

I've created a data connection to a Vertica table from Tableau and have a 'surname1' field in the rows. This field is a VARCHAR in Vertica, and if I run a SELECT I can see the accented characters in the command line with no problem.
The problem is that in Tableau these are not represented correctly, and I can't find any way to change the field encoding in Tableau to recognise them.
Does anybody know how to solve this?
Below is an example of a SELECT from Vertica in the command line (the same values do not display correctly in Tableau):
surname1
---------------
Mérida
Fernández
Villadóniga
Muñoz
López
Thanks in advance,
James
Just leaving this in case it helps anybody in the future:
The cause of the problem was that the Vertica database was being fed by a MySQL database through a mysqli connection. This connection's character encoding was configured as Latin-1 (ISO 8859-1), whereas Vertica was configured for UTF-8.
The problem was further compounded because the PuTTY window I was using to access Vertica from Windows was also configured for Latin-1 (ISO 8859-1), which effectively hid the fact that the data wasn't stored correctly in Vertica as UTF-8.
To solve this, I reconfigured the mysqli connection that fed Vertica to use UTF-8 encoding, with the following line of code:
$mysqli->set_charset("utf8");
Note: to find out that the character set was Latin-1 in the first place, I used the following:
echo $CMySQLI->character_set_name();
In summary, if you find an accented-character problem with Tableau and you're accessing your DB through PuTTY, ensure the character encoding is aligned between PuTTY and the DB so that errors aren't masked in this way.
Regards,
James

Character encoding output to a file in Linux

The working environment is JBoss + MSSQL.
I am running a query and writing the formatted result to a text file. The query results contain some French accented characters.
On my local machine everything works fine, but on the UAT server (a Linux box, UTF-8), the French accented characters become question marks.
Does anyone know how to solve it?
It depends on how you create your file - a code example would be helpful.
If you do specify an encoding explicitly, e.g. when creating a Writer, and it doesn't match the locale of the machine on which you view the file, you may see question marks, placeholder boxes etc. instead of accented letters. You can use the locale command to check your locale and learn the associated character encoding. This is just a matter of viewing the file. You say that the box is UTF-8, but make sure the app is also running under a UTF-8 locale - your user console and the server app may be using different locales.
If you do not specify the character encoding when writing, you will most often end up with the system's locale. It may then happen that this locale doesn't support the characters you need, so they are replaced with placeholders. A solution is to change the locale under which your app runs, e.g. by exporting the corresponding LC_* environment variables.
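Either way, the robust fix is to pick the encoding explicitly when writing. A minimal sketch in Java (output.txt and the sample string are placeholders):

import java.io.FileOutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class WriteUtf8 {
    public static void main(String[] args) throws Exception {
        // Pass the charset explicitly instead of relying on the platform default
        try (Writer out = new OutputStreamWriter(
                new FileOutputStream("output.txt"), StandardCharsets.UTF_8)) {
            out.write("Crème brûlée\n");  // accented characters survive intact
        }
    }
}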
So, the short checklist goes like this:
How do you write your file? Is the encoding specified explicitly?
What is the locale with which the app is running (output of locale command)?
Check the actual bytes written to your file using the od -t x1 command or a hex viewer like the one included in mc. Are the question marks actual question marks (hex code 3F), or some other character? If they take one byte, they're probably in one of the Latin-N (ISO 8859-N) encodings. If they take more than one byte, it's probably UTF-8 (I understand the letters a-z look normal, so it's not UTF-16).
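For example, the two encodings of 'é' are easy to tell apart this way (output below assumes a UTF-8 terminal; in Latin-1 the letter is the single byte 0xE9):

printf 'é' | od -t x1
0000000 c3 a9
0000002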

iOS Passbook Serial Number: What characters are valid? What is the max length?

Passbook for iOS uses a serial number that your servers can use to identify a specific pass.
Does anyone know what characters are valid in the passbook serial number? I know that digits and letters are valid, but are symbols/punctuation valid as well (e.g. "-" and ".")?
Also what is the maximum length of a serial number?
Thank you.
Pretty much any character can be used, including '-' and '.', as long as the serial remains unique. Special characters such as '\' will need to be properly escaped, and they may not be compatible with your database or may cause other problems if not handled properly elsewhere in your code.
I have just tried a pass with the following serial and it was added to Passbook with no problems.
"serialNumber":"[]{}-_)(*&^%$##!`~+=|\\\/?.><,:;"
UTF-8 encoded characters are also fine:
"serialNumber":"\u9127\u6a02\u611a" // Chinese characters 鄧樂愚
As for maximum length, I'm not aware of any limit, although it would be quite simple to experiment.
This serial of 400 characters also ingests ok.
"serialNumber":"0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789"
I would recommend against using any sort of user input for the serial, since this may lead to collisions and open you up to injection attacks. Adhering to XML standards is also not a bad practice, to avoid problems if you change your architecture down the line (say, to a web service solution like AWS DynamoDB). Base64 encoding your serial would ensure widespread compatibility.
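For instance, a server-side Java sketch (the raw identifier here is hypothetical):

import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class SerialNumbers {
    public static void main(String[] args) {
        String raw = "order/1234|店舗-7";  // hypothetical identifier with risky characters
        // Base64 output uses only A-Z, a-z, 0-9, '+', '/' and '=',
        // so it is safe for XML, JSON and most databases
        String serial = Base64.getEncoder()
                .encodeToString(raw.getBytes(StandardCharsets.UTF_8));
        System.out.println(serial);
    }
}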
The serial can also be used to store metadata in the pass, e.g.:
"serialNumber":"UniqueID|data1|data2|data3|etc."

Character Encoding and the ’ Issue

Even today, character encoding problems crop up with significant frequency. Take for example this recent job post:
(Note: This is an example, not a spam job post... :-)
I have recently seen that exact error on websites, in popular IM programs, and in the background graphics on CNN.
My two-part question:
What causes this particular, common encoding issue?
As a developer, what should I do with user input to avoid common encoding issues like this one? If this question requires simplification to provide a meaningful answer, assume content is entered through a web browser.
What causes this particular, common encoding issue?
This will occur when the conversion between characters and bytes has taken place using the wrong charset. Computers handle data as bytes, but to represent the data in a sensible manner to humans, it has to be converted to characters (strings). This conversion is performed according to a charset, of which there are many different ones.
In the particular ’ example, this is the typical CP1252 rendering of the Unicode character RIGHT SINGLE QUOTATION MARK (U+2019) ’ after it has been written as UTF-8 and then read back as CP1252. In UTF-8, that character consists of the bytes 0xE2, 0x80 and 0x99. If you check the CP1252 code page layout, you'll see that those bytes represent exactly the characters â, € and ™.
This can be caused by the website not having read the original source properly (it should have been read as UTF-8), or by an otherwise-correct UTF-8 page being served with a wrong charset=CP1252 attribute in the Content-Type response header (or with the attribute missing, in which case Windows machines fall back to CP1252 as the default charset).
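A short Java sketch reproduces the effect ("windows-1252" is Java's canonical name for CP1252):

import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Mojibake {
    public static void main(String[] args) {
        String original = "\u2019";  // RIGHT SINGLE QUOTATION MARK
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);  // 0xE2 0x80 0x99
        // Decoding those UTF-8 bytes as CP1252 yields the familiar garbage
        System.out.println(new String(utf8, Charset.forName("windows-1252")));  // prints ’
    }
}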
As a developer, what should I do with user input to avoid common encoding issues like this one? If this question requires simplification to provide a meaningful answer, assume content is entered through a web browser.
Ensure that you read characters from arbitrary byte stream sources (e.g. a file, a URL, a network socket, etc.) using a known, predefined charset. Then ensure that you consistently store, write and send them using a Unicode charset, preferably UTF-8.
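A minimal sketch of the reading side in Java (input.txt is a placeholder):

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class ReadUtf8 {
    public static void main(String[] args) throws Exception {
        // Decode the raw bytes with a known charset, not the platform default
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new FileInputStream("input.txt"), StandardCharsets.UTF_8))) {
            System.out.println(in.readLine());
        }
    }
}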
If you're familiar with Java (your question history confirms this), you may find this article useful.
