How do operating systems handle different character encodings? - character-encoding

My operating system is Chinese Windows 10, so its character encoding should be GBK. Yet when I open a UTF-8 encoded TXT file, the Chinese characters inside it still display correctly. But when I copy that TXT file into an English Windows 10 virtual machine and open it there, the text displays differently. Why does this happen? How does the operating system handle character encoding? What is the relationship between the operating system's encoding and the TXT file's character encoding?
Edit: I use Notepad to open the file.
(Screenshot: the file opened on the English system)
(Screenshot: the file opened on the Chinese system)
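What you are seeing can be reproduced in a few lines: the same sequence of bytes means different characters depending on which encoding is used to decode it. A minimal Delphi sketch (illustrative only; code page 1252 stands in for the ANSI code page of an English Windows install, and the mojibake shown in the comments is an example):

    program EncodingDemo;

    {$APPTYPE CONSOLE}

    uses
      System.SysUtils;

    var
      Utf8Bytes: TBytes;
      Western: TEncoding;
    begin
      // '中文' encoded as UTF-8 produces the bytes E4 B8 AD E6 96 87.
      Utf8Bytes := TEncoding.UTF8.GetBytes('中文');

      // Decoded with the encoding it was written in, the text round-trips intact.
      Writeln(TEncoding.UTF8.GetString(Utf8Bytes));   // 中文

      // Decoded with code page 1252 (the ANSI code page of an English Windows
      // system), the same bytes turn into mojibake, roughly what a viewer that
      // assumes the system ANSI code page shows for a UTF-8 file.
      Western := TEncoding.GetEncoding(1252);
      try
        Writeln(Western.GetString(Utf8Bytes));        // e.g. ä¸­æ–‡
      finally
        Western.Free;
      end;
    end.

The TXT file itself only stores bytes; whether they display correctly depends on which encoding the viewer (Notepad, in this case) assumes when decoding them.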

Related

Notepad++ can't display Chinese Characters after updating to Windows 10 18363.592

I just updated my Windows 10 from 1909 18363.535 to 1909 18363.592. Notepad++ used to work properly and was able to display Chinese characters, but after this update
NONE of the Chinese characters can be displayed. Initially I thought it was a Notepad++ problem, so to check I copied the same file to my second laptop, which still had the 18363.535 build and the same version of Notepad++, and the file displayed just fine there.
After updating that laptop as well, neither can display Chinese characters. I have even tried changing the file format from UTF-8 to ASCII and checking other character encodings. None worked.
All Chinese characters now show as squares, even newly typed ones.
My locale was set correctly to Chinese on both computers, which run the Windows 10 x64 English (UK) version.
Has anyone had the same or a similar problem?
After a long search, it turns out it's a font problem.
Windows 10's latest update somehow messed up the font.

How to change codepage of files to make it usable on IBM/Toshiba 4690 OS?

I have files in various languages. I tried Notepad++ and cpconverter, but they didn't help, as some code pages, such as code page 942 or code page 1381, are not available.
This is an image from an IBM machine running 4690 OS.
If the file is in a binary format, use any hex editor tool to view its contents, but before that you should be well versed in what the file contains (e.g. TLOG strings if it is transaction data) in order to understand the details.
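If you just need to peek at the raw bytes, a tiny dump utility can stand in for a full hex editor. A minimal Delphi sketch, assuming a placeholder file name tlog.dat in the current directory:

    program DumpHead;

    {$APPTYPE CONSOLE}

    uses
      System.SysUtils, System.Math, System.IOUtils;

    var
      Data: TBytes;
      I: Integer;
    begin
      // Read the whole file and print the first 64 bytes as hex,
      // enough to spot a BOM, EBCDIC text, or a known record header.
      Data := TFile.ReadAllBytes('tlog.dat');
      for I := 0 to Min(High(Data), 63) do
      begin
        Write(IntToHex(Data[I], 2), ' ');
        if (I + 1) mod 16 = 0 then
          Writeln;   // 16 bytes per row
      end;
      Writeln;
    end.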

Tips on unicode text editors

I am currently converting a legacy system to a new platform and need to extract strings from the old system's resource files.
The old system was written in Delphi, and the strings are kept in files called .dfm files. I have no trouble locating the strings, and for English and other European languages there is no problem. The trouble comes when I try to extract strings in Japanese. I have used Notepad++, and it seems to me that the program doesn't recognize the correct encoding. I get Japanese symbols, but they don't seem to match what is in the GUI. Notepad++ identifies the encoding as something called GB2312 (Simplified Chinese), but the result looks weird.
My question is, does anyone have any tips on programs/text editors that are good at operations like this?
Also I'm grateful for any tips that might help me along the way.
Assuming that your issue is simply that Notepad++ is incorrectly guessing the encoding, you can solve the problem by manually setting the encoding in Notepad++, like this:
Notepad++ itself already handles encoding issues. To convert the file to the desired encoding, such as Unicode (a scripted alternative is sketched after these steps):
first, copy all of the file's contents;
choose Unicode without BOM in the Encoding menu;
then replace all of the contents with the copied contents;
save the file.
Your contents will then be in the desired encoding.
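If manual conversion in Notepad++ gets tedious across many .dfm files, the same re-encoding can be scripted. A minimal Delphi sketch, assuming the source files are actually Shift-JIS (code page 932) rather than the GB2312 that Notepad++ guessed; the file names are placeholders:

    program ReencodeDfm;

    {$APPTYPE CONSOLE}

    uses
      System.SysUtils, System.Classes;

    var
      Lines: TStringList;
      ShiftJis: TEncoding;
    begin
      ShiftJis := TEncoding.GetEncoding(932);   // Shift-JIS; swap in whichever code page you identify
      Lines := TStringList.Create;
      try
        Lines.LoadFromFile('Form1.dfm', ShiftJis);            // decode with the assumed source encoding
        Lines.SaveToFile('Form1.utf8.dfm', TEncoding.UTF8);   // write back out as UTF-8
      finally
        Lines.Free;
        ShiftJis.Free;
      end;
    end.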
Not all strings are kept in DFMs in Delphi; only forms and their associated text are. So you would need to review all of the code as well.
As for DFMs: before Delphi 2009, DFMs didn't use Unicode, so you must know which charset was used. That was one of the big problems with localization and internationalization of Delphi applications.

after saving delphi form, it loses accented characters

I have some forms in Delphi 7 which I open in my IDE. Certain accented characters are not displayed correctly in the form, and when I change a form containing such a character, the accent is lost.
E.g. a character encoded as #337 (ő) in the dfm becomes u in the saved dfm.
Can you tell what may be wrong?
Update:
The problem was fixed after I made a change in Control Panel, Region and Language, Formats tab.
I changed the format from English to the language that has the accented characters.
Delphi 7 does not support Unicode, only ASCII. That's why the "extra" characters are not displayed.
The controls are capable of showing Unicode (because Windows does). But the dfm files are still ASCII, and you have no guarantees about characters above 127. (And the VCL does not support them either.)
You can switch to 2010 or 2011 (XE) for Unicode support.
In a non-Unicode Delphi version (Delphi 7, for instance), if your current code page supports a character, then Delphi will store your accented character in the DFM. If you reload it on a system that is set to a different code page, you won't see that character.
In a Unicode Delphi (2009 or later) you will be able to store any code point you want in the DFMs.
AFAIR, all dfm content is encoded as UTF-8 in the dfm files produced by Delphi 7.
So you should be able to use whatever character you need.
But you need to set the proper Font.Charset property value on the component.
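To illustrate that last point, a minimal sketch of setting the charset explicitly, assuming an ordinary VCL form with StdCtrls in its uses clause; the label, the sample text, and the choice of EASTEUROPE_CHARSET (which covers Hungarian ő, i.e. #337) are all placeholders:

    procedure TForm1.FormCreate(Sender: TObject);
    var
      Lbl: TLabel;
    begin
      Lbl := TLabel.Create(Self);
      Lbl.Parent := Self;
      // Pick a font charset that matches the code page of the text; without
      // this, a non-Unicode VCL control falls back to the default charset.
      Lbl.Font.Name := 'Tahoma';
      Lbl.Font.Charset := EASTEUROPE_CHARSET;
      Lbl.Caption := 'Kőbánya';   // sample text containing ő (#337)
    end;

In a pre-Unicode Delphi the caption still only survives a save and reload of the DFM if the system ANSI code page supports the character, as the answers above describe.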

Indy FTP TransferType

I'm using the IdFTP (Indy 10) component to download some files (zip and txt) from a remote location. Before getting each file I set the TransferType to binary.
IdFTP.TransferType := ftBinary;
IdFTP.Get(ASource, ADest, AOverwrite);
I expect that both text and binary files can be downloaded using binary mode. However, it looks like the contents of text files get messed up, while zip files are downloaded correctly. If I set the TransferType to ASCII for text files, it works as expected. How can I detect which TransferType to set for a given file? Is there a common denominator or an automatic setting?
I don't see how the Binary flag can mess up transferred files. Binary type means the server transfers the files without any processing, as-is.
The only thing an FTP server should use the ASCII flag for is correctly handling line endings in text files: usually either (1) Line Feed only, on Unix, or (2) Carriage Return + Line Feed, on Windows. But nowadays most text editors handle both on either system.
So the safest approach is to use the ASCII flag only for very well-known text files, probably only files with a .txt extension, and the Binary flag for all the others.
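A minimal sketch of that rule with Indy (the helper name is my own, and the extension check can be widened to whatever you know to be plain text on your server):

    uses
      SysUtils, IdFTP, IdFTPCommon;

    // ASCII only for extensions known to be plain text, binary for everything else.
    procedure DownloadWithSensibleType(FTP: TIdFTP; const ASource, ADest: string);
    begin
      if SameText(ExtractFileExt(ASource), '.txt') then
        FTP.TransferType := ftASCII
      else
        FTP.TransferType := ftBinary;
      FTP.Get(ASource, ADest, True);   // True = overwrite an existing local file
    end;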
When in doubt, rule it out (!): try transferring the files from the server using the Windows command-line FTP program, and see if the text files still come out wrong. The program will transfer in binary (command BIN) or text (command ASCII) mode. If you transfer files with it and they still arrive differently from what you expect, then something is being done at the server end*. If they arrive fine, then either you (or Indy) are doing something. :-)
*In what way are the text files messed up? If you're transferring Unicode text files, you might be better off transferring them as BINary anyway. I must admit that, as #unknown (yahoo) said, in most cases you should probably stick to BIN mode.
I guess whether the text looks messed up would also depend on how you are viewing the text file: as ANSI or as WideChar.
