Contents of executable files cannot be copied? - character-encoding

So, text files can be copied and pasted to another location by copying the contents of the original file into a blank text file. This can be done with a text editor. Highlight contents of text file, copy, create new blank text file, paste in to it.
But why can't image, audio, video, executable files, etc. be copied and pasted like this? For example, I open an executable file with a text editor, copy all of its contents, create a new blank text file, paste into it (through the text editor), and change the extension to .exe. But the file cannot be run. Why?
Also, I would like to be able to edit these types of files like I do with text files. Is there a way?

Because executable and media files are "binary" files. Text files are binary as well, but different. All files are created binary, but some are created more binary than others.
You're opening a binary file in a text editor. This immediately changes the semantics of the bytes. The main problem is bytes whose values happen to correspond to newline characters in a text file (0x0A and 0x0D): the editor renders them as a platform-dependent newline (\r\n on Windows, for example). When you copy that, a lone 0x0A or 0x0D becomes 0x0D 0x0A.
Then there are control characters and other non-printable characters. Not every byte value between 0x00 and 0xFF maps to a displayable character, so the editor either omits them or substitutes a placeholder glyph. When you copy text containing those, they'll be omitted or otherwise mangled.
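A quick Python sketch of the newline problem (any language with a text/binary file distinction behaves similarly): the 8-byte PNG signature deliberately contains both newline bytes, so a round trip through text mode demonstrably changes the data.

```python
# Sketch: round-tripping binary data through text mode corrupts it.
# The PNG signature contains both newline bytes (0x0D 0x0A and a bare 0x0A)
# precisely so this kind of corruption is detected early.
import os
import tempfile

payload = b"\x89PNG\r\n\x1a\n"  # the 8-byte PNG file signature

path = os.path.join(tempfile.mkdtemp(), "demo.bin")
with open(path, "wb") as f:
    f.write(payload)

# Read it back as *text*: universal newlines silently turn "\r\n" into "\n"
with open(path, "r", encoding="latin-1") as f:
    text = f.read()

round_tripped = text.encode("latin-1")
print(round_tripped == payload)   # False: the 0x0D byte is gone
```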
In conclusion: you cannot reliably use text to represent arbitrary byte values unless you encode those values first, as is done with, for example, Base64 encoding.
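As a sketch of the Base64 point: every one of the 256 possible byte values survives a round trip through its textual form, which is exactly what raw copy/paste cannot guarantee.

```python
# Sketch: Base64 gives every byte value a safe textual representation, so
# binary data can round-trip through text (and the clipboard) losslessly.
import base64

data = bytes(range(256))                        # every byte value, 0x00..0xFF
text = base64.b64encode(data).decode("ascii")   # plain printable characters
restored = base64.b64decode(text)

print(restored == data)                         # True: nothing mangled or dropped
```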
If you want to edit a binary file, use an editor that is aware of those bytes: a "hex editor". Do note that changing random byte values in a binary file does not guarantee the sanity of that file: the format may contain checksums, and your edit will invalidate them.
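A small sketch of the checksum caveat, using CRC-32 as a stand-in for whatever checksum a given format actually embeds:

```python
# Sketch: many binary formats store a checksum over their payload (CRC-32
# here as a stand-in). Editing a byte without recomputing it invalidates it.
import zlib

payload = bytearray(b"some binary payload")
stored_crc = zlib.crc32(bytes(payload))   # the checksum as written into the file

payload[3] ^= 0xFF                        # "hex edit" a single byte

print(zlib.crc32(bytes(payload)) == stored_crc)  # False: checksum no longer matches
```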

Related

How to load TextAsset from txt file containing extended ASCII characters in Unity?

In my case specifically, I have bullet-points (• or #149) in my text file.
If I copy paste "•" into my Unity text field in the editor, it shows up, so I am pretty sure the bullet-point is lost in the reading process. (I checked in debug mode, and indeed the bullet-point is lost at reading).
This is how I read in my text file as a TextAsset:
TextAsset content = Resources.Load(SlideManager.slideLanguage+"\\"+fileName+" ("+SlideManager.slideNumber+")") as TextAsset;
It turns out that the way I read the file is completely fine. It reads the file correctly, but the encoding of the file is ASCII, so the resource loader cannot interpret non-ASCII characters and drops them.
Thus, since the bullet point is not a standard ASCII character but an extended one, you have to specify the encoding of your text files.
For example, set encoding to UTF-8, and then it will work. I used notepad++ to set encoding, but I am sure there are many other ways you can do it.
To set encoding in Notepad++
Click the Encoding menu (fifth from the left in the menu bar by default) and select Convert to UTF-8.
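As a sketch of what the conversion fixes (assuming the file was saved in a Windows single-byte code page such as Windows-1252, where the bullet is byte 149):

```python
# Sketch: byte 149 (0x95) is the bullet in Windows-1252, but it is not valid
# ASCII, so an ASCII-only reader has nothing to map it to and drops it.
raw = b"Point \x95 one"          # how the bullet is stored in a cp1252 file

ascii_view = raw.decode("ascii", errors="ignore")   # bullet silently lost
cp1252_view = raw.decode("cp1252")                  # bullet survives

utf8_raw = cp1252_view.encode("utf-8")  # what "Convert to UTF-8" produces
print(ascii_view)    # Point  one
print(cp1252_view)   # Point • one
```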

file encoding on a mac, charset=binary

I typed in
file -I *
to look at all the encoding of all the CSV files in an entire directory. A lot of the file encodings are charset=binary. I'm not too familiar with this encoding format.
Does anyone know how to handle this encoding?
Thanks a lot for your time.
"Binary" encoding pretty much means that the encoding is unknown.
Everything is binary data under the hood. In a text file each byte, or sequence of bytes, represents a specific character, and which character in particular depends on the encoding the file was encoded with, or that you're interpreting it with. Some encodings are unambiguously recognisable; others aren't (e.g. any file is valid in any single-byte encoding, so you can't easily distinguish one single-byte encoding from another). What file is telling you with charset=binary is that it doesn't have any more specific information than that the file contains bits and bytes (Capt'n Obvious to the rescue). It's up to you to interpret the file in the correct encoding, or as the correct file format.
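A small sketch of why single-byte encodings are indistinguishable: the same byte decodes without error in any of them, just to different characters, so `file` has no way to pick one.

```python
# Sketch: one byte, three "successful" interpretations. Every byte value is
# valid in every single-byte encoding, so nothing marks one as correct.
raw = b"\xe9"   # a single byte with no intrinsic meaning

print(raw.decode("latin-1"))   # é  (Western European)
print(raw.decode("cp1251"))    # й  (Cyrillic)
print(raw.decode("cp437"))     # Θ  (old DOS code page)
```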

In Xcode, how do I extract .png image from an .icns file programmatically?

A beginner in iOS7 and Xcode. If I understand well, an .icns file is a container and not an image. It contains different sizes with different extensions (e.g. png, jpg) of the icons and perhaps other stuff. Right or wrong?
If so, is there a way of extracting a 512x512 .png version of the icon from this file? I was told I have to use NSData to get hold of the contents of the .icns file and then extract a substring to get the image data. But what I don't know is: where does the block of data for the 512x512 .png start, and where does it end?
I hope I am formulating my question correctly. I am new at this.
Thank you for your help
I don't have any code ready (though you might want to look at libicns).
Basically, you have to read the file contents sequentially. First you read the header (8 bytes). Then, after the header, you read icon by icon. Each icon starts with an 8-byte header containing the type and length. You read the type and length and, if the type doesn't match, you skip the icon and proceed with the next one.
Once you have found the desired icon, you can extract it. It starts right after the length field and is length - 8 bytes long (the length includes the header which is only needed for the icns file format).
Note that the length is stored with the most significant bit first (big endian) but iOS is little endian. So you need to convert the numbers.
The icon type can either be treated as a string of four characters (without a trailing zero) or as an unsigned 4-byte number. Some compilers can treat 'ic09' as an unsigned int constant (note the single quotes, vs. double quotes for a string).
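The walk described above can be sketched as follows. This is Python rather than Xcode code, purely to illustrate the structure: 'ic09' is the 512x512 PNG entry, the lengths are read big-endian, and the sample file is hand-built for the demonstration.

```python
# Sketch: walk an icns file entry by entry until the wanted icon type is
# found. The file header and each icon header are 8 bytes: 4-byte type
# followed by a big-endian 4-byte length (">" in the struct format).
import struct

def extract_icon(icns_bytes, wanted=b"ic09"):
    magic, file_len = struct.unpack(">4sI", icns_bytes[:8])
    assert magic == b"icns"
    pos = 8
    while pos + 8 <= file_len:
        icon_type, icon_len = struct.unpack(">4sI", icns_bytes[pos:pos + 8])
        if icon_type == wanted:
            # payload starts after the 8-byte entry header;
            # the stored length includes that header
            return icns_bytes[pos + 8:pos + icon_len]
        pos += icon_len          # skip this entry, try the next one
    return None

# Tiny hand-built file: header + one dummy entry + the entry we want
png_data = b"\x89PNG\r\n\x1a\n...payload..."
entry1 = b"TOC " + struct.pack(">I", 8 + 4) + b"xxxx"
entry2 = b"ic09" + struct.pack(">I", 8 + len(png_data)) + png_data
fake = b"icns" + struct.pack(">I", 8 + len(entry1) + len(entry2)) + entry1 + entry2

print(extract_icon(fake) == png_data)   # True
```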

What kind of char is this and how do I convert it to a text?

What kind of char is this and how do I convert it to a text in c#/vb.net?
I opened a .dat file in notepad, took a screenshot and attached it here.
Your screenshot looks like the digits "0003" in a box. This is a common way to display characters for which a glyph isn't available.
U+0003 is the "END OF TEXT" control character. It's unlikely to occur within a text file, but a ".dat" file might be a mixture of text and binary data.
You'll need to use a hex editor to find the exact ASCII code the file contains (assuming the file is ASCII, which seems to be an entirely incorrect assumption). It's safe to say that whatever byte sequence is in the file is not a printable character in whatever encoding the editor used to open the file, and that is why it substituted that graphic for the actual character.
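A sketch of that inspection without a hex editor; the byte string here is a made-up stand-in for the .dat file's contents:

```python
# Sketch: inspect the raw bytes instead of trusting the editor's placeholder
# glyph. Printable ASCII is shown as-is; everything else is labelled by value.
data = b"name\x00\x03value"   # text mixed with control bytes

labels = [chr(b) if 32 <= b < 127 else f"<0x{b:02X}>" for b in data]
print(labels)   # ['n', 'a', 'm', 'e', '<0x00>', '<0x03>', 'v', ...]
```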

How to read unicode characters accurately

I have a text file containing what I am told are unicode characters, for example:
\320\222\320\21015-25'ish per main or \320\222\320\21020-40'ish per starter
Which should read:
£15-25'ish per main or £20-40'ish per starter
However, when viewing this text in Firefox, the output is mangled with various unwanted characters.
So, are these really unicode characters? And if so, how can I convert them to a form which is displayable correctly?
You need to:
know the encoding of the text file
read the data without losing information (either by reading it as binary or by reading it as text with the right encoding)
write the data with the right encoding (either by writing it out in binary and specifying the original encoding, or writing it out as text in an encoding which you also specify in the headers)
Try to separate out the problem into "reading" and/or "writing". Do you know the encoding of the file? What do you have to do with the file? When you've written it with backslashes, is that actually what's in the file (i.e. an escaped form) or is it actually just a "normal" text encoding such as UTF-8?
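If the backslash sequences really are octal byte escapes and the underlying file is UTF-8, decoding them looks like this sketch. (Note that \320\222 actually decodes to the Cyrillic letter 'В', not '£', which suggests the data was already mangled before it was written; decoding at least tells you what is actually there.)

```python
# Sketch: turn backslash-octal escapes like "\320\222" back into raw bytes,
# then decode those bytes as UTF-8 (an assumption about the file's encoding).
import re

escaped = r"\320\222"                      # as it appears in the file
raw = re.sub(
    rb"\\([0-7]{3})",                      # exactly three octal digits
    lambda m: bytes([int(m.group(1), 8)]),
    escaped.encode("ascii"),
)
print(raw)                  # b'\xd0\x92'
print(raw.decode("utf-8"))  # 'В' -- if the bytes really are UTF-8
```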
