Fixing language decoding issues in MIDI files - character-encoding

I have a MIDI file of a Chinese song with information stored in the track names. I use a library called MIDI.js to parse it, and the output looks like this:
±¡²`»¡¸Ü¥¼´¿Á¿
Where should I look to find the correct encoding / decoding?

There is no standard for specifying text encoding in MIDI files (at least none that is used in practice).
If you know the song is Chinese, you should try common Chinese encodings (GB2312/GB18030 or Big5).
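A minimal Python sketch of that trial-and-error approach, assuming the garbled text is the original bytes misread as Latin-1 (ISO-8859-1). The helper name `try_encodings` and the candidate list are illustrative and not part of MIDI.js:

```python
def try_encodings(garbled, candidates=("gb18030", "big5")):
    """Re-decode mojibake under each candidate encoding.

    Assumes the garbled text is the original bytes misread as Latin-1.
    Returns {encoding: decoded text, or None if the bytes are invalid}.
    """
    raw = garbled.encode("latin-1")  # recover the original bytes
    results = {}
    for name in candidates:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            results[name] = None  # bytes are not valid in this encoding
    return results


if __name__ == "__main__":
    for name, text in try_encodings("±¡²`»¡¸Ü¥¼´¿Á¿").items():
        print(name, "->", text)
```

More than one candidate may decode without raising an error, so someone who reads Chinese still has to judge which result is meaningful.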

Related

ID3 Parser and Editor

I'm writing an ID3 parser and editor. It does already support ID3v1, v2.1-2.3. Are there any other widely used ID3 versions or extensions? For example, I've read about Enhanced ID3v1 tag (which goes before ID3v1) and starts with "TAG+", but I've never seen it inside MP3 files. Should I implement support for it anyway?
"ID3v2.1" never existed.
Yes, the Enhanced TAG is identified by TAG+ and extends ID3v1.
For a list of all metadata systems to expect in MP3 files, see https://stackoverflow.com/a/62366354. ID3v2.4 should have top priority, because you will encounter it most often besides ID3v2.3. Then go for the informal and/or legacy ones, because those can still be encountered (just because files become old doesn't mean they cease to exist).
Keep the following things in mind when parsing files:
A file can have both ID3v1 and ID3v2 tags.
A file can have multiple ID3v2 tags (e.g. ID3v2.3 and ID3v2.4). Multiple tags of the same version shouldn't occur, but your parser should accept them without problems anyway.
ID3v2 is not limited to MP3 files (but ID3v1 and all its informal extensions are).
Consider the following parsing order in an MP3 file:
Check for ID3v1 at the end of the file.
Check for ID3v1.2 in front of ID3v1.
Check for Enhanced TAG in front of ID3v1.
Check for one or more ID3v2 tags at the start of the file and, for ID3v2.4, a footer at the end of the file in front of all ID3v1-like tags.
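Part of that parsing order can be sketched in Python as follows. This is an illustration under stated assumptions, not a complete parser: it works on the file contents as a bytes object, assumes the usual block sizes (128 bytes for ID3v1, 227 bytes for the Enhanced TAG), and leaves out ID3v1.2 and ID3v2.4 tags appended at the end of the file. The function names are made up for this example.

```python
def synchsafe_to_int(b):
    """Decode a 4-byte synchsafe integer (7 useful bits per byte)."""
    return (b[0] << 21) | (b[1] << 14) | (b[2] << 7) | b[3]


def find_id3_tags(data):
    """Return a list of (kind, offset) pairs for tags found in `data`."""
    found = []
    end = len(data)

    # ID3v1: fixed 128-byte block at the very end, starting with "TAG".
    if end >= 128 and data[end - 128:end - 125] == b"TAG":
        end -= 128
        found.append(("ID3v1", end))
        # Enhanced TAG: assumed 227-byte block directly before ID3v1.
        if end >= 227 and data[end - 227:end - 223] == b"TAG+":
            end -= 227
            found.append(("Enhanced TAG", end))

    # ID3v2: one or more tags at the start, each with a 10-byte header
    # ("ID3", major version, revision, flags, 4-byte synchsafe size).
    pos = 0
    while pos + 10 <= end and data[pos:pos + 3] == b"ID3":
        major, flags = data[pos + 3], data[pos + 5]
        size = synchsafe_to_int(data[pos + 6:pos + 10])
        found.append((f"ID3v2.{major}", pos))
        pos += 10 + size
        if major == 4 and flags & 0x10:
            pos += 10  # skip the 10-byte footer after the tag body
    return found
```

Checking the end of the file first means the ID3v2 loop knows where the audio data stops and does not run into the trailing tags.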

String UTF-8 encoding with cyrillic in H2O

I load a CSV file in UTF-8 encoding that contains Cyrillic strings. After parsing in the Flow interface, I don't see Cyrillic but unreadable symbols like "пїўпѕЂпѕ™пїђпѕ". How can I use UTF-8 Cyrillic strings in H2O?
This appears to be a bug in the Flow interface, but only in the setupParse command. If you continue through and do the import, the data gets imported correctly.
I've reported the bug, with test data and screenshots (taken in Firefox) here:
https://0xdata.atlassian.net/browse/PUBDEV-4640
So if you have additional information, or the bug is behaving differently for you, it'd be good to add it to that bug report.
Check your CSV file in both text and binary (hex) view to find out how the Cyrillic text is encoded. If it is UTF-8, the word
Привет
is stored as the bytes
D0 9F D1 80 D0 B8 D0 B2 D0 B5 D1 82
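One way to run that check without a hex editor is a few lines of Python. This is only a sketch; the helper name `describe_bytes` is made up for this example:

```python
def describe_bytes(raw):
    """Return (hex dump, is_valid_utf8) for a bytes object."""
    try:
        raw.decode("utf-8")
        valid = True
    except UnicodeDecodeError:
        valid = False
    hex_dump = " ".join(f"{b:02X}" for b in raw)
    return hex_dump, valid


if __name__ == "__main__":
    # The word "Привет" encoded as UTF-8.
    print(describe_bytes("Привет".encode("utf-8")))
    # → ('D0 9F D1 80 D0 B8 D0 B2 D0 B5 D1 82', True)
```

To inspect a real file, read its first bytes in binary mode (for example `open(path, "rb").read(64)`, where `path` is your CSV file) and pass them to the helper.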

How to programmatically set application name in Japanese?

Currently I am trying to set application name using
net.rim.blackberry.api.homescreen.HomeScreen.setName("これはある");
but it throws an IllegalArgumentException.
Can anyone provide the solution?
I am using Blackberry JDE 5.0.
This is probably a string encoding problem. Try
new String(new String("これはある").getBytes("UTF-16BE"), "UTF-16BE");
It's not pretty but I think that will work.
Here's a link to the Blackberry string spec: http://www.blackberry.com/developers/docs/5.0.0api/java/lang/String.html
By default it's ISO-8859-1 which does not include Japanese characters.
The problem you are facing is how to get a string represented in your source code into your application with the same characters. For Latin characters, this is pretty straightforward: we can just put the characters in quotes and get a string, like "Hello world".
When you go to non-Latin scripts, like Japanese, it gets harder. You can still write Japanese directly in your source code, but you need to make sure your editor and your compiler agree on an encoding so that the characters are interpreted correctly. The Java SE compiler takes an argument "-encoding" which allows you to specify the encoding of your Java source files.
Unfortunately, rapc, the BlackBerry compiler, does not offer an option to specify the encoding, even though it invokes javac itself. So rapc uses the platform default, which is UTF-8 on Linux and OS X and ISO-8859-1 on Windows.
The way around this problem is to use a feature of the Java language for parsing strings: Unicode escaping. When you enter the six-character sequence "\u3053" in a string, the Java compiler parses the four digits as hexadecimal and uses the corresponding Unicode code point, which sidesteps any problem with source file encoding.
So "Hello world" and "\u0048\u0065\u006c\u006c\u006f\u0020\u0057\u006f\u0072\u006c\u0064" will result in the same strings appearing in your class files.
Because of this, Svetlin's answer from the comments is the right approach here:
net.rim.blackberry.api.homescreen.HomeScreen.setName("\u3053\u308C\u306F\u3042\u308B");
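Writing such escapes by hand is error-prone, so a small script can generate them. The following Python sketch (the helper name `to_java_escapes` is made up for this example) emits one escape per UTF-16 code unit, which is what a Java string literal expects:

```python
def to_java_escapes(text):
    """Convert text to Java Unicode escapes, one per UTF-16 code unit."""
    units = text.encode("utf-16-be")  # two bytes per code unit
    return "".join(
        "\\u%04X" % int.from_bytes(units[i:i + 2], "big")
        for i in range(0, len(units), 2)
    )


if __name__ == "__main__":
    print(to_java_escapes("これはある"))
    # → \u3053\u308C\u306F\u3042\u308B
```

Paste the output between the quotes of the Java string literal; the source file then contains only ASCII and no longer depends on the compiler's default encoding.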

Reading and Parsing a .MID file with Lua?

I am trying to read a .MID file with Lua and then parse it into a table with all of the notes (i.e. {"A", "B#", "Cb", etc.}), but I cannot manage to read the file correctly. I use io.open and file:lines(), but writing those same lines into another MIDI file results in a non-working MIDI file.
Is there any easier way to read and parse a .MID file with Lua?
The Standard MIDI File format is binary, not text. So you cannot expect to read it as "lines" at all. Instead, you'll need to use the read function to get bytes and inspect them. You might be better off finding a C library for MIDI files and binding it to Lua.
.MID files (presumably Standard MIDI format) are binary, not text. Reading them with file:lines() will not work. You need to read the entire thing into a "string" (Lua strings can hold arbitrary bytes of data) with file:read("*a") instead; this will read the entire file into a single string. You also need to make sure that you open the file in binary mode (for platforms where this makes a difference).
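The byte-level approach is the same in any language. As an illustration in Python rather than Lua, the sketch below reads the header chunk of a Standard MIDI File from a bytes object; in Lua the equivalent input would come from a file opened in binary mode and read with file:read("*a"). The function name `parse_midi_header` is made up for this example.

```python
import struct


def parse_midi_header(data):
    """Parse the MThd chunk at the start of a Standard MIDI File."""
    if len(data) < 14 or data[:4] != b"MThd":
        raise ValueError("not a Standard MIDI File")
    # Big-endian: 4-byte chunk length, then three 16-bit fields.
    length, fmt, ntracks, division = struct.unpack(">IHHH", data[4:14])
    return {"length": length, "format": fmt,
            "tracks": ntracks, "division": division}


if __name__ == "__main__":
    # A hand-built header: format 1, 2 tracks, 480 ticks per quarter note.
    header = b"MThd" + bytes([0, 0, 0, 6, 0, 1, 0, 2, 1, 224])
    print(parse_midi_header(header))
```

The track chunks that follow the header start with "MTrk" and a 4-byte length, and can be walked in the same way.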
There seems to be a framework called MIDI.lua for parsing MIDI data. Not sure how well it handles MIDI files, though.

How does acrobat encode annotations added as sticky notes to pdfs?

We have been reading and writing Sticky Notes/Annotations/Comments to pdfs via an activex control in our application for a number of years. We have recently upgraded to Delphi2009 with Unicode Support. The following is causing problems.
When we call
CAcroPDAnnot.GetContents
the results are rather strange and we lose our Unicode characters. It is not like saving as an ANSI string, which would usually return ?????. Instead we get a string such as
‚És‚­“ú‚É•—Ž×‚ð‚Ђ¢‚½‚ç
For a string of Japanese characters.
However if I save the comments in the pdf to a datafile via the menu in the pdf itself it is written to file as something like
0kˆL0Oeå0k˜¨ª0’0r0D0_0‰
The latter can be exported and reimported into an Acrobat PDF and will recreate the correct Unicode characters. However, once I call CAcroPDAnnot.GetContents in my code, it comes back as something else.
Is CAcroPDAnnot.GetContents broken?
Is there an encoding scheme I should be aware of?
Is there an alternative I might be able to do?
Thanks
‚És‚­“ú‚É•—Ž×‚ð‚Ђ¢‚½‚ç
That's the string:
に行く日に風邪をひいたら
in CP-932 aka Shift-JIS encoding, an awful but lamentably still-popular encoding in Japan.
You're currently interpreting it as CP-1252 (Windows Western European). If your PDF-reading component won't convert it for you automatically, you'll need to find a way to detect what encoding the document is in and convert it manually.
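The manual conversion amounts to turning the wrongly decoded text back into bytes and decoding those bytes with the right code page. A Python sketch of the idea (not Delphi, and the helper name `fix_mojibake` is made up for this example):

```python
def fix_mojibake(garbled, wrong="cp1252", right="cp932"):
    """Undo text that was decoded with `wrong` instead of `right`."""
    return garbled.encode(wrong).decode(right)


if __name__ == "__main__":
    # The last four characters of the garbled string in the question.
    print(fix_mojibake("‚½‚ç"))
    # → たら
```

This only works while every original byte is still present. CP-1252 leaves a few byte values (such as 0x8D) undefined, and those may be dropped when text is copied around, so it is safer to convert from the raw bytes returned by the component than from an already garbled string.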
I don't know what Delphi provides for reading encodings, but have you got the encodings for Shift-JIS installed in Windows, from the Control Panel -> Regional Options -> "Install files for East Asian languages" option? If not, that might explain why it'd be failing to convert automatically, perhaps.
You're not exactly giving us a lot of information to work with.
I take it you're talking about the "Acrobat.CAcroPDAnnot" class' method GetContents here. Which version of Acrobat are you using? Have you perhaps switched versions (or run an update) around the time you started programming with Delphi 2009?
Then: how did you instantiate the object? If using a *_TLB.pas file generated from the DLL, are you certain it still matches it? (Try re-generating it, if uncertain).
Third: how are you calling the method? What type of variable are you assigning the result to?
What might also help, is if you could provide a sample of an annotation (preferably including non-ASCII chars); and for that annotation:
what it should look like (and what it does look like inside Reader)
what it returns when using a pre-2009 version of Delphi*
what it returns when using Delphi 2009*
(* preferably the HEX byte codes of the (ansi/wide)strings; but output from the Ctrl-F7 inspector should do)
Then maybe someone could provide a more meaningful answer.
OK, one of the main differences between Delphi 2009 and the earlier versions is that the default string type is now a Unicode string. That means that if you use the same ActiveX component as in previous versions, you are assigning Unicode strings to ANSI strings, and that is usually not a good idea.
There are a couple of solutions for this problem:
See whether you can upgrade your ActiveX component so that it fully supports Unicode strings.
Use AnsiString instead of string to communicate with the ActiveX component. In this case you can keep using the old interface, but you are still bound by the same limitations.
Use another control that creates PDFs. There are plenty to choose from, but be prepared to change a big chunk of your software. (Some controls are XML based and declare their encoding.)

Resources