Matlab Parse Binary File - parsing

I am looking to speed up the reading of a data file that has been converted from binary to plaintext. (It is my understanding that "binary" can mean a lot of different things. I do not know what type of binary file I have, just that it's a binary file.) I looked into reading files quickly a while ago and was told that reading/parsing a binary file is faster than reading text. So I would like to read/parse the original binary file, rather than its plaintext conversion, in an effort to speed up the program.
I'm using Matlab for this project (I have a Matlab "program" that needs the data in the file). I guess I need some information on the different "types" of binary, but I really want information on how to read/parse said binary file (I know what I'm looking for in plaintext, so I imagine I'll need to convert that to binary, search the file, then pull the result out into plaintext). The file is a logfile, if that helps in any way.
Thanks.

There are several issues in what you are asking, but above all you need to know the format of the file you are reading. If you can say "At position xx, I can expect to find data yy", that's what you need to know. In your question/comments you talk about searching for strings. You can also do that (much like with a text file): "when I find xxxx in the file, give me the following data up to the nth character, or up to the next yyyy".
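The "when I find xxxx, give me the data up to the next yyyy" approach can be sketched as follows. This is a minimal illustration in Python rather than Matlab, and the marker `TEMP=` and terminator `;` are made-up placeholders, not anything known about the actual logfile.

```python
def extract_after_marker(data, marker, terminator):
    """Return the bytes found between each marker and the next terminator."""
    results = []
    pos = 0
    while True:
        start = data.find(marker, pos)
        if start == -1:
            break  # no further markers in the file
        start += len(marker)
        end = data.find(terminator, start)
        if end == -1:
            end = len(data)  # no terminator: take everything to the end
        results.append(data[start:end])
        pos = end
    return results


# Hypothetical file contents: binary junk around two text fields.
sample = b"\x00\x01TEMP=21.5;\xff\xfeTEMP=22.0;\x00"
fields = extract_after_marker(sample, b"TEMP=", b";")
```

The search operates on raw bytes, so the text you are looking for has to be encoded the same way the writing program encoded it (plain ASCII is assumed here).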
You want to look at the documentation for fread. In the documentation there are snippets of code that will get you started, but as I (and others) said you need to know the format of your binary files. You can use a hex editor to ascertain some information if you are desperate, but what should be quicker is the documentation for the program that outputs these files.
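To show what "knowing the format" buys you, here is a small sketch in Python (not Matlab's fread) that reads a fixed layout. The layout itself is invented for the example: a 4-byte tag, a 32-bit unsigned sample count, then that many 64-bit floats, all little-endian. Your real file will differ.

```python
import struct

# Hypothetical header: 4-byte tag followed by a uint32 sample count, little-endian.
HEADER = struct.Struct("<4sI")


def build_log(tag, samples):
    """Write the invented layout, so the example has something to read."""
    body = struct.pack("<%dd" % len(samples), *samples)
    return HEADER.pack(tag, len(samples)) + body


def parse_log(data):
    """Read the header at offset 0, then the doubles that follow it."""
    tag, count = HEADER.unpack_from(data, 0)
    samples = struct.unpack_from("<%dd" % count, data, HEADER.size)
    return tag, list(samples)


blob = build_log(b"LOG1", [1.5, -2.0, 3.25])
tag, samples = parse_log(blob)
```

Once the offsets and types are known, reading is just a matter of stating them. Without that knowledge there is nothing to tell the reader where one value ends and the next begins.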
Regarding different "binary files", well, there is byte order: least significant byte first (little-endian) or last (big-endian). You really don't need to know about that for this work. There are also other platform-dependent issues which I am almost certain you don't need to know about (unless you are moving the binary files between Mac, PC and Unix machines). If you read to almost the bottom of the fread documentation, there is a section entitled "Reading Files Created on Other Systems" which talks about the issues and how to deal with them.
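If byte order ever does matter, the effect is easy to see. The following Python sketch (used here only for illustration) interprets the same four bytes both ways:

```python
import struct

raw = bytes([0x01, 0x00, 0x00, 0x00])

# Least significant byte first (little-endian): 0x00000001
little = struct.unpack("<I", raw)[0]

# Least significant byte last (big-endian): 0x01000000
big = struct.unpack(">I", raw)[0]

print(little, big)  # prints "1 16777216"
```

Reading with the wrong byte order does not fail; it silently produces wildly wrong numbers, which is usually how the problem is noticed.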
One more comment: you say that "reading/parsing a binary file is faster than text". This is not true (or even if it is, odds are you won't notice the performance gain). In terms of development time, however, reading/parsing a text file will save you huge amounts of time.

The simple way to store data in a binary file is to use the 'save' command.
If you load from a saved variable it should be significantly faster than if you load from a text file.

Related

What exactly is an .EBIN file and how can I access its contents?

A friend of mine has a problem. He has hundreds of highly confidential .EBIN files for a medical study created by a person that is no longer available.
I figured that it's probably an Erlang directory - I downloaded Erlang and looked for several file type specifications, but I just can't find a way to "open" this binary file.
I feel really stupid right now as I should be able to easily access this as a long-term programmer, but I'm clueless. I don't even know what to enter into a search engine.
I'd guess that they just contain serialized Erlang data ("terms"). Try starting Erlang and entering the following from the Erlang shell:
erlang:binary_to_term(element(2,file:read_file("YOURFILE.EBIN"))).
See http://erlang.org/doc/man/erlang.html#term_to_binary-2 for details about the term_to_binary() function and see http://erlang.org/doc/apps/erts/erl_ext_dist.html for details about the term format. If the bytes on disk don't look like this, it's likely that the binary data has also been encrypted before writing it on disk.
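A quick way to check whether the bytes "look like this" without starting Erlang is to inspect the first two bytes. In the external term format every binary starts with the version byte 131, and a term written with the compressed option has the tag 80 as its second byte. A rough sketch in Python, where the function name and return strings are just illustrative:

```python
ETF_VERSION = 131     # first byte of every external-term-format binary
COMPRESSED_TAG = 80   # second byte when the term was written compressed


def classify_ebin(data):
    """Guess, from the first two bytes, whether data is a serialized Erlang term."""
    if len(data) < 2 or data[0] != ETF_VERSION:
        return "not external term format"
    if data[1] == COMPRESSED_TAG:
        return "compressed term"
    return "plain term"


# The atom 'ok' in the older ATOM_EXT encoding: 131, 100, length 2, "ok".
example = bytes([131, 100, 0, 2]) + b"ok"
verdict = classify_ebin(example)
```

To use it on a real file, read it in binary mode first, for example `classify_ebin(open("YOURFILE.EBIN", "rb").read())`. A match on the first byte is only a hint, not proof; a file that fails the check is either a different format or has been transformed (for instance encrypted) before being written.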

Loading textfile into stringlist with firemonkey on osx when the encoding is unknown

I am having a hard time loading a textfile into a stringlist in firemonkey on osx when the encoding of the textfile is not known.
When I just use list.loadfromfile(filename), I get an exception regarding encoding most of the time.
list.loadfromfile(filename,TEncoding.unicode) will also fail when the file is in ansi, and vice versa.
There is no issue on Windows, where list.loadfromfile(filename) just works, but not on osx.
I can't specify the encoding, because it will be unknown (users provide the text files).
Any clue how I can get around this encoding issue when running the app on a mac?
In general this is not possible. It is quite possible to create a single file that is valid when interpreted in all common encodings. This has been discussed many times, for instance: The Notepad file encoding problem, redux.
I'm assuming that you are working with files that do not contain byte order marks, BOMs. Obviously if your input files contained BOMs then you could simply check the BOM and be done.
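For completeness, the BOM check mentioned above is only a few lines. This is sketched in Python rather than Pascal; the function name is illustrative. The one subtlety is ordering: the UTF-32 little-endian BOM begins with the same two bytes as the UTF-16 little-endian BOM, so the longer ones must be tested first.

```python
import codecs

# Longest BOMs first, because BOM_UTF32_LE starts with BOM_UTF16_LE.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data):
    """Return (encoding name, BOM length), or (None, 0) if no BOM is present."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


encoding, skip = detect_bom(codecs.BOM_UTF8 + b"hello")
```

After detection, decode `data[skip:]` with the returned encoding so the BOM itself does not end up in the text.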
With that assumption stated, the right solution to the problem, in a perfect world, is to know the encoding. Either pick a specific encoding which your program requires, or arrange for the user to tell you the encoding when they supply the file.
If, for whatever reason, you cannot do that then the next best thing to do is to use heuristics to attempt to guess the encoding used. I'm not aware of any Pascal code to do this. But you should be able to put something together that will work reasonably well. This answer gives an outline of a basic strategy: https://stackoverflow.com/a/20747074
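As a rough idea of what such a heuristic could look like (this is my own simplified sketch in Python, not the strategy from the linked answer, and it will guess wrong on some inputs): look for the alternating zero bytes typical of UTF-16 text that is mostly ASCII, then try strict UTF-8, then fall back to a single-byte ANSI code page. The choice of cp1252 as the fallback is an assumption.

```python
def guess_decode(data, fallback="cp1252"):
    """Best-effort decode of BOM-less bytes; returns (text, encoding guessed)."""
    # Mostly-ASCII UTF-16 text has a zero in every second byte.
    if len(data) >= 2 and len(data) % 2 == 0:
        half = len(data) // 2
        even_zeros = data[0::2].count(0)
        odd_zeros = data[1::2].count(0)
        candidate = None
        if odd_zeros > half // 2 and even_zeros == 0:
            candidate = "utf-16-le"
        elif even_zeros > half // 2 and odd_zeros == 0:
            candidate = "utf-16-be"
        if candidate is not None:
            try:
                return data.decode(candidate), candidate
            except UnicodeDecodeError:
                pass  # the pattern was a coincidence; keep trying
    # Strict UTF-8 rejects most text written in a single-byte code page.
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        pass
    # Last resort: a single-byte code page; unmapped bytes become U+FFFD.
    return data.decode(fallback, errors="replace"), fallback


text, used = guess_decode("héllo".encode("cp1252"))
```

The UTF-16 test has to come before the UTF-8 test, because zero bytes are valid UTF-8 and such a file would otherwise decode "successfully" into text full of NUL characters.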

Print contents of rpg file in human-readable format

Context
A friend of mine is having trouble printing source code to a human readable format.
The compiled (I assume) programs of their welding robot have the .rpg extension. They want to collect print-outs in human-readable format, possibly for backup or future reference.
Their supplier can provide the software that accomplishes this, albeit at a considerable cost (and possibly an annual license). Because of this, my friend decided to ask me if an easier/cheaper solution exists.
Examples & Pictures
The files can be read on the console of the robot, an example:
I've done some minor research and I'm fairly sure this is the Report Program Generator (RPG) language developed by IBM. The Assembly-like syntax seems to match; it might be one of the later versions of the language.
My friend has sent me an example .rpg file; the contents seem binary with some string literals scattered throughout. Screenshot of the contents of an example file in hexadecimal:
The Question
There is not much, if any, clear information to be found online so I suppose I have multiple questions (for anyone that might know more about this):
Is this (first image) Report Program Generator (RPG) code?
Does the .rpg file contain compiled or processed code? Maybe an intermediate format?
Is it possible to convert files as shown in the example, back to source-code or human-readable format, kind of 'disassemble' it?
If anyone knows more, don't hesitate to give me any information or ask more details if necessary. Thanks in advance!
And maybe not an important question but still something that bugs me (and might indicate I'm on the wrong track):
If this is indeed an RPG program, why would the compiled/processed binary have the .rpg extension, shouldn't the source-file have that? This leads me to believe I'm either (a) assuming the wrong things (the language, etc...) or (b) this is an intermediate format, easier for machines to read, that has to be interpreted by some kind of runtime system.
I don't think that's any version of IBM's RPG language. RPG does have a MOVEL opcode, but it doesn't have any of the others.
Also, all the versions of the IBM language have been intended for business programming. I doubt that it would have been used for robotics.
My guess is that's a proprietary language of the company that makes the robot.
There are some similarities, but it does not look like the IBM RPG language.
RPG sources are in fact source physical file members. They are not stored in the "traditional" file system but in OS/400 libraries. Therefore RPG sources have no extension. They can be converted to Integrated File System stream file though.
I'm afraid I can't answer this question, as the language is unknown to me.
I suspect that the OP has misidentified the file type/extension: the extension may actually be .prg, with the files serving as instructions for a Panasonic industrial welding robot. The following forum [drilled down to Panasonic Robots] bills itself as "the biggest Industrial Robots Supportforum worldwide!"; it is perhaps a good place to ask about the images provided in the OP, and about getting source from what appears to be a binary instruction stream.
FWiW, the first image seems to show that the Ezed utility [on the console] gives that human-readable format, so then the question might be how to get that saved and then how to transfer that elsewhere; e.g. what type of comm ports and file transfer utilities are available from whatever platform/OS.

Binary Serialized File - Delphi

I am trying to deserialize an old file format that was serialized in Delphi using binary serialization. I know nothing about the structure of the file except some very high-level records that are in it.
What steps would you take to solve this problem? Any tools etc?
A good hexeditor, and use the gray matter to identify structures.
If you get a hint what kind of file it is, you can search for more specialized tools.
Running the unix/Linux "file" command can be good too (*). See Barry's comment below for how it works. It can be a quick check for common filetypes like DBF, ZIP etc. hidden behind a different extension.
(*) there are 3rd party builds for windows, but they might lag in versions. If you can do it on a recent *nix distro, it is advised to do so.
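The core idea behind that kind of check is comparing the first bytes of the file against known signatures ("magic numbers"). A tiny sketch in Python, with a deliberately short signature table (the real "file" utility knows far more formats and applies more elaborate rules):

```python
# A few well-known file signatures; the table is illustrative, not complete.
SIGNATURES = [
    (b"PK\x03\x04", "zip"),
    (b"\x1f\x8b", "gzip"),
    (b"%PDF-", "pdf"),
    (b"\x89PNG\r\n\x1a\n", "png"),
]


def sniff(data):
    """Return a format name if data starts with a known signature, else None."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return None


kind = sniff(b"PK\x03\x04\x14\x00\x00\x00")
```

Only the first handful of bytes are needed, so in practice you would read something like `open(path, "rb").read(16)` and pass that in. A result of None just means the format is not in the table, which is where the hex editor comes back in.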
The serialization process simply loops over all published properties and streams their value to a text file. If you do not know the exact classes that were streamed to the file you will have a very hard time deserializing the file. (if not impossible)
A good hex editor is first. If the file is read without buffering (eg read directly from a TFileStream) you could gain some information when using ProcMon from SysInternals; You can see exactly what data is read in what chunks and thus determine more quickly where the boundaries are between the structures you already identified.

What are the differences or advantages of using a binary file vs XML with TClientDataSet?

Are there any differences or advantages to using a binary file versus an XML file with TClientDataSet?
Binary will be smaller and faster.
XML will be more portable and human readable.
The Binary file will be a little smaller.
The main advantage of the XML format is that you can pass it around via http(s) protocols.
Binary is smaller and faster, but only readable by TClientDataSets.
XML is larger and slower (both are not that bad, i.e. not by orders of magnitude bigger or slower).
XML is readable by people (not recommended in general, but it is doable), and software.
Therefore it is more portable (as Nick wrote).
TClientDataSets can load and save their own style of XML, or you can use the Delphi XML Mapper tool to read and write any kind of XML.
XSLT can for instance be used to transform those XML files into any kind of text, including other XML, HTML, CSV, fixed columns, etc.
In contrast to what Tim indicates, both binary and XML can be transferred through HTTP and HTTPS. However, sending XML is often preferred because it is easier to trace.
Without having tested it: I guess the binary format would be quite a lot faster when reading and writing. You'd better do your own benchmarks for that, though.
Another advantage of binary might be that it cannot be easily edited, which prevents people from mucking up the data outside the application.
When using Delphi 2009, we have noticed that if the file has an extension of .XML, it will not save in binary format over an existing dfXMLUTF8 format, even with a LoadFromFile, SaveToFile. Changing the file extension to something else (.DAT, for example) allows saving the file in dfBinary. Our experience is that the binary file, in addition to being somewhat more difficult for the end-user to manipulate (a plus!), is approximately 50% smaller than the dfXMLUTF8 format file.

Resources