How do I convert the DAQ-derived mxd file format to CSV? - parsing

Background:
I was given a pile of Yokogawa "mxd" files without documentation or
description, and told "convert it".
I have looked for documentation and found none. The OEM doesn't seem to "do" reproducibility in the sense of a "code book". (link)
I have looked for online code for converters and found none.
National Instruments has a connector, but only if I use latest/greatest
LabVIEW (link). I don't have that version.
The only other software I can find that uses the same suffix is ArcGIS, but why would a DAQ use a format like that?
Questions:
Is there a straightforward way to convert "mxd" to "csv"?
How do I find the relationship using the binary data? Eyeballing HEX seems slow/inefficient.
Is there any relationship between DAQ mxd and ArcGIS mxd?

Yokogawa supplies a program called MX100 Standard Software: https://y-link.yokogawa.com/YL008/?Download_id=DL00002238&Language_id=EN. This program can read the *.mxd files and also export them to ASCII or Excel. See the well-hidden manual: http://web-material3.yokogawa.com/IMMX180-01E_040.pdf. Page 105 has chapter 3.7, "Converting data formats".
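If the vendor software is not available, one generic alternative to eyeballing hex is to look for a repeating record length first. The sketch below is only an exploratory aid, not a description of the mxd layout, which is undocumented here. It assumes the file stores fixed-size records, and the function name guess_record_size is made up for illustration.

```python
def guess_record_size(data: bytes, max_size: int = 64) -> int:
    """Return the period (in bytes) at which the data repeats itself most often.

    Assumption: the file is mostly fixed-size records, so bytes at the same
    offset in neighbouring records (padding, high bytes of counters, flags)
    tend to be equal.
    """
    best_size, best_score = 1, -1.0
    for size in range(1, max_size + 1):
        compared = len(data) - size
        if compared <= 0:
            break
        # Fraction of positions whose byte equals the byte `size` bytes later.
        matches = sum(a == b for a, b in zip(data, data[size:]))
        score = matches / compared
        if score > best_score:  # strict '>' keeps the smallest period on ties
            best_size, best_score = size, score
    return best_size
```

Once a plausible record size shows up, Python's struct.unpack can be tried on single records with different field layouts. Any file header should be skipped before running the scan, since it will not follow the record pattern.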

Related

Convert MathJax output into a format readable by jqmath

On a project, I use CKEditor with the MathJax plugin to insert formulas.
In another part of this project, I would like to use jqmath, because it is faster and better integrated with wkhtmltopdf (I use those formulas in some documents produced by wkhtmltopdf, and MathJax has some issues there, especially with overbars).
My problem: the syntax differs between MathJax and jqmath, and jqmath of course doesn't understand formulas written in MathJax syntax.
So my question is: is there a way to convert math strings from MathJax syntax to jqmath syntax?
Cheers
Both MathJax and jqmath use MathML internally and both understand it as an input format (jqmath added MathML input support a while back, see the copy-me.html in the distribution). So you can generate MathML from MathJax and feed that into jqmath.

OASIS VLSI layout files parser

OASIS is a format for VLSI topology representation. I need a parser for the OASIS format, or maybe some documentation which will describe how this format is structured. I can't find any mentions of it in Google.
Is there an OASIS parser available out there, or at least some documentation on the file structure?
The OASIS file format is a graph structure that defines the layout of the chip. Geometry in the file is divided into cells. Each cell can then be placed any number of times in different locations. The placements can be nested within other cells, forming a directed acyclic graph (DAG).
You can parse the oas file by writing a recursive descent parser and recreating the graph structure in memory.
The official specification for the oas format can be found here.
Also, look at the KLayout source code for an example of how to write a parser for OASIS.
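To make the cell hierarchy described above concrete, here is a minimal sketch of the in-memory graph a parser might build. It does not read OASIS records at all. The Cell class, its fields, and flat_shape_count are hypothetical names chosen for illustration, not part of the OASIS specification or of KLayout.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    name: str
    shapes: int = 0  # number of geometry elements defined directly in this cell
    # Each placement is (child cell name, (dx, dy) offset); hypothetical layout.
    placements: list = field(default_factory=list)


def flat_shape_count(cells: dict, top: str) -> int:
    """Count the shapes in `top` after expanding every nested placement."""
    cache = {}  # a cell placed many times is only expanded once

    def visit(name: str) -> int:
        if name not in cache:
            cell = cells[name]
            cache[name] = cell.shapes + sum(
                visit(child) for child, _offset in cell.placements
            )
        return cache[name]

    return visit(top)
```

For example, a "via" cell with 3 shapes placed four times in a "row" cell that adds 1 shape of its own gives 13 flattened shapes for "row". The cache is what makes use of the DAG property: shared cells are visited once no matter how often they are placed.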
I think Cadence Virtuoso will help you. The December 2013 release is stable, with all features added for OASIS.

Mahout: Importing CSV file to Sequence Files using regexconverter or arff.vector

I just started learning how to use Mahout. I'm not a Java programmer, however, so I'm trying to stay away from having to use the Java library.
I noticed there is a shell tool, regexconverter. However, the documentation is sparse and uninstructive. Exactly what does specifying a regex option do, and what do the transformer class and formatter class do? The Mahout wiki is marvelously opaque. I'm assuming the regex option specifies what counts as a "unit" or so.
The example they list is of using the regexconverter to convert http log requests to sequence files I believe. I have a csv file with slightly altered http log requests that I'm hoping to convert to sequence files. Do I simply change the regex expression to take each entire row? I'm trying to run a Bayes classifier, similar to the 20 newsgroups example which seems to be done completely in the shell without need for java coding.
Incidentally, the arff.vector command seems to allow me to convert an ARFF file directly to vectors. I'm unfamiliar with ARFF, though it seems to be something I can easily convert CSV log files into. Should I use this method instead, and skip the sequence file step completely?
Thanks for the help.

Determine if received data is PostScript or PCL

I have a service that receives printer data via tcp/ip. When the data is received, is there reliable, efficient way to examine the data stream and determine if the data is PostScript vs PCL data? For example, are there characters I could look for at the beginning of the data stream to indicate the format?
I would probably just count the number of escape characters in the file. PCL will have gobs of them. PostScript will have gobs of % signs. That isn't a perfect solution, but it's dead simple and I'll bet it would actually be quite reliable.
The only "real" way I can see of doing this is to actually parse the data as PCL and as PostScript and see which one works.
I'll add my 2¢.
Like others have mentioned here, your first stab at programmatically identifying the document would be to look at the first two characters. If it starts with %!, it is PostScript. If it starts with an escape character (hex 1B, octal 033, decimal 27), it is very likely PCL, since PCL data usually starts with PCL commands. This will likely resolve 99% of the documents you need to process. If it still isn't known, you can search the document for a showpage string: a PostScript document has to have a showpage to render the page. If you can't find one and there are escape characters in the file, you know it is PCL. If there is no showpage and there are no escape characters either, you can err on the side of PCL, because raw text files are valid PCL and printers can blort them out as they come.
PostScript data must begin with "%!ps" or "%!PS". It may be a longer readable string like "%!PS-Adobe-3.0", but that is basically it.
Most likely PCL has a similar signature; I remember seeing it in the past.
According to the PCL 5 General Printing FAQs PCL files should start with ESC "E". I assume another ESC sequence must follow. So my guess is that files starting with bytes 1B 45 1B are most likely PCL files.
This leaves PCL files unrecognized which don't adhere to this rule.
In my use case it's macOS that always produces PCL with the ESC E at the beginning.
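The checks suggested in the answers above can be combined into one small function. This is a sketch of those heuristics only, not a real parser for either language, and it assumes the whole job (or at least enough of it to contain a showpage) is already buffered. The function name guess_print_language is made up for illustration.

```python
ESC = b"\x1b"  # escape character: hex 1B, decimal 27


def guess_print_language(data: bytes) -> str:
    """Return 'postscript' or 'pcl' using the heuristics from the answers."""
    if data.startswith(b"%!"):
        return "postscript"  # PostScript signature, e.g. %!PS-Adobe-3.0
    if data.startswith(ESC):
        return "pcl"  # PCL jobs usually open with an escape sequence such as ESC E
    if b"showpage" in data:
        return "postscript"  # fallback: look for the page-rendering operator
    # No signature and no showpage: err on the side of PCL, since plain
    # text is acceptable PCL input.
    return "pcl"
```

The first two checks only need the first two bytes of the stream, so a service can usually decide before the rest of the job has arrived and fall back to the showpage search only for the remaining cases.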

Where can I find a dump of raw text on the web?

I am looking to do some text analysis in a program I am writing. I am looking for alternate sources of text in its raw form similar to what is provided in the Wikipedia dumps (download.wikimedia.com).
I'd rather not have to go through the trouble of crawling websites, trying to parse the HTML, extracting text, etc.
What sort of text are you looking for?
There are many free e-books (fiction and non-fiction) in .txt format available at Project Gutenberg.
They also have large DVD images full of books available for download.
NLTK provides a simple Python API to access many text corpora, including Gutenberg, Reuters, Shakespeare, and others.
>>> from nltk.corpus import brown
>>> brown.words()
['The', 'Fulton', 'County', 'Grand', 'Jury', 'said', ...]
Project Gutenberg has huge numbers of ebooks in various formats (including plain text).
