Parsing OASIS VLSI layout files

OASIS is a format for representing VLSI topology. I need a parser for it, or some documentation describing how the format is structured; I can't find any mention of it on Google.
Is there an OASIS parser available out there, or at least some documentation on the file structure?

The OASIS file format is a graph structure that defines the layout of the chip. Geometry in the file is divided into cells, and each cell can be placed any number of times at different locations. Placements can be nested within other cells, forming a directed acyclic graph (DAG).
You can parse an .oas file by writing a recursive descent parser and recreating the graph structure in memory; a skeleton is sketched below.
The official specification is the SEMI P39 standard, available from SEMI.
Also, look at the KLayout source code for an example of how to write an OASIS parser.
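To make the recursive-descent idea concrete, here is a minimal skeleton in Java. It only handles the file magic and the record-framing loop; every record's payload still has to be decoded from the SEMI P39 record tables (the only record IDs handled below are the spec's PAD and END records), so treat it as a starting point rather than a working parser.

import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;

public class OasisReader {
    // Every OASIS file starts with these magic bytes.
    static final String MAGIC = "%SEMI-OASIS\r\n";

    public static void main(String[] args) throws IOException {
        try (DataInputStream in = new DataInputStream(new FileInputStream(args[0]))) {
            byte[] magic = new byte[MAGIC.length()];
            in.readFully(magic);
            if (!new String(magic, "US-ASCII").equals(MAGIC)) {
                throw new IOException("not an OASIS file");
            }
            while (true) {
                long recordId = readUnsigned(in); // every record begins with its ID
                if (recordId == 2) {              // 2 = END, the last record in the file
                    break;
                } else if (recordId == 0) {       // 0 = PAD, no payload
                    continue;
                }
                // All other record types (START, CELLNAME, CELL, PLACEMENT, ...)
                // carry a payload that must be decoded per the spec before the
                // next record ID can be read; that is where the real parser and
                // the in-memory cell DAG get built.
                throw new IOException("record " + recordId + ": not decoded in this sketch");
            }
        }
    }

    // OASIS unsigned integers are base-128 varints: seven data bits per byte,
    // least-significant group first, high bit set on every byte but the last.
    static long readUnsigned(DataInputStream in) throws IOException {
        long value = 0;
        int shift = 0;
        int b;
        do {
            b = in.readUnsignedByte();
            value |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return value;
    }
}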

I think Cadence Virtuoso will help you. The December 2013 release is stable and includes full OASIS support.

Related

how can I parse json-ld to markdown

Is there an existing parser to convert JSON-LD to Markdown? I want to generate documentation from my JSON-LD file. If such a thing doesn't exist, how should I go about writing one? Or could I perhaps use a JSON-to-Markdown converter? Any suggestions on how I could do this?
I was just googling for such a program, and found your question.
The closest things I could find are: ocxmd, which is an extension to Markdown; and md-ld, which does not even use proper Markdown - instead, it apparently creates an incompatible version of the format which can be parsed to JSON-LD.
If I were writing such a converter in Python, I would use:
pyld to parse JSON-LD files and expand them using the @context;
And a template engine, likely Jinja2, to generate Markdown representation of every node of the JSON-LD document.
The program would be based on recursion. You might have separate functions to display:
URIs,
Numbers,
Images,
...
The program would recurse over the JSON-LD document and convert each of its sections into Markdown.
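To make the recursive shape concrete, here is a minimal sketch in Java over plain parsed-JSON types (Map/List/String/Number); in the Python design above, pyld's expanded output would be the input tree and Jinja2 templates would take the place of the rendering logic. The heading scheme and all names are illustrative, not part of any library.

import java.util.List;
import java.util.Map;

public class JsonLdToMarkdown {

    static String toMarkdown(Object node, int depth) {
        if (node instanceof Map<?, ?> map) {
            StringBuilder sb = new StringBuilder();
            for (Map.Entry<?, ?> e : map.entrySet()) {
                // Render each property name as a heading, then recurse into its value.
                sb.append("#".repeat(Math.min(depth, 6)))
                  .append(' ').append(e.getKey())
                  .append('\n')
                  .append(toMarkdown(e.getValue(), depth + 1));
            }
            return sb.toString();
        }
        if (node instanceof List<?> list) {
            StringBuilder sb = new StringBuilder();
            for (Object item : list) {
                sb.append("- ").append(toMarkdown(item, depth)); // one bullet per item
            }
            return sb.toString();
        }
        // Scalar leaf: a URI, number, or string. This is where the separate
        // per-type display functions (URIs as links, images, ...) would go.
        return node + "\n";
    }

    public static void main(String[] args) {
        Map<String, Object> doc = Map.of(
                "@id", "http://example.org/doc",
                "name", "Example",
                "keywords", List.of("json-ld", "markdown"));
        System.out.print(toMarkdown(doc, 1));
    }
}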

how do I convert DAQ-derived mxd file format to csv?

Background:
I was given a pile of Yokogawa "mxd" files without documentation or description, and told "convert it".
I have looked for documentation and found none. The OEM doesn't seem to "do" reproducibility in the sense of a "code book". (link)
I have looked for online code for converters and found none.
National Instruments has a connector, but only if I use latest/greatest LabVIEW (link). I don't have that version.
The only other format with the same suffix is ArcGIS's .mxd, but why would a DAQ vendor use a format like that?
Questions:
Is there a straightforward way to convert "mxd" to "csv"?
How do I work out the data structure from the binary? Eyeballing hex seems slow/inefficient.
Is there any relationship between DAQ mxd and ArcGIS mxd?
Yokogawa supplies a program called MX100 Standard Software (https://y-link.yokogawa.com/YL008/?Download_id=DL00002238&Language_id=EN), which can read the *.mxd files and export them to ASCII or Excel. See the well-hidden manual (http://web-material3.yokogawa.com/IMMX180-01E_040.pdf): page 105 has section 3.7, "Converting Data Formats".
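On the second question (working out structure without eyeballing hex): a first pass that usually beats a raw hex dump is scanning the binary for runs of printable ASCII and printing them with their file offsets; channel names, units, and timestamps tend to surface this way and mark where the header ends and the sample data begins. A generic sketch in Java, nothing Yokogawa-specific:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class BinaryStrings {
    public static void main(String[] args) throws IOException {
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        int runStart = -1;
        for (int i = 0; i <= data.length; i++) {
            boolean printable = i < data.length && data[i] >= 0x20 && data[i] < 0x7F;
            if (printable && runStart < 0) {
                runStart = i;                       // a printable run begins
            } else if (!printable && runStart >= 0) {
                if (i - runStart >= 4) {            // keep runs of 4+ characters
                    String s = new String(data, runStart, i - runStart, "US-ASCII");
                    System.out.printf("0x%08X  %s%n", runStart, s);
                }
                runStart = -1;
            }
        }
    }
}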

Stanford NLP - Using Parsed or Tagged text to generate Full XML

I'm trying to extract data from the PennTreeBank, Wall Street Journal corpus. Most of it already has the parse trees, but some of the data is only tagged.
i.e. wsj_DDXX.mrg and wsj_DDXX.pos files.
I would like to use the already parsed trees and tagged data in these files so as not to use the parser and taggers within CoreNLP, but I still want the output file format that CoreNLP gives; namely, the XML file that contains the dependencies, entity coreference, and the parse tree and tagged data.
I've read many of the Javadocs, but I cannot figure out how to produce the output I described.
For POS, I tried using the LexicalizedParser, which lets me supply the tags, but I can only generate an XML file with some of the information I want; there is no option for coreference or for generating the parse trees. Even to get it to correctly generate these sub-optimal XML files, I had to write a script to strip all of the brackets from the files. This is the command I use:
java -cp "*" edu.stanford.nlp.parser.lexparser.LexicalizedParser -outputFormat typedDependenciesCollapsed,wordsAndTags -outputFilesExtension xml -outputFormatOptions xml -writeOutputFiles -outputFilesDirectory my\dir -tokenized -tagSeparator / -tokenizerFactory edu.stanford.nlp.process.WhitespaceTokenizer -tokenizerMethod newCoreLabelTokenizerFactory edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz my\wsj\files\dir
I also can't generate the data I would like for the WSJ data that already has the trees. I tried what is said here and looked at the corresponding Javadocs, using a command similar to the one described. But I had to write a Python program to capture the stdout from analyzing each file and write it to a new file, and the result is only a text file with the dependencies, not the desired XML notation.
To summarize, I would like to use the POS and tree data from these PTB files in order to generate a CoreNLP parse corresponding to what would occur if I used CoreNLP on a regular text file. The pseudo command would be like this:
java -cp "*" edu.stanford.nlp.pipeline.CoreNLP -useTreeFile wsj_DDXX.mrg
and
java -cp "*" edu.stanford.nlp.pipeline.CoreNLP -usePOSFile wsj_DDXX.pos
Edit: fixed a link.
Yes, this is possible, but it is a bit tricky, and there is no out-of-the-box feature that can do it, so you will have to write some code. The basic idea is to replace the tokenize, ssplit and pos annotators (and, in case you also have trees, the parse annotator) with your own code that loads these annotations from your annotated files.
On a very high level you have to do the following (a sketch follows the list):
Load your trees with MemoryTreebank
Loop through all the trees and for each tree create a sentence CoreMap to which you add
a TokensAnnotation
a TreeAnnotation and the SemanticGraphCoreAnnotations
Create an Annotation object with a list containing the CoreMap objects for all sentences
Run the StanfordCoreNLP pipeline with the annotators option set to lemma,ner,dcoref and the option enforceRequirements set to false.
Take a look at the individual annotators to see how to add the required annotations. E.g. there is a method in ParserAnnotatorUtils that adds the SemanticGraphCoreAnnotations.
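A minimal sketch of those steps, assuming the CoreNLP 3.x Java API; the SemanticGraph step is only indicated in a comment, and details such as character offsets and sentence indices (which dcoref may expect) are elided and may need filling in:

import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.ling.TaggedWord;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.trees.MemoryTreebank;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.TreeCoreAnnotations;
import edu.stanford.nlp.util.ArrayCoreMap;
import edu.stanford.nlp.util.CoreMap;

public class InjectTrees {
    public static void main(String[] args) throws Exception {
        // Step 1: load the gold trees from the .mrg file.
        MemoryTreebank treebank = new MemoryTreebank();
        treebank.loadPath(args[0]); // e.g. wsj_DDXX.mrg

        // Step 2: one CoreMap per tree, carrying the tokens and the tree itself.
        List<CoreMap> sentences = new ArrayList<>();
        for (Tree tree : treebank) {
            List<CoreLabel> tokens = new ArrayList<>();
            for (TaggedWord tw : tree.taggedYield()) {
                CoreLabel token = new CoreLabel();
                token.setWord(tw.word());
                token.setTag(tw.tag()); // POS comes from the treebank, not a tagger
                tokens.add(token);
            }
            CoreMap sentence = new ArrayCoreMap();
            sentence.set(CoreAnnotations.TokensAnnotation.class, tokens);
            sentence.set(TreeCoreAnnotations.TreeAnnotation.class, tree);
            // Add the SemanticGraphCoreAnnotations here as well, e.g. via
            // ParserAnnotatorUtils (the exact method signature varies by version).
            sentences.add(sentence);
        }

        // Step 3: wrap all sentence CoreMaps in a single Annotation.
        Annotation document = new Annotation("");
        document.set(CoreAnnotations.SentencesAnnotation.class, sentences);

        // Step 4: run only the downstream annotators on the pre-annotated document.
        Properties props = new Properties();
        props.setProperty("annotators", "lemma,ner,dcoref");
        props.setProperty("enforceRequirements", "false");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        pipeline.annotate(document);
        pipeline.xmlPrint(document, System.out); // the usual CoreNLP XML output
    }
}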

Invoice format recognizer

I'm working with an application that receives all the text from an invoice (the text is obtained by processing the scanned image of the invoice). Because there are several invoice formats in use, I need to determine which format the application is receiving. For example, some formats contain a number-of-units field and some don't (but both have a total cost).
I did some research on parsing techniques but found no workable solution for this. Do you have any suggestions for this type of problem?
In Perl, you can use Marpa, a general BNF parser: describe your invoice format in BNF and Marpa will parse your invoices according to the BNF; see e.g. how it tackled this complex example with this simple code.
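If a full grammar per format feels heavy, a lighter variant of the same idea is to give each known format a set of distinguishing patterns and pick the format whose patterns match the extracted text best. This is a plain scoring heuristic, not Marpa, and all format names and patterns below are made up for illustration. A sketch in Java:

import java.util.List;
import java.util.regex.Pattern;

public class InvoiceFormatGuesser {
    // A format is just a name plus the textual cues that distinguish it.
    record Format(String name, List<Pattern> cues) {}

    static final List<Format> FORMATS = List.of(
            new Format("with-unit-counts", List.of(
                    Pattern.compile("(?i)\\bqty\\b|\\bunits?\\b"),
                    Pattern.compile("(?i)\\btotal\\b"))),
            new Format("total-only", List.of(
                    Pattern.compile("(?i)\\btotal\\b"))));

    static String guess(String invoiceText) {
        String best = "unknown";
        int bestScore = 0;
        for (Format f : FORMATS) {
            int score = 0;
            for (Pattern p : f.cues()) {
                if (p.matcher(invoiceText).find()) score++; // count matching cues
            }
            if (score > bestScore) { bestScore = score; best = f.name(); }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(guess("Item  Qty  Price  Total\nWidget  2  5.00  10.00"));
    }
}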

Parse arbitrary text to produce dependency graph

How do I create a dependency graph (parse tree) for arbitrary sentences? Is there a predefined grammar for parsing English sentences using NLTK?
Example:
I want to make a parse tree for the sentence
“A large company needs a sustainable business model.”
which should look like this.
Please suggest how this can be done.
This question is a near-duplicate of 3125926. But I'll elaborate just a little on the answer given there.
I don't have personal experience with dependency parsing under NLTK, but according to the accepted answer, the integration with MaltParser is documented at http://nltk.googlecode.com/svn/trunk/doc/api/nltk.parse.malt.MaltParser-class.html
If for some reason MaltParser doesn't suit your needs, you might also take a look at MSTParser and the Stanford Parser. I think those three options are the best-known, and I expect one (or all) of them will work for you.
Note that the Stanford Parser includes routines to convert constituency trees to dependencies, and to convert between several of the standard dependency representations, so if you need a specific format, look at the format-conversion arguments to the edu.stanford.nlp.trees.EnglishGrammaticalStructure class.
e.g., to convert from constituency trees to basic dependencies:
java -cp stanford-parser.jar edu.stanford.nlp.trees.EnglishGrammaticalStructure -treeFile <input trees> -basic
