Converting an AVDL file into something Apache's avro python package can parse

What I would like to be able to do is take an .avdl file and parse it into python. I would like to make use of the information from within python.
According to the documentation, Apache's python package does not handle .avdl files. I need to use their avro-tools to convert the .avdl file into something it does know how to parse.
According to the documentation at https://avro.apache.org/docs/current/idl.html, I can convert a .avdl file into a .avpr file with the following command:
java -jar avro-tools.jar idl src/test/idl/input/namespaces.avdl /tmp/namespaces.avpr
I ran my .avdl file through avro-tools, and it produced an .avpr file.
What is unclear is how I can use the python package to interpret this data. I tried something simple...
schema = avro.schema.parse(open("my.avpr", "rb").read())
but that generates the error:
SchemaParseException: No "type" property:
I believe that avro.schema.parse is designed to parse .avsc files (?). However, it is unclear how I can use avro-tools to convert my .avdl into .avsc. Is that possible?
I am guessing there are many pieces I am missing, and I do not quite understand (yet) what the purpose of all of these files is.
It does appear that an .avpr is a JSON file (?) so I can just read and interpret it myself, but I was hoping that there would be a python package that would assist me in navigating the data.
Can anyone provide some insights into this? Thank you.

The answer is to use the idl2schemata command with avro-tools.jar, providing it with an output directory to which it can write the .avsc files. The .avsc files can then be read by the Avro Python package.
For example:
java -jar avro-tools.jar idl2schemata src/test/idl/input/namespaces.avdl /tmp/
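Each of the generated .avsc files can then be loaded directly. A minimal sketch, assuming idl2schemata wrote a file for one of your record types (the file name below is illustrative; depending on the avro release the function is spelled avro.schema.parse or avro.schema.Parse):

import avro.schema

# idl2schemata writes one .avsc file per named type defined in the IDL
with open("/tmp/MyRecord.avsc") as f:
    schema = avro.schema.parse(f.read())  # avro.schema.Parse in some releases

print(schema)  # the parsed Schema object, usable with DatumReader/DatumWriter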

Related

Tex4ebook tidy command missing

I'm converting a LaTeX document with tex4ebook (built on tex4ht) and I get the following warning: exec_epub: tidy command seems missing, you should install it in order to make a valid epub file.
But where do I install it and what does it do?
Thanks in advance.
tex4ebook uses HTML Tidy to clean up the generated XML files, which can contain some issues resulting from the conversion process. If tex4ebook cannot find Tidy, it falls back to regular expressions for the clean-up. That is usually enough to produce a valid EPUB file, so you don't really need to worry about the warning, as long as the generated file is valid.

Identify old dos system file type and decode it into text file

I have picked up an old DOS system from my friend, and I need to import the data into SQL. Before importing the data, I need to decode it into a readable text file, but I have failed to do so. I have tried several things:
The file command in an Ubuntu terminal; it said "data"
The online TrID tool; it said macbin (MacBinary 1)
Tried bin2hex, but couldn't unhex it
Tried some online MacBinary-to-hex converters, no luck there either
Tried to open it in macOS, but it keeps extracting files
bin2hex said there was nothing there
StuffIt Expander doesn't recognize it
This is the file that I need to decode:
https://gofile.io/?c=wdbs6A
Please let me know if you need the original program.
I think they are just some database files.
Use this site for explanations; they even have a file analyzer showing you the data inside.
You will need to rename the files to a .db extension instead of .ocm.

Is it possible to directly parse API protobuf files in iOS?

I am making an app that shows the real-time location of local buses. I have an API that returns a .pb (protobuf) file with vehicle positions. I am handling protocol buffers for the first time, and I have no idea why we can't parse them like a JSON file.
I saw a library named "Swift-Protobuf", but in its documentation they are asking to run a command to convert the protobuf file into a Swift object. But as I am making API calls every minute that return the protobuf file, how can I run that command every time?
$ protoc --swift_out=. my.proto
I just want to parse those .pb files into a Swift object so that I can use the data in my project.
They are asking to run a command to convert the protobuf file into a Swift object. But as I am making API calls every minute that return the protobuf file, how can I run that command every time?
I think you've misunderstood the documentation: you don't need to run protoc --swift_out=. my.proto for every .pb file you receive; you use that command to generate code that knows how to read and write data according to the schema that you define in a .proto file. You can then take that generated code and add it to your iOS project, and after that you can use the code to read and write protobuf data that matches your schema.
I am making an app that shows the real-time location of local buses.
So before you can get started, you're going to need a .proto file that describes the data format used by whoever provides the bus location data, or you'll need whoever provides that data to use SwiftProtobuf or similar to generate a Swift parser for their .proto file.
...I have no idea why we can't parse them like a JSON file.
Well, the point of the protobuf format is to be language-agnostic and faster/easier to use than JSON or XML, and one of the design decisions that Google apparently made is to sacrifice human readability for size/speed. So you could write a parser to parse these files just as you would JSON data, but you'd have to learn how the format works. But it's a lot easier to describe the data you're sending and have a program generate the code. One nice aspect of this arrangement is that you can describe the schema once and then generate code that works with that schema for several languages, so you don't have to write code separately for your iOS app, your Android app, and your server.
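To make the generate-once idea concrete, here is the same workflow sketched in Python with the standard protobuf runtime (the module and message names are hypothetical and only for illustration; with SwiftProtobuf the pattern is identical, you just use the Swift code that protoc --swift_out generates instead):

# Generated once, offline, from the provider's schema:
#   protoc --python_out=. vehicle_positions.proto
# "vehicle_positions_pb2" and "VehiclePositions" are hypothetical names.
import vehicle_positions_pb2

def decode_feed(pb_bytes: bytes):
    # Decode one .pb payload fetched from the API into a typed object
    feed = vehicle_positions_pb2.VehiclePositions()
    feed.ParseFromString(pb_bytes)  # standard protobuf deserialization
    return feed

You run protoc once when the schema changes, not once per API response.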

Best way to parse pdf and word doc

I want to build an application that gets info out of a PDF or Word doc and populates it into my database.
How do I go about this in the best way? Bear in mind that only certain information needs to be extracted from the PDF or Word document.
To parse PDF, I know of two choices:
pdftotext: check pdf2text (see the sketch after this list)
OCR: try tesseract
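A minimal sketch of the pdftotext route in Python, assuming the pdftotext CLI from poppler-utils is installed (file names are illustrative):

import subprocess

# Call the pdftotext CLI to extract plain text from a PDF
subprocess.run(["pdftotext", "input.pdf", "output.txt"], check=True)

with open("output.txt", encoding="utf-8") as f:
    text = f.read()  # plain text, ready to search for the fields you need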
There are plenty of free, open-source libraries that will help you parse the input file.
The basic idea: don't build the parser from scratch; use an open-source library to help you out.
It would help if you said which language you're writing your code in.
For example, for PDF you can find:
https://www.pdfparser.org/ (for PHP)
https://www.codeproject.com/Articles/12445/Converting-PDF-to-Text-in-C (for C#)
and more.
For DOC/DOCX, it's pretty much the same.
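For the DOCX side, a library such as python-docx (my suggestion, not something named above) reads the document text directly; a minimal sketch with an illustrative file name:

from docx import Document

# Collect the text of every paragraph in a .docx file
doc = Document("input.docx")
text = "\n".join(p.text for p in doc.paragraphs)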

Parse or convert .pb files under .sonar folder

I'm using SonarQube 5.6.5 and everything works well. Now I need to parse the issues.pb file generated under the .sonar/batch-report/ folder. I tried using jsonformat, but it is not working.
They are "Protobuf" format, which is a format by google for serializing data. You can get started here or find for example a tutorial here on how to use it in Java.
What I don't understand is that your question has a tag "protobuf-net", which github page explains very well how to use it (in .NET).
