Ruby : Extract an embedded XML inside a PDF - ruby-on-rails

I have a Rails application connected to a Flask microservice.
I have a PDF stored in AWS S3.
I can access that PDF with the following code:
require "aws-sdk-s3"

s3 = Aws::S3::Client.new(region: 'eu-west-3')
image_data = s3.get_object(bucket: "bucket-name", key: "file-name")
image_data.body # IO-like object; call .read on it to get the raw PDF bytes
That PDF contains an embedded XML file.
All I'm trying to do at this point is extract that XML from the PDF, either within my Rails application or within my Flask application.
That's it: I just want to extract the XML, nothing else.
Unfortunately, I'm unable to find any relevant information, in Ruby or Python 3, on how to do this properly.
For full information, here is the actual PDF (this is fake data) from which I'm trying to extract the XML.
If you open it with adobe
https://drive.google.com/file/d/10pN8aFDAYY3qDBziODH-L441ZcnfdxQU/view?usp=sharing
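One rough way to approach this without any gem is to scan the raw PDF bytes for stream objects, inflate the ones that are Flate-compressed, and keep whatever looks like an XML document. The sketch below is only that, a sketch: the helper name extract_embedded_xml is hypothetical, and it assumes the PDF is not encrypted and that the attachment is stored either uncompressed or with /FlateDecode. It also ignores the /Length entry, so it can misread unusual files. A dedicated PDF library that exposes embedded files would be more robust.

```ruby
require "zlib"

# Hypothetical helper: returns every stream payload that looks like an XML
# document. Assumes an unencrypted PDF whose attachment is stored either
# uncompressed or with /FlateDecode.
def extract_embedded_xml(pdf_bytes)
  data = pdf_bytes.b # work on a binary copy of the input
  results = []

  # Each PDF stream object sits between the "stream" and "endstream" keywords.
  data.scan(/stream\r?\n(.*?)\r?\n?endstream/m).flatten.each do |raw|
    payload = begin
      Zlib::Inflate.inflate(raw)
    rescue Zlib::Error
      raw # not Flate-compressed, so fall back to the bytes as stored
    end

    # Drop a UTF-8 byte order mark if present, then check for an XML declaration.
    # XMP metadata packets start with "<?xpacket" and are therefore skipped.
    text = payload.delete_prefix("\xEF\xBB\xBF".b)
    results << text if text.lstrip.start_with?("<?xml")
  end

  results
end
```

With the S3 code above, this could be called as extract_embedded_xml(image_data.body.read), which returns an array of XML strings (empty if nothing matched).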

Related

How to Upload Video to Rails Server from App Inventor

My goal is essentially to upload a video file, like this, but with Ruby on Rails instead of PHP. I can successfully send JSON data to my server, but haven't been able to get file uploads working. The end objective is to have the file be in the server's /tmp directory, just like files uploaded via a webpage file_field_tag.
This image:
shows what I've tried so far. The result is that on the server, the parameter list is empty, unlike if you had used a file_field_tag. In the PHP example, they are able to get the contents of the file from the input stream... maybe there is something similar in Rails?
I know my API works, as I was able to successfully make a request using a JavaScript XMLHttpRequest, so I'm led to believe the solution involves working around what App Inventor offers for HTTP requests.
Edit: Removed unsupported header since the PHP example doesn't use headers anyways
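On the question of reading the input stream: in a Rails controller, request.body is an IO-like object holding the raw request body, which is the closest equivalent to PHP's input stream. Assuming App Inventor sends the file as the raw body rather than as multipart form data (which would explain the empty parameter list), a minimal sketch for copying that stream into a temp file could look like this. The helper name save_raw_upload and the default ".mp4" extension are illustrative assumptions, not part of any Rails API.

```ruby
require "tempfile"
require "stringio"

# Hypothetical helper: copies a raw upload stream into a Tempfile, which is
# created under Dir.tmpdir (usually /tmp).
# `io` is anything that responds to #read, e.g. request.body in a controller.
def save_raw_upload(io, basename: "upload", ext: ".mp4")
  file = Tempfile.new([basename, ext])
  file.binmode              # video data is binary, so avoid any transcoding
  IO.copy_stream(io, file)  # stream the body without loading it all at once
  file.flush
  file.rewind
  file
end
```

In a controller action this might be called as save_raw_upload(request.body); the caller is responsible for closing or unlinking the returned Tempfile.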

PDF::Toolkit.pdftotext not working in Rails

I am developing a Rails 4 application that uses PDF::Toolkit. Everything works fine locally: opening PDFs and converting PDF to text both work.
But after I deployed the code to the server, every PDF reads as blank; no content is extracted from the PDF file.
PDF::Toolkit.pdftotext(@document.attachment.path,"-layout")
The call above returns nil for every PDF.
I have tried a lot of things but could not find the cause.
Does anyone have an idea?
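One possible cause, and this is only a guess since the post gives no details about the server, is that the pdftotext command-line tool is installed locally but missing on the server. A small check like the following could confirm or rule that out. The helper name executable_on_path? is hypothetical.

```ruby
# Hypothetical helper: reports whether a command such as "pdftotext" can be
# found as an executable file in one of the PATH directories.
def executable_on_path?(name, path: ENV.fetch("PATH", ""))
  path.split(File::PATH_SEPARATOR).reject(&:empty?).any? do |dir|
    candidate = File.join(dir, name)
    File.file?(candidate) && File.executable?(candidate)
  end
end
```

Running executable_on_path?("pdftotext") in a Rails console on the server shows whether the binary is visible to the application's environment.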

Reading a hosted PDF file from a URL and converting to text in iOS using Objective-C

I am reading a PDF file stored locally (using nsbundle) and converting it to text.
But when I try to read the PDF over HTTP (i.e. from a URL) and pass that path to my PDF-to-text converter, it returns nil.
Any solutions would be appreciated.
My basic question is how to read a PDF file from a URL path?
Going that way, there are many restrictions on converting a PDF file to plain text. If you want to display the PDF in the app, use PDF Reader Core.

extract data from Pdf using Web harvesting

How can I extract data from PDFs using Web Harvest? I can get the URLs of all the relevant PDFs on a page, but I am not able to extract data from those PDFs. I am using Web Harvest version 2.0 to extract the PDF URLs. Please help.
How would I incorporate pdfcommand into Web Harvest to get the text? Is there any way to do it without running a batch file?
I think Web Harvest is not sufficient for this. You should use wget and PDFBox to get your result. First, download all the PDFs from your URLs into a folder with wget or with Web Harvest itself. Then run the PDFBox command to extract the text from the PDFs. You can learn more about the PDFBox command-line tools at http://pdfbox.apache.org/commandline/. You can also create a batch file to run these steps in order.

convert html file(not html layout) into pdf using Rails

I have an application that takes a zip file as its input. The zip file contains an HTML file, its CSS, and its images. I need to convert this HTML to PDF and send the PDF back. I have looked at Prince XML and Wicked PDF but don't know how to perform this exact task.
In other words, my application should act as an HTML-to-PDF converter: clients send us zip files, and my app should generate the corresponding PDF. How should I go about this task?
You can use PDFKit gem.
https://github.com/jdpace/PDFKit
http://www.jonathanspies.com/posts/11-Simple-PDFkit-example-in-Rails-3
The process I would implement is:
Upload the zip via a form
Unzip the contents
Process the html file within using Hpricot or similar (if you need to tidy it up first)
Convert the raw html to PDF with https://github.com/jdpace/PDFKit
