Context
Our platform lets users upload Word documents. Those documents are stored in Google Drive and then downloaded back into our platform as HTML, which is used to build a section where users can interact with that content.
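For reference, the round trip looks roughly like the sketch below (a minimal sketch based on the google-api-client gem; the credential setup, file names and MIME types are illustrative assumptions, not our exact code):

require 'stringio'
require 'googleauth'
require 'google/apis/drive_v3'

drive = Google::Apis::DriveV3::DriveService.new
drive.authorization = Google::Auth.get_application_default(['https://www.googleapis.com/auth/drive'])

# upload the .docx and ask Drive to convert it to a Google Doc
metadata = Google::Apis::DriveV3::File.new(
  name: 'doc_file',
  mime_type: 'application/vnd.google-apps.document'
)
uploaded = drive.create_file(
  metadata,
  upload_source: 'doc_file.docx',
  content_type: 'application/vnd.openxmlformats-officedocument.wordprocessingml.document'
)

# export the converted document back as HTML for the interactive section
html = StringIO.new
drive.export_file(uploaded.id, 'text/html', download_dest: html)
File.write('doc_file.html', html.string)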
Rails 5.0.7
Ruby 2.5.7p206
selenium-webdriver 3.142.7 (latest stable version compatible with our ruby and rails versions)
Problem
Some of the documents contain charts or graphics that are not processed correctly, so the result is wrong after the whole process.
We have been trying to fix this at the point where we receive the Word document, before sending it to Google Drive.
I'm looking for a simple way to export the entire chart and/or table as an image; if anyone knows of a way to do this, the advice would be much appreciated.
Edit 1: Adding two screenshots: the first shows the chart in the original Word doc, the second shows how it ends up looking in our system.
Here are the approaches I have tried that haven't worked for me so far.
Approach 1
Using Nokogiri to read the document and find the nodes that contain the charts (we've found they are called drawing), then using Selenium to navigate through the file and take a screenshot of that particular section.
The problem we found with this approach is that the versions of our gems are not compatible with the latest versions of Selenium and its web drivers (Chrome or Firefox), so it is not possible to perform this action.
Another problem, which seems to be due to security restrictions, is that Selenium is not able to browse to local files and open them.
options = Selenium::WebDriver::Firefox::Options.new(binary: '/usr/bin/firefox', headless: true)
driver = Selenium::WebDriver.for :firefox, options: options

path = "#{Rails.root}/doc_file.docx"
driver.navigate.to("file://#{path}")
# Here the first issue occurs: the driver is not able to navigate to the file
puts "Title: #{driver.title}"
puts "URL: #{driver.current_url}"

# Below is the code I am trying to use to replace the drawings with screenshots of them
drawing_elements = driver.find_elements(:css, 'w|drawing')
modified_paragraphs = []

drawing_elements.each do |drawing_element|
  # take a screenshot of the paragraph that wraps the drawing
  paragraph_element = drawing_element.find_element(:xpath, '..')
  paragraph_element.screenshot.save('paragraph.png')
  modified_paragraphs << File.read('paragraph.png')
end

driver.quit

# re-open the document with Nokogiri and swap each drawing's paragraph for its screenshot
file = File.open(File.join(Rails.root, 'doc_file.docx'))
doc = Nokogiri::XML(file)
drawing_elements = doc.css('w|drawing')

drawing_elements.each_with_index do |drawing_element, i|
  paragraph_element = drawing_element.parent
  paragraph_element.replace(modified_paragraphs[i])
end

File.write('modified_doc.docx', doc.to_xml)
s3_client.put_object(bucket: bucket, key: document_path, body: File.read('modified_doc.docx'))
File.delete('doc_file.docx')
Approach 2
Using Nokogiri to get the drawing elements and then trying to convert them directly to an image using RMagick or MiniMagick.
This only works if the drawing element actually contains an image; in that case it converts correctly. The problem is when the drawing element contains not an image but other elements such as graphicData, pic, blipFill and blip. The code then needs to loop into the element and rebuild it, but at that point the element seems to be malformed and it can't be rebuilt.
Another issue with this approach is when it finds elements that seem to make up an SVG file; there too it needs to loop through all the elements and try to rebuild them, but, as with the issue above, the element seems to be malformed.
response = s3_client.get_object(bucket: bucket, key: document_path)
docx = response.body.read

Zip::File.open_buffer(docx) do |zip|
  entry = zip.find_entry("word/document.xml")
  doc_xml = entry.get_input_stream.read
  doc = Nokogiri::XML(doc_xml)

  drawing_elements = doc.xpath("//w:drawing")
  drawing_elements.each do |drawing_element|
    # get_chil_by_name is our helper that returns the first child matching the given name
    node = get_chil_by_name(drawing_element, "graphic")

    if node.xpath("//a:graphicData/a:pic/a:blipFill/a:blip").any?
      img_data = node.xpath("//a:graphicData/a:pic/a:blipFill/a:blip").first.attributes["r:embed"].value
      img = Magick::Image.from_blob(img_data).first
      img.write("node.jpeg")
      node.replace("<img src='#{img.to_blob}'/>")
    elsif node.xpath("//a:graphicData/a:svg").any?
      svg_data = node.xpath("//a:graphicData/a:svg").to_s
      Prawn::Document.generate("node.pdf") do |pdf|
        pdf.svg svg_data, at: [0, pdf.cursor], width: pdf.bounds.width
      end
    else
      puts "unsupported format"
    end
  end

  # update the file in S3
  s3_client.put_object(bucket: bucket, key: document_path, body: doc.to_xml)
end
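One thing I suspect matters here (an assumption based on how OOXML packages are usually laid out, not something I have confirmed against our documents): the r:embed attribute on a:blip is not the image bytes themselves but a relationship ID, and the actual bytes live under word/media/ with the mapping defined in word/_rels/document.xml.rels. A rough sketch of resolving it inside the same Zip::File.open_buffer block would be:

# inside the same Zip::File.open_buffer(docx) do |zip| ... end block
# 1. map relationship IDs to their targets, e.g. "rId5" => "media/image1.png"
rels_xml = zip.find_entry("word/_rels/document.xml.rels").get_input_stream.read
rels = Nokogiri::XML(rels_xml).remove_namespaces!
targets = rels.xpath("//Relationship").each_with_object({}) do |rel, map|
  map[rel["Id"]] = rel["Target"]
end

# 2. resolve the blip's r:embed to the bytes stored in the package
blip = node.xpath(".//*[local-name()='blip']").first
embed_id = blip.attribute_with_ns(
  "embed",
  "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
).value
image_bytes = zip.find_entry("word/#{targets[embed_id]}").get_input_stream.read

# 3. now the blob really is image data, so RMagick can read it
img = Magick::Image.from_blob(image_bytes).first
img.write("node.jpeg")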
Approach 3
Convert the drawing elements, together with their parent elements, to a PDF file and then to an image.
This has basically the same issue as Approach 2: it needs to loop through all the elements and try to rebuild them, and we haven't found a way to do that.
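For completeness, this is roughly what the idea would look like if the whole file were rendered externally instead of rebuilding the XML (a sketch that assumes LibreOffice and ImageMagick/Ghostscript are installed on the host, which is not our current setup, and it rasterizes whole pages rather than individual drawing elements):

require 'mini_magick'

docx_path = Rails.root.join('doc_file.docx').to_s
out_dir   = Rails.root.join('tmp').to_s

# render the whole document to PDF with LibreOffice in headless mode
system('libreoffice', '--headless', '--convert-to', 'pdf', '--outdir', out_dir, docx_path)

# rasterize each PDF page to a PNG (ImageMagick delegates PDF input to Ghostscript)
pdf_path = File.join(out_dir, 'doc_file.pdf')
MiniMagick::Tool::Convert.new do |convert|
  convert.density(150)
  convert << pdf_path
  convert << File.join(out_dir, 'doc_page_%02d.png')
end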
I'm using the GNU texinfo package to generate both PDF and .info files from a .texi file.
I'm trying to update an old .texi file (not changed since 2001) and generate the same PDF output. I've resolved a number of issues, but a couple are still outstanding. In the old PDF, the title was in Helvetica and the body text in Liberation Serif. In the new PDF, both are Computer Modern.
I've read everything that I can find about fonts, but I'm not able to change the fonts. Nothing that I do seems to work. Everything I have tried generates errors.
In my .texi file, before any \setfont directives, I have \def\fontprefix{uh} which, if I read the pdftex.map file correctly, should select the NimbusSanL font set (e.g. uhvr8a.pfb). I get the following errors:
mktexnam: Could not map source abbreviation for uhss10.
kpathsea: Running mktexmf uhss10
! I can't find file `uhss10'.
<*> ...ljfour; mag:=1; ; nonstopmode; input uhss10
Does anyone have a example .texi file which sets the font family to use? Or an explanation of what I'm doing wrong?
I am just starting to use jsPDF and I think it may actually work (after attempting a zillion different ways to produce PDFs in my Quasar/Electron desktop application that have not worked).
Is there a way to display the PDF in the application window?
this.doc = new jsPDF({
orientation: "landscape",
unit: "in",
format: [4, 2]
})
this.doc.text(this.dogArray[0].dogCallName, 1, 1)
this.doc.save("test.pdf")
That works and I can save the PDF, but I'd also like to be able to display the generated PDF in the Electron browser window. I can console.log out this.doc, and I can display it on the window, but it's just a bunch of string info.
Is there something like doc.view("file.pdf") that can be used? I'm looking through the jsPDF documentation but I'm not seeing what I'm looking for.
I want to be able to see the PDF like the author shows on his Demo Website
I've tried to copy some code from an online book - Link here - into my Visual Studio Code, using Ctrl + C. In the book, the code appears in the format I desire to have within my editor:
salaries_and_tenures = [(83000, 8.7), (88000, 8.1),
(48000, 0.7), (76000, 6),
(69000, 6.5), (76000, 7.5),
(60000, 2.5), (83000, 10),
(48000, 1.9), (63000, 4.2)]
However, if I copy it from the previously mentioned URL, the code looks like this in the editor:
salaries_and_tenures=[(83000,8.7),(88000,8.1),(48000,0.7),(76000,6),(69000,6.5),(76000,7.5),(60000,2.5),(83000,10),(48000,1.9),(63000,4.2)]
Also, if I download the file locally and then copy the code into my editor (from the PDF), the code looks like the one in the book:
salaries_and_tenures = [(83000, 8.7), (88000, 8.1),
(48000, 0.7), (76000, 6),
(69000, 6.5), (76000, 7.5),
(60000, 2.5), (83000, 10),
(48000, 1.9), (63000, 4.2)]
Does anyone know what is happening behind the scenes? Is there any reason the white-space characters get deleted? (Remark: it is crystal clear that it's not an editor-related issue, because I got the same issue here, when I wrote the post)
As you mentioned, this is not an editor-related issue.
As you know, PDF files don't store data as plain text, so copying and pasting from them is not exactly like copying from Notepad, Word, etc.
But the question is: why does it sometimes remove white space and sometimes not?
This is because of your PDF reader. You are probably using different PDF readers in the browser and in your OS, and some PDF readers preserve white space when copying while others don't.
If you want to copy white space without downloading the PDF, just use your OS PDF reader for online reading too (there may be an option for opening an online source in your PDF reader).
I have a Jasper report where the text fields have hyperlinks to SharePoint documents. The links work just fine in the report and in other export formats such as Excel and PDF, but when exported to pptx, only the text fields are exported, not the links.
FYI, the JasperReports version is 5.6.1.
Please help if anyone has a solution to my problem.
I have tested it (with hyperlinkType="Reference") and cannot find any problems.
This is how I export to pptx:
JRPptxExporter exporter = new JRPptxExporter();
File outputFile = new File("test.pptx");

// "print" is the filled JasperPrint instance
exporter.setExporterInput(new SimpleExporterInput(print));
exporter.setExporterOutput(new SimpleOutputStreamExporterOutput(outputFile));

// make sure hyperlinks are not ignored in the export
SimplePptxReportConfiguration configuration = new SimplePptxReportConfiguration();
configuration.setIgnoreHyperlink(false);
exporter.setConfiguration(configuration);

exporter.exportReport();
Naturally the hyperlink does not work while you are in design mode (since it is design mode); you need to switch to presentation mode.
If you still have problems, please post the jrxml for your textField definition and your code for exporting to pptx.