Can I combine documents while both keep their own format? - latex

For my MSc, I'm writing up a manuscript using the publisher's template. This manuscript also has to be included in the thesis report. I'd like to insert the manuscript into the report without the manuscript's formatting changing. How can I do this?
I've seen some examples using the standalone package, but they all describe including document B in document A while both keep the formatting of A. What I'm looking for is essentially a workaround for using two document classes.

Related

Microsoft Word clipboard HTML documentation

I could not find any documentation describing the conventions used in the text/html data placed on the clipboard when copying part of a Word document.
Specifically, I want to know which classes (such as MsoNormal, TableGrid313, MsoTableGrid, MsoHeading9, MsoListParagraph) exist. Or does the styling information of a piece of text always lie in the style attribute of a span element containing that text?
The Word round-trip HTML is undocumented, as it's not an official Word file format.
It was created many years ago to enable round-tripping Word documents for viewing (and some editing) in a browser. Even then it was not documented, because it was intended for use by internal Microsoft software. Being HTML, anyone could read and produce it, but MS made a conscious decision not to document it (and so not to have to put resources into maintaining that documentation).
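Since the format is undocumented, one practical way to answer the "classes or style attributes?" question for your own documents is to inspect a copied fragment directly. The following is only a rough sketch using Python's standard html.parser. The names ClassAndStyleCounter and summarize are made up for illustration, and how you obtain the clipboard fragment is left out.

```python
from collections import Counter
from html.parser import HTMLParser


class ClassAndStyleCounter(HTMLParser):
    """Counts class names and inline CSS property names in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.classes = Counter()
        self.style_properties = Counter()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if not value:
                continue
            if name == "class":
                # A class attribute may list several names separated by spaces.
                self.classes.update(value.split())
            elif name == "style":
                # Naive split on ';' and ':'; good enough for a quick survey.
                for declaration in value.split(";"):
                    prop = declaration.split(":", 1)[0].strip().lower()
                    if prop:
                        self.style_properties[prop] += 1


def summarize(fragment):
    """Return (class counts, inline style property counts) for the fragment."""
    parser = ClassAndStyleCounter()
    parser.feed(fragment)
    parser.close()
    return parser.classes, parser.style_properties
```

Running summarize over a few fragments copied from different documents should show which Mso* classes actually turn up and how much of the formatting sits in inline style attributes.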

Mathematical equation editor in Rails

I'm creating a simple blog using Ruby on Rails 4. I want to be able to write mathematical equations with it. I'd appreciate it if you could help me with either of these questions:
Is there a gem/plugin/etc. which I can use to add an editor for mathematical equations to my app? Something like the equation editor in MS Word, which I could use to create an equation and which would give me the MathML code as output?
Let's assume that I have a form where I write the equations in LaTeX. Is there a gem/plugin/etc. (not JavaScript, because it would probably be a lot of code and might cause conflicts and other problems) to convert the equations from LaTeX/TeX to MathML before saving them to the database?
Thanks.
You can use the CodeCogs Equation Editor (www.codecogs.com).
It is designed to be integrated within a website, allowing your users to create equations without leaving your site. There are two approaches to integrating the editor: either as a popup that is activated from a button on your site, or as an object embedded directly in a page. A range of plugins for popular editors is also available, e.g. CKEditor and TinyMCE.

Parsing XPS or PDF and inserting data into a Word Template?

So, I have the option of sending a document from a database to print as either PDF or XPS. I need to be able to extract specific data, such as name, date, etc., from one of those formats and insert that data into a Word template. The Word template is not editable. You can only type within fields, and each field has a heading before it, such as name, dob, etc.
Basically I need to be able to automate transferring that information from the PDF or XPS file into the word template.
I'm familiar enough with C++, Python and Java.. so I have no language preference -- whatever gets the job done.
Could you suggest a way to accomplish this? I'm having a bit of difficulty figuring out how to parse/extract data from one of those file types, and which file type would be the better candidate. And I definitely have no idea how to automate the population of fields in the Word template.
Oh and forgot to mention, this is on Windows 7 (and maybe 8, but mostly 7) machines.
Thanks a lot for your help in advance!
This is for anyone who has the same sort of question, so this is how I did it:
I used PDFBox (http://pdfbox.apache.org/) to parse the document and extract the needed data, and then I used docx4j (http://www.docx4java.org/trac/docx4j) to insert the data into the Word template. Both are incredible tools and have excellent communities that help out almost immediately.

Extracting ePub Excerpt

I've read about the ePub format, standard, structure, readers, tools and available developer techniques to manipulate/convert/create ePubs but there is no such thing as a magical function (so far) to extract a particular length of characters to create an excerpt of the book. And that's precisely what I'm looking for: A way to extract the first X words of an ePub.
The first approach I'm considering (not my favorite, btw) is creating a parser that reads the ePub metadata and then parses the XML files in the right order until I have enough words to create the excerpt of a given ePub. (I would appreciate some feedback in this direction.)
The second way (which I haven't been able to find so far) is an existing tool/function or parser (in any language) which returns the plain text of the ePub, so I can collect the first X words in order to create my excerpt.
Do you know about any tool which can help me achieve the second option?
You should have a look at Apache Tika: http://tika.apache.org/
You can use it from the command line, as a Java library, or even in server mode to extract text from an ePub.
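As a rough illustration of the command-line route, here is a small Python wrapper. It assumes Java is installed and that the Tika application jar has been downloaded. The path tika-app.jar is a placeholder, and the helper names tika_plain_text and first_words are made up for this sketch.

```python
import subprocess


def tika_plain_text(epub_path, tika_jar="tika-app.jar"):
    """Run the Tika command-line app and return the extracted plain text.

    tika_jar is a placeholder; point it at the tika-app jar you downloaded.
    """
    result = subprocess.run(
        # --text requests plain-text output, --encoding sets the output encoding.
        ["java", "-jar", tika_jar, "--text", "--encoding=UTF-8", epub_path],
        capture_output=True,
        check=True,  # raise CalledProcessError if Tika exits with an error
    )
    return result.stdout.decode("utf-8", errors="replace")


def first_words(text, count):
    """Return the first `count` whitespace-separated words of `text`."""
    return " ".join(text.split()[:count])
```

An excerpt would then be something like first_words(tika_plain_text("book.epub"), 300), where book.epub is again just an example name. Note that Tika returns all the text it finds, so front matter such as a copyright page may end up in the excerpt.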
Hope this will help,
F.
Jose,
I'm not aware of any tool to do what you want. Let me comment on your first approach, though. If you do find a tool I hope these comments allow you to evaluate it.
I think your approach is fine and, if you want to do a good job of creating an extract, you may want to own this step anyway. I would suggest that you:
Grab the OPF file and look for a GUIDE section. If a GUIDE section exists, check the types that are given. Some are probably not relevant for an excerpt (cover, title-page, copyright-page). Many books will not have the types explicitly stated, but this should help where they do.
Now go through the files in the order given in the SPINE section, excluding anything that is irrelevant, and read through enough XHTML files to get your excerpt.
While in the OPF file, grab whatever metadata is relevant for the excerpt (title, identifier and language are mandatory; creator and date are common but optional, and some authors will also put in a whole bunch of other metadata such as keywords).
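The steps above can be sketched in Python with only the standard library. Treat this as a minimal sketch rather than a finished tool: it assumes an EPUB with a standard META-INF/container.xml, hrefs that are not percent-encoded, and UTF-8 content files. The names epub_excerpt, SKIP_TYPES and _TextCollector are invented here, and the guide element it checks is an EPUB 2 feature that many books omit.

```python
import posixpath
import xml.etree.ElementTree as ET
import zipfile
from html.parser import HTMLParser

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}
# Guide types treated as irrelevant for an excerpt (an assumption; adjust as needed).
SKIP_TYPES = {"cover", "title-page", "copyright-page"}
IGNORED_TAGS = ("head", "script", "style")


class _TextCollector(HTMLParser):
    """Collects text nodes, ignoring anything inside head, script or style."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in IGNORED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in IGNORED_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def epub_excerpt(path, max_words):
    """Return up to max_words words of body text, read in spine order."""
    words = []
    with zipfile.ZipFile(path) as zf:
        # container.xml points at the OPF package file.
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        opf_path = container.find(".//c:rootfile", CONTAINER_NS).get("full-path")
        opf = ET.fromstring(zf.read(opf_path))
        base = posixpath.dirname(opf_path)

        # Files the optional guide marks as cover, title page or copyright page.
        skip = {
            ref.get("href", "").split("#")[0]
            for ref in opf.findall("opf:guide/opf:reference", OPF_NS)
            if ref.get("type") in SKIP_TYPES
        }
        manifest = {
            item.get("id"): item.get("href")
            for item in opf.findall("opf:manifest/opf:item", OPF_NS)
        }
        for itemref in opf.findall("opf:spine/opf:itemref", OPF_NS):
            href = manifest.get(itemref.get("idref"))
            if href is None or href in skip:
                continue
            collector = _TextCollector()
            collector.feed(zf.read(posixpath.join(base, href)).decode("utf-8"))
            # Joining with spaces can split a word that spans inline tags.
            words.extend(" ".join(collector.parts).split())
            if len(words) >= max_words:
                break
    return " ".join(words[:max_words])
```

Calling epub_excerpt("book.epub", 300), with book.epub standing in for your file, would return roughly the first 300 words. Books without a guide section are read from the first spine item, so front matter is not filtered out in that case.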
If you are creating a mini-EPUB with this excerpt you will need to pick up any CSS, Audio, Video, Image and Custom Font files that get referenced in the XHTML files used to make your excerpt. You may even choose to use the original cover file for the cover file of your excerpt epub.
If you are working with fixed-layout books with fun stuff like Read Aloud AND you want to create a mini-EPUB as an excerpt, you may be better off going with a page count rather than a word count. Don't forget to include any SMIL files in your excerpt. To make it look nice: (i) don't split a two-page spread, and (ii) make sure that the first page is an odd-numbered page if it is odd in the original, or even if it is even-numbered in the original. To do this you may need to add a blank filler page (get the odd/even wrong and subsequent two-page spreads won't be facing each other).
I hope that helps.

What is the best way to edit google docs using the API?

I'd like to be able to edit any kind of Google docs using the API from Google App Engine.
My goal is to lose as little information as possible when editing the document. The edits are fairly simple like replacing some words.
Document
To edit them, I'm exporting to HTML and importing the result again. But we are losing some information, such as notes. There is also an issue with the titles: the spacing before each title increases with every update, so I have to reset some CSS. Is there a better way of editing docs?
Spreadsheet
There is the spreadsheet API, so I think I'm covered.
Presentation
I did not find a format that I could export and import again. The only one seems to be PowerPoint, but PowerPoint files cannot be easily edited.
Drawing
I did not find a format that can be both exported and imported. I tried SVG, but SVG cannot be imported back.
Document
PDF offers you the best fidelity in and out of Google Docs, without the hassle of proprietary or complicated formats like MS Word files.
Spreadsheet
Only proprietary or complicated formats guarantee fidelity here of things like cell color. The Spreadsheets API only allows data to be updated, but not formatting.
Presentation
You are correct, PPTX is the only format that can go both in and out of Google Presentations.
Drawing
You are correct, there is no import format that can go both in and out of Google Drawings.
