Text bleeding out of page in knitr using latex - latex

I am using the following code snippet to write a data frame of text as a table.
library(xtable)
temp<-c("A white paper is a document that describes a given problem and proposes a specific solution to the problem.", "Originally used to describe government policy, white papers are most common today in corporate settings.", "A typical white paper might list ways to meet a client's marketing needs, suggest the use of a certain product for a technical process, or identify ways to streamline internal communication.")
myTable<-as.data.frame(temp)
# print.results=FALSE returns the LaTeX code as a string instead of printing it
myTable<-print.xtable(xtable(myTable,caption="Some Text Data"),caption.placement="top",print.results=FALSE, tabular.environment="tabularx", width="\\textwidth")
The table border is limited to the page, but the text still bleeds out. How do I get the text to come within the table limits?

Related

Is there a way to have a formula or script pick an amount of pre-set lengths to cover an area

Apologies if the title isn't very clear.
What I am trying to do is get a Google Sheet to automatically calculate how many lengths of a material I will need to cover an area, ideally including a mix of lengths if needed. There are three different lengths of material that never change, but the total area I need to cover changes on a case-by-case basis. It is only a straight line, so there is no need to worry about width or height.
The data breaks down as follows:
Pre-set lengths to choose from
10'6"
12'6"
14'6"
Length of area I need to cover only comes in inches (e.g. 68 1/2"; 70"; 59")
The only thing I have been successful in doing is getting the length I need to cover and then manually picking out how many pieces of each length I need, but I cannot think of any way for me to have a formula or script optimize how many of each piece I need. I can understand formulas well enough, but once trying to script anything comes into play I start getting lost. I believe this issue may be beyond the capabilities of formulas.
This is an interesting problem - I don't have the 'reputation' required to comment, but to be clear: you're actually trying to find the 'best fit' of the available lengths to cover the required length?
If that's the case then yes, you're not going to get there without scripting. Fortunately, there are other folks who have this problem and have solved it... you could look at this online cut-list calculator for an example. I think that one even includes an embeddable script for your sheets.
If you're looking to solve the problem yourself because it's interesting, googling 'optimal cut list' or the like will turn up references. Usually you're optimizing on two variables (e.g. 'fewest joins' and 'least waste'), which tips you over into the world of linear programming (only just...) if you want to go there. If it were me, I'd just dig up a few example scripts and map how they operate back to a theoretical description (e.g. this wiki article.)
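As a rough illustration of the 'best fit' idea, here is a minimal Python sketch (not a Sheets script, and not linear programming: it simply tries every combination). It assumes the three stock lengths from the question converted to inches (126, 150 and 174), and it assumes that 'best' means least waste first, then fewest pieces; the function name best_mix and that tie-break are my own choices, not something from the question.

```python
from itertools import product
from math import ceil

# Stock lengths from the question, converted to inches: 10'6", 12'6", 14'6"
STOCK_LENGTHS = (126.0, 150.0, 174.0)

def best_mix(required, stock=STOCK_LENGTHS):
    """Return (counts, waste) for the mix of stock pieces covering `required` inches.

    Assumed objective: least waste first, then fewest pieces (fewest joins).
    """
    if required <= 0:
        return (0,) * len(stock), 0.0
    # No sensible solution needs more of one length than covers the run alone
    max_counts = [ceil(required / s) for s in stock]
    best = None
    for counts in product(*(range(m + 1) for m in max_counts)):
        total = sum(c * s for c, s in zip(counts, stock))
        if total < required:
            continue  # this mix does not cover the run
        key = (total - required, sum(counts))
        if best is None or key < best[0]:
            best = (key, counts)
    (waste, _), counts = best
    return counts, waste

counts, waste = best_mix(400)
print(counts, waste)
```

For a 400-inch run this picks two 10'6" pieces and one 12'6" piece, with 2 inches of waste. Swapping the order of the two values in `key` would favour fewest joins over least waste instead. Brute force is fine for three lengths and short runs; for many lengths or very long runs a proper solver would be the better tool.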

Autoformat Text with Machine Learning

I am currently working on optimizing the workflow of an agency.
The agency receives about 30-40 PDF/Word documents, which have to be converted into InDesign files that will be printed in newspapers. It's always the same pattern: job adverts with a logo, the job position and some text.
The same customers send us their adverts every week. Our employees usually take the layouts of the existing files and copy-paste the new text.
We apply some fixed formatting rules, such as: no words broken across lines, and a set distance between the job title and the first paragraph. One important thing is to keep the height as small as possible in order to reduce costs for our clients. Because many of our employees are new, work part-time, etc., we face a huge staff turnover. Therefore we want to standardize the process, so that only small changes are needed for new adverts.... I guess you know what I mean.
Do you see a possibility to improve the process, for example with NLTK? I am thinking of training an algorithm which recognizes the "job title", "bullet points", logo, etc. and automatically proposes a layout for the text.
A colleague told me just to write a script which formats the InDesign document.
What do you think? Thanks so far.
Here is a brief example:
Example Picture

Advanced Excel / Visual Basic for Website Parsing

I have links to 500 Wikipedia / Wikimedia wikis, Talk pages and history pages in an Excel document that I'd like to parse to determine things like how many of the wikis mention "advert" or "promotional" in the Talk page, how long the average wiki is, how frequent edits are, etc.
I've figured out how to write a Visual Basic user-defined function that will get the full HTML. Is there a plugin or some other way to get the text, as it appears on-screen, between two tags or identifiers, so I can pull out the information I need?
I am a business professional with very limited coding experience in comparison to a professional developer. But if you can point me in the right direction and to some good tutorials, I can learn. I'd also be interested in just paying someone a bit of money on the side if someone can help.
You can use an XML parser and regex to search for text in an HTML document.
To get the text as seen in the browser, write a function to delete all tags. It may not always be accurate, though, as CSS and JavaScript can alter what is visible on the screen.
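To show what "delete all tags" can look like in practice, here is a minimal sketch in Python rather than Visual Basic, using only the standard library's html.parser module. The class name TextExtractor, the function visible_text and the list of block-level tags are assumptions made for illustration. It skips script and style content but knows nothing about CSS or JavaScript, so the caveat above still applies.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect text content, ignoring tags and the bodies of script/style."""
    SKIP = {"script", "style"}
    # Assumed, incomplete list of tags that should separate words
    BLOCK = {"p", "div", "li", "br", "tr", "td", "h1", "h2", "h3"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)

def visible_text(html):
    """Return the text of `html` with tags removed and whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())

page = "<p>This page reads like an <b>advert</b>.</p><script>var x = 1;</script><p>Second</p>"
text = visible_text(page)
print(text)
print("advert" in text.lower())
```

Once the text is extracted, checks such as whether a Talk page mentions "advert" or "promotional" become plain substring searches, and the page length is just the length of the string.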

TeX: Add blank page after every content page

I'm currently writing my bachelor thesis and my university wants a one-sided print. The printing and binding will be done by a professional print company. They only accept two-sided manuscripts.
Because of that I need to add a blank page after every page of content. I don't want to do this manually using \newpage or \clearpage because there are too many pages. Is there any, maybe low level, TeX command or package to do this? Or can you suggest another tool that does this without breaking the PDF?
Thanks for your help!
One option you might look into is to use a double sided layout that allows separate formatting for the even vs. odd pages: e.g. the book class allows this. Then you will need to define the even pages to be blank (presumably you don't want headers printed, or the page count to increment).
An alternative (if you can't get this to look correct for what you need) would be to do the layout single-sided (so that page numbering, etc. is all taken care of), then have a separate LaTeX document which includes the pages one at a time (pdfpages may be a good package to do this properly) and inserts blank pages (with no headers, etc.) in between. This may end up being more work, but if you have trouble with formatting, it may be the easier way to go.
I suspect that you'd be better off doing this by manipulating the output PDF, rather than changing the LaTeX.
For example, if you're able to print to a file on your platform, there might be options in the print dialogue to tweak this. Your PDF viewer may be able to arrange this, if only by inserting blanks every second page. Or there may be a GUI or command-line tool to do the reshuffling for you.
Having said that, I've no specific recommendations for what tool you could use. A quick look around suggests strongly that the pstops tool might be able to do something along these lines, but that only helps if you're generating your PDF from postscript.
So no recipe, I'm afraid, but this'll probably be a better direction to look.
(or, meta answer: find a different print shop, or phone again and hope you get someone who gives you a different answer!)

How to generate a document like this in Latex

http://www.cs.umass.edu/~mccallum/papers/acm-queue-ie.pdf
I want to write a document that has the style like this one.
For example, having a light-colored background on a page and a big header (like the EXTRACTION one) as shown in this link. Do you think it is possible to do something like this in LaTeX?
I am comfortable with doing normal things in LaTeX.
If you download and look at the document properties, it was made with InDesign CS3. Could you do this in LaTeX? Yes. The cover page is... just a cover page. If you use fancyhdr and make a page header, you can increase the header height, then lay the page header in there as an image. Try eso-pic for page backgrounds. But in all honesty, that document is kind of ugly. :D
Your best bet for a document like this is to use a desktop publishing system. A Free/Open Source Software solution would be Scribus Desktop Publishing.
Off the top of my head:
-- check out ConTeXt, strictly speaking an alternative to LaTeX but one designed for something closer to DTP than LaTeX itself;
-- LaTeX has lots of facilities for DTP-like work, a good place to start would be the newsletter on link text
-- investigate packages such as PGF/TikZ, eso-pic, newspaper.
That document smells like it was made with InDesign or QuarkXPress ... I guess there is a way to do it in LaTeX, but it will not be straightforward at all ...
Actually it's quite feasible using LaTeX; it's just a pity that the learning curve and the technical involvement are higher than when using DTP tools like Adobe InDesign.
This explains why few people are willing to invest the required amount of time and energy into mastering LaTeX for this kind of project, and consequently why so little introductory material is available on the subject.
One notable exception is the recent workshop given by Dominik Wagenführ at Ubucon 2009 in Göttingen. Its proceedings are freely available at the bottom of the page, as well as the related source code. It's all in German but fairly easy to understand and very educational, so I'd recommend you study it.
