Is it possible to use iTextSharp to parse text between bookmarks?
I have saved a Word doc as a PDF, using the option to create bookmarks from the Word bookmarks. So the PDF has the bookmarks, and using iTextSharp I can get the list of bookmarks by calling SimpleBookmark.GetBookmark(pdfReader). What I want to be able to do is parse out the text for a particular bookmark or bookmarks. Each bookmark has a "Page" tag that holds the page number and an XYZ location, so I was assuming I would be able to parse from the start location of one bookmark to the start location of the next bookmark. But any examples I find only parse page by page. I haven't found a way to do more specific parsing with iTextSharp.
Thanks in advance for any ideas.
Related
Is it possible, using Google Analytics or a Word feature, to record what hyperlinks a reader follows when s/he is reading a Word document on a web page?
You could use something like goo.gl or bit.ly to shorten the URLs. These services offer some analytics capabilities.
I'm a little unclear what you mean by reading a Word document on a web page. Do you mean an embedded Word doc or simply a web page?
If you have access to the Word doc on the page, you could append unique UTM parameters to each linked URL, which should show up in your GA reporting.
Otherwise, you could check GA's source/medium reporting to see which website the visitor was referred from. If the source matches the website hosting the Word doc, you can assume the visitor clicked the link. If there are multiple links on that web page, however, you won't be able to determine which exact link the visitor clicked.
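The UTM tagging suggested above can be sketched as follows. This is a minimal illustration in Python, not something from the original answer: the function name `add_utm` and the example values are assumptions, while `utm_source`, `utm_medium`, `utm_campaign` and `utm_content` are the standard campaign parameter names. Giving each link in the document its own `utm_content` value is one way to tell the links apart in reports.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def add_utm(url, source, medium, campaign, content=None):
    """Return `url` with UTM campaign parameters appended to its query string."""
    parts = urlsplit(url)
    # Keep any query parameters the link already carries.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query += [
        ("utm_source", source),
        ("utm_medium", medium),
        ("utm_campaign", campaign),
    ]
    if content:
        # utm_content distinguishes individual links within one document.
        query.append(("utm_content", content))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical values: one tagged URL per hyperlink in the Word document.
tagged = add_utm("https://example.com/page?id=7",
                 "word_doc", "document", "handbook", "link_3")
```

Each hyperlink in the document would then point at its own tagged URL instead of the bare one.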
I can upload a large document as a PDF file to a web page without a problem, but I want to use arrows to navigate the book's pages rather than load the whole book at once, as that may take a long time.
Can anyone help with how to do this in an MVC app, with or without a database? If a database is necessary, would MongoDB be a better choice? I do not want people to download the book; they should only be able to read it online.
First, you cannot prevent people from downloading your content if you display it visually, but you can discourage them by making it difficult.
That said, you don't need a database to do what you want. You can use one, but it's not necessary. You can find a library that handles PDF, such as iTextSharp, and use it to split the book into one PDF per page when it gets uploaded, so you end up with a bunch of small files.
Then the trick is simple: ask the PDF library to load the file Page1.PDF (arbitrary name), extract the text, and output it as nicely formatted HTML. When the reader clicks the link to page 2, reload the page with the next PDF as the source.
Doing so prevents the user from seeing or accessing the PDF file itself. To get the whole book, they would have to copy and paste every single page manually, or by code if they are a developer. Most users won't copy and paste 300 pages by hand.
Personally, for each uploaded file I would create a folder named after the book and name the per-page files 1.pdf, 2.pdf, and so on. That way, listing the directories gives me the list of all books, and counting the files in a folder gives me the total number of pages. That would let me run all of this without a database.
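The folder-per-book layout described above can be sketched like this. It is a minimal illustration in Python rather than MVC code; the function names and the library root directory are assumptions, and splitting the uploaded PDF into per-page files is left to whichever PDF library is used.

```python
from pathlib import Path


def list_books(library_root):
    """Every sub-folder of the library root is one book."""
    return sorted(p.name for p in Path(library_root).iterdir() if p.is_dir())


def page_count(library_root, book):
    """The number of numbered PDF files in a book folder is its page count."""
    folder = Path(library_root) / book
    return sum(1 for p in folder.glob("*.pdf") if p.stem.isdigit())


def page_path(library_root, book, page):
    """Return the file for one page, rejecting page numbers out of range."""
    total = page_count(library_root, book)
    if not 1 <= page <= total:
        raise ValueError(f"page {page} is outside 1..{total}")
    return Path(library_root) / book / f"{page}.pdf"
```

The "next" and "previous" arrows then only need the current page number and `page_count` to decide which file to load and when to stop.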
Is it possible to display a pdf from a partial download?
I need only the first page of a PDF for my app. The problem is that all the PDFs online are 25 MB or more in size. Optimizing them for the app is not an option :(
The entire PDF will need to be downloaded to display and save it, but I want to show a preview first.
A similar question, but for Android:
How to Display first page of PDF before downloading is completed
I do understand downloading of data in iOS, but how can I tell where in the PDF's data the page ends, so I can just display that.
Yes, you can do this, but the PDF needs to be pre-constructed in a linearized format. This is part of the PDF specification and is sometimes known as fast web view.
Linearized PDF is the same as normal PDF but the objects in the document are ordered in a particular way and with certain extra information which makes it possible to work with partial data.
In particular the objects for the first page are included at the start of the file specifically so that the first page can be displayed quickly.
So I see no reason you shouldn't download the objects at the start of the PDF and use those to display the first page. You could use the hint tables for fast access to selected other pages but that would be quite complicated.
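However, the essence is that you need to pick up the group-one objects for the first page. These run from the "%PDF" header through to the end of the first-page section; the /E entry of the linearization parameter dictionary at the start of the file gives that byte offset. Note that the first "%%EOF" only closes the first-page cross-reference table, which sits before the page objects, so it does not mark the end of the page data. I'm not sure whether your environment will complain about the missing (but not required) objects, but if it does you will need to blank them out on a binary level so that you have an internally consistent page-one PDF.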
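One way to work out how many bytes to fetch, assuming the file really is linearized: the linearization parameter dictionary is the first object in the file, and the PDF specification defines its /E entry as the byte offset of the end of the first page. The sketch below is in Python for brevity rather than iOS code; the function name and the sample bytes are made up for illustration, and a real reader would parse the dictionary properly instead of using a regular expression.

```python
import re


def first_page_end_offset(head):
    """Return the /E offset from the linearization dictionary, or None.

    `head` is the first chunk of the file (about 1 KB is assumed to be
    enough, since the dictionary sits right after the header).
    """
    match = re.search(rb"<<[^>]*?/Linearized\b[^>]*?>>", head[:1024], re.DOTALL)
    if match is None:
        return None  # not linearized, so partial display is not possible
    end = re.search(rb"/E\s*(\d+)", match.group(0))
    return int(end.group(1)) if end else None


# Made-up sample of what the start of a linearized file looks like.
SAMPLE_HEAD = (b"%PDF-1.4\n43 0 obj\n"
               b"<< /Linearized 1.0 /L 54567 /H [475 598] /O 45 /E 5437 /N 11 /T 52786 >>\n"
               b"endobj\n")
offset = first_page_end_offset(SAMPLE_HEAD)
```

With the offset known, an HTTP range request for bytes 0 through that offset would fetch just the first-page data, provided the server supports range requests.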
For full details on PDF linearization see the Adobe PDF Specification.
My answers may feature concepts based around ABCpdf .NET. It's what I work on. It's what I know. :-)
I have examined the object data of a Feedzirra::Feed.fetch_and_parse() object coming from a feed. The feed I'm using is http://feeds.feedburner.com/ChrisBurnor
My issue is that if a title on this page is linked to an external site, Feedzirra does not pick up on it. In this case, the entry titled "Space Colony Art from the 1970's" links to publicdomainreview.com, yet the link itself is not present anywhere in the object Feedzirra returns.
My question: Is there a known RSS element that contains the href material from an entry title?
Or: Is there a way I can examine the xml of this feed to see if I can perhaps find where the link is going...
For the future, I might want to peer inside of these links and include their material in my feed display but for now I just want to have the link.
On the feedburner page there is a link titled "View Feed XML". When opening it and then doing a "View Page Source" in your browser, you'll see the raw XML feed. But just using wget or curl seems to be less cumbersome to me.
If you look at the raw feed, you'll see that it contains no information about or link to publicdomainreview.com.
So without further processing of the feed items, you can't easily get the information you want.
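To check which links a raw feed actually carries, the XML can also be inspected programmatically. The following is a minimal sketch in Python rather than Ruby, assuming an RSS 2.0 layout with `item`, `title` and `link` elements; the sample feed and the function name are invented for illustration and are not taken from the feed in the question.

```python
import xml.etree.ElementTree as ET

# Invented sample: the second item has no <link> element at all.
SAMPLE_RSS = """<rss version="2.0"><channel>
  <title>Example feed</title>
  <item><title>First post</title><link>https://example.com/first</link></item>
  <item><title>No link here</title></item>
</channel></rss>"""


def entry_links(rss_xml):
    """Return (title, link) pairs; link is '' when the item has none."""
    root = ET.fromstring(rss_xml)
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in root.iter("item")
    ]


for title, link in entry_links(SAMPLE_RSS):
    print(title, "->", link or "(no link in feed)")
```

If an entry's link is missing at this level, no feed parser can recover it, and the item's HTML content would have to be processed separately.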
I would like to create a template that I can use to organize my job applications.
For each line, I will need a contact (from my contact list), a document (from my documents list) and a task list.
I want to put a hyperlink to each of those documents in the cells.
I am looking for a way to display a pop-up, a frame, or anything similar that lists the documents or contacts from my Google account.
The user would click the right one.
Then the hyperlink to the chosen element would be pasted into the cell.
So, here is part of my answer:
I have to use Google Apps Script in the spreadsheet to perform this kind of operation.
There is a script called Contact manager in the script gallery that lets you retrieve contacts directly from Google.
Here is a hint about checkboxes in Google spreadsheets:
https://sites.google.com/a/simpleappssolutions.com/building-powerful-web-applications-in-google-apps-script/check-box
Now, what I have to do is put all of that together :)