Extract text from Word or PDF based on formatting (font name and size) - Delphi

I need to parse a large document (about 1000 pages, Word or PDF) and place some of its text into database fields.
The only thing that distinguishes the text I want to extract is its formatting: it is always "Helvetica-Condensed", size 12.
Can I do that? I know how to use the string functions, but what should I use to test the formatting?
As I said, the text is stored inside a Word document or a PDF.
If there is a third-party component that can do this, that is no problem; please point me to it.
Thanks

There is QuickPDF. The price is $249.00.

The other option is to code it yourself. The file specification is available online, and if you're only trying to rip the text out of the document, it should guide you most of the way.
The only thing to be careful of is documents that are built entirely from images. In that scenario (no matter what you use to read the file) you will also need an OCR application. To check whether this is the case, open a sample of the kind of file you want to extract text from, select some text, copy it, and try to paste it into Notepad. If nothing pastes, the pages are images.
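For the Word half of the question, the "code it yourself" route can be sketched as follows. This is a minimal Python illustration, not Delphi, and it assumes the file is a .docx (a zip archive holding word/document.xml) rather than a legacy .doc. The function name runs_with_font is made up for this example. It only sees formatting applied directly to a run; text that inherits its font from a style would be missed, and PDF needs a separate parser.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def runs_with_font(docx_bytes, font_name, size_pt):
    """Return the text of runs whose direct formatting matches font and size.

    Hypothetical helper for illustration; style-inherited fonts are ignored.
    """
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    wanted_size = str(int(size_pt * 2))  # w:sz is stored in half-points
    hits = []
    for run in root.iter(f"{{{W}}}r"):
        fonts = run.find("w:rPr/w:rFonts", NS)
        size = run.find("w:rPr/w:sz", NS)
        if fonts is None or size is None:
            continue  # no direct font or size on this run
        if (fonts.get(f"{{{W}}}ascii") == font_name
                and size.get(f"{{{W}}}val") == wanted_size):
            # A run can hold several w:t elements; join them.
            hits.append("".join(t.text or "" for t in run.iter(f"{{{W}}}t")))
    return hits
```

Calling runs_with_font(open("sample.docx", "rb").read(), "Helvetica-Condensed", 12) would return the matching run texts in document order, ready to be written to database fields. The same steps (unzip, parse the XML, filter runs by their properties) map onto any XML parser available in Delphi.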

Related

Weird characters in Swagger-generated document

I have generated Swagger documentation (using springfox-swagger2 and the UI), but the generated document contains some weird characters and some weird behavior, and I am looking for the cause and a solution.
I am listing the details of every parameter with @ApiParam, and in every request it adds a stray character to the description.
The details block of some APIs has a different format from others (the text size is different) even though the default settings are used. I am not sure why, but it adds class="markdown" to some paragraphs automatically.
The date contains a character that it does not need.
So I have no idea where to look or why this behavior occurs.
I would appreciate any contributions and ideas.
Thank you in advance.

Programmatically generated CSV file format issue for Excel and Numbers

In my iOS project, I programmatically generate a CSV file from my data. Most of the time it opens fine in both Microsoft Excel and Apple Numbers.
But when the cell data is something like 5 - 60, Excel automatically converts it to a date value like May-60, while Numbers opens it correctly.
I found this thread: https://stackoverflow.com/a/165052/833885, so the solution that makes Excel happy is using "=""5 - 60""". But this makes Numbers show ="5 - 60"...
You can quickly create a CSV file to test what I described above.
Is it possible to generate a CSV file that makes everyone happy?
Thanks in advance.
You can create a new file in Excel and import the CSV from the Data ribbon tab; this gives you options to specify the data type of each column in the CSV. It is a bit of a pain, but it avoids the issue.
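Since the ="..." wrapper from the linked thread helps Excel but shows up literally in Numbers, one workaround is to generate two variants of the file and let the user pick. The sketch below shows the quoting in Python rather than in iOS code; the function names and the protect_for_excel flag are invented for this example.

```python
import csv
import io


def excel_text_formula(value):
    """Wrap value as ="..." so Excel keeps it as text instead of guessing a date."""
    return '="' + value.replace('"', '""') + '"'


def write_rows(rows, protect_for_excel):
    """Return CSV text; optionally wrap every cell for Excel.

    protect_for_excel is a hypothetical switch: True for the Excel variant,
    False for the plain variant that Numbers opens correctly.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in rows:
        cells = [excel_text_formula(c) if protect_for_excel else c for c in row]
        writer.writerow(cells)  # the csv module doubles the embedded quotes
    return out.getvalue()
```

With protect_for_excel=True the cell 5 - 60 is written as "=""5 - 60""", which is exactly the form quoted in the question; with False it is written unchanged.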

Add alternative text to a phrase in a document file

I use LibreOffice Writer and I want to attach alternative text to a specific phrase in the document. How can I do it?
For example, if we have an image in the document, we can double-click it and add the alternative text.
Is it possible to do the same if we select a whole phrase of text? If yes, how? And if not, is there any other suggestion?
The alternative text in Word/ODT documents is actually intended as the equivalent of the 'alt' attribute in HTML (web) pages:
The alt attribute provides alternative information for an image if a
user for some reason cannot view it (because of slow connection, an
error in the src attribute, or if the user uses a screen reader).
(http://www.w3schools.com/tags/att_img_alt.asp)
Its only purpose is thus to provide the user with information in case he/she cannot view the image. Since having alternative text in case some text cannot be displayed is, well, silly, this 'alt' attribute is not defined for pieces of text. Alternatively, you could use a hyperlink pointing to nothing ("#"), which does provide a tooltip attribute.
What is it that you intend to achieve anyway? It is not going to show up on any prints, which is the intended purpose of Writer. Footnotes (for prints) or Comments (for communication with co-editors) might suit you better.

open source controls to convert rich text formatted code to html markup

I am working on ASP.NET MVC. I am trying to display rich text formatted content like
{\rtf1\ansi\ansicpg1252\uc1\htmautsp\deff2{\fonttbl{\f0\
fcharset0 Times New Roman;}{\f2\fcharset0 Tahoma;}}{\colortbl\red0\green0\blue0;\red255\green255\blue255;}\loch\hi
ch\dbch\pard\plain\ltrpar\itap0{\lang1033\fs24\f2\cf0 \cf
0\ql{\f2 {\ltrch AMANDA WITH RC CALLED AND WANTED TO
VERIFY THAT WE WERE AFFILIATED WITH SHAUN # JAGGYS. LET HER KNOW WE
WERE, SHAUN CALLED RC AS WELL TO VERIFY STATUS OF BD}\li0\ri0\sa0\sb0\fi0\ql\par}
}
}
in the view. This data could come from a database table, and I need to display it in an editor-type control. Are there any open source controls that can display rich text format?
Well, I just finished writing an RTF to HTML converter that maintains all embedded media and creates a MIME multipart message out of it. This is close to what you want to do. If you aren't interested in writing your own converter, you can look at this CodeProject article and use his: http://www.codeproject.com/Articles/27431/Writing-Your-Own-RTF-Converter
There are also descriptions of how he arrived at his solution.
On my project we just started ripping apart the RTF document and parsing its contents. Open source and third-party libraries weren't an option for me.

ASP.net MVC Export To Excel

I am currently exporting to Excel using the old HTML trick, where I set the MIME type to application/ms-excel. This has the benefit of nicely formatted tables, but the drawback that the document is not in native Excel format.
I could export it as CSV, but then it would not be formatted.
I have read brief mentions that you can export XML to create the Excel document, but I cannot find much information on this. Does anybody know of any tutorials, or the benefits of this approach? Can tables be formatted using this method?
Thanks.
Easiest way, you could parse your table and export it in Excel XML format, see this for example: http://blogs.msdn.com/b/brian_jones/archive/2005/06/27/433152.aspx
It allows you to format the table as you wish (borders, fonts, colors, I think even formulas), and Excel will recognize it as native Excel format. As a plus, you can use other programs that can import Excel XML (e.g. OpenOffice, Excel Viewer, etc.), and you do not need to have Office components installed on the server.
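As a rough illustration of what such a file looks like, here is a minimal Python sketch that writes a table in the Excel 2003 XML Spreadsheet layout. It assumes that this is the "Excel XML format" the answer refers to; the function name to_spreadsheetml is made up for the example, and styling (borders, fonts, colors) is left out to keep it short.

```python
from xml.sax.saxutils import escape, quoteattr

SS_NS = "urn:schemas-microsoft-com:office:spreadsheet"


def to_spreadsheetml(sheet_name, rows):
    """Build a one-sheet Excel 2003 XML Spreadsheet document as a string.

    Hypothetical helper: numbers become Number cells, everything else String.
    """
    parts = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<?mso-application progid="Excel.Sheet"?>',  # hints that Excel owns the file
        f'<Workbook xmlns="{SS_NS}" xmlns:ss="{SS_NS}">',
        f"<Worksheet ss:Name={quoteattr(sheet_name)}><Table>",
    ]
    for row in rows:
        parts.append("<Row>")
        for value in row:
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            cell_type = "Number" if is_number else "String"
            parts.append(
                f'<Cell><Data ss:Type="{cell_type}">{escape(str(value))}</Data></Cell>'
            )
        parts.append("</Row>")
    parts.append("</Table></Worksheet></Workbook>")
    return "\n".join(parts)
```

For example, to_spreadsheetml("Report", [["Name", "Qty"], ["A & B", 3]]) produces a document with two rows, where the ampersand is escaped and the 3 is typed as a number. In an MVC action the resulting string would be returned as the response body in place of the HTML table.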
Check out ExcelXmlWriter.
We've been using it for some time and it works well. There are some downsides to the xml format however. Since it's unlikely your end users will have the .xml extension associated with Excel, you end up having to download files as .xls with an Excel mime type. When a user opens a file downloaded in this way they get a warning that the file is not in xls format. If they click through it, the file opens normally.
The only alternative is a paid library to generate native Excel files. That's certainly the best solution but last time we looked there were no good, free libraries (may have changed)
Bill Sternberger has blogged a very simple solution here:
export to excel or csv from asp.net mvc
Just today I had to write a routine that exported data to Excel in an MVC application. Here are the details, so that someone may benefit in the future. First the user had to select some date ranges and areas for the report. On the postback, this method was in place, with TheModelTypeList containing the data returned as strong types from a LINQ/Entity Framework/SQL query:
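// Assumes FSR is a FileStreamResult declared earlier in the action, and that
// System.IO, System.Xml.Serialization and System.Web.Mvc are imported.
if (ExportToExcel)
{
    // Serialize the strongly typed list to XML in memory.
    var stream = new MemoryStream();
    var serializer = new XmlSerializer(typeof(List<SomeModelType>));
    serializer.Serialize(stream, TheModelTypeList);
    // Rewind so the result streams from the beginning.
    stream.Position = 0;
    // Send the XML with an Excel MIME type so the browser hands it to Excel.
    FSR = new FileStreamResult(stream, "application/vnd.ms-excel");
}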
The only catch was that the file type was not known when opening, so the system prompted for an application to open it. This is a result of the content being XML. I'm still working on that.
I am using Spreadsheet Light, an Open-Source library that provides ridiculously easy creation, manipulation and saving of an Excel sheet from C#. You can have an MVC / WebAPI Controller do the work of creating the file and either
Return a URL link to the saved Excel file to the page and invoke Excel to open it with an ActiveX object
Return a Data Content Stream to the page
Return a URL link to the calling page to force an Open / Save As dialog
http://spreadsheetlight.com/
