I want parse pdf for form field names and types. Is it possible? Because when I tried one PDF, it gave me some strange characters e.g.:
...
?õ»â¢_¸ðO´×¢É]Ì|BQÔQClã(¢dVò¶~?ýg?þª í
pÅ2ÞÎÉÍ??Ú?wȳ.?d;k)*lÙ´¸(ò!ú©=ià??d?éPض2Èåäý?»p?nÜÈûÏ??M
õl:`Þ°Ã3£BíTCy5 ?ð?tN¿7fDõK
±¦?i¹vü~»X?s÷A~Ôê±4?ÕµX±¤?
...
Where could be the problem? I used tool http://support.persits.com/pdf/demo_formfields.asp and pdf https://www.drsr.sk//priznania/dpfoa2010.pdf
I want make some parser for iOS. Thanks for answer.
For PDF parsing on iOS, use the Quartz API.
For an example of an app which makes use of this API, see this reader.
To extract the specific information you're interested in, you will need to read the PDF document structure specification and figure out which dictionaries it's in (or, if you're lucky find some sample code).
Ok, so I looked into reference and found something. I was able to open PDF and make some CGPDFDictionaryRef but I am stuck at that point. This is my code:
CFURLRef pdfURL = CFBundleCopyResourceURL(CFBundleGetMainBundle(), CFSTR("simple_form.pdf"), NULL, NULL);
CGPDFDocumentRef myDocument = CGPDFDocumentCreateWithURL((CFURLRef)pdfURL);
//CFRelease(pdfURL);
int k;
CGPDFPageRef myPage;
NSInteger numOfPages = CGPDFDocumentGetNumberOfPages (myDocument);
for (k = 0; k < numOfPages; k++) {
myPage = CGPDFDocumentGetPage (myDocument, k + 1 );
CGPDFDictionaryRef ref = CGPDFPageGetDictionary(myPage); //what at this point?
CGPDFPageRelease (myPage);
}
I`d like to have something similar to Figure 14-1 here
Related
I use the Magento 2 REST API and want to handle errors that get thrown in C++.
Example error response:
{
"message": "Could not save category: %1",
"parameters": [
"URL key for specified store already exists."
]
}
I can retrieve both of these into a String and std::vector which leaves me with the question:
How am I able to return a String that is formatted by filling the placeholders?
With a fixed size, I could do something along the lines of this
char* buffer = new char[200];
String message = "Could not save category: %1";
std::vector<String> parameters = {"URL key for specified store already exists."};
String result = sprintf(buffer,message.c_str(),parameters[0]);
But, alas, I do not know the size beforehand.
How should I go about doing this? Are there stl functions that could help, should I be using self-written templates(no experience with this), can I convert the std::vector to a va_list, or are there other solutions?
Edit: I didn't notice this asks for a C++ dialect and not Standard C++. Leaving it for now, might be of use for other people.
Nothing standard exists that will do that automatically. That being said, if your interpolation format is just %<number> it might be easy enough to write:
string message = "Could not save category: %1";
std::vector<string> parameters = {"URL key for specified store already exists."};
for (size_t i = 0; i < parameters.size(); ++i) {
const auto interp = "%" + std::to_string(i+1);
const size_t pos = message.find(interp);
if (pos != std::string::npos)
message.replace(pos, interp.size(), parameters[i]);
}
This will put the result in the message string. Of course this implementation is rather limited and not particularly efficient, but again, doing this properly requires a library-sized solution, not SO-answer-sized one.
live version.
It is possible to parse master playlist to get and store all the url’s associated with the variants, using libav, to make downloads according to the variant of my choice. Thanks, all help is welcome
In case that someone else need this information, i found it, you just need to do this in your program:
AVFormatContext *fmtctx = NULL;
HLSContext *c = fmtctx -> priv_data;
previously you need to add every structure used in "hls.c" (HLSContext, variant, playlist, rendition, etc)
then you can get access to the variant and his associate data (url, bitrate,etc);
int a;
for(a=0; a < c->n_variants; a++){
av_log(NULL, AV_LOG_INFO, "url = %s \n", c->playlists[a]->url);
} /*for printing url's of the master playlist*//
I need help with opening the result of my mail merge operations directly in an new writer document.
Object mailMergeService = mcf.createInstanceWithContext(mailMergePackage, context);
XPropertySet mmProperties = UnoRuntime.queryInterface(XPropertySet.class, mailMergeService);
mmProperties.setPropertyValue("DocumentURL", templatePath);
mmProperties.setPropertyValue("DataSourceName", dbName);
mmProperties.setPropertyValue("CommandType", mmCommandType);
mmProperties.setPropertyValue("Command", mmCommand);
mmProperties.setPropertyValue("OutputType", mmOutputType);
// mmProperties.setPropertyValue("OutputURL", templateDirectory);
// mmProperties.setPropertyValue("FileNamePrefix", mmFileNamePrefix);
// mmProperties.setPropertyValue("SaveAsSingleFile", mmSaveAsSingleFile);
The mmOutputType is set as MailMergeType.SHELL
The LibreOffice API documentation says
"The output is a document shell.
The successful mail marge returns a XTextDocument based component."
So I've tried something like this
XJob job = UnoRuntime.queryInterface(XJob.class, mailMergeService);
Object mergedTextObject = job.execute(new NamedValue[0]);
String url = "private:factory/swriter";
loader.loadComponentFromURL(url, "_blank", 0, new PropertyValue[0]);
XTextDocument mergedText = UnoRuntime.queryInterface(XTextDocument.class, mergedTextObject);
XTextCursor cursor = mergedText.getText().createTextCursor();
cursor.setString(mergedText.getText().getString());
I guess I have to pass the XTextDocument component to the url-argument of the loadComponentFromURL method but I didnt find the right way to do that.
When I change the OutputType to MailMergeType.FILE the result is generated in a given directory and I can open the file and see that the mail merge succeeded. But this is not what my application should do.
Does someone know how I can open the result of the mail merge directly in an new writer document without saving the result to the hard drive?
Sincerly arthur
Hey guys I've found a simple way to open the result of my mail merge process directly.
The relevant snippets are these
XJob job = UnoRuntime.queryInterface(XJob.class, mailMergeService);
Object mergedTextObject = job.execute(new NamedValue[0]);
XTextDocument mergedText = UnoRuntime.queryInterface(XTextDocument.class, mergedTextObject);
mergedText.getCurrentController().getFrame().getContainerWindow().setVisible(true);
The last line of code made the window appear with the filled mail merge result.
I also don't need this line anymore
loader.loadComponentFromURL("private:factory/swriter", "_blank", 0, new PropertyValue[0]);
The document opens as a new instance of a swriter document. If you want to save the result as a file you can do this
mergedText.getCurrentController().getFrame().getContainerWindow().setVisible(true);
XStorable storeMM = UnoRuntime.queryInterface(XStorable.class, mergedText);
XModel modelMM = UnoRuntime.queryInterface(XModel.class, mergedText);
storeMM.storeAsURL(outputDirectory + outputFilename, modelMM.getArgs());
Sincerly
Arthur
What version of LO are you using? The SHELL constant has only been around since LO 4.4, and it is not supported by Apache OpenOffice yet, so it could be that it isn't fully implemented. However this code seems to show a working test.
If it is returning an XTextDocument, then normally I would assume the component is already open. However it sounds like you are not seeing a Writer window appear. Did you start LO in headless mode? If not, then maybe the process needs a few seconds before it can display.
Object mergedTextObject = job.execute(new NamedValue[0]);
Thread.sleep(10000);
Anyway to me it looks like your code has a mistake in it. These two lines would simply insert the text onto itself:
XTextCursor cursor = mergedText.getText().createTextCursor();
cursor.setString(mergedText.getText().getString());
Probably you intended to write something like this instead:
XTextDocument mergedText = UnoRuntime.queryInterface(XTextDocument.class, mergedTextObject);
String url = "private:factory/swriter";
XComponent xComponent = loader.loadComponentFromURL(url, "_blank", 0, new PropertyValue[0]);
XTextDocument xTextDocument = (XTextDocument)UnoRuntime.queryInterface(XTextDocument.class, xComponent);
XText xText = (XText)xTextDocument.getText();
XTextRange xTextRange = xText.getEnd();
xTextRange.setString(mergedText.getText().getString());
One more thought: getString() might just return an empty string if the entire document is in a table. If that is the case then you could use the view cursor or enumerate text content.
EDIT:
To preserve formatting including tables, you can do something like this (adapted from https://blog.oio.de/2010/10/27/copy-and-paste-without-clipboard-using-openoffice-org-api/):
// Select all.
XController xMergedTextController = mergedText.getCurrentController();
XTextViewCursorSupplier supTextViewCursor =
(XTextViewCursorSupplier) UnoRuntime.queryInterface(
XTextViewCursorSupplier.class, xMergedTextController);
XTextViewCursor oVC = supTextViewCursor.getViewCursor();
oVC.gotoStart(False) // This would not work if your document began with a table.
oVC.gotoEnd(True)
// Copy and paste.
XTransferableSupplier xTransferableSupplier = UnoRuntime.queryInterface(XTransferableSupplier.class, xMergedTextController);
XTransferable transferable = xTransferableSupplier.getTransferable();
XController xController = xComponent.getCurrentController();
XTransferableSupplier xTransferableSupplier_newDoc = UnoRuntime.queryInterface(XTransferableSupplier.class, xController);
xTransferableSupplier_newDoc.insertTransferable(transferable);
I really hope you will be able to help me out on this one.
I am new to pdf.js so for the moment, I am playing around with the pre-built version to see if I can integrate this into my web app.
My problem:
I am using tcpdf to generate a pdf file which I would like to visualize using pdf.js without having to save it to a file on the server.
I have a php file (generate_document.php) that I use to generate the pdf. The file ends with the following:
$pdf->Output('test.pdf', 'I');
according to the tcpdf documentation, the second parameter can be used to generate the following formats:
I: send the file inline to the browser (default). The plug-in is used if available. The name given by name is used when one selects the "Save as" option on the link generating the PDF.
D: send to the browser and force a file download with the name given by name.
F: save to a local server file with the name given by name.
S: return the document as a string (name is ignored).
FI: equivalent to F + I option
FD: equivalent to F + D option
E: return the document as base64 mime multi-part email attachment (RFC 2045)
Then, I would like to view the pdf using pdf.js without creating a file on the server (= not using 'F' as a second parameter and passing the file name to pdf.js).
So, I thought I could simply create an iframe and call the pdf.js viewer pointing to the php file:
<iframe width="100%" height="100%" src="/pdf.js_folder/web/viewer.html?file=get_document.php"></iframe>
However, this is not working at all....do you have any idea what I am overlooking? Or is this option not available in pdf.js?
I have done some research and I have seen some posts here on converting a base64 stream to a typed array but I do not see how this would be a solution to this problem.
Many thanks for your help!!!
EDIT
#async, thanks for your anwer.
I got it figured out in the meantime, so I thought I'd share my solution with you guys.
1) In my get_document.php, I changed the output statement to convert it directly to base64 using
$pdf_output = base64_encode($pdf->Output('test_file.pdf', 'S'));
2) In viewer.js, I use an XHR to call the get_document.php and put the return in a variable (pdf_from_XHR)
3) Next, I convert what came in from the XHR request using the solution that was already mentioned in a few other posts (e.g. Pdf.js and viewer.js. Pass a stream or blob to the viewer)
pdf_converted = convertDataURIToBinary(pdf_from_XHR)
function convertDataURIToBinary(dataURI) {
var base64Index = dataURI.indexOf(BASE64_MARKER) + BASE64_MARKER.length;
var base64 = dataURI.substring(base64Index);
var raw = window.atob(base64);
var rawLength = raw.length;
var array = new Uint8Array(new ArrayBuffer(rawLength));
for (i = 0; i < rawLength; i++) {
array[i] = raw.charCodeAt(i);
}
return array;
}
et voilà ;-)
Now i can inject what is coming from that function into the getDocument statement:
PDFJS.getDocument(pdf_converted).then(function (pdf) {
pdfDocument = pdf;
var url = URL.createObjectURL(blob);
PDFView.load(pdfDocument, 1.5)
})
I'm have domain class with property that represents files uploaded on my GSP. I've defined that file as byte array (byte [] file). When some specific action happens I'm sending mail with attachments from. This is part of my SendMail service:
int i = 1;
[requestInstance.picture1, requestInstance.picture2, requestInstance.picture3].each(){
if(it.length != 0){
DataSource image = new ByteArrayDataSource(it, "image/jpeg");
helper.addAttachment("image" + i + ".jpg", image);
i++;
}
}
This works fine with image files. But now I want to be able to work with all file types and I'm wondering how to implement this. Also, I want to save real file name in database. All help is welcomed.
You can see where the file name and MIME type are specified in your code. It should be straightforward to save and restore that information from your database along with the attachment data.
If you're trying to figure out from the byte array of data what the MIME type is and what a good filename would be, that's a harder problem. Try this.