After upgrading tika-core from 1.26 to 2.1.0, Tika no longer throws an exception when parsing encrypted .doc documents

After upgrading tika-core from 1.26 to 2.1.0, no exception is thrown for encrypted .doc documents:
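import java.io.IOException;
import java.io.InputStream;
import org.apache.tika.exception.EncryptedDocumentException;
import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Returns true if parsing fails because the document is password-protected.
protected boolean checkMsmime(InputStream stream) throws IOException, SAXException {
    Metadata metadata = new Metadata();
    ContentHandler handler = new DefaultHandler(); // extracted text is discarded; only the outcome matters
    ParseContext context = new ParseContext();
    try {
        new AutoDetectParser().parse(stream, handler, metadata, context);
    } catch (EncryptedDocumentException e) {
        // .doc encryption protection (Tika's own exception type, a subclass of TikaException)
        return true;
    } catch (TikaException e) {
        // .docx encryption protection (POI's exception, wrapped by Tika)
        if (e.getCause() instanceof org.apache.poi.EncryptedDocumentException) {
            return true;
        }
        log.error(e);
        return false;
    } catch (IOException | SAXException e) {
        log.error(e); // log is the class's existing logger
    }
    return false;
}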
With Tika 1.26, calling AutoDetectParser().parse() on an encrypted .doc document throws an exception. After upgrading to 2.1.0, no exception is thrown, so the document is treated as not encrypted.
Encrypted files in other formats still throw exceptions. Only encrypted .doc documents no longer do.
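The two checks in the catch block can be folded into one helper that walks the whole cause chain. Those checks are Tika's EncryptedDocumentException for .doc and POI's EncryptedDocumentException as the cause for .docx. A helper like this still matches if a Tika version wraps the exception one level deeper.

This is a minimal sketch, not a fix for the 2.1.0 behavior described above: it only helps when parse() actually throws. Matching on the simple class name is a deliberate shortcut so the sketch compiles without Tika or POI on the classpath. The nested EncryptedDocumentException is a hypothetical stand-in used only for the demo.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

public class EncryptionCheck {

    // Hypothetical stand-in for the demo only; in real code the exception would be
    // org.apache.tika.exception.EncryptedDocumentException or org.apache.poi.EncryptedDocumentException.
    static class EncryptedDocumentException extends Exception {
        EncryptedDocumentException(String message) {
            super(message);
        }
    }

    // Returns true if any throwable in the cause chain is named "EncryptedDocumentException",
    // whether it was thrown directly or wrapped by another exception.
    static boolean isEncryptionFailure(Throwable t) {
        // Identity-based set guards against cause chains that loop back on themselves.
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<Throwable, Boolean>());
        for (Throwable cur = t; cur != null && seen.add(cur); cur = cur.getCause()) {
            if ("EncryptedDocumentException".equals(cur.getClass().getSimpleName())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Throwable wrapped = new RuntimeException("parse failed",
                new EncryptedDocumentException("password required"));
        System.out.println(isEncryptionFailure(wrapped));                       // true
        System.out.println(isEncryptionFailure(new RuntimeException("other"))); // false
    }
}
```

In checkMsmime, the TikaException branch would then reduce to `if (isEncryptionFailure(e)) return true;`. With Tika and POI on the classpath, instanceof checks against the two concrete classes are stricter than matching by name.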

Related

Apache Tika BodyContentHandler() is Empty

I'm using Apache Tika 1.18. When I run the code below under one web service framework (sparkjava), it works. Under Spring Boot, however, the BodyContentHandler stays empty, so the text I return is empty.
I'm not sure what's going on here and would appreciate any suggestions.
I'm passing this code a Base64-encoded string that is also URL-encoded, hence the two decode calls at the start.
Stepping through the code in the debugger under Spring Boot, the variables hold the same values as under sparkjava. After parsing, though, the handler variable contains the input text in the sparkjava version and "" in the Spring Boot version.
I also tested this with Tika 1.17, with the same result. Removing the -1 argument from the BodyContentHandler constructor didn't help either.
Thanks in advance.
// data: the "data=..." string passed into the Spring Boot POST method
String textToReturn = "";
try
{
    // URL-decode, then Base64-decode the payload
    String bodyData = URLDecoder.decode(data.substring(data.indexOf("data=") + 5), "UTF-8");
    byte[] decodedBodyData = java.util.Base64.getMimeDecoder().decode(bodyData);

    Parser parser = new AutoDetectParser();
    BodyContentHandler handler = new BodyContentHandler(-1); // -1 disables the write limit, for larger files
    Metadata metadata = new Metadata();
    InputStream inputStream = new ByteArrayInputStream(decodedBodyData);
    ParseContext context = new ParseContext();
    // parse the file
    parser.parse(inputStream, handler, metadata, context);
    // Under Spring Boot this returns "". Problem!
    textToReturn = handler.toString();
}
catch (Exception e)
{
    // IOException, SAXException, TikaException, or a runtime error such as malformed Base64
    e.printStackTrace();
}
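The question already checks the variables before parsing. A complementary test is whether the two decode steps return the exact bytes the client encoded.

The sketch below is a self-contained round trip built on an assumption: the client Base64-encodes the file, URL-encodes the result, and sends it as data=.... The helper names decodeBody and encodeBody are hypothetical.

It also shows one way bytes can change silently. URLDecoder turns '+' into a space, and getMimeDecoder() skips characters outside the Base64 alphabet. So if the field has already been URL-decoded once before this code runs, a second decode can alter the bytes without any exception. Whether that happens depends on how the parameter is bound, so treat this as a check to run, not a diagnosis.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

public class BodyDecodeCheck {

    private static final String UTF8 = StandardCharsets.UTF_8.name();

    // Same steps as the question: take the text after "data=", URL-decode it, then MIME Base64-decode it.
    static byte[] decodeBody(String data) {
        try {
            String bodyData = URLDecoder.decode(data.substring(data.indexOf("data=") + 5), UTF8);
            return Base64.getMimeDecoder().decode(bodyData);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Assumed client side: Base64-encode, URL-encode, then prefix with "data=".
    static String encodeBody(byte[] payload) {
        try {
            return "data=" + URLEncoder.encode(Base64.getEncoder().encodeToString(payload), UTF8);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        // These three bytes Base64-encode to "++++", so the handling of '+' matters.
        byte[] payload = {(byte) 0xFB, (byte) 0xEF, (byte) 0xBE};
        String sent = encodeBody(payload); // "data=%2B%2B%2B%2B"
        System.out.println(Arrays.equals(decodeBody(sent), payload)); // true

        // Simulates a field that was already URL-decoded once: the second decode turns '+' into ' ',
        // the MIME decoder skips the spaces, and the resulting bytes no longer match.
        String alreadyDecoded = "data=" + Base64.getEncoder().encodeToString(payload);
        System.out.println(Arrays.equals(decodeBody(alreadyDecoded), payload)); // false
    }
}
```

Comparing decodedBodyData (its length and first few bytes) between the sparkjava and Spring Boot runs would show whether Tika receives the same document in both.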

Downloading Docusign PDF in Grails, file corrupted

Using Groovy 1.8.6 and Grails 2.1.0
I'm using the embedded API: after the user signs the document, the browser is redirected back to my app. I then use the "Get Envelope Documents and Certificate" API to download the document to the server. URL format:
"${baseUrl}/envelopes/${envelopeId}/documents/combined"
Code snippet (with minor details removed):
private void getDocument(requestUrl) {
    def connection = urlConnect(requestUrl, null, "GET")
    if (connection.responseCode == 200) {
        savePDF(envelopeId, connection.inputStream)
    }
}

private void savePDF(envelopeId, inputStream) {
    String filePath = getSavedPDFPath(envelopeId)
    def pdfWriter = new File(filePath).newWriter()
    pdfWriter << inputStream
    pdfWriter.close()
}
The resulting file is not quite correct: Adobe Reader complains that "at least one signature is invalid". Reader does recognize that the file was signed by DocuSign, Inc., and can show details about the certificate.
Per the question's comment thread, the issue was caused by the way the file was saved. With this code instead, the file saves and opens correctly:
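private void savePDF(envelopeId, connection)
{
    String filePath = getSavedPDFPath(envelopeId);
    InputStream input = null;
    FileOutputStream fop = null;
    try {
        input = connection.getInputStream(); // fetch the response stream once, not on every read
        fop = new FileOutputStream(new File(filePath));
        byte[] buffer = new byte[1024];
        int numRead;
        // copy raw bytes; no Writer (and so no character encoding) is involved
        while ((numRead = input.read(buffer)) != -1)
        {
            fop.write(buffer, 0, numRead);
        }
        fop.flush();
    }
    catch (Exception e)
    {
        throw new RuntimeException(e);
    }
    finally
    {
        // close both streams even if the copy fails
        if (fop != null) fop.close();
        if (input != null) input.close();
    }
}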
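Why the first version breaks: newWriter() returns a Writer, which is a character stream, so whatever is written through it goes through a charset conversion. Binary PDF data only survives that if every byte round-trips unchanged, and a signed PDF fails validation if any signed byte changes.

The plain Java sketch below contrasts a byte-for-byte copy with a decode/encode round trip through text. UTF-8 is picked for illustration; the charset newWriter() actually uses depends on the platform default. The sample bytes mimic the binary comment line many PDFs carry after their header.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BinaryCopy {

    // Byte-for-byte copy: no charset is involved, so binary content is written unchanged.
    static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
    }

    // Convenience wrapper for in-memory data; the ByteArray streams never actually throw.
    static byte[] copyBytes(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            copy(new ByteArrayInputStream(data), out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // The failure mode: decode the bytes as UTF-8 text, then encode that text back to bytes.
    static byte[] throughText(byte[] data) {
        String asText = new String(data, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "%" followed by four bytes above 127, which are not valid UTF-8 in this order.
        byte[] sample = {0x25, (byte) 0xE2, (byte) 0xE3, (byte) 0xCF, (byte) 0xD3};
        System.out.println(Arrays.equals(copyBytes(sample), sample));   // true
        System.out.println(Arrays.equals(throughText(sample), sample)); // false
    }
}
```

The invalid sequences are replaced with U+FFFD during decoding, so the text round trip grows and alters the data. The byte copy is what the working savePDF does.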

How to send HttpPostedFileBase to S3 via AWS SDK

I'm having some trouble getting uploaded files to save to S3. My first attempt was:
Result SaveFile(System.Web.HttpPostedFileBase file, string path)
{
    //Keys are in web.config
    var t = new Amazon.S3.Transfer.TransferUtility(Amazon.RegionEndpoint.USWest2);
    try
    {
        t.Upload(new Amazon.S3.Transfer.TransferUtilityUploadRequest
        {
            BucketName = Bucket,
            InputStream = file.InputStream,
            Key = path
        });
    }
    catch (Exception ex)
    {
        return Result.FailResult(ex.Message);
    }
    return Result.SuccessResult();
}
This throws an exception with the message: "The request signature we calculated does not match the signature you provided. Check your key and signing method." I also tried copying file.InputStream to a MemoryStream, then uploading that, with the same error.
If I set the InputStream to:
new FileStream(@"c:\folder\file.txt", FileMode.Open)
then the file uploads fine. Do I really need to save the file to disk before uploading it?
This is my working version. First, the upload method:
public bool Upload(string filePath, Stream inputStream, double contentLength, string contentType)
{
    try
    {
        var request = new PutObjectRequest();
        request.WithBucketName(_bucketName)
            .WithCannedACL(S3CannedACL.PublicRead)
            .WithKey(filePath).InputStream = inputStream;
        request.AddHeaders(AmazonS3Util.CreateHeaderEntry("ContentType", contentType));
        _amazonS3Client.PutObject(request);
    }
    catch (Exception exception)
    {
        // log or throw;
        return false;
    }
    return true;
}
I just get the stream from HttpPostedFileBase.InputStream.
(Note: this uses an older version of the API. The WithBucketName syntax is no longer supported; set the properties directly instead.)
Following shenku's comment, for newer versions of the SDK:
public bool Upload(string filePath, Stream inputStream, double contentLength, string contentType)
{
    try
    {
        var request = new PutObjectRequest
        {
            BucketName = _bucketName, // the class field, as in the older version above
            CannedACL = S3CannedACL.PublicRead,
            InputStream = inputStream,
            Key = filePath
        };
        request.Headers.ContentType = contentType;
        _amazonS3Client.PutObject(request);
        return true;
    }
    catch (Exception)
    {
        // log or throw;
        return false;
    }
}

How to compress files on BlackBerry?

In my application I use an HTML template and images for a browser field, saved on the SD card. Now I want to compress those HTML and image files and send them to a PHP server. How can I compress the files and send them to the server? Some samples would help a lot.
I tried this approach; my code is:
EDIT:
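import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import javax.microedition.io.Connector;
import javax.microedition.io.file.FileConnection;
import net.rim.device.api.compress.GZIPOutputStream;
import net.rim.device.api.ui.component.Dialog;

private void zipthefile() {
    // GZIP compresses a single stream, so the output is named .gz rather than .zip
    String out_path = "file:///SDCard/" + "newtemplate.html.gz";
    String in_path = "file:///SDCard/" + "newtemplate.html";
    FileConnection inFile = null;
    FileConnection outFile = null;
    InputStream inputStream = null;
    OutputStream os = null;
    try {
        inFile = (FileConnection) Connector.open(in_path, Connector.READ); // the file to compress
        if (!inFile.exists()) {
            Dialog.alert("File not found: " + in_path);
            return; // the finally block still closes inFile
        }
        inputStream = inFile.openInputStream();
        outFile = (FileConnection) Connector.open(out_path, Connector.READ_WRITE); // the compressed output
        if (!outFile.exists()) {
            outFile.create();
        } else {
            outFile.truncate(0); // drop leftovers from a previous, longer output
        }
        os = new GZIPOutputStream(outFile.openOutputStream());
        byte[] buffer = new byte[1024];
        int n;
        while ((n = inputStream.read(buffer)) != -1) { // copy in chunks, not byte by byte
            os.write(buffer, 0, n);
        }
    } catch (Exception e) {
        Dialog.alert(e.toString());
    } finally {
        // close the streams first (os.close() writes the gzip trailer), then the connections
        try { if (inputStream != null) inputStream.close(); } catch (IOException e) { Dialog.alert(e.toString()); }
        try { if (os != null) os.close(); } catch (IOException e) { Dialog.alert(e.toString()); }
        try { if (inFile != null) inFile.close(); } catch (IOException e) { Dialog.alert(e.toString()); }
        try { if (outFile != null) outFile.close(); } catch (IOException e) { Dialog.alert(e.toString()); }
    }
}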
This code works fine for a single file, but I want to compress all the files in the folder (more than one file).
In case you are not familiar with them: in Java, the stream classes follow the Decorator pattern, so streams are meant to be wrapped around other streams to perform additional tasks. For instance, a FileOutputStream lets you write bytes to a file. If you decorate it with a BufferedOutputStream, you also get buffering (data is collected in RAM and written to disk in large chunks). If you decorate it with a GZIPOutputStream, you also get compression.
Example:
//To read compressed file:
InputStream is = new GZIPInputStream(new FileInputStream("full_compressed_file_path_here"));
//To write to a compressed file:
OutputStream os = new GZIPOutputStream(new FileOutputStream("full_compressed_file_path_here"));
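The two decorated streams in the example can be exercised end to end in memory. This is a Java SE sketch using java.util.zip. It assumes the BlackBerry GZIPInputStream and GZIPOutputStream classes listed below behave the same way, as the answer suggests for most I/O.

ByteArrayOutputStream and ByteArrayInputStream stand in for the file streams, so the round trip can be checked without touching storage.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipRoundTrip {

    // Decorate a plain byte sink with GZIPOutputStream: every write goes through the compressor first.
    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream os = new GZIPOutputStream(sink)) {
            os.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray(); // complete only after close() has written the gzip trailer
    }

    // Decorate a plain byte source with GZIPInputStream: reads come back decompressed.
    static byte[] gunzip(byte[] compressed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream is = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = is.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] html = "<html><body>template</body></html>".getBytes(StandardCharsets.UTF_8);
        byte[] packed = gzip(html);
        System.out.println(new String(gunzip(packed), StandardCharsets.UTF_8)); // prints the original HTML
    }
}
```

Swapping ByteArrayOutputStream for a FileOutputStream (or a FileConnection's output stream on the device) changes where the bytes go, not how the compression layer works. That is the point of the Decorator pattern.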
This is a good tutorial covering basic I/O. Although it was written for Java SE, you'll find it useful, since most things work the same on BlackBerry.
In the API you have these classes available:
GZIPInputStream
GZIPOutputStream
ZLibInputStream
ZLibOutputStream
If you need to convert between streams and byte arrays, use the IOUtilities class or ByteArrayOutputStream and ByteArrayInputStream.
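For the follow-up about compressing every file in a folder: gzip compresses a single stream. Bundling several files into one archive would need a container format such as ZIP or tar, which the four classes listed above do not provide.

One option within a GZIP-only API is to compress each file separately. The sketch below does that with Java SE's java.nio.file. On the device, the file names would instead come from FileConnection.list() on the folder's connection. The method name gzipEach and the ".gz" naming are illustrative assumptions.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPOutputStream;

public class GzipFolder {

    // Compresses every regular file in dir into a sibling "<name>.gz" and returns the new paths.
    // Existing .gz files are skipped, so a rerun does not compress earlier outputs again.
    static List<Path> gzipEach(Path dir) {
        try {
            // List the folder first, so files created below are not picked up mid-iteration.
            List<Path> inputs = new ArrayList<>();
            try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
                for (Path file : entries) {
                    if (Files.isRegularFile(file) && !file.getFileName().toString().endsWith(".gz")) {
                        inputs.add(file);
                    }
                }
            }
            List<Path> outputs = new ArrayList<>();
            for (Path file : inputs) {
                Path target = file.resolveSibling(file.getFileName() + ".gz");
                try (InputStream in = Files.newInputStream(file);
                     OutputStream out = new GZIPOutputStream(Files.newOutputStream(target))) {
                    byte[] buffer = new byte[4096];
                    int n;
                    while ((n = in.read(buffer)) != -1) {
                        out.write(buffer, 0, n);
                    }
                }
                outputs.add(target);
            }
            return outputs;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.out.println("usage: java GzipFolder <folder>");
            return;
        }
        for (Path p : gzipEach(Paths.get(args[0]))) {
            System.out.println(p);
        }
    }
}
```

Each resulting .gz file could then be uploaded to the PHP server on its own, which keeps the server side simple: it only has to gunzip one file per request.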

Blackberry InputStream Closes Prematurely

The following code fetches an XML file from a web server. Today, for the last few runs, it has thrown an exception with the error message "stream close." I have not modified this code since yesterday, nor any of the methods that handle the parsing.
The idea is to build a list of items from the XML file pulled from fullurl. There should be 20 items in the list (based on the XML file I am currently using). In the last few runs, however, the parsing operation has thrown the exception mentioned above and stored only 5 items. The endDocument() method never gets called.
Any thoughts would be helpful, since this will have to be moved to a background task, and I would like to have it solved before I do that.
public void getAndParseXML() {
    HttpConnection xmlcon = null;
    InputStream xmlinput = null;
    SAXParserFactory spf = null;
    // URL of the XML file, plus the connection-type parameters
    String fullurl = this.getNewsUrl() + NewsListBuilderTask.CONNECTION_STRING;
    if ( (TransportInfo.isTransportTypeAvailable(TransportInfo.TRANSPORT_TCP_WIFI)) && (TransportInfo.hasSufficientCoverage(TransportInfo.TRANSPORT_TCP_WIFI)) )
        fullurl += NewsListBuilderTask.WIFI_STRING;
    try {
        xmlcon = (HttpConnection)Connector.open( fullurl, Connector.READ, false ); // open connection to XML source
        spf = SAXParserFactory.newInstance(); // set up xml parsers
        xmlinput = xmlcon.openInputStream(); // set up input stream
        SAXParser saxparser = spf.newSAXParser(); // create a new parser object
        saxparser.parse( xmlinput, this ); // parse operations start here
    }
    // each handler below sets a default item if retrieving or parsing the XML file fails
    catch( IOException ex ) {
        System.out.println( "IOException Caught:\t" + ex.getMessage() );
        this.createDefaultItem();
    }
    catch (SAXException ex) {
        System.out.println( "SAXException Caught:\t" + ex.getMessage() );
        ex.printStackTrace();
        this.createDefaultItem();
    }
    catch ( IllegalArgumentException ex ) {
        System.out.println( "IllegalArgumentException Caught:\t" + ex.getMessage() );
        ex.printStackTrace();
        this.createDefaultItem();
    }
    catch (ParserConfigurationException ex) {
        System.out.println( "ParserConfigurationException Caught:\t" + ex.getMessage() );
        ex.printStackTrace();
        this.createDefaultItem();
    }
    finally {
        // attempt to close all connections
        if ( xmlinput != null) {
            try {
                xmlinput.close();
            }
            catch (IOException e) {
                e.printStackTrace();
            }
        }
        if ( xmlcon != null ) {
            try {
                xmlcon.close();
            }
            catch ( IOException ex ) {
                ex.printStackTrace();
            }
        }
    }
}
NOTE: The fullurl used ends up being "http://somexmlfile.com?type=photo;deviceside=true", with ";interface=wifi" appended if available.
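One workaround worth trying is to read the entire response into memory first and only then parse it. This is an assumption to test, not a confirmed fix. The parser then never reads from the network connection directly. If the connection is closed early, the failure surfaces in the read step, before any items are built, instead of leaving a partial list.

The sketch below uses Java SE SAX classes so it can run off-device. ItemCounter is a hypothetical handler standing in for the question's list builder, and sampleFeed is demo input. The ByteArrayOutputStream loop in readFully also works on CLDC, but the multi-catch and @Override on interface-style callbacks assume a modern Java SE compiler.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class BufferedXmlParse {

    // Hypothetical handler: counts <item> elements and records whether endDocument() ran.
    static class ItemCounter extends DefaultHandler {
        int items;
        boolean ended;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            if ("item".equals(qName)) {
                items++;
            }
        }

        @Override
        public void endDocument() {
            ended = true;
        }
    }

    // Step 1: drain the whole response into memory before any parsing happens.
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    // Step 2: parse from the in-memory copy, which cannot be closed underneath the parser.
    static ItemCounter bufferThenParse(InputStream network) {
        try {
            byte[] xml = readFully(network);
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            ItemCounter counter = new ItemCounter();
            parser.parse(new ByteArrayInputStream(xml), counter);
            return counter;
        } catch (IOException | ParserConfigurationException | SAXException e) {
            throw new IllegalStateException("fetch or parse failed", e);
        }
    }

    // Demo input: a feed with the given number of <item> elements.
    static byte[] sampleFeed(int count) {
        StringBuilder sb = new StringBuilder("<news>");
        for (int i = 0; i < count; i++) {
            sb.append("<item>story ").append(i).append("</item>");
        }
        return sb.append("</news>").toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ItemCounter result = bufferThenParse(new ByteArrayInputStream(sampleFeed(20)));
        System.out.println(result.items + " items, endDocument called: " + result.ended);
    }
}
```

In getAndParseXML, this would mean passing `new ByteArrayInputStream(readFully(xmlinput))` to saxparser.parse instead of xmlinput itself. The existing finally block can still close the connection.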
