How to extract content from a .pst file using Apache Tika? - pst

How do I parse a .pst file using Apache Tika 1.2?
How can I get the entire body, the attachments, and all metadata of each email for searching with Lucene?
for (File file : docs.listFiles()) {
    Metadata metadata = new Metadata();
    ContentHandler handler = new BodyContentHandler();
    ParseContext context = new ParseContext();
    Parser parser = new AutoDetectParser();
    // try-with-resources closes the stream even if parsing fails
    try (InputStream stream = new FileInputStream(file)) {
        parser.parse(stream, handler, metadata, context);
    } catch (IOException | TikaException | SAXException e) {
        e.printStackTrace();
    }
}

If you're stuck with 1.2, you might try the recommendation here
If you're able to upgrade, we added that as the RecursiveParserWrapper in 1.7. Just upgrade to 1.12 if you can, or wait a week or two and 1.13 should be out.
Via commandline:
java -jar tika-app.jar -J -t -i input_directory -o output_directory
Or in code:
// "file" is assumed to be a java.nio.file.Path pointing at the .pst file
Parser p = new AutoDetectParser();
RecursiveParserWrapper wrapper = new RecursiveParserWrapper(p,
        new BasicContentHandlerFactory(
                BasicContentHandlerFactory.HANDLER_TYPE.XML, -1));
ParseContext context = new ParseContext();
try (InputStream is = Files.newInputStream(file)) {
    wrapper.parse(is, new DefaultHandler(), new Metadata(), context);
}
// One Metadata object per document: the container and each embedded item
int i = 0;
for (Metadata metadata : wrapper.getMetadata()) {
    for (String name : metadata.names()) {
        for (String value : metadata.getValues(name)) {
            System.out.println(i + " " + name + ": " + value);
        }
    }
    i++;
}

Related

Why does this script cause problems only on Jenkins?

I had to add a job to a multibranch pipeline.
The goal is to automatically install a plugin in Jira.
The code itself works: I tested it in a unit test.
But when I run it as a scripted pipeline on Jenkins, the plugin is not installed.
I also do not get any JSON response, which I do get in my JUnit test.
I really do not know what is different on Jenkins. I already found it odd that I have to approve every single method in my script. Has anyone had a similar issue on Jenkins before?
I really appreciate your help.
@Test
public void test_install_plugin() throws IOException {
String jsonBody = "{\"pluginUri\": \"http://myServer/job/element/job/mybranch\", \"pluginName\": \"myPlugin\"}";
String token = "-6485649001990379871";
String jiraPath="myServer/rest/plugins/1.0/";
String query = "token="+token;
String url = jiraPath+"?"+query;
URL jiraURL = new URL(url);
HttpURLConnection jiraURLConnection = (HttpURLConnection)jiraURL.openConnection();
jiraURLConnection.setDoOutput(true);
jiraURLConnection.setDoInput(true);
jiraURLConnection.setRequestProperty ("Content-Length",String.valueOf(jsonBody.length()));
jiraURLConnection.setRequestProperty ("Accept", "application/json");
jiraURLConnection.setRequestProperty ("Content-Type", "application/vnd.atl.plugins.install.uri+json");
jiraURLConnection.setRequestProperty ("Authorization", "Basic YWRXXXXXXtaW4=");
jiraURLConnection.setRequestMethod("POST");
OutputStream os = null;
try {
os = jiraURLConnection.getOutputStream();
byte[] input = jsonBody.getBytes("utf-8");
os.write(input, 0, input.length);
os.flush();
}
catch(IOException ex)
{
System.out.println("Exception: " + ex.getMessage());
}
finally
{
os.close();
System.out.println("responseCode of install request: " + jiraURLConnection.getResponseCode());
}
BufferedReader br = null;
try {
br = new BufferedReader(new InputStreamReader(jiraURLConnection.getInputStream(), "utf-8"));
StringBuilder response = new StringBuilder();
String responseLine = null;
while ((responseLine = br.readLine()) != null) {
response.append(responseLine.trim());
}
System.out.println("Response: " + response.toString());
}
catch(IOException ex)
{
System.out.println("Exception: " + ex.getMessage());
}
finally
{
br.close();
jiraURLConnection.disconnect();
}
}

JavaMail MIME attachment link by cid

Background
I have banged my head against this for a while and not made much progress. I am generating MPEG_4 / AAC files in Android and sending them by email as .mp3 files. I know they aren't actually .mp3 files, but that allows Hotmail and Gmail to play them in Preview. They don't work on iPhone though, unless they are sent as .m4a files instead which breaks the Outlook / Gmail Preview.
So I have thought of a different approach which is to attach as a .mp3 file but have an HTML link in the email body which allows the attached file to be downloaded and specifies a .m4a file name. Gmail / Outlook users can click the attachment directly whereas iPhone users can use the HTML link.
Issue
I can send an email using JavaMail with HTML in it including a link which should be pointing at the attached file to allow download of that file by the link. Clicking on the link in Gmail (Chrome on PC) gives a 404 page and iPhone just ignores my clicking on the link.
Below is the code in which I generate a multipart message and assign a CID to the attachment which I then try to access using the link in the html part. It feels like I am close, but maybe that is an illusion. I'd be massively grateful if someone could help me fix it or save me the pain if it isn't possible.
private int send_email_temp(){
Properties props = new Properties();
props.put("mail.smtp.auth", "true");
props.put("mail.smtp.host", smtp_host_setting);
//props.put("mail.debug", "true");
props.put("mail.smtp.ssl.enable", "true");
props.put("mail.smtp.starttls.enable", "true");
props.put("mail.smtp.port", smtp_port_setting);
session = Session.getInstance(props);
ActuallySendAsync_temp asy = new ActuallySendAsync_temp(true);
asy.execute();
return 0;
}
class ActuallySendAsync_temp extends AsyncTask<String, String, Void> {
public ActuallySendAsync_temp(boolean boo) {
// something to do before sending email
}
@Override
protected Void doInBackground(String... params) {
try {
Message message = new MimeMessage(session);
message.setFrom(new InternetAddress(username));
message.setRecipients(Message.RecipientType.TO,
InternetAddress.parse(recipient_email_address));
message.setSubject(email_subject);
Multipart multipart = new MimeMultipart();
MimeBodyPart messageBodyPart = new MimeBodyPart();
String file = mFileName;
/**/
DataSource source = new FileDataSource(file);
messageBodyPart.setDataHandler(new DataHandler(source));
/* /
File ff = new File(file);
try {
messageBodyPart.attachFile(ff);
} catch(IOException eio) {
Log.e("Message Error", "Old Macdonald");
}
/* /
messageBodyPart = new PreencodedMimeBodyPart("base64");
byte[] file_bytes = null;
File ff = new File(file);
try {
int length = (int) ff.length();
BufferedInputStream reader = new BufferedInputStream(new FileInputStream(ff));
file_bytes = new byte[length];
reader.read(file_bytes, 0, length);
reader.close();
} catch (IOException eio) {
Log.e("Message Error", "Old Macdonald");
}
messageBodyPart.setText(Base64.encodeToString(file_bytes, Base64.DEFAULT));
messageBodyPart.setHeader("Content-Transfer-Encoding", "base64");
/**/
messageBodyPart.setFileName( DEFAULT_AUDIO_FILENAME );//"AudioClip.mp3");
//messageBodyPart.setContentID("<audio_clip>");
String content_id = UUID.randomUUID().toString();
messageBodyPart.setContentID("<" + content_id + ">");
messageBodyPart.setDisposition(Part.ATTACHMENT);//INLINE);
messageBodyPart.setHeader("Content-Type", "audio/mp4");
multipart.addBodyPart(messageBodyPart);
MimeBodyPart messageBodyText = new MimeBodyPart();
//final String MY_HTML_MESSAGE = "<h1>My HTML</h1><a download=\"AudioClip.m4a\" href=\"cid:audio_clip\">iPhone Download</a>";
final String MY_HTML_MESSAGE = "<h1>My HTML</h1><a download=\"AudioClip.m4a\" href=\"cid:" + content_id + "\">iPhone Download</a>";
messageBodyText.setContent( MY_HTML_MESSAGE, "text/html");
multipart.addBodyPart(messageBodyText);
message.setContent(multipart);
Print_Message_To_Console(message);
Transport transport = session.getTransport("smtp");
transport.connect(smtp_host_setting, username, password);
transport.sendMessage(message, message.getAllRecipients());
transport.close();
} catch (MessagingException e) {
e.printStackTrace();
} finally {
}
return null;
}
@Override
protected void onPostExecute(Void aVoid) {
super.onPostExecute(aVoid);
// something to do after sending email
}
}
int Print_Message_To_Console(Message msg) {
int ret_val = 0;
int line_num = 0;
InputStream in = null;
InputStreamReader inputStreamReader = null;
BufferedReader buff_reader = null;
try {
in = msg.getInputStream();
inputStreamReader = new InputStreamReader(in);
buff_reader = new BufferedReader(inputStreamReader);
String temp = "";
while ((temp = buff_reader.readLine()) != null) {
Log.d("Message Line " + Integer.toString(line_num++), temp);
}
} catch(Exception e) {
Log.d("Message Lines", "------------ OOPS! ------------");
ret_val = 1;
} finally {
try {
if (buff_reader != null) buff_reader.close();
if (inputStreamReader != null) inputStreamReader.close();
if (in != null) in.close();
} catch(Exception e2) {
Log.d("Message Lines", "----------- OOPS! 2 -----------");
ret_val = 2;
}
}
return ret_val;
}
You need to create a multipart/related message, for example with new MimeMultipart("related"), and add the main text part as the first body part, before the attachment.

Improvements in uploading of files, vaadin

I want to upload a file to Git without saving it to the local disk. My web app uses Vaadin and Java, with the Upload component from Vaadin.
public OutputStream receiveUpload(String filename, String MIMEType)
{
this.filename = filename;
FileOutputStream fos = null;
try {
// Is there any way to avoid saving the file in filepath (and only push
// it to git)?
fos = new FileOutputStream(new File(
filepath + File.separator + filename));
} catch (Exception e) {
// How can I omit this? I don't want to save the file in filepath...
return null;
}
return fos;
}
public void uploadSucceeded(Upload.SucceededEvent event)
{
try {
// This method reads the file from filepath. Is there any way to
// pass the file from the upload panel to here without saving it
// in filepath?
commitToGit(filepath + File.separator + filename);
} catch(Exception e) {
e.printStackTrace();
} finally {
// Removing the file from filepath afterwards is inconvenient for me
File file = new File(filepath + File.separator + filename);
if (file != null) {
file.delete();
}
}
}
Look here for the pipe functionality:
Best way to Pipe InputStream to OutputStream
In the receiveUpload method, you set up the pipe between the uploaded file and your Git connector.
The uploadSucceeded method is then no longer needed, or it can be used to clean up resources.
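The pipe idea can be sketched with the JDK's PipedInputStream and PipedOutputStream. This is only a minimal illustration, not the Vaadin or Git API: the class name UploadPipeSketch, the CompletableFuture parameter, and the stream-based commitToGit stand-in (which just collects the bytes) are assumptions made for the example. It also assumes Java 9+ for readAllBytes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CompletableFuture;

class UploadPipeSketch {

    // Hypothetical stand-in for the Git connector: it only drains the stream
    // and returns the bytes it saw (readAllBytes needs Java 9+).
    static byte[] commitToGit(InputStream in) throws IOException {
        return in.readAllBytes();
    }

    // Plays the role of receiveUpload(): returns the stream the upload
    // component writes into. Nothing is written to the local disk.
    static OutputStream receiveUpload(CompletableFuture<byte[]> result) throws IOException {
        PipedInputStream in = new PipedInputStream(8192);
        PipedOutputStream out = new PipedOutputStream(in);
        // The reading side must run on its own thread; otherwise the writer
        // blocks as soon as the pipe buffer is full.
        Thread consumer = new Thread(() -> {
            try (InputStream source = in) {
                result.complete(commitToGit(source));
            } catch (IOException e) {
                result.completeExceptionally(e);
            }
        });
        consumer.start();
        return out;
    }

    public static void main(String[] args) throws Exception {
        CompletableFuture<byte[]> result = new CompletableFuture<>();
        // Simulates the upload component writing the file and closing the stream.
        try (OutputStream upload = receiveUpload(result)) {
            upload.write("hello git".getBytes(StandardCharsets.UTF_8));
        }
        // prints: hello git
        System.out.println(new String(result.get(), StandardCharsets.UTF_8));
    }
}
```

The consumer only sees end-of-stream once the writing side is closed, so this approach relies on the upload stream being closed when the upload finishes.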

Tika--Extracting Distinct Items from a Compound Document

Question:
Assume an email message with an attachment (assume a JPEG attachment). How do I parse (not using the Tika facade classes) the email message and return the distinct pieces--a) the email text contents and b) the email attachment?
Configuration:
Tika 1.2
Java 1.7
Details:
I have been able to properly parse email messages in basic email message formats. However, after the parsing, I need to know a) the email's text contents and b) the contents of any attachment to the email. I will store these items in my database as essentially parent email with child attachments.
What I cannot figure out is how I can "get back" the distinct parts and know that the parent email has attachments and be able to separately store those attachments referenced to the mail. This is, I believe, essentially similar to extracting ZipFile contents.
Code Example:
private Message processDocument(String fullfilepath) {
try {
File filename = new File(fullfilepath) ;
return this.processDocument(filename) ;
} catch (NullPointerException npe) {
Message error = new Message(false) ;
error.appendErrorMessage("The file name was null.") ;
return error ;
}
}
private Message processDocument(File filename) {
InputStream stream = null;
try {
stream = new FileInputStream(filename) ;
} catch (FileNotFoundException fnfe) {
// TODO Auto-generated catch block
fnfe.printStackTrace();
System.out.println("FileNotFoundException") ;
return diag ;
}
int writelimit = -1 ;
ContentHandler texthandler = new BodyContentHandler(writelimit);
this.safehandlerbodytext = new SafeContentHandler(texthandler);
this.meta = new Metadata() ;
ParseContext context = new ParseContext() ;
AutoDetectParser autodetectparser = new AutoDetectParser() ;
try {
autodetectparser.parse(
stream,
texthandler,
meta,
context) ;
this.documenttype = meta.get("Content-Type") ;
diag.setSuccessful(true);
} catch (IOException ioe) {
// if the document stream could not be read
System.out.println("TikaTextExtractorHelper IOException " + ioe.getMessage()) ;
//FIXME -- add real handling
} catch (SAXException se) {
// if the SAX events could not be processed
System.out.println("TikaTextExtractorHelper SAXException " + se.getMessage()) ;
//FIXME -- add real handling
} catch (TikaException te) {
// if the document could not be parsed
System.out.println("TikaTextExtractorHelper TikaException " + te.getMessage()) ;
System.out.println("Exception Filename = " + filename.getName()) ;
//FIXME -- add real handling
}
return diag ;
}
When Tika hits an embedded document, it goes to the ParseContext to see if you have supplied a recursing parser. If you have, it'll use that to process any embedded resources. If you haven't, it'll skip.
So, what you probably want to do is something like:
public static class HandleEmbeddedParser extends AbstractParser {
    public List<File> found = new ArrayList<File>();
    @Override
    public Set<MediaType> getSupportedTypes(ParseContext context) {
        // Return what you want to handle
        Set<MediaType> types = new HashSet<MediaType>();
        types.add(MediaType.application("pdf"));
        types.add(MediaType.application("zip"));
        return types;
    }
    @Override
    public void parse(
            InputStream stream, ContentHandler handler,
            Metadata metadata, ParseContext context
    ) throws IOException, SAXException, TikaException {
        // Do something with the child documents
        // eg save to disk
        File f = File.createTempFile("tika", "tmp");
        found.add(f);
        FileOutputStream fout = new FileOutputStream(f);
        try {
            IOUtils.copy(stream, fout);
        } finally {
            // Close the temp file even if the copy fails
            fout.close();
        }
    }
}
ParseContext context = new ParseContext();
context.set(Parser.class, new HandleEmbeddedParser());
parser.parse(....);

how to upload a file

I am trying to upload a file from the client to the server.
On the client side, I have a file input.
On the server side, I have:
private void uploadFile(final FileTransfer fileTransfer) {
    String destinationFile = "/home/nat/test.xls";
    byte[] buf = new byte[1024];
    int len;
    // try-with-resources closes both streams when the copy ends or fails
    try (InputStream fis = fileTransfer.getInputStream();
         FileOutputStream out = new FileOutputStream(new File(destinationFile))) {
        // read() returns -1 at end of stream
        while ((len = fis.read(buf)) != -1) {
            out.write(buf, 0, len);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}
A file is created on the server, but it is empty.
When I debug, I can see that fis is not null.
Any ideas?
Here is a code extract of mine:
try {
File fileData = new File(fileTransfer.getFilename());
// Write the content (data) in the file
// Apache Commons IO: (FileUtils)
FileOutputStream fos = FileUtils.openOutputStream(fileData);
// Spring Utils: FileCopyUtils
FileCopyUtils.copy(fileTransfer.getInputStream(), fos);
// Alternative with Apache Commons IO
// FileUtils.copyInputStreamToFile(fileTransfer.getInputStream(), fileData);
// Send the file to a back-end service
myService.persistFile( fileData );
} catch (IOException ioex) {
log.error("Error with io");
}
return fileTransfer.getFilename(); // this is for my javascript callback fn
Apache Commons IO is a good library to use for such manipulations (I use Spring Utils as well). If you do not have a Spring context, use the commented alternative with Apache (check the syntax, it is not verified).
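If neither library is available, the same copy can be done with the JDK alone. The following is a rough sketch, not part of the answer above: the class name UploadCopySketch and the helper saveUpload are made up for the example, and a ByteArrayInputStream stands in for fileTransfer.getInputStream(). The returned byte count is handy for debugging: a result of 0 means the incoming stream itself was empty, which would point at the client side rather than the copy loop.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class UploadCopySketch {

    // Copies the whole upload stream to target, closes the stream,
    // and returns the number of bytes written.
    static long saveUpload(InputStream upload, Path target) throws IOException {
        try (InputStream in = upload) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("upload", ".bin");
        // Stand-in for fileTransfer.getInputStream()
        InputStream fake = new ByteArrayInputStream("abc".getBytes(StandardCharsets.UTF_8));
        long written = saveUpload(fake, target);
        // prints: 3 bytes written
        System.out.println(written + " bytes written");
        Files.delete(target);
    }
}
```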

Resources