I'm parsing an XML result in a BlackBerry application. The response returns nodes in this form:
<searches>
<search id='1234'>
<name> somename </name>
<address> some address </address>
<sector> some sector </sector>
<contacts> 12345, me@me.com </contacts>
</search>
</searches>
When a search has no matches, the result comes back empty. That is, the elements
<name></name>
<address></address>
<sector></sector>
<contacts></contacts>
are not returned with the results; the response is just <searches></searches>. How do I show a dialog alert when the search returns just <searches></searches>?
Here is my HTTP connection code, with the parser attached:
try{
HttpConnection connection = (HttpConnection)Connector.open("http://someurl.xml",Connector.READ_WRITE);
URLEncodedPostData postData = new URLEncodedPostData(URLEncodedPostData.DEFAULT_CHARSET, false);
postData.append("username", "someusername");
postData.append("password", "somepassword");
postData.append("term", word);
connection.setRequestMethod(HttpConnection.POST);
connection.setRequestProperty("Content-Type","application/x-www-form-urlencoded");
connection.setRequestProperty("User-Agent","Profile/MIDP-2.0 Configuration/CLDC-1.0");
OutputStream requestOut = connection.openOutputStream();
requestOut.write(postData.getBytes());
connection.getHeaderField("Content-type");
DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
docBuilder.isValidating();
InputStream detailIn = connection.openInputStream();
doc = docBuilder.parse(detailIn);
requestOut.close();
connection.close();
NodeList list = doc.getElementsByTagName("name");
NodeList list1 = doc.getElementsByTagName("address");
NodeList list2 = doc.getElementsByTagName("sector");
NodeList list3 = doc.getElementsByTagName("contacts");
callback(list,list1,list2,list3);
}
catch(Exception ex){
System.out.println(ex.toString());
}
Do I use an if statement or a for loop?
I don't have the Eclipse plug-in in front of me (so I can't test this code), but something like this should work:
doc = docBuilder.parse(detailIn);
requestOut.close();
connection.close();
NodeList list = doc.getElementsByTagName("name");
NodeList list1 = doc.getElementsByTagName("address");
NodeList list2 = doc.getElementsByTagName("sector");
NodeList list3 = doc.getElementsByTagName("contacts");
if (list == null || list.getLength() == 0) {
// no results, so post an alert on the UI thread
UiApplication.getUiApplication().invokeLater(new Runnable() {
public void run() {
Dialog.alert("No results found!");
}
});
}
This only tests the existence of the name element, assuming that if name is missing, so will the others (address, sector, and contacts). If that's not true for your application, you could choose to make the if statement check list1, list2, and list3 also.
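The check across all four lists can be sketched as follows. This is a minimal sketch in plain desktop Java (Java SE), not BlackBerry code: the class name EmptySearchCheck and the method hasNoResults are hypothetical, and varargs and StandardCharsets are assumptions that may not be available on the device. It only illustrates treating the response as empty when none of the expected elements occur.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of the original question or answer.
class EmptySearchCheck {

    /** Returns true when none of the given element names occur in the XML. */
    static boolean hasNoResults(String xml, String... tagNames) {
        Document doc;
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            // Unparseable input is reported to the caller rather than treated as "empty".
            throw new IllegalArgumentException("Response is not parseable XML", e);
        }
        for (String tag : tagNames) {
            NodeList nodes = doc.getElementsByTagName(tag);
            if (nodes != null && nodes.getLength() > 0) {
                return false; // at least one result element is present
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String empty = "<searches></searches>";
        String full =
                "<searches><search id='1234'><name>somename</name></search></searches>";
        System.out.println(hasNoResults(empty, "name", "address", "sector", "contacts")); // true
        System.out.println(hasNoResults(full, "name", "address", "sector", "contacts"));  // false
    }
}
```

In the BlackBerry code, the place where this returns true is where the Dialog.alert call would go, still wrapped in invokeLater so that it runs on the UI thread.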
I'm trying to build a Factur-X invoice using the Mustang library in an API.
I have a (valid) XML string as input:
@PostMapping
public FxResponse createFacturX(@RequestBody XmlRequest request) throws IOException {
if(request.getVersion() == null) {
request.setVersion(2);
}
ByteArrayOutputStream output = new ByteArrayOutputStream();
log.debug("Converting to PDF/A-3u");
if ((request.getVersion() < 1) || (request.getVersion() > 2)) {
throw new IllegalArgumentException("invalid version");
}
PDFAConformanceLevel pdfaConformanceLevel;
switch (request.getConformanceLevel()) {
case "BASIC":
pdfaConformanceLevel = PDFAConformanceLevel.BASIC;
break;
case "ACCESSIBLE":
pdfaConformanceLevel = PDFAConformanceLevel.ACCESSIBLE;
break;
case "UNICODE":
pdfaConformanceLevel = PDFAConformanceLevel.UNICODE;
break;
default:
throw new IllegalArgumentException("invalid level");
}
System.out.println(Arrays.toString(request.getPdf().getBytes(StandardCharsets.UTF_8)));
byte[] xmlData = request.getXml().getBytes(StandardCharsets.UTF_8);
byte[] pdfData = Base64.getDecoder().decode(request.getPdf().getBytes(StandardCharsets.UTF_8));
ZUGFeRDExporterFromA1 ze = new ZUGFeRDExporterFromA1()
.setProducer("Mustang API")
.setCreator("Creator ME")
.setZUGFeRDVersion(request.getVersion())
.setConformanceLevel(pdfaConformanceLevel)
.load(pdfData);
ze.attachFile("factur-x.xml", xmlData, "text/xml", "Data");
ze.setXML(xmlData);
log.debug("Attaching ZUGFeRD-Data");
ze.disableAutoClose(true);
ze.export(output);
byte[] bytes = output.toByteArray();
InputStream inputStream = new ByteArrayInputStream(bytes);
byte[] pdfBytes = IOUtils.toByteArray(inputStream);
try {
Utils.facturxValidator(request.getXml());
} catch (Exception e) {
e.printStackTrace();
}
String encoded = Base64.getEncoder().encodeToString(pdfBytes);
return new FxResponse("OK", encoded);
}
I do get a PDF as output, but I get errors when validating it.
My XML, on the other hand, is fully valid.
I checked whether the PDF was compliant, and I also used the Mustang CLI to check the output.
I get the following failures:
<xml>
<info>
<version>2</version>
<profile>urn:cen.eu:en16931:2017#compliant#urn:factur-x.eu:1p0:basic</profile>
<validator version="2.4.0"/>
<rules>
<fired>466</fired>
<failed>3</failed>
</rules>
<duration unit="ms">5242</duration>
</info>
<messages>
<notice type="27" location="/*:CrossIndustryInvoice[namespace-uri()='urn:un:unece:uncefact:data:standard:CrossIndustryInvoice:100'][1]/*:ExchangedDocumentContext[namespace-uri()='urn:un:unece:uncefact:data:standard:CrossIndustryInvoice:100'][1]" criterion="ram:GuidelineSpecifiedDocumentContextParameter/ram:ID = $XR-CIUS-ID">[BR-DE-21] Das Element "Specification identifier" (BT-24) soll syntaktisch der Kennung des Standards XRechnung entsprechen. (From /xslt/XR_21/XRechnung-CII-validation.xslt)</notice>
<notice type="27" location="/*:CrossIndustryInvoice[namespace-uri()='urn:un:unece:uncefact:data:standard:CrossIndustryInvoice:100'][1]/*:SupplyChainTradeTransaction[namespace-uri()='urn:un:unece:uncefact:data:standard:CrossIndustryInvoice:100'][1]/*:ApplicableHeaderTradeAgreement[namespace-uri()='urn:un:unece:uncefact:data:standard:ReusableAggregateBusinessInformationEntity:100'][1]/*:SellerTradeParty[namespace-uri()='urn:un:unece:uncefact:data:standard:ReusableAggregateBusinessInformationEntity:100'][1]" criterion="ram:DefinedTradeContact">[BR-DE-2] Die Gruppe "SELLER CONTACT" (BG-6) muss übermittelt werden. (From /xslt/XR_21/XRechnung-CII-validation.xslt)</notice>
<notice type="27" location="/*:CrossIndustryInvoice[namespace-uri()='urn:un:unece:uncefact:data:standard:CrossIndustryInvoice:100'][1]/*:SupplyChainTradeTransaction[namespace-uri()='urn:un:unece:uncefact:data:standard:CrossIndustryInvoice:100'][1]/*:ApplicableHeaderTradeSettlement[namespace-uri()='urn:un:unece:uncefact:data:standard:ReusableAggregateBusinessInformationEntity:100'][1]/*:SpecifiedTradeSettlementPaymentMeans[namespace-uri()='urn:un:unece:uncefact:data:standard:ReusableAggregateBusinessInformationEntity:100'][1]" criterion="not(ram:ApplicableTradeSettlementFinancialCard) and not(/rsm:CrossIndustryInvoice/rsm:SupplyChainTradeTransaction/ram:ApplicableHeaderTradeSettlement/ram:SpecifiedTradePaymentTerms/ram:DirectDebitMandateID or /rsm:CrossIndustryInvoice/rsm:SupplyChainTradeTransaction/ram:ApplicableHeaderTradeSettlement/ram:CreditorReferenceID or ram:PayerPartyDebtorFinancialAccount/ram:IBANID)">[BR-DE-23-b] Wenn BT-81 "Payment means type code" einen Schlüssel für Überweisungen enthält (30, 58), dürfen BG-18 und BG-19 nicht übermittelt werden. (From /xslt/XR_21/XRechnung-CII-validation.xslt)</notice>
</messages>
<summary status="valid"/>
</xml>
My questions are the following:
How can I resolve these failures?
Do these errors have an impact on my Factur-X file? I validated it on a test site, not in production.
I noticed that I forgot to set the Factur-X profile, for example EXTENDED:
ZUGFeRDExporterFromA1 ze = new ZUGFeRDExporterFromA1()
.setProducer("Mustang API")
.setCreator("Creator ME")
.setProfile("EXTENDED")
.setZUGFeRDVersion(request.getVersion())
.setConformanceLevel(pdfaConformanceLevel)
.load(pdfData);
ze.attachFile("factur-x.xml", xmlData, "text/xml", "Data");
ze.setXML(xmlData);
Available values are the following:
{"MINIMUM", new Profile("MINIMUM", "urn:factur-x.eu:1p0:minimum")},
{"BASICWL", new Profile("BASICWL", "urn:factur-x.eu:1p0:basicwl")},
{"BASIC", new Profile("BASIC", "urn:cen.eu:en16931:2017#compliant#urn:factur-x.eu:1p0:basic")},
{"EN16931", new Profile("EN16931", "urn:cen.eu:en16931:2017")},
{"EXTENDED", new Profile("EXTENDED", "urn:cen.eu:en16931:2017#conformant#urn:factur-x.eu:1p0:extended")},
{"XRECHNUNG", new Profile("XRECHNUNG", "urn:cen.eu:en16931:2017#compliant#urn:xoev-de:kosit:standard:xrechnung_2.1")}
It works fine now!
We built a 3-node cluster in a testing environment and used a Neo4j-JDBC connection to save JSON data into Neo4j.
Creating just 2000 nodes and 2000 relationships from JSON gives these statistics: total time to save topology data in Neo4j: 456688 ms, links size: 2000, nodes size: 2000.
Saving without checking for duplicate nodes/relationships (with the checkVertex and checkRelation methods removed):
Total time to save topology data in Neo4j: 446979 ms, links size: 2000, nodes size: 4000 (because we are not checking for duplicates, duplicate nodes were created).
Code:
public Connection getConnection(String masterNodeIp, String password) throws Exception {
return(Connection)DriverManager.getConnection("jdbc:neo4j:http://"+masterNodeIp+"/?user=neo4j,password="+password+"");
}
// Iterate through the edges, adding the source and target nodes of each.
try {
for (Links link : topology.getL2links()) {
if(conn != null) {
long srcId = etGraphIdByUniquenessOfOrphan(clientId,link.getSrcMgmtIP());
GraphId srcGraphId = prepareGraphId(srcId, "DEVICE");
long tgtId = etGraphIdByUniquenessOfOrphan(clientId,link.getTgtMgmtIP());
GraphId tgtGraphId = prepareGraphId(tgtId, "DEVICE");
String srcQuery = createNode(conn, link, false,clientId,discProfileId,
srcGraphId);
if(srcQuery!=null && !srcQuery.isEmpty())
stmt.execute(srcQuery);
String tgtQuery = createNode(conn, link, true,clientId,discProfileId,
tgtGraphId);
if(tgtQuery != null && !tgtQuery.isEmpty())
stmt.execute(tgtQuery);
String relationQuery = processRelation(conn, link,srcGraphId,tgtGraphId);
if(relationQuery!=null && !relationQuery.isEmpty())
stmt.execute(relationQuery);
}
}
} catch(Exception e) {
System.out.println("Exception in processJsonData ::: "+e.getMessage());
throw e;
} finally {
stmt.close();
conn.close();
}
// Before creating a node, check whether it already exists, to avoid duplicates.
private boolean checkVertex(Connection conn, String ip, String hostName, long clientId, long discPId, GraphId graphId) throws Exception{
Statement stmt = null;
ResultSet rs = null;
boolean result=false;
try {
stmt = conn.createStatement();
StringBuffer queryBuffer = new StringBuffer();
queryBuffer.append(" MATCH (node) WHERE node.id ='"+graphId.getId()+"' AND node.sourceType = '"+graphId.getSourceType()+"'");
queryBuffer.append(" RETURN node");
rs = (ResultSet) stmt.executeQuery(queryBuffer.toString());
while(rs.next()) {
result=true;
break;
}
} catch(Exception e) {
System.out.println("Exception in fetching node ::: "+e.getMessage());
throw e;
} finally {
rs.close();
stmt.close();
}
return result;
}
// Before creating a relationship, check for duplicates as well.
private boolean checkRelation(Connection conn, Links link, GraphId srcGraphId, GraphId tgtGraphId) throws SQLException {
Statement stmt = null;
ResultSet rs = null;
boolean result=false;
try {
stmt = conn.createStatement();
StringBuffer queryBuffer = new StringBuffer();
queryBuffer.append(" MATCH (src:resource)-[r:topology]->(tgt:resource) WHERE src.id='"+srcGraphId.getId()
+"' AND tgt.id='"+tgtGraphId.getId()+"' AND r.srcInt='"+link.getSrcInt()+"'AND r.tgtInt='"+link.getTgtInt()+"'");
queryBuffer.append(" RETURN r");
rs=(ResultSet) stmt.executeQuery(queryBuffer.toString());
while(rs.next()) {
result=true;
break;
}
}
catch(Exception e) {
System.out.println("Exception in fetching node ::: "+e.getMessage());
} finally {
rs.close();
stmt.close();
}
return result;
}
We created indexes for those duplicate-check queries, but performance is still slow.
Also, please let us know how to use a "node key" unique constraint at the Java level so that we can skip the checkVertex query. We tried to catch "constraintViolationexception" and log it instead of throwing it, but the exception is still thrown and no nodes are saved.
There are a lot of things that you can improve:
For mass data imports, use the Java driver directly; JDBC adds a layer of indirection.
Use parameters!
Use batching, either with UNWIND or by executing multiple prepared statements as a batch.
Don't construct queries with literal values.
Make sure you have indexes/constraints for your keys. Your queries don't use any indexes because you didn't provide any labels!
Use MERGE if you don't want to have constraint exceptions.
Don't use StringBuffer, ever.
Use try-with-resources.
Use executeUpdate.
For batching:
https://medium.com/@mesirii/5-tips-tricks-for-fast-batched-updates-of-graph-structures-with-neo4j-and-cypher-73c7f693c8cc
For parameters:
http://neo4j-contrib.github.io/neo4j-jdbc/#_minimum_viable_snippet
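The parameter and batching advice could look roughly like the sketch below. It uses only the Java standard library, so it shows just the preparation side: one parameterized Cypher statement that takes a whole batch as a list parameter, plus a helper that splits the rows into batches. The class name BatchedMerge, the methods row and batches, and the batch size of 500 are hypothetical. The `resource` label and the `id`/`sourceType` properties are taken from the question, but whether they match your model and constraints is an assumption. `$rows` is the parameter syntax used with the Java driver; how the list is bound depends on the client you use.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the original question or answer.
class BatchedMerge {

    // One statement per batch: the whole batch is passed as the $rows parameter,
    // and MERGE creates a node only when no matching node exists yet.
    static final String MERGE_NODES =
            "UNWIND $rows AS row "
          + "MERGE (n:resource {id: row.id, sourceType: row.sourceType})";

    /** Builds the parameter map for one node. */
    static Map<String, Object> row(long id, String sourceType) {
        Map<String, Object> row = new HashMap<>();
        row.put("id", id);
        row.put("sourceType", sourceType);
        return row;
    }

    /** Splits items into consecutive batches of at most size elements. */
    static <T> List<List<T>> batches(List<T> items, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < items.size(); i += size) {
            int end = Math.min(i + size, items.size());
            out.add(new ArrayList<>(items.subList(i, end)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = new ArrayList<>();
        for (long id = 0; id < 2000; id++) {
            rows.add(row(id, "DEVICE"));
        }
        // Each batch would be sent as a single query with {"rows": batch} as parameters.
        for (List<Map<String, Object>> batch : batches(rows, 500)) {
            System.out.println(MERGE_NODES + " <- " + batch.size() + " rows");
        }
    }
}
```

With 2000 nodes and a batch size of 500, this comes to four round trips for the nodes instead of one existence check plus one create per node.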
Background
I have banged my head against this for a while and not made much progress. I am generating MPEG_4 / AAC files in Android and sending them by email as .mp3 files. I know they aren't actually .mp3 files, but that allows Hotmail and Gmail to play them in Preview. They don't work on iPhone though, unless they are sent as .m4a files instead which breaks the Outlook / Gmail Preview.
So I have thought of a different approach which is to attach as a .mp3 file but have an HTML link in the email body which allows the attached file to be downloaded and specifies a .m4a file name. Gmail / Outlook users can click the attachment directly whereas iPhone users can use the HTML link.
Issue
I can send an email using JavaMail with HTML in it including a link which should be pointing at the attached file to allow download of that file by the link. Clicking on the link in Gmail (Chrome on PC) gives a 404 page and iPhone just ignores my clicking on the link.
Below is the code in which I generate a multipart message and assign a CID to the attachment which I then try to access using the link in the html part. It feels like I am close, but maybe that is an illusion. I'd be massively grateful if someone could help me fix it or save me the pain if it isn't possible.
private int send_email_temp(){
Properties props = new Properties();
props.put("mail.smtp.auth", "true");
props.put("mail.smtp.host", smtp_host_setting);
//props.put("mail.debug", "true");
props.put("mail.smtp.ssl.enable", "true");
props.put("mail.smtp.starttls.enable", "true");
props.put("mail.smtp.port", smtp_port_setting);
session = Session.getInstance(props);
ActuallySendAsync_temp asy = new ActuallySendAsync_temp(true);
asy.execute();
return 0;
}
class ActuallySendAsync_temp extends AsyncTask<String, String, Void> {
public ActuallySendAsync_temp(boolean boo) {
// something to do before sending email
}
@Override
protected Void doInBackground(String... params) {
try {
Message message = new MimeMessage(session);
message.setFrom(new InternetAddress(username));
message.setRecipients(Message.RecipientType.TO,
InternetAddress.parse(recipient_email_address));
message.setSubject(email_subject);
Multipart multipart = new MimeMultipart();
MimeBodyPart messageBodyPart = new MimeBodyPart();
String file = mFileName;
/**/
DataSource source = new FileDataSource(file);
messageBodyPart.setDataHandler(new DataHandler(source));
/* /
File ff = new File(file);
try {
messageBodyPart.attachFile(ff);
} catch(IOException eio) {
Log.e("Message Error", "Old Macdonald");
}
/* /
messageBodyPart = new PreencodedMimeBodyPart("base64");
byte[] file_bytes = null;
File ff = new File(file);
try {
int length = (int) ff.length();
BufferedInputStream reader = new BufferedInputStream(new FileInputStream(ff));
file_bytes = new byte[length];
reader.read(file_bytes, 0, length);
reader.close();
} catch (IOException eio) {
Log.e("Message Error", "Old Macdonald");
}
messageBodyPart.setText(Base64.encodeToString(file_bytes, Base64.DEFAULT));
messageBodyPart.setHeader("Content-Transfer-Encoding", "base64");
/**/
messageBodyPart.setFileName( DEFAULT_AUDIO_FILENAME );//"AudioClip.mp3");
//messageBodyPart.setContentID("<audio_clip>");
String content_id = UUID.randomUUID().toString();
messageBodyPart.setContentID("<" + content_id + ">");
messageBodyPart.setDisposition(Part.ATTACHMENT);//INLINE);
messageBodyPart.setHeader("Content-Type", "audio/mp4");
multipart.addBodyPart(messageBodyPart);
MimeBodyPart messageBodyText = new MimeBodyPart();
//final String MY_HTML_MESSAGE = "<h1>My HTML</h1><a download=\"AudioClip.m4a\" href=\"cid:audio_clip\">iPhone Download</a>";
final String MY_HTML_MESSAGE = "<h1>My HTML</h1><a download=\"AudioClip.m4a\" href=\"cid:" + content_id + "\">iPhone Download</a>";
messageBodyText.setContent( MY_HTML_MESSAGE, "text/html");
multipart.addBodyPart(messageBodyText);
message.setContent(multipart);
Print_Message_To_Console(message);
Transport transport = session.getTransport("smtp");
transport.connect(smtp_host_setting, username, password);
transport.sendMessage(message, message.getAllRecipients());
transport.close();
} catch (MessagingException e) {
e.printStackTrace();
} finally {
}
return null;
}
@Override
protected void onPostExecute(Void aVoid) {
super.onPostExecute(aVoid);
// something to do after sending email
}
}
int Print_Message_To_Console(Message msg) {
int ret_val = 0;
int line_num = 0;
InputStream in = null;
InputStreamReader inputStreamReader = null;
BufferedReader buff_reader = null;
try {
in = msg.getInputStream();
inputStreamReader = new InputStreamReader(in);
buff_reader = new BufferedReader(inputStreamReader);
String temp = "";
while ((temp = buff_reader.readLine()) != null) {
Log.d("Message Line " + Integer.toString(line_num++), temp);
}
} catch(Exception e) {
Log.d("Message Lines", "------------ OOPS! ------------");
ret_val = 1;
} finally {
try {
if (buff_reader != null) buff_reader.close();
if (inputStreamReader != null) inputStreamReader.close();
if (in != null) in.close();
} catch(Exception e2) {
Log.d("Message Lines", "----------- OOPS! 2 -----------");
ret_val = 2;
}
}
return ret_val;
}
You need to create a multipart/related (for example, new MimeMultipart("related")) and add the main text part as the first body part, before the attachment.
I'm trying to code SQL access to a database using sqljocky in Dart. Because I want to do some computation with the result returned by my database handler, the method returns a Future.
But when I try to run it, I get the following error:
Uncaught Error: The null object does not have a method 'then'
I ran the debugger and found that the error is raised on:
db.query('select * from user where email="$email"').then(...)
but the catchError clause doesn't fire.
My handler method is:
// db is a ConnectionPool
Future<Map<String,String>> queryUser(String email){
print(email);
db.query('select * from user where email="${email}"').then((result) { // the error is raised here
Map<String,String> results = new Map<String,String>();
result.forEach((row){
results['status'] = '200';
results['ID'] = row[0];
results['Image'] = row[1];
results['Name'] = row[2];
results['Email'] = row[3];
results['Password'] = row[4];
});
return results;
}).catchError((error){
Map<String,String> results = new Map<String,String>();
results['status'] = '500';
return results;
});
}
And the method that calls this handler is:
List getUser(String email) {
Future<Map<String,String>> result = dbhandler.queryUser(email);
result.then((Map<String,String> result) {
String statuscode = result['status'];
result.remove('status');
String json = JSON.encode(result);
List pair = new List();
pair.add(statuscode);
pair.add(json);
return pair;
});
}
If I run the query directly in phpMyAdmin, it returns the correct data, so the query itself is correct.
Can someone give me a hint about how to solve this?
The queryUser() method will always return null, as there is no return statement. In the next release of Dart there will be a static hint warning for this, but at the moment there is none.
Perhaps the code below is what you meant to do. Note the initial return statement before db.query(), and the extra result.toList() call. I haven't tested this, so there's probably a typo or two.
Future<Map<String,String>> queryUser(String email){
print(email);
return db.query('select * from user where email="${email}"')
.then((result) => result.toList())
.then((rows) {
var row = rows.single;
Map<String,String> results = new Map<String,String>();
results['status'] = '200';
results['ID'] = row[0];
results['Image'] = row[1];
results['Name'] = row[2];
results['Email'] = row[3];
results['Password'] = row[4];
return results;
}).catchError((error){
Map<String,String> results = new Map<String,String>();
results['status'] = '500';
return results;
});
}
You can also make this a bit cuter using map literals:
Future<Map<String,String>> queryUser(String email){
return db.query('select * from user where email="${email}"')
.then((result) => result.toList())
.then((rows) => <String, String> {
'status': '200',
'ID': rows.single[0],
'Image': rows.single[1],
'Name': rows.single[2],
'Email': rows.single[3],
'Password': rows.single[4] })
.catchError((error) => <String, String> {'status': '500'});
}
Finally, I found the answer by using a Completer to control the Future object, but the real problem was, as Greg Lowe said, that my methods didn't return anything: they reached their end before the then clause ran.
Using a Completer, I wrote my query method as:
Future<Map<String,String>> queryUser(String email){
Completer c = new Completer();
db.query('select * from user where email="$email"').then((result) {
Map<String,String> results = new Map<String,String>();
result.forEach((row){
results['status'] = '200';
results['ID'] = row[0].toString();
results['Image'] = row[1];
results['Name'] = row[2];
results['Email'] = row[3];
results['Password'] = row[4];
}).then((onValue){
c.complete(results);
});
}).catchError((error){
Map<String,String> results = new Map<String,String>();
results['status'] = '500';
print("error in queryUser: $error");
// Complete with the '500' map so that callers still receive a status code.
c.complete(results);
});
return c.future;
}
I also fixed an error in my use of the forEach method: at first I assumed it returned nothing, but then I noticed that it returns a Future, so I added a then clause.
And my getUser method:
Future<List> getUser(String email) {
Completer c = new Completer();
Future<Map<String,String>> result = dbhandler.queryUser(email);
result.then((Map<String,String> result) {
String statuscode = result['status'];
result.remove('status');
String json = JSON.encode(result);
List pair = new List();
pair.add(statuscode);
pair.add(json);
c.complete(pair);
});
return c.future;
}
After those changes, everything works correctly.
Question:
Assume an email message with an attachment (assume a JPEG attachment). How do I parse (not using the Tika facade classes) the email message and return the distinct pieces--a) the email text contents and b) the email attachment?
Configuration:
Tika 1.2
Java 1.7
Details:
I have been able to properly parse email messages in basic email message formats. However, after the parsing, I need to know a) the email's text contents and b) the contents of any attachment to the email. I will store these items in my database as, essentially, a parent email with child attachments.
What I cannot figure out is how I can "get back" the distinct parts and know that the parent email has attachments and be able to separately store those attachments referenced to the mail. This is, I believe, essentially similar to extracting ZipFile contents.
Code Example:
private Message processDocument(String fullfilepath) {
try {
File filename = new File(fullfilepath) ;
return this.processDocument(filename) ;
} catch (NullPointerException npe) {
Message error = new Message(false) ;
error.appendErrorMessage("The file name was null.") ;
return error ;
}
}
private Message processDocument(File filename) {
InputStream stream = null;
try {
stream = new FileInputStream(filename) ;
} catch (FileNotFoundException fnfe) {
// TODO Auto-generated catch block
fnfe.printStackTrace();
System.out.println("FileNotFoundException") ;
return diag ;
}
int writelimit = -1 ;
ContentHandler texthandler = new BodyContentHandler(writelimit);
this.safehandlerbodytext = new SafeContentHandler(texthandler);
this.meta = new Metadata() ;
ParseContext context = new ParseContext() ;
AutoDetectParser autodetectparser = new AutoDetectParser() ;
try {
autodetectparser.parse(
stream,
texthandler,
meta,
context) ;
this.documenttype = meta.get("Content-Type") ;
diag.setSuccessful(true);
} catch (IOException ioe) {
// if the document stream could not be read
System.out.println("TikaTextExtractorHelper IOException " + ioe.getMessage()) ;
//FIXME -- add real handling
} catch (SAXException se) {
// if the SAX events could not be processed
System.out.println("TikaTextExtractorHelper SAXException " + se.getMessage()) ;
//FIXME -- add real handling
} catch (TikaException te) {
// if the document could not be parsed
System.out.println("TikaTextExtractorHelper TikaException " + te.getMessage()) ;
System.out.println("Exception Filename = " + filename.getName()) ;
//FIXME -- add real handling
}
return diag ;
}
When Tika hits an embedded document, it goes to the ParseContext to see if you have supplied a recursing parser. If you have, it'll use that to process any embedded resources. If you haven't, it'll skip.
So, what you probably want to do is something like:
public static class HandleEmbeddedParser extends AbstractParser {
// Temp files holding the embedded documents seen so far
public List<File> found = new ArrayList<File>();
public Set<MediaType> getSupportedTypes(ParseContext context) {
// Return what you want to handle
Set<MediaType> types = new HashSet<MediaType>();
types.add(MediaType.application("pdf"));
types.add(MediaType.application("zip"));
return types;
}
public void parse(
InputStream stream, ContentHandler handler,
Metadata metadata, ParseContext context
) throws IOException {
// Do something with the child documents
// eg save to disk
File f = File.createTempFile("tika","tmp");
found.add(f);
FileOutputStream fout = new FileOutputStream(f);
try {
IOUtils.copy(stream,fout);
} finally {
// Close the file even if the copy fails
fout.close();
}
}
}
ParseContext context = new ParseContext();
context.set(Parser.class, new HandleEmbeddedParser());
parser.parse(....);