Get attachment names from Tika - apache-tika

I'm parsing an EML file (RFC822) using Tika, but for some reason I cannot get the attachment names. I get the body, To, Cc, Bcc, the attachment text, and so on, but not the attachment names. Any ideas? Below is the code I'm using.
var handler = new BodyContentHandler();
Metadata metadata = new Metadata();
FileInputStream inputstream = new FileInputStream(new File(@"C:\Users\test\Desktop\testemail.eml"));
ParseContext pcontext = new ParseContext();
var parser = new RFC822Parser();
parser.parse(inputstream, handler, metadata, pcontext);
Debug.WriteLine("Contents of the document:" + handler.toString());
Debug.WriteLine("Metadata of the document:");
String[] metadataNames = metadata.names();
foreach (String name in metadataNames)
{
    Debug.WriteLine(name + ": " + metadata.get(name));
}

Related

Path to resources directory in Xamarin.Android

I need the path to my resources directory so I can access the fonts folder inside it, as in this code:
PdfFont russian = PdfFontFactory.createFont(
    "src/main/resources/fonts/FreeSans.ttf", "CP1251", true);
but in Xamarin.Android. I tried the following:
string uri = "android.resource://" + this.PackageName + "/font/ARIAL.TTF";
PdfFont russian = PdfFontFactory.CreateFont(
    uri, "CP1251", true);
but it doesn't work. I tried this code too:
var path2 = global::Android.OS.Environment.ExternalStorageDirectory.AbsolutePath;
filePath = System.IO.Path.Combine(path2.ToString(), "myfile4.pdf");
stream = new FileStream(filePath, FileMode.Create);
PdfWriter writer = new PdfWriter(stream);
PdfDocument pdf2 = new iText.Kernel.Pdf.PdfDocument(writer);
Document document2 = new Document(pdf2, PageSize.A4);
AssetManager assets = this.Assets;
string content;
Stream stream2 = assets.Open("ARIAL.TTF");
var memorystrm = new MemoryStream();
stream2.CopyTo(memorystrm);
byte[] t = memorystrm.ToArray();
Toast.MakeText(this, t.Length.ToString(), ToastLength.Long);
if (t != null)
{
    PdfFont russian = PdfFontFactory.CreateFont(t, "UTF-8", true);
    document2.SetFont(russian);
    Paragraph p = new Paragraph("Hello World! ")
        .Add(new Text("صباح! ").SetFontSize(14)).Add(new Text("Bonjour le monde! ").SetFontSize(10));
    document2.Add(p);
    document2.Close();
    Toast.MakeText(this, "done", ToastLength.Long);
}
else
{
    Toast.MakeText(this, "error", ToastLength.Long);
}
No code was executed.
The folder layout of a Xamarin.Android project differs from that of a native Android project.
To ship the font file with the project and read it at runtime, put it in the Assets folder and set its Build Action to AndroidAsset.
string content;
AssetManager assets = this.Assets;
using (StreamReader sr = new StreamReader(assets.Open("read_asset.txt")))
{
    content = sr.ReadToEnd();
}
Check the tutorial:
https://learn.microsoft.com/en-us/xamarin/android/app-fundamentals/resources-in-android/android-assets?tabs=windows
Update
I'll add my code; it didn't work. No code was executed.
It seems you forgot to call .Show() on the toast, for example Toast.MakeText(this, "done", ToastLength.Long).Show(). Without that call the toast is created but never displayed.

Unable to parse .docx or .xlsx files using Apache Tika 1.6. The JAR files are loaded, but nothing is parsed

The last line returns a blank value.
Parser _autoParser = new AutoDetectParser();
ContentHandler textHandler = new BodyContentHandler(-1);
PDFParserConfig pdfConfig = new PDFParserConfig();
pdfConfig.setExtractInlineImages(true);
System.out.println("inside Tika");
Metadata metadata = new Metadata();
ParseContext contextParse = new ParseContext();
contextParse.set(PDFParserConfig.class, pdfConfig);
contextParse.set(Parser.class, _autoParser);
InputStream input = new FileInputStream(fLoc);
System.out.println("trying to read the file content");
_autoParser.parse(input, textHandler, metadata, contextParse);

How to put two jasperReports in one zip file to download?

public String generateReport() {
    try
    {
        final FacesContext facesContext = FacesContext.getCurrentInstance();
        final HttpServletResponse response = (HttpServletResponse) facesContext.getExternalContext().getResponse();
        response.reset();
        response.setHeader("Content-Disposition", "attachment; filename=\"" + "myReport.zip\";");
        final BufferedOutputStream bos = new BufferedOutputStream(response.getOutputStream());
        final ZipOutputStream zos = new ZipOutputStream(bos);
        for (final PeriodScale periodScale : Scale.getPeriodScales(this.startDate, this.endDate))
        {
            final JasperPrint jasperPrint = JasperFillManager.fillReport(
                this.reportsPath() + File.separator + "periodicScale.jasper",
                this.parameters(this.reportsPath(), periodScale.getScale(),
                    periodScale.getStartDate(), periodScale.getEndDate()),
                new JREmptyDataSource());
            final byte[] bytes = JasperExportManager.exportReportToPdf(jasperPrint);
            response.setContentLength(bytes.length);
            final ZipEntry ze = new ZipEntry("periodicScale"+ periodScale.getStartDate() + ".pdf"); // periodicScale13032015.pdf for example
            zos.putNextEntry(ze);
            zos.write(bytes, 0, bytes.length);
            zos.closeEntry();
        }
        zos.close();
        facesContext.responseComplete();
    }
    catch (final Exception e)
    {
        e.printStackTrace();
    }
    return "";
}
This is the action method in my managed bean that the user calls to print a JasperReport, but it does not work when I try to put more than one report inside the zip file.
getPeriodScales returns two objects, and JasperFillManager.fillReport runs correctly, since the report prints when I generate data for just one report. When I stream two reports, though, and open the archive in WinRAR, only one appears and I get an "unexpected end of archive" error; in 7-Zip both appear, but the second is corrupted.
What am I doing wrong, or is there a way to stream multiple reports without zipping them?
I figured out what it was: I was setting the content length of the response to bytes.length, but it should be bytes.length * Scale.getPeriodScales(this.startDate, this.endDate).size().
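A ZIP archive's size is generally not the sum (or a multiple) of its entries' raw sizes, because of compression and the per-entry headers. One hedged alternative is to build the archive in memory first and use its actual length. The sketch below uses only java.util.zip; the class and method names (ZipLengthSketch, zipAll, entryNames) are hypothetical, and the byte arrays stand in for the exported PDF bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of JasperReports or the servlet API.
final class ZipLengthSketch {

    /** Packs every (entry name -> content) pair into one in-memory ZIP archive. */
    static byte[] zipAll(Map<String, byte[]> files) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(buffer)) {
            for (Map.Entry<String, byte[]> file : files.entrySet()) {
                zos.putNextEntry(new ZipEntry(file.getKey()));
                zos.write(file.getValue());
                zos.closeEntry();
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        // The stream is closed here, so the central directory has been written.
        return buffer.toByteArray();
    }

    /** Reads the archive back and lists its entry names, as a sanity check. */
    static List<String> entryNames(byte[] zip) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zip))) {
            for (ZipEntry entry = zis.getNextEntry(); entry != null; entry = zis.getNextEntry()) {
                names.add(entry.getName());
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return names;
    }

    public static void main(String[] args) {
        Map<String, byte[]> reports = new LinkedHashMap<>();
        reports.put("periodicScale-1.pdf", new byte[100]); // placeholder for PDF bytes
        reports.put("periodicScale-2.pdf", new byte[250]); // placeholder for PDF bytes
        byte[] zip = zipAll(reports);
        System.out.println("entries: " + entryNames(zip));
        System.out.println("archive length: " + zip.length);
    }
}
```

In the action method, this would mean calling response.setContentLength(zip.length) once, after the loop, and then writing zip to response.getOutputStream(). Leaving the content length unset is another option when the archive is streamed directly.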
public JasperPrint generatePdf(long consumerNo) {
Consumer consumerByCustomerNo = consumerService.getConsumerByCustomerNo(consumerNo);
consumerList.add(consumerByCustomerNo);
BillHeaderIPOP billHeaderByConsumerNo = billHeaderService.getBillHeaderByConsumerNo(consumerNo);
Long billNo = billHeaderByConsumerNo.getBillNo();
List<BillLineItem> billLineItemByBilNo = billLineItemService.getBillLineItemByBilNo(billNo);
System.out.println(billLineItemByBilNo);
List<BillReadingLine> billReadingLineByBillNo = billReadingLineService.getBillReadingLineByBillNo(billNo);
File jrxmlFile = ResourceUtils.getFile("classpath:demo.jrxml");
JasperReport jasperReport = JasperCompileManager.compileReport(jrxmlFile.getAbsolutePath());
pdfContainer.setName(consumerByCustomerNo.getName());
pdfContainer.setTelephone(consumerByCustomerNo.getTelephone());
pdfContainer.setFromDate(billLineItemByBilNo.get(0).getStartDate());
pdfContainer.setToDate(billLineItemByBilNo.get(0).getEndDate());
pdfContainer.setSupplyAddress(consumerByCustomerNo.getSupplyAddress());
pdfContainer.setMeterNo(billReadingLineByBillNo.get(0).getMeterNo());
pdfContainer.setBillType(billHeaderByConsumerNo.getBillType());
pdfContainer.setReadingType(billReadingLineByBillNo.get(0).getReadingType());
pdfContainer.setLastBilledReadingInKWH(billReadingLineByBillNo.stream().filter(billReadingLine -> billReadingLine.getRegister().contains("KWH")).collect(Collectors.toList()).get(0).getLastBilledReading());
pdfContainer.setLastBilledReadingInKW(billReadingLineByBillNo.stream().filter(billReadingLine -> billReadingLine.getRegister().contains("KW")).collect(Collectors.toList()).get(0).getLastBilledReading());
pdfContainer.setReadingType(billReadingLineByBillNo.get(0).getReadingType());
pdfContainer.setRateCategory(billLineItemByBilNo.get(0).getRateCategory());
List<PdfContainer> pdfContainerList = new ArrayList<>();
pdfContainerList.add(pdfContainer);
Map<String, Object> parameters = new HashMap<>();
parameters.put("billLineItemByBilNo", billLineItemByBilNo);
parameters.put("billReadingLineByBillNo", billReadingLineByBillNo);
parameters.put("consumerList", consumerList);
parameters.put("pdfContainerList", pdfContainerList);
JasperPrint jasperPrint = JasperFillManager.fillReport(jasperReport, parameters, new JREmptyDataSource());
return jasperPrint;
}
// The code above follows my own requirements; just focus on the jasperPrint object it returns. That object is then used to generate each PDF, and the PDFs are stored in a zip file.
@GetMapping("/batchpdf/{rangeFrom}/{rangeTo}")
public String batchPdfBill(@PathVariable("rangeFrom") long rangeFrom, @PathVariable("rangeTo") long rangeTo) throws JRException, IOException {
    consumerNosInRange = consumerService.consumerNoByRange(rangeFrom, rangeTo);
    String zipFilePath = "C:\\Users\\Barada\\Downloads";
    FileOutputStream fos = new FileOutputStream(zipFilePath +"\\"+ rangeFrom +"-To-"+ rangeTo +"--"+ Math.random() + ".zip");
    BufferedOutputStream bos = new BufferedOutputStream(fos);
    ZipOutputStream outputStream = new ZipOutputStream(bos);
    try {
        for (long consumerNo : consumerNosInRange) {
            JasperPrint jasperPrint = generatePdf(consumerNo);
            byte[] bytes = JasperExportManager.exportReportToPdf(jasperPrint);
            outputStream.putNextEntry(new ZipEntry(consumerNo + ".pdf"));
            outputStream.write(bytes, 0, bytes.length);
            outputStream.closeEntry();
        }
    } finally {
        outputStream.close();
    }
    return "All Bills PDF Generated.. Extract ZIP file get all Bills";
}
}

Losing the input stream in Apache tika

I am getting the input stream from the HTTP request and using the same input stream to extract the metadata, as shown below.
ServletFileUpload upload = new ServletFileUpload();
FileItemIterator iter = upload.getItemIterator(request);
--- more lines for the iteration and getting the stream ------
InputStream input = item.openStream();
This input is passed to the parser as shown below:
public Map<String, String> extractMetadata(InputStream is) {
    Map<String,String> map = new HashMap<>();
    ContentHandler contentHandler = new BodyContentHandler(-1);
    Metadata metadata = new Metadata();
    Parser parser = new AutoDetectParser();
    ParseContext parseContext = new ParseContext();
    parseContext.set(Parser.class ,
        new ParserDecorator(parser));
    try {
        TikaInputStream tikaInputStream = TikaInputStream.get(is);
        parser.parse(tikaInputStream, contentHandler, metadata,parseContext);
        for (String name : metadata.names()) {
            map.put(name ,metadata.get(name));
        }
    } catch (IOException|SAXException|TikaException e) {
        map.put("ERROR","Error while retriving Metadata");
    }
    return map;
}
But when I read the input stream afterwards, its contents are not the same as when I don't use Tika for the extraction.
Does Tika dirty the stream?
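Any consumer that reads an InputStream advances it, so once a parser has read the upload to the end, a later reader of the same stream finds nothing left. A minimal sketch of one workaround, assuming the upload is small enough to hold in memory: buffer the bytes once and hand every consumer its own fresh stream. It uses only the JDK; the class and method names (ReplayableStreamSketch, readFully, replay) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Apache Tika or Commons FileUpload.
final class ReplayableStreamSketch {

    /** Drains the stream into memory; afterwards the stream itself is exhausted. */
    static byte[] readFully(InputStream in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        try {
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return out.toByteArray();
    }

    /** Returns a fresh, independent stream over the buffered bytes. */
    static InputStream replay(byte[] data) {
        return new ByteArrayInputStream(data);
    }

    public static void main(String[] args) {
        // Stand-in for item.openStream() from the upload iterator.
        ByteArrayInputStream upload =
            new ByteArrayInputStream("file body".getBytes(StandardCharsets.UTF_8));

        byte[] data = readFully(upload);
        System.out.println("left in original stream: " + upload.available());

        // Each consumer gets its own copy positioned at the start.
        byte[] forParser = readFully(replay(data));
        byte[] forStorage = readFully(replay(data));
        System.out.println(new String(forParser, StandardCharsets.UTF_8));
        System.out.println(new String(forStorage, StandardCharsets.UTF_8));
    }
}
```

With this approach, extractMetadata would receive replay(data) and the code that stores or forwards the file would receive a second replay(data). For large uploads, spooling to a temporary file instead of a byte array is the usual trade-off.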

Error in converting HTML with images to PDF using iTextSharp

In my application, the user first creates an HTML document using CKEditor, where they can insert images, form fields, and so on. The generated HTML document is then converted into a PDF.
If the HTML document contains only plain text, the PDF file is created successfully, but if the user inserts an image, an error occurs.
Code for creating the PDF document:
public ActionResult CreateFile(FormCollection data)
{
    var filename = data["filename"];
    var htmlContent = data["content"];
    string sFilePath = Server.MapPath(_createdPDF + filename + ".html");
    htmlContent = htmlContent.Trim();
    if (!System.IO.File.Exists(sFilePath))
    {
        using (FileStream fs = new FileStream(sFilePath, FileMode.Create))
        {
            using (StreamWriter w = new StreamWriter(fs, Encoding.UTF8))
            {
                w.Write(htmlContent);
            }
        }
        createPDF(sFilePath);
    }
    return View();
}
private MemoryStream createPDF(string sFilePath)
{
    string filename = Path.GetFileNameWithoutExtension(sFilePath);
    string name = Server.MapPath(_createdPDF + filename + ".pdf");
    MemoryStream ms = new MemoryStream();
    TextReader tr = new StringReader(sFilePath);
    Document document = new Document(PageSize.A4, 30, 30, 30, 30);
    string urldir = Request.Url.GetLeftPart(UriPartial.Path);
    urldir = urldir.Substring(0, urldir.LastIndexOf("/") + 1);
    Response.Write(urldir);
    PdfWriter writer = PdfWriter.GetInstance(document, new FileStream(name, FileMode.Create));
    document.Open();
    string htmlText = "";
    StreamReader sr;
    sr = System.IO.File.OpenText(sFilePath);
    htmlText = sr.ReadToEnd();
    sr.Close();
    WebClient wc = new WebClient();
    Response.Write(htmlText);
    var props = new Dictionary<string, Object>();
    props["img_baseurl"] = @"C:\Documents and Settings\shubham\My Documents\visdatemplatemanger\visdatemplatemanger\";
    List<IElement> htmlarraylist = HTMLWorker.ParseToList(new StringReader(htmlText), null,props);
    for (int k = 0; k < htmlarraylist.Count; k++)
    {
        document.Add((IElement)htmlarraylist[k]);
    }
    document.Close();
    System.IO.File.Delete(sFilePath);
    UploadURL(name);
    return ms;
}
The error that I get if an image is included in the HTML document is:
Could not find a part of the path 'C:\Program Files\Common Files\Microsoft Shared\PDFimages\rectangle-shape.png'.
iTextSharp will try to resolve relative image paths for HTTP-based documents, but for documents served from the filesystem you need to either provide absolute paths or provide a base path for it to search from.
//Image search base, path will be concatenated directly so make sure it contains a trailing slash
var props = new Dictionary<string, Object>();
props["img_baseurl"] = @"c:\images\";
//Include the props from above
htmlarraylist = HTMLWorker.ParseToList(sr, null, props);
