Apache Tika does not extract the first line of an RTF file; it only extracts the last three characters of the first line

I have added the RTF file in a comment. Copy the following text into a text editor and save it in RTF format.
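import java.io.File;
import java.io.FileInputStream;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.rtf.RTFParser;
import org.apache.tika.sax.BodyContentHandler;

BodyContentHandler handler = new BodyContentHandler();
Metadata metadata = new Metadata();
FileInputStream inputstream = new FileInputStream(new File("level1Missing.rtf"));
ParseContext pcontext = new ParseContext();
RTFParser rt = new RTFParser();
rt.parse(inputstream, handler, metadata, pcontext);
// Print the extracted text of the document
System.out.println("Contents of the RTF :\n\n" + handler.toString());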

In my view, the problem is not in Apache Tika but in the RTF file: a \par is missing before {\line {\b Level1} : \par}.
You can reproduce this with another, simpler file:
{\rtf1\ansi{\fonttbl\f0\fswiss Helvetica;}\f0\par
This is some {\b bold} text.\par
}
If you remove the \par before This is some {\b bold} text.\par, Tika extracts only the last characters of the first line.

Related

Remove the WebKitFormBoundary in C#

I am working on a server that receives a file stream uploaded by a multipart uploader.
The stream I receive contains an additional WebKitFormBoundary header.
If I remove it manually, the file works. So I tried the following code:
var fileStream = File.Create(@"C:\Users\myname\Desktop\myimage.png");
StreamReader sr = new StreamReader(myStream);
string myText = sr.ReadToEnd();
string newText = myText.Substring(myText.IndexOf("‰")); // remove header
byte[] byteArray = Encoding.ASCII.GetBytes(newText);
MemoryStream data = new MemoryStream(byteArray);
data.CopyTo(fileStream);
If I convert the stream to a string this way, remove the boundary, and convert it back to a stream,
the first character "‰" becomes "?"
(i.e. ‰PNG becomes ?PNG, and the file is no longer readable).
Any suggestions?
Where could I have gone wrong?
Thanks
This drove me nuts. Finally understood that if you have access to the request, you can access just the contents (with no header) like this:
var provider = new MultipartMemoryStreamProvider();
await Request.Content.ReadAsMultipartAsync(provider);
var file = await provider.Contents[0].ReadAsStreamAsync();
Hope this helps you, or someone with the same issue.
I had the same issue. After reading several blogs and trying several solutions, I found one that works. The following code fixes it.
MemoryStream memoryStream = new MemoryStream(File.ReadAllBytes(filePath));
StreamReader streamReader = new StreamReader(memoryStream, Encoding.Default, true);
memoryStream.Seek(0, SeekOrigin.Begin);
string fileString = streamReader.ReadToEnd();
string fileData = fileString.Substring(0, fileString.IndexOf("\r\n\r\n") + 4);
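// Escape the header and boundary so they are matched literally, not as regex patterns.
string finalData = Regex.Replace(fileString, Regex.Escape(fileData), "");
var fileDataArr = Regex.Split(fileData, "\r\n|\r|\n").ToList();
var resultData = Regex.Replace(finalData, Regex.Escape(fileDataArr[0]) + "--", "");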
byte[] buffer = Encoding.Default.GetBytes(resultData);
Steps:
1. Load the file data into a MemoryStream so the file content can be read.
2. Use a StreamReader with the default encoding to read the content as a string.
3. Remove the first four lines from the top, including the WebKitFormBoundary header (everything up to and including the first blank line).
4. Remove the WebKitFormBoundary footer.
5. Convert the string back into a byte array with the same default encoding, so the file's encoding is preserved.
Example:
WebKitFormBoundary Header
------WebKitFormBoundaryL1NUALe5NDrNt9S0
Content-Disposition: form-data; name="userfile"; filename="BRtestfile1.pdf"
Content-Type: application/pdf
WebKitFormBoundary Footer
------WebKitFormBoundaryL1NUALe5NDrNt9S0--
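For what it's worth, the "?PNG" symptom from the question comes from the text round trip itself: byte 0x89 decodes to "‰", which ASCII cannot represent, so it is replaced with "?". Here is a minimal Python sketch of the same header/footer removal done on raw bytes instead of strings. It assumes a body with exactly one part and CRLF line endings; strip_multipart_envelope, ascii_round_trip, and the sample body are hypothetical names made up for illustration, not part of any framework.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def ascii_round_trip(data: bytes) -> bytes:
    """Decode bytes as Windows-1252 text, then re-encode as ASCII (lossy)."""
    return data.decode("cp1252").encode("ascii", errors="replace")


def strip_multipart_envelope(body: bytes) -> bytes:
    """Return the payload of a single-part multipart/form-data body."""
    # The first line is the boundary delimiter, e.g. b"------WebKitFormBoundary...".
    boundary, sep, rest = body.partition(b"\r\n")
    if not sep or not boundary.startswith(b"--"):
        raise ValueError("body does not start with a boundary line")
    # The part headers end at the first blank line.
    _headers, sep, payload = rest.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no blank line after the part headers")
    # The payload ends just before CRLF + boundary (the footer appends "--").
    end = payload.rfind(b"\r\n" + boundary)
    if end == -1:
        raise ValueError("closing boundary not found")
    return payload[:end]


# Hypothetical request body containing a single PNG part.
boundary = b"------WebKitFormBoundaryL1NUALe5NDrNt9S0"
sample_body = (
    boundary + b"\r\n"
    + b'Content-Disposition: form-data; name="userfile"; filename="image.png"\r\n'
    + b"Content-Type: image/png\r\n\r\n"
    + PNG_SIGNATURE + b"pixels"
    + b"\r\n" + boundary + b"--\r\n"
)
payload = strip_multipart_envelope(sample_body)
```

Because the payload is never decoded, the leading 0x89 byte survives, whereas ascii_round_trip(b"\x89PNG") yields b"?PNG".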

RestBuilder Plugin. How can I upload a file without creating a file?

Currently, I can upload existing files with Grails's RestBuilder.
However, I want to upload file content without creating a file on disk.
I want to create the binary data (a text file) in the program and send it directly.
Is that possible?
RestBuilder rest = new RestBuilder()
RestResponse resp = rest.post(url){
contentType("multipart/form-data")
setProperty("dataFile",[filePath]) // <- this works
setProperty("dataFile",[ byte[] or inputStream() or String ? ]) // <- is this possible?
}
I'm sure you figured this out already, but you can just use a String reference or a byte[] just as you can use File instances for the multipart request using RestBuilder. It should 'just work' e.g.
RestBuilder rest = new RestBuilder()
RestResponse response = rest.post(url) {
contentType 'multipart/form-data'
stringPart = 'hello' // String
bytePart = '68656c6c6f'.decodeHex() // byte[] (hex for 'hello')
filePart = new File('/path/to/file.jpg') // File
}

How to attach a created file to mail mvc

As each user runs through my application, I hold their data and dump it into a report as follows. At the end the report is rendered as a PDF document, which is then downloaded automatically on the user's side (client side). I now want to attach this document to an email and send it to the user. This is where I have trouble with the attachment.
Code as follows:
ReportDocument rd = new ReportDocument();
rd.Load(Path.Combine(Server.MapPath("~/Reports/PP_RentalAgreement.rpt")));
rd.SetParameterValue("rent_agree_no", _1);
rd.SetParameterValue("r_initial", _2);
rd.SetParameterValue("r_f_name", _3);
rd.SetParameterValue("r_l_name", _4);
rd.SetParameterValue("r_id_no", _5);
rd.SetParameterValue("r_lic_no", _6);
rd.SetParameterValue("r_tel", _7);
rd.SetParameterValue("r_cell", _8);
rd.SetParameterValue("r_fax", _9);
Response.Buffer = false;
Response.ClearContent();
Response.ClearHeaders();
Stream st = rd.ExportToStream(CrystalDecisions.Shared.ExportFormatType.PortableDocFormat);
st.Seek(0, SeekOrigin.Begin);
if (ModelState.IsValid)
{
var m_message = new MailMessage();
m_message.To.Add(new MailAddress("JoeSoap@TextMail.com"));
m_message.Subject = "Pink Panther - Invoice";
m_message.Attachments.Add(new Attachment(st, "application/pdf", "Invoice.pdf"));
using (var smtp = new SmtpClient())
{
await smtp.SendMailAsync(m_message);
return RedirectToAction("Index");
}
}
I am getting an error on this line: m_message.Attachments.Add(new Attachment(st, "application/pdf", "Invoice.pdf")); The error says "The specified content type is invalid."
Someone suggested that I should specify a path, but I am not actually saving this file anywhere.
How can I attach the file and send it to the recipient?
The three-parameter overload of the System.Net.Mail.Attachment constructor that takes a stream has this signature:
public Attachment(System.IO.Stream contentStream, string name, string mediaType)
Hence, you are passing the name and the content type in reversed order, which causes the invalid content type error in this code:
m_message.Attachments.Add(new Attachment(st, "application/pdf", "Invoice.pdf"));
The correct way is to pass the file name as the second argument:
m_message.Attachments.Add(new Attachment(st, "Invoice.pdf", "application/pdf"));
Or use MediaTypeNames (from System.Net.Mime) for the content type:
m_message.Attachments.Add(new Attachment(st, "Invoice.pdf", MediaTypeNames.Application.Pdf));
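For comparison only, here is a rough sketch of the same idea (attaching an in-memory PDF without saving it to disk) using Python's standard email library, where the file name and media type are keyword arguments and so cannot be swapped by position. This is not the .NET API; build_invoice_mail, the body text, and the recipient address are hypothetical.

```python
from email.message import EmailMessage


def build_invoice_mail(pdf_bytes: bytes, recipient: str) -> EmailMessage:
    """Build a message with the PDF attached straight from memory."""
    msg = EmailMessage()
    msg["To"] = recipient
    msg["Subject"] = "Pink Panther - Invoice"
    msg.set_content("Your invoice is attached.")  # hypothetical body text
    # Media type and file name are named, so their order cannot be mixed up.
    msg.add_attachment(
        pdf_bytes,
        maintype="application",
        subtype="pdf",
        filename="Invoice.pdf",
    )
    return msg


# Hypothetical usage with placeholder PDF bytes and recipient.
mail = build_invoice_mail(b"%PDF-1.4 placeholder", "user@example.com")
attachment = next(mail.iter_attachments())
```

The attachment is built from bytes already in memory, so no path is needed, which matches the situation in the question.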

Uri is not supported when saving pdf in server folder with nreco pdf generator

I have the following code:
var htmlToPdf = new NReco.PdfGenerator.HtmlToPdfConverter();
htmlToPdf.PdfToolPath = "~/files/";
htmlToPdf.GeneratePdf(template);
Which throws the following error:
Uri is not supported when saving pdf in server folder with nreco pdf generator.
You need to set a regular file system path, e.g. "C:\temp\myfolder\". Alternatively, use a . instead of ~, and backslashes:
htmlToPdf.PdfToolPath = ".\\files\\";
If NReco can give you a byte array or a stream, you should prefer that over a file and return it directly.
UPDATE:
After taking a look at the NReco documentation, all you need to do is the following:
var htmlToPdf = new NReco.PdfGenerator.HtmlToPdfConverter();
htmlToPdf.PdfToolPath = "<CORRECT_PATH_FOR_TOOL>";
var output = htmlToPdf.GeneratePdf(template);
System.IO.File.WriteAllBytes("<OUTPUT_PATH>", output);
This should create your pdf in the OUTPUT_PATH.
@OlaFW, thanks for your effort.
I got my answer.
var pdfBytes = htmlToPdf.GeneratePdf(template);
string filePath = "/files/Myfile.pdf";
string Url = System.Web.Hosting.HostingEnvironment.MapPath(filePath);
System.IO.File.WriteAllBytes(Url, pdfBytes);

Reading XLS locally in asp (mvc)

Thanks for reading. My question is the following: I'm trying to get data from an XLS file, but it has to be done locally, without uploading the file. I have done something similar with TXT files and it works perfectly:
Function Send(ByVal file As HttpPostedFileBase) As ActionResult
Dim line As String
Dim textreader As System.IO.StreamReader = New StreamReader(file.InputStream)
While Not textreader.EndOfStream
line = textreader.ReadLine()
ViewBag.line = line
End While
Return View("Index")
End Function
But I can't do the same with the Excel file, first of all because I can't use the StreamReader. So with the following code I don't know how to specify the location of my XLS file:
Dim oApp As Excel.Application = New Excel.Application
Dim oWB As Excel.Workbook
Dim oSheet As Excel.Worksheet
oWB = oApp.Workbooks.Open(file.InputStream) ' <-- this is where I get an (obvious) error
Does anybody know how to open the file locally? Thanks for reading :)
There are several ways of doing it. One of them is to access the file "database style":
...
...
string filePath = string.Format("C:\\TEST\\{0}.xlsx", Guid.NewGuid().ToString());
var fileStream = File.Create(filePath);
file.InputStream.CopyTo(fileStream);
fileStream.Close();
string cn = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + filePath + ";Extended Properties=\"Excel 12.0;HDR=YES;\"";
string query = "SELECT * FROM SHEET_NAME";
conn= new OleDbConnection(cn);
conn.Open();
OleDbCommand woOleCommand = new OleDbCommand(query, conn);
DbDataReader result = woOleCommand.ExecuteReader();
// Read the DataReader...
...
So basically you query sheets like tables. If your sheets are indeed laid out as tables, this code might be what you're looking for.
On the other hand, if you still need to use automation, try something like this instead:
...
object missing = System.Reflection.Missing.Value;
wBook = (Excel._Workbook)xl.Workbooks.Open(filePath, false, false, missing, missing, missing, missing, missing, missing, missing, missing, missing, missing, missing, missing);
...
Reading InputStream
using (var fileStream = File.Create(filePath)) {
file.InputStream.CopyTo(fileStream);
}
// The stream is now saved to a file (filePath), so you can open it by path.
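The same "copy the uploaded stream to a file, then open it by path" pattern, sketched in Python for illustration. It assumes any binary file-like object as input; save_stream_to_temp_file and the sample bytes are hypothetical, and the caller is responsible for deleting the temporary file afterwards.

```python
import io
import os
import shutil
import tempfile


def save_stream_to_temp_file(stream, suffix: str = ".xlsx") -> str:
    """Copy a binary stream to a new temporary file and return its path."""
    # delete=False keeps the file on disk after the with-block closes it,
    # so another library can reopen it by path.
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        shutil.copyfileobj(stream, tmp)
        return tmp.name


# Hypothetical usage: an in-memory stream stands in for the uploaded file.
upload = io.BytesIO(b"spreadsheet bytes")
saved_path = save_stream_to_temp_file(upload)
with open(saved_path, "rb") as saved:
    saved_bytes = saved.read()
os.remove(saved_path)  # clean up once the file is no longer needed
```

Using a generated temporary name plays the same role as the Guid-based file name in the answer above: concurrent uploads do not overwrite each other.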

Resources