Tika Error: 500 on processing pdf

What can possibly go wrong such that I get an Error 500 when I run
curl -T file.pdf http://localhost:9998/tika > file.txt
against a Tika server, where file.pdf is in the current directory and Tika is up on localhost:9998?
I have tried many different .pdf files to extract their text into .txt form, but for most of them I get a 500 error.
Thanks!
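A few diagnostic steps that usually narrow this down (a sketch; it assumes the stock tika-server endpoints, and note that the 500 response body, which often carries the Java stack trace, is being redirected into file.txt above):
# Confirm the server is reachable; GET /tika returns a short greeting
curl http://localhost:9998/tika
# Ask Tika what it detects the file as; a misdetected type can explain a 500
curl -T file.pdf http://localhost:9998/detect/stream
# Re-run the extraction with response headers shown, then inspect the body
curl -i -T file.pdf http://localhost:9998/tika
The tika-server console log is the most direct place to see the underlying parser exception.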

Related

Wireshark - Finding HTTP and application layer payload from a given pcap file

I am trying to get the packets which contain application-layer payloads like HTTP from a given pcap file.
I have tried using http in the Wireshark display filter. Is that the right way to get the HTTP payload from the pcap file? Please help me with this.
Saving HTTP packets
To filter for http traffic in tshark, you would use a display filter (-Y). This is sample output showing what that would look like:
$ tshark -r input.pcap -Y http
25 1.051399 10.8.143.109 → server-13-35-127-122.sfo5.r.cloudfront.net HTTP 630 GET /online HTTP/1.1 0c:8d:db:90:cf:38 ← 6c:96:cf:d8:7f:e7
34 1.078368 server-13-35-127-122.sfo5.r.cloudfront.net → 10.8.143.109 HTTP 404 HTTP/1.1 304 Not Modified 6c:96:cf:d8:7f:e7 ← 0c:8d:db:90:cf:38
This prints the matching packets as text (the default). To write them to a new file instead, use the -w flag:
$ tshark -r input.pcap -Y http -w modified.pcap
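If you only need a quick summary of the HTTP requests rather than the full packets, tshark can also print individual fields (a sketch; http.host and http.request.uri are standard Wireshark field names):
$ tshark -r input.pcap -Y http.request -T fields -e http.host -e http.request.uri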
Export files
You can also export certain types of objects, such as files transferred over HTTP, from tshark:
$ output_folder="files"
$ tshark -r input.pcap --export-objects http,$output_folder
$ ls $output_folder
example.png example.html ...
This article will walk you through generating a packet capture from which you can then export HTTP files.

How to create tar file with 7zip

I'm trying to create a tar file on Windows using 7-Zip.
Most of the documents I found said to do something like this:
7z a -ttar -so dwt.tar dwt/
But when I tried to run it I got this error:
Command Line Error:
I won't write compressed data to a terminal
I'm currently using 7-Zip [64] 16.04
Any idea?
On Linux:
tar cf - <source folder> | 7z a -si <Destination archive>.tar.7z
On Windows:
7za.exe a -ttar -so archive.tar source_files | 7za.exe a -si archive.tgz
I managed to do it simply, with 7-Zip installed:
Right-click the folder you want to compress
Choose 7-Zip → Add to archive
On the new screen, under Archive format, you can choose 7z/tar/wim/zip
Choose tar, and there you go :)
From the manpage:
-so Write data to stdout (e.g. 7z x -so directory.tar.7z | tar xf -)
It does what you told it to. 7z can guess the archive format from the file extension, so it's enough to use
7z a archive.tar input/
To further compress as gzip you can use a pipe and a combination of stdin and stdout flags like in Tu.Ma.'s answer.
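Putting those flags together, a sketch of the tar-then-gzip pipeline (assuming 7z is on the PATH; -so streams the tar to stdout, -si reads it from stdin, and -tgzip forces the gzip format):
7z a -ttar -so archive.tar input/ | 7z a -si -tgzip archive.tar.gz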

Batch file to check website status from a text file and restart service based on string

I need some batch guru to assist me in getting this resolved. We monitor website responses with wget and capture the output in a couple of files. When a site is down we get the following response in test1.txt:
Connecting to 10.x.x.x:443... failed: Bad file descriptor.
whereas when the site is running, the response in test2.txt is
Connecting to 10.x.x.x:80... connected.
HTTP request sent, awaiting response...
HTTP/1.1 200 OK
I do not see a common pattern in the two outputs above on which I can base the logic. I need some assistance in determining, from these outputs:
if the website is running, do nothing
if the website is down, start service.
Note, we need to do this only on the basis of the output from these files.
I tried the provided solution, but it didn't work:
TestScript>wget-1.14.exe --spider --no-check-certificate https://somesite | find "Bad file descriptor" 1>nul
Spider mode enabled. Check if remote file exists.
--2015-10-08 18:15:21-- https://somesite
Connecting to 10.x.x.x:443... failed: Bad file descriptor.
TestScript>if errorlevel 1 (echo site is up ) else (echo site is down )
site is up
Pipe the output of wget to find to look for Bad file descriptor and then use errorlevel:
wget --spider http://someurl 2>&1 | find "Bad file descriptor" >nul
if errorlevel 1 (
echo site is up
) else (
echo site is down
)
2>&1 redirects the messages to standard output so that they can be piped
--spider makes wget only check the url without saving the result
Alternatively use the file you already have:
if exist test1.txt find "Bad file descriptor" test1.txt >nul
if not errorlevel 1 (echo start the service)
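Putting both pieces together, a minimal end-to-end sketch (the URL and the service name MyService are placeholders; wget is assumed to be on the PATH):
@echo off
rem Probe the site; 2>&1 makes wget's messages visible to find
wget --spider --no-check-certificate https://somesite 2>&1 | find "Bad file descriptor" >nul
if errorlevel 1 (
    echo site is up
) else (
    echo site is down, starting the service
    net start "MyService"
)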

This does not look like a tar archive

[root@c0002242 lfeng]# tar -zxvf /opt/test/ALLscripts.tar.gz -C /opt/test1
tar: This does not look like a tar archive
tar: Skipping to next header
tar: Exiting with failure status due to previous errors
Could you please help me on this ?
Run the command
$ file ALLscripts.tar.gz
Compare the output; if it's gzip compressed data (as shown below), then gunzip or tar -z will extract it:
ALLscripts.tar.gz: gzip compressed data, from Unix
I was facing this error because my file had not finished downloading yet and I was trying to extract it :).
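Two quick checks before extracting, as a sketch (gzip -t tests the stream without extracting and prints nothing on success; a failed download typically shows up in file's output as HTML or ASCII text instead of gzip data):
$ file ALLscripts.tar.gz
$ gzip -t ALLscripts.tar.gz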

wkhtmltopdf attempting to load from http rather than file

Here's an odd little problem that's led me to post my first question on SO. I am using wkhtmltopdf to convert an HTML document to a PDF as part of a Rails app. To do so, I am rendering the Rails web page to a static HTML file in a temp directory, copying a static header, footer and images to the same temp directory, then executing wkhtmltopdf using "system".
This works perfectly in Development and Test environments. In my Staging env, it does not. I suspected permissions at first, but the first couple of parts of that process (creating the HTML static files and copying them to the directory) are working. I can run wkhtmltopdf from the command line in that temp directory and get the expected outcome. Finally, I ran wkhtmltopdf via both "system" and backticks through the Rails console in staging environment, and here's what I get as output:
> `wkhtmltopdf --footer-html tmp/invoices/footer.html --header-html tmp/invoices/header.html -s Letter -L 0in -R 0in -T 0.5in -B 1in tmp/invoices/test.html tmp/invoices/this.pdf`
Loading pages (1/6)
QPainter::begin(): Returned false ] 10%
Error: Unable to write to destination
Error: Failed loading page http://tmp/invoices/test.html (sometimes it will work just to ignore this error with --load-error-handling ignore) => ""
Notice that last bit. I'm pointing to local files, but it's looking for them via http. OK, I think, maybe I need to be explicit and feed it the file:// protocol so it doesn't look for http. So I try this:
> system("wkhtmltopdf --footer-html file://Library/Server/Web/Data/Sites/intranet-staging/current/tmp/invoices/footer.html --header-html file://Library/Server/Web/Data/Sites/intranet-staging/current/tmp/invoices/header.html -s Letter -L 0in -R 0in -T 0.5in -B 1in file://Library/Server/Web/Data/Sites/intranet-staging/current/tmp/invoices/test.html file://Library/Server/Web/Data/Sites/intranet-staging/current/tmp/invoices/this.pdf")
Loading pages (1/6)
Error: Failed loading page file://library/Server/Web/Data/Sites/intranet-staging/current/tmp/invoices/test.html (sometimes it will work just to ignore this error with --load-error-handling ignore)
=> false
Notice that this one fails with a lowercase "l" on Library. What the heck? (And no, it doesn't get any better with the recommendation to ignore the error with that switch.)
Any ideas? Is there a Rails or Ruby setting that would cause system commands to get rewritten? Is there an option I can add to wkhtmltopdf to make sure it loads from local file? I'm quite baffled. Thanks!
I have had success when using the absolute file path (notice the extra slash after the file://)
wkhtmltopdf --footer-html file:///Library/Server/Web/Data/Sites/intranet-staging/current/tmp/invoices/footer.html --header-html file:///Library/Server/Web/Data/Sites/intranet-staging/current/tmp/invoices/header.html -s Letter -L 0in -R 0in -T 0.5in -B 1in file:///Library/Server/Web/Data/Sites/intranet-staging/current/tmp/invoices/test.html file:///Library/Server/Web/Data/Sites/intranet-staging/current/tmp/invoices/this.pdf
The same scheme works on Windows:
Unix path
file:///absolute/path/to/file
Windows path
file:///C:/absolute/path/to/file
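A small Ruby sketch of the same fix applied to the question's command (the paths are the hypothetical ones from the question; File.expand_path returns an absolute path, so the URL always comes out as file:///...):
header = File.expand_path("tmp/invoices/header.html")
footer = File.expand_path("tmp/invoices/footer.html")
source = File.expand_path("tmp/invoices/test.html")
target = File.expand_path("tmp/invoices/this.pdf")
# expand_path yields a leading slash, so "file://" + path gives file:///...
system("wkhtmltopdf",
       "--header-html", "file://#{header}",
       "--footer-html", "file://#{footer}",
       "-s", "Letter", "-L", "0in", "-R", "0in", "-T", "0.5in", "-B", "1in",
       "file://#{source}", target)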
In the latest wicked_pdf 0.11 I found one bug.
Example:
C:\Ruby193\lib\ruby\gems\1.9.1\gems\wicked_pdf-0.11.0\lib\wicked_pdf.rb
At line 198 I changed
options[hf][:html][:url] = "file://#{tf.path}"
to
options[hf][:html][:url] = "file:///#{tf.path}"
(i.e. // to ///).
After that change wicked_pdf worked again.
Take a look at the wicked_pdf gem.
You can add a PDF mime type and then whatever page you want pdf'd, just tack on a .pdf to the URL.
I am using this in prod and it works quite well.
No need to call wkhtmltopdf directly.
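A sketch of that pattern (the controller and naming are hypothetical; render pdf: is the hook wicked_pdf provides, and Mime::Type.register is the standard Rails call for adding a mime type):
# config/initializers/mime_types.rb (only needed if :pdf is not already registered)
# Mime::Type.register "application/pdf", :pdf
class InvoicesController < ApplicationController
  def show
    respond_to do |format|
      format.html
      # Requesting /invoices/42.pdf renders this action through wkhtmltopdf
      format.pdf do
        render pdf: "invoice_#{params[:id]}"
      end
    end
  end
end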
