Turning on language detection using Tika Server? - apache-tika

I'm trying to do language detection using Tika Server. Is there a way of requesting this?

I haven't dug deeply into this topic, but I needed this feature in a Docker setup. The official documentation isn't much help, but language detection appears to be available via the /meta endpoint, at least in 1.14. Example usage:
curl -T file.txt http://127.0.0.1:9998/meta --header "Accept: application/json"
Response will be similar to:
{"language":"en", "Content-Encoding":"ISO-8859-1","Content-Type":"text/plain; charset\u003dISO-8859-1","X-Parsed-By":["org.apache.tika.parser.DefaultParser","org.apache.tika.parser.txt.TXTParser"]}
For my particular need, I've used this Docker image.
Hope that helps!
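To pick the language out of a /meta response in a script, a minimal Python sketch follows. It assumes the JSON body has the shape shown above, with a top-level "language" key; the function name extract_language is illustrative and not part of Tika.

```python
import json


def extract_language(meta_json: str):
    """Return the 'language' field of a Tika /meta JSON body, or None if absent."""
    metadata = json.loads(meta_json)
    return metadata.get("language")


# Sample body shortened from the response shown above.
sample = (
    '{"language":"en","Content-Encoding":"ISO-8859-1",'
    '"Content-Type":"text/plain; charset\\u003dISO-8859-1"}'
)
print(extract_language(sample))
```

Returning None for a missing key lets the caller decide how to handle servers or versions that do not report a language.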

Start the Tika CLI server as normal, passing the --language flag.
For example, in one window run:
$ java -jar tika-app-1.4.jar --language --server 1234
Then in another do:
$ nc localhost 1234 < test.txt
en
$ nc localhost 1234 < spanish.txt
es
$ nc localhost 1234 < french.txt
fr
Pass in the text, and you'll get back the detected language.
For the full list of modes that the Tika CLI supports, run it with --help.
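The nc calls above can be approximated from Python with a plain TCP socket. This is a sketch under the assumption that the server reads input until the client closes its sending side and then writes the language code back, as the session above suggests. The names exchange and detect_language are hypothetical helpers, and port 1234 is taken from the example.

```python
import socket


def exchange(sock: socket.socket, payload: bytes) -> str:
    """Send payload, signal end of input, and return the server's reply as text."""
    sock.sendall(payload)
    sock.shutdown(socket.SHUT_WR)  # tell the server no more data is coming
    chunks = []
    while True:
        chunk = sock.recv(4096)
        if not chunk:  # the server closed its side
            break
        chunks.append(chunk)
    return b"".join(chunks).decode("utf-8").strip()


def detect_language(path: str, host: str = "localhost", port: int = 1234) -> str:
    """Send the file at path to a Tika server started with --language --server."""
    with open(path, "rb") as handle:
        payload = handle.read()
    with socket.create_connection((host, port), timeout=10) as sock:
        return exchange(sock, payload)


# Usage (requires the server from the example to be running):
#   detect_language("test.txt")
```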

import org.apache.tika.language.LanguageIdentifier;
public class Test
{
    /**
     * Tika language detection. Take a look at the org.apache.tika.language.LanguageIdentifier class API.
     * @param args Command line arguments.
     */
    public static void main(String[] args) {
        // The sample text must be long enough for the detector to analyze.
        String sTextFr = "Texte en français. Il doit être assez long pour permettre l'analyse.";
        String sTextEn = "This is an english text.";
        LanguageIdentifier lin = new LanguageIdentifier(sTextFr);
        System.out.println(String.format("Language (french sentence): %s", lin.getLanguage()));
        lin = new LanguageIdentifier(sTextEn);
        System.out.println(String.format("Language (english sentence): %s", lin.getLanguage()));
    }
}

Related

How to use simple hello world example on opa server

I have defined a file with name - play.rego
package play
default hello = false
hello {
m := input.message
m == "world"
}
I also have file called -input.json
{ "message": "world"}
I now want to use the policy to evaluate on input data using opa server -
opa run --server
I also then registered the policy using below command -
curl -X PUT http://localhost:8181/v1/policies/play --data-binary @play.rego
and then I run below command for evaluating policy on the query -
curl -X POST http://localhost:8181/v1/policies/v1/data/play --data-binary '{"message": "world"}'
But the server always responds with nothing.
How can I fix this?
The URL of the second request is wrong (should not contain v1/policies), and the v1 API requires you to wrap the input document inside an input attribute. Try:
curl -X POST http://localhost:8181/v1/data/play --data-binary '{"input":{"message": "world"}}'
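The same corrected request can be built in Python. This is a minimal sketch using only the standard library; the helper names build_data_request and evaluate are illustrative, and the base URL assumes the default port used in the question.

```python
import json
import urllib.request


def build_data_request(package_path: str, input_doc: dict,
                       base_url: str = "http://localhost:8181"):
    """Build a POST to /v1/data/<package> with the document wrapped in 'input'."""
    url = f"{base_url}/v1/data/{package_path}"
    body = json.dumps({"input": input_doc}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def evaluate(package_path: str, input_doc: dict) -> dict:
    """Send the request to a running OPA server and return the decoded JSON reply."""
    with urllib.request.urlopen(build_data_request(package_path, input_doc)) as resp:
        return json.load(resp)


# Usage (requires `opa run --server` with the play policy loaded):
#   evaluate("play", {"message": "world"})
```

Keeping the request construction separate from the network call makes it easy to check the URL and the input wrapper without a server.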

Formatting nmap output

I have an nmap output looking like this
Nmap scan report for 10.90.108.82
Host is up (0.16s latency).
PORT STATE SERVICE
80/tcp open http
|_http-title: Did not follow redirect to https://10.90.108.82/view/login.html
I would like the output to be like
10.90.108.82 http-title: Did not follow redirect to https://10.90.108.82/view/login.html
How can it be done using grep or any other means?
You can use the following nmap.sh script like this:
<nmap_command> | ./nmap.sh
nmap.sh:
#!/usr/bin/env sh
var="$(cat /dev/stdin)"
file=$(mktemp)
echo "$var" > "$file"
ip_address=$(head -1 "$file" | rev | cut -d ' ' -f1 | rev)
last_line=$(tail -1 "$file" | sed -E "s,^\|_, ,")
printf "%s%s\n" "$ip_address" "$last_line"
rm "$file"
If you do not mind using a programming language, check out this code snippet with Python:
import nmapthon as nm
scanner = nm.NmapScanner('10.90.108.82', ports=[80], arguments='-sS -sV --script http-title')
scanner.run()
if '10.90.108.82' in scanner.scanned_hosts():  # Check if host responded
    serv = scanner.service('10.90.108.82', 'tcp', 80)
    if serv is not None:  # Check if service was identified
        print(serv['http-title'])
Do not forget to execute pip3 install nmapthon.
I am the author of the library, feel free to have a look here
It looks like you want the output of an nmap scan to be edited and displayed in your own format. Try writing a bash script for this and running it.
Here's a link to a video where you might find an answer to your problem:
https://youtu.be/lZAoFs75_cs
Watch the video from time stamp 1:27:17, where the creator briefly describes how to shorten an output and display it as you wish.
If required, I could write a bash script that prints a shortened version of the output of an nmap scan.
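For the exact output format requested in the question, a short Python sketch can parse the text report directly. It assumes the report layout shown in the question: a "Nmap scan report for <host>" line followed by script output lines that start with "|". The function name summarize is illustrative.

```python
import re

# Matches the header line that introduces each scanned host.
REPORT_RE = re.compile(r"^Nmap scan report for (\S+)")
# Matches script output lines such as "|_http-title: ..." or "| http-title: ...".
SCRIPT_RE = re.compile(r"^\|[_ ]?\s*(.+)$")


def summarize(report: str):
    """Return one 'host script-output' string per script line in the report."""
    results = []
    current_host = None
    for line in report.splitlines():
        header = REPORT_RE.match(line)
        if header:
            current_host = header.group(1)
            continue
        script = SCRIPT_RE.match(line)
        if script and current_host:
            results.append(f"{current_host} {script.group(1)}")
    return results


# Usage: <nmap_command> | python3 this_script.py, after adding
#   import sys; print("\n".join(summarize(sys.stdin.read())))
```

Unlike the head/tail approach, this handles reports that cover several hosts, since each script line is paired with the most recent host header.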

monitoring the number of RDP users using nagios core?

I'm using Nagios Core 4.3.4. Is there a way to monitor the number of users connected over RDP to a Windows server, similar to the NRPE check_users check? Please let me know if you have a solution.
You would have to write your own check for this.
In your check you could call a PowerShell script on the server (though it depends on your Windows version):
ipmo RemoteDesktop # 1. import the remotedesktop module
$(Get-RDUserSession).count # 2. print the count of the session
But there is another approach mentioned on the monitoring-portal.org site. It is in German, so I'll try to translate:
1.) read the Windows performance counters with nsclient:
"c:\program files\nsclient\nsclient++.exe" -noboot CheckSystem listpdh >counters_list.txt
2.) define the command (where -s $USER7$ is the passphrase used to establish the connection):
define command{
    command_name check_nt_Counter_User
    command_line $USER1$/check_nt -H $HOSTADDRESS$ -s $USER7$ -p 12489 -v COUNTER -l $ARG1$ -w $ARG2$ -c $ARG3$ -d SHOWALL
}
3.) define the service
define service{
    service_description RDP-Sessions
    host_name TerminalSrv
    use sometemplate
    check_command check_nt_Counter_User!"\\Terminalservices\\active sessions","RDP-User active","users"!18!20
    notes get count of active sessions
    process_perf_data 1
    notifications_enabled 0
}
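If you go the "write your own check" route instead, the threshold logic of such a plugin can be sketched as follows. This assumes the session count has already been obtained (for example from the PowerShell command above) and uses the standard Nagios plugin exit codes 0, 1, 2 for OK, WARNING, CRITICAL. The function name evaluate_sessions and the "at or above threshold" comparison are assumptions, not taken from check_nt.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def evaluate_sessions(count: int, warn: int, crit: int):
    """Map a session count to a Nagios exit code and a status line with perfdata."""
    if count >= crit:
        status, label = CRITICAL, "CRITICAL"
    elif count >= warn:
        status, label = WARNING, "WARNING"
    else:
        status, label = OK, "OK"
    # Perfdata after the pipe follows the label=value;warn;crit convention.
    message = f"RDP {label} - {count} active sessions | sessions={count};{warn};{crit}"
    return status, message


# Thresholds 18 and 20 mirror the service definition above.
status, message = evaluate_sessions(5, 18, 20)
print(message)
```

A real plugin would print the message and exit with the returned status so that Nagios can pick up both.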

Curl a URL list from a file and make it faster with parallel

Right now I'm using the following code:
while read num;
do M=$(curl "myurl/$num")
echo "$M"
done < s.txt
where s.txt contains a list (1 per line) of a part of the url.
Is it correct to assume that curl is running sequentially?
Or is it running in thread/jobs/multiple conn at a time?
I've found this online:
parallel -k curl -s "http://example.com/locations/city?limit=100\&offset={}" ::: $(seq 100 100 30000) > out.txt
The problem is that my sequence comes from a file or a variable (one element per line), and I can't adapt the command to my needs.
I haven't fully understood how to pass the list to parallel.
Should I save all the curl commands in a list and run it with parallel -a?
Regards,
parallel -j100 -k curl myurl/{} < s.txt
Consider spending an hour walking through man parallel_tutorial. Your command line will love you for it.
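The while loop in the question runs one curl at a time, each waiting for the previous one to finish. If GNU parallel is not available, a comparable approach can be sketched in Python with a thread pool. The names fetch and fetch_all are illustrative, and the "myurl/" prefix in the usage comment is the placeholder from the question.

```python
from concurrent.futures import ThreadPoolExecutor
import urllib.request


def fetch(url: str) -> str:
    """Download one URL and return its body as text."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def fetch_all(items, fetch_one=fetch, max_workers=100):
    """Run fetch_one over items concurrently, returning results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of the inputs, like parallel -k.
        return list(pool.map(fetch_one, items))


# Usage, with one URL fragment per line in s.txt:
#   with open("s.txt") as handle:
#       urls = [f"myurl/{line.strip()}" for line in handle if line.strip()]
#   for body in fetch_all(urls):
#       print(body)
```

The max_workers default of 100 mirrors the -j100 in the command above; a smaller value is kinder to the remote server.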

Cron-like application of groovy script with console plugin environment?

We have an application that we would like to run a script on just like we do in the console window with access to the applications libraries and context, but we need to run it periodically like a cron job.
While the permanent answer is obviously a Quartz job, we need to do this before we are able to patch the application.
Is there something available that gives us the same environment as the console-plugin but can be run via command-line or without a UI?
You can run a console script the same way the web interface does, using curl like this:
curl -F 'code=
class A {
def name
}
def foo = new A(name: "bar")
println foo.name
' localhost:8080/console/execute
You'll get the same response that the console would print.
With regard to @mwaisgold's solution above, I made a couple of quick additions that helped. I added a little more to the script to handle authentication. The -F flag for curl also caused an ambiguous method overloading error with GroovyShell's evaluate method, so I used -d instead:
#!/bin/bash
curl -i -H "Content-type: application/x-www-form-urlencoded" -c cookies.txt -X POST localhost:8080/myapp/j_spring_security_check -d "j_username=admin&j_password=admin"
curl -i -b cookies.txt -d 'code=
int iterations = 0
while (iterations < 10) {
log.error "********** Console Cron Test ${iterations++} ***********"
}
log.error "********** Console Cron Test Complete ***********"
' localhost:8080/myapp/console/execute
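The two curl calls can also be sketched in Python, which is convenient if the periodic job needs to inspect the response. This assumes the same endpoints and form field names as the script above (j_spring_security_check, console/execute, code). The helper names form_body and run_console_script are hypothetical.

```python
import http.cookiejar
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8080/myapp"  # same base URL as the script above


def form_body(fields: dict) -> bytes:
    """Encode fields as application/x-www-form-urlencoded, as curl -d sends them."""
    return urllib.parse.urlencode(fields).encode("ascii")


def run_console_script(script: str, username: str, password: str,
                       base_url: str = BASE_URL) -> str:
    """Log in, keep the session cookie, then post the script to the console."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    login = form_body({"j_username": username, "j_password": password})
    opener.open(f"{base_url}/j_spring_security_check", data=login).close()
    with opener.open(f"{base_url}/console/execute",
                     data=form_body({"code": script})) as resp:
        return resp.read().decode("utf-8", errors="replace")


# Usage (requires the application to be running):
#   print(run_console_script('log.error "Console Cron Test"', "admin", "admin"))
```

Unlike an inline curl -d string, urlencode percent-encodes characters such as & and newlines, so the Groovy source arrives intact as a single form field.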

Resources