Does anyone know how to get the tdb2.tdbdump command to actually do anything? - jena

I'm trying to dump a jena database as triples.
There seems to be a command that sounds perfectly suited to the task: tdb2.tdbdump
jena#debian-clean:~$ ./apache-jena-3.8.0/bin/tdb2.tdbdump --help
tdbdump : Write a dataset to stdout (defaults to N-Quads)
Output control
--output=FMT Output in the given format, streaming if possible.
--formatted=FMT Output, using pretty printing (consumes memory)
--stream=FMT Output, using a streaming format
--compress Compress the output with gzip
Location
--loc=DIR Location (a directory)
--tdb= Assembler description file
Symbol definition
--set Set a configuration symbol to a value
--mem=FILE Execute on an in-memory TDB database (for testing)
--desc= Assembler description file
General
-v --verbose Verbose
-q --quiet Run with minimal output
--debug Output information for debugging
--help
--version Version information
--strict Operate in strict SPARQL mode (no extensions of any kind)
jena#debian-clean:~$
But I've not succeeded in getting it to write anything to STDOUT.
When I use the --loc parameter to point to a DB, a new copy of that DB appears in the subfolder Data-0001, but nothing appears on STDOUT.
When I try the --tdb parameter, and point it to a ttl file, I get a stack trace complaining about its formatting.
Google has turned up the Jena documentation telling me the command exists, and that's it. So any help appreciated.

"--loc" should be the same as used to create the database.
Suppose that's "DB2". For TDB2 (not TDB1) after the database is created, then "DB2/Data-0001" will already exist. Do not use this for --loc. Use "--loc DB2".
If it is a TDB1 database (the files are in the directory at "--loc", no "Datat-0001"), the use tdbdump. An empty database has no triples/quads in it so you would get no output.
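For example (a sketch; "DB2" and "DB1" stand in for whatever directory you originally loaded into):

# TDB2 database: point --loc at the top-level directory, not at Data-0001
tdb2.tdbdump --loc DB2 > dump.nq

# TDB1 database: same idea, but with the TDB1 tool
tdbdump --loc DB1 > dump.nq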
Fuseki currently (up to 3.16.0) has to be called with the same setup each time it is run, which is fragile regarding TDB1/TDB2. If you created the TDB2 database outside Fuseki and only use command-line args, you'll need "--tdb2" each time.
Fuseki in the next release (3.17.0) detects the type of an existing database.

Related

Apache Jena: riot does not produce output

I recently installed Apache Jena 3.17.0, and have been trying to use it to convert nquads files to ntriples.
As per the instructions here (https://jena.apache.org/documentation/tools/), I first set up my WSL (Ubuntu 20.04) environment
$ export JENA_HOME=apache-jena-3.17.0/
$ export PATH=$PATH:$JENA_HOME/bin
and then attempted to run riot to do the conversion (triail.nq is my N-Quads file).
$ riot --output=NTRIPLES -v triail.nq
When I ran this, I got no output to the terminal. I'm not sure what is going wrong here, since there is no error message. Does anyone know what could be causing this / what the solution could be?
Thanks in advance!
The command will read the quad (multiple-graph) data and output only the default graph. Presumably there is no default-graph data in triail.nq.
If "convert" means combining all the quads into a single graph, then remove the graph field on each line of the data file with a text editor (a rough command-line sketch follows below).
Otherwise, read the file into an RDF dataset, copy the named graphs into a single graph, and output that.
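A minimal sketch of stripping the graph field, assuming every term on a line is a single whitespace-free token (IRIs/blank nodes only, no literals containing spaces):

# s p o g .  ->  s p o .   (NF==5 is a quad; NF==4 is already a triple)
awk 'NF == 5 { print $1, $2, $3, "." } NF == 4 { print }' triail.nq > triail.nt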

How to get a program's std-out to fluentd (without docker)

Scenario:
You write a program in R or Python which needs to run on Linux or Windows, and you want to log (JSON-structured and unstructured) std-out and (mostly unstructured) std-error from this program to a Fluentd instance. Adding a new program or starting another instance should not require updating the Fluentd configuration, and the applications will not (yet) be running in a Docker environment.
Question:
How do you send "logs" from a bunch of programs to a Fluentd instance, without having to perform curl calls for every log entry that your application was originally writing to std-out?
When a UDP or TCP connection is necessary for the application to run, it becomes harder to debug, and any dependency of your program that writes to std-out would have to be parsed just to get its logging passed through.
Thoughts:
Alternatively, the question could be: how do you accept a 'connection' object that can point either to a file or to a TCP connection, so that switching between std-out and a TCP destination is a matter of changing a single value?
I like the 'tail' input plugin, which could be what I am looking for, but then:
the original log file never appears to stop growing (will the tail position value reset when the file is simply removed? I couldn't find this behaviour documented), and
it seems to require reconfiguring Fluentd for every new program that you start on that server (if it logs to another file); I would much prefer to keep that configuration on the program side...
I built an EFK stack with the Docker log driver set to fluentd, which does not seem to have an optimal, solid solution either, but without Docker I already get stuck setting up a basic configuration (not referring to fluent.conf here).
TL;DR
std-out -> fluentd: redirect the program output to a file when launching your program. On Linux, use logrotate; you will love it.
Windows: use fluent-bit.
App-side config: use single (or predictable) log locations, and the fluentd/fluent-bit 'in_tail' plugin.
logging general:
It's recommended to always write application output to a file; if std-out must be captured, redirect the program's output to a file at startup. For more flexibility in the Fluentd configuration, send std-out and std-err to separate files (just like Apache does):
My_program.exe Do some crazy stuff > my_out_file.txt 2> my_error_file.txt
This gives Fluentd the option to read from this/these file(s).
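To keep the redirected file from growing forever, a minimal logrotate sketch (the path and retention are assumptions; copytruncate matters because the program keeps its redirected file handle open):

/var/log/myapps/*.log {
    daily
    rotate 7
    compress
    copytruncate
    missingok
}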
Windows:
For Windows systems, use fluent-bit; it likely solves the issue of aggregating the Windows OS program logs. Support for Windows was implemented only recently.
fluent-bit supports:
the 'tail' plugin, which records the 'inode' value (a unique, rename-insensitive file pointer) and the offset (called 'pos' in the full-blown 'fluentd' application) in an SQLite3 database, and deals with un-processable data, which is assigned to a certain key ('log' by default)
It works on Windows machines, but note that it cannot buffer to disk, so make sure a lost connection, or another issue with the output, is re-established or fixed in time so that you will not run into OOM issues.
Appl. side config:
The tail plugin can monitor a folder, which makes it practical to keep the configuration on the side of your program: just make sure you write the logs of your different applications to a predictable directory. A minimal in_tail sketch is shown below.
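A minimal Fluentd in_tail source sketch (the paths, tag, and JSON parsing are assumptions for illustration; the pos_file directory must exist):

<source>
  @type tail
  path /var/log/myapps/*.log
  pos_file /var/log/fluentd/myapps.pos
  tag myapps.*
  <parse>
    @type json
  </parse>
</source>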
Fluent-bit setup/config:
For Linux, just use Fluentd (unless you need more than ~100,000 messages per second, which is where fluent-bit becomes your only choice).
For Windows, install fluent-bit and make it run as a daemon (an almost funny solution).
There are two execution methods:
Providing the configuration directly via the command line
Using a config file (example included in zip), and referring to it with the -c flag (a minimal sketch is shown after the help listing below)
Directly from commandline
Some example executions (without making use of the option to work with a configuration file) can be found here:
PS .\bin\fluent-bit.exe -i winlog -p "channels=Setup,Windows PowerShell" -p "db=./test.db" -o stdout -m '*'
-i declares the input method. Currently, only a few plugins have been implemented, see the man page below.
PS fluent-bit.exe --help
Available Options
-b --storage_path=PATH specify a storage buffering path
-c --config=FILE specify an optional configuration file
-f, --flush=SECONDS flush timeout in seconds (default: 5)
-F --filter=FILTER set a filter
-i, --input=INPUT set an input
-m, --match=MATCH set plugin match, same as '-p match=abc'
-o, --output=OUTPUT set an output
-p, --prop="A=B" set plugin configuration property
-R, --parser=FILE specify a parser configuration file
-e, --plugin=FILE load an external plugin (shared lib)
-l, --log_file=FILE write log info to a file
-t, --tag=TAG set plugin tag, same as '-p tag=abc'
-T, --sp-task=SQL define a stream processor task
-v, --verbose increase logging verbosity (default: info)
-s, --coro_stack_size Set coroutines stack size in bytes (default: 98302)
-q, --quiet quiet mode
-S, --sosreport support report for Enterprise customers
-V, --version show version number
-h, --help print this help
Inputs
tail Tail files
dummy Generate dummy data
statsd StatsD input plugin
winlog Windows Event Log
tcp TCP
forward Fluentd in-forward
random Random
Outputs
counter Records counter
datadog Send events to DataDog HTTP Event Collector
es Elasticsearch
file Generate log file
forward Forward (Fluentd protocol)
http HTTP Output
influxdb InfluxDB Time Series
null Throws away events
slack Send events to a Slack channel
splunk Send events to Splunk HTTP Event Collector
stackdriver Send events to Google Stackdriver Logging
stdout Prints events to STDOUT
tcp TCP Output
flowcounter FlowCounter
Filters
aws Add AWS Metadata
expect Validate expected keys and values
record_modifier modify record
rewrite_tag Rewrite records tags
throttle Throttle messages using sliding window algorithm
grep grep events by specified field values
kubernetes Filter to append Kubernetes metadata
parser Parse events
nest nest events by specified field values
modify modify records by applying rules
lua Lua Scripting Filter
stdout Filter events to STDOUT
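For the config-file method, a minimal sketch in fluent-bit's classic INI-style syntax (the paths, tag, host, and port are assumptions; it tails local logs and forwards them to a Fluentd instance):

[SERVICE]
    Flush        5
    Log_Level    info

[INPUT]
    Name         tail
    Path         C:\logs\myapps\*.log
    DB           C:\logs\fluent-bit.db
    Tag          myapps.*

[OUTPUT]
    Name         forward
    Match        myapps.*
    Host         192.168.1.10
    Port         24224

Run it with:
PS .\bin\fluent-bit.exe -c .\fluent-bit.conf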

If neo4j-shell is deprecated then how do I dump the contents of the database (for backup)

I've just been looking into how to back up the database and have found that neo4j-shell -c dump > my-db-dump.cql looks like a good solution: it exports everything as a Cypher script that recreates everything when run (a bit like mysqldump for MySQL).
However, according to the official documentation, neo4j-shell has been deprecated in favour of cypher-shell, and I can't find an equivalent dump function for cypher-shell. Is there one? If not, what should I do instead of neo4j-shell -c dump? Or is there a better way of backing up the database (I have the Community Edition)? One advantage of the above solution is that you don't have to stop the database.
The most useful option is to shut down the database and then take backups using the new neo4j-admin command.
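For example (a sketch; the database name and destination path are assumptions, and neo4j-admin dump is available in recent 3.x releases):

neo4j-admin dump --database=graph.db --to=/backups/graph.db.dump

The resulting archive can later be restored with neo4j-admin load.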
If you cannot shut down the graph, then you can manually copy the "graph.db" directory somewhere else, and then run neo4j-shell with the -path option against the new location. As far as version 3.1.1 is concerned, neo4j-shell works perfectly.

Cayley docker isn't writing data

I'm using [or, trying to use] the Cayley docker image from here: https://github.com/saidimu/cayley-docker
I created a data dir at /var/lib/apps/cayley/data and dropped in the .cfg file I was instructed to create:
{
"database": "myapp",
"db_path": "/var/lib/apps/cayley/data",
"listen_host": "0.0.0.0"
}
I ran docker cayley with:
docker run -d -p 64210:64210 -v /var/lib/apps/cayley/data/:/data saidimu/cayley:v0.4.0
and it runs fine; I'm looking at its UI in the browser:
And I add a triple or two, and I get success messages.
Then I go to the query interface and try to list any vertices:
> g.V
and there is nothing to be found (I think):
{
"result": null
}
and there is nothing written in the data directory I created.
Any ideas why data isn't being written?
Edit: just to be sure there wasn't something wrong with my data directory, I ran the local volume mounted docker for neo4j on the same directory and it saved data correctly. So, that eliminates some possibilities.
I cannot comment yet, but I think that to obtain results from your query you need to use the All keyword:
g.V().All() // Print All the vertices
OR
g.V().Limit(20) // Limits the results to 20
If that was not your problem, I can edit and share my Dockerfile, which is derived from the same Dockerfile you are using.
You may refer to the lib here to learn more about how to use Cayley's APIs, the data format in Cayley, and some other stuff like N-Triples, N-Quads and RDF:
Cayley API usage examples (mocha test cases)
Clearly designed entry-level N-Quads data to get you started: in the project's test/data directory
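For reference, a single N-Quads statement looks like this (hypothetical IRIs; the fourth term names the graph and is optional):

<http://example.org/alice> <http://example.org/follows> <http://example.org/bob> <http://example.org/graph1> .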

VBS printer script executing error

I have some trouble executing/using VBS scripts linked to printers. They are located in %windir%\System32\Printing_Admin_Scripts.
The objective is to schedule a weekly print task to preserve the ink cartridge.
Looking at the scripts, everything seemed available for me to create this task.
The main script to use is prnqctl.vbs.
Before creating my task, I tried to test the script, and this is what I got (sorry for the French version; I will try to update the screenshot in English later):
There is obviously something wrong.
I have tried to google the error code; nothing conclusive.
I have tried to run my script in admin mode and also under an admin session; same problem.
I have done some research on CIMWin32; it seems to be a DLL and I can find it in several locations on my filesystem.
My OS is W8.1.
If anybody has a suggestion or solution, I'm interested.
==>cscript C:\Windows\System32\Printing_Admin_Scripts\en-US\prnqctl.vbs -e
Unable to get printer instance. Error 0x80041002 Not found
Operation GetObject
Provider CIMWin32
Description
Win32 error code
The error culprit is clear: you should provide a valid -p argument. It's a mandatory parameter for the -e operation:
==>cscript C:\Windows\System32\Printing_Admin_Scripts\en-US\prnqctl.vbs -e -p "Fax"
Success Print Test Page Printer Fax
==>
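If you are not sure of the exact printer name to pass to -p, the sibling script prnmngr.vbs can list the printers installed on the machine (the en-US path below assumes an English locale):

==>cscript C:\Windows\System32\Printing_Admin_Scripts\en-US\prnmngr.vbs -l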
