I need to load a lot of small RDF files into a Fuseki database.
I'm trying:
~ tdbloader2 -l {DB} /data/rdf/*.rdf
bash: /opt/apache-jena-3.1.0/bin/tdbloader2: Argument list too long
Are there better ways to do this?
This is an error from the shell: the expanded glob exceeds the kernel's limit on argument-list length. See, for example, "Argument list too long error for rm, cp, mv commands".
However, it is a good idea to parse all the files first to check that they are valid, because a single error aborts the bulk loader. While checking, you might as well convert them to N-Triples, which loads faster.
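One way to do the check-and-convert step is with Jena's riot parser, driven by find and xargs so the shell never expands the full file list. A minimal sketch, assuming riot and tdbloader2 are on the PATH and $DB is your database directory (paths are illustrative):
# Parse (and thereby validate) each file, converting it to N-Triples;
# riot reports any file that fails to parse.
find /data/rdf -name '*.rdf' -print0 | xargs -0 -n 1 riot --output=N-Triples > data.nt
# Bulk load the single concatenated N-Triples file.
tdbloader2 --loc "$DB" data.nt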
I'm converting a LaTeX document with tex4ebook (built on tex4ht) and I get the following warning: exec_epub: tidy command seems missing, you should install it in order to make a valid epub file.
But where do I install it and what does it do?
Thanks in advance.
tex4ebook uses HTML Tidy to clean up the generated XML files, which can contain issues resulting from the conversion process. If tex4ebook cannot find Tidy, it falls back to regular expressions for the clean-up. That is usually enough to produce a valid EPUB file, so you don't really need to worry about the warning as long as the generated file is valid.
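If you want Tidy anyway, it is packaged for most systems; for example (package names can vary by distribution):
sudo apt-get install tidy        # Debian/Ubuntu
brew install tidy-html5          # macOS with Homebrew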
I have a 22GB production.log file in my Ruby on Rails app. I want to browse/search the contents over SSH without downloading. Is this possible? Are there any tools?
22GB is a very large file, so it would be risky to open the whole thing in an editor on your server. I'd recommend splitting the file into multiple parts and searching each part. For example, use this command to split your file into 1GB chunks:
split -b 1GB very_large_file small_file
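Then search each chunk in turn, for example (the search pattern here is illustrative):
grep -n "500 Internal Server Error" small_file*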
Also, you should set up logrotate on your server to keep the log file from growing this large again.
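A minimal logrotate sketch for this (the path and retention policy are illustrative; it would go in /etc/logrotate.d/):
/path/to/your/app/log/production.log {
  daily
  rotate 7
  compress
  missingok
  copytruncate    # rotate without restarting the Rails app
}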
I am working on a Rails web application where users can upload a zip file containing their data/files/docs, etc. I'm concerned with security right now: I want to scan the uploaded zip file and remove all kinds of executables, such as .exe files and shell scripts. How can I do this?
Edit: I am aware of the ClamAV API for Rails, but it only scans for malicious files rather than removing executables. Just imagine someone opening a wrongly uploaded executable file on the server, and the cost of that action server- and business-wide!
First, it would be better and more robust to whitelist allowed file types rather than blacklist disallowed ones (e.g. executables). So, if your application permits it, keep a list of the types you allow.
Then the question is how you determine the type of a file.
The trivial way is checking the file extension, but that's not very strong. It may still be useful as a first check to avoid spending CPU time on further checks.
After that, you can use the magic database (the same one the file tool uses) to determine the type of uploaded files quite reliably. You have two options:
If your application runs on Linux, you can call the file tool directly, something like filetype = `file -b --mime-type #{filename}` to get the file type. Note that filename in this example needs to be sanitized to avoid OS command injection!
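A minimal sketch of that call with the filename escaped first (Shellwords is in Ruby's standard library):
require 'shellwords'
# Escape the user-supplied name so it cannot inject shell syntax.
filetype = `file -b --mime-type #{Shellwords.escape(filename)}`.strip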
If you want to support Windows too (or just want to avoid calling shell commands and have nicer code), you can use the ruby-filemagic gem:
require 'filemagic'
filename = 'yourfile.ext'
magic = FileMagic.new
filetype = magic.file(filename)  # e.g. "ELF 64-bit LSB executable ..."
magic.close                      # release the libmagic handle
The problem with ruby-filemagic is that it is no longer maintained, but it should still work fine for spotting executables.
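Putting the pieces together, here is a minimal sketch of checking every entry inside an uploaded zip against a whitelist before accepting it. It assumes the rubyzip and ruby-filemagic gems; the whitelist contents and the safe_zip? name are illustrative:
require 'zip'        # rubyzip gem
require 'filemagic'

ALLOWED = %w[text/plain application/pdf image/png image/jpeg]  # illustrative whitelist

def safe_zip?(path)
  magic = FileMagic.new(FileMagic::MAGIC_MIME)
  Zip::File.open(path) do |zip|
    zip.each do |entry|
      next if entry.directory?
      data = entry.get_input_stream.read                 # decompressed entry bytes
      mime = magic.buffer(data.to_s[0, 512].to_s).split(';').first  # e.g. "application/pdf"
      return false unless ALLOWED.include?(mime)         # reject the whole upload
    end
  end
  true
ensure
  magic.close if magic
end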
I'm running a VM with Debian 7.0 x64 and need to troubleshoot something with a provider. When I run a grep command, the console outputs a long report. I need to copy all of that output text and paste it into the body of an email, or post it directly on another forum board. I'm sure the solution must be simple, but I can't find it by searching online. I see suggestions for right-clicking with the mouse, but my VM console doesn't respond to mouse clicks, and I see suggestions for copying and modifying files within the console, but as I said above, I just need to take the raw text to paste elsewhere.
Thanks for the help!!!
The easiest way would be to save the output to a file and attach that to your email (personally, I hate emails that inline long error logs without good cause, like annotations).
This would also let you compress the file before attaching it, reducing the size considerably (text compresses quite nicely).
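For example (the grep command and file name are illustrative):
$ grep 'pattern' /var/log/syslog > report.txt
$ gzip report.txt      # produces report.txt.gz, ready to attach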
If that is not an option, there is xclip, which reads from stdin and puts it into an X selection.
$ ls | xclip
allows you to paste the directory listing with your middle mouse button.
If you must use Ctrl-V for pasting, you can also do:
$ ls | xclip -selection c
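To check what ended up in the clipboard, xclip can also print it back out:
$ xclip -o -selection c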
Is there a way (or any kind of hack) to read input data from compressed files?
My input consists of a few hundred files, which are produced gzip-compressed, and decompressing them before processing is somewhat tedious.
Reading from compressed text sources is now supported in Dataflow (as of this commit). Specifically, files compressed with gzip and bzip2 can be read from by specifying the compression type:
TextIO.Read.from(myFileName).withCompressionType(TextIO.CompressionType.GZIP)
However, if the file has a .gz or .bz2 extension, you don't have to do anything: the default compression type is AUTO, which examines file extensions to determine the correct compression type for a file. This even works with globs, where the files that result from the glob may be a mix of .gz, .bz2, and uncompressed.
The slower performance with my workaround was most likely because Dataflow was putting most of the files in the same split, so they weren't being processed in parallel. You can try the following to speed things up (see the sketch after this list):
Create a PCollection for each file by applying the Create transform multiple times (each time to a single file).
Use the Flatten transform to create a single PCollection containing all the files from PCollections representing individual files.
Apply your pipeline to this PCollection.
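A minimal sketch of that per-file-then-Flatten idea; here each per-file PCollection comes from its own TextIO.Read (swapped in for Create, since each source is a file), the bucket paths are illustrative, and p is an already-created Pipeline:
// Read each file into its own PCollection, then flatten them into one.
PCollectionList<String> parts = PCollectionList.empty(p);
for (String file : Arrays.asList("gs://my-bucket/a.gz", "gs://my-bucket/b.gz")) {
  parts = parts.and(
      p.apply("Read " + file,   // unique name per read step
          TextIO.Read.from(file)
              .withCompressionType(TextIO.CompressionType.GZIP)));
}
// Apply the rest of the pipeline to this single PCollection.
PCollection<String> lines = parts.apply(Flatten.<String>pCollections());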
I also found that for files residing in Google Cloud Storage, setting the content type and content encoding appears to "just work" without the need for a workaround.
Specifically, I run:
gsutil -m setmeta -h "Content-Encoding:gzip" -h "Content-Type:text/plain" <path>
I just noticed that specifying the compression type is now available in the latest version of the SDK (v0.3.150210). I've tested it, and was able to load my GZ files directly from GCS to BQ without any problems.