Generating bibliographic files (BibTeX, RIS, etc.) from database records - mapping

Are there methods or tools to facilitate generating bibliographic data files (for BibTeX, EndNote, RefMan, etc.) from database records to show to visitors of a website so they can easily import the citations?

A powerful tool to convert between various bibliographic formats is bibutils. EndNote and RefMan should both readily accept the RIS format.

On a Linux system you can execute
cat myrisfile.ris | ris2xml | xml2bib > mybibfile.bib
to convert myrisfile.ris to mybibfile.bib in BibTeX format, provided the bibutils package is installed.
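If you need to produce the RIS files themselves from your database records, here is a minimal sketch in Python; the record fields, the sample record, and the output filename are hypothetical, so map them to your actual schema:

# Build an RIS file from database records so visitors can import the
# citations into EndNote, RefMan, Zotero, etc.
def record_to_ris(rec):
    lines = [
        "TY  - JOUR",                               # reference type: journal article
        *[f"AU  - {author}" for author in rec["authors"]],
        f"TI  - {rec['title']}",
        f"JO  - {rec['journal']}",
        f"PY  - {rec['year']}",
        "ER  - ",                                   # end of record
    ]
    return "\n".join(lines)

records = [                                         # hypothetical query result
    {"authors": ["Doe, Jane"], "title": "An Example Paper",
     "journal": "Journal of Examples", "year": 2020},
]
with open("citations.ris", "w", encoding="utf-8") as f:
    f.write("\n".join(record_to_ris(r) for r in records) + "\n")

The resulting .ris output can then be run through the bibutils pipeline above to offer a BibTeX download as well.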

Related

Count Lines, grep, head, and tail inside Feather Files

Setup: I am contemplating switching from writing large (~20GB) data files with csv to feather format, since I have plenty of storage space and the extra speed is more important. One thing I like about csv files is that at the command line, I can do a quick
wc -l filename
to get a row count, even for large data files. Also, I can quickly search for a simple string with
grep search_string filename
The head and tail commands are also very useful at times. These are straightforward and work well with csv files, but not with feather. If I try any of them on a feather file, I do not get results that make sense or are helpful.
While I certainly can read a feather file into, say, Python or R, and analyze it then, the hassle of writing out the path and importing the necessary libraries is something I'd rather dispense with.
My Question: Does there exist either a cross-platform (at least Mac and Linux) feather file reader I can use to quickly read in and view feather data (this would be in tabular format) with features corresponding to row count, grep, head, and tail? Or are there simple CLI utilities I could install that would enable me to do the equivalent of line count, grep, head, and tail?
I've seen this question, but it is very incomplete relative to my question.
With feather files you must use a Python or R program.
With csv files you can use any of the common text-manipulation utilities available to Linux/Unix users.
Linux text manipulation tools:
pager: less
search: grep
text processors: awk, sed
file splitter: split
editor: vim
Each of the above tools requires some learning and practice.
Suggestion
If you have programming skills, create a small program to manipulate your feather file (see the sketch below).
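A minimal sketch of such a program, assuming pandas with pyarrow installed; the script name (feathertool.py) and its argument layout are hypothetical:

import sys
import pandas as pd   # feather support requires pyarrow

# usage: python feathertool.py {count|head|tail|grep} file.feather [pattern]
cmd, path = sys.argv[1], sys.argv[2]
df = pd.read_feather(path)

if cmd == "count":                      # like `wc -l`
    print(len(df))
elif cmd == "head":                     # like `head`
    print(df.head(10))
elif cmd == "tail":                     # like `tail`
    print(df.tail(10))
elif cmd == "grep":                     # like `grep pattern file`, over string columns
    pattern = sys.argv[3]
    text_cols = df.select_dtypes(include="object")
    mask = text_cols.apply(lambda col: col.str.contains(pattern, na=False)).any(axis=1)
    print(df[mask])

Note that, unlike wc or grep on a csv file, this reads the whole file into memory first, so it will be slower on a ~20GB file.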

How do I convert ICU formatted strings into a TMX (Translation Memory eXchange) file?

I am attempting to aggregate multiple data sources and locales into a single TMX translation memory file.
I cannot seem to find any good documentation or existing tools for converting into TMX format. These converters are the closest thing I have found, but they do not appear to be sufficient for handling ICU syntax.
Right now I have extracted my strings into JSON format which would look something like this:
{
  "foo_id": {
    "en": "This is a test",
    "fr": "Some translation"
  },
  "bar_id": {
    "en": "{count, plural, one{This is a singular} other{This is a test for count #}}",
    "fr": "{count, plural, one{Some translation} other{Some translation for count #}}"
  }
}
Based on how many translation vendors accept ICU formatting when content is submitted and then export their TM as .tmx files, it feels like this must be a solved problem, but information seems scarce. Does anyone have experience with this? I am using formatjs to write the ICU strings.
Since TMX only really supports plain segments with simple placeholders (not plural forms), it's not easy to convert from ICU to TMX.
Support for ICU seems pretty patchy in translation tools, but there is another format that does a similar job and has better support: gettext .po. Going via .po to get to TMX might work (a small pre-processing sketch follows these steps):
1. Use the ICU2po tool to convert from ICU to .po format
2. Import the .po file into a TMS (e.g. Phrase) or a CAT tool (e.g. Trados)
3. Run a human/machine translation process
4. Export a TMX
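For the plain (non-plural) strings, a small pre-processing step can already produce a .po file from JSON like the example above. This is only a minimal sketch, assuming that file layout and a hypothetical strings.json path; ICU plural/select messages are skipped and left to a real converter such as ICU2po:

import json

with open("strings.json", encoding="utf-8") as f:
    strings = json.load(f)

def po_escape(s):
    return s.replace("\\", "\\\\").replace('"', '\\"')

with open("en_fr.po", "w", encoding="utf-8") as po:
    # minimal .po header
    po.write('msgid ""\nmsgstr ""\n"Content-Type: text/plain; charset=UTF-8\\n"\n\n')
    for string_id, locales in strings.items():
        if locales["en"].lstrip().startswith("{"):   # crude check for ICU plural/select syntax
            continue
        po.write(f"#. id: {string_id}\n")            # extracted comment carrying the string id
        po.write(f'msgid "{po_escape(locales["en"])}"\n')
        po.write(f'msgstr "{po_escape(locales["fr"])}"\n\n')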

ArangoDB - how to import neo4j database export into ArangoDB

Are there any utilities to import a database from Neo4j into ArangoDB? The arangoimp utility expects the data for edges and vertices to be in a different format than what Neo4j exports.
Thanks!
Note: This is not an answer per se, but a comment wouldn't allow me to structure the information I gathered in a readable way.
Resources online seem to be scarce with respect to the transition from Neo4j to ArangoDB.
One possible way is to combine APOC (https://github.com/neo4j-contrib/neo4j-apoc-procedures) and neo4j-shell-tools (https://github.com/jexp/neo4j-shell-tools):
1. Use APOC to create a Cypher export file for the database (see https://neo4j.com/developer/kb/export-sub-graph-to-cypher-and-import/)
2. Use the neo4j-shell-tools Cypher import with the -o switch -- this should generate CSV files
3. Analyse the CSV files, then either
   massage them with csvtool, OR
   create JSON data with one of the numerous csv2json converters available (npm, ...) and massage those files with jq (a rough sketch of this step follows the list)
4. Feed the files to arangoimp; repeat step 3 if necessary
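Here is a minimal sketch of the CSV-to-JSON massaging in step 3, assuming a hypothetical edges.csv with start_id, end_id and type columns (the real column names depend on how you exported from Neo4j); ArangoDB edge documents need _from/_to attributes of the form "collection/key":

import csv
import json

# Hypothetical input: edges.csv with columns start_id, end_id, type.
# Output: edges.jsonl, one ArangoDB edge document per line.
with open("edges.csv", newline="") as src, open("edges.jsonl", "w") as dst:
    for row in csv.DictReader(src):
        edge = {
            "_from": f"vertices/{row['start_id']}",   # _from/_to must be "collection/key"
            "_to": f"vertices/{row['end_id']}",
            "type": row["type"],
        }
        dst.write(json.dumps(edge) + "\n")

The result can then be fed to arangoimp with something like --type json --collection edges --create-collection true --create-collection-type edge.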
There is also a graphml-to-json converter (https://github.com/uskudnik/GraphGL/blob/master/examples/graphml-to-json.py) available, so you could use the aforementioned neo4j-shell-tools to export to GraphML, convert that representation to JSON, and massage those files into the necessary format.
I'm sorry that I can't be of more help, but maybe these thoughts get you started.

How do I convert the DAQ-derived mxd file format to csv?

Background:
I was given a pile of Yokogawa "mxd" files without documentation or description, and told "convert it".
I have looked for documentation and found none. The OEM doesn't seem to "do" reproducibility in the sense of a "code book". (link)
I have looked for online code for converters and found none.
National Instruments has a connector, but only if I use the latest/greatest LabVIEW (link). I don't have that version.
The only compatible suffix is from ArcGIS, but why would a DAQ vendor use a format like that?
Questions:
Is there a straightforward way to convert "mxd" to "csv"?
How do I find the relationship using the binary data? Eyeballing HEX seems slow/inefficient.
Is there any relationship between DAQ mxd and ArcGIS mxd?
Yokogawa supplies a program called MX100 Standard Software (https://y-link.yokogawa.com/YL008/?Download_id=DL00002238&Language_id=EN), which can read the *.mxd files and also export them to ASCII or Excel. See the well-hidden manual (http://web-material3.yokogawa.com/IMMX180-01E_040.pdf); page 105, chapter 3.7, covers converting data formats.

Where can I get a dump of raw text on the web?

I want to do some text analysis in a program I am writing, and I am looking for alternative sources of raw text similar to what is provided in the Wikipedia dumps (download.wikimedia.com).
I'd rather not go through the trouble of crawling websites, parsing the HTML, extracting the text, etc.
What sort of text are you looking for?
There are many free e-books (fiction and non-fiction) in .txt format available at Project Gutenberg.
They also have large DVD images full of books available for download.
NLTK provides a simple Python API to access many text corpora, including Gutenberg, Reuters, Shakespeare, and others.
>>> from nltk.corpus import brown
>>> brown.words()
['The', 'Fulton', 'County', 'Grand', 'Jury', 'said', ...]
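For raw Project Gutenberg text specifically, a minimal sketch (the corpus needs a one-time download):

import nltk
nltk.download('gutenberg')              # one-time corpus download
from nltk.corpus import gutenberg

print(gutenberg.fileids()[:5])          # available plain-text books
emma = gutenberg.raw('austen-emma.txt') # full text of one book as a string
print(emma[:200])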
Project Gutenberg has a huge number of ebooks in various formats (including plain text).
