How do I convert ICU formatted strings into a TMX (Translation Memory eXchange) file? - localization

I am attempting to aggregate multiple data sources and locales into a single TMX translation memory file.
I cannot seem to find any good documentation or existing tools for converting into the TMX format. These converters are the closest thing I have found, but they do not appear to be sufficient for handling ICU syntax.
Right now I have extracted my strings into JSON format, which looks something like this:
{
  "foo_id": {
    "en": "This is a test",
    "fr": "Some translation"
  },
  "bar_id": {
    "en": "{count, plural, one{This is a singular} other{This is a test for count #}}",
    "fr": "{count, plural, one{Some translation} other{Some translation for count #}}"
  }
}
Given how many translation vendors allow ICU formatting in submitted content and then export their TM as .tmx files, it feels like this must be a solved problem, but information seems scarce. Does anyone have experience with this? I am using formatjs to write the ICU strings.

Since TMX only really supports plain segments with simple placeholders (not plural forms), converting from ICU to TMX is not straightforward.
Support for ICU seems pretty patchy in translation tools, but there is another format that does a similar job and has better support: gettext .po. Going via .po to get to TMX might work:
1. Use the ICU2po tool to convert from ICU to .po format
2. Import the .po file into a TMS (e.g. Phrase) or a CAT tool (e.g. Trados)
3. Run the human/machine translation process
4. Export a TMX
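For the plain (non-plural) segments you could also go straight from your JSON to TMX with a few lines of scripting. Here is a minimal Python sketch, assuming the JSON shape shown above; the file names are placeholders, and anything containing ICU syntax is simply skipped, since TMX cannot represent plural forms:

import json
import xml.etree.ElementTree as ET

with open('strings.json') as f:
    strings = json.load(f)

tmx = ET.Element('tmx', version='1.4')
ET.SubElement(tmx, 'header', {
    'srclang': 'en', 'datatype': 'plaintext', 'segtype': 'sentence',
    'adminlang': 'en', 'creationtool': 'json2tmx',
    'creationtoolversion': '0.1', 'o-tmf': 'json',
})
body = ET.SubElement(tmx, 'body')

for string_id, translations in strings.items():
    if '{' in translations.get('en', ''):
        continue  # skip ICU plural/select messages; route those via .po
    tu = ET.SubElement(body, 'tu', tuid=string_id)
    for lang, text in translations.items():
        tuv = ET.SubElement(tu, 'tuv', {'xml:lang': lang})
        ET.SubElement(tuv, 'seg').text = text

ET.ElementTree(tmx).write('memory.tmx', encoding='UTF-8', xml_declaration=True)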

Related

How can I parse JSON-LD to Markdown?

Is there an existing parser to parse JSON-LD to Markdown? I want to generate documentation from my JSON-LD file. If such a thing doesn't exist, how should I go about writing one? Or perhaps I could use a JSON-to-Markdown converter? Any suggestions on how I could do this?
I was just googling for such a program, and found your question.
The closest things I could find are: ocxmd, which is an extension to Markdown; and md-ld, which does not even use proper Markdown - instead, it apparently creates an incompatible version of the format which can be parsed to JSON-LD.
If I were writing such a converter in Python, I would use:
pyld to parse JSON-LD files and expand them using the @context;
and a template engine, likely Jinja2, to generate a Markdown representation of every node of the JSON-LD document.
The program would be based on recursion. You might have separate functions to display:
URIs,
Numbers,
Images,
...
The program will recurse over the JSON-LD document and convert each of its sections into Markdown format.
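Here is a rough Python sketch of that recursive approach, using pyld to expand the document first. The rendering choices (URIs as headings, properties as bullets) and the file name are illustrative only; a real converter would plug a template engine like Jinja2 into each node handler:

import json
from pyld import jsonld

def to_markdown(node, depth=1):
    # Lists: render each item in turn.
    if isinstance(node, list):
        return '\n'.join(to_markdown(item, depth) for item in node)
    if not isinstance(node, dict):
        return str(node)
    # Literal values (numbers, strings, typed values).
    if '@value' in node:
        return str(node['@value'])
    lines = []
    if '@id' in node:
        lines.append('#' * depth + ' ' + node['@id'])  # URI as a heading
    for prop, values in node.items():
        if not prop.startswith('@'):
            lines.append('- **' + prop + '**: ' + to_markdown(values, depth + 1))
    return '\n'.join(lines)

with open('doc.jsonld') as f:
    doc = json.load(f)

expanded = jsonld.expand(doc)  # resolves terms against the @context
print(to_markdown(expanded))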

How do I convert the DAQ-derived mxd file format to CSV?

Background:
I was given a pile of Yokogawa "mxd" files without documentation or description, and told to "convert it".
I have looked for documentation and found none. The OEM doesn't seem to "do" reproducibility in the sense of a "code book". (link)
I have looked for online code for converters and found none.
National Instruments has a connector, but only if I use the latest/greatest LabVIEW (link). I don't have that version.
The only compatible suffix is from ArcGIS, but why would a DAQ use a format like that?
Questions:
Is there a straightforward way to convert "mxd" to "csv"?
How do I find the relationship using the binary data? Eyeballing HEX seems slow/inefficient.
Is there any relationship between DAQ mxd and ArcGIS mxd?
Yokogawa supplies a program called MX100 Standard Software (https://y-link.yokogawa.com/YL008/?Download_id=DL00002238&Language_id=EN). This program can read the *.mxd files and also export them to ASCII or Excel. See the well-hidden manual (http://web-material3.yokogawa.com/IMMX180-01E_040.pdf): page 105 has chapter 3.7, converting data formats.
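Once MX100 has exported the data to ASCII, getting to CSV is a short script. A minimal Python sketch, assuming a whitespace-delimited export (check the actual layout against chapter 3.7 of the manual):

import csv

with open('export.txt') as src, open('export.csv', 'w', newline='') as dst:
    writer = csv.writer(dst)
    for line in src:
        if line.strip():
            writer.writerow(line.split())  # whitespace-delimited -> CSV row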

How to scan words, lines and their properties on the text?

I need to scan a document. It's not OCR; let me show you:
--Example--
Table of Contents
Some Italic Words
Sentence 23
--End--
Suppose that is a ".doc" formatted text. I need to scan it line by line and understand that the first line is bold, the second is italic, and the third includes a space after the first word followed by a number. The reason I want to recognize them is that I need to categorize them in a table view: bold lines, italics, numbered lines, etc.
I'm okay in both Swift and Objective-C but totally clueless about document scanning. If you can offer any reference, framework, or approach, I would be grateful.
Variant: your doc is really a docx (a docx is XML). Parse the XML. The format defines XML tags it uses to mark text as bold or italic or whatever; a docx is kind of like HTML.
Variant: if your doc is really a .doc, then we are not talking about XML but a binary format. It is also documented and you can parse it, but I don't think it will be easy.
BUT
There is a library I know of, doc2text, that can parse a lot of stuff (http://www.textlib.com/doc2text.html).
We used it in past projects and it did an okay job, and using it saves you a lot of effort writing your own parsers.
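To make the docx variant concrete, here is a small sketch (in Python for brevity; the same walk works from Swift or Objective-C with any XML parser). A .docx is a zip archive whose word/document.xml marks bold and italic runs with w:b and w:i elements inside the run properties; the input file name is a placeholder:

import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml
W = '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}'

with zipfile.ZipFile('input.docx') as docx:
    root = ET.fromstring(docx.read('word/document.xml'))

for para in root.iter(W + 'p'):
    text = ''.join(t.text or '' for t in para.iter(W + 't'))
    # Flag the paragraph if any of its runs carries a bold/italic property.
    bold = any(rpr.find(W + 'b') is not None for rpr in para.iter(W + 'rPr'))
    italic = any(rpr.find(W + 'i') is not None for rpr in para.iter(W + 'rPr'))
    print(bold, italic, text)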

Combining keys and full text when working with gettext and .po files

I am looking at gettext and .po files for creating a multilingual application. My understanding is that in the .po file msgid is the source and msgstr is the translation. Accordingly, I see two ways of defining msgid:
Using full text (e.g. "My name is %s.\n") with the following advantages:
- when calling gettext you can clearly see what is about to be translated
- it's easier to translate .po files because they contain the actual content to be translated
Using a key (e.g. my-name %s) with the following advantages:
- when the source text is long (e.g. a paragraph about the company), gettext calls are more concise, which makes your views cleaner
- it's easier to maintain several .po files and views, because the key is less likely to change (e.g. the key company-description is far less likely to change than the actual company description)
Hence my question:
Is there a way of working with gettext and .po files that allows combining the advantages of both methods, that is:
- usage of keys for gettext calls
- ability for the translator to see the full text that needs to be translated?
gettext was designed to translate English text to other languages, and this is the way you should use it. Do not use it with keys. If you want keys, use some other technique such as an associative array.
I have managed two large open-source projects (50 languages, 5000 translations), one using the key approach and one using the gettext approach - and I would never use the key approach again.
The cons include propagating changes in the English text to the other languages. If you change
msg_no_food = "We had no food left, so we had to eat the cats"
to
msg_no_food = "We had no food left, so we had to eat the cat's"
The new text has a completely different meaning - so how do you ensure that other translations are invalidated and updated?
You mentioned having long text that makes your scripts hard to read. The solution to this might be to put these in a separate script. For example, put this in the main code
print help_message('help_no_food')
and have a script that just provides help messages:
function help_message($id) {
    switch ($id) {
        ...
        case 'help_no_food': return gettext("We had no food left, so we had to eat the cat's");
        ...
    }
}
Another problem for gettext is when you have a full page to translate. Perhaps a brochure page on a website that contains lots of embedded images. If you allow lots of space for languages with long text (e.g. German), you will have lots of whitespace on languages with short text (e.g. Chinese). As a result, you might have different images/layout for each language.
Since these tend to be few in number, it is often easier to implement these outside gettext completely. e.g.
brochure-view.en.php
brochure-view.de.php
brochure-view.zh.php
I just answered a similar (much older) question here.
Short version:
The PO file format is very simple, so it is possible to generate PO/MO files from another workflow that allows the flexibility you're asking for. (your devs want identifiers, your translators want words)
You could roll this solution yourself, or use a cloud-based app like Loco to manage your translations and export a Gettext file with identifiers when your devs need them.
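As one possible shape for a roll-your-own version, here is a minimal Python sketch using the polib library: msgid carries the developer key, and the extracted comment (the #. lines translators see in their editor) carries the full English text. The keys and strings are made-up examples:

import polib

# Hypothetical key -> full English text mapping from your own workflow.
strings = {
    'company-description': 'We build tools for localizing software.',
    'help_no_food': "We had no food left, so we had to eat the cat's",
}

po = polib.POFile()
po.metadata = {'Content-Type': 'text/plain; charset=UTF-8'}
for key, source_text in strings.items():
    po.append(polib.POEntry(
        msgid=key,           # stable identifier used in gettext() calls
        msgstr='',           # left empty for the translator to fill in
        comment=source_text, # shown to translators as an extracted comment
    ))
po.save('messages.po')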

Need help gathering data from a txt file and inserting it in a web page?

Could someone advise me on the most efficient way to gather data from one source, select a specific piece of data, and insert it in a web page? Specifically, I wish to:
Call up this buoy data text file: http://www.ndbc.noaa.gov/data/realtime2/46237.txt
Find the water temperature and insert that value in my web page.
First big question: What scripting language should I use? (I'm assuming Fortran is not an option :-)
Second not so big question: This same data set is available in graphic and xml format. Would either of these data formats be more useful than the .txt file?
Thanks in advance.
Use Perl.
(Hey, you asked. Normally one programs in whatever language one would normally use.)
The XML format won't be much more useful than the text format.
This text file format is just about as simple as it could ever get. Just about any scripting or general-purpose programming language will work. The critical part is to split each line on the regex "\s+". For example, in Python it would be:
import re

with open('/path/to/downloaded/file.txt') as the_file:
    for line in the_file:
        columns = re.split(r'\s+', line.strip())  # strip first to avoid an empty leading field
        # each column is columns[0] through columns[19]
So basically, choose whatever programming language seems easiest to you. Any .NET language would be equally capable, as well as Ruby, Python, Scheme, etc. I personally have a distaste for Perl because I find it very difficult to read.
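To close the loop on the original question, here is a hedged Python sketch that fetches the buoy file and pulls the water temperature by column name. It assumes the first line is a '#'-prefixed header naming a WTMP column and that the first data line is the most recent observation; verify both against the actual file:

import re
import urllib.request

url = 'http://www.ndbc.noaa.gov/data/realtime2/46237.txt'
lines = urllib.request.urlopen(url).read().decode('ascii').splitlines()

# Header row, e.g. '#YY MM DD hh mm WDIR ... WTMP ...'
header = re.split(r'\s+', lines[0].lstrip('#').strip())
wtmp_index = header.index('WTMP')

for line in lines:
    if not line.startswith('#'):  # first non-comment row is the newest
        columns = re.split(r'\s+', line.strip())
        print('Water temperature:', columns[wtmp_index], 'deg C')
        break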
