gettext upper and lower cases - localization

In gettext, is there a way to avoid duplicating upper and lower case translations? Right now my translations look like this, which works but duplicates every entry.
msgid "Customer Service"
msgstr "Kontakta oss"
msgid "CUSTOMER SERVICE"
msgstr "KONTAKTA OSS"

The two strings are different, there’s nothing “duplicated” here.
If you don’t want such translation entries, change your code so that it does not contain both literals, and do the case changes programmatically.

Related

How to automatically add curly braces to all entries on JabRef?

When I download a .bib item from IEEEExplore, the paper title gets formatted in lowercase even if the original title is capitalized. To keep the original formatting, I have to manually add curly brackets around the title. How do I set up JabRef to add curly braces to all the library entries so as to keep the title capitalized?
I believe this is a known JabRef missing feature, see here:
https://discourse.jabref.org/t/add-around-capital-letters-automatically/222
To clean up existing entries by adding braces around capitals, one could hope to use Quality > Cleanup Entries and then some field formatters. However, the current field formatters contain "Unprotect terms" and not the opposite.
I think there is a reason for JabRef not to automatically protect all capitals it encounters: it might be desirable not to force capitalization everywhere, in order to respect the conventions of specific citation styles. Protection should then be used mostly for "protected terms" like abbreviations, etc.
JabRef does provide this feature: Options > Preferences > Protected terms files. You can add new files if needed.
Of course, if the IEEE entry had curly braces and JabRef removes them, then I think that should count as a bug.

How to apply a style to Object Text with an OLE inside

I wanted to apply a style, strikethrough in this case, to the object's text, but I have run into problems when the attribute contains an OLE Object.
Doing obj."Object Text" = richText obj."Object Text""" does not work as it takes away the OLE.
obj."Object Text" = richTextWithOle "{\\strike " obj."Object Text" "}" does not work because richTextWithOle does not accept a string as parameter, only attributes.
obj."Object Text" = richText "{\\strike " obj."Object Text" "}" stops DOORS from responding, probably with no recovery: I waited about 5 minutes on a small module with a single OLE before force-closing the instance.
Is this actually possible? If so, is there a way to achieve this?
Thank you for your answers.
First of all, I would take a small step back and look at what you are trying to achieve in the end. If you want to mark requirements according to their state, I would suggest using a separate attribute that shows the actual validity of an object (e.g. "valid", "invalid", "tbd" or something along those lines) rather than striking through the Object Text.
The other issue I would have is the "coarseness" of your requirements. A better separation between a textual requirement and a picture or diagram that illustrates something in more detail may help. Maybe the first step would be to clean up the requirements, i.e. to separate them in a consistent way, which would also make your strike-through issue more manageable.
Let me know if that helped. If not, you may actually have to build a DXL script which, for example, parses the content of an Object Text, goes through the rich text tags and handles them accordingly.

Add forbidden words to TexStudio / Latex

I have some words in my language (German) that are valid according to TeXstudio's spellchecker.
However, they must not be used in my thesis (or anywhere else, as far as I am concerned).
Is it possible to add words to a list so that they trigger an (ideally huge) "DO NOT USE THIS!" sign, or even prevent compilation in LaTeX when such words are used?
I'm looking for something like a negative dictionary.
I've seen files like "badwords" or "stopwords" but don't know when/how they are used. I can freely use them although "check for bad words" is on.
In case anyone else has the problem: badword files are named after the main language. In my case the dictionary set was "de_DE_frami", so TeXstudio did not use "de_DE.badwords".
For good highlighting, one can change the appearance in the options dialog (syntaxhighlighting->badwords) and make it, e.g., red background, size 200%.
I'd still like to have a distinction between "bad" words and "impossible" words, as you sometimes cannot avoid "bad" words, or they are not bad in all contexts.
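For the "prevent compilation" part, one option outside TeXstudio is a small check that runs before the LaTeX build and fails when a forbidden word appears. The following Python sketch is only an illustration of that idea. The function name `find_forbidden` and the example word list are hypothetical, and it is not a TeXstudio feature.

```python
import re


def find_forbidden(text, forbidden):
    """Return (line_number, matched_word) for every forbidden word in text."""
    if not forbidden:
        return []
    # Whole-word, case-insensitive match on any of the listed words.
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(word) for word in forbidden) + r")\b",
        re.IGNORECASE,
    )
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for match in pattern.finditer(line):
            hits.append((line_no, match.group(0)))
    return hits


# Hypothetical usage: the word list and the sample text are made up.
sample = "Das ist eigentlich klar.\nEin sauberer Satz."
for line_no, word in find_forbidden(sample, ["eigentlich", "quasi"]):
    print(f"line {line_no}: DO NOT USE '{word}'")
```

A build script could call this on each .tex file and exit with a non-zero status when the list of hits is not empty, so that the compile step is never reached.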

Regexp for a name

I need to make sure people enter their first, middle and last names correctly for a form in Rails. So the first thought for a regular expression is:
\A[[:upper:]][[:alpha:]'-]+( [[:upper:]][[:alpha:]'-]*)*\z
That'll make sure every word in the name starts with an uppercase letter followed by a letter or hyphen or apostrophe.
My first question I guess doesn't have much to do with regular expressions, though I'm hoping there's a regular expression I can copy for this. Are letters, hyphens and apostrophes the only characters I should be checking in a name?
My second question is if it's important to make sure each name has at least 1 uppercase letter? So many people enter all lowercase names and I really want to avoid that, but is it sometimes legitimate?
Here's what I have so far that makes sure there's at least 1 uppercase letter somewhere in the name:
\A([[:alpha:]'-]+ )*[[:alpha:]'-]*[[:upper:]][[:alpha:]'-]*( [[:alpha:]'-]+)*\z
Isn't there a [:name:] bracket expression? :)
UPDATE: I added . and , to the characters allowed, surprised I didn't think of them originally. So many people must have to deal with this kind of regular expression! Nobody has any pre-made regular expressions for this sort of thing?
A good start would be to allow letters, marks, punctuation and whitespace. That allows for a given name like "María-Jose" and a last name like "van Rossum" (note the whitespace).
So that boils down to something like:
[\p{Letter}\p{Mark}\p{Punctuation}\p{Separator}]+
If you want to restrict that a bit you could have a look at classes like \p{Lowercase_Letter}, \p{Uppercase_Letter}, \p{Titlecase_Letter}, but there may be scripts that don't have casing. \p{Space_Separator} and \p{Dash_Punctuation} can narrow it down to names that I know. But names I don't...I don't know...
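To make the character classes concrete, here is a rough Python equivalent. Python's built-in `re` module does not support `\p{...}` classes, so this sketch checks Unicode general categories through `unicodedata` instead. The function name `looks_like_name` is hypothetical, and unlike the unanchored expression above it tests the whole string.

```python
import unicodedata

# First letter of the Unicode general category:
# L = Letter, M = Mark, P = Punctuation, Z = Separator.
ALLOWED_MAJOR_CATEGORIES = {"L", "M", "P", "Z"}


def looks_like_name(text):
    # Non-empty, and every character belongs to one of the allowed categories.
    return bool(text) and all(
        unicodedata.category(ch)[0] in ALLOWED_MAJOR_CATEGORIES for ch in text
    )


for candidate in ["María-Jose", "van Rossum", "O'Brien", "R2-D2", ""]:
    print(repr(candidate), looks_like_name(candidate))
```

The check is deliberately permissive: it rejects digits and symbols but says nothing about capitalisation or the number of name parts.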
But before you start constructing your regex for "validating" a name, please read the excellent W3C article on names, "Personal names around the world". It will shake even your concepts of first, middle and last names.
For example:
In some cultures you are given a name (Björk, Osama) and an indication of who your father (or mother) was (Guðmundsdóttir, bin Mohammed). So the "first name" could be "Björk" but:
Björk wouldn’t normally expect to be called Ms. Guðmundsdóttir. Telephone directories in Iceland are sorted by given name.
In other cultures, the first name is not the given name but the family name. In "Zhāng Mànyù", "Zhāng" is the family name. How to address her would depend on how well you know her, but again "Ms. Zhāng" would be strange.
The list of examples goes on and ends with some 30+ links to Wikipedia for more examples.
The article does end with suggestions for field design and some pointers on what characters to allow:
Don't forget to allow people to use punctuation such as hyphens, apostrophes, etc. in names. Don't require names to be entered all in upper case – this can be difficult on a mobile device. Allow the user to enter a name with spaces, eg. to support prefixes and suffixes such as de in French, von in German, and Jnr/Jr in American names, and also because some people consider a space-separated sequence of characters to be a single name, eg. Rose Marie.
To answer your question about capital letters: in many areas of the world, names do not necessarily start with a capital letter. In Dutch for instance, you have surnames like "van der Vliet" where words like "van", "de", "den" and "der" are not capitalised. Additionally, you have special cases like "De fauw" and "Van pellicom" where an administrative error never got rectified, and the correct capitalisation is fairly illogical. Please do not make the mistake of rejecting such names.
I also know about town names in South Africa such as eThekwini, where the capital letter is not necessarily the first letter of the word. This could very well appear in surnames or given names as well.

Tool in the gettext suite to unify source strings with fuzzy match?

Is there any way to leverage the tools in the gettext suite to do something like fuzzy match the source strings within one PO file to find strings which are almost identical? This would seem like a useful quality check to improve the sources. Example:
#: my_file
msgid "Sorry, something went wrong"
msgstr ""
#: some_other_file
msgid "Sorry, something went wrong."
msgstr ""
#: yet_another_file
msgid "Sorry, something is wrong"
msgstr ""
These strings are virtually identical and the source code could possibly be changed to use the same message in each instance. This would reduce the l10n work and make the UI more coherent. It would seem to me that the fuzzy match algorithm in msgmerge should already be pretty well suited to identify these instances. Yet I could not find an obvious way to do this.
You don't want to do any kind of folding without human supervision.
Most translation tools have that feature, but a human should validate such folding.
You can't even do it for perfectly identical strings because of the context.
Why:
Buttons ("commands") often get translated differently than labels and titles ("descriptions").
Example: "Print" is translated to French as "Imprimer" (buttons) or "Impression" (titles).
Gender, number and case will change the translation.
Example: translating a "New" button into Spanish can give you "Nuevo" (masculine, singular), "Nuevos" (masculine, plural), "Nueva" (feminine, singular) or "Nuevas" (feminine, plural).
The same word can be translated differently if it has a different meaning.
Example: "Scan" will have different translations depending on whether it is about scanning the disk (for a virus) or scanning a piece of paper.
So you don't want to "magically merge strings" to save a few cents if the price is lower translation quality.
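If the goal is only the quality check from the question, a script can list near-identical source strings and leave the decision to a human. The sketch below uses Python's `difflib.SequenceMatcher`, not msgmerge's own fuzzy matcher. The helper names, the 0.85 threshold and the one-line msgid parsing are simplifying assumptions.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations

# Simplifying assumption: each msgid sits on a single line without escaped quotes.
MSGID_RE = re.compile(r'^msgid "(.*)"$', re.MULTILINE)


def extract_msgids(po_text):
    """Collect non-empty msgid strings (the empty header msgid is skipped)."""
    return [msgid for msgid in MSGID_RE.findall(po_text) if msgid]


def similar_pairs(msgids, threshold=0.85):
    """Return (a, b, ratio) for every pair at or above threshold, best first."""
    pairs = []
    for a, b in combinations(msgids, 2):
        ratio = SequenceMatcher(None, a, b).ratio()
        if ratio >= threshold:
            pairs.append((a, b, ratio))
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)


SAMPLE_PO = '''#: my_file
msgid "Sorry, something went wrong"
msgstr ""
#: some_other_file
msgid "Sorry, something went wrong."
msgstr ""
#: yet_another_file
msgid "Sorry, something is wrong"
msgstr ""
'''

# Candidates only: a human still has to decide whether the strings can be unified.
for a, b, ratio in similar_pairs(extract_msgids(SAMPLE_PO)):
    print(f"{ratio:.2f}  {a!r}  <->  {b!r}")
```

The output is a review list for developers, not an automatic merge, which keeps the context concerns above in human hands.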

Resources