How do I merge POT and PO files so that I exclude entries that are not in the POT file?

In short, I am trying to find a way to create a new PO file from a new POT and an existing PO file - but I want to exclude any strings (and their translations) that are not in the POT file.
Every time we change the wording on our cakePHP site, we generate a new POT file that contains all the translatable strings in the site. But when we merge it with the existing PO file (using POEdit), the merge process only adds the POT entries to the PO file. It doesn't remove the translations we no longer need. We have over 12k unneeded translations in our PO files. This makes our translator very unhappy. She has taken to just looking at the site and sending me translations to add manually, which makes me very unhappy.
I've looked around for tools that do this destructive merge, but I haven't been successful finding one. Before I head off to write one...is there something I missed?
(Sorry if this belongs on a different exchange, I will move this post to a better exchange if anyone tells me which one).

What you describe as "destructive" merge is the standard, normal merge operation in gettext and what everybody wants — you'd have to go out of your way to accomplish non-destructive versions, and I'm not even sure how.
From this it's safe to conclude that (1) you must be doing some weird steps not described above, or (2) your POT file contains more than you think it does (e.g. because you append to it instead of replacing it), or (3) you or the tools you use misinterpret the resulting PO file.
To merge using GNU gettext command line tools:
msgmerge -U your_old_translation.po latest_strings.pot
To merge using Poedit (notice the spelling):
Open PO file with the (now outdated) translations.
Use Catalog → Update from POT file…
Choose the newly regenerated POT file.
Notice that by default, outdated translations are kept in the PO file as backup. In Poedit, you can purge them (see Catalog → Purge deleted translations). However, these obsolete entries are stored in a different way in the PO file, as specially formatted comments, and are not visible or editable in Poedit or any conforming PO editing tool.
If I were to bet, I'd say (3) is the most likely cause (in which case, use a better editor like, ahem, Poedit), or perhaps (2) (should be easy to review by searching the POT for now-unused strings).
But merging really does the right thing that you expect it to do.
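To check explanation (3) without opening an editor, it may help to count the obsolete entries directly. GNU gettext keeps them as comment lines that start with "#~". The following is only a sketch: the function name count_obsolete is made up for illustration, and it assumes one "#~ msgid" line per obsolete entry.

```shell
# count_obsolete FILE: print the number of obsolete entries in a PO file.
# Obsolete entries are kept as comments, so each one has a line starting
# with "#~ msgid " (the trailing space excludes "#~ msgid_plural").
count_obsolete() {
    # grep -c exits non-zero when the count is 0; "|| true" hides that.
    grep -c '^#~ msgid ' "$1" || true
}
```

If count_obsolete your_old_translation.po prints a large number, that would explain the "unneeded" translations. Running msgattrib --no-obsolete -o cleaned.po your_old_translation.po should then strip them, much like the purge command in Poedit.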

Related

How to parse only user defined source files with clang tools

I am writing a clang tool, but I am quite new to it, so I came across a problem that I couldn't find covered in the docs (yet).
I am using the great Matchers API to find some nodes that I will later want to manipulate in the AST. The problem is that the clang tool will actually parse everything that belongs to the source file, including headers like iostream etc.
Since my manipulation will probably include some refactoring, I definitely do not want to touch each and every thing the parser finds.
Right now I am dealing with this by comparing the source files of the nodes I matched against the arguments in argv, but needless to say, this feels wrong, since it still parses through ALL the iostream code; it just ignores it whilst doing so. I just can't believe there is not a way to tell the ClangTool something like:
"only match nodes whose location's source file is something the user fed to this tool"
Thinking about it, this only makes sense if it's possible to create ASTs individually for each source file, but I do need them to be aware of each other or share contextual knowledge, and I haven't figured out a way to do that either.
I feel like I am missing something very obvious here.
thanks in advance :)
Two narrowing matchers might help: isExpansionInMainFile and isExpansionInSystemHeader. For example, one could combine the latter with unless to limit matches to AST nodes that are not in system headers.
There are several examples of using these in the Code Analysis and Refactoring with Clang Tools repository. For example, see the file lib/callsite_expander.h around line 34, where unless(isExpansionInSystemHeader()) is used to exclude call expressions that are in system headers. Another example is at line 27 of lib/function_signature_expander.h, where the same matcher is used to exclude function declarations in system headers that would otherwise match.

How can I merge several files on SPSS by variable label?

I have 48 .sav data sets containing results of a monthly survey. I need to merge the cases of all common variables from them in order to come up with a 4-year aggregate. As I'm new to SPSS and not very proficient with syntax (although I can follow it), I would normally do this using Data - Merge files - Add Cases, but most of these common variables have different variable names in each data set, as the questions are not always formulated in the same order and some questions only appear in one or two data sets.
However, the variable labels do not change from one data set to another. It would be great if someone knows a way to merge these data sets by variable label instead of variable name. Swapping variable names and variable labels would also do, as then I could use Data - Merge files - Add Cases without problems.
Many thanks beforehand!
The merge procedures such as ADD FILES (Data > Merge Files > Add Cases) provide a capability to rename variables in the input files before merging. However, if there are a lot of variables to merge, this would get pretty tedious and error prone. Also, the dialog box supports only merging two files, while syntax allows up to 50.
Variable labels are generally not valid as variable names due to the typical presence of characters such as blanks and punctuation and length restrictions. If you have a rule that could be used to turn labels into valid variable names, that could be automated, or if the variables are always in the same order and are present in all the files, they could be renamed something like V1, V2, ...
The renaming could be done manually in syntax that you would craft for each file, or this could be done with a short Python program that you run on each file. I can write that for you if you provide details and, preferably, a sample dataset to test with (jkpeck AT gmail.com).
The Python code could loop over all the sav files in a directory and apply the renaming logic to each in one step.

Why msgmerge marked some of my translation as fuzzy?

I use msgmerge to merge my existing po file with an updated pot file, e.g.
msgmerge test-zh_TW.po test.pot > test.po
I've found that after the msgmerge, some of the fields are marked as fuzzy, why is that?
(I want to know the reason. I know I can turn them off with -N, but why is it the default in the first place?)
Quoting the gettext manual:
Fuzzy entries, even if they account for translated entries for most other purposes, usually call for revision by the translator. Those may be produced by applying the program msgmerge to update an older translated PO files according to a new PO template file, when this tool hypothesises that some new msgid has been modified only slightly out of an older one, and chooses to pair what it thinks to be the old translation for the new modified entry. The slight alteration in the original string (the msgid string) should often be reflected in the translated string, and this requires the intervention of the translator. For this reason, msgmerge might mark some entries as being fuzzy.
In short, the fuzzy matching algorithm in msgmerge finds some of the new messages close enough to an old one to warrant associating them with the old translation, but it marks each such entry as fuzzy to prompt the translator to revise the translation, because it is only a partial match.
The reason this is the default behaviour is because the implementation of msgmerge.c has the following lines.
/* Determines whether to use fuzzy matching. */
static bool use_fuzzy_matching = true;
References
Fuzzy Entries from gettext manual
Invoking the msgmerge Program
Source Code For msgmerge
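To see how many entries msgmerge actually flagged, it can help to count the fuzzy markers in the merged file. In a PO file the flag sits on a comment line that starts with "#," directly above the entry. This is a rough sketch: the function name count_fuzzy is made up here, and the count also includes a fuzzy header entry if the file has one.

```shell
# count_fuzzy FILE: print the number of entries flagged as fuzzy in a PO file.
# Flags live on comment lines starting with "#," and may be combined,
# for example "#, fuzzy, c-format".
count_fuzzy() {
    # grep -c exits non-zero when the count is 0; "|| true" hides that.
    grep -c '^#,.*fuzzy' "$1" || true
}
```

After the merge from the question, count_fuzzy test.po prints how many entries are waiting for review. Rerunning msgmerge with -N should leave only the entries that were already marked fuzzy in the old PO file.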

Flip doxygen's graphs from top-to-bottom orientation to left-to-right

The doxygen graphs for "includes" and "is included by" are created with nesting depth increasing from top to bottom (using 1.8.5).
Since we have mostly shallow graphs with many nodes, this leads to very wide graphs with ugly horizontal scroll bars. Is there a way to teach doxygen to create these graphs in a left-to-right orientation, the way it creates caller/call graphs?
I know that graphviz/dot supports this, but can't find a way to tell doxygen my preference.
A similar question was asked recently, and I am repeating my answer to it here:
Doxygen: Is it possible to control the orientation of dependency graphs?
After looking for the same myself and finding nothing, the best I can offer is a hack using the graph attribute rankdir.
Step 1) Make sure Doxygen keeps the dot files. Put DOT_CLEANUP=NO in your config file.
Step 2) Find the dot files that Doxygen generated. They should be in the form *__incl.dot. In the steps below I will refer to this file as <source>.dot
Step 3a) Assuming the dot file did not explicitly specify rankdir (it is "TB" by default), regenerate the output with this command.
dot -Grankdir="LR" -Tpng -o<source>.png -Tcmapx -o<source>.map <source>.dot
Step 3b) If for some reason rankdir is specified in the dot file, go into the file and set rankdir="LR" there (by default rankdir is "TB").
digraph "AppMain"
{
rankdir="LR";
...
Then regenerate the output with:
dot -Tpng -o<source>.png -Tcmapx -o<source>.map <source>.dot
You need to redo this after every run of Doxygen. A batch file might be handy, especially if you want to process all files. For step 3b, batch replacing text is outside of the scope of this answer :). But here seems to be a good answer:
How can you find and replace text in a file using the Windows command-line environment?
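On a Unix-like shell, the editing in step 3b and the re-rendering could be scripted roughly as follows. This is only a sketch under assumptions: the function names set_rankdir_lr and regen_incl_graphs are invented for illustration, the graph body is assumed to open with a "{" at the start of a line (as in the example above), and the dot invocation is the one from step 3b.

```shell
# set_rankdir_lr FILE: force left-to-right layout in a Graphviz dot file.
# Rewrites an existing rankdir attribute, or adds one after the opening brace.
set_rankdir_lr() {
    if grep -q 'rankdir=' "$1"; then
        # Replace the current two-letter value (TB, BT, RL, ...) with LR.
        sed 's/rankdir="\{0,1\}[A-Z][A-Z]"\{0,1\}/rankdir="LR"/' "$1" > "$1.tmp"
    else
        # Insert the attribute right after the first line that starts with "{".
        awk '!done && /^[{]/ { print; print "  rankdir=\"LR\";"; done = 1; next } { print }' "$1" > "$1.tmp"
    fi
    mv "$1.tmp" "$1"
}

# regen_incl_graphs DIR: patch every include graph in DIR and render it again.
regen_incl_graphs() {
    for f in "$1"/*__incl.dot; do
        [ -e "$f" ] || continue          # no matching files in DIR
        set_rankdir_lr "$f"
        dot -Tpng -o"${f%.dot}.png" -Tcmapx -o"${f%.dot}.map" "$f"
    done
}
```

Calling regen_incl_graphs on the directory that holds the generated dot files after each Doxygen run would then take the place of the manual edit.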

How do I "diff" multiple files against a single base file?

I have a configuration file that I consider to be my "base" configuration. I'd like to compare up to 10 other configuration files against that single base file. I'm looking for a report where each file is compared against the base file.
I've been looking at diff and sdiff, but they don't completely offer what I am looking for.
I've considered diff'ing the base against each file individually, but my problem then becomes merging those into a report. Ideally, if the same line is missing in all 10 config files (when compared to the base config), I'd like that reported in an easy to visualize manner.
Notice that some rows are missing in several of the config files (when compared individually to the base). I'd like to be able to put those on the same line (as above).
Note, the screenshot above is simply a mockup, and not an actual application.
I've looked at using some Delphi controls for this and writing my own (I have Delphi 2007), but if there is a program that already does this, I'd prefer it.
The Delphi controls I've looked at are TDiff, and the TrmDiff* components included in rmcontrols.
For people who are still wondering how to do this, diffuse is the closest answer: it does an N-way merge by displaying all files and doing a three-way merge among neighbors.
None of the existing diff/merge tools will do what you want. Based on your sample screenshot you're looking for an algorithm that performs alignments over multiple files and gives appropriate weights based on line similarity.
The first issue is weighting the alignment based on line similarity. Most popular alignment algorithms, including the one used by GNU diff, TDiff, and TrmDiff, do an alignment based on line hashes, and just check whether the lines match exactly or not. You can pre-process the lines to remove whitespace or change everything to lower-case, but that's it. Add, remove, or change a letter and the alignment thinks the entire line is different. Any alignment of different lines at that point is purely accidental.
Beyond Compare does take line similarity into account, but it really only works for 2-way comparisons. Compare It! also has some sort of similarity algorithm, but it is also limited to 2-way comparisons. Similarity matching can slow down the comparison dramatically, and I'm not aware of any other component or program, commercial or open source, that even tries.
The other issue is that you also want a multi-file comparison. That means either running the 2-way diff algorithm a bunch of times and stitching the results together or finding an algorithm that does multiple alignments at once.
Stitching will be difficult: your sample shows that the original file can have missing lines, so you'd need to compare every file to every other file to get a bunch of alignments, and then you'd need to work out the best way to match those alignments up. A naive stitching algorithm is pretty easy to do, but it will get messed up by trivial matches (blank lines, for example).
There are research papers that cover aligning multiple sequences at once, but they're usually focused on DNA comparisons, so you'd definitely have to code it up yourself. Wikipedia covers a lot of the basics, then you'd probably need to switch to Google Scholar.
Sequence alignment
Multiple sequence alignment
Gap penalty
Try Scooter Software's Beyond Compare. It supports 3-way merge and is written in Delphi / Kylix for multi-platform support. I've used it pretty extensively (even over a VPN) and it's performed well.
for f in file1 file2 file3 file4 file5; do printf '%s\n\n' "$f" >> outF; diff "$f" baseFile >> outF; printf '\n\n' >> outF; done
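The loop above concatenates the individual diffs. To get closer to the one-row-per-base-line report the question asks for, a naive variant could list, for every line of the base file, the files that lack it. This is a sketch only: the function name missing_report is made up, lines are matched exactly and without regard to position, so it is not a real alignment, and it does not report lines that exist only in the other files.

```shell
# missing_report BASE FILE...: for each line of BASE, list the files that lack it.
# Lines are compared as exact whole-line matches (grep -F -x); order is ignored.
missing_report() {
    base=$1
    shift
    while IFS= read -r line || [ -n "$line" ]; do
        [ -n "$line" ] || continue       # skip blank lines, they match trivially
        missing=""
        for f in "$@"; do
            grep -Fxq -e "$line" "$f" || missing="$missing $f"
        done
        # Print only base lines that are absent from at least one file.
        [ -z "$missing" ] || printf '%s\tmissing in:%s\n' "$line" "$missing"
    done < "$base"
}
```

For example, missing_report baseFile file1 file2 file3 prints each affected base line once, followed by a tab and the names of all files that do not contain it.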
Diff3 should help. If you're on Windows, you can use it from Cygwin or from diffutils.
I made my own diff tool DirDiff because I didn't want parts that match two times on screen, and differing parts above each other for easy comparison. You could use it in directory-mode on a directory with an equal number of copies of the base file.
It doesn't render exports of diffs, but I'll list it as a feature request.
You might want to look at some Merge components as what you describe is exactly what Merge tools do between the common base, version control file and local file. Except that you want more than 2 files (+ base)...
Just my $0.02
SourceGear DiffMerge is nice (and free) for Windows-based file diffing.
I know this is an old thread but vimdiff does (almost) exactly what you're looking for with the added advantage of being able to edit the files right from the diff perspective.
But none of the solutions does more than 3 files still.
What I did was messier, but for the same purpose (comparing contents of multiple config files, no limit except memory and BASH variables)
While loop to read a file into an array:
loadsauce () {
# Read $SRC line by line into the SRCCNT array
SRCCNT=()
index=0
while IFS= read -r line
do SRCCNT[index]=$line
let index=index+1
done < "$SRC"
}
Again for the target file
loadtarget () {
# Read $TRG line by line into the TRGCNT array
TRGCNT=()
index=0
while IFS= read -r line
do TRGCNT[index]=$line
let index=index+1
done < "$TRG"
}
string comparison
brutediff () {
# Brute force string compare, probably duplicates diff
# This is very ugly but it will compare every line in SRC against every line in TRG
# Grep might do better, version included for completeness
for selement in $(seq 0 $((${#SRCCNT[@]} - 1)))
do for telement in $(seq 0 $((${#TRGCNT[@]} - 1)))
# Compare the line contents, not the array indices
do [[ "${SRCCNT[$selement]}" == "${TRGCNT[$telement]}" ]] && echo "${SRCCNT[$selement]} is in ${SRC} and ${TRG}" >> "$OUTMATCH"
done
done
}
and finally a loop to do it against a list of files
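for sauces in $(cat "$SRCLIST")
do echo "Checking ${sauces}..."
# Assumption: each entry in $SRCLIST is a source file; TRG and OUTMATCH are set beforehand
SRC=$sauces
loadsauce
loadtarget
brutediff
echo -n "Done, "
done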
It's still untested/buggy and incomplete (like sorting out duplicates or compiling a list for each line with common files,) but it's definitely a move in the direction OP was asking for.
I do think Perl would be better for this though.
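The comment in brutediff mentions grep as an alternative to the nested loops. One possible reading of that idea is to let grep do the line-by-line matching with fixed strings. The function name common_lines is made up for this sketch, and like the loops above it ignores line positions.

```shell
# common_lines SRC TRG: print every line of TRG that also occurs, verbatim, in SRC.
# -F treats the patterns as fixed strings, -x requires whole-line matches,
# -f reads the patterns (one per line) from SRC.
common_lines() {
    # grep exits non-zero when nothing matches; "|| true" hides that.
    grep -Fx -f "$1" "$2" || true
}
```

Run once per file from the list against the same source file, this would produce the same kind of "is in both" output without holding either file in shell arrays.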
