Semi-Automating the DDTP translations: step by step process (contact: #ubuntu-fr-l10n or #ubuntu-translators on freenode).
Rationale
The DDTP represent around 50 000 strings to translate * 140 languages= 7 million strings to translate !
On good weeks, a typical yet active translation team translates 500 strings. It would take a hundred weeks with highly motivated volunteers of a large translation team, working non-stop, at their best to fully translate a language.
The solution
A Word of Caution before beginning
Machine translations don't replace (yet) human translations. It needs to be validated and corrected in many cases.
Depending on the target language, the quality of the translations will fluctuate. (e.g. French, German, Spanish…) should work well. Smaller languages might have more issues.
Another factor is the complexity of the strings. You'll have really good translations for simple strings and poor translations for poorly written or complex descriptions.
Finally, DO take time to do a mass review of the translations after the initial import. Given the size of the DDTP, a simple find and replace can save dozens of hours of translation.
Make sure your review team is closed and has strong standards to avoid validation of poorly translated strings by too enthusiastic members
Process
1/ Preparing the DDTP for Machine Translation
Ask for a Launchpad translation export of the DDTP for main, universe, restricted and multiverse (.po files for your language and .pot templates)
-
-
-
-
Gather the .po and .pot files in a folder on your drive
We had to split the DDTP universe po in 14 files and main in 5 files to make it match Google Translator Kit's size requirements. You'll have around 14 files in UNIVERSE, and 5 in MAIN but 1 in MULTIVERSE (use MULTIVERSE for tests)
Use the Python script below created by Redmar to split the po files in files below 900k
-
Gather the translated po files on a shared online folder and split the work between team members.
2/ Perform the Machine Translation
<REPEAT FOR EACH CHUNK OF THE FILE>
-
You might see errors:
the file might be too big (remove a few strings)
After a couple of seconds, Google Translator Kit displays the file along with automatic translations. You could theoretically review them here, but that's not the point.
You should fix some mistakes directly in Google Translator Kit thanks to its powerful search and replace functionnality that can be limited to translations only
The \n (line break) become \ n which messes everything
The \" become \ " (with a space) which messes everything
Check the po file validates in Poedit (do more fixes directly in Google Translator Kit if there are many mistakes)
</REPEAT FOR EACH CHUNK OF THE FILE>
3/ Bring back the suggestions into your drive
Download all the translated po back to your drive or your shared online folder in a specific folder (named translated to avoid messing up with original files)
4/ Do mass-Replace of obvious mistakes in each file to be able to import them back into Launchpad and also save a lot time when correcting the strings
DO NOT CORRECT ORIGINAL ENGLISH STRINGS: if you do so, Launchpad will consider those new strings (as now they differ) and thus discard the string AND your translation.
Use Gedit syntax coloring to quickly spot mistakes
Use RegEx to correct as many mistakes as possible before initial import into the Bogus Project. The Time Savings are HUGE.
Make sure that the PO file validates (using PoEdit)
It seems that Google Translate adds spaces after \ .
The \n (line break) become \ n which messes everything
The \" become \ " (with a space) which messes everything
The syntaxic coloration in gEdit highlights the mistakes well.
It might create bugs during the launchpad import as "" is a string delimiter.
Strings that you should also be careful for (Global)
Any word in English in the translated part
The beginning of your string should be capitalized (Paquet, Interface…)
Consistency in similar strings
Punctuation and Typography rules (spaces..)
-
Strings that you should be careful for (French specific)
-
5/ Import the PO files back in our bogus project in Launchpad
We can't import them as fuzzy in the DDTP, because Launchpad doesn't handle fuzzy (
https://bugs.launchpad.net/launchpad/+bug/493084) . We don't want to import the automatic translations as "Translated" because there still are many mistakes (ans also because it's fed back to upstream). We thus need to use the "bogus project" solution to show our machine translations as suggestions in the real DDTP. As a result, they should appear as suggestions for the real DDTP so that they can be reviewed by humans.
We have thus created a cross-language project. You shouldn't create your own. Get in touch on the ubuntu-translators ML to import your language.
-
Import the DDTP pot files (templates)
-
-
Import the various automatically translated chunks (next chapter)
Wait for the files to be accepted and imported
6/ Review Process
Go to the DDTP, the imported translations in DDTP automation bogus project should appear as suggestions
Ask for forum members to do a first review for obvious mistakes and make suggestions
Then, team members can make the actual review and validate the translation
Additional Advice
Focus on validating high-priority and high-impact packages first
Using Nightmonkey, focus on the most visible packages. Just replace the langage code on the links below
-
-
-
Focus on mass validations & low-hanging fruits first
For validations, you can use the very buggy search feature on Launchpad. It timeouts 8 times out of 10, but is valuable when it doesn't (Very specific searches reduce the risk of timing out to 50%)
French Specific (you can take inspiration)
Done by Sylvie : "this package provides a module for"
-
Done by Sylvie : "Translation data for all supported packages for:"
-
Done by Sylvie : "Translation data updates for all supported packages for:"
-
Done by Sylvie : "Translation data updates for all supported KDE packages for:"
-
Done by Sylvie : "Translation data for all supported KDE packages for: "
-
Done by Sylvie : KDE translation updates for language
-
Done by Sylvie : "Translation data updates for all supported GNOME packages for:"
-
Done by Sylvie : "Translation data for all supported GNOME packages for: "
-
Done by Sylvie : "KDE translations for language "
-
-
Bring back the validated translations in the Bogus Project
To simplify mass corrections, regularly export the official DDTP in their normal form and reimport it in the Bogus project.
The manually reviewed suggestions will replace the automatic suggestions, reducing time to mass correct mistake and removing the previous automated suggestions.
Recycling fuzzy translations from older releases (contact Redmar for more details)
When you merge the translations of an older version of Ubuntu into the current version, there will be a lot of « fuzzy » translations for strings that are similar (for example, meta packages for different programs, debugging symbols etc).
msgmerge quantal_ddtp.po raring_ddtp.po -o merged_ddtp.po, for example
Those fuzzy strings often only need a few small changes (eg. program name) to be accepted, which can really speed up translations. And you don't have to worry about google putting in a weird translation, since it is all based on earlier translations done by a human.
You have to unfuzzy them using Find & Replace in a text editor. Remove the line with #, fuzzy. Save. Upload.
#. type: newglossaryentry{#2}
#: frontmatter/glossary-entries.tex :15
#, fuzzy
msgid "bla bla english"
msgstr "bla bla translated"
Status, Results, Todo, Bugs & Questions
Results
Todo & Bugs
Create a DDTP automation LP team and invite coordinator for each language
Questions
Your question here
Status for French
Main
6000 suggestions to generate, splitted by Pierre, 5 parts, translated and merged
Restricted
Translated manually, not applicable
Multiverse
300 suggestions to generate, likely technical & difficult, Splitted by Pierre & Merged
Universe
14 parts, translated and merged