Apache OpenOffice (AOO) Bugzilla – Issue 59960
Add barbarisms correction to spellchecking
Last modified: 2014-02-24 17:52:13 UTC
REQ: Add barbarisms correction

Well, sorry for the long text :-(

* Some info about AbiWord barbarisms support
My explanation of what a barbarism is comes from this mail thread:
http://www.abisource.com/mailinglists/abiword-dev/02/Sep/0498.html

* What is a barbarism
Barbarisms are a problem that mainly concerns minority languages, i.e. languages that compete, in the same territory, with a more powerful one, called the "roof language". Examples are Welsh, Catalan, Occitan, and others. When two languages compete in the same territory, interference arises, but it is not symmetric: the roof language is weakly affected, while the minority language can be strongly affected and can even disappear (glottophagy). One of these interferences is the barbarism.

* Example
In Catalan, "tamany" is taken from Spanish "tamaño" and should be corrected to "mida" or "grandària" ("size" in English). A spellchecker without barbarism support does not suggest "mida" or "grandària" when "tamany" is checked.

* Other use cases for the same feature
Another possibility for this feature is what I call "custom user suggestions". Example: if a user mistypes the same word again and again, but hunspell can't suggest the correct word, the user can add the mistyped word to the barbarisms data file with the correct word as its suggestion. Another example: imagine a common, very long text (a company name, or any other text). The user can create a dummy wrong word (such as myword01) in the barbarisms data file and add the correct text as its suggestion (my real very large company name).

* How to implement it (idea or approach)
Use MyThesaurus with a special thesaurus file in which the entries are barbarisms and their synonyms are the correct suggestions.
Add something like this to the suggest() function (suggestmgr.cxx):

    if ((nsug < maxSug) && (nsug > -1)) nsug = barbarisms(wlst, word, nsug);

And code a barbarisms() function:
- barbarisms() must look the word up in the barbarism 'thesaurus' file.
- If the word is in the 'thesaurus' file, barbarisms() must add the 'synonyms' of the word to wlst as suggestions.
- barbarisms() must update nsug properly.

* Known problems with this approach
- It works at word level, not sentence level. We are just hacking a spell checker, not writing a grammar checker, so some barbarisms can't be corrected. This can't be solved.
- Currently, words that can be inflected have to be entered several times (plurals, verb conjugations, etc.). This is reported as an enhancement of MyThesaurus in OOo (issue 19563): http://www.openoffice.org/issues/show_bug.cgi?id=19563
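To make the suggest() hook and the barbarisms() requirements above more concrete, here is a minimal C++ sketch. It is not the attached diff and not the real MySpell code: the BarbarismTable type, the extra maxSug and table parameters, the use of std::vector instead of MySpell's char** lists, and the helper names suggest_with_barbarisms() and example_table() are all assumptions made to keep the example self-contained. The in-memory table stands in for the special 'thesaurus' file.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical in-memory form of the barbarism 'thesaurus' file:
// each barbarism maps to its list of correct replacements.
using BarbarismTable =
    std::unordered_map<std::string, std::vector<std::string>>;

// Sketch of barbarisms(): appends the replacements for `word` to `wlst`
// and returns the updated suggestion count.
int barbarisms(std::vector<std::string>& wlst, const std::string& word,
               int nsug, int maxSug, const BarbarismTable& table) {
    // Assumption of this sketch: nsug always mirrors the list size.
    assert(nsug >= 0 && static_cast<std::size_t>(nsug) == wlst.size());
    auto entry = table.find(word);
    if (entry == table.end()) return nsug;  // not a known barbarism
    for (const std::string& fix : entry->second) {
        if (nsug >= maxSug) break;  // suggestion list is full
        // Skip replacements the regular suggester already produced.
        if (std::find(wlst.begin(), wlst.end(), fix) != wlst.end()) continue;
        wlst.push_back(fix);
        ++nsug;
    }
    return nsug;
}

// Mirrors the guard proposed for suggest(): call barbarisms() only when the
// list still has room and no earlier step reported an error (nsug == -1).
int suggest_with_barbarisms(std::vector<std::string>& wlst,
                            const std::string& word, int nsug, int maxSug,
                            const BarbarismTable& table) {
    if ((nsug < maxSug) && (nsug > -1))
        nsug = barbarisms(wlst, word, nsug, maxSug, table);
    return nsug;
}

// Example data taken from this issue: a Catalan barbarism and a
// "custom user suggestion" entry.
BarbarismTable example_table() {
    return {
        {"tamany", {"mida", "grandària"}},
        {"myword01", {"my real very large company name"}},
    };
}
```

With this table, checking "tamany" on an empty suggestion list adds "mida" and "grandària", while a word that is not in the table leaves the list unchanged. As noted in the known problems, inflected forms would still need their own entries.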
Created attachment 32842: Diff for MySpell code
The attached file (32842) is just working example code, not a patch. I'm just a beginner programmer.
SBA: issue 64246 is another example of "one word - two spelling options": acknowledgement <-> acknowledgment. In Germany, too, we have the problem of "the new German spelling" vs. many old documents PLUS a workforce of people who did NOT learn the new spelling. Therefore there is a long list of words that are still "tolerated" in the old spelling. Please note that with an "Exception dictionary", the user can easily edit certain proposals for certain words. In my opinion this is more convenient in daily use than enhancing or implementing a Thesaurus-like thingie.