March 1, 2011
» Scamraeg

According to information from Symantec, the Welsh language has reached a new level in its online use – it is now being used for scam emails. Is this progress, I wonder?

May 20, 2010
» A less bilingual Welsh Assembly

I see from a story on the BBC that English-language speeches in the Welsh Assembly will no longer be translated into Welsh in the official record of proceedings. While this has raised eyebrows/objections/hackles in various quarters for a variety of different reasons, it is interesting that part of the argument in favour of this refers to “…proposals to make the records of our debates and proceedings more user-friendly by imaginative use of modern technology.” Given the importance of parallel bilingual texts for technology such as Google Translate, I can’t help wondering whether this will in fact inhibit the use (imaginative or otherwise) of Welsh in future modern technology, and whether the strategic goal to “increase participation in the democratic process here in Wales” will ultimately result in cementing English as the language of Welsh politics.

While this may be an easy way to save £250,000 in austere times, we should perhaps be mindful of unintended consequences, both directly for the language and indirectly in terms of the erosion of the Assembly’s claim to be “an exemplar organisation in its delivery of bilingual services.” Exemplar organisations are needed to develop and demonstrate innovative, effective, affordable, responsive bilingual organisational practices – if the Welsh Assembly can’t fulfil this role, who else will?

March 22, 2010
» BookCrossing and Your Language Depend on YOU!

The latest issue of BookCrossing News includes a call for volunteers to form teams to translate the forthcoming BookCrossing 2.0 interface. The call doesn’t mention any specific languages, and they are only looking for teams of up to three people. So will we see BookCrossing in Welsh? We’ll have to wait and see.

Read on if you are interested in the call…

“BookCrossing and Your Language Depend on YOU!

Launch of the new BookCrossing 2.0 website approaches. It promises many new and exciting features, but perhaps the most eagerly anticipated change will be the ability to view BookCrossing in languages other than English. Call it localization, internationalization or even I18n, it all comes down to you being able to read www.bookcrossing.com in your language of choice. To make this happen, we need your help. Some professional translators will be used, but it often takes a member of the community to understand BookCrossing jargon and how the site operates. That’s where you come in.

If you love BookCrossing, have some time to translate English text into your language, and want to help get wording and meaning just right, please consider lending your language skills to the site. Email laura@bookcrossing.com with the following:

• Name
• Screen name
• Primary Language
• Credentials
• Time Commitment
• Willingness to work with deadlines (quick turnaround time within 4-5 days of assignment)

Language-based teams of up to 3 people will be formed before the end of March, so send in your volunteer information as soon as possible! Please believe that this is a very worthy project. Once it’s completed, you’ll take well-deserved pride in having been part of its creation, and the international membership will be indebted to all of you. So please sign up — BookCrossing and your language need you!”

September 8, 2009
» Changing rooms at Debenhams

As the words were leaving my mouth, I knew the question I was asking was stupid, “Are the changing rooms here?” The look on the faces of the two ladies on the till only served to confirm the fact. I was clearly and obviously standing at the checkout.

What was I thinking, or rather, what language was I thinking in…

...I really was confused, but it really wasn’t my fault (this time).

I had clearly followed the sign that said “Ystafell wisgo” – unfortunately this really wasn’t a good translation of the English “This way to pay”.

The ladies on the till seemed mildly amused and faintly embarrassed when this was pointed out to them.

A different sign revealed that it wasn’t a good translation for “Cosmetics” either (no, I was still looking for the changing rooms actually).

Why would a company spend money on translating their signs but then have no quality control procedures in place? Surely you wouldn’t need to be a Welsh speaker to think it unlikely that the Welsh for “This way to pay” and “Cosmetics” was the same!

September 2, 2009
» Google gyfieithu

Whilst idly considering the issues of bilingual blogging I came across an announcement about Google Translate on the murmur blog.

Google Translate now includes Welsh among its languages, but by Google’s own admission the quality of translation is “still a little rough”…

...but how rough? On their Research Blog Google suggest that it is “often good enough to give a basic understanding of the text”, so I decided to put it to the test with some text borrowed from murmur.

Original Welsh:

Dulliau ystadegol o gyfieithu peirianyddol sy’n galluogi’r cyfieithiadau yma – hynny yw, mae Google yn defnyddio’r swmp anferthol o destunau sydd ar gael ar y we i ganfod patrymau tebyg yn eu geiriau a thrwy ddefnyddio testunau cyfochrog mewn gwahanol ieithoedd maent yn chwydu allan yr hyn sy’n cyfateb agosaf mewn iaith arall.

Google translation:

Statistical machine translation methods which enable the translation here – that is, Google will use huge quantities of texts available on the web to find similar patterns in their words and by using parallel texts in different languages they vomiting out what is the nearest equivalent in another language.

Original English:

These translations are produced by statistical approaches to machine translation – that is, Google uses the vast amounts of text on the web in order to find patterns between the words, and by using parallel texts of different languages are able to produce what appear to be translations.

Google translation:

Mae’r rhain cyfieithiadau yn cael eu cynhyrchu gan ddulliau ystadegol i peiriant cyfieithu – hynny yw, mae Google yn defnyddio symiau helaeth o’r testun ar y we er mwyn canfod patrymau rhwng y geiriau, a thrwy ddefnyddio ochr yn ochr testunau o ieithoedd gwahanol yn cael eu gallu i gynhyrchu hyn yn ymddangos yn cyfieithiadau .

Well, the Welsh to English is a little colourful perhaps, but pretty good I think. English to Welsh looks like something I might have written, but again I think the gist is there (more fluent Welsh speakers may disagree?). It will also apparently improve over time.

One of the interesting comments that Google make is that “We’ve found that one of the most important factors in adding new languages to our system is the ability to find large amounts of translated documents from which our system automatically learns how to translate. As a result, the set of languages that we’ve been able to develop is more closely tied to the size of the web presence of a language and less to the number of speakers of the language.” Whilst we might be critical of the number of obscure forms and dull documents which are produced bilingually on the web under the mantle of Welsh Language Schemes, perhaps we are now seeing some unanticipated benefits for the language – or maybe people cleverer than me had planned this all along?
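
To make the idea of learning from translated documents a little more concrete, here is a minimal Python sketch. It is a toy under stated assumptions and certainly not Google's actual system: the four sentence pairs are a hand-made corpus, the function names are my own, and I have swapped in a simple Dice co-occurrence score as the "learning" step. It only shows why more parallel text gives a statistical system more evidence about which words go together.

```python
from collections import Counter
from itertools import product

# Tiny hand-made parallel corpus (illustrative only, not real training data).
PARALLEL = [
    ("ci mawr", "big dog"),
    ("ci bach", "small dog"),
    ("llyfr mawr", "big book"),
    ("llyfr bach", "small book"),
]

def train(pairs):
    """Count Welsh words, English words, and Welsh/English word pairs
    that appear together in the same aligned sentence pair."""
    cy_counts, en_counts, pair_counts = Counter(), Counter(), Counter()
    for cy_sentence, en_sentence in pairs:
        cy_words = set(cy_sentence.split())
        en_words = set(en_sentence.split())
        cy_counts.update(cy_words)
        en_counts.update(en_words)
        pair_counts.update(product(cy_words, en_words))
    return cy_counts, en_counts, pair_counts

def best_translation(word, cy_counts, en_counts, pair_counts):
    """Return the English word with the highest Dice score for a Welsh word."""
    def dice(en_word):
        together = pair_counts[(word, en_word)]
        return 2 * together / (cy_counts[word] + en_counts[en_word])
    return max(en_counts, key=dice)

counts = train(PARALLEL)
for welsh in ["ci", "llyfr", "mawr", "bach"]:
    print(welsh, "->", best_translation(welsh, *counts))
```

With only four sentence pairs the counts are enough to pair "ci" with "dog" and "mawr" with "big", even though the adjective comes after the noun in Welsh. With a single pair there would be nothing to tell the two words apart, which is the point Google make about needing a large web presence.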

July 7, 2009
» History mis-repeating itself

Yet another funny (perhaps, though the joke is wearing a little thin now) story about a mistranslated sign, “Walkers’ sign lost in translation” – you know the sort of thing, English says “look left”, Welsh says “look right”... hang on, that sounds awfully familiar…

...yes, on 16th January 2006 the BBC carried a very similar story under the headline “Pedestrian Sign’s Forked Tongue”. Rather intriguingly the left and right were in the opposite languages in 2006, so with a little bit of careful peeling they could perhaps create two correctly translated signs. It does make you wonder what sort of quality control procedures are employed by the companies creating these signs – sigh!

May 13, 2009
» Automatic translation from Welsh gets a boost from France!

Having previously suggested that Google might be doing some interesting things with regards to minority languages, I was delighted to receive the following press release about the Apertium Welsh-English translator.

I know that they have had a few set-backs in the past, and they seem like nice guys, so it is great to see them getting a bit of a boost.

I think Fran’s comments about not getting any Welsh students applying for the post are interesting – many people have commented that Wales should be well-placed to be a leader in bilingual software design, localisation, translation technology and so on, and we have some great people doing some excellent work around the country – maybe we need to think about how this might be more directly fed into the computing and other curricula in universities to really build a knowledge/skill base and develop an industry.

Press Release 12 May 2009

Automatic translation from Welsh gets a boost from France!

High-quality Welsh-English machine translation will come a step closer when a new initiative gets underway this month.

The multinational Apertium team, which released their Welsh-English translator (http://www.cymraeg.org.uk [1]) in August 2008, has been accepted into the fifth Google Summer of Code™ [2], and one of the projects to be funded will be an improvement to that translator.

Apertium (http://www.apertium.org) is a Free Software [3] machine translation platform. It was first developed to handle translation between related languages in Spain, but over the last few years it has been extended to deal with other languages. To date, translators for 17 language pairs have been released, covering languages spoken by 1.1bn people, from English (est. 500m speakers) to Aranese (est. 4,000 speakers). A similar number of other language pairs are in development – these include Indian languages like Hindi and Bengali, and Scandinavian languages like Norwegian and Sami.

Google Summer of Code offers student developers stipends to write code for open-source projects, advised by mentors already working on the projects, and has helped create millions of lines of code for dozens of projects. This was the first year that Apertium applied for the program, and 9 Apertium projects are being supported.

The Apertium Welsh-English translator works by applying grammatical rules to a Welsh sentence to turn it into an English sentence. An alternative approach (adopted by software like Moses [4]) is to use a large body of text to work out what the likely translation of a given phrase is.

The Summer of Code student, Gabriel Synnaeve from Grenoble, France [5], will be working on combining these two approaches, using techniques developed at Carnegie-Mellon University in the USA [6]. The aim is to improve the quality of the translation – in effect, the Apertium and Moses translations will be compared, and the best bits of each will be used in the final translation.

For instance, take the Welsh sentence: “Mae Heddlu’r De yn ymchwilio i farwolaeth dyn 41 oed o Abertawe.” (South Wales Police are investigating the death of a 41-year old man from Swansea.)

Apertium currently produces: “South Wales Police is investigating death man 41 years old from Swansea.”

Moses currently produces: “the south wales police investigation into the death of a man 41 years of age of abertawe.”

The aim is to combine the best chunks from each program, so that we get something like:
  • [is investigating] +[the death of a man] *[41 years old] *[from Swansea]

Here, the chunks marked * come from Apertium, and the one marked + from Moses, and combining both improves the quality of the translation.
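
As a rough illustration of the chunk-picking idea, here is a toy Python sketch. It makes several assumptions that are not in the press release: the candidate chunks are already lined up slot by slot, each chunk carries a made-up quality score, and the `Chunk`, `combine` and `render` names are hypothetical. It does not use the real Apertium or Moses interfaces, and it replaces the Carnegie-Mellon word-matching techniques with a simple "take the highest score" rule.

```python
from typing import NamedTuple

class Chunk(NamedTuple):
    text: str
    engine: str   # which engine proposed this chunk
    score: float  # hypothetical quality score, higher is better

def combine(slots: list[list[Chunk]]) -> list[Chunk]:
    """For each slot in the sentence, keep the highest-scoring candidate."""
    return [max(candidates, key=lambda c: c.score) for candidates in slots]

def render(chunks: list[Chunk]) -> str:
    """Mark Apertium chunks with * and Moses chunks with +."""
    marks = {"apertium": "*", "moses": "+"}
    return " ".join(f"{marks[c.engine]}[{c.text}]" for c in chunks)

# Candidate chunks taken from the two outputs quoted above; scores are invented.
slots = [
    [Chunk("is investigating", "apertium", 0.9),
     Chunk("investigation into", "moses", 0.4)],
    [Chunk("death man", "apertium", 0.3),
     Chunk("the death of a man", "moses", 0.9)],
    [Chunk("41 years old", "apertium", 0.8),
     Chunk("41 years of age", "moses", 0.6)],
    [Chunk("from Swansea", "apertium", 0.9),
     Chunk("of abertawe", "moses", 0.2)],
]

combined = combine(slots)
print(render(combined))
```

The hard parts in a real system are exactly the ones assumed away here: working out which chunks from the two engines correspond to each other, and scoring them without a human in the loop.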

This is cutting-edge stuff, and has rarely been tried before. Prof Harold Somers, in a 2004 report for the Welsh Language Board [7], suggested that a medium-term goal for machine translation in Welsh would be “to integrate … different [machine translation] engines into a single system”. Nothing has been done on that to date, and Gabriel’s work will be the first attempt to bring this vision of “multi-engine machine translation” for Welsh closer to reality.

Francis Tyers [8], who will be mentoring Gabriel, said, “I was quite surprised that we didn’t get any Welsh students applying, but this is a fantastic opportunity to improve Welsh language technology. I have no doubt we’ll see some real gains in the translation quality.”

Gabriel has already started work. “At the minute I’m fine-tuning the Moses Welsh-English translator to make it as efficient as possible. The Apertium community is very friendly, and I wanted to participate in a big open source project, so I’m glad I went for it.”

Kevin Donnelly [9], who co-developed the Apertium Welsh-English translator with Francis, noted that this was a big step forward for Welsh. “It is wonderful that so many talented people are working on Apertium, and that they are giving Welsh such a high priority. What we need now is for bodies promoting Welsh here in Wales to step up to the plate and give whatever encouragement and other support they can.”

Notes

[1] http://ufal.mff.cuni.cz/pbml-91-100.html. Francis Tyers and Kevin Donnelly (2009): “apertium-cy – a collaboratively-developed free RBMT system for Welsh to English”, Prague Bulletin of Mathematical Linguistics, 91.

[2] http://code.google.com/soc

[3] http://www.fsf.org/about/what-is-free-software. The Free Software Foundation’s definition of “Free Software” is software that the user is free to use, copy, change, and distribute.

[4] http://www.statmt.org/moses. Moses is an open-source statistical machine translation system.

[5] Gabriel Synnaeve is a student at the École Nationale Supérieure d’Informatique et de Mathématiques (http://ensimag.grenoble-inp.fr), a leading informatics and mathematics centre. He will graduate in September 2009 and will then begin work on a doctorate on Bayesian machine learning.

[6] Alon Lavie (http://www.cs.cmu.edu/alavie) is leading this work. See also: http://www.cs.cmu.edu/alavie/papers/EAMT-2005-MEMT.pdf. S. Jayaraman and A. Lavie (2005): “Multi-Engine Machine Translation Guided by Explicit Word Matching”, Proceedings of EAMT-2005.

[7] http://www.byig-wlb.org.uk/english/publications/publications/2302.doc. Harold Somers (2004): “Machine translation and Welsh: the way forward.”, Report for the WLB.

[8] Francis Tyers studied computer science at Aberystwyth, and is now a language engineer for Prompsit Language Engineering, S.L. and a PhD student at the Universitat d’Alacant. He is a key Apertium developer, with a special interest in extending it to handle the Celtic languages.

[9] Kevin Donnelly has been working on Free Software in Welsh since 2003, and developed the online Welsh dictionary Eurfa (http://www.eurfa.org.uk).

Contact: Kevin Donnelly, 01248-715925, kevin@dotmon.com

=====

Datganiad i’r Wasg 12 Mai 2009

Cyfieithu awtomatig o’r Gymraeg yn cael hwb o Ffrainc!

Bydd cyfieithu peirianyddol o ansawdd da o Gymraeg i Saesneg yn dod yn agosach pan gychwynnir ar broject newydd y mis yma.

Mae’r tîm rhyngwladol Apertium, a ryddhaodd eu cyfieithydd Cymraeg-Saesneg (http://www.cymraeg.org.uk [1]) ym mis Awst 2008, wedi cael ei dderbyn i mewn i’r pumed Google Summer of Code™ [2], a bydd gwelliannau i’r cyfieithydd hwn yn cael ei ariannu fel un o’r projectau.

Platfform cyfieithu peirianyddol yw Apertium (http://www.apertium.org), sy’n Feddalwedd Rhydd [3]. Datblygwyd yn y dechrau i gyfieithu rhwng ieithoedd sy’n perthyn i’w gilydd yn Sbaen, ond dros y blynyddoedd diweddar estynnwyd y rhaglen i drin ieithoedd eraill. Hyd yn hyn, mae cyfieithyddion ar gyfer 17 pâr o ieithoedd wedi eu rhyddhau, yn cynrychioli 1.1bn o bobl, o Saesneg (tua 500m o lefarwyr) i Araneg (tua 4,000 o lefarwyr). Mae nifer tebyg o barau eraill yn cael eu datblygu, sy’n cynnwys ieithoedd Indeg megis Hindi a Bengaleg, ac ieithoedd Scandinafaidd megis Norwyeg a Sami.

Mae Google Summer of Code yn cynnig lwfans i fyfyrwyr i ysgrifennu cod ar gyfer projectau cod-agored, gyda chyngor gan fentoriaid sy’n gweithio eisoes ar y projectau, ac mae o wedi helpu i greu miliynau o linellau o god ar gyfer dwsinau o brojectau. Dyma’r flwyddyn cyntaf i Apertium wneud cais i’r rhaglen, ac ariannir 9 o brojectau Apertium.

Mae’r cyfieithydd Cymraeg-Saesneg Apertium yn gweithio gan weithredu rheolau gramadegol i frawddeg Gymraeg i’w throi hi’n frawddeg Saesneg. Ffordd arall o wneud hyn (a ddefnyddir gan feddalwedd megis Moses [4]) yw defnyddio corff mawr o destun i weithio allan beth yw’r cyfieithiad tebygol am unrhyw ymadrodd.

Bydd y myfyriwr, Gabriel Synnaeve o Grenoble, Ffrainc [5], yn ceisio cyfuno’r ddwy ffordd yma o weithio, gan ddefnyddio technegau a ddatblygwyd ym Mhrifysgol Carnegie-Mellon yn yr UDA [6]. Yr amcan yw gwella ansawdd y cyfieithiad – bydd y cyfieithiadau Apertium a Moses yn cael eu cymharu, a’r darnau gorau o bob un yn cael eu defnyddio yn y cyfieithiad terfynol.

Er enghraifft, gweler y frawddeg Gymraeg: “Mae Heddlu’r De yn ymchwilio i farwolaeth dyn 41 oed o Abertawe.”

Mae Apertium ar hyn o bryd yn cynhyrchu: “South Wales Police is investigating death man 41 years old Swansea.”

Mae Moses ar hyn o bryd yn cynhyrchu: “the south wales police investigation into the death of a man 41 years of age of abertawe.”

Y bwriad yw cyfuno’r darnau gorau o bob rhaglen, i gynhyrchu rhywbeth fel:
  • [is investigating] +[the death of a man] *[41 years old] +[of] *[Swansea] Yma, mae’r darnau a nodir gan * yn dod o Apertium, a’r rhai a nodir gan + o Moses, ac mae cyfuno’r ddau yn gwella ansawdd y cyfieithiad.

Dyma waith arloesol, heb ei wneud o’r blaen. Awgrymodd yr Athro Harold Somers, mewn adroddiad ym 2004 ar gyfer Bwrdd yr Iaith [7], y dylai amcan tymor-canol ar gyfer cyfieithu peirianyddol yn Gymraeg fod “to integrate … different [machine translation] engines into a single system”. Nid oes unrhyw beth wedi ei wneud hyd yn hyn, a gwaith Gabriel fydd y cais cyntaf i ddod â’r syniad yma o “multi-engine machine translation” ar gyfer y Gymraeg yn agosach i fodolaeth.

Dywedodd Francis Tyers [8], fydd yn rhoi cyngor i Gabriel, “Dipyn o siom oedd hi nad oedden ni’n cael cais gan fyfyriwr Cymreig, ond mae hyn yn gyfle gwych i wella technoleg iaith yn Gymraeg. Rydym ni’n siŵr o weld cynnydd o safbwynt ansawdd y cyfieithu.”

Mae Gabriel wedi cychwyn ar y gwaith eisoes. “Ar hyn o bryd dwi’n gwneud newidiadau mân i’r cyfieithydd Moses i’w wneud mor effeithlon â phosib. Mae’r gymuned Apertium yn gyfeillgar iawn, ac roeddwn i eisiau cyfrannu i broject mawr cod-agored, felly dwi’n falch nes i’r cais.”

Dywedodd Kevin Donnelly [9], a weithiodd gyda Francis i greu’r cyfieithydd Cymraeg-Saesneg Apertium, fod hwn yn gam mawr i’r Gymraeg. “Mae’n ardderchog cael cymaint o bobl dalentog yn gweithio ar Apertium, a braf yw hi gweld eu bod nhw’n ystyried Cymraeg fel blaenoriaeth. Yr hyn sydd angen rŵan yw ymdrech gan y mudiadau sy’n hybu Cymraeg yma yng Nghymru i annog a rhoi cefnogaeth i’r gwaith yma.”

Notes

[1] http://ufal.mff.cuni.cz/pbml-91-100.html. Francis Tyers and Kevin Donnelly (2009): “apertium-cy – a collaboratively-developed free RBMT system for Welsh to English”, Prague Bulletin of Mathematical Linguistics, 91.

[2] http://code.google.com/soc

[3] http://www.fsf.org/about/what-is-free-software. Mae’r Free Software Foundation yn diffinio “Meddalwedd Rhydd” fel meddalwedd y gellir ei ddefnyddio, copïo, newid a dosbarthu gan y defnyddiwr.

[4] http://www.statmt.org/moses. System cyfieithu peirianyddol ystadegol yw Moses – mae’n god-agored.

[5] Gabriel Synnaeve yw myfyriwr yn yr École Nationale Supérieure d’Informatique et de Mathématiques (http://ensimag.grenoble-inp.fr), canolfan bwysig ar gyfer mathemateg ac thechnoleg gwybodaeth. Bydd o’n graddio ym mis Medi 2009, ac yn cychwyn gwaith wedyn ar ddoethuriaeth ar ddysgu peirianyddol Bayesaidd.

[6] Alon Lavie (http://www.cs.cmu.edu/alavie) is leading this work. See also: http://www.cs.cmu.edu/alavie/papers/EAMT-2005-MEMT.pdf. S. Jayaraman and A. Lavie (2005): “Multi-Engine Machine Translation Guided by Explicit Word Matching”, Proceedings of EAMT-2005.

[7] http://www.byig-wlb.org.uk/english/publications/publications/2302.doc. Harold Somers (2004): “Machine translation and Welsh: the way forward.”, Report for the WLB.

[8] Astudiodd Francis Tyers wyddoniaeth cyfrifiadurol yn Aberystwyth, ac ar hyn o bryd mae’n beiriannwr iaith gyda Prompsit Language Engineering, S.L. ac yn fyfyriwr PhD ym Mhrifysgol Alacant. Mae’n un o’r datblygwyr blaenorol Apertium, gyda diddordeb arbennig yn ei estyn i drin yr ieithoedd Celtaidd.

[9] Mae Kevin Donnelly wedi bod yn gweithio ar Feddalwedd Rhydd yn Gymraeg ers 2003, a datblygodd Eurfa, geiriadur arlein Cymraeg (http://www.eurfa.org.uk).

Cysylltwch â: Kevin Donnelly, 01248-715925, kevin@dotmon.com

March 3, 2009
» Perfect Translation and Website Localization Services

I get a depressingly large amount of spam email, which our spam filter does a pretty good job of identifying and tagging so that I can shift it off to a folder for a quick review before deleting. It’s all the usual kind of stuff – warning messages from banks, promises of enlargement, shady offers to transfer millions of dollars to my account and so on.

However, in amongst all the chaff I just found this single grain of wheat, which I feel I have to share:

Subject: Perfect Translation and Website Localization Services

Dear Sirs/Madamp

We are a porofessional translation and cultural solution company with over 10 years of experience in the market.

...

and so it goes on.

So much joy from one small email.

November 25, 2008
» Serendipity

Yesterday was one of those days when I was looking for something on the WWW, didn’t find it, but found some other (more) interesting things instead.

Warning: both of the more interesting things I found are large PDF files.

The first thing is a dissertation entitled Appropriating New Technology for Minority Language Revitalization: The Welsh Case by Mourad Ben Slimane at the Freie Universität Berlin (it’s in English).

The second thing is a dissertation entitled The Role of Online MT in Webpage Translation by Federico Gaspari at the University of Manchester.

I haven’t had a chance to read either of them yet – enjoy!

November 20, 2008
» 'Historic' use of Welsh in EU

Normally I’d consider a report about the use of Welsh in the EU to be of only passing interest, without much direct relevance to technology. However Mark’s comments on my post about Google Reader flag up the importance of having a large corpus of parallel texts to enable machine translation. Perhaps this would be a less obvious, but more tangible benefit of any wider use of Welsh within the EU in the future.