That moral little AI is at it again. Learn how to save the planet and use conditional constructions! Full text here.
We have now produced getting on for twenty picture books using C-LARA, and we're trying to figure out why some of them are quite good while others just don't work at all. We have some tentative hypotheses:
1. The AI finds some visual styles easier to work with than others. It's particularly fond of manga/anime.
2. The AI prefers not to use white European characters.
3. The AI likes to include a moral message.
4. The AI likes some quirky humour.
None of these would in any way be odd, if in fact they are correct guesses. There's a great deal of manga/anime on the web that the AI could have trained on, and it's a fairly well-defined style. The AI has certainly been instructed to observe diversity rules when generating images of people, and not make everyone look white and European. Everyone who works with Chat knows it has a strong moral sense. And everyone knows it will try to be funny if you give it an opportunity.
We're creating a series of language texts for low intermediate students of English, and for the current one I thought I would try out our recipe. I gave the AI the following initial prompt:
Could you write the text of a quirky pedagogical picture book for low intermediate students of English entitled "Journalist Jamila Loves Subject-Auxiliary Inversion"? Jamila is a young journalist who speaks excellent English. Her best friend is Farzad, another journalist who reports from war zones. Farzad meets up with Jamila when he's feeling bad and poses rhetorical questions about the terrible things he's seen. He's so upset that he usually forgets how to express himself correctly in English. Jamila is very sympathetic. She gives him a hug and then she corrects his grammar. It's a kind of game between them, and it makes them both feel better.
We want 15-25 pages with 3-5 sentences per page, no page numbers.
For the images, I gave it this prompt for the initial image that would set the style for all the other ones:
An image of Jamila, a young journalist, sitting with her friend Farzad, another young journalist. Farzad is very upset. Jamila is holding his hand and trying to comfort him.
Use a manga style with exaggeratedly large eyes.
The AI did everything else, except that I regenerated a few images where DALL-E-3 clearly hadn't followed GPT-4o's instructions.
I like the result! Some people will say it's overly sentimental, but I was actually kind of moved. I guess I just have a thing for subject-auxiliary inversion.
A bridge player friend recently married an Icelander and is trying to learn her husband's language. I showed her C-LARA and asked if she would like it to write her an easy picture book story to practice with. After a little discussion, we agreed that a dragon would fall in love with a queen who played bridge and have to learn the Acol system to win her heart.
It turned out that GPT-4o's Icelandic is still rather uncertain, but our Icelandic colleague Sölvi helped clean it up. You can see the result here. _______________
In Anna Drottning og Drekinn Dóri: Jötnarnir koma, a group of frost giants turn up, and the only way to save the kingdom is to defeat them at the bridge table. Here, Queen Anna (possibly distracting the giants with her opulent cleavage, though the text is unclear on this point), plunks down the winning card to bring their nefarious schemes to naught.
I asked C-LARA for a pedagogical picture book designed to appeal to 14-year-old male students of English and entitled "Supermodel Lisa Loves Phrasal Verbs". Full text here.
It seems to me that the AI has a reasonable understanding of what 14-year-old boys find interesting (hint: not phrasal verbs). Apologies to female readers who wander in and find they are unable to retain their breakfasts. You have been warned.
I am having so much fun with the new C-LARA picture book functionality, which lets you create a multimodal picture book in less than an hour! This report, which I've just posted here, shows you how to do it, using the fully AI-generated example text Kitten's Busy Morning.
If you were wondering why I've been posting so little the last few weeks, then wonder no more. Available for free download here. ___________
But seriously...
ChatGPT-based Learning And Reading Assistant (C-LARA) is a project that's been taking up most of my time for the last year. The basic idea was to build a web platform that lets people create easy-to-read multimodal texts in foreign languages, and have ChatGPT-4 do as much of the work as possible. Chat appears in two roles. With its software component hat on, it writes the texts, cuts them up into roughly sentence-length pieces, adds glosses to the words, marks them with root forms, and adds TTS audio. With its software engineer hat on, it's written most of the codebase. This report gives you a detailed picture of what we've done.
You may think there are too many details - does it really need to be 144 pages long? But it hasn't primarily been written for you, it's been written for ChatGPT-4. Although the AI is responsible for the greater part of the work performed in the project, it periodically has to be reinitialised, and then I need to tell it who it is again. Having this report available makes the job easy: I can give it the text in half a dozen instalments, which takes a quarter of an hour, and then it's up to speed again.
For humans, here are some of the bits you might find interesting. First of all, we've made C-LARA easier to use. There's a new top level called "Simple C-LARA", which lets you create a multimodal text with an initial request and a couple of button presses. You choose the text language and the glossing language, provide a sentence or two telling the AI what to write, and it creates a short illustrated text for you. (The illustrations come from DALL-E-3.) There are straightforward options to edit and correct when it gets things wrong. You can also paste in an existing piece of text if you prefer, and tell the AI to annotate it instead. For example, that's how I created the multimodal Norwegian passage in my review of Jon Fosse's Melancholia I-II.
Second, we've done some work to evaluate C-LARA. We describe an experiment we presented at the ALTA 2023 conference late last year, where we created six texts of widely different kinds in English, Faroese, Farsi, Mandarin and Russian and carefully checked how often the AI was making mistakes as it wrote and annotated them. Not surprisingly, it's much better at some languages than others. It turns out to be nearly as good at Mandarin as it is at English, but it's clearly worse at Russian, worse again at Farsi, and has serious problems with Faroese. Of course, it's miraculous that it can do anything in Faroese, an obscure Scandinavian language spoken by about 50,000 people. We repeated the experiments a few months later, and found it had improved a good deal in English. We also analyse the codebase to quantify the AI's contribution to writing it. We found that it had written nearly all of the simple modules and done the greater part of the work on the moderately difficult ones. There were a couple of top-level pieces of functionality, in particular "Simple C-LARA", where it couldn't deliver: they required an overview of the whole project, and its context window doesn't seem to be up to it yet. I'm curious to find out what happens in GPT-5.
Third, we present some case studies where people have started to use C-LARA. We have a primary school teacher in Holland who teaches a weekly class for Romanian kids whose parents don't want them to forget their heritage language. We'd never used C-LARA with Romanian, a language I know nothing about, but Lucretia just tried giving it requests in Dutch and said it produces cute, funny little Romanian stories that the kids like. We've also been collaborating with linguists at the University of New Caledonia, who are using it for a couple of the Indigenous languages there. Here, the AI can't help with the writing; they need to create the texts by hand, but that's possible too. They're pleased with the results, which are freely available on the web.
In the appendices we give more details, including examples with step-by-step screenshots showing how to create a C-LARA text. Try it out: the platform is already a whole lot of fun to play with! It's just amazing what you can do with an AI to help you. ___________ [Update, Apr 9 2024]
One of the most useful things about writing this kind of document is that it forces you to think carefully about what tasks you should be planning to do next. We put together a long list in the "Further work" section (§9.1), and we've already started. More about that in a recent post on the C-LARA blog.
For reasons totally unconnected with the fact that Not insists on giving him tasty little home-cooked treats (roast lamb, lightly seared salmon and pan-fried chicken liver have all featured on the menu), Finley has inexplicably become rather fussy about his diet. In particular, Felix As Good As It Looks Beef in Jelly, once his favourite, now gets no more than a disdainful glance.
ChatGPT-4 and I, both fans of the Struwwelpeter, felt this kitten needed to acquire more moral fibre. After some discussion, Chat has composed a charming poem with accompanying pictures, which we hope will show Finley the error of his ways. You can find it posted here, or go here if you want to read it without creating a C-LARA account.
Gute Lektüre! (Happy reading!) ____________________
In response to overwhelming public demand, Chat has now produced an English edition. You'll find it here, or here if you don't want to create a C-LARA account.
Over the last few weeks, ChatGPT-4 and I have had a lot of fun helping our New Caledonian colleagues Pauline Welby and Fabrice Wacalie to construct a multimodal alphabet book for the Oceanic language Drehu. I had been told it was the language of the Island of Lifou and one of the many languages of the indigenous Kanak people of New Caledonia. Unfortunately I knew nothing much else about Drehu, and it turned out that Chat was scarcely better acquainted with it than I was; but all the same, we found we could make ourselves useful. We first modified our C-LARA platform so that it could be used for languages where there was no clever AI and no text-to-speech engine available, and everything needed to be done by hand. In particular, we arranged things so that Pauline could fill in a table which gave French-language phonetic equivalents for the rather idiosyncratic Drehu writing system; for example, the word Drehu itself is pronounced approximately JAY-who.
The most interesting part, though, was creating the images. We were working off a pamphlet that had been produced by the Academy of Kanak Languages, where the images mostly appeared to have been downloaded from the web. Asking for permission to use them was going to take ages, and anyway we weren't blown away by the quality. We wondered if ChatGPT-4 and DALL-E-3 could create new ones. Chat said it was able to do this, and assured us that the result would be both tasteful and culturally appropriate. Although we weren't completely sure we could trust it, we were so curious to find out more that we let the AI have its way. It was remarkably enthusiastic about the project, and turned out eighty images in a couple of days. When we showed them to Drehu native speaker Fabrice, who had been on vacation the previous week, we were impressed to find that the AI had called it right. Fabrice thought they were very good.
You can access the alphabet book online here; if you don't have a C-LARA account, creating one is free and takes a minute. Each page in the book can be accessed in a 'Words' or a 'Sounds' mode. In 'Words' mode, hovering over a word shows a French translation and clicking on it plays audio. In 'Sounds' mode, words are broken up into component sounds. Hovering over a letter group shows you an approximate French equivalent, and clicking it both plays audio and shows you all the words in the alphabet book which contain that sound. The following screenshot, showing the page for treu ("moon"), illustrates this:
[image]
Here, the user has just clicked on the initial "tr", which is glossed as roughly corresponding to the French "tch". They can hear Fabrice pronouncing it, and they're also shown other "tr" words on the right.
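Under the hood, 'Sounds' mode behaves like an inverted index from letter groups to words. The Python below is a minimal sketch under stated assumptions, not C-LARA's actual code: the grapheme table is hypothetical apart from "tr" ≈ "tch", which comes from the screenshot description above, and "dr" ≈ "dj" is only a guess based on Drehu being pronounced roughly JAY-who. Each word is split greedily, longest known letter group first, and every group then points back to the words that contain it.

```python
from collections import defaultdict

# Hypothetical grapheme table: Drehu letter group -> approximate French sound.
# Only "tr" ~ "tch" comes from the text; "dr" ~ "dj" is an assumption based on
# "Drehu" being pronounced roughly JAY-who. Unlisted letters stand alone.
GRAPHEMES = {"tr": "tch", "dr": "dj"}


def segment(word, graphemes=GRAPHEMES):
    """Split a word into letter groups, preferring the longest known group."""
    word = word.lower()
    longest = max(len(g) for g in graphemes)
    groups, i = [], 0
    while i < len(word):
        # Try the longest possible group first, falling back to a single letter.
        for size in range(longest, 0, -1):
            chunk = word[i:i + size]
            if size == 1 or chunk in graphemes:
                groups.append(chunk)
                i += len(chunk)
                break
    return groups


def build_sound_index(words, graphemes=GRAPHEMES):
    """Map each letter group to the words that contain it (an inverted index)."""
    index = defaultdict(list)
    for word in words:
        # dict.fromkeys drops repeated groups within one word but keeps order.
        for group in dict.fromkeys(segment(word, graphemes)):
            index[group].append(word)
    return index


index = build_sound_index(["treu", "Drehu"])
print(segment("treu"))  # ['tr', 'e', 'u']
print(index["tr"])      # ['treu']
print(index["u"])       # ['treu', 'Drehu']
```

In this sketch, clicking a letter group corresponds to looking it up in `index` for the word list and in `GRAPHEMES` for the French hint. Greedy longest-match splitting is just one simple choice; a real orthography may need a hand-checked segmentation instead.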
If you have thoughts about this project, we'd love to hear them!
People who should know better keep insisting that the novel is dead, but they are making a fundamental mistake. It is the publishing industry, with its soulless insistence on chasing profit to the exclusion of everything else, which is dying. The novel is very much alive: you just need to look for it in the right places.
Melancholia is a stunning example. When Not and I first heard about this book, we couldn't help smiling: here's a six hundred page stream-of-consciousness account based on two days in the life of an obscure nineteenth-century painter, moreover written, not just in Norwegian, but in the less commonly spoken version of that small language. It sounds like an SBS Woman parody come to life. But I found, to my considerable surprise, that the book works. It isn't just readable, it's compulsively readable, and it says some things about art and the human mind that...
So what's it saying, you want to know? I was wondering how I could try to explain, but on reaching the book's final pages I found that the author had anticipated me. The painter's sister, now a very old woman, is sitting on the toilet looking at the picture her brother had given her many years ago:
Og ein dag kom han Lars springende etter henne og gav henne dette biletet, og ho sa vel ikkje takk eingong, tenkjer ho Oline, og ikkje syntest ho vel at biletet var noko særleg, heller, helst var det vel berre noko rableri, syntest ho nok, men ho tok då imot og så hengde det der på veslehuset og der har det nu hange i alle dei år, tenkjer ho Oline, og ho synest vel og etter kvart at biletet er vakkert, og ho skjøner vel og kva Lars kan ha meint med det biletet, gjer ho vel, men å seie det! få sagt kva han kan ha meint! nei det går vel ikkje, eg ho kan vel omogeleg seie det, heller, for då var det vel ikkje noko vits for han Lars å male biletet, då, kan ein vel tenkje, tenkjer ho Oline, men biletet er fint, det, sjølv om det vel helst er noko rableri, fordi han Lars ha malt det, er biletet fint, det meiner ho nok, ja, om einkvan andre enn han Lars hadde malt det, hade ho ikkje synst at det var noko vakkert, tenkjer ho Oline, men no synest ho at biletet er så vakkert at det nesten er som om ho skal ta til tårene når ho ser på det.
My translation:
And one day Lars came running after her and gave her this picture, and she didn't even say thank you, thinks Oline, and she didn't think the picture was anything special either, really just a scribble, she thought, but she let him give it to her and she hung it in the outhouse and it's been hanging there all these years, thinks Oline, and in the end she thought the picture was beautiful, and she understands what Lars meant with the picture, she does, but how would she say it! say what he meant! no you can't do that, she could never say it, because then why would Lars have painted the picture would he, thinks Oline, but the picture is lovely, even if it's just a scribble, because Lars painted it the picture is lovely, that's what she thinks, yes, even though if someone else had painted it she wouldn't have thought it was anything special, thinks Oline, but now she thinks the picture is so beautiful that tears almost come to her eyes when she looks at it.
Please forgive the infelicities in my translation: this is almost the first thing I've read in nynorsk. But it won't be the last. ________________
If you want some idea of what the passage sounds like in Norwegian, here is a C-LARA version. Word glosses by GPT-4, audio by Google TTS (NO-Wavenet-B voice; unfortunately I can't find a nynorsk TTS voice) and image by DALL-E-3.
I have been experimenting with a new feature in our C-LARA platform, where you now have the option of passing the text to DALL-E-3 and requesting an image to go on the front page. This worked fine the first twenty or so times, but when I gave it Baudelaire's poem "Recueillement", from this collection, I received the following error message:
Exception: Error code: 400 - {'error': {'code': 'content_policy_violation', 'message': 'Your request was rejected as a result of our safety system. Image descriptions generated from your prompt may contain text that is not allowed by our safety system. If you believe this was done in error, your request may succeed if retried, or by adjusting your prompt.', 'param': None, 'type': 'invalid_request_error'}}
Well, it's hard to disagree. But what a clever AI to make that judgement call! ________________ [However, a little later...]
I should know by now that it's always wise to see if an experiment can be replicated. When I tried to do the same thing a second time, I got this image: [image] Also an impressive response! ________________ [And after another couple of hours...]
I liked Dmitri's message #4 and wondered what C-LARA would make of his witty suggestion. Here is its little story with accompanying illustration. You will need to create a C-LARA account to access it; this is free and takes one minute.
I've been experimenting with the idea of combining ChatGPT, DALL-E, the ReadSpeaker TTS engine and the LARA toolkit to create multimedia stories that can be used as reading material for people who want to improve their foreign language skills. Here's an example. I simply gave the prompt "Write a short, quirky news story in Italian that could be used in an intermediate language class", and let Chat get on with it; when it had finished, I also asked it to add an English gloss for each word. I created a DALL-E image and converted the text into multimodal form using the LARA toolkit; the whole thing took about half an hour.
You can see the result here (view in Chrome or Firefox). People whose Italian is better than mine have said good things about it. _________________________
I've now created similar stories in about twenty more languages; there's a complete list below. For some reason ChatGPT likes writing about heroic animals, I have no idea why! In a few cases (Mandarin, Spanish, Swedish), I asked it not to do that, since I was getting tired of the theme.
There are several languages here that I don't know at all, and others that I know very badly. I've just cut and pasted Chat's text, making a few minor corrections to keep things consistent when there were obvious formatting errors.
After each link, I'm adding comments received from native and near-native speakers. If you speak one of these languages and have thoughts about any aspect of the content, please feel free to post below or PM me!
In the first sentence of the last paragraph, there is a pause between the second-last and last words which should not be there, as there is no punctuation mark between the two words. For the same reason, there should not be any pause between the third-last and second-last words in the last sentence of the same paragraph.
The Bengali word chosen for the English word ‘inventor’ actually means artist and not inventor; there is a more accurate word in Bengali for inventor.
The last sentence could be restructured, but I don't think that alone would do much to improve it. The reason is this: as it stands now, the last sentence translates into English as follows: The artist is showing a huge amount of interest in his initiative because of this success and he/she is waiting for his/her next technological invention.
The sentence needs to be rephrased if the following is intended:
The inventor is greatly motivated by this success and is looking forward to his/her next technological invention.
"Cat saves baby" (French) Christèle (native) says it's more or less perfect as far as the French goes, but queries the plausibility. How could the cat have saved the baby?
"Dog reunited with family" (French) Christèle (native) says it's more or less perfect as far as the French goes, but queries the plausibility. Why would you do a DNA test on a dog?
"Time capsule found in school" (German) Mark (near-native?) says in message #33 that it is "flawless". However, Berengaria (C2 level) says in message #63: "The German one about the time capsule is grammatically correct, but one or two word choices/phrasings are odd". Leidzeit (native) comments further in message #65: "I would not say the German story is flawless. The use of the past tense (preterit) is odd but okay for a piece written in the local paper. But you would not use gewährten as the stuff found still allows you to look back. As for style as an editor I would ask the writer not to use the als-clause two times."
"Monkey business" (Hindi) Saurabh (native) says in message #52 that "the Hindi one is simple, the words used are more daily-usage than literary, and the story, surprisingly, I could imagine in a Hindi children's book." Peter (near-native) says: "Seems pretty good to me: a bank raid in New Delhi by a gang of monkeys... seems all too plausible to me!"
"Cheeky horse" (Irish) Neasa (native) says the word choice is very odd. In particular, the key content word capall is an obscure archaic word for "horse" never used in modern Irish.
"Cat elected mayor" (Italian) Catia (native) says there is one small mistake. Plch (native) says in message #35 below that the name of the town should be feminine, otherwise perfect. Ivana (near-native) says very good.
"Historical cat" (Latin) Not's mother, a retired Latin teacher, said there were no obvious mistakes in Chat's grammar. She would however prefer not to give its homework a mark until she has seen at least one more Latin composition by an AI.
"Historical dog" (Latin) [This text was created using an early version of the new C-LARA platform using a story originally written by the Bing engine.]
"Inexplicable event at museum" (Slovak) Branislav (native speaker) says that there are some unnatural word choices and minor grammatical errors, and in general that it sounds as though it's been written by someone whose native language is English. But he added that he quite liked the story, and that it would only take a few minutes for a fluent Slovak speaker to fix it up so that it was fully acceptable.
I checked the Ukrainian version. Despite its numerous instances of unconventional usage and dubious stylistic choices, the text overall could easily pass as the work of a 10-year-old child. It was unexpected to find a newly coined participle, блукуючи instead of блукаючи, a blunder most probably caused by the wrong declension of the same verb. Overall not too shabby.
I read the story. It is well written. There are two points where I felt it could improve from my perspective:
- The AI generated an odd use of ‘mahli’ (local) birds in the title. Everywhere else the use of the word ‘mahala’ sounds appropriate, but in the title, when used to describe the birds, it seems a bit off.
- At another point, the words ‘ibtadai doar’, though they might have sounded better in English, read a bit off in the Urdu sentence. Their use here does not fit the present-tense, contemporary nature of the story, making it seem like a historical text: to me at least, it translates more as ‘the beginning era of their efforts’ than ‘at the start of their effort’.
Otherwise, the text reads really well as a short story about birds attempting to fly to the moon and inspiring children and adults to appreciate their efforts and expand their own horizons.
As a native speaker, I can say that it flows quite smoothly and it's almost impossible to tell if it was written by a Vietnamese person or not. However, upon closer inspection, there may be a few words that are not 100% appropriate (but they are still not incorrect).
I have just posted the LARA edition of Noms de pays : le nom here, so Du Côté de chez Swann is now complete.
When I first read the series, in English, I remember wondering what the point was of the long essay on the way we build up mental images of things based on the sound of their names. I can no longer reconstruct my impressions properly: could I really have thought that? As Proust would point out, evidence that I was a different person then.
I liked Cliff Goddard's Pitjantjatjara/Yankunytjatjara Learner's Guide so much that I decided I had to find out more about these languages. I particularly wanted to get a better idea of what Pitjantjatjara sounds like. After searching around, I located an old course that had once been taught at Adelaide University; the material, both text and audio, was available for sale on USB sticks from a couple of Australian sites. The package looked like this:
[image]
I paid my AUD 30 and waited for it to arrive.
The course turned out to be excellent, and gives a solid introduction to Pitjantjatjara grammar and pronunciation: it had been developed between 1966 and 1968 by the Rev Jim Downing, a Pitjantjatjara man called Gordon Ingkatji, and legendary MIT linguist Ken Hale, with audio recorded by Ingkatji and a so far unidentified female speaker. (I would love to know more about how these apparently very different people collaborated.) In several places, knowledgeable people say it was the first course ever offered on an Indigenous Australian language.
I couldn't resist the temptation to convert it into LARA format so that text, sound and translations were all directly linked together. It took a while, but I fitted in a chapter every now and then and completed all 16 of them in a couple of months. When I thought I'd accomplished everything I could reasonably do unaided, I asked around again and a colleague put me in touch with a Melbourne linguist called Sasha Wilmoth, who'd just finished a PhD on Pitjantjatjara. Sasha went through my initial draft with amazing efficiency and the next day gave me a comprehensive list of things that needed fixing: luckily, they were all easy to take care of, and the second draft was a great deal better than the first. After a couple more rounds of fixing and improving, we decided we were done. We've just submitted a short paper about our efforts to a 2023 meeting.
This exercise was both enjoyable and instructive; I now feel I at least know something about Pitjantjatjara. The USB stick also contains a second course ("Advanced Pitjantjatjara", 12 units), and I'm thinking of having a go at converting that too. Stay tuned for further developments. ________________ [Update, Mar 27 2024]
Sasha and I did indeed convert the "Advanced Course", and our paper was presented at the 2023 edition of Computational Methods for Endangered Languages. You can access it here. We were feeling pleased with ourselves, but then disaster struck! Gordon Ingkatji's daughters, who'd previously told Sasha that they were very happy to see their father's memory honoured in this way, contacted her to say they'd reconsidered: in fact, they wanted financial compensation for allowing the course to be put online. The copyright situation was unclear, but in these situations the family's wishes are always considered first. Unfortunately, we had no budget to pay them, so we were forced to take the course down again.
But as of this week, there is a happy ending to the story. The people at AUSIL, who had distributed the USB stick version and have vast experience in dealing with Indigenous language rights issues, took on the case and negotiated an amicable settlement with the Ingkatji family. They have just put the course online again on their site; you'll find it here.
In my opinion, the reason why so many people were willing to give up their time to help make sure that the course was preserved is simply that it is very good. It is also an important historical document, representing a landmark in Indigenous language studies. I am really happy that I was able to make a small contribution to this effort.
I reread Maupassant's classic short story earlier this week while putting together this LARA version. Looking at the other reviews, I see much about the moral aspects, the author's literary craftsmanship, the relationship to Madame Bovary, and whether or not diamonds are a girl's best friend. But there is curiously little about a question that surely must have occurred to other people: what happened next? Luckily, I happened to know that Peter Chelsom, undisputed king of tasteful adaptations from the French, was following up his triumph in Hector and the Search for Happiness with another masterpiece. I am proud to present the initial scene from The Necklace II: This Time It's Personal, soon to be released by Amazon Prime:
MME FORESTIER: ... Oh, my poor Mathilde! Why, my necklace was paste! It was worth at most only five hundred francs!
[A moment of stunned silence]
MME LOISEL: I... I... ah, I guess all's well that ends well. When can I have it?
MME FORESTIER: [who is rapidly reevaluating the situation] Have what?
MME LOISEL: The necklace. The real necklace.
MME FORESTIER: I'm sorry, I don't understand.
MME LOISEL: Oh... Jeanne, please, please, you aren't going to be difficult about this, are you? We spent ten years of our lives scrimping and saving to buy this stupid piece of jewelry. It seems the whole thing was a mistake. But at least, now we own something worth forty thousand francs.
MME FORESTIER: I have absolutely no idea what you are talking about.
MME LOISEL: The necklace! You know it's mine! You just told me it was!
MME FORESTIER: What necklace?
MME LOISEL: The one you lent me back in... I mean, actually...
MME FORESTIER: Oh yes, that extremely valuable diamond necklace I once let you borrow. I must have been insane. When you were late returning it, I wondered if I'd ever see it again. A narrow escape, I've always thought.
MME LOISEL: But... but, Jeanne, please, you can't... I mean, you know perfectly well that legally...
MME FORESTIER: Are you threatening me?
MME LOISEL: No, no, of course not...
MME FORESTIER: If it ever did go to court, the judge would rule there was no case to answer. Though I doubt it would get that far. You probably can't even afford a lawyer.
MME LOISEL: But Jeanne...
MME FORESTIER: Mathilde, I'm worried about you. Have you considered seeking professional help?
[A long pause]
MME LOISEL: You poisonous bitch. I'll get even with you if it's the last thing I ever do.
MME FORESTIER: I think it would be. Better not try, sugar plum.
I've just posted a LARA version of Un amour de Swann, built with the same methods as Combray from a public domain audiobook taken from LitteratureAudio and parallel French and English texts taken from Gutenberg. This gives a total of about 19 hours of Proust in LARA form. The scripts still need more work, but they're getting there: Combray took three weeks and a substantial revision of the code; Un amour de Swann took a few days and some minor bug fixing. Coming next are Noms de pays : le nom and then A l'ombre des jeunes filles en fleurs. By the time we get to Le temps retrouvé, I hope everything will be quite stable.
There is a good deal of editing and quality control involved. If you're interested in contributing to this project, please leave a comment or PM me! ________________________________
I finished rereading it in the new multimedia format, and, as with Combray, I was astonished to see how much more I appreciated it this way. It becomes clear that even those of us who fancy ourselves connoisseurs of Proust tend to underestimate just how complex and subtle a writer he is. It helps a great deal to be able to read the French text while simultaneously getting the viewpoints of a French person who has spent a lot of time thinking about how she should read it aloud, and an English person who has spent a lot of time thinking about how he would say it in his language. Both of them quite frequently made obvious mistakes: as noted, Proust is very challenging. But much more frequently, they showed me things I'd missed on previous readings.
To start with what the book is about: I'm embarrassed to admit it, but I'd somehow acquired the idea that it's about jealousy. It isn't. It's about the nature of the mind.
[Original review, Sep 13 2022]
I've been spending a lot of time this year trying to develop ways to build multimodal LARA documents automatically out of public domain internet resources. In principle, as Not told me a while ago, it should be easy. For many classic works of literature, everything you need is already there: the original text, a good English translation, and a high-quality audiobook. You just need to pull them apart and then put the pieces back together again so that the text, audio and translations line up. Surely there can't be much to it?
Not's intuition was spot on, though the details have taken a while to work out and still need considerable tidying up. Basically, my recipe goes like this. You start with the audio and cut it into pieces at silences using the ffmpeg tool. You then take the pieces of audio and send them for processing by Google Cloud Speech to Text. This is far from 100% accurate, but it's good enough that you can write a script which aligns the speech recognition results against the text of the book and matches them quite reliably. Next, you take the source text and the translation, and send them for processing at the YouAlign site; this cuts the two texts into roughly sentence-length chunks in a way which matches corresponding passages.
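For anyone curious about the matching step, here is a minimal Python sketch of one way to line up speech-recognition output with the book text. It is not the script used for LARA: the names `normalise` and `align_chunks` are hypothetical, it assumes the audio has already been cut at silences and each chunk transcribed, and it swaps in the standard library's `difflib.SequenceMatcher` as the matching heuristic. Each chunk is taken to start at the first book word that one of its recognised words matches, so recognition errors inside a chunk (here *couche* for *couché*, *eteinte* for *éteinte*) don't derail the alignment.

```python
import difflib
import re


def normalise(text: str) -> list[str]:
    """Lower-case the text and split it into word tokens, dropping punctuation."""
    return re.findall(r"\w+", text.lower())


def align_chunks(book_text: str, transcripts: list[str]) -> list[tuple[int, int]]:
    """Give each audio chunk a span [start, end) of word indices in the book.

    Hypothetical sketch: transcripts[i] is the (imperfect) speech-recognition
    output for chunk i, in playback order. The spans cover every book word once.
    """
    if not transcripts:
        return []
    book_words = normalise(book_text)

    # Concatenate all transcripts, remembering which chunk each word came from.
    asr_words: list[str] = []
    owner: list[int] = []
    for chunk_id, transcript in enumerate(transcripts):
        words = normalise(transcript)
        asr_words.extend(words)
        owner.extend([chunk_id] * len(words))

    # Matching blocks come back in increasing order, so the first hit we see
    # for a chunk is its earliest matched book word.
    first_match: list[int | None] = [None] * len(transcripts)
    matcher = difflib.SequenceMatcher(None, asr_words, book_words, autojunk=False)
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            chunk_id = owner[block.a + k]
            if first_match[chunk_id] is None:
                first_match[chunk_id] = block.b + k

    # Chunk 0 starts at the beginning; later chunks start at their first match.
    # Boundaries stay monotonic, and a chunk with no matches gets an empty span.
    starts = [0]
    for candidate in first_match[1:]:
        starts.append(starts[-1] if candidate is None else max(starts[-1], candidate))
    ends = starts[1:] + [len(book_words)]
    return list(zip(starts, ends))


if __name__ == "__main__":
    book = ("Longtemps, je me suis couché de bonne heure. "
            "Parfois, à peine ma bougie éteinte, mes yeux se fermaient si vite")
    transcripts = [
        "longtemps je me suis couche de bonne heure",  # recogniser dropped the accent
        "parfois a peine ma bougie eteinte",
        "mes yeux se fermaient si vite",
    ]
    print(align_chunks(book, transcripts))  # [(0, 8), (8, 14), (14, 20)]
```

One consequence of this design choice is that unmatched words at the end of a chunk stay with that chunk, while unmatched words at the very start of a chunk would be attributed to the previous one.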
The problem is that the two alignments, source/audio and source/target, are not consistent with each other, since silences and sentence-breaks are not at all the same thing. In general, a sentence contains many silences comparable in length to the ones you get at periods. However, the alignments agree well enough that you can in practice take the places where they do agree and use those to create a consistent alignment. Most often, this means that a sentence found by YouAlign corresponds to several silence-delimited audio segments, though sometimes you need more than one YouAlign segment for it to work. For each combined segment, you stick together all the relevant audio chunks and all the relevant translation chunks, and you're there. There is slightly more to it than the above, but basically it is indeed quite simple: it works because the core resources, Google Cloud Speech to Text and YouAlign, are very good, and you just have to find a way to exploit that power.
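The "take the places where they agree" step can also be sketched in a few lines of Python, under simplifying assumptions of my own rather than a description of the actual scripts: both alignments have already been expressed as lists of word positions in the source text where a segment ends, and "agree" means both lists contain exactly the same position (a real version would probably need some tolerance). `merge_segmentations` is a hypothetical name. Each output group pairs the audio chunks with the YouAlign chunks lying between two consecutive shared boundaries.

```python
def merge_segmentations(
    audio_ends: list[int], sentence_ends: list[int]
) -> list[tuple[list[int], list[int]]]:
    """Group audio chunks and sentence chunks between boundaries both agree on.

    Hypothetical sketch: audio_ends[i] is the source-text word position where
    audio chunk i ends; sentence_ends[j] is the same for source/translation
    chunk j. Both lists are increasing and end at the total number of words.
    """
    shared = sorted(set(audio_ends) & set(sentence_ends))
    groups: list[tuple[list[int], list[int]]] = []
    a = s = 0
    for boundary in shared:
        # Collect every audio chunk that ends at or before this shared boundary.
        audio_ids: list[int] = []
        while a < len(audio_ends) and audio_ends[a] <= boundary:
            audio_ids.append(a)
            a += 1
        # Likewise for the sentence-aligned (source/translation) chunks.
        sentence_ids: list[int] = []
        while s < len(sentence_ends) and sentence_ends[s] <= boundary:
            sentence_ids.append(s)
            s += 1
        groups.append((audio_ids, sentence_ids))
    return groups


if __name__ == "__main__":
    # Several pauses inside one sentence: audio chunks 0-1 go with sentence 0.
    print(merge_segmentations([3, 8, 11, 14, 20], [8, 20]))
    # [([0, 1], [0]), ([2, 3, 4], [1])]
    # A sentence break at word 4 with no matching pause: two sentences share one group.
    print(merge_segmentations([5, 12, 20], [4, 12, 20]))
    # [([0, 1], [0, 1]), ([2], [2])]
```

The second call shows the less common case mentioned in the text, where more than one YouAlign segment is needed to reach a point where both alignments break.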
Proust's Combray is my first full-scale test of the idea. The original French text and the Scott Moncrieff translation were both downloaded from Gutenberg; the audio, about 8 hours and beautifully recorded by Monique Vincens, comes from LitteratureAudio. The resulting LARA version is posted here; view it in Chrome or Firefox. You can use the audio controls to play audio a page at a time or a sentence at a time. Clicking on a pencil icon shows a translation of the previous sentence on the right; clicking on a word shows a concordance of places where that word occurs in the text.
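The concordance feature can be illustrated with a toy Python index that maps each word to the sentences it occurs in. This is a hypothetical sketch, not LARA's implementation; `build_concordance` is an illustrative name, and it indexes lower-cased surface forms (so elided forms like *m'endors* are split crudely into *m* and *endors*), whereas a real concordance would likely group inflected forms under a lemma.

```python
import re
from collections import defaultdict


def build_concordance(sentences: list[str]) -> dict[str, list[int]]:
    """Map each lower-cased word to the indices of the sentences containing it."""
    index: defaultdict[str, list[int]] = defaultdict(list)
    for i, sentence in enumerate(sentences):
        for word in re.findall(r"\w+", sentence.lower()):
            positions = index[word]
            # Record each sentence only once, even if the word repeats in it.
            if not positions or positions[-1] != i:
                positions.append(i)
    return dict(index)


if __name__ == "__main__":
    concordance = build_concordance([
        "Longtemps, je me suis couché de bonne heure.",
        "Parfois, à peine ma bougie éteinte, mes yeux se fermaient si vite "
        "que je n'avais pas le temps de me dire : « Je m'endors. »",
    ])
    print(concordance["je"])      # [0, 1]
    print(concordance["bougie"])  # [1]
```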
We are writing a paper about this work, due at the end of the month. If you have any feedback, in particular suggestions for which texts to do next or ideas about how one might use resources like LARA Combray in practice, it will be much appreciated! _______________________________ [Update, Sep 17 2022]
Picking up on Théodore's recommendation in message #9, I have used the same technique to create a LARA version of Rimbaud's Les poètes de sept ans. It's posted here. What a great poem! I had not seen it before; thank you, Théodore. The audio is again from LitteratureAudio, this time recorded by Alain Degandt. _______________________________ [Update, Oct 11 2022]
We have now submitted our paper. In the course of writing it, I went through four texts I'd created using the alignment method, listening to each piece of audio, checking it against the text and translation that the aligner had matched to it, and correcting where it was wrong. This kind of annotation work is common in language technology projects, and in nearly all cases it's painfully dull. But not here! I was amazed to find how much I enjoyed reading Combray in this new way, and how much more I got out of it as I listened to the French audio while flicking my eyes back and forth between the pieces of French text and English translation, which were neatly lined up for me.
There were two things in particular that stood out. First, it's possible to read a good deal more quickly. Proust is a notoriously demanding author; when reading in normal text form I usually feel I've reached my limit after at most 20-25 pages, and can no longer maintain the concentration needed to disentangle the longer sentences. Here, I was supported by the audio and the English translation, and I could read the whole book in two or three days. This exposed all sorts of connections I hadn't noticed before. Second, listening to Monique Vincens reading aloud made me properly aware of how funny Proust is; his irony is often so subtle that I hadn't noticed it, but she does a wonderful job of conveying the humour.
All in all, I felt I was appreciating the book at a different level. I will soon start putting together a LARA version of A l'ombre des jeunes filles en fleurs.
Over the last two and a half years, it has been my great pleasure to help my talented Icelandic colleagues use the LARA platform to put together a multimedia edition of the Poetic Edda. Three poems - Völuspá, Hávamál and Lokasenna - have already been posted separately, and some people will remember the Goodreads reading groups we had for them.
As of today, the project has passed another milestone, and we have just posted a combined edition which contains ten poems (Völuspá, Hávamál, Vafþrúðnismál, Grímnismál, Skírnismál, Hárbarðsljóð, Hymiskviða, Lokasenna, Þrymskviða and Alvíssmál) organised as a single document which you can find here. As with the individual poems, you can view it in Chrome or Firefox and listen to the original Old Norse a verse at a time. Hovering over a ᚠ rune shows a verse translation from the public domain Bellows edition. Hovering over a word shows an English gloss; clicking on it plays audio, and also brings up a concordance on the right hand side where you can see different places the word occurs, both in the current poem and also in the other ones.
The project has been a true labour of love, and hopefully this is still just the beginning. ___________________ [Update, Aug 26 2022]
After further diligent work, there's now a second version posted here where the glosses are in modern Icelandic. Even if you don't know any Icelandic or Old Norse, it's quite interesting to go through a few verses, hovering over the Old Norse words to discover how much the language has changed since 1280.
To me, the point of Flaubert's famous novella is the contrast between the unyielding grimness and ugliness of the subject-matter and the extraordinary serenity and beauty of the style. This doesn't quite work in translation; I see a number of puzzled reviews. But if you want to find out what you've been missing, I've just used the new methods we've been developing to construct a LARA version out of high-class audio from the litteratureaudio.com site, the English translation available on Gutenberg, and a state-of-the-art text to speech engine. In Chrome or Firefox, you can listen to it a page at a time, a sentence at a time, or a word at a time. Hovering over a pencil icon shows you a translation of the previous sentence, and clicking on a word shows you all the places it occurs in the text, plus in most cases a link to a French lexicon page.
I'm curious to know if this changes anyone's mind about the story. If it does, please let me know!
I am continuing to experiment with the scripts I've developed for cutting up audiobooks and turning them into multimedia LARA texts (more details here if you're curious). I've tried French and English, so I thought I'd move to a different language and do something in German. What better place to start than Aschenputtel, the original Brothers Grimm version of Cinderella? You can see the result here. As usual with LARA texts, view in Chrome or Firefox and hover over a pencil icon to get a translation of the preceding sentence; clicking on a word shows you all the places it occurs in the text, and also gives you a link to a German lexicon/grammar page. The audio comes from someone on Librivox called Hans Hafen, who sounds like the kind German grandfather every child ought to have to read Grimms Märchen to them. Many thanks, Herr Hafen!
I did the translation myself, partly because the one I found on Gutenberg was far removed from the German, and partly because it was fun. What a weird story Aschenputtel is compared to the bland Disney film! I started off thinking it was very politically incorrect, which I suppose it must be, but after a while I decided that the proto-feminist component was even stronger. Herr Charming, if you haven't figured it out yet, I must warn you that this is a seriously dangerous princess. You look like you're way out of your depth. _________________ [Update, Oct 1 2022]
This afternoon, we watched the Adelaide Gilbert and Sullivan Society's charming production of Into the Woods, which I'd never seen before. I was amazed to discover how closely the Cinderella segment followed Aschenputtel! If you're a fan, I definitely recommend checking out the original text.