Learning vocab ahead of reading? (software for epub analysis)

schipelblorp@sh.itjust.works · edited 27 days ago

emb@lemmy.world · edited 18 days ago

Was messing around with Jiten.moe (a spiritual successor to jpdb that likewise can ingest a book or subtitle file and create Anki cards) and it made me think of this question. (Jiten is actually open source, so the repo shows how they do it… but I’m pretty sure it’s mostly just wrapping a bunch of Japanese-specific tools.)

Did a little looking. Tried checking https://github.com/keon/awesome-nlp and didn’t see anything French-specific, but did come across https://github.com/french-ai/french-nlp which might have useful stuff. It sounds like a library called spaCy could be useful.

But then I ran across this tool, which might be pretty close to what you’d need? https://github.com/FreeLanguageTools/vocabsieve

VocabSieve is a companion program for language learning with Anki. Its primary function is sentence mining, in which sentences with vocabulary words are collected and added into Anki for long term retention. It aims to help intermediate learners gain vocabulary efficiently by allowing card creation with minimal friction. Possible use cases include sentence mining from videos, texts, asynchronously from ereader highlights, and even completely automatically from books or subtitles.

I haven’t looked into exactly how the ‘automatically from books’ stuff would work or anything, but seems promising.

And I guess the elephant in the room: NLP is the kind of task LLMs are actually pretty good at, so there’s also always that lazy-ish route: convert the book to text, feed it through an LLM, and ask it to identify important vocabulary words.
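
For the "convert the book to text" step, here is a rough sketch using only the Python standard library. It relies on an EPUB being a ZIP archive of (X)HTML files. The function names `epub_to_text` and `word_frequencies` are made up for this example, and the word counting is deliberately crude: no lemmatization, contractions like "l'été" get split at the apostrophe, and files are read in name order rather than the book's reading order (which doesn't matter for counting).

```python
import re
import zipfile
from collections import Counter
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes from an (X)HTML document, skipping <script>/<style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0  # > 0 while inside a <script> or <style> element

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def epub_to_text(epub):
    """Return the text of every HTML/XHTML file in the EPUB, joined together.

    `epub` is a path or a binary file-like object. (Hypothetical helper.)
    """
    parts = []
    with zipfile.ZipFile(epub) as zf:
        for name in sorted(zf.namelist()):
            if name.lower().endswith((".xhtml", ".html", ".htm")):
                parser = _TextExtractor()
                parser.feed(zf.read(name).decode("utf-8", errors="replace"))
                parser.close()
                parts.append(" ".join(parser.chunks))
    return "\n".join(parts)


def word_frequencies(text, min_length=3):
    """Count lowercase words made of letters only, ignoring very short ones."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    return Counter(w for w in words if len(w) >= min_length)
```

Something like `word_frequencies(epub_to_text("book.epub")).most_common(200)` (with `book.epub` as a placeholder path) would then give a candidate list you could paste into an LLM prompt or check against words you already know.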

schipelblorp@sh.itjust.works · 17 days ago

Thanks! VocabSieve looks perfect (though experimental), and it works with KOReader, too. Fuck me, I’m running out of excuses.