I just jailbroke my Kindle and have a few EPUBs, and thought maybe this would be a good time to change my approach to vocabulary.
What I’d like to do is learn the vocabulary for my reading before I read it, instead of after, or as I’m reading it.
My dream piece of software would do the following:
1. Resolve all words down to their most basic form (i.e., singular for nouns, infinitive for verbs, etc.). My language is French.
2. Count occurrences of each word.
3. Filter out words I already know.
4. Define the words in English using a bilingual dictionary, including the original context sentence.
5. Make Anki cards for me to study.
6. God-tier programming: also include idiomatic expressions as vocabulary.
Does this exist?
Edit: Or help me assemble a pipe to get all these tasks done separately.
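For anyone assembling this by hand, the counting, filtering, context, and card steps can be sketched with just the Python standard library. This is a rough sketch under assumptions, not an existing tool: `build_vocab`, `to_anki_tsv`, `known`, and `glossary` are made-up names, the tokenizer and sentence splitter are naive regexes, and the `lemmatize` hook defaults to plain lowercasing (a real French lemmatizer would have to be plugged in there). The output is tab-separated text, which Anki can import.

```python
import csv
import io
import re
from collections import Counter

# Naive assumptions: a "word" is a run of letters, and a sentence
# ends at ., ! or ? followed by whitespace.
WORD_RE = re.compile(r"[^\W\d_]+")
SENTENCE_RE = re.compile(r"(?<=[.!?])\s+")


def build_vocab(text, known, lemmatize=str.lower):
    """Return (lemma, count, first context sentence) rows, most frequent first.

    `known` is a set of lemmas to skip. `lemmatize` is a hypothetical hook;
    the default only lowercases and does NOT reduce words to a base form.
    """
    counts = Counter()
    context = {}
    for sentence in SENTENCE_RE.split(text):
        sentence = sentence.strip()
        for token in WORD_RE.findall(sentence):
            lemma = lemmatize(token)
            if lemma in known:
                continue
            counts[lemma] += 1
            # Remember the first sentence in which the word appears.
            context.setdefault(lemma, sentence)
    return [(lemma, n, context[lemma]) for lemma, n in counts.most_common()]


def to_anki_tsv(rows, glossary):
    """Format rows as tab-separated text: word, definition, context, count.

    `glossary` stands in for a bilingual dictionary lookup (plain dict here).
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    for lemma, n, sentence in rows:
        writer.writerow([lemma, glossary.get(lemma, ""), sentence, n])
    return buf.getvalue()
```

Usage would look something like `to_anki_tsv(build_vocab(book_text, known={"le", "une"}), glossary={"chat": "cat"})`, with the result saved to a `.txt` file for import. Idiomatic expressions (step 6) are not handled at all here.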


Was messing around with Jiten.moe (a spiritual successor to jpdb, which likewise can ingest a book or subtitle file and create Anki cards) and it made me think of this question. (And Jiten is actually open-source, so the repo’s there with how they do it, but I’m pretty sure it’s mostly just wrapping a bunch of Japanese-specific tools.)
Did a little looking. Tried checking https://github.com/keon/awesome-nlp and didn’t see anything French-specific, but did come across https://github.com/french-ai/french-nlp which might have useful stuff. It sounds like a library called spaCy could be useful.
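If spaCy does turn out to be the right fit, the "resolve to base form and count" steps might look roughly like this. Treat it as a sketch under assumptions: it needs spaCy and the French model `fr_core_news_sm` installed separately, the function names are made up, and dropping stop words is just one possible choice for cutting out very common words.

```python
from collections import Counter


def load_french_pipeline():
    """Load spaCy's small French model (assumes it has been downloaded)."""
    import spacy  # third-party; imported lazily so the rest works without it

    return spacy.load("fr_core_news_sm")


def lemma_counts(text, nlp):
    """Count lemmas in `text`, skipping punctuation, numbers, and stop words.

    `nlp` is any callable that returns tokens exposing spaCy's
    `lemma_`, `is_alpha`, and `is_stop` attributes.
    """
    doc = nlp(text)
    return Counter(
        tok.lemma_.lower()
        for tok in doc
        if tok.is_alpha and not tok.is_stop
    )
```

Something like `lemma_counts(book_text, load_french_pipeline()).most_common(50)` would then give the fifty most frequent lemmas. How well the lemmas come out depends on the model, so spot-check a few.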
But then I ran across this tool, which might be pretty close to what you’d need? https://github.com/FreeLanguageTools/vocabsieve
I haven’t looked into exactly how the ‘automatically from books’ stuff would work or anything, but seems promising.
And I guess the elephant in the room: NLP is the kind of task LLMs are actually pretty good at, so there’s also always that lazy-ish route: convert the book to text, feed it through an LLM and ask it to identify important vocabulary words.
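The "convert the book to text" part is doable with the standard library, since an EPUB is a zip archive of (X)HTML files. A minimal sketch, assuming a DRM-free EPUB; `epub_to_text` is a made-up name, and it reads chapter files in archive-name order rather than the book's declared reading order, so chapters can come out shuffled for some books.

```python
import zipfile
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping head/script/style content."""

    SKIP = {"head", "script", "style"}
    BLOCK = {"p", "div", "li", "br", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            # Keep words from adjacent blocks from running together.
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)

    def text(self):
        # Collapse all runs of whitespace to single spaces.
        return " ".join("".join(self.parts).split())


def epub_to_text(path_or_file):
    """Return the plain text of every HTML/XHTML file in the archive."""
    chunks = []
    with zipfile.ZipFile(path_or_file) as zf:
        for name in sorted(zf.namelist()):
            if name.lower().endswith((".xhtml", ".html", ".htm")):
                parser = _TextExtractor()
                parser.feed(zf.read(name).decode("utf-8", errors="replace"))
                parser.close()
                text = parser.text()
                if text:
                    chunks.append(text)
    return "\n\n".join(chunks)
```

The resulting string can be pasted or piped into whatever LLM you use, or fed into a lemmatizer instead.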
Thanks! VocabSieve looks perfect (though experimental), and it works with KOReader, too. Fuck me, I’m running out of excuses.