I just jailbroke my Kindle and have a few EPUBs, and I thought this might be a good time to change my approach to vocabulary.

What I’d like to do is learn the vocabulary for my reading before I read it, instead of afterward or while I’m reading it.

My dream piece of software would do the following:

  1. Resolve all words down to their most basic form (i.e., singular for nouns, infinitive for verbs, etc.). My language is French.

  2. Count occurrences of each word.

  3. Filter out words I already know.

  4. Define the words with a French-English bilingual dictionary, including the original context sentence.

  5. Make Anki cards for me to study.

  6. (God-tier programming) Also include idiomatic expressions as vocabulary.
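Steps 1 to 3, plus the context sentence from step 4, could look roughly like the Python sketch below. It uses only the standard library, so real lemmatization is faked: `LEMMAS` is a hypothetical, hand-filled lookup table standing in for a full French lemma list or an NLP lemmatizer, and the tokenizer and sentence splitter are rough regexes.

```python
import re
from collections import Counter

# HYPOTHETICAL lemma table (inflected form -> base form). A real pipeline would
# load a complete French lemma list or use an NLP lemmatizer instead.
LEMMAS = {
    "chats": "chat",
    "mange": "manger",
    "mangent": "manger",
    "noirs": "noir",
    "noires": "noir",
}

# A word is a run of letters (accents included), optionally joined by hyphens.
WORD_RE = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*")
# Elided articles/pronouns such as l', d', qu' are dropped, so "l'homme" yields "homme".
ELISION_RE = re.compile(r"\b(?:l|d|j|m|n|s|t|c|qu|jusqu|lorsqu|puisqu)['’]")
# Rough sentence splitter: break after ., !, ? or … followed by whitespace.
SENTENCE_RE = re.compile(r"(?<=[.!?…])\s+")


def lemmatize(word: str) -> str:
    """Return the base form of a word; unknown words pass through unchanged."""
    return LEMMAS.get(word, word)


def vocabulary(text: str, known: set[str]) -> list[tuple[str, int, str]]:
    """List (lemma, count, first context sentence) for unknown words, most frequent first."""
    counts: Counter[str] = Counter()
    context: dict[str, str] = {}
    for sentence in SENTENCE_RE.split(text):
        cleaned = ELISION_RE.sub(" ", sentence.lower())
        for token in WORD_RE.findall(cleaned):
            lemma = lemmatize(token)
            counts[lemma] += 1
            # Keep only the first sentence each lemma appears in, as card context.
            context.setdefault(lemma, sentence.strip())
    return [
        (lemma, n, context[lemma])
        for lemma, n in counts.most_common()
        if lemma not in known
    ]
```

For example, `vocabulary("Les chats noirs mangent. Le chat mange du pain. L'homme regarde les chats.", known={"le", "les", "du"})` starts with `("chat", 3, "Les chats noirs mangent.")`, because "chats" and "chat" are merged by the lemma table.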

Does this exist?

Edit: Or help me assemble a pipeline that handles each of these tasks with a separate tool.
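For step 5, one low-tech option is to write a tab-separated text file and load it with Anki’s File > Import: each line becomes one note, and with the Basic note type the first column maps to Front and the second to Back. This sketch assumes the definitions come from somewhere else (the dictionary lookup in step 4 is not shown), and `write_anki_tsv` is a hypothetical helper name. The Back field is built as simple HTML, so HTML should be allowed in the import options.

```python
import html


def clean(field: str) -> str:
    """Collapse tabs/newlines (which would break the TSV layout) and escape HTML."""
    return html.escape(" ".join(field.split()), quote=False)


def write_anki_tsv(rows, path) -> None:
    """Write one 'Front<TAB>Back' line per (word, definition, context) row."""
    with open(path, "w", encoding="utf-8") as f:
        for word, definition, context in rows:
            # Back side: the definition, then the context sentence in italics.
            back = f"{clean(definition)}<br><i>{clean(context)}</i>"
            f.write(f"{clean(word)}\t{back}\n")
```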

  • dragontamer@lemmy.world · 3 days ago

    I feel like you’re approaching this incorrectly. Do you have graded readers?

    An A2 graded reader assumes you know all the A2-level words and provides definitions for the B1+/B2 (or beyond) words in the text.

    So instead of writing software that does the work of making a graded reader, it’s probably better to just start with graded readers (where all this work has already been done).

    • schipelblorp@sh.itjust.works (OP) · 3 days ago

      I feel like it’s not that much work and the benefit is that it gives me a lot more freedom to read what appeals to me.

      For instance, I found an unseeded torrent of 600 French EPUBs. Imagine being able to do something as simple as sorting them by lexical complexity, that is, counting the unique words in each book and ranking them from fewest to most. That’s trivially simple to do, and it would give me books that are consistently within my zone of proximal development.

      But yes, thank you for the suggestion! I’ll look into some readers, depending on whether I feel more lazy than broke.
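      The unique-word-count ranking could be sketched as follows, assuming standard EPUBs (ZIP archives whose chapters are XHTML files). It counts distinct lowercased word forms, not lemmas, so "chat" and "chats" count separately; `rank_epubs` and the other names are hypothetical.

```python
import re
import zipfile
from html.parser import HTMLParser
from pathlib import Path

# A word is a run of letters (accents included), optionally joined by hyphens.
WORD_RE = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*")


class TextExtractor(HTMLParser):
    """Collect the visible text of an (X)HTML chapter, skipping <script>/<style>."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def epub_text(path) -> str:
    """Concatenate the text of every XHTML/HTML file inside an EPUB (a ZIP archive)."""
    chapters = []
    with zipfile.ZipFile(path) as zf:
        for name in zf.namelist():
            if name.lower().endswith((".xhtml", ".html", ".htm")):
                parser = TextExtractor()
                parser.feed(zf.read(name).decode("utf-8", errors="replace"))
                parser.close()
                chapters.append(" ".join(parser.parts))
    return "\n".join(chapters)


def unique_word_count(text: str) -> int:
    """Number of distinct lowercased word forms (not lemmas)."""
    return len({word.lower() for word in WORD_RE.findall(text)})


def rank_epubs(folder) -> list[tuple[int, str]]:
    """Return (unique word count, file name) for each .epub in folder, fewest first."""
    scores = []
    for path in Path(folder).glob("*.epub"):
        try:
            scores.append((unique_word_count(epub_text(path)), path.name))
        except zipfile.BadZipFile:
            continue  # skip corrupt or incomplete downloads
    return sorted(scores)
```

      One caveat: raw unique-word counts grow with book length, so a long but easy novel may rank above a short, dense one.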

  • emb@lemmy.world (mod) · 8 days ago

    JPDB.io does something like this for Japanese. Not sure you can really import books, but it basically combines some kind of parser with a dictionary API, an example-sentence corpus, and its own spaced repetition system.

    There’s gotta be something along those lines out there for most languages, but I can’t say I know of the tools. Honestly, the breaking-down-into-a-base-word part is probably in the dictionary’s domain: if you give it a conjugated verb, it should usually be able to tell. But some ambiguities need context, and I’m not sure how to account for that.

    AnkiConnect lets you tap into the Anki API, and Wiktionary or (from a quick search) Collins should have a French-English dictionary API available. If the dictionary APIs are good, you could probably get pretty far with basic sentence parsing.
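    A minimal AnkiConnect sketch in Python, assuming the AnkiConnect add-on is installed, Anki is running, and the target deck already exists. It uses the add-on’s `addNote` action (API version 6) on its default address, `http://127.0.0.1:8765`, with Anki’s built-in Basic note type; the helper names are hypothetical.

```python
import json
import urllib.request

ANKICONNECT_URL = "http://127.0.0.1:8765"  # AnkiConnect's default address


def add_note_request(deck: str, front: str, back: str, tags=()) -> dict:
    """Build an AnkiConnect 'addNote' request for the built-in Basic note type."""
    return {
        "action": "addNote",
        "version": 6,
        "params": {
            "note": {
                "deckName": deck,
                "modelName": "Basic",
                "fields": {"Front": front, "Back": back},
                "tags": list(tags),
            }
        },
    }


def invoke(request: dict):
    """POST a request to AnkiConnect (Anki must be running) and return its result."""
    data = json.dumps(request).encode("utf-8")
    with urllib.request.urlopen(ANKICONNECT_URL, data) as resp:
        reply = json.load(resp)
    if reply.get("error"):
        raise RuntimeError(reply["error"])
    return reply["result"]
```

    For example, `invoke(add_note_request("Français::Lecture", "chat", "cat"))` (a made-up deck name) should return the new note’s ID.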

    But yeah, it feels like there’s gotta be something ready-made for it. Wish I knew one so I could point you in a direction.

    • schipelblorp@sh.itjust.works (OP) · 8 days ago

      I’ve only done enough programming to know this is very possible. A word count is probably all I’d need to do this manually. I’m just wondering if this is one of those things I’d do instead of actually learning, so the less time I spend on it, the better I’ll feel.

  • bluGill@fedia.io · 8 days ago

    If you need the vocab first, then the book is too advanced. Pick easier works to read. As a beginner you have no other option, but it shouldn’t take too long before you can find something you can understand without looking up words.