Four months ago, we asked: Are LLMs making Stack Overflow irrelevant? The data at the time suggested that the answer was likely “yes”:

  • Natanael@infosec.pub · 1 day ago

    You’ll never be able to capture every source of questions that humans might have in LLM training data.

    • FaceDeer@fedia.io · 24 hours ago

      That’s the neat thing, you don’t.

      LLM training is primarily about getting the LLM to understand concepts. When you need it to be factual, or are working with it to solve novel problems, you can put a bunch of relevant information into the LLM’s context and it can use that even if it wasn’t explicitly trained on it. It’s called RAG, retrieval-augmented generation. Most of the general-purpose LLMs on the net these days do that; when you ask Copilot or Gemini about something, the response will often have footnotes pointing to the material it searched up in the background and used as context.

      So for a future Stack Overflow LLM replacement, I’d expect the LLM to be backed by the ability to search through relevant documentation and source code.
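
      A minimal sketch of that flow, assuming a hypothetical search_docs helper in place of a real search index and a hypothetical ask_llm stand-in for an actual chat-completion API:

      ```python
      # Retrieval-augmented generation (RAG) in miniature: retrieve relevant
      # text, prepend it to the prompt, and have the model answer from that
      # context rather than from its training data alone.

      def search_docs(query: str, k: int = 3) -> list[str]:
          # Hypothetical retrieval step; a real system would query a vector
          # index or full-text search over documentation and source code.
          corpus = {
              "parse json in python": "Use json.loads() from the stdlib json module.",
              "read a file in python": "Open the file with open(path) and call .read().",
          }
          # Toy ranking: prefer entries sharing the most words with the query.
          scored = sorted(
              corpus.items(),
              key=lambda kv: -len(set(query.lower().split()) & set(kv[0].split())),
          )
          return [text for _, text in scored[:k]]

      def ask_llm(prompt: str) -> str:
          # Hypothetical model call; swap in any real chat-completion API here.
          return f"[answer grounded in a {len(prompt)}-character prompt]"

      def rag_answer(question: str) -> str:
          context = "\n".join(f"- {doc}" for doc in search_docs(question))
          prompt = (
              "Answer using ONLY the sources below, and cite the source used.\n"
              f"Sources:\n{context}\n\nQuestion: {question}"
          )
          return ask_llm(prompt)

      print(rag_answer("how do I parse JSON in Python"))
      ```

      This retrieval step is also what produces the footnotes mentioned above: the model is handed its sources explicitly, so the answer can point back at them.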

      • Natanael@infosec.pub · 17 hours ago (edited)

        Even then, the summarizer often fails or brings up the wrong thing 🤷

        You’ll still have trouble when it needs to compare changes across multiple versions, especially parsing changelogs and matching them against specific version numbers.
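
        For contrast, that kind of exact version bookkeeping is trivial for deterministic code, while a summarizer has to reproduce the comparison without error. A toy sketch, assuming plain dotted version strings and a hypothetical changelog entry:

        ```python
        # Check whether a fix noted in a changelog applies to an installed
        # version. The changelog entry and version numbers are hypothetical.

        def parse_version(v: str) -> tuple[int, ...]:
            # Assumes plain dotted versions like "2.31.0" (no pre-release tags).
            return tuple(int(part) for part in v.split("."))

        changelog = {"2.31.0": "Fixed proxy credential leak"}
        installed = "2.28.1"

        for fixed_in, note in changelog.items():
            if parse_version(installed) < parse_version(fixed_in):
                print(f"{note!r} landed in {fixed_in}; installed {installed} predates it.")
        ```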

        • FaceDeer@fedia.io · 15 hours ago

          How does this play out when you hold a human contributor to the same standards? They also often fail to summarize information accurately or bring up the wrong thing. Lots of answers on Stack Overflow are just plain wrong, or focus on the wrong thing, or don’t reference the correct sources (when they reference anything at all). The most common criticism of Stack Overflow I’m seeing is how its human contributors direct people to other threads and declare that the question is “already answered” there when it isn’t really.

          LLMs can do a decent job. And right now they are as bad as they’re ever going to be.

          • Natanael@infosec.pub · 13 hours ago

            Well-trained humans are still more consistent, more predictable, and easier to teach.

            There’s no guarantee LLMs will get reliably better at everything. They still make some of the same mistakes today that they did when introduced, and nobody knows how to fix that yet.

            • FaceDeer@fedia.io · 12 hours ago

              You’re still setting a high standard here. What counts as a “well-trained” human, and how many SO commenters count as that? Also, “easier to teach” is complicated. It takes decades for a human to become well trained; an LLM can be trained in weeks. And an individual computer that’ll be running the LLM is “trained” in minutes; it just needs to load the model into memory. Once you have an LLM, you can run as many instances of it as you want to spend money on.

              > There’s no guarantee LLMs will get reliably better at everything

              Never said they would. I said they’re as bad as they’re ever going to be, which allows for the possibility that they don’t get any better.

              Even if they don’t, though, they’re still good enough to have killed Stack Overflow.

              > They still make some of the same mistakes today that they did when introduced, and nobody knows how to fix that yet

              And humans also make mistakes. Do we know how to fix that yet?

              • Natanael@infosec.pub · 9 hours ago

                Getting humans to do their work reliably is a whole science, and lots of fields manage to achieve it.