• ejs@piefed.social
    English
    arrow-up
    1
    arrow-down
    4
    ·
    20 hours ago

    Language modeling is equivalent to a dice roll (given a perfect random number generator). Setting the temperature to 0 removes all randomness from the output: the model always selects the highest-probability next token, so it becomes 100% deterministic. That is, the output of a model is entirely predictable if the temperature is 0 and you know the model weights and the seed/prompt.
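To make the temperature point concrete, here is a minimal sketch in plain Python. The logits are made up and `pick_next_token` and `sample_run` are hypothetical helpers, not any library's API. It treats temperature 0 as a plain argmax, and shows that sampling at a nonzero temperature is also reproducible once the random seed is fixed. Real inference stacks can still show small floating-point or batching variations that this toy ignores.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities, scaled by temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def pick_next_token(logits, temperature, rng):
    """Return the index of the next token (hypothetical helper)."""
    if temperature == 0:
        # Greedy decoding: always take the highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


# Made-up scores for a four-token vocabulary (an assumption for illustration).
toy_logits = [2.0, 1.0, 0.5, -1.0]

# Temperature 0: the same token every time, regardless of the RNG state.
greedy = [pick_next_token(toy_logits, 0, random.Random()) for _ in range(5)]


def sample_run(seed, n=5):
    """Draw n tokens at temperature 1.0 from a freshly seeded generator."""
    rng = random.Random(seed)
    return [pick_next_token(toy_logits, 1.0, rng) for _ in range(n)]


print(greedy)
print(sample_run(42) == sample_run(42))
```

The second print compares two runs with the same seed, which is the "perfect random number generator" caveat in practice: the draw only looks random if you do not know the generator's state.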

    These technicalities aside, it’s true for both a dice roll event and a specific model/prompt event that, practically speaking, the outputs are treated as probabilistic despite being mathematically/technically deterministic: a human can’t predict with 100% accuracy the outcome of a die roll, even though the theory (classical mechanics of die positioning, force, velocity, friction, …) says it is deterministic.

    • YoureHotCupCake@lemmy.world
      arrow-up
      5
      ·
      18 hours ago

      You know the boundaries of a dice roll based on the number of sides of the die. You will never know the boundaries of the AI.

      With a D20 I know I have a 5% chance of rolling a 20 and a 50% chance of rolling over 10. With AI you don’t even know if the data it was trained on was accurate or if it will hallucinate and speak nonsense.
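The D20 figures quoted above can be checked by enumerating the faces. This is a small sketch that assumes a fair die, meaning every face has probability 1/20.

```python
from fractions import Fraction

SIDES = 20
faces = range(1, SIDES + 1)
p_face = Fraction(1, SIDES)  # fair die: each face is equally likely

# Add up the probability of every face that matches the event.
p_twenty = sum(p_face for f in faces if f == 20)
p_over_ten = sum(p_face for f in faces if f > 10)

print(p_twenty, float(p_twenty))      # 1/20 0.05
print(p_over_ten, float(p_over_ten))  # 1/2 0.5
```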

      You can literally ask an AI how many letter Ts are in the word colonialism and it will tell you two. Now how on earth could anyone have known its probability to say that clearly wrong answer?

      Also, in each AI session, its responses to your prompts contribute to the context of the session, and small alterations in how the AI speaks build up and change the outcome of the session. Thus the AI’s own responses affect its probabilities, another thing you cannot account for.

      • ejs@piefed.social
        English
        arrow-up
        1
        arrow-down
        2
        ·
        16 hours ago

        Yes, you do know the boundaries of AI. It is purely matrix multiplication: its output distribution is just as intelligible as the distribution of rolls of a die. We receive a probability distribution for the next token given a sequence of tokens. This is demonstrable; search for softmax online.
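As an illustration of the softmax step mentioned above, the sketch below turns a vector of scores into a probability distribution over candidate next tokens. The four-word vocabulary and its scores are invented for the example; a real model produces one score per entry in a vocabulary of many thousands of tokens.

```python
import math


def softmax(logits):
    """Map arbitrary real scores to positive probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical vocabulary and scores, chosen only for illustration.
toy_vocab = ["zero", "one", "two", "three"]
toy_logits = [1.2, 0.3, 2.5, -0.7]

next_token_probs = dict(zip(toy_vocab, softmax(toy_logits)))

# List candidates from most to least likely, like the faces of a loaded die.
for token, p in sorted(next_token_probs.items(), key=lambda kv: -kv[1]):
    print(f"{token:>5}: {p:.3f}")
```

Every candidate gets a strictly positive probability and the values sum to 1, which is what makes the comparison with a (weighted) die well defined.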

        To fairly equate a dice roll event to a model prompt event, we must understand the technicalities. In the context of qualifying event determinism, saying you have a 20-sided die is equivalent to saying you have a specific model’s architecture and the value of every parameter.

        If you can assume your die is fair and 20-sided, that is equivalent to assuming that a model is llama-3.1-8B-instruct. That is, you do know the specific model weights, which define a deterministic functional relationship between input and output. If you know the model weights, which is equivalent to knowing whether a die is fair and n-sided, you can deterministically predict the output of a model just as you can deterministically predict which number a die will land on.

        You’re making specific, technical errors about the mathematical basis of language modeling, and equating things fallaciously to a similar deterministic event.

        Despite this, your intuition is right: we can’t perceptually predict the output of a model, just as we can’t perceptually predict what number will result from a die roll.