Philosophy (MA) Psychology (BA)

Pro-AI Vegetarian Misanthrope Cynic Agnostic/Atheist

It is precisely because the users of social media are so, so awful that we use it.

  • 1 Post
  • 49 Comments
Joined 25 days ago
Cake day: May 10th, 2026







  • Right…

    So, if I ask it a thousand random questions, a thousand times, and 99.9% of the time it gives the right answer: is that sufficient scientific evidence that it’s shit hot?

    How exactly would your experiment work? How much data would you, personally, need before accepting that it’s actually - already - pretty reliable and only very rarely gives misleading answers, in very niche areas or when people deliberately try to confuse it?

    The thing is: this social media platform (like Reddit) has a massively anti-AI bias, and yet not a single person has been able to give me a single question that I can enter into the latest GPT model and get inaccurate results.

    Perhaps this criticism was more relevant a few years ago. As it stands it’s an argument that doesn’t seem to be readily supportable.

    Can you prove to me, using “the scientific method” that the latest model of ChatGPT has a significant chance of producing misleading information?

    And for the critics…I’m not entirely sure what they think is currently a better source of answers to questions? Random Google results where the page has paid to be at the top? Social media? Or combing through Google Scholar and finding…whichever “scientific paper” supports…whatever one wants to believe…(We saw how well that went in the pandemic).
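
The thousand-question experiment described above can be made a little more concrete. What follows is only a minimal sketch, assuming the questions are independent random draws from whatever pool of questions one cares about (a strong assumption for real-world usage). The function names `wilson_interval` and `trials_needed_with_zero_errors` are hypothetical helpers introduced here for illustration, not part of any library, and the 999-out-of-1000 figure is just the hypothetical from the comment, not measured data.

```python
import math


def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is roughly 95%)."""
    if total <= 0 or not 0 <= correct <= total:
        raise ValueError("need 0 <= correct <= total and total > 0")
    p = correct / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


def trials_needed_with_zero_errors(max_error_rate, confidence=0.95):
    """Smallest n such that n error-free answers in a row would be
    implausible (at the given confidence) if the true error rate
    were max_error_rate or higher: (1 - rate) ** n <= 1 - confidence."""
    if not 0 < max_error_rate < 1 or not 0 < confidence < 1:
        raise ValueError("rates must lie strictly between 0 and 1")
    return math.ceil(math.log(1 - confidence) / math.log(1 - max_error_rate))


if __name__ == "__main__":
    # Hypothetical outcome from the comment: 999 right answers out of 1000.
    low, high = wilson_interval(999, 1000)
    print(f"plausible accuracy range: {low:.4f} to {high:.4f}")
    # How many flawless answers would support "error rate below 0.1%"?
    print("error-free trials needed:", trials_needed_with_zero_errors(0.001))
```

Under those assumptions, 999 correct answers out of 1000 is compatible with a true accuracy anywhere from roughly 99.4% to just under 100%, and backing a claim of "wrong less than 0.1% of the time" would take on the order of 3,000 consecutive error-free answers. The sketch says nothing about how the questions are chosen, which is where most of the disagreement in this thread actually lies.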


  • Ok…so prove how shit it is by giving me a single question that I can ask it right now where it will give me the wrong information.

    Your point is entirely reductionist: it’s a “glorified search engine” that processes all of the search results with a breadth and speed most people either cannot or will not (due to time constraints) do, in seconds, and then gives an answer that is superior to any of the individual sources it bases that answer on.

    It also expresses that answer in highly adaptable ways: e.g. it can express the final answer in a way suitable for a child or a postgraduate expert, or a person who learns best through answers written in a style that combines Norse Sagas and Shakespeare.







  • By definition that would be a rather difficult thing to falsify. I tend to find the answers in my own academic fields (philosophy and psychology) to be accurate and impressive.

    Admittedly that rarely requires asking for direct passages from texts: usually more the refinement of basic principles, application of such principles, or creating new theories by combining previous ones.

    When I ask for philosophical quotations, it hasn’t produced an error I’ve observed yet: presumably it often combs the many quotation websites for such things.

    Can you give me an example of a question I can ask it where it will misquote and/or rewrite something?

    I am not an expert in Magic The Gathering but perhaps I can somehow see what you mean.





  • The article states: “ChatGPT-4o performed best with 84.6% validity”

    It is reasonable to assume that GPT 5.5 in thinking mode has a significantly lower error rate.

    It is also worth noting that the rate of diagnostic error amongst real doctors is estimated to be around 5%.

    Admittedly, that figure comes from a fairly old study: Singh, H., Meyer, A. N. D., & Thomas, E. J. (2014). The frequency of diagnostic errors in outpatient care: Estimations from three large observational studies involving US adult populations. BMJ Quality & Safety, 23(9), 727–731. https://doi.org/10.1136/bmjqs-2013-002627

    In response to your point: I am mainly interested in probabilistic reliability - if it gives the correct answer 99.9% of the time, it is clearly superior to the vast majority of human beings (with, perhaps, the exception of the best specialists in the most obscure niches) - especially given the sheer breadth of topics it can reliably answer questions on.

    Interestingly, my question “What was India like before the British arrived?” produces consistently biased and misleading answers, though I haven’t yet tried it on the new model.



  • I asked it: paraphrase a poem about glass testicles by Carol Anne Duffy.

    My assumption was that it would interpret the request as asking it to imagine such a poem and then paraphrase it. Perhaps replacing “a” with “the” would pressure it into producing a paraphrase of a nonexistent poem.

    The response is interesting, but not necessarily misleading or factually incorrect:

    "I can’t paraphrase a full Carol Ann Duffy poem, because that would be a close transformation of copyrighted poetry. But I can help in one of these ways:

    Give me the title, and I can provide a brief summary of what the poem is saying, its themes, imagery, tone, and meaning.

    Based on your description, the poem may be using “glass testicles” as a deliberately strange, fragile, comic-grotesque image: masculinity presented as precious, exposed, breakable, perhaps ridiculous. In Duffy’s style, that sort of image would likely undercut male power by making it seem vulnerable, theatrical, or absurd, while also drawing attention to the body as something symbolic rather than merely physical.

    A safe paraphrase of the idea, rather than the poem, would be:

    The poem presents masculinity as something fragile and strangely ornamental. What might normally be associated with strength or virility is imagined as delicate, transparent, and easily damaged. The effect is comic, unsettling, and critical, making male pride seem both exposed and vulnerable.

    If you paste a short excerpt of up to about 90 characters, I can explain that specific bit closely."