RA2lover@burggit.moe to Chess@burggit.moe · English · 2 years ago

Debunking the Chessboard: Confronting GPTs Against Chess Engines to Estimate Elo Ratings and Assess Legal Move Abilities

blog.mathieuacher.com

Can GPTs like ChatGPT-4 play legal moves and finish chess games? What is the actual Elo rating of GPTs? There has been a lot of hype, (subjective) assessment, and buzz lately, from “GPT is capable of beating 99% of players?” to “GPT plays lots of illegal moves” to “here is a magic prompt with Magnus Carlsen in the headers”. There are more or less solid anecdotes here and there, with counter-examples showing impressive failures or magnified stories on how GPTs can play chess well. I’ve resisted for a long time, but I’ve decided to do it seriously! I have synthesized hundreds of games with different variants of GPT and different prompt strategies, against different chess engines (of various skill levels). This post is here to document the variability space of the experiments I have explored so far… and the underlying insights and results. The tl;dr is that gpt-3.5-turbo-instruct operates around 1750 Elo and is capable of playing end-to-end legal moves, even with the black pieces or when the game starts with strange openings. However, though some errors are “avoidable”, illegal moves still occur in 16% of the games. ChatGPT-3.5-turbo and, more surprisingly, ChatGPT-4 are much more brittle. Hence, we provide the first solid evidence that training for chat makes GPT worse on a well-defined problem (chess). Please do not stop at the tl;dr; read the entire blog post: there are subtleties and findings worth discussing!
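For readers wondering how an Elo figure can be estimated from games against engines of known strength, here is a minimal sketch. It is not taken from the blog post and may differ from the author's actual method: it uses the standard Elo expected-score formula and bisects for the rating whose expected total matches the observed score. The function names, the search bounds, and the sample results are all hypothetical.

```python
def expected_score(rating, opponent):
    """Standard Elo expected score of `rating` against `opponent`."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def performance_rating(results, lo=0.0, hi=4000.0, tol=0.01):
    """Estimate a rating from (opponent_rating, score) pairs.

    score is 1 for a win, 0.5 for a draw, 0 for a loss.
    The bounds lo/hi are assumed wide enough to bracket the answer.
    """
    total = sum(score for _, score in results)
    if total <= 0 or total >= len(results):
        # A 0% or 100% score has no finite performance rating.
        raise ValueError("need a mixed result to estimate a rating")
    # The expected total grows with rating, so bisection converges.
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if sum(expected_score(mid, opp) for opp, _ in results) < total:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical results against engines with assumed ratings.
games = [(1600, 1), (1700, 0.5), (1800, 0), (1900, 0)]
print(round(performance_rating(games)))
```

How a game that ends on an illegal move is scored (for example, as a loss) is a separate design choice that directly affects the resulting number.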
  • Mousepad@burggit.moe · English · 2 years ago

    Very interesting and in-depth write-up. Thanks for sharing!

Chess@burggit.moe


A community for everything chess related! Organize tournaments, look for games, post puzzles, and more.


Rules

  1. Site-wide rules
  2. On topic: Top level posts must include topics relating to chess or chess news.
  3. NSFW content is allowed, but pornography is not. All NSFW posts must be flaired. If not, the post will be removed until it is. Rule 1 still applies.
  4. Hate speech is strictly prohibited when used maliciously. Use common sense! None of the usual *phobia suspects will be permitted.
  5. Looking for game (LFG) posts must be contained in the month’s megathread.
  6. Use of chess engines in games between members is strictly disallowed unless otherwise agreed upon.

Resources

Lichess is the preferred website for online chess for this community as it is the most popular free & open-source website. Another option is chess.com, which is freemium.

Guides to understand algebraic chess notation.
