Bots are currently scraping the internet for LLM training data at unprecedented rates[1][2][3], driving up costs and destabilizing public-facing websites. I want to talk about how this has been particularly difficult for wikis, and has gotten much worse in the last few months.
I was looking into this today, trying to figure out how to make it work in Docker Compose, but sadly I had a hell of a time with it. I'll take another crack at it some other day. Fingers crossed!