• At this point, the main bottlenecks in improving models are electricity, chips, and training data. The first two can be solved by increasing production, although that tends to take years. Training data is harder to come by: almost all books, articles, comments, and easily accessible studies have already been used for pre-training by the large model creators, and models consume new text faster than reputable sources produce it. Articles and comments are no longer good training sources because they are probably AI-generated[1]. Human conversations, however, are a near-limitless source of training data, with no risk of model collapse. A surveillance state that can record, transcribe, and vectorize every conversation its citizens hold gains a massive leg up in the AI race.
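The record-transcribe-vectorize pipeline can be sketched in miniature. Everything here is an illustrative stand-in: a real system would use a speech-to-text model for `transcribe` and a learned embedding model for `vectorize`; this toy version only shows the shape of the data flow.

```python
# Toy sketch of the "record -> transcribe -> vectorize" pipeline.
# All names and the hashed bag-of-words embedding are illustrative
# stand-ins, not a real surveillance or training stack.
from collections import Counter
import hashlib

def transcribe(audio_clip: bytes) -> str:
    # Stand-in for a speech-to-text model; here we pretend the "audio"
    # is already text so the sketch runs end to end.
    return audio_clip.decode("utf-8")

def vectorize(text: str, dim: int = 64) -> list[float]:
    # Hashed bag-of-words: each token is hashed into one of `dim` buckets.
    # A production pipeline would use a learned embedding model instead.
    vec = [0.0] * dim
    for token, count in Counter(text.lower().split()).items():
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % dim
        vec[bucket] += count
    return vec

# Simulated "recorded conversations" flowing into a training corpus
# of (transcript, vector) pairs.
recorded = [b"where should we eat tonight", b"did you finish the report"]
corpus = [(t, vectorize(t)) for t in (transcribe(c) for c in recorded)]
```

The point of the sketch is that each stage is cheap and embarrassingly parallel, which is what makes conversation data attractive at surveillance scale.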


    1. https://graphite.io/five-percent/more-articles-are-now-created-by-ai-than-humans ↩︎