LLMs Can Think While Idle: Researchers from Letta and UC Berkeley Introduce ‘Sleep-Time Compute’ to Slash Inference Costs and Boost Accuracy Without Sacrificing Latency

cm0002@lemmy.world · 2 months ago

vrighter@discuss.tchncs.de · 2 months ago

Slash inference costs by doing a bunch of useless inferences, in the hope that the one the user actually wanted happens to be one of them.

It cannot be more efficient than just waiting for the input and inferring once based on that.

jwmgregory@lemmy.dbzer0.com · 1 month ago

i mean…

your brain essentially does this; it’s just that compute and memory are one system there, and it is about as physically optimized as it can be.

this strategy is less stupid than it sounds if you abandon von neumann purism imo

it can be more efficient than just waiting for the input and inferring once based on that… you are an example of this in real life.

vrighter@discuss.tchncs.de · 1 month ago

faster != more efficient. And you cannot compare brains to computers. Speculative execution improves speed in the CPU at the cost of efficiency: the work done on mispredicted paths is thrown away.
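
To put rough numbers on the speed vs. efficiency distinction, here is a minimal sketch of a toy cost model. The function names and all figures are made up for illustration and are not taken from the paper; it assumes a one-off precompute cost per context and a cheaper per-query cost once that precompute exists.

```python
def total_compute(precompute_cost, cost_with_context, cost_without_context, queries):
    """Return (baseline, speculative) total compute in arbitrary units.

    baseline:    wait for each query, then infer from scratch.
    speculative: pay precompute_cost up front, then answer each query cheaply.
    All parameters are hypothetical illustration values.
    """
    baseline = cost_without_context * queries
    speculative = precompute_cost + cost_with_context * queries
    return baseline, speculative


def break_even_queries(precompute_cost, cost_with_context, cost_without_context):
    """Number of queries above which the speculative strategy uses less total compute."""
    saving_per_query = cost_without_context - cost_with_context
    return precompute_cost / saving_per_query


# Each query is answered faster in the speculative case (20 < 50 units),
# but total compute is only lower once enough queries reuse the precomputed work.
print(total_compute(100, 20, 50, 0))  # (0, 100): nobody asked, pure waste
print(total_compute(100, 20, 50, 1))  # (50, 120): faster, but less efficient
print(total_compute(100, 20, 50, 5))  # (250, 200): faster and more efficient
print(break_even_queries(100, 20, 50))
```

Under these assumed numbers the speculative strategy always wins on latency, but it only wins on total compute if the precomputed work is reused by more than about three queries. Whether real workloads clear that bar is exactly what is being argued about here.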