SuspiciousCarrot78@aussie.zone to Selfhosted@lemmy.worldEnglish · 2 days agoDo you host your own AI?message-squaremessage-square191linkfedilinkarrow-up1164arrow-down135file-text
arrow-up1129arrow-down1message-squareDo you host your own AI?SuspiciousCarrot78@aussie.zone to Selfhosted@lemmy.worldEnglish · 2 days agomessage-square191linkfedilinkfile-text
minus-squareSuspiciousCarrot78@aussie.zoneOPlinkfedilinkEnglisharrow-up11·1 day agoLlama.cpp or death!
minus-squaretristynalxander@mander.xyzlinkfedilinkEnglisharrow-up3·19 hours agoIt’s not that hard to use llama.cpp directly anyway. Why would I use a wrapper when I can just run a python script?
minus-squareBlackLaZoR@lemmy.worldlinkfedilinkEnglisharrow-up1·1 hour agoI use LMStudio, because it has quality of life improvements like nice GUI and huggingface search engine. Also they have Vulkan backend that at least on 7900XTX is ~10% faster than rocm (on LLama 3 8b Q4_0 it gets 115Tokens/s vs 105 on rocm)
minus-squarebrucethemoose@lemmy.worldlinkfedilinkEnglisharrow-up3·21 hours agoOr exllama! Vllm, sglang, Lorax. Koboldcpp, Aphrodite, text-generation-webui, LM Studio, powerinfer, ktransformers, mlc-LLM, really whatever floats your boat. Just not ollama, specifically.
Llama.cpp or death!
It’s not that hard to use
llama.cppdirectly anyway. Why would I use a wrapper when I can just run a python script?I use LMStudio, because it has quality of life improvements like nice GUI and huggingface search engine. Also they have Vulkan backend that at least on 7900XTX is ~10% faster than rocm (on LLama 3 8b Q4_0 it gets 115Tokens/s vs 105 on rocm)
Or exllama! Vllm, sglang, Lorax. Koboldcpp, Aphrodite, text-generation-webui, LM Studio, powerinfer, ktransformers, mlc-LLM, really whatever floats your boat. Just not ollama, specifically.