

There’s something to be said for its scripting accessibility, too. Hence the many fabulous VSCode extensions.


At least they’re so ridiculously sycophantic and sloppified that it’s obvious.
Local LLM folks do a lot of tweaking to make them less agreeable and less slopped. But the vast majority of spammers are too stupid to seek that out.
I am guilty of this.
It turns out “self healing” is no match for attention optimization.


Ghostty has single-instance mode on Linux that shares some resources.
Oh I didn’t know this. I will have to try it sometime.
That’s fine for lightweight usage, but for workflows like mine that involve heavy TUIs and multiple tmux sessions with dozens of windows/panes with big scrollback buffers, it becomes a bottleneck when one or more panes are flooding output from scripts/playbooks/etc.
Yeah, for sure. Different use cases. Hence I can keep both installed, heh.
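For anyone else curious, I believe it’s a single line in Ghostty’s config (GTK/Linux builds; hedging a bit since I haven’t tried it myself, so check the docs or `ghostty +show-config`):

```
# ~/.config/ghostty/config
# Reuse one process for all new windows (GTK/Linux only, as far as I know)
gtk-single-instance = true
```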


I am not the expert, but… Complexity?
Sometimes I use Foot instead of Alacritty/Wezterm to save RAM in extreme situations. Foot’s also really nice because it uses a server/client model (again, saving RAM with many terminals), though I don’t know if that’s fundamentally incompatible with GPU terminals.
+1
For stuff like editing massive files or huge folders, the least stuttery, fastest IDE for me is… VSCode. JetBrains (last I tried it) was awful.
Code may not use 1MB of RAM or idle dead asleep, but it utilizes the CPU/GPU efficiently.
Now, extensions are the caveat, like any app that supports extensions. Those can bog it down real quick.


Yeah…
I think most non-PC-gamer consumers will just go to Android and iOS :(. It’s the simplest path.
Not sure about business. Sheer entrenchment aside, I’ve heard conflicting reports ranging from Windows management systems being so good they’re utterly unparalleled, to Windows systems breaking so much that IT is getting frustrated.


It’s all C++ now, so it doesn’t really need docker! I don’t use docker for any ML stuff, just pip/uv venvs.
You might consider Arch (dockerless) ROCm soon; it looks like 7.1 is in the staging repo right now.


Oh, I forgot!
You should check out Lemonade:
https://github.com/lemonade-sdk/lemonade
It supports Ryzen NPUs via two different runtimes… though apparently not the 8000 series yet?


Yeah… Even if the LLM is RAM-speed constrained, simply using another device so the LLM doesn’t get interrupted would be good.
Honestly AMD’s software dev efforts are baffling. They’ve focused on a few libraries precisely no one uses, like this: https://github.com/amd/Quark
…while ignoring issues holding back entire sectors (like broken flash-attention), with devs screaming about them at the top of their lungs.
Intel suffers from corporate Game of Thrones, but at least they have meaningful contributions in the open source space here, like the SYCL/AMX llama.cpp code or the OpenVINO efforts.


It still uses memory bandwidth, unfortunately. There’s no way around that, though NPU TTS would still be neat.
…Also, generally, STT responses can’t be streamed, so you might as well use the iGPU anyway. TTS can be chunked, I guess, but do the major implementations do that?
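For reference, chunked TTS is easy enough to sketch. Here `synthesize` and `play` are hypothetical stand-ins for a real engine and audio sink, not any particular project’s API:

```python
import re

def synthesize(chunk: str) -> bytes:
    # Stand-in for a real TTS engine call (hypothetical).
    return chunk.encode()

def play(audio: bytes) -> None:
    # Stand-in for audio playback; a real app would queue to a sound device.
    print(f"playing {len(audio)} bytes")

def speak_chunked(text: str) -> None:
    # Split on sentence boundaries so the first sentence can start playing
    # while the rest is still being synthesized, cutting perceived latency.
    for chunk in re.split(r"(?<=[.!?])\s+", text):
        if chunk:
            play(synthesize(chunk))

speak_chunked("First sentence goes out fast. The rest follows as it is ready.")
```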


The iGPU is more powerful than the NPU on these things anyway. The NPU is more for ‘background’ tasks, like Teams audio processing or whatever it’s used for on Windows.
Yeah, in hindsight, AMD should have put (and still should put) a few engineers on popular projects (and pushed NPU support harder), but GGML support is good these days. It’s gonna be pretty close to RAM speed-bound for text generation.
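As a rough back-of-envelope for why (all numbers below are assumptions, not benchmarks): each generated token has to stream roughly all the active weights from RAM, so bandwidth sets a hard ceiling on tokens/sec:

```python
# Illustrative numbers only: assumed dual-channel DDR5 APU and a Q4-ish quant.
bandwidth_gb_s = 90       # assumed memory bandwidth
params_b = 9              # model size in billions of (active) parameters
bytes_per_param = 0.55    # ~4.4 bits/weight for a Q4_K-style quant

weight_gb = params_b * bytes_per_param
print(f"~{bandwidth_gb_s / weight_gb:.0f} tok/s ceiling "
      f"for ~{weight_gb:.1f} GB of weights")
```

Prompt processing, by contrast, is compute-bound, which is why the iGPU still matters there.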


Ah. On an 8000 APU, to be blunt, you’re likely better off with Vulkan + whatever omni models GGML supports these days. Last I checked, TG is faster and prompt processing is close to ROCm.
…And yeah, that was total misadvertising on AMD’s part. They’ve completely diluted the term, kinda like TV makers did with ‘HDR’.


You can do hybrid inference of Qwen 30B omni for sure. Or fully offload inference of VibeVoice Large (9B). Or really a huge array of models.
…The limiting factor is free time, TBH. Just sifting through the sea of models, seeing if they work at all, testing if quantization works and such is a huge timesink, especially if you’re trying to load stuff with ROCm.
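If it helps, hybrid offload itself is only a couple of lines with llama-cpp-python; the GGUF filename and layer split below are placeholders (and whether a given omni model’s audio side loads through llama.cpp at all is exactly the kind of thing you end up testing):

```python
from llama_cpp import Llama

llm = Llama(
    model_path="qwen3-omni-30b-a3b-q4_k_m.gguf",  # placeholder filename
    n_gpu_layers=24,  # offload what fits in VRAM; the rest stays in system RAM
    n_ctx=8192,
)
out = llm("Explain hybrid inference in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```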


I mean, there are many. TTS and self-hosted automation are huge in the local LLM scene.
We even have open-source “omni” models now that can ingest and output speech tokens directly, which means they get more semantic understanding from tone and such, they “choose” the tone to reply with, and output streams word-by-word. They support all sorts of tool calling.
…But they aren’t easy to run. It’s still in the realm of homelabs with at least an RTX 3060 + hacky Python projects.
If you’re mad, you can self-host LongCat Omni
https://huggingface.co/meituan-longcat/LongCat-Flash-Omni
And blow Alexa out of the water with a MIT-licensed model from, I kid you not, a Chinese food delivery company.
EDIT
For the curious, see:
Audio-text-to-text (and sometimes TTS): https://huggingface.co/models?pipeline_tag=audio-text-to-text&num_parameters=min%3A6B&sort=modified
TTS: https://huggingface.co/models?pipeline_tag=text-to-speech&num_parameters=min%3A6B&sort=modified
“Anything-to-anything,” generally image/video/audio/text -> text/speech: https://huggingface.co/models?pipeline_tag=any-to-any&num_parameters=min%3A6B&sort=modified
Bigger than 6B to exclude toy/test models.


That’s what I was thinking of, thanks.


I thought there was a cutoff for Pascal cards too?


Also, random thing, but I did not get a notification for your reply.
I don’t think that’s a PieFed thing, as it happens a lot to me, even with other .world users.


Nuke everything and start over? I guess you could keep the home directory, but TBH I’d back it up and nuke it too, just in case.
Or, as a shorter-term solution, run it off a USB drive.


Yeah.
People harp on OnlyFans, but how sexualized and “softcore teasing” Insta and even TikTok are kinda creeps me out. They’re literally entry points to OF.
I have a parent who’s blissfully off of social media, and it was interesting to see their reaction to what these apps are like now.