tcsenpai

a day ago
I think they should start aiming for 20B models along with 32B and 7B. Usually 7B is enough for an 8GB GPU, 32B needs a 24GB GPU for decent quants (I can fit a 32B with IQ3_XXS, but it's not ideal), while 20-ish B models (such as Magistral or gpt-oss) are a perfect fit for 16GB GPUs.
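
A rough back-of-the-envelope supports those pairings; a minimal sketch, assuming weight memory is roughly parameters × bits-per-weight / 8, where the bits-per-weight figures for the GGUF quants are approximate averages and KV cache and runtime overhead are ignored:

    # Rough weight-memory estimate: params * bits_per_weight / 8 bytes.
    # Bits-per-weight values are approximate averages for these GGUF quants;
    # KV cache, activations and runtime overhead are not counted.
    QUANT_BITS = {"Q4_K_M": 4.8, "IQ3_XXS": 3.1}

    def weight_gb(params_billions: float, quant: str) -> float:
        return params_billions * 1e9 * QUANT_BITS[quant] / 8 / 1e9

    for params, quant in [(7, "Q4_K_M"), (20, "Q4_K_M"), (32, "Q4_K_M"), (32, "IQ3_XXS")]:
        print(f"{params}B @ {quant}: ~{weight_gb(params, quant):.1f} GB")

    # 7B  @ Q4_K_M : ~4.2 GB  -> comfortable on an 8GB card
    # 20B @ Q4_K_M : ~12.0 GB -> fits a 16GB card with room for context
    # 32B @ Q4_K_M : ~19.2 GB -> wants ~24GB once cache/overhead is added
    # 32B @ IQ3_XXS: ~12.4 GB -> squeezes under 16GB, but quality suffers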

embedding-shape

a day ago
Depends heavily on the architecture too. I think the free-for-all to find the best sizes is still ongoing, and rightly so. GPT-OSS-120B, for example, fits in around 61GB of VRAM for me at MXFP4.
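
That 61GB figure lines up with a quick sanity check; a minimal sketch, assuming GPT-OSS-120B has roughly 117B total parameters and MXFP4 costs about 4.25 effective bits per weight (4-bit elements plus a shared 8-bit scale per 32-weight block), with any unquantized tensors ignored:

    # Sanity check: GPT-OSS-120B weight footprint at MXFP4.
    # Assumptions: ~117e9 total parameters; MXFP4 = 4-bit elements plus an
    # 8-bit shared scale per 32-element block; unquantized tensors ignored.
    params = 117e9
    bits_per_weight = 4 + 8 / 32                 # ~4.25 effective bits
    print(f"~{params * bits_per_weight / 8 / 1e9:.0f} GB")  # ~62 GB, near the observed ~61GB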

Personally, I hope GPU makers instead start adding more VRAM, or if one can dream, expandable VRAM.

refulgentis

a day ago
Unlikely to see more VRAM in the short term, memory prices are through the roof :/ Like, not subtly, 2-4x.

embedding-shape

a day ago
Well, GPUs are getting more VRAM, although it's pricey. We didn't use to have 96GB VRAM GPUs at all, and now they exist :) For the ones who can afford it, it is at least possible today, and it slowly increases.

refulgentis

21 hours ago
Agreed, in the limit, RAM goes up. As billg knows, 640KB definitely wasn't enough for everyone :)

embedding-shape

21 hours ago
I'm already thinking 96GB might not be enough, and I've only had this GPU for 6 months or so :|