• 0 Posts
  • 1 Comment
Joined 3 years ago
Cake day: June 10th, 2023

  • I’m running Qwen 3.6 35B A3B (the MoE model) on an Nvidia GPU with 8 GB of VRAM and 32 GB of system RAM. With tweaking (and Turboquant), I’ve got it up to 30-40 tokens per second with a 260k context. It’s very usable. I’ve seen people report success with dual 3060 cards, but you’re still talking $1000-1500 for that kind of setup, even if you have parts of it already.
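
    For anyone wondering why a 35B model is workable on an 8 GB card at all, here is a rough back-of-the-envelope sketch of the weight memory. The 35B total and 3B active figures come from the model name; the 4 bits per weight is purely an assumption for illustration (the actual quantization isn't stated above), and the estimate ignores the KV cache, activations, and runtime overhead.

    ```python
    def weight_gib(params_billion: float, bits_per_weight: float) -> float:
        """Approximate weight storage in GiB for a parameter count and quantization width."""
        total_bytes = params_billion * 1e9 * bits_per_weight / 8
        return total_bytes / 2**30

    TOTAL_B = 35.0   # total parameters, from the "35B" in the model name
    ACTIVE_B = 3.0   # parameters active per token, from "A3B"
    BITS = 4.0       # ASSUMED quantization width; not stated in the comment

    total = weight_gib(TOTAL_B, BITS)
    active = weight_gib(ACTIVE_B, BITS)

    print(f"all weights:       {total:.1f} GiB")
    print(f"active per token:  {active:.1f} GiB")
    print(f"fits in 8 GiB VRAM alone:        {total <= 8}")
    print(f"fits in 8 GiB VRAM + 32 GiB RAM: {total <= 8 + 32}")
    ```

    Under that assumption the full set of weights doesn't fit in VRAM but does fit once system RAM is included, and only a small slice of it is touched for each token, which is roughly why an MoE model tolerates being split across GPU and CPU memory better than a dense model of the same size.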