When trying to serve the model with the chat template (--serve), I run into a CUDA out-of-memory error. I tried lowering --gpu-memory-utilization from 0.9 to 0.7, but I still get a CUDA out-of-memory error. I'm running on an A40 with 48 GB of VRAM.
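As a rough way to reason about the flag, the sketch below assumes that --gpu-memory-utilization is the fraction of total GPU memory the server may claim, and that the model weights (plus peak activations) must fit inside that budget before anything is left for the KV cache. The helper names and the 30 GB weight figure are hypothetical, chosen only for illustration; the post does not say which model is being served. Under these assumptions, lowering the fraction from 0.9 to 0.7 shrinks the budget on a 48 GB card from about 43.2 GB to about 33.6 GB without shrinking the weights, so it can reduce the remaining headroom instead of fixing the error.

```python
def memory_budget_gb(total_gb: float, utilization: float) -> float:
    """Hypothetical helper: memory the server may claim, assuming the flag
    is a fraction of total GPU memory."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return total_gb * utilization


def kv_cache_headroom_gb(total_gb: float, utilization: float,
                         weights_gb: float, activation_gb: float = 0.0) -> float:
    """Hypothetical helper: budget left for the KV cache after weights and
    peak activations. A negative value suggests an out-of-memory error."""
    return memory_budget_gb(total_gb, utilization) - weights_gb - activation_gb


if __name__ == "__main__":
    A40_TOTAL_GB = 48.0        # nominal A40 capacity from the post
    WEIGHTS_GB = 30.0          # hypothetical model size, for illustration only
    for util in (0.9, 0.7):
        budget = memory_budget_gb(A40_TOTAL_GB, util)
        headroom = kv_cache_headroom_gb(A40_TOTAL_GB, util, WEIGHTS_GB)
        print(f"util={util}: budget={budget:.1f} GB, KV headroom={headroom:.1f} GB")
```

If the headroom comes out small or negative for the model actually being served, lowering the fraction further will not help; reducing what must fit (for example a shorter maximum context length, which shrinks the KV cache requirement) is the direction to look in instead.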