Error serving chat_template and GPU memory utilization

by BYRIE - opened Feb 26

Feb 26

When trying to serve with the chat template (--serve), I run into a CUDA out-of-memory error. I tried lowering --gpu-memory-utilization from 0.9 to 0.7, but I still get CUDA out of memory. I'm running on an A40 with 48 GB of VRAM.
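For reference, here is a rough sketch of what the two settings work out to on this card. It assumes --gpu-memory-utilization is the fraction of total VRAM the server is allowed to claim (the post does not name the serving tool, so this is an assumption), and `memory_budget_gb` is a hypothetical helper used only for the arithmetic.

```python
def memory_budget_gb(total_vram_gb: float, utilization: float) -> float:
    """Return the VRAM budget implied by a utilization fraction.

    Assumption: the flag is a simple fraction of total device memory.
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return total_vram_gb * utilization


if __name__ == "__main__":
    total = 48.0  # A40 VRAM as stated in the post
    for fraction in (0.9, 0.7):
        budget = memory_budget_gb(total, fraction)
        print(f"--gpu-memory-utilization {fraction}: about {budget:.1f} GB")
```

Under that assumption, going from 0.9 to 0.7 shrinks the budget from about 43.2 GB to about 33.6 GB rather than freeing memory, which may be why lowering it did not help.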
