Error serving chat_template and GPU memory utilization

#1
by BYRIE - opened

When I try to serve the chat template with --serve, I run into a CUDA out-of-memory error. I tried lowering --gpu-memory-utilization from 0.9 to 0.7, but I still get CUDA out of memory. I'm running on an A40 with 48 GB of VRAM.
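For reference, here is a rough sketch of the arithmetic behind the two settings I tried. It assumes --gpu-memory-utilization is the fraction of total VRAM the server is allowed to claim (as in vLLM-style servers; that reading is an assumption). The helper name `memory_budget_gb` is made up for illustration.

```python
def memory_budget_gb(total_vram_gb: float, utilization: float) -> float:
    """Return the VRAM budget implied by a utilization fraction.

    Hypothetical helper: assumes the flag is a simple fraction of total VRAM.
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return total_vram_gb * utilization


TOTAL_VRAM_GB = 48.0  # A40, as stated in the post

for fraction in (0.9, 0.7):
    budget = memory_budget_gb(TOTAL_VRAM_GB, fraction)
    print(f"--gpu-memory-utilization {fraction}: about {budget:.1f} GB budget")
```

If that assumption holds, going from 0.9 to 0.7 shrinks the budget rather than freeing memory, so it may not help when the weights plus cache already do not fit.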
