Not supported on Turing GPU architecture?
#3
by Durgaram - opened
Turing-architecture GPUs have no native bfloat16 (BF16) support, and their FP16 capabilities are limited, particularly for efficient half-precision model loading and inference, which is better optimized on Ampere and later architectures. So these BF16 weights can be used natively only on Ampere or newer GPUs.
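A minimal sketch of a runtime check, assuming PyTorch and a Transformers-style loader (the model id below is a placeholder, not the repository in this thread): `torch.cuda.is_bf16_supported()` reports whether the active GPU handles BF16, so the load can fall back to FP16 on Turing, or FP32 on older cards.

```python
import torch
from transformers import AutoModelForCausalLM

# Pick the widest dtype the GPU supports natively.
# BF16 is native from Ampere (compute capability 8.0) onward;
# Turing (7.5) falls back to FP16, older cards to FP32.
if torch.cuda.is_bf16_supported():
    dtype = torch.bfloat16
elif torch.cuda.get_device_capability()[0] >= 7:
    dtype = torch.float16
else:
    dtype = torch.float32

# "some-org/some-model" is a placeholder; substitute the
# checkpoint discussed in this thread.
model = AutoModelForCausalLM.from_pretrained(
    "some-org/some-model",
    torch_dtype=dtype,
)
```

Passing `torch_dtype=torch.float16` casts the BF16 checkpoint to FP16 at load time, which is the usual workaround on Turing, at some cost in dynamic range.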