Time to first token for GPU is wrong
#4, opened by yuimo
سلام
Hi @yuimo,
Apologies for the late reply. Your calculation of 1024 / 620 = 1.65 s divides the number of pre-fill tokens (1024) by the GPU pre-fill throughput (620 tokens/sec). That gives the time needed to pre-fill the 1024 tokens, but it is not the Time to first token reported in the benchmarks. Time to first token often includes various overheads beyond the pre-fill processing itself.
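To make the distinction concrete, here is a minimal sketch of the arithmetic. The function names and the `overhead_s` parameter are hypothetical, introduced only for illustration; the 0.5 s overhead is a placeholder, not a measured or benchmarked value.

```python
def prefill_time_s(prefill_tokens: int, prefill_tokens_per_s: float) -> float:
    """Time spent purely on pre-fill: tokens divided by throughput."""
    return prefill_tokens / prefill_tokens_per_s


def estimated_ttft_s(
    prefill_tokens: int,
    prefill_tokens_per_s: float,
    overhead_s: float = 0.0,
) -> float:
    """Pre-fill time plus any extra overhead (overhead_s is an assumed placeholder)."""
    return prefill_time_s(prefill_tokens, prefill_tokens_per_s) + overhead_s


# The numbers from this thread: 1024 pre-fill tokens at 620 tokens/sec.
prefill = prefill_time_s(1024, 620)
print(f"pre-fill only: {prefill:.2f} s")  # pre-fill only: 1.65 s

# With a made-up 0.5 s of overhead, the time to first token is larger.
ttft = estimated_ttft_s(1024, 620, overhead_s=0.5)
print(f"pre-fill + assumed overhead: {ttft:.2f} s")
```

The pre-fill figure is only a lower bound here; whatever overhead applies in a given setup is added on top of it.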
Thanks.