VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models • arXiv:2409.17066 • Published Sep 25, 2024
SpaceEvo: Hardware-Friendly Search Space Design for Efficient INT8 Inference • arXiv:2303.08308 • Published Mar 15, 2023
ElasticViT: Conflict-aware Supernet Training for Deploying Fast Vision Transformer on Diverse Mobile Devices • arXiv:2303.09730 • Published Mar 17, 2023
Accelerating In-Browser Deep Learning Inference on Diverse Edge Clients through Just-in-Time Kernel Optimizations • arXiv:2309.08978 • Published Sep 16, 2023
Constraint-aware and Ranking-distilled Token Pruning for Efficient Transformer Inference • arXiv:2306.14393 • Published Jun 26, 2023
Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference • arXiv:2308.12066 • Published Aug 23, 2023
BitDistiller: Unleashing the Potential of Sub-4-Bit LLMs via Self-Distillation • arXiv:2402.10631 • Published Feb 16, 2024
T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge • arXiv:2407.00088 • Published Jun 25, 2024