> based on SigLIP2 & Command-A
> built for enterprise use cases 🔥
> use with Inference Providers or transformers 🤗
read their blog https://huggingface.co/blog/CohereLabs/introducing-command-a-vision-07-2025
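A minimal sketch of the Inference Providers route, assuming the hub id follows the blog slug (`CohereLabs/command-a-vision-07-2025`); check the model card for the exact id and chat format:

```python
from huggingface_hub import InferenceClient

client = InferenceClient()  # pass provider=... to pin a specific provider

# Model id is an assumption inferred from the blog URL above; verify on the Hub.
resp = client.chat.completions.create(
    model="CohereLabs/command-a-vision-07-2025",
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},  # placeholder image
            {"type": "text", "text": "What does this chart show?"},
        ],
    }],
    max_tokens=128,
)
print(resp.choices[0].message.content)
```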
Yeah, it's 112 for the PCIe V100 and 125 for the SXM, I think. One thing on the MI100 and other MIxx chip specs I was never clear on: whether their float16 'matrix' numbers are float16 matmul with float32 accumulate (which is what you'd want). The datacenter NVIDIA chips' 'tensor core' FLOPS are usually float32 acc (unless it's a gamer card, in which case that's halved).
The MI100 does have native bfloat16 which is a big win over V100.
I do feel, though, that you're getting good TOPS/$ here because AMD hasn't been that successful in competing with NVIDIA on the full system offer (chips + driver/software). I've really, really wanted this to change, but AMD keeps being frustrating... how are you finding working with it so far in terms of issues / crashes / head banging? :) Hopefully things have been improving.
FWIW, the MI100 was released after the A100, 3 years after the V100... that says something :) Also, it's the matrix / tensor core mixed or reduced precision FLOPS that are of interest, not the float32 FLOPS, which are the 14 & 23 numbers.
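Tangent, but to make the accumulate point concrete: a toy PyTorch sketch (synthetic data, CPU-friendly) comparing an fp32-accumulated matmul against a simulated pure-fp16 accumulation; the rounding error grows with the reduction dimension K:

```python
import torch

torch.manual_seed(0)
M, K, N = 64, 4096, 64
a = torch.randn(M, K, dtype=torch.float16)
b = torch.randn(K, N, dtype=torch.float16)

# Reference: fp16 inputs with float32 accumulation (what you want the
# tensor / matrix cores to be doing under the hood).
ref = a.float() @ b.float()

# Pure fp16 accumulation: the running sum is held in fp16, so every add rounds.
acc = torch.zeros(M, N, dtype=torch.float16)
for k in range(K):
    acc = acc + a[:, k:k + 1] * b[k:k + 1, :]

err = (acc.float() - ref).abs().max().item()
print(f"max abs error with fp16 accumulation over K={K}: {err:.3f}")
```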
SmolVLM-2 and SigLIP-2 are available in `transformers` in dedicated releases: `v4.49.0-SmolVLM-2` and `v4.49.0-SigLIP-2`.

The JupyterLab image is set up for `timm` use, but will work great with `transformers` and other libs. Updated the base image (Python 3.12, Pillow-SIMD for better CPU use with image preprocessing) and made a number of other tweaks. From the Jupyter launcher you can run the terminal and set up a `timm` environment in moments with the `setup_timm_dev` or `setup_timm_scripts` helpers. Give it a try: timm/jupyterlab-timm
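Once inside, a quick Python sanity check of the environment (just what I'd poke at, nothing specific to the image assumed):

```python
import platform

import PIL
import timm

# Confirm the Python 3.12 base and the Pillow-SIMD / timm installs.
print(platform.python_version())
print(PIL.__version__)
print(timm.__version__)
print(timm.list_models("*convnext*")[:3])  # model registry is available
```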
`timm` 1.0.13 and OpenCLIP 2.30.0 releases to start the year. Both modest but worthwhile updates. `timm` added a number of new model weights, and loading support landed in both OpenCLIP and `timm` for two CLIP models that were missed. The DFN L/14 is 🔥. Models remapped from OpenCLIP got their own timm hub instances to allow use with the upcoming Transformers `TimmWrapperModel`.
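A rough sketch of the two loading paths, using a stand-in hub model (`timm/vit_base_patch16_clip_224.openai`); the remapped CLIP instances mentioned above will have their own names on the timm hub:

```python
import timm
import torch
from transformers import AutoModel

# Path 1: load the image tower directly with timm from its hub instance.
model = timm.create_model("hf_hub:timm/vit_base_patch16_clip_224.openai", pretrained=True)
model.eval()
with torch.no_grad():
    feats = model(torch.randn(1, 3, 224, 224))
print(feats.shape)

# Path 2: the same checkpoint via Transformers' TimmWrapperModel
# (requires a transformers version that includes the wrapper).
wrapped = AutoModel.from_pretrained("timm/vit_base_patch16_clip_224.openai")
```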
Yeah, it's been working out well in runs so far, but as is often the case with new optimizers or optimizer enhancements, mileage can vary depending on many variables; curious to know how it works for your case. Case in point: I had some great fine-tune results with adopt, but in this mini-imagenet case it rather flopped. But MARS is actually doing really well here, and MARS w/ caution even better, so it's very hard to cover all ground with new optimizers. MARS results to be added soon though.
New `timm` release, v1.0.12, with a focus on optimizers. The optimizer factory has been refactored, there's now a `timm.optim.list_optimizers()` and a new way to register optimizers and their attributes. As always, you can use a `timm` optimizer like a `torch` one, just replace `torch.optim` with `timm.optim`.

New optimizers include:
* `adafactorbv`
* `adopt` / `adoptw` (decoupled decay)
* `mars`
* `laprop`
* `c`-prefixed 'cautious' versions of the above, as well as `cadamw`, `cnadamw`, `csgdw`, `clamb`, `crmsproptf`
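
For context, a minimal sketch of the factory in use (model choice and hyperparameters are arbitrary):

```python
import timm
import torch
from timm.optim import create_optimizer_v2, list_optimizers

# Enumerate registered optimizers; the cautious variants share the 'c' prefix.
print([name for name in list_optimizers() if name.startswith("c")])

# Drop-in usage: the factory handles param groups / weight decay filtering.
model = timm.create_model("resnet18", num_classes=10)
optimizer = create_optimizer_v2(model, opt="adoptw", lr=1e-3, weight_decay=0.05)

# Standard torch-style step.
x, y = torch.randn(8, 3, 224, 224), torch.randint(0, 10, (8,))
loss = torch.nn.functional.cross_entropy(model(x), y)
loss.backward()
optimizer.step()
optimizer.zero_grad()
```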