We trained this model to flip the unit economics of frame captioning, labeling, and video search. Processing 1B frames used to cost on the order of millions of dollars; it is now viable for teams that aren't the largest companies. We see this unlocking petabyte+ scale video libraries that were previously impossible to search, categorize, or extract clips from. We have already deployed this model at internet scale in partnership with @grass. If you have a use case for this model, shoot us a DM. We move extremely fast.
Inference · Aug 15, 02:02
Introducing ClipTagger-12b. A state-of-the-art video annotation model trained in collaboration with @grass. ClipTagger-12b delivers video annotation capabilities on par with Claude 4 and GPT-4.1 at 17x lower cost. Learn more: