I was genuinely impressed by how easy this makes video search. I don't think embedding full frames with multimodal models is the move at the moment; structured metadata like this is the way. This might change in the future though! Video search is still very nascent, and this is definitely an innovation.
Inference, 19 Aug at 07:28
There's something really special about the schema that @grass developed for ClipTagger-12B. Once you start searching massive video datasets, using metadata filters for objects, production quality, logos, or actions becomes absolutely invaluable. The model we trained is great, but this was a real innovation that they came to us with.
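To make the metadata-filter idea concrete, here is a minimal sketch in Python. It is not the actual ClipTagger-12B schema or API: the `FrameTags` record, its field names, and the `search` helper are all hypothetical, loosely modeled on the categories named above (objects, actions, logos, production quality), and the sample data is made up.

```python
from dataclasses import dataclass, field


@dataclass
class FrameTags:
    """Hypothetical per-frame tag record; field names are assumptions."""
    video_id: str
    timestamp_s: float
    objects: list[str] = field(default_factory=list)
    actions: list[str] = field(default_factory=list)
    logos: list[str] = field(default_factory=list)
    production_quality: str = "unknown"


def search(frames, *, obj=None, action=None, logo=None, quality=None):
    """Return frames matching every filter that was supplied (None = ignore)."""
    def matches(f):
        return (
            (obj is None or obj in f.objects)
            and (action is None or action in f.actions)
            and (logo is None or logo in f.logos)
            and (quality is None or f.production_quality == quality)
        )
    return [f for f in frames if matches(f)]


# Made-up sample data for illustration only.
FRAMES = [
    FrameTags("a", 1.0, ["car"], ["driving"], ["acme"], "professional"),
    FrameTags("b", 4.5, ["dog", "ball"], ["running"], [], "amateur"),
    FrameTags("c", 9.0, ["car"], ["parking"], [], "amateur"),
]

hits = search(FRAMES, obj="car", quality="amateur")
print([f.video_id for f in hits])
```

The point of the sketch is that once each frame carries structured tags, search becomes plain exact-match filtering over fields, with no embedding lookup needed for these queries.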