Article Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM • 3 days ago • 240
Shot categorizer Collection Fine-tune of Florence-2 to generate shot categories, useful for data curation. Code: https://github.com/huggingface/movie-shot-categorizer • 3 items • Updated 8 days ago • 2
C4AI Aya Vision Collection Aya Vision is a state-of-the-art family of vision models that brings multimodal capabilities to 23 languages. • 5 items • Updated 10 days ago • 63
Article A Deepdive into Aya Vision: Advancing the Frontier of Multilingual Multimodality • 11 days ago • 65
Article HuggingFace, IISc partner to supercharge model building on India's diverse languages • 16 days ago • 14
Phi-4 Collection Phi-4 family of small language and multi-modal models. • 7 items • Updated 11 days ago • 109
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published 22 days ago • 129
Article PaliGemma 2 Mix - New Instruction Vision Language Models by Google • 24 days ago • 65
Article ColPali: Efficient Document Retrieval with Vision Language Models • By manu • Jul 5, 2024 • 214
Article Introducing Three New Serverless Inference Providers: Hyperbolic, Nebius AI Studio, and Novita 🔥 • 25 days ago • 93