Aritra Roy Gosthipaty PRO

ariG23498

https://arig23498.github.io/

AI & ML interests

Deep Representation Learning

Recent Activity

upvoted an article 2 days ago

Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM

upvoted an article 2 days ago

Open R1: Update #3

published an article 2 days ago

Benchmarking Assisted Generation with Gemma 3 and Qwen 2.5: A Code-First Guide

ariG23498's activity

upvoted 2 articles 2 days ago

Article

Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM

3 days ago

• 240

Article

Open R1: Update #3

By

and 9 others •

3 days ago

• 213

upvoted 2 collections 2 days ago

Gemma 3

4 items • Updated 2 days ago • 14

Gemma 3 Release

9 items • Updated about 19 hours ago • 234

upvoted a collection 8 days ago

Shot categorizer

Fine-tune of Florence-2 to generate shot categories, useful for data curation. Code: https://github.com/huggingface/movie-shot-categorizer. • 3 items • Updated 8 days ago • 2

upvoted a collection 10 days ago

C4AI Aya Vision

Aya Vision is a state-of-the-art family of vision models that brings multimodal capabilities to 23 languages. • 5 items • Updated 10 days ago • 63

upvoted an article 10 days ago

Article

A Deepdive into Aya Vision: Advancing the Frontier of Multilingual Multimodality

11 days ago

• 65

upvoted an article 14 days ago

Article

Common AI Model Formats

15 days ago

• 30

upvoted 2 articles 15 days ago

Article

SigLIP 2: A better multilingual vision language encoder

22 days ago

• 134

Article

HuggingFace, IISc partner to supercharge model building on India's diverse languages

16 days ago

• 14

upvoted a paper 15 days ago

Phi-4 Technical Report

Paper • 2412.08905 • Published Dec 12, 2024 • 111

upvoted a collection 16 days ago

Phi-4

Phi-4 family of small language and multi-modal models. • 7 items • Updated 11 days ago • 109

upvoted an article 18 days ago

Article

Remote VAEs for decoding with HF endpoints 🤗

19 days ago

• 36

upvoted a collection 20 days ago

SigLIP 2

OpenCLIP and timm SigLIP 2 models • 45 items • Updated 21 days ago • 11

upvoted a collection 21 days ago

SigLIP2

36 items • Updated 2 days ago • 62

upvoted a paper 21 days ago

SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features

Paper • 2502.14786 • Published 22 days ago • 129

upvoted an article 22 days ago

Article

SmolVLM2: Bringing Video Understanding to Every Device

23 days ago

• 205

upvoted 2 articles 23 days ago

Article

PaliGemma 2 Mix - New Instruction Vision Language Models by Google

24 days ago

• 65

Article

ColPali: Efficient Document Retrieval with Vision Language Models 👀

Jul 5, 2024

• 214

upvoted an article 24 days ago

Article

Introducing Three New Serverless Inference Providers: Hyperbolic, Nebius AI Studio, and Novita 🔥

25 days ago

• 93