Microsoft

company

Verified

https://www.microsoft.com/en-us/research/

microsoft

AI & ML interests

None defined yet.

Recent Activity

Yif29 new activity about 11 hours ago

microsoft/LLM2CLIP-Llama3.1-8B-siglip2-so400m-patch14-224:Any rough estimate on when the model might be open-sourced?

nguyenbh new activity about 14 hours ago

microsoft/Phi-4-multimodal-instruct:How to use it with LM Studio?

nguyenbh new activity about 14 hours ago

microsoft/Phi-4-multimodal-instruct:Detected version 0.0.0. Error: FlashAttention2

microsoft's activity

Yif29

in microsoft/LLM2CLIP-Llama3.1-8B-siglip2-so400m-patch14-224 about 11 hours ago

Any rough estimate on when the model might be open-sourced?

#1 opened about 14 hours ago by

nguyenbh

in microsoft/Phi-4-multimodal-instruct about 14 hours ago

How to use it with LM Studio?

#3 opened 15 days ago by

Detected version 0.0.0. Error: FlashAttention2

#43 opened 1 day ago by

Error during inference with image and text.

#12 opened 14 days ago by

I am quite late, but...

#27 opened 8 days ago by

fepegar

in microsoft/maira-2 1 day ago

Request: DOI

#15 opened 7 days ago by

freewym

in microsoft/Phi-4-multimodal-instruct 1 day ago

Does the model support beam search for ASR?

#31 opened 8 days ago by

gargamit

in microsoft/Phi-4-multimodal-instruct 1 day ago

fixes the assertion error when num_beams > 1

#42 opened 1 day ago by

freewym

in microsoft/Phi-4-multimodal-instruct 1 day ago

fixes the assertion error when num_beams > 1

#42 opened 1 day ago by

nguyenbh

in microsoft/Phi-4-multimodal-instruct 2 days ago

Demo inference code running error

#23 opened 11 days ago by

Can I use DeepSpeed with the vision fine-tuning code?

#35 opened 6 days ago by

phi

#41 opened 2 days ago by

nguyenbh

updated a model 2 days ago

microsoft/Phi-4-multimodal-instruct

Automatic Speech Recognition • Updated 1 day ago • 472k • 1.13k

nguyenbh

in microsoft/Phi-4-multimodal-instruct 2 days ago

Add Appendix B: Fine-tuning Korean speech

#40 opened 3 days ago by

Yif29

updated a collection 2 days ago

LLM2CLIP

LLM2CLIP makes SOTA pretrained CLIP models more SOTA than ever. • 11 items • Updated 2 days ago • 55

Yif29

published a model 2 days ago

microsoft/LLM2CLIP-Llama3.1-8B-siglip2-so400m-patch14-224

Updated 2 days ago • 2

xutan

authored 4 papers 2 days ago

CLaMP 2: Multimodal Music Information Retrieval Across 101 Languages Using Large Language Models

Paper • 2410.13267 • Published Oct 17, 2024 • 1

Memories are One-to-Many Mapping Alleviators in Talking Face Generation

Paper • 2212.05005 • Published Dec 9, 2022

DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder

Paper • 2303.17550 • Published Mar 30, 2023

Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis

Paper • 2502.04128 • Published Feb 6 • 25