nerdyface (Nerdy Face)

not-lain

posted an update 2 days ago

🚀AraClip is now fully integrated with Hugging Face 🤗

AraClip is a specialized CLIP model that was created by @pain and optimized for Arabic text-image retrieval tasks🔥

🔗 Try it out 🔗
🤖 model: Arabic-Clip/araclip
🧩 Gradio demo: Arabic-Clip/Araclip-Simplified
🌐 website: https://arabic-clip.github.io/Arabic-CLIP/

JingzeShi

posted an update 5 days ago

We distilled a more accurate and concise dataset from DeepSeek R1 and also provide a code repository for the distillation pipeline. 🤗

Dataset: SmallDoge/SmallThoughts
Code: https://github.com/SmallDoges/small-thoughts

Tonic

posted an update 7 days ago

🙋🏻‍♂️Hey there folks,

Did you know that you can use ModernBERT to detect model hallucinations?

Check out the demo: Tonic/hallucination-test

See here for the medical context demo: MultiTransformer/tonic-discharge-guard

Check out the model from KRLabs: KRLabsOrg/lettucedect-large-modernbert-en-v1

and the library they kindly open-sourced for it: https://github.com/KRLabsOrg/LettuceDetect

👆🏻if you like this topic please contribute code upstream 🚀

Tonic

posted an update 9 days ago

Powered by KRLabsOrg/lettucedect-large-modernbert-en-v1 from KRLabsOrg.

Detect hallucinations in answers based on context and questions using ModernBERT with 8192-token context support!

### Model Details
- **Model Name**: [lettucedect-large-modernbert-en-v1](https://huggingface.co/KRLabsOrg/lettucedect-large-modernbert-en-v1)
- **Organization**: [KRLabsOrg](https://huggingface.co/KRLabsOrg)
- **Github**: [https://github.com/KRLabsOrg/LettuceDetect](https://github.com/KRLabsOrg/LettuceDetect)
- **Architecture**: ModernBERT (Large) with extended context support up to 8192 tokens
- **Task**: Token Classification / Hallucination Detection
- **Training Dataset**: [RagTruth](https://huggingface.co/datasets/wandb/RAGTruth-processed)
- **Language**: English
- **Capabilities**: Detects hallucinated spans in answers, provides confidence scores, and calculates average confidence across detected spans.

LettuceDetect excels at processing long documents to determine if an answer aligns with the provided context, making it a powerful tool for ensuring factual accuracy.
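The "average confidence across detected spans" mentioned above comes down to simple arithmetic. A minimal sketch, assuming each detected span is a dictionary with `start`, `end`, `text`, and `confidence` keys; that shape, the helper name `average_span_confidence`, and the sample values are assumptions for illustration, not the library's confirmed API:

```python
from typing import Optional


def average_span_confidence(spans: list[dict]) -> Optional[float]:
    """Return the mean confidence over detected spans, or None if there are none."""
    if not spans:
        return None
    return sum(span["confidence"] for span in spans) / len(spans)


# Hypothetical detector output for one answer (all values invented for illustration).
example_spans = [
    {"start": 0, "end": 29, "text": "The population is 69 million.", "confidence": 0.75},
    {"start": 30, "end": 45, "text": "founded in 1200", "confidence": 0.5},
]
print(average_span_confidence(example_spans))
```

Returning `None` for an empty list keeps "no hallucinated spans found" distinct from "spans found with very low confidence".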

stefan-it

posted an update 11 days ago

🇹🇷 😍 I'm very happy to finally announce my new Turkish LM called "BERT5urk":

stefan-it/bert5urk

It is a 1.42B-parameter T5-based model, trained with the UL2 pretraining objective on the Turkish part of the awesome HuggingFaceFW/fineweb-2 dataset.

Feel free to check it out!

stefan-it

posted an update 15 days ago

After running some 3DMark and FurMark benchmarks on Windows to make sure that my new 5090 is not melting any cables [1], and taking some nice shots with a thermal camera (I don't think that's too much), I found that running fine-tuning experiments with my favorite Flair & Transformers libraries is very easy.

Important steps:

It is a good idea to start with a fresh Ubuntu 24.04 installation with the latest CUDA 12.8 and the open NVIDIA driver; see [2] for more advice:

sudo apt -y install cuda-toolkit-12-8 nvidia-open

I tried updating an existing Ubuntu installation with an older CUDA and driver version, and it resulted in an unbootable system.

If you are using PyTorch 2.6 built with CUDA 12.6, you will get:

NVIDIA Graphics Device with CUDA capability sm_120 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_50 sm_60 sm_70 sm_75 sm_80 sm_86 sm_90.

But no worries! For PyTorch you just need to use a nightly 2.7 version that was built with CUDA 12.8. This can easily be done via:

pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu128

After that the latest Flair version can be installed and fine-tuning will work!
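Before starting a long fine-tuning run, it can help to confirm that the installed PyTorch build actually targets the GPU. A minimal sketch, assuming a same-major-version rule for compatibility (a simplification that ignores PTX forward compatibility); the helper name `build_supports_gpu` is hypothetical, and the `torch.cuda` calls run only if PyTorch is installed and a CUDA device is visible:

```python
import re


def build_supports_gpu(capability: tuple[int, int], arch_list: list[str]) -> bool:
    """Check whether any compiled sm_XY target shares the GPU's major version.

    Simplifying assumption: binaries are treated as compatible within one
    major compute-capability version; PTX forward compatibility is ignored.
    """
    major, _minor = capability
    compiled = []
    for arch in arch_list:
        match = re.match(r"sm_(\d+)", arch)
        if match:
            compiled.append(int(match.group(1)))
    return any(sm // 10 == major for sm in compiled)


# Architecture list quoted in the error message above (PyTorch 2.6, CUDA 12.6 build).
cu126_archs = ["sm_50", "sm_60", "sm_70", "sm_75", "sm_80", "sm_86", "sm_90"]
print(build_supports_gpu((12, 0), cu126_archs))  # (12, 0) corresponds to sm_120

try:
    import torch

    if torch.cuda.is_available():
        cap = torch.cuda.get_device_capability()
        print(cap, build_supports_gpu(cap, torch.cuda.get_arch_list()))
except ImportError:
    pass  # PyTorch not installed; the pure-Python check above still runs
```

Keeping the comparison in a plain function means it can be checked against the architecture list from an error message even on a machine without a GPU.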

References:

[1]: https://www.reddit.com/r/nvidia/comments/1inpox7/rtx_50_series_12vhpwr_megathread/
[2]: https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=24.04&target_type=deb_network

kargaranamir

authored a paper 16 days ago

On Relation-Specific Neurons in Large Language Models

Paper • 2502.17355 • Published 18 days ago • 6

stefan-it

posted an update 18 days ago

She arrived 😍

[Expect more models soon...]

JingzeShi

posted an update 21 days ago

🤗Welcome to the Doge Edge Device Small language Model.

SmallDoge/Doge-160M-Instruct

fuzzy-mittenz

posted an update 30 days ago

So frustrated with "Reasoning" Models.
Sure, introducing RAG into the mix, or giving it an interpreter to do math with, helps, but never as much as a model that has good instructions.

Even if it's just to repeat the information before answering, a normal model will usually out-"Think" its reasoning counterpart.

Not sure if it's my frustrations but the best answers I've received (from a reasoner), so far, are from the simple instructions to, "Do better!"

Figured I would share the special sauce.

Using 10-100x the compute just to heat the office can't be environmentally friendly, and it still has no idea where my keys are.

Tonic