SNUH-HARI/DeepSeek-llama3.1-HARI-8B
Model Description
SNUH-HARI/DeepSeek-llama3.1-HARI-8B is a fine-tuned version of DeepSeek-llama3.1-Bllossom-8B with 8 billion parameters, optimized for healthcare applications. Developed by the Healthcare AI Research Institute (HARI) at Seoul National University Hospital (SNUH), this model combines open medical datasets (including synthesized data) with pseudonymized clinical notes to enhance patient safety and responsible AI in medicine.
- Architecture: Transformer-based large language model (LLM)
- Languages: English, Korean
- Primary Domains: Healthcare, General NLP
- Use Cases: Medical question answering, clinical decision support, patient safety applications
Training Details
Base Model: DeepSeek-llama3.1-Bllossom-8B (built on deepseek-ai/DeepSeek-R1-Distill-Llama-8B)
Fine-Tuning Datasets:
- SNUH pseudonymized clinical notes for real-world medical knowledge
- MedicalLawQA (curated from Korea Legislation Research Institute data using GPT-4o-mini)
- Medical reasoning dataset from FreedomIntelligence/medical-o1-reasoning-SFT
Optimization: Mixed precision (FP16) for efficiency
Compute Resources: High-performance GPUs (e.g., NVIDIA H100 clusters)
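The exact training recipe is not published in this card. The sketch below shows how a comparable FP16 supervised fine-tuning run could be set up with Hugging Face Transformers using only the public FreedomIntelligence/medical-o1-reasoning-SFT dataset (the pseudonymized SNUH notes are not publicly available). The base-model repo id, the "en" configuration, the Question/Response field names, the prompt template, and all hyperparameters are illustrative assumptions, not the values used for this model:
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)
# Assumed Hugging Face repo id for the Bllossom base model
base_model = "UNIVA-Bllossom/DeepSeek-llama3.1-Bllossom-8B"
tokenizer = AutoTokenizer.from_pretrained(base_model)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base_model)
# Public reasoning dataset listed above; the "en" config is an assumption
raw = load_dataset("FreedomIntelligence/medical-o1-reasoning-SFT", "en", split="train")
def to_text(example):
    # Illustrative prompt template: question followed by the reference response
    return {"text": f"Question: {example['Question']}\nAnswer: {example['Response']}"}
def tokenize(example):
    return tokenizer(example["text"], truncation=True, max_length=2048)
dataset = raw.map(to_text).map(tokenize, remove_columns=raw.column_names + ["text"])
args = TrainingArguments(
    output_dir="hari-8b-sft",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    num_train_epochs=1,
    learning_rate=2e-5,
    fp16=True,  # mixed-precision training, as noted above
    logging_steps=50,
)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()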
Intended Use
This model is designed for research, healthcare AI, and legal AI applications. It is particularly suitable for:
- Medical question answering
- Clinical decision-making support
- Healthcare policy and compliance
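For instance, a clinical decision-support style query can be sent through the standard text-generation pipeline. The patient vignette and generation settings below are illustrative only, and any output must be reviewed by a clinician, as noted in the Limitations section below:
from transformers import pipeline
# Illustrative clinical decision-support prompt; not a validated clinical workflow
generator = pipeline("text-generation", model="SNUH-HARI/DeepSeek-llama3.1-HARI-8B")
vignette = (
    "A 67-year-old man on warfarin presents with an INR of 6.5 and no active bleeding. "
    "List the recommended next steps and the points a clinician should verify."
)
result = generator(vignette, max_new_tokens=256, do_sample=False)
print(result[0]["generated_text"])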
Limitations & Ethical Considerations
- Not a replacement for medical professionals: Outputs should be validated by experts.
- Potential biases: Legal and medical knowledge are jurisdiction-specific; users should verify regional applicability.
- Privacy compliance: No personally identifiable information was used in training.
Evaluation & Benchmarks
This model was evaluated using 100 medical law-related QA pairs from the KMLE (Korean Medical Licensing Exam) 2019–2023 dataset.
| Model | Accuracy (%) |
|---|---|
| DeepSeek-llama3.1-Bllossom-8B | 34 |
| DeepSeek-llama3.1-HARI-8B (ours) | TBD |
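The evaluation pipeline itself is not released with this card. The following is a minimal sketch of how accuracy on multiple-choice KMLE-style items could be computed; the local file name kmle_law_qa.jsonl, its question/options/answer fields, and the answer-letter extraction rule are hypothetical, not part of the released benchmark:
import json
import re
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "SNUH-HARI/DeepSeek-llama3.1-HARI-8B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
def ask(question, options):
    # Format one multiple-choice item; the prompt wording is an assumption
    prompt = question + "\n" + "\n".join(options) + "\nAnswer with the option letter only."
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=16)
    reply = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    match = re.search(r"[A-E]", reply)
    return match.group(0) if match else None
# Hypothetical local file with 100 KMLE medical-law items (not distributed here)
items = [json.loads(line) for line in open("kmle_law_qa.jsonl", encoding="utf-8")]
correct = sum(ask(item["question"], item["options"]) == item["answer"] for item in items)
print(f"Accuracy: {100 * correct / len(items):.1f}%")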
How to Use
You can use the model via Hugging Face Transformers:
from transformers import AutoModelForCausalLM, AutoTokenizer
# Load the tokenizer and model weights from the Hugging Face Hub
model_name = "SNUH-HARI/DeepSeek-llama3.1-HARI-8B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
# Tokenize a prompt and generate a response (max_length counts prompt plus new tokens)
input_text = "What are the legal requirements for prescribing narcotics in South Korea?"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids
output = model.generate(input_ids, max_length=1024)
print(tokenizer.decode(output[0], skip_special_tokens=True))
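Because the DeepSeek-R1-Distill-Llama family is distributed with a chat template, wrapping the prompt with the tokenizer's chat template may produce cleaner responses. The snippet below continues the example above; whether this fine-tune expects the same chat format is an assumption to verify against the released tokenizer configuration:
# Optional chat-style prompting (assumes the tokenizer provides a chat template)
messages = [{"role": "user", "content": input_text}]
chat_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
chat_output = model.generate(chat_ids, max_new_tokens=512)
print(tokenizer.decode(chat_output[0][chat_ids.shape[1]:], skip_special_tokens=True))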
License
This model is released under the MIT License.
Citation
If you use this model in your research, please cite:
@misc{SNUH-HARI-DeepSeek-llama3.1-HARI-8B,
title={SNUH-HARI/DeepSeek-llama3.1-HARI-8B},
author={Hyeonhoon Lee ([email protected])},
year={2025},
publisher={Hugging Face},
url={https://huggingface.co./SNUH-HARI/DeepSeek-llama3.1-HARI-8B}
}