llm course @ HSE and vk llm
A collection of SmolLM-135M models fine-tuned with DPO, PPO, and Reward Modeling to enhance human-like expressiveness
Daniil Tsesarev
tsessk
AI & ML interests
transformers
Recent Activity
Updated a model 6 days ago: tsessk/llm-course-hw2-dpo
Updated a model 6 days ago: tsessk/llm-course-hw2-reward-model
Updated a model 6 days ago: tsessk/llm-course-hw2-ppo
Organizations
None yet
Collections
1
Models
9

tsessk/llm-course-hw2-dpo • Text Generation • Updated • 7
tsessk/llm-course-hw2-reward-model • Text Classification • Updated • 34
tsessk/llm-course-hw2-ppo • Text Generation • Updated • 12
tsessk/content • Text Classification • Updated • 23
tsessk/llm-course-hw1 • Updated • 13
tsessk/SmolLM2-FT-ORPO • Text Generation • Updated • 14
tsessk/SmolLM2-FT-DPO • Text Generation • Updated • 10
tsessk/SmolLM2-FT-PyCodeZone • Text Generation • Updated • 12
tsessk/SmolLM2-FT-MyDataset • Text Generation • Updated • 8
Datasets
None public yet