Can't run in llama.cpp, wrong tensor shape
Opened a bug here since I saw the same issue with my own quants:
https://github.com/ggml-org/llama.cpp/issues/12376
It converts and quantizes with no problem, but fails to run.
llama_model_load: error loading model: check_tensor_dims: tensor 'blk.0.attn_k_norm.weight' has wrong shape; expected 5120, got 1024, 1, 1, 1
Hey @bartowski, is this issue only for Q3's?
No, it's for all sizes sadly!
BF16 also failed in the same way
I'll download Q8_0 to be extra sure, but I think it's safe to say it applies to all quants if it happens to BF16
Yup, Q8_0 breaks in the same way @amanrangapur
Yep, can confirm! Interestingly HF is fine - I think GGUF isn't registering the k_norm size correctly due to grouped-query attention.
I'm assuming llama.cpp expected the K norm and Q norm to be of the same shape, maybe? I.e. Q/K norm cannot be used with GQA, but I'm unsure.
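A quick sanity check on that hypothesis: the 5120-vs-1024 mismatch in the error is exactly what you'd expect if k_norm is sized per KV head under GQA while the loader assumes it matches the full hidden size. The config values below (hidden size, head counts) are assumptions about OLMo-2-32B, not taken from the model card, but they are consistent with the numbers in the error message:

```python
# Assumed OLMo-2-32B-style config (hypothetical values, chosen to
# match the "expected 5120, got 1024" error in this thread).
n_embd = 5120      # hidden size
n_head = 40        # query heads
n_head_kv = 8      # key/value heads (grouped-query attention)

head_dim = n_embd // n_head          # 128

# With QK-norm, q_norm spans all query heads, but k_norm only spans
# the KV heads -- so under GQA the two norms have different sizes.
q_norm_size = n_head * head_dim      # 5120 == n_embd
k_norm_size = n_head_kv * head_dim   # 1024 != n_embd

print(q_norm_size, k_norm_size)      # 5120 1024
```

If the loader checks k_norm against n_embd (as it can for non-GQA models, where the two sizes coincide), it would reject exactly this tensor with "expected 5120, got 1024".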
load_tensors: layer 64 assigned to device CUDA0, is_swa = 0
llama_model_load: error loading model: check_tensor_dims: tensor 'blk.0.attn_k_norm.weight' has wrong shape; expected 5120, got 1024, 1, 1, 1
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model '/home/data1/protected/Downloads/OLMo-2-0325-32B-Instruct-Q4_K_S.gguf'
srv load_model: failed to load model, '/home/data1/protected/Downloads/OLMo-2-0325-32B-Instruct-Q4_K_S.gguf'
srv operator(): operator(): cleaning up before exit...
main: exiting due to model loading error
I think I have got the same problem.
🥲 Failed to load the model
Failed to load model
error loading model: check_tensor_dims: tensor 'blk.0.attn_k_norm.weight' has wrong shape; expected 5120, got 1024, 1, 1, 1
Same here