Can't run in llama.cpp, wrong tensor shape

#1
by bartowski - opened

Opened a bug here since I saw the same issue with my own quants:

https://github.com/ggml-org/llama.cpp/issues/12376

converts and quantizes no problem, but fails to run.

llama_model_load: error loading model: check_tensor_dims: tensor 'blk.0.attn_k_norm.weight' has wrong shape; expected 5120, got 1024, 1, 1, 1

Hey @bartowski, is this issue only for Q3's?

no it's for all sizes sadly!

BF16 also failed in the same way

I'll download Q8_0 to be extra sure, but I think it's safe to say it applies to all quants if it happens to BF16

Yup, Q8_0 breaks in the same way @amanrangapur

Yep can confirm! Interestingly HF is fine - I think GGUF isn't registering the K_norm size due to grouped query attention

I'm assuming llama.cpp expects the K norm and Q norm to be of the same shape? I.e. Q/K norm can't be used with GQA, but I'm unsure
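The numbers in the error line up with the GQA hypothesis: assuming the OLMo-2 32B config has 40 query heads, 8 KV heads, and a head dim of 128 (these values are assumed here, not read from the GGUF), the Q norm spans 40 × 128 = 5120 while the K norm only spans 8 × 128 = 1024, which is exactly the "expected 5120, got 1024" mismatch. A minimal sketch of that arithmetic:

```python
# Sketch: why k_norm is 1024 while the loader expects 5120.
# Config values are assumed for OLMo-2 32B, not read from the GGUF file.
n_head = 40        # query heads
n_head_kv = 8      # KV heads (grouped-query attention)
head_dim = 128     # per-head dimension (hidden size 5120 / 40 heads)

q_norm_dim = n_head * head_dim      # Q norm covers all query heads
k_norm_dim = n_head_kv * head_dim   # K norm only covers the KV heads

print(q_norm_dim)  # 5120 -> what check_tensor_dims expects
print(k_norm_dim)  # 1024 -> what the tensor actually is
```

If the loader validates `attn_k_norm` against the query-side embedding width instead of `n_head_kv * head_dim`, any GQA model with per-head QK norm would fail exactly like this.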

load_tensors: layer  64 assigned to device CUDA0, is_swa = 0
llama_model_load: error loading model: check_tensor_dims: tensor 'blk.0.attn_k_norm.weight' has wrong shape; expected  5120, got  1024,     1,     1,     1
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model '/home/data1/protected/Downloads/OLMo-2-0325-32B-Instruct-Q4_K_S.gguf'
srv    load_model: failed to load model, '/home/data1/protected/Downloads/OLMo-2-0325-32B-Instruct-Q4_K_S.gguf'
srv    operator(): operator(): cleaning up before exit...
main: exiting due to model loading error

I think I've got the same problem.

🥲 Failed to load the model

Failed to load model

error loading model: check_tensor_dims: tensor 'blk.0.attn_k_norm.weight' has wrong shape; expected  5120, got  1024,     1,     1,     1

Same here
