
[Bug] RuntimeError: Gemma 3n E2B vision fine-tuning (permute 5D vs 4D) with FastVisionModel #3046

@anup2122
  1. Did you update? pip install --upgrade unsloth unsloth_zoo
    yes

  2. Colab or Kaggle or local / cloud
    local

  3. Number GPUs used, use nvidia-smi
    One NVIDIA GeForce RTX 3090

  4. Which notebook? Please link!
    The code is attached below (ft_v1.txt)

  5. Which Unsloth version, TRL version, transformers version, PyTorch version?
    torch==2.7.1
    torchvision==0.22.1
    transformers==4.54.0
    unsloth==2025.7.8
    unsloth_zoo==2025.7.10

  6. Which trainer? SFTTrainer, GRPOTrainer etc.
    See the attached ft_v1.txt for the minimal code to reproduce the error (Hugging Face token removed).

ft_v1.txt

Description

I am trying to fine-tune Gemma 3n E2B for regression on image+text data using FastVisionModel.

Steps to Reproduce:

Run the attached script (ft_v1.txt). A minimal sketch of the setup, reconstructed from the traceback, is shown below.
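The following sketch is reconstructed purely from the traceback (GemmaRegressionModel and the self.base(image=...) call at ft_v1.py line 78 appear in the log); the checkpoint id, LoRA call, regression head, pooling, and loss are assumptions, not the actual code:

```python
import torch
from unsloth import FastVisionModel

# Assumed checkpoint id -- the log only shows "Fast Gemma3N patching"
base, tokenizer = FastVisionModel.from_pretrained("unsloth/gemma-3n-E2B-it")
base = FastVisionModel.get_peft_model(base)  # LoRA args omitted; peft_model.py appears in the stack

class GemmaRegressionModel(torch.nn.Module):
    """Wrapper named in the log's warning; head details are not visible there."""
    def __init__(self, base):
        super().__init__()
        self.base = base
        # hidden_size path is an assumption about the composite Gemma3n config
        self.head = torch.nn.Linear(base.config.text_config.hidden_size, 1)

    def forward(self, image=None, input_ids=None, attention_mask=None, labels=None):
        # Mirrors ft_v1.py line 78; output_hidden_states=True is added here
        # only so this sketch can pool a hidden state below.
        outputs = self.base(image=image, input_ids=input_ids,
                            attention_mask=attention_mask, return_dict=True,
                            output_hidden_states=True)
        pooled = outputs.hidden_states[-1][:, -1, :]   # last-token pooling (assumed)
        preds = self.head(pooled.float()).squeeze(-1)
        loss = None
        if labels is not None:
            loss = torch.nn.functional.mse_loss(preds, labels.float())
        return {"loss": loss, "logits": preds}
```

One detail visible in the traceback: the compiled Gemma3nForConditionalGeneration_forward signature accepts pixel_values, not image, so the image= keyword from the wrapper is absorbed by **lm_kwargs rather than being validated.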

Logs:

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
PyTorch 2.5.1 with CUDA 1201 (you have 2.7.1+cu126)
Python 3.11.10 (you have 3.11.13)
Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
Memory-efficient attention, SwiGLU, sparse and more won't be available.
Set XFORMERS_MORE_DETAILS=1 for more details
🦥 Unsloth Zoo will now patch everything to make training faster!
GPU = NVIDIA GeForce RTX 3090. Max memory = 23.684 GB.
0.0 GB of memory reserved.
==((====))==  Unsloth 2025.7.8: Fast Gemma3N patching. Transformers: 4.54.0.
   \\   /|    NVIDIA GeForce RTX 3090. Num GPUs = 1. Max memory: 23.684 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.7.1+cu126. CUDA: 8.6. CUDA Toolkit: 12.6. Triton: 3.3.1
\        /    Bfloat16 = TRUE. FA [Xformers = None. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
Unsloth: Gemma3N does not support SDPA - switching to eager!
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:07<00:00, 2.64s/it]
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 10 | Num Epochs = 3 | Total steps = 9
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 2
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 2 x 1) = 4
 "-____-"     Trainable parameters = 2,189,313 of 4,509,705,665 (0.05% trained)
0%| | 0/9 [00:00<?, ?it/s]
Unsloth: Not an error, but GemmaRegressionModel does not accept num_items_in_batch.
Using gradient accumulation will be very slightly less accurate.
Read more on gradient accumulation issues here: https://unsloth.ai/blog/gradient
use_cache=True is incompatible with gradient checkpointing. Setting use_cache=False.
Traceback (most recent call last):
File "/local/home/anupd/work/ft/ft_v1.py", line 139, in
trainer.train()
File "/local/home/anupd/miniconda3/envs/unsloth_gemma3n_cu125/lib/python3.11/site-packages/transformers/trainer.py", line 2237, in train
return inner_training_loop(
^^^^^^^^^^^^^^^^^^^^
File "", line 320, in _fast_inner_training_loop
File "", line 34, in _unsloth_training_step
File "/local/home/anupd/miniconda3/envs/unsloth_gemma3n_cu125/lib/python3.11/site-packages/unsloth/models/_utils.py", line 1135, in _unsloth_pre_compute_loss
outputs = self._old_compute_loss(model, inputs, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/local/home/anupd/miniconda3/envs/unsloth_gemma3n_cu125/lib/python3.11/site-packages/transformers/trainer.py", line 3879, in compute_loss
outputs = model(**inputs)
^^^^^^^^^^^^^^^
File "/local/home/anupd/miniconda3/envs/unsloth_gemma3n_cu125/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/local/home/anupd/miniconda3/envs/unsloth_gemma3n_cu125/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/local/home/anupd/work/ft/ft_v1.py", line 78, in forward
outputs = self.base(image=image, input_ids=input_ids, attention_mask=attention_mask, return_dict=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/local/home/anupd/miniconda3/envs/unsloth_gemma3n_cu125/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/local/home/anupd/miniconda3/envs/unsloth_gemma3n_cu125/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/local/home/anupd/miniconda3/envs/unsloth_gemma3n_cu125/lib/python3.11/site-packages/peft/peft_model.py", line 1647, in forward
return self.base_model(
^^^^^^^^^^^^^^^^
File "/local/home/anupd/miniconda3/envs/unsloth_gemma3n_cu125/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/local/home/anupd/miniconda3/envs/unsloth_gemma3n_cu125/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/local/home/anupd/miniconda3/envs/unsloth_gemma3n_cu125/lib/python3.11/site-packages/peft/tuners/tuners_utils.py", line 216, in forward
return self.model.forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/local/home/anupd/work/ft/unsloth_compiled_cache/unsloth_compiled_module_gemma3n.py", line 1695, in forward
return Gemma3nForConditionalGeneration_forward(self, input_ids, pixel_values, input_features, attention_mask, input_features_mask, position_ids, past_key_values, token_type_ids, cache_position, inputs_embeds, labels, use_cache, output_attentions, output_hidden_states, logits_to_keep, **lm_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/local/home/anupd/miniconda3/envs/unsloth_gemma3n_cu125/lib/python3.11/site-packages/transformers/utils/generic.py", line 961, in wrapper
output = func(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/local/home/anupd/work/ft/unsloth_compiled_cache/unsloth_compiled_module_gemma3n.py", line 1499, in Gemma3nForConditionalGeneration_forward
outputs = self.model(
^^^^^^^^^^^
File "/local/home/anupd/miniconda3/envs/unsloth_gemma3n_cu125/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/local/home/anupd/miniconda3/envs/unsloth_gemma3n_cu125/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/local/home/anupd/miniconda3/envs/unsloth_gemma3n_cu125/lib/python3.11/site-packages/transformers/utils/generic.py", line 961, in wrapper
output = func(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/local/home/anupd/miniconda3/envs/unsloth_gemma3n_cu125/lib/python3.11/site-packages/transformers/models/gemma3n/modeling_gemma3n.py", line 2117, in forward
outputs = self.language_model(
^^^^^^^^^^^^^^^^^^^^
File "/local/home/anupd/miniconda3/envs/unsloth_gemma3n_cu125/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/local/home/anupd/miniconda3/envs/unsloth_gemma3n_cu125/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/local/home/anupd/miniconda3/envs/unsloth_gemma3n_cu125/lib/python3.11/site-packages/transformers/utils/generic.py", line 961, in wrapper
output = func(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/local/home/anupd/miniconda3/envs/unsloth_gemma3n_cu125/lib/python3.11/site-packages/transformers/models/gemma3n/modeling_gemma3n.py", line 1677, in forward
layer_outputs = decoder_layer(
^^^^^^^^^^^^^^
File "/local/home/anupd/miniconda3/envs/unsloth_gemma3n_cu125/lib/python3.11/site-packages/transformers/modeling_layers.py", line 93, in call
return self._gradient_checkpointing_func(partial(super().call, **kwargs), *args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/local/home/anupd/miniconda3/envs/unsloth_gemma3n_cu125/lib/python3.11/site-packages/torch/_compile.py", line 51, in inner
return disable_fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/local/home/anupd/miniconda3/envs/unsloth_gemma3n_cu125/lib/python3.11/site-packages/torch/_dynamo/eval_frame.py", line 838, in _fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/local/home/anupd/miniconda3/envs/unsloth_gemma3n_cu125/lib/python3.11/site-packages/torch/utils/checkpoint.py", line 488, in checkpoint
return CheckpointFunction.apply(function, preserve, *args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/local/home/anupd/miniconda3/envs/unsloth_gemma3n_cu125/lib/python3.11/site-packages/torch/autograd/function.py", line 575, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/local/home/anupd/miniconda3/envs/unsloth_gemma3n_cu125/lib/python3.11/site-packages/torch/utils/checkpoint.py", line 263, in forward
outputs = run_function(*args)
^^^^^^^^^^^^^^^^^^^
File "/local/home/anupd/miniconda3/envs/unsloth_gemma3n_cu125/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/local/home/anupd/miniconda3/envs/unsloth_gemma3n_cu125/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/local/home/anupd/miniconda3/envs/unsloth_gemma3n_cu125/lib/python3.11/site-packages/transformers/utils/deprecation.py", line 172, in wrapped_func
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/local/home/anupd/miniconda3/envs/unsloth_gemma3n_cu125/lib/python3.11/site-packages/transformers/models/gemma3n/modeling_gemma3n.py", line 1426, in forward
predictions = self.altup.predict(hidden_states)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/local/home/anupd/work/ft/unsloth_compiled_cache/unsloth_compiled_module_gemma3n.py", line 822, in predict
.permute(0, 1, 3, 2)
^^^^^^^^^^^^^^^^^^^
RuntimeError: permute(sparse_coo): number of dimensions in the tensor input does not match the length of the desired ordering of dimensions i.e. input.dim() = 5 is not equal to len(dims) = 4
0%| | 0/9 [00:01<?, ?it/s]
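For anyone reading the trace: the RuntimeError itself is PyTorch's generic arity check in permute(). The call at unsloth_compiled_cache/unsloth_compiled_module_gemma3n.py line 822 supplies four dimension indices, but the hidden_states tensor reaching self.altup.predict has five dimensions. The "(sparse_coo)" in the message appears to be just the schema name the error is reported under, not a sign the tensor is sparse. A standalone illustration in plain PyTorch (not Unsloth code):

```python
import torch

x4 = torch.randn(2, 4, 8, 16)         # 4-D: one index per dimension, so this works
print(x4.permute(0, 1, 3, 2).shape)   # torch.Size([2, 4, 16, 8])

x5 = x4.unsqueeze(0)                  # 5-D: an unexpected extra leading dimension
try:
    x5.permute(0, 1, 3, 2)            # four indices for a 5-D tensor
except RuntimeError as err:
    print(err)                        # "... input.dim() = 5 is not equal to len(dims) = 4"
```

So the open question is where hidden_states picks up the extra leading dimension before it reaches altup.predict.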
