ImportError: cannot import name 'Gemma3ForConditionalGeneration' from 'transformers'
$ pip freeze |grep transformers
DEPRECATION: Loading egg at /home/user/miniconda3/lib/python3.12/site-packages/sherpa_onnx-1.10.30+cuda-py3.12-linux-x86_64.egg is deprecated. pip 25.1 will enforce this behaviour change. A possible replacement is to use pip for package installation. Discussion can be found at https://github.com/pypa/pip/issues/12330
curated-transformers==0.1.1
sentence-transformers==3.3.1
spacy-curated-transformers==0.3.0
transformers @ git+https://github.com/huggingface/transformers.git@6966fa190172b48b2fb46fe4552a13b943e692cf
$ python gemma3.py
ProxyChains-3.1 (http://proxychains.sf.net)
|DNS-request| api.gradio.app
|S-chain|-<>-127.0.0.1:9050-<><>-4.2.2.2:53-<><>-OK
|DNS-response| api.gradio.app is 34.223.133.184
|S-chain|-<>-127.0.0.1:9050-<><>-34.223.133.184:443-<><>-OK
Traceback (most recent call last):
File "/home/user/Downloads/gemma3.py", line 8, in <module>
from transformers import AutoProcessor, Gemma3ForConditionalGeneration, TextIteratorStreamer
ImportError: cannot import name 'Gemma3ForConditionalGeneration' from 'transformers' (/home/user/miniconda3/lib/python3.12/site-packages/transformers/__init__.py)
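A quick way to tell whether an ImportError like this comes from a stale transformers build (rather than, say, a typo) is to probe for the class before importing it. A minimal sketch; the helper name is mine, not part of transformers:

```python
import importlib
import importlib.util

def has_gemma3(module_name="transformers"):
    """Return True only if `module_name` is installed and its top-level
    namespace exposes Gemma3ForConditionalGeneration."""
    if importlib.util.find_spec(module_name) is None:
        return False  # package not installed at all
    module = importlib.import_module(module_name)
    return hasattr(module, "Gemma3ForConditionalGeneration")

if not has_gemma3():
    print("this transformers build predates Gemma 3 support")
```

If this prints the message, the fix is an upgrade, not a code change.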
Hi @devops724, please try again after installing the latest transformers version '4.50.0.dev0' using:

pip install git+https://github.com/huggingface/transformers@v4.49.0-Gemma-3

and let us know if the issue still persists. Thank you.
I'm getting the same error:

(base) jovn@pop-os:/OpenManus/config$ pip install git+https://github.com/huggingface/transformers@v4.49.0-Gemma-3
Collecting git+https://github.com/huggingface/transformers@v4.49.0-Gemma-3
Cloning https://github.com/huggingface/transformers (to revision v4.49.0-Gemma-3) to /tmp/pip-req-build-4ibj91bp
Running command git clone --filter=blob:none --quiet https://github.com/huggingface/transformers /tmp/pip-req-build-4ibj91bp
Running command git checkout -q 1c0f782fe5f983727ff245c4c1b3906f9b99eec2
Resolved https://github.com/huggingface/transformers to commit 1c0f782fe5f983727ff245c4c1b3906f9b99eec2
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: filelock in /home/jovn/anaconda3/lib/python3.12/site-packages (from transformers==4.50.0.dev0) (3.17.0)
Requirement already satisfied: huggingface-hub<1.0,>=0.26.0 in /home/jovn/anaconda3/lib/python3.12/site-packages (from transformers==4.50.0.dev0) (0.29.3)
Requirement already satisfied: numpy>=1.17 in /home/jovn/anaconda3/lib/python3.12/site-packages (from transformers==4.50.0.dev0) (1.26.4)
Requirement already satisfied: packaging>=20.0 in /home/jovn/anaconda3/lib/python3.12/site-packages (from transformers==4.50.0.dev0) (24.1)
Requirement already satisfied: pyyaml>=5.1 in /home/jovn/anaconda3/lib/python3.12/site-packages (from transformers==4.50.0.dev0) (6.0.2)
Requirement already satisfied: regex!=2019.12.17 in /home/jovn/anaconda3/lib/python3.12/site-packages (from transformers==4.50.0.dev0) (2024.9.11)
Requirement already satisfied: requests in /home/jovn/anaconda3/lib/python3.12/site-packages (from transformers==4.50.0.dev0) (2.32.3)
Requirement already satisfied: tokenizers<0.22,>=0.21 in /home/jovn/anaconda3/lib/python3.12/site-packages (from transformers==4.50.0.dev0) (0.21.0)
Requirement already satisfied: safetensors>=0.4.1 in /home/jovn/anaconda3/lib/python3.12/site-packages (from transformers==4.50.0.dev0) (0.5.3)
Requirement already satisfied: tqdm>=4.27 in /home/jovn/anaconda3/lib/python3.12/site-packages (from transformers==4.50.0.dev0) (4.66.5)
Requirement already satisfied: fsspec>=2023.5.0 in /home/jovn/anaconda3/lib/python3.12/site-packages (from huggingface-hub<1.0,>=0.26.0->transformers==4.50.0.dev0) (2024.6.1)
Requirement already satisfied: typing-extensions>=3.7.4.3 in /home/jovn/anaconda3/lib/python3.12/site-packages (from huggingface-hub<1.0,>=0.26.0->transformers==4.50.0.dev0) (4.12.2)
Requirement already satisfied: charset-normalizer<4,>=2 in /home/jovn/anaconda3/lib/python3.12/site-packages (from requests->transformers==4.50.0.dev0) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in /home/jovn/anaconda3/lib/python3.12/site-packages (from requests->transformers==4.50.0.dev0) (3.7)
Requirement already satisfied: urllib3<3,>=1.21.1 in /home/jovn/anaconda3/lib/python3.12/site-packages (from requests->transformers==4.50.0.dev0) (2.2.3)
Requirement already satisfied: certifi>=2017.4.17 in /home/jovn/anaconda3/lib/python3.12/site-packages (from requests->transformers==4.50.0.dev0) (2024.8.30)
(base) jovn@pop-os:/OpenManus/config$ vllm serve "google/gemma-3-27b-it"
INFO 03-12 11:53:22 __init__.py:207] Automatically detected platform cuda.
INFO 03-12 11:53:22 api_server.py:912] vLLM API server version 0.7.3
INFO 03-12 11:53:22 api_server.py:913] args: Namespace(subparser='serve', model_tag='google/gemma-3-27b-it', config='', host=None, port=8000, uvicorn_log_level='info', allow_credentials=False, allowed_origins=['*'], allowed_methods=['*'], allowed_headers=['*'], api_key=None, lora_modules=None, prompt_adapters=None, chat_template=None, chat_template_content_format='auto', response_role='assistant', ssl_keyfile=None, ssl_certfile=None, ssl_ca_certs=None, ssl_cert_reqs=0, root_path=None, middleware=[], return_tokens_as_token_ids=False, disable_frontend_multiprocessing=False, enable_request_id_headers=False, enable_auto_tool_choice=False, enable_reasoning=False, reasoning_parser=None, tool_call_parser=None, tool_parser_plugin='', model='google/gemma-3-27b-it', task='auto', tokenizer=None, skip_tokenizer_init=False, revision=None, code_revision=None, tokenizer_revision=None, tokenizer_mode='auto', trust_remote_code=False, allowed_local_media_path=None, download_dir=None, load_format='auto', config_format=<ConfigFormat.AUTO: 'auto'>, dtype='auto', kv_cache_dtype='auto', max_model_len=None, guided_decoding_backend='xgrammar', logits_processor_pattern=None, model_impl='auto', distributed_executor_backend=None, pipeline_parallel_size=1, tensor_parallel_size=1, max_parallel_loading_workers=None, ray_workers_use_nsight=False, block_size=None, enable_prefix_caching=None, disable_sliding_window=False, use_v2_block_manager=True, num_lookahead_slots=0, seed=0, swap_space=4, cpu_offload_gb=0, gpu_memory_utilization=0.9, num_gpu_blocks_override=None, max_num_batched_tokens=None, max_num_partial_prefills=1, max_long_partial_prefills=1, long_prefill_token_threshold=0, max_num_seqs=None, max_logprobs=20, disable_log_stats=False, quantization=None, rope_scaling=None, rope_theta=None, hf_overrides=None, enforce_eager=False, max_seq_len_to_capture=8192, disable_custom_all_reduce=False, tokenizer_pool_size=0, tokenizer_pool_type='ray', tokenizer_pool_extra_config=None, 
limit_mm_per_prompt=None, mm_processor_kwargs=None, disable_mm_preprocessor_cache=False, enable_lora=False, enable_lora_bias=False, max_loras=1, max_lora_rank=16, lora_extra_vocab_size=256, lora_dtype='auto', long_lora_scaling_factors=None, max_cpu_loras=None, fully_sharded_loras=False, enable_prompt_adapter=False, max_prompt_adapters=1, max_prompt_adapter_token=0, device='auto', num_scheduler_steps=1, multi_step_stream_outputs=True, scheduler_delay_factor=0.0, enable_chunked_prefill=None, speculative_model=None, speculative_model_quantization=None, num_speculative_tokens=None, speculative_disable_mqa_scorer=False, speculative_draft_tensor_parallel_size=None, speculative_max_model_len=None, speculative_disable_by_batch_size=None, ngram_prompt_lookup_max=None, ngram_prompt_lookup_min=None, spec_decoding_acceptance_method='rejection_sampler', typical_acceptance_sampler_posterior_threshold=None, typical_acceptance_sampler_posterior_alpha=None, disable_logprobs_during_spec_decoding=None, model_loader_extra_config=None, ignore_patterns=[], preemption_mode=None, served_model_name=None, qlora_adapter_name_or_path=None, otlp_traces_endpoint=None, collect_detailed_traces=None, disable_async_output_proc=False, scheduling_policy='fcfs', scheduler_cls='vllm.core.scheduler.Scheduler', override_neuron_config=None, override_pooler_config=None, compilation_config=None, kv_transfer_config=None, worker_cls='auto', generation_config=None, override_generation_config=None, enable_sleep_mode=False, calculate_kv_scales=False, additional_config=None, disable_log_requests=False, max_log_len=None, disable_fastapi_docs=False, enable_prompt_tokens_details=False, dispatch_function=<function ServeSubcommand.cmd at 0x7195a4ee9e40>)
INFO 03-12 11:53:22 api_server.py:209] Started engine process with PID 26363
INFO 03-12 11:53:23 config.py:2444] Downcasting torch.float32 to torch.float16.
INFO 03-12 11:53:25 __init__.py:207] Automatically detected platform cuda.
INFO 03-12 11:53:26 config.py:2444] Downcasting torch.float32 to torch.float16.
INFO 03-12 11:53:26 config.py:549] This model supports multiple tasks: {'generate', 'score', 'embed', 'classify', 'reward'}. Defaulting to 'generate'.
WARNING 03-12 11:53:26 arg_utils.py:1197] The model has a long context length (1048576). This may cause OOM errors during the initial memory profiling phase, or result in low performance due to small KV cache space. Consider setting --max-model-len to a smaller value.
INFO 03-12 11:53:29 config.py:549] This model supports multiple tasks: {'generate', 'classify', 'embed', 'score', 'reward'}. Defaulting to 'generate'.
WARNING 03-12 11:53:29 arg_utils.py:1197] The model has a long context length (1048576). This may cause OOM errors during the initial memory profiling phase, or result in low performance due to small KV cache space. Consider setting --max-model-len to a smaller value.
INFO 03-12 11:53:29 llm_engine.py:234] Initializing a V0 LLM engine (v0.7.3) with config: model='google/gemma-3-27b-it', speculative_config=None, tokenizer='google/gemma-3-27b-it', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=1048576, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='xgrammar'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=google/gemma-3-27b-it, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=False, chunked_prefill_enabled=False, use_async_output_proc=True, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"splitting_ops":[],"compile_sizes":[],"cudagraph_capture_sizes":[256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"max_capture_size":256}, use_cached_outputs=True,
INFO 03-12 11:53:31 cuda.py:229] Using Flash Attention backend.
INFO 03-12 11:53:31 model_runner.py:1110] Starting to load model google/gemma-3-27b-it...
WARNING 03-12 11:53:31 utils.py:78] Gemma3ForConditionalGeneration has no vLLM implementation, falling back to Transformers implementation. Some features may not be supported and performance may not be optimal.
INFO 03-12 11:53:31 transformers.py:129] Using Transformers backend.
ERROR 03-12 11:53:31 engine.py:400] 'Gemma3Config' object has no attribute 'vocab_size'
ERROR 03-12 11:53:31 engine.py:400] Traceback (most recent call last):
ERROR 03-12 11:53:31 engine.py:400] File "/home/jovn/anaconda3/lib/python3.12/site-packages/vllm/engine/multiprocessing/engine.py", line 391, in run_mp_engine
ERROR 03-12 11:53:31 engine.py:400] engine = MQLLMEngine.from_engine_args(engine_args=engine_args,
ERROR 03-12 11:53:31 engine.py:400] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-12 11:53:31 engine.py:400] File "/home/jovn/anaconda3/lib/python3.12/site-packages/vllm/engine/multiprocessing/engine.py", line 124, in from_engine_args
ERROR 03-12 11:53:31 engine.py:400] return cls(ipc_path=ipc_path,
ERROR 03-12 11:53:31 engine.py:400] ^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-12 11:53:31 engine.py:400] File "/home/jovn/anaconda3/lib/python3.12/site-packages/vllm/engine/multiprocessing/engine.py", line 76, in __init__
ERROR 03-12 11:53:31 engine.py:400] self.engine = LLMEngine(*args, **kwargs)
ERROR 03-12 11:53:31 engine.py:400] ^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-12 11:53:31 engine.py:400] File "/home/jovn/anaconda3/lib/python3.12/site-packages/vllm/engine/llm_engine.py", line 273, in __init__
ERROR 03-12 11:53:31 engine.py:400] self.model_executor = executor_class(vllm_config=vllm_config, )
ERROR 03-12 11:53:31 engine.py:400] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-12 11:53:31 engine.py:400] File "/home/jovn/anaconda3/lib/python3.12/site-packages/vllm/executor/executor_base.py", line 52, in __init__
ERROR 03-12 11:53:31 engine.py:400] self._init_executor()
ERROR 03-12 11:53:31 engine.py:400] File "/home/jovn/anaconda3/lib/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 47, in _init_executor
ERROR 03-12 11:53:31 engine.py:400] self.collective_rpc("load_model")
ERROR 03-12 11:53:31 engine.py:400] File "/home/jovn/anaconda3/lib/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 56, in collective_rpc
ERROR 03-12 11:53:31 engine.py:400] answer = run_method(self.driver_worker, method, args, kwargs)
ERROR 03-12 11:53:31 engine.py:400] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-12 11:53:31 engine.py:400] File "/home/jovn/anaconda3/lib/python3.12/site-packages/vllm/utils.py", line 2196, in run_method
ERROR 03-12 11:53:31 engine.py:400] return func(*args, **kwargs)
ERROR 03-12 11:53:31 engine.py:400] ^^^^^^^^^^^^^^^^^^^^^
ERROR 03-12 11:53:31 engine.py:400] File "/home/jovn/anaconda3/lib/python3.12/site-packages/vllm/worker/worker.py", line 183, in load_model
ERROR 03-12 11:53:31 engine.py:400] self.model_runner.load_model()
ERROR 03-12 11:53:31 engine.py:400] File "/home/jovn/anaconda3/lib/python3.12/site-packages/vllm/worker/model_runner.py", line 1112, in load_model
ERROR 03-12 11:53:31 engine.py:400] self.model = get_model(vllm_config=self.vllm_config)
ERROR 03-12 11:53:31 engine.py:400] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-12 11:53:31 engine.py:400] File "/home/jovn/anaconda3/lib/python3.12/site-packages/vllm/model_executor/model_loader/__init__.py", line 14, in get_model
ERROR 03-12 11:53:31 engine.py:400] return loader.load_model(vllm_config=vllm_config)
ERROR 03-12 11:53:31 engine.py:400] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-12 11:53:31 engine.py:400] File "/home/jovn/anaconda3/lib/python3.12/site-packages/vllm/model_executor/model_loader/loader.py", line 406, in load_model
ERROR 03-12 11:53:31 engine.py:400] model = _initialize_model(vllm_config=vllm_config)
ERROR 03-12 11:53:31 engine.py:400] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-12 11:53:31 engine.py:400] File "/home/jovn/anaconda3/lib/python3.12/site-packages/vllm/model_executor/model_loader/loader.py", line 125, in _initialize_model
ERROR 03-12 11:53:31 engine.py:400] return model_class(vllm_config=vllm_config, prefix=prefix)
ERROR 03-12 11:53:31 engine.py:400] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-12 11:53:31 engine.py:400] File "/home/jovn/anaconda3/lib/python3.12/site-packages/vllm/model_executor/models/transformers.py", line 135, in __init__
ERROR 03-12 11:53:31 engine.py:400] self.vocab_size = config.vocab_size
ERROR 03-12 11:53:31 engine.py:400] ^^^^^^^^^^^^^^^^^
ERROR 03-12 11:53:31 engine.py:400] File "/home/jovn/anaconda3/lib/python3.12/site-packages/transformers/configuration_utils.py", line 214, in __getattribute__
ERROR 03-12 11:53:31 engine.py:400] return super().__getattribute__(key)
ERROR 03-12 11:53:31 engine.py:400] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-12 11:53:31 engine.py:400] AttributeError: 'Gemma3Config' object has no attribute 'vocab_size'
Process SpawnProcess-1:
Traceback (most recent call last):
File "/home/jovn/anaconda3/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/home/jovn/anaconda3/lib/python3.12/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/jovn/anaconda3/lib/python3.12/site-packages/vllm/engine/multiprocessing/engine.py", line 402, in run_mp_engine
raise e
File "/home/jovn/anaconda3/lib/python3.12/site-packages/vllm/engine/multiprocessing/engine.py", line 391, in run_mp_engine
engine = MQLLMEngine.from_engine_args(engine_args=engine_args,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jovn/anaconda3/lib/python3.12/site-packages/vllm/engine/multiprocessing/engine.py", line 124, in from_engine_args
return cls(ipc_path=ipc_path,
^^^^^^^^^^^^^^^^^^^^^^
File "/home/jovn/anaconda3/lib/python3.12/site-packages/vllm/engine/multiprocessing/engine.py", line 76, in __init__
self.engine = LLMEngine(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jovn/anaconda3/lib/python3.12/site-packages/vllm/engine/llm_engine.py", line 273, in __init__
self.model_executor = executor_class(vllm_config=vllm_config, )
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jovn/anaconda3/lib/python3.12/site-packages/vllm/executor/executor_base.py", line 52, in __init__
self._init_executor()
File "/home/jovn/anaconda3/lib/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 47, in _init_executor
self.collective_rpc("load_model")
File "/home/jovn/anaconda3/lib/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 56, in collective_rpc
answer = run_method(self.driver_worker, method, args, kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jovn/anaconda3/lib/python3.12/site-packages/vllm/utils.py", line 2196, in run_method
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/jovn/anaconda3/lib/python3.12/site-packages/vllm/worker/worker.py", line 183, in load_model
self.model_runner.load_model()
File "/home/jovn/anaconda3/lib/python3.12/site-packages/vllm/worker/model_runner.py", line 1112, in load_model
self.model = get_model(vllm_config=self.vllm_config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jovn/anaconda3/lib/python3.12/site-packages/vllm/model_executor/model_loader/__init__.py", line 14, in get_model
return loader.load_model(vllm_config=vllm_config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jovn/anaconda3/lib/python3.12/site-packages/vllm/model_executor/model_loader/loader.py", line 406, in load_model
model = _initialize_model(vllm_config=vllm_config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jovn/anaconda3/lib/python3.12/site-packages/vllm/model_executor/model_loader/loader.py", line 125, in _initialize_model
return model_class(vllm_config=vllm_config, prefix=prefix)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jovn/anaconda3/lib/python3.12/site-packages/vllm/model_executor/models/transformers.py", line 135, in __init__
self.vocab_size = config.vocab_size
^^^^^^^^^^^^^^^^^
File "/home/jovn/anaconda3/lib/python3.12/site-packages/transformers/configuration_utils.py", line 214, in __getattribute__
return super().__getattribute__(key)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'Gemma3Config' object has no attribute 'vocab_size'
[rank0]:[W312 11:53:32.543501591 ProcessGroupNCCL.cpp:1250] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present, but this warning has only been added since PyTorch 2.4 (function operator())
Traceback (most recent call last):
File "/home/jovn/anaconda3/bin/vllm", line 8, in <module>
sys.exit(main())
^^^^^^
File "/home/jovn/anaconda3/lib/python3.12/site-packages/vllm/entrypoints/cli/main.py", line 73, in main
args.dispatch_function(args)
File "/home/jovn/anaconda3/lib/python3.12/site-packages/vllm/entrypoints/cli/serve.py", line 34, in cmd
uvloop.run(run_server(args))
File "/home/jovn/anaconda3/lib/python3.12/site-packages/uvloop/__init__.py", line 109, in run
return __asyncio.run(
^^^^^^^^^^^^^^
File "/home/jovn/anaconda3/lib/python3.12/asyncio/runners.py", line 194, in run
return runner.run(main)
^^^^^^^^^^^^^^^^
File "/home/jovn/anaconda3/lib/python3.12/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
File "/home/jovn/anaconda3/lib/python3.12/site-packages/uvloop/__init__.py", line 61, in wrapper
return await main
^^^^^^^^^^
File "/home/jovn/anaconda3/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 947, in run_server
async with build_async_engine_client(args) as engine_client:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jovn/anaconda3/lib/python3.12/contextlib.py", line 210, in __aenter__
return await anext(self.gen)
^^^^^^^^^^^^^^^^^^^^^
File "/home/jovn/anaconda3/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 139, in build_async_engine_client
async with build_async_engine_client_from_engine_args(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jovn/anaconda3/lib/python3.12/contextlib.py", line 210, in __aenter__
return await anext(self.gen)
^^^^^^^^^^^^^^^^^^^^^
File "/home/jovn/anaconda3/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 233, in build_async_engine_client_from_engine_args
raise RuntimeError(
RuntimeError: Engine process failed to start. See stack trace for the root cause.
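For context on the second failure: the `'Gemma3Config' object has no attribute 'vocab_size'` error is consistent with Gemma 3 being a multimodal model whose top-level config nests the language-model fields, including vocab_size, under a text_config sub-config, a layout the vLLM 0.7.3 Transformers fallback apparently did not anticipate. The classes below are simplified stand-ins to illustrate that shape, not the real transformers objects, and the value is illustrative:

```python
class TextConfig:
    """Stand-in for the nested language-model config (Gemma3TextConfig-like)."""
    vocab_size = 262_144  # illustrative value

class MultimodalConfig:
    """Stand-in for a Gemma3Config-like wrapper with no top-level vocab_size."""
    text_config = TextConfig()

def get_vocab_size(config):
    """Read vocab_size from a flat config, falling back to text_config."""
    if hasattr(config, "vocab_size"):
        return config.vocab_size
    return config.text_config.vocab_size

print(get_vocab_size(MultimodalConfig()))  # reads the nested value
```

In practice the fix was on the library side rather than in user code; the thread's resolution was pinning a transformers commit that the stack handles correctly, as in the last comment.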
This is working for me:
$ pip freeze |grep transformers
transformers @ git+https://github.com/huggingface/transformers@46350f5eae87ac1d168ddfdc57a0b39b64b9a029