o h@s^ddlZddlmZddlmZmZmZddlmZeZ dZ ddZ Gdd d ej j ZdS) N)process_vision_info) AutoProcessorQwen2VLForConditionalGeneration"Qwen2_5_VLForConditionalGeneration) ToPILImagea"Given a user prompt, generate an "Enhanced prompt" that provides detailed visual descriptions suitable for image generation. Evaluate the level of detail in the user prompt: - If the prompt is simple, focus on adding specifics about colors, shapes, sizes, textures, and spatial relationships to create vivid and concrete scenes. - If the prompt is already detailed, refine and enhance the existing details slightly without overcomplicating. Here are examples of how to transform or refine prompts: - User Prompt: A cat sleeping -> Enhanced: A small, fluffy white cat curled up in a round shape, sleeping peacefully on a warm sunny windowsill, surrounded by pots of blooming red flowers. - User Prompt: A busy city street -> Enhanced: A bustling city street scene at dusk, featuring glowing street lamps, a diverse crowd of people in colorful clothing, and a double-decker bus passing by towering glass skyscrapers. Please generate only the enhanced description for the prompt below and avoid including any additional commentary or evaluations: User Prompt:cCs|dddd}g}d}d}t|D]2\}}|dkr1|dkr1||7}|s-||d}| }q|rB|r8 |d|dq||7}q|rN|||S)N“"”Freplace enumerateappendisspacesresultZ in_quotestempidxcharr&/data/code/test/modules/conditioner.py split_strings(   rcs.eZdZdejdffdd ZddZZS)Qwen25VL_7b_Embeddericudacsftt|||_||_||_tj||ddt j |_ |j dtj|ddd|_t|_dS)Neager) torch_dtypeZattn_implementationFii@)Z min_pixelsZ max_pixels)superr__init__ max_lengthdtypedevicerfrom_pretrainedtotorchrcurrent_devicemodelrequires_grad_r processorQwen25VL_7b_PREFIXprefix)self model_pathr r!r" __class__rrrCs    zQwen25VL_7b_Embedder.__init__c Cs|}tjt||j|jjjtjtj d}tjt||j|jjjtjtj d}tjt||jtj tj d}g}g}g} dd} t t ||D]8\} \} } dgdg}|dd d|jd |dd d t| d |dd d| d |jj|d d d d}t|\}}|j|g|d dd}|j}| |}g}|D]4}|j|ddd dd}|j}|dddkr|dddkr|ddddf}| |q| |qtj|ddd}||j}|dkjd ddd}|dkjd ddd}tj|dd|f|d|dfgdddd|_|jdk d|_|j|j|j|jd|jdd d}|dd}|dddfd|j|| dt|j|jddf<tjt|j|jddtj tj d|| dt|j|jddf<qK||fS)N)r!r"cSs|dddddd}g}d}d}t|D]2\}}|dkr5|dkr5||7}|s1||d}| }q|rF|r< |d|dq||7}q|rR|||S)Nrrr 'Fr r r rrrrrps(   z2Qwen25VL_7b_Embedder.forward..split_stringuser)rolecontentrr3text)typer4image)r5r6FT)tokenizeZadd_generation_promptZ add_vision_idpt)r4imagespaddingreturn_tensors)r4r9Zvideosr:r;iiV)dimrieP)as_tuple) input_idsattention_mask pixel_valuesimage_grid_thwZoutput_hidden_states hidden_states)r%zeroslenr r'config hidden_sizebfloat16rr&longrziprr+to_pilr)Zapply_chat_templaterr@catr$r"nonzero unsqueezerArBrCminshapeones)r,caption ref_imagesZ text_listZembsrDmasksZinput_ids_listZattention_mask_listZemb_listrrtxtimgsmessagesr4Z image_inputsZ video_inputsinputsZold_inputs_idsZtext_split_listZ token_listZ text_eachZ txt_inputsZ token_eachZ new_txt_idsidx1idx2outputsZembrrrforwardVs      (   "&zQwen25VL_7b_Embedder.forward)__name__ __module__ __qualname__r%rJrr^ __classcell__rrr.rrBsr)r%Z qwen_vl_utilsr transformersrrrtorchvision.transformsrrMr*rnnModulerrrrrs   ,