;f+ddlmZddlmZddlZddlZddlZddlZddlm Z ddl m Z dZ dZ dZd Zd Zd Zd Zd Ze dfdZdZGdde ZdS))Image)BytesION)StoppingCriteria)IMAGE_TOKEN_INDEXc*|\}}d}d}td}|D]w\}}t||z ||z } t|| zt|| z} } t| | z||z} ||z| z } | |ks | |kr| |kr| }| }||f}x|S)a Selects the best resolution from a list of possible resolutions based on the original size. Args: original_size (tuple): The original size of the image in the format (width, height). possible_resolutions (list): A list of possible resolutions in the format [(width1, height1), (width2, height2), ...]. Returns: tuple: The best fit resolution in the format (width, height). Nrinf)floatminint) original_sizepossible_resolutionsoriginal_widthoriginal_heightbest_fitmax_effective_resolutionmin_wasted_resolutionwidthheightscaledownscaled_widthdownscaled_heighteffective_resolutionwasted_resolutions S/opt/hpcaas/.mounts/fs-036153e63d56f4dc2/home/jiuhai/llama3-mlp3x/llava/mm_utils.pyselect_best_resolutionr s'4#NOH !%LL- ' ' vEN*F_,DEE.1.52H.I.I3afOfKgKg+"#36G#GZiIijj"V^/CC ": : :?SWo?o?ouFI^u^u^'; $$5 !vH Oc|j\}}|\}}||z }||z }||kr(|}ttj||z|} n'|} ttj||z|}||| f} t jd||fd} ||z dz} || z dz} | | | | f| S)a1 Resize and pad an image to a target resolution while maintaining aspect ratio. Args: image (PIL.Image.Image): The input image. target_resolution (tuple): The target resolution (width, height) of the image. Returns: PIL.Image.Image: The resized and padded image. RGB)rrr)sizer mathceilresizernewpaste)imagetarget_resolutionrr target_width target_heightscale_wscale_h new_width new_height resized_image new_imagepaste_xpaste_ys rresize_and_pad_imager2*s',j#NO"3L-^+Go-G ?W#<==}MM "  .7":;;\JJ LL)Z!899M %, !> JJIi'A-Gz)a/G OOMGW#5666 rcg}|j\}}td||D]L}td||D]8}||||z||zf}||}||9M|S)a Divides an image into patches of a specified size. Args: image (PIL.Image.Image): The input image. patch_size (int): The size of each patch. Returns: list: A list of PIL.Image.Image objects representing the patches. r)r rangecropappend) r& patch_sizepatchesrrijboxpatchs rdivide_to_patchesr=MsGJME6 1fj ) )""q%,, " "AaZZ8CJJsOOE NN5 ! ! ! ! " Nrct|tur|}ntj|}t ||\}}||z||zfS)a Calculate the shape of the image patch grid after the preprocessing for images of any resolution. Args: image_size (tuple): The size of the input image in the format (width, height). grid_pinpoints (str): A string representation of a list of possible resolutions. patch_size (int): The size of each image patch. Returns: tuple: The shape of the image patch grid in the format (width, height). )typelistast literal_evalr) image_sizegrid_pinpointsr7r rrs rget_anyres_image_grid_shaperEcsZ Nt##-"/??*:7KLLME6 J * 4 44rct|tur|}ntj|}t |j|}t ||}t|jd}| jdjdf}|g|z}fd|D}tj |dS)a_ Process an image with variable resolutions. Args: image (PIL.Image.Image): The input image to be processed. processor: The image processor object. grid_pinpoints (str): A string representation of a list of possible resolutions. Returns: torch.Tensor: A tensor containing the processed image patches. r shortest_edgecVg|]%}|ddd&S)ptreturn_tensors pixel_valuesr) preprocess).0 image_patch processors r z(process_anyres_image..sG777$))+d)KKN[\]^777rrdim) r?r@rArBrr r2r= crop_sizer#torchstack) r&rPrDr best_resolution image_paddedr8image_original_resize image_patchess ` rprocess_anyres_imager[ws Nt##-"/??,UZ9MNNO'??L i.A(.KLLG!LL).*I9>ZiKj)kll*+g5M7777(5777M ;}! , , ,,rchtjttj|S)N)ropenrbase64 b64decode)r&s rload_image_from_base64r`s% :gf.u5566 7 77rc&|j\}}||kr|S||kr=tj|j||f|}||d||z dzf|Stj|j||f|}||||z dzdf|S)Nrr)r rr$moder%)pil_imgbackground_colorrrresults r expand2squarerfsLME6  7<%9IJJ Wq56>a"78999 7<&&)9;KLL W14a8999 rc t|dd}ggd}|dkrn|D]j}t|td|jD}|||d|j|jddd }|knJ|d kr1|D]-}t |||j}|.n||d dStfd Drtj d S)Nimage_aspect_ratio)z.Describe in detail what is shown in the image.zWhat is the text in the image?z9Locate the objects in the image, with their descriptions.padc3:K|]}t|dzVdS)N)r )rNxs r z!process_images..s,.^.^as1S5zz.^.^.^.^.^.^rrIT)textimagesrK image_mean image_stdpaddingrLranyresrJc3DK|]}|jdjkVdS)rN)shape)rNrl new_imagess rrmz!process_images..s1 > >a17jm) ) > > > > > >rrR) getattrrftuplerprqr6r[image_grid_pinpointsallrUrV)roimage_processor model_cfgrhtaskr&rvs @rprocess_imagesr~s ,@$GGJ   D U"" % %E!%.^.^?C].^.^.^)^)^__E#OeD]l]wDSD]gklllm{|}~E   e $ $ $ $ % x ' ' % %E(A_``E   e $ $ $ $ %vd;;;NKK > > > >: > > >>>4[333 rcfd|dD}d}g}d}t|dkrSt|ddkr:|ddjkr#d}||dd|||g|dzzD]}|||d |8|dkr t j|t jStd||S) Nc0g|]}|jS) input_ids)rNchunk tokenizers rrQz)tokenizer_image_token..s&UUUEYYu%%/UUUrzchdt||gt|zDddS)Ncg|] }|D]}| Srr)rNsublisteles rrQzCtokenizer_image_token..insert_separator..s%KKK7KKCKKKKr)ziplen)Xseps rinsert_separatorz/tokenizer_image_token..insert_separators5KK3q3%A,#7#7KKKCRCPPrrrI)dtypezUnsupported tensor type: ) splitr bos_token_idr6extendrUtensorlong ValueError) promptrimage_token_indexrK prompt_chunksrroffsetrls ` rtokenizer_image_tokenrs9UUUUV\\)=T=TUUUMQQQI F =A#mA&6"7"7!";"; a@PQR@SW`Wm@m@mq)!,---  m.?-@FQJ-O P P%%677$$$$! T ! !< <<< <E^EEFFF rc|d}|d}|ddr|ddz|dzS|dS)N/rz checkpoint-_)stripr startswith) model_path model_pathss rget_model_name_from_pathrsd!!#&&J""3''K2!!-002${2662rcdeZdZdZdejdejdefdZdejdejdefdZ dS)KeywordsStoppingCriteriac||_g|_d|_|D]}||j}t |dkr|d|jkr |dd}t ||jkrt ||_|jtj|||_ |j d|_ dS)Nrr) keywords keyword_idsmax_keyword_lenrrrr6rUrrru start_len)selfrrrkeywordcur_keyword_idss r__init__z!KeywordsStoppingCriteria.__init__s    C CG'i00:O?##a''OA,>)BX,X,X"1!"""5?##d&:::'*?';';$   # #EL$A$A B B B B""+r output_idsscoresreturnc |tjd|jz |j}fd|jD|_|jD]2}d|jd df}t j||rdS3|jdd| dfdd}|j D] }||vrdS dS)NrcDg|]}|jSr)todevice)rN keyword_idrs rrQz;KeywordsStoppingCriteria.call_for_batch..s(```JMM**;<<```rrT)skip_special_tokensF) r rurrrrUequalr batch_decoder) rrrkwargsrrtruncated_output_idsoutputsrs ` rcall_for_batchz'KeywordsStoppingCriteria.call_for_batchsZ%a(4>94;OPP````tO_```*  J#-a*2B12E1E1F1F.F#G {/<< tt .--jVGHH.E[_-``abc}  G'!!tt"urc g}t|jdD]D}||||d|Et |S)Nr)r4rur6r unsqueezerz)rrrrrr9s r__call__z!KeywordsStoppingCriteria.__call__skz'*++ T TA NN4..z!}/F/Fq/I/I6RR S S S S7||rN) __name__ __module__ __qualname__rrU LongTensor FloatTensorboolrrrrrrrs , , , )9 5CT cg    5#3U=N]arr)PILriorr^rUr!rA transformersrllava.constantsrrr2r=rEr[r`rfr~rrrrrrrsM ))))))------<   F,555(---:888   !!!v@Qae,     /     r