o m#gB @sddlZddlZddlZddlZddlmZmZmZddlZddl Z ddlZddl m Z ddlZddl Z ddlZddl Z ddlZddlmZddlZedgZddlZddlZddlZddlZddlZddlmZmZddlmZddlZddlmZddl Z!ddlm"Z#d6d d Z$d d Z%e&d7ddZ'ddZ(d7ddZ)de*deej+ej,ffddZ- d8dej.dej,dej,dee*de/dej.f d d!Z0d"d#Z1d$d%Z2dd&d'dd(dddgd)d*df d+d,Z3d-d.Z4d/d0Z5d1d2Z6d9d4d5Z7dS):N)Image ImageDraw ImageFont) AzureOpenAI)pyploten)TupleList) box_convert) ToPILImageSalesforce/blip2-opt-2.7bc Cs|s tjr dnd}|dkr7ddlm}m}|d}|dkr*|j|dtjd}n:|j|dtjd |}n-|dkrddd lm }m }|jd d d }|dkrX|j|tjd d }n |j|tjd d  |}| ||dS)NcudacpuZblip2r)Blip2ProcessorBlip2ForConditionalGenerationr ) device_map torch_dtype florence2) AutoProcessorAutoModelForCausalLMzmicrosoft/Florence-2-baseT)trust_remote_code)rr)model processor) torchr is_available transformersrrfrom_pretrainedfloat32float16torr) model_namemodel_name_or_pathdevicerrrrrrr#/home/user/app/utils.pyget_caption_model_processor$s* r%cCsddlm}||}|S)Nr)YOLO)Z ultralyticsr&) model_pathr&rr#r#r$get_yolo_model<s r(c Cst}|r|t|d}n|}g}t|D]F\}} t| d|jdt| d|jd} } t| d|jdt| d|jd} } || | | | ddf}|||q|d|d}}|ssd|jjvrqd}nd }d }g}|j}t dt||D]k}||||}|jj d kr|||gt|d d j |t j d}n|||gt|d d j |d}d|jjvr|j|d|ddddd}n|jdi|dddddd}|j|dd}dd|D}||q|S)NrrrZflorencez zThe image shows r ptimagestextreturn_tensors)r"dtyper" input_ids pixel_valuesiF)r4r5max_new_tokens num_beams do_sampledT) max_lengthr7Zno_repeat_ngram_sizeearly_stoppingZnum_return_sequences)skip_special_tokenscSsg|]}|qSr#strip).0genr#r#r$ gz+get_parsed_content_icon..r#)r len enumerateintshapeappendconfig name_or_pathr"rangetyperrrgenerate batch_decodeextend)filtered_boxesocr_bbox image_sourcecaption_model_processorpromptto_pil non_ocr_boxescroped_pil_imageicoordxminxmaxyminymax cropped_imagerr batch_sizegenerated_textsr"batchinputsZ generated_idsgenerated_textr#r#r$get_parsed_content_iconCs<..  &    rdc st}|r|t|d}n|}g}t|D]F\}}t|d|jdt|d|jd} } t|d|jdt|d|jd} } || | | | ddf} ||| q|d|d}|jddd g}jj|d d d }d }g}t dt||D]}||||}fdd|D}ggggd}|gt|}t|D]2\}}j |||dd}|d|d|d|d|d|d|d|dqt dd|dD}t|dD]>\}}t j jjt jd||jdt jd|gdd|d|<t j t jd||jdt jd|d|gdd|d|<qfdd|D}ddd d}|jd"i|djji|}|dd|djddf}j|d d d }d!d|D}||q|S)#Nrr)r*r+rruserz-<|image_1|> describe the icon in one sentence)rolecontentFT)tokenizeZadd_generation_promptr:csg|] }j|ddqS)r-r1)Zimage_processorr@x)rr#r$rBz1get_parsed_content_icon_phi3v..)r4attention_maskr5 image_sizesr-rir4rmr5rncSsg|]}|jdqSr))rGrjr#r#r$rBs)r2)dimcs"i|] \}}|t|qSr#)r concatenaterr@kvr3r#r$ s"z1get_parsed_content_icon_phi3v..{Gz?)r6 temperaturer8 eos_token_id)r=clean_up_tokenization_spacescSsg|] }|dqS) r>)r@resr#r#r$rBrlr#)r rDrErFrGrHr" tokenizerZapply_chat_templaterKZ_convert_images_texts_to_inputsmaxrcatZ pad_token_idoneslongzerositemsrMryrNrO)rPrQrRrSrUrVrWrXrYrZr[r\r]r^rmessagesrTr_r`r/Z image_inputsrbtextstxtinputmax_lenrtZ inputs_catZgeneration_argsZ generate_idsresponser#)r"rr$get_parsed_content_icon_phi3vnsT.. :<  rcs|dus t|ts Jddddfdd|}g}|r'||t|D]D\}d}t|D]\}}||krP|krP|krPd}nq5|ro|rjtfd d t|Dsi|q+|q+t|S) NcSs |d|d|d|dS)Nr*rr+r)r#)boxr#r#r$box_areas z remove_overlap..box_areacSsdt|d|d}t|d|d}t|d|d}t|d|d}td||td||SNrr)r*r+)r~min)box1box2x1y1x2y2r#r#r$intersection_areas z)remove_overlap..intersection_areacsl||}|||d}|dkr*|dkr*||}||}nd\}}t||||S)Ngư>r)rr)r~)rr intersectionunionZratio1Zratio2)rrr#r$IoUs  zremove_overlap..IoUTFc3s"|] \}}|kVqdSNr#)r@rsZbox3)rr iou_thresholdr#r$ s z!remove_overlap..) isinstancer tolistrOrEanyrHrtensor)boxesrrQrPrXZ is_valid_boxjrr#)rrrrrr$remove_overlaps.  &   r image_pathreturnc Cs`ttjdgddttgdgdg}t|d}t |}||d\}}||fS)Ni i5)max_size)g ףp= ?gv/?gCl?)gZd;O?gy&1?g?RGB) TZComposeZ RandomResizeZToTensorZ Normalizeropenconvertnpasarray)r transformrRimageZimage_transformed_r#r#r$ load_images rr:r*r+rRrlogitsphrases text_scalecCs|j\}} } |t| || |g}t|ddd} t|ddd} tj| d} ddt|jdD}dd lm }|||||d }| }|j || || |fd }d d t || D}||fS)aH This function annotates an image with bounding boxes and labels. Parameters: image_source (np.ndarray): The source image to be annotated. boxes (torch.Tensor): A tensor containing bounding box coordinates. in cxcywh format, pixel scale logits (torch.Tensor): A tensor containing confidence scores for each bounding box. phrases (List[str]): A list of labels for each bounding box. text_scale (float): The scale of the text to be displayed. 0.8 for mobile/web, 0.3 for desktop # 0.4 for mind2web Returns: np.ndarray: The annotated image. cxcywhxyxyrZin_fmtZout_fmtxywh)rcSsg|]}|qSr#r#)r@phraser#r#r$rBszannotate..r) BoxAnnotator)r text_paddingtext_thickness thickness)Zscene detectionslabelsZ image_sizecSsi|]\}}||qSr#r#)r@rrtr#r#r$ruszannotate..) rGrTensorr numpysvZ DetectionsrKZutil.box_annotatorrcopyannotatezip)rRrrrrrrrhwrrrrrrZ box_annotatorannotated_framelabel_coordinatesr#r#r$rs   rc Cs|d|d}}|j}|||dd|}t|d i|}Wdn1s,wY|j||j|||jdddgdd} | d | d | d } } } | | | fS) 9 Use huggingface model to replace the original model rrr-r.N) box_thresholdtext_thresholdZ target_sizesrrscoresrr#)r"rrno_gradZ&post_process_grounded_object_detectionr4size) rrcaptionrrrr"rboutputsresultsrrrr#r#r$predicts"  rcCsF|j||d}|djj}|djj}ddtt|D}|||fS)r)sourceconfrcSg|]}t|qSr#strr@rXr#r#r$rBrCz predict_yolo..)rrrrrKrD)rrrresultrrrr#r#r$ predict_yolos   rrwFg?Tg?c !sLd} d}t|d}|j\ t|||d\}}}|tg|j }t |}ddt t |D}|j\}|rVt|tg}|}ntd d }t|| |d }| r|d }d |jjvrwt||||}n t||||| d}ddt| D} t | }g}t|D]\}}|dt||d|q| |}n ddt| D} | }t|ddd}ddt t |D}|rtd ||||d|\}}n t||||||d\}}t|}t}|j|ddt !|"#d} |r!fdd|$D}|jdkr|jdks!J| ||fS)!z( ocr_bbox: list of xyxy format bbox zclickable buttons on the screenrwrF)rrrrr)rrrcSrr#rrr#r#r$rB1rCz'get_som_labeled_img..zno ocr bbox!!!N)rrrQrZphi3_v)rTcS g|] \}}d|d|qSz Text Box ID : r#r@rXrr#r#r$rBD z Icon Box ID rcSrrr#rr#r#r$rBKrrrrcSsg|]}|qSr#r#rr#r#r$rBPs)rRrrr)rRrrrrrPNG)formatasciics>i|]\}}||d|d|d|dgqS)rr)r*r+r#rrrrr#r$ru^s>z'get_som_labeled_img..r)rr#)%rrrrrrrrrr"rrrKrDrGrrprintrrI model_typerrdrErHrr r fromarrayioBytesIOsavebase64 b64encodegetvaluedecoder)!Zimg_pathr BOX_TRESHOLDoutput_coord_in_ratiorQrrdraw_bbox_configrSocr_textZuse_local_semanticsrrTZ TEXT_PROMPTZ TEXT_TRESHOLDrRrrrrrPZ caption_modelZparsed_content_iconZ icon_startZparsed_content_icon_lsrXrZparsed_content_mergedrrZpil_imgbuffered encoded_imager#rr$get_som_labeled_img"sT        $ rcCs||dd|dd|dd|dd|dd|ddf\}}}}t|t|t|t|f\}}}}||||fSNrr)r*rFrrkyrrr#r#r$get_xywhdsL$ rcCsd|dd|dd|dd|ddf\}}}}t|t|t|t|f\}}}}||||fSrr)rrkrxpZypr#r#r$get_xyxyi4$ rcCsd|d|d|d|d|d|df\}}}}t|t|t|t|f\}}}}||||fSrrrr#r#r$ get_xywh_yolonrrrcCs|duri}tj|fi|}d}dd|D}dd|D}|r\t|} t| tj} g} |D]$} t| \} } }}| | | ||ft| | | f| || |fddq1t | n|dkrhdd|D} n |d krsd d|D} || f|fS) NFcSg|]}|dqS)rr#r@itemr#r#r$rB{rCz!check_ocr_box..cSrror#rr#r#r$rB|rC)rrr*rcSrr#)rrr#r#r$rBrCrcSrr#)rrr#r#r$rBrC) readerZreadtextcv2ZimreadZcvtColorZ COLOR_RGB2BGRrrHZ rectanglepltZimshow)r display_imgoutput_bb_formatgoal_filtering easyocr_argsris_goal_filteredrYr0Z opencv_imgbbrrkrabr#r#r$ check_ocr_boxus( $  r )r Nr)r:r*r+)TrNN)8osrrtimePILrrrjsonrequestsZopenairsysrrr matplotlibrrZeasyocrReaderrastrtypingrr Ztorchvision.opsr reZtorchvision.transformsr Z supervisionr transformsrr%r(inference_moderdrrrarrayrrndarrayfloatrrrrrrrr r#r#r#r$s\        * 4,"  "B