"""MTVCrafter inference_engine: animate a reference image with a driving SMPL motion sequence."""

import os
import copy
import pickle

import torch
import decord
import imageio
import numpy as np
from PIL import Image
from torchvision.transforms import ToPILImage, transforms, InterpolationMode, functional as F

from models import MTVCrafterPipeline, Encoder, VectorQuantizer, Decoder, SMPL_VQVAE
from draw_pose import get_pose_images
from utils import concat_images_grid, sample_video, get_sample_indexes, get_new_height_width


def run_inference(
    device,
    motion_data_path,
    ref_image_path,
    dst_width=720,
    dst_height=480,
    num_frames=49,
    num_inference_steps=50,
    guidance_scale=3.0,
    seed=42,
):
    """Generate an animation video driven by the SMPL motion stored in `motion_data_path`.

    If `ref_image_path` is None, the first frame of the driving video is used as the reference.
    Numeric defaults follow the CogVideoX-style 49-frame, 480x720 setup used by MTVCrafter.
    Returns the path of an MP4 that shows, per frame, the driving frame, the input pose,
    the pose reconstructed by the motion tokenizer, and the generated frame.
    """
    to_pil = ToPILImage()
    normalize = transforms.Normalize([0.5], [0.5])

    # Model locations: CogVideoX base, MTVCrafter MV-DiT transformer, and the 4DMoT motion tokenizer.
    pretrained_model_path = '/gemini/space/human_guozz2/dyb/models/CogVideoX'
    transformer_path = '/gemini/space/human_guozz2/dyb/models/MTVCrafter/MV-DiT/CogVideoX'
    tokenizer_path = '/gemini/space/human_guozz2/dyb/models/MTVCrafter/4DMoT/mp_rank_00_model_states.pt'

    # Load the SMPL motion data and the joint normalization statistics.
    with open(motion_data_path, 'rb') as f:
        data_list = pickle.load(f)
    if not isinstance(data_list, list):
        data_list = [data_list]
    pe_mean = np.load('data/mean.npy')
    pe_std = np.load('data/std.npy')

    # Diffusion pipeline: CogVideoX components with the MTVCrafter MV-DiT transformer.
    pipe = MTVCrafterPipeline.from_pretrained(
        model_path=pretrained_model_path,
        transformer_model_path=transformer_path,
        torch_dtype=torch.bfloat16,
        scheduler_type='dpm',
    ).to(device)
    pipe.vae.enable_tiling()
    pipe.vae.enable_slicing()

    # 4DMoT motion tokenizer (encoder -> vector quantizer -> decoder); the codebook of 8192 codes
    # with dimension 3072 follows the released 4DMoT configuration.
    state_dict = torch.load(tokenizer_path, map_location='cpu')
    motion_encoder = Encoder(
        in_channels=3,
        mid_channels=[128, 512],
        out_channels=3072,
        downsample_time=[2, 2],
        downsample_joint=[1, 1],
    )
    motion_quant = VectorQuantizer(nb_code=8192, code_dim=3072, is_train=False)
    motion_decoder = Decoder(
        in_channels=3072,
        mid_channels=[512, 128],
        out_channels=3,
        upsample_rate=2.0,
        frame_upsample_rate=[2.0, 2.0],
        joint_upsample_rate=[1.0, 1.0],
    )
    vqvae = SMPL_VQVAE(motion_encoder, motion_quant, motion_decoder).to(device)
    vqvae.load_state_dict(state_dict['module'], strict=True)

    # Sample frames from the driving video, resize, and center-crop to the target resolution.
    data = data_list[0]
    new_height, new_width = get_new_height_width(data, dst_height, dst_width)
    x1 = (new_width - dst_width) // 2
    y1 = (new_height - dst_height) // 2

    sample_indexes = get_sample_indexes(data['video_length'], num_frames, stride=1)
    input_images = sample_video(decord.VideoReader(data['video_path']), sample_indexes)
    input_images = torch.from_numpy(input_images).permute(0, 3, 1, 2).contiguous()
    input_images = F.resize(input_images, (new_height, new_width), InterpolationMode.BILINEAR)
    input_images = F.crop(input_images, y1, x1, dst_height, dst_width)

    # Reference frames: either a user-provided identity image repeated over time,
    # or the first frame of the driving video.
    if ref_image_path is not None:
        ref_image = Image.open(ref_image_path).convert('RGB')
        ref_image = torch.from_numpy(np.array(ref_image)).permute(2, 0, 1).contiguous()
        ref_images = torch.stack([ref_image.clone() for _ in range(num_frames)])
        ref_images = F.resize(ref_images, (new_height, new_width), InterpolationMode.BILINEAR)
        ref_images = F.crop(ref_images, y1, x1, dst_height, dst_width)
    else:
        ref_images = copy.deepcopy(input_images)
        frame0 = input_images[0]
        ref_images[:, :, :, :] = frame0

    # Gather the SMPL joints for the sampled frames and normalize them with the dataset statistics.
    smpl_poses = np.array([pose[0][0].cpu().numpy() for pose in data['pose']['joints3d_nonparam']])
    poses = smpl_poses[sample_indexes]
    norm_poses = torch.tensor((poses - pe_mean) / pe_std)
    offset = [data['video_height'], data['video_width'], 0]

    pose_images_before = get_pose_images(copy.deepcopy(poses), offset)
    pose_images_before = [
        image.resize((new_width, new_height)).crop((x1, y1, x1 + dst_width, y1 + dst_height))
        for image in pose_images_before
    ]

    # Tokenize the motion with the 4DMoT VQ-VAE and render its reconstruction for visualization.
    input_smpl_joints = norm_poses.unsqueeze(0).to(device)
    motion_tokens, vq_loss = vqvae(input_smpl_joints, return_vq=True)
    output_motion, _ = vqvae(input_smpl_joints)
    pose_images_after = get_pose_images(output_motion[0].cpu().detach() * pe_std + pe_mean, offset)
    pose_images_after = [
        image.resize((new_width, new_height)).crop((x1, y1, x1 + dst_width, y1 + dst_height))
        for image in pose_images_after
    ]

    # Scale pixel values to [-1, 1] for the pipeline.
    input_images = input_images / 255.0
    ref_images = ref_images / 255.0
    input_images = normalize(input_images)
    ref_images = normalize(ref_images)

    # Generate the animation conditioned on the reference frames and the motion tokens.
    output_images = pipe(
        height=dst_height,
        width=dst_width,
        num_frames=num_frames,
        num_inference_steps=num_inference_steps,
        guidance_scale=guidance_scale,
        seed=seed,
        ref_images=ref_images,
        motion_embeds=motion_tokens,
        joint_mean=pe_mean,
        joint_std=pe_std,
    ).frames[0]

    # Per-frame grid: driving frame | input pose | reconstructed pose | generated frame.
    vis_images = []
    for k in range(len(output_images)):
        vis_image = [
            to_pil(((input_images[k] + 1) * 127.5).clamp(0, 255).to(torch.uint8)),
            pose_images_before[k],
            pose_images_after[k],
            output_images[k],
        ]
        vis_image = concat_images_grid(vis_image, cols=len(vis_image), pad=2)
        vis_images.append(vis_image)

    output_path = 'output.mp4'
    imageio.mimsave(output_path, vis_images, fps=15)
    return output_path
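
# Usage sketch (not part of the original module, which only defines run_inference): a minimal
# command-line driver, assuming a single CUDA GPU when available and that the caller supplies
# the motion .pkl and optional reference image paths; argument names here are illustrative.
if __name__ == "__main__":
    import argparse

    parser = argparse.ArgumentParser(description="Run MTVCrafter inference on one motion sequence.")
    parser.add_argument("--motion_data_path", required=True, help="Pickled SMPL motion data (.pkl)")
    parser.add_argument("--ref_image_path", default=None, help="Optional reference identity image")
    args = parser.parse_args()

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    result = run_inference(device, args.motion_data_path, args.ref_image_path)
    print(f"Saved visualization to {result}")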