a f/ @sbddlZddlZddlmZddlmZddlmZddlmZddl Z ddZ GdddZ dS) N)tqdm) SummaryWriter)Optionalcsdt|tr tfdd|DSt|tr<fdd|DSt|tr\fdd|DS|S)z'This is for trasfering into cuda devicec3s|]}t|VqdSN nested_map.0xmap_fn8/home/ubuntu/whole-song-gen_acc/chord_trainer/learner.py znested_map..csg|]}t|qSr rrr r r rznested_map..csi|]\}}|t|qSr rr kvr r r rznested_map..) isinstancetuplelistdictitems)structr r r rr s   rc@sveZdZddZeeedddZddZdd Zdd d Z d dZ dddZ dddZ ddZ ddZddZdS)DiffproLearnerc Cs@||_|d|_|d|_||_||_||_||_||_||_d|_ d|_ d|_ d|_ t jrhdnd|_t jjj|jd|_t jjj|jd|_t jdg|jd |_tj|jr|ndt|jt|jt|jt|d d }t|j|Wdn1s0Yt tj!|jd d ddS)Nz/logsz/chkptsrgcudacpu)enabledg _B)devicez /params.jsonwT) sort_keysindent)" output_dirlog_dircheckpoint_dirmodeltrain_dlval_dl optimizerparamsparam_schedulerstepepoch grad_normsummary_writertorchr is_availabler ampautocastZfp16 GradScalerscalertensor best_val_lossospathexistsrestore_from_checkpointmakedirsopenjsondumpprintdumps) selfr%r(r)r*r+r,r-Z params_filer r r__init__s2      .zDiffproLearner.__init__)lossesscheduled_paramscCsn|}|j|d<|dur6|D]\}}||d|<q|jpJt|j|jd}||||j|||_dS)ztype: train or valr0NZsched_)Z purge_step)r0rr1rr&r.Z add_scalarsflush)rDrFrGtypeZsummary_lossesrrwriterr r r_write_summary8s zDiffproLearner._write_summarycCsF|j}|j|jdd|Ddd|jD|jdS)NcSs*i|]"\}}|t|tjr"|n|qSr rr2Tensorrrr r rrLsz-DiffproLearner.state_dict..cSs*i|]"\}}|t|tjr"|n|qSr rLrr r rrQs)r.r/r(r+r7)r( state_dictr.r/rr+r7)rDZ model_stater r rrNFs  zDiffproLearner.state_dictcCsH|d|_|d|_|j|d|j|d|j|ddS)Nr.r/r(r+r7)r.r/r(load_state_dictr+r7)rDrNr r rrOXs   zDiffproLearner.load_state_dictweightscCslzJ|jd|d}t|}||td|d|d|jdWdStyftdYd S0dS) N/.ptzRestored from checkpoint z --> -z.pt!Tz-No checkpoint found. Starting from scratch...F)r'r2loadrOrBr/FileNotFoundError)rDfnamefpathZ checkpointr r rr=_s   z&DiffproLearner.restore_from_checkpointcCs&tj|rt|t||dSr)r:r;islinkunlinksymlink)rD save_name link_fpathr r r_link_checkpointjs  zDiffproLearner._link_checkpointFcCsv|d|jd}|jd|}|jd|d}|jd|d}t||||||rr|||dS)NrSrRrQz_best.pt)r/r'r2saverNr])rDrVis_bestr[Z save_fpathZlink_best_fpathr\r r rsave_to_checkpointos z!DiffproLearner.save_to_checkpointNcs.jjdurjjtj_|durFj|krFdStjdjdD]}t|fdd} |\}}t | D]6}t |t jrt |rtdjdjqjddkr||d jd dkrjdkrjdkrjd 7_q\q dS) NzEpoch )desccst|tjr|jS|Srrr2rMtor r rDr rs z&DiffproLearner.train..zDetected NaN loss at step z, epoch 2rtraini)r(rhr-r.lenr)r/rr train_steprvaluesrr2rMisnanany RuntimeErrorrKvalid)rD max_epochbatchrFrGZ loss_valuer rerrhys8     zDiffproLearner.traincsjdurjd}jD]N}t|fdd}|\}}|pH|}|D]\}}|||7<qRq|duszJ|D]\}}||tj<q|ddj|dkr|d_j ddn j dddS)Ncst|tjr|jS|Srrbrdrer rrfrz&DiffproLearner.valid..vallossT)r_F) r-evalr*rval_steprrjrKr9r`)rDrFrrZcurrent_losses_rrr rerrps&      zDiffproLearner.validcCs|jD] }d|_q |jR|jdurL|j}|jj||jfi|}nd}|j||j}Wdn1st0Y|d}|j| |j |j t j j|j|jjpd|_|j|j |j||fS)NrtgeA)r( parametersgradr5r-r. get_loss_dictr7scalebackwardunscale_r+nnutils clip_gradclip_grad_norm_r,Z max_grad_normr0update)rDrrparamrG loss_dictrtr r rrks*  . zDiffproLearner.train_stepc Cstx|jR|jdur@|j}|jj||jfi|}nd}|j||j}Wdn1sh0YWdn1s0Y||fSr)r2no_gradr5r-r.r(rz)rDrrrGrr r rrvs   LzDiffproLearner.val_step)rP)rPF)N)__name__ __module__ __qualname__rErrrKrNrOr=r]r`rhrprkrvr r r rrs"  !r) r2r@torch.nnr~rZtorch.utils.tensorboard.writerrtypingrr:rrr r r rs