from typing import Dict, List, Optional, Tuple, Union
import torch
import torch.nn.functional as F
from torch import nn

from detectron2.config import configurable
from detectron2.layers import Conv2d, ShapeSpec, cat
from detectron2.structures import Boxes, ImageList, Instances, pairwise_iou
from detectron2.utils.events import get_event_storage
from detectron2.utils.memory import retry_if_cuda_oom
from detectron2.utils.registry import Registry

from ..anchor_generator import build_anchor_generator
from ..box_regression import Box2BoxTransform, _dense_box_regression_loss
from ..matcher import Matcher
from ..sampling import subsample_labels
from .build import PROPOSAL_GENERATOR_REGISTRY
from .proposal_utils import find_top_rpn_proposals

RPN_HEAD_REGISTRY = Registry("RPN_HEAD")
RPN_HEAD_REGISTRY.__doc__ = """
Registry for RPN heads, which take feature maps and perform
objectness classification and bounding box regression for anchors.
The registered object will be called with `obj(cfg, input_shape)`.
The call should return a `nn.Module` object.
"""


def build_rpn_head(cfg, input_shape):
    """
    Build an RPN head defined by `cfg.MODEL.RPN.HEAD_NAME`.
    """
    name = cfg.MODEL.RPN.HEAD_NAME
    return RPN_HEAD_REGISTRY.get(name)(cfg, input_shape)


@RPN_HEAD_REGISTRY.register()
class StandardRPNHead(nn.Module):
    """
    Standard RPN classification and regression heads described in :paper:`Faster R-CNN`.
    Uses a 3x3 conv to produce a shared hidden state from which one 1x1 conv predicts
    objectness logits for each anchor and a second 1x1 conv predicts bounding-box deltas
    specifying how to deform each anchor into an object proposal.
    """

    @configurable
    def __init__(
        self, *, in_channels: int, num_anchors: int, box_dim: int = 4, conv_dims: List[int] = (-1,)
    ):
        """
        NOTE: this interface is experimental.

        Args:
            in_channels (int): number of input feature channels. When using multiple
                input features, they must have the same number of channels.
            num_anchors (int): number of anchors to predict for *each spatial position*
                on the feature map. The total number of anchors for each
                feature map will be `num_anchors * H * W`.
            box_dim (int): dimension of a box, which is also the number of box regression
                predictions to make for each anchor. An axis aligned box has
                box_dim=4, while a rotated box has box_dim=5.
            conv_dims (list[int]): a list of integers representing the output channels
                of N conv layers. Set it to -1 to use the same number of output channels
                as input channels.
        """
        super().__init__()
        cur_channels = in_channels
        if len(conv_dims) == 1:
            out_channels = cur_channels if conv_dims[0] == -1 else conv_dims[0]
            # 3x3 conv for the hidden representation
            self.conv = self._get_rpn_conv(cur_channels, out_channels)
            cur_channels = out_channels
        else:
            self.conv = nn.Sequential()
            for k, conv_dim in enumerate(conv_dims):
                out_channels = cur_channels if conv_dim == -1 else conv_dim
                if out_channels <= 0:
                    raise ValueError(
                        f"Conv output channels should be greater than 0. Got {out_channels}"
                    )
                conv = self._get_rpn_conv(cur_channels, out_channels)
                self.conv.add_module(f"conv{k}", conv)
                cur_channels = out_channels
        # 1x1 conv for predicting objectness logits
        self.objectness_logits = nn.Conv2d(cur_channels, num_anchors, kernel_size=1, stride=1)
        # 1x1 conv for predicting box2box transform deltas
        self.anchor_deltas = nn.Conv2d(
            cur_channels, num_anchors * box_dim, kernel_size=1, stride=1
        )

        for layer in self.modules():
            if isinstance(layer, nn.Conv2d):
                nn.init.normal_(layer.weight, std=0.01)
                nn.init.constant_(layer.bias, 0)

    def _get_rpn_conv(self, in_channels, out_channels):
        return Conv2d(
            in_channels,
            out_channels,
            kernel_size=3,
            stride=1,
            padding=1,
            activation=nn.ReLU(),
        )

    @classmethod
    def from_config(cls, cfg, input_shape):
        # Standard RPN is shared across levels:
        in_channels = [s.channels for s in input_shape]
        assert len(set(in_channels)) == 1, "Each level must have the same channel!"
        in_channels = in_channels[0]

        # RPNHead should take the same input as the anchor generator
        # NOTE: it assumes that creating an anchor generator does not have unwanted side effects.
        anchor_generator = build_anchor_generator(cfg, input_shape)
        num_anchors = anchor_generator.num_anchors
        box_dim = anchor_generator.box_dim
        assert (
            len(set(num_anchors)) == 1
        ), "Each level must have the same number of anchors per spatial position"
        return {
            "in_channels": in_channels,
            "num_anchors": num_anchors[0],
            "box_dim": box_dim,
            "conv_dims": cfg.MODEL.RPN.CONV_DIMS,
        }

    def forward(self, features: List[torch.Tensor]):
        """
        Args:
            features (list[Tensor]): list of feature maps

        Returns:
            list[Tensor]: A list of L elements.
                Element i is a tensor of shape (N, A, Hi, Wi) representing
                the predicted objectness logits for all anchors. A is the number of cell anchors.
            list[Tensor]: A list of L elements. Element i is a tensor of shape (N, A*box_dim, Hi, Wi)
                representing the predicted "deltas" used to transform anchors to proposals.
        """
        pred_objectness_logits = []
        pred_anchor_deltas = []
        for x in features:
            t = self.conv(x)
            pred_objectness_logits.append(self.objectness_logits(t))
            pred_anchor_deltas.append(self.anchor_deltas(t))
        return pred_objectness_logits, pred_anchor_deltas
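

# --- Editor-added usage sketch ---------------------------------------------------
# The helper below is not part of the original detectron2 file. It is a minimal,
# hedged illustration of what `StandardRPNHead` returns; the batch size, channel
# count and anchor count are made-up values, and `_demo_standard_rpn_head` is an
# editor-invented name.
def _demo_standard_rpn_head():  # pragma: no cover - illustrative only
    # Two FPN-style levels that share the same channel count, batch size 2.
    features = [torch.rand(2, 256, 32, 32), torch.rand(2, 256, 16, 16)]
    head = StandardRPNHead(in_channels=256, num_anchors=3)
    logits, deltas = head(features)
    # One output per level: logits[i] is (N, A, Hi, Wi), deltas[i] is (N, A*box_dim, Hi, Wi).
    assert logits[0].shape == (2, 3, 32, 32)
    assert deltas[1].shape == (2, 12, 16, 16)
    return logits, deltas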


@PROPOSAL_GENERATOR_REGISTRY.register()
class RPN(nn.Module):
    """
    Region Proposal Network, introduced by :paper:`Faster R-CNN`.
    """

    @configurable
    def __init__(
        self,
        *,
        in_features: List[str],
        head: nn.Module,
        anchor_generator: nn.Module,
        anchor_matcher: Matcher,
        box2box_transform: Box2BoxTransform,
        batch_size_per_image: int,
        positive_fraction: float,
        pre_nms_topk: Tuple[float, float],
        post_nms_topk: Tuple[float, float],
        nms_thresh: float = 0.7,
        min_box_size: float = 0.0,
        anchor_boundary_thresh: float = -1.0,
        loss_weight: Union[float, Dict[str, float]] = 1.0,
        box_reg_loss_type: str = "smooth_l1",
        smooth_l1_beta: float = 0.0,
    ):
        """
        NOTE: this interface is experimental.

        Args:
            in_features (list[str]): list of names of input features to use
            head (nn.Module): a module that predicts logits and regression deltas
                for each level from a list of per-level features
            anchor_generator (nn.Module): a module that creates anchors from a list of features.
                Usually an instance of :class:`AnchorGenerator`
            anchor_matcher (Matcher): label the anchors by matching them with ground truth.
            box2box_transform (Box2BoxTransform): defines the transform from anchor boxes to
                instance boxes
            batch_size_per_image (int): number of anchors per image to sample for training
            positive_fraction (float): fraction of foreground anchors to sample for training
            pre_nms_topk (tuple[float]): (train, test) that represents the
                number of top k proposals to select before NMS, in
                training and testing.
            post_nms_topk (tuple[float]): (train, test) that represents the
                number of top k proposals to select after NMS, in
                training and testing.
            nms_thresh (float): NMS threshold used to de-duplicate the predicted proposals
            min_box_size (float): remove proposal boxes with any side smaller than this threshold,
                in the unit of input image pixels
            anchor_boundary_thresh (float): legacy option
            loss_weight (float|dict): weights to use for losses. Can be a single float for weighting
                all rpn losses together, or a dict of individual weightings. Valid dict keys are:
                    "loss_rpn_cls" - applied to classification loss
                    "loss_rpn_loc" - applied to box regression loss
            box_reg_loss_type (str): Loss type to use. Supported losses: "smooth_l1", "giou".
            smooth_l1_beta (float): beta parameter for the smooth L1 regression loss. Default to
                use L1 loss. Only used when `box_reg_loss_type` is "smooth_l1"
        """
        super().__init__()
        self.in_features = in_features
        self.rpn_head = head
        self.anchor_generator = anchor_generator
        self.anchor_matcher = anchor_matcher
        self.box2box_transform = box2box_transform
        self.batch_size_per_image = batch_size_per_image
        self.positive_fraction = positive_fraction
        # Map from self.training state to train/test settings
        self.pre_nms_topk = {True: pre_nms_topk[0], False: pre_nms_topk[1]}
        self.post_nms_topk = {True: post_nms_topk[0], False: post_nms_topk[1]}
        self.nms_thresh = nms_thresh
        self.min_box_size = float(min_box_size)
        self.anchor_boundary_thresh = anchor_boundary_thresh
        if isinstance(loss_weight, float):
            loss_weight = {"loss_rpn_cls": loss_weight, "loss_rpn_loc": loss_weight}
        self.loss_weight = loss_weight
        self.box_reg_loss_type = box_reg_loss_type
        self.smooth_l1_beta = smooth_l1_beta

    @classmethod
    def from_config(cls, cfg, input_shape: Dict[str, ShapeSpec]):
        in_features = cfg.MODEL.RPN.IN_FEATURES
        ret = {
            "in_features": in_features,
            "min_box_size": cfg.MODEL.PROPOSAL_GENERATOR.MIN_SIZE,
            "nms_thresh": cfg.MODEL.RPN.NMS_THRESH,
            "batch_size_per_image": cfg.MODEL.RPN.BATCH_SIZE_PER_IMAGE,
            "positive_fraction": cfg.MODEL.RPN.POSITIVE_FRACTION,
            "loss_weight": {
                "loss_rpn_cls": cfg.MODEL.RPN.LOSS_WEIGHT,
                "loss_rpn_loc": cfg.MODEL.RPN.BBOX_REG_LOSS_WEIGHT * cfg.MODEL.RPN.LOSS_WEIGHT,
            },
            "anchor_boundary_thresh": cfg.MODEL.RPN.BOUNDARY_THRESH,
            "box2box_transform": Box2BoxTransform(weights=cfg.MODEL.RPN.BBOX_REG_WEIGHTS),
            "box_reg_loss_type": cfg.MODEL.RPN.BBOX_REG_LOSS_TYPE,
            "smooth_l1_beta": cfg.MODEL.RPN.SMOOTH_L1_BETA,
        }

        ret["pre_nms_topk"] = (cfg.MODEL.RPN.PRE_NMS_TOPK_TRAIN, cfg.MODEL.RPN.PRE_NMS_TOPK_TEST)
        ret["post_nms_topk"] = (cfg.MODEL.RPN.POST_NMS_TOPK_TRAIN, cfg.MODEL.RPN.POST_NMS_TOPK_TEST)

        ret["anchor_generator"] = build_anchor_generator(
            cfg, [input_shape[f] for f in in_features]
        )
        ret["anchor_matcher"] = Matcher(
            cfg.MODEL.RPN.IOU_THRESHOLDS, cfg.MODEL.RPN.IOU_LABELS, allow_low_quality_matches=True
        )
        ret["head"] = build_rpn_head(cfg, [input_shape[f] for f in in_features])
        return ret

    def _subsample_labels(self, label):
        """
        Randomly sample a subset of positive and negative examples, and overwrite
        the label vector to the ignore value (-1) for all elements that are not
        included in the sample.

        Args:
            labels (Tensor): a vector of -1, 0, 1. Will be modified in-place and returned.
        """
        pos_idx, neg_idx = subsample_labels(
            label, self.batch_size_per_image, self.positive_fraction, 0
        )
        # Fill with the ignore label (-1), then set positive and negative labels
        label.fill_(-1)
        label.scatter_(0, pos_idx, 1)
        label.scatter_(0, neg_idx, 0)
        return label
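
    # Editor's note (illustrative, not in the original source): with the default config
    # values batch_size_per_image=256 and positive_fraction=0.5, the call above keeps at
    # most int(256 * 0.5) = 128 anchors labeled 1. If an image has only 20 positive
    # anchors, all 20 are kept and 236 negatives fill the remaining slots, so each image
    # contributes 256 anchors to the RPN losses; every other anchor is overwritten with
    # the ignore label (-1) by the fill_/scatter_ sequence.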

    @torch.jit.unused
    @torch.no_grad()
    def label_and_sample_anchors(
        self, anchors: List[Boxes], gt_instances: List[Instances]
    ) -> Tuple[List[torch.Tensor], List[torch.Tensor]]:
        """
        Args:
            anchors (list[Boxes]): anchors for each feature map.
            gt_instances: the ground-truth instances for each image.

        Returns:
            list[Tensor]:
                List of #img tensors. i-th element is a vector of labels whose length is
                the total number of anchors across all feature maps R = sum(Hi * Wi * A).
                Label values are in {-1, 0, 1}, with meanings: -1 = ignore; 0 = negative
                class; 1 = positive class.
            list[Tensor]:
                i-th element is a Rx4 tensor. The values are the matched gt boxes for each
                anchor. Values are undefined for those anchors not labeled as 1.
        """
        anchors = Boxes.cat(anchors)

        gt_boxes = [x.gt_boxes for x in gt_instances]
        image_sizes = [x.image_size for x in gt_instances]
        del gt_instances

        gt_labels = []
        matched_gt_boxes = []
        for image_size_i, gt_boxes_i in zip(image_sizes, gt_boxes):
            # image_size_i: (h, w) of the i-th image; gt_boxes_i: its ground-truth boxes
            match_quality_matrix = retry_if_cuda_oom(pairwise_iou)(gt_boxes_i, anchors)
            matched_idxs, gt_labels_i = retry_if_cuda_oom(self.anchor_matcher)(match_quality_matrix)
            # Matching is memory-expensive and may result in CPU tensors. But the result is small
            gt_labels_i = gt_labels_i.to(device=gt_boxes_i.device)
            del match_quality_matrix

            if self.anchor_boundary_thresh >= 0:
                # Discard anchors that go out of the boundaries of the image
                # NOTE: This is legacy functionality that is turned off by default in Detectron2
                anchors_inside_image = anchors.inside_box(image_size_i, self.anchor_boundary_thresh)
                gt_labels_i[~anchors_inside_image] = -1

            # A vector of labels (-1, 0, 1) for each anchor
            gt_labels_i = self._subsample_labels(gt_labels_i)

            if len(gt_boxes_i) == 0:
                # These values won't be used anyway since the anchor is labeled as background
                matched_gt_boxes_i = torch.zeros_like(anchors.tensor)
            else:
                matched_gt_boxes_i = gt_boxes_i[matched_idxs].tensor

            gt_labels.append(gt_labels_i)  # N,AHW
            matched_gt_boxes.append(matched_gt_boxes_i)
        return gt_labels, matched_gt_boxes

    @torch.jit.unused
    def losses(
        self,
        anchors: List[Boxes],
        pred_objectness_logits: List[torch.Tensor],
        gt_labels: List[torch.Tensor],
        pred_anchor_deltas: List[torch.Tensor],
        gt_boxes: List[torch.Tensor],
    ) -> Dict[str, torch.Tensor]:
        """
        Return the losses from a set of RPN predictions and their associated ground-truth.

        Args:
            anchors (list[Boxes or RotatedBoxes]): anchors for each feature map, each
                has shape (Hi*Wi*A, B), where B is box dimension (4 or 5).
            pred_objectness_logits (list[Tensor]): A list of L elements.
                Element i is a tensor of shape (N, Hi*Wi*A) representing
                the predicted objectness logits for all anchors.
            gt_labels (list[Tensor]): Output of :meth:`label_and_sample_anchors`.
            pred_anchor_deltas (list[Tensor]): A list of L elements. Element i is a tensor of shape
                (N, Hi*Wi*A, 4 or 5) representing the predicted "deltas" used to transform anchors
                to proposals.
            gt_boxes (list[Tensor]): Output of :meth:`label_and_sample_anchors`.

        Returns:
            dict[loss name -> loss value]: A dict mapping from loss name to loss value.
                Loss names are: `loss_rpn_cls` for objectness classification and
                `loss_rpn_loc` for proposal localization.
        """
        num_images = len(gt_labels)
        gt_labels = torch.stack(gt_labels)  # (N, sum(Hi*Wi*Ai))

        # Log the number of positive/negative anchors per-image that's used in training
        pos_mask = gt_labels == 1
        num_pos_anchors = pos_mask.sum().item()
        num_neg_anchors = (gt_labels == 0).sum().item()
        storage = get_event_storage()
        storage.put_scalar("rpn/num_pos_anchors", num_pos_anchors / num_images)
        storage.put_scalar("rpn/num_neg_anchors", num_neg_anchors / num_images)

        localization_loss = _dense_box_regression_loss(
            anchors,
            self.box2box_transform,
            pred_anchor_deltas,
            gt_boxes,
            pos_mask,
            box_reg_loss_type=self.box_reg_loss_type,
            smooth_l1_beta=self.smooth_l1_beta,
        )

        valid_mask = gt_labels >= 0
        objectness_loss = F.binary_cross_entropy_with_logits(
            cat(pred_objectness_logits, dim=1)[valid_mask],
            gt_labels[valid_mask].to(torch.float32),
            reduction="sum",
        )
        normalizer = self.batch_size_per_image * num_images
        losses = {
            "loss_rpn_cls": objectness_loss / normalizer,
            "loss_rpn_loc": localization_loss / normalizer,
        }
        losses = {k: v * self.loss_weight.get(k, 1.0) for k, v in losses.items()}
        return losses
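
    # Editor's note (illustrative, not in the original source): both loss terms above are
    # reduced with a sum and divided by the same normalizer, batch_size_per_image *
    # num_images (e.g. 256 * 2 = 512 for a two-image batch), so the RPN loss scale does
    # not depend on image resolution or on how many anchors each feature level contributes.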

    def forward(
        self,
        images: ImageList,
        features: Dict[str, torch.Tensor],
        gt_instances: Optional[List[Instances]] = None,
    ):
        """
        Args:
            images (ImageList): input images of length `N`
            features (dict[str, Tensor]): input data as a mapping from feature
                map name to tensor. Axis 0 represents the number of images `N` in
                the input data; axes 1-3 are channels, height, and width, which may
                vary between feature maps (e.g., if a feature pyramid is used).
            gt_instances (list[Instances], optional): a length `N` list of `Instances`s.
                Each `Instances` stores ground-truth instances for the corresponding image.

        Returns:
            proposals: list[Instances]: contains fields "proposal_boxes", "objectness_logits"
            loss: dict[Tensor] or None
        """
        features = [features[f] for f in self.in_features]
        anchors = self.anchor_generator(features)

        pred_objectness_logits, pred_anchor_deltas = self.rpn_head(features)
        # Transpose the Hi*Wi*A dimension to the middle:
        pred_objectness_logits = [
            # (N, A, Hi, Wi) -> (N, Hi, Wi, A) -> (N, Hi*Wi*A)
            score.permute(0, 2, 3, 1).flatten(1)
            for score in pred_objectness_logits
        ]
        pred_anchor_deltas = [
            # (N, A*B, Hi, Wi) -> (N, A, B, Hi, Wi) -> (N, Hi, Wi, A, B) -> (N, Hi*Wi*A, B)
            x.view(x.shape[0], -1, self.anchor_generator.box_dim, x.shape[-2], x.shape[-1])
            .permute(0, 3, 4, 1, 2)
            .flatten(1, -2)
            for x in pred_anchor_deltas
        ]

        if self.training:
            assert gt_instances is not None, "RPN requires gt_instances in training!"
            gt_labels, gt_boxes = self.label_and_sample_anchors(anchors, gt_instances)
            losses = self.losses(
                anchors, pred_objectness_logits, gt_labels, pred_anchor_deltas, gt_boxes
            )
        else:
            losses = {}
        proposals = self.predict_proposals(
            anchors, pred_objectness_logits, pred_anchor_deltas, images.image_sizes
        )
        return proposals, losses

    def predict_proposals(
        self,
        anchors: List[Boxes],
        pred_objectness_logits: List[torch.Tensor],
        pred_anchor_deltas: List[torch.Tensor],
        image_sizes: List[Tuple[int, int]],
    ):
        """
        Decode all the predicted box regression deltas to proposals. Find the top proposals
        by applying NMS and removing boxes that are too small.

        Returns:
            proposals (list[Instances]): list of N Instances. The i-th Instances
                stores post_nms_topk object proposals for image i, sorted by their
                objectness score in descending order.
        """
        # The proposals are treated as fixed for joint training with roi heads.
        # This approach ignores the derivative w.r.t. the proposal boxes' coordinates that
        # are also network responses.
        with torch.no_grad():
            pred_proposals = self._decode_proposals(anchors, pred_anchor_deltas)
            return find_top_rpn_proposals(
                pred_proposals,
                pred_objectness_logits,
                image_sizes,
                self.nms_thresh,
                self.pre_nms_topk[self.training],
                self.post_nms_topk[self.training],
                self.min_box_size,
                self.training,
            )

    def _decode_proposals(self, anchors: List[Boxes], pred_anchor_deltas: List[torch.Tensor]):
        """
        Transform anchors into proposals by applying the predicted anchor deltas.

        Returns:
            proposals (list[Tensor]): A list of L tensors. Tensor i has shape
                (N, Hi*Wi*A, B)
        """
        N = pred_anchor_deltas[0].shape[0]
        proposals = []
        # For each feature map
        for anchors_i, pred_anchor_deltas_i in zip(anchors, pred_anchor_deltas):
            B = anchors_i.tensor.size(1)
            pred_anchor_deltas_i = pred_anchor_deltas_i.reshape(-1, B)
            # Expand anchors to shape (N*Hi*Wi*A, B)
            anchors_i = anchors_i.tensor.unsqueeze(0).expand(N, -1, -1).reshape(-1, B)
            proposals_i = self.box2box_transform.apply_deltas(pred_anchor_deltas_i, anchors_i)
            # Append feature map proposals with shape (N, Hi*Wi*A, B)
            proposals.append(proposals_i.view(N, -1, B))
        return proposals
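

# --- Editor-added usage sketch ---------------------------------------------------
# The function below is not part of the original detectron2 file. It is a minimal,
# hedged illustration of driving this RPN end to end at inference time when it is
# constructed directly instead of through cfg / PROPOSAL_GENERATOR_REGISTRY. All
# channel counts, anchor sizes and thresholds are made-up values, and the helper
# name `_demo_rpn_inference` is an editor invention.
def _demo_rpn_inference():  # pragma: no cover - illustrative only
    from detectron2.modeling.anchor_generator import DefaultAnchorGenerator

    anchor_gen = DefaultAnchorGenerator(
        sizes=[[32], [64]], aspect_ratios=[[1.0]], strides=[16, 32]
    )
    rpn = RPN(
        in_features=["p4", "p5"],
        head=StandardRPNHead(in_channels=16, num_anchors=1),
        anchor_generator=anchor_gen,
        anchor_matcher=Matcher([0.3, 0.7], [0, -1, 1], allow_low_quality_matches=True),
        box2box_transform=Box2BoxTransform(weights=(1.0, 1.0, 1.0, 1.0)),
        batch_size_per_image=256,
        positive_fraction=0.5,
        pre_nms_topk=(2000, 1000),
        post_nms_topk=(1000, 500),
        nms_thresh=0.7,
    ).eval()

    # One 64x64 image and two FPN-style feature maps at strides 16 and 32.
    images = ImageList(torch.rand(1, 3, 64, 64), [(64, 64)])
    features = {"p4": torch.rand(1, 16, 4, 4), "p5": torch.rand(1, 16, 2, 2)}
    with torch.no_grad():
        proposals, losses = rpn(images, features)
    # In eval mode `losses` is empty; each element of `proposals` is an Instances
    # holding "proposal_boxes" and "objectness_logits" sorted by score.
    return proposals, losses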