# Source: D:\so-vits-svc\cluster\kmeans.py
import math
import pdb
from time import time

import numpy as np
import pynvml
import torch
from torch.nn.functional import normalize


def _kpp(data: torch.Tensor, k: int, sample_size: int):
    """Picks k points in the data based on the kmeans++ method.

    Parameters
    ----------
    data : torch.Tensor
        Expect a rank 1 or 2 array. Rank 1 is assumed to describe 1-D
        data, rank 2 multidimensional data, in which case one
        row is one observation.
    k : int
        Number of samples to generate.
    sample_size : int
        Number of rows sampled from `data` to avoid memory overflow
        during calculation.

    Returns
    -------
    init : torch.Tensor
        A 'k' by 'N' tensor containing the initial centroids.

    References
    ----------
    .. [1] D. Arthur and S. Vassilvitskii, "k-means++: the advantages of
       careful seeding", Proceedings of the Eighteenth Annual ACM-SIAM Symposium
       on Discrete Algorithms, 2007.
    .. [2] scipy/cluster/vq.py: _kpp
    """
    ...


class KMeansGPU:
    """
    Kmeans clustering algorithm implemented with PyTorch

    Parameters:
      n_clusters: int,
        Number of clusters
      max_iter: int, default: 100
        Maximum number of iterations
      tol: float, default: 0.0001
        Tolerance
      verbose: int, default: 0
        Verbosity
      mode: {'euclidean', 'cosine'}, default: 'euclidean'
        Type of distance measure
      init_method: {'random', 'point', '++'}
        Type of initialization
      minibatch: {None, int}, default: None
        Batch size of MinibatchKmeans algorithm
        if None perform full KMeans algorithm

    Attributes:
      centroids: torch.Tensor, shape: [n_clusters, n_features]
        cluster centroids
    """

    def __init__(self, n_clusters, max_iter=100, tol=0.0001, verbose=0,
                 mode="euclidean", device=torch.device("cuda:0")):
        ...

    @staticmethod
    def cos_sim(a, b):
        """
        Compute cosine similarity of 2 sets of vectors

        Parameters:
          a: torch.Tensor, shape: [m, n_features]
          b: torch.Tensor, shape: [n, n_features]
        """
        ...

    @staticmethod
    def euc_sim(a, b):
        """
        Compute euclidean similarity of 2 sets of vectors

        Parameters:
          a: torch.Tensor, shape: [m, n_features]
          b: torch.Tensor, shape: [n, n_features]
        """
        ...

    def max_sim(self, a, b):
        """
        Compute maximum similarity (or minimum distance) of each vector
        in a with all of the vectors in b

        Parameters:
          a: torch.Tensor, shape: [m, n_features]
          b: torch.Tensor, shape: [n, n_features]
        """
        ...

    def fit_predict(self, X):
        """
        Combination of fit() and predict() methods.
        This is faster than calling fit() and predict() separately.

        Parameters:
          X: torch.Tensor, shape: [n_samples, n_features]
          centroids: {torch.Tensor, None}, default: None
            if given, centroids will be initialized with given tensor
            if None, centroids will be randomly chosen from X

        Return:
          labels: torch.Tensor, shape: [n_samples]

        mini_=33kk/k*remain
        mini=min(mini_,fea_shape)
        offset=log2(k/1000)*1.5
        kpp_all=min(mini_*10/offset,fea_shape)
        kpp_sample=min(mini_/12/offset,fea_shape)
        """
        ...
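
# The `_kpp` docstring above describes kmeans++ seeding: the first centroid is
# drawn uniformly, and each later one is drawn with probability proportional to
# its squared distance from the nearest centroid chosen so far. The block below
# is a minimal, standard-library-only sketch of that idea. It is not the
# module's PyTorch implementation; the names `sq_dist` and `kmeanspp_seed` and
# the `rng` argument are hypothetical and used only for illustration.

```python
import bisect
import random
from itertools import accumulate


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeanspp_seed(points, k, rng=None):
    """Pick k initial centroids from `points` by D^2-weighted sampling.

    Hypothetical helper: a plain-Python analogue of the seeding idea,
    not a drop-in replacement for the tensor-based `_kpp`.
    """
    rng = rng or random.Random()
    # First centroid: uniform random choice among the observations.
    centroids = [rng.choice(points)]
    # Squared distance from every point to its nearest chosen centroid.
    d2 = [sq_dist(p, centroids[0]) for p in points]
    while len(centroids) < k:
        total = sum(d2)
        if total == 0.0:
            # Every point coincides with a chosen centroid; fall back to uniform.
            centroids.append(rng.choice(points))
            continue
        # Inverse-CDF sampling: cumulative weights plus a binary search,
        # the same role cumsum + searchsorted play on tensors.
        cum = list(accumulate(d2))
        u = rng.random() * total
        idx = min(bisect.bisect_right(cum, u), len(points) - 1)
        centroids.append(points[idx])
        # Update nearest-centroid distances with the newly added centroid.
        d2 = [min(old, sq_dist(p, points[idx])) for old, p in zip(d2, points)]
    return centroids
```

# Because already-chosen points have zero weight, `bisect_right` never lands on
# them, so distinct input points yield distinct seeds. Passing a seeded
# `random.Random` makes the selection reproducible.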