---
license: mit
datasets:
- ccmusic-database/chest_falsetto
language:
- en
metrics:
- accuracy
pipeline_tag: audio-classification
tags:
- music
- art
---

# Intro
The chest-falsetto voice discrimination model is designed to distinguish chest voice from falsetto in singing audio, across four categories: male chest, male falsetto, female chest, and female falsetto. Training builds on a backbone network from the computer vision (CV) domain: audio is first converted into spectrograms, and the backbone is then fine-tuned to improve its accuracy on the four voice categories. A dataset containing both chest and falsetto samples from male and female singers is used so that the model learns the features relevant to each combination of gender and vocal register. Through this approach, the model can classify gender and chest/falsetto voice at a fine granularity, providing a reliable solution for vocal register discrimination in audio. The model has broad potential applications in fields such as speech processing and music production, offering an efficient and precise tool for audio analysis. Its training and fine-tuning strategy borrowed from computer vision also illustrates the model's adaptability across domains and offers a useful example for further research and application.
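
As a rough illustration of the pipeline described above, the sketch below converts a singing clip into a Mel-spectrogram image of the kind a CV backbone can consume. The sample rate, image size, and label strings are assumptions for illustration, not the exact training configuration.

```python
# Minimal sketch (not the official training code): turn an audio clip into a
# Mel-spectrogram image that a fine-tuned CV backbone could classify.
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

# The four target categories; the exact label strings are assumptions.
CLASSES = ["male_chest", "male_falsetto", "female_chest", "female_falsetto"]

def audio_to_mel_image(wav_path: str, out_png: str = "mel.png", sr: int = 22050) -> str:
    """Render a log-Mel spectrogram as a roughly 224x224 px image."""
    y, sr = librosa.load(wav_path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr)
    mel_db = librosa.power_to_db(mel, ref=np.max)
    plt.figure(figsize=(2.24, 2.24), dpi=100)
    librosa.display.specshow(mel_db, sr=sr)
    plt.axis("off")
    plt.savefig(out_png, bbox_inches="tight", pad_inches=0)
    plt.close()
    return out_png
```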

## Demo (inference code)
<https://huggingface.co/spaces/ccmusic-database/chest_falsetto>

## Usage
```python
from huggingface_hub import snapshot_download
model_dir = snapshot_download("ccmusic-database/chest_falsetto")
```
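
Continuing from the snippet above, here is a hedged sketch of how the downloaded snapshot might be inspected and a saved PyTorch model loaded from it. The weight file name is a placeholder, not a confirmed path in this repository; the demo Space linked above remains the reference inference pipeline.

```python
# Sketch only: inspect the downloaded snapshot and (optionally) load a saved model.
# File names inside the repo are assumptions; list the directory to see what exists.
import os
import torch
from huggingface_hub import snapshot_download

model_dir = snapshot_download("ccmusic-database/chest_falsetto")
print(sorted(os.listdir(model_dir)))  # see which backbone/weight files are present

# Hypothetical loading step, assuming a full serialized PyTorch model is included:
# model = torch.load(os.path.join(model_dir, "save.pt"), map_location="cpu")  # placeholder name
# model.eval()
```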

## Maintenance
```bash
GIT_LFS_SKIP_SMUDGE=1 git clone git@hf.co:ccmusic-database/chest_falsetto
cd chest_falsetto
```

## Results
|     Backbone      |             Mel             |     CQT     |   Chroma    |
| :---------------: | :-------------------------: | :---------: | :---------: |
|     Swin-S V2     |         **_0.968_**         |    0.268    | **_0.268_** |
|     MaxViT-T      |            0.820            | **_0.933_** |    0.250    |
|                   |                             |             |             |
|      AlexNet      | [**_0.994_**](#best-result) | **_0.963_** | **_0.586_** |
| ShuffleNet V2 2.0 |            0.939            |    0.669    |    0.222    |
|     GoogLeNet     |            0.983            |    0.274    |    0.292    |
|    MNASNet-A3     |            0.756            |    0.260    |    0.320    |
|  SqueezeNet 1.1   |            0.963            |    0.900    |    0.378    |
|      Average      |            0.918            |    0.610    |    0.331    |
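
For reference, the three input representations compared in the table (Mel, CQT, Chroma) can be computed, for example, with librosa as sketched below; the parameters are illustrative defaults, not the exact settings used to produce these results.

```python
# Illustrative only: compute the three spectral representations compared above.
import librosa
import numpy as np

y, sr = librosa.load(librosa.ex("trumpet"))  # any mono clip works here

mel = librosa.power_to_db(librosa.feature.melspectrogram(y=y, sr=sr), ref=np.max)
cqt = librosa.amplitude_to_db(np.abs(librosa.cqt(y=y, sr=sr)), ref=np.max)
chroma = librosa.feature.chroma_stft(y=y, sr=sr)

print(mel.shape, cqt.shape, chroma.shape)  # (n_mels, T), (n_bins, T), (12, T)
```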

### Best Result
<style>
  #falsetto td {
    vertical-align: middle !important;
    text-align: center;
  }
  #falsetto th {
    text-align: center;
  }
</style>
<table id="falsetto">
    <tr>
        <th>Loss curve</th>
        <td><img src="https://www.modelscope.cn/models/ccmusic-database/chest_falsetto/resolve/master/alexnet_mel_2024-07-30_11-52-53/loss.jpg"></td>
    </tr>
    <tr>
        <th>Training and validation accuracy</th>
        <td><img src="https://www.modelscope.cn/models/ccmusic-database/chest_falsetto/resolve/master/alexnet_mel_2024-07-30_11-52-53/acc.jpg"></td>
    </tr>
    <tr>
        <th>Confusion matrix</th>
        <td><img src="https://www.modelscope.cn/models/ccmusic-database/chest_falsetto/resolve/master/alexnet_mel_2024-07-30_11-52-53/mat.jpg"></td>
    </tr>
</table>

## Dataset
<https://huggingface.co/datasets/ccmusic-database/chest_falsetto>

## Mirror
<https://www.modelscope.cn/models/ccmusic-database/chest_falsetto>

## Evaluation
<https://github.com/monetjoe/ccmusic_eval>

## Cite
```bibtex
@dataset{zhaorui_liu_2021_5676893,
  author    = {Zhaorui Liu and Zijin Li},
  title     = {Music Data Sharing Platform for Computational Musicology Research (CCMUSIC DATASET)},
  month     = nov,
  year      = 2021,
  publisher = {Zenodo},
  version   = {1.1},
  doi       = {10.5281/zenodo.5676893},
  url       = {https://doi.org/10.5281/zenodo.5676893}
}
```