{
"cells": [
{
"cell_type": "markdown",
"source": [
"# Latvian Speech Recognition\n",
"\n",
"This notebook provides tools for Latvian speech recognition. It uses the [speech recognition model](https://huggingface.co/AiLab-IMCS-UL/whisper-large-v3-lv-late-cv17) created by LU MII AiLab, built using data collected in the [Balsu talka](https://balsutalka.lv/) campaign.\n",
"\n",
"To run speech recognition on an audio file, follow the steps listed below."
],
"metadata": {
"id": "zZBBTnW-aThp"
}
},
{
"cell_type": "markdown",
"source": [
"## 1. Change the runtime type to T4 GPU\n",
"\n",
"To do this, go to `Runtime` -> `Change runtime type` in the main menu at the top of this page and select `T4 GPU`.\n",
"\n",
""
],
"metadata": {
"id": "GALJps6fDlQD"
}
},
{
"cell_type": "code",
"source": [
"#@title 2. Press the play button to load the required tools.\n",
"\n",
"import ipywidgets as widgets\n",
"from IPython.display import clear_output, display\n",
"import torch\n",
"from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline\n",
"\n",
"# Upload widget that accepts a single audio file.\n",
"uploader = widgets.FileUpload(description='Choose audio', accept='audio/*', multiple=False)\n",
"display(uploader)"
],
"metadata": {
"id": "zLDYIFciCMTw",
"cellView": "form"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"## 3. Choose an audio file\n",
"Once the tools have loaded, an audio upload button appears below the cell from step 2. Click it and select the audio file in which to recognize Latvian speech."
],
"metadata": {
"id": "QxsuN4r0QGnl"
}
},
{
"cell_type": "code",
"source": [
"# @title 4. Run the speech recognition process\n",
"\n",
"# uploader.data lists the raw bytes of each uploaded file.\n",
"if len(uploader.data) == 0:\n",
"    display(widgets.HTML(\n",
"        value=\"<h3>No audio file selected. Please choose an audio file!</h3>\"\n",
"    ))\n",
"else:\n",
"\n",
"    display(widgets.HTML(\n",
"        value=\"<h3>Loading...</h3>\"\n",
"    ))\n",
"\n",
"    # Use the GPU with half precision when available, otherwise the CPU.\n",
"    device = \"cuda:0\" if torch.cuda.is_available() else \"cpu\"\n",
"    torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32\n",
"\n",
"    model_id = \"AiLab-IMCS-UL/whisper-large-v3-lv-late-cv17\"\n",
"\n",
"    model = AutoModelForSpeechSeq2Seq.from_pretrained(\n",
"        model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=False, use_safetensors=True\n",
"    ).to(device)\n",
"\n",
"    processor = AutoProcessor.from_pretrained(model_id)\n",
"\n",
"    # Long audio is split into 30-second chunks and transcribed in batches.\n",
"    pipe = pipeline(\n",
"        \"automatic-speech-recognition\",\n",
"        generate_kwargs={\"language\": \"latvian\", \"task\": \"transcribe\"},\n",
"        model=model,\n",
"        tokenizer=processor.tokenizer,\n",
"        feature_extractor=processor.feature_extractor,\n",
"        max_new_tokens=225,\n",
"        chunk_length_s=30,\n",
"        batch_size=16,\n",
"        return_timestamps=False,\n",
"        torch_dtype=torch_dtype,\n",
"        device=device,\n",
"    )\n",
"\n",
"    clear_output()\n",
"\n",
"    display(widgets.HTML(\n",
"        value=\"<h3>Recognizing audio...</h3>\"\n",
"    ))\n",
"\n",
"    # Transcribe the first (and only) uploaded file.\n",
"    result = pipe(uploader.data[0])\n",
"\n",
"    # Write as UTF-8 so Latvian diacritics are saved correctly.\n",
"    with open('transcript.txt', 'w', encoding='utf-8') as f:\n",
"        f.write(result[\"text\"])\n",
"\n",
"    clear_output()\n",
"\n",
"    display(widgets.HTML(\n",
"        value=result[\"text\"]\n",
"    ))"
],
"metadata": {
"cellView": "form",
"id": "_6ovCwwqC6SM"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"## 5. Save the text recognized in the audio file\n",
"After the audio file has been processed, the recognized text is printed below the cell from step 4 and is also saved to the text file `transcript.txt`. To view this file, click the folder icon at the side of the screen.\n",
"\n",
"\n"
],
"metadata": {
"id": "PLqlD2N7STNq"
}
}
],
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
}
},
"nbformat": 4,
"nbformat_minor": 0
}