Voice activity detection based on machine/deep learning

BESSEKHOUAD, Moussa; Hadj Moussa, KELLOU

dc.contributor.author	BESSEKHOUAD, Moussa
dc.contributor.author	Hadj Moussa, KELLOU
dc.date.accessioned	2023-01-16T09:44:12Z
dc.date.available	2023-01-16T09:44:12Z
dc.date.issued	2022
dc.identifier.uri	https://dspace.univ-ghardaia.edu.dz/xmlui/handle/123456789/5318
dc.description.abstract	Voice activity detection (VAD) identifies the speech and non-speech sections of an audio file and is a key component of many speech applications. Our VAD system is based on a deep learning approach and is trained on Arabic-language audio. Because real-world recordings are contaminated by many kinds of noise and other sounds, a VAD system must cope with high noise levels, so this work builds on two models. The first model receives noisy speech and attempts to remove or reduce the noise; it has a Redundant Convolutional Encoder-Decoder (R-CED) structure and is trained to take the spectra of a noisy speech file and generate the spectra of the enhanced speech file. The second model receives the enhanced speech file and classifies the audio into speech and non-speech sections; it has an artificial neural network (ANN) structure, takes the audio information directly as input, and is trained on the Arabic portion of the Common Voice corpus and the QUT-NOISE dataset. The resulting system reaches 90% accuracy at a signal-to-noise ratio (SNR) of 5 dB.	EN_en
dc.publisher	Université Ghardaia	EN_en
dc.subject	Artificial neural networks, deep learning, convolutional encoder-decoder, voice activity detection, voice enhancement, audio processing	EN_en
dc.title	Voice activity detection based on machine/deep learning	EN_en
dc.type	Thesis	EN_en