Voice activity detection using deep learning (original title: Détection d’activité vocale utilisant l’apprentissage profond)

Khene, Meroua; Souid, Sabrina

dc.contributor.author	Khene, Meroua
dc.contributor.author	Souid, Sabrina
dc.date.accessioned	2021-12-07T20:07:22Z
dc.date.available	2021-12-07T20:07:22Z
dc.date.issued	2019
dc.identifier.uri	https://dspace.univ-ghardaia.edu.dz/xmlui/handle/123456789/480
dc.description.abstract	Voice activity detection (VAD) is considered one of the main techniques for many voice applications. It is an important method in speech processing because it detects the presence or absence of a human voice. Previously, VAD relied on signal-processing methods, which did not perform satisfactorily in high-noise environments, so deep learning became an alternative. We therefore adopted three deep learning architectures in the experimental study: long short-term memory (LSTM), gated recurrent unit (GRU), and a DenseNet network. We used two datasets, LibriSpeech for speech and QUT-NOISE for noise. We measured the accuracy of WebRTC in low-noise environments at various sensitivity settings and obtained an accuracy of 98%.	EN_en
dc.publisher	University of Ghardaia	EN_en
dc.subject	Neural networks, deep learning, recurrent neural networks, convolutional neural networks, voice activity detection, signal-to-noise ratio	EN_en
dc.title	Voice activity detection using deep learning (original title: Détection d’activité vocale utilisant l’apprentissage profond)	EN_en
dc.type	Thesis	EN_en