Deep Learning for Amharic speech recognition – Part 2

Here are some more examples after training on a bigger dataset. Everything else (model size, hyperparameters, and so forth) has been kept the same. WER stands for Word Error Rate. Samples were taken from the training set, not from the validation or test sets.
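For readers who want to see how such a score can be computed, here is a minimal sketch using the textbook definition of WER: the word-level edit distance (substitutions, deletions, insertions) divided by the number of words in Truth. This is an illustration under that assumption, not necessarily how the numbers in this post were produced. The values listed here are whole numbers, so they read like raw edit counts rather than a normalized rate, and the post does not state how they were computed. The function names `edit_distance` and `word_error_rate` are made up for this example.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (lists of words, or strings)."""
    # prev[j] holds the distance between the first i-1 items of ref
    # and the first j items of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion (ref item missing in hyp)
                            curr[j - 1] + 1,      # insertion (extra item in hyp)
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_error_rate(truth, predicted):
    """Word-level edit distance divided by the number of words in truth."""
    ref = truth.split()
    hyp = predicted.split()
    if not ref:
        raise ValueError("truth must contain at least one word")
    return edit_distance(ref, hyp) / len(ref)


# One Truth/Predicted pair taken from the examples in this post.
truth = "አህዛብም ከጥፋት ውሀ በኋላ"
predicted = "አህዛብም ከጥፋት ሀ በኋላ"

print(word_error_rate(truth, predicted))  # 0.25: one wrong word out of four
print(edit_distance(truth, predicted))    # 1: character-level edit count
```

Passing the raw strings to `edit_distance` compares them character by character, while `word_error_rate` splits on whitespace first and normalizes by the length of Truth.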

Truth: የሚ ወጡት ኤምባሲው ውስጥ የሚሰሩ አባላት ና ጥገ ኞቻቸው እንደሆኑ አሜሪካኖቹ አስታውቀ ዋል
Predicted: የሚ ጡት ምባሲ ውስጥ የሚ ሰርአባላት ነው ትገንቻቸው እንዱሙ አሜሊካንቹ አስ ታውዋል
WER: 19
Truth: አስር ሺ ኢትዮጵያዊያ ንም ቤት አልባ ሆኑ
Predicted: አስር ሺ ኢትዮጵያውያ ንም ቤት አልባ ሆኑ
WER: 1
Truth: ያን ያህል ተዳርሰ ን እንወስ ና ለ ን እንጂ ውሳኔ ለ ማግኘት ወደ በላዩ የ ኢትዮጵያ ፉትቦል ፌዴሬሽን አን ሄድ ም
Predicted: የን የህል ተዳርሰንወስና ለን ንጅ ውሳኔለ ማግት ተወደ በላ ይ ኢትዮጵያ ፉት ቦል ፌዴድ ሽን ንደው
WER: 24
Truth: ሰውዬው ሚስቱ ን በጣም እንደሚ ወዳት በ ለ ሆሳ ስ ነገራት
Predicted: ሰውዬው ሚስቱ ን በጣም እንደሚ ወዳት በ ለሆሳስ ነገራት
WER: 2
Truth: በ መከላከያ ሚኒስቴር ሹም ሽር እና እስር እየተካሄደ ነው
Predicted: በመካላኪያሚቴርሹም ሽር ና ስር እየ ተካሄዳ ነው
WER: 11
Truth: ኢህ አ ድግ በ ርእሰ ከተማይቱ አዲስ አባ ሀያ ሰባት ወረዳ ዎች በ እርግጠኝ ነት እንደሚ ያሸንፍ እንደሚ ተማመን ገልፇ ል
Predicted: ያድግ በርሰ ከተማይቶ አዲ በባአያ ሰባቶ ረዳዎች በ ርግተኝነት እንደሚያሸነፍ እንደሚ ተማመንገልፇል
WER: 22
Truth: ለምን ስ እና ንገላታ ቸዋለን የሚ ል አስተያየት ሰንዝ ረዋል
Predicted: ለምንስ ና ንገላታቸዋለን የሚ ላ አ ሰያት ስንዝረዋል
WER: 9
Truth: ግን እሱን ከ ሶስት አመት በፊት ሸ ጬ ዋለሁ
Predicted: ግያ ሱም ከሶስት አመት በፊሸጩነሁ
WER: 11
Truth: አህዛብም ከጥፋት ውሀ በኋላ
Predicted: አህዛብም ከጥፋት ሀ በኋላ
WER: 1
Truth: በዚያች አመትም ስለ ከብቶቻቸው ሁሉ ፋንታ እህልን መገባቸው
Predicted: በዚያች አመትም ስለ ከብቶቻቸው ሁሉ ፈንታ እልን መገባቸው
WER: 2

5 thoughts to “Deep Learning for Amharic speech recognition – Part 2”

  1. I need a little explanation of this, namely the relationship between the four items: audio, truth, predicted, and WER.

  2. Hi Dell,

    Truth: What the speaker said. We want the speech recognizer to come as close to this as possible.

    Predicted: The speech recognizer’s understanding of the speech. When a sound file is given to the machine learning model, it outputs some text. It’s called “Predicted” because machine learning models are commonly used to make predictions, so the output of any ML model is normally referred to as “Predicted”.

    WER: A number expressing the “distance” between Truth and Predicted. For a single pair, it’s possible to evaluate the accuracy of Predicted by simply comparing it with Truth by eye. But we compare thousands of Truth and Predicted pairs, and to do that automatically we need to express accuracy as a number. WER is a commonly used metric for this purpose. When WER=0, Predicted exactly matches Truth. The larger it is, the more Predicted has deviated from Truth, so we strive for smaller values of WER. It can, and should, be calculated across the entire dataset to understand the overall performance of the model. The acceptable level of WER depends on the application area.

  3. Hi TILAYE,
    Please, can you help me understand deep learning techniques for text processing in the Amharic language? More specifically, how to train custom word embeddings and how to adopt an NN architecture.

  4. Hi Mahlet,

    You can use Gensim to build a word embedding model for Amharic:
    https://en.wikipedia.org/wiki/Gensim

    There are other implementations too.

    You can also implement your own. It depends on what your priorities are. Writing it from scratch will give you a deep understanding of the algorithm, but it takes more time.
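As a rough illustration of the Gensim route mentioned in the reply above, the sketch below trains a small Word2Vec model on whitespace-tokenized Amharic text. It assumes Gensim 4.x (older releases name the `vector_size` parameter `size`). The helper names `tokenize` and `train_embeddings`, the hyperparameter values, and the three-sentence corpus (borrowed from the Truth lines in this post) are placeholders chosen for the example. A corpus this small will not produce meaningful vectors, so a real model needs a far larger collection of text.

```python
def tokenize(sentence):
    """Split an already-segmented Amharic sentence on whitespace."""
    return sentence.split()


# Tiny illustrative corpus: three Truth lines from this post.
corpus = [
    "አስር ሺ ኢትዮጵያዊያ ንም ቤት አልባ ሆኑ",
    "አህዛብም ከጥፋት ውሀ በኋላ",
    "በዚያች አመትም ስለ ከብቶቻቸው ሁሉ ፋንታ እህልን መገባቸው",
]
sentences = [tokenize(line) for line in corpus]


def train_embeddings(token_lists, dim=50):
    """Train a skip-gram Word2Vec model on lists of tokens (assumes Gensim 4.x)."""
    # Imported here so the helpers above can be used without Gensim installed.
    from gensim.models import Word2Vec

    return Word2Vec(
        sentences=token_lists,
        vector_size=dim,  # embedding dimension
        window=5,         # context words considered on each side
        min_count=1,      # keep every word; raise this for a real corpus
        sg=1,             # 1 = skip-gram, 0 = CBOW
        workers=1,
        epochs=20,
    )
```

With Gensim installed, `model = train_embeddings(sentences)` returns a trained model, and `model.wv["ቤት"]` is the 50-dimensional vector learned for that word.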
