We tried to implement multi-task learning (MTL) on the Wav2Vec2.0 model with the xls-r-300m checkpoint to perform both classification and isolated word recognition. The classification task trains well and reaches good accuracy. In transcription, however, every predicted transcription is identical: the model outputs only the pad token id. We used the following hyperparameters: learning rate 1e-05, Connectionist Temporal Classification (CTC) loss weight 1, classification (CLS) loss weight 0.1, batch size 4, gradient checkpointing True, mask time length 4, attention dropout 0.094, feature projection dropout 0.0, hidden dropout 0.05, layer drop 0.045, mask time prob 0.05.
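To make the symptom easier to reason about, here is a minimal pure-Python sketch of the two pieces described above: the weighted sum of the CTC and classification losses, and a greedy CTC decode. It is not our training code; the function names (`combined_loss`, `greedy_ctc_decode`, `blank_fraction`) are hypothetical, and it assumes the pad token id doubles as the CTC blank, which is the default in the Hugging Face Wav2Vec2 CTC head. Under that assumption, frames that are all predicted as pad decode to an empty transcription.

```python
# Loss weights taken from the hyperparameters listed above.
CTC_WEIGHT = 1.0
CLS_WEIGHT = 0.1


def combined_loss(ctc_loss, cls_loss, ctc_weight=CTC_WEIGHT, cls_weight=CLS_WEIGHT):
    """Weighted sum of the two task losses (hypothetical helper)."""
    return ctc_weight * ctc_loss + cls_weight * cls_loss


def greedy_ctc_decode(frame_ids, blank_id):
    """Collapse repeated frame predictions, then drop blanks.

    frame_ids: per-frame argmax token ids.
    blank_id:  id used as the CTC blank (assumed here to be the pad token id).
    """
    decoded = []
    previous = None
    for token_id in frame_ids:
        # Keep a token only when it starts a new run and is not the blank.
        if token_id != previous and token_id != blank_id:
            decoded.append(token_id)
        previous = token_id
    return decoded


def blank_fraction(frame_ids, blank_id):
    """Share of frames predicted as blank; 1.0 means full collapse to blank."""
    if not frame_ids:
        return 0.0
    return sum(1 for t in frame_ids if t == blank_id) / len(frame_ids)


if __name__ == "__main__":
    pad_id = 0  # illustrative value, not our real vocabulary id
    collapsed = [pad_id] * 8
    healthy = [0, 5, 5, 0, 5, 7, 7, 0]
    print(greedy_ctc_decode(collapsed, pad_id), blank_fraction(collapsed, pad_id))
    print(greedy_ctc_decode(healthy, pad_id), blank_fraction(healthy, pad_id))
```

Logging the blank fraction of the argmax predictions during training may help tell a model that has not yet moved past the all-blank phase from one where the transcription head is not learning at all.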
This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)