We tried to implement multi-task learning (MTL) on a Wav2Vec2.0 model with the xls-r-300m checkpoint to perform both classification and isolated word recognition. The classification task works well and reaches good accuracy. In transcription, however, every output is identical: all predictions correspond to the pad token id.
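For context on the symptom: in Hugging Face's `Wav2Vec2ForCTC`, the tokenizer's pad token doubles as the CTC blank (the CTC loss is computed with `blank=config.pad_token_id`). Greedy decoding merges repeated frame predictions and then drops blanks, so a model that predicts the pad/blank id on every frame produces empty or identical transcriptions. CTC models are commonly reported to pass through such an all-blank phase early in training. The sketch below is plain Python with a hypothetical three-token vocabulary. It assumes the MTL model's CTC head follows the same pad-as-blank convention, which the question does not state.

```python
from itertools import groupby
from typing import Dict, Sequence


def greedy_ctc_decode(frame_ids: Sequence[int], pad_id: int,
                      id_to_token: Dict[int, str]) -> str:
    """Greedy (best-path) CTC decoding of per-frame argmax ids.

    1. Merge consecutive repeats of the same id.
    2. Drop the blank id (here the pad id, as in Wav2Vec2ForCTC).
    """
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    return "".join(id_to_token[i] for i in collapsed if i != pad_id)


# Hypothetical toy vocabulary: id 0 is the pad/blank token.
vocab = {0: "<pad>", 1: "h", 2: "i"}

healthy = [0, 1, 1, 0, 2, 2, 0]   # frames spell "hi" with blanks in between
all_blank = [0] * 7               # every frame predicts the pad/blank id

print(repr(greedy_ctc_decode(healthy, pad_id=0, id_to_token=vocab)))    # 'hi'
print(repr(greedy_ctc_decode(all_blank, pad_id=0, id_to_token=vocab)))  # ''
```

If the raw argmax ids on the validation set are all equal to the pad id, the transcription head has collapsed to blanks rather than failing in the decoding step.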

We used the following hyperparameters: learning rate 1e-05, Connectionist Temporal Classification (CTC) weight 1, classification (CLS) weight 0.1, batch size 4, gradient checkpointing True, mask time length 4, attention dropout 0.094, feature projection dropout 0.0, hidden dropout 0.05, layer drop 0.045, mask time prob 0.05.
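For reference, these settings can be written as keyword arguments. This is a minimal sketch that assumes a Hugging Face `transformers` setup (the question does not name the framework) and assumes the two task losses are combined as a weighted sum using the stated CTC and CLS weights. The keyword names are real `Wav2Vec2Config` and `TrainingArguments` fields. The names `MODEL_KWARGS`, `TRAINING_KWARGS` and `multitask_loss` are hypothetical.

```python
# Hypothetical grouping of the question's hyperparameters.
# Keys match Wav2Vec2Config fields (model) and TrainingArguments fields (trainer).
MODEL_KWARGS = {
    "attention_dropout": 0.094,
    "feat_proj_dropout": 0.0,
    "hidden_dropout": 0.05,
    "layerdrop": 0.045,
    "mask_time_prob": 0.05,
    "mask_time_length": 4,
}

TRAINING_KWARGS = {
    "learning_rate": 1e-5,
    "per_device_train_batch_size": 4,
    "gradient_checkpointing": True,
}

# Task weights from the question: CTC weight 1, CLS weight 0.1.
CTC_WEIGHT = 1.0
CLS_WEIGHT = 0.1


def multitask_loss(ctc_loss, cls_loss, ctc_weight=CTC_WEIGHT, cls_weight=CLS_WEIGHT):
    """Assumed MTL objective: weighted sum of the CTC and classification losses.

    Uses only * and +, so it works with Python floats or scalar torch tensors.
    """
    return ctc_weight * ctc_loss + cls_weight * cls_loss


# Possible usage (not executed here; requires `transformers`):
# encoder = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-xls-r-300m", **MODEL_KWARGS)
# args = TrainingArguments(output_dir="out", **TRAINING_KWARGS)
```

Keeping the weights as explicit arguments makes it easy to log each task's contribution separately and to compare runs with different CTC/CLS weightings.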



This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900