There are two possible general optimisations:
- Replace the function calls with the code they would execute (inlining),
- Pre-calculate intermediate results, so that they are not recomputed on every iteration.
The resulting code may look like this:
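/* z is the offset of the first element of row y in dest */
int z = Width * (Height - 1);
for (y = Height - 1; y >= 0; y--, z -= Width)
{
    for (x = Width - 1; x >= 0; x--)
    {
        /* no function call and no multiplication by Width per element */
        dest[x + z] = src[x * y];
    }
}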
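To make both optimisations visible side by side, the following is a minimal sketch of a possible "before" and "after". The helper functions read_src and write_dest are hypothetical, since the functions being replaced are not shown here; they are written so that inlining them and pre-calculating y * Width gives exactly the loop above. The fixed sizes for Width and Height are also assumptions, chosen only to keep the example small.

```c
/* Assumed dimensions, for illustration only. */
enum { Width = 4, Height = 3 };

/* Hypothetical helper: reads the source element used for position (x, y). */
static int read_src(const int *src, int x, int y)
{
    return src[x * y];
}

/* Hypothetical helper: writes a value to row y, column x of dest. */
static void write_dest(int *dest, int x, int y, int value)
{
    dest[x + y * Width] = value;
}

/* Before: two function calls and one multiplication by Width per element. */
static void fill_with_calls(int *dest, const int *src)
{
    int x, y;
    for (y = Height - 1; y >= 0; y--)
    {
        for (x = Width - 1; x >= 0; x--)
        {
            write_dest(dest, x, y, read_src(src, x, y));
        }
    }
}

/* After: the calls are replaced by their bodies, and the row offset
   y * Width is kept in z and updated once per row. */
static void fill_optimised(int *dest, const int *src)
{
    int x, y;
    int z = Width * (Height - 1);
    for (y = Height - 1; y >= 0; y--, z -= Width)
    {
        for (x = Width - 1; x >= 0; x--)
        {
            dest[x + z] = src[x * y];
        }
    }
}
```

Both functions expect dest to hold Width * Height elements and src to hold at least (Width - 1) * (Height - 1) + 1 elements, and they should leave dest with identical contents. Whether the second form is actually faster depends on the compiler, which may already perform both transformations on its own.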