15,792,771 members
Articles / Programming Languages / C++

# Fast stepwise rotation

Rate me:
23 Apr 2012CPOL5 min read 27K   27   22
We study the problem of computing R cos(a + k b) and R sin(a + k b) for increasing k

## Introduction

Let us study the problem of computing R cos(a + k b) and R sin(a + k b) for an increasing integer k. It arises when doing successive rotations (with scaling), or generating points around a circle. A little challenge is to find efficient ways to compute these numbers, by reducing the amount of calls to the transcendent functions, which are known to be costly even with a floating-point accelerator.

## Background

[1] William H., Saul A. Teukolsky, William T. Vetterling & Brian P. Flannery, Numerical Recipes in C: The Art of Scientific Computing, CAMBRIDGE UNIVERSITY PRESS, Second Edition, pp.178-179.

[2] Goertzel, G. (January 1958), An Algorithm for the Evaluation of Finite Trigonometric Series, American Mathematical Monthly 65 (1): 34–35.

## The Solution

The straightforward approach requires 1 addition, 3 multiplies, 3 assignments, and 2 intrinsic function calls per angle, using a temporary, T. Coded in C, the body of the loop writes:

C++
```T= A + K * B;
X= R * cos(T);
Y= R * sin(T);```

Computing the angle by successive additions of `b` to `a` is not recommended because of the accumulation of truncation errors.

## The Rotation approach

It is well known from algebra that trigonometric functions enjoy many properties, including addition formulas, which we plan to exploit to derive incremental procedures:

cos(a + b) = cos(a) cos(b) – sin(a) sin(b), and

sin(a + b) = sin(a) cos(b) + cos(a) sin(b)

For the sake of conciseness, we will use complex number representation with the notation cis(a) that represents the number cos(a) + i sin(a), also known to be eia from Euler’s formula. The addition formulas just express the well-known rule of exponent addition:

cis(a + b) = ei(a+b) = eia eib = cis(a) cis(b)

Simple, right? (You can check that by developing the complex product.)

Back to our problem, we know that we have to compute the complex numbers:

Pk = R cis(a + k b)

For which we can easily derive this recurrence:

Pk+1 = R cis(a + (k+1) b) = R cis((a + k b) + b) = R cis(a + k b) cis(b) = Pk cis(b), with

P0 = R cis(a)

Programmatically, it translates to the code below, taking care not to overwrite the variables too early. This costs 2 adds, 4 muls, and 3 assigns, and we got rid of the function calls:

C++
```T= Const0 * X - Const1 * Y;
Y= Const0 * Y + Const1 * X;
X= T;```

with the one-time initialization:

C++
```X= R * cos(A);
Y= R * sin(A);
Const0= cos(B);
Const1= sin(B);```

## The Chord approach

The above recurrence has a little flaw in that truncation errors will accumulate over time. The biggest cause of truncation is caused by the fact that for small b, cos(b) is close to 1 (by McLaurin, cos(b)=1-b2/2+b4/24…) and many significant digits of b get lost; sin(b) also loses significant digits, but the corresponding absolute error is an order of magnitude smaller (sin(b)=b (1-b2/6+b4/120…)).

We can fix that by working with the difference between successive P values instead of the P values themselves [1]. Let us introduce the chords, given by:

Qk = Pk+1Pk

We have:

Qk = Pk cis(b) - Pk = (cis(b) - 1) Pk, and

Pk+1 = Pk + Qk

The magic in this formula lies in the factor (cis(b)-1) = cos(b)-1 + i sin(b), because cos(b)-1 can be expressed as -2 sin2(b/2) by the double-angle formula, now two orders of magnitude smaller. This increases the cost to 4 adds, 4 muls, 3 assigns:

C++
```T= X + (Const2 * X - Const1 * Y);
Y= Y + (Const2 * Y + Const1 * X);
X= T;```

with

```Const1= sin(B);
Const2= - 2 * sin(0.5 * B) * sin(0.5 * B);
X= R * cos(A);
Y= R * sin(A);```

By the way, history has given 1-cos(b) a name: the versed sine.

## The Goertzel approach

Goertzel[2] found a clever way to reduce the operation count involved, by developing a second order recurrence. Now, instead of relating Pk+1 to just Pk, we relate it to Pk-1 as well. From geometric considerations, we can observe that:

(Pk+1 + Pk-1) / 2 = cos(b) Pk, or

Pk+1 = 2 cos(b) Pk - Pk-1

What is gained here is that we no more multiply Pk by a complex number but by a real one. Hence, 2 adds, 2 muls, 6 assigns, using the two extra variables `XX` and `YY` to keep track of Pk-1:

C++
```T= Const3 * X - XX; XX= X; X= T;
T= Const3 * Y - YY; YY= Y; Y= T;```

with corresponding initializations:

C++
```Const1= sin(B);
Const2= - 2 * sin(0.5 * B) * sin(0.5 * B);
Const3= 2 * cos(B);
X= R * cos(A);
Y= R * sin(A);
XX= X + (Const2 * X + Const1 * Y);
YY= Y + (Const2 * Y - Const1 * X);```

## The Chord-Goertzel approach

As is turns out, unfortunately, Goertzel’s method suffers from numerical instability. And it does it in a stronger way because of large value cancellation in the subtraction of its two terms. Luckily, we can fix that by using the chord trick once more. We derive:

Qk = 2 cos(b) Pk - Pk-1- Pk = 2 (cos(b)-1), and

Pk - Pk-1+ Pk = 2 (cos(b)-1) Pk + Qk-1

Now we have recovered the accurate cos(b)-1 factor and removed the cancellation. This results in 4 adds, 2 muls, and 4 assigns, introducing the components of Qk-1, DX and DY:

C++
```DX= Const4 * X + DX; X+= DX;
DY= Const4 * Y + DY; Y+= DY;```

and the initializations:

C++
```Const1= sin(B);
Const2= - 2 * sin(0.5 * B) * sin(0.5 * B);
Const4= 2 * Const2;
X= R * cos(A);
Y= R * sin(A);
DX= - (Const2 * X + Const1 * Y);
DY= - (Const2 * Y - Const1 * X);```

## Speed comparison

We ran the above five implementations for one million iterations and timed them. Except for the straight version, running times are independent of the argument values. The fastest one, Goertzel, only takes 5 ns per iteration on a 2.2 GHz machine, which corresponds to 11 cycles only (!)

## Accuracy comparison

We used single precision floating-point variables to increase the accuracy degradation. The values in the table correspond to 1000 iterations starting from a=2 with step b=0.001. This is a typical behavior.

## Conclusions

Grouping our results we have:

 + x = cos/sin Cycles Error Straight 1 3 3 2 167 0.000000 Rotation 2 4 3 - 12 0.000024 Chord 4 4 3 - 15 0.000000 Goertzel 2 2 6 - 11 0.021021 Chord-Goertzel 4 2 4 - 13 0.000001

## Points of Interest

1. Do use an incremental algorithm, it is at least ten times faster than the trivial implementation.
2. Among the incremental algorithms, the running speeds do not vary so much.
3. Chord and Chord-Goertzel are both stable and fast.
4. On a PC, single and double precision versions run at the same speed and are blazingly fast!

## History

This is the second released version after formatting adjustments and use of the cis notation.

Written By
CEO VISION fOr VISION
Belgium
I fell into applied algorithmics at the age of 16 or so. This eventually brought me to develop machine vision software as a professional. This is Dreamland for algorithm lovers.

 First Prev Next
 excellent work! Southmountain19-Oct-19 18:46 Southmountain 19-Oct-19 18:46
 My vote of 5 Anshul Mehra18-Jul-13 17:42 Anshul Mehra 18-Jul-13 17:42
 My vote of 5 frije038-May-12 5:45 frije03 8-May-12 5:45
 Something seems to be missing? frije036-May-12 11:09 frije03 6-May-12 11:09
 Re: Something seems to be missing? YvesDaoust6-May-12 21:44 YvesDaoust 6-May-12 21:44
 Re: Something seems to be missing? frije037-May-12 9:16 frije03 7-May-12 9:16
 Re: Something seems to be missing? YvesDaoust7-May-12 11:25 YvesDaoust 7-May-12 11:25
 Could you please remove these "<" perilbrain22-Apr-12 3:55 perilbrain 22-Apr-12 3:55
 Re: Could you please remove these "<" YvesDaoust22-Apr-12 7:47 YvesDaoust 22-Apr-12 7:47
 Re: Could you please remove these "<" perilbrain22-Apr-12 20:58 perilbrain 22-Apr-12 20:58
 Re: Could you please remove these "<" YvesDaoust22-Apr-12 22:41 YvesDaoust 22-Apr-12 22:41
 lost art BigTimber@home21-Apr-12 15:47 BigTimber@home 21-Apr-12 15:47
 nonetheless JohnWallis4221-Apr-12 19:09 JohnWallis42 21-Apr-12 19:09
 Re: nonetheless YvesDaoust21-Apr-12 23:15 YvesDaoust 21-Apr-12 23:15
 Re: nonetheless BigTimber@home22-Apr-12 3:41 BigTimber@home 22-Apr-12 3:41
 You got that right. And logs too. I did a log implementation based on Chebyshev approximation in assembly language that was blazingly fast. Lack of FPU, and even before that, lack of multiply/divide instructions. We still did everything we needed with integer arithmetic. That was *real* programming.
 Re: nonetheless YvesDaoust22-Apr-12 23:26 YvesDaoust 22-Apr-12 23:26
 Re: lost art YvesDaoust21-Apr-12 23:02 YvesDaoust 21-Apr-12 23:02
 Re: lost art JohnWallis4222-Apr-12 3:22 JohnWallis42 22-Apr-12 3:22
 Re: lost art YvesDaoust22-Apr-12 21:52 YvesDaoust 22-Apr-12 21:52
 Re: lost art JohnWallis4223-Apr-12 2:13 JohnWallis42 23-Apr-12 2:13
 Re: lost art YvesDaoust23-Apr-12 3:37 YvesDaoust 23-Apr-12 3:37
 Re: lost art JohnWallis4223-Apr-12 5:07 JohnWallis42 23-Apr-12 5:07
 Last Visit: 31-Dec-99 19:00     Last Update: 2-Dec-23 2:03 Refresh 1