The problem seems to be that you don't have a clear idea what it means to "merge" two audio streams. What you probably mean is to mix both audio signals the way an audio mixer would. For that to be possible, the samples of both streams must have been taken at the same points in time, i.e. the sampling clocks must have been synchronized; in practice this means at least that both streams use the same sampling rate. (If that is not the case, you must first do a sampling rate conversion of one of the streams, which is another story and not necessarily something for a beginner.)
The "mixing" of two synchronized audio signals is actually very simple. If a(n) and b(n) are the n-th samples of the streams a and b, the output stream c is calculated by:
c(n) = f * a(n) + g * b(n);
The two factors f and g are in the range [0, 1.0] and determine the volume with which each of the streams is mixed in. Normally f + g should not exceed 1.0; otherwise the sum can exceed the value range of your sample format (for example -32768 to 32767 for 16-bit PCM) and overflow, which sounds really nasty in the result.
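As a minimal sketch of the formula above, here it is in Python, assuming both streams are already available as sequences of signed 16-bit integer samples at the same sampling rate. The function name mix_samples and the clamping to the 16-bit range are my own additions for illustration; the clamp only guards against overflow when f + g is larger than 1.0.

```python
INT16_MIN, INT16_MAX = -32768, 32767  # assumed sample format: signed 16-bit PCM


def mix_samples(a, b, f=0.5, g=0.5):
    """Mix two sample sequences: c(n) = f * a(n) + g * b(n).

    Hypothetical helper: stops at the end of the shorter stream and
    clamps each result to the 16-bit range instead of letting it wrap.
    """
    n = min(len(a), len(b))
    c = []
    for i in range(n):
        value = int(round(f * a[i] + g * b[i]))
        c.append(max(INT16_MIN, min(INT16_MAX, value)))
    return c


print(mix_samples([1000, -2000], [3000, 4000]))
```

With f = g = 0.5 the sum can never leave the 16-bit range, so the clamp only matters if you choose larger factors.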
So the steps you should take could be the following:
(1) Understand the PCM format and how you can access each sample of a stream
(2) Create the output stream and allocate the required buffer space
(3) Program a simple loop over both streams with the above mixing formula
(4) Write the generated stream to a file.
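The four steps could look roughly like the following sketch. It is written in Python with the standard wave and array modules and assumes uncompressed 16-bit PCM WAV files with the same sampling rate and channel count; your language and file format may differ. The function names read_pcm16 and mix_wav_files are hypothetical.

```python
import array
import sys
import wave


def read_pcm16(path):
    """Step 1: read a 16-bit PCM WAV file into an array of signed samples."""
    with wave.open(path, "rb") as wav:
        params = wav.getparams()
        if params.sampwidth != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        samples = array.array("h")
        samples.frombytes(wav.readframes(params.nframes))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return params, samples


def mix_wav_files(path_a, path_b, path_out, f=0.5, g=0.5):
    params_a, a = read_pcm16(path_a)
    params_b, b = read_pcm16(path_b)
    # Mixing sample by sample only makes sense for matching formats.
    if (params_a.framerate, params_a.nchannels) != (
        params_b.framerate,
        params_b.nchannels,
    ):
        raise ValueError("sampling rate or channel count differ")

    # Step 2: the output is as long as the shorter input.
    n = min(len(a), len(b))

    # Step 3: apply c(n) = f * a(n) + g * b(n), clamped to 16 bits.
    out = array.array(
        "h",
        (
            max(-32768, min(32767, int(round(f * a[i] + g * b[i]))))
            for i in range(n)
        ),
    )

    # Step 4: write the mixed stream to a new WAV file.
    if sys.byteorder == "big":
        out.byteswap()
    with wave.open(path_out, "wb") as wav:
        wav.setnchannels(params_a.nchannels)
        wav.setsampwidth(2)
        wav.setframerate(params_a.framerate)
        wav.writeframes(out.tobytes())
```

A call would then look like mix_wav_files("a.wav", "b.wav", "mixed.wav"), where the file names are placeholders. For stereo files the samples of the channels are interleaved, so the same loop mixes left with left and right with right as long as both inputs have the same channel count.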
If you have problems with any of those steps, you might want to post a new question, in which you can then refer more precisely to the step that is giving you difficulty.