Introduction

I'm sure that most readers and professional developers have at some point or another used the simple formulas of Calculus since just about any realistic physics engine requires calculus to function properly at some level. But do you know its complicated story, apart from the notion that it was developed by Newton and Leibniz? I will show that the invention took almost 100 years of sharing different developments in mathematics and that the perhaps most useful, versatile, and important tool in all of mathematics was needed before one could even begin contemplating Calculus. I am aware that Archimedes used his method of exhaustion over 2000 years ago, which was published in his book "The Method". But that text had little to no impact on the development of Calculus in the modern era, as the actual text was not widely available until the 19th century. (Ok. I just realized that I'm calling the modern era from about 1600 to this day modern era). Central to this article is also the idea to show that the development of the numerical methods is actually the way these properties were found in the first place so let's start the journey.

The Mathematical Hunt for Nothing

The development starts in the region of southern France and northern parts of today's Italy. However due to the lack of public journals, things often happened parallel to each other, often in trade cities, seemingly because people started to ask the right mathematical questions, and sharing or boasting that you could solve this and that problem. With these questions and answers came further development; new tools to simplify the old problems and that helped solve new more difficult problems.

In Italy, the contest of trying to solve third and fourth-degree polynomial equations lead to several important additional findings like the emergence of complex numbers from its solutions. And here the need for an efficient tool for writing and solving these equations leads to the early development of algebra which I will rate as perhaps the most important development in all of mathematics throught it history. And the most significant development happened between 1550 and 1650 when the algebraic language begins to take a modern shape with the signs of the modern mathematics +, -, *, / and = implemented. One of the major characters in this development is the French mathematician (or I should really say lawyer as that was his profession and in France the leading mathematicians seems to be lawyers in this period.) Viete 1540 - 1603 (sometimes spelled Vieta or Vietæ. All the same man) produced one of the landmark books about algebra of his period. He also had an earlier more crude version of the Newton-Raphson root-finding method (which Newton was aware of before discovering his method) and studied roots of cycotomic equations extensively. He also worked as a code breaker and was accoused of witch craft from his enemies for cracking their code.

So what makes algebra so important? Modern algebra tools have made it possible to extend solutions and connect fields of mathematics and organize the results with the simplification of proofs. And at least, it was a necessary tool for the development of function theory, In short, it is the most essential mathematical tool. I recent times for developing programming languages were algebra is needed and indeed many of the people designing programming languages were originally educated in algebra. In short: Without algebra, this world would be different.

The journey originated with geometry and the search for areas under certain functions. It quickley followed that what they all needed was a almost zero number that you can divide and multiply with. A number or a function that approximates a very small number, written in modern form: \(\lim\limits_{\epsilon \to 0} \frac{f(x)}{\epsilon}\). These ideas in various forms on this subject pop up all over the place in the northwestern part of the Mediterranean in the 16th and 17th centuries.

Integration formula

The integration formula on the form

$\begin{equation} \int{x^n \, dx} = \frac{x^{n+1}}{n+1} \end{equation}$

was first developed by Cavalieri in Italy. However it was not a method that gave the formulas directly, it was a piece of geometrical evidence that he gave for the formulas from \(n = 1\) to \(9\). And from this, he postulated that this would be true for all powers.

The general proof first came with Fermat that ran with a more geometric approach coupled with algebra and got general formulas for all \(n\). The formula Fermat found was very general and holds even for fractional power.

But it is a bit complicated to show, so here I will instead start off by showing you a metod that starts off with an absolute ingenious way of developing the summation of integer powers that will eventually yield the integration formula. It was developed by a man who was in correspondence with Fermat nearly all his life, Pascal, and this correspondence reveals many new developments in mathematics originating with these two men. We start off with Pascal's brilliant treatise (book) the Arithmetical triangle and its brilliant development of the summation formula. You will be surprised to see that it is easy to follow and that it is very well written. And compared to the normal textbooks of modern tims this will instead show you lots of example on how to calculate things. Back to the question at hand; Pascal starts of wanting to find the sum:

$\begin{equation} \sum_{x=0}^{x=k}{x^n} = S_n(k) \end{equation}$

He uses that binomial theorem which im sure all have seen before in some form or another:

$\begin{equation} (a+b)^{n+1} = {n + 1 \choose 0} a^{n+1} b^0 + {n +1 \choose 1}a^{n-1} b + \cdots + {n +1 \choose n+1} a^0 b^{n+1} \end{equation}$

Now Pascal lets \(b = -1\) and lets a go from \(a = 1\) to \(a = m\). Admittedly Pascal does not follow this exactly so to simplify we just use a simple shortcut instead. Written out:

$\begin{equation} (a+b)^{n} - {n \choose 0} a^n b^0 = {n \choose 1}a^{n-1} b + \cdots + {n \choose n} a^0 b^{n} \end{equation}$

$\begin{equation} a^n b^0 - (a+b)^{n}= -{n \choose 1}a^{n-1} b - \cdots - {n \choose n} a^0 b^{n} \end{equation}$

And Pascal also moves the first term over to the left side of the equation rearrange the equation to get \(m^k - (m-1)^k\):

$ \begin{matrix} 1^{n} - 0^{n} & = & {n \choose 1} 1 \cdot (-1)^{n-1} & + & {n \choose 2} 1^2 \cdot (-1)^{n-2} & + & \cdots & + & {n \choose n} 1^{0} \cdot (-1)^{n}\\ 2^n- 1^n & = &{n \choose 1} 2 \cdot (-1)^{n-1} & + & {n \choose 2} 2^2 \cdot (-1)^{n-2} & + & \cdots & + & {n \choose n} 2^0 \cdot (-1)^{n} \\ 3^{n} - 2^{n} & = &{n \choose 1} 3 \cdot (-1)^{n-1} & + & {n \choose 2} 3^2 \cdot (-1)^{n-2}& + & \cdots& + & {n \choose n} 3^0 \cdot (-1)^{n} \\ \vdots & = & \vdots & + & \vdots & + & \cdots & + & \vdots \\ (m)^{n} - (m-1)^{n} & = &{n \choose 1} m^1 \cdot (-1)^{n-1} & + & {n \choose 2} m^2 \cdot (-1)^{n-2} & + & \cdots & + & {n \choose n} m^0 \cdot (-1)^{n} \\ \end{matrix}$

If we make the summation of each column sepeartly we get the following sums:

$\begin{equation} m^{n} = {n \choose 1} S_1(m) + {n \choose 2} S_2(m) + \cdots + {n \choose n-1} S_{n-1}(m) + {n \choose n} S_0(m) \end{equation}$

The thing that this setup reveals is that you can write the next power series sum as a combination of the previous lower-order sums. Pascal does not stop here, he continues with the exposition in his book "The arithmetical triangle" where he next gives summation formulas for different arithmetical progressions, like \(4+ 8+ \cdots + n*4\)/, this is often said that Gauss used a similar formula in his youth to solve a problem his teacher gave him. Evidence is anecdotal so we don't really know exactly what happened but all Gauss (or you for that matter) really had to do to find this formula was read Pascal's book! But we seek the opposite relation, from the power and range we want the summation formula. We have got a system of equation where \(k\) unknowns from \(k\) equations must be solved. Pascal does this for a couple of sums manually, but we will instead use matrices and modern theory of these to solve the system. We write ut the equations for an abritrary power series sum:

$ \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ -2 & 1 & 0 & 0 & 0\\ 3 & -3 & 1 & 0 & 0\\ -4 & 6 & -4 & 1 & 0 \\ 5 & -10 & 10 & -5 & 1 \\ \end{bmatrix} \cdot \begin{bmatrix} S_0 \\ S_1 \\ S_2\\ S_4 \\ S_5 \end{bmatrix} = \begin{bmatrix} n \\ n^2 \\ n^3 \\ n^4 \\ n^5 \end{bmatrix} $

Here it is much simpler to write this as the equation using modern matrices and vectors:

$ A \cdot S_n = v_n$

With the variables:

$ A = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ -2 & 1 & 0 & 0 & 0\\ 3 & -3 & 1 & 0 & 0\\ -4 & 6 & -4 & 1 & 0 \\ 5 & -10 & 10 & -5 & 1 \\ \end{bmatrix} $

The unknown summation formulas for all powers up to n given as a vector:

$ S_n= \begin{bmatrix} S_0 \\ S_1 \\ S_2\\ S_4 \\ S_5 \end{bmatrix} $

The power of the upper limit is also represented as a vector:

$ v_n = \begin{bmatrix} n \\ n^2 \\ n^3 \\ n^4 \\ n^5 \end{bmatrix} $

How do we solve this system of equations so that we get the summation formula on the left-hand side alone, and all the known coefficients on the right-hand side? Now, here is a crazy idea for you. Let's divide one system of equations with another system of equations. This is exactly what we do. But in the full generality of things, inverting equations needs to be understood properly, which is really only another way to describe a division by. Matrix notation is interpreted like that of normal numbers. However ulike the normal number manipulations lets say that \(a\) and \(b\) are just ordinary numbers: \(a = 1\) gives you \(a^{-1} = \frac{1}{a}\) since \(\frac{a}{a} = 1\) there is a bigger problem with matrices. The problem is that \(a \cdot b\) might not be the same as \(b \cdot a\).

I remember being lazy when solving system of equations in my youth. I got tired of writing the \(x\), \(y\) and \(z\) after the numbers:

$ \begin{align*} \mathbf{a}_{11} \cdot x + \mathbf{a}_{12} \cdot y + \mathbf{a}_{13} \cdot z & = b_1 \\ \mathbf{a}_{21} \cdot x + \mathbf{a}_{22} \cdot y + \mathbf{a}_{23} \cdot z & = b_2 \\ \mathbf{a}_{31} \cdot x + \mathbf{a}_{32} \cdot y + \mathbf{a}_{33} \cdot z & = b_3 \\ \end{align*} $

I did something that is quite an old habit there, which you also most likely do without realizing it. I named the unknown quantities that I want to solve for \(x\), \(y\) and \(z\) while the constants know at the time of calculations are given starting from \(a\), \(b\) and \(c\) etc.

This is actually from around 1650 and the idea was introduced by Descartes. Viete had earlier proposed using vowels and consonants for separating those but I'm glad that didn't stick. Being more interested in the numbers that I had to do the calculations than the variables with I just out of convenience wrote the unknowns once at the top.

$ \begin{align*} & \, \begin{bmatrix} \, \, \mathbf{x} & \, \, \, \, \mathbf{y} & \, \, \, \, \, \mathbf{z} \,\,\,\,\, \end{bmatrix} \\ & \begin{bmatrix} \mathbf{a}_{11} & \mathbf{a}_{13} & \mathbf{a}_{13} \\ \mathbf{a}_{21} & \mathbf{a}_{23} & \mathbf{a}_{23} \\ \mathbf{a}_{31} & \mathbf{a}_{33} & \mathbf{a}_{33} \\ \end{bmatrix} = \begin{bmatrix} \mathbf{b_1} \\ \mathbf{b_2} \\ \mathbf{b_3} \end{bmatrix} \end{align*} $

Its more common to write it on the side of the matrix:

$ \begin{bmatrix} \mathbf{a}_{11} & \mathbf{a}_{13} & \mathbf{a}_{13} \\ \mathbf{a}_{21} & \mathbf{a}_{23} & \mathbf{a}_{23} \\ \mathbf{a}_{31} & \mathbf{a}_{33} & \mathbf{a}_{33} \\ \end{bmatrix} \cdot \begin{bmatrix} \mathbf{x} \\ \mathbf{y} \\ \mathbf{z} \end{bmatrix}= \begin{bmatrix} \mathbf{b_1} \\ \mathbf{b_2} \\ \mathbf{b_3} \end{bmatrix} $

Continuing with the idea on what happens if you multiply two matrixes? It is easy to show. We start off be assuming that the first matrix gives the relation between \(A_1\) and \(A_2\) and \(B_1\) and \(B_2\).

$\begin{bmatrix} \mathbf{A_1} \\ \mathbf{B_1} \end{bmatrix} = \begin{bmatrix} \mathbf{T}_{11} & \mathbf{T}_{12} \\ \mathbf{T}_{21} & \mathbf{T}_{22} \end{bmatrix} \cdot \begin{bmatrix} \mathbf{A_2} \\ \mathbf{B_2} \end{bmatrix} $

We have another matrix that tells the relation between \(A_2\), \(B_2\) with \(A_3\) and \(B_3\).

$\begin{bmatrix} \mathbf{A_2} \\ \mathbf{B_2} \end{bmatrix} = \begin{bmatrix} \mathbf{S}_{11} & \mathbf{S}_{12} \\ \mathbf{S}_{21} & \mathbf{S}_{22} \end{bmatrix} \cdot \begin{bmatrix} \mathbf{A_3} \\ \mathbf{B_3} \end{bmatrix} $

This is now just two equations that explains the first matrix relation:

$ \begin{align*} A_1 &= T_{11} \cdot A_2 + T_{12} \cdot B_2 \\ B_1 &= T_{21} \cdot A_2 + T_{22} \cdot B_2\end{align*} $

And the second one is:

$ \begin{align*} A_2 &= S_{11} \cdot A_3 + S_{12} \cdot B_3 \\ B_2 &= S_{21} \cdot A_3 + S_{22} \cdot B_3 \end{align*} $

We can replace \(A_2\) and \(B_2\) with the last set of equations:

$ \begin{align*} A_1 &= T_{11} \cdot (S_{11} \cdot A_3 + S_{12} \cdot B_3) + T_{12} \cdot (S_{21} \cdot A_3 + S_{22} \cdot B_3) \\ B_1 &= T_{21} \cdot (S_{11} \cdot A_3 + S_{12} \cdot B_3) + T_{22} \cdot (S_{21} \cdot A_3 + S_{22} \cdot B_3)\end{align*} $

We now write the relation between the output \(A_1\) and \(B_1\) to the input \(A_3\) and \(B_3\):

$\begin{bmatrix} \mathbf{A_1} \\ \mathbf{B_1} \end{bmatrix} = \begin{bmatrix} \mathbf{T}_{11} & \mathbf{T}_{12} \\ \mathbf{T}_{21} & \mathbf{T}_{22} \end{bmatrix} \cdot \begin{bmatrix} \mathbf{S}_{11} & \mathbf{S}_{12} \\ \mathbf{S}_{21} & \mathbf{S}_{22} \end{bmatrix} \cdot \begin{bmatrix} \mathbf{A_3} \\ \mathbf{B_3} \end{bmatrix} $

The full equation inserting values from \(A_2\) and \(B_2\) gives:

$ \begin{align*} A_1 &= T_{11} \cdot S_{11} \cdot A_3 + T_{11} \cdot S_{12} \cdot B_3 + T_{12} \cdot S_{21} \cdot A_3 + T_{12} \cdot S_{22} \cdot B_3 \\ B_1 &= T_{21} \cdot S_{11} \cdot A_3 + T_{21} \cdot S_{12} \cdot B_3 + T_{22} \cdot S_{21} \cdot A_3 + T_{22} \cdot S_{22} \cdot B_3 \end{align*} $

The complete matrix is:

$\begin{bmatrix} \mathbf{A_1} \\ \mathbf{B_1} \end{bmatrix} = \begin{bmatrix} \mathbf{T}_{11} \cdot \mathbf{S}_{11} + \mathbf{T}_{12} \cdot \mathbf{S}_{21} & \mathbf{T}_{11} \cdot \mathbf{S}_{12} + \mathbf{T}_{12} \cdot \mathbf{S}_{22} \\ \mathbf{T}_{21} \cdot \mathbf{S}_{11} + \mathbf{T}_{22} \cdot \mathbf{S}_{21} & \mathbf{T}_{21} \cdot \mathbf{S}_{12} + \mathbf{T}_{22} \cdot \mathbf{S}_{22} \end{bmatrix} \cdot \begin{bmatrix} \mathbf{A_3} \\ \mathbf{B_3} \end{bmatrix} $

The general formula for multiplying two matrices can be proved with induction, with the end result being the formula:

$ A_{i,j} = \sum_{k = 1}^{i = n} a_{i,k} \cdot b_{k,j} $

With all tools that is necessary we go back to the problem of finding the summation formulas, which can now be solved as

$A^{-1} A \cdot S_n = A^{-1} v_n$

And we get the solution on the form:

$ S_n = A^{-1} v_n$

Inversion of matrices can be done by solving simple systems of linear equations. We simply set:

$ \begin{bmatrix} \mathbf{a}_{11} & \mathbf{a}_{12} & \mathbf{a}_{13} \\ \mathbf{a}_{21} & \mathbf{a}_{22} & \mathbf{a}_{23} \\ \mathbf{a}_{31} & \mathbf{a}_{32} & \mathbf{a}_{33} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} $

This is solved by doing the same row operations on each side of the equal sign to get the identity matrix that originally was on the right hand side and produce the identity matrix on the left hand side. When that is done, you will get the inverse matrix on the right hand side.

However, by using the usual analytical techniques as I showed you as normal Gaussian elimination, the numerical error can be very high. It is not hard even to construct matrices that will blow up with numerical methods. But still, Gaussian elimination is widely used, and the reason is that there are no known examples of real-world problems that exhibit the kind of matrix structure that will make the Gaussian algorithm numerically unstable on a computer. This argument can be illustrated with the use of random matrices, and the probability of having such matrices are extremely unlikely to occur, Actually it is very difficult to pull off with low truncation errors trying to solve a general system of equations on a computer.

To solve a general unknown matrix the recommended technique is to use Householder transformations. If one has a matrix library with standard functions the code for producing (a not optimal code given here for simplicity) QR transformation of the Householder method:

        /// <summary>
        /// Householder QR method
        /// </summary>
        /// <param name="A">QR transformation will be done on the A matrix </param>
        /// <returns>A touple with the orthogonal matrix Q and the triangular matrix R</returns>
        public static (MathMatrix Q, MathMatrix R) QR(this MathMatrix A)
        {
            MathMatrix R = A.Clone();
            MathMatrix Q = new MathMatrix(A.Rows, A.Columns);
            Q = Q.Eye();

            for (int k = 0; k < A.Columns - 1; k++)
            {
                // Get the column part that is going to change
                MathMatrix v = R.GetColumn(k);

                //TODO: Optimize the code
                // Optimalization to the code can be done here. set items above diagonal in matrix vector to zero
                // this is normally not done since the zero elements are not really needed
                // it just simplifies the general code used at a later stage (as I have implemented it).
                for (int i = 0; i < k; i++)
                    v[i] = 0;

                // Size and direction of the reflection
                v[k] += sgn(v[k]) * v.Norm();
                // Making v a unit vector
                v /= v.Norm();

                //TODO: Optimize these op+erations too
                // v is a vector but also matrix (the full matrix is used here, and that is really not needed)
                R -= 2 * v * (v.Transpose() * R);
                Q -= 2 * v * (v.Transpose() * Q); 
            }
            return (Q, R);
        }

The power of this is that the inverse of Q is its transpose due to the fact that it is orthogonal. And the inverse of R is found using back substitution due to the fact it's lower triangular. In the case of Pascal's formula, I only need the back substitution. But giving the code without explaining the fundamentals might give you the wrong idea on how to go about solving matrices in general. Besides, I want to expand this article later on to do more general things with the matrix so this section will be needed then.

The end result is that the code will return the summation formulas for all powers up to \(k\):

$\begin{equation} \sum_{x=0}^{x=n}x^k = \int_0^{n}x^k \, dx + \frac{x^k}{2} + \cdots \end{equation}$

However, we are not finished developing the summation formula, because this is about integration. I'm sure you are all familiar with the notion of a Cauchy sum even if you have never heard the name, which sums up a finite number of rectangles in order to get the integration for a function. The only difference between a Cauchy sum and a Riemann sum is that Riemann uses infinitely many slices to define the integral while Cauchy uses a finite amount. This makes a huge difference in the end result but is usually just needed for theoretical functions. In practice, Cauchy will do just fine for a normal sufficiently smooth function. A way for mathematicians to say that the function looks nice.

We now change the summation instead to sum over \(dx\) instead, or rather \(\delta x\). Pascal also noted that there was a clear and obvious connection between summation and the integration formula but did not expand beyond the short mentioning of this. Proceeding, we then count the finite similar intervals and in reality we get a nice formula for the error compared to integration:

$\begin{equation} lim_{dx \to \frac{C}{n}} \int_0^C x^k \, dx \sum_{x=0}^{x=n}x^k = \int_0^{n}x^k \, dx + \frac{x^k}{2} + \cdots \end{equation}$

Let's start using the result for something. Assuming that it is more interesting to approach the integration by writing:

$\begin{equation} \int_0^C{x^k dx} = \sum_0^{\frac{C}{dx} (= n)}{(n \cdot dx)^k} \end{equation}$

Exchanging \(0\) with \(1\) does not change anything as \(f(0) = 0\) on the right hand side in Thus \(dx = \frac{C}{n}\) is plugged into the function \(f(dx k)\) where \(k \in [0, n]\). We need to replace the n value from the \(C\), thus ensuring that \(C\) divided by \(dx\) is an integer \(n\). Plugging this into the binomial theorem makes for \(a\) and \(b\) allows you to prove that it really is the formula. Replacing \(n = \frac{C}{dx}\) into the formulas found also makes it possible to do the summation without the for a loop.

int ModifiedUpperBound = (int)((double)UpperBound / dx);

ManualSum = 0;
for (int i = 1; i <= ModifiedUpperBound; i++)
{
    ManualSum += Math.Pow(i*dx, (double)Power);
}

ManualSum *= dx;

DoubleTable = MatrixExtentions.PascalTriangle(Power+1, 1, 0).ChessPattern();
SolvedDoubleTable = DoubleTable.InvertMatrix_LowerTriangular(DoubleTable);
int n = SolvedDoubleTable.Columns;
MathMatrix mm = new MathMatrix(n, 1);

for (int i = 0; i < n; i++)
{
    mm[i] = Math.Pow(ModifiedUpperBound, i + 1);
}
SumSolution = SolvedDoubleTable * mm;
FormulaSum = Math.Pow(dx, Power+1) * SumSolution[SumSolution.Rows-1, SumSolution.Columns-1];

This is the discrete analog of the Euler-Maclaurin summation formula, and a version of this was also known to Fermat. He used geometrical arguments to derive and prove it, while I used the binomial summation formula derived by Pascal to prove it. Well, prove is a bit strong. I have showed you a path you can take to prove it.

Difference formula

The formula that we often use but have little explanation of where it comes from at fist sight:

$\begin{equation} \frac{d \, x^n}{dx} = n \cdot x^{n-1} \end{equation}$

A quick note here on the notation of the differential. The \(d\) stands for difference and is first used by Leibnitz. Incidentally the same notation for integration is also from Liebnitz presented in the same paper.

$ (a+b)^n = {n \choose 0}a^nb^0 + {n \choose 1}a^{n-1}b + {n \choose 2}a^{n-2}b^2 + \cdots + {n \choose n}a^0b^n $

We not insert the values \( x \) and the arbitrary small shift \( dx \). Normally we reserve \( dx \) for an infinitely small number. In mathematical terms we often write something like \( dx = \lim\limits_{\Delta x \to 0} \Delta x\). This notation is from Bernoulli who uses it the first time in 1706 and widely employed The case is that we seek a very small real number that is not zero, since we can't divide by zero. The difference between the \(Delta\) and the \(d\) is that one a given distance the is small, while the other is a infinitely small distance that is defined without a limit.

$ (x+dx)^n = {n \choose 0}x^n {dx}^0 + {n \choose 1}x^{n-1}dx + {n \choose 2}x^{n-2}{dx}^2 + \cdots + {n \choose n}x^0{dx}^n $

We only include the first few terms:

$ (x+dx)^n = x^n + n x^{n-1}dx + \cdots $

Subtract and divide:

$ (x+dx)^n - x^n = n x^{n-1}dx + \cdots $

We get a general expression fro the derivative:

$ \frac{(x+dx)^n - x^n}{dx} = n x^{n-1} + \mathcal{O}(f''(x)) $

In modern terms (meaning after Euler):

$ \frac{f(x+dx) - f(x)}{dx} = f'(x) $

The binomial formula that is developed can be used to understand the rules of differentiation. Take the standard formula:

$ (u \cdot v)' = u' \cdot v + v' \cdot u $

In therms of the binomial expansion we can write a similar formula:

$ (x+dx)^n (y+dy)^n = x^n \cdot y^n + x^n \cdot n \cdot y^{n-1} dy + y^n \cdot n \cdot x^{n-1} dx + \cdots $

We realize that we get exactly the same as the result for the function with only x values, only this time we have the sum of the x and the y derivative respectively. It gets a bit complicated but in essence once you get how its done its easy to derive it generally. With a symbolic package it fairly straight forward:

var x = Expr.Variable("x");
var y = Expr.Variable("y");
var dx = Expr.Variable("dx");
var dy = Expr.Variable("dy");

ExpandedDx = ((x + dx).Pow(2).Expand().Subtract(x.Pow(2))).Divide(dx).Expand().Substitute(dx, 0).ToString();
ExpandedDy = ((y + dy).Pow(2).Expand().Subtract(y.Pow(2))).Divide(dy).Expand().Substitute(dy, 0).ToString();

Many attribute the binomial formula for fractions that enabled further developments of calculus, which eventually lead to the Taylor and MacLaurin formulas. But there is good evidence that the Scottish mathematician James Gregory already developed the Taylor series in 1670s. Taylor and MacLaurin only published their discoveries after 1700. While Gregory was from Scotland and worked there most of his life, he went abroad to study the "basic" building blocks of calculus when he went to Italy to study the mathematical developments there, seeing among other things Cavalieri's formula. On the return journey he gave out a book systemizing the things he had learned while in Italy, that book got him accepted in the Royal Society. Back in Scotland he also developed binomial formulas completely independent of Newton. Actually, he was in many respects a better mathematician than Newton as he seems to think more like a modern mathematician and sees the overall picture better. Although Newton seems to be much better than Gregory at understanding physics and making things work like a telescope which both did. Sadly Gregory dies of what is probably a heart attack at the age of 38.

As to the discovery of calculus by Newton a text surfaced in the 1920s (reference: Simmons) where Newton admitted that he took most of his ideas on the building block of calculus from Fermat. Indeed Fermat had already before Newton did his treatise developed the idea of finding maxima and minima point of a curve using derivatives. Thereby also do minimization of time to arrive at Snell's law of refraction. Even though Snell is not the first to notice this formula and it seems to first appear in a text from Thomas Harriot. As for Leibniz he had by 1673 he had progressed to reading Pascal’s Traité des Sinus du Quarte Cercle and it was during his largely autodidactic research that Leibniz said: "a light turned on". In many respects, the only difference between Newton and Leibnitz seems to be that Newton took inspiration from Fermat and Libenitz took inspiration from Pascal in order to further develop calculus.

References

There is actually a whole lecture series by Prof. Wildburger on the history of mathematics available here: Youtube lecture playlist

The lectures generally follows the book "Mathematics and Its History" by John Stillwell.

Other good books on mathematics and history are "History of Mathematics" by Victor J. Katz and a textbook that I would say is perhaps one of the best mathematics textbooks I have ever read: "Differential equations with applications and historical notes" by George F. Simmons For solving a general system of equations the book "Numerical linear algebra" by Trefethen and Bau is used in this article.

Another great classic history book of mathematics is Herman H. Goldstine's "A History o fnumerical analysis - From the 16th through the 19th century". Although its scope is much more narrow the previous textbooks.