Series & Approximation · intermediate · 50 min read

Power Series & Taylor Series

When do infinite polynomial expansions converge — and when can you differentiate and integrate them term by term? The bridge from finite Taylor approximations to infinite series representations, with the radius of convergence as the gatekeeper.

Abstract. A power series ∑ aₙ(x−c)ⁿ is an infinite polynomial — a series whose terms are functions of x rather than fixed numbers. The Cauchy-Hadamard theorem determines a radius of convergence R = 1/limsup|aₙ|^(1/n): the series converges absolutely for |x−c| < R and diverges for |x−c| > R. At the endpoints x = c ± R, convergence must be tested case by case using the convergence tests from Series Convergence & Tests. Inside the radius of convergence, a power series converges uniformly on every compact subset, which — via the interchange theorems from Uniform Convergence — justifies term-by-term differentiation and integration: the derivative of a power series is the series of derivatives, and the integral of a power series is the series of integrals, both with the same radius R. This makes power series infinitely differentiable inside their radius of convergence. A Taylor series ∑ f⁽ⁿ⁾(c)/n! · (x−c)ⁿ is a power series whose coefficients are determined by the derivatives of f at the center c. For analytic functions — those whose Taylor series converges to f in a neighborhood of c — the Taylor series provides a complete representation. But smooth does not imply analytic: the function e^(−1/x²) is C^∞ at the origin with all derivatives zero, yet is not identically zero, so its Taylor series converges to the wrong function. The Taylor series catalog (eˣ, sin x, cos x, ln(1+x), 1/(1−x), the binomial series (1+x)^α) forms the backbone of local approximation in both pure mathematics and applied ML. In machine learning, Taylor expansions appear in the descent lemma for gradient descent convergence, the Laplace approximation of posterior distributions, GELU activation function computation, and the matrix exponential for continuous-time dynamical models.

1. Overview & Motivation — From Finite to Infinite

Mean Value Theorem & Taylor Expansion gave us Taylor polynomials — finite sums T_n(x) = \sum_{k=0}^{n} \frac{f^{(k)}(a)}{k!}(x-a)^k that approximate f near a. Series Convergence & Tests gave us convergence tests for infinite sums of numbers. What happens when we let n \to \infty in the Taylor polynomial — when does the infinite sum converge, and does it converge to f?

More generally: what happens when the terms of a series depend on x? The geometric series \sum_{n=0}^{\infty} x^n = \frac{1}{1-x} already demonstrated this in Series Convergence & Tests — it converges for |x| < 1 and diverges for |x| \geq 1. Power series generalize this pattern.

Why this matters in ML. Neural network activation functions are often approximated by truncated Taylor series. The GELU activation \text{GELU}(x) = x \cdot \Phi(x) is implemented in practice as 0.5x(1 + \tanh(\sqrt{2/\pi}(x + 0.044715x^3))), a tanh-based approximation derived from a Taylor expansion. Understanding when and where such truncations are valid requires the theory of power series convergence.

This topic sits at the intersection of three prerequisites. Series Convergence & Tests provides the convergence tests (ratio, root) that determine the radius of convergence. Uniform Convergence provides the uniform convergence theory that justifies term-by-term calculus. Mean Value Theorem & Taylor Expansion provides the Taylor polynomial machinery whose infinite extension we now analyze.

Power series overview — three behaviors: convergence inside R, divergence outside R, and the entire-function case R = ∞

2. Power Series — Definition and First Examples

📐 Definition 1 (Power Series)

A power series centered at c is an expression of the form

\sum_{n=0}^{\infty} a_n (x - c)^n = a_0 + a_1(x-c) + a_2(x-c)^2 + \cdots

where a_0, a_1, a_2, \ldots are real constants called the coefficients and c is the center. The special case c = 0 gives \sum a_n x^n.

A power series is not a single number — it is a function of x. For each value of x, we get a numerical series \sum a_n (x-c)^n that may converge or diverge. The central question of this topic is: for which values of x does the series converge?

📝 Example 1 (The geometric series as a power series)

\sum_{n=0}^{\infty} x^n has a_n = 1 for all n and center c = 0. From Series Convergence & Tests, this converges to \frac{1}{1-x} for |x| < 1 and diverges for |x| \geq 1. This is the prototype: convergence on an open interval, divergence outside it.

📝 Example 2 (The exponential series, R = ∞)

\sum_{n=0}^{\infty} \frac{x^n}{n!} has a_n = 1/n! and converges for all x \in \mathbb{R}. From Mean Value Theorem & Taylor Expansion, we know this equals e^x. The ratio of consecutive terms is |x|/(n+1) \to 0 for every fixed x, so the ratio test gives convergence everywhere.

📝 Example 3 (A series that converges only at its center, R = 0)

\sum_{n=0}^{\infty} n! \, x^n diverges for every x \neq 0. The ratio |a_{n+1} x^{n+1}/(a_n x^n)| = (n+1)|x| \to \infty for any fixed x \neq 0. The “radius of convergence” is 0 — this series is useless as a function of x.

💡 Remark 1 (Power series generalize polynomials)

A polynomial of degree N is a power series with a_n = 0 for n > N. It converges everywhere (R = \infty). Power series extend this to “infinite-degree polynomials,” but the trade-off is that convergence is no longer automatic.

3. Radius of Convergence

The three examples above illustrate a remarkable structural fact: a power series always converges on an interval centered at c. The half-width of this interval is the radius of convergence — and it is determined entirely by the coefficients.

🔷 Theorem 1 (Existence of the Radius of Convergence)

For any power series \sum a_n (x - c)^n, exactly one of the following holds:

(i) The series converges only at x = c (R = 0).

(ii) The series converges for all x \in \mathbb{R} (R = \infty).

(iii) There exists R > 0 such that the series converges absolutely for |x - c| < R and diverges for |x - c| > R.

Proof.

Suppose \sum a_n (x_0 - c)^n converges at some x_0 \neq c. Then a_n(x_0 - c)^n \to 0 (by the divergence test from Series Convergence & Tests), so |a_n(x_0 - c)^n| \leq M for some bound M. For any x with |x - c| < |x_0 - c|, set r = |x - c|/|x_0 - c| < 1. Then

|a_n(x-c)^n| = |a_n(x_0-c)^n| \cdot r^n \leq M r^n.

Since \sum M r^n converges (geometric series with r < 1), the comparison test gives absolute convergence at x.

Now define R = \sup\{|x_0 - c| : \sum a_n(x_0-c)^n \text{ converges}\}. If the series converges only at c, then R = 0 (case i). If the supremum is infinite, then R = \infty (case ii). Otherwise, R is a positive real number (case iii): the argument above shows convergence for |x - c| < R, and divergence for |x - c| > R follows because if the series converged at some x_1 with |x_1 - c| > R, it would also converge at all x with |x - c| < |x_1 - c|, contradicting the definition of R as the supremum.

📐 Definition 2 (Radius of Convergence)

The number R from Theorem 1 is the radius of convergence of \sum a_n(x-c)^n. We allow R = 0 and R = \infty. The open interval (c-R, c+R) is the open interval of convergence.

How do we compute R? The root and ratio tests from Series Convergence & Tests, applied to the coefficient sequence, give explicit formulas.

🔷 Theorem 2 (The Cauchy-Hadamard Formula)

\frac{1}{R} = \limsup_{n \to \infty} |a_n|^{1/n}

with the convention 1/0 = \infty and 1/\infty = 0.

Proof.

Apply the root test from Series Convergence & Tests to \sum |a_n(x-c)^n|:

\limsup_{n \to \infty} |a_n(x-c)^n|^{1/n} = |x-c| \cdot \limsup_{n \to \infty} |a_n|^{1/n}.

The root test gives convergence when this is < 1, i.e., |x-c| < 1/\limsup |a_n|^{1/n} = R, and divergence when it is > 1, i.e., |x-c| > R.

🔷 Theorem 3 (Ratio Test for Radius)

If \lim_{n \to \infty} |a_{n+1}/a_n| exists (possibly 0 or \infty), then

R = \lim_{n \to \infty} \left|\frac{a_n}{a_{n+1}}\right|.

💡 Remark 2 (Root test vs. ratio test)

The Cauchy-Hadamard formula always works (the limsup always exists). The ratio test requires the limit to exist. When both apply, they give the same R. From Series Convergence & Tests, the root test is strictly stronger — there exist series where the ratio test is inconclusive but the root test determines R.

📝 Example 4 (Computing R for standard series)

(a) \sum x^n/n!: the ratio test gives R = \lim_{n \to \infty} (n+1) = \infty.

(b) \sum n! \, x^n: the ratio test gives R = \lim_{n \to \infty} 1/(n+1) = 0.

(c) \sum x^n/n: the ratio test gives R = \lim_{n \to \infty} n/(n+1) = 1.

(d) \sum x^n/n^2: the ratio test gives R = \lim_{n \to \infty} n^2/(n+1)^2 = 1.

(e) \sum n^n x^n: Cauchy-Hadamard gives 1/R = \limsup_{n \to \infty} n = \infty, so R = 0.
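The two diagnostics in Example 4 can be checked numerically. A minimal sketch (plain Python, with hypothetical helper names `ratio_radius` and `root_radius`): for \sum x^n/n and \sum x^n/n^2, both diagnostics should approach R = 1 as n grows.

```python
def ratio_radius(a, n):
    """Ratio diagnostic |a_n / a_{n+1}| (Theorem 3)."""
    return abs(a(n) / a(n + 1))

def root_radius(a, n):
    """Root diagnostic 1 / |a_n|^(1/n) (Cauchy-Hadamard)."""
    return 1.0 / abs(a(n)) ** (1.0 / n)

harmonic = lambda n: 1.0 / n       # coefficients of sum x^n / n   -> R = 1
squares  = lambda n: 1.0 / n**2    # coefficients of sum x^n / n^2 -> R = 1

for a in (harmonic, squares):
    print(ratio_radius(a, 500), root_radius(a, 500))   # both near 1
```

The root diagnostic converges more slowly here (n^{1/n} and n^{2/n} decay to 1 only logarithmically), which is why the ratio formula is preferred whenever the limit exists.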

The explorer below lets you see the diagnostic sequences |a_n|^{1/n} and |a_{n+1}/a_n| converging to 1/R, and probe what happens when you evaluate the series at points inside and outside the radius.


Radius of convergence — Cauchy-Hadamard and ratio test diagnostics converging to 1/R

4. Endpoint Behavior — Where the Tests from Topic 17 Come to Work

The radius R determines convergence on the open interval (c-R, c+R) and divergence outside [c-R, c+R]. But at the endpoints x = c \pm R themselves, the power series becomes a numerical series — and you must test it directly using the convergence toolkit from Series Convergence & Tests.

📐 Definition 3 (Interval of Convergence)

The interval of convergence of \sum a_n(x-c)^n is the set of all x where the series converges. It always includes the open interval (c-R, c+R) and may or may not include either endpoint.

📝 Example 5 (Three endpoint behaviors)

All three of the following have R = 1 centered at c = 0:

(a) \sum x^n: diverges at both endpoints. At x = 1: \sum 1 diverges. At x = -1: \sum (-1)^n diverges. Interval: (-1, 1).

(b) \sum x^n/n^2: converges at both endpoints. At x = 1: \sum 1/n^2 converges (p-series, p = 2). At x = -1: \sum (-1)^n/n^2 converges absolutely. Interval: [-1, 1].

(c) \sum x^n/n: mixed. At x = -1: \sum (-1)^n/n converges by the alternating series (Leibniz) test. At x = 1: \sum 1/n diverges (harmonic). Interval: [-1, 1).

💡 Remark 3 (The four endpoint possibilities)

With two endpoints and two possible verdicts (converge/diverge) at each, there are four combinations: (c-R, c+R), [c-R, c+R], [c-R, c+R), (c-R, c+R]. All four occur in practice. The algorithm is always the same: (1) compute R via the ratio or Cauchy-Hadamard formula; (2) substitute x = c \pm R and apply a convergence test from Series Convergence & Tests.

📝 Example 6 (Endpoint analysis with comparison and alternating series tests)

For \sum \frac{(-1)^n x^n}{\sqrt{n+1}}, the ratio test gives R = 1. At x = 1: \sum (-1)^n/\sqrt{n+1} converges by the Leibniz test (alternating, terms decrease to 0). At x = -1: \sum 1/\sqrt{n+1} diverges by comparison with \sum 1/\sqrt{n} (p-series with p = 1/2 < 1). Interval: (-1, 1].
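The contrast in Example 5(c) can be probed numerically. A sketch only — partial sums cannot prove convergence or divergence, but they show the qualitative difference between the two endpoints of \sum x^n/n:

```python
import math

def partial_sum(x, N):
    """N-th partial sum of sum_{n>=1} x^n / n."""
    return sum(x**n / n for n in range(1, N + 1))

# x = -1: the alternating harmonic series, which converges to -ln 2
print(partial_sum(-1.0, 10_000))   # close to -math.log(2) = -0.6931...

# x = +1: the harmonic series; partial sums grow like ln N without bound
print(partial_sum(1.0, 100), partial_sum(1.0, 10_000))
```

At x = -1 the partial sums settle near -\ln 2 within the Leibniz error bound 1/(N+1); at x = 1 they keep climbing, consistent with divergence.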

Endpoint behavior — the four possible interval types demonstrated by three standard series

5. Uniform Convergence on Compact Subsets

This is the section where the threads come together. We connect power series to the uniform convergence theory from Uniform Convergence, establishing the key property that makes everything in the next section work.

🔷 Theorem 4 (Uniform Convergence on Compact Subsets)

If \sum a_n(x-c)^n has radius of convergence R > 0, then the series converges uniformly on every closed interval [c-r, c+r] with 0 < r < R.

Proof.

Fix r with 0 < r < R and let M_n = |a_n| r^n. For |x - c| \leq r, we have

|a_n(x-c)^n| \leq |a_n| r^n = M_n.

Since r < R, the series \sum M_n = \sum |a_n| r^n converges (by the definition of R — the power series converges absolutely for |x - c| < R, and r < R). By the Weierstrass M-test from Uniform Convergence, the series \sum a_n(x-c)^n converges uniformly on [c-r, c+r].

💡 Remark 4 (Why compact subsets, not the full interval)

The power series \sum x^n = 1/(1-x) converges pointwise on (-1, 1) but does not converge uniformly on all of (-1, 1). The partial sums S_n(x) = (1-x^{n+1})/(1-x) satisfy

\sup_{x \in (-1,1)} |S_n(x) - 1/(1-x)| = \sup_{x \in (-1,1)} \frac{|x|^{n+1}}{|1-x|} = \infty

for every n (the supremum blows up as x \to 1^-). The uniform convergence theorem only guarantees uniformity on [-r, r] for r < 1 — compact subsets strictly inside the interval of convergence. This is the same pointwise-vs-uniform distinction from Uniform Convergence, now appearing in a concrete power-series context.
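Remark 4 can be seen on a grid. A minimal sketch with NumPy (`sup_error` is a hypothetical helper): on the compact set [-0.5, 0.5] the sup-norm error of the geometric partial sums decays geometrically, while on a grid creeping up to the edge of (-1, 1) it stays enormous.

```python
import numpy as np

def sup_error(n, grid):
    """Sup-norm gap between S_n(x) = sum_{k<=n} x^k and 1/(1-x) on a grid."""
    Sn = np.polyval(np.ones(n + 1), grid)     # coefficients all 1: sum_{k=0}^{n} x^k
    return np.max(np.abs(Sn - 1.0 / (1.0 - grid)))

compact   = np.linspace(-0.5, 0.5, 1001)      # [-r, r] with r = 0.5 < R = 1
near_edge = np.linspace(-0.999, 0.999, 1001)  # approaching the full open interval

for n in (5, 10, 20):
    print(n, sup_error(n, compact), sup_error(n, near_edge))
```

On the compact grid the error is bounded by r^{n+1}/(1-r), exactly the M-test tail; near the edge the factor 1/(1-x) dominates and no uniform bound exists.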

Uniform convergence on compact subsets — sup-norm error decreasing on [-r,r] for r < R

6. Term-by-Term Differentiation & Integration

Because power series converge uniformly on compact subsets, the interchange theorems from Uniform Convergence apply. We can differentiate and integrate a power series term by term — and the resulting series has the same radius of convergence. This is the computational payoff of the theory.

🔷 Theorem 5 (Term-by-Term Differentiation)

If f(x) = \sum_{n=0}^{\infty} a_n (x-c)^n has radius of convergence R > 0, then f is differentiable on (c-R, c+R) and

f'(x) = \sum_{n=1}^{\infty} n a_n (x-c)^{n-1}.

The differentiated series has the same radius of convergence R.

Proof.

Fix x_0 \in (c-R, c+R) and choose r with |x_0 - c| < r < R. On [c-r, c+r], the series converges uniformly (Theorem 4). The partial sums S_N(x) = \sum_{n=0}^{N} a_n(x-c)^n are polynomials, hence differentiable, with S_N'(x) = \sum_{n=1}^{N} n a_n(x-c)^{n-1}.

The differentiated series \sum n a_n(x-c)^{n-1} has

\limsup_{n \to \infty} |n a_n|^{1/n} = \limsup_{n \to \infty} \bigl(n^{1/n} |a_n|^{1/n}\bigr) = 1 \cdot \limsup_{n \to \infty} |a_n|^{1/n} = \frac{1}{R}

since n^{1/n} \to 1. So the differentiated series has the same radius R and converges uniformly on [c-r, c+r].

By the interchange theorem for differentiation from Uniform Convergence, f'(x_0) = \lim_{N \to \infty} S_N'(x_0) = \sum_{n=1}^{\infty} n a_n(x_0-c)^{n-1}.

🔷 Corollary 1 (Power series are C^∞)

Applying Theorem 5 repeatedly, f^{(k)}(x) = \sum_{n=k}^{\infty} \frac{n!}{(n-k)!} a_n (x-c)^{n-k} for all k \geq 0, each with radius R. A power series is infinitely differentiable inside its radius of convergence.

🔷 Theorem 6 (Term-by-Term Integration)

If f(x) = \sum_{n=0}^{\infty} a_n (x-c)^n has radius R > 0, then

\int_c^x f(t)\,dt = \sum_{n=0}^{\infty} \frac{a_n}{n+1}(x-c)^{n+1}

for |x - c| < R. The integrated series has radius of convergence R.

📝 Example 7 (Deriving 1/(1−x)² by differentiation)

Differentiating \frac{1}{1-x} = \sum_{n=0}^{\infty} x^n term by term gives

\frac{1}{(1-x)^2} = \sum_{n=1}^{\infty} n x^{n-1} = \sum_{n=0}^{\infty} (n+1) x^n \quad \text{for } |x| < 1.

📝 Example 8 (Deriving ln(1+x) by integration)

Integrate \frac{1}{1+x} = \sum_{n=0}^{\infty} (-x)^n = \sum_{n=0}^{\infty} (-1)^n x^n term by term from 0 to x:

\ln(1+x) = \int_0^x \frac{dt}{1+t} = \sum_{n=0}^{\infty} \frac{(-1)^n}{n+1} x^{n+1} = \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n} x^n \quad \text{for } |x| < 1.

The series also converges at x = 1 by the alternating series (Leibniz) test from Series Convergence & Tests, giving \ln 2 = 1 - 1/2 + 1/3 - 1/4 + \cdots.

📝 Example 9 (The arctangent series)

Integrate \frac{1}{1+t^2} = \sum_{n=0}^{\infty} (-1)^n t^{2n} to get

\arctan(x) = \sum_{n=0}^{\infty} \frac{(-1)^n}{2n+1} x^{2n+1} \quad \text{for } |x| \leq 1.

Setting x = 1 gives the Leibniz formula \frac{\pi}{4} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots.
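The Leibniz formula in Example 9 is easy to check, and it also illustrates how slow endpoint convergence can be (the alternating-series error bound is 4/(2N+1), so roughly N terms buy only \log_{10} N digits):

```python
import math

def leibniz_pi(N):
    """4 * (1 - 1/3 + 1/5 - ...): N terms of the arctan series at x = 1."""
    return 4.0 * sum((-1)**n / (2*n + 1) for n in range(N))

print(leibniz_pi(10))        # still crude
print(leibniz_pi(100_000))   # agrees with pi to about 5 decimal places
```

Compare with the interior of the interval: the same series at x = 0.5 converges geometrically, which is the practical reason library implementations evaluate series well inside R.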

The explorer below lets you see term-by-term differentiation and integration in action. Toggle between the two modes and watch how the partial sums of the derived/integrated series track the true derivative/integral.


Term-by-term calculus — differentiation of 1/(1−x) and integration of 1/(1+x)

7. Taylor Series as Power Series — The Infinite Extension

A Taylor series is a power series whose coefficients are determined by the derivatives of a function. The question is: when does this particular power series converge to the function?

🔷 Theorem 7 (Coefficient Extraction (Uniqueness))

If f(x) = \sum_{n=0}^{\infty} a_n (x-c)^n on some interval (c-R, c+R) with R > 0, then

a_n = \frac{f^{(n)}(c)}{n!}

for all n \geq 0. A power series representation of a function is necessarily its Taylor series.

Proof.

By Corollary 1, f^{(k)}(x) = \sum_{n=k}^{\infty} \frac{n!}{(n-k)!} a_n (x-c)^{n-k}. Setting x = c, all terms with n > k vanish (they contain a factor (c-c)^{n-k} = 0), leaving f^{(k)}(c) = k! \, a_k. Solving gives a_k = f^{(k)}(c)/k!.

💡 Remark 5 (Uniqueness has teeth)

If two power series \sum a_n x^n and \sum b_n x^n are equal on any interval containing 0, then a_n = b_n for all n. You cannot have two different power series representations of the same function centered at the same point. This makes power series representations canonical.

📝 Example 10 (The Taylor series catalog)

The six essential Taylor series at c = 0 (Maclaurin series):

1. e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!}, \quad R = \infty

2. \sin x = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!} x^{2n+1}, \quad R = \infty

3. \cos x = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n)!} x^{2n}, \quad R = \infty

4. \ln(1+x) = \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n} x^n, \quad R = 1, also converges at x = 1

5. \frac{1}{1-x} = \sum_{n=0}^{\infty} x^n, \quad R = 1

6. (1+x)^\alpha = \sum_{n=0}^{\infty} \binom{\alpha}{n} x^n (binomial series), \quad R = 1 for non-integer \alpha
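The catalog above can be sanity-checked directly: with enough terms that the remainder is tiny at x = 0.5 (well inside every radius), the partial sums agree with the math-library values to near machine precision.

```python
import math

x, N = 0.5, 20   # x inside every radius of convergence; N terms per series

exp_s  = sum(x**n / math.factorial(n) for n in range(N))
sin_s  = sum((-1)**n * x**(2*n + 1) / math.factorial(2*n + 1) for n in range(N))
log_s  = sum((-1)**(n + 1) * x**n / n for n in range(1, N))
geom_s = sum(x**n for n in range(N))

# Residuals against the library functions (all tiny at x = 0.5):
print(exp_s - math.exp(x), sin_s - math.sin(x))
print(log_s - math.log1p(x), geom_s - 1.0 / (1.0 - x))
```

Note the differing accuracies: the factorial denominators of e^x and \sin x crush the remainder almost immediately, while the R = 1 series for \ln(1+x) and 1/(1-x) converge only geometrically at rate x^N.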

The flagship explorer below shows partial sums of these Taylor series converging to their target functions. Select a function, drag the n slider, and watch S_n(x) approach f(x) inside the radius of convergence — and diverge wildly outside it.


Taylor series catalog — six essential series with partial sums overlaid on target functions

8. Analytic vs. Smooth — When Taylor Series Succeed and Fail

Every power series with R > 0 defines a C^\infty function (Corollary 1). But does every C^\infty function have a Taylor series that converges back to the function? The answer is no — and understanding why is the deepest insight of this topic.

📐 Definition 4 (Analytic Function)

A function f is real analytic at c if its Taylor series at c converges to f(x) in some neighborhood of c:

f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(c)}{n!}(x-c)^n \quad \text{for } |x - c| < R, \; R > 0.

A function is analytic on an interval if it is analytic at every point of that interval. The class of analytic functions is denoted C^\omega.

🔷 Theorem 8 (Sufficient Condition for Analyticity)

If f is C^\infty on an interval I containing c and there exist constants M > 0 and K > 0 such that

|f^{(n)}(x)| \leq M \cdot K^n \cdot n! \quad \text{for all } n \text{ and all } x \in I,

then f is analytic at c with R \geq 1/K.

Proof.

The Lagrange remainder from Mean Value Theorem & Taylor Expansion satisfies

|R_n(x)| \leq \frac{|f^{(n+1)}(\xi)|}{(n+1)!} |x-c|^{n+1} \leq \frac{M K^{n+1} (n+1)!}{(n+1)!} |x-c|^{n+1} = M (K|x-c|)^{n+1}.

For |x-c| < 1/K, this is a geometric sequence converging to 0, so R_n(x) \to 0 and the Taylor series converges to f.

📝 Example 11 (e^x, sin x, cos x are entire, i.e. analytic everywhere)

For e^x on any interval [-B, B]: |f^{(n)}(x)| = e^x \leq e^B for every n — a bound with no n! growth at all, far stronger than the M K^n n! hypothesis of Theorem 8 requires. The Lagrange remainder then satisfies |R_n(x)| \leq e^B |x|^{n+1}/(n+1)! \to 0 for every x, so the Taylor series converges to e^x everywhere and R = \infty.

The same argument applies to \sin x and \cos x, whose derivatives are all bounded by 1.

📝 Example 12 (Smooth but not analytic — e^{−1/x²} revisited)

This extends the discussion from Mean Value Theorem & Taylor Expansion, Example 8. Define

f(x) = \begin{cases} e^{-1/x^2} & x \neq 0 \\ 0 & x = 0. \end{cases}

All derivatives at 0 are 0 (the proof by induction uses L’Hôpital’s Rule repeatedly: each f^{(n)}(x) is a polynomial in 1/x times e^{-1/x^2}, and e^{-1/x^2} decays faster than any polynomial as x \to 0). So the Taylor series at 0 is T(x) = 0. But f(x) > 0 for x \neq 0. The Taylor series converges — but to the wrong function.

The derivative bound |f^{(n)}(x)| \leq M K^n n! fails on every neighborhood of 0 for any finite K: the derivatives at points near 0 (but not at 0) grow faster than any geometric rate.
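Example 12 can be made tangible numerically. A sketch only — finite differences cannot prove f^{(n)}(0) = 0, but they are consistent with it, while a single evaluation away from 0 already shows that T(x) = 0 misses the function:

```python
import math

def f(x):
    """The smooth-but-not-analytic bump tail of Example 12."""
    return math.exp(-1.0 / x**2) if x != 0 else 0.0

h = 1e-2
print((f(h) - f(-h)) / (2*h))            # symmetric difference: 0, matching f'(0) = 0
print((f(h) - 2*f(0) + f(-h)) / h**2)    # second difference: ~0, matching f''(0) = 0

# Yet f is not the zero function, so the Taylor series T(x) = 0 is wrong:
print(f(0.5))                            # e^{-4}, about 0.018, not 0
```

At h = 0.01 the value e^{-10000} underflows to exactly 0 in double precision — a numerical echo of how violently f is flattened at the origin.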

💡 Remark 6 (Analytic functions are rare but ubiquitous)

In a precise sense (Baire category), “most” smooth functions are not analytic. Yet in practice — in calculus, physics, and ML — almost every function we encounter is analytic (or piecewise analytic). The standard function zoo (e^x, \sin, \cos, \ln, polynomials, rational functions) consists of analytic functions, and sums, products, quotients (away from zeros of the denominator), and compositions of analytic functions are again analytic. The smooth-but-not-analytic examples are constructed to violate the derivative growth condition.


Analytic vs. smooth — Taylor series converging to f (left) vs. converging to the wrong function (right)

9. Connections to Statistics

Taylor series are the asymptotic backbone of statistical theory: characteristic-function CLT proofs, asymptotic normality of the MLE, the Laplace approximation, the delta method, and the bias expansions for nonparametric estimators all rest on Taylor expansions, typically to first or second order around a point of interest, with a bound on the remainder.

Characteristic functions and the CLT

The characteristic-function proof of the CLT expands \log \varphi_{X_i}(t/\sqrt{n}) as a Taylor series in t. The dominant terms give the Gaussian characteristic function e^{-\sigma^2 t^2/2}; the remainder vanishes as n \to \infty. See formalStatistics Central Limit Theorem.

Delta method and Laplace approximation

The delta method — \sqrt{n}(g(\hat{\theta}) - g(\theta)) \Rightarrow N(0, [g'(\theta)]^2 \sigma^2) — is a first-order Taylor expansion. The Laplace approximation Taylor-expands the log-posterior to second order around the MAP, producing a Gaussian approximation with covariance (-\nabla^2 \ell)^{-1} and the BIC penalty as its asymptotic form. See formalStatistics Expectation & Moments and formalStatistics Bayesian Model Comparison & BMA.

KDE bias and Edgeworth expansions

The KDE bias expansion E[\hat{f}_n(x)] - f(x) = (h^2/2) f''(x) \mu_2(K) + O(h^4) is a Taylor series in the bandwidth h. Bootstrap higher-order accuracy — O(n^{-1}) vs. the CLT’s O(n^{-1/2}) — follows from matching Edgeworth (Taylor) terms between the bootstrap and true distributions. See formalStatistics Kernel Density Estimation and formalStatistics Bootstrap.

10. Connections to ML — Taylor Expansions in Optimization and Inference

10.1 The descent lemma and gradient descent

If \nabla f is L-Lipschitz continuous, the second-order Taylor expansion with Lagrange remainder gives the descent lemma:

f(y) \leq f(x) + \nabla f(x)^T(y - x) + \frac{L}{2}\|y - x\|^2.

Setting y = x - \eta \nabla f(x) with step size \eta = 1/L yields

f(x_{k+1}) \leq f(x_k) - \frac{1}{2L}\|\nabla f(x_k)\|^2

— the fundamental inequality guaranteeing that gradient descent makes progress at every step. Newton’s method goes further: it minimizes the full quadratic Taylor model T_2(x) at each step, achieving quadratic convergence near optima. (→ formalML: Gradient Descent)
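The descent inequality can be verified step by step on a toy problem. A minimal sketch, assuming a hand-picked quadratic f(x) = \tfrac12 x^T A x whose gradient has Lipschitz constant L = \lambda_{\max}(A):

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])   # symmetric positive definite
L = np.linalg.eigvalsh(A).max()          # Lipschitz constant of grad f

f    = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x

x = np.array([1.0, -1.0])
for _ in range(20):
    g = grad(x)
    x_new = x - g / L                    # gradient step with eta = 1/L
    # descent lemma: f(x_{k+1}) <= f(x_k) - ||g||^2 / (2L)
    assert f(x_new) <= f(x) - g @ g / (2 * L) + 1e-12
    x = x_new

print(f(x))   # near 0: guaranteed progress at every step drove f to the minimum
```

For a quadratic, the inequality holds with room to spare because g^T A g \leq L \|g\|^2; the same per-step guarantee, summed over iterations, is what yields the standard O(1/k) convergence rate for gradient descent on smooth functions.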

10.2 The Laplace approximation

Given a posterior p(\theta \mid \text{data}) \propto e^{\ell(\theta)} where \ell(\theta) = \log p(\text{data} \mid \theta) + \log p(\theta), the second-order Taylor expansion of \ell at the MAP estimate \hat{\theta} gives

\ell(\theta) \approx \ell(\hat{\theta}) - \frac{1}{2}(\theta - \hat{\theta})^T H (\theta - \hat{\theta})

where H = -\nabla^2 \ell(\hat{\theta}). This yields the Gaussian approximation p(\theta \mid \text{data}) \approx \mathcal{N}(\hat{\theta}, H^{-1}) — replacing a complex posterior with a Gaussian centered at the mode. The quality of this approximation depends on how well the second-order Taylor expansion captures the log-posterior, which is exactly the analyticity question this topic addresses. (→ formalML: Information Geometry)
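A one-dimensional sketch of the recipe, under hypothetical coin-flip data with a flat Beta(1, 1) prior so the unnormalized log-posterior is (a-1)\log\theta + (b-1)\log(1-\theta): find the mode, estimate H by a finite difference, and compare the Laplace standard deviation H^{-1/2} with the exact Beta posterior's.

```python
import numpy as np

a, b = 12.0, 8.0                               # Beta(a, b) posterior shape parameters
logpost = lambda t: (a - 1) * np.log(t) + (b - 1) * np.log(1 - t)

theta_hat = (a - 1) / (a + b - 2)              # MAP: mode of the Beta(a, b) density
h = 1e-5                                       # H = -d^2 logpost / dt^2 at the mode
H = -(logpost(theta_hat + h) - 2 * logpost(theta_hat) + logpost(theta_hat - h)) / h**2
sigma = 1.0 / np.sqrt(H)                       # Laplace std dev

exact_sd = np.sqrt(a * b / ((a + b)**2 * (a + b + 1)))   # true Beta(a, b) std dev
print(theta_hat, sigma, exact_sd)              # mode 0.611..., sigma close to exact sd
```

The Laplace sigma slightly overshoots the exact Beta standard deviation here; that gap is precisely the third-and-higher-order Taylor terms of the log-posterior that the quadratic model drops.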

10.3 GELU and activation function approximation

The Gaussian Error Linear Unit \text{GELU}(x) = x \Phi(x), where \Phi is the standard normal CDF, is one of the most widely used activation functions in modern transformers. Its practical implementation uses a tanh-based approximation:

\text{GELU}(x) \approx 0.5x\bigl(1 + \tanh(\sqrt{2/\pi}(x + 0.044715 x^3))\bigr)

which is derived from the Taylor expansion of the error function. The coefficient 0.044715 comes from matching Taylor series terms — a direct application of power series truncation in production neural network code.
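The accuracy of the tanh approximation is easy to measure directly. A minimal sketch comparing the exact GELU (via `math.erf`) against the approximation above over a grid; the maximum gap on [-5, 5] is small, well under 10^{-2}:

```python
import math

def gelu_exact(x):
    """Exact GELU: x * Phi(x), with Phi expressed through the error function."""
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x):
    """The tanh-based approximation used in practice."""
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi)
                                      * (x + 0.044715 * x**3)))

gap = max(abs(gelu_exact(x) - gelu_tanh(x))
          for x in (i / 100.0 for i in range(-500, 501)))
print(gap)
```

For large |x| the two curves agree essentially exactly (both saturate to 0 or x), so the discrepancy is confined to a narrow band around the origin, exactly where the matched Taylor terms do their work.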

10.4 Matrix exponential in continuous-time models

For neural ODEs, state-space models (S4, Mamba), and continuous-time dynamical systems, the matrix exponential

e^{At} = I + At + \frac{(At)^2}{2!} + \frac{(At)^3}{3!} + \cdots = \sum_{n=0}^{\infty} \frac{A^n t^n}{n!}

is a power series in t with matrix coefficients. It converges for all t (since R = \infty for the scalar exponential, and the norm bound \|A^n t^n/n!\| \leq \|A\|^n |t|^n/n! gives a convergent comparison series). Term-by-term differentiation gives \frac{d}{dt} e^{At} = A e^{At}, the fundamental solution of the linear ODE \dot{x} = Ax.
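Both the series and the identity \frac{d}{dt} e^{At} = A e^{At} can be checked numerically. A sketch with a small rotation generator (for production code, SciPy's `scipy.linalg.expm` is the standard choice; truncated series are only safe when \|At\| is modest):

```python
import numpy as np

def expm_series(A, t, terms=30):
    """Truncated power series for e^{At}: sum of A^n t^n / n!."""
    result = np.eye(len(A))
    term = np.eye(len(A))
    for n in range(1, terms):
        term = term @ (A * t) / n        # accumulates A^n t^n / n!
        result = result + term
    return result

A = np.array([[0.0, 1.0], [-1.0, 0.0]])  # generator of plane rotations
t, h = 1.0, 1e-5

E = expm_series(A, t)                    # should be [[cos t, sin t], [-sin t, cos t]]
dE = (expm_series(A, t + h) - expm_series(A, t - h)) / (2 * h)
print(np.max(np.abs(dE - A @ E)))        # term-by-term differentiation identity holds
```

Thirty terms suffice here because the factorial denominators dominate \|A\|^n |t|^n almost immediately, the matrix version of R = \infty.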

ML connections — descent lemma, Laplace approximation, GELU, and matrix exponential

11. Computational Notes

  • Power series evaluation: Horner’s method. Evaluating \sum_{k=0}^{n} a_k x^k naively requires O(n^2) multiplications. Horner’s method rewrites a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n = a_0 + x(a_1 + x(a_2 + \cdots + x \cdot a_n)) and evaluates from the inside out in O(n) multiplications with minimal round-off. Use numpy.polynomial.polynomial.polyval for numerical stability.
  • Radius of convergence estimation. For tabulated coefficients, compute |a_n|^{1/n} for large n and look for convergence to 1/R. The tail average of this sequence gives a reliable numerical estimate.
  • Endpoint testing is algorithmic. Compute R via the ratio or Cauchy-Hadamard formula, then substitute x = c \pm R into the series and apply the convergence-test flowchart from Series Convergence & Tests.
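The Horner evaluation from the first note can be sketched in a few lines and checked against NumPy's `polyval` (which takes coefficients in ascending order, matching the series notation):

```python
import math
from numpy.polynomial import polynomial as P

def horner(coeffs, x):
    """Evaluate a_0 + a_1 x + ... + a_n x^n with n multiplications."""
    acc = 0.0
    for a in reversed(coeffs):           # a_0 + x(a_1 + x(a_2 + ...))
        acc = a + x * acc
    return acc

coeffs = [1.0 / math.factorial(k) for k in range(18)]   # truncated e^x series
x = 1.5
print(horner(coeffs, x), P.polyval(x, coeffs), math.exp(x))
```

With 18 terms the truncation error at x = 1.5 is below 10^{-12}, so all three printed values agree to near machine precision.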

Computational verification — Horner's method and radius estimation from finite coefficients

Connections & Further Reading

Prerequisites — topics you need first

intermediate Limits & Continuity 45 min

Uniform Convergence

The Weierstrass M-test (Topic 4) proves that power series converge uniformly on compact subsets of their interval of convergence. The interchange theorems for integration and differentiation (Topic 4) then justify term-by-term calculus — the most powerful computational tool for power series.

foundational Series & Approximation 45 min

Series Convergence & Tests

Power series are series whose terms are functions aₙ(x−c)ⁿ rather than fixed numbers. The ratio test gives R = lim|aₙ/aₙ₊₁|, the root test gives R = 1/limsup|aₙ|^(1/n) (Cauchy-Hadamard), and endpoint analysis uses comparison and alternating series tests — all from Topic 17.

intermediate Single-Variable Calculus 55 min

Mean Value Theorem & Taylor Expansion

Topic 6 built Taylor polynomials Tₙ(x) and proved remainder bounds Rₙ(x), but left the question: when does Rₙ(x) → 0 as n → ∞? Topic 18 answers this by characterizing analytic functions — those for which the Taylor series converges to f — and providing the radius-of-convergence framework.

foundational Single-Variable Calculus 50 min

The Riemann Integral & FTC

Term-by-term integration of power series produces antiderivatives expressed as power series. The integral ∫₀ˣ 1/(1+t²)dt = arctan(x) can be computed by integrating the geometric series 1/(1+t²) = ∑(-1)ⁿt²ⁿ term by term, yielding the Leibniz formula arctan(x) = ∑(-1)ⁿx^(2n+1)/(2n+1).

intermediate Single-Variable Calculus 50 min

Improper Integrals & Special Functions

Power series with infinite radius of convergence (eˣ, sin x, cos x) define entire functions whose improper integrals are computed via term-by-term integration. The Gaussian integral ∫e^(−x²)dx is evaluated by expanding e^(−x²) as a power series and integrating term by term.

On to formalStatistics — where this calculus powers inference

Central Limit Theorem

The characteristic-function proof of the CLT expands log φ_{X_i}(t/√n) as a Taylor series in t. The dominant terms give the Gaussian characteristic function e^(-σ²t²/2); the remainder terms vanish as n → ∞.

Expectation Moments

The delta method: if √n(θ̂ - θ) ⟹ N(0, σ²) and g is differentiable, then √n(g(θ̂) - g(θ)) ⟹ N(0, [g'(θ)]² σ²). The proof is a 1st-order Taylor expansion g(θ̂) = g(θ) + g'(θ)(θ̂ - θ) + o_p(1/√n).

Bayesian Model Comparison And Bma

Laplace approximation Taylor-expands the log-posterior to second order around the MAP. The resulting Gaussian approximation has covariance matrix (-∇²ℓ)⁻¹. BIC's asymptotic derivation rests on the higher-order error term of this expansion.

Kernel Density Estimation

The KDE bias expansion E[f̂_n(x)] - f(x) = (h²/2) f''(x) μ_2(K) + O(h⁴) is a Taylor series in the bandwidth h. The dominant bias term gives the asymptotic MSE; the optimal bandwidth h* balances bias² against variance.

Bootstrap

The Edgeworth expansion of √n(θ̂_n - θ) uses Taylor series in the characteristic function. Bootstrap higher-order accuracy — O(n⁻¹) vs. the CLT's O(n^(-1/2)) — follows from matching Edgeworth terms between the bootstrap and true distributions.

Order Statistics And Quantiles

The Bahadur representation of sample quantiles, ξ̂_p = ξ_p - (F_n(ξ_p) - p)/f(ξ_p) + o_p(n^{-1/2}), is a Taylor-series-flavored asymptotic — a first-order linearization of the inverse-CDF map around the true quantile.

References

  1. book Rudin (1976). Principles of Mathematical Analysis Chapter 8 — the definitive treatment of power series, uniform convergence on compact subsets, and the algebra of power series
  2. book Abbott (2015). Understanding Analysis Chapter 6 — power series and Taylor series with an emphasis on the role of uniform convergence in justifying interchange
  3. book Spivak (2008). Calculus Chapter 23 — Taylor series with complete proofs of term-by-term theorems and the analytic vs. smooth distinction
  4. book Folland (1999). Real Analysis: Modern Techniques and Their Applications Section 0.6 — power series in the context of analysis prerequisites for measure theory
  5. book Bartle & Sherbert (2011). Introduction to Real Analysis Chapter 9 — power series convergence with careful attention to endpoint behavior and applications
  6. paper Hendrycks & Gimpel (2016). “Gaussian Error Linear Units (GELUs)” GELU(x) = x·Φ(x) is approximated via its Taylor expansion 0.5x(1 + tanh(√(2/π)(x + 0.044715x³))) — a direct application of power series truncation in neural network activation design.
  7. paper MacKay (1992). “A Practical Bayesian Framework for Backpropagation Networks” The Laplace approximation replaces the posterior with a Gaussian centered at the MAP estimate using a second-order Taylor expansion of the log-posterior — a foundational Bayesian ML technique.