Series & Approximation · intermediate · 50 min read

Power Series & Taylor Series

When do infinite polynomial expansions converge — and when can you differentiate and integrate them term by term? The bridge from finite Taylor approximations to infinite series representations, with the radius of convergence as the gatekeeper.

Abstract. A power series ∑ aₙ(x−c)ⁿ is an infinite polynomial — a series whose terms are functions of x rather than fixed numbers. The Cauchy-Hadamard theorem determines a radius of convergence R = 1/limsup|aₙ|^(1/n): the series converges absolutely for |x−c| < R and diverges for |x−c| > R. At the endpoints x = c ± R, convergence must be tested case by case using the convergence tests from Series Convergence & Tests. Inside the radius of convergence, a power series converges uniformly on every compact subset, which — via the interchange theorems from Uniform Convergence — justifies term-by-term differentiation and integration: the derivative of a power series is the series of derivatives, and the integral of a power series is the series of integrals, both with the same radius R. This makes power series infinitely differentiable inside their radius of convergence. A Taylor series ∑ f⁽ⁿ⁾(c)/n! · (x−c)ⁿ is a power series whose coefficients are determined by the derivatives of f at the center c. For analytic functions — those whose Taylor series converges to f in a neighborhood of c — the Taylor series provides a complete representation. But smooth does not imply analytic: the function e^(−1/x²) is C^∞ at the origin with all derivatives zero, yet is not identically zero, so its Taylor series converges to the wrong function. The Taylor series catalog (eˣ, sin x, cos x, ln(1+x), 1/(1−x), the binomial series (1+x)^α) forms the backbone of local approximation in both pure mathematics and applied ML. In machine learning, Taylor expansions appear in the descent lemma for gradient descent convergence, the Laplace approximation of posterior distributions, GELU activation function computation, and the matrix exponential for continuous-time dynamical models.

1. Overview & Motivation — From Finite to Infinite

Mean Value Theorem & Taylor Expansion gave us Taylor polynomials — finite sums $T_n(x) = \sum_{k=0}^{n} \frac{f^{(k)}(a)}{k!}(x-a)^k$ that approximate $f$ near $a$ . Series Convergence & Tests gave us convergence tests for infinite sums of numbers. What happens when we let $n \to \infty$ in the Taylor polynomial — when does the infinite sum converge, and does it converge to $f$ ?

More generally: what happens when the terms of a series depend on $x$ ? The geometric series $\sum_{n=0}^{\infty} x^n = \frac{1}{1-x}$ already demonstrated this in Series Convergence & Tests — it converges for $|x| < 1$ and diverges for $|x| \geq 1$ . Power series generalize this pattern.

Why this matters in ML. Neural network activation functions are often approximated by truncated Taylor series. The GELU activation $\text{GELU}(x) = x \cdot \Phi(x)$ is implemented in practice as $0.5x(1 + \tanh(\sqrt{2/\pi}(x + 0.044715x^3)))$ , a tanh-based approximation whose cubic correction term is closely tied to a Taylor expansion of $\Phi$ . Understanding when and where such truncations are valid requires the theory of power series convergence.

This topic sits at the intersection of three prerequisites. Series Convergence & Tests provides the convergence tests (ratio, root) that determine the radius of convergence. Uniform Convergence provides the uniform convergence theory that justifies term-by-term calculus. Mean Value Theorem & Taylor Expansion provides the Taylor polynomial machinery whose infinite extension we now analyze.

Power series overview — three behaviors: convergence inside R, divergence outside R, and the entire-function case R = ∞

2. Power Series — Definition and First Examples

📐 Definition 1 (Power Series)

A power series centered at $c$ is an expression of the form

$\sum_{n=0}^{\infty} a_n (x - c)^n = a_0 + a_1(x-c) + a_2(x-c)^2 + \cdots$

where $a_0, a_1, a_2, \ldots$ are real constants called the coefficients and $c$ is the center. The special case $c = 0$ gives $\sum a_n x^n$ .

A power series is not a single number — it is a function of $x$ . For each value of $x$ , we get a numerical series $\sum a_n (x-c)^n$ that may converge or diverge. The central question of this topic is: for which values of $x$ does the series converge?

📝 Example 1 (The geometric series as a power series)

$\sum_{n=0}^{\infty} x^n$ has $a_n = 1$ for all $n$ and center $c = 0$ . From Series Convergence & Tests, this converges to $\frac{1}{1-x}$ for $|x| < 1$ and diverges for $|x| \geq 1$ . This is the prototype: convergence on an open interval, divergence outside it.

📝 Example 2 (The exponential series (R = ∞))

$\sum_{n=0}^{\infty} \frac{x^n}{n!}$ has $a_n = 1/n!$ and converges for all $x \in \mathbb{R}$ . From Mean Value Theorem & Taylor Expansion, we know this equals $e^x$ . The ratio of consecutive terms is $|x|/(n+1) \to 0$ for every fixed $x$ , so the ratio test gives convergence everywhere.

📝 Example 3 (A series that converges only at its center (R = 0))

$\sum_{n=0}^{\infty} n! \, x^n$ diverges for every $x \neq 0$ . The ratio $|a_{n+1} x^{n+1}/(a_n x^n)| = (n+1)|x| \to \infty$ for any fixed $x \neq 0$ . The “radius of convergence” is $0$ — this series is useless as a function of $x$ .

💡 Remark 1 (Power series generalize polynomials)

A polynomial of degree $N$ is a power series with $a_n = 0$ for $n > N$ . It converges everywhere ( $R = \infty$ ). Power series extend this to “infinite-degree polynomials,” but the trade-off is that convergence is no longer automatic.

3. Radius of Convergence

The three examples above illustrate a remarkable structural fact: a power series always converges on an interval centered at $c$ . The half-width of this interval is the radius of convergence — and it is determined entirely by the coefficients.

🔷 Theorem 1 (Existence of the Radius of Convergence)

For any power series $\sum a_n (x - c)^n$ , exactly one of the following holds:

(i) The series converges only at $x = c$ ( $R = 0$ ).

(ii) The series converges for all $x \in \mathbb{R}$ ( $R = \infty$ ).

(iii) There exists $R > 0$ such that the series converges absolutely for $|x - c| < R$ and diverges for $|x - c| > R$ .

Proof.

Suppose $\sum a_n (x_0 - c)^n$ converges at some $x_0 \neq c$ . Then $a_n(x_0 - c)^n \to 0$ (by the divergence test from Series Convergence & Tests), so $|a_n(x_0 - c)^n| \leq M$ for some bound $M$ . For any $x$ with $|x - c| < |x_0 - c|$ , set $r = |x - c|/|x_0 - c| < 1$ . Then

$|a_n(x-c)^n| = |a_n(x_0-c)^n| \cdot r^n \leq M r^n.$

Since $\sum M r^n$ converges (geometric series with $r < 1$ ), the comparison test gives absolute convergence at $x$ .

Now define $R = \sup\{|x_0 - c| : \sum a_n(x_0-c)^n \text{ converges}\}$ . If the series converges only at $c$ , then $R = 0$ (case i). If the supremum is infinite, then $R = \infty$ (case ii). Otherwise, $R$ is a positive real number (case iii): the argument above shows convergence for $|x - c| < R$ , and divergence for $|x - c| > R$ follows because if the series converged at some $x_1$ with $|x_1 - c| > R$ , it would also converge at all $x$ with $|x - c| < |x_1 - c|$ , contradicting $R = \sup$ .

∎

📐 Definition 2 (Radius of Convergence)

The number $R$ from Theorem 1 is the radius of convergence of $\sum a_n(x-c)^n$ . We allow $R = 0$ and $R = \infty$ . The open interval $(c-R, c+R)$ is the open interval of convergence.

How do we compute $R$ ? The root and ratio tests from Series Convergence & Tests, applied to the coefficient sequence, give explicit formulas.

🔷 Theorem 2 (The Cauchy-Hadamard Formula)

$\frac{1}{R} = \limsup_{n \to \infty} |a_n|^{1/n}$

with the convention $1/0 = \infty$ and $1/\infty = 0$ .

Proof.

Apply the root test from Series Convergence & Tests to $\sum |a_n(x-c)^n|$ :

$\limsup_{n \to \infty} |a_n(x-c)^n|^{1/n} = |x-c| \cdot \limsup_{n \to \infty} |a_n|^{1/n}.$

The root test gives convergence when this is $< 1$ , i.e., $|x-c| < 1/\limsup |a_n|^{1/n} = R$ , and divergence when $> 1$ , i.e., $|x-c| > R$ .

∎

🔷 Theorem 3 (Ratio Test for Radius)

If $\lim_{n \to \infty} |a_{n+1}/a_n|$ exists (possibly $= 0$ or $= \infty$ ), then

$R = \lim_{n \to \infty} \left|\frac{a_n}{a_{n+1}}\right|.$

💡 Remark 2 (Root test vs. ratio test)

The Cauchy-Hadamard formula always works (it uses limsup). The ratio test requires the limit to exist. When both apply, they give the same $R$ . From Series Convergence & Tests, the root test is strictly stronger — there exist series where the ratio test is inconclusive but the root test determines $R$ .

📝 Example 4 (Computing R for standard series)

(a) $\sum x^n/n!$ : ratio gives $R = \lim_{n \to \infty} (n+1) = \infty$ .

(b) $\sum n! \, x^n$ : ratio gives $R = \lim_{n \to \infty} 1/(n+1) = 0$ .

(c) $\sum x^n/n$ : ratio gives $R = \lim_{n \to \infty} (n+1)/n = 1$ .

(d) $\sum x^n/n^2$ : ratio gives $R = \lim_{n \to \infty} (n+1)^2/n^2 = 1$ .

(e) $\sum n^n x^n$ : Cauchy-Hadamard gives $1/R = \limsup n = \infty$ , so $R = 0$ .
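As a numerical companion to Example 4, the sketch below evaluates both diagnostics, $|a_n/a_{n+1}|$ and $1/|a_n|^{1/n}$ , at a few indices. The helper names and the choice to work with $\log|a_n|$ (to avoid overflow in $n!$ ) are illustrative assumptions, not part of the theory above.

```python
import math

def ratio_diagnostic(log_abs_a, n):
    """|a_n / a_{n+1}|, computed from log|a_n| to avoid overflow."""
    return math.exp(log_abs_a(n) - log_abs_a(n + 1))

def root_diagnostic(log_abs_a, n):
    """1 / |a_n|^(1/n), the Cauchy-Hadamard estimate of R at index n."""
    return math.exp(-log_abs_a(n) / n)

# log|a_n| for three of the series in Example 4 (illustrative selection)
series = {
    "x^n/n!": lambda n: -math.lgamma(n + 1),   # a_n = 1/n!
    "x^n/n": lambda n: -math.log(n),           # a_n = 1/n
    "x^n/n^2": lambda n: -2.0 * math.log(n),   # a_n = 1/n^2
}

for name, log_a in series.items():
    for n in (10, 100, 1000):
        print(name, n, ratio_diagnostic(log_a, n), root_diagnostic(log_a, n))
```

For the two series with $R = 1$ , both diagnostics drift toward $1$ , with the ratio version settling faster. For $\sum x^n/n!$ both grow without bound, consistent with $R = \infty$ .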

The explorer below lets you see the diagnostic sequences $|a_n|^{1/n}$ and $|a_{n+1}/a_n|$ converging to $1/R$ , and probe what happens when you evaluate the series at points inside and outside the radius.


Radius of convergence — Cauchy-Hadamard and ratio test diagnostics converging to 1/R

4. Endpoint Behavior — Where the Tests from Topic 17 Come to Work

The radius $R$ determines convergence on the open interval $(c-R, c+R)$ and divergence outside $[c-R, c+R]$ . But at the endpoints $x = c \pm R$ themselves, the power series becomes a numerical series — and you must test it directly using the convergence toolkit from Series Convergence & Tests.

📐 Definition 3 (Interval of Convergence)

The interval of convergence of $\sum a_n(x-c)^n$ is the set of all $x$ where the series converges. It always includes the open interval $(c-R, c+R)$ and may or may not include either endpoint.

📝 Example 5 (Three endpoint behaviors)

All three of the following have $R = 1$ centered at $c = 0$ :

(a) $\sum x^n$ : diverges at both endpoints. At $x = 1$ : $\sum 1$ diverges. At $x = -1$ : $\sum (-1)^n$ diverges. Interval: $(-1, 1)$ .

(b) $\sum x^n/n^2$ : converges at both endpoints. At $x = 1$ : $\sum 1/n^2$ converges ( $p$ -series, $p = 2$ ). At $x = -1$ : $\sum (-1)^n/n^2$ converges absolutely. Interval: $[-1, 1]$ .

(c) $\sum x^n/n$ : mixed. At $x = -1$ : $\sum (-1)^n/n$ converges by the alternating series (Leibniz) test. At $x = 1$ : $\sum 1/n$ diverges (harmonic). Interval: $[-1, 1)$ .

💡 Remark 3 (The four endpoint possibilities)

With two endpoints and two possible verdicts (converge/diverge) at each, there are four combinations: $(c-R, c+R)$ , $[c-R, c+R]$ , $[c-R, c+R)$ , $(c-R, c+R]$ . All four occur in practice. The algorithm is always the same: (1) compute $R$ via ratio or Cauchy-Hadamard; (2) substitute $x = c \pm R$ and apply a convergence test from Series Convergence & Tests.

📝 Example 6 (Endpoint analysis with comparison and alternating series tests)

For $\sum \frac{(-1)^n x^n}{\sqrt{n+1}}$ , the ratio test gives $R = 1$ . At $x = 1$ : $\sum (-1)^n/\sqrt{n+1}$ converges by the Leibniz test (alternating, terms decrease to $0$ ). At $x = -1$ : $\sum 1/\sqrt{n+1}$ diverges by comparison with $\sum 1/\sqrt{n}$ ( $p$ -series with $p = 1/2 < 1$ ). Interval: $(-1, 1]$ .
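The endpoint verdicts can be illustrated numerically, though partial sums are evidence rather than proof. The sketch below, with a hypothetical helper `partial_sum`, evaluates the series $\sum x^n/n$ of Example 5(c) at both endpoints.

```python
import math

def partial_sum(x, terms):
    """S_N = sum_{n=1}^{N} x^n / n for the series of Example 5(c)."""
    return sum(x**n / n for n in range(1, terms + 1))

for terms in (10, 1000, 100000):
    left = partial_sum(-1.0, terms)    # alternating harmonic series, sign flipped
    right = partial_sum(1.0, terms)    # harmonic series
    print(terms, left, right, math.log(terms))
```

At $x = -1$ the partial sums settle near $-\ln 2$ , while at $x = 1$ they keep growing roughly like $\ln N$ . This matches the interval $[-1, 1)$ .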

Endpoint behavior — the four possible interval types demonstrated by three standard series

5. Uniform Convergence on Compact Subsets

This is the section where the threads come together. We connect power series to the uniform convergence theory from Uniform Convergence, establishing the key property that makes everything in the next section work.

🔷 Theorem 4 (Uniform Convergence on Compact Subsets)

If $\sum a_n(x-c)^n$ has radius of convergence $R > 0$ , then the series converges uniformly on every closed interval $[c-r, c+r]$ for $0 < r < R$ .

Proof.

Fix $r$ with $0 < r < R$ and let $M_n = |a_n| r^n$ . For $|x - c| \leq r$ , we have

$|a_n(x-c)^n| \leq |a_n| r^n = M_n.$

Since $r < R$ , the series $\sum M_n = \sum |a_n| r^n$ converges (by the definition of $R$ — the power series converges absolutely for $|x - c| < R$ , and $r < R$ ). By the Weierstrass M-test from Uniform Convergence, the series $\sum a_n(x-c)^n$ converges uniformly on $[c-r, c+r]$ .

∎

💡 Remark 4 (Why compact subsets, not the full interval)

The power series $\sum x^n = 1/(1-x)$ converges pointwise on $(-1, 1)$ but does not converge uniformly on all of $(-1, 1)$ . The partial sums $S_n(x) = (1-x^{n+1})/(1-x)$ satisfy

$\sup_{x \in (-1,1)} |S_n(x) - 1/(1-x)| = \sup_{x \in (-1,1)} \frac{|x|^{n+1}}{|1-x|} = \infty$

for every $n$ (the supremum blows up as $x \to 1^-$ ). The uniform convergence theorem only guarantees uniformity on $[-r, r]$ for $r < 1$ — compact subsets strictly inside the interval of convergence. This is the same pointwise-vs-uniform distinction from Uniform Convergence, now appearing in a concrete power-series context.
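To make the contrast concrete, the following sketch approximates the sup-norm error of the geometric partial sums on $[-r, r]$ using a grid. The function name `sup_error` and the grid size are assumptions chosen for illustration. The exact value is $r^{n+1}/(1-r)$ , attained at $x = r$ .

```python
import numpy as np

def sup_error(n, r, grid=20001):
    """Approximate sup over [-r, r] of |S_n(x) - 1/(1-x)| for the geometric series."""
    x = np.linspace(-r, r, grid)                  # grid includes both endpoints
    partial = sum(x**k for k in range(n + 1))     # S_n(x) = 1 + x + ... + x^n
    return float(np.max(np.abs(partial - 1.0 / (1.0 - x))))

for r in (0.5, 0.9, 0.999):
    print(r, [sup_error(n, r) for n in (10, 50, 200)])
```

For each fixed $r < 1$ the error shrinks as $n$ grows, but for fixed $n$ it grows without bound as $r \to 1$ . That is why uniformity holds on $[-r, r]$ but fails on all of $(-1, 1)$ .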

Uniform convergence on compact subsets — sup-norm error decreasing on [-r,r] for r < R

6. Term-by-Term Differentiation & Integration

Because power series converge uniformly on compact subsets, the interchange theorems from Uniform Convergence apply. We can differentiate and integrate a power series term by term — and the resulting series has the same radius of convergence. This is the computational payoff of the theory.

🔷 Theorem 5 (Term-by-Term Differentiation)

If $f(x) = \sum_{n=0}^{\infty} a_n (x-c)^n$ has radius of convergence $R > 0$ , then $f$ is differentiable on $(c-R, c+R)$ and

$f'(x) = \sum_{n=1}^{\infty} n a_n (x-c)^{n-1}.$

The differentiated series has the same radius of convergence $R$ .

Proof.

Fix $x_0 \in (c-R, c+R)$ and choose $r$ with $|x_0 - c| < r < R$ . On $[c-r, c+r]$ , the series converges uniformly (Theorem 4). The partial sums $S_N(x) = \sum_{n=0}^{N} a_n(x-c)^n$ are polynomials, hence differentiable. Each $S_N'(x) = \sum_{n=1}^{N} n a_n(x-c)^{n-1}$ .

The differentiated series $\sum n a_n(x-c)^{n-1}$ has

$\limsup_{n \to \infty} |n a_n|^{1/n} = \limsup_{n \to \infty} \bigl(n^{1/n} |a_n|^{1/n}\bigr) = 1 \cdot \limsup_{n \to \infty} |a_n|^{1/n} = \frac{1}{R}$

since $n^{1/n} \to 1$ . So the differentiated series has the same radius $R$ and converges uniformly on $[c-r, c+r]$ .

By the interchange theorem for differentiation from Uniform Convergence, $f'(x_0) = \lim_{N \to \infty} S_N'(x_0) = \sum_{n=1}^{\infty} n a_n(x_0-c)^{n-1}$ .

∎

🔷 Corollary 1 (Power series are C^∞)

Applying Theorem 5 repeatedly, $f^{(k)}(x) = \sum_{n=k}^{\infty} \frac{n!}{(n-k)!} a_n (x-c)^{n-k}$ for all $k \geq 0$ , each with radius $R$ . A power series is infinitely differentiable inside its radius of convergence.

🔷 Theorem 6 (Term-by-Term Integration)

If $f(x) = \sum_{n=0}^{\infty} a_n (x-c)^n$ has radius $R > 0$ , then

$\int_c^x f(t)\,dt = \sum_{n=0}^{\infty} \frac{a_n}{n+1}(x-c)^{n+1}$

for $|x - c| < R$ . The integrated series has radius of convergence $R$ .

📝 Example 7 (Deriving 1/(1−x)² by differentiation)

Differentiating $\frac{1}{1-x} = \sum_{n=0}^{\infty} x^n$ term by term gives

$\frac{1}{(1-x)^2} = \sum_{n=1}^{\infty} n x^{n-1} = \sum_{n=0}^{\infty} (n+1) x^n \quad \text{for } |x| < 1.$

📝 Example 8 (Deriving ln(1+x) by integration)

Integrate $\frac{1}{1+x} = \sum_{n=0}^{\infty} (-x)^n = \sum_{n=0}^{\infty} (-1)^n x^n$ term by term from $0$ to $x$ :

$\ln(1+x) = \int_0^x \frac{dt}{1+t} = \sum_{n=0}^{\infty} \frac{(-1)^n}{n+1} x^{n+1} = \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n} x^n \quad \text{for } |x| < 1.$

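The series also converges at $x = 1$ by the alternating series test (Leibniz) from Series Convergence & Tests. Abel's theorem (a power series that converges at an endpoint of its interval is continuous there) then extends the identity to $x = 1$ , giving $\ln 2 = 1 - 1/2 + 1/3 - 1/4 + \cdots$ .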

📝 Example 9 (The arctangent series)

Integrate $\frac{1}{1+t^2} = \sum_{n=0}^{\infty} (-1)^n t^{2n}$ to get

$\arctan(x) = \sum_{n=0}^{\infty} \frac{(-1)^n}{2n+1} x^{2n+1} \quad \text{for } |x| \leq 1.$

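Term-by-term integration justifies this for $|x| < 1$ . The series also converges at $x = \pm 1$ by the alternating series test, and Abel's theorem extends the identity to the endpoints. Setting $x = 1$ gives the Leibniz formula $\frac{\pi}{4} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots$ .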
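Examples 7 to 9 can be checked numerically. The sketch below compares partial sums of the differentiated and integrated series against closed forms; the function names are hypothetical helpers, not library APIs.

```python
import math

def geometric_derivative_partial(x, terms):
    """Partial sum of sum_{n>=1} n x^(n-1), the differentiated geometric series."""
    return sum(n * x**(n - 1) for n in range(1, terms + 1))

def log1p_partial(x, terms):
    """Partial sum of sum_{n>=1} (-1)^(n+1) x^n / n, the integrated series for ln(1+x)."""
    return sum((-1)**(n + 1) * x**n / n for n in range(1, terms + 1))

def arctan_partial(x, terms):
    """Partial sum of sum_{n>=0} (-1)^n x^(2n+1) / (2n+1)."""
    return sum((-1)**n * x**(2 * n + 1) / (2 * n + 1) for n in range(terms))

x = 0.5
print(geometric_derivative_partial(x, 60), 1.0 / (1.0 - x)**2)
print(log1p_partial(x, 60), math.log1p(x))
print(arctan_partial(x, 30), math.atan(x))
print(4.0 * arctan_partial(1.0, 1000), math.pi)   # endpoint: slow convergence
```

Inside the radius the partial sums agree with the closed forms to near machine precision after a few dozen terms. At the endpoint $x = 1$ the arctangent series still converges, but only at a rate of roughly $1/N$ .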

The explorer below lets you see term-by-term differentiation and integration in action. Toggle between the two modes and watch how the partial sums of the derived/integrated series track the true derivative/integral.


Term-by-term calculus — differentiation of 1/(1−x) and integration of 1/(1+x)

7. Taylor Series as Power Series — The Infinite Extension

A Taylor series is a power series whose coefficients are determined by the derivatives of a function. The question is: when does this particular power series converge to the function?

🔷 Theorem 7 (Coefficient Extraction (Uniqueness))

If $f(x) = \sum_{n=0}^{\infty} a_n (x-c)^n$ on some interval $(c-R, c+R)$ with $R > 0$ , then

$a_n = \frac{f^{(n)}(c)}{n!}$

for all $n \geq 0$ . A power series representation of a function is necessarily its Taylor series.

Proof.

By Corollary 1, $f^{(k)}(x) = \sum_{n=k}^{\infty} \frac{n!}{(n-k)!} a_n (x-c)^{n-k}$ . Setting $x = c$ , all terms with $n > k$ vanish (they contain a factor $(c-c)^{n-k} = 0$ ), leaving $f^{(k)}(c) = k! \, a_k$ . Solving gives $a_k = f^{(k)}(c)/k!$ .

∎

💡 Remark 5 (Uniqueness has teeth)

If two power series $\sum a_n x^n$ and $\sum b_n x^n$ are equal on some open interval containing $0$ , then $a_n = b_n$ for all $n$ . You cannot have two different power series representations of the same function centered at the same point. This makes power series representations canonical.

📝 Example 10 (The Taylor series catalog)

The six essential Taylor series at $c = 0$ (Maclaurin series):

1. $e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!}$ , $R = \infty$

2. $\sin x = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!} x^{2n+1}$ , $R = \infty$

3. $\cos x = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n)!} x^{2n}$ , $R = \infty$

4. $\ln(1+x) = \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n} x^n$ , $R = 1$ , also converges at $x = 1$

5. $\frac{1}{1-x} = \sum_{n=0}^{\infty} x^n$ , $R = 1$

6. $(1+x)^\alpha = \sum_{n=0}^{\infty} \binom{\alpha}{n} x^n$ (binomial series), $R = 1$ unless $\alpha$ is a nonnegative integer, in which case the series is a finite sum and $R = \infty$
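The binomial series is the least familiar entry in the catalog, so here is a minimal sketch of its partial sums. It assumes the recurrence $\binom{\alpha}{n+1} = \binom{\alpha}{n} \cdot \frac{\alpha - n}{n+1}$ for the generalized binomial coefficients; the function name is hypothetical.

```python
def binomial_series(alpha, x, terms):
    """Partial sum of sum_n C(alpha, n) x^n with generalized binomial coefficients."""
    coeff, power, total = 1.0, 1.0, 0.0   # C(alpha, 0) = 1, x^0 = 1
    for n in range(terms):
        total += coeff * power
        coeff *= (alpha - n) / (n + 1)    # C(alpha, n+1) from C(alpha, n)
        power *= x
    return total

print(binomial_series(0.5, 0.3, 60), 1.3**0.5)   # inside the radius
print(binomial_series(3, 2.0, 10), 3.0**3)       # nonnegative integer exponent: finite sum
print(binomial_series(0.5, 1.5, 200))            # outside the radius: partial sums blow up
```

With a nonnegative integer exponent the coefficients vanish after degree $\alpha$ , so the sum is a polynomial and is valid for every $x$ . For $\alpha = 1/2$ the partial sums track $\sqrt{1+x}$ only when $|x| < 1$ .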

The flagship explorer below shows partial sums of these Taylor series converging to their target functions. Select a function, drag the $n$ slider, and watch $S_n(x)$ approach $f(x)$ inside the radius of convergence — and diverge wildly outside it.


Taylor series catalog — six essential series with partial sums overlaid on target functions

8. Analytic vs. Smooth — When Taylor Series Succeed and Fail

Every power series with $R > 0$ defines a $C^\infty$ function (Corollary 1). But does every $C^\infty$ function equal its Taylor series near the center? The answer is no, and understanding why is the deepest insight of this topic.

📐 Definition 4 (Analytic Function)

A function $f$ is real analytic at $c$ if its Taylor series at $c$ converges to $f(x)$ in some neighborhood of $c$ :

$f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(c)}{n!}(x-c)^n \quad \text{for } |x - c| < R, \; R > 0.$

A function is analytic on an interval if it is analytic at every point of that interval. The class of analytic functions is denoted $C^\omega$ .

🔷 Theorem 8 (Sufficient Condition for Analyticity)

If $f$ is $C^\infty$ on an interval $I$ containing $c$ and there exist constants $M > 0$ and $K > 0$ such that

$|f^{(n)}(x)| \leq M \cdot K^n \cdot n! \quad \text{for all } n \text{ and all } x \in I,$

then $f$ is analytic at $c$ with $R \geq 1/K$ .

Proof.

The Lagrange remainder from Mean Value Theorem & Taylor Expansion satisfies

$|R_n(x)| \leq \frac{|f^{(n+1)}(\xi)|}{(n+1)!} |x-c|^{n+1} \leq \frac{M K^{n+1} (n+1)!}{(n+1)!} |x-c|^{n+1} = M (K|x-c|)^{n+1}.$

For $|x-c| < 1/K$ , this is a geometric sequence converging to $0$ , so $R_n(x) \to 0$ and the Taylor series converges to $f$ .

∎

📝 Example 11 (e^x, sin x, cos x are entire (analytic everywhere))

For $e^x$ : on any interval $[-B, B]$ , every derivative satisfies $|f^{(n)}(x)| = e^x \leq e^B$ . Theorem 8 applies with $M = e^B$ and $K = 1$ , but that choice only yields $R \geq 1$ , because it ignores the fact that the derivatives are bounded without the $n!$ factor. Inserting the tighter bound directly into the Lagrange remainder gives $|R_n(x)| \leq e^B |x|^{n+1}/(n+1)! \to 0$ for every $x \in [-B, B]$ . Since $B$ is arbitrary, the Taylor series converges to $e^x$ on all of $\mathbb{R}$ , so $R = \infty$ .

The same argument applies to $\sin x$ and $\cos x$ , where all derivatives are bounded by $1$ .

📝 Example 12 (Smooth but not analytic — e^{−1/x²} revisited)

This extends the discussion from Mean Value Theorem & Taylor Expansion, Example 8. Define

$f(x) = \begin{cases} e^{-1/x^2} & x \neq 0 \\ 0 & x = 0. \end{cases}$

All derivatives at $0$ are $0$ (the proof by induction uses L’Hôpital’s Rule repeatedly: each $f^{(n)}(x)$ is a polynomial in $1/x$ times $e^{-1/x^2}$ , and $e^{-1/x^2}$ decays faster than any polynomial as $x \to 0$ ). So the Taylor series at $0$ is $T(x) = 0$ . But $f(x) > 0$ for $x \neq 0$ . The Taylor series converges — but to the wrong function.

The derivative bound $|f^{(n)}(x)| \leq M K^n n!$ of Theorem 8 fails on every neighborhood of $0$ , for every choice of $M$ and $K$ : although all derivatives vanish at $0$ itself, the derivatives at points near $0$ grow faster than $K^n n!$ for any fixed $K$ .
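A quick numerical look at Example 12, as a sketch only: the helper `flat` below is a hypothetical name for the function $f$ , and the sample points are arbitrary. The ratio $f(x)/x^n$ tends to $0$ as $x \to 0$ for each fixed $n$ , which is consistent with every Taylor coefficient at $0$ being zero, while $f$ itself is visibly nonzero away from the origin.

```python
import math

def flat(x):
    """f(x) = exp(-1/x^2) for x != 0 and f(0) = 0 (Example 12)."""
    return math.exp(-1.0 / (x * x)) if x != 0.0 else 0.0

for x in (0.5, 0.1, 0.05):
    ratios = [flat(x) / x**n for n in (2, 10, 20)]
    print(x, flat(x), ratios)

# The Taylor series at 0 is identically zero, so its error at x is f(x) itself.
print(abs(flat(0.5) - 0.0))
```

The sample points are kept moderate on purpose: for extremely small $x$ the expression `x * x` underflows to zero in floating point, so this helper is not meant for values like `1e-200`.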

💡 Remark 6 (Analytic functions are rare but ubiquitous)

In a precise sense (Baire category), “most” smooth functions are not analytic. Yet in practice — in calculus, physics, and ML — almost every function we encounter is analytic (or piecewise analytic). The standard function zoo ( $e^x$ , $\sin$ , $\cos$ , $\ln$ , polynomials, rational functions, compositions thereof) is closed under the operations that preserve analyticity. The smooth-but-not-analytic examples are constructed to violate the derivative growth condition.


Analytic vs. smooth — Taylor series converging to f (left) vs. converging to the wrong function (right)

9. Connections to Statistics

Taylor series are the asymptotic backbone of statistical theory: characteristic-function CLT proofs, asymptotic normality of the MLE, Laplace approximation, the delta method, and the bias expansions for nonparametric estimators all expand to second order around a critical point and bound the remainder.

Characteristic functions and the CLT

The characteristic-function proof of the CLT expands $\log \varphi_{X_i}(t/\sqrt{n})$ as a Taylor series in $t$ . The dominant terms give the Gaussian characteristic function $e^{-\sigma^2 t^2 / 2}$ ; the remainder vanishes as $n \to \infty$ . See formalStatistics Central Limit Theorem.

Delta method and Laplace approximation

The delta method — $\sqrt{n}(g(\hat{\theta}) - g(\theta)) \Rightarrow N(0, [g'(\theta)]^2 \sigma^2)$ — is a 1st-order Taylor expansion. Laplace approximation Taylor-expands the log-posterior to 2nd order around the MAP, producing a Gaussian approximation with covariance $(-\nabla^2 \ell)^{-1}$ and the BIC penalty as its asymptotic form. See formalStatistics Expectation & Moments and formalStatistics Bayesian Model Comparison & BMA.

KDE bias and Edgeworth expansions

The KDE bias expansion $E[\hat{f}_n(x)] - f(x) = (h^2/2) f''(x) \mu_2(K) + O(h^4)$ is a Taylor series in the bandwidth $h$ . Bootstrap higher-order accuracy — $O(n^{-1})$ vs. the CLT’s $O(n^{-1/2})$ — follows from matching Edgeworth (Taylor) terms between the bootstrap and true distributions. See formalStatistics Kernel Density Estimation and formalStatistics Bootstrap.

10. Connections to ML — Taylor Expansions in Optimization and Inference

10.1 The descent lemma and gradient descent

If $\nabla f$ is $L$ -Lipschitz continuous, the first-order Taylor expansion of $f$ , with its remainder bounded using the Lipschitz condition, gives the descent lemma:

$f(y) \leq f(x) + \nabla f(x)^T(y - x) + \frac{L}{2}\|y - x\|^2.$

Setting $y = x - \eta \nabla f(x)$ with step size $\eta = 1/L$ yields

$f(x_{k+1}) \leq f(x_k) - \frac{1}{2L}\|\nabla f(x_k)\|^2$

This is the fundamental inequality guaranteeing that gradient descent with step size $1/L$ decreases $f$ at every step where the gradient is nonzero. Newton’s method goes further: it minimizes the full quadratic Taylor model $T_2$ at each step, achieving quadratic convergence near optima under standard assumptions. (→ formalML: Gradient Descent)
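The per-step decrease can be checked on a small example. The sketch below assumes a quadratic objective $f(x) = \tfrac{1}{2}x^T A x - b^T x$ with an arbitrary symmetric positive definite $A$ , for which $\nabla f$ is Lipschitz with constant $L = \lambda_{\max}(A)$ . The matrix, vector, and starting point are illustrative choices.

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])   # symmetric positive definite (illustrative)
b = np.array([1.0, -1.0])
L = float(np.linalg.eigvalsh(A).max())   # Lipschitz constant of the gradient

def f(x):
    return 0.5 * x @ A @ x - b @ x

def grad(x):
    return A @ x - b

x = np.array([5.0, -3.0])
history = []                              # (f(x_k), f(x_{k+1}), ||grad f(x_k)||^2)
for _ in range(50):
    g = grad(x)
    x_next = x - g / L                    # gradient step with eta = 1/L
    history.append((f(x), f(x_next), float(g @ g)))
    x = x_next

# Check f(x_{k+1}) <= f(x_k) - ||grad||^2 / (2L) at every step.
print(all(fn <= fx - gg / (2.0 * L) + 1e-12 for fx, fn, gg in history))
print(np.linalg.norm(grad(x)))
```

The small tolerance in the comparison only absorbs floating-point rounding; the inequality itself is exact for this objective.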

10.2 The Laplace approximation

Given a posterior $p(\theta \,|\, \text{data}) \propto e^{\ell(\theta)}$ where $\ell(\theta) = \log p(\text{data} \,|\, \theta) + \log p(\theta)$ , the second-order Taylor expansion of $\ell$ at the MAP estimate $\hat{\theta}$ gives

$\ell(\theta) \approx \ell(\hat{\theta}) - \frac{1}{2}(\theta - \hat{\theta})^T H (\theta - \hat{\theta})$

where $H = -\nabla^2 \ell(\hat{\theta})$ . This yields the Gaussian approximation $p(\theta \,|\, \text{data}) \approx \mathcal{N}(\hat{\theta}, H^{-1})$ , replacing a complex posterior with a Gaussian centered at the mode. The quality of this approximation depends on how small the remainder of the second-order Taylor expansion is over the region where the posterior has appreciable mass, which is the kind of remainder question this topic addresses. (→ formalML: Information Geometry)
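A one-dimensional illustration, under assumptions chosen for convenience: take $\ell(\theta) = a \log\theta + b \log(1-\theta)$ on $(0, 1)$ , which is the log-density (up to a constant) of a $\mathrm{Beta}(a+1, b+1)$ posterior. The mode and curvature are then available in closed form, so the Laplace variance $H^{-1}$ can be compared with the exact Beta variance. The function names are hypothetical.

```python
def laplace_beta(a, b):
    """Laplace approximation for l(t) = a*log(t) + b*log(1 - t) on (0, 1)."""
    mode = a / (a + b)                              # solves l'(t) = a/t - b/(1-t) = 0
    hessian = a / mode**2 + b / (1.0 - mode)**2     # H = -l''(mode)
    return mode, 1.0 / hessian                      # Gaussian mean and variance

def beta_variance(alpha, beta):
    """Exact variance of a Beta(alpha, beta) distribution."""
    return alpha * beta / ((alpha + beta)**2 * (alpha + beta + 1.0))

mode, approx_var = laplace_beta(60.0, 40.0)
exact_var = beta_variance(61.0, 41.0)
print(mode, approx_var, exact_var, abs(approx_var - exact_var) / exact_var)
```

With these counts the two variances agree to within a few percent. The agreement improves as $a + b$ grows, since the posterior concentrates where the quadratic expansion is accurate.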

10.3 GELU and activation function approximation

The Gaussian Error Linear Unit $\text{GELU}(x) = x \Phi(x)$ , where $\Phi$ is the standard normal CDF, is one of the most widely used activation functions in modern transformers. A common implementation uses a tanh-based approximation:

$\text{GELU}(x) \approx 0.5x\bigl(1 + \tanh(\sqrt{2/\pi}(x + 0.044715 x^3))\bigr)$

The inner polynomial $x + 0.044715x^3$ is tied to the Taylor expansion of the error function: matching the cubic Taylor coefficients of $\tanh(\sqrt{2/\pi}(x + a x^3))$ and $\operatorname{erf}(x/\sqrt{2})$ gives $a = (4-\pi)/(6\pi) \approx 0.0455$ , close to the published constant $0.044715$ .
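The gap between the exact GELU and the tanh form can be measured directly with the standard library. This is a sketch: the function names, the grid on $[-6, 6]$ , and the parameter `c` are illustrative assumptions.

```python
import math

def gelu_exact(x):
    """x * Phi(x), with Phi written in terms of the error function."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x, c=0.044715):
    """Tanh-based approximation with cubic coefficient c."""
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + c * x**3)))

grid = [i / 100.0 for i in range(-600, 601)]
max_gap = max(abs(gelu_exact(x) - gelu_tanh(x)) for x in grid)

# Cubic coefficient obtained by matching Taylor terms of tanh(...) and erf(x / sqrt(2)).
taylor_c = (4.0 - math.pi) / (6.0 * math.pi)
max_gap_taylor = max(abs(gelu_exact(x) - gelu_tanh(x, taylor_c)) for x in grid)

print(max_gap, taylor_c, max_gap_taylor)
```

Comparing `max_gap` with `max_gap_taylor` shows how the choice of cubic coefficient trades accuracy near the origin against accuracy over the whole grid.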

10.4 Matrix exponential in continuous-time models

For neural ODEs, state-space models (S4, Mamba), and continuous-time dynamical systems, the matrix exponential

$e^{At} = I + At + \frac{(At)^2}{2!} + \frac{(At)^3}{3!} + \cdots = \sum_{n=0}^{\infty} \frac{A^n t^n}{n!}$

is a power series in $t$ with matrix coefficients. It converges for all $t$ (since $R = \infty$ for the scalar exponential, and the norm bound $\|A^n t^n/n!\| \leq \|A\|^n |t|^n/n!$ gives a convergent comparison series). Term-by-term differentiation gives $\frac{d}{dt} e^{At} = A e^{At}$ , which is the fundamental solution to the linear ODE $\dot{x} = Ax$ .
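As a small check of the series definition, the sketch below sums the truncated power series for a $2 \times 2$ rotation generator, where $e^{At}$ is known in closed form as a rotation matrix. The function name `expm_series` and the truncation length are assumptions. In practice, library routines such as `scipy.linalg.expm` are the usual choice, and this truncation is for illustration only.

```python
import math
import numpy as np

def expm_series(A, t, terms=30):
    """Truncated power series sum_{n < terms} (A t)^n / n!."""
    term = np.eye(A.shape[0])           # (A t)^0 / 0!
    total = np.zeros_like(term)
    for n in range(terms):
        total = total + term
        term = term @ (A * t) / (n + 1)  # next term: (A t)^(n+1) / (n+1)!
    return total

A = np.array([[0.0, -1.0], [1.0, 0.0]])  # generator of planar rotations
t = 1.0
rotation = np.array([[math.cos(t), -math.sin(t)],
                     [math.sin(t),  math.cos(t)]])
print(np.max(np.abs(expm_series(A, t) - rotation)))
```

Each new term is built from the previous one, so no factorials or matrix powers are computed from scratch.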

ML connections — descent lemma, Laplace approximation, GELU, and matrix exponential

11. Computational Notes

Power series evaluation: Horner’s method. Evaluating $\sum_{k=0}^{n} a_k x^k$ naively, recomputing each power $x^k$ from scratch, requires $O(n^2)$ multiplications. Horner’s method rewrites $a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n = a_0 + x(a_1 + x(a_2 + \cdots + x \cdot a_n))$ and evaluates from the inside out in $O(n)$ operations, with fewer rounding steps than the naive sum. In NumPy, numpy.polynomial.polynomial.polyval evaluates a polynomial from coefficients given in increasing-degree order.
Radius of convergence estimation. For tabulated coefficients, compute $|a_n|^{1/n}$ for large $n$ and look for its largest accumulation value, which estimates $1/R$ . This sequence often converges slowly, so treat the result as an estimate; when the ratio $|a_n/a_{n+1}|$ settles down, it often approaches $R$ faster.
Endpoint testing is algorithmic. Compute $R$ via ratio or Cauchy-Hadamard, then substitute $x = c \pm R$ into the series and apply the convergence-test flowchart from Series Convergence & Tests.
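A minimal sketch of Horner's method, compared against NumPy's `polyval` on a truncated exponential series. The helper name `horner` and the choice of 20 coefficients are illustrative.

```python
import math
from numpy.polynomial import polynomial as P

def horner(coeffs, x):
    """Evaluate sum_k coeffs[k] * x^k from the inside out (increasing-degree order)."""
    result = 0.0
    for a in reversed(coeffs):
        result = result * x + a
    return result

# First 20 Maclaurin coefficients of e^x: a_k = 1/k!
coeffs = [1.0 / math.factorial(k) for k in range(20)]

x = 0.5
print(horner(coeffs, x), float(P.polyval(x, coeffs)), math.exp(x))
```

Both evaluations use one multiplication and one addition per coefficient, and at this truncation length they agree with `math.exp` to near machine precision.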

Computational verification — Horner's method and radius estimation from finite coefficients

Connections & Further Reading

Prerequisites — topics you need first

intermediate Limits & Continuity 45 min

Uniform Convergence

The Weierstrass M-test (Topic 4) proves that power series converge uniformly on compact subsets of their interval of convergence. The interchange theorems for integration and differentiation (Topic 4) then justify term-by-term calculus — the most powerful computational tool for power series.

foundational Series & Approximation 45 min

Series Convergence & Tests

Power series are series whose terms are functions aₙ(x−c)ⁿ rather than fixed numbers. The ratio test gives R = lim|aₙ/aₙ₊₁|, the root test gives R = 1/limsup|aₙ|^(1/n) (Cauchy-Hadamard), and endpoint analysis uses comparison and alternating series tests — all from Topic 17.

intermediate Single-Variable Calculus 55 min

Mean Value Theorem & Taylor Expansion

Topic 6 built Taylor polynomials Tₙ(x) and proved remainder bounds Rₙ(x), but left the question: when does Rₙ(x) → 0 as n → ∞? Topic 18 answers this by characterizing analytic functions — those for which the Taylor series converges to f — and providing the radius-of-convergence framework.

foundational Single-Variable Calculus 50 min

The Riemann Integral & FTC

Term-by-term integration of power series produces antiderivatives expressed as power series. The integral ∫₀ˣ 1/(1+t²)dt = arctan(x) can be computed by integrating the geometric series 1/(1+t²) = ∑(-1)ⁿt²ⁿ term by term, yielding the series arctan(x) = ∑(-1)ⁿx^(2n+1)/(2n+1), whose value at x = 1 is the Leibniz formula for π/4.

intermediate Single-Variable Calculus 50 min

Improper Integrals & Special Functions

Power series with infinite radius of convergence (eˣ, sin x, cos x) define entire functions that can be integrated term by term over any bounded interval. The integral ∫₀ˣ e^(−t²)dt, which has no elementary antiderivative, is evaluated by expanding e^(−t²) as a power series and integrating term by term.

Where this leads — next in formalCalculus

intermediate Series & Approximation 55 min

Fourier Series & Orthogonal Expansions

Trigonometric series as a generalization of power series using sin and cos instead of monomials — the two approximation paradigms (local analytic vs. global periodic) are unified in approximation theory.

intermediate Series & Approximation 50 min

Approximation Theory

Weierstrass approximation theorem and non-Taylor polynomial approximations — the theoretical ceiling that Taylor expansion reaches toward but doesn't always hit.

intermediate ODEs 50 min

Linear Systems & Matrix Exponential

The matrix exponential e^(At) = Σ (At)^k/k! is a power series solution to ẋ = Ax — everything about convergence and term-by-term manipulation here carries over directly.

On to formalStatistics — where this calculus powers inference

Central Limit Theorem

The characteristic-function proof of the CLT expands log φ_{X_i}(t/√n) as a Taylor series in t. The dominant terms give the Gaussian characteristic function e^(-σ²t²/2); the remainder terms vanish as n → ∞.

Expectation & Moments

The delta method: if √n(θ̂ - θ) ⟹ N(0, σ²) and g is differentiable, then √n(g(θ̂) - g(θ)) ⟹ N(0, [g'(θ)]² σ²). The proof is a 1st-order Taylor expansion g(θ̂) = g(θ) + g'(θ)(θ̂ - θ) + o_p(1/√n).

Bayesian Model Comparison & BMA

Laplace approximation Taylor-expands the log-posterior to second order around the MAP. The resulting Gaussian approximation has covariance matrix (-∇²ℓ)⁻¹. BIC's asymptotic derivation rests on the higher-order error term of this expansion.

Kernel Density Estimation

The KDE bias expansion E[f̂_n(x)] - f(x) = (h²/2) f''(x) μ_2(K) + O(h⁴) is a Taylor series in the bandwidth h. The dominant bias term gives the asymptotic MSE; the optimal bandwidth h* balances bias² against variance.

Bootstrap

The Edgeworth expansion of √n(θ̂_n - θ) uses Taylor series in the characteristic function. Bootstrap higher-order accuracy — O(n⁻¹) vs. the CLT's O(n^(-1/2)) — follows from matching Edgeworth terms between the bootstrap and true distributions.

Order Statistics & Quantiles

The Bahadur representation of sample quantiles, ξ̂_p = ξ_p - (F_n(ξ_p) - p)/f(ξ_p) + o_p(n^{-1/2}), is a Taylor-series-flavored asymptotic — a first-order linearization of the inverse-CDF map around the true quantile.

Continuous Distributions

The Normal MGF derivation uses 'completing the square' in the exponent, and the Gamma function's properties connect to power-series representations. The exponential series $e^x = \sum x^k/k!$ underwrites the MGF identity $M_X(t) = \sum \mathbb E[X^k]\,t^k/k!$ for distributions with finite moments.

Discrete Distributions

The exponential power series $e^x = \sum x^k/k!$ verifies that the Poisson PMF sums to 1 and drives MGF derivations. The probability generating function $G_X(s) = \mathbb E[s^X] = \sum p_k s^k$ is a power series whose coefficients are the PMF values.

Exponential Families

The log-partition function's moment-generating properties use Taylor-expansion arguments. The relationship between $A(\eta)$ and the moment-generating function connects to power series — $A'(\eta) = \mathbb E[T(X)]$ and $A''(\eta) = \mathrm{Var}(T(X))$ are the first two Taylor coefficients of $A$ at $\eta$.

Generalized Linear Models

GLM asymptotic-normality (§22.3 Thm 3) and the deviance LRT $\to \chi^2_k$ (§22.7 Thm 6) both rest on multivariate Taylor expansions of the log-likelihood around the MLE. The $\sqrt n$ scaling in the asymptotic distribution is exactly the Taylor-series remainder bound.

On to formalML — where this calculus powers ML

Gradient Descent

The descent lemma f(y) ≤ f(x) + ∇f(x)ᵀ(y−x) + L/2·‖y−x‖² is a first-order Taylor expansion whose remainder is bounded quadratically by the L-Lipschitz gradient condition. Newton's method replaces gradient descent's linear Taylor model with the full quadratic Taylor model T₂(x), achieving quadratic convergence near optima.

Convex Analysis

A twice-differentiable function is convex iff its first-order Taylor expansion is a global lower bound: f(y) ≥ f(x) + ∇f(x)ᵀ(y−x). This characterization follows directly from Taylor's theorem with non-negative second derivative.

Information Geometry

The Fisher information matrix I(θ) is the Hessian of the KL divergence at θ = θ₀, computed via second-order Taylor expansion. The resulting Riemannian metric on parameter space is the foundation of natural gradient methods.

Smooth Manifolds

The smooth-vs-analytic distinction developed here is foundational: analytic functions are locally representable by convergent power series in coordinate charts, while smooth functions form the more general C^∞ category used in differential geometry.

References

book Rudin (1976). Principles of Mathematical Analysis Chapter 8 — the definitive treatment of power series, uniform convergence on compact subsets, and the algebra of power series
book Abbott (2015). Understanding Analysis Chapter 6 — power series and Taylor series with an emphasis on the role of uniform convergence in justifying interchange
book Spivak (2008). Calculus Chapter 23 — Taylor series with complete proofs of term-by-term theorems and the analytic vs. smooth distinction
book Folland (1999). Real Analysis: Modern Techniques and Their Applications Section 0.6 — power series in the context of analysis prerequisites for measure theory
book Bartle & Sherbert (2011). Introduction to Real Analysis Chapter 9 — power series convergence with careful attention to endpoint behavior and applications
paper Hendrycks & Gimpel (2016). “Gaussian Error Linear Units (GELUs)” Introduces GELU(x) = x·Φ(x) together with the tanh-based approximation 0.5x(1 + tanh(√(2/π)(x + 0.044715x³))), whose cubic coefficient is close to the value obtained by matching Taylor series terms.
paper MacKay (1992). “A Practical Bayesian Framework for Backpropagation Networks” The Laplace approximation replaces the posterior with a Gaussian centered at the MAP estimate using a second-order Taylor expansion of the log-posterior — a foundational Bayesian ML technique.