Single-Variable Calculus · intermediate · 50 min read

Improper Integrals & Special Functions

Extending integration to unbounded intervals and unbounded integrands — the Gamma, Beta, and Gaussian integrals that are workhorses in probability and machine learning

Abstract. The Riemann integral ∫ₐᵇ f(x) dx requires both the interval [a,b] and the function f to be bounded. Improper integrals remove these restrictions by defining ∫ₐ∞ f(x) dx = lim_{b→∞} ∫ₐᵇ f(x) dx (Type I: unbounded intervals) and handling unbounded integrands via one-sided limits at singularities (Type II). The p-test provides the fundamental benchmark: ∫₁∞ 1/xᵖ dx converges if and only if p > 1, while ∫₀¹ 1/xᵖ dx converges if and only if p < 1. Comparison tests — direct and limit — reduce new convergence questions to these benchmarks. Three special functions built from improper integrals pervade probability and machine learning. The Gamma function Γ(s) = ∫₀∞ tˢ⁻¹ e⁻ᵗ dt extends the factorial to real numbers via the functional equation Γ(s+1) = sΓ(s), with Γ(n+1) = n! for positive integers. The Beta function B(a,b) = ∫₀¹ tᵃ⁻¹(1-t)ᵇ⁻¹ dt is the normalizing constant for the Beta distribution, connected to Gamma by B(a,b) = Γ(a)Γ(b)/Γ(a+b). The Gaussian integral ∫₋∞∞ e⁻ˣ² dx = √π normalizes the Gaussian distribution — its proof via polar coordinates is a celebrated application of multivariable substitution. Stirling’s approximation n! ≈ √(2πn)(n/e)ⁿ, derived from the Gamma function, gives the asymptotic behavior of factorials used throughout information theory and combinatorics. In machine learning, these functions appear as normalizing constants for probability distributions (Gaussian, Gamma, Beta, Chi-squared, Student-t), in Bayesian posterior computation, and in the analysis of tail probabilities and concentration inequalities.

Overview & Motivation

You’re computing the normalizing constant for a Gaussian distribution: $\int_{-\infty}^{\infty} e^{-x^2/2}\,dx$. The integrand $e^{-x^2/2}$ has no elementary antiderivative, so the Fundamental Theorem of Calculus cannot evaluate this integral directly. Yet the integral equals $\sqrt{2\pi}$, a fact that makes the entire Gaussian probability framework possible. More immediately, every density $f(x)$ must satisfy $\int_{-\infty}^{\infty} f(x)\,dx = 1$, but this is an integral over an unbounded interval, so the Riemann integral from Topic 7 doesn’t apply directly.

Improper integrals make these computations rigorous by defining integrals over unbounded domains (and of unbounded functions) as limits of the proper integrals we’ve already built. The idea is simple: if you can’t integrate all the way to infinity, integrate to some large number $b$ and ask what happens as $b \to \infty$ . If the limit is finite, the improper integral converges and we call that limit the value of the integral. If not, the integral diverges.

We’ll extend the Riemann integral to unbounded intervals (§2) and unbounded integrands (§3), develop convergence tests that determine convergence without computing exact values (§4), then meet the three special functions that pervade probability and ML: the Gamma function (§5), the Beta function (§6), and the Gaussian integral (§7). Stirling’s approximation (§8) gives the asymptotic behavior of $n!$ that appears throughout information theory and combinatorics.

Type I — Integration over Unbounded Intervals

Start with the picture: $f(x) = 1/x^2$ on $[1, \infty)$ . The curve drops toward zero, and we want the total area under it from $x = 1$ to the right forever. We can compute $\int_1^b \frac{1}{x^2}\,dx = 1 - \frac{1}{b}$ for any finite $b$ using the FTC. As $b \to \infty$ , this approaches $1$ . The area under an infinitely long curve is finite — this is the central surprise of improper integrals.

📐 Definition 1 (Type I Improper Integral)

If $f$ is Riemann integrable on $[a, b]$ for every $b > a$ , we define

$\int_a^{\infty} f(x)\,dx = \lim_{b \to \infty} \int_a^b f(x)\,dx,$

provided the limit exists. If the limit exists and is finite, the improper integral converges; otherwise it diverges.

Similarly, $\int_{-\infty}^b f(x)\,dx = \lim_{a \to -\infty} \int_a^b f(x)\,dx$ , and for doubly infinite integrals:

$\int_{-\infty}^{\infty} f(x)\,dx = \int_{-\infty}^c f(x)\,dx + \int_c^{\infty} f(x)\,dx$

for any $c \in \mathbb{R}$ , provided both pieces converge. The choice of $c$ doesn’t matter when both parts converge — this follows from the additivity of the Riemann integral over adjacent intervals.

📝 Example 1 (The p-test for Type I)

The integral $\int_1^{\infty} \frac{1}{x^p}\,dx$ is the fundamental benchmark for convergence at infinity.

For $p \neq 1$ :

$\int_1^b \frac{1}{x^p}\,dx = \frac{x^{1-p}}{1-p}\bigg|_1^b = \frac{b^{1-p} - 1}{1-p}.$

If $p > 1$ : the exponent $1 - p < 0$ , so $b^{1-p} \to 0$ as $b \to \infty$ . The integral converges to $\frac{1}{p-1}$ .
If $p < 1$ : the exponent $1 - p > 0$ , so $b^{1-p} \to \infty$ . The integral diverges.

For $p = 1$ : $\int_1^b \frac{1}{x}\,dx = \ln b \to \infty$ . Diverges.

The integral $\int_1^{\infty} \frac{1}{x^p}\,dx$ converges if and only if $p > 1$. This is the $p$-test, and the comparison tests of §4 reduce many other Type I convergence questions to comparisons against $1/x^p$.
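A quick way to watch the $p$-test at work is to tabulate the closed-form truncated integral $\int_1^b x^{-p}\,dx$ from Example 1 for growing $b$. This is a minimal sketch using only Python's standard `math` module. The helper name `truncated_p_integral` is ours, chosen for illustration.

```python
import math

def truncated_p_integral(p: float, b: float) -> float:
    """Proper integral of 1/x**p over [1, b], evaluated with the FTC (Example 1)."""
    if p == 1:
        return math.log(b)                   # antiderivative ln x
    return (b ** (1 - p) - 1) / (1 - p)      # antiderivative x**(1-p) / (1-p)

# For p > 1 the values settle at 1/(p - 1); for p <= 1 they grow without bound.
for p in (0.5, 1.0, 2.0, 3.0):
    row = [truncated_p_integral(p, b) for b in (1e1, 1e3, 1e6)]
    limit = 1 / (p - 1) if p > 1 else math.inf
    print(f"p = {p}: {row}  limit: {limit}")
```

The $p = 2$ row approaches $1$ and the $p = 3$ row approaches $1/2$. Meanwhile the $p = 0.5$ row grows like $2\sqrt{b}$ and the $p = 1$ row grows like $\ln b$.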

📝 Example 2 (Exponential decay)

$\int_0^{\infty} e^{-x}\,dx = \lim_{b \to \infty} \left[-e^{-x}\right]_0^b = \lim_{b \to \infty} (1 - e^{-b}) = 1.$

Exponential decay is more than fast enough to make the area finite. In fact, $e^{-x}$ decays faster than any power $1/x^p$ , which is why exponential tails are so well-behaved in probability.

📝 Example 3 (Harmonic divergence)

$\int_1^{\infty} \frac{1}{x}\,dx = \lim_{b \to \infty} \ln b = \infty.$

The function $1/x$ decays, but too slowly: the area accumulates without bound. This is the borderline case $p = 1$ of the $p$-test, and it’s the continuous analog of the divergent harmonic series $\sum 1/n$.

💡 Remark 1 (Doubly improper integrals and the Cauchy principal value)

For $\int_{-\infty}^{\infty} f(x)\,dx$ , we split at any finite $c$ and require both $\int_{-\infty}^c f$ and $\int_c^{\infty} f$ to converge independently. The Cauchy principal value $\text{PV}\int_{-\infty}^{\infty} f(x)\,dx = \lim_{b \to \infty} \int_{-b}^{b} f(x)\,dx$ is a weaker notion — it can exist even when the improper integral diverges. For example, $\int_{-\infty}^{\infty} x\,dx$ has Cauchy principal value $0$ (by symmetry: $\int_{-b}^{b} x\,dx = 0$ for all $b$ ), but the improper integral diverges because $\int_0^{\infty} x\,dx = \infty$ .

Type I convergence: 1/x² converges while 1/x diverges

Type II — Integration of Unbounded Functions

Now consider $f(x) = 1/\sqrt{x}$ on $(0, 1]$ . The function is unbounded near $x = 0$ — it blows up to infinity. Yet the area under the curve from $\varepsilon$ to $1$ is $2(1 - \sqrt{\varepsilon})$ , which approaches $2$ as $\varepsilon \to 0^+$ . An infinitely tall region can have a finite area — the dual surprise to Type I.

📐 Definition 2 (Type II Improper Integral)

If $f$ is Riemann integrable on $[\varepsilon, b]$ for every $\varepsilon \in (a, b]$ but $f$ is unbounded near $a$ , we define

$\int_a^b f(x)\,dx = \lim_{\varepsilon \to a^+} \int_{\varepsilon}^b f(x)\,dx,$

provided the limit exists. Similarly, if $f$ is Riemann integrable on $[a, \varepsilon]$ for every $\varepsilon \in [a, b)$ but unbounded near $b$:

$\int_a^b f(x)\,dx = \lim_{\varepsilon \to b^-} \int_a^{\varepsilon} f(x)\,dx.$

For singularities at an interior point $c \in (a, b)$ , we split: $\int_a^b f = \int_a^c f + \int_c^b f$ , and both limits must exist independently.

📝 Example 4 (The p-test for Type II)

The integral $\int_0^1 \frac{1}{x^p}\,dx$ is the benchmark for convergence near a singularity.

For $p \neq 1$ :

$\int_\varepsilon^1 \frac{1}{x^p}\,dx = \frac{x^{1-p}}{1-p}\bigg|_\varepsilon^1 = \frac{1 - \varepsilon^{1-p}}{1-p}.$

If $p < 1$ : $\varepsilon^{1-p} \to 0$ as $\varepsilon \to 0^+$ (since $1 - p > 0$ ). The integral converges to $\frac{1}{1-p}$ .
If $p > 1$ : $\varepsilon^{1-p} \to \infty$ . The integral diverges.

For $p = 1$ : $\int_\varepsilon^1 \frac{1}{x}\,dx = -\ln \varepsilon \to \infty$ . Diverges.

The integral $\int_0^1 \frac{1}{x^p}\,dx$ converges if and only if $p < 1$ . Note the complementary relationship with the Type I $p$ -test: $p > 1$ for convergence at infinity, $p < 1$ for convergence at zero. The borderline $p = 1$ diverges in both cases.

📝 Example 5 (Square root singularity)

$\int_0^1 \frac{1}{\sqrt{x}}\,dx$ : here $p = 1/2 < 1$ , so the integral converges. Explicitly:

$\int_\varepsilon^1 \frac{1}{\sqrt{x}}\,dx = 2\sqrt{x}\Big|_\varepsilon^1 = 2(1 - \sqrt{\varepsilon}) \to 2 \text{ as } \varepsilon \to 0^+.$

The area under $1/\sqrt{x}$ on $(0, 1]$ is finite ( $= 2$ ) despite the function blowing up at $0$ .
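The Type II limit can also be watched numerically. The sketch below uses standard-library Python; the helper `midpoint` is an illustrative name, not a library function. It applies a composite midpoint rule to $1/\sqrt{x}$ on $[\varepsilon, 1]$ and compares the result with the exact value $2(1 - \sqrt{\varepsilon})$. The midpoint rule never evaluates the integrand at an endpoint, so it can be used even as $\varepsilon$ approaches the singularity.

```python
import math

def midpoint(f, lo: float, hi: float, n: int = 100_000) -> float:
    """Composite midpoint rule: samples f only at interior panel midpoints."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

def inv_sqrt(x: float) -> float:
    return 1.0 / math.sqrt(x)

for eps in (1e-2, 1e-4, 1e-6):
    numeric = midpoint(inv_sqrt, eps, 1.0)
    exact = 2 * (1 - math.sqrt(eps))   # closed form from Example 5
    print(f"eps = {eps:g}: numeric = {numeric:.6f}, exact = {exact:.6f}")
```

Both columns approach $2$ as $\varepsilon \to 0^+$. With a fixed number of panels, the numeric error grows as $\varepsilon$ shrinks, because the integrand gets steeper near the left endpoint. Adaptive quadrature (such as `scipy.integrate.quad`, discussed in the Computational Notes) concentrates panels there instead.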

💡 Remark 2 (Interior singularities)

If $f$ has a singularity at $c \in (a, b)$ , we split: $\int_a^b f = \int_a^c f + \int_c^b f$ , and both parts must converge independently. For example:

$\int_{-1}^{1} \frac{1}{|x|^{2/3}}\,dx = \int_{-1}^{0} \frac{1}{|x|^{2/3}}\,dx + \int_0^1 \frac{1}{x^{2/3}}\,dx.$

Each piece converges ($p = 2/3 < 1$). By Example 4 (with the symmetry $x \mapsto -x$ for the left piece), each equals $\frac{1}{1 - 2/3} = 3$, so the total value is $3 + 3 = 6$.

💡 Remark 3 (Both types simultaneously)

Some integrals involve both an unbounded interval and an unbounded integrand. For example, $\int_0^{\infty} \frac{1}{\sqrt{x}(1+x)}\,dx$ has a Type II singularity at $0$ (where $1/\sqrt{x}$ blows up) and requires a Type I analysis as $x \to \infty$ (where the integrand decays like $1/x^{3/2}$ ). Split at $x = 1$ and handle each piece separately: near $0$ , compare with $1/\sqrt{x}$ ( $p = 1/2 < 1$ , converges); at infinity, compare with $1/x^{3/2}$ ( $p = 3/2 > 1$ , converges). The integral converges and equals $\pi$ .
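The value $\pi$ can be checked by hand. The substitution $x = u^2$, $dx = 2u\,du$ (with $u > 0$) cancels the $\sqrt{x}$ factor, removing the singularity at $0$ and leaving only a Type I tail:

$\int_0^{\infty} \frac{dx}{\sqrt{x}\,(1+x)} = \int_0^{\infty} \frac{2u\,du}{u\,(1+u^2)} = 2\int_0^{\infty} \frac{du}{1+u^2} = 2\lim_{b \to \infty} \arctan b = 2 \cdot \frac{\pi}{2} = \pi.$

Strictly, the substitution is applied on $[\varepsilon, b]$, where everything is a proper integral, and then both limits are taken.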

Type II convergence: 1/√x converges while 1/x diverges near 0

Convergence Tests

The $p$ -test is a powerful benchmark, but we need tools to determine convergence for integrals that aren’t exactly $1/x^p$ . The comparison tests reduce new convergence questions to known ones.

🔷 Theorem 1 (Comparison Test (Direct))

Suppose $0 \le f(x) \le g(x)$ for all $x \ge a$ .

(a) If $\int_a^{\infty} g(x)\,dx$ converges, then $\int_a^{\infty} f(x)\,dx$ converges, and $\int_a^{\infty} f \le \int_a^{\infty} g$ .

(b) If $\int_a^{\infty} f(x)\,dx$ diverges, then $\int_a^{\infty} g(x)\,dx$ diverges.

The analogous result holds for Type II integrals.

Proof.

For part (a): let $F(b) = \int_a^b f(x)\,dx$ and $G(b) = \int_a^b g(x)\,dx$. By monotonicity of the Riemann integral (Topic 7, Theorem 3), $f \le g$ implies $F(b) \le G(b)$ for all $b > a$. Since $f \ge 0$, the function $F$ is increasing in $b$: adding more non-negative area can only increase the integral. The same holds for $G$ because $g \ge f \ge 0$, so $G(b) \le \lim_{c \to \infty} G(c) = \int_a^\infty g < \infty$ for every $b$. Hence $F$ is increasing and bounded above by $\int_a^\infty g$. By the Monotone Convergence principle (Topic 3), $\lim_{b \to \infty} F(b)$ exists, is finite, and satisfies $\int_a^\infty f \le \int_a^\infty g$.

Part (b) is the contrapositive of part (a): if $\int g$ converged, then $\int f$ would converge too, contradicting the hypothesis.

∎

📝 Example 6 (Convergence by comparison)

$\int_1^{\infty} \frac{1}{1+x^2}\,dx$ converges. For $x \ge 1$ :

$\frac{1}{1+x^2} \le \frac{1}{x^2},$

and $\int_1^{\infty} \frac{1}{x^2}\,dx$ converges ( $p = 2 > 1$ ). By the comparison test, $\int_1^{\infty} \frac{1}{1+x^2}\,dx$ converges.

In fact, we can compute the exact value: $\int_1^{\infty} \frac{1}{1+x^2}\,dx = \lim_{b \to \infty} \arctan b - \arctan 1 = \frac{\pi}{2} - \frac{\pi}{4} = \frac{\pi}{4}$.

🔷 Theorem 2 (Limit Comparison Test)

Suppose $f(x) > 0$ and $g(x) > 0$ for $x \ge a$ , and $\lim_{x \to \infty} \frac{f(x)}{g(x)} = L$ .

If $0 < L < \infty$ : $\int_a^{\infty} f$ and $\int_a^{\infty} g$ either both converge or both diverge.
If $L = 0$ and $\int g$ converges: then $\int f$ converges.
If $L = \infty$ and $\int g$ diverges: then $\int f$ diverges.

Proof.

For the case $0 < L < \infty$ : choose $\varepsilon = L/2$ . There exists $M$ such that for $x \ge M$ :

$\frac{L}{2} < \frac{f(x)}{g(x)} < \frac{3L}{2},$

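which gives $\frac{L}{2}g(x) < f(x) < \frac{3L}{2}g(x)$ for $x \ge M$. Apply the direct comparison test on $[M, \infty)$. The remaining piece over $[a, M]$ is a proper integral, so it does not affect convergence: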

If $\int g$ converges: $f(x) < \frac{3L}{2}g(x)$ , so $\int f$ converges by comparison with $\frac{3L}{2}\int g$ .
If $\int g$ diverges: $f(x) > \frac{L}{2}g(x)$ , so $\int f \ge \frac{L}{2}\int g$ diverges.

The cases $L = 0$ and $L = \infty$ follow by similar comparison arguments using one-sided bounds.

∎

📝 Example 7 (Limit comparison in action)

$\int_1^{\infty} \frac{x}{x^3 + 1}\,dx$ converges. Compare with $g(x) = \frac{1}{x^2}$ :

$\frac{f(x)}{g(x)} = \frac{x/(x^3 + 1)}{1/x^2} = \frac{x^3}{x^3 + 1} \to 1 \text{ as } x \to \infty.$

Since $L = 1 \in (0, \infty)$ and $\int_1^{\infty} \frac{1}{x^2}\,dx$ converges, the limit comparison test gives convergence. We didn’t need to compute the integral — knowing its asymptotic behavior was enough.
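Here is a numerical sketch of Example 7 in standard-library Python. The Simpson's-rule helper `simpson` is an illustrative name, not a library call. The sketch shows the ratio $f/g$ tending to $1$, and the truncated integrals $\int_1^b f$ increasing but staying below $\int_1^\infty g = 1$, as the comparison predicts.

```python
import math

def simpson(f, lo: float, hi: float, n: int = 20_000) -> float:
    """Composite Simpson's rule on [lo, hi]; n must be even."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3

def f(x: float) -> float:
    return x / (x**3 + 1)

def g(x: float) -> float:
    return 1 / x**2

for x in (10.0, 100.0, 1000.0):
    print(f"x = {x:g}: f(x)/g(x) = {f(x) / g(x):.9f}")

partials = [simpson(f, 1.0, b) for b in (10.0, 100.0, 1000.0)]
print("integral of f over [1, b] for b = 10, 100, 1000:", partials)
```

Since $0 \le f \le g$ on $[1, \infty)$, the tail beyond $b$ is at most $\int_b^\infty x^{-2}\,dx = 1/b$. The last partial integral is therefore within $10^{-3}$ of the full value.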

📐 Definition 3 (Absolute and Conditional Convergence)

An improper integral $\int_a^{\infty} f(x)\,dx$ converges absolutely if $\int_a^{\infty} |f(x)|\,dx$ converges. It converges conditionally if $\int_a^{\infty} f(x)\,dx$ converges but $\int_a^{\infty} |f(x)|\,dx$ diverges.

🔷 Theorem 3 (Absolute Convergence Implies Convergence)

If $\int_a^{\infty} |f(x)|\,dx$ converges, then $\int_a^{\infty} f(x)\,dx$ converges, and

$\left|\int_a^{\infty} f(x)\,dx\right| \le \int_a^{\infty} |f(x)|\,dx.$

Proof.

Write $f = f^+ - f^-$ where $f^+(x) = \max(f(x), 0)$ and $f^-(x) = \max(-f(x), 0)$. Then $|f| = f^+ + f^-$, and $0 \le f^+ \le |f|$, $0 \le f^- \le |f|$. Since $\int |f|$ converges, both $\int f^+$ and $\int f^-$ converge by the comparison test (Theorem 1). Then $\int f = \int f^+ - \int f^-$ converges, and $\left|\int f\right| \le \int f^+ + \int f^- = \int |f|$.

∎

📝 Example 8 (The Dirichlet integral converges conditionally)

The Dirichlet integral $\int_0^{\infty} \frac{\sin x}{x}\,dx = \frac{\pi}{2}$ is a famous result (proved via contour integration or Laplace transforms — we state it without proof here). But $\int_0^{\infty} \frac{|\sin x|}{x}\,dx$ diverges: on each interval $[n\pi, (n+1)\pi]$ ,

$\int_{n\pi}^{(n+1)\pi} \frac{|\sin x|}{x}\,dx \ge \frac{1}{(n+1)\pi}\int_{n\pi}^{(n+1)\pi} |\sin x|\,dx = \frac{2}{(n+1)\pi},$

and $\sum_{n=0}^{\infty} \frac{2}{(n+1)\pi}$ diverges (harmonic series). The integral converges conditionally but not absolutely. Positive and negative oscillations cancel just enough to produce a finite result, but the total unsigned area $\int_0^\infty \frac{|\sin x|}{x}\,dx$ is infinite.
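The contrast between $\int \frac{\sin x}{x}$ and $\int \frac{|\sin x|}{x}$ shows up clearly in truncated integrals over $[0, N\pi]$. The sketch below uses standard-library Python with illustrative helper names. It integrates one interval $[k\pi, (k+1)\pi]$ at a time with Simpson's rule. Because $\sin x$ has constant sign on each such interval, the integral of $|\sin x|/x$ there is simply the absolute value of the signed piece, so both totals come from the same pieces.

```python
import math

def simpson(f, lo: float, hi: float, n: int = 64) -> float:
    """Composite Simpson's rule on [lo, hi]; n must be even."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3

def sinc(x: float) -> float:
    """sin(x)/x, with the removable singularity at 0 filled in by its limit 1."""
    return 1.0 if x == 0.0 else math.sin(x) / x

def partial_integrals(N: int):
    """Integrals of sin(x)/x and |sin(x)|/x over [0, N*pi]."""
    signed = unsigned = 0.0
    for k in range(N):
        piece = simpson(sinc, k * math.pi, (k + 1) * math.pi)
        signed += piece
        unsigned += abs(piece)   # sin keeps one sign on [k*pi, (k+1)*pi]
    return signed, unsigned

for N in (10, 100, 1000):
    signed, unsigned = partial_integrals(N)
    print(f"N = {N:4d}: signed = {signed:.6f} (pi/2 = {math.pi / 2:.6f}), unsigned = {unsigned:.4f}")
```

The signed column settles near $\pi/2$, with a remaining tail of size roughly $1/(N\pi)$. The unsigned column grows by about $\frac{2}{\pi}\ln 10 \approx 1.466$ each time $N$ is multiplied by $10$, mirroring the harmonic lower bound above.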

💡 Remark 4 (Comparison tests for Type II)

The comparison and limit comparison tests apply to Type II improper integrals with the obvious modifications: compare behavior near the singularity instead of at infinity. For $\int_0^1 f(x)\,dx$ with a singularity at $0$ , compare $f(x)$ to $1/x^p$ as $x \to 0^+$ ; the $p$ -test ( $p < 1$ for convergence) provides the benchmark.

Comparison test: direct comparison and limit comparison

Absolute vs. conditional convergence: the Dirichlet integral

The Gamma Function

The factorial $n! = 1 \cdot 2 \cdot 3 \cdots n$ is defined only for non-negative integers. Is there a smooth function that passes through the points $(1, 1), (2, 1), (3, 2), (4, 6), (5, 24), \ldots$ — that is, through $(n+1, n!)$ — and extends the factorial to all positive real numbers? Euler found one: $\Gamma(s) = \int_0^\infty t^{s-1} e^{-t}\,dt$ . This integral converges for $s > 0$ , and integration by parts yields $\Gamma(s+1) = s\Gamma(s)$ , which gives $\Gamma(n+1) = n!$ for positive integers.

📐 Definition 4 (The Gamma Function)

For $s > 0$ , define

$\Gamma(s) = \int_0^{\infty} t^{s-1} e^{-t}\,dt.$

This is a doubly improper integral: the factor $t^{s-1}$ may blow up at $t = 0$ (a Type II singularity when $s < 1$ ), and the integral extends to $t = \infty$ (Type I).

🔷 Proposition 1 (Convergence of the Gamma Integral)

$\Gamma(s)$ converges for all $s > 0$ .

Near $t = 0$ : $t^{s-1}e^{-t} \le t^{s-1}$ (since $e^{-t} \le 1$ for $t \ge 0$ ), and $\int_0^1 t^{s-1}\,dt$ converges if and only if $s - 1 > -1$ , i.e., $s > 0$ . This is the Type II $p$ -test with $p = 1 - s < 1$ .

Near $t = \infty$ : For any $s > 0$ , the exponential decay $e^{-t}$ dominates the polynomial growth $t^{s-1}$ . Specifically, for $t$ large enough, $t^{s-1}e^{-t} \le e^{-t/2}$ (since $t^{s-1} \le e^{t/2}$ for $t$ sufficiently large), and $\int_1^{\infty} e^{-t/2}\,dt = 2e^{-1/2}$ converges.

Proof.

Split the integral at $t = 1$ : $\Gamma(s) = \int_0^1 t^{s-1}e^{-t}\,dt + \int_1^{\infty} t^{s-1}e^{-t}\,dt$ .

For the first piece: $0 \le t^{s-1}e^{-t} \le t^{s-1}$ on $(0, 1]$ (since $e^{-t} \le 1$). The integral $\int_0^1 t^{s-1}\,dt = \frac{t^s}{s}\big|_0^1 = \frac{1}{s}$ converges for $s > 0$, so the first piece converges by the comparison test (Theorem 1).

For the second piece: since $\lim_{t \to \infty} t^{s-1}/e^{t/2} = 0$ for any $s$ (exponential growth dominates polynomial growth), there exists $T > 1$ such that $t^{s-1} \le e^{t/2}$ for $t \ge T$. Then $t^{s-1}e^{-t} \le e^{-t/2}$ for $t \ge T$, and $\int_T^{\infty} e^{-t/2}\,dt = 2e^{-T/2} < \infty$, so $\int_T^\infty t^{s-1}e^{-t}\,dt$ converges by comparison. The remaining piece $\int_1^T t^{s-1}e^{-t}\,dt$ is a proper Riemann integral (finite interval, bounded continuous integrand), so it is finite.

∎

🔷 Theorem 4 (Gamma Functional Equation)

For $s > 0$ : $\Gamma(s+1) = s\,\Gamma(s)$ .

Proof.

Integration by parts with $u = t^s$ and $dv = e^{-t}\,dt$ :

$\Gamma(s+1) = \int_0^{\infty} t^s e^{-t}\,dt = \left[-t^s e^{-t}\right]_0^{\infty} + s\int_0^{\infty} t^{s-1} e^{-t}\,dt.$

The boundary term vanishes at both limits:

At $t = 0$ : $t^s e^{-t} \to 0$ (for $s > 0$ ).
At $t = \infty$ : $t^s e^{-t} \to 0$ (exponential decay dominates polynomial growth).

Therefore $\Gamma(s+1) = 0 + s\,\Gamma(s) = s\,\Gamma(s)$ .

∎
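Proposition 1 and Theorem 4 can be checked numerically against Python's built-in `math.gamma`. This sketch uses the standard library only; `simpson` and `gamma_quad` are illustrative names. It first applies the substitution $t = u^2$ from the proof of Theorem 5, which turns $\Gamma(s)$ into $2\int_0^\infty u^{2s-1}e^{-u^2}\,du$ and keeps the integrand bounded for $s \ge 1/2$. It then truncates the range at $u = 12$.

```python
import math

def simpson(f, lo: float, hi: float, n: int = 4000) -> float:
    """Composite Simpson's rule on [lo, hi]; n must be even."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3

def gamma_quad(s: float) -> float:
    """Gamma(s) as 2 * integral of u**(2s-1) * exp(-u**2) over [0, 12].

    This sketch assumes s >= 1/2 so the integrand stays bounded at u = 0;
    beyond u = 12, exp(-u**2) is below 1e-62 and the tail is negligible.
    """
    if s < 0.5:
        raise ValueError("this sketch assumes s >= 1/2")
    return 2 * simpson(lambda u: u ** (2 * s - 1) * math.exp(-u * u), 0.0, 12.0)

for s in (0.5, 1.0, 1.5, 2.5, 4.0, 5.0):
    print(f"s = {s}: quadrature {gamma_quad(s):.10f}, math.gamma {math.gamma(s):.10f}")

# Functional equation Gamma(s + 1) = s * Gamma(s), checked on the quadrature values
for s in (0.5, 1.5, 3.0):
    print(f"s = {s}: Gamma(s+1) = {gamma_quad(s + 1):.10f}, s*Gamma(s) = {s * gamma_quad(s):.10f}")
```

The $s = 0.5$ row reproduces $\Gamma(1/2) = \sqrt{\pi} \approx 1.7724538509$, and integer $s$ gives $(s-1)!$.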

🔷 Theorem 5 (Gamma and the Factorial)

For positive integers $n$ : $\Gamma(n+1) = n!$ . Also, $\Gamma(1) = 1$ and $\Gamma(1/2) = \sqrt{\pi}$ .

Proof.

Base case: $\Gamma(1) = \int_0^{\infty} e^{-t}\,dt = 1$ .

Induction: By the functional equation, $\Gamma(2) = 1 \cdot \Gamma(1) = 1 = 1!$ , $\Gamma(3) = 2 \cdot \Gamma(2) = 2 = 2!$ , and generally $\Gamma(n+1) = n \cdot \Gamma(n) = n \cdot (n-1)! = n!$ .

Half-integer value: $\Gamma(1/2) = \int_0^{\infty} t^{-1/2} e^{-t}\,dt$ . Substitute $t = u^2$ , $dt = 2u\,du$ :

$\Gamma(1/2) = \int_0^{\infty} (u^2)^{-1/2} e^{-u^2} \cdot 2u\,du = 2\int_0^{\infty} e^{-u^2}\,du = \sqrt{\pi},$

where the last equality is the Gaussian integral (Theorem 7 below).

∎

📝 Example 9 (Half-integer values)

Using the functional equation $\Gamma(s+1) = s\Gamma(s)$ repeatedly:

$\Gamma(3/2) = \frac{1}{2}\Gamma(1/2) = \frac{\sqrt{\pi}}{2}, \qquad \Gamma(5/2) = \frac{3}{2}\Gamma(3/2) = \frac{3\sqrt{\pi}}{4}.$

In general, $\Gamma(n + 1/2) = \frac{(2n)!}{4^n n!}\sqrt{\pi}$ . These half-integer Gamma values appear in the surface area of spheres, the volume of balls in $\mathbb{R}^n$ , and the normalizing constants of the Student- $t$ and Chi-squared distributions.
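The closed form for half-integer values is easy to confirm against `math.gamma` from Python's standard library. The case $n = 0$ recovers $\Gamma(1/2) = \sqrt{\pi}$. The function name below is illustrative.

```python
import math

def gamma_half_integer(n: int) -> float:
    """Closed form Gamma(n + 1/2) = (2n)! / (4**n * n!) * sqrt(pi) from Example 9."""
    return math.factorial(2 * n) / (4 ** n * math.factorial(n)) * math.sqrt(math.pi)

for n in range(6):
    print(f"n = {n}: formula {gamma_half_integer(n):.12f}, math.gamma {math.gamma(n + 0.5):.12f}")
```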

💡 Remark 5 (The Bohr-Mollerup theorem)

The Gamma function is not just a factorial extension — it’s the factorial extension with the best analytic properties. The Bohr-Mollerup theorem states that $\Gamma$ is the unique function on $(0, \infty)$ satisfying: (i) $\Gamma(1) = 1$ , (ii) $\Gamma(s+1) = s\,\Gamma(s)$ , and (iii) $\log \Gamma$ is convex. The log-convexity condition rules out other interpolations like $f(s) = \Gamma(s)(1 + \varepsilon\sin(2\pi s))$ that also satisfy (i) and (ii) but oscillate between the integer values.
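To see concretely why condition (iii) is needed, the sketch below (standard-library Python) builds the perturbed interpolation $f(s) = \Gamma(s)(1 + \varepsilon\sin(2\pi s))$ with the illustrative choice $\varepsilon = 0.1$. It confirms (i) and (ii) up to rounding. It then uses central second differences to show that $\log f$ fails to be convex on $(1, 2)$, while $\log\Gamma$ passes. The names `EPS`, `f`, and `second_difference` are ours.

```python
import math

EPS = 0.1  # illustrative perturbation size; |EPS| < 1 keeps f positive

def f(s: float) -> float:
    """Perturbed factorial interpolation Gamma(s) * (1 + EPS * sin(2*pi*s))."""
    return math.gamma(s) * (1 + EPS * math.sin(2 * math.pi * s))

def log_f(s: float) -> float:
    return math.log(f(s))

def second_difference(g, s: float, h: float = 1e-3) -> float:
    """Central estimate of g''(s); a negative value means g is not convex near s."""
    return (g(s + h) - 2 * g(s) + g(s - h)) / (h * h)

# (i) f(1) = 1 and (ii) f(s + 1) = s * f(s) hold up to floating-point rounding.
print(f(1.0), f(3.3) - 2.3 * f(2.3))

# (iii) log-convexity, scanned on a grid in (1, 2)
grid = [1 + k / 100 for k in range(1, 100)]
print("min second difference of log f:    ", min(second_difference(log_f, s) for s in grid))
print("min second difference of log Gamma:", min(second_difference(math.lgamma, s) for s in grid))
```

The first minimum is negative (near $s = 1.25$, where $\sin(2\pi s) = 1$), and the second is positive. This is exactly the separation the Bohr-Mollerup theorem relies on.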

The Gamma function and its integrand

The Beta Function

📐 Definition 5 (The Beta Function)

For $a, b > 0$ , define

$B(a, b) = \int_0^1 t^{a-1}(1-t)^{b-1}\,dt.$

This integral is improper when $a < 1$ (singularity at $t = 0$) or $b < 1$ (singularity at $t = 1$), and doubly improper when both hold. It converges for all $a, b > 0$ by the Type II $p$-test at each endpoint. On $(0, 1/2]$ the factor $(1-t)^{b-1}$ is bounded above and below by positive constants, so the integrand behaves like $t^{a-1}$, and $\int_0^{1/2} t^{a-1}\,dt$ converges iff $a > 0$. Symmetrically, near $t = 1$ the integrand behaves like $(1-t)^{b-1}$, and $\int_{1/2}^1 (1-t)^{b-1}\,dt$ converges iff $b > 0$.

🔷 Theorem 6 (Beta-Gamma Relationship)

For all $a, b > 0$ :

$B(a, b) = \frac{\Gamma(a)\,\Gamma(b)}{\Gamma(a+b)}.$

Proof.

Compute the product $\Gamma(a)\,\Gamma(b)$ as a double integral:

$\Gamma(a)\,\Gamma(b) = \int_0^{\infty} \int_0^{\infty} s^{a-1} t^{b-1} e^{-(s+t)}\,ds\,dt.$

Substitute $s = uv$ , $t = u(1-v)$ with $u \in (0, \infty)$ and $v \in (0, 1)$ . The Jacobian of this transformation is $\left|\frac{\partial(s,t)}{\partial(u,v)}\right| = u$ , so:

$\Gamma(a)\,\Gamma(b) = \int_0^{\infty} \int_0^1 (uv)^{a-1}\bigl(u(1-v)\bigr)^{b-1} e^{-u} \cdot u\,dv\,du.$

Separating:

$= \underbrace{\int_0^{\infty} u^{a+b-1} e^{-u}\,du}_{\Gamma(a+b)} \cdot \underbrace{\int_0^1 v^{a-1}(1-v)^{b-1}\,dv}_{B(a,b)}.$

Therefore $\Gamma(a)\,\Gamma(b) = \Gamma(a+b) \cdot B(a,b)$ , which gives $B(a,b) = \frac{\Gamma(a)\,\Gamma(b)}{\Gamma(a+b)}$ .

(This proof uses Fubini’s theorem and a 2D change of variables — multivariable techniques that will be developed rigorously in Track 4. We preview them here because the result is too important to postpone.)

∎
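Theorem 6 can be checked numerically. The sketch below uses standard-library Python; `beta_quad` and `beta_via_gamma` are illustrative names. It evaluates $B(a, b)$ by quadrature after the substitution $t = \sin^2\theta$, which gives $B(a, b) = 2\int_0^{\pi/2} \sin^{2a-1}\theta\,\cos^{2b-1}\theta\,d\theta$ and removes the endpoint singularities when $a, b \ge 1/2$. It then compares the result with $\Gamma(a)\Gamma(b)/\Gamma(a+b)$ computed through `math.lgamma`.

```python
import math

def simpson(f, lo: float, hi: float, n: int = 2000) -> float:
    """Composite Simpson's rule on [lo, hi]; n must be even."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3

def beta_quad(a: float, b: float) -> float:
    """B(a, b) after t = sin(theta)**2; this sketch assumes a, b >= 1/2."""
    def integrand(th: float) -> float:
        return math.sin(th) ** (2 * a - 1) * math.cos(th) ** (2 * b - 1)
    return 2 * simpson(integrand, 0.0, math.pi / 2)

def beta_via_gamma(a: float, b: float) -> float:
    """Theorem 6, evaluated in log space so large a, b do not overflow."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

for a, b in [(0.5, 0.5), (1.0, 3.0), (2.0, 3.0), (1.5, 2.5), (4.0, 0.5)]:
    print(f"B({a}, {b}): quadrature {beta_quad(a, b):.12f}, Gamma ratio {beta_via_gamma(a, b):.12f}")
```

The first row reproduces $B(1/2, 1/2) = \pi$ and the third reproduces $B(2, 3) = 1/12$.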

📝 Example 10 (B(1/2, 1/2) = π)

$B(1/2, 1/2) = \frac{\Gamma(1/2)^2}{\Gamma(1)} = \frac{(\sqrt{\pi})^2}{1} = \pi.$

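Direct verification: $\int_0^1 \frac{1}{\sqrt{t(1-t)}}\,dt = \int_0^1 \frac{dt}{\sqrt{t - t^2}}$. Completing the square gives $t - t^2 = \frac{1}{4} - \left(t - \frac{1}{2}\right)^2$. Substituting $t = \frac{1}{2}(1 + \sin\theta)$ with $\theta \in (-\pi/2, \pi/2)$ makes $\sqrt{t - t^2} = \frac{1}{2}\cos\theta$ and $dt = \frac{1}{2}\cos\theta\,d\theta$, so the integral becomes $\int_{-\pi/2}^{\pi/2} d\theta = \pi$.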

📝 Example 11 (B(a, 1) and B(1, b))

$B(a, 1) = \int_0^1 t^{a-1}\,dt = \frac{1}{a}$ .

Verify via the Gamma relationship: $\frac{\Gamma(a)\,\Gamma(1)}{\Gamma(a+1)} = \frac{\Gamma(a) \cdot 1}{a\,\Gamma(a)} = \frac{1}{a}$ . Similarly, $B(1, b) = 1/b$ .

💡 Remark 6 (The Beta distribution)

If $X \sim \text{Beta}(\alpha, \beta)$ , its density is $f(x) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha, \beta)}$ for $x \in (0, 1)$ . The Beta function is precisely the normalizing constant that makes $\int_0^1 f(x)\,dx = 1$ . The parameters $\alpha, \beta > 0$ control the shape: $\alpha = \beta = 1$ gives the uniform distribution on $[0, 1]$ ; $\alpha = \beta \gg 1$ concentrates the density around $x = 1/2$ ; $\alpha > \beta$ skews the density toward $x = 1$ .

In Bayesian inference, the Beta distribution is the conjugate prior for the Binomial likelihood. If you observe $k$ successes in $n$ trials and your prior is $\text{Beta}(\alpha, \beta)$ , the posterior is $\text{Beta}(\alpha + k, \beta + n - k)$ . The marginal likelihood involves the ratio $B(\alpha + k, \beta + n - k) / B(\alpha, \beta)$ , which reduces to a ratio of Gamma functions.
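Here is a minimal sketch of that computation in standard-library Python, with a hypothetical prior and hypothetical data. It evaluates the Beta-Binomial marginal likelihood $\binom{n}{k}\,B(\alpha + k, \beta + n - k)/B(\alpha, \beta)$ in log space through `math.lgamma`, so large counts do not overflow. The function names are illustrative.

```python
import math

def log_beta(a: float, b: float) -> float:
    """log B(a, b) via the Beta-Gamma relationship, computed in log space."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_binomial_marginal(k: int, n: int, alpha: float, beta: float) -> float:
    """P(k successes in n trials) with the success probability integrated out
    against a Beta(alpha, beta) prior: C(n, k) * B(alpha + k, beta + n - k) / B(alpha, beta)."""
    log_p = (math.log(math.comb(n, k))
             + log_beta(alpha + k, beta + n - k)
             - log_beta(alpha, beta))
    return math.exp(log_p)

alpha, beta = 2.0, 3.0   # hypothetical prior
n, k = 10, 7             # hypothetical data: 7 successes in 10 trials
print(f"posterior: Beta({alpha + k:g}, {beta + n - k:g})")
probs = [beta_binomial_marginal(j, n, alpha, beta) for j in range(n + 1)]
print("marginal likelihood of k = 7:", probs[k])
print("sum over all k:", sum(probs))   # a probability mass function, so 1 up to rounding
```

With a uniform prior ($\alpha = \beta = 1$), every $k$ gets marginal probability $1/(n+1)$. This is a classical check that the Beta-function bookkeeping is right.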

Beta function integrand shapes and Beta distribution densities

The Gaussian Integral

The Gaussian integral is arguably the most important single integral in all of applied mathematics. It normalizes the Gaussian distribution, appears in the definition of $\Gamma(1/2)$ , gives the volume of the $n$ -sphere, and shows up in quantum mechanics, statistical physics, and signal processing.

🔷 Theorem 7 (The Gaussian Integral)

$\int_{-\infty}^{\infty} e^{-x^2}\,dx = \sqrt{\pi}.$

Equivalently, $\int_0^{\infty} e^{-x^2}\,dx = \frac{\sqrt{\pi}}{2}$ .

Proof.

Let $I = \int_0^{\infty} e^{-x^2}\,dx$ . The key trick is to compute $I^2$ as a double integral:

$I^2 = \left(\int_0^{\infty} e^{-x^2}\,dx\right)\left(\int_0^{\infty} e^{-y^2}\,dy\right) = \int_0^{\infty}\int_0^{\infty} e^{-(x^2 + y^2)}\,dx\,dy.$

Convert to polar coordinates: $x = r\cos\theta$ , $y = r\sin\theta$ , with Jacobian $r$ , and $x^2 + y^2 = r^2$ :

$I^2 = \int_0^{\pi/2}\int_0^{\infty} e^{-r^2}\,r\,dr\,d\theta = \frac{\pi}{2}\int_0^{\infty} r\,e^{-r^2}\,dr.$

The inner integral evaluates by substituting $u = r^2$ , $du = 2r\,dr$ :

$\int_0^{\infty} r\,e^{-r^2}\,dr = \frac{1}{2}\int_0^{\infty} e^{-u}\,du = \frac{1}{2}.$

Therefore $I^2 = \frac{\pi}{2} \cdot \frac{1}{2} = \frac{\pi}{4}$ , so $I = \frac{\sqrt{\pi}}{2}$ , and $\int_{-\infty}^{\infty} e^{-x^2}\,dx = 2I = \sqrt{\pi}$ .

(This proof uses a double integral and polar coordinates — multivariable tools that will be developed rigorously in Track 4. We preview them here because the result is too important and the proof too elegant to postpone. The reader can follow the geometry: squaring the integral converts a 1D problem into a 2D problem where the radial symmetry of $e^{-(x^2+y^2)}$ makes the integral tractable.)

∎

🔷 Proposition 2 (Gaussian Integral Variants)

(a) For $a > 0$ : $\int_{-\infty}^{\infty} e^{-ax^2}\,dx = \sqrt{\pi/a}$ . (Substitute $u = \sqrt{a}\,x$ .)

(b) Completing the square: $\int_{-\infty}^{\infty} e^{-ax^2 + bx}\,dx = \sqrt{\pi/a}\,e^{b^2/(4a)}$ .

(c) Even moments: $\int_{-\infty}^{\infty} x^{2n} e^{-x^2}\,dx = \frac{(2n)!}{4^n n!}\sqrt{\pi}$ .

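Part (a) follows by substituting $u = \sqrt{a}\,x$. Part (b) follows by completing the square, $-ax^2 + bx = -a\left(x - \frac{b}{2a}\right)^2 + \frac{b^2}{4a}$, and then applying (a). Part (c) follows by differentiating (a) $n$ times with respect to $a$ and setting $a = 1$.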
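Proposition 2 is easy to sanity-check numerically. The sketch below uses standard-library Python, an illustrative `simpson` helper, and arbitrary test values $a = 2$, $b = 1$. It truncates each integral to $[-12, 12]$, where every integrand here is below $10^{-50}$, and compares the results against the closed forms, including the base case $\sqrt{\pi}$ of Theorem 7.

```python
import math

def simpson(f, lo: float, hi: float, n: int = 4000) -> float:
    """Composite Simpson's rule on [lo, hi]; n must be even."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3

def over_real_line(f) -> float:
    """Integral of f over [-12, 12], standing in for the whole real line."""
    return simpson(f, -12.0, 12.0)

a, b = 2.0, 1.0  # arbitrary test parameters
checks = {
    "Theorem 7": (over_real_line(lambda x: math.exp(-x * x)), math.sqrt(math.pi)),
    "(a)": (over_real_line(lambda x: math.exp(-a * x * x)), math.sqrt(math.pi / a)),
    "(b)": (over_real_line(lambda x: math.exp(-a * x * x + b * x)),
            math.sqrt(math.pi / a) * math.exp(b * b / (4 * a))),
}
for n in (1, 2, 3):
    exact = math.factorial(2 * n) / (4 ** n * math.factorial(n)) * math.sqrt(math.pi)
    numeric = over_real_line(lambda x, n=n: x ** (2 * n) * math.exp(-x * x))
    checks[f"(c) n = {n}"] = (numeric, exact)

for name, (numeric, exact) in checks.items():
    print(f"{name:12s} numeric = {numeric:.12f}  exact = {exact:.12f}")
```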

📝 Example 12 (Normalizing the Gaussian density)

The Gaussian density is $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-(x-\mu)^2/(2\sigma^2)}$ . We verify $\int_{-\infty}^{\infty} f(x)\,dx = 1$ :

Substitute $u = (x - \mu)/(\sigma\sqrt{2})$ , so $dx = \sigma\sqrt{2}\,du$ :

$\int_{-\infty}^{\infty} f(x)\,dx = \frac{1}{\sigma\sqrt{2\pi}} \cdot \sigma\sqrt{2} \int_{-\infty}^{\infty} e^{-u^2}\,du = \frac{\sqrt{2}}{\sqrt{2\pi}} \cdot \sqrt{\pi} = 1. \quad \checkmark$

The $\sqrt{2\pi}$ in the denominator of the Gaussian density is there precisely because $\int e^{-x^2}\,dx = \sqrt{\pi}$ — it’s the Gaussian integral doing the normalizing.

💡 Remark 7 (The error function and Gaussian CDF)

The error function $\text{erf}(x) = \frac{2}{\sqrt{\pi}}\int_0^x e^{-t^2}\,dt$ is the normalized incomplete Gaussian integral. The Gaussian CDF is:

$\Phi(x) = \frac{1}{2}\left[1 + \text{erf}\left(\frac{x}{\sqrt{2}}\right)\right].$

The error function has no expression in terms of elementary functions, because $e^{-t^2}$ has no elementary antiderivative. But it can be computed numerically to machine precision by standard libraries (scipy.special.erf, torch.special.erf). The rapid decay of the Gaussian tail, $1 - \Phi(x) \approx \frac{1}{x\sqrt{2\pi}}e^{-x^2/2}$ for large $x$, is what makes sub-Gaussian concentration inequalities so powerful. This approximation is equivalent to the Mills ratio asymptotic $(1 - \Phi(x))/\varphi(x) \approx 1/x$, where $\varphi$ is the standard normal density.
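Python's standard library exposes the error function as `math.erf` and its complement as `math.erfc`, so the CDF formula and the tail approximation above can be checked directly. The helper names below are illustrative. The one design point is to compute the upper tail $1 - \Phi(x)$ with `erfc` rather than as `1 - Phi(x)`. The subtraction loses accuracy to cancellation as $\Phi(x)$ approaches $1$, and returns exactly $0$ once $\Phi(x)$ rounds to $1$.

```python
import math

def Phi(x: float) -> float:
    """Standard normal CDF via the error function (Remark 7)."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def upper_tail(x: float) -> float:
    """1 - Phi(x), computed with erfc to avoid cancellation for large x."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def tail_approx(x: float) -> float:
    """Leading-order tail approximation exp(-x^2/2) / (x * sqrt(2*pi))."""
    return math.exp(-x * x / 2) / (x * math.sqrt(2 * math.pi))

for x in (1.0, 2.0, 4.0, 8.0):
    exact = upper_tail(x)
    print(f"x = {x}: 1 - Phi = {exact:.3e}, approximation = {tail_approx(x):.3e}, "
          f"ratio = {tail_approx(x) / exact:.4f}")
```

The ratio exceeds $1$ and tends to $1$ as $x$ grows. For $x > 0$ the standard bounds $\frac{x}{1+x^2}\varphi(x) \le 1 - \Phi(x) \le \frac{\varphi(x)}{x}$ keep it between $1$ and $1 + 1/x^2$.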

The Gaussian integral and polar coordinates proof

Stirling’s Approximation

🔷 Theorem 8 (Stirling's Approximation)

$n! \sim \sqrt{2\pi n}\left(\frac{n}{e}\right)^n \quad \text{as } n \to \infty.$

More precisely: $\lim_{n \to \infty} \frac{n!}{\sqrt{2\pi n}(n/e)^n} = 1$ . The relative error is $O(1/n)$ :

$n! = \sqrt{2\pi n}\left(\frac{n}{e}\right)^n \left(1 + \frac{1}{12n} + O(1/n^2)\right).$

Proof.

We sketch the proof via the Gamma function and Laplace’s method. Write $n! = \Gamma(n+1) = \int_0^{\infty} t^n e^{-t}\,dt$ . The integrand $h(t) = t^n e^{-t}$ has its maximum at $t = n$ (set $h'(t) = 0$ : $nt^{n-1}e^{-t} - t^n e^{-t} = 0$ gives $t = n$ ).

Substitute $t = n + \sqrt{n}\,u$ to center the integrand at its maximum. Taylor-expanding $\log h(t)$ around $t = n$ :

$\log h(n + \sqrt{n}\,u) = \log h(n) + 0 \cdot (\sqrt{n}\,u) - \frac{u^2}{2} + O(u^3/\sqrt{n}).$

So $h(n + \sqrt{n}\,u) \approx h(n) \cdot e^{-u^2/2}$ for the leading term. Integrating:

$\Gamma(n+1) \approx h(n) \cdot \sqrt{n} \int_{-\infty}^{\infty} e^{-u^2/2}\,du = n^n e^{-n} \cdot \sqrt{n} \cdot \sqrt{2\pi} = \sqrt{2\pi n}\left(\frac{n}{e}\right)^n.$

The Laplace approximation replaces the integrand by a Gaussian centered at its maximum — the $\sqrt{2\pi}$ factor is precisely the Gaussian integral. The correction terms (the $1/(12n)$ and higher) come from including the cubic and higher Taylor terms of $\log h$ .

∎

📝 Example 13 (Numerical verification)

$n$	$n!$	Stirling	Relative error
5	120	118.02	1.65%
10	3,628,800	3,598,696	0.83%
20	$2.433 \times 10^{18}$	$2.423 \times 10^{18}$	0.42%
50	$3.041 \times 10^{64}$	$3.036 \times 10^{64}$	0.17%
100	$9.333 \times 10^{157}$	$9.325 \times 10^{157}$	0.083%

The approximation improves as $n$ increases — the relative error decreases like $1/(12n)$ .
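The table can be reproduced in standard-library Python without ever forming $n!$ as a float. `math.lgamma(n + 1)` gives $\log n!$, and the Stirling formula is evaluated in log space as well. The sketch, which uses an illustrative function name, also applies the $1 + \frac{1}{12n}$ correction from Theorem 8 to show how much of the error it removes.

```python
import math

def stirling_errors(n: int):
    """Relative errors 1 - S(n)/n! of Stirling's formula, without and with the 1/(12n) term."""
    log_fact = math.lgamma(n + 1)                                  # log(n!), no overflow
    log_s = 0.5 * math.log(2 * math.pi * n) + n * math.log(n) - n  # log of sqrt(2 pi n) (n/e)^n
    plain = 1 - math.exp(log_s - log_fact)
    corrected = 1 - math.exp(log_s + math.log1p(1 / (12 * n)) - log_fact)
    return plain, corrected

for n in (5, 10, 20, 50, 100):
    plain, corrected = stirling_errors(n)
    print(f"n = {n:3d}: relative error {plain:.3%}, with 1/(12n) term {corrected:.1e}")
```

The first column matches the table. The corrected errors are far smaller, as the $O(1/n^2)$ remainder in Theorem 8 predicts.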

💡 Remark 8 (Stirling in log form)

In practice, the most useful form is:

$\log(n!) \approx n\log n - n + \frac{1}{2}\log(2\pi n).$

This is the form used throughout information theory. The key application: $\log\binom{n}{k} = \log(n!) - \log(k!) - \log((n-k)!)$ . Applying Stirling to each factorial:

$\log\binom{n}{k} = nH(k/n) + O(\log n),$

where $H(p) = -p\log p - (1-p)\log(1-p)$ is the binary entropy function (in nats, since $\log$ is the natural logarithm here). Stirling’s approximation converts combinatorial counting problems into entropy calculations, and this is the foundation of the “method of types” in information theory.
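Here is a quick standard-library check of the entropy approximation, with the illustrative choice $n = 1000$ and natural logarithms throughout. `math.comb` returns the exact binomial coefficient as a Python integer, and `math.log` accepts integers of any size.

```python
import math

def binary_entropy(p: float) -> float:
    """H(p) in nats, with the convention 0 * log 0 = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

n = 1000
for k in (10, 100, 300, 500):
    exact = math.log(math.comb(n, k))
    approx = n * binary_entropy(k / n)
    print(f"k = {k}: log C(n,k) = {exact:.3f}, n H(k/n) = {approx:.3f}, gap = {approx - exact:.3f}")
```

Both quantities run to the hundreds while the gap stays at a few nats. This is consistent with the sandwich bound $nH(k/n) - \log(n+1) \le \log\binom{n}{k} \le nH(k/n)$, valid for $0 \le k \le n$.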

Stirling's approximation vs. n! on log scale

Computational Notes

The special functions from this topic have production-quality numerical implementations in every scientific computing library.

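Improper integrals: scipy.integrate.quad handles improper integrals directly; pass np.inf as a limit. It wraps the Fortran QUADPACK library, using adaptive Gauss-Kronrod quadrature with an extrapolation step that copes with integrable endpoint singularities (infinite ranges are first mapped to a finite interval). Its default tolerances are epsabs = epsrel = 1.49e-8, so pass tighter values if you need more digits. For smooth, rapidly decaying integrands the actual error is often far smaller.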

```python
from scipy.integrate import quad
import numpy as np

# Type I: integral from 1 to infinity of 1/x^2
val, err = quad(lambda x: 1/x**2, 1, np.inf)  # val ≈ 1.0

# Type II: integral from 0 to 1 of 1/sqrt(x); quad never evaluates the endpoint x = 0
val, err = quad(lambda x: 1/np.sqrt(x), 0, 1)  # val ≈ 2.0
```

Special functions: scipy.special provides gamma, beta, erf, and dozens more. For numerical stability, always use scipy.special.gammaln (log-Gamma) instead of computing $\Gamma(n)$ directly — $\Gamma(n)$ overflows double precision for $n > 171$ , but $\log\Gamma(n)$ is representable for much larger $n$ .

```python
from scipy.special import gamma, gammaln, beta, erf

gamma(5)      # 24.0 (= 4!)
gammaln(1000) # 5905.22... (log(999!); no overflow)
beta(2, 3)    # 0.08333... (= 1/12)
erf(1.0)      # 0.8427...
```

Rule of thumb: Never compute $\Gamma(n)$ or $n!$ directly for large $n$ . Always work with $\log\Gamma$ and exponentiate only at the end, if needed.
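The same rule of thumb applies with Python's standard library, which provides `math.gamma` and `math.lgamma`. Here is a minimal sketch, with $n = 500$ chosen for illustration.

```python
import math

n = 500.0
try:
    math.gamma(n)                      # Gamma(500) is far beyond the largest double
except OverflowError as exc:
    print("math.gamma overflowed:", exc)

# Work in log space and exponentiate only the final, moderate-sized ratio.
ratio = math.exp(math.lgamma(n + 0.5) - math.lgamma(n))   # Gamma(n + 1/2) / Gamma(n)
print(ratio, math.sqrt(n))             # for large n the ratio is close to sqrt(n)
```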

Numerical convergence of truncated improper integrals

Connections to Statistics

Improper integrals are the workhorses of statistical theory: every continuous distribution on an unbounded support is defined via a normalizing improper integral, and tail probabilities, moments, and Bayesian normalizing constants all reduce to such integrals.

Special functions as normalizing constants

The Gamma function $\Gamma(s) = \int_0^\infty t^{s-1} e^{-t} \, dt$ normalizes the Gamma, Chi-squared, and Student- $t$ distributions. The Beta function normalizes Beta distributions. The Gaussian integral $\int_{-\infty}^\infty e^{-x^2} dx = \sqrt{\pi}$ normalizes the Normal. Every continuous distribution on an unbounded support is defined via an improper integral; conjugate priors (Beta-Binomial, Gamma-Poisson, Normal-Normal) are constructed precisely so the posterior normalizer reduces to a ratio of these special functions. See formalStatistics Continuous Distributions and formalStatistics Bayesian Foundations & Prior Selection.

Tail probabilities and large deviations

Tail probabilities $P(X > t) = \int_t^\infty f(x) \, dx$ are improper integrals, and their decay rate classifies the distribution. Gaussian-type decay $e^{-ct^2}$ makes it sub-Gaussian, exponential decay $e^{-ct}$ makes it sub-exponential, and polynomial decay makes it heavy-tailed. In large-deviation theory, the Cramér rate function controls the exponential rate at which tail probabilities of sample means shrink. See formalStatistics Large Deviations.

Moment existence

For heavy-tailed distributions, whether $E[X^k]$ exists is an improper-integral convergence question. The $k$-th moment is finite iff $\int |x|^k f(x) \, dx$ converges, which depends on the tail decay rate relative to the power $k$. The Cauchy distribution, with $f(x) \propto 1/(1+x^2)$, has no finite mean. The reason is that $|x|/(1+x^2)$ behaves like $1/|x|$ at infinity, so $\int |x|/(1+x^2) \, dx$ diverges by limit comparison with the $p = 1$ case. See formalStatistics Expectation & Moments.

Connections to ML

The special functions from this topic are the computational backbone of probability distributions in machine learning. We make the connections explicit.

Normalizing constants for probability distributions

Every probability density $f(x)$ satisfies $\int f = 1$, so a density is its kernel divided by the normalizing constant $Z = \int \text{kernel}$. For the standard parametric families, $Z$ is a special function from this topic:

Distribution	Density kernel	Normalizing constant $Z$ (integral of the kernel)	Special function
$\mathcal{N}(\mu, \sigma^2)$	$e^{-(x-\mu)^2/(2\sigma^2)}$	$\sigma\sqrt{2\pi}$	Gaussian integral
$\text{Gamma}(\alpha, \beta)$	$x^{\alpha-1}e^{-\beta x}$	$\Gamma(\alpha)/\beta^\alpha$	Gamma function
$\text{Beta}(\alpha, \beta)$	$x^{\alpha-1}(1-x)^{\beta-1}$	$B(\alpha, \beta)$	Beta function
$\chi^2_k$	$x^{k/2-1}e^{-x/2}$	$2^{k/2}\Gamma(k/2)$	Gamma function
$t_\nu$ (Student)	$(1 + x^2/\nu)^{-(\nu+1)/2}$	$\frac{\sqrt{\nu\pi}\,\Gamma(\nu/2)}{\Gamma((\nu+1)/2)}$	Gamma function
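These normalizing constants can be checked numerically. The sketch below uses only the standard library and hypothetical parameter values. It applies a composite midpoint rule to each kernel on a truncated interval that is assumed wide enough for the omitted tail mass to be negligible; that assumption must be revisited for other parameters. It then compares each result with the closed-form value of $\int \text{kernel}$ computed via `math.gamma`. For the Student-$t$ kernel that value is $\sqrt{\nu\pi}\,\Gamma(\nu/2)/\Gamma((\nu+1)/2)$. In practice `scipy.integrate.quad`, which accepts infinite limits, is the more usual tool.

```python
import math

def midpoint(f, lo: float, hi: float, n: int = 200_000) -> float:
    """Composite midpoint rule on [lo, hi]; never evaluates f exactly at an endpoint."""
    h = (hi - lo) / n
    return h * math.fsum(f(lo + (i + 0.5) * h) for i in range(n))

def beta_fn(a: float, b: float) -> float:
    # B(a, b) = Γ(a)Γ(b) / Γ(a + b)
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

# Illustrative parameter values (hypothetical, chosen only for the check)
mu, sigma = 1.0, 2.0        # Normal
alpha, rate = 3.0, 2.0      # Gamma (shape, rate)
a_beta, b_beta = 2.5, 3.5   # Beta
k = 5                       # chi-squared degrees of freedom
nu = 4                      # Student-t degrees of freedom

# (name, kernel, truncated interval, closed-form integral Z of the kernel)
cases = [
    ("Normal", lambda x: math.exp(-(x - mu) ** 2 / (2 * sigma**2)),
     (mu - 20 * sigma, mu + 20 * sigma), sigma * math.sqrt(2 * math.pi)),
    ("Gamma", lambda x: x ** (alpha - 1) * math.exp(-rate * x),
     (0.0, 50.0), math.gamma(alpha) / rate**alpha),
    ("Beta", lambda x: x ** (a_beta - 1) * (1 - x) ** (b_beta - 1),
     (0.0, 1.0), beta_fn(a_beta, b_beta)),
    ("Chi-squared", lambda x: x ** (k / 2 - 1) * math.exp(-x / 2),
     (0.0, 200.0), 2 ** (k / 2) * math.gamma(k / 2)),
    ("Student-t", lambda x: (1 + x * x / nu) ** (-(nu + 1) / 2),
     (-2000.0, 2000.0),
     math.sqrt(nu * math.pi) * math.gamma(nu / 2) / math.gamma((nu + 1) / 2)),
]

results = {}
for name, kernel, (lo, hi), z in cases:
    results[name] = (midpoint(kernel, lo, hi), z)
    print(f"{name:<12} numeric {results[name][0]:.6f}   closed form {z:.6f}")
```

With $\nu = 4$ the Student-$t$ constant is exactly $8/3$, because $\Gamma(2) = 1$ and $\Gamma(5/2) = \tfrac34\sqrt\pi$.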

Every time you write down a Gaussian, Gamma, or Beta distribution in a model, you’re implicitly using the improper integrals from this topic.

Forward link: Measure-Theoretic Probability on formalML develops the Lebesgue integral framework that makes these normalizing constants rigorous.

Bayesian posterior computation

Conjugate priors are chosen so that $\int \text{likelihood} \times \text{prior}\,d\theta$ reduces to a ratio of special functions:

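Beta-Binomial: Prior $\text{Beta}(\alpha, \beta)$, posterior $\text{Beta}(\alpha + k, \beta + n - k)$ after $k$ successes in $n$ trials. The marginal likelihood of the observed sequence is $\frac{B(\alpha + k, \beta + n - k)}{B(\alpha, \beta)}$, a ratio of Beta functions (multiply by $\binom{n}{k}$ for the probability of the count $k$).
Gamma-Poisson: Prior $\text{Gamma}(\alpha, \beta)$ (rate $\beta$), posterior $\text{Gamma}(\alpha + \sum x_i, \beta + n)$ after $n$ Poisson observations $x_1, \dots, x_n$. The marginal likelihood again contains a ratio of Gamma functions, $\Gamma(\alpha + \sum x_i)/\Gamma(\alpha)$.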
Normal-Normal: Posterior precision is the sum of prior and likelihood precisions. The normalizing constant involves $\sqrt{2\pi}$ .
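For the Beta-Binomial case, here is a minimal sketch using hypothetical helper names and the standard-library `math.lgamma`. It computes the log marginal likelihood of a count $k$, including the factor $\binom{n}{k}$, as a difference of log-Beta values, working in log space throughout. The evidence is a probability distribution over $k = 0, \dots, n$, so summing it gives a useful sanity check.

```python
import math

def log_beta(a: float, b: float) -> float:
    """log B(a, b) = log Γ(a) + log Γ(b) - log Γ(a + b), overflow-free."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_binomial_log_evidence(k: int, n: int, alpha: float, beta: float) -> float:
    """Hypothetical helper: log P(k successes in n trials) under a Beta(alpha, beta) prior on p.

    ∫_0^1 C(n,k) p^k (1-p)^(n-k) * p^(alpha-1) (1-p)^(beta-1) / B(alpha, beta) dp
    = C(n,k) * B(alpha + k, beta + n - k) / B(alpha, beta).
    """
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + log_beta(alpha + k, beta + n - k) - log_beta(alpha, beta)

# Sanity check: the evidence sums to 1 over k = 0..n (up to floating-point rounding).
n, alpha, beta = 50, 2.0, 3.0
total = math.fsum(math.exp(beta_binomial_log_evidence(k, n, alpha, beta)) for k in range(n + 1))
print(total)
```

With a uniform prior ($\alpha = \beta = 1$) the formula gives exactly $1/(n+1)$ for every $k$, since $\binom{n}{k} B(k+1, n-k+1) = \frac{1}{n+1}$.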

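When conjugacy is unavailable, the normalizing integral rarely has a closed form. It must then be approximated numerically (for example, by quadrature in low dimensions) or avoided. MCMC samples from the posterior using only the unnormalized density, and variational inference maximizes a lower bound on the log of the normalizer.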

Forward link: Bayesian Nonparametrics on formalML develops the general Bayesian inference framework.

Stirling’s approximation in information theory

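For large $n$, the entropy (in nats) of the binomial distribution $\text{Bin}(n, p)$ is approximately that of a Gaussian with the same variance $np(1-p)$:

$H \approx \frac{1}{2}\log(2\pi e\, n p(1-p))$

This follows from Stirling's approximation applied to $\binom{n}{k}$ in the binomial probabilities. The “method of types” uses $\log\binom{n}{k} \approx nH(k/n)$, accurate up to $O(\log n)$ terms, to count the binary strings of length $n$ with empirical frequency $k/n$. Here $H(q) = -q\log q - (1-q)\log(1-q)$ is the binary entropy. Stirling converts combinatorial counting into entropy maximization.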
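Both approximations can be checked numerically. The check below assumes natural logarithms throughout and illustrative values $n = 1000$, $k = 300$, $p = 0.3$; the helper names are hypothetical. It first compares $\log\binom{n}{k}$ with $nH(k/n)$ and with the refined Stirling form $nH(q) - \frac12\log(2\pi n q(1-q))$, where $q = k/n$. It then compares the exact binomial entropy with $\frac12\log(2\pi e\,np(1-p))$.

```python
import math

def log_binom(n: int, k: int) -> float:
    """log C(n, k) via log-Gamma, safe for large n."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def binary_entropy(q: float) -> float:
    """H(q) = -q log q - (1-q) log(1-q), in nats, for 0 < q < 1."""
    return -q * math.log(q) - (1 - q) * math.log1p(-q)

def binomial_entropy(n: int, p: float) -> float:
    """Exact entropy of Bin(n, p) in nats: -Σ P(k) log P(k), with log P(k) from lgamma."""
    h = 0.0
    for k in range(n + 1):
        log_pk = log_binom(n, k) + k * math.log(p) + (n - k) * math.log1p(-p)
        h -= math.exp(log_pk) * log_pk  # terms with negligible P(k) underflow harmlessly to 0
    return h

n, k, p = 1000, 300, 0.3
q = k / n
exact_count = log_binom(n, k)
types_approx = n * binary_entropy(q)
refined = types_approx - 0.5 * math.log(2 * math.pi * n * q * (1 - q))
print(f"log C(n,k) = {exact_count:.4f}, nH(k/n) = {types_approx:.4f}, refined = {refined:.4f}")

H = binomial_entropy(n, p)
gaussian_form = 0.5 * math.log(2 * math.pi * math.e * n * p * (1 - p))
print(f"H(Bin(n,p)) = {H:.6f}, 0.5*log(2*pi*e*n*p*(1-p)) = {gaussian_form:.6f}")
```

The gap $H - \frac12\log(2\pi\,np(1-p))$ comes out close to $\frac12$. That is exactly the $\frac12\log e$ contributed by the factor $e$ inside the logarithm.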

Forward link: Shannon Entropy on formalML develops the full information-theoretic framework.

Tail probabilities and concentration

The tail probability $P(X > t) = \int_t^{\infty} f(x)\,dx$ is an improper integral. For a standard Gaussian:

$P(X > t) = \frac{1}{2}\text{erfc}\left(\frac{t}{\sqrt{2}}\right) \le \frac{1}{t\sqrt{2\pi}}e^{-t^2/2}$

(the Mills-ratio bound, valid for $t > 0$). The tail decay rate determines whether the distribution is sub-Gaussian, sub-exponential, or heavy-tailed, and thereby governs the strength of available concentration inequalities.
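The bound is easy to check numerically. This sketch uses the standard-library `math.erfc` and `math.exp`, and the function names are illustrative. It prints the exact Gaussian tail, the Mills-ratio bound $\varphi(t)/t$ (where $\varphi$ is the standard normal density), and their ratio.

```python
import math

def gaussian_tail(t: float) -> float:
    """P(X > t) = erfc(t / sqrt(2)) / 2 for X ~ N(0, 1)."""
    return 0.5 * math.erfc(t / math.sqrt(2))

def mills_bound(t: float) -> float:
    """Upper bound φ(t)/t = exp(-t²/2) / (t sqrt(2π)), valid only for t > 0."""
    return math.exp(-t * t / 2) / (t * math.sqrt(2 * math.pi))

ratios = {}
for t in (0.5, 1.0, 2.0, 4.0, 8.0):
    tail, bound = gaussian_tail(t), mills_bound(t)
    ratios[t] = bound / tail
    print(f"t={t:>3}: tail {tail:.3e}  bound {bound:.3e}  bound/tail {ratios[t]:.4f}")
```

The ratio exceeds 2 at $t = 0.5$, where the bound is useless because it is larger than $\tfrac12$. As $t$ grows the ratio approaches 1, since $P(X > t) \sim \varphi(t)/t$.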

Forward link: Concentration Inequalities on formalML develops sub-Gaussian and sub-exponential tail bounds.

Figure: ML connections: normalizing constants and Bayesian posteriors

Connections & Further Reading

Prerequisites — topics you need first

foundational Single-Variable Calculus 50 min

The Riemann Integral & FTC

The Riemann integral of a bounded function on a bounded interval is the starting point. Improper integrals extend it via limits: ∫₁∞ f = lim_{b→∞} ∫₁ᵇ f. The FTC, linearity, monotonicity, and comparison properties of the integral are used throughout.

intermediate Single-Variable Calculus 55 min

Mean Value Theorem & Taylor Expansion

Taylor expansion near singularities determines the convergence behavior of Type II improper integrals. Stirling’s approximation comes from Laplace’s method, which expands the logarithm of the integrand of Γ(n+1) to second order about its maximum and integrates the resulting Gaussian. The limit comparison test relies on the asymptotic analysis tools from Topic 6.

foundational Limits & Continuity 40 min

Sequences, Limits & Convergence

Improper integrals are defined as limits of proper integrals. The convergence/divergence analysis parallels the convergence theory of sequences from Topic 1, and the comparison test for improper integrals is the continuous analog of the comparison test for sequences.

intermediate Limits & Continuity 40 min

Completeness & Compactness

The Monotone Convergence principle (a consequence of completeness, Topic 3) justifies the existence of the limits defining convergent improper integrals. For a nonnegative integrand, the truncated integrals ∫ₐᵇ f increase with b, so if they are bounded, the limit as b → ∞ exists and equals their supremum.

foundational Single-Variable Calculus 45 min

The Derivative & Chain Rule

The Gamma function’s functional equation Γ(s+1) = sΓ(s) is proved via integration by parts (the product rule in reverse, Topic 5/Topic 7). Differentiation under the integral sign — differentiating ∫ f(x,t) dt with respect to a parameter — uses the derivative theory from Topic 5.

Where this leads — next in formalCalculus

foundational probability-foundations 40 min

Probability & The Union Bound

On to formalStatistics — where this calculus powers inference

Continuous Distributions

The Gamma function Γ(s) = ∫₀^∞ t^(s-1) e^(-t) dt normalizes the Gamma, Chi-squared, and Student-t distributions. The Beta function normalizes Beta distributions. The Gaussian integral ∫ e^(-x²) dx = √π normalizes the Normal. Every continuous distribution on unbounded support is defined via an improper integral.

Bayesian Foundations And Prior Selection

Conjugate priors (Beta-Binomial, Gamma-Poisson, Normal-Normal) are constructed so the posterior normalizer reduces to a ratio of Gamma or Beta functions. Improper priors (Jeffreys, flat) require the improper-integral machinery to verify posterior propriety.

Large Deviations

Tail probabilities P(X > t) = ∫_t^∞ f(x) dx are improper integrals. The rate-function asymptotics in Cramér's theorem control the exponential decay of these tail integrals.

Expectation Moments

Moment existence for heavy-tailed distributions is an improper-integral convergence question: E[X^k] is finite iff ∫ |x|^k f(x) dx converges, which depends on the tail decay rate relative to the power k.

On to formalML — where this calculus powers ML

Measure Theoretic Probability

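Every probability density satisfies ∫ f(x) dx = 1, an improper integral when the support is unbounded. The Gamma and Beta functions are normalizing constants for the Gamma, Chi-squared, Beta, and Dirichlet distributions. The Lebesgue integral replaces improper Riemann limits with a single definition based on absolute integrability. This clarifies the status of conditionally convergent integrals such as ∫₀^∞ sin(x)/x dx, which converges as an improper Riemann integral but is not Lebesgue integrable, and it enables measure-theoretic probability.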

Bayesian Nonparametrics

Bayesian posteriors require ∫ likelihood × prior dθ over (often unbounded) parameter spaces. Conjugate priors (Gamma-Poisson, Beta-Binomial, Normal-Normal) are designed so these integrals reduce to ratios of Gamma and Beta functions (or to Gaussian integrals), yielding closed-form posteriors.

Shannon Entropy

Differential entropy h(X) = -∫ f(x) log f(x) dx and KL divergence are improper integrals over ℝ. For the Gaussian, h evaluates to ½ log(2πeσ²) via the Gaussian integral. Stirling’s approximation gives the entropy of the binomial distribution for large n: H(Bin(n,p)) ≈ ½ log(2πe np(1-p)).

Concentration Inequalities

Tail probabilities P(X > t) = ∫_t^∞ f(x) dx are improper integrals. Their decay rate (Gaussian-type, exponential, or polynomial) determines sub-Gaussian, sub-exponential, or heavy-tailed behavior. Stirling’s approximation appears in sharp bounds for binomial tails via the entropy method.

References

Abbott (2015). Understanding Analysis. Chapter 7.4 covers improper integrals as an extension of the Riemann integral; our primary reference for the convergence theory and comparison tests.
Rudin (1976). Principles of Mathematical Analysis. Chapter 8 develops special functions, including the Gamma function, with characteristic concision; useful for the functional equation and Stirling’s approximation.
Spivak (2008). Calculus. Chapter 18 on improper integrals with geometric motivation, and Chapter 19 on the Gamma function; the best reference for combining rigor with intuition.
Graham, Knuth & Patashnik (1994). Concrete Mathematics. Chapter 9 develops Stirling’s approximation with detailed asymptotics; the most thorough treatment of the factorial’s asymptotic behavior.
Folland (1999). Real Analysis. Chapter 2 on the Lebesgue integral; useful for understanding where improper Riemann integration fails and why the Lebesgue framework handles these issues naturally.
Bishop (2006). Pattern Recognition and Machine Learning. Appendix B collects the special function identities (Gamma, Beta, Gaussian) used throughout Bayesian ML; the ML practitioner’s reference for these functions.