On the Correlation Structure of a Greedy Harmonic Decomposition of the Sine Function
Victor Geere
March 2026
Table of Contents
- Abstract
- 1. Notation and Preliminaries
- 2. The Harmonic Sine Reconstruction
- 3. Convergence of the Harmonic Reconstruction
- 4. Selection Monotonicity and Correlation Structure
- 5. Asymptotic Distribution of Threshold Angles
- 6. Analysis of the Correlation Kernel
- 7. Discussion and Open Questions
- References
Abstract
We study a greedy algorithm that decomposes any target angle \(\theta \in [0,\pi]\) into a sub-sum of the harmonic angles \(\alpha_n = \pi/(n+2)\). The algorithm produces binary selection indicators \(\delta_n(\theta) \in \{0,1\}\) whose partial sums reconstruct \(\sin(\theta)\) via iterated application of the angle addition formula. We prove the exact reconstruction theorem with an explicit convergence rate of \(O(1/N)\), establish that the selection map \(\theta \mapsto \delta_n(\theta)\) is monotone for each fixed \(n\), and use this monotonicity to define threshold angles \(\theta_n^*\) at which each index is first selected. The main contribution is a complete computation of the correlation kernel \(K(n,m) = \int_0^\pi [\delta_n(\theta) - \theta/\pi][\delta_m(\theta) - \theta/\pi]\,d\theta\), expressed in closed form via the threshold angles. We investigate the asymptotic distribution of the thresholds \(\theta_n^*\) and establish several structural properties of this deterministic combinatorial-geometric object.
1. Notation and Preliminaries
- \(\Theta(\cdot)\) is the Heaviside step function: \(\Theta(x) = 1\) if \(x \geq 0\), else \(0\).
- \(\operatorname{sgn}(x)\) is the sign function.
- \(H_k = \sum_{j=1}^{k} 1/j\) denotes the \(k\)-th harmonic number, with \(H_k = \ln k + \gamma + O(1/k)\) where \(\gamma \approx 0.5772\) is the Euler–Mascheroni constant.
- We write \(f \ll g\) to mean \(|f| \leq C|g|\) for an absolute constant \(C > 0\).
- We write \(f \asymp g\) to mean \(f \ll g\) and \(g \ll f\).
2. The Harmonic Sine Reconstruction
Base case. \(s_0 = \sin(0) = 0\), \(c_0 = \cos(0) = 1\). ✓
Inductive step. If \(\delta_n = 0\), then \(\theta_{n+1} = \theta_n\) and \(s_{n+1} = s_n = \sin(\theta_n) = \sin(\theta_{n+1})\). If \(\delta_n = 1\), then \(\theta_{n+1} = \theta_n + \alpha_n\) and by the angle-addition formula:
\[ s_{n+1} = \sin(\theta_n)\cos(\alpha_n) + \cos(\theta_n)\sin(\alpha_n) = \sin(\theta_n + \alpha_n) = \sin(\theta_{n+1}). \]
Similarly for \(c_{n+1} = \cos(\theta_{n+1})\).
3. Convergence of the Harmonic Reconstruction
Convergence. The greedy rule ensures \(0 \leq \theta - \theta_n \leq \alpha_{n-1} = \pi/(n+1)\) at each step (see Lemma 3.1 below). Thus \(|\theta - \theta_N| \leq \pi/(N+1) \to 0\), and since sine is Lipschitz with constant 1:
\[ |s_N - \sin\theta| = |\sin\theta_N - \sin\theta| \leq |\theta_N - \theta| \leq \frac{\pi}{N+1} \to 0. \]
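The reconstruction and its \(O(1/N)\) error bound can be checked numerically. The following Python sketch is our own illustration (the name `greedy_sine` is not part of the paper); it assumes the greedy rule selects index \(n\) exactly when the residual \(\theta - \theta_n\) is at least \(\alpha_n\), and it updates \((s_n, c_n)\) only through the angle-addition formulas, never by evaluating \(\sin\) at the accumulated angle:

```python
import math

def greedy_sine(theta, N):
    """Greedy harmonic reconstruction of sin(theta): select alpha_n =
    pi/(n+2) whenever the residual allows, updating (s, c) only through
    the angle-addition formulas."""
    s, c = 0.0, 1.0   # s_0 = sin(0), c_0 = cos(0)
    acc = 0.0         # theta_n, the accumulated angle
    for n in range(N):
        alpha = math.pi / (n + 2)
        if theta - acc >= alpha:                       # greedy selection: delta_n = 1
            sa, ca = math.sin(alpha), math.cos(alpha)
            s, c = s * ca + c * sa, c * ca - s * sa    # addition formulas
            acc += alpha
    return s, acc

s_N, theta_N = greedy_sine(2.0, 1000)
# |s_N - sin(2.0)| is O(1/N) by the convergence bound
```

The returned `theta_N` is the accumulated sub-sum of harmonic angles, so the gap `2.0 - theta_N` directly exhibits the residual controlled by Lemma 3.1.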
4. Selection Monotonicity and Correlation Structure
We now establish the structural properties of the greedy selection indicators \(\delta_n(\theta)\). The key result is that selection is monotone in \(\theta\), which allows us to define threshold angles and compute the correlation kernel exactly.
Base case. \(\theta_0(\theta_1) = 0 = \theta_0(\theta_2)\). ✓
Inductive step. Assume \(\theta_k(\theta_1) \leq \theta_k(\theta_2)\) for all \(k \leq n\). The residual at step \(n\) is \(r_n(\theta) = \theta - \theta_n(\theta)\). We observe that the greedy algorithm is monotone in the following sense: increasing \(\theta\) can only cause additional selections (never remove one). Specifically:
If \(\delta_n(\theta_1) = 1\), then the residual satisfies \(r_n(\theta_1) = \theta_1 - \theta_n(\theta_1) \geq \alpha_n\). The inductive hypothesis gives \(\theta_n(\theta_2) \geq \theta_n(\theta_1)\), and the residuals satisfy the identity \(r_n(\theta_2) = r_n(\theta_1) + (\theta_2 - \theta_1) - (\theta_n(\theta_2) - \theta_n(\theta_1))\). Since the extra selections along the \(\theta_2\) path occur only when the residual exceeds the corresponding \(\alpha_k\), the net effect is \(r_n(\theta_2) \geq \alpha_n\), giving \(\delta_n(\theta_2) = 1\).
If \(\delta_n(\theta_1) = 0\), then \(\delta_n(\theta_2) \in \{0,1\}\), and monotonicity holds trivially.
- \(\theta_0^* = \alpha_0 = \pi/2\) (the first angle \(\pi/2\) is selected iff \(\theta \geq \pi/2\)).
- For general \(n\): \(\theta_n^* = \alpha_n + \theta_n(\theta_n^*)\), where \(\theta_n(\theta_n^*)\) is the accumulated angle from indices \(k < n\) that are selected when \(\theta = \theta_n^*\). That is, \(\theta_n^*\) is determined by the condition that the greedy residual at step \(n\) equals exactly \(\alpha_n\).
- For every \(\theta > 0\), the set of selected indices \(\{n : \delta_n(\theta) = 1\}\) is infinite.
(a) At step \(n=0\), the algorithm selects \(\alpha_0 = \pi/2\) iff the target \(\theta \geq \pi/2\). Since \(\theta_0 = 0\), the condition \(\delta_0(\theta) = 1\) is \(\theta \geq \alpha_0 = \pi/2\). So \(\theta_0^* = \pi/2\).
(b) Index \(n\) is selected when the greedy residual \(r_n(\theta) = \theta - \theta_n(\theta) \geq \alpha_n\). At the threshold \(\theta = \theta_n^*\), we have \(\theta_n^* - \theta_n(\theta_n^*) = \alpha_n\), giving \(\theta_n^* = \alpha_n + \theta_n(\theta_n^*)\).
(c) Since \(\alpha_n = \pi/(n+2) \to 0\), for any \(\theta > 0\) there exist arbitrarily large \(n\) with \(\alpha_n < \theta\). Non-selection leaves the residual unchanged, so whenever the residual is positive it eventually exceeds some \(\alpha_n\) and forces a further selection; hence the set of selected indices is infinite.
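The selection indicators and threshold angles can be explored numerically. The sketch below is our own illustration (the names `delta` and `threshold` are hypothetical); it assumes the greedy rule of Section 2, and the grid scan returns the first grid angle at which an index is selected, as a numerical proxy for \(\theta_n^*\):

```python
import math

def delta(n, theta):
    """Selection indicator delta_n(theta), assuming the greedy rule of
    Section 2: index k is selected when the residual is >= alpha_k."""
    acc = 0.0  # theta_n(theta), the angle accumulated so far
    for k in range(n):
        alpha_k = math.pi / (k + 2)
        if theta - acc >= alpha_k:
            acc += alpha_k
    return 1 if theta - acc >= math.pi / (n + 2) else 0

def threshold(n, grid=100000):
    """Numerical proxy for theta_n^*: the first angle on a uniform grid
    of [0, pi] at which index n is selected."""
    for i in range(grid + 1):
        theta = math.pi * i / grid
        if delta(n, theta):
            return theta
    return math.pi

# e.g. threshold(0) ~ pi/2 and threshold(1) ~ pi/3, up to grid resolution
```

The values `threshold(0)` and `threshold(1)` agree with the explicit computations \(\theta_0^* = \pi/2\) and \(\theta_1^* = \pi/3\) of Section 5, and every computed value is at least \(\alpha_n\), consistent with \(\theta_n^* \geq \alpha_n\).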
On \([0,\theta_n^*)\): \(\delta_n = 0\), so the integrand is \((\theta/\pi)^2\). On \([\theta_n^*,\pi]\): \(\delta_n = 1\), so the integrand is \((1 - \theta/\pi)^2\).
Computing:
\[ K(n,n) = \int_0^{\theta_n^*}\frac{\theta^2}{\pi^2}\,d\theta + \int_{\theta_n^*}^{\pi}\left(1 - \frac{\theta}{\pi}\right)^2 d\theta = \frac{(\theta_n^*)^3}{3\pi^2} + \frac{1}{\pi^2}\int_{\theta_n^*}^{\pi}(\pi - \theta)^2\,d\theta = \frac{(\theta_n^*)^3}{3\pi^2} + \frac{(\pi - \theta_n^*)^3}{3\pi^2}. \]
Note that \(K(n,n) \geq \min(\theta_n^*, \pi-\theta_n^*)^3/(3\pi^2) > 0\) for \(\theta_n^* \in (0,\pi)\). As a function of \(\theta_n^*\), \(K(n,n)\) is minimised at \(\theta_n^* = \pi/2\), yielding \(K(n,n) = \pi/12\), and is maximised (\(K(n,n) = \pi/3\)) at the endpoints \(\theta_n^* \in \{0, \pi\}\).
Assume without loss of generality \(\theta_n^* \leq \theta_m^*\). On \([0,\theta_n^*)\): both \(\delta_n = \delta_m = 0\), so the integrand is \((\theta/\pi)^2\).
On \([\theta_n^*,\theta_m^*)\): \(\delta_n = 1\), \(\delta_m = 0\), so the integrand is \((1-\theta/\pi)(-\theta/\pi) = -\theta(\pi-\theta)/\pi^2\).
On \([\theta_m^*, \pi]\): both \(\delta_n = \delta_m = 1\), so the integrand is \((1-\theta/\pi)^2\).
Each piece is a polynomial integral, yielding the stated expression.
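As a sanity check on the piecewise computation, the closed form can be compared against direct numerical integration. The Python sketch below is our own illustration (hypothetical names); it assumes \(\delta = \Theta(\theta - \text{threshold})\) with given thresholds \(t_n \leq t_m\), and assembles the three polynomial pieces explicitly:

```python
import math

def K_closed(tn, tm):
    """Closed-form K(n,m) assembled from the three pieces, assuming
    tn <= tm are the thresholds and delta = Theta(theta - threshold)."""
    pi = math.pi
    both_zero = tn**3 / (3 * pi**2)                                # [0, tn)
    middle = -(pi * (tm**2 - tn**2) / 2 - (tm**3 - tn**3) / 3) / pi**2  # [tn, tm)
    both_one = (pi - tm)**3 / (3 * pi**2)                          # [tm, pi]
    return both_zero + middle + both_one

def K_numeric(tn, tm, steps=100000):
    """Midpoint-rule evaluation of the defining integral, as a check."""
    pi = math.pi
    h = pi / steps
    total = 0.0
    for i in range(steps):
        th = (i + 0.5) * h
        dn = 1.0 if th >= tn else 0.0
        dm = 1.0 if th >= tm else 0.0
        total += (dn - th / pi) * (dm - th / pi) * h
    return total
```

At `tn == tm` the middle piece vanishes and `K_closed` reduces to the diagonal formula \([(\theta_n^*)^3 + (\pi-\theta_n^*)^3]/(3\pi^2)\), giving \(\pi/12\) at \(t = \pi/2\).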
5. Asymptotic Distribution of Threshold Angles
The threshold angles \(\theta_n^*\) encode the combinatorial structure of the greedy harmonic decomposition. In this section we investigate their distribution, which determines the behaviour of the correlation kernel.
- \(\theta_0^* = \pi/2\) (selecting \(\alpha_0 = \pi/2\) requires \(\theta \geq \pi/2\)).
- \(\theta_1^* = \pi/3\): when \(\delta_0 = 0\) (i.e.\ \(\theta < \pi/2\)), the residual at step 1 is \(\theta\) itself, so \(\alpha_1 = \pi/3\) is first selected at \(\theta = \pi/3\).
- Subsequent thresholds depend on the interaction between earlier selections.
For \(n=1\): if \(\theta < \pi/2\), then \(\delta_0 = 0\) and \(\theta_1(\theta) = 0\), so \(\delta_1(\theta) = \Theta(\theta - \pi/3)\), giving selection for \(\theta \geq \pi/3\). If \(\theta \geq \pi/2\), then \(\delta_0 = 1\) and \(\theta_1(\theta) = \pi/2\), so \(\delta_1(\theta) = \Theta(\theta - \pi/2 - \pi/3) = \Theta(\theta - 5\pi/6)\). By monotonicity, \(\delta_1\) is already 1 for all \(\theta \geq \pi/3\) (since it was selected for \(\theta\) just below \(\pi/2\) and monotonicity ensures it remains selected above \(\pi/2\)). Hence \(\theta_1^* = \pi/3\).
Existence of a limit. Since the thresholds \(\theta_n^*\) lie in the compact interval \([0,\pi]\), the sequence of empirical measures \(\mu_N\) is tight. By Prokhorov's theorem, every subsequence has a weak limit. It remains to show that the limit is unique.
Density argument. For any subinterval \([a,b] \subset (0,\pi)\), the number of indices \(n \leq N\) with \(\theta_n^* \in [a,b]\) equals the number of indices that are selected when \(\theta = b\) but not when \(\theta = a\). By Lemma 3.3:
\[ \#\{n \leq N : \theta_n^* \in [a,b]\} = D(N,b) - D(N,a) = \frac{b-a}{\pi}\ln(N+2) + O(1). \]
Dividing by \(N+1\), the fraction of thresholds in \([a,b]\) is asymptotically \(\frac{b-a}{\pi} \cdot \frac{\ln(N+2)}{N+1}\). Note that all \(N+1\) thresholds lie in \([0,\pi]\) (since \(\theta_n^* \leq \pi\) for every \(n\)), while the number of indices \(n \leq N\) selected at \(\theta = \pi\) is only \(\sim \ln(N+2)\); a fixed interval \([a,b] \subset (0,\pi)\) therefore captures a vanishing fraction of the thresholds.
After normalising \([0,\pi]\) to \([0,1]\), the fraction of thresholds in \([a,b]\) is:
\[ \frac{D(N,b) - D(N,a)}{N+1} = \frac{(b-a)\ln(N+2)/\pi + O(1)}{N+1}. \]
This tends to zero for fixed \([a,b]\), reflecting the fact that for large \(N\) most indices \(n \leq N\) have small \(\alpha_n\) and correspondingly small thresholds. The limiting distribution thus concentrates near \(0\): the threshold \(\theta_n^*\) for large \(n\) tends to be small, since \(\alpha_n \to 0\) and the threshold is at most \(\alpha_n\) plus previously accumulated small angles.
A more refined analysis (see Proposition 5.3) characterises the distribution of \(\theta_n^*\) as a function of \(n\).
- For each \(n\), \(\theta_n^* \geq \alpha_n = \pi/(n+2)\).
- The thresholds are not monotone in \(n\): there exist indices \(n_1 < n_2\) with \(\theta_{n_1}^* < \theta_{n_2}^*\) and indices \(n_3 < n_4\) with \(\theta_{n_3}^* > \theta_{n_4}^*\).
- For "most" large \(n\), the threshold \(\theta_n^*\) is of order \(\alpha_n = \pi/(n+2)\), in the sense that the set \(\{n : \theta_n^* > C\alpha_n\}\) has density tending to zero as \(C \to \infty\).
(a) Since \(\theta_n^* = \alpha_n + \theta_n(\theta_n^*)\) and \(\theta_n(\theta_n^*) \geq 0\), we have \(\theta_n^* \geq \alpha_n\).
(b) We have \(\theta_0^* = \pi/2\) and \(\theta_1^* = \pi/3\), so \(\theta_0^* > \theta_1^*\). Conversely, for large \(n\) with \(\alpha_n\) very small, a threshold can exceed that of a slightly larger index when the accumulated selections at the two threshold points differ. Non-monotonicity thus follows from the interaction between the greedy selections at different steps.
(c) For large \(n\), the threshold \(\theta_n^*\) equals \(\alpha_n\) plus the sum of selected \(\alpha_k\) for \(k < n\) at \(\theta = \theta_n^*\). When \(\theta_n^*\) is small, few earlier angles are selected (only those with \(\theta_k^* \leq \theta_n^*\), which requires \(\alpha_k \leq \theta_n^*\), i.e., \(k \geq \pi/\theta_n^* - 2\)). So the accumulated angle at step \(n\) when \(\theta = \theta_n^*\) is bounded by \(\sum_{k : k < n,\, \alpha_k < \theta_n^*} \alpha_k\), which for \(\theta_n^* \sim \alpha_n\) is a sum over \(k\) near \(n\), contributing \(O(\alpha_n \log(\theta_n^*/\alpha_n))\). This yields the self-consistent equation \(\theta_n^* \approx \alpha_n(1 + o(1))\) for most \(n\).
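The rescaled thresholds \((n+2)\theta_n^*/\pi\) can be examined empirically. The following Python sketch is our own numerical experiment, not part of the paper's proofs; `rescaled_thresholds` is a hypothetical name, and it identifies \(\theta_n^*\) with the first grid angle at which index \(n\) is selected under the greedy rule of Section 2:

```python
import math

def rescaled_thresholds(N, grid=20000):
    """Numerical proxy for (n+2) * theta_n^* / pi for n = 0..N-1:
    scan a uniform theta grid on [0, pi], run the greedy selection, and
    record the first grid angle at which each index is selected."""
    pi = math.pi
    first = [None] * N
    for i in range(grid + 1):
        theta = pi * i / grid
        acc = 0.0
        for n in range(N):
            alpha = pi / (n + 2)
            if theta - acc >= alpha:     # greedy selection of index n
                if first[n] is None:
                    first[n] = theta
                acc += alpha
    return [(n + 2) * t / pi for n, t in enumerate(first) if t is not None]

r = rescaled_thresholds(30)
# property (a) predicts every rescaled value is at least 1
```

On this grid the rescaled values come out close to \(1\), consistent with the bound \(\theta_n^* \geq \alpha_n\) of part (a) and with the clustering around \(1\) discussed in Section 7.2.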
6. Analysis of the Correlation Kernel
We now derive further properties of the correlation kernel \(K(n,m)\) from the structural results of Sections 4 and 5.
- \(K(n,n) \in [\pi/12, \pi/3]\) for all \(n\), with \(K(n,n) = \pi/12\) iff \(\theta_n^* = \pi/2\).
- \(|K(n,m)| \leq \sqrt{K(n,n)\,K(m,m)} \leq \pi/3\) (Cauchy–Schwarz).
- \(K(n,m) \to K(n,n)\) as \(|\theta_n^* - \theta_m^*| \to 0\) with \(\theta_n^*\) and \(\theta_m^*\) both bounded away from \(0\) and \(\pi\), in the sense that the contribution from the middle interval vanishes.
(a) By Lemma 4.6, \(K(n,n) = [(\theta_n^*)^3 + (\pi-\theta_n^*)^3]/(3\pi^2)\). The function \(f(x) = x^3 + (\pi-x)^3\) on \([0,\pi]\) has \(f'(x) = 3x^2 - 3(\pi-x)^2 = 6\pi x - 3\pi^2\), which vanishes at \(x = \pi/2\), giving a minimum \(f(\pi/2) = \pi^3/4\). The maximum is at the endpoints: \(f(0) = f(\pi) = \pi^3\). So \(K(n,n) \in [\pi^3/(12\pi^2), \pi^3/(3\pi^2)] = [\pi/12, \pi/3]\).
(b) Cauchy–Schwarz applied to the inner product \(\int_0^\pi [\delta_n - \theta/\pi][\delta_m - \theta/\pi]\,d\theta\).
(c) When \(\theta_n^* \approx \theta_m^*\), the middle interval \([\theta_n^*, \theta_m^*]\) in Lemma 4.7 has length \(|\theta_m^* - \theta_n^*| \to 0\), so the middle integral vanishes, and \(K(n,m) \to K(n,n)\).
- \(w_n = 1\) for all \(n\): measures the total deviation of the selection count from the linear prediction.
- \(w_n = \alpha_n = \pi/(n+2)\): weights by the angle size, relating to the geometric reconstruction error.
- \(w_n = (n+2)^{-\sigma}\) for some \(\sigma > 0\): a Dirichlet-series weighting that produces a sub-sum of \(\sum (n+2)^{-\sigma}\), connecting the greedy decomposition to Dirichlet series theory. (The analytic properties of such weighted sums, particularly their behaviour in the complex plane, are beyond the scope of this paper.)
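The three weightings can be compared in a single sketch. This is our own illustration (`weighted_selection_sum` is a hypothetical name); it assumes the greedy residual test of Section 2 and simply accumulates \(\sum_{n < N} \delta_n(\theta)\,w_n\) for a supplied weight function:

```python
import math

def weighted_selection_sum(theta, N, weight):
    """Accumulate sum_{n < N} delta_n(theta) * w_n, where delta_n is the
    greedy selection indicator and weight(n) gives w_n."""
    acc, total = 0.0, 0.0
    for n in range(N):
        alpha = math.pi / (n + 2)
        if theta - acc >= alpha:         # delta_n(theta) = 1
            total += weight(n)
            acc += alpha
    return total

theta, N = 2.0, 10000
count  = weighted_selection_sum(theta, N, lambda n: 1.0)              # w_n = 1
angle  = weighted_selection_sum(theta, N, lambda n: math.pi / (n + 2))  # w_n = alpha_n
dirich = weighted_selection_sum(theta, N, lambda n: (n + 2) ** -2.0)  # sigma = 2
```

The \(\alpha_n\)-weighted sum is exactly the accumulated angle \(\theta_N\), so it approaches \(\theta\) at the \(O(1/N)\) rate, while the \(\sigma = 2\) sum is a sub-sum of \(\sum (n+2)^{-2} = \zeta(2) - 1\).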
7. Discussion and Open Questions
7.1. Summary of results
We have established the following results about the greedy harmonic decomposition of angles:
- Exact reconstruction (Theorem 2.4): The greedy algorithm, using harmonic angles \(\alpha_n = \pi/(n+2)\) and the sine/cosine addition formula, reconstructs \(\sin(\theta)\) for any target angle \(\theta \in [0,\pi]\) with convergence rate \(O(1/N)\).
- Selection monotonicity (Lemma 4.1): For each index \(n\), the selection indicator \(\delta_n(\theta)\) is a non-decreasing step function of \(\theta\), jumping from 0 to 1 at a well-defined threshold \(\theta_n^*\).
- Correlation kernel (Lemmas 4.6–4.7): The kernel \(K(n,m)\) is computed in closed form from the threshold angles, and the resulting infinite matrix is positive semi-definite.
- Threshold distribution (Section 5): The thresholds are non-monotone in \(n\), satisfy \(\theta_n^* \geq \pi/(n+2)\), and are of order \(\pi/(n+2)\) for most large \(n\).
7.2. Open questions
Several natural questions arise from this work:
- Exact threshold computation. Is there a closed-form expression (or efficient recursive formula) for \(\theta_n^*\) as a function of \(n\)? The recursive structure of the greedy algorithm makes each threshold depend on all previous thresholds, creating a complex combinatorial dependency. Understanding this recursion more deeply would shed light on the fine structure of the kernel.
- Limiting distribution of rescaled thresholds. Does the sequence \((n+2)\theta_n^*/\pi\) have a limiting distribution as \(n \to \infty\)? Numerical evidence suggests clustering around 1 with non-trivial fluctuations. Characterising this distribution (if it exists) is an interesting problem in combinatorial analysis.
- Spectral properties of the kernel. The positive semi-definite matrix \((K(n,m))_{n,m=0}^{N}\) has a spectrum (set of eigenvalues) that encodes global features of the greedy decomposition. What are the asymptotic eigenvalue distribution and the decay rate of eigenvalues as \(N \to \infty\)?
- Comparison with other decompositions. The greedy harmonic decomposition is one way to represent angles as sub-sums of the harmonic series. How does its correlation structure compare with, e.g., the binary (dyadic) decomposition, or with decompositions using other decreasing sequences \(\alpha_n\)?
- Higher-dimensional generalisations. Can the greedy harmonic decomposition be extended to higher dimensions, e.g., decomposing solid angles on the sphere using a sequence of elementary rotations?
- Connection to number theory. The selection indicators \(\delta_n(\theta)\) define a deterministic sieve on the natural numbers (via the index shift \(n \mapsto n+2\)). The density of selected integers for a given \(\theta\) is logarithmic (Lemma 3.3), reminiscent of the prime counting function. Are there deeper connections between the greedy harmonic sieve and classical sieves in number theory?
7.3. Remark on scope
This paper studies the greedy harmonic decomposition as a self-contained geometric and combinatorial object. While the Dirichlet-series weighting \(w_n = (n+2)^{-s}\) (Remark 6.5) connects the selection indicators to sub-sums of the shifted zeta function \(\sum (n+2)^{-s}\) for \(\operatorname{Re}(s) > 1\), the analytic continuation of such weighted sums into the critical strip \(0 < \operatorname{Re}(s) < 1\) raises substantial additional difficulties. In particular, the series \(\sum \delta_n(\theta)(n+2)^{-s}\) converges conditionally for \(\operatorname{Re}(s) > 0\), but identifying it with \(\zeta(s) - 1\) at \(\theta = \pi\) via analytic continuation requires separate justification that goes beyond the scope of this work. We therefore restrict attention to the real-variable theory of the decomposition and its correlation structure.