slsqp_jax.hessian

L-BFGS history container and Hessian / inverse-Hessian-vector products, VARCHEN-style damping, soft / identity resets and the Lagrangian gradient helpers.

slsqp_jax.hessian

L-BFGS Hessian Approximation for SLSQP.

This module implements the Limited-memory BFGS (L-BFGS) algorithm for maintaining a matrix-free approximation to the Hessian of the Lagrangian.

Instead of storing a dense n x n matrix (O(n^2) memory), L-BFGS stores the last k (s, y) pairs and computes Hessian-vector products in O(kn) time using the compact representation (Byrd, Nocedal, Schnabel 1994):

B = B_0 - [B_0 S, Y] @ M^{-1} @ [S^T B_0; Y^T]

where B_0 = diag(diagonal) is the initial Hessian (per-variable scaling) and M is a small 2k x 2k matrix built from inner products of the stored vectors. During normal operation diagonal = gamma * ones(n) and this reduces to the scalar-scaled form. After an SNOPT-style diagonal reset, diagonal captures per-variable curvature from the discarded history.

VARCHEN-style Powell damping (Lotfi et al., 2020) is applied to each (s, y) pair before storage, damping toward B_0 = diag(diagonal) instead of the full L-BFGS approximation B. This is cheaper (O(n) vs O(k^2 n)), always well-conditioned, and avoids the circular dependency where a badly-conditioned B poisons its own damping.

class slsqp_jax.hessian.LBFGSHistory[source]

Bases: Module

L-BFGS history buffer for matrix-free Hessian approximation.

Stores the last k (s, y) pairs in a circular buffer and provides efficient Hessian-vector products via the compact representation.

Attributes:

s_history: Stored step vectors s_i = x_{i+1} - x_i. y_history: Stored (damped) gradient differences y_i. gamma: Scalar summary of the initial Hessian scaling. diagonal: Per-variable initial Hessian scaling (B_0 = diag(d)).

During normal operation this equals gamma * ones(n). After an SNOPT-style reset it stores per-variable curvature.

count: Number of valid pairs stored (0 to memory size). next_idx: Next write position in the circular buffer. eig_lower: Estimate of lambda_min(H) from VARCHEN Theorem 2. eig_upper: Estimate of lambda_max(H) from VARCHEN Theorem 2.

s_history: Float[Array, 'memory n']
y_history: Float[Array, 'memory n']
gamma: Float[Array, '']
diagonal: Float[Array, 'n']
count: Int[Array, '']
next_idx: Int[Array, '']
eig_lower: Float[Array, '']
eig_upper: Float[Array, '']
__init__(s_history, y_history, gamma, diagonal, count, next_idx, eig_lower, eig_upper)
Parameters:
  • s_history (Float[Array, 'memory n'])

  • y_history (Float[Array, 'memory n'])

  • gamma (Float[Array, ''])

  • diagonal (Float[Array, 'n'])

  • count (Int[Array, ''])

  • next_idx (Int[Array, ''])

  • eig_lower (Float[Array, ''])

  • eig_upper (Float[Array, ''])

Return type:

None

slsqp_jax.hessian.lbfgs_init(n, memory)[source]

Initialize an empty L-BFGS history buffer.

Return type:

LBFGSHistory

Parameters:
Args:

n: Dimension of the parameter space. memory: Maximum number of (s, y) pairs to store (typically 5-20).

Returns:

An initialized LBFGSHistory with no stored pairs and gamma=1.

slsqp_jax.hessian.lbfgs_hvp(history, v)[source]

Compute B @ v using the L-BFGS compact representation.

Uses the compact form with diagonal initial Hessian B_0 = diag(d) (Byrd, Nocedal & Schnabel, 1994, Theorem 2.2):

B = B_0 - [B_0 S, Y] @ M^{-1} @ [S^T B_0; Y^T]

where:

M = [[S^T B_0 S, L], [L^T, -D_sy]] (2k x 2k) L_{ij} = s_i^T y_j for i > j (strictly lower triangular) D_sy = diag(s_i^T y_i) (diagonal)

When no pairs are stored (count=0), this reduces to B = diag(d).

Complexity: O(k^2 n) where k is the number of stored pairs.

Args:

history: L-BFGS history buffer. v: Vector to multiply by the Hessian approximation.

Returns:

B @ v, the Hessian-vector product.

Return type:

Float[Array, 'n']

Parameters:
slsqp_jax.hessian.lbfgs_inverse_hvp(history, v)[source]

Compute H @ v = B^{-1} @ v via the L-BFGS two-loop recursion.

Implements Nocedal & Wright Algorithm 7.4 with diagonal initial scaling H_0 = diag(1/diagonal) instead of scalar (1/gamma) I.

Complexity: O(kn) where k is the number of stored pairs.

Return type:

Float[Array, 'n']

Parameters:
Args:

history: L-BFGS history buffer. v: Vector to multiply by the inverse Hessian approximation.

Returns:

H @ v = B^{-1} @ v, the inverse Hessian-vector product.

slsqp_jax.hessian.lbfgs_should_skip(s, y, skip_threshold=1e-08)[source]

Return the scalar should_skip decision that lbfgs_append uses.

Exposed so callers can observe the skip decision without having to diff fields on the returned history (which becomes meaningless once the circular buffer saturates, since count stays at memory from then on regardless of whether a real append happened).

Return type:

Bool[Array, '']

Parameters:
  • s (Float[Array, 'n'])

  • y (Float[Array, 'n'])

  • skip_threshold (float)

slsqp_jax.hessian.lbfgs_curvature_diagnostics(s, y, skip_threshold=1e-08)[source]

Return (s.y, relative_curvature, skipped) for the curvature pair.

Exposes the same intermediate quantities lbfgs_should_skip uses so the verbose callback can surface them without re-implementing the skip predicate. relative_curvature = |s.y| / (||s|| * ||y||) is the cosine of the angle between s and y and the dominant contributor to the skip decision (a near-zero value indicates the curvature pair is at the floating-point noise floor and gets skipped by the relative_curvature < skip_threshold branch of lbfgs_should_skip).

Return type:

tuple[Float[Array, ''], Float[Array, ''], Bool[Array, '']]

Parameters:
  • s (Float[Array, 'n'])

  • y (Float[Array, 'n'])

  • skip_threshold (float)

Args:

s: Step vector x_{k+1} - x_k. y: Gradient difference ∇L_{k+1} - ∇L_k (Lagrangian gradient

difference for SLSQP).

skip_threshold: Threshold passed to lbfgs_should_skip.

Returns:

(sty, relative_curvature, skipped), where sty = s.y is the raw curvature inner product, relative_curvature is the normalised version used in the skip predicate, and skipped is the same boolean lbfgs_should_skip returns.

slsqp_jax.hessian.lbfgs_append(history, s, y, damping_threshold=0.2, skip_threshold=1e-08, diag_floor=0.0001, diag_ceil=1000000.0)[source]

Append a new (s, y) pair to the L-BFGS history with Powell damping.

Uses VARCHEN-style damping toward B0 (Lotfi et al., 2020, eq 7-8): damping is computed against the diagonal initial Hessian B_0 = diag(diagonal) instead of the full L-BFGS approximation B. This is O(n) instead of O(k^2 n), always well-conditioned (the diagonal is clipped to [diag_floor, diag_ceil]), and avoids the circular dependency where a badly-conditioned B poisons its own damping.

Return type:

LBFGSHistory

Parameters:
The damped gradient difference is:

y_damped = theta * y + (1 - theta) * B0 s

where theta in [0, 1] is chosen to satisfy:

s^T y_damped >= threshold * s^T B0 s

If ||s|| is too small or the curvature ratio is too extreme, the update is skipped entirely to avoid numerical issues. The curvature_ratio bracket is [1e-8, 1e8] (more permissive than the previous [1e-6, 1e6]) so we no longer drop noisy-but- informative pairs on high-dimensional problems; these pairs instead flow through Powell damping. The relative_curvature < 1e-8 skip is now gated on sᵀy > 0 (positive curvature) so negative- curvature pairs are handled by damping rather than dropped.

After appending, the scalar gamma is updated to y_damped^T y_damped / (y_damped^T s) and clipped to [diag_floor, diag_ceil]. The per-variable diagonal is updated using a component-wise secant: d[i] = |y_damped[i] * s[i]| / (s[i]^2 + eps) and clipped to the same bracket. This gives d[i] |H_{ii}| for diagonal Hessians regardless of step direction, unlike the classical Shanno-Phua formula d[i] = y_i^2 / (y^T s) which produces d h_i^2 for multi-component steps.

The old 1e-6 floor on clip_lo allowed the inverse-Hessian diagonal H_0 = 1/d to balloon to 1e6, producing very large CG directions that the line search could not backtrack. Raising the floor to 1e-4 keeps the inverse diagonal bounded by 1e4 and dramatically reduces the “non-descent direction + chronic line- search backtracking” failure mode.

Args:

history: Current L-BFGS history. s: Step vector s = x_{k+1} - x_k. y: Gradient difference y = nabla L_{k+1} - nabla L_k. damping_threshold: Powell damping threshold (default 0.2). skip_threshold: Minimum step norm for update (default 1e-8). diag_floor: Minimum per-variable diagonal entry (default 1e-4). diag_ceil: Maximum per-variable diagonal entry (default 1e6).

Returns:

Updated L-BFGS history with the new pair appended.

slsqp_jax.hessian.lbfgs_estimate_condition(history, damping_threshold=0.2)[source]

Estimate eigenvalue bounds of the inverse Hessian approximation H.

Uses the diagonal B_0 = diag(d) as a proxy for the full L-BFGS condition number. The inverse Hessian eigenvalues are bounded by [1/max(d), 1/min(d)], giving kappa(H) = max(d) / min(d).

This is a practical simplification of the VARCHEN Theorem 2 bounds (Lotfi et al., 2020). The full recursive bounds are theoretically tight but overly pessimistic in practice because they propagate worst-case Lipschitz estimates, leading to false soft-reset triggers on moderately conditioned problems. Since we damp toward B0 (not the full B), the diagonal condition number is the most relevant quantity.

Return type:

tuple[Float[Array, ''], Float[Array, '']]

Parameters:
Returns:

(lambda_min_est, lambda_max_est) bounding the eigenvalues of H.

slsqp_jax.hessian.lbfgs_soft_reset(history)[source]

VARCHEN-style soft reset: keep only the most recent (s, y) pair.

When the estimated condition number of the inverse Hessian exceeds a threshold, this drops all but the newest curvature pair. This is less aggressive than lbfgs_reset() (which extracts the diagonal and drops everything) or lbfgs_identity_reset() (which restores B = I), preserving the most relevant curvature information.

Based on VARCHEN Algorithm 1, Step 7 (Lotfi et al., 2020).

Return type:

LBFGSHistory

Parameters:

history (LBFGSHistory)

slsqp_jax.hessian.lbfgs_compute_diagonal(history)[source]

Extract diag(B_k) from the L-BFGS compact representation.

From the compact form B = B_0 - W M^{-1} W^T, the diagonal is

diag(B_k) = diagonal - diag(W M^{-1} W^T)

where W = [B_0 S, Y] is (n, 2k) and M is the 2k x 2k inner matrix. The correction term is computed in O(k^2 n) by forming Q = W M^{-1} and summing (Q * W) row-wise.

This is used by lbfgs_reset() to implement the SNOPT diagonal reset strategy (Gill, Murray & Saunders, 2005, Section 3.3).

Return type:

Float[Array, 'n']

Parameters:

history (LBFGSHistory)

slsqp_jax.hessian.lbfgs_reset(history, diag_floor=0.0001, diag_ceil=1000000.0)[source]

SNOPT-style diagonal reset of the L-BFGS history.

Extracts diag(B_k) from the current approximation, discards all stored (s, y) pairs, and restarts with B_0 = diag(diag(B_k)). This preserves per-variable curvature information across the reset, preventing the “everything is flat” effect that occurs when the scalar gamma becomes very small.

The extracted diagonal is clipped to [diag_floor, diag_ceil] to keep the reset point aligned with the clipping used inside lbfgs_append(); previous versions used a hardcoded [1e-2, 1e6] that diverged from the append-path floor.

Based on the SNOPT limited-memory reset strategy (Gill, Murray & Saunders, SIAM Review, 47(1), 2005, Section 3.3).

Return type:

LBFGSHistory

Parameters:
slsqp_jax.hessian.lbfgs_identity_reset(history)[source]

Hard reset of L-BFGS history to the identity Hessian.

Unlike lbfgs_reset() which preserves per-variable curvature, this discards everything and restarts with B_0 = I. Used as an escalation when repeated SNOPT-style diagonal resets fail to break an ill-conditioning cycle (e.g. consecutive QP failures where the extracted diagonal perpetuates the same problematic scaling).

Return type:

LBFGSHistory

Parameters:

history (LBFGSHistory)

slsqp_jax.hessian.compute_partial_lagrangian_gradient(grad_f, eq_jac, multipliers_eq, gen_jac, multipliers_gen)[source]

Compute the partial Lagrangian gradient (without the bound block).

Returns nabla f(x) - J_eq(x)^T lambda - J_gen(x)^T mu_gen, where J_gen is the Jacobian of the general (nonlinear) inequality constraints — i.e. the inequality block excluding the constant identity-style rows that come from box bounds.

This is the core helper used by both compute_lagrangian_gradient (which adds the bound contribution on top) and the post-line-search NLP-level bound-multiplier recovery in solver.py: by reading off the partial Lagrangian gradient at indices that are at a bound at x_{k+1}, the recovery picks the bound multiplier that exactly zeros the corresponding component of the full Lagrangian gradient.

Return type:

Float[Array, 'n']

Parameters:
  • grad_f (Float[Array, 'n'])

  • eq_jac (Float[Array, 'm_eq n'])

  • multipliers_eq (Float[Array, 'm_eq'])

  • gen_jac (Float[Array, 'm_gen n'])

  • multipliers_gen (Float[Array, 'm_gen'])

Args:

grad_f: Gradient of the objective function nabla f(x). eq_jac: Jacobian of equality constraints (m_eq x n). multipliers_eq: Lagrange multipliers for equality constraints. gen_jac: Jacobian of general inequality constraints

(m_gen x n). Must exclude bound rows.

multipliers_gen: Lagrange multipliers for general inequality

constraints.

Returns:

Partial Lagrangian gradient with no bound contribution.

slsqp_jax.hessian.compute_lagrangian_gradient(grad_f, eq_jac, ineq_jac, multipliers_eq, multipliers_ineq)[source]

Compute the gradient of the Lagrangian function.

Return type:

Float[Array, 'n']

Parameters:
  • grad_f (Float[Array, 'n'])

  • eq_jac (Float[Array, 'm_eq n'])

  • ineq_jac (Float[Array, 'm_ineq n'])

  • multipliers_eq (Float[Array, 'm_eq'])

  • multipliers_ineq (Float[Array, 'm_ineq'])

The Lagrangian is:

L(x, lambda, mu) = f(x) - lambda^T c_eq(x) - mu^T c_ineq(x)

Its gradient with respect to x is:

nabla_x L = nabla f(x) - J_eq^T lambda - J_ineq^T mu

ineq_jac and multipliers_ineq must include the bound block when bounds are present (the ineq layout used by solver.py is [general; lower_bound; upper_bound]). Internally this is built on top of compute_partial_lagrangian_gradient() so the bound-multiplier recovery in the outer solver shares the same code path for the non-bound contribution.

Args:

grad_f: Gradient of objective function nabla f(x). eq_jac: Jacobian of equality constraints (m_eq x n). ineq_jac: Jacobian of inequality constraints (m_ineq x n),

including bound rows.

multipliers_eq: Lagrange multipliers for equality constraints. multipliers_ineq: Lagrange multipliers for inequality

constraints, including bound multipliers.

Returns:

Gradient of Lagrangian nabla_x L.

slsqp_jax.hessian.estimate_hessian_diagonal(hvp_fn, n, key, n_probes=20)[source]

Estimate the diagonal of a matrix given only its HVP.

Uses the Bekas-Kokiopoulou-Saad (2007) stochastic estimator: for Rademacher random vectors z (entries i.i.d. from {-1, +1}), E[z * (H z)] = diag(H). Averaging over n_probes samples gives an unbiased estimate with variance O(||off-diag||^2 / k).

The estimate is accurate when the matrix is diagonally dominant, which is common for Hessians of smooth functions in the natural coordinate basis.

Return type:

Float[Array, 'n']

Parameters:
Args:

hvp_fn: Function v -> H @ v (Hessian-vector product). n: Dimension of the matrix. key: JAX PRNG key for generating random probes. n_probes: Number of Rademacher probes (default 20). More

probes reduce variance at the cost of more HVP calls. Each probe costs one forward-over-reverse AD pass.

Returns:

Estimated diagonal of H, shape (n,).