<?xml version='1.0' encoding='UTF-8'?>
<rss xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/" version="2.0">
  <channel>
    <title>Mathematical Optimization Papers</title>
    <link>https://koki-kazaore.github.io/mathematical-optimization-papers-rss/feed.xml</link>
    <description>Latest papers on mathematical optimization from arXiv and Semantic Scholar</description>
    <atom:link href="https://koki-kazaore.github.io/mathematical-optimization-papers-rss/feed.xml" rel="self"/>
    <docs>http://www.rssboard.org/rss-specification</docs>
    <generator>python-feedgen</generator>
    <language>en</language>
    <lastBuildDate>Wed, 06 May 2026 08:25:17 +0000</lastBuildDate>
    <item>
      <title>Optimal Risk-Averse Design of Green Hydrogen Projects in Brazil: A Stochastic Optimization Approach</title>
      <link>https://www.semanticscholar.org/paper/028e098db1db4bbe0da161d24eb9ac1f4474e44a</link>
      <description>No abstract available.</description>
      <guid isPermaLink="false">doi:10.17771/pucrio.acad.69614</guid>
      <pubDate>Tue, 14 Apr 2026 14:49:22 +0000</pubDate>
    </item>
    <item>
      <title>Deterministic and stochastic optimization for solving large size inverse problems in image processing</title>
      <link>https://www.semanticscholar.org/paper/0073b5f7c09801aa9740f8f17aad511e26e56c08</link>
      <description>This thesis addresses the problem of canonical polyadic decompositions of potentially large tensors of order N under various constraints (non-negativity, sparsity related to a possible overestimation of the tensor rank). To handle this problem, we propose three different new iterative approaches: two deterministic approaches, including a proximal one, and a stochastic approach. The first approach extends the thesis work of J-P. Royer to the case of tensors of dimension N. In the stochastic approach, we consider, for the first time in the field of tensor decompositions, (memetic) genetic algorithms, whose general principle rests on the evolution of a population of candidates. In the last type of approach, we consider a preconditioned proximal algorithm (the Block-Coordinate Variable Metric Forward-Backward), which operates on blocks of data with a preconditioning matrix associated with each block and is built on two main successive steps: a gradient step and a proximal step. Finally, the proposed methods are compared with each other and with other classical algorithms from the literature, on synthetic data (both random and close to data observed in fluorescence spectroscopy) and on real experimental data from a river-water monitoring campaign aimed at detecting the appearance of pollutants.</description>
      <guid isPermaLink="false">doi:10.70675/3459c071zc780z47ffz86b0z9386f680ab57</guid>
      <pubDate>Tue, 14 Apr 2026 14:49:22 +0000</pubDate>
    </item>
    <item>
      <title>A traveling salesman problem with drone stations and speed-optimized drones</title>
      <link>https://www.semanticscholar.org/paper/00809c1a4526f981d3e462ac7a331e736f90bc83</link>
      <description>With e-commerce expanding rapidly, last-mile delivery challenges have been exacerbated, necessitating innovative logistics to reduce operational costs and improve delivery speed. This paper investigates a traveling salesman problem with drone stations, where a truck collaborates with multiple drones docked at candidate drone stations to serve customers. In contrast to existing studies that typically assume fixed drone speeds, this work treats drone speeds as decision variables and introduces a comprehensive energy consumption model that accounts for all phases of drone flight. The objective is to jointly optimize truck routing, station selection, drone-customer assignment, and drone speed to minimize the total delivery cost. Through a speed-discretization method, we formulate the problem as a mixed-integer linear programming model and develop a tailored adaptive large neighborhood search (ALNS) algorithm. Computational experiments indicate that for large-sized instances with 80-100 customers and 16-20 candidate stations, ALNS produces solutions within 50 seconds, with average optimality gaps below 1.8% compared to Gurobi’s solutions obtained under a 5000-second time limit. The results also show that the speed optimization strategy consistently outperforms fixed-speed approaches across multiple performance metrics, including total cost, service completion time, energy consumption, and service coverage.</description>
      <guid isPermaLink="false">doi:10.1016/j.cie.2026.111941</guid>
      <pubDate>Tue, 14 Apr 2026 14:49:22 +0000</pubDate>
    </item>
    <item>
      <title>Tax-Efficient Retirement Withdrawal Planning Using a Linear Programming Model</title>
      <link>https://www.semanticscholar.org/paper/007454fec21427f100d7d1113d65e91e6fb1e899</link>
      <description>No abstract available.</description>
      <guid isPermaLink="false">title:taxefficient retirement withdrawal planning using a linear programming model|unknown|2026</guid>
      <pubDate>Tue, 14 Apr 2026 14:49:22 +0000</pubDate>
    </item>
    <item>
      <title>Novel computational workflow for selecting virtual patient cohorts for in silico clinical trials</title>
      <link>https://www.semanticscholar.org/paper/00739fd8ffff803d31892d327abc62029f9cfac2</link>
      <description>Objectives: In silico clinical trials are a well-accepted model-informed drug development (MIDD) tool to understand, optimize and predict the effect of drugs across diverse populations [1,2]. Generation and selection of virtual patients for simulating in silico trials remains an open area of research, not least due to the challenges of selecting virtual patients that are representative of the underlying physiological and clinical characteristics of individual subjects enrolled into clinical trials [3]. While virtual patients can be obtained by sampling sets of parameters from a calibrated non-linear mixed effects model, these may generate biomarkers outside the expected range of observations, thus impacting the accuracy of in silico clinical trial predictions. To obtain a more representative virtual population, different algorithms can be applied to tailor plausible patient cohorts and yield calibrated virtual patients. In this work, we present a new virtual patient selection workflow using a minimal quantitative systems pharmacology (QSP) model of chronic hepatitis B virus (HBV) infection. Methods: A QSP model of HBV disease progression incorporating the effect of standard-of-care therapies (peg-interferon and nucleos(t)ide analogues) was used to generate plausible patient cohorts corresponding to the Everest trial [4] (a real-world clinical study exploring how interferon therapy can achieve functional cure in chronically infected patients). To tailor plausible patients to a virtual cohort mimicking baseline characteristics of patients enrolled into the Everest project, we developed a computational algorithm based on a mixed-integer linear programming framework coupled to a multi-objective optimization genetic algorithm. Subsequently, the virtual patient cohort obtained with this algorithm was used to perform in silico clinical trials mirroring the Everest protocol to validate the workflow. Non-linear relationships between baseline characteristics, prognostic biomarkers, and dosing parameters with clinical endpoints were explored using in silico trials. Results: We show that: (1) the HBV QSP model captures longitudinal biomarker data from untreated patients as well as chronically infected subjects receiving standard-of-care therapies; (2) plausible patients generated with the model capture HBV disease progression; (3) the novel computational workflow efficiently selects virtual patients matching HBsAg and viral load baseline Everest data distributions; (4) in silico trials that mirrored the Everest trial showed quantitative agreement with clinical endpoints, thus validating the workflow; (5) in silico trials were leveraged to identify and compare optimal dosing regimens and explore mechanistic pathways in responders and non-responders to interferon. Conclusions: This work proposes and validates a novel QSP-based computational workflow for performing in silico trials, generating mechanistic hypotheses, and identifying optimal dosing regimens. COI: DW, AN, AS, RD are employees of and hold shares in GSK. JCR is a joint postdoc with WJJ (University of Buffalo) and GSK. Citations: [1] Madabushi et al., Pharm Res, 2022; [2] Rieger et al., Prog Biophys Mol Biol, 2018; [3] Arsène et al., NY: Springer US, 2023; [4] Xie et al., Int Liver Meeting, 2022.</description>
      <guid isPermaLink="false">doi:10.70534/kzql1879</guid>
      <pubDate>Tue, 14 Apr 2026 14:49:22 +0000</pubDate>
    </item>
    <item>
      <title>Optimization of Investments in Cybersecurity: A Linear Programming Approach</title>
      <link>https://www.semanticscholar.org/paper/005368d90a35e2d353ab0d0b1196f2b54e9f955f</link>
      <description>No abstract available.</description>
      <guid isPermaLink="false">title:optimization of investments in cybersecurity a linear optimization of investment|swati jain|2026</guid>
      <pubDate>Tue, 14 Apr 2026 14:49:22 +0000</pubDate>
    </item>
    <item>
      <title>Global</title>
      <link>https://www.semanticscholar.org/paper/004c2cbb72b8e42cf48478bdca721a1985d3d5de</link>
      <description>For modeling imprecise and indeterminate data in multi-objective decision making, two different methods have very recently been proposed in the literature: neutrosophic multi-objective linear/non-linear programming and neutrosophic goal programming. In many economic problems, the well-known probabilistic or fuzzy solution procedures are not suitable because they cannot handle situations in which indeterminacy is inherently involved in the problem. In this case we propose a new concept for optimization problems under uncertainty and indeterminacy. It is an extension of fuzzy and intuitionistic fuzzy optimization in which the degrees of indeterminacy and falsity (rejection) of objectives and constraints are considered simultaneously, together with the degrees of truth membership (satisfaction/acceptance). The drawbacks of the existing neutrosophic optimization models are presented, and a new framework for multi-objective optimization in a neutrosophic environment is proposed. The essence of the proposed approach is that it is capable of dealing with indeterminacy and falsity simultaneously.</description>
      <guid isPermaLink="false">title:g lobal|surapati pramanik|2026</guid>
      <pubDate>Tue, 14 Apr 2026 14:49:22 +0000</pubDate>
    </item>
    <item>
      <title>Quantum-Inspired Hamiltonian Descent for Mixed-Integer Quadratic Programming</title>
      <link>https://www.semanticscholar.org/paper/019a75f1df24f379a909ed29aab610ac7c584038</link>
      <description>No abstract available.</description>
      <guid isPermaLink="false">title:quantuminspired hamiltonian descent for mixedinteger quadratic programming|shreya chaudhary|2026</guid>
      <pubDate>Tue, 14 Apr 2026 14:57:01 +0000</pubDate>
    </item>
    <item>
      <title>lpviz: Interactive Linear Programming Visualization</title>
      <link>http://arxiv.org/abs/2604.27518v1</link>
      <description>This paper presents lpviz, a browser-based visualization tool for linear programming. lpviz is deeply interactive, offering an intuitive interface where users can directly draw and edit the feasible region and objective vector, without requiring cumbersome manipulation of raw numerical coefficients. lpviz lets users compare the behavior of several classes of linear programming algorithms, namely Simplex, Interior-Point, Primal-Dual Hybrid Gradient, and Central Path. In the 3D mode, lpviz places iterates at heights corresponding to important solver metadata such as complementarity gap or KKT residual, helping users gain further insight into algorithm behavior beyond the primal iterates alone. lpviz has been used in both research and classroom settings, to help develop intuition for the strengths and weaknesses of different solvers and the impact of solver settings on convergence behavior. lpviz is open-source, permissively licensed, and freely available on any device with a web browser at https://lpviz.net.</description>
      <guid isPermaLink="false">arxiv:2604.27518</guid>
      <pubDate>Fri, 01 May 2026 08:10:31 +0000</pubDate>
    </item>
    <item>
      <title>Robust Constrained Optimization via Sliding Mode Control</title>
      <link>http://arxiv.org/abs/2604.27587v1</link>
      <description>This paper develops a sliding-mode-control-based framework for equality-constrained optimization by reformulating the first-order Karush-Kuhn-Tucker conditions as a control-affine dynamical system. The optimization variables are treated as states and the Lagrange multipliers as the control input, with the equality constraints defining the sliding manifold. The resulting design guarantees exact constraint enforcement with finite-time convergence, independent of objective convexity, and exhibits robustness to matched disturbances, structural uncertainty, and bounded measurement noise. To accelerate convergence, a nonsingular terminal sliding-mode-based normed gradient flow is introduced, ensuring both finite-time convergence to the optimal solution and constraint satisfaction. Rigorous Lyapunov analysis establishes closed-loop stability and convergence. Numerical studies across diverse benchmark problems demonstrate superior accuracy and robustness over classical continuous-time optimization methods, highlighting effectiveness under disturbances.</description>
      <guid isPermaLink="false">arxiv:2604.27587</guid>
      <pubDate>Fri, 01 May 2026 08:10:31 +0000</pubDate>
    </item>
    <item>
      <title>A Systematic Review of Recent Advancements in PINN Augmented Deep Learning and Mathematical Modeling for Efficient Portfolio Management</title>
      <link>http://arxiv.org/abs/2604.27610v1</link>
      <description>In finance, portfolio management is a traditional yet difficult problem that has drawn attention from practitioners and researchers for many years, and difficult technological problems remain to be solved. Portfolio management supports the selection of portfolios in volatile markets. The goal of this review is to present physics-informed neural networks, which provide a novel approach to directly incorporating physics and finance principles into a neural network's learning process: they ensure that forecasts are not only precise but also in line with established financial regulations and processes. Furthermore, this article provides an overview of the current state of research in portfolio optimization supported by mathematical models, deep learning models, and physics-informed neural networks, and discusses the advantages and disadvantages of various deep learning and mathematical modelling approaches. Researchers and business professionals alike should find the material useful for advancing the field of investment management and trying out new portfolio management strategies. Finally, a few challenging issues and potential future directions are discussed, encouraging readers to consider fresh ideas in this field of study.</description>
      <guid isPermaLink="false">arxiv:2604.27610</guid>
      <pubDate>Fri, 01 May 2026 08:10:31 +0000</pubDate>
    </item>
    <item>
      <title>Exact formulations for rectangular-warehouse single-picker routing with scattered storage in single-block and two-block layouts</title>
      <link>http://arxiv.org/abs/2604.27622v1</link>
      <description>Order picking travel dominates much of warehouse effort, and exact routing is especially valuable when storage is scattered so pick locations are not fixed in advance. We address the single picker routing problem (SPRP) and its scattered-storage variant (SPRP-SS) in single-block and two-block rectangular warehouses. We propose two mixed-integer linear programming formulations that exploit structural properties of optimal tours to simplify connectivity modelling and remove redundant edge configurations: a Configuration Connectivity model tailored to single-block layouts and an Edge Connectivity model that extends to two-block layouts. In extensive computational experiments on large randomly generated benchmark sets for single-block and two-block rectangular layouts, we compare these formulations against established MILP and network-flow baselines for SPRP and SPRP-SS and report computational gains tied to the structural restrictions. The results support using compact, solver-based exact routing models in industrial settings where dynamic programming is cumbersome to integrate, particularly for SPRP-SS and for routing subproblems embedded in larger planning or warehouse-design optimizations.</description>
      <guid isPermaLink="false">arxiv:2604.27622</guid>
      <pubDate>Fri, 01 May 2026 08:10:31 +0000</pubDate>
    </item>
    <item>
      <title>Robust Geometric Control of Catenary Robots under Unstructured Force Uncertainties</title>
      <link>http://arxiv.org/abs/2604.27705v1</link>
      <description>This paper considers the robust control of a catenary robot composed of two quadrotors connected by an inextensible cable. The system is modeled on \(SE(3)\), with the cable treated as a geometric subsystem induced by the UAV configuration rather than as an independent dynamical element. The catenary shape determines configuration-dependent forces that couple the translational dynamics of the vehicles. We propose a geometric tracking controller for the relative configuration of the agents and analyze its robustness with respect to unstructured uncertainties in the catenary-induced forces. The main theoretical result establishes local input-to-state stability of the closed-loop tracking errors. In particular, we obtain asymptotic convergence in the nominal case and an explicit ultimate bound for the tracking errors under bounded catenary-force perturbations.</description>
      <guid isPermaLink="false">arxiv:2604.27705</guid>
      <pubDate>Fri, 01 May 2026 08:10:31 +0000</pubDate>
    </item>
    <item>
      <title>Gårding Polynomials</title>
      <link>http://arxiv.org/abs/2604.27755v1</link>
      <description>We introduce Gårding polynomials, a class of real multivariate polynomials defined via positivity regions invariant under translation by positive directions and closed under strictly positive affine transformations. We establish a structural theorem providing two complementary characterizations of this class: one via reduction to the multi-affine case through polarization, and another via a recursive condition involving partial derivatives. The class of Gårding polynomials strictly extends that of real stable polynomials while retaining many of their structural properties. In particular, multi-affine Gårding polynomials with nonnegative coefficients satisfy the Rayleigh property, and their positive univariate specializations yield ultra log-concave coefficient sequences. Moreover, the Gårding property for several matroid generating functions is preserved under natural matroid operations. As applications, we obtain new negative dependence results for generating functions associated with various classes of matroids and graphs--many of which lie beyond the reach of real stability or Lorentzian methods--as well as for characteristic polynomials of certain matrix classes.</description>
      <guid isPermaLink="false">arxiv:2604.27755</guid>
      <pubDate>Fri, 01 May 2026 08:10:31 +0000</pubDate>
    </item>
    <item>
      <title>Data-Driven Continuous-Time Linear Quadratic Regulator via Closed-Loop and Reinforcement Learning Parameterizations</title>
      <link>http://arxiv.org/abs/2604.27922v1</link>
      <description>This paper studies data-driven approaches to the continuous-time linear quadratic regulator (LQR) problem based on two existing parameterizations, namely a closed-loop (CL) parameterization from behavioral system theory and an integral reinforcement learning (IRL) parameterization. The CL parameterization characterizes the closed-loop system via a matrix that satisfies equality constraints. While this parameterization has been extensively studied for discrete-time systems, we adapt key results to the continuous-time setting and develop a policy iteration (PI) scheme, derive a data-driven continuous-time algebraic Riccati equation (CARE), and introduce an alternative convex problem formulation. The IRL parameterization utilizes off-policy data to perform policy evaluation, which is then used for PI or value iteration. Within the IRL framework, we derive a policy gradient flow and propose convex reformulations of the LQR problem. Finally, we provide a unified treatment of these parameterizations that enables a systematic understanding of existing approaches and clarifies their structural relationships.</description>
      <guid isPermaLink="false">arxiv:2604.27922</guid>
      <pubDate>Fri, 01 May 2026 08:10:31 +0000</pubDate>
    </item>
    <item>
      <title>Frank-Wolfe Beyond 1/t Convergence</title>
      <link>http://arxiv.org/abs/2604.28006v2</link>
      <description>We consider smooth convex minimization over compact convex sets, i.e., $\min_{x \in C} f(x)$ with the (vanilla) Frank-Wolfe algorithm. Well-known lower bounds establish a worst-case $Ω(1/t)$ primal-gap barrier in the general smooth convex case, and faster convergence usually requires favorable function properties such as Hölder error bounds or strong convexity. We present a new Local Dual Sharpness (LDS) condition, essentially a property of the feasible region and its LMO, under which the Frank-Wolfe algorithm converges in $o(1/t)$ for any smooth convex function, ruling out an $Ω(1/t)$ lower bound under LDS. The condition is a generalization (and localization) of uniform convexity of sets and it is satisfied by any uniformly convex set. To our knowledge, this is the first unconditional $o(1/t)$ convergence result for uniformly convex sets. Combining LDS with stronger function properties, e.g., a local variant of Hölder error bounds, allows us to quantify the actual rates.</description>
      <guid isPermaLink="false">arxiv:2604.28006</guid>
      <pubDate>Fri, 01 May 2026 08:10:31 +0000</pubDate>
    </item>
    <item>
      <title>A Scaled Gradient Modified Non-monotone Line Search Method for Constrained Optimization Problems</title>
      <link>http://arxiv.org/abs/2604.28110v1</link>
      <description>In this paper, we propose a scaled gradient modified non-monotone line search method for solving constrained minimization problems and explore several of its properties, in particular its convergence analysis. We establish the linear convergence rate of the sequence generated by the proposed algorithm to a solution of the constrained minimization problem when the objective function is strongly quasiconvex. We consider numerical examples of large-scale fractional programming and quadratic programming with pseudoconvex and strongly quasiconvex objective functions, and compare the performance of the proposed algorithm with existing methods on these examples.</description>
      <guid isPermaLink="false">arxiv:2604.28110</guid>
      <pubDate>Fri, 01 May 2026 08:10:31 +0000</pubDate>
    </item>
    <item>
      <title>Global Optimality for Constrained Exploration via Penalty Regularization</title>
      <link>http://arxiv.org/abs/2604.28144v1</link>
      <description>Efficient exploration is a central problem in reinforcement learning and is often formalized as maximizing the entropy of the state-action occupancy measure. While unconstrained maximum-entropy exploration is relatively well understood, real-world exploration is often constrained by safety, resource, or imitation requirements. This constrained setting is particularly challenging because entropy maximization lacks additive structure, rendering Bellman-equation-based methods inapplicable. Moreover, scalable approaches require policy parameterization, inducing non-convexity in both the objective and the constraints. To our knowledge, the only prior model-free policy-gradient approach for this setting under general policy parameterization is due to Ying et al. (2025). Unfortunately, their guarantees are limited to weak regret and ergodic averages, which do not imply that the final output is a single deployable policy that is near-optimal and nearly feasible. In this work we take a different approach to this problem, and propose the Policy Gradient Penalty (PGP) method, a single-loop policy-space method that enforces general convex occupancy-measure constraints via quadratic-penalty regularization. PGP constructs pseudo-rewards that yield gradient estimates of the penalized objective, subsequently exploiting the classical Policy Gradient Theorem. We further establish the regularity of the penalized objective, providing the smoothness properties needed to justify the convergence of PGP. Leveraging hidden convexity and strong duality, we then establish global last-iterate convergence guarantees, attaining an $ε$-optimal constrained entropy value with $ε$-bounded constraint violation despite policy-induced non-convexity. We validate PGP through ablations on a grid-world benchmark and further demonstrate scalability on two challenging continuous-control tasks.</description>
      <guid isPermaLink="false">arxiv:2604.28144</guid>
      <pubDate>Fri, 01 May 2026 08:10:31 +0000</pubDate>
    </item>
    <item>
      <title>A Novel Computational Framework for Causal Inference: Tree-Based Discretization with ILP-Based Matching</title>
      <link>https://www.semanticscholar.org/paper/00c9db6ce1fe01eae4bff4fcd7934702395f1137</link>
      <description>Causal inference is essential for data-driven decision-making, as it aims to uncover causal relationships from observational data. However, identifying causality remains challenging due to the potential for confounding and the distinction between correlation and causation. While recent advances in causal machine learning and matching algorithms have improved estimation accuracy, these methods often face trade-offs between interpretability and computational efficiency. This paper proposes a novel approach that combines a tree-based discretization technique, tailored for causal inference, with an integer linear programming-based matching algorithm. The discretization ensures approximately linear relationships for control datasets within strata, enabling effective matching, while the optimization framework optimizes for global balance. The resulting algorithm yields computational efficiency and less biased ATT estimates compared to state-of-the-art algorithms. Empirical evaluations demonstrate the proposed method's practical advantages over existing techniques in causal inference scenarios.</description>
      <guid isPermaLink="false">arxiv:2604.27307</guid>
      <pubDate>Sun, 03 May 2026 07:59:48 +0000</pubDate>
    </item>
    <item>
      <title>An Adaptive Variable Neighborhood Search for a Family of Set Covering Routing Problems with an Application in Disaster Relief Operations</title>
      <link>http://arxiv.org/abs/2605.00131v2</link>
      <description>This paper studies a variant of the Set Covering Routing Problem (SCRP) motivated by post-disaster humanitarian logistics. We consider a hybrid distribution concept in which the majority of transportation is performed by helicopters, while ground transport is limited to the last mile, addressing severe accessibility constraints in disaster-affected regions. The resulting problem integrates landing site location, routing, and covering decisions, incorporating features of the Multi-Vehicle Covering Tour Problem (m-CTP) and the Vehicle Routing with Demand Allocation Problem (VRDAP) in a facility-capacitated, multi-depot setting. Due to the computational complexity of the problem, we develop an Adaptive Variable Neighborhood Search (AVNS) that combines established routing operators with novel mechanisms for covering decisions. The performance of the proposed approach is evaluated on benchmark instances for the related m-CTP and VRDAP problems, demonstrating competitive solution quality compared to problem-specific state-of-the-art approaches. Furthermore, we apply our AVNS to a real-world case study based on the 2024 flash floods in Afghanistan. The results highlight the practical relevance of the proposed framework and provide managerial insights into effective distribution strategies for disaster response operations.</description>
      <guid isPermaLink="false">arxiv:2605.00131</guid>
      <pubDate>Mon, 04 May 2026 08:28:14 +0000</pubDate>
    </item>
    <item>
      <title>Wasserstein Distributionally Robust Regret Optimization for Reinforcement Learning from Human Feedback</title>
      <link>http://arxiv.org/abs/2605.00155v1</link>
      <description>Reinforcement learning from human feedback (RLHF) has become a core post-training step for aligning large language models, yet the reward signal used in RLHF is only a learned proxy for true human utility. From an operations research perspective, this creates a decision problem under objective misspecification: the policy is optimized against an estimated reward, while deployment performance is determined by an unobserved objective. The resulting gap leads to reward over-optimization, or Goodharting, where proxy reward continues to improve even after true quality deteriorates. Existing mitigations address this problem through uncertainty penalties, pessimistic rewards, or conservative constraints, but they can be computationally burdensome and overly pessimistic. We propose Wasserstein distributionally robust regret optimization (DRRO) for RLHF. Instead of pessimizing worst-case value as in standard DRO, DRRO pessimizes worst-case regret relative to the best policy under the same plausible reward perturbation. We study the promptwise problem through a simplex allocation model and show that, under an $\ell_1$ ambiguity set, the inner worst-case regret admits an exact solution and the optimal policy has a water-filling structure. These results lead to a practical policy-gradient algorithm with a simple sampled-bonus interpretation and only minor changes to PPO/GRPO-style RLHF training. The framework also clarifies theoretically why DRRO is less pessimistic than DRO, and our experiments show that DRRO mitigates over-optimization more effectively than existing baselines while standard DRO is systematically over-pessimistic.</description>
      <guid isPermaLink="false">arxiv:2605.00155</guid>
      <pubDate>Mon, 04 May 2026 08:28:14 +0000</pubDate>
    </item>
    <item>
      <title>Moral Hazard in LTI Dynamics: A Hypothesis Testing Approach</title>
      <link>http://arxiv.org/abs/2605.00158v1</link>
      <description>Many incentive design problems must contend with information asymmetries due to non-observation of efficiency (adverse selection) or non-observation of effort (moral hazard). And although a growing body of literature considers incentive design in control systems, the problem of designing incentives for control systems under information asymmetries has been less well-studied. This paper considers a model of moral hazard within control systems. In our model, the control system is described by an (affine) linear time-invariant (LTI) system with process noise. There is an agent who gets to choose (from between two choices) a linear state-feedback controller to apply to the LTI system, with one of the state-feedback controllers having a higher quadratic cost on the control inputs than the other. Our goal is to design a payment scheme that incentivizes the agent to choose the state-feedback controller that minimizes a quadratic cost on system states plus the time-discounted payment amount, subject to the understanding that the agent bears the control cost while being risk-averse with respect to their time-discounted payment. We formulate the problem as a constrained optimization, and prove that for a payment given after a fixed (but optimizable) time horizon the optimal payment scheme chooses the payment amount using a likelihood ratio hypothesis test. We numerically demonstrate our results by applying the derived optimal payment scheme to two examples: load frequency control (LFC) in power systems and wellness interventions for body weight loss.</description>
      <guid isPermaLink="false">arxiv:2605.00158</guid>
      <pubDate>Mon, 04 May 2026 08:28:14 +0000</pubDate>
    </item>
    <item>
      <title>Approximations and Learning for Decentralized Stochastic Control and Near Optimal Finite Window Policies</title>
      <link>http://arxiv.org/abs/2605.00160v1</link>
      <description>Decentralized stochastic control problems are difficult to study due to information-structure-dependent subtleties, which prevent many classical methods in stochastic control from being applicable. In this paper we consider such problems with general standard Borel spaces under two related information structures: (a) the one-step delayed information sharing pattern (OSDISP), where agents share their information with a one-step delay, and (b) the $K$-step periodic information sharing pattern (KSPISP), where information is shared periodically. It is known that OSDISP and KSPISP problems admit a centralized reduction where the agents view the problem from the perspective of a centralized controller that uses the common information to prescribe function-valued actions (local policies) which map each agent's private information to an optimal action in the original problem. We provide rigorous approximation results and performance bounds for the KSPISP and OSDISP problems, which result from replacing the full common information by a finite sliding window of information, and we establish near optimality of such policies. The latter depends on a predictor stability condition in expected total variation. As a further contribution, we show that under the information structures provided, corresponding Q-learning algorithms (in quantized or finite memory forms) converge asymptotically to near optimal solutions. Whereas only restrictive and hypothetical conditions have been presented in the literature, our contribution is thus to provide, to our knowledge, the first explicit conditions and rigorous approximation and learning results for such decentralized problems with general spaces.</description>
      <guid isPermaLink="false">arxiv:2605.00160</guid>
      <pubDate>Mon, 04 May 2026 08:28:14 +0000</pubDate>
    </item>
    <item>
      <title>Structure-Preserving Optimal Control of Maxwell's Equations with Applications to Source Cloaking</title>
      <link>http://arxiv.org/abs/2605.00212v1</link>
      <description>We develop a structure-preserving solution framework for the optimal control of the time-dependent Maxwell's equations. Building on a well-posedness theory for a weak form of the forward problem, we first analyze a forward solver that couples Nédélec and Raviart--Thomas finite elements with Crank--Nicolson time stepping. The solver preserves the de Rham structure, enforces a discrete Gauss law, exactly satisfies a per-time-step energy balance, and converges to the weak solution under low regularity assumptions on the problem data, which are dictated by the optimal control setting. To control the Maxwell system, we add the curl of a space-time current density as a source to Ampère's law. The curl form yields charge conservation without auxiliary constraints. We prove the well-posedness and continuity of the control-to-state map, derive the adjoint system and a gradient representation for a tracking-type objective functional, and formulate a discrete optimization scheme that inherits structure preservation from the forward solver. Our discrete stationarity conditions are consistent with their continuous counterparts, and the discrete optimal controls converge, with mesh and time refinements, to the continuous optima. We demonstrate the merits of our optimal control formulation and the theoretical developments by numerically solving a series of source-cloaking model problems.</description>
      <guid isPermaLink="false">arxiv:2605.00212</guid>
      <pubDate>Mon, 04 May 2026 08:28:14 +0000</pubDate>
    </item>
    <item>
      <title>A unified perspective on fine-tuning and sampling with diffusion and flow models</title>
      <link>http://arxiv.org/abs/2605.00229v1</link>
      <description>We study the problem of training diffusion and flow generative models to sample from target distributions defined by an exponential tilting of a base density; a formulation that subsumes both sampling from unnormalized densities and reward fine-tuning of pre-trained models. This problem can be approached from a stochastic optimal control (SOC) perspective, using adjoint-based or score matching methods, or from a non-equilibrium thermodynamics perspective. We provide a unified framework encompassing these approaches and make three main contributions: (i) bias-variance decompositions revealing that Adjoint Matching/Sampling and Novel Score Matching have finite gradient variance, while Target and Conditional Score Matching do not; (ii) norm bounds on the lean adjoint ODE that theoretically support the effectiveness of adjoint-based methods; and (iii) adaptations of the CMCD and NETS loss functions, along with novel Crooks and Jarzynski identities, to the exponential tilting setting. We validate our analysis with reward fine-tuning experiments on Stable Diffusion 1.5 and 3.</description>
      <guid isPermaLink="false">arxiv:2605.00229</guid>
      <pubDate>Mon, 04 May 2026 08:28:14 +0000</pubDate>
    </item>
    <item>
      <title>High-Probability Convergence in Decentralized Stochastic Optimization with Gradient Tracking</title>
      <link>http://arxiv.org/abs/2605.00281v1</link>
      <description>We study high-probability (HP) convergence guarantees in decentralized stochastic optimization, where multiple agents collaborate to jointly train a model over a network. Existing HP results in decentralized settings almost exclusively focus on the Decentralized Stochastic Gradient Descent ($\mathtt{DSGD}$) algorithm, which requires strong assumptions, such as bounded data heterogeneity, or strong convexity of each agent's cost. This is contrary to the mean-squared error (MSE) results, where methods incorporating bias-correction techniques are known to converge under relaxed assumptions and achieve better practical performance. In this paper we provide the first step toward bridging the gap, by studying HP convergence of $\mathtt{DSGD}$ incorporating the gradient tracking technique, in the presence of noise satisfying a relaxed sub-Gaussian condition. We show that the resulting method, dubbed $\mathtt{GT-DSGD}$, achieves order-optimal HP convergence rates for both non-convex and Polyak-Łojasiewicz costs, of order $\mathcal{O}\Big(\frac{\log(1/δ)}{\sqrt{nT}}\Big)$ and $\mathcal{O}\Big(\frac{\log(1/δ)}{nT}\Big)$, respectively, where $n$ is the number of agents, $T$ is the time horizon and $δ\in (0,1)$ is the confidence parameter. Our results establish that $\mathtt{GT-DSGD}$ converges in the HP sense under the same conditions on the cost as in the MSE sense, while achieving comparable transient times. To the best of our knowledge, these are the first HP guarantees for decentralized optimization methods incorporating bias-correction. Numerical experiments on real and synthetic data verify our theoretical findings, underlining the superior performance of $\mathtt{GT-DSGD}$ and highlighting that the benefits of incorporating bias-correction are also maintained in the HP sense.</description>
      <guid isPermaLink="false">arxiv:2605.00281</guid>
      <pubDate>Mon, 04 May 2026 08:28:14 +0000</pubDate>
    </item>
    <item>
      <title>Data Deletion Can Help in Adaptive RL</title>
      <link>http://arxiv.org/abs/2605.00298v1</link>
      <description>Deploying reinforcement learning policies in the real world requires adapting to time-varying environments. We study this problem in the contextual Markov Decision Process (cMDP) framework, where a family of environments is indexed by a low-dimensional context unknown at test time. The standard approach decomposes the problem: train a so-called "universal policy" which assumes knowledge of the true context, then pair it with a context estimator which approximates context using the observed trajectory. We identify a simple, counterintuitive trick that substantially improves the estimator: randomly delete a fraction of the training buffer after each round. This works because data is collected across multiple rounds using progressively better policies, and older trajectories come from a different distribution than what the estimator will face at deployment time; random deletion creates an implicit exponential decay on older data while preserving diversity without requiring any explicit identification of which samples are stale. This reduces the robustness gap by 30% for MLPs and by 6% on average for recurrent networks. Strikingly, it allows a narrow MLP with 5x fewer parameters to outperform a wide MLP trained without deletion. To understand when and why deletion helps, we analyze regularized empirical risk minimization with a mismatch between the train distribution and the distribution at deployment; in this idealized setting, we prove that removing a single uniformly random training point decreases expected test loss in expectation under mild conditions. For ridge regression we make this quantitative: deletion helps when the regularization coefficient is moderate and the signal-to-noise ratio (SNR) is sufficiently low, and, crucially, this SNR threshold gives a direct measure of how large the distribution mismatch between training and deployment must be for deletion to be beneficial.</description>
      <guid isPermaLink="false">arxiv:2605.00298</guid>
      <pubDate>Mon, 04 May 2026 08:28:14 +0000</pubDate>
    </item>
    <item>
      <title>A Unified Regularity Condition for Optimal Control: Bridging LICQ, MFCQ, and Subdifferentials</title>
      <link>http://arxiv.org/abs/2605.00311v1</link>
      <description>This paper presents a unified derivation of transversality conditions in optimal control problems using exact penalty functions. The key regularity condition is that the origin is uniformly separated from the subdifferential of the penalty function in a neighborhood of the admissible set. This condition, hereafter referred to as the Unified Separation Condition (USC), generalizes the classical Mangasarian-Fromovitz condition for inequalities and linear independence of gradients for equalities; in the smooth case, these classical conditions are equivalent to USC, as shown via Gordan's theorem. The USC remains applicable even when constraint functions are nondifferentiable, where classical constraint qualifications are not defined. Assuming exactness, we derive transversality conditions for all major cases: fixed and free terminal time, equality and inequality constraints, moving manifolds, and free left endpoint. Remarkably, this approach yields these classical results in a concise and transparent manner, avoiding the need for constructing cones of endpoint variations or applying separation theorems. The theoretical results are complemented by a numerical implementation applied to the time-optimal control of a harmonic oscillator. The numerical implementation converges to the exact solution obtained via Pontryagin's maximum principle combined with transversality conditions, confirming the consistency and practical applicability of the proposed methodology.</description>
      <guid isPermaLink="false">arxiv:2605.00311</guid>
      <pubDate>Mon, 04 May 2026 08:28:14 +0000</pubDate>
    </item>
    <item>
      <title>Introduction to Mathematical Programming with Equilibrium Constraints (MPECs) and Bilevel Optimization</title>
      <link>http://arxiv.org/abs/2605.00386v1</link>
      <description>Our aim is to explain mathematical programs with equilibrium constraints (MPECs), motivate them through applications, present the main equivalent formulations of equilibrium constraints, and summarize the basic existence theory for optimal solutions. The central message is that an MPEC is an optimization problem whose feasible set is partly defined by another optimization, variational inequality, complementarity system, or equilibrium model.</description>
      <guid isPermaLink="false">arxiv:2605.00386</guid>
      <pubDate>Mon, 04 May 2026 08:28:14 +0000</pubDate>
    </item>
    <item>
      <title>Introduction to Exact Penalization for Mathematical Programming with Equilibrium Constraints</title>
      <link>http://arxiv.org/abs/2605.00387v1</link>
      <description>We present a focused introduction to exact penalty methods for nonlinear programs and mathematical programs with equilibrium constraints (MPECs), emphasizing their connection to modern error bound theory. The goal is twofold. First, we explain how classical optimality conditions can be interpreted through exact penalization, and why such results typically rely on constraint regularity conditions that can be understood as error bounds on perturbations of feasible sets. We then highlight how recent developments based on subanalytic geometry and Łojasiewicz-type inequalities extend this framework beyond classical regularity assumptions, enabling exact penalization under broader analytic conditions. Second, we demonstrate how this theory can be applied in practice to MPECs by reformulating them via KKT systems and constructing exact penalty functions based on residual mappings. Particular attention is given to fractional-order penalties arising from Łojasiewicz error bounds, as well as to improved formulations for special problem classes where sharper exponents can be obtained. These developments provide both theoretical insight and practical guidance for analyzing and solving challenging constrained optimization problems.</description>
      <guid isPermaLink="false">arxiv:2605.00387</guid>
      <pubDate>Mon, 04 May 2026 08:28:14 +0000</pubDate>
    </item>
    <item>
      <title>First-Order Optimality Conditions for Mathematical Programming with Equilibrium Constraints</title>
      <link>http://arxiv.org/abs/2605.00388v1</link>
      <description>We present a systematic introduction to first-order optimality conditions for mathematical programs with equilibrium constraints (MPECs), emphasizing the limitations of classical nonlinear programming techniques. The goal is twofold. First, we explain why a direct application of standard optimality conditions -- based on reformulating MPECs via KKT systems or differentiable exact penalty functions -- is often inadequate, as such approaches typically require strong and restrictive assumptions, including nondegeneracy and smoothness conditions.   Second, we develop a first-principles framework for analyzing MPECs by focusing on the geometric structure of the feasible region. In particular, we study stationarity concepts and provide a detailed characterization of the tangent cone at feasible points, which leads to appropriate constraint qualifications tailored to MPECs. These results form the foundation for rigorous first-order analysis and clarify the relationship between the original MPEC formulation and its KKT-based representation, offering practical guidance for handling these inherently challenging optimization problems.</description>
      <guid isPermaLink="false">arxiv:2605.00388</guid>
      <pubDate>Mon, 04 May 2026 08:28:14 +0000</pubDate>
    </item>
    <item>
      <title>Controlling the Swarm: Sparse Actuation and Collision Avoidance under Stochastic Delay</title>
      <link>http://arxiv.org/abs/2605.00395v1</link>
      <description>Classical flocking models demonstrate how local interactions generate emergent order, but real-world multi-agent deployments are bound by severe constraints: limited actuator availability, heterogeneous communication latencies, and environmental noise. In this talk, we present a unified finite-N framework that tackles the interplay of these exact mechanisms. We study a delayed stochastic leader-follower particle system featuring topological communication, singular repulsion, and bounded sparse leader actuation. A central challenge in such systems is mathematical well-posedness, as discontinuous communication laws and singular repulsions clash with standard strong Itô frameworks. We resolve this by introducing an augmented Lyapunov functional that simultaneously enforces a strict collision barrier and closes a uniform Grönwall estimate. Building on this rigorous foundation, we formulate a free-terminal-time, chance-constrained optimal control problem. We show that temporally sparse, bang-off-bang leader actuation not only drastically reduces control effort compared to continuous baselines, but also reveals non-monotone sensitivities to leader density. Ultimately, we demonstrate that in delayed stochastic swarms, adding more direct actuation is not strictly optimal -- highlighting a highly non-trivial resource allocation paradox in cooperative control.</description>
      <guid isPermaLink="false">arxiv:2605.00395</guid>
      <pubDate>Mon, 04 May 2026 08:28:14 +0000</pubDate>
    </item>
    <item>
      <title>Riemannian Optimization over Symmetric Positive Definite Matrices with the Alpha-Procrustes Geometry</title>
      <link>http://arxiv.org/abs/2605.00396v1</link>
      <description>In Riemannian optimization, it is well known that the condition number of the Riemannian Hessian at an optimum strongly influences the asymptotic convergence behavior of optimization algorithms. On the manifold of symmetric positive definite (SPD) matrices, several commonly used metrics for optimization, such as the Affine-Invariant (AI) and Bures--Wasserstein (BW) metrics, tend to become ill-conditioned as the underlying SPD matrix becomes ill-conditioned. As a result, even when the Euclidean Hessian remains uniformly well-conditioned on the SPD manifold, optimization may still become difficult near an optimum associated with an ill-conditioned SPD matrix. In this paper, we address this issue through the Alpha-Procrustes (AP) geometry on the SPD manifold. This geometry generalizes several well-known metrics, including the Log-Euclidean (LE) metric for \(α=0\) and the BW metric for \(α=1/2\). We first show that, when \(α=1\), all eigenvalues of the Riemannian metric operator induced by the AP geometry are uniformly bounded independently of the underlying SPD matrix. Therefore, under the assumption that the Euclidean Hessian satisfies the uniform spectral bounds, all the eigenvalues of the corresponding Riemannian Hessian are uniformly bounded independently of the underlying SPD matrix. Consequently, the case \(α=1\) provides a robust geometric framework for several Riemannian optimization problems involving ill-conditioned SPD matrices. Finally, we validate our theoretical findings through extensive numerical experiments across a range of applications.</description>
      <guid isPermaLink="false">arxiv:2605.00396</guid>
      <pubDate>Mon, 04 May 2026 08:28:14 +0000</pubDate>
    </item>
    <item>
      <title>Near-optimal and Efficient First-Order Algorithm for Multi-Task Learning with Shared Linear Representation</title>
      <link>http://arxiv.org/abs/2605.00473v1</link>
      <description>Multi-task learning (MTL) has emerged as a pivotal paradigm in machine learning by leveraging shared structures across multiple related tasks. Despite its empirical success, the development of efficiently solvable likelihood-based algorithms--even for shared linear representations--remains limited, primarily due to the non-convex structure intrinsic to matrix factorization. This paper introduces a first-order algorithm that jointly learns a shared representation and task-specific parameters, with guaranteed efficiency. Notably, it converges in $\widetilde{\mathcal{O}}(1)$ iterations and attains a \emph{near-optimal} estimation error of $\widetilde{\mathcal{O}}(dk/(TN))$, \emph{improving} over existing likelihood-based methods by a factor of $k$, where $d$, $k$, $T$, $N$ denote input dimension, representation dimension, task count, and samples per task, respectively. Our results demonstrate that likelihood-based first-order methods can efficiently solve the MTL problem.</description>
      <guid isPermaLink="false">arxiv:2605.00473</guid>
      <pubDate>Mon, 04 May 2026 08:28:14 +0000</pubDate>
    </item>
    <item>
      <title>Linking PageRank, Time Reversal, and Policy Evaluation</title>
      <link>http://arxiv.org/abs/2605.00532v1</link>
      <description>We establish a connection between policy evaluation in Markov decision processes and PageRank in network analysis. For a fixed policy, we show that the value function of a discounted Markov decision process can be obtained, up to an explicit rescaling, from the PageRank vector of a suitably defined time-reversed Markov chain. In this correspondence, the discount factor plays the role of the teleportation parameter, while rewards induce the restart distribution. Beyond the irreducible case, invoking quasi-stationary distributions and Doob $h$-transforms, we prove a general decomposition theorem showing that policy evaluation for arbitrary finite MDPs reduces to a collection of PageRank problems on the recurrent and transient components of the policy-induced Markov chain. This framework naturally extends to undiscounted MDPs with terminal states and to transition-dependent rewards. We conclude by showing efficiency of our approach on a numerical example of a sticky random walk on large deterministic and random graphs.</description>
      <guid isPermaLink="false">arxiv:2605.00532</guid>
      <pubDate>Mon, 04 May 2026 08:28:14 +0000</pubDate>
    </item>
    <item>
      <title>Instance-Aware Parameter Configuration in Bilevel Late Acceptance Hill Climbing for the Electric Capacitated Vehicle Routing Problem</title>
      <link>http://arxiv.org/abs/2605.00572v1</link>
      <description>Algorithm performance in combinatorial optimization is highly sensitive to parameter settings, while a single globally tuned configuration often fails to exploit the heterogeneity of instances. This limitation is particularly evident in the Electric Capacitated Vehicle Routing Problem, where instances differ in structure, demand patterns, and energy constraints. This paper investigates instance-aware parameter configuration for Bilevel Late Acceptance Hill Climbing, a state-of-the-art metaheuristic for the Electric Capacitated Vehicle Routing Problem. An offline tuning procedure is used to obtain instance-specific parameter labels, which are then mapped from instance features via a regression model to enable parameter prediction for unseen instances prior to execution. Experimental results on the IEEE WCCI 2020 benchmark and its extensions show that the proposed approach achieves an average objective value reduction of $0.28\%$ across eight held-out test instances relative to a globally tuned configuration. This corresponds to a significant cost reduction in multimillion-dollar transportation operations.</description>
      <guid isPermaLink="false">arxiv:2605.00572</guid>
      <pubDate>Mon, 04 May 2026 08:28:14 +0000</pubDate>
    </item>
    <item>
      <title>Gradient Regularized Newton Boosting Trees with Global Convergence</title>
      <link>http://arxiv.org/abs/2605.00581v1</link>
      <description>Gradient Boosting Decision Trees (GBDTs) dominate tabular machine learning, with modern implementations like XGBoost, LightGBM, and CatBoost being based on Newton boosting: a second-order descent step in the space of decision trees. Despite its empirical success, the global convergence of Newton boosting is poorly understood compared to first-order boosting. In this paper, we introduce Restricted Newton Descent, a framework for convex optimization with Newton's method on Hilbert spaces with inexact iterates, based on the concepts of cosine angle and weak gradient edge. Within this framework, we recover Newton boosting with GBDTs and classical finite-dimensional theory as special cases. We first prove that vanilla Newton boosting achieves a linear rate of convergence for smooth, strongly convex losses that satisfy a Hessian-dominance condition. To handle general convex losses with Lipschitz Hessians, we extend a recent gradient regularized Newton scheme to the restricted weak learner setting. This scheme minimally modifies the classical algorithm by introducing an adaptive $\ell_2$-regularization term proportional to the square root of the gradient norm at each iteration. We establish a $\mathcal{O}(\frac{1}{k^2})$ rate for this scheme, thereby obtaining a globally convergent second-order GBDT algorithm with a rate matching that of first-order boosting with Nesterov momentum. In numerical experiments, we show that our scheme converges while vanilla Newton boosting may diverge.</description>
      <guid isPermaLink="false">arxiv:2605.00581</guid>
      <pubDate>Mon, 04 May 2026 08:28:14 +0000</pubDate>
    </item>
    <item>
      <title>Learning-Based Stackelberg Equilibrium Seeking with Application to Demand-Side Energy Management</title>
      <link>http://arxiv.org/abs/2605.00588v1</link>
      <description>Demand-side management (DSM) enables distribution system operators (DSOs) to steer electricity consumption through dynamic price signals or incentive mechanisms, thereby leveraging end-users' flexibility potential for delivering grid services. The resulting hierarchical interaction between the DSO and the end-users can be formulated as a Stackelberg game, where the operator dynamically sets the prices and the end-users optimally respond to them. Efficiently designing these price signals is challenging, as the users' response models are unknown or difficult to estimate. In this paper, we propose a learning-based zeroth-order algorithm for incentive design, in which the iterative update of the incentive signals is efficiently assisted by a data-driven online estimation of the users' responses. The proposed method is then proven to converge to an equilibrium tariff while allowing the DSO to estimate the decision-making problems at the user level. Moreover, the method preserves users' privacy, as the update rule of the DSO is solely based on observations of communicated end-user actions. Numerical simulations employing real-world data illustrate that our proposed learning-based method converges efficiently while significantly reducing the number of required interactions between the DSO and the end-users relative to the state-of-the-art approach.</description>
      <guid isPermaLink="false">arxiv:2605.00588</guid>
      <pubDate>Mon, 04 May 2026 08:28:14 +0000</pubDate>
    </item>
    <item>
      <title>On the Distribution of Unweighted Minimum Knapsack Instances with Large SOS Rank</title>
      <link>http://arxiv.org/abs/2605.00594v1</link>
      <description>We analyze the sum-of-squares rank of unweighted instances of the Minimum Knapsack (MK) problem, i.e., minimization of $\sum_{i=1}^n x_i$ for 0/1 variables under the constraint $\sum_{i=1}^n x_i \geq q$, with $q \in \mathbb{R}$. Such instances have long served as a testbed for understanding the limitations of lift-and-project methods in Boolean optimization. For example, both the Lovász-Schrijver and Sherali-Adams hierarchies require (maximal) rank $n$ to solve them, already when $q=1/2$ is constant. The SOS hierarchy requires only \emph{sublinear} rank $O(\sqrt{n})$ to solve unweighted MK when $q=1/2$. On the other hand, when $q$ is allowed to vary with $n$, the SOS rank of the problem may become linear. Interestingly, this is known to happen both when $q$ is large, and when $q$ is very small ($0&lt;q \leq 2^{-n}$). This raises the question of whether we should think of hard instances of unweighted MK as being typical for the SOS hierarchy, or as a consequence of very specific choices of the threshold parameter $q$. In this paper, we address this question by showing new upper and lower bounds on the SOS rank of unweighted MK in the whole regime of the parameter $q$. For $n-q \leq O(1)$, we show that the SOS rank is constant. In contrast, when $q \leq O(1)$, a linear rank is needed if $q$ is exponentially close to an integer. As our main positive result, we show that linear rank is very rare for $q \leq O(1)$. This can be expressed in the language of smoothed analysis: after perturbing $q$ by a Gaussian with mean $0$ and variance $σ^2$, the expected SOS rank of MK is $O(\sqrt{n} \log (n/σ))$.</description>
      <guid isPermaLink="false">arxiv:2605.00594</guid>
      <pubDate>Mon, 04 May 2026 08:28:14 +0000</pubDate>
    </item>
    <item>
      <title>Reinforcement Learning with Markov Risk Measures and Multipattern Risk Approximation</title>
      <link>http://arxiv.org/abs/2605.00654v1</link>
      <description>For a risk-averse finite-horizon Markov Decision Problem, we introduce a special class of Markov coherent risk measures, called mini-batch measures. We also define the class of multipattern risk-averse problems that generalizes the class of linear systems. We use both concepts in a feature-based $Q$-learning method with multipattern $Q$-factor approximation and we prove a high-probability regret bound of $\mathcal{O}\big(H^2 N^H \sqrt{ K}\big)$, where $H$ is the horizon, $N$ is the mini-batch size, and $K$ is the number of episodes. We also propose an economical version of the $Q$-learning method that streamlines the policy evaluation (backward) step. The theoretical results are illustrated on a stochastic assignment problem and a short-horizon multi-armed bandit problem.</description>
      <guid isPermaLink="false">arxiv:2605.00654</guid>
      <pubDate>Mon, 04 May 2026 08:28:14 +0000</pubDate>
    </item>
    <item>
      <title>Optimal Merton's Problem under Multivariate Affine Volterra Models with Jumps</title>
      <link>http://arxiv.org/abs/2605.00688v1</link>
      <description>This paper is concerned with portfolio selection for an investor with exponential, power, and logarithmic utility in multi-asset financial markets allowing jumps. We investigate the classical Merton's portfolio optimization problem in a Volterra stochastic environment described by a multivariate Volterra--Heston model with jumps driven by an independent Poisson random measure. Owing to the non-Markovian and non-semimartingale nature of the model, classical stochastic control techniques are not directly applicable. Instead, the problem is tackled using the martingale optimality principle by constructing a family of supermartingale processes characterized via solutions to an original Riccati backward stochastic differential equation with jumps (Riccati BSDEJ). The resulting optimal strategies for Merton's problems are derived in semi-closed form depending on the solutions to time-dependent multivariate Riccati-Volterra equations, while the optimal value is expressed using the solution to this original Riccati BSDEJ. Numerical experiments on a two-dimensional rough Heston model illustrate the impact of both path roughness and jump components on the value function and optimal strategies in the Merton problem.</description>
      <guid isPermaLink="false">arxiv:2605.00688</guid>
      <pubDate>Mon, 04 May 2026 08:28:14 +0000</pubDate>
    </item>
    <item>
      <title>Unstable free boundary problems in optimal control theory: existence and regularity</title>
      <link>http://arxiv.org/abs/2605.00694v1</link>
      <description>We establish the first general regularity result for constrained optimal control problems arising naturally in mathematical physics and mathematical biology. Namely, we prove that for a large class of problems of the form "maximise $\int ψ(Θ_m)-c\int m$ where $-ΔΘ_m=mΘ_m+B(x,Θ_m)$, under the constraint $0\leq m\leq 1$ a.e.", the solution $m^*$ is bang-bang, in the sense that $m^*=χ_{E^*}$, and that $\partial E^*$ is smooth up to a $(d-2)$-dimensional subset. Moreover, we prove that the solutions to the volume-constrained problem "maximise $\int ψ(Θ_m)$ where $-ΔΘ_m=mΘ_m+B(x,Θ_m)$, under the constraint $0\leq m\leq 1$ a.e. and $\int m=m_0$" are bang-bang in the sense that $m^*=χ_{E^*}$ and that, in the two-dimensional case, $\partial E^*$ is a finite union of smooth curves. This is done via reduction to an unstable free boundary problem, the regularity analysis of which was pioneered by Monneau &amp; Weiss and Chanillo, Kenig &amp; To. In our case, the free boundary is not minimising, and the Laplacian of the state function is sign-changing, which creates significant difficulties, in particular regarding the non-degeneracy of blow-ups. This requires a new approach blending tools from optimal control, free boundary, and measure theory to establish the regularity of the free boundary.</description>
      <guid isPermaLink="false">arxiv:2605.00694</guid>
      <pubDate>Mon, 04 May 2026 08:28:14 +0000</pubDate>
    </item>
    <item>
      <title>A Line-search-free Method for Adaptive Decentralized Optimization</title>
      <link>http://arxiv.org/abs/2605.00711v1</link>
      <description>We study decentralized optimization over networks where agents cooperatively minimize a smooth (strongly) convex sum of local losses while communicating only with immediate neighbors. Prevailing decentralized methods require either centralized knowledge of global problem and network parameters for stepsize tuning, which is impractical, or costly per-iteration line searches that demand access to local function values. We propose line-search-free, fully decentralized algorithms in which each agent adapts its stepsize using only past local iterates and gradients, with no extra function evaluations and no global tuning. The key technical ingredient is a new Lyapunov function, from which a natural adaptive stepsize rule emerges: at each iteration, each agent selects the largest stepsize that guarantees descent, based solely on a local curvature estimate built from successive gradients. The proposed algorithms enjoy strong theoretical guarantees: sublinear convergence rates for merely convex objectives and linear rates under strong convexity. Numerical experiments on standard benchmarks show consistent improvements over the state of the art, both adaptive and non-adaptive.</description>
      <guid isPermaLink="false">arxiv:2605.00711</guid>
      <pubDate>Mon, 04 May 2026 08:28:14 +0000</pubDate>
    </item>
    <item>
      <title>Sion's minimax theorem and the proximal point algorithm in Hadamard spaces</title>
      <link>http://arxiv.org/abs/2605.00728v1</link>
      <description>We obtain Sion's minimax theorem in Hadamard spaces and discuss its applications. Among other things, we study several fundamental properties of resolvents of saddle functions in Hadamard spaces. An application to the proximal point algorithm for minimax problems in Hadamard spaces is also included.</description>
      <guid isPermaLink="false">arxiv:2605.00728</guid>
      <pubDate>Mon, 04 May 2026 08:28:14 +0000</pubDate>
    </item>
    <item>
      <title>Randomized Subspace Nesterov Accelerated Gradient</title>
      <link>http://arxiv.org/abs/2605.00740v1</link>
      <description>Randomized-subspace methods reduce the cost of first-order optimization by using only low-dimensional projected-gradient information, a feature that is attractive in forward-mode automatic differentiation and communication-limited settings. While Nesterov acceleration is well understood for full-gradient and coordinate-based methods, obtaining accelerated methods for general subspace sketches that use only projected-gradient information and can improve over full-dimensional Nesterov acceleration in oracle complexity is technically nontrivial.   We develop randomized-subspace Nesterov accelerated gradient methods for smooth convex and smooth strongly convex optimization under matrix smoothness and generic sketch moment assumptions. The key technical ingredient is a three-sequence formulation tailored to matrix smoothness, which recovers the corresponding classical Nesterov methods in the full-dimensional case. The resulting theory establishes accelerated oracle-complexity guarantees and makes explicit how matrix smoothness and the sketch distribution enter the complexity. It also provides a unified basis for comparing sketch families and identifying when randomized-subspace acceleration improves over full-dimensional Nesterov acceleration in oracle complexity.</description>
      <guid isPermaLink="false">arxiv:2605.00740</guid>
      <pubDate>Mon, 04 May 2026 08:28:14 +0000</pubDate>
    </item>
    <item>
      <title>Complex Equation Learner: Rational Symbolic Regression with Gradient Descent in Complex Domain</title>
      <link>http://arxiv.org/abs/2605.03841v1</link>
      <description>Symbolic regression aims to discover interpretable equations from data, yet modern gradient-based methods fail for operators that introduce singularities or domain constraints, including division, logarithms, and square roots. As a result, Equation Learner-type models typically avoid these operators or impose restrictions, e.g. constraining denominators to prevent poles, which narrows the hypothesis class. We propose a complex weight extension of the Equation Learner that mitigates real-valued optimization pathologies by allowing optimization trajectories to bypass real-axis degeneracies. The proposed approach converges stably even when the target expression has real-domain poles, and it enables unconstrained use of operations such as logarithm and square root. We validate the method on symbolic regression benchmarks and show it can recover singular behavior from experimental frequency response data.</description>
      <guid isPermaLink="false">arxiv:2605.03841</guid>
      <pubDate>Wed, 06 May 2026 08:25:17 +0000</pubDate>
    </item>
    <item>
      <title>Information Accessibility Limits in Structured NP Search</title>
      <link>http://arxiv.org/abs/2605.00953v2</link>
      <description>We study the problem of locating violating principal minors in structured matrix families that lie near the boundary of P-matrices and admit sparse violations under perturbation. Viewing violation search as an information acquisition problem, we show that, despite strong underlying structure, the location of a violation may be globally encoded and not accessible through local queries under a restricted interaction model.   This leads to an information-theoretic bottleneck: each query reveals only vanishing information about the violating subset, so that polynomially many queries accumulate insufficient information to identify it. Using mutual information and Fano's inequality, we show that any algorithm restricted to polynomially many queries cannot recover the violating subset with constant success probability.   Our analysis highlights a distinction between structure and accessibility: even highly structured problems can be computationally intractable when the information required to locate a solution is not accessible through the available queries.</description>
      <guid isPermaLink="false">arxiv:2605.00953</guid>
      <pubDate>Wed, 06 May 2026 08:25:17 +0000</pubDate>
    </item>
    <item>
      <title>Stackelberg-Nash controllability for a multi-objective Stefan problem</title>
      <link>http://arxiv.org/abs/2605.00999v1</link>
      <description>We investigate a hierarchical control problem for a one-dimensional Stefan system with localized distributed controls. The setting combines a Stackelberg strategy with a Nash equilibrium among multiple followers, yielding a multi-objective free-boundary problem. The interaction between the hierarchical control and the moving interface results in a nonlinear optimality system, and we show that the original problem reduces to the null controllability of this optimality system. Under suitable geometric conditions on the control regions, we establish a local null controllability result. The proof relies on an observability inequality for a linearized system, obtained through Carleman estimates adapted to the presence of a moving boundary. These results constitute, to the best of our knowledge, the first treatment of a Stefan system within a Stackelberg-Nash framework.</description>
      <guid isPermaLink="false">arxiv:2605.00999</guid>
      <pubDate>Wed, 06 May 2026 08:25:17 +0000</pubDate>
    </item>
    <item>
      <title>Value Functions for Temporal Logic: Optimal Policies and Safety Filters</title>
      <link>http://arxiv.org/abs/2605.01051v1</link>
      <description>While Bellman equations for basic reach, avoid, and reach-avoid problems are well studied, the relationship between value optimality and policy optimality becomes subtle in the undiscounted infinite-horizon setting, particularly for more complicated tasks. Greedily maximizing the Q-function can produce policies that indefinitely defer task completion for reach-avoid problems, or equivalently, Until specifications, even when the value function is optimal. Building upon recent results decomposing the value function for temporal logic (TL) into a graph of constituent value functions, we construct non-Markovian policies based on state history that avoid this pathology and prove their optimality with respect to the quantitative robustness score for nested Until, Globally, and Globally-Until specifications. We further show how the Q function can serve as a safety filter for complex TL specifications, extending prior results beyond simple avoid or reach-avoid tasks.</description>
      <guid isPermaLink="false">arxiv:2605.01051</guid>
      <pubDate>Wed, 06 May 2026 08:25:17 +0000</pubDate>
    </item>
    <item>
      <title>Principal-agent problems with adverse selection: A stochastic target problem formulation</title>
      <link>http://arxiv.org/abs/2605.01080v1</link>
      <description>We study a principal-agent problem with adverse selection, where the principal does not know the agent's true cost but must design a contract to optimize a specific criterion. Unlike standard screening frameworks that allow for self-selection, we assume the principal can only offer a unique contract. We show that the agent's optimization problem can be reformulated as a stochastic target problem. After characterizing the credible domain of this target problem, we show that the principal's objective can be solved as a stochastic optimal control problem with partial information and state constraints. The description of the credible domain also allows us to obtain the value of screening contracts.</description>
      <guid isPermaLink="false">arxiv:2605.01080</guid>
      <pubDate>Wed, 06 May 2026 08:25:17 +0000</pubDate>
    </item>
    <item>
      <title>Modeling Stochastic Multi-Agent Interaction in Intraday Battery Energy Storage Dispatch with Market Power</title>
      <link>http://arxiv.org/abs/2605.01178v1</link>
      <description>We develop a stochastic game-theoretic model for intraday dispatch of grid-scale battery energy storage systems (BESSs). We assume that each BESS operator competitively manages her state-of-charge to maximize energy arbitrage revenues, driven by the endogenized electricity price that depends on the sum of the charging rates. We characterize the Nash equilibrium of the resulting finite-player linear-quadratic differential game with a shared stochastic driver, obtaining semi-explicit representations of equilibrium feedback controls and equilibrium prices both in the general heterogeneous and the simplified homogeneous BESS setting, via a system of Riccati equations. We then analyze competitive effects, including the marginal externality of additional BESS entering the market, the benefit of coordination and the corresponding market power of large operators, and supply effects from hybrid-type BESSs. We further study the asymptotic regime as the number of agents grows large. Our model provides a quantitative testbed to study the impact of decentralized BESS deployment on the grid and the resulting reduction in daily price spreads.</description>
      <guid isPermaLink="false">arxiv:2605.01178</guid>
      <pubDate>Wed, 06 May 2026 08:25:17 +0000</pubDate>
    </item>
    <item>
      <title>A Single-Loop Stochastic Gradient Algorithm for Minimax Optimization with Nonlinear Coupled Constraints</title>
      <link>http://arxiv.org/abs/2605.01246v1</link>
      <description>In this paper, we propose a single-loop stochastic gradient algorithm for solving stochastic nonconvex-concave minimax optimization with nonlinear convex coupled constraints (MCC). The proposed method, SPACO (Stochastic Penalty-based Algorithm for minimax optimization with COupled constraints), is built upon a penalty-based smooth approximation framework for MCC. This framework integrates a quadratic penalty scheme with regularization to yield a continuously differentiable approximation of the MCC problem. We provide theoretical convergence guarantees for this smoothing framework. Furthermore, we establish non-asymptotic complexity bounds and provide an asymptotic analysis characterizing the stationarity of accumulation points for the iterates generated by SPACO. Experimental results on synthetic examples and practical machine learning tasks demonstrate the effectiveness and efficiency of the proposed method.</description>
      <guid isPermaLink="false">arxiv:2605.01246</guid>
      <pubDate>Wed, 06 May 2026 08:25:17 +0000</pubDate>
    </item>
    <item>
      <title>Hidden Boundary Trace Regularity and an Observability Estimate with Interior Remainder for Boundary-Degenerate Hyperbolic Equations</title>
      <link>http://arxiv.org/abs/2605.01254v1</link>
      <description>We study hidden boundary trace regularity for two-dimensional hyperbolic equations with boundary degeneracy governed by $\mathcal{A}\varphi=-\operatorname{div}(A\nabla\varphi)$, where $A=\operatorname{diag}(1,r^{\alpha})$ and $\alpha\in(0,1)$. We establish well-posedness in weighted Sobolev spaces and prove an $L^2$ trace estimate for the normal derivative on the nondegenerate side $r=1$. Using truncated geometries and Carleman weights adapted to the anisotropic degeneracy, we derive a large-time observability estimate with a lower-order interior remainder. We also identify a framework-level obstruction at the critical threshold $\alpha=1$: the weighted Dirichlet coercivity underlying the subcritical analysis loses uniformity and exhibits a logarithmic loss on truncated domains.</description>
      <guid isPermaLink="false">arxiv:2605.01254</guid>
      <pubDate>Wed, 06 May 2026 08:25:17 +0000</pubDate>
    </item>
    <item>
      <title>Unified Lyapunov Method for ISS of PDEs: A Tutorial on Constructing Generalized Lyapunov Functionals for Parabolic and Hyperbolic Equations</title>
      <link>http://arxiv.org/abs/2605.01344v1</link>
      <description>This tutorial provides an overview of the generalized Lyapunov method (GLM) for analyzing input-to-state stability (ISS) of partial differential equations (PDEs). We begin by revisiting the classical Lyapunov method and the standard ISS-Lyapunov theorem, highlighting their limitations when applied to systems with complex boundary disturbances. In contrast, the GLM, based on the concept of generalized Lyapunov functionals (GLFs) that explicitly depend on the external input, offers greater flexibility and efficiency, particularly for PDEs with Dirichlet-type disturbances. The main objective of this tutorial is to demonstrate how to systematically construct GLFs to establish ISS estimates in $L^q$ spaces with any $q\in[2,\infty]$ for different PDEs. Specifically, we consider three representative classes of PDEs: (i) an $N$-dimensional nonlinear parabolic equation with mixed nonlinear boundary disturbances, (ii) a first order nonlinear hyperbolic equation with boundary disturbances, and (iii) a second order linear hyperbolic equation, i.e., a wave equation, with boundary damping and disturbances. For each case, we provide step-by-step constructions of appropriate GLFs and derive explicit ISS estimates, illustrating the general applicability of the GLM. Finally, we discuss open challenges and future directions, including the systematic construction of GLFs for broader classes of PDEs and their applications in controller design.</description>
      <guid isPermaLink="false">arxiv:2605.01344</guid>
      <pubDate>Wed, 06 May 2026 08:25:17 +0000</pubDate>
    </item>
    <item>
      <title>The proximal point method and its two variants for monotone vector fields in Hadamard spaces</title>
      <link>http://arxiv.org/abs/2605.01354v1</link>
      <description>We prove existence and convergence of sequences generated by the proximal point method and its two variants for monotone vector fields in Hadamard spaces. Before obtaining our results, we investigate some fundamental properties of tangent spaces, resolvents, and monotone vector fields in such spaces.</description>
      <guid isPermaLink="false">arxiv:2605.01354</guid>
      <pubDate>Wed, 06 May 2026 08:25:17 +0000</pubDate>
    </item>
    <item>
      <title>Optimal control problem for a nonlinear nonlocal evolution system describing an interacting ternary mixture with an evaporating component: 2D case with bulk evaporation</title>
      <link>http://arxiv.org/abs/2605.01377v1</link>
      <description>We present an optimal control problem to guide the selection of morphology classes arising in organic solar cells. The study focuses on phase separation processes in polymer-solvent mixtures, with particular attention to solvent evaporation as a mechanism to arrest morphology formation. We establish the existence of optimal controls and analyze the Fréchet derivative of the control-to-state mapping. Finally, we derive the first-order necessary optimality condition via the corresponding adjoint system.</description>
      <guid isPermaLink="false">arxiv:2605.01377</guid>
      <pubDate>Wed, 06 May 2026 08:25:17 +0000</pubDate>
    </item>
    <item>
      <title>Quaternion Nonlinear Transform-Induced Nuclear Norm for Low-Rank Tensor Completion</title>
      <link>http://arxiv.org/abs/2605.01467v1</link>
      <description>Tensor completion has emerged as a powerful framework for recovering missing data in multidimensional signals by exploiting low-rank tensor structures. Among existing approaches, linear transform-based tensor nuclear norm (TNN) methods have achieved considerable success by enforcing low-rankness on transformed frontal slices. However, the low-rank structure revealed by linear transforms remains inherently limited. To better capture intrinsic correlations, nonlinear transform-based TNN (NTTNN) models have been proposed, significantly enhancing low-rank representation through composite transforms. Despite their effectiveness, existing NTTNN methods are restricted to real-valued tensors and fail to model quaternion-valued data, which are essential for preserving inter-channel dependencies in color images and videos. Extending nonlinear TNN models to the quaternion domain is challenging due to the non-commutativity of quaternion multiplication and the complexity of quaternion singular value decomposition. To address the limitations encountered in prior works, we propose a quaternion nonlinear transform-induced tensor nuclear norm (QNTTNN) via a real embedding of quaternions, enabling tractable nuclear norm definitions and efficient optimization. Building upon QNTTNN, we formulate a quaternion tensor completion model and develop a proximal alternating minimization algorithm with rigorous convergence guarantees. Extensive experiments on benchmark color video inpainting datasets validate the superior performance of the proposed method over existing approaches.</description>
      <guid isPermaLink="false">arxiv:2605.01467</guid>
      <pubDate>Wed, 06 May 2026 08:25:17 +0000</pubDate>
    </item>
    <item>
      <title>On the redundancy of transitivity constraints in the clique partitioning problem</title>
      <link>http://arxiv.org/abs/2605.01481v1</link>
      <description>In this study, we identify a class of redundant transitivity constraints in a 0-1 integer linear programming formulation of the clique partitioning problem. The transitivity constraints in this class can be removed from the formulation without changing the optimal solution set, although each transitivity constraint defines a facet of the associated polytope. This leads to a smaller formulation that is particularly effective for instances arising from correlation clustering, where edge weights are drawn from $\{-1,1\}$. Our computational experiments show that the resulting formulation outperforms existing formulations on such instances.</description>
      <guid isPermaLink="false">arxiv:2605.01481</guid>
      <pubDate>Wed, 06 May 2026 08:25:17 +0000</pubDate>
    </item>
    <item>
      <title>On the convex hull of the graph of a simple monomial</title>
      <link>http://arxiv.org/abs/2605.01493v1</link>
      <description>Motivated by previous efforts toward mathematically analyzing the treatment of monomials in spatial branch-and-bound, we study the convex hull of the graph of a simple monomial on a nonnegative box domain in arbitrary dimension, where at most one of the variable lower bounds is positive. We give: (i) a description via linear inequalities, and (ii) a formula for the volume.</description>
      <guid isPermaLink="false">arxiv:2605.01493</guid>
      <pubDate>Wed, 06 May 2026 08:25:17 +0000</pubDate>
    </item>
    <item>
      <title>A Measure-Theoretic Formulation of Behavioral Systems</title>
      <link>http://arxiv.org/abs/2605.01558v1</link>
      <description>In Willems' behavioral systems theory, a dynamical system is identified with the set of all trajectories compatible with its laws of motion. For nonlinear or stochastic systems, however, the admissible trajectory set is generally nonconvex, obstructing direct optimization over the behavior. In this paper, we lift the behavioral viewpoint from trajectories to probability measures on trajectories by representing a finite-horizon dynamical system with the set of all Borel probability measures supported on its admissible trajectories. This behavioral-measure set is convex and weakly closed even for nonlinear or stochastic dynamics, because convex combinations of trajectory distributions remain dynamically admissible even when convex combinations of trajectories do not. The extreme points are precisely the Dirac masses on individual admissible trajectories, so the classical deterministic theory is embedded as the extremal skeleton of the richer measure-valued object. On this foundation we establish two core deterministic results and outline a stochastic extension based on conditional kernel consistency. First, optimal control for a prescribed initial distribution becomes a linear program over occupation measures whose dual is exactly Bellman's dynamic-programming recursion, with strong duality under compactness and continuity. Second, for controllable linear time-invariant systems under persistency of excitation, we prove a measure-level Fundamental Lemma: every probability measure on the finite-horizon behavior factors through the data Hankel matrix, reducing any optimization over trajectory distributions to an equivalent optimization over coefficient-space distributions. This is an exact data-driven reformulation requiring no model knowledge beyond a single informative trajectory; the classical Fundamental Lemma is recovered as the special case of Dirac measures.</description>
      <guid isPermaLink="false">arxiv:2605.01558</guid>
      <pubDate>Wed, 06 May 2026 08:25:17 +0000</pubDate>
    </item>
    <item>
      <title>Bilevel learning</title>
      <link>http://arxiv.org/abs/2605.01621v1</link>
      <description>Bilevel learning refers to machine learning problems that can be formulated as bilevel optimization models, where decisions are organized in a hierarchical structure. This paradigm has recently gained considerable attention in machine learning, as gradient-based algorithms built on the implicit function reformulation have enabled the computation of large-scale problems involving possibly millions of variables. Despite these advances, the implicit function framework relies on restrictive assumptions, notably the requirement that the lower-level problem admit a unique optimal solution for each upper-level decision. Moreover, the computation of the derivative of the lower-level optimal solution function becomes significantly more involved when the lower-level problem includes constraints. As a result, many existing bilevel learning algorithms are effective only for relatively narrow classes of problems. This paper reviews the main algorithmic ideas underlying recent progress in bilevel learning, highlighting both the key mechanisms responsible for their scalability and the limitations that arise in more general settings. We then draw connections with the broader bilevel optimization literature and discuss algorithmic techniques that may help overcome these limitations. Our aim is to bridge the gap between bilevel learning and classical bilevel optimization, thereby supporting the development of scalable methods capable of solving more general large-scale bilevel programs.</description>
      <guid isPermaLink="false">arxiv:2605.01621</guid>
      <pubDate>Wed, 06 May 2026 08:25:17 +0000</pubDate>
    </item>
    <item>
      <title>Error estimates for an unregularized optimal control problem for the stationary Navier-Stokes equations</title>
      <link>http://arxiv.org/abs/2605.01633v1</link>
      <description>We consider an unregularized optimal control problem subject to the steady-state Navier-Stokes equations. We derive the existence of optimal solutions and prove first-order optimality conditions, as well as necessary and sufficient second-order conditions. To approximate solutions to the optimal control problem, we consider the variational discretization scheme. We analyze convergence properties of the discretization and prove a priori error estimates for locally optimal controls that are nonsingular and satisfy a growth condition implying a bang-bang structure.</description>
      <guid isPermaLink="false">arxiv:2605.01633</guid>
      <pubDate>Wed, 06 May 2026 08:25:17 +0000</pubDate>
    </item>
    <item>
      <title>Analytic Bridge Diffusions for Controlled Path Generation</title>
      <link>http://arxiv.org/abs/2605.02961v1</link>
      <description>Most modern bridge-diffusion methods achieve finite-time transport by specifying an interpolation, Schrödinger-bridge, or stochastic-control objective and then learning the associated score or drift field with a neural network. In contrast, we identify a restricted but sufficiently broad and analytically solvable class in which the score, intermediate marginals, and protocol gradients are available in closed form without inner stochastic simulation loops and without neural networks in the optimization loop. We recast the classical linear-quadratic-Gaussian (LQG) stochastic-control structure as a transport problem of the Path Integral Diffusion (PID) type. In classical LQG control, linear dynamics, Gaussian noise, and quadratic costs lead to Riccati equations and closed-form optimal feedback. In LQ-GM-PID, we retain the linear-quadratic stochastic-control backbone, but replace terminal state regulation by a prescribed terminal probability density and allow both the initial and terminal laws to be Gaussian Mixtures (GM). Moreover, LQ-GM-PID turns bridge diffusion from a tool for terminal target matching alone into a tool for path shaping. We demonstrate this on a 2D corridor task, a 2D multi-entrance transport task, and a high-dimensional scaling study with $d=32$ and $M=16$ Gaussian-mixture terminal modes, all with sub-50 ms analytic precompute on a laptop. We position LQ-GM-PID as an analytically solvable reference model for the state-of-the-art neural bridge-diffusion and generative-transport methods: a controlled setting in which neural approximations, score estimates, path-shaping objectives, and protocol-learning procedures can be tested against exact quantities.</description>
      <guid isPermaLink="false">arxiv:2605.02961</guid>
      <pubDate>Wed, 06 May 2026 08:25:17 +0000</pubDate>
    </item>
    <item>
      <title>Optimal transport between laws of random probability measures and the strict Monge problem</title>
      <link>http://arxiv.org/abs/2605.01816v1</link>
      <description>We consider an optimal transport problem between laws of random probability measures: given a base cost function, we build the associated OT cost between probability measures that in turn we use to define the OT cost between probability measures over probability measures. This setting admits a finer reformulation in terms of laws of random couplings, which retain more information than ordinary couplings. One of the main contributions of the paper is the characterization of the optimal ones in terms of Kantorovich potentials.   Similarly, we also introduce the strict Monge problem, whose admissible competitors are more restrictive than in the usual Monge formulation. In this setting, we will give sufficient conditions under which the value of this problem is the same as the one considered above, in the spirit of the result by A. Pratelli. Then, for $p&gt;1$, when the underlying cost is the distance to the power $p$ in a strictly convex Banach space, we will give sufficient conditions under which the optimal random coupling is unique and induced by a solution of the strict Monge problem, resembling the Brenier theorem.</description>
      <guid isPermaLink="false">arxiv:2605.01816</guid>
      <pubDate>Wed, 06 May 2026 08:25:17 +0000</pubDate>
    </item>
    <item>
      <title>nvPAX: Constrained Optimization for Dynamic Power Allocation in Hierarchical and Multi-Tenant Systems</title>
      <link>http://arxiv.org/abs/2605.01837v1</link>
      <description>Power oversubscription is increasingly central to datacenter operation as power density grows, making it necessary to dynamically allocate limited power budgets across devices based on real-time demand. Existing approaches typically assume flat power domains, whereas in practice power distribution is hierarchical and allocation decisions must additionally respect tenant-level contractual constraints. We present nvPAX, a constrained-optimization policy that computes feasible power allocations at every control step via a three-phase hybrid QP/LP procedure. Phase I allocates power with minimum deviation from each device's power request, while respecting job priorities. Phase II fairly distributes excess power among active devices. Phase III fairly distributes any remaining power to idle devices. The rationale behind the three phases is to allow power oversubscription while maximizing datacenter utilization. On a trace-driven large-scale simulation using GPU power telemetry from a production datacenter, nvPAX runs with a mean wall-clock time of 264.69 ms per allocation interval and achieves a mean satisfaction ratio of 98.92%, outperforming static equal-share allocation and providing robustness beyond greedy proportional allocation in the presence of non-uniform hierarchical bottlenecks.</description>
      <guid isPermaLink="false">arxiv:2605.01837</guid>
      <pubDate>Wed, 06 May 2026 08:25:17 +0000</pubDate>
    </item>
    <item>
      <title>Fast Newton methods for linear-quadratic dynamic games with application to autonomous vehicle platooning and intersection crossing</title>
      <link>http://arxiv.org/abs/2605.01898v1</link>
      <description>We consider constrained linear-quadratic dynamic games arising in autonomous vehicle platooning, intersection crossing and other cooperative driving scenarios. Infinite-horizon Nash equilibria are reformulated as receding-horizon affine variational inequalities with special structure. Exploiting this formulation, we design Newton-type algorithms with local quadratic convergence. The resulting methods achieve extremely fast convergence, making them well suited for real-time and embedded receding-horizon control in safety-critical traffic applications. Simulations of platooning and intersection crossing demonstrate substantial performance gains over first-order and operator-splitting approaches, hence high application potential.</description>
      <guid isPermaLink="false">arxiv:2605.01898</guid>
      <pubDate>Wed, 06 May 2026 08:25:17 +0000</pubDate>
    </item>
    <item>
      <title>The Control Plant as A Communication Channel: Implicit Communication for Decentralized LQG Control</title>
      <link>http://arxiv.org/abs/2605.01903v1</link>
      <description>We study a decentralized linear quadratic Gaussian control problem, in which a leader and a follower must steer a linear system to a target state. The target state is known only to the leader, and no explicit communication channel exists between the agents. To address the challenge posed by this asymmetric information structure, we propose an integrated communication and control (ICoCo) framework in which the control plant itself serves as a communication channel: the leader encodes the target state into its control input through an additive communication term, and the follower decodes it from the resulting state trajectory. We design an implicit coordination scheme based on joint source-channel coding ideas, and prove that the follower's estimation error decreases monotonically to zero, enabling the two agents to coordinate increasingly well and ultimately steer the system to the target state. We then formulate the design of the communication power as an optimal control problem to minimize the overall control cost. In the fully actuated leader case, we derive necessary optimality conditions and in the under-actuated case, we solve the problem numerically. Numerical results show that the proposed scheme effectively coordinates the two agents and achieves a control cost close to that of the explicit-communication lower bound.</description>
      <guid isPermaLink="false">arxiv:2605.01903</guid>
      <pubDate>Wed, 06 May 2026 08:25:17 +0000</pubDate>
    </item>
    <item>
      <title>Training Non-Differentiable Networks via Optimal Transport</title>
      <link>http://arxiv.org/abs/2605.01928v1</link>
      <description>Neural networks increasingly embed non-differentiable components (spiking neurons, quantized layers, discrete routing, blackbox simulators, etc.) where backpropagation is inapplicable and surrogate gradients introduce bias. We present PolyStep, a gradient-free optimizer that updates parameters using only forward passes. Each step evaluates the loss at structured polytope vertices in a compressed subspace, computes softmax-weighted assignments over the resulting cost matrix, and displaces particles toward low-cost vertices via barycentric projection. This update corresponds to the one-sided limit of a regularized optimal-transport problem, inheriting its geometric structure without Sinkhorn iterations.   PolyStep trains genuinely non-differentiable models where existing gradient-free methods collapse to near-random accuracy. On hard-LIF spiking networks we reach 93.4% test accuracy, outperforming all gradient-free baselines by over 60 pp and closing to within 4.4 pp of a surrogate-gradient Adam ceiling. Across four additional non-differentiable architectures (int8 quantization, argmax attention, staircase activations, hard MoE routing) we lead every gradient-free competitor. On MAX-SAT scaling from 100 to 1M variables, we sustain above 92% clause satisfaction while evolution strategies drop 8–12 pp. On RL policy search, we match OpenAI-ES on classical control and retain performance under integer and binary quantization that collapses gradient-based methods. We prove convergence to conservative-stationary points at rate $O(\log T/\sqrt{T})$ on piecewise-smooth losses, upgraded to Clarke-stationary on the headline architectures and extended to the piecewise-constant regime via a hitting-time bound. These rates match the known zeroth-order query-complexity lower bounds that all forward-only methods inherit. Code is available at https://github.com/anindex/polystep.</description>
      <guid isPermaLink="false">arxiv:2605.01928</guid>
      <pubDate>Wed, 06 May 2026 08:25:17 +0000</pubDate>
    </item>
    <item>
      <title>Optimized and kinematically feasible multi-agent motion planning</title>
      <link>http://arxiv.org/abs/2605.01996v1</link>
      <description>Multi-agent motion planning (MAMP) is an important problem for autonomous systems with multiple agents. In this work we propose a two-step method for finding optimized and kinematically feasible solutions to MAMP problems. The first step finds an initial feasible solution using state-of-the-art methods such as conflict-based search (CBS) or priority-based search (PBS), and the second step is an improvement step which improves the solution by solving a multi-phase optimal control problem (OCP) where the initial solution is used to warm-start the solver. We also propose a method for generating motion primitives in an optimized way under the constraint that the primitive durations are all multiples of the same sample time.   We evaluate our proposed framework on a MAMP problem for tractor-trailer systems. We extend the safe interval path planning with interval projections (SIPP-IP) algorithm so it can handle more general cost functions and larger agents, but our results show that for the tractor-trailer system a simple lattice-based planner performs better due to less conservative collision checks. Our experiments also indicate that CBS performs better than PBS for this system, as it achieves a higher success rate in environments with obstacles and a lower average runtime, although both planners achieve solutions of similar quality after the improvement step.</description>
      <guid isPermaLink="false">arxiv:2605.01996</guid>
      <pubDate>Wed, 06 May 2026 08:25:17 +0000</pubDate>
    </item>
    <item>
      <title>Multi-Agent Motion Planning for Simultaneous Arrival using Time-Reversed Search and Distributed Optimal Control</title>
      <link>http://arxiv.org/abs/2605.02019v1</link>
      <description>In this work we consider the multi-agent motion planning (MAMP) problem with the constraint that agents arrive at their respective goals at the same time. For the special case where all agents are initially at rest we propose a two-step method for finding optimized and kinematically feasible solutions. The first step finds an initial feasible solution by applying a state-of-the-art MAMP algorithm (conflict-based search and safe interval path planning with interval projection) backward. The algorithm is complete, and we provide necessary conditions for when it is also optimal. The second step is an improvement step where a receding-horizon optimal control problem (OCP) is posed and the solution found in the first step is used to warm-start the solver. To improve scalability we propose to solve the OCP in a distributed manner using the nonlinear alternating direction method of multipliers (NADMM).   We evaluate the proposed framework in numerical experiments on a car-like vehicle. The results show that the backward planning algorithm successfully finds feasible and collision-free solutions, and that the improvement step further improves the quality of the solutions. Compared to solving the OCPs in a centralized manner, using nonlinear ADMM reduces the computation time.</description>
      <guid isPermaLink="false">arxiv:2605.02019</guid>
      <pubDate>Wed, 06 May 2026 08:25:17 +0000</pubDate>
    </item>
    <item>
      <title>A Parameter-Free First-Order Algorithm for Non-Convex Optimization with $\tilde{\mkern1mu O}(ε^{-5/3})$ Global Rate</title>
      <link>http://arxiv.org/abs/2605.02127v1</link>
      <description>We introduce PF-AGD, the first parameter-free, deterministic, accelerated first-order method to achieve an $O(ε^{-5/3}\log(1/ε))$ oracle complexity bound when minimizing sufficiently smooth, non-convex functions; this is the best-known bound for first-order methods on smooth non-convex objectives. Unlike existing methods possessing this rate, which require a priori knowledge of smoothness constants, we use an adaptive backtracking scheme and a gradient-based restart mechanism to estimate local curvature. This yields a practical algorithm that matches the best-known theoretical rates. Empirically, PF-AGD outperforms the practical variant of AGD-Until-Guilty (Carmon et al., 2017), as well as other parameter-free variants, and is a viable alternative to nonlinear conjugate gradient methods.</description>
      <guid isPermaLink="false">arxiv:2605.02127</guid>
      <pubDate>Wed, 06 May 2026 08:25:17 +0000</pubDate>
    </item>
    <item>
      <title>Sampling-Based Control via Entropy-Regularized Optimal Transport</title>
      <link>http://arxiv.org/abs/2605.02147v1</link>
      <description>Sampling-based model predictive control methods like MPPI and CEM are essential for real-time control of nonlinear robotic systems, particularly where discontinuous dynamics preclude gradient-based optimization. However, these methods derive from information-theoretic objectives that are agnostic to the geometry of the control problem, leading to pathological behaviors such as mode-averaging when the cost landscape is complex. We present OT-MPC, a sampling-based algorithm that overcomes these limitations through an entropy-regularized optimal transport formulation. By computing an optimal coupling between candidate control sequences and low-cost proposals, OT-MPC refines candidates toward nearby promising samples while coordinating updates across the ensemble to maintain coverage of the solution space. We derive closed-form, gradient-free updates via the Sinkhorn algorithm, enabling real-time performance. Experiments on navigation, manipulation, and locomotion tasks demonstrate improved success rates over existing methods.</description>
      <guid isPermaLink="false">arxiv:2605.02147</guid>
      <pubDate>Wed, 06 May 2026 08:25:17 +0000</pubDate>
    </item>
    <item>
      <title>A Theory of Composition and Duality of Extremal Optimal Fixed-Point Algorithms</title>
      <link>http://arxiv.org/abs/2605.02231v1</link>
      <description>In this work, we reveal a rich combinatorial structure underlying exact minimax optimal algorithms for classical nonexpansive fixed-point problems. This viewpoint unifies all extremal optimal methods and provides a systematic and practical framework for designing new algorithms via diagrams. Specifically, we study fixed-step algorithms represented by a lower triangular matrix H, and show that the set of optimal (N-1)-step algorithms has exactly (N-1)! vertices (extremal algorithms), each of which naturally corresponds to an arc diagram, a graph that encodes its convergence proof. Using these arc diagrams, we can compose, decompose, and analyze the properties of distinct optimal vertex algorithms. Furthermore, we determine when the H-dual operation, given by taking the anti-diagonal transpose of H, preserves the optimality of a vertex algorithm, and in such cases we characterize the convergence proof of the dual algorithm. Based on this machinery, we develop new optimal algorithms with quasi-anytime guarantees; that is, they admit an increasing integer sequence such that the corresponding iterates have the optimal residual guarantees, and are additionally robust to fixed-point operators that violate nonexpansiveness.</description>
      <guid isPermaLink="false">arxiv:2605.02231</guid>
      <pubDate>Wed, 06 May 2026 08:25:17 +0000</pubDate>
    </item>
    <item>
      <title>Distributed Observer-based Fault Detection over Intelligent Networked Multi-Vehicle Systems</title>
      <link>http://arxiv.org/abs/2605.02235v1</link>
      <description>Decentralized strategies are of interest for local decision-making over multi-vehicle networks. This paper studies mixed traffic networks of human-driven and autonomous vehicles with partial sensor measurements. The idea is to enable the group of connected autonomous vehicles (CAVs) to track the state of a group of human-driven vehicles (HDVs) via distributed consensus-based observers/estimators. In particular, we make no assumption that the group of HDVs is locally observable in the direct neighborhood of any CAV. The main contribution is then to design local residual-based fault detection and isolation (FDI) at every CAV to detect possible faults/attacks in the sensor measurements. This distributed detection strategy enables every CAV to locally find possible anomalies in its own sensor measurements with no need for a central processing unit. Two FDI logics are proposed, with and without considering the history of the residuals. These FDI techniques are based on probabilistic threshold design on the residuals (in contrast to existing deterministic-threshold FDI techniques), with no assumption that the noise is of bounded support, which is more realistic for real-world multi-vehicle transportation systems.</description>
      <guid isPermaLink="false">arxiv:2605.02235</guid>
      <pubDate>Wed, 06 May 2026 08:25:17 +0000</pubDate>
    </item>
    <item>
      <title>Foundations of Riemannian Geometry for Riemannian Optimization: A Monograph with Detailed Derivations</title>
      <link>http://arxiv.org/abs/2605.02279v1</link>
      <description>Riemannian geometry provides the fundamental framework for optimization on nonlinear spaces such as matrix manifolds, which arise in machine learning, signal processing, and robotics. While the underlying theory is classical, existing literature often presents results at a high level of abstraction, omitting the detailed coordinate-level derivations required for implementation and algorithm development.   This work provides a self-contained and rigorous treatment of the foundations of Riemannian geometry, with a focus on explicit derivations tailored to Riemannian optimization. We systematically develop the key geometric structures -- including tangent and cotangent spaces, tensor calculus, metric tensors, Levi-Civita connections, curvature, and geodesics -- emphasizing step-by-step derivations in coordinates and matrix form.   Building on these foundations, we derive the Riemannian gradient, Hessian, exponential map, and retraction in a form suitable for numerical computation. We further specialize these constructions to important matrix manifolds, including the Stiefel, Grassmann, and SPD (Symmetric Positive Definite) manifolds, providing explicit formulas widely used in optimization and geometric machine learning.   This monograph develops a unified and implementation-oriented treatment of Riemannian geometry for optimization on manifolds. Its main contribution is the systematic organization and detailed derivation of classical geometric constructions in forms directly usable for algorithm design and numerical implementation. By connecting coordinate-level differential geometry with matrix-manifold formulas, the monograph bridges the gap between abstract theory and practical computation, and provides a reference for researchers and practitioners working in Riemannian optimization and related fields.</description>
      <guid isPermaLink="false">arxiv:2605.02279</guid>
      <pubDate>Wed, 06 May 2026 08:25:17 +0000</pubDate>
    </item>
    <item>
      <title>A computational comparison of handling distance constraints in MINLP</title>
      <link>http://arxiv.org/abs/2605.02305v1</link>
      <description>Minimum distance constraints (minDCs) appear in many geometric optimization problems. They pose major challenges for mixed-integer nonlinear programming (MINLP) due to their reverse-convexity. We develop new algorithms for tightening variable bounds in general MINLPs with minDCs. Because many such problems exhibit substantial symmetry, we further introduce a practical approach for handling rotation symmetries via separation of lexicographic constraints induced by Givens rotations. In a computational study, we examine the performance of the various methods and determine the scenarios in which each approach demonstrates superiority.</description>
      <guid isPermaLink="false">arxiv:2605.02305</guid>
      <pubDate>Wed, 06 May 2026 08:25:17 +0000</pubDate>
    </item>
    <item>
      <title>A Real-Time Scalable Heuristic DSS Framework for Capacity-Constrained Retail Allocation under Supply Chain Uncertainty</title>
      <link>http://arxiv.org/abs/2605.02330v1</link>
      <description>The rapid proliferation of omnichannel retail strategies has fundamentally transformed store replenishment operations in uncertain supply chain environments. With retail stores increasingly acting as hybrid fulfillment centers, pooled inventory allocation must absorb uncertain order realizations, constrained receiving capacities, dynamic vehicle limits, multi-tiered product priorities, and planner-controlled outbound warehouse preferences. This study frames this commercial reality as an extended constrained variant of the Multidimensional Knapsack Problem (MKP). Recognizing that exact optimization techniques such as Mixed-Integer Linear Programming (MILP) are computationally prohibitive in large-scale real-time settings, we propose a real-time scalable heuristic embedded in a computationally efficient Decision Support System (DSS) framework based on set-oriented cumulative filtering. The framework evaluates cumulative flow-through deductions, third-party logistics routing integrations, category-specific volume caps, warehouse activation filters, and user-defined warehouse priority ranks. An extensive case study within a large retail network covering 212,278 order records from June 2025 to April 2026 demonstrates the impact of the proposed methodology. Using January 2026 as the go-live cutoff, weighted ship-to-order ratio improved from 54.1% to 67.8%, weighted same-day coverage improved from 24.3% to 37.8%, and store-days with order volumes above store limits were reduced by 48.6%. These findings indicate that the proposed real-time scalable heuristic and computationally efficient DSS framework provide practical, uncertainty-aware allocation support for volatile retail supply chains.</description>
      <guid isPermaLink="false">arxiv:2605.02330</guid>
      <pubDate>Wed, 06 May 2026 08:25:17 +0000</pubDate>
    </item>
    <item>
      <title>Optimizing Travel Time and Regenerative Energy for Periodic Timetables</title>
      <link>http://arxiv.org/abs/2605.02355v1</link>
      <description>Regenerating braking energy is a major pathway toward energy-efficient rail traffic. It is therefore desirable to design timetables that exploit this feature. However, timetables that enable energy regeneration are often inconvenient for passengers. We hence formulate and analyze a bicriteria optimization problem (PESP-Passenger-Energy) to find periodic railway timetables that maximize the regenerated energy in terms of the brake-traction overlap time and minimize the travel time of the passengers. Our model extends the Periodic Event Scheduling Problem (PESP) and offers a rich combinatorial theory. We investigate its computational complexity on one-station networks, building on matchings and Hamiltonian paths. Besides showing its NP-hardness even for a single objective, we identify several polynomial-time solvable special cases. Finally, we provide two case studies, underlining the practicability of our model, and analyzing the Pareto front.</description>
      <guid isPermaLink="false">arxiv:2605.02355</guid>
      <pubDate>Wed, 06 May 2026 08:25:17 +0000</pubDate>
    </item>
    <item>
      <title>Improved semidefinite programming bounds for the maximum $k$-colorable subgraph problem</title>
      <link>http://arxiv.org/abs/2605.02456v1</link>
      <description>We study the maximum $k$-colorable subgraph (M$k$CS) problem, which consists in finding a largest $k$-colorable induced subgraph in a given graph. We consider a Semidefinite Programming (SDP) relaxation for the M$k$CS problem and regard its resulting upper bound as a graph parameter. We present several properties of this graph parameter, from which we obtain that the M$k$CS problem is solvable in polynomial time for $k$-perfect graphs. We further derive two novel families of valid inequalities to strengthen the SDP relaxation. The first family reduces to a family of inequalities for the Boolean quadric polytope when $k = 1$, and the second family generalizes the family of rank inequalities for binary linear programming formulations of the stable set problem. We efficiently solve the strengthened SDP relaxation using a cutting-plane algorithm that is based on the Alternating Direction Method of Multipliers (ADMM). Extensive computational experiments show that the obtained upper bounds outperform the best upper bounds from the literature. To complement our SDP-based upper bounds, we propose an integer ADMM variant that uses an exact Binary Semidefinite Programming (BSDP) formulation of the M$k$CS problem to produce high-quality feasible solutions. To the best of our knowledge, this is the first application of the ADMM to compute integer solutions to a BSDP problem.</description>
      <guid isPermaLink="false">arxiv:2605.02456</guid>
      <pubDate>Wed, 06 May 2026 08:25:17 +0000</pubDate>
    </item>
    <item>
      <title>On the observability of the Schrödinger equation in the torus from open sets</title>
      <link>http://arxiv.org/abs/2605.02480v1</link>
      <description>We study the observability of the Schrödinger equation on the $d$-dimensional torus $\mathbb T^d$, $d \geq 1$, from an open subset $ω\subset \mathbb T^d$. Our first main result establishes a quantitative observability estimate for the free Schrödinger equation in the regime of small times $T$ and for small observation sets of the form $ω= \prod_{j=1}^{d}(a_j,b_j)$. Our second main result shows that observability holds for the Schrödinger equation with a merely bounded potential $V \in L^{\infty}(\mathbb T^d)$, in any dimension $d \geq 1$, for every time $T&gt;0$ and every nonempty open subset $ω$. This resolves a well-known conjecture in the field. A central ingredient in the proof is a cluster decomposition method combined with an induction scheme introduced by Bourgain and further developed by Burq and Zhu.</description>
      <guid isPermaLink="false">arxiv:2605.02480</guid>
      <pubDate>Wed, 06 May 2026 08:25:17 +0000</pubDate>
    </item>
    <item>
      <title>Closed Forms for Gaussian Kullback--Leibler Unbalanced Optimal Transport without Coupling Entropy</title>
      <link>http://arxiv.org/abs/2605.02497v1</link>
      <description>We obtain an explicit solution for the static Kullback--Leibler (KL) unbalanced optimal transport problem between finite non-degenerate Gaussian measures with quadratic cost, two independent positive marginal relaxation parameters, and no entropy penalty on the coupling. The minimizer is a scaled Wasserstein coupling between two adjusted Gaussian marginals and is supported on an affine graph; in entropic Gaussian unbalanced transport, by contrast, the optimal plan is non-degenerate on the product space. The covariance map is the unique positive definite solution of a Riccati equation and admits a principal-square-root representation. Compared with the known equal-penalty Gaussian Hellinger--Kantorovich endpoint, the result treats the asymmetric two-sided Kullback--Leibler relaxation and gives the modified marginals, joint minimizer, value, and a direct quadratic KL-dual certificate. The large-relaxation limit recovers the Gaussian Wasserstein cost for equal masses.</description>
      <guid isPermaLink="false">arxiv:2605.02497</guid>
      <pubDate>Wed, 06 May 2026 08:25:17 +0000</pubDate>
    </item>
    <item>
      <title>Linear Decision Tree Policies for Integer Linear Programs</title>
      <link>http://arxiv.org/abs/2605.02582v1</link>
      <description>We study optimal decision policies for integer linear programs with a fixed feasible set and varying cost vectors, represented as linear decision trees. Once synthesized for a given feasible set, they return an optimal solution for any queried cost vector through a sequence of linear tests. We show that there exists a policy performing this operation in a polynomial number of arithmetic operations in the worst case. Along with this theoretical guarantee, we develop a practical construction framework to synthesize policies within a specific subclass of linear decision trees. Our computational experiments show that, although policy synthesis can be time-intensive, it allows retrieving optimal solutions orders of magnitude faster than classical and specialized solution methods on repeated queries. Overall, this paradigm provides a new perspective on the complexity of integer linear programs and offers an offline--online approach for solving them.</description>
      <guid isPermaLink="false">arxiv:2605.02582</guid>
      <pubDate>Wed, 06 May 2026 08:25:17 +0000</pubDate>
    </item>
    <item>
      <title>Mirror Descent for Deterministic Optimal Control</title>
      <link>http://arxiv.org/abs/2605.02653v1</link>
      <description>We study an explicit mirror-descent method for finite-horizon deterministic optimal control problems. The method is motivated by Pontryagin's maximum principle: at each iteration, one solves the state and adjoint equations and updates the control by maximizing a first-order approximation of the regularized Hamiltonian penalized by a Bregman divergence. In the Euclidean case, the update reduces to a projected gradient step in the control variable. Under global smoothness assumptions and uniform convexity of the mirror map, we prove a relative smoothness estimate for the cost functional and derive an energy dissipation inequality for sufficiently small step sizes. Under an additional concavity assumption on the unregularized Hamiltonian and convexity of the terminal cost, we establish relative convexity of the regularized objective. These estimates yield an $O(1/n)$ convergence rate in the unregularized convex case and a geometric rate when the control regularization parameter is positive. Numerical examples illustrate the behavior of the method in linear-quadratic, degenerate convex, and nonlinear high-dimensional settings.</description>
      <guid isPermaLink="false">arxiv:2605.02653</guid>
      <pubDate>Wed, 06 May 2026 08:25:17 +0000</pubDate>
    </item>
    <item>
      <title>Robust and Fast Training via Per-Sample Clipping</title>
      <link>http://arxiv.org/abs/2605.02701v1</link>
      <description>We propose a robust gradient estimator based on per-sample gradient clipping and analyze its properties both theoretically and empirically. We show that the resulting method, per-sample clipped SGD (PS-Clip-SGD), achieves optimal in-expectation convergence rates for non-convex optimization problems under heavy-tailed gradient noise. Moreover, we establish high-probability convergence guarantees that match the in-expectation rates up to polylogarithmic factors in the failure probability. We complement our theoretical results with multiple numerical experiments. In particular, we demonstrate that PS-Clip-SGD outperforms both vanilla SGD with momentum and standard gradient clipping when training AlexNet on the CIFAR-100 dataset, even after accounting for the additional computational time caused by per-sample clipping. We also empirically show that, in the presence of gradient accumulation, applying clipping at the mini-batch level can improve training performance while incurring virtually no additional computational cost. This finding is particularly interesting, as it contradicts the common practice of applying clipping only after all accumulation steps have been completed.</description>
      <guid isPermaLink="false">arxiv:2605.02701</guid>
      <pubDate>Wed, 06 May 2026 08:25:17 +0000</pubDate>
    </item>
    <item>
      <title>Identifiability and Error Bound: Metric and Geometric Perspectives</title>
      <link>http://arxiv.org/abs/2605.02754v1</link>
      <description>Identifiability means that iterates generated by optimization algorithms are eventually confined to an identifiable set. This property is computationally useful because minimizing a nonsmooth function near a critical point reduces to minimizing its smooth restriction on the corresponding identifiable manifold. Motivated by this reduction, we study the Error Bound (EB) property from both ambient and manifold viewpoints. Under mild assumptions in Euclidean space, we prove that local EB on $(\mathbb{R}^n,d)$ is equivalent to local EB on an identifiable manifold $(\mathcal{M},d)$. We establish this result from two complementary perspectives: a metric analysis based on slope and linear growth away from $\mathcal{M}$, and a geometric analysis based on subdifferentials, partial smoothness, and $\mathcal{VU}$-theory. As an application, we recover the EB equivalence for $\ell_1$-regularized optimization in the literature.</description>
      <guid isPermaLink="false">arxiv:2605.02754</guid>
      <pubDate>Wed, 06 May 2026 08:25:17 +0000</pubDate>
    </item>
    <item>
      <title>A Shape Design Approximation for Degenerate Partial Differential Equations and Its Application</title>
      <link>http://arxiv.org/abs/2605.02783v1</link>
      <description>In this paper, we focus on two types of degenerate partial differential equations: a degenerate elliptic equation and a degenerate parabolic equation. Notably, both categories are characterized by the same principal operator. To obtain solutions for these equations, we introduce a novel approximation approach, termed the shape design approximation. As a practical application of this method, we derive a Carleman estimate for the backward degenerate parabolic equation. This estimate plays a pivotal role in establishing the null controllability of the degenerate parabolic equation. A notable advantage of employing the shape design approximation in deriving the Carleman estimate is that it enables us to bypass the requirement for second-order derivatives in the degenerate equation, which has typically been a significant obstacle in the derivation of Carleman estimates for degenerate parabolic equations.</description>
      <guid isPermaLink="false">arxiv:2605.02783</guid>
      <pubDate>Wed, 06 May 2026 08:25:17 +0000</pubDate>
    </item>
    <item>
      <title>Risk-Averse Ensemble Control for Control-Affine Systems</title>
      <link>http://arxiv.org/abs/2605.02791v1</link>
      <description>A number of important modern applications in optimal control can be formulated as open loop control problems in which the underlying dynamical systems are subject to random inputs. These so-called ensemble control problems require the corresponding optimal control to be deterministic, as it must be computed before the realization of uncertainty and the passage of time. Practical applications of ensemble control include quantum control and the training of Neural ODEs. However, the standard approach to ensemble control treats the uncertainty in the objective function via the expectation, which provides optimal controls that only work well on average while ignoring critical outlier phenomena. This study provides a comprehensive mathematical treatment of risk-averse ensemble control. Within this setting, we adopt a control-affine structure that ensures the lower semi-continuity needed for proving the existence of optimal solutions. The central analytical contribution of this paper is a rigorous characterization of the control-to-state mapping in which we establish weak-to-strong continuity, continuous Fréchet differentiability, and weak-to-strong continuity of the derivative operator. Furthermore, this regularity yields primal and dual first-order optimality conditions characterized by an adjoint state of bounded variation, and it fulfills the functional prerequisites required for the convergence of infinite dimensional optimization algorithms. We conclude by validating these theoretical developments through a numerical experiment in quantum control.</description>
      <guid isPermaLink="false">arxiv:2605.02791</guid>
      <pubDate>Wed, 06 May 2026 08:25:17 +0000</pubDate>
    </item>
    <item>
      <title>Quantitative Weak Unique Continuation on Annular Domains for Backward Degenerate Parabolic Equations with Degenerate Interior Points</title>
      <link>http://arxiv.org/abs/2605.02797v1</link>
      <description>In this paper, we establish a quantitative weak unique continuation theorem on an annular domain for a backward degenerate parabolic equation with a degenerate interior point. Our methodology hinges on approximating the solution of the degenerate parabolic equation through solutions of non-degenerate parabolic counterparts. Subsequently, we establish Carleman estimates for the non-degenerate parabolic equation across two separate domains. By virtue of these estimates, we deduce a quantitative weak unique continuation property for the degenerate parabolic equation, thereby substantiating the weak unique continuation result for the original degenerate parabolic equation.</description>
      <guid isPermaLink="false">arxiv:2605.02797</guid>
      <pubDate>Wed, 06 May 2026 08:25:17 +0000</pubDate>
    </item>
    <item>
      <title>Statistical Inference of Day-to-Day Traffic Dynamics</title>
      <link>http://arxiv.org/abs/2605.02806v1</link>
      <description>Day-to-day traffic dynamics are widely used to model flow evolution due to travelers' learning and adjustment behavior, yet empirical analysis of these models often relies on descriptive calibration with limited inferential content. This paper develops a statistical inference framework for day-to-day route choice dynamics based on a stochastic individual-level adjustment model. The framework enables uncertainty quantification and formal inference for behavioral parameters from trajectory data. We establish identifiability and consistency under mild conditions, and extend the framework to accommodate demand variation, user heterogeneity through a hierarchical structure, and anonymized observability caused by privacy constraints on trajectory data. Simulation studies demonstrate good finite-sample performance, calibrated uncertainty, and robustness to model misspecification. Empirical analyses of controlled laboratory experiments and real-world trajectory data from Ann Arbor, Michigan, show that the framework can generate novel behavioral insights across settings: it reveals the inadequacy of a purely inter-day learning model once en-route information is introduced, recovers systematic behavioral differences across participant types, and uncovers meaningful day-to-day learning together with substantial demand variation in real-world commuting behavior.</description>
      <guid isPermaLink="false">arxiv:2605.02806</guid>
      <pubDate>Wed, 06 May 2026 08:25:17 +0000</pubDate>
    </item>
    <item>
      <title>Null Controllability for a Multi-Dimensional Degenerate Parabolic Equation with Degenerated Interior Point</title>
      <link>http://arxiv.org/abs/2605.02830v1</link>
      <description>In this study, we investigate the null controllability of a multi-dimensional degenerate parabolic equation characterized by a degenerate interior point. The control domain, which is an arbitrary inner region, does not encompass the degenerate point. To tackle this problem, we adopt a new approximation methodology. Specifically, we approximate the degenerate partial differential equations (PDEs) with a series of uniformly elliptic PDEs, notwithstanding their limited regularity. We then derive the Carleman estimate for these approximate uniformly parabolic equations and establish the observability inequality, which ultimately paves the way for demonstrating the null controllability of the system.</description>
      <guid isPermaLink="false">arxiv:2605.02830</guid>
      <pubDate>Wed, 06 May 2026 08:25:17 +0000</pubDate>
    </item>
    <item>
      <title>A second-order method landing on the Stiefel manifold via Newton$\unicode{x2013}$Schulz iteration</title>
      <link>http://arxiv.org/abs/2605.02838v2</link>
      <description>Retraction-free approaches offer attractive low-cost alternatives to Riemannian methods on the Stiefel manifold, but they are often first-order, which may limit the efficiency under high-accuracy requirements. To this end, we propose a second-order method landing on the Stiefel manifold without invoking retractions, which is proved to enjoy local quadratic (or superlinear for its inexact variant) convergence. The update consists of the sum of (i) a component tangent to the level set of the constraint-defining function that aims to reduce the objective and (ii) a component normal to the same level set that reduces the infeasibility. Specifically, we construct the normal component via Newton$\unicode{x2013}$Schulz, a fixed-point iteration for orthogonalization. Moreover, we establish a geometric connection between the Newton$\unicode{x2013}$Schulz iteration and Stiefel manifolds, in which Newton$\unicode{x2013}$Schulz moves along the normal space. For the tangent component, we formulate a modified Newton equation that incorporates Newton$\unicode{x2013}$Schulz. Numerical experiments on the orthogonal Procrustes problem, principal component analysis, and real-data independent component analysis illustrate that the proposed method performs better than the existing methods.</description>
      <guid isPermaLink="false">arxiv:2605.02838</guid>
      <pubDate>Wed, 06 May 2026 08:25:17 +0000</pubDate>
    </item>
    <item>
      <title>Sensitivity Analysis of Tactical Wireless Network Design Under Realistic Operational Constraints</title>
      <link>http://arxiv.org/abs/2605.03072v1</link>
      <description>The design of tactical wireless networks reflects a complex interplay among structural constraints, technological choices, and underlying modeling assumptions. Although optimization-based approaches have been widely explored, the impact of configuration parameters on network topology quality and overall performance is still not fully understood. This paper presents a comprehensive sensitivity analysis of tactical wireless network design under realistic operational constraints. It systematically investigates three categories of parameters: (i) structural topology rules, including master hub selection; (ii) technological factors such as antenna beamwidth; and (iii) modeling parameters embedded in the objective formulation. Optimized topologies are produced using a Tabu Search metaheuristic, and statistical analyses based on the Friedman and Wilcoxon tests are performed to assess the significance of observed variations across different network sizes. The findings reveal scale-dependent technological transitions and threshold effects in structural constraints. The analysis differentiates parameters that fundamentally reshape network topology from those that primarily influence performance magnitude. Together, these insights provide practical guidance for parameter tuning and topology configuration in mission-critical tactical wireless deployments.</description>
      <guid isPermaLink="false">arxiv:2605.03072</guid>
      <pubDate>Wed, 06 May 2026 08:25:17 +0000</pubDate>
    </item>
    <item>
      <title>MultiLRSGA: A method for multi-player differentiable games</title>
      <link>http://arxiv.org/abs/2605.03263v1</link>
      <description>We propose MultiLRSGA, an $h$-player extension of LRSGA for the computation of stable Nash equilibria in differentiable games. The method originates from the decomposition of the game Jacobian into symmetric and antisymmetric components, which motivates symplectic corrections designed to attenuate the rotational part of the dynamics. In the two-player setting, LRSGA replaces mixed second-order blocks with low-rank secant approximations. The passage to the multi-player case, however, is not a mere blockwise reformulation: the antisymmetric correction is no longer determined by a single pair of cross-interactions, but by a block antisymmetric operator collecting all pairwise couplings among the players. On this basis, we formulate MultiLRSGA by constructing, for each player, a low-rank approximation of the Jacobian of the partial gradient and extracting from it the blocks required to define an approximate antisymmetric correction. Under standard local assumptions around a stable Nash equilibrium, we prove local linear convergence of the method. The key technical ingredient is a lemma controlling the distance between the exact antisymmetric correction and its secant approximation in the $h$-player setting, thereby extending to the multi-player framework the convergence mechanism previously available for LRSGA. The proposed formulation preserves the computational advantages of low-rank symplectic corrections and is naturally suited to numerical validation on differentiable games with explicit payoffs and more than two agents.</description>
      <guid isPermaLink="false">arxiv:2605.03263</guid>
      <pubDate>Wed, 06 May 2026 08:25:17 +0000</pubDate>
    </item>
    <item>
      <title>Population-Aware Imitation Learning in Mean-field Games with Common Noise</title>
      <link>http://arxiv.org/abs/2605.03357v1</link>
      <description>Mean Field Games (MFGs) provide a powerful framework for modeling the collective behavior of large populations of interacting agents. In this paper, we address the problem of Imitation Learning (IL) in MFGs subject to common noise, where the population distribution evolves stochastically. This stochasticity compels agents to adopt population-aware policies to respond to aggregate shocks. We formulate two distinct learning objectives: recovering a Nash equilibrium and maximizing performance against an expert population. We investigate two imitation proxies: Behavioral Cloning (BC) and Adversarial (ADV) divergence. We then establish finite-sample error bounds showing that minimizing these proxies effectively controls both the policy's exploitability and its performance gap relative to the expert. Furthermore, we propose a numerical framework using generalized Fictitious Play and Deep Learning to compute expert population-aware policies. Through experiments on three environments we demonstrate that standard population-unaware policies fail to capture the equilibrium dynamics. Our results highlight that learning population-aware policies is crucial to avoid being misled by the randomness inherent in common noise.</description>
      <guid isPermaLink="false">arxiv:2605.03357</guid>
      <pubDate>Wed, 06 May 2026 08:25:17 +0000</pubDate>
    </item>
    <item>
      <title>A Quadratic-Approximation-Based Stochastic Approximation Method for Weakly Convex Stochastic Programming</title>
      <link>http://arxiv.org/abs/2605.03400v1</link>
      <description>We propose a novel stochastic approximation algorithm, termed PMQSopt, for solving weakly convex stochastic optimization problems involving expectation-valued functions. The algorithm is constructed by integrating the proximal method of multipliers with quadratic approximations of the original stochastic problem. We analyze the sample complexity of PMQSopt in terms of the total number of stochastic gradient evaluations required. The convergence of the algorithm is characterized by three metrics associated with the $ε$-KKT conditions: the average squared norm of the gradient of the Moreau envelope of the Lagrangian, the average constraint violation, and the average complementarity violation. For each of these metrics, we establish an expected convergence rate of $\mathcal{O}(T^{-1/4})$ after $T$ iterations. Furthermore, we show that with probability at least $1-1/T^{2/3}$, the gradient of the Lagrangian satisfies an $\mathcal{O}(T^{-1/8})$ bound; with probability at least $1-2/T^{2/3}$, the constraint violation achieves an $\mathcal{O}(T^{-1/4})$ bound; and with probability at least $1-3/T^{2/3}$, the complementarity violation attains an $\mathcal{O}(T^{-1/4})$ bound. All results are established under two mild conditions: (i) weak convexity of all problem functions, and (ii) the existence of a strictly feasible point. The proposed PMQSopt algorithm is a sequentially strongly convex programming method that is readily implementable. Numerical experiments illustrate its practical performance.</description>
      <guid isPermaLink="false">arxiv:2605.03400</guid>
      <pubDate>Wed, 06 May 2026 08:25:17 +0000</pubDate>
    </item>
    <item>
      <title>A Proximal Augmented Lagrangian Method Based on Quadratic Approximations for Weakly Convex Optimization</title>
      <link>http://arxiv.org/abs/2605.03415v1</link>
      <description>This paper proposes QPALM, a proximal augmented Lagrangian method based on quadratic approximations, for solving nonlinear programming problems with weakly convex objective and constraint functions. The algorithm is constructed by incorporating quadratic approximations of both the objective and constraint functions into a proximal Lagrangian framework. We establish its non-asymptotic convergence rate in terms of the total number of subproblems solved. The convergence of QPALM is characterized by three metrics associated with the $\varepsilon$-KKT conditions: the squared norm of the gradient of the Moreau envelope of the Lagrangian, the average constraint violation, and the average complementarity violation. All three metrics are shown to converge at a rate of $O(T^{-1/3})$ after $T$ iterations. Preliminary numerical results demonstrate the practical efficiency of the proposed method. These results are established under two mild conditions: (i) weak convexity of all problem functions, and (ii) the existence of a strictly feasible point. The proposed QPALM is a sequentially strongly convex programming method that is readily implementable.</description>
      <guid isPermaLink="false">arxiv:2605.03415</guid>
      <pubDate>Wed, 06 May 2026 08:25:17 +0000</pubDate>
    </item>
    <item>
      <title>Promoting Fair Online Resource Allocation with Indivisible Units</title>
      <link>http://arxiv.org/abs/2605.03436v1</link>
      <description>Allocating scarce, indivisible resources to diverse groups under uncertainty is a central challenge in operations research, where efficiency-focused methods often underserve marginalized populations. We study the Fair Online Resource Allocation with Indivisible Units (FORA-IU) problem, in which an unpredictable sequence of demands must be served from a strictly fixed inventory, and ask what fairness guarantees are achievable under different distributional and structural assumptions.   We adopt a fairness criterion based on the expected filling ratio (FE-FR-beta), which balances each group's expected allocation against its expected demand and priority weight. We design online policies that calibrate acceptance probabilities to the remaining budget, analyze both arbitrary time-varying and stationary arrivals, introduce the Random Cyclic Blocks (RCB) algorithm tailored to the stationary case, and study the effect of restricting policies to all-or-nothing allocations.   For arbitrary time-varying arrivals, our policy achieves the optimal universal fairness guarantee of 1/(1+R_beta), where R_beta denotes the priority-weighted system load. For time-invariant arrivals, RCB achieves the exact finite-horizon guarantee [1-(1-R_beta/T)^T]/R_beta, which is at least (1-e^{-R_beta})/R_beta and is also tight. We further show that all-or-nothing allocation policies cannot match these guarantees.   These findings demonstrate that distributional stationarity strictly improves the fairness frontier, and that partial fulfillment is a necessary condition for attaining optimal fairness in online indivisible resource allocation.</description>
      <guid isPermaLink="false">arxiv:2605.03436</guid>
      <pubDate>Wed, 06 May 2026 08:25:17 +0000</pubDate>
    </item>
    <item>
      <title>Quantitative Convergence of Proximal Splitting Iterations in Uniformly Convex Metric Spaces</title>
      <link>http://arxiv.org/abs/2605.03484v1</link>
      <description>We provide sufficient conditions for quantitative convergence of the iterates of proximal splitting algorithms for minimizing a sum of functions on a metric space. The theory does not assume that the functions have common minima, nor does it require vanishing proximal parameters or step sizes. Our results are stated for general $p$-uniformly convex spaces with curvature bounded above, and a corollary specializes the main theorem to Hadamard spaces, where many assumptions for the more general setting can be dropped. The theory is demonstrated with computation of Fréchet means in the space of SPD matrices with the affine invariant metric (a Hadamard space) and the sphere with the usual geodesic metric (a CAT($κ$) metric space).</description>
      <guid isPermaLink="false">arxiv:2605.03484</guid>
      <pubDate>Wed, 06 May 2026 08:25:17 +0000</pubDate>
    </item>
    <item>
      <title>Parametrizing Convex Sets Using Sublinear Neural Networks</title>
      <link>http://arxiv.org/abs/2605.03520v1</link>
      <description>We propose a neural parameterization of convex sets by learning sublinear (positively homogeneous and convex) functions. Our networks implicitly represent both the support and gauge functions of a convex body. We prove a universal approximation theorem for convex sets under this parametrization. Empirically, we demonstrate the method on shape optimization and inverse design tasks, achieving accurate reconstruction of target shapes.</description>
      <guid isPermaLink="false">arxiv:2605.03520</guid>
      <pubDate>Wed, 06 May 2026 08:25:17 +0000</pubDate>
    </item>
    <item>
      <title>One-Dimensional Nonnegative Spline Smoothing via Convex Semi-Infinite Programming with a Cutting-Plane Method</title>
      <link>http://arxiv.org/abs/2605.03711v1</link>
      <description>Spline functions are smooth piecewise polynomials widely used for interpolation and smoothing, and nonnegative spline smoothing is also studied for nonnegative data. Previous research used sufficient conditions for the nonnegativity of spline functions because necessary and sufficient conditions for the nonnegativity are infinitely many linear inequalities, which are difficult to handle in optimization algorithms. This conventional method quickly computes a nonnegative spline function via quadratic programming (QP), but the optimal solution may be slightly degraded by using the sufficient condition. In this paper, we express 1D nonnegative spline smoothing as a convex semi-infinite programming (CSIP) problem that directly deals with infinite inequality constraints. As optimization algorithms for general SIP problems, local-reduction-based sequential quadratic programming (LRSQP) methods are used, but their convergence performance deteriorates for certain problems due to multiple approximations during updates. To quickly solve the CSIP problem, we propose a cutting-plane (CP) method. In the proposed method, after giving an initial solution by the standard spline smoothing, we find the minimizer of each polynomial piece by using the closed-form solution for a low-degree polynomial or a numerical solution for a high-degree polynomial. If the minimum value is negative, then that minimizer is added to the constraints of the problem to guarantee the nonnegativity. This constrained problem is quickly solved via QP, and we find the minimizer of each polynomial piece again. We repeat these procedures until there are no negative minimum values. The proposed method guarantees convergence to the original CSIP solution, and its effectiveness is demonstrated in numerical experiments by comparison to the conventional methods, QP under the sufficient condition and CSIP using the MATLAB LRSQP algorithm.</description>
      <guid isPermaLink="false">arxiv:2605.03711</guid>
      <pubDate>Wed, 06 May 2026 08:25:17 +0000</pubDate>
    </item>
    <item>
      <title>Global exponential stabilization of a force- and torque-actuated unicycle by flexible-step MPC</title>
      <link>http://arxiv.org/abs/2605.03726v1</link>
      <description>We study the problem of global exponential stabilization of a force- and torque-controlled unicycle model in discrete time. To this end, we extend a recently introduced approach to model predictive control (MPC) in which a flexible number of inputs is implemented in every iteration. We present the first flexible-step MPC protocol with state-dependent weights for average descent. Notably, the proposed method relies neither on a suitable design of running or terminal cost functions nor on a suitable choice of terminal constraints. Instead, stability is guaranteed through a generalized discrete-time control Lyapunov function. We establish a new theoretical framework for global exponential stabilization of general nonlinear discrete-time control systems by flexible-step MPC. The obtained results go beyond the unicycle example. However, given the importance of the unicycle dynamics, we make that a focal point of our work. For the particular case of the dynamic (second-order) unicycle model, we show that global exponential stability cannot be attained in the classical sense, but in a slightly weaker sense. The proposed flexible-step MPC method is shown to induce the best possible notion of global exponential stability for this model. We provide explicit rules for the choice of parameters, which guarantee feasibility and global exponential stability. Our numerical simulations show that the discrete MPC method also works very well in applications to a continuous-time torque-actuated unicycle.</description>
      <guid isPermaLink="false">arxiv:2605.03726</guid>
      <pubDate>Wed, 06 May 2026 08:25:17 +0000</pubDate>
    </item>
    <item>
      <title>Exact and Evolutionary Algorithms for Sequential Multi-Objective Transmission Topology Planning</title>
      <link>http://arxiv.org/abs/2605.03753v1</link>
      <description>We address day-ahead transmission topology planning and congestion management as a sequential, multi-objective optimization problem and develop two complementary algorithms for it: an exact enumeration method and a tailored evolutionary heuristic. The problem is formulated with four operational objectives reflecting real TSO decision criteria: worst-case line loading under $N-1$ security, topological depth, number of switching actions, and time spent in non-reference topologies, over a 24-hour horizon. We introduce the block algorithm, an exact method that exploits the temporal block structure of feasible strategies to enumerate the complete Pareto front; for fixed operational bounds on depth and switch count, its evaluation count grows polynomially with the planning horizon. We complement it with a multi-objective evolutionary algorithm based on NSGA-III, with structure-guided initialization and problem-specific variation operators tailored to the topology-planning structure. Using real operational data from the Dutch high-voltage grid operated by TenneT TSO, we show that the block algorithm computes the full Pareto front for a highly congested day in under three minutes, and that the evolutionary algorithm converges toward but does not recover the exact front. The block algorithm thus provides both a practical decision-support tool and a ground-truth benchmark for future heuristic and learning-based methods on this problem class.</description>
      <guid isPermaLink="false">arxiv:2605.03753</guid>
      <pubDate>Wed, 06 May 2026 08:25:17 +0000</pubDate>
    </item>
    <item>
      <title>Learning Dominant States in Elementary Resource Constrained Shortest Path Problems</title>
      <link>http://arxiv.org/abs/2605.03760v1</link>
      <description>In this work, we investigate whether machine learning can be leveraged to identify promising states in dynamic programming algorithms, focusing on Elementary Resource Constrained Shortest Path Problems (ERCSPP). In more detail, we solved 41 single resource instances from SPPRCLIB using iterative relaxation techniques through the PathWyse library, systematically collecting all generated states (i.e. labels). We designed ad-hoc features computable in constant time and constructed two datasets: one containing all generated labels (G) and another with only those inserted into data pools (I), totaling several hundred million labels. Machine learning tools are then employed to explore these datasets, revealing significant patterns between successive relaxations. Leveraging these insights, we propose a normalization approach and apply supervised learning techniques to distinguish dominating states, both within subsequent relaxations of the same problem and in previously unseen instances. Our results demonstrate the effectiveness of this approach on Dataset G, while for Dataset I, performance varies, showing strong results within the same instance but declining for unseen ones. Overall, these findings open new perspectives for the development of data-driven dynamic programming algorithms.</description>
      <guid isPermaLink="false">arxiv:2605.03760</guid>
      <pubDate>Wed, 06 May 2026 08:25:17 +0000</pubDate>
    </item>
    <item>
      <title>On the Induced Norms of Matrices and Grothendieck problems</title>
      <link>http://arxiv.org/abs/2605.03772v1</link>
      <description>We study the induced matrix norm $\|\bA\|_{q \to r}$, whose exact value has been known only in a few classical cases. Determining this norm has long been regarded as difficult due to the highly non-convex nature of its variational definition. Existing works offer numerical estimates or analytic bounds but no exact formula. In this paper we present a purely analytic framework that determines $\|\bA\|_{q \to r}$ exactly for all $q, r \ge 1$ for several classes of important matrices. For these matrices, using a direct connection between the induced norms and Grothendieck problems, our results also simultaneously provide exact values for the latter.</description>
      <guid isPermaLink="false">arxiv:2605.03772</guid>
      <pubDate>Wed, 06 May 2026 08:25:17 +0000</pubDate>
    </item>
    <item>
      <title>Computation of entanglement for quantum states by a Consensus-Based Optimization method</title>
      <link>http://arxiv.org/abs/2605.03773v1</link>
      <description>The computation of quantum entanglement can be formulated as a high-dimensional nonconvex optimization problem with orthogonality constraints. In this work, we propose structure-preserving consensus-based optimization (CBO) methods for entanglement computation, with one approach based on a Hermitian formulation and the other evolving directly on the unitary manifold. To handle the variable dimension of the feasible set, we introduce a cross-dimensional interaction mechanism allowing exchange of information between particles of different sizes. Numerical experiments demonstrate that the proposed methods achieve accurate approximations.</description>
      <guid isPermaLink="false">arxiv:2605.03773</guid>
      <pubDate>Wed, 06 May 2026 08:25:17 +0000</pubDate>
    </item>
    <item>
      <title>The Distributionally Robust Cyclic Inventory Routing Problem</title>
      <link>http://arxiv.org/abs/2605.03785v1</link>
      <description>We study the cyclic inventory routing problem that involves joint decisions on vehicle routing and inventory replenishment on an infinite, cyclic horizon. It considers a single warehouse and a set of geographically dispersed retailers. We model retailer demand as random variables with uncertain distributions belonging to a moment-based ambiguity set. We develop a distributionally robust optimization formulation that minimizes the worst-case expected cost over the ambiguity set, while ensuring service reliability through a distributionally robust chance constraint. Our main results are that we prove that the worst-case expected inventory cost is attained under a multi-point distribution, which can be identified a posteriori via linear programming, and that the distributionally robust chance constraint can be reformulated into near-equivalent deterministic forms. This yields a deterministic reformulation of the original problem. To solve it, we design a nested branch-and-price framework, in which the first level partitions retailers into clusters, and the second level concerns routing and replenishment decisions within each cluster. Computational experiments on both synthetic instances and real-world data from SAIC Volkswagen Automobile Co., Ltd. demonstrate the effectiveness and efficiency of the proposed approach.</description>
      <guid isPermaLink="false">arxiv:2605.03785</guid>
      <pubDate>Wed, 06 May 2026 08:25:17 +0000</pubDate>
    </item>
    <item>
      <title>Generalized outer linearizations and extremal properties of rotational epi-symmetrizations</title>
      <link>http://arxiv.org/abs/2605.03797v1</link>
      <description>We develop a functional extension of an extremal principle by Schneider (Monatsh. Math., 1967) by introducing generalized outer linearizations of convex functions. Given a coercive convex function on $\mathbb{R}^n$, a generalized outer linearization is defined as a convex minorant represented by a general but function-dependent set of slopes, thereby extending classical outer representations of convex bodies by supporting halfspaces. This representation converts geometric outer approximations by supporting halfspaces into functional approximations by supporting affine functions, and replaces outer normal data by a dual sampling problem in the domain of the Legendre--Fenchel transform.   On a standard class of coercive convex functions, we derive a general extremal principle, showing that the rotational epi-symmetrization maximizes best approximations under outer linearizations of any monotone, concave functional that is upper semicontinuous with respect to epi-convergence. A central feature of the analysis is that it is carried out in the natural class of coercive, but not necessarily super-coercive, convex functions. Working in this setting introduces intricate topological and variational difficulties, which are addressed using refined duality and epi-convergence arguments.   As an application of our main results, we derive a functional version of Urysohn's inequality, as well as an analytic extension of a classical covering result of Firey and Groemer (J. London Math. Soc., 1964). Finally, we prove an extremal inequality related to the piecewise affine approximation of convex functions.</description>
      <guid isPermaLink="false">arxiv:2605.03797</guid>
      <pubDate>Wed, 06 May 2026 08:25:17 +0000</pubDate>
    </item>
    <item>
      <title>Ball-proximal point method on Hadamard manifolds</title>
      <link>http://arxiv.org/abs/2605.03815v1</link>
      <description>We consider the problem of minimizing a proper, lower semicontinuous, geodesically convex function on a Hadamard manifold. Building on ball-proximal (broximal) ideas in the Euclidean setting, viewed as an abstract proximal-type algorithm, we propose and analyze a Riemannian ball-proximal point method (RB-PPM) whose basic step consists of minimizing the objective function over a metric ball centred at the current iterate. We first introduce the Riemannian broximal map, prove existence and uniqueness of broximal points on Hadamard manifolds, and derive a KKT-type characterization involving a scalar parameter and the Riemannian subdifferential. We then show that RB-PPM enjoys a strict decrease of the squared distance to the solution set whenever the current ball does not contain a minimizer. This leads to quasi-Fejér monotonicity, finite termination for constant radii, and a product-form linear decay of the objective values up to the hitting time of the solution set. We also obtain nonasymptotic complexity bounds for the norms of suitable subgradients and for the function values, including a linear rate in the number of iterations under constant radii. Finally, we establish an asymptotic dichotomy: if the sum of the radii diverges, then the objective values converge to the optimal value, and, when the solution set is nonempty, the entire sequence of iterates converges to a minimizer. The resulting scheme provides a geometry-aware, ball-based analog of classical Riemannian proximal point methods.</description>
      <guid isPermaLink="false">arxiv:2605.03815</guid>
      <pubDate>Wed, 06 May 2026 08:25:17 +0000</pubDate>
    </item>
    <item>
      <title>On Adaptivity in Zeroth-Order Optimization</title>
      <link>http://arxiv.org/abs/2605.03869v1</link>
      <description>We investigate the effectiveness of adaptive zeroth-order (ZO) optimization for memory-constrained fine-tuning of large language models (LLMs). Contrary to prior claims, we show that adaptive ZO methods such as ZO-Adam offer no convergence advantage over well-tuned ZO-SGD, while incurring significant memory overhead. Our analysis reveals that in high dimensions, ZO gradients lack coordinate-wise heterogeneity, rendering adaptive mechanisms memory-inefficient. Leveraging this insight, we propose MEAZO, a memory-efficient adaptive ZO optimizer that tracks only a single scalar for global step size adaptation. We support our method with theoretical convergence guarantees under standard assumptions. Experiments across multiple LLM families and tasks demonstrate that MEAZO matches ZO-Adam's performance with the memory footprint of ZO-SGD. Additional experiments on synthetic quadratic problems and LLM fine-tuning further demonstrate MEAZO's enhanced robustness to step size choices, particularly in grouped or block-structured optimization settings.</description>
      <guid isPermaLink="false">arxiv:2605.03869</guid>
      <pubDate>Wed, 06 May 2026 08:25:17 +0000</pubDate>
    </item>
    <item>
      <title>Extended-variable relaxations for the constrained generalized maximum-entropy sampling problem</title>
      <link>http://arxiv.org/abs/2605.03959v1</link>
      <description>The constrained generalized maximum-entropy sampling problem (CGMESP) is to select an order-s principal submatrix from an order-n covariance matrix, subject to some linear side constraints, so as to maximize the product of its t greatest eigenvalues, 0 &lt; t &lt;= s &lt; n. GMESP refers to the version with no side constraints. Introduced more than 25 years ago, CGMESP is a natural generalization of two fundamental problems in statistical design theory: (i) the constrained maximum-entropy sampling problem (CMESP); (ii) binary D-optimality (D-Opt). In the general case, it can be motivated by a selection problem in the context of principal component analysis (PCA).   We present novel non-convex extended-variable formulations for CGMESP. Using these formulations as points of departure, we present first non-convex and then convex continuous relaxations for CGMESP. We demonstrate many relations between different upper bounds for CGMESP, including upper bounds from the literature and our new upper bounds. We investigate the behavior of our relaxations related to the constraints linking the natural variables with the extended variables. We propose and investigate a generalized scaling technique for bound improvement. In the context of branch-and-bound, we determine the better of two natural branching techniques for fixing variables to zero. Finally, we present numerical experiments illustrating the value of our methods.</description>
      <guid isPermaLink="false">arxiv:2605.03959</guid>
      <pubDate>Wed, 06 May 2026 08:25:17 +0000</pubDate>
    </item>
  </channel>
</rss>
