OPTIMIZATION
Mathematical optimization in statistics involves finding the best parameters or solutions that maximize or minimize a particular objective function, such as fitting a model to data or minimizing error, to achieve optimal statistical performance or predictions
Introduction
Continuous optimization
Gradient descent
Gradient descent is the backbone of modern machine learning optimization: it iteratively updates parameters by moving in the direction of steepest descent of the loss function.
Newton's method and BFGS
Newton's method converges quadratically by using second-order information. BFGS approximates the Hessian to get near-Newton performance without the O(n³) cost.
Convex optimization
Convex optimization is the class of problems where every local minimum is global. It unifies linear programming, quadratic programming, and support vector machines under one framework.
Constrained optimization
Linear programming
Linear programming optimizes a linear objective function subject to linear inequality constraints. The feasible region is a convex polytope and the optimum always lies at a vertex.
Simplex method
The Simplex method solves linear programs by moving from vertex to vertex of the feasible polytope, always improving the objective, until no further improvement is possible.
Lagrange multipliers
Lagrange multipliers transform a constrained optimization problem into an unconstrained one by incorporating constraints into the objective via multipliers.
KKT conditions
The KKT conditions generalize Lagrange multipliers to inequality constraints, providing necessary and sufficient optimality conditions for constrained optimization.
Discrete and combinatorial optimization
Integer programming
Integer programming optimizes linear objectives over integer or binary variables. Branch and bound is the fundamental algorithm: it partitions the feasible set and bounds subproblems using LP relaxations.
Dynamic programming
Dynamic programming solves optimization problems with overlapping subproblems and optimal substructure by storing solutions to avoid redundant computation.
Greedy algorithms
Greedy algorithms make the locally best choice at each step without reconsidering past decisions. When the greedy choice property holds, they find the global optimum efficiently.
Graph algorithms
Introduction to graphs
Graphs model relationships between objects. Learn the key concepts, types of graphs, and data structures used to represent them before studying graph algorithms.
Dijkstra's algorithm
Dijkstra's algorithm finds the shortest path from a source to all other vertices in a weighted graph with non-negative edge weights.