Poblano Toolbox v1.1

Limited-Memory BFGS Optimization

Limited-memory quasi-Newton methods [2] are a class of methods that compute and/or maintain simple, compact approximations of the Hessian matrices of second derivatives, which are used in computing search directions. Poblano includes the limited-memory BFGS (L-BFGS) method, a variant of these methods whose Hessian approximations are based on the BFGS method (see [1] for more details).

The Poblano function for the L-BFGS method is called lbfgs.

Contents

    Introduction
    Method Specific Input Parameters
    Default Input Parameters
    Examples
    References

Introduction

The general steps of L-BFGS methods are given below in high-level pseudo-code [2]:
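
The sketch below is illustrative and follows the standard L-BFGS method described in [2]; the step numbering and notation are not taken from Poblano itself. M denotes the limited-memory parameter described under the input parameters below.

    1. Input: starting point x_0, limited-memory parameter M > 0, stopping tolerances
    2. Evaluate f_0 = f(x_0) and g_0 = grad f(x_0); set p_0 = -g_0 and i = 0
    3. while the stopping conditions are not satisfied
    4.     Compute a step length alpha_i using a line search and set x_{i+1} = x_i + alpha_i*p_i
    5.     Evaluate g_{i+1} = grad f(x_{i+1}); store s_i = x_{i+1} - x_i and y_i = g_{i+1} - g_i,
           discarding the oldest stored pair if more than M pairs are kept
    6.     Compute the step direction p_{i+1} = -H_{i+1}*g_{i+1} using the two-loop recursion below
    7.     Set i = i + 1
    8. end while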

Computing the step direction

In Step 6 of the above method, the step direction at iteration i is computed by applying the current limited-memory approximation of the inverse Hessian to the gradient via the standard L-BFGS two-loop recursion [2], which uses only the M most recently stored pairs (s_j, y_j).
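
A minimal Matlab sketch of this two-loop recursion is given below. The function name lbfgs_direction and the convention of storing the most recent s and y vectors as columns of matrices S and Y (oldest first) are assumptions for illustration only; they are not part of Poblano's interface.

function d = lbfgs_direction(g, S, Y)
% Sketch of the L-BFGS two-loop recursion [2]: applies the implicitly stored
% inverse Hessian approximation to the gradient g using the pairs (s_j, y_j).
% Assumes at least one pair is stored (the first iteration typically uses the
% steepest descent direction instead).
%   g : current gradient (n x 1)
%   S : stored s vectors as columns, oldest first (n x m, m <= M)
%   Y : stored y vectors as columns, oldest first (n x m)
%   d : search direction, d = -H*g
m     = size(S,2);
rho   = 1 ./ sum(S .* Y, 1);          % rho_j = 1 / (y_j' * s_j)
alpha = zeros(1,m);
q = g;
for j = m:-1:1                        % newest to oldest
    alpha(j) = rho(j) * (S(:,j)' * q);
    q = q - alpha(j) * Y(:,j);
end
gamma = (S(:,m)' * Y(:,m)) / (Y(:,m)' * Y(:,m));   % scaling for H0 = gamma*I
r = gamma * q;
for j = 1:m                           % oldest to newest
    beta = rho(j) * (Y(:,j)' * r);
    r = r + S(:,j) * (alpha(j) - beta);
end
d = -r;                               % descent direction -H*g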


Method Specific Input Parameters

The input parameters specific to the lbfgs method are presented below. See the Optimization Input Parameters documentation for more details on the Poblano parameters shared across all methods.

M        Limited memory parameter {5}
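
For example, M can be set like any other Poblano parameter via a parameter-value pair; the objective and starting point below are simply those of Example 1 later in this document.

% Use a memory of 10 stored pairs instead of the default of 5
out = lbfgs(@(x) example1(x,3), randn(10,1), 'M', 10);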


Default Input Parameters

The default input parameters are returned with the following call to lbfgs:

params = lbfgs('defaults')
params = 

                   Display: 'iter'
              DisplayIters: 1
           LineSearch_ftol: 0.0001
           LineSearch_gtol: 0.01
    LineSearch_initialstep: 1
         LineSearch_maxfev: 20
         LineSearch_method: 'more-thuente'
         LineSearch_stpmax: 1e+15
         LineSearch_stpmin: 1e-15
           LineSearch_xtol: 1e-15
                         M: 5
              MaxFuncEvals: 10000
                  MaxIters: 1000
                RelFuncTol: 1e-06
                   StopTol: 1e-05
                 TraceFunc: 0
            TraceFuncEvals: 0
                 TraceGrad: 0
             TraceGradNorm: 0
              TraceRelFunc: 0
                    TraceX: 0

See the Optimization Input Parameters documentation for more details on the Poblano parameters shared across all methods.
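
Assuming that, as with the other Poblano methods, the structure returned by lbfgs('defaults') can be passed back in place of parameter-value pairs, a modified copy of the defaults can be used as follows (the objective and starting point are again those of Example 1):

params = lbfgs('defaults');          % start from the default parameters
params.Display = 'final';            % only report the final iteration
params.M = 10;                       % larger limited-memory parameter
out = lbfgs(@(x) example1(x,3), randn(10,1), params);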


Examples

Below are the results of using the lbfgs method in Poblano to solve the example problems that are solved using the ncg method in the Nonlinear Conjugate Gradient Optimization documentation. Note that the different methods lead to slightly different solutions and to different numbers of function evaluations.

Example 1 (from Poblano Examples)

In this example, we have n = 10 and a = 3, and use a random starting point.
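
For reference, the objective in this example has the form f(x) = sum_i sin(a*x_i). A minimal sketch of such an objective/gradient function is shown below; the example1 function distributed with Poblano may differ in details such as its default value of a.

function [f,g] = example1_sketch(x,a)
% f(x) = sum(sin(a*x)) and its gradient; illustrative version of Example 1
if nargin < 2, a = 1; end            % assumed default scaling
f = sum(sin(a*x));
g = a*cos(a*x);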

randn('state',0);
x0 = randn(10,1)
out = lbfgs(@(x) example1(x,3), x0)
x0 =

     -0.43256
      -1.6656
      0.12533
      0.28768
      -1.1465
       1.1909
       1.1892
    -0.037633
      0.32729
      0.17464

 Iter  FuncEvals       F(X)          ||G(X)||/N        
------ --------- ---------------- ----------------
     0         1       1.80545257       0.73811114
     1         5      -4.10636797       0.54564169
     2         9      -5.78542020       0.51884777
     3        14      -7.58881458       0.30353965
     4        18      -8.59871546       0.37732775
     5        21      -9.18998855       0.36848976
     6        24      -9.85388944       0.16082452
     7        26      -9.99944665       0.00997994
     8        28      -9.99995225       0.00293180
     9        30      -9.99999998       0.00006204
    10        32     -10.00000000       0.00000157

out = 

             Params: [1x1 inputParser]
           ExitFlag: 0
    ExitDescription: 'Successful termination based on StopTol'
                  X: [10x1 double]
                  F: -10
                  G: [10x1 double]
          FuncEvals: 32
              Iters: 10

Example 2 (from Poblano Examples)

In this example, we compute a rank-4 approximation to a 4 x 4 Pascal matrix (generated using the Matlab function pascal(4)). The starting point is a random vector. Note that, in the interest of space, Poblano is set to display only the final iteration in this example.
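
For reference, the objective in this example is a matrix factorization error of the form f = 0.5*||A - U*V'||_F^2, with the factors U (m x k) and V (n x k) stacked in the vector x. The sketch below is illustrative only; the exact scaling, the ordering of the factors within x, and other details of the example2 and example2_extract functions distributed with Poblano may differ.

function [f,g] = example2_sketch(x, Data)
% Illustrative rank-k factorization objective: f = 0.5*||A - U*V'||_F^2
[m,n] = size(Data.A);
k = Data.rank;
U = reshape(x(1:m*k), m, k);         % first block of x -> U
V = reshape(x(m*k+1:end), n, k);     % second block of x -> V
R = Data.A - U*V';                   % residual matrix
f = 0.5*norm(R,'fro')^2;
g = [-reshape(R*V, m*k, 1); ...      % df/dU, vectorized
     -reshape(R'*U, n*k, 1)];        % df/dV, vectorized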

m = 4; n = 4; k = 4;
Data.rank = k;
Data.A = pascal(m);
randn('state',0);
x0 = randn((m+n)*k,1);
out = lbfgs(@(x) example2(x,Data), x0, 'Display', 'final')
 Iter  FuncEvals       F(X)          ||G(X)||/N        
------ --------- ---------------- ----------------
    61       128       0.00000004       0.00000971

out = 

             Params: [1x1 inputParser]
           ExitFlag: 0
    ExitDescription: 'Successful termination based on StopTol'
                  X: [32x1 double]
                  F: 4.1175e-08
                  G: [32x1 double]
          FuncEvals: 128
              Iters: 61

For the ncg method, an out.ExitFlag value greater than 0 indicated that the method did not converge to the specified tolerance (i.e., using the default StopTol input parameter value of 1e-5) because the maximum number of function evaluations was exceeded; the remedy in that case is to increase the maximum numbers of function evaluations and iterations allowed. Here, lbfgs converges within the default limits (out.ExitFlag is 0), so rerunning with explicitly increased MaxIters and MaxFuncEvals values, as shown below, returns the same solution.

out = lbfgs(@(x) example2(x,Data), x0, 'MaxIters',1000, ...
    'MaxFuncEvals',10000,'Display','final')
 Iter  FuncEvals       F(X)          ||G(X)||/N        
------ --------- ---------------- ----------------
    61       128       0.00000004       0.00000971

out = 

             Params: [1x1 inputParser]
           ExitFlag: 0
    ExitDescription: 'Successful termination based on StopTol'
                  X: [32x1 double]
                  F: 4.1175e-08
                  G: [32x1 double]
          FuncEvals: 128
              Iters: 61

Verifying the solution, we see that the optimizer finds a matrix decomposition that fits the matrix with very small relative error (consistent with the stopping tolerance of 1e-5 used by the optimizer).

[U,V] = example2_extract(m,n,k,out.X);
norm(Data.A-U*V')/norm(Data.A)
ans =

   1.0273e-05

For Example 2, we see that lbfgs requires fewer function evaluations than ncg to solve the problem (128 versus 175). Performance of the different methods in Poblano depends on both the method and the particular parameters chosen. Thus, it is recommended that several test runs on smaller problems be performed initially using the different methods to help decide which method and set of parameters works best for a particular class of problems.
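
A quick way to perform such a comparison is to run the methods side by side on a small problem and inspect the output structures. The snippet below is only an illustration using Example 1, with function evaluations as one possible criterion.

% Compare ncg and lbfgs on the small Example 1 problem
randn('state',0);
x0 = randn(10,1);
out_ncg   = ncg(@(x) example1(x,3), x0, 'Display', 'final');
out_lbfgs = lbfgs(@(x) example1(x,3), x0, 'Display', 'final');
[out_ncg.FuncEvals, out_lbfgs.FuncEvals]     % total function evaluations used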


References

[1] Dennis, Jr., J. E. and Schnabel, R. B. (1996). Numerical Methods for Unconstrained Optimization and Nonlinear Equations. SIAM.

[2] Nocedal, J. and Wright, S. J. (1999). Numerical Optimization. Springer.

