Weighted optimization for CP tensor decomposition with incomplete data

We explain how to use cp_wopt with the POBLANO toolbox.

Contents

Create an example problem with missing data.

Here we have 25% missing data and 10% noise.

R = 2;
info = create_problem('Size', [15 10 5], 'Num_Factors', R, ...
    'M', 0.25, 'Noise', 0.10);
X = info.Data;
P = info.Pattern;
M_true= info.Soln;

Create initial guess using 'nvecs'

M_init = create_guess('Data', X, 'Num_Factors', R, ...
    'Factor_Generator', 'nvecs');

Set up the optimization parameters

It's genearlly a good idea to consider the parameters of the optimization method. The default options may be either too stringent or not stringent enough. The most important options to consider are detailed here.

% Get the defaults
ncg_opts = ncg('defaults');
% Tighten the stop tolerance (norm of gradient). This is often too large.
ncg_opts.StopTol = 1.0e-6;
% Tighten relative change in function value tolearnce. This is often too large.
ncg_opts.RelFuncTol = 1.0e-20;
% Increase the number of iterations.
ncg_opts.MaxIters = 10^4;
% Only display every 10th iteration
ncg_opts.DisplayIters = 10;
% Display the final set of options
ncg_opts
ncg_opts = 
                   Display: 'iter'
              DisplayIters: 10
           LineSearch_ftol: 1.0000e-04
           LineSearch_gtol: 0.0100
    LineSearch_initialstep: 1
         LineSearch_maxfev: 20
         LineSearch_method: 'more-thuente'
         LineSearch_stpmax: 1.0000e+15
         LineSearch_stpmin: 1.0000e-15
           LineSearch_xtol: 1.0000e-15
              MaxFuncEvals: 10000
                  MaxIters: 10000
                RelFuncTol: 1.0000e-20
              RestartIters: 20
                 RestartNW: 0
              RestartNWTol: 0.1000
                   StopTol: 1.0000e-06
                 TraceFunc: 0
            TraceFuncEvals: 0
                 TraceGrad: 0
             TraceGradNorm: 0
              TraceRelFunc: 0
                    TraceX: 0
                    Update: 'PR'

Call the cp_wopt method

Here is an example call to the cp_opt method. By default, each iteration prints the least squares fit function value (being minimized) and the norm of the gradient. The meaning of any line search warnings can be checked via doc cvsrch.

[M,~,output] = cp_wopt(X, P, R, 'init', M_init, ...
    'alg', 'ncg', 'alg_options', ncg_opts);
 Iter  FuncEvals       F(X)          ||G(X)||/N        
------ --------- ---------------- ----------------
     0         1     185.00205958       0.48027652
    10        47       1.68251600       0.03869586
    20        74       1.51902470       0.00179193
    30        95       1.51868289       0.00009479
    40       115       1.51868158       0.00000174
    45       125       1.51868158       0.00000050

Check the output

It's important to check the output of the optimization method. In particular, it's worthwhile to check the exit flag. A zero (0) indicates successful termination with the gradient smaller than the specified StopTol, and a three (3) indicates a successful termination where the change in function value is less than RelFuncTol. The meaning of any other flags can be checked via doc poblano_params.

exitflag = output.ExitFlag
exitflag =
     0

Evaluate the output

We can "score" the similarity of the model computed by CP and compare that with the truth. The score function on ktensor's gives a score in [0,1] with 1 indicating a perfect match. Because we have noise, we do not expect the fit to be perfect. See doc score for more details.

scr = score(M,M_true)
scr =
    0.9849

Create a SPARSE example problem with missing data.

Here we have 95% missing data and 10% noise.

R = 2;
info = create_problem('Size', [150 100 50], 'Num_Factors', R, ...
    'M', 0.95, 'Sparse_M', true, 'Noise', 0.10);
X = info.Data;
P = info.Pattern;
M_true= info.Soln;

Create initial guess using 'nvecs'

M_init = create_guess('Data', X, 'Num_Factors', R, ...
    'Factor_Generator', 'nvecs');

Set up the optimization parameters

It's genearlly a good idea to consider the parameters of the optimization method. The default options may be either too stringent or not stringent enough. The most important options to consider are detailed here.

% Get the defaults
ncg_opts = ncg('defaults');
% Tighten the stop tolerance (norm of gradient). This is often too large.
ncg_opts.StopTol = 1.0e-6;
% Tighten relative change in function value tolearnce. This is often too large.
ncg_opts.RelFuncTol = 1.0e-20;
% Increase the number of iterations.
ncg_opts.MaxIters = 10^4;
% Only display every 10th iteration
ncg_opts.DisplayIters = 10;
% Display the final set of options
ncg_opts
ncg_opts = 
                   Display: 'iter'
              DisplayIters: 10
           LineSearch_ftol: 1.0000e-04
           LineSearch_gtol: 0.0100
    LineSearch_initialstep: 1
         LineSearch_maxfev: 20
         LineSearch_method: 'more-thuente'
         LineSearch_stpmax: 1.0000e+15
         LineSearch_stpmin: 1.0000e-15
           LineSearch_xtol: 1.0000e-15
              MaxFuncEvals: 10000
                  MaxIters: 10000
                RelFuncTol: 1.0000e-20
              RestartIters: 20
                 RestartNW: 0
              RestartNWTol: 0.1000
                   StopTol: 1.0000e-06
                 TraceFunc: 0
            TraceFuncEvals: 0
                 TraceGrad: 0
             TraceGradNorm: 0
              TraceRelFunc: 0
                    TraceX: 0
                    Update: 'PR'

Call the cp_wopt method

Here is an example call to the cp_opt method. By default, each iteration prints the least squares fit function value (being minimized) and the norm of the gradient. The meaning of any line search warnings can be checked via doc cvsrch.

[M,~,output] = cp_wopt(X, P, R, 'init', M_init, ...
    'alg', 'ncg', 'alg_options', ncg_opts);
 Iter  FuncEvals       F(X)          ||G(X)||/N        
------ --------- ---------------- ----------------
     0         1    1170.62424530       0.02422437
    10        52     190.11900267       0.00893357
    20        96     162.40135695       0.10203161
    30       137      11.42949623       0.00295760
    40       160      11.38681461       0.00002031
    48       176      11.38681239       0.00000096

Check the output

It's important to check the output of the optimization method. In particular, it's worthwhile to check the exit flag. A zero (0) indicates successful termination with the gradient smaller than the specified StopTol, and a three (3) indicates a successful termination where the change in function value is less than RelFuncTol. The meaning of any other flags can be checked via doc poblano_params.

exitflag = output.ExitFlag
exitflag =
     0

Evaluate the output

We can "score" the similarity of the model computed by CP and compare that with the truth. The score function on ktensor's gives a score in [0,1] with 1 indicating a perfect match. Because we have noise, we do not expect the fit to be perfect. See doc score for more details.

scr = score(M,M_true)
scr =
    0.9977