function X = solve_for_triu(A, H)
% Solve the linear matrix equation AX + X'A' = H for X upper triangular.
%
% function X = solve_for_triu(A, H)
%
% Given A of size p-by-p and H (symmetric) of size p-by-p, solves the
% linear matrix equation AX + X'A' = H for X upper triangular.
% 
% The total computational cost is O(p^4).
%
% If the same equation is to be solved for X symmetric instead, call
% Matlab's built-in sylvester function.
%
% This is a support function to compute the inverse of QR-based
% retractions.
%
% This algorithm appears as Algorithm 1 in:
%  Empirical Arithmetic Averaging over the Compact Stiefel Manifold,
%  Tetsuya Kaneko, Simone Fiori, Toshihisa Tanaka,
%  IEEE Transactions on Signal Processing, 2013
%
% See also: stiefelfactory rotationsfactory sylvester sylvester_nochecks

% This file is part of Manopt: www.manopt.org.
% Original author: Nicolas Boumal, July 18, 2018.
% Contributors: 
% Change log:
%
%   Aug. 3, 2018 (NB):
%       Initial array of zeros now copies the class of A, so that if A is a
%       regular matrix of doubles it doesn't change anything, but if A is
%       on GPU, then this function also works on GPU.

% One tentative idea to reduce the cost to O(p^3) involves taking an LU
% factorization of A. Then, we obtain a permutation matrix P and
% triangular matrices L (lower) and U (upper) such that PA = LU.
% Since inv(P) = P', the matrix equation becomes:
%
%  P' L U X + X' U' L' P = H
%
% Notice that U*X is still upper triangular, so that we can solve for
% U*X first, and obtain X later by solving an upper triangular system.
% After this change of variables, the system involves P'L instead of A.
% If the permutation happens to be identity, then clearly principal
% submatrices of P'L = L are lower triangular themselves, and as a
% result the linear systems that we need to solve below only cost
% O(pp^2) instead of O(pp^3). Summing for pp = 1 .. p gives O(p^3)
% instead of O(p^4). In general though, P is not the identity
% permutation. In particular, if P' reverses the order of the rows of L,
% so that the first half of the principal submatrices of P'L are full, 
% then we revert back to O(p^4) complexity. Interestingly, for X, Y close
% by, the matrix A = X'*Y that appears in computing the inverse retraction
% is close to identity, so that its LU factorization naturally leads to P
% identity; thus, in such scenario we could reduce the cost to O(p^3)
% without loss of stability due to the LU change of variable.

    p = size(A, 1);
    X = zeros(p, p, class(A));
    for pp = 1 : p
        b = H(1:pp, pp);
        b(end) = b(end)/2;
        b(1:end-1) = b(1:end-1) - X(1:pp-1, 1:pp-1)'*A(pp, 1:pp-1)';
        X(1:pp, pp) = A(1:pp, 1:pp) \ b;
    end
    
end
