\subsection{Expressive Power}
\label{sec:expressive}

Here we show that the network in \eqref{eq:network} can implement \textit{any} Boolean function over $\gX$. In terms of expressive power, the network is therefore suitable for learning Boolean functions, and for this purpose it is as expressive as a standard one-hidden-layer neural network.

\begin{thm}
\label{thm:expressive}
Let $f:\gX \rightarrow \gY$. Then there exist parameters $\mth$ and a network $N$ of the form \eqref{eq:network} with $r \le 2^D$ neurons such that for all $\vx \in \gX$, $\sign\left(N(\vx; \mth)\right) = f\left(\vx\right)$.
\end{thm}

\begin{proof}
Let $\gX_+ = \group{\vx}{f(\vx) = 1}$, define $r = \left| \gX_+ \right|$, and write $\gX_+ = \{\vx_1, \ldots, \vx_r\}$. Set $c=-1$ and, for each $i \in [r]$, set $\vw_i = \vx_i$ and $b_i = -D+2$. Since the entries of every $\vx \in \gX$ are $\pm 1$, we have $\vw_i \cdot \vx_i = D$ and $\vw_i \cdot \vx \le D-2$ for every $\vx \neq \vx_i$. Hence $\sigma(\vw_i \cdot \vx_i + b_i) = 2$ for every $\vx_i \in \gX_+$, and $\sigma(\vw_i \cdot \vx + b_i) = 0$ for every $\vx \neq \vx_i$. Therefore, for every $\vx \in \gX_+$ exactly one neuron is active and $N(\vx; \mth) = 1$, whereas for every $\vx \in \gX \backslash \gX_+$ no neuron is active and $N(\vx; \mth) = -1$, from which the claim follows.
\end{proof}
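
As an illustration of this construction, consider a minimal worked example under assumptions that are not restated in this section: that $\gX = \{\pm 1\}^D$ and $\gY = \{\pm 1\}$, and that the network in \eqref{eq:network} takes the form $N(\vx; \mth) = c + \sum_{i=1}^{r} \sigma(\vw_i \cdot \vx + b_i)$ with $\sigma$ the ReLU, as the proof above suggests. Take $D = 2$ and let $f(\vx) = 1$ if $x_1 = x_2$ and $f(\vx) = -1$ otherwise, a function that is not linearly separable. Then $\gX_+ = \{(1,1), (-1,-1)\}$, so $r = 2$, $\vw_1 = (1,1)$, $\vw_2 = (-1,-1)$, $b_1 = b_2 = -D+2 = 0$ and $c = -1$, which gives
\[
N(\vx; \mth) = \sigma(x_1 + x_2) + \sigma(-x_1 - x_2) - 1 = \left| x_1 + x_2 \right| - 1 .
\]
Evaluating on the four inputs yields $N((1,1); \mth) = N((-1,-1); \mth) = 1$ and $N((1,-1); \mth) = N((-1,1); \mth) = -1$, so $\sign\left(N(\vx; \mth)\right) = f(\vx)$ on all of $\gX$ in this instance.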

\subsection{Empirical Performance}

\begin{figure*}[t!]
\hspace*{\fill}
\begin{subfigure}{0.4\textwidth}
  \centering
  % include first image
  \includegraphics[width=\linewidth]{figures/D=9_comparsion_no_ntk.png}  
  \caption{}
  \label{fig:D=9_comparison}
\end{subfigure}
\hspace*{\fill}
\begin{subfigure}{0.4\textwidth}
  \centering
  % include second image
  \includegraphics[width=\linewidth]{figures/D=9_reconstraction.png}  
  \caption{}
  \label{fig:D=9_reconstruction}
\end{subfigure}
\hspace*{\fill}

\caption{Learning the read-once DNF $(x_1 \land x_2 \land x_3) \lor (x_4 \land x_5 \land x_6) \lor (x_7 \land x_8 \land x_9)$. (a) Test accuracy of the following models: the convex neural network, a standard two-layer neural network, and an algorithm based on statistical queries (SQ). (b) Accuracy of the DNF recovery procedure in recovering exactly the true DNF from the network weights.
}
\label{fig:motivating}

\end{figure*}


Thus far we have described the setting in which the ground-truth function is a read-once DNF that is learned by a convex neural network. Theorem \ref{thm:expressive} shows that the convex network is sufficiently expressive. However, this does not imply that the network can learn read-once DNFs in practice. To examine this, we performed experiments on learning read-once DNFs under the uniform distribution with the convex network. We compared its test performance to that of a standard two-layer neural network and of an algorithm based on Statistical Queries (SQ) for learning read-once DNFs that has polynomial sample complexity guarantees \citep{mansour2001entropy}. We note that the convex network was implemented with a relatively small initialization. In Section \ref{sec:empirical} and in the Supplementary, we conduct experiments with a convex network with large initialization, which is analogous to training in the NTK regime \citep{chizat2019lazy}.
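
To relate Theorem \ref{thm:expressive} to this experiment, a short count (a sketch, assuming $\gX = \{\pm 1\}^9$) gives the number of neurons that the explicit construction in its proof would use for the DNF in \figref{fig:motivating}. Each term is falsified by $2^3 - 1 = 7$ of the $8$ assignments to its three variables, and the terms share no variables, so
\[
\left| \gX_+ \right| = 2^9 - 7^3 = 512 - 343 = 169 .
\]
That construction would therefore use $r = 169$ neurons, one per positive input, and under the uniform distribution a fraction $169/512 \approx 0.33$ of the inputs is labeled positive.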

\figref{fig:D=9_comparison} shows the evaluation results: the convex network outperforms the other algorithms across all training set sizes.

Together with Theorem \ref{thm:expressive}, these results lead us to conclude that the convex network we consider is a good test-bed for analyzing the inductive bias of neural networks when learning read-once DNFs under the uniform distribution.