In this paper,  
%we revisit deep learning optimization for feedforward models with smooth activations, and make 
we revisit the NTK analysis with smooth activations and show that a width that is effectively linear in the number of training samples suffices for the NTK at initialization to be positive definite. Our analysis \pcedit{makes novel use of a generalized Hermite series expansion of smooth activation functions. Although standard Hermite series expansions have been used to analyze ReLU activations, such analyses rely heavily on the positive homogeneity of the ReLU, a property that smooth activation functions generally lack. 
\pcdelete{Our work also shows how, when the goal is $\epsilon$-optimality, the dependence of the width on the number of training samples can be improved to linear, which improves on the existing literature, including works based on ReLU activations.}
Finally, our work highlights the importance of the initialization variance \pcedit{in determining a trade-off between tighter Hessian bounds, which favor smaller variances, and larger lower bounds on the minimum eigenvalue of the NTK, which favor larger variances.}}
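
As an illustration only, under simplifying assumptions chosen for this example and not the exact setting of our analysis, consider a two-layer network with a smooth activation $\phi$ (notation introduced here), unit-norm inputs $\|x_i\|_2 = 1$ stacked as the rows of $X$, and first-layer weights $w \sim \mathcal{N}(0, \sigma^2 I)$. Let $h_k = \mathrm{He}_k/\sqrt{k!}$ denote the normalized probabilists' Hermite polynomials, which satisfy $\mathbb{E}[h_j(u)h_k(v)] = \delta_{jk}\rho^k$ for standard Gaussians $u, v$ with correlation $\rho$. Since $w^\top x_i/\sigma$ and $w^\top x_j/\sigma$ are standard Gaussians with correlation $\langle x_i, x_j\rangle$, expanding $\phi'$ in the $\sigma$-scaled Hermite basis gives
\begin{equation*}
c_k(\sigma) = \mathbb{E}_{z \sim \mathcal{N}(0,1)}\big[\phi'(\sigma z)\, h_k(z)\big],
\qquad
G_{ij} := \mathbb{E}_{w}\big[\phi'(w^\top x_i)\, \phi'(w^\top x_j)\big] = \sum_{k \ge 0} c_k(\sigma)^2 \langle x_i, x_j \rangle^k .
\end{equation*}
Thus $G = \sum_{k \ge 0} c_k(\sigma)^2 (XX^\top)^{\odot k}$ is a sum of positive semidefinite Hadamard powers (by the Schur product theorem), so $\lambda_{\min}(G) \ge c_k(\sigma)^2\, \lambda_{\min}\big((XX^\top)^{\odot k}\big)$ for every $k$. This identity uses no homogeneity of $\phi$, and the coefficients $c_k(\sigma)$ make the dependence on the initialization variance explicit.
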
{Given the growing literature on NTK-based analyses of neural network optimization, we hope our work contributes a better theoretical understanding of networks whose width scales favorably with the number of training samples.} %\pccomment{This last sentence needs work. Maybe mention future work too.}.

%reveals a possible tradeoff based on the initialization variance, with smaller values helping the Hessian and RSC analyses and larger values helping the NTK analysis. 