In the paper titled 'Topically Driven Neural Language Model', the authors note that the multi-layer LSTMs are vanilla stacked LSTMs without two modules: skip connections and a second module proposed in a related paper you have read. Provide the full title of that paper.