Keywords: Generative Model, Lookahead bias, Lookback bias, Empirical social science, Foundational Model, Firm growth, Innovation
TL;DR: We introduce NoLBERT, a timestamped foundational model that avoids both lookahead and lookback biases. We demonstrate its applicability in econometric analyses of the relationship between innovation and firm growth.
Abstract: We present NoLBERT, a lightweight, timestamped foundational language model for empirical research, particularly for forecasting in economics, finance, and the social sciences. By pretraining exclusively on text from 1976 to 1995, NoLBERT avoids both *lookback* and *lookahead* biases (information leakage) that can undermine econometric inference. It exceeds domain-specific baselines on NLP benchmarks while maintaining temporal consistency. Applied to patent texts, NoLBERT enables the construction of firm-level innovation networks and shows that gains in innovation centrality predict higher long-run profit growth.
Submission Number: 106