Abstract: Language models with billions of parameters have shown remarkable emergent properties, including the ability to reason on unstructured data. We show that open-science multi-lingual large language models can perform the task of spatial reasoning on two or more entities with significant accuracy. A responsible large language model would perform this spatial reasoning task with the same accuracy regardless of the choice of the names of the entities over which the spatial relationships are defined. However, we show that the accuracies of contemporary large language models are impacted by the choice of proper nouns even when the underlying task ought to be independent of the choice of proper nouns. In this context, we also observe that the conditional log probabilities or beam scores of open-science multi-lingual large language model predictions are not well-calibrated, and the scores do not discriminate between correct and wrong responses in this context.