Cross-lingual transfer learning for statistical type inference

Zhiming Li, Xiaofei Xie, Haoliang Li, Zhengzi Xu, Yi Li, Yang Liu

2022 (modified: 22 Nov 2022)ISSTA 2022Readers: Everyone

Abstract: NOTE OF RETRACTION: The authors, Zhiming Li, Xiaofei Xie, Haoliang Li, Zhengzi Xu, Yi Li, and Yang Liu, of the paper “Cross-lingual transfer learning for statistical type inference” have requested their paper be Retracted due to errors in the paper. The authors all agree the major conclusions are erroneous: 1. (Major) In RQ4, the results of LambadaNet and Typilus baseline methods are erroneous and the PLATO results are implemented without the incorporation of cross-lingual data. And some numbers are recorded erroneously in the table, which makes the important conclusion of the paper “Plato can significantly outperform the baseline” erroneous. 2. (Major) In RQ1, the implementations of the rule-based tools (CheckJS and Pytype) (Page 8) are erroneous, and we find it not possible to compare PLATO with the Pytype tool fairly. This renders the conclusion of the paper “With Plato, one can achieve comparative or even better performance by using cross-lingual labeled data instead of implementing rule-based tool from scratch that requires significant manual effort and expert knowledge.” erroneous. 3. Besides, for RQ1, we realize that the type set used for the Python & TypeScript transfer only uses 6 and 4 meta-types, which are somewhat inconsistent with the description on Page 6. The implementation of the ADV baseline for the Java transfer benchmarks and the supervised_o of TypeScript baselines are erroneous. And the ensemble method used for PLATO is inconsistent with the description in the methodology section. And RQ1 has used an outdated checkpoint of ours (different from the one used in other RQs.) The pre-trained model, training process, and ensemble strategy are implemented in settings somewhat different from the description in the methodology section. 4. The visualizations of Figure 6 & 8 are somewhat inconsistent with real cases. 5. In RQ3, the description of the baseline method (Bert with supervised learning) is wrong (Page 9) (It should be “only trained on partially labeled target language data”). And we find that some tokens are erroneously normalized during preprocessing. And some data points’ results are erroneous, thus “Plato without Kernel” and “PLATO” methods would not achieve as high improvements as claimed. 6. In RQ2, the ablation of the PLATO model is erroneous and we find that the sequence submodel performs better than the kernel submodel (Table 3).

0 Replies