[2025-01-18 01:49:16,277][    INFO][__main__] gradient_modifier_config:
mode=<GradientModifierMode.DO_NOTHING: 'do_nothing'> target_modules_to_freeze=[] (factory.py:53)
[2025-01-18 01:49:16,277][    INFO][__main__] mode = <GradientModifierMode.DO_NOTHING: 'do_nothing'> (factory.py:57)
[2025-01-18 01:49:16,278][    INFO][__main__] Creating GradientModifierDoNothing instance ... (factory.py:61)
[2025-01-18 01:49:16,278][    INFO][__main__] Using model without gradient modifications. (gradient_modifier_do_nothing.py:58)
[2025-01-18 01:49:16,278][    INFO][__main__] Returning unmodified model. (gradient_modifier_do_nothing.py:59)
[2025-01-18 01:49:16,278][    INFO][__main__] name = 'roberta.embeddings.word_embeddings.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,278][    INFO][__main__] name = 'roberta.embeddings.position_embeddings.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,278][    INFO][__main__] name = 'roberta.embeddings.token_type_embeddings.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,278][    INFO][__main__] name = 'roberta.embeddings.LayerNorm.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,278][    INFO][__main__] name = 'roberta.embeddings.LayerNorm.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,278][    INFO][__main__] name = 'roberta.encoder.layer.0.attention.self.query.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,278][    INFO][__main__] name = 'roberta.encoder.layer.0.attention.self.query.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,278][    INFO][__main__] name = 'roberta.encoder.layer.0.attention.self.key.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,278][    INFO][__main__] name = 'roberta.encoder.layer.0.attention.self.key.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,278][    INFO][__main__] name = 'roberta.encoder.layer.0.attention.self.value.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,278][    INFO][__main__] name = 'roberta.encoder.layer.0.attention.self.value.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,278][    INFO][__main__] name = 'roberta.encoder.layer.0.attention.output.dense.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,278][    INFO][__main__] name = 'roberta.encoder.layer.0.attention.output.dense.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,278][    INFO][__main__] name = 'roberta.encoder.layer.0.attention.output.LayerNorm.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,278][    INFO][__main__] name = 'roberta.encoder.layer.0.attention.output.LayerNorm.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,278][    INFO][__main__] name = 'roberta.encoder.layer.0.intermediate.dense.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,278][    INFO][__main__] name = 'roberta.encoder.layer.0.intermediate.dense.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,278][    INFO][__main__] name = 'roberta.encoder.layer.0.output.dense.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,278][    INFO][__main__] name = 'roberta.encoder.layer.0.output.dense.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,278][    INFO][__main__] name = 'roberta.encoder.layer.0.output.LayerNorm.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,278][    INFO][__main__] name = 'roberta.encoder.layer.0.output.LayerNorm.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,278][    INFO][__main__] name = 'roberta.encoder.layer.1.attention.self.query.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,279][    INFO][__main__] name = 'roberta.encoder.layer.1.attention.self.query.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,279][    INFO][__main__] name = 'roberta.encoder.layer.1.attention.self.key.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,279][    INFO][__main__] name = 'roberta.encoder.layer.1.attention.self.key.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,279][    INFO][__main__] name = 'roberta.encoder.layer.1.attention.self.value.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,279][    INFO][__main__] name = 'roberta.encoder.layer.1.attention.self.value.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,279][    INFO][__main__] name = 'roberta.encoder.layer.1.attention.output.dense.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,279][    INFO][__main__] name = 'roberta.encoder.layer.1.attention.output.dense.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,279][    INFO][__main__] name = 'roberta.encoder.layer.1.attention.output.LayerNorm.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,279][    INFO][__main__] name = 'roberta.encoder.layer.1.attention.output.LayerNorm.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,279][    INFO][__main__] name = 'roberta.encoder.layer.1.intermediate.dense.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,279][    INFO][__main__] name = 'roberta.encoder.layer.1.intermediate.dense.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,279][    INFO][__main__] name = 'roberta.encoder.layer.1.output.dense.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,279][    INFO][__main__] name = 'roberta.encoder.layer.1.output.dense.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,279][    INFO][__main__] name = 'roberta.encoder.layer.1.output.LayerNorm.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,279][    INFO][__main__] name = 'roberta.encoder.layer.1.output.LayerNorm.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,279][    INFO][__main__] name = 'roberta.encoder.layer.2.attention.self.query.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,279][    INFO][__main__] name = 'roberta.encoder.layer.2.attention.self.query.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,279][    INFO][__main__] name = 'roberta.encoder.layer.2.attention.self.key.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,279][    INFO][__main__] name = 'roberta.encoder.layer.2.attention.self.key.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,279][    INFO][__main__] name = 'roberta.encoder.layer.2.attention.self.value.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,279][    INFO][__main__] name = 'roberta.encoder.layer.2.attention.self.value.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,279][    INFO][__main__] name = 'roberta.encoder.layer.2.attention.output.dense.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,279][    INFO][__main__] name = 'roberta.encoder.layer.2.attention.output.dense.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,279][    INFO][__main__] name = 'roberta.encoder.layer.2.attention.output.LayerNorm.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,279][    INFO][__main__] name = 'roberta.encoder.layer.2.attention.output.LayerNorm.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,279][    INFO][__main__] name = 'roberta.encoder.layer.2.intermediate.dense.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,279][    INFO][__main__] name = 'roberta.encoder.layer.2.intermediate.dense.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,279][    INFO][__main__] name = 'roberta.encoder.layer.2.output.dense.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,279][    INFO][__main__] name = 'roberta.encoder.layer.2.output.dense.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,279][    INFO][__main__] name = 'roberta.encoder.layer.2.output.LayerNorm.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,279][    INFO][__main__] name = 'roberta.encoder.layer.2.output.LayerNorm.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,279][    INFO][__main__] name = 'roberta.encoder.layer.3.attention.self.query.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,280][    INFO][__main__] name = 'roberta.encoder.layer.3.attention.self.query.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,280][    INFO][__main__] name = 'roberta.encoder.layer.3.attention.self.key.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,280][    INFO][__main__] name = 'roberta.encoder.layer.3.attention.self.key.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,280][    INFO][__main__] name = 'roberta.encoder.layer.3.attention.self.value.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,280][    INFO][__main__] name = 'roberta.encoder.layer.3.attention.self.value.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,280][    INFO][__main__] name = 'roberta.encoder.layer.3.attention.output.dense.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,280][    INFO][__main__] name = 'roberta.encoder.layer.3.attention.output.dense.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,280][    INFO][__main__] name = 'roberta.encoder.layer.3.attention.output.LayerNorm.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,280][    INFO][__main__] name = 'roberta.encoder.layer.3.attention.output.LayerNorm.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,280][    INFO][__main__] name = 'roberta.encoder.layer.3.intermediate.dense.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,280][    INFO][__main__] name = 'roberta.encoder.layer.3.intermediate.dense.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,280][    INFO][__main__] name = 'roberta.encoder.layer.3.output.dense.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,280][    INFO][__main__] name = 'roberta.encoder.layer.3.output.dense.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,280][    INFO][__main__] name = 'roberta.encoder.layer.3.output.LayerNorm.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,280][    INFO][__main__] name = 'roberta.encoder.layer.3.output.LayerNorm.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,280][    INFO][__main__] name = 'roberta.encoder.layer.4.attention.self.query.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,280][    INFO][__main__] name = 'roberta.encoder.layer.4.attention.self.query.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,280][    INFO][__main__] name = 'roberta.encoder.layer.4.attention.self.key.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,280][    INFO][__main__] name = 'roberta.encoder.layer.4.attention.self.key.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,280][    INFO][__main__] name = 'roberta.encoder.layer.4.attention.self.value.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,280][    INFO][__main__] name = 'roberta.encoder.layer.4.attention.self.value.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,280][    INFO][__main__] name = 'roberta.encoder.layer.4.attention.output.dense.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,280][    INFO][__main__] name = 'roberta.encoder.layer.4.attention.output.dense.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,280][    INFO][__main__] name = 'roberta.encoder.layer.4.attention.output.LayerNorm.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,280][    INFO][__main__] name = 'roberta.encoder.layer.4.attention.output.LayerNorm.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,280][    INFO][__main__] name = 'roberta.encoder.layer.4.intermediate.dense.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,280][    INFO][__main__] name = 'roberta.encoder.layer.4.intermediate.dense.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,280][    INFO][__main__] name = 'roberta.encoder.layer.4.output.dense.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,280][    INFO][__main__] name = 'roberta.encoder.layer.4.output.dense.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,280][    INFO][__main__] name = 'roberta.encoder.layer.4.output.LayerNorm.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,280][    INFO][__main__] name = 'roberta.encoder.layer.4.output.LayerNorm.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,280][    INFO][__main__] name = 'roberta.encoder.layer.5.attention.self.query.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,281][    INFO][__main__] name = 'roberta.encoder.layer.5.attention.self.query.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,281][    INFO][__main__] name = 'roberta.encoder.layer.5.attention.self.key.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,281][    INFO][__main__] name = 'roberta.encoder.layer.5.attention.self.key.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,281][    INFO][__main__] name = 'roberta.encoder.layer.5.attention.self.value.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,281][    INFO][__main__] name = 'roberta.encoder.layer.5.attention.self.value.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,281][    INFO][__main__] name = 'roberta.encoder.layer.5.attention.output.dense.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,281][    INFO][__main__] name = 'roberta.encoder.layer.5.attention.output.dense.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,281][    INFO][__main__] name = 'roberta.encoder.layer.5.attention.output.LayerNorm.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,281][    INFO][__main__] name = 'roberta.encoder.layer.5.attention.output.LayerNorm.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,281][    INFO][__main__] name = 'roberta.encoder.layer.5.intermediate.dense.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,281][    INFO][__main__] name = 'roberta.encoder.layer.5.intermediate.dense.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,281][    INFO][__main__] name = 'roberta.encoder.layer.5.output.dense.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,281][    INFO][__main__] name = 'roberta.encoder.layer.5.output.dense.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,281][    INFO][__main__] name = 'roberta.encoder.layer.5.output.LayerNorm.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,281][    INFO][__main__] name = 'roberta.encoder.layer.5.output.LayerNorm.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,281][    INFO][__main__] name = 'roberta.encoder.layer.6.attention.self.query.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,281][    INFO][__main__] name = 'roberta.encoder.layer.6.attention.self.query.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,281][    INFO][__main__] name = 'roberta.encoder.layer.6.attention.self.key.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,281][    INFO][__main__] name = 'roberta.encoder.layer.6.attention.self.key.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,281][    INFO][__main__] name = 'roberta.encoder.layer.6.attention.self.value.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,281][    INFO][__main__] name = 'roberta.encoder.layer.6.attention.self.value.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,281][    INFO][__main__] name = 'roberta.encoder.layer.6.attention.output.dense.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,281][    INFO][__main__] name = 'roberta.encoder.layer.6.attention.output.dense.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,281][    INFO][__main__] name = 'roberta.encoder.layer.6.attention.output.LayerNorm.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,281][    INFO][__main__] name = 'roberta.encoder.layer.6.attention.output.LayerNorm.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,281][    INFO][__main__] name = 'roberta.encoder.layer.6.intermediate.dense.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,281][    INFO][__main__] name = 'roberta.encoder.layer.6.intermediate.dense.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,281][    INFO][__main__] name = 'roberta.encoder.layer.6.output.dense.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,281][    INFO][__main__] name = 'roberta.encoder.layer.6.output.dense.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,281][    INFO][__main__] name = 'roberta.encoder.layer.6.output.LayerNorm.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,281][    INFO][__main__] name = 'roberta.encoder.layer.6.output.LayerNorm.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,282][    INFO][__main__] name = 'roberta.encoder.layer.7.attention.self.query.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,282][    INFO][__main__] name = 'roberta.encoder.layer.7.attention.self.query.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,282][    INFO][__main__] name = 'roberta.encoder.layer.7.attention.self.key.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,282][    INFO][__main__] name = 'roberta.encoder.layer.7.attention.self.key.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,282][    INFO][__main__] name = 'roberta.encoder.layer.7.attention.self.value.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,282][    INFO][__main__] name = 'roberta.encoder.layer.7.attention.self.value.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,282][    INFO][__main__] name = 'roberta.encoder.layer.7.attention.output.dense.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,282][    INFO][__main__] name = 'roberta.encoder.layer.7.attention.output.dense.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,282][    INFO][__main__] name = 'roberta.encoder.layer.7.attention.output.LayerNorm.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,282][    INFO][__main__] name = 'roberta.encoder.layer.7.attention.output.LayerNorm.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,282][    INFO][__main__] name = 'roberta.encoder.layer.7.intermediate.dense.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,282][    INFO][__main__] name = 'roberta.encoder.layer.7.intermediate.dense.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,282][    INFO][__main__] name = 'roberta.encoder.layer.7.output.dense.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,282][    INFO][__main__] name = 'roberta.encoder.layer.7.output.dense.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,282][    INFO][__main__] name = 'roberta.encoder.layer.7.output.LayerNorm.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,282][    INFO][__main__] name = 'roberta.encoder.layer.7.output.LayerNorm.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,282][    INFO][__main__] name = 'roberta.encoder.layer.8.attention.self.query.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,282][    INFO][__main__] name = 'roberta.encoder.layer.8.attention.self.query.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,282][    INFO][__main__] name = 'roberta.encoder.layer.8.attention.self.key.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,282][    INFO][__main__] name = 'roberta.encoder.layer.8.attention.self.key.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,283][    INFO][__main__] name = 'roberta.encoder.layer.8.attention.self.value.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,283][    INFO][__main__] name = 'roberta.encoder.layer.8.attention.self.value.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,283][    INFO][__main__] name = 'roberta.encoder.layer.8.attention.output.dense.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,283][    INFO][__main__] name = 'roberta.encoder.layer.8.attention.output.dense.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,283][    INFO][__main__] name = 'roberta.encoder.layer.8.attention.output.LayerNorm.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,283][    INFO][__main__] name = 'roberta.encoder.layer.8.attention.output.LayerNorm.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,283][    INFO][__main__] name = 'roberta.encoder.layer.8.intermediate.dense.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,283][    INFO][__main__] name = 'roberta.encoder.layer.8.intermediate.dense.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,283][    INFO][__main__] name = 'roberta.encoder.layer.8.output.dense.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,283][    INFO][__main__] name = 'roberta.encoder.layer.8.output.dense.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,283][    INFO][__main__] name = 'roberta.encoder.layer.8.output.LayerNorm.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,283][    INFO][__main__] name = 'roberta.encoder.layer.8.output.LayerNorm.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,283][    INFO][__main__] name = 'roberta.encoder.layer.9.attention.self.query.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,283][    INFO][__main__] name = 'roberta.encoder.layer.9.attention.self.query.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,283][    INFO][__main__] name = 'roberta.encoder.layer.9.attention.self.key.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,283][    INFO][__main__] name = 'roberta.encoder.layer.9.attention.self.key.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,283][    INFO][__main__] name = 'roberta.encoder.layer.9.attention.self.value.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,283][    INFO][__main__] name = 'roberta.encoder.layer.9.attention.self.value.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,283][    INFO][__main__] name = 'roberta.encoder.layer.9.attention.output.dense.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,284][    INFO][__main__] name = 'roberta.encoder.layer.9.attention.output.dense.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,284][    INFO][__main__] name = 'roberta.encoder.layer.9.attention.output.LayerNorm.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,284][    INFO][__main__] name = 'roberta.encoder.layer.9.attention.output.LayerNorm.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,284][    INFO][__main__] name = 'roberta.encoder.layer.9.intermediate.dense.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,284][    INFO][__main__] name = 'roberta.encoder.layer.9.intermediate.dense.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,284][    INFO][__main__] name = 'roberta.encoder.layer.9.output.dense.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,284][    INFO][__main__] name = 'roberta.encoder.layer.9.output.dense.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,284][    INFO][__main__] name = 'roberta.encoder.layer.9.output.LayerNorm.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,284][    INFO][__main__] name = 'roberta.encoder.layer.9.output.LayerNorm.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,284][    INFO][__main__] name = 'roberta.encoder.layer.10.attention.self.query.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,284][    INFO][__main__] name = 'roberta.encoder.layer.10.attention.self.query.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,284][    INFO][__main__] name = 'roberta.encoder.layer.10.attention.self.key.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,284][    INFO][__main__] name = 'roberta.encoder.layer.10.attention.self.key.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,284][    INFO][__main__] name = 'roberta.encoder.layer.10.attention.self.value.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,284][    INFO][__main__] name = 'roberta.encoder.layer.10.attention.self.value.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,284][    INFO][__main__] name = 'roberta.encoder.layer.10.attention.output.dense.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,284][    INFO][__main__] name = 'roberta.encoder.layer.10.attention.output.dense.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,284][    INFO][__main__] name = 'roberta.encoder.layer.10.attention.output.LayerNorm.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,284][    INFO][__main__] name = 'roberta.encoder.layer.10.attention.output.LayerNorm.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,284][    INFO][__main__] name = 'roberta.encoder.layer.10.intermediate.dense.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,284][    INFO][__main__] name = 'roberta.encoder.layer.10.intermediate.dense.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,284][    INFO][__main__] name = 'roberta.encoder.layer.10.output.dense.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,284][    INFO][__main__] name = 'roberta.encoder.layer.10.output.dense.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,284][    INFO][__main__] name = 'roberta.encoder.layer.10.output.LayerNorm.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,285][    INFO][__main__] name = 'roberta.encoder.layer.10.output.LayerNorm.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,285][    INFO][__main__] name = 'roberta.encoder.layer.11.attention.self.query.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,285][    INFO][__main__] name = 'roberta.encoder.layer.11.attention.self.query.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,285][    INFO][__main__] name = 'roberta.encoder.layer.11.attention.self.key.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,285][    INFO][__main__] name = 'roberta.encoder.layer.11.attention.self.key.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,285][    INFO][__main__] name = 'roberta.encoder.layer.11.attention.self.value.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,285][    INFO][__main__] name = 'roberta.encoder.layer.11.attention.self.value.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,285][    INFO][__main__] name = 'roberta.encoder.layer.11.attention.output.dense.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,285][    INFO][__main__] name = 'roberta.encoder.layer.11.attention.output.dense.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,285][    INFO][__main__] name = 'roberta.encoder.layer.11.attention.output.LayerNorm.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,285][    INFO][__main__] name = 'roberta.encoder.layer.11.attention.output.LayerNorm.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,285][    INFO][__main__] name = 'roberta.encoder.layer.11.intermediate.dense.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,285][    INFO][__main__] name = 'roberta.encoder.layer.11.intermediate.dense.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,285][    INFO][__main__] name = 'roberta.encoder.layer.11.output.dense.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,285][    INFO][__main__] name = 'roberta.encoder.layer.11.output.dense.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,285][    INFO][__main__] name = 'roberta.encoder.layer.11.output.LayerNorm.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,285][    INFO][__main__] name = 'roberta.encoder.layer.11.output.LayerNorm.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,285][    INFO][__main__] name = 'lm_head.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,285][    INFO][__main__] name = 'lm_head.dense.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,285][    INFO][__main__] name = 'lm_head.dense.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,285][    INFO][__main__] name = 'lm_head.layer_norm.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:49:16,285][    INFO][__main__] name = 'lm_head.layer_norm.bias', param.requires_grad = True (log_model_info.py:66)
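The parameter names listed above can be cross-checked against the trainable-parameter count that the trainer reports later in this log (124,697,433). The sketch below is a minimal check under one assumption: the standard roberta-base dimensions (vocabulary 50,265, hidden size 768, 12 layers, intermediate size 3,072, 514 positions, 1 token type). The log does not state these values. `lm_head.decoder.weight` does not appear in the listing, most likely because it is tied to `roberta.embeddings.word_embeddings.weight` and PyTorch's `named_parameters()` reports shared tensors only once. For that reason, the sketch counts the decoder weight only once.

```python
# Cross-check of the trainable-parameter count for RobertaForMaskedLM.
# Assumption: roberta-base dimensions (the log itself does not state them).
VOCAB = 50_265
HIDDEN = 768
LAYERS = 12
INTERMEDIATE = 3_072
MAX_POSITIONS = 514
TYPE_VOCAB = 1


def linear(n_in: int, n_out: int) -> int:
    """Weight plus bias of an nn.Linear layer."""
    return n_in * n_out + n_out


def layer_norm(dim: int) -> int:
    """Weight plus bias of an nn.LayerNorm layer."""
    return 2 * dim


def embeddings() -> int:
    # word, position and token-type embeddings, then embeddings.LayerNorm
    return (VOCAB + MAX_POSITIONS + TYPE_VOCAB) * HIDDEN + layer_norm(HIDDEN)


def encoder_layer() -> int:
    attention = 4 * linear(HIDDEN, HIDDEN)  # query, key, value, attention.output.dense
    return (
        attention
        + layer_norm(HIDDEN)            # attention.output.LayerNorm
        + linear(HIDDEN, INTERMEDIATE)  # intermediate.dense
        + linear(INTERMEDIATE, HIDDEN)  # output.dense
        + layer_norm(HIDDEN)            # output.LayerNorm
    )


def lm_head() -> int:
    # lm_head.dense, lm_head.layer_norm and lm_head.bias; the decoder weight
    # is tied to word_embeddings, so it is not counted a second time.
    return linear(HIDDEN, HIDDEN) + layer_norm(HIDDEN) + VOCAB


total = embeddings() + LAYERS * encoder_layer() + lm_head()
print(f"{total:,}")  # 124,697,433
```

The sum matches the trainer's figure exactly. This is consistent with every listed parameter having `requires_grad = True` under the `do_nothing` gradient-modifier mode.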

[2025-01-18 01:49:19,049][    INFO][__main__] Calling trainer.train() ... (finetune_model.py:46)
[2025-01-18 01:49:19,161][    INFO][transformers.trainer] The following columns in the training set don't have a corresponding argument in `RobertaForMaskedLM.forward` and have been ignored: text, dialogue_id, turn_index, special_tokens_mask, split. If text, dialogue_id, turn_index, special_tokens_mask, split are not expected by `RobertaForMaskedLM.forward`,  you can safely ignore this message. (trainer.py:910)
[2025-01-18 01:49:19,167][    INFO][transformers.trainer] ***** Running training ***** (trainer.py:2362)
[2025-01-18 01:49:19,167][    INFO][transformers.trainer]   Num examples = 10,000 (trainer.py:2363)
[2025-01-18 01:49:19,167][    INFO][transformers.trainer]   Num Epochs = 5 (trainer.py:2364)
[2025-01-18 01:49:19,167][    INFO][transformers.trainer]   Instantaneous batch size per device = 8 (trainer.py:2365)
[2025-01-18 01:49:19,167][    INFO][transformers.trainer]   Total train batch size (w. parallel, distributed & accumulation) = 16 (trainer.py:2368)
[2025-01-18 01:49:19,167][    INFO][transformers.trainer]   Gradient Accumulation steps = 2 (trainer.py:2369)
[2025-01-18 01:49:19,167][    INFO][transformers.trainer]   Total optimization steps = 3,125 (trainer.py:2370)
[2025-01-18 01:49:19,168][    INFO][transformers.trainer]   Number of trainable parameters = 124,697,433 (trainer.py:2371)
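The optimization-step count in this summary follows from the lines above it. The sketch below approximates the Trainer's arithmetic; it is not the Trainer's actual code. It assumes a single device, which is implied by 8 × 2 = 16, and a dataloader that keeps the last partial batch.

```python
import math

num_examples = 10_000
num_epochs = 5
per_device_batch_size = 8
grad_accum_steps = 2
num_devices = 1  # assumption: implied by 8 x 2 = 16 in the log

# Effective batch size per optimizer step
total_batch_size = per_device_batch_size * num_devices * grad_accum_steps

# Forward/backward batches per epoch on each device (last partial batch kept)
batches_per_epoch = math.ceil(num_examples / (per_device_batch_size * num_devices))

# One optimizer step every grad_accum_steps batches
updates_per_epoch = max(batches_per_epoch // grad_accum_steps, 1)
total_steps = math.ceil(num_epochs * updates_per_epoch)

print(total_batch_size, updates_per_epoch, total_steps)  # 16 625 3125
```

Here 10,000 divides evenly by 16, so the rounding choices make no difference. With a dataset size that does not divide evenly, the floor and ceiling steps would decide how a trailing partial accumulation window is handled.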
