[2025-01-18 01:48:35,843][    INFO][__main__] gradient_modifier_config:
mode=<GradientModifierMode.FREEZE_LAYERS: 'freeze_layers'> target_modules_to_freeze=['lm_head'] (factory.py:53)
[2025-01-18 01:48:35,843][    INFO][__main__] mode = <GradientModifierMode.FREEZE_LAYERS: 'freeze_layers'> (factory.py:57)
[2025-01-18 01:48:35,843][    INFO][__main__] Creating GradientModifierFreezeLayers instance ... (factory.py:69)
[2025-01-18 01:48:35,843][    INFO][__main__] Freezing layers ... (gradient_modifier_freeze_layers.py:67)
[2025-01-18 01:48:35,844][    INFO][__main__] name = 'roberta.embeddings.word_embeddings.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,844][    INFO][__main__] name = 'roberta.embeddings.position_embeddings.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,844][    INFO][__main__] name = 'roberta.embeddings.token_type_embeddings.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,844][    INFO][__main__] name = 'roberta.embeddings.LayerNorm.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,844][    INFO][__main__] name = 'roberta.embeddings.LayerNorm.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,844][    INFO][__main__] name = 'roberta.encoder.layer.0.attention.self.query.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,844][    INFO][__main__] name = 'roberta.encoder.layer.0.attention.self.query.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,844][    INFO][__main__] name = 'roberta.encoder.layer.0.attention.self.key.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,844][    INFO][__main__] name = 'roberta.encoder.layer.0.attention.self.key.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,844][    INFO][__main__] name = 'roberta.encoder.layer.0.attention.self.value.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,844][    INFO][__main__] name = 'roberta.encoder.layer.0.attention.self.value.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,844][    INFO][__main__] name = 'roberta.encoder.layer.0.attention.output.dense.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,844][    INFO][__main__] name = 'roberta.encoder.layer.0.attention.output.dense.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,844][    INFO][__main__] name = 'roberta.encoder.layer.0.attention.output.LayerNorm.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,844][    INFO][__main__] name = 'roberta.encoder.layer.0.attention.output.LayerNorm.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,844][    INFO][__main__] name = 'roberta.encoder.layer.0.intermediate.dense.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,844][    INFO][__main__] name = 'roberta.encoder.layer.0.intermediate.dense.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,844][    INFO][__main__] name = 'roberta.encoder.layer.0.output.dense.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,844][    INFO][__main__] name = 'roberta.encoder.layer.0.output.dense.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,844][    INFO][__main__] name = 'roberta.encoder.layer.0.output.LayerNorm.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,844][    INFO][__main__] name = 'roberta.encoder.layer.0.output.LayerNorm.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,844][    INFO][__main__] name = 'roberta.encoder.layer.1.attention.self.query.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,844][    INFO][__main__] name = 'roberta.encoder.layer.1.attention.self.query.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,845][    INFO][__main__] name = 'roberta.encoder.layer.1.attention.self.key.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,845][    INFO][__main__] name = 'roberta.encoder.layer.1.attention.self.key.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,845][    INFO][__main__] name = 'roberta.encoder.layer.1.attention.self.value.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,845][    INFO][__main__] name = 'roberta.encoder.layer.1.attention.self.value.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,845][    INFO][__main__] name = 'roberta.encoder.layer.1.attention.output.dense.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,845][    INFO][__main__] name = 'roberta.encoder.layer.1.attention.output.dense.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,845][    INFO][__main__] name = 'roberta.encoder.layer.1.attention.output.LayerNorm.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,845][    INFO][__main__] name = 'roberta.encoder.layer.1.attention.output.LayerNorm.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,845][    INFO][__main__] name = 'roberta.encoder.layer.1.intermediate.dense.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,845][    INFO][__main__] name = 'roberta.encoder.layer.1.intermediate.dense.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,845][    INFO][__main__] name = 'roberta.encoder.layer.1.output.dense.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,845][    INFO][__main__] name = 'roberta.encoder.layer.1.output.dense.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,845][    INFO][__main__] name = 'roberta.encoder.layer.1.output.LayerNorm.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,845][    INFO][__main__] name = 'roberta.encoder.layer.1.output.LayerNorm.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,845][    INFO][__main__] name = 'roberta.encoder.layer.2.attention.self.query.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,845][    INFO][__main__] name = 'roberta.encoder.layer.2.attention.self.query.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,845][    INFO][__main__] name = 'roberta.encoder.layer.2.attention.self.key.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,845][    INFO][__main__] name = 'roberta.encoder.layer.2.attention.self.key.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,845][    INFO][__main__] name = 'roberta.encoder.layer.2.attention.self.value.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,845][    INFO][__main__] name = 'roberta.encoder.layer.2.attention.self.value.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,845][    INFO][__main__] name = 'roberta.encoder.layer.2.attention.output.dense.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,845][    INFO][__main__] name = 'roberta.encoder.layer.2.attention.output.dense.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,845][    INFO][__main__] name = 'roberta.encoder.layer.2.attention.output.LayerNorm.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,845][    INFO][__main__] name = 'roberta.encoder.layer.2.attention.output.LayerNorm.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,845][    INFO][__main__] name = 'roberta.encoder.layer.2.intermediate.dense.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,845][    INFO][__main__] name = 'roberta.encoder.layer.2.intermediate.dense.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,845][    INFO][__main__] name = 'roberta.encoder.layer.2.output.dense.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,845][    INFO][__main__] name = 'roberta.encoder.layer.2.output.dense.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,845][    INFO][__main__] name = 'roberta.encoder.layer.2.output.LayerNorm.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,845][    INFO][__main__] name = 'roberta.encoder.layer.2.output.LayerNorm.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,845][    INFO][__main__] name = 'roberta.encoder.layer.3.attention.self.query.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,845][    INFO][__main__] name = 'roberta.encoder.layer.3.attention.self.query.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,846][    INFO][__main__] name = 'roberta.encoder.layer.3.attention.self.key.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,846][    INFO][__main__] name = 'roberta.encoder.layer.3.attention.self.key.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,846][    INFO][__main__] name = 'roberta.encoder.layer.3.attention.self.value.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,846][    INFO][__main__] name = 'roberta.encoder.layer.3.attention.self.value.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,846][    INFO][__main__] name = 'roberta.encoder.layer.3.attention.output.dense.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,846][    INFO][__main__] name = 'roberta.encoder.layer.3.attention.output.dense.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,846][    INFO][__main__] name = 'roberta.encoder.layer.3.attention.output.LayerNorm.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,846][    INFO][__main__] name = 'roberta.encoder.layer.3.attention.output.LayerNorm.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,846][    INFO][__main__] name = 'roberta.encoder.layer.3.intermediate.dense.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,846][    INFO][__main__] name = 'roberta.encoder.layer.3.intermediate.dense.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,846][    INFO][__main__] name = 'roberta.encoder.layer.3.output.dense.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,846][    INFO][__main__] name = 'roberta.encoder.layer.3.output.dense.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,846][    INFO][__main__] name = 'roberta.encoder.layer.3.output.LayerNorm.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,846][    INFO][__main__] name = 'roberta.encoder.layer.3.output.LayerNorm.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,846][    INFO][__main__] name = 'roberta.encoder.layer.4.attention.self.query.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,846][    INFO][__main__] name = 'roberta.encoder.layer.4.attention.self.query.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,846][    INFO][__main__] name = 'roberta.encoder.layer.4.attention.self.key.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,846][    INFO][__main__] name = 'roberta.encoder.layer.4.attention.self.key.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,846][    INFO][__main__] name = 'roberta.encoder.layer.4.attention.self.value.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,846][    INFO][__main__] name = 'roberta.encoder.layer.4.attention.self.value.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,846][    INFO][__main__] name = 'roberta.encoder.layer.4.attention.output.dense.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,846][    INFO][__main__] name = 'roberta.encoder.layer.4.attention.output.dense.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,846][    INFO][__main__] name = 'roberta.encoder.layer.4.attention.output.LayerNorm.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,846][    INFO][__main__] name = 'roberta.encoder.layer.4.attention.output.LayerNorm.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,846][    INFO][__main__] name = 'roberta.encoder.layer.4.intermediate.dense.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,846][    INFO][__main__] name = 'roberta.encoder.layer.4.intermediate.dense.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,846][    INFO][__main__] name = 'roberta.encoder.layer.4.output.dense.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,846][    INFO][__main__] name = 'roberta.encoder.layer.4.output.dense.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,846][    INFO][__main__] name = 'roberta.encoder.layer.4.output.LayerNorm.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,846][    INFO][__main__] name = 'roberta.encoder.layer.4.output.LayerNorm.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,846][    INFO][__main__] name = 'roberta.encoder.layer.5.attention.self.query.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,846][    INFO][__main__] name = 'roberta.encoder.layer.5.attention.self.query.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,846][    INFO][__main__] name = 'roberta.encoder.layer.5.attention.self.key.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,846][    INFO][__main__] name = 'roberta.encoder.layer.5.attention.self.key.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,847][    INFO][__main__] name = 'roberta.encoder.layer.5.attention.self.value.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,847][    INFO][__main__] name = 'roberta.encoder.layer.5.attention.self.value.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,847][    INFO][__main__] name = 'roberta.encoder.layer.5.attention.output.dense.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,847][    INFO][__main__] name = 'roberta.encoder.layer.5.attention.output.dense.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,847][    INFO][__main__] name = 'roberta.encoder.layer.5.attention.output.LayerNorm.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,847][    INFO][__main__] name = 'roberta.encoder.layer.5.attention.output.LayerNorm.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,847][    INFO][__main__] name = 'roberta.encoder.layer.5.intermediate.dense.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,847][    INFO][__main__] name = 'roberta.encoder.layer.5.intermediate.dense.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,847][    INFO][__main__] name = 'roberta.encoder.layer.5.output.dense.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,847][    INFO][__main__] name = 'roberta.encoder.layer.5.output.dense.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,847][    INFO][__main__] name = 'roberta.encoder.layer.5.output.LayerNorm.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,847][    INFO][__main__] name = 'roberta.encoder.layer.5.output.LayerNorm.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,847][    INFO][__main__] name = 'roberta.encoder.layer.6.attention.self.query.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,847][    INFO][__main__] name = 'roberta.encoder.layer.6.attention.self.query.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,847][    INFO][__main__] name = 'roberta.encoder.layer.6.attention.self.key.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,847][    INFO][__main__] name = 'roberta.encoder.layer.6.attention.self.key.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,847][    INFO][__main__] name = 'roberta.encoder.layer.6.attention.self.value.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,847][    INFO][__main__] name = 'roberta.encoder.layer.6.attention.self.value.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,847][    INFO][__main__] name = 'roberta.encoder.layer.6.attention.output.dense.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,847][    INFO][__main__] name = 'roberta.encoder.layer.6.attention.output.dense.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,847][    INFO][__main__] name = 'roberta.encoder.layer.6.attention.output.LayerNorm.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,847][    INFO][__main__] name = 'roberta.encoder.layer.6.attention.output.LayerNorm.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,848][    INFO][__main__] name = 'roberta.encoder.layer.6.intermediate.dense.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,848][    INFO][__main__] name = 'roberta.encoder.layer.6.intermediate.dense.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,848][    INFO][__main__] name = 'roberta.encoder.layer.6.output.dense.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,848][    INFO][__main__] name = 'roberta.encoder.layer.6.output.dense.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,848][    INFO][__main__] name = 'roberta.encoder.layer.6.output.LayerNorm.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,848][    INFO][__main__] name = 'roberta.encoder.layer.6.output.LayerNorm.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,848][    INFO][__main__] name = 'roberta.encoder.layer.7.attention.self.query.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,848][    INFO][__main__] name = 'roberta.encoder.layer.7.attention.self.query.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,848][    INFO][__main__] name = 'roberta.encoder.layer.7.attention.self.key.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,848][    INFO][__main__] name = 'roberta.encoder.layer.7.attention.self.key.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,848][    INFO][__main__] name = 'roberta.encoder.layer.7.attention.self.value.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,848][    INFO][__main__] name = 'roberta.encoder.layer.7.attention.self.value.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,848][    INFO][__main__] name = 'roberta.encoder.layer.7.attention.output.dense.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,848][    INFO][__main__] name = 'roberta.encoder.layer.7.attention.output.dense.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,848][    INFO][__main__] name = 'roberta.encoder.layer.7.attention.output.LayerNorm.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,848][    INFO][__main__] name = 'roberta.encoder.layer.7.attention.output.LayerNorm.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,848][    INFO][__main__] name = 'roberta.encoder.layer.7.intermediate.dense.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,848][    INFO][__main__] name = 'roberta.encoder.layer.7.intermediate.dense.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,848][    INFO][__main__] name = 'roberta.encoder.layer.7.output.dense.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,848][    INFO][__main__] name = 'roberta.encoder.layer.7.output.dense.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,848][    INFO][__main__] name = 'roberta.encoder.layer.7.output.LayerNorm.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,848][    INFO][__main__] name = 'roberta.encoder.layer.7.output.LayerNorm.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,848][    INFO][__main__] name = 'roberta.encoder.layer.8.attention.self.query.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,848][    INFO][__main__] name = 'roberta.encoder.layer.8.attention.self.query.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,849][    INFO][__main__] name = 'roberta.encoder.layer.8.attention.self.key.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,849][    INFO][__main__] name = 'roberta.encoder.layer.8.attention.self.key.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,849][    INFO][__main__] name = 'roberta.encoder.layer.8.attention.self.value.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,849][    INFO][__main__] name = 'roberta.encoder.layer.8.attention.self.value.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,849][    INFO][__main__] name = 'roberta.encoder.layer.8.attention.output.dense.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,849][    INFO][__main__] name = 'roberta.encoder.layer.8.attention.output.dense.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,849][    INFO][__main__] name = 'roberta.encoder.layer.8.attention.output.LayerNorm.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,849][    INFO][__main__] name = 'roberta.encoder.layer.8.attention.output.LayerNorm.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,849][    INFO][__main__] name = 'roberta.encoder.layer.8.intermediate.dense.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,849][    INFO][__main__] name = 'roberta.encoder.layer.8.intermediate.dense.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,849][    INFO][__main__] name = 'roberta.encoder.layer.8.output.dense.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,849][    INFO][__main__] name = 'roberta.encoder.layer.8.output.dense.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,849][    INFO][__main__] name = 'roberta.encoder.layer.8.output.LayerNorm.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,849][    INFO][__main__] name = 'roberta.encoder.layer.8.output.LayerNorm.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,849][    INFO][__main__] name = 'roberta.encoder.layer.9.attention.self.query.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,849][    INFO][__main__] name = 'roberta.encoder.layer.9.attention.self.query.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,849][    INFO][__main__] name = 'roberta.encoder.layer.9.attention.self.key.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,849][    INFO][__main__] name = 'roberta.encoder.layer.9.attention.self.key.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,849][    INFO][__main__] name = 'roberta.encoder.layer.9.attention.self.value.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,849][    INFO][__main__] name = 'roberta.encoder.layer.9.attention.self.value.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,849][    INFO][__main__] name = 'roberta.encoder.layer.9.attention.output.dense.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,849][    INFO][__main__] name = 'roberta.encoder.layer.9.attention.output.dense.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,849][    INFO][__main__] name = 'roberta.encoder.layer.9.attention.output.LayerNorm.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,849][    INFO][__main__] name = 'roberta.encoder.layer.9.attention.output.LayerNorm.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,849][    INFO][__main__] name = 'roberta.encoder.layer.9.intermediate.dense.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,849][    INFO][__main__] name = 'roberta.encoder.layer.9.intermediate.dense.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,849][    INFO][__main__] name = 'roberta.encoder.layer.9.output.dense.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,850][    INFO][__main__] name = 'roberta.encoder.layer.9.output.dense.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,850][    INFO][__main__] name = 'roberta.encoder.layer.9.output.LayerNorm.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,850][    INFO][__main__] name = 'roberta.encoder.layer.9.output.LayerNorm.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,850][    INFO][__main__] name = 'roberta.encoder.layer.10.attention.self.query.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,850][    INFO][__main__] name = 'roberta.encoder.layer.10.attention.self.query.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,850][    INFO][__main__] name = 'roberta.encoder.layer.10.attention.self.key.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,850][    INFO][__main__] name = 'roberta.encoder.layer.10.attention.self.key.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,850][    INFO][__main__] name = 'roberta.encoder.layer.10.attention.self.value.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,850][    INFO][__main__] name = 'roberta.encoder.layer.10.attention.self.value.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,850][    INFO][__main__] name = 'roberta.encoder.layer.10.attention.output.dense.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,850][    INFO][__main__] name = 'roberta.encoder.layer.10.attention.output.dense.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,850][    INFO][__main__] name = 'roberta.encoder.layer.10.attention.output.LayerNorm.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,850][    INFO][__main__] name = 'roberta.encoder.layer.10.attention.output.LayerNorm.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,850][    INFO][__main__] name = 'roberta.encoder.layer.10.intermediate.dense.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,850][    INFO][__main__] name = 'roberta.encoder.layer.10.intermediate.dense.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,850][    INFO][__main__] name = 'roberta.encoder.layer.10.output.dense.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,850][    INFO][__main__] name = 'roberta.encoder.layer.10.output.dense.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,850][    INFO][__main__] name = 'roberta.encoder.layer.10.output.LayerNorm.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,850][    INFO][__main__] name = 'roberta.encoder.layer.10.output.LayerNorm.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,850][    INFO][__main__] name = 'roberta.encoder.layer.11.attention.self.query.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,850][    INFO][__main__] name = 'roberta.encoder.layer.11.attention.self.query.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,850][    INFO][__main__] name = 'roberta.encoder.layer.11.attention.self.key.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,850][    INFO][__main__] name = 'roberta.encoder.layer.11.attention.self.key.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,850][    INFO][__main__] name = 'roberta.encoder.layer.11.attention.self.value.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,851][    INFO][__main__] name = 'roberta.encoder.layer.11.attention.self.value.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,851][    INFO][__main__] name = 'roberta.encoder.layer.11.attention.output.dense.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,851][    INFO][__main__] name = 'roberta.encoder.layer.11.attention.output.dense.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,851][    INFO][__main__] name = 'roberta.encoder.layer.11.attention.output.LayerNorm.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,851][    INFO][__main__] name = 'roberta.encoder.layer.11.attention.output.LayerNorm.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,851][    INFO][__main__] name = 'roberta.encoder.layer.11.intermediate.dense.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,851][    INFO][__main__] name = 'roberta.encoder.layer.11.intermediate.dense.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,851][    INFO][__main__] name = 'roberta.encoder.layer.11.output.dense.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,851][    INFO][__main__] name = 'roberta.encoder.layer.11.output.dense.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,851][    INFO][__main__] name = 'roberta.encoder.layer.11.output.LayerNorm.weight', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,851][    INFO][__main__] name = 'roberta.encoder.layer.11.output.LayerNorm.bias', param.requires_grad = True (log_model_info.py:66)
[2025-01-18 01:48:35,851][    INFO][__main__] name = 'lm_head.bias', param.requires_grad = False (log_model_info.py:66)
[2025-01-18 01:48:35,852][    INFO][__main__] name = 'lm_head.dense.weight', param.requires_grad = False (log_model_info.py:66)
[2025-01-18 01:48:35,852][    INFO][__main__] name = 'lm_head.dense.bias', param.requires_grad = False (log_model_info.py:66)
[2025-01-18 01:48:35,852][    INFO][__main__] name = 'lm_head.layer_norm.weight', param.requires_grad = False (log_model_info.py:66)
[2025-01-18 01:48:35,852][    INFO][__main__] name = 'lm_head.layer_norm.bias', param.requires_grad = False (log_model_info.py:66)
[2025-01-18 01:48:35,852][    INFO][__main__] Freezing layers DONE. (gradient_modifier_freeze_layers.py:88)
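
The entries above show every `lm_head.*` parameter with `requires_grad = False` and every `roberta.*` parameter still trainable, consistent with `target_modules_to_freeze=['lm_head']`. A minimal sketch of name-prefix freezing that would produce this pattern follows. It is not the code in `gradient_modifier_freeze_layers.py`; the helper `freeze_by_prefix` and the stand-in parameter objects are hypothetical and only mimic the `(name, param)` pairs that PyTorch's `named_parameters()` yields.

```python
from types import SimpleNamespace


def freeze_by_prefix(named_params, prefixes):
    """Set requires_grad=False on parameters whose name falls under a prefix.

    `named_params` is any iterable of (name, param) pairs, e.g. the result of
    `model.named_parameters()` on a PyTorch module. Returns the frozen names.
    """
    frozen = []
    for name, param in named_params:
        # Match whole module names only, so "lm_head" does not match "lm_head2".
        if any(name == p or name.startswith(p + ".") for p in prefixes):
            param.requires_grad = False
            frozen.append(name)
    return frozen


# Stand-in objects (assumption: used here so the sketch runs without PyTorch).
params = [
    ("roberta.embeddings.word_embeddings.weight", SimpleNamespace(requires_grad=True)),
    ("roberta.encoder.layer.0.attention.self.query.weight", SimpleNamespace(requires_grad=True)),
    ("lm_head.bias", SimpleNamespace(requires_grad=True)),
    ("lm_head.dense.weight", SimpleNamespace(requires_grad=True)),
]

frozen = freeze_by_prefix(params, ["lm_head"])
status = {name: p.requires_grad for name, p in params}
for name, flag in status.items():
    print(f"name = {name!r}, param.requires_grad = {flag}")
```

With a real model the call would presumably be `freeze_by_prefix(model.named_parameters(), ["lm_head"])`.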

[2025-01-18 01:48:38,685][    INFO][__main__] Calling trainer.train() ... (finetune_model.py:46)
[2025-01-18 01:48:38,763][    INFO][transformers.trainer] The following columns in the training set don't have a corresponding argument in `RobertaForMaskedLM.forward` and have been ignored: split, special_tokens_mask, text, dialogue_id, turn_index. If split, special_tokens_mask, text, dialogue_id, turn_index are not expected by `RobertaForMaskedLM.forward`,  you can safely ignore this message. (trainer.py:910)
[2025-01-18 01:48:38,769][    INFO][transformers.trainer] ***** Running training ***** (trainer.py:2362)
[2025-01-18 01:48:38,769][    INFO][transformers.trainer]   Num examples = 10,000 (trainer.py:2363)
[2025-01-18 01:48:38,769][    INFO][transformers.trainer]   Num Epochs = 5 (trainer.py:2364)
[2025-01-18 01:48:38,769][    INFO][transformers.trainer]   Instantaneous batch size per device = 8 (trainer.py:2365)
[2025-01-18 01:48:38,769][    INFO][transformers.trainer]   Total train batch size (w. parallel, distributed & accumulation) = 16 (trainer.py:2368)
[2025-01-18 01:48:38,769][    INFO][transformers.trainer]   Gradient Accumulation steps = 2 (trainer.py:2369)
[2025-01-18 01:48:38,769][    INFO][transformers.trainer]   Total optimization steps = 3,125 (trainer.py:2370)
[2025-01-18 01:48:38,770][    INFO][transformers.trainer]   Number of trainable parameters = 124,055,040 (trainer.py:2371)
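
The figures reported by the trainer can be cross-checked with a short calculation. The sketch below rests on assumptions the log does not state: a single device (inferred from 8 x 2 = 16) and standard `roberta-base` dimensions (vocabulary 50,265, hidden size 768, 514 positions, 1 token type, intermediate size 3,072). The 12 encoder layers are taken from the parameter names logged above. The step formula mirrors how the `transformers` trainer is understood to count updates per epoch and may differ between versions.

```python
import math

# --- Optimization steps (values from the log; single device is an assumption) ---
num_examples, epochs = 10_000, 5
per_device_batch, grad_accum, num_devices = 8, 2, 1

total_batch = per_device_batch * grad_accum * num_devices
batches_per_epoch = math.ceil(num_examples / (per_device_batch * num_devices))
updates_per_epoch = max(batches_per_epoch // grad_accum, 1)
total_steps = updates_per_epoch * epochs

# --- Trainable parameters (roberta-base dimensions are an assumption) ---
vocab, hidden, positions, token_types, inter, layers = 50_265, 768, 514, 1, 3_072, 12


def linear(n_in, n_out):
    """Parameter count of a dense layer with bias."""
    return n_in * n_out + n_out


layer_norm = 2 * hidden  # weight + bias
embeddings = (vocab + positions + token_types) * hidden + layer_norm
per_layer = (
    4 * linear(hidden, hidden)  # query, key, value, attention output
    + layer_norm
    + linear(hidden, inter)     # intermediate.dense
    + linear(inter, hidden)     # output.dense
    + layer_norm
)
trainable = embeddings + layers * per_layer

# Frozen lm_head parameters listed in the log: bias, dense, layer_norm.
frozen_lm_head = vocab + linear(hidden, hidden) + layer_norm

print(total_batch, total_steps, trainable, frozen_lm_head)
```

Under these assumptions the encoder and embedding parameters alone account for the reported trainable count. The log lists no separate `lm_head.decoder.weight`, which would be consistent with that matrix being tied to the word embeddings and therefore left trainable.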