engine.py and main.py contain the code with the accumulation step optimization applied.

Training works on two RTX 4090 GPUs.
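
The contents of engine.py and main.py are not shown here, so the following is only a minimal sketch of what an accumulation step usually means, assuming it refers to gradient accumulation: gradients from several small micro-batches are summed before a single optimizer update, which lowers peak memory per GPU. The toy model, the function names (grad_mse, accumulated_grad) and the accum_steps parameter are hypothetical and are not taken from this repository.

```python
# Toy illustration of gradient accumulation in pure Python.
# Hypothetical model: y = w * x with mean squared error loss.
# None of these names come from engine.py or main.py.

def grad_mse(w, xs, ys):
    """Gradient of the mean squared error with respect to w."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w, xs, ys, accum_steps):
    """Accumulate gradients over accum_steps equally sized micro-batches."""
    if len(xs) % accum_steps != 0:
        raise ValueError("batch size must be divisible by accum_steps")
    micro = len(xs) // accum_steps
    total = 0.0
    for i in range(accum_steps):
        lo, hi = i * micro, (i + 1) * micro
        # Scale each micro-batch gradient by 1/accum_steps so the
        # accumulated sum matches the mean gradient of the full batch.
        total += grad_mse(w, xs[lo:hi], ys[lo:hi]) / accum_steps
    return total


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0 * x for x in xs]  # data generated with a true weight of 2.0
w = 0.0

full = grad_mse(w, xs, ys)                        # one large batch of 8
acc = accumulated_grad(w, xs, ys, accum_steps=4)  # four micro-batches of 2

# One update after accumulation, using an assumed learning rate.
lr = 0.01
w_new = w - lr * acc
```

In this sketch the accumulated gradient equals the full-batch gradient, so the update is the same while only one micro-batch has to be processed at a time.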
