Abstract: The time-consuming nature of programming robotic motion and manipulation for specific tasks is a key reason robots have generally been restricted to repetitive tasks with little variation. Developers must manually write task-specific code, making it difficult to adapt that code to new environments and assembly scenarios, and leading to substantial time spent recreating redundant code for similar robotic actions across assemblies. To address this, we propose Act2Code, a generative network that programs robots through demonstration: it takes a video demonstration as input and translates it into robotic instructions. We evaluate the network on a real-world assembly dataset. Our results demonstrate the model's effectiveness in generating code, achieving a promising BLEU score of 0.72.
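Since BLEU is the reported evaluation metric, the following is a minimal sketch of how a token-level BLEU score on generated instruction sequences might be computed, assuming NLTK's sentence_bleu. The instruction strings, tokenization, and smoothing choice here are illustrative assumptions, not the paper's actual evaluation setup.

```python
# Illustrative only: computing BLEU between a reference and a generated
# robot-instruction sequence. The example instructions are hypothetical.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Hypothetical ground-truth and model-generated instruction sequences,
# tokenized by whitespace.
reference = "move_to ( bolt_3 ) ; grasp ( bolt_3 ) ; insert ( bolt_3 , hole_A )".split()
candidate = "move_to ( bolt_3 ) ; grasp ( bolt_3 ) ; place ( bolt_3 , hole_A )".split()

# Smoothing avoids zero scores when a higher-order n-gram has no match,
# which is common for short instruction sequences.
smooth = SmoothingFunction().method1
score = sentence_bleu([reference], candidate, smoothing_function=smooth)
print(f"BLEU: {score:.2f}")
```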