# CodeDPO: Aligning Code Models with Self Generated and Verified Source Code

>  Code generation models have shown significant potential for programming tasks. 
> However, existing training methods like supervised fine-tuning face key limitations: they do not effectively teach models to prioritize correct over incorrect solutions in ambiguous situations, nor do they effectively optimize the runtime efficiency of the generated code. 
> To address these challenges, we propose CodeDPO, a framework that integrates preference learning into code generation. 
> We define code preference objects based on two key factors: code correctness and efficiency. 
> CodeDPO employs a novel dataset construction method, utilizing a self-generation-and-validation mechanism that simultaneously generates and evaluates code and test cases. 
> The underlying assumption is that test cases executable by multiple code snippets provide more reliable validation, and code that passes more tests is more likely to be correct. 
> Through this self-validation process, our PageRank-inspired algorithm iteratively updates the ranking score of each code snippet, ultimately creating a code preference optimization dataset based on correctness and efficiency.
> CodeDPO is flexible and scalable, generating diverse preference optimization data without depending on external resources. 
> Through comprehensive evaluations of five widely used benchmarks, CodeDPO demonstrates significant improvements in correctness and efficiency compared to existing methods. 
> Our experiments prove that CodeDPO enhances the capabilities of LLMs in code generation and provides a robust foundation for conducting code preference optimization in more complex and challenging real-world scenarios.

## Code is available at https://anonymous.4open.science/r/CodeDPO/

**Note:** we have removed personal information from the code using `remove_name.py` and `remove_comment_in_sh_py.py`.

The code includes dataset construction as well as DPO training.



## Core Code Explanation

### Dataset Construction

Our self-validation score is implemented in:

```python
# Step2-CodeRankScore/src/agreement.py

    def get_sorted_solutions_with_iter_page_rank(self, T=10, beta=0.85, rtn_test=False):
        logger.info('Start to get sorted solutions with iter page rank')
        ranked_solutions_by_task = defaultdict(list)
        ranked_tests_by_task = defaultdict(list)

        solution_scores = np.ones((self.max_task, self.max_sol))
        test_scores = np.ones((self.max_task, self.max_test))

        def iter_step_page_rank(solution_scores_t_1, test_scores_t_1, beta):
            # connect matrix: Prompt_len * Code * Test
            # Test Score matrix: Prompt_len * Test
            # Code Score matrix: Prompt_len * Code
            test_scores_t = test_scores_t_1 * (1 - beta) + np.einsum("PCT,PC->PT", self.task_sol_test_matrix, solution_scores_t_1) * beta
            solution_scores_t = solution_scores_t_1 * (1 - beta) + np.einsum("PCT,PT->PC", self.task_sol_test_matrix, test_scores_t) * beta
            return solution_scores_t, test_scores_t
        
        for i in range(T):
            solution_scores, test_scores = iter_step_page_rank(solution_scores, test_scores, beta)
            
        for task_id in self.caseset_passed_solutions_by_task.keys():
            this_taskid_id = self.taskid_string_to_id[task_id]
            rtn_sol_score_list = []
            for solution_id in self.solution_id_to_string_by_task[task_id].keys():
                solution_string = self.solution_id_to_string_by_task[task_id][solution_id]
                solution_score = solution_scores[this_taskid_id][solution_id]
                rtn_sol_score_list.append(([solution_string], solution_score))
            ranked_solutions_by_task[task_id] = sorted(rtn_sol_score_list, key=lambda x: x[1], reverse=True)
        
        if not rtn_test:
            return ranked_solutions_by_task
        else:
            for task_id in self.caseset_passed_solutions_by_task.keys():
                this_taskid_id = self.taskid_string_to_id[task_id]
                rtn_test_score_list = []
                for test_id in self.test_case_id_to_string_by_task[task_id].keys():
                    test_string = self.test_case_id_to_string_by_task[task_id][test_id]
                    test_score = test_scores[this_taskid_id][test_id]
                    rtn_test_score_list.append(([test_string], test_score))
                ranked_tests_by_task[task_id] = sorted(rtn_test_score_list, key=lambda x: x[1], reverse=True)
            return ranked_solutions_by_task, ranked_tests_by_task
```
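To make the update rule above concrete, here is a minimal, self-contained sketch of the same two `einsum` steps on a toy pass matrix. The function name `rank_solutions`, the variable `toy`, and the toy data are illustrative assumptions and are not part of the repository; the defaults `T=10` and `beta=0.85` mirror the method signature above.

```python
import numpy as np


def rank_solutions(pass_matrix, T=10, beta=0.85):
    """Hypothetical helper: PageRank-style mutual scoring of solutions and tests.

    pass_matrix[p, c, t] is 1 if solution c of task p passes test t, else 0.
    """
    pass_matrix = np.asarray(pass_matrix, dtype=float)
    n_task, n_sol, n_test = pass_matrix.shape
    sol_scores = np.ones((n_task, n_sol))
    test_scores = np.ones((n_task, n_test))

    for _ in range(T):
        # A test gains score from the solutions that pass it.
        test_scores = (1 - beta) * test_scores + beta * np.einsum(
            "PCT,PC->PT", pass_matrix, sol_scores
        )
        # A solution gains score from the (freshly updated) tests it passes.
        sol_scores = (1 - beta) * sol_scores + beta * np.einsum(
            "PCT,PT->PC", pass_matrix, test_scores
        )
    return sol_scores, test_scores


# One task, three solutions, three tests (toy data, assumed for illustration).
toy = [[[1, 1, 1],   # solution 0 passes every test
        [1, 1, 0],   # solution 1 passes tests 0 and 1
        [0, 0, 1]]]  # solution 2 passes only test 2
sol_scores, test_scores = rank_solutions(toy)
order = np.argsort(-sol_scores[0])  # best solution first
print(order.tolist())  # [0, 1, 2]
```

As in the excerpt, the scores are not normalized between iterations, so their magnitudes grow with `T`; the sketch relies only on their relative order.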



Execution time is measured in:

```python
# Step2-CodeRankScore/src/_execution.py
def _pack_test_cases(test_cases, timeout, with_time = False):
    if with_time:
        def _pack_test_cases_with_time(test_cases, timeout):
            blank_4 = ' ' * 4
            blank_8 = ' ' * 8
            blank_12 = ' ' * 12
            result = f'def check():\n    pass_result, pass_time = [], []\n    import time as my_time_\n'
            for idx, tc in enumerate(test_cases):
                multi_line_assertion = tc.strip().replace('\n', f'\n{blank_12}')
                result += f'\n{blank_4}try:\n{blank_8}with time_limit({timeout}):\n{blank_12}_start_t = my_time_.time()\n{blank_12}{multi_line_assertion}\
                            \n{blank_12}pass_result.append(True)\n{blank_12}pass_time.append(my_time_.time()-_start_t)\n{blank_4}except Exception as e:\n{blank_8}pass_result.append(False)\n{blank_8}pass_time.append(100000)\n'
            result += '\n    return pass_result,pass_time\n'
            result += f'\nglobal final_result, final_result_time\nfinal_result,final_result_time = check()'
            return result
        return _pack_test_cases_with_time(test_cases, timeout)
    blank_4 = ' ' * 4
    blank_8 = ' ' * 8
    blank_12 = ' ' * 12
    result = f'def check():\n    pass_result = []\n'
    for idx, tc in enumerate(test_cases):
        multi_line_assertion = tc.strip().replace('\n', f'\n{blank_12}')
        result += f'\n{blank_4}try:\n{blank_8}with time_limit({timeout}):\n{blank_12}{multi_line_assertion}\
                    \n{blank_12}pass_result.append(True)\n{blank_4}except Exception as e:\n{blank_8}pass_result.append(False)\n'
    result += '\n    return pass_result\n'
    result += f'\nglobal final_result\nfinal_result = check()'
    return result
  
  
#  ...
  
            try:
                exec_globals = {'time_limit': time_limit}
                with swallow_io():
                    with time_limit(timeout + 1):
                        exec(check_program, exec_globals)
                if with_time:
                    result_time.append(exec_globals['final_result_time'])
                result.append(exec_globals['final_result'])
```
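The per-test timing idea can be illustrated with a simplified sketch. It runs each assertion directly instead of generating a `check()` program, and it omits the `time_limit` and `swallow_io` guards used above, so it should only be run on trusted code. The function `run_tests_with_time` and the sample `solution` and `tests` are hypothetical; the failure sentinel `100000` is taken from the excerpt.

```python
import time


def run_tests_with_time(solution_code, test_cases, fail_time=100000):
    """Hypothetical helper: run each test, recording pass/fail and elapsed seconds."""
    env = {}
    exec(solution_code, env)  # define the candidate function(s); trusted code only
    passed, elapsed = [], []
    for tc in test_cases:
        start = time.perf_counter()
        try:
            exec(tc, env)
            passed.append(True)
            elapsed.append(time.perf_counter() - start)
        except Exception:
            # A failing test gets a large sentinel time, as in the excerpt above.
            passed.append(False)
            elapsed.append(fail_time)
    return passed, elapsed


solution = "def add(a, b):\n    return a + b\n"
tests = ["assert add(1, 2) == 3", "assert add(2, 2) == 5"]
passed, elapsed = run_tests_with_time(solution, tests)
print(passed)  # [True, False]
```

Assigning failed tests a large sentinel time means an incorrect solution never looks faster than a correct one when times are compared.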







### DPO Training

Please see `Step3-DPOTraining/fastchat/train/code.dpo.fromlora.py` and the other related code files.
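
For orientation, the standard DPO objective can be sketched in a few lines of NumPy. This is a generic illustration of the published loss, not a copy of the training script; the function name `dpo_loss`, its argument names, and the default `beta=0.1` are assumptions made for this example.

```python
import numpy as np


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Hypothetical helper: mean DPO loss over a batch of preference pairs.

    Each argument holds summed log-probabilities of full responses.
    """
    pc = np.asarray(policy_chosen_logp, dtype=float)
    pr = np.asarray(policy_rejected_logp, dtype=float)
    rc = np.asarray(ref_chosen_logp, dtype=float)
    rr = np.asarray(ref_rejected_logp, dtype=float)
    # Implicit reward margin between the chosen and rejected responses.
    logits = beta * ((pc - rc) - (pr - rr))
    # -log(sigmoid(x)) computed stably as log(1 + exp(-x)).
    return float(np.mean(np.logaddexp(0.0, -logits)))


# With no preference signal the loss equals log(2).
neutral = dpo_loss([0.0], [0.0], [0.0], [0.0])
# When the policy favors the chosen code more than the reference does, the loss drops.
improved = dpo_loss([-1.0], [-5.0], [-2.0], [-2.0])
print(neutral > improved)  # True
```

Here the chosen and rejected responses would be the higher- and lower-ranked code snippets from the dataset construction step.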





