Data collection based on FSDP backend on Ascend devices(en)
==========================================================================================

Last updated: 08/14/2025.

This tutorial describes how to collect profiling data when training with
the GRPO or DAPO algorithm on the FSDP backend on Ascend devices.

Configuration
-------------

Data collection is controlled at two levels of configuration:

1. **Global profiler control**: Use parameters in ``ppo_trainer.yaml`` to control the collection mode and steps.
2. **Role profiler control**: Use parameters in each role's ``profiler`` field to control the collection mode for each role.

Global collection control
~~~~~~~~~~~~~~~~~~~~~~~~~

Use parameters in ``ppo_trainer.yaml`` to control the collection mode
and steps.

-  global_profiler: Controls the profiling tool, the steps to collect,
   and the output path.

   -  tool: The profiling tool to use. Options are nsys, npu, torch, and
      torch_memory.
   -  steps: A list of steps to collect. For example, [2, 4] collects
      steps 2 and 4. If set to null, no collection occurs.
   -  save_path: The path where collected data is saved. Default is
      "outputs/profile".
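
As a minimal sketch of the fields above, a ``global_profiler`` block that
selects the NPU tool might look like the following. The step numbers are
illustrative, and ``save_path`` is shown with its default value.

.. code:: yaml

      global_profiler:
         tool: npu                   # one of nsys, npu, torch, torch_memory
         steps: [2, 4]               # collect steps 2 and 4; null disables collection
         save_path: outputs/profile  # default output location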


Role collection control
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In each role's ``profiler`` field, you can control the collection mode for that role.

-  enable: Whether to enable profiling for this role.
-  all_ranks: Whether to collect data from all ranks.
-  ranks: A list of ranks to collect data from. If empty, no data is collected.
-  tool_config: Configuration for the profiling tool used by this role.
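
For instance, to collect data from only a subset of ranks for one role, the
fields above could be combined as sketched below. The rank numbers are
illustrative, and the sketch assumes that ``ranks`` is consulted when
``all_ranks`` is ``False``.

.. code:: yaml

      actor_rollout_ref:
         actor:
            profiler:
               enable: True
               all_ranks: False
               ranks: [0, 1]   # illustrative rank IDs (assumed to apply when all_ranks is False)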

Use parameters in each role's ``profiler.tool_config.npu`` to control NPU profiler behavior:

-  level: Collection level. Options are level_none, level0, level1, and
   level2.

   -  level_none: Disables all level-based data collection (turns off
      profiler_level).
   -  level0: Collect high-level application data, underlying NPU data,
      and operator execution details on NPU.
   -  level1: Extends level0 by adding CANN-layer AscendCL data and AI
      Core performance metrics on NPU.
   -  level2: Extends level1 by adding CANN-layer Runtime data and AI
      CPU metrics.

-  contents: A list of options to control the collection content, such as
   npu, cpu, memory, shapes, module, stack.
   
   -  npu: Whether to collect device-side performance data.
   -  cpu: Whether to collect host-side performance data.
   -  memory: Whether to enable memory analysis.
   -  shapes: Whether to record tensor shapes.
   -  module: Whether to record framework-layer Python call stack
      information.
   -  stack: Whether to record operator call stack information.

-  analysis: Enables automatic data parsing.
-  discrete: Whether to enable discrete mode.
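
Putting the NPU options together, a role's ``tool_config`` might be written
as follows. This is a sketch only: the chosen level and contents are
illustrative rather than recommended settings.

.. code:: yaml

      actor_rollout_ref:
         actor:
            profiler:
               enable: True
               all_ranks: True
               tool_config:
                  npu:
                     level: level1          # level0 plus AscendCL data and AI Core metrics
                     contents: [npu, cpu]   # device-side and host-side performance data
                     analysis: True         # parse collected data automatically
                     discrete: False        # discrete mode off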


Examples
--------

Disabling collection
~~~~~~~~~~~~~~~~~~~~

.. code:: yaml

      global_profiler:
         steps: null # disable profile

End-to-End collection
~~~~~~~~~~~~~~~~~~~~~

.. code:: yaml

      global_profiler:
         steps: [1, 2, 5]
      actor_rollout_ref:
         actor:
            profiler:
               enable: True
               all_ranks: True
               tool_config:
                  npu:
                     discrete: False
      # rollout & ref follow actor settings


Discrete mode collection
~~~~~~~~~~~~~~~~~~~~~~~~

.. code:: yaml

      global_profiler:
         steps: [1, 2, 5]
      actor_rollout_ref:
         actor:
            profiler:
               enable: True
               all_ranks: True
               tool_config:
                  npu:
                     discrete: True
      # rollout & ref follow actor settings


Visualization
-------------

Collected data is stored in the user-defined ``save_path`` and can be
visualized with the `MindStudio Insight <https://www.hiascend.com/document/detail/zh/mindstudio/80RC1/GUI_baseddevelopmenttool/msascendinsightug/Insight_userguide_0002.html>`_ tool.

Additionally, in a Linux environment, the MindStudio Insight tool is provided in the form of a `JupyterLab Plugin <https://www.hiascend.com/document/detail/zh/mindstudio/82RC1/GUI_baseddevelopmenttool/msascendinsightug/Insight_userguide_0130.html>`_, offering a more intuitive and highly interactive user interface. The advantages of the JupyterLab plugin are as follows:

- Seamless integration: Supports running the MindStudio Insight tool directly within the Jupyter environment, eliminating the need to switch platforms or copy data from the server, enabling data to be collected and used immediately.
- Fast startup: Allows MindStudio Insight to be launched quickly via the JupyterLab command line or graphical interface.
- Smooth operation: In a Linux environment, launching MindStudio Insight through JupyterLab effectively alleviates performance lag compared to the full-package communication mode, significantly improving the user experience.
- Remote access: Supports remotely launching MindStudio Insight. Users can connect to the service via a local browser for direct visual analysis, reducing the difficulty of uploading and downloading data during large-model training or inference.

If the ``analysis`` parameter is set to ``False``, the collected data must be parsed offline after collection:

.. code:: python

    import torch_npu

    # Set profiler_path to the parent directory of the
    # "localhost.localdomain_<PID>_<timestamp>_ascend_pt" folder.
    profiler_path = "<parent directory>"  # placeholder: replace before running
    torch_npu.profiler.profiler.analyse(profiler_path=profiler_path)