{
  "query": "Tell me how to submit a script using sbatch.",
  "query-id": 89,
  "gold_response": "Overview\n- sbatch submits a batch script to the Slurm scheduler. It queues the job, returns a Job ID immediately, and writes output to files (default: slurm-%j.out, where %j is the Job ID).\n- srun runs commands on allocated resources (either from an sbatch allocation or from a direct srun request). You will use sbatch to submit non-interactive jobs and srun to launch tasks within that allocation or for quick interactive runs. A detailed comparison and when-to-use guidance appear below.\n\nComplete sample sbatch script (save as submit.sh)\n```\n#!/bin/bash\n#SBATCH --job-name=demo_job           # A short, descriptive name\n#SBATCH --partition=compute           # Cluster-specific queue/partition\n#SBATCH --nodes=1                     # Number of nodes\n#SBATCH --ntasks=1                    # Total tasks (MPI ranks)\n#SBATCH --cpus-per-task=4             # CPU cores (threads) per task\n#SBATCH --mem=8G                      # Total memory per node\n#SBATCH --time=00:30:00               # Max runtime (HH:MM:SS)\n#SBATCH --output=demo_%j.out          # Stdout file (%j = Job ID)\n#SBATCH --error=demo_%j.err           # Stderr file\n# #SBATCH --gres=gpu:1                # Uncomment if a GPU is required\n\nmodule purge\nmodule load python/3.10               # Load your software stack\n\necho \"Node: $(hostname) | CPUs: $SLURM_CPUS_PER_TASK\"\n# Launch the workload inside the allocation. srun ensures proper binding and distribution.\n# Pass --cpus-per-task explicitly: since Slurm 22.05, srun does not inherit it from sbatch.\nsrun --cpus-per-task=$SLURM_CPUS_PER_TASK --cpu-bind=cores python my_program.py --threads $SLURM_CPUS_PER_TASK\n```\nSubmit it:\n- sbatch submit.sh\n- Slurm responds with: Submitted batch job 123456\n\nBreakdown of key directives (what and why)\n- --partition: Queue/resource pool to run on. Values are site-specific; check your cluster docs.\n- --nodes, --ntasks, --cpus-per-task: Core parallelism knobs.\n  - nodes: How many machines (nodes) to allocate. 
Most single-node jobs use 1; distributed jobs may use >1.\n  - ntasks: Number of parallel processes (often MPI ranks). For pure MPI across N nodes, set ntasks to the total ranks.\n  - cpus-per-task: Threads per process (often OpenMP threads or intra-process parallelism). For hybrid MPI+OpenMP, set ntasks to MPI ranks and cpus-per-task to threads per rank.\n  Why: Choosing these correctly matches your program’s parallel model and ensures the scheduler reserves appropriate cores and places tasks correctly.\n- --mem (or --mem-per-cpu): Memory request. Use --mem for total per node or --mem-per-cpu when memory scales with cores. Why: Prevents oversubscription and avoids out-of-memory kills.\n- --time: Upper bound on runtime; jobs are canceled when exceeded. Why: Schedulers rely on this for planning; request a reasonable buffer.\n- --output/--error: Control where logs are written. Why: Makes debugging and results collection straightforward.\n- --gres=gpu:X (optional): Request GPUs when needed. Why: Ensures placement on GPU nodes and reserves the requested GPUs for your job.\n\nsrun vs sbatch: what’s different\n- sbatch\n  - Purpose: Submit a non-interactive batch job for later execution.\n  - Behavior: Non-blocking; returns a Job ID immediately; output goes to files.\n  - Scope: Reserves resources for the whole job. 
Within the script, you can run multiple steps.\n  - Typical use: Long or queued workloads, production runs, arrays, jobs that should continue after you log out.\n- srun\n  - Purpose: Launch tasks on allocated resources (creates a job step) or request an allocation and run a command immediately.\n  - Behavior: Blocking by default when used from a terminal; prints output to your terminal (unless redirected).\n  - Scope: Inside an sbatch script, srun starts parallel steps across the allocation with correct CPU/GPU binding.\n  - Typical use: Launching MPI ranks or multi-process steps inside sbatch; quick tests or interactive shells.\n\nConcrete examples\n- Interactive test (srun):\n  - Allocate a short interactive shell (partition/time may vary):\n    srun --pty -p interactive -t 00:30:00 -N 1 -n 1 -c 2 bash\n  - Your terminal is attached to the allocated node until you exit.\n- Quick single command (srun):\n  - srun -N 1 -n 1 -c 1 hostname\n- Batch job (sbatch) for a long run:\n  - Use the provided submit.sh and run: sbatch submit.sh\n  - Output is saved to demo_<JobID>.out and .err.\n- MPI inside sbatch (distributed memory; MPI = Message Passing Interface):\n  - In your script, request total ranks and launch with srun:\n    #SBATCH --nodes=2\n    #SBATCH --ntasks=8\n    srun ./my_mpi_app\n- OpenMP inside sbatch (shared-memory threading; OpenMP = a threading model within a process):\n  - Request threads per task, export OMP_NUM_THREADS, and pass the CPU count to srun (since Slurm 22.05, srun does not inherit --cpus-per-task from sbatch):\n    #SBATCH --ntasks=1\n    #SBATCH --cpus-per-task=8\n    export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK\n    srun --cpus-per-task=$SLURM_CPUS_PER_TASK ./my_openmp_app\n\nWhen to use each\n- Use sbatch when:\n  - The job is non-interactive, may wait in queue, or should keep running after you disconnect.\n  - You want logs captured to files and may want to chain multiple steps.\n  - You need job arrays or reproducible scheduled runs.\n- Use srun when:\n  - You need an interactive shell or a quick, blocking test with immediate feedback.\n  - You are inside an sbatch 
script and want to launch parallel steps (MPI ranks, multiple tasks) with proper resource binding.\n  - You need to run a one-off command on allocated resources without writing a script.\n\nMonitoring and canceling\n- See queued/running jobs:\n  - squeue -u $USER\n  - squeue -j <JOBID>\n- Historical/accounting info (after completion):\n  - sacct -j <JOBID> --format=JobID,JobName,State,Elapsed,AllocCPUS,MaxRSS,NodeList\n- Live logs:\n  - tail -f demo_<JOBID>.out\n- Cancel a job:\n  - scancel <JOBID>\n\nNotes\n- Partitions, module names, and GPU flags are site-specific; consult your cluster documentation.\n- Inside sbatch, srun is recommended for parallel launches because it cooperates with Slurm to place tasks, bind CPUs/GPUs, and manage job steps correctly.",
  "gold_information": [
    "The sbatch command submits a non-interactive batch script to the cluster scheduler.",
    "The sbatch submission queues the job, returns a job ID immediately, and writes output to files by default.",
    "The default stdout file name includes a token for the job ID.",
    "Use srun inside a batch script to launch tasks with correct CPU and GPU binding.",
    "Use sbatch for queued or long-running jobs that should continue after logout.",
    "Use srun for interactive shells, quick tests, or to run parallel steps within an existing allocation.",
    "Specify a queue with the --partition option in the batch script.",
    "Control parallelism with --nodes, --ntasks, and --cpus-per-task to match the program’s model.",
    "Use --mem for total memory per node or --mem-per-cpu when memory scales with cores.",
    "Set --time to enforce an upper runtime limit; jobs are canceled if the limit is exceeded.",
    "Configure --output and --error to capture logs to specific files.",
    "Request accelerators with --gres=gpu:X when a GPU is needed.",
    "Launch message-passing workloads across nodes with srun inside the allocation.",
    "For threaded workloads, set --cpus-per-task and export a threads variable before launching.",
    "From a terminal, srun blocks until the command finishes and prints output to the terminal unless redirected.",
    "The sbatch command is non-blocking and returns immediately with a job ID.",
    "List queued or running jobs with squeue filtered by user or job ID.",
    "View historical job accounting details with sacct using the job ID.",
    "Follow live logs by tailing the job’s output file.",
    "Cancel a queued or running job with scancel using the job ID.",
    "Cluster partitions, module names, and GPU flags are site-specific and require consulting local documentation.",
    "Within a batch allocation, srun coordinates task placement and resource distribution across nodes."
  ]
}