# JOINTLY OPTIMIZING WIRELENGTH AND THERMAL FIELDS FOR CHIP PLACEMENT

Anonymous authors

Paper under double-blind review

### ABSTRACT

Macro placement is a crucial and complex problem in chip design. In recent studies, reinforcement learning (RL) has demonstrated outstanding performance in optimizing chip wirelength, but optimizing wirelength alone leads to thermally inefficient designs. Additionally, due to the specialized expertise necessary for creating chip benchmarks and the constraints imposed by confidentiality agreements, publicly available chip thermal placement benchmarks are scarce. This work introduces a reinforcement learning-based thermal placement model that optimizes both wirelength and max temperature. We also strictly follow the chip design process to establish a macro thermal placement benchmark. This significantly lowers the entry barrier for researchers and facilitates benchmarking and result replication. Compared to other models, our model notably reduces the chip's max temperature while slightly increasing wirelength on smaller-scale chips. On large-scale chips, our model can reduce wirelength while also decreasing the chip's max temperature. Our code and benchmarks will be open sourced soon.


### 1 INTRODUCTION

With the development of large-scale integrated circuits (ICs), placement has become a crucial task that directly affects chip performance, such as speed and energy cost Garg & Shukla (2016). In the placement task, macros (more than 100) and standard cells (more than 10k) are placed in appropriate locations to meet design metrics such as wirelength, routability, timing, power, max temperature, and manufacturability Qiu et al. (2023). As chip sizes continue to increase, manual design struggles to meet various design metrics simultaneously. Therefore, finding an automatic and efficient chip placement method becomes crucial.

Recent chip placement approaches Chen et al. (2008); Lu et al. (2014); Lin et al. (2019); Cheng et al. (2018); Viswanathan et al. (2007) have shown significant advantages in wirelength optimization. However, as shown in Figure 1(d), shorter wirelength often results in an aggregation of components that increases the max temperature of the chip, which in turn degrades chip performance. Max temperatures and temperature gradients have a definite effect on the reliability and performance of integrated circuits. For example, large temperature gradients increase clock skew in clock distribution networks Xia et al. (2017), and thermal runaway can cause device overheating in semiconductor devices due to the positive feedback between high temperature and increasing leakage current Molter et al. (2023); Li et al. (2005).

Recent methods rarely prioritize heat optimization as a primary objective in chip placement. Existing approaches that simultaneously optimize wirelength and max temperature mainly focus on chiplet placement, and relatively little research focuses on macro placement. These works primarily suffer from the following three shortcomings. First, **two-stage optimization leads to suboptimal results.** As shown in Figure 1(a), almost all thermal placement methods Ma et al. (2021); Chiou et al. (2023) divide the optimization process into two steps: first optimizing wirelength, and then optimizing max temperature. This approach significantly restricts the model's exploration space during heat optimization, as component placements are nearly fixed, leading to local optima. Second, **these heat optimization methods are less effective for scenarios with high-density components.** Thermal placement methods typically use techniques such as translation, rotation, and swapping after placing all components on the chip canvas to reduce max temperature. However, as shown in Figure 1(b), when component density is too high, the space available for translation and rotation is greatly limited, significantly impacting optimization results. Third, **there is a lack of macro thermal placement benchmarks.** Thermal placement benchmarks are primarily focused on chiplet placement, with relatively few benchmarks addressing macro thermal placement. Generating these benchmarks requires running Electronic Design Automation (EDA) flows, which necessitates expertise in chip design and makes data generation costly Jiang et al. (2024). Additionally, non-disclosure agreements (NDAs) for manufacturing techniques and EDA tools limit the release of raw data. As a result, most studies are only able to create small internal datasets or set the power density to a random value between $10^5$ and $10^7\ W/m^2$ for technology validation, which makes benchmarking and reproducing results highly challenging.



Figure 1: (a) Wirelength and heat optimization process. Previous methods refer to Ma et al. (2021); Chiou et al. (2023). Our method combines wirelength optimization and heat optimization into one step, optimizing thermal performance from the beginning of the placement. (b) Heat optimization strategies. Previous methods have a small feasible solution space when components are densely packed and cannot significantly change the placement of components. Our model can generate a variety of placements during the heat optimization process. (c) Commonly used macro placement benchmarks, which lack power information for each component. We built a macro thermal placement benchmark that includes detailed power information for each component through logic synthesis. (d) Placement result visualization. Our model significantly reduces the max temperature and HPWL in large-scale chips.

To address these issues, we first developed a macro thermal placement model and then established an open-source benchmark for macro thermal placement. We utilize a reinforcement learning-based approach to simultaneously optimize wirelength and max temperature. By processing the wire mask Lai et al. (2022) and the max temperature under the current placement through a convolutional neural network, we select the placement positions that are best for overall performance, minimizing both wirelength and max temperature together rather than optimizing them separately. In constructing the benchmark, we strictly followed the standard chip design process, performing logic synthesis on 15 real open-source chips to obtain detailed component information, rather than assigning random power values to each macro. Compared to existing benchmarks, our benchmark adds the theoretical power consumption of each macro while preserving the gate-level netlist obtained through logic synthesis.

The main contributions of this paper are as follows:

- We developed a reinforcement learning model that optimizes max temperature and wirelength simultaneously, taking into account constraints on both quantities so that the search targets a jointly optimal solution. This approach avoids the local optima caused by optimizing wirelength and max temperature separately.
- Through a comprehensive EDA process, we created the first open-source macro thermal placement benchmark, which provides a reliable baseline for comparison and replication and significantly lowers the barrier to entry for chip thermal placement.
- On the 15 benchmarks we have established, our model achieves the lowest placement temperature, demonstrating the effectiveness of our model.

### 2 RELATED WORK

### 2.1 Chip Placement Methods

Chip placement methods fall into two groups according to their optimization objective: methods that use wirelength as the optimization metric Chen et al. (2008); Lu et al. (2014); Lin et al. (2019); Cheng et al. (2018); Viswanathan et al. (2007); Kim et al. (2012); Kim & Markov (2012); Spindler et al. (2008); Chan et al. (2006); Lai et al. (2022); Shi et al. (2024); Mirhoseini et al. (2021), and methods that optimize both max temperature and wirelength Ma et al. (2021); Chiou et al. (2023).

**Using wirelength as an optimization metric.** Wirelength optimization methods are primarily categorized into classic methods (e.g., analytical methods) Chen et al. (2008); Lu et al. (2014); Lin et al. (2019); Cheng et al. (2018); Viswanathan et al. (2007); Kim et al. (2012); Kim & Markov (2012); Spindler et al. (2008); Chan et al. (2006); Shi et al. (2024) and learning-based methods (e.g., RL) Lai et al. (2022); Mirhoseini et al. (2021). These methods typically use $\min Wirelength(s, H)$ as the objective function, with some incorporating additional objectives such as congestion and overlap. For example, DREAMPlace Lin et al. (2019) uses analytical methods to optimize wirelength and density, converting the placement task into $\min\ WA(s, H) + \lambda\, Density(s, H)$. $WA$ denotes the smoothed weighted-average wirelength used to approximate Half Perimeter Wire Length (HPWL), $Density$ denotes the differentiable density measure used to penalize overlap, and $\lambda$ is the trade-off factor. The problem is then solved numerically using classical mathematical optimization techniques, such as gradient descent, to rapidly generate a high-quality complete placement. MaskPlace Lai et al. (2022) uses reinforcement learning to optimize wirelength. Reinforcement learning views the placement process as a Markov Decision Process (MDP). In each step $t$, a component is placed on the chip canvas. The reward is set to $r_t = HPWL_{t-1} - HPWL_t$, and the model is trained to maximize the reward in order to minimize wirelength. These methods can achieve placement results that surpass human designs in terms of wirelength, but they overlook the impact of max temperature on the chip, which reduces the chip's thermal performance.
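To make the relation between the weighted-average (WA) wirelength and HPWL concrete, the following minimal sketch compares the two along a single axis for one net. It uses the standard WA form (a softmax-weighted estimate of the maximum minus a softmin-weighted estimate of the minimum); the smoothing parameter `gamma`, the function names, and the pin coordinates are illustrative assumptions and are not taken from DREAMPlace's code.

```python
import numpy as np

def hpwl_1d(coords):
    """Exact span of one net along one axis: max - min."""
    x = np.asarray(coords, dtype=float)
    return x.max() - x.min()

def wa_1d(coords, gamma):
    """Weighted-average smooth approximation of max - min along one axis.

    gamma is an assumed smoothing parameter: smaller values track the exact
    span more closely, larger values give smoother gradients.
    """
    x = np.asarray(coords, dtype=float)
    # Shift the exponents for numerical stability; the ratios are unchanged.
    w_max = np.exp((x - x.max()) / gamma)
    w_min = np.exp(-(x - x.min()) / gamma)
    smooth_max = np.sum(x * w_max) / np.sum(w_max)
    smooth_min = np.sum(x * w_min) / np.sum(w_min)
    return smooth_max - smooth_min

pins_x = [1.0, 4.0, 9.0]           # illustrative pin x-coordinates of one net
exact = hpwl_1d(pins_x)            # exact span along x
approx = wa_1d(pins_x, gamma=0.5)  # differentiable approximation
```

Because the smooth maximum never exceeds the true maximum and the smooth minimum never falls below the true minimum, the WA value is a lower bound on the exact span, and unlike the exact span it is differentiable in the pin coordinates.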

**Optimizing both thermal and wirelength.** Some methods consider the chip's thermal distribution during placement and optimize wirelength and max temperature simultaneously. However, these methods primarily concentrate on 2.5D chiplet placement and pay relatively less attention to macro placement. TAP-2.5D Ma et al. (2021) employs simulated annealing (SA) to discover a placement with improved wirelength and max temperature by translating and rotating chiplets, starting from the initial layout generated by Chen & Chang (2006). The SA cost function is $cost = \alpha T + (1 - \alpha)W$, where $T$ and $W$ are temperature and wirelength, respectively, and $\alpha$ is the balance coefficient. $\alpha = 0$ when $T \leq 85$; $\alpha$ then rises incrementally with $T$ until it reaches a maximum value of 0.9. This means that heat optimization is withheld during wirelength optimization until the temperature surpasses 85 degrees Celsius, at which point heat optimization is initiated. Chiou et al. (2023) utilizes an SP-based tree to achieve wirelength-focused placement and, after the placement is completed, performs post-placement optimization with thermal considerations. These approaches therefore prioritize wirelength optimization first and address heat optimization only once wirelength optimization reaches a certain level. If the interplay between wirelength and thermal parameters is not considered from the start, the system will converge to a locally optimal solution.
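The temperature-gated cost described above can be sketched as follows. The threshold of 85 and the cap of 0.9 come from the description in the text; the linear ramp and its width `t_span` are hypothetical choices made only for illustration, since the exact schedule is not given here, and in practice $T$ and $W$ would need to be normalized to comparable scales.

```python
def balance_coefficient(t_max, t_threshold=85.0, t_span=30.0, alpha_cap=0.9):
    """Balance coefficient alpha as a function of the max temperature.

    alpha is 0 at or below the threshold and then rises with temperature
    up to alpha_cap. The linear ramp and t_span are assumptions.
    """
    if t_max <= t_threshold:
        return 0.0
    return min(alpha_cap, alpha_cap * (t_max - t_threshold) / t_span)

def sa_cost(t_max, wirelength, **schedule):
    """cost = alpha * T + (1 - alpha) * W with a temperature-dependent alpha."""
    alpha = balance_coefficient(t_max, **schedule)
    return alpha * t_max + (1.0 - alpha) * wirelength

cool_cost = sa_cost(80.0, 100.0)   # below the threshold: pure wirelength cost
hot_alpha = balance_coefficient(200.0)  # far above the threshold: capped
```

The sketch makes the two-stage behaviour visible: while the temperature stays below the threshold the cost ignores temperature entirely, so thermal considerations only enter after the layout has already been shaped by wirelength.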


### 2.2 Chip Dataset and Benchmark

Datasets and benchmarks are crucial for the development of research. They facilitate benchmarking and result reproducibility while also reducing the barriers to entry for new researchers. Researchers have developed various datasets tailored for specific tasks to foster advancements in chip design. For prediction tasks, CircuitNet and CircuitNet 2.0 Jiang et al. (2024); Chai et al. (2023) collected over 10,000 data points from CPU, GPU, and AI chips, and conducted multi-model prediction tasks such as timing, routing feasibility, and IR-drop prediction. For macro placement tasks, commonly used public benchmarks include ISPD Nam et al. (2005; 2006), the IBM benchmark Alpert (1998), and the Ariane RISC-V CPU design Zaruba & Benini (2019). These benchmarks mainly include the length, width, and pin positions of each component (components are interconnected through pins), and the topological relationships (netlist) among components. For confidentiality reasons, these benchmarks do not disclose the specific power of individual components. Without the specific power of each component, thermal field analysis for macro placement cannot be conducted. While the power density of each macro can be constrained within the range of $10^5$ to $10^7\ W/m^2$ based on statistical principles Cong et al. (2004), the power density of components varies across different manufacturing processes and component libraries Borkar (1999); Wrzecionko et al. (2009); Ku et al. (2007); Hanson et al. (2003); Li et al. (2005); Kim et al. (2005). Additionally, the power of each macro is also influenced by voltage fluctuations and frequencies. In other words, the power of macros is influenced by various factors and requires precise calculation to obtain accurate results.


### 3 PRELIMINARY AND NOTATION


### 3.1 MACRO PLACEMENT

Macro placement is an integral part of placement. The macro placement task can be viewed as an optimization problem in which an objective function is minimized by adjusting the positions of the macros while satisfying certain constraints. In macro placement, common optimization metrics mainly include wirelength and max temperature, aiming for the chip to have the shortest possible wirelength and the lowest temperature. Considering the challenges associated with directly calculating wirelength, recent work primarily relies on Half Perimeter WireLength (HPWL) to approximate wirelength. HPWL is computed by summing the half-perimeters of the bounding rectangles of all nets in the chip netlist. As the chip power is predominantly generated in macros, we calculate the power density of each macro based on its power and area. Subsequently, we obtain the chip's thermal field through finite element analysis and extract the max temperature from it.
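As a concrete illustration of the HPWL definition above, the following minimal sketch sums the half-perimeters of the bounding rectangles of all nets. The dictionary-based net representation and the pin coordinates are illustrative assumptions, not the data format of any particular benchmark.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]

def net_hpwl(pins: List[Point]) -> float:
    """Half-perimeter of the bounding rectangle of one net's pins."""
    xs = [p[0] for p in pins]
    ys = [p[1] for p in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def total_hpwl(nets: Dict[str, List[Point]]) -> float:
    """Sum of per-net half-perimeters; single-pin nets contribute nothing."""
    return sum(net_hpwl(pins) for pins in nets.values() if len(pins) > 1)

# Two illustrative nets: a 2-pin net and a 3-pin net.
nets = {
    "n1": [(0.0, 0.0), (3.0, 4.0)],              # 3 + 4 = 7
    "n2": [(1.0, 1.0), (2.0, 5.0), (6.0, 2.0)],  # 5 + 4 = 9
}
hpwl = total_hpwl(nets)
```

For these two nets the half-perimeters are 7 and 9, so the total HPWL is 16.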

Placement constraints mainly include overlap and congestion. The overlap constraint forbids overlap between macros: each position on the chip canvas can be occupied by at most one macro. The congestion constraint requires the routing congestion at each position on the chip canvas to stay below a fixed threshold. The entire placement optimization problem can therefore be formulated as:


$$\min_{x,y}\ HPWL(x,y) + \alpha\, MaxT(x,y) \tag{1}$$

$$\text{s.t.} \quad Overlap(x, y, w, h) = 0 \tag{2}$$

$$Congestion(x, y, w, h) \le C \tag{3}$$


$HPWL(\cdot)$ denotes the Half Perimeter WireLength, $MaxT(\cdot)$ represents the highest temperature of the entire chip, and $Overlap(\cdot)$ and $Congestion(\cdot)$ represent the methods for calculating overlap and congestion, respectively. $C$ is the fixed congestion threshold. $(x, y) = (x_1, y_1, x_2, y_2, \ldots, x_n, y_n)$ collects the placement positions, where $(x_i, y_i)$ is the position of the $i$-th macro.
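The formulation in Equations 1 to 3 can be sketched as a small feasibility check plus a scalar objective. This is a minimal sketch under stated assumptions: each macro is given as `(x, y, w, h)` with `(x, y)` taken to be the lower-left corner, congestion is supplied as a precomputed list of per-position values, and all function names are hypothetical.

```python
from itertools import combinations

def overlap_area(a, b):
    """Overlapping area of two axis-aligned rectangles (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    dx = min(ax + aw, bx + bw) - max(ax, bx)
    dy = min(ay + ah, by + bh) - max(ay, by)
    return max(dx, 0.0) * max(dy, 0.0)

def total_overlap(macros):
    """Sum of pairwise overlap areas; zero means constraint (2) holds."""
    return sum(overlap_area(a, b) for a, b in combinations(macros, 2))

def is_feasible(macros, congestion, c_max):
    """Check constraints (2) and (3) for a candidate placement."""
    return total_overlap(macros) == 0 and max(congestion) <= c_max

def objective(hpwl, max_t, alpha):
    """Objective (1): HPWL plus alpha times the max temperature."""
    return hpwl + alpha * max_t

legal = [(0.0, 0.0, 2.0, 2.0), (2.0, 0.0, 2.0, 2.0)]    # abutting macros
illegal = [(0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 2.0, 2.0)]  # overlapping macros
```

Abutting macros share only an edge, so their overlap area is zero and the placement is legal; the second pair overlaps in a unit square and violates constraint (2).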



Figure 2: The overall structure of our model. Black arrows represent the forward propagation process, while red arrows represent the backward propagation process.

### 4 THERMALLY DRIVEN MACRO PLACEMENT MODEL

By placing one macro onto the chip canvas at each step, we transform chip placement into a Markov decision process (MDP) Kaelbling et al. (1996). The overall architecture of the model, as shown in Figure 2, consists of a policy network $\pi_{\theta}(a_t|s_t)$ and a value network $V_{\phi}(s_t)$. The policy network adopts an encoder-decoder structure, taking the state $s_t$ as input and selecting an action $a_t$ as output. Black arrows represent the forward propagation process, while red arrows represent the backward propagation process. The reward calculator computes the reward by weighting the post-placement increments of HPWL and max temperature.

### 4.1 HEAT MASK

In this section, we perform thermal simulation of the entire chip through finite element analysis (FEA). In the FEA process, accurately describing the boundary effects of a heat source on thermal load variations is crucial. Geometric mapping refers to the formulation of how to construct mechanical analysis models from level-set-based geometric structures Chen et al. (2023); Guo et al. (2014); Kang & Wang (2013); Zhang et al. (2015); Kreisselmeier & Steinhauser (1980); Wang et al. (2018); Torii et al. (2022). We use density-based mapping to maintain a certain level of accuracy while avoiding the additional cost and implementation work associated with grid re-partitioning. We use the efficient algorithm based on the Green's function Liu et al. (2013) to map the 3D thermal field to 2D. The heat equation can be expressed as:

$$\sigma \frac{\partial T(r,t)}{\partial t} = \nabla \cdot (\kappa \nabla T(r,t)) + p(r,t), \quad r \in D \tag{4}$$

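Here $T(r,t)$ is the temperature at position $r$ and time $t$, $\kappa$ is the thermal conductivity, $p(r,t)$ is the heat source power density, $\sigma$ plays the role of the volumetric heat capacity, and $D$ is the chip domain. In our model, we assume that all four sides of the chip are insulated from the ambient environment, so the heat flow toward the walls in the x- and y-directions is zero. Heat generated by on-chip components can be dissipated toward the heat sink at the top or the PCB at the bottom. The boundary conditions of our system can be expressed as follows: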

$$\frac{\partial T(r,t)}{\partial x}\Big|_{x=0,L_x} = \frac{\partial T(r,t)}{\partial y}\Big|_{y=0,L_y} = 0 \tag{5}$$

$$\kappa \frac{\partial T(r,t)}{\partial z}\Big|_{z=-L_z} = h_p T(x, y, -L_z, t) \tag{6}$$

$$\kappa \frac{\partial T(r,t)}{\partial z}\Big|_{z=0} = -h_s T(x,y,0,t) \tag{7}$$

$h_p$ and $h_s$ denote the heat transfer coefficients of the primary heat flow to the heat sink and the secondary heat flow to the PCB, respectively. In our model, the chip is divided into $M \times N$ bins; more bins give a more detailed temperature distribution. Starting from the steady-state (time-independent) form of the heat equation, the general solution subject to the boundary conditions in Equations 5, 6, and 7 has been derived and is implemented as an integral function to approximate the temperature distribution over the bins of the chip. The solution in the z-direction is independent of the solutions in the x- and y-directions, so reducing the dimensionality along the z-axis significantly facilitates the analysis of the on-chip temperature distribution.
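To illustrate what a steady-state temperature map over $M \times N$ bins looks like under insulated side walls, the following sketch solves a simplified 2D balance with Jacobi iteration. This is a swapped-in finite-difference stand-in, not the Green's-function solution used in this work: the lumped lateral conductance `k_lat`, the lumped vertical loss coefficient `g_vert` (standing in for the heat paths governed by $h_p$ and $h_s$), and the dimensionless units are all assumptions made for illustration.

```python
import numpy as np

def steady_state_temperature(q, k_lat=1.0, g_vert=0.1, tol=1e-9, max_iter=20000):
    """Temperature rise per bin for a heat-source map q of shape (M, N).

    Per-bin balance solved by Jacobi iteration:
        k_lat * (sum of neighbour temps - 4 * T) - g_vert * T + q = 0
    Edge padding copies boundary values outward, which gives zero lateral
    flux at the four side walls (the insulated condition of Equation 5).
    """
    q = np.asarray(q, dtype=float)
    t = np.zeros_like(q)
    for _ in range(max_iter):
        p = np.pad(t, 1, mode="edge")
        nb = p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]
        t_new = (k_lat * nb + q) / (4.0 * k_lat + g_vert)
        if np.max(np.abs(t_new - t)) < tol:
            return t_new
        t = t_new
    return t

power = np.zeros((16, 16))
power[4, 4] = 1.0                  # one illustrative heat source bin
temp = steady_state_temperature(power)
hottest_bin = np.unravel_index(np.argmax(temp), temp.shape)
```

Because no heat leaves through the side walls, all injected power must leave through the vertical loss term at steady state, so `g_vert * temp.sum()` matches `power.sum()` up to the iteration tolerance. This gives a quick sanity check on any such solver.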

We introduce a 3D finite element analysis (FEA) method to analyze the thermal distribution on chips. A level set function (LSF) $\phi(x)$ is introduced to describe the shape of components. In our chip designs, each macro is approximated as a rectangle, whose LSF can be constructed in a unified form as:


$$\phi(x, y; x_0, y_0) = 1 - \left(\frac{x - x_0}{a}\right)^m - \left(\frac{y - y_0}{b}\right)^m \tag{8}$$

where $m$ is an integer that controls the component shape; $a$ and $b$ are the semi-major and semi-minor lengths of the component, respectively; and $(x_0, y_0)$ is the geometric center of the component. To keep a unified FEA process without remeshing after each macro movement, the geometric description function of each macro is projected onto a density field with the Heaviside function:

$$H(x) = \begin{cases} 1, & x > 0\\ 0, & x \le 0 \end{cases} \tag{9}$$

The region where the Heaviside function equals 1 represents the area occupied by components, over which the heat source load is distributed. The heat source intensity function (HSIF) $\Phi(x)$ of the whole design can be expressed as:

$$\Phi(x) = \sum_{c=1}^{N_c} Q_c(x) \cdot H(\phi_c(x)) \tag{10}$$

where $Q_c(x)$ is the intensity distribution function of the $c$-th heat source and $N_c$ is the number of heat sources. Structured quadrilateral finite elements are used in our FEA design. The element equilibrium equation is

$$\mathbf{K}^{e}\mathbf{T}^{e} = \mathbf{P}^{e} \tag{11}$$

where $\mathbf{K}^{e}$ is the element heat transfer matrix, $\mathbf{T}^{e}$ is the elemental nodal temperature vector, and $\mathbf{P}^{e}$ is the equivalent elemental nodal thermal load vector. The entire chip's thermal field is obtained through finite element analysis, and this thermal field is used as the heat mask input for the model.
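Equations 8 to 10 can be sketched on a fixed grid as follows: the level set function of each macro is evaluated at the bin centers, thresholded with the Heaviside function, and weighted by the macro's heat intensity. The sketch assumes an even exponent `m` (so that the level set is positive only inside a closed, rectangle-like region), a uniform intensity per macro in place of a general $Q_c(x)$, and hypothetical function names and grid dimensions.

```python
import numpy as np

def level_set(x, y, x0, y0, a, b, m=6):
    """Equation 8: positive inside the macro, non-positive outside (even m)."""
    return 1.0 - ((x - x0) / a) ** m - ((y - y0) / b) ** m

def heaviside(phi):
    """Equation 9: 1 where phi > 0, else 0."""
    return (phi > 0).astype(float)

def heat_source_field(macros, nx, ny, lx, ly, m=6):
    """Equation 10 on an nx-by-ny grid covering an lx-by-ly canvas.

    macros: iterable of (x0, y0, a, b, q), with center (x0, y0),
    half-lengths a and b, and an assumed uniform intensity q.
    """
    xs = (np.arange(nx) + 0.5) * lx / nx   # bin-center x coordinates
    ys = (np.arange(ny) + 0.5) * ly / ny   # bin-center y coordinates
    gx, gy = np.meshgrid(xs, ys, indexing="ij")
    field = np.zeros((nx, ny))
    for x0, y0, a, b, q in macros:
        field += q * heaviside(level_set(gx, gy, x0, y0, a, b, m))
    return field

# One illustrative macro centered at (2, 2) with half-lengths 1 and intensity 5.
field = heat_source_field([(2.0, 2.0, 1.0, 1.0, 5.0)], nx=8, ny=8, lx=8.0, ly=8.0)
```

Since the grid itself never changes, moving a macro only changes `x0` and `y0` in the projection, which is the reason a density-based mapping avoids remeshing after every placement step.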

### 4.2 REINFORCEMENT LEARNING

We drew inspiration from the network structure of MaskPlace Lai et al. (2022) and used the Heat Mask along with the Position Mask, Wire Mask, and View Mask as inputs to the network. We used the widely adopted PPO Schulman et al. (2017) framework to train the policy $\pi_{\theta}(a_t|s_t)$. We combine the HPWL and the weighted max temperature of the entire chip as the reward. Specifically, we use the increases in wirelength and temperature after placing each component as negative rewards in order to minimize wirelength and the highest temperature. The reward is calculated as follows:


$$r_t = (HPWL_{t-1} - HPWL_t) + \alpha(Tmax_{t-1} - Tmax_t) \tag{12}$$

$HPWL_{t-1}$ and $HPWL_t$ represent the wirelength at time $t-1$ and time $t$, and $Tmax_{t-1}$ and $Tmax_t$ represent the highest temperature at time $t-1$ and time $t$, respectively. $\alpha$ is a hyperparameter used to balance the magnitudes of the two terms.
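The reward in Equation 12 can be sketched in a few lines. The function names and the trace values below are illustrative assumptions; only the formula itself comes from the text.

```python
def step_reward(hpwl_prev, hpwl_curr, tmax_prev, tmax_curr, alpha=1.0):
    """Equation 12: increases in HPWL or max temperature give negative reward."""
    return (hpwl_prev - hpwl_curr) + alpha * (tmax_prev - tmax_curr)

def episode_return(hpwl_trace, tmax_trace, alpha=1.0):
    """Undiscounted sum of step rewards over one placement episode."""
    return sum(
        step_reward(hpwl_trace[t - 1], hpwl_trace[t],
                    tmax_trace[t - 1], tmax_trace[t], alpha)
        for t in range(1, len(hpwl_trace))
    )

# Illustrative traces after placing 0, 1, and 2 macros.
hpwl_trace = [0.0, 3.0, 7.0]
tmax_trace = [300.0, 310.0, 305.0]
total = episode_return(hpwl_trace, tmax_trace, alpha=1.0)
```

Without discounting, the per-step rewards telescope: the episode return equals $(HPWL_0 - HPWL_T) + \alpha(Tmax_0 - Tmax_T)$, so maximizing it is equivalent to minimizing the final value of the objective in Equation 1 for a fixed starting state.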

### 5 THERMALLY DRIVEN MACRO PLACEMENT BENCHMARK


Chip design is a complicated process, primarily divided into two stages: front-end design and back-end design. Front-end design focuses on describing the functionality of the chip, using hardware description languages such as Verilog to express its logical functions. Recently, some work has used large language models (LLMs) to generate Verilog code Lai et al. (2024); Alsaqer et al. (2024); Chang et al. (2023), significantly accelerating the front-end design process. Following that, the Verilog code, linked hierarchically to the top module of the design, is analyzed and then mapped to a gate-level description through logic synthesis with respect to specified design constraints. After logic synthesis, the connection relationships between macros and standard cells are established. The area and timing parameters of standard cells and macros differ across technology libraries. After logic synthesis, the chip design progresses into the back-end design stage, where the focus is primarily on completing the physical design of the chip. This includes tasks such as floorplanning, placement, and routing. Placement and routing are key steps in back-end design. During this stage, optimizing the performance, power, and area (PPA) of the chip through placement and routing is crucial, and much research has focused on the fundamental PPA trade-offs made in semiconductor design.

Figure 3: Generation process of the thermally driven macro placement benchmark.

In this section, we construct benchmarks for chip thermal placement, as shown in Figure 3. We use the RISC-V SoC RTL design tools in Chipyard Amid et al. (2020), an open-source framework for agile SoC development. All designs are RISC-V SoC variants whose core(s) are Rocket Asanovic et al. (2016) and/or Boom Celio et al. (2015), as well as Shuttle. The benchmark designs are generated by Chipyard and implemented in Verilog HDL. We apply an SRAM compiler to map the cache modules, which consist of sequential cells in the Verilog files, to vendor SRAMs. The Verilog files with SRAM modules are logically synthesized using Synopsys Design Compiler to obtain the gate-level netlists as well as the power and area of components. The SMIC 55-nm technology node is adopted for the memory compiler and standard cells during logic synthesis. We obtain 15 benchmarks in total, and the detailed information of each benchmark is listed in Table 1.

The netlist generated by logic synthesis represents the logical relationships among components in the integrated circuit. The number of pins of each cell is determined by the fan-out and fan-in of the cell or macro. We distribute the pin locations randomly on the macro boundary. The netlist of a design can be defined as a hypergraph $H(V, E)$, in which the vertices $V$ represent the components and the hyperedges $E$ correspond to the nets. The power data generated by logic synthesis represents the heat power of cells and macros. The heat power of each component consists of static and dynamic power. We use the dynamic power in macro placement tasks, since the dynamic power of macros is orders of magnitude larger than their static power.
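A minimal sketch of the hypergraph view $H(V, E)$ together with a per-component power density is given below. The class and field names are hypothetical, and the example dimensions and power are illustrative values chosen to land inside the $10^5$ to $10^7\ W/m^2$ range discussed in Section 2.2; they are not taken from the benchmark.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Component:
    name: str
    width: float          # metres
    height: float         # metres
    dynamic_power: float  # watts

    @property
    def power_density(self) -> float:
        """Dynamic power divided by footprint area, in W/m^2."""
        return self.dynamic_power / (self.width * self.height)

@dataclass
class Netlist:
    """Hypergraph H(V, E): vertices are components, hyperedges are nets."""
    vertices: Dict[str, Component] = field(default_factory=dict)
    hyperedges: Dict[str, List[str]] = field(default_factory=dict)

    def add_component(self, comp: Component) -> None:
        self.vertices[comp.name] = comp

    def add_net(self, net_name: str, members: List[str]) -> None:
        missing = [m for m in members if m not in self.vertices]
        if missing:
            raise KeyError(f"unknown components in net {net_name}: {missing}")
        self.hyperedges[net_name] = list(members)

    def degree(self, name: str) -> int:
        """Number of nets that touch the given component."""
        return sum(name in members for members in self.hyperedges.values())

netlist = Netlist()
netlist.add_component(Component("sram0", 1e-4, 2e-4, 0.02))  # 100 um x 200 um
netlist.add_component(Component("sram1", 1e-4, 1e-4, 0.005))
netlist.add_component(Component("cell0", 1e-6, 1e-6, 1e-7))
netlist.add_net("n0", ["sram0", "sram1", "cell0"])  # a 3-pin hyperedge
netlist.add_net("n1", ["sram0", "cell0"])
```

A hyperedge can connect more than two components, which is why a plain graph is not sufficient to represent nets; the first net above joins three components at once.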

| Benchmark             | Macros | Std cells | Nets    |
|-----------------------|--------|-----------|---------|
| Rocket                | 80     | 203034    | 302279  |
| HwachaRocket          | 162    | 809553    | 1090468 |
| Sha3Rocket            | 80     | 230981    | 353109  |
| LargeBoomAndRocket    | 138    | 1191052   | 1581156 |
| SmallBoomAndRocket    | 90     | 571904    | 800541  |
| DualBoomAndDualRocket | 260    | 2287113   | 3006245 |
| DualBoomAndRocket     | 180    | 2169563   | 2836397 |
| GemminiRocket         | 392    | 1145387   | 1700468 |
| MempressRocket        | 824    | 697697    | 1133408 |
| FPGemminiRocket       | 280    | 1262227   | 1752083 |
| GemminiShuttle        | 289    | 1204401   | 1756288 |
| LeanGemminiRocket     | 392    | 852394    | 1277971 |
| QuadRocketSbusRingNoC | 552    | 888764    | 1327155 |
| SbusMeshNoC           | 2184   | 1978414   | 3229628 |
| SbusRingNoC           | 936    | 1322864   | 1998056 |

Table 1: The detailed information about the benchmark.

### 6 EXPERIMENTS

We test our method on the benchmarks in Table 1 and compare it with MaskPlace Lai et al. (2022). We evaluate two settings of the trade-off coefficient between wirelength and max temperature, $\alpha = 1$ and $\alpha = 0$. The other hyperparameters are set the same as in previous work. The number of macros in our benchmarks varies from 80 to 2184. For the SbusMeshNoC benchmark, which has over 2000 macros and over 3,000,000 nets, a single RL epoch with step-by-step placement takes more than an hour. Thus, for large benchmarks (containing over 300 macros), we select 256 macros during training and then place all macros at the end.



Figure 4: Placement results for the first 256 macros. (a) Results without heat optimization. (b) Results with heat optimization. The red box marks the space not utilized by the model without heat optimization, while the green box indicates that our model has reserved more ample space for the placement of subsequent macros.

| Benchmark             | Method              | HPWL ($10^5$) | Max temperature (K) |
|-----------------------|---------------------|---------------|---------------------|
| Sha3Rocket            | maskplace           | 5.90          | 393.57              |
|                       | ours ($\alpha = 0$) | 4.97          | 397.32              |
|                       | ours ($\alpha = 1$) | 5.95          | 388.93              |
| Rocket                | maskplace           | 5.73          | 399.84              |
|                       | ours ($\alpha = 0$) | 5.09          | 395.79              |
|                       | ours ($\alpha = 1$) | 5.58          | 392.06              |
| SmallBoomAndRocket    | maskplace           | 5.56          | 422.55              |
|                       | ours ($\alpha = 0$) | 5.28          | 423.54              |
|                       | ours ($\alpha = 1$) | 6.97          | 413.91              |
| LargeBoomAndRocket    | maskplace           | 1.17          | 446.23              |
|                       | ours ($\alpha = 0$) | 1.06          | 443.60              |
|                       | ours ($\alpha = 1$) | 1.50          | 435.83              |
| HwachaRocket          | maskplace           | 1.55          | 452.64              |
|                       | ours ($\alpha = 0$) | 1.40          | 454.96              |
|                       | ours ($\alpha = 1$) | 2.05          | 429.20              |
| DualBoomAndRocket     | maskplace           | 1.70          | 479.73              |
|                       | ours ($\alpha = 0$) | 1.49          | 472.55              |
|                       | ours ($\alpha = 1$) | 1.87          | 466.87              |
| DualBoomAndDualRocket | maskplace           | 2.22          | 487.77              |
|                       | ours ($\alpha = 0$) | 2.19          | 495.62              |
|                       | ours ($\alpha = 1$) | 2.89          | 484.05              |
| FPGemminiRocket       | maskplace           | 3.92          | 348.68              |
|                       | ours ($\alpha = 0$) | 4.04          | 349.00              |
|                       | ours ($\alpha = 1$) | 4.40          | 347.37              |
| GemminiShuttle        | maskplace           | 8.21          | 354.01              |
|                       | ours ($\alpha = 0$) | 7.81          | 356.69              |
|                       | ours ($\alpha = 1$) | 8.19          | 345.93              |
| QuadRocketSbusRingNoC | maskplace           | 24.68         | 342.45              |
|                       | ours ($\alpha = 0$) | 23.02         | 342.17              |
|                       | ours ($\alpha = 1$) | 22.62         | 341.28              |
| SbusMeshNoC           | maskplace           | 15.51         | 401.98              |
|                       | ours ($\alpha = 0$) | 17.15         | 386.39              |
|                       | ours ($\alpha = 1$) | 15.97         | 383.26              |
| SbusRingNoC           | maskplace           | 27.04         | 364.77              |
|                       | ours ($\alpha = 0$) | 26.53         | 359.68              |
|                       | ours ($\alpha = 1$) | 25.91         | 360.75              |
| MempressRocket        | maskplace           | 132.53        | 324.24              |
|                       | ours ($\alpha = 0$) | 142.83        | 328.94              |
|                       | ours ($\alpha = 1$) | 128.93        | 328.42              |

Table 2: Comparison of HPWL ($\times 10^5$) and max temperature.


For wirelength, most of our results show a longer wirelength than the baseline, which reflects the balance between wirelength and the thermal properties of the chip. However, we notice that as chip scale increases, the wirelength gap between our method and the other methods narrows. For QuadRocketSbusRingNoC, SbusRingNoC, and MempressRocket, the wirelength of our method (with $\alpha = 1$) is lower than that of previous methods. Figure 1(d) shows the temperature distribution as well as the placement result for the SbusMeshNoC benchmark. We observe that, due to the heat optimization in the reward function, macros are distributed more evenly across the canvas.
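The comparison in Table 2 can be made more concrete by computing relative changes. The following is a minimal sketch that uses only the maskplace and ours ($\alpha = 1$) rows of Table 2 listed above. The dictionary layout and the function names are illustrative assumptions and are not part of our released code.

```python
# (HPWL as reported in Table 2, max temperature as reported in Table 2)
TABLE2 = {
    "FPGemminiRocket":       {"maskplace": (3.92, 348.68),   "ours_a1": (4.40, 347.37)},
    "GemminiShuttle":        {"maskplace": (8.21, 354.01),   "ours_a1": (8.19, 345.93)},
    "QuadRocketSbusRingNoC": {"maskplace": (24.68, 342.45),  "ours_a1": (22.62, 341.28)},
    "SbusMeshNoC":           {"maskplace": (15.51, 401.98),  "ours_a1": (15.97, 383.26)},
    "SbusRingNoC":           {"maskplace": (27.04, 364.77),  "ours_a1": (25.91, 360.75)},
    "MempressRocket":        {"maskplace": (132.53, 324.24), "ours_a1": (128.93, 328.42)},
}


def relative_change(new, old):
    """Signed relative change of `new` with respect to `old`, in percent."""
    return 100.0 * (new - old) / old


def compare(table):
    """Map each benchmark to (HPWL change in %, max temperature difference).

    Both values are ours (alpha = 1) minus maskplace, so negative is better.
    """
    rows = {}
    for name, methods in table.items():
        hpwl_base, temp_base = methods["maskplace"]
        hpwl_ours, temp_ours = methods["ours_a1"]
        rows[name] = (relative_change(hpwl_ours, hpwl_base), temp_ours - temp_base)
    return rows


if __name__ == "__main__":
    for name, (d_hpwl, d_temp) in compare(TABLE2).items():
        print(f"{name:24s} HPWL {d_hpwl:+7.2f}%   max temp {d_temp:+7.2f}")
```

A negative HPWL percentage marks a benchmark where our placement is shorter than the maskplace placement, and a negative temperature difference marks a cooler placement.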

The results of placing the first 256 macros with our model are shown in Figure 4. Compared to models without heat optimization, our model leaves a larger internal space and achieves a higher space utilization rate, which is more conducive to the placement of subsequent modules. Because the heat optimization term in the reward function acts like a "repulsive force" that imposes an entropic order on the macros, the gaps between large macros are large and uniform in size, so small macros from different modules can be placed effectively in this free space. This indicates that performing heat optimization at the initial stages of large-scale chip placement tasks has a positive impact on reducing both max temperature and wirelength.

Figure 5: Trade-off between wirelength and max temperature.

Trade-off results. The macro placement on each benchmark depends on the trade-off coefficient $\alpha$ between wirelength and max temperature. According to the scatter plot in Figure 5, as $\alpha$ increases, the temperature decreases significantly while the wirelength increases only slightly, which can be attributed to the heat optimization during placement. However, as the trade-off coefficient tends to 1, the wirelength and max temperature results become chaotic. This chaotic trend might be associated with the expansion of the macro configuration space (phase space). Note that a lower wirelength indicates aggregation of macros, whereas heat optimization tends to spread the macros across the whole canvas to diminish hot spots. The massive increase in the number of possible macro configurations indicates that more training epochs are needed to explore the phase space of macros. To avoid the chaotic behavior introduced by this expansion of the configuration space, we select $\alpha=1$ as the trade-off coefficient.
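One simple way to picture the role of $\alpha$ is a weighted-sum cost over normalized wirelength and normalized max temperature. The sketch below is only an assumed form for illustration and is not the reward function of our model. The function `scalarized_cost`, the normalization by the first candidate, and the use of the three SbusMeshNoC rows of Table 2 as candidate placements are all hypothetical choices.

```python
def scalarized_cost(hpwl, t_max, alpha, hpwl_ref, t_ref):
    """Assumed weighted-sum cost: normalized HPWL + alpha * normalized max temperature."""
    return hpwl / hpwl_ref + alpha * (t_max / t_ref)


def best_candidate(candidates, alpha):
    """Return the index of the candidate (hpwl, t_max) pair with the lowest cost.

    The first candidate is used as the normalization reference (an assumption).
    """
    hpwl_ref, t_ref = candidates[0]
    costs = [scalarized_cost(h, t, alpha, hpwl_ref, t_ref) for h, t in candidates]
    return min(range(len(candidates)), key=costs.__getitem__)


# SbusMeshNoC rows of Table 2, used purely as example points:
# maskplace, ours (alpha = 0), ours (alpha = 1).
CANDIDATES = [(15.51, 401.98), (17.15, 386.39), (15.97, 383.26)]

if __name__ == "__main__":
    for alpha in (0.0, 0.5, 1.0):
        idx = best_candidate(CANDIDATES, alpha)
        print(f"alpha={alpha:.1f} -> candidate {idx}: {CANDIDATES[idx]}")
```

With `alpha = 0` the cost reduces to wirelength alone, so the shortest placement is selected. A larger `alpha` shifts the choice toward cooler placements, which matches the qualitative trend described above.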


### 7 CONCLUSION


In this paper, we developed a reinforcement learning-based macro placement model that optimizes both wirelength and the thermal field, thus achieving a balance between wirelength and max temperature. Furthermore, we established 15 open-source macro thermal placement benchmarks through a comprehensive EDA process. We obtained gate-level netlists and detailed power information for each macro through logic synthesis. The experiments show that our model can reduce the chip's max temperature while slightly increasing the wirelength on smaller-scale chips. On larger-scale chips, our model disperses the initial component placement through heat rewards, providing ample space for subsequent macros and reducing wirelength to a certain extent. This also demonstrates the importance of early heat optimization in large-scale chip placement. We aim for our benchmark to promote research in chip thermal placement, thereby enabling chips to achieve optimal performance.

### REFERENCES

- Charles J Alpert. The ispd98 circuit benchmark suite. In *Proceedings of the 1998 international symposium on Physical design*, pp. 80–85, 1998.
- Shadan Alsaqer, Sarah Alajmi, Imtiaz Ahmad, and Mohammad Alfailakawi. The potential of llms in hardware design. *Journal of Engineering Research*, 2024.
- Alon Amid, David Biancolin, Abraham Gonzalez, Daniel Grubb, Sagar Karandikar, Harrison Liew,
   Albert Magyar, Howard Mao, Albert Ou, Nathan Pemberton, et al. Chipyard: Integrated design,
   simulation, and implementation framework for custom socs. *IEEE Micro*, 40(4):10–21, 2020.
- Krste Asanovic, Rimas Avizienis, Jonathan Bachrach, Scott Beamer, David Biancolin, Christopher Celio, Henry Cook, Daniel Dabbelt, John Hauser, Adam Izraelevitz, et al. The rocket chip generator. *EECS Department, University of California, Berkeley, Tech. Rep. UCB/EECS-2016-17*, 4: 6–2, 2016.

- Shekhar Borkar. Design challenges of technology scaling. *IEEE micro*, 19(4):23–29, 1999.
- Christopher Celio, David A Patterson, and Krste Asanovic. The berkeley out-of-order machine (boom): An industry-competitive, synthesizable, parameterized risc-v processor. *EECS Department, University of California, Berkeley, Tech. Rep. UCB/EECS-2015-167*, 2015.
- Zhuomin Chai, Yuxiang Zhao, Wei Liu, Yibo Lin, Runsheng Wang, and Ru Huang. Circuitnet: An open-source dataset for machine learning in vlsi cad applications with improved domain-specific evaluation metric and learning strategies. *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, 42(12):5034–5047, 2023.
- Tony F Chan, Jason Cong, Joseph R Shinnerl, Kenton Sze, and Min Xie. mpl6: Enhanced multilevel mixed-size placement. In *Proceedings of the 2006 international symposium on Physical design*, pp. 212–214, 2006.
- Kaiyan Chang, Ying Wang, Haimeng Ren, Mengdi Wang, Shengwen Liang, Yinhe Han, Huawei Li, and Xiaowei Li. Chipgpt: How far are we from natural language hardware design. *arXiv preprint arXiv:2305.14019*, 2023.
- Tung-Chieh Chen and Yao-Wen Chang. Modern floorplanning based on b*-tree and fast simulated annealing. *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, 25(4):637–650, 2006.
- Tung-Chieh Chen, Zhe-Wei Jiang, Tien-Chang Hsu, Hsin-Chen Chen, and Yao-Wen Chang. Ntuplace3: An analytical placer for large-scale mixed-size designs with preplaced blocks and density constraints. *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, 27(7):1228–1240, 2008.
- Xianqi Chen, Wen Yao, Weien Zhou, Zeyu Zhang, and Yu Li. A general differentiable layout optimization framework for heat transfer problems. *International Journal of Heat and Mass Transfer*, 211:124205, 2023.
- Chung-Kuan Cheng, Andrew B Kahng, Ilgweon Kang, and Lutong Wang. Replace: Advancing solution quality and routability validation in global placement. *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, 38(9):1717–1730, 2018.
- Hong-Wen Chiou, Jia-Hao Jiang, Yu-Teng Chang, Yu-Min Lee, and Chi-Wen Pan. Chiplet placement for 2.5 d ic with sequence pair based tree and thermal consideration. In *Proceedings of the 28th Asia and South Pacific Design Automation Conference*, pp. 7–12, 2023.
- Jason Cong, Jie Wei, and Yan Zhang. A thermal-driven floorplanning algorithm for 3d ics. In *IEEE/ACM International Conference on Computer Aided Design, 2004. ICCAD-2004.*, pp. 306–313. IEEE, 2004.
- Shivani Garg and Neeraj Kr Shukla. A study of floorplanning challenges and analysis of macro placement approaches in physical aware synthesis. *International Journal of Hybrid Information Technology*, 9(1):279–290, 2016.
- Xu Guo, Weisheng Zhang, and Wenliang Zhong. Doing topology optimization explicitly and geometrically—a new moving morphable components based framework. *Journal of Applied Mechanics*, 81(8):081009, 2014.
- Heather Hanson, MS Hrishikesh, Vikas Agarwal, Stephen W Keckler, and Doug Burger. Static energy reduction techniques for microprocessor caches. *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, 11(3):303–313, 2003.
- Xun Jiang, Yuxiang Zhao, Yibo Lin, Runsheng Wang, Ru Huang, et al. Circuitnet 2.0: An advanced dataset for promoting machine learning innovations in realistic chip design environment. In *The Twelfth International Conference on Learning Representations*, 2024.
- Leslie Pack Kaelbling, Michael L Littman, and Andrew W Moore. Reinforcement learning: A survey. *Journal of artificial intelligence research*, 4:237–285, 1996.


- Zhan Kang and Yiqiang Wang. Integrated topology optimization with embedded movable holes based on combined description by material density and level sets. *Computer methods in applied mechanics and engineering*, 255:1–13, 2013.
- Myung-Chul Kim and Igor L Markov. Complx: A competitive primal-dual lagrange optimization for global placement. In *Proceedings of the 49th Annual Design Automation Conference*, pp. 747–752, 2012.
- Myung-Chul Kim, Natarajan Viswanathan, Charles J Alpert, Igor L Markov, and Shyam Ramji. Maple: Multilevel adaptive placement for mixed-size designs. In *Proceedings of the 2012 ACM international symposium on International Symposium on Physical Design*, pp. 193–200, 2012.
- Nam Sung Kim, David Blaauw, and Trevor Mudge. Quantitative analysis and optimization techniques for on-chip cache leakage power. *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, 13(10):1147–1156, 2005.
- Gerhard Kreisselmeier and Reinhold Steinhauser. Systematic control design by optimizing a vector performance index. In *Computer aided design of control systems*, pp. 113–117. Elsevier, 1980.
- Ja Chun Ku, Serkan Ozdemir, Gokhan Memik, and Yehea Ismail. Thermal management of on-chip caches through power density minimization. *IEEE transactions on very large scale integration (VLSI) systems*, 15(5):592–604, 2007.
- Yao Lai, Yao Mu, and Ping Luo. Maskplace: Fast chip placement via reinforced visual representation learning. *Advances in Neural Information Processing Systems*, 35:24019–24030, 2022.
- Yao Lai, Sungyoung Lee, Guojin Chen, Souradip Poddar, Mengkang Hu, David Z Pan, and Ping Luo. Analogcoder: Analog circuit design via training-free code generation. *arXiv preprint arXiv:2405.14918*, 2024.
- Peng Li, Yangdong Deng, and Lawrence T Pileggi. Temperature-dependent optimization of cache leakage power dissipation. In *2005 International Conference on Computer Design*, pp. 7–12. IEEE, 2005.
- Yibo Lin, Shounak Dhar, Wuxi Li, Haoxing Ren, Brucek Khailany, and David Z Pan. Dreamplace: Deep learning toolkit-enabled gpu acceleration for modern vlsi placement. In *Proceedings of the 56th Annual Design Automation Conference 2019*, pp. 1–6, 2019.
- Sean Shih-Ying Liu, Ren-Guo Luo, Suradeth Aroonsantidecha, Ching-Yu Chin, and Hung-Ming Chen. Fast thermal aware placement with accurate thermal analysis based on green function. *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, 22(6):1404–1415, 2013.
- Jingwei Lu, Pengwen Chen, Chin-Chih Chang, Lu Sha, Dennis J-H Huang, Chin-Chi Teng, and Chung-Kuan Cheng. eplace: Electrostatics based placement using nesterov's method. In *Proceedings of the 51st Annual Design Automation Conference*, pp. 1–6, 2014.
- Yenai Ma, Leila Delshadtehrani, Cansu Demirkiran, José L Abellan, and Ajay Joshi. Tap-2.5 d: A thermally-aware chiplet placement methodology for 2.5 d systems. In *2021 Design, Automation & Test in Europe Conference & Exhibition (DATE)*, pp. 1246–1251. IEEE, 2021.
- Azalia Mirhoseini, Anna Goldie, Mustafa Yazgan, Joe Wenjie Jiang, Ebrahim Songhori, Shen Wang, Young-Joon Lee, Eric Johnson, Omkar Pathak, Azade Nazi, et al. A graph placement methodology for fast chip design. *Nature*, 594(7862):207–212, 2021.
- Michael Molter, Rahul Kumar, Sonja Koller, Osama Waqar Bhatti, Nikita Ambasana, Elyse Rosenbaum, and Madhavan Swaminathan. Thermal-aware soc macro placement and multi-chip module design optimization with bayesian optimization. In *2023 IEEE 73rd Electronic Components and Technology Conference (ECTC)*, pp. 935–942. IEEE, 2023.
- Gi-Joon Nam, Charles J Alpert, Paul Villarrubia, Bruce Winter, and Mehmet Yildiz. The ispd2005 placement contest and benchmark suite. In *Proceedings of the 2005 international symposium on Physical design*, pp. 216–220, 2005.


- Gi-Joon Nam, CJ Alpert, and Paul G Villarrubia. The ispd 2006 placement contest and benchmark suite. In *Slides presented at ISPD'06*, 2006.
- Yihang Qiu, Yan Xing, Xin Zheng, Peng Gao, Shuting Cai, and Xiaoming Xiong. Progress of
   placement optimization for accelerating vlsi physical design. *Electronics*, 12(2):337, 2023.
- John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. *arXiv preprint arXiv:1707.06347*, 2017.
- Yunqi Shi, Ke Xue, Song Lei, and Chao Qian. Macro placement by wire-mask-guided black-box optimization. *Advances in Neural Information Processing Systems*, 36, 2024.
- Peter Spindler, Ulf Schlichtmann, and Frank M Johannes. Kraftwerk2—a fast force-directed quadratic placement approach using an accurate net model. *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, 27(8):1398–1411, 2008.
- AJ Torii, JR de Faria, and AA Novotny. Aggregation and regularization schemes: a probabilistic point of view. *Structural and Multidisciplinary Optimization*, 65(3):76, 2022.
- Natarajan Viswanathan, Min Pan, and Chris Chu. Fastplace 3.0: A fast multilevel quadratic placement algorithm with placement congestion control. In *2007 Asia and South Pacific Design Automation Conference*, pp. 135–140. IEEE, 2007.
- Xuan Wang, Kai Long, Van-Nam Hoang, and Ping Hu. An explicit optimization model for integrated layout design of planar multi-component systems using moving morphable bars. *Computer Methods in Applied Mechanics and Engineering*, 342:46–70, 2018.
- Benjamin Wrzecionko, Jürgen Biela, and Johann W Kolar. Sic power semiconductors in hevs:
   Influence of junction temperature on power density, chip utilization and efficiency. In 2009 35th
   Annual Conference of IEEE Industrial Electronics, pp. 3834–3841. IEEE, 2009.
- Gui-Song Xia, Jingwen Hu, Fan Hu, Baoguang Shi, Xiang Bai, Yanfei Zhong, Liangpei Zhang, and Xiaoqiang Lu. Aid: A benchmark data set for performance evaluation of aerial scene classification. *IEEE Transactions on Geoscience and Remote Sensing*, 55(7):3965–3981, 2017.
- Florian Zaruba and Luca Benini. The cost of application-class processing: Energy and performance
   analysis of a linux-ready 1.7-ghz 64-bit risc-v core in 22-nm fdsoi technology. *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, 27(11):2629–2640, 2019.
- Weisheng Zhang, Wenliang Zhong, and Xu Guo. Explicit layout control in optimal design of structural systems with multiple embedding components. *Computer Methods in Applied Mechanics and Engineering*, 290:290–313, 2015.