Modern Hypergraph Partitioning
Balanced hypergraph partitioning is a well-studied, fundamental combinatorial optimization problem with multiple applications in EDA. The objective is to partition vertices of a hypergraph into a specified number of disjoint blocks such that each block has bounded size and the cutsize, i.e., the number of hyperedges spanning multiple blocks, is minimized. In Bustany et al. (2024) we develop the K-way extension of the SpecPart partitioner, K-SpecPart. We are also developing extensions to the TritonPart partitioner, including the following:
2.5D Chiplet Partitioning
Chiplets are seeing increasingly widespread adoption in industry, and 2.5D interposer-based systems are seen as a cost-viable way to build large systems. As a result, disaggregating a large system into its constituent chiplets has become an important problem. We have developed ChipletPart, a cost-driven 2.5D system partitioner that accounts for constraints unique to chiplet systems, including complex cost models, the limited reach of standardized inter-chiplet IO transceivers, and the assignment of manufacturing technologies to different chiplets in a heterogeneously integrated system. We incorporate reach-aware chiplet floorplanning to satisfy the “reach” constraints imposed by IO cells; a fast annealing-based floorplanner in ChipletPart enables the generation of IO-feasible solutions where conventional partitioners fail. We augment ChipletPart with a genetic algorithm to discover both the number of chiplets and their respective manufacturing technology nodes.
Figure 1 shows the complete framework of ChipletPart, where Core-ChipletPart is the main partitioner. In Core-ChipletPart we do not adopt the widely used multilevel approach, since a block-level SoC netlist typically contains only a few hundred IP blocks. We first compute a large pool of initial partitioning solutions using various graph partitioning tools, and then perform floorplan-aware FM-based refinement. Each potential FM move triggers a call to the chiplet cost model to obtain the cost (or gain) of the updated partitioning solution; this involves generating a floorplan with the fast SA-based reach-aware chiplet floorplanner, and then calculating the cost from that floorplan. We adopt the same K-way FM implementation as TritonPart.
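To illustrate the refinement loop, the following sketch shows a greedy, first-improvement variant of pass-based move refinement. It is not the gain-bucket K-way FM used in TritonPart, and the SA-based floorplanner and chiplet cost model are replaced by a simple cutsize stand-in inside evaluate(); all names and data structures are illustrative.

```python
# Illustrative sketch of floorplan-aware move-based refinement (not the
# actual ChipletPart implementation).

def cutsize(blocks, hyperedges):
    """Number of hyperedges spanning more than one block."""
    return sum(len({blocks[v] for v in e}) > 1 for e in hyperedges)

def evaluate(blocks, hyperedges):
    # Stand-in objective: in ChipletPart, each evaluation would run the
    # SA-based reach-aware floorplanner and then price the resulting
    # floorplan with the chiplet cost model.
    return cutsize(blocks, hyperedges)

def refine(blocks, hyperedges, k, passes=3):
    """Pass-based refinement: tentatively move each vertex to every other
    block and keep any move that lowers the evaluated cost."""
    best_cost = evaluate(blocks, hyperedges)
    for _ in range(passes):
        improved = False
        for v in list(blocks):
            for target in range(k):
                if target == blocks[v]:
                    continue
                old = blocks[v]
                blocks[v] = target
                cost = evaluate(blocks, hyperedges)
                if cost < best_cost:
                    best_cost, improved = cost, True
                else:
                    blocks[v] = old  # revert a non-improving move
        if not improved:
            break
    return blocks, best_cost

# Tiny example: 6 vertices, 2 blocks.
edges = [{0, 1, 2}, {2, 3}, {3, 4, 5}, {0, 5}]
part = {0: 0, 1: 0, 2: 1, 3: 1, 4: 0, 5: 1}
print(refine(part, edges, k=2))
```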
Our partitioning approach is well suited to the nature of the chiplet cost model: it achieves up to 12.1% improvement over state-of-the-art min-cut partitioners (hMETIS, TritonPart) that do not guarantee floorplan feasibility. With the genetic, technology-aware extension, cost-driven multi-way ChipletPart finds cost reductions of up to 96% in multi-technology scenarios.
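The genetic layer can be pictured as a search over variable-length chromosomes, where the length encodes the number of chiplets and each gene encodes that chiplet's technology node. The sketch below is illustrative only: the candidate technology list, the mutation operators, and the black-box cost() (which would invoke Core-ChipletPart and the floorplanner) are assumptions, not the actual ChipletPart implementation.

```python
import random

# Assumed candidate technology nodes for illustration.
TECHS = ["7nm", "14nm", "28nm"]

def random_individual(max_chiplets=8):
    # One technology choice per chiplet; list length = number of chiplets.
    return [random.choice(TECHS) for _ in range(random.randint(2, max_chiplets))]

def mutate(ind, max_chiplets=8):
    ind = list(ind)
    r = random.random()
    if r < 0.3 and len(ind) < max_chiplets:
        ind.append(random.choice(TECHS))                         # add a chiplet
    elif r < 0.6 and len(ind) > 2:
        ind.pop(random.randrange(len(ind)))                      # drop a chiplet
    else:
        ind[random.randrange(len(ind))] = random.choice(TECHS)   # change a node
    return ind

def evolve(cost, generations=20, pop_size=16):
    """cost(individual) -> float is assumed to run the core partitioner and
    floorplanner for that chiplet count / technology assignment and return
    the resulting system cost."""
    pop = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=cost)
        parents = pop[: pop_size // 2]
        pop = parents + [mutate(random.choice(parents)) for _ in parents]
    return min(pop, key=cost)

# Toy stand-in cost: prefer fewer chiplets and penalize advanced-node usage.
toy_cost = lambda ind: len(ind) + sum(t == "7nm" for t in ind)
print(evolve(toy_cost))
```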
Multi-die FPGA Partitioning
Similar to chiplets, interposer-based multi-die FPGA systems enable the creation of large FPGAs from smaller dies, which can improve yield and help fabricate larger-capacity devices. Because the TritonPart infrastructure handles multi-dimensional vertex weights (and balance constraints), it is amenable to FPGA partitioning needs. Challenges associated with multi-die FPGA partitioning include (but are not limited to):
- reduced connectivity and increased delay between multiple dies;
- lack of an “optimal” balance factor, since the optimal balance varies between netlists;
- impact of physical orientation and shape of partitions, e.g., with respect to fixed I/Os, length of carry chains, etc.;
- heterogeneous and multi-dimensional resources such as LUTs, FFs, carry chains of LUTs (and their FFs), DSPs and RAMs (further, a die cannot be filled to 100% LUT/FF capacity, as it would likely be unpackable and/or have significantly degraded Fmax due to congestion);
- partitioning prior to clustering requires a means to estimate final cluster utilization; and
- lack of available baselines for Quality of Results (QoR) validation, with evaluation methods for partitioning solutions also being unclear.
We are given a primitive netlist and an interposer-based FPGA architecture. The task is to partition the netlist and use the partitioning solution to guide an FPGA implementation tool (such as Quartus or Verilog-to-Routing) to obtain a better quality of results, such as faster placement, improved Fmax and lower critical path delays (CPDs). Our extension of TritonPart’s implementation to handle the multi-die FPGA partitioning problem includes the following enhancements:
- VTR-based solution evaluation: We leverage the open-source academic Verilog-to-Routing (VTR) tool to evaluate a given partitioning solution by (i) translating the partitioning solution into region (placement) constraints; (ii) passing the region constraints to VTR; and (iii) running the VTR flow to measure Fmax, routed wirelength and critical path delay. Our overall flow is shown in Figure 2. (A sketch of the constraint translation appears at the end of this subsection.)
- Modeling of interposer-based architecture: To model an interposer-based multi-die FPGA architecture, we leverage available monolithic architectures and split them into multiple dies. Of the interconnects that cross die boundaries, we remove a certain percentage and increase the delays of those that remain. (This step is also sketched at the end of this subsection.)
- Partitioning engine enhancements: We remove the imbalance factor from partitioning and instead introduce two types of “resource violation penalties”:
- Cutsize penalty is incurred when the cutsize exceeds the available number of inter-die interconnects; and
- Area penalty is incurred when the area of a partition exceeds the available amount of resources in a die.
The new penalties are integrated directly into the TritonPart cost function (a sketch appears below). Additionally, we use the multi-dimensional vertex weights of TritonPart to handle the heterogeneous nature of FPGA resources, and use grouping constraints to handle macros and carry chains in the design. Ongoing efforts include developing slack budgeting methodologies and improving interaction with the OpenSTA timer.
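To make the enhancements above concrete, a few hedged sketches follow. The first shows one way the partitioning solution could be translated into VTR/VPR placement-region constraints (steps (i)-(ii) of the evaluation flow). The element names follow our reading of VPR's placement-constraints file format; the die regions, atom names and file path are purely illustrative.

```python
from xml.sax.saxutils import escape

def write_region_constraints(assignment, die_regions, path="constraints.xml"):
    """assignment: {atom_name: die_index}; die_regions: list of
    (xlo, ylo, xhi, yhi) grid rectangles, one per die (coordinates assumed)."""
    lines = ['<vpr_constraints tool_name="vpr">', "  <partition_list>"]
    for die, (xlo, ylo, xhi, yhi) in enumerate(die_regions):
        lines.append(f'    <partition name="die{die}">')
        for atom, d in assignment.items():
            if d == die:
                lines.append(f'      <add_atom name_pattern="{escape(atom)}"/>')
        lines.append(f'      <add_region x_low="{xlo}" y_low="{ylo}" '
                     f'x_high="{xhi}" y_high="{yhi}"/>')
        lines.append("    </partition>")
    lines += ["  </partition_list>", "</vpr_constraints>"]
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")

# Example: two dies stacked vertically on an assumed 10x20 device grid.
write_region_constraints({"alu_0": 0, "mem_1": 1},
                         [(0, 0, 9, 9), (0, 10, 9, 19)])
```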
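The interposer model itself amounts to thinning and slowing the die-crossing interconnect of a monolithic architecture. A minimal sketch follows, with the retained fraction and delay scaling chosen arbitrarily for illustration.

```python
def model_interposer(crossing_wires, keep_fraction=0.25, delay_scale=2.0):
    """crossing_wires: list of (wire_id, delay_ns) for interconnects that
    cross a die boundary in the original monolithic architecture. A fraction
    is retained, and the survivors are slowed down."""
    n_keep = max(1, int(len(crossing_wires) * keep_fraction))
    return [(wid, delay * delay_scale) for wid, delay in crossing_wires[:n_keep]]

# Example: keep 25% of eight crossing wires and double their delay.
print(model_interposer([(i, 0.5) for i in range(8)]))
```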
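Finally, the resource violation penalties can be folded into the objective roughly as follows; the penalty weights, the linear penalty form and the resource names are assumptions for illustration rather than the exact TritonPart cost function.

```python
def penalized_cost(cutsize, per_die_usage, die_capacity, interconnect_budget,
                   alpha=10.0, beta=10.0):
    """per_die_usage: one resource-usage dict per die, e.g. {"LUT": 58000, ...};
    die_capacity: per-resource capacity of a single die;
    alpha/beta: assumed penalty weights."""
    cost = float(cutsize)
    # Cutsize penalty: cut exceeds the available inter-die interconnects.
    cost += alpha * max(0, cutsize - interconnect_budget)
    # Area penalty: any resource dimension of a die exceeds its capacity.
    for usage in per_die_usage:
        for res, amount in usage.items():
            cost += beta * max(0, amount - die_capacity[res])
    return cost

# Example: two dies, per-die capacities, and a 900-net cut against an
# 800-wire inter-die interconnect budget (all numbers illustrative).
caps = {"LUT": 60000, "FF": 120000, "DSP": 200}
usage = [{"LUT": 58000, "FF": 90000, "DSP": 150},
         {"LUT": 63000, "FF": 95000, "DSP": 180}]
print(penalized_cost(900, usage, caps, interconnect_budget=800))
```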
Team Members
Collaborators
Yiannis Koutis²
Ismail Bustany³
1. UC San Diego
2. New Jersey Institute of Technology
3. Advanced Micro Devices