LLVM 22.0.0git
AMDGPURegisterBankInfo.cpp
//===- AMDGPURegisterBankInfo.cpp -------------------------------*- C++ -*-==//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
/// \file
/// This file implements the targeting of the RegisterBankInfo class for
/// AMDGPU.
///
/// \par
///
/// AMDGPU has unique register bank constraints that require special high level
/// strategies to deal with. There are two main true physical register banks
/// VGPR (vector), and SGPR (scalar). Additionally the VCC register bank is a
/// sort of pseudo-register bank needed to represent SGPRs used in a vector
/// boolean context. There is also the AGPR bank, which is a special purpose
/// physical register bank present on some subtargets.
///
/// Copying from VGPR to SGPR is generally illegal, unless the value is known
/// to be uniform. It is generally not valid to legalize operands by inserting
/// copies as on other targets. Operations which require uniform, SGPR operands
/// generally require scalarization by repeatedly executing the instruction,
/// activating each set of lanes using a unique set of input values. This is
/// referred to as a waterfall loop.
///
/// \par Booleans
///
/// Booleans (s1 values) require special consideration. A vector compare result
/// is naturally a bitmask with one bit per lane, in a 32 or 64-bit
/// register. These are represented with the VCC bank. During selection, we
/// need to be able to unambiguously go back from a register class to a
/// register bank. To distinguish whether an SGPR should use the SGPR or VCC
/// register bank, we need to know the use context type. An SGPR s1 value
/// always means a VCC bank value, otherwise it will be the SGPR bank. A scalar
/// compare sets SCC, which is a 1-bit unaddressable register. This will need
/// to be copied to a 32-bit virtual register. Taken together, this means we
/// need to adjust the type of boolean operations to be regbank legal. All SALU
/// booleans need to be widened to 32-bits, and all VALU booleans need to be s1
/// values.
///
/// A noteworthy exception to the s1-means-vcc rule is for legalization
/// artifact casts. G_TRUNC s1 results, and G_SEXT/G_ZEXT/G_ANYEXT sources are
/// never vcc bank. A non-boolean source (such as a truncate from a 1-bit load
/// from memory) will require a copy to the VCC bank which will require
/// clearing the high bits and inserting a compare.
///
/// \par Constant bus restriction
///
/// VALU instructions have a limitation known as the constant bus
/// restriction. Most VALU instructions can use SGPR operands, but may read at
/// most 1 SGPR or constant literal value (this increases to 2 in gfx10 for
/// most instructions). The restriction counts unique SGPRs, so the same SGPR
/// may be used for multiple operands. From a register bank perspective, any
/// combination of operands should be legal as an SGPR, but this is
/// contextually dependent on the SGPR operands all being the same register.
/// It is therefore optimal to choose the SGPR with the most uses to minimize
/// the number of copies.
///
/// We avoid trying to solve this problem in RegBankSelect. Any VALU G_*
/// operation should have its source operands all mapped to VGPRs (except for
/// VCC), inserting copies from any SGPR operands. This is the most trivial
/// legal mapping. Anything beyond the simplest 1:1 instruction selection would
/// be too complicated to solve here. Every optimization pattern or instruction
/// selected to multiple outputs would have to enforce this rule, and there
/// would be additional complexity in tracking this rule for every G_*
/// operation. By forcing all inputs to VGPRs, it also simplifies the task of
/// picking the optimal operand combination from a post-isel optimization pass.
///
//===----------------------------------------------------------------------===//
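The waterfall-loop strategy described in the comment above can be sketched as a small standalone simulation (plain C++, not LLVM API code; the name `waterfall` and its shape are illustrative): each iteration reads the value of the first still-active lane, executes the operation for every lane holding that same value, and retires those lanes until the exec mask is empty.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simulate waterfall-loop semantics: run `op` once per unique value in
// `laneValues`, as if each iteration read the first active lane's value with
// v_readfirstlane and then retired every lane holding that value. Returns the
// number of iterations (scalar executions) performed.
template <typename Op>
int waterfall(const std::vector<uint32_t> &laneValues, Op op) {
  assert(laneValues.size() <= 64 && "model is limited to a 64-lane wave");
  uint64_t execMask =
      laneValues.size() == 64 ? ~uint64_t(0)
                              : ((uint64_t(1) << laneValues.size()) - 1);
  int iterations = 0;
  while (execMask != 0) {
    // v_readfirstlane: take the value from the lowest active lane.
    unsigned firstLane = 0;
    while (((execMask >> firstLane) & 1) == 0)
      ++firstLane;
    uint32_t scalarVal = laneValues[firstLane];

    // v_cmp_eq + s_and_saveexec: enable exactly the lanes with this value,
    // execute with the now-uniform operand, then s_xor them out of the mask.
    for (unsigned lane = 0; lane < laneValues.size(); ++lane) {
      if (((execMask >> lane) & 1) && laneValues[lane] == scalarVal) {
        op(lane, scalarVal);
        execMask &= ~(uint64_t(1) << lane);
      }
    }
    ++iterations;
  }
  return iterations;
}
```

With lane values {7, 7, 3, 7} the loop body runs only twice, since only two unique values exist across the four active lanes; this is why the cost of a waterfalled operand depends on value divergence, not lane count.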

#include "AMDGPURegisterBankInfo.h"

#include "AMDGPU.h"
#include "AMDGPUInstrInfo.h"
#include "GCNSubtarget.h"
#include "SIRegisterInfo.h"
#include "llvm/IR/IntrinsicsAMDGPU.h"

#define GET_TARGET_REGBANK_IMPL
#include "AMDGPUGenRegisterBank.inc"

// This file will be TableGen'ed at some point.
#include "AMDGPUGenRegisterBankInfo.def"

using namespace llvm;
using namespace MIPatternMatch;

namespace {

// Observer to apply a register bank to new registers created by
// LegalizerHelper.
class ApplyRegBankMapping final : public GISelChangeObserver {
private:
  MachineIRBuilder &B;
  const AMDGPURegisterBankInfo &RBI;
  MachineRegisterInfo &MRI;
  const RegisterBank *NewBank;
  SmallVector<MachineInstr *, 4> NewInsts;

public:
  ApplyRegBankMapping(MachineIRBuilder &B, const AMDGPURegisterBankInfo &RBI_,
                      MachineRegisterInfo &MRI_, const RegisterBank *RB)
      : B(B), RBI(RBI_), MRI(MRI_), NewBank(RB) {
    assert(!B.isObservingChanges());
    B.setChangeObserver(*this);
  }

  ~ApplyRegBankMapping() override {
    for (MachineInstr *MI : NewInsts)
      applyBank(*MI);

    B.stopObservingChanges();
  }

  /// Set any registers that don't have a set register class or bank to SALU.
  void applyBank(MachineInstr &MI) {
    const unsigned Opc = MI.getOpcode();
    if (Opc == AMDGPU::G_ANYEXT || Opc == AMDGPU::G_ZEXT ||
        Opc == AMDGPU::G_SEXT) {
      // LegalizerHelper wants to use the basic legalization artifacts when
      // widening etc. We don't handle selection with vcc in artifact sources,
      // so we need to use a select instead to handle these properly.
      Register DstReg = MI.getOperand(0).getReg();
      Register SrcReg = MI.getOperand(1).getReg();
      const RegisterBank *SrcBank = RBI.getRegBank(SrcReg, MRI, *RBI.TRI);
      if (SrcBank == &AMDGPU::VCCRegBank) {
        const LLT S32 = LLT::scalar(32);
        assert(MRI.getType(SrcReg) == LLT::scalar(1));
        assert(MRI.getType(DstReg) == S32);
        assert(NewBank == &AMDGPU::VGPRRegBank);

        // Replace the extension with a select, which really uses the boolean
        // source.
        B.setInsertPt(*MI.getParent(), MI);

        auto True = B.buildConstant(S32, Opc == AMDGPU::G_SEXT ? -1 : 1);
        auto False = B.buildConstant(S32, 0);
        B.buildSelect(DstReg, SrcReg, True, False);
        MRI.setRegBank(True.getReg(0), *NewBank);
        MRI.setRegBank(False.getReg(0), *NewBank);
        MI.eraseFromParent();
      }

      assert(!MRI.getRegClassOrRegBank(DstReg));
      MRI.setRegBank(DstReg, *NewBank);
      return;
    }

#ifndef NDEBUG
    if (Opc == AMDGPU::G_TRUNC) {
      Register DstReg = MI.getOperand(0).getReg();
      const RegisterBank *DstBank = RBI.getRegBank(DstReg, MRI, *RBI.TRI);
      assert(DstBank != &AMDGPU::VCCRegBank);
    }
#endif

    for (MachineOperand &Op : MI.operands()) {
      if (!Op.isReg())
        continue;

      // We may see physical registers if building a real MI.
      Register Reg = Op.getReg();
      if (Reg.isPhysical() || MRI.getRegClassOrRegBank(Reg))
        continue;

      const RegisterBank *RB = NewBank;
      if (MRI.getType(Reg) == LLT::scalar(1)) {
        assert(NewBank == &AMDGPU::VGPRRegBank &&
               "s1 operands should only be used for vector bools");
        assert((MI.getOpcode() != AMDGPU::G_TRUNC &&
                MI.getOpcode() != AMDGPU::G_ANYEXT) &&
               "not expecting legalization artifacts here");
        RB = &AMDGPU::VCCRegBank;
      }

      MRI.setRegBank(Reg, *RB);
    }
  }

  void erasingInstr(MachineInstr &MI) override {}

  void createdInstr(MachineInstr &MI) override {
    // At this point, the instruction was just inserted and has no operands.
    NewInsts.push_back(&MI);
  }

  void changingInstr(MachineInstr &MI) override {}
  void changedInstr(MachineInstr &MI) override {
    // FIXME: In principle we should probably add the instruction to NewInsts,
    // but the way the LegalizerHelper uses the observer, we will always see
    // the registers we need to set the regbank on also referenced in a new
    // instruction.
  }
};

} // anonymous namespace

AMDGPURegisterBankInfo::AMDGPURegisterBankInfo(const GCNSubtarget &ST)
    : Subtarget(ST), TRI(Subtarget.getRegisterInfo()),
      TII(Subtarget.getInstrInfo()) {

  // HACK: Until this is fully tablegen'd.
  static llvm::once_flag InitializeRegisterBankFlag;

  static auto InitializeRegisterBankOnce = [this]() {
    assert(&getRegBank(AMDGPU::SGPRRegBankID) == &AMDGPU::SGPRRegBank &&
           &getRegBank(AMDGPU::VGPRRegBankID) == &AMDGPU::VGPRRegBank &&
           &getRegBank(AMDGPU::AGPRRegBankID) == &AMDGPU::AGPRRegBank);
    (void)this;
  };

  llvm::call_once(InitializeRegisterBankFlag, InitializeRegisterBankOnce);
}

static bool isVectorRegisterBank(const RegisterBank &Bank) {
  unsigned BankID = Bank.getID();
  return BankID == AMDGPU::VGPRRegBankID || BankID == AMDGPU::AGPRRegBankID;
}

bool AMDGPURegisterBankInfo::isDivergentRegBank(const RegisterBank *RB) const {
  return RB != &AMDGPU::SGPRRegBank;
}

unsigned AMDGPURegisterBankInfo::copyCost(const RegisterBank &Dst,
                                          const RegisterBank &Src,
                                          TypeSize Size) const {
  // TODO: Should there be a UniformVGPRRegBank which can use readfirstlane?
  if (Dst.getID() == AMDGPU::SGPRRegBankID &&
      (isVectorRegisterBank(Src) || Src.getID() == AMDGPU::VCCRegBankID)) {
    return std::numeric_limits<unsigned>::max();
  }

  // Bool values are tricky, because the meaning is based on context. The SCC
  // and VCC banks are for the natural scalar and vector conditions produced by
  // a compare.
  //
  // Legalization doesn't know about the necessary context, so an s1 use may
  // have been a truncate from an arbitrary value, in which case a copy (lowered
  // as a compare with 0) needs to be inserted.
  if (Size == 1 &&
      (Dst.getID() == AMDGPU::SGPRRegBankID) &&
      (isVectorRegisterBank(Src) ||
       Src.getID() == AMDGPU::SGPRRegBankID ||
       Src.getID() == AMDGPU::VCCRegBankID))
    return std::numeric_limits<unsigned>::max();

  // There is no direct copy between AGPRs.
  if (Dst.getID() == AMDGPU::AGPRRegBankID &&
      Src.getID() == AMDGPU::AGPRRegBankID)
    return 4;

  return RegisterBankInfo::copyCost(Dst, Src, Size);
}
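As a rough behavioral sketch of the cost rules above (illustrative only; `Bank` and `modelCopyCost` are invented names, and the real fallthrough defers to `RegisterBankInfo::copyCost` rather than returning a fixed 1):

```cpp
#include <cassert>
#include <limits>

// Illustrative model of the copy costs above (not the LLVM API): a copy into
// the scalar bank from any divergent bank cannot be a plain copy (modeled as
// "infinite" cost), and AGPR-to-AGPR copies must bounce through a VGPR.
enum class Bank { SGPR, VGPR, AGPR, VCC };

unsigned modelCopyCost(Bank Dst, Bank Src) {
  constexpr unsigned Impossible = std::numeric_limits<unsigned>::max();
  if (Dst == Bank::SGPR && Src != Bank::SGPR)
    return Impossible; // Would need v_readfirstlane, not a copy.
  if (Dst == Bank::AGPR && Src == Bank::AGPR)
    return 4;          // No direct AGPR-to-AGPR copy.
  return 1;            // Ordinary copy.
}
```

The "infinite" SGPR-destination cost is what steers RegBankSelect away from mappings that would require copying a divergent value into a scalar register.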

unsigned AMDGPURegisterBankInfo::getBreakDownCost(
    const ValueMapping &ValMapping,
    const RegisterBank *CurBank) const {
  // Check if this is a breakdown for G_LOAD to move the pointer from SGPR to
  // VGPR.
  // FIXME: Is there a better way to do this?
  if (ValMapping.NumBreakDowns >= 2 || ValMapping.BreakDown[0].Length >= 64)
    return 10; // This is expensive.

  assert(ValMapping.NumBreakDowns == 2 &&
         ValMapping.BreakDown[0].Length == 32 &&
         ValMapping.BreakDown[0].StartIdx == 0 &&
         ValMapping.BreakDown[1].Length == 32 &&
         ValMapping.BreakDown[1].StartIdx == 32 &&
         ValMapping.BreakDown[0].RegBank == ValMapping.BreakDown[1].RegBank);

  // 32-bit extract of a 64-bit value is just access of a subregister, so free.
  // TODO: Cost of 0 hits assert, though it's not clear it's what we really
  // want.

  // TODO: 32-bit insert to a 64-bit SGPR may incur a non-free copy due to SGPR
  // alignment restrictions, but this probably isn't important.
  return 1;
}

const RegisterBank &
AMDGPURegisterBankInfo::getRegBankFromRegClass(const TargetRegisterClass &RC,
                                               LLT Ty) const {
  if (&RC == &AMDGPU::SReg_1RegClass)
    return AMDGPU::VCCRegBank;

  // We promote real scalar booleans to SReg_32. Any SGPR using s1 is really a
  // VCC-like use.
  if (TRI->isSGPRClass(&RC)) {
    // FIXME: This probably came from a copy from a physical register, which
    // should be inferable from the copied to-type. We don't have many boolean
    // physical register constraints so just assume a normal SGPR for now.
    if (!Ty.isValid())
      return AMDGPU::SGPRRegBank;

    return Ty == LLT::scalar(1) ? AMDGPU::VCCRegBank : AMDGPU::SGPRRegBank;
  }

  return TRI->isAGPRClass(&RC) ? AMDGPU::AGPRRegBank : AMDGPU::VGPRRegBank;
}

template <unsigned NumOps>
RegisterBankInfo::InstructionMappings
AMDGPURegisterBankInfo::addMappingFromTable(
    const MachineInstr &MI, const MachineRegisterInfo &MRI,
    const std::array<unsigned, NumOps> RegSrcOpIdx,
    ArrayRef<OpRegBankEntry<NumOps>> Table) const {

  InstructionMappings AltMappings;

  SmallVector<const ValueMapping *, 10> Operands(MI.getNumOperands());

  unsigned Sizes[NumOps];
  for (unsigned I = 0; I < NumOps; ++I) {
    Register Reg = MI.getOperand(RegSrcOpIdx[I]).getReg();
    Sizes[I] = getSizeInBits(Reg, MRI, *TRI);
  }

  for (unsigned I = 0, E = MI.getNumExplicitDefs(); I != E; ++I) {
    unsigned SizeI = getSizeInBits(MI.getOperand(I).getReg(), MRI, *TRI);
    Operands[I] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, SizeI);
  }

  // getInstrMapping's default mapping uses ID 1, so start at 2.
  unsigned MappingID = 2;
  for (const auto &Entry : Table) {
    for (unsigned I = 0; I < NumOps; ++I) {
      int OpIdx = RegSrcOpIdx[I];
      Operands[OpIdx] = AMDGPU::getValueMapping(Entry.RegBanks[I], Sizes[I]);
    }

    AltMappings.push_back(&getInstructionMapping(MappingID++, Entry.Cost,
                                                 getOperandsMapping(Operands),
                                                 Operands.size()));
  }

  return AltMappings;
}

RegisterBankInfo::InstructionMappings
AMDGPURegisterBankInfo::getInstrAlternativeMappingsIntrinsic(
    const MachineInstr &MI, const MachineRegisterInfo &MRI) const {
  switch (cast<GIntrinsic>(MI).getIntrinsicID()) {
  case Intrinsic::amdgcn_readlane: {
    static const OpRegBankEntry<3> Table[2] = {
      // Perfectly legal.
      { { AMDGPU::SGPRRegBankID, AMDGPU::VGPRRegBankID, AMDGPU::SGPRRegBankID }, 1 },

      // Need a readfirstlane for the index.
      { { AMDGPU::SGPRRegBankID, AMDGPU::VGPRRegBankID, AMDGPU::VGPRRegBankID }, 2 }
    };

    const std::array<unsigned, 3> RegSrcOpIdx = { { 0, 2, 3 } };
    return addMappingFromTable<3>(MI, MRI, RegSrcOpIdx, Table);
  }
  case Intrinsic::amdgcn_writelane: {
    static const OpRegBankEntry<4> Table[4] = {
      // Perfectly legal.
      { { AMDGPU::VGPRRegBankID, AMDGPU::SGPRRegBankID, AMDGPU::SGPRRegBankID, AMDGPU::VGPRRegBankID }, 1 },

      // Need readfirstlane of first op
      { { AMDGPU::VGPRRegBankID, AMDGPU::VGPRRegBankID, AMDGPU::SGPRRegBankID, AMDGPU::VGPRRegBankID }, 2 },

      // Need readfirstlane of second op
      { { AMDGPU::VGPRRegBankID, AMDGPU::SGPRRegBankID, AMDGPU::VGPRRegBankID, AMDGPU::VGPRRegBankID }, 2 },

      // Need readfirstlane of both ops
      { { AMDGPU::VGPRRegBankID, AMDGPU::VGPRRegBankID, AMDGPU::VGPRRegBankID, AMDGPU::VGPRRegBankID }, 3 }
    };

    // dst, value, lane select, old value
    const std::array<unsigned, 4> RegSrcOpIdx = { { 0, 2, 3, 4 } };
    return addMappingFromTable<4>(MI, MRI, RegSrcOpIdx, Table);
  }
  default:
    return RegisterBankInfo::getInstrAlternativeMappings(MI);
  }
}

RegisterBankInfo::InstructionMappings
AMDGPURegisterBankInfo::getInstrAlternativeMappingsIntrinsicWSideEffects(
    const MachineInstr &MI, const MachineRegisterInfo &MRI) const {

  switch (cast<GIntrinsic>(MI).getIntrinsicID()) {
  case Intrinsic::amdgcn_s_buffer_load: {
    static const OpRegBankEntry<2> Table[4] = {
      // Perfectly legal.
      { { AMDGPU::SGPRRegBankID, AMDGPU::SGPRRegBankID }, 1 },

      // Only need 1 register in loop
      { { AMDGPU::SGPRRegBankID, AMDGPU::VGPRRegBankID }, 300 },

      // Have to waterfall the resource.
      { { AMDGPU::VGPRRegBankID, AMDGPU::SGPRRegBankID }, 1000 },

      // Have to waterfall the resource, and the offset.
      { { AMDGPU::VGPRRegBankID, AMDGPU::VGPRRegBankID }, 1500 }
    };

    // rsrc, offset
    const std::array<unsigned, 2> RegSrcOpIdx = { { 2, 3 } };
    return addMappingFromTable<2>(MI, MRI, RegSrcOpIdx, Table);
  }
  case Intrinsic::amdgcn_ds_ordered_add:
  case Intrinsic::amdgcn_ds_ordered_swap: {
    // VGPR = M0, VGPR
    static const OpRegBankEntry<3> Table[2] = {
      // Perfectly legal.
      { { AMDGPU::VGPRRegBankID, AMDGPU::SGPRRegBankID, AMDGPU::VGPRRegBankID }, 1 },

      // Need a readfirstlane for m0
      { { AMDGPU::VGPRRegBankID, AMDGPU::VGPRRegBankID, AMDGPU::VGPRRegBankID }, 2 }
    };

    const std::array<unsigned, 3> RegSrcOpIdx = { { 0, 2, 3 } };
    return addMappingFromTable<3>(MI, MRI, RegSrcOpIdx, Table);
  }
  case Intrinsic::amdgcn_s_sendmsg:
  case Intrinsic::amdgcn_s_sendmsghalt: {
    // FIXME: Should have no register for immediate
    static const OpRegBankEntry<1> Table[2] = {
      // Perfectly legal.
      { { AMDGPU::SGPRRegBankID }, 1 },

      // Need readlane
      { { AMDGPU::VGPRRegBankID }, 3 }
    };

    const std::array<unsigned, 1> RegSrcOpIdx = { { 2 } };
    return addMappingFromTable<1>(MI, MRI, RegSrcOpIdx, Table);
  }
  default:
    return RegisterBankInfo::getInstrAlternativeMappings(MI);
  }
}

// FIXME: Returns uniform if there's no source value information. This is
// probably wrong.
bool AMDGPURegisterBankInfo::isScalarLoadLegal(const MachineInstr &MI) const {
  if (!MI.hasOneMemOperand())
    return false;

  const MachineMemOperand *MMO = *MI.memoperands_begin();
  const unsigned AS = MMO->getAddrSpace();
  const bool IsConst = AS == AMDGPUAS::CONSTANT_ADDRESS ||
                       AS == AMDGPUAS::CONSTANT_ADDRESS_32BIT;
  const unsigned MemSize = 8 * MMO->getSize().getValue();

  // Require 4-byte alignment.
  return (MMO->getAlign() >= Align(4) ||
          (Subtarget.hasScalarSubwordLoads() &&
           ((MemSize == 16 && MMO->getAlign() >= Align(2)) ||
            (MemSize == 8 && MMO->getAlign() >= Align(1))))) &&
         // Can't do a scalar atomic load.
         !MMO->isAtomic() &&
         // Don't use scalar loads for volatile accesses to non-constant
         // address spaces.
         (IsConst || !MMO->isVolatile()) &&
         // Memory must be known constant, or not written before this load.
         (IsConst || MMO->isInvariant() || (MMO->getFlags() & MONoClobber)) &&
         AMDGPU::isUniformMMO(MMO);
}
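The chain of checks above can be modeled as a standalone predicate (illustrative; `MemAccess` and `modelScalarLoadLegal` are invented names, and the MONoClobber/isUniformMMO conditions are folded into a single invariance flag):

```cpp
#include <cassert>

// Simplified model of the scalar-load legality checks above (not the LLVM
// API). Sizes are in bits, alignment in bytes.
struct MemAccess {
  unsigned MemSizeBits;
  unsigned AlignBytes;
  bool IsConstAddrSpace;
  bool IsAtomic;
  bool IsVolatile;
  bool IsInvariant;
};

bool modelScalarLoadLegal(const MemAccess &M, bool HasScalarSubwordLoads) {
  // Require 4-byte alignment, or hardware support for subword scalar loads.
  bool AlignOK =
      M.AlignBytes >= 4 ||
      (HasScalarSubwordLoads &&
       ((M.MemSizeBits == 16 && M.AlignBytes >= 2) ||
        (M.MemSizeBits == 8 && M.AlignBytes >= 1)));
  return AlignOK &&
         !M.IsAtomic &&                           // No scalar atomic loads.
         (M.IsConstAddrSpace || !M.IsVolatile) && // Volatile needs const AS.
         (M.IsConstAddrSpace || M.IsInvariant);   // Memory must not be written.
}
```

Note how a 16-bit load with only 2-byte alignment flips from illegal to legal when the subtarget supports scalar subword loads; everything else stays a hard requirement.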

RegisterBankInfo::InstructionMappings
AMDGPURegisterBankInfo::getInstrAlternativeMappings(
    const MachineInstr &MI) const {

  const MachineFunction &MF = *MI.getParent()->getParent();
  const MachineRegisterInfo &MRI = MF.getRegInfo();

  InstructionMappings AltMappings;
  switch (MI.getOpcode()) {
  case TargetOpcode::G_CONSTANT:
  case TargetOpcode::G_IMPLICIT_DEF: {
    unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
    if (Size == 1) {
      static const OpRegBankEntry<1> Table[3] = {
        { { AMDGPU::VGPRRegBankID }, 1 },
        { { AMDGPU::SGPRRegBankID }, 1 },
        { { AMDGPU::VCCRegBankID }, 1 }
      };

      return addMappingFromTable<1>(MI, MRI, {{ 0 }}, Table);
    }

    [[fallthrough]];
  }
  case TargetOpcode::G_FCONSTANT:
  case TargetOpcode::G_FRAME_INDEX:
  case TargetOpcode::G_GLOBAL_VALUE: {
    static const OpRegBankEntry<1> Table[2] = {
      { { AMDGPU::VGPRRegBankID }, 1 },
      { { AMDGPU::SGPRRegBankID }, 1 }
    };

    return addMappingFromTable<1>(MI, MRI, {{ 0 }}, Table);
  }
  case TargetOpcode::G_AND:
  case TargetOpcode::G_OR:
  case TargetOpcode::G_XOR: {
    unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);

    if (Size == 1) {
      // s_{and|or|xor}_b32 set scc when the result of the 32-bit op is not 0.
      const InstructionMapping &SCCMapping = getInstructionMapping(
        1, 1, getOperandsMapping(
          {AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 32),
           AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 32),
           AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 32)}),
        3); // Num Operands
      AltMappings.push_back(&SCCMapping);

      const InstructionMapping &VCCMapping0 = getInstructionMapping(
        2, 1, getOperandsMapping(
          {AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, Size),
           AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, Size),
           AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, Size)}),
        3); // Num Operands
      AltMappings.push_back(&VCCMapping0);
      return AltMappings;
    }

    if (Size != 64)
      break;

    const InstructionMapping &SSMapping = getInstructionMapping(
      1, 1, getOperandsMapping(
        {AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size),
         AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size),
         AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size)}),
      3); // Num Operands
    AltMappings.push_back(&SSMapping);

    const InstructionMapping &VVMapping = getInstructionMapping(
      2, 2, getOperandsMapping(
        {AMDGPU::getValueMappingSGPR64Only(AMDGPU::VGPRRegBankID, Size),
         AMDGPU::getValueMappingSGPR64Only(AMDGPU::VGPRRegBankID, Size),
         AMDGPU::getValueMappingSGPR64Only(AMDGPU::VGPRRegBankID, Size)}),
      3); // Num Operands
    AltMappings.push_back(&VVMapping);
    break;
  }
  case TargetOpcode::G_LOAD:
  case TargetOpcode::G_ZEXTLOAD:
  case TargetOpcode::G_SEXTLOAD: {
    unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
    LLT PtrTy = MRI.getType(MI.getOperand(1).getReg());
    unsigned PtrSize = PtrTy.getSizeInBits();
    unsigned AS = PtrTy.getAddressSpace();

    if ((AS != AMDGPUAS::LOCAL_ADDRESS && AS != AMDGPUAS::REGION_ADDRESS &&
         AS != AMDGPUAS::PRIVATE_ADDRESS) &&
        isScalarLoadLegal(MI)) {
      const InstructionMapping &SSMapping = getInstructionMapping(
        1, 1, getOperandsMapping(
          {AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size),
           AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, PtrSize)}),
        2); // Num Operands
      AltMappings.push_back(&SSMapping);
    }

    const InstructionMapping &VVMapping = getInstructionMapping(
      2, 1,
      getOperandsMapping(
        {AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size),
         AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, PtrSize)}),
      2); // Num Operands
    AltMappings.push_back(&VVMapping);

    // It may be possible to have a vgpr = load sgpr mapping here, because
    // the mubuf instructions support this kind of load, but probably for only
    // gfx7 and older. However, the addressing mode matching in the instruction
    // selector should be able to do a better job of detecting and selecting
    // these kinds of loads from the vgpr = load vgpr mapping.

    return AltMappings;
  }
  case TargetOpcode::G_SELECT: {
    unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
    const InstructionMapping &SSMapping = getInstructionMapping(1, 1,
      getOperandsMapping({AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size),
                          AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 1),
                          AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size),
                          AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size)}),
      4); // Num Operands
    AltMappings.push_back(&SSMapping);

    const InstructionMapping &VVMapping = getInstructionMapping(2, 1,
      getOperandsMapping({AMDGPU::getValueMappingSGPR64Only(AMDGPU::VGPRRegBankID, Size),
                          AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1),
                          AMDGPU::getValueMappingSGPR64Only(AMDGPU::VGPRRegBankID, Size),
                          AMDGPU::getValueMappingSGPR64Only(AMDGPU::VGPRRegBankID, Size)}),
      4); // Num Operands
    AltMappings.push_back(&VVMapping);

    return AltMappings;
  }
  case TargetOpcode::G_UADDE:
  case TargetOpcode::G_USUBE:
  case TargetOpcode::G_SADDE:
  case TargetOpcode::G_SSUBE: {
    unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
    const InstructionMapping &SSMapping = getInstructionMapping(1, 1,
      getOperandsMapping(
        {AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size),
         AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 1),
         AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size),
         AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size),
         AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 1)}),
      5); // Num Operands
    AltMappings.push_back(&SSMapping);

    const InstructionMapping &VVMapping = getInstructionMapping(2, 1,
      getOperandsMapping({AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size),
                          AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1),
                          AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size),
                          AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size),
                          AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1)}),
      5); // Num Operands
    AltMappings.push_back(&VVMapping);
    return AltMappings;
  }
  case AMDGPU::G_BRCOND: {
    assert(MRI.getType(MI.getOperand(0).getReg()).getSizeInBits() == 1);

    // TODO: Change type to 32 for scalar
    const InstructionMapping &SMapping = getInstructionMapping(
      1, 1, getOperandsMapping(
        {AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 1), nullptr}),
      2); // Num Operands
    AltMappings.push_back(&SMapping);

    const InstructionMapping &VMapping = getInstructionMapping(
      1, 1, getOperandsMapping(
        {AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1), nullptr}),
      2); // Num Operands
    AltMappings.push_back(&VMapping);
    return AltMappings;
  }
  case AMDGPU::G_INTRINSIC:
  case AMDGPU::G_INTRINSIC_CONVERGENT:
    return getInstrAlternativeMappingsIntrinsic(MI, MRI);
  case AMDGPU::G_INTRINSIC_W_SIDE_EFFECTS:
  case AMDGPU::G_INTRINSIC_CONVERGENT_W_SIDE_EFFECTS:
    return getInstrAlternativeMappingsIntrinsicWSideEffects(MI, MRI);
  default:
    break;
  }
  return RegisterBankInfo::getInstrAlternativeMappings(MI);
}

void AMDGPURegisterBankInfo::split64BitValueForMapping(MachineIRBuilder &B,
                                                       SmallVector<Register, 2> &Regs,
                                                       LLT HalfTy,
                                                       Register Reg) const {
  assert(HalfTy.getSizeInBits() == 32);
  MachineRegisterInfo *MRI = B.getMRI();
  Register LoLHS = MRI->createGenericVirtualRegister(HalfTy);
  Register HiLHS = MRI->createGenericVirtualRegister(HalfTy);
  const RegisterBank *Bank = getRegBank(Reg, *MRI, *TRI);
  MRI->setRegBank(LoLHS, *Bank);
  MRI->setRegBank(HiLHS, *Bank);

  Regs.push_back(LoLHS);
  Regs.push_back(HiLHS);

  B.buildInstr(AMDGPU::G_UNMERGE_VALUES)
    .addDef(LoLHS)
    .addDef(HiLHS)
    .addUse(Reg);
}

/// Replace the current type each register in \p Regs has with \p NewTy
static void setRegsToType(MachineRegisterInfo &MRI, ArrayRef<Register> Regs,
                          LLT NewTy) {
  for (Register Reg : Regs) {
    assert(MRI.getType(Reg).getSizeInBits() == NewTy.getSizeInBits());
    MRI.setType(Reg, NewTy);
  }
}

static LLT getHalfSizedType(LLT Ty) {
  if (Ty.isVector()) {
    assert(Ty.getElementCount().isKnownMultipleOf(2));
    return LLT::scalarOrVector(Ty.getElementCount().divideCoefficientBy(2),
                               Ty.getElementType());
  }

  assert(Ty.getScalarSizeInBits() % 2 == 0);
  return LLT::scalar(Ty.getScalarSizeInBits() / 2);
}

// Build one or more V_READFIRSTLANE_B32 instructions to move the given vector
// source value into a scalar register.
Register AMDGPURegisterBankInfo::buildReadFirstLane(MachineIRBuilder &B,
                                                    MachineRegisterInfo &MRI,
                                                    Register Src) const {
  LLT Ty = MRI.getType(Src);
  const RegisterBank *Bank = getRegBank(Src, MRI, *TRI);

  if (Bank == &AMDGPU::SGPRRegBank)
    return Src;

  unsigned Bits = Ty.getSizeInBits();
  assert(Bits % 32 == 0);

  if (Bank != &AMDGPU::VGPRRegBank) {
    // We need to copy from AGPR to VGPR
    Src = B.buildCopy(Ty, Src).getReg(0);
    MRI.setRegBank(Src, AMDGPU::VGPRRegBank);
  }

  LLT S32 = LLT::scalar(32);
  unsigned NumParts = Bits / 32;
  SmallVector<Register, 8> SrcParts;
  SmallVector<Register, 8> DstParts;

  if (Bits == 32) {
    SrcParts.push_back(Src);
  } else {
    auto Unmerge = B.buildUnmerge(S32, Src);
    for (unsigned i = 0; i < NumParts; ++i)
      SrcParts.push_back(Unmerge.getReg(i));
  }

  for (unsigned i = 0; i < NumParts; ++i) {
    Register SrcPart = SrcParts[i];
    Register DstPart = MRI.createVirtualRegister(&AMDGPU::SReg_32_XM0RegClass);
    MRI.setType(DstPart, NumParts == 1 ? Ty : S32);

    const TargetRegisterClass *Constrained =
        constrainGenericRegister(SrcPart, AMDGPU::VGPR_32RegClass, MRI);
    (void)Constrained;
    assert(Constrained && "Failed to constrain readfirstlane src reg");

    B.buildInstr(AMDGPU::V_READFIRSTLANE_B32, {DstPart}, {SrcPart});

    DstParts.push_back(DstPart);
  }

  if (Bits == 32)
    return DstParts[0];

  Register Dst = B.buildMergeLikeInstr(Ty, DstParts).getReg(0);
  MRI.setRegBank(Dst, AMDGPU::SGPRRegBank);
  return Dst;
}
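The splitting strategy can be sketched as a standalone round-trip (plain C++, not the LLVM API; `modelReadFirstLane64` is an invented name): unmerge a 64-bit value into two 32-bit parts, read each part from the first lane, and merge the scalar parts back together.

```cpp
#include <cstdint>
#include <vector>

// Standalone sketch of buildReadFirstLane's splitting, assuming a 64-bit
// uniform per-lane value: V_READFIRSTLANE_B32 only moves 32 bits at a time,
// so a wide value needs one readfirstlane per 32-bit piece.
uint64_t modelReadFirstLane64(const std::vector<uint64_t> &laneValues) {
  uint64_t Src = laneValues.front();              // lane 0 is the first lane
  uint32_t Lo = static_cast<uint32_t>(Src);       // G_UNMERGE_VALUES part 0
  uint32_t Hi = static_cast<uint32_t>(Src >> 32); // G_UNMERGE_VALUES part 1
  // One V_READFIRSTLANE_B32 per part; on the lane-0 parts this is the
  // identity, so merging reconstructs the original value.
  return (uint64_t(Hi) << 32) | Lo;               // G_MERGE_VALUES
}
```

This is why the function asserts `Bits % 32 == 0`: anything that is not a multiple of the 32-bit lane-read width cannot be moved piecewise.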

/// Legalize instruction \p MI where operands in \p OpIndices must be SGPRs. If
/// any of the required SGPR operands are VGPRs, perform a waterfall loop to
/// execute the instruction for each unique combination of values in all lanes
/// in the wave. The block will be split such that rest of the instructions are
/// moved to a new block.
///
/// Essentially performs this loop:
///
/// Save Execution Mask
/// For (Lane : Wavefront) {
///   Enable Lane, Disable all other lanes
///   SGPR = read SGPR value for current lane from VGPR
///   VGPRResult[Lane] = use_op SGPR
/// }
/// Restore Execution Mask
///
/// There is additional complexity to try to compare values to identify the
/// unique values used.
bool AMDGPURegisterBankInfo::executeInWaterfallLoop(
    MachineIRBuilder &B, iterator_range<MachineBasicBlock::iterator> Range,
    SmallSet<Register, 4> &SGPROperandRegs) const {
  // Track use registers which have already been expanded with a readfirstlane
  // sequence. This may have multiple uses if moving a sequence.
  DenseMap<Register, Register> WaterfalledRegMap;

  MachineBasicBlock &MBB = B.getMBB();
  MachineFunction *MF = &B.getMF();

  const TargetRegisterClass *WaveRC = TRI->getWaveMaskRegClass();
  const unsigned MovExecOpc =
      Subtarget.isWave32() ? AMDGPU::S_MOV_B32 : AMDGPU::S_MOV_B64;
  const unsigned MovExecTermOpc =
      Subtarget.isWave32() ? AMDGPU::S_MOV_B32_term : AMDGPU::S_MOV_B64_term;

  const unsigned XorTermOpc = Subtarget.isWave32() ?
    AMDGPU::S_XOR_B32_term : AMDGPU::S_XOR_B64_term;
  const unsigned AndSaveExecOpc = Subtarget.isWave32() ?
    AMDGPU::S_AND_SAVEEXEC_B32 : AMDGPU::S_AND_SAVEEXEC_B64;
  const unsigned ExecReg = Subtarget.isWave32() ?
    AMDGPU::EXEC_LO : AMDGPU::EXEC;

#ifndef NDEBUG
  const int OrigRangeSize = std::distance(Range.begin(), Range.end());
#endif

  MachineRegisterInfo &MRI = *B.getMRI();
  Register SaveExecReg = MRI.createVirtualRegister(WaveRC);
  Register InitSaveExecReg = MRI.createVirtualRegister(WaveRC);

  // Don't bother using generic instructions/registers for the exec mask.
  B.buildInstr(TargetOpcode::IMPLICIT_DEF)
    .addDef(InitSaveExecReg);

  Register PhiExec = MRI.createVirtualRegister(WaveRC);
  Register NewExec = MRI.createVirtualRegister(WaveRC);

  // To insert the loop we need to split the block. Move everything before this
  // point to a new block, and insert a new empty block before this instruction.
  MachineBasicBlock *LoopBB = MF->CreateMachineBasicBlock();
  MachineBasicBlock *BodyBB = MF->CreateMachineBasicBlock();
  MachineBasicBlock *RemainderBB = MF->CreateMachineBasicBlock();
  MachineBasicBlock *RestoreExecBB = MF->CreateMachineBasicBlock();
  MachineFunction::iterator MBBI(MBB);
  ++MBBI;
  MF->insert(MBBI, LoopBB);
  MF->insert(MBBI, BodyBB);
  MF->insert(MBBI, RestoreExecBB);
  MF->insert(MBBI, RemainderBB);

  LoopBB->addSuccessor(BodyBB);
  BodyBB->addSuccessor(RestoreExecBB);
  BodyBB->addSuccessor(LoopBB);

  // Move the rest of the block into a new block.
  RemainderBB->transferSuccessorsAndUpdatePHIs(&MBB);
  RemainderBB->splice(RemainderBB->begin(), &MBB, Range.end(), MBB.end());

  MBB.addSuccessor(LoopBB);
  RestoreExecBB->addSuccessor(RemainderBB);

  B.setInsertPt(*LoopBB, LoopBB->end());

  B.buildInstr(TargetOpcode::PHI)
    .addDef(PhiExec)
    .addReg(InitSaveExecReg)
    .addMBB(&MBB)
    .addReg(NewExec)
    .addMBB(BodyBB);

  const DebugLoc &DL = B.getDL();

  MachineInstr &FirstInst = *Range.begin();

  // Move the instruction into the loop body. Note we moved everything after
  // Range.end() already into a new block, so Range.end() is no longer valid.
  BodyBB->splice(BodyBB->end(), &MBB, Range.begin(), MBB.end());

  // Figure out the iterator range after splicing the instructions.
  MachineBasicBlock::iterator NewBegin = FirstInst.getIterator();
  auto NewEnd = BodyBB->end();

  B.setMBB(*LoopBB);

  LLT S1 = LLT::scalar(1);
  Register CondReg;

  assert(std::distance(NewBegin, NewEnd) == OrigRangeSize);

  for (MachineInstr &MI : make_range(NewBegin, NewEnd)) {
    for (MachineOperand &Op : MI.all_uses()) {
      Register OldReg = Op.getReg();
      if (!SGPROperandRegs.count(OldReg))
        continue;

      // See if we already processed this register in another instruction in
      // the sequence.
      auto OldVal = WaterfalledRegMap.find(OldReg);
      if (OldVal != WaterfalledRegMap.end()) {
        Op.setReg(OldVal->second);
        continue;
      }

      Register OpReg = Op.getReg();
      LLT OpTy = MRI.getType(OpReg);

      const RegisterBank *OpBank = getRegBank(OpReg, MRI, *TRI);
      if (OpBank != &AMDGPU::VGPRRegBank) {
        // Insert copy from AGPR to VGPR before the loop.
        B.setMBB(MBB);
        OpReg = B.buildCopy(OpTy, OpReg).getReg(0);
        MRI.setRegBank(OpReg, AMDGPU::VGPRRegBank);
        B.setMBB(*LoopBB);
      }

      Register CurrentLaneReg = buildReadFirstLane(B, MRI, OpReg);

      // Build the comparison(s).
      unsigned OpSize = OpTy.getSizeInBits();
      bool Is64 = OpSize % 64 == 0;
      unsigned PartSize = Is64 ? 64 : 32;
      LLT PartTy = LLT::scalar(PartSize);
      unsigned NumParts = OpSize / PartSize;
      SmallVector<Register, 8> OpParts;
      SmallVector<Register, 8> CurrentLaneParts;

      if (NumParts == 1) {
        OpParts.push_back(OpReg);
        CurrentLaneParts.push_back(CurrentLaneReg);
      } else {
        auto UnmergeOp = B.buildUnmerge(PartTy, OpReg);
        auto UnmergeCurrentLane = B.buildUnmerge(PartTy, CurrentLaneReg);
        for (unsigned i = 0; i < NumParts; ++i) {
          OpParts.push_back(UnmergeOp.getReg(i));
          CurrentLaneParts.push_back(UnmergeCurrentLane.getReg(i));
          MRI.setRegBank(OpParts[i], AMDGPU::VGPRRegBank);
          MRI.setRegBank(CurrentLaneParts[i], AMDGPU::SGPRRegBank);
        }
      }

      for (unsigned i = 0; i < NumParts; ++i) {
        auto CmpReg = B.buildICmp(CmpInst::ICMP_EQ, S1, CurrentLaneParts[i],
                                  OpParts[i]).getReg(0);
        MRI.setRegBank(CmpReg, AMDGPU::VCCRegBank);

        if (!CondReg) {
          CondReg = CmpReg;
        } else {
          CondReg = B.buildAnd(S1, CondReg, CmpReg).getReg(0);
          MRI.setRegBank(CondReg, AMDGPU::VCCRegBank);
        }
      }

      Op.setReg(CurrentLaneReg);

      // Make sure we don't re-process this register again.
      WaterfalledRegMap.insert(std::pair(OldReg, Op.getReg()));
    }
  }
935
936 // The ballot becomes a no-op during instruction selection.
937 CondReg = B.buildIntrinsic(Intrinsic::amdgcn_ballot,
938 {LLT::scalar(Subtarget.isWave32() ? 32 : 64)})
939 .addReg(CondReg)
940 .getReg(0);
941 MRI.setRegClass(CondReg, WaveRC);
942
943 // Update EXEC, save the original EXEC value to VCC.
944 B.buildInstr(AndSaveExecOpc)
945 .addDef(NewExec)
946 .addReg(CondReg, RegState::Kill);
947
948 MRI.setSimpleHint(NewExec, CondReg);
949
950 B.setInsertPt(*BodyBB, BodyBB->end());
951
952 // Update EXEC, switch all done bits to 0 and all todo bits to 1.
953 B.buildInstr(XorTermOpc)
954 .addDef(ExecReg)
955 .addReg(ExecReg)
956 .addReg(NewExec);
957
958 // XXX - s_xor_b64 sets scc to 1 if the result is nonzero, so can we use
959 // s_cbranch_scc0?
960
961 // Loop back to V_READFIRSTLANE_B32 if there are still variants to cover.
962 B.buildInstr(AMDGPU::SI_WATERFALL_LOOP).addMBB(LoopBB);
963
964 // Save the EXEC mask before the loop.
965 BuildMI(MBB, MBB.end(), DL, TII->get(MovExecOpc), SaveExecReg)
966 .addReg(ExecReg);
967
968 // Restore the EXEC mask after the loop.
969 B.setMBB(*RestoreExecBB);
970 B.buildInstr(MovExecTermOpc)
971 .addDef(ExecReg)
972 .addReg(SaveExecReg);
973
974 // Set the insert point after the original instruction, so any new
975 // instructions will be in the remainder.
976 B.setInsertPt(*RemainderBB, RemainderBB->begin());
977
978 return true;
979}
980
981// Return any unique registers used by \p MI at \p OpIndices that need to be
982// handled in a waterfall loop. Returns these registers in \p
983// SGPROperandRegs. Returns true if there are any operands to handle and a
984// waterfall loop is necessary.
 985bool AMDGPURegisterBankInfo::collectWaterfallOperands(
986 SmallSet<Register, 4> &SGPROperandRegs, MachineInstr &MI,
987 MachineRegisterInfo &MRI, ArrayRef<unsigned> OpIndices) const {
988 for (unsigned Op : OpIndices) {
989 assert(MI.getOperand(Op).isUse());
990 Register Reg = MI.getOperand(Op).getReg();
991 const RegisterBank *OpBank = getRegBank(Reg, MRI, *TRI);
992 if (OpBank->getID() != AMDGPU::SGPRRegBankID)
993 SGPROperandRegs.insert(Reg);
994 }
995
996 // No operands need to be replaced, so no need to loop.
997 return !SGPROperandRegs.empty();
998}
999
1000bool AMDGPURegisterBankInfo::executeInWaterfallLoop(
1001 MachineIRBuilder &B, MachineInstr &MI, ArrayRef<unsigned> OpIndices) const {
1002 // Use a set to avoid extra readfirstlanes in the case where multiple operands
1003 // are the same register.
1004 SmallSet<Register, 4> SGPROperandRegs;
1005
1006 if (!collectWaterfallOperands(SGPROperandRegs, MI, *B.getMRI(), OpIndices))
1007 return false;
1008
1009 MachineBasicBlock::iterator I = MI.getIterator();
1010 return executeInWaterfallLoop(B, make_range(I, std::next(I)),
1011 SGPROperandRegs);
1012}
1013
1014// Legalize an operand that must be an SGPR by inserting a readfirstlane.
1015void AMDGPURegisterBankInfo::constrainOpWithReadfirstlane(
1016 MachineIRBuilder &B, MachineInstr &MI, unsigned OpIdx) const {
1017 Register Reg = MI.getOperand(OpIdx).getReg();
1018 MachineRegisterInfo &MRI = *B.getMRI();
1019 const RegisterBank *Bank = getRegBank(Reg, MRI, *TRI);
1020 if (Bank == &AMDGPU::SGPRRegBank)
1021 return;
1022
1023 Reg = buildReadFirstLane(B, MRI, Reg);
1024 MI.getOperand(OpIdx).setReg(Reg);
1025}
1026
1027/// Split \p Ty into 2 pieces. The first will have \p FirstSize bits, and the
1028/// rest will be in the remainder.
1029static std::pair<LLT, LLT> splitUnequalType(LLT Ty, unsigned FirstSize) {
1030 unsigned TotalSize = Ty.getSizeInBits();
1031 if (!Ty.isVector())
1032 return {LLT::scalar(FirstSize), LLT::scalar(TotalSize - FirstSize)};
1033
1034 LLT EltTy = Ty.getElementType();
1035 unsigned EltSize = EltTy.getSizeInBits();
1036 assert(FirstSize % EltSize == 0);
1037
1038 unsigned FirstPartNumElts = FirstSize / EltSize;
1039 unsigned RemainderElts = (TotalSize - FirstSize) / EltSize;
1040
1041 return {LLT::scalarOrVector(ElementCount::getFixed(FirstPartNumElts), EltTy),
1042 LLT::scalarOrVector(ElementCount::getFixed(RemainderElts), EltTy)};
1043}
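// A minimal arithmetic mirror of splitUnequalType for the vector case
// (assumption: simplified model without LLT, invented helper name): the split
// is expressed purely in element counts.

```cpp
#include <cassert>
#include <utility>

// Split a TotalSize-bit vector of EltSize-bit elements into a FirstSize-bit
// piece and the remainder, returning {first element count, remainder count}.
static std::pair<unsigned, unsigned>
splitElementCounts(unsigned TotalSize, unsigned EltSize, unsigned FirstSize) {
  assert(FirstSize % EltSize == 0 && FirstSize < TotalSize);
  return {FirstSize / EltSize, (TotalSize - FirstSize) / EltSize};
}
```

// e.g. a <3 x s32> (96 bits) split at 64 bits yields a <2 x s32> and an s32.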
1044
1045static LLT widen96To128(LLT Ty) {
1046 if (!Ty.isVector())
1047 return LLT::scalar(128);
1048
1049 LLT EltTy = Ty.getElementType();
1050 assert(128 % EltTy.getSizeInBits() == 0);
1051 return LLT::fixed_vector(128 / EltTy.getSizeInBits(), EltTy);
1052}
1053
1055bool AMDGPURegisterBankInfo::applyMappingLoad(
1056    MachineIRBuilder &B, const AMDGPURegisterBankInfo::OperandsMapper &OpdMapper,
1057 MachineInstr &MI) const {
1058 MachineRegisterInfo &MRI = *B.getMRI();
1059 Register DstReg = MI.getOperand(0).getReg();
1060 const LLT LoadTy = MRI.getType(DstReg);
1061 unsigned LoadSize = LoadTy.getSizeInBits();
1062 MachineMemOperand *MMO = *MI.memoperands_begin();
1063 const unsigned MaxNonSmrdLoadSize = 128;
1064
1065 const RegisterBank *DstBank =
1066 OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
1067 if (DstBank == &AMDGPU::SGPRRegBank) {
1069 // There are some special cases that we need to look at for 32-bit and
1070 // 96-bit SGPR loads; otherwise we have nothing to do.
1070 if (LoadSize != 32 && (LoadSize != 96 || Subtarget.hasScalarDwordx3Loads()))
1071 return false;
1072
1073 const unsigned MemSize = 8 * MMO->getSize().getValue();
1074 // Scalar loads of size 8 or 16 bit with proper alignment may be widened to
1075 // 32 bit. Check whether we need to widen the memory access: 8- or 16-bit
1076 // scalar loads will have a load size of 32 but a memory access size of less
1077 // than 32.
1078 if (LoadSize == 32 &&
1079 (MemSize == 32 || LoadTy.isVector() || !isScalarLoadLegal(MI)))
1080 return false;
1081
1082 if (LoadSize == 32 &&
1083 ((MemSize == 8 && MMO->getAlign() >= Align(1)) ||
1084 (MemSize == 16 && MMO->getAlign() >= Align(2))) &&
1085 isScalarLoadLegal(MI) &&
1086 Subtarget.getGeneration() >= AMDGPUSubtarget::GFX12)
1087 return false;
1088
1089 Register PtrReg = MI.getOperand(1).getReg();
1090
1091 ApplyRegBankMapping ApplyBank(B, *this, MRI, DstBank);
1092
1093 if (LoadSize == 32) {
1094 // This is an extending load from a sub-dword size. Widen the memory
1095 // access size to 4 bytes and clear the extra high bits appropriately
1096 const LLT S32 = LLT::scalar(32);
1097 if (MI.getOpcode() == AMDGPU::G_SEXTLOAD) {
1098 // Must extend the sign bit into higher bits for a G_SEXTLOAD
1099 auto WideLoad = B.buildLoadFromOffset(S32, PtrReg, *MMO, 0);
1100 B.buildSExtInReg(MI.getOperand(0), WideLoad, MemSize);
1101 } else if (MI.getOpcode() == AMDGPU::G_ZEXTLOAD) {
1102 // Must extend zero into higher bits with an AND for a G_ZEXTLOAD
1103 auto WideLoad = B.buildLoadFromOffset(S32, PtrReg, *MMO, 0);
1104 B.buildZExtInReg(MI.getOperand(0), WideLoad, MemSize);
1105 } else
1106 // We do not need to touch the higher bits for regular loads.
1107 B.buildLoadFromOffset(MI.getOperand(0), PtrReg, *MMO, 0);
1108 } else {
1109 // 96-bit loads are only available for vector loads. We need to split this
1110 // into a 64-bit part and a 32-bit part (unless we can widen to a 128-bit load).
1111 if (MMO->getAlign() < Align(16)) {
1112 LegalizerHelper Helper(B.getMF(), ApplyBank, B);
1113 LLT Part64, Part32;
1114 std::tie(Part64, Part32) = splitUnequalType(LoadTy, 64);
1115 if (Helper.reduceLoadStoreWidth(cast<GAnyLoad>(MI), 0, Part64) !=
1116 LegalizerHelper::Legalized)
1117 return false;
1118 return true;
1119 }
1120 LLT WiderTy = widen96To128(LoadTy);
1121 auto WideLoad = B.buildLoadFromOffset(WiderTy, PtrReg, *MMO, 0);
1122 if (WiderTy.isScalar()) {
1123 B.buildTrunc(MI.getOperand(0), WideLoad);
1124 } else {
1125 B.buildDeleteTrailingVectorElements(MI.getOperand(0).getReg(),
1126 WideLoad);
1127 }
1128 }
1129
1130 MI.eraseFromParent();
1131 return true;
1132 }
1133
1134 // 128-bit loads are supported for all instruction types.
1135 if (LoadSize <= MaxNonSmrdLoadSize)
1136 return false;
1137
1138 SmallVector<Register, 1> SrcRegs(OpdMapper.getVRegs(1));
1139
1140 if (SrcRegs.empty())
1141 SrcRegs.push_back(MI.getOperand(1).getReg());
1142
1143 // RegBankSelect only emits scalar types, so we need to reset the pointer
1144 // operand to a pointer type.
1145 Register BasePtrReg = SrcRegs[0];
1146 LLT PtrTy = MRI.getType(MI.getOperand(1).getReg());
1147 MRI.setType(BasePtrReg, PtrTy);
1148
1149 // The following are loads that were not split sufficiently during
1150 // legalization because it was not clear whether they are SMEM or VMEM loads.
1151 if (MMO->getAddrSpace() == AMDGPUAS::CONSTANT_ADDRESS ||
1152 MMO->getAddrSpace() == AMDGPUAS::CONSTANT_ADDRESS_32BIT) {
1153 assert(LoadSize % MaxNonSmrdLoadSize == 0);
1154 unsigned NumSplitParts = LoadTy.getSizeInBits() / MaxNonSmrdLoadSize;
1155 const LLT LoadSplitTy = LoadTy.divide(NumSplitParts);
1156 ApplyRegBankMapping O(B, *this, MRI, &AMDGPU::VGPRRegBank);
1157 LegalizerHelper Helper(B.getMF(), O, B);
1158 if (LoadTy.isVector()) {
1159 if (Helper.fewerElementsVector(MI, 0, LoadSplitTy) !=
1160 LegalizerHelper::Legalized)
1161 return false;
1162 } else {
1163 if (Helper.narrowScalar(MI, 0, LoadSplitTy) != LegalizerHelper::Legalized)
1164 return false;
1165 }
1166 }
1167
1168 MRI.setRegBank(DstReg, AMDGPU::VGPRRegBank);
1169 return true;
1170}
1171
1175 MachineInstr &MI) const {
1176 MachineRegisterInfo &MRI = *B.getMRI();
1177 const MachineFunction &MF = B.getMF();
1178 const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
1179 const auto &TFI = *ST.getFrameLowering();
1180
1181 // Guard in case the stack growth direction ever changes with scratch
1182 // instructions.
1183 assert(TFI.getStackGrowthDirection() == TargetFrameLowering::StackGrowsUp &&
1184 "Stack grows upwards for AMDGPU");
1185
1186 Register Dst = MI.getOperand(0).getReg();
1187 Register AllocSize = MI.getOperand(1).getReg();
1188 Align Alignment = assumeAligned(MI.getOperand(2).getImm());
1189
1190 const RegisterBank *SizeBank = getRegBank(AllocSize, MRI, *TRI);
1191
1192 if (SizeBank != &AMDGPU::SGPRRegBank) {
1193 auto WaveReduction =
1194 B.buildIntrinsic(Intrinsic::amdgcn_wave_reduce_umax, {LLT::scalar(32)})
1195 .addUse(AllocSize)
1196 .addImm(0);
1197 AllocSize = WaveReduction.getReg(0);
1198 }
1199
1200 LLT PtrTy = MRI.getType(Dst);
1201 LLT IntPtrTy = LLT::scalar(PtrTy.getSizeInBits());
1202
1203 const SIMachineFunctionInfo *Info = MF.getInfo<SIMachineFunctionInfo>();
1204 Register SPReg = Info->getStackPtrOffsetReg();
1205 ApplyRegBankMapping ApplyBank(B, *this, MRI, &AMDGPU::SGPRRegBank);
1206
1207 auto WaveSize = B.buildConstant(LLT::scalar(32), ST.getWavefrontSizeLog2());
1208 auto ScaledSize = B.buildShl(IntPtrTy, AllocSize, WaveSize);
1209
1210 auto OldSP = B.buildCopy(PtrTy, SPReg);
1211 if (Alignment > TFI.getStackAlign()) {
1212 auto StackAlignMask = (Alignment.value() << ST.getWavefrontSizeLog2()) - 1;
1213 auto Tmp1 = B.buildPtrAdd(PtrTy, OldSP,
1214 B.buildConstant(LLT::scalar(32), StackAlignMask));
1215 B.buildMaskLowPtrBits(Dst, Tmp1,
1216 Log2(Alignment) + ST.getWavefrontSizeLog2());
1217 } else {
1218 B.buildCopy(Dst, OldSP);
1219 }
1220 auto PtrAdd = B.buildPtrAdd(PtrTy, Dst, ScaledSize);
1221 B.buildCopy(SPReg, PtrAdd);
1222 MI.eraseFromParent();
1223 return true;
1224}
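// The stack-pointer arithmetic above scales per-lane sizes into wave-wide
// scratch units. A standalone sketch of that math (assumption: plain integer
// model with invented helper names; WaveSizeLog2 is 6 for wave64, 5 for
// wave32):

```cpp
#include <cassert>
#include <cstdint>

// One copy of the object exists per lane, so the SP moves by the per-lane
// size shifted up by log2(wavefront size).
static uint64_t scaledAllocSize(uint64_t PerLaneBytes, unsigned WaveSizeLog2) {
  return PerLaneBytes << WaveSizeLog2;
}

// Over-aligned allocations round the scaled SP up, mirroring the
// StackAlignMask add followed by clearing the low pointer bits.
static uint64_t alignScaledSP(uint64_t SP, uint64_t Alignment,
                              unsigned WaveSizeLog2) {
  uint64_t Mask = (Alignment << WaveSizeLog2) - 1;
  return (SP + Mask) & ~Mask;
}
```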
1225
1226bool AMDGPURegisterBankInfo::applyMappingImage(
1227    MachineIRBuilder &B, MachineInstr &MI,
1228    const AMDGPURegisterBankInfo::OperandsMapper &OpdMapper,
1229 int RsrcIdx) const {
1230 const int NumDefs = MI.getNumExplicitDefs();
1231
1232 // The reported argument index is relative to the IR intrinsic call arguments,
1233 // so we need to shift by the number of defs and the intrinsic ID.
1234 RsrcIdx += NumDefs + 1;
1235
1236 // Insert copies to VGPR arguments.
1237 applyDefaultMapping(OpdMapper);
1238
1239 // Fixup any SGPR arguments.
1240 SmallVector<unsigned, 4> SGPRIndexes;
1241 for (int I = NumDefs, NumOps = MI.getNumOperands(); I != NumOps; ++I) {
1242 if (!MI.getOperand(I).isReg())
1243 continue;
1244
1245 // If this intrinsic has a sampler, it immediately follows rsrc.
1246 if (I == RsrcIdx || I == RsrcIdx + 1)
1247 SGPRIndexes.push_back(I);
1248 }
1249
1250 executeInWaterfallLoop(B, MI, SGPRIndexes);
1251 return true;
1252}
1253
1254// Analyze a combined offset from an llvm.amdgcn.s.buffer intrinsic and store
1255// the three offsets (voffset, soffset and instoffset)
1256unsigned AMDGPURegisterBankInfo::setBufferOffsets(
1257 MachineIRBuilder &B, Register CombinedOffset, Register &VOffsetReg,
1258 Register &SOffsetReg, int64_t &InstOffsetVal, Align Alignment) const {
1259 const LLT S32 = LLT::scalar(32);
1260 MachineRegisterInfo *MRI = B.getMRI();
1261
1262 if (std::optional<int64_t> Imm =
1263 getIConstantVRegSExtVal(CombinedOffset, *MRI)) {
1264 uint32_t SOffset, ImmOffset;
1265 if (TII->splitMUBUFOffset(*Imm, SOffset, ImmOffset, Alignment)) {
1266 VOffsetReg = B.buildConstant(S32, 0).getReg(0);
1267 SOffsetReg = B.buildConstant(S32, SOffset).getReg(0);
1268 InstOffsetVal = ImmOffset;
1269
1270 B.getMRI()->setRegBank(VOffsetReg, AMDGPU::VGPRRegBank);
1271 B.getMRI()->setRegBank(SOffsetReg, AMDGPU::SGPRRegBank);
1272 return SOffset + ImmOffset;
1273 }
1274 }
1275
1276 Register Base;
1277 unsigned Offset;
1278
1279 std::tie(Base, Offset) =
1280 AMDGPU::getBaseWithConstantOffset(*MRI, CombinedOffset);
1281
1282 uint32_t SOffset, ImmOffset;
1283 if ((int)Offset > 0 &&
1284 TII->splitMUBUFOffset(Offset, SOffset, ImmOffset, Alignment)) {
1285 if (getRegBank(Base, *MRI, *TRI) == &AMDGPU::VGPRRegBank) {
1286 VOffsetReg = Base;
1287 SOffsetReg = B.buildConstant(S32, SOffset).getReg(0);
1288 B.getMRI()->setRegBank(SOffsetReg, AMDGPU::SGPRRegBank);
1289 InstOffsetVal = ImmOffset;
1290 return 0; // XXX - Why is this 0?
1291 }
1292
1293 // If we have SGPR base, we can use it for soffset.
1294 if (SOffset == 0) {
1295 VOffsetReg = B.buildConstant(S32, 0).getReg(0);
1296 B.getMRI()->setRegBank(VOffsetReg, AMDGPU::VGPRRegBank);
1297 SOffsetReg = Base;
1298 InstOffsetVal = ImmOffset;
1299 return 0; // XXX - Why is this 0?
1300 }
1301 }
1302
1303 // Handle the variable sgpr + vgpr case.
1304 MachineInstr *Add = getOpcodeDef(AMDGPU::G_ADD, CombinedOffset, *MRI);
1305 if (Add && (int)Offset >= 0) {
1306 Register Src0 = getSrcRegIgnoringCopies(Add->getOperand(1).getReg(), *MRI);
1307 Register Src1 = getSrcRegIgnoringCopies(Add->getOperand(2).getReg(), *MRI);
1308
1309 const RegisterBank *Src0Bank = getRegBank(Src0, *MRI, *TRI);
1310 const RegisterBank *Src1Bank = getRegBank(Src1, *MRI, *TRI);
1311
1312 if (Src0Bank == &AMDGPU::VGPRRegBank && Src1Bank == &AMDGPU::SGPRRegBank) {
1313 VOffsetReg = Src0;
1314 SOffsetReg = Src1;
1315 return 0;
1316 }
1317
1318 if (Src0Bank == &AMDGPU::SGPRRegBank && Src1Bank == &AMDGPU::VGPRRegBank) {
1319 VOffsetReg = Src1;
1320 SOffsetReg = Src0;
1321 return 0;
1322 }
1323 }
1324
1325 // Ensure we have a VGPR for the combined offset. This could be an issue if we
1326 // have an SGPR offset and a VGPR resource.
1327 if (getRegBank(CombinedOffset, *MRI, *TRI) == &AMDGPU::VGPRRegBank) {
1328 VOffsetReg = CombinedOffset;
1329 } else {
1330 VOffsetReg = B.buildCopy(S32, CombinedOffset).getReg(0);
1331 B.getMRI()->setRegBank(VOffsetReg, AMDGPU::VGPRRegBank);
1332 }
1333
1334 SOffsetReg = B.buildConstant(S32, 0).getReg(0);
1335 B.getMRI()->setRegBank(SOffsetReg, AMDGPU::SGPRRegBank);
1336 return 0;
1337}
1338
1339static unsigned getSBufferLoadCorrespondingBufferLoadOpcode(unsigned Opc) {
1340 switch (Opc) {
1341 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD:
1342 return AMDGPU::G_AMDGPU_BUFFER_LOAD;
1343 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD_UBYTE:
1344 return AMDGPU::G_AMDGPU_BUFFER_LOAD_UBYTE;
1345 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD_SBYTE:
1346 return AMDGPU::G_AMDGPU_BUFFER_LOAD_SBYTE;
1347 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD_USHORT:
1348 return AMDGPU::G_AMDGPU_BUFFER_LOAD_USHORT;
1349 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD_SSHORT:
1350 return AMDGPU::G_AMDGPU_BUFFER_LOAD_SSHORT;
1351 default:
1352 break;
1353 }
1354 llvm_unreachable("Unexpected s_buffer_load opcode");
1355}
1356
1357bool AMDGPURegisterBankInfo::applyMappingSBufferLoad(
1358 MachineIRBuilder &B, const OperandsMapper &OpdMapper) const {
1359 MachineInstr &MI = OpdMapper.getMI();
1360 MachineRegisterInfo &MRI = OpdMapper.getMRI();
1361
1362 const LLT S32 = LLT::scalar(32);
1363 Register Dst = MI.getOperand(0).getReg();
1364 LLT Ty = MRI.getType(Dst);
1365
1366 const RegisterBank *RSrcBank =
1367 OpdMapper.getInstrMapping().getOperandMapping(1).BreakDown[0].RegBank;
1368 const RegisterBank *OffsetBank =
1369 OpdMapper.getInstrMapping().getOperandMapping(2).BreakDown[0].RegBank;
1370 if (RSrcBank == &AMDGPU::SGPRRegBank &&
1371 OffsetBank == &AMDGPU::SGPRRegBank)
1372 return true; // Legal mapping
1373
1374 // FIXME: 96-bit case was widened during legalize. We need to narrow it back
1375 // here but don't have an MMO.
1376
1377 unsigned LoadSize = Ty.getSizeInBits();
1378 int NumLoads = 1;
1379 if (LoadSize == 256 || LoadSize == 512) {
1380 NumLoads = LoadSize / 128;
1381 Ty = Ty.divide(NumLoads);
1382 }
1383
1384 // Use the alignment to ensure that the required offsets will fit into the
1385 // immediate offsets.
1386 const Align Alignment = NumLoads > 1 ? Align(16 * NumLoads) : Align(1);
1387
1388 MachineFunction &MF = B.getMF();
1389
1390 Register SOffset;
1391 Register VOffset;
1392 int64_t ImmOffset = 0;
1393
1394 unsigned MMOOffset = setBufferOffsets(B, MI.getOperand(2).getReg(), VOffset,
1395 SOffset, ImmOffset, Alignment);
1396
1397 // TODO: 96-bit loads were widened to 128-bit results. Shrink the result if we
1398 // can, but we need to track an MMO for that.
1399 const unsigned MemSize = (Ty.getSizeInBits() + 7) / 8;
1400 const Align MemAlign(4); // FIXME: ABI type alignment?
1401 MachineMemOperand *BaseMMO = MF.getMachineMemOperand(
1402 MachinePointerInfo(),
1403 MachineMemOperand::MOLoad | MachineMemOperand::MODereferenceable |
1404 MachineMemOperand::MOInvariant,
1405 MemSize, MemAlign);
1406 if (MMOOffset != 0)
1407 BaseMMO = MF.getMachineMemOperand(BaseMMO, MMOOffset, MemSize);
1408
1409 // If only the offset is divergent, emit a MUBUF buffer load instead. We can
1410 // assume that the buffer is unswizzled.
1411
1412 Register RSrc = MI.getOperand(1).getReg();
1413 Register VIndex = B.buildConstant(S32, 0).getReg(0);
1414 B.getMRI()->setRegBank(VIndex, AMDGPU::VGPRRegBank);
1415
1416 SmallVector<Register, 4> LoadParts(NumLoads);
1417
1418 MachineBasicBlock::iterator MII = MI.getIterator();
1419 MachineInstrSpan Span(MII, &B.getMBB());
1420
1421 for (int i = 0; i < NumLoads; ++i) {
1422 if (NumLoads == 1) {
1423 LoadParts[i] = Dst;
1424 } else {
1425 LoadParts[i] = MRI.createGenericVirtualRegister(Ty);
1426 MRI.setRegBank(LoadParts[i], AMDGPU::VGPRRegBank);
1427 }
1428
1429 MachineMemOperand *MMO = BaseMMO;
1430 if (i != 0)
1431 BaseMMO = MF.getMachineMemOperand(BaseMMO, MMOOffset + 16 * i, MemSize);
1432
1433 B.buildInstr(getSBufferLoadCorrespondingBufferLoadOpcode(MI.getOpcode()))
1434 .addDef(LoadParts[i]) // vdata
1435 .addUse(RSrc) // rsrc
1436 .addUse(VIndex) // vindex
1437 .addUse(VOffset) // voffset
1438 .addUse(SOffset) // soffset
1439 .addImm(ImmOffset + 16 * i) // offset(imm)
1440 .addImm(0) // cachepolicy, swizzled buffer(imm)
1441 .addImm(0) // idxen(imm)
1442 .addMemOperand(MMO);
1443 }
1444
1445 // TODO: If only the resource is a VGPR, it may be better to execute the
1446 // scalar load in the waterfall loop if the resource is expected to frequently
1447 // be dynamically uniform.
1448 if (RSrcBank != &AMDGPU::SGPRRegBank) {
1449 // Remove the original instruction to avoid potentially confusing the
1450 // waterfall loop logic.
1451 B.setInstr(*Span.begin());
1452 MI.eraseFromParent();
1453
1454 SmallSet<Register, 4> OpsToWaterfall;
1455
1456 OpsToWaterfall.insert(RSrc);
1457 executeInWaterfallLoop(B, make_range(Span.begin(), Span.end()),
1458 OpsToWaterfall);
1459 }
1460
1461 if (NumLoads != 1) {
1462 if (Ty.isVector())
1463 B.buildConcatVectors(Dst, LoadParts);
1464 else
1465 B.buildMergeLikeInstr(Dst, LoadParts);
1466 }
1467
1468 // We removed the instruction earlier with a waterfall loop.
1469 if (RSrcBank == &AMDGPU::SGPRRegBank)
1470 MI.eraseFromParent();
1471
1472 return true;
1473}
1474
1475bool AMDGPURegisterBankInfo::applyMappingBFE(MachineIRBuilder &B,
1476 const OperandsMapper &OpdMapper,
1477 bool Signed) const {
1478 MachineInstr &MI = OpdMapper.getMI();
1479 MachineRegisterInfo &MRI = OpdMapper.getMRI();
1480
1481 // Insert basic copies
1482 applyDefaultMapping(OpdMapper);
1483
1484 Register DstReg = MI.getOperand(0).getReg();
1485 LLT Ty = MRI.getType(DstReg);
1486
1487 const LLT S32 = LLT::scalar(32);
1488
1489 unsigned FirstOpnd = isa<GIntrinsic>(MI) ? 2 : 1;
1490 Register SrcReg = MI.getOperand(FirstOpnd).getReg();
1491 Register OffsetReg = MI.getOperand(FirstOpnd + 1).getReg();
1492 Register WidthReg = MI.getOperand(FirstOpnd + 2).getReg();
1493
1494 const RegisterBank *DstBank =
1495 OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
1496 if (DstBank == &AMDGPU::VGPRRegBank) {
1497 if (Ty == S32)
1498 return true;
1499
1500 // There are no 64-bit VGPR bitfield extract instructions, so the operation
1501 // is expanded to a sequence of instructions that implement the operation.
1502 ApplyRegBankMapping ApplyBank(B, *this, MRI, &AMDGPU::VGPRRegBank);
1503
1504 const LLT S64 = LLT::scalar(64);
1505 // Shift the source operand so that extracted bits start at bit 0.
1506 auto ShiftOffset = Signed ? B.buildAShr(S64, SrcReg, OffsetReg)
1507 : B.buildLShr(S64, SrcReg, OffsetReg);
1508 auto UnmergeSOffset = B.buildUnmerge({S32, S32}, ShiftOffset);
1509
1510 // A 64-bit bitfield extract uses the 32-bit bitfield extract instructions
1511 // if the width is a constant.
1512 if (auto ConstWidth = getIConstantVRegValWithLookThrough(WidthReg, MRI)) {
1513 // Use the 32-bit bitfield extract instruction if the width is a constant.
1514 // Depending on the width size, use either the low or high 32-bits.
1515 auto Zero = B.buildConstant(S32, 0);
1516 auto WidthImm = ConstWidth->Value.getZExtValue();
1517 if (WidthImm <= 32) {
1518 // Use bitfield extract on the lower 32-bit source, and then sign-extend
1519 // or clear the upper 32-bits.
1520 auto Extract =
1521 Signed ? B.buildSbfx(S32, UnmergeSOffset.getReg(0), Zero, WidthReg)
1522 : B.buildUbfx(S32, UnmergeSOffset.getReg(0), Zero, WidthReg);
1523 auto Extend =
1524 Signed ? B.buildAShr(S32, Extract, B.buildConstant(S32, 31)) : Zero;
1525 B.buildMergeLikeInstr(DstReg, {Extract, Extend});
1526 } else {
1527 // Use bitfield extract on upper 32-bit source, and combine with lower
1528 // 32-bit source.
1529 auto UpperWidth = B.buildConstant(S32, WidthImm - 32);
1530 auto Extract =
1531 Signed
1532 ? B.buildSbfx(S32, UnmergeSOffset.getReg(1), Zero, UpperWidth)
1533 : B.buildUbfx(S32, UnmergeSOffset.getReg(1), Zero, UpperWidth);
1534 B.buildMergeLikeInstr(DstReg, {UnmergeSOffset.getReg(0), Extract});
1535 }
1536 MI.eraseFromParent();
1537 return true;
1538 }
1539
1540 // Expand to Src >> Offset << (64 - Width) >> (64 - Width) using 64-bit
1541 // operations.
1542 auto ExtShift = B.buildSub(S32, B.buildConstant(S32, 64), WidthReg);
1543 auto SignBit = B.buildShl(S64, ShiftOffset, ExtShift);
1544 if (Signed)
1545 B.buildAShr(S64, SignBit, ExtShift);
1546 else
1547 B.buildLShr(S64, SignBit, ExtShift);
1548 MI.eraseFromParent();
1549 return true;
1550 }
1551
1552 // The scalar form packs the offset and width in a single operand.
1553
1554 ApplyRegBankMapping ApplyBank(B, *this, MRI, &AMDGPU::SGPRRegBank);
1555
1556 // Ensure the high bits are clear to insert the offset.
1557 auto OffsetMask = B.buildConstant(S32, maskTrailingOnes<unsigned>(6));
1558 auto ClampOffset = B.buildAnd(S32, OffsetReg, OffsetMask);
1559
1560 // Zeros out the low bits, so don't bother clamping the input value.
1561 auto ShiftWidth = B.buildShl(S32, WidthReg, B.buildConstant(S32, 16));
1562
1563 // Pack the offset and width of a BFE into the format expected by
1564 // S_BFE_I32 / S_BFE_U32: in the second source operand, bits [5:0] contain
1565 // the offset and bits [22:16] the width.
1566 auto MergedInputs = B.buildOr(S32, ClampOffset, ShiftWidth);
1567
1568 // TODO: It might be worth using a pseudo here to avoid scc clobber and
1569 // register class constraints.
1570 unsigned Opc = Ty == S32 ? (Signed ? AMDGPU::S_BFE_I32 : AMDGPU::S_BFE_U32) :
1571 (Signed ? AMDGPU::S_BFE_I64 : AMDGPU::S_BFE_U64);
1572
1573 auto MIB = B.buildInstr(Opc, {DstReg}, {SrcReg, MergedInputs});
1574 if (!constrainSelectedInstRegOperands(*MIB, *TII, *TRI, *this))
1575 llvm_unreachable("failed to constrain BFE");
1576
1577 MI.eraseFromParent();
1578 return true;
1579}
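// The S_BFE operand packing used above can be shown with plain integers
// (assumption: illustrative helper, not part of this file):

```cpp
#include <cassert>
#include <cstdint>

// Pack a bitfield-extract offset and width into the S_BFE_I32 / S_BFE_U32
// second-source format: bits [5:0] take the offset, bits [22:16] the width.
static uint32_t packBFEOperand(uint32_t Offset, uint32_t Width) {
  // The offset is masked to 6 bits; the shift already clears the width's
  // low bits, matching the code above which skips clamping the width.
  return (Offset & 0x3f) | (Width << 16);
}
```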
1580
1581bool AMDGPURegisterBankInfo::applyMappingMAD_64_32(
1582 MachineIRBuilder &B, const OperandsMapper &OpdMapper) const {
1583 MachineInstr &MI = OpdMapper.getMI();
1584 MachineRegisterInfo &MRI = OpdMapper.getMRI();
1585
1586 // Insert basic copies.
1587 applyDefaultMapping(OpdMapper);
1588
1589 Register Dst0 = MI.getOperand(0).getReg();
1590 Register Dst1 = MI.getOperand(1).getReg();
1591 Register Src0 = MI.getOperand(2).getReg();
1592 Register Src1 = MI.getOperand(3).getReg();
1593 Register Src2 = MI.getOperand(4).getReg();
1594
1595 if (MRI.getRegBankOrNull(Src0) == &AMDGPU::VGPRRegBank)
1596 return true;
1597
1598 bool IsUnsigned = MI.getOpcode() == AMDGPU::G_AMDGPU_MAD_U64_U32;
1599 LLT S1 = LLT::scalar(1);
1600 LLT S32 = LLT::scalar(32);
1601
1602 bool DstOnValu = MRI.getRegBankOrNull(Src2) == &AMDGPU::VGPRRegBank;
1603 bool Accumulate = true;
1604
1605 if (!DstOnValu) {
1606 if (mi_match(Src2, MRI, m_ZeroInt()))
1607 Accumulate = false;
1608 }
1609
1610 // Keep the multiplication on the SALU.
1611 Register DstHi;
1612 Register DstLo = B.buildMul(S32, Src0, Src1).getReg(0);
1613 bool MulHiInVgpr = false;
1614
1615 MRI.setRegBank(DstLo, AMDGPU::SGPRRegBank);
1616
1617 if (Subtarget.hasSMulHi()) {
1618 DstHi = IsUnsigned ? B.buildUMulH(S32, Src0, Src1).getReg(0)
1619 : B.buildSMulH(S32, Src0, Src1).getReg(0);
1620 MRI.setRegBank(DstHi, AMDGPU::SGPRRegBank);
1621 } else {
1622 Register VSrc0 = B.buildCopy(S32, Src0).getReg(0);
1623 Register VSrc1 = B.buildCopy(S32, Src1).getReg(0);
1624
1625 MRI.setRegBank(VSrc0, AMDGPU::VGPRRegBank);
1626 MRI.setRegBank(VSrc1, AMDGPU::VGPRRegBank);
1627
1628 DstHi = IsUnsigned ? B.buildUMulH(S32, VSrc0, VSrc1).getReg(0)
1629 : B.buildSMulH(S32, VSrc0, VSrc1).getReg(0);
1630 MRI.setRegBank(DstHi, AMDGPU::VGPRRegBank);
1631
1632 if (!DstOnValu) {
1633 DstHi = buildReadFirstLane(B, MRI, DstHi);
1634 } else {
1635 MulHiInVgpr = true;
1636 }
1637 }
1638
1639 // Accumulate and produce the "carry-out" bit.
1640 //
1641 // The "carry-out" is defined as bit 64 of the result when computed as a
1642 // big integer. For unsigned multiply-add, this matches the usual definition
1643 // of carry-out. For signed multiply-add, bit 64 is the sign bit of the
1644 // result, which is determined as:
1645 // sign(Src0 * Src1) + sign(Src2) + carry-out from unsigned 64-bit add
1646 LLT CarryType = DstOnValu ? S1 : S32;
1647 const RegisterBank &CarryBank =
1648 DstOnValu ? AMDGPU::VCCRegBank : AMDGPU::SGPRRegBank;
1649 const RegisterBank &DstBank =
1650 DstOnValu ? AMDGPU::VGPRRegBank : AMDGPU::SGPRRegBank;
1651 Register Carry;
1652 Register Zero;
1653
1654 if (!IsUnsigned) {
1655 Zero = B.buildConstant(S32, 0).getReg(0);
1656 MRI.setRegBank(Zero,
1657 MulHiInVgpr ? AMDGPU::VGPRRegBank : AMDGPU::SGPRRegBank);
1658
1659 Carry = B.buildICmp(CmpInst::ICMP_SLT, MulHiInVgpr ? S1 : S32, DstHi, Zero)
1660 .getReg(0);
1661 MRI.setRegBank(Carry, MulHiInVgpr ? AMDGPU::VCCRegBank
1662 : AMDGPU::SGPRRegBank);
1663
1664 if (DstOnValu && !MulHiInVgpr) {
1665 Carry = B.buildTrunc(S1, Carry).getReg(0);
1666 MRI.setRegBank(Carry, AMDGPU::VCCRegBank);
1667 }
1668 }
1669
1670 if (Accumulate) {
1671 if (DstOnValu) {
1672 DstLo = B.buildCopy(S32, DstLo).getReg(0);
1673 DstHi = B.buildCopy(S32, DstHi).getReg(0);
1674 MRI.setRegBank(DstLo, AMDGPU::VGPRRegBank);
1675 MRI.setRegBank(DstHi, AMDGPU::VGPRRegBank);
1676 }
1677
1678 auto Unmerge = B.buildUnmerge(S32, Src2);
1679 Register Src2Lo = Unmerge.getReg(0);
1680 Register Src2Hi = Unmerge.getReg(1);
1681 MRI.setRegBank(Src2Lo, DstBank);
1682 MRI.setRegBank(Src2Hi, DstBank);
1683
1684 if (!IsUnsigned) {
1685 auto Src2Sign = B.buildICmp(CmpInst::ICMP_SLT, CarryType, Src2Hi, Zero);
1686 MRI.setRegBank(Src2Sign.getReg(0), CarryBank);
1687
1688 Carry = B.buildXor(CarryType, Carry, Src2Sign).getReg(0);
1689 MRI.setRegBank(Carry, CarryBank);
1690 }
1691
1692 auto AddLo = B.buildUAddo(S32, CarryType, DstLo, Src2Lo);
1693 DstLo = AddLo.getReg(0);
1694 Register CarryLo = AddLo.getReg(1);
1695 MRI.setRegBank(DstLo, DstBank);
1696 MRI.setRegBank(CarryLo, CarryBank);
1697
1698 auto AddHi = B.buildUAdde(S32, CarryType, DstHi, Src2Hi, CarryLo);
1699 DstHi = AddHi.getReg(0);
1700 MRI.setRegBank(DstHi, DstBank);
1701
1702 Register CarryHi = AddHi.getReg(1);
1703 MRI.setRegBank(CarryHi, CarryBank);
1704
1705 if (IsUnsigned) {
1706 Carry = CarryHi;
1707 } else {
1708 Carry = B.buildXor(CarryType, Carry, CarryHi).getReg(0);
1709 MRI.setRegBank(Carry, CarryBank);
1710 }
1711 } else {
1712 if (IsUnsigned) {
1713 Carry = B.buildConstant(CarryType, 0).getReg(0);
1714 MRI.setRegBank(Carry, CarryBank);
1715 }
1716 }
1717
1718 B.buildMergeLikeInstr(Dst0, {DstLo, DstHi});
1719
1720 if (DstOnValu) {
1721 B.buildCopy(Dst1, Carry);
1722 } else {
1723 B.buildTrunc(Dst1, Carry);
1724 }
1725
1726 MI.eraseFromParent();
1727 return true;
1728}
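// The signed carry-out rule documented above (bit 64 equals
// sign(Src0*Src1) ^ sign(Src2) ^ the unsigned 64-bit add's carry) can be
// checked against a 128-bit reference (assumption: host-side model using the
// GCC/Clang __int128 extension, not part of this file):

```cpp
#include <cassert>
#include <cstdint>

// Reference model of the G_AMDGPU_MAD_I64_I32 carry bit: bit 64 of the
// 65-bit result Src0 * Src1 + Src2.
static bool signedMadCarry(int32_t Src0, int32_t Src1, int64_t Src2) {
  int64_t Mul = (int64_t)Src0 * Src1;            // exact 64-bit product
  uint64_t Sum = (uint64_t)Mul + (uint64_t)Src2; // low 64 bits of the add
  bool AddCarry = Sum < (uint64_t)Mul;           // carry-out of the 64-bit add
  bool Carry = (Mul < 0) ^ (Src2 < 0) ^ AddCarry;
  // Cross-check against a full 128-bit computation of bit 64.
  __int128 Full = (__int128)Mul + Src2;
  assert(Carry == (bool)(((unsigned __int128)Full >> 64) & 1));
  return Carry;
}
```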
1729
1730// Return a suitable opcode for extending the operands of Opc when widening.
1731static unsigned getExtendOp(unsigned Opc) {
1732 switch (Opc) {
1733 case TargetOpcode::G_ASHR:
1734 case TargetOpcode::G_SMIN:
1735 case TargetOpcode::G_SMAX:
1736 return TargetOpcode::G_SEXT;
1737 case TargetOpcode::G_LSHR:
1738 case TargetOpcode::G_UMIN:
1739 case TargetOpcode::G_UMAX:
1740 return TargetOpcode::G_ZEXT;
1741 default:
1742 return TargetOpcode::G_ANYEXT;
1743 }
1744}
1745
1746// Emit a legalized extension from <2 x s16> to 2 32-bit components, avoiding
1747// any illegal vector extend or unmerge operations.
1748static std::pair<Register, Register>
1749unpackV2S16ToS32(MachineIRBuilder &B, Register Src, unsigned ExtOpcode) {
1750 const LLT S32 = LLT::scalar(32);
1751 auto Bitcast = B.buildBitcast(S32, Src);
1752
1753 if (ExtOpcode == TargetOpcode::G_SEXT) {
1754 auto ExtLo = B.buildSExtInReg(S32, Bitcast, 16);
1755 auto ShiftHi = B.buildAShr(S32, Bitcast, B.buildConstant(S32, 16));
1756 return std::pair(ExtLo.getReg(0), ShiftHi.getReg(0));
1757 }
1758
1759 auto ShiftHi = B.buildLShr(S32, Bitcast, B.buildConstant(S32, 16));
1760 if (ExtOpcode == TargetOpcode::G_ZEXT) {
1761 auto ExtLo = B.buildAnd(S32, Bitcast, B.buildConstant(S32, 0xffff));
1762 return std::pair(ExtLo.getReg(0), ShiftHi.getReg(0));
1763 }
1764
1765 assert(ExtOpcode == TargetOpcode::G_ANYEXT);
1766 return std::pair(Bitcast.getReg(0), ShiftHi.getReg(0));
1767}
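// unpackV2S16ToS32's bit manipulation has a direct scalar analogue
// (assumption: plain C++ model with an invented name): the packed 32-bit
// word's halves are widened with the requested extension.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Return the {low, high} 16-bit halves of Packed, each extended to 32 bits.
static std::pair<int32_t, int32_t> unpackV2S16(uint32_t Packed, bool Signed) {
  if (Signed) {
    int32_t Lo = (int32_t)(Packed << 16) >> 16; // sext_inreg of the low half
    int32_t Hi = (int32_t)Packed >> 16;         // arithmetic shift, as above
    return {Lo, Hi};
  }
  return {(int32_t)(Packed & 0xffff), (int32_t)(Packed >> 16)};
}
```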
1768
1769// For cases where only a single copy is inserted for matching register banks.
1770// Replace the register in the instruction operand
1771static bool substituteSimpleCopyRegs(
1772 const AMDGPURegisterBankInfo::OperandsMapper &OpdMapper, unsigned OpIdx) {
1773 SmallVector<unsigned, 1> SrcReg(OpdMapper.getVRegs(OpIdx));
1774 if (!SrcReg.empty()) {
1775 assert(SrcReg.size() == 1);
1776 OpdMapper.getMI().getOperand(OpIdx).setReg(SrcReg[0]);
1777 return true;
1778 }
1779
1780 return false;
1781}
1782
1783/// Handle register layout difference for f16 images for some subtargets.
1784Register AMDGPURegisterBankInfo::handleD16VData(MachineIRBuilder &B,
1785                                                MachineRegisterInfo &MRI,
1786 Register Reg) const {
1787 if (!Subtarget.hasUnpackedD16VMem())
1788 return Reg;
1789
1790 const LLT S16 = LLT::scalar(16);
1791 LLT StoreVT = MRI.getType(Reg);
1792 if (!StoreVT.isVector() || StoreVT.getElementType() != S16)
1793 return Reg;
1794
1795 auto Unmerge = B.buildUnmerge(S16, Reg);
1796
1797
1798 SmallVector<Register, 4> WideRegs;
1799 for (int I = 0, E = Unmerge->getNumOperands() - 1; I != E; ++I)
1800 WideRegs.push_back(Unmerge.getReg(I));
1801
1802 const LLT S32 = LLT::scalar(32);
1803 int NumElts = StoreVT.getNumElements();
1804
1805 return B.buildMergeLikeInstr(LLT::fixed_vector(NumElts, S32), WideRegs)
1806 .getReg(0);
1807}
1808
1809static std::pair<Register, unsigned>
1810getBaseWithConstantOffset(MachineRegisterInfo &MRI, Register Reg) {
1811 int64_t Const;
1812 if (mi_match(Reg, MRI, m_ICst(Const)))
1813 return std::pair(Register(), Const);
1814
1815 Register Base;
1816 if (mi_match(Reg, MRI, m_GAdd(m_Reg(Base), m_ICst(Const))))
1817 return std::pair(Base, Const);
1818
1819 // TODO: Handle G_OR used for add case
1820 return std::pair(Reg, 0);
1821}
1822
1823std::pair<Register, unsigned>
1824AMDGPURegisterBankInfo::splitBufferOffsets(MachineIRBuilder &B,
1825 Register OrigOffset) const {
1826 const unsigned MaxImm = SIInstrInfo::getMaxMUBUFImmOffset(Subtarget);
1827 Register BaseReg;
1828 unsigned ImmOffset;
1829 const LLT S32 = LLT::scalar(32);
1830
1831 // TODO: Use AMDGPU::getBaseWithConstantOffset() instead.
1832 std::tie(BaseReg, ImmOffset) = getBaseWithConstantOffset(*B.getMRI(),
1833 OrigOffset);
1834
1835 unsigned C1 = 0;
1836 if (ImmOffset != 0) {
1837 // If the immediate value is too big for the immoffset field, put only bits
1838 // that would normally fit in the immoffset field. The remaining value that
1839 // is copied/added for the voffset field is a large power of 2, and it
1840 // stands more chance of being CSEd with the copy/add for another similar
1841 // load/store.
1842 // However, do not do that rounding down if that is a negative
1843 // number, as it appears to be illegal to have a negative offset in the
1844 // vgpr, even if adding the immediate offset makes it positive.
1845 unsigned Overflow = ImmOffset & ~MaxImm;
1846 ImmOffset -= Overflow;
1847 if ((int32_t)Overflow < 0) {
1848 Overflow += ImmOffset;
1849 ImmOffset = 0;
1850 }
1851
1852 C1 = ImmOffset;
1853 if (Overflow != 0) {
1854 if (!BaseReg)
1855 BaseReg = B.buildConstant(S32, Overflow).getReg(0);
1856 else {
1857 auto OverflowVal = B.buildConstant(S32, Overflow);
1858 BaseReg = B.buildAdd(S32, BaseReg, OverflowVal).getReg(0);
1859 }
1860 }
1861 }
1862
1863 if (!BaseReg)
1864 BaseReg = B.buildConstant(S32, 0).getReg(0);
1865
1866 return {BaseReg, C1};
1867}
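The Overflow/ImmOffset arithmetic above can be exercised in isolation. A minimal scalar sketch (plain C++; the `splitOffset` name and the `MaxImm = 4095` default are illustrative assumptions — the real bound comes from `SIInstrInfo::getMaxMUBUFImmOffset` and is subtarget-dependent):

```cpp
#include <cstdint>
#include <utility>

// Returns {voffset part, immoffset part} for a constant buffer offset,
// mirroring the split in splitBufferOffsets. MaxImm = 4095 assumes a
// 12-bit immediate offset field.
std::pair<uint32_t, uint32_t> splitOffset(uint32_t Offset,
                                          uint32_t MaxImm = 4095) {
  uint32_t ImmOffset = Offset;
  uint32_t Overflow = ImmOffset & ~MaxImm; // bits that do not fit the field
  ImmOffset -= Overflow;
  // A negative voffset is illegal even if adding the immediate would make
  // the sum positive, so put the whole value in the voffset instead.
  if ((int32_t)Overflow < 0) {
    Overflow += ImmOffset;
    ImmOffset = 0;
  }
  return {Overflow, ImmOffset};
}
```

For example, an offset of 4100 splits into a voffset of 4096 (a power of two, likely to CSE with neighboring accesses) and an immediate of 4, while a negative offset ends up entirely in the voffset.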

bool AMDGPURegisterBankInfo::buildVCopy(MachineIRBuilder &B, Register DstReg,
                                        Register SrcReg) const {
  MachineRegisterInfo &MRI = *B.getMRI();
  LLT SrcTy = MRI.getType(SrcReg);
  if (SrcTy.getSizeInBits() == 32) {
    // Use a v_mov_b32 here to make the exec dependency explicit.
    B.buildInstr(AMDGPU::V_MOV_B32_e32)
        .addDef(DstReg)
        .addUse(SrcReg);
    return constrainGenericRegister(DstReg, AMDGPU::VGPR_32RegClass, MRI) &&
           constrainGenericRegister(SrcReg, AMDGPU::SReg_32RegClass, MRI);
  }

  Register TmpReg0 = MRI.createVirtualRegister(&AMDGPU::VGPR_32RegClass);
  Register TmpReg1 = MRI.createVirtualRegister(&AMDGPU::VGPR_32RegClass);

  B.buildInstr(AMDGPU::V_MOV_B32_e32)
      .addDef(TmpReg0)
      .addUse(SrcReg, 0, AMDGPU::sub0);
  B.buildInstr(AMDGPU::V_MOV_B32_e32)
      .addDef(TmpReg1)
      .addUse(SrcReg, 0, AMDGPU::sub1);
  B.buildInstr(AMDGPU::REG_SEQUENCE)
      .addDef(DstReg)
      .addUse(TmpReg0)
      .addImm(AMDGPU::sub0)
      .addUse(TmpReg1)
      .addImm(AMDGPU::sub1);

  return constrainGenericRegister(SrcReg, AMDGPU::SReg_64RegClass, MRI) &&
         constrainGenericRegister(DstReg, AMDGPU::VReg_64RegClass, MRI);
}

/// Utility function for pushing dynamic vector indexes with a constant offset
/// into waterfall loops.
static void reinsertVectorIndexAdd(MachineIRBuilder &B,
                                   MachineInstr &IdxUseInstr, unsigned OpIdx,
                                   unsigned ConstOffset) {
  MachineRegisterInfo &MRI = *B.getMRI();
  const LLT S32 = LLT::scalar(32);
  Register WaterfallIdx = IdxUseInstr.getOperand(OpIdx).getReg();
  B.setInsertPt(*IdxUseInstr.getParent(), IdxUseInstr.getIterator());

  auto MaterializedOffset = B.buildConstant(S32, ConstOffset);

  auto Add = B.buildAdd(S32, WaterfallIdx, MaterializedOffset);
  MRI.setRegBank(MaterializedOffset.getReg(0), AMDGPU::SGPRRegBank);
  MRI.setRegBank(Add.getReg(0), AMDGPU::SGPRRegBank);
  IdxUseInstr.getOperand(OpIdx).setReg(Add.getReg(0));
}

/// Implement extending a 32-bit value to a 64-bit value. \p Lo32Reg is the
/// original 32-bit source value (to be inserted in the low part of the combined
/// 64-bit result), and \p Hi32Reg is the high half of the combined 64-bit
/// value.
static void extendLow32IntoHigh32(MachineIRBuilder &B, Register Hi32Reg,
                                  Register Lo32Reg, unsigned ExtOpc,
                                  const RegisterBank &RegBank,
                                  bool IsBooleanSrc = false) {
  if (ExtOpc == AMDGPU::G_ZEXT) {
    B.buildConstant(Hi32Reg, 0);
  } else if (ExtOpc == AMDGPU::G_SEXT) {
    if (IsBooleanSrc) {
      // If we know the original source was an s1, the high half is the same as
      // the low.
      B.buildCopy(Hi32Reg, Lo32Reg);
    } else {
      // Replicate sign bit from 32-bit extended part.
      auto ShiftAmt = B.buildConstant(LLT::scalar(32), 31);
      B.getMRI()->setRegBank(ShiftAmt.getReg(0), RegBank);
      B.buildAShr(Hi32Reg, Lo32Reg, ShiftAmt);
    }
  } else {
    assert(ExtOpc == AMDGPU::G_ANYEXT && "not an integer extension");
    B.buildUndef(Hi32Reg);
  }
}
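The three extension strategies reduce to simple arithmetic on the low half. A scalar sketch (the helper names are hypothetical, not part of this file):

```cpp
#include <cstdint>

// High 32 bits produced for each extension kind, as in extendLow32IntoHigh32.
uint32_t highHalfZExt(uint32_t) { return 0; }          // G_ZEXT: all zeros
uint32_t highHalfSExt(uint32_t Lo) {
  return (uint32_t)((int32_t)Lo >> 31);                // G_SEXT: replicate sign bit
}
uint32_t highHalfSExtBool(uint32_t Lo) { return Lo; }  // s1 source: high == low
```

(G_ANYEXT leaves the high half undefined, so there is nothing to model for it.)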

bool AMDGPURegisterBankInfo::foldExtractEltToCmpSelect(
    MachineIRBuilder &B, MachineInstr &MI,
    const OperandsMapper &OpdMapper) const {
  MachineRegisterInfo &MRI = *B.getMRI();

  Register VecReg = MI.getOperand(1).getReg();
  Register Idx = MI.getOperand(2).getReg();

  const RegisterBank &IdxBank =
      *OpdMapper.getInstrMapping().getOperandMapping(2).BreakDown[0].RegBank;

  bool IsDivergentIdx = IdxBank != AMDGPU::SGPRRegBank;

  LLT VecTy = MRI.getType(VecReg);
  unsigned EltSize = VecTy.getScalarSizeInBits();
  unsigned NumElem = VecTy.getNumElements();

  if (!SITargetLowering::shouldExpandVectorDynExt(EltSize, NumElem,
                                                  IsDivergentIdx, &Subtarget))
    return false;

  LLT S32 = LLT::scalar(32);

  const RegisterBank &DstBank =
      *OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
  const RegisterBank &SrcBank =
      *OpdMapper.getInstrMapping().getOperandMapping(1).BreakDown[0].RegBank;

  const RegisterBank &CCBank =
      (DstBank == AMDGPU::SGPRRegBank &&
       SrcBank == AMDGPU::SGPRRegBank &&
       IdxBank == AMDGPU::SGPRRegBank) ? AMDGPU::SGPRRegBank
                                       : AMDGPU::VCCRegBank;
  LLT CCTy = (CCBank == AMDGPU::SGPRRegBank) ? S32 : LLT::scalar(1);

  if (CCBank == AMDGPU::VCCRegBank && IdxBank == AMDGPU::SGPRRegBank) {
    Idx = B.buildCopy(S32, Idx)->getOperand(0).getReg();
    MRI.setRegBank(Idx, AMDGPU::VGPRRegBank);
  }

  LLT EltTy = VecTy.getScalarType();
  SmallVector<Register, 2> DstRegs(OpdMapper.getVRegs(0));
  unsigned NumLanes = DstRegs.size();
  if (!NumLanes)
    NumLanes = 1;
  else
    EltTy = MRI.getType(DstRegs[0]);

  auto UnmergeToEltTy = B.buildUnmerge(EltTy, VecReg);
  SmallVector<Register, 2> Res(NumLanes);
  for (unsigned L = 0; L < NumLanes; ++L)
    Res[L] = UnmergeToEltTy.getReg(L);

  for (unsigned I = 1; I < NumElem; ++I) {
    auto IC = B.buildConstant(S32, I);
    MRI.setRegBank(IC->getOperand(0).getReg(), AMDGPU::SGPRRegBank);
    auto Cmp = B.buildICmp(CmpInst::ICMP_EQ, CCTy, Idx, IC);
    MRI.setRegBank(Cmp->getOperand(0).getReg(), CCBank);

    for (unsigned L = 0; L < NumLanes; ++L) {
      auto S = B.buildSelect(EltTy, Cmp,
                             UnmergeToEltTy.getReg(I * NumLanes + L), Res[L]);

      for (unsigned N : {0, 2, 3})
        MRI.setRegBank(S->getOperand(N).getReg(), DstBank);

      Res[L] = S->getOperand(0).getReg();
    }
  }

  for (unsigned L = 0; L < NumLanes; ++L) {
    Register DstReg = (NumLanes == 1) ? MI.getOperand(0).getReg() : DstRegs[L];
    B.buildCopy(DstReg, Res[L]);
    MRI.setRegBank(DstReg, DstBank);
  }

  MRI.setRegBank(MI.getOperand(0).getReg(), DstBank);
  MI.eraseFromParent();

  return true;
}
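The compare/select chain built above has a simple scalar model: seed the result with element 0, then for each following element select it in when the (possibly divergent) index compares equal. A sketch (the `extractViaSelects` name is illustrative):

```cpp
#include <cstdint>
#include <vector>

// Scalar model of foldExtractEltToCmpSelect's expansion of a dynamic
// vector extract into a chain of compares and selects.
uint32_t extractViaSelects(const std::vector<uint32_t> &Vec, uint32_t Idx) {
  uint32_t Res = Vec[0];               // lane 0 from the unmerge
  for (uint32_t I = 1; I < Vec.size(); ++I)
    Res = (Idx == I) ? Vec[I] : Res;   // one G_ICMP + G_SELECT per element
  return Res;
}
```

This is profitable only for small vectors, which is what the `shouldExpandVectorDynExt` check gates on.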

// Insert a cross regbank copy for a register if it already has a bank that
// differs from the one we want to set.
static Register constrainRegToBank(MachineRegisterInfo &MRI,
                                   MachineIRBuilder &B, Register &Reg,
                                   const RegisterBank &Bank) {
  const RegisterBank *CurrBank = MRI.getRegBankOrNull(Reg);
  if (CurrBank && *CurrBank != Bank) {
    Register Copy = B.buildCopy(MRI.getType(Reg), Reg).getReg(0);
    MRI.setRegBank(Copy, Bank);
    return Copy;
  }

  MRI.setRegBank(Reg, Bank);
  return Reg;
}

bool AMDGPURegisterBankInfo::foldInsertEltToCmpSelect(
    MachineIRBuilder &B, MachineInstr &MI,
    const OperandsMapper &OpdMapper) const {

  MachineRegisterInfo &MRI = *B.getMRI();
  Register VecReg = MI.getOperand(1).getReg();
  Register Idx = MI.getOperand(3).getReg();

  const RegisterBank &IdxBank =
      *OpdMapper.getInstrMapping().getOperandMapping(3).BreakDown[0].RegBank;

  bool IsDivergentIdx = IdxBank != AMDGPU::SGPRRegBank;

  LLT VecTy = MRI.getType(VecReg);
  unsigned EltSize = VecTy.getScalarSizeInBits();
  unsigned NumElem = VecTy.getNumElements();

  if (!SITargetLowering::shouldExpandVectorDynExt(EltSize, NumElem,
                                                  IsDivergentIdx, &Subtarget))
    return false;

  LLT S32 = LLT::scalar(32);

  const RegisterBank &DstBank =
      *OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
  const RegisterBank &SrcBank =
      *OpdMapper.getInstrMapping().getOperandMapping(1).BreakDown[0].RegBank;
  const RegisterBank &InsBank =
      *OpdMapper.getInstrMapping().getOperandMapping(2).BreakDown[0].RegBank;

  const RegisterBank &CCBank =
      (DstBank == AMDGPU::SGPRRegBank &&
       SrcBank == AMDGPU::SGPRRegBank &&
       InsBank == AMDGPU::SGPRRegBank &&
       IdxBank == AMDGPU::SGPRRegBank) ? AMDGPU::SGPRRegBank
                                       : AMDGPU::VCCRegBank;
  LLT CCTy = (CCBank == AMDGPU::SGPRRegBank) ? S32 : LLT::scalar(1);

  if (CCBank == AMDGPU::VCCRegBank && IdxBank == AMDGPU::SGPRRegBank) {
    Idx = B.buildCopy(S32, Idx)->getOperand(0).getReg();
    MRI.setRegBank(Idx, AMDGPU::VGPRRegBank);
  }

  LLT EltTy = VecTy.getScalarType();
  SmallVector<Register, 2> InsRegs(OpdMapper.getVRegs(2));
  unsigned NumLanes = InsRegs.size();
  if (!NumLanes) {
    NumLanes = 1;
    InsRegs.push_back(MI.getOperand(2).getReg());
  } else {
    EltTy = MRI.getType(InsRegs[0]);
  }

  auto UnmergeToEltTy = B.buildUnmerge(EltTy, VecReg);
  SmallVector<Register, 16> Ops(NumElem * NumLanes);

  for (unsigned I = 0; I < NumElem; ++I) {
    auto IC = B.buildConstant(S32, I);
    MRI.setRegBank(IC->getOperand(0).getReg(), AMDGPU::SGPRRegBank);
    auto Cmp = B.buildICmp(CmpInst::ICMP_EQ, CCTy, Idx, IC);
    MRI.setRegBank(Cmp->getOperand(0).getReg(), CCBank);

    for (unsigned L = 0; L < NumLanes; ++L) {
      Register Op0 = constrainRegToBank(MRI, B, InsRegs[L], DstBank);
      Register Op1 = UnmergeToEltTy.getReg(I * NumLanes + L);
      Op1 = constrainRegToBank(MRI, B, Op1, DstBank);

      Register Select = B.buildSelect(EltTy, Cmp, Op0, Op1).getReg(0);
      MRI.setRegBank(Select, DstBank);

      Ops[I * NumLanes + L] = Select;
    }
  }

  LLT MergeTy = LLT::fixed_vector(Ops.size(), EltTy);
  if (MergeTy == MRI.getType(MI.getOperand(0).getReg())) {
    B.buildBuildVector(MI.getOperand(0), Ops);
  } else {
    auto Vec = B.buildBuildVector(MergeTy, Ops);
    MRI.setRegBank(Vec->getOperand(0).getReg(), DstBank);
    B.buildBitcast(MI.getOperand(0).getReg(), Vec);
  }

  MRI.setRegBank(MI.getOperand(0).getReg(), DstBank);
  MI.eraseFromParent();

  return true;
}

// Break s_mul_u64 into 32-bit vector operations.
void AMDGPURegisterBankInfo::applyMappingSMULU64(
    MachineIRBuilder &B, const OperandsMapper &OpdMapper) const {
  SmallVector<Register, 2> DefRegs(OpdMapper.getVRegs(0));
  SmallVector<Register, 2> Src0Regs(OpdMapper.getVRegs(1));
  SmallVector<Register, 2> Src1Regs(OpdMapper.getVRegs(2));

  // All inputs are SGPRs, nothing special to do.
  if (DefRegs.empty()) {
    assert(Src0Regs.empty() && Src1Regs.empty());
    applyDefaultMapping(OpdMapper);
    return;
  }

  assert(DefRegs.size() == 2);
  assert(Src0Regs.size() == Src1Regs.size() &&
         (Src0Regs.empty() || Src0Regs.size() == 2));

  MachineRegisterInfo &MRI = OpdMapper.getMRI();
  MachineInstr &MI = OpdMapper.getMI();
  Register DstReg = MI.getOperand(0).getReg();
  LLT HalfTy = LLT::scalar(32);

  // Depending on where the source registers came from, the generic code may
  // have decided to split the inputs already or not. If not, we still need to
  // extract the values.

  if (Src0Regs.empty())
    split64BitValueForMapping(B, Src0Regs, HalfTy, MI.getOperand(1).getReg());
  else
    setRegsToType(MRI, Src0Regs, HalfTy);

  if (Src1Regs.empty())
    split64BitValueForMapping(B, Src1Regs, HalfTy, MI.getOperand(2).getReg());
  else
    setRegsToType(MRI, Src1Regs, HalfTy);

  setRegsToType(MRI, DefRegs, HalfTy);

  // The multiplication is done as follows:
  //
  //                            Op1H  Op1L
  //                          * Op0H  Op0L
  //                       --------------------
  //                       Op1H*Op0L  Op1L*Op0L
  //          + Op1H*Op0H  Op1L*Op0H
  // -----------------------------------------
  // (Op1H*Op0L + Op1L*Op0H + carry)  Op1L*Op0L
  //
  // We drop Op1H*Op0H because the result of the multiplication is a 64-bit
  // value and that would overflow.
  // The low 32-bit value is Op1L*Op0L.
  // The high 32-bit value is Op1H*Op0L + Op1L*Op0H + carry (from
  // Op1L*Op0L).

  ApplyRegBankMapping ApplyBank(B, *this, MRI, &AMDGPU::VGPRRegBank);

  Register Hi = B.buildUMulH(HalfTy, Src0Regs[0], Src1Regs[0]).getReg(0);
  Register MulLoHi = B.buildMul(HalfTy, Src0Regs[0], Src1Regs[1]).getReg(0);
  Register Add = B.buildAdd(HalfTy, Hi, MulLoHi).getReg(0);
  Register MulHiLo = B.buildMul(HalfTy, Src0Regs[1], Src1Regs[0]).getReg(0);
  B.buildAdd(DefRegs[1], Add, MulHiLo);
  B.buildMul(DefRegs[0], Src0Regs[0], Src1Regs[0]);

  MRI.setRegBank(DstReg, AMDGPU::VGPRRegBank);
  MI.eraseFromParent();
}
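The decomposition in the comment above can be checked with plain integer arithmetic. A scalar sketch (the `mulU64Via32` name is illustrative; each statement is annotated with the generic opcode the function emits for it):

```cpp
#include <cstdint>

// Scalar model of applyMappingSMULU64: the low 32 bits are Op0L*Op1L and the
// high 32 bits are umulh(Op0L, Op1L) + Op0L*Op1H + Op0H*Op1L. The Op0H*Op1H
// term is dropped because it only affects bits above the 64-bit result.
uint64_t mulU64Via32(uint64_t Op0, uint64_t Op1) {
  uint32_t Op0L = (uint32_t)Op0, Op0H = (uint32_t)(Op0 >> 32);
  uint32_t Op1L = (uint32_t)Op1, Op1H = (uint32_t)(Op1 >> 32);
  uint32_t Lo = Op0L * Op1L;                                // G_MUL
  uint32_t Hi = (uint32_t)(((uint64_t)Op0L * Op1L) >> 32);  // G_UMULH (carry)
  Hi += Op0L * Op1H;                                        // G_MUL + G_ADD
  Hi += Op0H * Op1L;                                        // G_MUL + G_ADD
  return ((uint64_t)Hi << 32) | Lo;
}
```

The result agrees with native 64-bit multiplication for all inputs, since unsigned overflow wraps exactly like the truncating 32-bit machine ops.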

void AMDGPURegisterBankInfo::applyMappingImpl(
    MachineIRBuilder &B, const OperandsMapper &OpdMapper) const {
  MachineInstr &MI = OpdMapper.getMI();
  B.setInstrAndDebugLoc(MI);
  unsigned Opc = MI.getOpcode();
  MachineRegisterInfo &MRI = OpdMapper.getMRI();
  switch (Opc) {
  case AMDGPU::G_CONSTANT:
  case AMDGPU::G_IMPLICIT_DEF: {
    Register DstReg = MI.getOperand(0).getReg();
    LLT DstTy = MRI.getType(DstReg);
    if (DstTy != LLT::scalar(1))
      break;

    const RegisterBank *DstBank =
        OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
    if (DstBank == &AMDGPU::VCCRegBank)
      break;
    SmallVector<Register, 1> DefRegs(OpdMapper.getVRegs(0));
    if (DefRegs.empty())
      DefRegs.push_back(DstReg);

    B.setInsertPt(*MI.getParent(), ++MI.getIterator());

    Register NewDstReg = MRI.createGenericVirtualRegister(LLT::scalar(32));
    LLVMContext &Ctx = B.getMF().getFunction().getContext();

    MI.getOperand(0).setReg(NewDstReg);
    if (Opc != AMDGPU::G_IMPLICIT_DEF) {
      uint64_t ConstVal = MI.getOperand(1).getCImm()->getZExtValue();
      MI.getOperand(1).setCImm(
          ConstantInt::get(IntegerType::getInt32Ty(Ctx), ConstVal));
    }

    MRI.setRegBank(NewDstReg, *DstBank);
    B.buildTrunc(DefRegs[0], NewDstReg);
    return;
  }
  case AMDGPU::G_PHI: {
    Register DstReg = MI.getOperand(0).getReg();
    LLT DstTy = MRI.getType(DstReg);
    if (DstTy != LLT::scalar(1))
      break;

    const LLT S32 = LLT::scalar(32);
    const RegisterBank *DstBank =
        OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
    if (DstBank == &AMDGPU::VCCRegBank) {
      applyDefaultMapping(OpdMapper);
      // The standard handling only considers the result register bank for
      // phis. For VCC, blindly inserting a copy when the phi is lowered will
      // produce an invalid copy. We can only copy with some kind of compare to
      // get a vector boolean result. Insert a register bank copy that will be
      // correctly lowered to a compare.
      for (unsigned I = 1, E = MI.getNumOperands(); I != E; I += 2) {
        Register SrcReg = MI.getOperand(I).getReg();
        const RegisterBank *SrcBank = getRegBank(SrcReg, MRI, *TRI);

        if (SrcBank != &AMDGPU::VCCRegBank) {
          MachineBasicBlock *SrcMBB = MI.getOperand(I + 1).getMBB();
          B.setInsertPt(*SrcMBB, SrcMBB->getFirstTerminator());

          auto Copy = B.buildCopy(LLT::scalar(1), SrcReg);
          MRI.setRegBank(Copy.getReg(0), AMDGPU::VCCRegBank);
          MI.getOperand(I).setReg(Copy.getReg(0));
        }
      }

      return;
    }

    // Phi handling is strange and only considers the bank of the destination.
    substituteSimpleCopyRegs(OpdMapper, 0);

    // Promote SGPR/VGPR booleans to s32.
    ApplyRegBankMapping ApplyBank(B, *this, MRI, DstBank);
    B.setInsertPt(B.getMBB(), MI);
    LegalizerHelper Helper(B.getMF(), ApplyBank, B);

    if (Helper.widenScalar(MI, 0, S32) != LegalizerHelper::Legalized)
      llvm_unreachable("widen scalar should have succeeded");

    return;
  }
  case AMDGPU::G_FCMP:
    if (!Subtarget.hasSALUFloatInsts())
      break;
    [[fallthrough]];
  case AMDGPU::G_ICMP:
  case AMDGPU::G_UADDO:
  case AMDGPU::G_USUBO:
  case AMDGPU::G_UADDE:
  case AMDGPU::G_SADDE:
  case AMDGPU::G_USUBE:
  case AMDGPU::G_SSUBE: {
    unsigned BoolDstOp =
        (Opc == AMDGPU::G_ICMP || Opc == AMDGPU::G_FCMP) ? 0 : 1;
    Register DstReg = MI.getOperand(BoolDstOp).getReg();

    const RegisterBank *DstBank =
        OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
    if (DstBank != &AMDGPU::SGPRRegBank)
      break;

    const bool HasCarryIn = MI.getNumOperands() == 5;

    // If this is a scalar compare, promote the result to s32, as the selection
    // will end up using a copy to a 32-bit vreg.
    const LLT S32 = LLT::scalar(32);
    Register NewDstReg = MRI.createGenericVirtualRegister(S32);
    MRI.setRegBank(NewDstReg, AMDGPU::SGPRRegBank);
    MI.getOperand(BoolDstOp).setReg(NewDstReg);

    if (HasCarryIn) {
      Register NewSrcReg = MRI.createGenericVirtualRegister(S32);
      MRI.setRegBank(NewSrcReg, AMDGPU::SGPRRegBank);
      B.buildZExt(NewSrcReg, MI.getOperand(4).getReg());
      MI.getOperand(4).setReg(NewSrcReg);
    }

    MachineBasicBlock *MBB = MI.getParent();
    B.setInsertPt(*MBB, std::next(MI.getIterator()));

    // If we had a constrained VCC result register, a copy was inserted to VCC
    // from SGPR.
    SmallVector<Register, 1> DefRegs(OpdMapper.getVRegs(0));
    if (DefRegs.empty())
      DefRegs.push_back(DstReg);
    B.buildTrunc(DefRegs[0], NewDstReg);
    return;
  }
  case AMDGPU::G_SELECT: {
    Register DstReg = MI.getOperand(0).getReg();
    LLT DstTy = MRI.getType(DstReg);

    SmallVector<Register, 1> CondRegs(OpdMapper.getVRegs(1));
    if (CondRegs.empty())
      CondRegs.push_back(MI.getOperand(1).getReg());
    else {
      assert(CondRegs.size() == 1);
    }

    const RegisterBank *CondBank = getRegBank(CondRegs[0], MRI, *TRI);
    if (CondBank == &AMDGPU::SGPRRegBank) {
      const LLT S32 = LLT::scalar(32);
      Register NewCondReg = MRI.createGenericVirtualRegister(S32);
      MRI.setRegBank(NewCondReg, AMDGPU::SGPRRegBank);

      MI.getOperand(1).setReg(NewCondReg);
      B.buildZExt(NewCondReg, CondRegs[0]);
    }

    if (DstTy.getSizeInBits() != 64)
      break;

    LLT HalfTy = getHalfSizedType(DstTy);

    SmallVector<Register, 2> DefRegs(OpdMapper.getVRegs(0));
    SmallVector<Register, 2> Src1Regs(OpdMapper.getVRegs(2));
    SmallVector<Register, 2> Src2Regs(OpdMapper.getVRegs(3));

    // All inputs are SGPRs, nothing special to do.
    if (DefRegs.empty()) {
      assert(Src1Regs.empty() && Src2Regs.empty());
      break;
    }

    if (Src1Regs.empty())
      split64BitValueForMapping(B, Src1Regs, HalfTy, MI.getOperand(2).getReg());
    else {
      setRegsToType(MRI, Src1Regs, HalfTy);
    }

    if (Src2Regs.empty())
      split64BitValueForMapping(B, Src2Regs, HalfTy, MI.getOperand(3).getReg());
    else
      setRegsToType(MRI, Src2Regs, HalfTy);

    setRegsToType(MRI, DefRegs, HalfTy);

    auto Flags = MI.getFlags();
    B.buildSelect(DefRegs[0], CondRegs[0], Src1Regs[0], Src2Regs[0], Flags);
    B.buildSelect(DefRegs[1], CondRegs[0], Src1Regs[1], Src2Regs[1], Flags);

    MRI.setRegBank(DstReg, AMDGPU::VGPRRegBank);
    MI.eraseFromParent();
    return;
  }
  case AMDGPU::G_BRCOND: {
    Register CondReg = MI.getOperand(0).getReg();
    // FIXME: Should use legalizer helper, but should change bool ext type.
    const RegisterBank *CondBank =
        OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;

    if (CondBank == &AMDGPU::SGPRRegBank) {
      const LLT S32 = LLT::scalar(32);
      Register NewCondReg = MRI.createGenericVirtualRegister(S32);
      MRI.setRegBank(NewCondReg, AMDGPU::SGPRRegBank);

      MI.getOperand(0).setReg(NewCondReg);
      B.buildZExt(NewCondReg, CondReg);
      return;
    }

    break;
  }
  case AMDGPU::G_AND:
  case AMDGPU::G_OR:
  case AMDGPU::G_XOR: {
    // 64-bit and is only available on the SALU, so split into 2 32-bit ops if
    // there is a VGPR input.
    Register DstReg = MI.getOperand(0).getReg();
    LLT DstTy = MRI.getType(DstReg);

    const RegisterBank *DstBank =
        OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;

    if (DstTy.getSizeInBits() == 1) {
      if (DstBank == &AMDGPU::VCCRegBank)
        break;

      MachineFunction *MF = MI.getParent()->getParent();
      ApplyRegBankMapping ApplyBank(B, *this, MRI, DstBank);
      LegalizerHelper Helper(*MF, ApplyBank, B);

      if (Helper.widenScalar(MI, 0, LLT::scalar(32)) !=
          LegalizerHelper::Legalized)
        llvm_unreachable("widen scalar should have succeeded");
      return;
    }

    if (DstTy.getSizeInBits() == 16 && DstBank == &AMDGPU::SGPRRegBank) {
      const LLT S32 = LLT::scalar(32);
      MachineBasicBlock *MBB = MI.getParent();
      MachineFunction *MF = MBB->getParent();
      ApplyRegBankMapping ApplySALU(B, *this, MRI, &AMDGPU::SGPRRegBank);
      LegalizerHelper Helper(*MF, ApplySALU, B);
      // Widen to S32, but handle `G_XOR x, -1` differently. Legalizer widening
      // will use a G_ANYEXT to extend the -1 which prevents matching G_XOR -1
      // as "not".
      if (MI.getOpcode() == AMDGPU::G_XOR &&
          mi_match(MI.getOperand(2).getReg(), MRI, m_SpecificICstOrSplat(-1))) {
        Helper.widenScalarSrc(MI, S32, 1, AMDGPU::G_ANYEXT);
        Helper.widenScalarSrc(MI, S32, 2, AMDGPU::G_SEXT);
        Helper.widenScalarDst(MI, S32);
      } else {
        if (Helper.widenScalar(MI, 0, S32) != LegalizerHelper::Legalized)
          llvm_unreachable("widen scalar should have succeeded");
      }
      return;
    }

    if (DstTy.getSizeInBits() != 64)
      break;

    LLT HalfTy = getHalfSizedType(DstTy);
    SmallVector<Register, 2> DefRegs(OpdMapper.getVRegs(0));
    SmallVector<Register, 2> Src0Regs(OpdMapper.getVRegs(1));
    SmallVector<Register, 2> Src1Regs(OpdMapper.getVRegs(2));

    // All inputs are SGPRs, nothing special to do.
    if (DefRegs.empty()) {
      assert(Src0Regs.empty() && Src1Regs.empty());
      break;
    }

    assert(DefRegs.size() == 2);
    assert(Src0Regs.size() == Src1Regs.size() &&
           (Src0Regs.empty() || Src0Regs.size() == 2));

    // Depending on where the source registers came from, the generic code may
    // have decided to split the inputs already or not. If not, we still need to
    // extract the values.

    if (Src0Regs.empty())
      split64BitValueForMapping(B, Src0Regs, HalfTy, MI.getOperand(1).getReg());
    else
      setRegsToType(MRI, Src0Regs, HalfTy);

    if (Src1Regs.empty())
      split64BitValueForMapping(B, Src1Regs, HalfTy, MI.getOperand(2).getReg());
    else
      setRegsToType(MRI, Src1Regs, HalfTy);

    setRegsToType(MRI, DefRegs, HalfTy);

    auto Flags = MI.getFlags();
    B.buildInstr(Opc, {DefRegs[0]}, {Src0Regs[0], Src1Regs[0]}, Flags);
    B.buildInstr(Opc, {DefRegs[1]}, {Src0Regs[1], Src1Regs[1]}, Flags);

    MRI.setRegBank(DstReg, AMDGPU::VGPRRegBank);
    MI.eraseFromParent();
    return;
  }
  case AMDGPU::G_ABS: {
    Register SrcReg = MI.getOperand(1).getReg();
    const RegisterBank *SrcBank = MRI.getRegBankOrNull(SrcReg);

    // There is no VALU abs instruction so we need to replace it with a sub and
    // max combination.
    if (SrcBank && SrcBank == &AMDGPU::VGPRRegBank) {
      MachineFunction *MF = MI.getParent()->getParent();
      ApplyRegBankMapping Apply(B, *this, MRI, &AMDGPU::VGPRRegBank);
      LegalizerHelper Helper(*MF, Apply, B);

      if (Helper.lowerAbsToMaxNeg(MI) != LegalizerHelper::Legalized)
        llvm_unreachable("lowerAbsToMaxNeg should have succeeded");
      return;
    }
    [[fallthrough]];
  }
  case AMDGPU::G_ADD:
  case AMDGPU::G_SUB:
  case AMDGPU::G_MUL:
  case AMDGPU::G_SHL:
  case AMDGPU::G_LSHR:
  case AMDGPU::G_ASHR:
  case AMDGPU::G_SMIN:
  case AMDGPU::G_SMAX:
  case AMDGPU::G_UMIN:
  case AMDGPU::G_UMAX: {
    Register DstReg = MI.getOperand(0).getReg();
    LLT DstTy = MRI.getType(DstReg);

    // Special case for s_mul_u64. There is not a vector equivalent of
    // s_mul_u64. Hence, we have to break down s_mul_u64 into 32-bit vector
    // multiplications.
    if (!Subtarget.hasVectorMulU64() && Opc == AMDGPU::G_MUL &&
        DstTy.getSizeInBits() == 64) {
      applyMappingSMULU64(B, OpdMapper);
      return;
    }

    // 16-bit operations are VALU only, but can be promoted to 32-bit SALU.
    // Packed 16-bit operations need to be scalarized and promoted.
    if (DstTy != LLT::scalar(16) && DstTy != LLT::fixed_vector(2, 16))
      break;

    const RegisterBank *DstBank =
        OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
    if (DstBank == &AMDGPU::VGPRRegBank)
      break;

    const LLT S32 = LLT::scalar(32);
    MachineBasicBlock *MBB = MI.getParent();
    MachineFunction *MF = MBB->getParent();
    ApplyRegBankMapping ApplySALU(B, *this, MRI, &AMDGPU::SGPRRegBank);

    if (DstTy.isVector() && Opc == AMDGPU::G_ABS) {
      Register WideSrcLo, WideSrcHi;

      std::tie(WideSrcLo, WideSrcHi) =
          unpackV2S16ToS32(B, MI.getOperand(1).getReg(), TargetOpcode::G_SEXT);
      auto Lo = B.buildInstr(AMDGPU::G_ABS, {S32}, {WideSrcLo});
      auto Hi = B.buildInstr(AMDGPU::G_ABS, {S32}, {WideSrcHi});
      B.buildBuildVectorTrunc(DstReg, {Lo.getReg(0), Hi.getReg(0)});
      MI.eraseFromParent();
      return;
    }

    if (DstTy.isVector()) {
      Register WideSrc0Lo, WideSrc0Hi;
      Register WideSrc1Lo, WideSrc1Hi;

      unsigned ExtendOp = getExtendOp(MI.getOpcode());
      std::tie(WideSrc0Lo, WideSrc0Hi) =
          unpackV2S16ToS32(B, MI.getOperand(1).getReg(), ExtendOp);
      std::tie(WideSrc1Lo, WideSrc1Hi) =
          unpackV2S16ToS32(B, MI.getOperand(2).getReg(), ExtendOp);
      auto Lo = B.buildInstr(MI.getOpcode(), {S32}, {WideSrc0Lo, WideSrc1Lo});
      auto Hi = B.buildInstr(MI.getOpcode(), {S32}, {WideSrc0Hi, WideSrc1Hi});
      B.buildBuildVectorTrunc(DstReg, {Lo.getReg(0), Hi.getReg(0)});
      MI.eraseFromParent();
    } else {
      LegalizerHelper Helper(*MF, ApplySALU, B);

      if (Helper.widenScalar(MI, 0, S32) != LegalizerHelper::Legalized)
        llvm_unreachable("widen scalar should have succeeded");

      // FIXME: s16 shift amounts should be legal.
      if (Opc == AMDGPU::G_SHL || Opc == AMDGPU::G_LSHR ||
          Opc == AMDGPU::G_ASHR) {
        B.setInsertPt(*MBB, MI.getIterator());
        if (Helper.widenScalar(MI, 1, S32) != LegalizerHelper::Legalized)
          llvm_unreachable("widen scalar should have succeeded");
      }
    }

    return;
  }
  case AMDGPU::G_AMDGPU_S_MUL_I64_I32:
  case AMDGPU::G_AMDGPU_S_MUL_U64_U32: {
    // This is a special case for s_mul_u64. We use
    // G_AMDGPU_S_MUL_I64_I32 opcode to represent an s_mul_u64 operation
    // where the 33 higher bits are sign-extended and
    // G_AMDGPU_S_MUL_U64_U32 opcode to represent an s_mul_u64 operation
    // where the 32 higher bits are zero-extended. In case scalar registers are
    // selected, both opcodes are lowered as s_mul_u64. If the vector registers
    // are selected, then G_AMDGPU_S_MUL_I64_I32 and
    // G_AMDGPU_S_MUL_U64_U32 are lowered with a vector mad instruction.

    // Insert basic copies.
    applyDefaultMapping(OpdMapper);

    Register DstReg = MI.getOperand(0).getReg();
    Register SrcReg0 = MI.getOperand(1).getReg();
    Register SrcReg1 = MI.getOperand(2).getReg();
    const LLT S32 = LLT::scalar(32);
    const LLT S64 = LLT::scalar(64);
    assert(MRI.getType(DstReg) == S64 && "This is a special case for s_mul_u64 "
                                         "that handles only 64-bit operands.");
    const RegisterBank *DstBank =
        OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;

    // Replace G_AMDGPU_S_MUL_I64_I32 and G_AMDGPU_S_MUL_U64_U32
    // with s_mul_u64 operation.
    if (DstBank == &AMDGPU::SGPRRegBank) {
      MI.setDesc(TII->get(AMDGPU::S_MUL_U64));
      MRI.setRegClass(DstReg, &AMDGPU::SGPR_64RegClass);
      MRI.setRegClass(SrcReg0, &AMDGPU::SGPR_64RegClass);
      MRI.setRegClass(SrcReg1, &AMDGPU::SGPR_64RegClass);
      return;
    }

    // Replace G_AMDGPU_S_MUL_I64_I32 and G_AMDGPU_S_MUL_U64_U32
    // with a vector mad.
    assert(MRI.getRegBankOrNull(DstReg) == &AMDGPU::VGPRRegBank &&
           "The destination operand should be in vector registers.");

    // Extract the lower subregister from the first operand.
    Register Op0L = MRI.createVirtualRegister(&AMDGPU::VGPR_32RegClass);
    MRI.setRegClass(Op0L, &AMDGPU::VGPR_32RegClass);
    MRI.setType(Op0L, S32);
    B.buildTrunc(Op0L, SrcReg0);

    // Extract the lower subregister from the second operand.
    Register Op1L = MRI.createVirtualRegister(&AMDGPU::VGPR_32RegClass);
    MRI.setRegClass(Op1L, &AMDGPU::VGPR_32RegClass);
    MRI.setType(Op1L, S32);
    B.buildTrunc(Op1L, SrcReg1);

    unsigned NewOpc = Opc == AMDGPU::G_AMDGPU_S_MUL_U64_U32
                          ? AMDGPU::G_AMDGPU_MAD_U64_U32
                          : AMDGPU::G_AMDGPU_MAD_I64_I32;

    Register Zero64 = B.buildConstant(S64, 0).getReg(0);
    MRI.setRegClass(Zero64, &AMDGPU::VReg_64RegClass);
    Register CarryOut = MRI.createVirtualRegister(&AMDGPU::VReg_64RegClass);
    MRI.setRegClass(CarryOut, &AMDGPU::VReg_64RegClass);
    B.buildInstr(NewOpc, {DstReg, CarryOut}, {Op0L, Op1L, Zero64});
    MI.eraseFromParent();
    return;
  }
  case AMDGPU::G_SEXT_INREG: {
    SmallVector<Register, 2> SrcRegs(OpdMapper.getVRegs(1));
    if (SrcRegs.empty())
      break; // Nothing to repair

    const LLT S32 = LLT::scalar(32);
    ApplyRegBankMapping O(B, *this, MRI, &AMDGPU::VGPRRegBank);

    // Don't use LegalizerHelper's narrowScalar. It produces unwanted G_SEXTs
    // we would need to further expand, and doesn't let us directly set the
    // result registers.
    SmallVector<Register, 2> DstRegs(OpdMapper.getVRegs(0));

    int Amt = MI.getOperand(2).getImm();
    if (Amt <= 32) {
      // Downstream users have expectations for the high bit behavior, so
      // freeze incoming undefined bits.
      if (Amt == 32) {
        // The low bits are unchanged.
        B.buildFreeze(DstRegs[0], SrcRegs[0]);
      } else {
        auto Freeze = B.buildFreeze(S32, SrcRegs[0]);
        // Extend in the low bits and propagate the sign bit to the high half.
        B.buildSExtInReg(DstRegs[0], Freeze, Amt);
      }

      B.buildAShr(DstRegs[1], DstRegs[0], B.buildConstant(S32, 31));
    } else {
      // The low bits are unchanged, and extend in the high bits.
      // No freeze required
      B.buildCopy(DstRegs[0], SrcRegs[0]);
      B.buildSExtInReg(DstRegs[1], DstRegs[0], Amt - 32);
    }

    Register DstReg = MI.getOperand(0).getReg();
    MRI.setRegBank(DstReg, AMDGPU::VGPRRegBank);
    MI.eraseFromParent();
    return;
  }
  case AMDGPU::G_CTPOP:
  case AMDGPU::G_BITREVERSE: {
    const RegisterBank *DstBank =
        OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
    if (DstBank == &AMDGPU::SGPRRegBank)
      break;

    Register SrcReg = MI.getOperand(1).getReg();
    const LLT S32 = LLT::scalar(32);
    LLT Ty = MRI.getType(SrcReg);
    if (Ty == S32)
      break;

    ApplyRegBankMapping ApplyVALU(B, *this, MRI, &AMDGPU::VGPRRegBank);

    MachineFunction &MF = B.getMF();
    LegalizerHelper Helper(MF, ApplyVALU, B);

    if (Helper.narrowScalar(MI, 1, S32) != LegalizerHelper::Legalized)
      llvm_unreachable("narrowScalar should have succeeded");
    return;
  }
2719 case AMDGPU::G_AMDGPU_FFBH_U32:
2720 case AMDGPU::G_AMDGPU_FFBL_B32:
2721 case AMDGPU::G_CTLZ_ZERO_UNDEF:
2722 case AMDGPU::G_CTTZ_ZERO_UNDEF: {
2723 const RegisterBank *DstBank =
2724 OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
2725 if (DstBank == &AMDGPU::SGPRRegBank)
2726 break;
2727
2728 Register SrcReg = MI.getOperand(1).getReg();
2729 const LLT S32 = LLT::scalar(32);
2730 LLT Ty = MRI.getType(SrcReg);
2731 if (Ty == S32)
2732 break;
2733
2734 // We can narrow this more efficiently than Helper can by using ffbh/ffbl
2735 // which return -1 when the input is zero:
2736 // (ctlz_zero_undef hi:lo) -> (umin (ffbh hi), (add (ffbh lo), 32))
2737 // (cttz_zero_undef hi:lo) -> (umin (add (ffbl hi), 32), (ffbl lo))
2738 // (ffbh hi:lo) -> (umin (ffbh hi), (uaddsat (ffbh lo), 32))
2739 // (ffbl hi:lo) -> (umin (uaddsat (ffbl hi), 32), (ffbl lo))
2740 ApplyRegBankMapping ApplyVALU(B, *this, MRI, &AMDGPU::VGPRRegBank);
2741 SmallVector<Register, 2> SrcRegs(OpdMapper.getVRegs(1));
2742 unsigned NewOpc = Opc == AMDGPU::G_CTLZ_ZERO_UNDEF
2743 ? (unsigned)AMDGPU::G_AMDGPU_FFBH_U32
2744 : Opc == AMDGPU::G_CTTZ_ZERO_UNDEF
2745 ? (unsigned)AMDGPU::G_AMDGPU_FFBL_B32
2746 : Opc;
2747 unsigned Idx = NewOpc == AMDGPU::G_AMDGPU_FFBH_U32;
2748 auto X = B.buildInstr(NewOpc, {S32}, {SrcRegs[Idx]});
2749 auto Y = B.buildInstr(NewOpc, {S32}, {SrcRegs[Idx ^ 1]});
2750 unsigned AddOpc =
2751 Opc == AMDGPU::G_CTLZ_ZERO_UNDEF || Opc == AMDGPU::G_CTTZ_ZERO_UNDEF
2752 ? AMDGPU::G_ADD
2753 : AMDGPU::G_UADDSAT;
2754 Y = B.buildInstr(AddOpc, {S32}, {Y, B.buildConstant(S32, 32)});
2755 Register DstReg = MI.getOperand(0).getReg();
2756 B.buildUMin(DstReg, X, Y);
2757 MI.eraseFromParent();
2758 return;
2759 }
2760 case AMDGPU::G_SEXT:
2761 case AMDGPU::G_ZEXT:
2762 case AMDGPU::G_ANYEXT: {
2763 Register SrcReg = MI.getOperand(1).getReg();
2764 LLT SrcTy = MRI.getType(SrcReg);
2765 const bool Signed = Opc == AMDGPU::G_SEXT;
2766
2767 assert(OpdMapper.getVRegs(1).empty());
2768
2769 const RegisterBank *SrcBank =
2770 OpdMapper.getInstrMapping().getOperandMapping(1).BreakDown[0].RegBank;
2771
2772 Register DstReg = MI.getOperand(0).getReg();
2773 LLT DstTy = MRI.getType(DstReg);
2774 if (DstTy.isScalar() &&
2775 SrcBank != &AMDGPU::SGPRRegBank &&
2776 SrcBank != &AMDGPU::VCCRegBank &&
2777 // FIXME: Should handle any type that rounds to s64 when irregular
2778 // breakdowns are supported.
2779 DstTy.getSizeInBits() == 64 &&
2780 SrcTy.getSizeInBits() <= 32) {
2781 SmallVector<Register, 2> DefRegs(OpdMapper.getVRegs(0));
2782
2783 // Extend to 32-bit, and then extend the low half.
2784 if (Signed) {
2785 // TODO: Should really be buildSExtOrCopy
2786 B.buildSExtOrTrunc(DefRegs[0], SrcReg);
2787 } else if (Opc == AMDGPU::G_ZEXT) {
2788 B.buildZExtOrTrunc(DefRegs[0], SrcReg);
2789 } else {
2790 B.buildAnyExtOrTrunc(DefRegs[0], SrcReg);
2791 }
2792
2793 extendLow32IntoHigh32(B, DefRegs[1], DefRegs[0], Opc, *SrcBank);
2794 MRI.setRegBank(DstReg, *SrcBank);
2795 MI.eraseFromParent();
2796 return;
2797 }
2798
2799 if (SrcTy != LLT::scalar(1))
2800 return;
2801
2802 // It is not legal to have a legalization artifact with a VCC source. Rather
2803 // than introducing a copy, directly insert the select that such a copy
2804 // would have been lowered to during selection.
2805 if (SrcBank == &AMDGPU::VCCRegBank) {
2806 SmallVector<Register, 2> DefRegs(OpdMapper.getVRegs(0));
2807
2808 const RegisterBank *DstBank = &AMDGPU::VGPRRegBank;
2809
2810 unsigned DstSize = DstTy.getSizeInBits();
2811 // 64-bit select is SGPR only
2812 const bool UseSel64 = DstSize > 32 &&
2813 SrcBank->getID() == AMDGPU::SGPRRegBankID;
2814
2815 // TODO: Should s16 select be legal?
2816 LLT SelType = UseSel64 ? LLT::scalar(64) : LLT::scalar(32);
2817 auto True = B.buildConstant(SelType, Signed ? -1 : 1);
2818 auto False = B.buildConstant(SelType, 0);
2819
2820 MRI.setRegBank(True.getReg(0), *DstBank);
2821 MRI.setRegBank(False.getReg(0), *DstBank);
2822 MRI.setRegBank(DstReg, *DstBank);
2823
2824 if (DstSize > 32) {
2825 B.buildSelect(DefRegs[0], SrcReg, True, False);
2826 extendLow32IntoHigh32(B, DefRegs[1], DefRegs[0], Opc, *SrcBank, true);
2827 } else if (DstSize < 32) {
2828 auto Sel = B.buildSelect(SelType, SrcReg, True, False);
2829 MRI.setRegBank(Sel.getReg(0), *DstBank);
2830 B.buildTrunc(DstReg, Sel);
2831 } else {
2832 B.buildSelect(DstReg, SrcReg, True, False);
2833 }
2834
2835 MI.eraseFromParent();
2836 return;
2837 }
2838
2839 break;
2840 }
2841 case AMDGPU::G_EXTRACT_VECTOR_ELT: {
2842 SmallVector<Register, 2> DstRegs(OpdMapper.getVRegs(0));
2843
2844 assert(OpdMapper.getVRegs(1).empty() && OpdMapper.getVRegs(2).empty());
2845
2846 Register DstReg = MI.getOperand(0).getReg();
2847 Register SrcReg = MI.getOperand(1).getReg();
2848
2849 const LLT S32 = LLT::scalar(32);
2850 LLT DstTy = MRI.getType(DstReg);
2851 LLT SrcTy = MRI.getType(SrcReg);
2852
2853 if (foldExtractEltToCmpSelect(B, MI, OpdMapper))
2854 return;
2855
2856 const ValueMapping &DstMapping
2857 = OpdMapper.getInstrMapping().getOperandMapping(0);
2858 const RegisterBank *DstBank = DstMapping.BreakDown[0].RegBank;
2859 const RegisterBank *SrcBank =
2860 OpdMapper.getInstrMapping().getOperandMapping(1).BreakDown[0].RegBank;
2861 const RegisterBank *IdxBank =
2862 OpdMapper.getInstrMapping().getOperandMapping(2).BreakDown[0].RegBank;
2863
2864 Register BaseIdxReg;
2865 unsigned ConstOffset;
2866 std::tie(BaseIdxReg, ConstOffset) =
2867 AMDGPU::getBaseWithConstantOffset(MRI, MI.getOperand(2).getReg());
2868
2869 // See if the index is an add of a constant which will be foldable by moving
2870 // the base register of the index later if this is going to be executed in a
2871 // waterfall loop. This is essentially to reassociate the add of a constant
2872 // with the readfirstlane.
2873 bool ShouldMoveIndexIntoLoop = IdxBank != &AMDGPU::SGPRRegBank &&
2874 ConstOffset > 0 &&
2875 ConstOffset < SrcTy.getNumElements();
2876
2877 // Move the base register. We'll re-insert the add later.
2878 if (ShouldMoveIndexIntoLoop)
2879 MI.getOperand(2).setReg(BaseIdxReg);
2880
2881 // If this is a VGPR result only because the index was a VGPR result, the
2882 // actual indexing will be done on the SGPR source vector, which will
2883 // produce a scalar result. We need to copy to the VGPR result inside the
2884 // waterfall loop.
2885 const bool NeedCopyToVGPR = DstBank == &AMDGPU::VGPRRegBank &&
2886 SrcBank == &AMDGPU::SGPRRegBank;
2887 if (DstRegs.empty()) {
2888 applyDefaultMapping(OpdMapper);
2889
2890 executeInWaterfallLoop(B, MI, {2});
2891
2892 if (NeedCopyToVGPR) {
2893 // We don't want a phi for this temporary reg.
2894 Register TmpReg = MRI.createGenericVirtualRegister(DstTy);
2895 MRI.setRegBank(TmpReg, AMDGPU::SGPRRegBank);
2896 MI.getOperand(0).setReg(TmpReg);
2897 B.setInsertPt(*MI.getParent(), ++MI.getIterator());
2898
2899 // Use a v_mov_b32 here to make the exec dependency explicit.
2900 buildVCopy(B, DstReg, TmpReg);
2901 }
2902
2903 // Re-insert the constant offset add inside the waterfall loop.
2904 if (ShouldMoveIndexIntoLoop)
2905 reinsertVectorIndexAdd(B, MI, 2, ConstOffset);
2906
2907 return;
2908 }
2909
2910 assert(DstTy.getSizeInBits() == 64);
2911
2912 LLT Vec32 = LLT::fixed_vector(2 * SrcTy.getNumElements(), 32);
2913
2914 auto CastSrc = B.buildBitcast(Vec32, SrcReg);
2915 auto One = B.buildConstant(S32, 1);
2916
2917 MachineBasicBlock::iterator MII = MI.getIterator();
2918
2919 // Split the vector index into 32-bit pieces. Prepare to move all of the
2920 // new instructions into a waterfall loop if necessary.
2921 //
2922 // Don't put the bitcast or constant in the loop.
2923 MachineInstrSpan Span(MII, &B.getMBB());
2924
2925 // Compute 32-bit element indices, (2 * OrigIdx, 2 * OrigIdx + 1).
2926 auto IdxLo = B.buildShl(S32, BaseIdxReg, One);
2927 auto IdxHi = B.buildAdd(S32, IdxLo, One);
2928
2929 auto Extract0 = B.buildExtractVectorElement(DstRegs[0], CastSrc, IdxLo);
2930 auto Extract1 = B.buildExtractVectorElement(DstRegs[1], CastSrc, IdxHi);
2931
2932 MRI.setRegBank(DstReg, *DstBank);
2933 MRI.setRegBank(CastSrc.getReg(0), *SrcBank);
2934 MRI.setRegBank(One.getReg(0), AMDGPU::SGPRRegBank);
2935 MRI.setRegBank(IdxLo.getReg(0), AMDGPU::SGPRRegBank);
2936 MRI.setRegBank(IdxHi.getReg(0), AMDGPU::SGPRRegBank);
2937
2938 SmallSet<Register, 4> OpsToWaterfall;
2939 if (!collectWaterfallOperands(OpsToWaterfall, MI, MRI, { 2 })) {
2940 MI.eraseFromParent();
2941 return;
2942 }
2943
2944 // Remove the original instruction to avoid potentially confusing the
2945 // waterfall loop logic.
2946 B.setInstr(*Span.begin());
2947 MI.eraseFromParent();
2948 executeInWaterfallLoop(B, make_range(Span.begin(), Span.end()),
2949 OpsToWaterfall);
2950
2951 if (NeedCopyToVGPR) {
2952 MachineBasicBlock *LoopBB = Extract1->getParent();
2953 Register TmpReg0 = MRI.createGenericVirtualRegister(S32);
2954 Register TmpReg1 = MRI.createGenericVirtualRegister(S32);
2955 MRI.setRegBank(TmpReg0, AMDGPU::SGPRRegBank);
2956 MRI.setRegBank(TmpReg1, AMDGPU::SGPRRegBank);
2957
2958 Extract0->getOperand(0).setReg(TmpReg0);
2959 Extract1->getOperand(0).setReg(TmpReg1);
2960
2961 B.setInsertPt(*LoopBB, ++Extract1->getIterator());
2962
2963 buildVCopy(B, DstRegs[0], TmpReg0);
2964 buildVCopy(B, DstRegs[1], TmpReg1);
2965 }
2966
2967 if (ShouldMoveIndexIntoLoop)
2968 reinsertVectorIndexAdd(B, *IdxLo, 1, ConstOffset);
2969
2970 return;
2971 }
2972 case AMDGPU::G_INSERT_VECTOR_ELT: {
2973 SmallVector<Register, 2> InsRegs(OpdMapper.getVRegs(2));
2974
2975 Register DstReg = MI.getOperand(0).getReg();
2976 LLT VecTy = MRI.getType(DstReg);
2977
2978 assert(OpdMapper.getVRegs(0).empty());
2979 assert(OpdMapper.getVRegs(3).empty());
2980
2981 if (substituteSimpleCopyRegs(OpdMapper, 1))
2982 MRI.setType(MI.getOperand(1).getReg(), VecTy);
2983
2984 if (foldInsertEltToCmpSelect(B, MI, OpdMapper))
2985 return;
2986
2987 const RegisterBank *IdxBank =
2988 OpdMapper.getInstrMapping().getOperandMapping(3).BreakDown[0].RegBank;
2989
2990 Register SrcReg = MI.getOperand(1).getReg();
2991 Register InsReg = MI.getOperand(2).getReg();
2992 LLT InsTy = MRI.getType(InsReg);
2993 (void)InsTy;
2994
2995 Register BaseIdxReg;
2996 unsigned ConstOffset;
2997 std::tie(BaseIdxReg, ConstOffset) =
2998 AMDGPU::getBaseWithConstantOffset(MRI, MI.getOperand(3).getReg());
2999
3000 // See if the index is an add of a constant which will be foldable by moving
3001 // the base register of the index later if this is going to be executed in a
3002 // waterfall loop. This is essentially to reassociate the add of a constant
3003 // with the readfirstlane.
3004 bool ShouldMoveIndexIntoLoop = IdxBank != &AMDGPU::SGPRRegBank &&
3005 ConstOffset > 0 &&
3006 ConstOffset < VecTy.getNumElements();
3007
3008 // Move the base register. We'll re-insert the add later.
3009 if (ShouldMoveIndexIntoLoop)
3010 MI.getOperand(3).setReg(BaseIdxReg);
3011
3012
3013 if (InsRegs.empty()) {
3014 executeInWaterfallLoop(B, MI, {3});
3015
3016 // Re-insert the constant offset add inside the waterfall loop.
3017 if (ShouldMoveIndexIntoLoop) {
3018 reinsertVectorIndexAdd(B, MI, 3, ConstOffset);
3019 }
3020
3021 return;
3022 }
3023
3024 assert(InsTy.getSizeInBits() == 64);
3025
3026 const LLT S32 = LLT::scalar(32);
3027 LLT Vec32 = LLT::fixed_vector(2 * VecTy.getNumElements(), 32);
3028
3029 auto CastSrc = B.buildBitcast(Vec32, SrcReg);
3030 auto One = B.buildConstant(S32, 1);
3031
3032 // Split the vector index into 32-bit pieces. Prepare to move all of the
3033 // new instructions into a waterfall loop if necessary.
3034 //
3035 // Don't put the bitcast or constant in the loop.
3036 MachineInstrSpan Span(MachineBasicBlock::iterator(&MI), &B.getMBB());
3037
3038 // Compute 32-bit element indices, (2 * OrigIdx, 2 * OrigIdx + 1).
3039 auto IdxLo = B.buildShl(S32, BaseIdxReg, One);
3040 auto IdxHi = B.buildAdd(S32, IdxLo, One);
3041
3042 auto InsLo = B.buildInsertVectorElement(Vec32, CastSrc, InsRegs[0], IdxLo);
3043 auto InsHi = B.buildInsertVectorElement(Vec32, InsLo, InsRegs[1], IdxHi);
3044
3045 const RegisterBank *DstBank =
3046 OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
3047 const RegisterBank *SrcBank =
3048 OpdMapper.getInstrMapping().getOperandMapping(1).BreakDown[0].RegBank;
3049 const RegisterBank *InsSrcBank =
3050 OpdMapper.getInstrMapping().getOperandMapping(2).BreakDown[0].RegBank;
3051
3052 MRI.setRegBank(InsReg, *InsSrcBank);
3053 MRI.setRegBank(CastSrc.getReg(0), *SrcBank);
3054 MRI.setRegBank(InsLo.getReg(0), *DstBank);
3055 MRI.setRegBank(InsHi.getReg(0), *DstBank);
3056 MRI.setRegBank(One.getReg(0), AMDGPU::SGPRRegBank);
3057 MRI.setRegBank(IdxLo.getReg(0), AMDGPU::SGPRRegBank);
3058 MRI.setRegBank(IdxHi.getReg(0), AMDGPU::SGPRRegBank);
3059
3060
3061 SmallSet<Register, 4> OpsToWaterfall;
3062 if (!collectWaterfallOperands(OpsToWaterfall, MI, MRI, { 3 })) {
3063 B.setInsertPt(B.getMBB(), MI);
3064 B.buildBitcast(DstReg, InsHi);
3065 MI.eraseFromParent();
3066 return;
3067 }
3068
3069 B.setInstr(*Span.begin());
3070 MI.eraseFromParent();
3071
3072 // Figure out the point after the waterfall loop before mangling the control
3073 // flow.
3074 executeInWaterfallLoop(B, make_range(Span.begin(), Span.end()),
3075 OpsToWaterfall);
3076
3077 // The insertion point is now right after the original instruction.
3078 //
3079 // Keep the bitcast to the original vector type out of the loop. Doing this
3080 // saves an extra phi we don't need inside the loop.
3081 B.buildBitcast(DstReg, InsHi);
3082
3083 // Re-insert the constant offset add inside the waterfall loop.
3084 if (ShouldMoveIndexIntoLoop)
3085 reinsertVectorIndexAdd(B, *IdxLo, 1, ConstOffset);
3086
3087 return;
3088 }
3089 case AMDGPU::G_AMDGPU_BUFFER_LOAD:
3090 case AMDGPU::G_AMDGPU_BUFFER_LOAD_USHORT:
3091 case AMDGPU::G_AMDGPU_BUFFER_LOAD_SSHORT:
3092 case AMDGPU::G_AMDGPU_BUFFER_LOAD_UBYTE:
3093 case AMDGPU::G_AMDGPU_BUFFER_LOAD_SBYTE:
3094 case AMDGPU::G_AMDGPU_BUFFER_LOAD_TFE:
3095 case AMDGPU::G_AMDGPU_BUFFER_LOAD_USHORT_TFE:
3096 case AMDGPU::G_AMDGPU_BUFFER_LOAD_SSHORT_TFE:
3097 case AMDGPU::G_AMDGPU_BUFFER_LOAD_UBYTE_TFE:
3098 case AMDGPU::G_AMDGPU_BUFFER_LOAD_SBYTE_TFE:
3099 case AMDGPU::G_AMDGPU_BUFFER_LOAD_FORMAT:
3100 case AMDGPU::G_AMDGPU_BUFFER_LOAD_FORMAT_TFE:
3101 case AMDGPU::G_AMDGPU_BUFFER_LOAD_FORMAT_D16:
3102 case AMDGPU::G_AMDGPU_TBUFFER_LOAD_FORMAT:
3103 case AMDGPU::G_AMDGPU_TBUFFER_LOAD_FORMAT_D16:
3104 case AMDGPU::G_AMDGPU_BUFFER_STORE:
3105 case AMDGPU::G_AMDGPU_BUFFER_STORE_BYTE:
3106 case AMDGPU::G_AMDGPU_BUFFER_STORE_SHORT:
3107 case AMDGPU::G_AMDGPU_BUFFER_STORE_FORMAT:
3108 case AMDGPU::G_AMDGPU_BUFFER_STORE_FORMAT_D16:
3109 case AMDGPU::G_AMDGPU_TBUFFER_STORE_FORMAT:
3110 case AMDGPU::G_AMDGPU_TBUFFER_STORE_FORMAT_D16: {
3111 applyDefaultMapping(OpdMapper);
3112 executeInWaterfallLoop(B, MI, {1, 4});
3113 return;
3114 }
3115 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SWAP:
3116 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_ADD:
3117 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SUB:
3118 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SMIN:
3119 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_UMIN:
3120 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SMAX:
3121 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_UMAX:
3122 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_AND:
3123 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_OR:
3124 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_XOR:
3125 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_INC:
3126 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_DEC:
3127 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_FADD:
3128 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_FMIN:
3129 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_FMAX: {
3130 applyDefaultMapping(OpdMapper);
3131 executeInWaterfallLoop(B, MI, {2, 5});
3132 return;
3133 }
3134 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_CMPSWAP: {
3135 applyDefaultMapping(OpdMapper);
3136 executeInWaterfallLoop(B, MI, {3, 6});
3137 return;
3138 }
3139 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD:
3140 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD_UBYTE:
3141 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD_SBYTE:
3142 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD_USHORT:
3143 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD_SSHORT: {
3144 applyMappingSBufferLoad(B, OpdMapper);
3145 return;
3146 }
3147 case AMDGPU::G_AMDGPU_S_BUFFER_PREFETCH:
3148 constrainOpWithReadfirstlane(B, MI, 0);
3149 constrainOpWithReadfirstlane(B, MI, 2);
3150 return;
3151 case AMDGPU::G_INTRINSIC:
3152 case AMDGPU::G_INTRINSIC_CONVERGENT: {
3153 switch (cast<GIntrinsic>(MI).getIntrinsicID()) {
3154 case Intrinsic::amdgcn_readlane: {
3155 substituteSimpleCopyRegs(OpdMapper, 2);
3156
3157 assert(OpdMapper.getVRegs(0).empty());
3158 assert(OpdMapper.getVRegs(3).empty());
3159
3160 // Make sure the index is an SGPR. It doesn't make sense to run this in a
3161 // waterfall loop, so assume it's a uniform value.
3162 constrainOpWithReadfirstlane(B, MI, 3); // Index
3163 return;
3164 }
3165 case Intrinsic::amdgcn_writelane: {
3166 assert(OpdMapper.getVRegs(0).empty());
3167 assert(OpdMapper.getVRegs(2).empty());
3168 assert(OpdMapper.getVRegs(3).empty());
3169
3170 substituteSimpleCopyRegs(OpdMapper, 4); // VGPR input val
3171 constrainOpWithReadfirstlane(B, MI, 2); // Source value
3172 constrainOpWithReadfirstlane(B, MI, 3); // Index
3173 return;
3174 }
3175 case Intrinsic::amdgcn_interp_p1:
3176 case Intrinsic::amdgcn_interp_p2:
3177 case Intrinsic::amdgcn_interp_mov:
3178 case Intrinsic::amdgcn_interp_p1_f16:
3179 case Intrinsic::amdgcn_interp_p2_f16:
3180 case Intrinsic::amdgcn_lds_param_load: {
3181 applyDefaultMapping(OpdMapper);
3182
3183 // Readlane for m0 value, which is always the last operand.
3184 // FIXME: Should this be a waterfall loop instead?
3185 constrainOpWithReadfirstlane(B, MI, MI.getNumOperands() - 1); // Index
3186 return;
3187 }
3188 case Intrinsic::amdgcn_interp_inreg_p10:
3189 case Intrinsic::amdgcn_interp_inreg_p2:
3190 case Intrinsic::amdgcn_interp_inreg_p10_f16:
3191 case Intrinsic::amdgcn_interp_inreg_p2_f16:
3192 case Intrinsic::amdgcn_interp_p10_rtz_f16:
3193 case Intrinsic::amdgcn_interp_p2_rtz_f16:
3194 case Intrinsic::amdgcn_permlane16_swap:
3195 case Intrinsic::amdgcn_permlane32_swap:
3196 applyDefaultMapping(OpdMapper);
3197 return;
3198 case Intrinsic::amdgcn_permlane16:
3199 case Intrinsic::amdgcn_permlanex16: {
3200 // Doing a waterfall loop over these wouldn't make any sense.
3201 substituteSimpleCopyRegs(OpdMapper, 2);
3202 substituteSimpleCopyRegs(OpdMapper, 3);
3203 constrainOpWithReadfirstlane(B, MI, 4);
3204 constrainOpWithReadfirstlane(B, MI, 5);
3205 return;
3206 }
3207 case Intrinsic::amdgcn_permlane_bcast:
3208 case Intrinsic::amdgcn_permlane_up:
3209 case Intrinsic::amdgcn_permlane_down:
3210 case Intrinsic::amdgcn_permlane_xor:
3211 // Doing a waterfall loop over these wouldn't make any sense.
3214 return;
3215 case Intrinsic::amdgcn_permlane_idx_gen: {
3217 return;
3218 }
3219 case Intrinsic::amdgcn_sbfe:
3220 applyMappingBFE(B, OpdMapper, true);
3221 return;
3222 case Intrinsic::amdgcn_ubfe:
3223 applyMappingBFE(B, OpdMapper, false);
3224 return;
3225 case Intrinsic::amdgcn_inverse_ballot:
3226 case Intrinsic::amdgcn_s_bitreplicate:
3227 case Intrinsic::amdgcn_s_quadmask:
3228 case Intrinsic::amdgcn_s_wqm:
3229 applyDefaultMapping(OpdMapper);
3230 constrainOpWithReadfirstlane(B, MI, 2); // Mask
3231 return;
3232 case Intrinsic::amdgcn_ballot:
3233 // Use default handling and insert copy to vcc source.
3234 break;
3235 }
3236 break;
3237 }
3238 case AMDGPU::G_AMDGPU_INTRIN_IMAGE_LOAD:
3239 case AMDGPU::G_AMDGPU_INTRIN_IMAGE_LOAD_D16:
3240 case AMDGPU::G_AMDGPU_INTRIN_IMAGE_LOAD_NORET:
3241 case AMDGPU::G_AMDGPU_INTRIN_IMAGE_STORE:
3242 case AMDGPU::G_AMDGPU_INTRIN_IMAGE_STORE_D16: {
3243 const AMDGPU::RsrcIntrinsic *RSrcIntrin =
3244 AMDGPU::lookupRsrcIntrinsic(AMDGPU::getIntrinsicID(MI));
3245 assert(RSrcIntrin && RSrcIntrin->IsImage);
3246 // Non-images can have complications from operands that allow both SGPR
3247 // and VGPR. For now it's too complicated to figure out the final opcode
3248 // to derive the register bank from the MCInstrDesc.
3249 applyMappingImage(B, MI, OpdMapper, RSrcIntrin->RsrcArg);
3250 return;
3251 }
3252 case AMDGPU::G_AMDGPU_BVH_INTERSECT_RAY:
3253 case AMDGPU::G_AMDGPU_BVH8_INTERSECT_RAY:
3254 case AMDGPU::G_AMDGPU_BVH_DUAL_INTERSECT_RAY: {
3255 bool IsDualOrBVH8 =
3256 MI.getOpcode() == AMDGPU::G_AMDGPU_BVH_DUAL_INTERSECT_RAY ||
3257 MI.getOpcode() == AMDGPU::G_AMDGPU_BVH8_INTERSECT_RAY;
3258 unsigned NumMods = IsDualOrBVH8 ? 0 : 1; // Has A16 modifier
3259 unsigned LastRegOpIdx = MI.getNumExplicitOperands() - 1 - NumMods;
3260 applyDefaultMapping(OpdMapper);
3261 executeInWaterfallLoop(B, MI, {LastRegOpIdx});
3262 return;
3263 }
3264 case AMDGPU::G_INTRINSIC_W_SIDE_EFFECTS:
3265 case AMDGPU::G_INTRINSIC_CONVERGENT_W_SIDE_EFFECTS: {
3266 auto IntrID = cast<GIntrinsic>(MI).getIntrinsicID();
3267 switch (IntrID) {
3268 case Intrinsic::amdgcn_ds_ordered_add:
3269 case Intrinsic::amdgcn_ds_ordered_swap: {
3270 // This is only allowed to execute with 1 lane, so readfirstlane is safe.
3271 assert(OpdMapper.getVRegs(0).empty());
3272 substituteSimpleCopyRegs(OpdMapper, 3);
3274 return;
3275 }
3276 case Intrinsic::amdgcn_ds_gws_init:
3277 case Intrinsic::amdgcn_ds_gws_barrier:
3278 case Intrinsic::amdgcn_ds_gws_sema_br: {
3279 // Only the first lane executes, so readfirstlane is safe.
3280 substituteSimpleCopyRegs(OpdMapper, 1);
3282 return;
3283 }
3284 case Intrinsic::amdgcn_ds_gws_sema_v:
3285 case Intrinsic::amdgcn_ds_gws_sema_p:
3286 case Intrinsic::amdgcn_ds_gws_sema_release_all: {
3287 // Only the first lane executes, so readfirstlane is safe.
3289 return;
3290 }
3291 case Intrinsic::amdgcn_ds_append:
3292 case Intrinsic::amdgcn_ds_consume: {
3294 return;
3295 }
3296 case Intrinsic::amdgcn_s_sendmsg:
3297 case Intrinsic::amdgcn_s_sendmsghalt: {
3298 // FIXME: Should this use a waterfall loop?
3300 return;
3301 }
3302 case Intrinsic::amdgcn_s_setreg: {
3304 return;
3305 }
3306 case Intrinsic::amdgcn_s_ttracedata:
3308 return;
3309 case Intrinsic::amdgcn_raw_buffer_load_lds:
3310 case Intrinsic::amdgcn_raw_ptr_buffer_load_lds: {
3311 applyDefaultMapping(OpdMapper);
3312 constrainOpWithReadfirstlane(B, MI, 1); // rsrc
3313 constrainOpWithReadfirstlane(B, MI, 2); // M0
3314 constrainOpWithReadfirstlane(B, MI, 5); // soffset
3315 return;
3316 }
3317 case Intrinsic::amdgcn_struct_buffer_load_lds:
3318 case Intrinsic::amdgcn_struct_ptr_buffer_load_lds: {
3319 applyDefaultMapping(OpdMapper);
3320 constrainOpWithReadfirstlane(B, MI, 1); // rsrc
3321 constrainOpWithReadfirstlane(B, MI, 2); // M0
3322 constrainOpWithReadfirstlane(B, MI, 6); // soffset
3323 return;
3324 }
3325 case Intrinsic::amdgcn_load_to_lds:
3326 case Intrinsic::amdgcn_global_load_lds: {
3327 applyDefaultMapping(OpdMapper);
3329 return;
3330 }
3331 case Intrinsic::amdgcn_lds_direct_load: {
3332 applyDefaultMapping(OpdMapper);
3333 // Readlane for m0 value, which is always the last operand.
3334 constrainOpWithReadfirstlane(B, MI, MI.getNumOperands() - 1); // Index
3335 return;
3336 }
3337 case Intrinsic::amdgcn_exp_row:
3338 applyDefaultMapping(OpdMapper);
3340 return;
3341 case Intrinsic::amdgcn_s_sleep_var:
3342 assert(OpdMapper.getVRegs(1).empty());
3344 return;
3345 case Intrinsic::amdgcn_s_barrier_join:
3347 return;
3348 case Intrinsic::amdgcn_s_barrier_init:
3349 case Intrinsic::amdgcn_s_barrier_signal_var:
3352 return;
3353 case Intrinsic::amdgcn_s_get_barrier_state:
3354 case Intrinsic::amdgcn_s_get_named_barrier_state: {
3356 return;
3357 }
3358 case Intrinsic::amdgcn_s_prefetch_data: {
3359 Register PtrReg = MI.getOperand(1).getReg();
3360 unsigned AS = MRI.getType(PtrReg).getAddressSpace();
3364 } else
3365 MI.eraseFromParent();
3366 return;
3367 }
3368 case Intrinsic::amdgcn_tensor_load_to_lds:
3369 case Intrinsic::amdgcn_tensor_store_from_lds: {
3374 return;
3375 }
3376 case Intrinsic::amdgcn_tensor_load_to_lds_d2:
3377 case Intrinsic::amdgcn_tensor_store_from_lds_d2: {
3380 return;
3381 }
3382 default: {
3383 if (const AMDGPU::RsrcIntrinsic *RSrcIntrin =
3384 AMDGPU::lookupRsrcIntrinsic(IntrID)) {
3385 // Non-images can have complications from operands that allow both SGPR
3386 // and VGPR. For now it's too complicated to figure out the final opcode
3387 // to derive the register bank from the MCInstrDesc.
3388 if (RSrcIntrin->IsImage) {
3389 applyMappingImage(B, MI, OpdMapper, RSrcIntrin->RsrcArg);
3390 return;
3391 }
3392 }
3393
3394 break;
3395 }
3396 }
3397 break;
3398 }
3399 case AMDGPU::G_SI_CALL: {
3400 // Use a set to avoid extra readfirstlanes in the case where multiple
3401 // operands are the same register.
3402 SmallSet<Register, 4> SGPROperandRegs;
3403
3404 if (!collectWaterfallOperands(SGPROperandRegs, MI, MRI, {1}))
3405 break;
3406
3407 // Move all copies to physical SGPRs that are used by the call instruction
3408 // into the loop block. Search backwards for these copies until reaching the
3409 // ADJCALLSTACKUP.
3410 unsigned FrameSetupOpcode = AMDGPU::ADJCALLSTACKUP;
3411 unsigned FrameDestroyOpcode = AMDGPU::ADJCALLSTACKDOWN;
3412
3413 // Move all non-copies before the copies, so that a complete range can be
3414 // moved into the waterfall loop.
3415 SmallVector<MachineInstr *, 4> NonCopyInstrs;
3416 // Count of NonCopyInstrs found until the current LastCopy.
3417 unsigned NonCopyInstrsLen = 0;
3418 MachineBasicBlock::iterator Start(&MI);
3419 MachineBasicBlock::iterator LastCopy = Start;
3420 MachineBasicBlock *MBB = MI.getParent();
3421 const SIMachineFunctionInfo *Info =
3422 MBB->getParent()->getInfo<SIMachineFunctionInfo>();
3423 while (Start->getOpcode() != FrameSetupOpcode) {
3424 --Start;
3425 bool IsCopy = false;
3426 if (Start->getOpcode() == AMDGPU::COPY) {
3427 auto &Dst = Start->getOperand(0);
3428 if (Dst.isReg()) {
3429 Register Reg = Dst.getReg();
3430 if (Reg.isPhysical() && MI.readsRegister(Reg, TRI)) {
3431 IsCopy = true;
3432 } else {
3433 // Also move the copy from the scratch rsrc descriptor into the loop
3434 // to allow it to be optimized away.
3435 auto &Src = Start->getOperand(1);
3436 if (Src.isReg()) {
3437 Reg = Src.getReg();
3438 IsCopy = Info->getScratchRSrcReg() == Reg;
3439 }
3440 }
3441 }
3442 }
3443
3444 if (IsCopy) {
3445 LastCopy = Start;
3446 NonCopyInstrsLen = NonCopyInstrs.size();
3447 } else {
3448 NonCopyInstrs.push_back(&*Start);
3449 }
3450 }
3451 NonCopyInstrs.resize(NonCopyInstrsLen);
3452
3453 for (auto *NonCopy : reverse(NonCopyInstrs)) {
3454 MBB->splice(LastCopy, MBB, NonCopy->getIterator());
3455 }
3456 Start = LastCopy;
3457
3458 // Do the same for copies after the loop
3459 NonCopyInstrs.clear();
3460 NonCopyInstrsLen = 0;
3461 MachineBasicBlock::iterator End(&MI);
3462 LastCopy = End;
3463 while (End->getOpcode() != FrameDestroyOpcode) {
3464 ++End;
3465 bool IsCopy = false;
3466 if (End->getOpcode() == AMDGPU::COPY) {
3467 auto &Src = End->getOperand(1);
3468 if (Src.isReg()) {
3469 Register Reg = Src.getReg();
3470 IsCopy = Reg.isPhysical() && MI.modifiesRegister(Reg, TRI);
3471 }
3472 }
3473
3474 if (IsCopy) {
3475 LastCopy = End;
3476 NonCopyInstrsLen = NonCopyInstrs.size();
3477 } else {
3478 NonCopyInstrs.push_back(&*End);
3479 }
3480 }
3481 NonCopyInstrs.resize(NonCopyInstrsLen);
3482
3483 End = LastCopy;
3484 ++LastCopy;
3485 for (auto *NonCopy : reverse(NonCopyInstrs)) {
3486 MBB->splice(LastCopy, MBB, NonCopy->getIterator());
3487 }
3488
3489 ++End;
3490 B.setInsertPt(B.getMBB(), Start);
3491 executeInWaterfallLoop(B, make_range(Start, End), SGPROperandRegs);
3492 break;
3493 }
3494 case AMDGPU::G_LOAD:
3495 case AMDGPU::G_ZEXTLOAD:
3496 case AMDGPU::G_SEXTLOAD: {
3497 if (applyMappingLoad(B, OpdMapper, MI))
3498 return;
3499 break;
3500 }
3501 case AMDGPU::G_DYN_STACKALLOC:
3502 applyMappingDynStackAlloc(B, OpdMapper, MI);
3503 return;
3504 case AMDGPU::G_STACKRESTORE: {
3505 applyDefaultMapping(OpdMapper);
3507 return;
3508 }
3509 case AMDGPU::G_SBFX:
3510 applyMappingBFE(B, OpdMapper, /*Signed*/ true);
3511 return;
3512 case AMDGPU::G_UBFX:
3513 applyMappingBFE(B, OpdMapper, /*Signed*/ false);
3514 return;
3515 case AMDGPU::G_AMDGPU_MAD_U64_U32:
3516 case AMDGPU::G_AMDGPU_MAD_I64_I32:
3517 applyMappingMAD_64_32(B, OpdMapper);
3518 return;
3519 case AMDGPU::G_PREFETCH: {
3521 MI.eraseFromParent();
3522 return;
3523 }
3524 Register PtrReg = MI.getOperand(0).getReg();
3525 unsigned PtrBank = getRegBankID(PtrReg, MRI, AMDGPU::SGPRRegBankID);
3526 if (PtrBank == AMDGPU::VGPRRegBankID &&
3527 (!Subtarget.hasVmemPrefInsts() || !MI.getOperand(3).getImm())) {
3528 // Cannot do I$ prefetch with divergent pointer.
3529 MI.eraseFromParent();
3530 return;
3531 }
3532 unsigned AS = MRI.getType(PtrReg).getAddressSpace();
3537 !MI.getOperand(3).getImm() /* I$ prefetch */))) {
3538 MI.eraseFromParent();
3539 return;
3540 }
3541 applyDefaultMapping(OpdMapper);
3542 return;
3543 }
3544 default:
3545 break;
3546 }
3547
3548 return applyDefaultMapping(OpdMapper);
3549}
3550
3551// vgpr, sgpr -> vgpr
3552// vgpr, agpr -> vgpr
3553// agpr, agpr -> agpr
3554// agpr, sgpr -> vgpr
3555static unsigned regBankUnion(unsigned RB0, unsigned RB1) {
3556 if (RB0 == AMDGPU::InvalidRegBankID)
3557 return RB1;
3558 if (RB1 == AMDGPU::InvalidRegBankID)
3559 return RB0;
3560
3561 if (RB0 == AMDGPU::SGPRRegBankID && RB1 == AMDGPU::SGPRRegBankID)
3562 return AMDGPU::SGPRRegBankID;
3563
3564 if (RB0 == AMDGPU::AGPRRegBankID && RB1 == AMDGPU::AGPRRegBankID)
3565 return AMDGPU::AGPRRegBankID;
3566
3567 return AMDGPU::VGPRRegBankID;
3568}
3569
3570static unsigned regBankBoolUnion(unsigned RB0, unsigned RB1) {
3571 if (RB0 == AMDGPU::InvalidRegBankID)
3572 return RB1;
3573 if (RB1 == AMDGPU::InvalidRegBankID)
3574 return RB0;
3575
3576 // vcc, vcc -> vcc
3577 // vcc, sgpr -> vcc
3578 // vcc, vgpr -> vcc
3579 if (RB0 == AMDGPU::VCCRegBankID || RB1 == AMDGPU::VCCRegBankID)
3580 return AMDGPU::VCCRegBankID;
3581
3582 // vcc, vgpr -> vgpr
3583 return regBankUnion(RB0, RB1);
3584}
3585
3586 unsigned AMDGPURegisterBankInfo::getMappingType(const MachineRegisterInfo &MRI,
3587 const MachineInstr &MI) const {
3588 unsigned RegBank = AMDGPU::InvalidRegBankID;
3589
3590 for (const MachineOperand &MO : MI.operands()) {
3591 if (!MO.isReg())
3592 continue;
3593 Register Reg = MO.getReg();
3594 if (const RegisterBank *Bank = getRegBank(Reg, MRI, *TRI)) {
3595 RegBank = regBankUnion(RegBank, Bank->getID());
3596 if (RegBank == AMDGPU::VGPRRegBankID)
3597 break;
3598 }
3599 }
3600
3601 return RegBank;
3602}
3603
3604 bool AMDGPURegisterBankInfo::isSALUMapping(const MachineInstr &MI) const {
3605 const MachineFunction &MF = *MI.getParent()->getParent();
3606 const MachineRegisterInfo &MRI = MF.getRegInfo();
3607 for (const MachineOperand &MO : MI.operands()) {
3608 if (!MO.isReg())
3609 continue;
3610 Register Reg = MO.getReg();
3611 if (const RegisterBank *Bank = getRegBank(Reg, MRI, *TRI)) {
3612 if (Bank->getID() != AMDGPU::SGPRRegBankID)
3613 return false;
3614 }
3615 }
3616 return true;
3617}
3618
3619 const RegisterBankInfo::InstructionMapping &
3620 AMDGPURegisterBankInfo::getDefaultMappingSOP(const MachineInstr &MI) const {
3621 const MachineFunction &MF = *MI.getParent()->getParent();
3622 const MachineRegisterInfo &MRI = MF.getRegInfo();
3623 SmallVector<const ValueMapping*, 8> OpdsMapping(MI.getNumOperands());
3624
3625 for (unsigned i = 0, e = MI.getNumOperands(); i != e; ++i) {
3626 const MachineOperand &SrcOp = MI.getOperand(i);
3627 if (!SrcOp.isReg())
3628 continue;
3629
3630 unsigned Size = getSizeInBits(SrcOp.getReg(), MRI, *TRI);
3631 OpdsMapping[i] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size);
3632 }
3633 return getInstructionMapping(1, 1, getOperandsMapping(OpdsMapping),
3634 MI.getNumOperands());
3635}
3636
3637 const RegisterBankInfo::InstructionMapping &
3638 AMDGPURegisterBankInfo::getDefaultMappingVOP(const MachineInstr &MI) const {
3639 const MachineFunction &MF = *MI.getParent()->getParent();
3640 const MachineRegisterInfo &MRI = MF.getRegInfo();
3641 SmallVector<const ValueMapping*, 8> OpdsMapping(MI.getNumOperands());
3642
3643 // Even though we technically could use SGPRs, this would require knowledge of
3644 // the constant bus restriction. Force all sources to VGPR (except for VCC).
3645 //
3646 // TODO: Unary ops are trivially OK, so accept SGPRs?
3647 for (unsigned i = 0, e = MI.getNumOperands(); i != e; ++i) {
3648 const MachineOperand &Src = MI.getOperand(i);
3649 if (!Src.isReg())
3650 continue;
3651
3652 unsigned Size = getSizeInBits(Src.getReg(), MRI, *TRI);
3653 unsigned BankID = Size == 1 ? AMDGPU::VCCRegBankID : AMDGPU::VGPRRegBankID;
3654 OpdsMapping[i] = AMDGPU::getValueMapping(BankID, Size);
3655 }
3656
3657 return getInstructionMapping(1, 1, getOperandsMapping(OpdsMapping),
3658 MI.getNumOperands());
3659}
3660
3661 const RegisterBankInfo::InstructionMapping &
3662 AMDGPURegisterBankInfo::getDefaultMappingAllVGPR(const MachineInstr &MI) const {
3663 const MachineFunction &MF = *MI.getParent()->getParent();
3664 const MachineRegisterInfo &MRI = MF.getRegInfo();
3665 SmallVector<const ValueMapping*, 8> OpdsMapping(MI.getNumOperands());
3666
3667 for (unsigned I = 0, E = MI.getNumOperands(); I != E; ++I) {
3668 const MachineOperand &Op = MI.getOperand(I);
3669 if (!Op.isReg())
3670 continue;
3671
3672 unsigned Size = getSizeInBits(Op.getReg(), MRI, *TRI);
3673 OpdsMapping[I] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
3674 }
3675
3676 return getInstructionMapping(1, 1, getOperandsMapping(OpdsMapping),
3677 MI.getNumOperands());
3678}
3679
3680 const RegisterBankInfo::InstructionMapping &
3681 AMDGPURegisterBankInfo::getImageMapping(const MachineRegisterInfo &MRI,
3682 const MachineInstr &MI,
3683 int RsrcIdx) const {
3684 // The reported argument index is relative to the IR intrinsic call arguments,
3685 // so we need to shift by the number of defs and the intrinsic ID.
3686 RsrcIdx += MI.getNumExplicitDefs() + 1;
3687
3688 const int NumOps = MI.getNumOperands();
3689 SmallVector<const ValueMapping *, 8> OpdsMapping(NumOps);
3690
3691 // TODO: Should packed/unpacked D16 difference be reported here as part of
3692 // the value mapping?
3693 for (int I = 0; I != NumOps; ++I) {
3694 if (!MI.getOperand(I).isReg())
3695 continue;
3696
3697 Register OpReg = MI.getOperand(I).getReg();
3698 // We replace some dead address operands with $noreg
3699 if (!OpReg)
3700 continue;
3701
3702 unsigned Size = getSizeInBits(OpReg, MRI, *TRI);
3703
3704 // FIXME: Probably need a new intrinsic register bank searchable table to
3705 // handle arbitrary intrinsics easily.
3706 //
3707 // If this has a sampler, it immediately follows rsrc.
3708 const bool MustBeSGPR = I == RsrcIdx || I == RsrcIdx + 1;
3709
3710 if (MustBeSGPR) {
3711 // If this must be an SGPR, we must report whatever it is as legal.
3712 unsigned NewBank = getRegBankID(OpReg, MRI, AMDGPU::SGPRRegBankID);
3713 OpdsMapping[I] = AMDGPU::getValueMapping(NewBank, Size);
3714 } else {
3715 // Some operands must be VGPR, and these are easy to copy to.
3716 OpdsMapping[I] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
3717 }
3718 }
3719
3720 return getInstructionMapping(1, 1, getOperandsMapping(OpdsMapping), NumOps);
3721}
3722
3723/// Return the mapping for a pointer argument.
3724const RegisterBankInfo::ValueMapping *
3725AMDGPURegisterBankInfo::getValueMappingForPtr(const MachineRegisterInfo &MRI,
3726 Register PtrReg) const {
3727 LLT PtrTy = MRI.getType(PtrReg);
3728 unsigned Size = PtrTy.getSizeInBits();
3729 if (Subtarget.useFlatForGlobal() ||
3730 !AMDGPU::isFlatGlobalAddrSpace(PtrTy.getAddressSpace()))
3731 return AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
3732
3733 // If we're using MUBUF instructions for global memory, an SGPR base register
3734 // is possible. Otherwise this needs to be a VGPR.
3735 const RegisterBank *PtrBank = getRegBank(PtrReg, MRI, *TRI);
3736 return AMDGPU::getValueMapping(PtrBank->getID(), Size);
3737}
3738
3739const RegisterBankInfo::InstructionMapping &
3740AMDGPURegisterBankInfo::getInstrMappingForLoad(const MachineInstr &MI) const {
3741
3742 const MachineFunction &MF = *MI.getParent()->getParent();
3743 const MachineRegisterInfo &MRI = MF.getRegInfo();
3744 SmallVector<const ValueMapping*, 2> OpdsMapping(2);
3745 unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
3746 Register PtrReg = MI.getOperand(1).getReg();
3747 LLT PtrTy = MRI.getType(PtrReg);
3748 unsigned AS = PtrTy.getAddressSpace();
3749 unsigned PtrSize = PtrTy.getSizeInBits();
3750
3751 const ValueMapping *ValMapping;
3752 const ValueMapping *PtrMapping;
3753
3754 const RegisterBank *PtrBank = getRegBank(PtrReg, MRI, *TRI);
3755
3756 if (PtrBank == &AMDGPU::SGPRRegBank && AMDGPU::isFlatGlobalAddrSpace(AS)) {
3757 if (isScalarLoadLegal(MI)) {
3758 // We have a uniform instruction, so we want to use an SMRD load.
3759 ValMapping = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size);
3760 PtrMapping = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, PtrSize);
3761 } else {
3762 ValMapping = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
3763
3764 // If we're using MUBUF instructions for global memory, an SGPR base
3765 // register is possible. Otherwise this needs to be a VGPR.
3766 unsigned PtrBankID = Subtarget.useFlatForGlobal() ?
3767 AMDGPU::VGPRRegBankID : AMDGPU::SGPRRegBankID;
3768
3769 PtrMapping = AMDGPU::getValueMapping(PtrBankID, PtrSize);
3770 }
3771 } else {
3772 ValMapping = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
3773 PtrMapping = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, PtrSize);
3774 }
3775
3776 OpdsMapping[0] = ValMapping;
3777 OpdsMapping[1] = PtrMapping;
3778 const RegisterBankInfo::InstructionMapping &Mapping = getInstructionMapping(
3779 1, 1, getOperandsMapping(OpdsMapping), MI.getNumOperands());
3780 return Mapping;
3781
3782 // FIXME: Do we want to add a mapping for FLAT load, or should we just
3783 // handle that during instruction selection?
3784}
3785
3786unsigned
3787AMDGPURegisterBankInfo::getRegBankID(Register Reg,
3788 const MachineRegisterInfo &MRI,
3789 unsigned Default) const {
3790 const RegisterBank *Bank = getRegBank(Reg, MRI, *TRI);
3791 return Bank ? Bank->getID() : Default;
3792}
3793
3794const RegisterBankInfo::ValueMapping *
3795AMDGPURegisterBankInfo::getSGPROpMapping(Register Reg,
3796 const MachineRegisterInfo &MRI,
3797 const TargetRegisterInfo &TRI) const {
3798 // Lie and claim anything is legal, even though this needs to be an SGPR;
3799 // applyMapping will have to deal with it as a waterfall loop.
3800 unsigned Bank = getRegBankID(Reg, MRI, AMDGPU::SGPRRegBankID);
3801 unsigned Size = getSizeInBits(Reg, MRI, TRI);
3802 return AMDGPU::getValueMapping(Bank, Size);
3803}
3804
3805const RegisterBankInfo::ValueMapping *
3806AMDGPURegisterBankInfo::getVGPROpMapping(Register Reg,
3807 const MachineRegisterInfo &MRI,
3808 const TargetRegisterInfo &TRI) const {
3809 unsigned Size = getSizeInBits(Reg, MRI, TRI);
3810 return AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
3811}
3812
3813const RegisterBankInfo::ValueMapping *
3814AMDGPURegisterBankInfo::getAGPROpMapping(Register Reg,
3815 const MachineRegisterInfo &MRI,
3816 const TargetRegisterInfo &TRI) const {
3817 unsigned Size = getSizeInBits(Reg, MRI, TRI);
3818 return AMDGPU::getValueMapping(AMDGPU::AGPRRegBankID, Size);
3819}
3820
3821///
3822/// This function must return a legal mapping, because
3823/// AMDGPURegisterBankInfo::getInstrAlternativeMappings() is not called
3824/// in RegBankSelect::Mode::Fast. Any mapping that would cause a
3825/// VGPR-to-SGPR copy to be generated is illegal.
3826///
3827// Operands that must be SGPRs must accept potentially divergent VGPRs as
3828// legal. These will be dealt with in applyMappingImpl.
3829//
3830const RegisterBankInfo::InstructionMapping &
3831AMDGPURegisterBankInfo::getInstrMapping(const MachineInstr &MI) const {
3832 const MachineFunction &MF = *MI.getParent()->getParent();
3833 const MachineRegisterInfo &MRI = MF.getRegInfo();
3834
3835 if (MI.isCopy() || MI.getOpcode() == AMDGPU::G_FREEZE) {
3836 Register DstReg = MI.getOperand(0).getReg();
3837 Register SrcReg = MI.getOperand(1).getReg();
3838
3839 // The default logic bothers to analyze impossible alternative mappings. We
3840 // want the most straightforward mapping, so just directly handle this.
3841 const RegisterBank *DstBank = getRegBank(DstReg, MRI, *TRI);
3842 const RegisterBank *SrcBank = getRegBank(SrcReg, MRI, *TRI);
3843 assert(SrcBank && "src bank should have been assigned already");
3844
3845 // For COPY between a physical reg and an s1, there is no type associated, so
3846 // we need to take the virtual register's type as a hint on how to interpret
3847 // s1 values.
3848 if (!SrcReg.isVirtual() && !DstBank &&
3849 MRI.getType(DstReg) == LLT::scalar(1))
3850 DstBank = &AMDGPU::VCCRegBank;
3851 else if (!DstReg.isVirtual() && MRI.getType(SrcReg) == LLT::scalar(1))
3852 DstBank = &AMDGPU::VCCRegBank;
3853
3854 if (!DstBank)
3855 DstBank = SrcBank;
3856
3857 unsigned Size = getSizeInBits(DstReg, MRI, *TRI);
3858 if (MI.getOpcode() != AMDGPU::G_FREEZE &&
3859 cannotCopy(*DstBank, *SrcBank, TypeSize::getFixed(Size)))
3860 return getInvalidInstructionMapping();
3861
3862 const ValueMapping &ValMap = getValueMapping(0, Size, *DstBank);
3863 unsigned OpdsMappingSize = MI.isCopy() ? 1 : 2;
3864 SmallVector<const ValueMapping *, 1> OpdsMapping(OpdsMappingSize);
3865 OpdsMapping[0] = &ValMap;
3866 if (MI.getOpcode() == AMDGPU::G_FREEZE)
3867 OpdsMapping[1] = &ValMap;
3868
3869 return getInstructionMapping(
3870 1, /*Cost*/ 1,
3871 /*OperandsMapping*/ getOperandsMapping(OpdsMapping), OpdsMappingSize);
3872 }
3873
3874 if (MI.isRegSequence()) {
3875 // If any input is a VGPR, the result must be a VGPR. The default handling
3876 // assumes any copy between banks is legal.
3877 unsigned BankID = AMDGPU::SGPRRegBankID;
3878
3879 for (unsigned I = 1, E = MI.getNumOperands(); I != E; I += 2) {
3880 auto OpBank = getRegBankID(MI.getOperand(I).getReg(), MRI);
3881 // It doesn't make sense to use vcc or scc banks here, so just ignore
3882 // them.
3883 if (OpBank != AMDGPU::SGPRRegBankID) {
3884 BankID = AMDGPU::VGPRRegBankID;
3885 break;
3886 }
3887 }
3888 unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
3889
3890 const ValueMapping &ValMap = getValueMapping(0, Size, getRegBank(BankID));
3891 return getInstructionMapping(
3892 1, /*Cost*/ 1,
3893 /*OperandsMapping*/ getOperandsMapping({&ValMap}), 1);
3894 }
3895
3896 // The default handling is broken and doesn't handle illegal SGPR->VGPR copies
3897 // properly.
3898 //
3899 // TODO: There are additional exec masking dependencies to analyze.
3900 if (auto *PHI = dyn_cast<GPhi>(&MI)) {
3901 unsigned ResultBank = AMDGPU::InvalidRegBankID;
3902 Register DstReg = PHI->getReg(0);
3903
3904 // Sometimes the result may have already been assigned a bank.
3905 if (const RegisterBank *DstBank = getRegBank(DstReg, MRI, *TRI))
3906 ResultBank = DstBank->getID();
3907
3908 for (unsigned I = 0; I < PHI->getNumIncomingValues(); ++I) {
3909 Register Reg = PHI->getIncomingValue(I);
3910 const RegisterBank *Bank = getRegBank(Reg, MRI, *TRI);
3911
3912 // FIXME: Assuming VGPR for any undetermined inputs.
3913 if (!Bank || Bank->getID() == AMDGPU::VGPRRegBankID) {
3914 ResultBank = AMDGPU::VGPRRegBankID;
3915 break;
3916 }
3917
3918 // FIXME: Need to promote SGPR case to s32
3919 unsigned OpBank = Bank->getID();
3920 ResultBank = regBankBoolUnion(ResultBank, OpBank);
3921 }
3922
3923 assert(ResultBank != AMDGPU::InvalidRegBankID);
3924
3925 unsigned Size = MRI.getType(DstReg).getSizeInBits();
3926
3927 const ValueMapping &ValMap =
3928 getValueMapping(0, Size, getRegBank(ResultBank));
3929 return getInstructionMapping(
3930 1, /*Cost*/ 1,
3931 /*OperandsMapping*/ getOperandsMapping({&ValMap}), 1);
3932 }
3933
3934 const RegisterBankInfo::InstructionMapping &Mapping = getInstrMappingImpl(MI);
3935 if (Mapping.isValid())
3936 return Mapping;
3937
3938 SmallVector<const ValueMapping*, 8> OpdsMapping(MI.getNumOperands());
3939
3940 switch (MI.getOpcode()) {
3941 default:
3942 return getInvalidInstructionMapping();
3943
3944 case AMDGPU::G_AND:
3945 case AMDGPU::G_OR:
3946 case AMDGPU::G_XOR:
3947 case AMDGPU::G_MUL: {
3948 unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
3949 if (Size == 1) {
3950 const RegisterBank *DstBank
3951 = getRegBank(MI.getOperand(0).getReg(), MRI, *TRI);
3952
3953 unsigned TargetBankID = AMDGPU::InvalidRegBankID;
3954 unsigned BankLHS = AMDGPU::InvalidRegBankID;
3955 unsigned BankRHS = AMDGPU::InvalidRegBankID;
3956 if (DstBank) {
3957 TargetBankID = DstBank->getID();
3958 if (DstBank == &AMDGPU::VCCRegBank) {
3959 TargetBankID = AMDGPU::VCCRegBankID;
3960 BankLHS = AMDGPU::VCCRegBankID;
3961 BankRHS = AMDGPU::VCCRegBankID;
3962 } else {
3963 BankLHS = getRegBankID(MI.getOperand(1).getReg(), MRI,
3964 AMDGPU::SGPRRegBankID);
3965 BankRHS = getRegBankID(MI.getOperand(2).getReg(), MRI,
3966 AMDGPU::SGPRRegBankID);
3967 }
3968 } else {
3969 BankLHS = getRegBankID(MI.getOperand(1).getReg(), MRI,
3970 AMDGPU::VCCRegBankID);
3971 BankRHS = getRegBankID(MI.getOperand(2).getReg(), MRI,
3972 AMDGPU::VCCRegBankID);
3973
3974 // Both inputs should be true booleans to produce a boolean result.
3975 if (BankLHS == AMDGPU::VGPRRegBankID || BankRHS == AMDGPU::VGPRRegBankID) {
3976 TargetBankID = AMDGPU::VGPRRegBankID;
3977 } else if (BankLHS == AMDGPU::VCCRegBankID || BankRHS == AMDGPU::VCCRegBankID) {
3978 TargetBankID = AMDGPU::VCCRegBankID;
3979 BankLHS = AMDGPU::VCCRegBankID;
3980 BankRHS = AMDGPU::VCCRegBankID;
3981 } else if (BankLHS == AMDGPU::SGPRRegBankID && BankRHS == AMDGPU::SGPRRegBankID) {
3982 TargetBankID = AMDGPU::SGPRRegBankID;
3983 }
3984 }
3985
3986 OpdsMapping[0] = AMDGPU::getValueMapping(TargetBankID, Size);
3987 OpdsMapping[1] = AMDGPU::getValueMapping(BankLHS, Size);
3988 OpdsMapping[2] = AMDGPU::getValueMapping(BankRHS, Size);
3989 break;
3990 }
3991
3992 if (Size == 64) {
3993
3994 if (isSALUMapping(MI)) {
3995 OpdsMapping[0] = getValueMappingSGPR64Only(AMDGPU::SGPRRegBankID, Size);
3996 OpdsMapping[1] = OpdsMapping[2] = OpdsMapping[0];
3997 } else {
3998 if (MI.getOpcode() == AMDGPU::G_MUL && Subtarget.hasVectorMulU64())
3999 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
4000 else
4001 OpdsMapping[0] =
4002 getValueMappingSGPR64Only(AMDGPU::VGPRRegBankID, Size);
4003 unsigned Bank1 = getRegBankID(MI.getOperand(1).getReg(), MRI /*, DefaultBankID*/);
4004 OpdsMapping[1] = AMDGPU::getValueMapping(Bank1, Size);
4005
4006 unsigned Bank2 = getRegBankID(MI.getOperand(2).getReg(), MRI /*, DefaultBankID*/);
4007 OpdsMapping[2] = AMDGPU::getValueMapping(Bank2, Size);
4008 }
4009
4010 break;
4011 }
4012
4013 [[fallthrough]];
4014 }
4015 case AMDGPU::G_PTR_ADD:
4016 case AMDGPU::G_PTRMASK:
4017 case AMDGPU::G_ADD:
4018 case AMDGPU::G_SUB:
4019 case AMDGPU::G_SHL:
4020 case AMDGPU::G_LSHR:
4021 case AMDGPU::G_ASHR:
4022 case AMDGPU::G_UADDO:
4023 case AMDGPU::G_USUBO:
4024 case AMDGPU::G_UADDE:
4025 case AMDGPU::G_SADDE:
4026 case AMDGPU::G_USUBE:
4027 case AMDGPU::G_SSUBE:
4028 case AMDGPU::G_ABS:
4029 case AMDGPU::G_SHUFFLE_VECTOR:
4030 case AMDGPU::G_SBFX:
4031 case AMDGPU::G_UBFX:
4032 case AMDGPU::G_AMDGPU_S_MUL_I64_I32:
4033 case AMDGPU::G_AMDGPU_S_MUL_U64_U32:
4034 if (isSALUMapping(MI))
4035 return getDefaultMappingSOP(MI);
4036 return getDefaultMappingVOP(MI);
4037 case AMDGPU::G_SMIN:
4038 case AMDGPU::G_SMAX:
4039 case AMDGPU::G_UMIN:
4040 case AMDGPU::G_UMAX:
4041 if (isSALUMapping(MI)) {
4042 // There are no scalar 64-bit min and max instructions; use the vector
4043 // instructions instead.
4043 if (MRI.getType(MI.getOperand(0).getReg()).getSizeInBits() == 64 &&
4045 return getDefaultMappingVOP(MI);
4046 return getDefaultMappingSOP(MI);
4047 }
4048 return getDefaultMappingVOP(MI);
4049 case AMDGPU::G_FADD:
4050 case AMDGPU::G_FSUB:
4051 case AMDGPU::G_FMUL:
4052 case AMDGPU::G_FMA:
4053 case AMDGPU::G_FFLOOR:
4054 case AMDGPU::G_FCEIL:
4055 case AMDGPU::G_INTRINSIC_ROUNDEVEN:
4056 case AMDGPU::G_FMINNUM:
4057 case AMDGPU::G_FMAXNUM:
4058 case AMDGPU::G_FMINIMUM:
4059 case AMDGPU::G_FMAXIMUM:
4060 case AMDGPU::G_FMINIMUMNUM:
4061 case AMDGPU::G_FMAXIMUMNUM:
4062 case AMDGPU::G_INTRINSIC_TRUNC:
4063 case AMDGPU::G_STRICT_FADD:
4064 case AMDGPU::G_STRICT_FSUB:
4065 case AMDGPU::G_STRICT_FMUL:
4066 case AMDGPU::G_STRICT_FMA: {
4067 LLT Ty = MRI.getType(MI.getOperand(0).getReg());
4068 unsigned Size = Ty.getSizeInBits();
4069 if (Subtarget.hasSALUFloatInsts() && Ty.isScalar() &&
4070 (Size == 32 || Size == 16) && isSALUMapping(MI))
4071 return getDefaultMappingSOP(MI);
4072 return getDefaultMappingVOP(MI);
4073 }
4074 case AMDGPU::G_FPTOSI:
4075 case AMDGPU::G_FPTOUI:
4076 case AMDGPU::G_SITOFP:
4077 case AMDGPU::G_UITOFP: {
4078 unsigned SizeDst = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4079 unsigned SizeSrc = MRI.getType(MI.getOperand(1).getReg()).getSizeInBits();
4080 if (Subtarget.hasSALUFloatInsts() && SizeDst == 32 && SizeSrc == 32 &&
4081 isSALUMapping(MI))
4082 return getDefaultMappingSOP(MI);
4083 return getDefaultMappingVOP(MI);
4084 }
4085 case AMDGPU::G_FPTRUNC:
4086 case AMDGPU::G_FPEXT: {
4087 unsigned SizeDst = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4088 unsigned SizeSrc = MRI.getType(MI.getOperand(1).getReg()).getSizeInBits();
4089 if (Subtarget.hasSALUFloatInsts() && SizeDst != 64 && SizeSrc != 64 &&
4090 isSALUMapping(MI))
4091 return getDefaultMappingSOP(MI);
4092 return getDefaultMappingVOP(MI);
4093 }
4094 case AMDGPU::G_FSQRT:
4095 case AMDGPU::G_FEXP2:
4096 case AMDGPU::G_FLOG2: {
4097 unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4098 if (Subtarget.hasPseudoScalarTrans() && (Size == 16 || Size == 32) &&
4099 isSALUMapping(MI))
4100 return getDefaultMappingSOP(MI);
4101 return getDefaultMappingVOP(MI);
4102 }
4103 case AMDGPU::G_SADDSAT: // FIXME: Could lower sat ops for SALU
4104 case AMDGPU::G_SSUBSAT:
4105 case AMDGPU::G_UADDSAT:
4106 case AMDGPU::G_USUBSAT:
4107 case AMDGPU::G_FMAD:
4108 case AMDGPU::G_FLDEXP:
4109 case AMDGPU::G_FMINNUM_IEEE:
4110 case AMDGPU::G_FMAXNUM_IEEE:
4111 case AMDGPU::G_FCANONICALIZE:
4112 case AMDGPU::G_STRICT_FLDEXP:
4113 case AMDGPU::G_BSWAP: // TODO: Somehow expand for scalar?
4114 case AMDGPU::G_FSHR: // TODO: Expand for scalar
4115 case AMDGPU::G_AMDGPU_FMIN_LEGACY:
4116 case AMDGPU::G_AMDGPU_FMAX_LEGACY:
4117 case AMDGPU::G_AMDGPU_RCP_IFLAG:
4118 case AMDGPU::G_AMDGPU_CVT_F32_UBYTE0:
4119 case AMDGPU::G_AMDGPU_CVT_F32_UBYTE1:
4120 case AMDGPU::G_AMDGPU_CVT_F32_UBYTE2:
4121 case AMDGPU::G_AMDGPU_CVT_F32_UBYTE3:
4122 case AMDGPU::G_AMDGPU_CVT_PK_I16_I32:
4123 case AMDGPU::G_AMDGPU_SMED3:
4124 case AMDGPU::G_AMDGPU_FMED3:
4125 return getDefaultMappingVOP(MI);
4126 case AMDGPU::G_UMULH:
4127 case AMDGPU::G_SMULH: {
4128 if (Subtarget.hasScalarMulHiInsts() && isSALUMapping(MI))
4129 return getDefaultMappingSOP(MI);
4130 return getDefaultMappingVOP(MI);
4131 }
4132 case AMDGPU::G_AMDGPU_MAD_U64_U32:
4133 case AMDGPU::G_AMDGPU_MAD_I64_I32: {
4134 // Three possible mappings:
4135 //
4136 // - Default SOP
4137 // - Default VOP
4138 // - Scalar multiply: src0 and src1 are SGPRs, the rest is VOP.
4139 //
4140 // This allows instruction selection to keep the multiplication part of the
4141 // instruction on the SALU.
4142 bool AllSalu = true;
4143 bool MulSalu = true;
4144 for (unsigned i = 0; i < 5; ++i) {
4145 Register Reg = MI.getOperand(i).getReg();
4146 if (const RegisterBank *Bank = getRegBank(Reg, MRI, *TRI)) {
4147 if (Bank->getID() != AMDGPU::SGPRRegBankID) {
4148 AllSalu = false;
4149 if (i == 2 || i == 3) {
4150 MulSalu = false;
4151 break;
4152 }
4153 }
4154 }
4155 }
4156
4157 if (AllSalu)
4158 return getDefaultMappingSOP(MI);
4159
4160 // If the multiply-add is full-rate in VALU, use that even if the
4161 // multiplication part is scalar. Accumulating separately on the VALU would
4162 // take two instructions.
4163 if (!MulSalu || Subtarget.hasFullRate64Ops())
4164 return getDefaultMappingVOP(MI);
4165
4166 // Keep the multiplication on the SALU, then accumulate on the VALU.
4167 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 64);
4168 OpdsMapping[1] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1);
4169 OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 32);
4170 OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 32);
4171 OpdsMapping[4] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 64);
4172 break;
4173 }
4174 case AMDGPU::G_IMPLICIT_DEF: {
4175 unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4176 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size);
4177 break;
4178 }
4179 case AMDGPU::G_FCONSTANT:
4180 case AMDGPU::G_CONSTANT:
4181 case AMDGPU::G_GLOBAL_VALUE:
4182 case AMDGPU::G_FRAME_INDEX:
4183 case AMDGPU::G_BLOCK_ADDR:
4184 case AMDGPU::G_READSTEADYCOUNTER:
4185 case AMDGPU::G_READCYCLECOUNTER: {
4186 unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4187 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size);
4188 break;
4189 }
4190 case AMDGPU::G_DYN_STACKALLOC: {
4191 // Result is always uniform, and a wave reduction is needed for the source.
4192 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 32);
4193 unsigned SrcBankID = getRegBankID(MI.getOperand(1).getReg(), MRI);
4194 OpdsMapping[1] = AMDGPU::getValueMapping(SrcBankID, 32);
4195 break;
4196 }
4197 case AMDGPU::G_AMDGPU_WAVE_ADDRESS: {
4198 // This case is weird because we expect a physical register in the source,
4199 // but need to set a bank anyway.
4200 //
4201 // TODO: We could select the result to SGPR or VGPR
4202 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 32);
4203 OpdsMapping[1] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 32);
4204 break;
4205 }
4206 case AMDGPU::G_INSERT: {
4207 unsigned BankID = getMappingType(MRI, MI);
4208 unsigned DstSize = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
4209 unsigned SrcSize = getSizeInBits(MI.getOperand(1).getReg(), MRI, *TRI);
4210 unsigned EltSize = getSizeInBits(MI.getOperand(2).getReg(), MRI, *TRI);
4211 OpdsMapping[0] = AMDGPU::getValueMapping(BankID, DstSize);
4212 OpdsMapping[1] = AMDGPU::getValueMapping(BankID, SrcSize);
4213 OpdsMapping[2] = AMDGPU::getValueMapping(BankID, EltSize);
4214 OpdsMapping[3] = nullptr;
4215 break;
4216 }
4217 case AMDGPU::G_EXTRACT: {
4218 unsigned BankID = getRegBankID(MI.getOperand(1).getReg(), MRI);
4219 unsigned DstSize = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
4220 unsigned SrcSize = getSizeInBits(MI.getOperand(1).getReg(), MRI, *TRI);
4221 OpdsMapping[0] = AMDGPU::getValueMapping(BankID, DstSize);
4222 OpdsMapping[1] = AMDGPU::getValueMapping(BankID, SrcSize);
4223 OpdsMapping[2] = nullptr;
4224 break;
4225 }
4226 case AMDGPU::G_BUILD_VECTOR:
4227 case AMDGPU::G_BUILD_VECTOR_TRUNC: {
4228 LLT DstTy = MRI.getType(MI.getOperand(0).getReg());
4229 if (DstTy == LLT::fixed_vector(2, 16)) {
4230 unsigned DstSize = DstTy.getSizeInBits();
4231 unsigned SrcSize = MRI.getType(MI.getOperand(1).getReg()).getSizeInBits();
4232 unsigned Src0BankID = getRegBankID(MI.getOperand(1).getReg(), MRI);
4233 unsigned Src1BankID = getRegBankID(MI.getOperand(2).getReg(), MRI);
4234 unsigned DstBankID = regBankUnion(Src0BankID, Src1BankID);
4235
4236 OpdsMapping[0] = AMDGPU::getValueMapping(DstBankID, DstSize);
4237 OpdsMapping[1] = AMDGPU::getValueMapping(Src0BankID, SrcSize);
4238 OpdsMapping[2] = AMDGPU::getValueMapping(Src1BankID, SrcSize);
4239 break;
4240 }
4241
4242 [[fallthrough]];
4243 }
4244 case AMDGPU::G_MERGE_VALUES:
4245 case AMDGPU::G_CONCAT_VECTORS: {
4246 unsigned Bank = getMappingType(MRI, MI);
4247 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4248 unsigned SrcSize = MRI.getType(MI.getOperand(1).getReg()).getSizeInBits();
4249
4250 OpdsMapping[0] = AMDGPU::getValueMapping(Bank, DstSize);
4251 // Op1 and Dst should use the same register bank.
4252 for (unsigned i = 1, e = MI.getNumOperands(); i != e; ++i)
4253 OpdsMapping[i] = AMDGPU::getValueMapping(Bank, SrcSize);
4254 break;
4255 }
4256 case AMDGPU::G_BITREVERSE:
4257 case AMDGPU::G_BITCAST:
4258 case AMDGPU::G_INTTOPTR:
4259 case AMDGPU::G_PTRTOINT:
4260 case AMDGPU::G_FABS:
4261 case AMDGPU::G_FNEG: {
4262 unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4263 unsigned BankID = getRegBankID(MI.getOperand(1).getReg(), MRI);
4264 OpdsMapping[0] = OpdsMapping[1] = AMDGPU::getValueMapping(BankID, Size);
4265 break;
4266 }
4267 case AMDGPU::G_AMDGPU_FFBH_U32:
4268 case AMDGPU::G_AMDGPU_FFBL_B32:
4269 case AMDGPU::G_CTLZ_ZERO_UNDEF:
4270 case AMDGPU::G_CTTZ_ZERO_UNDEF: {
4271 unsigned Size = MRI.getType(MI.getOperand(1).getReg()).getSizeInBits();
4272 unsigned BankID = getRegBankID(MI.getOperand(1).getReg(), MRI);
4273 OpdsMapping[0] = AMDGPU::getValueMapping(BankID, 32);
4274 OpdsMapping[1] = AMDGPU::getValueMappingSGPR64Only(BankID, Size);
4275 break;
4276 }
4277 case AMDGPU::G_CTPOP: {
4278 unsigned Size = MRI.getType(MI.getOperand(1).getReg()).getSizeInBits();
4279 unsigned BankID = getRegBankID(MI.getOperand(1).getReg(), MRI);
4280 OpdsMapping[0] = AMDGPU::getValueMapping(BankID, 32);
4281
4282 // This should really be getValueMappingSGPR64Only, but allowing the generic
4283 // code to handle the register split just makes using LegalizerHelper more
4284 // difficult.
4285 OpdsMapping[1] = AMDGPU::getValueMapping(BankID, Size);
4286 break;
4287 }
4288 case AMDGPU::G_TRUNC: {
4289 Register Dst = MI.getOperand(0).getReg();
4290 Register Src = MI.getOperand(1).getReg();
4291 unsigned Bank = getRegBankID(Src, MRI);
4292 unsigned DstSize = getSizeInBits(Dst, MRI, *TRI);
4293 unsigned SrcSize = getSizeInBits(Src, MRI, *TRI);
4294 OpdsMapping[0] = AMDGPU::getValueMapping(Bank, DstSize);
4295 OpdsMapping[1] = AMDGPU::getValueMapping(Bank, SrcSize);
4296 break;
4297 }
4298 case AMDGPU::G_ZEXT:
4299 case AMDGPU::G_SEXT:
4300 case AMDGPU::G_ANYEXT:
4301 case AMDGPU::G_SEXT_INREG: {
4302 Register Dst = MI.getOperand(0).getReg();
4303 Register Src = MI.getOperand(1).getReg();
4304 unsigned DstSize = getSizeInBits(Dst, MRI, *TRI);
4305 unsigned SrcSize = getSizeInBits(Src, MRI, *TRI);
4306
4307 unsigned DstBank;
4308 const RegisterBank *SrcBank = getRegBank(Src, MRI, *TRI);
4309 assert(SrcBank);
4310 switch (SrcBank->getID()) {
4311 case AMDGPU::SGPRRegBankID:
4312 DstBank = AMDGPU::SGPRRegBankID;
4313 break;
4314 default:
4315 DstBank = AMDGPU::VGPRRegBankID;
4316 break;
4317 }
4318
4319 // Scalar extend can use 64-bit BFE, but VGPRs require extending to
4320 // 32-bits, and then to 64.
4321 OpdsMapping[0] = AMDGPU::getValueMappingSGPR64Only(DstBank, DstSize);
4322 OpdsMapping[1] = AMDGPU::getValueMappingSGPR64Only(SrcBank->getID(),
4323 SrcSize);
4324 break;
4325 }
4326 case AMDGPU::G_IS_FPCLASS: {
4327 Register SrcReg = MI.getOperand(1).getReg();
4328 unsigned SrcSize = MRI.getType(SrcReg).getSizeInBits();
4329 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4330 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, DstSize);
4331 OpdsMapping[1] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, SrcSize);
4332 break;
4333 }
4334 case AMDGPU::G_STORE: {
4335 assert(MI.getOperand(0).isReg());
4336 unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4337
4338 // FIXME: We need to specify a different reg bank once scalar stores are
4339 // supported.
4340 const ValueMapping *ValMapping =
4341 AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
4342 OpdsMapping[0] = ValMapping;
4343 OpdsMapping[1] = getValueMappingForPtr(MRI, MI.getOperand(1).getReg());
4344 break;
4345 }
4346 case AMDGPU::G_ICMP:
4347 case AMDGPU::G_FCMP: {
4348 unsigned Size = MRI.getType(MI.getOperand(2).getReg()).getSizeInBits();
4349
4350 // See if the result register has already been constrained to vcc, which may
4351 // happen due to control flow intrinsic lowering.
4352 unsigned DstBank = getRegBankID(MI.getOperand(0).getReg(), MRI,
4353 AMDGPU::SGPRRegBankID);
4354 unsigned Op2Bank = getRegBankID(MI.getOperand(2).getReg(), MRI);
4355 unsigned Op3Bank = getRegBankID(MI.getOperand(3).getReg(), MRI);
4356
4357 auto canUseSCCICMP = [&]() {
4358 auto Pred =
4359 static_cast<CmpInst::Predicate>(MI.getOperand(1).getPredicate());
4360 return Size == 32 ||
4361 (Size == 64 &&
4362 (Pred == CmpInst::ICMP_EQ || Pred == CmpInst::ICMP_NE) &&
4363 Subtarget.hasScalarCompareEq64());
4364 };
4365 auto canUseSCCFCMP = [&]() {
4366 return Subtarget.hasSALUFloatInsts() && (Size == 32 || Size == 16);
4367 };
4368
4369 bool isICMP = MI.getOpcode() == AMDGPU::G_ICMP;
4370 bool CanUseSCC = DstBank == AMDGPU::SGPRRegBankID &&
4371 Op2Bank == AMDGPU::SGPRRegBankID &&
4372 Op3Bank == AMDGPU::SGPRRegBankID &&
4373 (isICMP ? canUseSCCICMP() : canUseSCCFCMP());
4374
4375 DstBank = CanUseSCC ? AMDGPU::SGPRRegBankID : AMDGPU::VCCRegBankID;
4376 unsigned SrcBank = CanUseSCC ? AMDGPU::SGPRRegBankID : AMDGPU::VGPRRegBankID;
4377
4378 // TODO: Use 32-bit for scalar output size.
4379 // SCC results will need to be copied to a 32-bit SGPR virtual register.
4380 const unsigned ResultSize = 1;
4381
4382 OpdsMapping[0] = AMDGPU::getValueMapping(DstBank, ResultSize);
4383 OpdsMapping[1] = nullptr; // Predicate Operand.
4384 OpdsMapping[2] = AMDGPU::getValueMapping(SrcBank, Size);
4385 OpdsMapping[3] = AMDGPU::getValueMapping(SrcBank, Size);
4386 break;
4387 }
4388 case AMDGPU::G_EXTRACT_VECTOR_ELT: {
4389 // VGPR index can be used for waterfall when indexing a SGPR vector.
4390 unsigned SrcBankID = getRegBankID(MI.getOperand(1).getReg(), MRI);
4391 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4392 unsigned SrcSize = MRI.getType(MI.getOperand(1).getReg()).getSizeInBits();
4393 unsigned IdxSize = MRI.getType(MI.getOperand(2).getReg()).getSizeInBits();
4394 unsigned IdxBank = getRegBankID(MI.getOperand(2).getReg(), MRI);
4395 unsigned OutputBankID = regBankUnion(SrcBankID, IdxBank);
4396
4397 OpdsMapping[0] = AMDGPU::getValueMappingSGPR64Only(OutputBankID, DstSize);
4398 OpdsMapping[1] = AMDGPU::getValueMapping(SrcBankID, SrcSize);
4399
4400 // The index can be in either bank if the source vector is VGPR.
4401 OpdsMapping[2] = AMDGPU::getValueMapping(IdxBank, IdxSize);
4402 break;
4403 }
4404 case AMDGPU::G_INSERT_VECTOR_ELT: {
4405 unsigned OutputBankID = isSALUMapping(MI) ?
4406 AMDGPU::SGPRRegBankID : AMDGPU::VGPRRegBankID;
4407
4408 unsigned VecSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4409 unsigned InsertSize = MRI.getType(MI.getOperand(2).getReg()).getSizeInBits();
4410 unsigned IdxSize = MRI.getType(MI.getOperand(3).getReg()).getSizeInBits();
4411 unsigned InsertEltBankID = getRegBankID(MI.getOperand(2).getReg(), MRI);
4412 unsigned IdxBankID = getRegBankID(MI.getOperand(3).getReg(), MRI);
4413
4414 OpdsMapping[0] = AMDGPU::getValueMapping(OutputBankID, VecSize);
4415 OpdsMapping[1] = AMDGPU::getValueMapping(OutputBankID, VecSize);
4416
4417 // This is a weird case, because we need to break down the mapping based on
4418 // the register bank of a different operand.
4419 if (InsertSize == 64 && OutputBankID == AMDGPU::VGPRRegBankID) {
4420 OpdsMapping[2] = AMDGPU::getValueMappingSplit64(InsertEltBankID,
4421 InsertSize);
4422 } else {
4423 assert(InsertSize == 32 || InsertSize == 64);
4424 OpdsMapping[2] = AMDGPU::getValueMapping(InsertEltBankID, InsertSize);
4425 }
4426
4427 // The index can be in either bank if the source vector is VGPR.
4428 OpdsMapping[3] = AMDGPU::getValueMapping(IdxBankID, IdxSize);
4429 break;
4430 }
4431 case AMDGPU::G_UNMERGE_VALUES: {
4432 unsigned Bank = getMappingType(MRI, MI);
4433
4434 // Op1 and Dst should use the same register bank.
4435 // FIXME: Shouldn't this be the default? Why do we need to handle this?
4436 for (unsigned i = 0, e = MI.getNumOperands(); i != e; ++i) {
4437 unsigned Size = getSizeInBits(MI.getOperand(i).getReg(), MRI, *TRI);
4438 OpdsMapping[i] = AMDGPU::getValueMapping(Bank, Size);
4439 }
4440 break;
4441 }
4442 case AMDGPU::G_AMDGPU_BUFFER_LOAD:
4443 case AMDGPU::G_AMDGPU_BUFFER_LOAD_UBYTE:
4444 case AMDGPU::G_AMDGPU_BUFFER_LOAD_SBYTE:
4445 case AMDGPU::G_AMDGPU_BUFFER_LOAD_USHORT:
4446 case AMDGPU::G_AMDGPU_BUFFER_LOAD_SSHORT:
4447 case AMDGPU::G_AMDGPU_BUFFER_LOAD_TFE:
4448 case AMDGPU::G_AMDGPU_BUFFER_LOAD_UBYTE_TFE:
4449 case AMDGPU::G_AMDGPU_BUFFER_LOAD_SBYTE_TFE:
4450 case AMDGPU::G_AMDGPU_BUFFER_LOAD_USHORT_TFE:
4451 case AMDGPU::G_AMDGPU_BUFFER_LOAD_SSHORT_TFE:
4452 case AMDGPU::G_AMDGPU_BUFFER_LOAD_FORMAT:
4453 case AMDGPU::G_AMDGPU_BUFFER_LOAD_FORMAT_TFE:
4454 case AMDGPU::G_AMDGPU_BUFFER_LOAD_FORMAT_D16:
4455 case AMDGPU::G_AMDGPU_TBUFFER_LOAD_FORMAT:
4456 case AMDGPU::G_AMDGPU_TBUFFER_LOAD_FORMAT_D16:
4457 case AMDGPU::G_AMDGPU_TBUFFER_STORE_FORMAT:
4458 case AMDGPU::G_AMDGPU_TBUFFER_STORE_FORMAT_D16:
4459 case AMDGPU::G_AMDGPU_BUFFER_STORE:
4460 case AMDGPU::G_AMDGPU_BUFFER_STORE_BYTE:
4461 case AMDGPU::G_AMDGPU_BUFFER_STORE_SHORT:
4462 case AMDGPU::G_AMDGPU_BUFFER_STORE_FORMAT:
4463 case AMDGPU::G_AMDGPU_BUFFER_STORE_FORMAT_D16: {
4464 OpdsMapping[0] = getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
4465
4466 // rsrc
4467 OpdsMapping[1] = getSGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
4468
4469 // vindex
4470 OpdsMapping[2] = getVGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
4471
4472 // voffset
4473 OpdsMapping[3] = getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
4474
4475 // soffset
4476 OpdsMapping[4] = getSGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
4477
4478 // Any remaining operands are immediates and were correctly null
4479 // initialized.
4480 break;
4481 }
4482 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SWAP:
4483 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_ADD:
4484 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SUB:
4485 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SMIN:
4486 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_UMIN:
4487 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SMAX:
4488 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_UMAX:
4489 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_AND:
4490 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_OR:
4491 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_XOR:
4492 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_INC:
4493 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_DEC:
4494 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_FADD:
4495 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_FMIN:
4496 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_FMAX: {
4497 // vdata_out
4498 OpdsMapping[0] = getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
4499
4500 // vdata_in
4501 OpdsMapping[1] = getVGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
4502
4503 // rsrc
4504 OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
4505
4506 // vindex
4507 OpdsMapping[3] = getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
4508
4509 // voffset
4510 OpdsMapping[4] = getVGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
4511
4512 // soffset
4513 OpdsMapping[5] = getSGPROpMapping(MI.getOperand(5).getReg(), MRI, *TRI);
4514
4515 // Any remaining operands are immediates and were correctly null
4516 // initialized.
4517 break;
4518 }
4519 case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_CMPSWAP: {
4520 // vdata_out
4521 OpdsMapping[0] = getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
4522
4523 // vdata_in
4524 OpdsMapping[1] = getVGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
4525
4526 // cmp
4527 OpdsMapping[2] = getVGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
4528
4529 // rsrc
4530 OpdsMapping[3] = getSGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
4531
4532 // vindex
4533 OpdsMapping[4] = getVGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
4534
4535 // voffset
4536 OpdsMapping[5] = getVGPROpMapping(MI.getOperand(5).getReg(), MRI, *TRI);
4537
4538 // soffset
4539 OpdsMapping[6] = getSGPROpMapping(MI.getOperand(6).getReg(), MRI, *TRI);
4540
4541 // Any remaining operands are immediates and were correctly null
4542 // initialized.
4543 break;
4544 }
4545 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD:
4546 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD_UBYTE:
4547 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD_SBYTE:
4548 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD_USHORT:
4549 case AMDGPU::G_AMDGPU_S_BUFFER_LOAD_SSHORT: {
4550 // Lie and claim everything is legal, even though some need to be
4551 // SGPRs. applyMapping will have to deal with it as a waterfall loop.
4552 OpdsMapping[1] = getSGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
4553 OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
4554
4555 // We need to convert this to a MUBUF if either the resource or offset is
4556 // VGPR.
4557 unsigned RSrcBank = OpdsMapping[1]->BreakDown[0].RegBank->getID();
4558 unsigned OffsetBank = OpdsMapping[2]->BreakDown[0].RegBank->getID();
4559 unsigned ResultBank = regBankUnion(RSrcBank, OffsetBank);
4560
4561 unsigned Size0 = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4562 OpdsMapping[0] = AMDGPU::getValueMapping(ResultBank, Size0);
4563 break;
4564 }
4565 case AMDGPU::G_AMDGPU_S_BUFFER_PREFETCH:
4566 OpdsMapping[0] = getSGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
4567 OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
4568 break;
4569 case AMDGPU::G_INTRINSIC:
4570 case AMDGPU::G_INTRINSIC_CONVERGENT: {
4571 switch (cast<GIntrinsic>(MI).getIntrinsicID()) {
4572 default:
4573 return getInvalidInstructionMapping();
4574 case Intrinsic::amdgcn_div_fmas:
4575 case Intrinsic::amdgcn_div_fixup:
4576 case Intrinsic::amdgcn_trig_preop:
4577 case Intrinsic::amdgcn_sin:
4578 case Intrinsic::amdgcn_cos:
4579 case Intrinsic::amdgcn_log_clamp:
4580 case Intrinsic::amdgcn_rcp_legacy:
4581 case Intrinsic::amdgcn_rsq_legacy:
4582 case Intrinsic::amdgcn_rsq_clamp:
4583 case Intrinsic::amdgcn_tanh:
4584 case Intrinsic::amdgcn_fmul_legacy:
4585 case Intrinsic::amdgcn_fma_legacy:
4586 case Intrinsic::amdgcn_frexp_mant:
4587 case Intrinsic::amdgcn_frexp_exp:
4588 case Intrinsic::amdgcn_fract:
4589 case Intrinsic::amdgcn_cvt_pknorm_i16:
4590 case Intrinsic::amdgcn_cvt_pknorm_u16:
4591 case Intrinsic::amdgcn_cvt_pk_i16:
4592 case Intrinsic::amdgcn_cvt_pk_u16:
4593 case Intrinsic::amdgcn_cvt_sr_pk_f16_f32:
4594 case Intrinsic::amdgcn_cvt_sr_pk_bf16_f32:
4595 case Intrinsic::amdgcn_cvt_pk_f16_fp8:
4596 case Intrinsic::amdgcn_cvt_pk_f16_bf8:
4597 case Intrinsic::amdgcn_cvt_pk_fp8_f16:
4598 case Intrinsic::amdgcn_cvt_pk_bf8_f16:
4599 case Intrinsic::amdgcn_cvt_sr_fp8_f16:
4600 case Intrinsic::amdgcn_cvt_sr_bf8_f16:
4601 case Intrinsic::amdgcn_cvt_scale_pk8_f16_fp8:
4602 case Intrinsic::amdgcn_cvt_scale_pk8_bf16_fp8:
4603 case Intrinsic::amdgcn_cvt_scale_pk8_f16_bf8:
4604 case Intrinsic::amdgcn_cvt_scale_pk8_bf16_bf8:
4605 case Intrinsic::amdgcn_cvt_scale_pk8_f16_fp4:
4606 case Intrinsic::amdgcn_cvt_scale_pk8_bf16_fp4:
4607 case Intrinsic::amdgcn_cvt_scale_pk8_f32_fp8:
4608 case Intrinsic::amdgcn_cvt_scale_pk8_f32_bf8:
4609 case Intrinsic::amdgcn_cvt_scale_pk8_f32_fp4:
4610 case Intrinsic::amdgcn_cvt_scale_pk16_f16_fp6:
4611 case Intrinsic::amdgcn_cvt_scale_pk16_bf16_fp6:
4612 case Intrinsic::amdgcn_cvt_scale_pk16_f16_bf6:
4613 case Intrinsic::amdgcn_cvt_scale_pk16_bf16_bf6:
4614 case Intrinsic::amdgcn_cvt_scale_pk16_f32_fp6:
4615 case Intrinsic::amdgcn_cvt_scale_pk16_f32_bf6:
4616 case Intrinsic::amdgcn_cvt_scalef32_pk8_fp8_bf16:
4617 case Intrinsic::amdgcn_cvt_scalef32_pk8_bf8_bf16:
4618 case Intrinsic::amdgcn_cvt_scalef32_pk8_fp8_f16:
4619 case Intrinsic::amdgcn_cvt_scalef32_pk8_bf8_f16:
4620 case Intrinsic::amdgcn_cvt_scalef32_pk8_fp8_f32:
4621 case Intrinsic::amdgcn_cvt_scalef32_pk8_bf8_f32:
4622 case Intrinsic::amdgcn_cvt_scalef32_pk8_fp4_f32:
4623 case Intrinsic::amdgcn_cvt_scalef32_pk8_fp4_f16:
4624 case Intrinsic::amdgcn_cvt_scalef32_pk8_fp4_bf16:
4625 case Intrinsic::amdgcn_cvt_scalef32_pk16_fp6_f32:
4626 case Intrinsic::amdgcn_cvt_scalef32_pk16_bf6_f32:
4627 case Intrinsic::amdgcn_cvt_scalef32_pk16_fp6_f16:
4628 case Intrinsic::amdgcn_cvt_scalef32_pk16_bf6_f16:
4629 case Intrinsic::amdgcn_cvt_scalef32_pk16_fp6_bf16:
4630 case Intrinsic::amdgcn_cvt_scalef32_pk16_bf6_bf16:
4631 case Intrinsic::amdgcn_cvt_scalef32_sr_pk8_fp8_bf16:
4632 case Intrinsic::amdgcn_cvt_scalef32_sr_pk8_bf8_bf16:
4633 case Intrinsic::amdgcn_cvt_scalef32_sr_pk8_fp8_f16:
4634 case Intrinsic::amdgcn_cvt_scalef32_sr_pk8_bf8_f16:
4635 case Intrinsic::amdgcn_cvt_scalef32_sr_pk8_fp8_f32:
4636 case Intrinsic::amdgcn_cvt_scalef32_sr_pk8_bf8_f32:
4637 case Intrinsic::amdgcn_cvt_scalef32_sr_pk8_fp4_f32:
4638 case Intrinsic::amdgcn_cvt_scalef32_sr_pk8_fp4_f16:
4639 case Intrinsic::amdgcn_cvt_scalef32_sr_pk8_fp4_bf16:
4640 case Intrinsic::amdgcn_cvt_scalef32_sr_pk16_fp6_f32:
4641 case Intrinsic::amdgcn_cvt_scalef32_sr_pk16_bf6_f32:
4642 case Intrinsic::amdgcn_cvt_scalef32_sr_pk16_fp6_f16:
4643 case Intrinsic::amdgcn_cvt_scalef32_sr_pk16_bf6_f16:
4644 case Intrinsic::amdgcn_cvt_scalef32_sr_pk16_fp6_bf16:
4645 case Intrinsic::amdgcn_cvt_scalef32_sr_pk16_bf6_bf16:
4646 case Intrinsic::amdgcn_sat_pk4_i4_i8:
4647 case Intrinsic::amdgcn_sat_pk4_u4_u8:
4648 case Intrinsic::amdgcn_fmed3:
4649 case Intrinsic::amdgcn_cubeid:
4650 case Intrinsic::amdgcn_cubema:
4651 case Intrinsic::amdgcn_cubesc:
4652 case Intrinsic::amdgcn_cubetc:
4653 case Intrinsic::amdgcn_sffbh:
4654 case Intrinsic::amdgcn_fmad_ftz:
4655 case Intrinsic::amdgcn_mbcnt_lo:
4656 case Intrinsic::amdgcn_mbcnt_hi:
4657 case Intrinsic::amdgcn_mul_u24:
4658 case Intrinsic::amdgcn_mul_i24:
4659 case Intrinsic::amdgcn_mulhi_u24:
4660 case Intrinsic::amdgcn_mulhi_i24:
4661 case Intrinsic::amdgcn_lerp:
4662 case Intrinsic::amdgcn_sad_u8:
4663 case Intrinsic::amdgcn_msad_u8:
4664 case Intrinsic::amdgcn_sad_hi_u8:
4665 case Intrinsic::amdgcn_sad_u16:
4666 case Intrinsic::amdgcn_qsad_pk_u16_u8:
4667 case Intrinsic::amdgcn_mqsad_pk_u16_u8:
4668 case Intrinsic::amdgcn_mqsad_u32_u8:
4669 case Intrinsic::amdgcn_cvt_pk_u8_f32:
4670 case Intrinsic::amdgcn_alignbyte:
4671 case Intrinsic::amdgcn_perm:
4672 case Intrinsic::amdgcn_prng_b32:
4673 case Intrinsic::amdgcn_fdot2:
4674 case Intrinsic::amdgcn_sdot2:
4675 case Intrinsic::amdgcn_udot2:
4676 case Intrinsic::amdgcn_sdot4:
4677 case Intrinsic::amdgcn_udot4:
4678 case Intrinsic::amdgcn_sdot8:
4679 case Intrinsic::amdgcn_udot8:
4680 case Intrinsic::amdgcn_fdot2_bf16_bf16:
4681 case Intrinsic::amdgcn_fdot2_f16_f16:
4682 case Intrinsic::amdgcn_fdot2_f32_bf16:
4683 case Intrinsic::amdgcn_fdot2c_f32_bf16:
4684 case Intrinsic::amdgcn_sudot4:
4685 case Intrinsic::amdgcn_sudot8:
4686 case Intrinsic::amdgcn_dot4_f32_fp8_bf8:
4687 case Intrinsic::amdgcn_dot4_f32_bf8_fp8:
4688 case Intrinsic::amdgcn_dot4_f32_fp8_fp8:
4689 case Intrinsic::amdgcn_dot4_f32_bf8_bf8:
4690 case Intrinsic::amdgcn_cvt_f32_fp8:
4691 case Intrinsic::amdgcn_cvt_f32_fp8_e5m3:
4692 case Intrinsic::amdgcn_cvt_f32_bf8:
4693 case Intrinsic::amdgcn_cvt_off_f32_i4:
4694 case Intrinsic::amdgcn_cvt_pk_f32_fp8:
4695 case Intrinsic::amdgcn_cvt_pk_f32_bf8:
4696 case Intrinsic::amdgcn_cvt_pk_fp8_f32:
4697 case Intrinsic::amdgcn_cvt_pk_fp8_f32_e5m3:
4698 case Intrinsic::amdgcn_cvt_pk_bf8_f32:
4699 case Intrinsic::amdgcn_cvt_sr_fp8_f32:
4700 case Intrinsic::amdgcn_cvt_sr_fp8_f32_e5m3:
4701 case Intrinsic::amdgcn_cvt_sr_bf8_f32:
4702 case Intrinsic::amdgcn_cvt_sr_bf16_f32:
4703 case Intrinsic::amdgcn_cvt_sr_f16_f32:
4704 case Intrinsic::amdgcn_cvt_f16_fp8:
4705 case Intrinsic::amdgcn_cvt_f16_bf8:
4706 case Intrinsic::amdgcn_cvt_scalef32_pk32_fp6_f16:
4707 case Intrinsic::amdgcn_cvt_scalef32_pk32_bf6_f16:
4708 case Intrinsic::amdgcn_cvt_scalef32_pk32_fp6_bf16:
4709 case Intrinsic::amdgcn_cvt_scalef32_pk32_bf6_bf16:
4710 case Intrinsic::amdgcn_cvt_scalef32_f16_fp8:
4711 case Intrinsic::amdgcn_cvt_scalef32_f16_bf8:
4712 case Intrinsic::amdgcn_cvt_scalef32_f32_fp8:
4713 case Intrinsic::amdgcn_cvt_scalef32_f32_bf8:
4714 case Intrinsic::amdgcn_cvt_scalef32_pk_fp8_f32:
4715 case Intrinsic::amdgcn_cvt_scalef32_pk_bf8_f32:
4716 case Intrinsic::amdgcn_cvt_scalef32_pk_f32_fp8:
4717 case Intrinsic::amdgcn_cvt_scalef32_pk_f32_bf8:
4718 case Intrinsic::amdgcn_cvt_scalef32_pk_fp8_f16:
4719 case Intrinsic::amdgcn_cvt_scalef32_pk_fp8_bf16:
4720 case Intrinsic::amdgcn_cvt_scalef32_pk_bf8_f16:
4721 case Intrinsic::amdgcn_cvt_scalef32_pk_bf8_bf16:
4722 case Intrinsic::amdgcn_cvt_scalef32_pk_f32_fp4:
4723 case Intrinsic::amdgcn_cvt_scalef32_pk_fp4_f32:
4724 case Intrinsic::amdgcn_cvt_scalef32_pk_f16_fp4:
4725 case Intrinsic::amdgcn_cvt_scalef32_pk_bf16_fp4:
4726 case Intrinsic::amdgcn_cvt_scalef32_pk32_f32_fp6:
4727 case Intrinsic::amdgcn_cvt_scalef32_pk32_f32_bf6:
4728 case Intrinsic::amdgcn_cvt_scalef32_pk32_f16_bf6:
4729 case Intrinsic::amdgcn_cvt_scalef32_pk32_bf16_bf6:
4730 case Intrinsic::amdgcn_cvt_scalef32_pk32_f16_fp6:
4731 case Intrinsic::amdgcn_cvt_scalef32_pk32_bf16_fp6:
4732 case Intrinsic::amdgcn_cvt_scalef32_pk_f16_bf8:
4733 case Intrinsic::amdgcn_cvt_scalef32_pk_bf16_bf8:
4734 case Intrinsic::amdgcn_cvt_scalef32_pk_f16_fp8:
4735 case Intrinsic::amdgcn_cvt_scalef32_pk_bf16_fp8:
4736 case Intrinsic::amdgcn_cvt_scalef32_pk_fp4_f16:
4737 case Intrinsic::amdgcn_cvt_scalef32_pk_fp4_bf16:
4738 case Intrinsic::amdgcn_cvt_scalef32_sr_pk_fp4_f16:
4739 case Intrinsic::amdgcn_cvt_scalef32_sr_pk_fp4_bf16:
4740 case Intrinsic::amdgcn_cvt_scalef32_sr_pk_fp4_f32:
4741 case Intrinsic::amdgcn_cvt_scalef32_sr_pk32_bf6_bf16:
4742 case Intrinsic::amdgcn_cvt_scalef32_sr_pk32_bf6_f16:
4743 case Intrinsic::amdgcn_cvt_scalef32_sr_pk32_bf6_f32:
4744 case Intrinsic::amdgcn_cvt_scalef32_sr_pk32_fp6_bf16:
4745 case Intrinsic::amdgcn_cvt_scalef32_sr_pk32_fp6_f16:
4746 case Intrinsic::amdgcn_cvt_scalef32_sr_pk32_fp6_f32:
4747 case Intrinsic::amdgcn_cvt_scalef32_sr_bf8_bf16:
4748 case Intrinsic::amdgcn_cvt_scalef32_sr_bf8_f16:
4749 case Intrinsic::amdgcn_cvt_scalef32_sr_bf8_f32:
4750 case Intrinsic::amdgcn_cvt_scalef32_sr_fp8_bf16:
4751 case Intrinsic::amdgcn_cvt_scalef32_sr_fp8_f16:
4752 case Intrinsic::amdgcn_cvt_scalef32_sr_fp8_f32:
4753 case Intrinsic::amdgcn_ashr_pk_i8_i32:
4754 case Intrinsic::amdgcn_ashr_pk_u8_i32:
4755 case Intrinsic::amdgcn_cvt_scalef32_2xpk16_fp6_f32:
4756 case Intrinsic::amdgcn_cvt_scalef32_2xpk16_bf6_f32:
4757 case Intrinsic::amdgcn_wmma_bf16_16x16x16_bf16:
4758 case Intrinsic::amdgcn_wmma_f16_16x16x16_f16:
4759 case Intrinsic::amdgcn_wmma_bf16_16x16x16_bf16_tied:
4760 case Intrinsic::amdgcn_wmma_f16_16x16x16_f16_tied:
4761 case Intrinsic::amdgcn_wmma_f32_16x16x16_bf16:
4762 case Intrinsic::amdgcn_wmma_f32_16x16x16_f16:
4763 case Intrinsic::amdgcn_wmma_i32_16x16x16_iu4:
4764 case Intrinsic::amdgcn_wmma_i32_16x16x16_iu8:
4765 case Intrinsic::amdgcn_wmma_f32_16x16x16_fp8_fp8:
4766 case Intrinsic::amdgcn_wmma_f32_16x16x16_fp8_bf8:
4767 case Intrinsic::amdgcn_wmma_f32_16x16x16_bf8_fp8:
4768 case Intrinsic::amdgcn_wmma_f32_16x16x16_bf8_bf8:
4769 case Intrinsic::amdgcn_wmma_i32_16x16x32_iu4:
4770 case Intrinsic::amdgcn_swmmac_f32_16x16x32_f16:
4771 case Intrinsic::amdgcn_swmmac_f32_16x16x32_bf16:
4772 case Intrinsic::amdgcn_swmmac_f16_16x16x32_f16:
4773 case Intrinsic::amdgcn_swmmac_bf16_16x16x32_bf16:
4774 case Intrinsic::amdgcn_swmmac_i32_16x16x32_iu8:
4775 case Intrinsic::amdgcn_swmmac_i32_16x16x32_iu4:
4776 case Intrinsic::amdgcn_swmmac_i32_16x16x64_iu4:
4777 case Intrinsic::amdgcn_swmmac_f32_16x16x32_fp8_fp8:
4778 case Intrinsic::amdgcn_swmmac_f32_16x16x32_fp8_bf8:
4779 case Intrinsic::amdgcn_swmmac_f32_16x16x32_bf8_fp8:
4780 case Intrinsic::amdgcn_swmmac_f32_16x16x32_bf8_bf8:
4781 case Intrinsic::amdgcn_wmma_f32_16x16x4_f32:
4782 case Intrinsic::amdgcn_wmma_f32_16x16x32_bf16:
4783 case Intrinsic::amdgcn_wmma_f32_16x16x32_f16:
4784 case Intrinsic::amdgcn_wmma_f16_16x16x32_f16:
4785 case Intrinsic::amdgcn_wmma_bf16_16x16x32_bf16:
4786 case Intrinsic::amdgcn_wmma_bf16f32_16x16x32_bf16:
4787 case Intrinsic::amdgcn_wmma_f32_16x16x64_fp8_fp8:
4788 case Intrinsic::amdgcn_wmma_f32_16x16x64_fp8_bf8:
4789 case Intrinsic::amdgcn_wmma_f32_16x16x64_bf8_fp8:
4790 case Intrinsic::amdgcn_wmma_f32_16x16x64_bf8_bf8:
4791 case Intrinsic::amdgcn_wmma_f16_16x16x64_fp8_fp8:
4792 case Intrinsic::amdgcn_wmma_f16_16x16x64_fp8_bf8:
4793 case Intrinsic::amdgcn_wmma_f16_16x16x64_bf8_fp8:
4794 case Intrinsic::amdgcn_wmma_f16_16x16x64_bf8_bf8:
4795 case Intrinsic::amdgcn_wmma_f16_16x16x128_fp8_fp8:
4796 case Intrinsic::amdgcn_wmma_f16_16x16x128_fp8_bf8:
4797 case Intrinsic::amdgcn_wmma_f16_16x16x128_bf8_fp8:
4798 case Intrinsic::amdgcn_wmma_f16_16x16x128_bf8_bf8:
4799 case Intrinsic::amdgcn_wmma_f32_16x16x128_fp8_fp8:
4800 case Intrinsic::amdgcn_wmma_f32_16x16x128_fp8_bf8:
4801 case Intrinsic::amdgcn_wmma_f32_16x16x128_bf8_fp8:
4802 case Intrinsic::amdgcn_wmma_f32_16x16x128_bf8_bf8:
4803 case Intrinsic::amdgcn_wmma_i32_16x16x64_iu8:
4804 case Intrinsic::amdgcn_wmma_f32_16x16x128_f8f6f4:
4805 case Intrinsic::amdgcn_wmma_scale_f32_16x16x128_f8f6f4:
4806 case Intrinsic::amdgcn_wmma_scale16_f32_16x16x128_f8f6f4:
4807 case Intrinsic::amdgcn_wmma_f32_32x16x128_f4:
4808 case Intrinsic::amdgcn_wmma_scale_f32_32x16x128_f4:
4809 case Intrinsic::amdgcn_wmma_scale16_f32_32x16x128_f4:
4810 case Intrinsic::amdgcn_swmmac_f16_16x16x64_f16:
4811 case Intrinsic::amdgcn_swmmac_bf16_16x16x64_bf16:
4812 case Intrinsic::amdgcn_swmmac_f32_16x16x64_bf16:
4813 case Intrinsic::amdgcn_swmmac_bf16f32_16x16x64_bf16:
4814 case Intrinsic::amdgcn_swmmac_f32_16x16x64_f16:
4815 case Intrinsic::amdgcn_swmmac_f32_16x16x128_fp8_fp8:
4816 case Intrinsic::amdgcn_swmmac_f32_16x16x128_fp8_bf8:
4817 case Intrinsic::amdgcn_swmmac_f32_16x16x128_bf8_fp8:
4818 case Intrinsic::amdgcn_swmmac_f32_16x16x128_bf8_bf8:
4819 case Intrinsic::amdgcn_swmmac_f16_16x16x128_fp8_fp8:
4820 case Intrinsic::amdgcn_swmmac_f16_16x16x128_fp8_bf8:
4821 case Intrinsic::amdgcn_swmmac_f16_16x16x128_bf8_fp8:
4822 case Intrinsic::amdgcn_swmmac_f16_16x16x128_bf8_bf8:
4823 case Intrinsic::amdgcn_swmmac_i32_16x16x128_iu8:
4824 case Intrinsic::amdgcn_perm_pk16_b4_u4:
4825 case Intrinsic::amdgcn_perm_pk16_b6_u4:
4826 case Intrinsic::amdgcn_perm_pk16_b8_u4:
4827 return getDefaultMappingVOP(MI);
4828 case Intrinsic::amdgcn_log:
4829 case Intrinsic::amdgcn_exp2:
4830 case Intrinsic::amdgcn_rcp:
4831 case Intrinsic::amdgcn_rsq:
4832 case Intrinsic::amdgcn_sqrt: {
4833 unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4834 if (Subtarget.hasPseudoScalarTrans() && (Size == 16 || Size == 32) &&
4835 isSALUMapping(MI))
4836 return getDefaultMappingSOP(MI);
4837 return getDefaultMappingVOP(MI);
4838 }
4839 case Intrinsic::amdgcn_sbfe:
4840 case Intrinsic::amdgcn_ubfe:
4841 if (isSALUMapping(MI))
4842 return getDefaultMappingSOP(MI);
4843 return getDefaultMappingVOP(MI);
4844 case Intrinsic::amdgcn_ds_swizzle:
4845 case Intrinsic::amdgcn_ds_permute:
4846 case Intrinsic::amdgcn_ds_bpermute:
4847 case Intrinsic::amdgcn_update_dpp:
4848 case Intrinsic::amdgcn_mov_dpp8:
4849 case Intrinsic::amdgcn_mov_dpp:
4850 case Intrinsic::amdgcn_strict_wwm:
4851 case Intrinsic::amdgcn_wwm:
4852 case Intrinsic::amdgcn_strict_wqm:
4853 case Intrinsic::amdgcn_wqm:
4854 case Intrinsic::amdgcn_softwqm:
4855 case Intrinsic::amdgcn_set_inactive:
4856 case Intrinsic::amdgcn_set_inactive_chain_arg:
4857 case Intrinsic::amdgcn_permlane64:
4858 case Intrinsic::amdgcn_ds_bpermute_fi_b32:
4860 case Intrinsic::amdgcn_cvt_pkrtz:
4861 if (Subtarget.hasSALUFloatInsts() && isSALUMapping(MI))
4862 return getDefaultMappingSOP(MI);
4863 return getDefaultMappingVOP(MI);
4864 case Intrinsic::amdgcn_kernarg_segment_ptr:
4865 case Intrinsic::amdgcn_s_getpc:
4866 case Intrinsic::amdgcn_groupstaticsize:
4867 case Intrinsic::amdgcn_reloc_constant:
4868 case Intrinsic::returnaddress: {
4869 unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4870 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size);
4871 break;
4872 }
4873 case Intrinsic::amdgcn_wqm_vote: {
4874 unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4875 OpdsMapping[0] = OpdsMapping[2]
4876 = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, Size);
4877 break;
4878 }
4879 case Intrinsic::amdgcn_ps_live: {
4880 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1);
4881 break;
4882 }
4883 case Intrinsic::amdgcn_div_scale: {
4884 unsigned Dst0Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4885 unsigned Dst1Size = MRI.getType(MI.getOperand(1).getReg()).getSizeInBits();
4886 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Dst0Size);
4887 OpdsMapping[1] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, Dst1Size);
4888
4889 unsigned SrcSize = MRI.getType(MI.getOperand(3).getReg()).getSizeInBits();
4890 OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, SrcSize);
4891 OpdsMapping[4] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, SrcSize);
4892 break;
4893 }
4894 case Intrinsic::amdgcn_class: {
4895 Register Src0Reg = MI.getOperand(2).getReg();
4896 Register Src1Reg = MI.getOperand(3).getReg();
4897 unsigned Src0Size = MRI.getType(Src0Reg).getSizeInBits();
4898 unsigned Src1Size = MRI.getType(Src1Reg).getSizeInBits();
4899 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4900 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, DstSize);
4901 OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Src0Size);
4902 OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Src1Size);
4903 break;
4904 }
4905 case Intrinsic::amdgcn_icmp:
4906 case Intrinsic::amdgcn_fcmp: {
4907 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4908 // This is not VCCRegBank because this is not used in boolean contexts.
4909 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, DstSize);
4910 unsigned OpSize = MRI.getType(MI.getOperand(2).getReg()).getSizeInBits();
4911 OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, OpSize);
4912 OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, OpSize);
4913 break;
4914 }
4915 case Intrinsic::amdgcn_readlane: {
4916 // This must be an SGPR, but accept a VGPR.
4917 Register IdxReg = MI.getOperand(3).getReg();
4918 unsigned IdxSize = MRI.getType(IdxReg).getSizeInBits();
4919 unsigned IdxBank = getRegBankID(IdxReg, MRI, AMDGPU::SGPRRegBankID);
4920 OpdsMapping[3] = AMDGPU::getValueMapping(IdxBank, IdxSize);
4921 [[fallthrough]];
4922 }
4923 case Intrinsic::amdgcn_readfirstlane: {
4924 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4925 unsigned SrcSize = MRI.getType(MI.getOperand(2).getReg()).getSizeInBits();
4926 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, DstSize);
4927 OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, SrcSize);
4928 break;
4929 }
4930 case Intrinsic::amdgcn_writelane: {
4931 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4932 Register SrcReg = MI.getOperand(2).getReg();
4933 unsigned SrcSize = MRI.getType(SrcReg).getSizeInBits();
4934 unsigned SrcBank = getRegBankID(SrcReg, MRI, AMDGPU::SGPRRegBankID);
4935 Register IdxReg = MI.getOperand(3).getReg();
4936 unsigned IdxSize = MRI.getType(IdxReg).getSizeInBits();
4937 unsigned IdxBank = getRegBankID(IdxReg, MRI, AMDGPU::SGPRRegBankID);
4938 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, DstSize);
4939
4940 // These 2 must be SGPRs, but accept VGPRs. Readfirstlane will be inserted
4941 // to legalize.
4942 OpdsMapping[2] = AMDGPU::getValueMapping(SrcBank, SrcSize);
4943 OpdsMapping[3] = AMDGPU::getValueMapping(IdxBank, IdxSize);
4944 OpdsMapping[4] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, SrcSize);
4945 break;
4946 }
4947 case Intrinsic::amdgcn_if_break: {
4948 unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
4949 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size);
4950 OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1);
4951 OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size);
4952 break;
4953 }
4954 case Intrinsic::amdgcn_permlane16:
4955 case Intrinsic::amdgcn_permlanex16: {
4956 unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
4957 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
4958 OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
4959 OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
4960 OpdsMapping[4] = getSGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
4961 OpdsMapping[5] = getSGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
4962 break;
4963 }
4964 case Intrinsic::amdgcn_permlane_bcast:
4965 case Intrinsic::amdgcn_permlane_up:
4966 case Intrinsic::amdgcn_permlane_down:
4967 case Intrinsic::amdgcn_permlane_xor: {
4968 unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
4969 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
4970 OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
4971 OpdsMapping[3] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
4972 OpdsMapping[4] = getSGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
4973 break;
4974 }
4975 case Intrinsic::amdgcn_permlane_idx_gen: {
4976 unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
4977 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
4978 OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
4979 OpdsMapping[3] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
4980 break;
4981 }
4982 case Intrinsic::amdgcn_permlane16_var:
4983 case Intrinsic::amdgcn_permlanex16_var: {
4984 unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
4985 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
4986 OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
4987 OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
4988 OpdsMapping[4] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
4989 break;
4990 }
4991 case Intrinsic::amdgcn_mfma_f32_4x4x1f32:
4992 case Intrinsic::amdgcn_mfma_f32_4x4x4f16:
4993 case Intrinsic::amdgcn_mfma_i32_4x4x4i8:
4994 case Intrinsic::amdgcn_mfma_f32_4x4x2bf16:
4995 case Intrinsic::amdgcn_mfma_f32_16x16x1f32:
4996 case Intrinsic::amdgcn_mfma_f32_16x16x4f32:
4997 case Intrinsic::amdgcn_mfma_f32_16x16x4f16:
4998 case Intrinsic::amdgcn_mfma_f32_16x16x16f16:
4999 case Intrinsic::amdgcn_mfma_i32_16x16x4i8:
5000 case Intrinsic::amdgcn_mfma_i32_16x16x16i8:
5001 case Intrinsic::amdgcn_mfma_f32_16x16x2bf16:
5002 case Intrinsic::amdgcn_mfma_f32_16x16x8bf16:
5003 case Intrinsic::amdgcn_mfma_f32_32x32x1f32:
5004 case Intrinsic::amdgcn_mfma_f32_32x32x2f32:
5005 case Intrinsic::amdgcn_mfma_f32_32x32x4f16:
5006 case Intrinsic::amdgcn_mfma_f32_32x32x8f16:
5007 case Intrinsic::amdgcn_mfma_i32_32x32x4i8:
5008 case Intrinsic::amdgcn_mfma_i32_32x32x8i8:
5009 case Intrinsic::amdgcn_mfma_f32_32x32x2bf16:
5010 case Intrinsic::amdgcn_mfma_f32_32x32x4bf16:
5011 case Intrinsic::amdgcn_mfma_f32_32x32x4bf16_1k:
5012 case Intrinsic::amdgcn_mfma_f32_16x16x4bf16_1k:
5013 case Intrinsic::amdgcn_mfma_f32_4x4x4bf16_1k:
5014 case Intrinsic::amdgcn_mfma_f32_32x32x8bf16_1k:
5015 case Intrinsic::amdgcn_mfma_f32_16x16x16bf16_1k:
5016 case Intrinsic::amdgcn_mfma_f64_16x16x4f64:
5017 case Intrinsic::amdgcn_mfma_f64_4x4x4f64:
5018 case Intrinsic::amdgcn_mfma_i32_16x16x32_i8:
5019 case Intrinsic::amdgcn_mfma_i32_32x32x16_i8:
5020 case Intrinsic::amdgcn_mfma_f32_16x16x8_xf32:
5021 case Intrinsic::amdgcn_mfma_f32_32x32x4_xf32:
5022 case Intrinsic::amdgcn_mfma_f32_16x16x32_bf8_bf8:
5023 case Intrinsic::amdgcn_mfma_f32_16x16x32_bf8_fp8:
5024 case Intrinsic::amdgcn_mfma_f32_16x16x32_fp8_bf8:
5025 case Intrinsic::amdgcn_mfma_f32_16x16x32_fp8_fp8:
5026 case Intrinsic::amdgcn_mfma_f32_32x32x16_bf8_bf8:
5027 case Intrinsic::amdgcn_mfma_f32_32x32x16_bf8_fp8:
5028 case Intrinsic::amdgcn_mfma_f32_32x32x16_fp8_bf8:
5029 case Intrinsic::amdgcn_mfma_f32_32x32x16_fp8_fp8:
5030 case Intrinsic::amdgcn_mfma_f32_16x16x32_f16:
5031 case Intrinsic::amdgcn_mfma_f32_32x32x16_f16:
5032 case Intrinsic::amdgcn_mfma_i32_16x16x64_i8:
5033 case Intrinsic::amdgcn_mfma_i32_32x32x32_i8:
5034 case Intrinsic::amdgcn_mfma_f32_16x16x32_bf16: {
5035 // Default for MAI intrinsics.
5036 // srcC can also be an immediate which can be folded later.
5037 // FIXME: Should we eventually add an alternative mapping with AGPR src
5038 // for srcA/srcB?
5039 //
5040 // vdst, srcA, srcB, srcC
5041 const SIMachineFunctionInfo *Info = MF.getInfo<SIMachineFunctionInfo>();
5042 OpdsMapping[0] =
5043 Info->mayNeedAGPRs()
5044 ? getAGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI)
5045 : getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
5046 OpdsMapping[2] = getVGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5047 OpdsMapping[3] = getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
5048 OpdsMapping[4] =
5049 Info->mayNeedAGPRs()
5050 ? getAGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI)
5051 : getVGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
5052 break;
5053 }
5054 case Intrinsic::amdgcn_mfma_scale_f32_16x16x128_f8f6f4:
5055 case Intrinsic::amdgcn_mfma_scale_f32_32x32x64_f8f6f4: {
5056 const SIMachineFunctionInfo *Info = MF.getInfo<SIMachineFunctionInfo>();
5057 OpdsMapping[0] =
5058 Info->mayNeedAGPRs()
5059 ? getAGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI)
5060 : getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
5061
5062 OpdsMapping[2] = getVGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5063 OpdsMapping[3] = getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
5064 OpdsMapping[4] =
5065 Info->mayNeedAGPRs()
5066 ? getAGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI)
5067 : getVGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
5068
5069 OpdsMapping[8] = getVGPROpMapping(MI.getOperand(8).getReg(), MRI, *TRI);
5070 OpdsMapping[10] = getVGPROpMapping(MI.getOperand(10).getReg(), MRI, *TRI);
5071 break;
5072 }
5073 case Intrinsic::amdgcn_smfmac_f32_16x16x32_f16:
5074 case Intrinsic::amdgcn_smfmac_f32_32x32x16_f16:
5075 case Intrinsic::amdgcn_smfmac_f32_16x16x32_bf16:
5076 case Intrinsic::amdgcn_smfmac_f32_32x32x16_bf16:
5077 case Intrinsic::amdgcn_smfmac_i32_16x16x64_i8:
5078 case Intrinsic::amdgcn_smfmac_i32_32x32x32_i8:
5079 case Intrinsic::amdgcn_smfmac_f32_16x16x64_bf8_bf8:
5080 case Intrinsic::amdgcn_smfmac_f32_16x16x64_bf8_fp8:
5081 case Intrinsic::amdgcn_smfmac_f32_16x16x64_fp8_bf8:
5082 case Intrinsic::amdgcn_smfmac_f32_16x16x64_fp8_fp8:
5083 case Intrinsic::amdgcn_smfmac_f32_32x32x32_bf8_bf8:
5084 case Intrinsic::amdgcn_smfmac_f32_32x32x32_bf8_fp8:
5085 case Intrinsic::amdgcn_smfmac_f32_32x32x32_fp8_bf8:
5086 case Intrinsic::amdgcn_smfmac_f32_32x32x32_fp8_fp8:
5087 case Intrinsic::amdgcn_smfmac_f32_16x16x64_f16:
5088 case Intrinsic::amdgcn_smfmac_f32_32x32x32_f16:
5089 case Intrinsic::amdgcn_smfmac_f32_16x16x64_bf16:
5090 case Intrinsic::amdgcn_smfmac_f32_32x32x32_bf16:
5091 case Intrinsic::amdgcn_smfmac_i32_16x16x128_i8:
5092 case Intrinsic::amdgcn_smfmac_i32_32x32x64_i8:
5093 case Intrinsic::amdgcn_smfmac_f32_16x16x128_bf8_bf8:
5094 case Intrinsic::amdgcn_smfmac_f32_16x16x128_bf8_fp8:
5095 case Intrinsic::amdgcn_smfmac_f32_16x16x128_fp8_bf8:
5096 case Intrinsic::amdgcn_smfmac_f32_16x16x128_fp8_fp8:
5097 case Intrinsic::amdgcn_smfmac_f32_32x32x64_bf8_bf8:
5098 case Intrinsic::amdgcn_smfmac_f32_32x32x64_bf8_fp8:
5099 case Intrinsic::amdgcn_smfmac_f32_32x32x64_fp8_bf8:
5100 case Intrinsic::amdgcn_smfmac_f32_32x32x64_fp8_fp8: {
5101 // vdst, srcA, srcB, srcC, idx
5102 OpdsMapping[0] = getAGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
5103 OpdsMapping[2] = getVGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5104 OpdsMapping[3] = getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
5105 OpdsMapping[4] = getAGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
5106 OpdsMapping[5] = getVGPROpMapping(MI.getOperand(5).getReg(), MRI, *TRI);
5107 break;
5108 }
5109 case Intrinsic::amdgcn_interp_p1:
5110 case Intrinsic::amdgcn_interp_p2:
5111 case Intrinsic::amdgcn_interp_mov:
5112 case Intrinsic::amdgcn_interp_p1_f16:
5113 case Intrinsic::amdgcn_interp_p2_f16:
5114 case Intrinsic::amdgcn_lds_param_load: {
5115 const int M0Idx = MI.getNumOperands() - 1;
5116 Register M0Reg = MI.getOperand(M0Idx).getReg();
5117 unsigned M0Bank = getRegBankID(M0Reg, MRI, AMDGPU::SGPRRegBankID);
5118 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
5119
5120 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, DstSize);
5121 for (int I = 2; I != M0Idx && MI.getOperand(I).isReg(); ++I)
5122 OpdsMapping[I] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5123
5124 // Must be SGPR, but we must take whatever the original bank is and fix it
5125 // later.
5126 OpdsMapping[M0Idx] = AMDGPU::getValueMapping(M0Bank, 32);
5127 break;
5128 }
5129 case Intrinsic::amdgcn_interp_inreg_p10:
5130 case Intrinsic::amdgcn_interp_inreg_p2:
5131 case Intrinsic::amdgcn_interp_inreg_p10_f16:
5132 case Intrinsic::amdgcn_interp_inreg_p2_f16:
5133 case Intrinsic::amdgcn_interp_p10_rtz_f16:
5134 case Intrinsic::amdgcn_interp_p2_rtz_f16: {
5135 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
5136 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, DstSize);
5137 OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5138 OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5139 OpdsMapping[4] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5140 break;
5141 }
5142 case Intrinsic::amdgcn_permlane16_swap:
5143 case Intrinsic::amdgcn_permlane32_swap: {
5144 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
5145 OpdsMapping[0] = OpdsMapping[1] = OpdsMapping[3] = OpdsMapping[4] =
5146 AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, DstSize);
5147 break;
5148 }
5149 case Intrinsic::amdgcn_ballot: {
5150 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
5151 unsigned SrcSize = MRI.getType(MI.getOperand(2).getReg()).getSizeInBits();
5152 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, DstSize);
5153 OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, SrcSize);
5154 break;
5155 }
5156 case Intrinsic::amdgcn_inverse_ballot: {
5157 // This must be an SGPR, but accept a VGPR.
5158 Register MaskReg = MI.getOperand(2).getReg();
5159 unsigned MaskSize = MRI.getType(MaskReg).getSizeInBits();
5160 unsigned MaskBank = getRegBankID(MaskReg, MRI, AMDGPU::SGPRRegBankID);
5161 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1);
5162 OpdsMapping[2] = AMDGPU::getValueMapping(MaskBank, MaskSize);
5163 break;
5164 }
5165 case Intrinsic::amdgcn_bitop3: {
5166 unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
5167 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
5168 OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
5169 OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
5170 OpdsMapping[4] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
5171 break;
5172 }
5173 case Intrinsic::amdgcn_s_quadmask:
5174 case Intrinsic::amdgcn_s_wqm: {
5175 Register MaskReg = MI.getOperand(2).getReg();
5176 unsigned MaskSize = MRI.getType(MaskReg).getSizeInBits();
5177 unsigned MaskBank = getRegBankID(MaskReg, MRI, AMDGPU::SGPRRegBankID);
5178 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, MaskSize);
5179 OpdsMapping[2] = AMDGPU::getValueMapping(MaskBank, MaskSize);
5180 break;
5181 }
5182 case Intrinsic::amdgcn_wave_reduce_add:
5183 case Intrinsic::amdgcn_wave_reduce_sub:
5184 case Intrinsic::amdgcn_wave_reduce_min:
5185 case Intrinsic::amdgcn_wave_reduce_umin:
5186 case Intrinsic::amdgcn_wave_reduce_max:
5187 case Intrinsic::amdgcn_wave_reduce_umax:
5188 case Intrinsic::amdgcn_wave_reduce_and:
5189 case Intrinsic::amdgcn_wave_reduce_or:
5190 case Intrinsic::amdgcn_wave_reduce_xor: {
5191 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
5192 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, DstSize);
5193 unsigned OpSize = MRI.getType(MI.getOperand(2).getReg()).getSizeInBits();
5194 auto regBankID =
5195 isSALUMapping(MI) ? AMDGPU::SGPRRegBankID : AMDGPU::VGPRRegBankID;
5196 OpdsMapping[2] = AMDGPU::getValueMapping(regBankID, OpSize);
5197 break;
5198 }
5199 case Intrinsic::amdgcn_s_bitreplicate:
5200 Register MaskReg = MI.getOperand(2).getReg();
5201 unsigned MaskBank = getRegBankID(MaskReg, MRI, AMDGPU::SGPRRegBankID);
5202 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 64);
5203 OpdsMapping[2] = AMDGPU::getValueMapping(MaskBank, 32);
5204 }
5205 break;
5206 }
5207 case AMDGPU::G_AMDGPU_INTRIN_IMAGE_LOAD:
5208 case AMDGPU::G_AMDGPU_INTRIN_IMAGE_LOAD_D16:
5209 case AMDGPU::G_AMDGPU_INTRIN_IMAGE_LOAD_NORET:
5210 case AMDGPU::G_AMDGPU_INTRIN_IMAGE_STORE:
5211 case AMDGPU::G_AMDGPU_INTRIN_IMAGE_STORE_D16: {
5212 auto IntrID = AMDGPU::getIntrinsicID(MI);
5213 const AMDGPU::RsrcIntrinsic *RSrcIntrin = AMDGPU::lookupRsrcIntrinsic(IntrID);
5214 assert(RSrcIntrin && "missing RsrcIntrinsic for image intrinsic");
5215 // Non-images can have complications from operands that allow both SGPR
5216 // and VGPR. For now it's too complicated to figure out the final opcode
5217 // to derive the register bank from the MCInstrDesc.
5218 assert(RSrcIntrin->IsImage);
5219 return getImageMapping(MRI, MI, RSrcIntrin->RsrcArg);
5220 }
5221 case AMDGPU::G_AMDGPU_BVH_INTERSECT_RAY:
5222 case AMDGPU::G_AMDGPU_BVH8_INTERSECT_RAY:
5223 case AMDGPU::G_AMDGPU_BVH_DUAL_INTERSECT_RAY: {
5224 bool IsDualOrBVH8 =
5225 MI.getOpcode() == AMDGPU::G_AMDGPU_BVH_DUAL_INTERSECT_RAY ||
5226 MI.getOpcode() == AMDGPU::G_AMDGPU_BVH8_INTERSECT_RAY;
5227 unsigned NumMods = IsDualOrBVH8 ? 0 : 1; // Has A16 modifier
5228 unsigned LastRegOpIdx = MI.getNumExplicitOperands() - 1 - NumMods;
5229 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
5230 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, DstSize);
5231 if (IsDualOrBVH8) {
5232 OpdsMapping[1] = AMDGPU::getValueMapping(
5233 AMDGPU::VGPRRegBankID,
5234 MRI.getType(MI.getOperand(1).getReg()).getSizeInBits());
5235 OpdsMapping[2] = AMDGPU::getValueMapping(
5236 AMDGPU::VGPRRegBankID,
5237 MRI.getType(MI.getOperand(2).getReg()).getSizeInBits());
5238 }
5239 OpdsMapping[LastRegOpIdx] =
5240 getSGPROpMapping(MI.getOperand(LastRegOpIdx).getReg(), MRI, *TRI);
5241 if (LastRegOpIdx == 3) {
5242 // Sequential form: all operands combined into VGPR256/VGPR512
5243 unsigned Size = MRI.getType(MI.getOperand(2).getReg()).getSizeInBits();
5244 if (Size > 256)
5245 Size = 512;
5246 OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
5247 } else {
5248 // NSA form
5249 unsigned FirstSrcOpIdx = IsDualOrBVH8 ? 4 : 2;
5250 for (unsigned I = FirstSrcOpIdx; I < LastRegOpIdx; ++I) {
5251 unsigned Size = MRI.getType(MI.getOperand(I).getReg()).getSizeInBits();
5252 OpdsMapping[I] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
5253 }
5254 }
5255 break;
5256 }
5257 case AMDGPU::G_INTRINSIC_W_SIDE_EFFECTS:
5258 case AMDGPU::G_INTRINSIC_CONVERGENT_W_SIDE_EFFECTS: {
5259 auto IntrID = cast<GIntrinsic>(MI).getIntrinsicID();
5260 switch (IntrID) {
5261 case Intrinsic::amdgcn_s_getreg:
5262 case Intrinsic::amdgcn_s_memtime:
5263 case Intrinsic::amdgcn_s_memrealtime:
5264 case Intrinsic::amdgcn_s_get_waveid_in_workgroup:
5265 case Intrinsic::amdgcn_s_sendmsg_rtn: {
5266 unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
5267 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size);
5268 break;
5269 }
5270 case Intrinsic::amdgcn_global_atomic_csub:
5271 case Intrinsic::amdgcn_global_atomic_fmin_num:
5272 case Intrinsic::amdgcn_global_atomic_fmax_num:
5273 case Intrinsic::amdgcn_flat_atomic_fmin_num:
5274 case Intrinsic::amdgcn_flat_atomic_fmax_num:
5275 case Intrinsic::amdgcn_atomic_cond_sub_u32:
5276 case Intrinsic::amdgcn_global_atomic_ordered_add_b64:
5277 case Intrinsic::amdgcn_global_load_tr_b64:
5278 case Intrinsic::amdgcn_global_load_tr_b128:
5279 case Intrinsic::amdgcn_global_load_tr4_b64:
5280 case Intrinsic::amdgcn_global_load_tr6_b96:
5281 case Intrinsic::amdgcn_ds_load_tr8_b64:
5282 case Intrinsic::amdgcn_ds_load_tr16_b128:
5283 case Intrinsic::amdgcn_ds_load_tr4_b64:
5284 case Intrinsic::amdgcn_ds_load_tr6_b96:
5285 case Intrinsic::amdgcn_flat_load_monitor_b32:
5286 case Intrinsic::amdgcn_flat_load_monitor_b64:
5287 case Intrinsic::amdgcn_flat_load_monitor_b128:
5288 case Intrinsic::amdgcn_global_load_monitor_b32:
5289 case Intrinsic::amdgcn_global_load_monitor_b64:
5290 case Intrinsic::amdgcn_global_load_monitor_b128:
5291 case Intrinsic::amdgcn_ds_read_tr4_b64:
5292 case Intrinsic::amdgcn_ds_read_tr6_b96:
5293 case Intrinsic::amdgcn_ds_read_tr8_b64:
5294 case Intrinsic::amdgcn_ds_read_tr16_b64:
5295 case Intrinsic::amdgcn_ds_atomic_async_barrier_arrive_b64:
5296 case Intrinsic::amdgcn_ds_atomic_barrier_arrive_rtn_b64:
5298 case Intrinsic::amdgcn_ds_ordered_add:
5299 case Intrinsic::amdgcn_ds_ordered_swap: {
5300 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
5301 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, DstSize);
5302 unsigned M0Bank = getRegBankID(MI.getOperand(2).getReg(), MRI,
5303 AMDGPU::SGPRRegBankID);
5304 OpdsMapping[2] = AMDGPU::getValueMapping(M0Bank, 32);
5305 OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5306 break;
5307 }
5308 case Intrinsic::amdgcn_ds_append:
5309 case Intrinsic::amdgcn_ds_consume: {
5310 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
5311 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, DstSize);
5312 OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5313 break;
5314 }
5315 case Intrinsic::amdgcn_exp_compr:
5316 OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5317 OpdsMapping[4] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5318 break;
5319 case Intrinsic::amdgcn_exp:
5320 // FIXME: Could we support packed types here?
5321 OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5322 OpdsMapping[4] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5323 OpdsMapping[5] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5324 OpdsMapping[6] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5325 break;
5326 case Intrinsic::amdgcn_exp_row:
5327 OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5328 OpdsMapping[4] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5329 OpdsMapping[5] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5330 OpdsMapping[6] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5331 OpdsMapping[8] = getSGPROpMapping(MI.getOperand(8).getReg(), MRI, *TRI);
5332 break;
5333 case Intrinsic::amdgcn_s_sendmsg:
5334 case Intrinsic::amdgcn_s_sendmsghalt: {
5335 // This must be an SGPR, but accept a VGPR.
5336 unsigned Bank = getRegBankID(MI.getOperand(2).getReg(), MRI,
5337 AMDGPU::SGPRRegBankID);
5338 OpdsMapping[2] = AMDGPU::getValueMapping(Bank, 32);
5339 break;
5340 }
5341 case Intrinsic::amdgcn_s_setreg: {
5342 // This must be an SGPR, but accept a VGPR.
5343 unsigned Bank = getRegBankID(MI.getOperand(2).getReg(), MRI,
5344 AMDGPU::SGPRRegBankID);
5345 OpdsMapping[2] = AMDGPU::getValueMapping(Bank, 32);
5346 break;
5347 }
5348 case Intrinsic::amdgcn_s_ttracedata: {
5349 // This must be an SGPR, but accept a VGPR.
5350 unsigned Bank =
5351 getRegBankID(MI.getOperand(1).getReg(), MRI, AMDGPU::SGPRRegBankID);
5352 OpdsMapping[1] = AMDGPU::getValueMapping(Bank, 32);
5353 break;
5354 }
5355 case Intrinsic::amdgcn_end_cf: {
5356 unsigned Size = getSizeInBits(MI.getOperand(1).getReg(), MRI, *TRI);
5357 OpdsMapping[1] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size);
5358 break;
5359 }
5360 case Intrinsic::amdgcn_else: {
5361 unsigned WaveSize = getSizeInBits(MI.getOperand(1).getReg(), MRI, *TRI);
5362 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1);
5363 OpdsMapping[1] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, WaveSize);
5364 OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, WaveSize);
5365 break;
5366 }
5367 case Intrinsic::amdgcn_init_whole_wave:
5368 case Intrinsic::amdgcn_live_mask: {
5369 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1);
5370 break;
5371 }
5372 case Intrinsic::amdgcn_wqm_demote:
5373 case Intrinsic::amdgcn_kill: {
5374 OpdsMapping[1] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1);
5375 break;
5376 }
5377 case Intrinsic::amdgcn_raw_buffer_load:
5378 case Intrinsic::amdgcn_raw_ptr_buffer_load:
5379 case Intrinsic::amdgcn_raw_atomic_buffer_load:
5380 case Intrinsic::amdgcn_raw_ptr_atomic_buffer_load:
5381 case Intrinsic::amdgcn_raw_tbuffer_load:
5382 case Intrinsic::amdgcn_raw_ptr_tbuffer_load: {
5383 // FIXME: Should make intrinsic ID the last operand of the instruction,
5384 // then this would be the same as store
5385 OpdsMapping[0] = getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
5386 OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5387 OpdsMapping[3] = getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
5388 OpdsMapping[4] = getSGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
5389 break;
5390 }
5391 case Intrinsic::amdgcn_raw_buffer_load_lds:
5392 case Intrinsic::amdgcn_raw_ptr_buffer_load_lds: {
5393 OpdsMapping[1] = getSGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
5394 OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5395 OpdsMapping[4] = getVGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
5396 OpdsMapping[5] = getSGPROpMapping(MI.getOperand(5).getReg(), MRI, *TRI);
5397 break;
5398 }
5399 case Intrinsic::amdgcn_raw_buffer_store:
5400 case Intrinsic::amdgcn_raw_ptr_buffer_store:
5401 case Intrinsic::amdgcn_raw_buffer_store_format:
5402 case Intrinsic::amdgcn_raw_ptr_buffer_store_format:
5403 case Intrinsic::amdgcn_raw_tbuffer_store:
5404 case Intrinsic::amdgcn_raw_ptr_tbuffer_store: {
5405 OpdsMapping[1] = getVGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
5406 OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5407 OpdsMapping[3] = getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
5408 OpdsMapping[4] = getSGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
5409 break;
5410 }
5411 case Intrinsic::amdgcn_struct_buffer_load:
5412 case Intrinsic::amdgcn_struct_ptr_buffer_load:
5413 case Intrinsic::amdgcn_struct_tbuffer_load:
5414 case Intrinsic::amdgcn_struct_ptr_tbuffer_load:
5415 case Intrinsic::amdgcn_struct_atomic_buffer_load:
5416 case Intrinsic::amdgcn_struct_ptr_atomic_buffer_load: {
5417 OpdsMapping[0] = getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
5418 OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5419 OpdsMapping[3] = getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
5420 OpdsMapping[4] = getVGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
5421 OpdsMapping[5] = getSGPROpMapping(MI.getOperand(5).getReg(), MRI, *TRI);
5422 break;
5423 }
5424 case Intrinsic::amdgcn_struct_buffer_load_lds:
5425 case Intrinsic::amdgcn_struct_ptr_buffer_load_lds: {
5426 OpdsMapping[1] = getSGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
5427 OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5428 OpdsMapping[4] = getVGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
5429 OpdsMapping[5] = getVGPROpMapping(MI.getOperand(5).getReg(), MRI, *TRI);
5430 OpdsMapping[6] = getSGPROpMapping(MI.getOperand(6).getReg(), MRI, *TRI);
5431 break;
5432 }
5433 case Intrinsic::amdgcn_struct_buffer_store:
5434 case Intrinsic::amdgcn_struct_ptr_buffer_store:
5435 case Intrinsic::amdgcn_struct_tbuffer_store:
5436 case Intrinsic::amdgcn_struct_ptr_tbuffer_store: {
5437 OpdsMapping[1] = getVGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
5438 OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5439 OpdsMapping[3] = getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
5440 OpdsMapping[4] = getVGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
5441 OpdsMapping[5] = getSGPROpMapping(MI.getOperand(5).getReg(), MRI, *TRI);
5442 break;
5443 }
5444 case Intrinsic::amdgcn_init_exec_from_input: {
5445 unsigned Size = getSizeInBits(MI.getOperand(1).getReg(), MRI, *TRI);
5446 OpdsMapping[1] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size);
5447 break;
5448 }
5449 case Intrinsic::amdgcn_ds_gws_init:
5450 case Intrinsic::amdgcn_ds_gws_barrier:
5451 case Intrinsic::amdgcn_ds_gws_sema_br: {
5452 OpdsMapping[1] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5453
5454 // This must be an SGPR, but accept a VGPR.
5455 unsigned Bank = getRegBankID(MI.getOperand(2).getReg(), MRI,
5456 AMDGPU::SGPRRegBankID);
5457 OpdsMapping[2] = AMDGPU::getValueMapping(Bank, 32);
5458 break;
5459 }
5460 case Intrinsic::amdgcn_ds_gws_sema_v:
5461 case Intrinsic::amdgcn_ds_gws_sema_p:
5462 case Intrinsic::amdgcn_ds_gws_sema_release_all: {
5463 // This must be an SGPR, but accept a VGPR.
5464 unsigned Bank = getRegBankID(MI.getOperand(1).getReg(), MRI,
5465 AMDGPU::SGPRRegBankID);
5466 OpdsMapping[1] = AMDGPU::getValueMapping(Bank, 32);
5467 break;
5468 }
5469 case Intrinsic::amdgcn_global_store_async_from_lds_b8:
5470 case Intrinsic::amdgcn_global_store_async_from_lds_b32:
5471 case Intrinsic::amdgcn_global_store_async_from_lds_b64:
5472 case Intrinsic::amdgcn_global_store_async_from_lds_b128:
5473 case Intrinsic::amdgcn_global_load_async_to_lds_b8:
5474 case Intrinsic::amdgcn_global_load_async_to_lds_b32:
5475 case Intrinsic::amdgcn_global_load_async_to_lds_b64:
5476 case Intrinsic::amdgcn_global_load_async_to_lds_b128:
5477 case Intrinsic::amdgcn_load_to_lds:
5478 case Intrinsic::amdgcn_global_load_lds: {
5479 OpdsMapping[1] = getVGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
5480 OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5481 break;
5482 }
5483 case Intrinsic::amdgcn_lds_direct_load: {
5484 const int M0Idx = MI.getNumOperands() - 1;
5485 Register M0Reg = MI.getOperand(M0Idx).getReg();
5486 unsigned M0Bank = getRegBankID(M0Reg, MRI, AMDGPU::SGPRRegBankID);
5487 unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
5488
5489 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, DstSize);
5490 for (int I = 2; I != M0Idx && MI.getOperand(I).isReg(); ++I)
5491 OpdsMapping[I] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
5492
5493 // Must be SGPR, but we must take whatever the original bank is and fix it
5494 // later.
5495 OpdsMapping[M0Idx] = AMDGPU::getValueMapping(M0Bank, 32);
5496 break;
5497 }
5498 case Intrinsic::amdgcn_ds_add_gs_reg_rtn:
5499 case Intrinsic::amdgcn_ds_sub_gs_reg_rtn:
5500 OpdsMapping[0] = getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
5501 OpdsMapping[2] = getVGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5502 break;
5503 case Intrinsic::amdgcn_ds_bvh_stack_rtn:
5504 case Intrinsic::amdgcn_ds_bvh_stack_push4_pop1_rtn:
5505 case Intrinsic::amdgcn_ds_bvh_stack_push8_pop1_rtn:
5506 case Intrinsic::amdgcn_ds_bvh_stack_push8_pop2_rtn: {
5507 OpdsMapping[0] =
5508 getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI); // %vdst
5509 OpdsMapping[1] =
5510 getVGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI); // %addr
5511 OpdsMapping[3] =
5512 getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI); // %addr
5513 OpdsMapping[4] =
5514 getVGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI); // %data0
5515 OpdsMapping[5] =
5516 getVGPROpMapping(MI.getOperand(5).getReg(), MRI, *TRI); // %data1
5517 break;
5518 }
5519 case Intrinsic::amdgcn_s_sleep_var:
5520 OpdsMapping[1] = getSGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
5521 break;
5522 case Intrinsic::amdgcn_s_barrier_join:
5523 OpdsMapping[1] = getSGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
5524 break;
5525 case Intrinsic::amdgcn_s_barrier_init:
5526 case Intrinsic::amdgcn_s_barrier_signal_var:
5527 OpdsMapping[1] = getSGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
5528 OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5529 break;
5530 case Intrinsic::amdgcn_s_barrier_signal_isfirst: {
5531 const unsigned ResultSize = 1;
5532 OpdsMapping[0] =
5533 AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, ResultSize);
5534 break;
5535 }
5536 case Intrinsic::amdgcn_s_get_barrier_state:
5537 case Intrinsic::amdgcn_s_get_named_barrier_state: {
5538 OpdsMapping[0] = getSGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
5539 OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5540 break;
5541 }
5542 case Intrinsic::amdgcn_pops_exiting_wave_id:
5543 return getDefaultMappingSOP(MI);
5544 case Intrinsic::amdgcn_tensor_load_to_lds_d2:
5545 case Intrinsic::amdgcn_tensor_store_from_lds_d2:
5546 case Intrinsic::amdgcn_tensor_load_to_lds:
5547 case Intrinsic::amdgcn_tensor_store_from_lds: {
5548       // Lie and claim everything is legal, even though all operands need
5549       // to be SGPRs. applyMapping will have to deal with it with
       // readfirstlane.
5550 for (unsigned I = 1; I < MI.getNumOperands(); ++I) {
5551 if (MI.getOperand(I).isReg()) {
5552 Register Reg = MI.getOperand(I).getReg();
5553 auto OpBank = getRegBankID(Reg, MRI);
5554 unsigned Size = getSizeInBits(Reg, MRI, *TRI);
5555 OpdsMapping[I] = AMDGPU::getValueMapping(OpBank, Size);
5556 }
5557 }
5558 break;
5559 }
5560 case Intrinsic::amdgcn_s_prefetch_data: {
5561 OpdsMapping[1] = getSGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
5562 OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5563 break;
5564 }
5565 case Intrinsic::amdgcn_flat_prefetch:
5566 case Intrinsic::amdgcn_global_prefetch:
5567 return getDefaultMappingVOP(MI);
5568 default:
5570 }
5571 break;
5572 }
5573 case AMDGPU::G_SELECT: {
5574 unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
5575 unsigned Op2Bank = getRegBankID(MI.getOperand(2).getReg(), MRI,
5576 AMDGPU::SGPRRegBankID);
5577 unsigned Op3Bank = getRegBankID(MI.getOperand(3).getReg(), MRI,
5578 AMDGPU::SGPRRegBankID);
5579 bool SGPRSrcs = Op2Bank == AMDGPU::SGPRRegBankID &&
5580 Op3Bank == AMDGPU::SGPRRegBankID;
5581
5582 unsigned CondBankDefault = SGPRSrcs ?
5583 AMDGPU::SGPRRegBankID : AMDGPU::VCCRegBankID;
5584 unsigned CondBank = getRegBankID(MI.getOperand(1).getReg(), MRI,
5585 CondBankDefault);
5586 if (CondBank == AMDGPU::SGPRRegBankID)
5587 CondBank = SGPRSrcs ? AMDGPU::SGPRRegBankID : AMDGPU::VCCRegBankID;
5588 else if (CondBank == AMDGPU::VGPRRegBankID)
5589 CondBank = AMDGPU::VCCRegBankID;
5590
5591 unsigned Bank = SGPRSrcs && CondBank == AMDGPU::SGPRRegBankID ?
5592 AMDGPU::SGPRRegBankID : AMDGPU::VGPRRegBankID;
5593
5594 assert(CondBank == AMDGPU::VCCRegBankID || CondBank == AMDGPU::SGPRRegBankID);
5595
5596 // TODO: Should report 32-bit for scalar condition type.
5597 if (Size == 64) {
5598 OpdsMapping[0] = AMDGPU::getValueMappingSGPR64Only(Bank, Size);
5599 OpdsMapping[1] = AMDGPU::getValueMapping(CondBank, 1);
5600 OpdsMapping[2] = AMDGPU::getValueMappingSGPR64Only(Bank, Size);
5601 OpdsMapping[3] = AMDGPU::getValueMappingSGPR64Only(Bank, Size);
5602 } else {
5603 OpdsMapping[0] = AMDGPU::getValueMapping(Bank, Size);
5604 OpdsMapping[1] = AMDGPU::getValueMapping(CondBank, 1);
5605 OpdsMapping[2] = AMDGPU::getValueMapping(Bank, Size);
5606 OpdsMapping[3] = AMDGPU::getValueMapping(Bank, Size);
5607 }
5608
5609 break;
5610 }
5611
5612 case AMDGPU::G_SI_CALL: {
5613 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 64);
5614 // Lie and claim everything is legal, even though some need to be
5615 // SGPRs. applyMapping will have to deal with it as a waterfall loop.
5616 OpdsMapping[1] = getSGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
5617
5618 // Allow anything for implicit arguments
5619 for (unsigned I = 4; I < MI.getNumOperands(); ++I) {
5620 if (MI.getOperand(I).isReg()) {
5621 Register Reg = MI.getOperand(I).getReg();
5622 auto OpBank = getRegBankID(Reg, MRI);
5623 unsigned Size = getSizeInBits(Reg, MRI, *TRI);
5624 OpdsMapping[I] = AMDGPU::getValueMapping(OpBank, Size);
5625 }
5626 }
5627 break;
5628 }
5629 case AMDGPU::G_LOAD:
5630 case AMDGPU::G_ZEXTLOAD:
5631 case AMDGPU::G_SEXTLOAD:
5632 return getInstrMappingForLoad(MI);
5633
5634 case AMDGPU::G_ATOMICRMW_XCHG:
5635 case AMDGPU::G_ATOMICRMW_ADD:
5636 case AMDGPU::G_ATOMICRMW_SUB:
5637 case AMDGPU::G_ATOMICRMW_AND:
5638 case AMDGPU::G_ATOMICRMW_OR:
5639 case AMDGPU::G_ATOMICRMW_XOR:
5640 case AMDGPU::G_ATOMICRMW_MAX:
5641 case AMDGPU::G_ATOMICRMW_MIN:
5642 case AMDGPU::G_ATOMICRMW_UMAX:
5643 case AMDGPU::G_ATOMICRMW_UMIN:
5644 case AMDGPU::G_ATOMICRMW_FADD:
5645 case AMDGPU::G_ATOMICRMW_FMIN:
5646 case AMDGPU::G_ATOMICRMW_FMAX:
5647 case AMDGPU::G_ATOMICRMW_UINC_WRAP:
5648 case AMDGPU::G_ATOMICRMW_UDEC_WRAP:
5649 case AMDGPU::G_AMDGPU_ATOMIC_CMPXCHG: {
5650 OpdsMapping[0] = getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
5651 OpdsMapping[1] = getValueMappingForPtr(MRI, MI.getOperand(1).getReg());
5652 OpdsMapping[2] = getVGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5653 break;
5654 }
5655 case AMDGPU::G_ATOMIC_CMPXCHG: {
5656 OpdsMapping[0] = getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
5657 OpdsMapping[1] = getValueMappingForPtr(MRI, MI.getOperand(1).getReg());
5658 OpdsMapping[2] = getVGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
5659 OpdsMapping[3] = getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
5660 break;
5661 }
5662 case AMDGPU::G_BRCOND: {
5663 unsigned Bank = getRegBankID(MI.getOperand(0).getReg(), MRI,
5664 AMDGPU::SGPRRegBankID);
5665 assert(MRI.getType(MI.getOperand(0).getReg()).getSizeInBits() == 1);
5666 if (Bank != AMDGPU::SGPRRegBankID)
5667 Bank = AMDGPU::VCCRegBankID;
5668
5669 OpdsMapping[0] = AMDGPU::getValueMapping(Bank, 1);
5670 break;
5671 }
5672 case AMDGPU::G_INTRINSIC_FPTRUNC_ROUND:
5673 return getDefaultMappingVOP(MI);
5674 case AMDGPU::G_PREFETCH:
5675 OpdsMapping[0] = getSGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
5676 break;
5677 case AMDGPU::G_AMDGPU_WHOLE_WAVE_FUNC_SETUP:
5678 case AMDGPU::G_AMDGPU_WHOLE_WAVE_FUNC_RETURN:
5679 OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1);
5680 break;
5681 }
5682
5683 return getInstructionMapping(/*ID*/1, /*Cost*/1,
5684 getOperandsMapping(OpdsMapping),
5685 MI.getNumOperands());
5686}
constexpr LLT divide(int Factor) const
Return a type that is Factor times smaller.
Definition: LowLevelType.h:235
This is an important class for using LLVM in a threaded context.
Definition: LLVMContext.h:68
LLVM_ABI void widenScalarSrc(MachineInstr &MI, LLT WideTy, unsigned OpIdx, unsigned ExtOpcode)
Legalize a single operand OpIdx of the machine instruction MI as a Use by extending the operand's typ...
LLVM_ABI LegalizeResult lowerAbsToMaxNeg(MachineInstr &MI)
LLVM_ABI LegalizeResult narrowScalar(MachineInstr &MI, unsigned TypeIdx, LLT NarrowTy)
Legalize an instruction by reducing the width of the underlying scalar type.
LLVM_ABI LegalizeResult reduceLoadStoreWidth(GLoadStore &MI, unsigned TypeIdx, LLT NarrowTy)
@ Legalized
Instruction has been legalized and the MachineFunction changed.
LLVM_ABI LegalizeResult fewerElementsVector(MachineInstr &MI, unsigned TypeIdx, LLT NarrowTy)
Legalize a vector instruction by splitting into multiple components, each acting on the same scalar t...
LLVM_ABI LegalizeResult widenScalar(MachineInstr &MI, unsigned TypeIdx, LLT WideTy)
Legalize an instruction by performing the operation on a wider scalar type (for example a 16-bit addi...
LLVM_ABI void widenScalarDst(MachineInstr &MI, LLT WideTy, unsigned OpIdx=0, unsigned TruncOpcode=TargetOpcode::G_TRUNC)
Legalize a single operand OpIdx of the machine instruction MI as a Def by extending the operand's typ...
TypeSize getValue() const
LLVM_ABI void transferSuccessorsAndUpdatePHIs(MachineBasicBlock *FromMBB)
Transfers all the successors, as in transferSuccessors, and update PHI operands in the successor bloc...
LLVM_ABI iterator getFirstTerminator()
Returns an iterator to the first terminator instruction of this basic block.
LLVM_ABI void addSuccessor(MachineBasicBlock *Succ, BranchProbability Prob=BranchProbability::getUnknown())
Add Succ as a successor of this MachineBasicBlock.
const MachineFunction * getParent() const
Return the MachineFunction containing this basic block.
void splice(iterator Where, MachineBasicBlock *Other, iterator From)
Take an instruction from MBB 'Other' at the position From, and insert it into this MBB right before '...
const TargetSubtargetInfo & getSubtarget() const
getSubtarget - Return the subtarget for which this machine code is being compiled.
MachineMemOperand * getMachineMemOperand(MachinePointerInfo PtrInfo, MachineMemOperand::Flags f, LLT MemTy, Align base_alignment, const AAMDNodes &AAInfo=AAMDNodes(), const MDNode *Ranges=nullptr, SyncScope::ID SSID=SyncScope::System, AtomicOrdering Ordering=AtomicOrdering::NotAtomic, AtomicOrdering FailureOrdering=AtomicOrdering::NotAtomic)
getMachineMemOperand - Allocate a new MachineMemOperand.
MachineRegisterInfo & getRegInfo()
getRegInfo - Return information about the registers currently in use.
Ty * getInfo()
getInfo - Keep track of various per-function pieces of information for backends that would like to do...
MachineBasicBlock * CreateMachineBasicBlock(const BasicBlock *BB=nullptr, std::optional< UniqueBBID > BBID=std::nullopt)
CreateMachineBasicBlock - Allocate a new MachineBasicBlock.
void insert(iterator MBBI, MachineBasicBlock *MBB)
Helper class to build MachineInstr.
const MachineInstrBuilder & addReg(Register RegNo, unsigned flags=0, unsigned SubReg=0) const
Add a new virtual register operand.
MachineInstrSpan provides an interface to get an iteration range containing the instruction it was in...
MachineBasicBlock::iterator begin()
MachineBasicBlock::iterator end()
Representation of each machine instruction.
Definition: MachineInstr.h:72
const MachineBasicBlock * getParent() const
Definition: MachineInstr.h:359
const MachineOperand & getOperand(unsigned i) const
Definition: MachineInstr.h:595
A description of a memory reference used in the backend.
LocationSize getSize() const
Return the size in bytes of the memory reference.
unsigned getAddrSpace() const
bool isAtomic() const
Returns true if this operation has an atomic ordering requirement of unordered or higher,...
@ MODereferenceable
The memory access is dereferenceable (i.e., doesn't trap).
@ MOLoad
The memory access reads data.
@ MOInvariant
The memory access always returns the same value (or traps).
Flags getFlags() const
Return the raw flags of the source value.
LLVM_ABI Align getAlign() const
Return the minimum known alignment in bytes of the actual memory reference.
MachineOperand class - Representation of each machine instruction operand.
LLVM_ABI void setReg(Register Reg)
Change the register this operand corresponds to.
Register getReg() const
getReg - Returns the register number.
MachineRegisterInfo - Keep track of information for virtual and physical registers,...
Helper class that represents how the value of an instruction may be mapped and what is the related co...
bool isValid() const
Check whether this object is valid.
Helper class used to get/create the virtual registers that will be used to replace the MachineOperand...
const InstructionMapping & getInstrMapping() const
The final mapping of the instruction.
MachineRegisterInfo & getMRI() const
The MachineRegisterInfo we used to realize the mapping.
iterator_range< SmallVectorImpl< Register >::const_iterator > getVRegs(unsigned OpIdx, bool ForDebug=false) const
Get all the virtual registers required to map the OpIdx-th operand of the instruction.
virtual InstructionMappings getInstrAlternativeMappings(const MachineInstr &MI) const
Get the alternative mappings for MI.
static const TargetRegisterClass * constrainGenericRegister(Register Reg, const TargetRegisterClass &RC, MachineRegisterInfo &MRI)
Constrain the (possibly generic) virtual register Reg to RC.
const InstructionMapping & getInstructionMapping(unsigned ID, unsigned Cost, const ValueMapping *OperandsMapping, unsigned NumOperands) const
Method to get a uniquely generated InstructionMapping.
static void applyDefaultMapping(const OperandsMapper &OpdMapper)
Helper method to apply something that is like the default mapping.
const ValueMapping & getValueMapping(unsigned StartIdx, unsigned Length, const RegisterBank &RegBank) const
The most common ValueMapping consists of a single PartialMapping.
const InstructionMapping & getInvalidInstructionMapping() const
Method to get a uniquely generated invalid InstructionMapping.
const RegisterBank & getRegBank(unsigned ID)
Get the register bank identified by ID.
const unsigned * Sizes
Hold the sizes of the register banks for all HwModes.
bool cannotCopy(const RegisterBank &Dst, const RegisterBank &Src, TypeSize Size) const
TypeSize getSizeInBits(Register Reg, const MachineRegisterInfo &MRI, const TargetRegisterInfo &TRI) const
Get the size in bits of Reg.
const ValueMapping * getOperandsMapping(Iterator Begin, Iterator End) const
Get the uniquely generated array of ValueMapping for the elements between Begin and End.
virtual unsigned copyCost(const RegisterBank &A, const RegisterBank &B, TypeSize Size) const
Get the cost of a copy from B to A, or put differently, get the cost of A = COPY B.
const InstructionMapping & getInstrMappingImpl(const MachineInstr &MI) const
Try to get the mapping of MI.
This class implements the register bank concept.
Definition: RegisterBank.h:29
unsigned getID() const
Get the identifier of this register bank.
Definition: RegisterBank.h:46
Wrapper class representing virtual and physical registers.
Definition: Register.h:19
constexpr bool isVirtual() const
Return true if the specified register number is in the virtual register namespace.
Definition: Register.h:74
bool splitMUBUFOffset(uint32_t Imm, uint32_t &SOffset, uint32_t &ImmOffset, Align Alignment=Align(4)) const
static unsigned getMaxMUBUFImmOffset(const GCNSubtarget &ST)
This class keeps track of the SPI_SP_INPUT_ADDR config register, which tells the hardware which inter...
const TargetRegisterClass * getWaveMaskRegClass() const
static bool isSGPRClass(const TargetRegisterClass *RC)
static bool isAGPRClass(const TargetRegisterClass *RC)
static bool shouldExpandVectorDynExt(unsigned EltSize, unsigned NumElem, bool IsDivergentIdx, const GCNSubtarget *Subtarget)
Check if EXTRACT_VECTOR_ELT/INSERT_VECTOR_ELT (<n x e>, var-idx) should be expanded into a set of cmp...
SmallSet - This maintains a set of unique values, optimizing for the case when the set is small (less...
Definition: SmallSet.h:134
size_type count(const T &V) const
count - Return 1 if the element is in the set, 0 otherwise.
Definition: SmallSet.h:176
bool empty() const
Definition: SmallSet.h:169
std::pair< const_iterator, bool > insert(const T &V)
insert - Insert an element into the set if it isn't already there.
Definition: SmallSet.h:182
bool empty() const
Definition: SmallVector.h:82
size_t size() const
Definition: SmallVector.h:79
void resize(size_type N)
Definition: SmallVector.h:639
void push_back(const T &Elt)
Definition: SmallVector.h:414
This is a 'vector' (really, a variable-sized array), optimized for the case when the array is small.
Definition: SmallVector.h:1197
Register getReg() const
TargetRegisterInfo base class - We assume that the target defines a static array of TargetRegisterDes...
static constexpr TypeSize getFixed(ScalarTy ExactSize)
Definition: TypeSize.h:346
static LLVM_ABI IntegerType * getInt32Ty(LLVMContext &C)
constexpr bool isKnownMultipleOf(ScalarTy RHS) const
This function tells the caller whether the element count is known at compile time to be a multiple of...
Definition: TypeSize.h:184
constexpr LeafTy divideCoefficientBy(ScalarTy RHS) const
We do not provide the '/' operator here because division for polynomial types does not work in the sa...
Definition: TypeSize.h:255
self_iterator getIterator()
Definition: ilist_node.h:134
A range adaptor for a pair of iterators.
#define llvm_unreachable(msg)
Marks that the current location is not supposed to be reachable.
@ CONSTANT_ADDRESS_32BIT
Address space for 32-bit constant memory.
@ REGION_ADDRESS
Address space for region memory. (GDS)
@ LOCAL_ADDRESS
Address space for local memory.
@ CONSTANT_ADDRESS
Address space for constant memory (VTX2).
@ PRIVATE_ADDRESS
Address space for private memory.
@ BUFFER_RESOURCE
Address space for 128-bit buffer resources.
bool isFlatGlobalAddrSpace(unsigned AS)
bool isUniformMMO(const MachineMemOperand *MMO)
bool isExtendedGlobalAddrSpace(unsigned AS)
Intrinsic::ID getIntrinsicID(const MachineInstr &I)
Return the intrinsic ID for opcodes with the G_AMDGPU_INTRIN_ prefix.
std::pair< Register, unsigned > getBaseWithConstantOffset(MachineRegisterInfo &MRI, Register Reg, GISelValueTracking *ValueTracking=nullptr, bool CheckNUW=false)
Returns base register and constant offset.
const RsrcIntrinsic * lookupRsrcIntrinsic(unsigned Intr)
operand_type_match m_Reg()
SpecificConstantOrSplatMatch m_SpecificICstOrSplat(APInt RequestedValue)
Matches a RequestedValue constant or a constant splat of RequestedValue.
SpecificConstantMatch m_ZeroInt()
Convenience matchers for specific integer values.
ConstantMatch< APInt > m_ICst(APInt &Cst)
BinaryOp_match< LHS, RHS, TargetOpcode::G_ADD, true > m_GAdd(const LHS &L, const RHS &R)
bool mi_match(Reg R, const MachineRegisterInfo &MRI, Pattern &&P)
@ Kill
The last use of a register.
This is an optimization pass for GlobalISel generic memory operations.
Definition: AddressRanges.h:18
@ Offset
Definition: DWP.cpp:477
LLVM_ABI MachineInstr * getOpcodeDef(unsigned Opcode, Register Reg, const MachineRegisterInfo &MRI)
See if Reg is defined by a single def instruction that is Opcode.
Definition: Utils.cpp:651
MachineInstrBuilder BuildMI(MachineFunction &MF, const MIMetadata &MIMD, const MCInstrDesc &MCID)
Builder interface. Specify how to create the initial instruction itself.
LLVM_ABI bool constrainSelectedInstRegOperands(MachineInstr &I, const TargetInstrInfo &TII, const TargetRegisterInfo &TRI, const RegisterBankInfo &RBI)
Mutate the newly-selected instruction I to constrain its (possibly generic) virtual register operands...
Definition: Utils.cpp:155
iterator_range< T > make_range(T x, T y)
Convenience function for iterating over sub-ranges.
LLVM_ABI std::optional< int64_t > getIConstantVRegSExtVal(Register VReg, const MachineRegisterInfo &MRI)
If VReg is defined by a G_CONSTANT that fits in int64_t, returns it.
Definition: Utils.cpp:314
static const MachineMemOperand::Flags MONoClobber
Mark the MMO of a uniform load if there are no potentially clobbering stores on any path from the sta...
Definition: SIInstrInfo.h:44
auto reverse(ContainerTy &&C)
Definition: STLExtras.h:428
@ Add
Sum of integers.
void call_once(once_flag &flag, Function &&F, Args &&... ArgList)
Execute the function specified as a parameter once.
Definition: Threading.h:86
LLVM_ABI std::optional< ValueAndVReg > getIConstantVRegValWithLookThrough(Register VReg, const MachineRegisterInfo &MRI, bool LookThroughInstrs=true)
If VReg is defined by a statically evaluable chain of instructions rooted on a G_CONSTANT returns its...
Definition: Utils.cpp:433
Align assumeAligned(uint64_t Value)
Treats the value 0 as a 1, so Align is always at least 1.
Definition: Alignment.h:111
unsigned Log2(Align A)
Returns the log2 of the alignment.
Definition: Alignment.h:208
LLVM_ABI Register getSrcRegIgnoringCopies(Register Reg, const MachineRegisterInfo &MRI)
Find the source register for Reg, folding away any trivial copies.
Definition: Utils.cpp:499
@ Default
The result values are uniform if and only if all operands are uniform.
#define N
This struct is a compact representation of a valid (non-zero power of two) alignment.
Definition: Alignment.h:39
uint64_t value() const
This is a hole in the type system and should not be abused.
Definition: Alignment.h:85
This class contains a discriminated union of information about pointers in memory operands,...
unsigned StartIdx
Number of bits at which this partial mapping starts in the original value.
const RegisterBank * RegBank
Register bank where the partial value lives.
unsigned Length
Length of this mapping in bits.
Helper struct that represents how a value is mapped through different register banks.
unsigned NumBreakDowns
Number of partial mappings used to break down this value.
const PartialMapping * BreakDown
How the value is broken down between the different register banks.
The llvm::once_flag structure.
Definition: Threading.h:67