1//===-- AMDGPULowerModuleLDSPass.cpp ------------------------------*- C++ -*-=//
2//
3// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
4// See https://llvm.org/LICENSE.txt for license information.
5// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
6//
7//===----------------------------------------------------------------------===//
8//
9// This pass eliminates local data store, LDS, uses from non-kernel functions.
10// LDS is contiguous memory allocated per kernel execution.
11//
12// Background.
13//
14// The programming model is global variables, or equivalently function local
15// static variables, accessible from kernels or other functions. For uses from
16// kernels this is straightforward - assign an integer to the kernel for the
17// memory required by all the variables combined, allocate them within that.
18// For uses from functions there are performance tradeoffs to choose between.
19//
20// This model means the GPU runtime can specify the amount of memory allocated.
21// If this is more than the kernel assumed, the excess can be made available
22// using a language specific feature, which IR represents as a variable with
23// no initializer. This feature is referred to here as "Dynamic LDS" and is
24// lowered slightly differently to the normal case.
25//
26// Consequences of this GPU feature:
27// - memory is limited and exceeding it halts compilation
28// - a global accessed by one kernel exists independent of other kernels
29// - a global exists independent of simultaneous execution of the same kernel
30// - the address of the global may be different from different kernels as they
31// do not alias, which permits only allocating variables they use
32// - if the address is allowed to differ, functions need help to find it
33//
34// Uses from kernels are implemented here by grouping them in a per-kernel
35// struct instance. This duplicates the variables, accurately modelling their
36// aliasing properties relative to a single global representation. It also
37// permits control over alignment via padding.
38//
39// Uses from functions are more complicated and the primary purpose of this
40// IR pass. Several different lowerings are chosen between to meet requirements
41// to avoid allocating any LDS where it is not necessary, as that impacts
42// occupancy and may fail the compilation, while not imposing overhead on a
43// feature whose primary advantage over global memory is performance. The basic
44// design goal is to avoid one kernel imposing overhead on another.
45//
46// Implementation.
47//
48// LDS variables with constant annotation or non-undef initializer are passed
49// through unchanged for simplification or error diagnostics in later passes.
50// Non-undef initializers are not yet implemented for LDS.
51//
52// LDS variables that are always allocated at the same address can be found
53// by lookup at that address. Otherwise runtime information/cost is required.
54//
55// The simplest strategy possible is to group all LDS variables in a single
56// struct and allocate that struct in every kernel such that the original
57// variables are always at the same address. LDS is however a limited resource
58// so this strategy is unusable in practice. It is not implemented here.
59//
60// Strategy | Precise allocation | Zero runtime cost | General purpose |
61// --------+--------------------+-------------------+-----------------+
62// Module | No | Yes | Yes |
63// Table | Yes | No | Yes |
64// Kernel | Yes | Yes | No |
65// Hybrid | Yes | Partial | Yes |
66//
67// "Module" spends LDS memory to save cycles. "Table" spends cycles and global
68// memory to save LDS. "Kernel" is as fast as kernel allocation but only works
69// for variables that are known reachable from a single kernel. "Hybrid" picks
70// between all three. When forced to choose between LDS and cycles we minimise
71// LDS use.
72//
73// The "module" lowering implemented here finds LDS variables which are used by
74// non-kernel functions and creates a new struct with a field for each of those
75// LDS variables. Variables that are only used from kernels are excluded.
76//
77// The "table" lowering implemented here has three components.
78// First kernels are assigned a unique integer identifier which is available in
79// functions it calls through the intrinsic amdgcn_lds_kernel_id. The integer
80// is passed through a specific SGPR, thus works with indirect calls.
81// Second, each kernel allocates LDS variables independent of other kernels and
82// writes the addresses it chose for each variable into an array in consistent
83// order. If the kernel does not allocate a given variable, it writes undef to
84// the corresponding array location. These arrays are written to a constant
85// table in the order matching the kernel unique integer identifier.
86// Third, uses from non-kernel functions are replaced with a table lookup using
87// the intrinsic function to find the address of the variable.
88//
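// As an illustrative sketch only (value names are hypothetical), a module with
// two kernels and one such variable produces a table of the form
//   @llvm.amdgcn.lds.offset.table = internal addrspace(4) constant [2 x [1 x i32]] ...
// holding the address each kernel chose for the variable, and a use from a
// non-kernel function is rewritten to roughly
//   %id  = call i32 @llvm.amdgcn.lds.kernel.id()
//   %gep = getelementptr inbounds [2 x [1 x i32]], ptr addrspace(4)
//            @llvm.amdgcn.lds.offset.table, i32 0, i32 %id, i32 0
//   %off = load i32, ptr addrspace(4) %gep
//   %ptr = inttoptr i32 %off to ptr addrspace(3)
//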
89// "Kernel" lowering is only applicable for variables that are unambiguously
90// reachable from exactly one kernel. For those cases, accesses to the variable
91// can be lowered to ConstantExpr address of a struct instance specific to that
92// one kernel. This is zero cost in space and in compute. It will raise a fatal
93// error on any variable that might be reachable from multiple kernels and is
94// thus most easily used as part of the hybrid lowering strategy.
95//
96// Hybrid lowering is a mixture of the above. It uses the zero cost kernel
97// lowering where it can. It lowers the variable accessed by the greatest
98// number of kernels using the module strategy as that is free for the first
99// variable. Any further variables that can be lowered with the module strategy
100// without incurring LDS memory overhead are. The remaining ones are lowered
101// via table.
102//
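// A hypothetical example of the hybrid choice: if kernels K0 and K1 both reach
// variables A and B through calls, and only K2 reaches C, then C gets the
// "kernel" lowering, A (reachable from the most kernels) is placed in the
// module struct for free at address zero, B is also placed in the module
// struct only if every kernel reaching it already allocates that struct
// (its kernel set is a subset of A's), and otherwise B goes through the table.
//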
103// Consequences
104// - No heuristics or user controlled magic numbers, hybrid is the right choice
105// - Kernels that don't use functions (or have had them all inlined) are not
106// affected by any lowering for kernels that do.
107// - Kernels that don't make indirect function calls are not affected by those
108// that do.
109// - Variables which are used by lots of kernels, e.g. those injected by a
110// language runtime in most kernels, are expected to have no overhead
111// - Implementations that instantiate templates per-kernel where those templates
112// use LDS are expected to hit the "Kernel" lowering strategy
113// - The runtime properties impose a cost in compiler implementation complexity
114//
115// Dynamic LDS implementation
116// Dynamic LDS is lowered similarly to the "table" strategy above and uses the
117// same intrinsic to identify which kernel is at the root of the dynamic call
118// graph. This relies on the specified behaviour that all dynamic LDS variables
119// alias one another, i.e. are at the same address, with respect to a given
120// kernel. Therefore this pass creates new dynamic LDS variables for each kernel
121// that allocates any dynamic LDS and builds a table of addresses out of those.
122// The AMDGPUPromoteAlloca pass skips kernels that use dynamic LDS.
123// The corresponding optimisation for "kernel" lowering where the table lookup
124// is elided is not implemented.
125//
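// As a sketch (kernel name hypothetical), each such kernel gets a marker like
//   @llvm.amdgcn.k0.dynlds = external addrspace(3) global [0 x i8], align 8
// whose alignment is the maximum alignment of any dynamic LDS variable
// reachable from that kernel, and the address table is built from these
// markers in the same way as the static lookup table.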
126//
127// Implementation notes / limitations
128// A single LDS global variable represents an instance per kernel that can reach
129// it. This pass essentially specialises such variables per kernel.
130// Handling ConstantExpr during the pass complicated this significantly so now
131// all ConstantExpr uses of LDS variables are expanded to instructions. This
132// may need amending when implementing non-undef initialisers.
133//
134// Lowering is split between this IR pass and the back end. This pass chooses
135// where given variables should be allocated and marks them with metadata,
136// MD_absolute_symbol. The backend places the variables in coincidentally the
137// same location and raises a fatal error if something has gone awry. This works
138// in practice because the only pass between this one and the backend that
139// changes LDS is PromoteAlloca and the changes it makes do not conflict.
140//
141// Addresses are written to constant global arrays based on the same metadata.
142//
143// The backend lowers LDS variables in the order of traversal of the function.
144// This is at odds with the deterministic layout required. The workaround is to
145// allocate the fixed-address variables immediately upon starting the function
146// where they can be placed as intended. This requires a means of mapping from
147// the function to the variables that it allocates. For the module scope lds,
148// this is via metadata indicating whether the variable is not required. If a
149// pass deletes that metadata, a fatal error on disagreement with the absolute
150// symbol metadata will occur. For kernel scope and dynamic, this is by _name_
151// correspondence between the function and the variable. It requires the
152// kernel to have a name (which is only a limitation for tests in practice) and
153// for nothing to rename the corresponding symbols. This is a hazard if the pass
154// is run multiple times during debugging. Alternative schemes considered all
155// involve bespoke metadata.
156//
157// If the name correspondence can be replaced, multiple distinct kernels that
158// have the same memory layout can map to the same kernel id (as the address
159// itself is handled by the absolute symbol metadata) and that will allow more
160// uses of the "kernel" style faster lowering and reduce the size of the lookup
161// tables.
162//
163// There is a test that checks this does not fire for a graphics shader. This
164// lowering is expected to work for graphics if the isKernel test is changed.
165//
166// The current markUsedByKernel is sufficient for PromoteAlloca but is elided
167// before codegen. Replacing this with an equivalent intrinsic which lasts until
168// shortly after the machine function lowering of LDS would help break the name
169// mapping. The other part needed is probably to amend PromoteAlloca to embed
170// the LDS variables it creates in the same struct created here. That avoids the
171// current hazard where a PromoteAlloca LDS variable might be allocated before
172// the kernel scope (and thus error on the address check). Given a new invariant
173// that no LDS variables exist outside of the structs managed here, and an
174// intrinsic that lasts until after the LDS frame lowering, it should be
175// possible to drop the name mapping and fold equivalent memory layouts.
176//
177//===----------------------------------------------------------------------===//
178
179#include "AMDGPU.h"
180#include "AMDGPUMemoryUtils.h"
181#include "AMDGPUTargetMachine.h"
182#include "Utils/AMDGPUBaseInfo.h"
183#include "llvm/ADT/BitVector.h"
184#include "llvm/ADT/DenseMap.h"
185#include "llvm/ADT/DenseSet.h"
186#include "llvm/ADT/STLExtras.h"
191#include "llvm/IR/Constants.h"
192#include "llvm/IR/DerivedTypes.h"
193#include "llvm/IR/Dominators.h"
194#include "llvm/IR/IRBuilder.h"
195#include "llvm/IR/InlineAsm.h"
196#include "llvm/IR/Instructions.h"
197#include "llvm/IR/IntrinsicsAMDGPU.h"
198#include "llvm/IR/MDBuilder.h"
201#include "llvm/Pass.h"
203#include "llvm/Support/Debug.h"
204#include "llvm/Support/Format.h"
205#include "llvm/Support/OptimizedStructLayout.h"
206#include "llvm/Support/raw_ostream.h"
207#include "llvm/Transforms/Utils/ModuleUtils.h"
209
210#include <vector>
211
212#include <cstdio>
213
214#define DEBUG_TYPE "amdgpu-lower-module-lds"
215
216using namespace llvm;
217using namespace AMDGPU;
218
219namespace {
220
221cl::opt<bool> SuperAlignLDSGlobals(
222 "amdgpu-super-align-lds-globals",
223 cl::desc("Increase alignment of LDS if it is not on align boundary"),
224 cl::init(true), cl::Hidden);
225
226enum class LoweringKind { module, table, kernel, hybrid };
227cl::opt<LoweringKind> LoweringKindLoc(
228 "amdgpu-lower-module-lds-strategy",
229 cl::desc("Specify lowering strategy for function LDS access:"), cl::Hidden,
230 cl::init(LoweringKind::hybrid),
232 clEnumValN(LoweringKind::table, "table", "Lower via table lookup"),
233 clEnumValN(LoweringKind::module, "module", "Lower via module struct"),
235 LoweringKind::kernel, "kernel",
236 "Lower variables reachable from one kernel, otherwise abort"),
237 clEnumValN(LoweringKind::hybrid, "hybrid",
238 "Lower via mixture of above strategies")));
239
240template <typename T> std::vector<T> sortByName(std::vector<T> &&V) {
241 llvm::sort(V, [](const auto *L, const auto *R) {
242 return L->getName() < R->getName();
243 });
244 return {std::move(V)};
245}
246
247class AMDGPULowerModuleLDS {
248 const AMDGPUTargetMachine &TM;
249
250 static void
251 removeLocalVarsFromUsedLists(Module &M,
252 const DenseSet<GlobalVariable *> &LocalVars) {
253 // The verifier rejects used lists containing an inttoptr of a constant
254 // so remove the variables from these lists before replaceAllUsesWith
255 SmallPtrSet<Constant *, 8> LocalVarsSet;
256 for (GlobalVariable *LocalVar : LocalVars)
257 LocalVarsSet.insert(cast<Constant>(LocalVar->stripPointerCasts()));
258
259 removeFromUsedLists(
260 M, [&LocalVarsSet](Constant *C) { return LocalVarsSet.count(C); });
261
262 for (GlobalVariable *LocalVar : LocalVars)
263 LocalVar->removeDeadConstantUsers();
264 }
265
266 static void markUsedByKernel(Function *Func, GlobalVariable *SGV) {
267 // The llvm.amdgcn.module.lds instance is implicitly used by all kernels
268 // that might call a function which accesses a field within it. This is
269 // presently approximated to 'all kernels' if there are any such functions
270 // in the module. This implicit use is redefined as an explicit use here so
271 // that later passes, specifically PromoteAlloca, account for the required
272 // memory without any knowledge of this transform.
273
274 // An operand bundle on llvm.donothing works because the call instruction
275 // survives until after the last pass that needs to account for LDS. It is
276 // better than inline asm as the latter survives until the end of codegen. A
277 // totally robust solution would be a function with the same semantics as
278 // llvm.donothing that takes a pointer to the instance and is lowered to a
279 // no-op after LDS is allocated, but that is not presently necessary.
280
281 // This intrinsic is eliminated shortly before instruction selection. It
282 // does not suffice to indicate to ISel that a given global which is not
283 // immediately used by the kernel must still be allocated by it. An
284 // equivalent target specific intrinsic which lasts until immediately after
285 // codegen would suffice for that, but one would still need to ensure that
286 // the variables are allocated in the anticipated order.
287 BasicBlock *Entry = &Func->getEntryBlock();
288 IRBuilder<> Builder(Entry, Entry->getFirstNonPHIIt());
289
290 Function *Decl = Intrinsic::getOrInsertDeclaration(
291 Func->getParent(), Intrinsic::donothing, {});
292
293 Value *UseInstance[1] = {
294 Builder.CreateConstInBoundsGEP1_32(SGV->getValueType(), SGV, 0)};
295
296 Builder.CreateCall(
297 Decl, {}, {OperandBundleDefT<Value *>("ExplicitUse", UseInstance)});
298 }
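 // A rough sketch of the IR this emits at a kernel entry for an instance
 // named @llvm.amdgcn.module.lds (the GEP typically folds to a constant):
 //   call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3)
 //       getelementptr inbounds (%llvm.amdgcn.module.lds.t,
 //           ptr addrspace(3) @llvm.amdgcn.module.lds, i32 0)) ]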
299
300public:
301 AMDGPULowerModuleLDS(const AMDGPUTargetMachine &TM_) : TM(TM_) {}
302
303 struct LDSVariableReplacement {
304 GlobalVariable *SGV = nullptr;
305 DenseMap<GlobalVariable *, Constant *> LDSVarsToConstantGEP;
306 };
307
308 // remap from lds global to a constantexpr gep to where it has been moved to
309 // for each kernel
310 // an array with an element for each kernel containing where the corresponding
311 // variable was remapped to
312
313 static Constant *getAddressesOfVariablesInKernel(
314 LLVMContext &Ctx, ArrayRef<GlobalVariable *> Variables,
315 const DenseMap<GlobalVariable *, Constant *> &LDSVarsToConstantGEP) {
316 // Create a ConstantArray containing the address of each Variable within the
317 // kernel corresponding to LDSVarsToConstantGEP, or poison if that kernel
318 // does not allocate it
319 // TODO: Drop the ptrtoint conversion
320
321 Type *I32 = Type::getInt32Ty(Ctx);
322
323 ArrayType *KernelOffsetsType = ArrayType::get(I32, Variables.size());
324
325 SmallVector<Constant *> Elements;
326 for (GlobalVariable *GV : Variables) {
327 auto ConstantGepIt = LDSVarsToConstantGEP.find(GV);
328 if (ConstantGepIt != LDSVarsToConstantGEP.end()) {
329 auto *elt = ConstantExpr::getPtrToInt(ConstantGepIt->second, I32);
330 Elements.push_back(elt);
331 } else {
332 Elements.push_back(PoisonValue::get(I32));
333 }
334 }
335 return ConstantArray::get(KernelOffsetsType, Elements);
336 }
337
338 static GlobalVariable *buildLookupTable(
339 Module &M, ArrayRef<GlobalVariable *> Variables,
340 ArrayRef<Function *> kernels,
341 DenseMap<Function *, LDSVariableReplacement> &KernelToReplacement) {
342 if (Variables.empty()) {
343 return nullptr;
344 }
345 LLVMContext &Ctx = M.getContext();
346
347 const size_t NumberVariables = Variables.size();
348 const size_t NumberKernels = kernels.size();
349
350 ArrayType *KernelOffsetsType =
351 ArrayType::get(Type::getInt32Ty(Ctx), NumberVariables);
352
353 ArrayType *AllKernelsOffsetsType =
354 ArrayType::get(KernelOffsetsType, NumberKernels);
355
356 Constant *Missing = PoisonValue::get(KernelOffsetsType);
357 std::vector<Constant *> overallConstantExprElts(NumberKernels);
358 for (size_t i = 0; i < NumberKernels; i++) {
359 auto Replacement = KernelToReplacement.find(kernels[i]);
360 overallConstantExprElts[i] =
361 (Replacement == KernelToReplacement.end())
362 ? Missing
363 : getAddressesOfVariablesInKernel(
364 Ctx, Variables, Replacement->second.LDSVarsToConstantGEP);
365 }
366
367 Constant *init =
368 ConstantArray::get(AllKernelsOffsetsType, overallConstantExprElts);
369
370 return new GlobalVariable(
371 M, AllKernelsOffsetsType, true, GlobalValue::InternalLinkage, init,
372 "llvm.amdgcn.lds.offset.table", nullptr, GlobalValue::NotThreadLocal,
373 AMDGPUAS::CONSTANT_ADDRESS);
374 }
375
376 void replaceUseWithTableLookup(Module &M, IRBuilder<> &Builder,
377 GlobalVariable *LookupTable,
378 GlobalVariable *GV, Use &U,
379 Value *OptionalIndex) {
380 // Table is a constant array of the same length as OrderedKernels
381 LLVMContext &Ctx = M.getContext();
382 Type *I32 = Type::getInt32Ty(Ctx);
383 auto *I = cast<Instruction>(U.getUser());
384
385 Value *tableKernelIndex = getTableLookupKernelIndex(M, I->getFunction());
386
387 if (auto *Phi = dyn_cast<PHINode>(I)) {
388 BasicBlock *BB = Phi->getIncomingBlock(U);
389 Builder.SetInsertPoint(&(*(BB->getFirstInsertionPt())));
390 } else {
391 Builder.SetInsertPoint(I);
392 }
393
394 SmallVector<Value *, 3> GEPIdx = {
395 ConstantInt::get(I32, 0),
396 tableKernelIndex,
397 };
398 if (OptionalIndex)
399 GEPIdx.push_back(OptionalIndex);
400
401 Value *Address = Builder.CreateInBoundsGEP(
402 LookupTable->getValueType(), LookupTable, GEPIdx, GV->getName());
403
404 Value *loaded = Builder.CreateLoad(I32, Address);
405
406 Value *replacement =
407 Builder.CreateIntToPtr(loaded, GV->getType(), GV->getName());
408
409 U.set(replacement);
410 }
411
412 void replaceUsesInInstructionsWithTableLookup(
413 Module &M, ArrayRef<GlobalVariable *> ModuleScopeVariables,
414 GlobalVariable *LookupTable) {
415
416 LLVMContext &Ctx = M.getContext();
417 IRBuilder<> Builder(Ctx);
418 Type *I32 = Type::getInt32Ty(Ctx);
419
420 for (size_t Index = 0; Index < ModuleScopeVariables.size(); Index++) {
421 auto *GV = ModuleScopeVariables[Index];
422
423 for (Use &U : make_early_inc_range(GV->uses())) {
424 auto *I = dyn_cast<Instruction>(U.getUser());
425 if (!I)
426 continue;
427
428 replaceUseWithTableLookup(M, Builder, LookupTable, GV, U,
429 ConstantInt::get(I32, Index));
430 }
431 }
432 }
433
434 static DenseSet<Function *> kernelsThatIndirectlyAccessAnyOfPassedVariables(
435 Module &M, LDSUsesInfoTy &LDSUsesInfo,
436 DenseSet<GlobalVariable *> const &VariableSet) {
437
438 DenseSet<Function *> KernelSet;
439
440 if (VariableSet.empty())
441 return KernelSet;
442
443 for (Function &Func : M.functions()) {
444 if (Func.isDeclaration() || !isKernelLDS(&Func))
445 continue;
446 for (GlobalVariable *GV : LDSUsesInfo.indirect_access[&Func]) {
447 if (VariableSet.contains(GV)) {
448 KernelSet.insert(&Func);
449 break;
450 }
451 }
452 }
453
454 return KernelSet;
455 }
456
457 static GlobalVariable *
458 chooseBestVariableForModuleStrategy(const DataLayout &DL,
459 VariableFunctionMap &LDSVars) {
460 // Find the global variable with the most indirect uses from kernels
461
462 struct CandidateTy {
463 GlobalVariable *GV = nullptr;
464 size_t UserCount = 0;
465 size_t Size = 0;
466
467 CandidateTy() = default;
468
469 CandidateTy(GlobalVariable *GV, uint64_t UserCount, uint64_t AllocSize)
470 : GV(GV), UserCount(UserCount), Size(AllocSize) {}
471
472 bool operator<(const CandidateTy &Other) const {
473 // Fewer users makes module scope variable less attractive
474 if (UserCount < Other.UserCount) {
475 return true;
476 }
477 if (UserCount > Other.UserCount) {
478 return false;
479 }
480
481 // Bigger makes module scope variable less attractive
482 if (Size < Other.Size) {
483 return false;
484 }
485
486 if (Size > Other.Size) {
487 return true;
488 }
489
490 // Arbitrary but consistent
491 return GV->getName() < Other.GV->getName();
492 }
493 };
494
495 CandidateTy MostUsed;
496
497 for (auto &K : LDSVars) {
498 GlobalVariable *GV = K.first;
499 if (K.second.size() <= 1) {
500 // A variable reachable by only one kernel is best lowered with kernel
501 // strategy
502 continue;
503 }
504 CandidateTy Candidate(
505 GV, K.second.size(),
506 DL.getTypeAllocSize(GV->getValueType()).getFixedValue());
507 if (MostUsed < Candidate)
508 MostUsed = Candidate;
509 }
510
511 return MostUsed.GV;
512 }
513
514 static void recordLDSAbsoluteAddress(Module *M, GlobalVariable *GV,
515 uint32_t Address) {
516 // Write the specified address into metadata where it can be retrieved by
517 // the assembler. Format is a half open range, [Address Address+1)
518 LLVMContext &Ctx = M->getContext();
519 auto *IntTy =
520 M->getDataLayout().getIntPtrType(Ctx, AMDGPUAS::LOCAL_ADDRESS);
521 auto *MinC = ConstantAsMetadata::get(ConstantInt::get(IntTy, Address));
522 auto *MaxC = ConstantAsMetadata::get(ConstantInt::get(IntTy, Address + 1));
523 GV->setMetadata(LLVMContext::MD_absolute_symbol,
524 MDNode::get(Ctx, {MinC, MaxC}));
525 }
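 // For example (sketch), a variable assigned the address 8 ends up as
 //   @v = ... !absolute_symbol !0
 //   !0 = !{i32 8, i32 9}
 // i.e. the half open range [8, 9) expressed in the 32-bit LDS address width.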
526
527 DenseMap<Function *, Value *> tableKernelIndexCache;
528 Value *getTableLookupKernelIndex(Module &M, Function *F) {
529 // Accesses from a function use the amdgcn_lds_kernel_id intrinsic which
530 // lowers to a read from a live in register. Emit it once in the entry
531 // block to spare deduplicating it later.
532 auto [It, Inserted] = tableKernelIndexCache.try_emplace(F);
533 if (Inserted) {
534 auto InsertAt = F->getEntryBlock().getFirstNonPHIOrDbgOrAlloca();
535 IRBuilder<> Builder(&*InsertAt);
536
537 It->second = Builder.CreateIntrinsic(Intrinsic::amdgcn_lds_kernel_id, {});
538 }
539
540 return It->second;
541 }
542
543 static std::vector<Function *> assignLDSKernelIDToEachKernel(
544 Module *M, DenseSet<Function *> const &KernelsThatAllocateTableLDS,
545 DenseSet<Function *> const &KernelsThatIndirectlyAllocateDynamicLDS) {
546 // Associate kernels in the set with an arbitrary but reproducible order and
547 // annotate them with that order in metadata. This metadata is recognised by
548 // the backend and lowered to a SGPR which can be read from using
549 // amdgcn_lds_kernel_id.
550
551 std::vector<Function *> OrderedKernels;
552 if (!KernelsThatAllocateTableLDS.empty() ||
553 !KernelsThatIndirectlyAllocateDynamicLDS.empty()) {
554
555 for (Function &Func : M->functions()) {
556 if (Func.isDeclaration())
557 continue;
558 if (!isKernelLDS(&Func))
559 continue;
560
561 if (KernelsThatAllocateTableLDS.contains(&Func) ||
562 KernelsThatIndirectlyAllocateDynamicLDS.contains(&Func)) {
563 assert(Func.hasName()); // else fatal error earlier
564 OrderedKernels.push_back(&Func);
565 }
566 }
567
568 // Put them in an arbitrary but reproducible order
569 OrderedKernels = sortByName(std::move(OrderedKernels));
570
571 // Annotate the kernels with their order in this vector
572 LLVMContext &Ctx = M->getContext();
573 IRBuilder<> Builder(Ctx);
574
575 if (OrderedKernels.size() > UINT32_MAX) {
576 // 32 bit keeps it in one SGPR. > 2**32 kernels won't fit on the GPU
577 reportFatalUsageError("unimplemented LDS lowering for > 2**32 kernels");
578 }
579
580 for (size_t i = 0; i < OrderedKernels.size(); i++) {
581 Metadata *AttrMDArgs[1] = {
582 ConstantAsMetadata::get(Builder.getInt32(i)),
583 };
584 OrderedKernels[i]->setMetadata("llvm.amdgcn.lds.kernel.id",
585 MDNode::get(Ctx, AttrMDArgs));
586 }
587 }
588 return OrderedKernels;
589 }
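 // Sketch of the resulting annotation (kernel name hypothetical): the kernel
 // at position i of the sorted order becomes
 //   define amdgpu_kernel void @k() !llvm.amdgcn.lds.kernel.id !n { ... }
 //   !n = !{i32 i}
 // and the backend lowers @llvm.amdgcn.lds.kernel.id() in its callees to i.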
590
591 static void partitionVariablesIntoIndirectStrategies(
592 Module &M, LDSUsesInfoTy const &LDSUsesInfo,
593 VariableFunctionMap &LDSToKernelsThatNeedToAccessItIndirectly,
594 DenseSet<GlobalVariable *> &ModuleScopeVariables,
595 DenseSet<GlobalVariable *> &TableLookupVariables,
596 DenseSet<GlobalVariable *> &KernelAccessVariables,
597 DenseSet<GlobalVariable *> &DynamicVariables) {
598
599 GlobalVariable *HybridModuleRoot =
600 LoweringKindLoc != LoweringKind::hybrid
601 ? nullptr
602 : chooseBestVariableForModuleStrategy(
603 M.getDataLayout(), LDSToKernelsThatNeedToAccessItIndirectly);
604
605 DenseSet<Function *> const EmptySet;
606 DenseSet<Function *> const &HybridModuleRootKernels =
607 HybridModuleRoot
608 ? LDSToKernelsThatNeedToAccessItIndirectly[HybridModuleRoot]
609 : EmptySet;
610
611 for (auto &K : LDSToKernelsThatNeedToAccessItIndirectly) {
612 // Each iteration of this loop assigns exactly one global variable to
613 // exactly one of the implementation strategies.
614
615 GlobalVariable *GV = K.first;
617 assert(K.second.size() != 0);
618
619 if (AMDGPU::isDynamicLDS(*GV)) {
620 DynamicVariables.insert(GV);
621 continue;
622 }
623
624 switch (LoweringKindLoc) {
625 case LoweringKind::module:
626 ModuleScopeVariables.insert(GV);
627 break;
628
629 case LoweringKind::table:
630 TableLookupVariables.insert(GV);
631 break;
632
633 case LoweringKind::kernel:
634 if (K.second.size() == 1) {
635 KernelAccessVariables.insert(GV);
636 } else {
637 // FIXME: This should use DiagnosticInfo
639 "cannot lower LDS '" + GV->getName() +
640 "' to kernel access as it is reachable from multiple kernels");
641 }
642 break;
643
644 case LoweringKind::hybrid: {
645 if (GV == HybridModuleRoot) {
646 assert(K.second.size() != 1);
647 ModuleScopeVariables.insert(GV);
648 } else if (K.second.size() == 1) {
649 KernelAccessVariables.insert(GV);
650 } else if (set_is_subset(K.second, HybridModuleRootKernels)) {
651 ModuleScopeVariables.insert(GV);
652 } else {
653 TableLookupVariables.insert(GV);
654 }
655 break;
656 }
657 }
658 }
659
660 // All LDS variables accessed indirectly have now been partitioned into
661 // the distinct lowering strategies.
662 assert(ModuleScopeVariables.size() + TableLookupVariables.size() +
663 KernelAccessVariables.size() + DynamicVariables.size() ==
664 LDSToKernelsThatNeedToAccessItIndirectly.size());
665 }
666
667 static GlobalVariable *lowerModuleScopeStructVariables(
668 Module &M, DenseSet<GlobalVariable *> const &ModuleScopeVariables,
669 DenseSet<Function *> const &KernelsThatAllocateModuleLDS) {
670 // Create a struct to hold the ModuleScopeVariables
671 // Replace all uses of those variables from non-kernel functions with the
672 // new struct instance Replace only the uses from kernel functions that will
673 // allocate this instance. That is a space optimisation - kernels that use a
674 // subset of the module scope struct and do not need to allocate it for
675 // indirect calls will only allocate the subset they use (they do so as part
676 // of the per-kernel lowering).
677 if (ModuleScopeVariables.empty()) {
678 return nullptr;
679 }
680
681 LLVMContext &Ctx = M.getContext();
682
683 LDSVariableReplacement ModuleScopeReplacement =
684 createLDSVariableReplacement(M, "llvm.amdgcn.module.lds",
685 ModuleScopeVariables);
686
687 appendToCompilerUsed(M, {static_cast<GlobalValue *>(
688 ConstantExpr::getPointerBitCastOrAddrSpaceCast(
689 cast<Constant>(ModuleScopeReplacement.SGV),
690 PointerType::getUnqual(Ctx)))});
691
692 // module.lds will be allocated at zero in any kernel that allocates it
693 recordLDSAbsoluteAddress(&M, ModuleScopeReplacement.SGV, 0);
694
695 // historic
696 removeLocalVarsFromUsedLists(M, ModuleScopeVariables);
697
698 // Replace all uses of module scope variable from non-kernel functions
699 replaceLDSVariablesWithStruct(
700 M, ModuleScopeVariables, ModuleScopeReplacement, [&](Use &U) {
701 Instruction *I = dyn_cast<Instruction>(U.getUser());
702 if (!I) {
703 return false;
704 }
705 Function *F = I->getFunction();
706 return !isKernelLDS(F);
707 });
708
709 // Replace uses of module scope variable from kernel functions that
710 // allocate the module scope variable, otherwise leave them unchanged
711 // Record on each kernel whether the module scope global is used by it
712
713 for (Function &Func : M.functions()) {
714 if (Func.isDeclaration() || !isKernelLDS(&Func))
715 continue;
716
717 if (KernelsThatAllocateModuleLDS.contains(&Func)) {
718 replaceLDSVariablesWithStruct(
719 M, ModuleScopeVariables, ModuleScopeReplacement, [&](Use &U) {
720 Instruction *I = dyn_cast<Instruction>(U.getUser());
721 if (!I) {
722 return false;
723 }
724 Function *F = I->getFunction();
725 return F == &Func;
726 });
727
728 markUsedByKernel(&Func, ModuleScopeReplacement.SGV);
729 }
730 }
731
732 return ModuleScopeReplacement.SGV;
733 }
734
735 static DenseMap<Function *, LDSVariableReplacement>
736 lowerKernelScopeStructVariables(
737 Module &M, LDSUsesInfoTy &LDSUsesInfo,
738 DenseSet<GlobalVariable *> const &ModuleScopeVariables,
739 DenseSet<Function *> const &KernelsThatAllocateModuleLDS,
740 GlobalVariable *MaybeModuleScopeStruct) {
741
742 // Create a struct for each kernel for the non-module-scope variables.
743
744 DenseMap<Function *, LDSVariableReplacement> KernelToReplacement;
745 for (Function &Func : M.functions()) {
746 if (Func.isDeclaration() || !isKernelLDS(&Func))
747 continue;
748
749 DenseSet<GlobalVariable *> KernelUsedVariables;
750 // Allocating variables that are used directly in this struct to get
751 // alignment aware allocation and predictable frame size.
752 for (auto &v : LDSUsesInfo.direct_access[&Func]) {
753 if (!AMDGPU::isDynamicLDS(*v)) {
754 KernelUsedVariables.insert(v);
755 }
756 }
757
758 // Allocating variables that are accessed indirectly so that a lookup of
759 // this struct instance can find them from nested functions.
760 for (auto &v : LDSUsesInfo.indirect_access[&Func]) {
761 if (!AMDGPU::isDynamicLDS(*v)) {
762 KernelUsedVariables.insert(v);
763 }
764 }
765
766 // Variables allocated in module lds must all resolve to that struct,
767 // not to the per-kernel instance.
768 if (KernelsThatAllocateModuleLDS.contains(&Func)) {
769 for (GlobalVariable *v : ModuleScopeVariables) {
770 KernelUsedVariables.erase(v);
771 }
772 }
773
774 if (KernelUsedVariables.empty()) {
775 // Either used no LDS, or the LDS it used was all in the module struct
776 // or dynamically sized
777 continue;
778 }
779
780 // The association between kernel function and LDS struct is done by
781 // symbol name, which only works if the function in question has a
782 // name. This is not expected to be a problem in practice as kernels
783 // are called by name making anonymous ones (which are named by the
784 // backend) difficult to use. This does mean that llvm test cases need
785 // to name the kernels.
786 if (!Func.hasName()) {
787 reportFatalUsageError("anonymous kernels cannot use LDS variables");
788 }
789
790 std::string VarName =
791 (Twine("llvm.amdgcn.kernel.") + Func.getName() + ".lds").str();
792
793 auto Replacement =
794 createLDSVariableReplacement(M, VarName, KernelUsedVariables);
795
796 // If any indirect uses, create a direct use to ensure allocation
797 // TODO: Simpler to unconditionally mark used but that regresses
798 // codegen in test/CodeGen/AMDGPU/noclobber-barrier.ll
799 auto Accesses = LDSUsesInfo.indirect_access.find(&Func);
800 if ((Accesses != LDSUsesInfo.indirect_access.end()) &&
801 !Accesses->second.empty())
802 markUsedByKernel(&Func, Replacement.SGV);
803
804 // remove preserves existing codegen
805 removeLocalVarsFromUsedLists(M, KernelUsedVariables);
806 KernelToReplacement[&Func] = Replacement;
807
808 // Rewrite uses within kernel to the new struct
809 replaceLDSVariablesWithStruct(
810 M, KernelUsedVariables, Replacement, [&Func](Use &U) {
811 Instruction *I = dyn_cast<Instruction>(U.getUser());
812 return I && I->getFunction() == &Func;
813 });
814 }
815 return KernelToReplacement;
816 }
817
818 static GlobalVariable *
819 buildRepresentativeDynamicLDSInstance(Module &M, LDSUsesInfoTy &LDSUsesInfo,
820 Function *func) {
821 // Create a dynamic lds variable with a name associated with the passed
822 // function that has the maximum alignment of any dynamic lds variable
823 // reachable from this kernel. Dynamic LDS is allocated after the static LDS
824 // allocation, possibly after alignment padding. The representative variable
825 // created here has the maximum alignment of any other dynamic variable
826 // reachable by that kernel. All dynamic LDS variables are allocated at the
827 // same address in each kernel in order to provide the documented aliasing
828 // semantics. Setting the alignment here allows this IR pass to accurately
829 // predict the exact constant at which it will be allocated.
830
832
833 LLVMContext &Ctx = M.getContext();
834 const DataLayout &DL = M.getDataLayout();
835 Align MaxDynamicAlignment(1);
836
837 auto UpdateMaxAlignment = [&MaxDynamicAlignment, &DL](GlobalVariable *GV) {
838 if (AMDGPU::isDynamicLDS(*GV)) {
839 MaxDynamicAlignment =
840 std::max(MaxDynamicAlignment, AMDGPU::getAlign(DL, GV));
841 }
842 };
843
844 for (GlobalVariable *GV : LDSUsesInfo.indirect_access[func]) {
845 UpdateMaxAlignment(GV);
846 }
847
848 for (GlobalVariable *GV : LDSUsesInfo.direct_access[func]) {
849 UpdateMaxAlignment(GV);
850 }
851
852 assert(func->hasName()); // Checked by caller
853 auto *emptyCharArray = ArrayType::get(Type::getInt8Ty(Ctx), 0);
854 GlobalVariable *N = new GlobalVariable(
855 M, emptyCharArray, false, GlobalValue::ExternalLinkage, nullptr,
856 Twine("llvm.amdgcn." + func->getName() + ".dynlds"), nullptr, GlobalValue::NotThreadLocal, AMDGPUAS::LOCAL_ADDRESS,
857 false);
858 N->setAlignment(MaxDynamicAlignment);
859
861 return N;
862 }
863
864 DenseMap<Function *, GlobalVariable *> lowerDynamicLDSVariables(
865 Module &M, LDSUsesInfoTy &LDSUsesInfo,
866 DenseSet<Function *> const &KernelsThatIndirectlyAllocateDynamicLDS,
867 DenseSet<GlobalVariable *> const &DynamicVariables,
868 std::vector<Function *> const &OrderedKernels) {
869 DenseMap<Function *, GlobalVariable *> KernelToCreatedDynamicLDS;
870 if (!KernelsThatIndirectlyAllocateDynamicLDS.empty()) {
871 LLVMContext &Ctx = M.getContext();
872 IRBuilder<> Builder(Ctx);
873 Type *I32 = Type::getInt32Ty(Ctx);
874
875 std::vector<Constant *> newDynamicLDS;
876
877 // Table is built in the same order as OrderedKernels
878 for (auto &func : OrderedKernels) {
879
880 if (KernelsThatIndirectlyAllocateDynamicLDS.contains(func)) {
882 if (!func->hasName()) {
883 reportFatalUsageError("anonymous kernels cannot use LDS variables");
884 }
885
886 GlobalVariable *N =
887 buildRepresentativeDynamicLDSInstance(M, LDSUsesInfo, func);
888
889 KernelToCreatedDynamicLDS[func] = N;
890
891 markUsedByKernel(func, N);
892
893 auto *emptyCharArray = ArrayType::get(Type::getInt8Ty(Ctx), 0);
894 auto *GEP = ConstantExpr::getGetElementPtr(
895 emptyCharArray, N, ConstantInt::get(I32, 0), true);
896 newDynamicLDS.push_back(ConstantExpr::getPtrToInt(GEP, I32));
897 } else {
898 newDynamicLDS.push_back(PoisonValue::get(I32));
899 }
900 }
901 assert(OrderedKernels.size() == newDynamicLDS.size());
902
903 ArrayType *t = ArrayType::get(I32, newDynamicLDS.size());
904 Constant *init = ConstantArray::get(t, newDynamicLDS);
905 GlobalVariable *table = new GlobalVariable(
906 M, t, true, GlobalValue::InternalLinkage, init,
907 "llvm.amdgcn.dynlds.offset.table", nullptr,
909
910 for (GlobalVariable *GV : DynamicVariables) {
911 for (Use &U : make_early_inc_range(GV->uses())) {
912 auto *I = dyn_cast<Instruction>(U.getUser());
913 if (!I)
914 continue;
915 if (isKernelLDS(I->getFunction()))
916 continue;
917
918 replaceUseWithTableLookup(M, Builder, table, GV, U, nullptr);
919 }
920 }
921 }
922 return KernelToCreatedDynamicLDS;
923 }
924
925 static GlobalVariable *uniquifyGVPerKernel(Module &M, GlobalVariable *GV,
926 Function *KF) {
927 bool NeedsReplacement = false;
928 for (Use &U : GV->uses()) {
929 if (auto *I = dyn_cast<Instruction>(U.getUser())) {
930 Function *F = I->getFunction();
931 if (isKernelLDS(F) && F != KF) {
932 NeedsReplacement = true;
933 break;
934 }
935 }
936 }
937 if (!NeedsReplacement)
938 return GV;
939 // Create a new GV used only by this kernel and its function
940 GlobalVariable *NewGV = new GlobalVariable(
941 M, GV->getValueType(), GV->isConstant(), GV->getLinkage(),
942 GV->getInitializer(), GV->getName() + "." + KF->getName(), nullptr,
943 GlobalValue::NotThreadLocal, AMDGPUAS::LOCAL_ADDRESS);
944 NewGV->copyAttributesFrom(GV);
945 for (Use &U : make_early_inc_range(GV->uses())) {
946 if (auto *I = dyn_cast<Instruction>(U.getUser())) {
947 Function *F = I->getFunction();
948 if (!isKernelLDS(F) || F == KF) {
949 U.getUser()->replaceUsesOfWith(GV, NewGV);
950 }
951 }
952 }
953 return NewGV;
954 }
955
956 bool lowerSpecialLDSVariables(
957 Module &M, LDSUsesInfoTy &LDSUsesInfo,
958 VariableFunctionMap &LDSToKernelsThatNeedToAccessItIndirectly) {
959 bool Changed = false;
960 const DataLayout &DL = M.getDataLayout();
961 // The 1st round: give module-absolute assignments
962 int NumAbsolutes = 0;
963 std::vector<GlobalVariable *> OrderedGVs;
964 for (auto &K : LDSToKernelsThatNeedToAccessItIndirectly) {
965 GlobalVariable *GV = K.first;
966 if (!isNamedBarrier(*GV))
967 continue;
968 // give a module-absolute assignment if it is indirectly accessed by
969 // multiple kernels. This is not precise, but we don't want to duplicate
970 // a function when it is called by multiple kernels.
971 if (LDSToKernelsThatNeedToAccessItIndirectly[GV].size() > 1) {
972 OrderedGVs.push_back(GV);
973 } else {
974 // leave it to the 2nd round, which will give a kernel-relative
975 // assignment if it is only indirectly accessed by one kernel
976 LDSUsesInfo.direct_access[*K.second.begin()].insert(GV);
977 }
978 LDSToKernelsThatNeedToAccessItIndirectly.erase(GV);
979 }
980 OrderedGVs = sortByName(std::move(OrderedGVs));
981 for (GlobalVariable *GV : OrderedGVs) {
982 unsigned BarrierScope = llvm::AMDGPU::Barrier::BARRIER_SCOPE_WORKGROUP;
983 unsigned BarId = NumAbsolutes + 1;
984 unsigned BarCnt = DL.getTypeAllocSize(GV->getValueType()) / 16;
985 NumAbsolutes += BarCnt;
986
987 // 4 bits for alignment, 5 bits for the barrier num,
988 // 3 bits for the barrier scope
989 unsigned Offset = 0x802000u | BarrierScope << 9 | BarId << 4;
990 recordLDSAbsoluteAddress(&M, GV, Offset);
991 }
992 OrderedGVs.clear();
993
994 // The 2nd round: give a kernel-relative assignment for GV that
995 // either only indirectly accessed by single kernel or only directly
996 // accessed by multiple kernels.
997 std::vector<Function *> OrderedKernels;
998 for (auto &K : LDSUsesInfo.direct_access) {
999 Function *F = K.first;
1000 if (isKernelLDS(F))
1001 OrderedKernels.push_back(F);
1002 }
1003 OrderedKernels = sortByName(std::move(OrderedKernels));
1004
1005 DenseMap<Function *, unsigned> Kernel2BarId;
1006 for (Function *F : OrderedKernels) {
1007 for (GlobalVariable *GV : LDSUsesInfo.direct_access[F]) {
1008 if (!isNamedBarrier(*GV))
1009 continue;
1010
1011 LDSUsesInfo.direct_access[F].erase(GV);
1012 if (GV->isAbsoluteSymbolRef()) {
1013 // already assigned
1014 continue;
1015 }
1016 OrderedGVs.push_back(GV);
1017 }
1018 OrderedGVs = sortByName(std::move(OrderedGVs));
1019 for (GlobalVariable *GV : OrderedGVs) {
1020 // GV could also be used directly by other kernels. If so, we need to
1021 // create a new GV used only by this kernel and its function.
1022 auto NewGV = uniquifyGVPerKernel(M, GV, F);
1023 Changed |= (NewGV != GV);
1024 unsigned BarrierScope = llvm::AMDGPU::Barrier::BARRIER_SCOPE_WORKGROUP;
1025 unsigned BarId = Kernel2BarId[F];
1026 BarId += NumAbsolutes + 1;
1027 unsigned BarCnt = DL.getTypeAllocSize(GV->getValueType()) / 16;
1028 Kernel2BarId[F] += BarCnt;
1029 unsigned Offset = 0x802000u | BarrierScope << 9 | BarId << 4;
1030 recordLDSAbsoluteAddress(&M, NewGV, Offset);
1031 }
1032 OrderedGVs.clear();
1033 }
1034 // Also erase those special LDS variables from indirect_access.
1035 for (auto &K : LDSUsesInfo.indirect_access) {
1036 assert(isKernelLDS(K.first));
1037 for (GlobalVariable *GV : K.second) {
1038 if (isNamedBarrier(*GV))
1039 K.second.erase(GV);
1040 }
1041 }
1042 return Changed;
1043 }
1044
1045 bool runOnModule(Module &M) {
1046 CallGraph CG = CallGraph(M);
1047 bool Changed = superAlignLDSGlobals(M);
1048
1049 Changed |= eliminateConstantExprUsesOfLDSFromAllInstructions(M);
1050
1051 Changed = true; // todo: narrow this down
1052
1053 // For each kernel, what variables does it access directly or through
1054 // callees
1055 LDSUsesInfoTy LDSUsesInfo = getTransitiveUsesOfLDS(CG, M);
1056
1057 // For each variable accessed through callees, which kernels access it
1058 VariableFunctionMap LDSToKernelsThatNeedToAccessItIndirectly;
1059 for (auto &K : LDSUsesInfo.indirect_access) {
1060 Function *F = K.first;
1061 assert(isKernelLDS(F));
1062 for (GlobalVariable *GV : K.second) {
1063 LDSToKernelsThatNeedToAccessItIndirectly[GV].insert(F);
1064 }
1065 }
1066
1067 if (LDSUsesInfo.HasSpecialGVs) {
1068 // Special LDS variables need special address assignment
1069 Changed |= lowerSpecialLDSVariables(
1070 M, LDSUsesInfo, LDSToKernelsThatNeedToAccessItIndirectly);
1071 }
1072
1073 // Partition variables accessed indirectly into the different strategies
1074 DenseSet<GlobalVariable *> ModuleScopeVariables;
1075 DenseSet<GlobalVariable *> TableLookupVariables;
1076 DenseSet<GlobalVariable *> KernelAccessVariables;
1077 DenseSet<GlobalVariable *> DynamicVariables;
1078 partitionVariablesIntoIndirectStrategies(
1079 M, LDSUsesInfo, LDSToKernelsThatNeedToAccessItIndirectly,
1080 ModuleScopeVariables, TableLookupVariables, KernelAccessVariables,
1081 DynamicVariables);
1082
1083 // If the kernel accesses a variable that is going to be stored in the
1084 // module instance through a call then that kernel needs to allocate the
1085 // module instance
1086 const DenseSet<Function *> KernelsThatAllocateModuleLDS =
1087 kernelsThatIndirectlyAccessAnyOfPassedVariables(M, LDSUsesInfo,
1088 ModuleScopeVariables);
1089 const DenseSet<Function *> KernelsThatAllocateTableLDS =
1090 kernelsThatIndirectlyAccessAnyOfPassedVariables(M, LDSUsesInfo,
1091 TableLookupVariables);
1092
1093 const DenseSet<Function *> KernelsThatIndirectlyAllocateDynamicLDS =
1094 kernelsThatIndirectlyAccessAnyOfPassedVariables(M, LDSUsesInfo,
1095 DynamicVariables);
1096
1097 GlobalVariable *MaybeModuleScopeStruct = lowerModuleScopeStructVariables(
1098 M, ModuleScopeVariables, KernelsThatAllocateModuleLDS);
1099
1100 DenseMap<Function *, LDSVariableReplacement> KernelToReplacement =
1101 lowerKernelScopeStructVariables(M, LDSUsesInfo, ModuleScopeVariables,
1102 KernelsThatAllocateModuleLDS,
1103 MaybeModuleScopeStruct);
1104
1105 // Lower zero cost accesses to the kernel instances just created
1106 for (auto &GV : KernelAccessVariables) {
1107 auto &funcs = LDSToKernelsThatNeedToAccessItIndirectly[GV];
1108 assert(funcs.size() == 1); // Only one kernel can access it
1109 LDSVariableReplacement Replacement =
1110 KernelToReplacement[*(funcs.begin())];
1111
1112 DenseSet<GlobalVariable *> Vec;
1113 Vec.insert(GV);
1114
1115 replaceLDSVariablesWithStruct(M, Vec, Replacement, [](Use &U) {
1116 return isa<Instruction>(U.getUser());
1117 });
1118 }
1119
1120 // The ith element of this vector is kernel id i
1121 std::vector<Function *> OrderedKernels =
1122 assignLDSKernelIDToEachKernel(&M, KernelsThatAllocateTableLDS,
1123 KernelsThatIndirectlyAllocateDynamicLDS);
1124
1125 if (!KernelsThatAllocateTableLDS.empty()) {
1126 LLVMContext &Ctx = M.getContext();
1127 IRBuilder<> Builder(Ctx);
1128
1129 // The order must be consistent between lookup table and accesses to
1130 // lookup table
1131 auto TableLookupVariablesOrdered =
1132 sortByName(std::vector<GlobalVariable *>(TableLookupVariables.begin(),
1133 TableLookupVariables.end()));
1134
1135 GlobalVariable *LookupTable = buildLookupTable(
1136 M, TableLookupVariablesOrdered, OrderedKernels, KernelToReplacement);
1137 replaceUsesInInstructionsWithTableLookup(M, TableLookupVariablesOrdered,
1138 LookupTable);
1139 }
1140
1141 DenseMap<Function *, GlobalVariable *> KernelToCreatedDynamicLDS =
1142 lowerDynamicLDSVariables(M, LDSUsesInfo,
1143 KernelsThatIndirectlyAllocateDynamicLDS,
1144 DynamicVariables, OrderedKernels);
1145
1146 // Strip amdgpu-no-lds-kernel-id from all functions reachable from the
1147 // kernel. We may have inferred this wasn't used prior to the pass.
1148 // TODO: We could filter out subgraphs that do not access LDS globals.
1149 for (auto *KernelSet : {&KernelsThatIndirectlyAllocateDynamicLDS,
1150 &KernelsThatAllocateTableLDS})
1151 for (Function *F : *KernelSet)
1152 removeFnAttrFromReachable(CG, F, {"amdgpu-no-lds-kernel-id"});
1153
1154 // All kernel frames have been allocated. Calculate and record the
1155 // addresses.
1156 {
1157 const DataLayout &DL = M.getDataLayout();
1158
1159 for (Function &Func : M.functions()) {
1160 if (Func.isDeclaration() || !isKernelLDS(&Func))
1161 continue;
1162
1163 // All three of these are optional. The first variable is allocated at
1164 // zero. They are allocated by AMDGPUMachineFunction as one block.
1165 // Layout:
1166 //{
1167 // module.lds
1168 // alignment padding
1169 // kernel instance
1170 // alignment padding
1171 // dynamic lds variables
1172 //}
1173
1174 const bool AllocateModuleScopeStruct =
1175 MaybeModuleScopeStruct &&
1176 KernelsThatAllocateModuleLDS.contains(&Func);
1177
1178 auto Replacement = KernelToReplacement.find(&Func);
1179 const bool AllocateKernelScopeStruct =
1180 Replacement != KernelToReplacement.end();
1181
1182 const bool AllocateDynamicVariable =
1183 KernelToCreatedDynamicLDS.contains(&Func);
1184
1185 uint32_t Offset = 0;
1186
1187 if (AllocateModuleScopeStruct) {
1188 // Allocated at zero, recorded once on construction, not once per
1189 // kernel
1190 Offset += DL.getTypeAllocSize(MaybeModuleScopeStruct->getValueType());
1191 }
1192
1193 if (AllocateKernelScopeStruct) {
1194 GlobalVariable *KernelStruct = Replacement->second.SGV;
1195 Offset = alignTo(Offset, AMDGPU::getAlign(DL, KernelStruct));
1196 recordLDSAbsoluteAddress(&M, KernelStruct, Offset);
1197 Offset += DL.getTypeAllocSize(KernelStruct->getValueType());
1198 }
1199
1200 // If there is dynamic allocation, the alignment needed is included in
1201 // the static frame size. There may be no reference to the dynamic
1202 // variable in the kernel itself, so without including it here, that
1203 // alignment padding could be missed.
1204 if (AllocateDynamicVariable) {
1205 GlobalVariable *DynamicVariable = KernelToCreatedDynamicLDS[&Func];
1206 Offset = alignTo(Offset, AMDGPU::getAlign(DL, DynamicVariable));
1207 recordLDSAbsoluteAddress(&M, DynamicVariable, Offset);
1208 }
1209
1210 if (Offset != 0) {
1211 (void)TM; // TODO: Account for target maximum LDS
1212 std::string Buffer;
1213 raw_string_ostream SS{Buffer};
1214 SS << format("%u", Offset);
1215
1216 // Instead of explicitly marking kernels that access dynamic variables
1217 // using special case metadata, annotate with min-lds == max-lds, i.e.
1218 // that there is no more space available for allocating more static
1219 // LDS variables. That is the right condition to prevent allocating
1220 // more variables which would collide with the addresses assigned to
1221 // dynamic variables.
1222 if (AllocateDynamicVariable)
1223 SS << format(",%u", Offset);
1224
1225 Func.addFnAttr("amdgpu-lds-size", Buffer);
1226 }
1227 }
1228 }
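 // For instance (sketch), a kernel whose static LDS frame ends at byte 400
 // gets "amdgpu-lds-size"="400", or "400,400" if it also allocates dynamic
 // LDS, i.e. min == max so no further static LDS may be placed behind it.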
1229
1230 for (auto &GV : make_early_inc_range(M.globals()))
1231 if (AMDGPU::isLDSVariableToLower(GV)) {
1232 // probably want to remove from used lists
1233 GV.removeDeadConstantUsers();
1234 if (GV.use_empty())
1235 GV.eraseFromParent();
1236 }
1237
1238 return Changed;
1239 }
1240
1241private:
1242 // Increase the alignment of LDS globals if necessary to maximise the chance
1243 // that we can use aligned LDS instructions to access them.
1244 static bool superAlignLDSGlobals(Module &M) {
1245 const DataLayout &DL = M.getDataLayout();
1246 bool Changed = false;
1247 if (!SuperAlignLDSGlobals) {
1248 return Changed;
1249 }
1250
1251 for (auto &GV : M.globals()) {
1252 if (!AMDGPU::isLDSVariableToLower(GV)) {
1253 // Only changing alignment of LDS variables
1254 continue;
1255 }
1256 if (!GV.hasInitializer()) {
1257 // cuda/hip extern __shared__ variable, leave alignment alone
1258 continue;
1259 }
1260
1261 if (GV.isAbsoluteSymbolRef()) {
1262 // If the variable is already allocated, don't change the alignment
1263 continue;
1264 }
1265
1266 Align Alignment = AMDGPU::getAlign(DL, &GV);
1267 TypeSize GVSize = DL.getTypeAllocSize(GV.getValueType());
1268
1269 if (GVSize > 8) {
1270 // We might want to use a b96 or b128 load/store
1271 Alignment = std::max(Alignment, Align(16));
1272 } else if (GVSize > 4) {
1273 // We might want to use a b64 load/store
1274 Alignment = std::max(Alignment, Align(8));
1275 } else if (GVSize > 2) {
1276 // We might want to use a b32 load/store
1277 Alignment = std::max(Alignment, Align(4));
1278 } else if (GVSize > 1) {
1279 // We might want to use a b16 load/store
1280 Alignment = std::max(Alignment, Align(2));
1281 }
1282
1283 if (Alignment != AMDGPU::getAlign(DL, &GV)) {
1284 Changed = true;
1285 GV.setAlignment(Alignment);
1286 }
1287 }
1288 return Changed;
1289 }
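 // For example (sketch), superAlignLDSGlobals raises a 12 byte LDS array with
 // natural alignment 4 to align 16 so a single b96/b128 access can be formed,
 // while 8, 4 and 2 byte variables are raised to align 8, 4 and 2 respectively.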
1290
1291 static LDSVariableReplacement createLDSVariableReplacement(
1292 Module &M, std::string VarName,
1293 DenseSet<GlobalVariable *> const &LDSVarsToTransform) {
1294 // Create a struct instance containing LDSVarsToTransform and map from those
1295 // variables to ConstantExprGEP
1296 // Variables may be introduced to meet alignment requirements. No aliasing
1297 // metadata is useful for these as they have no uses. Erased before return.
1298
1299 LLVMContext &Ctx = M.getContext();
1300 const DataLayout &DL = M.getDataLayout();
1301 assert(!LDSVarsToTransform.empty());
1302
1303 SmallVector<OptimizedStructLayoutField, 8> LayoutFields;
1304 LayoutFields.reserve(LDSVarsToTransform.size());
1305 {
1306 // The order of fields in this struct depends on the order of
1307 // variables in the argument which varies when changing how they
1308 // are identified, leading to spurious test breakage.
1309 auto Sorted = sortByName(std::vector<GlobalVariable *>(
1310 LDSVarsToTransform.begin(), LDSVarsToTransform.end()));
1311
1312 for (GlobalVariable *GV : Sorted) {
1313 OptimizedStructLayoutField F(GV,
1314 DL.getTypeAllocSize(GV->getValueType()),
1315 AMDGPU::getAlign(DL, GV));
1316 LayoutFields.emplace_back(F);
1317 }
1318 }
1319
1320 performOptimizedStructLayout(LayoutFields);
1321
1322 std::vector<GlobalVariable *> LocalVars;
1323 BitVector IsPaddingField;
1324 LocalVars.reserve(LDSVarsToTransform.size()); // will be at least this large
1325 IsPaddingField.reserve(LDSVarsToTransform.size());
1326 {
1327 uint64_t CurrentOffset = 0;
1328 for (auto &F : LayoutFields) {
1329 GlobalVariable *FGV =
1330 static_cast<GlobalVariable *>(const_cast<void *>(F.Id));
1331 Align DataAlign = F.Alignment;
1332
1333 uint64_t DataAlignV = DataAlign.value();
1334 if (uint64_t Rem = CurrentOffset % DataAlignV) {
1335 uint64_t Padding = DataAlignV - Rem;
1336
1337 // Append an array of padding bytes to meet alignment requested
1338 // Note (o + (a - (o % a)) ) % a == 0
1339 // (offset + Padding ) % align == 0
1340
1341 Type *ATy = ArrayType::get(Type::getInt8Ty(Ctx), Padding);
1342 LocalVars.push_back(new GlobalVariable(
1343 M, ATy, false, GlobalValue::InternalLinkage,
1344 PoisonValue::get(ATy), "", nullptr, GlobalValue::NotThreadLocal,
1345 AMDGPUAS::LOCAL_ADDRESS, false));
1346 IsPaddingField.push_back(true);
1347 CurrentOffset += Padding;
1348 }
1349
1350 LocalVars.push_back(FGV);
1351 IsPaddingField.push_back(false);
1352 CurrentOffset += F.Size;
1353 }
1354 }
1355
1356 std::vector<Type *> LocalVarTypes;
1357 LocalVarTypes.reserve(LocalVars.size());
1358 std::transform(
1359 LocalVars.cbegin(), LocalVars.cend(), std::back_inserter(LocalVarTypes),
1360 [](const GlobalVariable *V) -> Type * { return V->getValueType(); });
1361
1362 StructType *LDSTy = StructType::create(Ctx, LocalVarTypes, VarName + ".t");
1363
1364 Align StructAlign = AMDGPU::getAlign(DL, LocalVars[0]);
1365
1366 GlobalVariable *SGV = new GlobalVariable(
1367 M, LDSTy, false, GlobalValue::InternalLinkage, PoisonValue::get(LDSTy),
1368 VarName, nullptr, GlobalValue::NotThreadLocal, AMDGPUAS::LOCAL_ADDRESS,
1369 false);
1370 SGV->setAlignment(StructAlign);
1371
1372 DenseMap<GlobalVariable *, Constant *> Map;
1373 Type *I32 = Type::getInt32Ty(Ctx);
1374 for (size_t I = 0; I < LocalVars.size(); I++) {
1375 GlobalVariable *GV = LocalVars[I];
1376 Constant *GEPIdx[] = {ConstantInt::get(I32, 0), ConstantInt::get(I32, I)};
1377 Constant *GEP = ConstantExpr::getGetElementPtr(LDSTy, SGV, GEPIdx, true);
1378 if (IsPaddingField[I]) {
1379 assert(GV->use_empty());
1380 GV->eraseFromParent();
1381 } else {
1382 Map[GV] = GEP;
1383 }
1384 }
1385 assert(Map.size() == LDSVarsToTransform.size());
1386 return {SGV, std::move(Map)};
1387 }
1388
1389 template <typename PredicateTy>
1390 static void replaceLDSVariablesWithStruct(
1391 Module &M, DenseSet<GlobalVariable *> const &LDSVarsToTransformArg,
1392 const LDSVariableReplacement &Replacement, PredicateTy Predicate) {
1393 LLVMContext &Ctx = M.getContext();
1394 const DataLayout &DL = M.getDataLayout();
1395
1396 // A hack... we need to insert the aliasing info in a predictable order for
1397 // lit tests. Would like to have them in a stable order already, ideally the
1398 // same order they get allocated, which might mean an ordered set container
1399 auto LDSVarsToTransform = sortByName(std::vector<GlobalVariable *>(
1400 LDSVarsToTransformArg.begin(), LDSVarsToTransformArg.end()));
1401
1402 // Create alias.scope and their lists. Each field in the new structure
1403 // does not alias with all other fields.
1404 SmallVector<MDNode *> AliasScopes;
1405 SmallVector<Metadata *> NoAliasList;
1406 const size_t NumberVars = LDSVarsToTransform.size();
1407 if (NumberVars > 1) {
1408 MDBuilder MDB(Ctx);
1409 AliasScopes.reserve(NumberVars);
1410 MDNode *Domain = MDB.createAnonymousAliasScopeDomain();
1411 for (size_t I = 0; I < NumberVars; I++) {
1412 MDNode *Scope = MDB.createAnonymousAliasScope(Domain);
1413 AliasScopes.push_back(Scope);
1414 }
1415 NoAliasList.append(&AliasScopes[1], AliasScopes.end());
1416 }
1417
1418 // Replace uses of ith variable with a constantexpr to the corresponding
1419 // field of the instance that will be allocated by AMDGPUMachineFunction
1420 for (size_t I = 0; I < NumberVars; I++) {
1421 GlobalVariable *GV = LDSVarsToTransform[I];
1422 Constant *GEP = Replacement.LDSVarsToConstantGEP.at(GV);
1423
1425
1424 GV->replaceUsesWithIf(GEP, Predicate);
1426 APInt APOff(DL.getIndexTypeSizeInBits(GEP->getType()), 0);
1427 GEP->stripAndAccumulateInBoundsConstantOffsets(DL, APOff);
1428 uint64_t Offset = APOff.getZExtValue();
1429
1430 Align A =
1431 commonAlignment(Replacement.SGV->getAlign().valueOrOne(), Offset);
1432
1433 if (I)
1434 NoAliasList[I - 1] = AliasScopes[I - 1];
1435 MDNode *NoAlias =
1436 NoAliasList.empty() ? nullptr : MDNode::get(Ctx, NoAliasList);
1437 MDNode *AliasScope =
1438 AliasScopes.empty() ? nullptr : MDNode::get(Ctx, {AliasScopes[I]});
1439
1440 refineUsesAlignmentAndAA(GEP, A, DL, AliasScope, NoAlias);
1441 }
1442 }
1443
1444 static void refineUsesAlignmentAndAA(Value *Ptr, Align A,
1445 const DataLayout &DL, MDNode *AliasScope,
1446 MDNode *NoAlias, unsigned MaxDepth = 5) {
1447 if (!MaxDepth || (A == 1 && !AliasScope))
1448 return;
1449
1450 ScopedNoAliasAAResult ScopedNoAlias;
1451
1452 for (User *U : Ptr->users()) {
1453 if (auto *I = dyn_cast<Instruction>(U)) {
1454 if (AliasScope && I->mayReadOrWriteMemory()) {
1455 MDNode *AS = I->getMetadata(LLVMContext::MD_alias_scope);
1456 AS = (AS ? MDNode::getMostGenericAliasScope(AS, AliasScope)
1457 : AliasScope);
1458 I->setMetadata(LLVMContext::MD_alias_scope, AS);
1459
1460 MDNode *NA = I->getMetadata(LLVMContext::MD_noalias);
1461
1462 // Scoped aliases can originate from two different domains.
1463 // First domain would be from LDS domain (created by this pass).
1464 // All entries (LDS vars) into LDS struct will have same domain.
1465
1466 // Second domain could be existing scoped aliases that are the
1467 // results of noalias params and subsequent optimizations that
1468 // may alter these sets.
1469
1470 // We need to be careful how we create new alias sets, and
1471 // have right scopes and domains for loads/stores of these new
1472 // LDS variables. We intersect NoAlias set if alias sets belong
1473 // to the same domain. This is the case if we have memcpy using
1474 // LDS variables. Both src and dst of memcpy would belong to
1475 // LDS struct, they do not alias.
1476 // On the other hand, if one of the domains is LDS and other is
1477 // existing domain prior to LDS, we need to have a union of all
1478 // these aliases set to preserve existing aliasing information.
1479
1480 SmallPtrSet<const MDNode *, 16> ExistingDomains, LDSDomains;
1481 ScopedNoAlias.collectScopedDomains(NA, ExistingDomains);
1482 ScopedNoAlias.collectScopedDomains(NoAlias, LDSDomains);
1483 auto Intersection = set_intersection(ExistingDomains, LDSDomains);
1484 if (Intersection.empty()) {
1485 NA = NA ? MDNode::concatenate(NA, NoAlias) : NoAlias;
1486 } else {
1487 NA = NA ? MDNode::intersect(NA, NoAlias) : NoAlias;
1488 }
1489 I->setMetadata(LLVMContext::MD_noalias, NA);
1490 }
1491 }
1492
1493 if (auto *LI = dyn_cast<LoadInst>(U)) {
1494 LI->setAlignment(std::max(A, LI->getAlign()));
1495 continue;
1496 }
1497 if (auto *SI = dyn_cast<StoreInst>(U)) {
1498 if (SI->getPointerOperand() == Ptr)
1499 SI->setAlignment(std::max(A, SI->getAlign()));
1500 continue;
1501 }
1502 if (auto *AI = dyn_cast<AtomicRMWInst>(U)) {
1503 // None of the atomicrmw operations can work on pointers, but check it
1504 // anyway in case that changes or we end up processing a ConstantExpr.
1505 if (AI->getPointerOperand() == Ptr)
1506 AI->setAlignment(std::max(A, AI->getAlign()));
1507 continue;
1508 }
1509 if (auto *AI = dyn_cast<AtomicCmpXchgInst>(U)) {
1510 if (AI->getPointerOperand() == Ptr)
1511 AI->setAlignment(std::max(A, AI->getAlign()));
1512 continue;
1513 }
1514 if (auto *GEP = dyn_cast<GetElementPtrInst>(U)) {
1515 unsigned BitWidth = DL.getIndexTypeSizeInBits(GEP->getType());
1516 APInt Off(BitWidth, 0);
1517 if (GEP->getPointerOperand() == Ptr) {
1518 Align GA;
1519 if (GEP->accumulateConstantOffset(DL, Off))
1520 GA = commonAlignment(A, Off.getLimitedValue());
1521 refineUsesAlignmentAndAA(GEP, GA, DL, AliasScope, NoAlias,
1522 MaxDepth - 1);
1523 }
1524 continue;
1525 }
1526 if (auto *I = dyn_cast<Instruction>(U)) {
1527 if (I->getOpcode() == Instruction::BitCast ||
1528 I->getOpcode() == Instruction::AddrSpaceCast)
1529 refineUsesAlignmentAndAA(I, A, DL, AliasScope, NoAlias, MaxDepth - 1);
1530 }
1531 }
1532 }
1533};
1534
1535class AMDGPULowerModuleLDSLegacy : public ModulePass {
1536public:
1537 const AMDGPUTargetMachine *TM;
1538 static char ID;
1539
1540 AMDGPULowerModuleLDSLegacy(const AMDGPUTargetMachine *TM = nullptr)
1541 : ModulePass(ID), TM(TM) {}
1542
1543 void getAnalysisUsage(AnalysisUsage &AU) const override {
1544 if (!TM)
1545 AU.addRequired<TargetPassConfig>();
1546 }
1547
1548 bool runOnModule(Module &M) override {
1549 if (!TM) {
1550 auto &TPC = getAnalysis<TargetPassConfig>();
1551 TM = &TPC.getTM<AMDGPUTargetMachine>();
1552 }
1553
1554 return AMDGPULowerModuleLDS(*TM).runOnModule(M);
1555 }
1556};
1557
1558} // namespace
1559char AMDGPULowerModuleLDSLegacy::ID = 0;
1560
1561char &llvm::AMDGPULowerModuleLDSLegacyPassID = AMDGPULowerModuleLDSLegacy::ID;
1562
1563INITIALIZE_PASS_BEGIN(AMDGPULowerModuleLDSLegacy, DEBUG_TYPE,
1564 "Lower uses of LDS variables from non-kernel functions",
1565 false, false)
1566 INITIALIZE_PASS_DEPENDENCY(TargetPassConfig)
1567 INITIALIZE_PASS_END(AMDGPULowerModuleLDSLegacy, DEBUG_TYPE,
1568 "Lower uses of LDS variables from non-kernel functions",
1569 false, false)
1570
1571 ModulePass *
1572 llvm::createAMDGPULowerModuleLDSLegacyPass(const AMDGPUTargetMachine *TM) {
1573 return new AMDGPULowerModuleLDSLegacy(TM);
1574 }
1575
1576 PreservedAnalyses AMDGPULowerModuleLDSPass::run(Module &M,
1577 ModuleAnalysisManager &) {
1578 return AMDGPULowerModuleLDS(TM).runOnModule(M) ? PreservedAnalyses::none()
1579 : PreservedAnalyses::all();
1580 }