1//===-- AMDGPULowerBufferFatPointers.cpp ---------------------------=//
2//
3// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
4// See https://llvm.org/LICENSE.txt for license information.
5// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
6//
7//===----------------------------------------------------------------------===//
8//
9// This pass lowers operations on buffer fat pointers (addrspace 7) to
10// operations on buffer resources (addrspace 8) and is needed for correct
11// codegen.
12//
13// # Background
14//
15// Address space 7 (the buffer fat pointer) is a 160-bit pointer that consists
16// of a 128-bit buffer descriptor and a 32-bit offset into that descriptor.
17// The buffer resource part needs to be a "raw" buffer resource
18// (it must have a stride of 0 and bounds checks must be in raw buffer mode
19// or disabled).
20//
21// When these requirements are met, a buffer resource can be treated as a
22// typical (though quite wide) pointer that follows typical LLVM pointer
23// semantics. This allows the frontend to reason about such buffers (which are
24// often encountered in the context of SPIR-V kernels).
25//
26// However, because of their non-power-of-2 size, these fat pointers cannot be
27// present during translation to MIR (though this restriction may be lifted
28// during the transition to GlobalISel). Therefore, this pass is needed in order
29// to correctly implement these fat pointers.
30//
31// The resource intrinsics take the resource part (the address space 8 pointer)
32// and the offset part (the 32-bit integer) as separate arguments. In addition,
33// many users of these buffers manipulate the offset while leaving the resource
34// part alone. For these reasons, we want to typically separate the resource
35// and offset parts into separate variables, but combine them together when
36// encountering cases where this is required, such as by inserting these values
37// into aggregates or moving them to memory.
38//
39// Therefore, at a high level, `ptr addrspace(7) %x` becomes `ptr addrspace(8)
40// %x.rsrc` and `i32 %x.off`, which will be combined into `{ptr addrspace(8),
41// i32} %x = {%x.rsrc, %x.off}` if needed. Similarly, `vector<Nxp7>` becomes
42// `{vector<Nxp8>, vector<Nxi32>}` and its component parts.
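//
// For example (an illustrative sketch, not verbatim output of this pass), an
// offset-only pointer adjustment
// ```
// %q = getelementptr i8, ptr addrspace(7) %p, i32 4
// ```
// conceptually becomes arithmetic on the offset part alone:
// ```
// %q.off = add i32 %p.off, 4    ; %q.rsrc is simply %p.rsrc
// ```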
43//
44// # Implementation
45//
46// This pass proceeds in three main phases:
47//
48// ## Rewriting loads and stores of p7 and memcpy()-like handling
49//
50// The first phase is to rewrite away all loads and stores of `ptr addrspace(7)`,
51// including aggregates containing such pointers, to ones that use `i160`. This
52// is handled by `StoreFatPtrsAsIntsAndExpandMemcpyVisitor`, which visits
53// loads, stores, and allocas and, if the loaded or stored type contains `ptr
54// addrspace(7)`, rewrites that type to one where the p7s are replaced by i160s,
55// copying other parts of aggregates as needed. In the case of a store, each
56// pointer is `ptrtoint`d to i160 before storing, and load integers are
57// `inttoptr`d back. This same transformation is applied to vectors of pointers.
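//
// As a sketch (names are illustrative), a store of a fat pointer
// ```
// store ptr addrspace(7) %p, ptr addrspace(5) %slot
// ```
// becomes
// ```
// %p.int = ptrtoint ptr addrspace(7) %p to i160
// store i160 %p.int, ptr addrspace(5) %slot
// ```
// and the corresponding load performs the `inttoptr` in the other direction.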
58//
59// Such a transformation allows the later phases of the pass to not need
60// to handle buffer fat pointers moving to and from memory, where we would
61// have to handle the incompatibility between a `{Nxp8, Nxi32}` representation
62// and `Nxi160` directly. Instead, that transposing action (where the vectors
63// of resources and vectors of offsets are concatenated before being stored to
64// memory) is handled through implementing `inttoptr` and `ptrtoint` only.
65//
66// Atomic operations on `ptr addrspace(7)` values are not supported, as the
67// hardware does not include a 160-bit atomic.
68//
69// In order to save on O(N) work and to ensure that the contents type
70// legalizer correctly splits up wide loads, we also unconditionally lower
71// memcpy-like intrinsics into loops here.
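//
// For example (sketch), a `llvm.memcpy` call whose source or destination is a
// `ptr addrspace(7)` is rewritten here into an explicit loop of ordinary loads
// and stores, which the later phases then legalize like any other access.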
72//
73// ## Buffer contents type legalization
74//
75// The underlying buffer intrinsics only support types up to 128 bits long,
76// and don't support complex types. If buffer operations were
77// standard pointer operations that could be represented as MIR-level loads,
78// this would be handled by the various legalization schemes in instruction
79// selection. However, because we have to do the conversion from `load` and
80// `store` to intrinsics at LLVM IR level, we must perform that legalization
81// ourselves.
82//
83// This involves a combination of
84// - Converting arrays to vectors where possible
85// - Otherwise, splitting loads and stores of aggregates into loads/stores of
86// each component.
87// - Zero-extending things to fill a whole number of bytes
88// - Casting values of types that don't neatly correspond to supported machine
89// value types (for example, an i96 or i256) into ones that would work
90// (like <3 x i32> and <8 x i32>, respectively)
92// - Splitting values that are too long (such as aforementioned <8 x i32>) into
93// multiple operations.
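//
// As a concrete sketch of the last two points (illustrative, not verbatim pass
// output), a `load i256, ptr addrspace(7) %p` is first viewed as an
// `<8 x i32>` load and then split roughly as:
// ```
// %lo = load <4 x i32>, ptr addrspace(7) %p
// %p.hi = getelementptr i8, ptr addrspace(7) %p, i32 16
// %hi = load <4 x i32>, ptr addrspace(7) %p.hi
// ; ...the halves are then reassembled and bitcast back to i256.
// ```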
94//
95// ## Type remapping
96//
97// We use a `ValueMapper` to mangle uses of [vectors of] buffer fat pointers
98// to the corresponding struct type, which has a resource part and an offset
99// part.
100//
101// This uses a `BufferFatPtrToStructTypeMap` and a `FatPtrConstMaterializer`
102// to perform the remapping, usually by way of `setType`ing values. Constants
103// are handled here because there isn't a good way to fix them up later.
104//
105// This has the downside of leaving the IR in an invalid state (for example,
106// the instruction `getelementptr {ptr addrspace(8), i32} %p, ...` will exist),
107// but all such invalid states will be resolved by the third phase.
108//
109// Functions that don't take buffer fat pointers are modified in place. Those
110// that do take such pointers have their basic blocks moved to a new function
111// whose arguments and return values use {ptr addrspace(8), i32} instead.
112// This phase also records intrinsics so that they can be remangled or deleted
113// later.
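//
// For example (sketch), `define float @f(ptr addrspace(7) %p)` is re-created
// with a signature along the lines of
// `define float @f({ptr addrspace(8), i32} %p)`, and the original basic blocks
// are moved into the new function.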
114//
115// ## Splitting pointer structs
116//
117// The meat of this pass consists of defining semantics for operations that
118// produce or consume [vectors of] buffer fat pointers in terms of their
119// resource and offset parts. This is accomplished through the `SplitPtrStructs`
120// visitor.
121//
122// In the first pass through each function that is being lowered, the splitter
123// inserts new instructions to implement the split-structures behavior, which is
124// needed for correctness and performance. It records a list of "split users",
125// instructions that are being replaced by operations on the resource and offset
126// parts.
127//
128// Split users do not necessarily need to produce parts themselves
129// (a `load float, ptr addrspace(7)` does not, for example), but, if they do not
130// generate fat buffer pointers, they must RAUW in their replacement
131// instructions during the initial visit.
132//
133// When these new instructions are created, they use the split parts recorded
134// for their initial arguments in order to generate their replacements, creating
135// a parallel set of instructions that does not refer to the original fat
136// pointer values but instead to their resource and offset components.
137//
138// Instructions, such as `extractvalue`, that produce buffer fat pointers from
139// sources that do not have split parts, have such parts generated using
140// `extractvalue`. This is also the initial handling of PHI nodes, which
141// are then cleaned up.
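//
// For instance (sketch), if `%agg` is a remapped `{ptr addrspace(8), i32}`
// value with no recorded parts, the splitter materializes
// ```
// %agg.rsrc = extractvalue {ptr addrspace(8), i32} %agg, 0
// %agg.off = extractvalue {ptr addrspace(8), i32} %agg, 1
// ```
// and records those as the parts of `%agg`.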
142//
143// ### Conditionals
144//
145// PHI nodes are initially given resource parts via `extractvalue`. However,
146// this is not an efficient rewrite of such nodes, as, in most cases, the
147// resource part in a conditional or loop remains constant throughout the loop
148// and only the offset varies. Failing to optimize away these constant resources
149// would cause additional registers to be sent around loops and might lead to
150// waterfall loops being generated for buffer operations due to the
151// "non-uniform" resource argument.
152//
153// Therefore, after all instructions have been visited, the pointer splitter
154// post-processes all encountered conditionals. Given a PHI node or select,
155// getPossibleRsrcRoots() collects all values that the resource parts of that
156// conditional's input could come from as well as collecting all conditional
157// instructions encountered during the search. If, after filtering out the
158// initial node itself, the set of encountered conditionals is a subset of the
159// potential roots and there is a single potential resource that isn't in the
160// conditional set, that value is the only possible value the resource argument
161// could have throughout the control flow.
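//
// As a worked sketch: for a loop PHI `%p` whose incoming values are a
// preheader pointer `%base` and a GEP of `%p` itself, the roots are
// `{%base, %p}` and the seen set is `{%p}`; after removing `%p` from both
// sets, `%base` is the single remaining root, so its resource part can be used
// on every path.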
162//
163// If that condition is met, then a PHI node can have its resource part changed
164// to the singleton value and then be replaced by a PHI on the offsets.
165// Otherwise, each PHI node is split into two, one for the resource part and one
166// for the offset part, which replace the temporary `extractvalue` instructions
167// that were added during the first pass.
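//
// For example (sketch), a loop PHI
// ```
// %p = phi ptr addrspace(7) [ %init, %entry ], [ %next, %loop ]
// ```
// whose incoming values share a resource part is rewritten as a single offset
// PHI
// ```
// %p.off = phi i32 [ %init.off, %entry ], [ %next.off, %loop ]
// ```
// with `%p.rsrc` taken from the common resource value; otherwise a second PHI
// is emitted for the resource part.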
168//
169// Similar logic applies to `select`, where
170// `%z = select i1 %cond, ptr addrspace(7) %x, ptr addrspace(7) %y`
171// can be split into `%z.rsrc = %x.rsrc` and
172// `%z.off = select i1 %cond, i32 %x.off, i32 %y.off`
173// if both `%x` and `%y` have the same resource part, but two `select`
174// operations will be needed if they do not.
175//
176// ### Final processing
177//
178// After conditionals have been cleaned up, the IR for each function is
179// rewritten to remove all the old instructions that have been split up.
180//
181// Any instruction that used to produce a buffer fat pointer (and therefore now
182// produces a resource-and-offset struct after type remapping) is
183// replaced as follows:
184// 1. All debug value annotations are cloned to reflect that the resource part
185// and offset parts are computed separately and constitute different
186// fragments of the underlying source language variable.
187// 2. All uses that were themselves split are replaced by a `poison` of the
188// struct type, as they will themselves be erased soon. This rule, combined
189// with debug handling, should leave the use lists of split instructions
190// empty in almost all cases.
191// 3. If a user of the original struct-valued result remains, the structure
192// needed for the new types to work is constructed out of the newly-defined
193// parts, and the original instruction is replaced by this structure
194// before being erased. Instructions requiring this construction include
195// `ret` and `insertvalue`.
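//
// For example (sketch), a surviving `ret {ptr addrspace(8), i32} %v` user is
// satisfied by rebuilding the struct from the newly-defined parts:
// ```
// %v.0 = insertvalue {ptr addrspace(8), i32} poison, ptr addrspace(8) %v.rsrc, 0
// %v.1 = insertvalue {ptr addrspace(8), i32} %v.0, i32 %v.off, 1
// ret {ptr addrspace(8), i32} %v.1
// ```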
196//
197// # Consequences
198//
199// This pass does not alter the CFG.
200//
201// Alias analysis information will become coarser, as the LLVM alias analyzer
202// cannot handle the buffer intrinsics. Specifically, while we can determine
203// that the following two loads do not alias:
204// ```
205// %y = getelementptr i32, ptr addrspace(7) %x, i32 1
206// %a = load i32, ptr addrspace(7) %x
207// %b = load i32, ptr addrspace(7) %y
208// ```
209// we cannot (except through some code that runs during scheduling) determine
210// that the rewritten loads below do not alias.
211// ```
212// %y.off = add i32 %x.off, 1
213// %a = call @llvm.amdgcn.raw.ptr.buffer.load(ptr addrspace(8) %x.rsrc, i32
214// %x.off, ...)
215// %b = call @llvm.amdgcn.raw.ptr.buffer.load(ptr addrspace(8)
216// %x.rsrc, i32 %y.off, ...)
217// ```
218// However, existing alias information is preserved.
219//===----------------------------------------------------------------------===//
220
221#include "AMDGPU.h"
222#include "AMDGPUTargetMachine.h"
223#include "GCNSubtarget.h"
224#include "SIDefines.h"
226#include "llvm/ADT/SmallVector.h"
232#include "llvm/IR/Constants.h"
233#include "llvm/IR/DebugInfo.h"
234#include "llvm/IR/DerivedTypes.h"
235#include "llvm/IR/IRBuilder.h"
236#include "llvm/IR/InstIterator.h"
237#include "llvm/IR/InstVisitor.h"
238#include "llvm/IR/Instructions.h"
240#include "llvm/IR/Intrinsics.h"
241#include "llvm/IR/IntrinsicsAMDGPU.h"
242#include "llvm/IR/Metadata.h"
243#include "llvm/IR/Operator.h"
244#include "llvm/IR/PatternMatch.h"
246#include "llvm/IR/ValueHandle.h"
248#include "llvm/Pass.h"
252#include "llvm/Support/Debug.h"
258
259#define DEBUG_TYPE "amdgpu-lower-buffer-fat-pointers"
260
261using namespace llvm;
262
263static constexpr unsigned BufferOffsetWidth = 32;
264
265namespace {
266/// Recursively replace instances of ptr addrspace(7) and vector<Nxptr
267/// addrspace(7)> with some other type as defined by the relevant subclass.
268class BufferFatPtrTypeLoweringBase : public ValueMapTypeRemapper {
269 DenseMap<Type *, Type *> Map;
270
271 Type *remapTypeImpl(Type *Ty);
272
273protected:
274 virtual Type *remapScalar(PointerType *PT) = 0;
275 virtual Type *remapVector(VectorType *VT) = 0;
276
277 const DataLayout &DL;
278
279public:
280 BufferFatPtrTypeLoweringBase(const DataLayout &DL) : DL(DL) {}
281 Type *remapType(Type *SrcTy) override;
282 void clear() { Map.clear(); }
283};
284
285/// Remap ptr addrspace(7) to i160 and vector<Nxptr addrspace(7)> to
286/// vector<Nxi160> in order to correctly handle loading/storing these values
287/// from memory.
288class BufferFatPtrToIntTypeMap : public BufferFatPtrTypeLoweringBase {
289 using BufferFatPtrTypeLoweringBase::BufferFatPtrTypeLoweringBase;
290
291protected:
292 Type *remapScalar(PointerType *PT) override { return DL.getIntPtrType(PT); }
293 Type *remapVector(VectorType *VT) override { return DL.getIntPtrType(VT); }
294};
295
296/// Remap ptr addrspace(7) to {ptr addrspace(8), i32} (the resource and offset
297/// parts of the pointer) so that we can easily rewrite operations on these
298/// values that aren't loading them from or storing them to memory.
299class BufferFatPtrToStructTypeMap : public BufferFatPtrTypeLoweringBase {
300 using BufferFatPtrTypeLoweringBase::BufferFatPtrTypeLoweringBase;
301
302protected:
303 Type *remapScalar(PointerType *PT) override;
304 Type *remapVector(VectorType *VT) override;
305};
306} // namespace
307
308// This code is adapted from the type remapper in lib/Linker/IRMover.cpp
309Type *BufferFatPtrTypeLoweringBase::remapTypeImpl(Type *Ty) {
310 Type **Entry = &Map[Ty];
311 if (*Entry)
312 return *Entry;
313 if (auto *PT = dyn_cast<PointerType>(Ty)) {
314 if (PT->getAddressSpace() == AMDGPUAS::BUFFER_FAT_POINTER) {
315 return *Entry = remapScalar(PT);
316 }
317 }
318 if (auto *VT = dyn_cast<VectorType>(Ty)) {
319 auto *PT = dyn_cast<PointerType>(VT->getElementType());
320 if (PT && PT->getAddressSpace() == AMDGPUAS::BUFFER_FAT_POINTER) {
321 return *Entry = remapVector(VT);
322 }
323 return *Entry = Ty;
324 }
325 // Whether the type is one that is structurally uniqued - that is, if it is
326 // not a named struct (the only kind of type where multiple structurally
327 // identical types can have distinct `Type*`s)
328 StructType *TyAsStruct = dyn_cast<StructType>(Ty);
329 bool IsUniqued = !TyAsStruct || TyAsStruct->isLiteral();
330 // Base case for ints, floats, opaque pointers, and so on, which don't
331 // require recursion.
332 if (Ty->getNumContainedTypes() == 0 && IsUniqued)
333 return *Entry = Ty;
334 bool Changed = false;
335 SmallVector<Type *> ElementTypes(Ty->getNumContainedTypes(), nullptr);
336 for (unsigned int I = 0, E = Ty->getNumContainedTypes(); I < E; ++I) {
337 Type *OldElem = Ty->getContainedType(I);
338 Type *NewElem = remapTypeImpl(OldElem);
339 ElementTypes[I] = NewElem;
340 Changed |= (OldElem != NewElem);
341 }
342 // Recursive calls to remapTypeImpl() may have invalidated the `Entry` pointer.
343 Entry = &Map[Ty];
344 if (!Changed) {
345 return *Entry = Ty;
346 }
347 if (auto *ArrTy = dyn_cast<ArrayType>(Ty))
348 return *Entry = ArrayType::get(ElementTypes[0], ArrTy->getNumElements());
349 if (auto *FnTy = dyn_cast<FunctionType>(Ty))
350 return *Entry = FunctionType::get(ElementTypes[0],
351 ArrayRef(ElementTypes).slice(1),
352 FnTy->isVarArg());
353 if (auto *STy = dyn_cast<StructType>(Ty)) {
354 // Genuine opaque types don't have a remapping.
355 if (STy->isOpaque())
356 return *Entry = Ty;
357 bool IsPacked = STy->isPacked();
358 if (IsUniqued)
359 return *Entry = StructType::get(Ty->getContext(), ElementTypes, IsPacked);
360 SmallString<16> Name(STy->getName());
361 STy->setName("");
362 return *Entry = StructType::create(Ty->getContext(), ElementTypes, Name,
363 IsPacked);
364 }
365 llvm_unreachable("Unknown type of type that contains elements");
366}
367
368Type *BufferFatPtrTypeLoweringBase::remapType(Type *SrcTy) {
369 return remapTypeImpl(SrcTy);
370}
371
372Type *BufferFatPtrToStructTypeMap::remapScalar(PointerType *PT) {
373 LLVMContext &Ctx = PT->getContext();
374 return StructType::get(PointerType::get(Ctx, AMDGPUAS::BUFFER_RESOURCE),
375                         IntegerType::get(Ctx, BufferOffsetWidth));
376}
377
378Type *BufferFatPtrToStructTypeMap::remapVector(VectorType *VT) {
379 ElementCount EC = VT->getElementCount();
380 LLVMContext &Ctx = VT->getContext();
381 Type *RsrcVec =
382 VectorType::get(PointerType::get(Ctx, AMDGPUAS::BUFFER_RESOURCE), EC);
383 Type *OffVec = VectorType::get(IntegerType::get(Ctx, BufferOffsetWidth), EC);
384 return StructType::get(RsrcVec, OffVec);
385}
386
387static bool isBufferFatPtrOrVector(Type *Ty) {
388 if (auto *PT = dyn_cast<PointerType>(Ty->getScalarType()))
389 return PT->getAddressSpace() == AMDGPUAS::BUFFER_FAT_POINTER;
390 return false;
391}
392
393// True if the type is {ptr addrspace(8), i32} or a struct containing vectors of
394// those types. Used to quickly skip instructions we don't need to process.
395static bool isSplitFatPtr(Type *Ty) {
396 auto *ST = dyn_cast<StructType>(Ty);
397 if (!ST)
398 return false;
399 if (!ST->isLiteral() || ST->getNumElements() != 2)
400 return false;
401 auto *MaybeRsrc =
402 dyn_cast<PointerType>(ST->getElementType(0)->getScalarType());
403 auto *MaybeOff =
404 dyn_cast<IntegerType>(ST->getElementType(1)->getScalarType());
405 return MaybeRsrc && MaybeOff &&
406 MaybeRsrc->getAddressSpace() == AMDGPUAS::BUFFER_RESOURCE &&
407 MaybeOff->getBitWidth() == BufferOffsetWidth;
408}
409
410// True if the result type or any argument types are buffer fat pointers.
411static bool isBufferFatPtrConst(Constant *C) {
412 Type *T = C->getType();
413 return isBufferFatPtrOrVector(T) || any_of(C->operands(), [](const Use &U) {
414 return isBufferFatPtrOrVector(U.get()->getType());
415 });
416}
417
418namespace {
419/// Convert [vectors of] buffer fat pointers to integers when they are read from
420/// or stored to memory. This ensures that these pointers will have the same
421/// memory layout as before they are lowered, even though they will no longer
422/// have their previous layout in registers/in the program (they'll be broken
423/// down into resource and offset parts). This has the downside of imposing
424/// marshalling costs when reading or storing these values, but since placing
425/// such pointers into memory is an uncommon operation at best, we feel that
426/// this cost is acceptable for better performance in the common case.
427class StoreFatPtrsAsIntsAndExpandMemcpyVisitor
428 : public InstVisitor<StoreFatPtrsAsIntsAndExpandMemcpyVisitor, bool> {
429 BufferFatPtrToIntTypeMap *TypeMap;
430
431 ValueToValueMapTy ConvertedForStore;
432
433 IRBuilder<InstSimplifyFolder> IRB;
434
435 const TargetMachine *TM;
436
437 // Convert all the buffer fat pointers within the input value to integers
438 // so that it can be stored in memory.
439 Value *fatPtrsToInts(Value *V, Type *From, Type *To, const Twine &Name);
440 // Convert all the i160s that need to be buffer fat pointers (as specified
441 // by the To type) into those pointers to preserve the semantics of the rest
442 // of the program.
443 Value *intsToFatPtrs(Value *V, Type *From, Type *To, const Twine &Name);
444
445public:
446 StoreFatPtrsAsIntsAndExpandMemcpyVisitor(BufferFatPtrToIntTypeMap *TypeMap,
447 const DataLayout &DL,
448 LLVMContext &Ctx,
449 const TargetMachine *TM)
450 : TypeMap(TypeMap), IRB(Ctx, InstSimplifyFolder(DL)), TM(TM) {}
452 bool processFunction(Function &F);
453 bool visitInstruction(Instruction &I) { return false; }
454 bool visitAllocaInst(AllocaInst &I);
455 bool visitLoadInst(LoadInst &LI);
456 bool visitStoreInst(StoreInst &SI);
457 bool visitGetElementPtrInst(GetElementPtrInst &I);
458
459 bool visitMemCpyInst(MemCpyInst &MCI);
460 bool visitMemMoveInst(MemMoveInst &MMI);
461 bool visitMemSetInst(MemSetInst &MSI);
462 bool visitMemSetPatternInst(MemSetPatternInst &MSPI);
463};
464} // namespace
465
466Value *StoreFatPtrsAsIntsAndExpandMemcpyVisitor::fatPtrsToInts(
467 Value *V, Type *From, Type *To, const Twine &Name) {
468 if (From == To)
469 return V;
470 ValueToValueMapTy::iterator Find = ConvertedForStore.find(V);
471 if (Find != ConvertedForStore.end())
472 return Find->second;
473 if (isBufferFatPtrOrVector(From)) {
474 Value *Cast = IRB.CreatePtrToInt(V, To, Name + ".int");
475 ConvertedForStore[V] = Cast;
476 return Cast;
477 }
478 if (From->getNumContainedTypes() == 0)
479 return V;
480 // Structs, arrays, and other compound types.
481 Value *Ret = PoisonValue::get(To);
482 if (auto *AT = dyn_cast<ArrayType>(From)) {
483 Type *FromPart = AT->getArrayElementType();
484 Type *ToPart = cast<ArrayType>(To)->getElementType();
485 for (uint64_t I = 0, E = AT->getArrayNumElements(); I < E; ++I) {
486 Value *Field = IRB.CreateExtractValue(V, I);
487 Value *NewField =
488 fatPtrsToInts(Field, FromPart, ToPart, Name + "." + Twine(I));
489 Ret = IRB.CreateInsertValue(Ret, NewField, I);
490 }
491 } else {
492 for (auto [Idx, FromPart, ToPart] :
493 enumerate(From->subtypes(), To->subtypes())) {
494 Value *Field = IRB.CreateExtractValue(V, Idx);
495 Value *NewField =
496 fatPtrsToInts(Field, FromPart, ToPart, Name + "." + Twine(Idx));
497 Ret = IRB.CreateInsertValue(Ret, NewField, Idx);
498 }
499 }
500 ConvertedForStore[V] = Ret;
501 return Ret;
502}
503
504Value *StoreFatPtrsAsIntsAndExpandMemcpyVisitor::intsToFatPtrs(
505 Value *V, Type *From, Type *To, const Twine &Name) {
506 if (From == To)
507 return V;
508 if (isBufferFatPtrOrVector(To)) {
509 Value *Cast = IRB.CreateIntToPtr(V, To, Name + ".ptr");
510 return Cast;
511 }
512 if (From->getNumContainedTypes() == 0)
513 return V;
514 // Structs, arrays, and other compound types.
515 Value *Ret = PoisonValue::get(To);
516 if (auto *AT = dyn_cast<ArrayType>(From)) {
517 Type *FromPart = AT->getArrayElementType();
518 Type *ToPart = cast<ArrayType>(To)->getElementType();
519 for (uint64_t I = 0, E = AT->getArrayNumElements(); I < E; ++I) {
520 Value *Field = IRB.CreateExtractValue(V, I);
521 Value *NewField =
522 intsToFatPtrs(Field, FromPart, ToPart, Name + "." + Twine(I));
523 Ret = IRB.CreateInsertValue(Ret, NewField, I);
524 }
525 } else {
526 for (auto [Idx, FromPart, ToPart] :
527 enumerate(From->subtypes(), To->subtypes())) {
528 Value *Field = IRB.CreateExtractValue(V, Idx);
529 Value *NewField =
530 intsToFatPtrs(Field, FromPart, ToPart, Name + "." + Twine(Idx));
531 Ret = IRB.CreateInsertValue(Ret, NewField, Idx);
532 }
533 }
534 return Ret;
535}
536
537bool StoreFatPtrsAsIntsAndExpandMemcpyVisitor::processFunction(Function &F) {
538 bool Changed = false;
539 // Process memcpy-like instructions after the main iteration because they can
540 // invalidate iterators.
541 SmallVector<WeakTrackingVH> CanBecomeLoops;
542 for (Instruction &I : make_early_inc_range(instructions(F))) {
543 if (isa<MemTransferInst, MemSetInst, MemSetPatternInst>(I))
544 CanBecomeLoops.push_back(&I);
545 else
546 Changed |= visit(I);
547 }
548 for (WeakTrackingVH VH : make_early_inc_range(CanBecomeLoops)) {
549 Changed |= visit(cast<Instruction>(VH));
550 }
551 ConvertedForStore.clear();
552 return Changed;
553}
554
555bool StoreFatPtrsAsIntsAndExpandMemcpyVisitor::visitAllocaInst(AllocaInst &I) {
556 Type *Ty = I.getAllocatedType();
557 Type *NewTy = TypeMap->remapType(Ty);
558 if (Ty == NewTy)
559 return false;
560 I.setAllocatedType(NewTy);
561 return true;
562}
563
564bool StoreFatPtrsAsIntsAndExpandMemcpyVisitor::visitGetElementPtrInst(
565     GetElementPtrInst &I) {
566 Type *Ty = I.getSourceElementType();
567 Type *NewTy = TypeMap->remapType(Ty);
568 if (Ty == NewTy)
569 return false;
570 // We'll be rewriting the type `ptr addrspace(7)` out of existence soon, so
571 // make sure GEPs don't have different semantics with the new type.
572 I.setSourceElementType(NewTy);
573 I.setResultElementType(TypeMap->remapType(I.getResultElementType()));
574 return true;
575}
576
577bool StoreFatPtrsAsIntsAndExpandMemcpyVisitor::visitLoadInst(LoadInst &LI) {
578 Type *Ty = LI.getType();
579 Type *IntTy = TypeMap->remapType(Ty);
580 if (Ty == IntTy)
581 return false;
582
583 IRB.SetInsertPoint(&LI);
584 auto *NLI = cast<LoadInst>(LI.clone());
585 NLI->mutateType(IntTy);
586 NLI = IRB.Insert(NLI);
587 NLI->takeName(&LI);
588
589 Value *CastBack = intsToFatPtrs(NLI, IntTy, Ty, NLI->getName());
590 LI.replaceAllUsesWith(CastBack);
591 LI.eraseFromParent();
592 return true;
593}
594
595bool StoreFatPtrsAsIntsAndExpandMemcpyVisitor::visitStoreInst(StoreInst &SI) {
596 Value *V = SI.getValueOperand();
597 Type *Ty = V->getType();
598 Type *IntTy = TypeMap->remapType(Ty);
599 if (Ty == IntTy)
600 return false;
601
602 IRB.SetInsertPoint(&SI);
603 Value *IntV = fatPtrsToInts(V, Ty, IntTy, V->getName());
604 for (auto *Dbg : at::getDVRAssignmentMarkers(&SI))
605 Dbg->setRawLocation(ValueAsMetadata::get(IntV));
606
607 SI.setOperand(0, IntV);
608 return true;
609}
610
611bool StoreFatPtrsAsIntsAndExpandMemcpyVisitor::visitMemCpyInst(
612 MemCpyInst &MCI) {
613 // TODO: Allow memcpy.p7.p3 as a synonym for the direct-to-LDS copy, which'll
614 // need loop expansion here.
615   if (MCI.getSourceAddressSpace() != AMDGPUAS::BUFFER_FAT_POINTER &&
616       MCI.getDestAddressSpace() != AMDGPUAS::BUFFER_FAT_POINTER)
617     return false;
618   llvm::expandMemCpyAsLoop(&MCI,
619                            TM->getTargetTransformInfo(*MCI.getFunction()));
620 MCI.eraseFromParent();
621 return true;
622}
623
624bool StoreFatPtrsAsIntsAndExpandMemcpyVisitor::visitMemMoveInst(
625 MemMoveInst &MMI) {
626   if (MMI.getSourceAddressSpace() != AMDGPUAS::BUFFER_FAT_POINTER &&
627       MMI.getDestAddressSpace() != AMDGPUAS::BUFFER_FAT_POINTER)
628     return false;
629   reportFatalUsageError(
630       "memmove() on buffer descriptors is not implemented because pointer "
631       "comparison on buffer descriptors isn't implemented\n");
632}
633
634bool StoreFatPtrsAsIntsAndExpandMemcpyVisitor::visitMemSetInst(
635 MemSetInst &MSI) {
636   if (MSI.getDestAddressSpace() != AMDGPUAS::BUFFER_FAT_POINTER)
637     return false;
638   llvm::expandMemSetAsLoop(&MSI);
639   MSI.eraseFromParent();
640 return true;
641}
642
643bool StoreFatPtrsAsIntsAndExpandMemcpyVisitor::visitMemSetPatternInst(
644 MemSetPatternInst &MSPI) {
645   if (MSPI.getDestAddressSpace() != AMDGPUAS::BUFFER_FAT_POINTER)
646     return false;
647   llvm::expandMemSetPatternAsLoop(&MSPI);
648   MSPI.eraseFromParent();
649 return true;
650}
651
652namespace {
653/// Convert loads/stores of types that the buffer intrinsics can't handle into
654/// one or more such loads/stores that consist of legal types.
655///
656/// Do this by
657/// 1. Recursing into structs (and arrays that don't share a memory layout with
658/// vectors) since the intrinsics can't handle complex types.
659/// 2. Converting arrays of non-aggregate, byte-sized types into their
660/// corresponding vectors
661/// 3. Bitcasting unsupported types, namely overly-long scalars and byte
662/// vectors, into vectors of supported types.
663/// 4. Splitting up excessively long reads/writes into multiple operations.
664///
665/// Note that this doesn't handle complex data structures, but, in the future,
666/// the aggregate load splitter from SROA could be refactored to allow for that
667/// case.
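/// For example (an illustrative sketch): a `load [4 x float]` is handled as a
/// `load <4 x float>` whose result is rebuilt into an array with
/// `insertvalue`, while a `load {i32, float}` is split into one load per field
/// at the appropriate byte offsets.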
668class LegalizeBufferContentTypesVisitor
669 : public InstVisitor<LegalizeBufferContentTypesVisitor, bool> {
670 friend class InstVisitor<LegalizeBufferContentTypesVisitor, bool>;
671
672   IRBuilder<InstSimplifyFolder> IRB;
673
674 const DataLayout &DL;
675
676 /// If T is [N x U], where U is a scalar type, return the vector type
677 /// <N x U>, otherwise, return T.
678 Type *scalarArrayTypeAsVector(Type *MaybeArrayType);
679 Value *arrayToVector(Value *V, Type *TargetType, const Twine &Name);
680 Value *vectorToArray(Value *V, Type *OrigType, const Twine &Name);
681
682 /// Break up the loads of a struct into the loads of its components
683
684 /// Convert a vector or scalar type that can't be operated on by buffer
685 /// intrinsics to one that would be legal through bitcasts and/or truncation.
686 /// Uses the wider of i32, i16, or i8 where possible.
687 Type *legalNonAggregateFor(Type *T);
688 Value *makeLegalNonAggregate(Value *V, Type *TargetType, const Twine &Name);
689 Value *makeIllegalNonAggregate(Value *V, Type *OrigType, const Twine &Name);
690
691 struct VecSlice {
692 uint64_t Index = 0;
693 uint64_t Length = 0;
694 VecSlice() = delete;
695 // Needed for some Clangs
696 VecSlice(uint64_t Index, uint64_t Length) : Index(Index), Length(Length) {}
697 };
698 /// Return the [index, length] pairs into which `T` needs to be cut to form
699 /// legal buffer load or store operations. Clears `Slices`. Creates an empty
700 /// `Slices` for non-vector inputs and creates one slice if no slicing will be
701 /// needed.
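  /// For example (sketch), a <7 x i32> input is cut into a 4-element slice at
  /// index 0 and a 3-element (96-bit) slice at index 4.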
702 void getVecSlices(Type *T, SmallVectorImpl<VecSlice> &Slices);
703
704 Value *extractSlice(Value *Vec, VecSlice S, const Twine &Name);
705 Value *insertSlice(Value *Whole, Value *Part, VecSlice S, const Twine &Name);
706
707 /// In most cases, return `LegalType`. However, when given an input that would
708 /// normally be a legal type for the buffer intrinsics to return but that
709 /// isn't hooked up through SelectionDAG, return a type of the same width that
710 /// can be used with the relevant intrinsics. Specifically, handle the cases:
711 /// - <1 x T> => T for all T
712 /// - <N x i8> <=> i16, i32, 2xi32, 4xi32 (as needed)
713 /// - <N x T> where T is under 32 bits and the total size is 96 bits <=> <3 x
714 /// i32>
715 Type *intrinsicTypeFor(Type *LegalType);
716
717 bool visitLoadImpl(LoadInst &OrigLI, Type *PartType,
718 SmallVectorImpl<uint32_t> &AggIdxs, uint64_t AggByteOffset,
719 Value *&Result, const Twine &Name);
720 /// Return value is (Changed, ModifiedInPlace)
721 std::pair<bool, bool> visitStoreImpl(StoreInst &OrigSI, Type *PartType,
722                                        SmallVectorImpl<uint32_t> &AggIdxs,
723                                        uint64_t AggByteOffset,
724 const Twine &Name);
725
726 bool visitInstruction(Instruction &I) { return false; }
727 bool visitLoadInst(LoadInst &LI);
728 bool visitStoreInst(StoreInst &SI);
729
730public:
731 LegalizeBufferContentTypesVisitor(const DataLayout &DL, LLVMContext &Ctx)
732 : IRB(Ctx, InstSimplifyFolder(DL)), DL(DL) {}
733   bool processFunction(Function &F);
734};
735} // namespace
736
737Type *LegalizeBufferContentTypesVisitor::scalarArrayTypeAsVector(Type *T) {
738 ArrayType *AT = dyn_cast<ArrayType>(T);
739 if (!AT)
740 return T;
741 Type *ET = AT->getElementType();
742 if (!ET->isSingleValueType() || isa<VectorType>(ET))
743 reportFatalUsageError("loading non-scalar arrays from buffer fat pointers "
744 "should have recursed");
745 if (!DL.typeSizeEqualsStoreSize(AT))
747 "loading padded arrays from buffer fat pinters should have recursed");
748 return FixedVectorType::get(ET, AT->getNumElements());
749}
750
751Value *LegalizeBufferContentTypesVisitor::arrayToVector(Value *V,
752 Type *TargetType,
753 const Twine &Name) {
754 Value *VectorRes = PoisonValue::get(TargetType);
755 auto *VT = cast<FixedVectorType>(TargetType);
756 unsigned EC = VT->getNumElements();
757 for (auto I : iota_range<unsigned>(0, EC, /*Inclusive=*/false)) {
758 Value *Elem = IRB.CreateExtractValue(V, I, Name + ".elem." + Twine(I));
759 VectorRes = IRB.CreateInsertElement(VectorRes, Elem, I,
760 Name + ".as.vec." + Twine(I));
761 }
762 return VectorRes;
763}
764
765Value *LegalizeBufferContentTypesVisitor::vectorToArray(Value *V,
766 Type *OrigType,
767 const Twine &Name) {
768 Value *ArrayRes = PoisonValue::get(OrigType);
769 ArrayType *AT = cast<ArrayType>(OrigType);
770 unsigned EC = AT->getNumElements();
771 for (auto I : iota_range<unsigned>(0, EC, /*Inclusive=*/false)) {
772 Value *Elem = IRB.CreateExtractElement(V, I, Name + ".elem." + Twine(I));
773 ArrayRes = IRB.CreateInsertValue(ArrayRes, Elem, I,
774 Name + ".as.array." + Twine(I));
775 }
776 return ArrayRes;
777}
778
779Type *LegalizeBufferContentTypesVisitor::legalNonAggregateFor(Type *T) {
780 TypeSize Size = DL.getTypeStoreSizeInBits(T);
781 // Implicitly zero-extend to the next byte if needed
782 if (!DL.typeSizeEqualsStoreSize(T))
783 T = IRB.getIntNTy(Size.getFixedValue());
784 Type *ElemTy = T->getScalarType();
785 if (isa<PointerType, ScalableVectorType>(ElemTy)) {
786 // Pointers are always big enough, and we'll let scalable vectors through to
787 // fail in codegen.
788 return T;
789 }
790 unsigned ElemSize = DL.getTypeSizeInBits(ElemTy).getFixedValue();
791 if (isPowerOf2_32(ElemSize) && ElemSize >= 16 && ElemSize <= 128) {
792 // [vectors of] anything that's 16/32/64/128 bits can be cast and split into
793 // legal buffer operations.
794 return T;
795 }
796 Type *BestVectorElemType = nullptr;
797 if (Size.isKnownMultipleOf(32))
798 BestVectorElemType = IRB.getInt32Ty();
799 else if (Size.isKnownMultipleOf(16))
800 BestVectorElemType = IRB.getInt16Ty();
801 else
802 BestVectorElemType = IRB.getInt8Ty();
803 unsigned NumCastElems =
804 Size.getFixedValue() / BestVectorElemType->getIntegerBitWidth();
805 if (NumCastElems == 1)
806 return BestVectorElemType;
807 return FixedVectorType::get(BestVectorElemType, NumCastElems);
808}
809
810Value *LegalizeBufferContentTypesVisitor::makeLegalNonAggregate(
811 Value *V, Type *TargetType, const Twine &Name) {
812 Type *SourceType = V->getType();
813 TypeSize SourceSize = DL.getTypeSizeInBits(SourceType);
814 TypeSize TargetSize = DL.getTypeSizeInBits(TargetType);
815 if (SourceSize != TargetSize) {
816 Type *ShortScalarTy = IRB.getIntNTy(SourceSize.getFixedValue());
817 Type *ByteScalarTy = IRB.getIntNTy(TargetSize.getFixedValue());
818 Value *AsScalar = IRB.CreateBitCast(V, ShortScalarTy, Name + ".as.scalar");
819 Value *Zext = IRB.CreateZExt(AsScalar, ByteScalarTy, Name + ".zext");
820 V = Zext;
821 SourceType = ByteScalarTy;
822 }
823 return IRB.CreateBitCast(V, TargetType, Name + ".legal");
824}
825
826Value *LegalizeBufferContentTypesVisitor::makeIllegalNonAggregate(
827 Value *V, Type *OrigType, const Twine &Name) {
828 Type *LegalType = V->getType();
829 TypeSize LegalSize = DL.getTypeSizeInBits(LegalType);
830 TypeSize OrigSize = DL.getTypeSizeInBits(OrigType);
831 if (LegalSize != OrigSize) {
832 Type *ShortScalarTy = IRB.getIntNTy(OrigSize.getFixedValue());
833 Type *ByteScalarTy = IRB.getIntNTy(LegalSize.getFixedValue());
834 Value *AsScalar = IRB.CreateBitCast(V, ByteScalarTy, Name + ".bytes.cast");
835 Value *Trunc = IRB.CreateTrunc(AsScalar, ShortScalarTy, Name + ".trunc");
836 return IRB.CreateBitCast(Trunc, OrigType, Name + ".orig");
837 }
838 return IRB.CreateBitCast(V, OrigType, Name + ".real.ty");
839}
840
841Type *LegalizeBufferContentTypesVisitor::intrinsicTypeFor(Type *LegalType) {
842 auto *VT = dyn_cast<FixedVectorType>(LegalType);
843 if (!VT)
844 return LegalType;
845 Type *ET = VT->getElementType();
846 // Explicitly return the element type of 1-element vectors because the
847 // underlying intrinsics don't like <1 x T> even though it's a synonym for T.
848 if (VT->getNumElements() == 1)
849 return ET;
850 if (DL.getTypeSizeInBits(LegalType) == 96 && DL.getTypeSizeInBits(ET) < 32)
851 return FixedVectorType::get(IRB.getInt32Ty(), 3);
852 if (ET->isIntegerTy(8)) {
853 switch (VT->getNumElements()) {
854 default:
855 return LegalType; // Let it crash later
856 case 1:
857 return IRB.getInt8Ty();
858 case 2:
859 return IRB.getInt16Ty();
860 case 4:
861 return IRB.getInt32Ty();
862 case 8:
863 return FixedVectorType::get(IRB.getInt32Ty(), 2);
864 case 16:
865 return FixedVectorType::get(IRB.getInt32Ty(), 4);
866 }
867 }
868 return LegalType;
869}
870
871void LegalizeBufferContentTypesVisitor::getVecSlices(
872 Type *T, SmallVectorImpl<VecSlice> &Slices) {
873 Slices.clear();
874 auto *VT = dyn_cast<FixedVectorType>(T);
875 if (!VT)
876 return;
877
878 uint64_t ElemBitWidth =
879 DL.getTypeSizeInBits(VT->getElementType()).getFixedValue();
880
881 uint64_t ElemsPer4Words = 128 / ElemBitWidth;
882 uint64_t ElemsPer2Words = ElemsPer4Words / 2;
883 uint64_t ElemsPerWord = ElemsPer2Words / 2;
884 uint64_t ElemsPerShort = ElemsPerWord / 2;
885 uint64_t ElemsPerByte = ElemsPerShort / 2;
886 // If the elements evenly pack into 32-bit words, we can use 3-word stores,
887// such as for <6 x bfloat> or <3 x i32>, but we can't do this for, for
888 // example, <3 x i64>, since that's not slicing.
889 uint64_t ElemsPer3Words = ElemsPerWord * 3;
890
891 uint64_t TotalElems = VT->getNumElements();
892 uint64_t Index = 0;
893 auto TrySlice = [&](unsigned MaybeLen) {
894 if (MaybeLen > 0 && Index + MaybeLen <= TotalElems) {
895 VecSlice Slice{/*Index=*/Index, /*Length=*/MaybeLen};
896 Slices.push_back(Slice);
897 Index += MaybeLen;
898 return true;
899 }
900 return false;
901 };
902 while (Index < TotalElems) {
903 TrySlice(ElemsPer4Words) || TrySlice(ElemsPer3Words) ||
904 TrySlice(ElemsPer2Words) || TrySlice(ElemsPerWord) ||
905 TrySlice(ElemsPerShort) || TrySlice(ElemsPerByte);
906 }
907}
908
909Value *LegalizeBufferContentTypesVisitor::extractSlice(Value *Vec, VecSlice S,
910 const Twine &Name) {
911 auto *VecVT = dyn_cast<FixedVectorType>(Vec->getType());
912 if (!VecVT)
913 return Vec;
914 if (S.Length == VecVT->getNumElements() && S.Index == 0)
915 return Vec;
916 if (S.Length == 1)
917 return IRB.CreateExtractElement(Vec, S.Index,
918 Name + ".slice." + Twine(S.Index));
919   SmallVector<int> Mask = llvm::to_vector(
920       llvm::iota_range<int>(S.Index, S.Index + S.Length, /*Inclusive=*/false));
921 return IRB.CreateShuffleVector(Vec, Mask, Name + ".slice." + Twine(S.Index));
922}
923
924Value *LegalizeBufferContentTypesVisitor::insertSlice(Value *Whole, Value *Part,
925 VecSlice S,
926 const Twine &Name) {
927 auto *WholeVT = dyn_cast<FixedVectorType>(Whole->getType());
928 if (!WholeVT)
929 return Part;
930 if (S.Length == WholeVT->getNumElements() && S.Index == 0)
931 return Part;
932 if (S.Length == 1) {
933 return IRB.CreateInsertElement(Whole, Part, S.Index,
934 Name + ".slice." + Twine(S.Index));
935 }
936 int NumElems = cast<FixedVectorType>(Whole->getType())->getNumElements();
937
938 // Extend the slice with poisons to make the main shufflevector happy.
939 SmallVector<int> ExtPartMask(NumElems, -1);
940 for (auto [I, E] : llvm::enumerate(
941 MutableArrayRef<int>(ExtPartMask).take_front(S.Length))) {
942 E = I;
943 }
944 Value *ExtPart = IRB.CreateShuffleVector(Part, ExtPartMask,
945 Name + ".ext." + Twine(S.Index));
946
947   SmallVector<int> Mask =
948       llvm::to_vector(llvm::iota_range<int>(0, NumElems, /*Inclusive=*/false));
949 for (auto [I, E] :
950 llvm::enumerate(MutableArrayRef<int>(Mask).slice(S.Index, S.Length)))
951 E = I + NumElems;
952 return IRB.CreateShuffleVector(Whole, ExtPart, Mask,
953 Name + ".parts." + Twine(S.Index));
954}
955
956bool LegalizeBufferContentTypesVisitor::visitLoadImpl(
957 LoadInst &OrigLI, Type *PartType, SmallVectorImpl<uint32_t> &AggIdxs,
958 uint64_t AggByteOff, Value *&Result, const Twine &Name) {
959 if (auto *ST = dyn_cast<StructType>(PartType)) {
960 const StructLayout *Layout = DL.getStructLayout(ST);
961 bool Changed = false;
962 for (auto [I, ElemTy, Offset] :
963 llvm::enumerate(ST->elements(), Layout->getMemberOffsets())) {
964 AggIdxs.push_back(I);
965 Changed |= visitLoadImpl(OrigLI, ElemTy, AggIdxs,
966 AggByteOff + Offset.getFixedValue(), Result,
967 Name + "." + Twine(I));
968 AggIdxs.pop_back();
969 }
970 return Changed;
971 }
972 if (auto *AT = dyn_cast<ArrayType>(PartType)) {
973 Type *ElemTy = AT->getElementType();
974 if (!ElemTy->isSingleValueType() || !DL.typeSizeEqualsStoreSize(ElemTy) ||
975 ElemTy->isVectorTy()) {
976 TypeSize ElemStoreSize = DL.getTypeStoreSize(ElemTy);
977 bool Changed = false;
978 for (auto I : llvm::iota_range<uint32_t>(0, AT->getNumElements(),
979 /*Inclusive=*/false)) {
980 AggIdxs.push_back(I);
981 Changed |= visitLoadImpl(OrigLI, ElemTy, AggIdxs,
982 AggByteOff + I * ElemStoreSize.getFixedValue(),
983 Result, Name + Twine(I));
984 AggIdxs.pop_back();
985 }
986 return Changed;
987 }
988 }
989
990 // Typical case
991
992 Type *ArrayAsVecType = scalarArrayTypeAsVector(PartType);
993 Type *LegalType = legalNonAggregateFor(ArrayAsVecType);
994
995   SmallVector<VecSlice> Slices;
996   getVecSlices(LegalType, Slices);
997 bool HasSlices = Slices.size() > 1;
998 bool IsAggPart = !AggIdxs.empty();
999 Value *LoadsRes;
1000 if (!HasSlices && !IsAggPart) {
1001 Type *LoadableType = intrinsicTypeFor(LegalType);
1002 if (LoadableType == PartType)
1003 return false;
1004
1005 IRB.SetInsertPoint(&OrigLI);
1006 auto *NLI = cast<LoadInst>(OrigLI.clone());
1007 NLI->mutateType(LoadableType);
1008 NLI = IRB.Insert(NLI);
1009 NLI->setName(Name + ".loadable");
1010
1011 LoadsRes = IRB.CreateBitCast(NLI, LegalType, Name + ".from.loadable");
1012 } else {
1013 IRB.SetInsertPoint(&OrigLI);
1014 LoadsRes = PoisonValue::get(LegalType);
1015 Value *OrigPtr = OrigLI.getPointerOperand();
1016 // If we need to split something into more than one load, its legal
1017 // type will be a vector (ex. an i256 load will have LegalType = <8 x i32>).
1018 // But if we're already a scalar (which can happen if we're splitting up a
1019 // struct), the element type will be the legal type itself.
1020 Type *ElemType = LegalType->getScalarType();
1021 unsigned ElemBytes = DL.getTypeStoreSize(ElemType);
1022 AAMDNodes AANodes = OrigLI.getAAMetadata();
1023 if (IsAggPart && Slices.empty())
1024 Slices.push_back(VecSlice{/*Index=*/0, /*Length=*/1});
1025 for (VecSlice S : Slices) {
1026 Type *SliceType =
1027 S.Length != 1 ? FixedVectorType::get(ElemType, S.Length) : ElemType;
1028 int64_t ByteOffset = AggByteOff + S.Index * ElemBytes;
1029 // You can't reasonably expect loads to wrap around the edge of memory.
1030 Value *NewPtr = IRB.CreateGEP(
1031 IRB.getInt8Ty(), OrigLI.getPointerOperand(), IRB.getInt32(ByteOffset),
1032 OrigPtr->getName() + ".off.ptr." + Twine(ByteOffset),
1033         GEPNoWrapFlags::noUnsignedWrap());
1034     Type *LoadableType = intrinsicTypeFor(SliceType);
1035 LoadInst *NewLI = IRB.CreateAlignedLoad(
1036 LoadableType, NewPtr, commonAlignment(OrigLI.getAlign(), ByteOffset),
1037 Name + ".off." + Twine(ByteOffset));
1038 copyMetadataForLoad(*NewLI, OrigLI);
1039 NewLI->setAAMetadata(
1040 AANodes.adjustForAccess(ByteOffset, LoadableType, DL));
1041 NewLI->setAtomic(OrigLI.getOrdering(), OrigLI.getSyncScopeID());
1042 NewLI->setVolatile(OrigLI.isVolatile());
1043 Value *Loaded = IRB.CreateBitCast(NewLI, SliceType,
1044 NewLI->getName() + ".from.loadable");
1045 LoadsRes = insertSlice(LoadsRes, Loaded, S, Name);
1046 }
1047 }
1048 if (LegalType != ArrayAsVecType)
1049 LoadsRes = makeIllegalNonAggregate(LoadsRes, ArrayAsVecType, Name);
1050 if (ArrayAsVecType != PartType)
1051 LoadsRes = vectorToArray(LoadsRes, PartType, Name);
1052
1053 if (IsAggPart)
1054 Result = IRB.CreateInsertValue(Result, LoadsRes, AggIdxs, Name);
1055 else
1056 Result = LoadsRes;
1057 return true;
1058}
1059
1060bool LegalizeBufferContentTypesVisitor::visitLoadInst(LoadInst &LI) {
1062 return false;
1063
1064 SmallVector<uint32_t> AggIdxs;
1065 Type *OrigType = LI.getType();
1066 Value *Result = PoisonValue::get(OrigType);
1067 bool Changed = visitLoadImpl(LI, OrigType, AggIdxs, 0, Result, LI.getName());
1068 if (!Changed)
1069 return false;
1070 Result->takeName(&LI);
1071 LI.replaceAllUsesWith(Result);
1072 LI.eraseFromParent();
1073 return Changed;
1074}
1075
1076std::pair<bool, bool> LegalizeBufferContentTypesVisitor::visitStoreImpl(
1077 StoreInst &OrigSI, Type *PartType, SmallVectorImpl<uint32_t> &AggIdxs,
1078 uint64_t AggByteOff, const Twine &Name) {
1079 if (auto *ST = dyn_cast<StructType>(PartType)) {
1080 const StructLayout *Layout = DL.getStructLayout(ST);
1081 bool Changed = false;
1082 for (auto [I, ElemTy, Offset] :
1083 llvm::enumerate(ST->elements(), Layout->getMemberOffsets())) {
1084 AggIdxs.push_back(I);
1085 Changed |= std::get<0>(visitStoreImpl(OrigSI, ElemTy, AggIdxs,
1086 AggByteOff + Offset.getFixedValue(),
1087 Name + "." + Twine(I)));
1088 AggIdxs.pop_back();
1089 }
1090 return std::make_pair(Changed, /*ModifiedInPlace=*/false);
1091 }
1092 if (auto *AT = dyn_cast<ArrayType>(PartType)) {
1093 Type *ElemTy = AT->getElementType();
1094 if (!ElemTy->isSingleValueType() || !DL.typeSizeEqualsStoreSize(ElemTy) ||
1095 ElemTy->isVectorTy()) {
1096 TypeSize ElemStoreSize = DL.getTypeStoreSize(ElemTy);
1097 bool Changed = false;
1098 for (auto I : llvm::iota_range<uint32_t>(0, AT->getNumElements(),
1099 /*Inclusive=*/false)) {
1100 AggIdxs.push_back(I);
1101 Changed |= std::get<0>(visitStoreImpl(
1102 OrigSI, ElemTy, AggIdxs,
1103 AggByteOff + I * ElemStoreSize.getFixedValue(), Name + Twine(I)));
1104 AggIdxs.pop_back();
1105 }
1106 return std::make_pair(Changed, /*ModifiedInPlace=*/false);
1107 }
1108 }
1109
1110 Value *OrigData = OrigSI.getValueOperand();
1111 Value *NewData = OrigData;
1112
1113 bool IsAggPart = !AggIdxs.empty();
1114 if (IsAggPart)
1115 NewData = IRB.CreateExtractValue(NewData, AggIdxs, Name);
1116
1117 Type *ArrayAsVecType = scalarArrayTypeAsVector(PartType);
1118 if (ArrayAsVecType != PartType) {
1119 NewData = arrayToVector(NewData, ArrayAsVecType, Name);
1120 }
1121
1122 Type *LegalType = legalNonAggregateFor(ArrayAsVecType);
1123 if (LegalType != ArrayAsVecType) {
1124 NewData = makeLegalNonAggregate(NewData, LegalType, Name);
1125 }
1126
1127 SmallVector<VecSlice> Slices;
1128 getVecSlices(LegalType, Slices);
1129 bool NeedToSplit = Slices.size() > 1 || IsAggPart;
1130 if (!NeedToSplit) {
1131 Type *StorableType = intrinsicTypeFor(LegalType);
1132 if (StorableType == PartType)
1133 return std::make_pair(/*Changed=*/false, /*ModifiedInPlace=*/false);
1134 NewData = IRB.CreateBitCast(NewData, StorableType, Name + ".storable");
1135 OrigSI.setOperand(0, NewData);
1136 return std::make_pair(/*Changed=*/true, /*ModifiedInPlace=*/true);
1137 }
1138
1139 Value *OrigPtr = OrigSI.getPointerOperand();
1140 Type *ElemType = LegalType->getScalarType();
1141 if (IsAggPart && Slices.empty())
1142 Slices.push_back(VecSlice{/*Index=*/0, /*Length=*/1});
1143 unsigned ElemBytes = DL.getTypeStoreSize(ElemType);
1144 AAMDNodes AANodes = OrigSI.getAAMetadata();
1145 for (VecSlice S : Slices) {
1146 Type *SliceType =
1147 S.Length != 1 ? FixedVectorType::get(ElemType, S.Length) : ElemType;
1148 int64_t ByteOffset = AggByteOff + S.Index * ElemBytes;
1149 Value *NewPtr =
1150 IRB.CreateGEP(IRB.getInt8Ty(), OrigPtr, IRB.getInt32(ByteOffset),
1151 OrigPtr->getName() + ".part." + Twine(S.Index),
1152                   GEPNoWrapFlags::noUnsignedWrap());
1153     Value *DataSlice = extractSlice(NewData, S, Name);
1154 Type *StorableType = intrinsicTypeFor(SliceType);
1155 DataSlice = IRB.CreateBitCast(DataSlice, StorableType,
1156 DataSlice->getName() + ".storable");
1157 auto *NewSI = cast<StoreInst>(OrigSI.clone());
1158 NewSI->setAlignment(commonAlignment(OrigSI.getAlign(), ByteOffset));
1159 IRB.Insert(NewSI);
1160 NewSI->setOperand(0, DataSlice);
1161 NewSI->setOperand(1, NewPtr);
1162 NewSI->setAAMetadata(AANodes.adjustForAccess(ByteOffset, StorableType, DL));
1163 }
1164 return std::make_pair(/*Changed=*/true, /*ModifiedInPlace=*/false);
1165}
1166
1167bool LegalizeBufferContentTypesVisitor::visitStoreInst(StoreInst &SI) {
1168 if (SI.getPointerAddressSpace() != AMDGPUAS::BUFFER_FAT_POINTER)
1169 return false;
1170 IRB.SetInsertPoint(&SI);
1171 SmallVector<uint32_t> AggIdxs;
1172 Value *OrigData = SI.getValueOperand();
1173 auto [Changed, ModifiedInPlace] =
1174 visitStoreImpl(SI, OrigData->getType(), AggIdxs, 0, OrigData->getName());
1175 if (Changed && !ModifiedInPlace)
1176 SI.eraseFromParent();
1177 return Changed;
1178}
1179
1180bool LegalizeBufferContentTypesVisitor::processFunction(Function &F) {
1181 bool Changed = false;
1182   // Note, memory transfer intrinsics won't be encountered here; they were
1183   // already expanded into loops by the first phase.
1184   for (Instruction &I : make_early_inc_range(instructions(F))) {
1185     Changed |= visit(I);
1185 }
1186 return Changed;
1187}
1188
1189/// Return the ptr addrspace(8) and i32 (resource and offset parts) in a lowered
1190/// buffer fat pointer constant.
1191static std::pair<Constant *, Constant *>
1192splitLoweredFatBufferConst(Constant *C) {
1193   assert(isSplitFatPtr(C->getType()) && "Not a split fat buffer pointer");
1194 return std::make_pair(C->getAggregateElement(0u), C->getAggregateElement(1u));
1195}
1196
1197namespace {
1198/// Handle the remapping of ptr addrspace(7) constants.
1199class FatPtrConstMaterializer final : public ValueMaterializer {
1200 BufferFatPtrToStructTypeMap *TypeMap;
1201 // An internal mapper that is used to recurse into the arguments of constants.
1202 // While the documentation for `ValueMapper` specifies not to use it
1203 // recursively, examination of the logic in mapValue() shows that it can
1204 // safely be used recursively when handling constants, like it does in its own
1205 // logic.
1206 ValueMapper InternalMapper;
1207
1208 Constant *materializeBufferFatPtrConst(Constant *C);
1209
1210public:
1211 // UnderlyingMap is the value map this materializer will be filling.
1212 FatPtrConstMaterializer(BufferFatPtrToStructTypeMap *TypeMap,
1213 ValueToValueMapTy &UnderlyingMap)
1214 : TypeMap(TypeMap),
1215 InternalMapper(UnderlyingMap, RF_None, TypeMap, this) {}
1216 ~FatPtrConstMaterializer() = default;
1217
1218 Value *materialize(Value *V) override;
1219};
1220} // namespace
1221
1222Constant *FatPtrConstMaterializer::materializeBufferFatPtrConst(Constant *C) {
1223 Type *SrcTy = C->getType();
1224 auto *NewTy = dyn_cast<StructType>(TypeMap->remapType(SrcTy));
1225 if (C->isNullValue())
1226 return ConstantAggregateZero::getNullValue(NewTy);
1227 if (isa<PoisonValue>(C)) {
1228 return ConstantStruct::get(NewTy,
1229 {PoisonValue::get(NewTy->getElementType(0)),
1230 PoisonValue::get(NewTy->getElementType(1))});
1231 }
1232 if (isa<UndefValue>(C)) {
1233 return ConstantStruct::get(NewTy,
1234 {UndefValue::get(NewTy->getElementType(0)),
1235 UndefValue::get(NewTy->getElementType(1))});
1236 }
1237
1238 if (auto *VC = dyn_cast<ConstantVector>(C)) {
1239 if (Constant *S = VC->getSplatValue()) {
1240 Constant *NewS = InternalMapper.mapConstant(*S);
1241 if (!NewS)
1242 return nullptr;
1243 auto [Rsrc, Off] = splitLoweredFatBufferConst(NewS);
1244 auto EC = VC->getType()->getElementCount();
1245 return ConstantStruct::get(NewTy, {ConstantVector::getSplat(EC, Rsrc),
1246 ConstantVector::getSplat(EC, Off)});
1247 }
1248     SmallVector<Constant *> Rsrcs;
1249     SmallVector<Constant *> Offs;
1250     for (Value *Op : VC->operand_values()) {
1251 auto *NewOp = dyn_cast_or_null<Constant>(InternalMapper.mapValue(*Op));
1252 if (!NewOp)
1253 return nullptr;
1254 auto [Rsrc, Off] = splitLoweredFatBufferConst(NewOp);
1255 Rsrcs.push_back(Rsrc);
1256 Offs.push_back(Off);
1257 }
1258 Constant *RsrcVec = ConstantVector::get(Rsrcs);
1259 Constant *OffVec = ConstantVector::get(Offs);
1260 return ConstantStruct::get(NewTy, {RsrcVec, OffVec});
1261 }
1262
1263 if (isa<GlobalValue>(C))
1264 reportFatalUsageError("global values containing ptr addrspace(7) (buffer "
1265 "fat pointer) values are not supported");
1266
1267 if (isa<ConstantExpr>(C))
1269 "constant exprs containing ptr addrspace(7) (buffer "
1270 "fat pointer) values should have been expanded earlier");
1271
1272 return nullptr;
1273}
1274
1275Value *FatPtrConstMaterializer::materialize(Value *V) {
1276 Constant *C = dyn_cast<Constant>(V);
1277 if (!C)
1278 return nullptr;
1279 // Structs and other types that happen to contain fat pointers get remapped
1280 // by the mapValue() logic.
1281 if (!isBufferFatPtrConst(C))
1282 return nullptr;
1283 return materializeBufferFatPtrConst(C);
1284}
1285
1286using PtrParts = std::pair<Value *, Value *>;
1287namespace {
1288// The visitor returns the resource and offset parts for an instruction if they
1289// can be computed, or (nullptr, nullptr) for cases that don't have a meaningful
1290// value mapping.
1291class SplitPtrStructs : public InstVisitor<SplitPtrStructs, PtrParts> {
1292 ValueToValueMapTy RsrcParts;
1293 ValueToValueMapTy OffParts;
1294
1295 // Track instructions that have been rewritten into a user of the component
1296 // parts of their ptr addrspace(7) input. Instructions that produced
1297 // ptr addrspace(7) parts should **not** be RAUW'd before being added to this
1298 // set, as that replacement will be handled in a post-visit step. However,
1299 // instructions that yield values that aren't fat pointers (ex. ptrtoint)
1300 // should RAUW themselves with new instructions that use the split parts
1301 // of their arguments during processing.
1302 DenseSet<Instruction *> SplitUsers;
1303
1304 // Nodes that need a second look once we've computed the parts for all other
1305 // instructions to see if, for example, we really need to phi on the resource
1306 // part.
1307 SmallVector<Instruction *> Conditionals;
1308 // Temporary instructions produced while lowering conditionals that should be
1309 // killed.
1310 SmallVector<Instruction *> ConditionalTemps;
1311
1312 // Subtarget info, needed for determining what cache control bits to set.
1313 const TargetMachine *TM;
1314 const GCNSubtarget *ST = nullptr;
1315
1316   IRBuilder<InstSimplifyFolder> IRB;
1317
1318 // Copy metadata between instructions if applicable.
1319 void copyMetadata(Value *Dest, Value *Src);
1320
1321 // Get the resource and offset parts of the value V, inserting appropriate
1322 // extractvalue calls if needed.
1323 PtrParts getPtrParts(Value *V);
1324
1325 // Given an instruction that could produce multiple resource parts (a PHI or
1326 // select), collect the set of possible instructions that could have provided
1327 // its resource parts (the `Roots`) and the set of
1328 // conditional instructions visited during the search (`Seen`). If, after
1329 // removing the root of the search from `Seen` and `Roots`, `Seen` is a subset
1330 // of `Roots` and `Roots - Seen` contains one element, the resource part of
1331 // that element can replace the resource part of all other elements in `Seen`.
1332 void getPossibleRsrcRoots(Instruction *I, SmallPtrSetImpl<Value *> &Roots,
1333                             SmallPtrSetImpl<Value *> &Seen);
1334   void processConditionals();
1335
1336 // If an instruction has been split into resource and offset parts,
1337 // delete that instruction. If any of its uses have not themselves been split
1338 // into parts (for example, an insertvalue), construct the structure
1339 // that the type rewrites declared should be produced by the dying instruction
1340 // and use that.
1341 // Also, kill the temporary extractvalue operations produced by the two-stage
1342 // lowering of PHIs and conditionals.
1343 void killAndReplaceSplitInstructions(SmallVectorImpl<Instruction *> &Origs);
1344
1345 void setAlign(CallInst *Intr, Align A, unsigned RsrcArgIdx);
1346 void insertPreMemOpFence(AtomicOrdering Order, SyncScope::ID SSID);
1347 void insertPostMemOpFence(AtomicOrdering Order, SyncScope::ID SSID);
1348 Value *handleMemoryInst(Instruction *I, Value *Arg, Value *Ptr, Type *Ty,
1349 Align Alignment, AtomicOrdering Order,
1350 bool IsVolatile, SyncScope::ID SSID);
1351
1352public:
1353 SplitPtrStructs(const DataLayout &DL, LLVMContext &Ctx,
1354 const TargetMachine *TM)
1355 : TM(TM), IRB(Ctx, InstSimplifyFolder(DL)) {}
1356
1357 void processFunction(Function &F);
1358
1365
1372
1376
1379
1381};
1382} // namespace
1383
1384void SplitPtrStructs::copyMetadata(Value *Dest, Value *Src) {
1385 auto *DestI = dyn_cast<Instruction>(Dest);
1386 auto *SrcI = dyn_cast<Instruction>(Src);
1387
1388 if (!DestI || !SrcI)
1389 return;
1390
1391 DestI->copyMetadata(*SrcI);
1392}
1393
1394PtrParts SplitPtrStructs::getPtrParts(Value *V) {
1395 assert(isSplitFatPtr(V->getType()) && "it's not meaningful to get the parts "
1396 "of something that wasn't rewritten");
1397 auto *RsrcEntry = &RsrcParts[V];
1398 auto *OffEntry = &OffParts[V];
1399 if (*RsrcEntry && *OffEntry)
1400 return {*RsrcEntry, *OffEntry};
1401
1402 if (auto *C = dyn_cast<Constant>(V)) {
1403 auto [Rsrc, Off] = splitLoweredFatBufferConst(C);
1404 return {*RsrcEntry = Rsrc, *OffEntry = Off};
1405 }
1406
1408 if (auto *I = dyn_cast<Instruction>(V)) {
1409 LLVM_DEBUG(dbgs() << "Recursing to split parts of " << *I << "\n");
1410 auto [Rsrc, Off] = visit(*I);
1411 if (Rsrc && Off)
1412 return {*RsrcEntry = Rsrc, *OffEntry = Off};
1413 // We'll be creating the new values after the relevant instruction.
1414 // This instruction generates a value and so isn't a terminator.
1415 IRB.SetInsertPoint(*I->getInsertionPointAfterDef());
1416 IRB.SetCurrentDebugLocation(I->getDebugLoc());
1417 } else if (auto *A = dyn_cast<Argument>(V)) {
1418 IRB.SetInsertPointPastAllocas(A->getParent());
1419 IRB.SetCurrentDebugLocation(DebugLoc());
1420 }
1421 Value *Rsrc = IRB.CreateExtractValue(V, 0, V->getName() + ".rsrc");
1422 Value *Off = IRB.CreateExtractValue(V, 1, V->getName() + ".off");
1423 return {*RsrcEntry = Rsrc, *OffEntry = Off};
1424}
1425
1426/// Returns the instruction that defines the resource part of the value V.
1427/// Note that this is not getUnderlyingObject(), since that looks through
1428/// operations like ptrmask which might modify the resource part.
1429///
1430/// We can limit ourselves to just looking through GEPs followed by looking
1431/// through addrspacecasts because only those two operations preserve the
1432/// resource part, and because operations on an `addrspace(8)` (which is the
1433/// legal input to this addrspacecast) would produce a different resource part.
1435 while (auto *GEP = dyn_cast<GEPOperator>(V))
1436 V = GEP->getPointerOperand();
1437 while (auto *ASC = dyn_cast<AddrSpaceCastOperator>(V))
1438 V = ASC->getPointerOperand();
1439 return V;
1440}
1441
1442void SplitPtrStructs::getPossibleRsrcRoots(Instruction *I,
1443                                            SmallPtrSetImpl<Value *> &Roots,
1444                                            SmallPtrSetImpl<Value *> &Seen) {
1445   if (auto *PHI = dyn_cast<PHINode>(I)) {
1446 if (!Seen.insert(I).second)
1447 return;
1448 for (Value *In : PHI->incoming_values()) {
1449 In = rsrcPartRoot(In);
1450 Roots.insert(In);
1451 if (isa<PHINode, SelectInst>(In))
1452 getPossibleRsrcRoots(cast<Instruction>(In), Roots, Seen);
1453 }
1454 } else if (auto *SI = dyn_cast<SelectInst>(I)) {
1455 if (!Seen.insert(SI).second)
1456 return;
1457 Value *TrueVal = rsrcPartRoot(SI->getTrueValue());
1458 Value *FalseVal = rsrcPartRoot(SI->getFalseValue());
1459 Roots.insert(TrueVal);
1460 Roots.insert(FalseVal);
1461 if (isa<PHINode, SelectInst>(TrueVal))
1462 getPossibleRsrcRoots(cast<Instruction>(TrueVal), Roots, Seen);
1463 if (isa<PHINode, SelectInst>(FalseVal))
1464 getPossibleRsrcRoots(cast<Instruction>(FalseVal), Roots, Seen);
1465 } else {
1466 llvm_unreachable("getPossibleRsrcRoots() only works on phi and select");
1467 }
1468}
1469
1470void SplitPtrStructs::processConditionals() {
1474 for (Instruction *I : Conditionals) {
1475 // These have to exist by now because we've visited these nodes.
1476 Value *Rsrc = RsrcParts[I];
1477 Value *Off = OffParts[I];
1478 assert(Rsrc && Off && "must have visited conditionals by now");
1479
1480 std::optional<Value *> MaybeRsrc;
1481 auto MaybeFoundRsrc = FoundRsrcs.find(I);
1482 if (MaybeFoundRsrc != FoundRsrcs.end()) {
1483 MaybeRsrc = MaybeFoundRsrc->second;
1484 } else {
1486 Roots.clear();
1487 Seen.clear();
1488 getPossibleRsrcRoots(I, Roots, Seen);
1489 LLVM_DEBUG(dbgs() << "Processing conditional: " << *I << "\n");
1490#ifndef NDEBUG
1491 for (Value *V : Roots)
1492 LLVM_DEBUG(dbgs() << "Root: " << *V << "\n");
1493 for (Value *V : Seen)
1494 LLVM_DEBUG(dbgs() << "Seen: " << *V << "\n");
1495#endif
1496 // If we are our own possible root, then we shouldn't block our
1497 // replacement with a valid incoming value.
1498 Roots.erase(I);
1499 // We don't want to block the optimization for conditionals that don't
1500 // refer to themselves but did see themselves during the traversal.
1501 Seen.erase(I);
1502
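      // Illustrative example (invented names): for a loop-carried pointer
      //   %p = phi ptr addrspace(7) [ %init, %entry ], [ %p.next, %loop ]
      //   %p.next = getelementptr i8, ptr addrspace(7) %p, i32 4
      // the traversal produces Roots = {%init, %p} and Seen = {%p}; after the
      // erasures above, Roots = {%init} and Seen = {}, so the single remaining
      // root %init supplies the resource part for the whole cycle.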
1503 if (set_is_subset(Seen, Roots)) {
1504 auto Diff = set_difference(Roots, Seen);
1505 if (Diff.size() == 1) {
1506 Value *RootVal = *Diff.begin();
1507 // Handle the case where previous loops already looked through
1508 // an addrspacecast.
1509 if (isSplitFatPtr(RootVal->getType()))
1510 MaybeRsrc = std::get<0>(getPtrParts(RootVal));
1511 else
1512 MaybeRsrc = RootVal;
1513 }
1514 }
1515 }
1516
1517 if (auto *PHI = dyn_cast<PHINode>(I)) {
1518 Value *NewRsrc;
1519 StructType *PHITy = cast<StructType>(PHI->getType());
1520 IRB.SetInsertPoint(*PHI->getInsertionPointAfterDef());
1521 IRB.SetCurrentDebugLocation(PHI->getDebugLoc());
1522 if (MaybeRsrc) {
1523 NewRsrc = *MaybeRsrc;
1524 } else {
1525 Type *RsrcTy = PHITy->getElementType(0);
1526 auto *RsrcPHI = IRB.CreatePHI(RsrcTy, PHI->getNumIncomingValues());
1527 RsrcPHI->takeName(Rsrc);
1528 for (auto [V, BB] : llvm::zip(PHI->incoming_values(), PHI->blocks())) {
1529 Value *VRsrc = std::get<0>(getPtrParts(V));
1530 RsrcPHI->addIncoming(VRsrc, BB);
1531 }
1532 copyMetadata(RsrcPHI, PHI);
1533 NewRsrc = RsrcPHI;
1534 }
1535
1536 Type *OffTy = PHITy->getElementType(1);
1537 auto *NewOff = IRB.CreatePHI(OffTy, PHI->getNumIncomingValues());
1538 NewOff->takeName(Off);
1539 for (auto [V, BB] : llvm::zip(PHI->incoming_values(), PHI->blocks())) {
1540 assert(OffParts.count(V) && "An offset part had to be created by now");
1541 Value *VOff = std::get<1>(getPtrParts(V));
1542 NewOff->addIncoming(VOff, BB);
1543 }
1544 copyMetadata(NewOff, PHI);
1545
1546 // Note: We don't eraseFromParent() the temporaries because we don't want
1547 // to put the correction maps in an inconsistent state. That'll be handled
1548 // during the rest of the killing. Also, `ValueToValueMapTy` guarantees
1549 // that references in that map will be updated as well.
1550 // Note that if the temporary instruction got `InstSimplify`'d away, it
1551 // might be something like a function argument.
1552 if (auto *RsrcInst = dyn_cast<Instruction>(Rsrc)) {
1553 ConditionalTemps.push_back(RsrcInst);
1554 RsrcInst->replaceAllUsesWith(NewRsrc);
1555 }
1556 if (auto *OffInst = dyn_cast<Instruction>(Off)) {
1557 ConditionalTemps.push_back(OffInst);
1558 OffInst->replaceAllUsesWith(NewOff);
1559 }
1560
1561 // Save on recomputing the cycle traversals in known-root cases.
1562 if (MaybeRsrc)
1563 for (Value *V : Seen)
1564 FoundRsrcs[V] = NewRsrc;
1565 } else if (isa<SelectInst>(I)) {
1566 if (MaybeRsrc) {
1567 if (auto *RsrcInst = dyn_cast<Instruction>(Rsrc)) {
1568 ConditionalTemps.push_back(RsrcInst);
1569 RsrcInst->replaceAllUsesWith(*MaybeRsrc);
1570 }
1571 for (Value *V : Seen)
1572 FoundRsrcs[V] = *MaybeRsrc;
1573 }
1574 } else {
1575 llvm_unreachable("Only PHIs and selects go in the conditionals list");
1576 }
1577 }
1578}
1579
1580void SplitPtrStructs::killAndReplaceSplitInstructions(
1581 SmallVectorImpl<Instruction *> &Origs) {
1582 for (Instruction *I : ConditionalTemps)
1583 I->eraseFromParent();
1584
1585 for (Instruction *I : Origs) {
1586 if (!SplitUsers.contains(I))
1587 continue;
1588
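    // Split any debug records that describe the whole fat pointer into two
    // fragments: one for the resource part (the first RsrcSz bits of the
    // variable) and one for the offset part (the following OffSz bits).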
1590 findDbgValues(I, Dbgs);
1591 for (DbgVariableRecord *Dbg : Dbgs) {
1592 auto &DL = I->getDataLayout();
1593 assert(isSplitFatPtr(I->getType()) &&
1594 "We should've RAUW'd away loads, stores, etc. at this point");
1595 DbgVariableRecord *OffDbg = Dbg->clone();
1596 auto [Rsrc, Off] = getPtrParts(I);
1597
1598 int64_t RsrcSz = DL.getTypeSizeInBits(Rsrc->getType());
1599 int64_t OffSz = DL.getTypeSizeInBits(Off->getType());
1600
1601 std::optional<DIExpression *> RsrcExpr =
1602 DIExpression::createFragmentExpression(Dbg->getExpression(), 0,
1603 RsrcSz);
1604 std::optional<DIExpression *> OffExpr =
1605 DIExpression::createFragmentExpression(Dbg->getExpression(), RsrcSz,
1606 OffSz);
1607 if (OffExpr) {
1608 OffDbg->setExpression(*OffExpr);
1609 OffDbg->replaceVariableLocationOp(I, Off);
1610 OffDbg->insertBefore(Dbg);
1611 } else {
1612 OffDbg->eraseFromParent();
1613 }
1614 if (RsrcExpr) {
1615 Dbg->setExpression(*RsrcExpr);
1616 Dbg->replaceVariableLocationOp(I, Rsrc);
1617 } else {
1618 Dbg->replaceVariableLocationOp(I, PoisonValue::get(I->getType()));
1619 }
1620 }
1621
1622 Value *Poison = PoisonValue::get(I->getType());
1623 I->replaceUsesWithIf(Poison, [&](const Use &U) -> bool {
1624 if (const auto *UI = dyn_cast<Instruction>(U.getUser()))
1625 return SplitUsers.contains(UI);
1626 return false;
1627 });
1628
1629 if (I->use_empty()) {
1630 I->eraseFromParent();
1631 continue;
1632 }
1633 IRB.SetInsertPoint(*I->getInsertionPointAfterDef());
1634 IRB.SetCurrentDebugLocation(I->getDebugLoc());
1635 auto [Rsrc, Off] = getPtrParts(I);
1636 Value *Struct = PoisonValue::get(I->getType());
1637 Struct = IRB.CreateInsertValue(Struct, Rsrc, 0);
1638 Struct = IRB.CreateInsertValue(Struct, Off, 1);
1639 copyMetadata(Struct, I);
1640 Struct->takeName(I);
1641 I->replaceAllUsesWith(Struct);
1642 I->eraseFromParent();
1643 }
1644}
1645
1646void SplitPtrStructs::setAlign(CallInst *Intr, Align A, unsigned RsrcArgIdx) {
1647 LLVMContext &Ctx = Intr->getContext();
1648 Intr->addParamAttr(RsrcArgIdx, Attribute::getWithAlignment(Ctx, A));
1649}
1650
1651void SplitPtrStructs::insertPreMemOpFence(AtomicOrdering Order,
1652 SyncScope::ID SSID) {
1653 switch (Order) {
1654 case AtomicOrdering::Release:
1655 case AtomicOrdering::AcquireRelease:
1656 case AtomicOrdering::SequentiallyConsistent:
1657 IRB.CreateFence(AtomicOrdering::Release, SSID);
1658 break;
1659 default:
1660 break;
1661 }
1662}
1663
1664void SplitPtrStructs::insertPostMemOpFence(AtomicOrdering Order,
1665 SyncScope::ID SSID) {
1666 switch (Order) {
1667 case AtomicOrdering::Acquire:
1668 case AtomicOrdering::AcquireRelease:
1669 case AtomicOrdering::SequentiallyConsistent:
1670 IRB.CreateFence(AtomicOrdering::Acquire, SSID);
1671 break;
1672 default:
1673 break;
1674 }
1675}
1676
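// Illustrative sketch (invented value names): handleMemoryInst() turns
//   %v = load i32, ptr addrspace(7) %p, align 4
// into, roughly,
//   %v = call i32 @llvm.amdgcn.raw.ptr.buffer.load.i32(
//            ptr addrspace(8) %p.rsrc, i32 %p.off, i32 0, i32 0)
// where the trailing operands are soffset (always 0 here) and the aux
// cache-policy bits, with fences inserted around the call as needed for the
// atomic ordering.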
1677Value *SplitPtrStructs::handleMemoryInst(Instruction *I, Value *Arg, Value *Ptr,
1678 Type *Ty, Align Alignment,
1679 AtomicOrdering Order, bool IsVolatile,
1680 SyncScope::ID SSID) {
1681 IRB.SetInsertPoint(I);
1682
1683 auto [Rsrc, Off] = getPtrParts(Ptr);
1685 if (Arg)
1686 Args.push_back(Arg);
1687 Args.push_back(Rsrc);
1688 Args.push_back(Off);
1689 insertPreMemOpFence(Order, SSID);
1690 // soffset is always 0 for these cases, where we always want any offset to be
1691 // part of bounds checking and we don't know which parts of the GEPs are
1692 // uniform.
1693 Args.push_back(IRB.getInt32(0));
1694
1695 uint32_t Aux = 0;
1696 if (IsVolatile)
1697 Aux |= AMDGPU::CPol::VOLATILE;
1698 Args.push_back(IRB.getInt32(Aux));
1699
1700 Intrinsic::ID IID;
1701 if (isa<LoadInst>(I))
1702 IID = Order == AtomicOrdering::NotAtomic
1703 ? Intrinsic::amdgcn_raw_ptr_buffer_load
1704 : Intrinsic::amdgcn_raw_ptr_atomic_buffer_load;
1705 else if (isa<StoreInst>(I))
1706 IID = Intrinsic::amdgcn_raw_ptr_buffer_store;
1707 else if (auto *RMW = dyn_cast<AtomicRMWInst>(I)) {
1708 switch (RMW->getOperation()) {
1709 case AtomicRMWInst::Xchg:
1710 IID = Intrinsic::amdgcn_raw_ptr_buffer_atomic_swap;
1711 break;
1712 case AtomicRMWInst::Add:
1713 IID = Intrinsic::amdgcn_raw_ptr_buffer_atomic_add;
1714 break;
1715 case AtomicRMWInst::Sub:
1716 IID = Intrinsic::amdgcn_raw_ptr_buffer_atomic_sub;
1717 break;
1718 case AtomicRMWInst::And:
1719 IID = Intrinsic::amdgcn_raw_ptr_buffer_atomic_and;
1720 break;
1721 case AtomicRMWInst::Or:
1722 IID = Intrinsic::amdgcn_raw_ptr_buffer_atomic_or;
1723 break;
1724 case AtomicRMWInst::Xor:
1725 IID = Intrinsic::amdgcn_raw_ptr_buffer_atomic_xor;
1726 break;
1727 case AtomicRMWInst::Max:
1728 IID = Intrinsic::amdgcn_raw_ptr_buffer_atomic_smax;
1729 break;
1730 case AtomicRMWInst::Min:
1731 IID = Intrinsic::amdgcn_raw_ptr_buffer_atomic_smin;
1732 break;
1733 case AtomicRMWInst::UMax:
1734 IID = Intrinsic::amdgcn_raw_ptr_buffer_atomic_umax;
1735 break;
1736 case AtomicRMWInst::UMin:
1737 IID = Intrinsic::amdgcn_raw_ptr_buffer_atomic_umin;
1738 break;
1739 case AtomicRMWInst::FAdd:
1740 IID = Intrinsic::amdgcn_raw_ptr_buffer_atomic_fadd;
1741 break;
1742 case AtomicRMWInst::FMax:
1743 IID = Intrinsic::amdgcn_raw_ptr_buffer_atomic_fmax;
1744 break;
1745 case AtomicRMWInst::FMin:
1746 IID = Intrinsic::amdgcn_raw_ptr_buffer_atomic_fmin;
1747 break;
1748 case AtomicRMWInst::FSub: {
1750 "atomic floating point subtraction not supported for "
1751 "buffer resources and should've been expanded away");
1752 break;
1753 }
1756 "atomic floating point fmaximum not supported for "
1757 "buffer resources and should've been expanded away");
1758 break;
1759 }
1762 "atomic floating point fminimum not supported for "
1763 "buffer resources and should've been expanded away");
1764 break;
1765 }
1768 "atomic nand not supported for buffer resources and "
1769 "should've been expanded away");
1770 break;
1771 case AtomicRMWInst::UIncWrap:
1772 case AtomicRMWInst::UDecWrap:
1773 reportFatalUsageError("wrapping increment/decrement not supported for "
1774 "buffer resources and should've ben expanded away");
1775 break;
1777 llvm_unreachable("Not sure how we got a bad binop");
1780 break;
1781 }
1782 }
1783
1784 auto *Call = IRB.CreateIntrinsic(IID, Ty, Args);
1785 copyMetadata(Call, I);
1786 setAlign(Call, Alignment, Arg ? 1 : 0);
1787 Call->takeName(I);
1788
1789 insertPostMemOpFence(Order, SSID);
1790 // The "no moving p7 directly" rewrites ensure that this load or store won't
1791 // itself need to be split into parts.
1792 SplitUsers.insert(I);
1793 I->replaceAllUsesWith(Call);
1794 return Call;
1795}
1796
1797PtrParts SplitPtrStructs::visitInstruction(Instruction &I) {
1798 return {nullptr, nullptr};
1799}
1800
1801PtrParts SplitPtrStructs::visitLoadInst(LoadInst &LI) {
1802 if (!isSplitFatPtr(LI.getPointerOperandType()))
1803 return {nullptr, nullptr};
1804 handleMemoryInst(&LI, nullptr, LI.getPointerOperand(), LI.getType(),
1805 LI.getAlign(), LI.getOrdering(), LI.isVolatile(),
1806 LI.getSyncScopeID());
1807 return {nullptr, nullptr};
1808}
1809
1810PtrParts SplitPtrStructs::visitStoreInst(StoreInst &SI) {
1811 if (!isSplitFatPtr(SI.getPointerOperandType()))
1812 return {nullptr, nullptr};
1813 Value *Arg = SI.getValueOperand();
1814 handleMemoryInst(&SI, Arg, SI.getPointerOperand(), Arg->getType(),
1815 SI.getAlign(), SI.getOrdering(), SI.isVolatile(),
1816 SI.getSyncScopeID());
1817 return {nullptr, nullptr};
1818}
1819
1820PtrParts SplitPtrStructs::visitAtomicRMWInst(AtomicRMWInst &AI) {
1821 if (!isSplitFatPtr(AI.getPointerOperand()->getType()))
1822 return {nullptr, nullptr};
1823 Value *Arg = AI.getValOperand();
1824 handleMemoryInst(&AI, Arg, AI.getPointerOperand(), Arg->getType(),
1825 AI.getAlign(), AI.getOrdering(), AI.isVolatile(),
1826 AI.getSyncScopeID());
1827 return {nullptr, nullptr};
1828}
1829
1830// Unlike load, store, and RMW, cmpxchg needs special handling to account
1831// for the boolean success flag in its return value.
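// For example (illustrative, value names invented):
//   %r = cmpxchg ptr addrspace(7) %p, i32 %cmp, i32 %new seq_cst seq_cst
// becomes, roughly, a call to
//   @llvm.amdgcn.raw.ptr.buffer.atomic.cmpswap.i32(i32 %new, i32 %cmp,
//       ptr addrspace(8) %p.rsrc, i32 %p.off, i32 0, i32 %aux)
// plus an icmp eq against %cmp to rebuild the {i32, i1} result struct.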
1832PtrParts SplitPtrStructs::visitAtomicCmpXchgInst(AtomicCmpXchgInst &AI) {
1833 Value *Ptr = AI.getPointerOperand();
1834 if (!isSplitFatPtr(Ptr->getType()))
1835 return {nullptr, nullptr};
1836 IRB.SetInsertPoint(&AI);
1837
1838 Type *Ty = AI.getNewValOperand()->getType();
1839 AtomicOrdering Order = AI.getMergedOrdering();
1840 SyncScope::ID SSID = AI.getSyncScopeID();
1841 bool IsNonTemporal = AI.getMetadata(LLVMContext::MD_nontemporal);
1842
1843 auto [Rsrc, Off] = getPtrParts(Ptr);
1844 insertPreMemOpFence(Order, SSID);
1845
1846 uint32_t Aux = 0;
1847 if (IsNonTemporal)
1848 Aux |= AMDGPU::CPol::SLC;
1849 if (AI.isVolatile())
1850 Aux |= AMDGPU::CPol::VOLATILE;
1851 auto *Call =
1852 IRB.CreateIntrinsic(Intrinsic::amdgcn_raw_ptr_buffer_atomic_cmpswap, Ty,
1853 {AI.getNewValOperand(), AI.getCompareOperand(), Rsrc,
1854 Off, IRB.getInt32(0), IRB.getInt32(Aux)});
1855 copyMetadata(Call, &AI);
1856 setAlign(Call, AI.getAlign(), 2);
1857 Call->takeName(&AI);
1858 insertPostMemOpFence(Order, SSID);
1859
1860 Value *Res = PoisonValue::get(AI.getType());
1861 Res = IRB.CreateInsertValue(Res, Call, 0);
1862 if (!AI.isWeak()) {
1863 Value *Succeeded = IRB.CreateICmpEQ(Call, AI.getCompareOperand());
1864 Res = IRB.CreateInsertValue(Res, Succeeded, 1);
1865 }
1866 SplitUsers.insert(&AI);
1867 AI.replaceAllUsesWith(Res);
1868 return {nullptr, nullptr};
1869}
1870
1871PtrParts SplitPtrStructs::visitGetElementPtrInst(GetElementPtrInst &GEP) {
1872 using namespace llvm::PatternMatch;
1873 Value *Ptr = GEP.getPointerOperand();
1874 if (!isSplitFatPtr(Ptr->getType()))
1875 return {nullptr, nullptr};
1876 IRB.SetInsertPoint(&GEP);
1877
1878 auto [Rsrc, Off] = getPtrParts(Ptr);
1879 const DataLayout &DL = GEP.getDataLayout();
1880 bool IsNUW = GEP.hasNoUnsignedWrap();
1881 bool IsNUSW = GEP.hasNoUnsignedSignedWrap();
1882
1883 StructType *ResTy = cast<StructType>(GEP.getType());
1884 Type *ResRsrcTy = ResTy->getElementType(0);
1885 VectorType *ResRsrcVecTy = dyn_cast<VectorType>(ResRsrcTy);
1886 bool BroadcastsPtr = ResRsrcVecTy && !isa<VectorType>(Off->getType());
1887
1888 // In order to call emitGEPOffset() and thus not have to reimplement it,
1889 // we need the GEP result to have ptr addrspace(7) type.
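  // For example (illustrative, value names invented), for
  //   %q = getelementptr i32, ptr addrspace(7) %p, i32 %i
  // emitGEPOffset() yields %gep.off = mul i32 %i, 4 (or an equivalent shift),
  // and the final result is {%p.rsrc, add i32 %p.off, %gep.off}.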
1890 Type *FatPtrTy =
1891 ResRsrcTy->getWithNewType(IRB.getPtrTy(AMDGPUAS::BUFFER_FAT_POINTER));
1892 GEP.mutateType(FatPtrTy);
1893 Value *OffAccum = emitGEPOffset(&IRB, DL, &GEP);
1894 GEP.mutateType(ResTy);
1895
1896 if (BroadcastsPtr) {
1897 Rsrc = IRB.CreateVectorSplat(ResRsrcVecTy->getElementCount(), Rsrc,
1898 Rsrc->getName());
1899 Off = IRB.CreateVectorSplat(ResRsrcVecTy->getElementCount(), Off,
1900 Off->getName());
1901 }
1902 if (match(OffAccum, m_Zero())) { // Constant-zero offset
1903 SplitUsers.insert(&GEP);
1904 return {Rsrc, Off};
1905 }
1906
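  // Roughly: the 32-bit offset addition below can be marked nuw when the GEP
  // itself was nuw, or when it was nusw and the accumulated offset is a known
  // non-negative constant, since then the unsigned add cannot wrap.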
1907 bool HasNonNegativeOff = false;
1908 if (auto *CI = dyn_cast<ConstantInt>(OffAccum)) {
1909 HasNonNegativeOff = !CI->isNegative();
1910 }
1911 Value *NewOff;
1912 if (match(Off, m_Zero())) {
1913 NewOff = OffAccum;
1914 } else {
1915 NewOff = IRB.CreateAdd(Off, OffAccum, "",
1916 /*hasNUW=*/IsNUW || (IsNUSW && HasNonNegativeOff),
1917 /*hasNSW=*/false);
1918 }
1919 copyMetadata(NewOff, &GEP);
1920 NewOff->takeName(&GEP);
1921 SplitUsers.insert(&GEP);
1922 return {Rsrc, NewOff};
1923}
1924
1925PtrParts SplitPtrStructs::visitPtrToIntInst(PtrToIntInst &PI) {
1926 Value *Ptr = PI.getPointerOperand();
1927 if (!isSplitFatPtr(Ptr->getType()))
1928 return {nullptr, nullptr};
1929 IRB.SetInsertPoint(&PI);
1930
1931 Type *ResTy = PI.getType();
1932 unsigned Width = ResTy->getScalarSizeInBits();
1933
1934 auto [Rsrc, Off] = getPtrParts(Ptr);
1935 const DataLayout &DL = PI.getDataLayout();
1936 unsigned FatPtrWidth = DL.getPointerSizeInBits(AMDGPUAS::BUFFER_FAT_POINTER);
1937
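  // Conceptually (illustrative): for results wider than the 32-bit offset,
  // the value is assembled as (ptrtoint(rsrc) << 32) | zext(off), so the
  // resource descriptor occupies the high bits and the offset the low bits.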
1938 Value *Res;
1939 if (Width <= BufferOffsetWidth) {
1940 Res = IRB.CreateIntCast(Off, ResTy, /*isSigned=*/false,
1941 PI.getName() + ".off");
1942 } else {
1943 Value *RsrcInt = IRB.CreatePtrToInt(Rsrc, ResTy, PI.getName() + ".rsrc");
1944 Value *Shl = IRB.CreateShl(
1945 RsrcInt,
1946 ConstantExpr::getIntegerValue(ResTy, APInt(Width, BufferOffsetWidth)),
1947 "", Width >= FatPtrWidth, Width > FatPtrWidth);
1948 Value *OffCast = IRB.CreateIntCast(Off, ResTy, /*isSigned=*/false,
1949 PI.getName() + ".off");
1950 Res = IRB.CreateOr(Shl, OffCast);
1951 }
1952
1953 copyMetadata(Res, &PI);
1954 Res->takeName(&PI);
1955 SplitUsers.insert(&PI);
1956 PI.replaceAllUsesWith(Res);
1957 return {nullptr, nullptr};
1958}
1959
1960PtrParts SplitPtrStructs::visitPtrToAddrInst(PtrToAddrInst &PA) {
1961 Value *Ptr = PA.getPointerOperand();
1962 if (!isSplitFatPtr(Ptr->getType()))
1963 return {nullptr, nullptr};
1964 IRB.SetInsertPoint(&PA);
1965
1966 auto [Rsrc, Off] = getPtrParts(Ptr);
1967 Value *Res = IRB.CreateIntCast(Off, PA.getType(), /*isSigned=*/false);
1968 copyMetadata(Res, &PA);
1969 Res->takeName(&PA);
1970 SplitUsers.insert(&PA);
1971 PA.replaceAllUsesWith(Res);
1972 return {nullptr, nullptr};
1973}
1974
1975PtrParts SplitPtrStructs::visitIntToPtrInst(IntToPtrInst &IP) {
1976 if (!isSplitFatPtr(IP.getType()))
1977 return {nullptr, nullptr};
1978 IRB.SetInsertPoint(&IP);
1979 const DataLayout &DL = IP.getDataLayout();
1980 unsigned RsrcPtrWidth = DL.getPointerSizeInBits(AMDGPUAS::BUFFER_RESOURCE);
1981 Value *Int = IP.getOperand(0);
1982 Type *IntTy = Int->getType();
1983 Type *RsrcIntTy = IntTy->getWithNewBitWidth(RsrcPtrWidth);
1984 unsigned Width = IntTy->getScalarSizeInBits();
1985
1986 auto *RetTy = cast<StructType>(IP.getType());
1987 Type *RsrcTy = RetTy->getElementType(0);
1988 Type *OffTy = RetTy->getElementType(1);
1989 Value *RsrcPart = IRB.CreateLShr(
1990 Int,
1991 ConstantExpr::getIntegerValue(IntTy, APInt(Width, BufferOffsetWidth)));
1992 Value *RsrcInt = IRB.CreateIntCast(RsrcPart, RsrcIntTy, /*isSigned=*/false);
1993 Value *Rsrc = IRB.CreateIntToPtr(RsrcInt, RsrcTy, IP.getName() + ".rsrc");
1994 Value *Off =
1995 IRB.CreateIntCast(Int, OffTy, /*IsSigned=*/false, IP.getName() + ".off");
1996
1997 copyMetadata(Rsrc, &IP);
1998 SplitUsers.insert(&IP);
1999 return {Rsrc, Off};
2000}
2001
2002PtrParts SplitPtrStructs::visitAddrSpaceCastInst(AddrSpaceCastInst &I) {
2003 // TODO(krzysz00): handle casts from ptr addrspace(7) to global pointers
2004 // by computing the effective address.
2005 if (!isSplitFatPtr(I.getType()))
2006 return {nullptr, nullptr};
2007 IRB.SetInsertPoint(&I);
2008 Value *In = I.getPointerOperand();
2009 // No-op casts preserve parts
2010 if (In->getType() == I.getType()) {
2011 auto [Rsrc, Off] = getPtrParts(In);
2012 SplitUsers.insert(&I);
2013 return {Rsrc, Off};
2014 }
2015
2016 auto *ResTy = cast<StructType>(I.getType());
2017 Type *RsrcTy = ResTy->getElementType(0);
2018 Type *OffTy = ResTy->getElementType(1);
2019 Value *ZeroOff = Constant::getNullValue(OffTy);
2020
2021 // Special case for null pointers, undef, and poison, which can be created by
2022 // address space propagation.
2023 auto *InConst = dyn_cast<Constant>(In);
2024 if (InConst && InConst->isNullValue()) {
2025 Value *NullRsrc = Constant::getNullValue(RsrcTy);
2026 SplitUsers.insert(&I);
2027 return {NullRsrc, ZeroOff};
2028 }
2029 if (isa<PoisonValue>(In)) {
2030 Value *PoisonRsrc = PoisonValue::get(RsrcTy);
2031 Value *PoisonOff = PoisonValue::get(OffTy);
2032 SplitUsers.insert(&I);
2033 return {PoisonRsrc, PoisonOff};
2034 }
2035 if (isa<UndefValue>(In)) {
2036 Value *UndefRsrc = UndefValue::get(RsrcTy);
2037 Value *UndefOff = UndefValue::get(OffTy);
2038 SplitUsers.insert(&I);
2039 return {UndefRsrc, UndefOff};
2040 }
2041
2042 if (I.getSrcAddressSpace() != AMDGPUAS::BUFFER_RESOURCE)
2044 "only buffer resources (addrspace 8) and null/poison pointers can be "
2045 "cast to buffer fat pointers (addrspace 7)");
2046 SplitUsers.insert(&I);
2047 return {In, ZeroOff};
2048}
2049
2050PtrParts SplitPtrStructs::visitICmpInst(ICmpInst &Cmp) {
2051 Value *Lhs = Cmp.getOperand(0);
2052 if (!isSplitFatPtr(Lhs->getType()))
2053 return {nullptr, nullptr};
2054 Value *Rhs = Cmp.getOperand(1);
2055 IRB.SetInsertPoint(&Cmp);
2056 ICmpInst::Predicate Pred = Cmp.getPredicate();
2057
2058 assert((Pred == ICmpInst::ICMP_EQ || Pred == ICmpInst::ICMP_NE) &&
2059 "Pointer comparison is only equal or unequal");
2060 auto [LhsRsrc, LhsOff] = getPtrParts(Lhs);
2061 auto [RhsRsrc, RhsOff] = getPtrParts(Rhs);
2062 Value *RsrcCmp =
2063 IRB.CreateICmp(Pred, LhsRsrc, RhsRsrc, Cmp.getName() + ".rsrc");
2064 copyMetadata(RsrcCmp, &Cmp);
2065 Value *OffCmp = IRB.CreateICmp(Pred, LhsOff, RhsOff, Cmp.getName() + ".off");
2066 copyMetadata(OffCmp, &Cmp);
2067
2068 Value *Res = nullptr;
2069 if (Pred == ICmpInst::ICMP_EQ)
2070 Res = IRB.CreateAnd(RsrcCmp, OffCmp);
2071 else if (Pred == ICmpInst::ICMP_NE)
2072 Res = IRB.CreateOr(RsrcCmp, OffCmp);
2073 copyMetadata(Res, &Cmp);
2074 Res->takeName(&Cmp);
2075 SplitUsers.insert(&Cmp);
2076 Cmp.replaceAllUsesWith(Res);
2077 return {nullptr, nullptr};
2078}
2079
2080PtrParts SplitPtrStructs::visitFreezeInst(FreezeInst &I) {
2081 if (!isSplitFatPtr(I.getType()))
2082 return {nullptr, nullptr};
2083 IRB.SetInsertPoint(&I);
2084 auto [Rsrc, Off] = getPtrParts(I.getOperand(0));
2085
2086 Value *RsrcRes = IRB.CreateFreeze(Rsrc, I.getName() + ".rsrc");
2087 copyMetadata(RsrcRes, &I);
2088 Value *OffRes = IRB.CreateFreeze(Off, I.getName() + ".off");
2089 copyMetadata(OffRes, &I);
2090 SplitUsers.insert(&I);
2091 return {RsrcRes, OffRes};
2092}
2093
2094PtrParts SplitPtrStructs::visitExtractElementInst(ExtractElementInst &I) {
2095 if (!isSplitFatPtr(I.getType()))
2096 return {nullptr, nullptr};
2097 IRB.SetInsertPoint(&I);
2098 Value *Vec = I.getVectorOperand();
2099 Value *Idx = I.getIndexOperand();
2100 auto [Rsrc, Off] = getPtrParts(Vec);
2101
2102 Value *RsrcRes = IRB.CreateExtractElement(Rsrc, Idx, I.getName() + ".rsrc");
2103 copyMetadata(RsrcRes, &I);
2104 Value *OffRes = IRB.CreateExtractElement(Off, Idx, I.getName() + ".off");
2105 copyMetadata(OffRes, &I);
2106 SplitUsers.insert(&I);
2107 return {RsrcRes, OffRes};
2108}
2109
2110PtrParts SplitPtrStructs::visitInsertElementInst(InsertElementInst &I) {
2111 // The mutated instructions temporarily don't return vectors, and so
2112 // we need the generic getType() here to avoid crashes.
2113 if (!isSplitFatPtr(cast<Instruction>(I).getType()))
2114 return {nullptr, nullptr};
2115 IRB.SetInsertPoint(&I);
2116 Value *Vec = I.getOperand(0);
2117 Value *Elem = I.getOperand(1);
2118 Value *Idx = I.getOperand(2);
2119 auto [VecRsrc, VecOff] = getPtrParts(Vec);
2120 auto [ElemRsrc, ElemOff] = getPtrParts(Elem);
2121
2122 Value *RsrcRes =
2123 IRB.CreateInsertElement(VecRsrc, ElemRsrc, Idx, I.getName() + ".rsrc");
2124 copyMetadata(RsrcRes, &I);
2125 Value *OffRes =
2126 IRB.CreateInsertElement(VecOff, ElemOff, Idx, I.getName() + ".off");
2127 copyMetadata(OffRes, &I);
2128 SplitUsers.insert(&I);
2129 return {RsrcRes, OffRes};
2130}
2131
2132PtrParts SplitPtrStructs::visitShuffleVectorInst(ShuffleVectorInst &I) {
2133 // Cast is needed for the same reason as insertelement's.
2134 if (!isSplitFatPtr(cast<Instruction>(I).getType()))
2135 return {nullptr, nullptr};
2136 IRB.SetInsertPoint(&I);
2137
2138 Value *V1 = I.getOperand(0);
2139 Value *V2 = I.getOperand(1);
2140 ArrayRef<int> Mask = I.getShuffleMask();
2141 auto [V1Rsrc, V1Off] = getPtrParts(V1);
2142 auto [V2Rsrc, V2Off] = getPtrParts(V2);
2143
2144 Value *RsrcRes =
2145 IRB.CreateShuffleVector(V1Rsrc, V2Rsrc, Mask, I.getName() + ".rsrc");
2146 copyMetadata(RsrcRes, &I);
2147 Value *OffRes =
2148 IRB.CreateShuffleVector(V1Off, V2Off, Mask, I.getName() + ".off");
2149 copyMetadata(OffRes, &I);
2150 SplitUsers.insert(&I);
2151 return {RsrcRes, OffRes};
2152}
2153
2154PtrParts SplitPtrStructs::visitPHINode(PHINode &PHI) {
2155 if (!isSplitFatPtr(PHI.getType()))
2156 return {nullptr, nullptr};
2157 IRB.SetInsertPoint(*PHI.getInsertionPointAfterDef());
2158 // Phi nodes will be handled in post-processing after we've visited every
2159 // instruction. However, instead of just returning {nullptr, nullptr},
2160 // we explicitly create the temporary extractvalue operations that are our
2161 // temporary results so that they end up at the beginning of the block with
2162 // the PHIs.
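  // That is (illustrative), for a %phi of type {ptr addrspace(8), i32} we
  // temporarily emit
  //   %phi.rsrc = extractvalue { ptr addrspace(8), i32 } %phi, 0
  //   %phi.off = extractvalue { ptr addrspace(8), i32 } %phi, 1
  // and processConditionals() later replaces these with per-part PHIs.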
2163 Value *TmpRsrc = IRB.CreateExtractValue(&PHI, 0, PHI.getName() + ".rsrc");
2164 Value *TmpOff = IRB.CreateExtractValue(&PHI, 1, PHI.getName() + ".off");
2165 Conditionals.push_back(&PHI);
2166 SplitUsers.insert(&PHI);
2167 return {TmpRsrc, TmpOff};
2168}
2169
2170PtrParts SplitPtrStructs::visitSelectInst(SelectInst &SI) {
2171 if (!isSplitFatPtr(SI.getType()))
2172 return {nullptr, nullptr};
2173 IRB.SetInsertPoint(&SI);
2174
2175 Value *Cond = SI.getCondition();
2176 Value *True = SI.getTrueValue();
2177 Value *False = SI.getFalseValue();
2178 auto [TrueRsrc, TrueOff] = getPtrParts(True);
2179 auto [FalseRsrc, FalseOff] = getPtrParts(False);
2180
2181 Value *RsrcRes =
2182 IRB.CreateSelect(Cond, TrueRsrc, FalseRsrc, SI.getName() + ".rsrc", &SI);
2183 copyMetadata(RsrcRes, &SI);
2184 Conditionals.push_back(&SI);
2185 Value *OffRes =
2186 IRB.CreateSelect(Cond, TrueOff, FalseOff, SI.getName() + ".off", &SI);
2187 copyMetadata(OffRes, &SI);
2188 SplitUsers.insert(&SI);
2189 return {RsrcRes, OffRes};
2190}
2191
2192/// Returns true if this intrinsic needs to be removed when it is
2193/// applied to `ptr addrspace(7)` values. Calls to these intrinsics are
2194/// rewritten into calls to versions of that intrinsic on the resource
2195/// descriptor.
2196static bool isRemovablePointerIntrinsic(Intrinsic::ID IID) {
2197 switch (IID) {
2198 default:
2199 return false;
2200 case Intrinsic::amdgcn_make_buffer_rsrc:
2201 case Intrinsic::ptrmask:
2202 case Intrinsic::invariant_start:
2203 case Intrinsic::invariant_end:
2204 case Intrinsic::launder_invariant_group:
2205 case Intrinsic::strip_invariant_group:
2206 case Intrinsic::memcpy:
2207 case Intrinsic::memcpy_inline:
2208 case Intrinsic::memmove:
2209 case Intrinsic::memset:
2210 case Intrinsic::memset_inline:
2211 case Intrinsic::experimental_memset_pattern:
2212 case Intrinsic::amdgcn_load_to_lds:
2213 return true;
2214 }
2215}
2216
2217PtrParts SplitPtrStructs::visitIntrinsicInst(IntrinsicInst &I) {
2218 Intrinsic::ID IID = I.getIntrinsicID();
2219 switch (IID) {
2220 default:
2221 break;
2222 case Intrinsic::amdgcn_make_buffer_rsrc: {
2223 if (!isSplitFatPtr(I.getType()))
2224 return {nullptr, nullptr};
2225 Value *Base = I.getArgOperand(0);
2226 Value *Stride = I.getArgOperand(1);
2227 Value *NumRecords = I.getArgOperand(2);
2228 Value *Flags = I.getArgOperand(3);
2229 auto *SplitType = cast<StructType>(I.getType());
2230 Type *RsrcType = SplitType->getElementType(0);
2231 Type *OffType = SplitType->getElementType(1);
2232 IRB.SetInsertPoint(&I);
2233 Value *Rsrc = IRB.CreateIntrinsic(IID, {RsrcType, Base->getType()},
2234 {Base, Stride, NumRecords, Flags});
2235 copyMetadata(Rsrc, &I);
2236 Rsrc->takeName(&I);
2237 Value *Zero = Constant::getNullValue(OffType);
2238 SplitUsers.insert(&I);
2239 return {Rsrc, Zero};
2240 }
2241 case Intrinsic::ptrmask: {
2242 Value *Ptr = I.getArgOperand(0);
2243 if (!isSplitFatPtr(Ptr->getType()))
2244 return {nullptr, nullptr};
2245 Value *Mask = I.getArgOperand(1);
2246 IRB.SetInsertPoint(&I);
2247 auto [Rsrc, Off] = getPtrParts(Ptr);
2248 if (Mask->getType() != Off->getType())
2249 reportFatalUsageError("offset width is not equal to index width of fat "
2250 "pointer (data layout not set up correctly?)");
2251 Value *OffRes = IRB.CreateAnd(Off, Mask, I.getName() + ".off");
2252 copyMetadata(OffRes, &I);
2253 SplitUsers.insert(&I);
2254 return {Rsrc, OffRes};
2255 }
2256 // Pointer annotation intrinsics that, given their object-wide nature,
2257 // operate on the resource part.
2258 case Intrinsic::invariant_start: {
2259 Value *Ptr = I.getArgOperand(1);
2260 if (!isSplitFatPtr(Ptr->getType()))
2261 return {nullptr, nullptr};
2262 IRB.SetInsertPoint(&I);
2263 auto [Rsrc, Off] = getPtrParts(Ptr);
2264 Type *NewTy = PointerType::get(I.getContext(), AMDGPUAS::BUFFER_RESOURCE);
2265 auto *NewRsrc = IRB.CreateIntrinsic(IID, {NewTy}, {I.getOperand(0), Rsrc});
2266 copyMetadata(NewRsrc, &I);
2267 NewRsrc->takeName(&I);
2268 SplitUsers.insert(&I);
2269 I.replaceAllUsesWith(NewRsrc);
2270 return {nullptr, nullptr};
2271 }
2272 case Intrinsic::invariant_end: {
2273 Value *RealPtr = I.getArgOperand(2);
2274 if (!isSplitFatPtr(RealPtr->getType()))
2275 return {nullptr, nullptr};
2276 IRB.SetInsertPoint(&I);
2277 Value *RealRsrc = getPtrParts(RealPtr).first;
2278 Value *InvPtr = I.getArgOperand(0);
2279 Value *Size = I.getArgOperand(1);
2280 Value *NewRsrc = IRB.CreateIntrinsic(IID, {RealRsrc->getType()},
2281 {InvPtr, Size, RealRsrc});
2282 copyMetadata(NewRsrc, &I);
2283 NewRsrc->takeName(&I);
2284 SplitUsers.insert(&I);
2285 I.replaceAllUsesWith(NewRsrc);
2286 return {nullptr, nullptr};
2287 }
2288 case Intrinsic::launder_invariant_group:
2289 case Intrinsic::strip_invariant_group: {
2290 Value *Ptr = I.getArgOperand(0);
2291 if (!isSplitFatPtr(Ptr->getType()))
2292 return {nullptr, nullptr};
2293 IRB.SetInsertPoint(&I);
2294 auto [Rsrc, Off] = getPtrParts(Ptr);
2295 Value *NewRsrc = IRB.CreateIntrinsic(IID, {Rsrc->getType()}, {Rsrc});
2296 copyMetadata(NewRsrc, &I);
2297 NewRsrc->takeName(&I);
2298 SplitUsers.insert(&I);
2299 return {NewRsrc, Off};
2300 }
2301 case Intrinsic::amdgcn_load_to_lds: {
2302 Value *Ptr = I.getArgOperand(0);
2303 if (!isSplitFatPtr(Ptr->getType()))
2304 return {nullptr, nullptr};
2305 IRB.SetInsertPoint(&I);
2306 auto [Rsrc, Off] = getPtrParts(Ptr);
2307 Value *LDSPtr = I.getArgOperand(1);
2308 Value *LoadSize = I.getArgOperand(2);
2309 Value *ImmOff = I.getArgOperand(3);
2310 Value *Aux = I.getArgOperand(4);
2311 Value *SOffset = IRB.getInt32(0);
2312 Instruction *NewLoad = IRB.CreateIntrinsic(
2313 Intrinsic::amdgcn_raw_ptr_buffer_load_lds, {},
2314 {Rsrc, LDSPtr, LoadSize, Off, SOffset, ImmOff, Aux});
2315 copyMetadata(NewLoad, &I);
2316 SplitUsers.insert(&I);
2317 I.replaceAllUsesWith(NewLoad);
2318 return {nullptr, nullptr};
2319 }
2320 }
2321 return {nullptr, nullptr};
2322}
2323
2324void SplitPtrStructs::processFunction(Function &F) {
2325 ST = &TM->getSubtarget<GCNSubtarget>(F);
2328 LLVM_DEBUG(dbgs() << "Splitting pointer structs in function: " << F.getName()
2329 << "\n");
2330 for (Instruction *I : Originals) {
2331 auto [Rsrc, Off] = visit(I);
2332 assert(((Rsrc && Off) || (!Rsrc && !Off)) &&
2333 "Can't have a resource but no offset");
2334 if (Rsrc)
2335 RsrcParts[I] = Rsrc;
2336 if (Off)
2337 OffParts[I] = Off;
2338 }
2339 processConditionals();
2340 killAndReplaceSplitInstructions(Originals);
2341
2342 // Clean up after ourselves to save on memory.
2343 RsrcParts.clear();
2344 OffParts.clear();
2345 SplitUsers.clear();
2346 Conditionals.clear();
2347 ConditionalTemps.clear();
2348}
2349
2350namespace {
2351class AMDGPULowerBufferFatPointers : public ModulePass {
2352public:
2353 static char ID;
2354
2355 AMDGPULowerBufferFatPointers() : ModulePass(ID) {}
2356
2357 bool run(Module &M, const TargetMachine &TM);
2358 bool runOnModule(Module &M) override;
2359
2360 void getAnalysisUsage(AnalysisUsage &AU) const override;
2361};
2362} // namespace
2363
2364/// Returns true if there are values that have a buffer fat pointer in them,
2365/// which means we'll need to perform rewrites on this function. As a side
2366/// effect, this will populate the type remapping cache.
2367static bool containsBufferFatPointers(const Function &F,
2368 BufferFatPtrToStructTypeMap *TypeMap) {
2369 bool HasFatPointers = false;
2370 for (const BasicBlock &BB : F)
2371 for (const Instruction &I : BB) {
2372 HasFatPointers |= (I.getType() != TypeMap->remapType(I.getType()));
2373 // Catch null pointer constants in loads, stores, etc.
2374 for (const Value *V : I.operand_values())
2375 HasFatPointers |= (V->getType() != TypeMap->remapType(V->getType()));
2376 }
2377 return HasFatPointers;
2378}
2379
2380static bool hasFatPointerInterface(const Function &F,
2381 BufferFatPtrToStructTypeMap *TypeMap) {
2382 Type *Ty = F.getFunctionType();
2383 return Ty != TypeMap->remapType(Ty);
2384}
2385
2386/// Move the body of `OldF` into a new function, returning it.
2387static Function *moveFunctionAdaptingType(Function *OldF, FunctionType *NewTy,
2388 ValueToValueMapTy &CloneMap) {
2389 bool IsIntrinsic = OldF->isIntrinsic();
2390 Function *NewF =
2391 Function::Create(NewTy, OldF->getLinkage(), OldF->getAddressSpace());
2392 NewF->copyAttributesFrom(OldF);
2393 NewF->copyMetadata(OldF, 0);
2394 NewF->takeName(OldF);
2395 NewF->updateAfterNameChange();
2396 NewF->setDLLStorageClass(OldF->getDLLStorageClass());
2397 OldF->getParent()->getFunctionList().insertAfter(OldF->getIterator(), NewF);
2398
2399 while (!OldF->empty()) {
2400 BasicBlock *BB = &OldF->front();
2401 BB->removeFromParent();
2402 BB->insertInto(NewF);
2403 CloneMap[BB] = BB;
2404 for (Instruction &I : *BB) {
2405 CloneMap[&I] = &I;
2406 }
2407 }
2408
2410 AttributeList OldAttrs = OldF->getAttributes();
2411
2412 for (auto [I, OldArg, NewArg] : enumerate(OldF->args(), NewF->args())) {
2413 CloneMap[&NewArg] = &OldArg;
2414 NewArg.takeName(&OldArg);
2415 Type *OldArgTy = OldArg.getType(), *NewArgTy = NewArg.getType();
2416 // Temporarily mutate type of `NewArg` to allow RAUW to work.
2417 NewArg.mutateType(OldArgTy);
2418 OldArg.replaceAllUsesWith(&NewArg);
2419 NewArg.mutateType(NewArgTy);
2420
2421 AttributeSet ArgAttr = OldAttrs.getParamAttrs(I);
2422 // Intrinsics get their attributes fixed later.
2423 if (OldArgTy != NewArgTy && !IsIntrinsic)
2424 ArgAttr = ArgAttr.removeAttributes(
2425 NewF->getContext(),
2426 AttributeFuncs::typeIncompatible(NewArgTy, ArgAttr));
2427 ArgAttrs.push_back(ArgAttr);
2428 }
2429 AttributeSet RetAttrs = OldAttrs.getRetAttrs();
2430 if (OldF->getReturnType() != NewF->getReturnType() && !IsIntrinsic)
2431 RetAttrs = RetAttrs.removeAttributes(
2432 NewF->getContext(),
2433 AttributeFuncs::typeIncompatible(NewF->getReturnType(), RetAttrs));
2434 NewF->setAttributes(AttributeList::get(
2435 NewF->getContext(), OldAttrs.getFnAttrs(), RetAttrs, ArgAttrs));
2436 return NewF;
2437}
2438
2439static void makeCloneInPraceMap(Function *F, ValueToValueMapTy &CloneMap) {
2440 for (Argument &A : F->args())
2441 CloneMap[&A] = &A;
2442 for (BasicBlock &BB : *F) {
2443 CloneMap[&BB] = &BB;
2444 for (Instruction &I : BB)
2445 CloneMap[&I] = &I;
2446 }
2447}
2448
2449bool AMDGPULowerBufferFatPointers::run(Module &M, const TargetMachine &TM) {
2450 bool Changed = false;
2451 const DataLayout &DL = M.getDataLayout();
2452 // Record the functions which need to be remapped.
2453 // The second element of the pair indicates whether the function has to have
2454 // its arguments or return types adjusted.
2456
2457 LLVMContext &Ctx = M.getContext();
2458
2459 BufferFatPtrToStructTypeMap StructTM(DL);
2460 BufferFatPtrToIntTypeMap IntTM(DL);
2461 for (const GlobalVariable &GV : M.globals()) {
2462 if (GV.getAddressSpace() == AMDGPUAS::BUFFER_FAT_POINTER) {
2463 // FIXME: Use DiagnosticInfo unsupported but it requires a Function
2464 Ctx.emitError("global variables with a buffer fat pointer address "
2465 "space (7) are not supported");
2466 continue;
2467 }
2468
2469 Type *VT = GV.getValueType();
2470 if (VT != StructTM.remapType(VT)) {
2471 // FIXME: Use DiagnosticInfo unsupported but it requires a Function
2472 Ctx.emitError("global variables that contain buffer fat pointers "
2473 "(address space 7 pointers) are unsupported. Use "
2474 "buffer resource pointers (address space 8) instead");
2475 continue;
2476 }
2477 }
2478
2479 {
2480 // Collect all constant exprs and aggregates referenced by any function.
2482 for (Function &F : M.functions())
2483 for (Instruction &I : instructions(F))
2484 for (Value *Op : I.operands())
2485 if (isa<ConstantExpr, ConstantAggregate>(Op))
2486 Worklist.push_back(cast<Constant>(Op));
2487
2488 // Recursively look for any referenced buffer pointer constants.
2490 SetVector<Constant *> BufferFatPtrConsts;
2491 while (!Worklist.empty()) {
2492 Constant *C = Worklist.pop_back_val();
2493 if (!Visited.insert(C).second)
2494 continue;
2495 if (isBufferFatPtrOrVector(C->getType()))
2496 BufferFatPtrConsts.insert(C);
2497 for (Value *Op : C->operands())
2498 if (isa<ConstantExpr, ConstantAggregate>(Op))
2499 Worklist.push_back(cast<Constant>(Op));
2500 }
2501
2502 // Expand all constant expressions using fat buffer pointers to
2503 // instructions.
2504 convertUsersOfConstantsToInstructions(
2505 BufferFatPtrConsts.getArrayRef(), /*RestrictToFunc=*/nullptr,
2506 /*RemoveDeadConstants=*/false, /*IncludeSelf=*/true);
2507 }
2508
2509 StoreFatPtrsAsIntsAndExpandMemcpyVisitor MemOpsRewrite(&IntTM, DL,
2510 M.getContext(), &TM);
2511 LegalizeBufferContentTypesVisitor BufferContentsTypeRewrite(DL,
2512 M.getContext());
2513 for (Function &F : M.functions()) {
2514 bool InterfaceChange = hasFatPointerInterface(F, &StructTM);
2515 bool BodyChanges = containsBufferFatPointers(F, &StructTM);
2516 Changed |= MemOpsRewrite.processFunction(F);
2517 if (InterfaceChange || BodyChanges) {
2518 NeedsRemap.push_back(std::make_pair(&F, InterfaceChange));
2519 Changed |= BufferContentsTypeRewrite.processFunction(F);
2520 }
2521 }
2522 if (NeedsRemap.empty())
2523 return Changed;
2524
2525 SmallVector<Function *> NeedsPostProcess;
2526 SmallVector<Function *> Intrinsics;
2527 // Keep one big map so as to memoize constants across functions.
2528 ValueToValueMapTy CloneMap;
2529 FatPtrConstMaterializer Materializer(&StructTM, CloneMap);
2530
2531 ValueMapper LowerInFuncs(CloneMap, RF_None, &StructTM, &Materializer);
2532 for (auto [F, InterfaceChange] : NeedsRemap) {
2533 Function *NewF = F;
2534 if (InterfaceChange)
2535 NewF = moveFunctionAdaptingType(
2536 F, cast<FunctionType>(StructTM.remapType(F->getFunctionType())),
2537 CloneMap);
2538 else
2539 makeCloneInPraceMap(F, CloneMap);
2540 LowerInFuncs.remapFunction(*NewF);
2541 if (NewF->isIntrinsic())
2542 Intrinsics.push_back(NewF);
2543 else
2544 NeedsPostProcess.push_back(NewF);
2545 if (InterfaceChange) {
2546 F->replaceAllUsesWith(NewF);
2547 F->eraseFromParent();
2548 }
2549 Changed = true;
2550 }
2551 StructTM.clear();
2552 IntTM.clear();
2553 CloneMap.clear();
2554
2555 SplitPtrStructs Splitter(DL, M.getContext(), &TM);
2556 for (Function *F : NeedsPostProcess)
2557 Splitter.processFunction(*F);
2558 for (Function *F : Intrinsics) {
2559 if (isRemovablePointerIntrinsic(F->getIntrinsicID())) {
2560 F->eraseFromParent();
2561 } else {
2562 std::optional<Function *> NewF = Intrinsic::remangleIntrinsicFunction(F);
2563 if (NewF)
2564 F->replaceAllUsesWith(*NewF);
2565 }
2566 }
2567 return Changed;
2568}
2569
2570bool AMDGPULowerBufferFatPointers::runOnModule(Module &M) {
2571 TargetPassConfig &TPC = getAnalysis<TargetPassConfig>();
2572 const TargetMachine &TM = TPC.getTM<TargetMachine>();
2573 return run(M, TM);
2574}
2575
2576char AMDGPULowerBufferFatPointers::ID = 0;
2577
2578char &llvm::AMDGPULowerBufferFatPointersID = AMDGPULowerBufferFatPointers::ID;
2579
2580void AMDGPULowerBufferFatPointers::getAnalysisUsage(AnalysisUsage &AU) const {
2581 AU.addRequired<TargetPassConfig>();
2582}
2583
2584#define PASS_DESC "Lower buffer fat pointer operations to buffer resources"
2585INITIALIZE_PASS_BEGIN(AMDGPULowerBufferFatPointers, DEBUG_TYPE, PASS_DESC,
2586 false, false)
2587INITIALIZE_PASS_DEPENDENCY(TargetPassConfig)
2588INITIALIZE_PASS_END(AMDGPULowerBufferFatPointers, DEBUG_TYPE, PASS_DESC, false,
2589 false)
2590#undef PASS_DESC
2591
2593 return new AMDGPULowerBufferFatPointers();
2594}
2595
2598 return AMDGPULowerBufferFatPointers().run(M, TM) ? PreservedAnalyses::none()
2599 : PreservedAnalyses::all();
2600}
@ Poison
assert(UImm &&(UImm !=~static_cast< T >(0)) &&"Invalid immediate!")
AMDGPU address space definition.
unsigned Intr
static Function * moveFunctionAdaptingType(Function *OldF, FunctionType *NewTy, ValueToValueMapTy &CloneMap)
Move the body of OldF into a new function, returning it.
static void makeCloneInPraceMap(Function *F, ValueToValueMapTy &CloneMap)
static bool isBufferFatPtrOrVector(Type *Ty)
static bool isSplitFatPtr(Type *Ty)
std::pair< Value *, Value * > PtrParts
static bool hasFatPointerInterface(const Function &F, BufferFatPtrToStructTypeMap *TypeMap)
static bool isRemovablePointerIntrinsic(Intrinsic::ID IID)
Returns true if this intrinsic needs to be removed when it is applied to ptr addrspace(7) values.
static bool containsBufferFatPointers(const Function &F, BufferFatPtrToStructTypeMap *TypeMap)
Returns true if there are values that have a buffer fat pointer in them, which means we'll need to pe...
static Value * rsrcPartRoot(Value *V)
Returns the instruction that defines the resource part of the value V.
static constexpr unsigned BufferOffsetWidth
static bool isBufferFatPtrConst(Constant *C)
static std::pair< Constant *, Constant * > splitLoweredFatBufferConst(Constant *C)
Return the ptr addrspace(8) and i32 (resource and offset parts) in a lowered buffer fat pointer const...
Rewrite undef for PHI
The AMDGPU TargetMachine interface definition for hw codegen targets.
MachineBasicBlock MachineBasicBlock::iterator DebugLoc DL
Expand Atomic instructions
Atomic ordering constants.
BlockVerifier::State From
static GCRegistry::Add< ErlangGC > A("erlang", "erlang-compatible garbage collector")
This file contains the declarations for the subclasses of Constant, which represent the different fla...
return RetTy
Returns the sub type a function will return at a given Idx Should correspond to the result type of an ExtractValue instruction executed with just that one unsigned Idx
std::string Name
uint64_t Size
AMD GCN specific subclass of TargetSubtarget.
Hexagon Common GEP
static const T * Find(StringRef S, ArrayRef< T > A)
Find KV in array using binary search.
#define F(x, y, z)
Definition: MD5.cpp:55
#define I(x, y, z)
Definition: MD5.cpp:58
This file contains the declarations for metadata subclasses.
static bool processFunction(Function &F, NVPTXTargetMachine &TM)
uint64_t IntrinsicInst * II
#define INITIALIZE_PASS_DEPENDENCY(depName)
Definition: PassSupport.h:42
#define INITIALIZE_PASS_END(passName, arg, name, cfg, analysis)
Definition: PassSupport.h:44
#define INITIALIZE_PASS_BEGIN(passName, arg, name, cfg, analysis)
Definition: PassSupport.h:39
const SmallVectorImpl< MachineOperand > & Cond
void visit(MachineFunction &MF, MachineBasicBlock &Start, std::function< void(MachineBasicBlock *)> op)
This file defines generic set operations that may be used on set's of different types,...
This file defines the SmallVector class.
#define LLVM_DEBUG(...)
Definition: Debug.h:119
static SymbolRef::Type getType(const Symbol *Sym)
Definition: TapiFile.cpp:39
@ Struct
Target-Independent Code Generator Pass Configuration Options pass.
This pass exposes codegen information to IR-level passes.
Class for arbitrary precision integers.
Definition: APInt.h:78
This class represents a conversion between pointers from one address space to another.
an instruction to allocate memory on the stack
Definition: Instructions.h:64
A container for analyses that lazily runs them and caches their results.
Definition: PassManager.h:255
Represent the analysis usage information of a pass.
AnalysisUsage & addRequired()
This class represents an incoming formal argument to a Function.
Definition: Argument.h:32
ArrayRef - Represent a constant reference to an array (0 or more elements consecutively in memory),...
Definition: ArrayRef.h:41
ArrayRef< T > slice(size_t N, size_t M) const
slice(n, m) - Chop off the first N elements of the array, and keep M elements in the array.
Definition: ArrayRef.h:191
An instruction that atomically checks whether a specified value is in a memory location,...
Definition: Instructions.h:506
AtomicOrdering getMergedOrdering() const
Returns a single ordering which is at least as strong as both the success and failure orderings for t...
Definition: Instructions.h:612
bool isVolatile() const
Return true if this is a cmpxchg from a volatile memory location.
Definition: Instructions.h:560
Align getAlign() const
Return the alignment of the memory that is being allocated by the instruction.
Definition: Instructions.h:549
bool isWeak() const
Return true if this cmpxchg may spuriously fail.
Definition: Instructions.h:567
SyncScope::ID getSyncScopeID() const
Returns the synchronization scope ID of this cmpxchg instruction.
Definition: Instructions.h:625
an instruction that atomically reads a memory location, combines it with another value,...
Definition: Instructions.h:709
Align getAlign() const
Return the alignment of the memory that is being allocated by the instruction.
Definition: Instructions.h:843
bool isVolatile() const
Return true if this is a RMW on a volatile memory location.
Definition: Instructions.h:853
@ Add
*p = old + v
Definition: Instructions.h:725
@ FAdd
*p = old + v
Definition: Instructions.h:746
@ USubCond
Subtract only if no unsigned overflow.
Definition: Instructions.h:777
@ FMinimum
*p = minimum(old, v) minimum matches the behavior of llvm.minimum.
Definition: Instructions.h:765
@ Min
*p = old <signed v ? old : v
Definition: Instructions.h:739
@ Or
*p = old | v
Definition: Instructions.h:733
@ Sub
*p = old - v
Definition: Instructions.h:727
@ And
*p = old & v
Definition: Instructions.h:729
@ Xor
*p = old ^ v
Definition: Instructions.h:735
@ USubSat
*p = usub.sat(old, v) usub.sat matches the behavior of llvm.usub.sat.
Definition: Instructions.h:781
@ FMaximum
*p = maximum(old, v) maximum matches the behavior of llvm.maximum.
Definition: Instructions.h:761
@ FSub
*p = old - v
Definition: Instructions.h:749
@ UIncWrap
Increment one up to a maximum value.
Definition: Instructions.h:769
@ Max
*p = old >signed v ? old : v
Definition: Instructions.h:737
@ UMin
*p = old <unsigned v ? old : v
Definition: Instructions.h:743
@ FMin
*p = minnum(old, v) minnum matches the behavior of llvm.minnum.
Definition: Instructions.h:757
@ UMax
*p = old >unsigned v ? old : v
Definition: Instructions.h:741
@ FMax
*p = maxnum(old, v) maxnum matches the behavior of llvm.maxnum.
Definition: Instructions.h:753
@ UDecWrap
Decrement one until a minimum value or zero.
Definition: Instructions.h:773
@ Nand
*p = ~(old & v)
Definition: Instructions.h:731
Value * getPointerOperand()
Definition: Instructions.h:886
SyncScope::ID getSyncScopeID() const
Returns the synchronization scope ID of this rmw instruction.
Definition: Instructions.h:877
Value * getValOperand()
Definition: Instructions.h:890
AtomicOrdering getOrdering() const
Returns the ordering constraint of this rmw instruction.
Definition: Instructions.h:863
LLVM_ABI AttributeSet getFnAttrs() const
The function attributes are returned.
static LLVM_ABI AttributeList get(LLVMContext &C, ArrayRef< std::pair< unsigned, Attribute > > Attrs)
Create an AttributeList with the specified parameters in it.
LLVM_ABI AttributeSet getRetAttrs() const
The attributes for the ret value are returned.
LLVM_ABI AttributeSet getParamAttrs(unsigned ArgNo) const
The attributes for the argument or parameter at the given index are returned.
LLVM_ABI AttributeSet removeAttributes(LLVMContext &C, const AttributeMask &AttrsToRemove) const
Remove the specified attributes from this set.
Definition: Attributes.cpp:973
static LLVM_ABI Attribute getWithAlignment(LLVMContext &Context, Align Alignment)
Return a uniquified Attribute object that has the specific alignment set.
Definition: Attributes.cpp:234
LLVM Basic Block Representation.
Definition: BasicBlock.h:62
LLVM_ABI void removeFromParent()
Unlink 'this' from the containing function, but do not delete it.
Definition: BasicBlock.cpp:231
LLVM_ABI void insertInto(Function *Parent, BasicBlock *InsertBefore=nullptr)
Insert unlinked basic block into a function.
Definition: BasicBlock.cpp:158
This class represents a function call, abstracting a target machine's calling convention.
Predicate
This enumeration lists the possible predicates for CmpInst subclasses.
Definition: InstrTypes.h:678
static LLVM_ABI Constant * get(StructType *T, ArrayRef< Constant * > V)
Definition: Constants.cpp:1380
static LLVM_ABI Constant * getSplat(ElementCount EC, Constant *Elt)
Return a ConstantVector with the specified constant in each element.
Definition: Constants.cpp:1474
static LLVM_ABI Constant * get(ArrayRef< Constant * > V)
Definition: Constants.cpp:1423
This is an important base class in LLVM.
Definition: Constant.h:43
static LLVM_ABI Constant * getNullValue(Type *Ty)
Constructor to create a '0' constant of arbitrary type.
Definition: Constants.cpp:373
static LLVM_ABI std::optional< DIExpression * > createFragmentExpression(const DIExpression *Expr, unsigned OffsetInBits, unsigned SizeInBits)
Create a DIExpression to describe one part of an aggregate variable that is fragmented across multipl...
This class represents an Operation in the Expression.
A parsed version of the target data layout string in and methods for querying it.
Definition: DataLayout.h:63
LLVM_ABI void insertBefore(DbgRecord *InsertBefore)
LLVM_ABI void eraseFromParent()
Record of a variable value-assignment, aka a non instruction representation of the dbg....
void setExpression(DIExpression *NewExpr)
LLVM_ABI void replaceVariableLocationOp(Value *OldValue, Value *NewValue, bool AllowEmpty=false)
A debug info location.
Definition: DebugLoc.h:124
iterator find(const_arg_type_t< KeyT > Val)
Definition: DenseMap.h:177
iterator end()
Definition: DenseMap.h:87
Implements a dense probed hash-table based set.
Definition: DenseSet.h:263
This instruction extracts a single (scalar) element from a VectorType value.
static LLVM_ABI FixedVectorType * get(Type *ElementType, unsigned NumElts)
Definition: Type.cpp:803
This class represents a freeze function that returns random concrete value if an operand is either a ...
static Function * Create(FunctionType *Ty, LinkageTypes Linkage, unsigned AddrSpace, const Twine &N="", Module *M=nullptr)
Definition: Function.h:166
bool empty() const
Definition: Function.h:857
const BasicBlock & front() const
Definition: Function.h:858
iterator_range< arg_iterator > args()
Definition: Function.h:890
AttributeList getAttributes() const
Return the attribute list for this Function.
Definition: Function.h:352
bool isIntrinsic() const
isIntrinsic - Returns true if the function's name starts with "llvm.".
Definition: Function.h:249
void setAttributes(AttributeList Attrs)
Set the attribute list for this Function.
Definition: Function.h:355
LLVMContext & getContext() const
getContext - Return a reference to the LLVMContext associated with this function.
Definition: Function.cpp:359
void updateAfterNameChange()
Update internal caches that depend on the function name (such as the intrinsic ID and libcall cache).
Definition: Function.cpp:935
Type * getReturnType() const
Returns the type of the ret val.
Definition: Function.h:214
void copyAttributesFrom(const Function *Src)
copyAttributesFrom - copy all additional attributes (those not needed to create a Function) from the ...
Definition: Function.cpp:856
static GEPNoWrapFlags noUnsignedWrap()
an instruction for type-safe pointer arithmetic to access elements of arrays and structs
Definition: Instructions.h:949
LLVM_ABI void copyMetadata(const GlobalObject *Src, unsigned Offset)
Copy metadata from Src, adjusting offsets by Offset.
Definition: Metadata.cpp:1840
LinkageTypes getLinkage() const
Definition: GlobalValue.h:548
void setDLLStorageClass(DLLStorageClassTypes C)
Definition: GlobalValue.h:286
unsigned getAddressSpace() const
Definition: GlobalValue.h:207
Module * getParent()
Get the module that this global value is contained inside of...
Definition: GlobalValue.h:663
DLLStorageClassTypes getDLLStorageClass() const
Definition: GlobalValue.h:277
This instruction compares its operands according to the predicate given to the constructor.
This provides a uniform API for creating instructions and inserting them into a basic block: either a...
Definition: IRBuilder.h:2780
This instruction inserts a single (scalar) element into a VectorType value.
InstSimplifyFolder - Use InstructionSimplify to fold operations to existing values.
Base class for instruction visitors.
Definition: InstVisitor.h:78
RetTy visitFreezeInst(FreezeInst &I)
Definition: InstVisitor.h:201
RetTy visitPtrToIntInst(PtrToIntInst &I)
Definition: InstVisitor.h:185
RetTy visitExtractElementInst(ExtractElementInst &I)
Definition: InstVisitor.h:192
RetTy visitMemCpyInst(MemCpyInst &I)
Definition: InstVisitor.h:207
RetTy visitIntrinsicInst(IntrinsicInst &I)
Definition: InstVisitor.h:214
RetTy visitShuffleVectorInst(ShuffleVectorInst &I)
Definition: InstVisitor.h:194
RetTy visitAtomicCmpXchgInst(AtomicCmpXchgInst &I)
Definition: InstVisitor.h:171
RetTy visitIntToPtrInst(IntToPtrInst &I)
Definition: InstVisitor.h:187
RetTy visitPHINode(PHINode &I)
Definition: InstVisitor.h:175
RetTy visitStoreInst(StoreInst &I)
Definition: InstVisitor.h:170
RetTy visitInsertElementInst(InsertElementInst &I)
Definition: InstVisitor.h:193
RetTy visitMemMoveInst(MemMoveInst &I)
Definition: InstVisitor.h:208
RetTy visitAtomicRMWInst(AtomicRMWInst &I)
Definition: InstVisitor.h:172
RetTy visitAddrSpaceCastInst(AddrSpaceCastInst &I)
Definition: InstVisitor.h:189
RetTy visitAllocaInst(AllocaInst &I)
Definition: InstVisitor.h:168
RetTy visitICmpInst(ICmpInst &I)
Definition: InstVisitor.h:166
RetTy visitPtrToAddrInst(PtrToAddrInst &I)
Definition: InstVisitor.h:186
RetTy visitMemSetPatternInst(MemSetPatternInst &I)
Definition: InstVisitor.h:204
RetTy visitMemSetInst(MemSetInst &I)
Definition: InstVisitor.h:203
RetTy visitSelectInst(SelectInst &I)
Definition: InstVisitor.h:190
RetTy visitGetElementPtrInst(GetElementPtrInst &I)
Definition: InstVisitor.h:174
void visitInstruction(Instruction &I)
Definition: InstVisitor.h:275
RetTy visitLoadInst(LoadInst &I)
Definition: InstVisitor.h:169
LLVM_ABI Instruction * clone() const
Create a copy of 'this' instruction that is identical in all ways except the following:
LLVM_ABI void setAAMetadata(const AAMDNodes &N)
Sets the AA metadata on this instruction from the AAMDNodes structure.
Definition: Metadata.cpp:1804
LLVM_ABI InstListType::iterator eraseFromParent()
This method unlinks 'this' from the containing basic block and deletes it.
LLVM_ABI const Function * getFunction() const
Return the function this instruction belongs to.
Definition: Instruction.cpp:82
MDNode * getMetadata(unsigned KindID) const
Get the metadata of given kind attached to this Instruction.
Definition: Instruction.h:428
LLVM_ABI AAMDNodes getAAMetadata() const
Returns the AA metadata for this instruction.
Definition: Metadata.cpp:1789
LLVM_ABI const DataLayout & getDataLayout() const
Get the data layout of the module this instruction belongs to.
Definition: Instruction.cpp:86
This class represents a cast from an integer to a pointer.
static LLVM_ABI IntegerType * get(LLVMContext &C, unsigned NumBits)
This static method is the primary way of constructing an IntegerType.
Definition: Type.cpp:319
A wrapper class for inspecting calls to intrinsic functions.
Definition: IntrinsicInst.h:49
This is an important class for using LLVM in a threaded context.
Definition: LLVMContext.h:68
LLVM_ABI void emitError(const Instruction *I, const Twine &ErrorStr)
emitError - Emit an error message to the currently installed error handler with optional location inf...
An instruction for reading from memory.
Definition: Instructions.h:180
unsigned getPointerAddressSpace() const
Returns the address space of the pointer operand.
Definition: Instructions.h:265
Value * getPointerOperand()
Definition: Instructions.h:259
bool isVolatile() const
Return true if this is a load from a volatile memory location.
Definition: Instructions.h:209
void setAtomic(AtomicOrdering Ordering, SyncScope::ID SSID=SyncScope::System)
Sets the ordering constraint and the synchronization scope ID of this load instruction.
Definition: Instructions.h:245
AtomicOrdering getOrdering() const
Returns the ordering constraint of this load instruction.
Definition: Instructions.h:224
Type * getPointerOperandType() const
Definition: Instructions.h:262
void setVolatile(bool V)
Specify whether this is a volatile load or not.
Definition: Instructions.h:212
SyncScope::ID getSyncScopeID() const
Returns the synchronization scope ID of this load instruction.
Definition: Instructions.h:234
Align getAlign() const
Return the alignment of the access that is being performed.
Definition: Instructions.h:215
This class wraps the llvm.memcpy intrinsic.
unsigned getDestAddressSpace() const
This class wraps the llvm.memmove intrinsic.
This class wraps the llvm.memset and llvm.memset.inline intrinsics.
This class wraps the llvm.experimental.memset.pattern intrinsic.
unsigned getSourceAddressSpace() const
ModulePass class - This class is used to implement unstructured interprocedural optimizations and ana...
Definition: Pass.h:255
virtual bool runOnModule(Module &M)=0
runOnModule - Virtual method overriden by subclasses to process the module being operated on.
A Module instance is used to store all the information related to an LLVM module.
Definition: Module.h:67
const FunctionListType & getFunctionList() const
Get the Module's list of functions (constant).
Definition: Module.h:596
MutableArrayRef - Represent a mutable reference to an array (0 or more elements consecutively in memory), i.e. a start pointer and a length.
Definition: ArrayRef.h:303
virtual void getAnalysisUsage(AnalysisUsage &) const
getAnalysisUsage - This function should be overridden by passes that need analysis information to do their job.
Definition: Pass.cpp:112
static LLVM_ABI PoisonValue * get(Type *T)
Static factory methods - Return a 'poison' object of the specified type.
Definition: Constants.cpp:1885
A set of analyses that are preserved following a run of a transformation pass.
Definition: Analysis.h:112
static PreservedAnalyses none()
Convenience factory function for the empty preserved set.
Definition: Analysis.h:115
static PreservedAnalyses all()
Construct a special preserved set that preserves all passes.
Definition: Analysis.h:118
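A minimal new-pass-manager sketch (the pass name and the Changed flag are placeholders) showing the usual way none() and all() are returned from a module pass:

#include "llvm/IR/Module.h"
#include "llvm/IR/PassManager.h"
using namespace llvm;

struct ExampleModulePass : PassInfoMixin<ExampleModulePass> {
  PreservedAnalyses run(Module &M, ModuleAnalysisManager &) {
    bool Changed = false; // hypothetical: set by the actual transformation
    // Report either "nothing is preserved" or "everything is preserved".
    return Changed ? PreservedAnalyses::none() : PreservedAnalyses::all();
  }
};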
This class represents a cast from a pointer to an address (non-capturing ptrtoint).
Value * getPointerOperand()
Gets the pointer operand.
This class represents a cast from a pointer to an integer.
Value * getPointerOperand()
Gets the pointer operand.
This class represents the LLVM 'select' instruction.
A vector that has set insertion semantics.
Definition: SetVector.h:59
ArrayRef< value_type > getArrayRef() const
Definition: SetVector.h:90
bool insert(const value_type &X)
Insert a new element into the SetVector.
Definition: SetVector.h:168
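A short sketch of why SetVector suits worklists: it deduplicates like a set but iterates in first-insertion order, which keeps transformations deterministic (the element type here is arbitrary):

#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/SetVector.h"
using namespace llvm;

static unsigned countUnique(ArrayRef<int> In) {
  SetVector<int> Seen;
  for (int V : In)
    Seen.insert(V);                           // insert() returns false for duplicates
  ArrayRef<int> Ordered = Seen.getArrayRef(); // first-insertion order
  return Ordered.size();
}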
This instruction constructs a fixed permutation of two input vectors.
A templated base class for SmallPtrSet which provides the typesafe interface that is common across all small sizes.
Definition: SmallPtrSet.h:380
std::pair< iterator, bool > insert(PtrType Ptr)
Inserts Ptr if and only if there is no element in the container equal to Ptr.
Definition: SmallPtrSet.h:401
SmallPtrSet - This class implements a set which is optimized for holding SmallSize or less elements.
Definition: SmallPtrSet.h:541
SmallString - A SmallString is just a SmallVector with methods and accessors that make it work better as a string (e.g. operator+ etc).
Definition: SmallString.h:26
bool empty() const
Definition: SmallVector.h:82
size_t size() const
Definition: SmallVector.h:79
This class consists of common code factored out of the SmallVector class to reduce code duplication based on the SmallVector 'N' template parameter.
Definition: SmallVector.h:574
void push_back(const T &Elt)
Definition: SmallVector.h:414
This is a 'vector' (really, a variable-sized array), optimized for the case when the array is small.
Definition: SmallVector.h:1197
An instruction for storing to memory.
Definition: Instructions.h:296
Align getAlign() const
Definition: Instructions.h:338
Value * getValueOperand()
Definition: Instructions.h:383
Value * getPointerOperand()
Definition: Instructions.h:386
Used to lazily calculate structure layout information for a target machine, based on the DataLayout structure.
Definition: DataLayout.h:626
MutableArrayRef< TypeSize > getMemberOffsets()
Definition: DataLayout.h:649
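A sketch of inspecting per-member offsets through StructLayout; DataLayout::getStructLayout (not listed above) is assumed as the way to obtain the layout object:

#include "llvm/ADT/STLExtras.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/DerivedTypes.h"
#include "llvm/Support/raw_ostream.h"
using namespace llvm;

static void printMemberOffsets(const DataLayout &DL, StructType *STy) {
  const StructLayout *SL = DL.getStructLayout(STy);
  for (const auto &En : enumerate(SL->getMemberOffsets()))
    errs() << "member " << En.index() << " starts at byte "
           << En.value().getFixedValue() << "\n";
}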
Class to represent struct types.
Definition: DerivedTypes.h:218
static LLVM_ABI StructType * get(LLVMContext &Context, ArrayRef< Type * > Elements, bool isPacked=false)
This static method is the primary way to create a literal StructType.
Definition: Type.cpp:414
static LLVM_ABI StructType * create(LLVMContext &Context, StringRef Name)
This creates an identified struct.
Definition: Type.cpp:620
bool isLiteral() const
Return true if this type is uniqued by structural equivalence, false if it is a struct definition.
Definition: DerivedTypes.h:290
Type * getElementType(unsigned N) const
Definition: DerivedTypes.h:369
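As a sketch (assuming only the {ptr addrspace(8), i32} shape described in this file's header comment), a literal struct pairing a buffer resource with its 32-bit offset can be built like this:

#include "llvm/IR/DerivedTypes.h"
using namespace llvm;

static StructType *getRsrcOffsetPairTy(LLVMContext &Ctx) {
  Type *RsrcTy = PointerType::get(Ctx, /*AddressSpace=*/8); // buffer resource
  Type *OffTy = Type::getInt32Ty(Ctx);                      // 32-bit offset
  // A literal (uniqued) struct; StructType::create would make an identified one.
  return StructType::get(Ctx, {RsrcTy, OffTy});
}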
Primary interface to the complete machine description for the target machine.
Definition: TargetMachine.h:83
Target-Independent Code Generator Pass Configuration Options.
TMC & getTM() const
Get the right type of TargetMachine for this target.
Twine - A lightweight data structure for efficiently representing the concatenation of temporary values as strings.
Definition: Twine.h:82
The instances of the Type class are immutable: once they are created, they are never changed.
Definition: Type.h:45
bool isVectorTy() const
True if this is an instance of VectorType.
Definition: Type.h:273
static LLVM_ABI IntegerType * getInt8Ty(LLVMContext &C)
static LLVM_ABI IntegerType * getInt32Ty(LLVMContext &C)
Type * getArrayElementType() const
Definition: Type.h:408
static LLVM_ABI IntegerType * getIntNTy(LLVMContext &C, unsigned N)
ArrayRef< Type * > subtypes() const
Definition: Type.h:365
bool isSingleValueType() const
Return true if the type is a valid type for a register in codegen.
Definition: Type.h:296
unsigned getNumContainedTypes() const
Return the number of types in the derived type.
Definition: Type.h:387
LLVM_ABI Type * getWithNewBitWidth(unsigned NewBitWidth) const
Given an integer or vector type, change the lane bitwidth to NewBitWidth, whilst keeping the old number of lanes.
LLVMContext & getContext() const
Return the LLVMContext in which this type was uniqued.
Definition: Type.h:128
static LLVM_ABI IntegerType * getInt16Ty(LLVMContext &C)
LLVM_ABI unsigned getScalarSizeInBits() const LLVM_READONLY
If this is a vector type, return the getPrimitiveSizeInBits value for the element type.
bool isIntegerTy() const
True if this is an instance of IntegerType.
Definition: Type.h:240
Type * getContainedType(unsigned i) const
This method is used to implement the type iterator (defined at the end of the file).
Definition: Type.h:381
LLVM_ABI unsigned getIntegerBitWidth() const
Type * getScalarType() const
If this is a vector type, return the element type, otherwise return 'this'.
Definition: Type.h:352
LLVM_ABI Type * getWithNewType(Type *EltTy) const
Given vector type, change the element type, whilst keeping the old number of elements.
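A hypothetical helper (the 160-bit width comes from the fat-pointer description in this file's header; the function name is invented) showing how getIntNTy and getWithNewType combine to derive an integer in-memory type for a pointer or a vector of pointers:

#include "llvm/IR/DerivedTypes.h"
using namespace llvm;

static Type *getIntegerMemoryType(Type *Ty) {
  Type *IntTy = Type::getIntNTy(Ty->getContext(), 160);
  if (Ty->isVectorTy())
    return Ty->getWithNewType(IntTy); // <N x ptr> -> <N x i160>
  return IntTy;                       // ptr -> i160
}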
static LLVM_ABI UndefValue * get(Type *T)
Static factory methods - Return an 'undef' object of the specified type.
Definition: Constants.cpp:1866
A Use represents the edge between a Value definition and its users.
Definition: Use.h:35
void setOperand(unsigned i, Value *Val)
Definition: User.h:237
Value * getOperand(unsigned i) const
Definition: User.h:232
static LLVM_ABI ValueAsMetadata * get(Value *V)
Definition: Metadata.cpp:502
This is a class that can be implemented by clients to remap types when cloning constants and instructions.
Definition: ValueMapper.h:45
virtual Type * remapType(Type *SrcTy)=0
The client should implement this method if they want to remap types while mapping values.
void clear()
Definition: ValueMap.h:149
Context for (re-)mapping values (and metadata).
Definition: ValueMapper.h:163
This is a class that can be implemented by clients to materialize Values on demand.
Definition: ValueMapper.h:58
virtual Value * materialize(Value *V)=0
This method can be implemented to generate a mapped Value on demand.
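The minimal shape of a client-provided type remapper, sketched with an identity mapping as a placeholder; a real remapper would translate the types it cares about and leave everything else untouched:

#include "llvm/Transforms/Utils/ValueMapper.h"
using namespace llvm;

struct IdentityTypeRemapper final : ValueMapTypeRemapper {
  Type *remapType(Type *SrcTy) override {
    return SrcTy; // placeholder: return a translated type here
  }
};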
LLVM Value Representation.
Definition: Value.h:75
Type * getType() const
All values are typed, get the type of this value.
Definition: Value.h:256
LLVM_ABI void replaceAllUsesWith(Value *V)
Change all uses of this to point to a new Value.
Definition: Value.cpp:546
LLVM_ABI StringRef getName() const
Return a constant reference to the value's name.
Definition: Value.cpp:322
LLVM_ABI void takeName(Value *V)
Transfer the name from V to this value.
Definition: Value.cpp:396
Value handle that is nullable, but tries to track the Value.
Definition: ValueHandle.h:205
constexpr ScalarTy getFixedValue() const
Definition: TypeSize.h:203
self_iterator getIterator()
Definition: ilist_node.h:134
iterator insertAfter(iterator where, pointer New)
Definition: ilist.h:174
#define llvm_unreachable(msg)
Marks that the current location is not supposed to be reachable.
@ BUFFER_FAT_POINTER
Address space for 160-bit buffer fat pointers.
@ BUFFER_RESOURCE
Address space for 128-bit buffer resources.
constexpr char Args[]
Key for Kernel::Metadata::mArgs.
LLVM_ABI AttributeMask typeIncompatible(Type *Ty, AttributeSet AS, AttributeSafetyKind ASK=ASK_ALL)
Which attributes cannot be applied to a type.
constexpr std::underlying_type_t< E > Mask()
Get a bitmask with 1s in all places up to the high-order bit of E's largest value.
Definition: BitmaskEnum.h:126
@ Entry
Definition: COFF.h:862
unsigned ID
LLVM IR allows the use of arbitrary numbers as calling convention identifiers.
Definition: CallingConv.h:24
@ C
The default llvm calling convention, compatible with C.
Definition: CallingConv.h:34
LLVM_ABI std::optional< Function * > remangleIntrinsicFunction(Function *F)
bool match(Val *V, const Pattern &P)
Definition: PatternMatch.h:49
is_zero m_Zero()
Match any null constant or a vector with all elements equal to 0.
Definition: PatternMatch.h:612
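A tiny sketch of the PatternMatch idiom: m_Zero() accepts a scalar zero or an all-zero vector, so the same check covers both cases (the helper name is illustrative):

#include "llvm/IR/PatternMatch.h"
#include "llvm/IR/Value.h"
using namespace llvm;

static bool isZeroOffset(Value *Off) {
  return PatternMatch::match(Off, PatternMatch::m_Zero());
}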
SmallVector< DbgVariableRecord * > getDVRAssignmentMarkers(const Instruction *Inst)
Return a range of dbg_assign records for which Inst performs the assignment they encode.
Definition: DebugInfo.h:201
PointerTypeMap run(const Module &M)
Compute the PointerTypeMap for the module M.
@ FalseVal
Definition: TGLexer.h:59
This is an optimization pass for GlobalISel generic memory operations.
Definition: AddressRanges.h:18
@ Offset
Definition: DWP.cpp:477
@ Length
Definition: DWP.cpp:477
detail::zippy< detail::zip_shortest, T, U, Args... > zip(T &&t, U &&u, Args &&...args)
zip iterator for two or more iterable types.
Definition: STLExtras.h:860
LLVM_ABI void findDbgValues(Value *V, SmallVectorImpl< DbgVariableRecord * > &DbgVariableRecords)
Finds the dbg.values describing a value.
Definition: DebugInfo.cpp:124
ModulePass * createAMDGPULowerBufferFatPointersPass()
auto enumerate(FirstRange &&First, RestRanges &&...Rest)
Given two or more input ranges, returns a new range whose values are tuples (A, B, C, ...), where A is the 0-based index of the item in the sequence and B, C, ... are the corresponding values from the input ranges.
Definition: STLExtras.h:2491
LLVM_ABI void expandMemSetPatternAsLoop(MemSetPatternInst *MemSet)
Expand MemSetPattern as a loop. MemSet is not deleted.
LLVM_ABI void copyMetadataForLoad(LoadInst &Dest, const LoadInst &Source)
Copy the metadata from the source instruction to the destination (the replacement for the source instruction).
Definition: Local.cpp:3090
bool set_is_subset(const S1Ty &S1, const S2Ty &S2)
set_is_subset(A, B) - Return true iff every element of A is also in B.
iterator_range< early_inc_iterator_impl< detail::IterOfRange< RangeT > > > make_early_inc_range(RangeT &&Range)
Make a range that does early increment to allow mutation of the underlying range without disrupting iteration.
Definition: STLExtras.h:663
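A sketch of the usual make_early_inc_range use: the iterator is advanced before the body runs, so the body may erase the current instruction (isInstructionTriviallyDead from llvm/Transforms/Utils/Local.h is assumed as the dead-ness check):

#include "llvm/ADT/STLExtras.h"
#include "llvm/IR/BasicBlock.h"
#include "llvm/Transforms/Utils/Local.h"
using namespace llvm;

static void eraseTriviallyDead(BasicBlock &BB) {
  for (Instruction &I : make_early_inc_range(BB))
    if (isInstructionTriviallyDead(&I))
      I.eraseFromParent(); // safe: the loop already holds the next iterator
}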
LLVM_ABI bool convertUsersOfConstantsToInstructions(ArrayRef< Constant * > Consts, Function *RestrictToFunc=nullptr, bool RemoveDeadConstants=true, bool IncludeSelf=false)
Replace constant expressions users of the given constants with instructions.
bool any_of(R &&range, UnaryPredicate P)
Provide wrappers to std::any_of which take ranges instead of having to pass begin/end explicitly.
Definition: STLExtras.h:1751
LLVM_ABI Value * emitGEPOffset(IRBuilderBase *Builder, const DataLayout &DL, User *GEP, bool NoAssumptions=false)
Given a getelementptr instruction/constantexpr, emit the code necessary to compute the offset from the base pointer (without adding in the base pointer).
Definition: Local.cpp:22
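A hedged sketch of how emitGEPOffset is typically called when a GEP must be rewritten as integer offset arithmetic (the wrapper function and where its result is used are hypothetical; the include path is assumed to be llvm/Analysis/Utils/Local.h):

#include "llvm/Analysis/Utils/Local.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

static Value *computeGEPOffset(IRBuilder<> &Builder, const DataLayout &DL,
                               GetElementPtrInst *GEP) {
  // Emits the index arithmetic and returns the byte offset; the base pointer
  // itself is not added in.
  return emitGEPOffset(&Builder, DL, GEP);
}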
constexpr bool isPowerOf2_32(uint32_t Value)
Return true if the argument is a power of two > 0.
Definition: MathExtras.h:288
@ RF_None
Definition: ValueMapper.h:75
LLVM_ABI raw_ostream & dbgs()
dbgs() - This returns a reference to a raw_ostream for debugging messages.
Definition: Debug.cpp:207
SmallVector< ValueTypeFromRangeType< R >, Size > to_vector(R &&Range)
Given a range of type R, iterate the entire range and return a SmallVector containing the range's elements.
Definition: SmallVector.h:1300
char & AMDGPULowerBufferFatPointersID
AtomicOrdering
Atomic ordering for LLVM's memory model.
S1Ty set_difference(const S1Ty &S1, const S2Ty &S2)
set_difference(A, B) - Return A - B
Definition: SetOperations.h:93
iterator_range< pointer_iterator< WrappedIteratorT > > make_pointer_range(RangeT &&Range)
Definition: iterator.h:363
Align commonAlignment(Align A, uint64_t Offset)
Returns the alignment that satisfies both alignments.
Definition: Alignment.h:212
LLVM_ABI void expandMemCpyAsLoop(MemCpyInst *MemCpy, const TargetTransformInfo &TTI, ScalarEvolution *SE=nullptr)
Expand MemCpy as a loop. MemCpy is not deleted.
LLVM_ABI void expandMemSetAsLoop(MemSetInst *MemSet)
Expand MemSet as a loop. MemSet is not deleted.
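A sketch of the expected calling pattern for these expansion helpers: they emit the replacement loop but deliberately leave the original intrinsic in place, so the caller erases it afterwards:

#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/IR/IntrinsicInst.h"
#include "llvm/Transforms/Utils/LowerMemIntrinsics.h"
using namespace llvm;

static void lowerMemCpyToLoop(MemCpyInst *MemCpy,
                              const TargetTransformInfo &TTI) {
  expandMemCpyAsLoop(MemCpy, TTI); // emits an explicit load/store loop
  MemCpy->eraseFromParent();       // the helper does not delete the call
}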
LLVM_ABI void reportFatalUsageError(Error Err)
Report a fatal error that does not indicate a bug in LLVM.
Definition: Error.cpp:180
A collection of metadata nodes that might be associated with a memory access used by the alias-analysis infrastructure.
Definition: Metadata.h:760
LLVM_ABI AAMDNodes adjustForAccess(unsigned AccessSize)
Create a new AAMDNode for accessing AccessSize bytes of this AAMDNode.
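A sketch (both loads and the byte count are placeholders) of how adjustForAccess is used when a wide memory access is split: the narrowed access inherits AA metadata that is valid for the smaller size:

#include "llvm/IR/Instructions.h"
#include "llvm/IR/Metadata.h"
using namespace llvm;

static void copyNarrowedAAInfo(const LoadInst &Wide, LoadInst &Narrow,
                               unsigned NarrowAccessBytes) {
  AAMDNodes AA = Wide.getAAMetadata().adjustForAccess(NarrowAccessBytes);
  Narrow.setAAMetadata(AA);
}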
PreservedAnalyses run(Module &M, ModuleAnalysisManager &AM)
This struct is a compact representation of a valid (non-zero power of two) alignment.
Definition: Alignment.h:39