1//===-- AMDGPULowerModuleLDSPass.cpp ------------------------------*- C++ -*-=//
2//
3// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
4// See https://llvm.org/LICENSE.txt for license information.
5// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
6//
7//===----------------------------------------------------------------------===//
8//
9// This pass eliminates local data store, LDS, uses from non-kernel functions.
10// LDS is contiguous memory allocated per kernel execution.
11//
12// Background.
13//
14// The programming model is global variables, or equivalently function local
15// static variables, accessible from kernels or other functions. For uses from
16// kernels this is straightforward - assign an integer to the kernel for the
17// memory required by all the variables combined, allocate them within that.
18// For uses from functions there are performance tradeoffs to choose between.
19//
20// This model means the GPU runtime can specify the amount of memory allocated.
21// If this is more than the kernel assumed, the excess can be made available
22// using a language specific feature, which IR represents as a variable with
23// no initializer. This feature is referred to here as "Dynamic LDS" and is
24// lowered slightly differently to the normal case.
25//
26// Consequences of this GPU feature:
27// - memory is limited and exceeding it halts compilation
28// - a global accessed by one kernel exists independent of other kernels
29// - a global exists independent of simultaneous execution of the same kernel
30// - the address of the global may be different from different kernels as they
31// do not alias, which permits only allocating variables they use
32// - if the address is allowed to differ, functions need help to find it
33//
34// Uses from kernels are implemented here by grouping them in a per-kernel
35// struct instance. This duplicates the variables, accurately modelling their
36// aliasing properties relative to a single global representation. It also
37// permits control over alignment via padding.
38//
39// Uses from functions are more complicated and the primary purpose of this
40// IR pass. Several different lowerings are chosen between to meet requirements
41// to avoid allocating any LDS where it is not necessary, as that impacts
42// occupancy and may fail the compilation, while not imposing overhead on a
43// feature whose primary advantage over global memory is performance. The basic
44// design goal is to avoid one kernel imposing overhead on another.
45//
46// Implementation.
47//
48// LDS variables with constant annotation or non-undef initializer are passed
49// through unchanged for simplification or error diagnostics in later passes.
50// Non-undef initializers are not yet implemented for LDS.
51//
52// LDS variables that are always allocated at the same address can be found
53// by lookup at that address. Otherwise runtime information/cost is required.
54//
55// The simplest strategy possible is to group all LDS variables in a single
56// struct and allocate that struct in every kernel such that the original
57// variables are always at the same address. LDS is however a limited resource
58// so this strategy is unusable in practice. It is not implemented here.
59//
60// Strategy | Precise allocation | Zero runtime cost | General purpose |
61// --------+--------------------+-------------------+-----------------+
62// Module | No | Yes | Yes |
63// Table | Yes | No | Yes |
64// Kernel | Yes | Yes | No |
65// Hybrid | Yes | Partial | Yes |
66//
67// "Module" spends LDS memory to save cycles. "Table" spends cycles and global
68// memory to save LDS. "Kernel" is as fast as kernel allocation but only works
69// for variables that are known reachable from a single kernel. "Hybrid" picks
70// between all three. When forced to choose between LDS and cycles we minimise
71// LDS use.
72//
73// The "module" lowering implemented here finds LDS variables which are used by
74// non-kernel functions and creates a new struct with a field for each of those
75// LDS variables. Variables that are only used from kernels are excluded.
76//
77// The "table" lowering implemented here has three components.
78// First kernels are assigned a unique integer identifier which is available in
79// functions it calls through the intrinsic amdgcn_lds_kernel_id. The integer
80// is passed through a specific SGPR, thus works with indirect calls.
81// Second, each kernel allocates LDS variables independent of other kernels and
82// writes the addresses it chose for each variable into an array in consistent
83// order. If the kernel does not allocate a given variable, it writes undef to
84// the corresponding array location. These arrays are written to a constant
85// table in the order matching the kernel unique integer identifier.
86// Third, uses from non-kernel functions are replaced with a table lookup using
87// the intrinsic function to find the address of the variable.
88//
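// As an illustrative sketch only (value names are hypothetical), a module with
// two kernels and one such variable produces a table of the form
//   @llvm.amdgcn.lds.offset.table = internal addrspace(4) constant [2 x [1 x i32]] ...
// holding the address each kernel chose for the variable, and a use from a
// non-kernel function is rewritten to roughly
//   %id  = call i32 @llvm.amdgcn.lds.kernel.id()
//   %gep = getelementptr inbounds [2 x [1 x i32]], ptr addrspace(4)
//            @llvm.amdgcn.lds.offset.table, i32 0, i32 %id, i32 0
//   %off = load i32, ptr addrspace(4) %gep
//   %ptr = inttoptr i32 %off to ptr addrspace(3)
//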
89// "Kernel" lowering is only applicable for variables that are unambiguously
90// reachable from exactly one kernel. For those cases, accesses to the variable
91// can be lowered to ConstantExpr address of a struct instance specific to that
92// one kernel. This is zero cost in space and in compute. It will raise a fatal
93// error on any variable that might be reachable from multiple kernels and is
94// thus most easily used as part of the hybrid lowering strategy.
95//
96// Hybrid lowering is a mixture of the above. It uses the zero cost kernel
97// lowering where it can. It lowers the variable accessed by the greatest
98// number of kernels using the module strategy as that is free for the first
99// variable. Any further variables that can be lowered with the module strategy
100// without incurring LDS memory overhead are. The remaining ones are lowered
101// via table.
102//
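// A hypothetical example of the hybrid choice: if kernels K0 and K1 both reach
// variables A and B through calls, and only K2 reaches C, then C gets the
// "kernel" lowering, A (reachable from the most kernels) is placed in the
// module struct for free at address zero, B is also placed in the module
// struct only if every kernel reaching it already allocates that struct
// (its kernel set is a subset of A's), and otherwise B goes through the table.
//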
103// Consequences
104// - No heuristics or user controlled magic numbers, hybrid is the right choice
105// - Kernels that don't use functions (or have had them all inlined) are not
106// affected by any lowering for kernels that do.
107// - Kernels that don't make indirect function calls are not affected by those
108// that do.
109// - Variables which are used by lots of kernels, e.g. those injected by a
110// language runtime in most kernels, are expected to have no overhead
111// - Implementations that instantiate templates per-kernel where those templates
112// use LDS are expected to hit the "Kernel" lowering strategy
113// - The runtime properties impose a cost in compiler implementation complexity
114//
115// Dynamic LDS implementation
116// Dynamic LDS is lowered similarly to the "table" strategy above and uses the
117// same intrinsic to identify which kernel is at the root of the dynamic call
118// graph. This relies on the specified behaviour that all dynamic LDS variables
119// alias one another, i.e. are at the same address, with respect to a given
120// kernel. Therefore this pass creates new dynamic LDS variables for each kernel
121// that allocates any dynamic LDS and builds a table of addresses out of those.
122// The AMDGPUPromoteAlloca pass skips kernels that use dynamic LDS.
123// The corresponding optimisation for "kernel" lowering where the table lookup
124// is elided is not implemented.
125//
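// As a sketch (kernel name hypothetical), each such kernel gets a marker like
//   @llvm.amdgcn.k0.dynlds = external addrspace(3) global [0 x i8], align 8
// whose alignment is the maximum alignment of any dynamic LDS variable
// reachable from that kernel, and the address table is built from these
// markers in the same way as the static lookup table.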
126//
127// Implementation notes / limitations
128// A single LDS global variable represents an instance per kernel that can reach
129// it. This pass essentially specialises such variables per kernel.
130// Handling ConstantExpr during the pass complicated this significantly so now
131// all ConstantExpr uses of LDS variables are expanded to instructions. This
132// may need amending when implementing non-undef initialisers.
133//
134// Lowering is split between this IR pass and the back end. This pass chooses
135// where given variables should be allocated and marks them with metadata,
136// MD_absolute_symbol. The backend places the variables in coincidentally the
137// same location and raises a fatal error if something has gone awry. This works
138// in practice because the only pass between this one and the backend that
139// changes LDS is PromoteAlloca and the changes it makes do not conflict.
140//
141// Addresses are written to constant global arrays based on the same metadata.
142//
143// The backend lowers LDS variables in the order of traversal of the function.
144// This is at odds with the deterministic layout required. The workaround is to
145// allocate the fixed-address variables immediately upon starting the function
146// where they can be placed as intended. This requires a means of mapping from
147// the function to the variables that it allocates. For the module scope lds,
148// this is via metadata indicating whether the variable is not required. If a
149// pass deletes that metadata, a fatal error on disagreement with the absolute
150// symbol metadata will occur. For kernel scope and dynamic, this is by _name_
151// correspondence between the function and the variable. It requires the
152// kernel to have a name (which is only a limitation for tests in practice) and
153// for nothing to rename the corresponding symbols. This is a hazard if the pass
154// is run multiple times during debugging. Alternative schemes considered all
155// involve bespoke metadata.
156//
157// If the name correspondence can be replaced, multiple distinct kernels that
158// have the same memory layout can map to the same kernel id (as the address
159// itself is handled by the absolute symbol metadata) and that will allow more
160// uses of the "kernel" style faster lowering and reduce the size of the lookup
161// tables.
162//
163// There is a test that checks this does not fire for a graphics shader. This
164// lowering is expected to work for graphics if the isKernel test is changed.
165//
166// The current markUsedByKernel is sufficient for PromoteAlloca but is elided
167// before codegen. Replacing this with an equivalent intrinsic which lasts until
168// shortly after the machine function lowering of LDS would help break the name
169// mapping. The other part needed is probably to amend PromoteAlloca to embed
170// the LDS variables it creates in the same struct created here. That avoids the
171// current hazard where a PromoteAlloca LDS variable might be allocated before
172// the kernel scope (and thus error on the address check). Given a new invariant
173// that no LDS variables exist outside of the structs managed here, and an
174// intrinsic that lasts until after the LDS frame lowering, it should be
175// possible to drop the name mapping and fold equivalent memory layouts.
176//
177//===----------------------------------------------------------------------===//
178
179#include "AMDGPU.h"
180#include "AMDGPUMemoryUtils.h"
181#include "AMDGPUTargetMachine.h"
182#include "Utils/AMDGPUBaseInfo.h"
183#include "llvm/ADT/BitVector.h"
184#include "llvm/ADT/DenseMap.h"
185#include "llvm/ADT/DenseSet.h"
186#include "llvm/ADT/STLExtras.h"
191#include "llvm/IR/Constants.h"
192#include "llvm/IR/DerivedTypes.h"
193#include "llvm/IR/Dominators.h"
194#include "llvm/IR/IRBuilder.h"
195#include "llvm/IR/InlineAsm.h"
196#include "llvm/IR/Instructions.h"
197#include "llvm/IR/IntrinsicsAMDGPU.h"
198#include "llvm/IR/MDBuilder.h"
201#include "llvm/Pass.h"
203#include "llvm/Support/Debug.h"
204#include "llvm/Support/Format.h"
205#include "llvm/Support/OptimizedStructLayout.h"
206#include "llvm/Support/raw_ostream.h"
207#include "llvm/Transforms/Utils/ModuleUtils.h"
209
210#include <vector>
211
212#include <cstdio>
213
214#define DEBUG_TYPE "amdgpu-lower-module-lds"
215
216using namespace llvm;
217using namespace AMDGPU;
218
219namespace {
220
221cl::opt<bool> SuperAlignLDSGlobals(
222 "amdgpu-super-align-lds-globals",
223 cl::desc("Increase alignment of LDS if it is not on align boundary"),
224 cl::init(true), cl::Hidden);
225
226enum class LoweringKind { module, table, kernel, hybrid };
227cl::opt<LoweringKind> LoweringKindLoc(
228 "amdgpu-lower-module-lds-strategy",
229 cl::desc("Specify lowering strategy for function LDS access:"), cl::Hidden,
230 cl::init(LoweringKind::hybrid),
232 clEnumValN(LoweringKind::table, "table", "Lower via table lookup"),
233 clEnumValN(LoweringKind::module, "module", "Lower via module struct"),
235 LoweringKind::kernel, "kernel",
236 "Lower variables reachable from one kernel, otherwise abort"),
237 clEnumValN(LoweringKind::hybrid, "hybrid",
238 "Lower via mixture of above strategies")));
239
240template <typename T> std::vector<T> sortByName(std::vector<T> &&V) {
241 llvm::sort(V, [](const auto *L, const auto *R) {
242 return L->getName() < R->getName();
243 });
244 return {std::move(V)};
245}
246
247class AMDGPULowerModuleLDS {
248 const AMDGPUTargetMachine &TM;
249
250 static void
251 removeLocalVarsFromUsedLists(Module &M,
252 const DenseSet<GlobalVariable *> &LocalVars) {
253 // The verifier rejects used lists containing an inttoptr of a constant
254 // so remove the variables from these lists before replaceAllUsesWith
255 SmallPtrSet<Constant *, 8> LocalVarsSet;
256 for (GlobalVariable *LocalVar : LocalVars)
257 LocalVarsSet.insert(cast<Constant>(LocalVar->stripPointerCasts()));
258
259 removeFromUsedLists(
260 M, [&LocalVarsSet](Constant *C) { return LocalVarsSet.count(C); });
261
262 for (GlobalVariable *LocalVar : LocalVars)
263 LocalVar->removeDeadConstantUsers();
264 }
265
266 static void markUsedByKernel(Function *Func, GlobalVariable *SGV) {
267 // The llvm.amdgcn.module.lds instance is implicitly used by all kernels
268 // that might call a function which accesses a field within it. This is
269 // presently approximated to 'all kernels' if there are any such functions
270 // in the module. This implicit use is redefined as an explicit use here so
271 // that later passes, specifically PromoteAlloca, account for the required
272 // memory without any knowledge of this transform.
273
274 // An operand bundle on llvm.donothing works because the call instruction
275 // survives until after the last pass that needs to account for LDS. It is
276 // better than inline asm as the latter survives until the end of codegen. A
277 // totally robust solution would be a function with the same semantics as
278 // llvm.donothing that takes a pointer to the instance and is lowered to a
279 // no-op after LDS is allocated, but that is not presently necessary.
280
281 // This intrinsic is eliminated shortly before instruction selection. It
282 // does not suffice to indicate to ISel that a given global which is not
283 // immediately used by the kernel must still be allocated by it. An
284 // equivalent target specific intrinsic which lasts until immediately after
285 // codegen would suffice for that, but one would still need to ensure that
286 // the variables are allocated in the anticipated order.
287 BasicBlock *Entry = &Func->getEntryBlock();
288 IRBuilder<> Builder(Entry, Entry->getFirstNonPHIIt());
289
290 Function *Decl = Intrinsic::getOrInsertDeclaration(
291 Func->getParent(), Intrinsic::donothing, {});
292
293 Value *UseInstance[1] = {
294 Builder.CreateConstInBoundsGEP1_32(SGV->getValueType(), SGV, 0)};
295
296 Builder.CreateCall(
297 Decl, {}, {OperandBundleDefT<Value *>("ExplicitUse", UseInstance)});
298 }
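 // A rough sketch of the IR this emits at a kernel entry for an instance
 // named @llvm.amdgcn.module.lds (the GEP typically folds to a constant):
 //   call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3)
 //       getelementptr inbounds (%llvm.amdgcn.module.lds.t,
 //           ptr addrspace(3) @llvm.amdgcn.module.lds, i32 0)) ]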
299
300public:
301 AMDGPULowerModuleLDS(const AMDGPUTargetMachine &TM_) : TM(TM_) {}
302
303 struct LDSVariableReplacement {
304 GlobalVariable *SGV = nullptr;
305 DenseMap<GlobalVariable *, Constant *> LDSVarsToConstantGEP;
306 };
307
308 // remap from lds global to a constantexpr gep to where it has been moved to
309 // for each kernel
310 // an array with an element for each kernel containing where the corresponding
311 // variable was remapped to
312
313 static Constant *getAddressesOfVariablesInKernel(
314 LLVMContext &Ctx, ArrayRef<GlobalVariable *> Variables,
315 const DenseMap<GlobalVariable *, Constant *> &LDSVarsToConstantGEP) {
316 // Create a ConstantArray containing the address of each Variable within the
317 // kernel corresponding to LDSVarsToConstantGEP, or poison if that kernel
318 // does not allocate it
319 // TODO: Drop the ptrtoint conversion
320
321 Type *I32 = Type::getInt32Ty(Ctx);
322
323 ArrayType *KernelOffsetsType = ArrayType::get(I32, Variables.size());
324
325 SmallVector<Constant *> Elements;
326 for (GlobalVariable *GV : Variables) {
327 auto ConstantGepIt = LDSVarsToConstantGEP.find(GV);
328 if (ConstantGepIt != LDSVarsToConstantGEP.end()) {
329 auto *elt = ConstantExpr::getPtrToInt(ConstantGepIt->second, I32);
330 Elements.push_back(elt);
331 } else {
332 Elements.push_back(PoisonValue::get(I32));
333 }
334 }
335 return ConstantArray::get(KernelOffsetsType, Elements);
336 }
337
338 static GlobalVariable *buildLookupTable(
339 Module &M, ArrayRef<GlobalVariable *> Variables,
340 ArrayRef<Function *> kernels,
341 DenseMap<Function *, LDSVariableReplacement> &KernelToReplacement) {
342 if (Variables.empty()) {
343 return nullptr;
344 }
345 LLVMContext &Ctx = M.getContext();
346
347 const size_t NumberVariables = Variables.size();
348 const size_t NumberKernels = kernels.size();
349
350 ArrayType *KernelOffsetsType =
351 ArrayType::get(Type::getInt32Ty(Ctx), NumberVariables);
352
353 ArrayType *AllKernelsOffsetsType =
354 ArrayType::get(KernelOffsetsType, NumberKernels);
355
356 Constant *Missing = PoisonValue::get(KernelOffsetsType);
357 std::vector<Constant *> overallConstantExprElts(NumberKernels);
358 for (size_t i = 0; i < NumberKernels; i++) {
359 auto Replacement = KernelToReplacement.find(kernels[i]);
360 overallConstantExprElts[i] =
361 (Replacement == KernelToReplacement.end())
362 ? Missing
363 : getAddressesOfVariablesInKernel(
364 Ctx, Variables, Replacement->second.LDSVarsToConstantGEP);
365 }
366
367 Constant *init =
368 ConstantArray::get(AllKernelsOffsetsType, overallConstantExprElts);
369
370 return new GlobalVariable(
371 M, AllKernelsOffsetsType, true, GlobalValue::InternalLinkage, init,
372 "llvm.amdgcn.lds.offset.table", nullptr, GlobalValue::NotThreadLocal,
373 AMDGPUAS::CONSTANT_ADDRESS);
374 }
375
376 void replaceUseWithTableLookup(Module &M, IRBuilder<> &Builder,
377 GlobalVariable *LookupTable,
378 GlobalVariable *GV, Use &U,
379 Value *OptionalIndex) {
380 // Table is a constant array of the same length as OrderedKernels
381 LLVMContext &Ctx = M.getContext();
382 Type *I32 = Type::getInt32Ty(Ctx);
383 auto *I = cast<Instruction>(U.getUser());
384
385 Value *tableKernelIndex = getTableLookupKernelIndex(M, I->getFunction());
386
387 if (auto *Phi = dyn_cast<PHINode>(I)) {
388 BasicBlock *BB = Phi->getIncomingBlock(U);
389 Builder.SetInsertPoint(&(*(BB->getFirstInsertionPt())));
390 } else {
391 Builder.SetInsertPoint(I);
392 }
393
394 SmallVector<Value *, 3> GEPIdx = {
395 ConstantInt::get(I32, 0),
396 tableKernelIndex,
397 };
398 if (OptionalIndex)
399 GEPIdx.push_back(OptionalIndex);
400
401 Value *Address = Builder.CreateInBoundsGEP(
402 LookupTable->getValueType(), LookupTable, GEPIdx, GV->getName());
403
404 Value *loaded = Builder.CreateLoad(I32, Address);
405
406 Value *replacement =
407 Builder.CreateIntToPtr(loaded, GV->getType(), GV->getName());
408
409 U.set(replacement);
410 }
411
412 void replaceUsesInInstructionsWithTableLookup(
413 Module &M, ArrayRef<GlobalVariable *> ModuleScopeVariables,
414 GlobalVariable *LookupTable) {
415
416 LLVMContext &Ctx = M.getContext();
417 IRBuilder<> Builder(Ctx);
418 Type *I32 = Type::getInt32Ty(Ctx);
419
420 for (size_t Index = 0; Index < ModuleScopeVariables.size(); Index++) {
421 auto *GV = ModuleScopeVariables[Index];
422
423 for (Use &U : make_early_inc_range(GV->uses())) {
424 auto *I = dyn_cast<Instruction>(U.getUser());
425 if (!I)
426 continue;
427
428 replaceUseWithTableLookup(M, Builder, LookupTable, GV, U,
429 ConstantInt::get(I32, Index));
430 }
431 }
432 }
433
434 static DenseSet<Function *> kernelsThatIndirectlyAccessAnyOfPassedVariables(
435 Module &M, LDSUsesInfoTy &LDSUsesInfo,
436 DenseSet<GlobalVariable *> const &VariableSet) {
437
438 DenseSet<Function *> KernelSet;
439
440 if (VariableSet.empty())
441 return KernelSet;
442
443 for (Function &Func : M.functions()) {
444 if (Func.isDeclaration() || !isKernelLDS(&Func))
445 continue;
446 for (GlobalVariable *GV : LDSUsesInfo.indirect_access[&Func]) {
447 if (VariableSet.contains(GV)) {
448 KernelSet.insert(&Func);
449 break;
450 }
451 }
452 }
453
454 return KernelSet;
455 }
456
457 static GlobalVariable *
458 chooseBestVariableForModuleStrategy(const DataLayout &DL,
459 VariableFunctionMap &LDSVars) {
460 // Find the global variable with the most indirect uses from kernels
461
462 struct CandidateTy {
463 GlobalVariable *GV = nullptr;
464 size_t UserCount = 0;
465 size_t Size = 0;
466
467 CandidateTy() = default;
468
469 CandidateTy(GlobalVariable *GV, uint64_t UserCount, uint64_t AllocSize)
470 : GV(GV), UserCount(UserCount), Size(AllocSize) {}
471
472 bool operator<(const CandidateTy &Other) const {
473 // Fewer users makes module scope variable less attractive
474 if (UserCount < Other.UserCount) {
475 return true;
476 }
477 if (UserCount > Other.UserCount) {
478 return false;
479 }
480
481 // Bigger makes module scope variable less attractive
482 if (Size < Other.Size) {
483 return false;
484 }
485
486 if (Size > Other.Size) {
487 return true;
488 }
489
490 // Arbitrary but consistent
491 return GV->getName() < Other.GV->getName();
492 }
493 };
494
495 CandidateTy MostUsed;
496
497 for (auto &K : LDSVars) {
498 GlobalVariable *GV = K.first;
499 if (K.second.size() <= 1) {
500 // A variable reachable by only one kernel is best lowered with kernel
501 // strategy
502 continue;
503 }
504 CandidateTy Candidate(
505 GV, K.second.size(),
506 DL.getTypeAllocSize(GV->getValueType()).getFixedValue());
507 if (MostUsed < Candidate)
508 MostUsed = Candidate;
509 }
510
511 return MostUsed.GV;
512 }
513
514 static void recordLDSAbsoluteAddress(Module *M, GlobalVariable *GV,
515 uint32_t Address) {
516 // Write the specified address into metadata where it can be retrieved by
517 // the assembler. Format is a half open range, [Address Address+1)
518 LLVMContext &Ctx = M->getContext();
519 auto *IntTy =
520 M->getDataLayout().getIntPtrType(Ctx, AMDGPUAS::LOCAL_ADDRESS);
521 auto *MinC = ConstantAsMetadata::get(ConstantInt::get(IntTy, Address));
522 auto *MaxC = ConstantAsMetadata::get(ConstantInt::get(IntTy, Address + 1));
523 GV->setMetadata(LLVMContext::MD_absolute_symbol,
524 MDNode::get(Ctx, {MinC, MaxC}));
525 }
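 // For example (sketch), a variable assigned the address 8 ends up as
 //   @v = ... !absolute_symbol !0
 //   !0 = !{i32 8, i32 9}
 // i.e. the half open range [8, 9) expressed in the 32-bit LDS address width.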
526
527 DenseMap<Function *, Value *> tableKernelIndexCache;
528 Value *getTableLookupKernelIndex(Module &M, Function *F) {
529 // Accesses from a function use the amdgcn_lds_kernel_id intrinsic which
530 // lowers to a read from a live in register. Emit it once in the entry
531 // block to spare deduplicating it later.
532 auto [It, Inserted] = tableKernelIndexCache.try_emplace(F);
533 if (Inserted) {
534 auto InsertAt = F->getEntryBlock().getFirstNonPHIOrDbgOrAlloca();
535 IRBuilder<> Builder(&*InsertAt);
536
537 It->second = Builder.CreateIntrinsic(Intrinsic::amdgcn_lds_kernel_id, {});
538 }
539
540 return It->second;
541 }
542
543 static std::vector<Function *> assignLDSKernelIDToEachKernel(
544 Module *M, DenseSet<Function *> const &KernelsThatAllocateTableLDS,
545 DenseSet<Function *> const &KernelsThatIndirectlyAllocateDynamicLDS) {
546 // Associate kernels in the set with an arbitrary but reproducible order and
547 // annotate them with that order in metadata. This metadata is recognised by
548 // the backend and lowered to a SGPR which can be read from using
549 // amdgcn_lds_kernel_id.
550
551 std::vector<Function *> OrderedKernels;
552 if (!KernelsThatAllocateTableLDS.empty() ||
553 !KernelsThatIndirectlyAllocateDynamicLDS.empty()) {
554
555 for (Function &Func : M->functions()) {
556 if (Func.isDeclaration())
557 continue;
558 if (!isKernelLDS(&Func))
559 continue;
560
561 if (KernelsThatAllocateTableLDS.contains(&Func) ||
562 KernelsThatIndirectlyAllocateDynamicLDS.contains(&Func)) {
563 assert(Func.hasName()); // else fatal error earlier
564 OrderedKernels.push_back(&Func);
565 }
566 }
567
568 // Put them in an arbitrary but reproducible order
569 OrderedKernels = sortByName(std::move(OrderedKernels));
570
571 // Annotate the kernels with their order in this vector
572 LLVMContext &Ctx = M->getContext();
573 IRBuilder<> Builder(Ctx);
574
575 if (OrderedKernels.size() > UINT32_MAX) {
576 // 32 bit keeps it in one SGPR. > 2**32 kernels won't fit on the GPU
577 reportFatalUsageError("unimplemented LDS lowering for > 2**32 kernels");
578 }
579
580 for (size_t i = 0; i < OrderedKernels.size(); i++) {
581 Metadata *AttrMDArgs[1] = {
582 ConstantAsMetadata::get(Builder.getInt32(i)),
583 };
584 OrderedKernels[i]->setMetadata("llvm.amdgcn.lds.kernel.id",
585 MDNode::get(Ctx, AttrMDArgs));
586 }
587 }
588 return OrderedKernels;
589 }
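 // Sketch of the resulting annotation (kernel name hypothetical): the kernel
 // at position i of the sorted order becomes
 //   define amdgpu_kernel void @k() !llvm.amdgcn.lds.kernel.id !n { ... }
 //   !n = !{i32 i}
 // and the backend lowers @llvm.amdgcn.lds.kernel.id() in its callees to i.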
590
591 static void partitionVariablesIntoIndirectStrategies(
592 Module &M, LDSUsesInfoTy const &LDSUsesInfo,
593 VariableFunctionMap &LDSToKernelsThatNeedToAccessItIndirectly,
594 DenseSet<GlobalVariable *> &ModuleScopeVariables,
595 DenseSet<GlobalVariable *> &TableLookupVariables,
596 DenseSet<GlobalVariable *> &KernelAccessVariables,
597 DenseSet<GlobalVariable *> &DynamicVariables) {
598
599 GlobalVariable *HybridModuleRoot =
600 LoweringKindLoc != LoweringKind::hybrid
601 ? nullptr
602 : chooseBestVariableForModuleStrategy(
603 M.getDataLayout(), LDSToKernelsThatNeedToAccessItIndirectly);
604
605 DenseSet<Function *> const EmptySet;
606 DenseSet<Function *> const &HybridModuleRootKernels =
607 HybridModuleRoot
608 ? LDSToKernelsThatNeedToAccessItIndirectly[HybridModuleRoot]
609 : EmptySet;
610
611 for (auto &K : LDSToKernelsThatNeedToAccessItIndirectly) {
612 // Each iteration of this loop assigns exactly one global variable to
613 // exactly one of the implementation strategies.
614
615 GlobalVariable *GV = K.first;
617 assert(K.second.size() != 0);
618
619 if (AMDGPU::isDynamicLDS(*GV)) {
620 DynamicVariables.insert(GV);
621 continue;
622 }
623
624 switch (LoweringKindLoc) {
625 case LoweringKind::module:
626 ModuleScopeVariables.insert(GV);
627 break;
628
629 case LoweringKind::table:
630 TableLookupVariables.insert(GV);
631 break;
632
633 case LoweringKind::kernel:
634 if (K.second.size() == 1) {
635 KernelAccessVariables.insert(GV);
636 } else {
637 // FIXME: This should use DiagnosticInfo
639 "cannot lower LDS '" + GV->getName() +
640 "' to kernel access as it is reachable from multiple kernels");
641 }
642 break;
643
644 case LoweringKind::hybrid: {
645 if (GV == HybridModuleRoot) {
646 assert(K.second.size() != 1);
647 ModuleScopeVariables.insert(GV);
648 } else if (K.second.size() == 1) {
649 KernelAccessVariables.insert(GV);
650 } else if (set_is_subset(K.second, HybridModuleRootKernels)) {
651 ModuleScopeVariables.insert(GV);
652 } else {
653 TableLookupVariables.insert(GV);
654 }
655 break;
656 }
657 }
658 }
659
660 // All LDS variables accessed indirectly have now been partitioned into
661 // the distinct lowering strategies.
662 assert(ModuleScopeVariables.size() + TableLookupVariables.size() +
663 KernelAccessVariables.size() + DynamicVariables.size() ==
664 LDSToKernelsThatNeedToAccessItIndirectly.size());
665 }
666
667 static GlobalVariable *lowerModuleScopeStructVariables(
668 Module &M, DenseSet<GlobalVariable *> const &ModuleScopeVariables,
669 DenseSet<Function *> const &KernelsThatAllocateModuleLDS) {
670 // Create a struct to hold the ModuleScopeVariables
671 // Replace all uses of those variables from non-kernel functions with the
672 // new struct instance Replace only the uses from kernel functions that will
673 // allocate this instance. That is a space optimisation - kernels that use a
674 // subset of the module scope struct and do not need to allocate it for
675 // indirect calls will only allocate the subset they use (they do so as part
676 // of the per-kernel lowering).
677 if (ModuleScopeVariables.empty()) {
678 return nullptr;
679 }
680
681 LLVMContext &Ctx = M.getContext();
682
683 LDSVariableReplacement ModuleScopeReplacement =
684 createLDSVariableReplacement(M, "llvm.amdgcn.module.lds",
685 ModuleScopeVariables);
686
687 appendToCompilerUsed(M, {static_cast<GlobalValue *>(
688 ConstantExpr::getPointerBitCastOrAddrSpaceCast(
689 cast<Constant>(ModuleScopeReplacement.SGV),
690 PointerType::getUnqual(Ctx)))});
691
692 // module.lds will be allocated at zero in any kernel that allocates it
693 recordLDSAbsoluteAddress(&M, ModuleScopeReplacement.SGV, 0);
694
695 // historic
696 removeLocalVarsFromUsedLists(M, ModuleScopeVariables);
697
698 // Replace all uses of module scope variable from non-kernel functions
699 replaceLDSVariablesWithStruct(
700 M, ModuleScopeVariables, ModuleScopeReplacement, [&](Use &U) {
701 Instruction *I = dyn_cast<Instruction>(U.getUser());
702 if (!I) {
703 return false;
704 }
705 Function *F = I->getFunction();
706 return !isKernelLDS(F);
707 });
708
709 // Replace uses of module scope variable from kernel functions that
710 // allocate the module scope variable, otherwise leave them unchanged
711 // Record on each kernel whether the module scope global is used by it
712
713 for (Function &Func : M.functions()) {
714 if (Func.isDeclaration() || !isKernelLDS(&Func))
715 continue;
716
717 if (KernelsThatAllocateModuleLDS.contains(&Func)) {
718 replaceLDSVariablesWithStruct(
719 M, ModuleScopeVariables, ModuleScopeReplacement, [&](Use &U) {
720 Instruction *I = dyn_cast<Instruction>(U.getUser());
721 if (!I) {
722 return false;
723 }
724 Function *F = I->getFunction();
725 return F == &Func;
726 });
727
728 markUsedByKernel(&Func, ModuleScopeReplacement.SGV);
729 }
730 }
731
732 return ModuleScopeReplacement.SGV;
733 }
734
735 static DenseMap<Function *, LDSVariableReplacement>
736 lowerKernelScopeStructVariables(
737 Module &M, LDSUsesInfoTy &LDSUsesInfo,
738 DenseSet<GlobalVariable *> const &ModuleScopeVariables,
739 DenseSet<Function *> const &KernelsThatAllocateModuleLDS,
740 GlobalVariable *MaybeModuleScopeStruct) {
741
742 // Create a struct for each kernel for the non-module-scope variables.
743
744 DenseMap<Function *, LDSVariableReplacement> KernelToReplacement;
745 for (Function &Func : M.functions()) {
746 if (Func.isDeclaration() || !isKernelLDS(&Func))
747 continue;
748
749 DenseSet<GlobalVariable *> KernelUsedVariables;
750 // Allocating variables that are used directly in this struct to get
751 // alignment aware allocation and predictable frame size.
752 for (auto &v : LDSUsesInfo.direct_access[&Func]) {
753 if (!AMDGPU::isDynamicLDS(*v)) {
754 KernelUsedVariables.insert(v);
755 }
756 }
757
758 // Allocating variables that are accessed indirectly so that a lookup of
759 // this struct instance can find them from nested functions.
760 for (auto &v : LDSUsesInfo.indirect_access[&Func]) {
761 if (!AMDGPU::isDynamicLDS(*v)) {
762 KernelUsedVariables.insert(v);
763 }
764 }
765
766 // Variables allocated in module lds must all resolve to that struct,
767 // not to the per-kernel instance.
768 if (KernelsThatAllocateModuleLDS.contains(&Func)) {
769 for (GlobalVariable *v : ModuleScopeVariables) {
770 KernelUsedVariables.erase(v);
771 }
772 }
773
774 if (KernelUsedVariables.empty()) {
775 // Either used no LDS, or the LDS it used was all in the module struct
776 // or dynamically sized
777 continue;
778 }
779
780 // The association between kernel function and LDS struct is done by
781 // symbol name, which only works if the function in question has a
782 // name. This is not expected to be a problem in practice as kernels
783 // are called by name making anonymous ones (which are named by the
784 // backend) difficult to use. This does mean that llvm test cases need
785 // to name the kernels.
786 if (!Func.hasName()) {
787 reportFatalUsageError("anonymous kernels cannot use LDS variables");
788 }
789
790 std::string VarName =
791 (Twine("llvm.amdgcn.kernel.") + Func.getName() + ".lds").str();
792
793 auto Replacement =
794 createLDSVariableReplacement(M, VarName, KernelUsedVariables);
795
796 // If any indirect uses, create a direct use to ensure allocation
797 // TODO: Simpler to unconditionally mark used but that regresses
798 // codegen in test/CodeGen/AMDGPU/noclobber-barrier.ll
799 auto Accesses = LDSUsesInfo.indirect_access.find(&Func);
800 if ((Accesses != LDSUsesInfo.indirect_access.end()) &&
801 !Accesses->second.empty())
802 markUsedByKernel(&Func, Replacement.SGV);
803
804 // remove preserves existing codegen
805 removeLocalVarsFromUsedLists(M, KernelUsedVariables);
806 KernelToReplacement[&Func] = Replacement;
807
808 // Rewrite uses within kernel to the new struct
809 replaceLDSVariablesWithStruct(
810 M, KernelUsedVariables, Replacement, [&Func](Use &U) {
811 Instruction *I = dyn_cast<Instruction>(U.getUser());
812 return I && I->getFunction() == &Func;
813 });
814 }
815 return KernelToReplacement;
816 }
817
818 static GlobalVariable *
819 buildRepresentativeDynamicLDSInstance(Module &M, LDSUsesInfoTy &LDSUsesInfo,
820 Function *func) {
821 // Create a dynamic lds variable with a name associated with the passed
822 // function that has the maximum alignment of any dynamic lds variable
823 // reachable from this kernel. Dynamic LDS is allocated after the static LDS
824 // allocation, possibly after alignment padding. The representative variable
825 // created here has the maximum alignment of any other dynamic variable
826 // reachable by that kernel. All dynamic LDS variables are allocated at the
827 // same address in each kernel in order to provide the documented aliasing
828 // semantics. Setting the alignment here allows this IR pass to accurately
829 // predict the exact constant at which it will be allocated.
830
832
833 LLVMContext &Ctx = M.getContext();
834 const DataLayout &DL = M.getDataLayout();
835 Align MaxDynamicAlignment(1);
836
837 auto UpdateMaxAlignment = [&MaxDynamicAlignment, &DL](GlobalVariable *GV) {
838 if (AMDGPU::isDynamicLDS(*GV)) {
839 MaxDynamicAlignment =
840 std::max(MaxDynamicAlignment, AMDGPU::getAlign(DL, GV));
841 }
842 };
843
844 for (GlobalVariable *GV : LDSUsesInfo.indirect_access[func]) {
845 UpdateMaxAlignment(GV);
846 }
847
848 for (GlobalVariable *GV : LDSUsesInfo.direct_access[func]) {
849 UpdateMaxAlignment(GV);
850 }
851
852 assert(func->hasName()); // Checked by caller
853 auto *emptyCharArray = ArrayType::get(Type::getInt8Ty(Ctx), 0);
854 GlobalVariable *N = new GlobalVariable(
855 M, emptyCharArray, false, GlobalValue::ExternalLinkage, nullptr,
856 Twine("llvm.amdgcn." + func->getName() + ".dynlds"), nullptr, GlobalValue::NotThreadLocal, AMDGPUAS::LOCAL_ADDRESS,
857 false);
858 N->setAlignment(MaxDynamicAlignment);
859
861 return N;
862 }
863
864 DenseMap<Function *, GlobalVariable *> lowerDynamicLDSVariables(
865 Module &M, LDSUsesInfoTy &LDSUsesInfo,
866 DenseSet<Function *> const &KernelsThatIndirectlyAllocateDynamicLDS,
867 DenseSet<GlobalVariable *> const &DynamicVariables,
868 std::vector<Function *> const &OrderedKernels) {
869 DenseMap<Function *, GlobalVariable *> KernelToCreatedDynamicLDS;
870 if (!KernelsThatIndirectlyAllocateDynamicLDS.empty()) {
871 LLVMContext &Ctx = M.getContext();
872 IRBuilder<> Builder(Ctx);
873 Type *I32 = Type::getInt32Ty(Ctx);
874
875 std::vector<Constant *> newDynamicLDS;
876
877 // Table is built in the same order as OrderedKernels
878 for (auto &func : OrderedKernels) {
879
880 if (KernelsThatIndirectlyAllocateDynamicLDS.contains(func)) {
882 if (!func->hasName()) {
883 reportFatalUsageError("anonymous kernels cannot use LDS variables");
884 }
885
886 GlobalVariable *N =
887 buildRepresentativeDynamicLDSInstance(M, LDSUsesInfo, func);
888
889 KernelToCreatedDynamicLDS[func] = N;
890
891 markUsedByKernel(func, N);
892
893 auto *emptyCharArray = ArrayType::get(Type::getInt8Ty(Ctx), 0);
894 auto *GEP = ConstantExpr::getGetElementPtr(
895 emptyCharArray, N, ConstantInt::get(I32, 0), true);
896 newDynamicLDS.push_back(ConstantExpr::getPtrToInt(GEP, I32));
897 } else {
898 newDynamicLDS.push_back(PoisonValue::get(I32));
899 }
900 }
901 assert(OrderedKernels.size() == newDynamicLDS.size());
902
903 ArrayType *t = ArrayType::get(I32, newDynamicLDS.size());
904 Constant *init = ConstantArray::get(t, newDynamicLDS);
905 GlobalVariable *table = new GlobalVariable(
906 M, t, true, GlobalValue::InternalLinkage, init,
907 "llvm.amdgcn.dynlds.offset.table", nullptr,
909
910 for (GlobalVariable *GV : DynamicVariables) {
911 for (Use &U : make_early_inc_range(GV->uses())) {
912 auto *I = dyn_cast<Instruction>(U.getUser());
913 if (!I)
914 continue;
915 if (isKernelLDS(I->getFunction()))
916 continue;
917
918 replaceUseWithTableLookup(M, Builder, table, GV, U, nullptr);
919 }
920 }
921 }
922 return KernelToCreatedDynamicLDS;
923 }
924
925 static GlobalVariable *uniquifyGVPerKernel(Module &M, GlobalVariable *GV,
926 Function *KF) {
927 bool NeedsReplacement = false;
928 for (Use &U : GV->uses()) {
929 if (auto *I = dyn_cast<Instruction>(U.getUser())) {
930 Function *F = I->getFunction();
931 if (isKernelLDS(F) && F != KF) {
932 NeedsReplacement = true;
933 break;
934 }
935 }
936 }
937 if (!NeedsReplacement)
938 return GV;
939 // Create a new GV used only by this kernel and its function
940 GlobalVariable *NewGV = new GlobalVariable(
941 M, GV->getValueType(), GV->isConstant(), GV->getLinkage(),
942 GV->getInitializer(), GV->getName() + "." + KF->getName(), nullptr,
943 GlobalValue::NotThreadLocal, AMDGPUAS::LOCAL_ADDRESS);
944 NewGV->copyAttributesFrom(GV);
945 for (Use &U : make_early_inc_range(GV->uses())) {
946 if (auto *I = dyn_cast<Instruction>(U.getUser())) {
947 Function *F = I->getFunction();
948 if (!isKernelLDS(F) || F == KF) {
949 U.getUser()->replaceUsesOfWith(GV, NewGV);
950 }
951 }
952 }
953 return NewGV;
954 }
955
956 bool lowerSpecialLDSVariables(
957 Module &M, LDSUsesInfoTy &LDSUsesInfo,
958 VariableFunctionMap &LDSToKernelsThatNeedToAccessItIndirectly) {
959 bool Changed = false;
960 const DataLayout &DL = M.getDataLayout();
961 // The 1st round: give module-absolute assignments
962 int NumAbsolutes = 0;
963 std::vector<GlobalVariable *> OrderedGVs;
964 for (auto &K : LDSToKernelsThatNeedToAccessItIndirectly) {
965 GlobalVariable *GV = K.first;
966 if (!isNamedBarrier(*GV))
967 continue;
968 // give a module-absolute assignment if it is indirectly accessed by
969 // multiple kernels. This is not precise, but we don't want to duplicate
970 // a function when it is called by multiple kernels.
971 if (LDSToKernelsThatNeedToAccessItIndirectly[GV].size() > 1) {
972 OrderedGVs.push_back(GV);
973 } else {
974 // leave it to the 2nd round, which will give a kernel-relative
975 // assignment if it is only indirectly accessed by one kernel
976 LDSUsesInfo.direct_access[*K.second.begin()].insert(GV);
977 }
978 LDSToKernelsThatNeedToAccessItIndirectly.erase(GV);
979 }
980 OrderedGVs = sortByName(std::move(OrderedGVs));
981 for (GlobalVariable *GV : OrderedGVs) {
982 unsigned BarrierScope = llvm::AMDGPU::Barrier::BARRIER_SCOPE_WORKGROUP;
983 unsigned BarId = NumAbsolutes + 1;
984 unsigned BarCnt = DL.getTypeAllocSize(GV->getValueType()) / 16;
985 NumAbsolutes += BarCnt;
986
987 // 4 bits for alignment, 5 bits for the barrier num,
988 // 3 bits for the barrier scope
989 unsigned Offset = 0x802000u | BarrierScope << 9 | BarId << 4;
990 recordLDSAbsoluteAddress(&M, GV, Offset);
991 }
992 OrderedGVs.clear();
993
994 // The 2nd round: give a kernel-relative assignment for GV that
995 // either only indirectly accessed by single kernel or only directly
996 // accessed by multiple kernels.
997 std::vector<Function *> OrderedKernels;
998 for (auto &K : LDSUsesInfo.direct_access) {
999 Function *F = K.first;
1000 if (isKernelLDS(F))
1001 OrderedKernels.push_back(F);
1002 }
1003 OrderedKernels = sortByName(std::move(OrderedKernels));
1004
1005 DenseMap<Function *, unsigned> Kernel2BarId;
1006 for (Function *F : OrderedKernels) {
1007 for (GlobalVariable *GV : LDSUsesInfo.direct_access[F]) {
1008 if (!isNamedBarrier(*GV))
1009 continue;
1010
1011 LDSUsesInfo.direct_access[F].erase(GV);
1012 if (GV->isAbsoluteSymbolRef()) {
1013 // already assigned
1014 continue;
1015 }
1016 OrderedGVs.push_back(GV);
1017 }
1018 OrderedGVs = sortByName(std::move(OrderedGVs));
1019 for (GlobalVariable *GV : OrderedGVs) {
1020 // GV could also be used directly by other kernels. If so, we need to
1021 // create a new GV used only by this kernel and its function.
1022 auto NewGV = uniquifyGVPerKernel(M, GV, F);
1023 Changed |= (NewGV != GV);
1024 unsigned BarrierScope = llvm::AMDGPU::Barrier::BARRIER_SCOPE_WORKGROUP;
1025 unsigned BarId = Kernel2BarId[F];
1026 BarId += NumAbsolutes + 1;
1027 unsigned BarCnt = DL.getTypeAllocSize(GV->getValueType()) / 16;
1028 Kernel2BarId[F] += BarCnt;
1029 unsigned Offset = 0x802000u | BarrierScope << 9 | BarId << 4;
1030 recordLDSAbsoluteAddress(&M, NewGV, Offset);
1031 }
1032 OrderedGVs.clear();
1033 }
1034 // Also erase those special LDS variables from indirect_access.
1035 for (auto &K : LDSUsesInfo.indirect_access) {
1036 assert(isKernelLDS(K.first));
1037 for (GlobalVariable *GV : K.second) {
1038 if (isNamedBarrier(*GV))
1039 K.second.erase(GV);
1040 }
1041 }
1042 return Changed;
1043 }
1044
1045 bool runOnModule(Module &M) {
1046 CallGraph CG = CallGraph(M);
1047 bool Changed = superAlignLDSGlobals(M);
1048
1049 Changed |= eliminateConstantExprUsesOfLDSFromAllInstructions(M);
1050
1051 Changed = true; // todo: narrow this down
1052
1053 // For each kernel, what variables does it access directly or through
1054 // callees
1055 LDSUsesInfoTy LDSUsesInfo = getTransitiveUsesOfLDS(CG, M);
1056
1057 // For each variable accessed through callees, which kernels access it
1058 VariableFunctionMap LDSToKernelsThatNeedToAccessItIndirectly;
1059 for (auto &K : LDSUsesInfo.indirect_access) {
1060 Function *F = K.first;
1061 assert(isKernelLDS(F));
1062 for (GlobalVariable *GV : K.second) {
1063 LDSToKernelsThatNeedToAccessItIndirectly[GV].insert(F);
1064 }
1065 }
1066
1067 if (LDSUsesInfo.HasSpecialGVs) {
1068 // Special LDS variables need special address assignment
1069 Changed |= lowerSpecialLDSVariables(
1070 M, LDSUsesInfo, LDSToKernelsThatNeedToAccessItIndirectly);
1071 }
1072
1073 // Partition variables accessed indirectly into the different strategies
1074 DenseSet<GlobalVariable *> ModuleScopeVariables;
1075 DenseSet<GlobalVariable *> TableLookupVariables;
1076 DenseSet<GlobalVariable *> KernelAccessVariables;
1077 DenseSet<GlobalVariable *> DynamicVariables;
1078 partitionVariablesIntoIndirectStrategies(
1079 M, LDSUsesInfo, LDSToKernelsThatNeedToAccessItIndirectly,
1080 ModuleScopeVariables, TableLookupVariables, KernelAccessVariables,
1081 DynamicVariables);
1082
1083 // If the kernel accesses a variable that is going to be stored in the
1084 // module instance through a call then that kernel needs to allocate the
1085 // module instance
1086 const DenseSet<Function *> KernelsThatAllocateModuleLDS =
1087 kernelsThatIndirectlyAccessAnyOfPassedVariables(M, LDSUsesInfo,
1088 ModuleScopeVariables);
1089 const DenseSet<Function *> KernelsThatAllocateTableLDS =
1090 kernelsThatIndirectlyAccessAnyOfPassedVariables(M, LDSUsesInfo,
1091 TableLookupVariables);
1092
1093 const DenseSet<Function *> KernelsThatIndirectlyAllocateDynamicLDS =
1094 kernelsThatIndirectlyAccessAnyOfPassedVariables(M, LDSUsesInfo,
1095 DynamicVariables);
1096
1097 GlobalVariable *MaybeModuleScopeStruct = lowerModuleScopeStructVariables(
1098 M, ModuleScopeVariables, KernelsThatAllocateModuleLDS);
1099
1100 DenseMap<Function *, LDSVariableReplacement> KernelToReplacement =
1101 lowerKernelScopeStructVariables(M, LDSUsesInfo, ModuleScopeVariables,
1102 KernelsThatAllocateModuleLDS,
1103 MaybeModuleScopeStruct);
1104
1105 // Lower zero cost accesses to the kernel instances just created
1106 for (auto &GV : KernelAccessVariables) {
1107 auto &funcs = LDSToKernelsThatNeedToAccessItIndirectly[GV];
1108 assert(funcs.size() == 1); // Only one kernel can access it
1109 LDSVariableReplacement Replacement =
1110 KernelToReplacement[*(funcs.begin())];
1111
1112 DenseSet<GlobalVariable *> Vec;
1113 Vec.insert(GV);
1114
1115 replaceLDSVariablesWithStruct(M, Vec, Replacement, [](Use &U) {
1116 return isa<Instruction>(U.getUser());
1117 });
1118 }
1119
1120 // The ith element of this vector is kernel id i
1121 std::vector<Function *> OrderedKernels =
1122 assignLDSKernelIDToEachKernel(&M, KernelsThatAllocateTableLDS,
1123 KernelsThatIndirectlyAllocateDynamicLDS);
1124
1125 if (!KernelsThatAllocateTableLDS.empty()) {
1126 LLVMContext &Ctx = M.getContext();
1127 IRBuilder<> Builder(Ctx);
1128
1129 // The order must be consistent between lookup table and accesses to
1130 // lookup table
1131 auto TableLookupVariablesOrdered =
1132 sortByName(std::vector<GlobalVariable *>(TableLookupVariables.begin(),
1133 TableLookupVariables.end()));
1134
1135 GlobalVariable *LookupTable = buildLookupTable(
1136 M, TableLookupVariablesOrdered, OrderedKernels, KernelToReplacement);
1137 replaceUsesInInstructionsWithTableLookup(M, TableLookupVariablesOrdered,
1138 LookupTable);
1139 }
1140
1141 DenseMap<Function *, GlobalVariable *> KernelToCreatedDynamicLDS =
1142 lowerDynamicLDSVariables(M, LDSUsesInfo,
1143 KernelsThatIndirectlyAllocateDynamicLDS,
1144 DynamicVariables, OrderedKernels);
1145
1146 // Strip amdgpu-no-lds-kernel-id from all functions reachable from the
1147 // kernel. We may have inferred this wasn't used prior to the pass.
1148 // TODO: We could filter out subgraphs that do not access LDS globals.
1149 for (auto *KernelSet : {&KernelsThatIndirectlyAllocateDynamicLDS,
1150 &KernelsThatAllocateTableLDS})
1151 for (Function *F : *KernelSet)
1152 removeFnAttrFromReachable(CG, F, {"amdgpu-no-lds-kernel-id"});
1153
1154 // All kernel frames have been allocated. Calculate and record the
1155 // addresses.
1156 {
1157 const DataLayout &DL = M.getDataLayout();
1158
1159 for (Function &Func : M.functions()) {
1160 if (Func.isDeclaration() || !isKernelLDS(&Func))
1161 continue;
1162
1163 // All three of these are optional. The first variable is allocated at
1164 // zero. They are allocated by AMDGPUMachineFunction as one block.
1165 // Layout:
1166 //{
1167 // module.lds
1168 // alignment padding
1169 // kernel instance
1170 // alignment padding
1171 // dynamic lds variables
1172 //}
1173
1174 const bool AllocateModuleScopeStruct =
1175 MaybeModuleScopeStruct &&
1176 KernelsThatAllocateModuleLDS.contains(&Func);
1177
1178 auto Replacement = KernelToReplacement.find(&Func);
1179 const bool AllocateKernelScopeStruct =
1180 Replacement != KernelToReplacement.end();
1181
1182 const bool AllocateDynamicVariable =
1183 KernelToCreatedDynamicLDS.contains(&Func);
1184
1185 uint32_t Offset = 0;
1186
1187 if (AllocateModuleScopeStruct) {
1188 // Allocated at zero, recorded once on construction, not once per
1189 // kernel
1190 Offset += DL.getTypeAllocSize(MaybeModuleScopeStruct->getValueType());
1191 }
1192
1193 if (AllocateKernelScopeStruct) {
1194 GlobalVariable *KernelStruct = Replacement->second.SGV;
1195 Offset = alignTo(Offset, AMDGPU::getAlign(DL, KernelStruct));
1196 recordLDSAbsoluteAddress(&M, KernelStruct, Offset);
1197 Offset += DL.getTypeAllocSize(KernelStruct->getValueType());
1198 }
1199
1200 // If there is dynamic allocation, the alignment needed is included in
1201 // the static frame size. There may be no reference to the dynamic
1202 // variable in the kernel itself, so without including it here, that
1203 // alignment padding could be missed.
1204 if (AllocateDynamicVariable) {
1205 GlobalVariable *DynamicVariable = KernelToCreatedDynamicLDS[&Func];
1206 Offset = alignTo(Offset, AMDGPU::getAlign(DL, DynamicVariable));
1207 recordLDSAbsoluteAddress(&M, DynamicVariable, Offset);
1208 }
1209
1210 if (Offset != 0) {
1211 (void)TM; // TODO: Account for target maximum LDS
1212 std::string Buffer;
1213 raw_string_ostream SS{Buffer};
1214 SS << format("%u", Offset);
1215
1216 // Instead of explicitly marking kernels that access dynamic variables
1217 // using special case metadata, annotate with min-lds == max-lds, i.e.
1218 // that there is no more space available for allocating more static
1219 // LDS variables. That is the right condition to prevent allocating
1220 // more variables which would collide with the addresses assigned to
1221 // dynamic variables.
1222 if (AllocateDynamicVariable)
1223 SS << format(",%u", Offset);
1224
1225 Func.addFnAttr("amdgpu-lds-size", Buffer);
1226 }
1227 }
1228 }
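 // For instance (sketch), a kernel whose static LDS frame ends at byte 400
 // gets "amdgpu-lds-size"="400", or "400,400" if it also allocates dynamic
 // LDS, i.e. min == max so no further static LDS may be placed behind it.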
1229
1230 for (auto &GV : make_early_inc_range(M.globals()))
1231 if (AMDGPU::isLDSVariableToLower(GV)) {
1232 // probably want to remove from used lists
1233 GV.removeDeadConstantUsers();
1234 if (GV.use_empty())
1235 GV.eraseFromParent();
1236 }
1237
1238 return Changed;
1239 }
1240
1241private:
1242 // Increase the alignment of LDS globals if necessary to maximise the chance
1243 // that we can use aligned LDS instructions to access them.
1244 static bool superAlignLDSGlobals(Module &M) {
1245 const DataLayout &DL = M.getDataLayout();
1246 bool Changed = false;
1247 if (!SuperAlignLDSGlobals) {
1248 return Changed;
1249 }
1250
1251 for (auto &GV : M.globals()) {
1252 if (!AMDGPU::isLDSVariableToLower(GV)) {
1253 // Only changing alignment of LDS variables
1254 continue;
1255 }
1256 if (!GV.hasInitializer()) {
1257 // cuda/hip extern __shared__ variable, leave alignment alone
1258 continue;
1259 }
1260
1261 if (GV.isAbsoluteSymbolRef()) {
1262 // If the variable is already allocated, don't change the alignment
1263 continue;
1264 }
1265
1266 Align Alignment = AMDGPU::getAlign(DL, &GV);
1267 TypeSize GVSize = DL.getTypeAllocSize(GV.getValueType());
1268
1269 if (GVSize > 8) {
1270 // We might want to use a b96 or b128 load/store
1271 Alignment = std::max(Alignment, Align(16));
1272 } else if (GVSize > 4) {
1273 // We might want to use a b64 load/store
1274 Alignment = std::max(Alignment, Align(8));
1275 } else if (GVSize > 2) {
1276 // We might want to use a b32 load/store
1277 Alignment = std::max(Alignment, Align(4));
1278 } else if (GVSize > 1) {
1279 // We might want to use a b16 load/store
1280 Alignment = std::max(Alignment, Align(2));
1281 }
1282
1283 if (Alignment != AMDGPU::getAlign(DL, &GV)) {
1284 Changed = true;
1285 GV.setAlignment(Alignment);
1286 }
1287 }
1288 return Changed;
1289 }
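 // For example (sketch), superAlignLDSGlobals raises a 12 byte LDS array with
 // natural alignment 4 to align 16 so a single b96/b128 access can be formed,
 // while 8, 4 and 2 byte variables are raised to align 8, 4 and 2 respectively.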
1290
1291 static LDSVariableReplacement createLDSVariableReplacement(
1292 Module &M, std::string VarName,
1293 DenseSet<GlobalVariable *> const &LDSVarsToTransform) {
1294 // Create a struct instance containing LDSVarsToTransform and map from those
1295 // variables to ConstantExprGEP
1296 // Variables may be introduced to meet alignment requirements. No aliasing
1297 // metadata is useful for these as they have no uses. Erased before return.
1298
1299 LLVMContext &Ctx = M.getContext();
1300 const DataLayout &DL = M.getDataLayout();
1301 assert(!LDSVarsToTransform.empty());
1302
1303 SmallVector<OptimizedStructLayoutField, 8> LayoutFields;
1304 LayoutFields.reserve(LDSVarsToTransform.size());
1305 {
1306 // The order of fields in this struct depends on the order of
1307 // variables in the argument which varies when changing how they
1308 // are identified, leading to spurious test breakage.
1309 auto Sorted = sortByName(std::vector<GlobalVariable *>(
1310 LDSVarsToTransform.begin(), LDSVarsToTransform.end()));
1311
1312 for (GlobalVariable *GV : Sorted) {
1313 OptimizedStructLayoutField F(GV,
1314 DL.getTypeAllocSize(GV->getValueType()),
1315 AMDGPU::getAlign(DL, GV));
1316 LayoutFields.emplace_back(F);
1317 }
1318 }
1319
1320 performOptimizedStructLayout(LayoutFields);
1321
1322 std::vector<GlobalVariable *> LocalVars;
1323 BitVector IsPaddingField;
1324 LocalVars.reserve(LDSVarsToTransform.size()); // will be at least this large
1325 IsPaddingField.reserve(LDSVarsToTransform.size());
1326 {
1327 uint64_t CurrentOffset = 0;
1328 for (auto &F : LayoutFields) {
1329 GlobalVariable *FGV =
1330 static_cast<GlobalVariable *>(const_cast<void *>(F.Id));
1331 Align DataAlign = F.Alignment;
1332
1333 uint64_t DataAlignV = DataAlign.value();
1334 if (uint64_t Rem = CurrentOffset % DataAlignV) {
1335 uint64_t Padding = DataAlignV - Rem;
1336
1337 // Append an array of padding bytes to meet alignment requested
1338 // Note (o + (a - (o % a)) ) % a == 0
1339 // (offset + Padding ) % align == 0
1340
1341 Type *ATy = ArrayType::get(Type::getInt8Ty(Ctx), Padding);
1342 LocalVars.push_back(new GlobalVariable(
1343 M, ATy, false, GlobalValue::InternalLinkage,
1344 PoisonValue::get(ATy), "", nullptr, GlobalValue::NotThreadLocal,
1345 AMDGPUAS::LOCAL_ADDRESS, false));
1346 IsPaddingField.push_back(true);
1347 CurrentOffset += Padding;
1348 }
1349
1350 LocalVars.push_back(FGV);
1351 IsPaddingField.push_back(false);
1352 CurrentOffset += F.Size;
1353 }
1354 }
1355
1356 std::vector<Type *> LocalVarTypes;
1357 LocalVarTypes.reserve(LocalVars.size());
1358 std::transform(
1359 LocalVars.cbegin(), LocalVars.cend(), std::back_inserter(LocalVarTypes),
1360 [](const GlobalVariable *V) -> Type * { return V->getValueType(); });
1361
1362 StructType *LDSTy = StructType::create(Ctx, LocalVarTypes, VarName + ".t");
1363
1364 Align StructAlign = AMDGPU::getAlign(DL, LocalVars[0]);
1365
1366 GlobalVariable *SGV = new GlobalVariable(
1367 M, LDSTy, false, GlobalValue::InternalLinkage, PoisonValue::get(LDSTy),
1368 VarName, nullptr, GlobalValue::NotThreadLocal, AMDGPUAS::LOCAL_ADDRESS,
1369 false);
1370 SGV->setAlignment(StructAlign);
1371
1372 DenseMap<GlobalVariable *, Constant *> Map;
1373 Type *I32 = Type::getInt32Ty(Ctx);
1374 for (size_t I = 0; I < LocalVars.size(); I++) {
1375 GlobalVariable *GV = LocalVars[I];
1376 Constant *GEPIdx[] = {ConstantInt::get(I32, 0), ConstantInt::get(I32, I)};
1377 Constant *GEP = ConstantExpr::getGetElementPtr(LDSTy, SGV, GEPIdx, true);
1378 if (IsPaddingField[I]) {
1379 assert(GV->use_empty());
1380 GV->eraseFromParent();
1381 } else {
1382 Map[GV] = GEP;
1383 }
1384 }
1385 assert(Map.size() == LDSVarsToTransform.size());
1386 return {SGV, std::move(Map)};
1387 }
1388
1389 template <typename PredicateTy>
1390 static void replaceLDSVariablesWithStruct(
1391 Module &M, DenseSet<GlobalVariable *> const &LDSVarsToTransformArg,
1392 const LDSVariableReplacement &Replacement, PredicateTy Predicate) {
1393 LLVMContext &Ctx = M.getContext();
1394 const DataLayout &DL = M.getDataLayout();
1395
1396 // A hack... we need to insert the aliasing info in a predictable order for
1397 // lit tests. Would like to have them in a stable order already, ideally the
1398 // same order they get allocated, which might mean an ordered set container
1399 auto LDSVarsToTransform = sortByName(std::vector<GlobalVariable *>(
1400 LDSVarsToTransformArg.begin(), LDSVarsToTransformArg.end()));
1401
1402 // Create alias.scope and their lists. Each field in the new structure
1403 // does not alias with all other fields.
1404 SmallVector<MDNode *> AliasScopes;
1405 SmallVector<Metadata *> NoAliasList;
1406 const size_t NumberVars = LDSVarsToTransform.size();
1407 if (NumberVars > 1) {
1408 MDBuilder MDB(Ctx);
1409 AliasScopes.reserve(NumberVars);
1410 MDNode *Domain = MDB.createAnonymousAliasScopeDomain();
1411 for (size_t I = 0; I < NumberVars; I++) {
1412 MDNode *Scope = MDB.createAnonymousAliasScope(Domain);
1413 AliasScopes.push_back(Scope);
1414 }
1415 NoAliasList.append(&AliasScopes[1], AliasScopes.end());
1416 }
1417
1418 // Replace uses of ith variable with a constantexpr to the corresponding
1419 // field of the instance that will be allocated by AMDGPUMachineFunction
1420 for (size_t I = 0; I < NumberVars; I++) {
1421 GlobalVariable *GV = LDSVarsToTransform[I];
1422 Constant *GEP = Replacement.LDSVarsToConstantGEP.at(GV);
1423
1425
1424 GV->replaceUsesWithIf(GEP, Predicate);
1426 APInt APOff(DL.getIndexTypeSizeInBits(GEP->getType()), 0);
1427 GEP->stripAndAccumulateInBoundsConstantOffsets(DL, APOff);
1428 uint64_t Offset = APOff.getZExtValue();
1429
1430 Align A =
1431 commonAlignment(Replacement.SGV->getAlign().valueOrOne(), Offset);
1432
1433 if (I)
1434 NoAliasList[I - 1] = AliasScopes[I - 1];
1435 MDNode *NoAlias =
1436 NoAliasList.empty() ? nullptr : MDNode::get(Ctx, NoAliasList);
1437 MDNode *AliasScope =
1438 AliasScopes.empty() ? nullptr : MDNode::get(Ctx, {AliasScopes[I]});
1439
1440 refineUsesAlignmentAndAA(GEP, A, DL, AliasScope, NoAlias);
1441 }
1442 }
1443
1444 static void refineUsesAlignmentAndAA(Value *Ptr, Align A,
1445 const DataLayout &DL, MDNode *AliasScope,
1446 MDNode *NoAlias, unsigned MaxDepth = 5) {
1447 if (!MaxDepth || (A == 1 && !AliasScope))
1448 return;
1449
1450 ScopedNoAliasAAResult ScopedNoAlias;
1451
1452 for (User *U : Ptr->users()) {
1453 if (auto *I = dyn_cast<Instruction>(U)) {
1454 if (AliasScope && I->mayReadOrWriteMemory()) {
1455 MDNode *AS = I->getMetadata(LLVMContext::MD_alias_scope);
1456 AS = (AS ? MDNode::getMostGenericAliasScope(AS, AliasScope)
1457 : AliasScope);
1458 I->setMetadata(LLVMContext::MD_alias_scope, AS);
1459
1460 MDNode *NA = I->getMetadata(LLVMContext::MD_noalias);
1461
1462 // Scoped aliases can originate from two different domains.
1463 // First domain would be from LDS domain (created by this pass).
1464 // All entries (LDS vars) into LDS struct will have same domain.
1465
1466 // Second domain could be existing scoped aliases that are the
1467 // results of noalias params and subsequent optimizations that
1468 // may alter these sets.
1469
1470 // We need to be careful how we create new alias sets, and
1471 // have right scopes and domains for loads/stores of these new
1472 // LDS variables. We intersect NoAlias set if alias sets belong
1473 // to the same domain. This is the case if we have memcpy using
1474 // LDS variables. Both src and dst of memcpy would belong to
1475 // LDS struct, they do not alias.
1476 // On the other hand, if one of the domains is LDS and other is
1477 // existing domain prior to LDS, we need to have a union of all
1478 // these aliases set to preserve existing aliasing information.
1479
1480 SmallPtrSet<const MDNode *, 16> ExistingDomains, LDSDomains;
1481 ScopedNoAlias.collectScopedDomains(NA, ExistingDomains);
1482 ScopedNoAlias.collectScopedDomains(NoAlias, LDSDomains);
1483 auto Intersection = set_intersection(ExistingDomains, LDSDomains);
1484 if (Intersection.empty()) {
1485 NA = NA ? MDNode::concatenate(NA, NoAlias) : NoAlias;
1486 } else {
1487 NA = NA ? MDNode::intersect(NA, NoAlias) : NoAlias;
1488 }
1489 I->setMetadata(LLVMContext::MD_noalias, NA);
1490 }
1491 }
1492
1493 if (auto *LI = dyn_cast<LoadInst>(U)) {
1494 LI->setAlignment(std::max(A, LI->getAlign()));
1495 continue;
1496 }
1497 if (auto *SI = dyn_cast<StoreInst>(U)) {
1498 if (SI->getPointerOperand() == Ptr)
1499 SI->setAlignment(std::max(A, SI->getAlign()));
1500 continue;
1501 }
1502 if (auto *AI = dyn_cast<AtomicRMWInst>(U)) {
1503 // None of the atomicrmw operations can work on pointers, but check it
1504 // anyway in case that changes or we end up processing a ConstantExpr.
1505 if (AI->getPointerOperand() == Ptr)
1506 AI->setAlignment(std::max(A, AI->getAlign()));
1507 continue;
1508 }
1509 if (auto *AI = dyn_cast<AtomicCmpXchgInst>(U)) {
1510 if (AI->getPointerOperand() == Ptr)
1511 AI->setAlignment(std::max(A, AI->getAlign()));
1512 continue;
1513 }
1514 if (auto *GEP = dyn_cast<GetElementPtrInst>(U)) {
1515 unsigned BitWidth = DL.getIndexTypeSizeInBits(GEP->getType());
1516 APInt Off(BitWidth, 0);
1517 if (GEP->getPointerOperand() == Ptr) {
1518 Align GA;
1519 if (GEP->accumulateConstantOffset(DL, Off))
1520 GA = commonAlignment(A, Off.getLimitedValue());
1521 refineUsesAlignmentAndAA(GEP, GA, DL, AliasScope, NoAlias,
1522 MaxDepth - 1);
1523 }
1524 continue;
1525 }
1526 if (auto *I = dyn_cast<Instruction>(U)) {
1527 if (I->getOpcode() == Instruction::BitCast ||
1528 I->getOpcode() == Instruction::AddrSpaceCast)
1529 refineUsesAlignmentAndAA(I, A, DL, AliasScope, NoAlias, MaxDepth - 1);
1530 }
1531 }
1532 }
1533};
1534
1535class AMDGPULowerModuleLDSLegacy : public ModulePass {
1536public:
1537 const AMDGPUTargetMachine *TM;
1538 static char ID;
1539
1540 AMDGPULowerModuleLDSLegacy(const AMDGPUTargetMachine *TM = nullptr)
1541 : ModulePass(ID), TM(TM) {}
1542
1543 void getAnalysisUsage(AnalysisUsage &AU) const override {
1544 if (!TM)
1545 AU.addRequired<TargetPassConfig>();
1546 }
1547
1548 bool runOnModule(Module &M) override {
1549 if (!TM) {
1550 auto &TPC = getAnalysis<TargetPassConfig>();
1551 TM = &TPC.getTM<AMDGPUTargetMachine>();
1552 }
1553
1554 return AMDGPULowerModuleLDS(*TM).runOnModule(M);
1555 }
1556};
1557
1558} // namespace
1559char AMDGPULowerModuleLDSLegacy::ID = 0;
1560
1561char &llvm::AMDGPULowerModuleLDSLegacyPassID = AMDGPULowerModuleLDSLegacy::ID;
1562
1563INITIALIZE_PASS_BEGIN(AMDGPULowerModuleLDSLegacy, DEBUG_TYPE,
1564 "Lower uses of LDS variables from non-kernel functions",
1565 false, false)
1566 INITIALIZE_PASS_DEPENDENCY(TargetPassConfig)
1567 INITIALIZE_PASS_END(AMDGPULowerModuleLDSLegacy, DEBUG_TYPE,
1568 "Lower uses of LDS variables from non-kernel functions",
1569 false, false)
1570
1571 ModulePass *
1572 llvm::createAMDGPULowerModuleLDSLegacyPass(const AMDGPUTargetMachine *TM) {
1573 return new AMDGPULowerModuleLDSLegacy(TM);
1574 }
1575
1576 PreservedAnalyses AMDGPULowerModuleLDSPass::run(Module &M,
1577 ModuleAnalysisManager &) {
1578 return AMDGPULowerModuleLDS(TM).runOnModule(M) ? PreservedAnalyses::none()
1579 : PreservedAnalyses::all();
1580 }