//===-- SIMachineScheduler.cpp - SI Scheduler Interface -------------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
/// \file
/// SI Machine Scheduler interface
//
//===----------------------------------------------------------------------===//

#include "SIMachineScheduler.h"
#include "AMDGPU.h"
#include "SIInstrInfo.h"
#include "llvm/CodeGen/LiveIntervals.h"
#include "llvm/CodeGen/MachineScheduler.h"

using namespace llvm;

#define DEBUG_TYPE "machine-scheduler"

// This scheduler implements a different scheduling algorithm than
// GenericScheduler.
//
// There are several specific architecture behaviours that can't be modelled
// for GenericScheduler:
// . When accessing the result of an SGPR load instruction, you have to wait
//   for all the SGPR load instructions before your current instruction to
//   have finished.
// . When accessing the result of a VGPR load instruction, you have to wait
//   for all the VGPR load instructions previous to the VGPR load instruction
//   you are interested in to finish.
// . The lower the register pressure, the better load latencies are hidden.
//
// Moreover some specificities (like the fact a lot of instructions in the
// shader have few dependencies) make the generic scheduler have some
// unpredictable behaviours. For example when register pressure becomes high,
// it can either manage to prevent register pressure from going too high, or
// it can increase register pressure even more than if it hadn't taken
// register pressure into account.
//
// Also some other bad behaviours are generated, like loading at the beginning
// of the shader a constant in VGPR you won't need until the end of the shader.
//
// The scheduling problem for SI can distinguish three main parts:
// . Hiding high latencies (texture sampling, etc)
// . Hiding low latencies (SGPR constant loading, etc)
// . Keeping register usage low for better latency hiding and general
//   performance
//
// Some other things can also affect performance, but are hard to predict
// (cache usage, the fact the HW can issue several instructions from different
// wavefronts if of different types, etc)
//
// This scheduler tries to solve the scheduling problem by dividing it into
// simpler sub-problems. It divides the instructions into blocks, schedules
// locally inside the blocks where it takes care of low latencies, and then
// chooses the order of the blocks by taking care of high latencies.
// Dividing the instructions into blocks helps control keeping register
// usage low.
//
// First the instructions are put into blocks.
// We want the blocks to help control register usage and hide high latencies
// later. To help control register usage, we typically want all local
// computations, when for example you create a result that can be consumed
// right away, to be contained in a block. Block inputs and outputs would
// typically be important results that are needed in several locations of
// the shader. Since we do want blocks to help hide high latencies, we want
// the instructions inside the block to have a minimal set of dependencies
// on high latencies. It will make it easy to pick blocks to hide specific
// high latencies.
// The block creation algorithm is divided into several steps, and several
// variants can be tried during the scheduling process.
//
// Second the order of the instructions inside the blocks is chosen.
// At that step we take into account only register usage and hiding
// low latency instructions.
//
// Third the block order is chosen. There we try to hide high latencies
// and keep register usage low.
//
// After the third step, a pass is done to improve the hiding of low
// latencies.
//
// Actually when talking about 'low latency' or 'high latency' it includes
// both the latency to get the cache (or global mem) data to the register,
// and the bandwidth limitations.
// Increasing the number of active wavefronts helps hide the former, but it
// doesn't solve the latter, thus why even if wavefront count is high, we have
// to try to have as many instructions hiding high latencies as possible.
// The OpenCL doc gives, for example, a latency of 400 cycles for a global mem
// access, which is hidden by 10 instructions if the wavefront count is 10.

// Some figures taken from AMD docs:
// Both texture and constant L1 caches are 4-way associative with 64 bytes
// lines.
// Constant cache is shared with 4 CUs.
// For texture sampling, the address generation unit receives 4 texture
// addresses per cycle, thus we could expect texture sampling latency to be
// equivalent to 4 instructions in the very best case (a VGPR is 64 work items,
// instructions in a wavefront group are executed every 4 cycles),
// or 16 instructions if the other wavefronts associated to the 3 other VALUs
// of the CU do texture sampling too. (Don't take these figures too seriously,
// as I'm not 100% sure of the computation)
// Data exports should get similar latency.
// For constant loading, the cache is shared with 4 CUs.
// The doc says "a throughput of 16B/cycle for each of the 4 Compute Unit"
// I guess if the other CUs don't read the cache, it can go up to 64B/cycle.
// It means a simple s_buffer_load should take one instruction to hide, as
// well as a s_buffer_loadx2 and potentially a s_buffer_loadx8 if on the same
// cache line.
//
// As of today the driver doesn't preload the constants in cache, thus the
// first loads get extra latency. The doc says global memory access can be
// 300-600 cycles. We do not specially take that into account when scheduling,
// as we expect the driver to be able to preload the constants soon.

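// A back-of-the-envelope check of the 400-cycle figure above (illustrative
// sketch only, not used by the scheduler; the numbers are the assumed
// figures from these comments):
//
//   constexpr unsigned MemLatency = 400;  // global memory latency (cycles)
//   constexpr unsigned IssueRate = 4;     // cycles per wavefront instruction
//   constexpr unsigned Wavefronts = 10;   // active wavefronts
//   // Independent instructions each wavefront must issue to cover it:
//   constexpr unsigned InstsToHide =
//       MemLatency / (IssueRate * Wavefronts); // == 10
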
// common code //

#ifndef NDEBUG

static const char *getReasonStr(SIScheduleCandReason Reason) {
  switch (Reason) {
  case NoCand:    return "NOCAND";
  case RegUsage:  return "REGUSAGE";
  case Latency:   return "LATENCY";
  case Successor: return "SUCCESSOR";
  case Depth:     return "DEPTH";
  case NodeOrder: return "ORDER";
  }
  llvm_unreachable("Unknown reason!");
}

#endif

namespace llvm::SISched {
static bool tryLess(int TryVal, int CandVal,
                    SISchedulerCandidate &TryCand,
                    SISchedulerCandidate &Cand,
                    SIScheduleCandReason Reason) {
  if (TryVal < CandVal) {
    TryCand.Reason = Reason;
    return true;
  }
  if (TryVal > CandVal) {
    if (Cand.Reason > Reason)
      Cand.Reason = Reason;
    return true;
  }
  Cand.setRepeat(Reason);
  return false;
}

static bool tryGreater(int TryVal, int CandVal,
                       SISchedulerCandidate &TryCand,
                       SISchedulerCandidate &Cand,
                       SIScheduleCandReason Reason) {
  if (TryVal > CandVal) {
    TryCand.Reason = Reason;
    return true;
  }
  if (TryVal < CandVal) {
    if (Cand.Reason > Reason)
      Cand.Reason = Reason;
    return true;
  }
  Cand.setRepeat(Reason);
  return false;
}
} // end namespace llvm::SISched
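
// A minimal sketch of how the two helpers above are meant to be chained
// (hypothetical candidate fields; the real cascades appear in
// tryCandidateTopDown and tryCandidateLatency below). Each criterion either
// decides the comparison and returns, or falls through to the next,
// lower-priority one:
//
//   if (SISched::tryLess(TryCand.VGPRUsage, Cand.VGPRUsage,
//                        TryCand, Cand, RegUsage))
//     return;
//   if (SISched::tryGreater(TryCand.Height, Cand.Height,
//                           TryCand, Cand, Depth))
//     return;
//   // Otherwise fall through to original instruction order.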

// SIScheduleBlock //

void SIScheduleBlock::addUnit(SUnit *SU) {
  NodeNum2Index[SU->NodeNum] = SUnits.size();
  SUnits.push_back(SU);
}

#ifndef NDEBUG
void SIScheduleBlock::traceCandidate(const SISchedCandidate &Cand) {

  dbgs() << "  SU(" << Cand.SU->NodeNum << ") " << getReasonStr(Cand.Reason);
  dbgs() << '\n';
}
#endif

void SIScheduleBlock::tryCandidateTopDown(SISchedCandidate &Cand,
                                          SISchedCandidate &TryCand) {
  // Initialize the candidate if needed.
  if (!Cand.isValid()) {
    TryCand.Reason = NodeOrder;
    return;
  }

  if (Cand.SGPRUsage > 60 &&
      SISched::tryLess(TryCand.SGPRUsage, Cand.SGPRUsage,
                       TryCand, Cand, RegUsage))
    return;

  // Schedule low latency instructions as top as possible.
  // Order of priority is:
  // . Low latency instructions which do not depend on other low latency
  //   instructions we haven't waited for
  // . Other instructions which do not depend on low latency instructions
  //   we haven't waited for
  // . Low latencies
  // . All other instructions
  // Goal is to get: low latency instructions - independent instructions
  // - (eventually some more low latency instructions)
  // - instructions that depend on the first low latency instructions.
  // If the block contains a lot of constant loads, SGPR usage can go quite
  // high; the arbitrary limit of 60 above encourages using the already
  // loaded constants (in order to release some SGPRs) before loading more.
  if (SISched::tryLess(TryCand.HasLowLatencyNonWaitedParent,
                       Cand.HasLowLatencyNonWaitedParent,
                       TryCand, Cand, SIScheduleCandReason::Depth))
    return;

  if (SISched::tryGreater(TryCand.IsLowLatency, Cand.IsLowLatency,
                          TryCand, Cand, SIScheduleCandReason::Depth))
    return;

  if (TryCand.IsLowLatency &&
      SISched::tryLess(TryCand.LowLatencyOffset, Cand.LowLatencyOffset,
                       TryCand, Cand, SIScheduleCandReason::Depth))
    return;

  if (SISched::tryLess(TryCand.VGPRUsage, Cand.VGPRUsage,
                       TryCand, Cand, RegUsage))
    return;

  // Fall through to original instruction order.
  if (TryCand.SU->NodeNum < Cand.SU->NodeNum) {
    TryCand.Reason = NodeOrder;
  }
}

SUnit* SIScheduleBlock::pickNode() {
  SISchedCandidate TopCand;

  for (SUnit* SU : TopReadySUs) {
    SISchedCandidate TryCand;
    std::vector<unsigned> pressure;
    std::vector<unsigned> MaxPressure;
    // Predict register usage after this instruction.
    TryCand.SU = SU;
    TopRPTracker.getDownwardPressure(SU->getInstr(), pressure, MaxPressure);
    TryCand.SGPRUsage = pressure[AMDGPU::RegisterPressureSets::SReg_32];
    TryCand.VGPRUsage = pressure[AMDGPU::RegisterPressureSets::VGPR_32];
    TryCand.IsLowLatency = DAG->IsLowLatencySU[SU->NodeNum];
    TryCand.LowLatencyOffset = DAG->LowLatencyOffset[SU->NodeNum];
    TryCand.HasLowLatencyNonWaitedParent =
      HasLowLatencyNonWaitedParent[NodeNum2Index[SU->NodeNum]];
    tryCandidateTopDown(TopCand, TryCand);
    if (TryCand.Reason != NoCand)
      TopCand.setBest(TryCand);
  }

  return TopCand.SU;
}


// Schedule something valid.
void SIScheduleBlock::fastSchedule() {
  TopReadySUs.clear();
  if (Scheduled)
    undoSchedule();

  for (SUnit* SU : SUnits) {
    if (!SU->NumPredsLeft)
      TopReadySUs.push_back(SU);
  }

  while (!TopReadySUs.empty()) {
    SUnit *SU = TopReadySUs[0];
    ScheduledSUnits.push_back(SU);
    nodeScheduled(SU);
  }

  Scheduled = true;
}

// Returns if the register was set between first and last.
static bool isDefBetween(Register Reg, SlotIndex First, SlotIndex Last,
                         const MachineRegisterInfo *MRI,
                         const LiveIntervals *LIS) {
  for (const MachineInstr &MI : MRI->def_instructions(Reg)) {
    if (MI.isDebugValue())
      continue;
    SlotIndex InstSlot = LIS->getInstructionIndex(MI).getRegSlot();
    if (InstSlot >= First && InstSlot <= Last)
      return true;
  }
  return false;
}

void SIScheduleBlock::initRegPressure(MachineBasicBlock::iterator BeginBlock,
                                      MachineBasicBlock::iterator EndBlock) {
  IntervalPressure Pressure, BotPressure;
  RegPressureTracker RPTracker(Pressure), BotRPTracker(BotPressure);
  LiveIntervals *LIS = DAG->getLIS();
  MachineRegisterInfo *MRI = DAG->getMRI();
  DAG->initRPTracker(TopRPTracker);
  DAG->initRPTracker(BotRPTracker);
  DAG->initRPTracker(RPTracker);

  // Goes through all SUs. RPTracker captures what had to be alive for the SUs
  // to execute, and what is still alive at the end.
  for (SUnit* SU : ScheduledSUnits) {
    RPTracker.setPos(SU->getInstr());
    RPTracker.advance();
  }

  // Close the RPTracker to finalize live ins/outs.
  RPTracker.closeRegion();

  // Initialize the live ins and live outs.
  TopRPTracker.addLiveRegs(RPTracker.getPressure().LiveInRegs);
  BotRPTracker.addLiveRegs(RPTracker.getPressure().LiveOutRegs);

  // Do not track physical registers, because it messes up.
  for (const auto &RegMaskPair : RPTracker.getPressure().LiveInRegs) {
    if (RegMaskPair.RegUnit.isVirtual())
      LiveInRegs.insert(RegMaskPair.RegUnit);
  }
  LiveOutRegs.clear();
  // There are several possibilities to distinguish:
  // 1) Reg is not input to any instruction in the block, but is output of one
  // 2) 1) + read in the block and not needed after it
  // 3) 1) + read in the block but needed in another block
  // 4) Reg is input of an instruction but another block will read it too
  // 5) Reg is input of an instruction and then rewritten in the block.
  //    result is not read in the block (implies used in another block)
  // 6) Reg is input of an instruction and then rewritten in the block.
  //    result is read in the block and not needed in another block
  // 7) Reg is input of an instruction and then rewritten in the block.
  //    result is read in the block but also needed in another block
  // LiveInRegs will contain all the regs in situations 4, 5, 6, 7.
  // We want LiveOutRegs to contain only Regs whose content will be read after
  // in another block, and whose content was written in the current block,
  // that is we want it to get 1, 3, 5, 7.
  // Since we made the MIs of a block to be packed all together before
  // scheduling, the LiveIntervals were correct, and the RPTracker was
  // able to correctly handle 5 vs 6, and 2 vs 3.
  // (Note: This is not sufficient for the RPTracker to avoid mistakes
  // for case 4.)
  // The RPTracker's LiveOutRegs has 1, 3, (some correct or incorrect) 4, 5, 7.
  // Comparing to LiveInRegs is not sufficient to differentiate 4 vs 5, 7.
  // The use of isDefBetween removes case 4.
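  // Tiny illustration of the cases above (hypothetical vregs, MIR-like
  // pseudo-syntax):
  //   %a = ...     (defined in an earlier block, only read here: case 4)
  //   %b = op %a   (written and only read inside the block: case 2)
  //   %c = op %b   (written here, read in a later block: case 3)
  // Here LiveInRegs contains %a, and LiveOutRegs should end up with only
  // %c: %a is filtered out below because no def of %a lies between
  // BeginBlock and EndBlock, and %b dies inside the block.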
  for (const auto &RegMaskPair : RPTracker.getPressure().LiveOutRegs) {
    Register Reg = RegMaskPair.RegUnit;
    if (Reg.isVirtual() &&
        isDefBetween(Reg, LIS->getInstructionIndex(*BeginBlock).getRegSlot(),
                     LIS->getInstructionIndex(*EndBlock).getRegSlot(), MRI,
                     LIS)) {
      LiveOutRegs.insert(Reg);
    }
  }

  // Pressure = sum_alive_registers register size
  // Internally llvm will represent some registers as big 128 bit registers
  // for example, but they actually correspond to 4 actual 32 bit registers.
  // Thus Pressure is not equal to num_alive_registers * constant.
  // (E.g. one live VReg_128 adds 4 to the VGPR_32 pressure set, not 1.)
  LiveInPressure = TopPressure.MaxSetPressure;
  LiveOutPressure = BotPressure.MaxSetPressure;

  // Prepares TopRPTracker for top down scheduling.
  TopRPTracker.closeTop();
}

void SIScheduleBlock::schedule(MachineBasicBlock::iterator BeginBlock,
                               MachineBasicBlock::iterator EndBlock) {
  if (!Scheduled)
    fastSchedule();

  // PreScheduling phase to set LiveIn and LiveOut.
  initRegPressure(BeginBlock, EndBlock);
  undoSchedule();

  // Schedule for real now.

  TopReadySUs.clear();

  for (SUnit* SU : SUnits) {
    if (!SU->NumPredsLeft)
      TopReadySUs.push_back(SU);
  }

  while (!TopReadySUs.empty()) {
    SUnit *SU = pickNode();
    ScheduledSUnits.push_back(SU);
    TopRPTracker.setPos(SU->getInstr());
    TopRPTracker.advance();
    nodeScheduled(SU);
  }

  // TODO: compute InternalAdditionalPressure.
  InternalAdditionalPressure.resize(TopPressure.MaxSetPressure.size());

  // Check everything is right.
#ifndef NDEBUG
  assert(SUnits.size() == ScheduledSUnits.size() &&
         TopReadySUs.empty());
  for (SUnit* SU : SUnits) {
    assert(SU->isScheduled &&
           SU->NumPredsLeft == 0);
  }
#endif

  Scheduled = true;
}

void SIScheduleBlock::undoSchedule() {
  for (SUnit* SU : SUnits) {
    SU->isScheduled = false;
    for (SDep& Succ : SU->Succs) {
      if (BC->isSUInBlock(Succ.getSUnit(), ID))
        undoReleaseSucc(SU, &Succ);
    }
  }
  HasLowLatencyNonWaitedParent.assign(SUnits.size(), 0);
  ScheduledSUnits.clear();
  Scheduled = false;
}

void SIScheduleBlock::undoReleaseSucc(SUnit *SU, SDep *SuccEdge) {
  SUnit *SuccSU = SuccEdge->getSUnit();

  if (SuccEdge->isWeak()) {
    ++SuccSU->WeakPredsLeft;
    return;
  }
  ++SuccSU->NumPredsLeft;
}

void SIScheduleBlock::releaseSucc(SUnit *SU, SDep *SuccEdge) {
  SUnit *SuccSU = SuccEdge->getSUnit();

  if (SuccEdge->isWeak()) {
    --SuccSU->WeakPredsLeft;
    return;
  }
#ifndef NDEBUG
  if (SuccSU->NumPredsLeft == 0) {
    dbgs() << "*** Scheduling failed! ***\n";
    DAG->dumpNode(*SuccSU);
    dbgs() << " has been released too many times!\n";
    llvm_unreachable(nullptr);
  }
#endif

  --SuccSU->NumPredsLeft;
}

/// Release the successors of SU that are inside the block (if InOrOutBlock
/// is true) or outside it (if false).
void SIScheduleBlock::releaseSuccessors(SUnit *SU, bool InOrOutBlock) {
  for (SDep& Succ : SU->Succs) {
    SUnit *SuccSU = Succ.getSUnit();

    if (SuccSU->NodeNum >= DAG->SUnits.size())
      continue;

    if (BC->isSUInBlock(SuccSU, ID) != InOrOutBlock)
      continue;

    releaseSucc(SU, &Succ);
    if (SuccSU->NumPredsLeft == 0 && InOrOutBlock)
      TopReadySUs.push_back(SuccSU);
  }
}

void SIScheduleBlock::nodeScheduled(SUnit *SU) {
  // Is in TopReadySUs
  assert(!SU->NumPredsLeft);
  std::vector<SUnit *>::iterator I = llvm::find(TopReadySUs, SU);
  if (I == TopReadySUs.end()) {
    dbgs() << "Data Structure Bug in SI Scheduler\n";
    llvm_unreachable(nullptr);
  }
  TopReadySUs.erase(I);

  releaseSuccessors(SU, true);
  // Scheduling this node will trigger a wait,
  // thus propagate to other instructions that they do not need to wait either.
  if (HasLowLatencyNonWaitedParent[NodeNum2Index[SU->NodeNum]])
    HasLowLatencyNonWaitedParent.assign(SUnits.size(), 0);

  if (DAG->IsLowLatencySU[SU->NodeNum]) {
    for (SDep& Succ : SU->Succs) {
      std::map<unsigned, unsigned>::iterator I =
        NodeNum2Index.find(Succ.getSUnit()->NodeNum);
      if (I != NodeNum2Index.end())
        HasLowLatencyNonWaitedParent[I->second] = 1;
    }
  }
  SU->isScheduled = true;
}
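
// Illustrative timeline for the propagation above (assumed SI wait
// behaviour from the file header; hypothetical instructions):
//   s_load_dword s0, ...   <- low latency, marks its successors as waiting
//   s_load_dword s1, ...   <- low latency, marks its successors as waiting
//   s_add ..., s0, ...     <- waits; the wait covers *all* prior SGPR loads
//   s_add ..., s1, ...     <- needs no further wait: s1 has already arrived
// Scheduling the first consumer therefore clears
// HasLowLatencyNonWaitedParent for the whole block.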

void SIScheduleBlock::finalizeUnits() {
  // We remove links from outside blocks to enable scheduling inside the block.
  for (SUnit* SU : SUnits) {
    releaseSuccessors(SU, false);
    if (DAG->IsHighLatencySU[SU->NodeNum])
      HighLatencyBlock = true;
  }
  HasLowLatencyNonWaitedParent.resize(SUnits.size(), 0);
}

// We maintain ascending order of IDs.
void SIScheduleBlock::addPred(SIScheduleBlock *Pred) {
  unsigned PredID = Pred->getID();

  // Check if not already predecessor.
  for (SIScheduleBlock* P : Preds) {
    if (PredID == P->getID())
      return;
  }
  Preds.push_back(Pred);

  assert(none_of(Succs,
                 [=](std::pair<SIScheduleBlock*,
                     SIScheduleBlockLinkKind> S) {
                   return PredID == S.first->getID();
                 }) &&
         "Loop in the Block Graph!");
}

void SIScheduleBlock::addSucc(SIScheduleBlock *Succ,
                              SIScheduleBlockLinkKind Kind) {
  unsigned SuccID = Succ->getID();

  // Check if not already successor.
  for (std::pair<SIScheduleBlock*, SIScheduleBlockLinkKind> &S : Succs) {
    if (SuccID == S.first->getID()) {
      if (S.second == SIScheduleBlockLinkKind::NoData &&
          Kind == SIScheduleBlockLinkKind::Data)
        S.second = Kind;
      return;
    }
  }
  if (Succ->isHighLatencyBlock())
    ++NumHighLatencySuccessors;
  Succs.emplace_back(Succ, Kind);

  assert(none_of(Preds,
                 [=](SIScheduleBlock *P) { return SuccID == P->getID(); }) &&
         "Loop in the Block Graph!");
}

#ifndef NDEBUG
void SIScheduleBlock::printDebug(bool full) {
  dbgs() << "Block (" << ID << ")\n";
  if (!full)
    return;

  dbgs() << "\nContains High Latency Instruction: "
         << HighLatencyBlock << '\n';
  dbgs() << "\nDepends On:\n";
  for (SIScheduleBlock* P : Preds) {
    P->printDebug(false);
  }

  dbgs() << "\nSuccessors:\n";
  for (std::pair<SIScheduleBlock*, SIScheduleBlockLinkKind> S : Succs) {
    if (S.second == SIScheduleBlockLinkKind::Data)
      dbgs() << "(Data Dep) ";
    S.first->printDebug(false);
  }

  if (Scheduled) {
    dbgs() << "LiveInPressure "
           << LiveInPressure[AMDGPU::RegisterPressureSets::SReg_32] << ' '
           << LiveInPressure[AMDGPU::RegisterPressureSets::VGPR_32] << '\n';
    dbgs() << "LiveOutPressure "
           << LiveOutPressure[AMDGPU::RegisterPressureSets::SReg_32] << ' '
           << LiveOutPressure[AMDGPU::RegisterPressureSets::VGPR_32] << "\n\n";
    dbgs() << "LiveIns:\n";
    for (Register Reg : LiveInRegs)
      dbgs() << printVRegOrUnit(Reg, DAG->getTRI()) << ' ';

    dbgs() << "\nLiveOuts:\n";
    for (Register Reg : LiveOutRegs)
      dbgs() << printVRegOrUnit(Reg, DAG->getTRI()) << ' ';
  }

  dbgs() << "\nInstructions:\n";
  for (const SUnit* SU : SUnits)
    DAG->dumpNode(*SU);

  dbgs() << "///////////////////////\n";
}
#endif

// SIScheduleBlockCreator //

SIScheduleBlockCreator::SIScheduleBlockCreator(SIScheduleDAGMI *DAG)
    : DAG(DAG) {}

SIScheduleBlocks
SIScheduleBlockCreator::getBlocks(SISchedulerBlockCreatorVariant BlockVariant) {
  std::map<SISchedulerBlockCreatorVariant, SIScheduleBlocks>::iterator B =
    Blocks.find(BlockVariant);
  if (B == Blocks.end()) {
    SIScheduleBlocks Res;
    createBlocksForVariant(BlockVariant);
    topologicalSort();
    scheduleInsideBlocks();
    fillStats();
    Res.Blocks = CurrentBlocks;
    Res.TopDownIndex2Block = TopDownIndex2Block;
    Res.TopDownBlock2Index = TopDownBlock2Index;
    Blocks[BlockVariant] = Res;
    return Res;
  }
  return B->second;
}

bool SIScheduleBlockCreator::isSUInBlock(SUnit *SU, unsigned ID) {
  if (SU->NodeNum >= DAG->SUnits.size())
    return false;
  return CurrentBlocks[Node2CurrentBlock[SU->NodeNum]]->getID() == ID;
}

void SIScheduleBlockCreator::colorHighLatenciesAlone() {
  unsigned DAGSize = DAG->SUnits.size();

  for (unsigned i = 0, e = DAGSize; i != e; ++i) {
    SUnit *SU = &DAG->SUnits[i];
    if (DAG->IsHighLatencySU[SU->NodeNum]) {
      CurrentColoring[SU->NodeNum] = NextReservedID++;
    }
  }
}

static bool
hasDataDependencyPred(const SUnit &SU, const SUnit &FromSU) {
  for (const auto &PredDep : SU.Preds) {
    if (PredDep.getSUnit() == &FromSU &&
        PredDep.getKind() == llvm::SDep::Data)
      return true;
  }
  return false;
}

void SIScheduleBlockCreator::colorHighLatenciesGroups() {
  unsigned DAGSize = DAG->SUnits.size();
  unsigned NumHighLatencies = 0;
  unsigned GroupSize;
  int Color = NextReservedID;
  unsigned Count = 0;
  std::set<unsigned> FormingGroup;

  for (unsigned i = 0, e = DAGSize; i != e; ++i) {
    SUnit *SU = &DAG->SUnits[i];
    if (DAG->IsHighLatencySU[SU->NodeNum])
      ++NumHighLatencies;
  }

  if (NumHighLatencies == 0)
    return;

  if (NumHighLatencies <= 6)
    GroupSize = 2;
  else if (NumHighLatencies <= 12)
    GroupSize = 3;
  else
    GroupSize = 4;

  for (unsigned SUNum : DAG->TopDownIndex2SU) {
    const SUnit &SU = DAG->SUnits[SUNum];
    if (DAG->IsHighLatencySU[SU.NodeNum]) {
      bool CompatibleGroup = true;
      int ProposedColor = Color;
      std::vector<int> AdditionalElements;

      // We don't want to put in the same block
      // two high latency instructions that depend
      // on each other.
      // One way would be to check canAddEdge
      // in both directions, but that currently is not
      // enough because there the high latency order is
      // enforced (via links).
      // Instead, look at the dependencies between the
      // high latency instructions and deduce if it is
      // a data dependency or not.
      for (unsigned j : FormingGroup) {
        bool HasSubGraph;
        std::vector<int> SubGraph;
        // By construction (topological order), if SU and
        // DAG->SUnits[j] are linked, DAG->SUnits[j] is necessarily
        // in the parent graph of SU.
#ifndef NDEBUG
        SubGraph = DAG->GetTopo()->GetSubGraph(SU, DAG->SUnits[j],
                                               HasSubGraph);
        assert(!HasSubGraph);
#endif
        SubGraph = DAG->GetTopo()->GetSubGraph(DAG->SUnits[j], SU,
                                               HasSubGraph);
        if (!HasSubGraph)
          continue; // No dependencies between each other
        if (SubGraph.size() > 5) {
          // Too many elements would be required to be added to the block.
          CompatibleGroup = false;
          break;
        }
        // Check the type of dependency
        for (unsigned k : SubGraph) {
          // If in the path to join the two instructions,
          // there is another high latency instruction,
          // or instructions colored for another block,
          // abort the merge.
          if (DAG->IsHighLatencySU[k] || (CurrentColoring[k] != ProposedColor &&
                                          CurrentColoring[k] != 0)) {
            CompatibleGroup = false;
            break;
          }
          // If one of the SUs in the subgraph depends on the result of SU j,
          // there'll be a data dependency.
          if (hasDataDependencyPred(DAG->SUnits[k], DAG->SUnits[j])) {
            CompatibleGroup = false;
            break;
          }
        }
        if (!CompatibleGroup)
          break;
        // Same check for the SU
        if (hasDataDependencyPred(SU, DAG->SUnits[j])) {
          CompatibleGroup = false;
          break;
        }
        // Add all the required instructions to the block.
        // These cannot live in another block (because they
        // depend (order dependency) on one of the
        // instructions in the block, and are required for the
        // high latency instruction we add).
        llvm::append_range(AdditionalElements, SubGraph);
      }
      if (CompatibleGroup) {
        FormingGroup.insert(SU.NodeNum);
        for (unsigned j : AdditionalElements)
          CurrentColoring[j] = ProposedColor;
        CurrentColoring[SU.NodeNum] = ProposedColor;
        ++Count;
      }
      // Found one incompatible instruction,
      // or has filled a big enough group.
      // -> start a new one.
      if (!CompatibleGroup) {
        FormingGroup.clear();
        Color = ++NextReservedID;
        ProposedColor = Color;
        FormingGroup.insert(SU.NodeNum);
        CurrentColoring[SU.NodeNum] = ProposedColor;
        Count = 0;
      } else if (Count == GroupSize) {
        FormingGroup.clear();
        Color = ++NextReservedID;
        ProposedColor = Color;
        Count = 0;
      }
    }
  }
}
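
// Example of the grouping above (hypothetical shader): with 8 high latency
// texture samples, GroupSize is 3, so groups of up to 3 mutually
// independent samples share one reserved color; a sample that data-depends
// on a group member (or would drag more than 5 extra instructions into the
// block) starts a new group instead.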

void SIScheduleBlockCreator::colorComputeReservedDependencies() {
  unsigned DAGSize = DAG->SUnits.size();
  std::map<std::set<unsigned>, unsigned> ColorCombinations;

  CurrentTopDownReservedDependencyColoring.clear();
  CurrentBottomUpReservedDependencyColoring.clear();

  CurrentTopDownReservedDependencyColoring.resize(DAGSize, 0);
  CurrentBottomUpReservedDependencyColoring.resize(DAGSize, 0);

  // Traverse TopDown, and give different colors to SUs depending
  // on which combination of High Latencies they depend on.

  for (unsigned SUNum : DAG->TopDownIndex2SU) {
    SUnit *SU = &DAG->SUnits[SUNum];
    std::set<unsigned> SUColors;

    // Already given.
    if (CurrentColoring[SU->NodeNum]) {
      CurrentTopDownReservedDependencyColoring[SU->NodeNum] =
        CurrentColoring[SU->NodeNum];
      continue;
    }

    for (SDep& PredDep : SU->Preds) {
      SUnit *Pred = PredDep.getSUnit();
      if (PredDep.isWeak() || Pred->NodeNum >= DAGSize)
        continue;
      if (CurrentTopDownReservedDependencyColoring[Pred->NodeNum] > 0)
        SUColors.insert(CurrentTopDownReservedDependencyColoring[Pred->NodeNum]);
    }
    // Color 0 by default.
    if (SUColors.empty())
      continue;
    // Same color as the parents.
    if (SUColors.size() == 1 && *SUColors.begin() > DAGSize)
      CurrentTopDownReservedDependencyColoring[SU->NodeNum] =
        *SUColors.begin();
    else {
      auto [Pos, Inserted] =
        ColorCombinations.try_emplace(SUColors, NextNonReservedID);
      if (Inserted)
        ++NextNonReservedID;
      CurrentTopDownReservedDependencyColoring[SU->NodeNum] = Pos->second;
    }
  }

  ColorCombinations.clear();

  // Same as before, but BottomUp.

  for (unsigned SUNum : DAG->BottomUpIndex2SU) {
    SUnit *SU = &DAG->SUnits[SUNum];
    std::set<unsigned> SUColors;

    // Already given.
    if (CurrentColoring[SU->NodeNum]) {
      CurrentBottomUpReservedDependencyColoring[SU->NodeNum] =
        CurrentColoring[SU->NodeNum];
      continue;
    }

    for (SDep& SuccDep : SU->Succs) {
      SUnit *Succ = SuccDep.getSUnit();
      if (SuccDep.isWeak() || Succ->NodeNum >= DAGSize)
        continue;
      if (CurrentBottomUpReservedDependencyColoring[Succ->NodeNum] > 0)
        SUColors.insert(CurrentBottomUpReservedDependencyColoring[Succ->NodeNum]);
    }
    // Keep color 0.
    if (SUColors.empty())
      continue;
    // Same color as the parents.
    if (SUColors.size() == 1 && *SUColors.begin() > DAGSize)
      CurrentBottomUpReservedDependencyColoring[SU->NodeNum] =
        *SUColors.begin();
    else {
      std::map<std::set<unsigned>, unsigned>::iterator Pos =
        ColorCombinations.find(SUColors);
      if (Pos != ColorCombinations.end()) {
        CurrentBottomUpReservedDependencyColoring[SU->NodeNum] = Pos->second;
      } else {
        CurrentBottomUpReservedDependencyColoring[SU->NodeNum] =
          NextNonReservedID;
        ColorCombinations[SUColors] = NextNonReservedID++;
      }
    }
  }
}
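
// Small worked example of the coloring above (hypothetical DAG): with two
// high latency loads H1 and H2 holding reserved colors,
//   every SU depending (transitively) on H1 only shares one color,
//   every SU depending on H2 only shares another,
//   and SUs depending on both get a third color for the set {H1, H2},
// so each distinct combination of high latency dependencies ends up in its
// own candidate group.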

void SIScheduleBlockCreator::colorAccordingToReservedDependencies() {
  std::map<std::pair<unsigned, unsigned>, unsigned> ColorCombinations;

  // Every combination of colors given by the top down and bottom up
  // reserved node dependency colorings gets its own color.

  for (const SUnit &SU : DAG->SUnits) {
    std::pair<unsigned, unsigned> SUColors;

    // High latency instructions: already given.
    if (CurrentColoring[SU.NodeNum])
      continue;

    SUColors.first = CurrentTopDownReservedDependencyColoring[SU.NodeNum];
    SUColors.second = CurrentBottomUpReservedDependencyColoring[SU.NodeNum];

    auto [Pos, Inserted] =
      ColorCombinations.try_emplace(SUColors, NextNonReservedID);
    CurrentColoring[SU.NodeNum] = Pos->second;
    if (Inserted)
      NextNonReservedID++;
  }
}

void SIScheduleBlockCreator::colorEndsAccordingToDependencies() {
  unsigned DAGSize = DAG->SUnits.size();
  std::vector<int> PendingColoring = CurrentColoring;

  assert(DAGSize >= 1 &&
         CurrentBottomUpReservedDependencyColoring.size() == DAGSize &&
         CurrentTopDownReservedDependencyColoring.size() == DAGSize);
  // If there is no reserved block at all, do nothing. We don't want
  // everything in one block.
  if (*llvm::max_element(CurrentBottomUpReservedDependencyColoring) == 0 &&
      *llvm::max_element(CurrentTopDownReservedDependencyColoring) == 0)
    return;

  for (unsigned SUNum : DAG->BottomUpIndex2SU) {
    SUnit *SU = &DAG->SUnits[SUNum];
    std::set<unsigned> SUColors;
    std::set<unsigned> SUColorsPending;

    if (CurrentColoring[SU->NodeNum] <= (int)DAGSize)
      continue;

    if (CurrentBottomUpReservedDependencyColoring[SU->NodeNum] > 0 ||
        CurrentTopDownReservedDependencyColoring[SU->NodeNum] > 0)
      continue;

    for (SDep& SuccDep : SU->Succs) {
      SUnit *Succ = SuccDep.getSUnit();
      if (SuccDep.isWeak() || Succ->NodeNum >= DAGSize)
        continue;
      if (CurrentBottomUpReservedDependencyColoring[Succ->NodeNum] > 0 ||
          CurrentTopDownReservedDependencyColoring[Succ->NodeNum] > 0)
        SUColors.insert(CurrentColoring[Succ->NodeNum]);
      SUColorsPending.insert(PendingColoring[Succ->NodeNum]);
    }
    // If there is only one child/parent block, and that block
    // is not among the ones we are removing in this path, then
    // merge the instruction into that block.
    if (SUColors.size() == 1 && SUColorsPending.size() == 1)
      PendingColoring[SU->NodeNum] = *SUColors.begin();
    else // TODO: Attribute new colors depending on color
         // combination of children.
      PendingColoring[SU->NodeNum] = NextNonReservedID++;
  }
  CurrentColoring = PendingColoring;
}

void SIScheduleBlockCreator::colorForceConsecutiveOrderInGroup() {
  unsigned DAGSize = DAG->SUnits.size();
  unsigned PreviousColor;
  std::set<unsigned> SeenColors;

  if (DAGSize <= 1)
    return;

  PreviousColor = CurrentColoring[0];

  for (unsigned i = 1, e = DAGSize; i != e; ++i) {
    SUnit *SU = &DAG->SUnits[i];
    unsigned CurrentColor = CurrentColoring[i];
    unsigned PreviousColorSave = PreviousColor;
    assert(i == SU->NodeNum);

    if (CurrentColor != PreviousColor)
      SeenColors.insert(PreviousColor);
    PreviousColor = CurrentColor;

    if (CurrentColoring[SU->NodeNum] <= (int)DAGSize)
      continue;

    if (SeenColors.find(CurrentColor) == SeenColors.end())
      continue;

    if (PreviousColorSave != CurrentColor)
      CurrentColoring[i] = NextNonReservedID++;
    else
      CurrentColoring[i] = CurrentColoring[i-1];
  }
}

void SIScheduleBlockCreator::colorMergeConstantLoadsNextGroup() {
  unsigned DAGSize = DAG->SUnits.size();

  for (unsigned SUNum : DAG->BottomUpIndex2SU) {
    SUnit *SU = &DAG->SUnits[SUNum];
    std::set<unsigned> SUColors;

    if (CurrentColoring[SU->NodeNum] <= (int)DAGSize)
      continue;

    // No predecessor: VGPR constant loading.
    // Low latency instructions usually have a predecessor (the address).
    if (SU->Preds.size() > 0 && !DAG->IsLowLatencySU[SU->NodeNum])
      continue;

    for (SDep& SuccDep : SU->Succs) {
      SUnit *Succ = SuccDep.getSUnit();
      if (SuccDep.isWeak() || Succ->NodeNum >= DAGSize)
        continue;
      SUColors.insert(CurrentColoring[Succ->NodeNum]);
    }
    if (SUColors.size() == 1)
      CurrentColoring[SU->NodeNum] = *SUColors.begin();
  }
}

void SIScheduleBlockCreator::colorMergeIfPossibleNextGroup() {
  unsigned DAGSize = DAG->SUnits.size();

  for (unsigned SUNum : DAG->BottomUpIndex2SU) {
    SUnit *SU = &DAG->SUnits[SUNum];
    std::set<unsigned> SUColors;

    if (CurrentColoring[SU->NodeNum] <= (int)DAGSize)
      continue;

    for (SDep& SuccDep : SU->Succs) {
      SUnit *Succ = SuccDep.getSUnit();
      if (SuccDep.isWeak() || Succ->NodeNum >= DAGSize)
        continue;
      SUColors.insert(CurrentColoring[Succ->NodeNum]);
    }
    if (SUColors.size() == 1)
      CurrentColoring[SU->NodeNum] = *SUColors.begin();
  }
}

void SIScheduleBlockCreator::colorMergeIfPossibleNextGroupOnlyForReserved() {
  unsigned DAGSize = DAG->SUnits.size();

  for (unsigned SUNum : DAG->BottomUpIndex2SU) {
    SUnit *SU = &DAG->SUnits[SUNum];
    std::set<unsigned> SUColors;

    if (CurrentColoring[SU->NodeNum] <= (int)DAGSize)
      continue;

    for (SDep& SuccDep : SU->Succs) {
      SUnit *Succ = SuccDep.getSUnit();
      if (SuccDep.isWeak() || Succ->NodeNum >= DAGSize)
        continue;
      SUColors.insert(CurrentColoring[Succ->NodeNum]);
    }
    if (SUColors.size() == 1 && *SUColors.begin() <= DAGSize)
      CurrentColoring[SU->NodeNum] = *SUColors.begin();
  }
}

void SIScheduleBlockCreator::colorMergeIfPossibleSmallGroupsToNextGroup() {
  unsigned DAGSize = DAG->SUnits.size();
  std::map<unsigned, unsigned> ColorCount;

  for (unsigned SUNum : DAG->BottomUpIndex2SU) {
    SUnit *SU = &DAG->SUnits[SUNum];
    unsigned color = CurrentColoring[SU->NodeNum];
    ++ColorCount[color];
  }

  for (unsigned SUNum : DAG->BottomUpIndex2SU) {
    SUnit *SU = &DAG->SUnits[SUNum];
    unsigned color = CurrentColoring[SU->NodeNum];
    std::set<unsigned> SUColors;

    if (CurrentColoring[SU->NodeNum] <= (int)DAGSize)
      continue;

    if (ColorCount[color] > 1)
      continue;

    for (SDep& SuccDep : SU->Succs) {
      SUnit *Succ = SuccDep.getSUnit();
      if (SuccDep.isWeak() || Succ->NodeNum >= DAGSize)
        continue;
      SUColors.insert(CurrentColoring[Succ->NodeNum]);
    }
    if (SUColors.size() == 1 && *SUColors.begin() != color) {
      --ColorCount[color];
      CurrentColoring[SU->NodeNum] = *SUColors.begin();
      ++ColorCount[*SUColors.begin()];
    }
  }
}

void SIScheduleBlockCreator::cutHugeBlocks() {
  // TODO
}

void SIScheduleBlockCreator::regroupNoUserInstructions() {
  unsigned DAGSize = DAG->SUnits.size();
  int GroupID = NextNonReservedID++;

  for (unsigned SUNum : DAG->BottomUpIndex2SU) {
    SUnit *SU = &DAG->SUnits[SUNum];
    bool hasSuccessor = false;

    if (CurrentColoring[SU->NodeNum] <= (int)DAGSize)
      continue;

    for (SDep& SuccDep : SU->Succs) {
      SUnit *Succ = SuccDep.getSUnit();
      if (SuccDep.isWeak() || Succ->NodeNum >= DAGSize)
        continue;
      hasSuccessor = true;
    }
    if (!hasSuccessor)
      CurrentColoring[SU->NodeNum] = GroupID;
  }
}

void SIScheduleBlockCreator::colorExports() {
  unsigned ExportColor = NextNonReservedID++;
  SmallVector<unsigned, 8> ExpGroup;

  // Put all exports together in a block.
  // The block will naturally end up being scheduled last,
  // thus putting exports at the end of the schedule, which
  // is better for performance.
  // However we must ensure, for safety, that the exports can be put
  // together in the same block without any other instruction.
  // This could happen, for example, when scheduling after regalloc
  // if reloading a spilled register from memory uses the same
  // register as a previous export.
  // If that happens, do not regroup the exports.
  for (unsigned SUNum : DAG->TopDownIndex2SU) {
    const SUnit &SU = DAG->SUnits[SUNum];
    if (SIInstrInfo::isEXP(*SU.getInstr())) {
      // SU is an export instruction. Check whether one of its successor
      // dependencies is a non-export, in which case we skip export grouping.
      for (const SDep &SuccDep : SU.Succs) {
        const SUnit *SuccSU = SuccDep.getSUnit();
        if (SuccDep.isWeak() || SuccSU->NodeNum >= DAG->SUnits.size()) {
          // Ignore these dependencies.
          continue;
        }
        assert(SuccSU->isInstr() &&
               "SUnit unexpectedly not representing an instruction!");

        if (!SIInstrInfo::isEXP(*SuccSU->getInstr())) {
          // A non-export depends on us. Skip export grouping.
          // Note that this is a bit pessimistic: We could still group all
          // other exports that are not depended on by non-exports, directly
          // or indirectly. Simply skipping this particular export but
          // grouping all others would not account for indirect dependencies.
          return;
        }
      }
      ExpGroup.push_back(SUNum);
    }
  }

  // The group can be formed. Give the color.
  for (unsigned j : ExpGroup)
    CurrentColoring[j] = ExportColor;
}

void SIScheduleBlockCreator::createBlocksForVariant(SISchedulerBlockCreatorVariant BlockVariant) {
  unsigned DAGSize = DAG->SUnits.size();
  std::map<unsigned,unsigned> RealID;

  CurrentBlocks.clear();
  CurrentColoring.clear();
  CurrentColoring.resize(DAGSize, 0);
  Node2CurrentBlock.clear();

  // Restore links previous scheduling variant has overridden.
  DAG->restoreSULinksLeft();

  NextReservedID = 1;
  NextNonReservedID = DAGSize + 1;

  LLVM_DEBUG(dbgs() << "Coloring the graph\n");

  if (BlockVariant == SISchedulerBlockCreatorVariant::LatenciesGrouped)
    colorHighLatenciesGroups();
  else
    colorHighLatenciesAlone();
  colorComputeReservedDependencies();
  colorAccordingToReservedDependencies();
  colorEndsAccordingToDependencies();
  if (BlockVariant == SISchedulerBlockCreatorVariant::LatenciesAlonePlusConsecutive)
    colorForceConsecutiveOrderInGroup();
  regroupNoUserInstructions();
  colorMergeConstantLoadsNextGroup();
  colorMergeIfPossibleNextGroupOnlyForReserved();
  colorExports();

  // Put SUs of same color into same block
  Node2CurrentBlock.resize(DAGSize, -1);
  for (unsigned i = 0, e = DAGSize; i != e; ++i) {
    SUnit *SU = &DAG->SUnits[i];
    unsigned Color = CurrentColoring[SU->NodeNum];
    auto [It, Inserted] = RealID.try_emplace(Color);
    if (Inserted) {
      int ID = CurrentBlocks.size();
      BlockPtrs.push_back(std::make_unique<SIScheduleBlock>(DAG, this, ID));
      CurrentBlocks.push_back(BlockPtrs.rbegin()->get());
      It->second = ID;
    }
    CurrentBlocks[It->second]->addUnit(SU);
    Node2CurrentBlock[SU->NodeNum] = It->second;
  }

  // Build dependencies between blocks.
  for (unsigned i = 0, e = DAGSize; i != e; ++i) {
    SUnit *SU = &DAG->SUnits[i];
    int SUID = Node2CurrentBlock[i];
    for (SDep& SuccDep : SU->Succs) {
      SUnit *Succ = SuccDep.getSUnit();
      if (SuccDep.isWeak() || Succ->NodeNum >= DAGSize)
        continue;
      if (Node2CurrentBlock[Succ->NodeNum] != SUID)
        CurrentBlocks[SUID]->addSucc(CurrentBlocks[Node2CurrentBlock[Succ->NodeNum]],
                                     SuccDep.isCtrl() ? NoData : Data);
    }
    for (SDep& PredDep : SU->Preds) {
      SUnit *Pred = PredDep.getSUnit();
      if (PredDep.isWeak() || Pred->NodeNum >= DAGSize)
        continue;
      if (Node2CurrentBlock[Pred->NodeNum] != SUID)
        CurrentBlocks[SUID]->addPred(CurrentBlocks[Node2CurrentBlock[Pred->NodeNum]]);
    }
  }

  // Free root and leafs of all blocks to enable scheduling inside them.
  for (SIScheduleBlock *Block : CurrentBlocks)
    Block->finalizeUnits();
  LLVM_DEBUG({
    dbgs() << "Blocks created:\n\n";
    for (SIScheduleBlock *Block : CurrentBlocks)
      Block->printDebug(true);
  });
}

// Two functions taken from Codegen/MachineScheduler.cpp

/// Non-const version.
static MachineBasicBlock::iterator
nextIfDebug(MachineBasicBlock::iterator I,
            MachineBasicBlock::const_iterator End) {
  for (; I != End; ++I) {
    if (!I->isDebugInstr())
      break;
  }
  return I;
}

void SIScheduleBlockCreator::topologicalSort() {
  unsigned DAGSize = CurrentBlocks.size();
  std::vector<int> WorkList;

  LLVM_DEBUG(dbgs() << "Topological Sort\n");

  WorkList.reserve(DAGSize);
  TopDownIndex2Block.resize(DAGSize);
  TopDownBlock2Index.resize(DAGSize);
  BottomUpIndex2Block.resize(DAGSize);

  for (unsigned i = 0, e = DAGSize; i != e; ++i) {
    SIScheduleBlock *Block = CurrentBlocks[i];
    unsigned Degree = Block->getSuccs().size();
    TopDownBlock2Index[i] = Degree;
    if (Degree == 0) {
      WorkList.push_back(i);
    }
  }

  int Id = DAGSize;
  while (!WorkList.empty()) {
    int i = WorkList.back();
    SIScheduleBlock *Block = CurrentBlocks[i];
    WorkList.pop_back();
    TopDownBlock2Index[i] = --Id;
    TopDownIndex2Block[Id] = i;
    for (SIScheduleBlock* Pred : Block->getPreds()) {
      if (!--TopDownBlock2Index[Pred->getID()])
        WorkList.push_back(Pred->getID());
    }
  }

#ifndef NDEBUG
  // Check correctness of the ordering.
  for (unsigned i = 0, e = DAGSize; i != e; ++i) {
    SIScheduleBlock *Block = CurrentBlocks[i];
    for (SIScheduleBlock* Pred : Block->getPreds()) {
      assert(TopDownBlock2Index[i] > TopDownBlock2Index[Pred->getID()] &&
             "Wrong Top Down topological sorting");
    }
  }
#endif

  BottomUpIndex2Block = std::vector<int>(TopDownIndex2Block.rbegin(),
                                         TopDownIndex2Block.rend());
}
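
// Note on the sort above: TopDownBlock2Index doubles as scratch storage.
// It first holds each block's remaining successor count, and is overwritten
// with the final topological index when the block is popped off the
// worklist (the usual in-place variant of Kahn's algorithm, run here over
// successors so that index 0 ends up at a block with no predecessors).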

void SIScheduleBlockCreator::scheduleInsideBlocks() {
  unsigned DAGSize = CurrentBlocks.size();

  LLVM_DEBUG(dbgs() << "\nScheduling Blocks\n\n");

  // We do schedule a valid scheduling such that a Block corresponds
  // to a range of instructions.
  LLVM_DEBUG(dbgs() << "First phase: Fast scheduling for Reg Liveness\n");
  for (unsigned i = 0, e = DAGSize; i != e; ++i) {
    SIScheduleBlock *Block = CurrentBlocks[i];
    Block->fastSchedule();
  }

  // Note: the following code, and the part restoring previous position,
  // is by far the most expensive operation of the Scheduler.

  // Do not update CurrentTop.
  MachineBasicBlock::iterator CurrentTopFastSched = DAG->getCurrentTop();
  std::vector<MachineBasicBlock::iterator> PosOld;
  std::vector<MachineBasicBlock::iterator> PosNew;
  PosOld.reserve(DAG->SUnits.size());
  PosNew.reserve(DAG->SUnits.size());

  for (unsigned i = 0, e = DAGSize; i != e; ++i) {
    int BlockIndice = TopDownIndex2Block[i];
    SIScheduleBlock *Block = CurrentBlocks[BlockIndice];
    std::vector<SUnit*> SUs = Block->getScheduledUnits();

    for (SUnit* SU : SUs) {
      MachineInstr *MI = SU->getInstr();
      MachineBasicBlock::iterator Pos = MI;
      PosOld.push_back(Pos);
      if (&*CurrentTopFastSched == MI) {
        PosNew.push_back(Pos);
        CurrentTopFastSched = nextIfDebug(++CurrentTopFastSched,
                                          DAG->getCurrentBottom());
      } else {
        // Update the instruction stream.
        DAG->getBB()->splice(CurrentTopFastSched, DAG->getBB(), MI);

        // Update LiveIntervals.
        // Note: Moving all instructions and calling handleMove every time
        // is the most cpu intensive operation of the scheduler.
        // It would gain a lot if there was a way to recompute the
        // LiveIntervals for the entire scheduling region.
        DAG->getLIS()->handleMove(*MI, /*UpdateFlags=*/true);
        PosNew.push_back(CurrentTopFastSched);
      }
    }
  }

  // Now we have Block of SUs == Block of MI.
  // We do the final schedule for the instructions inside the block.
  // The property that all the SUs of the Block are grouped together as MI
  // is used for correct reg usage tracking.
  for (unsigned i = 0, e = DAGSize; i != e; ++i) {
    SIScheduleBlock *Block = CurrentBlocks[i];
    std::vector<SUnit*> SUs = Block->getScheduledUnits();
    Block->schedule((*SUs.begin())->getInstr(), (*SUs.rbegin())->getInstr());
  }

  LLVM_DEBUG(dbgs() << "Restoring MI Pos\n");
  // Restore old ordering (which prevents a LIS->handleMove bug).
  for (unsigned i = PosOld.size(), e = 0; i != e; --i) {
    MachineBasicBlock::iterator POld = PosOld[i-1];
    MachineBasicBlock::iterator PNew = PosNew[i-1];
    if (PNew != POld) {
      // Update the instruction stream.
      DAG->getBB()->splice(POld, DAG->getBB(), PNew);

      // Update LiveIntervals.
      DAG->getLIS()->handleMove(*POld, /*UpdateFlags=*/true);
    }
  }

  LLVM_DEBUG({
    for (SIScheduleBlock *Block : CurrentBlocks)
      Block->printDebug(true);
  });
}

void SIScheduleBlockCreator::fillStats() {
  unsigned DAGSize = CurrentBlocks.size();

  for (unsigned i = 0, e = DAGSize; i != e; ++i) {
    int BlockIndice = TopDownIndex2Block[i];
    SIScheduleBlock *Block = CurrentBlocks[BlockIndice];
    if (Block->getPreds().empty())
      Block->Depth = 0;
    else {
      unsigned Depth = 0;
      for (SIScheduleBlock *Pred : Block->getPreds()) {
        if (Depth < Pred->Depth + Pred->getCost())
          Depth = Pred->Depth + Pred->getCost();
      }
      Block->Depth = Depth;
    }
  }

  for (unsigned i = 0, e = DAGSize; i != e; ++i) {
    int BlockIndice = BottomUpIndex2Block[i];
    SIScheduleBlock *Block = CurrentBlocks[BlockIndice];
    if (Block->getSuccs().empty())
      Block->Height = 0;
    else {
      unsigned Height = 0;
      for (const auto &Succ : Block->getSuccs())
        Height = std::max(Height, Succ.first->Height + Succ.first->getCost());
      Block->Height = Height;
    }
  }
}
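
// Worked example for the stats above (hypothetical chain of blocks
// A -> B -> C with getCost() 2, 3 and 1 respectively):
//   Depth:  A = 0, B = 0 + 2 = 2, C = 2 + 3 = 5   (accumulated pred costs)
//   Height: C = 0, B = 0 + 1 = 1, A = 1 + 3 = 4   (accumulated succ costs)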

// SIScheduleBlockScheduler //

SIScheduleBlockScheduler::SIScheduleBlockScheduler(SIScheduleDAGMI *DAG,
                                                   SISchedulerBlockSchedulerVariant Variant,
                                                   SIScheduleBlocks BlocksStruct) :
  DAG(DAG), Variant(Variant), Blocks(BlocksStruct.Blocks),
  LastPosWaitedHighLatency(0), NumBlockScheduled(0), VregCurrentUsage(0),
  SregCurrentUsage(0), maxVregUsage(0), maxSregUsage(0) {

  // Fill the usage of every output.
  // Warning: while by construction we always have a link between two blocks
  // when one needs a result from the other, the number of users of an output
  // is not the sum of child blocks having as input the same virtual register.
  // Here is an example. A produces x and y. B eats x and produces x'.
  // C eats x' and y. The register coalescer may have attributed the same
  // virtual register to x and x'.
  // To count accurately, we do a topological sort. In case the register is
  // found for several parents, we increment the usage of the one with the
  // highest topological index.
  LiveOutRegsNumUsages.resize(Blocks.size());
  for (SIScheduleBlock *Block : Blocks) {
    for (Register Reg : Block->getInRegs()) {
      bool Found = false;
      int topoInd = -1;
      for (SIScheduleBlock* Pred: Block->getPreds()) {
        std::set<Register> PredOutRegs = Pred->getOutRegs();
        std::set<Register>::iterator RegPos = PredOutRegs.find(Reg);

        if (RegPos != PredOutRegs.end()) {
          Found = true;
          if (topoInd < BlocksStruct.TopDownBlock2Index[Pred->getID()]) {
            topoInd = BlocksStruct.TopDownBlock2Index[Pred->getID()];
          }
        }
      }

      if (!Found)
        continue;

      int PredID = BlocksStruct.TopDownIndex2Block[topoInd];
      ++LiveOutRegsNumUsages[PredID][Reg];
    }
  }

  LastPosHighLatencyParentScheduled.resize(Blocks.size(), 0);
  BlockNumPredsLeft.resize(Blocks.size());
  BlockNumSuccsLeft.resize(Blocks.size());

  for (unsigned i = 0, e = Blocks.size(); i != e; ++i) {
    SIScheduleBlock *Block = Blocks[i];
    BlockNumPredsLeft[i] = Block->getPreds().size();
    BlockNumSuccsLeft[i] = Block->getSuccs().size();
  }

#ifndef NDEBUG
  for (unsigned i = 0, e = Blocks.size(); i != e; ++i) {
    SIScheduleBlock *Block = Blocks[i];
    assert(Block->getID() == i);
  }
#endif

  std::set<Register> InRegs = DAG->getInRegs();
  addLiveRegs(InRegs);

  // Increase LiveOutRegsNumUsages for blocks
  // producing registers consumed in another
  // scheduling region.
  for (Register Reg : DAG->getOutRegs()) {
    for (unsigned i = 0, e = Blocks.size(); i != e; ++i) {
      // Do reverse traversal.
      int ID = BlocksStruct.TopDownIndex2Block[Blocks.size()-1-i];
      SIScheduleBlock *Block = Blocks[ID];
      const std::set<Register> &OutRegs = Block->getOutRegs();

      if (OutRegs.find(Reg) == OutRegs.end())
        continue;

      ++LiveOutRegsNumUsages[ID][Reg];
      break;
    }
  }

  // Fill LiveRegsConsumers for regs that were already
  // defined before scheduling.
  for (SIScheduleBlock *Block : Blocks) {
    for (Register Reg : Block->getInRegs()) {
      bool Found = false;
      for (SIScheduleBlock* Pred: Block->getPreds()) {
        std::set<Register> PredOutRegs = Pred->getOutRegs();
        std::set<Register>::iterator RegPos = PredOutRegs.find(Reg);

        if (RegPos != PredOutRegs.end()) {
          Found = true;
          break;
        }
      }

      if (!Found)
        ++LiveRegsConsumers[Reg];
    }
  }

  for (unsigned i = 0, e = Blocks.size(); i != e; ++i) {
    SIScheduleBlock *Block = Blocks[i];
    if (BlockNumPredsLeft[i] == 0) {
      ReadyBlocks.push_back(Block);
    }
  }

  while (SIScheduleBlock *Block = pickBlock()) {
    BlocksScheduled.push_back(Block);
    blockScheduled(Block);
  }

  LLVM_DEBUG(dbgs() << "Block Order:"; for (SIScheduleBlock *Block
                                            : BlocksScheduled) {
    dbgs() << ' ' << Block->getID();
  } dbgs() << '\n';);
}

bool SIScheduleBlockScheduler::tryCandidateLatency(SIBlockSchedCandidate &Cand,
                                                   SIBlockSchedCandidate &TryCand) {
  if (!Cand.isValid()) {
    TryCand.Reason = NodeOrder;
    return true;
  }

  // Try to hide high latencies.
  if (SISched::tryLess(TryCand.LastPosHighLatParentScheduled,
                       Cand.LastPosHighLatParentScheduled,
                       TryCand, Cand, Latency))
    return true;
  // Schedule high latencies early so you can hide them better.
  if (SISched::tryGreater(TryCand.IsHighLatency, Cand.IsHighLatency,
                          TryCand, Cand, Latency))
    return true;
  if (TryCand.IsHighLatency && SISched::tryGreater(TryCand.Height, Cand.Height,
                                                   TryCand, Cand, Depth))
    return true;
  if (SISched::tryGreater(TryCand.NumHighLatencySuccessors,
                          Cand.NumHighLatencySuccessors,
                          TryCand, Cand, Successor))
    return true;
  return false;
}

bool SIScheduleBlockScheduler::tryCandidateRegUsage(SIBlockSchedCandidate &Cand,
                                                    SIBlockSchedCandidate &TryCand) {
  if (!Cand.isValid()) {
    TryCand.Reason = NodeOrder;
    return true;
  }

  if (SISched::tryLess(TryCand.VGPRUsageDiff > 0, Cand.VGPRUsageDiff > 0,
                       TryCand, Cand, RegUsage))
    return true;
  if (SISched::tryGreater(TryCand.NumSuccessors > 0,
                          Cand.NumSuccessors > 0,
                          TryCand, Cand, Successor))
    return true;
  if (SISched::tryGreater(TryCand.Height, Cand.Height, TryCand, Cand, Depth))
    return true;
  if (SISched::tryLess(TryCand.VGPRUsageDiff, Cand.VGPRUsageDiff,
                       TryCand, Cand, RegUsage))
    return true;
  return false;
}

SIScheduleBlock *SIScheduleBlockScheduler::pickBlock() {
  SIBlockSchedCandidate Cand;
  std::vector<SIScheduleBlock*>::iterator Best;
  SIScheduleBlock *Block;
  if (ReadyBlocks.empty())
    return nullptr;

  DAG->fillVgprSgprCost(LiveRegs.begin(), LiveRegs.end(),
                        VregCurrentUsage, SregCurrentUsage);
  if (VregCurrentUsage > maxVregUsage)
    maxVregUsage = VregCurrentUsage;
  if (SregCurrentUsage > maxSregUsage)
    maxSregUsage = SregCurrentUsage;
  LLVM_DEBUG(dbgs() << "Picking New Blocks\n"; dbgs() << "Available: ";
             for (SIScheduleBlock *Block : ReadyBlocks)
               dbgs() << Block->getID() << ' ';
             dbgs() << "\nCurrent Live:\n";
             for (Register Reg : LiveRegs)
               dbgs() << printVRegOrUnit(Reg, DAG->getTRI()) << ' ';
             dbgs() << '\n';
             dbgs() << "Current VGPRs: " << VregCurrentUsage << '\n';
             dbgs() << "Current SGPRs: " << SregCurrentUsage << '\n';);

  Cand.Block = nullptr;
  for (std::vector<SIScheduleBlock*>::iterator I = ReadyBlocks.begin(),
       E = ReadyBlocks.end(); I != E; ++I) {
    SIBlockSchedCandidate TryCand;
    TryCand.Block = *I;
    TryCand.IsHighLatency = TryCand.Block->isHighLatencyBlock();
    TryCand.VGPRUsageDiff =
      checkRegUsageImpact(TryCand.Block->getInRegs(),
                          TryCand.Block->getOutRegs())[AMDGPU::RegisterPressureSets::VGPR_32];
    TryCand.NumSuccessors = TryCand.Block->getSuccs().size();
    TryCand.NumHighLatencySuccessors =
      TryCand.Block->getNumHighLatencySuccessors();
    TryCand.LastPosHighLatParentScheduled =
      (unsigned int) std::max<int> (0,
         LastPosHighLatencyParentScheduled[TryCand.Block->getID()] -
           LastPosWaitedHighLatency);
    TryCand.Height = TryCand.Block->Height;
    // Try not to increase VGPR usage too much, else we may spill.
    if (VregCurrentUsage > 120 ||
        Variant != SISchedulerBlockSchedulerVariant::BlockLatencyRegUsage) {
      if (!tryCandidateRegUsage(Cand, TryCand) &&
          Variant != SISchedulerBlockSchedulerVariant::BlockRegUsage)
        tryCandidateLatency(Cand, TryCand);
    } else {
      if (!tryCandidateLatency(Cand, TryCand))
        tryCandidateRegUsage(Cand, TryCand);
    }
    if (TryCand.Reason != NoCand) {
      Cand.setBest(TryCand);
      Best = I;
      LLVM_DEBUG(dbgs() << "Best Current Choice: " << Cand.Block->getID() << ' '
                        << getReasonStr(Cand.Reason) << '\n');
    }
  }

  LLVM_DEBUG(dbgs() << "Picking: " << Cand.Block->getID() << '\n';
             dbgs() << "Is a block with high latency instruction: "
                    << (Cand.IsHighLatency ? "yes\n" : "no\n");
             dbgs() << "Position of last high latency dependency: "
                    << Cand.LastPosHighLatParentScheduled << '\n';
             dbgs() << "VGPRUsageDiff: " << Cand.VGPRUsageDiff << '\n';
             dbgs() << '\n';);

  Block = Cand.Block;
  ReadyBlocks.erase(Best);
  return Block;
}

// Tracking of currently alive registers to determine VGPR Usage.

void SIScheduleBlockScheduler::addLiveRegs(std::set<Register> &Regs) {
  for (Register Reg : Regs) {
    // For now only track virtual registers.
    if (!Reg.isVirtual())
      continue;
    // If not already in the live set, then add it.
    (void) LiveRegs.insert(Reg);
  }
}

void SIScheduleBlockScheduler::decreaseLiveRegs(SIScheduleBlock *Block,
                                                std::set<Register> &Regs) {
  for (Register Reg : Regs) {
    // For now only track virtual registers.
    std::set<Register>::iterator Pos = LiveRegs.find(Reg);
    assert(Pos != LiveRegs.end() && // Reg must be live.
           LiveRegsConsumers.find(Reg) != LiveRegsConsumers.end() &&
           LiveRegsConsumers[Reg] >= 1);
    --LiveRegsConsumers[Reg];
    if (LiveRegsConsumers[Reg] == 0)
      LiveRegs.erase(Pos);
  }
}

void SIScheduleBlockScheduler::releaseBlockSuccs(SIScheduleBlock *Parent) {
  for (const auto &Block : Parent->getSuccs()) {
    if (--BlockNumPredsLeft[Block.first->getID()] == 0)
      ReadyBlocks.push_back(Block.first);

    if (Parent->isHighLatencyBlock() &&
        Block.second == SIScheduleBlockLinkKind::Data)
      LastPosHighLatencyParentScheduled[Block.first->getID()] = NumBlockScheduled;
  }
}

void SIScheduleBlockScheduler::blockScheduled(SIScheduleBlock *Block) {
  decreaseLiveRegs(Block, Block->getInRegs());
  addLiveRegs(Block->getOutRegs());
  releaseBlockSuccs(Block);
  for (const auto &RegP : LiveOutRegsNumUsages[Block->getID()]) {
    // We produce this register, thus it must not be previously alive.
    assert(LiveRegsConsumers.find(RegP.first) == LiveRegsConsumers.end() ||
           LiveRegsConsumers[RegP.first] == 0);
    LiveRegsConsumers[RegP.first] += RegP.second;
  }
  if (LastPosHighLatencyParentScheduled[Block->getID()] >
        (unsigned)LastPosWaitedHighLatency)
    LastPosWaitedHighLatency =
      LastPosHighLatencyParentScheduled[Block->getID()];
  ++NumBlockScheduled;
}
1679
1680std::vector<int>
1681SIScheduleBlockScheduler::checkRegUsageImpact(std::set<Register> &InRegs,
1682 std::set<Register> &OutRegs) {
1683 std::vector<int> DiffSetPressure;
1684 DiffSetPressure.assign(DAG->getTRI()->getNumRegPressureSets(), 0);
1685
1686 for (Register Reg : InRegs) {
1687 // For now only track virtual registers.
1688 if (!Reg.isVirtual())
1689 continue;
1690 if (LiveRegsConsumers[Reg] > 1)
1691 continue;
1692 PSetIterator PSetI = DAG->getMRI()->getPressureSets(Reg);
1693 for (; PSetI.isValid(); ++PSetI) {
1694 DiffSetPressure[*PSetI] -= PSetI.getWeight();
1695 }
1696 }
1697
1698 for (Register Reg : OutRegs) {
1699 // For now only track virtual registers.
1700 if (!Reg.isVirtual())
1701 continue;
1702 PSetIterator PSetI = DAG->getMRI()->getPressureSets(Reg);
1703 for (; PSetI.isValid(); ++PSetI) {
1704 DiffSetPressure[*PSetI] += PSetI.getWeight();
1705 }
1706 }
1707
1708 return DiffSetPressure;
1709}
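// Illustrative usage (not part of the original file): pickBlock() above
// indexes the returned vector with a pressure set ID to estimate the VGPR
// delta of scheduling a candidate block next:
//
//   std::vector<int> Diff =
//       checkRegUsageImpact(Block->getInRegs(), Block->getOutRegs());
//   int VGPRDiff = Diff[AMDGPU::RegisterPressureSets::VGPR_32];
//   // VGPRDiff < 0 means scheduling the block is expected to free VGPRs.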
1710
1711// SIScheduler //
1712
1713 struct SIScheduleBlockResult
1714SIScheduler::scheduleVariant(SISchedulerBlockCreatorVariant BlockVariant,
1715 SISchedulerBlockSchedulerVariant ScheduleVariant) {
1716 SIScheduleBlocks Blocks = BlockCreator.getBlocks(BlockVariant);
1717 SIScheduleBlockScheduler Scheduler(DAG, ScheduleVariant, Blocks);
1718 std::vector<SIScheduleBlock*> ScheduledBlocks;
1719 struct SIScheduleBlockResult Res;
1720
1721 ScheduledBlocks = Scheduler.getBlocks();
1722
1723 for (SIScheduleBlock *Block : ScheduledBlocks) {
1724 std::vector<SUnit*> SUs = Block->getScheduledUnits();
1725
1726 for (SUnit* SU : SUs)
1727 Res.SUs.push_back(SU->NodeNum);
1728 }
1729
1730 Res.MaxSGPRUsage = Scheduler.getSGPRUsage();
1731 Res.MaxVGPRUsage = Scheduler.getVGPRUsage();
1732 return Res;
1733}
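// Illustrative usage (not part of the original file): schedule() below drives
// this entry point with a (block creation, block ordering) pair and inspects
// the resulting register usage to decide whether to try another variant:
//
//   SIScheduler Sched(DAG); // DAG is the SIScheduleDAGMI being scheduled
//   SIScheduleBlockResult Res = Sched.scheduleVariant(
//       SISchedulerBlockCreatorVariant::LatenciesGrouped,
//       SISchedulerBlockSchedulerVariant::BlockLatencyRegUsage);
//   // Res.SUs is the flattened SUnit order; Res.MaxVGPRUsage feeds the
//   // fallback logic in schedule().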
1734
1735// SIScheduleDAGMI //
1736
1737 SIScheduleDAGMI::SIScheduleDAGMI(MachineSchedContext *C) :
1738 ScheduleDAGMILive(C, std::make_unique<GenericScheduler>(C)) {
1739 SITII = static_cast<const SIInstrInfo*>(TII);
1740 SITRI = static_cast<const SIRegisterInfo*>(TRI);
1741}
1742
1743 SIScheduleDAGMI::~SIScheduleDAGMI() = default;
1744
1745// Code adapted from scheduleDAG.cpp
1746 // Does a topological sort over the SUs.
1747 // Both TopDown and BottomUp orderings are produced.
1748 void SIScheduleDAGMI::topologicalSort() {
1749 Topo.InitDAGTopologicalSorting();
1750
1751 TopDownIndex2SU = std::vector<int>(Topo.begin(), Topo.end());
1752 BottomUpIndex2SU = std::vector<int>(Topo.rbegin(), Topo.rend());
1753}
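// Resulting invariant: for any dependency edge A -> B in the DAG, A appears
// before B in TopDownIndex2SU and after B in BottomUpIndex2SU, the latter
// being simply the reversed traversal of the same topological order.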
1754
1755 // Move low latency instructions further from their users without
1756 // increasing SGPR usage (in general).
1757 // This is to be replaced by a better pass that would
1758 // take into account SGPR usage (based on VGPR usage
1759 // and the corresponding wavefront count), and that would
1760 // try to merge groups of loads if it makes sense, etc.
1761void SIScheduleDAGMI::moveLowLatencies() {
1762 unsigned DAGSize = SUnits.size();
1763 int LastLowLatencyUser = -1;
1764 int LastLowLatencyPos = -1;
1765
1766 for (unsigned i = 0, e = ScheduledSUnits.size(); i != e; ++i) {
1767 SUnit *SU = &SUnits[ScheduledSUnits[i]];
1768 bool IsLowLatencyUser = false;
1769 unsigned MinPos = 0;
1770
1771 for (SDep& PredDep : SU->Preds) {
1772 SUnit *Pred = PredDep.getSUnit();
1773 if (SITII->isLowLatencyInstruction(*Pred->getInstr())) {
1774 IsLowLatencyUser = true;
1775 }
1776 if (Pred->NodeNum >= DAGSize)
1777 continue;
1778 unsigned PredPos = ScheduledSUnitsInv[Pred->NodeNum];
1779 if (PredPos >= MinPos)
1780 MinPos = PredPos + 1;
1781 }
1782
1783 if (SITII->isLowLatencyInstruction(*SU->getInstr())) {
1784 unsigned BestPos = LastLowLatencyUser + 1;
1785 if ((int)BestPos <= LastLowLatencyPos)
1786 BestPos = LastLowLatencyPos + 1;
1787 if (BestPos < MinPos)
1788 BestPos = MinPos;
1789 if (BestPos < i) {
1790 for (unsigned u = i; u > BestPos; --u) {
1791 ++ScheduledSUnitsInv[ScheduledSUnits[u-1]];
1792 ScheduledSUnits[u] = ScheduledSUnits[u-1];
1793 }
1794 ScheduledSUnits[BestPos] = SU->NodeNum;
1795 ScheduledSUnitsInv[SU->NodeNum] = BestPos;
1796 }
1797 LastLowLatencyPos = BestPos;
1798 if (IsLowLatencyUser)
1799 LastLowLatencyUser = BestPos;
1800 } else if (IsLowLatencyUser) {
1801 LastLowLatencyUser = i;
1802 // Also move the COPY instructions that the
1803 // low latency instructions depend on.
1804 } else if (SU->getInstr()->getOpcode() == AMDGPU::COPY) {
1805 bool CopyForLowLat = false;
1806 for (SDep& SuccDep : SU->Succs) {
1807 SUnit *Succ = SuccDep.getSUnit();
1808 if (SuccDep.isWeak() || Succ->NodeNum >= DAGSize)
1809 continue;
1810 if (SITII->isLowLatencyInstruction(*Succ->getInstr())) {
1811 CopyForLowLat = true;
1812 }
1813 }
1814 if (!CopyForLowLat)
1815 continue;
1816 if (MinPos < i) {
1817 for (unsigned u = i; u > MinPos; --u) {
1818 ++ScheduledSUnitsInv[ScheduledSUnits[u-1]];
1819 ScheduledSUnits[u] = ScheduledSUnits[u-1];
1820 }
1821 ScheduledSUnits[MinPos] = SU->NodeNum;
1822 ScheduledSUnitsInv[SU->NodeNum] = MinPos;
1823 }
1824 }
1825 }
1826}
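// Illustrative effect (hypothetical schedule, not part of the original file):
// for a low latency load L whose operands are ready early and whose user U
// sits right behind it, the rotation loops above push L as early as its
// predecessors allow, so its memory latency overlaps the work in between:
//
//   before: A  B  C  L  U        after: L  A  B  C  U
//
// ScheduledSUnitsInv is maintained as the exact inverse permutation of
// ScheduledSUnits during every rotation.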
1827
1828 void SIScheduleDAGMI::restoreSULinksLeft() {
1829 for (unsigned i = 0, e = SUnits.size(); i != e; ++i) {
1830 SUnits[i].isScheduled = false;
1831 SUnits[i].WeakPredsLeft = SUnitsLinksBackup[i].WeakPredsLeft;
1832 SUnits[i].NumPredsLeft = SUnitsLinksBackup[i].NumPredsLeft;
1833 SUnits[i].WeakSuccsLeft = SUnitsLinksBackup[i].WeakSuccsLeft;
1834 SUnits[i].NumSuccsLeft = SUnitsLinksBackup[i].NumSuccsLeft;
1835 }
1836}
1837
1838 // Compute the VGPR and SGPR usage corresponding to some virtual registers.
1839 template<typename _Iterator> void
1840 SIScheduleDAGMI::fillVgprSgprCost(_Iterator First, _Iterator End,
1841 unsigned &VgprUsage, unsigned &SgprUsage) {
1842 VgprUsage = 0;
1843 SgprUsage = 0;
1844 for (_Iterator RegI = First; RegI != End; ++RegI) {
1845 Register Reg = *RegI;
1846 // For now only track virtual registers
1847 if (!Reg.isVirtual())
1848 continue;
1849 PSetIterator PSetI = MRI.getPressureSets(Reg);
1850 for (; PSetI.isValid(); ++PSetI) {
1851 if (*PSetI == AMDGPU::RegisterPressureSets::VGPR_32)
1852 VgprUsage += PSetI.getWeight();
1853 else if (*PSetI == AMDGPU::RegisterPressureSets::SReg_32)
1854 SgprUsage += PSetI.getWeight();
1855 }
1856 }
1857}
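// Illustrative usage (not part of the original file): callers hand in any
// iterator range over Registers, e.g. a block's input register set, and read
// the two pressure figures back through the reference parameters:
//
//   unsigned VgprUsage = 0, SgprUsage = 0;
//   std::set<Register> Ins = Block->getInRegs();
//   DAG->fillVgprSgprCost(Ins.begin(), Ins.end(), VgprUsage, SgprUsage);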
1858
1859 void SIScheduleDAGMI::schedule()
1860 {
1861 SmallVector<SUnit*, 8> TopRoots, BotRoots;
1862 SIScheduleBlockResult Best, Temp;
1863 LLVM_DEBUG(dbgs() << "Preparing Scheduling\n");
1864
1865 buildDAGWithRegPressure();
1866 postProcessDAG();
1867
1868 LLVM_DEBUG(dump());
1869 if (PrintDAGs)
1870 dump();
1871 if (ViewMISchedDAGs)
1872 viewGraph();
1873
1874 topologicalSort();
1875 findRootsAndBiasEdges(TopRoots, BotRoots);
1876 // We reuse several ScheduleDAGMI and ScheduleDAGMILive
1877 // functions, but to make them happy we must initialize
1878 // the default Scheduler implementation (even if we do not
1879 // run it)
1880 SchedImpl->initialize(this);
1881 initQueues(TopRoots, BotRoots);
1882
1883 // Fill some stats to help scheduling.
1884
1885 SUnitsLinksBackup = SUnits;
1886 IsLowLatencySU.clear();
1887 LowLatencyOffset.clear();
1888 IsHighLatencySU.clear();
1889
1890 IsLowLatencySU.resize(SUnits.size(), 0);
1891 LowLatencyOffset.resize(SUnits.size(), 0);
1892 IsHighLatencySU.resize(SUnits.size(), 0);
1893
1894 for (unsigned i = 0, e = (unsigned)SUnits.size(); i != e; ++i) {
1895 SUnit *SU = &SUnits[i];
1896 const MachineOperand *BaseLatOp;
1897 int64_t OffLatReg;
1898 if (SITII->isLowLatencyInstruction(*SU->getInstr())) {
1899 IsLowLatencySU[i] = 1;
1900 bool OffsetIsScalable;
1901 if (SITII->getMemOperandWithOffset(*SU->getInstr(), BaseLatOp, OffLatReg,
1902 OffsetIsScalable, TRI))
1903 LowLatencyOffset[i] = OffLatReg;
1904 } else if (SITII->isHighLatencyDef(SU->getInstr()->getOpcode()))
1905 IsHighLatencySU[i] = 1;
1906 }
1907
1908 SIScheduler Scheduler(this);
1909 Best = Scheduler.scheduleVariant(SISchedulerBlockCreatorVariant::LatenciesAlone,
1910 SISchedulerBlockSchedulerVariant::BlockLatencyRegUsage);
1911
1912 // If VGPR usage is extremely high, try other well performing variants
1913 // which could lead to lower VGPR usage.
1914 if (Best.MaxVGPRUsage > 180) {
1915 static const std::pair<SISchedulerBlockCreatorVariant,
1916 SISchedulerBlockSchedulerVariant>
1917 Variants[] = {
1918 { LatenciesAlone, BlockRegUsageLatency },
1919 // { LatenciesAlone, BlockRegUsage },
1920 { LatenciesGrouped, BlockLatencyRegUsage },
1921 // { LatenciesGrouped, BlockRegUsageLatency },
1922 // { LatenciesGrouped, BlockRegUsage },
1923 { LatenciesAlonePlusConsecutive, BlockLatencyRegUsage },
1924 // { LatenciesAlonePlusConsecutive, BlockRegUsageLatency },
1925 // { LatenciesAlonePlusConsecutive, BlockRegUsage }
1926 };
1927 for (std::pair<SISchedulerBlockCreatorVariant, SISchedulerBlockSchedulerVariant> v : Variants) {
1928 Temp = Scheduler.scheduleVariant(v.first, v.second);
1929 if (Temp.MaxVGPRUsage < Best.MaxVGPRUsage)
1930 Best = Temp;
1931 }
1932 }
1933 // If VGPR usage is still extremely high, we may spill. Try other variants
1934 // which perform worse, but could lead to lower VGPR usage.
1935 if (Best.MaxVGPRUsage > 200) {
1936 static const std::pair<SISchedulerBlockCreatorVariant,
1937 SISchedulerBlockSchedulerVariant>
1938 Variants[] = {
1939 // { LatenciesAlone, BlockRegUsageLatency },
1940 { LatenciesAlone, BlockRegUsage },
1941 // { LatenciesGrouped, BlockLatencyRegUsage },
1942 { LatenciesGrouped, BlockRegUsageLatency },
1943 { LatenciesGrouped, BlockRegUsage },
1944 // { LatenciesAlonePlusConsecutive, BlockLatencyRegUsage },
1945 { LatenciesAlonePlusConsecutive, BlockRegUsageLatency },
1946 { LatenciesAlonePlusConsecutive, BlockRegUsage }
1947 };
1948 for (std::pair<SISchedulerBlockCreatorVariant, SISchedulerBlockSchedulerVariant> v : Variants) {
1949 Temp = Scheduler.scheduleVariant(v.first, v.second);
1950 if (Temp.MaxVGPRUsage < Best.MaxVGPRUsage)
1951 Best = Temp;
1952 }
1953 }
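// Summary of the fallback cascade above: the initial
// LatenciesAlone/BlockLatencyRegUsage schedule is kept as-is unless its
// MaxVGPRUsage exceeds 180; then the remaining latency-oriented variants are
// tried, and above 200 the register-usage-oriented variants as well, always
// keeping the candidate with the lowest MaxVGPRUsage.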
1954
1955 ScheduledSUnits = Best.SUs;
1956 ScheduledSUnitsInv.resize(SUnits.size());
1957
1958 for (unsigned i = 0, e = (unsigned)SUnits.size(); i != e; ++i) {
1959 ScheduledSUnitsInv[ScheduledSUnits[i]] = i;
1960 }
1961
1962 moveLowLatencies();
1963
1964 // Tell the outside world about the result of the scheduling.
1965
1966 assert(TopRPTracker.getPos() == RegionBegin && "bad initial Top tracker");
1967 TopRPTracker.setPos(CurrentTop);
1968
1969 for (unsigned I : ScheduledSUnits) {
1970 SUnit *SU = &SUnits[I];
1971
1972 scheduleMI(SU, true);
1973
1974 LLVM_DEBUG(dbgs() << "Scheduling SU(" << SU->NodeNum << ") "
1975 << *SU->getInstr());
1976 }
1977
1978 assert(CurrentTop == CurrentBottom && "Nonempty unscheduled zone.");
1979
1980 placeDebugValues();
1981
1982 LLVM_DEBUG({
1983 dbgs() << "*** Final schedule for "
1984 << printMBBReference(*begin()->getParent()) << " ***\n";
1985 dumpSchedule();
1986 dbgs() << '\n';
1987 });
1988}
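// Usage note (an assumption about surrounding code, not part of this file):
// this scheduler is typically selected with -misched=si-scheduler via the
// MachineSchedRegistry entry registered in AMDGPUTargetMachine.cpp, which
// constructs a SIScheduleDAGMI whose schedule() above then replaces
// GenericScheduler's per-region decisions.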