1//===- AArch64FrameLowering.cpp - AArch64 Frame Lowering -------*- C++ -*-====//
2//
3// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
4// See https://llvm.org/LICENSE.txt for license information.
5// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
6//
7//===----------------------------------------------------------------------===//
8//
9// This file contains the AArch64 implementation of TargetFrameLowering class.
10//
11// On AArch64, stack frames are structured as follows:
12//
13// The stack grows downward.
14//
15// Each of the individual frame areas shown below is optional, i.e. it's
16// possible to create a function such that a particular area isn't present
17// in its frame.
18//
19// At function entry, the "frame" looks as follows:
20//
21// | | Higher address
22// |-----------------------------------|
23// | |
24// | arguments passed on the stack |
25// | |
26// |-----------------------------------| <- sp
27// | | Lower address
28//
29//
30// After the prologue has run, the frame has the following general structure.
31// Note that this doesn't depict the case where a red-zone is used. Also,
32// technically the last frame area (VLAs) doesn't get created until the
33// main function body runs, after the prologue. However, it's depicted here
34// for completeness.
35//
36// | | Higher address
37// |-----------------------------------|
38// | |
39// | arguments passed on the stack |
40// | |
41// |-----------------------------------|
42// | |
43// | (Win64 only) varargs from reg |
44// | |
45// |-----------------------------------|
46// | |
47// | callee-saved gpr registers | <--.
48// | | | On Darwin platforms these
49// |- - - - - - - - - - - - - - - - - -| | callee saves are swapped,
50// | prev_lr | | (frame record first)
51// | prev_fp | <--'
52// | async context if needed |
53// | (a.k.a. "frame record") |
54// |-----------------------------------| <- fp(=x29)
55// | <hazard padding> |
56// |-----------------------------------|
57// | |
58// | callee-saved fp/simd/SVE regs |
59// | |
60// |-----------------------------------|
61// | |
62// | SVE stack objects |
63// | |
64// |-----------------------------------|
65// |.empty.space.to.make.part.below....|
66// |.aligned.in.case.it.needs.more.than| (size of this area is unknown at
67// |.the.standard.16-byte.alignment....| compile time; if present)
68// |-----------------------------------|
69// | local variables of fixed size |
70// | including spill slots |
71// | <FPR> |
72// | <hazard padding> |
73// | <GPR> |
74// |-----------------------------------| <- bp(not defined by ABI,
75// |.variable-sized.local.variables....| LLVM chooses X19)
76// |.(VLAs)............................| (size of this area is unknown at
77// |...................................| compile time)
78// |-----------------------------------| <- sp
79// | | Lower address
80//
81//
82// To access data in a frame, a constant offset from one of the pointers
83// (fp, bp, sp) to that data must be computable at compile time. The sizes
84// of the areas with a dotted background cannot be computed at compile time
85// if those areas are present, so all three of fp, bp and sp must be set up
86// in order to access all contents of the frame areas, assuming all of the
87// frame areas are non-empty.
88//
89// For most functions, some of the frame areas are empty. For those functions,
90// it may not be necessary to set up fp or bp:
91// * A base pointer is definitely needed when there are both VLAs and local
92// variables with more-than-default alignment requirements.
93// * A frame pointer is definitely needed when there are local variables with
94// more-than-default alignment requirements.
95//
96// For Darwin platforms the frame-record (fp, lr) is stored at the top of the
97// callee-saved area, since the unwind encoding does not allow for encoding
98// this dynamically and existing tools depend on this layout. For other
99// platforms, the frame-record is stored at the bottom of the (gpr) callee-saved
100// area to allow SVE stack objects (allocated directly below the callee-saves,
101// if available) to be accessed directly from the framepointer.
102// The SVE spill/fill instructions have VL-scaled addressing modes such
103// as:
104// ldr z8, [fp, #-7 mul vl]
105// For SVE the size of the vector length (VL) is not known at compile-time, so
106// '#-7 mul vl' is an offset that can only be evaluated at runtime. With this
107// layout, we don't need to add an unscaled offset to the framepointer before
108// accessing the SVE object in the frame.
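// As an illustrative sketch (not verbatim compiler output), SVE callee-saves
// sitting directly below the frame record can be reached with purely
// VL-scaled offsets:
//     str z8, [fp, #-1 mul vl]
//     str z9, [fp, #-2 mul vl]
// If the frame record were instead placed above the GPR callee-saves, an
// extra fixed-offset adjustment of fp would be needed before each VL-scaled
// access.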
109//
110// In some cases when a base pointer is not strictly needed, it is generated
111// anyway when offsets from the frame pointer to access local variables become
112// so large that the offset can't be encoded in the immediate fields of loads
113// or stores.
114//
115// Outgoing function arguments must be at the bottom of the stack frame when
116// calling another function. If we do not have variable-sized stack objects, we
117// can allocate a "reserved call frame" area at the bottom of the local
118// variable area, large enough for all outgoing calls. If we do have VLAs, then
119// the stack pointer must be decremented and incremented around each call to
120// make space for the arguments below the VLAs.
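// As an illustrative sketch (exact sequences depend on the function): with a
// reserved call frame, the prologue performs one 'sub sp, sp, #N' that
// already includes the largest outgoing-argument area, and each call site
// just stores its arguments at [sp], [sp, #8], ... With VLAs present, each
// call is instead bracketed by its own 'sub sp, sp, #M' / 'add sp, sp, #M'.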
121//
122// FIXME: also explain the redzone concept.
123//
124// About stack hazards: Under some SME contexts, a coprocessor with its own
125// separate cache can be used for FP operations. This can create hazards if the CPU
126// and the SME unit try to access the same area of memory, including if the
127// access is to an area of the stack. To try to alleviate this we attempt to
128// introduce extra padding into the stack frame between FP and GPR accesses,
129// controlled by the aarch64-stack-hazard-size option. Without changing the
130// layout of the stack frame in the diagram above, a stack object of size
131// aarch64-stack-hazard-size is added between GPR and FPR CSRs. Another is added
132// to the stack objects section, and stack objects are sorted so that FPR >
133// Hazard padding slot > GPRs (where possible). Unfortunately some things are
134// not handled well (VLA area, arguments on the stack, objects with both GPR and
135// FPR accesses), but if those are controlled by the user then the entire stack
136// frame becomes GPR at the start/end with FPR in the middle, surrounded by
137// Hazard padding.
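// For example (illustrative invocation only):
//   llc -aarch64-stack-hazard-size=1024 foo.ll
// requests 1024-byte hazard padding slots at the two points described above.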
138//
139// An example of the prologue:
140//
141// .globl __foo
142// .align 2
143// __foo:
144// Ltmp0:
145// .cfi_startproc
146// .cfi_personality 155, ___gxx_personality_v0
147// Leh_func_begin:
148// .cfi_lsda 16, Lexception33
149//
150// stp xa,bx, [sp, -#offset]!
151// ...
152// stp x28, x27, [sp, #offset-32]
153// stp fp, lr, [sp, #offset-16]
154// add fp, sp, #offset - 16
155// sub sp, sp, #1360
156//
157// The Stack:
158// +-------------------------------------------+
159// 10000 | ........ | ........ | ........ | ........ |
160// 10004 | ........ | ........ | ........ | ........ |
161// +-------------------------------------------+
162// 10008 | ........ | ........ | ........ | ........ |
163// 1000c | ........ | ........ | ........ | ........ |
164// +===========================================+
165// 10010 | X28 Register |
166// 10014 | X28 Register |
167// +-------------------------------------------+
168// 10018 | X27 Register |
169// 1001c | X27 Register |
170// +===========================================+
171// 10020 | Frame Pointer |
172// 10024 | Frame Pointer |
173// +-------------------------------------------+
174// 10028 | Link Register |
175// 1002c | Link Register |
176// +===========================================+
177// 10030 | ........ | ........ | ........ | ........ |
178// 10034 | ........ | ........ | ........ | ........ |
179// +-------------------------------------------+
180// 10038 | ........ | ........ | ........ | ........ |
181// 1003c | ........ | ........ | ........ | ........ |
182// +-------------------------------------------+
183//
184// [sp] = 10030 :: >>initial value<<
185// sp = 10020 :: stp fp, lr, [sp, #-16]!
186// fp = sp == 10020 :: mov fp, sp
187// [sp] == 10020 :: stp x28, x27, [sp, #-16]!
188// sp == 10010 :: >>final value<<
189//
190// The frame pointer (w29) points to address 10020. If we use an offset of
191// '16' from 'w29', we get the CFI offsets of -8 for w30, -16 for w29, -24
192// for w27, and -32 for w28:
193//
194// Ltmp1:
195// .cfi_def_cfa w29, 16
196// Ltmp2:
197// .cfi_offset w30, -8
198// Ltmp3:
199// .cfi_offset w29, -16
200// Ltmp4:
201// .cfi_offset w27, -24
202// Ltmp5:
203// .cfi_offset w28, -32
204//
205//===----------------------------------------------------------------------===//
206
207#include "AArch64FrameLowering.h"
208#include "AArch64InstrInfo.h"
210#include "AArch64RegisterInfo.h"
211#include "AArch64Subtarget.h"
215#include "llvm/ADT/ScopeExit.h"
216#include "llvm/ADT/SmallVector.h"
217#include "llvm/ADT/Statistic.h"
234#include "llvm/IR/Attributes.h"
235#include "llvm/IR/CallingConv.h"
236#include "llvm/IR/DataLayout.h"
237#include "llvm/IR/DebugLoc.h"
238#include "llvm/IR/Function.h"
239#include "llvm/MC/MCAsmInfo.h"
240#include "llvm/MC/MCDwarf.h"
242#include "llvm/Support/Debug.h"
249#include <cassert>
250#include <cstdint>
251#include <iterator>
252#include <optional>
253#include <vector>
254
255using namespace llvm;
256
257#define DEBUG_TYPE "frame-info"
258
259static cl::opt<bool> EnableRedZone("aarch64-redzone",
260 cl::desc("enable use of redzone on AArch64"),
261 cl::init(false), cl::Hidden);
262
264 "stack-tagging-merge-settag",
265 cl::desc("merge settag instruction in function epilog"), cl::init(true),
266 cl::Hidden);
267
268static cl::opt<bool> OrderFrameObjects("aarch64-order-frame-objects",
269 cl::desc("sort stack allocations"),
270 cl::init(true), cl::Hidden);
271
273 "homogeneous-prolog-epilog", cl::Hidden,
274 cl::desc("Emit homogeneous prologue and epilogue for the size "
275 "optimization (default = off)"));
276
277// Stack hazard size for analysis remarks. StackHazardSize takes precedence.
278static cl::opt<unsigned>
279    StackHazardRemarkSize("aarch64-stack-hazard-remark-size", cl::init(0),
280 cl::Hidden);
281// Whether to insert padding into non-streaming functions (for testing).
282static cl::opt<bool>
283 StackHazardInNonStreaming("aarch64-stack-hazard-in-non-streaming",
284 cl::init(false), cl::Hidden);
285
287 "aarch64-disable-multivector-spill-fill",
288 cl::desc("Disable use of LD/ST pairs for SME2 or SVE2p1"), cl::init(false),
289 cl::Hidden);
290
291STATISTIC(NumRedZoneFunctions, "Number of functions using red zone");
292
293/// Returns how much of the incoming argument stack area (in bytes) we should
294/// clean up in an epilogue. For the C calling convention this will be 0, for
295/// guaranteed tail call conventions it can be positive (a normal return or a
296/// tail call to a function that uses less stack space for arguments) or
297/// negative (for a tail call to a function that needs more stack space than we
298/// do for arguments).
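/// An illustrative example (numbers made up): under a guaranteed-tail-call
/// convention, a function that received 32 bytes of stack arguments pops all
/// 32 bytes on a normal return; if it instead tail-calls a function needing
/// 48 bytes of stack arguments, the result is 32 - 48 = -16.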
303 bool IsTailCallReturn = (MBB.end() != MBBI)
305 : false;
306
307 int64_t ArgumentPopSize = 0;
308 if (IsTailCallReturn) {
309 MachineOperand &StackAdjust = MBBI->getOperand(1);
310
311 // For a tail-call in a callee-pops-arguments environment, some or all of
312 // the stack may actually be in use for the call's arguments, this is
313 // calculated during LowerCall and consumed here...
314 ArgumentPopSize = StackAdjust.getImm();
315 } else {
316 // ... otherwise the amount to pop is *all* of the argument space,
317 // conveniently stored in the MachineFunctionInfo by
318 // LowerFormalArguments. This will, of course, be zero for the C calling
319 // convention.
320 ArgumentPopSize = AFI->getArgumentStackToRestore();
321 }
322
323 return ArgumentPopSize;
324}
325
327static bool needsWinCFI(const MachineFunction &MF);
330
331/// Returns true if homogeneous prolog or epilog code can be emitted
332/// for the size optimization. If possible, a frame helper call is injected.
333/// When an Exit block is given, this check is for the epilog.
334bool AArch64FrameLowering::homogeneousPrologEpilog(
335 MachineFunction &MF, MachineBasicBlock *Exit) const {
336 if (!MF.getFunction().hasMinSize())
337 return false;
337    return false;
338  if (!EnableHomogeneousPrologEpilog)
339    return false;
340 if (EnableRedZone)
341 return false;
342
343  // TODO: Windows is not supported yet.
344 if (needsWinCFI(MF))
345 return false;
346 // TODO: SVE is not supported yet.
347 if (getSVEStackSize(MF))
348 return false;
349
350 // Bail on stack adjustment needed on return for simplicity.
351 const MachineFrameInfo &MFI = MF.getFrameInfo();
353 if (MFI.hasVarSizedObjects() || RegInfo->hasStackRealignment(MF))
354 return false;
355 if (Exit && getArgumentStackToRestore(MF, *Exit))
356 return false;
357
358 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
359 if (AFI->hasSwiftAsyncContext() || AFI->hasStreamingModeChanges())
360 return false;
361
362  // If there is an odd number of GPRs before LR and FP in the CSRs list,
363 // they will not be paired into one RegPairInfo, which is incompatible with
364 // the assumption made by the homogeneous prolog epilog pass.
365 const MCPhysReg *CSRegs = MF.getRegInfo().getCalleeSavedRegs();
366 unsigned NumGPRs = 0;
367 for (unsigned I = 0; CSRegs[I]; ++I) {
368 Register Reg = CSRegs[I];
369 if (Reg == AArch64::LR) {
370 assert(CSRegs[I + 1] == AArch64::FP);
371 if (NumGPRs % 2 != 0)
372 return false;
373 break;
374 }
375 if (AArch64::GPR64RegClass.contains(Reg))
376 ++NumGPRs;
377 }
378
379 return true;
380}
381
382/// Returns true if CSRs should be paired.
383bool AArch64FrameLowering::producePairRegisters(MachineFunction &MF) const {
384 return produceCompactUnwindFrame(MF) || homogeneousPrologEpilog(MF);
385}
386
387/// This is the biggest offset to the stack pointer we can encode in aarch64
388/// instructions (without using a separate calculation and a temp register).
389/// Note that the exceptions here are vector stores/loads, which cannot encode any
390/// displacements (see estimateRSStackSizeLimit(), isAArch64FrameOffsetLegal()).
391static const unsigned DefaultSafeSPDisplacement = 255;
392
393/// Look at each instruction that references stack frames and return the stack
394/// size limit beyond which some of these instructions will require a scratch
395/// register during their expansion later.
397  // FIXME: For now, just conservatively guesstimate based on unscaled indexing
398 // range. We'll end up allocating an unnecessary spill slot a lot, but
399 // realistically that's not a big deal at this stage of the game.
400 for (MachineBasicBlock &MBB : MF) {
401 for (MachineInstr &MI : MBB) {
402 if (MI.isDebugInstr() || MI.isPseudo() ||
403 MI.getOpcode() == AArch64::ADDXri ||
404 MI.getOpcode() == AArch64::ADDSXri)
405 continue;
406
407 for (const MachineOperand &MO : MI.operands()) {
408 if (!MO.isFI())
409 continue;
410
412 if (isAArch64FrameOffsetLegal(MI, Offset, nullptr, nullptr, nullptr) ==
414 return 0;
415 }
416 }
417 }
419}
420
424}
425
426/// Returns the size of the fixed object area (allocated next to sp on entry)
427/// On Win64 this may include a var args area and an UnwindHelp object for EH.
428static unsigned getFixedObjectSize(const MachineFunction &MF,
429 const AArch64FunctionInfo *AFI, bool IsWin64,
430 bool IsFunclet) {
431 if (!IsWin64 || IsFunclet) {
432 return AFI->getTailCallReservedStack();
433 } else {
434 if (AFI->getTailCallReservedStack() != 0 &&
436 Attribute::SwiftAsync))
437 report_fatal_error("cannot generate ABI-changing tail call for Win64");
438 // Var args are stored here in the primary function.
439 const unsigned VarArgsArea = AFI->getVarArgsGPRSize();
440 // To support EH funclets we allocate an UnwindHelp object
441 const unsigned UnwindHelpObject = (MF.hasEHFunclets() ? 8 : 0);
442 return AFI->getTailCallReservedStack() +
443 alignTo(VarArgsArea + UnwindHelpObject, 16);
444 }
445}
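// Illustrative example (made-up numbers): a Win64 vararg function with 56
// bytes of GPR varargs spilled to the stack, EH funclets present, and no
// tail-call reserved stack gets alignTo(56 + 8, 16) = 64 bytes of fixed
// objects, while a non-Win64 function reserves only its tail-call stack.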
446
447/// Returns the size of the entire SVE stackframe (calleesaves + spills).
450 return StackOffset::getScalable((int64_t)AFI->getStackSizeSVE());
451}
452
454 if (!EnableRedZone)
455 return false;
456
457 // Don't use the red zone if the function explicitly asks us not to.
458 // This is typically used for kernel code.
459 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
460 const unsigned RedZoneSize =
462 if (!RedZoneSize)
463 return false;
464
465 const MachineFrameInfo &MFI = MF.getFrameInfo();
467 uint64_t NumBytes = AFI->getLocalStackSize();
468
469  // If neither NEON nor SVE is available, a COPY from one Q-reg to
470 // another requires a spill -> reload sequence. We can do that
471 // using a pre-decrementing store/post-decrementing load, but
472 // if we do so, we can't use the Red Zone.
473 bool LowerQRegCopyThroughMem = Subtarget.hasFPARMv8() &&
474 !Subtarget.isNeonAvailable() &&
475 !Subtarget.hasSVE();
476
477 return !(MFI.hasCalls() || hasFP(MF) || NumBytes > RedZoneSize ||
478 getSVEStackSize(MF) || LowerQRegCopyThroughMem);
479}
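// Illustrative example (assuming a 128-byte red zone on the target): a leaf
// function with 96 bytes of locals, no calls, no frame pointer and no SVE
// stack can leave SP untouched and address those locals at negative offsets
// from SP inside the red zone.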
480
481/// hasFPImpl - Return true if the specified function should have a dedicated
482/// frame pointer register.
484 const MachineFrameInfo &MFI = MF.getFrameInfo();
485 const TargetRegisterInfo *RegInfo = MF.getSubtarget().getRegisterInfo();
486
487 // Win64 EH requires a frame pointer if funclets are present, as the locals
488 // are accessed off the frame pointer in both the parent function and the
489 // funclets.
490 if (MF.hasEHFunclets())
491 return true;
492 // Retain behavior of always omitting the FP for leaf functions when possible.
494 return true;
495 if (MFI.hasVarSizedObjects() || MFI.isFrameAddressTaken() ||
496 MFI.hasStackMap() || MFI.hasPatchPoint() ||
497 RegInfo->hasStackRealignment(MF))
498 return true;
499 // With large callframes around we may need to use FP to access the scavenging
500 // emergency spillslot.
501 //
502 // Unfortunately some calls to hasFP() like machine verifier ->
503 // getReservedReg() -> hasFP in the middle of global isel are too early
504 // to know the max call frame size. Hopefully conservatively returning "true"
505 // in those cases is fine.
506 // DefaultSafeSPDisplacement is fine as we only emergency spill GP regs.
507 if (!MFI.isMaxCallFrameSizeComputed() ||
509 return true;
510
511 return false;
512}
513
514/// hasReservedCallFrame - Under normal circumstances, when a frame pointer is
515/// not required, we reserve argument space for call sites in the function
516/// immediately on entry to the current function. This eliminates the need for
517/// add/sub sp brackets around call sites. Returns true if the call frame is
518/// included as part of the stack frame.
520 const MachineFunction &MF) const {
521 // The stack probing code for the dynamically allocated outgoing arguments
522 // area assumes that the stack is probed at the top - either by the prologue
523  // code, which issues a probe if `hasVarSizedObjects` returns true, or by the
524 // most recent variable-sized object allocation. Changing the condition here
525 // may need to be followed up by changes to the probe issuing logic.
526 return !MF.getFrameInfo().hasVarSizedObjects();
527}
528
532 const AArch64InstrInfo *TII =
533 static_cast<const AArch64InstrInfo *>(MF.getSubtarget().getInstrInfo());
534 const AArch64TargetLowering *TLI =
535 MF.getSubtarget<AArch64Subtarget>().getTargetLowering();
536 [[maybe_unused]] MachineFrameInfo &MFI = MF.getFrameInfo();
537 DebugLoc DL = I->getDebugLoc();
538 unsigned Opc = I->getOpcode();
539 bool IsDestroy = Opc == TII->getCallFrameDestroyOpcode();
540 uint64_t CalleePopAmount = IsDestroy ? I->getOperand(1).getImm() : 0;
541
542 if (!hasReservedCallFrame(MF)) {
543 int64_t Amount = I->getOperand(0).getImm();
544 Amount = alignTo(Amount, getStackAlign());
545 if (!IsDestroy)
546 Amount = -Amount;
547
548 // N.b. if CalleePopAmount is valid but zero (i.e. callee would pop, but it
549 // doesn't have to pop anything), then the first operand will be zero too so
550 // this adjustment is a no-op.
551 if (CalleePopAmount == 0) {
552 // FIXME: in-function stack adjustment for calls is limited to 24-bits
553 // because there's no guaranteed temporary register available.
554 //
555 // ADD/SUB (immediate) has only LSL #0 and LSL #12 available.
556 // 1) For offset <= 12-bit, we use LSL #0
557 // 2) For 12-bit <= offset <= 24-bit, we use two instructions. One uses
558 // LSL #0, and the other uses LSL #12.
559 //
560 // Most call frames will be allocated at the start of a function so
561 // this is OK, but it is a limitation that needs dealing with.
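      // For instance (illustrative), a 0x12345-byte decrement could be
      // emitted as:
      //   sub sp, sp, #0x12, lsl #12
      //   sub sp, sp, #0x345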
562 assert(Amount > -0xffffff && Amount < 0xffffff && "call frame too large");
563
564 if (TLI->hasInlineStackProbe(MF) &&
566 // When stack probing is enabled, the decrement of SP may need to be
567 // probed. We only need to do this if the call site needs 1024 bytes of
568 // space or more, because a region smaller than that is allowed to be
569 // unprobed at an ABI boundary. We rely on the fact that SP has been
570 // probed exactly at this point, either by the prologue or most recent
571 // dynamic allocation.
573 "non-reserved call frame without var sized objects?");
574 Register ScratchReg =
575 MF.getRegInfo().createVirtualRegister(&AArch64::GPR64RegClass);
576 inlineStackProbeFixed(I, ScratchReg, -Amount, StackOffset::get(0, 0));
577 } else {
578 emitFrameOffset(MBB, I, DL, AArch64::SP, AArch64::SP,
579 StackOffset::getFixed(Amount), TII);
580 }
581 }
582 } else if (CalleePopAmount != 0) {
583 // If the calling convention demands that the callee pops arguments from the
584 // stack, we want to add it back if we have a reserved call frame.
585 assert(CalleePopAmount < 0xffffff && "call frame too large");
586 emitFrameOffset(MBB, I, DL, AArch64::SP, AArch64::SP,
587 StackOffset::getFixed(-(int64_t)CalleePopAmount), TII);
588 }
589 return MBB.erase(I);
590}
591
592void AArch64FrameLowering::emitCalleeSavedGPRLocations(
595 MachineFrameInfo &MFI = MF.getFrameInfo();
597 SMEAttrs Attrs(MF.getFunction());
598 bool LocallyStreaming =
599 Attrs.hasStreamingBody() && !Attrs.hasStreamingInterface();
600
601 const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();
602 if (CSI.empty())
603 return;
604
605 const TargetSubtargetInfo &STI = MF.getSubtarget();
606 const TargetRegisterInfo &TRI = *STI.getRegisterInfo();
607 const TargetInstrInfo &TII = *STI.getInstrInfo();
609
610 for (const auto &Info : CSI) {
611 unsigned FrameIdx = Info.getFrameIdx();
612 if (MFI.getStackID(FrameIdx) == TargetStackID::ScalableVector)
613 continue;
614
615 assert(!Info.isSpilledToReg() && "Spilling to registers not implemented");
616 int64_t DwarfReg = TRI.getDwarfRegNum(Info.getReg(), true);
617 int64_t Offset = MFI.getObjectOffset(FrameIdx) - getOffsetOfLocalArea();
618
619 // The location of VG will be emitted before each streaming-mode change in
620 // the function. Only locally-streaming functions require emitting the
621 // non-streaming VG location here.
622 if ((LocallyStreaming && FrameIdx == AFI->getStreamingVGIdx()) ||
623 (!LocallyStreaming &&
624 DwarfReg == TRI.getDwarfRegNum(AArch64::VG, true)))
625 continue;
626
627 unsigned CFIIndex = MF.addFrameInst(
628 MCCFIInstruction::createOffset(nullptr, DwarfReg, Offset));
629 BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
630 .addCFIIndex(CFIIndex)
632 }
633}
634
635void AArch64FrameLowering::emitCalleeSavedSVELocations(
638 MachineFrameInfo &MFI = MF.getFrameInfo();
639
640 // Add callee saved registers to move list.
641 const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();
642 if (CSI.empty())
643 return;
644
645 const TargetSubtargetInfo &STI = MF.getSubtarget();
646 const TargetRegisterInfo &TRI = *STI.getRegisterInfo();
647 const TargetInstrInfo &TII = *STI.getInstrInfo();
650
651 for (const auto &Info : CSI) {
652 if (!(MFI.getStackID(Info.getFrameIdx()) == TargetStackID::ScalableVector))
653 continue;
654
655 // Not all unwinders may know about SVE registers, so assume the lowest
656    // common denominator.
657 assert(!Info.isSpilledToReg() && "Spilling to registers not implemented");
658 unsigned Reg = Info.getReg();
659 if (!static_cast<const AArch64RegisterInfo &>(TRI).regNeedsCFI(Reg, Reg))
660 continue;
661
663 StackOffset::getScalable(MFI.getObjectOffset(Info.getFrameIdx())) -
665
666 unsigned CFIIndex = MF.addFrameInst(createCFAOffset(TRI, Reg, Offset));
667 BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
668 .addCFIIndex(CFIIndex)
670 }
671}
672
676 unsigned DwarfReg) {
677 unsigned CFIIndex =
678 MF.addFrameInst(MCCFIInstruction::createSameValue(nullptr, DwarfReg));
679 BuildMI(MBB, InsertPt, DebugLoc(), Desc).addCFIIndex(CFIIndex);
680}
681
683 MachineBasicBlock &MBB) const {
684
686 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
687 const TargetInstrInfo &TII = *Subtarget.getInstrInfo();
688 const auto &TRI =
689 static_cast<const AArch64RegisterInfo &>(*Subtarget.getRegisterInfo());
690 const auto &MFI = *MF.getInfo<AArch64FunctionInfo>();
691
692 const MCInstrDesc &CFIDesc = TII.get(TargetOpcode::CFI_INSTRUCTION);
693 DebugLoc DL;
694
695 // Reset the CFA to `SP + 0`.
697 unsigned CFIIndex = MF.addFrameInst(MCCFIInstruction::cfiDefCfa(
698 nullptr, TRI.getDwarfRegNum(AArch64::SP, true), 0));
699 BuildMI(MBB, InsertPt, DL, CFIDesc).addCFIIndex(CFIIndex);
700
701 // Flip the RA sign state.
702 if (MFI.shouldSignReturnAddress(MF)) {
703 auto CFIInst = MFI.branchProtectionPAuthLR()
706 CFIIndex = MF.addFrameInst(CFIInst);
707 BuildMI(MBB, InsertPt, DL, CFIDesc).addCFIIndex(CFIIndex);
708 }
709
710 // Shadow call stack uses X18, reset it.
711 if (MFI.needsShadowCallStackPrologueEpilogue(MF))
712 insertCFISameValue(CFIDesc, MF, MBB, InsertPt,
713 TRI.getDwarfRegNum(AArch64::X18, true));
714
715 // Emit .cfi_same_value for callee-saved registers.
716 const std::vector<CalleeSavedInfo> &CSI =
718 for (const auto &Info : CSI) {
719 unsigned Reg = Info.getReg();
720 if (!TRI.regNeedsCFI(Reg, Reg))
721 continue;
722 insertCFISameValue(CFIDesc, MF, MBB, InsertPt,
723 TRI.getDwarfRegNum(Reg, true));
724 }
725}
726
729 bool SVE) {
731 MachineFrameInfo &MFI = MF.getFrameInfo();
732
733 const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();
734 if (CSI.empty())
735 return;
736
737 const TargetSubtargetInfo &STI = MF.getSubtarget();
738 const TargetRegisterInfo &TRI = *STI.getRegisterInfo();
739 const TargetInstrInfo &TII = *STI.getInstrInfo();
741
742 for (const auto &Info : CSI) {
743 if (SVE !=
744 (MFI.getStackID(Info.getFrameIdx()) == TargetStackID::ScalableVector))
745 continue;
746
747 unsigned Reg = Info.getReg();
748 if (SVE &&
749 !static_cast<const AArch64RegisterInfo &>(TRI).regNeedsCFI(Reg, Reg))
750 continue;
751
752 if (!Info.isRestored())
753 continue;
754
755 unsigned CFIIndex = MF.addFrameInst(MCCFIInstruction::createRestore(
756 nullptr, TRI.getDwarfRegNum(Info.getReg(), true)));
757 BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
758 .addCFIIndex(CFIIndex)
760 }
761}
762
763void AArch64FrameLowering::emitCalleeSavedGPRRestores(
766}
767
768void AArch64FrameLowering::emitCalleeSavedSVERestores(
771}
772
773// Return the maximum possible number of bytes for `Size` due to the
774// architectural limit on the size of an SVE register.
775static int64_t upperBound(StackOffset Size) {
776 static const int64_t MAX_BYTES_PER_SCALABLE_BYTE = 16;
777 return Size.getScalable() * MAX_BYTES_PER_SCALABLE_BYTE + Size.getFixed();
778}
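// Illustrative: upperBound(StackOffset::get(/*Fixed=*/16, /*Scalable=*/32))
// returns 32 * 16 + 16 = 528 bytes, the worst case when the vector length is
// at its architectural maximum (vscale == 16).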
779
780void AArch64FrameLowering::allocateStackSpace(
782 int64_t RealignmentPadding, StackOffset AllocSize, bool NeedsWinCFI,
783 bool *HasWinCFI, bool EmitCFI, StackOffset InitialOffset,
784 bool FollowupAllocs) const {
785
786 if (!AllocSize)
787 return;
788
789 DebugLoc DL;
791 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
792 const TargetInstrInfo &TII = *Subtarget.getInstrInfo();
794 const MachineFrameInfo &MFI = MF.getFrameInfo();
795
796 const int64_t MaxAlign = MFI.getMaxAlign().value();
797 const uint64_t AndMask = ~(MaxAlign - 1);
798
799 if (!Subtarget.getTargetLowering()->hasInlineStackProbe(MF)) {
800 Register TargetReg = RealignmentPadding
802 : AArch64::SP;
803 // SUB Xd/SP, SP, AllocSize
804 emitFrameOffset(MBB, MBBI, DL, TargetReg, AArch64::SP, -AllocSize, &TII,
805 MachineInstr::FrameSetup, false, NeedsWinCFI, HasWinCFI,
806 EmitCFI, InitialOffset);
807
808 if (RealignmentPadding) {
809 // AND SP, X9, 0b11111...0000
810 BuildMI(MBB, MBBI, DL, TII.get(AArch64::ANDXri), AArch64::SP)
811 .addReg(TargetReg, RegState::Kill)
814 AFI.setStackRealigned(true);
815
816 // No need for SEH instructions here; if we're realigning the stack,
817 // we've set a frame pointer and already finished the SEH prologue.
818 assert(!NeedsWinCFI);
819 }
820 return;
821 }
822
823 //
824 // Stack probing allocation.
825 //
826
827 // Fixed length allocation. If we don't need to re-align the stack and don't
828 // have SVE objects, we can use a more efficient sequence for stack probing.
829 if (AllocSize.getScalable() == 0 && RealignmentPadding == 0) {
831 assert(ScratchReg != AArch64::NoRegister);
832 BuildMI(MBB, MBBI, DL, TII.get(AArch64::PROBED_STACKALLOC))
833 .addDef(ScratchReg)
834 .addImm(AllocSize.getFixed())
835 .addImm(InitialOffset.getFixed())
836 .addImm(InitialOffset.getScalable());
837 // The fixed allocation may leave unprobed bytes at the top of the
838    // stack. If we have a subsequent allocation (e.g. if we have variable-sized
839 // objects), we need to issue an extra probe, so these allocations start in
840 // a known state.
841 if (FollowupAllocs) {
842 // STR XZR, [SP]
843 BuildMI(MBB, MBBI, DL, TII.get(AArch64::STRXui))
844 .addReg(AArch64::XZR)
845 .addReg(AArch64::SP)
846 .addImm(0)
848 }
849
850 return;
851 }
852
853 // Variable length allocation.
854
855 // If the (unknown) allocation size cannot exceed the probe size, decrement
856 // the stack pointer right away.
857 int64_t ProbeSize = AFI.getStackProbeSize();
858 if (upperBound(AllocSize) + RealignmentPadding <= ProbeSize) {
859 Register ScratchReg = RealignmentPadding
861 : AArch64::SP;
862 assert(ScratchReg != AArch64::NoRegister);
863 // SUB Xd, SP, AllocSize
864 emitFrameOffset(MBB, MBBI, DL, ScratchReg, AArch64::SP, -AllocSize, &TII,
865 MachineInstr::FrameSetup, false, NeedsWinCFI, HasWinCFI,
866 EmitCFI, InitialOffset);
867 if (RealignmentPadding) {
868 // AND SP, Xn, 0b11111...0000
869 BuildMI(MBB, MBBI, DL, TII.get(AArch64::ANDXri), AArch64::SP)
870 .addReg(ScratchReg, RegState::Kill)
873 AFI.setStackRealigned(true);
874 }
875 if (FollowupAllocs || upperBound(AllocSize) + RealignmentPadding >
877 // STR XZR, [SP]
878 BuildMI(MBB, MBBI, DL, TII.get(AArch64::STRXui))
879 .addReg(AArch64::XZR)
880 .addReg(AArch64::SP)
881 .addImm(0)
883 }
884 return;
885 }
886
887 // Emit a variable-length allocation probing loop.
888 // TODO: As an optimisation, the loop can be "unrolled" into a few parts,
889 // each of them guaranteed to adjust the stack by less than the probe size.
891 assert(TargetReg != AArch64::NoRegister);
892 // SUB Xd, SP, AllocSize
893 emitFrameOffset(MBB, MBBI, DL, TargetReg, AArch64::SP, -AllocSize, &TII,
894 MachineInstr::FrameSetup, false, NeedsWinCFI, HasWinCFI,
895 EmitCFI, InitialOffset);
896 if (RealignmentPadding) {
897 // AND Xn, Xn, 0b11111...0000
898 BuildMI(MBB, MBBI, DL, TII.get(AArch64::ANDXri), TargetReg)
899 .addReg(TargetReg, RegState::Kill)
902 }
903
904 BuildMI(MBB, MBBI, DL, TII.get(AArch64::PROBED_STACKALLOC_VAR))
905 .addReg(TargetReg);
906 if (EmitCFI) {
907 // Set the CFA register back to SP.
908 unsigned Reg =
909 Subtarget.getRegisterInfo()->getDwarfRegNum(AArch64::SP, true);
910 unsigned CFIIndex =
912 BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
913 .addCFIIndex(CFIIndex)
915 }
916 if (RealignmentPadding)
917 AFI.setStackRealigned(true);
918}
919
920static MCRegister getRegisterOrZero(MCRegister Reg, bool HasSVE) {
921 switch (Reg.id()) {
922 default:
923    // The called routine is expected to preserve x19-x28;
924    // x29 and x30 are used as the frame pointer and link register, respectively.
925 return 0;
926
927 // GPRs
928#define CASE(n) \
929 case AArch64::W##n: \
930 case AArch64::X##n: \
931 return AArch64::X##n
932 CASE(0);
933 CASE(1);
934 CASE(2);
935 CASE(3);
936 CASE(4);
937 CASE(5);
938 CASE(6);
939 CASE(7);
940 CASE(8);
941 CASE(9);
942 CASE(10);
943 CASE(11);
944 CASE(12);
945 CASE(13);
946 CASE(14);
947 CASE(15);
948 CASE(16);
949 CASE(17);
950 CASE(18);
951#undef CASE
952
953 // FPRs
954#define CASE(n) \
955 case AArch64::B##n: \
956 case AArch64::H##n: \
957 case AArch64::S##n: \
958 case AArch64::D##n: \
959 case AArch64::Q##n: \
960 return HasSVE ? AArch64::Z##n : AArch64::Q##n
961 CASE(0);
962 CASE(1);
963 CASE(2);
964 CASE(3);
965 CASE(4);
966 CASE(5);
967 CASE(6);
968 CASE(7);
969 CASE(8);
970 CASE(9);
971 CASE(10);
972 CASE(11);
973 CASE(12);
974 CASE(13);
975 CASE(14);
976 CASE(15);
977 CASE(16);
978 CASE(17);
979 CASE(18);
980 CASE(19);
981 CASE(20);
982 CASE(21);
983 CASE(22);
984 CASE(23);
985 CASE(24);
986 CASE(25);
987 CASE(26);
988 CASE(27);
989 CASE(28);
990 CASE(29);
991 CASE(30);
992 CASE(31);
993#undef CASE
994 }
995}
996
997void AArch64FrameLowering::emitZeroCallUsedRegs(BitVector RegsToZero,
998 MachineBasicBlock &MBB) const {
999 // Insertion point.
1001
1002 // Fake a debug loc.
1003 DebugLoc DL;
1004 if (MBBI != MBB.end())
1005 DL = MBBI->getDebugLoc();
1006
1007 const MachineFunction &MF = *MBB.getParent();
1009 const AArch64RegisterInfo &TRI = *STI.getRegisterInfo();
1010
1011 BitVector GPRsToZero(TRI.getNumRegs());
1012 BitVector FPRsToZero(TRI.getNumRegs());
1013 bool HasSVE = STI.isSVEorStreamingSVEAvailable();
1014 for (MCRegister Reg : RegsToZero.set_bits()) {
1015 if (TRI.isGeneralPurposeRegister(MF, Reg)) {
1016 // For GPRs, we only care to clear out the 64-bit register.
1017 if (MCRegister XReg = getRegisterOrZero(Reg, HasSVE))
1018 GPRsToZero.set(XReg);
1019 } else if (AArch64InstrInfo::isFpOrNEON(Reg)) {
1020 // For FPRs,
1021 if (MCRegister XReg = getRegisterOrZero(Reg, HasSVE))
1022 FPRsToZero.set(XReg);
1023 }
1024 }
1025
1026 const AArch64InstrInfo &TII = *STI.getInstrInfo();
1027
1028 // Zero out GPRs.
1029 for (MCRegister Reg : GPRsToZero.set_bits())
1030 TII.buildClearRegister(Reg, MBB, MBBI, DL);
1031
1032 // Zero out FP/vector registers.
1033 for (MCRegister Reg : FPRsToZero.set_bits())
1034 TII.buildClearRegister(Reg, MBB, MBBI, DL);
1035
1036 if (HasSVE) {
1037 for (MCRegister PReg :
1038 {AArch64::P0, AArch64::P1, AArch64::P2, AArch64::P3, AArch64::P4,
1039 AArch64::P5, AArch64::P6, AArch64::P7, AArch64::P8, AArch64::P9,
1040 AArch64::P10, AArch64::P11, AArch64::P12, AArch64::P13, AArch64::P14,
1041 AArch64::P15}) {
1042 if (RegsToZero[PReg])
1043 BuildMI(MBB, MBBI, DL, TII.get(AArch64::PFALSE), PReg);
1044 }
1045 }
1046}
1047
1049 const MachineBasicBlock &MBB) {
1050 const MachineFunction *MF = MBB.getParent();
1051 LiveRegs.addLiveIns(MBB);
1052 // Mark callee saved registers as used so we will not choose them.
1053 const MCPhysReg *CSRegs = MF->getRegInfo().getCalleeSavedRegs();
1054 for (unsigned i = 0; CSRegs[i]; ++i)
1055 LiveRegs.addReg(CSRegs[i]);
1056}
1057
1058// Find a scratch register that we can use at the start of the prologue to
1059// re-align the stack pointer. We avoid using callee-save registers since they
1060// may appear to be free when this is called from canUseAsPrologue (during
1061// shrink wrapping), but then no longer be free when this is called from
1062// emitPrologue.
1063//
1064// FIXME: This is a bit conservative, since in the above case we could use one
1065// of the callee-save registers as a scratch temp to re-align the stack pointer,
1066// but we would then have to make sure that we were in fact saving at least one
1067// callee-save register in the prologue, which is additional complexity that
1068// doesn't seem worth the benefit.
1070 MachineFunction *MF = MBB->getParent();
1071
1072  // If MBB is an entry block, use X9 as the scratch register.
1073  // preserve_none functions may be using X9 to pass arguments, so for
1074  // those, prefer to pick an available register below.
1075 if (&MF->front() == MBB &&
1077 return AArch64::X9;
1078
1079 const AArch64Subtarget &Subtarget = MF->getSubtarget<AArch64Subtarget>();
1080 const AArch64RegisterInfo &TRI = *Subtarget.getRegisterInfo();
1081 LivePhysRegs LiveRegs(TRI);
1082 getLiveRegsForEntryMBB(LiveRegs, *MBB);
1083
1084 // Prefer X9 since it was historically used for the prologue scratch reg.
1085 const MachineRegisterInfo &MRI = MF->getRegInfo();
1086 if (LiveRegs.available(MRI, AArch64::X9))
1087 return AArch64::X9;
1088
1089 for (unsigned Reg : AArch64::GPR64RegClass) {
1090 if (LiveRegs.available(MRI, Reg))
1091 return Reg;
1092 }
1093 return AArch64::NoRegister;
1094}
1095
1097 const MachineBasicBlock &MBB) const {
1098 const MachineFunction *MF = MBB.getParent();
1099 MachineBasicBlock *TmpMBB = const_cast<MachineBasicBlock *>(&MBB);
1100 const AArch64Subtarget &Subtarget = MF->getSubtarget<AArch64Subtarget>();
1101 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
1102 const AArch64TargetLowering *TLI = Subtarget.getTargetLowering();
1104
1105 if (AFI->hasSwiftAsyncContext()) {
1106 const AArch64RegisterInfo &TRI = *Subtarget.getRegisterInfo();
1107 const MachineRegisterInfo &MRI = MF->getRegInfo();
1108 LivePhysRegs LiveRegs(TRI);
1109 getLiveRegsForEntryMBB(LiveRegs, MBB);
1110 // The StoreSwiftAsyncContext clobbers X16 and X17. Make sure they are
1111 // available.
1112 if (!LiveRegs.available(MRI, AArch64::X16) ||
1113 !LiveRegs.available(MRI, AArch64::X17))
1114 return false;
1115 }
1116
1117  // Certain stack probing sequences might clobber flags; in that case we can't
1118  // use the block as a prologue if the flags register is a live-in.
1120 MBB.isLiveIn(AArch64::NZCV))
1121 return false;
1122
1123 // Don't need a scratch register if we're not going to re-align the stack or
1124 // emit stack probes.
1125 if (!RegInfo->hasStackRealignment(*MF) && !TLI->hasInlineStackProbe(*MF))
1126 return true;
1127 // Otherwise, we can use any block as long as it has a scratch register
1128 // available.
1129 return findScratchNonCalleeSaveRegister(TmpMBB) != AArch64::NoRegister;
1130}
1131
1133 uint64_t StackSizeInBytes) {
1134 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1136 // TODO: When implementing stack protectors, take that into account
1137 // for the probe threshold.
1138 return Subtarget.isTargetWindows() && MFI.hasStackProbing() &&
1139 StackSizeInBytes >= uint64_t(MFI.getStackProbeSize());
1140}
1141
1142static bool needsWinCFI(const MachineFunction &MF) {
1143 const Function &F = MF.getFunction();
1144 return MF.getTarget().getMCAsmInfo()->usesWindowsCFI() &&
1145 F.needsUnwindTableEntry();
1146}
1147
1148bool AArch64FrameLowering::shouldCombineCSRLocalStackBump(
1149 MachineFunction &MF, uint64_t StackBumpBytes) const {
1151 const MachineFrameInfo &MFI = MF.getFrameInfo();
1152 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1153 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
1154 if (homogeneousPrologEpilog(MF))
1155 return false;
1156
1157 if (AFI->getLocalStackSize() == 0)
1158 return false;
1159
1160 // For WinCFI, if optimizing for size, prefer to not combine the stack bump
1161 // (to force a stp with predecrement) to match the packed unwind format,
1162 // provided that there actually are any callee saved registers to merge the
1163 // decrement with.
1164 // This is potentially marginally slower, but allows using the packed
1165 // unwind format for functions that both have a local area and callee saved
1166 // registers. Using the packed unwind format notably reduces the size of
1167 // the unwind info.
1168 if (needsWinCFI(MF) && AFI->getCalleeSavedStackSize() > 0 &&
1169 MF.getFunction().hasOptSize())
1170 return false;
1171
1172 // 512 is the maximum immediate for stp/ldp that will be used for
1173 // callee-save save/restores
1174 if (StackBumpBytes >= 512 || windowsRequiresStackProbe(MF, StackBumpBytes))
1175 return false;
1176
1177 if (MFI.hasVarSizedObjects())
1178 return false;
1179
1180 if (RegInfo->hasStackRealignment(MF))
1181 return false;
1182
1183 // This isn't strictly necessary, but it simplifies things a bit since the
1184 // current RedZone handling code assumes the SP is adjusted by the
1185 // callee-save save/restore code.
1186 if (canUseRedZone(MF))
1187 return false;
1188
1189 // When there is an SVE area on the stack, always allocate the
1190 // callee-saves and spills/locals separately.
1191 if (getSVEStackSize(MF))
1192 return false;
1193
1194 return true;
1195}
1196
1197bool AArch64FrameLowering::shouldCombineCSRLocalStackBumpInEpilogue(
1198 MachineBasicBlock &MBB, uint64_t StackBumpBytes) const {
1199 if (!shouldCombineCSRLocalStackBump(*MBB.getParent(), StackBumpBytes))
1200 return false;
1201 if (MBB.empty())
1202 return true;
1203
1204 // Disable combined SP bump if the last instruction is an MTE tag store. It
1205 // is almost always better to merge SP adjustment into those instructions.
1208 while (LastI != Begin) {
1209 --LastI;
1210 if (LastI->isTransient())
1211 continue;
1212 if (!LastI->getFlag(MachineInstr::FrameDestroy))
1213 break;
1214 }
1215 switch (LastI->getOpcode()) {
1216 case AArch64::STGloop:
1217 case AArch64::STZGloop:
1218 case AArch64::STGi:
1219 case AArch64::STZGi:
1220 case AArch64::ST2Gi:
1221 case AArch64::STZ2Gi:
1222 return false;
1223 default:
1224 return true;
1225 }
1226 llvm_unreachable("unreachable");
1227}
1228
1229// Given a load or a store instruction, generate an appropriate unwinding SEH
1230// code on Windows.
1232 const TargetInstrInfo &TII,
1233 MachineInstr::MIFlag Flag) {
1234 unsigned Opc = MBBI->getOpcode();
1236 MachineFunction &MF = *MBB->getParent();
1237 DebugLoc DL = MBBI->getDebugLoc();
1238 unsigned ImmIdx = MBBI->getNumOperands() - 1;
1239 int Imm = MBBI->getOperand(ImmIdx).getImm();
1241 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1242 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
1243
1244 switch (Opc) {
1245 default:
1246 llvm_unreachable("No SEH Opcode for this instruction");
1247 case AArch64::LDPDpost:
1248 Imm = -Imm;
1249 [[fallthrough]];
1250 case AArch64::STPDpre: {
1251 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1252 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(2).getReg());
1253 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFRegP_X))
1254 .addImm(Reg0)
1255 .addImm(Reg1)
1256 .addImm(Imm * 8)
1257 .setMIFlag(Flag);
1258 break;
1259 }
1260 case AArch64::LDPXpost:
1261 Imm = -Imm;
1262 [[fallthrough]];
1263 case AArch64::STPXpre: {
1264 Register Reg0 = MBBI->getOperand(1).getReg();
1265 Register Reg1 = MBBI->getOperand(2).getReg();
1266 if (Reg0 == AArch64::FP && Reg1 == AArch64::LR)
1267 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFPLR_X))
1268 .addImm(Imm * 8)
1269 .setMIFlag(Flag);
1270 else
1271 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveRegP_X))
1272 .addImm(RegInfo->getSEHRegNum(Reg0))
1273 .addImm(RegInfo->getSEHRegNum(Reg1))
1274 .addImm(Imm * 8)
1275 .setMIFlag(Flag);
1276 break;
1277 }
1278 case AArch64::LDRDpost:
1279 Imm = -Imm;
1280 [[fallthrough]];
1281 case AArch64::STRDpre: {
1282 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1283 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFReg_X))
1284 .addImm(Reg)
1285 .addImm(Imm)
1286 .setMIFlag(Flag);
1287 break;
1288 }
1289 case AArch64::LDRXpost:
1290 Imm = -Imm;
1291 [[fallthrough]];
1292 case AArch64::STRXpre: {
1293 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1294 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveReg_X))
1295 .addImm(Reg)
1296 .addImm(Imm)
1297 .setMIFlag(Flag);
1298 break;
1299 }
1300 case AArch64::STPDi:
1301 case AArch64::LDPDi: {
1302 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1303 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1304 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFRegP))
1305 .addImm(Reg0)
1306 .addImm(Reg1)
1307 .addImm(Imm * 8)
1308 .setMIFlag(Flag);
1309 break;
1310 }
1311 case AArch64::STPXi:
1312 case AArch64::LDPXi: {
1313 Register Reg0 = MBBI->getOperand(0).getReg();
1314 Register Reg1 = MBBI->getOperand(1).getReg();
1315 if (Reg0 == AArch64::FP && Reg1 == AArch64::LR)
1316 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFPLR))
1317 .addImm(Imm * 8)
1318 .setMIFlag(Flag);
1319 else
1320 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveRegP))
1321 .addImm(RegInfo->getSEHRegNum(Reg0))
1322 .addImm(RegInfo->getSEHRegNum(Reg1))
1323 .addImm(Imm * 8)
1324 .setMIFlag(Flag);
1325 break;
1326 }
1327 case AArch64::STRXui:
1328 case AArch64::LDRXui: {
1329 int Reg = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1330 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveReg))
1331 .addImm(Reg)
1332 .addImm(Imm * 8)
1333 .setMIFlag(Flag);
1334 break;
1335 }
1336 case AArch64::STRDui:
1337 case AArch64::LDRDui: {
1338 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1339 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFReg))
1340 .addImm(Reg)
1341 .addImm(Imm * 8)
1342 .setMIFlag(Flag);
1343 break;
1344 }
1345 case AArch64::STPQi:
1346 case AArch64::LDPQi: {
1347 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1348 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1349 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveAnyRegQP))
1350 .addImm(Reg0)
1351 .addImm(Reg1)
1352 .addImm(Imm * 16)
1353 .setMIFlag(Flag);
1354 break;
1355 }
1356 case AArch64::LDPQpost:
1357 Imm = -Imm;
1358 [[fallthrough]];
1359 case AArch64::STPQpre: {
1360 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1361 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(2).getReg());
1362 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveAnyRegQPX))
1363 .addImm(Reg0)
1364 .addImm(Reg1)
1365 .addImm(Imm * 16)
1366 .setMIFlag(Flag);
1367 break;
1368 }
1369 }
1370 auto I = MBB->insertAfter(MBBI, MIB);
1371 return I;
1372}
1373
1374// Fix up the SEH opcode associated with the save/restore instruction.
1376 unsigned LocalStackSize) {
1377 MachineOperand *ImmOpnd = nullptr;
1378 unsigned ImmIdx = MBBI->getNumOperands() - 1;
1379 switch (MBBI->getOpcode()) {
1380 default:
1381 llvm_unreachable("Fix the offset in the SEH instruction");
1382 case AArch64::SEH_SaveFPLR:
1383 case AArch64::SEH_SaveRegP:
1384 case AArch64::SEH_SaveReg:
1385 case AArch64::SEH_SaveFRegP:
1386 case AArch64::SEH_SaveFReg:
1387 case AArch64::SEH_SaveAnyRegQP:
1388 case AArch64::SEH_SaveAnyRegQPX:
1389 ImmOpnd = &MBBI->getOperand(ImmIdx);
1390 break;
1391 }
1392 if (ImmOpnd)
1393 ImmOpnd->setImm(ImmOpnd->getImm() + LocalStackSize);
1394}
1395
1398 return AFI->hasStreamingModeChanges() &&
1399 !MF.getSubtarget<AArch64Subtarget>().hasSVE();
1400}
1401
1404 // For Darwin platforms we don't save VG for non-SVE functions, even if SME
1405 // is enabled with streaming mode changes.
1406 if (!AFI->hasStreamingModeChanges())
1407 return false;
1408 auto &ST = MF.getSubtarget<AArch64Subtarget>();
1409 if (ST.isTargetDarwin())
1410 return ST.hasSVE();
1411 return true;
1412}
1413
1415 unsigned Opc = MBBI->getOpcode();
1416 if (Opc == AArch64::CNTD_XPiI || Opc == AArch64::RDSVLI_XI ||
1417 Opc == AArch64::UBFMXri)
1418 return true;
1419
1420 if (requiresGetVGCall(*MBBI->getMF())) {
1421 if (Opc == AArch64::ORRXrr)
1422 return true;
1423
1424 if (Opc == AArch64::BL) {
1425 auto Op1 = MBBI->getOperand(0);
1426 return Op1.isSymbol() &&
1427 (StringRef(Op1.getSymbolName()) == "__arm_get_current_vg");
1428 }
1429 }
1430
1431 return false;
1432}
1433
1434// Convert a callee-save register save/restore instruction into one that also
1435// decrements/increments the stack pointer (allocating/deallocating the
1436// callee-save stack area) by switching the store/load to its pre/post-increment version.
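// Illustrative example (not taken from actual output): with a CSStackSizeInc
// of -64, a leading 'stp x29, x30, [sp, #0]' in the prologue becomes
// 'stp x29, x30, [sp, #-64]!', folding the stack allocation into the store.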
1439 const DebugLoc &DL, const TargetInstrInfo *TII, int CSStackSizeInc,
1440 bool NeedsWinCFI, bool *HasWinCFI, bool EmitCFI,
1442 int CFAOffset = 0) {
1443 unsigned NewOpc;
1444
1445 // If the function contains streaming mode changes, we expect instructions
1446 // to calculate the value of VG before spilling. For locally-streaming
1447 // functions, we need to do this for both the streaming and non-streaming
1448 // vector length. Move past these instructions if necessary.
1449 MachineFunction &MF = *MBB.getParent();
1450 if (requiresSaveVG(MF))
1451 while (isVGInstruction(MBBI))
1452 ++MBBI;
1453
1454 switch (MBBI->getOpcode()) {
1455 default:
1456 llvm_unreachable("Unexpected callee-save save/restore opcode!");
1457 case AArch64::STPXi:
1458 NewOpc = AArch64::STPXpre;
1459 break;
1460 case AArch64::STPDi:
1461 NewOpc = AArch64::STPDpre;
1462 break;
1463 case AArch64::STPQi:
1464 NewOpc = AArch64::STPQpre;
1465 break;
1466 case AArch64::STRXui:
1467 NewOpc = AArch64::STRXpre;
1468 break;
1469 case AArch64::STRDui:
1470 NewOpc = AArch64::STRDpre;
1471 break;
1472 case AArch64::STRQui:
1473 NewOpc = AArch64::STRQpre;
1474 break;
1475 case AArch64::LDPXi:
1476 NewOpc = AArch64::LDPXpost;
1477 break;
1478 case AArch64::LDPDi:
1479 NewOpc = AArch64::LDPDpost;
1480 break;
1481 case AArch64::LDPQi:
1482 NewOpc = AArch64::LDPQpost;
1483 break;
1484 case AArch64::LDRXui:
1485 NewOpc = AArch64::LDRXpost;
1486 break;
1487 case AArch64::LDRDui:
1488 NewOpc = AArch64::LDRDpost;
1489 break;
1490 case AArch64::LDRQui:
1491 NewOpc = AArch64::LDRQpost;
1492 break;
1493 }
1494 TypeSize Scale = TypeSize::getFixed(1), Width = TypeSize::getFixed(0);
1495 int64_t MinOffset, MaxOffset;
1496 bool Success = static_cast<const AArch64InstrInfo *>(TII)->getMemOpInfo(
1497 NewOpc, Scale, Width, MinOffset, MaxOffset);
1498 (void)Success;
1499 assert(Success && "unknown load/store opcode");
1500
1501 // If the first store isn't right where we want SP then we can't fold the
1502 // update in so create a normal arithmetic instruction instead.
1503 if (MBBI->getOperand(MBBI->getNumOperands() - 1).getImm() != 0 ||
1504 CSStackSizeInc < MinOffset * (int64_t)Scale.getFixedValue() ||
1505 CSStackSizeInc > MaxOffset * (int64_t)Scale.getFixedValue()) {
1506 // If we are destroying the frame, make sure we add the increment after the
1507 // last frame operation.
1508 if (FrameFlag == MachineInstr::FrameDestroy) {
1509 ++MBBI;
1510 // Also skip the SEH instruction, if needed
1511 if (NeedsWinCFI && AArch64InstrInfo::isSEHInstruction(*MBBI))
1512 ++MBBI;
1513 }
1514 emitFrameOffset(MBB, MBBI, DL, AArch64::SP, AArch64::SP,
1515 StackOffset::getFixed(CSStackSizeInc), TII, FrameFlag,
1516 false, NeedsWinCFI, HasWinCFI, EmitCFI,
1517 StackOffset::getFixed(CFAOffset));
1518
1519 return std::prev(MBBI);
1520 }
1521
1522 // Get rid of the SEH code associated with the old instruction.
1523 if (NeedsWinCFI) {
1524 auto SEH = std::next(MBBI);
1526 SEH->eraseFromParent();
1527 }
1528
1529 MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII->get(NewOpc));
1530 MIB.addReg(AArch64::SP, RegState::Define);
1531
1532 // Copy all operands other than the immediate offset.
1533 unsigned OpndIdx = 0;
1534 for (unsigned OpndEnd = MBBI->getNumOperands() - 1; OpndIdx < OpndEnd;
1535 ++OpndIdx)
1536 MIB.add(MBBI->getOperand(OpndIdx));
1537
1538 assert(MBBI->getOperand(OpndIdx).getImm() == 0 &&
1539 "Unexpected immediate offset in first/last callee-save save/restore "
1540 "instruction!");
1541 assert(MBBI->getOperand(OpndIdx - 1).getReg() == AArch64::SP &&
1542 "Unexpected base register in callee-save save/restore instruction!");
1543 assert(CSStackSizeInc % Scale == 0);
1544 MIB.addImm(CSStackSizeInc / (int)Scale);
1545
1546 MIB.setMIFlags(MBBI->getFlags());
1547 MIB.setMemRefs(MBBI->memoperands());
1548
1549 // Generate a new SEH code that corresponds to the new instruction.
1550 if (NeedsWinCFI) {
1551 *HasWinCFI = true;
1552 InsertSEH(*MIB, *TII, FrameFlag);
1553 }
1554
1555 if (EmitCFI) {
1556 unsigned CFIIndex = MF.addFrameInst(
1557 MCCFIInstruction::cfiDefCfaOffset(nullptr, CFAOffset - CSStackSizeInc));
1558 BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
1559 .addCFIIndex(CFIIndex)
1560 .setMIFlags(FrameFlag);
1561 }
1562
1563 return std::prev(MBB.erase(MBBI));
1564}
1565
1566// Fixup callee-save register save/restore instructions to take into account
1567// combined SP bump by adding the local stack size to the stack offsets.
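// Illustrative example (not actual output): with a combined SP bump and
// LocalStackSize = 64, a callee-save store 'stp x29, x30, [sp, #16]' is
// rewritten to 'stp x29, x30, [sp, #80]' (the offset operand is scaled, so
// the immediate grows by 64 / 8 = 8 units).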
1569 uint64_t LocalStackSize,
1570 bool NeedsWinCFI,
1571 bool *HasWinCFI) {
1573 return;
1574
1575 unsigned Opc = MI.getOpcode();
1576 unsigned Scale;
1577 switch (Opc) {
1578 case AArch64::STPXi:
1579 case AArch64::STRXui:
1580 case AArch64::STPDi:
1581 case AArch64::STRDui:
1582 case AArch64::LDPXi:
1583 case AArch64::LDRXui:
1584 case AArch64::LDPDi:
1585 case AArch64::LDRDui:
1586 Scale = 8;
1587 break;
1588 case AArch64::STPQi:
1589 case AArch64::STRQui:
1590 case AArch64::LDPQi:
1591 case AArch64::LDRQui:
1592 Scale = 16;
1593 break;
1594 default:
1595 llvm_unreachable("Unexpected callee-save save/restore opcode!");
1596 }
1597
1598 unsigned OffsetIdx = MI.getNumExplicitOperands() - 1;
1599 assert(MI.getOperand(OffsetIdx - 1).getReg() == AArch64::SP &&
1600 "Unexpected base register in callee-save save/restore instruction!");
1601 // Last operand is immediate offset that needs fixing.
1602 MachineOperand &OffsetOpnd = MI.getOperand(OffsetIdx);
1603 // All generated opcodes have scaled offsets.
1604 assert(LocalStackSize % Scale == 0);
1605 OffsetOpnd.setImm(OffsetOpnd.getImm() + LocalStackSize / Scale);
1606
1607 if (NeedsWinCFI) {
1608 *HasWinCFI = true;
1609 auto MBBI = std::next(MachineBasicBlock::iterator(MI));
1610 assert(MBBI != MI.getParent()->end() && "Expecting a valid instruction");
1612 "Expecting a SEH instruction");
1613 fixupSEHOpcode(MBBI, LocalStackSize);
1614 }
1615}
1616
1617static bool isTargetWindows(const MachineFunction &MF) {
1619}
1620
1621static unsigned getStackHazardSize(const MachineFunction &MF) {
1622 return MF.getSubtarget<AArch64Subtarget>().getStreamingHazardSize();
1623}
1624
1625// Convenience function to determine whether I is an SVE callee save.
1627 switch (I->getOpcode()) {
1628 default:
1629 return false;
1630 case AArch64::PTRUE_C_B:
1631 case AArch64::LD1B_2Z_IMM:
1632 case AArch64::ST1B_2Z_IMM:
1633 case AArch64::STR_ZXI:
1634 case AArch64::STR_PXI:
1635 case AArch64::LDR_ZXI:
1636 case AArch64::LDR_PXI:
1637 case AArch64::PTRUE_B:
1638 case AArch64::CPY_ZPzI_B:
1639 case AArch64::CMPNE_PPzZI_B:
1640 return I->getFlag(MachineInstr::FrameSetup) ||
1641 I->getFlag(MachineInstr::FrameDestroy);
1642 }
1643}
1644
1646 MachineFunction &MF,
1649 const DebugLoc &DL, bool NeedsWinCFI,
1650 bool NeedsUnwindInfo) {
1651 // Shadow call stack prolog: str x30, [x18], #8
1652 BuildMI(MBB, MBBI, DL, TII.get(AArch64::STRXpost))
1653 .addReg(AArch64::X18, RegState::Define)
1654 .addReg(AArch64::LR)
1655 .addReg(AArch64::X18)
1656 .addImm(8)
1658
1659 // This instruction also makes x18 live-in to the entry block.
1660 MBB.addLiveIn(AArch64::X18);
1661
1662 if (NeedsWinCFI)
1663 BuildMI(MBB, MBBI, DL, TII.get(AArch64::SEH_Nop))
1665
1666 if (NeedsUnwindInfo) {
1667 // Emit a CFI instruction that causes 8 to be subtracted from the value of
1668 // x18 when unwinding past this frame.
1669 static const char CFIInst[] = {
1670 dwarf::DW_CFA_val_expression,
1671 18, // register
1672 2, // length
1673 static_cast<char>(unsigned(dwarf::DW_OP_breg18)),
1674 static_cast<char>(-8) & 0x7f, // addend (sleb128)
1675 };
1676 unsigned CFIIndex = MF.addFrameInst(MCCFIInstruction::createEscape(
1677 nullptr, StringRef(CFIInst, sizeof(CFIInst))));
1678 BuildMI(MBB, MBBI, DL, TII.get(AArch64::CFI_INSTRUCTION))
1679 .addCFIIndex(CFIIndex)
1681 }
1682}
1683
1685 MachineFunction &MF,
1688 const DebugLoc &DL) {
1689 // Shadow call stack epilog: ldr x30, [x18, #-8]!
1690 BuildMI(MBB, MBBI, DL, TII.get(AArch64::LDRXpre))
1691 .addReg(AArch64::X18, RegState::Define)
1692 .addReg(AArch64::LR, RegState::Define)
1693 .addReg(AArch64::X18)
1694 .addImm(-8)
1696
1698 unsigned CFIIndex =
1700 BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
1701 .addCFIIndex(CFIIndex)
1703 }
1704}
1705
1706// Define the current CFA rule to use the provided FP.
1709 const DebugLoc &DL, unsigned FixedObject) {
1712 const TargetInstrInfo *TII = STI.getInstrInfo();
1714
1715 const int OffsetToFirstCalleeSaveFromFP =
1718 Register FramePtr = TRI->getFrameRegister(MF);
1719 unsigned Reg = TRI->getDwarfRegNum(FramePtr, true);
1720 unsigned CFIIndex = MF.addFrameInst(MCCFIInstruction::cfiDefCfa(
1721 nullptr, Reg, FixedObject - OffsetToFirstCalleeSaveFromFP));
1722 BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
1723 .addCFIIndex(CFIIndex)
1725}
1726
1727 #ifndef NDEBUG
1728/// Collect live registers from the end of \p MI's parent up to (including) \p
1729/// MI in \p LiveRegs.
1730 static void getLivePhysRegsUpTo(MachineInstr &MI, const TargetRegisterInfo &TRI,
1731 LivePhysRegs &LiveRegs) {
1732
1733 MachineBasicBlock &MBB = *MI.getParent();
1734 LiveRegs.addLiveOuts(MBB);
1735 for (const MachineInstr &MI :
1736 reverse(make_range(MI.getIterator(), MBB.instr_end())))
1737 LiveRegs.stepBackward(MI);
1738}
1739 #endif
1740
1741 void AArch64FrameLowering::emitPrologue(MachineFunction &MF,
1742 MachineBasicBlock &MBB) const {
1743 MachineBasicBlock::iterator MBBI = MBB.begin();
1744 const MachineFrameInfo &MFI = MF.getFrameInfo();
1745 const Function &F = MF.getFunction();
1746 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1747 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
1748 const TargetInstrInfo *TII = Subtarget.getInstrInfo();
1749
1750 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
1751 bool EmitCFI = AFI->needsDwarfUnwindInfo(MF);
1752 bool EmitAsyncCFI = AFI->needsAsyncDwarfUnwindInfo(MF);
1753 bool HasFP = hasFP(MF);
1754 bool NeedsWinCFI = needsWinCFI(MF);
1755 bool HasWinCFI = false;
1756 auto Cleanup = make_scope_exit([&]() { MF.setHasWinCFI(HasWinCFI); });
1757
1758 MachineBasicBlock::iterator End = MBB.end();
1759 #ifndef NDEBUG
1760 const TargetRegisterInfo *TRI = Subtarget.getRegisterInfo();
1761 // Collect live registers from the end of MBB up to the start of the existing
1762 // frame setup instructions.
1763 MachineBasicBlock::iterator NonFrameStart = MBB.begin();
1764 while (NonFrameStart != End &&
1765 NonFrameStart->getFlag(MachineInstr::FrameSetup))
1766 ++NonFrameStart;
1767
1768 LivePhysRegs LiveRegs(*TRI);
1769 if (NonFrameStart != MBB.end()) {
1770 getLivePhysRegsUpTo(*NonFrameStart, *TRI, LiveRegs);
1771 // Ignore registers used for stack management for now.
1772 LiveRegs.removeReg(AArch64::SP);
1773 LiveRegs.removeReg(AArch64::X19);
1774 LiveRegs.removeReg(AArch64::FP);
1775 LiveRegs.removeReg(AArch64::LR);
1776
1777 // X0 will be clobbered by a call to __arm_get_current_vg in the prologue.
1778 // This is necessary to spill VG if required where SVE is unavailable, but
1779 // X0 is preserved around this call.
1780 if (requiresGetVGCall(MF))
1781 LiveRegs.removeReg(AArch64::X0);
1782 }
1783
1784 auto VerifyClobberOnExit = make_scope_exit([&]() {
1785 if (NonFrameStart == MBB.end())
1786 return;
1787 // Check if any of the newly inserted instructions clobber any of the live registers.
1788 for (MachineInstr &MI :
1789 make_range(MBB.instr_begin(), NonFrameStart->getIterator())) {
1790 for (auto &Op : MI.operands())
1791 if (Op.isReg() && Op.isDef())
1792 assert(!LiveRegs.contains(Op.getReg()) &&
1793 "live register clobbered by inserted prologue instructions");
1794 }
1795 });
1796 #endif
1797
1798 bool IsFunclet = MBB.isEHFuncletEntry();
1799
1800 // At this point, we're going to decide whether or not the function uses a
1801 // redzone. In most cases, the function doesn't have a redzone so let's
1802 // assume that's false and set it to true in the case that there's a redzone.
1803 AFI->setHasRedZone(false);
1804
1805 // Debug location must be unknown since the first debug location is used
1806 // to determine the end of the prologue.
1807 DebugLoc DL;
1808
1809 const auto &MFnI = *MF.getInfo<AArch64FunctionInfo>();
1810 if (MFnI.needsShadowCallStackPrologueEpilogue(MF))
1811 emitShadowCallStackPrologue(*TII, MF, MBB, MBBI, DL, NeedsWinCFI,
1812 MFnI.needsDwarfUnwindInfo(MF));
1813
1814 if (MFnI.shouldSignReturnAddress(MF)) {
1815 BuildMI(MBB, MBBI, DL, TII->get(AArch64::PAUTH_PROLOGUE))
1816 .setMIFlag(MachineInstr::FrameSetup);
1817 if (NeedsWinCFI)
1818 HasWinCFI = true; // AArch64PointerAuth pass will insert SEH_PACSignLR
1819 }
1820
1821 if (EmitCFI && MFnI.isMTETagged()) {
1822 BuildMI(MBB, MBBI, DL, TII->get(AArch64::EMITMTETAGGED))
1823 .setMIFlag(MachineInstr::FrameSetup);
1824 }
1825
1826 // We signal the presence of a Swift extended frame to external tools by
1827 // storing FP with 0b0001 in bits 63:60. In normal userland operation a simple
1828 // ORR is sufficient; it is assumed a Swift kernel would initialize the TBI
1829 // bits so that this still holds.
1830 if (HasFP && AFI->hasSwiftAsyncContext()) {
1831 switch (MF.getTarget().Options.SwiftAsyncFramePointer) {
1832 case SwiftAsyncFramePointerMode::DeploymentBased:
1833 if (Subtarget.swiftAsyncContextIsDynamicallySet()) {
1834 // The special symbol below is absolute and has a *value* that can be
1835 // combined with the frame pointer to signal an extended frame.
1836 BuildMI(MBB, MBBI, DL, TII->get(AArch64::LOADgot), AArch64::X16)
1837 .addExternalSymbol("swift_async_extendedFramePointerFlags",
1838 AArch64II::MO_GOT);
1839 if (NeedsWinCFI) {
1840 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
1841 .setMIFlag(MachineInstr::FrameSetup);
1842 HasWinCFI = true;
1843 }
1844 BuildMI(MBB, MBBI, DL, TII->get(AArch64::ORRXrs), AArch64::FP)
1845 .addUse(AArch64::FP)
1846 .addUse(AArch64::X16)
1847 .addImm(Subtarget.isTargetILP32() ? 32 : 0);
1848 if (NeedsWinCFI) {
1849 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
1850 .setMIFlag(MachineInstr::FrameSetup);
1851 HasWinCFI = true;
1852 }
1853 break;
1854 }
1855 [[fallthrough]];
1856
1857 case SwiftAsyncFramePointerMode::Always:
1858 // ORR x29, x29, #0x1000_0000_0000_0000
1859 BuildMI(MBB, MBBI, DL, TII->get(AArch64::ORRXri), AArch64::FP)
1860 .addUse(AArch64::FP)
1861 .addImm(0x1100)
1862 .setMIFlag(MachineInstr::FrameSetup);
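// [Editor's note] 0x1100 is the AArch64 logical-immediate encoding of
// 0x1000'0000'0000'0000 (a single bit at position 60), so the ORR above sets
// bit 60 of FP to mark the extended frame.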
1863 if (NeedsWinCFI) {
1864 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
1865 .setMIFlag(MachineInstr::FrameSetup);
1866 HasWinCFI = true;
1867 }
1868 break;
1869
1870 case SwiftAsyncFramePointerMode::Never:
1871 break;
1872 }
1873 }
1874
1875 // All calls are tail calls in GHC calling conv, and functions have no
1876 // prologue/epilogue.
1877 if (MF.getFunction().getCallingConv() == CallingConv::GHC)
1878 return;
1879
1880 // Set tagged base pointer to the requested stack slot.
1881 // Ideally it should match SP value after prologue.
1882 std::optional<int> TBPI = AFI->getTaggedBasePointerIndex();
1883 if (TBPI)
1884 AFI->setTaggedBasePointerOffset(-MFI.getObjectOffset(*TBPI));
1885 else
1886 AFI->setTaggedBasePointerOffset(MFI.getStackSize());
1887
1888 const StackOffset &SVEStackSize = getSVEStackSize(MF);
1889
1890 // getStackSize() includes all the locals in its size calculation. We don't
1891 // include these locals when computing the stack size of a funclet, as they
1892 // are allocated in the parent's stack frame and accessed via the frame
1893 // pointer from the funclet. We only save the callee saved registers in the
1894 // funclet, which are really the callee saved registers of the parent
1895 // function, including the funclet.
1896 int64_t NumBytes =
1897 IsFunclet ? getWinEHFuncletFrameSize(MF) : MFI.getStackSize();
1898 if (!AFI->hasStackFrame() && !windowsRequiresStackProbe(MF, NumBytes)) {
1899 assert(!HasFP && "unexpected function without stack frame but with FP");
1900 assert(!SVEStackSize &&
1901 "unexpected function without stack frame but with SVE objects");
1902 // All of the stack allocation is for locals.
1903 AFI->setLocalStackSize(NumBytes);
1904 if (!NumBytes)
1905 return;
1906 // REDZONE: If the stack size is less than 128 bytes, we don't need
1907 // to actually allocate.
1908 if (canUseRedZone(MF)) {
1909 AFI->setHasRedZone(true);
1910 ++NumRedZoneFunctions;
1911 } else {
1912 emitFrameOffset(MBB, MBBI, DL, AArch64::SP, AArch64::SP,
1913 StackOffset::getFixed(-NumBytes), TII,
1914 MachineInstr::FrameSetup, false, NeedsWinCFI, &HasWinCFI);
1915 if (EmitCFI) {
1916 // Label used to tie together the PROLOG_LABEL and the MachineMoves.
1917 MCSymbol *FrameLabel = MF.getContext().createTempSymbol();
1918 // Encode the stack size of the leaf function.
1919 unsigned CFIIndex = MF.addFrameInst(
1920 MCCFIInstruction::cfiDefCfaOffset(FrameLabel, NumBytes));
1921 BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
1922 .addCFIIndex(CFIIndex)
1923 .setMIFlags(MachineInstr::FrameSetup);
1924 }
1925 }
1926
1927 if (NeedsWinCFI) {
1928 HasWinCFI = true;
1929 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_PrologEnd))
1930 .setMIFlag(MachineInstr::FrameSetup);
1931 }
1932
1933 return;
1934 }
1935
1936 bool IsWin64 = Subtarget.isCallingConvWin64(F.getCallingConv(), F.isVarArg());
1937 unsigned FixedObject = getFixedObjectSize(MF, AFI, IsWin64, IsFunclet);
1938
1939 auto PrologueSaveSize = AFI->getCalleeSavedStackSize() + FixedObject;
1940 // All of the remaining stack allocations are for locals.
1941 AFI->setLocalStackSize(NumBytes - PrologueSaveSize);
1942 bool CombineSPBump = shouldCombineCSRLocalStackBump(MF, NumBytes);
1943 bool HomPrologEpilog = homogeneousPrologEpilog(MF);
1944 if (CombineSPBump) {
1945 assert(!SVEStackSize && "Cannot combine SP bump with SVE");
1946 emitFrameOffset(MBB, MBBI, DL, AArch64::SP, AArch64::SP,
1947 StackOffset::getFixed(-NumBytes), TII,
1948 MachineInstr::FrameSetup, false, NeedsWinCFI, &HasWinCFI,
1949 EmitAsyncCFI);
1950 NumBytes = 0;
1951 } else if (HomPrologEpilog) {
1952 // Stack has been already adjusted.
1953 NumBytes -= PrologueSaveSize;
1954 } else if (PrologueSaveSize != 0) {
1955 MBBI = convertCalleeSaveRestoreToSPPrePostIncDec(
1956 MBB, MBBI, DL, TII, -PrologueSaveSize, NeedsWinCFI, &HasWinCFI,
1957 EmitAsyncCFI);
1958 NumBytes -= PrologueSaveSize;
1959 }
1960 assert(NumBytes >= 0 && "Negative stack allocation size!?");
1961
1962 // Move past the saves of the callee-saved registers, fixing up the offsets
1963 // and pre-inc if we decided to combine the callee-save and local stack
1964 // pointer bump above.
1965 while (MBBI != End && MBBI->getFlag(MachineInstr::FrameSetup) &&
1966 !IsSVECalleeSave(MBBI)) {
1967 if (CombineSPBump &&
1968 // Only fix-up frame-setup load/store instructions.
1969 (!requiresSaveVG(MF) || !isVGInstruction(MBBI)))
1970 fixupCalleeSaveRestoreStackOffset(*MBBI, AFI->getLocalStackSize(),
1971 NeedsWinCFI, &HasWinCFI);
1972 ++MBBI;
1973 }
1974
1975 // For funclets the FP belongs to the containing function.
1976 if (!IsFunclet && HasFP) {
1977 // Only set up FP if we actually need to.
1978 int64_t FPOffset = AFI->getCalleeSaveBaseToFrameRecordOffset();
1979
1980 if (CombineSPBump)
1981 FPOffset += AFI->getLocalStackSize();
1982
1983 if (AFI->hasSwiftAsyncContext()) {
1984 // Before we update the live FP we have to ensure there's a valid (or
1985 // null) asynchronous context in its slot just before FP in the frame
1986 // record, so store it now.
1987 const auto &Attrs = MF.getFunction().getAttributes();
1988 bool HaveInitialContext = Attrs.hasAttrSomewhere(Attribute::SwiftAsync);
1989 if (HaveInitialContext)
1990 MBB.addLiveIn(AArch64::X22);
1991 Register Reg = HaveInitialContext ? AArch64::X22 : AArch64::XZR;
1992 BuildMI(MBB, MBBI, DL, TII->get(AArch64::StoreSwiftAsyncContext))
1993 .addUse(Reg)
1994 .addUse(AArch64::SP)
1995 .addImm(FPOffset - 8)
1996 .setMIFlag(MachineInstr::FrameSetup);
1997 if (NeedsWinCFI) {
1998 // WinCFI and arm64e, where StoreSwiftAsyncContext is expanded
1999 // to multiple instructions, should be mutually-exclusive.
2000 assert(Subtarget.getTargetTriple().getArchName() != "arm64e");
2001 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
2002 .setMIFlag(MachineInstr::FrameSetup);
2003 HasWinCFI = true;
2004 }
2005 }
2006
2007 if (HomPrologEpilog) {
2008 auto Prolog = MBBI;
2009 --Prolog;
2010 assert(Prolog->getOpcode() == AArch64::HOM_Prolog);
2011 Prolog->addOperand(MachineOperand::CreateImm(FPOffset));
2012 } else {
2013 // Issue sub fp, sp, FPOffset or
2014 // mov fp,sp when FPOffset is zero.
2015 // Note: All stores of callee-saved registers are marked as "FrameSetup".
2016 // This code marks the instruction(s) that set the FP also.
2017 emitFrameOffset(MBB, MBBI, DL, AArch64::FP, AArch64::SP,
2018 StackOffset::getFixed(FPOffset), TII,
2019 MachineInstr::FrameSetup, false, NeedsWinCFI, &HasWinCFI);
2020 if (NeedsWinCFI && HasWinCFI) {
2021 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_PrologEnd))
2022 .setMIFlag(MachineInstr::FrameSetup);
2023 // After setting up the FP, the rest of the prolog doesn't need to be
2024 // included in the SEH unwind info.
2025 NeedsWinCFI = false;
2026 }
2027 }
2028 if (EmitAsyncCFI)
2029 emitDefineCFAWithFP(MF, MBB, MBBI, DL, FixedObject);
2030 }
2031
2032 // Now emit the moves for whatever callee saved regs we have (including FP,
2033 // LR if those are saved). Frame instructions for SVE registers are emitted
2034 // later, after the instructions which actually save the SVE regs.
2035 if (EmitAsyncCFI)
2036 emitCalleeSavedGPRLocations(MBB, MBBI);
2037
2038 // Alignment is required for the parent frame, not the funclet
2039 const bool NeedsRealignment =
2040 NumBytes && !IsFunclet && RegInfo->hasStackRealignment(MF);
2041 const int64_t RealignmentPadding =
2042 (NeedsRealignment && MFI.getMaxAlign() > Align(16))
2043 ? MFI.getMaxAlign().value() - 16
2044 : 0;
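// [Editor's note] Example: for a frame whose largest object alignment is 64
// bytes, RealignmentPadding is 64 - 16 = 48; the allocation below is grown by
// that amount so that rounding SP down to a 64-byte boundary (the AND of SP
// further down) still leaves the requested NumBytes available.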
2045
2046 if (windowsRequiresStackProbe(MF, NumBytes + RealignmentPadding)) {
2047 uint64_t NumWords = (NumBytes + RealignmentPadding) >> 4;
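// [Editor's note] The Windows stack probe helper takes the allocation size in
// x15 measured in 16-byte units, e.g. a 1 MiB frame yields NumWords == 0x10000.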
2048 if (NeedsWinCFI) {
2049 HasWinCFI = true;
2050 // alloc_l can hold at most 256MB, so assume that NumBytes doesn't
2051 // exceed this amount. We need to move at most 2^24 - 1 into x15.
2052 // This is at most two instructions, MOVZ followed by MOVK.
2053 // TODO: Fix to use multiple stack alloc unwind codes for stacks
2054 // exceeding 256MB in size.
2055 if (NumBytes >= (1 << 28))
2056 report_fatal_error("Stack size cannot exceed 256MB for stack "
2057 "unwinding purposes");
2058
2059 uint32_t LowNumWords = NumWords & 0xFFFF;
2060 BuildMI(MBB, MBBI, DL, TII->get(AArch64::MOVZXi), AArch64::X15)
2061 .addImm(LowNumWords)
2062 .addImm(AArch64_AM::getShifterImm(AArch64_AM::LSL, 0))
2063 .setMIFlag(MachineInstr::FrameSetup);
2064 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
2065 .setMIFlag(MachineInstr::FrameSetup);
2066 if ((NumWords & 0xFFFF0000) != 0) {
2067 BuildMI(MBB, MBBI, DL, TII->get(AArch64::MOVKXi), AArch64::X15)
2068 .addReg(AArch64::X15)
2069 .addImm((NumWords & 0xFFFF0000) >> 16) // High half
2070 .addImm(AArch64_AM::getShifterImm(AArch64_AM::LSL, 16))
2071 .setMIFlag(MachineInstr::FrameSetup);
2072 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
2073 .setMIFlag(MachineInstr::FrameSetup);
2074 }
2075 } else {
2076 BuildMI(MBB, MBBI, DL, TII->get(AArch64::MOVi64imm), AArch64::X15)
2077 .addImm(NumWords)
2078 .setMIFlags(MachineInstr::FrameSetup);
2079 }
2080
2081 const char *ChkStk = Subtarget.getChkStkName();
2082 switch (MF.getTarget().getCodeModel()) {
2083 case CodeModel::Tiny:
2084 case CodeModel::Small:
2085 case CodeModel::Medium:
2086 case CodeModel::Kernel:
2087 BuildMI(MBB, MBBI, DL, TII->get(AArch64::BL))
2088 .addExternalSymbol(ChkStk)
2089 .addReg(AArch64::X15, RegState::Implicit)
2094 if (NeedsWinCFI) {
2095 HasWinCFI = true;
2096 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
2097 .setMIFlag(MachineInstr::FrameSetup);
2098 }
2099 break;
2100 case CodeModel::Large:
2101 BuildMI(MBB, MBBI, DL, TII->get(AArch64::MOVaddrEXT))
2102 .addReg(AArch64::X16, RegState::Define)
2103 .addExternalSymbol(ChkStk)
2104 .addExternalSymbol(ChkStk)
2106 if (NeedsWinCFI) {
2107 HasWinCFI = true;
2108 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
2109 .setMIFlag(MachineInstr::FrameSetup);
2110 }
2111
2112 BuildMI(MBB, MBBI, DL, TII->get(getBLRCallOpcode(MF)))
2113 .addReg(AArch64::X16, RegState::Kill)
2119 if (NeedsWinCFI) {
2120 HasWinCFI = true;
2121 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
2122 .setMIFlag(MachineInstr::FrameSetup);
2123 }
2124 break;
2125 }
2126
2127 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SUBXrx64), AArch64::SP)
2128 .addReg(AArch64::SP, RegState::Kill)
2129 .addReg(AArch64::X15, RegState::Kill)
2130 .addImm(AArch64_AM::getArithExtendImm(AArch64_AM::UXTX, 4))
2131 .setMIFlag(MachineInstr::FrameSetup);
2132 if (NeedsWinCFI) {
2133 HasWinCFI = true;
2134 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_StackAlloc))
2135 .addImm(NumBytes)
2136 .setMIFlag(MachineInstr::FrameSetup);
2137 }
2138 NumBytes = 0;
2139
2140 if (RealignmentPadding > 0) {
2141 if (RealignmentPadding >= 4096) {
2142 BuildMI(MBB, MBBI, DL, TII->get(AArch64::MOVi64imm))
2143 .addReg(AArch64::X16, RegState::Define)
2144 .addImm(RealignmentPadding)
2145 .setMIFlag(MachineInstr::FrameSetup);
2146 BuildMI(MBB, MBBI, DL, TII->get(AArch64::ADDXrx64), AArch64::X15)
2147 .addReg(AArch64::SP)
2148 .addReg(AArch64::X16, RegState::Kill)
2149 .addImm(AArch64_AM::getArithExtendImm(AArch64_AM::UXTX, 0))
2150 .setMIFlag(MachineInstr::FrameSetup);
2151 } else {
2152 BuildMI(MBB, MBBI, DL, TII->get(AArch64::ADDXri), AArch64::X15)
2153 .addReg(AArch64::SP)
2154 .addImm(RealignmentPadding)
2155 .addImm(0)
2156 .setMIFlag(MachineInstr::FrameSetup);
2157 }
2158
2159 uint64_t AndMask = ~(MFI.getMaxAlign().value() - 1);
2160 BuildMI(MBB, MBBI, DL, TII->get(AArch64::ANDXri), AArch64::SP)
2161 .addReg(AArch64::X15, RegState::Kill)
2162 .addImm(AArch64_AM::encodeLogicalImmediate(AndMask, 64));
2163 AFI->setStackRealigned(true);
2164
2165 // No need for SEH instructions here; if we're realigning the stack,
2166 // we've set a frame pointer and already finished the SEH prologue.
2167 assert(!NeedsWinCFI);
2168 }
2169 }
2170
2171 StackOffset SVECalleeSavesSize = {}, SVELocalsSize = SVEStackSize;
2172 MachineBasicBlock::iterator CalleeSavesBegin = MBBI, CalleeSavesEnd = MBBI;
2173
2174 // Process the SVE callee-saves to determine what space needs to be
2175 // allocated.
2176 if (int64_t CalleeSavedSize = AFI->getSVECalleeSavedStackSize()) {
2177 LLVM_DEBUG(dbgs() << "SVECalleeSavedStackSize = " << CalleeSavedSize
2178 << "\n");
2179 // Find callee save instructions in frame.
2180 CalleeSavesBegin = MBBI;
2181 assert(IsSVECalleeSave(CalleeSavesBegin) && "Unexpected instruction");
2182 while (IsSVECalleeSave(MBBI) && MBBI != MBB.getFirstTerminator())
2183 ++MBBI;
2184 CalleeSavesEnd = MBBI;
2185
2186 SVECalleeSavesSize = StackOffset::getScalable(CalleeSavedSize);
2187 SVELocalsSize = SVEStackSize - SVECalleeSavesSize;
2188 }
2189
2190 // Allocate space for the callee saves (if any).
2191 StackOffset CFAOffset =
2192 StackOffset::getFixed((int64_t)MFI.getStackSize() - NumBytes);
2193 StackOffset LocalsSize = SVELocalsSize + StackOffset::getFixed(NumBytes);
2194 allocateStackSpace(MBB, CalleeSavesBegin, 0, SVECalleeSavesSize, false,
2195 nullptr, EmitAsyncCFI && !HasFP, CFAOffset,
2196 MFI.hasVarSizedObjects() || LocalsSize);
2197 CFAOffset += SVECalleeSavesSize;
2198
2199 if (EmitAsyncCFI)
2200 emitCalleeSavedSVELocations(MBB, CalleeSavesEnd);
2201
2202 // Allocate space for the rest of the frame including SVE locals. Align the
2203 // stack as necessary.
2204 assert(!(canUseRedZone(MF) && NeedsRealignment) &&
2205 "Cannot use redzone with stack realignment");
2206 if (!canUseRedZone(MF)) {
2207 // FIXME: in the case of dynamic re-alignment, NumBytes doesn't have
2208 // the correct value here, as NumBytes also includes padding bytes,
2209 // which shouldn't be counted here.
2210 allocateStackSpace(MBB, CalleeSavesEnd, RealignmentPadding,
2211 SVELocalsSize + StackOffset::getFixed(NumBytes),
2212 NeedsWinCFI, &HasWinCFI, EmitAsyncCFI && !HasFP,
2213 CFAOffset, MFI.hasVarSizedObjects());
2214 }
2215
2216 // If we need a base pointer, set it up here. It's whatever the value of the
2217 // stack pointer is at this point. Any variable size objects will be allocated
2218 // after this, so we can still use the base pointer to reference locals.
2219 //
2220 // FIXME: Clarify FrameSetup flags here.
2221 // Note: Use emitFrameOffset() like above for FP if the FrameSetup flag is
2222 // needed.
2223 // For funclets the BP belongs to the containing function.
2224 if (!IsFunclet && RegInfo->hasBasePointer(MF)) {
2225 TII->copyPhysReg(MBB, MBBI, DL, RegInfo->getBaseRegister(), AArch64::SP,
2226 false);
2227 if (NeedsWinCFI) {
2228 HasWinCFI = true;
2229 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
2230 .setMIFlag(MachineInstr::FrameSetup);
2231 }
2232 }
2233
2234 // The very last FrameSetup instruction indicates the end of prologue. Emit a
2235 // SEH opcode indicating the prologue end.
2236 if (NeedsWinCFI && HasWinCFI) {
2237 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_PrologEnd))
2238 .setMIFlag(MachineInstr::FrameSetup);
2239 }
2240
2241 // SEH funclets are passed the frame pointer in X1. If the parent
2242 // function uses the base register, then the base register is used
2243 // directly, and is not retrieved from X1.
2244 if (IsFunclet && F.hasPersonalityFn()) {
2245 EHPersonality Per = classifyEHPersonality(F.getPersonalityFn());
2246 if (isAsynchronousEHPersonality(Per)) {
2247 BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::COPY), AArch64::FP)
2248 .addReg(AArch64::X1)
2249 .setMIFlag(MachineInstr::FrameSetup);
2250 MBB.addLiveIn(AArch64::X1);
2251 }
2252 }
2253
2254 if (EmitCFI && !EmitAsyncCFI) {
2255 if (HasFP) {
2256 emitDefineCFAWithFP(MF, MBB, MBBI, DL, FixedObject);
2257 } else {
2258 StackOffset TotalSize =
2259 SVEStackSize + StackOffset::getFixed((int64_t)MFI.getStackSize());
2260 unsigned CFIIndex = MF.addFrameInst(createDefCFA(
2261 *RegInfo, /*FrameReg=*/AArch64::SP, /*Reg=*/AArch64::SP, TotalSize,
2262 /*LastAdjustmentWasScalable=*/false));
2263 BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
2264 .addCFIIndex(CFIIndex)
2265 .setMIFlags(MachineInstr::FrameSetup);
2266 }
2267 emitCalleeSavedGPRLocations(MBB, MBBI);
2268 emitCalleeSavedSVELocations(MBB, MBBI);
2269 }
2270}
2271
2272 static bool isFuncletReturnInstr(const MachineInstr &MI) {
2273 switch (MI.getOpcode()) {
2274 default:
2275 return false;
2276 case AArch64::CATCHRET:
2277 case AArch64::CLEANUPRET:
2278 return true;
2279 }
2280}
2281
2282 void AArch64FrameLowering::emitEpilogue(MachineFunction &MF,
2283 MachineBasicBlock &MBB) const {
2284 MachineBasicBlock::iterator MBBI = MBB.getLastNonDebugInstr();
2285 MachineFrameInfo &MFI = MF.getFrameInfo();
2286 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
2287 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
2288 const TargetInstrInfo *TII = Subtarget.getInstrInfo();
2289 DebugLoc DL;
2290 bool NeedsWinCFI = needsWinCFI(MF);
2291 bool EmitCFI = AFI->needsAsyncDwarfUnwindInfo(MF);
2292 bool HasWinCFI = false;
2293 bool IsFunclet = false;
2294
2295 if (MBB.end() != MBBI) {
2296 DL = MBBI->getDebugLoc();
2297 IsFunclet = isFuncletReturnInstr(*MBBI);
2298 }
2299
2300 MachineBasicBlock::iterator EpilogStartI = MBB.end();
2301
2302 auto FinishingTouches = make_scope_exit([&]() {
2303 if (AFI->shouldSignReturnAddress(MF)) {
2304 BuildMI(MBB, MBB.getFirstTerminator(), DL,
2305 TII->get(AArch64::PAUTH_EPILOGUE))
2306 .setMIFlag(MachineInstr::FrameDestroy);
2307 if (NeedsWinCFI)
2308 HasWinCFI = true; // AArch64PointerAuth pass will insert SEH_PACSignLR
2309 }
2310 if (AFI->needsShadowCallStackPrologueEpilogue(MF))
2311 emitShadowCallStackEpilogue(*TII, MF, MBB, MBB.getFirstTerminator(), DL);
2312 if (EmitCFI)
2313 emitCalleeSavedGPRRestores(MBB, MBB.getFirstTerminator());
2314 if (HasWinCFI) {
2315 BuildMI(MBB, MBB.getFirstTerminator(), DL,
2316 TII->get(AArch64::SEH_EpilogEnd))
2317 .setMIFlag(MachineInstr::FrameDestroy);
2318 if (!MF.hasWinCFI())
2319 MF.setHasWinCFI(true);
2320 }
2321 if (NeedsWinCFI) {
2322 assert(EpilogStartI != MBB.end());
2323 if (!HasWinCFI)
2324 MBB.erase(EpilogStartI);
2325 }
2326 });
2327
2328 int64_t NumBytes = IsFunclet ? getWinEHFuncletFrameSize(MF)
2329 : MFI.getStackSize();
2330
2331 // All calls are tail calls in GHC calling conv, and functions have no
2332 // prologue/epilogue.
2333 if (MF.getFunction().getCallingConv() == CallingConv::GHC)
2334 return;
2335
2336 // How much of the stack used by incoming arguments this function is expected
2337 // to restore in this particular epilogue.
2338 int64_t ArgumentStackToRestore = getArgumentStackToRestore(MF, MBB);
2339 bool IsWin64 = Subtarget.isCallingConvWin64(MF.getFunction().getCallingConv(),
2340 MF.getFunction().isVarArg());
2341 unsigned FixedObject = getFixedObjectSize(MF, AFI, IsWin64, IsFunclet);
2342
2343 int64_t AfterCSRPopSize = ArgumentStackToRestore;
2344 auto PrologueSaveSize = AFI->getCalleeSavedStackSize() + FixedObject;
2345 // We cannot rely on the local stack size set in emitPrologue if the function
2346 // has funclets, as funclets have different local stack size requirements, and
2347 // the current value set in emitPrologue may be that of the containing
2348 // function.
2349 if (MF.hasEHFunclets())
2350 AFI->setLocalStackSize(NumBytes - PrologueSaveSize);
2351 if (homogeneousPrologEpilog(MF, &MBB)) {
2352 assert(!NeedsWinCFI);
2353 auto LastPopI = MBB.getFirstTerminator();
2354 if (LastPopI != MBB.begin()) {
2355 auto HomogeneousEpilog = std::prev(LastPopI);
2356 if (HomogeneousEpilog->getOpcode() == AArch64::HOM_Epilog)
2357 LastPopI = HomogeneousEpilog;
2358 }
2359
2360 // Adjust local stack
2361 emitFrameOffset(MBB, LastPopI, DL, AArch64::SP, AArch64::SP,
2362 StackOffset::getFixed(AFI->getLocalStackSize()), TII,
2363 MachineInstr::FrameDestroy, false, NeedsWinCFI, &HasWinCFI);
2364
2365 // SP has been already adjusted while restoring callee save regs.
2366 // We've bailed-out the case with adjusting SP for arguments.
2367 assert(AfterCSRPopSize == 0);
2368 return;
2369 }
2370 bool CombineSPBump = shouldCombineCSRLocalStackBumpInEpilogue(MBB, NumBytes);
2371 // Assume we can't combine the last pop with the sp restore.
2372 bool CombineAfterCSRBump = false;
2373 if (!CombineSPBump && PrologueSaveSize != 0) {
2374 MachineBasicBlock::iterator Pop = std::prev(MBB.getFirstTerminator());
2375 while (Pop->getOpcode() == TargetOpcode::CFI_INSTRUCTION ||
2376 AArch64InstrInfo::isSEHInstruction(*Pop))
2377 Pop = std::prev(Pop);
2378 // Converting the last ldp to a post-index ldp is valid only if the last
2379 // ldp's offset is 0.
2380 const MachineOperand &OffsetOp = Pop->getOperand(Pop->getNumOperands() - 1);
2381 // If the offset is 0 and the AfterCSR pop is not actually trying to
2382 // allocate more stack for arguments (in space that an untimely interrupt
2383 // may clobber), convert it to a post-index ldp.
2384 if (OffsetOp.getImm() == 0 && AfterCSRPopSize >= 0) {
2385 convertCalleeSaveRestoreToSPPrePostIncDec(
2386 MBB, Pop, DL, TII, PrologueSaveSize, NeedsWinCFI, &HasWinCFI, EmitCFI,
2387 MachineInstr::FrameDestroy, PrologueSaveSize);
2388 } else {
2389 // If not, make sure to emit an add after the last ldp.
2390 // We're doing this by transferring the size to be restored from the
2391 // adjustment *before* the CSR pops to the adjustment *after* the CSR
2392 // pops.
2393 AfterCSRPopSize += PrologueSaveSize;
2394 CombineAfterCSRBump = true;
2395 }
2396 }
2397
2398 // Move past the restores of the callee-saved registers.
2399 // If we plan on combining the sp bump of the local stack size and the callee
2400 // save stack size, we might need to adjust the CSR save and restore offsets.
2401 MachineBasicBlock::iterator LastPopI = MBB.getFirstTerminator();
2402 MachineBasicBlock::iterator Begin = MBB.begin();
2403 while (LastPopI != Begin) {
2404 --LastPopI;
2405 if (!LastPopI->getFlag(MachineInstr::FrameDestroy) ||
2406 IsSVECalleeSave(LastPopI)) {
2407 ++LastPopI;
2408 break;
2409 } else if (CombineSPBump)
2410 fixupCalleeSaveRestoreStackOffset(*LastPopI, AFI->getLocalStackSize(),
2411 NeedsWinCFI, &HasWinCFI);
2412 }
2413
2414 if (NeedsWinCFI) {
2415 // Note that there are cases where we insert SEH opcodes in the
2416 // epilogue when we had no SEH opcodes in the prologue. For
2417 // example, when there is no stack frame but there are stack
2418 // arguments. Insert the SEH_EpilogStart and remove it later if we
2419 // didn't emit any SEH opcodes, to avoid generating WinCFI for
2420 // functions that don't need it.
2421 BuildMI(MBB, LastPopI, DL, TII->get(AArch64::SEH_EpilogStart))
2422 .setMIFlag(MachineInstr::FrameDestroy);
2423 EpilogStartI = LastPopI;
2424 --EpilogStartI;
2425 }
2426
2427 if (hasFP(MF) && AFI->hasSwiftAsyncContext()) {
2428 switch (MF.getTarget().Options.SwiftAsyncFramePointer) {
2429 case SwiftAsyncFramePointerMode::DeploymentBased:
2430 // Avoid the reload as it is GOT relative, and instead fall back to the
2431 // hardcoded value below. This allows a mismatch between the OS and
2432 // application without immediately terminating on the difference.
2433 [[fallthrough]];
2434 case SwiftAsyncFramePointerMode::Always:
2435 // We need to reset FP to its untagged state on return. Bit 60 is
2436 // currently used to show the presence of an extended frame.
2437
2438 // BIC x29, x29, #0x1000_0000_0000_0000
2439 BuildMI(MBB, MBB.getFirstTerminator(), DL, TII->get(AArch64::ANDXri),
2440 AArch64::FP)
2441 .addUse(AArch64::FP)
2442 .addImm(0x10fe)
2443 .setMIFlag(MachineInstr::FrameDestroy);
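// [Editor's note] 0x10fe is the logical-immediate encoding of
// ~0x1000'0000'0000'0000, so the AND above clears bit 60 of FP, undoing the
// extended-frame marker set in the prologue.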
2444 if (NeedsWinCFI) {
2445 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
2446 .setMIFlag(MachineInstr::FrameDestroy);
2447 HasWinCFI = true;
2448 }
2449 break;
2450
2451 case SwiftAsyncFramePointerMode::Never:
2452 break;
2453 }
2454 }
2455
2456 const StackOffset &SVEStackSize = getSVEStackSize(MF);
2457
2458 // If there is a single SP update, insert it before the ret and we're done.
2459 if (CombineSPBump) {
2460 assert(!SVEStackSize && "Cannot combine SP bump with SVE");
2461
2462 // When we are about to restore the CSRs, the CFA register is SP again.
2463 if (EmitCFI && hasFP(MF)) {
2464 const AArch64RegisterInfo &RegInfo = *Subtarget.getRegisterInfo();
2465 unsigned Reg = RegInfo.getDwarfRegNum(AArch64::SP, true);
2466 unsigned CFIIndex =
2467 MF.addFrameInst(MCCFIInstruction::cfiDefCfa(nullptr, Reg, NumBytes));
2468 BuildMI(MBB, LastPopI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
2469 .addCFIIndex(CFIIndex)
2470 .setMIFlags(MachineInstr::FrameDestroy);
2471 }
2472
2473 emitFrameOffset(MBB, MBB.getFirstTerminator(), DL, AArch64::SP, AArch64::SP,
2474 StackOffset::getFixed(NumBytes + (int64_t)AfterCSRPopSize),
2475 TII, MachineInstr::FrameDestroy, false, NeedsWinCFI,
2476 &HasWinCFI, EmitCFI, StackOffset::getFixed(NumBytes));
2477 return;
2478 }
2479
2480 NumBytes -= PrologueSaveSize;
2481 assert(NumBytes >= 0 && "Negative stack allocation size!?");
2482
2483 // Process the SVE callee-saves to determine what space needs to be
2484 // deallocated.
2485 StackOffset DeallocateBefore = {}, DeallocateAfter = SVEStackSize;
2486 MachineBasicBlock::iterator RestoreBegin = LastPopI, RestoreEnd = LastPopI;
2487 if (int64_t CalleeSavedSize = AFI->getSVECalleeSavedStackSize()) {
2488 RestoreBegin = std::prev(RestoreEnd);
2489 while (RestoreBegin != MBB.begin() &&
2490 IsSVECalleeSave(std::prev(RestoreBegin)))
2491 --RestoreBegin;
2492
2493 assert(IsSVECalleeSave(RestoreBegin) &&
2494 IsSVECalleeSave(std::prev(RestoreEnd)) && "Unexpected instruction");
2495
2496 StackOffset CalleeSavedSizeAsOffset =
2497 StackOffset::getScalable(CalleeSavedSize);
2498 DeallocateBefore = SVEStackSize - CalleeSavedSizeAsOffset;
2499 DeallocateAfter = CalleeSavedSizeAsOffset;
2500 }
2501
2502 // Deallocate the SVE area.
2503 if (SVEStackSize) {
2504 // If we have stack realignment or variable sized objects on the stack,
2505 // restore the stack pointer from the frame pointer prior to SVE CSR
2506 // restoration.
2507 if (AFI->isStackRealigned() || MFI.hasVarSizedObjects()) {
2508 if (int64_t CalleeSavedSize = AFI->getSVECalleeSavedStackSize()) {
2509 // Set SP to start of SVE callee-save area from which they can
2510 // be reloaded. The code below will deallocate the stack space
2511 // by moving FP -> SP.
2512 emitFrameOffset(MBB, RestoreBegin, DL, AArch64::SP, AArch64::FP,
2513 StackOffset::getScalable(-CalleeSavedSize), TII,
2514 MachineInstr::FrameDestroy);
2515 }
2516 } else {
2517 if (AFI->getSVECalleeSavedStackSize()) {
2518 // Deallocate the non-SVE locals first before we can deallocate (and
2519 // restore callee saves) from the SVE area.
2520 emitFrameOffset(
2521 MBB, RestoreBegin, DL, AArch64::SP, AArch64::SP,
2522 StackOffset::getFixed(NumBytes), TII, MachineInstr::FrameDestroy,
2523 false, false, nullptr, EmitCFI && !hasFP(MF),
2524 SVEStackSize + StackOffset::getFixed(NumBytes + PrologueSaveSize));
2525 NumBytes = 0;
2526 }
2527
2528 emitFrameOffset(MBB, RestoreBegin, DL, AArch64::SP, AArch64::SP,
2529 DeallocateBefore, TII, MachineInstr::FrameDestroy, false,
2530 false, nullptr, EmitCFI && !hasFP(MF),
2531 SVEStackSize +
2532 StackOffset::getFixed(NumBytes + PrologueSaveSize));
2533
2534 emitFrameOffset(MBB, RestoreEnd, DL, AArch64::SP, AArch64::SP,
2535 DeallocateAfter, TII, MachineInstr::FrameDestroy, false,
2536 false, nullptr, EmitCFI && !hasFP(MF),
2537 DeallocateAfter +
2538 StackOffset::getFixed(NumBytes + PrologueSaveSize));
2539 }
2540 if (EmitCFI)
2541 emitCalleeSavedSVERestores(MBB, RestoreEnd);
2542 }
2543
2544 if (!hasFP(MF)) {
2545 bool RedZone = canUseRedZone(MF);
2546 // If this was a redzone leaf function, we don't need to restore the
2547 // stack pointer (but we may need to pop stack args for fastcc).
2548 if (RedZone && AfterCSRPopSize == 0)
2549 return;
2550
2551 // Pop the local variables off the stack. If there are no callee-saved
2552 // registers, it means we are actually positioned at the terminator and can
2553 // combine stack increment for the locals and the stack increment for
2554 // callee-popped arguments into (possibly) a single instruction and be done.
2555 bool NoCalleeSaveRestore = PrologueSaveSize == 0;
2556 int64_t StackRestoreBytes = RedZone ? 0 : NumBytes;
2557 if (NoCalleeSaveRestore)
2558 StackRestoreBytes += AfterCSRPopSize;
2559
2561 MBB, LastPopI, DL, AArch64::SP, AArch64::SP,
2562 StackOffset::getFixed(StackRestoreBytes), TII,
2563 MachineInstr::FrameDestroy, false, NeedsWinCFI, &HasWinCFI, EmitCFI,
2564 StackOffset::getFixed((RedZone ? 0 : NumBytes) + PrologueSaveSize));
2565
2566 // If we were able to combine the local stack pop with the argument pop,
2567 // then we're done.
2568 if (NoCalleeSaveRestore || AfterCSRPopSize == 0) {
2569 return;
2570 }
2571
2572 NumBytes = 0;
2573 }
2574
2575 // Restore the original stack pointer.
2576 // FIXME: Rather than doing the math here, we should instead just use
2577 // non-post-indexed loads for the restores if we aren't actually going to
2578 // be able to save any instructions.
2579 if (!IsFunclet && (MFI.hasVarSizedObjects() || AFI->isStackRealigned())) {
2580 emitFrameOffset(
2581 MBB, LastPopI, DL, AArch64::SP, AArch64::FP,
2582 StackOffset::getFixed(-AFI->getCalleeSaveBaseToFrameRecordOffset()),
2583 TII, MachineInstr::FrameDestroy, false, NeedsWinCFI, &HasWinCFI);
2584 } else if (NumBytes)
2585 emitFrameOffset(MBB, LastPopI, DL, AArch64::SP, AArch64::SP,
2586 StackOffset::getFixed(NumBytes), TII,
2587 MachineInstr::FrameDestroy, false, NeedsWinCFI, &HasWinCFI);
2588
2589 // When we are about to restore the CSRs, the CFA register is SP again.
2590 if (EmitCFI && hasFP(MF)) {
2591 const AArch64RegisterInfo &RegInfo = *Subtarget.getRegisterInfo();
2592 unsigned Reg = RegInfo.getDwarfRegNum(AArch64::SP, true);
2593 unsigned CFIIndex = MF.addFrameInst(
2594 MCCFIInstruction::cfiDefCfa(nullptr, Reg, PrologueSaveSize));
2595 BuildMI(MBB, LastPopI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
2596 .addCFIIndex(CFIIndex)
2597 .setMIFlags(MachineInstr::FrameDestroy);
2598 }
2599
2600 // This must be placed after the callee-save restore code because that code
2601 // assumes the SP is at the same location as it was after the callee-save save
2602 // code in the prologue.
2603 if (AfterCSRPopSize) {
2604 assert(AfterCSRPopSize > 0 && "attempting to reallocate arg stack that an "
2605 "interrupt may have clobbered");
2606
2607 emitFrameOffset(
2608 MBB, MBB.getFirstTerminator(), DL, AArch64::SP, AArch64::SP,
2609 StackOffset::getFixed(AfterCSRPopSize), TII, MachineInstr::FrameDestroy,
2610 false, NeedsWinCFI, &HasWinCFI, EmitCFI,
2611 StackOffset::getFixed(CombineAfterCSRBump ? PrologueSaveSize : 0));
2612 }
2613}
2614
2615 bool AArch64FrameLowering::enableCFIFixup(MachineFunction &MF) const {
2616 return TargetFrameLowering::enableCFIFixup(MF) &&
2617 MF.getInfo<AArch64FunctionInfo>()->needsAsyncDwarfUnwindInfo(MF);
2618}
2619
2620/// getFrameIndexReference - Provide a base+offset reference to an FI slot for
2621/// debug info. It's the same as what we use for resolving the code-gen
2622/// references for now. FIXME: This can go wrong when references are
2623/// SP-relative and simple call frames aren't used.
2624 StackOffset
2625 AArch64FrameLowering::getFrameIndexReference(const MachineFunction &MF, int FI,
2626 Register &FrameReg) const {
2627 return resolveFrameIndexReference(
2628 MF, FI, FrameReg,
2629 /*PreferFP=*/
2630 MF.getFunction().hasFnAttribute(Attribute::SanitizeHWAddress) ||
2631 MF.getFunction().hasFnAttribute(Attribute::SanitizeMemTag),
2632 /*ForSimm=*/false);
2633}
2634
2635 StackOffset
2636 AArch64FrameLowering::getFrameIndexReferenceFromSP(const MachineFunction &MF,
2637 int FI) const {
2638 // This function serves to provide a comparable offset from a single reference
2639 // point (the value of SP at function entry) that can be used for analysis,
2640 // e.g. the stack-frame-layout analysis pass. It is not guaranteed to be
2641 // correct for all objects in the presence of VLA-area objects or dynamic
2642 // stack re-alignment.
2643
2644 const auto &MFI = MF.getFrameInfo();
2645
2646 int64_t ObjectOffset = MFI.getObjectOffset(FI);
2647 StackOffset SVEStackSize = getSVEStackSize(MF);
2648
2649 // For VLA-area objects, just emit an offset at the end of the stack frame.
2650 // Whilst not quite correct, these objects do live at the end of the frame and
2651 // so it is more useful for analysis if the offset reflects this.
2652 if (MFI.isVariableSizedObjectIndex(FI)) {
2653 return StackOffset::getFixed(-((int64_t)MFI.getStackSize())) - SVEStackSize;
2654 }
2655
2656 // This is correct in the absence of any SVE stack objects.
2657 if (!SVEStackSize)
2658 return StackOffset::getFixed(ObjectOffset - getOffsetOfLocalArea());
2659
2660 const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
2661 if (MFI.getStackID(FI) == TargetStackID::ScalableVector) {
2662 return StackOffset::get(-((int64_t)AFI->getCalleeSavedStackSize()),
2663 ObjectOffset);
2664 }
2665
2666 bool IsFixed = MFI.isFixedObjectIndex(FI);
2667 bool IsCSR =
2668 !IsFixed && ObjectOffset >= -((int)AFI->getCalleeSavedStackSize(MFI));
2669
2670 StackOffset ScalableOffset = {};
2671 if (!IsFixed && !IsCSR)
2672 ScalableOffset = -SVEStackSize;
2673
2674 return StackOffset::getFixed(ObjectOffset) + ScalableOffset;
2675}
2676
2677 StackOffset
2678 AArch64FrameLowering::getNonLocalFrameIndexReference(const MachineFunction &MF,
2679 int FI) const {
2680 return StackOffset::getFixed(getSEHFrameIndexOffset(MF, FI));
2681 }
2682
2683 static StackOffset getFPOffset(const MachineFunction &MF,
2684 int64_t ObjectOffset) {
2685 const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
2686 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
2687 const Function &F = MF.getFunction();
2688 bool IsWin64 = Subtarget.isCallingConvWin64(F.getCallingConv(), F.isVarArg());
2689 unsigned FixedObject =
2690 getFixedObjectSize(MF, AFI, IsWin64, /*IsFunclet=*/false);
2691 int64_t CalleeSaveSize = AFI->getCalleeSavedStackSize(MF.getFrameInfo());
2692 int64_t FPAdjust =
2693 CalleeSaveSize - AFI->getCalleeSaveBaseToFrameRecordOffset();
2694 return StackOffset::getFixed(ObjectOffset + FixedObject + FPAdjust);
2695}
2696
2697 static StackOffset getStackOffset(const MachineFunction &MF,
2698 int64_t ObjectOffset) {
2699 const auto &MFI = MF.getFrameInfo();
2700 return StackOffset::getFixed(ObjectOffset + (int64_t)MFI.getStackSize());
2701}
2702
2703// TODO: This function currently does not work for scalable vectors.
2704 int AArch64FrameLowering::getSEHFrameIndexOffset(const MachineFunction &MF,
2705 int FI) const {
2706 const auto *RegInfo = static_cast<const AArch64RegisterInfo *>(
2707 MF.getSubtarget().getRegisterInfo());
2708 int ObjectOffset = MF.getFrameInfo().getObjectOffset(FI);
2709 return RegInfo->getLocalAddressRegister(MF) == AArch64::FP
2710 ? getFPOffset(MF, ObjectOffset).getFixed()
2711 : getStackOffset(MF, ObjectOffset).getFixed();
2712}
2713
2714 StackOffset AArch64FrameLowering::resolveFrameIndexReference(
2715 const MachineFunction &MF, int FI, Register &FrameReg, bool PreferFP,
2716 bool ForSimm) const {
2717 const auto &MFI = MF.getFrameInfo();
2718 int64_t ObjectOffset = MFI.getObjectOffset(FI);
2719 bool isFixed = MFI.isFixedObjectIndex(FI);
2720 bool isSVE = MFI.getStackID(FI) == TargetStackID::ScalableVector;
2721 return resolveFrameOffsetReference(MF, ObjectOffset, isFixed, isSVE, FrameReg,
2722 PreferFP, ForSimm);
2723}
2724
2725 StackOffset AArch64FrameLowering::resolveFrameOffsetReference(
2726 const MachineFunction &MF, int64_t ObjectOffset, bool isFixed, bool isSVE,
2727 Register &FrameReg, bool PreferFP, bool ForSimm) const {
2728 const auto &MFI = MF.getFrameInfo();
2729 const auto *RegInfo = static_cast<const AArch64RegisterInfo *>(
2730 MF.getSubtarget().getRegisterInfo());
2731 const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
2732 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
2733
2734 int64_t FPOffset = getFPOffset(MF, ObjectOffset).getFixed();
2735 int64_t Offset = getStackOffset(MF, ObjectOffset).getFixed();
2736 bool isCSR =
2737 !isFixed && ObjectOffset >= -((int)AFI->getCalleeSavedStackSize(MFI));
2738
2739 const StackOffset &SVEStackSize = getSVEStackSize(MF);
2740
2741 // Use frame pointer to reference fixed objects. Use it for locals if
2742 // there are VLAs or a dynamically realigned SP (and thus the SP isn't
2743 // reliable as a base). Make sure useFPForScavengingIndex() does the
2744 // right thing for the emergency spill slot.
2745 bool UseFP = false;
2746 if (AFI->hasStackFrame() && !isSVE) {
2747 // We shouldn't prefer using the FP to access fixed-sized stack objects when
2748 // there are scalable (SVE) objects in between the FP and the fixed-sized
2749 // objects.
2750 PreferFP &= !SVEStackSize;
2751
2752 // Note: Keeping the following as multiple 'if' statements rather than
2753 // merging to a single expression for readability.
2754 //
2755 // Argument access should always use the FP.
2756 if (isFixed) {
2757 UseFP = hasFP(MF);
2758 } else if (isCSR && RegInfo->hasStackRealignment(MF)) {
2759 // References to the CSR area must use FP if we're re-aligning the stack
2760 // since the dynamically-sized alignment padding is between the SP/BP and
2761 // the CSR area.
2762 assert(hasFP(MF) && "Re-aligned stack must have frame pointer");
2763 UseFP = true;
2764 } else if (hasFP(MF) && !RegInfo->hasStackRealignment(MF)) {
2765 // If the FPOffset is negative and we're producing a signed immediate, we
2766 // have to keep in mind that the available offset range for negative
2767 // offsets is smaller than for positive ones. If an offset is available
2768 // via the FP and the SP, use whichever is closest.
2769 bool FPOffsetFits = !ForSimm || FPOffset >= -256;
2770 PreferFP |= Offset > -FPOffset && !SVEStackSize;
2771
2772 if (FPOffset >= 0) {
2773 // If the FPOffset is positive, that'll always be best, as the SP/BP
2774 // will be even further away.
2775 UseFP = true;
2776 } else if (MFI.hasVarSizedObjects()) {
2777 // If we have variable sized objects, we can use either FP or BP, as the
2778 // SP offset is unknown. We can use the base pointer if we have one and
2779 // FP is not preferred. If not, we're stuck with using FP.
2780 bool CanUseBP = RegInfo->hasBasePointer(MF);
2781 if (FPOffsetFits && CanUseBP) // Both are ok. Pick the best.
2782 UseFP = PreferFP;
2783 else if (!CanUseBP) // Can't use BP. Forced to use FP.
2784 UseFP = true;
2785 // else we can use BP and FP, but the offset from FP won't fit.
2786 // That will make us scavenge registers which we can probably avoid by
2787 // using BP. If it won't fit for BP either, we'll scavenge anyway.
2788 } else if (MF.hasEHFunclets() && !RegInfo->hasBasePointer(MF)) {
2789 // Funclets access the locals contained in the parent's stack frame
2790 // via the frame pointer, so we have to use the FP in the parent
2791 // function.
2792 (void) Subtarget;
2793 assert(Subtarget.isCallingConvWin64(MF.getFunction().getCallingConv(),
2794 MF.getFunction().isVarArg()) &&
2795 "Funclets should only be present on Win64");
2796 UseFP = true;
2797 } else {
2798 // We have the choice between FP and (SP or BP).
2799 if (FPOffsetFits && PreferFP) // If FP is the best fit, use it.
2800 UseFP = true;
2801 }
2802 }
2803 }
2804
2805 assert(
2806 ((isFixed || isCSR) || !RegInfo->hasStackRealignment(MF) || !UseFP) &&
2807 "In the presence of dynamic stack pointer realignment, "
2808 "non-argument/CSR objects cannot be accessed through the frame pointer");
2809
2810 if (isSVE) {
2811 StackOffset FPOffset =
2812 StackOffset::get(-AFI->getCalleeSavedStackSize(), ObjectOffset);
2813 StackOffset SPOffset =
2814 SVEStackSize +
2815 StackOffset::get(MFI.getStackSize() - AFI->getCalleeSavedStackSize(),
2816 ObjectOffset);
2817 // Always use the FP for SVE spills if available and beneficial.
2818 if (hasFP(MF) && (SPOffset.getFixed() ||
2819 FPOffset.getScalable() < SPOffset.getScalable() ||
2820 RegInfo->hasStackRealignment(MF))) {
2821 FrameReg = RegInfo->getFrameRegister(MF);
2822 return FPOffset;
2823 }
2824
2825 FrameReg = RegInfo->hasBasePointer(MF) ? RegInfo->getBaseRegister()
2826 : (unsigned)AArch64::SP;
2827 return SPOffset;
2828 }
2829
2830 StackOffset ScalableOffset = {};
2831 if (UseFP && !(isFixed || isCSR))
2832 ScalableOffset = -SVEStackSize;
2833 if (!UseFP && (isFixed || isCSR))
2834 ScalableOffset = SVEStackSize;
2835
2836 if (UseFP) {
2837 FrameReg = RegInfo->getFrameRegister(MF);
2838 return StackOffset::getFixed(FPOffset) + ScalableOffset;
2839 }
2840
2841 // Use the base pointer if we have one.
2842 if (RegInfo->hasBasePointer(MF))
2843 FrameReg = RegInfo->getBaseRegister();
2844 else {
2845 assert(!MFI.hasVarSizedObjects() &&
2846 "Can't use SP when we have var sized objects.");
2847 FrameReg = AArch64::SP;
2848 // If we're using the red zone for this function, the SP won't actually
2849 // be adjusted, so the offsets will be negative. They're also all
2850 // within range of the signed 9-bit immediate instructions.
2851 if (canUseRedZone(MF))
2852 Offset -= AFI->getLocalStackSize();
2853 }
2854
2855 return StackOffset::getFixed(Offset) + ScalableOffset;
2856}
2857
2858static unsigned getPrologueDeath(MachineFunction &MF, unsigned Reg) {
2859 // Do not set a kill flag on values that are also marked as live-in. This
2860 // happens with the @llvm.returnaddress intrinsic and with arguments passed in
2861 // callee saved registers.
2862 // Omitting the kill flags is conservatively correct even if the live-in
2863 // is not used after all.
2864 bool IsLiveIn = MF.getRegInfo().isLiveIn(Reg);
2865 return getKillRegState(!IsLiveIn);
2866}
2867
2868 static bool produceCompactUnwindFrame(MachineFunction &MF) {
2869 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
2870 AttributeList Attrs = MF.getFunction().getAttributes();
2871 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
2872 return Subtarget.isTargetMachO() &&
2873 !(Subtarget.getTargetLowering()->supportSwiftError() &&
2874 Attrs.hasAttrSomewhere(Attribute::SwiftError)) &&
2875 MF.getFunction().getCallingConv() != CallingConv::SwiftTail &&
2876 !requiresSaveVG(MF) && AFI->getSVECalleeSavedStackSize() == 0;
2877}
2878
2879static bool invalidateWindowsRegisterPairing(unsigned Reg1, unsigned Reg2,
2880 bool NeedsWinCFI, bool IsFirst,
2881 const TargetRegisterInfo *TRI) {
2882 // If we are generating register pairs for a Windows function that requires
2883 // EH support, then pair consecutive registers only. There are no unwind
2884 // opcodes for saves/restores of non-consecutive register pairs.
2885 // The unwind opcodes are save_regp, save_regp_x, save_fregp, save_fregp_x,
2886 // save_lrpair.
2887 // https://docs.microsoft.com/en-us/cpp/build/arm64-exception-handling
2888
2889 if (Reg2 == AArch64::FP)
2890 return true;
2891 if (!NeedsWinCFI)
2892 return false;
2893 if (TRI->getEncodingValue(Reg2) == TRI->getEncodingValue(Reg1) + 1)
2894 return false;
2895 // If pairing a GPR with LR, the pair can be described by the save_lrpair
2896 // opcode. If this is the first register pair, it would end up with a
2897 // predecrement, but there's no save_lrpair_x opcode, so we can only do this
2898 // if LR is paired with something else than the first register.
2899 // The save_lrpair opcode requires the first register to be an odd one.
2900 if (Reg1 >= AArch64::X19 && Reg1 <= AArch64::X27 &&
2901 (Reg1 - AArch64::X19) % 2 == 0 && Reg2 == AArch64::LR && !IsFirst)
2902 return false;
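// [Editor's note] Example: (x21, lr) passes the check above and can be
// described by save_lrpair, whereas (x20, lr) falls through and the pairing
// is rejected.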
2903 return true;
2904}
2905
2906/// Returns true if Reg1 and Reg2 cannot be paired using a ldp/stp instruction.
2907/// WindowsCFI requires that only consecutive registers can be paired.
2908/// LR and FP need to be allocated together when the frame needs to save
2909/// the frame-record. This means any other register pairing with LR is invalid.
2910static bool invalidateRegisterPairing(unsigned Reg1, unsigned Reg2,
2911 bool UsesWinAAPCS, bool NeedsWinCFI,
2912 bool NeedsFrameRecord, bool IsFirst,
2913 const TargetRegisterInfo *TRI) {
2914 if (UsesWinAAPCS)
2915 return invalidateWindowsRegisterPairing(Reg1, Reg2, NeedsWinCFI, IsFirst,
2916 TRI);
2917
2918 // If we need to store the frame record, don't pair any register
2919 // with LR other than FP.
2920 if (NeedsFrameRecord)
2921 return Reg2 == AArch64::LR;
2922
2923 return false;
2924}
2925
2926namespace {
2927
2928struct RegPairInfo {
2929 unsigned Reg1 = AArch64::NoRegister;
2930 unsigned Reg2 = AArch64::NoRegister;
2931 int FrameIdx;
2932 int Offset;
2933 enum RegType { GPR, FPR64, FPR128, PPR, ZPR, VG } Type;
2934 const TargetRegisterClass *RC;
2935
2936 RegPairInfo() = default;
2937
2938 bool isPaired() const { return Reg2 != AArch64::NoRegister; }
2939
2940 bool isScalable() const { return Type == PPR || Type == ZPR; }
2941};
2942
2943} // end anonymous namespace
2944
2945unsigned findFreePredicateReg(BitVector &SavedRegs) {
2946 for (unsigned PReg = AArch64::P8; PReg <= AArch64::P15; ++PReg) {
2947 if (SavedRegs.test(PReg)) {
2948 unsigned PNReg = PReg - AArch64::P0 + AArch64::PN0;
2949 return PNReg;
2950 }
2951 }
2952 return AArch64::NoRegister;
2953}
2954
2955 // The multivector LD/ST instructions are available only for SME or SVE2p1 targets.
2956 static bool enableMultiVectorSpillFill(const AArch64Subtarget &Subtarget,
2957 MachineFunction &MF) {
2958 if (DisableMultiVectorSpillFill)
2959 return false;
2960
2961 SMEAttrs FuncAttrs(MF.getFunction());
2962 bool IsLocallyStreaming =
2963 FuncAttrs.hasStreamingBody() && !FuncAttrs.hasStreamingInterface();
2964
2965 // SME2 instructions can be used safely only when in streaming mode.
2966 // It is not safe to use SME2 instructions when in streaming-compatible or
2967 // locally-streaming mode.
2968 return Subtarget.hasSVE2p1() ||
2969 (Subtarget.hasSME2() &&
2970 (!IsLocallyStreaming && Subtarget.isStreaming()));
2971}
2972
2973 static void computeCalleeSaveRegisterPairs(
2974 MachineFunction &MF, ArrayRef<CalleeSavedInfo> CSI,
2975 const TargetRegisterInfo *TRI, SmallVectorImpl<RegPairInfo> &RegPairs,
2976 bool NeedsFrameRecord) {
2977
2978 if (CSI.empty())
2979 return;
2980
2981 bool IsWindows = isTargetWindows(MF);
2982 bool NeedsWinCFI = needsWinCFI(MF);
2983 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
2984 unsigned StackHazardSize = getStackHazardSize(MF);
2985 MachineFrameInfo &MFI = MF.getFrameInfo();
2986 CallingConv::ID CC = MF.getFunction().getCallingConv();
2987 unsigned Count = CSI.size();
2988 (void)CC;
2989 // MachO's compact unwind format relies on all registers being stored in
2990 // pairs.
2993 CC == CallingConv::Win64 || (Count & 1) == 0) &&
2994 "Odd number of callee-saved regs to spill!");
2995 int ByteOffset = AFI->getCalleeSavedStackSize();
2996 int StackFillDir = -1;
2997 int RegInc = 1;
2998 unsigned FirstReg = 0;
2999 if (NeedsWinCFI) {
3000 // For WinCFI, fill the stack from the bottom up.
3001 ByteOffset = 0;
3002 StackFillDir = 1;
3003 // As the CSI array is reversed to match PrologEpilogInserter, iterate
3004 // backwards, to pair up registers starting from lower numbered registers.
3005 RegInc = -1;
3006 FirstReg = Count - 1;
3007 }
3008 int ScalableByteOffset = AFI->getSVECalleeSavedStackSize();
3009 bool NeedGapToAlignStack = AFI->hasCalleeSaveStackFreeSpace();
3010 Register LastReg = 0;
3011
3012 // When iterating backwards, the loop condition relies on unsigned wraparound.
3013 for (unsigned i = FirstReg; i < Count; i += RegInc) {
3014 RegPairInfo RPI;
3015 RPI.Reg1 = CSI[i].getReg();
3016
3017 if (AArch64::GPR64RegClass.contains(RPI.Reg1)) {
3018 RPI.Type = RegPairInfo::GPR;
3019 RPI.RC = &AArch64::GPR64RegClass;
3020 } else if (AArch64::FPR64RegClass.contains(RPI.Reg1)) {
3021 RPI.Type = RegPairInfo::FPR64;
3022 RPI.RC = &AArch64::FPR64RegClass;
3023 } else if (AArch64::FPR128RegClass.contains(RPI.Reg1)) {
3024 RPI.Type = RegPairInfo::FPR128;
3025 RPI.RC = &AArch64::FPR128RegClass;
3026 } else if (AArch64::ZPRRegClass.contains(RPI.Reg1)) {
3027 RPI.Type = RegPairInfo::ZPR;
3028 RPI.RC = &AArch64::ZPRRegClass;
3029 } else if (AArch64::PPRRegClass.contains(RPI.Reg1)) {
3030 RPI.Type = RegPairInfo::PPR;
3031 RPI.RC = &AArch64::PPRRegClass;
3032 } else if (RPI.Reg1 == AArch64::VG) {
3033 RPI.Type = RegPairInfo::VG;
3034 RPI.RC = &AArch64::FIXED_REGSRegClass;
3035 } else {
3036 llvm_unreachable("Unsupported register class.");
3037 }
3038
3039 // Add the stack hazard size as we transition from GPR->FPR CSRs.
3040 if (AFI->hasStackHazardSlotIndex() &&
3041 (!LastReg || !AArch64InstrInfo::isFpOrNEON(LastReg)) &&
3042 AArch64InstrInfo::isFpOrNEON(RPI.Reg1))
3043 ByteOffset += StackFillDir * StackHazardSize;
3044 LastReg = RPI.Reg1;
3045
3046 int Scale = TRI->getSpillSize(*RPI.RC);
3047 // Add the next reg to the pair if it is in the same register class.
3048 if (unsigned(i + RegInc) < Count && !AFI->hasStackHazardSlotIndex()) {
3049 Register NextReg = CSI[i + RegInc].getReg();
3050 bool IsFirst = i == FirstReg;
3051 switch (RPI.Type) {
3052 case RegPairInfo::GPR:
3053 if (AArch64::GPR64RegClass.contains(NextReg) &&
3054 !invalidateRegisterPairing(RPI.Reg1, NextReg, IsWindows,
3055 NeedsWinCFI, NeedsFrameRecord, IsFirst,
3056 TRI))
3057 RPI.Reg2 = NextReg;
3058 break;
3059 case RegPairInfo::FPR64:
3060 if (AArch64::FPR64RegClass.contains(NextReg) &&
3061 !invalidateWindowsRegisterPairing(RPI.Reg1, NextReg, NeedsWinCFI,
3062 IsFirst, TRI))
3063 RPI.Reg2 = NextReg;
3064 break;
3065 case RegPairInfo::FPR128:
3066 if (AArch64::FPR128RegClass.contains(NextReg))
3067 RPI.Reg2 = NextReg;
3068 break;
3069 case RegPairInfo::PPR:
3070 break;
3071 case RegPairInfo::ZPR:
3072 if (AFI->getPredicateRegForFillSpill() != 0 &&
3073 ((RPI.Reg1 - AArch64::Z0) & 1) == 0 && (NextReg == RPI.Reg1 + 1)) {
3074 // Calculate offset of register pair to see if pair instruction can be
3075 // used.
3076 int Offset = (ScalableByteOffset + StackFillDir * 2 * Scale) / Scale;
3077 if ((-16 <= Offset && Offset <= 14) && (Offset % 2 == 0))
3078 RPI.Reg2 = NextReg;
3079 }
3080 break;
3081 case RegPairInfo::VG:
3082 break;
3083 }
3084 }
3085
3086 // GPRs and FPRs are saved in pairs of 64-bit regs. We expect the CSI
3087 // list to come in sorted by frame index so that we can issue the store
3088 // pair instructions directly. Assert if we see anything otherwise.
3089 //
3090 // The order of the registers in the list is controlled by
3091 // getCalleeSavedRegs(), so they will always be in-order, as well.
3092 assert((!RPI.isPaired() ||
3093 (CSI[i].getFrameIdx() + RegInc == CSI[i + RegInc].getFrameIdx())) &&
3094 "Out of order callee saved regs!");
3095
3096 assert((!RPI.isPaired() || !NeedsFrameRecord || RPI.Reg2 != AArch64::FP ||
3097 RPI.Reg1 == AArch64::LR) &&
3098 "FrameRecord must be allocated together with LR");
3099
3100 // Windows AAPCS has FP and LR reversed.
3101 assert((!RPI.isPaired() || !NeedsFrameRecord || RPI.Reg1 != AArch64::FP ||
3102 RPI.Reg2 == AArch64::LR) &&
3103 "FrameRecord must be allocated together with LR");
3104
3105 // MachO's compact unwind format relies on all registers being stored in
3106 // adjacent register pairs.
3110 (RPI.isPaired() &&
3111 ((RPI.Reg1 == AArch64::LR && RPI.Reg2 == AArch64::FP) ||
3112 RPI.Reg1 + 1 == RPI.Reg2))) &&
3113 "Callee-save registers not saved as adjacent register pair!");
3114
3115 RPI.FrameIdx = CSI[i].getFrameIdx();
3116 if (NeedsWinCFI &&
3117 RPI.isPaired()) // RPI.FrameIdx must be the lower index of the pair
3118 RPI.FrameIdx = CSI[i + RegInc].getFrameIdx();
3119
3120 int OffsetPre = RPI.isScalable() ? ScalableByteOffset : ByteOffset;
3121 assert(OffsetPre % Scale == 0);
3122
3123 if (RPI.isScalable())
3124 ScalableByteOffset += StackFillDir * (RPI.isPaired() ? 2 * Scale : Scale);
3125 else
3126 ByteOffset += StackFillDir * (RPI.isPaired() ? 2 * Scale : Scale);
3127
3128 // Swift's async context is directly before FP, so allocate an extra
3129 // 8 bytes for it.
3130 if (NeedsFrameRecord && AFI->hasSwiftAsyncContext() &&
3131 ((!IsWindows && RPI.Reg2 == AArch64::FP) ||
3132 (IsWindows && RPI.Reg2 == AArch64::LR)))
3133 ByteOffset += StackFillDir * 8;
3134
3135 // Round up size of non-pair to pair size if we need to pad the
3136 // callee-save area to ensure 16-byte alignment.
3137 if (NeedGapToAlignStack && !NeedsWinCFI && !RPI.isScalable() &&
3138 RPI.Type != RegPairInfo::FPR128 && !RPI.isPaired() &&
3139 ByteOffset % 16 != 0) {
3140 ByteOffset += 8 * StackFillDir;
3141 assert(MFI.getObjectAlign(RPI.FrameIdx) <= Align(16));
3142 // A stack frame with a gap looks like this, bottom up:
3143 // d9, d8. x21, gap, x20, x19.
3144 // Set extra alignment on the x21 object to create the gap above it.
3145 MFI.setObjectAlignment(RPI.FrameIdx, Align(16));
3146 NeedGapToAlignStack = false;
3147 }
3148
3149 int OffsetPost = RPI.isScalable() ? ScalableByteOffset : ByteOffset;
3150 assert(OffsetPost % Scale == 0);
3151 // If filling top down (default), we want the offset after incrementing it.
3152 // If filling bottom up (WinCFI) we need the original offset.
3153 int Offset = NeedsWinCFI ? OffsetPre : OffsetPost;
3154
3155 // The FP, LR pair goes 8 bytes into our expanded 24-byte slot so that the
3156 // Swift context can directly precede FP.
3157 if (NeedsFrameRecord && AFI->hasSwiftAsyncContext() &&
3158 ((!IsWindows && RPI.Reg2 == AArch64::FP) ||
3159 (IsWindows && RPI.Reg2 == AArch64::LR)))
3160 Offset += 8;
3161 RPI.Offset = Offset / Scale;
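// [Editor's note] Example: a GPR pair (Scale == 8) placed at byte offset 32
// gets RPI.Offset == 4, the scaled immediate later emitted as
// "stp ..., [sp, #32]".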
3162
3163 assert((!RPI.isPaired() ||
3164 (!RPI.isScalable() && RPI.Offset >= -64 && RPI.Offset <= 63) ||
3165 (RPI.isScalable() && RPI.Offset >= -256 && RPI.Offset <= 255)) &&
3166 "Offset out of bounds for LDP/STP immediate");
3167
3168 auto isFrameRecord = [&] {
3169 if (RPI.isPaired())
3170 return IsWindows ? RPI.Reg1 == AArch64::FP && RPI.Reg2 == AArch64::LR
3171 : RPI.Reg1 == AArch64::LR && RPI.Reg2 == AArch64::FP;
3172 // Otherwise, look for the frame record as two unpaired registers. This is
3173 // needed for -aarch64-stack-hazard-size=<val>, which disables register
3174 // pairing (as the padding may be too large for the LDP/STP offset). Note:
3175 // On Windows, this check works out as current reg == FP, next reg == LR,
3176 // and on other platforms current reg == FP, previous reg == LR. This
3177 // works out as the correct pre-increment or post-increment offsets
3178 // respectively.
3179 return i > 0 && RPI.Reg1 == AArch64::FP &&
3180 CSI[i - 1].getReg() == AArch64::LR;
3181 };
3182
3183 // Save the offset to frame record so that the FP register can point to the
3184 // innermost frame record (spilled FP and LR registers).
3185 if (NeedsFrameRecord && isFrameRecord())
3186 AFI->setCalleeSaveBaseToFrameRecordOffset(Offset);
3187
3188 RegPairs.push_back(RPI);
3189 if (RPI.isPaired())
3190 i += RegInc;
3191 }
3192 if (NeedsWinCFI) {
3193 // If we need an alignment gap in the stack, align the topmost stack
3194 // object. A stack frame with a gap looks like this, bottom up:
3195 // x19, d8. d9, gap.
3196 // Set extra alignment on the topmost stack object (the first element in
3197 // CSI, which goes top down), to create the gap above it.
3198 if (AFI->hasCalleeSaveStackFreeSpace())
3199 MFI.setObjectAlignment(CSI[0].getFrameIdx(), Align(16));
3200 // We iterated bottom up over the registers; flip RegPairs back to top
3201 // down order.
3202 std::reverse(RegPairs.begin(), RegPairs.end());
3203 }
3204}
3205
3206 bool AArch64FrameLowering::spillCalleeSavedRegisters(
3207 MachineBasicBlock &MBB, MachineBasicBlock::iterator MI,
3208 ArrayRef<CalleeSavedInfo> CSI, const TargetRegisterInfo *TRI) const {
3209 MachineFunction &MF = *MBB.getParent();
3210 const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();
3211 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
3212 bool NeedsWinCFI = needsWinCFI(MF);
3213 DebugLoc DL;
3214 SmallVector<RegPairInfo, 8> RegPairs;
3215
3216 computeCalleeSaveRegisterPairs(MF, CSI, TRI, RegPairs, hasFP(MF));
3217
3219 // Refresh the reserved regs in case there are any potential changes since the
3220 // last freeze.
3221 MRI.freezeReservedRegs();
3222
3223 if (homogeneousPrologEpilog(MF)) {
3224 auto MIB = BuildMI(MBB, MI, DL, TII.get(AArch64::HOM_Prolog))
3225 .setMIFlag(MachineInstr::FrameSetup);
3226
3227 for (auto &RPI : RegPairs) {
3228 MIB.addReg(RPI.Reg1);
3229 MIB.addReg(RPI.Reg2);
3230
3231 // Update register live in.
3232 if (!MRI.isReserved(RPI.Reg1))
3233 MBB.addLiveIn(RPI.Reg1);
3234 if (RPI.isPaired() && !MRI.isReserved(RPI.Reg2))
3235 MBB.addLiveIn(RPI.Reg2);
3236 }
3237 return true;
3238 }
3239 bool PTrueCreated = false;
3240 for (const RegPairInfo &RPI : llvm::reverse(RegPairs)) {
3241 unsigned Reg1 = RPI.Reg1;
3242 unsigned Reg2 = RPI.Reg2;
3243 unsigned StrOpc;
3244
3245 // Issue sequence of spills for cs regs. The first spill may be converted
3246 // to a pre-decrement store later by emitPrologue if the callee-save stack
3247 // area allocation can't be combined with the local stack area allocation.
3248 // For example:
3249 // stp x22, x21, [sp, #0] // addImm(+0)
3250 // stp x20, x19, [sp, #16] // addImm(+2)
3251 // stp fp, lr, [sp, #32] // addImm(+4)
3252 // Rationale: This sequence saves uop updates compared to a sequence of
3253 // pre-increment spills like stp xi,xj,[sp,#-16]!
3254 // Note: Similar rationale and sequence for restores in epilog.
3255 unsigned Size = TRI->getSpillSize(*RPI.RC);
3256 Align Alignment = TRI->getSpillAlign(*RPI.RC);
3257 switch (RPI.Type) {
3258 case RegPairInfo::GPR:
3259 StrOpc = RPI.isPaired() ? AArch64::STPXi : AArch64::STRXui;
3260 break;
3261 case RegPairInfo::FPR64:
3262 StrOpc = RPI.isPaired() ? AArch64::STPDi : AArch64::STRDui;
3263 break;
3264 case RegPairInfo::FPR128:
3265 StrOpc = RPI.isPaired() ? AArch64::STPQi : AArch64::STRQui;
3266 break;
3267 case RegPairInfo::ZPR:
3268 StrOpc = RPI.isPaired() ? AArch64::ST1B_2Z_IMM : AArch64::STR_ZXI;
3269 break;
3270 case RegPairInfo::PPR:
3271 StrOpc =
3272 Size == 16 ? AArch64::SPILL_PPR_TO_ZPR_SLOT_PSEUDO : AArch64::STR_PXI;
3273 break;
3274 case RegPairInfo::VG:
3275 StrOpc = AArch64::STRXui;
3276 break;
3277 }
3278
3279 unsigned X0Scratch = AArch64::NoRegister;
3280 if (Reg1 == AArch64::VG) {
3281      // Find an available register to store the value of VG to.
3283 assert(Reg1 != AArch64::NoRegister);
3284 SMEAttrs Attrs(MF.getFunction());
3285
3286 if (Attrs.hasStreamingBody() && !Attrs.hasStreamingInterface() &&
3287 AFI->getStreamingVGIdx() == std::numeric_limits<int>::max()) {
3288 // For locally-streaming functions, we need to store both the streaming
3289 // & non-streaming VG. Spill the streaming value first.
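        // Editor's note (illustrative): the two instructions built below amount to
        //   rdsvl xN, #1        // streaming vector length in bytes
        //   lsr   xN, xN, #3    // divide by 8: the value VG holds in streaming mode
        // where xN stands for Reg1.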
3290 BuildMI(MBB, MI, DL, TII.get(AArch64::RDSVLI_XI), Reg1)
3291 .addImm(1)
3293 BuildMI(MBB, MI, DL, TII.get(AArch64::UBFMXri), Reg1)
3294 .addReg(Reg1)
3295 .addImm(3)
3296 .addImm(63)
3298
3299 AFI->setStreamingVGIdx(RPI.FrameIdx);
3300 } else if (MF.getSubtarget<AArch64Subtarget>().hasSVE()) {
3301 BuildMI(MBB, MI, DL, TII.get(AArch64::CNTD_XPiI), Reg1)
3302 .addImm(31)
3303 .addImm(1)
3305 AFI->setVGIdx(RPI.FrameIdx);
3306 } else {
3308 if (llvm::any_of(
3309 MBB.liveins(),
3310 [&STI](const MachineBasicBlock::RegisterMaskPair &LiveIn) {
3311 return STI.getRegisterInfo()->isSuperOrSubRegisterEq(
3312 AArch64::X0, LiveIn.PhysReg);
3313 }))
3314 X0Scratch = Reg1;
3315
3316 if (X0Scratch != AArch64::NoRegister)
3317 BuildMI(MBB, MI, DL, TII.get(AArch64::ORRXrr), Reg1)
3318 .addReg(AArch64::XZR)
3319 .addReg(AArch64::X0, RegState::Undef)
3320 .addReg(AArch64::X0, RegState::Implicit)
3322
3323 const uint32_t *RegMask = TRI->getCallPreservedMask(
3324 MF,
3326 BuildMI(MBB, MI, DL, TII.get(AArch64::BL))
3327 .addExternalSymbol("__arm_get_current_vg")
3328 .addRegMask(RegMask)
3329 .addReg(AArch64::X0, RegState::ImplicitDefine)
3331 Reg1 = AArch64::X0;
3332 AFI->setVGIdx(RPI.FrameIdx);
3333 }
3334 }
3335
3336 LLVM_DEBUG(dbgs() << "CSR spill: (" << printReg(Reg1, TRI);
3337 if (RPI.isPaired()) dbgs() << ", " << printReg(Reg2, TRI);
3338 dbgs() << ") -> fi#(" << RPI.FrameIdx;
3339 if (RPI.isPaired()) dbgs() << ", " << RPI.FrameIdx + 1;
3340 dbgs() << ")\n");
3341
3342 assert((!NeedsWinCFI || !(Reg1 == AArch64::LR && Reg2 == AArch64::FP)) &&
3343           "Windows unwinding requires a consecutive (FP,LR) pair");
3344 // Windows unwind codes require consecutive registers if registers are
3345 // paired. Make the switch here, so that the code below will save (x,x+1)
3346 // and not (x+1,x).
3347 unsigned FrameIdxReg1 = RPI.FrameIdx;
3348 unsigned FrameIdxReg2 = RPI.FrameIdx + 1;
3349 if (NeedsWinCFI && RPI.isPaired()) {
3350 std::swap(Reg1, Reg2);
3351 std::swap(FrameIdxReg1, FrameIdxReg2);
3352 }
3353
3354 if (RPI.isPaired() && RPI.isScalable()) {
3355 [[maybe_unused]] const AArch64Subtarget &Subtarget =
3358 unsigned PnReg = AFI->getPredicateRegForFillSpill();
3359 assert((PnReg != 0 && enableMultiVectorSpillFill(Subtarget, MF)) &&
3360 "Expects SVE2.1 or SME2 target and a predicate register");
3361#ifdef EXPENSIVE_CHECKS
3362 auto IsPPR = [](const RegPairInfo &c) {
3363 return c.Reg1 == RegPairInfo::PPR;
3364 };
3365 auto PPRBegin = std::find_if(RegPairs.begin(), RegPairs.end(), IsPPR);
3366 auto IsZPR = [](const RegPairInfo &c) {
3367 return c.Type == RegPairInfo::ZPR;
3368 };
3369 auto ZPRBegin = std::find_if(RegPairs.begin(), RegPairs.end(), IsZPR);
3370 assert(!(PPRBegin < ZPRBegin) &&
3371 "Expected callee save predicate to be handled first");
3372#endif
3373 if (!PTrueCreated) {
3374 PTrueCreated = true;
3375 BuildMI(MBB, MI, DL, TII.get(AArch64::PTRUE_C_B), PnReg)
3377 }
3378 MachineInstrBuilder MIB = BuildMI(MBB, MI, DL, TII.get(StrOpc));
3379 if (!MRI.isReserved(Reg1))
3380 MBB.addLiveIn(Reg1);
3381 if (!MRI.isReserved(Reg2))
3382 MBB.addLiveIn(Reg2);
3383 MIB.addReg(/*PairRegs*/ AArch64::Z0_Z1 + (RPI.Reg1 - AArch64::Z0));
3385 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
3386 MachineMemOperand::MOStore, Size, Alignment));
3387 MIB.addReg(PnReg);
3388 MIB.addReg(AArch64::SP)
3389 .addImm(RPI.Offset / 2) // [sp, #imm*2*vscale],
3390 // where 2*vscale is implicit
3393 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
3394 MachineMemOperand::MOStore, Size, Alignment));
3395 if (NeedsWinCFI)
3397    } else { // The case where the pair of ZRegs is not present
3398 MachineInstrBuilder MIB = BuildMI(MBB, MI, DL, TII.get(StrOpc));
3399 if (!MRI.isReserved(Reg1))
3400 MBB.addLiveIn(Reg1);
3401 if (RPI.isPaired()) {
3402 if (!MRI.isReserved(Reg2))
3403 MBB.addLiveIn(Reg2);
3404 MIB.addReg(Reg2, getPrologueDeath(MF, Reg2));
3406 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
3407 MachineMemOperand::MOStore, Size, Alignment));
3408 }
3409 MIB.addReg(Reg1, getPrologueDeath(MF, Reg1))
3410 .addReg(AArch64::SP)
3411 .addImm(RPI.Offset) // [sp, #offset*vscale],
3412 // where factor*vscale is implicit
3415 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
3416 MachineMemOperand::MOStore, Size, Alignment));
3417 if (NeedsWinCFI)
3419 }
3420 // Update the StackIDs of the SVE stack slots.
3421 MachineFrameInfo &MFI = MF.getFrameInfo();
3422 if (RPI.Type == RegPairInfo::ZPR || RPI.Type == RegPairInfo::PPR) {
3423 MFI.setStackID(FrameIdxReg1, TargetStackID::ScalableVector);
3424 if (RPI.isPaired())
3425 MFI.setStackID(FrameIdxReg2, TargetStackID::ScalableVector);
3426 }
3427
3428 if (X0Scratch != AArch64::NoRegister)
3429 BuildMI(MBB, MI, DL, TII.get(AArch64::ORRXrr), AArch64::X0)
3430 .addReg(AArch64::XZR)
3431 .addReg(X0Scratch, RegState::Undef)
3432 .addReg(X0Scratch, RegState::Implicit)
3434 }
3435 return true;
3436}
3437
3441 MachineFunction &MF = *MBB.getParent();
3443 DebugLoc DL;
3445 bool NeedsWinCFI = needsWinCFI(MF);
3446
3447 if (MBBI != MBB.end())
3448 DL = MBBI->getDebugLoc();
3449
3450 computeCalleeSaveRegisterPairs(MF, CSI, TRI, RegPairs, hasFP(MF));
3451 if (homogeneousPrologEpilog(MF, &MBB)) {
3452 auto MIB = BuildMI(MBB, MBBI, DL, TII.get(AArch64::HOM_Epilog))
3454 for (auto &RPI : RegPairs) {
3455 MIB.addReg(RPI.Reg1, RegState::Define);
3456 MIB.addReg(RPI.Reg2, RegState::Define);
3457 }
3458 return true;
3459 }
3460
3461  // For performance reasons, restore the SVE registers in increasing order.
3462 auto IsPPR = [](const RegPairInfo &c) { return c.Type == RegPairInfo::PPR; };
3463 auto PPRBegin = std::find_if(RegPairs.begin(), RegPairs.end(), IsPPR);
3464 auto PPREnd = std::find_if_not(PPRBegin, RegPairs.end(), IsPPR);
3465 std::reverse(PPRBegin, PPREnd);
3466 auto IsZPR = [](const RegPairInfo &c) { return c.Type == RegPairInfo::ZPR; };
3467 auto ZPRBegin = std::find_if(RegPairs.begin(), RegPairs.end(), IsZPR);
3468 auto ZPREnd = std::find_if_not(ZPRBegin, RegPairs.end(), IsZPR);
3469 std::reverse(ZPRBegin, ZPREnd);
3470
3471 bool PTrueCreated = false;
3472 for (const RegPairInfo &RPI : RegPairs) {
3473 unsigned Reg1 = RPI.Reg1;
3474 unsigned Reg2 = RPI.Reg2;
3475
3476 // Issue sequence of restores for cs regs. The last restore may be converted
3477 // to a post-increment load later by emitEpilogue if the callee-save stack
3478 // area allocation can't be combined with the local stack area allocation.
3479 // For example:
3480 // ldp fp, lr, [sp, #32] // addImm(+4)
3481 // ldp x20, x19, [sp, #16] // addImm(+2)
3482 // ldp x22, x21, [sp, #0] // addImm(+0)
3483 // Note: see comment in spillCalleeSavedRegisters()
3484 unsigned LdrOpc;
3485 unsigned Size = TRI->getSpillSize(*RPI.RC);
3486 Align Alignment = TRI->getSpillAlign(*RPI.RC);
3487 switch (RPI.Type) {
3488 case RegPairInfo::GPR:
3489 LdrOpc = RPI.isPaired() ? AArch64::LDPXi : AArch64::LDRXui;
3490 break;
3491 case RegPairInfo::FPR64:
3492 LdrOpc = RPI.isPaired() ? AArch64::LDPDi : AArch64::LDRDui;
3493 break;
3494 case RegPairInfo::FPR128:
3495 LdrOpc = RPI.isPaired() ? AArch64::LDPQi : AArch64::LDRQui;
3496 break;
3497 case RegPairInfo::ZPR:
3498 LdrOpc = RPI.isPaired() ? AArch64::LD1B_2Z_IMM : AArch64::LDR_ZXI;
3499 break;
3500 case RegPairInfo::PPR:
3501 LdrOpc = Size == 16 ? AArch64::FILL_PPR_FROM_ZPR_SLOT_PSEUDO
3502 : AArch64::LDR_PXI;
3503 break;
3504 case RegPairInfo::VG:
3505 continue;
3506 }
3507 LLVM_DEBUG(dbgs() << "CSR restore: (" << printReg(Reg1, TRI);
3508 if (RPI.isPaired()) dbgs() << ", " << printReg(Reg2, TRI);
3509 dbgs() << ") -> fi#(" << RPI.FrameIdx;
3510 if (RPI.isPaired()) dbgs() << ", " << RPI.FrameIdx + 1;
3511 dbgs() << ")\n");
3512
3513 // Windows unwind codes require consecutive registers if registers are
3514    // paired. Make the switch here, so that the code below will restore (x,x+1)
3515 // and not (x+1,x).
3516 unsigned FrameIdxReg1 = RPI.FrameIdx;
3517 unsigned FrameIdxReg2 = RPI.FrameIdx + 1;
3518 if (NeedsWinCFI && RPI.isPaired()) {
3519 std::swap(Reg1, Reg2);
3520 std::swap(FrameIdxReg1, FrameIdxReg2);
3521 }
3522
3524 if (RPI.isPaired() && RPI.isScalable()) {
3525 [[maybe_unused]] const AArch64Subtarget &Subtarget =
3527 unsigned PnReg = AFI->getPredicateRegForFillSpill();
3528 assert((PnReg != 0 && enableMultiVectorSpillFill(Subtarget, MF)) &&
3529 "Expects SVE2.1 or SME2 target and a predicate register");
3530#ifdef EXPENSIVE_CHECKS
3531 assert(!(PPRBegin < ZPRBegin) &&
3532 "Expected callee save predicate to be handled first");
3533#endif
3534 if (!PTrueCreated) {
3535 PTrueCreated = true;
3536 BuildMI(MBB, MBBI, DL, TII.get(AArch64::PTRUE_C_B), PnReg)
3538 }
3539 MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII.get(LdrOpc));
3540 MIB.addReg(/*PairRegs*/ AArch64::Z0_Z1 + (RPI.Reg1 - AArch64::Z0),
3541 getDefRegState(true));
3543 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
3544 MachineMemOperand::MOLoad, Size, Alignment));
3545 MIB.addReg(PnReg);
3546 MIB.addReg(AArch64::SP)
3547 .addImm(RPI.Offset / 2) // [sp, #imm*2*vscale]
3548 // where 2*vscale is implicit
3551 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
3552 MachineMemOperand::MOLoad, Size, Alignment));
3553 if (NeedsWinCFI)
3555 } else {
3556 MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII.get(LdrOpc));
3557 if (RPI.isPaired()) {
3558 MIB.addReg(Reg2, getDefRegState(true));
3560 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
3561 MachineMemOperand::MOLoad, Size, Alignment));
3562 }
3563 MIB.addReg(Reg1, getDefRegState(true));
3564 MIB.addReg(AArch64::SP)
3565 .addImm(RPI.Offset) // [sp, #offset*vscale]
3566 // where factor*vscale is implicit
3569 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
3570 MachineMemOperand::MOLoad, Size, Alignment));
3571 if (NeedsWinCFI)
3573 }
3574 }
3575 return true;
3576}
3577
3578// Return the FrameID for a MMO.
3579static std::optional<int> getMMOFrameID(MachineMemOperand *MMO,
3580 const MachineFrameInfo &MFI) {
3581 auto *PSV =
3582 dyn_cast_or_null<FixedStackPseudoSourceValue>(MMO->getPseudoValue());
3583 if (PSV)
3584 return std::optional<int>(PSV->getFrameIndex());
3585
3586 if (MMO->getValue()) {
3587 if (auto *Al = dyn_cast<AllocaInst>(getUnderlyingObject(MMO->getValue()))) {
3588 for (int FI = MFI.getObjectIndexBegin(); FI < MFI.getObjectIndexEnd();
3589 FI++)
3590 if (MFI.getObjectAllocation(FI) == Al)
3591 return FI;
3592 }
3593 }
3594
3595 return std::nullopt;
3596}
3597
3598// Return the FrameID for a Load/Store instruction by looking at the first MMO.
3599static std::optional<int> getLdStFrameID(const MachineInstr &MI,
3600 const MachineFrameInfo &MFI) {
3601 if (!MI.mayLoadOrStore() || MI.getNumMemOperands() < 1)
3602 return std::nullopt;
3603
3604 return getMMOFrameID(*MI.memoperands_begin(), MFI);
3605}
3606
3607// Check if a Hazard slot is needed for the current function, and if so create
3608// one for it. The index is stored in AArch64FunctionInfo->StackHazardSlotIndex,
3609// which can be used to determine if any hazard padding is needed.
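// Editor's sketch (sizes illustrative): with a hazard size of 1024, a function
// that saves d8 or accesses an FPR/SVE stack object gets an extra 1024-byte,
// 16-byte-aligned stack object, separating the GPR and FPR/SVE areas so their
// accesses do not land in the same SP-relative region.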
3610void AArch64FrameLowering::determineStackHazardSlot(
3611 MachineFunction &MF, BitVector &SavedRegs) const {
3612 unsigned StackHazardSize = getStackHazardSize(MF);
3613 if (StackHazardSize == 0 || StackHazardSize % 16 != 0 ||
3615 return;
3616
3617 // Stack hazards are only needed in streaming functions.
3619 if (!StackHazardInNonStreaming && Attrs.hasNonStreamingInterfaceAndBody())
3620 return;
3621
3622 MachineFrameInfo &MFI = MF.getFrameInfo();
3623
3624  // Add a hazard slot if there are any CSR FPR registers, or there are any
3625  // FPR-only stack objects.
3626 bool HasFPRCSRs = any_of(SavedRegs.set_bits(), [](unsigned Reg) {
3627 return AArch64::FPR64RegClass.contains(Reg) ||
3628 AArch64::FPR128RegClass.contains(Reg) ||
3629 AArch64::ZPRRegClass.contains(Reg) ||
3630 AArch64::PPRRegClass.contains(Reg);
3631 });
3632 bool HasFPRStackObjects = false;
3633 if (!HasFPRCSRs) {
3634 std::vector<unsigned> FrameObjects(MFI.getObjectIndexEnd());
3635 for (auto &MBB : MF) {
3636 for (auto &MI : MBB) {
3637 std::optional<int> FI = getLdStFrameID(MI, MFI);
3638 if (FI && *FI >= 0 && *FI < (int)FrameObjects.size()) {
3639 if (MFI.getStackID(*FI) == TargetStackID::ScalableVector ||
3641 FrameObjects[*FI] |= 2;
3642 else
3643 FrameObjects[*FI] |= 1;
3644 }
3645 }
3646 }
3647 HasFPRStackObjects =
3648 any_of(FrameObjects, [](unsigned B) { return (B & 3) == 2; });
3649 }
3650
3651 if (HasFPRCSRs || HasFPRStackObjects) {
3652 int ID = MFI.CreateStackObject(StackHazardSize, Align(16), false);
3653 LLVM_DEBUG(dbgs() << "Created Hazard slot at " << ID << " size "
3654 << StackHazardSize << "\n");
3655 MF.getInfo<AArch64FunctionInfo>()->setStackHazardSlotIndex(ID);
3656 }
3657}
3658
3660 BitVector &SavedRegs,
3661 RegScavenger *RS) const {
3662 // All calls are tail calls in GHC calling conv, and functions have no
3663 // prologue/epilogue.
3665 return;
3666
3668 const AArch64RegisterInfo *RegInfo = static_cast<const AArch64RegisterInfo *>(
3670 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
3672 unsigned UnspilledCSGPR = AArch64::NoRegister;
3673 unsigned UnspilledCSGPRPaired = AArch64::NoRegister;
3674
3675 MachineFrameInfo &MFI = MF.getFrameInfo();
3676 const MCPhysReg *CSRegs = MF.getRegInfo().getCalleeSavedRegs();
3677
3678 unsigned BasePointerReg = RegInfo->hasBasePointer(MF)
3679 ? RegInfo->getBaseRegister()
3680 : (unsigned)AArch64::NoRegister;
3681
3682 unsigned ExtraCSSpill = 0;
3683 bool HasUnpairedGPR64 = false;
3684 bool HasPairZReg = false;
3685 // Figure out which callee-saved registers to save/restore.
3686 for (unsigned i = 0; CSRegs[i]; ++i) {
3687 const unsigned Reg = CSRegs[i];
3688
3689 // Add the base pointer register to SavedRegs if it is callee-save.
3690 if (Reg == BasePointerReg)
3691 SavedRegs.set(Reg);
3692
3693 bool RegUsed = SavedRegs.test(Reg);
3694 unsigned PairedReg = AArch64::NoRegister;
3695 const bool RegIsGPR64 = AArch64::GPR64RegClass.contains(Reg);
3696 if (RegIsGPR64 || AArch64::FPR64RegClass.contains(Reg) ||
3697 AArch64::FPR128RegClass.contains(Reg)) {
3698 // Compensate for odd numbers of GP CSRs.
3699 // For now, all the known cases of odd number of CSRs are of GPRs.
3700 if (HasUnpairedGPR64)
3701 PairedReg = CSRegs[i % 2 == 0 ? i - 1 : i + 1];
3702 else
3703 PairedReg = CSRegs[i ^ 1];
3704 }
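    // Editor's illustration: with an even run of CSRs, i ^ 1 pairs indices
    // (0,1), (2,3), ...; once an unpaired GPR64 has been seen
    // (HasUnpairedGPR64), even indices pair with i - 1 and odd indices with
    // i + 1, shifting the pairing by one slot.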
3705
3706    // If the function requires saving all the GP registers (SavedRegs), and
3707    // there is an odd number of GP CSRs at the same time (CSRegs), PairedReg
3708    // could be in a different register class from Reg, which would lead to an
3709    // FPR (usually D8) accidentally being marked as saved.
3710 if (RegIsGPR64 && !AArch64::GPR64RegClass.contains(PairedReg)) {
3711 PairedReg = AArch64::NoRegister;
3712 HasUnpairedGPR64 = true;
3713 }
3714 assert(PairedReg == AArch64::NoRegister ||
3715 AArch64::GPR64RegClass.contains(Reg, PairedReg) ||
3716 AArch64::FPR64RegClass.contains(Reg, PairedReg) ||
3717 AArch64::FPR128RegClass.contains(Reg, PairedReg));
3718
3719 if (!RegUsed) {
3720 if (AArch64::GPR64RegClass.contains(Reg) &&
3721 !RegInfo->isReservedReg(MF, Reg)) {
3722 UnspilledCSGPR = Reg;
3723 UnspilledCSGPRPaired = PairedReg;
3724 }
3725 continue;
3726 }
3727
3728 // Always save P4 when PPR spills are ZPR-sized and a predicate above p8 is
3729    // spilled. If all of p0-p3 are used as return values, p4 must be free
3730 // to reload p8-p15.
3731 if (RegInfo->getSpillSize(AArch64::PPRRegClass) == 16 &&
3732 AArch64::PPR_p8to15RegClass.contains(Reg)) {
3733 SavedRegs.set(AArch64::P4);
3734 }
3735
3736 // MachO's compact unwind format relies on all registers being stored in
3737 // pairs.
3738 // FIXME: the usual format is actually better if unwinding isn't needed.
3739 if (producePairRegisters(MF) && PairedReg != AArch64::NoRegister &&
3740 !SavedRegs.test(PairedReg)) {
3741 SavedRegs.set(PairedReg);
3742 if (AArch64::GPR64RegClass.contains(PairedReg) &&
3743 !RegInfo->isReservedReg(MF, PairedReg))
3744 ExtraCSSpill = PairedReg;
3745 }
3746    // Check if there is a pair of ZRegs, so a PReg can be selected for spill/fill
3747 HasPairZReg |= (AArch64::ZPRRegClass.contains(Reg, CSRegs[i ^ 1]) &&
3748 SavedRegs.test(CSRegs[i ^ 1]));
3749 }
3750
3751 if (HasPairZReg && enableMultiVectorSpillFill(Subtarget, MF)) {
3753 // Find a suitable predicate register for the multi-vector spill/fill
3754 // instructions.
3755 unsigned PnReg = findFreePredicateReg(SavedRegs);
3756 if (PnReg != AArch64::NoRegister)
3757 AFI->setPredicateRegForFillSpill(PnReg);
3758    // If no free callee-saved register has been found, assign one.
3759 if (!AFI->getPredicateRegForFillSpill() &&
3760 MF.getFunction().getCallingConv() ==
3762 SavedRegs.set(AArch64::P8);
3763 AFI->setPredicateRegForFillSpill(AArch64::PN8);
3764 }
3765
3766 assert(!RegInfo->isReservedReg(MF, AFI->getPredicateRegForFillSpill()) &&
3767 "Predicate cannot be a reserved register");
3768 }
3769
3771 !Subtarget.isTargetWindows()) {
3772    // For the Windows calling convention on a non-Windows OS, where X18 is
3773    // treated as reserved, back up X18 when entering non-Windows code (marked
3774    // with the Windows calling convention) and restore it when returning,
3775    // regardless of whether the individual function uses it - it might call
3776    // other functions that clobber it.
3777 SavedRegs.set(AArch64::X18);
3778 }
3779
3780  // Calculate the callee-saved stack size.
3781 unsigned CSStackSize = 0;
3782 unsigned SVECSStackSize = 0;
3784 for (unsigned Reg : SavedRegs.set_bits()) {
3785 auto *RC = TRI->getMinimalPhysRegClass(Reg);
3786 assert(RC && "expected register class!");
3787 auto SpillSize = TRI->getSpillSize(*RC);
3788 if (AArch64::PPRRegClass.contains(Reg) ||
3789 AArch64::ZPRRegClass.contains(Reg))
3790 SVECSStackSize += SpillSize;
3791 else
3792 CSStackSize += SpillSize;
3793 }
3794
3795 // Increase the callee-saved stack size if the function has streaming mode
3796 // changes, as we will need to spill the value of the VG register.
3797 // For locally streaming functions, we spill both the streaming and
3798 // non-streaming VG value.
3799 const Function &F = MF.getFunction();
3800 SMEAttrs Attrs(F);
3801 if (requiresSaveVG(MF)) {
3802 if (Attrs.hasStreamingBody() && !Attrs.hasStreamingInterface())
3803 CSStackSize += 16;
3804 else
3805 CSStackSize += 8;
3806 }
3807
3808 // Determine if a Hazard slot should be used, and increase the CSStackSize by
3809 // StackHazardSize if so.
3810 determineStackHazardSlot(MF, SavedRegs);
3811 if (AFI->hasStackHazardSlotIndex())
3812 CSStackSize += getStackHazardSize(MF);
3813
3814  // Save the number of saved regs, so we can easily update CSStackSize later.
3815 unsigned NumSavedRegs = SavedRegs.count();
3816
3817 // The frame record needs to be created by saving the appropriate registers
3818 uint64_t EstimatedStackSize = MFI.estimateStackSize(MF);
3819 if (hasFP(MF) ||
3820 windowsRequiresStackProbe(MF, EstimatedStackSize + CSStackSize + 16)) {
3821 SavedRegs.set(AArch64::FP);
3822 SavedRegs.set(AArch64::LR);
3823 }
3824
3825 LLVM_DEBUG({
3826 dbgs() << "*** determineCalleeSaves\nSaved CSRs:";
3827 for (unsigned Reg : SavedRegs.set_bits())
3828 dbgs() << ' ' << printReg(Reg, RegInfo);
3829 dbgs() << "\n";
3830 });
3831
3832 // If any callee-saved registers are used, the frame cannot be eliminated.
3833 int64_t SVEStackSize =
3834 alignTo(SVECSStackSize + estimateSVEStackObjectOffsets(MFI), 16);
3835 bool CanEliminateFrame = (SavedRegs.count() == 0) && !SVEStackSize;
3836
3837 // The CSR spill slots have not been allocated yet, so estimateStackSize
3838 // won't include them.
3839 unsigned EstimatedStackSizeLimit = estimateRSStackSizeLimit(MF);
3840
3841 // We may address some of the stack above the canonical frame address, either
3842 // for our own arguments or during a call. Include that in calculating whether
3843 // we have complicated addressing concerns.
3844 int64_t CalleeStackUsed = 0;
3845 for (int I = MFI.getObjectIndexBegin(); I != 0; ++I) {
3846 int64_t FixedOff = MFI.getObjectOffset(I);
3847 if (FixedOff > CalleeStackUsed)
3848 CalleeStackUsed = FixedOff;
3849 }
3850
3851 // Conservatively always assume BigStack when there are SVE spills.
3852 bool BigStack = SVEStackSize || (EstimatedStackSize + CSStackSize +
3853 CalleeStackUsed) > EstimatedStackSizeLimit;
3854 if (BigStack || !CanEliminateFrame || RegInfo->cannotEliminateFrame(MF))
3855 AFI->setHasStackFrame(true);
3856
3857 // Estimate if we might need to scavenge a register at some point in order
3858 // to materialize a stack offset. If so, either spill one additional
3859 // callee-saved register or reserve a special spill slot to facilitate
3860 // register scavenging. If we already spilled an extra callee-saved register
3861 // above to keep the number of spills even, we don't need to do anything else
3862 // here.
3863 if (BigStack) {
3864 if (!ExtraCSSpill && UnspilledCSGPR != AArch64::NoRegister) {
3865 LLVM_DEBUG(dbgs() << "Spilling " << printReg(UnspilledCSGPR, RegInfo)
3866 << " to get a scratch register.\n");
3867 SavedRegs.set(UnspilledCSGPR);
3868 ExtraCSSpill = UnspilledCSGPR;
3869
3870 // MachO's compact unwind format relies on all registers being stored in
3871 // pairs, so if we need to spill one extra for BigStack, then we need to
3872 // store the pair.
3873 if (producePairRegisters(MF)) {
3874 if (UnspilledCSGPRPaired == AArch64::NoRegister) {
3875 // Failed to make a pair for compact unwind format, revert spilling.
3876 if (produceCompactUnwindFrame(MF)) {
3877 SavedRegs.reset(UnspilledCSGPR);
3878 ExtraCSSpill = AArch64::NoRegister;
3879 }
3880 } else
3881 SavedRegs.set(UnspilledCSGPRPaired);
3882 }
3883 }
3884
3885 // If we didn't find an extra callee-saved register to spill, create
3886 // an emergency spill slot.
3887 if (!ExtraCSSpill || MF.getRegInfo().isPhysRegUsed(ExtraCSSpill)) {
3889 const TargetRegisterClass &RC = AArch64::GPR64RegClass;
3890 unsigned Size = TRI->getSpillSize(RC);
3891 Align Alignment = TRI->getSpillAlign(RC);
3892 int FI = MFI.CreateSpillStackObject(Size, Alignment);
3894 LLVM_DEBUG(dbgs() << "No available CS registers, allocated fi#" << FI
3895 << " as the emergency spill slot.\n");
3896 }
3897 }
3898
3899  // Add the size of the additional 64-bit GPR saves.
3900 CSStackSize += 8 * (SavedRegs.count() - NumSavedRegs);
3901
3902 // A Swift asynchronous context extends the frame record with a pointer
3903 // directly before FP.
3904 if (hasFP(MF) && AFI->hasSwiftAsyncContext())
3905 CSStackSize += 8;
3906
3907 uint64_t AlignedCSStackSize = alignTo(CSStackSize, 16);
3908 LLVM_DEBUG(dbgs() << "Estimated stack frame size: "
3909 << EstimatedStackSize + AlignedCSStackSize << " bytes.\n");
3910
3912 AFI->getCalleeSavedStackSize() == AlignedCSStackSize) &&
3913 "Should not invalidate callee saved info");
3914
3915 // Round up to register pair alignment to avoid additional SP adjustment
3916 // instructions.
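  // Editor's illustration (ignoring VG, hazard and Swift-context additions):
  // saving {lr, fp, x19, x20, x21} gives CSStackSize = 40, which rounds up to
  // AlignedCSStackSize = 48; the spare 8 bytes are recorded below via
  // setCalleeSaveStackHasFreeSpace().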
3917 AFI->setCalleeSavedStackSize(AlignedCSStackSize);
3918 AFI->setCalleeSaveStackHasFreeSpace(AlignedCSStackSize != CSStackSize);
3919 AFI->setSVECalleeSavedStackSize(alignTo(SVECSStackSize, 16));
3920}
3921
3923 MachineFunction &MF, const TargetRegisterInfo *RegInfo,
3924 std::vector<CalleeSavedInfo> &CSI, unsigned &MinCSFrameIndex,
3925 unsigned &MaxCSFrameIndex) const {
3926 bool NeedsWinCFI = needsWinCFI(MF);
3927 unsigned StackHazardSize = getStackHazardSize(MF);
3928 // To match the canonical windows frame layout, reverse the list of
3929 // callee saved registers to get them laid out by PrologEpilogInserter
3930 // in the right order. (PrologEpilogInserter allocates stack objects top
3931 // down. Windows canonical prologs store higher numbered registers at
3932 // the top, thus have the CSI array start from the highest registers.)
3933 if (NeedsWinCFI)
3934 std::reverse(CSI.begin(), CSI.end());
3935
3936 if (CSI.empty())
3937 return true; // Early exit if no callee saved registers are modified!
3938
3939 // Now that we know which registers need to be saved and restored, allocate
3940 // stack slots for them.
3941 MachineFrameInfo &MFI = MF.getFrameInfo();
3942 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
3943
3944 bool UsesWinAAPCS = isTargetWindows(MF);
3945 if (UsesWinAAPCS && hasFP(MF) && AFI->hasSwiftAsyncContext()) {
3946 int FrameIdx = MFI.CreateStackObject(8, Align(16), true);
3947 AFI->setSwiftAsyncContextFrameIdx(FrameIdx);
3948 if ((unsigned)FrameIdx < MinCSFrameIndex)
3949 MinCSFrameIndex = FrameIdx;
3950 if ((unsigned)FrameIdx > MaxCSFrameIndex)
3951 MaxCSFrameIndex = FrameIdx;
3952 }
3953
3954 // Insert VG into the list of CSRs, immediately before LR if saved.
3955 if (requiresSaveVG(MF)) {
3956 std::vector<CalleeSavedInfo> VGSaves;
3957 SMEAttrs Attrs(MF.getFunction());
3958
3959 auto VGInfo = CalleeSavedInfo(AArch64::VG);
3960 VGInfo.setRestored(false);
3961 VGSaves.push_back(VGInfo);
3962
3963 // Add VG again if the function is locally-streaming, as we will spill two
3964 // values.
3965 if (Attrs.hasStreamingBody() && !Attrs.hasStreamingInterface())
3966 VGSaves.push_back(VGInfo);
3967
3968 bool InsertBeforeLR = false;
3969
3970 for (unsigned I = 0; I < CSI.size(); I++)
3971 if (CSI[I].getReg() == AArch64::LR) {
3972 InsertBeforeLR = true;
3973 CSI.insert(CSI.begin() + I, VGSaves.begin(), VGSaves.end());
3974 break;
3975 }
3976
3977 if (!InsertBeforeLR)
3978 CSI.insert(CSI.end(), VGSaves.begin(), VGSaves.end());
3979 }
3980
3981 Register LastReg = 0;
3982 int HazardSlotIndex = std::numeric_limits<int>::max();
3983 for (auto &CS : CSI) {
3984 Register Reg = CS.getReg();
3985 const TargetRegisterClass *RC = RegInfo->getMinimalPhysRegClass(Reg);
3986
3987 // Create a hazard slot as we switch between GPR and FPR CSRs.
3988 if (AFI->hasStackHazardSlotIndex() &&
3989        (!LastReg || !AArch64InstrInfo::isFpOrNEON(LastReg)) &&
3990        AArch64InstrInfo::isFpOrNEON(Reg)) {
3991 assert(HazardSlotIndex == std::numeric_limits<int>::max() &&
3992 "Unexpected register order for hazard slot");
3993 HazardSlotIndex = MFI.CreateStackObject(StackHazardSize, Align(8), true);
3994 LLVM_DEBUG(dbgs() << "Created CSR Hazard at slot " << HazardSlotIndex
3995 << "\n");
3996 AFI->setStackHazardCSRSlotIndex(HazardSlotIndex);
3997 if ((unsigned)HazardSlotIndex < MinCSFrameIndex)
3998 MinCSFrameIndex = HazardSlotIndex;
3999 if ((unsigned)HazardSlotIndex > MaxCSFrameIndex)
4000 MaxCSFrameIndex = HazardSlotIndex;
4001 }
4002
4003 unsigned Size = RegInfo->getSpillSize(*RC);
4004 Align Alignment(RegInfo->getSpillAlign(*RC));
4005 int FrameIdx = MFI.CreateStackObject(Size, Alignment, true);
4006 CS.setFrameIdx(FrameIdx);
4007
4008 if ((unsigned)FrameIdx < MinCSFrameIndex)
4009 MinCSFrameIndex = FrameIdx;
4010 if ((unsigned)FrameIdx > MaxCSFrameIndex)
4011 MaxCSFrameIndex = FrameIdx;
4012
4013 // Grab 8 bytes below FP for the extended asynchronous frame info.
4014 if (hasFP(MF) && AFI->hasSwiftAsyncContext() && !UsesWinAAPCS &&
4015 Reg == AArch64::FP) {
4016 FrameIdx = MFI.CreateStackObject(8, Alignment, true);
4017 AFI->setSwiftAsyncContextFrameIdx(FrameIdx);
4018 if ((unsigned)FrameIdx < MinCSFrameIndex)
4019 MinCSFrameIndex = FrameIdx;
4020 if ((unsigned)FrameIdx > MaxCSFrameIndex)
4021 MaxCSFrameIndex = FrameIdx;
4022 }
4023 LastReg = Reg;
4024 }
4025
4026 // Add hazard slot in the case where no FPR CSRs are present.
4027 if (AFI->hasStackHazardSlotIndex() &&
4028 HazardSlotIndex == std::numeric_limits<int>::max()) {
4029 HazardSlotIndex = MFI.CreateStackObject(StackHazardSize, Align(8), true);
4030 LLVM_DEBUG(dbgs() << "Created CSR Hazard at slot " << HazardSlotIndex
4031 << "\n");
4032 AFI->setStackHazardCSRSlotIndex(HazardSlotIndex);
4033 if ((unsigned)HazardSlotIndex < MinCSFrameIndex)
4034 MinCSFrameIndex = HazardSlotIndex;
4035 if ((unsigned)HazardSlotIndex > MaxCSFrameIndex)
4036 MaxCSFrameIndex = HazardSlotIndex;
4037 }
4038
4039 return true;
4040}
4041
4043 const MachineFunction &MF) const {
4045 // If the function has streaming-mode changes, don't scavenge a
4046  // spill slot in the callee-save area, as that might require an
4047 // 'addvl' in the streaming-mode-changing call-sequence when the
4048 // function doesn't use a FP.
4049 if (AFI->hasStreamingModeChanges() && !hasFP(MF))
4050 return false;
4051  // Don't allow register scavenging with hazard slots, in case it moves objects
4052 // into the wrong place.
4053 if (AFI->hasStackHazardSlotIndex())
4054 return false;
4055 return AFI->hasCalleeSaveStackFreeSpace();
4056}
4057
4058/// Returns true if there are any SVE callee saves.
4060 int &Min, int &Max) {
4061 Min = std::numeric_limits<int>::max();
4062 Max = std::numeric_limits<int>::min();
4063
4064 if (!MFI.isCalleeSavedInfoValid())
4065 return false;
4066
4067 const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();
4068 for (auto &CS : CSI) {
4069 if (AArch64::ZPRRegClass.contains(CS.getReg()) ||
4070 AArch64::PPRRegClass.contains(CS.getReg())) {
4071 assert((Max == std::numeric_limits<int>::min() ||
4072 Max + 1 == CS.getFrameIdx()) &&
4073 "SVE CalleeSaves are not consecutive");
4074
4075 Min = std::min(Min, CS.getFrameIdx());
4076 Max = std::max(Max, CS.getFrameIdx());
4077 }
4078 }
4079 return Min != std::numeric_limits<int>::max();
4080}
4081
4082// Process all the SVE stack objects and determine offsets for each
4083// object. If AssignOffsets is true, the offsets get assigned.
4084// Fills in the first and last callee-saved frame indices into
4085// Min/MaxCSFrameIndex, respectively.
4086// Returns the size of the SVE stack area.
4087static int64_t determineSVEStackObjectOffsets(MachineFrameInfo &MFI,
4088                                              int &MinCSFrameIndex,
4089 int &MaxCSFrameIndex,
4090 bool AssignOffsets) {
4091#ifndef NDEBUG
4092 // First process all fixed stack objects.
4093 for (int I = MFI.getObjectIndexBegin(); I != 0; ++I)
4095 "SVE vectors should never be passed on the stack by value, only by "
4096 "reference.");
4097#endif
4098
4099 auto Assign = [&MFI](int FI, int64_t Offset) {
4100 LLVM_DEBUG(dbgs() << "alloc FI(" << FI << ") at SP[" << Offset << "]\n");
4101 MFI.setObjectOffset(FI, Offset);
4102 };
4103
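  // Editor's illustration (sizes below are multiplied by vscale at runtime):
  // two 16-byte SVE callee-save slots get offsets SP[-16] and SP[-32]; Offset
  // is then 32 (already 16-byte aligned), and a following 16-byte SVE local is
  // placed at SP[-48].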
4104 int64_t Offset = 0;
4105
4106 // Then process all callee saved slots.
4107 if (getSVECalleeSaveSlotRange(MFI, MinCSFrameIndex, MaxCSFrameIndex)) {
4108 // Assign offsets to the callee save slots.
4109 for (int I = MinCSFrameIndex; I <= MaxCSFrameIndex; ++I) {
4110 Offset += MFI.getObjectSize(I);
4112 if (AssignOffsets)
4113 Assign(I, -Offset);
4114 }
4115 }
4116
4117  // Ensure that the callee-save area is aligned to 16 bytes.
4118 Offset = alignTo(Offset, Align(16U));
4119
4120 // Create a buffer of SVE objects to allocate and sort it.
4121 SmallVector<int, 8> ObjectsToAllocate;
4122 // If we have a stack protector, and we've previously decided that we have SVE
4123 // objects on the stack and thus need it to go in the SVE stack area, then it
4124 // needs to go first.
4125 int StackProtectorFI = -1;
4126 if (MFI.hasStackProtectorIndex()) {
4127 StackProtectorFI = MFI.getStackProtectorIndex();
4128 if (MFI.getStackID(StackProtectorFI) == TargetStackID::ScalableVector)
4129 ObjectsToAllocate.push_back(StackProtectorFI);
4130 }
4131 for (int I = 0, E = MFI.getObjectIndexEnd(); I != E; ++I) {
4132 unsigned StackID = MFI.getStackID(I);
4133 if (StackID != TargetStackID::ScalableVector)
4134 continue;
4135 if (I == StackProtectorFI)
4136 continue;
4137 if (MaxCSFrameIndex >= I && I >= MinCSFrameIndex)
4138 continue;
4139 if (MFI.isDeadObjectIndex(I))
4140 continue;
4141
4142 ObjectsToAllocate.push_back(I);
4143 }
4144
4145 // Allocate all SVE locals and spills
4146 for (unsigned FI : ObjectsToAllocate) {
4147 Align Alignment = MFI.getObjectAlign(FI);
4148 // FIXME: Given that the length of SVE vectors is not necessarily a power of
4149 // two, we'd need to align every object dynamically at runtime if the
4150 // alignment is larger than 16. This is not yet supported.
4151 if (Alignment > Align(16))
4153 "Alignment of scalable vectors > 16 bytes is not yet supported");
4154
4155 Offset = alignTo(Offset + MFI.getObjectSize(FI), Alignment);
4156 if (AssignOffsets)
4157 Assign(FI, -Offset);
4158 }
4159
4160 return Offset;
4161}
4162
4163int64_t AArch64FrameLowering::estimateSVEStackObjectOffsets(
4164 MachineFrameInfo &MFI) const {
4165 int MinCSFrameIndex, MaxCSFrameIndex;
4166 return determineSVEStackObjectOffsets(MFI, MinCSFrameIndex, MaxCSFrameIndex, false);
4167}
4168
4169int64_t AArch64FrameLowering::assignSVEStackObjectOffsets(
4170 MachineFrameInfo &MFI, int &MinCSFrameIndex, int &MaxCSFrameIndex) const {
4171 return determineSVEStackObjectOffsets(MFI, MinCSFrameIndex, MaxCSFrameIndex,
4172 true);
4173}
4174
4175/// Attempts to scavenge a register from \p ScavengeableRegs given the used
4176/// registers in \p UsedRegs.
4178 BitVector const &ScavengeableRegs) {
4179 for (auto Reg : ScavengeableRegs.set_bits()) {
4180 if (UsedRegs.available(Reg))
4181 return Reg;
4182 }
4183 return AArch64::NoRegister;
4184}
4185
4186/// Propagates frame-setup/destroy flags from \p SourceMI to all instructions in
4187/// \p MachineInstrs.
4188static void propagateFrameFlags(MachineInstr &SourceMI,
4189 ArrayRef<MachineInstr *> MachineInstrs) {
4190 for (MachineInstr *MI : MachineInstrs) {
4191 if (SourceMI.getFlag(MachineInstr::FrameSetup))
4192 MI->setFlag(MachineInstr::FrameSetup);
4193 if (SourceMI.getFlag(MachineInstr::FrameDestroy))
4195 }
4196}
4197
4198/// RAII helper class for scavenging or spilling a register. On construction
4199/// attempts to find a free register of class \p RC (given \p UsedRegs and \p
4200/// AllocatableRegs); if none is found, spills \p SpillCandidate to \p
4201/// MaybeSpillFI to free a register. The freed register is returned via the \p
4202/// FreeReg output parameter. On destruction, if there was a spill, its previous
4203/// value is reloaded. The spilling and scavenging is only valid at the
4204/// insertion point \p MBBI; this class should _not_ be used in places that
4205/// create or manipulate basic blocks, as that may move the insertion point.
4209
4212 Register SpillCandidate, const TargetRegisterClass &RC,
4213 LiveRegUnits const &UsedRegs,
4214 BitVector const &AllocatableRegs,
4215 std::optional<int> *MaybeSpillFI)
4216 : MBB(MBB), MBBI(MBBI), RC(RC), TII(static_cast<const AArch64InstrInfo &>(
4217 *MF.getSubtarget().getInstrInfo())),
4218 TRI(*MF.getSubtarget().getRegisterInfo()) {
4219 FreeReg = tryScavengeRegister(UsedRegs, AllocatableRegs);
4220 if (FreeReg != AArch64::NoRegister)
4221 return;
4222 assert(MaybeSpillFI && "Expected emergency spill slot FI information "
4223 "(attempted to spill in prologue/epilogue?)");
4224 if (!MaybeSpillFI->has_value()) {
4225 MachineFrameInfo &MFI = MF.getFrameInfo();
4226 *MaybeSpillFI = MFI.CreateSpillStackObject(TRI.getSpillSize(RC),
4227 TRI.getSpillAlign(RC));
4228 }
4229 FreeReg = SpillCandidate;
4230 SpillFI = MaybeSpillFI->value();
4231 TII.storeRegToStackSlot(MBB, MBBI, FreeReg, false, *SpillFI, &RC, &TRI,
4232 Register());
4233 }
4234
4235 bool hasSpilled() const { return SpillFI.has_value(); }
4236
4237 /// Returns the free register (found from scavenging or spilling a register).
4238 Register freeRegister() const { return FreeReg; }
4239
4240 Register operator*() const { return freeRegister(); }
4241
4243 if (hasSpilled())
4244 TII.loadRegFromStackSlot(MBB, MBBI, FreeReg, *SpillFI, &RC, &TRI,
4245 Register());
4246 }
4247
4248private:
4251 const TargetRegisterClass &RC;
4252 const AArch64InstrInfo &TII;
4253 const TargetRegisterInfo &TRI;
4254 Register FreeReg = AArch64::NoRegister;
4255 std::optional<int> SpillFI;
4256};
4257
4258/// Emergency stack slots for expanding SPILL_PPR_TO_ZPR_SLOT_PSEUDO and
4259/// FILL_PPR_FROM_ZPR_SLOT_PSEUDO.
4261 std::optional<int> ZPRSpillFI;
4262 std::optional<int> PPRSpillFI;
4263 std::optional<int> GPRSpillFI;
4264};
4265
4266/// Registers available for scavenging (ZPR, PPR3b, GPR).
4271};
4272
4274 return MI.getFlag(MachineInstr::FrameSetup) ||
4276}
4277
4278/// Expands:
4279/// ```
4280/// SPILL_PPR_TO_ZPR_SLOT_PSEUDO $p0, %stack.0, 0
4281/// ```
4282/// To:
4283/// ```
4284/// $z0 = CPY_ZPzI_B $p0, 1, 0
4285/// STR_ZXI $z0, $stack.0, 0
4286/// ```
4287/// While ensuring a ZPR ($z0 in this example) is free for the predicate (
4288/// spilling if necessary).
4291 const TargetRegisterInfo &TRI,
4292 LiveRegUnits const &UsedRegs,
4293 ScavengeableRegs const &SR,
4294 EmergencyStackSlots &SpillSlots) {
4295 MachineFunction &MF = *MBB.getParent();
4296 auto *TII =
4297 static_cast<const AArch64InstrInfo *>(MF.getSubtarget().getInstrInfo());
4298
4299 ScopedScavengeOrSpill ZPredReg(
4300 MF, MBB, MI, AArch64::Z0, AArch64::ZPRRegClass, UsedRegs, SR.ZPRRegs,
4301 isInPrologueOrEpilogue(MI) ? nullptr : &SpillSlots.ZPRSpillFI);
4302
4303 SmallVector<MachineInstr *, 2> MachineInstrs;
4304 const DebugLoc &DL = MI.getDebugLoc();
4305 MachineInstrs.push_back(BuildMI(MBB, MI, DL, TII->get(AArch64::CPY_ZPzI_B))
4306 .addReg(*ZPredReg, RegState::Define)
4307 .add(MI.getOperand(0))
4308 .addImm(1)
4309 .addImm(0)
4310 .getInstr());
4311 MachineInstrs.push_back(BuildMI(MBB, MI, DL, TII->get(AArch64::STR_ZXI))
4312 .addReg(*ZPredReg)
4313 .add(MI.getOperand(1))
4314 .addImm(MI.getOperand(2).getImm())
4315 .setMemRefs(MI.memoperands())
4316 .getInstr());
4317 propagateFrameFlags(MI, MachineInstrs);
4318}
4319
4320/// Expands:
4321/// ```
4322/// $p0 = FILL_PPR_FROM_ZPR_SLOT_PSEUDO %stack.0, 0
4323/// ```
4324/// To:
4325/// ```
4326/// $z0 = LDR_ZXI %stack.0, 0
4327/// $p0 = PTRUE_B 31, implicit $vg
4328/// $p0 = CMPNE_PPzZI_B $p0, $z0, 0, implicit-def $nzcv, implicit-def $nzcv
4329/// ```
4330/// While ensuring a ZPR ($z0 in this example) is free for the predicate (
4331/// spilling if necessary). If the status flags are in use at the point of
4332/// expansion they are preserved (by moving them to/from a GPR). This may cause
4333/// an additional spill if no GPR is free at the expansion point.
4336 const TargetRegisterInfo &TRI,
4337 LiveRegUnits const &UsedRegs,
4338 ScavengeableRegs const &SR,
4339 EmergencyStackSlots &SpillSlots) {
4340 MachineFunction &MF = *MBB.getParent();
4341 auto *TII =
4342 static_cast<const AArch64InstrInfo *>(MF.getSubtarget().getInstrInfo());
4343
4344 ScopedScavengeOrSpill ZPredReg(
4345 MF, MBB, MI, AArch64::Z0, AArch64::ZPRRegClass, UsedRegs, SR.ZPRRegs,
4346 isInPrologueOrEpilogue(MI) ? nullptr : &SpillSlots.ZPRSpillFI);
4347
4348 ScopedScavengeOrSpill PredReg(
4349 MF, MBB, MI, AArch64::P0, AArch64::PPR_3bRegClass, UsedRegs, SR.PPR3bRegs,
4350 isInPrologueOrEpilogue(MI) ? nullptr : &SpillSlots.PPRSpillFI);
4351
4352 // Elide NZCV spills if we know it is not used.
4353 bool IsNZCVUsed = !UsedRegs.available(AArch64::NZCV);
4354 std::optional<ScopedScavengeOrSpill> NZCVSaveReg;
4355 if (IsNZCVUsed)
4356 NZCVSaveReg.emplace(
4357 MF, MBB, MI, AArch64::X0, AArch64::GPR64RegClass, UsedRegs, SR.GPRRegs,
4358 isInPrologueOrEpilogue(MI) ? nullptr : &SpillSlots.GPRSpillFI);
4359 SmallVector<MachineInstr *, 4> MachineInstrs;
4360 const DebugLoc &DL = MI.getDebugLoc();
4361 MachineInstrs.push_back(BuildMI(MBB, MI, DL, TII->get(AArch64::LDR_ZXI))
4362 .addReg(*ZPredReg, RegState::Define)
4363 .add(MI.getOperand(1))
4364 .addImm(MI.getOperand(2).getImm())
4365 .setMemRefs(MI.memoperands())
4366 .getInstr());
4367 if (IsNZCVUsed)
4368 MachineInstrs.push_back(
4369 BuildMI(MBB, MI, DL, TII->get(AArch64::MRS))
4370 .addReg(NZCVSaveReg->freeRegister(), RegState::Define)
4371 .addImm(AArch64SysReg::NZCV)
4372 .addReg(AArch64::NZCV, RegState::Implicit)
4373 .getInstr());
4374 MachineInstrs.push_back(BuildMI(MBB, MI, DL, TII->get(AArch64::PTRUE_B))
4375 .addReg(*PredReg, RegState::Define)
4376 .addImm(31));
4377 MachineInstrs.push_back(
4378 BuildMI(MBB, MI, DL, TII->get(AArch64::CMPNE_PPzZI_B))
4379 .addReg(MI.getOperand(0).getReg(), RegState::Define)
4380 .addReg(*PredReg)
4381 .addReg(*ZPredReg)
4382 .addImm(0)
4383 .addReg(AArch64::NZCV, RegState::ImplicitDefine)
4384 .getInstr());
4385 if (IsNZCVUsed)
4386 MachineInstrs.push_back(BuildMI(MBB, MI, DL, TII->get(AArch64::MSR))
4387 .addImm(AArch64SysReg::NZCV)
4388 .addReg(NZCVSaveReg->freeRegister())
4389 .addReg(AArch64::NZCV, RegState::ImplicitDefine)
4390 .getInstr());
4391
4392 propagateFrameFlags(MI, MachineInstrs);
4393 return PredReg.hasSpilled();
4394}
4395
4396/// Expands all FILL_PPR_FROM_ZPR_SLOT_PSEUDO and SPILL_PPR_TO_ZPR_SLOT_PSEUDO
4397/// operations within the MachineBasicBlock \p MBB.
4399 const TargetRegisterInfo &TRI,
4400 ScavengeableRegs const &SR,
4401 EmergencyStackSlots &SpillSlots) {
4402 LiveRegUnits UsedRegs(TRI);
4403 UsedRegs.addLiveOuts(MBB);
4404 bool HasPPRSpills = false;
4406 UsedRegs.stepBackward(MI);
4407 switch (MI.getOpcode()) {
4408 case AArch64::FILL_PPR_FROM_ZPR_SLOT_PSEUDO:
4409 HasPPRSpills |= expandFillPPRFromZPRSlotPseudo(MBB, MI, TRI, UsedRegs, SR,
4410 SpillSlots);
4411 MI.eraseFromParent();
4412 break;
4413 case AArch64::SPILL_PPR_TO_ZPR_SLOT_PSEUDO:
4414 expandSpillPPRToZPRSlotPseudo(MBB, MI, TRI, UsedRegs, SR, SpillSlots);
4415 MI.eraseFromParent();
4416 break;
4417 default:
4418 break;
4419 }
4420 }
4421
4422 return HasPPRSpills;
4423}
4424
4426 MachineFunction &MF, RegScavenger *RS) const {
4427
4429 const TargetSubtargetInfo &TSI = MF.getSubtarget();
4430 const TargetRegisterInfo &TRI = *TSI.getRegisterInfo();
4431
4432  // If predicate spills are 16 bytes, we may need to expand
4433 // SPILL_PPR_TO_ZPR_SLOT_PSEUDO/FILL_PPR_FROM_ZPR_SLOT_PSEUDO.
4434 if (AFI->hasStackFrame() && TRI.getSpillSize(AArch64::PPRRegClass) == 16) {
4435 auto ComputeScavengeableRegisters = [&](unsigned RegClassID) {
4436 BitVector Regs = TRI.getAllocatableSet(MF, TRI.getRegClass(RegClassID));
4437 assert(Regs.count() > 0 && "Expected scavengeable registers");
4438 return Regs;
4439 };
4440
4441 ScavengeableRegs SR{};
4442 SR.ZPRRegs = ComputeScavengeableRegisters(AArch64::ZPRRegClassID);
4443 // Only p0-7 are possible as the second operand of cmpne (needed for fills).
4444 SR.PPR3bRegs = ComputeScavengeableRegisters(AArch64::PPR_3bRegClassID);
4445 SR.GPRRegs = ComputeScavengeableRegisters(AArch64::GPR64RegClassID);
4446
4447 EmergencyStackSlots SpillSlots;
4448 for (MachineBasicBlock &MBB : MF) {
4449 // In the case we had to spill a predicate (in the range p0-p7) to reload
4450 // a predicate (>= p8), additional spill/fill pseudos will be created.
4451 // These need an additional expansion pass. Note: There will only be at
4452 // most two expansion passes, as spilling/filling a predicate in the range
4453 // p0-p7 never requires spilling another predicate.
4454 for (int Pass = 0; Pass < 2; Pass++) {
4455 bool HasPPRSpills =
4456 expandSMEPPRToZPRSpillPseudos(MBB, TRI, SR, SpillSlots);
4457 assert((Pass == 0 || !HasPPRSpills) && "Did not expect PPR spills");
4458 if (!HasPPRSpills)
4459 break;
4460 }
4461 }
4462 }
4463
4464 MachineFrameInfo &MFI = MF.getFrameInfo();
4465
4467 "Upwards growing stack unsupported");
4468
4469 int MinCSFrameIndex, MaxCSFrameIndex;
4470 int64_t SVEStackSize =
4471 assignSVEStackObjectOffsets(MFI, MinCSFrameIndex, MaxCSFrameIndex);
4472
4473 AFI->setStackSizeSVE(alignTo(SVEStackSize, 16U));
4474 AFI->setMinMaxSVECSFrameIndex(MinCSFrameIndex, MaxCSFrameIndex);
4475
4476 // If this function isn't doing Win64-style C++ EH, we don't need to do
4477 // anything.
4478 if (!MF.hasEHFunclets())
4479 return;
4481 WinEHFuncInfo &EHInfo = *MF.getWinEHFuncInfo();
4482
4483 MachineBasicBlock &MBB = MF.front();
4484 auto MBBI = MBB.begin();
4485 while (MBBI != MBB.end() && MBBI->getFlag(MachineInstr::FrameSetup))
4486 ++MBBI;
4487
4488 // Create an UnwindHelp object.
4489 // The UnwindHelp object is allocated at the start of the fixed object area
4490 int64_t FixedObject =
4491 getFixedObjectSize(MF, AFI, /*IsWin64*/ true, /*IsFunclet*/ false);
4492 int UnwindHelpFI = MFI.CreateFixedObject(/*Size*/ 8,
4493 /*SPOffset*/ -FixedObject,
4494 /*IsImmutable=*/false);
4495 EHInfo.UnwindHelpFrameIdx = UnwindHelpFI;
4496
4497 // We need to store -2 into the UnwindHelp object at the start of the
4498 // function.
4499 DebugLoc DL;
4501 RS->backward(MBBI);
4502 Register DstReg = RS->FindUnusedReg(&AArch64::GPR64commonRegClass);
4503 assert(DstReg && "There must be a free register after frame setup");
4504 BuildMI(MBB, MBBI, DL, TII.get(AArch64::MOVi64imm), DstReg).addImm(-2);
4505 BuildMI(MBB, MBBI, DL, TII.get(AArch64::STURXi))
4506 .addReg(DstReg, getKillRegState(true))
4507 .addFrameIndex(UnwindHelpFI)
4508 .addImm(0);
4509}
4510
4511namespace {
4512struct TagStoreInstr {
4514 int64_t Offset, Size;
4515 explicit TagStoreInstr(MachineInstr *MI, int64_t Offset, int64_t Size)
4516 : MI(MI), Offset(Offset), Size(Size) {}
4517};
4518
4519class TagStoreEdit {
4520 MachineFunction *MF;
4523 // Tag store instructions that are being replaced.
4525 // Combined memref arguments of the above instructions.
4527
4528 // Replace allocation tags in [FrameReg + FrameRegOffset, FrameReg +
4529 // FrameRegOffset + Size) with the address tag of SP.
4530 Register FrameReg;
4531 StackOffset FrameRegOffset;
4532 int64_t Size;
4533 // If not std::nullopt, move FrameReg to (FrameReg + FrameRegUpdate) at the
4534 // end.
4535 std::optional<int64_t> FrameRegUpdate;
4536 // MIFlags for any FrameReg updating instructions.
4537 unsigned FrameRegUpdateFlags;
4538
4539 // Use zeroing instruction variants.
4540 bool ZeroData;
4541 DebugLoc DL;
4542
4543 void emitUnrolled(MachineBasicBlock::iterator InsertI);
4544 void emitLoop(MachineBasicBlock::iterator InsertI);
4545
4546public:
4547 TagStoreEdit(MachineBasicBlock *MBB, bool ZeroData)
4548 : MBB(MBB), ZeroData(ZeroData) {
4549 MF = MBB->getParent();
4550 MRI = &MF->getRegInfo();
4551 }
4552  // Add an instruction to be replaced. Instructions must be added in
4553  // ascending order of Offset and must be adjacent.
4554 void addInstruction(TagStoreInstr I) {
4555 assert((TagStores.empty() ||
4556 TagStores.back().Offset + TagStores.back().Size == I.Offset) &&
4557 "Non-adjacent tag store instructions.");
4558 TagStores.push_back(I);
4559 }
4560 void clear() { TagStores.clear(); }
4561 // Emit equivalent code at the given location, and erase the current set of
4562 // instructions. May skip if the replacement is not profitable. May invalidate
4563 // the input iterator and replace it with a valid one.
4564 void emitCode(MachineBasicBlock::iterator &InsertI,
4565 const AArch64FrameLowering *TFI, bool TryMergeSPUpdate);
4566};
4567
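// Editor's sketch of what emitUnrolled produces (illustrative): for Size = 48
// starting at [BaseReg, #0] it emits an ST2G covering bytes 0-31 and an STG
// covering bytes 32-47, then moves the #0 store last so a later SP adjustment
// can be folded into it:
//   stg  sp, [BaseReg, #32]
//   st2g sp, [BaseReg, #0]
// (stzg/stz2g are used instead when ZeroData is set).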
4568void TagStoreEdit::emitUnrolled(MachineBasicBlock::iterator InsertI) {
4569 const AArch64InstrInfo *TII =
4570 MF->getSubtarget<AArch64Subtarget>().getInstrInfo();
4571
4572 const int64_t kMinOffset = -256 * 16;
4573 const int64_t kMaxOffset = 255 * 16;
4574
4575 Register BaseReg = FrameReg;
4576 int64_t BaseRegOffsetBytes = FrameRegOffset.getFixed();
4577 if (BaseRegOffsetBytes < kMinOffset ||
4578 BaseRegOffsetBytes + (Size - Size % 32) > kMaxOffset ||
4579      // BaseReg can be FP, which is not necessarily aligned to 16 bytes. In
4580 // that case, BaseRegOffsetBytes will not be aligned to 16 bytes, which
4581 // is required for the offset of ST2G.
4582 BaseRegOffsetBytes % 16 != 0) {
4583 Register ScratchReg = MRI->createVirtualRegister(&AArch64::GPR64RegClass);
4584 emitFrameOffset(*MBB, InsertI, DL, ScratchReg, BaseReg,
4585 StackOffset::getFixed(BaseRegOffsetBytes), TII);
4586 BaseReg = ScratchReg;
4587 BaseRegOffsetBytes = 0;
4588 }
4589
4590 MachineInstr *LastI = nullptr;
4591 while (Size) {
4592 int64_t InstrSize = (Size > 16) ? 32 : 16;
4593 unsigned Opcode =
4594 InstrSize == 16
4595 ? (ZeroData ? AArch64::STZGi : AArch64::STGi)
4596 : (ZeroData ? AArch64::STZ2Gi : AArch64::ST2Gi);
4597 assert(BaseRegOffsetBytes % 16 == 0);
4598 MachineInstr *I = BuildMI(*MBB, InsertI, DL, TII->get(Opcode))
4599 .addReg(AArch64::SP)
4600 .addReg(BaseReg)
4601 .addImm(BaseRegOffsetBytes / 16)
4602 .setMemRefs(CombinedMemRefs);
4603 // A store to [BaseReg, #0] should go last for an opportunity to fold the
4604 // final SP adjustment in the epilogue.
4605 if (BaseRegOffsetBytes == 0)
4606 LastI = I;
4607 BaseRegOffsetBytes += InstrSize;
4608 Size -= InstrSize;
4609 }
4610
4611 if (LastI)
4612 MBB->splice(InsertI, MBB, LastI);
4613}
4614
4615void TagStoreEdit::emitLoop(MachineBasicBlock::iterator InsertI) {
4616 const AArch64InstrInfo *TII =
4617 MF->getSubtarget<AArch64Subtarget>().getInstrInfo();
4618
4619 Register BaseReg = FrameRegUpdate
4620 ? FrameReg
4621 : MRI->createVirtualRegister(&AArch64::GPR64RegClass);
4622 Register SizeReg = MRI->createVirtualRegister(&AArch64::GPR64RegClass);
4623
4624 emitFrameOffset(*MBB, InsertI, DL, BaseReg, FrameReg, FrameRegOffset, TII);
4625
4626 int64_t LoopSize = Size;
4627 // If the loop size is not a multiple of 32, split off one 16-byte store at
4628  // the end to fold the BaseReg update into.
4629 if (FrameRegUpdate && *FrameRegUpdate)
4630 LoopSize -= LoopSize % 32;
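  // Editor's illustration: for Size = 176 with a non-zero FrameRegUpdate,
  // LoopSize becomes 160; the STGloop below tags the first 160 bytes and the
  // remaining 16 bytes are handled by the STGPostIndex emitted further down,
  // which also applies the base-register update.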
4631 MachineInstr *LoopI = BuildMI(*MBB, InsertI, DL,
4632 TII->get(ZeroData ? AArch64::STZGloop_wback
4633 : AArch64::STGloop_wback))
4634 .addDef(SizeReg)
4635 .addDef(BaseReg)
4636 .addImm(LoopSize)
4637 .addReg(BaseReg)
4638 .setMemRefs(CombinedMemRefs);
4639 if (FrameRegUpdate)
4640 LoopI->setFlags(FrameRegUpdateFlags);
4641
4642 int64_t ExtraBaseRegUpdate =
4643 FrameRegUpdate ? (*FrameRegUpdate - FrameRegOffset.getFixed() - Size) : 0;
4644 LLVM_DEBUG(dbgs() << "TagStoreEdit::emitLoop: LoopSize=" << LoopSize
4645 << ", Size=" << Size
4646 << ", ExtraBaseRegUpdate=" << ExtraBaseRegUpdate
4647 << ", FrameRegUpdate=" << FrameRegUpdate
4648 << ", FrameRegOffset.getFixed()="
4649 << FrameRegOffset.getFixed() << "\n");
4650 if (LoopSize < Size) {
4651 assert(FrameRegUpdate);
4652 assert(Size - LoopSize == 16);
4653 // Tag 16 more bytes at BaseReg and update BaseReg.
4654 int64_t STGOffset = ExtraBaseRegUpdate + 16;
4655 assert(STGOffset % 16 == 0 && STGOffset >= -4096 && STGOffset <= 4080 &&
4656 "STG immediate out of range");
4657 BuildMI(*MBB, InsertI, DL,
4658 TII->get(ZeroData ? AArch64::STZGPostIndex : AArch64::STGPostIndex))
4659 .addDef(BaseReg)
4660 .addReg(BaseReg)
4661 .addReg(BaseReg)
4662 .addImm(STGOffset / 16)
4663 .setMemRefs(CombinedMemRefs)
4664 .setMIFlags(FrameRegUpdateFlags);
4665 } else if (ExtraBaseRegUpdate) {
4666 // Update BaseReg.
4667 int64_t AddSubOffset = std::abs(ExtraBaseRegUpdate);
4668 assert(AddSubOffset <= 4095 && "ADD/SUB immediate out of range");
4669 BuildMI(
4670 *MBB, InsertI, DL,
4671 TII->get(ExtraBaseRegUpdate > 0 ? AArch64::ADDXri : AArch64::SUBXri))
4672 .addDef(BaseReg)
4673 .addReg(BaseReg)
4674 .addImm(AddSubOffset)
4675 .addImm(0)
4676 .setMIFlags(FrameRegUpdateFlags);
4677 }
4678}
4679
4680// Check if *II is a register update that can be merged into STGloop that ends
4681// at (Reg + Size). *TotalOffset receives the required adjustment to Reg after the
4682// end of the loop.
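// Editor's worked example: an STGloop ending at Reg + 160 followed by
// "add Reg, Reg, #176" gives Offset = 176 and PostOffset = 16, which is
// 16-byte aligned and within [-4095, 4064], so the update can be folded and
// *TotalOffset is set to 176.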
4683bool canMergeRegUpdate(MachineBasicBlock::iterator II, unsigned Reg,
4684 int64_t Size, int64_t *TotalOffset) {
4685 MachineInstr &MI = *II;
4686 if ((MI.getOpcode() == AArch64::ADDXri ||
4687 MI.getOpcode() == AArch64::SUBXri) &&
4688 MI.getOperand(0).getReg() == Reg && MI.getOperand(1).getReg() == Reg) {
4689 unsigned Shift = AArch64_AM::getShiftValue(MI.getOperand(3).getImm());
4690 int64_t Offset = MI.getOperand(2).getImm() << Shift;
4691 if (MI.getOpcode() == AArch64::SUBXri)
4692 Offset = -Offset;
4693 int64_t PostOffset = Offset - Size;
4694 // TagStoreEdit::emitLoop might emit either an ADD/SUB after the loop, or
4695 // an STGPostIndex which does the last 16 bytes of tag write. Which one is
4696 // chosen depends on the alignment of the loop size, but the difference
4697 // between the valid ranges for the two instructions is small, so we
4698 // conservatively assume that it could be either case here.
4699 //
4700 // Max offset of STGPostIndex, minus the 16 byte tag write folded into that
4701 // instruction.
4702 const int64_t kMaxOffset = 4080 - 16;
4703 // Max offset of SUBXri.
4704 const int64_t kMinOffset = -4095;
4705 if (PostOffset <= kMaxOffset && PostOffset >= kMinOffset &&
4706 PostOffset % 16 == 0) {
4707 *TotalOffset = Offset;
4708 return true;
4709 }
4710 }
4711 return false;
4712}
4713
4714void mergeMemRefs(const SmallVectorImpl<TagStoreInstr> &TSE,
4716 MemRefs.clear();
4717 for (auto &TS : TSE) {
4718 MachineInstr *MI = TS.MI;
4719 // An instruction without memory operands may access anything. Be
4720 // conservative and return an empty list.
4721 if (MI->memoperands_empty()) {
4722 MemRefs.clear();
4723 return;
4724 }
4725 MemRefs.append(MI->memoperands_begin(), MI->memoperands_end());
4726 }
4727}
4728
4729void TagStoreEdit::emitCode(MachineBasicBlock::iterator &InsertI,
4730 const AArch64FrameLowering *TFI,
4731 bool TryMergeSPUpdate) {
4732 if (TagStores.empty())
4733 return;
4734 TagStoreInstr &FirstTagStore = TagStores[0];
4735 TagStoreInstr &LastTagStore = TagStores[TagStores.size() - 1];
4736 Size = LastTagStore.Offset - FirstTagStore.Offset + LastTagStore.Size;
4737 DL = TagStores[0].MI->getDebugLoc();
4738
4739 Register Reg;
4740 FrameRegOffset = TFI->resolveFrameOffsetReference(
4741 *MF, FirstTagStore.Offset, false /*isFixed*/, false /*isSVE*/, Reg,
4742 /*PreferFP=*/false, /*ForSimm=*/true);
4743 FrameReg = Reg;
4744 FrameRegUpdate = std::nullopt;
4745
4746 mergeMemRefs(TagStores, CombinedMemRefs);
4747
4748 LLVM_DEBUG({
4749 dbgs() << "Replacing adjacent STG instructions:\n";
4750 for (const auto &Instr : TagStores) {
4751 dbgs() << " " << *Instr.MI;
4752 }
4753 });
4754
4755 // Size threshold where a loop becomes shorter than a linear sequence of
4756 // tagging instructions.
4757 const int kSetTagLoopThreshold = 176;
4758 if (Size < kSetTagLoopThreshold) {
4759 if (TagStores.size() < 2)
4760 return;
4761 emitUnrolled(InsertI);
4762 } else {
4763 MachineInstr *UpdateInstr = nullptr;
4764 int64_t TotalOffset = 0;
4765 if (TryMergeSPUpdate) {
4766      // See if we can merge the base register update into the STGloop.
4767      // This is done in AArch64LoadStoreOptimizer for "normal" stores,
4768      // but STGloop is way too unusual for that, and it also only
4769      // realistically happens in the function epilogue. Also, STGloop is
4770      // expanded before that pass.
4771 if (InsertI != MBB->end() &&
4772 canMergeRegUpdate(InsertI, FrameReg, FrameRegOffset.getFixed() + Size,
4773 &TotalOffset)) {
4774 UpdateInstr = &*InsertI++;
4775 LLVM_DEBUG(dbgs() << "Folding SP update into loop:\n "
4776 << *UpdateInstr);
4777 }
4778 }
4779
4780 if (!UpdateInstr && TagStores.size() < 2)
4781 return;
4782
4783 if (UpdateInstr) {
4784 FrameRegUpdate = TotalOffset;
4785 FrameRegUpdateFlags = UpdateInstr->getFlags();
4786 }
4787 emitLoop(InsertI);
4788 if (UpdateInstr)
4789 UpdateInstr->eraseFromParent();
4790 }
4791
4792 for (auto &TS : TagStores)
4793 TS.MI->eraseFromParent();
4794}
4795
4796bool isMergeableStackTaggingInstruction(MachineInstr &MI, int64_t &Offset,
4797 int64_t &Size, bool &ZeroData) {
4798 MachineFunction &MF = *MI.getParent()->getParent();
4799 const MachineFrameInfo &MFI = MF.getFrameInfo();
4800
4801 unsigned Opcode = MI.getOpcode();
4802 ZeroData = (Opcode == AArch64::STZGloop || Opcode == AArch64::STZGi ||
4803 Opcode == AArch64::STZ2Gi);
4804
4805 if (Opcode == AArch64::STGloop || Opcode == AArch64::STZGloop) {
4806 if (!MI.getOperand(0).isDead() || !MI.getOperand(1).isDead())
4807 return false;
4808 if (!MI.getOperand(2).isImm() || !MI.getOperand(3).isFI())
4809 return false;
4810 Offset = MFI.getObjectOffset(MI.getOperand(3).getIndex());
4811 Size = MI.getOperand(2).getImm();
4812 return true;
4813 }
4814
4815 if (Opcode == AArch64::STGi || Opcode == AArch64::STZGi)
4816 Size = 16;
4817 else if (Opcode == AArch64::ST2Gi || Opcode == AArch64::STZ2Gi)
4818 Size = 32;
4819 else
4820 return false;
4821
4822 if (MI.getOperand(0).getReg() != AArch64::SP || !MI.getOperand(1).isFI())
4823 return false;
4824
4825 Offset = MFI.getObjectOffset(MI.getOperand(1).getIndex()) +
4826 16 * MI.getOperand(2).getImm();
4827 return true;
4828}
4829
4830// Detect a run of memory tagging instructions for adjacent stack frame slots,
4831// and replace them with a shorter instruction sequence:
4832// * replace STG + STG with ST2G
4833// * replace STGloop + STGloop with STGloop
4834// This code needs to run when stack slot offsets are already known, but before
4835// FrameIndex operands in STG instructions are eliminated.
4837 const AArch64FrameLowering *TFI,
4838 RegScavenger *RS) {
4839 bool FirstZeroData;
4840 int64_t Size, Offset;
4841 MachineInstr &MI = *II;
4842 MachineBasicBlock *MBB = MI.getParent();
4844 if (&MI == &MBB->instr_back())
4845 return II;
4846 if (!isMergeableStackTaggingInstruction(MI, Offset, Size, FirstZeroData))
4847 return II;
4848
4849 SmallVector<TagStoreInstr, 8> Instrs;
4850 Instrs.emplace_back(&MI, Offset, Size);
4851
4852 constexpr int kScanLimit = 10;
4853 int Count = 0;
4854 for (MachineBasicBlock::iterator NextI = std::next(II), E = MBB->end();
4855 NextI != E && Count < kScanLimit; ++NextI) {
4856 MachineInstr &MI = *NextI;
4857 bool ZeroData;
4858 int64_t Size, Offset;
4859 // Collect instructions that update memory tags with a FrameIndex operand
4860 // and (when applicable) constant size, and whose output registers are dead
4861 // (the latter is almost always the case in practice). Since these
4862 // instructions effectively have no inputs or outputs, we are free to skip
4863 // any non-aliasing instructions in between without tracking used registers.
4864 if (isMergeableStackTaggingInstruction(MI, Offset, Size, ZeroData)) {
4865 if (ZeroData != FirstZeroData)
4866 break;
4867 Instrs.emplace_back(&MI, Offset, Size);
4868 continue;
4869 }
4870
4871 // Only count non-transient, non-tagging instructions toward the scan
4872 // limit.
4873 if (!MI.isTransient())
4874 ++Count;
4875
4876 // Just in case, stop before the epilogue code starts.
4877 if (MI.getFlag(MachineInstr::FrameSetup) ||
4878 MI.getFlag(MachineInstr::FrameDestroy))
4879 break;
4880
4881 // Reject anything that may alias the collected instructions.
4882 if (MI.mayLoadOrStore() || MI.hasUnmodeledSideEffects() || MI.isCall())
4883 break;
4884 }
4885
4886 // New code will be inserted after the last tagging instruction we've found.
4887 MachineBasicBlock::iterator InsertI = Instrs.back().MI;
4888
4889 // All the gathered stack tag instructions are merged and placed after the
4890 // last tag store in the list. We must check whether the nzcv flag is live
4891 // at the point where we are about to insert; otherwise the merged code
4892 // (which may contain STG loops) could clobber it.
4893
4894 // FIXME: Bailing out here is conservative: the liveness check is performed
4895 // even when the merged sequence would contain no STG loops, in which case
4896 // it is not needed.
4897 LivePhysRegs LiveRegs(*(MBB->getParent()->getSubtarget().getRegisterInfo()));
4898 LiveRegs.addLiveOuts(*MBB);
4899 for (auto I = MBB->rbegin();; ++I) {
4900 MachineInstr &MI = *I;
4901 if (MI == InsertI)
4902 break;
4903 LiveRegs.stepBackward(*I);
4904 }
4905 InsertI++;
4906 if (LiveRegs.contains(AArch64::NZCV))
4907 return InsertI;
4908
4909 llvm::stable_sort(Instrs,
4910 [](const TagStoreInstr &Left, const TagStoreInstr &Right) {
4911 return Left.Offset < Right.Offset;
4912 });
4913
4914 // Make sure that we don't have any overlapping stores.
4915 int64_t CurOffset = Instrs[0].Offset;
4916 for (auto &Instr : Instrs) {
4917 if (CurOffset > Instr.Offset)
4918 return NextI;
4919 CurOffset = Instr.Offset + Instr.Size;
4920 }
4921
4922 // Find contiguous runs of tagged memory and emit shorter instruction
4923 // sequences for them when possible.
4924 TagStoreEdit TSE(MBB, FirstZeroData);
4925 std::optional<int64_t> EndOffset;
4926 for (auto &Instr : Instrs) {
4927 if (EndOffset && *EndOffset != Instr.Offset) {
4928 // Found a gap.
4929 TSE.emitCode(InsertI, TFI, /*TryMergeSPUpdate = */ false);
4930 TSE.clear();
4931 }
4932
4933 TSE.addInstruction(Instr);
4934 EndOffset = Instr.Offset + Instr.Size;
4935 }
4936
4937 const MachineFunction *MF = MBB->getParent();
4938 // Multiple FP/SP updates in a loop cannot be described by CFI instructions.
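// (For that reason, the SP-update fold is only requested below when the
// function does not need asynchronous DWARF unwind info.)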
4939 TSE.emitCode(
4940 InsertI, TFI, /*TryMergeSPUpdate = */
4941 !MF->getInfo<AArch64FunctionInfo>()->needsAsyncDwarfUnwindInfo(*MF));
4942
4943 return InsertI;
4944}
4945} // namespace
4946
4947 MachineBasicBlock::iterator emitVGSaveRestore(MachineBasicBlock::iterator II,
4948 const AArch64FrameLowering *TFI) {
4949 MachineInstr &MI = *II;
4950 MachineBasicBlock *MBB = MI.getParent();
4951 MachineFunction *MF = MBB->getParent();
4952
4953 if (MI.getOpcode() != AArch64::VGSavePseudo &&
4954 MI.getOpcode() != AArch64::VGRestorePseudo)
4955 return II;
4956
4957 SMEAttrs FuncAttrs(MF->getFunction());
4958 bool LocallyStreaming =
4959 FuncAttrs.hasStreamingBody() && !FuncAttrs.hasStreamingInterface();
4960 const AArch64FunctionInfo *AFI = MF->getInfo<AArch64FunctionInfo>();
4961 const TargetRegisterInfo *TRI = MF->getSubtarget().getRegisterInfo();
4962 const AArch64InstrInfo *TII =
4963 MF->getSubtarget<AArch64Subtarget>().getInstrInfo();
4964
4965 int64_t VGFrameIdx =
4966 LocallyStreaming ? AFI->getStreamingVGIdx() : AFI->getVGIdx();
4967 assert(VGFrameIdx != std::numeric_limits<int>::max() &&
4968 "Expected FrameIdx for VG");
4969
4970 unsigned CFIIndex;
4971 if (MI.getOpcode() == AArch64::VGSavePseudo) {
4972 const MachineFrameInfo &MFI = MF->getFrameInfo();
4973 int64_t Offset =
4974 MFI.getObjectOffset(VGFrameIdx) - TFI->getOffsetOfLocalArea();
4975 CFIIndex = MF->addFrameInst(MCCFIInstruction::createOffset(
4976 nullptr, TRI->getDwarfRegNum(AArch64::VG, true), Offset));
4977 } else
4978 CFIIndex = MF->addFrameInst(MCCFIInstruction::createRestore(
4979 nullptr, TRI->getDwarfRegNum(AArch64::VG, true)));
4980
4981 MachineInstr *UnwindInst = BuildMI(*MBB, II, II->getDebugLoc(),
4982 TII->get(TargetOpcode::CFI_INSTRUCTION))
4983 .addCFIIndex(CFIIndex);
4984
4985 MI.eraseFromParent();
4986 return UnwindInst->getIterator();
4987}
4988
4989 void AArch64FrameLowering::processFunctionBeforeFrameIndicesReplaced(
4990 MachineFunction &MF, RegScavenger *RS = nullptr) const {
4991 for (auto &BB : MF)
4992 for (MachineBasicBlock::iterator II = BB.begin(); II != BB.end();) {
4993 if (requiresSaveVG(MF))
4994 II = emitVGSaveRestore(II, this);
4995 if (StackTaggingMergeSetTag)
4996 II = tryMergeAdjacentSTG(II, this, RS);
4997 }
4998}
4999
5000/// For Win64 AArch64 EH, the offset to the Unwind object is from the SP
5001/// before the update. This is easily retrieved as it is exactly the offset
5002/// that is set in processFunctionBeforeFrameFinalized.
5003 StackOffset AArch64FrameLowering::getFrameIndexReferencePreferSP(
5004 const MachineFunction &MF, int FI, Register &FrameReg,
5005 bool IgnoreSPUpdates) const {
5006 const MachineFrameInfo &MFI = MF.getFrameInfo();
5007 if (IgnoreSPUpdates) {
5008 LLVM_DEBUG(dbgs() << "Offset from the SP for " << FI << " is "
5009 << MFI.getObjectOffset(FI) << "\n");
5010 FrameReg = AArch64::SP;
5011 return StackOffset::getFixed(MFI.getObjectOffset(FI));
5012 }
5013
5014 // Go to common code if we cannot provide sp + offset.
5015 if (MFI.hasVarSizedObjects() ||
5016 MF.getInfo<AArch64FunctionInfo>()->getStackSizeSVE() ||
5017 MF.getSubtarget().getRegisterInfo()->hasStackRealignment(MF))
5018 return getFrameIndexReference(MF, FI, FrameReg);
5019
5020 FrameReg = AArch64::SP;
5021 return getStackOffset(MF, MFI.getObjectOffset(FI));
5022}
5023
5024/// The parent frame offset (aka dispFrame) is only used on X86_64 to retrieve
5025/// the parent's frame pointer
5026 unsigned AArch64FrameLowering::getWinEHParentFrameOffset(
5027 const MachineFunction &MF) const {
5028 return 0;
5029}
5030
5031/// Funclets only need to account for space for the callee saved registers,
5032/// as the locals are accounted for in the parent's stack frame.
5033 unsigned AArch64FrameLowering::getWinEHFuncletFrameSize(
5034 const MachineFunction &MF) const {
5035 // This is the size of the pushed CSRs.
5036 unsigned CSSize =
5037 MF.getInfo<AArch64FunctionInfo>()->getCalleeSavedStackSize();
5038 // This is the amount of stack a funclet needs to allocate.
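// For example (illustrative numbers only): 64 bytes of pushed CSRs plus a
// 48-byte maximum outgoing call frame gives 112 bytes, which is already
// 16-byte aligned.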
5039 return alignTo(CSSize + MF.getFrameInfo().getMaxCallFrameSize(),
5040 getStackAlign());
5041}
5042
5043namespace {
5044struct FrameObject {
5045 bool IsValid = false;
5046 // Index of the object in MFI.
5047 int ObjectIndex = 0;
5048 // Group ID this object belongs to.
5049 int GroupIndex = -1;
5050 // This object should be placed first (closest to SP).
5051 bool ObjectFirst = false;
5052 // This object's group (which always contains the object with
5053 // ObjectFirst==true) should be placed first.
5054 bool GroupFirst = false;
5055
5056 // Used to distinguish between FP and GPR accesses. The values are decided so
5057 // that they sort FPR < Hazard < GPR and they can be or'd together.
5058 unsigned Accesses = 0;
5059 enum { AccessFPR = 1, AccessHazard = 2, AccessGPR = 4 };
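// For example, an object accessed by both FPR and GPR instructions ends up
// with Accesses == (AccessFPR | AccessGPR); when a hazard slot exists, such
// mixed (or unknown) objects are reclassified as GPR later in
// orderFrameObjects().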
5060};
5061
5062class GroupBuilder {
5063 SmallVector<int, 8> CurrentMembers;
5064 int NextGroupIndex = 0;
5065 std::vector<FrameObject> &Objects;
5066
5067public:
5068 GroupBuilder(std::vector<FrameObject> &Objects) : Objects(Objects) {}
5069 void AddMember(int Index) { CurrentMembers.push_back(Index); }
5070 void EndCurrentGroup() {
5071 if (CurrentMembers.size() > 1) {
5072 // Create a new group with the current member list. This might remove them
5073 // from their pre-existing groups. That's OK, dealing with overlapping
5074 // groups is too hard and unlikely to make a difference.
5075 LLVM_DEBUG(dbgs() << "group:");
5076 for (int Index : CurrentMembers) {
5077 Objects[Index].GroupIndex = NextGroupIndex;
5078 LLVM_DEBUG(dbgs() << " " << Index);
5079 }
5080 LLVM_DEBUG(dbgs() << "\n");
5081 NextGroupIndex++;
5082 }
5083 CurrentMembers.clear();
5084 }
5085};
5086
5087bool FrameObjectCompare(const FrameObject &A, const FrameObject &B) {
5088 // Objects at a lower index are closer to FP; objects at a higher index are
5089 // closer to SP.
5090 //
5091 // For consistency in our comparison, all invalid objects are placed
5092 // at the end. This also allows us to stop walking when we hit the
5093 // first invalid item after it's all sorted.
5094 //
5095 // If we want to include a stack hazard region, order FPR accesses < the
5096 // hazard object < GPR accesses in order to create a separation between the
5097 // two. For the Accesses field 1 = FPR, 2 = Hazard Object, 4 = GPR.
5098 //
5099 // Otherwise the "first" object goes first (closest to SP), followed by the
5100 // members of the "first" group.
5101 //
5102 // The rest are sorted by the group index to keep the groups together.
5103 // Higher numbered groups are more likely to be around longer (i.e. untagged
5104 // in the function epilogue and not at some earlier point). Place them closer
5105 // to SP.
5106 //
5107 // If all else equal, sort by the object index to keep the objects in the
5108 // original order.
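// For example, when a hazard slot is present, every FPR-accessed object
// (Accesses == AccessFPR) sorts before the hazard slot, which in turn sorts
// before every GPR-accessed object, so the hazard padding separates the two
// classes.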
5109 return std::make_tuple(!A.IsValid, A.Accesses, A.ObjectFirst, A.GroupFirst,
5110 A.GroupIndex, A.ObjectIndex) <
5111 std::make_tuple(!B.IsValid, B.Accesses, B.ObjectFirst, B.GroupFirst,
5112 B.GroupIndex, B.ObjectIndex);
5113}
5114} // namespace
5115
5116 void AArch64FrameLowering::orderFrameObjects(
5117 const MachineFunction &MF, SmallVectorImpl<int> &ObjectsToAllocate) const {
5118 if (!OrderFrameObjects || ObjectsToAllocate.empty())
5119 return;
5120
5121 const AArch64FunctionInfo &AFI = *MF.getInfo<AArch64FunctionInfo>();
5122 const MachineFrameInfo &MFI = MF.getFrameInfo();
5123 std::vector<FrameObject> FrameObjects(MFI.getObjectIndexEnd());
5124 for (auto &Obj : ObjectsToAllocate) {
5125 FrameObjects[Obj].IsValid = true;
5126 FrameObjects[Obj].ObjectIndex = Obj;
5127 }
5128
5129 // Identify FPR vs GPR slots for hazards, and stack slots that are tagged at
5130 // the same time.
5131 GroupBuilder GB(FrameObjects);
5132 for (auto &MBB : MF) {
5133 for (auto &MI : MBB) {
5134 if (MI.isDebugInstr())
5135 continue;
5136
5137 if (AFI.hasStackHazardSlotIndex()) {
5138 std::optional<int> FI = getLdStFrameID(MI, MFI);
5139 if (FI && *FI >= 0 && *FI < (int)FrameObjects.size()) {
5140 if (MFI.getStackID(*FI) == TargetStackID::ScalableVector ||
5141 AArch64InstrInfo::isFpOrNEON(MI))
5142 FrameObjects[*FI].Accesses |= FrameObject::AccessFPR;
5143 else
5144 FrameObjects[*FI].Accesses |= FrameObject::AccessGPR;
5145 }
5146 }
5147
5148 int OpIndex;
5149 switch (MI.getOpcode()) {
5150 case AArch64::STGloop:
5151 case AArch64::STZGloop:
5152 OpIndex = 3;
5153 break;
5154 case AArch64::STGi:
5155 case AArch64::STZGi:
5156 case AArch64::ST2Gi:
5157 case AArch64::STZ2Gi:
5158 OpIndex = 1;
5159 break;
5160 default:
5161 OpIndex = -1;
5162 }
5163
5164 int TaggedFI = -1;
5165 if (OpIndex >= 0) {
5166 const MachineOperand &MO = MI.getOperand(OpIndex);
5167 if (MO.isFI()) {
5168 int FI = MO.getIndex();
5169 if (FI >= 0 && FI < MFI.getObjectIndexEnd() &&
5170 FrameObjects[FI].IsValid)
5171 TaggedFI = FI;
5172 }
5173 }
5174
5175 // If this is a stack tagging instruction for a slot that is not part of a
5176 // group yet, either start a new group or add it to the current one.
5177 if (TaggedFI >= 0)
5178 GB.AddMember(TaggedFI);
5179 else
5180 GB.EndCurrentGroup();
5181 }
5182 // Groups should never span multiple basic blocks.
5183 GB.EndCurrentGroup();
5184 }
5185
5186 if (AFI.hasStackHazardSlotIndex()) {
5187 FrameObjects[AFI.getStackHazardSlotIndex()].Accesses =
5188 FrameObject::AccessHazard;
5189 // If a stack object is unknown or both GPR and FPR, sort it into GPR.
5190 for (auto &Obj : FrameObjects)
5191 if (!Obj.Accesses ||
5192 Obj.Accesses == (FrameObject::AccessGPR | FrameObject::AccessFPR))
5193 Obj.Accesses = FrameObject::AccessGPR;
5194 }
5195
5196 // If the function's tagged base pointer is pinned to a stack slot, we want to
5197 // put that slot first when possible. This will likely place it at SP + 0,
5198 // and save one instruction when generating the base pointer because IRG does
5199 // not allow an immediate offset.
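// (IRG takes only register operands, so a slot at SP + 0 can be materialized
// as "irg xN, sp" directly, without first computing sp + offset into a
// scratch register.)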
5200 std::optional<int> TBPI = AFI.getTaggedBasePointerIndex();
5201 if (TBPI) {
5202 FrameObjects[*TBPI].ObjectFirst = true;
5203 FrameObjects[*TBPI].GroupFirst = true;
5204 int FirstGroupIndex = FrameObjects[*TBPI].GroupIndex;
5205 if (FirstGroupIndex >= 0)
5206 for (FrameObject &Object : FrameObjects)
5207 if (Object.GroupIndex == FirstGroupIndex)
5208 Object.GroupFirst = true;
5209 }
5210
5211 llvm::stable_sort(FrameObjects, FrameObjectCompare);
5212
5213 int i = 0;
5214 for (auto &Obj : FrameObjects) {
5215 // All invalid items are sorted at the end, so it's safe to stop.
5216 if (!Obj.IsValid)
5217 break;
5218 ObjectsToAllocate[i++] = Obj.ObjectIndex;
5219 }
5220
5221 LLVM_DEBUG({
5222 dbgs() << "Final frame order:\n";
5223 for (auto &Obj : FrameObjects) {
5224 if (!Obj.IsValid)
5225 break;
5226 dbgs() << " " << Obj.ObjectIndex << ": group " << Obj.GroupIndex;
5227 if (Obj.ObjectFirst)
5228 dbgs() << ", first";
5229 if (Obj.GroupFirst)
5230 dbgs() << ", group-first";
5231 dbgs() << "\n";
5232 }
5233 });
5234}
5235
5236/// Emit a loop to decrement SP until it is equal to TargetReg, with probes at
5237/// least every ProbeSize bytes. Returns an iterator of the first instruction
5238/// after the loop. The difference between SP and TargetReg must be an exact
5239/// multiple of ProbeSize.
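// Roughly, the emitted loop is (branch condition assumed to be NE, i.e. loop
// until SP == TargetReg):
//   Loop: sub sp, sp, #ProbeSize
//         str xzr, [sp]
//         cmp sp, <TargetReg>
//         b.ne Loop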
5240 MachineBasicBlock::iterator
5241 AArch64FrameLowering::inlineStackProbeLoopExactMultiple(
5242 MachineBasicBlock::iterator MBBI, int64_t ProbeSize,
5243 Register TargetReg) const {
5244 MachineBasicBlock &MBB = *MBBI->getParent();
5245 MachineFunction &MF = *MBB.getParent();
5246 const AArch64InstrInfo *TII =
5247 MF.getSubtarget<AArch64Subtarget>().getInstrInfo();
5248 DebugLoc DL = MBB.findDebugLoc(MBBI);
5249
5250 MachineFunction::iterator MBBInsertPoint = std::next(MBB.getIterator());
5251 MachineBasicBlock *LoopMBB = MF.CreateMachineBasicBlock(MBB.getBasicBlock());
5252 MF.insert(MBBInsertPoint, LoopMBB);
5253 MachineBasicBlock *ExitMBB = MF.CreateMachineBasicBlock(MBB.getBasicBlock());
5254 MF.insert(MBBInsertPoint, ExitMBB);
5255
5256 // SUB SP, SP, #ProbeSize (or equivalent if ProbeSize is not encodable
5257 // in SUB).
5258 emitFrameOffset(*LoopMBB, LoopMBB->end(), DL, AArch64::SP, AArch64::SP,
5259 StackOffset::getFixed(-ProbeSize), TII,
5260 MachineInstr::FrameSetup);
5261 // STR XZR, [SP]
5262 BuildMI(*LoopMBB, LoopMBB->end(), DL, TII->get(AArch64::STRXui))
5263 .addReg(AArch64::XZR)
5264 .addReg(AArch64::SP)
5265 .addImm(0)
5266 .setMIFlags(MachineInstr::FrameSetup);
5267 // CMP SP, TargetReg
5268 BuildMI(*LoopMBB, LoopMBB->end(), DL, TII->get(AArch64::SUBSXrx64),
5269 AArch64::XZR)
5270 .addReg(AArch64::SP)
5271 .addReg(TargetReg)
5272 .addImm(AArch64_AM::getArithExtendImm(AArch64_AM::UXTX, 0))
5273 .setMIFlags(MachineInstr::FrameSetup);
5274 // B.CC Loop
5275 BuildMI(*LoopMBB, LoopMBB->end(), DL, TII->get(AArch64::Bcc))
5276 .addImm(AArch64CC::NE)
5277 .addMBB(LoopMBB)
5278 .setMIFlags(MachineInstr::FrameSetup);
5279
5280 LoopMBB->addSuccessor(ExitMBB);
5281 LoopMBB->addSuccessor(LoopMBB);
5282 // Synthesize the exit MBB.
5283 ExitMBB->splice(ExitMBB->end(), &MBB, MBBI, MBB.end());
5284 ExitMBB->transferSuccessorsAndUpdatePHIs(&MBB);
5285 MBB.addSuccessor(LoopMBB);
5286 // Update liveins.
5287 fullyRecomputeLiveIns({ExitMBB, LoopMBB});
5288
5289 return ExitMBB->begin();
5290}
5291
5292void AArch64FrameLowering::inlineStackProbeFixed(
5293 MachineBasicBlock::iterator MBBI, Register ScratchReg, int64_t FrameSize,
5294 StackOffset CFAOffset) const {
5295 MachineBasicBlock *MBB = MBBI->getParent();
5296 MachineFunction &MF = *MBB->getParent();
5297 const AArch64InstrInfo *TII =
5298 MF.getSubtarget<AArch64Subtarget>().getInstrInfo();
5299 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
5300 bool EmitAsyncCFI = AFI->needsAsyncDwarfUnwindInfo(MF);
5301 bool HasFP = hasFP(MF);
5302
5303 DebugLoc DL;
5304 int64_t ProbeSize = MF.getInfo<AArch64FunctionInfo>()->getStackProbeSize();
5305 int64_t NumBlocks = FrameSize / ProbeSize;
5306 int64_t ResidualSize = FrameSize % ProbeSize;
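// For example, FrameSize = 10000 with ProbeSize = 4096 (a typical probe size)
// gives NumBlocks = 2 and ResidualSize = 1808.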
5307
5308 LLVM_DEBUG(dbgs() << "Stack probing: total " << FrameSize << " bytes, "
5309 << NumBlocks << " blocks of " << ProbeSize
5310 << " bytes, plus " << ResidualSize << " bytes\n");
5311
5312 // Decrement SP by NumBlocks * ProbeSize bytes, using either an unrolled
5313 // sequence or an ordinary loop.
5314 if (NumBlocks <= AArch64::StackProbeMaxLoopUnroll) {
5315 for (int i = 0; i < NumBlocks; ++i) {
5316 // SUB SP, SP, #ProbeSize (or equivalent if ProbeSize is not
5317 // encodable in a SUB).
5318 emitFrameOffset(*MBB, MBBI, DL, AArch64::SP, AArch64::SP,
5319 StackOffset::getFixed(-ProbeSize), TII,
5320 MachineInstr::FrameSetup, false, false, nullptr,
5321 EmitAsyncCFI && !HasFP, CFAOffset);
5322 CFAOffset += StackOffset::getFixed(ProbeSize);
5323 // STR XZR, [SP]
5324 BuildMI(*MBB, MBBI, DL, TII->get(AArch64::STRXui))
5325 .addReg(AArch64::XZR)
5326 .addReg(AArch64::SP)
5327 .addImm(0)
5328 .setMIFlags(MachineInstr::FrameSetup);
5329 }
5330 } else if (NumBlocks != 0) {
5331 // SUB ScratchReg, SP, #FrameSize (or equivalent if FrameSize is not
5332 // encodable in ADD). ScratchReg may temporarily become the CFA register.
5333 emitFrameOffset(*MBB, MBBI, DL, ScratchReg, AArch64::SP,
5334 StackOffset::getFixed(-ProbeSize * NumBlocks), TII,
5335 MachineInstr::FrameSetup, false, false, nullptr,
5336 EmitAsyncCFI && !HasFP, CFAOffset);
5337 CFAOffset += StackOffset::getFixed(ProbeSize * NumBlocks);
5338 MBBI = inlineStackProbeLoopExactMultiple(MBBI, ProbeSize, ScratchReg);
5339 MBB = MBBI->getParent();
5340 if (EmitAsyncCFI && !HasFP) {
5341 // Set the CFA register back to SP.
5342 const AArch64RegisterInfo &RegInfo =
5343 *MF.getSubtarget<AArch64Subtarget>().getRegisterInfo();
5344 unsigned Reg = RegInfo.getDwarfRegNum(AArch64::SP, true);
5345 unsigned CFIIndex =
5346 MF.addFrameInst(MCCFIInstruction::createDefCfaRegister(nullptr, Reg));
5347 BuildMI(*MBB, MBBI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
5348 .addCFIIndex(CFIIndex)
5349 .setMIFlags(MachineInstr::FrameSetup);
5350 }
5351 }
5352
5353 if (ResidualSize != 0) {
5354 // SUB SP, SP, #ResidualSize (or equivalent if ResidualSize is not encodable
5355 // in SUB).
5356 emitFrameOffset(*MBB, MBBI, DL, AArch64::SP, AArch64::SP,
5357 StackOffset::getFixed(-ResidualSize), TII,
5358 MachineInstr::FrameSetup, false, false, nullptr,
5359 EmitAsyncCFI && !HasFP, CFAOffset);
5360 if (ResidualSize > AArch64::StackProbeMaxUnprobedStack) {
5361 // STR XZR, [SP]
5362 BuildMI(*MBB, MBBI, DL, TII->get(AArch64::STRXui))
5363 .addReg(AArch64::XZR)
5364 .addReg(AArch64::SP)
5365 .addImm(0)
5366 .setMIFlags(MachineInstr::FrameSetup);
5367 }
5368 }
5369}
5370
5371void AArch64FrameLowering::inlineStackProbe(MachineFunction &MF,
5372 MachineBasicBlock &MBB) const {
5373 // Get the instructions that need to be replaced. We emit at most two of
5374 // these. Remember them in order to avoid complications coming from the need
5375 // to traverse the block while potentially creating more blocks.
5376 SmallVector<MachineInstr *, 4> ToReplace;
5377 for (MachineInstr &MI : MBB)
5378 if (MI.getOpcode() == AArch64::PROBED_STACKALLOC ||
5379 MI.getOpcode() == AArch64::PROBED_STACKALLOC_VAR)
5380 ToReplace.push_back(&MI);
5381
5382 for (MachineInstr *MI : ToReplace) {
5383 if (MI->getOpcode() == AArch64::PROBED_STACKALLOC) {
5384 Register ScratchReg = MI->getOperand(0).getReg();
5385 int64_t FrameSize = MI->getOperand(1).getImm();
5386 StackOffset CFAOffset = StackOffset::get(MI->getOperand(2).getImm(),
5387 MI->getOperand(3).getImm());
5388 inlineStackProbeFixed(MI->getIterator(), ScratchReg, FrameSize,
5389 CFAOffset);
5390 } else {
5391 assert(MI->getOpcode() == AArch64::PROBED_STACKALLOC_VAR &&
5392 "Stack probe pseudo-instruction expected");
5393 const AArch64InstrInfo *TII =
5394 MI->getMF()->getSubtarget<AArch64Subtarget>().getInstrInfo();
5395 Register TargetReg = MI->getOperand(0).getReg();
5396 (void)TII->probedStackAlloc(MI->getIterator(), TargetReg, true);
5397 }
5398 MI->eraseFromParent();
5399 }
5400}
5401
5402 struct StackAccess {
5403 enum AccessType {
5404 NotAccessed = 0, // Stack object not accessed by load/store instructions.
5405 GPR = 1 << 0, // A general purpose register.
5406 PPR = 1 << 1, // A predicate register.
5407 FPR = 1 << 2, // A floating point/Neon/SVE register.
5408 };
5409
5410 int Idx;
5411 StackOffset Offset;
5412 int64_t Size;
5413 unsigned AccessTypes;
5414
5415 StackAccess() : Idx(0), Offset(), Size(0), AccessTypes(NotAccessed) {}
5416
5417 bool operator<(const StackAccess &Rhs) const {
5418 return std::make_tuple(start(), Idx) <
5419 std::make_tuple(Rhs.start(), Rhs.Idx);
5420 }
5421
5422 bool isCPU() const {
5423 // Predicate register load and store instructions execute on the CPU.
5424 return AccessTypes & (AccessType::GPR | AccessType::PPR);
5425 }
5426 bool isSME() const { return AccessTypes & AccessType::FPR; }
5427 bool isMixed() const { return isCPU() && isSME(); }
5428
5429 int64_t start() const { return Offset.getFixed() + Offset.getScalable(); }
5430 int64_t end() const { return start() + Size; }
5431
5432 std::string getTypeString() const {
5433 switch (AccessTypes) {
5434 case AccessType::FPR:
5435 return "FPR";
5436 case AccessType::PPR:
5437 return "PPR";
5438 case AccessType::GPR:
5439 return "GPR";
5440 case AccessType::NotAccessed:
5441 return "NA";
5442 default:
5443 return "Mixed";
5444 }
5445 }
5446
5447 void print(raw_ostream &OS) const {
5448 OS << getTypeString() << " stack object at [SP"
5449 << (Offset.getFixed() < 0 ? "" : "+") << Offset.getFixed();
5450 if (Offset.getScalable())
5451 OS << (Offset.getScalable() < 0 ? "" : "+") << Offset.getScalable()
5452 << " * vscale";
5453 OS << "]";
5454 }
5455};
5456
5457static inline raw_ostream &operator<<(raw_ostream &OS, const StackAccess &SA) {
5458 SA.print(OS);
5459 return OS;
5460}
5461
5462void AArch64FrameLowering::emitRemarks(
5463 const MachineFunction &MF, MachineOptimizationRemarkEmitter *ORE) const {
5464
5465 SMEAttrs Attrs(MF.getFunction());
5466 if (Attrs.hasNonStreamingInterfaceAndBody())
5467 return;
5468
5469 unsigned StackHazardSize = getStackHazardSize(MF);
5470 const uint64_t HazardSize =
5471 (StackHazardSize) ? StackHazardSize : StackHazardRemarkSize;
5472
5473 if (HazardSize == 0)
5474 return;
5475
5476 const MachineFrameInfo &MFI = MF.getFrameInfo();
5477 // Bail if function has no stack objects.
5478 if (!MFI.hasStackObjects())
5479 return;
5480
5481 std::vector<StackAccess> StackAccesses(MFI.getNumObjects());
5482
5483 size_t NumFPLdSt = 0;
5484 size_t NumNonFPLdSt = 0;
5485
5486 // Collect stack accesses via Load/Store instructions.
5487 for (const MachineBasicBlock &MBB : MF) {
5488 for (const MachineInstr &MI : MBB) {
5489 if (!MI.mayLoadOrStore() || MI.getNumMemOperands() < 1)
5490 continue;
5491 for (MachineMemOperand *MMO : MI.memoperands()) {
5492 std::optional<int> FI = getMMOFrameID(MMO, MFI);
5493 if (FI && !MFI.isDeadObjectIndex(*FI)) {
5494 int FrameIdx = *FI;
5495
5496 size_t ArrIdx = FrameIdx + MFI.getNumFixedObjects();
5497 if (StackAccesses[ArrIdx].AccessTypes == StackAccess::NotAccessed) {
5498 StackAccesses[ArrIdx].Idx = FrameIdx;
5499 StackAccesses[ArrIdx].Offset =
5500 getFrameIndexReferenceFromSP(MF, FrameIdx);
5501 StackAccesses[ArrIdx].Size = MFI.getObjectSize(FrameIdx);
5502 }
5503
5504 unsigned RegTy = StackAccess::AccessType::GPR;
5505 if (MFI.getStackID(FrameIdx) == TargetStackID::ScalableVector) {
5506 // SPILL_PPR_TO_ZPR_SLOT_PSEUDO and FILL_PPR_FROM_ZPR_SLOT_PSEUDO
5507 // spill/fill the predicate as a data vector (so they count as an FPR access).
5508 if (MI.getOpcode() != AArch64::SPILL_PPR_TO_ZPR_SLOT_PSEUDO &&
5509 MI.getOpcode() != AArch64::FILL_PPR_FROM_ZPR_SLOT_PSEUDO &&
5510 AArch64::PPRRegClass.contains(MI.getOperand(0).getReg())) {
5511 RegTy = StackAccess::PPR;
5512 } else
5513 RegTy = StackAccess::FPR;
5514 } else if (AArch64InstrInfo::isFpOrNEON(MI)) {
5515 RegTy = StackAccess::FPR;
5516 }
5517
5518 StackAccesses[ArrIdx].AccessTypes |= RegTy;
5519
5520 if (RegTy == StackAccess::FPR)
5521 ++NumFPLdSt;
5522 else
5523 ++NumNonFPLdSt;
5524 }
5525 }
5526 }
5527 }
5528
5529 if (NumFPLdSt == 0 || NumNonFPLdSt == 0)
5530 return;
5531
5532 llvm::sort(StackAccesses);
5533 StackAccesses.erase(llvm::remove_if(StackAccesses,
5534 [](const StackAccess &S) {
5535 return S.AccessTypes ==
5536 StackAccess::NotAccessed;
5537 }),
5538 StackAccesses.end());
5539
5540 SmallVector<const StackAccess *> MixedObjects;
5541 SmallVector<std::pair<const StackAccess *, const StackAccess *>> HazardPairs;
5542
5543 if (StackAccesses.front().isMixed())
5544 MixedObjects.push_back(&StackAccesses.front());
5545
5546 for (auto It = StackAccesses.begin(), End = std::prev(StackAccesses.end());
5547 It != End; ++It) {
5548 const auto &First = *It;
5549 const auto &Second = *(It + 1);
5550
5551 if (Second.isMixed())
5552 MixedObjects.push_back(&Second);
5553
5554 if ((First.isSME() && Second.isCPU()) ||
5555 (First.isCPU() && Second.isSME())) {
5556 uint64_t Distance = static_cast<uint64_t>(Second.start() - First.end());
5557 if (Distance < HazardSize)
5558 HazardPairs.emplace_back(&First, &Second);
5559 }
5560 }
5561
5562 auto EmitRemark = [&](llvm::StringRef Str) {
5563 ORE->emit([&]() {
5565 "sme", "StackHazard", MF.getFunction().getSubprogram(), &MF.front());
5566 return R << formatv("stack hazard in '{0}': ", MF.getName()).str() << Str;
5567 });
5568 };
5569
5570 for (const auto &P : HazardPairs)
5571 EmitRemark(formatv("{0} is too close to {1}", *P.first, *P.second).str());
5572
5573 for (const auto *Obj : MixedObjects)
5574 EmitRemark(
5575 formatv("{0} accessed by both GP and FP instructions", *Obj).str());
5576}
unsigned const MachineRegisterInfo * MRI
#define Success
for(const MachineOperand &MO :llvm::drop_begin(OldMI.operands(), Desc.getNumOperands()))
static int64_t getArgumentStackToRestore(MachineFunction &MF, MachineBasicBlock &MBB)
Returns how much of the incoming argument stack area (in bytes) we should clean up in an epilogue.
static void emitShadowCallStackEpilogue(const TargetInstrInfo &TII, MachineFunction &MF, MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI, const DebugLoc &DL)
static void getLiveRegsForEntryMBB(LivePhysRegs &LiveRegs, const MachineBasicBlock &MBB)
static void emitCalleeSavedRestores(MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI, bool SVE)
static void computeCalleeSaveRegisterPairs(MachineFunction &MF, ArrayRef< CalleeSavedInfo > CSI, const TargetRegisterInfo *TRI, SmallVectorImpl< RegPairInfo > &RegPairs, bool NeedsFrameRecord)
static const unsigned DefaultSafeSPDisplacement
This is the biggest offset to the stack pointer we can encode in aarch64 instructions (without using ...
static void emitDefineCFAWithFP(MachineFunction &MF, MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI, const DebugLoc &DL, unsigned FixedObject)
static bool needsWinCFI(const MachineFunction &MF)
static bool isInPrologueOrEpilogue(const MachineInstr &MI)
static void insertCFISameValue(const MCInstrDesc &Desc, MachineFunction &MF, MachineBasicBlock &MBB, MachineBasicBlock::iterator InsertPt, unsigned DwarfReg)
static cl::opt< bool > StackTaggingMergeSetTag("stack-tagging-merge-settag", cl::desc("merge settag instruction in function epilog"), cl::init(true), cl::Hidden)
bool requiresGetVGCall(MachineFunction &MF)
bool enableMultiVectorSpillFill(const AArch64Subtarget &Subtarget, MachineFunction &MF)
bool isVGInstruction(MachineBasicBlock::iterator MBBI)
static std::optional< int > getLdStFrameID(const MachineInstr &MI, const MachineFrameInfo &MFI)
static bool produceCompactUnwindFrame(MachineFunction &MF)
static cl::opt< bool > StackHazardInNonStreaming("aarch64-stack-hazard-in-non-streaming", cl::init(false), cl::Hidden)
static int64_t determineSVEStackObjectOffsets(MachineFrameInfo &MFI, int &MinCSFrameIndex, int &MaxCSFrameIndex, bool AssignOffsets)
static cl::opt< bool > OrderFrameObjects("aarch64-order-frame-objects", cl::desc("sort stack allocations"), cl::init(true), cl::Hidden)
static bool windowsRequiresStackProbe(MachineFunction &MF, uint64_t StackSizeInBytes)
static void fixupCalleeSaveRestoreStackOffset(MachineInstr &MI, uint64_t LocalStackSize, bool NeedsWinCFI, bool *HasWinCFI)
static bool invalidateWindowsRegisterPairing(unsigned Reg1, unsigned Reg2, bool NeedsWinCFI, bool IsFirst, const TargetRegisterInfo *TRI)
static MachineBasicBlock::iterator convertCalleeSaveRestoreToSPPrePostIncDec(MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI, const DebugLoc &DL, const TargetInstrInfo *TII, int CSStackSizeInc, bool NeedsWinCFI, bool *HasWinCFI, bool EmitCFI, MachineInstr::MIFlag FrameFlag=MachineInstr::FrameSetup, int CFAOffset=0)
static void fixupSEHOpcode(MachineBasicBlock::iterator MBBI, unsigned LocalStackSize)
static StackOffset getSVEStackSize(const MachineFunction &MF)
Returns the size of the entire SVE stackframe (calleesaves + spills).
static cl::opt< bool > DisableMultiVectorSpillFill("aarch64-disable-multivector-spill-fill", cl::desc("Disable use of LD/ST pairs for SME2 or SVE2p1"), cl::init(false), cl::Hidden)
static cl::opt< bool > EnableRedZone("aarch64-redzone", cl::desc("enable use of redzone on AArch64"), cl::init(false), cl::Hidden)
static MachineBasicBlock::iterator InsertSEH(MachineBasicBlock::iterator MBBI, const TargetInstrInfo &TII, MachineInstr::MIFlag Flag)
static Register findScratchNonCalleeSaveRegister(MachineBasicBlock *MBB)
static Register tryScavengeRegister(LiveRegUnits const &UsedRegs, BitVector const &ScavengeableRegs)
Attempts to scavenge a register from ScavengeableRegs given the used registers in UsedRegs.
static void getLivePhysRegsUpTo(MachineInstr &MI, const TargetRegisterInfo &TRI, LivePhysRegs &LiveRegs)
Collect live registers from the end of MI's parent up to (including) MI in LiveRegs.
cl::opt< bool > EnableHomogeneousPrologEpilog("homogeneous-prolog-epilog", cl::Hidden, cl::desc("Emit homogeneous prologue and epilogue for the size " "optimization (default = off)"))
static bool expandSMEPPRToZPRSpillPseudos(MachineBasicBlock &MBB, const TargetRegisterInfo &TRI, ScavengeableRegs const &SR, EmergencyStackSlots &SpillSlots)
Expands all FILL_PPR_FROM_ZPR_SLOT_PSEUDO and SPILL_PPR_TO_ZPR_SLOT_PSEUDO operations within the Mach...
MachineBasicBlock::iterator emitVGSaveRestore(MachineBasicBlock::iterator II, const AArch64FrameLowering *TFI)
static bool IsSVECalleeSave(MachineBasicBlock::iterator I)
static bool invalidateRegisterPairing(unsigned Reg1, unsigned Reg2, bool UsesWinAAPCS, bool NeedsWinCFI, bool NeedsFrameRecord, bool IsFirst, const TargetRegisterInfo *TRI)
Returns true if Reg1 and Reg2 cannot be paired using a ldp/stp instruction.
unsigned findFreePredicateReg(BitVector &SavedRegs)
static unsigned getPrologueDeath(MachineFunction &MF, unsigned Reg)
static bool expandFillPPRFromZPRSlotPseudo(MachineBasicBlock &MBB, MachineInstr &MI, const TargetRegisterInfo &TRI, LiveRegUnits const &UsedRegs, ScavengeableRegs const &SR, EmergencyStackSlots &SpillSlots)
Expands:
static void expandSpillPPRToZPRSlotPseudo(MachineBasicBlock &MBB, MachineInstr &MI, const TargetRegisterInfo &TRI, LiveRegUnits const &UsedRegs, ScavengeableRegs const &SR, EmergencyStackSlots &SpillSlots)
Expands:
static StackOffset getFPOffset(const MachineFunction &MF, int64_t ObjectOffset)
static bool isTargetWindows(const MachineFunction &MF)
static StackOffset getStackOffset(const MachineFunction &MF, int64_t ObjectOffset)
static int64_t upperBound(StackOffset Size)
static unsigned estimateRSStackSizeLimit(MachineFunction &MF)
Look at each instruction that references stack frames and return the stack size limit beyond which so...
static bool getSVECalleeSaveSlotRange(const MachineFrameInfo &MFI, int &Min, int &Max)
returns true if there are any SVE callee saves.
static cl::opt< unsigned > StackHazardRemarkSize("aarch64-stack-hazard-remark-size", cl::init(0), cl::Hidden)
static MCRegister getRegisterOrZero(MCRegister Reg, bool HasSVE)
static bool isFuncletReturnInstr(const MachineInstr &MI)
static unsigned getStackHazardSize(const MachineFunction &MF)
static void emitShadowCallStackPrologue(const TargetInstrInfo &TII, MachineFunction &MF, MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI, const DebugLoc &DL, bool NeedsWinCFI, bool NeedsUnwindInfo)
static void propagateFrameFlags(MachineInstr &SourceMI, ArrayRef< MachineInstr * > MachineInstrs)
Propagates frame-setup/destroy flags from SourceMI to all instructions in MachineInstrs.
static std::optional< int > getMMOFrameID(MachineMemOperand *MMO, const MachineFrameInfo &MFI)
static bool requiresSaveVG(MachineFunction &MF)
static unsigned getFixedObjectSize(const MachineFunction &MF, const AArch64FunctionInfo *AFI, bool IsWin64, bool IsFunclet)
Returns the size of the fixed object area (allocated next to sp on entry) On Win64 this may include a...
aarch64 promote const
static const int kSetTagLoopThreshold
MachineBasicBlock & MBB
MachineBasicBlock MachineBasicBlock::iterator DebugLoc DL
MachineBasicBlock MachineBasicBlock::iterator MBBI
This file contains the simple types necessary to represent the attributes associated with functions a...
#define CASE(ATTRNAME, AANAME,...)
static GCRegistry::Add< OcamlGC > B("ocaml", "ocaml 3.10-compatible GC")
static GCRegistry::Add< ErlangGC > A("erlang", "erlang-compatible garbage collector")
Analysis containing CSE Info
Definition: CSEInfo.cpp:27
Returns the sub type a function will return at a given Idx Should correspond to the result type of an ExtractValue instruction executed with just that one unsigned Idx
#define LLVM_DEBUG(...)
Definition: Debug.h:106
uint32_t Index
uint64_t Size
bool End
Definition: ELF_riscv.cpp:480
static const HTTPClientCleanup Cleanup
Definition: HTTPClient.cpp:42
const HexagonInstrInfo * TII
IRTranslator LLVM IR MI
static std::string getTypeString(Type *T)
Definition: LLParser.cpp:71
This file implements the LivePhysRegs utility for tracking liveness of physical registers.
#define F(x, y, z)
Definition: MD5.cpp:55
#define I(x, y, z)
Definition: MD5.cpp:58
unsigned const TargetRegisterInfo * TRI
static unsigned getReg(const MCDisassembler *D, unsigned RC, unsigned RegNo)
uint64_t IntrinsicInst * II
#define P(N)
static const MCPhysReg FPR[]
FPR - The set of FP registers that should be allocated for arguments on Darwin and AIX.
if(PassOpts->AAPipeline)
This file declares the machine register scavenger class.
assert(ImpDefSCC.getReg()==AMDGPU::SCC &&ImpDefSCC.isDef())
unsigned OpIndex
raw_pwrite_stream & OS
static bool contains(SmallPtrSetImpl< ConstantExpr * > &Cache, ConstantExpr *Expr, Constant *C)
Definition: Value.cpp:469
This file defines the make_scope_exit function, which executes user-defined cleanup logic at scope ex...
This file defines the SmallVector class.
This file defines the 'Statistic' class, which is designed to be an easy way to expose various metric...
#define STATISTIC(VARNAME, DESC)
Definition: Statistic.h:166
static const unsigned FramePtr
void processFunctionBeforeFrameIndicesReplaced(MachineFunction &MF, RegScavenger *RS) const override
processFunctionBeforeFrameIndicesReplaced - This method is called immediately before MO_FrameIndex op...
MachineBasicBlock::iterator eliminateCallFramePseudoInstr(MachineFunction &MF, MachineBasicBlock &MBB, MachineBasicBlock::iterator I) const override
This method is called during prolog/epilog code insertion to eliminate call frame setup and destroy p...
bool canUseAsPrologue(const MachineBasicBlock &MBB) const override
Check whether or not the given MBB can be used as a prologue for the target.
bool enableStackSlotScavenging(const MachineFunction &MF) const override
Returns true if the stack slot holes in the fixed and callee-save stack area should be used when allo...
bool spillCalleeSavedRegisters(MachineBasicBlock &MBB, MachineBasicBlock::iterator MI, ArrayRef< CalleeSavedInfo > CSI, const TargetRegisterInfo *TRI) const override
spillCalleeSavedRegisters - Issues instruction(s) to spill all callee saved registers and returns tru...
bool restoreCalleeSavedRegisters(MachineBasicBlock &MBB, MachineBasicBlock::iterator MI, MutableArrayRef< CalleeSavedInfo > CSI, const TargetRegisterInfo *TRI) const override
restoreCalleeSavedRegisters - Issues instruction(s) to restore all callee saved registers and returns...
StackOffset getFrameIndexReferenceFromSP(const MachineFunction &MF, int FI) const override
getFrameIndexReferenceFromSP - This method returns the offset from the stack pointer to the slot of t...
StackOffset getNonLocalFrameIndexReference(const MachineFunction &MF, int FI) const override
getNonLocalFrameIndexReference - This method returns the offset used to reference a frame index locat...
TargetStackID::Value getStackIDForScalableVectors() const override
Returns the StackID that scalable vectors should be associated with.
bool hasFPImpl(const MachineFunction &MF) const override
hasFPImpl - Return true if the specified function should have a dedicated frame pointer register.
void emitPrologue(MachineFunction &MF, MachineBasicBlock &MBB) const override
emitProlog/emitEpilog - These methods insert prolog and epilog code into the function.
bool enableCFIFixup(MachineFunction &MF) const override
Returns true if we may need to fix the unwind information for the function.
void resetCFIToInitialState(MachineBasicBlock &MBB) const override
Emit CFI instructions that recreate the state of the unwind information upon fucntion entry.
bool hasReservedCallFrame(const MachineFunction &MF) const override
hasReservedCallFrame - Under normal circumstances, when a frame pointer is not required,...
bool canUseRedZone(const MachineFunction &MF) const
Can this function use the red zone for local allocations.
void processFunctionBeforeFrameFinalized(MachineFunction &MF, RegScavenger *RS) const override
processFunctionBeforeFrameFinalized - This method is called immediately before the specified function...
int getSEHFrameIndexOffset(const MachineFunction &MF, int FI) const
unsigned getWinEHFuncletFrameSize(const MachineFunction &MF) const
Funclets only need to account for space for the callee saved registers, as the locals are accounted f...
void orderFrameObjects(const MachineFunction &MF, SmallVectorImpl< int > &ObjectsToAllocate) const override
Order the symbols in the local stack frame.
void emitEpilogue(MachineFunction &MF, MachineBasicBlock &MBB) const override
void determineCalleeSaves(MachineFunction &MF, BitVector &SavedRegs, RegScavenger *RS) const override
This method determines which of the registers reported by TargetRegisterInfo::getCalleeSavedRegs() sh...
StackOffset getFrameIndexReference(const MachineFunction &MF, int FI, Register &FrameReg) const override
getFrameIndexReference - Provide a base+offset reference to an FI slot for debug info.
StackOffset resolveFrameOffsetReference(const MachineFunction &MF, int64_t ObjectOffset, bool isFixed, bool isSVE, Register &FrameReg, bool PreferFP, bool ForSimm) const
bool assignCalleeSavedSpillSlots(MachineFunction &MF, const TargetRegisterInfo *TRI, std::vector< CalleeSavedInfo > &CSI, unsigned &MinCSFrameIndex, unsigned &MaxCSFrameIndex) const override
assignCalleeSavedSpillSlots - Allows target to override spill slot assignment logic.
StackOffset getFrameIndexReferencePreferSP(const MachineFunction &MF, int FI, Register &FrameReg, bool IgnoreSPUpdates) const override
For Win64 AArch64 EH, the offset to the Unwind object is from the SP before the update.
StackOffset resolveFrameIndexReference(const MachineFunction &MF, int FI, Register &FrameReg, bool PreferFP, bool ForSimm) const
unsigned getWinEHParentFrameOffset(const MachineFunction &MF) const override
The parent frame offset (aka dispFrame) is only used on X86_64 to retrieve the parent's frame pointer...
AArch64FunctionInfo - This class is derived from MachineFunctionInfo and contains private AArch64-spe...
bool needsShadowCallStackPrologueEpilogue(MachineFunction &MF) const
unsigned getCalleeSavedStackSize(const MachineFrameInfo &MFI) const
void setCalleeSaveBaseToFrameRecordOffset(int Offset)
bool shouldSignReturnAddress(const MachineFunction &MF) const
void setPredicateRegForFillSpill(unsigned Reg)
void setStreamingVGIdx(unsigned FrameIdx)
std::optional< int > getTaggedBasePointerIndex() const
bool needsDwarfUnwindInfo(const MachineFunction &MF) const
void setTaggedBasePointerOffset(unsigned Offset)
bool needsAsyncDwarfUnwindInfo(const MachineFunction &MF) const
void setMinMaxSVECSFrameIndex(int Min, int Max)
static bool isTailCallReturnInst(const MachineInstr &MI)
Returns true if MI is one of the TCRETURN* instructions.
static bool isSEHInstruction(const MachineInstr &MI)
Return true if the instructions is a SEH instruciton used for unwinding on Windows.
static bool isFpOrNEON(Register Reg)
Returns whether the physical register is FP or NEON.
bool isReservedReg(const MachineFunction &MF, MCRegister Reg) const
bool hasBasePointer(const MachineFunction &MF) const
bool cannotEliminateFrame(const MachineFunction &MF) const
const AArch64RegisterInfo * getRegisterInfo() const override
bool isNeonAvailable() const
Returns true if the target has NEON and the function at runtime is known to have NEON enabled (e....
const AArch64InstrInfo * getInstrInfo() const override
const AArch64TargetLowering * getTargetLowering() const override
const Triple & getTargetTriple() const
const char * getChkStkName() const
bool isSVEorStreamingSVEAvailable() const
Returns true if the target has access to either the full range of SVE instructions,...
bool isStreaming() const
Returns true if the function has a streaming body.
bool isCallingConvWin64(CallingConv::ID CC, bool IsVarArg) const
bool swiftAsyncContextIsDynamicallySet() const
Return whether FrameLowering should always set the "extended frame present" bit in FP,...
bool hasInlineStackProbe(const MachineFunction &MF) const override
True if stack clash protection is enabled for this functions.
unsigned getRedZoneSize(const Function &F) const
bool supportSwiftError() const override
Return true if the target supports swifterror attribute.
ArrayRef - Represent a constant reference to an array (0 or more elements consecutively in memory),...
Definition: ArrayRef.h:41
size_t size() const
size - Get the array size.
Definition: ArrayRef.h:168
bool empty() const
empty - Check if the array is empty.
Definition: ArrayRef.h:163
bool hasAttrSomewhere(Attribute::AttrKind Kind, unsigned *Index=nullptr) const
Return true if the specified attribute is set for at least one parameter or for the return value.
bool test(unsigned Idx) const
Definition: BitVector.h:461
BitVector & reset()
Definition: BitVector.h:392
size_type count() const
count - Returns the number of bits which are set.
Definition: BitVector.h:162
BitVector & set()
Definition: BitVector.h:351
iterator_range< const_set_bits_iterator > set_bits() const
Definition: BitVector.h:140
The CalleeSavedInfo class tracks the information need to locate where a callee saved register is in t...
A debug info location.
Definition: DebugLoc.h:33
bool hasOptSize() const
Optimize this function for size (-Os) or minimum size (-Oz).
Definition: Function.h:719
bool hasMinSize() const
Optimize this function for minimum size (-Oz).
Definition: Function.h:716
CallingConv::ID getCallingConv() const
getCallingConv()/setCallingConv(CC) - These method get and set the calling convention of this functio...
Definition: Function.h:277
AttributeList getAttributes() const
Return the attribute list for this Function.
Definition: Function.h:365
bool isVarArg() const
isVarArg - Return true if this function takes a variable number of arguments.
Definition: Function.h:234
bool hasFnAttribute(Attribute::AttrKind Kind) const
Return true if the function has the attribute.
Definition: Function.cpp:731
void storeRegToStackSlot(MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI, Register SrcReg, bool isKill, int FrameIndex, const TargetRegisterClass *RC, const TargetRegisterInfo *TRI, Register VReg, MachineInstr::MIFlag Flags=MachineInstr::NoFlags) const override
Store the specified register of the given register class to the specified stack frame index.
void copyPhysReg(MachineBasicBlock &MBB, MachineBasicBlock::iterator I, const DebugLoc &DL, MCRegister DestReg, MCRegister SrcReg, bool KillSrc, bool RenamableDest=false, bool RenamableSrc=false) const override
Emit instructions to copy a pair of physical registers.
void loadRegFromStackSlot(MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI, Register DestReg, int FrameIndex, const TargetRegisterClass *RC, const TargetRegisterInfo *TRI, Register VReg, MachineInstr::MIFlag Flags=MachineInstr::NoFlags) const override
Load the specified register of the given register class from the specified stack frame index.
A set of physical registers with utility functions to track liveness when walking backward/forward th...
Definition: LivePhysRegs.h:52
bool available(const MachineRegisterInfo &MRI, MCPhysReg Reg) const
Returns true if register Reg and no aliasing register is in the set.
void stepBackward(const MachineInstr &MI)
Simulates liveness when stepping backwards over an instruction(bundle).
void removeReg(MCPhysReg Reg)
Removes a physical register, all its sub-registers, and all its super-registers from the set.
Definition: LivePhysRegs.h:92
void addLiveIns(const MachineBasicBlock &MBB)
Adds all live-in registers of basic block MBB.
void addLiveOuts(const MachineBasicBlock &MBB)
Adds all live-out registers of basic block MBB.
void addReg(MCPhysReg Reg)
Adds a physical register and all its sub-registers to the set.
Definition: LivePhysRegs.h:83
A set of register units used to track register liveness.
Definition: LiveRegUnits.h:30
bool available(MCPhysReg Reg) const
Returns true if no part of physical register Reg is live.
Definition: LiveRegUnits.h:116
void stepBackward(const MachineInstr &MI)
Updates liveness when stepping backwards over the instruction MI.
void addLiveOuts(const MachineBasicBlock &MBB)
Adds registers living out of block MBB.
bool usesWindowsCFI() const
Definition: MCAsmInfo.h:661
static MCCFIInstruction createDefCfaRegister(MCSymbol *L, unsigned Register, SMLoc Loc={})
.cfi_def_cfa_register modifies a rule for computing CFA.
Definition: MCDwarf.h:582
static MCCFIInstruction createRestore(MCSymbol *L, unsigned Register, SMLoc Loc={})
.cfi_restore says that the rule for Register is now the same as it was at the beginning of the functi...
Definition: MCDwarf.h:656
static MCCFIInstruction cfiDefCfa(MCSymbol *L, unsigned Register, int64_t Offset, SMLoc Loc={})
.cfi_def_cfa defines a rule for computing CFA as: take address from Register and add Offset to it.
Definition: MCDwarf.h:575
static MCCFIInstruction createOffset(MCSymbol *L, unsigned Register, int64_t Offset, SMLoc Loc={})
.cfi_offset Previous value of Register is saved at offset Offset from CFA.
Definition: MCDwarf.h:617
static MCCFIInstruction createNegateRAStateWithPC(MCSymbol *L, SMLoc Loc={})
.cfi_negate_ra_state_with_pc AArch64 negate RA state with PC.
Definition: MCDwarf.h:648
static MCCFIInstruction createNegateRAState(MCSymbol *L, SMLoc Loc={})
.cfi_negate_ra_state AArch64 negate RA state.
Definition: MCDwarf.h:643
static MCCFIInstruction cfiDefCfaOffset(MCSymbol *L, int64_t Offset, SMLoc Loc={})
.cfi_def_cfa_offset modifies a rule for computing CFA.
Definition: MCDwarf.h:590
static MCCFIInstruction createEscape(MCSymbol *L, StringRef Vals, SMLoc Loc={}, StringRef Comment="")
.cfi_escape Allows the user to add arbitrary bytes to the unwind info.
Definition: MCDwarf.h:687
static MCCFIInstruction createSameValue(MCSymbol *L, unsigned Register, SMLoc Loc={})
.cfi_same_value Current value of Register is the same as in the previous frame.
Definition: MCDwarf.h:670
MCSymbol * createTempSymbol()
Create a temporary symbol with a unique name.
Definition: MCContext.cpp:345
Describe properties that are true of each instruction in the target description file.
Definition: MCInstrDesc.h:198
Wrapper class representing physical registers. Should be passed by value.
Definition: MCRegister.h:33
MCSymbol - Instances of this class represent a symbol name in the MC file, and MCSymbols are created ...
Definition: MCSymbol.h:41
void transferSuccessorsAndUpdatePHIs(MachineBasicBlock *FromMBB)
Transfers all the successors, as in transferSuccessors, and update PHI operands in the successor bloc...
instr_iterator instr_begin()
iterator_range< livein_iterator > liveins() const
const BasicBlock * getBasicBlock() const
Return the LLVM basic block that this instance corresponded to originally.
bool isEHFuncletEntry() const
Returns true if this is the entry block of an EH funclet.
iterator getFirstTerminator()
Returns an iterator to the first terminator instruction of this basic block.
MachineInstr & instr_back()
void addSuccessor(MachineBasicBlock *Succ, BranchProbability Prob=BranchProbability::getUnknown())
Add Succ as a successor of this MachineBasicBlock.
DebugLoc findDebugLoc(instr_iterator MBBI)
Find the next valid DebugLoc starting at MBBI, skipping any debug instructions.
iterator getLastNonDebugInstr(bool SkipPseudoOp=true)
Returns an iterator to the last non-debug instruction in the basic block, or end().
instr_iterator instr_end()
void addLiveIn(MCRegister PhysReg, LaneBitmask LaneMask=LaneBitmask::getAll())
Adds the specified register as a live in.
const MachineFunction * getParent() const
Return the MachineFunction containing this basic block.
instr_iterator erase(instr_iterator I)
Remove an instruction from the instruction list and delete it.
reverse_iterator rbegin()
iterator insertAfter(iterator I, MachineInstr *MI)
Insert MI into the instruction list after I.
void splice(iterator Where, MachineBasicBlock *Other, iterator From)
Take an instruction from MBB 'Other' at the position From, and insert it into this MBB right before '...
bool isLiveIn(MCRegister Reg, LaneBitmask LaneMask=LaneBitmask::getAll()) const
Return true if the specified register is in the live in set.
The MachineFrameInfo class represents an abstract stack frame until prolog/epilog code is inserted.
int CreateFixedObject(uint64_t Size, int64_t SPOffset, bool IsImmutable, bool isAliased=false)
Create a new object at a fixed location on the stack.
bool hasVarSizedObjects() const
This method may be called any time after instruction selection is complete to determine if the stack ...
uint64_t getStackSize() const
Return the number of bytes that must be allocated to hold all of the fixed size frame objects.
const AllocaInst * getObjectAllocation(int ObjectIdx) const
Return the underlying Alloca of the specified stack object if it exists.
int CreateStackObject(uint64_t Size, Align Alignment, bool isSpillSlot, const AllocaInst *Alloca=nullptr, uint8_t ID=0)
Create a new statically sized stack object, returning a nonnegative identifier to represent it.
bool hasCalls() const
Return true if the current function has any function calls.
bool isFrameAddressTaken() const
This method may be called any time after instruction selection is complete to determine if there is a...
Align getMaxAlign() const
Return the alignment in bytes that this function must be aligned to, which is greater than the defaul...
void setObjectOffset(int ObjectIdx, int64_t SPOffset)
Set the stack frame offset of the specified object.
uint64_t getMaxCallFrameSize() const
Return the maximum size of a call frame that must be allocated for an outgoing function call.
bool hasPatchPoint() const
This method may be called any time after instruction selection is complete to determine if there is a...
int getStackProtectorIndex() const
Return the index for the stack protector object.
int CreateSpillStackObject(uint64_t Size, Align Alignment)
Create a new statically sized stack object that represents a spill slot, returning a nonnegative iden...
uint64_t estimateStackSize(const MachineFunction &MF) const
Estimate and return the size of the stack frame.
void setStackID(int ObjectIdx, uint8_t ID)
bool isCalleeSavedInfoValid() const
Has the callee saved info been calculated yet?
Align getObjectAlign(int ObjectIdx) const
Return the alignment of the specified stack object.
int64_t getObjectSize(int ObjectIdx) const
Return the size of the specified object.
bool isMaxCallFrameSizeComputed() const
bool hasStackMap() const
This method may be called any time after instruction selection is complete to determine if there is a...
const std::vector< CalleeSavedInfo > & getCalleeSavedInfo() const
Returns a reference to call saved info vector for the current function.
unsigned getNumObjects() const
Return the number of objects.
int getObjectIndexEnd() const
Return one past the maximum frame object index.
bool hasStackProtectorIndex() const
bool hasStackObjects() const
Return true if there are any stack objects in this function.
uint8_t getStackID(int ObjectIdx) const
unsigned getNumFixedObjects() const
Return the number of fixed objects.
int64_t getObjectOffset(int ObjectIdx) const
Return the assigned stack offset of the specified object from the incoming stack pointer.
int getObjectIndexBegin() const
Return the minimum frame object index.
void setObjectAlignment(int ObjectIdx, Align Alignment)
setObjectAlignment - Change the alignment of the specified stack object.
bool isDeadObjectIndex(int ObjectIdx) const
Returns true if the specified index corresponds to a dead object.
const WinEHFuncInfo * getWinEHFuncInfo() const
getWinEHFuncInfo - Return information about how the current function uses Windows exception handling.
unsigned addFrameInst(const MCCFIInstruction &Inst)
const TargetSubtargetInfo & getSubtarget() const
getSubtarget - Return the subtarget for which this machine code is being compiled.
MachineMemOperand * getMachineMemOperand(MachinePointerInfo PtrInfo, MachineMemOperand::Flags f, LLT MemTy, Align base_alignment, const AAMDNodes &AAInfo=AAMDNodes(), const MDNode *Ranges=nullptr, SyncScope::ID SSID=SyncScope::System, AtomicOrdering Ordering=AtomicOrdering::NotAtomic, AtomicOrdering FailureOrdering=AtomicOrdering::NotAtomic)
getMachineMemOperand - Allocate a new MachineMemOperand.
MachineFrameInfo & getFrameInfo()
getFrameInfo - Return the frame info object for the current function.
MCContext & getContext() const
MachineRegisterInfo & getRegInfo()
getRegInfo - Return information about the registers currently in use.
Function & getFunction()
Return the LLVM function that this machine code represents.
Ty * getInfo()
getInfo - Keep track of various per-function pieces of information for backends that would like to do...
const MachineBasicBlock & front() const
MachineBasicBlock * CreateMachineBasicBlock(const BasicBlock *BB=nullptr, std::optional< UniqueBBID > BBID=std::nullopt)
CreateMachineBasicBlock - Allocate a new MachineBasicBlock.
void insert(iterator MBBI, MachineBasicBlock *MBB)
const TargetMachine & getTarget() const
getTarget - Return the target machine this machine code is compiled with
const MachineInstrBuilder & setMemRefs(ArrayRef< MachineMemOperand * > MMOs) const
const MachineInstrBuilder & addExternalSymbol(const char *FnName, unsigned TargetFlags=0) const
const MachineInstrBuilder & addCFIIndex(unsigned CFIIndex) const
const MachineInstrBuilder & setMIFlag(MachineInstr::MIFlag Flag) const
const MachineInstrBuilder & addImm(int64_t Val) const
Add a new immediate operand.
const MachineInstrBuilder & add(const MachineOperand &MO) const
const MachineInstrBuilder & addFrameIndex(int Idx) const
const MachineInstrBuilder & addRegMask(const uint32_t *Mask) const
const MachineInstrBuilder & addReg(Register RegNo, unsigned flags=0, unsigned SubReg=0) const
Add a new virtual register operand.
const MachineInstrBuilder & addMBB(MachineBasicBlock *MBB, unsigned TargetFlags=0) const
const MachineInstrBuilder & addUse(Register RegNo, unsigned Flags=0, unsigned SubReg=0) const
Add a virtual register use operand.
const MachineInstrBuilder & setMIFlags(unsigned Flags) const
const MachineInstrBuilder & addMemOperand(MachineMemOperand *MMO) const
MachineInstr * getInstr() const
If conversion operators fail, use this method to get the MachineInstr explicitly.
const MachineInstrBuilder & addDef(Register RegNo, unsigned Flags=0, unsigned SubReg=0) const
Add a virtual register definition operand.
Representation of each machine instruction.
Definition: MachineInstr.h:71
void setFlags(unsigned flags)
Definition: MachineInstr.h:412
bool getFlag(MIFlag Flag) const
Return whether an MI flag is set.
Definition: MachineInstr.h:399
void eraseFromParent()
Unlink 'this' from the containing basic block and delete it.
uint32_t getFlags() const
Return the MI flags bitvector.
Definition: MachineInstr.h:394
A description of a memory reference used in the backend.
const PseudoSourceValue * getPseudoValue() const
@ MOLoad
The memory access reads data.
@ MOStore
The memory access writes data.
const Value * getValue() const
Return the base address of the memory access.
MachineOperand class - Representation of each machine instruction operand.
void setImm(int64_t immVal)
int64_t getImm() const
static MachineOperand CreateImm(int64_t Val)
bool isFI() const
isFI - Tests if this is a MO_FrameIndex operand.
Diagnostic information for optimization analysis remarks.
void emit(DiagnosticInfoOptimizationBase &OptDiag)
Emit an optimization remark.
MachineRegisterInfo - Keep track of information for virtual and physical registers,...
Register createVirtualRegister(const TargetRegisterClass *RegClass, StringRef Name="")
createVirtualRegister - Create and return a new virtual register in the function with the specified r...
bool isLiveIn(Register Reg) const
const MCPhysReg * getCalleeSavedRegs() const
Returns list of callee saved registers.
bool isPhysRegUsed(MCRegister PhysReg, bool SkipRegMaskTest=false) const
Return true if the specified register is modified or read in this function.
MutableArrayRef - Represent a mutable reference to an array (0 or more elements consecutively in memo...
Definition: ArrayRef.h:310
Pass interface - Implemented by all 'passes'.
Definition: Pass.h:94
void enterBasicBlockEnd(MachineBasicBlock &MBB)
Start tracking liveness from the end of basic block MBB.
Register FindUnusedReg(const TargetRegisterClass *RC) const
Find an unused register of the specified register class.
void backward()
Update internal register state and move MBB iterator backwards.
void addScavengingFrameIndex(int FI)
Add a scavenging frame index.
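Taken together, these calls implement the usual backwards scan for a scratch register. A minimal sketch; the GPR64 register class is only an example and the AArch64 backend headers are assumed to be available:

    #include "llvm/CodeGen/RegisterScavenging.h"

    // Walk liveness backwards from the end of MBB to MBBI, then ask for a
    // 64-bit GPR that is unused at that point; returns the null register if
    // nothing is free.
    static llvm::Register findScratchGPR(llvm::MachineBasicBlock &MBB,
                                         llvm::MachineBasicBlock::iterator MBBI) {
      llvm::RegScavenger RS;
      RS.enterBasicBlockEnd(MBB); // start tracking liveness at the block end
      RS.backward(MBBI);          // step backwards until the scan reaches MBBI
      return RS.FindUnusedReg(&llvm::AArch64::GPR64RegClass);
    }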
Wrapper class representing virtual and physical registers.
Definition: Register.h:19
SMEAttrs is a utility class to parse the SME ACLE attributes on functions.
bool hasStreamingInterface() const
bool hasStreamingBody() const
bool empty() const
Definition: SmallVector.h:81
size_t size() const
Definition: SmallVector.h:78
This class consists of common code factored out of the SmallVector class to reduce code duplication based on the SmallVector 'N' template parameter.
Definition: SmallVector.h:573
reference emplace_back(ArgTypes &&... Args)
Definition: SmallVector.h:937
void append(ItTy in_start, ItTy in_end)
Add the specified range to the end of the SmallVector.
Definition: SmallVector.h:683
void push_back(const T &Elt)
Definition: SmallVector.h:413
This is a 'vector' (really, a variable-sized array), optimized for the case when the array is small.
Definition: SmallVector.h:1196
StackOffset holds a fixed and a scalable offset in bytes.
Definition: TypeSize.h:33
int64_t getFixed() const
Returns the fixed component of the stack.
Definition: TypeSize.h:49
int64_t getScalable() const
Returns the scalable component of the stack.
Definition: TypeSize.h:52
static StackOffset get(int64_t Fixed, int64_t Scalable)
Definition: TypeSize.h:44
static StackOffset getScalable(int64_t Scalable)
Definition: TypeSize.h:43
static StackOffset getFixed(int64_t Fixed)
Definition: TypeSize.h:42
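A small worked example of the fixed/scalable split, with illustrative sizes: 32 bytes of fixed-size locals plus two scalable (SVE) slots of 16 bytes per vector granule.

    #include "llvm/Support/TypeSize.h"

    // The two components never mix: the scalable part is multiplied by the
    // runtime vector length only when the offset is finally materialized.
    inline llvm::StackOffset exampleFrameSize() {
      llvm::StackOffset Locals = llvm::StackOffset::getFixed(32);
      llvm::StackOffset SVESaves = llvm::StackOffset::getScalable(2 * 16);
      llvm::StackOffset Total = Locals + SVESaves;
      // Total.getFixed() == 32 and Total.getScalable() == 32.
      return Total;
    }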
StringRef - Represent a constant reference to a string, i.e. a character array and a length, which need not be null terminated.
Definition: StringRef.h:51
bool hasFP(const MachineFunction &MF) const
hasFP - Return true if the specified function should have a dedicated frame pointer register.
virtual void determineCalleeSaves(MachineFunction &MF, BitVector &SavedRegs, RegScavenger *RS=nullptr) const
This method determines which of the registers reported by TargetRegisterInfo::getCalleeSavedRegs() should actually get saved.
int getOffsetOfLocalArea() const
getOffsetOfLocalArea - This method returns the offset of the local area from the stack pointer on entrance to a function.
Align getStackAlign() const
getStackAlignment - This method returns the number of bytes to which the stack pointer must be aligned on entry to a function.
StackDirection getStackGrowthDirection() const
getStackGrowthDirection - Return the direction the stack grows
virtual bool enableCFIFixup(MachineFunction &MF) const
Returns true if we may need to fix the unwind information for the function.
TargetInstrInfo - Interface to description of machine instruction set.
TargetOptions Options
CodeModel::Model getCodeModel() const
Returns the code model.
const MCAsmInfo * getMCAsmInfo() const
Return target specific asm information.
SwiftAsyncFramePointerMode SwiftAsyncFramePointer
Control when and how the Swift async frame pointer bit should be set.
bool DisableFramePointerElim(const MachineFunction &MF) const
DisableFramePointerElim - This returns true if frame pointer elimination optimization should be disabled for the given machine function.
TargetRegisterInfo base class - We assume that the target defines a static array of TargetRegisterDes...
const TargetRegisterClass * getMinimalPhysRegClass(MCRegister Reg, MVT VT=MVT::Other) const
Returns the Register Class of a physical register of the given type, picking the most sub register class of the right type that contains this physreg.
Align getSpillAlign(const TargetRegisterClass &RC) const
Return the minimum required alignment in bytes for a spill slot for a register of this class.
bool hasStackRealignment(const MachineFunction &MF) const
True if stack realignment is required and still possible.
unsigned getSpillSize(const TargetRegisterClass &RC) const
Return the size in bytes of the stack slot allocated to hold a spilled copy of a register from class RC.
TargetSubtargetInfo - Generic base class for all target subtargets.
virtual const TargetRegisterInfo * getRegisterInfo() const
getRegisterInfo - If register information is available, return it.
virtual const TargetInstrInfo * getInstrInfo() const
StringRef getArchName() const
Get the architecture (first) component of the triple.
Definition: Triple.cpp:1354
static constexpr TypeSize getFixed(ScalarTy ExactSize)
Definition: TypeSize.h:345
The instances of the Type class are immutable: once they are created, they are never changed.
Definition: Type.h:45
constexpr ScalarTy getFixedValue() const
Definition: TypeSize.h:202
self_iterator getIterator()
Definition: ilist_node.h:132
This class implements an extremely fast bulk output stream that can only output to a stream.
Definition: raw_ostream.h:52
#define llvm_unreachable(msg)
Marks that the current location is not supposed to be reachable.
@ MO_GOT
MO_GOT - This flag indicates that a symbol operand represents the address of the GOT entry for the symbol, rather than the address of the symbol itself.
static unsigned getShiftValue(unsigned Imm)
getShiftValue - Extract the shift value.
static unsigned getArithExtendImm(AArch64_AM::ShiftExtendType ET, unsigned Imm)
getArithExtendImm - Encode the extend type and shift amount for an arithmetic instruction: imm: 3-bit...
static uint64_t encodeLogicalImmediate(uint64_t imm, unsigned regSize)
encodeLogicalImmediate - Return the encoded immediate value for a logical immediate instruction of the given register size.
static unsigned getShifterImm(AArch64_AM::ShiftExtendType ST, unsigned Imm)
getShifterImm - Encode the shift type and amount: imm: 6-bit shift amount shifter: 000 ==> lsl 001 ==...
const unsigned StackProbeMaxLoopUnroll
Maximum number of iterations to unroll for a constant size probing loop.
const unsigned StackProbeMaxUnprobedStack
Maximum allowed number of unprobed bytes above SP at an ABI boundary.
constexpr char Attrs[]
Key for Kernel::Metadata::mAttrs.
@ AArch64_SVE_VectorCall
Used between AArch64 SVE functions.
Definition: CallingConv.h:224
@ PreserveMost
Used for runtime calls that preserve most registers.
Definition: CallingConv.h:63
@ CXX_FAST_TLS
Used for access functions.
Definition: CallingConv.h:72
@ GHC
Used by the Glasgow Haskell Compiler (GHC).
Definition: CallingConv.h:50
@ AArch64_SME_ABI_Support_Routines_PreserveMost_From_X1
Preserve X1-X15, X19-X29, SP, Z0-Z31, P0-P15.
Definition: CallingConv.h:271
@ PreserveAll
Used for runtime calls that preserve (almost) all registers.
Definition: CallingConv.h:66
@ PreserveNone
Used for runtime calls that preserve no general-purpose registers.
Definition: CallingConv.h:90
@ Win64
The C convention as implemented on Windows/x86-64 and AArch64.
Definition: CallingConv.h:159
@ SwiftTail
This follows the Swift calling convention in how arguments are passed but guarantees tail calls will be made by making the callee clean up their stack.
Definition: CallingConv.h:87
@ Implicit
Not emitted register (e.g. carry, or temporary result).
@ Dead
Unused definition.
@ Define
Register definition.
@ Kill
The last use of a register.
@ Undef
Value of the register doesn't matter.
Reg
All possible values of the reg field in the ModR/M byte.
initializer< Ty > init(const Ty &Val)
Definition: CommandLine.h:443
NodeAddr< InstrNode * > Instr
Definition: RDFGraph.h:389
This is an optimization pass for GlobalISel generic memory operations.
Definition: AddressRanges.h:18
@ Offset
Definition: DWP.cpp:480
void stable_sort(R &&Range)
Definition: STLExtras.h:2037
MCCFIInstruction createDefCFA(const TargetRegisterInfo &TRI, unsigned FrameReg, unsigned Reg, const StackOffset &Offset, bool LastAdjustmentWasScalable=true)
MachineInstrBuilder BuildMI(MachineFunction &MF, const MIMetadata &MIMD, const MCInstrDesc &MCID)
Builder interface. Specify how to create the initial instruction itself.
int isAArch64FrameOffsetLegal(const MachineInstr &MI, StackOffset &Offset, bool *OutUseUnscaledOp=nullptr, unsigned *OutUnscaledOp=nullptr, int64_t *EmittableOffset=nullptr)
Check if the Offset is a valid frame offset for MI.
detail::scope_exit< std::decay_t< Callable > > make_scope_exit(Callable &&F)
Definition: ScopeExit.h:59
MCCFIInstruction createCFAOffset(const TargetRegisterInfo &MRI, unsigned Reg, const StackOffset &OffsetFromDefCFA)
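A sketch of how these two CFI helpers are typically consumed, assuming MF, MBB, MBBI, DL, TII and a TargetRegisterInfo reference TRI are in scope as in the surrounding code; the 16-byte offset is illustrative:

    // After moving SP down by 16 bytes, record "CFA = sp + 16" in the unwind
    // tables and attach the directive to the instruction stream.
    MCCFIInstruction CFAInst =
        createDefCFA(TRI, /*FrameReg=*/AArch64::SP, /*Reg=*/AArch64::SP,
                     StackOffset::getFixed(16),
                     /*LastAdjustmentWasScalable=*/false);
    unsigned CFIIndex = MF.addFrameInst(CFAInst);
    BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
        .addCFIIndex(CFIIndex)
        .setMIFlag(MachineInstr::FrameSetup);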
iterator_range< T > make_range(T x, T y)
Convenience function for iterating over sub-ranges.
unsigned getBLRCallOpcode(const MachineFunction &MF)
Return opcode to be used for indirect calls.
const Value * getUnderlyingObject(const Value *V, unsigned MaxLookup=6)
This method strips off any GEP address adjustments, pointer casts or llvm.threadlocal.address from the specified value, returning the original object being addressed.
iterator_range< early_inc_iterator_impl< detail::IterOfRange< RangeT > > > make_early_inc_range(RangeT &&Range)
Make a range that does early increment to allow mutation of the underlying range without disrupting iterators.
Definition: STLExtras.h:657
@ AArch64FrameOffsetCannotUpdate
Offset cannot apply.
bool any_of(R &&range, UnaryPredicate P)
Provide wrappers to std::any_of which take ranges instead of having to pass begin/end explicitly.
Definition: STLExtras.h:1746
auto formatv(bool Validate, const char *Fmt, Ts &&...Vals)
auto reverse(ContainerTy &&C)
Definition: STLExtras.h:420
void sort(IteratorTy Start, IteratorTy End)
Definition: STLExtras.h:1664
@ Always
Always set the bit.
@ DeploymentBased
Determine whether to set the bit statically or dynamically based on the deployment target.
raw_ostream & dbgs()
dbgs() - This returns a reference to a raw_ostream for debugging messages.
Definition: Debug.cpp:163
void emitFrameOffset(MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI, const DebugLoc &DL, unsigned DestReg, unsigned SrcReg, StackOffset Offset, const TargetInstrInfo *TII, MachineInstr::MIFlag=MachineInstr::NoFlags, bool SetNZCV=false, bool NeedsWinCFI=false, bool *HasWinCFI=nullptr, bool EmitCFAOffset=false, StackOffset InitialOffset={}, unsigned FrameReg=AArch64::SP)
emitFrameOffset - Emit instructions as needed to set DestReg to SrcReg plus Offset.
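An illustrative call (not quoted from this file), assuming MBB, MBBI, DL and TII are in scope: allocate 64 fixed bytes plus one 16-byte SVE granule below SP and tag the resulting instructions as frame setup.

    emitFrameOffset(MBB, MBBI, DL, AArch64::SP, AArch64::SP,
                    StackOffset::get(/*Fixed=*/-64, /*Scalable=*/-16), TII,
                    MachineInstr::FrameSetup);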
void report_fatal_error(Error Err, bool gen_crash_diag=true)
Report a serious error, calling any installed error handler.
Definition: Error.cpp:167
EHPersonality classifyEHPersonality(const Value *Pers)
See if the given exception handling personality function is one that we understand.
@ First
Helpers to iterate all locations in the MemoryEffectsBase class.
auto remove_if(R &&Range, UnaryPredicate P)
Provide wrappers to std::remove_if which take ranges instead of having to pass begin/end explicitly.
Definition: STLExtras.h:1778
unsigned getDefRegState(bool B)
unsigned getKillRegState(bool B)
uint64_t alignTo(uint64_t Size, Align A)
Returns a multiple of A needed to store Size bytes.
Definition: Alignment.h:155
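For instance, alignTo(40, Align(16)) returns 48: a 40-byte block of locals is rounded up to the next multiple of the 16-byte stack alignment.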
raw_ostream & operator<<(raw_ostream &OS, const APFixedPoint &FX)
Definition: APFixedPoint.h:303
bool isAsynchronousEHPersonality(EHPersonality Pers)
Returns true if this personality function catches asynchronous exceptions.
void fullyRecomputeLiveIns(ArrayRef< MachineBasicBlock * > MBBs)
Convenience function for recomputing live-in's for a set of MBBs until the computation converges.
Definition: LivePhysRegs.h:215
Printable printReg(Register Reg, const TargetRegisterInfo *TRI=nullptr, unsigned SubIdx=0, const MachineRegisterInfo *MRI=nullptr)
Prints virtual and physical registers with or without a TRI instance.
void swap(llvm::BitVector &LHS, llvm::BitVector &RHS)
Implement std::swap in terms of BitVector swap.
Definition: BitVector.h:860
Emergency stack slots for expanding SPILL_PPR_TO_ZPR_SLOT_PSEUDO and FILL_PPR_FROM_ZPR_SLOT_PSEUDO.
std::optional< int > PPRSpillFI
std::optional< int > GPRSpillFI
std::optional< int > ZPRSpillFI
Registers available for scavenging (ZPR, PPR3b, GPR).
RAII helper class for scavenging or spilling a register.
ScopedScavengeOrSpill(MachineFunction &MF, MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI, Register SpillCandidate, const TargetRegisterClass &RC, LiveRegUnits const &UsedRegs, BitVector const &AllocatableRegs, std::optional< int > *MaybeSpillFI)
ScopedScavengeOrSpill(ScopedScavengeOrSpill &&)=delete
Register freeRegister() const
Returns the free register (found from scavenging or spilling a register).
ScopedScavengeOrSpill(const ScopedScavengeOrSpill &)=delete
bool operator<(const StackAccess &Rhs) const
void print(raw_ostream &OS) const
int64_t start() const
std::string getTypeString() const
int64_t end() const
This struct is a compact representation of a valid (non-zero power of two) alignment.
Definition: Alignment.h:39
uint64_t value() const
This is a hole in the type system and should not be abused.
Definition: Alignment.h:85
Description of the encoding of one expression Op.
Pair of physical register and lane mask.
static MachinePointerInfo getFixedStack(MachineFunction &MF, int FI, int64_t Offset=0)
Return a MachinePointerInfo record that refers to the specified FrameIndex.
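Closing the loop with addMemOperand() above, a spill store typically attaches a memory operand built from this helper. A rough sketch, assuming MF, MBB, MBBI, DL, TII and a spill slot FI are in scope; the opcode, register and 8-byte size are illustrative:

    // Describe the 8-byte store to fixed stack slot FI so later passes can
    // reason about the access, then emit "str x19, [<FI slot>]".
    MachineMemOperand *MMO = MF.getMachineMemOperand(
        MachinePointerInfo::getFixedStack(MF, FI), MachineMemOperand::MOStore,
        8, Align(8));
    BuildMI(MBB, MBBI, DL, TII->get(AArch64::STRXui))
        .addReg(AArch64::X19, getKillRegState(true)) // register being spilled
        .addFrameIndex(FI)                           // base: the spill slot
        .addImm(0)                                   // scaled immediate offset
        .addMemOperand(MMO)
        .setMIFlag(MachineInstr::FrameSetup);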