//===- AArch64FrameLowering.cpp - AArch64 Frame Lowering -------*- C++ -*-===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// This file contains the AArch64 implementation of TargetFrameLowering class.
//
// On AArch64, stack frames are structured as follows:
//
// The stack grows downward.
//
// All of the individual frame areas on the frame below are optional, i.e. it's
// possible to create a function so that the particular area isn't present
// in the frame.
//
// At function entry, the "frame" looks as follows:
//
// |                                   | Higher address
// |-----------------------------------|
// |                                   |
// | arguments passed on the stack     |
// |                                   |
// |-----------------------------------| <- sp
// |                                   | Lower address
//
//
// After the prologue has run, the frame has the following general structure.
// Note that this doesn't depict the case where a red-zone is used. Also,
// technically the last frame area (VLAs) doesn't get created until the
// main function body runs, after the prologue; it's depicted here for
// completeness.
//
// |                                   | Higher address
// |-----------------------------------|
// |                                   |
// | arguments passed on the stack     |
// |                                   |
// |-----------------------------------|
// |                                   |
// | (Win64 only) varargs from reg     |
// |                                   |
// |-----------------------------------|
// |                                   |
// | callee-saved gpr registers        | <--.
// |                                   |    | On Darwin platforms these
// |- - - - - - - - - - - - - - - - - -|    | callee saves are swapped,
// | prev_lr                           |    | (frame record first)
// | prev_fp                           | <--'
// | async context if needed           |
// | (a.k.a. "frame record")           |
// |-----------------------------------| <- fp(=x29)
// | <hazard padding>                  |
// |-----------------------------------|
// |                                   |
// | callee-saved fp/simd/SVE regs     |
// |                                   |
// |-----------------------------------|
// |                                   |
// | SVE stack objects                 |
// |                                   |
// |-----------------------------------|
// |.empty.space.to.make.part.below....|
// |.aligned.in.case.it.needs.more.than| (size of this area is unknown at
// |.the.standard.16-byte.alignment....| compile time; if present)
// |-----------------------------------|
// | local variables of fixed size     |
// | including spill slots             |
// |   <FPR>                           |
// |   <hazard padding>                |
// |   <GPR>                           |
// |-----------------------------------| <- bp(not defined by ABI,
// |.variable-sized.local.variables....|       LLVM chooses X19)
// |.(VLAs)............................| (size of this area is unknown at
// |...................................| compile time)
// |-----------------------------------| <- sp
// |                                   | Lower address
//
//
//
// To access data in a frame, a constant offset must be computable at compile
// time from one of the pointers (fp, bp, sp). The size of the areas with a
// dotted background cannot be computed at compile time if they are present,
// so all three of fp, bp and sp must be set up in order to access all of the
// frame areas, assuming they are all non-empty.
//
// For most functions, some of the frame areas are empty. For those functions,
// it may not be necessary to set up fp or bp:
// * A base pointer is definitely needed when there are both VLAs and local
//   variables with more-than-default alignment requirements.
// * A frame pointer is definitely needed when there are local variables with
//   more-than-default alignment requirements.
//
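// As an illustration of the two bullet points above, the sketch below (not
// part of LLVM; the function name and use of __builtin_alloca as a stand-in
// for a VLA are made up for this example) is the kind of source code that
// forces the backend to set up both fp and bp: it combines a variable-sized
// allocation with an over-aligned local, so neither sp- nor fp-relative
// addressing alone can reach every object.
//
// ```cpp
// #include <cassert>
// #include <cstdint>
// #include <cstring>
//
// // Hypothetical example: the alloca makes sp-relative offsets unknown, and
// // the alignas(64) local may force stack realignment, which makes
// // fp-relative offsets to locals unknown as well.
// int sum_with_overaligned_scratch(const uint8_t *data, unsigned n) {
//   alignas(64) uint8_t scratch[64]; // more-than-default alignment
//   uint8_t *vla =
//       static_cast<uint8_t *>(__builtin_alloca(n)); // variable-sized object
//   std::memcpy(vla, data, n);
//   std::memset(scratch, 0, sizeof scratch);
//   int sum = 0;
//   for (unsigned i = 0; i < n; ++i)
//     sum += vla[i];
//   return sum;
// }
//
// int main() {
//   uint8_t buf[4] = {1, 2, 3, 4};
//   assert(sum_with_overaligned_scratch(buf, 4) == 10);
//   return 0;
// }
// ```
//
// Compiling a function like this for AArch64 shows fp (x29) and bp (x19)
// both being established in the prologue.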
// For Darwin platforms the frame record (fp, lr) is stored at the top of the
// callee-saved area, since the unwind encoding does not allow for encoding
// this dynamically and existing tools depend on this layout. For other
// platforms, the frame record is stored at the bottom of the (gpr)
// callee-saved area to allow SVE stack objects (allocated directly below the
// callee-saves, if available) to be accessed directly from the frame pointer.
// The SVE spill/fill instructions have VL-scaled addressing modes such
// as:
//     ldr z8, [fp, #-7 mul vl]
// For SVE the size of the vector length (VL) is not known at compile-time, so
// '#-7 mul vl' is an offset that can only be evaluated at runtime. With this
// layout, we don't need to add an unscaled offset to the frame pointer before
// accessing the SVE object in the frame.
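// The arithmetic behind the 'mul vl' addressing mode can be sketched as
// follows (illustrative only; mulVLOffset is a made-up helper, not an LLVM or
// hardware API). At runtime the hardware multiplies the immediate by the
// vector length in bytes, so the same encoding works for any implemented VL:
//
// ```cpp
// #include <cassert>
// #include <cstdint>
//
// // Byte offset produced by "#imm mul vl" for a given runtime vector length
// // (in bytes). On SVE, VL is between 16 and 256 bytes, in multiples of 16.
// int64_t mulVLOffset(int64_t imm, int64_t vlBytes) {
//   return imm * vlBytes;
// }
//
// int main() {
//   assert(mulVLOffset(-7, 16) == -112);  // minimum VL: 128-bit vectors
//   assert(mulVLOffset(-7, 32) == -224);  // 256-bit vectors
//   return 0;
// }
// ```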
//
// In some cases when a base pointer is not strictly needed, it is generated
// anyway when offsets from the frame pointer to access local variables become
// so large that the offset can't be encoded in the immediate fields of loads
// or stores.
//
// Outgoing function arguments must be at the bottom of the stack frame when
// calling another function. If we do not have variable-sized stack objects, we
// can allocate a "reserved call frame" area at the bottom of the local
// variable area, large enough for all outgoing calls. If we do have VLAs, then
// the stack pointer must be decremented and incremented around each call to
// make space for the arguments below the VLAs.
//
// FIXME: also explain the redzone concept.
//
// About stack hazards: Under some SME contexts, a coprocessor with its own
// separate cache can be used for FP operations. This can create hazards if the
// CPU and the SME unit try to access the same area of memory, including if the
// access is to an area of the stack. To try to alleviate this we attempt to
// introduce extra padding into the stack frame between FP and GPR accesses,
// controlled by the aarch64-stack-hazard-size option. Without changing the
// layout of the stack frame in the diagram above, a stack object of size
// aarch64-stack-hazard-size is added between GPR and FPR CSRs. Another is added
// to the stack objects section, and stack objects are sorted so that FPR >
// Hazard padding slot > GPRs (where possible). Unfortunately some things are
// not handled well (VLA area, arguments on the stack, objects with both GPR and
// FPR accesses), but if those are controlled by the user then the entire stack
// frame becomes GPR at the start/end, with FPR in the middle surrounded by
// hazard padding.
//
// An example of the prologue:
//
//     .globl __foo
//     .align 2
//  __foo:
// Ltmp0:
//     .cfi_startproc
//     .cfi_personality 155, ___gxx_personality_v0
// Leh_func_begin:
//     .cfi_lsda 16, Lexception33
//
//     stp  xa,bx, [sp, -#offset]!
//     ...
//     stp  x28, x27, [sp, #offset-32]
//     stp  fp, lr, [sp, #offset-16]
//     add  fp, sp, #offset - 16
//     sub  sp, sp, #1360
//
// The Stack:
//       +-------------------------------------------+
// 10000 | ........ | ........ | ........ | ........ |
// 10004 | ........ | ........ | ........ | ........ |
//       +-------------------------------------------+
// 10008 | ........ | ........ | ........ | ........ |
// 1000c | ........ | ........ | ........ | ........ |
//       +===========================================+
// 10010 |                X28 Register               |
// 10014 |                X28 Register               |
//       +-------------------------------------------+
// 10018 |                X27 Register               |
// 1001c |                X27 Register               |
//       +===========================================+
// 10020 |               Frame Pointer               |
// 10024 |               Frame Pointer               |
//       +-------------------------------------------+
// 10028 |               Link Register               |
// 1002c |               Link Register               |
//       +===========================================+
// 10030 | ........ | ........ | ........ | ........ |
// 10034 | ........ | ........ | ........ | ........ |
//       +-------------------------------------------+
// 10038 | ........ | ........ | ........ | ........ |
// 1003c | ........ | ........ | ........ | ........ |
//       +-------------------------------------------+
//
//     [sp] = 10030        ::  >>initial value<<
//     sp = 10020          ::  stp  fp, lr, [sp, #-16]!
//     fp = sp == 10020    ::  mov  fp, sp
//     [sp] == 10020       ::  stp  x28, x27, [sp, #-16]!
//     sp == 10010         ::  >>final value<<
//
// The frame pointer (w29) points to address 10020. If we use an offset of
// '16' from 'w29', we get the CFI offsets of -8 for w30, -16 for w29, -24
// for w27, and -32 for w28:
//
// Ltmp1:
//     .cfi_def_cfa w29, 16
// Ltmp2:
//     .cfi_offset w30, -8
// Ltmp3:
//     .cfi_offset w29, -16
// Ltmp4:
//     .cfi_offset w27, -24
// Ltmp5:
//     .cfi_offset w28, -32
//
//===----------------------------------------------------------------------===//
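
// The CFI arithmetic in the comment above can be checked mechanically: with
// `.cfi_def_cfa w29, 16` the CFA is fp+16, and each `.cfi_offset` is relative
// to the CFA. A small standalone sketch (illustrative only, not part of the
// backend) reproducing the addresses from the diagram:
//
// ```cpp
// #include <cassert>
// #include <cstdint>
//
// int main() {
//   const int64_t fp = 0x10020;
//   const int64_t cfa = fp + 16; // .cfi_def_cfa w29, 16
//   assert(cfa - 8 == 0x10028);  // w30 (lr):  .cfi_offset w30, -8
//   assert(cfa - 16 == 0x10020); // w29 (fp):  .cfi_offset w29, -16
//   assert(cfa - 24 == 0x10018); // w27:       .cfi_offset w27, -24
//   assert(cfa - 32 == 0x10010); // w28:       .cfi_offset w28, -32
//   return 0;
// }
// ```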

#include "AArch64FrameLowering.h"
#include "AArch64InstrInfo.h"
#include "AArch64RegisterInfo.h"
#include "AArch64Subtarget.h"
#include "llvm/ADT/ScopeExit.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/Statistic.h"
#include "llvm/IR/Attributes.h"
#include "llvm/IR/CallingConv.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/DebugLoc.h"
#include "llvm/IR/Function.h"
#include "llvm/MC/MCAsmInfo.h"
#include "llvm/MC/MCDwarf.h"
#include "llvm/Support/Debug.h"
#include <cassert>
#include <cstdint>
#include <iterator>
#include <optional>
#include <vector>

using namespace llvm;

#define DEBUG_TYPE "frame-info"

static cl::opt<bool> EnableRedZone("aarch64-redzone",
                                   cl::desc("enable use of redzone on AArch64"),
                                   cl::init(false), cl::Hidden);

static cl::opt<bool> StackTaggingMergeSetTag(
    "stack-tagging-merge-settag",
    cl::desc("merge settag instruction in function epilog"), cl::init(true),
    cl::Hidden);

static cl::opt<bool> OrderFrameObjects("aarch64-order-frame-objects",
                                       cl::desc("sort stack allocations"),
                                       cl::init(true), cl::Hidden);

static cl::opt<bool> EnableHomogeneousPrologEpilog(
    "homogeneous-prolog-epilog", cl::Hidden,
    cl::desc("Emit homogeneous prologue and epilogue for the size "
             "optimization (default = off)"));

// Stack hazard size for analysis remarks. StackHazardSize takes precedence.
static cl::opt<unsigned>
    StackHazardRemarkSize("aarch64-stack-hazard-remark-size", cl::init(0),
                          cl::Hidden);
// Whether to insert padding into non-streaming functions (for testing).
static cl::opt<bool>
    StackHazardInNonStreaming("aarch64-stack-hazard-in-non-streaming",
                              cl::init(false), cl::Hidden);

static cl::opt<bool> DisableMultiVectorSpillFill(
    "aarch64-disable-multivector-spill-fill",
    cl::desc("Disable use of LD/ST pairs for SME2 or SVE2p1"), cl::init(false),
    cl::Hidden);

STATISTIC(NumRedZoneFunctions, "Number of functions using red zone");

/// Returns how much of the incoming argument stack area (in bytes) we should
/// clean up in an epilogue. For the C calling convention this will be 0, for
/// guaranteed tail call conventions it can be positive (a normal return or a
/// tail call to a function that uses less stack space for arguments) or
/// negative (for a tail call to a function that needs more stack space than us
/// for arguments).
static int64_t getArgumentStackToRestore(MachineFunction &MF,
                                         MachineBasicBlock &MBB) {
  AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
  MachineBasicBlock::iterator MBBI = MBB.getLastNonDebugInstr();
  bool IsTailCallReturn = (MBB.end() != MBBI)
                              ? AArch64InstrInfo::isTailCallReturnInst(*MBBI)
                              : false;

  int64_t ArgumentPopSize = 0;
  if (IsTailCallReturn) {
    MachineOperand &StackAdjust = MBBI->getOperand(1);

    // For a tail-call in a callee-pops-arguments environment, some or all of
    // the stack may actually be in use for the call's arguments, this is
    // calculated during LowerCall and consumed here...
    ArgumentPopSize = StackAdjust.getImm();
  } else {
    // ... otherwise the amount to pop is *all* of the argument space,
    // conveniently stored in the MachineFunctionInfo by
    // LowerFormalArguments. This will, of course, be zero for the C calling
    // convention.
    ArgumentPopSize = AFI->getArgumentStackToRestore();
  }

  return ArgumentPopSize;
}

static bool needsWinCFI(const MachineFunction &MF);

/// Returns true if homogeneous prolog or epilog code can be emitted
/// for the size optimization. If possible, a frame helper call is injected.
/// When an Exit block is given, this check is for an epilog.
bool AArch64FrameLowering::homogeneousPrologEpilog(
    MachineFunction &MF, MachineBasicBlock *Exit) const {
  if (!MF.getFunction().hasMinSize())
    return false;
  if (!EnableHomogeneousPrologEpilog)
    return false;
  if (EnableRedZone)
    return false;

  // TODO: Windows is not supported yet.
  if (needsWinCFI(MF))
    return false;
  // TODO: SVE is not supported yet.
  if (getSVEStackSize(MF))
    return false;

  // Bail on stack adjustment needed on return for simplicity.
  const MachineFrameInfo &MFI = MF.getFrameInfo();
  const TargetRegisterInfo *RegInfo = MF.getSubtarget().getRegisterInfo();
  if (MFI.hasVarSizedObjects() || RegInfo->hasStackRealignment(MF))
    return false;
  if (Exit && getArgumentStackToRestore(MF, *Exit))
    return false;

  auto *AFI = MF.getInfo<AArch64FunctionInfo>();
  if (AFI->hasSwiftAsyncContext() || AFI->hasStreamingModeChanges())
    return false;

  // If there are an odd number of GPRs before LR and FP in the CSRs list,
  // they will not be paired into one RegPairInfo, which is incompatible with
  // the assumption made by the homogeneous prolog epilog pass.
  const MCPhysReg *CSRegs = MF.getRegInfo().getCalleeSavedRegs();
  unsigned NumGPRs = 0;
  for (unsigned I = 0; CSRegs[I]; ++I) {
    Register Reg = CSRegs[I];
    if (Reg == AArch64::LR) {
      assert(CSRegs[I + 1] == AArch64::FP);
      if (NumGPRs % 2 != 0)
        return false;
      break;
    }
    if (AArch64::GPR64RegClass.contains(Reg))
      ++NumGPRs;
  }

  return true;
}

/// Returns true if CSRs should be paired.
bool AArch64FrameLowering::producePairRegisters(MachineFunction &MF) const {
  return produceCompactUnwindFrame(MF) || homogeneousPrologEpilog(MF);
}

/// This is the biggest offset to the stack pointer we can encode in aarch64
/// instructions (without using a separate calculation and a temp register).
/// Note that the exceptions here are vector stores/loads, which cannot encode
/// any displacements (see estimateRSStackSizeLimit(),
/// isAArch64FrameOffsetLegal()).
static const unsigned DefaultSafeSPDisplacement = 255;

/// Look at each instruction that references stack frames and return the stack
/// size limit beyond which some of these instructions will require a scratch
/// register during their expansion later.
static unsigned estimateRSStackSizeLimit(MachineFunction &MF) {
  // FIXME: For now, just conservatively guesstimate based on unscaled indexing
  // range. We'll end up allocating an unnecessary spill slot a lot, but
  // realistically that's not a big deal at this stage of the game.
  for (MachineBasicBlock &MBB : MF) {
    for (MachineInstr &MI : MBB) {
      if (MI.isDebugInstr() || MI.isPseudo() ||
          MI.getOpcode() == AArch64::ADDXri ||
          MI.getOpcode() == AArch64::ADDSXri)
        continue;

      for (const MachineOperand &MO : MI.operands()) {
        if (!MO.isFI())
          continue;

        StackOffset Offset;
        if (isAArch64FrameOffsetLegal(MI, Offset, nullptr, nullptr, nullptr) ==
            AArch64FrameOffsetCannotUpdate)
          return 0;
      }
    }
  }
  return DefaultSafeSPDisplacement;
}

}

/// Returns the size of the fixed object area (allocated next to sp on entry).
/// On Win64 this may include a var args area and an UnwindHelp object for EH.
static unsigned getFixedObjectSize(const MachineFunction &MF,
                                   const AArch64FunctionInfo *AFI, bool IsWin64,
                                   bool IsFunclet) {
  if (!IsWin64 || IsFunclet) {
    return AFI->getTailCallReservedStack();
  } else {
    if (AFI->getTailCallReservedStack() != 0 &&
        !MF.getFunction().getAttributes().hasAttrSomewhere(
            Attribute::SwiftAsync))
      report_fatal_error("cannot generate ABI-changing tail call for Win64");
    // Var args are stored here in the primary function.
    const unsigned VarArgsArea = AFI->getVarArgsGPRSize();
    // To support EH funclets we allocate an UnwindHelp object
    const unsigned UnwindHelpObject = (MF.hasEHFunclets() ? 8 : 0);
    return AFI->getTailCallReservedStack() +
           alignTo(VarArgsArea + UnwindHelpObject, 16);
  }
}

/// Returns the size of the entire SVE stackframe (calleesaves + spills).
static StackOffset getSVEStackSize(const MachineFunction &MF) {
  const AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
  return StackOffset::getScalable((int64_t)AFI->getStackSizeSVE());
}

bool AArch64FrameLowering::canUseRedZone(const MachineFunction &MF) const {
  if (!EnableRedZone)
    return false;

  // Don't use the red zone if the function explicitly asks us not to.
  // This is typically used for kernel code.
  const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
  const unsigned RedZoneSize =
      Subtarget.getTargetLowering()->getRedZoneSize(MF.getFunction());
  if (!RedZoneSize)
    return false;

  const MachineFrameInfo &MFI = MF.getFrameInfo();
  const AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
  uint64_t NumBytes = AFI->getLocalStackSize();

  // If neither NEON nor SVE is available, a COPY from one Q-reg to
  // another requires a spill -> reload sequence. We can do that
  // using a pre-decrementing store/post-decrementing load, but
  // if we do so, we can't use the Red Zone.
  bool LowerQRegCopyThroughMem = Subtarget.hasFPARMv8() &&
                                 !Subtarget.isNeonAvailable() &&
                                 !Subtarget.hasSVE();

  return !(MFI.hasCalls() || hasFP(MF) || NumBytes > RedZoneSize ||
           getSVEStackSize(MF) || LowerQRegCopyThroughMem);
}

/// hasFPImpl - Return true if the specified function should have a dedicated
/// frame pointer register.
bool AArch64FrameLowering::hasFPImpl(const MachineFunction &MF) const {
  const MachineFrameInfo &MFI = MF.getFrameInfo();
  const TargetRegisterInfo *RegInfo = MF.getSubtarget().getRegisterInfo();

  // Win64 EH requires a frame pointer if funclets are present, as the locals
  // are accessed off the frame pointer in both the parent function and the
  // funclets.
  if (MF.hasEHFunclets())
    return true;
  // Retain behavior of always omitting the FP for leaf functions when
  // possible.
  if (MF.getTarget().Options.DisableFramePointerElim(MF))
    return true;
  if (MFI.hasVarSizedObjects() || MFI.isFrameAddressTaken() ||
      MFI.hasStackMap() || MFI.hasPatchPoint() ||
      RegInfo->hasStackRealignment(MF))
    return true;
  // With large callframes around we may need to use FP to access the scavenging
  // emergency spillslot.
  //
  // Unfortunately some calls to hasFP() like machine verifier ->
  // getReservedReg() -> hasFP in the middle of global isel are too early
  // to know the max call frame size. Hopefully conservatively returning "true"
  // in those cases is fine.
  // DefaultSafeSPDisplacement is fine as we only emergency spill GP regs.
  if (!MFI.isMaxCallFrameSizeComputed() ||
      MFI.getMaxCallFrameSize() > DefaultSafeSPDisplacement)
    return true;

  return false;
}

/// hasReservedCallFrame - Under normal circumstances, when a frame pointer is
/// not required, we reserve argument space for call sites in the function
/// immediately on entry to the current function. This eliminates the need for
/// add/sub sp brackets around call sites. Returns true if the call frame is
/// included as part of the stack frame.
bool AArch64FrameLowering::hasReservedCallFrame(
    const MachineFunction &MF) const {
  // The stack probing code for the dynamically allocated outgoing arguments
  // area assumes that the stack is probed at the top - either by the prologue
  // code, which issues a probe if `hasVarSizedObjects` returns true, or by the
  // most recent variable-sized object allocation. Changing the condition here
  // may need to be followed up by changes to the probe issuing logic.
  return !MF.getFrameInfo().hasVarSizedObjects();
}

MachineBasicBlock::iterator AArch64FrameLowering::eliminateCallFramePseudoInstr(
    MachineFunction &MF, MachineBasicBlock &MBB,
    MachineBasicBlock::iterator I) const {
  const AArch64InstrInfo *TII =
      static_cast<const AArch64InstrInfo *>(MF.getSubtarget().getInstrInfo());
  const AArch64TargetLowering *TLI =
      MF.getSubtarget<AArch64Subtarget>().getTargetLowering();
  [[maybe_unused]] MachineFrameInfo &MFI = MF.getFrameInfo();
  DebugLoc DL = I->getDebugLoc();
  unsigned Opc = I->getOpcode();
  bool IsDestroy = Opc == TII->getCallFrameDestroyOpcode();
  uint64_t CalleePopAmount = IsDestroy ? I->getOperand(1).getImm() : 0;

  if (!hasReservedCallFrame(MF)) {
    int64_t Amount = I->getOperand(0).getImm();
    Amount = alignTo(Amount, getStackAlign());
    if (!IsDestroy)
      Amount = -Amount;

    // N.b. if CalleePopAmount is valid but zero (i.e. callee would pop, but it
    // doesn't have to pop anything), then the first operand will be zero too so
    // this adjustment is a no-op.
    if (CalleePopAmount == 0) {
      // FIXME: in-function stack adjustment for calls is limited to 24-bits
      // because there's no guaranteed temporary register available.
      //
      // ADD/SUB (immediate) has only LSL #0 and LSL #12 available.
      // 1) For offset <= 12-bit, we use LSL #0
      // 2) For 12-bit <= offset <= 24-bit, we use two instructions. One uses
      //    LSL #0, and the other uses LSL #12.
      //
      // Most call frames will be allocated at the start of a function so
      // this is OK, but it is a limitation that needs dealing with.
      assert(Amount > -0xffffff && Amount < 0xffffff && "call frame too large");

      if (TLI->hasInlineStackProbe(MF) &&
          -Amount >= AArch64::StackProbeMaxUnprobedStack) {
        // When stack probing is enabled, the decrement of SP may need to be
        // probed. We only need to do this if the call site needs 1024 bytes of
        // space or more, because a region smaller than that is allowed to be
        // unprobed at an ABI boundary. We rely on the fact that SP has been
        // probed exactly at this point, either by the prologue or most recent
        // dynamic allocation.
        assert(MFI.hasVarSizedObjects() &&
               "non-reserved call frame without var sized objects?");
        Register ScratchReg =
            MF.getRegInfo().createVirtualRegister(&AArch64::GPR64RegClass);
        inlineStackProbeFixed(I, ScratchReg, -Amount, StackOffset::get(0, 0));
      } else {
        emitFrameOffset(MBB, I, DL, AArch64::SP, AArch64::SP,
                        StackOffset::getFixed(Amount), TII);
      }
    }
  } else if (CalleePopAmount != 0) {
    // If the calling convention demands that the callee pops arguments from the
    // stack, we want to add it back if we have a reserved call frame.
    assert(CalleePopAmount < 0xffffff && "call frame too large");
    emitFrameOffset(MBB, I, DL, AArch64::SP, AArch64::SP,
                    StackOffset::getFixed(-(int64_t)CalleePopAmount), TII);
  }
  return MBB.erase(I);
}

void AArch64FrameLowering::emitCalleeSavedGPRLocations(
    MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI) const {
  MachineFunction &MF = *MBB.getParent();
  MachineFrameInfo &MFI = MF.getFrameInfo();
  AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
  SMEAttrs Attrs(MF.getFunction());
  bool LocallyStreaming =
      Attrs.hasStreamingBody() && !Attrs.hasStreamingInterface();

  const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();
  if (CSI.empty())
    return;

  const TargetSubtargetInfo &STI = MF.getSubtarget();
  const TargetRegisterInfo &TRI = *STI.getRegisterInfo();
  const TargetInstrInfo &TII = *STI.getInstrInfo();
  DebugLoc DL = MBB.findDebugLoc(MBBI);

  for (const auto &Info : CSI) {
    unsigned FrameIdx = Info.getFrameIdx();
    if (MFI.getStackID(FrameIdx) == TargetStackID::ScalableVector)
      continue;

    assert(!Info.isSpilledToReg() && "Spilling to registers not implemented");
    int64_t DwarfReg = TRI.getDwarfRegNum(Info.getReg(), true);
    int64_t Offset = MFI.getObjectOffset(FrameIdx) - getOffsetOfLocalArea();

    // The location of VG will be emitted before each streaming-mode change in
    // the function. Only locally-streaming functions require emitting the
    // non-streaming VG location here.
    if ((LocallyStreaming && FrameIdx == AFI->getStreamingVGIdx()) ||
        (!LocallyStreaming &&
         DwarfReg == TRI.getDwarfRegNum(AArch64::VG, true)))
      continue;

    unsigned CFIIndex = MF.addFrameInst(
        MCCFIInstruction::createOffset(nullptr, DwarfReg, Offset));
    BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
        .addCFIIndex(CFIIndex)
        .setMIFlags(MachineInstr::FrameSetup);
  }
}

void AArch64FrameLowering::emitCalleeSavedSVELocations(
    MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI) const {
  MachineFunction &MF = *MBB.getParent();
  MachineFrameInfo &MFI = MF.getFrameInfo();

  // Add callee saved registers to move list.
  const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();
  if (CSI.empty())
    return;

  const TargetSubtargetInfo &STI = MF.getSubtarget();
  const TargetRegisterInfo &TRI = *STI.getRegisterInfo();
  const TargetInstrInfo &TII = *STI.getInstrInfo();
  AArch64FunctionInfo &AFI = *MF.getInfo<AArch64FunctionInfo>();
  DebugLoc DL = MBB.findDebugLoc(MBBI);

  for (const auto &Info : CSI) {
    if (!(MFI.getStackID(Info.getFrameIdx()) == TargetStackID::ScalableVector))
      continue;

    // Not all unwinders may know about SVE registers, so assume the lowest
    // common denominator.
    assert(!Info.isSpilledToReg() && "Spilling to registers not implemented");
    unsigned Reg = Info.getReg();
    if (!static_cast<const AArch64RegisterInfo &>(TRI).regNeedsCFI(Reg, Reg))
      continue;

    StackOffset Offset =
        StackOffset::getScalable(MFI.getObjectOffset(Info.getFrameIdx())) -
        StackOffset::getFixed(AFI.getCalleeSavedStackSize(MFI));

    unsigned CFIIndex = MF.addFrameInst(createCFAOffset(TRI, Reg, Offset));
    BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
        .addCFIIndex(CFIIndex)
        .setMIFlags(MachineInstr::FrameSetup);
  }
}

static void insertCFISameValue(const MCInstrDesc &Desc, MachineFunction &MF,
                               MachineBasicBlock &MBB,
                               MachineBasicBlock::iterator InsertPt,
                               unsigned DwarfReg) {
  unsigned CFIIndex =
      MF.addFrameInst(MCCFIInstruction::createSameValue(nullptr, DwarfReg));
  BuildMI(MBB, InsertPt, DebugLoc(), Desc).addCFIIndex(CFIIndex);
}

void AArch64FrameLowering::resetCFIToInitialState(
    MachineBasicBlock &MBB) const {

  MachineFunction &MF = *MBB.getParent();
  const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
  const TargetInstrInfo &TII = *Subtarget.getInstrInfo();
  const auto &TRI =
      static_cast<const AArch64RegisterInfo &>(*Subtarget.getRegisterInfo());
  const auto &MFI = *MF.getInfo<AArch64FunctionInfo>();

  const MCInstrDesc &CFIDesc = TII.get(TargetOpcode::CFI_INSTRUCTION);
  DebugLoc DL;

  // Reset the CFA to `SP + 0`.
  MachineBasicBlock::iterator InsertPt = MBB.begin();
  unsigned CFIIndex = MF.addFrameInst(MCCFIInstruction::cfiDefCfa(
      nullptr, TRI.getDwarfRegNum(AArch64::SP, true), 0));
  BuildMI(MBB, InsertPt, DL, CFIDesc).addCFIIndex(CFIIndex);

  // Flip the RA sign state.
  if (MFI.shouldSignReturnAddress(MF)) {
    auto CFIInst = MFI.branchProtectionPAuthLR()
                       ? MCCFIInstruction::createNegateRAStateWithPC(nullptr)
                       : MCCFIInstruction::createNegateRAState(nullptr);
    CFIIndex = MF.addFrameInst(CFIInst);
    BuildMI(MBB, InsertPt, DL, CFIDesc).addCFIIndex(CFIIndex);
  }

  // Shadow call stack uses X18, reset it.
  if (MFI.needsShadowCallStackPrologueEpilogue(MF))
    insertCFISameValue(CFIDesc, MF, MBB, InsertPt,
                       TRI.getDwarfRegNum(AArch64::X18, true));

  // Emit .cfi_same_value for callee-saved registers.
  const std::vector<CalleeSavedInfo> &CSI =
      MF.getFrameInfo().getCalleeSavedInfo();
  for (const auto &Info : CSI) {
    unsigned Reg = Info.getReg();
    if (!TRI.regNeedsCFI(Reg, Reg))
      continue;
    insertCFISameValue(CFIDesc, MF, MBB, InsertPt,
                       TRI.getDwarfRegNum(Reg, true));
  }
}

static void emitCalleeSavedRestores(MachineBasicBlock &MBB,
                                    MachineBasicBlock::iterator MBBI,
                                    bool SVE) {
  MachineFunction &MF = *MBB.getParent();
  MachineFrameInfo &MFI = MF.getFrameInfo();

  const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();
  if (CSI.empty())
    return;

  const TargetSubtargetInfo &STI = MF.getSubtarget();
  const TargetRegisterInfo &TRI = *STI.getRegisterInfo();
  const TargetInstrInfo &TII = *STI.getInstrInfo();
  DebugLoc DL = MBB.findDebugLoc(MBBI);

  for (const auto &Info : CSI) {
    if (SVE !=
        (MFI.getStackID(Info.getFrameIdx()) == TargetStackID::ScalableVector))
      continue;

    unsigned Reg = Info.getReg();
    if (SVE &&
        !static_cast<const AArch64RegisterInfo &>(TRI).regNeedsCFI(Reg, Reg))
      continue;

    if (!Info.isRestored())
      continue;

    unsigned CFIIndex = MF.addFrameInst(MCCFIInstruction::createRestore(
        nullptr, TRI.getDwarfRegNum(Info.getReg(), true)));
    BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
        .addCFIIndex(CFIIndex)
        .setMIFlags(MachineInstr::FrameDestroy);
  }
}

void AArch64FrameLowering::emitCalleeSavedGPRRestores(
    MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI) const {
  emitCalleeSavedRestores(MBB, MBBI, false);
}

void AArch64FrameLowering::emitCalleeSavedSVERestores(
    MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI) const {
  emitCalleeSavedRestores(MBB, MBBI, true);
}

// Return the maximum possible number of bytes for `Size` due to the
// architectural limit on the size of an SVE register.
static int64_t upperBound(StackOffset Size) {
  static const int64_t MAX_BYTES_PER_SCALABLE_BYTE = 16;
  return Size.getScalable() * MAX_BYTES_PER_SCALABLE_BYTE + Size.getFixed();
}

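// The bound computed by upperBound() can be illustrated standalone (with a
// minimal stand-in for llvm::StackOffset; this sketch is for illustration
// only and is not LLVM code). An SVE register is architecturally at most
// 2048 bits, i.e. 16x the 128-bit granule, so each "scalable byte" can occupy
// at most 16 real bytes:
//
// ```cpp
// #include <cassert>
// #include <cstdint>
//
// // Minimal stand-in for llvm::StackOffset.
// struct StackOffset {
//   int64_t Fixed;
//   int64_t Scalable;
// };
//
// // Mirrors upperBound() above.
// int64_t upperBound(StackOffset Size) {
//   const int64_t MaxBytesPerScalableByte = 16;
//   return Size.Scalable * MaxBytesPerScalableByte + Size.Fixed;
// }
//
// int main() {
//   assert(upperBound({32, 0}) == 32);   // fixed-only allocation
//   assert(upperBound({0, 16}) == 256);  // one Z-register-sized slot
//   assert(upperBound({64, 32}) == 576); // mixed fixed + scalable
//   return 0;
// }
// ```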
void AArch64FrameLowering::allocateStackSpace(
    MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI,
    int64_t RealignmentPadding, StackOffset AllocSize, bool NeedsWinCFI,
    bool *HasWinCFI, bool EmitCFI, StackOffset InitialOffset,
    bool FollowupAllocs) const {

  if (!AllocSize)
    return;

  DebugLoc DL;
  MachineFunction &MF = *MBB.getParent();
  const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
  const TargetInstrInfo &TII = *Subtarget.getInstrInfo();
  AArch64FunctionInfo &AFI = *MF.getInfo<AArch64FunctionInfo>();
  const MachineFrameInfo &MFI = MF.getFrameInfo();

  const int64_t MaxAlign = MFI.getMaxAlign().value();
  const uint64_t AndMask = ~(MaxAlign - 1);

  if (!Subtarget.getTargetLowering()->hasInlineStackProbe(MF)) {
    Register TargetReg = RealignmentPadding
                             ? findScratchNonCalleeSaveRegister(&MBB)
                             : AArch64::SP;
    // SUB Xd/SP, SP, AllocSize
    emitFrameOffset(MBB, MBBI, DL, TargetReg, AArch64::SP, -AllocSize, &TII,
                    MachineInstr::FrameSetup, false, NeedsWinCFI, HasWinCFI,
                    EmitCFI, InitialOffset);

    if (RealignmentPadding) {
      // AND SP, X9, 0b11111...0000
      BuildMI(MBB, MBBI, DL, TII.get(AArch64::ANDXri), AArch64::SP)
          .addReg(TargetReg, RegState::Kill)
          .addImm(AArch64_AM::encodeLogicalImmediate(AndMask, 64))
          .setMIFlags(MachineInstr::FrameSetup);
      AFI.setStackRealigned(true);

      // No need for SEH instructions here; if we're realigning the stack,
      // we've set a frame pointer and already finished the SEH prologue.
      assert(!NeedsWinCFI);
    }
    return;
  }

  //
  // Stack probing allocation.
  //

  // Fixed length allocation. If we don't need to re-align the stack and don't
  // have SVE objects, we can use a more efficient sequence for stack probing.
  if (AllocSize.getScalable() == 0 && RealignmentPadding == 0) {
    Register ScratchReg = findScratchNonCalleeSaveRegister(&MBB);
    assert(ScratchReg != AArch64::NoRegister);
    BuildMI(MBB, MBBI, DL, TII.get(AArch64::PROBED_STACKALLOC))
        .addDef(ScratchReg)
        .addImm(AllocSize.getFixed())
        .addImm(InitialOffset.getFixed())
        .addImm(InitialOffset.getScalable());
    // The fixed allocation may leave unprobed bytes at the top of the
    // stack. If we have subsequent allocations (e.g. if we have variable-sized
    // objects), we need to issue an extra probe, so these allocations start in
    // a known state.
    if (FollowupAllocs) {
      // STR XZR, [SP]
      BuildMI(MBB, MBBI, DL, TII.get(AArch64::STRXui))
          .addReg(AArch64::XZR)
          .addReg(AArch64::SP)
          .addImm(0)
          .setMIFlags(MachineInstr::FrameSetup);
    }

    return;
  }

  // Variable length allocation.

  // If the (unknown) allocation size cannot exceed the probe size, decrement
  // the stack pointer right away.
  int64_t ProbeSize = AFI.getStackProbeSize();
  if (upperBound(AllocSize) + RealignmentPadding <= ProbeSize) {
    Register ScratchReg = RealignmentPadding
                              ? findScratchNonCalleeSaveRegister(&MBB)
                              : AArch64::SP;
    assert(ScratchReg != AArch64::NoRegister);
    // SUB Xd, SP, AllocSize
    emitFrameOffset(MBB, MBBI, DL, ScratchReg, AArch64::SP, -AllocSize, &TII,
                    MachineInstr::FrameSetup, false, NeedsWinCFI, HasWinCFI,
                    EmitCFI, InitialOffset);
    if (RealignmentPadding) {
      // AND SP, Xn, 0b11111...0000
      BuildMI(MBB, MBBI, DL, TII.get(AArch64::ANDXri), AArch64::SP)
          .addReg(ScratchReg, RegState::Kill)
          .addImm(AArch64_AM::encodeLogicalImmediate(AndMask, 64))
          .setMIFlags(MachineInstr::FrameSetup);
      AFI.setStackRealigned(true);
    }
    if (FollowupAllocs || upperBound(AllocSize) + RealignmentPadding >
                              AArch64::StackProbeMaxUnprobedStack) {
      // STR XZR, [SP]
      BuildMI(MBB, MBBI, DL, TII.get(AArch64::STRXui))
          .addReg(AArch64::XZR)
          .addReg(AArch64::SP)
          .addImm(0)
          .setMIFlags(MachineInstr::FrameSetup);
    }
    return;
  }

  // Emit a variable-length allocation probing loop.
  // TODO: As an optimisation, the loop can be "unrolled" into a few parts,
  // each of them guaranteed to adjust the stack by less than the probe size.
  Register TargetReg = findScratchNonCalleeSaveRegister(&MBB);
  assert(TargetReg != AArch64::NoRegister);
  // SUB Xd, SP, AllocSize
  emitFrameOffset(MBB, MBBI, DL, TargetReg, AArch64::SP, -AllocSize, &TII,
                  MachineInstr::FrameSetup, false, NeedsWinCFI, HasWinCFI,
                  EmitCFI, InitialOffset);
  if (RealignmentPadding) {
    // AND Xn, Xn, 0b11111...0000
    BuildMI(MBB, MBBI, DL, TII.get(AArch64::ANDXri), TargetReg)
        .addReg(TargetReg, RegState::Kill)
        .addImm(AArch64_AM::encodeLogicalImmediate(AndMask, 64))
        .setMIFlags(MachineInstr::FrameSetup);
  }

  BuildMI(MBB, MBBI, DL, TII.get(AArch64::PROBED_STACKALLOC_VAR))
      .addReg(TargetReg);
  if (EmitCFI) {
    // Set the CFA register back to SP.
    unsigned Reg =
        Subtarget.getRegisterInfo()->getDwarfRegNum(AArch64::SP, true);
    unsigned CFIIndex =
        MF.addFrameInst(MCCFIInstruction::createDefCfaRegister(nullptr, Reg));
    BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
        .addCFIIndex(CFIIndex)
        .setMIFlags(MachineInstr::FrameSetup);
  }
  if (RealignmentPadding)
    AFI.setStackRealigned(true);
}

920static MCRegister getRegisterOrZero(MCRegister Reg, bool HasSVE) {
921 switch (Reg.id()) {
922 default:
923 // The called routine is expected to preserve r19-r28
924 // r29 and r30 are used as frame pointer and link register resp.
925 return 0;
926
927 // GPRs
928#define CASE(n) \
929 case AArch64::W##n: \
930 case AArch64::X##n: \
931 return AArch64::X##n
932 CASE(0);
933 CASE(1);
934 CASE(2);
935 CASE(3);
936 CASE(4);
937 CASE(5);
938 CASE(6);
939 CASE(7);
940 CASE(8);
941 CASE(9);
942 CASE(10);
943 CASE(11);
944 CASE(12);
945 CASE(13);
946 CASE(14);
947 CASE(15);
948 CASE(16);
949 CASE(17);
950 CASE(18);
951#undef CASE
952
953 // FPRs
954#define CASE(n) \
955 case AArch64::B##n: \
956 case AArch64::H##n: \
957 case AArch64::S##n: \
958 case AArch64::D##n: \
959 case AArch64::Q##n: \
960 return HasSVE ? AArch64::Z##n : AArch64::Q##n
961 CASE(0);
962 CASE(1);
963 CASE(2);
964 CASE(3);
965 CASE(4);
966 CASE(5);
967 CASE(6);
968 CASE(7);
969 CASE(8);
970 CASE(9);
971 CASE(10);
972 CASE(11);
973 CASE(12);
974 CASE(13);
975 CASE(14);
976 CASE(15);
977 CASE(16);
978 CASE(17);
979 CASE(18);
980 CASE(19);
981 CASE(20);
982 CASE(21);
983 CASE(22);
984 CASE(23);
985 CASE(24);
986 CASE(25);
987 CASE(26);
988 CASE(27);
989 CASE(28);
990 CASE(29);
991 CASE(30);
992 CASE(31);
993#undef CASE
994 }
995}
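The CASE macros above fold every narrower alias of a register down to the single widest register that must be cleared. A simplified string-based model of that mapping (a hypothetical `widestAlias` helper, not the real MCRegister machinery):

```cpp
#include <cctype>
#include <string>

// Simplified model (not the real MCRegister logic): any alias of a GPR
// widens to its X form; any FP/NEON alias widens to the Z form when SVE is
// available, otherwise the Q form. Clearing the widest register zeroes all
// narrower aliases. Names that don't match the pattern (e.g. "SP") map to "".
std::string widestAlias(const std::string &Reg, bool HasSVE) {
  if (Reg.size() < 2 || !isdigit((unsigned char)Reg[1]))
    return "";
  std::string N = Reg.substr(1);
  switch (Reg[0]) {
  case 'W': case 'X':
    return "X" + N;
  case 'B': case 'H': case 'S': case 'D': case 'Q':
    return (HasSVE ? "Z" : "Q") + N;
  default:
    return "";
  }
}
```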
996
997void AArch64FrameLowering::emitZeroCallUsedRegs(BitVector RegsToZero,
998 MachineBasicBlock &MBB) const {
999 // Insertion point.
1000 MachineBasicBlock::iterator MBBI = MBB.begin();
1001
1002 // Fake a debug loc.
1003 DebugLoc DL;
1004 if (MBBI != MBB.end())
1005 DL = MBBI->getDebugLoc();
1006
1007 const MachineFunction &MF = *MBB.getParent();
1008 const AArch64Subtarget &STI = MF.getSubtarget<AArch64Subtarget>();
1009 const AArch64RegisterInfo &TRI = *STI.getRegisterInfo();
1010
1011 BitVector GPRsToZero(TRI.getNumRegs());
1012 BitVector FPRsToZero(TRI.getNumRegs());
1013 bool HasSVE = STI.isSVEorStreamingSVEAvailable();
1014 for (MCRegister Reg : RegsToZero.set_bits()) {
1015 if (TRI.isGeneralPurposeRegister(MF, Reg)) {
1016 // For GPRs, we only care to clear out the 64-bit register.
1017 if (MCRegister XReg = getRegisterOrZero(Reg, HasSVE))
1018 GPRsToZero.set(XReg);
1019 } else if (AArch64InstrInfo::isFpOrNEON(Reg)) {
1020 // For FPRs,
1021 if (MCRegister XReg = getRegisterOrZero(Reg, HasSVE))
1022 FPRsToZero.set(XReg);
1023 }
1024 }
1025
1026 const AArch64InstrInfo &TII = *STI.getInstrInfo();
1027
1028 // Zero out GPRs.
1029 for (MCRegister Reg : GPRsToZero.set_bits())
1030 TII.buildClearRegister(Reg, MBB, MBBI, DL);
1031
1032 // Zero out FP/vector registers.
1033 for (MCRegister Reg : FPRsToZero.set_bits())
1034 TII.buildClearRegister(Reg, MBB, MBBI, DL);
1035
1036 if (HasSVE) {
1037 for (MCRegister PReg :
1038 {AArch64::P0, AArch64::P1, AArch64::P2, AArch64::P3, AArch64::P4,
1039 AArch64::P5, AArch64::P6, AArch64::P7, AArch64::P8, AArch64::P9,
1040 AArch64::P10, AArch64::P11, AArch64::P12, AArch64::P13, AArch64::P14,
1041 AArch64::P15}) {
1042 if (RegsToZero[PReg])
1043 BuildMI(MBB, MBBI, DL, TII.get(AArch64::PFALSE), PReg);
1044 }
1045 }
1046}
1047
1048static void getLiveRegsForEntryMBB(LivePhysRegs &LiveRegs,
1049 const MachineBasicBlock &MBB) {
1050 const MachineFunction *MF = MBB.getParent();
1051 LiveRegs.addLiveIns(MBB);
1052 // Mark callee saved registers as used so we will not choose them.
1053 const MCPhysReg *CSRegs = MF->getRegInfo().getCalleeSavedRegs();
1054 for (unsigned i = 0; CSRegs[i]; ++i)
1055 LiveRegs.addReg(CSRegs[i]);
1056}
1057
1058// Find a scratch register that we can use at the start of the prologue to
1059// re-align the stack pointer. We avoid using callee-save registers since they
1060// may appear to be free when this is called from canUseAsPrologue (during
1061// shrink wrapping), but then no longer be free when this is called from
1062// emitPrologue.
1063//
1064// FIXME: This is a bit conservative, since in the above case we could use one
1065// of the callee-save registers as a scratch temp to re-align the stack pointer,
1066// but we would then have to make sure that we were in fact saving at least one
1067// callee-save register in the prologue, which is additional complexity that
1068// doesn't seem worth the benefit.
1069static unsigned findScratchNonCalleeSaveRegister(MachineBasicBlock *MBB) {
1070 MachineFunction *MF = MBB->getParent();
1071
1072 // If MBB is an entry block, use X9 as the scratch register
1073 // preserve_none functions may be using X9 to pass arguments,
1074 // so prefer to pick an available register below.
1075 if (&MF->front() == MBB &&
1076 MF->getFunction().getCallingConv() != CallingConv::PreserveNone)
1077 return AArch64::X9;
1078
1079 const AArch64Subtarget &Subtarget = MF->getSubtarget<AArch64Subtarget>();
1080 const AArch64RegisterInfo &TRI = *Subtarget.getRegisterInfo();
1081 LivePhysRegs LiveRegs(TRI);
1082 getLiveRegsForEntryMBB(LiveRegs, *MBB);
1083
1084 // Prefer X9 since it was historically used for the prologue scratch reg.
1085 const MachineRegisterInfo &MRI = MF->getRegInfo();
1086 if (LiveRegs.available(MRI, AArch64::X9))
1087 return AArch64::X9;
1088
1089 for (unsigned Reg : AArch64::GPR64RegClass) {
1090 if (LiveRegs.available(MRI, Reg))
1091 return Reg;
1092 }
1093 return AArch64::NoRegister;
1094}
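The search above prefers X9 and otherwise falls back to the first free 64-bit GPR. A toy bitmask model of that policy (the real code consults `LivePhysRegs` and walks `GPR64RegClass` in allocation order, so this is only a sketch):

```cpp
#include <cstdint>

// Toy model: bit n of LiveMask set means Xn is unavailable. Prefer X9, the
// historical prologue scratch register; otherwise scan X0..X28 (x29/x30 are
// reserved for FP/LR). Returns -1 for "no register", mirroring
// AArch64::NoRegister in the real code.
int findScratchModel(uint32_t LiveMask) {
  if (!(LiveMask & (1u << 9)))
    return 9;
  for (int Reg = 0; Reg <= 28; ++Reg)
    if (!(LiveMask & (1u << Reg)))
      return Reg;
  return -1;
}
```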
1095
1096bool AArch64FrameLowering::canUseAsPrologue(
1097 const MachineBasicBlock &MBB) const {
1098 const MachineFunction *MF = MBB.getParent();
1099 MachineBasicBlock *TmpMBB = const_cast<MachineBasicBlock *>(&MBB);
1100 const AArch64Subtarget &Subtarget = MF->getSubtarget<AArch64Subtarget>();
1101 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
1102 const AArch64TargetLowering *TLI = Subtarget.getTargetLowering();
1103 const AArch64FunctionInfo *AFI = MF->getInfo<AArch64FunctionInfo>();
1104
1105 if (AFI->hasSwiftAsyncContext()) {
1106 const AArch64RegisterInfo &TRI = *Subtarget.getRegisterInfo();
1107 const MachineRegisterInfo &MRI = MF->getRegInfo();
1108 LivePhysRegs LiveRegs(TRI);
1109 getLiveRegsForEntryMBB(LiveRegs, MBB);
1110 // The StoreSwiftAsyncContext clobbers X16 and X17. Make sure they are
1111 // available.
1112 if (!LiveRegs.available(MRI, AArch64::X16) ||
1113 !LiveRegs.available(MRI, AArch64::X17))
1114 return false;
1115 }
1116
1117 // Certain stack probing sequences might clobber flags, then we can't use
1118 // the block as a prologue if the flags register is a live-in.
1119 if (MF->getInfo<AArch64FunctionInfo>()->hasStackProbing() &&
1120 MBB.isLiveIn(AArch64::NZCV))
1121 return false;
1122
1123 // Don't need a scratch register if we're not going to re-align the stack or
1124 // emit stack probes.
1125 if (!RegInfo->hasStackRealignment(*MF) && !TLI->hasInlineStackProbe(*MF))
1126 return true;
1127 // Otherwise, we can use any block as long as it has a scratch register
1128 // available.
1129 return findScratchNonCalleeSaveRegister(TmpMBB) != AArch64::NoRegister;
1130}
1131
1132static bool windowsRequiresStackProbe(const MachineFunction &MF,
1133 uint64_t StackSizeInBytes) {
1134 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1135 const AArch64FunctionInfo &MFI = *MF.getInfo<AArch64FunctionInfo>();
1136 // TODO: When implementing stack protectors, take that into account
1137 // for the probe threshold.
1138 return Subtarget.isTargetWindows() && MFI.hasStackProbing() &&
1139 StackSizeInBytes >= uint64_t(MFI.getStackProbeSize());
1140}
1141
1142static bool needsWinCFI(const MachineFunction &MF) {
1143 const Function &F = MF.getFunction();
1144 return MF.getTarget().getMCAsmInfo()->usesWindowsCFI() &&
1145 F.needsUnwindTableEntry();
1146}
1147
1148bool AArch64FrameLowering::shouldCombineCSRLocalStackBump(
1149 MachineFunction &MF, uint64_t StackBumpBytes) const {
1150 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
1151 const MachineFrameInfo &MFI = MF.getFrameInfo();
1152 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1153 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
1154 if (homogeneousPrologEpilog(MF))
1155 return false;
1156
1157 if (AFI->getLocalStackSize() == 0)
1158 return false;
1159
1160 // For WinCFI, if optimizing for size, prefer to not combine the stack bump
1161 // (to force a stp with predecrement) to match the packed unwind format,
1162 // provided that there actually are any callee saved registers to merge the
1163 // decrement with.
1164 // This is potentially marginally slower, but allows using the packed
1165 // unwind format for functions that both have a local area and callee saved
1166 // registers. Using the packed unwind format notably reduces the size of
1167 // the unwind info.
1168 if (needsWinCFI(MF) && AFI->getCalleeSavedStackSize() > 0 &&
1169 MF.getFunction().hasOptSize())
1170 return false;
1171
1172 // 512 is the maximum immediate for stp/ldp that will be used for
1173 // callee-save save/restores
1174 if (StackBumpBytes >= 512 || windowsRequiresStackProbe(MF, StackBumpBytes))
1175 return false;
1176
1177 if (MFI.hasVarSizedObjects())
1178 return false;
1179
1180 if (RegInfo->hasStackRealignment(MF))
1181 return false;
1182
1183 // This isn't strictly necessary, but it simplifies things a bit since the
1184 // current RedZone handling code assumes the SP is adjusted by the
1185 // callee-save save/restore code.
1186 if (canUseRedZone(MF))
1187 return false;
1188
1189 // When there is an SVE area on the stack, always allocate the
1190 // callee-saves and spills/locals separately.
1191 if (getSVEStackSize(MF))
1192 return false;
1193
1194 return true;
1195}
1196
1197bool AArch64FrameLowering::shouldCombineCSRLocalStackBumpInEpilogue(
1198 MachineBasicBlock &MBB, uint64_t StackBumpBytes) const {
1199 if (!shouldCombineCSRLocalStackBump(*MBB.getParent(), StackBumpBytes))
1200 return false;
1201 if (MBB.empty())
1202 return true;
1203
1204 // Disable combined SP bump if the last instruction is an MTE tag store. It
1205 // is almost always better to merge SP adjustment into those instructions.
1206 MachineBasicBlock::iterator LastI = MBB.getFirstTerminator();
1207 MachineBasicBlock::iterator Begin = MBB.begin();
1208 while (LastI != Begin) {
1209 --LastI;
1210 if (LastI->isTransient())
1211 continue;
1212 if (!LastI->getFlag(MachineInstr::FrameDestroy))
1213 break;
1214 }
1215 switch (LastI->getOpcode()) {
1216 case AArch64::STGloop:
1217 case AArch64::STZGloop:
1218 case AArch64::STGi:
1219 case AArch64::STZGi:
1220 case AArch64::ST2Gi:
1221 case AArch64::STZ2Gi:
1222 return false;
1223 default:
1224 return true;
1225 }
1226 llvm_unreachable("unreachable");
1227}
1228
1229// Given a load or a store instruction, generate an appropriate unwinding SEH
1230// code on Windows.
1231static MachineBasicBlock::iterator InsertSEH(MachineBasicBlock::iterator MBBI,
1232 const TargetInstrInfo &TII,
1233 MachineInstr::MIFlag Flag) {
1234 unsigned Opc = MBBI->getOpcode();
1235 auto MBB = MBBI->getParent();
1236 MachineFunction &MF = *MBB->getParent();
1237 DebugLoc DL = MBBI->getDebugLoc();
1238 unsigned ImmIdx = MBBI->getNumOperands() - 1;
1239 int Imm = MBBI->getOperand(ImmIdx).getImm();
1240 MachineInstrBuilder MIB;
1241 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1242 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
1243
1244 switch (Opc) {
1245 default:
1246 llvm_unreachable("No SEH Opcode for this instruction");
1247 case AArch64::LDPDpost:
1248 Imm = -Imm;
1249 [[fallthrough]];
1250 case AArch64::STPDpre: {
1251 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1252 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(2).getReg());
1253 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFRegP_X))
1254 .addImm(Reg0)
1255 .addImm(Reg1)
1256 .addImm(Imm * 8)
1257 .setMIFlag(Flag);
1258 break;
1259 }
1260 case AArch64::LDPXpost:
1261 Imm = -Imm;
1262 [[fallthrough]];
1263 case AArch64::STPXpre: {
1264 Register Reg0 = MBBI->getOperand(1).getReg();
1265 Register Reg1 = MBBI->getOperand(2).getReg();
1266 if (Reg0 == AArch64::FP && Reg1 == AArch64::LR)
1267 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFPLR_X))
1268 .addImm(Imm * 8)
1269 .setMIFlag(Flag);
1270 else
1271 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveRegP_X))
1272 .addImm(RegInfo->getSEHRegNum(Reg0))
1273 .addImm(RegInfo->getSEHRegNum(Reg1))
1274 .addImm(Imm * 8)
1275 .setMIFlag(Flag);
1276 break;
1277 }
1278 case AArch64::LDRDpost:
1279 Imm = -Imm;
1280 [[fallthrough]];
1281 case AArch64::STRDpre: {
1282 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1283 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFReg_X))
1284 .addImm(Reg)
1285 .addImm(Imm)
1286 .setMIFlag(Flag);
1287 break;
1288 }
1289 case AArch64::LDRXpost:
1290 Imm = -Imm;
1291 [[fallthrough]];
1292 case AArch64::STRXpre: {
1293 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1294 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveReg_X))
1295 .addImm(Reg)
1296 .addImm(Imm)
1297 .setMIFlag(Flag);
1298 break;
1299 }
1300 case AArch64::STPDi:
1301 case AArch64::LDPDi: {
1302 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1303 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1304 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFRegP))
1305 .addImm(Reg0)
1306 .addImm(Reg1)
1307 .addImm(Imm * 8)
1308 .setMIFlag(Flag);
1309 break;
1310 }
1311 case AArch64::STPXi:
1312 case AArch64::LDPXi: {
1313 Register Reg0 = MBBI->getOperand(0).getReg();
1314 Register Reg1 = MBBI->getOperand(1).getReg();
1315 if (Reg0 == AArch64::FP && Reg1 == AArch64::LR)
1316 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFPLR))
1317 .addImm(Imm * 8)
1318 .setMIFlag(Flag);
1319 else
1320 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveRegP))
1321 .addImm(RegInfo->getSEHRegNum(Reg0))
1322 .addImm(RegInfo->getSEHRegNum(Reg1))
1323 .addImm(Imm * 8)
1324 .setMIFlag(Flag);
1325 break;
1326 }
1327 case AArch64::STRXui:
1328 case AArch64::LDRXui: {
1329 int Reg = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1330 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveReg))
1331 .addImm(Reg)
1332 .addImm(Imm * 8)
1333 .setMIFlag(Flag);
1334 break;
1335 }
1336 case AArch64::STRDui:
1337 case AArch64::LDRDui: {
1338 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1339 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFReg))
1340 .addImm(Reg)
1341 .addImm(Imm * 8)
1342 .setMIFlag(Flag);
1343 break;
1344 }
1345 case AArch64::STPQi:
1346 case AArch64::LDPQi: {
1347 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1348 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1349 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveAnyRegQP))
1350 .addImm(Reg0)
1351 .addImm(Reg1)
1352 .addImm(Imm * 16)
1353 .setMIFlag(Flag);
1354 break;
1355 }
1356 case AArch64::LDPQpost:
1357 Imm = -Imm;
1358 [[fallthrough]];
1359 case AArch64::STPQpre: {
1360 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1361 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(2).getReg());
1362 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveAnyRegQPX))
1363 .addImm(Reg0)
1364 .addImm(Reg1)
1365 .addImm(Imm * 16)
1366 .setMIFlag(Flag);
1367 break;
1368 }
1369 }
1370 auto I = MBB->insertAfter(MBBI, MIB);
1371 return I;
1372}
1373
1374// Fix up the SEH opcode associated with the save/restore instruction.
1375static void fixupSEHOpcode(MachineBasicBlock::iterator MBBI,
1376 unsigned LocalStackSize) {
1377 MachineOperand *ImmOpnd = nullptr;
1378 unsigned ImmIdx = MBBI->getNumOperands() - 1;
1379 switch (MBBI->getOpcode()) {
1380 default:
1381 llvm_unreachable("Fix the offset in the SEH instruction");
1382 case AArch64::SEH_SaveFPLR:
1383 case AArch64::SEH_SaveRegP:
1384 case AArch64::SEH_SaveReg:
1385 case AArch64::SEH_SaveFRegP:
1386 case AArch64::SEH_SaveFReg:
1387 case AArch64::SEH_SaveAnyRegQP:
1388 case AArch64::SEH_SaveAnyRegQPX:
1389 ImmOpnd = &MBBI->getOperand(ImmIdx);
1390 break;
1391 }
1392 if (ImmOpnd)
1393 ImmOpnd->setImm(ImmOpnd->getImm() + LocalStackSize);
1394}
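The `Imm * 8` and `Imm * 16` factors in InsertSEH, and the `+ LocalStackSize` adjustment in fixupSEHOpcode, both reflect the same fact: SEH unwind codes carry byte offsets, while the machine instructions use immediates scaled by the access size. A minimal sketch of both conversions (hypothetical helpers, not LLVM APIs):

```cpp
// SEH unwind codes store byte offsets; AArch64 LDP/STP immediates are scaled
// by the access size (8 bytes for X/D pairs, 16 bytes for Q pairs).
int sehByteOffset(int Imm, int AccessBytes) { return Imm * AccessBytes; }

// When the local-stack allocation is merged into the callee-save SP bump,
// every SEH save offset must grow by the local stack size, which is what
// fixupSEHOpcode does to the trailing immediate operand.
int fixupSEHOffset(int SEHByteOff, unsigned LocalStackSize) {
  return SEHByteOff + static_cast<int>(LocalStackSize);
}
```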
1395
1396static bool requiresGetVGCall(MachineFunction &MF) {
1397 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
1398 return AFI->hasStreamingModeChanges() &&
1399 !MF.getSubtarget<AArch64Subtarget>().hasSVE();
1400}
1401
1402static bool requiresSaveVG(const MachineFunction &MF) {
1403 const AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
1404 // For Darwin platforms we don't save VG for non-SVE functions, even if SME
1405 // is enabled with streaming mode changes.
1406 if (!AFI->hasStreamingModeChanges())
1407 return false;
1408 auto &ST = MF.getSubtarget<AArch64Subtarget>();
1409 if (ST.isTargetDarwin())
1410 return ST.hasSVE();
1411 return true;
1412}
1413
1414static bool isVGInstruction(MachineBasicBlock::iterator MBBI) {
1415 unsigned Opc = MBBI->getOpcode();
1416 if (Opc == AArch64::CNTD_XPiI || Opc == AArch64::RDSVLI_XI ||
1417 Opc == AArch64::UBFMXri)
1418 return true;
1419
1420 if (requiresGetVGCall(*MBBI->getMF())) {
1421 if (Opc == AArch64::ORRXrr)
1422 return true;
1423
1424 if (Opc == AArch64::BL) {
1425 auto Op1 = MBBI->getOperand(0);
1426 return Op1.isSymbol() &&
1427 (StringRef(Op1.getSymbolName()) == "__arm_get_current_vg");
1428 }
1429 }
1430
1431 return false;
1432}
1433
1434// Convert callee-save register save/restore instruction to do stack pointer
1435// decrement/increment to allocate/deallocate the callee-save stack area by
1436// converting store/load to use pre/post increment version.
1437static MachineBasicBlock::iterator convertCalleeSaveRestoreToSPPrePostIncDec(
1438 MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI,
1439 const DebugLoc &DL, const TargetInstrInfo *TII, int CSStackSizeInc,
1440 bool NeedsWinCFI, bool *HasWinCFI, bool EmitCFI,
1441 MachineInstr::MIFlag FrameFlag = MachineInstr::FrameSetup,
1442 int CFAOffset = 0) {
1443 unsigned NewOpc;
1444
1445 // If the function contains streaming mode changes, we expect instructions
1446 // to calculate the value of VG before spilling. For locally-streaming
1447 // functions, we need to do this for both the streaming and non-streaming
1448 // vector length. Move past these instructions if necessary.
1449 MachineFunction &MF = *MBB.getParent();
1450 if (requiresSaveVG(MF))
1451 while (isVGInstruction(MBBI))
1452 ++MBBI;
1453
1454 switch (MBBI->getOpcode()) {
1455 default:
1456 llvm_unreachable("Unexpected callee-save save/restore opcode!");
1457 case AArch64::STPXi:
1458 NewOpc = AArch64::STPXpre;
1459 break;
1460 case AArch64::STPDi:
1461 NewOpc = AArch64::STPDpre;
1462 break;
1463 case AArch64::STPQi:
1464 NewOpc = AArch64::STPQpre;
1465 break;
1466 case AArch64::STRXui:
1467 NewOpc = AArch64::STRXpre;
1468 break;
1469 case AArch64::STRDui:
1470 NewOpc = AArch64::STRDpre;
1471 break;
1472 case AArch64::STRQui:
1473 NewOpc = AArch64::STRQpre;
1474 break;
1475 case AArch64::LDPXi:
1476 NewOpc = AArch64::LDPXpost;
1477 break;
1478 case AArch64::LDPDi:
1479 NewOpc = AArch64::LDPDpost;
1480 break;
1481 case AArch64::LDPQi:
1482 NewOpc = AArch64::LDPQpost;
1483 break;
1484 case AArch64::LDRXui:
1485 NewOpc = AArch64::LDRXpost;
1486 break;
1487 case AArch64::LDRDui:
1488 NewOpc = AArch64::LDRDpost;
1489 break;
1490 case AArch64::LDRQui:
1491 NewOpc = AArch64::LDRQpost;
1492 break;
1493 }
1494 TypeSize Scale = TypeSize::getFixed(1), Width = TypeSize::getFixed(0);
1495 int64_t MinOffset, MaxOffset;
1496 bool Success = static_cast<const AArch64InstrInfo *>(TII)->getMemOpInfo(
1497 NewOpc, Scale, Width, MinOffset, MaxOffset);
1498 (void)Success;
1499 assert(Success && "unknown load/store opcode");
1500
1501 // If the first store isn't right where we want SP then we can't fold the
1502 // update in so create a normal arithmetic instruction instead.
1503 if (MBBI->getOperand(MBBI->getNumOperands() - 1).getImm() != 0 ||
1504 CSStackSizeInc < MinOffset * (int64_t)Scale.getFixedValue() ||
1505 CSStackSizeInc > MaxOffset * (int64_t)Scale.getFixedValue()) {
1506 // If we are destroying the frame, make sure we add the increment after the
1507 // last frame operation.
1508 if (FrameFlag == MachineInstr::FrameDestroy) {
1509 ++MBBI;
1510 // Also skip the SEH instruction, if needed
1511 if (NeedsWinCFI && AArch64InstrInfo::isSEHInstruction(*MBBI))
1512 ++MBBI;
1513 }
1514 emitFrameOffset(MBB, MBBI, DL, AArch64::SP, AArch64::SP,
1515 StackOffset::getFixed(CSStackSizeInc), TII, FrameFlag,
1516 false, NeedsWinCFI, HasWinCFI, EmitCFI,
1517 StackOffset::getFixed(CFAOffset));
1518
1519 return std::prev(MBBI);
1520 }
1521
1522 // Get rid of the SEH code associated with the old instruction.
1523 if (NeedsWinCFI) {
1524 auto SEH = std::next(MBBI);
1525 assert(AArch64InstrInfo::isSEHInstruction(*SEH));
1526 SEH->eraseFromParent();
1527 }
1528
1529 MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII->get(NewOpc));
1530 MIB.addReg(AArch64::SP, RegState::Define);
1531
1532 // Copy all operands other than the immediate offset.
1533 unsigned OpndIdx = 0;
1534 for (unsigned OpndEnd = MBBI->getNumOperands() - 1; OpndIdx < OpndEnd;
1535 ++OpndIdx)
1536 MIB.add(MBBI->getOperand(OpndIdx));
1537
1538 assert(MBBI->getOperand(OpndIdx).getImm() == 0 &&
1539 "Unexpected immediate offset in first/last callee-save save/restore "
1540 "instruction!");
1541 assert(MBBI->getOperand(OpndIdx - 1).getReg() == AArch64::SP &&
1542 "Unexpected base register in callee-save save/restore instruction!");
1543 assert(CSStackSizeInc % Scale == 0);
1544 MIB.addImm(CSStackSizeInc / (int)Scale);
1545
1546 MIB.setMIFlags(MBBI->getFlags());
1547 MIB.setMemRefs(MBBI->memoperands());
1548
1549 // Generate a new SEH code that corresponds to the new instruction.
1550 if (NeedsWinCFI) {
1551 *HasWinCFI = true;
1552 InsertSEH(*MIB, *TII, FrameFlag);
1553 }
1554
1555 if (EmitCFI) {
1556 unsigned CFIIndex = MF.addFrameInst(
1557 MCCFIInstruction::cfiDefCfaOffset(nullptr, CFAOffset - CSStackSizeInc));
1558 BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
1559 .addCFIIndex(CFIIndex)
1560 .setMIFlags(FrameFlag);
1561 }
1562
1563 return std::prev(MBB.erase(MBBI));
1564}
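The fold in the function above is legal only when the first save is exactly at SP and the increment fits the pre/post-indexed form's immediate range. A condensed model of that check (a sketch with a hypothetical `canFoldSPUpdate` helper; for STPXpre the scale is 8 bytes and the signed scaled range is [-64, 63], i.e. [-512, 504] bytes):

```cpp
#include <cstdint>

// Model of the fold check: the SP update can be merged into a pre/post-
// indexed save/restore only if the existing immediate is 0 (first save at
// SP), the increment is a multiple of the access size, and it fits the
// indexed addressing mode's scaled immediate range.
bool canFoldSPUpdate(int64_t ExistingImm, int64_t IncBytes, int64_t Scale,
                     int64_t MinOffset, int64_t MaxOffset) {
  return ExistingImm == 0 && IncBytes % Scale == 0 &&
         IncBytes >= MinOffset * Scale && IncBytes <= MaxOffset * Scale;
}
```

When the predicate fails, the real code falls back to a separate `emitFrameOffset` SP adjustment, as shown above.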
1565
1566// Fixup callee-save register save/restore instructions to take into account
1567// combined SP bump by adding the local stack size to the stack offsets.
1568static void fixupCalleeSaveRestoreStackOffset(MachineInstr &MI,
1569 uint64_t LocalStackSize,
1570 bool NeedsWinCFI,
1571 bool *HasWinCFI) {
1572 if (AArch64InstrInfo::isSEHInstruction(MI))
1573 return;
1574
1575 unsigned Opc = MI.getOpcode();
1576 unsigned Scale;
1577 switch (Opc) {
1578 case AArch64::STPXi:
1579 case AArch64::STRXui:
1580 case AArch64::STPDi:
1581 case AArch64::STRDui:
1582 case AArch64::LDPXi:
1583 case AArch64::LDRXui:
1584 case AArch64::LDPDi:
1585 case AArch64::LDRDui:
1586 Scale = 8;
1587 break;
1588 case AArch64::STPQi:
1589 case AArch64::STRQui:
1590 case AArch64::LDPQi:
1591 case AArch64::LDRQui:
1592 Scale = 16;
1593 break;
1594 default:
1595 llvm_unreachable("Unexpected callee-save save/restore opcode!");
1596 }
1597
1598 unsigned OffsetIdx = MI.getNumExplicitOperands() - 1;
1599 assert(MI.getOperand(OffsetIdx - 1).getReg() == AArch64::SP &&
1600 "Unexpected base register in callee-save save/restore instruction!");
1601 // Last operand is immediate offset that needs fixing.
1602 MachineOperand &OffsetOpnd = MI.getOperand(OffsetIdx);
1603 // All generated opcodes have scaled offsets.
1604 assert(LocalStackSize % Scale == 0);
1605 OffsetOpnd.setImm(OffsetOpnd.getImm() + LocalStackSize / Scale);
1606
1607 if (NeedsWinCFI) {
1608 *HasWinCFI = true;
1609 auto MBBI = std::next(MachineBasicBlock::iterator(MI));
1610 assert(MBBI != MI.getParent()->end() && "Expecting a valid instruction");
1611 assert(AArch64InstrInfo::isSEHInstruction(*MBBI) &&
1612 "Expecting a SEH instruction");
1613 fixupSEHOpcode(MBBI, LocalStackSize);
1614 }
1615}
1616
1617static bool isTargetWindows(const MachineFunction &MF) {
1618 return MF.getSubtarget<AArch64Subtarget>().isTargetWindows();
1619}
1620
1621static unsigned getStackHazardSize(const MachineFunction &MF) {
1622 return MF.getSubtarget<AArch64Subtarget>().getStreamingHazardSize();
1623}
1624
1625// Convenience function to determine whether I is an SVE callee save.
1626static bool IsSVECalleeSave(MachineBasicBlock::iterator I) {
1627 switch (I->getOpcode()) {
1628 default:
1629 return false;
1630 case AArch64::PTRUE_C_B:
1631 case AArch64::LD1B_2Z_IMM:
1632 case AArch64::ST1B_2Z_IMM:
1633 case AArch64::STR_ZXI:
1634 case AArch64::STR_PXI:
1635 case AArch64::LDR_ZXI:
1636 case AArch64::LDR_PXI:
1637 case AArch64::PTRUE_B:
1638 case AArch64::CPY_ZPzI_B:
1639 case AArch64::CMPNE_PPzZI_B:
1640 return I->getFlag(MachineInstr::FrameSetup) ||
1641 I->getFlag(MachineInstr::FrameDestroy);
1642 }
1643}
1644
1645static void emitShadowCallStackPrologue(const TargetInstrInfo &TII,
1646 MachineFunction &MF,
1647 MachineBasicBlock &MBB,
1648 MachineBasicBlock::iterator MBBI,
1649 const DebugLoc &DL, bool NeedsWinCFI,
1650 bool NeedsUnwindInfo) {
1651 // Shadow call stack prolog: str x30, [x18], #8
1652 BuildMI(MBB, MBBI, DL, TII.get(AArch64::STRXpost))
1653 .addReg(AArch64::X18, RegState::Define)
1654 .addReg(AArch64::LR)
1655 .addReg(AArch64::X18)
1656 .addImm(8)
1657 .setMIFlag(MachineInstr::FrameSetup);
1658
1659 // This instruction also makes x18 live-in to the entry block.
1660 MBB.addLiveIn(AArch64::X18);
1661
1662 if (NeedsWinCFI)
1663 BuildMI(MBB, MBBI, DL, TII.get(AArch64::SEH_Nop))
1664 .setMIFlag(MachineInstr::FrameSetup);
1665
1666 if (NeedsUnwindInfo) {
1667 // Emit a CFI instruction that causes 8 to be subtracted from the value of
1668 // x18 when unwinding past this frame.
1669 static const char CFIInst[] = {
1670 dwarf::DW_CFA_val_expression,
1671 18, // register
1672 2, // length
1673 static_cast<char>(unsigned(dwarf::DW_OP_breg18)),
1674 static_cast<char>(-8) & 0x7f, // addend (sleb128)
1675 };
1676 unsigned CFIIndex = MF.addFrameInst(MCCFIInstruction::createEscape(
1677 nullptr, StringRef(CFIInst, sizeof(CFIInst))));
1678 BuildMI(MBB, MBBI, DL, TII.get(AArch64::CFI_INSTRUCTION))
1679 .addCFIIndex(CFIIndex)
1680 .setMIFlags(MachineInstr::FrameSetup);
1681 }
1682}
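The CFI escape above hand-encodes the addend -8 as a single SLEB128 byte: `static_cast<char>(-8) & 0x7f` is 0x78, and because bit 6 (the sign bit of the final byte) is set, a DWARF consumer sign-extends it back to -8. For comparison, a standalone general encoder sketch (values in [-64, 63] always fit one byte):

```cpp
#include <cstdint>
#include <vector>

// Standalone SLEB128 encoder: emit 7 bits per byte, low bits first, setting
// the continuation bit (0x80) until the remaining value is pure sign
// extension and the sign bit of the last byte matches.
std::vector<uint8_t> encodeSLEB128(int64_t Value) {
  std::vector<uint8_t> Out;
  bool More = true;
  while (More) {
    uint8_t Byte = Value & 0x7f;
    Value >>= 7; // arithmetic shift preserves the sign
    More = !((Value == 0 && !(Byte & 0x40)) ||
             (Value == -1 && (Byte & 0x40)));
    if (More)
      Byte |= 0x80;
    Out.push_back(Byte);
  }
  return Out;
}
```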
1683
1684static void emitShadowCallStackEpilogue(const TargetInstrInfo &TII,
1685 MachineFunction &MF,
1686 MachineBasicBlock &MBB,
1687 MachineBasicBlock::iterator MBBI,
1688 const DebugLoc &DL) {
1689 // Shadow call stack epilog: ldr x30, [x18, #-8]!
1690 BuildMI(MBB, MBBI, DL, TII.get(AArch64::LDRXpre))
1691 .addReg(AArch64::X18, RegState::Define)
1692 .addReg(AArch64::LR, RegState::Define)
1693 .addReg(AArch64::X18)
1694 .addImm(-8)
1695 .setMIFlag(MachineInstr::FrameDestroy);
1696
1697 if (MF.getInfo<AArch64FunctionInfo>()->needsAsyncDwarfUnwindInfo(MF)) {
1698 unsigned CFIIndex =
1699 MF.addFrameInst(MCCFIInstruction::createRestore(nullptr, 18));
1700 BuildMI(MBB, MBBI, DL, TII.get(TargetOpcode::CFI_INSTRUCTION))
1701 .addCFIIndex(CFIIndex)
1702 .setMIFlags(MachineInstr::FrameDestroy);
1703 }
1704}
1705
1706// Define the current CFA rule to use the provided FP.
1707static void emitDefineCFAWithFP(MachineFunction &MF, MachineBasicBlock &MBB,
1708 MachineBasicBlock::iterator MBBI,
1709 const DebugLoc &DL, unsigned FixedObject) {
1710 const AArch64Subtarget &STI = MF.getSubtarget<AArch64Subtarget>();
1711 const TargetRegisterInfo *TRI = STI.getRegisterInfo();
1712 const TargetInstrInfo *TII = STI.getInstrInfo();
1713 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
1714
1715 const int OffsetToFirstCalleeSaveFromFP =
1716 AFI->getCalleeSaveBaseToFrameRecordOffset() -
1717 AFI->getCalleeSavedStackSize();
1718 Register FramePtr = TRI->getFrameRegister(MF);
1719 unsigned Reg = TRI->getDwarfRegNum(FramePtr, true);
1720 unsigned CFIIndex = MF.addFrameInst(MCCFIInstruction::cfiDefCfa(
1721 nullptr, Reg, FixedObject - OffsetToFirstCalleeSaveFromFP));
1722 BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
1723 .addCFIIndex(CFIIndex)
1724 .setMIFlags(MachineInstr::FrameSetup);
1725}
1726
1727#ifndef NDEBUG
1728/// Collect live registers from the end of \p MI's parent up to (including) \p
1729/// MI in \p LiveRegs.
1730static void getLivePhysRegsUpTo(const MachineInstr &MI,
1731 const TargetRegisterInfo &TRI, LivePhysRegs &LiveRegs) {
1732
1733 MachineBasicBlock &MBB = *MI.getParent();
1734 LiveRegs.addLiveOuts(MBB);
1735 for (const MachineInstr &MI :
1736 reverse(make_range(MI.getIterator(), MBB.instr_end())))
1737 LiveRegs.stepBackward(MI);
1738}
1739#endif
1740
1741void AArch64FrameLowering::emitPrologue(MachineFunction &MF,
1742 MachineBasicBlock &MBB) const {
1743 MachineBasicBlock::iterator MBBI = MBB.begin();
1744 const MachineFrameInfo &MFI = MF.getFrameInfo();
1745 const Function &F = MF.getFunction();
1746 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1747 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
1748 const TargetInstrInfo *TII = Subtarget.getInstrInfo();
1749
1750 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
1751 bool EmitCFI = AFI->needsDwarfUnwindInfo(MF);
1752 bool EmitAsyncCFI = AFI->needsAsyncDwarfUnwindInfo(MF);
1753 bool HasFP = hasFP(MF);
1754 bool NeedsWinCFI = needsWinCFI(MF);
1755 bool HasWinCFI = false;
1756 auto Cleanup = make_scope_exit([&]() { MF.setHasWinCFI(HasWinCFI); });
1757
1758 MachineBasicBlock::iterator End = MBB.end();
1759#ifndef NDEBUG
1760 const TargetRegisterInfo *TRI = Subtarget.getRegisterInfo();
1761 // Collect live registers from the end of MBB up to the start of the existing
1762 // frame setup instructions.
1763 MachineBasicBlock::iterator NonFrameStart = MBB.begin();
1764 while (NonFrameStart != End &&
1765 NonFrameStart->getFlag(MachineInstr::FrameSetup))
1766 ++NonFrameStart;
1767
1768 LivePhysRegs LiveRegs(*TRI);
1769 if (NonFrameStart != MBB.end()) {
1770 getLivePhysRegsUpTo(*NonFrameStart, *TRI, LiveRegs);
1771 // Ignore registers used for stack management for now.
1772 LiveRegs.removeReg(AArch64::SP);
1773 LiveRegs.removeReg(AArch64::X19);
1774 LiveRegs.removeReg(AArch64::FP);
1775 LiveRegs.removeReg(AArch64::LR);
1776
1777 // X0 will be clobbered by a call to __arm_get_current_vg in the prologue.
1778 // This is necessary to spill VG if required where SVE is unavailable, but
1779 // X0 is preserved around this call.
1780 if (requiresGetVGCall(MF))
1781 LiveRegs.removeReg(AArch64::X0);
1782 }
1783
1784 auto VerifyClobberOnExit = make_scope_exit([&]() {
1785 if (NonFrameStart == MBB.end())
1786 return;
1787 // Check if any of the newly inserted instructions clobber a live register.
1788 for (MachineInstr &MI :
1789 make_range(MBB.instr_begin(), NonFrameStart->getIterator())) {
1790 for (auto &Op : MI.operands())
1791 if (Op.isReg() && Op.isDef())
1792 assert(!LiveRegs.contains(Op.getReg()) &&
1793 "live register clobbered by inserted prologue instructions");
1794 }
1795 });
1796#endif
1797
1798 bool IsFunclet = MBB.isEHFuncletEntry();
1799
1800 // At this point, we're going to decide whether or not the function uses a
1801 // redzone. In most cases, the function doesn't have a redzone so let's
1802 // assume that's false and set it to true in the case that there's a redzone.
1803 AFI->setHasRedZone(false);
1804
1805 // Debug location must be unknown since the first debug location is used
1806 // to determine the end of the prologue.
1807 DebugLoc DL;
1808
1809 const auto &MFnI = *MF.getInfo<AArch64FunctionInfo>();
1810 if (MFnI.needsShadowCallStackPrologueEpilogue(MF))
1811 emitShadowCallStackPrologue(*TII, MF, MBB, MBBI, DL, NeedsWinCFI,
1812 MFnI.needsDwarfUnwindInfo(MF));
1813
1814 if (MFnI.shouldSignReturnAddress(MF)) {
1815 BuildMI(MBB, MBBI, DL, TII->get(AArch64::PAUTH_PROLOGUE))
1816 .setMIFlag(MachineInstr::FrameSetup);
1817 if (NeedsWinCFI)
1818 HasWinCFI = true; // AArch64PointerAuth pass will insert SEH_PACSignLR
1819 }
1820
1821 if (EmitCFI && MFnI.isMTETagged()) {
1822 BuildMI(MBB, MBBI, DL, TII->get(AArch64::EMITMTETAGGED))
1823 .setMIFlag(MachineInstr::FrameSetup);
1824 }
1825
1826 // We signal the presence of a Swift extended frame to external tools by
1827 // storing FP with 0b0001 in bits 63:60. In normal userland operation a simple
1828 // ORR is sufficient, it is assumed a Swift kernel would initialize the TBI
1829 // bits so that is still true.
1830 if (HasFP && AFI->hasSwiftAsyncContext()) {
1831 switch (MF.getTarget().Options.SwiftAsyncFramePointer) {
1832 case SwiftAsyncFramePointerMode::DeploymentBased:
1833 if (Subtarget.swiftAsyncContextIsDynamicallySet()) {
1834 // The special symbol below is absolute and has a *value* that can be
1835 // combined with the frame pointer to signal an extended frame.
1836 BuildMI(MBB, MBBI, DL, TII->get(AArch64::LOADgot), AArch64::X16)
1837 .addExternalSymbol("swift_async_extendedFramePointerFlags",
1838 AArch64II::MO_GOT);
1839 if (NeedsWinCFI) {
1840 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
1841 .setMIFlag(MachineInstr::FrameSetup);
1842 HasWinCFI = true;
1843 }
1844 BuildMI(MBB, MBBI, DL, TII->get(AArch64::ORRXrs), AArch64::FP)
1845 .addUse(AArch64::FP)
1846 .addUse(AArch64::X16)
1847 .addImm(Subtarget.isTargetILP32() ? 32 : 0);
1848 if (NeedsWinCFI) {
1849 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
1850 .setMIFlag(MachineInstr::FrameSetup);
1851 HasWinCFI = true;
1852 }
1853 break;
1854 }
1855 [[fallthrough]];
1856
1857 case SwiftAsyncFramePointerMode::Always:
1858 // ORR x29, x29, #0x1000_0000_0000_0000
1859 BuildMI(MBB, MBBI, DL, TII->get(AArch64::ORRXri), AArch64::FP)
1860 .addUse(AArch64::FP)
1861 .addImm(0x1100)
1862 .setMIFlag(MachineInstr::FrameSetup);
1863 if (NeedsWinCFI) {
1864 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
1865 .setMIFlag(MachineInstr::FrameSetup);
1866 HasWinCFI = true;
1867 }
1868 break;
1869
1870 case SwiftAsyncFramePointerMode::Never:
1871 break;
1872 }
1873 }
1874
1875 // All calls are tail calls in GHC calling conv, and functions have no
1876 // prologue/epilogue.
1877 if (MF.getFunction().getCallingConv() == CallingConv::GHC)
1878 return;
1879
1880 // Set tagged base pointer to the requested stack slot.
1881 // Ideally it should match SP value after prologue.
1882 std::optional<int> TBPI = AFI->getTaggedBasePointerIndex();
1883 if (TBPI)
1884 AFI->setTaggedBasePointerOffset(-MFI.getObjectOffset(*TBPI));
1885 else
1886 AFI->setTaggedBasePointerOffset(MFI.getStackSize());
1887
1888 const StackOffset &SVEStackSize = getSVEStackSize(MF);
1889
1890 // getStackSize() includes all the locals in its size calculation. We don't
1891 // include these locals when computing the stack size of a funclet, as they
1892 // are allocated in the parent's stack frame and accessed via the frame
1893 // pointer from the funclet. We only save the callee saved registers in the
1894 // funclet, which are really the callee saved registers of the parent
1895 // function, including the funclet.
1896 int64_t NumBytes =
1897 IsFunclet ? getWinEHFuncletFrameSize(MF) : MFI.getStackSize();
1898 if (!AFI->hasStackFrame() && !windowsRequiresStackProbe(MF, NumBytes)) {
1899 assert(!HasFP && "unexpected function without stack frame but with FP");
1900 assert(!SVEStackSize &&
1901 "unexpected function without stack frame but with SVE objects");
1902 // All of the stack allocation is for locals.
1903 AFI->setLocalStackSize(NumBytes);
1904 if (!NumBytes)
1905 return;
1906 // REDZONE: If the stack size is less than 128 bytes, we don't need
1907 // to actually allocate.
1908 if (canUseRedZone(MF)) {
1909 AFI->setHasRedZone(true);
1910 ++NumRedZoneFunctions;
1911 } else {
1912 emitFrameOffset(MBB, MBBI, DL, AArch64::SP, AArch64::SP,
1913 StackOffset::getFixed(-NumBytes), TII,
1914 MachineInstr::FrameSetup, false, NeedsWinCFI, &HasWinCFI);
1915 if (EmitCFI) {
1916 // Label used to tie together the PROLOG_LABEL and the MachineMoves.
1917 MCSymbol *FrameLabel = MF.getContext().createTempSymbol();
1918 // Encode the stack size of the leaf function.
1919 unsigned CFIIndex = MF.addFrameInst(
1920 MCCFIInstruction::cfiDefCfaOffset(FrameLabel, NumBytes));
1921 BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
1922 .addCFIIndex(CFIIndex)
1923 .setMIFlags(MachineInstr::FrameSetup);
1924 }
1925 }
1926
1927 if (NeedsWinCFI) {
1928 HasWinCFI = true;
1929 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_PrologEnd))
1931 }
1932
1933 return;
1934 }
1935
1936 bool IsWin64 = Subtarget.isCallingConvWin64(F.getCallingConv(), F.isVarArg());
1937 unsigned FixedObject = getFixedObjectSize(MF, AFI, IsWin64, IsFunclet);
1938
1939 auto PrologueSaveSize = AFI->getCalleeSavedStackSize() + FixedObject;
1940 // All of the remaining stack allocations are for locals.
1941 AFI->setLocalStackSize(NumBytes - PrologueSaveSize);
1942 bool CombineSPBump = shouldCombineCSRLocalStackBump(MF, NumBytes);
1943 bool HomPrologEpilog = homogeneousPrologEpilog(MF);
1944 if (CombineSPBump) {
1945 assert(!SVEStackSize && "Cannot combine SP bump with SVE");
1946 emitFrameOffset(MBB, MBBI, DL, AArch64::SP, AArch64::SP,
1947 StackOffset::getFixed(-NumBytes), TII,
1948 MachineInstr::FrameSetup, false, NeedsWinCFI, &HasWinCFI,
1949 EmitAsyncCFI);
1950 NumBytes = 0;
1951 } else if (HomPrologEpilog) {
 1952    // Stack has already been adjusted.
1953 NumBytes -= PrologueSaveSize;
1954 } else if (PrologueSaveSize != 0) {
1956 MBB, MBBI, DL, TII, -PrologueSaveSize, NeedsWinCFI, &HasWinCFI,
1957 EmitAsyncCFI);
1958 NumBytes -= PrologueSaveSize;
1959 }
1960 assert(NumBytes >= 0 && "Negative stack allocation size!?");
1961
1962 // Move past the saves of the callee-saved registers, fixing up the offsets
1963 // and pre-inc if we decided to combine the callee-save and local stack
1964 // pointer bump above.
1965 while (MBBI != End && MBBI->getFlag(MachineInstr::FrameSetup) &&
1967 if (CombineSPBump &&
 1968         // Only fix up frame-setup load/store instructions.
1971 NeedsWinCFI, &HasWinCFI);
1972 ++MBBI;
1973 }
1974
1975 // For funclets the FP belongs to the containing function.
1976 if (!IsFunclet && HasFP) {
1977 // Only set up FP if we actually need to.
1978 int64_t FPOffset = AFI->getCalleeSaveBaseToFrameRecordOffset();
1979
1980 if (CombineSPBump)
1981 FPOffset += AFI->getLocalStackSize();
1982
1983 if (AFI->hasSwiftAsyncContext()) {
1984 // Before we update the live FP we have to ensure there's a valid (or
1985 // null) asynchronous context in its slot just before FP in the frame
1986 // record, so store it now.
1987 const auto &Attrs = MF.getFunction().getAttributes();
1988 bool HaveInitialContext = Attrs.hasAttrSomewhere(Attribute::SwiftAsync);
1989 if (HaveInitialContext)
1990 MBB.addLiveIn(AArch64::X22);
1991 Register Reg = HaveInitialContext ? AArch64::X22 : AArch64::XZR;
1992 BuildMI(MBB, MBBI, DL, TII->get(AArch64::StoreSwiftAsyncContext))
1993 .addUse(Reg)
1994 .addUse(AArch64::SP)
1995 .addImm(FPOffset - 8)
1997 if (NeedsWinCFI) {
 1998         // WinCFI and arm64e, where StoreSwiftAsyncContext is expanded
 1999         // to multiple instructions, should be mutually exclusive.
2000 assert(Subtarget.getTargetTriple().getArchName() != "arm64e");
2001 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
2003 HasWinCFI = true;
2004 }
2005 }
2006
2007 if (HomPrologEpilog) {
2008 auto Prolog = MBBI;
2009 --Prolog;
2010 assert(Prolog->getOpcode() == AArch64::HOM_Prolog);
2011 Prolog->addOperand(MachineOperand::CreateImm(FPOffset));
2012 } else {
2013 // Issue sub fp, sp, FPOffset or
2014 // mov fp,sp when FPOffset is zero.
2015 // Note: All stores of callee-saved registers are marked as "FrameSetup".
2016 // This code marks the instruction(s) that set the FP also.
2017 emitFrameOffset(MBB, MBBI, DL, AArch64::FP, AArch64::SP,
2018 StackOffset::getFixed(FPOffset), TII,
2019 MachineInstr::FrameSetup, false, NeedsWinCFI, &HasWinCFI);
2020 if (NeedsWinCFI && HasWinCFI) {
2021 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_PrologEnd))
2023 // After setting up the FP, the rest of the prolog doesn't need to be
2024 // included in the SEH unwind info.
2025 NeedsWinCFI = false;
2026 }
2027 }
2028 if (EmitAsyncCFI)
2029 emitDefineCFAWithFP(MF, MBB, MBBI, DL, FixedObject);
2030 }
2031
 2032   // Now emit the moves for whatever callee saved regs we have (including FP,
 2033   // LR if those are saved). Frame instructions for SVE registers are emitted
 2034   // later, after the instructions which actually save the SVE regs.
2035 if (EmitAsyncCFI)
2036 emitCalleeSavedGPRLocations(MBB, MBBI);
2037
 2038   // Alignment is required for the parent frame, not the funclet.
2039 const bool NeedsRealignment =
2040 NumBytes && !IsFunclet && RegInfo->hasStackRealignment(MF);
2041 const int64_t RealignmentPadding =
2042 (NeedsRealignment && MFI.getMaxAlign() > Align(16))
2043 ? MFI.getMaxAlign().value() - 16
2044 : 0;
2045
2046 if (windowsRequiresStackProbe(MF, NumBytes + RealignmentPadding)) {
2047 uint64_t NumWords = (NumBytes + RealignmentPadding) >> 4;
2048 if (NeedsWinCFI) {
2049 HasWinCFI = true;
2050 // alloc_l can hold at most 256MB, so assume that NumBytes doesn't
2051 // exceed this amount. We need to move at most 2^24 - 1 into x15.
 2052       // This is at most two instructions, MOVZ followed by MOVK.
2053 // TODO: Fix to use multiple stack alloc unwind codes for stacks
2054 // exceeding 256MB in size.
2055 if (NumBytes >= (1 << 28))
2056 report_fatal_error("Stack size cannot exceed 256MB for stack "
2057 "unwinding purposes");
2058
2059 uint32_t LowNumWords = NumWords & 0xFFFF;
2060 BuildMI(MBB, MBBI, DL, TII->get(AArch64::MOVZXi), AArch64::X15)
2061 .addImm(LowNumWords)
2064 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
2066 if ((NumWords & 0xFFFF0000) != 0) {
2067 BuildMI(MBB, MBBI, DL, TII->get(AArch64::MOVKXi), AArch64::X15)
2068 .addReg(AArch64::X15)
2069 .addImm((NumWords & 0xFFFF0000) >> 16) // High half
2072 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
2074 }
2075 } else {
2076 BuildMI(MBB, MBBI, DL, TII->get(AArch64::MOVi64imm), AArch64::X15)
2077 .addImm(NumWords)
2079 }
2080
2081 const char *ChkStk = Subtarget.getChkStkName();
2082 switch (MF.getTarget().getCodeModel()) {
2083 case CodeModel::Tiny:
2084 case CodeModel::Small:
2085 case CodeModel::Medium:
2086 case CodeModel::Kernel:
2087 BuildMI(MBB, MBBI, DL, TII->get(AArch64::BL))
2088 .addExternalSymbol(ChkStk)
2089 .addReg(AArch64::X15, RegState::Implicit)
2094 if (NeedsWinCFI) {
2095 HasWinCFI = true;
2096 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
2098 }
2099 break;
2100 case CodeModel::Large:
2101 BuildMI(MBB, MBBI, DL, TII->get(AArch64::MOVaddrEXT))
2102 .addReg(AArch64::X16, RegState::Define)
2103 .addExternalSymbol(ChkStk)
2104 .addExternalSymbol(ChkStk)
2106 if (NeedsWinCFI) {
2107 HasWinCFI = true;
2108 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
2110 }
2111
2112 BuildMI(MBB, MBBI, DL, TII->get(getBLRCallOpcode(MF)))
2113 .addReg(AArch64::X16, RegState::Kill)
2119 if (NeedsWinCFI) {
2120 HasWinCFI = true;
2121 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
2123 }
2124 break;
2125 }
2126
2127 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SUBXrx64), AArch64::SP)
2128 .addReg(AArch64::SP, RegState::Kill)
2129 .addReg(AArch64::X15, RegState::Kill)
2132 if (NeedsWinCFI) {
2133 HasWinCFI = true;
2134 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_StackAlloc))
2135 .addImm(NumBytes)
2137 }
2138 NumBytes = 0;
2139
2140 if (RealignmentPadding > 0) {
2141 if (RealignmentPadding >= 4096) {
2142 BuildMI(MBB, MBBI, DL, TII->get(AArch64::MOVi64imm))
2143 .addReg(AArch64::X16, RegState::Define)
2144 .addImm(RealignmentPadding)
2146 BuildMI(MBB, MBBI, DL, TII->get(AArch64::ADDXrx64), AArch64::X15)
2147 .addReg(AArch64::SP)
2148 .addReg(AArch64::X16, RegState::Kill)
2151 } else {
2152 BuildMI(MBB, MBBI, DL, TII->get(AArch64::ADDXri), AArch64::X15)
2153 .addReg(AArch64::SP)
2154 .addImm(RealignmentPadding)
2155 .addImm(0)
2157 }
2158
2159 uint64_t AndMask = ~(MFI.getMaxAlign().value() - 1);
2160 BuildMI(MBB, MBBI, DL, TII->get(AArch64::ANDXri), AArch64::SP)
2161 .addReg(AArch64::X15, RegState::Kill)
2163 AFI->setStackRealigned(true);
2164
2165 // No need for SEH instructions here; if we're realigning the stack,
2166 // we've set a frame pointer and already finished the SEH prologue.
2167 assert(!NeedsWinCFI);
2168 }
2169 }
2170
2171 StackOffset SVECalleeSavesSize = {}, SVELocalsSize = SVEStackSize;
2172 MachineBasicBlock::iterator CalleeSavesBegin = MBBI, CalleeSavesEnd = MBBI;
2173
2174 // Process the SVE callee-saves to determine what space needs to be
2175 // allocated.
2176 if (int64_t CalleeSavedSize = AFI->getSVECalleeSavedStackSize()) {
2177 LLVM_DEBUG(dbgs() << "SVECalleeSavedStackSize = " << CalleeSavedSize
2178 << "\n");
2179 // Find callee save instructions in frame.
2180 CalleeSavesBegin = MBBI;
2181 assert(IsSVECalleeSave(CalleeSavesBegin) && "Unexpected instruction");
2183 ++MBBI;
2184 CalleeSavesEnd = MBBI;
2185
2186 SVECalleeSavesSize = StackOffset::getScalable(CalleeSavedSize);
2187 SVELocalsSize = SVEStackSize - SVECalleeSavesSize;
2188 }
2189
2190 // Allocate space for the callee saves (if any).
2191 StackOffset CFAOffset =
2192 StackOffset::getFixed((int64_t)MFI.getStackSize() - NumBytes);
2193 StackOffset LocalsSize = SVELocalsSize + StackOffset::getFixed(NumBytes);
2194 allocateStackSpace(MBB, CalleeSavesBegin, 0, SVECalleeSavesSize, false,
2195 nullptr, EmitAsyncCFI && !HasFP, CFAOffset,
2196 MFI.hasVarSizedObjects() || LocalsSize);
2197 CFAOffset += SVECalleeSavesSize;
2198
2199 if (EmitAsyncCFI)
2200 emitCalleeSavedSVELocations(MBB, CalleeSavesEnd);
2201
2202 // Allocate space for the rest of the frame including SVE locals. Align the
2203 // stack as necessary.
2204 assert(!(canUseRedZone(MF) && NeedsRealignment) &&
2205 "Cannot use redzone with stack realignment");
2206 if (!canUseRedZone(MF)) {
2207 // FIXME: in the case of dynamic re-alignment, NumBytes doesn't have
2208 // the correct value here, as NumBytes also includes padding bytes,
2209 // which shouldn't be counted here.
2210 allocateStackSpace(MBB, CalleeSavesEnd, RealignmentPadding,
2211 SVELocalsSize + StackOffset::getFixed(NumBytes),
2212 NeedsWinCFI, &HasWinCFI, EmitAsyncCFI && !HasFP,
2213 CFAOffset, MFI.hasVarSizedObjects());
2214 }
2215
2216 // If we need a base pointer, set it up here. It's whatever the value of the
2217 // stack pointer is at this point. Any variable size objects will be allocated
2218 // after this, so we can still use the base pointer to reference locals.
2219 //
2220 // FIXME: Clarify FrameSetup flags here.
2221 // Note: Use emitFrameOffset() like above for FP if the FrameSetup flag is
2222 // needed.
2223 // For funclets the BP belongs to the containing function.
2224 if (!IsFunclet && RegInfo->hasBasePointer(MF)) {
2225 TII->copyPhysReg(MBB, MBBI, DL, RegInfo->getBaseRegister(), AArch64::SP,
2226 false);
2227 if (NeedsWinCFI) {
2228 HasWinCFI = true;
2229 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
2231 }
2232 }
2233
 2234   // The very last FrameSetup instruction indicates the end of the prologue.
 2235   // Emit a SEH opcode indicating the prologue end.
2236 if (NeedsWinCFI && HasWinCFI) {
2237 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_PrologEnd))
2239 }
2240
2241 // SEH funclets are passed the frame pointer in X1. If the parent
2242 // function uses the base register, then the base register is used
2243 // directly, and is not retrieved from X1.
2244 if (IsFunclet && F.hasPersonalityFn()) {
2245 EHPersonality Per = classifyEHPersonality(F.getPersonalityFn());
2246 if (isAsynchronousEHPersonality(Per)) {
2247 BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::COPY), AArch64::FP)
2248 .addReg(AArch64::X1)
2250 MBB.addLiveIn(AArch64::X1);
2251 }
2252 }
2253
2254 if (EmitCFI && !EmitAsyncCFI) {
2255 if (HasFP) {
2256 emitDefineCFAWithFP(MF, MBB, MBBI, DL, FixedObject);
2257 } else {
2258 StackOffset TotalSize =
2259 SVEStackSize + StackOffset::getFixed((int64_t)MFI.getStackSize());
2260 unsigned CFIIndex = MF.addFrameInst(createDefCFA(
2261 *RegInfo, /*FrameReg=*/AArch64::SP, /*Reg=*/AArch64::SP, TotalSize,
2262 /*LastAdjustmentWasScalable=*/false));
2263 BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
2264 .addCFIIndex(CFIIndex)
2266 }
2267 emitCalleeSavedGPRLocations(MBB, MBBI);
2268 emitCalleeSavedSVELocations(MBB, MBBI);
2269 }
2270}
2271
2273 switch (MI.getOpcode()) {
2274 default:
2275 return false;
2276 case AArch64::CATCHRET:
2277 case AArch64::CLEANUPRET:
2278 return true;
2279 }
2280}
2281
2283 MachineBasicBlock &MBB) const {
2285 MachineFrameInfo &MFI = MF.getFrameInfo();
2287 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
2288 const TargetInstrInfo *TII = Subtarget.getInstrInfo();
2289 DebugLoc DL;
2290 bool NeedsWinCFI = needsWinCFI(MF);
2291 bool EmitCFI = AFI->needsAsyncDwarfUnwindInfo(MF);
2292 bool HasWinCFI = false;
2293 bool IsFunclet = false;
2294
2295 if (MBB.end() != MBBI) {
2296 DL = MBBI->getDebugLoc();
2297 IsFunclet = isFuncletReturnInstr(*MBBI);
2298 }
2299
2300 MachineBasicBlock::iterator EpilogStartI = MBB.end();
2301
2302 auto FinishingTouches = make_scope_exit([&]() {
2303 if (AFI->shouldSignReturnAddress(MF)) {
2304 BuildMI(MBB, MBB.getFirstTerminator(), DL,
2305 TII->get(AArch64::PAUTH_EPILOGUE))
2306 .setMIFlag(MachineInstr::FrameDestroy);
2307 if (NeedsWinCFI)
2308 HasWinCFI = true; // AArch64PointerAuth pass will insert SEH_PACSignLR
2309 }
2312 if (EmitCFI)
2313 emitCalleeSavedGPRRestores(MBB, MBB.getFirstTerminator());
2314 if (HasWinCFI) {
2316 TII->get(AArch64::SEH_EpilogEnd))
2318 if (!MF.hasWinCFI())
2319 MF.setHasWinCFI(true);
2320 }
2321 if (NeedsWinCFI) {
2322 assert(EpilogStartI != MBB.end());
2323 if (!HasWinCFI)
2324 MBB.erase(EpilogStartI);
2325 }
2326 });
2327
2328 int64_t NumBytes = IsFunclet ? getWinEHFuncletFrameSize(MF)
2329 : MFI.getStackSize();
2330
 2331   // All calls are tail calls in the GHC calling convention, and functions have
 2332   // no prologue/epilogue.
2334 return;
2335
2336 // How much of the stack used by incoming arguments this function is expected
2337 // to restore in this particular epilogue.
2338 int64_t ArgumentStackToRestore = getArgumentStackToRestore(MF, MBB);
2339 bool IsWin64 = Subtarget.isCallingConvWin64(MF.getFunction().getCallingConv(),
2340 MF.getFunction().isVarArg());
2341 unsigned FixedObject = getFixedObjectSize(MF, AFI, IsWin64, IsFunclet);
2342
2343 int64_t AfterCSRPopSize = ArgumentStackToRestore;
2344 auto PrologueSaveSize = AFI->getCalleeSavedStackSize() + FixedObject;
2345 // We cannot rely on the local stack size set in emitPrologue if the function
2346 // has funclets, as funclets have different local stack size requirements, and
2347 // the current value set in emitPrologue may be that of the containing
2348 // function.
2349 if (MF.hasEHFunclets())
2350 AFI->setLocalStackSize(NumBytes - PrologueSaveSize);
2351 if (homogeneousPrologEpilog(MF, &MBB)) {
2352 assert(!NeedsWinCFI);
2353 auto LastPopI = MBB.getFirstTerminator();
2354 if (LastPopI != MBB.begin()) {
2355 auto HomogeneousEpilog = std::prev(LastPopI);
2356 if (HomogeneousEpilog->getOpcode() == AArch64::HOM_Epilog)
2357 LastPopI = HomogeneousEpilog;
2358 }
2359
 2360     // Adjust the local stack size.
2361 emitFrameOffset(MBB, LastPopI, DL, AArch64::SP, AArch64::SP,
2363 MachineInstr::FrameDestroy, false, NeedsWinCFI, &HasWinCFI);
2364
 2365     // SP has already been adjusted while restoring callee save regs.
 2366     // We have already bailed out of the cases that adjust SP for arguments.
2367 assert(AfterCSRPopSize == 0);
2368 return;
2369 }
2370 bool CombineSPBump = shouldCombineCSRLocalStackBumpInEpilogue(MBB, NumBytes);
2371 // Assume we can't combine the last pop with the sp restore.
2372 bool CombineAfterCSRBump = false;
2373 if (!CombineSPBump && PrologueSaveSize != 0) {
2375 while (Pop->getOpcode() == TargetOpcode::CFI_INSTRUCTION ||
2377 Pop = std::prev(Pop);
2378 // Converting the last ldp to a post-index ldp is valid only if the last
2379 // ldp's offset is 0.
2380 const MachineOperand &OffsetOp = Pop->getOperand(Pop->getNumOperands() - 1);
2381 // If the offset is 0 and the AfterCSR pop is not actually trying to
2382 // allocate more stack for arguments (in space that an untimely interrupt
2383 // may clobber), convert it to a post-index ldp.
2384 if (OffsetOp.getImm() == 0 && AfterCSRPopSize >= 0) {
2386 MBB, Pop, DL, TII, PrologueSaveSize, NeedsWinCFI, &HasWinCFI, EmitCFI,
2387 MachineInstr::FrameDestroy, PrologueSaveSize);
2388 } else {
2389 // If not, make sure to emit an add after the last ldp.
 2390       // We're doing this by transferring the size to be restored from the
2391 // adjustment *before* the CSR pops to the adjustment *after* the CSR
2392 // pops.
2393 AfterCSRPopSize += PrologueSaveSize;
2394 CombineAfterCSRBump = true;
2395 }
2396 }
2397
2398 // Move past the restores of the callee-saved registers.
2399 // If we plan on combining the sp bump of the local stack size and the callee
2400 // save stack size, we might need to adjust the CSR save and restore offsets.
2403 while (LastPopI != Begin) {
2404 --LastPopI;
2405 if (!LastPopI->getFlag(MachineInstr::FrameDestroy) ||
2406 IsSVECalleeSave(LastPopI)) {
2407 ++LastPopI;
2408 break;
2409 } else if (CombineSPBump)
2411 NeedsWinCFI, &HasWinCFI);
2412 }
2413
2414 if (NeedsWinCFI) {
2415 // Note that there are cases where we insert SEH opcodes in the
2416 // epilogue when we had no SEH opcodes in the prologue. For
2417 // example, when there is no stack frame but there are stack
 2418     // arguments. Insert the SEH_EpilogStart and remove it later if we
 2419     // didn't emit any SEH opcodes, to avoid generating WinCFI for
 2420     // functions that don't need it.
2421 BuildMI(MBB, LastPopI, DL, TII->get(AArch64::SEH_EpilogStart))
2423 EpilogStartI = LastPopI;
2424 --EpilogStartI;
2425 }
2426
2427 if (hasFP(MF) && AFI->hasSwiftAsyncContext()) {
2430 // Avoid the reload as it is GOT relative, and instead fall back to the
2431 // hardcoded value below. This allows a mismatch between the OS and
2432 // application without immediately terminating on the difference.
2433 [[fallthrough]];
2435 // We need to reset FP to its untagged state on return. Bit 60 is
2436 // currently used to show the presence of an extended frame.
2437
2438 // BIC x29, x29, #0x1000_0000_0000_0000
2439 BuildMI(MBB, MBB.getFirstTerminator(), DL, TII->get(AArch64::ANDXri),
2440 AArch64::FP)
2441 .addUse(AArch64::FP)
2442 .addImm(0x10fe)
2444 if (NeedsWinCFI) {
2445 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
2447 HasWinCFI = true;
2448 }
2449 break;
2450
2452 break;
2453 }
2454 }
2455
2456 const StackOffset &SVEStackSize = getSVEStackSize(MF);
2457
2458 // If there is a single SP update, insert it before the ret and we're done.
2459 if (CombineSPBump) {
2460 assert(!SVEStackSize && "Cannot combine SP bump with SVE");
2461
2462 // When we are about to restore the CSRs, the CFA register is SP again.
2463 if (EmitCFI && hasFP(MF)) {
2464 const AArch64RegisterInfo &RegInfo = *Subtarget.getRegisterInfo();
2465 unsigned Reg = RegInfo.getDwarfRegNum(AArch64::SP, true);
2466 unsigned CFIIndex =
2467 MF.addFrameInst(MCCFIInstruction::cfiDefCfa(nullptr, Reg, NumBytes));
2468 BuildMI(MBB, LastPopI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
2469 .addCFIIndex(CFIIndex)
2471 }
2472
2473 emitFrameOffset(MBB, MBB.getFirstTerminator(), DL, AArch64::SP, AArch64::SP,
2474 StackOffset::getFixed(NumBytes + (int64_t)AfterCSRPopSize),
2475 TII, MachineInstr::FrameDestroy, false, NeedsWinCFI,
2476 &HasWinCFI, EmitCFI, StackOffset::getFixed(NumBytes));
2477 return;
2478 }
2479
2480 NumBytes -= PrologueSaveSize;
2481 assert(NumBytes >= 0 && "Negative stack allocation size!?");
2482
2483 // Process the SVE callee-saves to determine what space needs to be
2484 // deallocated.
2485 StackOffset DeallocateBefore = {}, DeallocateAfter = SVEStackSize;
2486 MachineBasicBlock::iterator RestoreBegin = LastPopI, RestoreEnd = LastPopI;
2487 if (int64_t CalleeSavedSize = AFI->getSVECalleeSavedStackSize()) {
2488 RestoreBegin = std::prev(RestoreEnd);
2489 while (RestoreBegin != MBB.begin() &&
2490 IsSVECalleeSave(std::prev(RestoreBegin)))
2491 --RestoreBegin;
2492
2493 assert(IsSVECalleeSave(RestoreBegin) &&
2494 IsSVECalleeSave(std::prev(RestoreEnd)) && "Unexpected instruction");
2495
2496 StackOffset CalleeSavedSizeAsOffset =
2497 StackOffset::getScalable(CalleeSavedSize);
2498 DeallocateBefore = SVEStackSize - CalleeSavedSizeAsOffset;
2499 DeallocateAfter = CalleeSavedSizeAsOffset;
2500 }
2501
2502 // Deallocate the SVE area.
2503 if (SVEStackSize) {
2504 // If we have stack realignment or variable sized objects on the stack,
2505 // restore the stack pointer from the frame pointer prior to SVE CSR
2506 // restoration.
2507 if (AFI->isStackRealigned() || MFI.hasVarSizedObjects()) {
2508 if (int64_t CalleeSavedSize = AFI->getSVECalleeSavedStackSize()) {
 2509         // Set SP to the start of the SVE callee-save area, from which the
 2510         // registers can be reloaded. The code below will deallocate the
 2511         // stack space by moving FP -> SP.
2512 emitFrameOffset(MBB, RestoreBegin, DL, AArch64::SP, AArch64::FP,
2513 StackOffset::getScalable(-CalleeSavedSize), TII,
2515 }
2516 } else {
2517 if (AFI->getSVECalleeSavedStackSize()) {
2518 // Deallocate the non-SVE locals first before we can deallocate (and
2519 // restore callee saves) from the SVE area.
2521 MBB, RestoreBegin, DL, AArch64::SP, AArch64::SP,
2523 false, false, nullptr, EmitCFI && !hasFP(MF),
2524 SVEStackSize + StackOffset::getFixed(NumBytes + PrologueSaveSize));
2525 NumBytes = 0;
2526 }
2527
2528 emitFrameOffset(MBB, RestoreBegin, DL, AArch64::SP, AArch64::SP,
2529 DeallocateBefore, TII, MachineInstr::FrameDestroy, false,
2530 false, nullptr, EmitCFI && !hasFP(MF),
2531 SVEStackSize +
2532 StackOffset::getFixed(NumBytes + PrologueSaveSize));
2533
2534 emitFrameOffset(MBB, RestoreEnd, DL, AArch64::SP, AArch64::SP,
2535 DeallocateAfter, TII, MachineInstr::FrameDestroy, false,
2536 false, nullptr, EmitCFI && !hasFP(MF),
2537 DeallocateAfter +
2538 StackOffset::getFixed(NumBytes + PrologueSaveSize));
2539 }
2540 if (EmitCFI)
2541 emitCalleeSavedSVERestores(MBB, RestoreEnd);
2542 }
2543
2544 if (!hasFP(MF)) {
2545 bool RedZone = canUseRedZone(MF);
2546 // If this was a redzone leaf function, we don't need to restore the
2547 // stack pointer (but we may need to pop stack args for fastcc).
2548 if (RedZone && AfterCSRPopSize == 0)
2549 return;
2550
2551 // Pop the local variables off the stack. If there are no callee-saved
2552 // registers, it means we are actually positioned at the terminator and can
2553 // combine stack increment for the locals and the stack increment for
2554 // callee-popped arguments into (possibly) a single instruction and be done.
2555 bool NoCalleeSaveRestore = PrologueSaveSize == 0;
2556 int64_t StackRestoreBytes = RedZone ? 0 : NumBytes;
2557 if (NoCalleeSaveRestore)
2558 StackRestoreBytes += AfterCSRPopSize;
2559
2561 MBB, LastPopI, DL, AArch64::SP, AArch64::SP,
2562 StackOffset::getFixed(StackRestoreBytes), TII,
2563 MachineInstr::FrameDestroy, false, NeedsWinCFI, &HasWinCFI, EmitCFI,
2564 StackOffset::getFixed((RedZone ? 0 : NumBytes) + PrologueSaveSize));
2565
2566 // If we were able to combine the local stack pop with the argument pop,
2567 // then we're done.
2568 if (NoCalleeSaveRestore || AfterCSRPopSize == 0) {
2569 return;
2570 }
2571
2572 NumBytes = 0;
2573 }
2574
2575 // Restore the original stack pointer.
2576 // FIXME: Rather than doing the math here, we should instead just use
2577 // non-post-indexed loads for the restores if we aren't actually going to
2578 // be able to save any instructions.
2579 if (!IsFunclet && (MFI.hasVarSizedObjects() || AFI->isStackRealigned())) {
2581 MBB, LastPopI, DL, AArch64::SP, AArch64::FP,
2583 TII, MachineInstr::FrameDestroy, false, NeedsWinCFI, &HasWinCFI);
2584 } else if (NumBytes)
2585 emitFrameOffset(MBB, LastPopI, DL, AArch64::SP, AArch64::SP,
2586 StackOffset::getFixed(NumBytes), TII,
2587 MachineInstr::FrameDestroy, false, NeedsWinCFI, &HasWinCFI);
2588
2589 // When we are about to restore the CSRs, the CFA register is SP again.
2590 if (EmitCFI && hasFP(MF)) {
2591 const AArch64RegisterInfo &RegInfo = *Subtarget.getRegisterInfo();
2592 unsigned Reg = RegInfo.getDwarfRegNum(AArch64::SP, true);
2593 unsigned CFIIndex = MF.addFrameInst(
2594 MCCFIInstruction::cfiDefCfa(nullptr, Reg, PrologueSaveSize));
2595 BuildMI(MBB, LastPopI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
2596 .addCFIIndex(CFIIndex)
2598 }
2599
2600 // This must be placed after the callee-save restore code because that code
2601 // assumes the SP is at the same location as it was after the callee-save save
2602 // code in the prologue.
2603 if (AfterCSRPopSize) {
2604 assert(AfterCSRPopSize > 0 && "attempting to reallocate arg stack that an "
2605 "interrupt may have clobbered");
2606
2608 MBB, MBB.getFirstTerminator(), DL, AArch64::SP, AArch64::SP,
2610 false, NeedsWinCFI, &HasWinCFI, EmitCFI,
2611 StackOffset::getFixed(CombineAfterCSRBump ? PrologueSaveSize : 0));
2612 }
2613}
2614
2617 MF.getInfo<AArch64FunctionInfo>()->needsDwarfUnwindInfo(MF);
2618}
2619
2621 return enableCFIFixup(MF) &&
2622 MF.getInfo<AArch64FunctionInfo>()->needsAsyncDwarfUnwindInfo(MF);
2623}
2624
2625/// getFrameIndexReference - Provide a base+offset reference to an FI slot for
2626/// debug info. It's the same as what we use for resolving the code-gen
2627/// references for now. FIXME: This can go wrong when references are
2628/// SP-relative and simple call frames aren't used.
2631 Register &FrameReg) const {
2633 MF, FI, FrameReg,
2634 /*PreferFP=*/
2635 MF.getFunction().hasFnAttribute(Attribute::SanitizeHWAddress) ||
2636 MF.getFunction().hasFnAttribute(Attribute::SanitizeMemTag),
2637 /*ForSimm=*/false);
2638}
2639
2642 int FI) const {
2643 // This function serves to provide a comparable offset from a single reference
2644 // point (the value of SP at function entry) that can be used for analysis,
2645 // e.g. the stack-frame-layout analysis pass. It is not guaranteed to be
2646 // correct for all objects in the presence of VLA-area objects or dynamic
2647 // stack re-alignment.
2648
2649 const auto &MFI = MF.getFrameInfo();
2650
2651 int64_t ObjectOffset = MFI.getObjectOffset(FI);
2652 StackOffset SVEStackSize = getSVEStackSize(MF);
2653
2654 // For VLA-area objects, just emit an offset at the end of the stack frame.
 2655   // Whilst not quite correct, these objects do live at the end of the frame
 2656   // and so it is more useful for analysis if the offset reflects this.
2657 if (MFI.isVariableSizedObjectIndex(FI)) {
2658 return StackOffset::getFixed(-((int64_t)MFI.getStackSize())) - SVEStackSize;
2659 }
2660
2661 // This is correct in the absence of any SVE stack objects.
2662 if (!SVEStackSize)
2663 return StackOffset::getFixed(ObjectOffset - getOffsetOfLocalArea());
2664
2665 const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
2666 if (MFI.getStackID(FI) == TargetStackID::ScalableVector) {
2667 return StackOffset::get(-((int64_t)AFI->getCalleeSavedStackSize()),
2668 ObjectOffset);
2669 }
2670
2671 bool IsFixed = MFI.isFixedObjectIndex(FI);
2672 bool IsCSR =
2673 !IsFixed && ObjectOffset >= -((int)AFI->getCalleeSavedStackSize(MFI));
2674
2675 StackOffset ScalableOffset = {};
2676 if (!IsFixed && !IsCSR)
2677 ScalableOffset = -SVEStackSize;
2678
2679 return StackOffset::getFixed(ObjectOffset) + ScalableOffset;
2680}
2681
2684 int FI) const {
2686}
2687
2689 int64_t ObjectOffset) {
2690 const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
2691 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
2692 const Function &F = MF.getFunction();
2693 bool IsWin64 = Subtarget.isCallingConvWin64(F.getCallingConv(), F.isVarArg());
2694 unsigned FixedObject =
2695 getFixedObjectSize(MF, AFI, IsWin64, /*IsFunclet=*/false);
2696 int64_t CalleeSaveSize = AFI->getCalleeSavedStackSize(MF.getFrameInfo());
2697 int64_t FPAdjust =
2698 CalleeSaveSize - AFI->getCalleeSaveBaseToFrameRecordOffset();
2699 return StackOffset::getFixed(ObjectOffset + FixedObject + FPAdjust);
2700}
2701
2703 int64_t ObjectOffset) {
2704 const auto &MFI = MF.getFrameInfo();
2705 return StackOffset::getFixed(ObjectOffset + (int64_t)MFI.getStackSize());
2706}
2707
2708// TODO: This function currently does not work for scalable vectors.
2710 int FI) const {
2711 const auto *RegInfo = static_cast<const AArch64RegisterInfo *>(
2713 int ObjectOffset = MF.getFrameInfo().getObjectOffset(FI);
2714 return RegInfo->getLocalAddressRegister(MF) == AArch64::FP
2715 ? getFPOffset(MF, ObjectOffset).getFixed()
2716 : getStackOffset(MF, ObjectOffset).getFixed();
2717}
2718
2720 const MachineFunction &MF, int FI, Register &FrameReg, bool PreferFP,
2721 bool ForSimm) const {
2722 const auto &MFI = MF.getFrameInfo();
2723 int64_t ObjectOffset = MFI.getObjectOffset(FI);
2724 bool isFixed = MFI.isFixedObjectIndex(FI);
2725 bool isSVE = MFI.getStackID(FI) == TargetStackID::ScalableVector;
2726 return resolveFrameOffsetReference(MF, ObjectOffset, isFixed, isSVE, FrameReg,
2727 PreferFP, ForSimm);
2728}
2729
2731 const MachineFunction &MF, int64_t ObjectOffset, bool isFixed, bool isSVE,
2732 Register &FrameReg, bool PreferFP, bool ForSimm) const {
2733 const auto &MFI = MF.getFrameInfo();
2734 const auto *RegInfo = static_cast<const AArch64RegisterInfo *>(
2736 const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
2737 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
2738
2739 int64_t FPOffset = getFPOffset(MF, ObjectOffset).getFixed();
2740 int64_t Offset = getStackOffset(MF, ObjectOffset).getFixed();
2741 bool isCSR =
2742 !isFixed && ObjectOffset >= -((int)AFI->getCalleeSavedStackSize(MFI));
2743
2744 const StackOffset &SVEStackSize = getSVEStackSize(MF);
2745
2746 // Use frame pointer to reference fixed objects. Use it for locals if
2747 // there are VLAs or a dynamically realigned SP (and thus the SP isn't
2748 // reliable as a base). Make sure useFPForScavengingIndex() does the
2749 // right thing for the emergency spill slot.
2750 bool UseFP = false;
2751 if (AFI->hasStackFrame() && !isSVE) {
2752 // We shouldn't prefer using the FP to access fixed-sized stack objects when
2753 // there are scalable (SVE) objects in between the FP and the fixed-sized
2754 // objects.
2755 PreferFP &= !SVEStackSize;
2756
2757 // Note: Keeping the following as multiple 'if' statements rather than
2758 // merging to a single expression for readability.
2759 //
2760 // Argument access should always use the FP.
2761 if (isFixed) {
2762 UseFP = hasFP(MF);
2763 } else if (isCSR && RegInfo->hasStackRealignment(MF)) {
2764 // References to the CSR area must use FP if we're re-aligning the stack
2765 // since the dynamically-sized alignment padding is between the SP/BP and
2766 // the CSR area.
2767 assert(hasFP(MF) && "Re-aligned stack must have frame pointer");
2768 UseFP = true;
2769 } else if (hasFP(MF) && !RegInfo->hasStackRealignment(MF)) {
2770 // If the FPOffset is negative and we're producing a signed immediate, we
2771 // have to keep in mind that the available offset range for negative
2772 // offsets is smaller than for positive ones. If an offset is available
2773 // via the FP and the SP, use whichever is closest.
2774 bool FPOffsetFits = !ForSimm || FPOffset >= -256;
2775 PreferFP |= Offset > -FPOffset && !SVEStackSize;
2776
2777 if (FPOffset >= 0) {
2778 // If the FPOffset is positive, that'll always be best, as the SP/BP
2779 // will be even further away.
2780 UseFP = true;
2781 } else if (MFI.hasVarSizedObjects()) {
2782 // If we have variable sized objects, we can use either FP or BP, as the
2783 // SP offset is unknown. We can use the base pointer if we have one and
2784 // FP is not preferred. If not, we're stuck with using FP.
2785 bool CanUseBP = RegInfo->hasBasePointer(MF);
2786 if (FPOffsetFits && CanUseBP) // Both are ok. Pick the best.
2787 UseFP = PreferFP;
2788 else if (!CanUseBP) // Can't use BP. Forced to use FP.
2789 UseFP = true;
2790 // else we can use BP and FP, but the offset from FP won't fit.
2791 // That will make us scavenge registers which we can probably avoid by
2792 // using BP. If it won't fit for BP either, we'll scavenge anyway.
2793 } else if (MF.hasEHFunclets() && !RegInfo->hasBasePointer(MF)) {
2794 // Funclets access the locals contained in the parent's stack frame
2795 // via the frame pointer, so we have to use the FP in the parent
2796 // function.
2797 (void) Subtarget;
2798 assert(Subtarget.isCallingConvWin64(MF.getFunction().getCallingConv(),
2799 MF.getFunction().isVarArg()) &&
2800 "Funclets should only be present on Win64");
2801 UseFP = true;
2802 } else {
2803 // We have the choice between FP and (SP or BP).
2804 if (FPOffsetFits && PreferFP) // If FP is the best fit, use it.
2805 UseFP = true;
2806 }
2807 }
2808 }
2809
2810 assert(
2811 ((isFixed || isCSR) || !RegInfo->hasStackRealignment(MF) || !UseFP) &&
2812 "In the presence of dynamic stack pointer realignment, "
2813 "non-argument/CSR objects cannot be accessed through the frame pointer");
2814
2815 if (isSVE) {
2816 StackOffset FPOffset =
2818 StackOffset SPOffset =
2819 SVEStackSize +
2820 StackOffset::get(MFI.getStackSize() - AFI->getCalleeSavedStackSize(),
2821 ObjectOffset);
2822 // Always use the FP for SVE spills if available and beneficial.
2823 if (hasFP(MF) && (SPOffset.getFixed() ||
2824 FPOffset.getScalable() < SPOffset.getScalable() ||
2825 RegInfo->hasStackRealignment(MF))) {
2826 FrameReg = RegInfo->getFrameRegister(MF);
2827 return FPOffset;
2828 }
2829
2830 FrameReg = RegInfo->hasBasePointer(MF) ? RegInfo->getBaseRegister()
2831 : (unsigned)AArch64::SP;
2832 return SPOffset;
2833 }
2834
2835 StackOffset ScalableOffset = {};
2836 if (UseFP && !(isFixed || isCSR))
2837 ScalableOffset = -SVEStackSize;
2838 if (!UseFP && (isFixed || isCSR))
2839 ScalableOffset = SVEStackSize;
2840
2841 if (UseFP) {
2842 FrameReg = RegInfo->getFrameRegister(MF);
2843 return StackOffset::getFixed(FPOffset) + ScalableOffset;
2844 }
2845
2846 // Use the base pointer if we have one.
2847 if (RegInfo->hasBasePointer(MF))
2848 FrameReg = RegInfo->getBaseRegister();
2849 else {
2850 assert(!MFI.hasVarSizedObjects() &&
2851 "Can't use SP when we have var sized objects.");
2852 FrameReg = AArch64::SP;
2853 // If we're using the red zone for this function, the SP won't actually
2854 // be adjusted, so the offsets will be negative. They're also all
2855 // within range of the signed 9-bit immediate instructions.
2856 if (canUseRedZone(MF))
2857 Offset -= AFI->getLocalStackSize();
2858 }
2859
2860 return StackOffset::getFixed(Offset) + ScalableOffset;
2861}
2862
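The FP-versus-SP heuristic above hinges on the asymmetric reach of the signed 9-bit immediate used by the unscaled load/store forms. A simplified, self-contained model of that choice (the function name and parameters are illustrative, not LLVM's API):

```cpp
#include <cassert>
#include <cstdint>

// Simplified sketch of the FP-vs-SP choice in resolveFrameOffsetReference.
// The signed 9-bit immediate of the unscaled forms covers [-256, +255], so
// negative FP-relative offsets run out of range sooner than positive
// SP-relative ones.
bool preferFramePointer(int64_t fpOffset, int64_t spOffset, bool forSimm) {
  bool fpOffsetFits = !forSimm || fpOffset >= -256;
  if (fpOffset >= 0)
    return true; // Positive FP offset: SP/BP would be even further away.
  // Otherwise pick the FP only if its offset fits the immediate and is no
  // larger in magnitude than the SP offset.
  return fpOffsetFits && -fpOffset <= spOffset;
}
```

Under this model, an offset of -300 from the FP forces the SP even when the FP is nominally closer, because the immediate cannot encode it.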
2863static unsigned getPrologueDeath(MachineFunction &MF, unsigned Reg) {
2864 // Do not set a kill flag on values that are also marked as live-in. This
2865 // happens with the @llvm.returnaddress intrinsic and with arguments passed in
2866 // callee saved registers.
2867 // Omitting the kill flags is conservatively correct even if the live-in
2868 // is not used after all.
2869 bool IsLiveIn = MF.getRegInfo().isLiveIn(Reg);
2870 return getKillRegState(!IsLiveIn);
2871}
2872
2874 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
2877 return Subtarget.isTargetMachO() &&
2878 !(Subtarget.getTargetLowering()->supportSwiftError() &&
2879 Attrs.hasAttrSomewhere(Attribute::SwiftError)) &&
2881 !requiresSaveVG(MF) && AFI->getSVECalleeSavedStackSize() == 0;
2882}
2883
2884static bool invalidateWindowsRegisterPairing(unsigned Reg1, unsigned Reg2,
2885 bool NeedsWinCFI, bool IsFirst,
2886 const TargetRegisterInfo *TRI) {
2887 // If we are generating register pairs for a Windows function that requires
2888 // EH support, then pair consecutive registers only. There are no unwind
2889 // opcodes for saves/restores of non-consecutive register pairs.
2890 // The unwind opcodes are save_regp, save_regp_x, save_fregp, save_fregp_x,
2891 // save_lrpair.
2892 // https://docs.microsoft.com/en-us/cpp/build/arm64-exception-handling
2893
2894 if (Reg2 == AArch64::FP)
2895 return true;
2896 if (!NeedsWinCFI)
2897 return false;
2898 if (TRI->getEncodingValue(Reg2) == TRI->getEncodingValue(Reg1) + 1)
2899 return false;
2900 // If pairing a GPR with LR, the pair can be described by the save_lrpair
2901 // opcode. If this were the first register pair, it would need a
2902 // predecrement, but there's no save_lrpair_x opcode, so we can only use
2903 // save_lrpair when LR is not part of the first register pair.
2904 // The save_lrpair opcode also requires the first register to be an odd one.
2905 if (Reg1 >= AArch64::X19 && Reg1 <= AArch64::X27 &&
2906 (Reg1 - AArch64::X19) % 2 == 0 && Reg2 == AArch64::LR && !IsFirst)
2907 return false;
2908 return true;
2909}
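The pairing rule above reduces to two checks: the register encodings must be consecutive, or the pair must be describable by save_lrpair. A stripped-down model over plain encoding numbers (x19 → 19, fp/x29 → 29, lr/x30 → 30; names are illustrative):

```cpp
#include <cassert>

// Simplified sketch of invalidateWindowsRegisterPairing using raw encoding
// numbers instead of LLVM register units.
bool winPairingInvalid(int reg1, int reg2, bool needsWinCFI, bool isFirst) {
  const int FP = 29, LR = 30;
  if (reg2 == FP)
    return true; // Never pair another register in front of FP.
  if (!needsWinCFI)
    return false;
  if (reg2 == reg1 + 1)
    return false; // Consecutive: save_regp/save_fregp can describe it.
  // save_lrpair: first reg must be x19, x21, ..., x27, and the pair must
  // not be the first one (there is no pre-decrementing save_lrpair_x).
  if (reg1 >= 19 && reg1 <= 27 && (reg1 - 19) % 2 == 0 && reg2 == LR &&
      !isFirst)
    return false;
  return true;
}
```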
2910
2911/// Returns true if Reg1 and Reg2 cannot be paired using a ldp/stp instruction.
2912/// WindowsCFI requires that only consecutive registers can be paired.
2913/// LR and FP need to be allocated together when the frame needs to save
2914/// the frame-record. This means any other register pairing with LR is invalid.
2915static bool invalidateRegisterPairing(unsigned Reg1, unsigned Reg2,
2916 bool UsesWinAAPCS, bool NeedsWinCFI,
2917 bool NeedsFrameRecord, bool IsFirst,
2918 const TargetRegisterInfo *TRI) {
2919 if (UsesWinAAPCS)
2920 return invalidateWindowsRegisterPairing(Reg1, Reg2, NeedsWinCFI, IsFirst,
2921 TRI);
2922
2923 // If we need to store the frame record, don't pair any register
2924 // with LR other than FP.
2925 if (NeedsFrameRecord)
2926 return Reg2 == AArch64::LR;
2927
2928 return false;
2929}
2930
2931namespace {
2932
2933struct RegPairInfo {
2934 unsigned Reg1 = AArch64::NoRegister;
2935 unsigned Reg2 = AArch64::NoRegister;
2936 int FrameIdx;
2937 int Offset;
2938 enum RegType { GPR, FPR64, FPR128, PPR, ZPR, VG } Type;
2939 const TargetRegisterClass *RC;
2940
2941 RegPairInfo() = default;
2942
2943 bool isPaired() const { return Reg2 != AArch64::NoRegister; }
2944
2945 bool isScalable() const { return Type == PPR || Type == ZPR; }
2946};
2947
2948} // end anonymous namespace
2949
2950unsigned findFreePredicateReg(BitVector &SavedRegs) {
2951 for (unsigned PReg = AArch64::P8; PReg <= AArch64::P15; ++PReg) {
2952 if (SavedRegs.test(PReg)) {
2953 unsigned PNReg = PReg - AArch64::P0 + AArch64::PN0;
2954 return PNReg;
2955 }
2956 }
2957 return AArch64::NoRegister;
2958}
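findFreePredicateReg scans the saved-register mask for a predicate in p8..p15: since those are callee-saved, a spilled one can be repurposed as the governing PN register for multi-vector fills and spills. A sketch of the scan over plain indices (a simplified stand-in for the BitVector of register units):

```cpp
#include <bitset>
#include <cassert>

// Returns the index (8..15) of the first saved predicate register in
// p8..p15, or -1 if none is saved. Mirrors the loop in
// findFreePredicateReg, which then maps the hit to the matching PN reg.
int firstSavedHighPredicate(const std::bitset<16> &savedPRegs) {
  for (int p = 8; p <= 15; ++p)
    if (savedPRegs.test(p))
      return p;
  return -1;
}
```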
2959
2960 // The multi-vector LD/ST instructions are available only on SME2 or SVE2p1 targets
2962 MachineFunction &MF) {
2964 return false;
2965
2966 SMEAttrs FuncAttrs(MF.getFunction());
2967 bool IsLocallyStreaming =
2968 FuncAttrs.hasStreamingBody() && !FuncAttrs.hasStreamingInterface();
2969
2970 // SME2 instructions can only be used safely while in streaming mode.
2971 // It is not safe to use them in streaming-compatible or locally-streaming
2972 // mode.
2973 return Subtarget.hasSVE2p1() ||
2974 (Subtarget.hasSME2() &&
2975 (!IsLocallyStreaming && Subtarget.isStreaming()));
2976}
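The condition above reads as a small truth table: SVE2p1 always qualifies, while SME2 qualifies only when the whole function executes in streaming mode and is not locally streaming. A minimal sketch of that predicate:

```cpp
#include <cassert>

// Simplified sketch of enableMultiVectorSpillFill: multi-vector LD1/ST1
// are usable with SVE2p1 unconditionally, or with SME2 when the function
// runs entirely in streaming mode (locally-streaming functions switch
// modes inside the body, so they do not qualify).
bool multiVectorSpillFillOK(bool hasSVE2p1, bool hasSME2, bool isStreaming,
                            bool isLocallyStreaming) {
  return hasSVE2p1 || (hasSME2 && !isLocallyStreaming && isStreaming);
}
```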
2977
2981 bool NeedsFrameRecord) {
2982
2983 if (CSI.empty())
2984 return;
2985
2986 bool IsWindows = isTargetWindows(MF);
2987 bool NeedsWinCFI = needsWinCFI(MF);
2989 unsigned StackHazardSize = getStackHazardSize(MF);
2990 MachineFrameInfo &MFI = MF.getFrameInfo();
2992 unsigned Count = CSI.size();
2993 (void)CC;
2994 // MachO's compact unwind format relies on all registers being stored in
2995 // pairs.
2998 CC == CallingConv::Win64 || (Count & 1) == 0) &&
2999 "Odd number of callee-saved regs to spill!");
3000 int ByteOffset = AFI->getCalleeSavedStackSize();
3001 int StackFillDir = -1;
3002 int RegInc = 1;
3003 unsigned FirstReg = 0;
3004 if (NeedsWinCFI) {
3005 // For WinCFI, fill the stack from the bottom up.
3006 ByteOffset = 0;
3007 StackFillDir = 1;
3008 // As the CSI array is reversed to match PrologEpilogInserter, iterate
3009 // backwards, to pair up registers starting from lower numbered registers.
3010 RegInc = -1;
3011 FirstReg = Count - 1;
3012 }
3013 int ScalableByteOffset = AFI->getSVECalleeSavedStackSize();
3014 bool NeedGapToAlignStack = AFI->hasCalleeSaveStackFreeSpace();
3015 Register LastReg = 0;
3016
3017 // When iterating backwards, the loop condition relies on unsigned wraparound.
3018 for (unsigned i = FirstReg; i < Count; i += RegInc) {
3019 RegPairInfo RPI;
3020 RPI.Reg1 = CSI[i].getReg();
3021
3022 if (AArch64::GPR64RegClass.contains(RPI.Reg1)) {
3023 RPI.Type = RegPairInfo::GPR;
3024 RPI.RC = &AArch64::GPR64RegClass;
3025 } else if (AArch64::FPR64RegClass.contains(RPI.Reg1)) {
3026 RPI.Type = RegPairInfo::FPR64;
3027 RPI.RC = &AArch64::FPR64RegClass;
3028 } else if (AArch64::FPR128RegClass.contains(RPI.Reg1)) {
3029 RPI.Type = RegPairInfo::FPR128;
3030 RPI.RC = &AArch64::FPR128RegClass;
3031 } else if (AArch64::ZPRRegClass.contains(RPI.Reg1)) {
3032 RPI.Type = RegPairInfo::ZPR;
3033 RPI.RC = &AArch64::ZPRRegClass;
3034 } else if (AArch64::PPRRegClass.contains(RPI.Reg1)) {
3035 RPI.Type = RegPairInfo::PPR;
3036 RPI.RC = &AArch64::PPRRegClass;
3037 } else if (RPI.Reg1 == AArch64::VG) {
3038 RPI.Type = RegPairInfo::VG;
3039 RPI.RC = &AArch64::FIXED_REGSRegClass;
3040 } else {
3041 llvm_unreachable("Unsupported register class.");
3042 }
3043
3044 // Add the stack hazard size as we transition from GPR->FPR CSRs.
3045 if (AFI->hasStackHazardSlotIndex() &&
3046 (!LastReg || !AArch64InstrInfo::isFpOrNEON(LastReg)) &&
3048 ByteOffset += StackFillDir * StackHazardSize;
3049 LastReg = RPI.Reg1;
3050
3051 int Scale = TRI->getSpillSize(*RPI.RC);
3052 // Add the next reg to the pair if it is in the same register class.
3053 if (unsigned(i + RegInc) < Count && !AFI->hasStackHazardSlotIndex()) {
3054 Register NextReg = CSI[i + RegInc].getReg();
3055 bool IsFirst = i == FirstReg;
3056 switch (RPI.Type) {
3057 case RegPairInfo::GPR:
3058 if (AArch64::GPR64RegClass.contains(NextReg) &&
3059 !invalidateRegisterPairing(RPI.Reg1, NextReg, IsWindows,
3060 NeedsWinCFI, NeedsFrameRecord, IsFirst,
3061 TRI))
3062 RPI.Reg2 = NextReg;
3063 break;
3064 case RegPairInfo::FPR64:
3065 if (AArch64::FPR64RegClass.contains(NextReg) &&
3066 !invalidateWindowsRegisterPairing(RPI.Reg1, NextReg, NeedsWinCFI,
3067 IsFirst, TRI))
3068 RPI.Reg2 = NextReg;
3069 break;
3070 case RegPairInfo::FPR128:
3071 if (AArch64::FPR128RegClass.contains(NextReg))
3072 RPI.Reg2 = NextReg;
3073 break;
3074 case RegPairInfo::PPR:
3075 break;
3076 case RegPairInfo::ZPR:
3077 if (AFI->getPredicateRegForFillSpill() != 0 &&
3078 ((RPI.Reg1 - AArch64::Z0) & 1) == 0 && (NextReg == RPI.Reg1 + 1)) {
3079 // Calculate offset of register pair to see if pair instruction can be
3080 // used.
3081 int Offset = (ScalableByteOffset + StackFillDir * 2 * Scale) / Scale;
3082 if ((-16 <= Offset && Offset <= 14) && (Offset % 2 == 0))
3083 RPI.Reg2 = NextReg;
3084 }
3085 break;
3086 case RegPairInfo::VG:
3087 break;
3088 }
3089 }
3090
3091 // GPRs and FPRs are saved in pairs of 64-bit regs. We expect the CSI
3092 // list to come in sorted by frame index so that we can issue the store
3093 // pair instructions directly. Assert if we see anything else.
3094 //
3095 // The order of the registers in the list is controlled by
3096 // getCalleeSavedRegs(), so they will always be in-order, as well.
3097 assert((!RPI.isPaired() ||
3098 (CSI[i].getFrameIdx() + RegInc == CSI[i + RegInc].getFrameIdx())) &&
3099 "Out of order callee saved regs!");
3100
3101 assert((!RPI.isPaired() || !NeedsFrameRecord || RPI.Reg2 != AArch64::FP ||
3102 RPI.Reg1 == AArch64::LR) &&
3103 "FrameRecord must be allocated together with LR");
3104
3105 // Windows AAPCS has FP and LR reversed.
3106 assert((!RPI.isPaired() || !NeedsFrameRecord || RPI.Reg1 != AArch64::FP ||
3107 RPI.Reg2 == AArch64::LR) &&
3108 "FrameRecord must be allocated together with LR");
3109
3110 // MachO's compact unwind format relies on all registers being stored in
3111 // adjacent register pairs.
3115 (RPI.isPaired() &&
3116 ((RPI.Reg1 == AArch64::LR && RPI.Reg2 == AArch64::FP) ||
3117 RPI.Reg1 + 1 == RPI.Reg2))) &&
3118 "Callee-save registers not saved as adjacent register pair!");
3119
3120 RPI.FrameIdx = CSI[i].getFrameIdx();
3121 if (NeedsWinCFI &&
3122 RPI.isPaired()) // RPI.FrameIdx must be the lower index of the pair
3123 RPI.FrameIdx = CSI[i + RegInc].getFrameIdx();
3124
3125 int OffsetPre = RPI.isScalable() ? ScalableByteOffset : ByteOffset;
3126 assert(OffsetPre % Scale == 0);
3127
3128 if (RPI.isScalable())
3129 ScalableByteOffset += StackFillDir * (RPI.isPaired() ? 2 * Scale : Scale);
3130 else
3131 ByteOffset += StackFillDir * (RPI.isPaired() ? 2 * Scale : Scale);
3132
3133 // Swift's async context is directly before FP, so allocate an extra
3134 // 8 bytes for it.
3135 if (NeedsFrameRecord && AFI->hasSwiftAsyncContext() &&
3136 ((!IsWindows && RPI.Reg2 == AArch64::FP) ||
3137 (IsWindows && RPI.Reg2 == AArch64::LR)))
3138 ByteOffset += StackFillDir * 8;
3139
3140 // Round up size of non-pair to pair size if we need to pad the
3141 // callee-save area to ensure 16-byte alignment.
3142 if (NeedGapToAlignStack && !NeedsWinCFI && !RPI.isScalable() &&
3143 RPI.Type != RegPairInfo::FPR128 && !RPI.isPaired() &&
3144 ByteOffset % 16 != 0) {
3145 ByteOffset += 8 * StackFillDir;
3146 assert(MFI.getObjectAlign(RPI.FrameIdx) <= Align(16));
3147 // A stack frame with a gap looks like this, bottom up:
3148 // d9, d8. x21, gap, x20, x19.
3149 // Set extra alignment on the x21 object to create the gap above it.
3150 MFI.setObjectAlignment(RPI.FrameIdx, Align(16));
3151 NeedGapToAlignStack = false;
3152 }
3153
3154 int OffsetPost = RPI.isScalable() ? ScalableByteOffset : ByteOffset;
3155 assert(OffsetPost % Scale == 0);
3156 // If filling top down (default), we want the offset after incrementing it.
3157 // If filling bottom up (WinCFI) we need the original offset.
3158 int Offset = NeedsWinCFI ? OffsetPre : OffsetPost;
3159
3160 // The FP, LR pair goes 8 bytes into our expanded 24-byte slot so that the
3161 // Swift context can directly precede FP.
3162 if (NeedsFrameRecord && AFI->hasSwiftAsyncContext() &&
3163 ((!IsWindows && RPI.Reg2 == AArch64::FP) ||
3164 (IsWindows && RPI.Reg2 == AArch64::LR)))
3165 Offset += 8;
3166 RPI.Offset = Offset / Scale;
3167
3168 assert((!RPI.isPaired() ||
3169 (!RPI.isScalable() && RPI.Offset >= -64 && RPI.Offset <= 63) ||
3170 (RPI.isScalable() && RPI.Offset >= -256 && RPI.Offset <= 255)) &&
3171 "Offset out of bounds for LDP/STP immediate");
3172
3173 auto isFrameRecord = [&] {
3174 if (RPI.isPaired())
3175 return IsWindows ? RPI.Reg1 == AArch64::FP && RPI.Reg2 == AArch64::LR
3176 : RPI.Reg1 == AArch64::LR && RPI.Reg2 == AArch64::FP;
3177 // Otherwise, look for the frame record as two unpaired registers. This is
3178 // needed for -aarch64-stack-hazard-size=<val>, which disables register
3179 // pairing (as the padding may be too large for the LDP/STP offset). Note:
3180 // On Windows, this check works out as current reg == FP, next reg == LR,
3181 // and on other platforms current reg == FP, previous reg == LR. This
3182 // works out as the correct pre-increment or post-increment offsets
3183 // respectively.
3184 return i > 0 && RPI.Reg1 == AArch64::FP &&
3185 CSI[i - 1].getReg() == AArch64::LR;
3186 };
3187
3188 // Save the offset to frame record so that the FP register can point to the
3189 // innermost frame record (spilled FP and LR registers).
3190 if (NeedsFrameRecord && isFrameRecord())
3192
3193 RegPairs.push_back(RPI);
3194 if (RPI.isPaired())
3195 i += RegInc;
3196 }
3197 if (NeedsWinCFI) {
3198 // If we need an alignment gap in the stack, align the topmost stack
3199 // object. A stack frame with a gap looks like this, bottom up:
3200 // x19, d8. d9, gap.
3201 // Set extra alignment on the topmost stack object (the first element in
3202 // CSI, which goes top down), to create the gap above it.
3203 if (AFI->hasCalleeSaveStackFreeSpace())
3204 MFI.setObjectAlignment(CSI[0].getFrameIdx(), Align(16));
3205 // We iterated bottom up over the registers; flip RegPairs back to top
3206 // down order.
3207 std::reverse(RegPairs.begin(), RegPairs.end());
3208 }
3209}
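The ByteOffset bookkeeping in this function fills the callee-save area top-down by default (StackFillDir = -1) and bottom-up for WinCFI, stepping by the pair size for paired saves. A minimal model of the accumulation for 8-byte GPR saves in the default direction (names are illustrative, not LLVM's):

```cpp
#include <cassert>
#include <vector>

// Assigns byte offsets to a sequence of 8-byte callee saves, where each
// entry says whether the save is a pair. A simplified sketch of the
// top-down ByteOffset bookkeeping in computeCalleeSaveRegisterPairs.
std::vector<int> assignOffsets(int areaSize, const std::vector<bool> &paired) {
  std::vector<int> offsets;
  int byteOffset = areaSize; // Fill from the top of the CSR area downward.
  for (bool isPair : paired) {
    byteOffset -= isPair ? 16 : 8; // OffsetPost: offset after the step.
    offsets.push_back(byteOffset);
  }
  return offsets;
}
```

For a 48-byte area with three pairs, this yields offsets 32, 16, 0, matching the stp sequence sketched in the spill code below.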
3210
3214 MachineFunction &MF = *MBB.getParent();
3217 bool NeedsWinCFI = needsWinCFI(MF);
3218 DebugLoc DL;
3220
3221 computeCalleeSaveRegisterPairs(MF, CSI, TRI, RegPairs, hasFP(MF));
3222
3224 // Refresh the reserved regs in case there are any potential changes since the
3225 // last freeze.
3226 MRI.freezeReservedRegs();
3227
3228 if (homogeneousPrologEpilog(MF)) {
3229 auto MIB = BuildMI(MBB, MI, DL, TII.get(AArch64::HOM_Prolog))
3231
3232 for (auto &RPI : RegPairs) {
3233 MIB.addReg(RPI.Reg1);
3234 MIB.addReg(RPI.Reg2);
3235
3236 // Update register live in.
3237 if (!MRI.isReserved(RPI.Reg1))
3238 MBB.addLiveIn(RPI.Reg1);
3239 if (RPI.isPaired() && !MRI.isReserved(RPI.Reg2))
3240 MBB.addLiveIn(RPI.Reg2);
3241 }
3242 return true;
3243 }
3244 bool PTrueCreated = false;
3245 for (const RegPairInfo &RPI : llvm::reverse(RegPairs)) {
3246 unsigned Reg1 = RPI.Reg1;
3247 unsigned Reg2 = RPI.Reg2;
3248 unsigned StrOpc;
3249
3250 // Issue sequence of spills for cs regs. The first spill may be converted
3251 // to a pre-decrement store later by emitPrologue if the callee-save stack
3252 // area allocation can't be combined with the local stack area allocation.
3253 // For example:
3254 // stp x22, x21, [sp, #0] // addImm(+0)
3255 // stp x20, x19, [sp, #16] // addImm(+2)
3256 // stp fp, lr, [sp, #32] // addImm(+4)
3257 // Rationale: This sequence saves uop updates compared to a sequence of
3258 // pre-increment spills like stp xi,xj,[sp,#-16]!
3259 // Note: Similar rationale and sequence for restores in epilog.
3260 unsigned Size = TRI->getSpillSize(*RPI.RC);
3261 Align Alignment = TRI->getSpillAlign(*RPI.RC);
3262 switch (RPI.Type) {
3263 case RegPairInfo::GPR:
3264 StrOpc = RPI.isPaired() ? AArch64::STPXi : AArch64::STRXui;
3265 break;
3266 case RegPairInfo::FPR64:
3267 StrOpc = RPI.isPaired() ? AArch64::STPDi : AArch64::STRDui;
3268 break;
3269 case RegPairInfo::FPR128:
3270 StrOpc = RPI.isPaired() ? AArch64::STPQi : AArch64::STRQui;
3271 break;
3272 case RegPairInfo::ZPR:
3273 StrOpc = RPI.isPaired() ? AArch64::ST1B_2Z_IMM : AArch64::STR_ZXI;
3274 break;
3275 case RegPairInfo::PPR:
3276 StrOpc =
3277 Size == 16 ? AArch64::SPILL_PPR_TO_ZPR_SLOT_PSEUDO : AArch64::STR_PXI;
3278 break;
3279 case RegPairInfo::VG:
3280 StrOpc = AArch64::STRXui;
3281 break;
3282 }
3283
3284 unsigned X0Scratch = AArch64::NoRegister;
3285 if (Reg1 == AArch64::VG) {
3286 // Find an available register to store the value of VG to.
3288 assert(Reg1 != AArch64::NoRegister);
3289 SMEAttrs Attrs(MF.getFunction());
3290
3291 if (Attrs.hasStreamingBody() && !Attrs.hasStreamingInterface() &&
3292 AFI->getStreamingVGIdx() == std::numeric_limits<int>::max()) {
3293 // For locally-streaming functions, we need to store both the streaming
3294 // & non-streaming VG. Spill the streaming value first.
3295 BuildMI(MBB, MI, DL, TII.get(AArch64::RDSVLI_XI), Reg1)
3296 .addImm(1)
3298 BuildMI(MBB, MI, DL, TII.get(AArch64::UBFMXri), Reg1)
3299 .addReg(Reg1)
3300 .addImm(3)
3301 .addImm(63)
3303
3304 AFI->setStreamingVGIdx(RPI.FrameIdx);
3305 } else if (MF.getSubtarget<AArch64Subtarget>().hasSVE()) {
3306 BuildMI(MBB, MI, DL, TII.get(AArch64::CNTD_XPiI), Reg1)
3307 .addImm(31)
3308 .addImm(1)
3310 AFI->setVGIdx(RPI.FrameIdx);
3311 } else {
3313 if (llvm::any_of(
3314 MBB.liveins(),
3315 [&STI](const MachineBasicBlock::RegisterMaskPair &LiveIn) {
3316 return STI.getRegisterInfo()->isSuperOrSubRegisterEq(
3317 AArch64::X0, LiveIn.PhysReg);
3318 }))
3319 X0Scratch = Reg1;
3320
3321 if (X0Scratch != AArch64::NoRegister)
3322 BuildMI(MBB, MI, DL, TII.get(AArch64::ORRXrr), Reg1)
3323 .addReg(AArch64::XZR)
3324 .addReg(AArch64::X0, RegState::Undef)
3325 .addReg(AArch64::X0, RegState::Implicit)
3327
3328 const uint32_t *RegMask = TRI->getCallPreservedMask(
3329 MF,
3331 BuildMI(MBB, MI, DL, TII.get(AArch64::BL))
3332 .addExternalSymbol("__arm_get_current_vg")
3333 .addRegMask(RegMask)
3334 .addReg(AArch64::X0, RegState::ImplicitDefine)
3336 Reg1 = AArch64::X0;
3337 AFI->setVGIdx(RPI.FrameIdx);
3338 }
3339 }
3340
3341 LLVM_DEBUG(dbgs() << "CSR spill: (" << printReg(Reg1, TRI);
3342 if (RPI.isPaired()) dbgs() << ", " << printReg(Reg2, TRI);
3343 dbgs() << ") -> fi#(" << RPI.FrameIdx;
3344 if (RPI.isPaired()) dbgs() << ", " << RPI.FrameIdx + 1;
3345 dbgs() << ")\n");
3346
3347 assert((!NeedsWinCFI || !(Reg1 == AArch64::LR && Reg2 == AArch64::FP)) &&
3348 "Windows unwinding requires a consecutive (FP,LR) pair");
3349 // Windows unwind codes require consecutive registers if registers are
3350 // paired. Make the switch here, so that the code below will save (x,x+1)
3351 // and not (x+1,x).
3352 unsigned FrameIdxReg1 = RPI.FrameIdx;
3353 unsigned FrameIdxReg2 = RPI.FrameIdx + 1;
3354 if (NeedsWinCFI && RPI.isPaired()) {
3355 std::swap(Reg1, Reg2);
3356 std::swap(FrameIdxReg1, FrameIdxReg2);
3357 }
3358
3359 if (RPI.isPaired() && RPI.isScalable()) {
3360 [[maybe_unused]] const AArch64Subtarget &Subtarget =
3363 unsigned PnReg = AFI->getPredicateRegForFillSpill();
3364 assert((PnReg != 0 && enableMultiVectorSpillFill(Subtarget, MF)) &&
3365 "Expects SVE2.1 or SME2 target and a predicate register");
3366#ifdef EXPENSIVE_CHECKS
3367 auto IsPPR = [](const RegPairInfo &c) {
3368 return c.Type == RegPairInfo::PPR;
3369 };
3370 auto PPRBegin = std::find_if(RegPairs.begin(), RegPairs.end(), IsPPR);
3371 auto IsZPR = [](const RegPairInfo &c) {
3372 return c.Type == RegPairInfo::ZPR;
3373 };
3374 auto ZPRBegin = std::find_if(RegPairs.begin(), RegPairs.end(), IsZPR);
3375 assert(!(PPRBegin < ZPRBegin) &&
3376 "Expected callee save predicate to be handled first");
3377#endif
3378 if (!PTrueCreated) {
3379 PTrueCreated = true;
3380 BuildMI(MBB, MI, DL, TII.get(AArch64::PTRUE_C_B), PnReg)
3382 }
3383 MachineInstrBuilder MIB = BuildMI(MBB, MI, DL, TII.get(StrOpc));
3384 if (!MRI.isReserved(Reg1))
3385 MBB.addLiveIn(Reg1);
3386 if (!MRI.isReserved(Reg2))
3387 MBB.addLiveIn(Reg2);
3388 MIB.addReg(/*PairRegs*/ AArch64::Z0_Z1 + (RPI.Reg1 - AArch64::Z0));
3390 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
3391 MachineMemOperand::MOStore, Size, Alignment));
3392 MIB.addReg(PnReg);
3393 MIB.addReg(AArch64::SP)
3394 .addImm(RPI.Offset / 2) // [sp, #imm*2*vscale],
3395 // where 2*vscale is implicit
3398 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
3399 MachineMemOperand::MOStore, Size, Alignment));
3400 if (NeedsWinCFI)
3402 } else { // The case when a ZPR pair is not present
3403 MachineInstrBuilder MIB = BuildMI(MBB, MI, DL, TII.get(StrOpc));
3404 if (!MRI.isReserved(Reg1))
3405 MBB.addLiveIn(Reg1);
3406 if (RPI.isPaired()) {
3407 if (!MRI.isReserved(Reg2))
3408 MBB.addLiveIn(Reg2);
3409 MIB.addReg(Reg2, getPrologueDeath(MF, Reg2));
3411 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
3412 MachineMemOperand::MOStore, Size, Alignment));
3413 }
3414 MIB.addReg(Reg1, getPrologueDeath(MF, Reg1))
3415 .addReg(AArch64::SP)
3416 .addImm(RPI.Offset) // [sp, #offset*vscale],
3417 // where factor*vscale is implicit
3420 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
3421 MachineMemOperand::MOStore, Size, Alignment));
3422 if (NeedsWinCFI)
3424 }
3425 // Update the StackIDs of the SVE stack slots.
3426 MachineFrameInfo &MFI = MF.getFrameInfo();
3427 if (RPI.Type == RegPairInfo::ZPR || RPI.Type == RegPairInfo::PPR) {
3428 MFI.setStackID(FrameIdxReg1, TargetStackID::ScalableVector);
3429 if (RPI.isPaired())
3430 MFI.setStackID(FrameIdxReg2, TargetStackID::ScalableVector);
3431 }
3432
3433 if (X0Scratch != AArch64::NoRegister)
3434 BuildMI(MBB, MI, DL, TII.get(AArch64::ORRXrr), AArch64::X0)
3435 .addReg(AArch64::XZR)
3436 .addReg(X0Scratch, RegState::Undef)
3437 .addReg(X0Scratch, RegState::Implicit)
3439 }
3440 return true;
3441}
3442
3446 MachineFunction &MF = *MBB.getParent();
3448 DebugLoc DL;
3450 bool NeedsWinCFI = needsWinCFI(MF);
3451
3452 if (MBBI != MBB.end())
3453 DL = MBBI->getDebugLoc();
3454
3455 computeCalleeSaveRegisterPairs(MF, CSI, TRI, RegPairs, hasFP(MF));
3456 if (homogeneousPrologEpilog(MF, &MBB)) {
3457 auto MIB = BuildMI(MBB, MBBI, DL, TII.get(AArch64::HOM_Epilog))
3459 for (auto &RPI : RegPairs) {
3460 MIB.addReg(RPI.Reg1, RegState::Define);
3461 MIB.addReg(RPI.Reg2, RegState::Define);
3462 }
3463 return true;
3464 }
3465
3466 // For performance reasons, restore SVE registers in increasing order
3467 auto IsPPR = [](const RegPairInfo &c) { return c.Type == RegPairInfo::PPR; };
3468 auto PPRBegin = std::find_if(RegPairs.begin(), RegPairs.end(), IsPPR);
3469 auto PPREnd = std::find_if_not(PPRBegin, RegPairs.end(), IsPPR);
3470 std::reverse(PPRBegin, PPREnd);
3471 auto IsZPR = [](const RegPairInfo &c) { return c.Type == RegPairInfo::ZPR; };
3472 auto ZPRBegin = std::find_if(RegPairs.begin(), RegPairs.end(), IsZPR);
3473 auto ZPREnd = std::find_if_not(ZPRBegin, RegPairs.end(), IsZPR);
3474 std::reverse(ZPRBegin, ZPREnd);
3475
3476 bool PTrueCreated = false;
3477 for (const RegPairInfo &RPI : RegPairs) {
3478 unsigned Reg1 = RPI.Reg1;
3479 unsigned Reg2 = RPI.Reg2;
3480
3481 // Issue sequence of restores for cs regs. The last restore may be converted
3482 // to a post-increment load later by emitEpilogue if the callee-save stack
3483 // area allocation can't be combined with the local stack area allocation.
3484 // For example:
3485 // ldp fp, lr, [sp, #32] // addImm(+4)
3486 // ldp x20, x19, [sp, #16] // addImm(+2)
3487 // ldp x22, x21, [sp, #0] // addImm(+0)
3488 // Note: see comment in spillCalleeSavedRegisters()
3489 unsigned LdrOpc;
3490 unsigned Size = TRI->getSpillSize(*RPI.RC);
3491 Align Alignment = TRI->getSpillAlign(*RPI.RC);
3492 switch (RPI.Type) {
3493 case RegPairInfo::GPR:
3494 LdrOpc = RPI.isPaired() ? AArch64::LDPXi : AArch64::LDRXui;
3495 break;
3496 case RegPairInfo::FPR64:
3497 LdrOpc = RPI.isPaired() ? AArch64::LDPDi : AArch64::LDRDui;
3498 break;
3499 case RegPairInfo::FPR128:
3500 LdrOpc = RPI.isPaired() ? AArch64::LDPQi : AArch64::LDRQui;
3501 break;
3502 case RegPairInfo::ZPR:
3503 LdrOpc = RPI.isPaired() ? AArch64::LD1B_2Z_IMM : AArch64::LDR_ZXI;
3504 break;
3505 case RegPairInfo::PPR:
3506 LdrOpc = Size == 16 ? AArch64::FILL_PPR_FROM_ZPR_SLOT_PSEUDO
3507 : AArch64::LDR_PXI;
3508 break;
3509 case RegPairInfo::VG:
3510 continue;
3511 }
3512 LLVM_DEBUG(dbgs() << "CSR restore: (" << printReg(Reg1, TRI);
3513 if (RPI.isPaired()) dbgs() << ", " << printReg(Reg2, TRI);
3514 dbgs() << ") -> fi#(" << RPI.FrameIdx;
3515 if (RPI.isPaired()) dbgs() << ", " << RPI.FrameIdx + 1;
3516 dbgs() << ")\n");
3517
3518 // Windows unwind codes require consecutive registers if registers are
3519 // paired. Make the switch here, so that the code below will restore (x,x+1)
3520 // and not (x+1,x).
3521 unsigned FrameIdxReg1 = RPI.FrameIdx;
3522 unsigned FrameIdxReg2 = RPI.FrameIdx + 1;
3523 if (NeedsWinCFI && RPI.isPaired()) {
3524 std::swap(Reg1, Reg2);
3525 std::swap(FrameIdxReg1, FrameIdxReg2);
3526 }
3527
3529 if (RPI.isPaired() && RPI.isScalable()) {
3530 [[maybe_unused]] const AArch64Subtarget &Subtarget =
3532 unsigned PnReg = AFI->getPredicateRegForFillSpill();
3533 assert((PnReg != 0 && enableMultiVectorSpillFill(Subtarget, MF)) &&
3534 "Expects SVE2.1 or SME2 target and a predicate register");
3535#ifdef EXPENSIVE_CHECKS
3536 assert(!(PPRBegin < ZPRBegin) &&
3537 "Expected callee save predicate to be handled first");
3538#endif
3539 if (!PTrueCreated) {
3540 PTrueCreated = true;
3541 BuildMI(MBB, MBBI, DL, TII.get(AArch64::PTRUE_C_B), PnReg)
3543 }
3544 MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII.get(LdrOpc));
3545 MIB.addReg(/*PairRegs*/ AArch64::Z0_Z1 + (RPI.Reg1 - AArch64::Z0),
3546 getDefRegState(true));
3548 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
3549 MachineMemOperand::MOLoad, Size, Alignment));
3550 MIB.addReg(PnReg);
3551 MIB.addReg(AArch64::SP)
3552 .addImm(RPI.Offset / 2) // [sp, #imm*2*vscale]
3553 // where 2*vscale is implicit
3556 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
3557 MachineMemOperand::MOLoad, Size, Alignment));
3558 if (NeedsWinCFI)
3560 } else {
3561 MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII.get(LdrOpc));
3562 if (RPI.isPaired()) {
3563 MIB.addReg(Reg2, getDefRegState(true));
3565 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
3566 MachineMemOperand::MOLoad, Size, Alignment));
3567 }
3568 MIB.addReg(Reg1, getDefRegState(true));
3569 MIB.addReg(AArch64::SP)
3570 .addImm(RPI.Offset) // [sp, #offset*vscale]
3571 // where factor*vscale is implicit
3574 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
3575 MachineMemOperand::MOLoad, Size, Alignment));
3576 if (NeedsWinCFI)
3578 }
3579 }
3580 return true;
3581}
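The PPR/ZPR reordering at the top of this function reverses each contiguous run of scalable saves so restores happen in increasing register order. The find_if / find_if_not / reverse idiom can be isolated as follows (a generic sketch, not the LLVM types):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Reverses, in place, the first contiguous run of elements matching
// `pred` -- the same idiom the epilogue uses on the PPR and ZPR segments
// of RegPairs.
template <typename T, typename Pred>
void reverseFirstRun(std::vector<T> &v, Pred pred) {
  auto first = std::find_if(v.begin(), v.end(), pred);
  auto last = std::find_if_not(first, v.end(), pred);
  std::reverse(first, last);
}
```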
3582
3583// Return the FrameID for a MMO.
3584static std::optional<int> getMMOFrameID(MachineMemOperand *MMO,
3585 const MachineFrameInfo &MFI) {
3586 auto *PSV =
3587 dyn_cast_or_null<FixedStackPseudoSourceValue>(MMO->getPseudoValue());
3588 if (PSV)
3589 return std::optional<int>(PSV->getFrameIndex());
3590
3591 if (MMO->getValue()) {
3592 if (auto *Al = dyn_cast<AllocaInst>(getUnderlyingObject(MMO->getValue()))) {
3593 for (int FI = MFI.getObjectIndexBegin(); FI < MFI.getObjectIndexEnd();
3594 FI++)
3595 if (MFI.getObjectAllocation(FI) == Al)
3596 return FI;
3597 }
3598 }
3599
3600 return std::nullopt;
3601}
3602
3603// Return the FrameID for a Load/Store instruction by looking at the first MMO.
3604static std::optional<int> getLdStFrameID(const MachineInstr &MI,
3605 const MachineFrameInfo &MFI) {
3606 if (!MI.mayLoadOrStore() || MI.getNumMemOperands() < 1)
3607 return std::nullopt;
3608
3609 return getMMOFrameID(*MI.memoperands_begin(), MFI);
3610}
3611
3612// Check if a Hazard slot is needed for the current function, and if so create
3613// one for it. The index is stored in AArch64FunctionInfo->StackHazardSlotIndex,
3614// which can be used to determine if any hazard padding is needed.
3615void AArch64FrameLowering::determineStackHazardSlot(
3616 MachineFunction &MF, BitVector &SavedRegs) const {
3617 unsigned StackHazardSize = getStackHazardSize(MF);
3618 if (StackHazardSize == 0 || StackHazardSize % 16 != 0 ||
3620 return;
3621
3622 // Stack hazards are only needed in streaming functions.
3624 if (!StackHazardInNonStreaming && Attrs.hasNonStreamingInterfaceAndBody())
3625 return;
3626
3627 MachineFrameInfo &MFI = MF.getFrameInfo();
3628
3629 // Add a hazard slot if there are any CSR FPR registers, or any FP-only
3630 // stack objects.
3631 bool HasFPRCSRs = any_of(SavedRegs.set_bits(), [](unsigned Reg) {
3632 return AArch64::FPR64RegClass.contains(Reg) ||
3633 AArch64::FPR128RegClass.contains(Reg) ||
3634 AArch64::ZPRRegClass.contains(Reg) ||
3635 AArch64::PPRRegClass.contains(Reg);
3636 });
3637 bool HasFPRStackObjects = false;
3638 if (!HasFPRCSRs) {
3639 std::vector<unsigned> FrameObjects(MFI.getObjectIndexEnd());
3640 for (auto &MBB : MF) {
3641 for (auto &MI : MBB) {
3642 std::optional<int> FI = getLdStFrameID(MI, MFI);
3643 if (FI && *FI >= 0 && *FI < (int)FrameObjects.size()) {
3644 if (MFI.getStackID(*FI) == TargetStackID::ScalableVector ||
3646 FrameObjects[*FI] |= 2;
3647 else
3648 FrameObjects[*FI] |= 1;
3649 }
3650 }
3651 }
3652 HasFPRStackObjects =
3653 any_of(FrameObjects, [](unsigned B) { return (B & 3) == 2; });
3654 }
3655
3656 if (HasFPRCSRs || HasFPRStackObjects) {
3657 int ID = MFI.CreateStackObject(StackHazardSize, Align(16), false);
3658 LLVM_DEBUG(dbgs() << "Created Hazard slot at " << ID << " size "
3659 << StackHazardSize << "\n");
3660 MF.getInfo<AArch64FunctionInfo>()->setStackHazardSlotIndex(ID);
3661 }
3662}
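The FrameObjects scan above classifies each frame index with a 2-bit mask: bit 1 for FPR/SVE accesses, bit 0 for everything else, so a slot is FPR-only exactly when its mask equals 2. A compact model of that classification (hypothetical helper, not LLVM's API):

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Each access is (frame index, wasFPRAccess). A frame index needs hazard
// padding consideration when it is touched only by FPR/SVE ops, i.e. its
// mask is exactly 2 -- mirroring determineStackHazardSlot's scan.
bool hasFPROnlySlot(const std::vector<std::pair<int, bool>> &accesses,
                    int numObjects) {
  std::vector<unsigned> mask(numObjects, 0);
  for (auto [fi, isFPR] : accesses)
    if (fi >= 0 && fi < numObjects)
      mask[fi] |= isFPR ? 2u : 1u;
  for (unsigned m : mask)
    if ((m & 3) == 2)
      return true;
  return false;
}
```

A slot accessed by both GPR and FPR ops gets mask 3 and does not count, since interleaved access already defeats the padding's purpose.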
3663
3665 BitVector &SavedRegs,
3666 RegScavenger *RS) const {
3667 // All calls are tail calls in GHC calling conv, and functions have no
3668 // prologue/epilogue.
3670 return;
3671
3673 const AArch64RegisterInfo *RegInfo = static_cast<const AArch64RegisterInfo *>(
3675 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
3677 unsigned UnspilledCSGPR = AArch64::NoRegister;
3678 unsigned UnspilledCSGPRPaired = AArch64::NoRegister;
3679
3680 MachineFrameInfo &MFI = MF.getFrameInfo();
3681 const MCPhysReg *CSRegs = MF.getRegInfo().getCalleeSavedRegs();
3682
3683 unsigned BasePointerReg = RegInfo->hasBasePointer(MF)
3684 ? RegInfo->getBaseRegister()
3685 : (unsigned)AArch64::NoRegister;
3686
3687 unsigned ExtraCSSpill = 0;
3688 bool HasUnpairedGPR64 = false;
3689 bool HasPairZReg = false;
3690 // Figure out which callee-saved registers to save/restore.
3691 for (unsigned i = 0; CSRegs[i]; ++i) {
3692 const unsigned Reg = CSRegs[i];
3693
3694 // Add the base pointer register to SavedRegs if it is callee-save.
3695 if (Reg == BasePointerReg)
3696 SavedRegs.set(Reg);
3697
3698 bool RegUsed = SavedRegs.test(Reg);
3699 unsigned PairedReg = AArch64::NoRegister;
3700 const bool RegIsGPR64 = AArch64::GPR64RegClass.contains(Reg);
3701 if (RegIsGPR64 || AArch64::FPR64RegClass.contains(Reg) ||
3702 AArch64::FPR128RegClass.contains(Reg)) {
3703 // Compensate for odd numbers of GP CSRs.
3704 // For now, all the known cases of odd number of CSRs are of GPRs.
3705 if (HasUnpairedGPR64)
3706 PairedReg = CSRegs[i % 2 == 0 ? i - 1 : i + 1];
3707 else
3708 PairedReg = CSRegs[i ^ 1];
3709 }
3710
3711 // If the function requires all the GP registers to save (SavedRegs),
3712 // and there are an odd number of GP CSRs at the same time (CSRegs),
3713 // PairedReg could be in a different register class from Reg, which would
3714 // lead to a FPR (usually D8) accidentally being marked saved.
3715 if (RegIsGPR64 && !AArch64::GPR64RegClass.contains(PairedReg)) {
3716 PairedReg = AArch64::NoRegister;
3717 HasUnpairedGPR64 = true;
3718 }
3719 assert(PairedReg == AArch64::NoRegister ||
3720 AArch64::GPR64RegClass.contains(Reg, PairedReg) ||
3721 AArch64::FPR64RegClass.contains(Reg, PairedReg) ||
3722 AArch64::FPR128RegClass.contains(Reg, PairedReg));
3723
3724 if (!RegUsed) {
3725 if (AArch64::GPR64RegClass.contains(Reg) &&
3726 !RegInfo->isReservedReg(MF, Reg)) {
3727 UnspilledCSGPR = Reg;
3728 UnspilledCSGPRPaired = PairedReg;
3729 }
3730 continue;
3731 }
3732
3733 // Always save P4 when PPR spills are ZPR-sized and a predicate in p8-p15 is
3734 // spilled. If all of p0-p3 are used as return values, p4 must be free to
3735 // reload p8-p15.
3736 if (RegInfo->getSpillSize(AArch64::PPRRegClass) == 16 &&
3737 AArch64::PPR_p8to15RegClass.contains(Reg)) {
3738 SavedRegs.set(AArch64::P4);
3739 }
3740
3741 // MachO's compact unwind format relies on all registers being stored in
3742 // pairs.
3743 // FIXME: the usual format is actually better if unwinding isn't needed.
3744 if (producePairRegisters(MF) && PairedReg != AArch64::NoRegister &&
3745 !SavedRegs.test(PairedReg)) {
3746 SavedRegs.set(PairedReg);
3747 if (AArch64::GPR64RegClass.contains(PairedReg) &&
3748 !RegInfo->isReservedReg(MF, PairedReg))
3749 ExtraCSSpill = PairedReg;
3750 }
3751 // Check if there is a pair of ZRegs, so that a PReg can be selected for spill/fill
3752 HasPairZReg |= (AArch64::ZPRRegClass.contains(Reg, CSRegs[i ^ 1]) &&
3753 SavedRegs.test(CSRegs[i ^ 1]));
3754 }
3755
3756 if (HasPairZReg && enableMultiVectorSpillFill(Subtarget, MF)) {
3758 // Find a suitable predicate register for the multi-vector spill/fill
3759 // instructions.
3760 unsigned PnReg = findFreePredicateReg(SavedRegs);
3761 if (PnReg != AArch64::NoRegister)
3762 AFI->setPredicateRegForFillSpill(PnReg);
3763 // If no free callee-save has been found assign one.
3764 if (!AFI->getPredicateRegForFillSpill() &&
3765 MF.getFunction().getCallingConv() ==
3766 CallingConv::AArch64_SVE_VectorCall) {
3767 SavedRegs.set(AArch64::P8);
3768 AFI->setPredicateRegForFillSpill(AArch64::PN8);
3769 }
3770
3771 assert(!RegInfo->isReservedReg(MF, AFI->getPredicateRegForFillSpill()) &&
3772 "Predicate cannot be a reserved register");
3773 }
3774
3775 if (MF.getFunction().getCallingConv() == CallingConv::Win64 &&
3776 !Subtarget.isTargetWindows()) {
3777 // For Windows calling convention on a non-windows OS, where X18 is treated
3778 // as reserved, back up X18 when entering non-windows code (marked with the
3779 // Windows calling convention) and restore when returning regardless of
3780 // whether the individual function uses it - it might call other functions
3781 // that clobber it.
3782 SavedRegs.set(AArch64::X18);
3783 }
3784
3785 // Calculate the callee-saved stack size.
3786 unsigned CSStackSize = 0;
3787 unsigned SVECSStackSize = 0;
3788 const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();
3789 for (unsigned Reg : SavedRegs.set_bits()) {
3790 auto *RC = TRI->getMinimalPhysRegClass(Reg);
3791 assert(RC && "expected register class!");
3792 auto SpillSize = TRI->getSpillSize(*RC);
3793 if (AArch64::PPRRegClass.contains(Reg) ||
3794 AArch64::ZPRRegClass.contains(Reg))
3795 SVECSStackSize += SpillSize;
3796 else
3797 CSStackSize += SpillSize;
3798 }
3799
3800 // Increase the callee-saved stack size if the function has streaming mode
3801 // changes, as we will need to spill the value of the VG register.
3802 // For locally streaming functions, we spill both the streaming and
3803 // non-streaming VG value.
3804 const Function &F = MF.getFunction();
3805 SMEAttrs Attrs(F);
3806 if (requiresSaveVG(MF)) {
3807 if (Attrs.hasStreamingBody() && !Attrs.hasStreamingInterface())
3808 CSStackSize += 16;
3809 else
3810 CSStackSize += 8;
3811 }
3812
3813 // Determine if a Hazard slot should be used, and increase the CSStackSize by
3814 // StackHazardSize if so.
3815 determineStackHazardSlot(MF, SavedRegs);
3816 if (AFI->hasStackHazardSlotIndex())
3817 CSStackSize += getStackHazardSize(MF);
3818
3819 // Save number of saved regs, so we can easily update CSStackSize later.
3820 unsigned NumSavedRegs = SavedRegs.count();
3821
3822 // The frame record needs to be created by saving the appropriate registers.
3823 uint64_t EstimatedStackSize = MFI.estimateStackSize(MF);
3824 if (hasFP(MF) ||
3825 windowsRequiresStackProbe(MF, EstimatedStackSize + CSStackSize + 16)) {
3826 SavedRegs.set(AArch64::FP);
3827 SavedRegs.set(AArch64::LR);
3828 }
3829
3830 LLVM_DEBUG({
3831 dbgs() << "*** determineCalleeSaves\nSaved CSRs:";
3832 for (unsigned Reg : SavedRegs.set_bits())
3833 dbgs() << ' ' << printReg(Reg, RegInfo);
3834 dbgs() << "\n";
3835 });
3836
3837 // If any callee-saved registers are used, the frame cannot be eliminated.
3838 int64_t SVEStackSize =
3839 alignTo(SVECSStackSize + estimateSVEStackObjectOffsets(MFI), 16);
3840 bool CanEliminateFrame = (SavedRegs.count() == 0) && !SVEStackSize;
3841
3842 // The CSR spill slots have not been allocated yet, so estimateStackSize
3843 // won't include them.
3844 unsigned EstimatedStackSizeLimit = estimateRSStackSizeLimit(MF);
3845
3846 // We may address some of the stack above the canonical frame address, either
3847 // for our own arguments or during a call. Include that in calculating whether
3848 // we have complicated addressing concerns.
3849 int64_t CalleeStackUsed = 0;
3850 for (int I = MFI.getObjectIndexBegin(); I != 0; ++I) {
3851 int64_t FixedOff = MFI.getObjectOffset(I);
3852 if (FixedOff > CalleeStackUsed)
3853 CalleeStackUsed = FixedOff;
3854 }
3855
3856 // Conservatively always assume BigStack when there are SVE spills.
3857 bool BigStack = SVEStackSize || (EstimatedStackSize + CSStackSize +
3858 CalleeStackUsed) > EstimatedStackSizeLimit;
3859 if (BigStack || !CanEliminateFrame || RegInfo->cannotEliminateFrame(MF))
3860 AFI->setHasStackFrame(true);
3861
3862 // Estimate if we might need to scavenge a register at some point in order
3863 // to materialize a stack offset. If so, either spill one additional
3864 // callee-saved register or reserve a special spill slot to facilitate
3865 // register scavenging. If we already spilled an extra callee-saved register
3866 // above to keep the number of spills even, we don't need to do anything else
3867 // here.
3868 if (BigStack) {
3869 if (!ExtraCSSpill && UnspilledCSGPR != AArch64::NoRegister) {
3870 LLVM_DEBUG(dbgs() << "Spilling " << printReg(UnspilledCSGPR, RegInfo)
3871 << " to get a scratch register.\n");
3872 SavedRegs.set(UnspilledCSGPR);
3873 ExtraCSSpill = UnspilledCSGPR;
3874
3875 // MachO's compact unwind format relies on all registers being stored in
3876 // pairs, so if we need to spill one extra for BigStack, then we need to
3877 // store the pair.
3878 if (producePairRegisters(MF)) {
3879 if (UnspilledCSGPRPaired == AArch64::NoRegister) {
3880 // Failed to make a pair for compact unwind format, revert spilling.
3881 if (produceCompactUnwindFrame(MF)) {
3882 SavedRegs.reset(UnspilledCSGPR);
3883 ExtraCSSpill = AArch64::NoRegister;
3884 }
3885 } else
3886 SavedRegs.set(UnspilledCSGPRPaired);
3887 }
3888 }
3889
3890 // If we didn't find an extra callee-saved register to spill, create
3891 // an emergency spill slot.
3892 if (!ExtraCSSpill || MF.getRegInfo().isPhysRegUsed(ExtraCSSpill)) {
3894 const TargetRegisterClass &RC = AArch64::GPR64RegClass;
3895 unsigned Size = TRI->getSpillSize(RC);
3896 Align Alignment = TRI->getSpillAlign(RC);
3897 int FI = MFI.CreateSpillStackObject(Size, Alignment);
3898 RS->addScavengingFrameIndex(FI);
3899 LLVM_DEBUG(dbgs() << "No available CS registers, allocated fi#" << FI
3900 << " as the emergency spill slot.\n");
3901 }
3902 }
3903
3904 // Add the size of the additional 64-bit GPR saves.
3905 CSStackSize += 8 * (SavedRegs.count() - NumSavedRegs);
3906
3907 // A Swift asynchronous context extends the frame record with a pointer
3908 // directly before FP.
3909 if (hasFP(MF) && AFI->hasSwiftAsyncContext())
3910 CSStackSize += 8;
3911
3912 uint64_t AlignedCSStackSize = alignTo(CSStackSize, 16);
3913 LLVM_DEBUG(dbgs() << "Estimated stack frame size: "
3914 << EstimatedStackSize + AlignedCSStackSize << " bytes.\n");
3915
3916 assert((!MFI.isCalleeSavedInfoValid() ||
3917 AFI->getCalleeSavedStackSize() == AlignedCSStackSize) &&
3918 "Should not invalidate callee saved info");
3919
3920 // Round up to register pair alignment to avoid additional SP adjustment
3921 // instructions.
3922 AFI->setCalleeSavedStackSize(AlignedCSStackSize);
3923 AFI->setCalleeSaveStackHasFreeSpace(AlignedCSStackSize != CSStackSize);
3924 AFI->setSVECalleeSavedStackSize(alignTo(SVECSStackSize, 16));
3925}
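As a side note, the `CSRegs[i ^ 1]` pairing above relies on XOR-ing the index with 1 to swap adjacent even/odd slots; once an unpaired GPR breaks the alignment, the parity-based fallback is used instead. A minimal stand-alone illustration (hypothetical helpers, not LLVM code):

```cpp
// Pair partner within an even-aligned CSR list: index 0 <-> 1, 2 <-> 3, ...
int pairedIndex(int i) { return i ^ 1; }

// After an unpaired GPR has shifted the alignment, even indices pair with
// the previous entry and odd indices with the next one.
int pairedIndexAfterUnpaired(int i) { return i % 2 == 0 ? i - 1 : i + 1; }
```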
3926
3927 bool AArch64FrameLowering::assignCalleeSavedSpillSlots(
3928 MachineFunction &MF, const TargetRegisterInfo *RegInfo,
3929 std::vector<CalleeSavedInfo> &CSI, unsigned &MinCSFrameIndex,
3930 unsigned &MaxCSFrameIndex) const {
3931 bool NeedsWinCFI = needsWinCFI(MF);
3932 unsigned StackHazardSize = getStackHazardSize(MF);
3933 // To match the canonical windows frame layout, reverse the list of
3934 // callee saved registers to get them laid out by PrologEpilogInserter
3935 // in the right order. (PrologEpilogInserter allocates stack objects top
3936 // down. Windows canonical prologs store higher numbered registers at
3937 // the top, thus have the CSI array start from the highest registers.)
3938 if (NeedsWinCFI)
3939 std::reverse(CSI.begin(), CSI.end());
3940
3941 if (CSI.empty())
3942 return true; // Early exit if no callee saved registers are modified!
3943
3944 // Now that we know which registers need to be saved and restored, allocate
3945 // stack slots for them.
3946 MachineFrameInfo &MFI = MF.getFrameInfo();
3947 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
3948
3949 bool UsesWinAAPCS = isTargetWindows(MF);
3950 if (UsesWinAAPCS && hasFP(MF) && AFI->hasSwiftAsyncContext()) {
3951 int FrameIdx = MFI.CreateStackObject(8, Align(16), true);
3952 AFI->setSwiftAsyncContextFrameIdx(FrameIdx);
3953 if ((unsigned)FrameIdx < MinCSFrameIndex)
3954 MinCSFrameIndex = FrameIdx;
3955 if ((unsigned)FrameIdx > MaxCSFrameIndex)
3956 MaxCSFrameIndex = FrameIdx;
3957 }
3958
3959 // Insert VG into the list of CSRs, immediately before LR if saved.
3960 if (requiresSaveVG(MF)) {
3961 std::vector<CalleeSavedInfo> VGSaves;
3962 SMEAttrs Attrs(MF.getFunction());
3963
3964 auto VGInfo = CalleeSavedInfo(AArch64::VG);
3965 VGInfo.setRestored(false);
3966 VGSaves.push_back(VGInfo);
3967
3968 // Add VG again if the function is locally-streaming, as we will spill two
3969 // values.
3970 if (Attrs.hasStreamingBody() && !Attrs.hasStreamingInterface())
3971 VGSaves.push_back(VGInfo);
3972
3973 bool InsertBeforeLR = false;
3974
3975 for (unsigned I = 0; I < CSI.size(); I++)
3976 if (CSI[I].getReg() == AArch64::LR) {
3977 InsertBeforeLR = true;
3978 CSI.insert(CSI.begin() + I, VGSaves.begin(), VGSaves.end());
3979 break;
3980 }
3981
3982 if (!InsertBeforeLR)
3983 CSI.insert(CSI.end(), VGSaves.begin(), VGSaves.end());
3984 }
3985
3986 Register LastReg = 0;
3987 int HazardSlotIndex = std::numeric_limits<int>::max();
3988 for (auto &CS : CSI) {
3989 Register Reg = CS.getReg();
3990 const TargetRegisterClass *RC = RegInfo->getMinimalPhysRegClass(Reg);
3991
3992 // Create a hazard slot as we switch between GPR and FPR CSRs.
3993 if (AFI->hasStackHazardSlotIndex() &&
3994 (!LastReg || !AArch64InstrInfo::isFpOrNEON(LastReg)) &&
3995 AArch64InstrInfo::isFpOrNEON(Reg)) {
3996 assert(HazardSlotIndex == std::numeric_limits<int>::max() &&
3997 "Unexpected register order for hazard slot");
3998 HazardSlotIndex = MFI.CreateStackObject(StackHazardSize, Align(8), true);
3999 LLVM_DEBUG(dbgs() << "Created CSR Hazard at slot " << HazardSlotIndex
4000 << "\n");
4001 AFI->setStackHazardCSRSlotIndex(HazardSlotIndex);
4002 if ((unsigned)HazardSlotIndex < MinCSFrameIndex)
4003 MinCSFrameIndex = HazardSlotIndex;
4004 if ((unsigned)HazardSlotIndex > MaxCSFrameIndex)
4005 MaxCSFrameIndex = HazardSlotIndex;
4006 }
4007
4008 unsigned Size = RegInfo->getSpillSize(*RC);
4009 Align Alignment(RegInfo->getSpillAlign(*RC));
4010 int FrameIdx = MFI.CreateStackObject(Size, Alignment, true);
4011 CS.setFrameIdx(FrameIdx);
4012
4013 if ((unsigned)FrameIdx < MinCSFrameIndex)
4014 MinCSFrameIndex = FrameIdx;
4015 if ((unsigned)FrameIdx > MaxCSFrameIndex)
4016 MaxCSFrameIndex = FrameIdx;
4017
4018 // Grab 8 bytes below FP for the extended asynchronous frame info.
4019 if (hasFP(MF) && AFI->hasSwiftAsyncContext() && !UsesWinAAPCS &&
4020 Reg == AArch64::FP) {
4021 FrameIdx = MFI.CreateStackObject(8, Alignment, true);
4022 AFI->setSwiftAsyncContextFrameIdx(FrameIdx);
4023 if ((unsigned)FrameIdx < MinCSFrameIndex)
4024 MinCSFrameIndex = FrameIdx;
4025 if ((unsigned)FrameIdx > MaxCSFrameIndex)
4026 MaxCSFrameIndex = FrameIdx;
4027 }
4028 LastReg = Reg;
4029 }
4030
4031 // Add hazard slot in the case where no FPR CSRs are present.
4032 if (AFI->hasStackHazardSlotIndex() &&
4033 HazardSlotIndex == std::numeric_limits<int>::max()) {
4034 HazardSlotIndex = MFI.CreateStackObject(StackHazardSize, Align(8), true);
4035 LLVM_DEBUG(dbgs() << "Created CSR Hazard at slot " << HazardSlotIndex
4036 << "\n");
4037 AFI->setStackHazardCSRSlotIndex(HazardSlotIndex);
4038 if ((unsigned)HazardSlotIndex < MinCSFrameIndex)
4039 MinCSFrameIndex = HazardSlotIndex;
4040 if ((unsigned)HazardSlotIndex > MaxCSFrameIndex)
4041 MaxCSFrameIndex = HazardSlotIndex;
4042 }
4043
4044 return true;
4045}
4046
4047 bool AArch64FrameLowering::enableStackSlotScavenging(
4048 const MachineFunction &MF) const {
4049 const AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
4050 // If the function has streaming-mode changes, don't scavenge a
4051 // spillslot in the callee-save area, as that might require an
4052 // 'addvl' in the streaming-mode-changing call-sequence when the
4053 // function doesn't use a FP.
4054 if (AFI->hasStreamingModeChanges() && !hasFP(MF))
4055 return false;
4056 // Don't allow register scavenging with hazard slots, in case it moves objects
4057 // into the wrong place.
4058 if (AFI->hasStackHazardSlotIndex())
4059 return false;
4060 return AFI->hasCalleeSaveStackFreeSpace();
4061}
4062
4063/// Returns true if there are any SVE callee saves.
4064 static bool getSVECalleeSaveSlotRange(const MachineFrameInfo &MFI,
4065 int &Min, int &Max) {
4066 Min = std::numeric_limits<int>::max();
4067 Max = std::numeric_limits<int>::min();
4068
4069 if (!MFI.isCalleeSavedInfoValid())
4070 return false;
4071
4072 const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();
4073 for (auto &CS : CSI) {
4074 if (AArch64::ZPRRegClass.contains(CS.getReg()) ||
4075 AArch64::PPRRegClass.contains(CS.getReg())) {
4076 assert((Max == std::numeric_limits<int>::min() ||
4077 Max + 1 == CS.getFrameIdx()) &&
4078 "SVE CalleeSaves are not consecutive");
4079
4080 Min = std::min(Min, CS.getFrameIdx());
4081 Max = std::max(Max, CS.getFrameIdx());
4082 }
4083 }
4084 return Min != std::numeric_limits<int>::max();
4085}
4086
4087// Process all the SVE stack objects and determine offsets for each
4088// object. If AssignOffsets is true, the offsets get assigned.
4089// Fills in the first and last callee-saved frame indices into
4090// Min/MaxCSFrameIndex, respectively.
4091// Returns the size of the stack.
4092 static int64_t determineSVEStackObjectOffsets(MachineFrameInfo &MFI,
4093 int &MinCSFrameIndex,
4094 int &MaxCSFrameIndex,
4095 bool AssignOffsets) {
4096#ifndef NDEBUG
4097 // First process all fixed stack objects.
4098 for (int I = MFI.getObjectIndexBegin(); I != 0; ++I)
4100 "SVE vectors should never be passed on the stack by value, only by "
4101 "reference.");
4102#endif
4103
4104 auto Assign = [&MFI](int FI, int64_t Offset) {
4105 LLVM_DEBUG(dbgs() << "alloc FI(" << FI << ") at SP[" << Offset << "]\n");
4106 MFI.setObjectOffset(FI, Offset);
4107 };
4108
4109 int64_t Offset = 0;
4110
4111 // Then process all callee saved slots.
4112 if (getSVECalleeSaveSlotRange(MFI, MinCSFrameIndex, MaxCSFrameIndex)) {
4113 // Assign offsets to the callee save slots.
4114 for (int I = MinCSFrameIndex; I <= MaxCSFrameIndex; ++I) {
4115 Offset += MFI.getObjectSize(I);
4116 Offset = alignTo(Offset, MFI.getObjectAlign(I));
4117 if (AssignOffsets)
4118 Assign(I, -Offset);
4119 }
4120 }
4121
4122 // Ensure that the callee-save area is aligned to 16 bytes.
4123 Offset = alignTo(Offset, Align(16U));
4124
4125 // Create a buffer of SVE objects to allocate and sort it.
4126 SmallVector<int, 8> ObjectsToAllocate;
4127 // If we have a stack protector, and we've previously decided that we have SVE
4128 // objects on the stack and thus need it to go in the SVE stack area, then it
4129 // needs to go first.
4130 int StackProtectorFI = -1;
4131 if (MFI.hasStackProtectorIndex()) {
4132 StackProtectorFI = MFI.getStackProtectorIndex();
4133 if (MFI.getStackID(StackProtectorFI) == TargetStackID::ScalableVector)
4134 ObjectsToAllocate.push_back(StackProtectorFI);
4135 }
4136 for (int I = 0, E = MFI.getObjectIndexEnd(); I != E; ++I) {
4137 unsigned StackID = MFI.getStackID(I);
4138 if (StackID != TargetStackID::ScalableVector)
4139 continue;
4140 if (I == StackProtectorFI)
4141 continue;
4142 if (MaxCSFrameIndex >= I && I >= MinCSFrameIndex)
4143 continue;
4144 if (MFI.isDeadObjectIndex(I))
4145 continue;
4146
4147 ObjectsToAllocate.push_back(I);
4148 }
4149
4150 // Allocate all SVE locals and spills
4151 for (unsigned FI : ObjectsToAllocate) {
4152 Align Alignment = MFI.getObjectAlign(FI);
4153 // FIXME: Given that the length of SVE vectors is not necessarily a power of
4154 // two, we'd need to align every object dynamically at runtime if the
4155 // alignment is larger than 16. This is not yet supported.
4156 if (Alignment > Align(16))
4157 report_fatal_error(
4158 "Alignment of scalable vectors > 16 bytes is not yet supported");
4159
4160 Offset = alignTo(Offset + MFI.getObjectSize(FI), Alignment);
4161 if (AssignOffsets)
4162 Assign(FI, -Offset);
4163 }
4164
4165 return Offset;
4166}
4167
4168int64_t AArch64FrameLowering::estimateSVEStackObjectOffsets(
4169 MachineFrameInfo &MFI) const {
4170 int MinCSFrameIndex, MaxCSFrameIndex;
4171 return determineSVEStackObjectOffsets(MFI, MinCSFrameIndex, MaxCSFrameIndex, false);
4172}
4173
4174int64_t AArch64FrameLowering::assignSVEStackObjectOffsets(
4175 MachineFrameInfo &MFI, int &MinCSFrameIndex, int &MaxCSFrameIndex) const {
4176 return determineSVEStackObjectOffsets(MFI, MinCSFrameIndex, MaxCSFrameIndex,
4177 true);
4178}
4179
4180/// Attempts to scavenge a register from \p ScavengeableRegs given the used
4181/// registers in \p UsedRegs.
4182 static Register tryScavengeRegister(LiveRegUnits const &UsedRegs,
4183 BitVector const &ScavengeableRegs,
4184 Register PreferredReg) {
4185 if (PreferredReg != AArch64::NoRegister && UsedRegs.available(PreferredReg))
4186 return PreferredReg;
4187 for (auto Reg : ScavengeableRegs.set_bits()) {
4188 if (UsedRegs.available(Reg))
4189 return Reg;
4190 }
4191 return AArch64::NoRegister;
4192}
4193
4194/// Propagates frame-setup/destroy flags from \p SourceMI to all instructions in
4195/// \p MachineInstrs.
4196static void propagateFrameFlags(MachineInstr &SourceMI,
4197 ArrayRef<MachineInstr *> MachineInstrs) {
4198 for (MachineInstr *MI : MachineInstrs) {
4199 if (SourceMI.getFlag(MachineInstr::FrameSetup))
4200 MI->setFlag(MachineInstr::FrameSetup);
4201 if (SourceMI.getFlag(MachineInstr::FrameDestroy))
4202 MI->setFlag(MachineInstr::FrameDestroy);
4203 }
4204}
4205
4206/// RAII helper class for scavenging or spilling a register. On construction
4207/// attempts to find a free register of class \p RC (given \p UsedRegs and \p
4208/// AllocatableRegs), if no register can be found spills \p SpillCandidate to \p
4209/// MaybeSpillFI to free a register. The free'd register is returned via the \p
4210/// FreeReg output parameter. On destruction, if there is a spill, its previous
4211/// value is reloaded. The spilling and scavenging is only valid at the
4212/// insertion point \p MBBI, this class should _not_ be used in places that
4213/// create or manipulate basic blocks, moving the expected insertion point.
4214 struct ScopedScavengeOrSpill {
4215 ScopedScavengeOrSpill(const ScopedScavengeOrSpill &) = delete;
4216 ScopedScavengeOrSpill(ScopedScavengeOrSpill &&) = delete;
4217
4218 ScopedScavengeOrSpill(MachineFunction &MF, MachineBasicBlock &MBB,
4219 MachineBasicBlock::iterator MBBI,
4220 Register SpillCandidate, const TargetRegisterClass &RC,
4221 LiveRegUnits const &UsedRegs,
4222 BitVector const &AllocatableRegs,
4223 std::optional<int> *MaybeSpillFI,
4224 Register PreferredReg = AArch64::NoRegister)
4225 : MBB(MBB), MBBI(MBBI), RC(RC), TII(static_cast<const AArch64InstrInfo &>(
4226 *MF.getSubtarget().getInstrInfo())),
4227 TRI(*MF.getSubtarget().getRegisterInfo()) {
4228 FreeReg = tryScavengeRegister(UsedRegs, AllocatableRegs, PreferredReg);
4229 if (FreeReg != AArch64::NoRegister)
4230 return;
4231 assert(MaybeSpillFI && "Expected emergency spill slot FI information "
4232 "(attempted to spill in prologue/epilogue?)");
4233 if (!MaybeSpillFI->has_value()) {
4234 MachineFrameInfo &MFI = MF.getFrameInfo();
4235 *MaybeSpillFI = MFI.CreateSpillStackObject(TRI.getSpillSize(RC),
4236 TRI.getSpillAlign(RC));
4237 }
4238 FreeReg = SpillCandidate;
4239 SpillFI = MaybeSpillFI->value();
4240 TII.storeRegToStackSlot(MBB, MBBI, FreeReg, false, *SpillFI, &RC, &TRI,
4241 Register());
4242 }
4243
4244 bool hasSpilled() const { return SpillFI.has_value(); }
4245
4246 /// Returns the free register (found from scavenging or spilling a register).
4247 Register freeRegister() const { return FreeReg; }
4248
4249 Register operator*() const { return freeRegister(); }
4250
4252 if (hasSpilled())
4253 TII.loadRegFromStackSlot(MBB, MBBI, FreeReg, *SpillFI, &RC, &TRI,
4254 Register());
4255 }
4256
4257private:
4258 MachineBasicBlock &MBB;
4259 MachineBasicBlock::iterator MBBI;
4260 const TargetRegisterClass &RC;
4261 const AArch64InstrInfo &TII;
4262 const TargetRegisterInfo &TRI;
4263 Register FreeReg = AArch64::NoRegister;
4264 std::optional<int> SpillFI;
4265};
4266
4267/// Emergency stack slots for expanding SPILL_PPR_TO_ZPR_SLOT_PSEUDO and
4268/// FILL_PPR_FROM_ZPR_SLOT_PSEUDO.
4269 struct EmergencyStackSlots {
4270 std::optional<int> ZPRSpillFI;
4271 std::optional<int> PPRSpillFI;
4272 std::optional<int> GPRSpillFI;
4273};
4274
4275/// Registers available for scavenging (ZPR, PPR3b, GPR).
4276 struct ScavengeableRegs {
4277 BitVector ZPRRegs;
4278 BitVector PPR3bRegs;
4279 BitVector GPRRegs;
4280};
4281
4282 static bool isInPrologueOrEpilogue(const MachineInstr &MI) {
4283 return MI.getFlag(MachineInstr::FrameSetup) ||
4284 MI.getFlag(MachineInstr::FrameDestroy);
4285}
4286
4287/// Expands:
4288/// ```
4289/// SPILL_PPR_TO_ZPR_SLOT_PSEUDO $p0, %stack.0, 0
4290/// ```
4291/// To:
4292/// ```
4293/// $z0 = CPY_ZPzI_B $p0, 1, 0
4294/// STR_ZXI $z0, $stack.0, 0
4295/// ```
4296/// While ensuring a ZPR ($z0 in this example) is free for the predicate (
4297/// spilling if necessary).
4298 static void expandSpillPPRToZPRSlotPseudo(MachineBasicBlock &MBB,
4299 MachineInstr &MI,
4300 const TargetRegisterInfo &TRI,
4301 LiveRegUnits const &UsedRegs,
4302 ScavengeableRegs const &SR,
4303 EmergencyStackSlots &SpillSlots) {
4304 MachineFunction &MF = *MBB.getParent();
4305 auto *TII =
4306 static_cast<const AArch64InstrInfo *>(MF.getSubtarget().getInstrInfo());
4307
4308 ScopedScavengeOrSpill ZPredReg(
4309 MF, MBB, MI, AArch64::Z0, AArch64::ZPRRegClass, UsedRegs, SR.ZPRRegs,
4310 isInPrologueOrEpilogue(MI) ? nullptr : &SpillSlots.ZPRSpillFI);
4311
4312 SmallVector<MachineInstr *, 2> MachineInstrs;
4313 const DebugLoc &DL = MI.getDebugLoc();
4314 MachineInstrs.push_back(BuildMI(MBB, MI, DL, TII->get(AArch64::CPY_ZPzI_B))
4315 .addReg(*ZPredReg, RegState::Define)
4316 .add(MI.getOperand(0))
4317 .addImm(1)
4318 .addImm(0)
4319 .getInstr());
4320 MachineInstrs.push_back(BuildMI(MBB, MI, DL, TII->get(AArch64::STR_ZXI))
4321 .addReg(*ZPredReg)
4322 .add(MI.getOperand(1))
4323 .addImm(MI.getOperand(2).getImm())
4324 .setMemRefs(MI.memoperands())
4325 .getInstr());
4326 propagateFrameFlags(MI, MachineInstrs);
4327}
4328
4329/// Expands:
4330/// ```
4331/// $p0 = FILL_PPR_FROM_ZPR_SLOT_PSEUDO %stack.0, 0
4332/// ```
4333/// To:
4334/// ```
4335/// $z0 = LDR_ZXI %stack.0, 0
4336/// $p0 = PTRUE_B 31, implicit $vg
4337/// $p0 = CMPNE_PPzZI_B $p0, $z0, 0, implicit-def $nzcv
4338/// ```
4339/// While ensuring a ZPR ($z0 in this example) is free for the predicate (
4340/// spilling if necessary). If the status flags are in use at the point of
4341/// expansion they are preserved (by moving them to/from a GPR). This may cause
4342/// an additional spill if no GPR is free at the expansion point.
4343 static bool expandFillPPRFromZPRSlotPseudo(
4344 MachineBasicBlock &MBB, MachineInstr &MI, const TargetRegisterInfo &TRI,
4345 LiveRegUnits const &UsedRegs, ScavengeableRegs const &SR,
4346 MachineInstr *&LastPTrue, EmergencyStackSlots &SpillSlots) {
4347 MachineFunction &MF = *MBB.getParent();
4348 auto *TII =
4349 static_cast<const AArch64InstrInfo *>(MF.getSubtarget().getInstrInfo());
4350
4351 ScopedScavengeOrSpill ZPredReg(
4352 MF, MBB, MI, AArch64::Z0, AArch64::ZPRRegClass, UsedRegs, SR.ZPRRegs,
4353 isInPrologueOrEpilogue(MI) ? nullptr : &SpillSlots.ZPRSpillFI);
4354
4355 ScopedScavengeOrSpill PredReg(
4356 MF, MBB, MI, AArch64::P0, AArch64::PPR_3bRegClass, UsedRegs, SR.PPR3bRegs,
4357 isInPrologueOrEpilogue(MI) ? nullptr : &SpillSlots.PPRSpillFI,
4358 /*PreferredReg=*/
4359 LastPTrue ? LastPTrue->getOperand(0).getReg() : AArch64::NoRegister);
4360
4361 // Elide NZCV spills if we know it is not used.
4362 bool IsNZCVUsed = !UsedRegs.available(AArch64::NZCV);
4363 std::optional<ScopedScavengeOrSpill> NZCVSaveReg;
4364 if (IsNZCVUsed)
4365 NZCVSaveReg.emplace(
4366 MF, MBB, MI, AArch64::X0, AArch64::GPR64RegClass, UsedRegs, SR.GPRRegs,
4367 isInPrologueOrEpilogue(MI) ? nullptr : &SpillSlots.GPRSpillFI);
4368 SmallVector<MachineInstr *, 4> MachineInstrs;
4369 const DebugLoc &DL = MI.getDebugLoc();
4370 MachineInstrs.push_back(BuildMI(MBB, MI, DL, TII->get(AArch64::LDR_ZXI))
4371 .addReg(*ZPredReg, RegState::Define)
4372 .add(MI.getOperand(1))
4373 .addImm(MI.getOperand(2).getImm())
4374 .setMemRefs(MI.memoperands())
4375 .getInstr());
4376 if (IsNZCVUsed)
4377 MachineInstrs.push_back(
4378 BuildMI(MBB, MI, DL, TII->get(AArch64::MRS))
4379 .addReg(NZCVSaveReg->freeRegister(), RegState::Define)
4380 .addImm(AArch64SysReg::NZCV)
4381 .addReg(AArch64::NZCV, RegState::Implicit)
4382 .getInstr());
4383
4384 // Reuse previous ptrue if we know it has not been clobbered.
4385 if (LastPTrue) {
4386 assert(*PredReg == LastPTrue->getOperand(0).getReg());
4387 LastPTrue->moveBefore(&MI);
4388 } else {
4389 LastPTrue = BuildMI(MBB, MI, DL, TII->get(AArch64::PTRUE_B))
4390 .addReg(*PredReg, RegState::Define)
4391 .addImm(31);
4392 }
4393 MachineInstrs.push_back(LastPTrue);
4394 MachineInstrs.push_back(
4395 BuildMI(MBB, MI, DL, TII->get(AArch64::CMPNE_PPzZI_B))
4396 .addReg(MI.getOperand(0).getReg(), RegState::Define)
4397 .addReg(*PredReg)
4398 .addReg(*ZPredReg)
4399 .addImm(0)
4400 .addReg(AArch64::NZCV, RegState::ImplicitDefine)
4401 .getInstr());
4402 if (IsNZCVUsed)
4403 MachineInstrs.push_back(BuildMI(MBB, MI, DL, TII->get(AArch64::MSR))
4404 .addImm(AArch64SysReg::NZCV)
4405 .addReg(NZCVSaveReg->freeRegister())
4406 .addReg(AArch64::NZCV, RegState::ImplicitDefine)
4407 .getInstr());
4408
4409 propagateFrameFlags(MI, MachineInstrs);
4410 return PredReg.hasSpilled();
4411}
4412
4413/// Expands all FILL_PPR_FROM_ZPR_SLOT_PSEUDO and SPILL_PPR_TO_ZPR_SLOT_PSEUDO
4414/// operations within the MachineBasicBlock \p MBB.
4415 static bool expandSMEPPRToZPRSpillPseudos(MachineBasicBlock &MBB,
4416 const TargetRegisterInfo &TRI,
4417 ScavengeableRegs const &SR,
4418 EmergencyStackSlots &SpillSlots) {
4419 LiveRegUnits UsedRegs(TRI);
4420 UsedRegs.addLiveOuts(MBB);
4421 bool HasPPRSpills = false;
4422 MachineInstr *LastPTrue = nullptr;
4423 for (MachineInstr &MI : make_early_inc_range(reverse(MBB))) {
4424 UsedRegs.stepBackward(MI);
4425 switch (MI.getOpcode()) {
4426 case AArch64::FILL_PPR_FROM_ZPR_SLOT_PSEUDO:
4427 if (LastPTrue &&
4428 MI.definesRegister(LastPTrue->getOperand(0).getReg(), &TRI))
4429 LastPTrue = nullptr;
4430 HasPPRSpills |= expandFillPPRFromZPRSlotPseudo(MBB, MI, TRI, UsedRegs, SR,
4431 LastPTrue, SpillSlots);
4432 MI.eraseFromParent();
4433 break;
4434 case AArch64::SPILL_PPR_TO_ZPR_SLOT_PSEUDO:
4435 expandSpillPPRToZPRSlotPseudo(MBB, MI, TRI, UsedRegs, SR, SpillSlots);
4436 MI.eraseFromParent();
4437 [[fallthrough]];
4438 default:
4439 LastPTrue = nullptr;
4440 break;
4441 }
4442 }
4443
4444 return HasPPRSpills;
4445}
4446
4447 void AArch64FrameLowering::processFunctionBeforeFrameFinalized(
4448 MachineFunction &MF, RegScavenger *RS) const {
4449
4450 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
4451 const TargetSubtargetInfo &TSI = MF.getSubtarget();
4452 const TargetRegisterInfo &TRI = *TSI.getRegisterInfo();
4453
4454 // If predicates spills are 16-bytes we may need to expand
4455 // SPILL_PPR_TO_ZPR_SLOT_PSEUDO/FILL_PPR_FROM_ZPR_SLOT_PSEUDO.
4456 if (AFI->hasStackFrame() && TRI.getSpillSize(AArch64::PPRRegClass) == 16) {
4457 auto ComputeScavengeableRegisters = [&](unsigned RegClassID) {
4458 BitVector Regs = TRI.getAllocatableSet(MF, TRI.getRegClass(RegClassID));
4459 assert(Regs.count() > 0 && "Expected scavengeable registers");
4460 return Regs;
4461 };
4462
4463 ScavengeableRegs SR{};
4464 SR.ZPRRegs = ComputeScavengeableRegisters(AArch64::ZPRRegClassID);
4465 // Only p0-7 are possible as the second operand of cmpne (needed for fills).
4466 SR.PPR3bRegs = ComputeScavengeableRegisters(AArch64::PPR_3bRegClassID);
4467 SR.GPRRegs = ComputeScavengeableRegisters(AArch64::GPR64RegClassID);
4468
4469 EmergencyStackSlots SpillSlots;
4470 for (MachineBasicBlock &MBB : MF) {
4471 // In the case we had to spill a predicate (in the range p0-p7) to reload
4472 // a predicate (>= p8), additional spill/fill pseudos will be created.
4473 // These need an additional expansion pass. Note: There will only be at
4474 // most two expansion passes, as spilling/filling a predicate in the range
4475 // p0-p7 never requires spilling another predicate.
4476 for (int Pass = 0; Pass < 2; Pass++) {
4477 bool HasPPRSpills =
4478 expandSMEPPRToZPRSpillPseudos(MBB, TRI, SR, SpillSlots);
4479 assert((Pass == 0 || !HasPPRSpills) && "Did not expect PPR spills");
4480 if (!HasPPRSpills)
4481 break;
4482 }
4483 }
4484 }
4485
4486 MachineFrameInfo &MFI = MF.getFrameInfo();
4487
4488 assert(getStackGrowthDirection() == TargetFrameLowering::StackGrowsDown &&
4489 "Upwards growing stack unsupported");
4490
4491 int MinCSFrameIndex, MaxCSFrameIndex;
4492 int64_t SVEStackSize =
4493 assignSVEStackObjectOffsets(MFI, MinCSFrameIndex, MaxCSFrameIndex);
4494
4495 AFI->setStackSizeSVE(alignTo(SVEStackSize, 16U));
4496 AFI->setMinMaxSVECSFrameIndex(MinCSFrameIndex, MaxCSFrameIndex);
4497
4498 // If this function isn't doing Win64-style C++ EH, we don't need to do
4499 // anything.
4500 if (!MF.hasEHFunclets())
4501 return;
4502 const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();
4503 WinEHFuncInfo &EHInfo = *MF.getWinEHFuncInfo();
4504
4505 MachineBasicBlock &MBB = MF.front();
4506 auto MBBI = MBB.begin();
4507 while (MBBI != MBB.end() && MBBI->getFlag(MachineInstr::FrameSetup))
4508 ++MBBI;
4509
4510 // Create an UnwindHelp object.
4511 // The UnwindHelp object is allocated at the start of the fixed object area
4512 int64_t FixedObject =
4513 getFixedObjectSize(MF, AFI, /*IsWin64*/ true, /*IsFunclet*/ false);
4514 int UnwindHelpFI = MFI.CreateFixedObject(/*Size*/ 8,
4515 /*SPOffset*/ -FixedObject,
4516 /*IsImmutable=*/false);
4517 EHInfo.UnwindHelpFrameIdx = UnwindHelpFI;
4518
4519 // We need to store -2 into the UnwindHelp object at the start of the
4520 // function.
4521 DebugLoc DL;
4522 RS->enterBasicBlockEnd(MBB);
4523 RS->backward(MBBI);
4524 Register DstReg = RS->FindUnusedReg(&AArch64::GPR64commonRegClass);
4525 assert(DstReg && "There must be a free register after frame setup");
4526 BuildMI(MBB, MBBI, DL, TII.get(AArch64::MOVi64imm), DstReg).addImm(-2);
4527 BuildMI(MBB, MBBI, DL, TII.get(AArch64::STURXi))
4528 .addReg(DstReg, getKillRegState(true))
4529 .addFrameIndex(UnwindHelpFI)
4530 .addImm(0);
4531}
4532
4533namespace {
4534struct TagStoreInstr {
4535 MachineInstr *MI;
4536 int64_t Offset, Size;
4537 explicit TagStoreInstr(MachineInstr *MI, int64_t Offset, int64_t Size)
4538 : MI(MI), Offset(Offset), Size(Size) {}
4539};
4540
4541class TagStoreEdit {
4542 MachineFunction *MF;
4543 MachineBasicBlock *MBB;
4544 MachineRegisterInfo *MRI;
4545 // Tag store instructions that are being replaced.
4546 SmallVector<TagStoreInstr, 8> TagStores;
4547 // Combined memref arguments of the above instructions.
4548 SmallVector<MachineMemOperand *, 8> CombinedMemRefs;
4549
4550 // Replace allocation tags in [FrameReg + FrameRegOffset, FrameReg +
4551 // FrameRegOffset + Size) with the address tag of SP.
4552 Register FrameReg;
4553 StackOffset FrameRegOffset;
4554 int64_t Size;
4555 // If not std::nullopt, move FrameReg to (FrameReg + FrameRegUpdate) at the
4556 // end.
4557 std::optional<int64_t> FrameRegUpdate;
4558 // MIFlags for any FrameReg updating instructions.
4559 unsigned FrameRegUpdateFlags;
4560
4561 // Use zeroing instruction variants.
4562 bool ZeroData;
4563 DebugLoc DL;
4564
4565 void emitUnrolled(MachineBasicBlock::iterator InsertI);
4566 void emitLoop(MachineBasicBlock::iterator InsertI);
4567
4568public:
4569 TagStoreEdit(MachineBasicBlock *MBB, bool ZeroData)
4570 : MBB(MBB), ZeroData(ZeroData) {
4571 MF = MBB->getParent();
4572 MRI = &MF->getRegInfo();
4573 }
4574 // Add an instruction to be replaced. Instructions must be added in
4575 // ascending order of Offset and must be adjacent.
4576 void addInstruction(TagStoreInstr I) {
4577 assert((TagStores.empty() ||
4578 TagStores.back().Offset + TagStores.back().Size == I.Offset) &&
4579 "Non-adjacent tag store instructions.");
4580 TagStores.push_back(I);
4581 }
4582 void clear() { TagStores.clear(); }
4583 // Emit equivalent code at the given location, and erase the current set of
4584 // instructions. May skip if the replacement is not profitable. May invalidate
4585 // the input iterator and replace it with a valid one.
4586 void emitCode(MachineBasicBlock::iterator &InsertI,
4587 const AArch64FrameLowering *TFI, bool TryMergeSPUpdate);
4588};
4589
4590void TagStoreEdit::emitUnrolled(MachineBasicBlock::iterator InsertI) {
4591 const AArch64InstrInfo *TII =
4592 MF->getSubtarget<AArch64Subtarget>().getInstrInfo();
4593
4594 const int64_t kMinOffset = -256 * 16;
4595 const int64_t kMaxOffset = 255 * 16;
4596
4597 Register BaseReg = FrameReg;
4598 int64_t BaseRegOffsetBytes = FrameRegOffset.getFixed();
4599 if (BaseRegOffsetBytes < kMinOffset ||
4600 BaseRegOffsetBytes + (Size - Size % 32) > kMaxOffset ||
4601 // BaseReg can be FP, which is not necessarily aligned to 16 bytes. In
4602 // that case, BaseRegOffsetBytes will not be aligned to 16 bytes, which
4603 // is required for the offset of ST2G.
4604 BaseRegOffsetBytes % 16 != 0) {
4605 Register ScratchReg = MRI->createVirtualRegister(&AArch64::GPR64RegClass);
4606 emitFrameOffset(*MBB, InsertI, DL, ScratchReg, BaseReg,
4607 StackOffset::getFixed(BaseRegOffsetBytes), TII);
4608 BaseReg = ScratchReg;
4609 BaseRegOffsetBytes = 0;
4610 }
4611
4612 MachineInstr *LastI = nullptr;
4613 while (Size) {
4614 int64_t InstrSize = (Size > 16) ? 32 : 16;
4615 unsigned Opcode =
4616 InstrSize == 16
4617 ? (ZeroData ? AArch64::STZGi : AArch64::STGi)
4618 : (ZeroData ? AArch64::STZ2Gi : AArch64::ST2Gi);
4619 assert(BaseRegOffsetBytes % 16 == 0);
4620 MachineInstr *I = BuildMI(*MBB, InsertI, DL, TII->get(Opcode))
4621 .addReg(AArch64::SP)
4622 .addReg(BaseReg)
4623 .addImm(BaseRegOffsetBytes / 16)
4624 .setMemRefs(CombinedMemRefs);
4625 // A store to [BaseReg, #0] should go last for an opportunity to fold the
4626 // final SP adjustment in the epilogue.
4627 if (BaseRegOffsetBytes == 0)
4628 LastI = I;
4629 BaseRegOffsetBytes += InstrSize;
4630 Size -= InstrSize;
4631 }
4632
4633 if (LastI)
4634 MBB->splice(InsertI, MBB, LastI);
4635}
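The unrolled path above alternates between the one-granule (STG/STZG) and two-granule (ST2G/STZ2G) forms and scales the byte offset into 16-byte granules for the instruction immediate. A minimal standalone model of that selection, using hypothetical helper names that are not part of LLVM:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Standalone sketch of the emitUnrolled selection loop above (hypothetical
// helper, not LLVM API): given a 16-byte-aligned start offset and a size,
// return the (opcode, scaled-immediate) pairs the unrolled sequence would use.
std::vector<std::pair<std::string, int64_t>>
unrollTagStores(int64_t OffsetBytes, int64_t Size, bool ZeroData) {
  std::vector<std::pair<std::string, int64_t>> Out;
  while (Size) {
    // Prefer the two-granule form while more than one granule remains.
    int64_t InstrSize = (Size > 16) ? 32 : 16;
    std::string Op = (InstrSize == 16) ? (ZeroData ? "STZG" : "STG")
                                       : (ZeroData ? "STZ2G" : "ST2G");
    assert(OffsetBytes % 16 == 0 && "offset must be tag-granule aligned");
    // The instruction immediate counts 16-byte granules, not bytes.
    Out.emplace_back(Op, OffsetBytes / 16);
    OffsetBytes += InstrSize;
    Size -= InstrSize;
  }
  return Out;
}
```

Tagging 48 bytes at offset 0 yields one ST2G at granule 0 followed by one STG at granule 2, mirroring the "store to [BaseReg, #0] goes last only via the splice" logic of the real code.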
4636
4637void TagStoreEdit::emitLoop(MachineBasicBlock::iterator InsertI) {
4638 const AArch64InstrInfo *TII =
4639 MF->getSubtarget<AArch64Subtarget>().getInstrInfo();
4640
4641 Register BaseReg = FrameRegUpdate
4642 ? FrameReg
4643 : MRI->createVirtualRegister(&AArch64::GPR64RegClass);
4644 Register SizeReg = MRI->createVirtualRegister(&AArch64::GPR64RegClass);
4645
4646 emitFrameOffset(*MBB, InsertI, DL, BaseReg, FrameReg, FrameRegOffset, TII);
4647
4648 int64_t LoopSize = Size;
4649 // If the loop size is not a multiple of 32, split off one 16-byte store at
4650 // the end to fold BaseReg update into.
4651 if (FrameRegUpdate && *FrameRegUpdate)
4652 LoopSize -= LoopSize % 32;
4653 MachineInstr *LoopI = BuildMI(*MBB, InsertI, DL,
4654 TII->get(ZeroData ? AArch64::STZGloop_wback
4655 : AArch64::STGloop_wback))
4656 .addDef(SizeReg)
4657 .addDef(BaseReg)
4658 .addImm(LoopSize)
4659 .addReg(BaseReg)
4660 .setMemRefs(CombinedMemRefs);
4661 if (FrameRegUpdate)
4662 LoopI->setFlags(FrameRegUpdateFlags);
4663
4664 int64_t ExtraBaseRegUpdate =
4665 FrameRegUpdate ? (*FrameRegUpdate - FrameRegOffset.getFixed() - Size) : 0;
4666 LLVM_DEBUG(dbgs() << "TagStoreEdit::emitLoop: LoopSize=" << LoopSize
4667 << ", Size=" << Size
4668 << ", ExtraBaseRegUpdate=" << ExtraBaseRegUpdate
4669 << ", FrameRegUpdate=" << FrameRegUpdate
4670 << ", FrameRegOffset.getFixed()="
4671 << FrameRegOffset.getFixed() << "\n");
4672 if (LoopSize < Size) {
4673 assert(FrameRegUpdate);
4674 assert(Size - LoopSize == 16);
4675 // Tag 16 more bytes at BaseReg and update BaseReg.
4676 int64_t STGOffset = ExtraBaseRegUpdate + 16;
4677 assert(STGOffset % 16 == 0 && STGOffset >= -4096 && STGOffset <= 4080 &&
4678 "STG immediate out of range");
4679 BuildMI(*MBB, InsertI, DL,
4680 TII->get(ZeroData ? AArch64::STZGPostIndex : AArch64::STGPostIndex))
4681 .addDef(BaseReg)
4682 .addReg(BaseReg)
4683 .addReg(BaseReg)
4684 .addImm(STGOffset / 16)
4685 .setMemRefs(CombinedMemRefs)
4686 .setMIFlags(FrameRegUpdateFlags);
4687 } else if (ExtraBaseRegUpdate) {
4688 // Update BaseReg.
4689 int64_t AddSubOffset = std::abs(ExtraBaseRegUpdate);
4690 assert(AddSubOffset <= 4095 && "ADD/SUB immediate out of range");
4691 BuildMI(
4692 *MBB, InsertI, DL,
4693 TII->get(ExtraBaseRegUpdate > 0 ? AArch64::ADDXri : AArch64::SUBXri))
4694 .addDef(BaseReg)
4695 .addReg(BaseReg)
4696 .addImm(AddSubOffset)
4697 .addImm(0)
4698 .setMIFlags(FrameRegUpdateFlags);
4699 }
4700}
4701
4702// Check if *II is a register update that can be merged into STGloop that ends
4703// at (Reg + Size). RemainingOffset is the required adjustment to Reg after the
4704// end of the loop.
4705bool canMergeRegUpdate(MachineBasicBlock::iterator II, unsigned Reg,
4706 int64_t Size, int64_t *TotalOffset) {
4707 MachineInstr &MI = *II;
4708 if ((MI.getOpcode() == AArch64::ADDXri ||
4709 MI.getOpcode() == AArch64::SUBXri) &&
4710 MI.getOperand(0).getReg() == Reg && MI.getOperand(1).getReg() == Reg) {
4711 unsigned Shift = AArch64_AM::getShiftValue(MI.getOperand(3).getImm());
4712 int64_t Offset = MI.getOperand(2).getImm() << Shift;
4713 if (MI.getOpcode() == AArch64::SUBXri)
4714 Offset = -Offset;
4715 int64_t PostOffset = Offset - Size;
4716 // TagStoreEdit::emitLoop might emit either an ADD/SUB after the loop, or
4717 // an STGPostIndex which does the last 16 bytes of tag write. Which one is
4718 // chosen depends on the alignment of the loop size, but the difference
4719 // between the valid ranges for the two instructions is small, so we
4720 // conservatively assume that it could be either case here.
4721 //
4722 // Max offset of STGPostIndex, minus the 16 byte tag write folded into that
4723 // instruction.
4724 const int64_t kMaxOffset = 4080 - 16;
4725 // Max offset of SUBXri.
4726 const int64_t kMinOffset = -4095;
4727 if (PostOffset <= kMaxOffset && PostOffset >= kMinOffset &&
4728 PostOffset % 16 == 0) {
4729 *TotalOffset = Offset;
4730 return true;
4731 }
4732 }
4733 return false;
4734}
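The range arithmetic above can be exercised in isolation. A sketch with hypothetical names: an SP adjustment of `Offset` bytes immediately after a tagged region of `Size` bytes is foldable when the leftover `PostOffset` fits both candidate encodings and stays granule-aligned:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical standalone version of the encoding check in canMergeRegUpdate:
// Offset is the register adjustment after the loop, Size the tagged region.
bool canFoldSPUpdate(int64_t Offset, int64_t Size) {
  int64_t PostOffset = Offset - Size; // adjustment left over after the loop
  // Max offset of STGPostIndex (4080 bytes), minus the 16-byte tag write
  // folded into that instruction.
  const int64_t kMaxOffset = 4080 - 16;
  // Max (negative) offset reachable with SUBXri.
  const int64_t kMinOffset = -4095;
  return PostOffset <= kMaxOffset && PostOffset >= kMinOffset &&
         PostOffset % 16 == 0;
}
```

For example, an epilogue `ADD sp, sp, #480` after a 480-byte tagged region folds cleanly (`PostOffset == 0`), while a leftover of 4080 bytes or a non-granule-aligned leftover does not.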
4735
4736void mergeMemRefs(const SmallVectorImpl<TagStoreInstr> &TSE,
4737 SmallVectorImpl<MachineMemOperand *> &MemRefs) {
4738 MemRefs.clear();
4739 for (auto &TS : TSE) {
4740 MachineInstr *MI = TS.MI;
4741 // An instruction without memory operands may access anything. Be
4742 // conservative and return an empty list.
4743 if (MI->memoperands_empty()) {
4744 MemRefs.clear();
4745 return;
4746 }
4747 MemRefs.append(MI->memoperands_begin(), MI->memoperands_end());
4748 }
4749}
4750
4751void TagStoreEdit::emitCode(MachineBasicBlock::iterator &InsertI,
4752 const AArch64FrameLowering *TFI,
4753 bool TryMergeSPUpdate) {
4754 if (TagStores.empty())
4755 return;
4756 TagStoreInstr &FirstTagStore = TagStores[0];
4757 TagStoreInstr &LastTagStore = TagStores[TagStores.size() - 1];
4758 Size = LastTagStore.Offset - FirstTagStore.Offset + LastTagStore.Size;
4759 DL = TagStores[0].MI->getDebugLoc();
4760
4761 Register Reg;
4762 FrameRegOffset = TFI->resolveFrameOffsetReference(
4763 *MF, FirstTagStore.Offset, false /*isFixed*/, false /*isSVE*/, Reg,
4764 /*PreferFP=*/false, /*ForSimm=*/true);
4765 FrameReg = Reg;
4766 FrameRegUpdate = std::nullopt;
4767
4768 mergeMemRefs(TagStores, CombinedMemRefs);
4769
4770 LLVM_DEBUG({
4771 dbgs() << "Replacing adjacent STG instructions:\n";
4772 for (const auto &Instr : TagStores) {
4773 dbgs() << " " << *Instr.MI;
4774 }
4775 });
4776
4777 // Size threshold where a loop becomes shorter than a linear sequence of
4778 // tagging instructions.
4779 const int kSetTagLoopThreshold = 176;
4780 if (Size < kSetTagLoopThreshold) {
4781 if (TagStores.size() < 2)
4782 return;
4783 emitUnrolled(InsertI);
4784 } else {
4785 MachineInstr *UpdateInstr = nullptr;
4786 int64_t TotalOffset = 0;
4787 if (TryMergeSPUpdate) {
4788 // See if we can merge base register update into the STGloop.
4789 // This is done in AArch64LoadStoreOptimizer for "normal" stores, but
4790 // STGloop is too unusual for that pass to handle; it also only
4791 // realistically appears in the function epilogue, and it is expanded
4792 // before that pass runs anyway.
4793 if (InsertI != MBB->end() &&
4794 canMergeRegUpdate(InsertI, FrameReg, FrameRegOffset.getFixed() + Size,
4795 &TotalOffset)) {
4796 UpdateInstr = &*InsertI++;
4797 LLVM_DEBUG(dbgs() << "Folding SP update into loop:\n "
4798 << *UpdateInstr);
4799 }
4800 }
4801
4802 if (!UpdateInstr && TagStores.size() < 2)
4803 return;
4804
4805 if (UpdateInstr) {
4806 FrameRegUpdate = TotalOffset;
4807 FrameRegUpdateFlags = UpdateInstr->getFlags();
4808 }
4809 emitLoop(InsertI);
4810 if (UpdateInstr)
4811 UpdateInstr->eraseFromParent();
4812 }
4813
4814 for (auto &TS : TagStores)
4815 TS.MI->eraseFromParent();
4816}
4817
4818bool isMergeableStackTaggingInstruction(MachineInstr &MI, int64_t &Offset,
4819 int64_t &Size, bool &ZeroData) {
4820 MachineFunction &MF = *MI.getParent()->getParent();
4821 const MachineFrameInfo &MFI = MF.getFrameInfo();
4822
4823 unsigned Opcode = MI.getOpcode();
4824 ZeroData = (Opcode == AArch64::STZGloop || Opcode == AArch64::STZGi ||
4825 Opcode == AArch64::STZ2Gi);
4826
4827 if (Opcode == AArch64::STGloop || Opcode == AArch64::STZGloop) {
4828 if (!MI.getOperand(0).isDead() || !MI.getOperand(1).isDead())
4829 return false;
4830 if (!MI.getOperand(2).isImm() || !MI.getOperand(3).isFI())
4831 return false;
4832 Offset = MFI.getObjectOffset(MI.getOperand(3).getIndex());
4833 Size = MI.getOperand(2).getImm();
4834 return true;
4835 }
4836
4837 if (Opcode == AArch64::STGi || Opcode == AArch64::STZGi)
4838 Size = 16;
4839 else if (Opcode == AArch64::ST2Gi || Opcode == AArch64::STZ2Gi)
4840 Size = 32;
4841 else
4842 return false;
4843
4844 if (MI.getOperand(0).getReg() != AArch64::SP || !MI.getOperand(1).isFI())
4845 return false;
4846
4847 Offset = MFI.getObjectOffset(MI.getOperand(1).getIndex()) +
4848 16 * MI.getOperand(2).getImm();
4849 return true;
4850}
4851
4852// Detect a run of memory tagging instructions for adjacent stack frame slots,
4853// and replace them with a shorter instruction sequence:
4854// * replace STG + STG with ST2G
4855// * replace STGloop + STGloop with STGloop
4856// This code needs to run when stack slot offsets are already known, but before
4857// FrameIndex operands in STG instructions are eliminated.
4858 MachineBasicBlock::iterator tryMergeAdjacentSTG(MachineBasicBlock::iterator II,
4859 const AArch64FrameLowering *TFI,
4860 RegScavenger *RS) {
4861 bool FirstZeroData;
4862 int64_t Size, Offset;
4863 MachineInstr &MI = *II;
4864 MachineBasicBlock *MBB = MI.getParent();
4865 MachineBasicBlock::iterator NextI = ++II;
4866 if (&MI == &MBB->instr_back())
4867 return II;
4868 if (!isMergeableStackTaggingInstruction(MI, Offset, Size, FirstZeroData))
4869 return II;
4870
4871 SmallVector<TagStoreInstr, 8> Instrs;
4872 Instrs.emplace_back(&MI, Offset, Size);
4873
4874 constexpr int kScanLimit = 10;
4875 int Count = 0;
4876 for (MachineBasicBlock::iterator E = MBB->end();
4877 NextI != E && Count < kScanLimit; ++NextI) {
4878 MachineInstr &MI = *NextI;
4879 bool ZeroData;
4880 int64_t Size, Offset;
4881 // Collect instructions that update memory tags with a FrameIndex operand
4882 // and (when applicable) constant size, and whose output registers are dead
4883 // (the latter is almost always the case in practice). Since these
4884 // instructions effectively have no inputs or outputs, we are free to skip
4885 // any non-aliasing instructions in between without tracking used registers.
4886 if (isMergeableStackTaggingInstruction(MI, Offset, Size, ZeroData)) {
4887 if (ZeroData != FirstZeroData)
4888 break;
4889 Instrs.emplace_back(&MI, Offset, Size);
4890 continue;
4891 }
4892
4893 // Only count non-transient, non-tagging instructions toward the scan
4894 // limit.
4895 if (!MI.isTransient())
4896 ++Count;
4897
4898 // Just in case, stop before the epilogue code starts.
4899 if (MI.getFlag(MachineInstr::FrameSetup) ||
4900 MI.getFlag(MachineInstr::FrameDestroy))
4901 break;
4902
4903 // Reject anything that may alias the collected instructions.
4904 if (MI.mayLoadOrStore() || MI.hasUnmodeledSideEffects() || MI.isCall())
4905 break;
4906 }
4907
4908 // New code will be inserted after the last tagging instruction we've found.
4909 MachineBasicBlock::iterator InsertI = Instrs.back().MI;
4910
4911 // All the gathered stack tag instructions are merged and placed after the
4912 // last tag store in the list. Before inserting, we must check whether the
4913 // NZCV flag is live at that point; if it is, bail out, since any STG
4914 // loops we emit could clobber it.
4915
4916 // FIXME: This bail-out is conservative: the liveness check is performed
4917 // even when the merged sequence contains no STG loops, in which case it
4918 // is not needed.
4919 LivePhysRegs LiveRegs(*(MBB->getParent()->getSubtarget().getRegisterInfo()));
4920 LiveRegs.addLiveOuts(*MBB);
4921 for (auto I = MBB->rbegin();; ++I) {
4922 MachineInstr &MI = *I;
4923 if (MI == InsertI)
4924 break;
4925 LiveRegs.stepBackward(*I);
4926 }
4927 InsertI++;
4928 if (LiveRegs.contains(AArch64::NZCV))
4929 return InsertI;
4930
4931 llvm::stable_sort(Instrs,
4932 [](const TagStoreInstr &Left, const TagStoreInstr &Right) {
4933 return Left.Offset < Right.Offset;
4934 });
4935
4936 // Make sure that we don't have any overlapping stores.
4937 int64_t CurOffset = Instrs[0].Offset;
4938 for (auto &Instr : Instrs) {
4939 if (CurOffset > Instr.Offset)
4940 return NextI;
4941 CurOffset = Instr.Offset + Instr.Size;
4942 }
4943
4944 // Find contiguous runs of tagged memory and emit shorter instruction
4945 // sequences for them when possible.
4946 TagStoreEdit TSE(MBB, FirstZeroData);
4947 std::optional<int64_t> EndOffset;
4948 for (auto &Instr : Instrs) {
4949 if (EndOffset && *EndOffset != Instr.Offset) {
4950 // Found a gap.
4951 TSE.emitCode(InsertI, TFI, /*TryMergeSPUpdate = */ false);
4952 TSE.clear();
4953 }
4954
4955 TSE.addInstruction(Instr);
4956 EndOffset = Instr.Offset + Instr.Size;
4957 }
4958
4959 const MachineFunction *MF = MBB->getParent();
4960 // Multiple FP/SP updates in a loop cannot be described by CFI instructions.
4961 TSE.emitCode(
4962 InsertI, TFI, /*TryMergeSPUpdate = */
4963 !MF->getInfo<AArch64FunctionInfo>()->needsAsyncDwarfUnwindInfo(*MF));
4964
4965 return InsertI;
4966}
4967} // namespace
4968
4969 static MachineBasicBlock::iterator emitVGSaveRestore(MachineBasicBlock::iterator II,
4970 const AArch64FrameLowering *TFI) {
4971 MachineInstr &MI = *II;
4972 MachineBasicBlock *MBB = MI.getParent();
4973 MachineFunction *MF = MBB->getParent();
4974
4975 if (MI.getOpcode() != AArch64::VGSavePseudo &&
4976 MI.getOpcode() != AArch64::VGRestorePseudo)
4977 return II;
4978
4979 SMEAttrs FuncAttrs(MF->getFunction());
4980 bool LocallyStreaming =
4981 FuncAttrs.hasStreamingBody() && !FuncAttrs.hasStreamingInterface();
4982 const AArch64FunctionInfo *AFI = MF->getInfo<AArch64FunctionInfo>();
4983 const TargetRegisterInfo *TRI = MF->getSubtarget().getRegisterInfo();
4984 const AArch64InstrInfo *TII =
4985 MF->getSubtarget<AArch64Subtarget>().getInstrInfo();
4986
4987 int64_t VGFrameIdx =
4988 LocallyStreaming ? AFI->getStreamingVGIdx() : AFI->getVGIdx();
4989 assert(VGFrameIdx != std::numeric_limits<int>::max() &&
4990 "Expected FrameIdx for VG");
4991
4992 unsigned CFIIndex;
4993 if (MI.getOpcode() == AArch64::VGSavePseudo) {
4994 const MachineFrameInfo &MFI = MF->getFrameInfo();
4995 int64_t Offset =
4996 MFI.getObjectOffset(VGFrameIdx) - TFI->getOffsetOfLocalArea();
4997 CFIIndex = MF->addFrameInst(MCCFIInstruction::createOffset(
4998 nullptr, TRI->getDwarfRegNum(AArch64::VG, true), Offset));
4999 } else
5000 CFIIndex = MF->addFrameInst(MCCFIInstruction::createRestore(
5001 nullptr, TRI->getDwarfRegNum(AArch64::VG, true)));
5002
5003 MachineInstr *UnwindInst = BuildMI(*MBB, II, II->getDebugLoc(),
5004 TII->get(TargetOpcode::CFI_INSTRUCTION))
5005 .addCFIIndex(CFIIndex);
5006
5007 MI.eraseFromParent();
5008 return UnwindInst->getIterator();
5009}
5010
5011 void AArch64FrameLowering::processFunctionBeforeFrameIndicesReplaced(
5012 MachineFunction &MF, RegScavenger *RS = nullptr) const {
5013 for (auto &BB : MF)
5014 for (MachineBasicBlock::iterator II = BB.begin(); II != BB.end();) {
5015 if (requiresSaveVG(MF))
5016 II = emitVGSaveRestore(II, this);
5017 if (StackTaggingMergeSetTag)
5018 II = tryMergeAdjacentSTG(II, this, RS);
5019 }
5020}
5021
5022/// For Win64 AArch64 EH, the offset to the Unwind object is from the SP
5023/// before the update. This is easily retrieved as it is exactly the offset
5024/// that is set in processFunctionBeforeFrameFinalized.
5025 StackOffset AArch64FrameLowering::getFrameIndexReferencePreferSP(
5026 const MachineFunction &MF, int FI, Register &FrameReg,
5027 bool IgnoreSPUpdates) const {
5028 const MachineFrameInfo &MFI = MF.getFrameInfo();
5029 if (IgnoreSPUpdates) {
5030 LLVM_DEBUG(dbgs() << "Offset from the SP for " << FI << " is "
5031 << MFI.getObjectOffset(FI) << "\n");
5032 FrameReg = AArch64::SP;
5033 return StackOffset::getFixed(MFI.getObjectOffset(FI));
5034 }
5035
5036 // Go to common code if we cannot provide sp + offset.
5037 if (MFI.hasVarSizedObjects() ||
5038 MF.getInfo<AArch64FunctionInfo>()->isStackRealigned() ||
5039 MFI.isFrameAddressTaken())
5040 return getFrameIndexReference(MF, FI, FrameReg);
5041
5042 FrameReg = AArch64::SP;
5043 return getStackOffset(MF, MFI.getObjectOffset(FI));
5044}
5045
5046/// The parent frame offset (aka dispFrame) is only used on X86_64 to retrieve
5047/// the parent's frame pointer
5048 unsigned AArch64FrameLowering::getWinEHParentFrameOffset(
5049 const MachineFunction &MF) const {
5050 return 0;
5051}
5052
5053/// Funclets only need to account for space for the callee saved registers,
5054/// as the locals are accounted for in the parent's stack frame.
5055 unsigned AArch64FrameLowering::getWinEHFuncletFrameSize(
5056 const MachineFunction &MF) const {
5057 // This is the size of the pushed CSRs.
5058 unsigned CSSize =
5059 MF.getInfo<AArch64FunctionInfo>()->getCalleeSavedStackSize();
5060 // This is the amount of stack a funclet needs to allocate.
5061 return alignTo(CSSize + MF.getFrameInfo().getMaxCallFrameSize(),
5062 getStackAlign());
5063}
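The funclet-size computation above is an aligned sum. A small sketch with a hypothetical helper name, assuming the AArch64 stack alignment of 16 bytes:

```cpp
#include <cassert>
#include <cstdint>

// Model of getWinEHFuncletFrameSize above: pushed callee-save area plus the
// largest outgoing call frame, rounded up to the stack alignment.
uint64_t funcletFrameSize(uint64_t CSSize, uint64_t MaxCallFrame,
                          uint64_t Align = 16) {
  return (CSSize + MaxCallFrame + Align - 1) / Align * Align;
}
```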
5064
5065namespace {
5066struct FrameObject {
5067 bool IsValid = false;
5068 // Index of the object in MFI.
5069 int ObjectIndex = 0;
5070 // Group ID this object belongs to.
5071 int GroupIndex = -1;
5072 // This object should be placed first (closest to SP).
5073 bool ObjectFirst = false;
5074 // This object's group (which always contains the object with
5075 // ObjectFirst==true) should be placed first.
5076 bool GroupFirst = false;
5077
5078 // Used to distinguish between FP and GPR accesses. The values are decided so
5079 // that they sort FPR < Hazard < GPR and they can be or'd together.
5080 unsigned Accesses = 0;
5081 enum { AccessFPR = 1, AccessHazard = 2, AccessGPR = 4 };
5082};
5083
5084class GroupBuilder {
5085 SmallVector<int, 8> CurrentMembers;
5086 int NextGroupIndex = 0;
5087 std::vector<FrameObject> &Objects;
5088
5089public:
5090 GroupBuilder(std::vector<FrameObject> &Objects) : Objects(Objects) {}
5091 void AddMember(int Index) { CurrentMembers.push_back(Index); }
5092 void EndCurrentGroup() {
5093 if (CurrentMembers.size() > 1) {
5094 // Create a new group with the current member list. This might remove them
5095 // from their pre-existing groups. That's OK, dealing with overlapping
5096 // groups is too hard and unlikely to make a difference.
5097 LLVM_DEBUG(dbgs() << "group:");
5098 for (int Index : CurrentMembers) {
5099 Objects[Index].GroupIndex = NextGroupIndex;
5100 LLVM_DEBUG(dbgs() << " " << Index);
5101 }
5102 LLVM_DEBUG(dbgs() << "\n");
5103 NextGroupIndex++;
5104 }
5105 CurrentMembers.clear();
5106 }
5107};
5108
5109bool FrameObjectCompare(const FrameObject &A, const FrameObject &B) {
5110 // Objects at a lower index are closer to FP; objects at a higher index are
5111 // closer to SP.
5112 //
5113 // For consistency in our comparison, all invalid objects are placed
5114 // at the end. This also allows us to stop walking when we hit the
5115 // first invalid item after it's all sorted.
5116 //
5117 // If we want to include a stack hazard region, order FPR accesses < the
5118 // hazard object < GPRs accesses in order to create a separation between the
5119 // two. For the Accesses field 1 = FPR, 2 = Hazard Object, 4 = GPR.
5120 //
5121 // Otherwise the "first" object goes first (closest to SP), followed by the
5122 // members of the "first" group.
5123 //
5124 // The rest are sorted by the group index to keep the groups together.
5125 // Higher numbered groups are more likely to be around longer (i.e. untagged
5126 // in the function epilogue and not at some earlier point). Place them closer
5127 // to SP.
5128 //
5129 // If all else equal, sort by the object index to keep the objects in the
5130 // original order.
5131 return std::make_tuple(!A.IsValid, A.Accesses, A.ObjectFirst, A.GroupFirst,
5132 A.GroupIndex, A.ObjectIndex) <
5133 std::make_tuple(!B.IsValid, B.Accesses, B.ObjectFirst, B.GroupFirst,
5134 B.GroupIndex, B.ObjectIndex);
5135}
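The comparator above encodes all of those rules as a single lexicographic tuple comparison, which `stable_sort` then applies. A reduced standalone model (hypothetical struct, not the LLVM type) showing the same idea:

```cpp
#include <algorithm>
#include <cassert>
#include <tuple>
#include <vector>

// Reduced model of FrameObjectCompare: invalid objects sort last; then by
// access kind (FPR=1 < Hazard=2 < GPR=4); then the "first" object/group
// flags; then group index; finally the original index as a tiebreak.
struct Obj {
  bool IsValid = true;
  unsigned Accesses = 0;
  bool ObjectFirst = false, GroupFirst = false;
  int GroupIndex = -1, ObjectIndex = 0;
};

bool objLess(const Obj &A, const Obj &B) {
  return std::make_tuple(!A.IsValid, A.Accesses, A.ObjectFirst, A.GroupFirst,
                         A.GroupIndex, A.ObjectIndex) <
         std::make_tuple(!B.IsValid, B.Accesses, B.ObjectFirst, B.GroupFirst,
                         B.GroupIndex, B.ObjectIndex);
}
```

Sorting a mixed set places FPR objects before the hazard slot, the hazard slot before GPR objects, and invalid entries at the very end, which is what lets the caller stop copying at the first invalid item.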
5136} // namespace
5137
5138 void AArch64FrameLowering::orderFrameObjects(
5139 const MachineFunction &MF, SmallVectorImpl<int> &ObjectsToAllocate) const {
5140 if (!OrderFrameObjects || ObjectsToAllocate.empty())
5141 return;
5142
5143 const AArch64FunctionInfo &AFI = *MF.getInfo<AArch64FunctionInfo>();
5144 const MachineFrameInfo &MFI = MF.getFrameInfo();
5145 std::vector<FrameObject> FrameObjects(MFI.getObjectIndexEnd());
5146 for (auto &Obj : ObjectsToAllocate) {
5147 FrameObjects[Obj].IsValid = true;
5148 FrameObjects[Obj].ObjectIndex = Obj;
5149 }
5150
5151 // Identify FPR vs GPR slots for hazards, and stack slots that are tagged at
5152 // the same time.
5153 GroupBuilder GB(FrameObjects);
5154 for (auto &MBB : MF) {
5155 for (auto &MI : MBB) {
5156 if (MI.isDebugInstr())
5157 continue;
5158
5159 if (AFI.hasStackHazardSlotIndex()) {
5160 std::optional<int> FI = getLdStFrameID(MI, MFI);
5161 if (FI && *FI >= 0 && *FI < (int)FrameObjects.size()) {
5162 if (MFI.getStackID(*FI) == TargetStackID::ScalableVector ||
5163 AArch64InstrInfo::isFpOrNEON(MI))
5164 FrameObjects[*FI].Accesses |= FrameObject::AccessFPR;
5165 else
5166 FrameObjects[*FI].Accesses |= FrameObject::AccessGPR;
5167 }
5168 }
5169
5170 int OpIndex;
5171 switch (MI.getOpcode()) {
5172 case AArch64::STGloop:
5173 case AArch64::STZGloop:
5174 OpIndex = 3;
5175 break;
5176 case AArch64::STGi:
5177 case AArch64::STZGi:
5178 case AArch64::ST2Gi:
5179 case AArch64::STZ2Gi:
5180 OpIndex = 1;
5181 break;
5182 default:
5183 OpIndex = -1;
5184 }
5185
5186 int TaggedFI = -1;
5187 if (OpIndex >= 0) {
5188 const MachineOperand &MO = MI.getOperand(OpIndex);
5189 if (MO.isFI()) {
5190 int FI = MO.getIndex();
5191 if (FI >= 0 && FI < MFI.getObjectIndexEnd() &&
5192 FrameObjects[FI].IsValid)
5193 TaggedFI = FI;
5194 }
5195 }
5196
5197 // If this is a stack tagging instruction for a slot that is not part of a
5198 // group yet, either start a new group or add it to the current one.
5199 if (TaggedFI >= 0)
5200 GB.AddMember(TaggedFI);
5201 else
5202 GB.EndCurrentGroup();
5203 }
5204 // Groups should never span multiple basic blocks.
5205 GB.EndCurrentGroup();
5206 }
5207
5208 if (AFI.hasStackHazardSlotIndex()) {
5209 FrameObjects[AFI.getStackHazardSlotIndex()].Accesses =
5210 FrameObject::AccessHazard;
5211 // If a stack object is unknown or both GPR and FPR, sort it into GPR.
5212 for (auto &Obj : FrameObjects)
5213 if (!Obj.Accesses ||
5214 Obj.Accesses == (FrameObject::AccessGPR | FrameObject::AccessFPR))
5215 Obj.Accesses = FrameObject::AccessGPR;
5216 }
5217
5218 // If the function's tagged base pointer is pinned to a stack slot, we want to
5219 // put that slot first when possible. This will likely place it at SP + 0,
5220 // and save one instruction when generating the base pointer because IRG does
5221 // not allow an immediate offset.
5222 std::optional<int> TBPI = AFI.getTaggedBasePointerIndex();
5223 if (TBPI) {
5224 FrameObjects[*TBPI].ObjectFirst = true;
5225 FrameObjects[*TBPI].GroupFirst = true;
5226 int FirstGroupIndex = FrameObjects[*TBPI].GroupIndex;
5227 if (FirstGroupIndex >= 0)
5228 for (FrameObject &Object : FrameObjects)
5229 if (Object.GroupIndex == FirstGroupIndex)
5230 Object.GroupFirst = true;
5231 }
5232
5233 llvm::stable_sort(FrameObjects, FrameObjectCompare);
5234
5235 int i = 0;
5236 for (auto &Obj : FrameObjects) {
5237 // All invalid items are sorted at the end, so it's safe to stop.
5238 if (!Obj.IsValid)
5239 break;
5240 ObjectsToAllocate[i++] = Obj.ObjectIndex;
5241 }
5242
5243 LLVM_DEBUG({
5244 dbgs() << "Final frame order:\n";
5245 for (auto &Obj : FrameObjects) {
5246 if (!Obj.IsValid)
5247 break;
5248 dbgs() << " " << Obj.ObjectIndex << ": group " << Obj.GroupIndex;
5249 if (Obj.ObjectFirst)
5250 dbgs() << ", first";
5251 if (Obj.GroupFirst)
5252 dbgs() << ", group-first";
5253 dbgs() << "\n";
5254 }
5255 });
5256}
5257
5258/// Emit a loop to decrement SP until it is equal to TargetReg, with probes at
5259/// least every ProbeSize bytes. Returns an iterator of the first instruction
5260/// after the loop. The difference between SP and TargetReg must be an exact
5261/// multiple of ProbeSize.
5263AArch64FrameLowering::inlineStackProbeLoopExactMultiple(
5264 MachineBasicBlock::iterator MBBI, int64_t ProbeSize,
5265 Register TargetReg) const {
5266 MachineBasicBlock &MBB = *MBBI->getParent();
5267 MachineFunction &MF = *MBB.getParent();
5268 const AArch64InstrInfo *TII =
5269 MF.getSubtarget<AArch64Subtarget>().getInstrInfo();
5270 DebugLoc DL = MBB.findDebugLoc(MBBI);
5271
5272 MachineFunction::iterator MBBInsertPoint = std::next(MBB.getIterator());
5273 MachineBasicBlock *LoopMBB = MF.CreateMachineBasicBlock(MBB.getBasicBlock());
5274 MF.insert(MBBInsertPoint, LoopMBB);
5275 MachineBasicBlock *ExitMBB = MF.CreateMachineBasicBlock(MBB.getBasicBlock());
5276 MF.insert(MBBInsertPoint, ExitMBB);
5277
5278 // SUB SP, SP, #ProbeSize (or equivalent if ProbeSize is not encodable
5279 // in SUB).
5280 emitFrameOffset(*LoopMBB, LoopMBB->end(), DL, AArch64::SP, AArch64::SP,
5281 StackOffset::getFixed(-ProbeSize), TII,
5282 MachineInstr::FrameSetup);
5283 // STR XZR, [SP]
5284 BuildMI(*LoopMBB, LoopMBB->end(), DL, TII->get(AArch64::STRXui))
5285 .addReg(AArch64::XZR)
5286 .addReg(AArch64::SP)
5287 .addImm(0)
5288 .setMIFlags(MachineInstr::FrameSetup);
5289 // CMP SP, TargetReg
5290 BuildMI(*LoopMBB, LoopMBB->end(), DL, TII->get(AArch64::SUBSXrx64),
5291 AArch64::XZR)
5292 .addReg(AArch64::SP)
5293 .addReg(TargetReg)
5294 .addImm(AArch64_AM::getArithExtendImm(AArch64_AM::UXTX, 0))
5295 .setMIFlags(MachineInstr::FrameSetup);
5296 // B.CC Loop
5297 BuildMI(*LoopMBB, LoopMBB->end(), DL, TII->get(AArch64::Bcc))
5298 .addImm(AArch64CC::NE)
5299 .addMBB(LoopMBB)
5300 .setMIFlags(MachineInstr::FrameSetup);
5301
5302 LoopMBB->addSuccessor(ExitMBB);
5303 LoopMBB->addSuccessor(LoopMBB);
5304 // Synthesize the exit MBB.
5305 ExitMBB->splice(ExitMBB->end(), &MBB, MBBI, MBB.end());
5306 ExitMBB->transferSuccessorsAndUpdatePHIs(&MBB);
5307 MBB.addSuccessor(LoopMBB);
5308 // Update liveins.
5309 fullyRecomputeLiveIns({ExitMBB, LoopMBB});
5310
5311 return ExitMBB->begin();
5312}
5313
5314void AArch64FrameLowering::inlineStackProbeFixed(
5315 MachineBasicBlock::iterator MBBI, Register ScratchReg, int64_t FrameSize,
5316 StackOffset CFAOffset) const {
5317 MachineBasicBlock *MBB = MBBI->getParent();
5318 MachineFunction &MF = *MBB->getParent();
5319 const AArch64InstrInfo *TII =
5320 MF.getSubtarget<AArch64Subtarget>().getInstrInfo();
5321 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
5322 bool EmitAsyncCFI = AFI->needsAsyncDwarfUnwindInfo(MF);
5323 bool HasFP = hasFP(MF);
5324
5325 DebugLoc DL;
5326 int64_t ProbeSize = MF.getInfo<AArch64FunctionInfo>()->getStackProbeSize();
5327 int64_t NumBlocks = FrameSize / ProbeSize;
5328 int64_t ResidualSize = FrameSize % ProbeSize;
5329
5330 LLVM_DEBUG(dbgs() << "Stack probing: total " << FrameSize << " bytes, "
5331 << NumBlocks << " blocks of " << ProbeSize
5332 << " bytes, plus " << ResidualSize << " bytes\n");
5333
5334 // Decrement SP by NumBlock * ProbeSize bytes, with either unrolled or
5335 // ordinary loop.
5336 if (NumBlocks <= AArch64::StackProbeMaxLoopUnroll) {
5337 for (int i = 0; i < NumBlocks; ++i) {
5338 // SUB SP, SP, #ProbeSize (or equivalent if ProbeSize is not
5339 // encodable in a SUB).
5340 emitFrameOffset(*MBB, MBBI, DL, AArch64::SP, AArch64::SP,
5341 StackOffset::getFixed(-ProbeSize), TII,
5342 MachineInstr::FrameSetup, false, false, nullptr,
5343 EmitAsyncCFI && !HasFP, CFAOffset);
5344 CFAOffset += StackOffset::getFixed(ProbeSize);
5345 // STR XZR, [SP]
5346 BuildMI(*MBB, MBBI, DL, TII->get(AArch64::STRXui))
5347 .addReg(AArch64::XZR)
5348 .addReg(AArch64::SP)
5349 .addImm(0)
5350 .setMIFlags(MachineInstr::FrameSetup);
5351 }
5352 } else if (NumBlocks != 0) {
5353 // SUB ScratchReg, SP, #FrameSize (or equivalent if FrameSize is not
5354 // encodable in ADD). ScratchReg may temporarily become the CFA register.
5355 emitFrameOffset(*MBB, MBBI, DL, ScratchReg, AArch64::SP,
5356 StackOffset::getFixed(-ProbeSize * NumBlocks), TII,
5357 MachineInstr::FrameSetup, false, false, nullptr,
5358 EmitAsyncCFI && !HasFP, CFAOffset);
5359 CFAOffset += StackOffset::getFixed(ProbeSize * NumBlocks);
5360 MBBI = inlineStackProbeLoopExactMultiple(MBBI, ProbeSize, ScratchReg);
5361 MBB = MBBI->getParent();
5362 if (EmitAsyncCFI && !HasFP) {
5363 // Set the CFA register back to SP.
5364 const AArch64RegisterInfo &RegInfo =
5365 *MF.getSubtarget<AArch64Subtarget>().getRegisterInfo();
5366 unsigned Reg = RegInfo.getDwarfRegNum(AArch64::SP, true);
5367 unsigned CFIIndex =
5368 MF.addFrameInst(MCCFIInstruction::createDefCfaRegister(nullptr, Reg));
5369 BuildMI(*MBB, MBBI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
5370 .addCFIIndex(CFIIndex)
5371 .setMIFlags(MachineInstr::FrameSetup);
5372 }
5373 }
5374
5375 if (ResidualSize != 0) {
5376 // SUB SP, SP, #ResidualSize (or equivalent if ResidualSize is not encodable
5377 // in SUB).
5378 emitFrameOffset(*MBB, MBBI, DL, AArch64::SP, AArch64::SP,
5379 StackOffset::getFixed(-ResidualSize), TII,
5380 MachineInstr::FrameSetup, false, false, nullptr,
5381 EmitAsyncCFI && !HasFP, CFAOffset);
5382 if (ResidualSize > AArch64::StackProbeMaxUnprobedStack) {
5383 // STR XZR, [SP]
5384 BuildMI(*MBB, MBBI, DL, TII->get(AArch64::STRXui))
5385 .addReg(AArch64::XZR)
5386 .addReg(AArch64::SP)
5387 .addImm(0)
5388 .setMIFlags(MachineInstr::FrameSetup);
5389 }
5390 }
5391}
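The probe plan above splits `FrameSize` into whole `ProbeSize` blocks plus a residual, and the residual only needs its own probe when it exceeds the largest unprobed tail the target tolerates (`StackProbeMaxUnprobedStack`, 1024 bytes on AArch64). A sketch with hypothetical names:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the decomposition in inlineStackProbeFixed above.
struct ProbePlan {
  int64_t NumBlocks;    // whole ProbeSize decrements, each followed by a probe
  int64_t ResidualSize; // final decrement, always < ProbeSize
  bool ProbeResidual;   // residual needs its own STR XZR, [SP]
};

ProbePlan planProbes(int64_t FrameSize, int64_t ProbeSize,
                     int64_t MaxUnprobed = 1024) {
  ProbePlan P;
  P.NumBlocks = FrameSize / ProbeSize;
  P.ResidualSize = FrameSize % ProbeSize;
  P.ProbeResidual = P.ResidualSize > MaxUnprobed;
  return P;
}
```

A 10000-byte frame with 4096-byte probes becomes two probed blocks plus a 1808-byte residual, which exceeds the 1024-byte unprobed limit and therefore gets the extra store.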
5392
5393void AArch64FrameLowering::inlineStackProbe(MachineFunction &MF,
5394 MachineBasicBlock &MBB) const {
5395 // Get the instructions that need to be replaced. We emit at most two of
5396 // these. Remember them in order to avoid complications coming from the need
5397 // to traverse the block while potentially creating more blocks.
5398 SmallVector<MachineInstr *, 4> ToReplace;
5399 for (MachineInstr &MI : MBB)
5400 if (MI.getOpcode() == AArch64::PROBED_STACKALLOC ||
5401 MI.getOpcode() == AArch64::PROBED_STACKALLOC_VAR)
5402 ToReplace.push_back(&MI);
5403
5404 for (MachineInstr *MI : ToReplace) {
5405 if (MI->getOpcode() == AArch64::PROBED_STACKALLOC) {
5406 Register ScratchReg = MI->getOperand(0).getReg();
5407 int64_t FrameSize = MI->getOperand(1).getImm();
5408 StackOffset CFAOffset = StackOffset::get(MI->getOperand(2).getImm(),
5409 MI->getOperand(3).getImm());
5410 inlineStackProbeFixed(MI->getIterator(), ScratchReg, FrameSize,
5411 CFAOffset);
5412 } else {
5413 assert(MI->getOpcode() == AArch64::PROBED_STACKALLOC_VAR &&
5414 "Stack probe pseudo-instruction expected");
5415 const AArch64InstrInfo *TII =
5416 MI->getMF()->getSubtarget<AArch64Subtarget>().getInstrInfo();
5417 Register TargetReg = MI->getOperand(0).getReg();
5418 (void)TII->probedStackAlloc(MI->getIterator(), TargetReg, true);
5419 }
5420 MI->eraseFromParent();
5421 }
5422}
5423
5424 struct StackAccess {
5425 enum AccessType {
5426 NotAccessed = 0, // Stack object not accessed by load/store instructions.
5427 GPR = 1 << 0, // A general purpose register.
5428 PPR = 1 << 1, // A predicate register.
5429 FPR = 1 << 2, // A floating point/Neon/SVE register.
5430 };
5431
5432 int Idx;
5433 StackOffset Offset;
5434 int64_t Size;
5435 unsigned AccessTypes;
5436
5437 StackAccess() : Idx(0), Offset(), Size(0), AccessTypes(NotAccessed) {}

  bool operator<(const StackAccess &Rhs) const {
    return std::make_tuple(start(), Idx) <
           std::make_tuple(Rhs.start(), Rhs.Idx);
  }

  bool isCPU() const {
    // Predicate register load and store instructions execute on the CPU.
    return AccessTypes & (AccessType::GPR | AccessType::PPR);
  }
  bool isSME() const { return AccessTypes & AccessType::FPR; }
  bool isMixed() const { return isCPU() && isSME(); }

  int64_t start() const { return Offset.getFixed() + Offset.getScalable(); }
  int64_t end() const { return start() + Size; }

  std::string getTypeString() const {
    switch (AccessTypes) {
    case AccessType::FPR:
      return "FPR";
    case AccessType::PPR:
      return "PPR";
    case AccessType::GPR:
      return "GPR";
    case AccessType::NotAccessed:
      return "NA";
    default:
      return "Mixed";
    }
  }

  void print(raw_ostream &OS) const {
    OS << getTypeString() << " stack object at [SP"
       << (Offset.getFixed() < 0 ? "" : "+") << Offset.getFixed();
    if (Offset.getScalable())
      OS << (Offset.getScalable() < 0 ? "" : "+") << Offset.getScalable()
         << " * vscale";
    OS << "]";
  }
};

static inline raw_ostream &operator<<(raw_ostream &OS, const StackAccess &SA) {
  SA.print(OS);
  return OS;
}

void AArch64FrameLowering::emitRemarks(
    const MachineFunction &MF, MachineOptimizationRemarkEmitter *ORE) const {

  SMEAttrs Attrs = MF.getInfo<AArch64FunctionInfo>()->getSMEFnAttrs();
  if (Attrs.hasNonStreamingInterfaceAndBody())
    return;
  unsigned StackHazardSize = getStackHazardSize(MF);
  const uint64_t HazardSize =
      (StackHazardSize) ? StackHazardSize : StackHazardRemarkSize;

  if (HazardSize == 0)
    return;

  const MachineFrameInfo &MFI = MF.getFrameInfo();
  // Bail if function has no stack objects.
  if (!MFI.hasStackObjects())
    return;

  std::vector<StackAccess> StackAccesses(MFI.getNumObjects());

  size_t NumFPLdSt = 0;
  size_t NumNonFPLdSt = 0;

  // Collect stack accesses via Load/Store instructions.
  for (const MachineBasicBlock &MBB : MF) {
    for (const MachineInstr &MI : MBB) {
      if (!MI.mayLoadOrStore() || MI.getNumMemOperands() < 1)
        continue;
      for (MachineMemOperand *MMO : MI.memoperands()) {
        std::optional<int> FI = getMMOFrameID(MMO, MFI);
        if (FI && !MFI.isDeadObjectIndex(*FI)) {
          int FrameIdx = *FI;

          size_t ArrIdx = FrameIdx + MFI.getNumFixedObjects();
          if (StackAccesses[ArrIdx].AccessTypes == StackAccess::NotAccessed) {
            StackAccesses[ArrIdx].Idx = FrameIdx;
            StackAccesses[ArrIdx].Offset =
                getFrameIndexReferenceFromSP(MF, FrameIdx);
            StackAccesses[ArrIdx].Size = MFI.getObjectSize(FrameIdx);
          }

          unsigned RegTy = StackAccess::AccessType::GPR;
          if (MFI.getStackID(FrameIdx) == TargetStackID::ScalableVector) {
            // SPILL_PPR_TO_ZPR_SLOT_PSEUDO and FILL_PPR_FROM_ZPR_SLOT_PSEUDO
            // spill/fill the predicate as a data vector (so are an FPR
            // access).
            if (MI.getOpcode() != AArch64::SPILL_PPR_TO_ZPR_SLOT_PSEUDO &&
                MI.getOpcode() != AArch64::FILL_PPR_FROM_ZPR_SLOT_PSEUDO &&
                AArch64::PPRRegClass.contains(MI.getOperand(0).getReg())) {
              RegTy = StackAccess::PPR;
            } else
              RegTy = StackAccess::FPR;
          } else if (AArch64InstrInfo::isFpOrNEON(MI)) {
            RegTy = StackAccess::FPR;
          }

          StackAccesses[ArrIdx].AccessTypes |= RegTy;

          if (RegTy == StackAccess::FPR)
            ++NumFPLdSt;
          else
            ++NumNonFPLdSt;
        }
      }
    }
  }

  if (NumFPLdSt == 0 || NumNonFPLdSt == 0)
    return;

  llvm::sort(StackAccesses);
  StackAccesses.erase(llvm::remove_if(StackAccesses,
                                      [](const StackAccess &S) {
                                        return S.AccessTypes ==
                                               StackAccess::NotAccessed;
                                      }),
                      StackAccesses.end());

  SmallVector<const StackAccess *> MixedObjects;
  SmallVector<std::pair<const StackAccess *, const StackAccess *>> HazardPairs;

  if (StackAccesses.front().isMixed())
    MixedObjects.push_back(&StackAccesses.front());

  for (auto It = StackAccesses.begin(), End = std::prev(StackAccesses.end());
       It != End; ++It) {
    const auto &First = *It;
    const auto &Second = *(It + 1);

    if (Second.isMixed())
      MixedObjects.push_back(&Second);

    if ((First.isSME() && Second.isCPU()) ||
        (First.isCPU() && Second.isSME())) {
      uint64_t Distance = static_cast<uint64_t>(Second.start() - First.end());
      if (Distance < HazardSize)
        HazardPairs.emplace_back(&First, &Second);
    }
  }

  auto EmitRemark = [&](llvm::StringRef Str) {
    ORE->emit([&]() {
      auto R = MachineOptimizationRemarkAnalysis(
          "sme", "StackHazard", MF.getFunction().getSubprogram(), &MF.front());
      return R << formatv("stack hazard in '{0}': ", MF.getName()).str() << Str;
    });
  };

  for (const auto &P : HazardPairs)
    EmitRemark(formatv("{0} is too close to {1}", *P.first, *P.second).str());

  for (const auto *Obj : MixedObjects)
    EmitRemark(
        formatv("{0} accessed by both GP and FP instructions", *Obj).str());
}
unsigned const MachineRegisterInfo * MRI
#define Success
for(const MachineOperand &MO :llvm::drop_begin(OldMI.operands(), Desc.getNumOperands()))
static int64_t getArgumentStackToRestore(MachineFunction &MF, MachineBasicBlock &MBB)
Returns how much of the incoming argument stack area (in bytes) we should clean up in an epilogue.
static void emitShadowCallStackEpilogue(const TargetInstrInfo &TII, MachineFunction &MF, MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI, const DebugLoc &DL)
static void getLiveRegsForEntryMBB(LivePhysRegs &LiveRegs, const MachineBasicBlock &MBB)
static void emitCalleeSavedRestores(MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI, bool SVE)
static void computeCalleeSaveRegisterPairs(MachineFunction &MF, ArrayRef< CalleeSavedInfo > CSI, const TargetRegisterInfo *TRI, SmallVectorImpl< RegPairInfo > &RegPairs, bool NeedsFrameRecord)
static Register tryScavengeRegister(LiveRegUnits const &UsedRegs, BitVector const &ScavengeableRegs, Register PreferredReg)
Attempts to scavenge a register from ScavengeableRegs given the used registers in UsedRegs.
static const unsigned DefaultSafeSPDisplacement
This is the biggest offset to the stack pointer we can encode in aarch64 instructions (without using ...
static void emitDefineCFAWithFP(MachineFunction &MF, MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI, const DebugLoc &DL, unsigned FixedObject)
static bool needsWinCFI(const MachineFunction &MF)
static bool isInPrologueOrEpilogue(const MachineInstr &MI)
static bool expandFillPPRFromZPRSlotPseudo(MachineBasicBlock &MBB, MachineInstr &MI, const TargetRegisterInfo &TRI, LiveRegUnits const &UsedRegs, ScavengeableRegs const &SR, MachineInstr *&LastPTrue, EmergencyStackSlots &SpillSlots)
Expands:
static void insertCFISameValue(const MCInstrDesc &Desc, MachineFunction &MF, MachineBasicBlock &MBB, MachineBasicBlock::iterator InsertPt, unsigned DwarfReg)
static cl::opt< bool > StackTaggingMergeSetTag("stack-tagging-merge-settag", cl::desc("merge settag instruction in function epilog"), cl::init(true), cl::Hidden)
bool requiresGetVGCall(MachineFunction &MF)
bool enableMultiVectorSpillFill(const AArch64Subtarget &Subtarget, MachineFunction &MF)
bool isVGInstruction(MachineBasicBlock::iterator MBBI)
static std::optional< int > getLdStFrameID(const MachineInstr &MI, const MachineFrameInfo &MFI)
static bool produceCompactUnwindFrame(MachineFunction &MF)
static cl::opt< bool > StackHazardInNonStreaming("aarch64-stack-hazard-in-non-streaming", cl::init(false), cl::Hidden)
static int64_t determineSVEStackObjectOffsets(MachineFrameInfo &MFI, int &MinCSFrameIndex, int &MaxCSFrameIndex, bool AssignOffsets)
static cl::opt< bool > OrderFrameObjects("aarch64-order-frame-objects", cl::desc("sort stack allocations"), cl::init(true), cl::Hidden)
static bool windowsRequiresStackProbe(MachineFunction &MF, uint64_t StackSizeInBytes)
static void fixupCalleeSaveRestoreStackOffset(MachineInstr &MI, uint64_t LocalStackSize, bool NeedsWinCFI, bool *HasWinCFI)
static bool invalidateWindowsRegisterPairing(unsigned Reg1, unsigned Reg2, bool NeedsWinCFI, bool IsFirst, const TargetRegisterInfo *TRI)
static MachineBasicBlock::iterator convertCalleeSaveRestoreToSPPrePostIncDec(MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI, const DebugLoc &DL, const TargetInstrInfo *TII, int CSStackSizeInc, bool NeedsWinCFI, bool *HasWinCFI, bool EmitCFI, MachineInstr::MIFlag FrameFlag=MachineInstr::FrameSetup, int CFAOffset=0)
static void fixupSEHOpcode(MachineBasicBlock::iterator MBBI, unsigned LocalStackSize)
static StackOffset getSVEStackSize(const MachineFunction &MF)
Returns the size of the entire SVE stackframe (calleesaves + spills).
static cl::opt< bool > DisableMultiVectorSpillFill("aarch64-disable-multivector-spill-fill", cl::desc("Disable use of LD/ST pairs for SME2 or SVE2p1"), cl::init(false), cl::Hidden)
static cl::opt< bool > EnableRedZone("aarch64-redzone", cl::desc("enable use of redzone on AArch64"), cl::init(false), cl::Hidden)
static MachineBasicBlock::iterator InsertSEH(MachineBasicBlock::iterator MBBI, const TargetInstrInfo &TII, MachineInstr::MIFlag Flag)
static Register findScratchNonCalleeSaveRegister(MachineBasicBlock *MBB)
static void getLivePhysRegsUpTo(MachineInstr &MI, const TargetRegisterInfo &TRI, LivePhysRegs &LiveRegs)
Collect live registers from the end of MI's parent up to (including) MI in LiveRegs.
cl::opt< bool > EnableHomogeneousPrologEpilog("homogeneous-prolog-epilog", cl::Hidden, cl::desc("Emit homogeneous prologue and epilogue for the size " "optimization (default = off)"))
static bool expandSMEPPRToZPRSpillPseudos(MachineBasicBlock &MBB, const TargetRegisterInfo &TRI, ScavengeableRegs const &SR, EmergencyStackSlots &SpillSlots)
Expands all FILL_PPR_FROM_ZPR_SLOT_PSEUDO and SPILL_PPR_TO_ZPR_SLOT_PSEUDO operations within the Mach...
MachineBasicBlock::iterator emitVGSaveRestore(MachineBasicBlock::iterator II, const AArch64FrameLowering *TFI)
static bool IsSVECalleeSave(MachineBasicBlock::iterator I)
static bool invalidateRegisterPairing(unsigned Reg1, unsigned Reg2, bool UsesWinAAPCS, bool NeedsWinCFI, bool NeedsFrameRecord, bool IsFirst, const TargetRegisterInfo *TRI)
Returns true if Reg1 and Reg2 cannot be paired using a ldp/stp instruction.
unsigned findFreePredicateReg(BitVector &SavedRegs)
static unsigned getPrologueDeath(MachineFunction &MF, unsigned Reg)
static void expandSpillPPRToZPRSlotPseudo(MachineBasicBlock &MBB, MachineInstr &MI, const TargetRegisterInfo &TRI, LiveRegUnits const &UsedRegs, ScavengeableRegs const &SR, EmergencyStackSlots &SpillSlots)
Expands:
static StackOffset getFPOffset(const MachineFunction &MF, int64_t ObjectOffset)
static bool isTargetWindows(const MachineFunction &MF)
static StackOffset getStackOffset(const MachineFunction &MF, int64_t ObjectOffset)
static int64_t upperBound(StackOffset Size)
static unsigned estimateRSStackSizeLimit(MachineFunction &MF)
Look at each instruction that references stack frames and return the stack size limit beyond which so...
static bool getSVECalleeSaveSlotRange(const MachineFrameInfo &MFI, int &Min, int &Max)
returns true if there are any SVE callee saves.
static cl::opt< unsigned > StackHazardRemarkSize("aarch64-stack-hazard-remark-size", cl::init(0), cl::Hidden)
static MCRegister getRegisterOrZero(MCRegister Reg, bool HasSVE)
static bool isFuncletReturnInstr(const MachineInstr &MI)
static unsigned getStackHazardSize(const MachineFunction &MF)
static void emitShadowCallStackPrologue(const TargetInstrInfo &TII, MachineFunction &MF, MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI, const DebugLoc &DL, bool NeedsWinCFI, bool NeedsUnwindInfo)
static void propagateFrameFlags(MachineInstr &SourceMI, ArrayRef< MachineInstr * > MachineInstrs)
Propagates frame-setup/destroy flags from SourceMI to all instructions in MachineInstrs.
static std::optional< int > getMMOFrameID(MachineMemOperand *MMO, const MachineFrameInfo &MFI)
static bool requiresSaveVG(MachineFunction &MF)
static unsigned getFixedObjectSize(const MachineFunction &MF, const AArch64FunctionInfo *AFI, bool IsWin64, bool IsFunclet)
Returns the size of the fixed object area (allocated next to sp on entry) On Win64 this may include a...
aarch64 promote const
static const int kSetTagLoopThreshold
MachineBasicBlock & MBB
MachineBasicBlock MachineBasicBlock::iterator DebugLoc DL
MachineBasicBlock MachineBasicBlock::iterator MBBI
This file contains the simple types necessary to represent the attributes associated with functions a...
#define CASE(ATTRNAME, AANAME,...)
static GCRegistry::Add< OcamlGC > B("ocaml", "ocaml 3.10-compatible GC")
static GCRegistry::Add< ErlangGC > A("erlang", "erlang-compatible garbage collector")
Analysis containing CSE Info
Definition: CSEInfo.cpp:27
Returns the sub type a function will return at a given Idx Should correspond to the result type of an ExtractValue instruction executed with just that one unsigned Idx
#define LLVM_DEBUG(...)
Definition: Debug.h:106
uint32_t Index
uint64_t Size
bool End
Definition: ELF_riscv.cpp:480
static const HTTPClientCleanup Cleanup
Definition: HTTPClient.cpp:42
const HexagonInstrInfo * TII
IRTranslator LLVM IR MI
static std::string getTypeString(Type *T)
Definition: LLParser.cpp:71
This file implements the LivePhysRegs utility for tracking liveness of physical registers.
#define F(x, y, z)
Definition: MD5.cpp:55
#define I(x, y, z)
Definition: MD5.cpp:58
unsigned const TargetRegisterInfo * TRI
static unsigned getReg(const MCDisassembler *D, unsigned RC, unsigned RegNo)
uint64_t IntrinsicInst * II
#define P(N)
static const MCPhysReg FPR[]
FPR - The set of FP registers that should be allocated for arguments on Darwin and AIX.
if(PassOpts->AAPipeline)
This file declares the machine register scavenger class.
assert(ImpDefSCC.getReg()==AMDGPU::SCC &&ImpDefSCC.isDef())
unsigned OpIndex
raw_pwrite_stream & OS
static bool contains(SmallPtrSetImpl< ConstantExpr * > &Cache, ConstantExpr *Expr, Constant *C)
Definition: Value.cpp:469
This file defines the make_scope_exit function, which executes user-defined cleanup logic at scope ex...
This file defines the SmallVector class.
This file defines the 'Statistic' class, which is designed to be an easy way to expose various metric...
#define STATISTIC(VARNAME, DESC)
Definition: Statistic.h:166
static const unsigned FramePtr
void processFunctionBeforeFrameIndicesReplaced(MachineFunction &MF, RegScavenger *RS) const override
processFunctionBeforeFrameIndicesReplaced - This method is called immediately before MO_FrameIndex op...
MachineBasicBlock::iterator eliminateCallFramePseudoInstr(MachineFunction &MF, MachineBasicBlock &MBB, MachineBasicBlock::iterator I) const override
This method is called during prolog/epilog code insertion to eliminate call frame setup and destroy p...
bool canUseAsPrologue(const MachineBasicBlock &MBB) const override
Check whether or not the given MBB can be used as a prologue for the target.
bool enableStackSlotScavenging(const MachineFunction &MF) const override
Returns true if the stack slot holes in the fixed and callee-save stack area should be used when allo...
bool spillCalleeSavedRegisters(MachineBasicBlock &MBB, MachineBasicBlock::iterator MI, ArrayRef< CalleeSavedInfo > CSI, const TargetRegisterInfo *TRI) const override
spillCalleeSavedRegisters - Issues instruction(s) to spill all callee saved registers and returns tru...
bool restoreCalleeSavedRegisters(MachineBasicBlock &MBB, MachineBasicBlock::iterator MI, MutableArrayRef< CalleeSavedInfo > CSI, const TargetRegisterInfo *TRI) const override
restoreCalleeSavedRegisters - Issues instruction(s) to restore all callee saved registers and returns...
bool enableFullCFIFixup(const MachineFunction &MF) const override
enableFullCFIFixup - Returns true if we may need to fix the unwind information such that it is accura...
StackOffset getFrameIndexReferenceFromSP(const MachineFunction &MF, int FI) const override
getFrameIndexReferenceFromSP - This method returns the offset from the stack pointer to the slot of t...
bool enableCFIFixup(const MachineFunction &MF) const override
Returns true if we may need to fix the unwind information for the function.
StackOffset getNonLocalFrameIndexReference(const MachineFunction &MF, int FI) const override
getNonLocalFrameIndexReference - This method returns the offset used to reference a frame index locat...
TargetStackID::Value getStackIDForScalableVectors() const override
Returns the StackID that scalable vectors should be associated with.
bool hasFPImpl(const MachineFunction &MF) const override
hasFPImpl - Return true if the specified function should have a dedicated frame pointer register.
void emitPrologue(MachineFunction &MF, MachineBasicBlock &MBB) const override
emitProlog/emitEpilog - These methods insert prolog and epilog code into the function.
void resetCFIToInitialState(MachineBasicBlock &MBB) const override
Emit CFI instructions that recreate the state of the unwind information upon function entry.
bool hasReservedCallFrame(const MachineFunction &MF) const override
hasReservedCallFrame - Under normal circumstances, when a frame pointer is not required,...
bool canUseRedZone(const MachineFunction &MF) const
Can this function use the red zone for local allocations.
void processFunctionBeforeFrameFinalized(MachineFunction &MF, RegScavenger *RS) const override
processFunctionBeforeFrameFinalized - This method is called immediately before the specified function...
int getSEHFrameIndexOffset(const MachineFunction &MF, int FI) const
unsigned getWinEHFuncletFrameSize(const MachineFunction &MF) const
Funclets only need to account for space for the callee saved registers, as the locals are accounted f...
void orderFrameObjects(const MachineFunction &MF, SmallVectorImpl< int > &ObjectsToAllocate) const override
Order the symbols in the local stack frame.
void emitEpilogue(MachineFunction &MF, MachineBasicBlock &MBB) const override
void determineCalleeSaves(MachineFunction &MF, BitVector &SavedRegs, RegScavenger *RS) const override
This method determines which of the registers reported by TargetRegisterInfo::getCalleeSavedRegs() sh...
StackOffset getFrameIndexReference(const MachineFunction &MF, int FI, Register &FrameReg) const override
getFrameIndexReference - Provide a base+offset reference to an FI slot for debug info.
StackOffset resolveFrameOffsetReference(const MachineFunction &MF, int64_t ObjectOffset, bool isFixed, bool isSVE, Register &FrameReg, bool PreferFP, bool ForSimm) const
bool assignCalleeSavedSpillSlots(MachineFunction &MF, const TargetRegisterInfo *TRI, std::vector< CalleeSavedInfo > &CSI, unsigned &MinCSFrameIndex, unsigned &MaxCSFrameIndex) const override
assignCalleeSavedSpillSlots - Allows target to override spill slot assignment logic.
StackOffset getFrameIndexReferencePreferSP(const MachineFunction &MF, int FI, Register &FrameReg, bool IgnoreSPUpdates) const override
For Win64 AArch64 EH, the offset to the Unwind object is from the SP before the update.
StackOffset resolveFrameIndexReference(const MachineFunction &MF, int FI, Register &FrameReg, bool PreferFP, bool ForSimm) const
unsigned getWinEHParentFrameOffset(const MachineFunction &MF) const override
The parent frame offset (aka dispFrame) is only used on X86_64 to retrieve the parent's frame pointer...
AArch64FunctionInfo - This class is derived from MachineFunctionInfo and contains private AArch64-spe...
bool needsShadowCallStackPrologueEpilogue(MachineFunction &MF) const
unsigned getCalleeSavedStackSize(const MachineFrameInfo &MFI) const
void setCalleeSaveBaseToFrameRecordOffset(int Offset)
bool shouldSignReturnAddress(const MachineFunction &MF) const
void setPredicateRegForFillSpill(unsigned Reg)
void setStreamingVGIdx(unsigned FrameIdx)
std::optional< int > getTaggedBasePointerIndex() const
bool needsDwarfUnwindInfo(const MachineFunction &MF) const
void setTaggedBasePointerOffset(unsigned Offset)
bool needsAsyncDwarfUnwindInfo(const MachineFunction &MF) const
void setMinMaxSVECSFrameIndex(int Min, int Max)
static bool isTailCallReturnInst(const MachineInstr &MI)
Returns true if MI is one of the TCRETURN* instructions.
static bool isSEHInstruction(const MachineInstr &MI)
Return true if the instructions is a SEH instruciton used for unwinding on Windows.
static bool isFpOrNEON(Register Reg)
Returns whether the physical register is FP or NEON.
bool isReservedReg(const MachineFunction &MF, MCRegister Reg) const
bool hasBasePointer(const MachineFunction &MF) const
bool cannotEliminateFrame(const MachineFunction &MF) const
const AArch64RegisterInfo * getRegisterInfo() const override
bool isNeonAvailable() const
Returns true if the target has NEON and the function at runtime is known to have NEON enabled (e....
const AArch64InstrInfo * getInstrInfo() const override
const AArch64TargetLowering * getTargetLowering() const override
const Triple & getTargetTriple() const
const char * getChkStkName() const
bool isSVEorStreamingSVEAvailable() const
Returns true if the target has access to either the full range of SVE instructions,...
bool isStreaming() const
Returns true if the function has a streaming body.
bool isCallingConvWin64(CallingConv::ID CC, bool IsVarArg) const
bool swiftAsyncContextIsDynamicallySet() const
Return whether FrameLowering should always set the "extended frame present" bit in FP,...
bool hasInlineStackProbe(const MachineFunction &MF) const override
True if stack clash protection is enabled for this functions.
unsigned getRedZoneSize(const Function &F) const
bool supportSwiftError() const override
Return true if the target supports swifterror attribute.
ArrayRef - Represent a constant reference to an array (0 or more elements consecutively in memory),...
Definition: ArrayRef.h:41
size_t size() const
size - Get the array size.
Definition: ArrayRef.h:168
bool empty() const
empty - Check if the array is empty.
Definition: ArrayRef.h:163
bool hasAttrSomewhere(Attribute::AttrKind Kind, unsigned *Index=nullptr) const
Return true if the specified attribute is set for at least one parameter or for the return value.
bool test(unsigned Idx) const
Definition: BitVector.h:461
BitVector & reset()
Definition: BitVector.h:392
size_type count() const
count - Returns the number of bits which are set.
Definition: BitVector.h:162
BitVector & set()
Definition: BitVector.h:351
iterator_range< const_set_bits_iterator > set_bits() const
Definition: BitVector.h:140
The CalleeSavedInfo class tracks the information need to locate where a callee saved register is in t...
A debug info location.
Definition: DebugLoc.h:33
bool hasOptSize() const
Optimize this function for size (-Os) or minimum size (-Oz).
Definition: Function.h:713
bool hasMinSize() const
Optimize this function for minimum size (-Oz).
Definition: Function.h:710
CallingConv::ID getCallingConv() const
getCallingConv()/setCallingConv(CC) - These method get and set the calling convention of this functio...
Definition: Function.h:277
AttributeList getAttributes() const
Return the attribute list for this Function.
Definition: Function.h:359
bool isVarArg() const
isVarArg - Return true if this function takes a variable number of arguments.
Definition: Function.h:234
bool hasFnAttribute(Attribute::AttrKind Kind) const
Return true if the function has the attribute.
Definition: Function.cpp:731
void storeRegToStackSlot(MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI, Register SrcReg, bool isKill, int FrameIndex, const TargetRegisterClass *RC, const TargetRegisterInfo *TRI, Register VReg, MachineInstr::MIFlag Flags=MachineInstr::NoFlags) const override
Store the specified register of the given register class to the specified stack frame index.
void copyPhysReg(MachineBasicBlock &MBB, MachineBasicBlock::iterator I, const DebugLoc &DL, MCRegister DestReg, MCRegister SrcReg, bool KillSrc, bool RenamableDest=false, bool RenamableSrc=false) const override
Emit instructions to copy a pair of physical registers.
void loadRegFromStackSlot(MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI, Register DestReg, int FrameIndex, const TargetRegisterClass *RC, const TargetRegisterInfo *TRI, Register VReg, MachineInstr::MIFlag Flags=MachineInstr::NoFlags) const override
Load the specified register of the given register class from the specified stack frame index.
A set of physical registers with utility functions to track liveness when walking backward/forward th...
Definition: LivePhysRegs.h:52
bool available(const MachineRegisterInfo &MRI, MCPhysReg Reg) const
Returns true if register Reg and no aliasing register is in the set.
void stepBackward(const MachineInstr &MI)
Simulates liveness when stepping backwards over an instruction(bundle).
void removeReg(MCPhysReg Reg)
Removes a physical register, all its sub-registers, and all its super-registers from the set.
Definition: LivePhysRegs.h:92
void addLiveIns(const MachineBasicBlock &MBB)
Adds all live-in registers of basic block MBB.
void addLiveOuts(const MachineBasicBlock &MBB)
Adds all live-out registers of basic block MBB.
void addReg(MCPhysReg Reg)
Adds a physical register and all its sub-registers to the set.
Definition: LivePhysRegs.h:83
A set of register units used to track register liveness.
Definition: LiveRegUnits.h:30
bool available(MCPhysReg Reg) const
Returns true if no part of physical register Reg is live.
Definition: LiveRegUnits.h:116
void stepBackward(const MachineInstr &MI)
Updates liveness when stepping backwards over the instruction MI.
void addLiveOuts(const MachineBasicBlock &MBB)
Adds registers living out of block MBB.
bool usesWindowsCFI() const
Definition: MCAsmInfo.h:661
static MCCFIInstruction createDefCfaRegister(MCSymbol *L, unsigned Register, SMLoc Loc={})
.cfi_def_cfa_register modifies a rule for computing CFA.
Definition: MCDwarf.h:582
static MCCFIInstruction createRestore(MCSymbol *L, unsigned Register, SMLoc Loc={})
.cfi_restore says that the rule for Register is now the same as it was at the beginning of the functi...
Definition: MCDwarf.h:656
static MCCFIInstruction cfiDefCfa(MCSymbol *L, unsigned Register, int64_t Offset, SMLoc Loc={})
.cfi_def_cfa defines a rule for computing CFA as: take address from Register and add Offset to it.
Definition: MCDwarf.h:575
static MCCFIInstruction createOffset(MCSymbol *L, unsigned Register, int64_t Offset, SMLoc Loc={})
.cfi_offset Previous value of Register is saved at offset Offset from CFA.
Definition: MCDwarf.h:617
static MCCFIInstruction createNegateRAStateWithPC(MCSymbol *L, SMLoc Loc={})
.cfi_negate_ra_state_with_pc AArch64 negate RA state with PC.
Definition: MCDwarf.h:648
static MCCFIInstruction createNegateRAState(MCSymbol *L, SMLoc Loc={})
.cfi_negate_ra_state AArch64 negate RA state.
Definition: MCDwarf.h:643
static MCCFIInstruction cfiDefCfaOffset(MCSymbol *L, int64_t Offset, SMLoc Loc={})
.cfi_def_cfa_offset modifies a rule for computing CFA.
Definition: MCDwarf.h:590
static MCCFIInstruction createEscape(MCSymbol *L, StringRef Vals, SMLoc Loc={}, StringRef Comment="")
.cfi_escape Allows the user to add arbitrary bytes to the unwind info.
Definition: MCDwarf.h:687
static MCCFIInstruction createSameValue(MCSymbol *L, unsigned Register, SMLoc Loc={})
.cfi_same_value Current value of Register is the same as in the previous frame.
Definition: MCDwarf.h:670
MCSymbol * createTempSymbol()
Create a temporary symbol with a unique name.
Definition: MCContext.cpp:345
Describe properties that are true of each instruction in the target description file.
Definition: MCInstrDesc.h:198
Wrapper class representing physical registers. Should be passed by value.
Definition: MCRegister.h:33
MCSymbol - Instances of this class represent a symbol name in the MC file, and MCSymbols are created ...
Definition: MCSymbol.h:41
void transferSuccessorsAndUpdatePHIs(MachineBasicBlock *FromMBB)
Transfers all the successors, as in transferSuccessors, and update PHI operands in the successor bloc...
instr_iterator instr_begin()
iterator_range< livein_iterator > liveins() const
const BasicBlock * getBasicBlock() const
Return the LLVM basic block that this instance corresponded to originally.
bool isEHFuncletEntry() const
Returns true if this is the entry block of an EH funclet.
iterator getFirstTerminator()
Returns an iterator to the first terminator instruction of this basic block.
MachineInstr & instr_back()
void addSuccessor(MachineBasicBlock *Succ, BranchProbability Prob=BranchProbability::getUnknown())
Add Succ as a successor of this MachineBasicBlock.
DebugLoc findDebugLoc(instr_iterator MBBI)
Find the next valid DebugLoc starting at MBBI, skipping any debug instructions.
iterator getLastNonDebugInstr(bool SkipPseudoOp=true)
Returns an iterator to the last non-debug instruction in the basic block, or end().
instr_iterator instr_end()
void addLiveIn(MCRegister PhysReg, LaneBitmask LaneMask=LaneBitmask::getAll())
Adds the specified register as a live in.
const MachineFunction * getParent() const
Return the MachineFunction containing this basic block.
instr_iterator erase(instr_iterator I)
Remove an instruction from the instruction list and delete it.
reverse_iterator rbegin()
iterator insertAfter(iterator I, MachineInstr *MI)
Insert MI into the instruction list after I.
void splice(iterator Where, MachineBasicBlock *Other, iterator From)
Take an instruction from MBB 'Other' at the position From, and insert it into this MBB right before '...
bool isLiveIn(MCRegister Reg, LaneBitmask LaneMask=LaneBitmask::getAll()) const
Return true if the specified register is in the live in set.
The MachineFrameInfo class represents an abstract stack frame until prolog/epilog code is inserted.
int CreateFixedObject(uint64_t Size, int64_t SPOffset, bool IsImmutable, bool isAliased=false)
Create a new object at a fixed location on the stack.
bool hasVarSizedObjects() const
This method may be called any time after instruction selection is complete to determine if the stack ...
uint64_t getStackSize() const
Return the number of bytes that must be allocated to hold all of the fixed size frame objects.
const AllocaInst * getObjectAllocation(int ObjectIdx) const
Return the underlying Alloca of the specified stack object if it exists.
int CreateStackObject(uint64_t Size, Align Alignment, bool isSpillSlot, const AllocaInst *Alloca=nullptr, uint8_t ID=0)
Create a new statically sized stack object, returning a nonnegative identifier to represent it.
bool hasCalls() const
Return true if the current function has any function calls.
bool isFrameAddressTaken() const
This method may be called any time after instruction selection is complete to determine if there is a...
Align getMaxAlign() const
Return the alignment in bytes that this function must be aligned to, which is greater than the defaul...
void setObjectOffset(int ObjectIdx, int64_t SPOffset)
Set the stack frame offset of the specified object.
uint64_t getMaxCallFrameSize() const
Return the maximum size of a call frame that must be allocated for an outgoing function call.
bool hasPatchPoint() const
This method may be called any time after instruction selection is complete to determine if there is a...
int getStackProtectorIndex() const
Return the index for the stack protector object.
int CreateSpillStackObject(uint64_t Size, Align Alignment)
Create a new statically sized stack object that represents a spill slot, returning a nonnegative iden...
uint64_t estimateStackSize(const MachineFunction &MF) const
Estimate and return the size of the stack frame.
void setStackID(int ObjectIdx, uint8_t ID)
bool isCalleeSavedInfoValid() const
Has the callee saved info been calculated yet?
Align getObjectAlign(int ObjectIdx) const
Return the alignment of the specified stack object.
int64_t getObjectSize(int ObjectIdx) const
Return the size of the specified object.
bool isMaxCallFrameSizeComputed() const
bool hasStackMap() const
This method may be called any time after instruction selection is complete to determine if there is a...
const std::vector< CalleeSavedInfo > & getCalleeSavedInfo() const
Returns a reference to call saved info vector for the current function.
unsigned getNumObjects() const
Return the number of objects.
int getObjectIndexEnd() const
Return one past the maximum frame object index.
bool hasStackProtectorIndex() const
bool hasStackObjects() const
Return true if there are any stack objects in this function.
uint8_t getStackID(int ObjectIdx) const
unsigned getNumFixedObjects() const
Return the number of fixed objects.
int64_t getObjectOffset(int ObjectIdx) const
Return the assigned stack offset of the specified object from the incoming stack pointer.
int getObjectIndexBegin() const
Return the minimum frame object index.
void setObjectAlignment(int ObjectIdx, Align Alignment)
setObjectAlignment - Change the alignment of the specified stack object.
bool isDeadObjectIndex(int ObjectIdx) const
Returns true if the specified index corresponds to a dead object.
const WinEHFuncInfo * getWinEHFuncInfo() const
getWinEHFuncInfo - Return information about how the current function uses Windows exception handling.
unsigned addFrameInst(const MCCFIInstruction &Inst)
const TargetSubtargetInfo & getSubtarget() const
getSubtarget - Return the subtarget for which this machine code is being compiled.
MachineMemOperand * getMachineMemOperand(MachinePointerInfo PtrInfo, MachineMemOperand::Flags f, LLT MemTy, Align base_alignment, const AAMDNodes &AAInfo=AAMDNodes(), const MDNode *Ranges=nullptr, SyncScope::ID SSID=SyncScope::System, AtomicOrdering Ordering=AtomicOrdering::NotAtomic, AtomicOrdering FailureOrdering=AtomicOrdering::NotAtomic)
getMachineMemOperand - Allocate a new MachineMemOperand.
MachineFrameInfo & getFrameInfo()
getFrameInfo - Return the frame info object for the current function.
MCContext & getContext() const
MachineRegisterInfo & getRegInfo()
getRegInfo - Return information about the registers currently in use.
Function & getFunction()
Return the LLVM function that this machine code represents.
Ty * getInfo()
getInfo - Keep track of various per-function pieces of information for backends that would like to do...
const MachineBasicBlock & front() const
MachineBasicBlock * CreateMachineBasicBlock(const BasicBlock *BB=nullptr, std::optional< UniqueBBID > BBID=std::nullopt)
CreateMachineBasicBlock - Allocate a new MachineBasicBlock.
void insert(iterator MBBI, MachineBasicBlock *MBB)
const TargetMachine & getTarget() const
getTarget - Return the target machine this machine code is compiled with.
const MachineInstrBuilder & setMemRefs(ArrayRef< MachineMemOperand * > MMOs) const
const MachineInstrBuilder & addExternalSymbol(const char *FnName, unsigned TargetFlags=0) const
const MachineInstrBuilder & addCFIIndex(unsigned CFIIndex) const
const MachineInstrBuilder & setMIFlag(MachineInstr::MIFlag Flag) const
const MachineInstrBuilder & addImm(int64_t Val) const
Add a new immediate operand.
const MachineInstrBuilder & add(const MachineOperand &MO) const
const MachineInstrBuilder & addFrameIndex(int Idx) const
const MachineInstrBuilder & addRegMask(const uint32_t *Mask) const
const MachineInstrBuilder & addReg(Register RegNo, unsigned flags=0, unsigned SubReg=0) const
Add a new virtual register operand.
const MachineInstrBuilder & addMBB(MachineBasicBlock *MBB, unsigned TargetFlags=0) const
const MachineInstrBuilder & addUse(Register RegNo, unsigned Flags=0, unsigned SubReg=0) const
Add a virtual register use operand.
const MachineInstrBuilder & setMIFlags(unsigned Flags) const
const MachineInstrBuilder & addMemOperand(MachineMemOperand *MMO) const
MachineInstr * getInstr() const
If conversion operators fail, use this method to get the MachineInstr explicitly.
const MachineInstrBuilder & addDef(Register RegNo, unsigned Flags=0, unsigned SubReg=0) const
Add a virtual register definition operand.
Representation of each machine instruction.
Definition: MachineInstr.h:71
void setFlags(unsigned flags)
Definition: MachineInstr.h:412
bool getFlag(MIFlag Flag) const
Return whether an MI flag is set.
Definition: MachineInstr.h:399
void eraseFromParent()
Unlink 'this' from the containing basic block and delete it.
const MachineOperand & getOperand(unsigned i) const
Definition: MachineInstr.h:587
uint32_t getFlags() const
Return the MI flags bitvector.
Definition: MachineInstr.h:394
void moveBefore(MachineInstr *MovePos)
Move the instruction before MovePos.
A description of a memory reference used in the backend.
const PseudoSourceValue * getPseudoValue() const
@ MOLoad
The memory access reads data.
@ MOStore
The memory access writes data.
const Value * getValue() const
Return the base address of the memory access.
MachineOperand class - Representation of each machine instruction operand.
void setImm(int64_t immVal)
int64_t getImm() const
static MachineOperand CreateImm(int64_t Val)
Register getReg() const
getReg - Returns the register number.
bool isFI() const
isFI - Tests if this is a MO_FrameIndex operand.
Diagnostic information for optimization analysis remarks.
void emit(DiagnosticInfoOptimizationBase &OptDiag)
Emit an optimization remark.
MachineRegisterInfo - Keep track of information for virtual and physical registers,...
Register createVirtualRegister(const TargetRegisterClass *RegClass, StringRef Name="")
createVirtualRegister - Create and return a new virtual register in the function with the specified r...
bool isLiveIn(Register Reg) const
const MCPhysReg * getCalleeSavedRegs() const
Returns list of callee saved registers.
bool isPhysRegUsed(MCRegister PhysReg, bool SkipRegMaskTest=false) const
Return true if the specified register is modified or read in this function.
MutableArrayRef - Represent a mutable reference to an array (0 or more elements consecutively in memo...
Definition: ArrayRef.h:310
Pass interface - Implemented by all 'passes'.
Definition: Pass.h:94
void enterBasicBlockEnd(MachineBasicBlock &MBB)
Start tracking liveness from the end of basic block MBB.
Register FindUnusedReg(const TargetRegisterClass *RC) const
Find an unused register of the specified register class.
void backward()
Update internal register state and move MBB iterator backwards.
void addScavengingFrameIndex(int FI)
Add a scavenging frame index.
Wrapper class representing virtual and physical registers.
Definition: Register.h:19
SMEAttrs is a utility class to parse the SME ACLE attributes on functions.
bool hasStreamingInterface() const
bool hasStreamingBody() const
bool empty() const
Definition: SmallVector.h:81
size_t size() const
Definition: SmallVector.h:78
This class consists of common code factored out of the SmallVector class to reduce code duplication b...
Definition: SmallVector.h:573
reference emplace_back(ArgTypes &&... Args)
Definition: SmallVector.h:937
void append(ItTy in_start, ItTy in_end)
Add the specified range to the end of the SmallVector.
Definition: SmallVector.h:683
void push_back(const T &Elt)
Definition: SmallVector.h:413
This is a 'vector' (really, a variable-sized array), optimized for the case when the array is small.
Definition: SmallVector.h:1196
StackOffset holds a fixed and a scalable offset in bytes.
Definition: TypeSize.h:33
int64_t getFixed() const
Returns the fixed component of the stack.
Definition: TypeSize.h:49
int64_t getScalable() const
Returns the scalable component of the stack.
Definition: TypeSize.h:52
static StackOffset get(int64_t Fixed, int64_t Scalable)
Definition: TypeSize.h:44
static StackOffset getScalable(int64_t Scalable)
Definition: TypeSize.h:43
static StackOffset getFixed(int64_t Fixed)
Definition: TypeSize.h:42
StringRef - Represent a constant reference to a string, i.e.
Definition: StringRef.h:51
bool hasFP(const MachineFunction &MF) const
hasFP - Return true if the specified function should have a dedicated frame pointer register.
virtual void determineCalleeSaves(MachineFunction &MF, BitVector &SavedRegs, RegScavenger *RS=nullptr) const
This method determines which of the registers reported by TargetRegisterInfo::getCalleeSavedRegs() sh...
int getOffsetOfLocalArea() const
getOffsetOfLocalArea - This method returns the offset of the local area from the stack pointer on ent...
Align getStackAlign() const
getStackAlignment - This method returns the number of bytes to which the stack pointer must be aligne...
StackDirection getStackGrowthDirection() const
getStackGrowthDirection - Return the direction the stack grows.
virtual bool enableCFIFixup(const MachineFunction &MF) const
Returns true if we may need to fix the unwind information for the function.
TargetInstrInfo - Interface to description of machine instruction set.
TargetOptions Options
CodeModel::Model getCodeModel() const
Returns the code model.
const MCAsmInfo * getMCAsmInfo() const
Return target specific asm information.
SwiftAsyncFramePointerMode SwiftAsyncFramePointer
Control when and how the Swift async frame pointer bit should be set.
bool DisableFramePointerElim(const MachineFunction &MF) const
DisableFramePointerElim - This returns true if frame pointer elimination optimization should be disab...
TargetRegisterInfo base class - We assume that the target defines a static array of TargetRegisterDes...
const TargetRegisterClass * getMinimalPhysRegClass(MCRegister Reg, MVT VT=MVT::Other) const
Returns the Register Class of a physical register of the given type, picking the most sub register cl...
Align getSpillAlign(const TargetRegisterClass &RC) const
Return the minimum required alignment in bytes for a spill slot for a register of this class.
bool hasStackRealignment(const MachineFunction &MF) const
True if stack realignment is required and still possible.
unsigned getSpillSize(const TargetRegisterClass &RC) const
Return the size in bytes of the stack slot allocated to hold a spilled copy of a register from class ...
TargetSubtargetInfo - Generic base class for all target subtargets.
virtual const TargetRegisterInfo * getRegisterInfo() const
getRegisterInfo - If register information is available, return it.
virtual const TargetInstrInfo * getInstrInfo() const
StringRef getArchName() const
Get the architecture (first) component of the triple.
Definition: Triple.cpp:1354
static constexpr TypeSize getFixed(ScalarTy ExactSize)
Definition: TypeSize.h:345
The instances of the Type class are immutable: once they are created, they are never changed.
Definition: Type.h:45
constexpr ScalarTy getFixedValue() const
Definition: TypeSize.h:202
self_iterator getIterator()
Definition: ilist_node.h:132
This class implements an extremely fast bulk output stream that can only output to a stream.
Definition: raw_ostream.h:52
#define llvm_unreachable(msg)
Marks that the current location is not supposed to be reachable.
@ MO_GOT
MO_GOT - This flag indicates that a symbol operand represents the address of the GOT entry for the sy...
static unsigned getShiftValue(unsigned Imm)
getShiftValue - Extract the shift value.
static unsigned getArithExtendImm(AArch64_AM::ShiftExtendType ET, unsigned Imm)
getArithExtendImm - Encode the extend type and shift amount for an arithmetic instruction: imm: 3-bit...
static uint64_t encodeLogicalImmediate(uint64_t imm, unsigned regSize)
encodeLogicalImmediate - Return the encoded immediate value for a logical immediate instruction of th...
static unsigned getShifterImm(AArch64_AM::ShiftExtendType ST, unsigned Imm)
getShifterImm - Encode the shift type and amount: imm: 6-bit shift amount shifter: 000 ==> lsl 001 ==...
const unsigned StackProbeMaxLoopUnroll
Maximum number of iterations to unroll for a constant size probing loop.
const unsigned StackProbeMaxUnprobedStack
Maximum allowed number of unprobed bytes above SP at an ABI boundary.
constexpr char Attrs[]
Key for Kernel::Metadata::mAttrs.
@ AArch64_SVE_VectorCall
Used between AArch64 SVE functions.
Definition: CallingConv.h:224
@ PreserveMost
Used for runtime calls that preserve most registers.
Definition: CallingConv.h:63
@ CXX_FAST_TLS
Used for access functions.
Definition: CallingConv.h:72
@ GHC
Used by the Glasgow Haskell Compiler (GHC).
Definition: CallingConv.h:50
@ AArch64_SME_ABI_Support_Routines_PreserveMost_From_X1
Preserve X1-X15, X19-X29, SP, Z0-Z31, P0-P15.
Definition: CallingConv.h:271
@ PreserveAll
Used for runtime calls that preserve (almost) all registers.
Definition: CallingConv.h:66
@ PreserveNone
Used for runtime calls that preserve no general registers.
Definition: CallingConv.h:90
@ Win64
The C convention as implemented on Windows/x86-64 and AArch64.
Definition: CallingConv.h:159
@ SwiftTail
This follows the Swift calling convention in how arguments are passed but guarantees tail calls will ...
Definition: CallingConv.h:87
@ Implicit
Not emitted register (e.g. carry, or temporary result).
@ Dead
Unused definition.
@ Define
Register definition.
@ Kill
The last use of a register.
@ Undef
Value of the register doesn't matter.
Reg
All possible values of the reg field in the ModR/M byte.
initializer< Ty > init(const Ty &Val)
Definition: CommandLine.h:443
NodeAddr< InstrNode * > Instr
Definition: RDFGraph.h:389
This is an optimization pass for GlobalISel generic memory operations.
Definition: AddressRanges.h:18
@ Offset
Definition: DWP.cpp:480
void stable_sort(R &&Range)
Definition: STLExtras.h:2037
MCCFIInstruction createDefCFA(const TargetRegisterInfo &TRI, unsigned FrameReg, unsigned Reg, const StackOffset &Offset, bool LastAdjustmentWasScalable=true)
MachineInstrBuilder BuildMI(MachineFunction &MF, const MIMetadata &MIMD, const MCInstrDesc &MCID)
Builder interface. Specify how to create the initial instruction itself.
int isAArch64FrameOffsetLegal(const MachineInstr &MI, StackOffset &Offset, bool *OutUseUnscaledOp=nullptr, unsigned *OutUnscaledOp=nullptr, int64_t *EmittableOffset=nullptr)
Check if the Offset is a valid frame offset for MI.
detail::scope_exit< std::decay_t< Callable > > make_scope_exit(Callable &&F)
Definition: ScopeExit.h:59
MCCFIInstruction createCFAOffset(const TargetRegisterInfo &MRI, unsigned Reg, const StackOffset &OffsetFromDefCFA)
iterator_range< T > make_range(T x, T y)
Convenience function for iterating over sub-ranges.
unsigned getBLRCallOpcode(const MachineFunction &MF)
Return opcode to be used for indirect calls.
const Value * getUnderlyingObject(const Value *V, unsigned MaxLookup=6)
This method strips off any GEP address adjustments, pointer casts or llvm.threadlocal....
iterator_range< early_inc_iterator_impl< detail::IterOfRange< RangeT > > > make_early_inc_range(RangeT &&Range)
Make a range that does early increment to allow mutation of the underlying range without disrupting i...
Definition: STLExtras.h:657
@ AArch64FrameOffsetCannotUpdate
Offset cannot apply.
bool any_of(R &&range, UnaryPredicate P)
Provide wrappers to std::any_of which take ranges instead of having to pass begin/end explicitly.
Definition: STLExtras.h:1746
auto formatv(bool Validate, const char *Fmt, Ts &&...Vals)
auto reverse(ContainerTy &&C)
Definition: STLExtras.h:420
void sort(IteratorTy Start, IteratorTy End)
Definition: STLExtras.h:1664
@ Always
Always set the bit.
@ DeploymentBased
Determine whether to set the bit statically or dynamically based on the deployment target.
raw_ostream & dbgs()
dbgs() - This returns a reference to a raw_ostream for debugging messages.
Definition: Debug.cpp:163
void emitFrameOffset(MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI, const DebugLoc &DL, unsigned DestReg, unsigned SrcReg, StackOffset Offset, const TargetInstrInfo *TII, MachineInstr::MIFlag=MachineInstr::NoFlags, bool SetNZCV=false, bool NeedsWinCFI=false, bool *HasWinCFI=nullptr, bool EmitCFAOffset=false, StackOffset InitialOffset={}, unsigned FrameReg=AArch64::SP)
emitFrameOffset - Emit instructions as needed to set DestReg to SrcReg plus Offset.
void report_fatal_error(Error Err, bool gen_crash_diag=true)
Report a serious error, calling any installed error handler.
Definition: Error.cpp:167
EHPersonality classifyEHPersonality(const Value *Pers)
See if the given exception handling personality function is one that we understand.
@ First
Helpers to iterate all locations in the MemoryEffectsBase class.
auto remove_if(R &&Range, UnaryPredicate P)
Provide wrappers to std::remove_if which take ranges instead of having to pass begin/end explicitly.
Definition: STLExtras.h:1778
unsigned getDefRegState(bool B)
unsigned getKillRegState(bool B)
uint64_t alignTo(uint64_t Size, Align A)
Returns a multiple of A needed to store Size bytes.
Definition: Alignment.h:155
raw_ostream & operator<<(raw_ostream &OS, const APFixedPoint &FX)
Definition: APFixedPoint.h:303
bool isAsynchronousEHPersonality(EHPersonality Pers)
Returns true if this personality function catches asynchronous exceptions.
void fullyRecomputeLiveIns(ArrayRef< MachineBasicBlock * > MBBs)
Convenience function for recomputing live-in's for a set of MBBs until the computation converges.
Definition: LivePhysRegs.h:215
Printable printReg(Register Reg, const TargetRegisterInfo *TRI=nullptr, unsigned SubIdx=0, const MachineRegisterInfo *MRI=nullptr)
Prints virtual and physical registers with or without a TRI instance.
void swap(llvm::BitVector &LHS, llvm::BitVector &RHS)
Implement std::swap in terms of BitVector swap.
Definition: BitVector.h:860
Emergency stack slots for expanding SPILL_PPR_TO_ZPR_SLOT_PSEUDO and FILL_PPR_FROM_ZPR_SLOT_PSEUDO.
std::optional< int > PPRSpillFI
std::optional< int > GPRSpillFI
std::optional< int > ZPRSpillFI
Registers available for scavenging (ZPR, PPR3b, GPR).
RAII helper class for scavenging or spilling a register.
ScopedScavengeOrSpill(ScopedScavengeOrSpill &&)=delete
ScopedScavengeOrSpill(MachineFunction &MF, MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI, Register SpillCandidate, const TargetRegisterClass &RC, LiveRegUnits const &UsedRegs, BitVector const &AllocatableRegs, std::optional< int > *MaybeSpillFI, Register PreferredReg=AArch64::NoRegister)
Register freeRegister() const
Returns the free register (found from scavenging or spilling a register).
ScopedScavengeOrSpill(const ScopedScavengeOrSpill &)=delete
bool operator<(const StackAccess &Rhs) const
void print(raw_ostream &OS) const
int64_t start() const
std::string getTypeString() const
int64_t end() const
This struct is a compact representation of a valid (non-zero power of two) alignment.
Definition: Alignment.h:39
uint64_t value() const
This is a hole in the type system and should not be abused.
Definition: Alignment.h:85
Description of the encoding of one expression Op.
Pair of physical register and lane mask.
static MachinePointerInfo getFixedStack(MachineFunction &MF, int FI, int64_t Offset=0)
Return a MachinePointerInfo record that refers to the specified FrameIndex.