1//===- AArch64FrameLowering.cpp - AArch64 Frame Lowering -------*- C++ -*-====//
2//
3// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
4// See https://llvm.org/LICENSE.txt for license information.
5// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
6//
7//===----------------------------------------------------------------------===//
8//
9// This file contains the AArch64 implementation of TargetFrameLowering class.
10//
11// On AArch64, stack frames are structured as follows:
12//
13// The stack grows downward.
14//
15// All of the individual frame areas on the frame below are optional, i.e. it's
16// possible to create a function so that the particular area isn't present
17// in the frame.
18//
19// At function entry, the "frame" looks as follows:
20//
21// | | Higher address
22// |-----------------------------------|
23// | |
24// | arguments passed on the stack |
25// | |
26// |-----------------------------------| <- sp
27// | | Lower address
28//
29//
30// After the prologue has run, the frame has the following general structure.
31// Note that this doesn't depict the case where a red-zone is used. Also,
32// technically the last frame area (VLAs) isn't created until the main
33// function body runs, after the prologue. However, it's depicted here
34// for completeness.
35//
36// | | Higher address
37// |-----------------------------------|
38// | |
39// | arguments passed on the stack |
40// | |
41// |-----------------------------------|
42// | |
43// | (Win64 only) varargs from reg |
44// | |
45// |-----------------------------------|
46// | |
47// | (Win64 only) callee-saved SVE reg |
48// | |
49// |-----------------------------------|
50// | |
51// | callee-saved gpr registers | <--.
52// | | | On Darwin platforms these
53// |- - - - - - - - - - - - - - - - - -| | callee saves are swapped,
54// | prev_lr | | (frame record first)
55// | prev_fp | <--'
56// | async context if needed |
57// | (a.k.a. "frame record") |
58// |-----------------------------------| <- fp(=x29)
59// Default SVE stack layout Split SVE objects
60// (aarch64-split-sve-objects=false) (aarch64-split-sve-objects=true)
61// |-----------------------------------| |-----------------------------------|
62// | <hazard padding> | | callee-saved PPR registers |
63// |-----------------------------------| |-----------------------------------|
64// | | | PPR stack objects |
65// | callee-saved fp/simd/SVE regs | |-----------------------------------|
66// | | | <hazard padding> |
67// |-----------------------------------| |-----------------------------------|
68// | | | callee-saved ZPR/FPR registers |
69// | SVE stack objects | |-----------------------------------|
70// | | | ZPR stack objects |
71// |-----------------------------------| |-----------------------------------|
72// ^ NB: FPR CSRs are promoted to ZPRs
73// |-----------------------------------|
74// |.empty.space.to.make.part.below....|
75// |.aligned.in.case.it.needs.more.than| (size of this area is unknown at
76// |.the.standard.16-byte.alignment....| compile time; if present)
77// |-----------------------------------|
78// | local variables of fixed size |
79// | including spill slots |
80// | <FPR> |
81// | <hazard padding> |
82// | <GPR> |
83// |-----------------------------------| <- bp(not defined by ABI,
84// |.variable-sized.local.variables....| LLVM chooses X19)
85// |.(VLAs)............................| (size of this area is unknown at
86// |...................................| compile time)
87// |-----------------------------------| <- sp
88// | | Lower address
89//
90//
91// To access data in a frame, a constant offset from one of the pointers
92// (fp, bp, sp) must be computable at compile time. The sizes of the areas
93// with a dotted background cannot be computed at compile time if they are
94// present, so all three of fp, bp and sp must be set up in order to access
95// all contents of the frame areas, assuming every frame area is non-empty.
97//
98// For most functions, some of the frame areas are empty. For those functions,
99// it may not be necessary to set up fp or bp:
100// * A base pointer is definitely needed when there are both VLAs and local
101// variables with more-than-default alignment requirements.
102// * A frame pointer is definitely needed when there are local variables with
103// more-than-default alignment requirements.
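// As a rough illustration (a hypothetical example, not code from this file),
// a function such as:
//
//   void f(int n) {
//     alignas(64) int over_aligned[4]; // more-than-default alignment -> fp
//     int vla[n];                      // VLA + over-aligned local    -> bp
//   }
//
// needs fp to realign the stack for 'over_aligned', and bp so that fixed-size
// locals remain reachable once the VLA makes sp-relative offsets unknown.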
104//
105// For Darwin platforms the frame-record (fp, lr) is stored at the top of the
106// callee-saved area, since the unwind encoding does not allow for encoding
107// this dynamically and existing tools depend on this layout. For other
108// platforms, the frame-record is stored at the bottom of the (gpr) callee-saved
109// area to allow SVE stack objects (allocated directly below the callee-saves,
110// if available) to be accessed directly from the framepointer.
111// The SVE spill/fill instructions have VL-scaled addressing modes such
112// as:
113// ldr z8, [fp, #-7 mul vl]
114// For SVE the size of the vector length (VL) is not known at compile-time, so
115// '#-7 mul vl' is an offset that can only be evaluated at runtime. With this
116// layout, we don't need to add an unscaled offset to the framepointer before
117// accessing the SVE object in the frame.
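// As a rough illustration: with this layout an SVE object is reachable in a
// single instruction,
//
//   ldr z8, [fp, #-7 mul vl]
//
// whereas if a fixed-size area sat between fp and the SVE objects, a
// hypothetical access would first need an unscaled adjustment, e.g.
//
//   sub x8, fp, #16            // skip the fixed-size area (size assumed)
//   ldr z8, [x8, #-7 mul vl]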
118//
119// In some cases when a base pointer is not strictly needed, it is generated
120// anyway when offsets from the frame pointer to access local variables become
121// so large that the offset can't be encoded in the immediate fields of loads
122// or stores.
123//
124// Outgoing function arguments must be at the bottom of the stack frame when
125// calling another function. If we do not have variable-sized stack objects, we
126// can allocate a "reserved call frame" area at the bottom of the local
127// variable area, large enough for all outgoing calls. If we do have VLAs, then
128// the stack pointer must be decremented and incremented around each call to
129// make space for the arguments below the VLAs.
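// As a rough illustration: with a reserved call frame the prologue allocates
// the outgoing-argument area once and calls need no sp adjustment; with VLAs
// present, each call site is bracketed instead, e.g.
//
//   sub sp, sp, #32        // make room for outgoing stack arguments
//   str x8, [sp]           // store a stack-passed argument (example only)
//   bl  callee
//   add sp, sp, #32        // release the outgoing-argument area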
130//
131// FIXME: also explain the redzone concept.
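// (A brief note pending the FIXME above: the red zone is a small area below
// sp that a leaf function may use for temporaries without adjusting sp. In
// this file it is opt-in via the aarch64-redzone option, and canUseRedZone()
// below rejects functions with calls, a frame pointer, SVE stack areas, or a
// local stack larger than the subtarget's red-zone size.)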
132//
133// About stack hazards: Under some SME contexts, a coprocessor with its own
134// separate cache can be used for FP operations. This can create hazards if the CPU
135// and the SME unit try to access the same area of memory, including if the
136// access is to an area of the stack. To try to alleviate this we attempt to
137// introduce extra padding into the stack frame between FP and GPR accesses,
138// controlled by the aarch64-stack-hazard-size option. Without changing the
139// layout of the stack frame in the diagram above, a stack object of size
140// aarch64-stack-hazard-size is added between GPR and FPR CSRs. Another is added
141// to the stack objects section, and stack objects are sorted so that FPR >
142// Hazard padding slot > GPRs (where possible). Unfortunately some things are
143// not handled well (VLA area, arguments on the stack, objects with both GPR and
144// FPR accesses), but if those are controlled by the user then the entire stack
145// frame becomes GPR at the start/end with FPR in the middle, surrounded by
146// Hazard padding.
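// As a rough illustration, building with -mllvm -aarch64-stack-hazard-size=1024
// requests 1024 bytes of padding between the GPR and FPR areas, while a size
// of 0 leaves the layout unchanged.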
147//
148// An example of the prologue:
149//
150// .globl __foo
151// .align 2
152// __foo:
153// Ltmp0:
154// .cfi_startproc
155// .cfi_personality 155, ___gxx_personality_v0
156// Leh_func_begin:
157// .cfi_lsda 16, Lexception33
158//
159// stp xa,bx, [sp, -#offset]!
160// ...
161// stp x28, x27, [sp, #offset-32]
162// stp fp, lr, [sp, #offset-16]
163// add fp, sp, #offset - 16
164// sub sp, sp, #1360
165//
166// The Stack:
167// +-------------------------------------------+
168// 10000 | ........ | ........ | ........ | ........ |
169// 10004 | ........ | ........ | ........ | ........ |
170// +-------------------------------------------+
171// 10008 | ........ | ........ | ........ | ........ |
172// 1000c | ........ | ........ | ........ | ........ |
173// +===========================================+
174// 10010 | X28 Register |
175// 10014 | X28 Register |
176// +-------------------------------------------+
177// 10018 | X27 Register |
178// 1001c | X27 Register |
179// +===========================================+
180// 10020 | Frame Pointer |
181// 10024 | Frame Pointer |
182// +-------------------------------------------+
183// 10028 | Link Register |
184// 1002c | Link Register |
185// +===========================================+
186// 10030 | ........ | ........ | ........ | ........ |
187// 10034 | ........ | ........ | ........ | ........ |
188// +-------------------------------------------+
189// 10038 | ........ | ........ | ........ | ........ |
190// 1003c | ........ | ........ | ........ | ........ |
191// +-------------------------------------------+
192//
193// [sp] = 10030 :: >>initial value<<
194// sp = 10020 :: stp fp, lr, [sp, #-16]!
195// fp = sp == 10020 :: mov fp, sp
196// [sp] == 10020 :: stp x28, x27, [sp, #-16]!
197// sp == 10010 :: >>final value<<
198//
199// The frame pointer (w29) points to address 10020. If we use an offset of
200// '16' from 'w29', we get the CFI offsets of -8 for w30, -16 for w29, -24
201// for w27, and -32 for w28:
202//
203// Ltmp1:
204// .cfi_def_cfa w29, 16
205// Ltmp2:
206// .cfi_offset w30, -8
207// Ltmp3:
208// .cfi_offset w29, -16
209// Ltmp4:
210// .cfi_offset w27, -24
211// Ltmp5:
212// .cfi_offset w28, -32
213//
214//===----------------------------------------------------------------------===//
215
216#include "AArch64FrameLowering.h"
217#include "AArch64InstrInfo.h"
220#include "AArch64RegisterInfo.h"
221#include "AArch64Subtarget.h"
225#include "llvm/ADT/ScopeExit.h"
226#include "llvm/ADT/SmallVector.h"
244#include "llvm/IR/Attributes.h"
245#include "llvm/IR/CallingConv.h"
246#include "llvm/IR/DataLayout.h"
247#include "llvm/IR/DebugLoc.h"
248#include "llvm/IR/Function.h"
249#include "llvm/MC/MCAsmInfo.h"
250#include "llvm/MC/MCDwarf.h"
252#include "llvm/Support/Debug.h"
259#include <cassert>
260#include <cstdint>
261#include <iterator>
262#include <optional>
263#include <vector>
264
265using namespace llvm;
266
267#define DEBUG_TYPE "frame-info"
268
269static cl::opt<bool> EnableRedZone("aarch64-redzone",
270 cl::desc("enable use of redzone on AArch64"),
271 cl::init(false), cl::Hidden);
272
274 "stack-tagging-merge-settag",
275 cl::desc("merge settag instruction in function epilog"), cl::init(true),
276 cl::Hidden);
277
278static cl::opt<bool> OrderFrameObjects("aarch64-order-frame-objects",
279 cl::desc("sort stack allocations"),
280 cl::init(true), cl::Hidden);
281
282static cl::opt<bool>
283 SplitSVEObjects("aarch64-split-sve-objects",
284 cl::desc("Split allocation of ZPR & PPR objects"),
285 cl::init(true), cl::Hidden);
286
288 "homogeneous-prolog-epilog", cl::Hidden,
289 cl::desc("Emit homogeneous prologue and epilogue for the size "
290 "optimization (default = off)"));
291
292// Stack hazard size for analysis remarks. StackHazardSize takes precedence.
294 StackHazardRemarkSize("aarch64-stack-hazard-remark-size", cl::init(0),
295 cl::Hidden);
296// Whether to insert padding into non-streaming functions (for testing).
297static cl::opt<bool>
298 StackHazardInNonStreaming("aarch64-stack-hazard-in-non-streaming",
299 cl::init(false), cl::Hidden);
300
302 "aarch64-disable-multivector-spill-fill",
303 cl::desc("Disable use of LD/ST pairs for SME2 or SVE2p1"), cl::init(false),
304 cl::Hidden);
305
306int64_t
307AArch64FrameLowering::getArgumentStackToRestore(MachineFunction &MF,
308 MachineBasicBlock &MBB) const {
309 MachineBasicBlock::iterator MBBI = MBB.getLastNonDebugInstr();
311 bool IsTailCallReturn = (MBB.end() != MBBI)
313 : false;
314
315 int64_t ArgumentPopSize = 0;
316 if (IsTailCallReturn) {
317 MachineOperand &StackAdjust = MBBI->getOperand(1);
318
319 // For a tail-call in a callee-pops-arguments environment, some or all of
320 // the stack may actually be in use for the call's arguments; this is
321 // calculated during LowerCall and consumed here...
322 ArgumentPopSize = StackAdjust.getImm();
323 } else {
324 // ... otherwise the amount to pop is *all* of the argument space,
325 // conveniently stored in the MachineFunctionInfo by
326 // LowerFormalArguments. This will, of course, be zero for the C calling
327 // convention.
328 ArgumentPopSize = AFI->getArgumentStackToRestore();
329 }
330
331 return ArgumentPopSize;
332}
333
335 MachineFunction &MF);
336
337enum class AssignObjectOffsets { No, Yes };
338/// Process all the SVE stack objects, determining the SVE stack size and the
339/// offset of each object. If AssignOffsets is "Yes", the offsets get assigned
340/// (and the SVE stack sizes are set). Returns the size of the SVE stack.
342 AssignObjectOffsets AssignOffsets);
343
344static unsigned getStackHazardSize(const MachineFunction &MF) {
345 return MF.getSubtarget<AArch64Subtarget>().getStreamingHazardSize();
346}
347
348/// Returns true if PPRs are spilled as ZPRs.
349static bool arePPRsSpilledAsZPR(const MachineFunction &MF) {
351 AArch64::PPRRegClass) == 16;
352}
353
359
362 // With split SVE objects, the hazard padding is added to the PPR region,
363 // which places it between the [GPR, PPR] area and the [ZPR, FPR] area. This
364 // avoids hazards between both GPRs and FPRs and ZPRs and PPRs.
367 : 0,
368 AFI->getStackSizePPR());
369}
370
371// Conservatively, returns true if the function is likely to have SVE vectors
372// on the stack. This function is safe to be called before callee-saves or
373// object offsets have been determined.
375 const MachineFunction &MF) {
376 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
377 if (AFI->isSVECC())
378 return true;
379
380 if (AFI->hasCalculatedStackSizeSVE())
381 return bool(AFL.getSVEStackSize(MF));
382
383 const MachineFrameInfo &MFI = MF.getFrameInfo();
384 for (int FI = MFI.getObjectIndexBegin(); FI < MFI.getObjectIndexEnd(); FI++) {
385 if (MFI.hasScalableStackID(FI))
386 return true;
387 }
388
389 return false;
390}
391
392/// Returns true if homogeneous prolog or epilog code can be emitted
393/// for the size optimization. If possible, a frame helper call is injected.
394/// When an Exit block is given, this check is for the epilog.
395bool AArch64FrameLowering::homogeneousPrologEpilog(
396 MachineFunction &MF, MachineBasicBlock *Exit) const {
397 if (!MF.getFunction().hasMinSize())
398 return false;
400 return false;
401 if (EnableRedZone)
402 return false;
403
404 // TODO: Windows is not supported yet.
405 if (needsWinCFI(MF))
406 return false;
407
408 // TODO: SVE is not supported yet.
409 if (isLikelyToHaveSVEStack(*this, MF))
410 return false;
411
412 // Bail on stack adjustment needed on return for simplicity.
413 const MachineFrameInfo &MFI = MF.getFrameInfo();
414 const TargetRegisterInfo *RegInfo = MF.getSubtarget().getRegisterInfo();
415 if (MFI.hasVarSizedObjects() || RegInfo->hasStackRealignment(MF))
416 return false;
417 if (Exit && getArgumentStackToRestore(MF, *Exit))
418 return false;
419
420 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
422 return false;
423
424 // If there are an odd number of GPRs before LR and FP in the CSRs list,
425 // they will not be paired into one RegPairInfo, which is incompatible with
426 // the assumption made by the homogeneous prolog epilog pass.
427 const MCPhysReg *CSRegs = MF.getRegInfo().getCalleeSavedRegs();
428 unsigned NumGPRs = 0;
429 for (unsigned I = 0; CSRegs[I]; ++I) {
430 Register Reg = CSRegs[I];
431 if (Reg == AArch64::LR) {
432 assert(CSRegs[I + 1] == AArch64::FP);
433 if (NumGPRs % 2 != 0)
434 return false;
435 break;
436 }
437 if (AArch64::GPR64RegClass.contains(Reg))
438 ++NumGPRs;
439 }
440
441 return true;
442}
443
444/// Returns true if CSRs should be paired.
445bool AArch64FrameLowering::producePairRegisters(MachineFunction &MF) const {
446 return produceCompactUnwindFrame(*this, MF) || homogeneousPrologEpilog(MF);
447}
448
449/// This is the biggest offset to the stack pointer we can encode in aarch64
450/// instructions (without using a separate calculation and a temp register).
451/// Note that the exceptions here are vector stores/loads, which cannot encode any
452/// displacements (see estimateRSStackSizeLimit(), isAArch64FrameOffsetLegal()).
453static const unsigned DefaultSafeSPDisplacement = 255;
454
455/// Look at each instruction that references stack frames and return the stack
456/// size limit beyond which some of these instructions will require a scratch
457/// register during their expansion later.
459 // FIXME: For now, just conservatively guesstimate based on unscaled indexing
460 // range. We'll end up allocating an unnecessary spill slot a lot, but
461 // realistically that's not a big deal at this stage of the game.
462 for (MachineBasicBlock &MBB : MF) {
463 for (MachineInstr &MI : MBB) {
464 if (MI.isDebugInstr() || MI.isPseudo() ||
465 MI.getOpcode() == AArch64::ADDXri ||
466 MI.getOpcode() == AArch64::ADDSXri)
467 continue;
468
469 for (const MachineOperand &MO : MI.operands()) {
470 if (!MO.isFI())
471 continue;
472
474 if (isAArch64FrameOffsetLegal(MI, Offset, nullptr, nullptr, nullptr) ==
476 return 0;
477 }
478 }
479 }
481}
482
487
488unsigned
489AArch64FrameLowering::getFixedObjectSize(const MachineFunction &MF,
490 const AArch64FunctionInfo *AFI,
491 bool IsWin64, bool IsFunclet) const {
492 assert(AFI->getTailCallReservedStack() % 16 == 0 &&
493 "Tail call reserved stack must be aligned to 16 bytes");
494 if (!IsWin64 || IsFunclet) {
495 return AFI->getTailCallReservedStack();
496 } else {
497 if (AFI->getTailCallReservedStack() != 0 &&
498 !MF.getFunction().getAttributes().hasAttrSomewhere(
499 Attribute::SwiftAsync))
500 report_fatal_error("cannot generate ABI-changing tail call for Win64");
501 unsigned FixedObjectSize = AFI->getTailCallReservedStack();
502
503 // Var args are stored here in the primary function.
504 FixedObjectSize += AFI->getVarArgsGPRSize();
505
506 if (MF.hasEHFunclets()) {
507 // Catch objects are stored here in the primary function.
508 const MachineFrameInfo &MFI = MF.getFrameInfo();
509 const WinEHFuncInfo &EHInfo = *MF.getWinEHFuncInfo();
510 SmallSetVector<int, 8> CatchObjFrameIndices;
511 for (const WinEHTryBlockMapEntry &TBME : EHInfo.TryBlockMap) {
512 for (const WinEHHandlerType &H : TBME.HandlerArray) {
513 int FrameIndex = H.CatchObj.FrameIndex;
514 if ((FrameIndex != INT_MAX) &&
515 CatchObjFrameIndices.insert(FrameIndex)) {
516 FixedObjectSize = alignTo(FixedObjectSize,
517 MFI.getObjectAlign(FrameIndex).value()) +
518 MFI.getObjectSize(FrameIndex);
519 }
520 }
521 }
522 // To support EH funclets we allocate an UnwindHelp object
523 FixedObjectSize += 8;
524 }
525 return alignTo(FixedObjectSize, 16);
526 }
527}
528
530 if (!EnableRedZone)
531 return false;
532
533 // Don't use the red zone if the function explicitly asks us not to.
534 // This is typically used for kernel code.
535 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
536 const unsigned RedZoneSize =
538 if (!RedZoneSize)
539 return false;
540
541 const MachineFrameInfo &MFI = MF.getFrameInfo();
543 uint64_t NumBytes = AFI->getLocalStackSize();
544
545 // If neither NEON nor SVE is available, a COPY from one Q-reg to
546 // another requires a spill -> reload sequence. We can do that
547 // using a pre-decrementing store/post-decrementing load, but
548 // if we do so, we can't use the Red Zone.
549 bool LowerQRegCopyThroughMem = Subtarget.hasFPARMv8() &&
550 !Subtarget.isNeonAvailable() &&
551 !Subtarget.hasSVE();
552
553 return !(MFI.hasCalls() || hasFP(MF) || NumBytes > RedZoneSize ||
554 AFI->hasSVEStackSize() || LowerQRegCopyThroughMem);
555}
556
557/// hasFPImpl - Return true if the specified function should have a dedicated
558/// frame pointer register.
560 const MachineFrameInfo &MFI = MF.getFrameInfo();
561 const TargetRegisterInfo *RegInfo = MF.getSubtarget().getRegisterInfo();
563
564 // Win64 EH requires a frame pointer if funclets are present, as the locals
565 // are accessed off the frame pointer in both the parent function and the
566 // funclets.
567 if (MF.hasEHFunclets())
568 return true;
569 // Retain behavior of always omitting the FP for leaf functions when possible.
571 return true;
572 if (MFI.hasVarSizedObjects() || MFI.isFrameAddressTaken() ||
573 MFI.hasStackMap() || MFI.hasPatchPoint() ||
574 RegInfo->hasStackRealignment(MF))
575 return true;
576
577 // If we:
578 //
579 // 1. Have streaming mode changes
580 // OR:
581 // 2. Have a streaming body with SVE stack objects
582 //
583 // Then the value of VG restored when unwinding to this function may not match
584 // the value of VG used to set up the stack.
585 //
586 // This is a problem as the CFA can be described with an expression of the
587 // form: CFA = SP + NumBytes + VG * NumScalableBytes.
588 //
589 // If the value of VG used in that expression does not match the value used to
590 // set up the stack, an incorrect address for the CFA will be computed, and
591 // unwinding will fail.
592 //
593 // We work around this issue by ensuring the frame-pointer can describe the
594 // CFA in either of these cases.
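  // As a rough illustration: if the prologue ran with VG == 4 (256-bit
  // vectors) but a streaming-mode change means the unwinder later evaluates
  // the expression with VG == 2, "SP + NumBytes + VG * NumScalableBytes" no
  // longer equals the real CFA, so we describe the CFA via the frame pointer
  // instead.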
595 if (AFI.needsDwarfUnwindInfo(MF) &&
598 return true;
599 // With large call frames around we may need to use FP to access the
600 // scavenging emergency spill slot.
601 //
602 // Unfortunately some calls to hasFP() like machine verifier ->
603 // getReservedReg() -> hasFP in the middle of global isel are too early
604 // to know the max call frame size. Hopefully conservatively returning "true"
605 // in those cases is fine.
606 // DefaultSafeSPDisplacement is fine as we only emergency spill GP regs.
607 if (!MFI.isMaxCallFrameSizeComputed() ||
609 return true;
610
611 return false;
612}
613
614/// Should the Frame Pointer be reserved for the current function?
616 const TargetMachine &TM = MF.getTarget();
617 const Triple &TT = TM.getTargetTriple();
618
619 // These OSes require the frame chain is valid, even if the current frame does
620 // not use a frame pointer.
621 if (TT.isOSDarwin() || TT.isOSWindows())
622 return true;
623
624 // If the function has a frame pointer, it is reserved.
625 if (hasFP(MF))
626 return true;
627
628 // Frontend has requested to preserve the frame pointer.
629 if (TM.Options.FramePointerIsReserved(MF))
630 return true;
631
632 return false;
633}
634
635/// hasReservedCallFrame - Under normal circumstances, when a frame pointer is
636/// not required, we reserve argument space for call sites in the function
637/// immediately on entry to the current function. This eliminates the need for
638/// add/sub sp brackets around call sites. Returns true if the call frame is
639/// included as part of the stack frame.
641 const MachineFunction &MF) const {
642 // The stack probing code for the dynamically allocated outgoing arguments
643 // area assumes that the stack is probed at the top - either by the prologue
644 // code, which issues a probe if `hasVarSizedObjects` return true, or by the
645 // most recent variable-sized object allocation. Changing the condition here
646 // may need to be followed up by changes to the probe issuing logic.
647 return !MF.getFrameInfo().hasVarSizedObjects();
648}
649
653 const AArch64InstrInfo *TII =
654 static_cast<const AArch64InstrInfo *>(MF.getSubtarget().getInstrInfo());
655 const AArch64TargetLowering *TLI =
656 MF.getSubtarget<AArch64Subtarget>().getTargetLowering();
657 [[maybe_unused]] MachineFrameInfo &MFI = MF.getFrameInfo();
658 DebugLoc DL = I->getDebugLoc();
659 unsigned Opc = I->getOpcode();
660 bool IsDestroy = Opc == TII->getCallFrameDestroyOpcode();
661 uint64_t CalleePopAmount = IsDestroy ? I->getOperand(1).getImm() : 0;
662
663 if (!hasReservedCallFrame(MF)) {
664 int64_t Amount = I->getOperand(0).getImm();
665 Amount = alignTo(Amount, getStackAlign());
666 if (!IsDestroy)
667 Amount = -Amount;
668
669 // N.b. if CalleePopAmount is valid but zero (i.e. callee would pop, but it
670 // doesn't have to pop anything), then the first operand will be zero too so
671 // this adjustment is a no-op.
672 if (CalleePopAmount == 0) {
673 // FIXME: in-function stack adjustment for calls is limited to 24-bits
674 // because there's no guaranteed temporary register available.
675 //
676 // ADD/SUB (immediate) has only LSL #0 and LSL #12 available.
677 // 1) For offset <= 12-bit, we use LSL #0
678 // 2) For 12-bit <= offset <= 24-bit, we use two instructions. One uses
679 // LSL #0, and the other uses LSL #12.
680 //
681 // Most call frames will be allocated at the start of a function so
682 // this is OK, but it is a limitation that needs dealing with.
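      // As a rough illustration, an adjustment of 0x12345 bytes could be
      // split into
      //    sub sp, sp, #0x12, lsl #12
      //    sub sp, sp, #0x345
      // i.e. one LSL #12 piece plus one LSL #0 piece.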
683 assert(Amount > -0xffffff && Amount < 0xffffff && "call frame too large");
684
685 if (TLI->hasInlineStackProbe(MF) &&
687 // When stack probing is enabled, the decrement of SP may need to be
688 // probed. We only need to do this if the call site needs 1024 bytes of
689 // space or more, because a region smaller than that is allowed to be
690 // unprobed at an ABI boundary. We rely on the fact that SP has been
691 // probed exactly at this point, either by the prologue or most recent
692 // dynamic allocation.
694 "non-reserved call frame without var sized objects?");
695 Register ScratchReg =
696 MF.getRegInfo().createVirtualRegister(&AArch64::GPR64RegClass);
697 inlineStackProbeFixed(I, ScratchReg, -Amount, StackOffset::get(0, 0));
698 } else {
699 emitFrameOffset(MBB, I, DL, AArch64::SP, AArch64::SP,
700 StackOffset::getFixed(Amount), TII);
701 }
702 }
703 } else if (CalleePopAmount != 0) {
704 // If the calling convention demands that the callee pops arguments from the
705 // stack, we want to add it back if we have a reserved call frame.
706 assert(CalleePopAmount < 0xffffff && "call frame too large");
707 emitFrameOffset(MBB, I, DL, AArch64::SP, AArch64::SP,
708 StackOffset::getFixed(-(int64_t)CalleePopAmount), TII);
709 }
710 return MBB.erase(I);
711}
712
714 MachineBasicBlock &MBB) const {
715
716 MachineFunction &MF = *MBB.getParent();
717 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
718 const auto &TRI = *Subtarget.getRegisterInfo();
719 const auto &MFI = *MF.getInfo<AArch64FunctionInfo>();
720
721 CFIInstBuilder CFIBuilder(MBB, MBB.begin(), MachineInstr::NoFlags);
722
723 // Reset the CFA to `SP + 0`.
724 CFIBuilder.buildDefCFA(AArch64::SP, 0);
725
726 // Flip the RA sign state.
727 if (MFI.shouldSignReturnAddress(MF))
728 MFI.branchProtectionPAuthLR() ? CFIBuilder.buildNegateRAStateWithPC()
729 : CFIBuilder.buildNegateRAState();
730
731 // Shadow call stack uses X18, reset it.
732 if (MFI.needsShadowCallStackPrologueEpilogue(MF))
733 CFIBuilder.buildSameValue(AArch64::X18);
734
735 // Emit .cfi_same_value for callee-saved registers.
736 const std::vector<CalleeSavedInfo> &CSI =
738 for (const auto &Info : CSI) {
739 MCRegister Reg = Info.getReg();
740 if (!TRI.regNeedsCFI(Reg, Reg))
741 continue;
742 CFIBuilder.buildSameValue(Reg);
743 }
744}
745
747 switch (Reg.id()) {
748 default:
749 // The called routine is expected to preserve r19-r28
750 // r29 and r30 are used as frame pointer and link register resp.
751 return 0;
752
753 // GPRs
754#define CASE(n) \
755 case AArch64::W##n: \
756 case AArch64::X##n: \
757 return AArch64::X##n
758 CASE(0);
759 CASE(1);
760 CASE(2);
761 CASE(3);
762 CASE(4);
763 CASE(5);
764 CASE(6);
765 CASE(7);
766 CASE(8);
767 CASE(9);
768 CASE(10);
769 CASE(11);
770 CASE(12);
771 CASE(13);
772 CASE(14);
773 CASE(15);
774 CASE(16);
775 CASE(17);
776 CASE(18);
777#undef CASE
778
779 // FPRs
780#define CASE(n) \
781 case AArch64::B##n: \
782 case AArch64::H##n: \
783 case AArch64::S##n: \
784 case AArch64::D##n: \
785 case AArch64::Q##n: \
786 return HasSVE ? AArch64::Z##n : AArch64::Q##n
787 CASE(0);
788 CASE(1);
789 CASE(2);
790 CASE(3);
791 CASE(4);
792 CASE(5);
793 CASE(6);
794 CASE(7);
795 CASE(8);
796 CASE(9);
797 CASE(10);
798 CASE(11);
799 CASE(12);
800 CASE(13);
801 CASE(14);
802 CASE(15);
803 CASE(16);
804 CASE(17);
805 CASE(18);
806 CASE(19);
807 CASE(20);
808 CASE(21);
809 CASE(22);
810 CASE(23);
811 CASE(24);
812 CASE(25);
813 CASE(26);
814 CASE(27);
815 CASE(28);
816 CASE(29);
817 CASE(30);
818 CASE(31);
819#undef CASE
820 }
821}
822
823void AArch64FrameLowering::emitZeroCallUsedRegs(BitVector RegsToZero,
824 MachineBasicBlock &MBB) const {
825 // Insertion point.
827
828 // Fake a debug loc.
829 DebugLoc DL;
830 if (MBBI != MBB.end())
831 DL = MBBI->getDebugLoc();
832
833 const MachineFunction &MF = *MBB.getParent();
834 const AArch64Subtarget &STI = MF.getSubtarget<AArch64Subtarget>();
835 const AArch64RegisterInfo &TRI = *STI.getRegisterInfo();
836
837 BitVector GPRsToZero(TRI.getNumRegs());
838 BitVector FPRsToZero(TRI.getNumRegs());
839 bool HasSVE = STI.isSVEorStreamingSVEAvailable();
840 for (MCRegister Reg : RegsToZero.set_bits()) {
841 if (TRI.isGeneralPurposeRegister(MF, Reg)) {
842 // For GPRs, we only care to clear out the 64-bit register.
843 if (MCRegister XReg = getRegisterOrZero(Reg, HasSVE))
844 GPRsToZero.set(XReg);
845 } else if (AArch64InstrInfo::isFpOrNEON(Reg)) {
846 // For FPRs, clear the widest register (Z-reg if SVE is available, Q-reg otherwise).
847 if (MCRegister XReg = getRegisterOrZero(Reg, HasSVE))
848 FPRsToZero.set(XReg);
849 }
850 }
851
852 const AArch64InstrInfo &TII = *STI.getInstrInfo();
853
854 // Zero out GPRs.
855 for (MCRegister Reg : GPRsToZero.set_bits())
856 TII.buildClearRegister(Reg, MBB, MBBI, DL);
857
858 // Zero out FP/vector registers.
859 for (MCRegister Reg : FPRsToZero.set_bits())
860 TII.buildClearRegister(Reg, MBB, MBBI, DL);
861
862 if (HasSVE) {
863 for (MCRegister PReg :
864 {AArch64::P0, AArch64::P1, AArch64::P2, AArch64::P3, AArch64::P4,
865 AArch64::P5, AArch64::P6, AArch64::P7, AArch64::P8, AArch64::P9,
866 AArch64::P10, AArch64::P11, AArch64::P12, AArch64::P13, AArch64::P14,
867 AArch64::P15}) {
868 if (RegsToZero[PReg])
869 BuildMI(MBB, MBBI, DL, TII.get(AArch64::PFALSE), PReg);
870 }
871 }
872}
873
874bool AArch64FrameLowering::windowsRequiresStackProbe(
875 const MachineFunction &MF, uint64_t StackSizeInBytes) const {
876 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
877 const AArch64FunctionInfo &MFI = *MF.getInfo<AArch64FunctionInfo>();
878 // TODO: When implementing stack protectors, take that into account
879 // for the probe threshold.
880 return Subtarget.isTargetWindows() && MFI.hasStackProbing() &&
881 StackSizeInBytes >= uint64_t(MFI.getStackProbeSize());
882}
883
885 const MachineBasicBlock &MBB) {
886 const MachineFunction *MF = MBB.getParent();
887 LiveRegs.addLiveIns(MBB);
888 // Mark callee saved registers as used so we will not choose them.
889 const MCPhysReg *CSRegs = MF->getRegInfo().getCalleeSavedRegs();
890 for (unsigned i = 0; CSRegs[i]; ++i)
891 LiveRegs.addReg(CSRegs[i]);
892}
893
895AArch64FrameLowering::findScratchNonCalleeSaveRegister(MachineBasicBlock *MBB,
896 bool HasCall) const {
897 MachineFunction *MF = MBB->getParent();
898
899 // If MBB is an entry block, use X9 as the scratch register.
900 // However, preserve_none functions may be using X9 to pass arguments, so
901 // for those prefer to pick an available register below.
902 if (&MF->front() == MBB &&
904 return AArch64::X9;
905
906 const AArch64Subtarget &Subtarget = MF->getSubtarget<AArch64Subtarget>();
907 const AArch64RegisterInfo &TRI = *Subtarget.getRegisterInfo();
908 LivePhysRegs LiveRegs(TRI);
909 getLiveRegsForEntryMBB(LiveRegs, *MBB);
910 if (HasCall) {
911 LiveRegs.addReg(AArch64::X16);
912 LiveRegs.addReg(AArch64::X17);
913 LiveRegs.addReg(AArch64::X18);
914 }
915
916 // Prefer X9 since it was historically used for the prologue scratch reg.
917 const MachineRegisterInfo &MRI = MF->getRegInfo();
918 if (LiveRegs.available(MRI, AArch64::X9))
919 return AArch64::X9;
920
921 for (unsigned Reg : AArch64::GPR64RegClass) {
922 if (LiveRegs.available(MRI, Reg))
923 return Reg;
924 }
925 return AArch64::NoRegister;
926}
927
929 const MachineBasicBlock &MBB) const {
930 const MachineFunction *MF = MBB.getParent();
931 MachineBasicBlock *TmpMBB = const_cast<MachineBasicBlock *>(&MBB);
932 const AArch64Subtarget &Subtarget = MF->getSubtarget<AArch64Subtarget>();
933 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
934 const AArch64TargetLowering *TLI = Subtarget.getTargetLowering();
936
937 if (AFI->hasSwiftAsyncContext()) {
938 const AArch64RegisterInfo &TRI = *Subtarget.getRegisterInfo();
939 const MachineRegisterInfo &MRI = MF->getRegInfo();
942 // The StoreSwiftAsyncContext clobbers X16 and X17. Make sure they are
943 // available.
944 if (!LiveRegs.available(MRI, AArch64::X16) ||
945 !LiveRegs.available(MRI, AArch64::X17))
946 return false;
947 }
948
949 // Certain stack probing sequences might clobber flags, so we can't use
950 // the block as a prologue if the flags register is a live-in.
952 MBB.isLiveIn(AArch64::NZCV))
953 return false;
954
955 if (RegInfo->hasStackRealignment(*MF) || TLI->hasInlineStackProbe(*MF))
956 if (findScratchNonCalleeSaveRegister(TmpMBB) == AArch64::NoRegister)
957 return false;
958
959 // May need a scratch register (for the return value) if we need to make a
960 // special call.
961 if (requiresSaveVG(*MF) ||
962 windowsRequiresStackProbe(*MF, std::numeric_limits<uint64_t>::max()))
963 if (findScratchNonCalleeSaveRegister(TmpMBB, true) == AArch64::NoRegister)
964 return false;
965
966 return true;
967}
968
970 const Function &F = MF.getFunction();
971 return MF.getTarget().getMCAsmInfo()->usesWindowsCFI() &&
972 F.needsUnwindTableEntry();
973}
974
975bool AArch64FrameLowering::shouldSignReturnAddressEverywhere(
976 const MachineFunction &MF) const {
977 // FIXME: With WinCFI, extra care should be taken to place SEH_PACSignLR
978 // and SEH_EpilogEnd instructions in the correct order.
980 return false;
982 bool SignReturnAddressAll = AFI->shouldSignReturnAddress(/*SpillsLR=*/false);
983 return SignReturnAddressAll;
984}
985
986// Given a load or a store instruction, generate an appropriate unwinding SEH
987// code on Windows.
989AArch64FrameLowering::insertSEH(MachineBasicBlock::iterator MBBI,
990 const TargetInstrInfo &TII,
991 MachineInstr::MIFlag Flag) const {
992 unsigned Opc = MBBI->getOpcode();
993 MachineBasicBlock *MBB = MBBI->getParent();
994 MachineFunction &MF = *MBB->getParent();
995 DebugLoc DL = MBBI->getDebugLoc();
996 unsigned ImmIdx = MBBI->getNumOperands() - 1;
997 int Imm = MBBI->getOperand(ImmIdx).getImm();
999 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1000 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
1001
1002 switch (Opc) {
1003 default:
1004 report_fatal_error("No SEH Opcode for this instruction");
1005 case AArch64::STR_ZXI:
1006 case AArch64::LDR_ZXI: {
1007 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1008 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveZReg))
1009 .addImm(Reg0)
1010 .addImm(Imm)
1011 .setMIFlag(Flag);
1012 break;
1013 }
1014 case AArch64::STR_PXI:
1015 case AArch64::LDR_PXI: {
1016 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1017 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SavePReg))
1018 .addImm(Reg0)
1019 .addImm(Imm)
1020 .setMIFlag(Flag);
1021 break;
1022 }
1023 case AArch64::LDPDpost:
1024 Imm = -Imm;
1025 [[fallthrough]];
1026 case AArch64::STPDpre: {
1027 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1028 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(2).getReg());
1029 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFRegP_X))
1030 .addImm(Reg0)
1031 .addImm(Reg1)
1032 .addImm(Imm * 8)
1033 .setMIFlag(Flag);
1034 break;
1035 }
1036 case AArch64::LDPXpost:
1037 Imm = -Imm;
1038 [[fallthrough]];
1039 case AArch64::STPXpre: {
1040 Register Reg0 = MBBI->getOperand(1).getReg();
1041 Register Reg1 = MBBI->getOperand(2).getReg();
1042 if (Reg0 == AArch64::FP && Reg1 == AArch64::LR)
1043 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFPLR_X))
1044 .addImm(Imm * 8)
1045 .setMIFlag(Flag);
1046 else
1047 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveRegP_X))
1048 .addImm(RegInfo->getSEHRegNum(Reg0))
1049 .addImm(RegInfo->getSEHRegNum(Reg1))
1050 .addImm(Imm * 8)
1051 .setMIFlag(Flag);
1052 break;
1053 }
1054 case AArch64::LDRDpost:
1055 Imm = -Imm;
1056 [[fallthrough]];
1057 case AArch64::STRDpre: {
1058 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1059 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFReg_X))
1060 .addImm(Reg)
1061 .addImm(Imm)
1062 .setMIFlag(Flag);
1063 break;
1064 }
1065 case AArch64::LDRXpost:
1066 Imm = -Imm;
1067 [[fallthrough]];
1068 case AArch64::STRXpre: {
1069 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1070 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveReg_X))
1071 .addImm(Reg)
1072 .addImm(Imm)
1073 .setMIFlag(Flag);
1074 break;
1075 }
1076 case AArch64::STPDi:
1077 case AArch64::LDPDi: {
1078 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1079 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1080 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFRegP))
1081 .addImm(Reg0)
1082 .addImm(Reg1)
1083 .addImm(Imm * 8)
1084 .setMIFlag(Flag);
1085 break;
1086 }
1087 case AArch64::STPXi:
1088 case AArch64::LDPXi: {
1089 Register Reg0 = MBBI->getOperand(0).getReg();
1090 Register Reg1 = MBBI->getOperand(1).getReg();
1091 if (Reg0 == AArch64::FP && Reg1 == AArch64::LR)
1092 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFPLR))
1093 .addImm(Imm * 8)
1094 .setMIFlag(Flag);
1095 else
1096 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveRegP))
1097 .addImm(RegInfo->getSEHRegNum(Reg0))
1098 .addImm(RegInfo->getSEHRegNum(Reg1))
1099 .addImm(Imm * 8)
1100 .setMIFlag(Flag);
1101 break;
1102 }
1103 case AArch64::STRXui:
1104 case AArch64::LDRXui: {
1105 int Reg = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1106 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveReg))
1107 .addImm(Reg)
1108 .addImm(Imm * 8)
1109 .setMIFlag(Flag);
1110 break;
1111 }
1112 case AArch64::STRDui:
1113 case AArch64::LDRDui: {
1114 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1115 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFReg))
1116 .addImm(Reg)
1117 .addImm(Imm * 8)
1118 .setMIFlag(Flag);
1119 break;
1120 }
1121 case AArch64::STPQi:
1122 case AArch64::LDPQi: {
1123 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1124 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1125 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveAnyRegQP))
1126 .addImm(Reg0)
1127 .addImm(Reg1)
1128 .addImm(Imm * 16)
1129 .setMIFlag(Flag);
1130 break;
1131 }
1132 case AArch64::LDPQpost:
1133 Imm = -Imm;
1134 [[fallthrough]];
1135 case AArch64::STPQpre: {
1136 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1137 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(2).getReg());
1138 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveAnyRegQPX))
1139 .addImm(Reg0)
1140 .addImm(Reg1)
1141 .addImm(Imm * 16)
1142 .setMIFlag(Flag);
1143 break;
1144 }
1145 }
1146 auto I = MBB->insertAfter(MBBI, MIB);
1147 return I;
1148}
1149
1152 if (!AFI->needsDwarfUnwindInfo(MF) || !AFI->hasStreamingModeChanges())
1153 return false;
1154 // For Darwin platforms we don't save VG for non-SVE functions, even if SME
1155 // is enabled with streaming mode changes.
1156 auto &ST = MF.getSubtarget<AArch64Subtarget>();
1157 if (ST.isTargetDarwin())
1158 return ST.hasSVE();
1159 return true;
1160}
1161
1162static bool isTargetWindows(const MachineFunction &MF) {
1164}
1165
1167 MachineFunction &MF) const {
1168 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1169 const TargetInstrInfo *TII = Subtarget.getInstrInfo();
1170
1171 auto EmitSignRA = [&](MachineBasicBlock &MBB) {
1172 DebugLoc DL; // Set debug location to unknown.
1174
1175 BuildMI(MBB, MBBI, DL, TII->get(AArch64::PAUTH_PROLOGUE))
1177 };
1178
1179 auto EmitAuthRA = [&](MachineBasicBlock &MBB) {
1180 DebugLoc DL;
1181 MachineBasicBlock::iterator MBBI = MBB.getFirstTerminator();
1182 if (MBBI != MBB.end())
1183 DL = MBBI->getDebugLoc();
1184
1185 BuildMI(MBB, MBBI, DL, TII->get(AArch64::PAUTH_EPILOGUE))
1187 };
1188
1189 // This should be in sync with PEIImpl::calculateSaveRestoreBlocks.
1190 EmitSignRA(MF.front());
1191 for (MachineBasicBlock &MBB : MF) {
1192 if (MBB.isEHFuncletEntry())
1193 EmitSignRA(MBB);
1194 if (MBB.isReturnBlock())
1195 EmitAuthRA(MBB);
1196 }
1197}
1198
1200 MachineBasicBlock &MBB) const {
1201 AArch64PrologueEmitter PrologueEmitter(MF, MBB, *this);
1202 PrologueEmitter.emitPrologue();
1203}
1204
1206 MachineBasicBlock &MBB) const {
1207 AArch64EpilogueEmitter EpilogueEmitter(MF, MBB, *this);
1208 EpilogueEmitter.emitEpilogue();
1209}
1210
1213 MF.getInfo<AArch64FunctionInfo>()->needsDwarfUnwindInfo(MF);
1214}
1215
1217 return enableCFIFixup(MF) &&
1218 MF.getInfo<AArch64FunctionInfo>()->needsAsyncDwarfUnwindInfo(MF);
1219}
1220
1221/// getFrameIndexReference - Provide a base+offset reference to an FI slot for
1222/// debug info. It's the same as what we use for resolving the code-gen
1223/// references for now. FIXME: This can go wrong when references are
1224/// SP-relative and simple call frames aren't used.
1227 Register &FrameReg) const {
1229 MF, FI, FrameReg,
1230 /*PreferFP=*/
1231 MF.getFunction().hasFnAttribute(Attribute::SanitizeHWAddress) ||
1232 MF.getFunction().hasFnAttribute(Attribute::SanitizeMemTag),
1233 /*ForSimm=*/false);
1234}
1235
1238 int FI) const {
1239 // This function serves to provide a comparable offset from a single reference
1240 // point (the value of SP at function entry) that can be used for analysis,
1241 // e.g. the stack-frame-layout analysis pass. It is not guaranteed to be
1242 // correct for all objects in the presence of VLA-area objects or dynamic
1243 // stack re-alignment.
1244
1245 const auto &MFI = MF.getFrameInfo();
1246
1247 int64_t ObjectOffset = MFI.getObjectOffset(FI);
1248 StackOffset ZPRStackSize = getZPRStackSize(MF);
1249 StackOffset PPRStackSize = getPPRStackSize(MF);
1250 StackOffset SVEStackSize = ZPRStackSize + PPRStackSize;
1251
1252 // For VLA-area objects, just emit an offset at the end of the stack frame.
1253 // Whilst not quite correct, these objects do live at the end of the frame and
1254 // so it is more useful for analysis if the offset reflects this.
1255 if (MFI.isVariableSizedObjectIndex(FI)) {
1256 return StackOffset::getFixed(-((int64_t)MFI.getStackSize())) - SVEStackSize;
1257 }
1258
1259 // This is correct in the absence of any SVE stack objects.
1260 if (!SVEStackSize)
1261 return StackOffset::getFixed(ObjectOffset - getOffsetOfLocalArea());
1262
1263 const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
1264 bool FPAfterSVECalleeSaves =
1266 if (MFI.hasScalableStackID(FI)) {
1267 if (FPAfterSVECalleeSaves &&
1268 -ObjectOffset <= (int64_t)AFI->getSVECalleeSavedStackSize()) {
1269 assert(!AFI->hasSplitSVEObjects() &&
1270 "split-sve-objects not supported with FPAfterSVECalleeSaves");
1271 return StackOffset::getScalable(ObjectOffset);
1272 }
1273 StackOffset AccessOffset{};
1274 // With split SVE objects, the scalable vectors lie below (at lower addresses
1275 // than) the scalable predicates, so we must subtract the size of the predicates.
1276 if (AFI->hasSplitSVEObjects() &&
1277 MFI.getStackID(FI) == TargetStackID::ScalableVector)
1278 AccessOffset = -PPRStackSize;
1279 return AccessOffset +
1280 StackOffset::get(-((int64_t)AFI->getCalleeSavedStackSize()),
1281 ObjectOffset);
1282 }
1283
1284 bool IsFixed = MFI.isFixedObjectIndex(FI);
1285 bool IsCSR =
1286 !IsFixed && ObjectOffset >= -((int)AFI->getCalleeSavedStackSize(MFI));
1287
1288 StackOffset ScalableOffset = {};
1289 if (!IsFixed && !IsCSR) {
1290 ScalableOffset = -SVEStackSize;
1291 } else if (FPAfterSVECalleeSaves && IsCSR) {
1292 ScalableOffset =
1294 }
1295
1296 return StackOffset::getFixed(ObjectOffset) + ScalableOffset;
1297}
1298
1304
1305StackOffset AArch64FrameLowering::getFPOffset(const MachineFunction &MF,
1306 int64_t ObjectOffset) const {
1307 const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
1308 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1309 const Function &F = MF.getFunction();
1310 bool IsWin64 = Subtarget.isCallingConvWin64(F.getCallingConv(), F.isVarArg());
1311 unsigned FixedObject =
1312 getFixedObjectSize(MF, AFI, IsWin64, /*IsFunclet=*/false);
1313 int64_t CalleeSaveSize = AFI->getCalleeSavedStackSize(MF.getFrameInfo());
1314 int64_t FPAdjust =
1315 CalleeSaveSize - AFI->getCalleeSaveBaseToFrameRecordOffset();
1316 return StackOffset::getFixed(ObjectOffset + FixedObject + FPAdjust);
1317}
1318
1319StackOffset AArch64FrameLowering::getStackOffset(const MachineFunction &MF,
1320 int64_t ObjectOffset) const {
1321 const auto &MFI = MF.getFrameInfo();
1322 return StackOffset::getFixed(ObjectOffset + (int64_t)MFI.getStackSize());
1323}
1324
1325// TODO: This function currently does not work for scalable vectors.
1327 int FI) const {
1328 const auto *RegInfo = static_cast<const AArch64RegisterInfo *>(
1329 MF.getSubtarget().getRegisterInfo());
1330 int ObjectOffset = MF.getFrameInfo().getObjectOffset(FI);
1331 return RegInfo->getLocalAddressRegister(MF) == AArch64::FP
1332 ? getFPOffset(MF, ObjectOffset).getFixed()
1333 : getStackOffset(MF, ObjectOffset).getFixed();
1334}
1335
1337 const MachineFunction &MF, int FI, Register &FrameReg, bool PreferFP,
1338 bool ForSimm) const {
1339 const auto &MFI = MF.getFrameInfo();
1340 int64_t ObjectOffset = MFI.getObjectOffset(FI);
1341 bool isFixed = MFI.isFixedObjectIndex(FI);
1342 auto StackID = static_cast<TargetStackID::Value>(MFI.getStackID(FI));
1343 return resolveFrameOffsetReference(MF, ObjectOffset, isFixed, StackID,
1344 FrameReg, PreferFP, ForSimm);
1345}
1346
1348 const MachineFunction &MF, int64_t ObjectOffset, bool isFixed,
1349 TargetStackID::Value StackID, Register &FrameReg, bool PreferFP,
1350 bool ForSimm) const {
1351 const auto &MFI = MF.getFrameInfo();
1352 const auto *RegInfo = static_cast<const AArch64RegisterInfo *>(
1353 MF.getSubtarget().getRegisterInfo());
1354 const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
1355 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1356
1357 int64_t FPOffset = getFPOffset(MF, ObjectOffset).getFixed();
1358 int64_t Offset = getStackOffset(MF, ObjectOffset).getFixed();
1359 bool isCSR =
1360 !isFixed && ObjectOffset >= -((int)AFI->getCalleeSavedStackSize(MFI));
1361 bool isSVE = MFI.isScalableStackID(StackID);
1362
1363 StackOffset ZPRStackSize = getZPRStackSize(MF);
1364 StackOffset PPRStackSize = getPPRStackSize(MF);
1365 StackOffset SVEStackSize = ZPRStackSize + PPRStackSize;
1366
1367 // Use frame pointer to reference fixed objects. Use it for locals if
1368 // there are VLAs or a dynamically realigned SP (and thus the SP isn't
1369 // reliable as a base). Make sure useFPForScavengingIndex() does the
1370 // right thing for the emergency spill slot.
1371 bool UseFP = false;
1372 if (AFI->hasStackFrame() && !isSVE) {
1373 // We shouldn't prefer using the FP to access fixed-sized stack objects when
1374 // there are scalable (SVE) objects in between the FP and the fixed-sized
1375 // objects.
1376 PreferFP &= !SVEStackSize;
1377
1378 // Note: Keeping the following as multiple 'if' statements rather than
1379 // merging to a single expression for readability.
1380 //
1381 // Argument access should always use the FP.
1382 if (isFixed) {
1383 UseFP = hasFP(MF);
1384 } else if (isCSR && RegInfo->hasStackRealignment(MF)) {
1385 // References to the CSR area must use FP if we're re-aligning the stack
1386 // since the dynamically-sized alignment padding is between the SP/BP and
1387 // the CSR area.
1388 assert(hasFP(MF) && "Re-aligned stack must have frame pointer");
1389 UseFP = true;
1390 } else if (hasFP(MF) && !RegInfo->hasStackRealignment(MF)) {
1391 // If the FPOffset is negative and we're producing a signed immediate, we
1392 // have to keep in mind that the available offset range for negative
1393 // offsets is smaller than for positive ones. If an offset is available
1394 // via the FP and the SP, use whichever is closest.
1395 bool FPOffsetFits = !ForSimm || FPOffset >= -256;
1396 PreferFP |= Offset > -FPOffset && !SVEStackSize;
1397
1398 if (FPOffset >= 0) {
1399 // If the FPOffset is positive, that'll always be best, as the SP/BP
1400 // will be even further away.
1401 UseFP = true;
1402 } else if (MFI.hasVarSizedObjects()) {
1403 // If we have variable sized objects, we can use either FP or BP, as the
1404 // SP offset is unknown. We can use the base pointer if we have one and
1405 // FP is not preferred. If not, we're stuck with using FP.
1406 bool CanUseBP = RegInfo->hasBasePointer(MF);
1407 if (FPOffsetFits && CanUseBP) // Both are ok. Pick the best.
1408 UseFP = PreferFP;
1409 else if (!CanUseBP) // Can't use BP. Forced to use FP.
1410 UseFP = true;
1411 // else we can use BP and FP, but the offset from FP won't fit.
1412 // That will make us scavenge registers which we can probably avoid by
1413 // using BP. If it won't fit for BP either, we'll scavenge anyway.
1414 } else if (MF.hasEHFunclets() && !RegInfo->hasBasePointer(MF)) {
1415 // Funclets access the locals contained in the parent's stack frame
1416 // via the frame pointer, so we have to use the FP in the parent
1417 // function.
1418 (void) Subtarget;
1419 assert(Subtarget.isCallingConvWin64(MF.getFunction().getCallingConv(),
1420 MF.getFunction().isVarArg()) &&
1421 "Funclets should only be present on Win64");
1422 UseFP = true;
1423 } else {
1424 // We have the choice between FP and (SP or BP).
1425 if (FPOffsetFits && PreferFP) // If FP is the best fit, use it.
1426 UseFP = true;
1427 }
1428 }
1429 }
1430
1431 assert(
1432 ((isFixed || isCSR) || !RegInfo->hasStackRealignment(MF) || !UseFP) &&
1433 "In the presence of dynamic stack pointer realignment, "
1434 "non-argument/CSR objects cannot be accessed through the frame pointer");
1435
1436 bool FPAfterSVECalleeSaves =
1438
1439 if (isSVE) {
1440 StackOffset FPOffset = StackOffset::get(
1441 -AFI->getCalleeSaveBaseToFrameRecordOffset(), ObjectOffset);
1442 StackOffset SPOffset =
1443 SVEStackSize +
1444 StackOffset::get(MFI.getStackSize() - AFI->getCalleeSavedStackSize(),
1445 ObjectOffset);
1446
1447 // With split SVE objects the ObjectOffset is relative to the split area
1448 // (i.e. the PPR area or ZPR area respectively).
1449 if (AFI->hasSplitSVEObjects() && StackID == TargetStackID::ScalableVector) {
1450 // If we're accessing an SVE vector with split SVE objects...
1451 // - From the FP we need to move down past the PPR area:
1452 FPOffset -= PPRStackSize;
1453 // - From the SP we only need to move up to the ZPR area:
1454 SPOffset -= PPRStackSize;
1455 // Note: `SPOffset = SVEStackSize + ...`, so `-= PPRStackSize` results in
1456 // `SPOffset = ZPRStackSize + ...`.
1457 }
1458
1459 if (FPAfterSVECalleeSaves) {
1461 if (-ObjectOffset <= (int64_t)AFI->getSVECalleeSavedStackSize()) {
1464 }
1465 }
1466
1467 // Always use the FP for SVE spills if available and beneficial.
1468 if (hasFP(MF) && (SPOffset.getFixed() ||
1469 FPOffset.getScalable() < SPOffset.getScalable() ||
1470 RegInfo->hasStackRealignment(MF))) {
1471 FrameReg = RegInfo->getFrameRegister(MF);
1472 return FPOffset;
1473 }
1474 FrameReg = RegInfo->hasBasePointer(MF) ? RegInfo->getBaseRegister()
1475 : (unsigned)AArch64::SP;
1476
1477 return SPOffset;
1478 }
1479
1480 StackOffset SVEAreaOffset = {};
1481 if (FPAfterSVECalleeSaves) {
1482 // In this stack layout, the FP is in between the callee saves and other
1483 // SVE allocations.
1484 StackOffset SVECalleeSavedStack =
1486 if (UseFP) {
1487 if (isFixed)
1488 SVEAreaOffset = SVECalleeSavedStack;
1489 else if (!isCSR)
1490 SVEAreaOffset = SVECalleeSavedStack - SVEStackSize;
1491 } else {
1492 if (isFixed)
1493 SVEAreaOffset = SVEStackSize;
1494 else if (isCSR)
1495 SVEAreaOffset = SVEStackSize - SVECalleeSavedStack;
1496 }
1497 } else {
1498 if (UseFP && !(isFixed || isCSR))
1499 SVEAreaOffset = -SVEStackSize;
1500 if (!UseFP && (isFixed || isCSR))
1501 SVEAreaOffset = SVEStackSize;
1502 }
1503
1504 if (UseFP) {
1505 FrameReg = RegInfo->getFrameRegister(MF);
1506 return StackOffset::getFixed(FPOffset) + SVEAreaOffset;
1507 }
1508
1509 // Use the base pointer if we have one.
1510 if (RegInfo->hasBasePointer(MF))
1511 FrameReg = RegInfo->getBaseRegister();
1512 else {
1513 assert(!MFI.hasVarSizedObjects() &&
1514 "Can't use SP when we have var sized objects.");
1515 FrameReg = AArch64::SP;
1516 // If we're using the red zone for this function, the SP won't actually
1517 // be adjusted, so the offsets will be negative. They're also all
1518 // within range of the signed 9-bit immediate instructions.
1519 if (canUseRedZone(MF))
1520 Offset -= AFI->getLocalStackSize();
1521 }
1522
1523 return StackOffset::getFixed(Offset) + SVEAreaOffset;
1524}
1525
1526static unsigned getPrologueDeath(MachineFunction &MF, unsigned Reg) {
1527 // Do not set a kill flag on values that are also marked as live-in. This
1528 // happens with the @llvm.returnaddress intrinsic and with arguments passed in
1529 // callee saved registers.
1530 // Omitting the kill flags is conservatively correct even if the live-in
1531 // is not used after all.
1532 bool IsLiveIn = MF.getRegInfo().isLiveIn(Reg);
1533 return getKillRegState(!IsLiveIn);
1534}
1535
1537 MachineFunction &MF) {
1538 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1539 AttributeList Attrs = MF.getFunction().getAttributes();
1541 return Subtarget.isTargetMachO() &&
1542 !(Subtarget.getTargetLowering()->supportSwiftError() &&
1543 Attrs.hasAttrSomewhere(Attribute::SwiftError)) &&
1545 !AFL.requiresSaveVG(MF) && !AFI->isSVECC();
1546}
1547
1548static bool invalidateWindowsRegisterPairing(unsigned Reg1, unsigned Reg2,
1549 bool NeedsWinCFI, bool IsFirst,
1550 const TargetRegisterInfo *TRI) {
1551 // If we are generating register pairs for a Windows function that requires
1552 // EH support, then pair consecutive registers only. There are no unwind
1553 // opcodes for saves/restores of non-consecutive register pairs.
1554 // The unwind opcodes are save_regp, save_regp_x, save_fregp, save_fregp_x,
1555 // save_lrpair.
1556 // https://docs.microsoft.com/en-us/cpp/build/arm64-exception-handling
1557
1558 if (Reg2 == AArch64::FP)
1559 return true;
1560 if (!NeedsWinCFI)
1561 return false;
1562 if (TRI->getEncodingValue(Reg2) == TRI->getEncodingValue(Reg1) + 1)
1563 return false;
1564 // If pairing a GPR with LR, the pair can be described by the save_lrpair
1565 // opcode. If this is the first register pair, it would end up with a
1566 // predecrement, but there's no save_lrpair_x opcode, so we can only do this
1567 // if LR is paired with something other than the first register.
1568 // The save_lrpair opcode requires the first register to be an odd one.
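  // As a rough illustration, the pair (x21, lr) can be described by
  // save_lrpair provided it is not the first pair, while (x20, lr) cannot,
  // since x20 is an even-numbered register.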
1569 if (Reg1 >= AArch64::X19 && Reg1 <= AArch64::X27 &&
1570 (Reg1 - AArch64::X19) % 2 == 0 && Reg2 == AArch64::LR && !IsFirst)
1571 return false;
1572 return true;
1573}
1574
1575/// Returns true if Reg1 and Reg2 cannot be paired using a ldp/stp instruction.
1576/// WindowsCFI requires that only consecutive registers can be paired.
1577/// LR and FP need to be allocated together when the frame needs to save
1578/// the frame-record. This means any other register pairing with LR is invalid.
1579static bool invalidateRegisterPairing(unsigned Reg1, unsigned Reg2,
1580 bool UsesWinAAPCS, bool NeedsWinCFI,
1581 bool NeedsFrameRecord, bool IsFirst,
1582 const TargetRegisterInfo *TRI) {
1583 if (UsesWinAAPCS)
1584 return invalidateWindowsRegisterPairing(Reg1, Reg2, NeedsWinCFI, IsFirst,
1585 TRI);
1586
1587 // If we need to store the frame record, don't pair any register
1588 // with LR other than FP.
1589 if (NeedsFrameRecord)
1590 return Reg2 == AArch64::LR;
1591
1592 return false;
1593}
1594
1595namespace {
1596
1597struct RegPairInfo {
1598 unsigned Reg1 = AArch64::NoRegister;
1599 unsigned Reg2 = AArch64::NoRegister;
1600 int FrameIdx;
1601 int Offset;
1602 enum RegType { GPR, FPR64, FPR128, PPR, ZPR, VG } Type;
1603 const TargetRegisterClass *RC;
1604
1605 RegPairInfo() = default;
1606
1607 bool isPaired() const { return Reg2 != AArch64::NoRegister; }
1608
1609 bool isScalable() const { return Type == PPR || Type == ZPR; }
1610};
1611
1612} // end anonymous namespace
1613
1614unsigned findFreePredicateReg(BitVector &SavedRegs) {
1615 for (unsigned PReg = AArch64::P8; PReg <= AArch64::P15; ++PReg) {
1616 if (SavedRegs.test(PReg)) {
1617 unsigned PNReg = PReg - AArch64::P0 + AArch64::PN0;
1618 return PNReg;
1619 }
1620 }
1621 return AArch64::NoRegister;
1622}
1623
1624// The multivector LD/ST instructions are available only for SME or SVE2p1 targets.
1626 MachineFunction &MF) {
1628 return false;
1629
1630 SMEAttrs FuncAttrs = MF.getInfo<AArch64FunctionInfo>()->getSMEFnAttrs();
1631 bool IsLocallyStreaming =
1632 FuncAttrs.hasStreamingBody() && !FuncAttrs.hasStreamingInterface();
1633
1634 // SME2 instructions can only be used safely while in streaming mode.
1635 // It is not safe to use them in streaming-compatible or locally streaming
1636 // functions.
1637 return Subtarget.hasSVE2p1() ||
1638 (Subtarget.hasSME2() &&
1639 (!IsLocallyStreaming && Subtarget.isStreaming()));
1640}
1641
1642static void computeCalleeSaveRegisterPairs(const AArch64FrameLowering &AFL,
1643 MachineFunction &MF,
1644 ArrayRef<CalleeSavedInfo> CSI,
1645 const TargetRegisterInfo *TRI,
1646 SmallVectorImpl<RegPairInfo> &RegPairs,
1647 bool NeedsFrameRecord) {
1648
1649 if (CSI.empty())
1650 return;
1651
1652 bool IsWindows = isTargetWindows(MF);
1653 bool NeedsWinCFI = AFL.needsWinCFI(MF);
1654 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
1655 unsigned StackHazardSize = getStackHazardSize(MF);
1656 MachineFrameInfo &MFI = MF.getFrameInfo();
1657 CallingConv::ID CC = MF.getFunction().getCallingConv();
1658 unsigned Count = CSI.size();
1659 (void)CC;
1660 // MachO's compact unwind format relies on all registers being stored in
1661 // pairs.
1662 assert((!produceCompactUnwindFrame(AFL, MF) ||
1665 (Count & 1) == 0) &&
1666 "Odd number of callee-saved regs to spill!");
1667 int ByteOffset = AFI->getCalleeSavedStackSize();
1668 int StackFillDir = -1;
1669 int RegInc = 1;
1670 unsigned FirstReg = 0;
1671 if (NeedsWinCFI) {
1672 // For WinCFI, fill the stack from the bottom up.
1673 ByteOffset = 0;
1674 StackFillDir = 1;
1675 // As the CSI array is reversed to match PrologEpilogInserter, iterate
1676 // backwards, to pair up registers starting from lower numbered registers.
1677 RegInc = -1;
1678 FirstReg = Count - 1;
1679 }
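 // Illustrative example (two GPR pairs, 32-byte area, no padding): the
 // default top-down fill gives the first CSI pair byte offset 16 and the
 // second pair offset 0; with WinCFI the CSI list is walked in reverse and
 // offsets are assigned bottom-up starting at 0.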
1680
1681 bool FPAfterSVECalleeSaves = IsWindows && AFI->getSVECalleeSavedStackSize();
1682
1683 int ZPRByteOffset = 0;
1684 int PPRByteOffset = 0;
1685 bool SplitPPRs = AFI->hasSplitSVEObjects();
1686 if (SplitPPRs) {
1687 ZPRByteOffset = AFI->getZPRCalleeSavedStackSize();
1688 PPRByteOffset = AFI->getPPRCalleeSavedStackSize();
1689 } else if (!FPAfterSVECalleeSaves) {
1690 ZPRByteOffset =
1691 AFI->getZPRCalleeSavedStackSize() + AFI->getPPRCalleeSavedStackSize();
1692 // Unused: Everything goes in ZPR space.
1693 PPRByteOffset = 0;
1694 }
1695
1696 bool NeedGapToAlignStack = AFI->hasCalleeSaveStackFreeSpace();
1697 Register LastReg = 0;
1698 bool HasCSHazardPadding = AFI->hasStackHazardSlotIndex() && !SplitPPRs;
1699
1700 // When iterating backwards, the loop condition relies on unsigned wraparound.
1701 for (unsigned i = FirstReg; i < Count; i += RegInc) {
1702 RegPairInfo RPI;
1703 RPI.Reg1 = CSI[i].getReg();
1704
1705 if (AArch64::GPR64RegClass.contains(RPI.Reg1)) {
1706 RPI.Type = RegPairInfo::GPR;
1707 RPI.RC = &AArch64::GPR64RegClass;
1708 } else if (AArch64::FPR64RegClass.contains(RPI.Reg1)) {
1709 RPI.Type = RegPairInfo::FPR64;
1710 RPI.RC = &AArch64::FPR64RegClass;
1711 } else if (AArch64::FPR128RegClass.contains(RPI.Reg1)) {
1712 RPI.Type = RegPairInfo::FPR128;
1713 RPI.RC = &AArch64::FPR128RegClass;
1714 } else if (AArch64::ZPRRegClass.contains(RPI.Reg1)) {
1715 RPI.Type = RegPairInfo::ZPR;
1716 RPI.RC = &AArch64::ZPRRegClass;
1717 } else if (AArch64::PPRRegClass.contains(RPI.Reg1)) {
1718 RPI.Type = RegPairInfo::PPR;
1719 RPI.RC = &AArch64::PPRRegClass;
1720 } else if (RPI.Reg1 == AArch64::VG) {
1721 RPI.Type = RegPairInfo::VG;
1722 RPI.RC = &AArch64::FIXED_REGSRegClass;
1723 } else {
1724 llvm_unreachable("Unsupported register class.");
1725 }
1726
1727 int &ScalableByteOffset = RPI.Type == RegPairInfo::PPR && SplitPPRs
1728 ? PPRByteOffset
1729 : ZPRByteOffset;
1730
1731 // Add the stack hazard size as we transition from GPR->FPR CSRs.
1732 if (HasCSHazardPadding &&
1733 (!LastReg || !AArch64InstrInfo::isFpOrNEON(LastReg)) &&
1734 AArch64InstrInfo::isFpOrNEON(RPI.Reg1))
1735 ByteOffset += StackFillDir * StackHazardSize;
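 // E.g. with -aarch64-stack-hazard-size=1024 the running offset skips an
 // extra 1024 bytes at the first FPR/NEON CSR, so GPR saves and FPR saves
 // never share the padded region (illustrative).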
1736 LastReg = RPI.Reg1;
1737
1738 int Scale = TRI->getSpillSize(*RPI.RC);
1739 // Add the next reg to the pair if it is in the same register class.
1740 if (unsigned(i + RegInc) < Count && !HasCSHazardPadding) {
1741 MCRegister NextReg = CSI[i + RegInc].getReg();
1742 bool IsFirst = i == FirstReg;
1743 switch (RPI.Type) {
1744 case RegPairInfo::GPR:
1745 if (AArch64::GPR64RegClass.contains(NextReg) &&
1746 !invalidateRegisterPairing(RPI.Reg1, NextReg, IsWindows,
1747 NeedsWinCFI, NeedsFrameRecord, IsFirst,
1748 TRI))
1749 RPI.Reg2 = NextReg;
1750 break;
1751 case RegPairInfo::FPR64:
1752 if (AArch64::FPR64RegClass.contains(NextReg) &&
1753 !invalidateWindowsRegisterPairing(RPI.Reg1, NextReg, NeedsWinCFI,
1754 IsFirst, TRI))
1755 RPI.Reg2 = NextReg;
1756 break;
1757 case RegPairInfo::FPR128:
1758 if (AArch64::FPR128RegClass.contains(NextReg))
1759 RPI.Reg2 = NextReg;
1760 break;
1761 case RegPairInfo::PPR:
1762 break;
1763 case RegPairInfo::ZPR:
1764 if (AFI->getPredicateRegForFillSpill() != 0 &&
1765 ((RPI.Reg1 - AArch64::Z0) & 1) == 0 && (NextReg == RPI.Reg1 + 1)) {
1766 // Calculate offset of register pair to see if pair instruction can be
1767 // used.
1768 int Offset = (ScalableByteOffset + StackFillDir * 2 * Scale) / Scale;
1769 if ((-16 <= Offset && Offset <= 14) && (Offset % 2 == 0))
1770 RPI.Reg2 = NextReg;
1771 }
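 // Illustrative: with 64 bytes of scalable CS area left and a downward
 // fill, Offset == (64 - 2 * 16) / 16 == 2, which is even and within
 // [-16, 14], so e.g. (z8, z9) can be stored with a single two-vector ST1B.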
1772 break;
1773 case RegPairInfo::VG:
1774 break;
1775 }
1776 }
1777
1778 // GPRs and FPRs are saved in pairs of 64-bit regs. We expect the CSI
1779 // list to come in sorted by frame index so that we can issue the store
1780 // pair instructions directly. Assert if we see anything out of order.
1781 //
1782 // The order of the registers in the list is controlled by
1783 // getCalleeSavedRegs(), so they will always be in-order, as well.
1784 assert((!RPI.isPaired() ||
1785 (CSI[i].getFrameIdx() + RegInc == CSI[i + RegInc].getFrameIdx())) &&
1786 "Out of order callee saved regs!");
1787
1788 assert((!RPI.isPaired() || !NeedsFrameRecord || RPI.Reg2 != AArch64::FP ||
1789 RPI.Reg1 == AArch64::LR) &&
1790 "FrameRecord must be allocated together with LR");
1791
1792 // Windows AAPCS has FP and LR reversed.
1793 assert((!RPI.isPaired() || !NeedsFrameRecord || RPI.Reg1 != AArch64::FP ||
1794 RPI.Reg2 == AArch64::LR) &&
1795 "FrameRecord must be allocated together with LR");
1796
1797 // MachO's compact unwind format relies on all registers being stored in
1798 // adjacent register pairs.
1799 assert((!produceCompactUnwindFrame(AFL, MF) ||
1802 (RPI.isPaired() &&
1803 ((RPI.Reg1 == AArch64::LR && RPI.Reg2 == AArch64::FP) ||
1804 RPI.Reg1 + 1 == RPI.Reg2))) &&
1805 "Callee-save registers not saved as adjacent register pair!");
1806
1807 RPI.FrameIdx = CSI[i].getFrameIdx();
1808 if (NeedsWinCFI &&
1809 RPI.isPaired()) // RPI.FrameIdx must be the lower index of the pair
1810 RPI.FrameIdx = CSI[i + RegInc].getFrameIdx();
1811
1812 // Realign the scalable offset if necessary. This is relevant when
1813 // spilling predicates on Windows.
1814 if (RPI.isScalable() && ScalableByteOffset % Scale != 0) {
1815 ScalableByteOffset = alignTo(ScalableByteOffset, Scale);
1816 }
1817
1818 int OffsetPre = RPI.isScalable() ? ScalableByteOffset : ByteOffset;
1819 assert(OffsetPre % Scale == 0);
1820
1821 if (RPI.isScalable())
1822 ScalableByteOffset += StackFillDir * (RPI.isPaired() ? 2 * Scale : Scale);
1823 else
1824 ByteOffset += StackFillDir * (RPI.isPaired() ? 2 * Scale : Scale);
1825
1826 // Swift's async context is directly before FP, so allocate an extra
1827 // 8 bytes for it.
1828 if (NeedsFrameRecord && AFI->hasSwiftAsyncContext() &&
1829 ((!IsWindows && RPI.Reg2 == AArch64::FP) ||
1830 (IsWindows && RPI.Reg2 == AArch64::LR)))
1831 ByteOffset += StackFillDir * 8;
1832
1833 // Round up size of non-pair to pair size if we need to pad the
1834 // callee-save area to ensure 16-byte alignment.
1835 if (NeedGapToAlignStack && !NeedsWinCFI && !RPI.isScalable() &&
1836 RPI.Type != RegPairInfo::FPR128 && !RPI.isPaired() &&
1837 ByteOffset % 16 != 0) {
1838 ByteOffset += 8 * StackFillDir;
1839 assert(MFI.getObjectAlign(RPI.FrameIdx) <= Align(16));
1840 // A stack frame with a gap looks like this, bottom up:
1841 // d9, d8. x21, gap, x20, x19.
1842 // Set extra alignment on the x21 object to create the gap above it.
1843 MFI.setObjectAlignment(RPI.FrameIdx, Align(16));
1844 NeedGapToAlignStack = false;
1845 }
1846
1847 int OffsetPost = RPI.isScalable() ? ScalableByteOffset : ByteOffset;
1848 assert(OffsetPost % Scale == 0);
1849 // If filling top down (default), we want the offset after incrementing it.
1850 // If filling bottom up (WinCFI) we need the original offset.
1851 int Offset = NeedsWinCFI ? OffsetPre : OffsetPost;
1852
1853 // The FP, LR pair goes 8 bytes into our expanded 24-byte slot so that the
1854 // Swift context can directly precede FP.
1855 if (NeedsFrameRecord && AFI->hasSwiftAsyncContext() &&
1856 ((!IsWindows && RPI.Reg2 == AArch64::FP) ||
1857 (IsWindows && RPI.Reg2 == AArch64::LR)))
1858 Offset += 8;
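 // I.e. the expanded slot is 24 bytes: the async context occupies the lowest
 // 8 bytes and the fp/lr pair is stored 8 bytes above the slot base
 // (illustrative).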
1859 RPI.Offset = Offset / Scale;
1860
1861 assert((!RPI.isPaired() ||
1862 (!RPI.isScalable() && RPI.Offset >= -64 && RPI.Offset <= 63) ||
1863 (RPI.isScalable() && RPI.Offset >= -256 && RPI.Offset <= 255)) &&
1864 "Offset out of bounds for LDP/STP immediate");
1865
1866 auto isFrameRecord = [&] {
1867 if (RPI.isPaired())
1868 return IsWindows ? RPI.Reg1 == AArch64::FP && RPI.Reg2 == AArch64::LR
1869 : RPI.Reg1 == AArch64::LR && RPI.Reg2 == AArch64::FP;
1870 // Otherwise, look for the frame record as two unpaired registers. This is
1871 // needed for -aarch64-stack-hazard-size=<val>, which disables register
1872 // pairing (as the padding may be too large for the LDP/STP offset). Note:
1873 // On Windows, this check works out as current reg == FP, next reg == LR,
1874 // and on other platforms current reg == FP, previous reg == LR. This
1875 // works out as the correct pre-increment or post-increment offsets
1876 // respectively.
1877 return i > 0 && RPI.Reg1 == AArch64::FP &&
1878 CSI[i - 1].getReg() == AArch64::LR;
1879 };
1880
1881 // Save the offset to frame record so that the FP register can point to the
1882 // innermost frame record (spilled FP and LR registers).
1883 if (NeedsFrameRecord && isFrameRecord())
1884 AFI->setCalleeSaveBaseToFrameRecordOffset(Offset);
1885
1886 RegPairs.push_back(RPI);
1887 if (RPI.isPaired())
1888 i += RegInc;
1889 }
1890 if (NeedsWinCFI) {
1891 // If we need an alignment gap in the stack, align the topmost stack
1892 // object. A stack frame with a gap looks like this, bottom up:
1893 // x19, d8. d9, gap.
1894 // Set extra alignment on the topmost stack object (the first element in
1895 // CSI, which goes top down), to create the gap above it.
1896 if (AFI->hasCalleeSaveStackFreeSpace())
1897 MFI.setObjectAlignment(CSI[0].getFrameIdx(), Align(16));
1898 // We iterated bottom up over the registers; flip RegPairs back to top
1899 // down order.
1900 std::reverse(RegPairs.begin(), RegPairs.end());
1901 }
1902}
1903
1907 MachineFunction &MF = *MBB.getParent();
1908 auto &TLI = *MF.getSubtarget<AArch64Subtarget>().getTargetLowering();
1910 bool NeedsWinCFI = needsWinCFI(MF);
1911 DebugLoc DL;
1912 SmallVector<RegPairInfo, 8> RegPairs;
1913
1914 computeCalleeSaveRegisterPairs(*this, MF, CSI, TRI, RegPairs, hasFP(MF));
1915
1916 MachineRegisterInfo &MRI = MF.getRegInfo();
1917 // Refresh the reserved regs in case there are any potential changes since the
1918 // last freeze.
1919 MRI.freezeReservedRegs();
1920
1921 if (homogeneousPrologEpilog(MF)) {
1922 auto MIB = BuildMI(MBB, MI, DL, TII.get(AArch64::HOM_Prolog))
1924
1925 for (auto &RPI : RegPairs) {
1926 MIB.addReg(RPI.Reg1);
1927 MIB.addReg(RPI.Reg2);
1928
1929 // Update register live in.
1930 if (!MRI.isReserved(RPI.Reg1))
1931 MBB.addLiveIn(RPI.Reg1);
1932 if (RPI.isPaired() && !MRI.isReserved(RPI.Reg2))
1933 MBB.addLiveIn(RPI.Reg2);
1934 }
1935 return true;
1936 }
1937 bool PTrueCreated = false;
1938 for (const RegPairInfo &RPI : llvm::reverse(RegPairs)) {
1939 unsigned Reg1 = RPI.Reg1;
1940 unsigned Reg2 = RPI.Reg2;
1941 unsigned StrOpc;
1942
1943 // Issue sequence of spills for cs regs. The first spill may be converted
1944 // to a pre-decrement store later by emitPrologue if the callee-save stack
1945 // area allocation can't be combined with the local stack area allocation.
1946 // For example:
1947 // stp x22, x21, [sp, #0] // addImm(+0)
1948 // stp x20, x19, [sp, #16] // addImm(+2)
1949 // stp fp, lr, [sp, #32] // addImm(+4)
1950 // Rationale: This sequence saves uop updates compared to a sequence of
1951 // pre-increment spills like stp xi,xj,[sp,#-16]!
1952 // Note: Similar rationale and sequence for restores in epilog.
1953 unsigned Size = TRI->getSpillSize(*RPI.RC);
1954 Align Alignment = TRI->getSpillAlign(*RPI.RC);
1955 switch (RPI.Type) {
1956 case RegPairInfo::GPR:
1957 StrOpc = RPI.isPaired() ? AArch64::STPXi : AArch64::STRXui;
1958 break;
1959 case RegPairInfo::FPR64:
1960 StrOpc = RPI.isPaired() ? AArch64::STPDi : AArch64::STRDui;
1961 break;
1962 case RegPairInfo::FPR128:
1963 StrOpc = RPI.isPaired() ? AArch64::STPQi : AArch64::STRQui;
1964 break;
1965 case RegPairInfo::ZPR:
1966 StrOpc = RPI.isPaired() ? AArch64::ST1B_2Z_IMM : AArch64::STR_ZXI;
1967 break;
1968 case RegPairInfo::PPR:
1969 StrOpc =
1970 Size == 16 ? AArch64::SPILL_PPR_TO_ZPR_SLOT_PSEUDO : AArch64::STR_PXI;
1971 break;
1972 case RegPairInfo::VG:
1973 StrOpc = AArch64::STRXui;
1974 break;
1975 }
1976
1977 unsigned X0Scratch = AArch64::NoRegister;
1978 auto RestoreX0 = make_scope_exit([&] {
1979 if (X0Scratch != AArch64::NoRegister)
1980 BuildMI(MBB, MI, DL, TII.get(TargetOpcode::COPY), AArch64::X0)
1981 .addReg(X0Scratch)
1983 });
1984
1985 if (Reg1 == AArch64::VG) {
1986 // Find an available register to store value of VG to.
1987 Reg1 = findScratchNonCalleeSaveRegister(&MBB, true);
1988 assert(Reg1 != AArch64::NoRegister);
1989 if (MF.getSubtarget<AArch64Subtarget>().hasSVE()) {
1990 BuildMI(MBB, MI, DL, TII.get(AArch64::CNTD_XPiI), Reg1)
1991 .addImm(31)
1992 .addImm(1)
1994 } else {
1996 if (any_of(MBB.liveins(),
1997 [&STI](const MachineBasicBlock::RegisterMaskPair &LiveIn) {
1998 return STI.getRegisterInfo()->isSuperOrSubRegisterEq(
1999 AArch64::X0, LiveIn.PhysReg);
2000 })) {
2001 X0Scratch = Reg1;
2002 BuildMI(MBB, MI, DL, TII.get(TargetOpcode::COPY), X0Scratch)
2003 .addReg(AArch64::X0)
2005 }
2006
2007 RTLIB::Libcall LC = RTLIB::SMEABI_GET_CURRENT_VG;
2008 const uint32_t *RegMask =
2009 TRI->getCallPreservedMask(MF, TLI.getLibcallCallingConv(LC));
2010 BuildMI(MBB, MI, DL, TII.get(AArch64::BL))
2011 .addExternalSymbol(TLI.getLibcallName(LC))
2012 .addRegMask(RegMask)
2013 .addReg(AArch64::X0, RegState::ImplicitDefine)
2015 Reg1 = AArch64::X0;
2016 }
2017 }
2018
2019 LLVM_DEBUG(dbgs() << "CSR spill: (" << printReg(Reg1, TRI);
2020 if (RPI.isPaired()) dbgs() << ", " << printReg(Reg2, TRI);
2021 dbgs() << ") -> fi#(" << RPI.FrameIdx;
2022 if (RPI.isPaired()) dbgs() << ", " << RPI.FrameIdx + 1;
2023 dbgs() << ")\n");
2024
2025 assert((!NeedsWinCFI || !(Reg1 == AArch64::LR && Reg2 == AArch64::FP)) &&
2026 "Windows unwdinding requires a consecutive (FP,LR) pair");
2027 // Windows unwind codes require consecutive registers if registers are
2028 // paired. Make the switch here, so that the code below will save (x,x+1)
2029 // and not (x+1,x).
2030 unsigned FrameIdxReg1 = RPI.FrameIdx;
2031 unsigned FrameIdxReg2 = RPI.FrameIdx + 1;
2032 if (NeedsWinCFI && RPI.isPaired()) {
2033 std::swap(Reg1, Reg2);
2034 std::swap(FrameIdxReg1, FrameIdxReg2);
2035 }
2036
2037 if (RPI.isPaired() && RPI.isScalable()) {
2038 [[maybe_unused]] const AArch64Subtarget &Subtarget =
2041 unsigned PnReg = AFI->getPredicateRegForFillSpill();
2042 assert((PnReg != 0 && enableMultiVectorSpillFill(Subtarget, MF)) &&
2043 "Expects SVE2.1 or SME2 target and a predicate register");
2044#ifdef EXPENSIVE_CHECKS
2045 auto IsPPR = [](const RegPairInfo &c) {
2046 return c.Reg1 == RegPairInfo::PPR;
2047 };
2048 auto PPRBegin = std::find_if(RegPairs.begin(), RegPairs.end(), IsPPR);
2049 auto IsZPR = [](const RegPairInfo &c) {
2050 return c.Type == RegPairInfo::ZPR;
2051 };
2052 auto ZPRBegin = std::find_if(RegPairs.begin(), RegPairs.end(), IsZPR);
2053 assert(!(PPRBegin < ZPRBegin) &&
2054 "Expected callee save predicate to be handled first");
2055#endif
2056 if (!PTrueCreated) {
2057 PTrueCreated = true;
2058 BuildMI(MBB, MI, DL, TII.get(AArch64::PTRUE_C_B), PnReg)
2060 }
2061 MachineInstrBuilder MIB = BuildMI(MBB, MI, DL, TII.get(StrOpc));
2062 if (!MRI.isReserved(Reg1))
2063 MBB.addLiveIn(Reg1);
2064 if (!MRI.isReserved(Reg2))
2065 MBB.addLiveIn(Reg2);
2066 MIB.addReg(/*PairRegs*/ AArch64::Z0_Z1 + (RPI.Reg1 - AArch64::Z0));
2067 MIB.addMemOperand(MF.getMachineMemOperand(
2068 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
2069 MachineMemOperand::MOStore, Size, Alignment));
2070 MIB.addReg(PnReg);
2071 MIB.addReg(AArch64::SP)
2072 .addImm(RPI.Offset / 2) // [sp, #imm*2*vscale],
2073 // where 2*vscale is implicit
2076 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
2077 MachineMemOperand::MOStore, Size, Alignment));
2078 if (NeedsWinCFI)
2079 insertSEH(MIB, TII, MachineInstr::FrameSetup);
2080 } else { // The case when a pair of ZRegs is not present
2081 MachineInstrBuilder MIB = BuildMI(MBB, MI, DL, TII.get(StrOpc));
2082 if (!MRI.isReserved(Reg1))
2083 MBB.addLiveIn(Reg1);
2084 if (RPI.isPaired()) {
2085 if (!MRI.isReserved(Reg2))
2086 MBB.addLiveIn(Reg2);
2087 MIB.addReg(Reg2, getPrologueDeath(MF, Reg2));
2088 MIB.addMemOperand(MF.getMachineMemOperand(
2089 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
2090 MachineMemOperand::MOStore, Size, Alignment));
2091 }
2092 MIB.addReg(Reg1, getPrologueDeath(MF, Reg1))
2093 .addReg(AArch64::SP)
2094 .addImm(RPI.Offset) // [sp, #offset*vscale],
2095 // where factor*vscale is implicit
2098 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
2099 MachineMemOperand::MOStore, Size, Alignment));
2100 if (NeedsWinCFI)
2101 insertSEH(MIB, TII, MachineInstr::FrameSetup);
2102 }
2103 // Update the StackIDs of the SVE stack slots.
2104 MachineFrameInfo &MFI = MF.getFrameInfo();
2105 if (RPI.Type == RegPairInfo::ZPR) {
2106 MFI.setStackID(FrameIdxReg1, TargetStackID::ScalableVector);
2107 if (RPI.isPaired())
2108 MFI.setStackID(FrameIdxReg2, TargetStackID::ScalableVector);
2109 } else if (RPI.Type == RegPairInfo::PPR) {
2110 MFI.setStackID(FrameIdxReg1, TargetStackID::ScalablePredicateVector);
2111 if (RPI.isPaired())
2112 MFI.setStackID(FrameIdxReg2, TargetStackID::ScalablePredicateVector);
2113 }
2114 }
2115 return true;
2116}
2117
2121 MachineFunction &MF = *MBB.getParent();
2123 DebugLoc DL;
2125 bool NeedsWinCFI = needsWinCFI(MF);
2126
2127 if (MBBI != MBB.end())
2128 DL = MBBI->getDebugLoc();
2129
2130 computeCalleeSaveRegisterPairs(*this, MF, CSI, TRI, RegPairs, hasFP(MF));
2131 if (homogeneousPrologEpilog(MF, &MBB)) {
2132 auto MIB = BuildMI(MBB, MBBI, DL, TII.get(AArch64::HOM_Epilog))
2134 for (auto &RPI : RegPairs) {
2135 MIB.addReg(RPI.Reg1, RegState::Define);
2136 MIB.addReg(RPI.Reg2, RegState::Define);
2137 }
2138 return true;
2139 }
2140
2141 // For performance reasons restore SVE registers in increasing order
2142 auto IsPPR = [](const RegPairInfo &c) { return c.Type == RegPairInfo::PPR; };
2143 auto PPRBegin = llvm::find_if(RegPairs, IsPPR);
2144 auto PPREnd = std::find_if_not(PPRBegin, RegPairs.end(), IsPPR);
2145 std::reverse(PPRBegin, PPREnd);
2146 auto IsZPR = [](const RegPairInfo &c) { return c.Type == RegPairInfo::ZPR; };
2147 auto ZPRBegin = llvm::find_if(RegPairs, IsZPR);
2148 auto ZPREnd = std::find_if_not(ZPRBegin, RegPairs.end(), IsZPR);
2149 std::reverse(ZPRBegin, ZPREnd);
2150
2151 bool PTrueCreated = false;
2152 for (const RegPairInfo &RPI : RegPairs) {
2153 unsigned Reg1 = RPI.Reg1;
2154 unsigned Reg2 = RPI.Reg2;
2155
2156 // Issue sequence of restores for cs regs. The last restore may be converted
2157 // to a post-increment load later by emitEpilogue if the callee-save stack
2158 // area allocation can't be combined with the local stack area allocation.
2159 // For example:
2160 // ldp fp, lr, [sp, #32] // addImm(+4)
2161 // ldp x20, x19, [sp, #16] // addImm(+2)
2162 // ldp x22, x21, [sp, #0] // addImm(+0)
2163 // Note: see comment in spillCalleeSavedRegisters()
2164 unsigned LdrOpc;
2165 unsigned Size = TRI->getSpillSize(*RPI.RC);
2166 Align Alignment = TRI->getSpillAlign(*RPI.RC);
2167 switch (RPI.Type) {
2168 case RegPairInfo::GPR:
2169 LdrOpc = RPI.isPaired() ? AArch64::LDPXi : AArch64::LDRXui;
2170 break;
2171 case RegPairInfo::FPR64:
2172 LdrOpc = RPI.isPaired() ? AArch64::LDPDi : AArch64::LDRDui;
2173 break;
2174 case RegPairInfo::FPR128:
2175 LdrOpc = RPI.isPaired() ? AArch64::LDPQi : AArch64::LDRQui;
2176 break;
2177 case RegPairInfo::ZPR:
2178 LdrOpc = RPI.isPaired() ? AArch64::LD1B_2Z_IMM : AArch64::LDR_ZXI;
2179 break;
2180 case RegPairInfo::PPR:
2181 LdrOpc = Size == 16 ? AArch64::FILL_PPR_FROM_ZPR_SLOT_PSEUDO
2182 : AArch64::LDR_PXI;
2183 break;
2184 case RegPairInfo::VG:
2185 continue;
2186 }
2187 LLVM_DEBUG(dbgs() << "CSR restore: (" << printReg(Reg1, TRI);
2188 if (RPI.isPaired()) dbgs() << ", " << printReg(Reg2, TRI);
2189 dbgs() << ") -> fi#(" << RPI.FrameIdx;
2190 if (RPI.isPaired()) dbgs() << ", " << RPI.FrameIdx + 1;
2191 dbgs() << ")\n");
2192
2193 // Windows unwind codes require consecutive registers if registers are
2194 // paired. Make the switch here, so that the code below will restore
2195 // (x,x+1) and not (x+1,x).
2196 unsigned FrameIdxReg1 = RPI.FrameIdx;
2197 unsigned FrameIdxReg2 = RPI.FrameIdx + 1;
2198 if (NeedsWinCFI && RPI.isPaired()) {
2199 std::swap(Reg1, Reg2);
2200 std::swap(FrameIdxReg1, FrameIdxReg2);
2201 }
2202
2204 if (RPI.isPaired() && RPI.isScalable()) {
2205 [[maybe_unused]] const AArch64Subtarget &Subtarget =
2207 unsigned PnReg = AFI->getPredicateRegForFillSpill();
2208 assert((PnReg != 0 && enableMultiVectorSpillFill(Subtarget, MF)) &&
2209 "Expects SVE2.1 or SME2 target and a predicate register");
2210#ifdef EXPENSIVE_CHECKS
2211 assert(!(PPRBegin < ZPRBegin) &&
2212 "Expected callee save predicate to be handled first");
2213#endif
2214 if (!PTrueCreated) {
2215 PTrueCreated = true;
2216 BuildMI(MBB, MBBI, DL, TII.get(AArch64::PTRUE_C_B), PnReg)
2218 }
2219 MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII.get(LdrOpc));
2220 MIB.addReg(/*PairRegs*/ AArch64::Z0_Z1 + (RPI.Reg1 - AArch64::Z0),
2221 getDefRegState(true));
2222 MIB.addMemOperand(MF.getMachineMemOperand(
2223 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
2224 MachineMemOperand::MOLoad, Size, Alignment));
2225 MIB.addReg(PnReg);
2226 MIB.addReg(AArch64::SP)
2227 .addImm(RPI.Offset / 2) // [sp, #imm*2*vscale]
2228 // where 2*vscale is implicit
2231 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
2232 MachineMemOperand::MOLoad, Size, Alignment));
2233 if (NeedsWinCFI)
2234 insertSEH(MIB, TII, MachineInstr::FrameDestroy);
2235 } else {
2236 MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII.get(LdrOpc));
2237 if (RPI.isPaired()) {
2238 MIB.addReg(Reg2, getDefRegState(true));
2239 MIB.addMemOperand(MF.getMachineMemOperand(
2240 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
2241 MachineMemOperand::MOLoad, Size, Alignment));
2242 }
2243 MIB.addReg(Reg1, getDefRegState(true));
2244 MIB.addReg(AArch64::SP)
2245 .addImm(RPI.Offset) // [sp, #offset*vscale]
2246 // where factor*vscale is implicit
2249 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
2250 MachineMemOperand::MOLoad, Size, Alignment));
2251 if (NeedsWinCFI)
2252 insertSEH(MIB, TII, MachineInstr::FrameDestroy);
2253 }
2254 }
2255 return true;
2256}
2257
2258// Return the FrameID for an MMO.
2259static std::optional<int> getMMOFrameID(MachineMemOperand *MMO,
2260 const MachineFrameInfo &MFI) {
2261 auto *PSV =
2262 dyn_cast_or_null<FixedStackPseudoSourceValue>(MMO->getPseudoValue());
2263 if (PSV)
2264 return std::optional<int>(PSV->getFrameIndex());
2265
2266 if (MMO->getValue()) {
2267 if (auto *Al = dyn_cast<AllocaInst>(getUnderlyingObject(MMO->getValue()))) {
2268 for (int FI = MFI.getObjectIndexBegin(); FI < MFI.getObjectIndexEnd();
2269 FI++)
2270 if (MFI.getObjectAllocation(FI) == Al)
2271 return FI;
2272 }
2273 }
2274
2275 return std::nullopt;
2276}
2277
2278// Return the FrameID for a Load/Store instruction by looking at the first MMO.
2279static std::optional<int> getLdStFrameID(const MachineInstr &MI,
2280 const MachineFrameInfo &MFI) {
2281 if (!MI.mayLoadOrStore() || MI.getNumMemOperands() < 1)
2282 return std::nullopt;
2283
2284 return getMMOFrameID(*MI.memoperands_begin(), MFI);
2285}
2286
2287// Returns true if the LDST MachineInstr \p MI is a PPR access.
2288static bool isPPRAccess(const MachineInstr &MI) {
2289 return MI.getOpcode() != AArch64::SPILL_PPR_TO_ZPR_SLOT_PSEUDO &&
2290 MI.getOpcode() != AArch64::FILL_PPR_FROM_ZPR_SLOT_PSEUDO &&
2291 AArch64::PPRRegClass.contains(MI.getOperand(0).getReg());
2292}
2293
2294// Check if a Hazard slot is needed for the current function, and if so create
2295// one for it. The index is stored in AArch64FunctionInfo->StackHazardSlotIndex,
2296// which can be used to determine if any hazard padding is needed.
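// For example (illustrative): compiling a streaming function with
// -aarch64-stack-hazard-size=1024 creates a 1024-byte padding slot (when FPRs
// or SVE vectors are on the stack), so FPR/SVE saves and locals are separated
// from GPR saves and locals by at least that much.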
2297void AArch64FrameLowering::determineStackHazardSlot(
2298 MachineFunction &MF, BitVector &SavedRegs) const {
2299 unsigned StackHazardSize = getStackHazardSize(MF);
2300 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
2301 if (StackHazardSize == 0 || StackHazardSize % 16 != 0 ||
2302 AFI->hasStackHazardSlotIndex())
2303 return;
2304
2305 // Stack hazards are only needed in streaming functions.
2306 SMEAttrs Attrs = AFI->getSMEFnAttrs();
2307 if (!StackHazardInNonStreaming && Attrs.hasNonStreamingInterfaceAndBody())
2308 return;
2309
2310 MachineFrameInfo &MFI = MF.getFrameInfo();
2311
2312 // Add a hazard slot if there are any CSR FPR registers, or there are any
2313 // FP-only stack objects.
2314 bool HasFPRCSRs = any_of(SavedRegs.set_bits(), [](unsigned Reg) {
2315 return AArch64::FPR64RegClass.contains(Reg) ||
2316 AArch64::FPR128RegClass.contains(Reg) ||
2317 AArch64::ZPRRegClass.contains(Reg);
2318 });
2319 bool HasPPRCSRs = any_of(SavedRegs.set_bits(), [](unsigned Reg) {
2320 return AArch64::PPRRegClass.contains(Reg);
2321 });
2322 bool HasFPRStackObjects = false;
2323 bool HasPPRStackObjects = false;
2324 if (!HasFPRCSRs || SplitSVEObjects) {
2325 enum SlotType : uint8_t {
2326 Unknown = 0,
2327 ZPRorFPR = 1 << 0,
2328 PPR = 1 << 1,
2329 GPR = 1 << 2,
2331 };
2332
2333 // Find stack slots solely used for one kind of register (ZPR, PPR, etc.),
2334 // based on the kinds of accesses used in the function.
2335 SmallVector<SlotType> SlotTypes(MFI.getObjectIndexEnd(), SlotType::Unknown);
2336 for (auto &MBB : MF) {
2337 for (auto &MI : MBB) {
2338 std::optional<int> FI = getLdStFrameID(MI, MFI);
2339 if (!FI || FI < 0 || FI > int(SlotTypes.size()))
2340 continue;
2341 if (MFI.hasScalableStackID(*FI)) {
2342 SlotTypes[*FI] |=
2343 isPPRAccess(MI) ? SlotType::PPR : SlotType::ZPRorFPR;
2344 } else {
2345 SlotTypes[*FI] |= AArch64InstrInfo::isFpOrNEON(MI)
2346 ? SlotType::ZPRorFPR
2347 : SlotType::GPR;
2348 }
2349 }
2350 }
2351
2352 for (int FI = 0; FI < int(SlotTypes.size()); ++FI) {
2353 HasFPRStackObjects |= SlotTypes[FI] == SlotType::ZPRorFPR;
2354 // For SplitSVEObjects remember that this stack slot is a predicate, this
2355 // will be needed later when determining the frame layout.
2356 if (SlotTypes[FI] == SlotType::PPR) {
2358 HasPPRStackObjects = true;
2359 }
2360 }
2361 }
2362
2363 if (HasFPRCSRs || HasFPRStackObjects) {
2364 int ID = MFI.CreateStackObject(StackHazardSize, Align(16), false);
2365 LLVM_DEBUG(dbgs() << "Created Hazard slot at " << ID << " size "
2366 << StackHazardSize << "\n");
2367 AFI->setStackHazardSlotIndex(ID);
2368 }
2369
2370 // Determine if we should use SplitSVEObjects. This should only be used if
2371 // there's a possibility of a stack hazard between PPRs and ZPRs or FPRs.
2372 if (SplitSVEObjects) {
2373 if (!HasPPRCSRs && !HasPPRStackObjects) {
2374 LLVM_DEBUG(
2375 dbgs() << "Not using SplitSVEObjects as no PPRs are on the stack\n");
2376 return;
2377 }
2378
2379 if (!HasFPRCSRs && !HasFPRStackObjects) {
2380 LLVM_DEBUG(
2381 dbgs()
2382 << "Not using SplitSVEObjects as no FPRs or ZPRs are on the stack\n");
2383 return;
2384 }
2385
2386 const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();
2387 if (MFI.hasVarSizedObjects() || TRI->hasStackRealignment(MF)) {
2388 LLVM_DEBUG(dbgs() << "SplitSVEObjects is not supported with variable "
2389 "sized objects or realignment\n");
2390 return;
2391 }
2392
2393 if (arePPRsSpilledAsZPR(MF)) {
2394 LLVM_DEBUG(dbgs() << "SplitSVEObjects is not supported with "
2395 "-aarch64-enable-zpr-predicate-spills");
2396 return;
2397 }
2398
2399 // If another calling convention is explicitly set FPRs can't be promoted to
2400 // ZPR callee-saves.
2403 MF.getFunction().getCallingConv())) {
2404 LLVM_DEBUG(
2405 dbgs() << "Calling convention is not supported with SplitSVEObjects");
2406 return;
2407 }
2408
2409 [[maybe_unused]] const AArch64Subtarget &Subtarget =
2410 MF.getSubtarget<AArch64Subtarget>();
2412 "Expected SVE to be available for PPRs");
2413
2414 // With SplitSVEObjects the CS hazard padding is placed between the
2415 // PPRs and ZPRs. If there were any FPR CSRs, there would be a hazard between
2416 // them and the GPR CSRs. Avoid this by promoting all FPR CSRs to ZPRs.
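 // E.g. a callee-saved d8 (or q8) is re-marked here as z8, so its save is
 // placed in the ZPR callee-save area below the hazard padding (illustrative).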
2417 BitVector FPRZRegs(SavedRegs.size());
2418 for (size_t Reg = 0, E = SavedRegs.size(); HasFPRCSRs && Reg < E; ++Reg) {
2419 BitVector::reference RegBit = SavedRegs[Reg];
2420 if (!RegBit)
2421 continue;
2422 unsigned SubRegIdx = 0;
2423 if (AArch64::FPR64RegClass.contains(Reg))
2424 SubRegIdx = AArch64::dsub;
2425 else if (AArch64::FPR128RegClass.contains(Reg))
2426 SubRegIdx = AArch64::zsub;
2427 else
2428 continue;
2429 // Clear the bit for the FPR save.
2430 RegBit = false;
2431 // Mark that we should save the corresponding ZPR.
2432 Register ZReg =
2433 TRI->getMatchingSuperReg(Reg, SubRegIdx, &AArch64::ZPRRegClass);
2434 FPRZRegs.set(ZReg);
2435 }
2436 SavedRegs |= FPRZRegs;
2437
2438 AFI->setSplitSVEObjects(true);
2439 LLVM_DEBUG(dbgs() << "SplitSVEObjects enabled!\n");
2440 }
2441}
2442
2443void AArch64FrameLowering::determineCalleeSaves(MachineFunction &MF,
2444 BitVector &SavedRegs,
2445 RegScavenger *RS) const {
2446 // All calls are tail calls in GHC calling conv, and functions have no
2447 // prologue/epilogue.
2448 if (MF.getFunction().getCallingConv() == CallingConv::GHC)
2449 return;
2450
2451 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
2452
2454 const AArch64RegisterInfo *RegInfo = static_cast<const AArch64RegisterInfo *>(
2455 MF.getSubtarget().getRegisterInfo());
2457 unsigned UnspilledCSGPR = AArch64::NoRegister;
2458 unsigned UnspilledCSGPRPaired = AArch64::NoRegister;
2459
2460 MachineFrameInfo &MFI = MF.getFrameInfo();
2461 const MCPhysReg *CSRegs = MF.getRegInfo().getCalleeSavedRegs();
2462
2463 unsigned BasePointerReg = RegInfo->hasBasePointer(MF)
2464 ? RegInfo->getBaseRegister()
2465 : (unsigned)AArch64::NoRegister;
2466
2467 unsigned ExtraCSSpill = 0;
2468 bool HasUnpairedGPR64 = false;
2469 bool HasPairZReg = false;
2470 BitVector UserReservedRegs = RegInfo->getUserReservedRegs(MF);
2471 BitVector ReservedRegs = RegInfo->getReservedRegs(MF);
2472
2473 // Figure out which callee-saved registers to save/restore.
2474 for (unsigned i = 0; CSRegs[i]; ++i) {
2475 const unsigned Reg = CSRegs[i];
2476
2477 // Add the base pointer register to SavedRegs if it is callee-save.
2478 if (Reg == BasePointerReg)
2479 SavedRegs.set(Reg);
2480
2481 // Don't save manually reserved registers set through +reserve-x#i,
2482 // even for callee-saved registers, as per GCC's behavior.
2483 if (UserReservedRegs[Reg]) {
2484 SavedRegs.reset(Reg);
2485 continue;
2486 }
2487
2488 bool RegUsed = SavedRegs.test(Reg);
2489 unsigned PairedReg = AArch64::NoRegister;
2490 const bool RegIsGPR64 = AArch64::GPR64RegClass.contains(Reg);
2491 if (RegIsGPR64 || AArch64::FPR64RegClass.contains(Reg) ||
2492 AArch64::FPR128RegClass.contains(Reg)) {
2493 // Compensate for odd numbers of GP CSRs.
2494 // For now, all the known cases of odd number of CSRs are of GPRs.
2495 if (HasUnpairedGPR64)
2496 PairedReg = CSRegs[i % 2 == 0 ? i - 1 : i + 1];
2497 else
2498 PairedReg = CSRegs[i ^ 1];
2499 }
2500
2501 // If the function requires all the GP registers to save (SavedRegs),
2502 // and there are an odd number of GP CSRs at the same time (CSRegs),
2503 // PairedReg could be in a different register class from Reg, which would
2504 // lead to a FPR (usually D8) accidentally being marked saved.
2505 if (RegIsGPR64 && !AArch64::GPR64RegClass.contains(PairedReg)) {
2506 PairedReg = AArch64::NoRegister;
2507 HasUnpairedGPR64 = true;
2508 }
2509 assert(PairedReg == AArch64::NoRegister ||
2510 AArch64::GPR64RegClass.contains(Reg, PairedReg) ||
2511 AArch64::FPR64RegClass.contains(Reg, PairedReg) ||
2512 AArch64::FPR128RegClass.contains(Reg, PairedReg));
2513
2514 if (!RegUsed) {
2515 if (AArch64::GPR64RegClass.contains(Reg) && !ReservedRegs[Reg]) {
2516 UnspilledCSGPR = Reg;
2517 UnspilledCSGPRPaired = PairedReg;
2518 }
2519 continue;
2520 }
2521
2522 // Always save P4 when PPR spills are ZPR-sized and a predicate at or above
2523 // p8 is spilled. If all of p0-p3 are used as return values, p4 must be free
2524 // to reload p8-p15.
2525 if (RegInfo->getSpillSize(AArch64::PPRRegClass) == 16 &&
2526 AArch64::PPR_p8to15RegClass.contains(Reg)) {
2527 SavedRegs.set(AArch64::P4);
2528 }
2529
2530 // MachO's compact unwind format relies on all registers being stored in
2531 // pairs.
2532 // FIXME: the usual format is actually better if unwinding isn't needed.
2533 if (producePairRegisters(MF) && PairedReg != AArch64::NoRegister &&
2534 !SavedRegs.test(PairedReg)) {
2535 SavedRegs.set(PairedReg);
2536 if (AArch64::GPR64RegClass.contains(PairedReg) &&
2537 !ReservedRegs[PairedReg])
2538 ExtraCSSpill = PairedReg;
2539 }
2540 // Check if there is a pair of ZRegs, so it can select PReg for spill/fill
2541 HasPairZReg |= (AArch64::ZPRRegClass.contains(Reg, CSRegs[i ^ 1]) &&
2542 SavedRegs.test(CSRegs[i ^ 1]));
2543 }
2544
2545 if (HasPairZReg && enableMultiVectorSpillFill(Subtarget, MF)) {
2547 // Find a suitable predicate register for the multi-vector spill/fill
2548 // instructions.
2549 unsigned PnReg = findFreePredicateReg(SavedRegs);
2550 if (PnReg != AArch64::NoRegister)
2551 AFI->setPredicateRegForFillSpill(PnReg);
2552 // If no free callee-save has been found assign one.
2553 if (!AFI->getPredicateRegForFillSpill() &&
2554 MF.getFunction().getCallingConv() ==
2555 CallingConv::AArch64_SVE_VectorCall) {
2556 SavedRegs.set(AArch64::P8);
2557 AFI->setPredicateRegForFillSpill(AArch64::PN8);
2558 }
2559
2560 assert(!ReservedRegs[AFI->getPredicateRegForFillSpill()] &&
2561 "Predicate cannot be a reserved register");
2562 }
2563
2564 if (MF.getFunction().getCallingConv() == CallingConv::Win64 &&
2565 !Subtarget.isTargetWindows()) {
2566 // For Windows calling convention on a non-windows OS, where X18 is treated
2567 // as reserved, back up X18 when entering non-windows code (marked with the
2568 // Windows calling convention) and restore when returning regardless of
2569 // whether the individual function uses it - it might call other functions
2570 // that clobber it.
2571 SavedRegs.set(AArch64::X18);
2572 }
2573
2574 // Determine if a Hazard slot should be used and where it should go.
2575 // If SplitSVEObjects is used, the hazard padding is placed between the PPRs
2576 // and ZPRs. Otherwise, it goes in the callee save area.
2577 determineStackHazardSlot(MF, SavedRegs);
2578
2579 // Calculates the callee saved stack size.
2580 unsigned CSStackSize = 0;
2581 unsigned ZPRCSStackSize = 0;
2582 unsigned PPRCSStackSize = 0;
2584 for (unsigned Reg : SavedRegs.set_bits()) {
2585 auto *RC = TRI->getMinimalPhysRegClass(Reg);
2586 assert(RC && "expected register class!");
2587 auto SpillSize = TRI->getSpillSize(*RC);
2588 bool IsZPR = AArch64::ZPRRegClass.contains(Reg);
2589 bool IsPPR = !IsZPR && AArch64::PPRRegClass.contains(Reg);
2590 if (IsZPR || (IsPPR && arePPRsSpilledAsZPR(MF)))
2591 ZPRCSStackSize += SpillSize;
2592 else if (IsPPR)
2593 PPRCSStackSize += SpillSize;
2594 else
2595 CSStackSize += SpillSize;
2596 }
2597
2598 // Save number of saved regs, so we can easily update CSStackSize later to
2599 // account for any additional 64-bit GPR saves. Note: After this point
2600 // only 64-bit GPRs can be added to SavedRegs.
2601 unsigned NumSavedRegs = SavedRegs.count();
2602
2603 // If we have hazard padding in the CS area add that to the size.
2605 CSStackSize += getStackHazardSize(MF);
2606
2607 // Increase the callee-saved stack size if the function has streaming mode
2608 // changes, as we will need to spill the value of the VG register.
2609 if (requiresSaveVG(MF))
2610 CSStackSize += 8;
2611
2612 // If we must call __arm_get_current_vg in the prologue, preserve the LR.
2613 if (requiresSaveVG(MF) && !Subtarget.hasSVE())
2614 SavedRegs.set(AArch64::LR);
2615
2616 // The frame record needs to be created by saving the appropriate registers
2617 uint64_t EstimatedStackSize = MFI.estimateStackSize(MF);
2618 if (hasFP(MF) ||
2619 windowsRequiresStackProbe(MF, EstimatedStackSize + CSStackSize + 16)) {
2620 SavedRegs.set(AArch64::FP);
2621 SavedRegs.set(AArch64::LR);
2622 }
2623
2624 LLVM_DEBUG({
2625 dbgs() << "*** determineCalleeSaves\nSaved CSRs:";
2626 for (unsigned Reg : SavedRegs.set_bits())
2627 dbgs() << ' ' << printReg(Reg, RegInfo);
2628 dbgs() << "\n";
2629 });
2630
2631 // If any callee-saved registers are used, the frame cannot be eliminated.
2632 auto [ZPRLocalStackSize, PPRLocalStackSize] =
2634 uint64_t SVELocals = ZPRLocalStackSize + PPRLocalStackSize;
2635 uint64_t SVEStackSize =
2636 alignTo(ZPRCSStackSize + PPRCSStackSize + SVELocals, 16);
2637 bool CanEliminateFrame = (SavedRegs.count() == 0) && !SVEStackSize;
2638
2639 // The CSR spill slots have not been allocated yet, so estimateStackSize
2640 // won't include them.
2641 unsigned EstimatedStackSizeLimit = estimateRSStackSizeLimit(MF);
2642
2643 // We may address some of the stack above the canonical frame address, either
2644 // for our own arguments or during a call. Include that in calculating whether
2645 // we have complicated addressing concerns.
2646 int64_t CalleeStackUsed = 0;
2647 for (int I = MFI.getObjectIndexBegin(); I != 0; ++I) {
2648 int64_t FixedOff = MFI.getObjectOffset(I);
2649 if (FixedOff > CalleeStackUsed)
2650 CalleeStackUsed = FixedOff;
2651 }
2652
2653 // Conservatively always assume BigStack when there are SVE spills.
2654 bool BigStack = SVEStackSize || (EstimatedStackSize + CSStackSize +
2655 CalleeStackUsed) > EstimatedStackSizeLimit;
2656 if (BigStack || !CanEliminateFrame || RegInfo->cannotEliminateFrame(MF))
2657 AFI->setHasStackFrame(true);
2658
2659 // Estimate if we might need to scavenge a register at some point in order
2660 // to materialize a stack offset. If so, either spill one additional
2661 // callee-saved register or reserve a special spill slot to facilitate
2662 // register scavenging. If we already spilled an extra callee-saved register
2663 // above to keep the number of spills even, we don't need to do anything else
2664 // here.
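 // E.g. if SP-relative offsets into this frame may exceed the load/store
 // immediate range, spilling one otherwise-unused callee-saved GPR gives the
 // scavenger a register with which to materialize large offsets (illustrative).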
2665 if (BigStack) {
2666 if (!ExtraCSSpill && UnspilledCSGPR != AArch64::NoRegister) {
2667 LLVM_DEBUG(dbgs() << "Spilling " << printReg(UnspilledCSGPR, RegInfo)
2668 << " to get a scratch register.\n");
2669 SavedRegs.set(UnspilledCSGPR);
2670 ExtraCSSpill = UnspilledCSGPR;
2671
2672 // MachO's compact unwind format relies on all registers being stored in
2673 // pairs, so if we need to spill one extra for BigStack, then we need to
2674 // store the pair.
2675 if (producePairRegisters(MF)) {
2676 if (UnspilledCSGPRPaired == AArch64::NoRegister) {
2677 // Failed to make a pair for compact unwind format, revert spilling.
2678 if (produceCompactUnwindFrame(*this, MF)) {
2679 SavedRegs.reset(UnspilledCSGPR);
2680 ExtraCSSpill = AArch64::NoRegister;
2681 }
2682 } else
2683 SavedRegs.set(UnspilledCSGPRPaired);
2684 }
2685 }
2686
2687 // If we didn't find an extra callee-saved register to spill, create
2688 // an emergency spill slot.
2689 if (!ExtraCSSpill || MF.getRegInfo().isPhysRegUsed(ExtraCSSpill)) {
2691 const TargetRegisterClass &RC = AArch64::GPR64RegClass;
2692 unsigned Size = TRI->getSpillSize(RC);
2693 Align Alignment = TRI->getSpillAlign(RC);
2694 int FI = MFI.CreateSpillStackObject(Size, Alignment);
2695 RS->addScavengingFrameIndex(FI);
2696 LLVM_DEBUG(dbgs() << "No available CS registers, allocated fi#" << FI
2697 << " as the emergency spill slot.\n");
2698 }
2699 }
2700
2701 // Add the size of any additional 64-bit GPR saves.
2702 CSStackSize += 8 * (SavedRegs.count() - NumSavedRegs);
2703
2704 // A Swift asynchronous context extends the frame record with a pointer
2705 // directly before FP.
2706 if (hasFP(MF) && AFI->hasSwiftAsyncContext())
2707 CSStackSize += 8;
2708
2709 uint64_t AlignedCSStackSize = alignTo(CSStackSize, 16);
2710 LLVM_DEBUG(dbgs() << "Estimated stack frame size: "
2711 << EstimatedStackSize + AlignedCSStackSize << " bytes.\n");
2712
2713 assert((!MFI.isCalleeSavedInfoValid() ||
2714 AFI->getCalleeSavedStackSize() == AlignedCSStackSize) &&
2715 "Should not invalidate callee saved info");
2716
2717 // Round up to register pair alignment to avoid additional SP adjustment
2718 // instructions.
2719 AFI->setCalleeSavedStackSize(AlignedCSStackSize);
2720 AFI->setCalleeSaveStackHasFreeSpace(AlignedCSStackSize != CSStackSize);
2721 AFI->setSVECalleeSavedStackSize(ZPRCSStackSize, alignTo(PPRCSStackSize, 16));
2722}
2723
2724bool AArch64FrameLowering::assignCalleeSavedSpillSlots(
2725 MachineFunction &MF, const TargetRegisterInfo *RegInfo,
2726 std::vector<CalleeSavedInfo> &CSI, unsigned &MinCSFrameIndex,
2727 unsigned &MaxCSFrameIndex) const {
2728 bool NeedsWinCFI = needsWinCFI(MF);
2729 unsigned StackHazardSize = getStackHazardSize(MF);
2730 // To match the canonical windows frame layout, reverse the list of
2731 // callee saved registers to get them laid out by PrologEpilogInserter
2732 // in the right order. (PrologEpilogInserter allocates stack objects top
2733 // down. Windows canonical prologs store higher numbered registers at
2734 // the top, thus have the CSI array start from the highest registers.)
2735 if (NeedsWinCFI)
2736 std::reverse(CSI.begin(), CSI.end());
2737
2738 if (CSI.empty())
2739 return true; // Early exit if no callee saved registers are modified!
2740
2741 // Now that we know which registers need to be saved and restored, allocate
2742 // stack slots for them.
2743 MachineFrameInfo &MFI = MF.getFrameInfo();
2744 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
2745
2746 bool UsesWinAAPCS = isTargetWindows(MF);
2747 if (UsesWinAAPCS && hasFP(MF) && AFI->hasSwiftAsyncContext()) {
2748 int FrameIdx = MFI.CreateStackObject(8, Align(16), true);
2749 AFI->setSwiftAsyncContextFrameIdx(FrameIdx);
2750 if ((unsigned)FrameIdx < MinCSFrameIndex)
2751 MinCSFrameIndex = FrameIdx;
2752 if ((unsigned)FrameIdx > MaxCSFrameIndex)
2753 MaxCSFrameIndex = FrameIdx;
2754 }
2755
2756 // Insert VG into the list of CSRs, immediately before LR if saved.
2757 if (requiresSaveVG(MF)) {
2758 CalleeSavedInfo VGInfo(AArch64::VG);
2759 auto It =
2760 find_if(CSI, [](auto &Info) { return Info.getReg() == AArch64::LR; });
2761 if (It != CSI.end())
2762 CSI.insert(It, VGInfo);
2763 else
2764 CSI.push_back(VGInfo);
2765 }
2766
2767 Register LastReg = 0;
2768 int HazardSlotIndex = std::numeric_limits<int>::max();
2769 for (auto &CS : CSI) {
2770 MCRegister Reg = CS.getReg();
2771 const TargetRegisterClass *RC = RegInfo->getMinimalPhysRegClass(Reg);
2772
2773 // Create a hazard slot as we switch between GPR and FPR CSRs.
2775 (!LastReg || !AArch64InstrInfo::isFpOrNEON(LastReg)) &&
2777 assert(HazardSlotIndex == std::numeric_limits<int>::max() &&
2778 "Unexpected register order for hazard slot");
2779 HazardSlotIndex = MFI.CreateStackObject(StackHazardSize, Align(8), true);
2780 LLVM_DEBUG(dbgs() << "Created CSR Hazard at slot " << HazardSlotIndex
2781 << "\n");
2782 AFI->setStackHazardCSRSlotIndex(HazardSlotIndex);
2783 if ((unsigned)HazardSlotIndex < MinCSFrameIndex)
2784 MinCSFrameIndex = HazardSlotIndex;
2785 if ((unsigned)HazardSlotIndex > MaxCSFrameIndex)
2786 MaxCSFrameIndex = HazardSlotIndex;
2787 }
2788
2789 unsigned Size = RegInfo->getSpillSize(*RC);
2790 Align Alignment(RegInfo->getSpillAlign(*RC));
2791 int FrameIdx = MFI.CreateStackObject(Size, Alignment, true);
2792 CS.setFrameIdx(FrameIdx);
2793
2794 if ((unsigned)FrameIdx < MinCSFrameIndex)
2795 MinCSFrameIndex = FrameIdx;
2796 if ((unsigned)FrameIdx > MaxCSFrameIndex)
2797 MaxCSFrameIndex = FrameIdx;
2798
2799 // Grab 8 bytes below FP for the extended asynchronous frame info.
2800 if (hasFP(MF) && AFI->hasSwiftAsyncContext() && !UsesWinAAPCS &&
2801 Reg == AArch64::FP) {
2802 FrameIdx = MFI.CreateStackObject(8, Alignment, true);
2803 AFI->setSwiftAsyncContextFrameIdx(FrameIdx);
2804 if ((unsigned)FrameIdx < MinCSFrameIndex)
2805 MinCSFrameIndex = FrameIdx;
2806 if ((unsigned)FrameIdx > MaxCSFrameIndex)
2807 MaxCSFrameIndex = FrameIdx;
2808 }
2809 LastReg = Reg;
2810 }
2811
2812 // Add hazard slot in the case where no FPR CSRs are present.
2814 HazardSlotIndex == std::numeric_limits<int>::max()) {
2815 HazardSlotIndex = MFI.CreateStackObject(StackHazardSize, Align(8), true);
2816 LLVM_DEBUG(dbgs() << "Created CSR Hazard at slot " << HazardSlotIndex
2817 << "\n");
2818 AFI->setStackHazardCSRSlotIndex(HazardSlotIndex);
2819 if ((unsigned)HazardSlotIndex < MinCSFrameIndex)
2820 MinCSFrameIndex = HazardSlotIndex;
2821 if ((unsigned)HazardSlotIndex > MaxCSFrameIndex)
2822 MaxCSFrameIndex = HazardSlotIndex;
2823 }
2824
2825 return true;
2826}
2827
2828bool AArch64FrameLowering::enableStackSlotScavenging(
2829 const MachineFunction &MF) const {
2830 const AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
2831 // If the function has streaming-mode changes, don't scavenge a
2832 // spill slot in the callee-save area, as that might require an
2833 // 'addvl' in the streaming-mode-changing call-sequence when the
2834 // function doesn't use a frame pointer.
2835 if (AFI->hasStreamingModeChanges() && !hasFP(MF))
2836 return false;
2837 // Don't allow stack slot scavenging with hazard slots, in case it moves
2838 // objects into the wrong place.
2839 if (AFI->hasStackHazardSlotIndex())
2840 return false;
2841 return AFI->hasCalleeSaveStackFreeSpace();
2842}
2843
2844/// Returns true if there are any SVE callee saves.
2845static bool getSVECalleeSaveSlotRange(const MachineFrameInfo &MFI,
2846 int &Min, int &Max) {
2847 Min = std::numeric_limits<int>::max();
2848 Max = std::numeric_limits<int>::min();
2849
2850 if (!MFI.isCalleeSavedInfoValid())
2851 return false;
2852
2853 const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();
2854 for (auto &CS : CSI) {
2855 if (AArch64::ZPRRegClass.contains(CS.getReg()) ||
2856 AArch64::PPRRegClass.contains(CS.getReg())) {
2857 assert((Max == std::numeric_limits<int>::min() ||
2858 Max + 1 == CS.getFrameIdx()) &&
2859 "SVE CalleeSaves are not consecutive");
2860 Min = std::min(Min, CS.getFrameIdx());
2861 Max = std::max(Max, CS.getFrameIdx());
2862 }
2863 }
2864 return Min != std::numeric_limits<int>::max();
2865}
2866
2868 AssignObjectOffsets AssignOffsets) {
2869 MachineFrameInfo &MFI = MF.getFrameInfo();
2870 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
2871
2872 SVEStackSizes SVEStack{};
2873
2874 // With SplitSVEObjects we maintain separate stack offsets for predicates
2875 // (PPRs) and SVE vectors (ZPRs). When SplitSVEObjects is disabled predicates
2876 // are included in the SVE vector area.
2877 uint64_t &ZPRStackTop = SVEStack.ZPRStackSize;
2878 uint64_t &PPRStackTop =
2879 AFI->hasSplitSVEObjects() ? SVEStack.PPRStackSize : SVEStack.ZPRStackSize;
2880
2881#ifndef NDEBUG
2882 // First process all fixed stack objects.
2883 for (int I = MFI.getObjectIndexBegin(); I != 0; ++I)
2884 assert(!MFI.hasScalableStackID(I) &&
2885 "SVE vectors should never be passed on the stack by value, only by "
2886 "reference.");
2887#endif
2888
2889 auto AllocateObject = [&](int FI) {
2891 ? ZPRStackTop
2892 : PPRStackTop;
2893
2894 // FIXME: Given that the length of SVE vectors is not necessarily a power of
2895 // two, we'd need to align every object dynamically at runtime if the
2896 // alignment is larger than 16. This is not yet supported.
2897 Align Alignment = MFI.getObjectAlign(FI);
2898 if (Alignment > Align(16))
2900 "Alignment of scalable vectors > 16 bytes is not yet supported");
2901
2902 StackTop += MFI.getObjectSize(FI);
2903 StackTop = alignTo(StackTop, Alignment);
2904
2905 assert(StackTop < std::numeric_limits<int64_t>::max() &&
2906 "SVE StackTop far too large?!");
2907
2908 int64_t Offset = -int64_t(StackTop);
2909 if (AssignOffsets == AssignObjectOffsets::Yes)
2910 MFI.setObjectOffset(FI, Offset);
2911
2912 LLVM_DEBUG(dbgs() << "alloc FI(" << FI << ") at SP[" << Offset << "]\n");
2913 };
2914
2915 // Then process all callee saved slots.
2916 int MinCSFrameIndex, MaxCSFrameIndex;
2917 if (getSVECalleeSaveSlotRange(MFI, MinCSFrameIndex, MaxCSFrameIndex)) {
2918 for (int FI = MinCSFrameIndex; FI <= MaxCSFrameIndex; ++FI)
2919 AllocateObject(FI);
2920 }
2921
2922 // Ensure the CS area is 16-byte aligned.
2923 PPRStackTop = alignTo(PPRStackTop, Align(16U));
2924 ZPRStackTop = alignTo(ZPRStackTop, Align(16U));
2925
2926 // Create a buffer of SVE objects to allocate and sort it.
2927 SmallVector<int, 8> ObjectsToAllocate;
2928 // If we have a stack protector, and we've previously decided that we have SVE
2929 // objects on the stack and thus need it to go in the SVE stack area, then it
2930 // needs to go first.
2931 int StackProtectorFI = -1;
2932 if (MFI.hasStackProtectorIndex()) {
2933 StackProtectorFI = MFI.getStackProtectorIndex();
2934 if (MFI.getStackID(StackProtectorFI) == TargetStackID::ScalableVector)
2935 ObjectsToAllocate.push_back(StackProtectorFI);
2936 }
2937
2938 for (int FI = 0, E = MFI.getObjectIndexEnd(); FI != E; ++FI) {
2939 if (FI == StackProtectorFI || MFI.isDeadObjectIndex(FI))
2940 continue;
2941 if (MaxCSFrameIndex >= FI && FI >= MinCSFrameIndex)
2942 continue;
2943
2946 continue;
2947
2948 ObjectsToAllocate.push_back(FI);
2949 }
2950
2951 // Allocate all SVE locals and spills
2952 for (unsigned FI : ObjectsToAllocate)
2953 AllocateObject(FI);
2954
2955 PPRStackTop = alignTo(PPRStackTop, Align(16U));
2956 ZPRStackTop = alignTo(ZPRStackTop, Align(16U));
2957
2958 if (AssignOffsets == AssignObjectOffsets::Yes)
2959 AFI->setStackSizeSVE(SVEStack.ZPRStackSize, SVEStack.PPRStackSize);
2960
2961 return SVEStack;
2962}
2963
2964/// Attempts to scavenge a register from \p ScavengeableRegs given the used
2965/// registers in \p UsedRegs.
2968 Register PreferredReg) {
2969 if (PreferredReg != AArch64::NoRegister && UsedRegs.available(PreferredReg))
2970 return PreferredReg;
2971 for (auto Reg : ScavengeableRegs.set_bits()) {
2972 if (UsedRegs.available(Reg))
2973 return Reg;
2974 }
2975 return AArch64::NoRegister;
2976}
2977
2978/// Propagates frame-setup/destroy flags from \p SourceMI to all instructions in
2979/// \p MachineInstrs.
2980static void propagateFrameFlags(MachineInstr &SourceMI,
2981 ArrayRef<MachineInstr *> MachineInstrs) {
2982 for (MachineInstr *MI : MachineInstrs) {
2983 if (SourceMI.getFlag(MachineInstr::FrameSetup))
2984 MI->setFlag(MachineInstr::FrameSetup);
2985 if (SourceMI.getFlag(MachineInstr::FrameDestroy))
2986 MI->setFlag(MachineInstr::FrameDestroy);
2987 }
2988}
2989
2990/// RAII helper class for scavenging or spilling a register. On construction
2991/// attempts to find a free register of class \p RC (given \p UsedRegs and \p
2992/// AllocatableRegs), if no register can be found spills \p SpillCandidate to \p
2993/// MaybeSpillFI to free a register. The free'd register is returned via the \p
2994/// FreeReg output parameter. On destruction, if there is a spill, its previous
2995/// value is reloaded. The spilling and scavenging is only valid at the
2996/// insertion point \p MBBI, this class should _not_ be used in places that
2997/// create or manipulate basic blocks, moving the expected insertion point.
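/// A minimal usage sketch (mirroring expandSpillPPRToZPRSlotPseudo below):
///   ScopedScavengeOrSpill ZReg(MF, MBB, MI, AArch64::Z0, AArch64::ZPRRegClass,
///                              UsedRegs, SR.ZPRRegs, &SpillSlots.ZPRSpillFI);
///   BuildMI(MBB, MI, DL, TII->get(AArch64::CPY_ZPzI_B))
///       .addReg(*ZReg, RegState::Define) ...;
/// Any register spilled to free *ZReg is reloaded when ZReg goes out of scope.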
2998struct ScopedScavengeOrSpill {
2999 ScopedScavengeOrSpill(const ScopedScavengeOrSpill &) = delete;
3000 ScopedScavengeOrSpill(ScopedScavengeOrSpill &&) = delete;
3001
3002 ScopedScavengeOrSpill(MachineFunction &MF, MachineBasicBlock &MBB,
3003 MachineBasicBlock::iterator MBBI,
3004 Register SpillCandidate, const TargetRegisterClass &RC,
3005 LiveRegUnits const &UsedRegs,
3006 BitVector const &AllocatableRegs,
3007 std::optional<int> *MaybeSpillFI,
3008 Register PreferredReg = AArch64::NoRegister)
3009 : MBB(MBB), MBBI(MBBI), RC(RC), TII(static_cast<const AArch64InstrInfo &>(
3010 *MF.getSubtarget().getInstrInfo())),
3011 TRI(*MF.getSubtarget().getRegisterInfo()) {
3012 FreeReg = tryScavengeRegister(UsedRegs, AllocatableRegs, PreferredReg);
3013 if (FreeReg != AArch64::NoRegister)
3014 return;
3015 assert(MaybeSpillFI && "Expected emergency spill slot FI information "
3016 "(attempted to spill in prologue/epilogue?)");
3017 if (!MaybeSpillFI->has_value()) {
3018 MachineFrameInfo &MFI = MF.getFrameInfo();
3019 *MaybeSpillFI = MFI.CreateSpillStackObject(TRI.getSpillSize(RC),
3020 TRI.getSpillAlign(RC));
3021 }
3022 FreeReg = SpillCandidate;
3023 SpillFI = MaybeSpillFI->value();
3024 TII.storeRegToStackSlot(MBB, MBBI, FreeReg, false, *SpillFI, &RC, &TRI,
3025 Register());
3026 }
3027
3028 bool hasSpilled() const { return SpillFI.has_value(); }
3029
3030 /// Returns the free register (found from scavenging or spilling a register).
3031 Register freeRegister() const { return FreeReg; }
3032
3033 Register operator*() const { return freeRegister(); }
3034
3035 ~ScopedScavengeOrSpill() {
3036 if (hasSpilled())
3037 TII.loadRegFromStackSlot(MBB, MBBI, FreeReg, *SpillFI, &RC, &TRI,
3038 Register());
3039 }
3040
3041private:
3042 MachineBasicBlock &MBB;
3043 MachineBasicBlock::iterator MBBI;
3044 const TargetRegisterClass &RC;
3045 const AArch64InstrInfo &TII;
3046 const TargetRegisterInfo &TRI;
3047 Register FreeReg = AArch64::NoRegister;
3048 std::optional<int> SpillFI;
3049};
3050
3051/// Emergency stack slots for expanding SPILL_PPR_TO_ZPR_SLOT_PSEUDO and
3052/// FILL_PPR_FROM_ZPR_SLOT_PSEUDO.
3053struct EmergencyStackSlots {
3054 std::optional<int> ZPRSpillFI;
3055 std::optional<int> PPRSpillFI;
3056 std::optional<int> GPRSpillFI;
3057};
3058
3059/// Registers available for scavenging (ZPR, PPR3b, GPR).
3060struct ScavengeableRegs {
3061 BitVector ZPRRegs;
3062 BitVector PPR3bRegs;
3063 BitVector GPRRegs;
3064};
3065
3066static bool isInPrologueOrEpilogue(const MachineInstr &MI) {
3067 return MI.getFlag(MachineInstr::FrameSetup) ||
3068 MI.getFlag(MachineInstr::FrameDestroy);
3069}
3070
3071/// Expands:
3072/// ```
3073/// SPILL_PPR_TO_ZPR_SLOT_PSEUDO $p0, %stack.0, 0
3074/// ```
3075/// To:
3076/// ```
3077/// $z0 = CPY_ZPzI_B $p0, 1, 0
3078/// STR_ZXI $z0, $stack.0, 0
3079/// ```
3080/// While ensuring a ZPR ($z0 in this example) is free for the predicate (
3081/// spilling if necessary).
3082static void expandSpillPPRToZPRSlotPseudo(MachineBasicBlock &MBB,
3083 MachineInstr &MI,
3084 const TargetRegisterInfo &TRI,
3085 LiveRegUnits const &UsedRegs,
3086 ScavengeableRegs const &SR,
3087 EmergencyStackSlots &SpillSlots) {
3088 MachineFunction &MF = *MBB.getParent();
3089 auto *TII =
3090 static_cast<const AArch64InstrInfo *>(MF.getSubtarget().getInstrInfo());
3091
3092 ScopedScavengeOrSpill ZPredReg(
3093 MF, MBB, MI, AArch64::Z0, AArch64::ZPRRegClass, UsedRegs, SR.ZPRRegs,
3094 isInPrologueOrEpilogue(MI) ? nullptr : &SpillSlots.ZPRSpillFI);
3095
3096 SmallVector<MachineInstr *, 2> MachineInstrs;
3097 const DebugLoc &DL = MI.getDebugLoc();
3098 MachineInstrs.push_back(BuildMI(MBB, MI, DL, TII->get(AArch64::CPY_ZPzI_B))
3099 .addReg(*ZPredReg, RegState::Define)
3100 .add(MI.getOperand(0))
3101 .addImm(1)
3102 .addImm(0)
3103 .getInstr());
3104 MachineInstrs.push_back(BuildMI(MBB, MI, DL, TII->get(AArch64::STR_ZXI))
3105 .addReg(*ZPredReg)
3106 .add(MI.getOperand(1))
3107 .addImm(MI.getOperand(2).getImm())
3108 .setMemRefs(MI.memoperands())
3109 .getInstr());
3110 propagateFrameFlags(MI, MachineInstrs);
3111}
3112
3113/// Expands:
3114/// ```
3115/// $p0 = FILL_PPR_FROM_ZPR_SLOT_PSEUDO %stack.0, 0
3116/// ```
3117/// To:
3118/// ```
3119/// $z0 = LDR_ZXI %stack.0, 0
3120/// $p0 = PTRUE_B 31, implicit $vg
3121/// $p0 = CMPNE_PPzZI_B $p0, $z0, 0, implicit-def $nzcv, implicit-def $nzcv
3122/// ```
3123/// While ensuring a ZPR ($z0 in this example) is free for the predicate (
3124/// spilling if necessary). If the status flags are in use at the point of
3125/// expansion they are preserved (by moving them to/from a GPR). This may cause
3126/// an additional spill if no GPR is free at the expansion point.
3127static bool expandFillPPRFromZPRSlotPseudo(
3128 MachineBasicBlock &MBB, MachineInstr &MI, const TargetRegisterInfo &TRI,
3129 LiveRegUnits const &UsedRegs, ScavengeableRegs const &SR,
3130 MachineInstr *&LastPTrue, EmergencyStackSlots &SpillSlots) {
3131 MachineFunction &MF = *MBB.getParent();
3132 auto *TII =
3133 static_cast<const AArch64InstrInfo *>(MF.getSubtarget().getInstrInfo());
3134
3135 ScopedScavengeOrSpill ZPredReg(
3136 MF, MBB, MI, AArch64::Z0, AArch64::ZPRRegClass, UsedRegs, SR.ZPRRegs,
3137 isInPrologueOrEpilogue(MI) ? nullptr : &SpillSlots.ZPRSpillFI);
3138
3139 ScopedScavengeOrSpill PredReg(
3140 MF, MBB, MI, AArch64::P0, AArch64::PPR_3bRegClass, UsedRegs, SR.PPR3bRegs,
3141 isInPrologueOrEpilogue(MI) ? nullptr : &SpillSlots.PPRSpillFI,
3142 /*PreferredReg=*/
3143 LastPTrue ? LastPTrue->getOperand(0).getReg() : AArch64::NoRegister);
3144
3145 // Elide NZCV spills if we know it is not used.
3146 bool IsNZCVUsed = !UsedRegs.available(AArch64::NZCV);
3147 std::optional<ScopedScavengeOrSpill> NZCVSaveReg;
3148 if (IsNZCVUsed)
3149 NZCVSaveReg.emplace(
3150 MF, MBB, MI, AArch64::X0, AArch64::GPR64RegClass, UsedRegs, SR.GPRRegs,
3151 isInPrologueOrEpilogue(MI) ? nullptr : &SpillSlots.GPRSpillFI);
3152 SmallVector<MachineInstr *, 4> MachineInstrs;
3153 const DebugLoc &DL = MI.getDebugLoc();
3154 MachineInstrs.push_back(BuildMI(MBB, MI, DL, TII->get(AArch64::LDR_ZXI))
3155 .addReg(*ZPredReg, RegState::Define)
3156 .add(MI.getOperand(1))
3157 .addImm(MI.getOperand(2).getImm())
3158 .setMemRefs(MI.memoperands())
3159 .getInstr());
3160 if (IsNZCVUsed)
3161 MachineInstrs.push_back(
3162 BuildMI(MBB, MI, DL, TII->get(AArch64::MRS))
3163 .addReg(NZCVSaveReg->freeRegister(), RegState::Define)
3164 .addImm(AArch64SysReg::NZCV)
3165 .addReg(AArch64::NZCV, RegState::Implicit)
3166 .getInstr());
3167
3168 // Reuse previous ptrue if we know it has not been clobbered.
3169 if (LastPTrue) {
3170 assert(*PredReg == LastPTrue->getOperand(0).getReg());
3171 LastPTrue->moveBefore(&MI);
3172 } else {
3173 LastPTrue = BuildMI(MBB, MI, DL, TII->get(AArch64::PTRUE_B))
3174 .addReg(*PredReg, RegState::Define)
3175 .addImm(31);
3176 }
3177 MachineInstrs.push_back(LastPTrue);
3178 MachineInstrs.push_back(
3179 BuildMI(MBB, MI, DL, TII->get(AArch64::CMPNE_PPzZI_B))
3180 .addReg(MI.getOperand(0).getReg(), RegState::Define)
3181 .addReg(*PredReg)
3182 .addReg(*ZPredReg)
3183 .addImm(0)
3184 .addReg(AArch64::NZCV, RegState::ImplicitDefine)
3185 .getInstr());
3186 if (IsNZCVUsed)
3187 MachineInstrs.push_back(BuildMI(MBB, MI, DL, TII->get(AArch64::MSR))
3188 .addImm(AArch64SysReg::NZCV)
3189 .addReg(NZCVSaveReg->freeRegister())
3190 .addReg(AArch64::NZCV, RegState::ImplicitDefine)
3191 .getInstr());
3192
3193 propagateFrameFlags(MI, MachineInstrs);
3194 return PredReg.hasSpilled();
3195}
3196
3197/// Expands all FILL_PPR_FROM_ZPR_SLOT_PSEUDO and SPILL_PPR_TO_ZPR_SLOT_PSEUDO
3198/// operations within the MachineBasicBlock \p MBB.
3199static bool expandSMEPPRToZPRSpillPseudos(MachineBasicBlock &MBB,
3200                                          const TargetRegisterInfo &TRI,
3201 ScavengeableRegs const &SR,
3202 EmergencyStackSlots &SpillSlots) {
3203 LiveRegUnits UsedRegs(TRI);
3204 UsedRegs.addLiveOuts(MBB);
3205 bool HasPPRSpills = false;
3206 MachineInstr *LastPTrue = nullptr;
3207  for (MachineInstr &MI : make_early_inc_range(reverse(MBB))) {
3208    UsedRegs.stepBackward(MI);
3209 switch (MI.getOpcode()) {
3210 case AArch64::FILL_PPR_FROM_ZPR_SLOT_PSEUDO:
3211 if (LastPTrue &&
3212 MI.definesRegister(LastPTrue->getOperand(0).getReg(), &TRI))
3213 LastPTrue = nullptr;
3214 HasPPRSpills |= expandFillPPRFromZPRSlotPseudo(MBB, MI, TRI, UsedRegs, SR,
3215 LastPTrue, SpillSlots);
3216 MI.eraseFromParent();
3217 break;
3218 case AArch64::SPILL_PPR_TO_ZPR_SLOT_PSEUDO:
3219 expandSpillPPRToZPRSlotPseudo(MBB, MI, TRI, UsedRegs, SR, SpillSlots);
3220 MI.eraseFromParent();
3221 [[fallthrough]];
3222 default:
3223 LastPTrue = nullptr;
3224 break;
3225 }
3226 }
3227
3228 return HasPPRSpills;
3229}
3230
3231void AArch64FrameLowering::processFunctionBeforeFrameFinalized(
3232    MachineFunction &MF, RegScavenger *RS) const {
3233
3234  AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
3235  const TargetSubtargetInfo &TSI = MF.getSubtarget();
3236 const TargetRegisterInfo &TRI = *TSI.getRegisterInfo();
3237
3238  // If predicate spills are 16 bytes we may need to expand
3239 // SPILL_PPR_TO_ZPR_SLOT_PSEUDO/FILL_PPR_FROM_ZPR_SLOT_PSEUDO.
3240 if (AFI->hasStackFrame() && TRI.getSpillSize(AArch64::PPRRegClass) == 16) {
3241 auto ComputeScavengeableRegisters = [&](unsigned RegClassID) {
3242 BitVector Regs = TRI.getAllocatableSet(MF, TRI.getRegClass(RegClassID));
3243 assert(Regs.count() > 0 && "Expected scavengeable registers");
3244 return Regs;
3245 };
3246
3247 ScavengeableRegs SR{};
3248 SR.ZPRRegs = ComputeScavengeableRegisters(AArch64::ZPRRegClassID);
3249 // Only p0-7 are possible as the second operand of cmpne (needed for fills).
3250 SR.PPR3bRegs = ComputeScavengeableRegisters(AArch64::PPR_3bRegClassID);
3251 SR.GPRRegs = ComputeScavengeableRegisters(AArch64::GPR64RegClassID);
3252
3253 EmergencyStackSlots SpillSlots;
3254 for (MachineBasicBlock &MBB : MF) {
3255      // In the case where we had to spill a predicate (in the range p0-p7) to
3256      // reload a predicate (>= p8), additional spill/fill pseudos will be
3257      // created. These need an additional expansion pass. Note: there will be
3258      // at most two expansion passes, as spilling/filling a predicate in the
3259      // range p0-p7 never requires spilling another predicate.
3260 for (int Pass = 0; Pass < 2; Pass++) {
3261 bool HasPPRSpills =
3262 expandSMEPPRToZPRSpillPseudos(MBB, TRI, SR, SpillSlots);
3263 assert((Pass == 0 || !HasPPRSpills) && "Did not expect PPR spills");
3264 if (!HasPPRSpills)
3265 break;
3266 }
3267 }
3268 }
3269
3270 MachineFrameInfo &MFI = MF.getFrameInfo();
3271
3273 "Upwards growing stack unsupported");
3274
3276
3277 // If this function isn't doing Win64-style C++ EH, we don't need to do
3278 // anything.
3279 if (!MF.hasEHFunclets())
3280 return;
3281
3282 // Win64 C++ EH needs to allocate space for the catch objects in the fixed
3283 // object area right next to the UnwindHelp object.
3284 WinEHFuncInfo &EHInfo = *MF.getWinEHFuncInfo();
3285 int64_t CurrentOffset =
3286      AFI->getVarArgsGPRSize() + AFI->getTailCallReservedStack();
3287  for (WinEHTryBlockMapEntry &TBME : EHInfo.TryBlockMap) {
3288 for (WinEHHandlerType &H : TBME.HandlerArray) {
3289 int FrameIndex = H.CatchObj.FrameIndex;
3290 if ((FrameIndex != INT_MAX) && MFI.getObjectOffset(FrameIndex) == 0) {
3291 CurrentOffset =
3292 alignTo(CurrentOffset, MFI.getObjectAlign(FrameIndex).value());
3293 CurrentOffset += MFI.getObjectSize(FrameIndex);
3294 MFI.setObjectOffset(FrameIndex, -CurrentOffset);
3295 }
3296 }
3297 }
3298
3299 // Create an UnwindHelp object.
3300 // The UnwindHelp object is allocated at the start of the fixed object area
3301 int64_t UnwindHelpOffset = alignTo(CurrentOffset + 8, Align(16));
3302 assert(UnwindHelpOffset == getFixedObjectSize(MF, AFI, /*IsWin64*/ true,
3303 /*IsFunclet*/ false) &&
3304 "UnwindHelpOffset must be at the start of the fixed object area");
3305 int UnwindHelpFI = MFI.CreateFixedObject(/*Size*/ 8, -UnwindHelpOffset,
3306 /*IsImmutable=*/false);
3307 EHInfo.UnwindHelpFrameIdx = UnwindHelpFI;
3308
3309 MachineBasicBlock &MBB = MF.front();
3310 auto MBBI = MBB.begin();
3311 while (MBBI != MBB.end() && MBBI->getFlag(MachineInstr::FrameSetup))
3312 ++MBBI;
3313
3314 // We need to store -2 into the UnwindHelp object at the start of the
3315 // function.
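  // The emitted sequence is roughly (with x8 standing in for whatever register
  // the scavenger happens to find):
  //   mov  x8, #-2
  //   stur x8, [<UnwindHelp slot>]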
3316 DebugLoc DL;
3317  RS->enterBasicBlockEnd(MBB);
3318  RS->backward(MBBI);
3319 Register DstReg = RS->FindUnusedReg(&AArch64::GPR64commonRegClass);
3320 assert(DstReg && "There must be a free register after frame setup");
3321  const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();
3322  BuildMI(MBB, MBBI, DL, TII.get(AArch64::MOVi64imm), DstReg).addImm(-2);
3323 BuildMI(MBB, MBBI, DL, TII.get(AArch64::STURXi))
3324 .addReg(DstReg, getKillRegState(true))
3325 .addFrameIndex(UnwindHelpFI)
3326 .addImm(0);
3327}
3328
3329namespace {
3330struct TagStoreInstr {
3331  MachineInstr *MI;
3332  int64_t Offset, Size;
3333 explicit TagStoreInstr(MachineInstr *MI, int64_t Offset, int64_t Size)
3334 : MI(MI), Offset(Offset), Size(Size) {}
3335};
3336
3337class TagStoreEdit {
3338 MachineFunction *MF;
3339 MachineBasicBlock *MBB;
3340 MachineRegisterInfo *MRI;
3341 // Tag store instructions that are being replaced.
3342  SmallVector<TagStoreInstr, 8> TagStores;
3343  // Combined memref arguments of the above instructions.
3344  SmallVector<MachineMemOperand *, 8> CombinedMemRefs;
3345
3346 // Replace allocation tags in [FrameReg + FrameRegOffset, FrameReg +
3347 // FrameRegOffset + Size) with the address tag of SP.
3348 Register FrameReg;
3349 StackOffset FrameRegOffset;
3350 int64_t Size;
3351 // If not std::nullopt, move FrameReg to (FrameReg + FrameRegUpdate) at the
3352 // end.
3353 std::optional<int64_t> FrameRegUpdate;
3354 // MIFlags for any FrameReg updating instructions.
3355 unsigned FrameRegUpdateFlags;
3356
3357 // Use zeroing instruction variants.
3358 bool ZeroData;
3359 DebugLoc DL;
3360
3361 void emitUnrolled(MachineBasicBlock::iterator InsertI);
3362 void emitLoop(MachineBasicBlock::iterator InsertI);
3363
3364public:
3365 TagStoreEdit(MachineBasicBlock *MBB, bool ZeroData)
3366 : MBB(MBB), ZeroData(ZeroData) {
3367 MF = MBB->getParent();
3368 MRI = &MF->getRegInfo();
3369 }
3370  // Add an instruction to be replaced. Instructions must be added in
3371  // ascending order of Offset and must be adjacent.
3372 void addInstruction(TagStoreInstr I) {
3373 assert((TagStores.empty() ||
3374 TagStores.back().Offset + TagStores.back().Size == I.Offset) &&
3375 "Non-adjacent tag store instructions.");
3376 TagStores.push_back(I);
3377 }
3378 void clear() { TagStores.clear(); }
3379 // Emit equivalent code at the given location, and erase the current set of
3380 // instructions. May skip if the replacement is not profitable. May invalidate
3381 // the input iterator and replace it with a valid one.
3382 void emitCode(MachineBasicBlock::iterator &InsertI,
3383 const AArch64FrameLowering *TFI, bool TryMergeSPUpdate);
3384};
3385
3386void TagStoreEdit::emitUnrolled(MachineBasicBlock::iterator InsertI) {
3387 const AArch64InstrInfo *TII =
3388 MF->getSubtarget<AArch64Subtarget>().getInstrInfo();
3389
3390 const int64_t kMinOffset = -256 * 16;
3391 const int64_t kMaxOffset = 255 * 16;
3392
3393 Register BaseReg = FrameReg;
3394 int64_t BaseRegOffsetBytes = FrameRegOffset.getFixed();
3395 if (BaseRegOffsetBytes < kMinOffset ||
3396 BaseRegOffsetBytes + (Size - Size % 32) > kMaxOffset ||
3397      // BaseReg can be FP, which is not necessarily aligned to 16 bytes. In
3398 // that case, BaseRegOffsetBytes will not be aligned to 16 bytes, which
3399 // is required for the offset of ST2G.
3400 BaseRegOffsetBytes % 16 != 0) {
3401 Register ScratchReg = MRI->createVirtualRegister(&AArch64::GPR64RegClass);
3402 emitFrameOffset(*MBB, InsertI, DL, ScratchReg, BaseReg,
3403 StackOffset::getFixed(BaseRegOffsetBytes), TII);
3404 BaseReg = ScratchReg;
3405 BaseRegOffsetBytes = 0;
3406 }
3407
3408 MachineInstr *LastI = nullptr;
3409 while (Size) {
3410 int64_t InstrSize = (Size > 16) ? 32 : 16;
3411 unsigned Opcode =
3412 InstrSize == 16
3413 ? (ZeroData ? AArch64::STZGi : AArch64::STGi)
3414 : (ZeroData ? AArch64::STZ2Gi : AArch64::ST2Gi);
3415 assert(BaseRegOffsetBytes % 16 == 0);
3416 MachineInstr *I = BuildMI(*MBB, InsertI, DL, TII->get(Opcode))
3417 .addReg(AArch64::SP)
3418 .addReg(BaseReg)
3419 .addImm(BaseRegOffsetBytes / 16)
3420 .setMemRefs(CombinedMemRefs);
3421 // A store to [BaseReg, #0] should go last for an opportunity to fold the
3422 // final SP adjustment in the epilogue.
3423 if (BaseRegOffsetBytes == 0)
3424 LastI = I;
3425 BaseRegOffsetBytes += InstrSize;
3426 Size -= InstrSize;
3427 }
3428
3429 if (LastI)
3430 MBB->splice(InsertI, MBB, LastI);
3431}
3432
3433void TagStoreEdit::emitLoop(MachineBasicBlock::iterator InsertI) {
3434 const AArch64InstrInfo *TII =
3435 MF->getSubtarget<AArch64Subtarget>().getInstrInfo();
3436
3437 Register BaseReg = FrameRegUpdate
3438 ? FrameReg
3439 : MRI->createVirtualRegister(&AArch64::GPR64RegClass);
3440 Register SizeReg = MRI->createVirtualRegister(&AArch64::GPR64RegClass);
3441
3442 emitFrameOffset(*MBB, InsertI, DL, BaseReg, FrameReg, FrameRegOffset, TII);
3443
3444 int64_t LoopSize = Size;
3445 // If the loop size is not a multiple of 32, split off one 16-byte store at
3446  // the end to fold the BaseReg update into.
3447 if (FrameRegUpdate && *FrameRegUpdate)
3448 LoopSize -= LoopSize % 32;
3449 MachineInstr *LoopI = BuildMI(*MBB, InsertI, DL,
3450 TII->get(ZeroData ? AArch64::STZGloop_wback
3451 : AArch64::STGloop_wback))
3452 .addDef(SizeReg)
3453 .addDef(BaseReg)
3454 .addImm(LoopSize)
3455 .addReg(BaseReg)
3456 .setMemRefs(CombinedMemRefs);
3457 if (FrameRegUpdate)
3458 LoopI->setFlags(FrameRegUpdateFlags);
3459
3460 int64_t ExtraBaseRegUpdate =
3461 FrameRegUpdate ? (*FrameRegUpdate - FrameRegOffset.getFixed() - Size) : 0;
3462 LLVM_DEBUG(dbgs() << "TagStoreEdit::emitLoop: LoopSize=" << LoopSize
3463 << ", Size=" << Size
3464 << ", ExtraBaseRegUpdate=" << ExtraBaseRegUpdate
3465 << ", FrameRegUpdate=" << FrameRegUpdate
3466 << ", FrameRegOffset.getFixed()="
3467 << FrameRegOffset.getFixed() << "\n");
3468 if (LoopSize < Size) {
3469 assert(FrameRegUpdate);
3470 assert(Size - LoopSize == 16);
3471 // Tag 16 more bytes at BaseReg and update BaseReg.
3472 int64_t STGOffset = ExtraBaseRegUpdate + 16;
3473 assert(STGOffset % 16 == 0 && STGOffset >= -4096 && STGOffset <= 4080 &&
3474 "STG immediate out of range");
3475 BuildMI(*MBB, InsertI, DL,
3476 TII->get(ZeroData ? AArch64::STZGPostIndex : AArch64::STGPostIndex))
3477 .addDef(BaseReg)
3478 .addReg(BaseReg)
3479 .addReg(BaseReg)
3480 .addImm(STGOffset / 16)
3481 .setMemRefs(CombinedMemRefs)
3482 .setMIFlags(FrameRegUpdateFlags);
3483 } else if (ExtraBaseRegUpdate) {
3484 // Update BaseReg.
3485 int64_t AddSubOffset = std::abs(ExtraBaseRegUpdate);
3486 assert(AddSubOffset <= 4095 && "ADD/SUB immediate out of range");
3487 BuildMI(
3488 *MBB, InsertI, DL,
3489 TII->get(ExtraBaseRegUpdate > 0 ? AArch64::ADDXri : AArch64::SUBXri))
3490 .addDef(BaseReg)
3491 .addReg(BaseReg)
3492 .addImm(AddSubOffset)
3493 .addImm(0)
3494 .setMIFlags(FrameRegUpdateFlags);
3495 }
3496}
3497
3498// Check if *II is a register update that can be merged into STGloop that ends
3499// at (Reg + Size). RemainingOffset is the required adjustment to Reg after the
3500// end of the loop.
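// For example (purely illustrative): a loop that tags Size = 256 bytes and is
// followed by 'ADD Reg, Reg, #272' gives Offset = 272 and PostOffset = 16,
// which is within range for both forms below, so the update can be merged and
// *TotalOffset is set to 272.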
3501bool canMergeRegUpdate(MachineBasicBlock::iterator II, unsigned Reg,
3502 int64_t Size, int64_t *TotalOffset) {
3503 MachineInstr &MI = *II;
3504 if ((MI.getOpcode() == AArch64::ADDXri ||
3505 MI.getOpcode() == AArch64::SUBXri) &&
3506 MI.getOperand(0).getReg() == Reg && MI.getOperand(1).getReg() == Reg) {
3507 unsigned Shift = AArch64_AM::getShiftValue(MI.getOperand(3).getImm());
3508 int64_t Offset = MI.getOperand(2).getImm() << Shift;
3509 if (MI.getOpcode() == AArch64::SUBXri)
3510 Offset = -Offset;
3511 int64_t PostOffset = Offset - Size;
3512 // TagStoreEdit::emitLoop might emit either an ADD/SUB after the loop, or
3513 // an STGPostIndex which does the last 16 bytes of tag write. Which one is
3514 // chosen depends on the alignment of the loop size, but the difference
3515 // between the valid ranges for the two instructions is small, so we
3516 // conservatively assume that it could be either case here.
3517 //
3518 // Max offset of STGPostIndex, minus the 16 byte tag write folded into that
3519 // instruction.
3520 const int64_t kMaxOffset = 4080 - 16;
3521 // Max offset of SUBXri.
3522 const int64_t kMinOffset = -4095;
3523 if (PostOffset <= kMaxOffset && PostOffset >= kMinOffset &&
3524 PostOffset % 16 == 0) {
3525 *TotalOffset = Offset;
3526 return true;
3527 }
3528 }
3529 return false;
3530}
3531
3532void mergeMemRefs(const SmallVectorImpl<TagStoreInstr> &TSE,
3533                  SmallVectorImpl<MachineMemOperand *> &MemRefs) {
3534  MemRefs.clear();
3535 for (auto &TS : TSE) {
3536 MachineInstr *MI = TS.MI;
3537 // An instruction without memory operands may access anything. Be
3538 // conservative and return an empty list.
3539 if (MI->memoperands_empty()) {
3540 MemRefs.clear();
3541 return;
3542 }
3543 MemRefs.append(MI->memoperands_begin(), MI->memoperands_end());
3544 }
3545}
3546
3547void TagStoreEdit::emitCode(MachineBasicBlock::iterator &InsertI,
3548 const AArch64FrameLowering *TFI,
3549 bool TryMergeSPUpdate) {
3550 if (TagStores.empty())
3551 return;
3552 TagStoreInstr &FirstTagStore = TagStores[0];
3553 TagStoreInstr &LastTagStore = TagStores[TagStores.size() - 1];
3554 Size = LastTagStore.Offset - FirstTagStore.Offset + LastTagStore.Size;
3555 DL = TagStores[0].MI->getDebugLoc();
3556
3557 Register Reg;
3558 FrameRegOffset = TFI->resolveFrameOffsetReference(
3559 *MF, FirstTagStore.Offset, false /*isFixed*/,
3560 TargetStackID::Default /*StackID*/, Reg,
3561 /*PreferFP=*/false, /*ForSimm=*/true);
3562 FrameReg = Reg;
3563 FrameRegUpdate = std::nullopt;
3564
3565 mergeMemRefs(TagStores, CombinedMemRefs);
3566
3567 LLVM_DEBUG({
3568 dbgs() << "Replacing adjacent STG instructions:\n";
3569 for (const auto &Instr : TagStores) {
3570 dbgs() << " " << *Instr.MI;
3571 }
3572 });
3573
3574 // Size threshold where a loop becomes shorter than a linear sequence of
3575 // tagging instructions.
3576 const int kSetTagLoopThreshold = 176;
3577 if (Size < kSetTagLoopThreshold) {
3578 if (TagStores.size() < 2)
3579 return;
3580 emitUnrolled(InsertI);
3581 } else {
3582 MachineInstr *UpdateInstr = nullptr;
3583 int64_t TotalOffset = 0;
3584 if (TryMergeSPUpdate) {
3585 // See if we can merge base register update into the STGloop.
3586 // This is done in AArch64LoadStoreOptimizer for "normal" stores,
3587 // but STGloop is way too unusual for that, and also it only
3588 // realistically happens in function epilogue. Also, STGloop is expanded
3589 // before that pass.
3590 if (InsertI != MBB->end() &&
3591 canMergeRegUpdate(InsertI, FrameReg, FrameRegOffset.getFixed() + Size,
3592 &TotalOffset)) {
3593 UpdateInstr = &*InsertI++;
3594 LLVM_DEBUG(dbgs() << "Folding SP update into loop:\n "
3595 << *UpdateInstr);
3596 }
3597 }
3598
3599 if (!UpdateInstr && TagStores.size() < 2)
3600 return;
3601
3602 if (UpdateInstr) {
3603 FrameRegUpdate = TotalOffset;
3604 FrameRegUpdateFlags = UpdateInstr->getFlags();
3605 }
3606 emitLoop(InsertI);
3607 if (UpdateInstr)
3608 UpdateInstr->eraseFromParent();
3609 }
3610
3611 for (auto &TS : TagStores)
3612 TS.MI->eraseFromParent();
3613}
3614
3615bool isMergeableStackTaggingInstruction(MachineInstr &MI, int64_t &Offset,
3616 int64_t &Size, bool &ZeroData) {
3617 MachineFunction &MF = *MI.getParent()->getParent();
3618 const MachineFrameInfo &MFI = MF.getFrameInfo();
3619
3620 unsigned Opcode = MI.getOpcode();
3621 ZeroData = (Opcode == AArch64::STZGloop || Opcode == AArch64::STZGi ||
3622 Opcode == AArch64::STZ2Gi);
3623
3624 if (Opcode == AArch64::STGloop || Opcode == AArch64::STZGloop) {
3625 if (!MI.getOperand(0).isDead() || !MI.getOperand(1).isDead())
3626 return false;
3627 if (!MI.getOperand(2).isImm() || !MI.getOperand(3).isFI())
3628 return false;
3629 Offset = MFI.getObjectOffset(MI.getOperand(3).getIndex());
3630 Size = MI.getOperand(2).getImm();
3631 return true;
3632 }
3633
3634 if (Opcode == AArch64::STGi || Opcode == AArch64::STZGi)
3635 Size = 16;
3636 else if (Opcode == AArch64::ST2Gi || Opcode == AArch64::STZ2Gi)
3637 Size = 32;
3638 else
3639 return false;
3640
3641 if (MI.getOperand(0).getReg() != AArch64::SP || !MI.getOperand(1).isFI())
3642 return false;
3643
3644 Offset = MFI.getObjectOffset(MI.getOperand(1).getIndex()) +
3645 16 * MI.getOperand(2).getImm();
3646 return true;
3647}
3648
3649// Detect a run of memory tagging instructions for adjacent stack frame slots,
3650// and replace them with a shorter instruction sequence:
3651// * replace STG + STG with ST2G
3652// * replace STGloop + STGloop with STGloop
3653// This code needs to run when stack slot offsets are already known, but before
3654// FrameIndex operands in STG instructions are eliminated.
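// For instance (illustrative only), two 16-byte tag stores to adjacent slots
//   STGi $sp, %stack.0, 0
//   STGi $sp, %stack.1, 0   ; where %stack.1 sits 16 bytes above %stack.0
// can be rewritten as a single
//   ST2Gi $sp, %stack.0, 0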
3655MachineBasicBlock::iterator tryMergeAdjacentSTG(MachineBasicBlock::iterator II,
3656                                                const AArch64FrameLowering *TFI,
3657 RegScavenger *RS) {
3658 bool FirstZeroData;
3659 int64_t Size, Offset;
3660 MachineInstr &MI = *II;
3661 MachineBasicBlock *MBB = MI.getParent();
3662  MachineBasicBlock::iterator NextI = ++II;
3663  if (&MI == &MBB->instr_back())
3664 return II;
3665 if (!isMergeableStackTaggingInstruction(MI, Offset, Size, FirstZeroData))
3666 return II;
3667
3668  SmallVector<TagStoreInstr, 8> Instrs;
3669  Instrs.emplace_back(&MI, Offset, Size);
3670
3671 constexpr int kScanLimit = 10;
3672 int Count = 0;
3673  for (MachineBasicBlock::iterator E = MBB->end();
3674       NextI != E && Count < kScanLimit; ++NextI) {
3675 MachineInstr &MI = *NextI;
3676 bool ZeroData;
3677 int64_t Size, Offset;
3678 // Collect instructions that update memory tags with a FrameIndex operand
3679 // and (when applicable) constant size, and whose output registers are dead
3680 // (the latter is almost always the case in practice). Since these
3681 // instructions effectively have no inputs or outputs, we are free to skip
3682 // any non-aliasing instructions in between without tracking used registers.
3683 if (isMergeableStackTaggingInstruction(MI, Offset, Size, ZeroData)) {
3684 if (ZeroData != FirstZeroData)
3685 break;
3686 Instrs.emplace_back(&MI, Offset, Size);
3687 continue;
3688 }
3689
3690 // Only count non-transient, non-tagging instructions toward the scan
3691 // limit.
3692 if (!MI.isTransient())
3693 ++Count;
3694
3695 // Just in case, stop before the epilogue code starts.
3696 if (MI.getFlag(MachineInstr::FrameSetup) ||
3697        MI.getFlag(MachineInstr::FrameDestroy))
3698      break;
3699
3700 // Reject anything that may alias the collected instructions.
3701 if (MI.mayLoadOrStore() || MI.hasUnmodeledSideEffects() || MI.isCall())
3702 break;
3703 }
3704
3705 // New code will be inserted after the last tagging instruction we've found.
3706 MachineBasicBlock::iterator InsertI = Instrs.back().MI;
3707
3708  // All the gathered stack tag instructions are merged and placed after the
3709  // last tag store in the list. We must check whether the NZCV flag is live
3710  // at the point where we are trying to insert; otherwise it might get
3711  // clobbered if any STG loops are emitted.
3712
3713  // FIXME: This approach of bailing out of the merge is conservative: the
3714  // liveness check is performed even when the merged sequence contains no
3715  // STG loops, in which case it is not needed.
3716  LivePhysRegs LiveRegs(*(MBB->getParent()->getSubtarget().getRegisterInfo()));
3717  LiveRegs.addLiveOuts(*MBB);
3718 for (auto I = MBB->rbegin();; ++I) {
3719 MachineInstr &MI = *I;
3720 if (MI == InsertI)
3721 break;
3722 LiveRegs.stepBackward(*I);
3723 }
3724 InsertI++;
3725 if (LiveRegs.contains(AArch64::NZCV))
3726 return InsertI;
3727
3728 llvm::stable_sort(Instrs,
3729 [](const TagStoreInstr &Left, const TagStoreInstr &Right) {
3730 return Left.Offset < Right.Offset;
3731 });
3732
3733 // Make sure that we don't have any overlapping stores.
3734 int64_t CurOffset = Instrs[0].Offset;
3735 for (auto &Instr : Instrs) {
3736 if (CurOffset > Instr.Offset)
3737 return NextI;
3738 CurOffset = Instr.Offset + Instr.Size;
3739 }
3740
3741 // Find contiguous runs of tagged memory and emit shorter instruction
3742 // sequences for them when possible.
3743 TagStoreEdit TSE(MBB, FirstZeroData);
3744 std::optional<int64_t> EndOffset;
3745 for (auto &Instr : Instrs) {
3746 if (EndOffset && *EndOffset != Instr.Offset) {
3747 // Found a gap.
3748 TSE.emitCode(InsertI, TFI, /*TryMergeSPUpdate = */ false);
3749 TSE.clear();
3750 }
3751
3752 TSE.addInstruction(Instr);
3753 EndOffset = Instr.Offset + Instr.Size;
3754 }
3755
3756 const MachineFunction *MF = MBB->getParent();
3757 // Multiple FP/SP updates in a loop cannot be described by CFI instructions.
3758 TSE.emitCode(
3759 InsertI, TFI, /*TryMergeSPUpdate = */
3760      !MF->getInfo<AArch64FunctionInfo>()->needsAsyncDwarfUnwindInfo(*MF));
3761
3762 return InsertI;
3763}
3764} // namespace
3765
3766void AArch64FrameLowering::processFunctionBeforeFrameIndicesReplaced(
3767    MachineFunction &MF, RegScavenger *RS = nullptr) const {
3768 for (auto &BB : MF)
3769 for (MachineBasicBlock::iterator II = BB.begin(); II != BB.end();) {
3770      if (StackTaggingMergeSetTag)
3771        II = tryMergeAdjacentSTG(II, this, RS);
3772 }
3773
3774 // By the time this method is called, most of the prologue/epilogue code is
3775 // already emitted, whether its location was affected by the shrink-wrapping
3776 // optimization or not.
3777 if (!MF.getFunction().hasFnAttribute(Attribute::Naked) &&
3778 shouldSignReturnAddressEverywhere(MF))
3779    emitPacRetPlusLeafHardening(MF);
3780}
3781
3782/// For Win64 AArch64 EH, the offset to the Unwind object is from the SP
3783/// before the update. This is easily retrieved as it is exactly the offset
3784/// that is set in processFunctionBeforeFrameFinalized.
3785StackOffset AArch64FrameLowering::getFrameIndexReferencePreferSP(
3786    const MachineFunction &MF, int FI, Register &FrameReg,
3787 bool IgnoreSPUpdates) const {
3788 const MachineFrameInfo &MFI = MF.getFrameInfo();
3789 if (IgnoreSPUpdates) {
3790 LLVM_DEBUG(dbgs() << "Offset from the SP for " << FI << " is "
3791 << MFI.getObjectOffset(FI) << "\n");
3792 FrameReg = AArch64::SP;
3793 return StackOffset::getFixed(MFI.getObjectOffset(FI));
3794 }
3795
3796 // Go to common code if we cannot provide sp + offset.
3797 if (MFI.hasVarSizedObjects() ||
3800 return getFrameIndexReference(MF, FI, FrameReg);
3801
3802 FrameReg = AArch64::SP;
3803 return getStackOffset(MF, MFI.getObjectOffset(FI));
3804}
3805
3806/// The parent frame offset (aka dispFrame) is only used on X86_64 to retrieve
3807/// the parent's frame pointer
3808unsigned AArch64FrameLowering::getWinEHParentFrameOffset(
3809    const MachineFunction &MF) const {
3810 return 0;
3811}
3812
3813/// Funclets only need to account for space for the callee saved registers,
3814/// as the locals are accounted for in the parent's stack frame.
3815unsigned AArch64FrameLowering::getWinEHFuncletFrameSize(
3816    const MachineFunction &MF) const {
3817 // This is the size of the pushed CSRs.
3818 unsigned CSSize =
3819 MF.getInfo<AArch64FunctionInfo>()->getCalleeSavedStackSize();
3820 // This is the amount of stack a funclet needs to allocate.
3821 return alignTo(CSSize + MF.getFrameInfo().getMaxCallFrameSize(),
3822 getStackAlign());
3823}
3824
3825namespace {
3826struct FrameObject {
3827 bool IsValid = false;
3828 // Index of the object in MFI.
3829 int ObjectIndex = 0;
3830 // Group ID this object belongs to.
3831 int GroupIndex = -1;
3832 // This object should be placed first (closest to SP).
3833 bool ObjectFirst = false;
3834 // This object's group (which always contains the object with
3835 // ObjectFirst==true) should be placed first.
3836 bool GroupFirst = false;
3837
3838 // Used to distinguish between FP and GPR accesses. The values are decided so
3839 // that they sort FPR < Hazard < GPR and they can be or'd together.
3840 unsigned Accesses = 0;
3841 enum { AccessFPR = 1, AccessHazard = 2, AccessGPR = 4 };
3842};
3843
3844class GroupBuilder {
3845 SmallVector<int, 8> CurrentMembers;
3846 int NextGroupIndex = 0;
3847 std::vector<FrameObject> &Objects;
3848
3849public:
3850 GroupBuilder(std::vector<FrameObject> &Objects) : Objects(Objects) {}
3851 void AddMember(int Index) { CurrentMembers.push_back(Index); }
3852 void EndCurrentGroup() {
3853 if (CurrentMembers.size() > 1) {
3854 // Create a new group with the current member list. This might remove them
3855 // from their pre-existing groups. That's OK, dealing with overlapping
3856 // groups is too hard and unlikely to make a difference.
3857 LLVM_DEBUG(dbgs() << "group:");
3858 for (int Index : CurrentMembers) {
3859 Objects[Index].GroupIndex = NextGroupIndex;
3860 LLVM_DEBUG(dbgs() << " " << Index);
3861 }
3862 LLVM_DEBUG(dbgs() << "\n");
3863 NextGroupIndex++;
3864 }
3865 CurrentMembers.clear();
3866 }
3867};
3868
3869bool FrameObjectCompare(const FrameObject &A, const FrameObject &B) {
3870 // Objects at a lower index are closer to FP; objects at a higher index are
3871 // closer to SP.
3872 //
3873 // For consistency in our comparison, all invalid objects are placed
3874 // at the end. This also allows us to stop walking when we hit the
3875 // first invalid item after it's all sorted.
3876 //
3877 // If we want to include a stack hazard region, order FPR accesses < the
3878 // hazard object < GPRs accesses in order to create a separation between the
3879 // two. For the Accesses field 1 = FPR, 2 = Hazard Object, 4 = GPR.
3880 //
3881 // Otherwise the "first" object goes first (closest to SP), followed by the
3882 // members of the "first" group.
3883 //
3884 // The rest are sorted by the group index to keep the groups together.
3885 // Higher numbered groups are more likely to be around longer (i.e. untagged
3886 // in the function epilogue and not at some earlier point). Place them closer
3887 // to SP.
3888 //
3889 // If all else equal, sort by the object index to keep the objects in the
3890 // original order.
3891 return std::make_tuple(!A.IsValid, A.Accesses, A.ObjectFirst, A.GroupFirst,
3892 A.GroupIndex, A.ObjectIndex) <
3893 std::make_tuple(!B.IsValid, B.Accesses, B.ObjectFirst, B.GroupFirst,
3894 B.GroupIndex, B.ObjectIndex);
3895}
3896} // namespace
3897
3898void AArch64FrameLowering::orderFrameObjects(
3899    const MachineFunction &MF, SmallVectorImpl<int> &ObjectsToAllocate) const {
3900  const AArch64FunctionInfo &AFI = *MF.getInfo<AArch64FunctionInfo>();
3901
3902 if ((!OrderFrameObjects && !AFI.hasSplitSVEObjects()) ||
3903 ObjectsToAllocate.empty())
3904 return;
3905
3906 const MachineFrameInfo &MFI = MF.getFrameInfo();
3907 std::vector<FrameObject> FrameObjects(MFI.getObjectIndexEnd());
3908 for (auto &Obj : ObjectsToAllocate) {
3909 FrameObjects[Obj].IsValid = true;
3910 FrameObjects[Obj].ObjectIndex = Obj;
3911 }
3912
3913 // Identify FPR vs GPR slots for hazards, and stack slots that are tagged at
3914 // the same time.
3915 GroupBuilder GB(FrameObjects);
3916 for (auto &MBB : MF) {
3917 for (auto &MI : MBB) {
3918 if (MI.isDebugInstr())
3919 continue;
3920
3921 if (AFI.hasStackHazardSlotIndex()) {
3922 std::optional<int> FI = getLdStFrameID(MI, MFI);
3923 if (FI && *FI >= 0 && *FI < (int)FrameObjects.size()) {
3924 if (MFI.getStackID(*FI) == TargetStackID::ScalableVector ||
3925            AArch64InstrInfo::isFpOrNEON(MI))
3926          FrameObjects[*FI].Accesses |= FrameObject::AccessFPR;
3927 else
3928 FrameObjects[*FI].Accesses |= FrameObject::AccessGPR;
3929 }
3930 }
3931
3932 int OpIndex;
3933 switch (MI.getOpcode()) {
3934 case AArch64::STGloop:
3935 case AArch64::STZGloop:
3936 OpIndex = 3;
3937 break;
3938 case AArch64::STGi:
3939 case AArch64::STZGi:
3940 case AArch64::ST2Gi:
3941 case AArch64::STZ2Gi:
3942 OpIndex = 1;
3943 break;
3944 default:
3945 OpIndex = -1;
3946 }
3947
3948 int TaggedFI = -1;
3949 if (OpIndex >= 0) {
3950 const MachineOperand &MO = MI.getOperand(OpIndex);
3951 if (MO.isFI()) {
3952 int FI = MO.getIndex();
3953 if (FI >= 0 && FI < MFI.getObjectIndexEnd() &&
3954 FrameObjects[FI].IsValid)
3955 TaggedFI = FI;
3956 }
3957 }
3958
3959 // If this is a stack tagging instruction for a slot that is not part of a
3960 // group yet, either start a new group or add it to the current one.
3961 if (TaggedFI >= 0)
3962 GB.AddMember(TaggedFI);
3963 else
3964 GB.EndCurrentGroup();
3965 }
3966 // Groups should never span multiple basic blocks.
3967 GB.EndCurrentGroup();
3968 }
3969
3970 if (AFI.hasStackHazardSlotIndex()) {
3971 FrameObjects[AFI.getStackHazardSlotIndex()].Accesses =
3972 FrameObject::AccessHazard;
3973 // If a stack object is unknown or both GPR and FPR, sort it into GPR.
3974 for (auto &Obj : FrameObjects)
3975 if (!Obj.Accesses ||
3976 Obj.Accesses == (FrameObject::AccessGPR | FrameObject::AccessFPR))
3977 Obj.Accesses = FrameObject::AccessGPR;
3978 }
3979
3980 // If the function's tagged base pointer is pinned to a stack slot, we want to
3981 // put that slot first when possible. This will likely place it at SP + 0,
3982 // and save one instruction when generating the base pointer because IRG does
3983 // not allow an immediate offset.
3984 std::optional<int> TBPI = AFI.getTaggedBasePointerIndex();
3985 if (TBPI) {
3986 FrameObjects[*TBPI].ObjectFirst = true;
3987 FrameObjects[*TBPI].GroupFirst = true;
3988 int FirstGroupIndex = FrameObjects[*TBPI].GroupIndex;
3989 if (FirstGroupIndex >= 0)
3990 for (FrameObject &Object : FrameObjects)
3991 if (Object.GroupIndex == FirstGroupIndex)
3992 Object.GroupFirst = true;
3993 }
3994
3995 llvm::stable_sort(FrameObjects, FrameObjectCompare);
3996
3997 int i = 0;
3998 for (auto &Obj : FrameObjects) {
3999 // All invalid items are sorted at the end, so it's safe to stop.
4000 if (!Obj.IsValid)
4001 break;
4002 ObjectsToAllocate[i++] = Obj.ObjectIndex;
4003 }
4004
4005 LLVM_DEBUG({
4006 dbgs() << "Final frame order:\n";
4007 for (auto &Obj : FrameObjects) {
4008 if (!Obj.IsValid)
4009 break;
4010 dbgs() << " " << Obj.ObjectIndex << ": group " << Obj.GroupIndex;
4011 if (Obj.ObjectFirst)
4012 dbgs() << ", first";
4013 if (Obj.GroupFirst)
4014 dbgs() << ", group-first";
4015 dbgs() << "\n";
4016 }
4017 });
4018}
4019
4020/// Emit a loop to decrement SP until it is equal to TargetReg, with probes at
4021/// least every ProbeSize bytes. Returns an iterator of the first instruction
4022/// after the loop. The difference between SP and TargetReg must be an exact
4023/// multiple of ProbeSize.
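/// The emitted loop is roughly (TargetReg shown as $x9 for illustration):
///   LoopMBB:
///     SUB  sp, sp, #ProbeSize
///     STR  xzr, [sp]
///     CMP  sp, $x9
///     B.NE LoopMBB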
4024MachineBasicBlock::iterator
4025AArch64FrameLowering::inlineStackProbeLoopExactMultiple(
4026 MachineBasicBlock::iterator MBBI, int64_t ProbeSize,
4027 Register TargetReg) const {
4028 MachineBasicBlock &MBB = *MBBI->getParent();
4029 MachineFunction &MF = *MBB.getParent();
4030 const AArch64InstrInfo *TII =
4031 MF.getSubtarget<AArch64Subtarget>().getInstrInfo();
4032 DebugLoc DL = MBB.findDebugLoc(MBBI);
4033
4034 MachineFunction::iterator MBBInsertPoint = std::next(MBB.getIterator());
4035 MachineBasicBlock *LoopMBB = MF.CreateMachineBasicBlock(MBB.getBasicBlock());
4036 MF.insert(MBBInsertPoint, LoopMBB);
4037 MachineBasicBlock *ExitMBB = MF.CreateMachineBasicBlock(MBB.getBasicBlock());
4038 MF.insert(MBBInsertPoint, ExitMBB);
4039
4040 // SUB SP, SP, #ProbeSize (or equivalent if ProbeSize is not encodable
4041 // in SUB).
4042 emitFrameOffset(*LoopMBB, LoopMBB->end(), DL, AArch64::SP, AArch64::SP,
4043 StackOffset::getFixed(-ProbeSize), TII,
4044                  MachineInstr::FrameSetup);
4045  // STR XZR, [SP]
4046 BuildMI(*LoopMBB, LoopMBB->end(), DL, TII->get(AArch64::STRXui))
4047 .addReg(AArch64::XZR)
4048 .addReg(AArch64::SP)
4049 .addImm(0)
4050      .setMIFlags(MachineInstr::FrameSetup);
4051  // CMP SP, TargetReg
4052 BuildMI(*LoopMBB, LoopMBB->end(), DL, TII->get(AArch64::SUBSXrx64),
4053 AArch64::XZR)
4054 .addReg(AArch64::SP)
4055 .addReg(TargetReg)
4056      .addImm(AArch64_AM::getArithExtendImm(AArch64_AM::UXTX, 0))
4057      .setMIFlags(MachineInstr::FrameSetup);
4058  // B.CC Loop
4059 BuildMI(*LoopMBB, LoopMBB->end(), DL, TII->get(AArch64::Bcc))
4060      .addImm(AArch64CC::NE)
4061      .addMBB(LoopMBB)
4062      .setMIFlags(MachineInstr::FrameSetup);
4063
4064 LoopMBB->addSuccessor(ExitMBB);
4065 LoopMBB->addSuccessor(LoopMBB);
4066 // Synthesize the exit MBB.
4067 ExitMBB->splice(ExitMBB->end(), &MBB, MBBI, MBB.end());
4068  ExitMBB->transferSuccessorsAndUpdatePHIs(&MBB);
4069  MBB.addSuccessor(LoopMBB);
4070 // Update liveins.
4071 fullyRecomputeLiveIns({ExitMBB, LoopMBB});
4072
4073 return ExitMBB->begin();
4074}
4075
4076void AArch64FrameLowering::inlineStackProbeFixed(
4077 MachineBasicBlock::iterator MBBI, Register ScratchReg, int64_t FrameSize,
4078 StackOffset CFAOffset) const {
4079 MachineBasicBlock *MBB = MBBI->getParent();
4080 MachineFunction &MF = *MBB->getParent();
4081 const AArch64InstrInfo *TII =
4082 MF.getSubtarget<AArch64Subtarget>().getInstrInfo();
4083 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
4084 bool EmitAsyncCFI = AFI->needsAsyncDwarfUnwindInfo(MF);
4085 bool HasFP = hasFP(MF);
4086
4087 DebugLoc DL;
4088 int64_t ProbeSize = MF.getInfo<AArch64FunctionInfo>()->getStackProbeSize();
4089 int64_t NumBlocks = FrameSize / ProbeSize;
4090 int64_t ResidualSize = FrameSize % ProbeSize;
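  // For example, with a probe size of 4096 bytes, a 9000-byte frame would be
  // allocated as two probed 4096-byte blocks plus an 808-byte residual.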
4091
4092 LLVM_DEBUG(dbgs() << "Stack probing: total " << FrameSize << " bytes, "
4093 << NumBlocks << " blocks of " << ProbeSize
4094 << " bytes, plus " << ResidualSize << " bytes\n");
4095
4096 // Decrement SP by NumBlock * ProbeSize bytes, with either unrolled or
4097 // ordinary loop.
4098 if (NumBlocks <= AArch64::StackProbeMaxLoopUnroll) {
4099 for (int i = 0; i < NumBlocks; ++i) {
4100 // SUB SP, SP, #ProbeSize (or equivalent if ProbeSize is not
4101 // encodable in a SUB).
4102 emitFrameOffset(*MBB, MBBI, DL, AArch64::SP, AArch64::SP,
4103 StackOffset::getFixed(-ProbeSize), TII,
4104 MachineInstr::FrameSetup, false, false, nullptr,
4105 EmitAsyncCFI && !HasFP, CFAOffset);
4106 CFAOffset += StackOffset::getFixed(ProbeSize);
4107 // STR XZR, [SP]
4108 BuildMI(*MBB, MBBI, DL, TII->get(AArch64::STRXui))
4109 .addReg(AArch64::XZR)
4110 .addReg(AArch64::SP)
4111 .addImm(0)
4112          .setMIFlags(MachineInstr::FrameSetup);
4113    }
4114 } else if (NumBlocks != 0) {
4115    // SUB ScratchReg, SP, #(ProbeSize * NumBlocks) (or equivalent if not
4116    // encodable in SUB). ScratchReg may temporarily become the CFA register.
4117 emitFrameOffset(*MBB, MBBI, DL, ScratchReg, AArch64::SP,
4118 StackOffset::getFixed(-ProbeSize * NumBlocks), TII,
4119 MachineInstr::FrameSetup, false, false, nullptr,
4120 EmitAsyncCFI && !HasFP, CFAOffset);
4121 CFAOffset += StackOffset::getFixed(ProbeSize * NumBlocks);
4122 MBBI = inlineStackProbeLoopExactMultiple(MBBI, ProbeSize, ScratchReg);
4123 MBB = MBBI->getParent();
4124 if (EmitAsyncCFI && !HasFP) {
4125 // Set the CFA register back to SP.
4126 CFIInstBuilder(*MBB, MBBI, MachineInstr::FrameSetup)
4127 .buildDefCFARegister(AArch64::SP);
4128 }
4129 }
4130
4131 if (ResidualSize != 0) {
4132 // SUB SP, SP, #ResidualSize (or equivalent if ResidualSize is not encodable
4133 // in SUB).
4134 emitFrameOffset(*MBB, MBBI, DL, AArch64::SP, AArch64::SP,
4135 StackOffset::getFixed(-ResidualSize), TII,
4136 MachineInstr::FrameSetup, false, false, nullptr,
4137 EmitAsyncCFI && !HasFP, CFAOffset);
4138 if (ResidualSize > AArch64::StackProbeMaxUnprobedStack) {
4139 // STR XZR, [SP]
4140 BuildMI(*MBB, MBBI, DL, TII->get(AArch64::STRXui))
4141 .addReg(AArch64::XZR)
4142 .addReg(AArch64::SP)
4143 .addImm(0)
4144          .setMIFlags(MachineInstr::FrameSetup);
4145    }
4146 }
4147}
4148
4149void AArch64FrameLowering::inlineStackProbe(MachineFunction &MF,
4150 MachineBasicBlock &MBB) const {
4151 // Get the instructions that need to be replaced. We emit at most two of
4152 // these. Remember them in order to avoid complications coming from the need
4153 // to traverse the block while potentially creating more blocks.
4154 SmallVector<MachineInstr *, 4> ToReplace;
4155 for (MachineInstr &MI : MBB)
4156 if (MI.getOpcode() == AArch64::PROBED_STACKALLOC ||
4157 MI.getOpcode() == AArch64::PROBED_STACKALLOC_VAR)
4158 ToReplace.push_back(&MI);
4159
4160 for (MachineInstr *MI : ToReplace) {
4161 if (MI->getOpcode() == AArch64::PROBED_STACKALLOC) {
4162 Register ScratchReg = MI->getOperand(0).getReg();
4163 int64_t FrameSize = MI->getOperand(1).getImm();
4164 StackOffset CFAOffset = StackOffset::get(MI->getOperand(2).getImm(),
4165 MI->getOperand(3).getImm());
4166 inlineStackProbeFixed(MI->getIterator(), ScratchReg, FrameSize,
4167 CFAOffset);
4168 } else {
4169 assert(MI->getOpcode() == AArch64::PROBED_STACKALLOC_VAR &&
4170 "Stack probe pseudo-instruction expected");
4171 const AArch64InstrInfo *TII =
4172 MI->getMF()->getSubtarget<AArch64Subtarget>().getInstrInfo();
4173 Register TargetReg = MI->getOperand(0).getReg();
4174 (void)TII->probedStackAlloc(MI->getIterator(), TargetReg, true);
4175 }
4176 MI->eraseFromParent();
4177 }
4178}
4179
4180struct StackAccess {
4181  enum AccessType {
4182    NotAccessed = 0, // Stack object not accessed by load/store instructions.
4183 GPR = 1 << 0, // A general purpose register.
4184 PPR = 1 << 1, // A predicate register.
4185 FPR = 1 << 2, // A floating point/Neon/SVE register.
4186 };
4187
4188 int Idx;
4189  StackOffset Offset;
4190  int64_t Size;
4191 unsigned AccessTypes;
4192
4193  StackAccess() : Idx(0), Size(0), AccessTypes(NotAccessed) {}
4194
4195 bool operator<(const StackAccess &Rhs) const {
4196 return std::make_tuple(start(), Idx) <
4197 std::make_tuple(Rhs.start(), Rhs.Idx);
4198 }
4199
4200 bool isCPU() const {
4201 // Predicate register load and store instructions execute on the CPU.
4202    return AccessTypes & (AccessType::GPR | AccessType::PPR);
4203  }
4204 bool isSME() const { return AccessTypes & AccessType::FPR; }
4205 bool isMixed() const { return isCPU() && isSME(); }
4206
4207 int64_t start() const { return Offset.getFixed() + Offset.getScalable(); }
4208 int64_t end() const { return start() + Size; }
4209
4210 std::string getTypeString() const {
4211 switch (AccessTypes) {
4212 case AccessType::FPR:
4213 return "FPR";
4214 case AccessType::PPR:
4215 return "PPR";
4216 case AccessType::GPR:
4217 return "GPR";
4218    case AccessType::NotAccessed:
4219      return "NA";
4220 default:
4221 return "Mixed";
4222 }
4223 }
4224
4225 void print(raw_ostream &OS) const {
4226 OS << getTypeString() << " stack object at [SP"
4227 << (Offset.getFixed() < 0 ? "" : "+") << Offset.getFixed();
4228 if (Offset.getScalable())
4229 OS << (Offset.getScalable() < 0 ? "" : "+") << Offset.getScalable()
4230 << " * vscale";
4231 OS << "]";
4232 }
4233};
4234
4235static inline raw_ostream &operator<<(raw_ostream &OS, const StackAccess &SA) {
4236 SA.print(OS);
4237 return OS;
4238}
4239
4240void AArch64FrameLowering::emitRemarks(
4241 const MachineFunction &MF, MachineOptimizationRemarkEmitter *ORE) const {
4242
4243 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
4245 return;
4246
4247 unsigned StackHazardSize = getStackHazardSize(MF);
4248 const uint64_t HazardSize =
4249 (StackHazardSize) ? StackHazardSize : StackHazardRemarkSize;
4250
4251 if (HazardSize == 0)
4252 return;
4253
4254 const MachineFrameInfo &MFI = MF.getFrameInfo();
4255 // Bail if function has no stack objects.
4256 if (!MFI.hasStackObjects())
4257 return;
4258
4259 std::vector<StackAccess> StackAccesses(MFI.getNumObjects());
4260
4261 size_t NumFPLdSt = 0;
4262 size_t NumNonFPLdSt = 0;
4263
4264 // Collect stack accesses via Load/Store instructions.
4265 for (const MachineBasicBlock &MBB : MF) {
4266 for (const MachineInstr &MI : MBB) {
4267 if (!MI.mayLoadOrStore() || MI.getNumMemOperands() < 1)
4268 continue;
4269 for (MachineMemOperand *MMO : MI.memoperands()) {
4270 std::optional<int> FI = getMMOFrameID(MMO, MFI);
4271 if (FI && !MFI.isDeadObjectIndex(*FI)) {
4272 int FrameIdx = *FI;
4273
4274 size_t ArrIdx = FrameIdx + MFI.getNumFixedObjects();
4275 if (StackAccesses[ArrIdx].AccessTypes == StackAccess::NotAccessed) {
4276 StackAccesses[ArrIdx].Idx = FrameIdx;
4277 StackAccesses[ArrIdx].Offset =
4278 getFrameIndexReferenceFromSP(MF, FrameIdx);
4279 StackAccesses[ArrIdx].Size = MFI.getObjectSize(FrameIdx);
4280 }
4281
4282 unsigned RegTy = StackAccess::AccessType::GPR;
4283 if (MFI.hasScalableStackID(FrameIdx)) {
4284 // SPILL_PPR_TO_ZPR_SLOT_PSEUDO and FILL_PPR_FROM_ZPR_SLOT_PSEUDO
4285 // spill/fill the predicate as a data vector (so are an FPR access).
4286 if (MI.getOpcode() != AArch64::SPILL_PPR_TO_ZPR_SLOT_PSEUDO &&
4287 MI.getOpcode() != AArch64::FILL_PPR_FROM_ZPR_SLOT_PSEUDO &&
4288 AArch64::PPRRegClass.contains(MI.getOperand(0).getReg())) {
4289 RegTy = StackAccess::PPR;
4290 } else
4291 RegTy = StackAccess::FPR;
4292 } else if (AArch64InstrInfo::isFpOrNEON(MI)) {
4293 RegTy = StackAccess::FPR;
4294 }
4295
4296 StackAccesses[ArrIdx].AccessTypes |= RegTy;
4297
4298 if (RegTy == StackAccess::FPR)
4299 ++NumFPLdSt;
4300 else
4301 ++NumNonFPLdSt;
4302 }
4303 }
4304 }
4305 }
4306
4307 if (NumFPLdSt == 0 || NumNonFPLdSt == 0)
4308 return;
4309
4310 llvm::sort(StackAccesses);
4311 llvm::erase_if(StackAccesses, [](const StackAccess &S) {
4312    return S.AccessTypes == StackAccess::NotAccessed;
4313  });
4314
4315  SmallVector<const StackAccess *> MixedObjects;
4316  SmallVector<std::pair<const StackAccess *, const StackAccess *>> HazardPairs;
4317
4318 if (StackAccesses.front().isMixed())
4319 MixedObjects.push_back(&StackAccesses.front());
4320
4321 for (auto It = StackAccesses.begin(), End = std::prev(StackAccesses.end());
4322 It != End; ++It) {
4323 const auto &First = *It;
4324 const auto &Second = *(It + 1);
4325
4326 if (Second.isMixed())
4327 MixedObjects.push_back(&Second);
4328
4329 if ((First.isSME() && Second.isCPU()) ||
4330 (First.isCPU() && Second.isSME())) {
4331 uint64_t Distance = static_cast<uint64_t>(Second.start() - First.end());
4332 if (Distance < HazardSize)
4333 HazardPairs.emplace_back(&First, &Second);
4334 }
4335 }
4336
4337 auto EmitRemark = [&](llvm::StringRef Str) {
4338 ORE->emit([&]() {
4339 auto R = MachineOptimizationRemarkAnalysis(
4340 "sme", "StackHazard", MF.getFunction().getSubprogram(), &MF.front());
4341 return R << formatv("stack hazard in '{0}': ", MF.getName()).str() << Str;
4342 });
4343 };
4344
4345 for (const auto &P : HazardPairs)
4346 EmitRemark(formatv("{0} is too close to {1}", *P.first, *P.second).str());
4347
4348 for (const auto *Obj : MixedObjects)
4349 EmitRemark(
4350 formatv("{0} accessed by both GP and FP instructions", *Obj).str());
4351}
unsigned const MachineRegisterInfo * MRI
static void getLiveRegsForEntryMBB(LivePhysRegs &LiveRegs, const MachineBasicBlock &MBB)
static Register tryScavengeRegister(LiveRegUnits const &UsedRegs, BitVector const &ScavengeableRegs, Register PreferredReg)
Attempts to scavenge a register from ScavengeableRegs given the used registers in UsedRegs.
static const unsigned DefaultSafeSPDisplacement
This is the biggest offset to the stack pointer we can encode in aarch64 instructions (without using ...
static bool isInPrologueOrEpilogue(const MachineInstr &MI)
static bool produceCompactUnwindFrame(const AArch64FrameLowering &, MachineFunction &MF)
static bool expandFillPPRFromZPRSlotPseudo(MachineBasicBlock &MBB, MachineInstr &MI, const TargetRegisterInfo &TRI, LiveRegUnits const &UsedRegs, ScavengeableRegs const &SR, MachineInstr *&LastPTrue, EmergencyStackSlots &SpillSlots)
Expands:
static cl::opt< bool > StackTaggingMergeSetTag("stack-tagging-merge-settag", cl::desc("merge settag instruction in function epilog"), cl::init(true), cl::Hidden)
bool enableMultiVectorSpillFill(const AArch64Subtarget &Subtarget, MachineFunction &MF)
static std::optional< int > getLdStFrameID(const MachineInstr &MI, const MachineFrameInfo &MFI)
static cl::opt< bool > SplitSVEObjects("aarch64-split-sve-objects", cl::desc("Split allocation of ZPR & PPR objects"), cl::init(true), cl::Hidden)
static cl::opt< bool > StackHazardInNonStreaming("aarch64-stack-hazard-in-non-streaming", cl::init(false), cl::Hidden)
void computeCalleeSaveRegisterPairs(const AArch64FrameLowering &AFL, MachineFunction &MF, ArrayRef< CalleeSavedInfo > CSI, const TargetRegisterInfo *TRI, SmallVectorImpl< RegPairInfo > &RegPairs, bool NeedsFrameRecord)
static cl::opt< bool > OrderFrameObjects("aarch64-order-frame-objects", cl::desc("sort stack allocations"), cl::init(true), cl::Hidden)
static bool invalidateWindowsRegisterPairing(unsigned Reg1, unsigned Reg2, bool NeedsWinCFI, bool IsFirst, const TargetRegisterInfo *TRI)
static cl::opt< bool > DisableMultiVectorSpillFill("aarch64-disable-multivector-spill-fill", cl::desc("Disable use of LD/ST pairs for SME2 or SVE2p1"), cl::init(false), cl::Hidden)
static cl::opt< bool > EnableRedZone("aarch64-redzone", cl::desc("enable use of redzone on AArch64"), cl::init(false), cl::Hidden)
cl::opt< bool > EnableHomogeneousPrologEpilog("homogeneous-prolog-epilog", cl::Hidden, cl::desc("Emit homogeneous prologue and epilogue for the size " "optimization (default = off)"))
static bool arePPRsSpilledAsZPR(const MachineFunction &MF)
Returns true if PPRs are spilled as ZPRs.
static bool expandSMEPPRToZPRSpillPseudos(MachineBasicBlock &MBB, const TargetRegisterInfo &TRI, ScavengeableRegs const &SR, EmergencyStackSlots &SpillSlots)
Expands all FILL_PPR_FROM_ZPR_SLOT_PSEUDO and SPILL_PPR_TO_ZPR_SLOT_PSEUDO operations within the Mach...
static bool isLikelyToHaveSVEStack(const AArch64FrameLowering &AFL, const MachineFunction &MF)
static bool invalidateRegisterPairing(unsigned Reg1, unsigned Reg2, bool UsesWinAAPCS, bool NeedsWinCFI, bool NeedsFrameRecord, bool IsFirst, const TargetRegisterInfo *TRI)
Returns true if Reg1 and Reg2 cannot be paired using a ldp/stp instruction.
unsigned findFreePredicateReg(BitVector &SavedRegs)
static unsigned getPrologueDeath(MachineFunction &MF, unsigned Reg)
static SVEStackSizes determineSVEStackSizes(MachineFunction &MF, AssignObjectOffsets AssignOffsets)
Process all the SVE stack objects and the SVE stack size and offsets for each object.
static void expandSpillPPRToZPRSlotPseudo(MachineBasicBlock &MBB, MachineInstr &MI, const TargetRegisterInfo &TRI, LiveRegUnits const &UsedRegs, ScavengeableRegs const &SR, EmergencyStackSlots &SpillSlots)
Expands:
static bool isTargetWindows(const MachineFunction &MF)
static unsigned estimateRSStackSizeLimit(MachineFunction &MF)
Look at each instruction that references stack frames and return the stack size limit beyond which so...
static bool getSVECalleeSaveSlotRange(const MachineFrameInfo &MFI, int &Min, int &Max)
returns true if there are any SVE callee saves.
static cl::opt< unsigned > StackHazardRemarkSize("aarch64-stack-hazard-remark-size", cl::init(0), cl::Hidden)
static MCRegister getRegisterOrZero(MCRegister Reg, bool HasSVE)
static unsigned getStackHazardSize(const MachineFunction &MF)
static void propagateFrameFlags(MachineInstr &SourceMI, ArrayRef< MachineInstr * > MachineInstrs)
Propagates frame-setup/destroy flags from SourceMI to all instructions in MachineInstrs.
static bool isPPRAccess(const MachineInstr &MI)
static std::optional< int > getMMOFrameID(MachineMemOperand *MMO, const MachineFrameInfo &MFI)
assert(UImm &&(UImm !=~static_cast< T >(0)) &&"Invalid immediate!")
This file contains the declaration of the AArch64PrologueEmitter and AArch64EpilogueEmitter classes,...
aarch64 promote const
static const int kSetTagLoopThreshold
MachineBasicBlock & MBB
MachineBasicBlock MachineBasicBlock::iterator DebugLoc DL
MachineBasicBlock MachineBasicBlock::iterator MBBI
This file contains the simple types necessary to represent the attributes associated with functions a...
#define CASE(ATTRNAME, AANAME,...)
static GCRegistry::Add< ErlangGC > A("erlang", "erlang-compatible garbage collector")
static GCRegistry::Add< CoreCLRGC > E("coreclr", "CoreCLR-compatible GC")
static GCRegistry::Add< OcamlGC > B("ocaml", "ocaml 3.10-compatible GC")
DXIL Forward Handle Accesses
const HexagonInstrInfo * TII
IRTranslator LLVM IR MI
static std::string getTypeString(Type *T)
Definition LLParser.cpp:67
This file implements the LivePhysRegs utility for tracking liveness of physical registers.
#define F(x, y, z)
Definition MD5.cpp:55
#define I(x, y, z)
Definition MD5.cpp:58
#define H(x, y, z)
Definition MD5.cpp:57
Register Reg
Register const TargetRegisterInfo * TRI
Promote Memory to Register
Definition Mem2Reg.cpp:110
uint64_t IntrinsicInst * II
#define P(N)
This file declares the machine register scavenger class.
unsigned OpIndex
static bool contains(SmallPtrSetImpl< ConstantExpr * > &Cache, ConstantExpr *Expr, Constant *C)
Definition Value.cpp:480
This file defines the make_scope_exit function, which executes user-defined cleanup logic at scope ex...
This file defines the SmallVector class.
#define LLVM_DEBUG(...)
Definition Debug.h:114
StackOffset getSVEStackSize(const MachineFunction &MF) const
Returns the size of the entire SVE stackframe (PPRs + ZPRs).
StackOffset getZPRStackSize(const MachineFunction &MF) const
Returns the size of the entire ZPR stackframe (calleesaves + spills).
void processFunctionBeforeFrameIndicesReplaced(MachineFunction &MF, RegScavenger *RS) const override
processFunctionBeforeFrameIndicesReplaced - This method is called immediately before MO_FrameIndex op...
MachineBasicBlock::iterator eliminateCallFramePseudoInstr(MachineFunction &MF, MachineBasicBlock &MBB, MachineBasicBlock::iterator I) const override
This method is called during prolog/epilog code insertion to eliminate call frame setup and destroy p...
bool canUseAsPrologue(const MachineBasicBlock &MBB) const override
Check whether or not the given MBB can be used as a prologue for the target.
bool enableStackSlotScavenging(const MachineFunction &MF) const override
Returns true if the stack slot holes in the fixed and callee-save stack area should be used when allo...
bool spillCalleeSavedRegisters(MachineBasicBlock &MBB, MachineBasicBlock::iterator MI, ArrayRef< CalleeSavedInfo > CSI, const TargetRegisterInfo *TRI) const override
spillCalleeSavedRegisters - Issues instruction(s) to spill all callee saved registers and returns tru...
bool restoreCalleeSavedRegisters(MachineBasicBlock &MBB, MachineBasicBlock::iterator MI, MutableArrayRef< CalleeSavedInfo > CSI, const TargetRegisterInfo *TRI) const override
restoreCalleeSavedRegisters - Issues instruction(s) to restore all callee saved registers and returns...
bool enableFullCFIFixup(const MachineFunction &MF) const override
enableFullCFIFixup - Returns true if we may need to fix the unwind information such that it is accura...
StackOffset getFrameIndexReferenceFromSP(const MachineFunction &MF, int FI) const override
getFrameIndexReferenceFromSP - This method returns the offset from the stack pointer to the slot of t...
bool enableCFIFixup(const MachineFunction &MF) const override
Returns true if we may need to fix the unwind information for the function.
StackOffset getNonLocalFrameIndexReference(const MachineFunction &MF, int FI) const override
getNonLocalFrameIndexReference - This method returns the offset used to reference a frame index locat...
TargetStackID::Value getStackIDForScalableVectors() const override
Returns the StackID that scalable vectors should be associated with.
bool hasFPImpl(const MachineFunction &MF) const override
hasFPImpl - Return true if the specified function should have a dedicated frame pointer register.
void emitPrologue(MachineFunction &MF, MachineBasicBlock &MBB) const override
emitProlog/emitEpilog - These methods insert prolog and epilog code into the function.
void resetCFIToInitialState(MachineBasicBlock &MBB) const override
Emit CFI instructions that recreate the state of the unwind information upon function entry.
bool hasReservedCallFrame(const MachineFunction &MF) const override
hasReservedCallFrame - Under normal circumstances, when a frame pointer is not required,...
StackOffset resolveFrameOffsetReference(const MachineFunction &MF, int64_t ObjectOffset, bool isFixed, TargetStackID::Value StackID, Register &FrameReg, bool PreferFP, bool ForSimm) const
bool canUseRedZone(const MachineFunction &MF) const
Can this function use the red zone for local allocations.
bool needsWinCFI(const MachineFunction &MF) const
bool isFPReserved(const MachineFunction &MF) const
Should the Frame Pointer be reserved for the current function?
void processFunctionBeforeFrameFinalized(MachineFunction &MF, RegScavenger *RS) const override
processFunctionBeforeFrameFinalized - This method is called immediately before the specified function...
int getSEHFrameIndexOffset(const MachineFunction &MF, int FI) const
unsigned getWinEHFuncletFrameSize(const MachineFunction &MF) const
Funclets only need to account for space for the callee saved registers, as the locals are accounted f...
void orderFrameObjects(const MachineFunction &MF, SmallVectorImpl< int > &ObjectsToAllocate) const override
Order the symbols in the local stack frame.
void emitEpilogue(MachineFunction &MF, MachineBasicBlock &MBB) const override
StackOffset getPPRStackSize(const MachineFunction &MF) const
Returns the size of the entire PPR stackframe (calleesaves + spills + hazard padding).
void determineCalleeSaves(MachineFunction &MF, BitVector &SavedRegs, RegScavenger *RS) const override
This method determines which of the registers reported by TargetRegisterInfo::getCalleeSavedRegs() sh...
StackOffset getFrameIndexReference(const MachineFunction &MF, int FI, Register &FrameReg) const override
getFrameIndexReference - Provide a base+offset reference to an FI slot for debug info.
bool assignCalleeSavedSpillSlots(MachineFunction &MF, const TargetRegisterInfo *TRI, std::vector< CalleeSavedInfo > &CSI, unsigned &MinCSFrameIndex, unsigned &MaxCSFrameIndex) const override
assignCalleeSavedSpillSlots - Allows target to override spill slot assignment logic.
StackOffset getFrameIndexReferencePreferSP(const MachineFunction &MF, int FI, Register &FrameReg, bool IgnoreSPUpdates) const override
For Win64 AArch64 EH, the offset to the Unwind object is from the SP before the update.
StackOffset resolveFrameIndexReference(const MachineFunction &MF, int FI, Register &FrameReg, bool PreferFP, bool ForSimm) const
unsigned getWinEHParentFrameOffset(const MachineFunction &MF) const override
The parent frame offset (aka dispFrame) is only used on X86_64 to retrieve the parent's frame pointer...
bool requiresSaveVG(const MachineFunction &MF) const
void emitPacRetPlusLeafHardening(MachineFunction &MF) const
Harden the entire function with pac-ret.
AArch64FunctionInfo - This class is derived from MachineFunctionInfo and contains private AArch64-spe...
unsigned getCalleeSavedStackSize(const MachineFrameInfo &MFI) const
void setCalleeSaveBaseToFrameRecordOffset(int Offset)
bool shouldSignReturnAddress(const MachineFunction &MF) const
void setStackSizeSVE(uint64_t ZPR, uint64_t PPR)
std::optional< int > getTaggedBasePointerIndex() const
bool needsDwarfUnwindInfo(const MachineFunction &MF) const
void setSVECalleeSavedStackSize(unsigned ZPR, unsigned PPR)
bool needsAsyncDwarfUnwindInfo(const MachineFunction &MF) const
static bool isTailCallReturnInst(const MachineInstr &MI)
Returns true if MI is one of the TCRETURN* instructions.
static bool isFpOrNEON(Register Reg)
Returns whether the physical register is FP or NEON.
const AArch64RegisterInfo * getRegisterInfo() const override
bool isNeonAvailable() const
Returns true if the target has NEON and the function at runtime is known to have NEON enabled (e....
const AArch64InstrInfo * getInstrInfo() const override
const AArch64TargetLowering * getTargetLowering() const override
bool isSVEorStreamingSVEAvailable() const
Returns true if the target has access to either the full range of SVE instructions,...
bool isStreaming() const
Returns true if the function has a streaming body.
bool hasInlineStackProbe(const MachineFunction &MF) const override
True if stack clash protection is enabled for this functions.
unsigned getRedZoneSize(const Function &F) const
ArrayRef - Represent a constant reference to an array (0 or more elements consecutively in memory),...
Definition ArrayRef.h:41
size_t size() const
size - Get the array size.
Definition ArrayRef.h:147
bool empty() const
empty - Check if the array is empty.
Definition ArrayRef.h:142
bool test(unsigned Idx) const
Definition BitVector.h:480
BitVector & reset()
Definition BitVector.h:411
size_type count() const
count - Returns the number of bits which are set.
Definition BitVector.h:181
BitVector & set()
Definition BitVector.h:370
iterator_range< const_set_bits_iterator > set_bits() const
Definition BitVector.h:159
size_type size() const
size - Returns the number of bits in this bitvector.
Definition BitVector.h:178
Helper class for creating CFI instructions and inserting them into MIR.
The CalleeSavedInfo class tracks the information need to locate where a callee saved register is in t...
A debug info location.
Definition DebugLoc.h:124
bool hasMinSize() const
Optimize this function for minimum size (-Oz).
Definition Function.h:703
CallingConv::ID getCallingConv() const
getCallingConv()/setCallingConv(CC) - These method get and set the calling convention of this functio...
Definition Function.h:270
AttributeList getAttributes() const
Return the attribute list for this Function.
Definition Function.h:352
bool isVarArg() const
isVarArg - Return true if this function takes a variable number of arguments.
Definition Function.h:227
bool hasFnAttribute(Attribute::AttrKind Kind) const
Return true if the function has the attribute.
Definition Function.cpp:727
A set of physical registers with utility functions to track liveness when walking backward/forward th...
A set of register units used to track register liveness.
bool available(MCRegister Reg) const
Returns true if no part of physical register Reg is live.
LLVM_ABI void stepBackward(const MachineInstr &MI)
Updates liveness when stepping backwards over the instruction MI.
LLVM_ABI void addLiveOuts(const MachineBasicBlock &MBB)
Adds registers living out of block MBB.
bool usesWindowsCFI() const
Definition MCAsmInfo.h:652
Wrapper class representing physical registers. Should be passed by value.
Definition MCRegister.h:33
LLVM_ABI void transferSuccessorsAndUpdatePHIs(MachineBasicBlock *FromMBB)
Transfers all the successors, as in transferSuccessors, and update PHI operands in the successor bloc...
LLVM_ABI iterator getFirstTerminator()
Returns an iterator to the first terminator instruction of this basic block.
LLVM_ABI void addSuccessor(MachineBasicBlock *Succ, BranchProbability Prob=BranchProbability::getUnknown())
Add Succ as a successor of this MachineBasicBlock.
const MachineFunction * getParent() const
Return the MachineFunction containing this basic block.
reverse_iterator rbegin()
iterator insertAfter(iterator I, MachineInstr *MI)
Insert MI into the instruction list after I.
void splice(iterator Where, MachineBasicBlock *Other, iterator From)
Take an instruction from MBB 'Other' at the position From, and insert it into this MBB right before '...
MachineInstrBundleIterator< MachineInstr > iterator
The MachineFrameInfo class represents an abstract stack frame until prolog/epilog code is inserted.
LLVM_ABI int CreateFixedObject(uint64_t Size, int64_t SPOffset, bool IsImmutable, bool isAliased=false)
Create a new object at a fixed location on the stack.
bool hasVarSizedObjects() const
This method may be called any time after instruction selection is complete to determine if the stack ...
const AllocaInst * getObjectAllocation(int ObjectIdx) const
Return the underlying Alloca of the specified stack object if it exists.
LLVM_ABI int CreateStackObject(uint64_t Size, Align Alignment, bool isSpillSlot, const AllocaInst *Alloca=nullptr, uint8_t ID=0)
Create a new statically sized stack object, returning a nonnegative identifier to represent it.
bool hasCalls() const
Return true if the current function has any function calls.
bool isFrameAddressTaken() const
This method may be called any time after instruction selection is complete to determine if there is a...
void setObjectOffset(int ObjectIdx, int64_t SPOffset)
Set the stack frame offset of the specified object.
uint64_t getMaxCallFrameSize() const
Return the maximum size of a call frame that must be allocated for an outgoing function call.
bool hasPatchPoint() const
This method may be called any time after instruction selection is complete to determine if there is a...
bool hasScalableStackID(int ObjectIdx) const
int getStackProtectorIndex() const
Return the index for the stack protector object.
LLVM_ABI int CreateSpillStackObject(uint64_t Size, Align Alignment)
Create a new statically sized stack object that represents a spill slot, returning a nonnegative iden...
LLVM_ABI uint64_t estimateStackSize(const MachineFunction &MF) const
Estimate and return the size of the stack frame.
void setStackID(int ObjectIdx, uint8_t ID)
bool isCalleeSavedInfoValid() const
Has the callee saved info been calculated yet?
Align getObjectAlign(int ObjectIdx) const
Return the alignment of the specified stack object.
int64_t getObjectSize(int ObjectIdx) const
Return the size of the specified object.
bool isMaxCallFrameSizeComputed() const
bool hasStackMap() const
This method may be called any time after instruction selection is complete to determine if there is a...
const std::vector< CalleeSavedInfo > & getCalleeSavedInfo() const
Returns a reference to the callee saved info vector for the current function.
unsigned getNumObjects() const
Return the number of objects.
int getObjectIndexEnd() const
Return one past the maximum frame object index.
bool hasStackProtectorIndex() const
bool hasStackObjects() const
Return true if there are any stack objects in this function.
uint8_t getStackID(int ObjectIdx) const
unsigned getNumFixedObjects() const
Return the number of fixed objects.
int64_t getObjectOffset(int ObjectIdx) const
Return the assigned stack offset of the specified object from the incoming stack pointer.
int getObjectIndexBegin() const
Return the minimum frame object index.
void setObjectAlignment(int ObjectIdx, Align Alignment)
setObjectAlignment - Change the alignment of the specified stack object.
bool isDeadObjectIndex(int ObjectIdx) const
Returns true if the specified index corresponds to a dead object.
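A minimal sketch, assuming an existing MachineFunction MF, of creating a spill slot through MachineFrameInfo and reading its properties back; marking the slot with the scalable stack ID is only an illustration:
  MachineFrameInfo &MFI = MF.getFrameInfo();
  int FI = MFI.CreateSpillStackObject(16, Align(16));   // 16-byte spill slot
  MFI.setStackID(FI, TargetStackID::ScalableVector);    // e.g. treat it as an SVE slot
  // After PEI has assigned offsets:
  int64_t Off = MFI.getObjectOffset(FI);
  Align A = MFI.getObjectAlign(FI);
  (void)Off; (void)A;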
const WinEHFuncInfo * getWinEHFuncInfo() const
getWinEHFuncInfo - Return information about how the current function uses Windows exception handling.
const TargetSubtargetInfo & getSubtarget() const
getSubtarget - Return the subtarget for which this machine code is being compiled.
MachineMemOperand * getMachineMemOperand(MachinePointerInfo PtrInfo, MachineMemOperand::Flags f, LLT MemTy, Align base_alignment, const AAMDNodes &AAInfo=AAMDNodes(), const MDNode *Ranges=nullptr, SyncScope::ID SSID=SyncScope::System, AtomicOrdering Ordering=AtomicOrdering::NotAtomic, AtomicOrdering FailureOrdering=AtomicOrdering::NotAtomic)
getMachineMemOperand - Allocate a new MachineMemOperand.
MachineFrameInfo & getFrameInfo()
getFrameInfo - Return the frame info object for the current function.
MachineRegisterInfo & getRegInfo()
getRegInfo - Return information about the registers currently in use.
Function & getFunction()
Return the LLVM function that this machine code represents.
BasicBlockListType::iterator iterator
Ty * getInfo()
getInfo - Keep track of various per-function pieces of information for backends that would like to do...
const MachineBasicBlock & front() const
MachineBasicBlock * CreateMachineBasicBlock(const BasicBlock *BB=nullptr, std::optional< UniqueBBID > BBID=std::nullopt)
CreateMachineInstr - Allocate a new MachineInstr.
void insert(iterator MBBI, MachineBasicBlock *MBB)
const TargetMachine & getTarget() const
getTarget - Return the target machine this machine code is compiled with
const MachineInstrBuilder & setMemRefs(ArrayRef< MachineMemOperand * > MMOs) const
const MachineInstrBuilder & addExternalSymbol(const char *FnName, unsigned TargetFlags=0) const
const MachineInstrBuilder & setMIFlag(MachineInstr::MIFlag Flag) const
const MachineInstrBuilder & addImm(int64_t Val) const
Add a new immediate operand.
const MachineInstrBuilder & add(const MachineOperand &MO) const
const MachineInstrBuilder & addFrameIndex(int Idx) const
const MachineInstrBuilder & addRegMask(const uint32_t *Mask) const
const MachineInstrBuilder & addReg(Register RegNo, unsigned flags=0, unsigned SubReg=0) const
Add a new virtual register operand.
const MachineInstrBuilder & addMBB(MachineBasicBlock *MBB, unsigned TargetFlags=0) const
const MachineInstrBuilder & setMIFlags(unsigned Flags) const
const MachineInstrBuilder & addMemOperand(MachineMemOperand *MMO) const
MachineInstr * getInstr() const
If conversion operators fail, use this method to get the MachineInstr explicitly.
const MachineInstrBuilder & addDef(Register RegNo, unsigned Flags=0, unsigned SubReg=0) const
Add a virtual register definition operand.
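A minimal sketch, assuming MBB, an insertion point MBBI, a DebugLoc DL and the AArch64 TII are in scope, of the builder chaining these methods enable (here "sub sp, sp, #32" tagged as prologue code):
  BuildMI(MBB, MBBI, DL, TII->get(AArch64::SUBXri), AArch64::SP)
      .addReg(AArch64::SP)
      .addImm(32)                          // immediate to subtract
      .addImm(0)                           // no left shift of the immediate
      .setMIFlag(MachineInstr::FrameSetup);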
Representation of each machine instruction.
void setFlags(unsigned flags)
bool getFlag(MIFlag Flag) const
Return whether an MI flag is set.
LLVM_ABI void eraseFromParent()
Unlink 'this' from the containing basic block and delete it.
const MachineOperand & getOperand(unsigned i) const
uint32_t getFlags() const
Return the MI flags bitvector.
LLVM_ABI void moveBefore(MachineInstr *MovePos)
Move the instruction before MovePos.
A description of a memory reference used in the backend.
const PseudoSourceValue * getPseudoValue() const
@ MOLoad
The memory access reads data.
@ MOStore
The memory access writes data.
const Value * getValue() const
Return the base address of the memory access.
MachineOperand class - Representation of each machine instruction operand.
int64_t getImm() const
Register getReg() const
getReg - Returns the register number.
bool isFI() const
isFI - Tests if this is a MO_FrameIndex operand.
LLVM_ABI void emit(DiagnosticInfoOptimizationBase &OptDiag)
Emit an optimization remark.
MachineRegisterInfo - Keep track of information for virtual and physical registers,...
LLVM_ABI Register createVirtualRegister(const TargetRegisterClass *RegClass, StringRef Name="")
createVirtualRegister - Create and return a new virtual register in the function with the specified r...
LLVM_ABI bool isLiveIn(Register Reg) const
LLVM_ABI const MCPhysReg * getCalleeSavedRegs() const
Returns list of callee saved registers.
LLVM_ABI bool isPhysRegUsed(MCRegister PhysReg, bool SkipRegMaskTest=false) const
Return true if the specified register is modified or read in this function.
MutableArrayRef - Represent a mutable reference to an array (0 or more elements consecutively in memo...
Definition ArrayRef.h:303
Pass interface - Implemented by all 'passes'.
Definition Pass.h:99
void enterBasicBlockEnd(MachineBasicBlock &MBB)
Start tracking liveness from the end of basic block MBB.
Register FindUnusedReg(const TargetRegisterClass *RC) const
Find an unused register of the specified register class.
void backward()
Update internal register state and move MBB iterator backwards.
void addScavengingFrameIndex(int FI)
Add a scavenging frame index.
Wrapper class representing virtual and physical registers.
Definition Register.h:19
SMEAttrs is a utility class to parse the SME ACLE attributes on functions.
bool hasStreamingInterface() const
bool hasNonStreamingInterfaceAndBody() const
bool hasStreamingBody() const
bool insert(const value_type &X)
Insert a new element into the SetVector.
Definition SetVector.h:150
A SetVector that performs no allocations if smaller than a certain size.
Definition SetVector.h:338
This class consists of common code factored out of the SmallVector class to reduce code duplication b...
reference emplace_back(ArgTypes &&... Args)
void append(ItTy in_start, ItTy in_end)
Add the specified range to the end of the SmallVector.
void push_back(const T &Elt)
This is a 'vector' (really, a variable-sized array), optimized for the case when the array is small.
StackOffset holds a fixed and a scalable offset in bytes.
Definition TypeSize.h:31
int64_t getFixed() const
Returns the fixed component of the stack.
Definition TypeSize.h:47
int64_t getScalable() const
Returns the scalable component of the stack.
Definition TypeSize.h:50
static StackOffset get(int64_t Fixed, int64_t Scalable)
Definition TypeSize.h:42
static StackOffset getScalable(int64_t Scalable)
Definition TypeSize.h:41
static StackOffset getFixed(int64_t Fixed)
Definition TypeSize.h:40
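A minimal sketch of how the two components of a StackOffset stay separate; the byte values are illustrative only:
  StackOffset FixedPart    = StackOffset::getFixed(64);     // ordinary stack bytes
  StackOffset ScalablePart = StackOffset::getScalable(32);  // bytes scaled by the SVE vector length
  StackOffset Total        = FixedPart + ScalablePart;
  assert(Total.getFixed() == 64 && Total.getScalable() == 32);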
bool hasFP(const MachineFunction &MF) const
hasFP - Return true if the specified function should have a dedicated frame pointer register.
virtual void determineCalleeSaves(MachineFunction &MF, BitVector &SavedRegs, RegScavenger *RS=nullptr) const
This method determines which of the registers reported by TargetRegisterInfo::getCalleeSavedRegs() sh...
int getOffsetOfLocalArea() const
getOffsetOfLocalArea - This method returns the offset of the local area from the stack pointer on ent...
Align getStackAlign() const
getStackAlignment - This method returns the number of bytes to which the stack pointer must be aligne...
StackDirection getStackGrowthDirection() const
getStackGrowthDirection - Return the direction the stack grows
virtual bool enableCFIFixup(const MachineFunction &MF) const
Returns true if we may need to fix the unwind information for the function.
TargetInstrInfo - Interface to description of machine instruction set.
Primary interface to the complete machine description for the target machine.
TargetOptions Options
const MCAsmInfo * getMCAsmInfo() const
Return target specific asm information.
LLVM_ABI bool DisableFramePointerElim(const MachineFunction &MF) const
DisableFramePointerElim - This returns true if frame pointer elimination optimization should be disab...
TargetRegisterInfo base class - We assume that the target defines a static array of TargetRegisterDes...
bool hasStackRealignment(const MachineFunction &MF) const
True if stack realignment is required and still possible.
unsigned getSpillSize(const TargetRegisterClass &RC) const
Return the size in bytes of the stack slot allocated to hold a spilled copy of a register from class ...
TargetSubtargetInfo - Generic base class for all target subtargets.
virtual const TargetInstrInfo * getInstrInfo() const
virtual const TargetRegisterInfo * getRegisterInfo() const =0
Return the target's register information.
Triple - Helper class for working with autoconf configuration names.
Definition Triple.h:47
This class implements an extremely fast bulk output stream that can only output to a stream.
Definition raw_ostream.h:53
#define llvm_unreachable(msg)
Marks that the current location is not supposed to be reachable.
static unsigned getShiftValue(unsigned Imm)
getShiftValue - Extract the shift value.
static unsigned getArithExtendImm(AArch64_AM::ShiftExtendType ET, unsigned Imm)
getArithExtendImm - Encode the extend type and shift amount for an arithmetic instruction: imm: 3-bit...
const unsigned StackProbeMaxLoopUnroll
Maximum number of iterations to unroll for a constant size probing loop.
const unsigned StackProbeMaxUnprobedStack
Maximum allowed number of unprobed bytes above SP at an ABI boundary.
constexpr char Align[]
Key for Kernel::Arg::Metadata::mAlign.
constexpr char Attrs[]
Key for Kernel::Metadata::mAttrs.
unsigned ID
LLVM IR allows the use of arbitrary numbers as calling convention identifiers.
Definition CallingConv.h:24
@ AArch64_SVE_VectorCall
Used between AArch64 SVE functions.
@ PreserveMost
Used for runtime calls that preserve most registers.
Definition CallingConv.h:63
@ CXX_FAST_TLS
Used for access functions.
Definition CallingConv.h:72
@ GHC
Used by the Glasgow Haskell Compiler (GHC).
Definition CallingConv.h:50
@ PreserveAll
Used for runtime calls that preserve (almost) all registers.
Definition CallingConv.h:66
@ Fast
Attempts to make calls as fast as possible (e.g.
Definition CallingConv.h:41
@ PreserveNone
Used for runtime calls that preserve no general registers.
Definition CallingConv.h:90
@ Win64
The C convention as implemented on Windows/x86-64 and AArch64.
@ SwiftTail
This follows the Swift calling convention in how arguments are passed but guarantees tail calls will ...
Definition CallingConv.h:87
@ C
The default llvm calling convention, compatible with C.
Definition CallingConv.h:34
@ Implicit
Not emitted register (e.g. carry, or temporary result).
@ Define
Register definition.
initializer< Ty > init(const Ty &Val)
NodeAddr< InstrNode * > Instr
Definition RDFGraph.h:389
BaseReg
Stack frame base register. Bit 0 of FREInfo.Info.
Definition SFrame.h:77
This is an optimization pass for GlobalISel generic memory operations.
@ Offset
Definition DWP.cpp:477
void stable_sort(R &&Range)
Definition STLExtras.h:2038
MachineInstrBuilder BuildMI(MachineFunction &MF, const MIMetadata &MIMD, const MCInstrDesc &MCID)
Builder interface. Specify how to create the initial instruction itself.
int isAArch64FrameOffsetLegal(const MachineInstr &MI, StackOffset &Offset, bool *OutUseUnscaledOp=nullptr, unsigned *OutUnscaledOp=nullptr, int64_t *EmittableOffset=nullptr)
Check if the Offset is a valid frame offset for MI.
detail::scope_exit< std::decay_t< Callable > > make_scope_exit(Callable &&F)
Definition ScopeExit.h:59
decltype(auto) dyn_cast(const From &Val)
dyn_cast<X> - Return the argument parameter cast to the specified type.
Definition Casting.h:644
iterator_range< early_inc_iterator_impl< detail::IterOfRange< RangeT > > > make_early_inc_range(RangeT &&Range)
Make a range that does early increment to allow mutation of the underlying range without disrupting i...
Definition STLExtras.h:634
@ AArch64FrameOffsetCannotUpdate
Offset cannot apply.
auto dyn_cast_or_null(const Y &Val)
Definition Casting.h:754
bool any_of(R &&range, UnaryPredicate P)
Provide wrappers to std::any_of which take ranges instead of having to pass begin/end explicitly.
Definition STLExtras.h:1712
auto formatv(bool Validate, const char *Fmt, Ts &&...Vals)
auto reverse(ContainerTy &&C)
Definition STLExtras.h:408
void sort(IteratorTy Start, IteratorTy End)
Definition STLExtras.h:1624
LLVM_ABI raw_ostream & dbgs()
dbgs() - This returns a reference to a raw_ostream for debugging messages.
Definition Debug.cpp:207
void emitFrameOffset(MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI, const DebugLoc &DL, unsigned DestReg, unsigned SrcReg, StackOffset Offset, const TargetInstrInfo *TII, MachineInstr::MIFlag=MachineInstr::NoFlags, bool SetNZCV=false, bool NeedsWinCFI=false, bool *HasWinCFI=nullptr, bool EmitCFAOffset=false, StackOffset InitialOffset={}, unsigned FrameReg=AArch64::SP)
emitFrameOffset - Emit instructions as needed to set DestReg to SrcReg plus Offset.
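A minimal sketch, assuming MBB, MBBI, DL and TII are in scope, of using emitFrameOffset to move SP down by 16 fixed bytes plus two SVE vectors (32 scalable bytes) during the prologue:
  emitFrameOffset(MBB, MBBI, DL, AArch64::SP, AArch64::SP,
                  StackOffset::get(-16, -32),   // fixed and scalable parts
                  TII, MachineInstr::FrameSetup);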
LLVM_ABI void report_fatal_error(Error Err, bool gen_crash_diag=true)
Definition Error.cpp:167
FunctionAddr VTableAddr Count
Definition InstrProf.h:139
class LLVM_GSL_OWNER SmallVector
Forward declaration of SmallVector so that calculateSmallVectorDefaultInlinedElements can reference s...
@ LLVM_MARK_AS_BITMASK_ENUM
Definition ModRef.h:37
@ First
Helpers to iterate all locations in the MemoryEffectsBase class.
Definition ModRef.h:71
unsigned getDefRegState(bool B)
unsigned getKillRegState(bool B)
uint16_t MCPhysReg
An unsigned integer type large enough to represent all physical registers, but not necessarily virtua...
Definition MCRegister.h:21
uint64_t alignTo(uint64_t Size, Align A)
Returns a multiple of A needed to store Size bytes.
Definition Alignment.h:144
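A one-line illustration of alignTo, which rounds a byte count up to the given alignment (values illustrative):
  uint64_t Padded = alignTo(/*Size=*/20, Align(16));   // rounds up to 32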
raw_ostream & operator<<(raw_ostream &OS, const APFixedPoint &FX)
auto find_if(R &&Range, UnaryPredicate P)
Provide wrappers to std::find_if which take ranges instead of having to pass begin/end explicitly.
Definition STLExtras.h:1738
void erase_if(Container &C, UnaryPredicate P)
Provide a container algorithm similar to C++ Library Fundamentals v2's erase_if which is equivalent t...
Definition STLExtras.h:2100
bool is_contained(R &&Range, const E &Element)
Returns true if Element is found in Range.
Definition STLExtras.h:1877
LLVM_ABI const Value * getUnderlyingObject(const Value *V, unsigned MaxLookup=MaxLookupSearchDepth)
This method strips off any GEP address adjustments, pointer casts or llvm.threadlocal....
void fullyRecomputeLiveIns(ArrayRef< MachineBasicBlock * > MBBs)
Convenience function for recomputing live-in's for a set of MBBs until the computation converges.
LLVM_ABI Printable printReg(Register Reg, const TargetRegisterInfo *TRI=nullptr, unsigned SubIdx=0, const MachineRegisterInfo *MRI=nullptr)
Prints virtual and physical registers with or without a TRI instance.
void swap(llvm::BitVector &LHS, llvm::BitVector &RHS)
Implement std::swap in terms of BitVector swap.
Definition BitVector.h:869
Emergency stack slots for expanding SPILL_PPR_TO_ZPR_SLOT_PSEUDO and FILL_PPR_FROM_ZPR_SLOT_PSEUDO.
std::optional< int > PPRSpillFI
std::optional< int > GPRSpillFI
std::optional< int > ZPRSpillFI
Registers available for scavenging (ZPR, PPR3b, GPR).
RAII helper class for scavenging or spilling a register.
ScopedScavengeOrSpill(ScopedScavengeOrSpill &&)=delete
ScopedScavengeOrSpill(MachineFunction &MF, MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI, Register SpillCandidate, const TargetRegisterClass &RC, LiveRegUnits const &UsedRegs, BitVector const &AllocatableRegs, std::optional< int > *MaybeSpillFI, Register PreferredReg=AArch64::NoRegister)
Register freeRegister() const
Returns the free register (found from scavenging or spilling a register).
ScopedScavengeOrSpill(const ScopedScavengeOrSpill &)=delete
bool operator<(const StackAccess &Rhs) const
void print(raw_ostream &OS) const
int64_t start() const
std::string getTypeString() const
int64_t end() const
This struct is a compact representation of a valid (non-zero power of two) alignment.
Definition Alignment.h:39
constexpr uint64_t value() const
This is a hole in the type system and should not be abused.
Definition Alignment.h:77
Pair of physical register and lane mask.
static LLVM_ABI MachinePointerInfo getFixedStack(MachineFunction &MF, int FI, int64_t Offset=0)
Return a MachinePointerInfo record that refers to the specified FrameIndex.
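A minimal sketch, assuming MF, a frame index FI and a MachineInstrBuilder MIB are in scope, of pairing getFixedStack with getMachineMemOperand so a frame store carries its memory operand; the 16-byte size is illustrative:
  MachineMemOperand *MMO = MF.getMachineMemOperand(
      MachinePointerInfo::getFixedStack(MF, FI),
      MachineMemOperand::MOStore, 16, Align(16));
  MIB.addMemOperand(MMO);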
SmallVector< WinEHTryBlockMapEntry, 4 > TryBlockMap
SmallVector< WinEHHandlerType, 1 > HandlerArray