LLVM 22.0.0git
AArch64FrameLowering.cpp
1//===- AArch64FrameLowering.cpp - AArch64 Frame Lowering -------*- C++ -*-====//
2//
3// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
4// See https://llvm.org/LICENSE.txt for license information.
5// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
6//
7//===----------------------------------------------------------------------===//
8//
9// This file contains the AArch64 implementation of TargetFrameLowering class.
10//
11// On AArch64, stack frames are structured as follows:
12//
13// The stack grows downward.
14//
15// All of the individual frame areas on the frame below are optional, i.e. it's
16// possible to create a function such that a particular area isn't present
17// in the frame.
18//
19// At function entry, the "frame" looks as follows:
20//
21// | | Higher address
22// |-----------------------------------|
23// | |
24// | arguments passed on the stack |
25// | |
26// |-----------------------------------| <- sp
27// | | Lower address
28//
29//
30// After the prologue has run, the frame has the following general structure.
31// Note that this doesn't depict the case where a red-zone is used. Also,
32// technically the last frame area (VLAs) isn't created until the main
33// function body, after the prologue has run. However, it's depicted here
34// for completeness.
35//
36// | | Higher address
37// |-----------------------------------|
38// | |
39// | arguments passed on the stack |
40// | |
41// |-----------------------------------|
42// | |
43// | (Win64 only) varargs from reg |
44// | |
45// |-----------------------------------|
46// | |
47// | (Win64 only) callee-saved SVE reg |
48// | |
49// |-----------------------------------|
50// | |
51// | callee-saved gpr registers | <--.
52// | | | On Darwin platforms these
53// |- - - - - - - - - - - - - - - - - -| | callee saves are swapped,
54// | prev_lr | | (frame record first)
55// | prev_fp | <--'
56// | async context if needed |
57// | (a.k.a. "frame record") |
58// |-----------------------------------| <- fp(=x29)
59// | <hazard padding> |
60// |-----------------------------------|
61// | |
62// | callee-saved fp/simd/SVE regs |
63// | |
64// |-----------------------------------|
65// | |
66// | SVE stack objects |
67// | |
68// |-----------------------------------|
69// |.empty.space.to.make.part.below....|
70// |.aligned.in.case.it.needs.more.than| (size of this area is unknown at
71// |.the.standard.16-byte.alignment....| compile time; if present)
72// |-----------------------------------|
73// | local variables of fixed size |
74// | including spill slots |
75// | <FPR> |
76// | <hazard padding> |
77// | <GPR> |
78// |-----------------------------------| <- bp(not defined by ABI,
79// |.variable-sized.local.variables....| LLVM chooses X19)
80// |.(VLAs)............................| (size of this area is unknown at
81// |...................................| compile time)
82// |-----------------------------------| <- sp
83// | | Lower address
84//
85//
86// To access data in a frame at compile time, a constant offset must be
87// computable from one of the pointers (fp, bp, sp). The sizes
88// of the areas with a dotted background cannot be computed at compile time
89// if they are present, so all three of fp, bp, and sp must be set up
90// to be able to access all contents in the frame areas,
91// assuming all of the frame areas are non-empty.
92//
93// For most functions, some of the frame areas are empty. For those functions,
94// it may not be necessary to set up fp or bp:
95// * A base pointer is definitely needed when there are both VLAs and local
96// variables with more-than-default alignment requirements.
97// * A frame pointer is definitely needed when there are local variables with
98// more-than-default alignment requirements.
99//
100// For Darwin platforms the frame-record (fp, lr) is stored at the top of the
101// callee-saved area, since the unwind encoding does not allow for encoding
102// this dynamically and existing tools depend on this layout. For other
103// platforms, the frame-record is stored at the bottom of the (gpr) callee-saved
104// area to allow SVE stack objects (allocated directly below the callee-saves,
105// if available) to be accessed directly from the framepointer.
106// The SVE spill/fill instructions have VL-scaled addressing modes such
107// as:
108// ldr z8, [fp, #-7 mul vl]
109// For SVE the vector length (VL) is not known at compile-time, so
110// '#-7 mul vl' is an offset that can only be evaluated at runtime. With this
111// layout, we don't need to add an unscaled offset to the framepointer before
112// accessing the SVE object in the frame.
113//
114// In some cases when a base pointer is not strictly needed, it is generated
115// anyway when offsets from the frame pointer to access local variables become
116// so large that the offset can't be encoded in the immediate fields of loads
117// or stores.
118//
119// Outgoing function arguments must be at the bottom of the stack frame when
120// calling another function. If we do not have variable-sized stack objects, we
121// can allocate a "reserved call frame" area at the bottom of the local
122// variable area, large enough for all outgoing calls. If we do have VLAs, then
123// the stack pointer must be decremented and incremented around each call to
124// make space for the arguments below the VLAs.
125//
126// FIXME: also explain the redzone concept.
127//
128// About stack hazards: Under some SME contexts, a coprocessor with its own
129// separate cache can be used for FP operations. This can create hazards if the CPU
130// and the SME unit try to access the same area of memory, including if the
131// access is to an area of the stack. To try to alleviate this we attempt to
132// introduce extra padding into the stack frame between FP and GPR accesses,
133// controlled by the aarch64-stack-hazard-size option. Without changing the
134// layout of the stack frame in the diagram above, a stack object of size
135// aarch64-stack-hazard-size is added between GPR and FPR CSRs. Another is added
136// to the stack objects section, and stack objects are sorted so that FPR >
137// Hazard padding slot > GPRs (where possible). Unfortunately some things are
138// not handled well (VLA area, arguments on the stack, objects with both GPR and
139// FPR accesses), but if those are controlled by the user then the entire stack
140// frame becomes GPR at the start/end with FPR in the middle, surrounded by
141// Hazard padding.
142//
143// An example of the prologue:
144//
145// .globl __foo
146// .align 2
147// __foo:
148// Ltmp0:
149// .cfi_startproc
150// .cfi_personality 155, ___gxx_personality_v0
151// Leh_func_begin:
152// .cfi_lsda 16, Lexception33
153//
154// stp xa, bx, [sp, #-offset]!
155// ...
156// stp x28, x27, [sp, #offset-32]
157// stp fp, lr, [sp, #offset-16]
158// add fp, sp, #offset - 16
159// sub sp, sp, #1360
160//
161// The Stack:
162// +-------------------------------------------+
163// 10000 | ........ | ........ | ........ | ........ |
164// 10004 | ........ | ........ | ........ | ........ |
165// +-------------------------------------------+
166// 10008 | ........ | ........ | ........ | ........ |
167// 1000c | ........ | ........ | ........ | ........ |
168// +===========================================+
169// 10010 | X28 Register |
170// 10014 | X28 Register |
171// +-------------------------------------------+
172// 10018 | X27 Register |
173// 1001c | X27 Register |
174// +===========================================+
175// 10020 | Frame Pointer |
176// 10024 | Frame Pointer |
177// +-------------------------------------------+
178// 10028 | Link Register |
179// 1002c | Link Register |
180// +===========================================+
181// 10030 | ........ | ........ | ........ | ........ |
182// 10034 | ........ | ........ | ........ | ........ |
183// +-------------------------------------------+
184// 10038 | ........ | ........ | ........ | ........ |
185// 1003c | ........ | ........ | ........ | ........ |
186// +-------------------------------------------+
187//
188// [sp] = 10030 :: >>initial value<<
189// sp = 10020 :: stp fp, lr, [sp, #-16]!
190// fp = sp == 10020 :: mov fp, sp
191// [sp] == 10020 :: stp x28, x27, [sp, #-16]!
192// sp == 10010 :: >>final value<<
193//
194// The frame pointer (w29) points to address 10020. If we use an offset of
195// '16' from 'w29', we get the CFI offsets of -8 for w30, -16 for w29, -24
196// for w27, and -32 for w28:
197//
198// Ltmp1:
199// .cfi_def_cfa w29, 16
200// Ltmp2:
201// .cfi_offset w30, -8
202// Ltmp3:
203// .cfi_offset w29, -16
204// Ltmp4:
205// .cfi_offset w27, -24
206// Ltmp5:
207// .cfi_offset w28, -32
208//
209//===----------------------------------------------------------------------===//
210
211#include "AArch64FrameLowering.h"
212#include "AArch64InstrInfo.h"
214#include "AArch64RegisterInfo.h"
215#include "AArch64Subtarget.h"
219#include "llvm/ADT/ScopeExit.h"
220#include "llvm/ADT/SmallVector.h"
221#include "llvm/ADT/Statistic.h"
239#include "llvm/IR/Attributes.h"
240#include "llvm/IR/CallingConv.h"
241#include "llvm/IR/DataLayout.h"
242#include "llvm/IR/DebugLoc.h"
243#include "llvm/IR/Function.h"
244#include "llvm/MC/MCAsmInfo.h"
245#include "llvm/MC/MCDwarf.h"
247#include "llvm/Support/Debug.h"
254#include <cassert>
255#include <cstdint>
256#include <iterator>
257#include <optional>
258#include <vector>
259
260using namespace llvm;
261
262#define DEBUG_TYPE "frame-info"
263
264static cl::opt<bool> EnableRedZone("aarch64-redzone",
265 cl::desc("enable use of redzone on AArch64"),
266 cl::init(false), cl::Hidden);
267
269 "stack-tagging-merge-settag",
270 cl::desc("merge settag instruction in function epilog"), cl::init(true),
271 cl::Hidden);
272
273static cl::opt<bool> OrderFrameObjects("aarch64-order-frame-objects",
274 cl::desc("sort stack allocations"),
275 cl::init(true), cl::Hidden);
276
278 "homogeneous-prolog-epilog", cl::Hidden,
279 cl::desc("Emit homogeneous prologue and epilogue for the size "
280 "optimization (default = off)"));
281
282// Stack hazard size for analysis remarks. StackHazardSize takes precedence.
284 StackHazardRemarkSize("aarch64-stack-hazard-remark-size", cl::init(0),
285 cl::Hidden);
286// Whether to insert padding into non-streaming functions (for testing).
287static cl::opt<bool>
288 StackHazardInNonStreaming("aarch64-stack-hazard-in-non-streaming",
289 cl::init(false), cl::Hidden);
290
292 "aarch64-disable-multivector-spill-fill",
293 cl::desc("Disable use of LD/ST pairs for SME2 or SVE2p1"), cl::init(false),
294 cl::Hidden);
295
296STATISTIC(NumRedZoneFunctions, "Number of functions using red zone");
297
298/// Returns how much of the incoming argument stack area (in bytes) we should
299/// clean up in an epilogue. For the C calling convention this will be 0, for
300/// guaranteed tail call conventions it can be positive (a normal return or a
301/// tail call to a function that uses less stack space for arguments) or
302/// negative (for a tail call to a function that needs more stack space than us
303/// for arguments).
308 bool IsTailCallReturn = (MBB.end() != MBBI)
310 : false;
311
312 int64_t ArgumentPopSize = 0;
313 if (IsTailCallReturn) {
314 MachineOperand &StackAdjust = MBBI->getOperand(1);
315
316 // For a tail-call in a callee-pops-arguments environment, some or all of
317 // the stack may actually be in use for the call's arguments, this is
318 // calculated during LowerCall and consumed here...
319 ArgumentPopSize = StackAdjust.getImm();
320 } else {
321 // ... otherwise the amount to pop is *all* of the argument space,
322 // conveniently stored in the MachineFunctionInfo by
323 // LowerFormalArguments. This will, of course, be zero for the C calling
324 // convention.
325 ArgumentPopSize = AFI->getArgumentStackToRestore();
326 }
327
328 return ArgumentPopSize;
329}
330
332static bool needsWinCFI(const MachineFunction &MF);
335 bool HasCall = false);
336static bool requiresSaveVG(const MachineFunction &MF);
337
338// Conservatively, returns true if the function is likely to have SVE vectors
339// on the stack. This function is safe to call before callee-saves or
340// object offsets have been determined.
342 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
343 if (AFI->isSVECC())
344 return true;
345
346 if (AFI->hasCalculatedStackSizeSVE())
347 return bool(getSVEStackSize(MF));
348
349 const MachineFrameInfo &MFI = MF.getFrameInfo();
350 for (int FI = MFI.getObjectIndexBegin(); FI < MFI.getObjectIndexEnd(); FI++) {
352 return true;
353 }
354
355 return false;
356}
357
358/// Returns true if homogeneous prolog or epilog code can be emitted
359/// for the size optimization. If possible, a frame helper call is injected.
360/// When an Exit block is given, this check is for the epilog.
361bool AArch64FrameLowering::homogeneousPrologEpilog(
362 MachineFunction &MF, MachineBasicBlock *Exit) const {
363 if (!MF.getFunction().hasMinSize())
364 return false;
366 return false;
367 if (EnableRedZone)
368 return false;
369
370 // TODO: Windows is not supported yet.
371 if (needsWinCFI(MF))
372 return false;
373
374 // TODO: SVE is not supported yet.
376 return false;
377
378 // Bail on stack adjustment needed on return for simplicity.
379 const MachineFrameInfo &MFI = MF.getFrameInfo();
381 if (MFI.hasVarSizedObjects() || RegInfo->hasStackRealignment(MF))
382 return false;
383 if (Exit && getArgumentStackToRestore(MF, *Exit))
384 return false;
385
386 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
387 if (AFI->hasSwiftAsyncContext() || AFI->hasStreamingModeChanges())
388 return false;
389
390 // If there are an odd number of GPRs before LR and FP in the CSRs list,
391 // they will not be paired into one RegPairInfo, which is incompatible with
392 // the assumption made by the homogeneous prolog epilog pass.
393 const MCPhysReg *CSRegs = MF.getRegInfo().getCalleeSavedRegs();
394 unsigned NumGPRs = 0;
395 for (unsigned I = 0; CSRegs[I]; ++I) {
396 Register Reg = CSRegs[I];
397 if (Reg == AArch64::LR) {
398 assert(CSRegs[I + 1] == AArch64::FP);
399 if (NumGPRs % 2 != 0)
400 return false;
401 break;
402 }
403 if (AArch64::GPR64RegClass.contains(Reg))
404 ++NumGPRs;
405 }
406
407 return true;
408}
409
410/// Returns true if CSRs should be paired.
411bool AArch64FrameLowering::producePairRegisters(MachineFunction &MF) const {
412 return produceCompactUnwindFrame(MF) || homogeneousPrologEpilog(MF);
413}
414
415/// This is the biggest offset to the stack pointer we can encode in aarch64
416/// instructions (without using a separate calculation and a temp register).
417/// Note that the exceptions here are vector stores/loads, which cannot encode
418/// any displacements (see estimateRSStackSizeLimit(), isAArch64FrameOffsetLegal()).
419static const unsigned DefaultSafeSPDisplacement = 255;
420
421/// Look at each instruction that references stack frames and return the stack
422/// size limit beyond which some of these instructions will require a scratch
423/// register during their expansion later.
425 // FIXME: For now, just conservatively guesstimate based on unscaled indexing
426 // range. We'll end up allocating an unnecessary spill slot a lot, but
427 // realistically that's not a big deal at this stage of the game.
428 for (MachineBasicBlock &MBB : MF) {
429 for (MachineInstr &MI : MBB) {
430 if (MI.isDebugInstr() || MI.isPseudo() ||
431 MI.getOpcode() == AArch64::ADDXri ||
432 MI.getOpcode() == AArch64::ADDSXri)
433 continue;
434
435 for (const MachineOperand &MO : MI.operands()) {
436 if (!MO.isFI())
437 continue;
438
440 if (isAArch64FrameOffsetLegal(MI, Offset, nullptr, nullptr, nullptr) ==
442 return 0;
443 }
444 }
445 }
447}
448
452}
453
454/// Returns the size of the fixed object area (allocated next to sp on entry).
455/// On Win64 this may include a var args area and an UnwindHelp object for EH.
456static unsigned getFixedObjectSize(const MachineFunction &MF,
457 const AArch64FunctionInfo *AFI, bool IsWin64,
458 bool IsFunclet) {
459 assert(AFI->getTailCallReservedStack() % 16 == 0 &&
460 "Tail call reserved stack must be aligned to 16 bytes");
461 if (!IsWin64 || IsFunclet) {
462 return AFI->getTailCallReservedStack();
463 } else {
464 if (AFI->getTailCallReservedStack() != 0 &&
466 Attribute::SwiftAsync))
467 report_fatal_error("cannot generate ABI-changing tail call for Win64");
468 unsigned FixedObjectSize = AFI->getTailCallReservedStack();
469
470 // Var args are stored here in the primary function.
471 FixedObjectSize += AFI->getVarArgsGPRSize();
472
473 if (MF.hasEHFunclets()) {
474 // Catch objects are stored here in the primary function.
475 const MachineFrameInfo &MFI = MF.getFrameInfo();
476 const WinEHFuncInfo &EHInfo = *MF.getWinEHFuncInfo();
477 SmallSetVector<int, 8> CatchObjFrameIndices;
478 for (const WinEHTryBlockMapEntry &TBME : EHInfo.TryBlockMap) {
479 for (const WinEHHandlerType &H : TBME.HandlerArray) {
480 int FrameIndex = H.CatchObj.FrameIndex;
481 if ((FrameIndex != INT_MAX) &&
482 CatchObjFrameIndices.insert(FrameIndex)) {
483 FixedObjectSize = alignTo(FixedObjectSize,
484 MFI.getObjectAlign(FrameIndex).value()) +
485 MFI.getObjectSize(FrameIndex);
486 }
487 }
488 }
489 // To support EH funclets we allocate an UnwindHelp object
490 FixedObjectSize += 8;
491 }
492 return alignTo(FixedObjectSize, 16);
493 }
494}
495
496/// Returns the size of the entire SVE stackframe (calleesaves + spills).
499 return StackOffset::getScalable((int64_t)AFI->getStackSizeSVE());
500}
501
503 if (!EnableRedZone)
504 return false;
505
506 // Don't use the red zone if the function explicitly asks us not to.
507 // This is typically used for kernel code.
508 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
509 const unsigned RedZoneSize =
511 if (!RedZoneSize)
512 return false;
513
514 const MachineFrameInfo &MFI = MF.getFrameInfo();
516 uint64_t NumBytes = AFI->getLocalStackSize();
517
518 // If neither NEON nor SVE is available, a COPY from one Q-reg to
519 // another requires a spill -> reload sequence. We can do that
520 // using a pre-decrementing store/post-decrementing load, but
521 // if we do so, we can't use the Red Zone.
522 bool LowerQRegCopyThroughMem = Subtarget.hasFPARMv8() &&
523 !Subtarget.isNeonAvailable() &&
524 !Subtarget.hasSVE();
525
526 return !(MFI.hasCalls() || hasFP(MF) || NumBytes > RedZoneSize ||
527 getSVEStackSize(MF) || LowerQRegCopyThroughMem);
528}
529
530/// hasFPImpl - Return true if the specified function should have a dedicated
531/// frame pointer register.
533 const MachineFrameInfo &MFI = MF.getFrameInfo();
534 const TargetRegisterInfo *RegInfo = MF.getSubtarget().getRegisterInfo();
536
537 // Win64 EH requires a frame pointer if funclets are present, as the locals
538 // are accessed off the frame pointer in both the parent function and the
539 // funclets.
540 if (MF.hasEHFunclets())
541 return true;
542 // Retain behavior of always omitting the FP for leaf functions when possible.
544 return true;
545 if (MFI.hasVarSizedObjects() || MFI.isFrameAddressTaken() ||
546 MFI.hasStackMap() || MFI.hasPatchPoint() ||
547 RegInfo->hasStackRealignment(MF))
548 return true;
549
550 // If we:
551 //
552 // 1. Have streaming mode changes
553 // OR:
554 // 2. Have a streaming body with SVE stack objects
555 //
556 // Then the value of VG restored when unwinding to this function may not match
557 // the value of VG used to set up the stack.
558 //
559 // This is a problem as the CFA can be described with an expression of the
560 // form: CFA = SP + NumBytes + VG * NumScalableBytes.
561 //
562 // If the value of VG used in that expression does not match the value used to
563 // set up the stack, an incorrect address for the CFA will be computed, and
564 // unwinding will fail.
565 //
566 // We work around this issue by ensuring the frame-pointer can describe the
567 // CFA in either of these cases.
568 if (AFI.needsDwarfUnwindInfo(MF) &&
570 (!AFI.hasCalculatedStackSizeSVE() || AFI.getStackSizeSVE() > 0)))
571 return true;
572 // With large callframes around we may need to use FP to access the scavenging
573 // emergency spillslot.
574 //
575 // Unfortunately some calls to hasFP() like machine verifier ->
576 // getReservedReg() -> hasFP in the middle of global isel are too early
577 // to know the max call frame size. Hopefully conservatively returning "true"
578 // in those cases is fine.
579 // DefaultSafeSPDisplacement is fine as we only emergency spill GP regs.
580 if (!MFI.isMaxCallFrameSizeComputed() ||
582 return true;
583
584 return false;
585}
586
587/// Should the Frame Pointer be reserved for the current function?
589 const TargetMachine &TM = MF.getTarget();
590 const Triple &TT = TM.getTargetTriple();
591
592 // These OSes require the frame chain is valid, even if the current frame does
593 // not use a frame pointer.
594 if (TT.isOSDarwin() || TT.isOSWindows())
595 return true;
596
597 // If the function has a frame pointer, it is reserved.
598 if (hasFP(MF))
599 return true;
600
601 // Frontend has requested to preserve the frame pointer.
602 if (TM.Options.FramePointerIsReserved(MF))
603 return true;
604
605 return false;
606}
607
608/// hasReservedCallFrame - Under normal circumstances, when a frame pointer is
609/// not required, we reserve argument space for call sites in the function
610/// immediately on entry to the current function. This eliminates the need for
611/// add/sub sp brackets around call sites. Returns true if the call frame is
612/// included as part of the stack frame.
614 const MachineFunction &MF) const {
615 // The stack probing code for the dynamically allocated outgoing arguments
616 // area assumes that the stack is probed at the top - either by the prologue
617 // code, which issues a probe if `hasVarSizedObjects` returns true, or by the
618 // most recent variable-sized object allocation. Changing the condition here
619 // may need to be followed up by changes to the probe issuing logic.
620 return !MF.getFrameInfo().hasVarSizedObjects();
621}
622
626 const AArch64InstrInfo *TII =
627 static_cast<const AArch64InstrInfo *>(MF.getSubtarget().getInstrInfo());
628 const AArch64TargetLowering *TLI =
629 MF.getSubtarget<AArch64Subtarget>().getTargetLowering();
630 [[maybe_unused]] MachineFrameInfo &MFI = MF.getFrameInfo();
631 DebugLoc DL = I->getDebugLoc();
632 unsigned Opc = I->getOpcode();
633 bool IsDestroy = Opc == TII->getCallFrameDestroyOpcode();
634 uint64_t CalleePopAmount = IsDestroy ? I->getOperand(1).getImm() : 0;
635
636 if (!hasReservedCallFrame(MF)) {
637 int64_t Amount = I->getOperand(0).getImm();
638 Amount = alignTo(Amount, getStackAlign());
639 if (!IsDestroy)
640 Amount = -Amount;
641
642 // N.b. if CalleePopAmount is valid but zero (i.e. callee would pop, but it
643 // doesn't have to pop anything), then the first operand will be zero too so
644 // this adjustment is a no-op.
645 if (CalleePopAmount == 0) {
646 // FIXME: in-function stack adjustment for calls is limited to 24-bits
647 // because there's no guaranteed temporary register available.
648 //
649 // ADD/SUB (immediate) has only LSL #0 and LSL #12 available.
650 // 1) For offset <= 12-bit, we use LSL #0
651 // 2) For 12-bit <= offset <= 24-bit, we use two instructions. One uses
652 // LSL #0, and the other uses LSL #12.
653 //
654 // Most call frames will be allocated at the start of a function so
655 // this is OK, but it is a limitation that needs dealing with.
656 assert(Amount > -0xffffff && Amount < 0xffffff && "call frame too large");
657
658 if (TLI->hasInlineStackProbe(MF) &&
660 // When stack probing is enabled, the decrement of SP may need to be
661 // probed. We only need to do this if the call site needs 1024 bytes of
662 // space or more, because a region smaller than that is allowed to be
663 // unprobed at an ABI boundary. We rely on the fact that SP has been
664 // probed exactly at this point, either by the prologue or most recent
665 // dynamic allocation.
667 "non-reserved call frame without var sized objects?");
668 Register ScratchReg =
669 MF.getRegInfo().createVirtualRegister(&AArch64::GPR64RegClass);
670 inlineStackProbeFixed(I, ScratchReg, -Amount, StackOffset::get(0, 0));
671 } else {
672 emitFrameOffset(MBB, I, DL, AArch64::SP, AArch64::SP,
673 StackOffset::getFixed(Amount), TII);
674 }
675 }
676 } else if (CalleePopAmount != 0) {
677 // If the calling convention demands that the callee pops arguments from the
678 // stack, we want to add it back if we have a reserved call frame.
679 assert(CalleePopAmount < 0xffffff && "call frame too large");
680 emitFrameOffset(MBB, I, DL, AArch64::SP, AArch64::SP,
681 StackOffset::getFixed(-(int64_t)CalleePopAmount), TII);
682 }
683 return MBB.erase(I);
684}
685
686void AArch64FrameLowering::emitCalleeSavedGPRLocations(
689 MachineFrameInfo &MFI = MF.getFrameInfo();
690
691 const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();
692 if (CSI.empty())
693 return;
694
696 for (const auto &Info : CSI) {
697 unsigned FrameIdx = Info.getFrameIdx();
698 if (MFI.getStackID(FrameIdx) == TargetStackID::ScalableVector)
699 continue;
700
701 assert(!Info.isSpilledToReg() && "Spilling to registers not implemented");
702 int64_t Offset = MFI.getObjectOffset(FrameIdx) - getOffsetOfLocalArea();
703 CFIBuilder.buildOffset(Info.getReg(), Offset);
704 }
705}
706
707void AArch64FrameLowering::emitCalleeSavedSVELocations(
710 MachineFrameInfo &MFI = MF.getFrameInfo();
711
712 // Add callee saved registers to move list.
713 const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();
714 if (CSI.empty())
715 return;
716
717 const TargetSubtargetInfo &STI = MF.getSubtarget();
718 const TargetRegisterInfo &TRI = *STI.getRegisterInfo();
721
722 std::optional<int64_t> IncomingVGOffsetFromDefCFA;
723 if (requiresSaveVG(MF)) {
724 auto IncomingVG = *find_if(
725 reverse(CSI), [](auto &Info) { return Info.getReg() == AArch64::VG; });
726 IncomingVGOffsetFromDefCFA =
727 MFI.getObjectOffset(IncomingVG.getFrameIdx()) - getOffsetOfLocalArea();
728 }
729
730 for (const auto &Info : CSI) {
731 if (MFI.getStackID(Info.getFrameIdx()) != TargetStackID::ScalableVector)
732 continue;
733
734 // Not all unwinders may know about SVE registers, so assume the lowest
735 // common denominator.
736 assert(!Info.isSpilledToReg() && "Spilling to registers not implemented");
737 MCRegister Reg = Info.getReg();
738 if (!static_cast<const AArch64RegisterInfo &>(TRI).regNeedsCFI(Reg, Reg))
739 continue;
740
742 StackOffset::getScalable(MFI.getObjectOffset(Info.getFrameIdx())) -
744
745 CFIBuilder.insertCFIInst(
746 createCFAOffset(TRI, Reg, Offset, IncomingVGOffsetFromDefCFA));
747 }
748}
749
751 MachineBasicBlock &MBB) const {
752
754 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
755 const auto &TRI = *Subtarget.getRegisterInfo();
756 const auto &MFI = *MF.getInfo<AArch64FunctionInfo>();
757
759
760 // Reset the CFA to `SP + 0`.
761 CFIBuilder.buildDefCFA(AArch64::SP, 0);
762
763 // Flip the RA sign state.
764 if (MFI.shouldSignReturnAddress(MF))
765 MFI.branchProtectionPAuthLR() ? CFIBuilder.buildNegateRAStateWithPC()
766 : CFIBuilder.buildNegateRAState();
767
768 // Shadow call stack uses X18, reset it.
769 if (MFI.needsShadowCallStackPrologueEpilogue(MF))
770 CFIBuilder.buildSameValue(AArch64::X18);
771
772 // Emit .cfi_same_value for callee-saved registers.
773 const std::vector<CalleeSavedInfo> &CSI =
775 for (const auto &Info : CSI) {
776 MCRegister Reg = Info.getReg();
777 if (!TRI.regNeedsCFI(Reg, Reg))
778 continue;
779 CFIBuilder.buildSameValue(Reg);
780 }
781}
782
785 bool SVE) {
787 MachineFrameInfo &MFI = MF.getFrameInfo();
788
789 const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();
790 if (CSI.empty())
791 return;
792
793 const TargetSubtargetInfo &STI = MF.getSubtarget();
794 const TargetRegisterInfo &TRI = *STI.getRegisterInfo();
796
797 for (const auto &Info : CSI) {
798 if (SVE !=
799 (MFI.getStackID(Info.getFrameIdx()) == TargetStackID::ScalableVector))
800 continue;
801
802 MCRegister Reg = Info.getReg();
803 if (SVE &&
804 !static_cast<const AArch64RegisterInfo &>(TRI).regNeedsCFI(Reg, Reg))
805 continue;
806
807 CFIBuilder.buildRestore(Info.getReg());
808 }
809}
810
811void AArch64FrameLowering::emitCalleeSavedGPRRestores(
814}
815
816void AArch64FrameLowering::emitCalleeSavedSVERestores(
819}
820
821// Return the maximum possible number of bytes for `Size` due to the
822// architectural limit on the size of an SVE register.
823static int64_t upperBound(StackOffset Size) {
824 static const int64_t MAX_BYTES_PER_SCALABLE_BYTE = 16;
825 return Size.getScalable() * MAX_BYTES_PER_SCALABLE_BYTE + Size.getFixed();
826}
827
828void AArch64FrameLowering::allocateStackSpace(
830 int64_t RealignmentPadding, StackOffset AllocSize, bool NeedsWinCFI,
831 bool *HasWinCFI, bool EmitCFI, StackOffset InitialOffset,
832 bool FollowupAllocs) const {
833
834 if (!AllocSize)
835 return;
836
837 DebugLoc DL;
839 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
840 const TargetInstrInfo &TII = *Subtarget.getInstrInfo();
842 const MachineFrameInfo &MFI = MF.getFrameInfo();
843
844 const int64_t MaxAlign = MFI.getMaxAlign().value();
845 const uint64_t AndMask = ~(MaxAlign - 1);
846
847 if (!Subtarget.getTargetLowering()->hasInlineStackProbe(MF)) {
848 Register TargetReg = RealignmentPadding
850 : AArch64::SP;
851 // SUB Xd/SP, SP, AllocSize
852 emitFrameOffset(MBB, MBBI, DL, TargetReg, AArch64::SP, -AllocSize, &TII,
853 MachineInstr::FrameSetup, false, NeedsWinCFI, HasWinCFI,
854 EmitCFI, InitialOffset);
855
856 if (RealignmentPadding) {
857 // AND SP, X9, 0b11111...0000
858 BuildMI(MBB, MBBI, DL, TII.get(AArch64::ANDXri), AArch64::SP)
859 .addReg(TargetReg, RegState::Kill)
862 AFI.setStackRealigned(true);
863
864 // No need for SEH instructions here; if we're realigning the stack,
865 // we've set a frame pointer and already finished the SEH prologue.
866 assert(!NeedsWinCFI);
867 }
868 return;
869 }
870
871 //
872 // Stack probing allocation.
873 //
874
875 // Fixed length allocation. If we don't need to re-align the stack and don't
876 // have SVE objects, we can use a more efficient sequence for stack probing.
877 if (AllocSize.getScalable() == 0 && RealignmentPadding == 0) {
879 assert(ScratchReg != AArch64::NoRegister);
880 BuildMI(MBB, MBBI, DL, TII.get(AArch64::PROBED_STACKALLOC))
881 .addDef(ScratchReg)
882 .addImm(AllocSize.getFixed())
883 .addImm(InitialOffset.getFixed())
884 .addImm(InitialOffset.getScalable());
885 // The fixed allocation may leave unprobed bytes at the top of the
886 // stack. If we have subsequent allocation (e.g. if we have variable-sized
887 // objects), we need to issue an extra probe, so these allocations start in
888 // a known state.
889 if (FollowupAllocs) {
890 // STR XZR, [SP]
891 BuildMI(MBB, MBBI, DL, TII.get(AArch64::STRXui))
892 .addReg(AArch64::XZR)
893 .addReg(AArch64::SP)
894 .addImm(0)
896 }
897
898 return;
899 }
900
901 // Variable length allocation.
902
903 // If the (unknown) allocation size cannot exceed the probe size, decrement
904 // the stack pointer right away.
905 int64_t ProbeSize = AFI.getStackProbeSize();
906 if (upperBound(AllocSize) + RealignmentPadding <= ProbeSize) {
907 Register ScratchReg = RealignmentPadding
909 : AArch64::SP;
910 assert(ScratchReg != AArch64::NoRegister);
911 // SUB Xd, SP, AllocSize
912 emitFrameOffset(MBB, MBBI, DL, ScratchReg, AArch64::SP, -AllocSize, &TII,
913 MachineInstr::FrameSetup, false, NeedsWinCFI, HasWinCFI,
914 EmitCFI, InitialOffset);
915 if (RealignmentPadding) {
916 // AND SP, Xn, 0b11111...0000
917 BuildMI(MBB, MBBI, DL, TII.get(AArch64::ANDXri), AArch64::SP)
918 .addReg(ScratchReg, RegState::Kill)
921 AFI.setStackRealigned(true);
922 }
923 if (FollowupAllocs || upperBound(AllocSize) + RealignmentPadding >
924 AArch64::StackProbeMaxUnprobedStack) {
925 // STR XZR, [SP]
926 BuildMI(MBB, MBBI, DL, TII.get(AArch64::STRXui))
927 .addReg(AArch64::XZR)
928 .addReg(AArch64::SP)
929 .addImm(0)
930 .setMIFlags(MachineInstr::FrameSetup);
931 }
932 return;
933 }
934
935 // Emit a variable-length allocation probing loop.
936 // TODO: As an optimisation, the loop can be "unrolled" into a few parts,
937 // each of them guaranteed to adjust the stack by less than the probe size.
938 Register TargetReg = findScratchNonCalleeSaveRegister(&MBB);
939 assert(TargetReg != AArch64::NoRegister);
940 // SUB Xd, SP, AllocSize
941 emitFrameOffset(MBB, MBBI, DL, TargetReg, AArch64::SP, -AllocSize, &TII,
942 MachineInstr::FrameSetup, false, NeedsWinCFI, HasWinCFI,
943 EmitCFI, InitialOffset);
944 if (RealignmentPadding) {
945 // AND Xn, Xn, 0b11111...0000
946 BuildMI(MBB, MBBI, DL, TII.get(AArch64::ANDXri), TargetReg)
947 .addReg(TargetReg, RegState::Kill)
948 .addImm(AArch64_AM::encodeLogicalImmediate(AndMask, 64))
949 .setMIFlags(MachineInstr::FrameSetup);
950 }
951
952 BuildMI(MBB, MBBI, DL, TII.get(AArch64::PROBED_STACKALLOC_VAR))
953 .addReg(TargetReg);
954 if (EmitCFI) {
955 // Set the CFA register back to SP.
956 CFIInstBuilder(MBB, MBBI, MachineInstr::FrameSetup)
957 .buildDefCFARegister(AArch64::SP);
958 }
959 if (RealignmentPadding)
960 AFI.setStackRealigned(true);
961}
962
963static MCRegister getRegisterOrZero(MCRegister Reg, bool HasSVE) {
964 switch (Reg.id()) {
965 default:
966 // The called routine is expected to preserve r19-r28
967 // r29 and r30 are used as frame pointer and link register resp.
968 return 0;
969
970 // GPRs
971#define CASE(n) \
972 case AArch64::W##n: \
973 case AArch64::X##n: \
974 return AArch64::X##n
975 CASE(0);
976 CASE(1);
977 CASE(2);
978 CASE(3);
979 CASE(4);
980 CASE(5);
981 CASE(6);
982 CASE(7);
983 CASE(8);
984 CASE(9);
985 CASE(10);
986 CASE(11);
987 CASE(12);
988 CASE(13);
989 CASE(14);
990 CASE(15);
991 CASE(16);
992 CASE(17);
993 CASE(18);
994#undef CASE
995
996 // FPRs
997#define CASE(n) \
998 case AArch64::B##n: \
999 case AArch64::H##n: \
1000 case AArch64::S##n: \
1001 case AArch64::D##n: \
1002 case AArch64::Q##n: \
1003 return HasSVE ? AArch64::Z##n : AArch64::Q##n
1004 CASE(0);
1005 CASE(1);
1006 CASE(2);
1007 CASE(3);
1008 CASE(4);
1009 CASE(5);
1010 CASE(6);
1011 CASE(7);
1012 CASE(8);
1013 CASE(9);
1014 CASE(10);
1015 CASE(11);
1016 CASE(12);
1017 CASE(13);
1018 CASE(14);
1019 CASE(15);
1020 CASE(16);
1021 CASE(17);
1022 CASE(18);
1023 CASE(19);
1024 CASE(20);
1025 CASE(21);
1026 CASE(22);
1027 CASE(23);
1028 CASE(24);
1029 CASE(25);
1030 CASE(26);
1031 CASE(27);
1032 CASE(28);
1033 CASE(29);
1034 CASE(30);
1035 CASE(31);
1036#undef CASE
1037 }
1038}
1039
1040void AArch64FrameLowering::emitZeroCallUsedRegs(BitVector RegsToZero,
1041 MachineBasicBlock &MBB) const {
1042 // Insertion point.
1043 MachineBasicBlock::iterator MBBI = MBB.begin();
1044
1045 // Fake a debug loc.
1046 DebugLoc DL;
1047 if (MBBI != MBB.end())
1048 DL = MBBI->getDebugLoc();
1049
1050 const MachineFunction &MF = *MBB.getParent();
1051 const AArch64Subtarget &STI = MF.getSubtarget<AArch64Subtarget>();
1052 const AArch64RegisterInfo &TRI = *STI.getRegisterInfo();
1053
1054 BitVector GPRsToZero(TRI.getNumRegs());
1055 BitVector FPRsToZero(TRI.getNumRegs());
1056 bool HasSVE = STI.isSVEorStreamingSVEAvailable();
1057 for (MCRegister Reg : RegsToZero.set_bits()) {
1058 if (TRI.isGeneralPurposeRegister(MF, Reg)) {
1059 // For GPRs, we only care to clear out the 64-bit register.
1060 if (MCRegister XReg = getRegisterOrZero(Reg, HasSVE))
1061 GPRsToZero.set(XReg);
1062 } else if (AArch64InstrInfo::isFpOrNEON(Reg)) {
1063 // For FPRs, zero out the widest register we can access (Z if SVE, else Q).
1064 if (MCRegister XReg = getRegisterOrZero(Reg, HasSVE))
1065 FPRsToZero.set(XReg);
1066 }
1067 }
1068
1069 const AArch64InstrInfo &TII = *STI.getInstrInfo();
1070
1071 // Zero out GPRs.
1072 for (MCRegister Reg : GPRsToZero.set_bits())
1073 TII.buildClearRegister(Reg, MBB, MBBI, DL);
1074
1075 // Zero out FP/vector registers.
1076 for (MCRegister Reg : FPRsToZero.set_bits())
1077 TII.buildClearRegister(Reg, MBB, MBBI, DL);
1078
1079 if (HasSVE) {
1080 for (MCRegister PReg :
1081 {AArch64::P0, AArch64::P1, AArch64::P2, AArch64::P3, AArch64::P4,
1082 AArch64::P5, AArch64::P6, AArch64::P7, AArch64::P8, AArch64::P9,
1083 AArch64::P10, AArch64::P11, AArch64::P12, AArch64::P13, AArch64::P14,
1084 AArch64::P15}) {
1085 if (RegsToZero[PReg])
1086 BuildMI(MBB, MBBI, DL, TII.get(AArch64::PFALSE), PReg);
1087 }
1088 }
1089}
1090
1091 static bool windowsRequiresStackProbe(const MachineFunction &MF,
1092 uint64_t StackSizeInBytes) {
1093 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1094 const AArch64FunctionInfo &MFI = *MF.getInfo<AArch64FunctionInfo>();
1095 // TODO: When implementing stack protectors, take that into account
1096 // for the probe threshold.
1097 return Subtarget.isTargetWindows() && MFI.hasStackProbing() &&
1098 StackSizeInBytes >= uint64_t(MFI.getStackProbeSize());
1099}
1100
1101 static void getLiveRegsForEntryMBB(LivePhysRegs &LiveRegs,
1102 const MachineBasicBlock &MBB) {
1103 const MachineFunction *MF = MBB.getParent();
1104 LiveRegs.addLiveIns(MBB);
1105 // Mark callee saved registers as used so we will not choose them.
1106 const MCPhysReg *CSRegs = MF->getRegInfo().getCalleeSavedRegs();
1107 for (unsigned i = 0; CSRegs[i]; ++i)
1108 LiveRegs.addReg(CSRegs[i]);
1109}
1110
1111// Find a scratch register that we can use at the start of the prologue to
1112// re-align the stack pointer. We avoid using callee-save registers since they
1113// may appear to be free when this is called from canUseAsPrologue (during
1114// shrink wrapping), but then no longer be free when this is called from
1115// emitPrologue.
1116//
1117// FIXME: This is a bit conservative, since in the above case we could use one
1118// of the callee-save registers as a scratch temp to re-align the stack pointer,
1119// but we would then have to make sure that we were in fact saving at least one
1120// callee-save register in the prologue, which is additional complexity that
1121// doesn't seem worth the benefit.
1122 static unsigned findScratchNonCalleeSaveRegister(MachineBasicBlock *MBB,
1123 bool HasCall = false) {
1124 MachineFunction *MF = MBB->getParent();
1125
1126 // If MBB is an entry block, use X9 as the scratch register;
1127 // preserve_none functions may be using X9 to pass arguments,
1128 // so prefer to pick an available register below.
1129 if (&MF->front() == MBB &&
1130 MF->getFunction().getCallingConv() != CallingConv::PreserveNone)
1131 return AArch64::X9;
1132
1133 const AArch64Subtarget &Subtarget = MF->getSubtarget<AArch64Subtarget>();
1134 const AArch64RegisterInfo &TRI = *Subtarget.getRegisterInfo();
1135 LivePhysRegs LiveRegs(TRI);
1136 getLiveRegsForEntryMBB(LiveRegs, *MBB);
1137 if (HasCall) {
1138 LiveRegs.addReg(AArch64::X16);
1139 LiveRegs.addReg(AArch64::X17);
1140 LiveRegs.addReg(AArch64::X18);
1141 }
1142
1143 // Prefer X9 since it was historically used for the prologue scratch reg.
1144 const MachineRegisterInfo &MRI = MF->getRegInfo();
1145 if (LiveRegs.available(MRI, AArch64::X9))
1146 return AArch64::X9;
1147
1148 for (unsigned Reg : AArch64::GPR64RegClass) {
1149 if (LiveRegs.available(MRI, Reg))
1150 return Reg;
1151 }
1152 return AArch64::NoRegister;
1153}
1154
1155 bool AArch64FrameLowering::canUseAsPrologue(
1156 const MachineBasicBlock &MBB) const {
1157 const MachineFunction *MF = MBB.getParent();
1158 MachineBasicBlock *TmpMBB = const_cast<MachineBasicBlock *>(&MBB);
1159 const AArch64Subtarget &Subtarget = MF->getSubtarget<AArch64Subtarget>();
1160 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
1161 const AArch64TargetLowering *TLI = Subtarget.getTargetLowering();
1162 const AArch64FunctionInfo *AFI = MF->getInfo<AArch64FunctionInfo>();
1163
1164 if (AFI->hasSwiftAsyncContext()) {
1165 const AArch64RegisterInfo &TRI = *Subtarget.getRegisterInfo();
1166 const MachineRegisterInfo &MRI = MF->getRegInfo();
1167 LivePhysRegs LiveRegs(TRI);
1168 getLiveRegsForEntryMBB(LiveRegs, MBB);
1169 // The StoreSwiftAsyncContext clobbers X16 and X17. Make sure they are
1170 // available.
1171 if (!LiveRegs.available(MRI, AArch64::X16) ||
1172 !LiveRegs.available(MRI, AArch64::X17))
1173 return false;
1174 }
1175
1176 // Certain stack probing sequences might clobber flags, so we can't use
1177 // the block as a prologue if the flags register is a live-in.
1178 if (AFI->hasStackProbing() &&
1179 MBB.isLiveIn(AArch64::NZCV))
1180 return false;
1180 return false;
1181
1182 if (RegInfo->hasStackRealignment(*MF) || TLI->hasInlineStackProbe(*MF))
1183 if (findScratchNonCalleeSaveRegister(TmpMBB) == AArch64::NoRegister)
1184 return false;
1185
1186 // May need a scratch register (for the return value) if the function
1187 // requires making a special call.
1188 if (requiresSaveVG(*MF) ||
1189 windowsRequiresStackProbe(*MF, std::numeric_limits<uint64_t>::max()))
1190 if (findScratchNonCalleeSaveRegister(TmpMBB, true) == AArch64::NoRegister)
1191 return false;
1192
1193 return true;
1194}
1195
1196static bool needsWinCFI(const MachineFunction &MF) {
1197 const Function &F = MF.getFunction();
1198 return MF.getTarget().getMCAsmInfo()->usesWindowsCFI() &&
1199 F.needsUnwindTableEntry();
1200}
1201
1202 static bool shouldSignReturnAddressEverywhere(const MachineFunction &MF) {
1203 // FIXME: With WinCFI, extra care should be taken to place SEH_PACSignLR
1204 // and SEH_EpilogEnd instructions in the correct order.
1205 if (needsWinCFI(MF))
1206 return false;
1207 const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
1208 bool SignReturnAddressAll = AFI->shouldSignReturnAddress(/*SpillsLR=*/false);
1209 return SignReturnAddressAll;
1210}
1211
1212bool AArch64FrameLowering::shouldCombineCSRLocalStackBump(
1213 MachineFunction &MF, uint64_t StackBumpBytes) const {
1214 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
1215 const MachineFrameInfo &MFI = MF.getFrameInfo();
1216 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1217 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
1218 if (homogeneousPrologEpilog(MF))
1219 return false;
1220
1221 if (AFI->getLocalStackSize() == 0)
1222 return false;
1223
1224 // For WinCFI, if optimizing for size, prefer to not combine the stack bump
1225 // (to force a stp with predecrement) to match the packed unwind format,
1226 // provided that there actually are any callee saved registers to merge the
1227 // decrement with.
1228 // This is potentially marginally slower, but allows using the packed
1229 // unwind format for functions that both have a local area and callee saved
1230 // registers. Using the packed unwind format notably reduces the size of
1231 // the unwind info.
1232 if (needsWinCFI(MF) && AFI->getCalleeSavedStackSize() > 0 &&
1233 MF.getFunction().hasOptSize())
1234 return false;
1235
1236 // 512 is the maximum immediate for stp/ldp that will be used for
1237 // callee-save save/restores
1238 if (StackBumpBytes >= 512 || windowsRequiresStackProbe(MF, StackBumpBytes))
1239 return false;
1240
1241 if (MFI.hasVarSizedObjects())
1242 return false;
1243
1244 if (RegInfo->hasStackRealignment(MF))
1245 return false;
1246
1247 // This isn't strictly necessary, but it simplifies things a bit since the
1248 // current RedZone handling code assumes the SP is adjusted by the
1249 // callee-save save/restore code.
1250 if (canUseRedZone(MF))
1251 return false;
1252
1253 // When there is an SVE area on the stack, always allocate the
1254 // callee-saves and spills/locals separately.
1255 if (getSVEStackSize(MF))
1256 return false;
1257
1258 return true;
1259}
1260
1261bool AArch64FrameLowering::shouldCombineCSRLocalStackBumpInEpilogue(
1262 MachineBasicBlock &MBB, uint64_t StackBumpBytes) const {
1263 if (!shouldCombineCSRLocalStackBump(*MBB.getParent(), StackBumpBytes))
1264 return false;
1265 if (MBB.empty())
1266 return true;
1267
1268 // Disable combined SP bump if the last instruction is an MTE tag store. It
1269 // is almost always better to merge SP adjustment into those instructions.
1270 MachineBasicBlock::iterator LastI = MBB.getFirstTerminator();
1271 MachineBasicBlock::iterator Begin = MBB.begin();
1272 while (LastI != Begin) {
1273 --LastI;
1274 if (LastI->isTransient())
1275 continue;
1276 if (!LastI->getFlag(MachineInstr::FrameDestroy))
1277 break;
1278 }
1279 switch (LastI->getOpcode()) {
1280 case AArch64::STGloop:
1281 case AArch64::STZGloop:
1282 case AArch64::STGi:
1283 case AArch64::STZGi:
1284 case AArch64::ST2Gi:
1285 case AArch64::STZ2Gi:
1286 return false;
1287 default:
1288 return true;
1289 }
1290 llvm_unreachable("unreachable");
1291}
1292
1293// Given a load or a store instruction, generate an appropriate unwinding SEH
1294// code on Windows.
1295 static MachineBasicBlock::iterator InsertSEH(MachineBasicBlock::iterator MBBI,
1296 const TargetInstrInfo &TII,
1297 MachineInstr::MIFlag Flag) {
1298 unsigned Opc = MBBI->getOpcode();
1299 auto *MBB = MBBI->getParent();
1300 MachineFunction &MF = *MBB->getParent();
1301 DebugLoc DL = MBBI->getDebugLoc();
1302 unsigned ImmIdx = MBBI->getNumOperands() - 1;
1303 int Imm = MBBI->getOperand(ImmIdx).getImm();
1304 MachineInstrBuilder MIB;
1305 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1306 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
1307
1308 switch (Opc) {
1309 default:
1310 report_fatal_error("No SEH Opcode for this instruction");
1311 case AArch64::STR_ZXI:
1312 case AArch64::LDR_ZXI: {
1313 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1314 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveZReg))
1315 .addImm(Reg0)
1316 .addImm(Imm)
1317 .setMIFlag(Flag);
1318 break;
1319 }
1320 case AArch64::STR_PXI:
1321 case AArch64::LDR_PXI: {
1322 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1323 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SavePReg))
1324 .addImm(Reg0)
1325 .addImm(Imm)
1326 .setMIFlag(Flag);
1327 break;
1328 }
1329 case AArch64::LDPDpost:
1330 Imm = -Imm;
1331 [[fallthrough]];
1332 case AArch64::STPDpre: {
1333 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1334 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(2).getReg());
1335 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFRegP_X))
1336 .addImm(Reg0)
1337 .addImm(Reg1)
1338 .addImm(Imm * 8)
1339 .setMIFlag(Flag);
1340 break;
1341 }
1342 case AArch64::LDPXpost:
1343 Imm = -Imm;
1344 [[fallthrough]];
1345 case AArch64::STPXpre: {
1346 Register Reg0 = MBBI->getOperand(1).getReg();
1347 Register Reg1 = MBBI->getOperand(2).getReg();
1348 if (Reg0 == AArch64::FP && Reg1 == AArch64::LR)
1349 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFPLR_X))
1350 .addImm(Imm * 8)
1351 .setMIFlag(Flag);
1352 else
1353 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveRegP_X))
1354 .addImm(RegInfo->getSEHRegNum(Reg0))
1355 .addImm(RegInfo->getSEHRegNum(Reg1))
1356 .addImm(Imm * 8)
1357 .setMIFlag(Flag);
1358 break;
1359 }
1360 case AArch64::LDRDpost:
1361 Imm = -Imm;
1362 [[fallthrough]];
1363 case AArch64::STRDpre: {
1364 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1365 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFReg_X))
1366 .addImm(Reg)
1367 .addImm(Imm)
1368 .setMIFlag(Flag);
1369 break;
1370 }
1371 case AArch64::LDRXpost:
1372 Imm = -Imm;
1373 [[fallthrough]];
1374 case AArch64::STRXpre: {
1375 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1376 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveReg_X))
1377 .addImm(Reg)
1378 .addImm(Imm)
1379 .setMIFlag(Flag);
1380 break;
1381 }
1382 case AArch64::STPDi:
1383 case AArch64::LDPDi: {
1384 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1385 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1386 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFRegP))
1387 .addImm(Reg0)
1388 .addImm(Reg1)
1389 .addImm(Imm * 8)
1390 .setMIFlag(Flag);
1391 break;
1392 }
1393 case AArch64::STPXi:
1394 case AArch64::LDPXi: {
1395 Register Reg0 = MBBI->getOperand(0).getReg();
1396 Register Reg1 = MBBI->getOperand(1).getReg();
1397 if (Reg0 == AArch64::FP && Reg1 == AArch64::LR)
1398 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFPLR))
1399 .addImm(Imm * 8)
1400 .setMIFlag(Flag);
1401 else
1402 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveRegP))
1403 .addImm(RegInfo->getSEHRegNum(Reg0))
1404 .addImm(RegInfo->getSEHRegNum(Reg1))
1405 .addImm(Imm * 8)
1406 .setMIFlag(Flag);
1407 break;
1408 }
1409 case AArch64::STRXui:
1410 case AArch64::LDRXui: {
1411 int Reg = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1412 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveReg))
1413 .addImm(Reg)
1414 .addImm(Imm * 8)
1415 .setMIFlag(Flag);
1416 break;
1417 }
1418 case AArch64::STRDui:
1419 case AArch64::LDRDui: {
1420 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1421 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFReg))
1422 .addImm(Reg)
1423 .addImm(Imm * 8)
1424 .setMIFlag(Flag);
1425 break;
1426 }
1427 case AArch64::STPQi:
1428 case AArch64::LDPQi: {
1429 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1430 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1431 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveAnyRegQP))
1432 .addImm(Reg0)
1433 .addImm(Reg1)
1434 .addImm(Imm * 16)
1435 .setMIFlag(Flag);
1436 break;
1437 }
1438 case AArch64::LDPQpost:
1439 Imm = -Imm;
1440 [[fallthrough]];
1441 case AArch64::STPQpre: {
1442 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1443 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(2).getReg());
1444 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveAnyRegQPX))
1445 .addImm(Reg0)
1446 .addImm(Reg1)
1447 .addImm(Imm * 16)
1448 .setMIFlag(Flag);
1449 break;
1450 }
1451 }
1452 auto I = MBB->insertAfter(MBBI, MIB);
1453 return I;
1454}
1455
1456// Fix up the SEH opcode associated with the save/restore instruction.
1457 static void fixupSEHOpcode(MachineBasicBlock::iterator MBBI,
1458 unsigned LocalStackSize) {
1459 MachineOperand *ImmOpnd = nullptr;
1460 unsigned ImmIdx = MBBI->getNumOperands() - 1;
1461 switch (MBBI->getOpcode()) {
1462 default:
1463 llvm_unreachable("Fix the offset in the SEH instruction");
1464 case AArch64::SEH_SaveFPLR:
1465 case AArch64::SEH_SaveRegP:
1466 case AArch64::SEH_SaveReg:
1467 case AArch64::SEH_SaveFRegP:
1468 case AArch64::SEH_SaveFReg:
1469 case AArch64::SEH_SaveAnyRegQP:
1470 case AArch64::SEH_SaveAnyRegQPX:
1471 ImmOpnd = &MBBI->getOperand(ImmIdx);
1472 break;
1473 }
1474 if (ImmOpnd)
1475 ImmOpnd->setImm(ImmOpnd->getImm() + LocalStackSize);
1476}
1477
1478 static bool requiresGetVGCall(const MachineFunction &MF) {
1479 const AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
1480 return AFI->hasStreamingModeChanges() &&
1481 !MF.getSubtarget<AArch64Subtarget>().hasSVE();
1482}
1483
1484static bool requiresSaveVG(const MachineFunction &MF) {
1485 const AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
1486 if (!AFI->needsDwarfUnwindInfo(MF) || !AFI->hasStreamingModeChanges())
1487 return false;
1488 // For Darwin platforms we don't save VG for non-SVE functions, even if SME
1489 // is enabled with streaming mode changes.
1490 auto &ST = MF.getSubtarget<AArch64Subtarget>();
1491 if (ST.isTargetDarwin())
1492 return ST.hasSVE();
1493 return true;
1494}
1495
1496static bool matchLibcall(const TargetLowering &TLI, const MachineOperand &MO,
1497 RTLIB::Libcall LC) {
1498 return MO.isSymbol() &&
1499 StringRef(TLI.getLibcallName(LC)) == MO.getSymbolName();
1500}
1501
1502 static bool isVGInstruction(MachineBasicBlock::iterator MBBI,
1503 const TargetLowering &TLI) {
1504 unsigned Opc = MBBI->getOpcode();
1505 if (Opc == AArch64::CNTD_XPiI)
1506 return true;
1507
1508 if (!requiresGetVGCall(*MBBI->getMF()))
1509 return false;
1510
1511 if (Opc == AArch64::BL)
1512 return matchLibcall(TLI, MBBI->getOperand(0), RTLIB::SMEABI_GET_CURRENT_VG);
1513
1514 return Opc == TargetOpcode::COPY;
1515}
1516
1517// Convert callee-save register save/restore instruction to do stack pointer
1518// decrement/increment to allocate/deallocate the callee-save stack area by
1519// converting store/load to use pre/post increment version.
1520 static MachineBasicBlock::iterator convertCalleeSaveRestoreToSPPrePostIncDec(
1521 MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI,
1522 const DebugLoc &DL, const TargetInstrInfo *TII, int CSStackSizeInc,
1523 bool NeedsWinCFI, bool *HasWinCFI, bool EmitCFI,
1524 MachineInstr::MIFlag FrameFlag = MachineInstr::FrameSetup,
1525 int CFAOffset = 0) {
1526 unsigned NewOpc;
1527
1528 // If the function contains streaming mode changes, we expect instructions
1529 // to calculate the value of VG before spilling. Move past these instructions
1530 // if necessary.
1531 MachineFunction &MF = *MBB.getParent();
1532 if (requiresSaveVG(MF)) {
1533 auto &TLI = *MF.getSubtarget().getTargetLowering();
1534 while (isVGInstruction(MBBI, TLI))
1535 ++MBBI;
1536 }
1537
1538 switch (MBBI->getOpcode()) {
1539 default:
1540 llvm_unreachable("Unexpected callee-save save/restore opcode!");
1541 case AArch64::STPXi:
1542 NewOpc = AArch64::STPXpre;
1543 break;
1544 case AArch64::STPDi:
1545 NewOpc = AArch64::STPDpre;
1546 break;
1547 case AArch64::STPQi:
1548 NewOpc = AArch64::STPQpre;
1549 break;
1550 case AArch64::STRXui:
1551 NewOpc = AArch64::STRXpre;
1552 break;
1553 case AArch64::STRDui:
1554 NewOpc = AArch64::STRDpre;
1555 break;
1556 case AArch64::STRQui:
1557 NewOpc = AArch64::STRQpre;
1558 break;
1559 case AArch64::LDPXi:
1560 NewOpc = AArch64::LDPXpost;
1561 break;
1562 case AArch64::LDPDi:
1563 NewOpc = AArch64::LDPDpost;
1564 break;
1565 case AArch64::LDPQi:
1566 NewOpc = AArch64::LDPQpost;
1567 break;
1568 case AArch64::LDRXui:
1569 NewOpc = AArch64::LDRXpost;
1570 break;
1571 case AArch64::LDRDui:
1572 NewOpc = AArch64::LDRDpost;
1573 break;
1574 case AArch64::LDRQui:
1575 NewOpc = AArch64::LDRQpost;
1576 break;
1577 }
1578 TypeSize Scale = TypeSize::getFixed(1), Width = TypeSize::getFixed(0);
1579 int64_t MinOffset, MaxOffset;
1580 bool Success = static_cast<const AArch64InstrInfo *>(TII)->getMemOpInfo(
1581 NewOpc, Scale, Width, MinOffset, MaxOffset);
1582 (void)Success;
1583 assert(Success && "unknown load/store opcode");
1584
1585 // If the first store isn't right where we want SP then we can't fold the
1586 // update in so create a normal arithmetic instruction instead.
1587 if (MBBI->getOperand(MBBI->getNumOperands() - 1).getImm() != 0 ||
1588 CSStackSizeInc < MinOffset * (int64_t)Scale.getFixedValue() ||
1589 CSStackSizeInc > MaxOffset * (int64_t)Scale.getFixedValue()) {
1590 // If we are destroying the frame, make sure we add the increment after the
1591 // last frame operation.
1592 if (FrameFlag == MachineInstr::FrameDestroy) {
1593 ++MBBI;
1594 // Also skip the SEH instruction, if needed
1595 if (NeedsWinCFI && AArch64InstrInfo::isSEHInstruction(*MBBI))
1596 ++MBBI;
1597 }
1598 emitFrameOffset(MBB, MBBI, DL, AArch64::SP, AArch64::SP,
1599 StackOffset::getFixed(CSStackSizeInc), TII, FrameFlag,
1600 false, NeedsWinCFI, HasWinCFI, EmitCFI,
1601 StackOffset::getFixed(CFAOffset));
1602
1603 return std::prev(MBBI);
1604 }
1605
1606 // Get rid of the SEH code associated with the old instruction.
1607 if (NeedsWinCFI) {
1608 auto SEH = std::next(MBBI);
1609 if (AArch64InstrInfo::isSEHInstruction(*SEH))
1610 SEH->eraseFromParent();
1611 }
1612
1613 MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII->get(NewOpc));
1614 MIB.addReg(AArch64::SP, RegState::Define);
1615
1616 // Copy all operands other than the immediate offset.
1617 unsigned OpndIdx = 0;
1618 for (unsigned OpndEnd = MBBI->getNumOperands() - 1; OpndIdx < OpndEnd;
1619 ++OpndIdx)
1620 MIB.add(MBBI->getOperand(OpndIdx));
1621
1622 assert(MBBI->getOperand(OpndIdx).getImm() == 0 &&
1623 "Unexpected immediate offset in first/last callee-save save/restore "
1624 "instruction!");
1625 assert(MBBI->getOperand(OpndIdx - 1).getReg() == AArch64::SP &&
1626 "Unexpected base register in callee-save save/restore instruction!");
1627 assert(CSStackSizeInc % Scale == 0);
1628 MIB.addImm(CSStackSizeInc / (int)Scale);
1629
1630 MIB.setMIFlags(MBBI->getFlags());
1631 MIB.setMemRefs(MBBI->memoperands());
1632
1633 // Generate a new SEH code that corresponds to the new instruction.
1634 if (NeedsWinCFI) {
1635 *HasWinCFI = true;
1636 InsertSEH(*MIB, *TII, FrameFlag);
1637 }
1638
1639 if (EmitCFI)
1640 CFIInstBuilder(MBB, MBBI, FrameFlag)
1641 .buildDefCFAOffset(CFAOffset - CSStackSizeInc);
1642
1643 return std::prev(MBB.erase(MBBI));
1644}
1645
1646// Fixup callee-save register save/restore instructions to take into account
1647// combined SP bump by adding the local stack size to the stack offsets.
1648 static void fixupCalleeSaveRestoreStackOffset(MachineInstr &MI,
1649 uint64_t LocalStackSize,
1650 bool NeedsWinCFI,
1651 bool *HasWinCFI) {
1652 if (AArch64InstrInfo::isSEHInstruction(MI))
1653 return;
1654
1655 unsigned Opc = MI.getOpcode();
1656 unsigned Scale;
1657 switch (Opc) {
1658 case AArch64::STPXi:
1659 case AArch64::STRXui:
1660 case AArch64::STPDi:
1661 case AArch64::STRDui:
1662 case AArch64::LDPXi:
1663 case AArch64::LDRXui:
1664 case AArch64::LDPDi:
1665 case AArch64::LDRDui:
1666 Scale = 8;
1667 break;
1668 case AArch64::STPQi:
1669 case AArch64::STRQui:
1670 case AArch64::LDPQi:
1671 case AArch64::LDRQui:
1672 Scale = 16;
1673 break;
1674 default:
1675 llvm_unreachable("Unexpected callee-save save/restore opcode!");
1676 }
1677
1678 unsigned OffsetIdx = MI.getNumExplicitOperands() - 1;
1679 assert(MI.getOperand(OffsetIdx - 1).getReg() == AArch64::SP &&
1680 "Unexpected base register in callee-save save/restore instruction!");
1681 // Last operand is immediate offset that needs fixing.
1682 MachineOperand &OffsetOpnd = MI.getOperand(OffsetIdx);
1683 // All generated opcodes have scaled offsets.
1684 assert(LocalStackSize % Scale == 0);
1685 OffsetOpnd.setImm(OffsetOpnd.getImm() + LocalStackSize / Scale);
1686
1687 if (NeedsWinCFI) {
1688 *HasWinCFI = true;
1689 auto MBBI = std::next(MachineBasicBlock::iterator(MI));
1690 assert(MBBI != MI.getParent()->end() && "Expecting a valid instruction");
1691 assert(AArch64InstrInfo::isSEHInstruction(*MBBI) &&
1692 "Expecting a SEH instruction");
1693 fixupSEHOpcode(MBBI, LocalStackSize);
1694 }
1695}
1696
1697static bool isTargetWindows(const MachineFunction &MF) {
1698 return MF.getSubtarget<AArch64Subtarget>().isTargetWindows();
1699}
1700
1701static unsigned getStackHazardSize(const MachineFunction &MF) {
1702 return MF.getSubtarget<AArch64Subtarget>().getStreamingHazardSize();
1703}
1704
1705// Convenience function to determine whether I is an SVE callee save.
1706 static bool IsSVECalleeSave(MachineBasicBlock::iterator I) {
1707 switch (I->getOpcode()) {
1708 default:
1709 return false;
1710 case AArch64::PTRUE_C_B:
1711 case AArch64::LD1B_2Z_IMM:
1712 case AArch64::ST1B_2Z_IMM:
1713 case AArch64::STR_ZXI:
1714 case AArch64::STR_PXI:
1715 case AArch64::LDR_ZXI:
1716 case AArch64::LDR_PXI:
1717 case AArch64::PTRUE_B:
1718 case AArch64::CPY_ZPzI_B:
1719 case AArch64::CMPNE_PPzZI_B:
1720 return I->getFlag(MachineInstr::FrameSetup) ||
1721 I->getFlag(MachineInstr::FrameDestroy);
1722 case AArch64::SEH_SavePReg:
1723 case AArch64::SEH_SaveZReg:
1724 return true;
1725 }
1726}
1727
1728 static void emitShadowCallStackPrologue(const TargetInstrInfo &TII,
1729 MachineFunction &MF,
1730 MachineBasicBlock &MBB,
1731 MachineBasicBlock::iterator MBBI,
1732 const DebugLoc &DL, bool NeedsWinCFI,
1733 bool NeedsUnwindInfo) {
1734 // Shadow call stack prolog: str x30, [x18], #8
1735 BuildMI(MBB, MBBI, DL, TII.get(AArch64::STRXpost))
1736 .addReg(AArch64::X18, RegState::Define)
1737 .addReg(AArch64::LR)
1738 .addReg(AArch64::X18)
1739 .addImm(8)
1740 .setMIFlag(MachineInstr::FrameSetup);
1741
1742 // This instruction also makes x18 live-in to the entry block.
1743 MBB.addLiveIn(AArch64::X18);
1744
1745 if (NeedsWinCFI)
1746 BuildMI(MBB, MBBI, DL, TII.get(AArch64::SEH_Nop))
1747 .setMIFlag(MachineInstr::FrameSetup);
1748
1749 if (NeedsUnwindInfo) {
1750 // Emit a CFI instruction that causes 8 to be subtracted from the value of
1751 // x18 when unwinding past this frame.
1752 static const char CFIInst[] = {
1753 dwarf::DW_CFA_val_expression,
1754 18, // register
1755 2, // length
1756 static_cast<char>(unsigned(dwarf::DW_OP_breg18)),
1757 static_cast<char>(-8) & 0x7f, // addend (sleb128)
1758 };
1759 CFIInstBuilder(MBB, MBBI, MachineInstr::FrameSetup)
1760 .buildEscape(StringRef(CFIInst, sizeof(CFIInst)));
1761 }
1762}
1763
1764 static void emitShadowCallStackEpilogue(const TargetInstrInfo &TII,
1765 MachineFunction &MF,
1766 MachineBasicBlock &MBB,
1767 MachineBasicBlock::iterator MBBI,
1768 const DebugLoc &DL, bool NeedsWinCFI) {
1769 // Shadow call stack epilog: ldr x30, [x18, #-8]!
1770 BuildMI(MBB, MBBI, DL, TII.get(AArch64::LDRXpre))
1771 .addReg(AArch64::X18, RegState::Define)
1772 .addReg(AArch64::LR, RegState::Define)
1773 .addReg(AArch64::X18)
1774 .addImm(-8)
1775 .setMIFlag(MachineInstr::FrameDestroy);
1776
1777 if (NeedsWinCFI)
1778 BuildMI(MBB, MBBI, DL, TII.get(AArch64::SEH_Nop))
1779 .setMIFlag(MachineInstr::FrameDestroy);
1780
1781 if (MF.getInfo<AArch64FunctionInfo>()->needsAsyncDwarfUnwindInfo(MF))
1782 CFIInstBuilder(MBB, MBBI, MachineInstr::FrameDestroy)
1783 .buildRestore(AArch64::X18);
1784}
1785
1786// Define the current CFA rule to use the provided FP.
1787 static void emitDefineCFAWithFP(MachineFunction &MF, MachineBasicBlock &MBB,
1788 MachineBasicBlock::iterator MBBI,
1789 unsigned FixedObject) {
1790 const AArch64Subtarget &STI = MF.getSubtarget<AArch64Subtarget>();
1791 const TargetRegisterInfo *TRI = STI.getRegisterInfo();
1792 const AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
1793
1794 const int OffsetToFirstCalleeSaveFromFP =
1795 AFI->getCalleeSaveBaseToFrameRecordOffset() -
1796 AFI->getCalleeSavedStackSize();
1797 Register FramePtr = TRI->getFrameRegister(MF);
1798 CFIInstBuilder(MBB, MBBI, MachineInstr::FrameSetup)
1799 .buildDefCFA(FramePtr, FixedObject - OffsetToFirstCalleeSaveFromFP);
1800}
1801
1802#ifndef NDEBUG
1803/// Collect live registers from the end of \p MI's parent up to (including) \p
1804/// MI in \p LiveRegs.
1805 static void getLivePhysRegsUpTo(const MachineInstr &MI,
1806 const TargetRegisterInfo &TRI, LivePhysRegs &LiveRegs) {
1807
1808 MachineBasicBlock &MBB = *MI.getParent();
1809 LiveRegs.addLiveOuts(MBB);
1810 for (const MachineInstr &MI :
1811 reverse(make_range(MI.getIterator(), MBB.instr_end())))
1812 LiveRegs.stepBackward(MI);
1813}
1814#endif
1815
1816 void AArch64FrameLowering::emitPacRetPlusLeafHardening(
1817 MachineFunction &MF) const {
1818 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1819 const TargetInstrInfo *TII = Subtarget.getInstrInfo();
1820
1821 auto EmitSignRA = [&](MachineBasicBlock &MBB) {
1822 DebugLoc DL; // Set debug location to unknown.
1824
1825 BuildMI(MBB, MBBI, DL, TII->get(AArch64::PAUTH_PROLOGUE))
1827 };
1828
1829 auto EmitAuthRA = [&](MachineBasicBlock &MBB) {
1830 DebugLoc DL;
1831 MachineBasicBlock::iterator MBBI = MBB.getFirstTerminator();
1832 if (MBBI != MBB.end())
1833 DL = MBBI->getDebugLoc();
1834
1835 BuildMI(MBB, MBBI, DL, TII->get(AArch64::PAUTH_EPILOGUE))
1836 .setMIFlag(MachineInstr::FrameDestroy);
1837 };
1838
1839 // This should be in sync with PEIImpl::calculateSaveRestoreBlocks.
1840 EmitSignRA(MF.front());
1841 for (MachineBasicBlock &MBB : MF) {
1842 if (MBB.isEHFuncletEntry())
1843 EmitSignRA(MBB);
1844 if (MBB.isReturnBlock())
1845 EmitAuthRA(MBB);
1846 }
1847}
1848
1849 void AArch64FrameLowering::emitPrologue(MachineFunction &MF,
1850 MachineBasicBlock &MBB) const {
1851 MachineBasicBlock::iterator MBBI = MBB.begin();
1852 const MachineFrameInfo &MFI = MF.getFrameInfo();
1853 const Function &F = MF.getFunction();
1854 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1855 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
1856 const TargetInstrInfo *TII = Subtarget.getInstrInfo();
1857
1858 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
1859 bool EmitCFI = AFI->needsDwarfUnwindInfo(MF);
1860 bool EmitAsyncCFI = AFI->needsAsyncDwarfUnwindInfo(MF);
1861 bool HasFP = hasFP(MF);
1862 bool NeedsWinCFI = needsWinCFI(MF);
1863 bool HasWinCFI = false;
1864 auto Cleanup = make_scope_exit([&]() { MF.setHasWinCFI(HasWinCFI); });
1865
1866 MachineBasicBlock::iterator End = MBB.end();
1867#ifndef NDEBUG
1868 const TargetRegisterInfo *TRI = Subtarget.getRegisterInfo();
1869 // Collect live registers from the end of MBB up to the start of the existing
1870 // frame setup instructions.
1871 MachineBasicBlock::iterator NonFrameStart = MBB.begin();
1872 while (NonFrameStart != End &&
1873 NonFrameStart->getFlag(MachineInstr::FrameSetup))
1874 ++NonFrameStart;
1875
1876 LivePhysRegs LiveRegs(*TRI);
1877 if (NonFrameStart != MBB.end()) {
1878 getLivePhysRegsUpTo(*NonFrameStart, *TRI, LiveRegs);
1879 // Ignore registers used for stack management for now.
1880 LiveRegs.removeReg(AArch64::SP);
1881 LiveRegs.removeReg(AArch64::X19);
1882 LiveRegs.removeReg(AArch64::FP);
1883 LiveRegs.removeReg(AArch64::LR);
1884
1885 // X0 will be clobbered by a call to __arm_get_current_vg in the prologue.
1886 // This is necessary to spill VG if required where SVE is unavailable, but
1887 // X0 is preserved around this call.
1888 if (requiresGetVGCall(MF))
1889 LiveRegs.removeReg(AArch64::X0);
1890 }
1891
1892 auto VerifyClobberOnExit = make_scope_exit([&]() {
1893 if (NonFrameStart == MBB.end())
1894 return;
1895 // Check if any of the newly inserted instructions clobber any of the live registers.
1896 for (MachineInstr &MI :
1897 make_range(MBB.instr_begin(), NonFrameStart->getIterator())) {
1898 for (auto &Op : MI.operands())
1899 if (Op.isReg() && Op.isDef())
1900 assert(!LiveRegs.contains(Op.getReg()) &&
1901 "live register clobbered by inserted prologue instructions");
1902 }
1903 });
1904#endif
1905
1906 bool IsFunclet = MBB.isEHFuncletEntry();
1907
1908 // At this point, we're going to decide whether or not the function uses a
1909 // redzone. In most cases, the function doesn't have a redzone so let's
1910 // assume that's false and set it to true in the case that there's a redzone.
1911 AFI->setHasRedZone(false);
1912
1913 // Debug location must be unknown since the first debug location is used
1914 // to determine the end of the prologue.
1915 DebugLoc DL;
1916
1917 const auto &MFnI = *MF.getInfo<AArch64FunctionInfo>();
1918 if (MFnI.shouldSignReturnAddress(MF)) {
1919 // If pac-ret+leaf is in effect, PAUTH_PROLOGUE pseudo instructions
1920 // are inserted by emitPacRetPlusLeafHardening().
1921 if (!shouldSignReturnAddressEverywhere(MF)) {
1922 BuildMI(MBB, MBBI, DL, TII->get(AArch64::PAUTH_PROLOGUE))
1923 .setMIFlag(MachineInstr::FrameSetup);
1924 }
1925 // AArch64PointerAuth pass will insert SEH_PACSignLR
1926 HasWinCFI |= NeedsWinCFI;
1927 }
1928
1929 if (MFnI.needsShadowCallStackPrologueEpilogue(MF)) {
1930 emitShadowCallStackPrologue(*TII, MF, MBB, MBBI, DL, NeedsWinCFI,
1931 MFnI.needsDwarfUnwindInfo(MF));
1932 HasWinCFI |= NeedsWinCFI;
1933 }
1934
1935 if (EmitCFI && MFnI.isMTETagged()) {
1936 BuildMI(MBB, MBBI, DL, TII->get(AArch64::EMITMTETAGGED))
1937 .setMIFlag(MachineInstr::FrameSetup);
1938 }
1939
1940 // We signal the presence of a Swift extended frame to external tools by
1941 // storing FP with 0b0001 in bits 63:60. In normal userland operation a simple
1942 // ORR is sufficient, it is assumed a Swift kernel would initialize the TBI
1943 // bits so that is still true.
1944 if (HasFP && AFI->hasSwiftAsyncContext()) {
1945 switch (MF.getTarget().Options.SwiftAsyncFramePointer) {
1946 case SwiftAsyncFramePointerMode::DeploymentBased:
1947 if (Subtarget.swiftAsyncContextIsDynamicallySet()) {
1948 // The special symbol below is absolute and has a *value* that can be
1949 // combined with the frame pointer to signal an extended frame.
1950 BuildMI(MBB, MBBI, DL, TII->get(AArch64::LOADgot), AArch64::X16)
1951 .addExternalSymbol("swift_async_extendedFramePointerFlags",
1952 AArch64II::MO_GOT);
1953 if (NeedsWinCFI) {
1954 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
1955 .setMIFlag(MachineInstr::FrameSetup);
1956 HasWinCFI = true;
1957 }
1958 BuildMI(MBB, MBBI, DL, TII->get(AArch64::ORRXrs), AArch64::FP)
1959 .addUse(AArch64::FP)
1960 .addUse(AArch64::X16)
1961 .addImm(Subtarget.isTargetILP32() ? 32 : 0);
1962 if (NeedsWinCFI) {
1963 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
1964 .setMIFlag(MachineInstr::FrameSetup);
1965 HasWinCFI = true;
1966 }
1967 break;
1968 }
1969 [[fallthrough]];
1970
1971 case SwiftAsyncFramePointerMode::Always:
1972 // ORR x29, x29, #0x1000_0000_0000_0000
1973 BuildMI(MBB, MBBI, DL, TII->get(AArch64::ORRXri), AArch64::FP)
1974 .addUse(AArch64::FP)
1975 .addImm(0x1100)
1976 .setMIFlag(MachineInstr::FrameSetup);
1977 if (NeedsWinCFI) {
1978 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
1979 .setMIFlag(MachineInstr::FrameSetup);
1980 HasWinCFI = true;
1981 }
1982 break;
1983
1984 case SwiftAsyncFramePointerMode::Never:
1985 break;
1986 }
1987 }
1988
1989 // All calls are tail calls in GHC calling conv, and functions have no
1990 // prologue/epilogue.
1991 if (MF.getFunction().getCallingConv() == CallingConv::GHC)
1992 return;
1993
1994 // Set tagged base pointer to the requested stack slot.
1995 // Ideally it should match SP value after prologue.
1996 std::optional<int> TBPI = AFI->getTaggedBasePointerIndex();
1997 if (TBPI)
1998 AFI->setTaggedBasePointerOffset(-MFI.getObjectOffset(*TBPI));
1999 else
2000 AFI->setTaggedBasePointerOffset(MFI.getStackSize());
2001
2002 const StackOffset &SVEStackSize = getSVEStackSize(MF);
2003
2004 // getStackSize() includes all the locals in its size calculation. We don't
2005 // include these locals when computing the stack size of a funclet, as they
2006 // are allocated in the parent's stack frame and accessed via the frame
2007 // pointer from the funclet. We only save the callee saved registers in the
2008 // funclet, which are really the callee saved registers of the parent
2009 // function, including the funclet.
2010 int64_t NumBytes =
2011 IsFunclet ? getWinEHFuncletFrameSize(MF) : MFI.getStackSize();
2012 if (!AFI->hasStackFrame() && !windowsRequiresStackProbe(MF, NumBytes)) {
2013 assert(!HasFP && "unexpected function without stack frame but with FP");
2014 assert(!SVEStackSize &&
2015 "unexpected function without stack frame but with SVE objects");
2016 // All of the stack allocation is for locals.
2017 AFI->setLocalStackSize(NumBytes);
2018 if (!NumBytes) {
2019 if (NeedsWinCFI && HasWinCFI) {
2020 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_PrologEnd))
2021 .setMIFlag(MachineInstr::FrameSetup);
2022 }
2023 return;
2024 }
2025 // REDZONE: If the stack size is less than 128 bytes, we don't need
2026 // to actually allocate.
2027 if (canUseRedZone(MF)) {
2028 AFI->setHasRedZone(true);
2029 ++NumRedZoneFunctions;
2030 } else {
2031 emitFrameOffset(MBB, MBBI, DL, AArch64::SP, AArch64::SP,
2032 StackOffset::getFixed(-NumBytes), TII,
2033 MachineInstr::FrameSetup, false, NeedsWinCFI, &HasWinCFI);
2034 if (EmitCFI) {
2035 // Label used to tie together the PROLOG_LABEL and the MachineMoves.
2036 MCSymbol *FrameLabel = MF.getContext().createTempSymbol();
2037 // Encode the stack size of the leaf function.
2038 CFIInstBuilder(MBB, MBBI, MachineInstr::FrameSetup)
2039 .buildDefCFAOffset(NumBytes, FrameLabel);
2040 }
2041 }
2042
2043 if (NeedsWinCFI) {
2044 HasWinCFI = true;
2045 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_PrologEnd))
2046 .setMIFlag(MachineInstr::FrameSetup);
2047 }
2048
2049 return;
2050 }
2051
2052 bool IsWin64 = Subtarget.isCallingConvWin64(F.getCallingConv(), F.isVarArg());
2053 unsigned FixedObject = getFixedObjectSize(MF, AFI, IsWin64, IsFunclet);
2054
2055 // Windows unwind can't represent the required stack adjustments if we have
2056 // both SVE callee-saves and dynamic stack allocations, and the frame
2057 // pointer is before the SVE spills. The allocation of the frame pointer
2058 // must be the last instruction in the prologue so the unwinder can restore
2059 // the stack pointer correctly. (And there isn't any unwind opcode for
2060 // `addvl sp, x29, -17`.)
2061 //
2062 // Because of this, we do spills in the opposite order on Windows: first SVE,
2063 // then GPRs. The main side-effect of this is that it makes accessing
2064 // parameters passed on the stack more expensive.
2065 //
2066 // We could consider rearranging the spills for simpler cases.
2067 bool FPAfterSVECalleeSaves =
2068 Subtarget.isTargetWindows() && AFI->getSVECalleeSavedStackSize();
2069
2070 if (FPAfterSVECalleeSaves && AFI->hasStackHazardSlotIndex())
2071 reportFatalUsageError("SME hazard padding is not supported on Windows");
2072
2073 auto PrologueSaveSize = AFI->getCalleeSavedStackSize() + FixedObject;
2074 // All of the remaining stack allocations are for locals.
2075 AFI->setLocalStackSize(NumBytes - PrologueSaveSize);
2076 bool CombineSPBump = shouldCombineCSRLocalStackBump(MF, NumBytes);
2077 bool HomPrologEpilog = homogeneousPrologEpilog(MF);
2078 if (FPAfterSVECalleeSaves) {
2079 // If we're doing SVE saves first, we need to immediately allocate space
2080 // for fixed objects, then space for the SVE callee saves.
2081 //
2082 // Windows unwind requires that the scalable size is a multiple of 16;
2083 // that's handled when the callee-saved size is computed.
2084 auto SaveSize =
2085 StackOffset::getScalable(AFI->getSVECalleeSavedStackSize()) +
2086 StackOffset::getFixed(FixedObject);
2087 allocateStackSpace(MBB, MBBI, 0, SaveSize, NeedsWinCFI, &HasWinCFI,
2088 /*EmitCFI=*/false, StackOffset{},
2089 /*FollowupAllocs=*/true);
2090 NumBytes -= FixedObject;
2091
2092 // Now allocate space for the GPR callee saves.
2093 while (MBBI != End && IsSVECalleeSave(MBBI))
2094 ++MBBI;
2095 MBBI = convertCalleeSaveRestoreToSPPrePostIncDec(
2096 MBB, MBBI, DL, TII, -AFI->getCalleeSavedStackSize(), NeedsWinCFI,
2097 &HasWinCFI, EmitAsyncCFI);
2098 NumBytes -= AFI->getCalleeSavedStackSize();
2099 } else if (CombineSPBump) {
2100 assert(!SVEStackSize && "Cannot combine SP bump with SVE");
2101 emitFrameOffset(MBB, MBBI, DL, AArch64::SP, AArch64::SP,
2102 StackOffset::getFixed(-NumBytes), TII,
2103 MachineInstr::FrameSetup, false, NeedsWinCFI, &HasWinCFI,
2104 EmitAsyncCFI);
2105 NumBytes = 0;
2106 } else if (HomPrologEpilog) {
2107 // Stack has been already adjusted.
2108 NumBytes -= PrologueSaveSize;
2109 } else if (PrologueSaveSize != 0) {
2110 MBBI = convertCalleeSaveRestoreToSPPrePostIncDec(
2111 MBB, MBBI, DL, TII, -PrologueSaveSize, NeedsWinCFI, &HasWinCFI,
2112 EmitAsyncCFI);
2113 NumBytes -= PrologueSaveSize;
2114 }
2115 assert(NumBytes >= 0 && "Negative stack allocation size!?");
2116
2117 // Move past the saves of the callee-saved registers, fixing up the offsets
2118 // and pre-inc if we decided to combine the callee-save and local stack
2119 // pointer bump above.
2120 auto &TLI = *MF.getSubtarget().getTargetLowering();
2121 while (MBBI != End && MBBI->getFlag(MachineInstr::FrameSetup) &&
2122 !IsSVECalleeSave(MBBI)) {
2123 if (CombineSPBump &&
2124 // Only fix-up frame-setup load/store instructions.
2125 (!requiresSaveVG(MF) || !isVGInstruction(MBBI, TLI)))
2126 fixupCalleeSaveRestoreStackOffset(*MBBI, AFI->getLocalStackSize(),
2127 NeedsWinCFI, &HasWinCFI);
2128 ++MBBI;
2129 }
2130
2131 // For funclets the FP belongs to the containing function.
2132 if (!IsFunclet && HasFP) {
2133 // Only set up FP if we actually need to.
2134 int64_t FPOffset = AFI->getCalleeSaveBaseToFrameRecordOffset();
2135
2136 if (CombineSPBump)
2137 FPOffset += AFI->getLocalStackSize();
2138
2139 if (AFI->hasSwiftAsyncContext()) {
2140 // Before we update the live FP we have to ensure there's a valid (or
2141 // null) asynchronous context in its slot just before FP in the frame
2142 // record, so store it now.
2143 const auto &Attrs = MF.getFunction().getAttributes();
2144 bool HaveInitialContext = Attrs.hasAttrSomewhere(Attribute::SwiftAsync);
2145 if (HaveInitialContext)
2146 MBB.addLiveIn(AArch64::X22);
2147 Register Reg = HaveInitialContext ? AArch64::X22 : AArch64::XZR;
2148 BuildMI(MBB, MBBI, DL, TII->get(AArch64::StoreSwiftAsyncContext))
2149 .addUse(Reg)
2150 .addUse(AArch64::SP)
2151 .addImm(FPOffset - 8)
2152 .setMIFlag(MachineInstr::FrameSetup);
2153 if (NeedsWinCFI) {
2154 // WinCFI and arm64e, where StoreSwiftAsyncContext is expanded
2155 // to multiple instructions, should be mutually-exclusive.
2156 assert(Subtarget.getTargetTriple().getArchName() != "arm64e");
2157 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
2158 .setMIFlag(MachineInstr::FrameSetup);
2159 HasWinCFI = true;
2160 }
2161 }
2162
2163 if (HomPrologEpilog) {
2164 auto Prolog = MBBI;
2165 --Prolog;
2166 assert(Prolog->getOpcode() == AArch64::HOM_Prolog);
2167 Prolog->addOperand(MachineOperand::CreateImm(FPOffset));
2168 } else {
2169 // Issue sub fp, sp, FPOffset or
2170 // mov fp,sp when FPOffset is zero.
2171 // Note: All stores of callee-saved registers are marked as "FrameSetup".
2172 // This code marks the instruction(s) that set the FP also.
2173 emitFrameOffset(MBB, MBBI, DL, AArch64::FP, AArch64::SP,
2174 StackOffset::getFixed(FPOffset), TII,
2175 MachineInstr::FrameSetup, false, NeedsWinCFI, &HasWinCFI);
2176 if (NeedsWinCFI && HasWinCFI) {
2177 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_PrologEnd))
2178 .setMIFlag(MachineInstr::FrameSetup);
2179 // After setting up the FP, the rest of the prolog doesn't need to be
2180 // included in the SEH unwind info.
2181 NeedsWinCFI = false;
2182 }
2183 }
2184 if (EmitAsyncCFI)
2185 emitDefineCFAWithFP(MF, MBB, MBBI, FixedObject);
2186 }
2187
2188 // Now emit the moves for whatever callee saved regs we have (including FP,
2189 // LR if those are saved). Frame instructions for SVE registers are emitted
2190 // later, after the instructions which actually save the SVE regs.
2191 if (EmitAsyncCFI)
2192 emitCalleeSavedGPRLocations(MBB, MBBI);
2193
2194 // Alignment is required for the parent frame, not the funclet
2195 const bool NeedsRealignment =
2196 NumBytes && !IsFunclet && RegInfo->hasStackRealignment(MF);
2197 const int64_t RealignmentPadding =
2198 (NeedsRealignment && MFI.getMaxAlign() > Align(16))
2199 ? MFI.getMaxAlign().value() - 16
2200 : 0;
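The padding computed above over-allocates so the stack pointer can later be aligned down in place (the `AndMask` AND further below). A standalone arithmetic sketch of both steps (illustrative only, not LLVM code):

```cpp
#include <cstdint>

// Over-allocate by (MaxAlign - 16) when the required alignment exceeds the
// 16 bytes AArch64 stacks already guarantee.
constexpr uint64_t realignmentPadding(uint64_t MaxAlign) {
  return MaxAlign > 16 ? MaxAlign - 16 : 0;
}

// Then align SP down with AndMask = ~(MaxAlign - 1), as the prologue's
// ANDXri does.
constexpr uint64_t alignDown(uint64_t SP, uint64_t MaxAlign) {
  return SP & ~(MaxAlign - 1);
}
```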
2201
2202 if (windowsRequiresStackProbe(MF, NumBytes + RealignmentPadding)) {
2203 if (AFI->getSVECalleeSavedStackSize())
2204 report_fatal_error(
2205 "SVE callee saves not yet supported with stack probing");
2206
2207 // Find an available register to spill the value of X15 to, if X15 is being
2208 // used already for nest.
2209 unsigned X15Scratch = AArch64::NoRegister;
2210 const AArch64Subtarget &STI = MF.getSubtarget<AArch64Subtarget>();
2211 if (llvm::any_of(MBB.liveins(),
2212 [&STI](const MachineBasicBlock::RegisterMaskPair &LiveIn) {
2213 return STI.getRegisterInfo()->isSuperOrSubRegisterEq(
2214 AArch64::X15, LiveIn.PhysReg);
2215 })) {
2216 X15Scratch = findScratchNonCalleeSaveRegister(&MBB, true);
2217 assert(X15Scratch != AArch64::NoRegister &&
2218 (X15Scratch < AArch64::X15 || X15Scratch > AArch64::X17));
2219#ifndef NDEBUG
2220 LiveRegs.removeReg(AArch64::X15); // ignore X15 since we restore it
2221#endif
2222 BuildMI(MBB, MBBI, DL, TII->get(AArch64::ORRXrr), X15Scratch)
2223 .addReg(AArch64::XZR)
2224 .addReg(AArch64::X15, RegState::Undef)
2225 .addReg(AArch64::X15, RegState::Implicit)
2226 .setMIFlag(MachineInstr::FrameSetup);
2227 }
2228
2229 uint64_t NumWords = (NumBytes + RealignmentPadding) >> 4;
2230 if (NeedsWinCFI) {
2231 HasWinCFI = true;
2232 // alloc_l can hold at most 256MB, so assume that NumBytes doesn't
2233 // exceed this amount. We need to move at most 2^24 - 1 into x15.
2234 // This is at most two instructions, MOVZ followed by MOVK.
2235 // TODO: Fix to use multiple stack alloc unwind codes for stacks
2236 // exceeding 256MB in size.
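The comment above bounds `NumWords` (the allocation size in 16-byte units) to 24 bits so it can be materialized with one MOVZ and at most one MOVK. A standalone sketch of that split (illustrative only, not LLVM code):

```cpp
#include <cstdint>

// NumWords fits in 24 bits for a <=256MB stack, so it can be built in x15
// with a MOVZ of the low 16 bits plus, when needed, a MOVK of the high
// bits (LSL #16).
struct MovzMovk {
  uint32_t Low;  // MOVZ immediate
  uint32_t High; // MOVK immediate; 0 means the MOVK is not needed
};

constexpr MovzMovk splitNumWords(uint64_t NumWords) {
  return {static_cast<uint32_t>(NumWords & 0xFFFF),
          static_cast<uint32_t>((NumWords & 0xFFFF0000) >> 16)};
}
```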
2237 if (NumBytes >= (1 << 28))
2238 report_fatal_error("Stack size cannot exceed 256MB for stack "
2239 "unwinding purposes");
2240
2241 uint32_t LowNumWords = NumWords & 0xFFFF;
2242 BuildMI(MBB, MBBI, DL, TII->get(AArch64::MOVZXi), AArch64::X15)
2243 .addImm(LowNumWords)
2244 .addImm(AArch64_AM::getShifterImm(AArch64_AM::LSL, 0))
2245 .setMIFlag(MachineInstr::FrameSetup);
2246 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
2247 .setMIFlag(MachineInstr::FrameSetup);
2248 if ((NumWords & 0xFFFF0000) != 0) {
2249 BuildMI(MBB, MBBI, DL, TII->get(AArch64::MOVKXi), AArch64::X15)
2250 .addReg(AArch64::X15)
2251 .addImm((NumWords & 0xFFFF0000) >> 16) // High half
2252 .addImm(AArch64_AM::getShifterImm(AArch64_AM::LSL, 16))
2253 .setMIFlag(MachineInstr::FrameSetup);
2254 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
2255 .setMIFlag(MachineInstr::FrameSetup);
2256 }
2257 } else {
2258 BuildMI(MBB, MBBI, DL, TII->get(AArch64::MOVi64imm), AArch64::X15)
2259 .addImm(NumWords)
2260 .setMIFlag(MachineInstr::FrameSetup);
2261 }
2262
2263 const char *ChkStk = Subtarget.getChkStkName();
2264 switch (MF.getTarget().getCodeModel()) {
2265 case CodeModel::Tiny:
2266 case CodeModel::Small:
2267 case CodeModel::Medium:
2268 case CodeModel::Kernel:
2269 BuildMI(MBB, MBBI, DL, TII->get(AArch64::BL))
2270 .addExternalSymbol(ChkStk)
2271 .addReg(AArch64::X15, RegState::Implicit)
2272 .addReg(AArch64::X16, RegState::Implicit | RegState::Define | RegState::Dead)
2273 .addReg(AArch64::X17, RegState::Implicit | RegState::Define | RegState::Dead)
2274 .addReg(AArch64::NZCV, RegState::Implicit | RegState::Define | RegState::Dead)
2275 .setMIFlag(MachineInstr::FrameSetup);
2276 if (NeedsWinCFI) {
2277 HasWinCFI = true;
2278 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
2279 .setMIFlag(MachineInstr::FrameSetup);
2280 }
2281 break;
2282 case CodeModel::Large:
2283 BuildMI(MBB, MBBI, DL, TII->get(AArch64::MOVaddrEXT))
2284 .addReg(AArch64::X16, RegState::Define)
2285 .addExternalSymbol(ChkStk)
2286 .addExternalSymbol(ChkStk)
2287 .setMIFlag(MachineInstr::FrameSetup);
2288 if (NeedsWinCFI) {
2289 HasWinCFI = true;
2290 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
2291 .setMIFlag(MachineInstr::FrameSetup);
2292 }
2293
2294 BuildMI(MBB, MBBI, DL, TII->get(getBLRCallOpcode(MF)))
2295 .addReg(AArch64::X16, RegState::Kill)
2296 .addReg(AArch64::X15, RegState::Implicit | RegState::Define)
2297 .addReg(AArch64::X16, RegState::Implicit | RegState::Define | RegState::Dead)
2298 .addReg(AArch64::X17, RegState::Implicit | RegState::Define | RegState::Dead)
2299 .addReg(AArch64::NZCV, RegState::Implicit | RegState::Define | RegState::Dead)
2300 .setMIFlag(MachineInstr::FrameSetup);
2301 if (NeedsWinCFI) {
2302 HasWinCFI = true;
2303 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
2304 .setMIFlag(MachineInstr::FrameSetup);
2305 }
2306 break;
2307 }
2308
2309 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SUBXrx64), AArch64::SP)
2310 .addReg(AArch64::SP, RegState::Kill)
2311 .addReg(AArch64::X15, RegState::Kill)
2312 .addImm(AArch64_AM::getArithExtendImm(AArch64_AM::UXTX, 4))
2313 .setMIFlag(MachineInstr::FrameSetup);
2314 if (NeedsWinCFI) {
2315 HasWinCFI = true;
2316 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_StackAlloc))
2317 .addImm(NumBytes)
2318 .setMIFlag(MachineInstr::FrameSetup);
2319 }
2320 NumBytes = 0;
2321
2322 if (RealignmentPadding > 0) {
2323 if (RealignmentPadding >= 4096) {
2324 BuildMI(MBB, MBBI, DL, TII->get(AArch64::MOVi64imm))
2325 .addReg(AArch64::X16, RegState::Define)
2326 .addImm(RealignmentPadding)
2327 .setMIFlag(MachineInstr::FrameSetup);
2328 BuildMI(MBB, MBBI, DL, TII->get(AArch64::ADDXrx64), AArch64::X15)
2329 .addReg(AArch64::SP)
2330 .addReg(AArch64::X16, RegState::Kill)
2331 .addImm(AArch64_AM::getArithExtendImm(AArch64_AM::UXTX, 0))
2332 .setMIFlag(MachineInstr::FrameSetup);
2333 } else {
2334 BuildMI(MBB, MBBI, DL, TII->get(AArch64::ADDXri), AArch64::X15)
2335 .addReg(AArch64::SP)
2336 .addImm(RealignmentPadding)
2337 .addImm(0)
2338 .setMIFlag(MachineInstr::FrameSetup);
2339 }
2340
2341 uint64_t AndMask = ~(MFI.getMaxAlign().value() - 1);
2342 BuildMI(MBB, MBBI, DL, TII->get(AArch64::ANDXri), AArch64::SP)
2343 .addReg(AArch64::X15, RegState::Kill)
2344 .addImm(AArch64_AM::encodeLogicalImmediate(AndMask, 64));
2345 AFI->setStackRealigned(true);
2346
2347 // No need for SEH instructions here; if we're realigning the stack,
2348 // we've set a frame pointer and already finished the SEH prologue.
2349 assert(!NeedsWinCFI);
2350 }
2351 if (X15Scratch != AArch64::NoRegister) {
2352 BuildMI(MBB, MBBI, DL, TII->get(AArch64::ORRXrr), AArch64::X15)
2353 .addReg(AArch64::XZR)
2354 .addReg(X15Scratch, RegState::Undef)
2355 .addReg(X15Scratch, RegState::Implicit)
2356 .setMIFlag(MachineInstr::FrameSetup);
2357 }
2358 }
2359
2360 StackOffset SVECalleeSavesSize = {}, SVELocalsSize = SVEStackSize;
2361 MachineBasicBlock::iterator CalleeSavesEnd = MBBI;
2362
2363 StackOffset CFAOffset =
2364 StackOffset::getFixed((int64_t)MFI.getStackSize() - NumBytes);
2365
2366 // Process the SVE callee-saves to determine what space needs to be
2367 // allocated.
2368 if (int64_t CalleeSavedSize = AFI->getSVECalleeSavedStackSize()) {
2369 LLVM_DEBUG(dbgs() << "SVECalleeSavedStackSize = " << CalleeSavedSize
2370 << "\n");
2371 SVECalleeSavesSize = StackOffset::getScalable(CalleeSavedSize);
2372 SVELocalsSize = SVEStackSize - SVECalleeSavesSize;
2373 // Find callee save instructions in frame.
2374 // Note: With FPAfterSVECalleeSaves the callee saves have already been
2375 // allocated.
2376 if (!FPAfterSVECalleeSaves) {
2377 MachineBasicBlock::iterator CalleeSavesBegin = MBBI;
2378 assert(IsSVECalleeSave(CalleeSavesBegin) && "Unexpected instruction");
2379 while (IsSVECalleeSave(MBBI) && MBBI != MBB.getFirstTerminator())
2380 ++MBBI;
2381 CalleeSavesEnd = MBBI;
2382
2383 StackOffset LocalsSize = SVELocalsSize + StackOffset::getFixed(NumBytes);
2384 // Allocate space for the callee saves (if any).
2385 allocateStackSpace(MBB, CalleeSavesBegin, 0, SVECalleeSavesSize, false,
2386 nullptr, EmitAsyncCFI && !HasFP, CFAOffset,
2387 MFI.hasVarSizedObjects() || LocalsSize);
2388 }
2389 }
2390 CFAOffset += SVECalleeSavesSize;
2391
2392 if (EmitAsyncCFI)
2393 emitCalleeSavedSVELocations(MBB, CalleeSavesEnd);
2394
2395 // Allocate space for the rest of the frame including SVE locals. Align the
2396 // stack as necessary.
2397 assert(!(canUseRedZone(MF) && NeedsRealignment) &&
2398 "Cannot use redzone with stack realignment");
2399 if (!canUseRedZone(MF)) {
2400 // FIXME: in the case of dynamic re-alignment, NumBytes doesn't have
2401 // the correct value here, as NumBytes also includes padding bytes,
2402 // which shouldn't be counted here.
2403 allocateStackSpace(MBB, CalleeSavesEnd, RealignmentPadding,
2404 SVELocalsSize + StackOffset::getFixed(NumBytes),
2405 NeedsWinCFI, &HasWinCFI, EmitAsyncCFI && !HasFP,
2406 CFAOffset, MFI.hasVarSizedObjects());
2407 }
2408
2409 // If we need a base pointer, set it up here. It's whatever the value of the
2410 // stack pointer is at this point. Any variable size objects will be allocated
2411 // after this, so we can still use the base pointer to reference locals.
2412 //
2413 // FIXME: Clarify FrameSetup flags here.
2414 // Note: Use emitFrameOffset() like above for FP if the FrameSetup flag is
2415 // needed.
2416 // For funclets the BP belongs to the containing function.
2417 if (!IsFunclet && RegInfo->hasBasePointer(MF)) {
2418 TII->copyPhysReg(MBB, MBBI, DL, RegInfo->getBaseRegister(), AArch64::SP,
2419 false);
2420 if (NeedsWinCFI) {
2421 HasWinCFI = true;
2422 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
2423 .setMIFlag(MachineInstr::FrameSetup);
2424 }
2425 }
2426
2427 // The very last FrameSetup instruction indicates the end of prologue. Emit a
2428 // SEH opcode indicating the prologue end.
2429 if (NeedsWinCFI && HasWinCFI) {
2430 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_PrologEnd))
2431 .setMIFlag(MachineInstr::FrameSetup);
2432 }
2433
2434 // SEH funclets are passed the frame pointer in X1. If the parent
2435 // function uses the base register, then the base register is used
2436 // directly, and is not retrieved from X1.
2437 if (IsFunclet && F.hasPersonalityFn()) {
2438 EHPersonality Per = classifyEHPersonality(F.getPersonalityFn());
2439 if (isAsynchronousEHPersonality(Per)) {
2440 BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::COPY), AArch64::FP)
2441 .addReg(AArch64::X1)
2442 .setMIFlag(MachineInstr::FrameSetup);
2443 MBB.addLiveIn(AArch64::X1);
2444 }
2445 }
2446
2447 if (EmitCFI && !EmitAsyncCFI) {
2448 if (HasFP) {
2449 emitDefineCFAWithFP(MF, MBB, MBBI, FixedObject);
2450 } else {
2451 StackOffset TotalSize =
2452 SVEStackSize + StackOffset::getFixed((int64_t)MFI.getStackSize());
2453 CFIInstBuilder CFIBuilder(MBB, MBBI, MachineInstr::FrameSetup);
2454 CFIBuilder.insertCFIInst(
2455 createDefCFA(*RegInfo, /*FrameReg=*/AArch64::SP, /*Reg=*/AArch64::SP,
2456 TotalSize, /*LastAdjustmentWasScalable=*/false));
2457 }
2458 emitCalleeSavedGPRLocations(MBB, MBBI);
2459 emitCalleeSavedSVELocations(MBB, MBBI);
2460 }
2461}
2462
2463 static bool isFuncletReturnInstr(const MachineInstr &MI) {
2464 switch (MI.getOpcode()) {
2465 default:
2466 return false;
2467 case AArch64::CATCHRET:
2468 case AArch64::CLEANUPRET:
2469 return true;
2470 }
2471}
2472
2473 void AArch64FrameLowering::emitEpilogue(MachineFunction &MF,
2474 MachineBasicBlock &MBB) const {
2475 MachineBasicBlock::iterator MBBI = MBB.getLastNonDebugInstr();
2476 MachineFrameInfo &MFI = MF.getFrameInfo();
2477 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
2478 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
2479 const TargetInstrInfo *TII = Subtarget.getInstrInfo();
2480 DebugLoc DL;
2481 bool NeedsWinCFI = needsWinCFI(MF);
2482 bool EmitCFI = AFI->needsAsyncDwarfUnwindInfo(MF);
2483 bool HasWinCFI = false;
2484 bool IsFunclet = false;
2485
2486 if (MBB.end() != MBBI) {
2487 DL = MBBI->getDebugLoc();
2488 IsFunclet = isFuncletReturnInstr(*MBBI);
2489 }
2490
2491 MachineBasicBlock::iterator EpilogStartI = MBB.end();
2492
2493 auto FinishingTouches = make_scope_exit([&]() {
2494 if (AFI->needsShadowCallStackPrologueEpilogue(MF)) {
2495 emitShadowCallStackEpilogue(*TII, MF, MBB, MBB.getFirstTerminator(), DL,
2496 NeedsWinCFI);
2497 HasWinCFI |= NeedsWinCFI;
2498 }
2499 if (EmitCFI)
2500 emitCalleeSavedGPRRestores(MBB, MBB.getFirstTerminator());
2501 if (AFI->shouldSignReturnAddress(MF)) {
2502 // If pac-ret+leaf is in effect, PAUTH_EPILOGUE pseudo instructions
2503 // are inserted by emitPacRetPlusLeafHardening().
2504 if (!shouldSignReturnAddressEverywhere(MF)) {
2505 BuildMI(MBB, MBB.getFirstTerminator(), DL,
2506 TII->get(AArch64::PAUTH_EPILOGUE))
2507 .setMIFlag(MachineInstr::FrameDestroy);
2508 }
2509 // AArch64PointerAuth pass will insert SEH_PACSignLR
2510 HasWinCFI |= NeedsWinCFI;
2511 }
2512 if (HasWinCFI) {
2513 BuildMI(MBB, MBB.getFirstTerminator(), DL,
2514 TII->get(AArch64::SEH_EpilogEnd))
2515 .setMIFlag(MachineInstr::FrameDestroy);
2516 if (!MF.hasWinCFI())
2517 MF.setHasWinCFI(true);
2518 }
2519 if (NeedsWinCFI) {
2520 assert(EpilogStartI != MBB.end());
2521 if (!HasWinCFI)
2522 MBB.erase(EpilogStartI);
2523 }
2524 });
2525
2526 int64_t NumBytes = IsFunclet ? getWinEHFuncletFrameSize(MF)
2527 : MFI.getStackSize();
2528
2529 // All calls are tail calls in GHC calling conv, and functions have no
2530 // prologue/epilogue.
2531 if (MF.getFunction().getCallingConv() == CallingConv::GHC)
2532 return;
2533
2534 // How much of the stack used by incoming arguments this function is expected
2535 // to restore in this particular epilogue.
2536 int64_t ArgumentStackToRestore = getArgumentStackToRestore(MF, MBB);
2537 bool IsWin64 = Subtarget.isCallingConvWin64(MF.getFunction().getCallingConv(),
2538 MF.getFunction().isVarArg());
2539 unsigned FixedObject = getFixedObjectSize(MF, AFI, IsWin64, IsFunclet);
2540
2541 int64_t AfterCSRPopSize = ArgumentStackToRestore;
2542 auto PrologueSaveSize = AFI->getCalleeSavedStackSize() + FixedObject;
2543 // We cannot rely on the local stack size set in emitPrologue if the function
2544 // has funclets, as funclets have different local stack size requirements, and
2545 // the current value set in emitPrologue may be that of the containing
2546 // function.
2547 if (MF.hasEHFunclets())
2548 AFI->setLocalStackSize(NumBytes - PrologueSaveSize);
2549 if (homogeneousPrologEpilog(MF, &MBB)) {
2550 assert(!NeedsWinCFI);
2551 auto LastPopI = MBB.getFirstTerminator();
2552 if (LastPopI != MBB.begin()) {
2553 auto HomogeneousEpilog = std::prev(LastPopI);
2554 if (HomogeneousEpilog->getOpcode() == AArch64::HOM_Epilog)
2555 LastPopI = HomogeneousEpilog;
2556 }
2557
2558 // Adjust local stack
2559 emitFrameOffset(MBB, LastPopI, DL, AArch64::SP, AArch64::SP,
2560 StackOffset::getFixed(AFI->getLocalStackSize()), TII,
2561 MachineInstr::FrameDestroy, false, NeedsWinCFI, &HasWinCFI);
2562
2563 // SP has been already adjusted while restoring callee save regs.
2564 // We've bailed out of the case that adjusts SP for arguments.
2565 assert(AfterCSRPopSize == 0);
2566 return;
2567 }
2568
2569 bool FPAfterSVECalleeSaves =
2570 Subtarget.isTargetWindows() && AFI->getSVECalleeSavedStackSize();
2571
2572 bool CombineSPBump = shouldCombineCSRLocalStackBumpInEpilogue(MBB, NumBytes);
2573 // Assume we can't combine the last pop with the sp restore.
2574 bool CombineAfterCSRBump = false;
2575 if (FPAfterSVECalleeSaves) {
2576 AfterCSRPopSize += FixedObject;
2577 } else if (!CombineSPBump && PrologueSaveSize != 0) {
2578 MachineBasicBlock::iterator Pop = std::prev(MBB.getFirstTerminator());
2579 while (Pop->getOpcode() == TargetOpcode::CFI_INSTRUCTION ||
2580 AArch64InstrInfo::isSEHInstruction(*Pop))
2581 Pop = std::prev(Pop);
2582 // Converting the last ldp to a post-index ldp is valid only if the last
2583 // ldp's offset is 0.
2584 const MachineOperand &OffsetOp = Pop->getOperand(Pop->getNumOperands() - 1);
2585 // If the offset is 0 and the AfterCSR pop is not actually trying to
2586 // allocate more stack for arguments (in space that an untimely interrupt
2587 // may clobber), convert it to a post-index ldp.
2588 if (OffsetOp.getImm() == 0 && AfterCSRPopSize >= 0) {
2589 convertCalleeSaveRestoreToSPPrePostIncDec(
2590 MBB, Pop, DL, TII, PrologueSaveSize, NeedsWinCFI, &HasWinCFI, EmitCFI,
2591 MachineInstr::FrameDestroy, PrologueSaveSize);
2592 } else {
2593 // If not, make sure to emit an add after the last ldp.
2594 // We're doing this by transferring the size to be restored from the
2595 // adjustment *before* the CSR pops to the adjustment *after* the CSR
2596 // pops.
2597 AfterCSRPopSize += PrologueSaveSize;
2598 CombineAfterCSRBump = true;
2599 }
2600 }
2601
2602 // Move past the restores of the callee-saved registers.
2603 // If we plan on combining the sp bump of the local stack size and the callee
2604 // save stack size, we might need to adjust the CSR save and restore offsets.
2605 MachineBasicBlock::iterator LastPopI = MBB.getFirstTerminator();
2606 MachineBasicBlock::iterator Begin = MBB.begin();
2607 while (LastPopI != Begin) {
2608 --LastPopI;
2609 if (!LastPopI->getFlag(MachineInstr::FrameDestroy) ||
2610 (!FPAfterSVECalleeSaves && IsSVECalleeSave(LastPopI))) {
2611 ++LastPopI;
2612 break;
2613 } else if (CombineSPBump)
2614 fixupCalleeSaveRestoreStackOffset(*LastPopI, AFI->getLocalStackSize(),
2615 NeedsWinCFI, &HasWinCFI);
2616 }
2617
2618 if (NeedsWinCFI) {
2619 // Note that there are cases where we insert SEH opcodes in the
2620 // epilogue when we had no SEH opcodes in the prologue. For
2621 // example, when there is no stack frame but there are stack
2622 arguments. Insert the SEH_EpilogStart and remove it later if
2623 // we didn't emit any SEH opcodes to avoid generating WinCFI for
2624 // functions that don't need it.
2625 BuildMI(MBB, LastPopI, DL, TII->get(AArch64::SEH_EpilogStart))
2626 .setMIFlag(MachineInstr::FrameDestroy);
2627 EpilogStartI = LastPopI;
2628 --EpilogStartI;
2629 }
2630
2631 if (hasFP(MF) && AFI->hasSwiftAsyncContext()) {
2632 switch (MF.getTarget().Options.SwiftAsyncFramePointer) {
2633 case SwiftAsyncFramePointerMode::DeploymentBased:
2634 // Avoid the reload as it is GOT relative, and instead fall back to the
2635 // hardcoded value below. This allows a mismatch between the OS and
2636 // application without immediately terminating on the difference.
2637 [[fallthrough]];
2638 case SwiftAsyncFramePointerMode::Always:
2639 // We need to reset FP to its untagged state on return. Bit 60 is
2640 // currently used to show the presence of an extended frame.
2641
2642 // BIC x29, x29, #0x1000_0000_0000_0000
2643 BuildMI(MBB, MBB.getFirstTerminator(), DL, TII->get(AArch64::ANDXri),
2644 AArch64::FP)
2645 .addUse(AArch64::FP)
2646 .addImm(0x10fe)
2647 .setMIFlag(MachineInstr::FrameDestroy);
2648 if (NeedsWinCFI) {
2649 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
2650 .setMIFlag(MachineInstr::FrameDestroy);
2651 HasWinCFI = true;
2652 }
2653 break;
2654
2655 case SwiftAsyncFramePointerMode::Never:
2656 break;
2657 }
2658 }
2659
2660 const StackOffset &SVEStackSize = getSVEStackSize(MF);
2661
2662 // If there is a single SP update, insert it before the ret and we're done.
2663 if (CombineSPBump) {
2664 assert(!SVEStackSize && "Cannot combine SP bump with SVE");
2665
2666 // When we are about to restore the CSRs, the CFA register is SP again.
2667 if (EmitCFI && hasFP(MF))
2668 CFIInstBuilder(MBB, MBB.getFirstTerminator(), MachineInstr::FrameDestroy)
2669 .buildDefCFA(AArch64::SP, NumBytes);
2670
2671 emitFrameOffset(MBB, MBB.getFirstTerminator(), DL, AArch64::SP, AArch64::SP,
2672 StackOffset::getFixed(NumBytes + AfterCSRPopSize), TII,
2673 MachineInstr::FrameDestroy, false, NeedsWinCFI, &HasWinCFI,
2674 EmitCFI, StackOffset::getFixed(NumBytes));
2675 return;
2676 }
2677
2678 NumBytes -= PrologueSaveSize;
2679 assert(NumBytes >= 0 && "Negative stack allocation size!?");
2680
2681 // Process the SVE callee-saves to determine what space needs to be
2682 // deallocated.
2683 StackOffset DeallocateBefore = {}, DeallocateAfter = SVEStackSize;
2684 MachineBasicBlock::iterator RestoreBegin = LastPopI, RestoreEnd = LastPopI;
2685 if (int64_t CalleeSavedSize = AFI->getSVECalleeSavedStackSize()) {
2686 if (FPAfterSVECalleeSaves)
2687 RestoreEnd = MBB.getFirstTerminator();
2688
2689 RestoreBegin = std::prev(RestoreEnd);
2690 while (RestoreBegin != MBB.begin() &&
2691 IsSVECalleeSave(std::prev(RestoreBegin)))
2692 --RestoreBegin;
2693
2694 assert(IsSVECalleeSave(RestoreBegin) &&
2695 IsSVECalleeSave(std::prev(RestoreEnd)) && "Unexpected instruction");
2696
2697 StackOffset CalleeSavedSizeAsOffset =
2698 StackOffset::getScalable(CalleeSavedSize);
2699 DeallocateBefore = SVEStackSize - CalleeSavedSizeAsOffset;
2700 DeallocateAfter = CalleeSavedSizeAsOffset;
2701 }
2702
2703 // Deallocate the SVE area.
2704 if (FPAfterSVECalleeSaves) {
2705 // If the callee-save area is before FP, restoring the FP implicitly
2706 // deallocates non-callee-save SVE allocations. Otherwise, deallocate
2707 // them explicitly.
2708 if (!AFI->isStackRealigned() && !MFI.hasVarSizedObjects()) {
2709 emitFrameOffset(MBB, LastPopI, DL, AArch64::SP, AArch64::SP,
2710 DeallocateBefore, TII, MachineInstr::FrameDestroy, false,
2711 NeedsWinCFI, &HasWinCFI);
2712 }
2713
2714 // Deallocate callee-save non-SVE registers.
2715 emitFrameOffset(MBB, RestoreBegin, DL, AArch64::SP, AArch64::SP,
2716 StackOffset::getFixed(AFI->getCalleeSavedStackSize()), TII,
2717 MachineInstr::FrameDestroy, false, NeedsWinCFI, &HasWinCFI);
2718
2719 // Deallocate fixed objects.
2720 emitFrameOffset(MBB, RestoreEnd, DL, AArch64::SP, AArch64::SP,
2721 StackOffset::getFixed(FixedObject), TII,
2722 MachineInstr::FrameDestroy, false, NeedsWinCFI, &HasWinCFI);
2723
2724 // Deallocate callee-save SVE registers.
2725 emitFrameOffset(MBB, RestoreEnd, DL, AArch64::SP, AArch64::SP,
2726 DeallocateAfter, TII, MachineInstr::FrameDestroy, false,
2727 NeedsWinCFI, &HasWinCFI);
2728 } else if (SVEStackSize) {
2729 int64_t SVECalleeSavedSize = AFI->getSVECalleeSavedStackSize();
2730 // If we have stack realignment or variable-sized objects we must use the
2731 // FP to restore SVE callee saves (as there is an unknown amount of
2732 // data/padding between the SP and SVE CS area).
2733 Register BaseForSVEDealloc =
2734 (AFI->isStackRealigned() || MFI.hasVarSizedObjects()) ? AArch64::FP
2735 : AArch64::SP;
2736 if (SVECalleeSavedSize && BaseForSVEDealloc == AArch64::FP) {
2737 Register CalleeSaveBase = AArch64::FP;
2738 if (int64_t CalleeSaveBaseOffset =
2739 AFI->getCalleeSaveBaseToFrameRecordOffset()) {
2740 // If we have a non-zero offset to the non-SVE CS base we need to
2741 // compute the base address by subtracting the offset in a temporary
2742 // register first (to avoid briefly deallocating the SVE CS).
2743 CalleeSaveBase = MBB.getParent()->getRegInfo().createVirtualRegister(
2744 &AArch64::GPR64RegClass);
2745 emitFrameOffset(MBB, RestoreBegin, DL, CalleeSaveBase, AArch64::FP,
2746 StackOffset::getFixed(-CalleeSaveBaseOffset), TII,
2747 MachineInstr::FrameDestroy);
2748 }
2749 // The code below will deallocate the stack space by moving the
2750 // SP to the start of the SVE callee-save area.
2751 emitFrameOffset(MBB, RestoreBegin, DL, AArch64::SP, CalleeSaveBase,
2752 StackOffset::getScalable(-SVECalleeSavedSize), TII,
2753 MachineInstr::FrameDestroy);
2754 } else if (BaseForSVEDealloc == AArch64::SP) {
2755 if (SVECalleeSavedSize) {
2756 // Deallocate the non-SVE locals first before we can deallocate (and
2757 // restore callee saves) from the SVE area.
2758 emitFrameOffset(
2759 MBB, RestoreBegin, DL, AArch64::SP, AArch64::SP,
2760 StackOffset::getFixed(NumBytes), TII, MachineInstr::FrameDestroy,
2761 false, NeedsWinCFI, &HasWinCFI, EmitCFI && !hasFP(MF),
2762 SVEStackSize + StackOffset::getFixed(NumBytes + PrologueSaveSize));
2763 NumBytes = 0;
2764 }
2765
2766 emitFrameOffset(MBB, RestoreBegin, DL, AArch64::SP, AArch64::SP,
2767 DeallocateBefore, TII, MachineInstr::FrameDestroy, false,
2768 NeedsWinCFI, &HasWinCFI, EmitCFI && !hasFP(MF),
2769 SVEStackSize +
2770 StackOffset::getFixed(NumBytes + PrologueSaveSize));
2771
2772 emitFrameOffset(MBB, RestoreEnd, DL, AArch64::SP, AArch64::SP,
2773 DeallocateAfter, TII, MachineInstr::FrameDestroy, false,
2774 NeedsWinCFI, &HasWinCFI, EmitCFI && !hasFP(MF),
2775 DeallocateAfter +
2776 StackOffset::getFixed(NumBytes + PrologueSaveSize));
2777 }
2778 if (EmitCFI)
2779 emitCalleeSavedSVERestores(MBB, RestoreEnd);
2780 }
2781
2782 if (!hasFP(MF)) {
2783 bool RedZone = canUseRedZone(MF);
2784 // If this was a redzone leaf function, we don't need to restore the
2785 // stack pointer (but we may need to pop stack args for fastcc).
2786 if (RedZone && AfterCSRPopSize == 0)
2787 return;
2788
2789 // Pop the local variables off the stack. If there are no callee-saved
2790 // registers, it means we are actually positioned at the terminator and can
2791 // combine stack increment for the locals and the stack increment for
2792 // callee-popped arguments into (possibly) a single instruction and be done.
2793 bool NoCalleeSaveRestore = PrologueSaveSize == 0;
2794 int64_t StackRestoreBytes = RedZone ? 0 : NumBytes;
2795 if (NoCalleeSaveRestore)
2796 StackRestoreBytes += AfterCSRPopSize;
2797
2798 emitFrameOffset(
2799 MBB, LastPopI, DL, AArch64::SP, AArch64::SP,
2800 StackOffset::getFixed(StackRestoreBytes), TII,
2801 MachineInstr::FrameDestroy, false, NeedsWinCFI, &HasWinCFI, EmitCFI,
2802 StackOffset::getFixed((RedZone ? 0 : NumBytes) + PrologueSaveSize));
2803
2804 // If we were able to combine the local stack pop with the argument pop,
2805 // then we're done.
2806 if (NoCalleeSaveRestore || AfterCSRPopSize == 0) {
2807 return;
2808 }
2809
2810 NumBytes = 0;
2811 }
2812
2813 // Restore the original stack pointer.
2814 // FIXME: Rather than doing the math here, we should instead just use
2815 // non-post-indexed loads for the restores if we aren't actually going to
2816 // be able to save any instructions.
2817 if (!IsFunclet && (MFI.hasVarSizedObjects() || AFI->isStackRealigned())) {
2818 emitFrameOffset(
2819 MBB, LastPopI, DL, AArch64::SP, AArch64::FP,
2820 StackOffset::getFixed(-AFI->getCalleeSaveBaseToFrameRecordOffset()),
2821 TII, MachineInstr::FrameDestroy, false, NeedsWinCFI, &HasWinCFI);
2822 } else if (NumBytes)
2823 emitFrameOffset(MBB, LastPopI, DL, AArch64::SP, AArch64::SP,
2824 StackOffset::getFixed(NumBytes), TII,
2825 MachineInstr::FrameDestroy, false, NeedsWinCFI, &HasWinCFI);
2826
2827 // When we are about to restore the CSRs, the CFA register is SP again.
2828 if (EmitCFI && hasFP(MF))
2829 CFIInstBuilder(MBB, LastPopI, MachineInstr::FrameDestroy)
2830 .buildDefCFA(AArch64::SP, PrologueSaveSize);
2831
  // This must be placed after the callee-save restore code because that code
  // assumes the SP is at the same location as it was after the callee-save save
  // code in the prologue.
  if (AfterCSRPopSize) {
    assert(AfterCSRPopSize > 0 && "attempting to reallocate arg stack that an "
                                  "interrupt may have clobbered");

    emitFrameOffset(
        MBB, MBB.getFirstTerminator(), DL, AArch64::SP, AArch64::SP,
        StackOffset::getFixed(AfterCSRPopSize), TII, MachineInstr::FrameDestroy,
        false, NeedsWinCFI, &HasWinCFI, EmitCFI,
        StackOffset::getFixed(CombineAfterCSRBump ? PrologueSaveSize : 0));
  }
}
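
// Illustrative sketch (not part of the original source; values are
// hypothetical): for a frame with 64 bytes of locals (NumBytes = 64), a
// 16-byte callee-save area (PrologueSaveSize = 16), and 32 bytes of
// callee-popped arguments (AfterCSRPopSize = 32), the epilogue sequence built
// above works out to roughly:
//   add  sp, sp, #64           // pop the locals
//   ldp  x29, x30, [sp], #16   // callee-save restore (emitted elsewhere)
//   add  sp, sp, #32           // reallocate the callee-popped argument area
// With no callee saves (PrologueSaveSize = 0), the first and last SP
// adjustments combine into a single "add sp, sp, #96".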

bool AArch64FrameLowering::enableCFIFixup(const MachineFunction &MF) const {
  return TargetFrameLowering::enableCFIFixup(MF) &&
         MF.getInfo<AArch64FunctionInfo>()->needsDwarfUnwindInfo(MF);
}

bool AArch64FrameLowering::enableFullCFIFixup(const MachineFunction &MF) const {
  return enableCFIFixup(MF) &&
         MF.getInfo<AArch64FunctionInfo>()->needsAsyncDwarfUnwindInfo(MF);
}

/// getFrameIndexReference - Provide a base+offset reference to an FI slot for
/// debug info. It's the same as what we use for resolving the code-gen
/// references for now. FIXME: This can go wrong when references are
/// SP-relative and simple call frames aren't used.
StackOffset
AArch64FrameLowering::getFrameIndexReference(const MachineFunction &MF, int FI,
                                             Register &FrameReg) const {
  return resolveFrameIndexReference(
      MF, FI, FrameReg,
      /*PreferFP=*/
      MF.getFunction().hasFnAttribute(Attribute::SanitizeHWAddress) ||
          MF.getFunction().hasFnAttribute(Attribute::SanitizeMemTag),
      /*ForSimm=*/false);
}

StackOffset
AArch64FrameLowering::getFrameIndexReferenceFromSP(const MachineFunction &MF,
                                                   int FI) const {
  // This function serves to provide a comparable offset from a single reference
  // point (the value of SP at function entry) that can be used for analysis,
  // e.g. the stack-frame-layout analysis pass. It is not guaranteed to be
  // correct for all objects in the presence of VLA-area objects or dynamic
  // stack re-alignment.

  const auto &MFI = MF.getFrameInfo();

  int64_t ObjectOffset = MFI.getObjectOffset(FI);
  StackOffset SVEStackSize = getSVEStackSize(MF);

  // For VLA-area objects, just emit an offset at the end of the stack frame.
  // Whilst not quite correct, these objects do live at the end of the frame and
  // so it is more useful for analysis for the offset to reflect this.
  if (MFI.isVariableSizedObjectIndex(FI)) {
    return StackOffset::getFixed(-((int64_t)MFI.getStackSize())) - SVEStackSize;
  }

  // This is correct in the absence of any SVE stack objects.
  if (!SVEStackSize)
    return StackOffset::getFixed(ObjectOffset - getOffsetOfLocalArea());

  const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
  bool FPAfterSVECalleeSaves =
      isTargetWindows(MF) && AFI->getSVECalleeSavedStackSize();
  if (MFI.getStackID(FI) == TargetStackID::ScalableVector) {
    if (FPAfterSVECalleeSaves &&
        -ObjectOffset <= (int64_t)AFI->getSVECalleeSavedStackSize())
      return StackOffset::getScalable(ObjectOffset);
    return StackOffset::get(-((int64_t)AFI->getCalleeSavedStackSize()),
                            ObjectOffset);
  }

  bool IsFixed = MFI.isFixedObjectIndex(FI);
  bool IsCSR =
      !IsFixed && ObjectOffset >= -((int)AFI->getCalleeSavedStackSize(MFI));

  StackOffset ScalableOffset = {};
  if (!IsFixed && !IsCSR) {
    ScalableOffset = -SVEStackSize;
  } else if (FPAfterSVECalleeSaves && IsCSR) {
    ScalableOffset =
        -StackOffset::getScalable(AFI->getSVECalleeSavedStackSize());
  }

  return StackOffset::getFixed(ObjectOffset) + ScalableOffset;
}

StackOffset
AArch64FrameLowering::getNonLocalFrameIndexReference(const MachineFunction &MF,
                                                     int FI) const {
  return StackOffset::getFixed(getSEHFrameIndexOffset(MF, FI));
}

static StackOffset getFPOffset(const MachineFunction &MF,
                               int64_t ObjectOffset) {
  const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
  const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
  const Function &F = MF.getFunction();
  bool IsWin64 = Subtarget.isCallingConvWin64(F.getCallingConv(), F.isVarArg());
  unsigned FixedObject =
      getFixedObjectSize(MF, AFI, IsWin64, /*IsFunclet=*/false);
  int64_t CalleeSaveSize = AFI->getCalleeSavedStackSize(MF.getFrameInfo());
  int64_t FPAdjust =
      CalleeSaveSize - AFI->getCalleeSaveBaseToFrameRecordOffset();
  return StackOffset::getFixed(ObjectOffset + FixedObject + FPAdjust);
}

static StackOffset getStackOffset(const MachineFunction &MF,
                                  int64_t ObjectOffset) {
  const auto &MFI = MF.getFrameInfo();
  return StackOffset::getFixed(ObjectOffset + (int64_t)MFI.getStackSize());
}

// TODO: This function currently does not work for scalable vectors.
int AArch64FrameLowering::getSEHFrameIndexOffset(const MachineFunction &MF,
                                                 int FI) const {
  const auto *RegInfo = static_cast<const AArch64RegisterInfo *>(
      MF.getSubtarget().getRegisterInfo());
  int ObjectOffset = MF.getFrameInfo().getObjectOffset(FI);
  return RegInfo->getLocalAddressRegister(MF) == AArch64::FP
             ? getFPOffset(MF, ObjectOffset).getFixed()
             : getStackOffset(MF, ObjectOffset).getFixed();
}

StackOffset AArch64FrameLowering::resolveFrameIndexReference(
    const MachineFunction &MF, int FI, Register &FrameReg, bool PreferFP,
    bool ForSimm) const {
  const auto &MFI = MF.getFrameInfo();
  int64_t ObjectOffset = MFI.getObjectOffset(FI);
  bool isFixed = MFI.isFixedObjectIndex(FI);
  bool isSVE = MFI.getStackID(FI) == TargetStackID::ScalableVector;
  return resolveFrameOffsetReference(MF, ObjectOffset, isFixed, isSVE, FrameReg,
                                     PreferFP, ForSimm);
}

StackOffset AArch64FrameLowering::resolveFrameOffsetReference(
    const MachineFunction &MF, int64_t ObjectOffset, bool isFixed, bool isSVE,
    Register &FrameReg, bool PreferFP, bool ForSimm) const {
  const auto &MFI = MF.getFrameInfo();
  const auto *RegInfo = static_cast<const AArch64RegisterInfo *>(
      MF.getSubtarget().getRegisterInfo());
  const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
  const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();

  int64_t FPOffset = getFPOffset(MF, ObjectOffset).getFixed();
  int64_t Offset = getStackOffset(MF, ObjectOffset).getFixed();
  bool isCSR =
      !isFixed && ObjectOffset >= -((int)AFI->getCalleeSavedStackSize(MFI));

  const StackOffset &SVEStackSize = getSVEStackSize(MF);

  // Use frame pointer to reference fixed objects. Use it for locals if
  // there are VLAs or a dynamically realigned SP (and thus the SP isn't
  // reliable as a base). Make sure useFPForScavengingIndex() does the
  // right thing for the emergency spill slot.
  bool UseFP = false;
  if (AFI->hasStackFrame() && !isSVE) {
    // We shouldn't prefer using the FP to access fixed-sized stack objects when
    // there are scalable (SVE) objects in between the FP and the fixed-sized
    // objects.
    PreferFP &= !SVEStackSize;

    // Note: Keeping the following as multiple 'if' statements rather than
    // merging to a single expression for readability.
    //
    // Argument access should always use the FP.
    if (isFixed) {
      UseFP = hasFP(MF);
    } else if (isCSR && RegInfo->hasStackRealignment(MF)) {
      // References to the CSR area must use FP if we're re-aligning the stack
      // since the dynamically-sized alignment padding is between the SP/BP and
      // the CSR area.
      assert(hasFP(MF) && "Re-aligned stack must have frame pointer");
      UseFP = true;
    } else if (hasFP(MF) && !RegInfo->hasStackRealignment(MF)) {
      // If the FPOffset is negative and we're producing a signed immediate, we
      // have to keep in mind that the available offset range for negative
      // offsets is smaller than for positive ones. If an offset is available
      // via the FP and the SP, use whichever is closest.
      bool FPOffsetFits = !ForSimm || FPOffset >= -256;
      PreferFP |= Offset > -FPOffset && !SVEStackSize;

      if (FPOffset >= 0) {
        // If the FPOffset is positive, that'll always be best, as the SP/BP
        // will be even further away.
        UseFP = true;
      } else if (MFI.hasVarSizedObjects()) {
        // If we have variable sized objects, we can use either FP or BP, as the
        // SP offset is unknown. We can use the base pointer if we have one and
        // FP is not preferred. If not, we're stuck with using FP.
        bool CanUseBP = RegInfo->hasBasePointer(MF);
        if (FPOffsetFits && CanUseBP) // Both are ok. Pick the best.
          UseFP = PreferFP;
        else if (!CanUseBP) // Can't use BP. Forced to use FP.
          UseFP = true;
        // else we can use BP and FP, but the offset from FP won't fit.
        // That will make us scavenge registers which we can probably avoid by
        // using BP. If it won't fit for BP either, we'll scavenge anyway.
      } else if (MF.hasEHFunclets() && !RegInfo->hasBasePointer(MF)) {
        // Funclets access the locals contained in the parent's stack frame
        // via the frame pointer, so we have to use the FP in the parent
        // function.
        (void) Subtarget;
        assert(Subtarget.isCallingConvWin64(MF.getFunction().getCallingConv(),
                                            MF.getFunction().isVarArg()) &&
               "Funclets should only be present on Win64");
        UseFP = true;
      } else {
        // We have the choice between FP and (SP or BP).
        if (FPOffsetFits && PreferFP) // If FP is the best fit, use it.
          UseFP = true;
      }
    }
  }

  assert(
      ((isFixed || isCSR) || !RegInfo->hasStackRealignment(MF) || !UseFP) &&
      "In the presence of dynamic stack pointer realignment, "
      "non-argument/CSR objects cannot be accessed through the frame pointer");

  bool FPAfterSVECalleeSaves =
      isTargetWindows(MF) && AFI->getSVECalleeSavedStackSize();

  if (isSVE) {
    StackOffset FPOffset =
        StackOffset::get(-AFI->getCalleeSaveBaseToFrameRecordOffset(),
                         ObjectOffset);
    StackOffset SPOffset =
        SVEStackSize +
        StackOffset::get(MFI.getStackSize() - AFI->getCalleeSavedStackSize(),
                         ObjectOffset);
    if (FPAfterSVECalleeSaves) {
      FPOffset += StackOffset::getScalable(AFI->getSVECalleeSavedStackSize());
      if (-ObjectOffset <= (int64_t)AFI->getSVECalleeSavedStackSize()) {
      }
    }
    // Always use the FP for SVE spills if available and beneficial.
    if (hasFP(MF) && (SPOffset.getFixed() ||
                      FPOffset.getScalable() < SPOffset.getScalable() ||
                      RegInfo->hasStackRealignment(MF))) {
      FrameReg = RegInfo->getFrameRegister(MF);
      return FPOffset;
    }

    FrameReg = RegInfo->hasBasePointer(MF) ? RegInfo->getBaseRegister()
                                           : (unsigned)AArch64::SP;
    return SPOffset;
  }

  StackOffset ScalableOffset = {};
  if (FPAfterSVECalleeSaves) {
    // In this stack layout, the FP is in between the callee saves and other
    // SVE allocations.
    StackOffset SVECalleeSavedStack =
        StackOffset::getScalable(AFI->getSVECalleeSavedStackSize());
    if (UseFP) {
      if (isFixed)
        ScalableOffset = SVECalleeSavedStack;
      else if (!isCSR)
        ScalableOffset = SVECalleeSavedStack - SVEStackSize;
    } else {
      if (isFixed)
        ScalableOffset = SVEStackSize;
      else if (isCSR)
        ScalableOffset = SVEStackSize - SVECalleeSavedStack;
    }
  } else {
    if (UseFP && !(isFixed || isCSR))
      ScalableOffset = -SVEStackSize;
    if (!UseFP && (isFixed || isCSR))
      ScalableOffset = SVEStackSize;
  }

  if (UseFP) {
    FrameReg = RegInfo->getFrameRegister(MF);
    return StackOffset::getFixed(FPOffset) + ScalableOffset;
  }

  // Use the base pointer if we have one.
  if (RegInfo->hasBasePointer(MF))
    FrameReg = RegInfo->getBaseRegister();
  else {
    assert(!MFI.hasVarSizedObjects() &&
           "Can't use SP when we have var sized objects.");
    FrameReg = AArch64::SP;
    // If we're using the red zone for this function, the SP won't actually
    // be adjusted, so the offsets will be negative. They're also all
    // within range of the signed 9-bit immediate instructions.
    if (canUseRedZone(MF))
      Offset -= AFI->getLocalStackSize();
  }

  return StackOffset::getFixed(Offset) + ScalableOffset;
}

static unsigned getPrologueDeath(MachineFunction &MF, unsigned Reg) {
  // Do not set a kill flag on values that are also marked as live-in. This
  // happens with the @llvm.returnaddress intrinsic and with arguments passed in
  // callee saved registers.
  // Omitting the kill flags is conservatively correct even if the live-in
  // is not used after all.
  bool IsLiveIn = MF.getRegInfo().isLiveIn(Reg);
  return getKillRegState(!IsLiveIn);
}

static bool produceCompactUnwindFrame(MachineFunction &MF) {
  const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
  AttributeList Attrs = MF.getFunction().getAttributes();
  AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
  return Subtarget.isTargetMachO() &&
         !(Subtarget.getTargetLowering()->supportSwiftError() &&
           Attrs.hasAttrSomewhere(Attribute::SwiftError)) &&
         MF.getFunction().getCallingConv() != CallingConv::SwiftTail &&
         !requiresSaveVG(MF) && !AFI->isSVECC();
}

static bool invalidateWindowsRegisterPairing(unsigned Reg1, unsigned Reg2,
                                             bool NeedsWinCFI, bool IsFirst,
                                             const TargetRegisterInfo *TRI) {
  // If we are generating register pairs for a Windows function that requires
  // EH support, then pair consecutive registers only. There are no unwind
  // opcodes for saves/restores of non-consecutive register pairs.
  // The unwind opcodes are save_regp, save_regp_x, save_fregp, save_fregp_x,
  // save_lrpair.
  // https://docs.microsoft.com/en-us/cpp/build/arm64-exception-handling

  if (Reg2 == AArch64::FP)
    return true;
  if (!NeedsWinCFI)
    return false;
  if (TRI->getEncodingValue(Reg2) == TRI->getEncodingValue(Reg1) + 1)
    return false;
  // If pairing a GPR with LR, the pair can be described by the save_lrpair
  // opcode. If this is the first register pair, it would end up with a
  // predecrement, but there's no save_lrpair_x opcode, so we can only do this
  // if LR is paired with something other than the first register.
  // The save_lrpair opcode requires the first register to be an odd one.
  if (Reg1 >= AArch64::X19 && Reg1 <= AArch64::X27 &&
      (Reg1 - AArch64::X19) % 2 == 0 && Reg2 == AArch64::LR && !IsFirst)
    return false;
  return true;
}

/// Returns true if Reg1 and Reg2 cannot be paired using a ldp/stp instruction.
/// WindowsCFI requires that only consecutive registers can be paired.
/// LR and FP need to be allocated together when the frame needs to save
/// the frame-record. This means any other register pairing with LR is invalid.
static bool invalidateRegisterPairing(unsigned Reg1, unsigned Reg2,
                                      bool UsesWinAAPCS, bool NeedsWinCFI,
                                      bool NeedsFrameRecord, bool IsFirst,
                                      const TargetRegisterInfo *TRI) {
  if (UsesWinAAPCS)
    return invalidateWindowsRegisterPairing(Reg1, Reg2, NeedsWinCFI, IsFirst,
                                            TRI);

  // If we need to store the frame record, don't pair any register
  // with LR other than FP.
  if (NeedsFrameRecord)
    return Reg2 == AArch64::LR;

  return false;
}

namespace {

struct RegPairInfo {
  unsigned Reg1 = AArch64::NoRegister;
  unsigned Reg2 = AArch64::NoRegister;
  int FrameIdx;
  int Offset;
  enum RegType { GPR, FPR64, FPR128, PPR, ZPR, VG } Type;
  const TargetRegisterClass *RC;

  RegPairInfo() = default;

  bool isPaired() const { return Reg2 != AArch64::NoRegister; }

  bool isScalable() const { return Type == PPR || Type == ZPR; }
};

} // end anonymous namespace

unsigned findFreePredicateReg(BitVector &SavedRegs) {
  for (unsigned PReg = AArch64::P8; PReg <= AArch64::P15; ++PReg) {
    if (SavedRegs.test(PReg)) {
      unsigned PNReg = PReg - AArch64::P0 + AArch64::PN0;
      return PNReg;
    }
  }
  return AArch64::NoRegister;
}
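// Example (illustrative, hypothetical input): if the callee saves include p8
// and p10, the loop above stops at p8 and returns PN8, the
// predicate-as-counter alias of the first saved predicate in the p8..p15
// range.
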
// The multi-vector LD/ST instructions are available only on SME or SVE2p1
// targets.
static bool enableMultiVectorSpillFill(const AArch64Subtarget &Subtarget,
                                       MachineFunction &MF) {
  if (DisableMultiVectorSpillFill)
    return false;

  SMEAttrs FuncAttrs = MF.getInfo<AArch64FunctionInfo>()->getSMEFnAttrs();
  bool IsLocallyStreaming =
      FuncAttrs.hasStreamingBody() && !FuncAttrs.hasStreamingInterface();

  // SME2 instructions can only be used safely while in streaming mode; they
  // are not safe in streaming-compatible or locally streaming functions.
  return Subtarget.hasSVE2p1() ||
         (Subtarget.hasSME2() &&
          (!IsLocallyStreaming && Subtarget.isStreaming()));
}

static void computeCalleeSaveRegisterPairs(
    MachineFunction &MF, ArrayRef<CalleeSavedInfo> CSI,
    const TargetRegisterInfo *TRI, SmallVectorImpl<RegPairInfo> &RegPairs,
    bool NeedsFrameRecord) {

  if (CSI.empty())
    return;

  bool IsWindows = isTargetWindows(MF);
  bool NeedsWinCFI = needsWinCFI(MF);
  AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
  unsigned StackHazardSize = getStackHazardSize(MF);
  MachineFrameInfo &MFI = MF.getFrameInfo();
  CallingConv::ID CC = MF.getFunction().getCallingConv();
  unsigned Count = CSI.size();
  (void)CC;
  // MachO's compact unwind format relies on all registers being stored in
  // pairs.
  assert((!produceCompactUnwindFrame(MF) || CC == CallingConv::PreserveMost ||
          CC == CallingConv::PreserveAll || CC == CallingConv::CXX_FAST_TLS ||
          CC == CallingConv::Win64 || (Count & 1) == 0) &&
         "Odd number of callee-saved regs to spill!");
  int ByteOffset = AFI->getCalleeSavedStackSize();
  int StackFillDir = -1;
  int RegInc = 1;
  unsigned FirstReg = 0;
  if (NeedsWinCFI) {
    // For WinCFI, fill the stack from the bottom up.
    ByteOffset = 0;
    StackFillDir = 1;
    // As the CSI array is reversed to match PrologEpilogInserter, iterate
    // backwards, to pair up registers starting from lower numbered registers.
    RegInc = -1;
    FirstReg = Count - 1;
  }
  bool FPAfterSVECalleeSaves = IsWindows && AFI->getSVECalleeSavedStackSize();
  int ScalableByteOffset =
      FPAfterSVECalleeSaves ? 0 : AFI->getSVECalleeSavedStackSize();
  bool NeedGapToAlignStack = AFI->hasCalleeSaveStackFreeSpace();
  Register LastReg = 0;

  // When iterating backwards, the loop condition relies on unsigned wraparound.
  for (unsigned i = FirstReg; i < Count; i += RegInc) {
    RegPairInfo RPI;
    RPI.Reg1 = CSI[i].getReg();

    if (AArch64::GPR64RegClass.contains(RPI.Reg1)) {
      RPI.Type = RegPairInfo::GPR;
      RPI.RC = &AArch64::GPR64RegClass;
    } else if (AArch64::FPR64RegClass.contains(RPI.Reg1)) {
      RPI.Type = RegPairInfo::FPR64;
      RPI.RC = &AArch64::FPR64RegClass;
    } else if (AArch64::FPR128RegClass.contains(RPI.Reg1)) {
      RPI.Type = RegPairInfo::FPR128;
      RPI.RC = &AArch64::FPR128RegClass;
    } else if (AArch64::ZPRRegClass.contains(RPI.Reg1)) {
      RPI.Type = RegPairInfo::ZPR;
      RPI.RC = &AArch64::ZPRRegClass;
    } else if (AArch64::PPRRegClass.contains(RPI.Reg1)) {
      RPI.Type = RegPairInfo::PPR;
      RPI.RC = &AArch64::PPRRegClass;
    } else if (RPI.Reg1 == AArch64::VG) {
      RPI.Type = RegPairInfo::VG;
      RPI.RC = &AArch64::FIXED_REGSRegClass;
    } else {
      llvm_unreachable("Unsupported register class.");
    }

    // Add the stack hazard size as we transition from GPR->FPR CSRs.
    if (AFI->hasStackHazardSlotIndex() &&
        (!LastReg || !AArch64InstrInfo::isFpOrNEON(LastReg)) &&
        AArch64InstrInfo::isFpOrNEON(RPI.Reg1))
      ByteOffset += StackFillDir * StackHazardSize;
    LastReg = RPI.Reg1;

    int Scale = TRI->getSpillSize(*RPI.RC);
    // Add the next reg to the pair if it is in the same register class.
    if (unsigned(i + RegInc) < Count && !AFI->hasStackHazardSlotIndex()) {
      MCRegister NextReg = CSI[i + RegInc].getReg();
      bool IsFirst = i == FirstReg;
      switch (RPI.Type) {
      case RegPairInfo::GPR:
        if (AArch64::GPR64RegClass.contains(NextReg) &&
            !invalidateRegisterPairing(RPI.Reg1, NextReg, IsWindows,
                                       NeedsWinCFI, NeedsFrameRecord, IsFirst,
                                       TRI))
          RPI.Reg2 = NextReg;
        break;
      case RegPairInfo::FPR64:
        if (AArch64::FPR64RegClass.contains(NextReg) &&
            !invalidateWindowsRegisterPairing(RPI.Reg1, NextReg, NeedsWinCFI,
                                              IsFirst, TRI))
          RPI.Reg2 = NextReg;
        break;
      case RegPairInfo::FPR128:
        if (AArch64::FPR128RegClass.contains(NextReg))
          RPI.Reg2 = NextReg;
        break;
      case RegPairInfo::PPR:
        break;
      case RegPairInfo::ZPR:
        if (AFI->getPredicateRegForFillSpill() != 0 &&
            ((RPI.Reg1 - AArch64::Z0) & 1) == 0 && (NextReg == RPI.Reg1 + 1)) {
          // Calculate offset of register pair to see if pair instruction can be
          // used.
          int Offset = (ScalableByteOffset + StackFillDir * 2 * Scale) / Scale;
          if ((-16 <= Offset && Offset <= 14) && (Offset % 2 == 0))
            RPI.Reg2 = NextReg;
        }
        break;
      case RegPairInfo::VG:
        break;
      }
    }

    // GPRs and FPRs are saved in pairs of 64-bit regs. We expect the CSI
    // list to come in sorted by frame index so that we can issue the store
    // pair instructions directly. Assert if we see anything otherwise.
    //
    // The order of the registers in the list is controlled by
    // getCalleeSavedRegs(), so they will always be in-order, as well.
    assert((!RPI.isPaired() ||
            (CSI[i].getFrameIdx() + RegInc == CSI[i + RegInc].getFrameIdx())) &&
           "Out of order callee saved regs!");

    assert((!RPI.isPaired() || !NeedsFrameRecord || RPI.Reg2 != AArch64::FP ||
            RPI.Reg1 == AArch64::LR) &&
           "FrameRecord must be allocated together with LR");

    // Windows AAPCS has FP and LR reversed.
    assert((!RPI.isPaired() || !NeedsFrameRecord || RPI.Reg1 != AArch64::FP ||
            RPI.Reg2 == AArch64::LR) &&
           "FrameRecord must be allocated together with LR");

    // MachO's compact unwind format relies on all registers being stored in
    // adjacent register pairs.
    assert((!produceCompactUnwindFrame(MF) || CC == CallingConv::PreserveMost ||
            CC == CallingConv::PreserveAll || CC == CallingConv::CXX_FAST_TLS ||
            CC == CallingConv::Win64 ||
            (RPI.isPaired() &&
             ((RPI.Reg1 == AArch64::LR && RPI.Reg2 == AArch64::FP) ||
              RPI.Reg1 + 1 == RPI.Reg2))) &&
           "Callee-save registers not saved as adjacent register pair!");

    RPI.FrameIdx = CSI[i].getFrameIdx();
    if (NeedsWinCFI &&
        RPI.isPaired()) // RPI.FrameIdx must be the lower index of the pair
      RPI.FrameIdx = CSI[i + RegInc].getFrameIdx();

    // Realign the scalable offset if necessary. This is relevant when
    // spilling predicates on Windows.
    if (RPI.isScalable() && ScalableByteOffset % Scale != 0) {
      ScalableByteOffset = alignTo(ScalableByteOffset, Scale);
    }

    int OffsetPre = RPI.isScalable() ? ScalableByteOffset : ByteOffset;
    assert(OffsetPre % Scale == 0);

    if (RPI.isScalable())
      ScalableByteOffset += StackFillDir * (RPI.isPaired() ? 2 * Scale : Scale);
    else
      ByteOffset += StackFillDir * (RPI.isPaired() ? 2 * Scale : Scale);

    // Swift's async context is directly before FP, so allocate an extra
    // 8 bytes for it.
    if (NeedsFrameRecord && AFI->hasSwiftAsyncContext() &&
        ((!IsWindows && RPI.Reg2 == AArch64::FP) ||
         (IsWindows && RPI.Reg2 == AArch64::LR)))
      ByteOffset += StackFillDir * 8;

    // Round up size of non-pair to pair size if we need to pad the
    // callee-save area to ensure 16-byte alignment.
    if (NeedGapToAlignStack && !NeedsWinCFI && !RPI.isScalable() &&
        RPI.Type != RegPairInfo::FPR128 && !RPI.isPaired() &&
        ByteOffset % 16 != 0) {
      ByteOffset += 8 * StackFillDir;
      assert(MFI.getObjectAlign(RPI.FrameIdx) <= Align(16));
      // A stack frame with a gap looks like this, bottom up:
      // d9, d8. x21, gap, x20, x19.
      // Set extra alignment on the x21 object to create the gap above it.
      MFI.setObjectAlignment(RPI.FrameIdx, Align(16));
      NeedGapToAlignStack = false;
    }

    int OffsetPost = RPI.isScalable() ? ScalableByteOffset : ByteOffset;
    assert(OffsetPost % Scale == 0);
    // If filling top down (default), we want the offset after incrementing it.
    // If filling bottom up (WinCFI) we need the original offset.
    int Offset = NeedsWinCFI ? OffsetPre : OffsetPost;

    // The FP, LR pair goes 8 bytes into our expanded 24-byte slot so that the
    // Swift context can directly precede FP.
    if (NeedsFrameRecord && AFI->hasSwiftAsyncContext() &&
        ((!IsWindows && RPI.Reg2 == AArch64::FP) ||
         (IsWindows && RPI.Reg2 == AArch64::LR)))
      Offset += 8;
    RPI.Offset = Offset / Scale;

    assert((!RPI.isPaired() ||
            (!RPI.isScalable() && RPI.Offset >= -64 && RPI.Offset <= 63) ||
            (RPI.isScalable() && RPI.Offset >= -256 && RPI.Offset <= 255)) &&
           "Offset out of bounds for LDP/STP immediate");

    auto isFrameRecord = [&] {
      if (RPI.isPaired())
        return IsWindows ? RPI.Reg1 == AArch64::FP && RPI.Reg2 == AArch64::LR
                         : RPI.Reg1 == AArch64::LR && RPI.Reg2 == AArch64::FP;
      // Otherwise, look for the frame record as two unpaired registers. This is
      // needed for -aarch64-stack-hazard-size=<val>, which disables register
      // pairing (as the padding may be too large for the LDP/STP offset). Note:
      // On Windows, this check works out as current reg == FP, next reg == LR,
      // and on other platforms current reg == FP, previous reg == LR. This
      // works out as the correct pre-increment or post-increment offsets
      // respectively.
      return i > 0 && RPI.Reg1 == AArch64::FP &&
             CSI[i - 1].getReg() == AArch64::LR;
    };

    // Save the offset to frame record so that the FP register can point to the
    // innermost frame record (spilled FP and LR registers).
    if (NeedsFrameRecord && isFrameRecord())
      AFI->setCalleeSaveBaseToFrameRecordOffset(Offset);

    RegPairs.push_back(RPI);
    if (RPI.isPaired())
      i += RegInc;
  }
  if (NeedsWinCFI) {
    // If we need an alignment gap in the stack, align the topmost stack
    // object. A stack frame with a gap looks like this, bottom up:
    // x19, d8. d9, gap.
    // Set extra alignment on the topmost stack object (the first element in
    // CSI, which goes top down), to create the gap above it.
    if (AFI->hasCalleeSaveStackFreeSpace())
      MFI.setObjectAlignment(CSI[0].getFrameIdx(), Align(16));
    // We iterated bottom up over the registers; flip RegPairs back to top
    // down order.
    std::reverse(RegPairs.begin(), RegPairs.end());
  }
}

bool AArch64FrameLowering::spillCalleeSavedRegisters(
    MachineBasicBlock &MBB, MachineBasicBlock::iterator MI,
    ArrayRef<CalleeSavedInfo> CSI, const TargetRegisterInfo *TRI) const {
  MachineFunction &MF = *MBB.getParent();
  auto &TLI = *MF.getSubtarget<AArch64Subtarget>().getTargetLowering();
  const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();
  bool NeedsWinCFI = needsWinCFI(MF);
  DebugLoc DL;
  SmallVector<RegPairInfo, 8> RegPairs;

  computeCalleeSaveRegisterPairs(MF, CSI, TRI, RegPairs, hasFP(MF));

  MachineRegisterInfo &MRI = MF.getRegInfo();
  // Refresh the reserved regs in case there are any potential changes since the
  // last freeze.
  MRI.freezeReservedRegs();

  if (homogeneousPrologEpilog(MF)) {
    auto MIB = BuildMI(MBB, MI, DL, TII.get(AArch64::HOM_Prolog))
                   .setMIFlag(MachineInstr::FrameSetup);

    for (auto &RPI : RegPairs) {
      MIB.addReg(RPI.Reg1);
      MIB.addReg(RPI.Reg2);

      // Update register live in.
      if (!MRI.isReserved(RPI.Reg1))
        MBB.addLiveIn(RPI.Reg1);
      if (RPI.isPaired() && !MRI.isReserved(RPI.Reg2))
        MBB.addLiveIn(RPI.Reg2);
    }
    return true;
  }
  bool PTrueCreated = false;
  for (const RegPairInfo &RPI : llvm::reverse(RegPairs)) {
    unsigned Reg1 = RPI.Reg1;
    unsigned Reg2 = RPI.Reg2;
    unsigned StrOpc;

    // Issue sequence of spills for cs regs. The first spill may be converted
    // to a pre-decrement store later by emitPrologue if the callee-save stack
    // area allocation can't be combined with the local stack area allocation.
    // For example:
    //    stp x22, x21, [sp, #0]     // addImm(+0)
    //    stp x20, x19, [sp, #16]    // addImm(+2)
    //    stp fp, lr, [sp, #32]      // addImm(+4)
    // Rationale: This sequence saves uop updates compared to a sequence of
    // pre-increment spills like stp xi,xj,[sp,#-16]!
    // Note: Similar rationale and sequence for restores in epilog.
    unsigned Size = TRI->getSpillSize(*RPI.RC);
    Align Alignment = TRI->getSpillAlign(*RPI.RC);
    switch (RPI.Type) {
    case RegPairInfo::GPR:
      StrOpc = RPI.isPaired() ? AArch64::STPXi : AArch64::STRXui;
      break;
    case RegPairInfo::FPR64:
      StrOpc = RPI.isPaired() ? AArch64::STPDi : AArch64::STRDui;
      break;
    case RegPairInfo::FPR128:
      StrOpc = RPI.isPaired() ? AArch64::STPQi : AArch64::STRQui;
      break;
    case RegPairInfo::ZPR:
      StrOpc = RPI.isPaired() ? AArch64::ST1B_2Z_IMM : AArch64::STR_ZXI;
      break;
    case RegPairInfo::PPR:
      StrOpc =
          Size == 16 ? AArch64::SPILL_PPR_TO_ZPR_SLOT_PSEUDO : AArch64::STR_PXI;
      break;
    case RegPairInfo::VG:
      StrOpc = AArch64::STRXui;
      break;
    }

    unsigned X0Scratch = AArch64::NoRegister;
    auto RestoreX0 = make_scope_exit([&] {
      if (X0Scratch != AArch64::NoRegister)
        BuildMI(MBB, MI, DL, TII.get(TargetOpcode::COPY), AArch64::X0)
            .addReg(X0Scratch)
            .setMIFlag(MachineInstr::FrameSetup);
    });

    if (Reg1 == AArch64::VG) {
      // Find an available register to store the value of VG to.
      Reg1 = findScratchNonCalleeSaveRegister(&MBB);
      assert(Reg1 != AArch64::NoRegister);
      if (MF.getSubtarget<AArch64Subtarget>().hasSVE()) {
        BuildMI(MBB, MI, DL, TII.get(AArch64::CNTD_XPiI), Reg1)
            .addImm(31)
            .addImm(1)
            .setMIFlag(MachineInstr::FrameSetup);
      } else {
        const AArch64Subtarget &STI = MF.getSubtarget<AArch64Subtarget>();
        if (any_of(MBB.liveins(),
                   [&STI](const MachineBasicBlock::RegisterMaskPair &LiveIn) {
                     return STI.getRegisterInfo()->isSuperOrSubRegisterEq(
                         AArch64::X0, LiveIn.PhysReg);
                   })) {
          X0Scratch = Reg1;
          BuildMI(MBB, MI, DL, TII.get(TargetOpcode::COPY), X0Scratch)
              .addReg(AArch64::X0)
              .setMIFlag(MachineInstr::FrameSetup);
        }

        RTLIB::Libcall LC = RTLIB::SMEABI_GET_CURRENT_VG;
        const uint32_t *RegMask =
            TRI->getCallPreservedMask(MF, TLI.getLibcallCallingConv(LC));
        BuildMI(MBB, MI, DL, TII.get(AArch64::BL))
            .addExternalSymbol(TLI.getLibcallName(LC))
            .addRegMask(RegMask)
            .addReg(AArch64::X0, RegState::ImplicitDefine)
            .setMIFlag(MachineInstr::FrameSetup);
        Reg1 = AArch64::X0;
      }
    }

    LLVM_DEBUG(dbgs() << "CSR spill: (" << printReg(Reg1, TRI);
               if (RPI.isPaired()) dbgs() << ", " << printReg(Reg2, TRI);
               dbgs() << ") -> fi#(" << RPI.FrameIdx;
               if (RPI.isPaired()) dbgs() << ", " << RPI.FrameIdx + 1;
               dbgs() << ")\n");

    assert((!NeedsWinCFI || !(Reg1 == AArch64::LR && Reg2 == AArch64::FP)) &&
           "Windows unwinding requires a consecutive (FP,LR) pair");
    // Windows unwind codes require consecutive registers if registers are
    // paired. Make the switch here, so that the code below will save (x,x+1)
    // and not (x+1,x).
    unsigned FrameIdxReg1 = RPI.FrameIdx;
    unsigned FrameIdxReg2 = RPI.FrameIdx + 1;
    if (NeedsWinCFI && RPI.isPaired()) {
      std::swap(Reg1, Reg2);
      std::swap(FrameIdxReg1, FrameIdxReg2);
    }

    if (RPI.isPaired() && RPI.isScalable()) {
      [[maybe_unused]] const AArch64Subtarget &Subtarget =
          MF.getSubtarget<AArch64Subtarget>();
      AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
      unsigned PnReg = AFI->getPredicateRegForFillSpill();
      assert((PnReg != 0 && enableMultiVectorSpillFill(Subtarget, MF)) &&
             "Expects SVE2.1 or SME2 target and a predicate register");
#ifdef EXPENSIVE_CHECKS
      auto IsPPR = [](const RegPairInfo &c) {
        return c.Type == RegPairInfo::PPR;
      };
      auto PPRBegin = std::find_if(RegPairs.begin(), RegPairs.end(), IsPPR);
      auto IsZPR = [](const RegPairInfo &c) {
        return c.Type == RegPairInfo::ZPR;
      };
      auto ZPRBegin = std::find_if(RegPairs.begin(), RegPairs.end(), IsZPR);
      assert(!(PPRBegin < ZPRBegin) &&
             "Expected callee save predicate to be handled first");
#endif
      if (!PTrueCreated) {
        PTrueCreated = true;
        BuildMI(MBB, MI, DL, TII.get(AArch64::PTRUE_C_B), PnReg)
            .setMIFlags(MachineInstr::FrameSetup);
      }
      MachineInstrBuilder MIB = BuildMI(MBB, MI, DL, TII.get(StrOpc));
      if (!MRI.isReserved(Reg1))
        MBB.addLiveIn(Reg1);
      if (!MRI.isReserved(Reg2))
        MBB.addLiveIn(Reg2);
      MIB.addReg(/*PairRegs*/ AArch64::Z0_Z1 + (RPI.Reg1 - AArch64::Z0));
      MIB.addMemOperand(MF.getMachineMemOperand(
          MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
          MachineMemOperand::MOStore, Size, Alignment));
      MIB.addReg(PnReg);
      MIB.addReg(AArch64::SP)
          .addImm(RPI.Offset / 2) // [sp, #imm*2*vscale],
                                  // where 2*vscale is implicit
          .setMIFlag(MachineInstr::FrameSetup);
      MIB.addMemOperand(MF.getMachineMemOperand(
          MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
          MachineMemOperand::MOStore, Size, Alignment));
      if (NeedsWinCFI)
        InsertSEH(MIB, TII, MachineInstr::FrameSetup);
    } else { // The code when the pair of ZReg is not present
      MachineInstrBuilder MIB = BuildMI(MBB, MI, DL, TII.get(StrOpc));
      if (!MRI.isReserved(Reg1))
        MBB.addLiveIn(Reg1);
      if (RPI.isPaired()) {
        if (!MRI.isReserved(Reg2))
          MBB.addLiveIn(Reg2);
        MIB.addReg(Reg2, getPrologueDeath(MF, Reg2));
        MIB.addMemOperand(MF.getMachineMemOperand(
            MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
            MachineMemOperand::MOStore, Size, Alignment));
      }
      MIB.addReg(Reg1, getPrologueDeath(MF, Reg1))
          .addReg(AArch64::SP)
          .addImm(RPI.Offset) // [sp, #offset*vscale],
                              // where factor*vscale is implicit
          .setMIFlag(MachineInstr::FrameSetup);
      MIB.addMemOperand(MF.getMachineMemOperand(
          MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
          MachineMemOperand::MOStore, Size, Alignment));
      if (NeedsWinCFI)
        InsertSEH(MIB, TII, MachineInstr::FrameSetup);
    }
    // Update the StackIDs of the SVE stack slots.
    MachineFrameInfo &MFI = MF.getFrameInfo();
    if (RPI.Type == RegPairInfo::ZPR || RPI.Type == RegPairInfo::PPR) {
      MFI.setStackID(FrameIdxReg1, TargetStackID::ScalableVector);
      if (RPI.isPaired())
        MFI.setStackID(FrameIdxReg2, TargetStackID::ScalableVector);
    }
  }
  return true;
}
3697
3701 MachineFunction &MF = *MBB.getParent();
3703 DebugLoc DL;
3705 bool NeedsWinCFI = needsWinCFI(MF);
3706
3707 if (MBBI != MBB.end())
3708 DL = MBBI->getDebugLoc();
3709
3710 computeCalleeSaveRegisterPairs(MF, CSI, TRI, RegPairs, hasFP(MF));
3711 if (homogeneousPrologEpilog(MF, &MBB)) {
3712 auto MIB = BuildMI(MBB, MBBI, DL, TII.get(AArch64::HOM_Epilog))
3713 .setMIFlag(MachineInstr::FrameDestroy);
3714 for (auto &RPI : RegPairs) {
3715 MIB.addReg(RPI.Reg1, RegState::Define);
3716 MIB.addReg(RPI.Reg2, RegState::Define);
3717 }
3718 return true;
3719 }
3720
3721 // For performance reasons, restore the SVE registers in increasing order.
3722 auto IsPPR = [](const RegPairInfo &c) { return c.Type == RegPairInfo::PPR; };
3723 auto PPRBegin = llvm::find_if(RegPairs, IsPPR);
3724 auto PPREnd = std::find_if_not(PPRBegin, RegPairs.end(), IsPPR);
3725 std::reverse(PPRBegin, PPREnd);
3726 auto IsZPR = [](const RegPairInfo &c) { return c.Type == RegPairInfo::ZPR; };
3727 auto ZPRBegin = llvm::find_if(RegPairs, IsZPR);
3728 auto ZPREnd = std::find_if_not(ZPRBegin, RegPairs.end(), IsZPR);
3729 std::reverse(ZPRBegin, ZPREnd);
3730
3731 bool PTrueCreated = false;
3732 for (const RegPairInfo &RPI : RegPairs) {
3733 unsigned Reg1 = RPI.Reg1;
3734 unsigned Reg2 = RPI.Reg2;
3735
3736 // Issue sequence of restores for cs regs. The last restore may be converted
3737 // to a post-increment load later by emitEpilogue if the callee-save stack
3738 // area allocation can't be combined with the local stack area allocation.
3739 // For example:
3740 // ldp fp, lr, [sp, #32] // addImm(+4)
3741 // ldp x20, x19, [sp, #16] // addImm(+2)
3742 // ldp x22, x21, [sp, #0] // addImm(+0)
3743 // Note: see comment in spillCalleeSavedRegisters()
3744 unsigned LdrOpc;
3745 unsigned Size = TRI->getSpillSize(*RPI.RC);
3746 Align Alignment = TRI->getSpillAlign(*RPI.RC);
3747 switch (RPI.Type) {
3748 case RegPairInfo::GPR:
3749 LdrOpc = RPI.isPaired() ? AArch64::LDPXi : AArch64::LDRXui;
3750 break;
3751 case RegPairInfo::FPR64:
3752 LdrOpc = RPI.isPaired() ? AArch64::LDPDi : AArch64::LDRDui;
3753 break;
3754 case RegPairInfo::FPR128:
3755 LdrOpc = RPI.isPaired() ? AArch64::LDPQi : AArch64::LDRQui;
3756 break;
3757 case RegPairInfo::ZPR:
3758 LdrOpc = RPI.isPaired() ? AArch64::LD1B_2Z_IMM : AArch64::LDR_ZXI;
3759 break;
3760 case RegPairInfo::PPR:
3761 LdrOpc = Size == 16 ? AArch64::FILL_PPR_FROM_ZPR_SLOT_PSEUDO
3762 : AArch64::LDR_PXI;
3763 break;
3764 case RegPairInfo::VG:
3765 continue;
3766 }
3767 LLVM_DEBUG(dbgs() << "CSR restore: (" << printReg(Reg1, TRI);
3768 if (RPI.isPaired()) dbgs() << ", " << printReg(Reg2, TRI);
3769 dbgs() << ") -> fi#(" << RPI.FrameIdx;
3770 if (RPI.isPaired()) dbgs() << ", " << RPI.FrameIdx + 1;
3771 dbgs() << ")\n");
3772
3773 // Windows unwind codes require consecutive registers if registers are
3774 // paired. Make the switch here, so that the code below will save (x,x+1)
3775 // and not (x+1,x).
3776 unsigned FrameIdxReg1 = RPI.FrameIdx;
3777 unsigned FrameIdxReg2 = RPI.FrameIdx + 1;
3778 if (NeedsWinCFI && RPI.isPaired()) {
3779 std::swap(Reg1, Reg2);
3780 std::swap(FrameIdxReg1, FrameIdxReg2);
3781 }
3782
3784 if (RPI.isPaired() && RPI.isScalable()) {
3785 [[maybe_unused]] const AArch64Subtarget &Subtarget =
3786 MF.getSubtarget<AArch64Subtarget>();
3787 unsigned PnReg = AFI->getPredicateRegForFillSpill();
3788 assert((PnReg != 0 && enableMultiVectorSpillFill(Subtarget, MF)) &&
3789 "Expects SVE2.1 or SME2 target and a predicate register");
3790#ifdef EXPENSIVE_CHECKS
3791 assert(!(PPRBegin < ZPRBegin) &&
3792 "Expected callee save predicate to be handled first");
3793#endif
3794 if (!PTrueCreated) {
3795 PTrueCreated = true;
3796 BuildMI(MBB, MBBI, DL, TII.get(AArch64::PTRUE_C_B), PnReg)
3797 .setMIFlags(MachineInstr::FrameDestroy);
3798 }
3799 MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII.get(LdrOpc));
3800 MIB.addReg(/*PairRegs*/ AArch64::Z0_Z1 + (RPI.Reg1 - AArch64::Z0),
3801 getDefRegState(true));
3802 MIB.addMemOperand(MF.getMachineMemOperand(
3803 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
3804 MachineMemOperand::MOLoad, Size, Alignment));
3805 MIB.addReg(PnReg);
3806 MIB.addReg(AArch64::SP)
3807 .addImm(RPI.Offset / 2) // [sp, #imm*2*vscale]
3808 // where 2*vscale is implicit
3809 .setMIFlag(MachineInstr::FrameDestroy);
3810 MIB.addMemOperand(MF.getMachineMemOperand(
3811 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
3812 MachineMemOperand::MOLoad, Size, Alignment));
3813 if (NeedsWinCFI)
3814 InsertSEH(MIB, TII, MachineInstr::FrameDestroy);
3815 } else {
3816 MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII.get(LdrOpc));
3817 if (RPI.isPaired()) {
3818 MIB.addReg(Reg2, getDefRegState(true));
3819 MIB.addMemOperand(MF.getMachineMemOperand(
3820 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
3821 MachineMemOperand::MOLoad, Size, Alignment));
3822 }
3823 MIB.addReg(Reg1, getDefRegState(true));
3824 MIB.addReg(AArch64::SP)
3825 .addImm(RPI.Offset) // [sp, #offset*vscale]
3826 // where factor*vscale is implicit
3827 .setMIFlag(MachineInstr::FrameDestroy);
3828 MIB.addMemOperand(MF.getMachineMemOperand(
3829 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
3830 MachineMemOperand::MOLoad, Size, Alignment));
3831 if (NeedsWinCFI)
3832 InsertSEH(MIB, TII, MachineInstr::FrameDestroy);
3833 }
3834 }
3835 return true;
3836}
3837
3838// Return the FrameID for an MMO.
3839static std::optional<int> getMMOFrameID(MachineMemOperand *MMO,
3840 const MachineFrameInfo &MFI) {
3841 auto *PSV =
3842 dyn_cast_or_null<FixedStackPseudoSourceValue>(MMO->getPseudoValue());
3843 if (PSV)
3844 return std::optional<int>(PSV->getFrameIndex());
3845
3846 if (MMO->getValue()) {
3847 if (auto *Al = dyn_cast<AllocaInst>(getUnderlyingObject(MMO->getValue()))) {
3848 for (int FI = MFI.getObjectIndexBegin(); FI < MFI.getObjectIndexEnd();
3849 FI++)
3850 if (MFI.getObjectAllocation(FI) == Al)
3851 return FI;
3852 }
3853 }
3854
3855 return std::nullopt;
3856}
3857
3858// Return the FrameID for a Load/Store instruction by looking at the first MMO.
3859static std::optional<int> getLdStFrameID(const MachineInstr &MI,
3860 const MachineFrameInfo &MFI) {
3861 if (!MI.mayLoadOrStore() || MI.getNumMemOperands() < 1)
3862 return std::nullopt;
3863
3864 return getMMOFrameID(*MI.memoperands_begin(), MFI);
3865}
3866
3867// Check if a Hazard slot is needed for the current function, and if so create
3868// one for it. The index is stored in AArch64FunctionInfo->StackHazardSlotIndex,
3869// which can be used to determine if any hazard padding is needed.
3870void AArch64FrameLowering::determineStackHazardSlot(
3871 MachineFunction &MF, BitVector &SavedRegs) const {
3872 unsigned StackHazardSize = getStackHazardSize(MF);
3873 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
3874 if (StackHazardSize == 0 || StackHazardSize % 16 != 0 ||
3875 AFI->hasStackHazardSlotIndex())
3876 return;
3877
3878 // Stack hazards are only needed in streaming functions.
3879 SMEAttrs Attrs = AFI->getSMEFnAttrs();
3880 if (!StackHazardInNonStreaming && Attrs.hasNonStreamingInterfaceAndBody())
3881 return;
3882
3883 MachineFrameInfo &MFI = MF.getFrameInfo();
3884
3885 // Add a hazard slot if there are any CSR FPR registers, or any FP-only
3886 // stack objects.
3887 bool HasFPRCSRs = any_of(SavedRegs.set_bits(), [](unsigned Reg) {
3888 return AArch64::FPR64RegClass.contains(Reg) ||
3889 AArch64::FPR128RegClass.contains(Reg) ||
3890 AArch64::ZPRRegClass.contains(Reg) ||
3891 AArch64::PPRRegClass.contains(Reg);
3892 });
3893 bool HasFPRStackObjects = false;
3894 if (!HasFPRCSRs) {
3895 std::vector<unsigned> FrameObjects(MFI.getObjectIndexEnd());
3896 for (auto &MBB : MF) {
3897 for (auto &MI : MBB) {
3898 std::optional<int> FI = getLdStFrameID(MI, MFI);
3899 if (FI && *FI >= 0 && *FI < (int)FrameObjects.size()) {
3900 if (MFI.getStackID(*FI) == TargetStackID::ScalableVector ||
3901 AArch64InstrInfo::isFpOrNEON(MI))
3902 FrameObjects[*FI] |= 2;
3903 else
3904 FrameObjects[*FI] |= 1;
3905 }
3906 }
3907 }
3908 HasFPRStackObjects =
3909 any_of(FrameObjects, [](unsigned B) { return (B & 3) == 2; });
3910 }
3911
3912 if (HasFPRCSRs || HasFPRStackObjects) {
3913 int ID = MFI.CreateStackObject(StackHazardSize, Align(16), false);
3914 LLVM_DEBUG(dbgs() << "Created Hazard slot at " << ID << " size "
3915 << StackHazardSize << "\n");
3916 AFI->setStackHazardSlotIndex(ID);
3917 }
3918}
3919
3920void AArch64FrameLowering::determineCalleeSaves(MachineFunction &MF,
3921 BitVector &SavedRegs,
3922 RegScavenger *RS) const {
3923 // All calls are tail calls in GHC calling conv, and functions have no
3924 // prologue/epilogue.
3925 if (MF.getFunction().getCallingConv() == CallingConv::GHC)
3926 return;
3927
3929 const AArch64RegisterInfo *RegInfo = static_cast<const AArch64RegisterInfo *>(
3930 MF.getSubtarget().getRegisterInfo());
3931 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
3932 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
3933 unsigned UnspilledCSGPR = AArch64::NoRegister;
3934 unsigned UnspilledCSGPRPaired = AArch64::NoRegister;
3935
3936 MachineFrameInfo &MFI = MF.getFrameInfo();
3937 const MCPhysReg *CSRegs = MF.getRegInfo().getCalleeSavedRegs();
3938
3939 unsigned BasePointerReg = RegInfo->hasBasePointer(MF)
3940 ? RegInfo->getBaseRegister()
3941 : (unsigned)AArch64::NoRegister;
3942
3943 unsigned ExtraCSSpill = 0;
3944 bool HasUnpairedGPR64 = false;
3945 bool HasPairZReg = false;
3946 BitVector UserReservedRegs = RegInfo->getUserReservedRegs(MF);
3947 BitVector ReservedRegs = RegInfo->getReservedRegs(MF);
3948
3949 // Figure out which callee-saved registers to save/restore.
3950 for (unsigned i = 0; CSRegs[i]; ++i) {
3951 const unsigned Reg = CSRegs[i];
3952
3953 // Add the base pointer register to SavedRegs if it is callee-save.
3954 if (Reg == BasePointerReg)
3955 SavedRegs.set(Reg);
3956
3957 // Don't save manually reserved registers set through +reserve-x#i,
3958 // even for callee-saved registers, as per GCC's behavior.
3959 if (UserReservedRegs[Reg]) {
3960 SavedRegs.reset(Reg);
3961 continue;
3962 }
3963
3964 bool RegUsed = SavedRegs.test(Reg);
3965 unsigned PairedReg = AArch64::NoRegister;
3966 const bool RegIsGPR64 = AArch64::GPR64RegClass.contains(Reg);
3967 if (RegIsGPR64 || AArch64::FPR64RegClass.contains(Reg) ||
3968 AArch64::FPR128RegClass.contains(Reg)) {
3969 // Compensate for odd numbers of GP CSRs.
3970 // For now, all the known cases of odd number of CSRs are of GPRs.
3971 if (HasUnpairedGPR64)
3972 PairedReg = CSRegs[i % 2 == 0 ? i - 1 : i + 1];
3973 else
3974 PairedReg = CSRegs[i ^ 1];
3975 }
3976
3977 // If the function requires all the GP registers to save (SavedRegs),
3978 // and there are an odd number of GP CSRs at the same time (CSRegs),
3979 // PairedReg could be in a different register class from Reg, which would
3980 // lead to a FPR (usually D8) accidentally being marked saved.
3981 if (RegIsGPR64 && !AArch64::GPR64RegClass.contains(PairedReg)) {
3982 PairedReg = AArch64::NoRegister;
3983 HasUnpairedGPR64 = true;
3984 }
3985 assert(PairedReg == AArch64::NoRegister ||
3986 AArch64::GPR64RegClass.contains(Reg, PairedReg) ||
3987 AArch64::FPR64RegClass.contains(Reg, PairedReg) ||
3988 AArch64::FPR128RegClass.contains(Reg, PairedReg));
3989
3990 if (!RegUsed) {
3991 if (AArch64::GPR64RegClass.contains(Reg) && !ReservedRegs[Reg]) {
3992 UnspilledCSGPR = Reg;
3993 UnspilledCSGPRPaired = PairedReg;
3994 }
3995 continue;
3996 }
3997
3998 // Always save P4 when PPR spills are ZPR-sized and a predicate above p8 is
3999 // spilled. If all of p0-p3 are used as return values, p4 must be free
4000 // to reload p8-p15.
4001 if (RegInfo->getSpillSize(AArch64::PPRRegClass) == 16 &&
4002 AArch64::PPR_p8to15RegClass.contains(Reg)) {
4003 SavedRegs.set(AArch64::P4);
4004 }
4005
4006 // MachO's compact unwind format relies on all registers being stored in
4007 // pairs.
4008 // FIXME: the usual format is actually better if unwinding isn't needed.
4009 if (producePairRegisters(MF) && PairedReg != AArch64::NoRegister &&
4010 !SavedRegs.test(PairedReg)) {
4011 SavedRegs.set(PairedReg);
4012 if (AArch64::GPR64RegClass.contains(PairedReg) &&
4013 !ReservedRegs[PairedReg])
4014 ExtraCSSpill = PairedReg;
4015 }
4016 // Check if there is a pair of ZRegs, so it can select PReg for spill/fill
4017 HasPairZReg |= (AArch64::ZPRRegClass.contains(Reg, CSRegs[i ^ 1]) &&
4018 SavedRegs.test(CSRegs[i ^ 1]));
4019 }
4020
4021 if (HasPairZReg && enableMultiVectorSpillFill(Subtarget, MF)) {
4023 // Find a suitable predicate register for the multi-vector spill/fill
4024 // instructions.
4025 unsigned PnReg = findFreePredicateReg(SavedRegs);
4026 if (PnReg != AArch64::NoRegister)
4027 AFI->setPredicateRegForFillSpill(PnReg);
4028 // If no free callee-save has been found assign one.
4029 if (!AFI->getPredicateRegForFillSpill() &&
4030 MF.getFunction().getCallingConv() ==
4031 CallingConv::AArch64_SVE_VectorCall) {
4032 SavedRegs.set(AArch64::P8);
4033 AFI->setPredicateRegForFillSpill(AArch64::PN8);
4034 }
4035
4036 assert(!ReservedRegs[AFI->getPredicateRegForFillSpill()] &&
4037 "Predicate cannot be a reserved register");
4038 }
4039
4040 if (MF.getFunction().getCallingConv() == CallingConv::Win64 &&
4041 !Subtarget.isTargetWindows()) {
4042 // For Windows calling convention on a non-windows OS, where X18 is treated
4043 // as reserved, back up X18 when entering non-windows code (marked with the
4044 // Windows calling convention) and restore when returning regardless of
4045 // whether the individual function uses it - it might call other functions
4046 // that clobber it.
4047 SavedRegs.set(AArch64::X18);
4048 }
4049
4050 // Calculates the callee saved stack size.
4051 unsigned CSStackSize = 0;
4052 unsigned SVECSStackSize = 0;
4053 const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();
4054 for (unsigned Reg : SavedRegs.set_bits()) {
4055 auto *RC = TRI->getMinimalPhysRegClass(Reg);
4056 assert(RC && "expected register class!");
4057 auto SpillSize = TRI->getSpillSize(*RC);
4058 if (AArch64::PPRRegClass.contains(Reg) ||
4059 AArch64::ZPRRegClass.contains(Reg))
4060 SVECSStackSize += SpillSize;
4061 else
4062 CSStackSize += SpillSize;
4063 }
4064
4065 // Save number of saved regs, so we can easily update CSStackSize later to
4066 // account for any additional 64-bit GPR saves. Note: After this point
4067 // only 64-bit GPRs can be added to SavedRegs.
4068 unsigned NumSavedRegs = SavedRegs.count();
4069
4070 // Increase the callee-saved stack size if the function has streaming mode
4071 // changes, as we will need to spill the value of the VG register.
4072 if (requiresSaveVG(MF))
4073 CSStackSize += 8;
4074
4075 // Determine if a Hazard slot should be used, and increase the CSStackSize by
4076 // StackHazardSize if so.
4077 determineStackHazardSlot(MF, SavedRegs);
4078 if (AFI->hasStackHazardSlotIndex())
4079 CSStackSize += getStackHazardSize(MF);
4080
4081 // If we must call __arm_get_current_vg in the prologue preserve the LR.
4082 if (requiresSaveVG(MF) && !Subtarget.hasSVE())
4083 SavedRegs.set(AArch64::LR);
4084
4085 // The frame record needs to be created by saving the appropriate registers.
4086 uint64_t EstimatedStackSize = MFI.estimateStackSize(MF);
4087 if (hasFP(MF) ||
4088 windowsRequiresStackProbe(MF, EstimatedStackSize + CSStackSize + 16)) {
4089 SavedRegs.set(AArch64::FP);
4090 SavedRegs.set(AArch64::LR);
4091 }
4092
4093 LLVM_DEBUG({
4094 dbgs() << "*** determineCalleeSaves\nSaved CSRs:";
4095 for (unsigned Reg : SavedRegs.set_bits())
4096 dbgs() << ' ' << printReg(Reg, RegInfo);
4097 dbgs() << "\n";
4098 });
4099
4100 // If any callee-saved registers are used, the frame cannot be eliminated.
4101 int64_t SVEStackSize =
4102 alignTo(SVECSStackSize + estimateSVEStackObjectOffsets(MFI), 16);
4103 bool CanEliminateFrame = (SavedRegs.count() == 0) && !SVEStackSize;
4104
4105 // The CSR spill slots have not been allocated yet, so estimateStackSize
4106 // won't include them.
4107 unsigned EstimatedStackSizeLimit = estimateRSStackSizeLimit(MF);
4108
4109 // We may address some of the stack above the canonical frame address, either
4110 // for our own arguments or during a call. Include that in calculating whether
4111 // we have complicated addressing concerns.
4112 int64_t CalleeStackUsed = 0;
4113 for (int I = MFI.getObjectIndexBegin(); I != 0; ++I) {
4114 int64_t FixedOff = MFI.getObjectOffset(I);
4115 if (FixedOff > CalleeStackUsed)
4116 CalleeStackUsed = FixedOff;
4117 }
4118
4119 // Conservatively always assume BigStack when there are SVE spills.
4120 bool BigStack = SVEStackSize || (EstimatedStackSize + CSStackSize +
4121 CalleeStackUsed) > EstimatedStackSizeLimit;
4122 if (BigStack || !CanEliminateFrame || RegInfo->cannotEliminateFrame(MF))
4123 AFI->setHasStackFrame(true);
4124
4125 // Estimate if we might need to scavenge a register at some point in order
4126 // to materialize a stack offset. If so, either spill one additional
4127 // callee-saved register or reserve a special spill slot to facilitate
4128 // register scavenging. If we already spilled an extra callee-saved register
4129 // above to keep the number of spills even, we don't need to do anything else
4130 // here.
4131 if (BigStack) {
4132 if (!ExtraCSSpill && UnspilledCSGPR != AArch64::NoRegister) {
4133 LLVM_DEBUG(dbgs() << "Spilling " << printReg(UnspilledCSGPR, RegInfo)
4134 << " to get a scratch register.\n");
4135 SavedRegs.set(UnspilledCSGPR);
4136 ExtraCSSpill = UnspilledCSGPR;
4137
4138 // MachO's compact unwind format relies on all registers being stored in
4139 // pairs, so if we need to spill one extra for BigStack, then we need to
4140 // store the pair.
4141 if (producePairRegisters(MF)) {
4142 if (UnspilledCSGPRPaired == AArch64::NoRegister) {
4143 // Failed to make a pair for compact unwind format, revert spilling.
4144 if (produceCompactUnwindFrame(MF)) {
4145 SavedRegs.reset(UnspilledCSGPR);
4146 ExtraCSSpill = AArch64::NoRegister;
4147 }
4148 } else
4149 SavedRegs.set(UnspilledCSGPRPaired);
4150 }
4151 }
4152
4153 // If we didn't find an extra callee-saved register to spill, create
4154 // an emergency spill slot.
4155 if (!ExtraCSSpill || MF.getRegInfo().isPhysRegUsed(ExtraCSSpill)) {
4157 const TargetRegisterClass &RC = AArch64::GPR64RegClass;
4158 unsigned Size = TRI->getSpillSize(RC);
4159 Align Alignment = TRI->getSpillAlign(RC);
4160 int FI = MFI.CreateSpillStackObject(Size, Alignment);
4161 RS->addScavengingFrameIndex(FI);
4162 LLVM_DEBUG(dbgs() << "No available CS registers, allocated fi#" << FI
4163 << " as the emergency spill slot.\n");
4164 }
4165 }
4166
4167 // Adding the size of additional 64bit GPR saves.
4168 CSStackSize += 8 * (SavedRegs.count() - NumSavedRegs);
4169
4170 // A Swift asynchronous context extends the frame record with a pointer
4171 // directly before FP.
4172 if (hasFP(MF) && AFI->hasSwiftAsyncContext())
4173 CSStackSize += 8;
4174
4175 uint64_t AlignedCSStackSize = alignTo(CSStackSize, 16);
4176 LLVM_DEBUG(dbgs() << "Estimated stack frame size: "
4177 << EstimatedStackSize + AlignedCSStackSize << " bytes.\n");
4178
4179 assert((!MFI.isCalleeSavedInfoValid() ||
4180 AFI->getCalleeSavedStackSize() == AlignedCSStackSize) &&
4181 "Should not invalidate callee saved info");
4182
4183 // Round up to register pair alignment to avoid additional SP adjustment
4184 // instructions.
4185 AFI->setCalleeSavedStackSize(AlignedCSStackSize);
4186 AFI->setCalleeSaveStackHasFreeSpace(AlignedCSStackSize != CSStackSize);
4187 AFI->setSVECalleeSavedStackSize(alignTo(SVECSStackSize, 16));
4188}
4189
4190bool AArch64FrameLowering::assignCalleeSavedSpillSlots(
4191 MachineFunction &MF, const TargetRegisterInfo *RegInfo,
4192 std::vector<CalleeSavedInfo> &CSI, unsigned &MinCSFrameIndex,
4193 unsigned &MaxCSFrameIndex) const {
4194 bool NeedsWinCFI = needsWinCFI(MF);
4195 unsigned StackHazardSize = getStackHazardSize(MF);
4196 // To match the canonical windows frame layout, reverse the list of
4197 // callee saved registers to get them laid out by PrologEpilogInserter
4198 // in the right order. (PrologEpilogInserter allocates stack objects top
4199 // down. Windows canonical prologs store higher numbered registers at
4200 // the top, thus have the CSI array start from the highest registers.)
4201 if (NeedsWinCFI)
4202 std::reverse(CSI.begin(), CSI.end());
4203
4204 if (CSI.empty())
4205 return true; // Early exit if no callee saved registers are modified!
4206
4207 // Now that we know which registers need to be saved and restored, allocate
4208 // stack slots for them.
4209 MachineFrameInfo &MFI = MF.getFrameInfo();
4210 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
4211
4212 bool UsesWinAAPCS = isTargetWindows(MF);
4213 if (UsesWinAAPCS && hasFP(MF) && AFI->hasSwiftAsyncContext()) {
4214 int FrameIdx = MFI.CreateStackObject(8, Align(16), true);
4215 AFI->setSwiftAsyncContextFrameIdx(FrameIdx);
4216 if ((unsigned)FrameIdx < MinCSFrameIndex)
4217 MinCSFrameIndex = FrameIdx;
4218 if ((unsigned)FrameIdx > MaxCSFrameIndex)
4219 MaxCSFrameIndex = FrameIdx;
4220 }
4221
4222 // Insert VG into the list of CSRs, immediately before LR if saved.
4223 if (requiresSaveVG(MF)) {
4224 CalleeSavedInfo VGInfo(AArch64::VG);
4225 auto It =
4226 find_if(CSI, [](auto &Info) { return Info.getReg() == AArch64::LR; });
4227 if (It != CSI.end())
4228 CSI.insert(It, VGInfo);
4229 else
4230 CSI.push_back(VGInfo);
4231 }
4232
4233 Register LastReg = 0;
4234 int HazardSlotIndex = std::numeric_limits<int>::max();
4235 for (auto &CS : CSI) {
4236 MCRegister Reg = CS.getReg();
4237 const TargetRegisterClass *RC = RegInfo->getMinimalPhysRegClass(Reg);
4238
4239 // Create a hazard slot as we switch between GPR and FPR CSRs.
4240 if (AFI->hasStackHazardSlotIndex() &&
4241 (!LastReg || !AArch64InstrInfo::isFpOrNEON(LastReg)) &&
4242 AArch64InstrInfo::isFpOrNEON(Reg)) {
4243 assert(HazardSlotIndex == std::numeric_limits<int>::max() &&
4244 "Unexpected register order for hazard slot");
4245 HazardSlotIndex = MFI.CreateStackObject(StackHazardSize, Align(8), true);
4246 LLVM_DEBUG(dbgs() << "Created CSR Hazard at slot " << HazardSlotIndex
4247 << "\n");
4248 AFI->setStackHazardCSRSlotIndex(HazardSlotIndex);
4249 if ((unsigned)HazardSlotIndex < MinCSFrameIndex)
4250 MinCSFrameIndex = HazardSlotIndex;
4251 if ((unsigned)HazardSlotIndex > MaxCSFrameIndex)
4252 MaxCSFrameIndex = HazardSlotIndex;
4253 }
4254
4255 unsigned Size = RegInfo->getSpillSize(*RC);
4256 Align Alignment(RegInfo->getSpillAlign(*RC));
4257 int FrameIdx = MFI.CreateStackObject(Size, Alignment, true);
4258 CS.setFrameIdx(FrameIdx);
4259
4260 if ((unsigned)FrameIdx < MinCSFrameIndex)
4261 MinCSFrameIndex = FrameIdx;
4262 if ((unsigned)FrameIdx > MaxCSFrameIndex)
4263 MaxCSFrameIndex = FrameIdx;
4264
4265 // Grab 8 bytes below FP for the extended asynchronous frame info.
4266 if (hasFP(MF) && AFI->hasSwiftAsyncContext() && !UsesWinAAPCS &&
4267 Reg == AArch64::FP) {
4268 FrameIdx = MFI.CreateStackObject(8, Alignment, true);
4269 AFI->setSwiftAsyncContextFrameIdx(FrameIdx);
4270 if ((unsigned)FrameIdx < MinCSFrameIndex)
4271 MinCSFrameIndex = FrameIdx;
4272 if ((unsigned)FrameIdx > MaxCSFrameIndex)
4273 MaxCSFrameIndex = FrameIdx;
4274 }
4275 LastReg = Reg;
4276 }
4277
4278 // Add hazard slot in the case where no FPR CSRs are present.
4279 if (AFI->hasStackHazardSlotIndex() &&
4280 HazardSlotIndex == std::numeric_limits<int>::max()) {
4281 HazardSlotIndex = MFI.CreateStackObject(StackHazardSize, Align(8), true);
4282 LLVM_DEBUG(dbgs() << "Created CSR Hazard at slot " << HazardSlotIndex
4283 << "\n");
4284 AFI->setStackHazardCSRSlotIndex(HazardSlotIndex);
4285 if ((unsigned)HazardSlotIndex < MinCSFrameIndex)
4286 MinCSFrameIndex = HazardSlotIndex;
4287 if ((unsigned)HazardSlotIndex > MaxCSFrameIndex)
4288 MaxCSFrameIndex = HazardSlotIndex;
4289 }
4290
4291 return true;
4292}
4293
4294bool AArch64FrameLowering::enableStackSlotScavenging(
4295 const MachineFunction &MF) const {
4296 const AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
4297 // If the function has streaming-mode changes, don't scavenge a
4298 // spillslot in the callee-save area, as that might require an
4299 // 'addvl' in the streaming-mode-changing call-sequence when the
4300 // function doesn't use a FP.
4301 if (AFI->hasStreamingModeChanges() && !hasFP(MF))
4302 return false;
4303 // Don't allow register scavenging with hazard slots, in case it moves
4304 // objects into the wrong place.
4305 if (AFI->hasStackHazardSlotIndex())
4306 return false;
4307 return AFI->hasCalleeSaveStackFreeSpace();
4308}
4309
4310/// Returns true if there are any SVE callee saves.
4311static bool getSVECalleeSaveSlotRange(const MachineFrameInfo &MFI,
4312 int &Min, int &Max) {
4313 Min = std::numeric_limits<int>::max();
4314 Max = std::numeric_limits<int>::min();
4315
4316 if (!MFI.isCalleeSavedInfoValid())
4317 return false;
4318
4319 const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();
4320 for (auto &CS : CSI) {
4321 if (AArch64::ZPRRegClass.contains(CS.getReg()) ||
4322 AArch64::PPRRegClass.contains(CS.getReg())) {
4323 assert((Max == std::numeric_limits<int>::min() ||
4324 Max + 1 == CS.getFrameIdx()) &&
4325 "SVE CalleeSaves are not consecutive");
4326
4327 Min = std::min(Min, CS.getFrameIdx());
4328 Max = std::max(Max, CS.getFrameIdx());
4329 }
4330 }
4331 return Min != std::numeric_limits<int>::max();
4332}
4333
4334// Process all the SVE stack objects and determine offsets for each
4335// object. If AssignOffsets is true, the offsets get assigned.
4336// Fills in the first and last callee-saved frame indices into
4337// Min/MaxCSFrameIndex, respectively.
4338// Returns the size of the stack.
4339static int64_t determineSVEStackObjectOffsets(MachineFrameInfo &MFI,
4340 int &MinCSFrameIndex,
4341 int &MaxCSFrameIndex,
4342 bool AssignOffsets) {
4343#ifndef NDEBUG
4344 // First process all fixed stack objects.
4345 for (int I = MFI.getObjectIndexBegin(); I != 0; ++I)
4346 assert(MFI.getStackID(I) != TargetStackID::ScalableVector &&
4347 "SVE vectors should never be passed on the stack by value, only by "
4348 "reference.");
4349#endif
4350
4351 auto Assign = [&MFI](int FI, int64_t Offset) {
4352 LLVM_DEBUG(dbgs() << "alloc FI(" << FI << ") at SP[" << Offset << "]\n");
4353 MFI.setObjectOffset(FI, Offset);
4354 };
4355
4356 int64_t Offset = 0;
4357
4358 // Then process all callee saved slots.
4359 if (getSVECalleeSaveSlotRange(MFI, MinCSFrameIndex, MaxCSFrameIndex)) {
4360 // Assign offsets to the callee save slots.
4361 for (int I = MinCSFrameIndex; I <= MaxCSFrameIndex; ++I) {
4362 Offset += MFI.getObjectSize(I);
4363 Offset = alignTo(Offset, MFI.getObjectAlign(I));
4364 if (AssignOffsets)
4365 Assign(I, -Offset);
4366 }
4367 }
4368
4369 // Ensure that the callee-save area is aligned to 16 bytes.
4370 Offset = alignTo(Offset, Align(16U));
4371
4372 // Create a buffer of SVE objects to allocate and sort it.
4373 SmallVector<int, 8> ObjectsToAllocate;
4374 // If we have a stack protector, and we've previously decided that we have SVE
4375 // objects on the stack and thus need it to go in the SVE stack area, then it
4376 // needs to go first.
4377 int StackProtectorFI = -1;
4378 if (MFI.hasStackProtectorIndex()) {
4379 StackProtectorFI = MFI.getStackProtectorIndex();
4380 if (MFI.getStackID(StackProtectorFI) == TargetStackID::ScalableVector)
4381 ObjectsToAllocate.push_back(StackProtectorFI);
4382 }
4383 for (int I = 0, E = MFI.getObjectIndexEnd(); I != E; ++I) {
4384 unsigned StackID = MFI.getStackID(I);
4385 if (StackID != TargetStackID::ScalableVector)
4386 continue;
4387 if (I == StackProtectorFI)
4388 continue;
4389 if (MaxCSFrameIndex >= I && I >= MinCSFrameIndex)
4390 continue;
4391 if (MFI.isDeadObjectIndex(I))
4392 continue;
4393
4394 ObjectsToAllocate.push_back(I);
4395 }
4396
4397 // Allocate all SVE locals and spills
4398 for (unsigned FI : ObjectsToAllocate) {
4399 Align Alignment = MFI.getObjectAlign(FI);
4400 // FIXME: Given that the length of SVE vectors is not necessarily a power of
4401 // two, we'd need to align every object dynamically at runtime if the
4402 // alignment is larger than 16. This is not yet supported.
4403 if (Alignment > Align(16))
4404 report_fatal_error(
4405 "Alignment of scalable vectors > 16 bytes is not yet supported");
4406
4407 Offset = alignTo(Offset + MFI.getObjectSize(FI), Alignment);
4408 if (AssignOffsets)
4409 Assign(FI, -Offset);
4410 }
4411
4412 return Offset;
4413}
4414
4415int64_t AArch64FrameLowering::estimateSVEStackObjectOffsets(
4416 MachineFrameInfo &MFI) const {
4417 int MinCSFrameIndex, MaxCSFrameIndex;
4418 return determineSVEStackObjectOffsets(MFI, MinCSFrameIndex, MaxCSFrameIndex, false);
4419}
4420
4421int64_t AArch64FrameLowering::assignSVEStackObjectOffsets(
4422 MachineFrameInfo &MFI, int &MinCSFrameIndex, int &MaxCSFrameIndex) const {
4423 return determineSVEStackObjectOffsets(MFI, MinCSFrameIndex, MaxCSFrameIndex,
4424 true);
4425}
4426
4427/// Attempts to scavenge a register from \p ScavengeableRegs given the used
4428/// registers in \p UsedRegs.
4429static Register tryScavengeRegister(LiveRegUnits const &UsedRegs,
4430 BitVector const &ScavengeableRegs,
4431 Register PreferredReg) {
4432 if (PreferredReg != AArch64::NoRegister && UsedRegs.available(PreferredReg))
4433 return PreferredReg;
4434 for (auto Reg : ScavengeableRegs.set_bits()) {
4435 if (UsedRegs.available(Reg))
4436 return Reg;
4437 }
4438 return AArch64::NoRegister;
4439}
4440
4441/// Propagates frame-setup/destroy flags from \p SourceMI to all instructions in
4442/// \p MachineInstrs.
4443static void propagateFrameFlags(MachineInstr &SourceMI,
4444 ArrayRef<MachineInstr *> MachineInstrs) {
4445 for (MachineInstr *MI : MachineInstrs) {
4446 if (SourceMI.getFlag(MachineInstr::FrameSetup))
4447 MI->setFlag(MachineInstr::FrameSetup);
4448 if (SourceMI.getFlag(MachineInstr::FrameDestroy))
4449 MI->setFlag(MachineInstr::FrameDestroy);
4450 }
4451}
4452
4453/// RAII helper class for scavenging or spilling a register. On construction
4454/// attempts to find a free register of class \p RC (given \p UsedRegs and \p
4455/// AllocatableRegs), if no register can be found spills \p SpillCandidate to \p
4456/// MaybeSpillFI to free a register. The free'd register is returned via the \p
4457/// FreeReg output parameter. On destruction, if there is a spill, its previous
4458/// value is reloaded. The spilling and scavenging is only valid at the
4459/// insertion point \p MBBI, this class should _not_ be used in places that
4460/// create or manipulate basic blocks, moving the expected insertion point.
4461struct ScopedScavengeOrSpill {
4462 ScopedScavengeOrSpill(const ScopedScavengeOrSpill &) = delete;
4463 ScopedScavengeOrSpill(ScopedScavengeOrSpill &&) = delete;
4464
4465 ScopedScavengeOrSpill(MachineFunction &MF, MachineBasicBlock &MBB,
4466 MachineBasicBlock::iterator MBBI,
4467 Register SpillCandidate, const TargetRegisterClass &RC,
4468 LiveRegUnits const &UsedRegs,
4469 BitVector const &AllocatableRegs,
4470 std::optional<int> *MaybeSpillFI,
4471 Register PreferredReg = AArch64::NoRegister)
4472 : MBB(MBB), MBBI(MBBI), RC(RC), TII(static_cast<const AArch64InstrInfo &>(
4473 *MF.getSubtarget().getInstrInfo())),
4474 TRI(*MF.getSubtarget().getRegisterInfo()) {
4475 FreeReg = tryScavengeRegister(UsedRegs, AllocatableRegs, PreferredReg);
4476 if (FreeReg != AArch64::NoRegister)
4477 return;
4478 assert(MaybeSpillFI && "Expected emergency spill slot FI information "
4479 "(attempted to spill in prologue/epilogue?)");
4480 if (!MaybeSpillFI->has_value()) {
4481 MachineFrameInfo &MFI = MF.getFrameInfo();
4482 *MaybeSpillFI = MFI.CreateSpillStackObject(TRI.getSpillSize(RC),
4483 TRI.getSpillAlign(RC));
4484 }
4485 FreeReg = SpillCandidate;
4486 SpillFI = MaybeSpillFI->value();
4487 TII.storeRegToStackSlot(MBB, MBBI, FreeReg, false, *SpillFI, &RC, &TRI,
4488 Register());
4489 }
4490
4491 bool hasSpilled() const { return SpillFI.has_value(); }
4492
4493 /// Returns the free register (found from scavenging or spilling a register).
4494 Register freeRegister() const { return FreeReg; }
4495
4496 Register operator*() const { return freeRegister(); }
4497
4498 ~ScopedScavengeOrSpill() {
4499 if (hasSpilled())
4500 TII.loadRegFromStackSlot(MBB, MBBI, FreeReg, *SpillFI, &RC, &TRI,
4501 Register());
4502 }
4503
4504private:
4505 MachineBasicBlock &MBB;
4506 MachineBasicBlock::iterator MBBI;
4507 const TargetRegisterClass &RC;
4508 const AArch64InstrInfo &TII;
4509 const TargetRegisterInfo &TRI;
4510 Register FreeReg = AArch64::NoRegister;
4511 std::optional<int> SpillFI;
4512};
4513
4514/// Emergency stack slots for expanding SPILL_PPR_TO_ZPR_SLOT_PSEUDO and
4515/// FILL_PPR_FROM_ZPR_SLOT_PSEUDO.
4516struct EmergencyStackSlots {
4517 std::optional<int> ZPRSpillFI;
4518 std::optional<int> PPRSpillFI;
4519 std::optional<int> GPRSpillFI;
4520};
4521
4522/// Registers available for scavenging (ZPR, PPR3b, GPR).
4523struct ScavengeableRegs {
4524 BitVector ZPRRegs;
4525 BitVector PPR3bRegs;
4526 BitVector GPRRegs;
4527};
4528
4529static bool isInPrologueOrEpilogue(const MachineInstr &MI) {
4530 return MI.getFlag(MachineInstr::FrameSetup) ||
4531 MI.getFlag(MachineInstr::FrameDestroy);
4532}
4533
4534/// Expands:
4535/// ```
4536/// SPILL_PPR_TO_ZPR_SLOT_PSEUDO $p0, %stack.0, 0
4537/// ```
4538/// To:
4539/// ```
4540/// $z0 = CPY_ZPzI_B $p0, 1, 0
4541/// STR_ZXI $z0, $stack.0, 0
4542/// ```
4543/// While ensuring a ZPR ($z0 in this example) is free for the predicate
4544/// (spilling if necessary).
4545static void expandSpillPPRToZPRSlotPseudo(MachineBasicBlock &MBB,
4546 MachineInstr &MI,
4547 const TargetRegisterInfo &TRI,
4548 LiveRegUnits const &UsedRegs,
4549 ScavengeableRegs const &SR,
4550 EmergencyStackSlots &SpillSlots) {
4551 MachineFunction &MF = *MBB.getParent();
4552 auto *TII =
4553 static_cast<const AArch64InstrInfo *>(MF.getSubtarget().getInstrInfo());
4554
4555 ScopedScavengeOrSpill ZPredReg(
4556 MF, MBB, MI, AArch64::Z0, AArch64::ZPRRegClass, UsedRegs, SR.ZPRRegs,
4557 isInPrologueOrEpilogue(MI) ? nullptr : &SpillSlots.ZPRSpillFI);
4558
4559 SmallVector<MachineInstr *, 2> MachineInstrs;
4560 const DebugLoc &DL = MI.getDebugLoc();
4561 MachineInstrs.push_back(BuildMI(MBB, MI, DL, TII->get(AArch64::CPY_ZPzI_B))
4562 .addReg(*ZPredReg, RegState::Define)
4563 .add(MI.getOperand(0))
4564 .addImm(1)
4565 .addImm(0)
4566 .getInstr());
4567 MachineInstrs.push_back(BuildMI(MBB, MI, DL, TII->get(AArch64::STR_ZXI))
4568 .addReg(*ZPredReg)
4569 .add(MI.getOperand(1))
4570 .addImm(MI.getOperand(2).getImm())
4571 .setMemRefs(MI.memoperands())
4572 .getInstr());
4573 propagateFrameFlags(MI, MachineInstrs);
4574}
4575
4576/// Expands:
4577/// ```
4578/// $p0 = FILL_PPR_FROM_ZPR_SLOT_PSEUDO %stack.0, 0
4579/// ```
4580/// To:
4581/// ```
4582/// $z0 = LDR_ZXI %stack.0, 0
4583/// $p0 = PTRUE_B 31, implicit $vg
4584/// $p0 = CMPNE_PPzZI_B $p0, $z0, 0, implicit-def $nzcv
4585/// ```
4586/// While ensuring a ZPR ($z0 in this example) is free for the predicate
4587/// (spilling if necessary). If the status flags are in use at the point of
4588/// expansion they are preserved (by moving them to/from a GPR). This may cause
4589/// an additional spill if no GPR is free at the expansion point.
4590static bool expandFillPPRFromZPRSlotPseudo(
4591 MachineBasicBlock &MBB, MachineInstr &MI, const TargetRegisterInfo &TRI,
4592 LiveRegUnits const &UsedRegs, ScavengeableRegs const &SR,
4593 MachineInstr *&LastPTrue, EmergencyStackSlots &SpillSlots) {
4594 MachineFunction &MF = *MBB.getParent();
4595 auto *TII =
4596 static_cast<const AArch64InstrInfo *>(MF.getSubtarget().getInstrInfo());
4597
4598 ScopedScavengeOrSpill ZPredReg(
4599 MF, MBB, MI, AArch64::Z0, AArch64::ZPRRegClass, UsedRegs, SR.ZPRRegs,
4600 isInPrologueOrEpilogue(MI) ? nullptr : &SpillSlots.ZPRSpillFI);
4601
4602 ScopedScavengeOrSpill PredReg(
4603 MF, MBB, MI, AArch64::P0, AArch64::PPR_3bRegClass, UsedRegs, SR.PPR3bRegs,
4604 isInPrologueOrEpilogue(MI) ? nullptr : &SpillSlots.PPRSpillFI,
4605 /*PreferredReg=*/
4606 LastPTrue ? LastPTrue->getOperand(0).getReg() : AArch64::NoRegister);
4607
4608 // Elide NZCV spills if we know it is not used.
4609 bool IsNZCVUsed = !UsedRegs.available(AArch64::NZCV);
4610 std::optional<ScopedScavengeOrSpill> NZCVSaveReg;
4611 if (IsNZCVUsed)
4612 NZCVSaveReg.emplace(
4613 MF, MBB, MI, AArch64::X0, AArch64::GPR64RegClass, UsedRegs, SR.GPRRegs,
4614 isInPrologueOrEpilogue(MI) ? nullptr : &SpillSlots.GPRSpillFI);
4615 SmallVector<MachineInstr *, 4> MachineInstrs;
4616 const DebugLoc &DL = MI.getDebugLoc();
4617 MachineInstrs.push_back(BuildMI(MBB, MI, DL, TII->get(AArch64::LDR_ZXI))
4618 .addReg(*ZPredReg, RegState::Define)
4619 .add(MI.getOperand(1))
4620 .addImm(MI.getOperand(2).getImm())
4621 .setMemRefs(MI.memoperands())
4622 .getInstr());
4623 if (IsNZCVUsed)
4624 MachineInstrs.push_back(
4625 BuildMI(MBB, MI, DL, TII->get(AArch64::MRS))
4626 .addReg(NZCVSaveReg->freeRegister(), RegState::Define)
4627 .addImm(AArch64SysReg::NZCV)
4628 .addReg(AArch64::NZCV, RegState::Implicit)
4629 .getInstr());
4630
4631 // Reuse previous ptrue if we know it has not been clobbered.
4632 if (LastPTrue) {
4633 assert(*PredReg == LastPTrue->getOperand(0).getReg());
4634 LastPTrue->moveBefore(&MI);
4635 } else {
4636 LastPTrue = BuildMI(MBB, MI, DL, TII->get(AArch64::PTRUE_B))
4637 .addReg(*PredReg, RegState::Define)
4638 .addImm(31);
4639 }
4640 MachineInstrs.push_back(LastPTrue);
4641 MachineInstrs.push_back(
4642 BuildMI(MBB, MI, DL, TII->get(AArch64::CMPNE_PPzZI_B))
4643 .addReg(MI.getOperand(0).getReg(), RegState::Define)
4644 .addReg(*PredReg)
4645 .addReg(*ZPredReg)
4646 .addImm(0)
4647 .addReg(AArch64::NZCV, RegState::ImplicitDefine)
4648 .getInstr());
4649 if (IsNZCVUsed)
4650 MachineInstrs.push_back(BuildMI(MBB, MI, DL, TII->get(AArch64::MSR))
4651 .addImm(AArch64SysReg::NZCV)
4652 .addReg(NZCVSaveReg->freeRegister())
4653 .addReg(AArch64::NZCV, RegState::ImplicitDefine)
4654 .getInstr());
4655
4656 propagateFrameFlags(MI, MachineInstrs);
4657 return PredReg.hasSpilled();
4658}
4659
4660/// Expands all FILL_PPR_FROM_ZPR_SLOT_PSEUDO and SPILL_PPR_TO_ZPR_SLOT_PSEUDO
4661/// operations within the MachineBasicBlock \p MBB.
4662static bool expandSMEPPRToZPRSpillPseudos(MachineBasicBlock &MBB,
4663 const TargetRegisterInfo &TRI,
4664 ScavengeableRegs const &SR,
4665 EmergencyStackSlots &SpillSlots) {
4666 LiveRegUnits UsedRegs(TRI);
4667 UsedRegs.addLiveOuts(MBB);
4668 bool HasPPRSpills = false;
4669 MachineInstr *LastPTrue = nullptr;
4670 for (MachineInstr &MI : make_early_inc_range(reverse(MBB))) {
4671 UsedRegs.stepBackward(MI);
4672 switch (MI.getOpcode()) {
4673 case AArch64::FILL_PPR_FROM_ZPR_SLOT_PSEUDO:
4674 if (LastPTrue &&
4675 MI.definesRegister(LastPTrue->getOperand(0).getReg(), &TRI))
4676 LastPTrue = nullptr;
4677 HasPPRSpills |= expandFillPPRFromZPRSlotPseudo(MBB, MI, TRI, UsedRegs, SR,
4678 LastPTrue, SpillSlots);
4679 MI.eraseFromParent();
4680 break;
4681 case AArch64::SPILL_PPR_TO_ZPR_SLOT_PSEUDO:
4682 expandSpillPPRToZPRSlotPseudo(MBB, MI, TRI, UsedRegs, SR, SpillSlots);
4683 MI.eraseFromParent();
4684 [[fallthrough]];
4685 default:
4686 LastPTrue = nullptr;
4687 break;
4688 }
4689 }
4690
4691 return HasPPRSpills;
4692}
4693
4694void AArch64FrameLowering::processFunctionBeforeFrameFinalized(
4695 MachineFunction &MF, RegScavenger *RS) const {
4696
4697 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
4698 const TargetSubtargetInfo &TSI = MF.getSubtarget();
4699 const TargetRegisterInfo &TRI = *TSI.getRegisterInfo();
4700
4701 // If predicate spills are 16 bytes, we may need to expand
4702 // SPILL_PPR_TO_ZPR_SLOT_PSEUDO/FILL_PPR_FROM_ZPR_SLOT_PSEUDO.
4703 if (AFI->hasStackFrame() && TRI.getSpillSize(AArch64::PPRRegClass) == 16) {
4704 auto ComputeScavengeableRegisters = [&](unsigned RegClassID) {
4705 BitVector Regs = TRI.getAllocatableSet(MF, TRI.getRegClass(RegClassID));
4706 assert(Regs.count() > 0 && "Expected scavengeable registers");
4707 return Regs;
4708 };
4709
4710 ScavengeableRegs SR{};
4711 SR.ZPRRegs = ComputeScavengeableRegisters(AArch64::ZPRRegClassID);
4712 // Only p0-p7 are possible as the second operand of cmpne (needed for fills).
4713 SR.PPR3bRegs = ComputeScavengeableRegisters(AArch64::PPR_3bRegClassID);
4714 SR.GPRRegs = ComputeScavengeableRegisters(AArch64::GPR64RegClassID);
4715
4716 EmergencyStackSlots SpillSlots;
4717 for (MachineBasicBlock &MBB : MF) {
4718 // In the case we had to spill a predicate (in the range p0-p7) to reload
4719 // a predicate (>= p8), additional spill/fill pseudos will be created.
4720 // These need an additional expansion pass. Note: There will only be at
4721 // most two expansion passes, as spilling/filling a predicate in the range
4722 // p0-p7 never requires spilling another predicate.
4723 for (int Pass = 0; Pass < 2; Pass++) {
4724 bool HasPPRSpills =
4725 expandSMEPPRToZPRSpillPseudos(MBB, TRI, SR, SpillSlots);
4726 assert((Pass == 0 || !HasPPRSpills) && "Did not expect PPR spills");
4727 if (!HasPPRSpills)
4728 break;
4729 }
4730 }
4731 }
4732
4733 MachineFrameInfo &MFI = MF.getFrameInfo();
4734
4735 assert(getStackGrowthDirection() == TargetFrameLowering::StackGrowsDown &&
4736 "Upwards growing stack unsupported");
4737
4738 int MinCSFrameIndex, MaxCSFrameIndex;
4739 int64_t SVEStackSize =
4740 assignSVEStackObjectOffsets(MFI, MinCSFrameIndex, MaxCSFrameIndex);
4741
4742 AFI->setStackSizeSVE(alignTo(SVEStackSize, 16U));
4743 AFI->setMinMaxSVECSFrameIndex(MinCSFrameIndex, MaxCSFrameIndex);
4744
4745 // If this function isn't doing Win64-style C++ EH, we don't need to do
4746 // anything.
4747 if (!MF.hasEHFunclets())
4748 return;
4749
4750 // Win64 C++ EH needs to allocate space for the catch objects in the fixed
4751 // object area right next to the UnwindHelp object.
4752 WinEHFuncInfo &EHInfo = *MF.getWinEHFuncInfo();
4753 int64_t CurrentOffset =
4755 for (WinEHTryBlockMapEntry &TBME : EHInfo.TryBlockMap) {
4756 for (WinEHHandlerType &H : TBME.HandlerArray) {
4757 int FrameIndex = H.CatchObj.FrameIndex;
4758 if ((FrameIndex != INT_MAX) && MFI.getObjectOffset(FrameIndex) == 0) {
4759 CurrentOffset =
4760 alignTo(CurrentOffset, MFI.getObjectAlign(FrameIndex).value());
4761 CurrentOffset += MFI.getObjectSize(FrameIndex);
4762 MFI.setObjectOffset(FrameIndex, -CurrentOffset);
4763 }
4764 }
4765 }
4766
4767 // Create an UnwindHelp object.
4768 // The UnwindHelp object is allocated at the start of the fixed object area.
4769 int64_t UnwindHelpOffset = alignTo(CurrentOffset + 8, Align(16));
4770 assert(UnwindHelpOffset == getFixedObjectSize(MF, AFI, /*IsWin64*/ true,
4771 /*IsFunclet*/ false) &&
4772 "UnwindHelpOffset must be at the start of the fixed object area");
4773 int UnwindHelpFI = MFI.CreateFixedObject(/*Size*/ 8, -UnwindHelpOffset,
4774 /*IsImmutable=*/false);
4775 EHInfo.UnwindHelpFrameIdx = UnwindHelpFI;
4776
4777 MachineBasicBlock &MBB = MF.front();
4778 auto MBBI = MBB.begin();
4779 while (MBBI != MBB.end() && MBBI->getFlag(MachineInstr::FrameSetup))
4780 ++MBBI;
4781
4782 // We need to store -2 into the UnwindHelp object at the start of the
4783 // function.
4784 DebugLoc DL;
4785 RS->enterBasicBlockEnd(MBB);
4786 RS->backward(MBBI);
4787 Register DstReg = RS->FindUnusedReg(&AArch64::GPR64commonRegClass);
4788 assert(DstReg && "There must be a free register after frame setup");
4789 const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();
4790 BuildMI(MBB, MBBI, DL, TII.get(AArch64::MOVi64imm), DstReg).addImm(-2);
4791 BuildMI(MBB, MBBI, DL, TII.get(AArch64::STURXi))
4792 .addReg(DstReg, getKillRegState(true))
4793 .addFrameIndex(UnwindHelpFI)
4794 .addImm(0);
4795}
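The catch-object loop above aligns a running offset per object, then places each object at the negated cumulative offset, with UnwindHelp placed past all of them at a 16-byte boundary. A standalone model of that layout arithmetic, using made-up sizes and alignments:

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

int64_t alignTo(int64_t Value, int64_t Align) {
  return (Value + Align - 1) / Align * Align;
}

// Returns the offset assigned to each catch object (negative, into the
// fixed object area), mirroring the CurrentOffset accumulation above.
std::vector<int64_t>
layoutCatchObjects(const std::vector<std::pair<int64_t, int64_t>> &SizeAlign,
                   int64_t &CurrentOffset) {
  std::vector<int64_t> Offsets;
  for (const auto &SA : SizeAlign) {
    CurrentOffset = alignTo(CurrentOffset, SA.second); // align up first
    CurrentOffset += SA.first;                         // then reserve the size
    Offsets.push_back(-CurrentOffset);
  }
  return Offsets;
}
```

For objects of (size, align) (8, 8) and (16, 16), the offsets come out as -8 and -32, and the 8-byte UnwindHelp slot would land at -alignTo(32 + 8, 16) = -48.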
4796
4797namespace {
4798struct TagStoreInstr {
4799 MachineInstr *MI;
4800 int64_t Offset, Size;
4801 explicit TagStoreInstr(MachineInstr *MI, int64_t Offset, int64_t Size)
4802 : MI(MI), Offset(Offset), Size(Size) {}
4803};
4804
4805class TagStoreEdit {
4806 MachineFunction *MF;
4807 MachineBasicBlock *MBB;
4808 MachineRegisterInfo *MRI;
4809 // Tag store instructions that are being replaced.
4810 SmallVector<TagStoreInstr, 8> TagStores;
4811 // Combined memref arguments of the above instructions.
4812 SmallVector<MachineMemOperand *, 8> CombinedMemRefs;
4813
4814 // Replace allocation tags in [FrameReg + FrameRegOffset, FrameReg +
4815 // FrameRegOffset + Size) with the address tag of SP.
4816 Register FrameReg;
4817 StackOffset FrameRegOffset;
4818 int64_t Size;
4819 // If not std::nullopt, move FrameReg to (FrameReg + FrameRegUpdate) at the
4820 // end.
4821 std::optional<int64_t> FrameRegUpdate;
4822 // MIFlags for any FrameReg updating instructions.
4823 unsigned FrameRegUpdateFlags;
4824
4825 // Use zeroing instruction variants.
4826 bool ZeroData;
4827 DebugLoc DL;
4828
4829 void emitUnrolled(MachineBasicBlock::iterator InsertI);
4830 void emitLoop(MachineBasicBlock::iterator InsertI);
4831
4832public:
4833 TagStoreEdit(MachineBasicBlock *MBB, bool ZeroData)
4834 : MBB(MBB), ZeroData(ZeroData) {
4835 MF = MBB->getParent();
4836 MRI = &MF->getRegInfo();
4837 }
4838 // Add an instruction to be replaced. Instructions must be added in
4839 // ascending order of Offset and must be adjacent.
4840 void addInstruction(TagStoreInstr I) {
4841 assert((TagStores.empty() ||
4842 TagStores.back().Offset + TagStores.back().Size == I.Offset) &&
4843 "Non-adjacent tag store instructions.");
4844 TagStores.push_back(I);
4845 }
4846 void clear() { TagStores.clear(); }
4847 // Emit equivalent code at the given location, and erase the current set of
4848 // instructions. May skip if the replacement is not profitable. May invalidate
4849 // the input iterator and replace it with a valid one.
4850 void emitCode(MachineBasicBlock::iterator &InsertI,
4851 const AArch64FrameLowering *TFI, bool TryMergeSPUpdate);
4852};
4853
4854void TagStoreEdit::emitUnrolled(MachineBasicBlock::iterator InsertI) {
4855 const AArch64InstrInfo *TII =
4856 MF->getSubtarget<AArch64Subtarget>().getInstrInfo();
4857
4858 const int64_t kMinOffset = -256 * 16;
4859 const int64_t kMaxOffset = 255 * 16;
4860
4861 Register BaseReg = FrameReg;
4862 int64_t BaseRegOffsetBytes = FrameRegOffset.getFixed();
4863 if (BaseRegOffsetBytes < kMinOffset ||
4864 BaseRegOffsetBytes + (Size - Size % 32) > kMaxOffset ||
4865 // BaseReg can be FP, which is not necessarily aligned to 16 bytes. In
4866 // that case, BaseRegOffsetBytes will not be aligned to 16 bytes, which
4867 // is required for the offset of ST2G.
4868 BaseRegOffsetBytes % 16 != 0) {
4869 Register ScratchReg = MRI->createVirtualRegister(&AArch64::GPR64RegClass);
4870 emitFrameOffset(*MBB, InsertI, DL, ScratchReg, BaseReg,
4871 StackOffset::getFixed(BaseRegOffsetBytes), TII);
4872 BaseReg = ScratchReg;
4873 BaseRegOffsetBytes = 0;
4874 }
4875
4876 MachineInstr *LastI = nullptr;
4877 while (Size) {
4878 int64_t InstrSize = (Size > 16) ? 32 : 16;
4879 unsigned Opcode =
4880 InstrSize == 16
4881 ? (ZeroData ? AArch64::STZGi : AArch64::STGi)
4882 : (ZeroData ? AArch64::STZ2Gi : AArch64::ST2Gi);
4883 assert(BaseRegOffsetBytes % 16 == 0);
4884 MachineInstr *I = BuildMI(*MBB, InsertI, DL, TII->get(Opcode))
4885 .addReg(AArch64::SP)
4886 .addReg(BaseReg)
4887 .addImm(BaseRegOffsetBytes / 16)
4888 .setMemRefs(CombinedMemRefs);
4889 // A store to [BaseReg, #0] should go last for an opportunity to fold the
4890 // final SP adjustment in the epilogue.
4891 if (BaseRegOffsetBytes == 0)
4892 LastI = I;
4893 BaseRegOffsetBytes += InstrSize;
4894 Size -= InstrSize;
4895 }
4896
4897 if (LastI)
4898 MBB->splice(InsertI, MBB, LastI);
4899}
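The loop in `emitUnrolled` reduces to a simple schedule: 32 bytes per ST2G while more than 16 bytes remain, then one final 16-byte STG, with each immediate scaled by the 16-byte tag granule. A sketch that computes that schedule, with opcode names as plain strings rather than real opcodes:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Returns (mnemonic, scaled immediate) pairs for tagging Size bytes
// starting at BaseRegOffsetBytes (both multiples of 16).
std::vector<std::pair<std::string, int64_t>>
unrolledTagStores(int64_t Size, int64_t BaseRegOffsetBytes) {
  std::vector<std::pair<std::string, int64_t>> Out;
  while (Size) {
    int64_t InstrSize = (Size > 16) ? 32 : 16;  // prefer 32-byte ST2G
    Out.emplace_back(InstrSize == 16 ? "STG" : "ST2G",
                     BaseRegOffsetBytes / 16);  // offset in 16-byte granules
    BaseRegOffsetBytes += InstrSize;
    Size -= InstrSize;
  }
  return Out;
}
```

For 48 bytes at offset 0 this yields ST2G #0 followed by STG #2, matching the two-granule-at-a-time pattern above.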
4900
4901void TagStoreEdit::emitLoop(MachineBasicBlock::iterator InsertI) {
4902 const AArch64InstrInfo *TII =
4903 MF->getSubtarget<AArch64Subtarget>().getInstrInfo();
4904
4905 Register BaseReg = FrameRegUpdate
4906 ? FrameReg
4907 : MRI->createVirtualRegister(&AArch64::GPR64RegClass);
4908 Register SizeReg = MRI->createVirtualRegister(&AArch64::GPR64RegClass);
4909
4910 emitFrameOffset(*MBB, InsertI, DL, BaseReg, FrameReg, FrameRegOffset, TII);
4911
4912 int64_t LoopSize = Size;
4913 // If the loop size is not a multiple of 32, split off one 16-byte store at
4914 // the end to fold the BaseReg update into.
4915 if (FrameRegUpdate && *FrameRegUpdate)
4916 LoopSize -= LoopSize % 32;
4917 MachineInstr *LoopI = BuildMI(*MBB, InsertI, DL,
4918 TII->get(ZeroData ? AArch64::STZGloop_wback
4919 : AArch64::STGloop_wback))
4920 .addDef(SizeReg)
4921 .addDef(BaseReg)
4922 .addImm(LoopSize)
4923 .addReg(BaseReg)
4924 .setMemRefs(CombinedMemRefs);
4925 if (FrameRegUpdate)
4926 LoopI->setFlags(FrameRegUpdateFlags);
4927
4928 int64_t ExtraBaseRegUpdate =
4929 FrameRegUpdate ? (*FrameRegUpdate - FrameRegOffset.getFixed() - Size) : 0;
4930 LLVM_DEBUG(dbgs() << "TagStoreEdit::emitLoop: LoopSize=" << LoopSize
4931 << ", Size=" << Size
4932 << ", ExtraBaseRegUpdate=" << ExtraBaseRegUpdate
4933 << ", FrameRegUpdate=" << FrameRegUpdate
4934 << ", FrameRegOffset.getFixed()="
4935 << FrameRegOffset.getFixed() << "\n");
4936 if (LoopSize < Size) {
4937 assert(FrameRegUpdate);
4938 assert(Size - LoopSize == 16);
4939 // Tag 16 more bytes at BaseReg and update BaseReg.
4940 int64_t STGOffset = ExtraBaseRegUpdate + 16;
4941 assert(STGOffset % 16 == 0 && STGOffset >= -4096 && STGOffset <= 4080 &&
4942 "STG immediate out of range");
4943 BuildMI(*MBB, InsertI, DL,
4944 TII->get(ZeroData ? AArch64::STZGPostIndex : AArch64::STGPostIndex))
4945 .addDef(BaseReg)
4946 .addReg(BaseReg)
4947 .addReg(BaseReg)
4948 .addImm(STGOffset / 16)
4949 .setMemRefs(CombinedMemRefs)
4950 .setMIFlags(FrameRegUpdateFlags);
4951 } else if (ExtraBaseRegUpdate) {
4952 // Update BaseReg.
4953 int64_t AddSubOffset = std::abs(ExtraBaseRegUpdate);
4954 assert(AddSubOffset <= 4095 && "ADD/SUB immediate out of range");
4955 BuildMI(
4956 *MBB, InsertI, DL,
4957 TII->get(ExtraBaseRegUpdate > 0 ? AArch64::ADDXri : AArch64::SUBXri))
4958 .addDef(BaseReg)
4959 .addReg(BaseReg)
4960 .addImm(AddSubOffset)
4961 .addImm(0)
4962 .setMIFlags(FrameRegUpdateFlags);
4963 }
4964}
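The split decision in `emitLoop` can be checked in isolation: when a non-zero base register update is requested and Size is not a multiple of 32, the last 16 bytes are peeled out of the loop so the update folds into an STGPostIndex. A small model of that decision (hypothetical helper, not LLVM API):

```cpp
#include <cassert>
#include <cstdint>

struct LoopPlan {
  int64_t LoopBytes;    // Bytes tagged by the STGloop itself.
  bool HasTrailingSTG;  // One extra 16-byte post-indexed store after it.
};

LoopPlan planTagLoop(int64_t Size, int64_t FrameRegUpdate) {
  int64_t LoopBytes = Size;
  if (FrameRegUpdate != 0)
    LoopBytes -= LoopBytes % 32;  // Peel the odd 16 bytes, if any.
  return {LoopBytes, LoopBytes < Size};
}
```

With Size = 176 and a pending update, the loop tags 160 bytes and the remaining 16 go into the writeback STG; with no update requested, the loop tags all 176 bytes.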
4965
4966// Check if *II is a register update that can be merged into an STGloop that
4967// ends at (Reg + Size). On success, *TotalOffset is set to the update's
4968// offset, i.e. the required adjustment to Reg after the end of the loop.
4969bool canMergeRegUpdate(MachineBasicBlock::iterator II, unsigned Reg,
4970 int64_t Size, int64_t *TotalOffset) {
4971 MachineInstr &MI = *II;
4972 if ((MI.getOpcode() == AArch64::ADDXri ||
4973 MI.getOpcode() == AArch64::SUBXri) &&
4974 MI.getOperand(0).getReg() == Reg && MI.getOperand(1).getReg() == Reg) {
4975 unsigned Shift = AArch64_AM::getShiftValue(MI.getOperand(3).getImm());
4976 int64_t Offset = MI.getOperand(2).getImm() << Shift;
4977 if (MI.getOpcode() == AArch64::SUBXri)
4978 Offset = -Offset;
4979 int64_t PostOffset = Offset - Size;
4980 // TagStoreEdit::emitLoop might emit either an ADD/SUB after the loop, or
4981 // an STGPostIndex which does the last 16 bytes of tag write. Which one is
4982 // chosen depends on the alignment of the loop size, but the difference
4983 // between the valid ranges for the two instructions is small, so we
4984 // conservatively assume that it could be either case here.
4985 //
4986 // Max offset of STGPostIndex, minus the 16 byte tag write folded into that
4987 // instruction.
4988 const int64_t kMaxOffset = 4080 - 16;
4989 // Max offset of SUBXri.
4990 const int64_t kMinOffset = -4095;
4991 if (PostOffset <= kMaxOffset && PostOffset >= kMinOffset &&
4992 PostOffset % 16 == 0) {
4993 *TotalOffset = Offset;
4994 return true;
4995 }
4996 }
4997 return false;
4998}
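The range test in `canMergeRegUpdate` can be exercised on its own: the leftover adjustment after the loop must be 16-byte aligned and must fit the conservative intersection of the STGPostIndex and ADD/SUB immediate ranges. A direct transcription of that predicate:

```cpp
#include <cassert>
#include <cstdint>

// UpdateOffset: offset of the ADD/SUB being considered for merging.
// Size: bytes tagged by the loop ending at (Reg + Size).
bool canFoldPostOffset(int64_t UpdateOffset, int64_t Size) {
  int64_t PostOffset = UpdateOffset - Size;
  // STGPostIndex max, minus the 16-byte tag write folded into it.
  const int64_t kMaxOffset = 4080 - 16;
  // SUBXri minimum.
  const int64_t kMinOffset = -4095;
  return PostOffset <= kMaxOffset && PostOffset >= kMinOffset &&
         PostOffset % 16 == 0;
}
```

An update that exactly cancels the loop's advance (PostOffset == 0) always folds; a misaligned or out-of-range leftover forces the update to stay as a separate instruction.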
4999
5000void mergeMemRefs(const SmallVectorImpl<TagStoreInstr> &TSE,
5001 SmallVectorImpl<MachineMemOperand *> &MemRefs) {
5002 MemRefs.clear();
5003 for (auto &TS : TSE) {
5004 MachineInstr *MI = TS.MI;
5005 // An instruction without memory operands may access anything. Be
5006 // conservative and return an empty list.
5007 if (MI->memoperands_empty()) {
5008 MemRefs.clear();
5009 return;
5010 }
5011 MemRefs.append(MI->memoperands_begin(), MI->memoperands_end());
5012 }
5013}
5014
5015void TagStoreEdit::emitCode(MachineBasicBlock::iterator &InsertI,
5016 const AArch64FrameLowering *TFI,
5017 bool TryMergeSPUpdate) {
5018 if (TagStores.empty())
5019 return;
5020 TagStoreInstr &FirstTagStore = TagStores[0];
5021 TagStoreInstr &LastTagStore = TagStores[TagStores.size() - 1];
5022 Size = LastTagStore.Offset - FirstTagStore.Offset + LastTagStore.Size;
5023 DL = TagStores[0].MI->getDebugLoc();
5024
5025 Register Reg;
5026 FrameRegOffset = TFI->resolveFrameOffsetReference(
5027 *MF, FirstTagStore.Offset, false /*isFixed*/, false /*isSVE*/, Reg,
5028 /*PreferFP=*/false, /*ForSimm=*/true);
5029 FrameReg = Reg;
5030 FrameRegUpdate = std::nullopt;
5031
5032 mergeMemRefs(TagStores, CombinedMemRefs);
5033
5034 LLVM_DEBUG({
5035 dbgs() << "Replacing adjacent STG instructions:\n";
5036 for (const auto &Instr : TagStores) {
5037 dbgs() << " " << *Instr.MI;
5038 }
5039 });
5040
5041 // Size threshold where a loop becomes shorter than a linear sequence of
5042 // tagging instructions.
5043 const int kSetTagLoopThreshold = 176;
5044 if (Size < kSetTagLoopThreshold) {
5045 if (TagStores.size() < 2)
5046 return;
5047 emitUnrolled(InsertI);
5048 } else {
5049 MachineInstr *UpdateInstr = nullptr;
5050 int64_t TotalOffset = 0;
5051 if (TryMergeSPUpdate) {
5052 // See if we can merge base register update into the STGloop.
5053 // This is done in AArch64LoadStoreOptimizer for "normal" stores,
5054 // but STGloop is way too unusual for that, and also it only
5055 // realistically happens in function epilogue. Also, STGloop is expanded
5056 // before that pass.
5057 if (InsertI != MBB->end() &&
5058 canMergeRegUpdate(InsertI, FrameReg, FrameRegOffset.getFixed() + Size,
5059 &TotalOffset)) {
5060 UpdateInstr = &*InsertI++;
5061 LLVM_DEBUG(dbgs() << "Folding SP update into loop:\n "
5062 << *UpdateInstr);
5063 }
5064 }
5065
5066 if (!UpdateInstr && TagStores.size() < 2)
5067 return;
5068
5069 if (UpdateInstr) {
5070 FrameRegUpdate = TotalOffset;
5071 FrameRegUpdateFlags = UpdateInstr->getFlags();
5072 }
5073 emitLoop(InsertI);
5074 if (UpdateInstr)
5075 UpdateInstr->eraseFromParent();
5076 }
5077
5078 for (auto &TS : TagStores)
5079 TS.MI->eraseFromParent();
5080}
5081
5082bool isMergeableStackTaggingInstruction(MachineInstr &MI, int64_t &Offset,
5083 int64_t &Size, bool &ZeroData) {
5084 MachineFunction &MF = *MI.getParent()->getParent();
5085 const MachineFrameInfo &MFI = MF.getFrameInfo();
5086
5087 unsigned Opcode = MI.getOpcode();
5088 ZeroData = (Opcode == AArch64::STZGloop || Opcode == AArch64::STZGi ||
5089 Opcode == AArch64::STZ2Gi);
5090
5091 if (Opcode == AArch64::STGloop || Opcode == AArch64::STZGloop) {
5092 if (!MI.getOperand(0).isDead() || !MI.getOperand(1).isDead())
5093 return false;
5094 if (!MI.getOperand(2).isImm() || !MI.getOperand(3).isFI())
5095 return false;
5096 Offset = MFI.getObjectOffset(MI.getOperand(3).getIndex());
5097 Size = MI.getOperand(2).getImm();
5098 return true;
5099 }
5100
5101 if (Opcode == AArch64::STGi || Opcode == AArch64::STZGi)
5102 Size = 16;
5103 else if (Opcode == AArch64::ST2Gi || Opcode == AArch64::STZ2Gi)
5104 Size = 32;
5105 else
5106 return false;
5107
5108 if (MI.getOperand(0).getReg() != AArch64::SP || !MI.getOperand(1).isFI())
5109 return false;
5110
5111 Offset = MFI.getObjectOffset(MI.getOperand(1).getIndex()) +
5112 16 * MI.getOperand(2).getImm();
5113 return true;
5114}
5115
5116// Detect a run of memory tagging instructions for adjacent stack frame slots,
5117// and replace them with a shorter instruction sequence:
5118// * replace STG + STG with ST2G
5119// * replace STGloop + STGloop with STGloop
5120// This code needs to run when stack slot offsets are already known, but before
5121// FrameIndex operands in STG instructions are eliminated.
5122MachineBasicBlock::iterator tryMergeAdjacentSTG(MachineBasicBlock::iterator II,
5123 const AArch64FrameLowering *TFI,
5124 RegScavenger *RS) {
5125 bool FirstZeroData;
5126 int64_t Size, Offset;
5127 MachineInstr &MI = *II;
5128 MachineBasicBlock *MBB = MI.getParent();
5130 if (&MI == &MBB->instr_back())
5131 return II;
5132 if (!isMergeableStackTaggingInstruction(MI, Offset, Size, FirstZeroData))
5133 return II;
5135 SmallVector<TagStoreInstr, 8> Instrs;
5134
5136 Instrs.emplace_back(&MI, Offset, Size);
5137
5138 constexpr int kScanLimit = 10;
5139 int Count = 0;
5140 for (MachineBasicBlock::iterator NextI = std::next(II), E = MBB->end();
5141 NextI != E && Count < kScanLimit; ++NextI) {
5142 MachineInstr &MI = *NextI;
5143 bool ZeroData;
5144 int64_t Size, Offset;
5145 // Collect instructions that update memory tags with a FrameIndex operand
5146 // and (when applicable) constant size, and whose output registers are dead
5147 // (the latter is almost always the case in practice). Since these
5148 // instructions effectively have no inputs or outputs, we are free to skip
5149 // any non-aliasing instructions in between without tracking used registers.
5150 if (isMergeableStackTaggingInstruction(MI, Offset, Size, ZeroData)) {
5151 if (ZeroData != FirstZeroData)
5152 break;
5153 Instrs.emplace_back(&MI, Offset, Size);
5154 continue;
5155 }
5156
5157 // Only count non-transient, non-tagging instructions toward the scan
5158 // limit.
5159 if (!MI.isTransient())
5160 ++Count;
5161
5162 // Just in case, stop before the epilogue code starts.
5163 if (MI.getFlag(MachineInstr::FrameSetup) ||
5164 MI.getFlag(MachineInstr::FrameDestroy))
5165 break;
5166
5167 // Reject anything that may alias the collected instructions.
5168 if (MI.mayLoadOrStore() || MI.hasUnmodeledSideEffects() || MI.isCall())
5169 break;
5170 }
5171
5172 // New code will be inserted after the last tagging instruction we've found.
5173 MachineBasicBlock::iterator InsertI = Instrs.back().MI;
5174
5175 // All the gathered stack tag instructions are merged and placed after the
5176 // last tag store in the list. We must check whether the NZCV flag is live
5177 // at the point where we are trying to insert; otherwise it might get
5178 // clobbered if any STG loops are present.
5179
5180 // FIXME: This bail-out is conservative in some ways: the liveness check is
5181 // performed even when the merged sequence contains no STG loops, where it
5182 // is not needed.
5183 LiveRegUnits LiveRegs(*(MBB->getParent()->getSubtarget().getRegisterInfo()));
5184 LiveRegs.addLiveOuts(*MBB);
5185 for (auto I = MBB->rbegin();; ++I) {
5186 MachineInstr &MI = *I;
5187 if (MI == InsertI)
5188 break;
5189 LiveRegs.stepBackward(*I);
5190 }
5191 InsertI++;
5192 if (LiveRegs.contains(AArch64::NZCV))
5193 return InsertI;
5194
5195 llvm::stable_sort(Instrs,
5196 [](const TagStoreInstr &Left, const TagStoreInstr &Right) {
5197 return Left.Offset < Right.Offset;
5198 });
5199
5200 // Make sure that we don't have any overlapping stores.
5201 int64_t CurOffset = Instrs[0].Offset;
5202 for (auto &Instr : Instrs) {
5203 if (CurOffset > Instr.Offset)
5204 return NextI;
5205 CurOffset = Instr.Offset + Instr.Size;
5206 }
5207
5208 // Find contiguous runs of tagged memory and emit shorter instruction
5209 // sequences for them when possible.
5210 TagStoreEdit TSE(MBB, FirstZeroData);
5211 std::optional<int64_t> EndOffset;
5212 for (auto &Instr : Instrs) {
5213 if (EndOffset && *EndOffset != Instr.Offset) {
5214 // Found a gap.
5215 TSE.emitCode(InsertI, TFI, /*TryMergeSPUpdate = */ false);
5216 TSE.clear();
5217 }
5218
5219 TSE.addInstruction(Instr);
5220 EndOffset = Instr.Offset + Instr.Size;
5221 }
5222
5223 const MachineFunction *MF = MBB->getParent();
5224 // Multiple FP/SP updates in a loop cannot be described by CFI instructions.
5225 TSE.emitCode(
5226 InsertI, TFI, /*TryMergeSPUpdate = */
5227 !MF->getInfo<AArch64FunctionInfo>()->needsAsyncDwarfUnwindInfo(*MF));
5228
5229 return InsertI;
5230}
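Before merging, `tryMergeAdjacentSTG` sorts the collected stores by offset and rejects any overlap; `TagStoreEdit` then requires exact adjacency between consecutive entries. That precondition can be modeled separately (`Store` is a stand-in for `TagStoreInstr`):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct Store {
  int64_t Offset, Size;
};

// Sorts by Offset and verifies that no store begins before the previous
// one ends, mirroring the overlap check in tryMergeAdjacentSTG.
bool sortAndCheckNoOverlap(std::vector<Store> &Stores) {
  std::stable_sort(Stores.begin(), Stores.end(),
                   [](const Store &L, const Store &R) {
                     return L.Offset < R.Offset;
                   });
  int64_t CurOffset = Stores.empty() ? 0 : Stores.front().Offset;
  for (const Store &S : Stores) {
    if (CurOffset > S.Offset)
      return false;  // Overlapping stores: bail out of the merge.
    CurOffset = S.Offset + S.Size;
  }
  return true;
}
```

Gaps between runs are still allowed here; they are handled later by flushing the current `TagStoreEdit` and starting a new one.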
5231} // namespace
5232
5233void AArch64FrameLowering::processFunctionBeforeFrameIndicesReplaced(
5234 MachineFunction &MF, RegScavenger *RS = nullptr) const {
5235 for (auto &BB : MF)
5236 for (MachineBasicBlock::iterator II = BB.begin(); II != BB.end();) {
5237 if (StackTaggingMergeSetTag)
5238 II = tryMergeAdjacentSTG(II, this, RS);
5239 }
5240
5241 // By the time this method is called, most of the prologue/epilogue code is
5242 // already emitted, whether its location was affected by the shrink-wrapping
5243 // optimization or not.
5244 if (!MF.getFunction().hasFnAttribute(Attribute::Naked) &&
5247}
5248
5249/// For Win64 AArch64 EH, the offset to the Unwind object is from the SP
5250/// before the update. This is easily retrieved as it is exactly the offset
5251/// that is set in processFunctionBeforeFrameFinalized.
5253 const MachineFunction &MF, int FI, Register &FrameReg,
5254 bool IgnoreSPUpdates) const {
5255 const MachineFrameInfo &MFI = MF.getFrameInfo();
5256 if (IgnoreSPUpdates) {
5257 LLVM_DEBUG(dbgs() << "Offset from the SP for " << FI << " is "
5258 << MFI.getObjectOffset(FI) << "\n");
5259 FrameReg = AArch64::SP;
5260 return StackOffset::getFixed(MFI.getObjectOffset(FI));
5261 }
5262
5263 // Go to common code if we cannot provide sp + offset.
5264 if (MFI.hasVarSizedObjects() ||
5265 MF.getInfo<AArch64FunctionInfo>()->getStackSizeSVE() ||
5266 MF.getSubtarget().getRegisterInfo()->hasStackRealignment(MF))
5267 return getFrameIndexReference(MF, FI, FrameReg);
5268
5269 FrameReg = AArch64::SP;
5270 return getStackOffset(MF, MFI.getObjectOffset(FI));
5271}
5272
5273/// The parent frame offset (aka dispFrame) is only used on X86_64 to retrieve
5274/// the parent's frame pointer.
5276 const MachineFunction &MF) const {
5277 return 0;
5278}
5279
5280/// Funclets only need to account for space for the callee saved registers,
5281/// as the locals are accounted for in the parent's stack frame.
5283 const MachineFunction &MF) const {
5284 // This is the size of the pushed CSRs.
5285 unsigned CSSize =
5286 MF.getInfo<AArch64FunctionInfo>()->getCalleeSavedStackSize();
5287 // This is the amount of stack a funclet needs to allocate.
5288 return alignTo(CSSize + MF.getFrameInfo().getMaxCallFrameSize(),
5289 getStackAlign());
5290}
5291
5292namespace {
5293struct FrameObject {
5294 bool IsValid = false;
5295 // Index of the object in MFI.
5296 int ObjectIndex = 0;
5297 // Group ID this object belongs to.
5298 int GroupIndex = -1;
5299 // This object should be placed first (closest to SP).
5300 bool ObjectFirst = false;
5301 // This object's group (which always contains the object with
5302 // ObjectFirst==true) should be placed first.
5303 bool GroupFirst = false;
5304
5305 // Used to distinguish between FP and GPR accesses. The values are decided so
5306 // that they sort FPR < Hazard < GPR and they can be or'd together.
5307 unsigned Accesses = 0;
5308 enum { AccessFPR = 1, AccessHazard = 2, AccessGPR = 4 };
5309};
5310
5311class GroupBuilder {
5312 SmallVector<int, 8> CurrentMembers;
5313 int NextGroupIndex = 0;
5314 std::vector<FrameObject> &Objects;
5315
5316public:
5317 GroupBuilder(std::vector<FrameObject> &Objects) : Objects(Objects) {}
5318 void AddMember(int Index) { CurrentMembers.push_back(Index); }
5319 void EndCurrentGroup() {
5320 if (CurrentMembers.size() > 1) {
5321 // Create a new group with the current member list. This might remove them
5322 // from their pre-existing groups. That's OK, dealing with overlapping
5323 // groups is too hard and unlikely to make a difference.
5324 LLVM_DEBUG(dbgs() << "group:");
5325 for (int Index : CurrentMembers) {
5326 Objects[Index].GroupIndex = NextGroupIndex;
5327 LLVM_DEBUG(dbgs() << " " << Index);
5328 }
5329 LLVM_DEBUG(dbgs() << "\n");
5330 NextGroupIndex++;
5331 }
5332 CurrentMembers.clear();
5333 }
5334};
5335
5336bool FrameObjectCompare(const FrameObject &A, const FrameObject &B) {
5337 // Objects at a lower index are closer to FP; objects at a higher index are
5338 // closer to SP.
5339 //
5340 // For consistency in our comparison, all invalid objects are placed
5341 // at the end. This also allows us to stop walking when we hit the
5342 // first invalid item after it's all sorted.
5343 //
5344 // If we want to include a stack hazard region, order FPR accesses < the
5345 // hazard object < GPRs accesses in order to create a separation between the
5346 // two. For the Accesses field 1 = FPR, 2 = Hazard Object, 4 = GPR.
5347 //
5348 // Otherwise the "first" object goes first (closest to SP), followed by the
5349 // members of the "first" group.
5350 //
5351 // The rest are sorted by the group index to keep the groups together.
5352 // Higher numbered groups are more likely to be around longer (i.e. untagged
5353 // in the function epilogue and not at some earlier point). Place them closer
5354 // to SP.
5355 //
5356 // If all else equal, sort by the object index to keep the objects in the
5357 // original order.
5358 return std::make_tuple(!A.IsValid, A.Accesses, A.ObjectFirst, A.GroupFirst,
5359 A.GroupIndex, A.ObjectIndex) <
5360 std::make_tuple(!B.IsValid, B.Accesses, B.ObjectFirst, B.GroupFirst,
5361 B.GroupIndex, B.ObjectIndex);
5362}
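The tuple comparison in `FrameObjectCompare` gives `stable_sort` a strict weak ordering in which invalid objects sink to the end and accesses sort FPR < Hazard < GPR. A reduced version with the same key shape (`Key` keeps only the sort fields of `FrameObject`):

```cpp
#include <cassert>
#include <tuple>

struct Key {
  bool IsValid = false;
  unsigned Accesses = 0;  // 1 = FPR, 2 = Hazard, 4 = GPR.
  bool ObjectFirst = false, GroupFirst = false;
  int GroupIndex = -1, ObjectIndex = 0;
};

// Same ordering as FrameObjectCompare: !IsValid pushes invalid objects
// to the end; the remaining fields break ties in priority order.
bool frameKeyLess(const Key &A, const Key &B) {
  return std::make_tuple(!A.IsValid, A.Accesses, A.ObjectFirst, A.GroupFirst,
                         A.GroupIndex, A.ObjectIndex) <
         std::make_tuple(!B.IsValid, B.Accesses, B.ObjectFirst, B.GroupFirst,
                         B.GroupIndex, B.ObjectIndex);
}
```

Encoding the priorities as tuple positions avoids a hand-written cascade of if/else comparisons and makes it easy to verify that the ordering is consistent.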
5363} // namespace
5364
5366 const MachineFunction &MF, SmallVectorImpl<int> &ObjectsToAllocate) const {
5367 if (!OrderFrameObjects || ObjectsToAllocate.empty())
5368 return;
5370 const AArch64FunctionInfo &AFI = *MF.getInfo<AArch64FunctionInfo>();
5369
5371 const MachineFrameInfo &MFI = MF.getFrameInfo();
5372 std::vector<FrameObject> FrameObjects(MFI.getObjectIndexEnd());
5373 for (auto &Obj : ObjectsToAllocate) {
5374 FrameObjects[Obj].IsValid = true;
5375 FrameObjects[Obj].ObjectIndex = Obj;
5376 }
5377
5378 // Identify FPR vs GPR slots for hazards, and stack slots that are tagged at
5379 // the same time.
5380 GroupBuilder GB(FrameObjects);
5381 for (auto &MBB : MF) {
5382 for (auto &MI : MBB) {
5383 if (MI.isDebugInstr())
5384 continue;
5385
5386 if (AFI.hasStackHazardSlotIndex()) {
5387 std::optional<int> FI = getLdStFrameID(MI, MFI);
5388 if (FI && *FI >= 0 && *FI < (int)FrameObjects.size()) {
5389 if (MFI.getStackID(*FI) == TargetStackID::ScalableVector ||
5390 AArch64InstrInfo::isFpOrNEON(MI))
5391 FrameObjects[*FI].Accesses |= FrameObject::AccessFPR;
5392 else
5393 FrameObjects[*FI].Accesses |= FrameObject::AccessGPR;
5394 }
5395 }
5396
5397 int OpIndex;
5398 switch (MI.getOpcode()) {
5399 case AArch64::STGloop:
5400 case AArch64::STZGloop:
5401 OpIndex = 3;
5402 break;
5403 case AArch64::STGi:
5404 case AArch64::STZGi:
5405 case AArch64::ST2Gi:
5406 case AArch64::STZ2Gi:
5407 OpIndex = 1;
5408 break;
5409 default:
5410 OpIndex = -1;
5411 }
5412
5413 int TaggedFI = -1;
5414 if (OpIndex >= 0) {
5415 const MachineOperand &MO = MI.getOperand(OpIndex);
5416 if (MO.isFI()) {
5417 int FI = MO.getIndex();
5418 if (FI >= 0 && FI < MFI.getObjectIndexEnd() &&
5419 FrameObjects[FI].IsValid)
5420 TaggedFI = FI;
5421 }
5422 }
5423
5424 // If this is a stack tagging instruction for a slot that is not part of a
5425 // group yet, either start a new group or add it to the current one.
5426 if (TaggedFI >= 0)
5427 GB.AddMember(TaggedFI);
5428 else
5429 GB.EndCurrentGroup();
5430 }
5431 // Groups should never span multiple basic blocks.
5432 GB.EndCurrentGroup();
5433 }
5434
5435 if (AFI.hasStackHazardSlotIndex()) {
5436 FrameObjects[AFI.getStackHazardSlotIndex()].Accesses =
5437 FrameObject::AccessHazard;
5438 // If a stack object is unknown or both GPR and FPR, sort it into GPR.
5439 for (auto &Obj : FrameObjects)
5440 if (!Obj.Accesses ||
5441 Obj.Accesses == (FrameObject::AccessGPR | FrameObject::AccessFPR))
5442 Obj.Accesses = FrameObject::AccessGPR;
5443 }
5444
5445 // If the function's tagged base pointer is pinned to a stack slot, we want to
5446 // put that slot first when possible. This will likely place it at SP + 0,
5447 // and save one instruction when generating the base pointer because IRG does
5448 // not allow an immediate offset.
5449 std::optional<int> TBPI = AFI.getTaggedBasePointerIndex();
5450 if (TBPI) {
5451 FrameObjects[*TBPI].ObjectFirst = true;
5452 FrameObjects[*TBPI].GroupFirst = true;
5453 int FirstGroupIndex = FrameObjects[*TBPI].GroupIndex;
5454 if (FirstGroupIndex >= 0)
5455 for (FrameObject &Object : FrameObjects)
5456 if (Object.GroupIndex == FirstGroupIndex)
5457 Object.GroupFirst = true;
5458 }
5459
5460 llvm::stable_sort(FrameObjects, FrameObjectCompare);
5461
5462 int i = 0;
5463 for (auto &Obj : FrameObjects) {
5464 // All invalid items are sorted at the end, so it's safe to stop.
5465 if (!Obj.IsValid)
5466 break;
5467 ObjectsToAllocate[i++] = Obj.ObjectIndex;
5468 }
5469
5470 LLVM_DEBUG({
5471 dbgs() << "Final frame order:\n";
5472 for (auto &Obj : FrameObjects) {
5473 if (!Obj.IsValid)
5474 break;
5475 dbgs() << " " << Obj.ObjectIndex << ": group " << Obj.GroupIndex;
5476 if (Obj.ObjectFirst)
5477 dbgs() << ", first";
5478 if (Obj.GroupFirst)
5479 dbgs() << ", group-first";
5480 dbgs() << "\n";
5481 }
5482 });
5483}
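The stable sort above relies on FrameObjectCompare, which is defined earlier in this file and not shown in this excerpt. As an illustrative sketch only (not the exact comparator), the ordering it must produce — invalid objects last, pinned "first" objects at the front, members of a group kept together — can be modelled with tuple comparison over a simplified stand-in struct:

```cpp
#include <algorithm>
#include <cassert>
#include <tuple>
#include <vector>

namespace {
// Simplified stand-in for FrameObject; field names mirror those used in
// orderFrameObjects, but this comparator is a sketch, not the real one.
struct Obj {
  bool IsValid = true;
  bool ObjectFirst = false;
  bool GroupFirst = false;
  int GroupIndex = -1;
  int ObjectIndex = 0;
};

bool Compare(const Obj &A, const Obj &B) {
  // Invalid objects sort to the end; "first" objects sort to the front;
  // otherwise keep members of the same group adjacent, then by index.
  return std::make_tuple(!A.IsValid, !A.ObjectFirst, !A.GroupFirst,
                         A.GroupIndex, A.ObjectIndex) <
         std::make_tuple(!B.IsValid, !B.ObjectFirst, !B.GroupFirst,
                         B.GroupIndex, B.ObjectIndex);
}
} // namespace

// Returns the ObjectIndex that ends up first after sorting.
int firstValidIndex(std::vector<Obj> Objs) {
  std::stable_sort(Objs.begin(), Objs.end(), Compare);
  return Objs.front().ObjectIndex;
}
```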
5484
5485/// Emit a loop to decrement SP until it is equal to TargetReg, with probes at
5486/// least every ProbeSize bytes. Returns an iterator of the first instruction
5487/// after the loop. The difference between SP and TargetReg must be an exact
5488/// multiple of ProbeSize.
5489 MachineBasicBlock::iterator
5490AArch64FrameLowering::inlineStackProbeLoopExactMultiple(
5491 MachineBasicBlock::iterator MBBI, int64_t ProbeSize,
5492 Register TargetReg) const {
5493 MachineBasicBlock &MBB = *MBBI->getParent();
5494 MachineFunction &MF = *MBB.getParent();
5495 const AArch64InstrInfo *TII =
5496 MF.getSubtarget<AArch64Subtarget>().getInstrInfo();
5497 DebugLoc DL = MBB.findDebugLoc(MBBI);
5498
5499 MachineFunction::iterator MBBInsertPoint = std::next(MBB.getIterator());
5500 MachineBasicBlock *LoopMBB = MF.CreateMachineBasicBlock(MBB.getBasicBlock());
5501 MF.insert(MBBInsertPoint, LoopMBB);
5502 MachineBasicBlock *ExitMBB = MF.CreateMachineBasicBlock(MBB.getBasicBlock());
5503 MF.insert(MBBInsertPoint, ExitMBB);
5504
5505 // SUB SP, SP, #ProbeSize (or equivalent if ProbeSize is not encodable
5506 // in SUB).
5507 emitFrameOffset(*LoopMBB, LoopMBB->end(), DL, AArch64::SP, AArch64::SP,
5508 StackOffset::getFixed(-ProbeSize), TII,
5509 MachineInstr::FrameSetup);
5510 // STR XZR, [SP]
5511 BuildMI(*LoopMBB, LoopMBB->end(), DL, TII->get(AArch64::STRXui))
5512 .addReg(AArch64::XZR)
5513 .addReg(AArch64::SP)
5514 .addImm(0)
5515 .setMIFlags(MachineInstr::FrameSetup);
5516 // CMP SP, TargetReg
5517 BuildMI(*LoopMBB, LoopMBB->end(), DL, TII->get(AArch64::SUBSXrx64),
5518 AArch64::XZR)
5519 .addReg(AArch64::SP)
5520 .addReg(TargetReg)
5521 .addImm(AArch64_AM::getArithExtendImm(AArch64_AM::UXTX, 0))
5522 .setMIFlags(MachineInstr::FrameSetup);
5523 // B.CC Loop
5524 BuildMI(*LoopMBB, LoopMBB->end(), DL, TII->get(AArch64::Bcc))
5525 .addImm(AArch64CC::NE)
5526 .addMBB(LoopMBB)
5527 .setMIFlags(MachineInstr::FrameSetup);
5528
5529 LoopMBB->addSuccessor(ExitMBB);
5530 LoopMBB->addSuccessor(LoopMBB);
5531 // Synthesize the exit MBB.
5532 ExitMBB->splice(ExitMBB->end(), &MBB, MBBI, MBB.end());
5533 ExitMBB->transferSuccessorsAndUpdatePHIs(&MBB);
5534 MBB.addSuccessor(LoopMBB);
5535 // Update liveins.
5536 fullyRecomputeLiveIns({ExitMBB, LoopMBB});
5537
5538 return ExitMBB->begin();
5539}
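As the doc comment above states, the loop probes every ProbeSize bytes and requires the SP-to-TargetReg distance to be an exact multiple of ProbeSize, so it executes one SUB+STR pair per block. A trivial sketch of that iteration count (illustrative only; probeLoopIterations is not part of the LLVM sources):

```cpp
#include <cassert>
#include <cstdint>

// Number of iterations of the probe loop emitted by
// inlineStackProbeLoopExactMultiple: one SUB SP + STR XZR per ProbeSize
// block. The distance must be an exact multiple, as the contract requires.
int64_t probeLoopIterations(int64_t Distance, int64_t ProbeSize) {
  assert(Distance % ProbeSize == 0 && "distance must be an exact multiple");
  return Distance / ProbeSize;
}
```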
5540
5541void AArch64FrameLowering::inlineStackProbeFixed(
5542 MachineBasicBlock::iterator MBBI, Register ScratchReg, int64_t FrameSize,
5543 StackOffset CFAOffset) const {
5544 MachineBasicBlock *MBB = MBBI->getParent();
5545 MachineFunction &MF = *MBB->getParent();
5546 const AArch64InstrInfo *TII =
5547 MF.getSubtarget<AArch64Subtarget>().getInstrInfo();
5548 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
5549 bool EmitAsyncCFI = AFI->needsAsyncDwarfUnwindInfo(MF);
5550 bool HasFP = hasFP(MF);
5551
5552 DebugLoc DL;
5553 int64_t ProbeSize = MF.getInfo<AArch64FunctionInfo>()->getStackProbeSize();
5554 int64_t NumBlocks = FrameSize / ProbeSize;
5555 int64_t ResidualSize = FrameSize % ProbeSize;
5556
5557 LLVM_DEBUG(dbgs() << "Stack probing: total " << FrameSize << " bytes, "
5558 << NumBlocks << " blocks of " << ProbeSize
5559 << " bytes, plus " << ResidualSize << " bytes\n");
5560
5561 // Decrement SP by NumBlocks * ProbeSize bytes, with either an unrolled or
5562 // an ordinary loop.
5563 if (NumBlocks <= AArch64::StackProbeMaxLoopUnroll) {
5564 for (int i = 0; i < NumBlocks; ++i) {
5565 // SUB SP, SP, #ProbeSize (or equivalent if ProbeSize is not
5566 // encodable in a SUB).
5567 emitFrameOffset(*MBB, MBBI, DL, AArch64::SP, AArch64::SP,
5568 StackOffset::getFixed(-ProbeSize), TII,
5569 MachineInstr::FrameSetup, false, false, nullptr,
5570 EmitAsyncCFI && !HasFP, CFAOffset);
5571 CFAOffset += StackOffset::getFixed(ProbeSize);
5572 // STR XZR, [SP]
5573 BuildMI(*MBB, MBBI, DL, TII->get(AArch64::STRXui))
5574 .addReg(AArch64::XZR)
5575 .addReg(AArch64::SP)
5576 .addImm(0)
5577 .setMIFlags(MachineInstr::FrameSetup);
5578 }
5579 } else if (NumBlocks != 0) {
5580 // SUB ScratchReg, SP, #FrameSize (or equivalent if FrameSize is not
5581 // encodable in ADD). ScratchReg may temporarily become the CFA register.
5582 emitFrameOffset(*MBB, MBBI, DL, ScratchReg, AArch64::SP,
5583 StackOffset::getFixed(-ProbeSize * NumBlocks), TII,
5584 MachineInstr::FrameSetup, false, false, nullptr,
5585 EmitAsyncCFI && !HasFP, CFAOffset);
5586 CFAOffset += StackOffset::getFixed(ProbeSize * NumBlocks);
5587 MBBI = inlineStackProbeLoopExactMultiple(MBBI, ProbeSize, ScratchReg);
5588 MBB = MBBI->getParent();
5589 if (EmitAsyncCFI && !HasFP) {
5590 // Set the CFA register back to SP.
5591 CFIInstBuilder(*MBB, MBBI, MachineInstr::FrameSetup)
5592 .buildDefCFARegister(AArch64::SP);
5593 }
5594 }
5595
5596 if (ResidualSize != 0) {
5597 // SUB SP, SP, #ResidualSize (or equivalent if ResidualSize is not encodable
5598 // in SUB).
5599 emitFrameOffset(*MBB, MBBI, DL, AArch64::SP, AArch64::SP,
5600 StackOffset::getFixed(-ResidualSize), TII,
5601 MachineInstr::FrameSetup, false, false, nullptr,
5602 EmitAsyncCFI && !HasFP, CFAOffset);
5603 if (ResidualSize > AArch64::StackProbeMaxUnprobedStack) {
5604 // STR XZR, [SP]
5605 BuildMI(*MBB, MBBI, DL, TII->get(AArch64::STRXui))
5606 .addReg(AArch64::XZR)
5607 .addReg(AArch64::SP)
5608 .addImm(0)
5609 .setMIFlags(MachineInstr::FrameSetup);
5610 }
5611 }
5612}
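inlineStackProbeFixed splits FrameSize into ProbeSize blocks plus a residual, unrolls the block probes when there are few of them, and probes the residual only when it exceeds the largest allowed unprobed tail. A minimal sketch of that plan, using placeholder values for the two limits (the real values come from AArch64::StackProbeMaxLoopUnroll and AArch64::StackProbeMaxUnprobedStack):

```cpp
#include <cstdint>

// Placeholder limits; the real constants live in the AArch64 backend.
constexpr int64_t MaxLoopUnroll = 4;       // AArch64::StackProbeMaxLoopUnroll
constexpr int64_t MaxUnprobedStack = 1024; // AArch64::StackProbeMaxUnprobedStack

struct ProbePlan {
  int64_t NumBlocks;    // full ProbeSize decrements
  bool UseLoop;         // emit a probe loop instead of unrolling
  int64_t ResidualSize; // final SP adjustment below the last block
  bool ProbeResidual;   // store to [SP] after the residual adjustment
};

// Mirrors the arithmetic at the top of inlineStackProbeFixed; planStackProbes
// itself is an illustrative helper, not an LLVM function.
ProbePlan planStackProbes(int64_t FrameSize, int64_t ProbeSize) {
  ProbePlan P;
  P.NumBlocks = FrameSize / ProbeSize;
  P.ResidualSize = FrameSize % ProbeSize;
  P.UseLoop = P.NumBlocks > MaxLoopUnroll;
  P.ProbeResidual = P.ResidualSize > MaxUnprobedStack;
  return P;
}
```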
5613
5614void AArch64FrameLowering::inlineStackProbe(MachineFunction &MF,
5615 MachineBasicBlock &MBB) const {
5616 // Get the instructions that need to be replaced. We emit at most two of
5617 // these. Remember them in order to avoid complications coming from the need
5618 // to traverse the block while potentially creating more blocks.
5619 SmallVector<MachineInstr *, 4> ToReplace;
5620 for (MachineInstr &MI : MBB)
5621 if (MI.getOpcode() == AArch64::PROBED_STACKALLOC ||
5622 MI.getOpcode() == AArch64::PROBED_STACKALLOC_VAR)
5623 ToReplace.push_back(&MI);
5624
5625 for (MachineInstr *MI : ToReplace) {
5626 if (MI->getOpcode() == AArch64::PROBED_STACKALLOC) {
5627 Register ScratchReg = MI->getOperand(0).getReg();
5628 int64_t FrameSize = MI->getOperand(1).getImm();
5629 StackOffset CFAOffset = StackOffset::get(MI->getOperand(2).getImm(),
5630 MI->getOperand(3).getImm());
5631 inlineStackProbeFixed(MI->getIterator(), ScratchReg, FrameSize,
5632 CFAOffset);
5633 } else {
5634 assert(MI->getOpcode() == AArch64::PROBED_STACKALLOC_VAR &&
5635 "Stack probe pseudo-instruction expected");
5636 const AArch64InstrInfo *TII =
5637 MI->getMF()->getSubtarget<AArch64Subtarget>().getInstrInfo();
5638 Register TargetReg = MI->getOperand(0).getReg();
5639 (void)TII->probedStackAlloc(MI->getIterator(), TargetReg, true);
5640 }
5641 MI->eraseFromParent();
5642 }
5643}
5644
5645 struct StackAccess {
5646 enum AccessType {
5647 NotAccessed = 0, // Stack object not accessed by load/store instructions.
5648 GPR = 1 << 0, // A general purpose register.
5649 PPR = 1 << 1, // A predicate register.
5650 FPR = 1 << 2, // A floating point/Neon/SVE register.
5651 };
5652
5653 int Idx;
5654 StackOffset Offset;
5655 int64_t Size;
5656 unsigned AccessTypes;
5657
5658 StackAccess() : Idx(0), Offset(), Size(0), AccessTypes(NotAccessed) {}
5659
5660 bool operator<(const StackAccess &Rhs) const {
5661 return std::make_tuple(start(), Idx) <
5662 std::make_tuple(Rhs.start(), Rhs.Idx);
5663 }
5664
5665 bool isCPU() const {
5666 // Predicate register load and store instructions execute on the CPU.
5667 return AccessTypes & (AccessType::GPR | AccessType::PPR);
5668 }
5669 bool isSME() const { return AccessTypes & AccessType::FPR; }
5670 bool isMixed() const { return isCPU() && isSME(); }
5671
5672 int64_t start() const { return Offset.getFixed() + Offset.getScalable(); }
5673 int64_t end() const { return start() + Size; }
5674
5675 std::string getTypeString() const {
5676 switch (AccessTypes) {
5677 case AccessType::FPR:
5678 return "FPR";
5679 case AccessType::PPR:
5680 return "PPR";
5681 case AccessType::GPR:
5682 return "GPR";
5683 case AccessType::NotAccessed:
5684 return "NA";
5685 default:
5686 return "Mixed";
5687 }
5688 }
5689
5690 void print(raw_ostream &OS) const {
5691 OS << getTypeString() << " stack object at [SP"
5692 << (Offset.getFixed() < 0 ? "" : "+") << Offset.getFixed();
5693 if (Offset.getScalable())
5694 OS << (Offset.getScalable() < 0 ? "" : "+") << Offset.getScalable()
5695 << " * vscale";
5696 OS << "]";
5697 }
5698};
5699
5700static inline raw_ostream &operator<<(raw_ostream &OS, const StackAccess &SA) {
5701 SA.print(OS);
5702 return OS;
5703}
5704
5705void AArch64FrameLowering::emitRemarks(
5706 const MachineFunction &MF, MachineOptimizationRemarkEmitter *ORE) const {
5707
5708 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
5709 if (AFI->getSMEFnAttrs().hasNonStreamingInterfaceAndBody())
5710 return;
5711
5712 unsigned StackHazardSize = getStackHazardSize(MF);
5713 const uint64_t HazardSize =
5714 (StackHazardSize) ? StackHazardSize : StackHazardRemarkSize;
5715
5716 if (HazardSize == 0)
5717 return;
5718
5719 const MachineFrameInfo &MFI = MF.getFrameInfo();
5720 // Bail if function has no stack objects.
5721 if (!MFI.hasStackObjects())
5722 return;
5723
5724 std::vector<StackAccess> StackAccesses(MFI.getNumObjects());
5725
5726 size_t NumFPLdSt = 0;
5727 size_t NumNonFPLdSt = 0;
5728
5729 // Collect stack accesses via Load/Store instructions.
5730 for (const MachineBasicBlock &MBB : MF) {
5731 for (const MachineInstr &MI : MBB) {
5732 if (!MI.mayLoadOrStore() || MI.getNumMemOperands() < 1)
5733 continue;
5734 for (MachineMemOperand *MMO : MI.memoperands()) {
5735 std::optional<int> FI = getMMOFrameID(MMO, MFI);
5736 if (FI && !MFI.isDeadObjectIndex(*FI)) {
5737 int FrameIdx = *FI;
5738
5739 size_t ArrIdx = FrameIdx + MFI.getNumFixedObjects();
5740 if (StackAccesses[ArrIdx].AccessTypes == StackAccess::NotAccessed) {
5741 StackAccesses[ArrIdx].Idx = FrameIdx;
5742 StackAccesses[ArrIdx].Offset =
5743 getFrameIndexReferenceFromSP(MF, FrameIdx);
5744 StackAccesses[ArrIdx].Size = MFI.getObjectSize(FrameIdx);
5745 }
5746
5747 unsigned RegTy = StackAccess::AccessType::GPR;
5748 if (MFI.getStackID(FrameIdx) == TargetStackID::ScalableVector) {
5749 // SPILL_PPR_TO_ZPR_SLOT_PSEUDO and FILL_PPR_FROM_ZPR_SLOT_PSEUDO
5750 // spill/fill the predicate as a data vector (so are an FPR access).
5751 if (MI.getOpcode() != AArch64::SPILL_PPR_TO_ZPR_SLOT_PSEUDO &&
5752 MI.getOpcode() != AArch64::FILL_PPR_FROM_ZPR_SLOT_PSEUDO &&
5753 AArch64::PPRRegClass.contains(MI.getOperand(0).getReg())) {
5754 RegTy = StackAccess::PPR;
5755 } else
5756 RegTy = StackAccess::FPR;
5757 } else if (AArch64InstrInfo::isFpOrNEON(MI)) {
5758 RegTy = StackAccess::FPR;
5759 }
5760
5761 StackAccesses[ArrIdx].AccessTypes |= RegTy;
5762
5763 if (RegTy == StackAccess::FPR)
5764 ++NumFPLdSt;
5765 else
5766 ++NumNonFPLdSt;
5767 }
5768 }
5769 }
5770 }
5771
5772 if (NumFPLdSt == 0 || NumNonFPLdSt == 0)
5773 return;
5774
5775 llvm::sort(StackAccesses);
5776 llvm::erase_if(StackAccesses, [](const StackAccess &S) {
5777 return S.AccessTypes == StackAccess::NotAccessed;
5778 });
5779
5780 SmallVector<const StackAccess *> MixedObjects;
5781 SmallVector<std::pair<const StackAccess *, const StackAccess *>> HazardPairs;
5782
5783 if (StackAccesses.front().isMixed())
5784 MixedObjects.push_back(&StackAccesses.front());
5785
5786 for (auto It = StackAccesses.begin(), End = std::prev(StackAccesses.end());
5787 It != End; ++It) {
5788 const auto &First = *It;
5789 const auto &Second = *(It + 1);
5790
5791 if (Second.isMixed())
5792 MixedObjects.push_back(&Second);
5793
5794 if ((First.isSME() && Second.isCPU()) ||
5795 (First.isCPU() && Second.isSME())) {
5796 uint64_t Distance = static_cast<uint64_t>(Second.start() - First.end());
5797 if (Distance < HazardSize)
5798 HazardPairs.emplace_back(&First, &Second);
5799 }
5800 }
5801
5802 auto EmitRemark = [&](llvm::StringRef Str) {
5803 ORE->emit([&]() {
5804 auto R = MachineOptimizationRemarkAnalysis(
5805 "sme", "StackHazard", MF.getFunction().getSubprogram(), &MF.front());
5806 return R << formatv("stack hazard in '{0}': ", MF.getName()).str() << Str;
5807 });
5808 };
5809
5810 for (const auto &P : HazardPairs)
5811 EmitRemark(formatv("{0} is too close to {1}", *P.first, *P.second).str());
5812
5813 for (const auto *Obj : MixedObjects)
5814 EmitRemark(
5815 formatv("{0} accessed by both GP and FP instructions", *Obj).str());
5816}
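emitRemarks walks the sorted accesses and flags neighbouring CPU-side and SME-side objects whose gap is smaller than HazardSize. A self-contained sketch of that adjacency scan over sorted [Start, End) byte intervals (simplified: each access is reduced to an interval plus one side flag, rather than the full StackAccess, and mixed accesses are ignored):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified access record: [Start, End) byte interval and whether it is an
// FPR/SME-side access (true) or a GPR/PPR CPU-side access (false).
struct Access {
  int64_t Start, End;
  bool IsSME;
};

// Count pairs of neighbouring accesses on opposite sides (CPU vs SME) whose
// gap is smaller than HazardSize -- the condition emitRemarks uses to report
// "X is too close to Y". Assumes Accesses is sorted by Start.
int countHazardPairs(const std::vector<Access> &Accesses, uint64_t HazardSize) {
  int Count = 0;
  for (size_t I = 0; I + 1 < Accesses.size(); ++I) {
    const Access &First = Accesses[I];
    const Access &Second = Accesses[I + 1];
    // The unsigned cast mirrors the Distance computation in emitRemarks.
    if (First.IsSME != Second.IsSME &&
        static_cast<uint64_t>(Second.Start - First.End) < HazardSize)
      ++Count;
  }
  return Count;
}
```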
Remove an instruction from the instruction list and delete it.
reverse_iterator rbegin()
iterator insertAfter(iterator I, MachineInstr *MI)
Insert MI into the instruction list after I.
void splice(iterator Where, MachineBasicBlock *Other, iterator From)
Take an instruction from MBB 'Other' at the position From, and insert it into this MBB right before '...
LLVM_ABI bool isLiveIn(MCRegister Reg, LaneBitmask LaneMask=LaneBitmask::getAll()) const
Return true if the specified register is in the live in set.
The MachineFrameInfo class represents an abstract stack frame until prolog/epilog code is inserted.
LLVM_ABI int CreateFixedObject(uint64_t Size, int64_t SPOffset, bool IsImmutable, bool isAliased=false)
Create a new object at a fixed location on the stack.
bool hasVarSizedObjects() const
This method may be called any time after instruction selection is complete to determine if the stack ...
uint64_t getStackSize() const
Return the number of bytes that must be allocated to hold all of the fixed size frame objects.
const AllocaInst * getObjectAllocation(int ObjectIdx) const
Return the underlying Alloca of the specified stack object if it exists.
LLVM_ABI int CreateStackObject(uint64_t Size, Align Alignment, bool isSpillSlot, const AllocaInst *Alloca=nullptr, uint8_t ID=0)
Create a new statically sized stack object, returning a nonnegative identifier to represent it.
bool hasCalls() const
Return true if the current function has any function calls.
bool isFrameAddressTaken() const
This method may be called any time after instruction selection is complete to determine if there is a...
Align getMaxAlign() const
Return the alignment in bytes that this function must be aligned to, which is greater than the defaul...
void setObjectOffset(int ObjectIdx, int64_t SPOffset)
Set the stack frame offset of the specified object.
uint64_t getMaxCallFrameSize() const
Return the maximum size of a call frame that must be allocated for an outgoing function call.
bool hasPatchPoint() const
This method may be called any time after instruction selection is complete to determine if there is a...
int getStackProtectorIndex() const
Return the index for the stack protector object.
LLVM_ABI int CreateSpillStackObject(uint64_t Size, Align Alignment)
Create a new statically sized stack object that represents a spill slot, returning a nonnegative iden...
LLVM_ABI uint64_t estimateStackSize(const MachineFunction &MF) const
Estimate and return the size of the stack frame.
void setStackID(int ObjectIdx, uint8_t ID)
bool isCalleeSavedInfoValid() const
Has the callee saved info been calculated yet?
Align getObjectAlign(int ObjectIdx) const
Return the alignment of the specified stack object.
int64_t getObjectSize(int ObjectIdx) const
Return the size of the specified object.
bool isMaxCallFrameSizeComputed() const
bool hasStackMap() const
This method may be called any time after instruction selection is complete to determine if there is a...
const std::vector< CalleeSavedInfo > & getCalleeSavedInfo() const
Returns a reference to the callee-saved info vector for the current function.
unsigned getNumObjects() const
Return the number of objects.
int getObjectIndexEnd() const
Return one past the maximum frame object index.
bool hasStackProtectorIndex() const
bool hasStackObjects() const
Return true if there are any stack objects in this function.
uint8_t getStackID(int ObjectIdx) const
unsigned getNumFixedObjects() const
Return the number of fixed objects.
int64_t getObjectOffset(int ObjectIdx) const
Return the assigned stack offset of the specified object from the incoming stack pointer.
int getObjectIndexBegin() const
Return the minimum frame object index.
void setObjectAlignment(int ObjectIdx, Align Alignment)
setObjectAlignment - Change the alignment of the specified stack object.
bool isDeadObjectIndex(int ObjectIdx) const
Returns true if the specified index corresponds to a dead object.
const WinEHFuncInfo * getWinEHFuncInfo() const
getWinEHFuncInfo - Return information about how the current function uses Windows exception handling.
const TargetSubtargetInfo & getSubtarget() const
getSubtarget - Return the subtarget for which this machine code is being compiled.
MachineMemOperand * getMachineMemOperand(MachinePointerInfo PtrInfo, MachineMemOperand::Flags f, LLT MemTy, Align base_alignment, const AAMDNodes &AAInfo=AAMDNodes(), const MDNode *Ranges=nullptr, SyncScope::ID SSID=SyncScope::System, AtomicOrdering Ordering=AtomicOrdering::NotAtomic, AtomicOrdering FailureOrdering=AtomicOrdering::NotAtomic)
getMachineMemOperand - Allocate a new MachineMemOperand.
MachineFrameInfo & getFrameInfo()
getFrameInfo - Return the frame info object for the current function.
MCContext & getContext() const
MachineRegisterInfo & getRegInfo()
getRegInfo - Return information about the registers currently in use.
Function & getFunction()
Return the LLVM function that this machine code represents.
Ty * getInfo()
getInfo - Keep track of various per-function pieces of information for backends that would like to do...
const MachineBasicBlock & front() const
MachineBasicBlock * CreateMachineBasicBlock(const BasicBlock *BB=nullptr, std::optional< UniqueBBID > BBID=std::nullopt)
CreateMachineBasicBlock - Allocate a new MachineBasicBlock.
void insert(iterator MBBI, MachineBasicBlock *MBB)
const TargetMachine & getTarget() const
getTarget - Return the target machine this machine code is compiled with
const MachineInstrBuilder & setMemRefs(ArrayRef< MachineMemOperand * > MMOs) const
const MachineInstrBuilder & addExternalSymbol(const char *FnName, unsigned TargetFlags=0) const
const MachineInstrBuilder & setMIFlag(MachineInstr::MIFlag Flag) const
const MachineInstrBuilder & addImm(int64_t Val) const
Add a new immediate operand.
const MachineInstrBuilder & add(const MachineOperand &MO) const
const MachineInstrBuilder & addFrameIndex(int Idx) const
const MachineInstrBuilder & addRegMask(const uint32_t *Mask) const
const MachineInstrBuilder & addReg(Register RegNo, unsigned flags=0, unsigned SubReg=0) const
Add a new virtual register operand.
const MachineInstrBuilder & addMBB(MachineBasicBlock *MBB, unsigned TargetFlags=0) const
const MachineInstrBuilder & addUse(Register RegNo, unsigned Flags=0, unsigned SubReg=0) const
Add a virtual register use operand.
const MachineInstrBuilder & setMIFlags(unsigned Flags) const
const MachineInstrBuilder & addMemOperand(MachineMemOperand *MMO) const
MachineInstr * getInstr() const
If conversion operators fail, use this method to get the MachineInstr explicitly.
const MachineInstrBuilder & addDef(Register RegNo, unsigned Flags=0, unsigned SubReg=0) const
Add a virtual register definition operand.
Representation of each machine instruction.
Definition: MachineInstr.h:72
void setFlags(unsigned flags)
Definition: MachineInstr.h:422
bool getFlag(MIFlag Flag) const
Return whether an MI flag is set.
Definition: MachineInstr.h:409
LLVM_ABI void eraseFromParent()
Unlink 'this' from the containing basic block and delete it.
const MachineOperand & getOperand(unsigned i) const
Definition: MachineInstr.h:595
uint32_t getFlags() const
Return the MI flags bitvector.
Definition: MachineInstr.h:404
LLVM_ABI void moveBefore(MachineInstr *MovePos)
Move the instruction before MovePos.
A description of a memory reference used in the backend.
const PseudoSourceValue * getPseudoValue() const
@ MOLoad
The memory access reads data.
@ MOStore
The memory access writes data.
const Value * getValue() const
Return the base address of the memory access.
MachineOperand class - Representation of each machine instruction operand.
void setImm(int64_t immVal)
int64_t getImm() const
bool isSymbol() const
isSymbol - Tests if this is a MO_ExternalSymbol operand.
static MachineOperand CreateImm(int64_t Val)
const char * getSymbolName() const
Register getReg() const
getReg - Returns the register number.
bool isFI() const
isFI - Tests if this is a MO_FrameIndex operand.
Diagnostic information for optimization analysis remarks.
LLVM_ABI void emit(DiagnosticInfoOptimizationBase &OptDiag)
Emit an optimization remark.
MachineRegisterInfo - Keep track of information for virtual and physical registers,...
LLVM_ABI Register createVirtualRegister(const TargetRegisterClass *RegClass, StringRef Name="")
createVirtualRegister - Create and return a new virtual register in the function with the specified r...
LLVM_ABI bool isLiveIn(Register Reg) const
LLVM_ABI const MCPhysReg * getCalleeSavedRegs() const
Returns list of callee saved registers.
LLVM_ABI bool isPhysRegUsed(MCRegister PhysReg, bool SkipRegMaskTest=false) const
Return true if the specified register is modified or read in this function.
MutableArrayRef - Represent a mutable reference to an array (0 or more elements consecutively in memo...
Definition: ArrayRef.h:303
Pass interface - Implemented by all 'passes'.
Definition: Pass.h:99
void enterBasicBlockEnd(MachineBasicBlock &MBB)
Start tracking liveness from the end of basic block MBB.
Register FindUnusedReg(const TargetRegisterClass *RC) const
Find an unused register of the specified register class.
void backward()
Update internal register state and move MBB iterator backwards.
void addScavengingFrameIndex(int FI)
Add a scavenging frame index.
Wrapper class representing virtual and physical registers.
Definition: Register.h:19
SMEAttrs is a utility class to parse the SME ACLE attributes on functions.
bool hasStreamingInterface() const
bool hasNonStreamingInterfaceAndBody() const
bool hasStreamingBody() const
bool insert(const value_type &X)
Insert a new element into the SetVector.
Definition: SetVector.h:168
A SetVector that performs no allocations if smaller than a certain size.
Definition: SetVector.h:356
bool empty() const
Definition: SmallVector.h:82
size_t size() const
Definition: SmallVector.h:79
This class consists of common code factored out of the SmallVector class to reduce code duplication b...
Definition: SmallVector.h:574
reference emplace_back(ArgTypes &&... Args)
Definition: SmallVector.h:938
void append(ItTy in_start, ItTy in_end)
Add the specified range to the end of the SmallVector.
Definition: SmallVector.h:684
void push_back(const T &Elt)
Definition: SmallVector.h:414
This is a 'vector' (really, a variable-sized array), optimized for the case when the array is small.
Definition: SmallVector.h:1197
StackOffset holds a fixed and a scalable offset in bytes.
Definition: TypeSize.h:34
int64_t getFixed() const
Returns the fixed component of the stack.
Definition: TypeSize.h:50
int64_t getScalable() const
Returns the scalable component of the stack.
Definition: TypeSize.h:53
static StackOffset get(int64_t Fixed, int64_t Scalable)
Definition: TypeSize.h:45
static StackOffset getScalable(int64_t Scalable)
Definition: TypeSize.h:44
static StackOffset getFixed(int64_t Fixed)
Definition: TypeSize.h:43
StringRef - Represent a constant reference to a string, i.e.
Definition: StringRef.h:55
bool hasFP(const MachineFunction &MF) const
hasFP - Return true if the specified function should have a dedicated frame pointer register.
virtual void determineCalleeSaves(MachineFunction &MF, BitVector &SavedRegs, RegScavenger *RS=nullptr) const
This method determines which of the registers reported by TargetRegisterInfo::getCalleeSavedRegs() sh...
int getOffsetOfLocalArea() const
getOffsetOfLocalArea - This method returns the offset of the local area from the stack pointer on ent...
Align getStackAlign() const
getStackAlignment - This method returns the number of bytes to which the stack pointer must be aligne...
StackDirection getStackGrowthDirection() const
getStackGrowthDirection - Return the direction the stack grows.
virtual bool enableCFIFixup(const MachineFunction &MF) const
Returns true if we may need to fix the unwind information for the function.
TargetInstrInfo - Interface to description of machine instruction set.
const char * getLibcallName(RTLIB::Libcall Call) const
Get the libcall routine name for the specified libcall.
This class defines information used to lower LLVM code to legal SelectionDAG operators that the targe...
Primary interface to the complete machine description for the target machine.
Definition: TargetMachine.h:83
TargetOptions Options
CodeModel::Model getCodeModel() const
Returns the code model.
const MCAsmInfo * getMCAsmInfo() const
Return target specific asm information.
SwiftAsyncFramePointerMode SwiftAsyncFramePointer
Control when and how the Swift async frame pointer bit should be set.
LLVM_ABI bool DisableFramePointerElim(const MachineFunction &MF) const
DisableFramePointerElim - This returns true if frame pointer elimination optimization should be disab...
TargetRegisterInfo base class - We assume that the target defines a static array of TargetRegisterDes...
const TargetRegisterClass * getMinimalPhysRegClass(MCRegister Reg, MVT VT=MVT::Other) const
Returns the Register Class of a physical register of the given type, picking the most sub register cl...
Align getSpillAlign(const TargetRegisterClass &RC) const
Return the minimum required alignment in bytes for a spill slot for a register of this class.
bool hasStackRealignment(const MachineFunction &MF) const
True if stack realignment is required and still possible.
unsigned getSpillSize(const TargetRegisterClass &RC) const
Return the size in bytes of the stack slot allocated to hold a spilled copy of a register from class ...
TargetSubtargetInfo - Generic base class for all target subtargets.
virtual const TargetInstrInfo * getInstrInfo() const
virtual const TargetRegisterInfo * getRegisterInfo() const =0
Return the target's register information.
virtual const TargetLowering * getTargetLowering() const
Triple - Helper class for working with autoconf configuration names.
Definition: Triple.h:47
LLVM_ABI StringRef getArchName() const
Get the architecture (first) component of the triple.
Definition: Triple.cpp:1376
static constexpr TypeSize getFixed(ScalarTy ExactSize)
Definition: TypeSize.h:346
The instances of the Type class are immutable: once they are created, they are never changed.
Definition: Type.h:45
constexpr ScalarTy getFixedValue() const
Definition: TypeSize.h:203
self_iterator getIterator()
Definition: ilist_node.h:134
This class implements an extremely fast bulk output stream that can only output to a stream.
Definition: raw_ostream.h:53
#define llvm_unreachable(msg)
Marks that the current location is not supposed to be reachable.
@ MO_GOT
MO_GOT - This flag indicates that a symbol operand represents the address of the GOT entry for the sy...
static unsigned getShiftValue(unsigned Imm)
getShiftValue - Extract the shift value.
static unsigned getArithExtendImm(AArch64_AM::ShiftExtendType ET, unsigned Imm)
getArithExtendImm - Encode the extend type and shift amount for an arithmetic instruction: imm: 3-bit...
static uint64_t encodeLogicalImmediate(uint64_t imm, unsigned regSize)
encodeLogicalImmediate - Return the encoded immediate value for a logical immediate instruction of th...
static unsigned getShifterImm(AArch64_AM::ShiftExtendType ST, unsigned Imm)
getShifterImm - Encode the shift type and amount: imm: 6-bit shift amount shifter: 000 ==> lsl 001 ==...
const unsigned StackProbeMaxLoopUnroll
Maximum number of iterations to unroll for a constant size probing loop.
const unsigned StackProbeMaxUnprobedStack
Maximum allowed number of unprobed bytes above SP at an ABI boundary.
constexpr char Attrs[]
Key for Kernel::Metadata::mAttrs.
@ AArch64_SVE_VectorCall
Used between AArch64 SVE functions.
Definition: CallingConv.h:224
@ PreserveMost
Used for runtime calls that preserve most registers.
Definition: CallingConv.h:63
@ CXX_FAST_TLS
Used for access functions.
Definition: CallingConv.h:72
@ GHC
Used by the Glasgow Haskell Compiler (GHC).
Definition: CallingConv.h:50
@ PreserveAll
Used for runtime calls that preserve (almost) all registers.
Definition: CallingConv.h:66
@ PreserveNone
Used for runtime calls that preserve no general-purpose registers.
Definition: CallingConv.h:90
@ Win64
The C convention as implemented on Windows/x86-64 and AArch64.
Definition: CallingConv.h:159
@ SwiftTail
This follows the Swift calling convention in how arguments are passed but guarantees tail calls will ...
Definition: CallingConv.h:87
@ Implicit
Not emitted register (e.g. carry, or temporary result).
@ Dead
Unused definition.
@ Define
Register definition.
@ Kill
The last use of a register.
@ Undef
Value of the register doesn't matter.
Reg
All possible values of the reg field in the ModR/M byte.
initializer< Ty > init(const Ty &Val)
Definition: CommandLine.h:444
NodeAddr< InstrNode * > Instr
Definition: RDFGraph.h:389
BaseReg
Stack frame base register. Bit 0 of FREInfo.Info.
Definition: SFrame.h:77
This is an optimization pass for GlobalISel generic memory operations.
Definition: AddressRanges.h:18
@ Offset
Definition: DWP.cpp:477
void stable_sort(R &&Range)
Definition: STLExtras.h:2077
MCCFIInstruction createDefCFA(const TargetRegisterInfo &TRI, unsigned FrameReg, unsigned Reg, const StackOffset &Offset, bool LastAdjustmentWasScalable=true)
MachineInstrBuilder BuildMI(MachineFunction &MF, const MIMetadata &MIMD, const MCInstrDesc &MCID)
Builder interface. Specify how to create the initial instruction itself.
int isAArch64FrameOffsetLegal(const MachineInstr &MI, StackOffset &Offset, bool *OutUseUnscaledOp=nullptr, unsigned *OutUnscaledOp=nullptr, int64_t *EmittableOffset=nullptr)
Check if the Offset is a valid frame offset for MI.
detail::scope_exit< std::decay_t< Callable > > make_scope_exit(Callable &&F)
Definition: ScopeExit.h:59
iterator_range< T > make_range(T x, T y)
Convenience function for iterating over sub-ranges.
unsigned getBLRCallOpcode(const MachineFunction &MF)
Return opcode to be used for indirect calls.
iterator_range< early_inc_iterator_impl< detail::IterOfRange< RangeT > > > make_early_inc_range(RangeT &&Range)
Make a range that does early increment to allow mutation of the underlying range without disrupting i...
Definition: STLExtras.h:663
@ AArch64FrameOffsetCannotUpdate
Offset cannot apply.
bool any_of(R &&range, UnaryPredicate P)
Provide wrappers to std::any_of which take ranges instead of having to pass begin/end explicitly.
Definition: STLExtras.h:1751
auto formatv(bool Validate, const char *Fmt, Ts &&...Vals)
auto reverse(ContainerTy &&C)
Definition: STLExtras.h:428
void sort(IteratorTy Start, IteratorTy End)
Definition: STLExtras.h:1669
@ Always
Always set the bit.
@ DeploymentBased
Determine whether to set the bit statically or dynamically based on the deployment target.
LLVM_ABI raw_ostream & dbgs()
dbgs() - This returns a reference to a raw_ostream for debugging messages.
Definition: Debug.cpp:207
void emitFrameOffset(MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI, const DebugLoc &DL, unsigned DestReg, unsigned SrcReg, StackOffset Offset, const TargetInstrInfo *TII, MachineInstr::MIFlag=MachineInstr::NoFlags, bool SetNZCV=false, bool NeedsWinCFI=false, bool *HasWinCFI=nullptr, bool EmitCFAOffset=false, StackOffset InitialOffset={}, unsigned FrameReg=AArch64::SP)
emitFrameOffset - Emit instructions as needed to set DestReg to SrcReg plus Offset.
LLVM_ABI void report_fatal_error(Error Err, bool gen_crash_diag=true)
Definition: Error.cpp:167
LLVM_ABI EHPersonality classifyEHPersonality(const Value *Pers)
See if the given exception handling personality function is one that we understand.
@ Success
The lock was released successfully.
@ First
Helpers to iterate all locations in the MemoryEffectsBase class.
unsigned getDefRegState(bool B)
MCCFIInstruction createCFAOffset(const TargetRegisterInfo &MRI, unsigned Reg, const StackOffset &OffsetFromDefCFA, std::optional< int64_t > IncomingVGOffsetFromDefCFA)
unsigned getKillRegState(bool B)
uint64_t alignTo(uint64_t Size, Align A)
Returns a multiple of A needed to store Size bytes.
Definition: Alignment.h:155
raw_ostream & operator<<(raw_ostream &OS, const APFixedPoint &FX)
Definition: APFixedPoint.h:312
bool isAsynchronousEHPersonality(EHPersonality Pers)
Returns true if this personality function catches asynchronous exceptions.
auto find_if(R &&Range, UnaryPredicate P)
Provide wrappers to std::find_if which take ranges instead of having to pass begin/end explicitly.
Definition: STLExtras.h:1777
void erase_if(Container &C, UnaryPredicate P)
Provide a container algorithm similar to C++ Library Fundamentals v2's erase_if which is equivalent t...
Definition: STLExtras.h:2139
LLVM_ABI const Value * getUnderlyingObject(const Value *V, unsigned MaxLookup=MaxLookupSearchDepth)
This method strips off any GEP address adjustments, pointer casts or llvm.threadlocal....
void fullyRecomputeLiveIns(ArrayRef< MachineBasicBlock * > MBBs)
Convenience function for recomputing live-in's for a set of MBBs until the computation converges.
Definition: LivePhysRegs.h:225
LLVM_ABI Printable printReg(Register Reg, const TargetRegisterInfo *TRI=nullptr, unsigned SubIdx=0, const MachineRegisterInfo *MRI=nullptr)
Prints virtual and physical registers with or without a TRI instance.
LLVM_ABI void reportFatalUsageError(Error Err)
Report a fatal error that does not indicate a bug in LLVM.
Definition: Error.cpp:180
void swap(llvm::BitVector &LHS, llvm::BitVector &RHS)
Implement std::swap in terms of BitVector swap.
Definition: BitVector.h:858
Emergency stack slots for expanding SPILL_PPR_TO_ZPR_SLOT_PSEUDO and FILL_PPR_FROM_ZPR_SLOT_PSEUDO.
std::optional< int > PPRSpillFI
std::optional< int > GPRSpillFI
std::optional< int > ZPRSpillFI
Registers available for scavenging (ZPR, PPR3b, GPR).
RAII helper class for scavenging or spilling a register.
ScopedScavengeOrSpill(ScopedScavengeOrSpill &&)=delete
ScopedScavengeOrSpill(MachineFunction &MF, MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI, Register SpillCandidate, const TargetRegisterClass &RC, LiveRegUnits const &UsedRegs, BitVector const &AllocatableRegs, std::optional< int > *MaybeSpillFI, Register PreferredReg=AArch64::NoRegister)
Register freeRegister() const
Returns the free register (found from scavenging or spilling a register).
ScopedScavengeOrSpill(const ScopedScavengeOrSpill &)=delete
bool operator<(const StackAccess &Rhs) const
void print(raw_ostream &OS) const
int64_t start() const
std::string getTypeString() const
int64_t end() const
This struct is a compact representation of a valid (non-zero power of two) alignment.
Definition: Alignment.h:39
uint64_t value() const
This is a hole in the type system and should not be abused.
Definition: Alignment.h:85
Pair of physical register and lane mask.
static LLVM_ABI MachinePointerInfo getFixedStack(MachineFunction &MF, int FI, int64_t Offset=0)
Return a MachinePointerInfo record that refers to the specified FrameIndex.
SmallVector< WinEHTryBlockMapEntry, 4 > TryBlockMap
Definition: WinEHFuncInfo.h:97
SmallVector< WinEHHandlerType, 1 > HandlerArray
Definition: WinEHFuncInfo.h:76