1//===- AArch64FrameLowering.cpp - AArch64 Frame Lowering -------*- C++ -*-====//
2//
3// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
4// See https://llvm.org/LICENSE.txt for license information.
5// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
6//
7//===----------------------------------------------------------------------===//
8//
9// This file contains the AArch64 implementation of TargetFrameLowering class.
10//
11// On AArch64, stack frames are structured as follows:
12//
13// The stack grows downward.
14//
15// All of the individual frame areas on the frame below are optional, i.e. it's
16// possible to create a function so that the particular area isn't present
17// in the frame.
18//
19// At function entry, the "frame" looks as follows:
20//
21// | | Higher address
22// |-----------------------------------|
23// | |
24// | arguments passed on the stack |
25// | |
26// |-----------------------------------| <- sp
27// | | Lower address
28//
29//
30// After the prologue has run, the frame has the following general structure.
31// Note that this doesn't depict the case where a red-zone is used. Also,
32// technically the last frame area (VLAs) doesn't get created until the
33// main function body, after the prologue has run. However, it's depicted here
34// for completeness.
35//
36// | | Higher address
37// |-----------------------------------|
38// | |
39// | arguments passed on the stack |
40// | |
41// |-----------------------------------|
42// | |
43// | (Win64 only) varargs from reg |
44// | |
45// |-----------------------------------|
46// | |
47// | (Win64 only) callee-saved SVE reg |
48// | |
49// |-----------------------------------|
50// | |
51// | callee-saved gpr registers | <--.
52// | | | On Darwin platforms these
53// |- - - - - - - - - - - - - - - - - -| | callee saves are swapped,
54// | prev_lr | | (frame record first)
55// | prev_fp | <--'
56// | async context if needed |
57// | (a.k.a. "frame record") |
58// |-----------------------------------| <- fp(=x29)
59// | <hazard padding> |
60// |-----------------------------------|
61// | |
62// | callee-saved fp/simd/SVE regs |
63// | |
64// |-----------------------------------|
65// | |
66// | SVE stack objects |
67// | |
68// |-----------------------------------|
69// |.empty.space.to.make.part.below....|
70// |.aligned.in.case.it.needs.more.than| (size of this area is unknown at
71// |.the.standard.16-byte.alignment....| compile time; if present)
72// |-----------------------------------|
73// | local variables of fixed size |
74// | including spill slots |
75// | <FPR> |
76// | <hazard padding> |
77// | <GPR> |
78// |-----------------------------------| <- bp(not defined by ABI,
79// |.variable-sized.local.variables....| LLVM chooses X19)
80// |.(VLAs)............................| (size of this area is unknown at
81// |...................................| compile time)
82// |-----------------------------------| <- sp
83// | | Lower address
84//
85//
86// To access data in a frame, a constant offset from one of the pointers
87// (fp, bp, sp) to that data must be computable at compile time. The sizes
88// of the areas with a dotted background cannot be computed at compile time
89// if those areas are present, so all three of fp, bp and sp must be set up
90// to be able to access everything in the frame, assuming all of the frame
91// areas are non-empty.
92//
93// For most functions, some of the frame areas are empty. For those functions,
94// it may not be necessary to set up fp or bp:
95// * A base pointer is definitely needed when there are both VLAs and local
96// variables with more-than-default alignment requirements.
97// * A frame pointer is definitely needed when there are local variables with
98// more-than-default alignment requirements.
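//
// As an illustrative (hypothetical) C example combining both cases:
//
//   void f(int n) {
//     _Alignas(64) char big[128];   // over-aligned local
//     char vla[n];                  // variable-sized local
//     ...
//   }
//
// Here sp moves unpredictably because of the VLA, and the distance from fp
// down to the fixed-size locals depends on the realignment padding (the
// dotted "empty space" area in the diagram above), so a base pointer is set
// up to give a stable anchor for the locals area.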
99//
100// For Darwin platforms the frame-record (fp, lr) is stored at the top of the
101// callee-saved area, since the unwind encoding does not allow for encoding
102// this dynamically and existing tools depend on this layout. For other
103// platforms, the frame-record is stored at the bottom of the (gpr) callee-saved
104// area to allow SVE stack objects (allocated directly below the callee-saves,
105// if available) to be accessed directly from the framepointer.
106// The SVE spill/fill instructions have VL-scaled addressing modes such
107// as:
108// ldr z8, [fp, #-7 mul vl]
109// For SVE the size of the vector length (VL) is not known at compile-time, so
110// '#-7 mul vl' is an offset that can only be evaluated at runtime. With this
111// layout, we don't need to add an unscaled offset to the framepointer before
112// accessing the SVE object in the frame.
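//
// As a worked example (illustrative; the numbers are not from any particular
// target): with a 256-bit vector length, VL = 32 bytes, so
//   ldr z8, [fp, #-7 mul vl]
// loads from fp - 7 * 32 = fp - 224 bytes, while with a 512-bit VL the same
// encoding addresses fp - 448 bytes; the scaling by VL happens in hardware.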
113//
114// In some cases when a base pointer is not strictly needed, it is generated
115// anyway when offsets from the frame pointer to access local variables become
116// so large that the offset can't be encoded in the immediate fields of loads
117// or stores.
118//
119// Outgoing function arguments must be at the bottom of the stack frame when
120// calling another function. If we do not have variable-sized stack objects, we
121// can allocate a "reserved call frame" area at the bottom of the local
122// variable area, large enough for all outgoing calls. If we do have VLAs, then
123// the stack pointer must be decremented and incremented around each call to
124// make space for the arguments below the VLAs.
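//
// Sketch of the difference (illustrative assembly, not emitted verbatim):
//
//   With a reserved call frame, the prologue allocates the outgoing area once:
//     sub sp, sp, #(locals + max_outgoing_args)
//     ...
//     bl  callee                      ; no per-call sp adjustment
//
//   With VLAs, each call site brackets its own adjustment:
//     sub sp, sp, #outgoing_args
//     bl  callee
//     add sp, sp, #outgoing_args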
125//
126// FIXME: also explain the redzone concept.
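// (Briefly, and only as a sketch: the red zone is a small area immediately
// below sp -- 128 bytes in the Darwin AArch64 ABI -- that leaf functions may
// use without moving sp. When canUseRedZone() below returns true (which
// requires the aarch64-redzone flag), small fixed-size locals live there and
// the prologue/epilogue sp adjustment is omitted.)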
127//
128// About stack hazards: Under some SME contexts, a coprocessor with its own
129// separate cache can be used for FP operations. This can create hazards if the CPU
130// and the SME unit try to access the same area of memory, including if the
131// access is to an area of the stack. To try to alleviate this we attempt to
132// introduce extra padding into the stack frame between FP and GPR accesses,
133// controlled by the aarch64-stack-hazard-size option. Without changing the
134// layout of the stack frame in the diagram above, a stack object of size
135// aarch64-stack-hazard-size is added between GPR and FPR CSRs. Another is added
136// to the stack objects section, and stack objects are sorted so that FPR >
137// Hazard padding slot > GPRs (where possible). Unfortunately some things are
138// not handled well (VLA area, arguments on the stack, objects with both GPR and
139// FPR accesses), but if those are controlled by the user then the entire stack
140// frame becomes GPR at the start/end with FPR in the middle, surrounded by
141// Hazard padding.
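//
// A sketch with hypothetical sizes (e.g. -aarch64-stack-hazard-size=1024):
//
//   | GPR callee-saves          |
//   | hazard padding (1024)     |
//   | FPR/SVE callee-saves      |
//   | FPR locals and spills     |
//   | hazard padding (1024)     |
//   | GPR locals and spills     |
//
// so that CPU (GPR) and SME-unit (FPR) accesses fall into disjoint regions.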
142//
143// An example of the prologue:
144//
145// .globl __foo
146// .align 2
147// __foo:
148// Ltmp0:
149// .cfi_startproc
150// .cfi_personality 155, ___gxx_personality_v0
151// Leh_func_begin:
152// .cfi_lsda 16, Lexception33
153//
154// stp xa,bx, [sp, -#offset]!
155// ...
156// stp x28, x27, [sp, #offset-32]
157// stp fp, lr, [sp, #offset-16]
158// add fp, sp, #offset - 16
159// sub sp, sp, #1360
160//
161// The Stack:
162// +-------------------------------------------+
163// 10000 | ........ | ........ | ........ | ........ |
164// 10004 | ........ | ........ | ........ | ........ |
165// +-------------------------------------------+
166// 10008 | ........ | ........ | ........ | ........ |
167// 1000c | ........ | ........ | ........ | ........ |
168// +===========================================+
169// 10010 | X28 Register |
170// 10014 | X28 Register |
171// +-------------------------------------------+
172// 10018 | X27 Register |
173// 1001c | X27 Register |
174// +===========================================+
175// 10020 | Frame Pointer |
176// 10024 | Frame Pointer |
177// +-------------------------------------------+
178// 10028 | Link Register |
179// 1002c | Link Register |
180// +===========================================+
181// 10030 | ........ | ........ | ........ | ........ |
182// 10034 | ........ | ........ | ........ | ........ |
183// +-------------------------------------------+
184// 10038 | ........ | ........ | ........ | ........ |
185// 1003c | ........ | ........ | ........ | ........ |
186// +-------------------------------------------+
187//
188// [sp] = 10030 :: >>initial value<<
189// sp = 10020 :: stp fp, lr, [sp, #-16]!
190// fp = sp == 10020 :: mov fp, sp
191// [sp] == 10020 :: stp x28, x27, [sp, #-16]!
192// sp == 10010 :: >>final value<<
193//
194// The frame pointer (w29) points to address 10020. If we use an offset of
195// '16' from 'w29', we get the CFI offsets of -8 for w30, -16 for w29, -24
196// for w27, and -32 for w28:
197//
198// Ltmp1:
199// .cfi_def_cfa w29, 16
200// Ltmp2:
201// .cfi_offset w30, -8
202// Ltmp3:
203// .cfi_offset w29, -16
204// Ltmp4:
205// .cfi_offset w27, -24
206// Ltmp5:
207// .cfi_offset w28, -32
208//
209//===----------------------------------------------------------------------===//
210
211#include "AArch64FrameLowering.h"
212#include "AArch64InstrInfo.h"
215#include "AArch64RegisterInfo.h"
216#include "AArch64Subtarget.h"
220#include "llvm/ADT/ScopeExit.h"
221#include "llvm/ADT/SmallVector.h"
239#include "llvm/IR/Attributes.h"
240#include "llvm/IR/CallingConv.h"
241#include "llvm/IR/DataLayout.h"
242#include "llvm/IR/DebugLoc.h"
243#include "llvm/IR/Function.h"
244#include "llvm/MC/MCAsmInfo.h"
245#include "llvm/MC/MCDwarf.h"
247#include "llvm/Support/Debug.h"
254#include <cassert>
255#include <cstdint>
256#include <iterator>
257#include <optional>
258#include <vector>
259
260using namespace llvm;
261
262#define DEBUG_TYPE "frame-info"
263
264static cl::opt<bool> EnableRedZone("aarch64-redzone",
265 cl::desc("enable use of redzone on AArch64"),
266 cl::init(false), cl::Hidden);
267
269 "stack-tagging-merge-settag",
270 cl::desc("merge settag instruction in function epilog"), cl::init(true),
271 cl::Hidden);
272
273static cl::opt<bool> OrderFrameObjects("aarch64-order-frame-objects",
274 cl::desc("sort stack allocations"),
275 cl::init(true), cl::Hidden);
276
278 "homogeneous-prolog-epilog", cl::Hidden,
279 cl::desc("Emit homogeneous prologue and epilogue for the size "
280 "optimization (default = off)"));
281
282// Stack hazard size for analysis remarks. StackHazardSize takes precedence.
284 StackHazardRemarkSize("aarch64-stack-hazard-remark-size", cl::init(0),
285 cl::Hidden);
286// Whether to insert padding into non-streaming functions (for testing).
287static cl::opt<bool>
288 StackHazardInNonStreaming("aarch64-stack-hazard-in-non-streaming",
289 cl::init(false), cl::Hidden);
290
292 "aarch64-disable-multivector-spill-fill",
293 cl::desc("Disable use of LD/ST pairs for SME2 or SVE2p1"), cl::init(false),
294 cl::Hidden);
295
296int64_t
297AArch64FrameLowering::getArgumentStackToRestore(MachineFunction &MF,
298 MachineBasicBlock &MBB) const {
299 MachineBasicBlock::iterator MBBI = MBB.getLastNonDebugInstr();
301 bool IsTailCallReturn = (MBB.end() != MBBI)
303 : false;
304
305 int64_t ArgumentPopSize = 0;
306 if (IsTailCallReturn) {
307 MachineOperand &StackAdjust = MBBI->getOperand(1);
308
309 // For a tail-call in a callee-pops-arguments environment, some or all of
310 // the stack may actually be in use for the call's arguments; this is
311 // calculated during LowerCall and consumed here...
312 ArgumentPopSize = StackAdjust.getImm();
313 } else {
314 // ... otherwise the amount to pop is *all* of the argument space,
315 // conveniently stored in the MachineFunctionInfo by
316 // LowerFormalArguments. This will, of course, be zero for the C calling
317 // convention.
318 ArgumentPopSize = AFI->getArgumentStackToRestore();
319 }
320
321 return ArgumentPopSize;
322}
323
325 MachineFunction &MF);
326
327// Conservatively returns true if the function is likely to have SVE vectors
328// on the stack. It is safe to call this function before callee-saves or
329// object offsets have been determined.
331 const MachineFunction &MF) {
332 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
333 if (AFI->isSVECC())
334 return true;
335
336 if (AFI->hasCalculatedStackSizeSVE())
337 return bool(AFL.getSVEStackSize(MF));
338
339 const MachineFrameInfo &MFI = MF.getFrameInfo();
340 for (int FI = MFI.getObjectIndexBegin(); FI < MFI.getObjectIndexEnd(); FI++) {
342 return true;
343 }
344
345 return false;
346}
347
348/// Returns true if homogeneous prolog or epilog code can be emitted
349/// for the size optimization. If possible, a frame helper call is injected.
350/// When an Exit block is given, this check is for the epilog.
351bool AArch64FrameLowering::homogeneousPrologEpilog(
352 MachineFunction &MF, MachineBasicBlock *Exit) const {
353 if (!MF.getFunction().hasMinSize())
354 return false;
356 return false;
357 if (EnableRedZone)
358 return false;
359
360 // TODO: Windows is not supported yet.
361 if (needsWinCFI(MF))
362 return false;
363
364 // TODO: SVE is not supported yet.
365 if (isLikelyToHaveSVEStack(*this, MF))
366 return false;
367
368 // Bail on stack adjustment needed on return for simplicity.
369 const MachineFrameInfo &MFI = MF.getFrameInfo();
370 const TargetRegisterInfo *RegInfo = MF.getSubtarget().getRegisterInfo();
371 if (MFI.hasVarSizedObjects() || RegInfo->hasStackRealignment(MF))
372 return false;
373 if (Exit && getArgumentStackToRestore(MF, *Exit))
374 return false;
375
376 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
378 return false;
379
380 // If there is an odd number of GPRs before LR and FP in the CSRs list,
381 // they will not be paired into one RegPairInfo, which is incompatible with
382 // the assumption made by the homogeneous prolog epilog pass.
383 const MCPhysReg *CSRegs = MF.getRegInfo().getCalleeSavedRegs();
384 unsigned NumGPRs = 0;
385 for (unsigned I = 0; CSRegs[I]; ++I) {
386 Register Reg = CSRegs[I];
387 if (Reg == AArch64::LR) {
388 assert(CSRegs[I + 1] == AArch64::FP);
389 if (NumGPRs % 2 != 0)
390 return false;
391 break;
392 }
393 if (AArch64::GPR64RegClass.contains(Reg))
394 ++NumGPRs;
395 }
396
397 return true;
398}
399
400/// Returns true if CSRs should be paired.
401bool AArch64FrameLowering::producePairRegisters(MachineFunction &MF) const {
402 return produceCompactUnwindFrame(*this, MF) || homogeneousPrologEpilog(MF);
403}
404
405/// This is the biggest offset to the stack pointer we can encode in aarch64
406/// instructions (without using a separate calculation and a temp register).
407/// Note that the exceptions here are vector stores/loads, which cannot encode any
408/// displacements (see estimateRSStackSizeLimit(), isAArch64FrameOffsetLegal()).
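/// (255 is the largest positive offset encodable by the unscaled LDUR/STUR
/// addressing mode, whose signed 9-bit immediate spans [-256, 255].)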
409static const unsigned DefaultSafeSPDisplacement = 255;
410
411/// Look at each instruction that references stack frames and return the stack
412/// size limit beyond which some of these instructions will require a scratch
413/// register during their expansion later.
415 // FIXME: For now, just conservatively guesstimate based on unscaled indexing
416 // range. We'll end up allocating an unnecessary spill slot a lot, but
417 // realistically that's not a big deal at this stage of the game.
418 for (MachineBasicBlock &MBB : MF) {
419 for (MachineInstr &MI : MBB) {
420 if (MI.isDebugInstr() || MI.isPseudo() ||
421 MI.getOpcode() == AArch64::ADDXri ||
422 MI.getOpcode() == AArch64::ADDSXri)
423 continue;
424
425 for (const MachineOperand &MO : MI.operands()) {
426 if (!MO.isFI())
427 continue;
428
430 if (isAArch64FrameOffsetLegal(MI, Offset, nullptr, nullptr, nullptr) ==
432 return 0;
433 }
434 }
435 }
437}
438
443
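/// Returns the size of the fixed-object area that sits directly below the
/// incoming SP: the tail-call reserved stack and, for Win64, also the
/// register varargs save area, any catch objects, and the UnwindHelp slot
/// used by EH funclets (summarising the logic below).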
444unsigned
445AArch64FrameLowering::getFixedObjectSize(const MachineFunction &MF,
446 const AArch64FunctionInfo *AFI,
447 bool IsWin64, bool IsFunclet) const {
448 assert(AFI->getTailCallReservedStack() % 16 == 0 &&
449 "Tail call reserved stack must be aligned to 16 bytes");
450 if (!IsWin64 || IsFunclet) {
451 return AFI->getTailCallReservedStack();
452 } else {
453 if (AFI->getTailCallReservedStack() != 0 &&
454 !MF.getFunction().getAttributes().hasAttrSomewhere(
455 Attribute::SwiftAsync))
456 report_fatal_error("cannot generate ABI-changing tail call for Win64");
457 unsigned FixedObjectSize = AFI->getTailCallReservedStack();
458
459 // Var args are stored here in the primary function.
460 FixedObjectSize += AFI->getVarArgsGPRSize();
461
462 if (MF.hasEHFunclets()) {
463 // Catch objects are stored here in the primary function.
464 const MachineFrameInfo &MFI = MF.getFrameInfo();
465 const WinEHFuncInfo &EHInfo = *MF.getWinEHFuncInfo();
466 SmallSetVector<int, 8> CatchObjFrameIndices;
467 for (const WinEHTryBlockMapEntry &TBME : EHInfo.TryBlockMap) {
468 for (const WinEHHandlerType &H : TBME.HandlerArray) {
469 int FrameIndex = H.CatchObj.FrameIndex;
470 if ((FrameIndex != INT_MAX) &&
471 CatchObjFrameIndices.insert(FrameIndex)) {
472 FixedObjectSize = alignTo(FixedObjectSize,
473 MFI.getObjectAlign(FrameIndex).value()) +
474 MFI.getObjectSize(FrameIndex);
475 }
476 }
477 }
478 // To support EH funclets we allocate an UnwindHelp object
479 FixedObjectSize += 8;
480 }
481 return alignTo(FixedObjectSize, 16);
482 }
483}
484
485/// Returns the size of the entire SVE stack frame (callee-saves + spills).
491
493 if (!EnableRedZone)
494 return false;
495
496 // Don't use the red zone if the function explicitly asks us not to.
497 // This is typically used for kernel code.
498 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
499 const unsigned RedZoneSize =
501 if (!RedZoneSize)
502 return false;
503
504 const MachineFrameInfo &MFI = MF.getFrameInfo();
506 uint64_t NumBytes = AFI->getLocalStackSize();
507
508 // If neither NEON nor SVE is available, a COPY from one Q-reg to
509 // another requires a spill -> reload sequence. We can do that
510 // using a pre-decrementing store/post-decrementing load, but
511 // if we do so, we can't use the Red Zone.
512 bool LowerQRegCopyThroughMem = Subtarget.hasFPARMv8() &&
513 !Subtarget.isNeonAvailable() &&
514 !Subtarget.hasSVE();
515
516 return !(MFI.hasCalls() || hasFP(MF) || NumBytes > RedZoneSize ||
517 getSVEStackSize(MF) || LowerQRegCopyThroughMem);
518}
519
520/// hasFPImpl - Return true if the specified function should have a dedicated
521/// frame pointer register.
523 const MachineFrameInfo &MFI = MF.getFrameInfo();
524 const TargetRegisterInfo *RegInfo = MF.getSubtarget().getRegisterInfo();
526
527 // Win64 EH requires a frame pointer if funclets are present, as the locals
528 // are accessed off the frame pointer in both the parent function and the
529 // funclets.
530 if (MF.hasEHFunclets())
531 return true;
532 // Retain behavior of always omitting the FP for leaf functions when possible.
534 return true;
535 if (MFI.hasVarSizedObjects() || MFI.isFrameAddressTaken() ||
536 MFI.hasStackMap() || MFI.hasPatchPoint() ||
537 RegInfo->hasStackRealignment(MF))
538 return true;
539
540 // If we:
541 //
542 // 1. Have streaming mode changes
543 // OR:
544 // 2. Have a streaming body with SVE stack objects
545 //
546 // Then the value of VG restored when unwinding to this function may not match
547 // the value of VG used to set up the stack.
548 //
549 // This is a problem as the CFA can be described with an expression of the
550 // form: CFA = SP + NumBytes + VG * NumScalableBytes.
551 //
552 // If the value of VG used in that expression does not match the value used to
553 // set up the stack, an incorrect address for the CFA will be computed, and
554 // unwinding will fail.
555 //
556 // We work around this issue by ensuring the frame-pointer can describe the
557 // CFA in either of these cases.
558 if (AFI.needsDwarfUnwindInfo(MF) &&
560 (!AFI.hasCalculatedStackSizeSVE() || AFI.getStackSizeSVE() > 0)))
561 return true;
562 // With large call frames around we may need to use FP to access the scavenging
563 // emergency spill slot.
564 //
565 // Unfortunately some calls to hasFP() like machine verifier ->
566 // getReservedReg() -> hasFP in the middle of global isel are too early
567 // to know the max call frame size. Hopefully conservatively returning "true"
568 // in those cases is fine.
569 // DefaultSafeSPDisplacement is fine as we only emergency spill GP regs.
570 if (!MFI.isMaxCallFrameSizeComputed() ||
572 return true;
573
574 return false;
575}
576
577/// Should the Frame Pointer be reserved for the current function?
579 const TargetMachine &TM = MF.getTarget();
580 const Triple &TT = TM.getTargetTriple();
581
582 // These OSes require the frame chain is valid, even if the current frame does
583 // not use a frame pointer.
584 if (TT.isOSDarwin() || TT.isOSWindows())
585 return true;
586
587 // If the function has a frame pointer, it is reserved.
588 if (hasFP(MF))
589 return true;
590
591 // Frontend has requested to preserve the frame pointer.
592 if (TM.Options.FramePointerIsReserved(MF))
593 return true;
594
595 return false;
596}
597
598/// hasReservedCallFrame - Under normal circumstances, when a frame pointer is
599/// not required, we reserve argument space for call sites in the function
600/// immediately on entry to the current function. This eliminates the need for
601/// add/sub sp brackets around call sites. Returns true if the call frame is
602/// included as part of the stack frame.
604 const MachineFunction &MF) const {
605 // The stack probing code for the dynamically allocated outgoing arguments
606 // area assumes that the stack is probed at the top - either by the prologue
607 // code, which issues a probe if `hasVarSizedObjects` returns true, or by the
608 // most recent variable-sized object allocation. Changing the condition here
609 // may need to be followed up by changes to the probe issuing logic.
610 return !MF.getFrameInfo().hasVarSizedObjects();
611}
612
616 const AArch64InstrInfo *TII =
617 static_cast<const AArch64InstrInfo *>(MF.getSubtarget().getInstrInfo());
618 const AArch64TargetLowering *TLI =
619 MF.getSubtarget<AArch64Subtarget>().getTargetLowering();
620 [[maybe_unused]] MachineFrameInfo &MFI = MF.getFrameInfo();
621 DebugLoc DL = I->getDebugLoc();
622 unsigned Opc = I->getOpcode();
623 bool IsDestroy = Opc == TII->getCallFrameDestroyOpcode();
624 uint64_t CalleePopAmount = IsDestroy ? I->getOperand(1).getImm() : 0;
625
626 if (!hasReservedCallFrame(MF)) {
627 int64_t Amount = I->getOperand(0).getImm();
628 Amount = alignTo(Amount, getStackAlign());
629 if (!IsDestroy)
630 Amount = -Amount;
631
632 // N.b. if CalleePopAmount is valid but zero (i.e. callee would pop, but it
633 // doesn't have to pop anything), then the first operand will be zero too so
634 // this adjustment is a no-op.
635 if (CalleePopAmount == 0) {
636 // FIXME: in-function stack adjustment for calls is limited to 24-bits
637 // because there's no guaranteed temporary register available.
638 //
639 // ADD/SUB (immediate) has only LSL #0 and LSL #12 available.
640 // 1) For offset <= 12-bit, we use LSL #0
641 // 2) For 12-bit <= offset <= 24-bit, we use two instructions. One uses
642 // LSL #0, and the other uses LSL #12.
643 //
644 // Most call frames will be allocated at the start of a function so
645 // this is OK, but it is a limitation that needs dealing with.
646 assert(Amount > -0xffffff && Amount < 0xffffff && "call frame too large");
647
648 if (TLI->hasInlineStackProbe(MF) &&
650 // When stack probing is enabled, the decrement of SP may need to be
651 // probed. We only need to do this if the call site needs 1024 bytes of
652 // space or more, because a region smaller than that is allowed to be
653 // unprobed at an ABI boundary. We rely on the fact that SP has been
654 // probed exactly at this point, either by the prologue or most recent
655 // dynamic allocation.
657 "non-reserved call frame without var sized objects?");
658 Register ScratchReg =
659 MF.getRegInfo().createVirtualRegister(&AArch64::GPR64RegClass);
660 inlineStackProbeFixed(I, ScratchReg, -Amount, StackOffset::get(0, 0));
661 } else {
662 emitFrameOffset(MBB, I, DL, AArch64::SP, AArch64::SP,
663 StackOffset::getFixed(Amount), TII);
664 }
665 }
666 } else if (CalleePopAmount != 0) {
667 // If the calling convention demands that the callee pops arguments from the
668 // stack, we want to add it back if we have a reserved call frame.
669 assert(CalleePopAmount < 0xffffff && "call frame too large");
670 emitFrameOffset(MBB, I, DL, AArch64::SP, AArch64::SP,
671 StackOffset::getFixed(-(int64_t)CalleePopAmount), TII);
672 }
673 return MBB.erase(I);
674}
675
677 MachineBasicBlock &MBB) const {
678
679 MachineFunction &MF = *MBB.getParent();
680 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
681 const auto &TRI = *Subtarget.getRegisterInfo();
682 const auto &MFI = *MF.getInfo<AArch64FunctionInfo>();
683
684 CFIInstBuilder CFIBuilder(MBB, MBB.begin(), MachineInstr::NoFlags);
685
686 // Reset the CFA to `SP + 0`.
687 CFIBuilder.buildDefCFA(AArch64::SP, 0);
688
689 // Flip the RA sign state.
690 if (MFI.shouldSignReturnAddress(MF))
691 MFI.branchProtectionPAuthLR() ? CFIBuilder.buildNegateRAStateWithPC()
692 : CFIBuilder.buildNegateRAState();
693
694 // Shadow call stack uses X18, reset it.
695 if (MFI.needsShadowCallStackPrologueEpilogue(MF))
696 CFIBuilder.buildSameValue(AArch64::X18);
697
698 // Emit .cfi_same_value for callee-saved registers.
699 const std::vector<CalleeSavedInfo> &CSI =
701 for (const auto &Info : CSI) {
702 MCRegister Reg = Info.getReg();
703 if (!TRI.regNeedsCFI(Reg, Reg))
704 continue;
705 CFIBuilder.buildSameValue(Reg);
706 }
707}
708
709// Return the maximum possible number of bytes for `Size` due to the
710// architectural limit on the size of an SVE register.
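// For example, StackOffset::get(16, 32) is bounded by 16 + 32 * 16 = 528
// bytes, since the architectural maximum vector length of 2048 bits means
// one "scalable byte" can expand to at most 16 real bytes.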
711static int64_t upperBound(StackOffset Size) {
712 static const int64_t MAX_BYTES_PER_SCALABLE_BYTE = 16;
713 return Size.getScalable() * MAX_BYTES_PER_SCALABLE_BYTE + Size.getFixed();
714}
715
716void AArch64FrameLowering::allocateStackSpace(
718 int64_t RealignmentPadding, StackOffset AllocSize, bool NeedsWinCFI,
719 bool *HasWinCFI, bool EmitCFI, StackOffset InitialOffset,
720 bool FollowupAllocs) const {
721
722 if (!AllocSize)
723 return;
724
725 DebugLoc DL;
726 MachineFunction &MF = *MBB.getParent();
727 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
728 const TargetInstrInfo &TII = *Subtarget.getInstrInfo();
729 AArch64FunctionInfo &AFI = *MF.getInfo<AArch64FunctionInfo>();
730 const MachineFrameInfo &MFI = MF.getFrameInfo();
731
732 const int64_t MaxAlign = MFI.getMaxAlign().value();
733 const uint64_t AndMask = ~(MaxAlign - 1);
734
735 if (!Subtarget.getTargetLowering()->hasInlineStackProbe(MF)) {
736 Register TargetReg = RealignmentPadding
737 ? findScratchNonCalleeSaveRegister(&MBB)
738 : AArch64::SP;
739 // SUB Xd/SP, SP, AllocSize
740 emitFrameOffset(MBB, MBBI, DL, TargetReg, AArch64::SP, -AllocSize, &TII,
741 MachineInstr::FrameSetup, false, NeedsWinCFI, HasWinCFI,
742 EmitCFI, InitialOffset);
743
744 if (RealignmentPadding) {
745 // AND SP, X9, 0b11111...0000
746 BuildMI(MBB, MBBI, DL, TII.get(AArch64::ANDXri), AArch64::SP)
747 .addReg(TargetReg, RegState::Kill)
750 AFI.setStackRealigned(true);
751
752 // No need for SEH instructions here; if we're realigning the stack,
753 // we've set a frame pointer and already finished the SEH prologue.
754 assert(!NeedsWinCFI);
755 }
756 return;
757 }
758
759 //
760 // Stack probing allocation.
761 //
762
763 // Fixed length allocation. If we don't need to re-align the stack and don't
764 // have SVE objects, we can use a more efficient sequence for stack probing.
765 if (AllocSize.getScalable() == 0 && RealignmentPadding == 0) {
766 Register ScratchReg = findScratchNonCalleeSaveRegister(&MBB);
767 assert(ScratchReg != AArch64::NoRegister);
768 BuildMI(MBB, MBBI, DL, TII.get(AArch64::PROBED_STACKALLOC))
769 .addDef(ScratchReg)
770 .addImm(AllocSize.getFixed())
771 .addImm(InitialOffset.getFixed())
772 .addImm(InitialOffset.getScalable());
773 // The fixed allocation may leave unprobed bytes at the top of the
774 // stack. If we have subsequent allocations (e.g. if we have variable-sized
775 // objects), we need to issue an extra probe, so these allocations start in
776 // a known state.
777 if (FollowupAllocs) {
778 // STR XZR, [SP]
779 BuildMI(MBB, MBBI, DL, TII.get(AArch64::STRXui))
780 .addReg(AArch64::XZR)
781 .addReg(AArch64::SP)
782 .addImm(0)
784 }
785
786 return;
787 }
788
789 // Variable length allocation.
790
791 // If the (unknown) allocation size cannot exceed the probe size, decrement
792 // the stack pointer right away.
793 int64_t ProbeSize = AFI.getStackProbeSize();
794 if (upperBound(AllocSize) + RealignmentPadding <= ProbeSize) {
795 Register ScratchReg = RealignmentPadding
796 ? findScratchNonCalleeSaveRegister(&MBB)
797 : AArch64::SP;
798 assert(ScratchReg != AArch64::NoRegister);
799 // SUB Xd, SP, AllocSize
800 emitFrameOffset(MBB, MBBI, DL, ScratchReg, AArch64::SP, -AllocSize, &TII,
801 MachineInstr::FrameSetup, false, NeedsWinCFI, HasWinCFI,
802 EmitCFI, InitialOffset);
803 if (RealignmentPadding) {
804 // AND SP, Xn, 0b11111...0000
805 BuildMI(MBB, MBBI, DL, TII.get(AArch64::ANDXri), AArch64::SP)
806 .addReg(ScratchReg, RegState::Kill)
809 AFI.setStackRealigned(true);
810 }
811 if (FollowupAllocs || upperBound(AllocSize) + RealignmentPadding >
813 // STR XZR, [SP]
814 BuildMI(MBB, MBBI, DL, TII.get(AArch64::STRXui))
815 .addReg(AArch64::XZR)
816 .addReg(AArch64::SP)
817 .addImm(0)
819 }
820 return;
821 }
822
823 // Emit a variable-length allocation probing loop.
824 // TODO: As an optimisation, the loop can be "unrolled" into a few parts,
825 // each of them guaranteed to adjust the stack by less than the probe size.
826 Register TargetReg = findScratchNonCalleeSaveRegister(&MBB);
827 assert(TargetReg != AArch64::NoRegister);
828 // SUB Xd, SP, AllocSize
829 emitFrameOffset(MBB, MBBI, DL, TargetReg, AArch64::SP, -AllocSize, &TII,
830 MachineInstr::FrameSetup, false, NeedsWinCFI, HasWinCFI,
831 EmitCFI, InitialOffset);
832 if (RealignmentPadding) {
833 // AND Xn, Xn, 0b11111...0000
834 BuildMI(MBB, MBBI, DL, TII.get(AArch64::ANDXri), TargetReg)
835 .addReg(TargetReg, RegState::Kill)
838 }
839
840 BuildMI(MBB, MBBI, DL, TII.get(AArch64::PROBED_STACKALLOC_VAR))
841 .addReg(TargetReg);
842 if (EmitCFI) {
843 // Set the CFA register back to SP.
844 CFIInstBuilder(MBB, MBBI, MachineInstr::FrameSetup)
845 .buildDefCFARegister(AArch64::SP);
846 }
847 if (RealignmentPadding)
848 AFI.setStackRealigned(true);
849}
850
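/// Map a register to the register that should actually be cleared: the
/// 64-bit GPR (Xn) for any W/X alias, or the widest FP/vector register
/// (Zn when SVE is available, otherwise Qn) for any B/H/S/D/Q alias.
/// Returns 0 for registers the called routine is expected to preserve.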
852 switch (Reg.id()) {
853 default:
854 // The called routine is expected to preserve x19-x28;
855 // x29 and x30 are used as the frame pointer and link register, respectively.
856 return 0;
857
858 // GPRs
859#define CASE(n) \
860 case AArch64::W##n: \
861 case AArch64::X##n: \
862 return AArch64::X##n
863 CASE(0);
864 CASE(1);
865 CASE(2);
866 CASE(3);
867 CASE(4);
868 CASE(5);
869 CASE(6);
870 CASE(7);
871 CASE(8);
872 CASE(9);
873 CASE(10);
874 CASE(11);
875 CASE(12);
876 CASE(13);
877 CASE(14);
878 CASE(15);
879 CASE(16);
880 CASE(17);
881 CASE(18);
882#undef CASE
883
884 // FPRs
885#define CASE(n) \
886 case AArch64::B##n: \
887 case AArch64::H##n: \
888 case AArch64::S##n: \
889 case AArch64::D##n: \
890 case AArch64::Q##n: \
891 return HasSVE ? AArch64::Z##n : AArch64::Q##n
892 CASE(0);
893 CASE(1);
894 CASE(2);
895 CASE(3);
896 CASE(4);
897 CASE(5);
898 CASE(6);
899 CASE(7);
900 CASE(8);
901 CASE(9);
902 CASE(10);
903 CASE(11);
904 CASE(12);
905 CASE(13);
906 CASE(14);
907 CASE(15);
908 CASE(16);
909 CASE(17);
910 CASE(18);
911 CASE(19);
912 CASE(20);
913 CASE(21);
914 CASE(22);
915 CASE(23);
916 CASE(24);
917 CASE(25);
918 CASE(26);
919 CASE(27);
920 CASE(28);
921 CASE(29);
922 CASE(30);
923 CASE(31);
924#undef CASE
925 }
926}
927
928void AArch64FrameLowering::emitZeroCallUsedRegs(BitVector RegsToZero,
929 MachineBasicBlock &MBB) const {
930 // Insertion point.
932
933 // Fake a debug loc.
934 DebugLoc DL;
935 if (MBBI != MBB.end())
936 DL = MBBI->getDebugLoc();
937
938 const MachineFunction &MF = *MBB.getParent();
939 const AArch64Subtarget &STI = MF.getSubtarget<AArch64Subtarget>();
940 const AArch64RegisterInfo &TRI = *STI.getRegisterInfo();
941
942 BitVector GPRsToZero(TRI.getNumRegs());
943 BitVector FPRsToZero(TRI.getNumRegs());
944 bool HasSVE = STI.isSVEorStreamingSVEAvailable();
945 for (MCRegister Reg : RegsToZero.set_bits()) {
946 if (TRI.isGeneralPurposeRegister(MF, Reg)) {
947 // For GPRs, we only care to clear out the 64-bit register.
948 if (MCRegister XReg = getRegisterOrZero(Reg, HasSVE))
949 GPRsToZero.set(XReg);
950 } else if (AArch64InstrInfo::isFpOrNEON(Reg)) {
951 // For FPRs, clear the widest aliasing register (Q, or Z when SVE is available).
952 if (MCRegister XReg = getRegisterOrZero(Reg, HasSVE))
953 FPRsToZero.set(XReg);
954 }
955 }
956
957 const AArch64InstrInfo &TII = *STI.getInstrInfo();
958
959 // Zero out GPRs.
960 for (MCRegister Reg : GPRsToZero.set_bits())
961 TII.buildClearRegister(Reg, MBB, MBBI, DL);
962
963 // Zero out FP/vector registers.
964 for (MCRegister Reg : FPRsToZero.set_bits())
965 TII.buildClearRegister(Reg, MBB, MBBI, DL);
966
967 if (HasSVE) {
968 for (MCRegister PReg :
969 {AArch64::P0, AArch64::P1, AArch64::P2, AArch64::P3, AArch64::P4,
970 AArch64::P5, AArch64::P6, AArch64::P7, AArch64::P8, AArch64::P9,
971 AArch64::P10, AArch64::P11, AArch64::P12, AArch64::P13, AArch64::P14,
972 AArch64::P15}) {
973 if (RegsToZero[PReg])
974 BuildMI(MBB, MBBI, DL, TII.get(AArch64::PFALSE), PReg);
975 }
976 }
977}
978
979bool AArch64FrameLowering::windowsRequiresStackProbe(
980 const MachineFunction &MF, uint64_t StackSizeInBytes) const {
981 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
982 const AArch64FunctionInfo &MFI = *MF.getInfo<AArch64FunctionInfo>();
983 // TODO: When implementing stack protectors, take that into account
984 // for the probe threshold.
985 return Subtarget.isTargetWindows() && MFI.hasStackProbing() &&
986 StackSizeInBytes >= uint64_t(MFI.getStackProbeSize());
987}
988
990 const MachineBasicBlock &MBB) {
991 const MachineFunction *MF = MBB.getParent();
992 LiveRegs.addLiveIns(MBB);
993 // Mark callee saved registers as used so we will not choose them.
994 const MCPhysReg *CSRegs = MF->getRegInfo().getCalleeSavedRegs();
995 for (unsigned i = 0; CSRegs[i]; ++i)
996 LiveRegs.addReg(CSRegs[i]);
997}
998
1000AArch64FrameLowering::findScratchNonCalleeSaveRegister(MachineBasicBlock *MBB,
1001 bool HasCall) const {
1002 MachineFunction *MF = MBB->getParent();
1003
1004 // If MBB is an entry block, use X9 as the scratch register.
1005 // preserve_none functions may be using X9 to pass arguments,
1006 // so prefer to pick an available register below.
1007 if (&MF->front() == MBB &&
1009 return AArch64::X9;
1010
1011 const AArch64Subtarget &Subtarget = MF->getSubtarget<AArch64Subtarget>();
1012 const AArch64RegisterInfo &TRI = *Subtarget.getRegisterInfo();
1013 LivePhysRegs LiveRegs(TRI);
1014 getLiveRegsForEntryMBB(LiveRegs, *MBB);
1015 if (HasCall) {
1016 LiveRegs.addReg(AArch64::X16);
1017 LiveRegs.addReg(AArch64::X17);
1018 LiveRegs.addReg(AArch64::X18);
1019 }
1020
1021 // Prefer X9 since it was historically used for the prologue scratch reg.
1022 const MachineRegisterInfo &MRI = MF->getRegInfo();
1023 if (LiveRegs.available(MRI, AArch64::X9))
1024 return AArch64::X9;
1025
1026 for (unsigned Reg : AArch64::GPR64RegClass) {
1027 if (LiveRegs.available(MRI, Reg))
1028 return Reg;
1029 }
1030 return AArch64::NoRegister;
1031}
1032
1034 const MachineBasicBlock &MBB) const {
1035 const MachineFunction *MF = MBB.getParent();
1036 MachineBasicBlock *TmpMBB = const_cast<MachineBasicBlock *>(&MBB);
1037 const AArch64Subtarget &Subtarget = MF->getSubtarget<AArch64Subtarget>();
1038 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
1039 const AArch64TargetLowering *TLI = Subtarget.getTargetLowering();
1041
1042 if (AFI->hasSwiftAsyncContext()) {
1043 const AArch64RegisterInfo &TRI = *Subtarget.getRegisterInfo();
1044 const MachineRegisterInfo &MRI = MF->getRegInfo();
1047 // The StoreSwiftAsyncContext clobbers X16 and X17. Make sure they are
1048 // available.
1049 if (!LiveRegs.available(MRI, AArch64::X16) ||
1050 !LiveRegs.available(MRI, AArch64::X17))
1051 return false;
1052 }
1053
1054 // Certain stack probing sequences might clobber flags, then we can't use
1055 // the block as a prologue if the flags register is a live-in.
1057 MBB.isLiveIn(AArch64::NZCV))
1058 return false;
1059
1060 if (RegInfo->hasStackRealignment(*MF) || TLI->hasInlineStackProbe(*MF))
1061 if (findScratchNonCalleeSaveRegister(TmpMBB) == AArch64::NoRegister)
1062 return false;
1063
1064 // May need a scratch register (for the return value) if we require making a
1065 // special call.
1066 if (requiresSaveVG(*MF) ||
1067 windowsRequiresStackProbe(*MF, std::numeric_limits<uint64_t>::max()))
1068 if (findScratchNonCalleeSaveRegister(TmpMBB, true) == AArch64::NoRegister)
1069 return false;
1070
1071 return true;
1072}
1073
1075 const Function &F = MF.getFunction();
1076 return MF.getTarget().getMCAsmInfo()->usesWindowsCFI() &&
1077 F.needsUnwindTableEntry();
1078}
1079
1080bool AArch64FrameLowering::shouldSignReturnAddressEverywhere(
1081 const MachineFunction &MF) const {
1082 // FIXME: With WinCFI, extra care should be taken to place SEH_PACSignLR
1083 // and SEH_EpilogEnd instructions in the correct order.
1085 return false;
1087 bool SignReturnAddressAll = AFI->shouldSignReturnAddress(/*SpillsLR=*/false);
1088 return SignReturnAddressAll;
1089}
1090
1091bool AArch64FrameLowering::shouldCombineCSRLocalStackBump(
1092 MachineFunction &MF, uint64_t StackBumpBytes) const {
1094 const MachineFrameInfo &MFI = MF.getFrameInfo();
1095 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1096 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
1097 if (homogeneousPrologEpilog(MF))
1098 return false;
1099
1100 if (AFI->getLocalStackSize() == 0)
1101 return false;
1102
1103 // For WinCFI, if optimizing for size, prefer to not combine the stack bump
1104 // (to force a stp with predecrement) to match the packed unwind format,
1105 // provided that there actually are any callee saved registers to merge the
1106 // decrement with.
1107 // This is potentially marginally slower, but allows using the packed
1108 // unwind format for functions that both have a local area and callee saved
1109 // registers. Using the packed unwind format notably reduces the size of
1110 // the unwind info.
1111 if (needsWinCFI(MF) && AFI->getCalleeSavedStackSize() > 0 &&
1112 MF.getFunction().hasOptSize())
1113 return false;
1114
1115 // 512 is the maximum immediate for stp/ldp that will be used for
1116 // callee-save save/restores
1117 if (StackBumpBytes >= 512 || windowsRequiresStackProbe(MF, StackBumpBytes))
1118 return false;
1119
1120 if (MFI.hasVarSizedObjects())
1121 return false;
1122
1123 if (RegInfo->hasStackRealignment(MF))
1124 return false;
1125
1126 // This isn't strictly necessary, but it simplifies things a bit since the
1127 // current RedZone handling code assumes the SP is adjusted by the
1128 // callee-save save/restore code.
1129 if (canUseRedZone(MF))
1130 return false;
1131
1132 // When there is an SVE area on the stack, always allocate the
1133 // callee-saves and spills/locals separately.
1134 if (getSVEStackSize(MF))
1135 return false;
1136
1137 return true;
1138}
1139
1140bool AArch64FrameLowering::shouldCombineCSRLocalStackBumpInEpilogue(
1141 MachineBasicBlock &MBB, uint64_t StackBumpBytes) const {
1142 if (!shouldCombineCSRLocalStackBump(*MBB.getParent(), StackBumpBytes))
1143 return false;
1144 if (MBB.empty())
1145 return true;
1146
1147 // Disable combined SP bump if the last instruction is an MTE tag store. It
1148 // is almost always better to merge SP adjustment into those instructions.
1151 while (LastI != Begin) {
1152 --LastI;
1153 if (LastI->isTransient())
1154 continue;
1155 if (!LastI->getFlag(MachineInstr::FrameDestroy))
1156 break;
1157 }
1158 switch (LastI->getOpcode()) {
1159 case AArch64::STGloop:
1160 case AArch64::STZGloop:
1161 case AArch64::STGi:
1162 case AArch64::STZGi:
1163 case AArch64::ST2Gi:
1164 case AArch64::STZ2Gi:
1165 return false;
1166 default:
1167 return true;
1168 }
1169 llvm_unreachable("unreachable");
1170}
1171
1172// Given a load or a store instruction, generate an appropriate unwinding SEH
1173// code on Windows.
1175 const TargetInstrInfo &TII,
1176 MachineInstr::MIFlag Flag) {
1177 unsigned Opc = MBBI->getOpcode();
1178 MachineBasicBlock *MBB = MBBI->getParent();
1179 MachineFunction &MF = *MBB->getParent();
1180 DebugLoc DL = MBBI->getDebugLoc();
1181 unsigned ImmIdx = MBBI->getNumOperands() - 1;
1182 int Imm = MBBI->getOperand(ImmIdx).getImm();
1184 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1185 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
1186
1187 switch (Opc) {
1188 default:
1189 report_fatal_error("No SEH Opcode for this instruction");
1190 case AArch64::STR_ZXI:
1191 case AArch64::LDR_ZXI: {
1192 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1193 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveZReg))
1194 .addImm(Reg0)
1195 .addImm(Imm)
1196 .setMIFlag(Flag);
1197 break;
1198 }
1199 case AArch64::STR_PXI:
1200 case AArch64::LDR_PXI: {
1201 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1202 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SavePReg))
1203 .addImm(Reg0)
1204 .addImm(Imm)
1205 .setMIFlag(Flag);
1206 break;
1207 }
1208 case AArch64::LDPDpost:
1209 Imm = -Imm;
1210 [[fallthrough]];
1211 case AArch64::STPDpre: {
1212 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1213 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(2).getReg());
1214 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFRegP_X))
1215 .addImm(Reg0)
1216 .addImm(Reg1)
1217 .addImm(Imm * 8)
1218 .setMIFlag(Flag);
1219 break;
1220 }
1221 case AArch64::LDPXpost:
1222 Imm = -Imm;
1223 [[fallthrough]];
1224 case AArch64::STPXpre: {
1225 Register Reg0 = MBBI->getOperand(1).getReg();
1226 Register Reg1 = MBBI->getOperand(2).getReg();
1227 if (Reg0 == AArch64::FP && Reg1 == AArch64::LR)
1228 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFPLR_X))
1229 .addImm(Imm * 8)
1230 .setMIFlag(Flag);
1231 else
1232 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveRegP_X))
1233 .addImm(RegInfo->getSEHRegNum(Reg0))
1234 .addImm(RegInfo->getSEHRegNum(Reg1))
1235 .addImm(Imm * 8)
1236 .setMIFlag(Flag);
1237 break;
1238 }
1239 case AArch64::LDRDpost:
1240 Imm = -Imm;
1241 [[fallthrough]];
1242 case AArch64::STRDpre: {
1243 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1244 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFReg_X))
1245 .addImm(Reg)
1246 .addImm(Imm)
1247 .setMIFlag(Flag);
1248 break;
1249 }
1250 case AArch64::LDRXpost:
1251 Imm = -Imm;
1252 [[fallthrough]];
1253 case AArch64::STRXpre: {
1254 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1255 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveReg_X))
1256 .addImm(Reg)
1257 .addImm(Imm)
1258 .setMIFlag(Flag);
1259 break;
1260 }
1261 case AArch64::STPDi:
1262 case AArch64::LDPDi: {
1263 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1264 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1265 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFRegP))
1266 .addImm(Reg0)
1267 .addImm(Reg1)
1268 .addImm(Imm * 8)
1269 .setMIFlag(Flag);
1270 break;
1271 }
1272 case AArch64::STPXi:
1273 case AArch64::LDPXi: {
1274 Register Reg0 = MBBI->getOperand(0).getReg();
1275 Register Reg1 = MBBI->getOperand(1).getReg();
1276 if (Reg0 == AArch64::FP && Reg1 == AArch64::LR)
1277 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFPLR))
1278 .addImm(Imm * 8)
1279 .setMIFlag(Flag);
1280 else
1281 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveRegP))
1282 .addImm(RegInfo->getSEHRegNum(Reg0))
1283 .addImm(RegInfo->getSEHRegNum(Reg1))
1284 .addImm(Imm * 8)
1285 .setMIFlag(Flag);
1286 break;
1287 }
1288 case AArch64::STRXui:
1289 case AArch64::LDRXui: {
1290 int Reg = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1291 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveReg))
1292 .addImm(Reg)
1293 .addImm(Imm * 8)
1294 .setMIFlag(Flag);
1295 break;
1296 }
1297 case AArch64::STRDui:
1298 case AArch64::LDRDui: {
1299 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1300 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFReg))
1301 .addImm(Reg)
1302 .addImm(Imm * 8)
1303 .setMIFlag(Flag);
1304 break;
1305 }
1306 case AArch64::STPQi:
1307 case AArch64::LDPQi: {
1308 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1309 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1310 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveAnyRegQP))
1311 .addImm(Reg0)
1312 .addImm(Reg1)
1313 .addImm(Imm * 16)
1314 .setMIFlag(Flag);
1315 break;
1316 }
1317 case AArch64::LDPQpost:
1318 Imm = -Imm;
1319 [[fallthrough]];
1320 case AArch64::STPQpre: {
1321 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1322 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(2).getReg());
1323 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveAnyRegQPX))
1324 .addImm(Reg0)
1325 .addImm(Reg1)
1326 .addImm(Imm * 16)
1327 .setMIFlag(Flag);
1328 break;
1329 }
1330 }
1331 auto I = MBB->insertAfter(MBBI, MIB);
1332 return I;
1333}
1334
1335// Fix up the SEH opcode associated with the save/restore instruction.
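// For example (illustrative numbers): if a 48-byte local area is later
// folded into the callee-save SP adjustment, a previously emitted
//   SEH_SaveFPLR 16
// must become
//   SEH_SaveFPLR 64
// so the unwind info still records the correct SP-relative offset.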
1337 unsigned LocalStackSize) {
1338 MachineOperand *ImmOpnd = nullptr;
1339 unsigned ImmIdx = MBBI->getNumOperands() - 1;
1340 switch (MBBI->getOpcode()) {
1341 default:
1342 llvm_unreachable("Fix the offset in the SEH instruction");
1343 case AArch64::SEH_SaveFPLR:
1344 case AArch64::SEH_SaveRegP:
1345 case AArch64::SEH_SaveReg:
1346 case AArch64::SEH_SaveFRegP:
1347 case AArch64::SEH_SaveFReg:
1348 case AArch64::SEH_SaveAnyRegQP:
1349 case AArch64::SEH_SaveAnyRegQPX:
1350 ImmOpnd = &MBBI->getOperand(ImmIdx);
1351 break;
1352 }
1353 if (ImmOpnd)
1354 ImmOpnd->setImm(ImmOpnd->getImm() + LocalStackSize);
1355}
1356
1357bool AArch64FrameLowering::requiresGetVGCall(const MachineFunction &MF) const {
1358 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
1359 return AFI->hasStreamingModeChanges() &&
1360 !MF.getSubtarget<AArch64Subtarget>().hasSVE();
1361}
1362
1365 if (!AFI->needsDwarfUnwindInfo(MF) || !AFI->hasStreamingModeChanges())
1366 return false;
1367 // For Darwin platforms we don't save VG for non-SVE functions, even if SME
1368 // is enabled with streaming mode changes.
1369 auto &ST = MF.getSubtarget<AArch64Subtarget>();
1370 if (ST.isTargetDarwin())
1371 return ST.hasSVE();
1372 return true;
1373}
1374
1375static bool matchLibcall(const TargetLowering &TLI, const MachineOperand &MO,
1376 RTLIB::Libcall LC) {
1377 return MO.isSymbol() &&
1378 StringRef(TLI.getLibcallName(LC)) == MO.getSymbolName();
1379}
1380
1381bool AArch64FrameLowering::isVGInstruction(MachineBasicBlock::iterator MBBI,
1382 const TargetLowering &TLI) const {
1383 unsigned Opc = MBBI->getOpcode();
1384 if (Opc == AArch64::CNTD_XPiI)
1385 return true;
1386
1387 if (!requiresGetVGCall(*MBBI->getMF()))
1388 return false;
1389
1390 if (Opc == AArch64::BL)
1391 return matchLibcall(TLI, MBBI->getOperand(0), RTLIB::SMEABI_GET_CURRENT_VG);
1392
1393 return Opc == TargetOpcode::COPY;
1394}
1395
1397AArch64FrameLowering::convertCalleeSaveRestoreToSPPrePostIncDec(
1399 const DebugLoc &DL, const TargetInstrInfo *TII, int CSStackSizeInc,
1400 bool NeedsWinCFI, bool *HasWinCFI, bool EmitCFI,
1401 MachineInstr::MIFlag FrameFlag, int CFAOffset) const {
1402 unsigned NewOpc;
1403
1404 // If the function contains streaming mode changes, we expect instructions
1405 // to calculate the value of VG before spilling. Move past these instructions
1406 // if necessary.
1407 MachineFunction &MF = *MBB.getParent();
1408 if (requiresSaveVG(MF)) {
1409 auto &TLI = *MF.getSubtarget().getTargetLowering();
1410 while (isVGInstruction(MBBI, TLI))
1411 ++MBBI;
1412 }
1413
1414 switch (MBBI->getOpcode()) {
1415 default:
1416 llvm_unreachable("Unexpected callee-save save/restore opcode!");
1417 case AArch64::STPXi:
1418 NewOpc = AArch64::STPXpre;
1419 break;
1420 case AArch64::STPDi:
1421 NewOpc = AArch64::STPDpre;
1422 break;
1423 case AArch64::STPQi:
1424 NewOpc = AArch64::STPQpre;
1425 break;
1426 case AArch64::STRXui:
1427 NewOpc = AArch64::STRXpre;
1428 break;
1429 case AArch64::STRDui:
1430 NewOpc = AArch64::STRDpre;
1431 break;
1432 case AArch64::STRQui:
1433 NewOpc = AArch64::STRQpre;
1434 break;
1435 case AArch64::LDPXi:
1436 NewOpc = AArch64::LDPXpost;
1437 break;
1438 case AArch64::LDPDi:
1439 NewOpc = AArch64::LDPDpost;
1440 break;
1441 case AArch64::LDPQi:
1442 NewOpc = AArch64::LDPQpost;
1443 break;
1444 case AArch64::LDRXui:
1445 NewOpc = AArch64::LDRXpost;
1446 break;
1447 case AArch64::LDRDui:
1448 NewOpc = AArch64::LDRDpost;
1449 break;
1450 case AArch64::LDRQui:
1451 NewOpc = AArch64::LDRQpost;
1452 break;
1453 }
1454 TypeSize Scale = TypeSize::getFixed(1), Width = TypeSize::getFixed(0);
1455 int64_t MinOffset, MaxOffset;
1456 bool Success = static_cast<const AArch64InstrInfo *>(TII)->getMemOpInfo(
1457 NewOpc, Scale, Width, MinOffset, MaxOffset);
1458 (void)Success;
1459 assert(Success && "unknown load/store opcode");
1460
1461 // If the first store isn't right where we want SP then we can't fold the
1462 // update in so create a normal arithmetic instruction instead.
1463 if (MBBI->getOperand(MBBI->getNumOperands() - 1).getImm() != 0 ||
1464 CSStackSizeInc < MinOffset * (int64_t)Scale.getFixedValue() ||
1465 CSStackSizeInc > MaxOffset * (int64_t)Scale.getFixedValue()) {
1466 // If we are destroying the frame, make sure we add the increment after the
1467 // last frame operation.
1468 if (FrameFlag == MachineInstr::FrameDestroy) {
1469 ++MBBI;
1470 // Also skip the SEH instruction, if needed
1471 if (NeedsWinCFI && AArch64InstrInfo::isSEHInstruction(*MBBI))
1472 ++MBBI;
1473 }
1474 emitFrameOffset(MBB, MBBI, DL, AArch64::SP, AArch64::SP,
1475 StackOffset::getFixed(CSStackSizeInc), TII, FrameFlag,
1476 false, NeedsWinCFI, HasWinCFI, EmitCFI,
1477 StackOffset::getFixed(CFAOffset));
1478
1479 return std::prev(MBBI);
1480 }
1481
1482 // Get rid of the SEH code associated with the old instruction.
1483 if (NeedsWinCFI) {
1484 auto SEH = std::next(MBBI);
1485 if (AArch64InstrInfo::isSEHInstruction(*SEH))
1486 SEH->eraseFromParent();
1487 }
1488
1489 MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII->get(NewOpc));
1490 MIB.addReg(AArch64::SP, RegState::Define);
1491
1492 // Copy all operands other than the immediate offset.
1493 unsigned OpndIdx = 0;
1494 for (unsigned OpndEnd = MBBI->getNumOperands() - 1; OpndIdx < OpndEnd;
1495 ++OpndIdx)
1496 MIB.add(MBBI->getOperand(OpndIdx));
1497
1498 assert(MBBI->getOperand(OpndIdx).getImm() == 0 &&
1499 "Unexpected immediate offset in first/last callee-save save/restore "
1500 "instruction!");
1501 assert(MBBI->getOperand(OpndIdx - 1).getReg() == AArch64::SP &&
1502 "Unexpected base register in callee-save save/restore instruction!");
1503 assert(CSStackSizeInc % Scale == 0);
1504 MIB.addImm(CSStackSizeInc / (int)Scale);
1505
1506 MIB.setMIFlags(MBBI->getFlags());
1507 MIB.setMemRefs(MBBI->memoperands());
1508
1509 // Generate a new SEH code that corresponds to the new instruction.
1510 if (NeedsWinCFI) {
1511 *HasWinCFI = true;
1512 InsertSEH(*MIB, *TII, FrameFlag);
1513 }
1514
1515 if (EmitCFI)
1516 CFIInstBuilder(MBB, MBBI, FrameFlag)
1517 .buildDefCFAOffset(CFAOffset - CSStackSizeInc);
1518
1519 return std::prev(MBB.erase(MBBI));
1520}
1521
1522void AArch64FrameLowering::fixupCalleeSaveRestoreStackOffset(
1523 MachineInstr &MI, uint64_t LocalStackSize, bool NeedsWinCFI,
1524 bool *HasWinCFI) const {
1525 if (AArch64InstrInfo::isSEHInstruction(MI))
1526 return;
1527
1528 unsigned Opc = MI.getOpcode();
1529 unsigned Scale;
1530 switch (Opc) {
1531 case AArch64::STPXi:
1532 case AArch64::STRXui:
1533 case AArch64::STPDi:
1534 case AArch64::STRDui:
1535 case AArch64::LDPXi:
1536 case AArch64::LDRXui:
1537 case AArch64::LDPDi:
1538 case AArch64::LDRDui:
1539 Scale = 8;
1540 break;
1541 case AArch64::STPQi:
1542 case AArch64::STRQui:
1543 case AArch64::LDPQi:
1544 case AArch64::LDRQui:
1545 Scale = 16;
1546 break;
1547 default:
1548 llvm_unreachable("Unexpected callee-save save/restore opcode!");
1549 }
1550
1551 unsigned OffsetIdx = MI.getNumExplicitOperands() - 1;
1552 assert(MI.getOperand(OffsetIdx - 1).getReg() == AArch64::SP &&
1553 "Unexpected base register in callee-save save/restore instruction!");
1554 // The last operand is the immediate offset that needs fixing.
1555 MachineOperand &OffsetOpnd = MI.getOperand(OffsetIdx);
1556 // All generated opcodes have scaled offsets.
1557 assert(LocalStackSize % Scale == 0);
1558 OffsetOpnd.setImm(OffsetOpnd.getImm() + LocalStackSize / Scale);
1559
1560 if (NeedsWinCFI) {
1561 *HasWinCFI = true;
1562 auto MBBI = std::next(MachineBasicBlock::iterator(MI));
1563 assert(MBBI != MI.getParent()->end() && "Expecting a valid instruction");
1564 assert(AArch64InstrInfo::isSEHInstruction(*MBBI) &&
1565 "Expecting a SEH instruction");
1566 fixupSEHOpcode(MBBI, LocalStackSize);
1567 }
1568}
1569
1570static bool isTargetWindows(const MachineFunction &MF) {
1572}
1573
1574static unsigned getStackHazardSize(const MachineFunction &MF) {
1575 return MF.getSubtarget<AArch64Subtarget>().getStreamingHazardSize();
1576}
1577
1578// Convenience function to determine whether I is an SVE callee save.
1579bool AArch64FrameLowering::isSVECalleeSave(
1581 switch (I->getOpcode()) {
1582 default:
1583 return false;
1584 case AArch64::PTRUE_C_B:
1585 case AArch64::LD1B_2Z_IMM:
1586 case AArch64::ST1B_2Z_IMM:
1587 case AArch64::STR_ZXI:
1588 case AArch64::STR_PXI:
1589 case AArch64::LDR_ZXI:
1590 case AArch64::LDR_PXI:
1591 case AArch64::PTRUE_B:
1592 case AArch64::CPY_ZPzI_B:
1593 case AArch64::CMPNE_PPzZI_B:
1594 return I->getFlag(MachineInstr::FrameSetup) ||
1595 I->getFlag(MachineInstr::FrameDestroy);
1596 case AArch64::SEH_SavePReg:
1597 case AArch64::SEH_SaveZReg:
1598 return true;
1599 }
1600}
1601
1603 MachineFunction &MF) const {
1604 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1605 const TargetInstrInfo *TII = Subtarget.getInstrInfo();
1606
1607 auto EmitSignRA = [&](MachineBasicBlock &MBB) {
1608 DebugLoc DL; // Set debug location to unknown.
1610
1611 BuildMI(MBB, MBBI, DL, TII->get(AArch64::PAUTH_PROLOGUE))
1613 };
1614
1615 auto EmitAuthRA = [&](MachineBasicBlock &MBB) {
1616 DebugLoc DL;
1617 MachineBasicBlock::iterator MBBI = MBB.getFirstTerminator();
1618 if (MBBI != MBB.end())
1619 DL = MBBI->getDebugLoc();
1620
1621 BuildMI(MBB, MBBI, DL, TII->get(AArch64::PAUTH_EPILOGUE))
1623 };
1624
1625 // This should be in sync with PEIImpl::calculateSaveRestoreBlocks.
1626 EmitSignRA(MF.front());
1627 for (MachineBasicBlock &MBB : MF) {
1628 if (MBB.isEHFuncletEntry())
1629 EmitSignRA(MBB);
1630 if (MBB.isReturnBlock())
1631 EmitAuthRA(MBB);
1632 }
1633}
1634
1636 MachineBasicBlock &MBB) const {
1637 AArch64PrologueEmitter PrologueEmitter(MF, MBB, *this);
1638 PrologueEmitter.emitPrologue();
1639}
1640
1642 MachineBasicBlock &MBB) const {
1643 AArch64EpilogueEmitter EpilogueEmitter(MF, MBB, *this);
1644 EpilogueEmitter.emitEpilogue();
1645}
1646
1649 MF.getInfo<AArch64FunctionInfo>()->needsDwarfUnwindInfo(MF);
1650}
1651
1653 return enableCFIFixup(MF) &&
1654 MF.getInfo<AArch64FunctionInfo>()->needsAsyncDwarfUnwindInfo(MF);
1655}
1656
1657/// getFrameIndexReference - Provide a base+offset reference to an FI slot for
1658/// debug info. It's the same as what we use for resolving the code-gen
1659/// references for now. FIXME: This can go wrong when references are
1660/// SP-relative and simple call frames aren't used.
1663 Register &FrameReg) const {
1665 MF, FI, FrameReg,
1666 /*PreferFP=*/
1667 MF.getFunction().hasFnAttribute(Attribute::SanitizeHWAddress) ||
1668 MF.getFunction().hasFnAttribute(Attribute::SanitizeMemTag),
1669 /*ForSimm=*/false);
1670}
1671
1674 int FI) const {
1675 // This function serves to provide a comparable offset from a single reference
1676 // point (the value of SP at function entry) that can be used for analysis,
1677 // e.g. the stack-frame-layout analysis pass. It is not guaranteed to be
1678 // correct for all objects in the presence of VLA-area objects or dynamic
1679 // stack re-alignment.
1680
1681 const auto &MFI = MF.getFrameInfo();
1682
1683 int64_t ObjectOffset = MFI.getObjectOffset(FI);
1684 StackOffset SVEStackSize = getSVEStackSize(MF);
1685
1686 // For VLA-area objects, just emit an offset at the end of the stack frame.
1687 // Whilst not quite correct, these objects do live at the end of the frame and
1688 // so it is more useful for analysis if the offset reflects this.
1689 if (MFI.isVariableSizedObjectIndex(FI)) {
1690 return StackOffset::getFixed(-((int64_t)MFI.getStackSize())) - SVEStackSize;
1691 }
1692
1693 // This is correct in the absence of any SVE stack objects.
1694 if (!SVEStackSize)
1695 return StackOffset::getFixed(ObjectOffset - getOffsetOfLocalArea());
1696
1697 const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
1698 bool FPAfterSVECalleeSaves =
1699 isTargetWindows(MF) && AFI->getSVECalleeSavedStackSize();
1700 if (MFI.getStackID(FI) == TargetStackID::ScalableVector) {
1701 if (FPAfterSVECalleeSaves &&
1702 -ObjectOffset <= (int64_t)AFI->getSVECalleeSavedStackSize())
1703 return StackOffset::getScalable(ObjectOffset);
1704 return StackOffset::get(-((int64_t)AFI->getCalleeSavedStackSize()),
1705 ObjectOffset);
1706 }
1707
1708 bool IsFixed = MFI.isFixedObjectIndex(FI);
1709 bool IsCSR =
1710 !IsFixed && ObjectOffset >= -((int)AFI->getCalleeSavedStackSize(MFI));
1711
1712 StackOffset ScalableOffset = {};
1713 if (!IsFixed && !IsCSR) {
1714 ScalableOffset = -SVEStackSize;
1715 } else if (FPAfterSVECalleeSaves && IsCSR) {
1716 ScalableOffset =
1717 -StackOffset::getScalable(AFI->getSVECalleeSavedStackSize());
1718 }
1719
1720 return StackOffset::getFixed(ObjectOffset) + ScalableOffset;
1721}
1722
1728
1729StackOffset AArch64FrameLowering::getFPOffset(const MachineFunction &MF,
1730 int64_t ObjectOffset) const {
1731 const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
1732 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1733 const Function &F = MF.getFunction();
1734 bool IsWin64 = Subtarget.isCallingConvWin64(F.getCallingConv(), F.isVarArg());
1735 unsigned FixedObject =
1736 getFixedObjectSize(MF, AFI, IsWin64, /*IsFunclet=*/false);
1737 int64_t CalleeSaveSize = AFI->getCalleeSavedStackSize(MF.getFrameInfo());
1738 int64_t FPAdjust =
1739 CalleeSaveSize - AFI->getCalleeSaveBaseToFrameRecordOffset();
1740 return StackOffset::getFixed(ObjectOffset + FixedObject + FPAdjust);
1741}
1742
1743StackOffset AArch64FrameLowering::getStackOffset(const MachineFunction &MF,
1744 int64_t ObjectOffset) const {
1745 const auto &MFI = MF.getFrameInfo();
1746 return StackOffset::getFixed(ObjectOffset + (int64_t)MFI.getStackSize());
1747}
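// Relationship sketch (not part of the upstream file, ignoring scalable
// areas): for the same ObjectOffset, the two helpers above differ by
//
//   getStackOffset(MF, Off) - getFPOffset(MF, Off)
//       == StackSize - (FixedObject + CalleeSaveSize
//                       - CalleeSaveBaseToFrameRecordOffset)
//
// which is the distance from the post-prologue SP up to the frame record,
// i.e. the offset between the two base registers the results are measured
// from.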
1748
1749// TODO: This function currently does not work for scalable vectors.
1750int AArch64FrameLowering::getSEHFrameIndexOffset(const MachineFunction &MF,
1751 int FI) const {
1752 const auto *RegInfo = static_cast<const AArch64RegisterInfo *>(
1753 MF.getSubtarget().getRegisterInfo());
1754 int ObjectOffset = MF.getFrameInfo().getObjectOffset(FI);
1755 return RegInfo->getLocalAddressRegister(MF) == AArch64::FP
1756 ? getFPOffset(MF, ObjectOffset).getFixed()
1757 : getStackOffset(MF, ObjectOffset).getFixed();
1758}
1759
1760StackOffset AArch64FrameLowering::resolveFrameIndexReference(
1761 const MachineFunction &MF, int FI, Register &FrameReg, bool PreferFP,
1762 bool ForSimm) const {
1763 const auto &MFI = MF.getFrameInfo();
1764 int64_t ObjectOffset = MFI.getObjectOffset(FI);
1765 bool isFixed = MFI.isFixedObjectIndex(FI);
1766 bool isSVE = MFI.getStackID(FI) == TargetStackID::ScalableVector;
1767 return resolveFrameOffsetReference(MF, ObjectOffset, isFixed, isSVE, FrameReg,
1768 PreferFP, ForSimm);
1769}
1770
1771StackOffset AArch64FrameLowering::resolveFrameOffsetReference(
1772 const MachineFunction &MF, int64_t ObjectOffset, bool isFixed, bool isSVE,
1773 Register &FrameReg, bool PreferFP, bool ForSimm) const {
1774 const auto &MFI = MF.getFrameInfo();
1775 const auto *RegInfo = static_cast<const AArch64RegisterInfo *>(
1776 MF.getSubtarget().getRegisterInfo());
1777 const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
1778 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1779
1780 int64_t FPOffset = getFPOffset(MF, ObjectOffset).getFixed();
1781 int64_t Offset = getStackOffset(MF, ObjectOffset).getFixed();
1782 bool isCSR =
1783 !isFixed && ObjectOffset >= -((int)AFI->getCalleeSavedStackSize(MFI));
1784
1785 const StackOffset &SVEStackSize = getSVEStackSize(MF);
1786
1787 // Use frame pointer to reference fixed objects. Use it for locals if
1788 // there are VLAs or a dynamically realigned SP (and thus the SP isn't
1789 // reliable as a base). Make sure useFPForScavengingIndex() does the
1790 // right thing for the emergency spill slot.
1791 bool UseFP = false;
1792 if (AFI->hasStackFrame() && !isSVE) {
1793 // We shouldn't prefer using the FP to access fixed-sized stack objects when
1794 // there are scalable (SVE) objects in between the FP and the fixed-sized
1795 // objects.
1796 PreferFP &= !SVEStackSize;
1797
1798 // Note: Keeping the following as multiple 'if' statements rather than
1799 // merging to a single expression for readability.
1800 //
1801 // Argument access should always use the FP.
1802 if (isFixed) {
1803 UseFP = hasFP(MF);
1804 } else if (isCSR && RegInfo->hasStackRealignment(MF)) {
1805 // References to the CSR area must use FP if we're re-aligning the stack
1806 // since the dynamically-sized alignment padding is between the SP/BP and
1807 // the CSR area.
1808 assert(hasFP(MF) && "Re-aligned stack must have frame pointer");
1809 UseFP = true;
1810 } else if (hasFP(MF) && !RegInfo->hasStackRealignment(MF)) {
1811 // If the FPOffset is negative and we're producing a signed immediate, we
1812 // have to keep in mind that the available offset range for negative
1813 // offsets is smaller than for positive ones. If an offset is available
1814 // via the FP and the SP, use whichever is closest.
1815 bool FPOffsetFits = !ForSimm || FPOffset >= -256;
1816 PreferFP |= Offset > -FPOffset && !SVEStackSize;
1817
1818 if (FPOffset >= 0) {
1819 // If the FPOffset is positive, that'll always be best, as the SP/BP
1820 // will be even further away.
1821 UseFP = true;
1822 } else if (MFI.hasVarSizedObjects()) {
1823 // If we have variable sized objects, we can use either FP or BP, as the
1824 // SP offset is unknown. We can use the base pointer if we have one and
1825 // FP is not preferred. If not, we're stuck with using FP.
1826 bool CanUseBP = RegInfo->hasBasePointer(MF);
1827 if (FPOffsetFits && CanUseBP) // Both are ok. Pick the best.
1828 UseFP = PreferFP;
1829 else if (!CanUseBP) // Can't use BP. Forced to use FP.
1830 UseFP = true;
1831 // else we can use BP and FP, but the offset from FP won't fit.
1832 // That will make us scavenge registers which we can probably avoid by
1833 // using BP. If it won't fit for BP either, we'll scavenge anyway.
1834 } else if (MF.hasEHFunclets() && !RegInfo->hasBasePointer(MF)) {
1835 // Funclets access the locals contained in the parent's stack frame
1836 // via the frame pointer, so we have to use the FP in the parent
1837 // function.
1838 (void) Subtarget;
1839 assert(Subtarget.isCallingConvWin64(MF.getFunction().getCallingConv(),
1840 MF.getFunction().isVarArg()) &&
1841 "Funclets should only be present on Win64");
1842 UseFP = true;
1843 } else {
1844 // We have the choice between FP and (SP or BP).
1845 if (FPOffsetFits && PreferFP) // If FP is the best fit, use it.
1846 UseFP = true;
1847 }
1848 }
1849 }
1850
1851 assert(
1852 ((isFixed || isCSR) || !RegInfo->hasStackRealignment(MF) || !UseFP) &&
1853 "In the presence of dynamic stack pointer realignment, "
1854 "non-argument/CSR objects cannot be accessed through the frame pointer");
1855
1856 bool FPAfterSVECalleeSaves =
1857 isTargetWindows(MF) && AFI->getSVECalleeSavedStackSize();
1858
1859 if (isSVE) {
1860 StackOffset FPOffset =
1861 StackOffset::get(-AFI->getCalleeSaveBaseToFrameRecordOffset(), ObjectOffset);
1862 StackOffset SPOffset =
1863 SVEStackSize +
1864 StackOffset::get(MFI.getStackSize() - AFI->getCalleeSavedStackSize(),
1865 ObjectOffset);
1866 if (FPAfterSVECalleeSaves) {
1868 if (-ObjectOffset <= (int64_t)AFI->getSVECalleeSavedStackSize()) {
1871 }
1872 }
1873 // Always use the FP for SVE spills if available and beneficial.
1874 if (hasFP(MF) && (SPOffset.getFixed() ||
1875 FPOffset.getScalable() < SPOffset.getScalable() ||
1876 RegInfo->hasStackRealignment(MF))) {
1877 FrameReg = RegInfo->getFrameRegister(MF);
1878 return FPOffset;
1879 }
1880
1881 FrameReg = RegInfo->hasBasePointer(MF) ? RegInfo->getBaseRegister()
1882 : (unsigned)AArch64::SP;
1883 return SPOffset;
1884 }
1885
1886 StackOffset ScalableOffset = {};
1887 if (FPAfterSVECalleeSaves) {
1888 // In this stack layout, the FP is in between the callee saves and other
1889 // SVE allocations.
1890 StackOffset SVECalleeSavedStack =
1891 StackOffset::getScalable(AFI->getSVECalleeSavedStackSize());
1892 if (UseFP) {
1893 if (isFixed)
1894 ScalableOffset = SVECalleeSavedStack;
1895 else if (!isCSR)
1896 ScalableOffset = SVECalleeSavedStack - SVEStackSize;
1897 } else {
1898 if (isFixed)
1899 ScalableOffset = SVEStackSize;
1900 else if (isCSR)
1901 ScalableOffset = SVEStackSize - SVECalleeSavedStack;
1902 }
1903 } else {
1904 if (UseFP && !(isFixed || isCSR))
1905 ScalableOffset = -SVEStackSize;
1906 if (!UseFP && (isFixed || isCSR))
1907 ScalableOffset = SVEStackSize;
1908 }
1909
1910 if (UseFP) {
1911 FrameReg = RegInfo->getFrameRegister(MF);
1912 return StackOffset::getFixed(FPOffset) + ScalableOffset;
1913 }
1914
1915 // Use the base pointer if we have one.
1916 if (RegInfo->hasBasePointer(MF))
1917 FrameReg = RegInfo->getBaseRegister();
1918 else {
1919 assert(!MFI.hasVarSizedObjects() &&
1920 "Can't use SP when we have var sized objects.");
1921 FrameReg = AArch64::SP;
1922 // If we're using the red zone for this function, the SP won't actually
1923 // be adjusted, so the offsets will be negative. They're also all
1924 // within range of the signed 9-bit immediate instructions.
1925 if (canUseRedZone(MF))
1926 Offset -= AFI->getLocalStackSize();
1927 }
1928
1929 return StackOffset::getFixed(Offset) + ScalableOffset;
1930}
1931
1932static unsigned getPrologueDeath(MachineFunction &MF, unsigned Reg) {
1933 // Do not set a kill flag on values that are also marked as live-in. This
1934 // happens with the @llvm-returnaddress intrinsic and with arguments passed in
1935 // callee saved registers.
1936 // Omitting the kill flags is conservatively correct even if the live-in
1937 // is not used after all.
1938 bool IsLiveIn = MF.getRegInfo().isLiveIn(Reg);
1939 return getKillRegState(!IsLiveIn);
1940}
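// Usage sketch (illustrative; it mirrors the calls made further down in this
// file): the helper supplies the kill-state flag when a callee-save store is
// built in the prologue, e.g.
//
//   BuildMI(MBB, MI, DL, TII.get(AArch64::STRXui))
//       .addReg(Reg, getPrologueDeath(MF, Reg))
//       .addReg(AArch64::SP)
//       .addImm(Offset);
//
// If Reg is also a function live-in (for example an argument register that is
// callee-saved), no kill flag is added.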
1941
1942static bool produceCompactUnwindFrame(const AArch64FrameLowering &AFL,
1943 MachineFunction &MF) {
1944 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1945 AttributeList Attrs = MF.getFunction().getAttributes();
1946 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
1947 return Subtarget.isTargetMachO() &&
1948 !(Subtarget.getTargetLowering()->supportSwiftError() &&
1949 Attrs.hasAttrSomewhere(Attribute::SwiftError)) &&
1950 MF.getFunction().getCallingConv() != CallingConv::SwiftTail &&
1951 !AFL.requiresSaveVG(MF) && !AFI->isSVECC();
1952}
1953
1954static bool invalidateWindowsRegisterPairing(unsigned Reg1, unsigned Reg2,
1955 bool NeedsWinCFI, bool IsFirst,
1956 const TargetRegisterInfo *TRI) {
1957 // If we are generating register pairs for a Windows function that requires
1958 // EH support, then pair consecutive registers only. There are no unwind
1959 // opcodes for saves/restores of non-consecutive register pairs.
1960 // The unwind opcodes are save_regp, save_regp_x, save_fregp, save_fregp_x,
1961 // save_lrpair.
1962 // https://docs.microsoft.com/en-us/cpp/build/arm64-exception-handling
1963
1964 if (Reg2 == AArch64::FP)
1965 return true;
1966 if (!NeedsWinCFI)
1967 return false;
1968 if (TRI->getEncodingValue(Reg2) == TRI->getEncodingValue(Reg1) + 1)
1969 return false;
1970 // If pairing a GPR with LR, the pair can be described by the save_lrpair
1971 // opcode. If this is the first register pair, it would end up with a
1972 // predecrement, but there's no save_lrpair_x opcode, so we can only do this
1973 // if LR is paired with something else than the first register.
1974 // The save_lrpair opcode requires the first register to be an odd one.
1975 if (Reg1 >= AArch64::X19 && Reg1 <= AArch64::X27 &&
1976 (Reg1 - AArch64::X19) % 2 == 0 && Reg2 == AArch64::LR && !IsFirst)
1977 return false;
1978 return true;
1979}
1980
1981/// Returns true if Reg1 and Reg2 cannot be paired using a ldp/stp instruction.
1982/// WindowsCFI requires that only consecutive registers can be paired.
1983/// LR and FP need to be allocated together when the frame needs to save
1984/// the frame-record. This means any other register pairing with LR is invalid.
1985static bool invalidateRegisterPairing(unsigned Reg1, unsigned Reg2,
1986 bool UsesWinAAPCS, bool NeedsWinCFI,
1987 bool NeedsFrameRecord, bool IsFirst,
1988 const TargetRegisterInfo *TRI) {
1989 if (UsesWinAAPCS)
1990 return invalidateWindowsRegisterPairing(Reg1, Reg2, NeedsWinCFI, IsFirst,
1991 TRI);
1992
1993 // If we need to store the frame record, don't pair any register
1994 // with LR other than FP.
1995 if (NeedsFrameRecord)
1996 return Reg2 == AArch64::LR;
1997
1998 return false;
1999}
2000
2001namespace {
2002
2003struct RegPairInfo {
2004 unsigned Reg1 = AArch64::NoRegister;
2005 unsigned Reg2 = AArch64::NoRegister;
2006 int FrameIdx;
2007 int Offset;
2008 enum RegType { GPR, FPR64, FPR128, PPR, ZPR, VG } Type;
2009 const TargetRegisterClass *RC;
2010
2011 RegPairInfo() = default;
2012
2013 bool isPaired() const { return Reg2 != AArch64::NoRegister; }
2014
2015 bool isScalable() const { return Type == PPR || Type == ZPR; }
2016};
2017
2018} // end anonymous namespace
2019
2020unsigned findFreePredicateReg(BitVector &SavedRegs) {
2021 for (unsigned PReg = AArch64::P8; PReg <= AArch64::P15; ++PReg) {
2022 if (SavedRegs.test(PReg)) {
2023 unsigned PNReg = PReg - AArch64::P0 + AArch64::PN0;
2024 return PNReg;
2025 }
2026 }
2027 return AArch64::NoRegister;
2028}
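// Usage sketch (illustrative, matching the call made later in
// determineCalleeSaves()): pick a predicate-as-counter register for the
// SME2/SVE2p1 multi-vector spills and fills once the callee-save set is known.
//
//   unsigned PnReg = findFreePredicateReg(SavedRegs);
//   if (PnReg != AArch64::NoRegister)
//     AFI->setPredicateRegForFillSpill(PnReg);
//
// 'AFI' here is the usual AArch64FunctionInfo for the function.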
2029
2030// The multi-vector LD/ST instructions are only available on SME or SVE2p1 targets.
2031static bool enableMultiVectorSpillFill(const AArch64Subtarget &Subtarget,
2032 MachineFunction &MF) {
2033 if (DisableMultiVectorSpillFill)
2034 return false;
2035
2036 SMEAttrs FuncAttrs = MF.getInfo<AArch64FunctionInfo>()->getSMEFnAttrs();
2037 bool IsLocallyStreaming =
2038 FuncAttrs.hasStreamingBody() && !FuncAttrs.hasStreamingInterface();
2039
2040 // SME2 instructions can only be used safely while in streaming mode.
2041 // It is not safe to use SME2 instructions in streaming-compatible or
2042 // locally streaming mode.
2043 return Subtarget.hasSVE2p1() ||
2044 (Subtarget.hasSME2() &&
2045 (!IsLocallyStreaming && Subtarget.isStreaming()));
2046}
2047
2048static void computeCalleeSaveRegisterPairs(const AArch64FrameLowering &AFL,
2049 MachineFunction &MF,
2050 ArrayRef<CalleeSavedInfo> CSI,
2051 const TargetRegisterInfo *TRI,
2052 SmallVectorImpl<RegPairInfo> &RegPairs,
2053 bool NeedsFrameRecord) {
2054
2055 if (CSI.empty())
2056 return;
2057
2058 bool IsWindows = isTargetWindows(MF);
2059 bool NeedsWinCFI = AFL.needsWinCFI(MF);
2060 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
2061 unsigned StackHazardSize = getStackHazardSize(MF);
2062 MachineFrameInfo &MFI = MF.getFrameInfo();
2063 CallingConv::ID CC = MF.getFunction().getCallingConv();
2064 unsigned Count = CSI.size();
2065 (void)CC;
2066 // MachO's compact unwind format relies on all registers being stored in
2067 // pairs.
2068 assert((!produceCompactUnwindFrame(AFL, MF) ||
2071 (Count & 1) == 0) &&
2072 "Odd number of callee-saved regs to spill!");
2073 int ByteOffset = AFI->getCalleeSavedStackSize();
2074 int StackFillDir = -1;
2075 int RegInc = 1;
2076 unsigned FirstReg = 0;
2077 if (NeedsWinCFI) {
2078 // For WinCFI, fill the stack from the bottom up.
2079 ByteOffset = 0;
2080 StackFillDir = 1;
2081 // As the CSI array is reversed to match PrologEpilogInserter, iterate
2082 // backwards, to pair up registers starting from lower numbered registers.
2083 RegInc = -1;
2084 FirstReg = Count - 1;
2085 }
2086 bool FPAfterSVECalleeSaves = IsWindows && AFI->getSVECalleeSavedStackSize();
2087 int ScalableByteOffset =
2088 FPAfterSVECalleeSaves ? 0 : AFI->getSVECalleeSavedStackSize();
2089 bool NeedGapToAlignStack = AFI->hasCalleeSaveStackFreeSpace();
2090 Register LastReg = 0;
2091
2092 // When iterating backwards, the loop condition relies on unsigned wraparound.
2093 for (unsigned i = FirstReg; i < Count; i += RegInc) {
2094 RegPairInfo RPI;
2095 RPI.Reg1 = CSI[i].getReg();
2096
2097 if (AArch64::GPR64RegClass.contains(RPI.Reg1)) {
2098 RPI.Type = RegPairInfo::GPR;
2099 RPI.RC = &AArch64::GPR64RegClass;
2100 } else if (AArch64::FPR64RegClass.contains(RPI.Reg1)) {
2101 RPI.Type = RegPairInfo::FPR64;
2102 RPI.RC = &AArch64::FPR64RegClass;
2103 } else if (AArch64::FPR128RegClass.contains(RPI.Reg1)) {
2104 RPI.Type = RegPairInfo::FPR128;
2105 RPI.RC = &AArch64::FPR128RegClass;
2106 } else if (AArch64::ZPRRegClass.contains(RPI.Reg1)) {
2107 RPI.Type = RegPairInfo::ZPR;
2108 RPI.RC = &AArch64::ZPRRegClass;
2109 } else if (AArch64::PPRRegClass.contains(RPI.Reg1)) {
2110 RPI.Type = RegPairInfo::PPR;
2111 RPI.RC = &AArch64::PPRRegClass;
2112 } else if (RPI.Reg1 == AArch64::VG) {
2113 RPI.Type = RegPairInfo::VG;
2114 RPI.RC = &AArch64::FIXED_REGSRegClass;
2115 } else {
2116 llvm_unreachable("Unsupported register class.");
2117 }
2118
2119 // Add the stack hazard size as we transition from GPR->FPR CSRs.
2120 if (AFI->hasStackHazardSlotIndex() &&
2121 (!LastReg || !AArch64InstrInfo::isFpOrNEON(LastReg)) &&
2122 AArch64InstrInfo::isFpOrNEON(RPI.Reg1))
2123 ByteOffset += StackFillDir * StackHazardSize;
2124 LastReg = RPI.Reg1;
2125
2126 int Scale = TRI->getSpillSize(*RPI.RC);
2127 // Add the next reg to the pair if it is in the same register class.
2128 if (unsigned(i + RegInc) < Count && !AFI->hasStackHazardSlotIndex()) {
2129 MCRegister NextReg = CSI[i + RegInc].getReg();
2130 bool IsFirst = i == FirstReg;
2131 switch (RPI.Type) {
2132 case RegPairInfo::GPR:
2133 if (AArch64::GPR64RegClass.contains(NextReg) &&
2134 !invalidateRegisterPairing(RPI.Reg1, NextReg, IsWindows,
2135 NeedsWinCFI, NeedsFrameRecord, IsFirst,
2136 TRI))
2137 RPI.Reg2 = NextReg;
2138 break;
2139 case RegPairInfo::FPR64:
2140 if (AArch64::FPR64RegClass.contains(NextReg) &&
2141 !invalidateWindowsRegisterPairing(RPI.Reg1, NextReg, NeedsWinCFI,
2142 IsFirst, TRI))
2143 RPI.Reg2 = NextReg;
2144 break;
2145 case RegPairInfo::FPR128:
2146 if (AArch64::FPR128RegClass.contains(NextReg))
2147 RPI.Reg2 = NextReg;
2148 break;
2149 case RegPairInfo::PPR:
2150 break;
2151 case RegPairInfo::ZPR:
2152 if (AFI->getPredicateRegForFillSpill() != 0 &&
2153 ((RPI.Reg1 - AArch64::Z0) & 1) == 0 && (NextReg == RPI.Reg1 + 1)) {
2154 // Calculate offset of register pair to see if pair instruction can be
2155 // used.
2156 int Offset = (ScalableByteOffset + StackFillDir * 2 * Scale) / Scale;
2157 if ((-16 <= Offset && Offset <= 14) && (Offset % 2 == 0))
2158 RPI.Reg2 = NextReg;
2159 }
2160 break;
2161 case RegPairInfo::VG:
2162 break;
2163 }
2164 }
2165
2166 // GPRs and FPRs are saved in pairs of 64-bit regs. We expect the CSI
2167 // list to come in sorted by frame index so that we can issue the store
2168 // pair instructions directly. Assert if we see anything otherwise.
2169 //
2170 // The order of the registers in the list is controlled by
2171 // getCalleeSavedRegs(), so they will always be in-order, as well.
2172 assert((!RPI.isPaired() ||
2173 (CSI[i].getFrameIdx() + RegInc == CSI[i + RegInc].getFrameIdx())) &&
2174 "Out of order callee saved regs!");
2175
2176 assert((!RPI.isPaired() || !NeedsFrameRecord || RPI.Reg2 != AArch64::FP ||
2177 RPI.Reg1 == AArch64::LR) &&
2178 "FrameRecord must be allocated together with LR");
2179
2180 // Windows AAPCS has FP and LR reversed.
2181 assert((!RPI.isPaired() || !NeedsFrameRecord || RPI.Reg1 != AArch64::FP ||
2182 RPI.Reg2 == AArch64::LR) &&
2183 "FrameRecord must be allocated together with LR");
2184
2185 // MachO's compact unwind format relies on all registers being stored in
2186 // adjacent register pairs.
2187 assert((!produceCompactUnwindFrame(AFL, MF) ||
2190 (RPI.isPaired() &&
2191 ((RPI.Reg1 == AArch64::LR && RPI.Reg2 == AArch64::FP) ||
2192 RPI.Reg1 + 1 == RPI.Reg2))) &&
2193 "Callee-save registers not saved as adjacent register pair!");
2194
2195 RPI.FrameIdx = CSI[i].getFrameIdx();
2196 if (NeedsWinCFI &&
2197 RPI.isPaired()) // RPI.FrameIdx must be the lower index of the pair
2198 RPI.FrameIdx = CSI[i + RegInc].getFrameIdx();
2199
2200 // Realign the scalable offset if necessary. This is relevant when
2201 // spilling predicates on Windows.
2202 if (RPI.isScalable() && ScalableByteOffset % Scale != 0) {
2203 ScalableByteOffset = alignTo(ScalableByteOffset, Scale);
2204 }
2205
2206 int OffsetPre = RPI.isScalable() ? ScalableByteOffset : ByteOffset;
2207 assert(OffsetPre % Scale == 0);
2208
2209 if (RPI.isScalable())
2210 ScalableByteOffset += StackFillDir * (RPI.isPaired() ? 2 * Scale : Scale);
2211 else
2212 ByteOffset += StackFillDir * (RPI.isPaired() ? 2 * Scale : Scale);
2213
2214 // Swift's async context is directly before FP, so allocate an extra
2215 // 8 bytes for it.
2216 if (NeedsFrameRecord && AFI->hasSwiftAsyncContext() &&
2217 ((!IsWindows && RPI.Reg2 == AArch64::FP) ||
2218 (IsWindows && RPI.Reg2 == AArch64::LR)))
2219 ByteOffset += StackFillDir * 8;
2220
2221 // Round up size of non-pair to pair size if we need to pad the
2222 // callee-save area to ensure 16-byte alignment.
2223 if (NeedGapToAlignStack && !NeedsWinCFI && !RPI.isScalable() &&
2224 RPI.Type != RegPairInfo::FPR128 && !RPI.isPaired() &&
2225 ByteOffset % 16 != 0) {
2226 ByteOffset += 8 * StackFillDir;
2227 assert(MFI.getObjectAlign(RPI.FrameIdx) <= Align(16));
2228 // A stack frame with a gap looks like this, bottom up:
2229 // d9, d8. x21, gap, x20, x19.
2230 // Set extra alignment on the x21 object to create the gap above it.
2231 MFI.setObjectAlignment(RPI.FrameIdx, Align(16));
2232 NeedGapToAlignStack = false;
2233 }
2234
2235 int OffsetPost = RPI.isScalable() ? ScalableByteOffset : ByteOffset;
2236 assert(OffsetPost % Scale == 0);
2237 // If filling top down (default), we want the offset after incrementing it.
2238 // If filling bottom up (WinCFI) we need the original offset.
2239 int Offset = NeedsWinCFI ? OffsetPre : OffsetPost;
2240
2241 // The FP, LR pair goes 8 bytes into our expanded 24-byte slot so that the
2242 // Swift context can directly precede FP.
2243 if (NeedsFrameRecord && AFI->hasSwiftAsyncContext() &&
2244 ((!IsWindows && RPI.Reg2 == AArch64::FP) ||
2245 (IsWindows && RPI.Reg2 == AArch64::LR)))
2246 Offset += 8;
2247 RPI.Offset = Offset / Scale;
2248
2249 assert((!RPI.isPaired() ||
2250 (!RPI.isScalable() && RPI.Offset >= -64 && RPI.Offset <= 63) ||
2251 (RPI.isScalable() && RPI.Offset >= -256 && RPI.Offset <= 255)) &&
2252 "Offset out of bounds for LDP/STP immediate");
2253
2254 auto isFrameRecord = [&] {
2255 if (RPI.isPaired())
2256 return IsWindows ? RPI.Reg1 == AArch64::FP && RPI.Reg2 == AArch64::LR
2257 : RPI.Reg1 == AArch64::LR && RPI.Reg2 == AArch64::FP;
2258 // Otherwise, look for the frame record as two unpaired registers. This is
2259 // needed for -aarch64-stack-hazard-size=<val>, which disables register
2260 // pairing (as the padding may be too large for the LDP/STP offset). Note:
2261 // On Windows, this check works out as current reg == FP, next reg == LR,
2262 // and on other platforms current reg == FP, previous reg == LR. This
2263 // works out as the correct pre-increment or post-increment offsets
2264 // respectively.
2265 return i > 0 && RPI.Reg1 == AArch64::FP &&
2266 CSI[i - 1].getReg() == AArch64::LR;
2267 };
2268
2269 // Save the offset to frame record so that the FP register can point to the
2270 // innermost frame record (spilled FP and LR registers).
2271 if (NeedsFrameRecord && isFrameRecord())
2272 AFI->setCalleeSaveBaseToFrameRecordOffset(Offset);
2273
2274 RegPairs.push_back(RPI);
2275 if (RPI.isPaired())
2276 i += RegInc;
2277 }
2278 if (NeedsWinCFI) {
2279 // If we need an alignment gap in the stack, align the topmost stack
2280 // object. A stack frame with a gap looks like this, bottom up:
2281 // x19, d8. d9, gap.
2282 // Set extra alignment on the topmost stack object (the first element in
2283 // CSI, which goes top down), to create the gap above it.
2284 if (AFI->hasCalleeSaveStackFreeSpace())
2285 MFI.setObjectAlignment(CSI[0].getFrameIdx(), Align(16));
2286 // We iterated bottom up over the registers; flip RegPairs back to top
2287 // down order.
2288 std::reverse(RegPairs.begin(), RegPairs.end());
2289 }
2290}
2291
2292bool AArch64FrameLowering::spillCalleeSavedRegisters(
2293 MachineBasicBlock &MBB, MachineBasicBlock::iterator MI,
2294 ArrayRef<CalleeSavedInfo> CSI, const TargetRegisterInfo *TRI) const {
2295 MachineFunction &MF = *MBB.getParent();
2296 auto &TLI = *MF.getSubtarget<AArch64Subtarget>().getTargetLowering();
2297 const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();
2298 bool NeedsWinCFI = needsWinCFI(MF);
2299 DebugLoc DL;
2300 SmallVector<RegPairInfo, 8> RegPairs;
2301
2302 computeCalleeSaveRegisterPairs(*this, MF, CSI, TRI, RegPairs, hasFP(MF));
2303
2304 MachineRegisterInfo &MRI = MF.getRegInfo();
2305 // Refresh the reserved regs in case there are any potential changes since the
2306 // last freeze.
2307 MRI.freezeReservedRegs();
2308
2309 if (homogeneousPrologEpilog(MF)) {
2310 auto MIB = BuildMI(MBB, MI, DL, TII.get(AArch64::HOM_Prolog))
2311 .setMIFlag(MachineInstr::FrameSetup);
2312
2313 for (auto &RPI : RegPairs) {
2314 MIB.addReg(RPI.Reg1);
2315 MIB.addReg(RPI.Reg2);
2316
2317 // Update register live in.
2318 if (!MRI.isReserved(RPI.Reg1))
2319 MBB.addLiveIn(RPI.Reg1);
2320 if (RPI.isPaired() && !MRI.isReserved(RPI.Reg2))
2321 MBB.addLiveIn(RPI.Reg2);
2322 }
2323 return true;
2324 }
2325 bool PTrueCreated = false;
2326 for (const RegPairInfo &RPI : llvm::reverse(RegPairs)) {
2327 unsigned Reg1 = RPI.Reg1;
2328 unsigned Reg2 = RPI.Reg2;
2329 unsigned StrOpc;
2330
2331 // Issue sequence of spills for cs regs. The first spill may be converted
2332 // to a pre-decrement store later by emitPrologue if the callee-save stack
2333 // area allocation can't be combined with the local stack area allocation.
2334 // For example:
2335 // stp x22, x21, [sp, #0] // addImm(+0)
2336 // stp x20, x19, [sp, #16] // addImm(+2)
2337 // stp fp, lr, [sp, #32] // addImm(+4)
2338 // Rationale: This sequence saves uop updates compared to a sequence of
2339 // pre-increment spills like stp xi,xj,[sp,#-16]!
2340 // Note: Similar rationale and sequence for restores in epilog.
2341 unsigned Size = TRI->getSpillSize(*RPI.RC);
2342 Align Alignment = TRI->getSpillAlign(*RPI.RC);
2343 switch (RPI.Type) {
2344 case RegPairInfo::GPR:
2345 StrOpc = RPI.isPaired() ? AArch64::STPXi : AArch64::STRXui;
2346 break;
2347 case RegPairInfo::FPR64:
2348 StrOpc = RPI.isPaired() ? AArch64::STPDi : AArch64::STRDui;
2349 break;
2350 case RegPairInfo::FPR128:
2351 StrOpc = RPI.isPaired() ? AArch64::STPQi : AArch64::STRQui;
2352 break;
2353 case RegPairInfo::ZPR:
2354 StrOpc = RPI.isPaired() ? AArch64::ST1B_2Z_IMM : AArch64::STR_ZXI;
2355 break;
2356 case RegPairInfo::PPR:
2357 StrOpc =
2358 Size == 16 ? AArch64::SPILL_PPR_TO_ZPR_SLOT_PSEUDO : AArch64::STR_PXI;
2359 break;
2360 case RegPairInfo::VG:
2361 StrOpc = AArch64::STRXui;
2362 break;
2363 }
2364
2365 unsigned X0Scratch = AArch64::NoRegister;
2366 auto RestoreX0 = make_scope_exit([&] {
2367 if (X0Scratch != AArch64::NoRegister)
2368 BuildMI(MBB, MI, DL, TII.get(TargetOpcode::COPY), AArch64::X0)
2369 .addReg(X0Scratch)
2371 });
2372
2373 if (Reg1 == AArch64::VG) {
2374 // Find an available register to store value of VG to.
2375 Reg1 = findScratchNonCalleeSaveRegister(&MBB, true);
2376 assert(Reg1 != AArch64::NoRegister);
2377 if (MF.getSubtarget<AArch64Subtarget>().hasSVE()) {
2378 BuildMI(MBB, MI, DL, TII.get(AArch64::CNTD_XPiI), Reg1)
2379 .addImm(31)
2380 .addImm(1)
2381 .setMIFlag(MachineInstr::FrameSetup);
2382 } else {
2383 const AArch64Subtarget &STI = MF.getSubtarget<AArch64Subtarget>();
2384 if (any_of(MBB.liveins(),
2385 [&STI](const MachineBasicBlock::RegisterMaskPair &LiveIn) {
2386 return STI.getRegisterInfo()->isSuperOrSubRegisterEq(
2387 AArch64::X0, LiveIn.PhysReg);
2388 })) {
2389 X0Scratch = Reg1;
2390 BuildMI(MBB, MI, DL, TII.get(TargetOpcode::COPY), X0Scratch)
2391 .addReg(AArch64::X0)
2393 }
2394
2395 RTLIB::Libcall LC = RTLIB::SMEABI_GET_CURRENT_VG;
2396 const uint32_t *RegMask =
2397 TRI->getCallPreservedMask(MF, TLI.getLibcallCallingConv(LC));
2398 BuildMI(MBB, MI, DL, TII.get(AArch64::BL))
2399 .addExternalSymbol(TLI.getLibcallName(LC))
2400 .addRegMask(RegMask)
2401 .addReg(AArch64::X0, RegState::ImplicitDefine)
2402 .setMIFlag(MachineInstr::FrameSetup);
2403 Reg1 = AArch64::X0;
2404 }
2405 }
2406
2407 LLVM_DEBUG(dbgs() << "CSR spill: (" << printReg(Reg1, TRI);
2408 if (RPI.isPaired()) dbgs() << ", " << printReg(Reg2, TRI);
2409 dbgs() << ") -> fi#(" << RPI.FrameIdx;
2410 if (RPI.isPaired()) dbgs() << ", " << RPI.FrameIdx + 1;
2411 dbgs() << ")\n");
2412
2413 assert((!NeedsWinCFI || !(Reg1 == AArch64::LR && Reg2 == AArch64::FP)) &&
2414 "Windows unwinding requires a consecutive (FP,LR) pair");
2415 // Windows unwind codes require consecutive registers if registers are
2416 // paired. Make the switch here, so that the code below will save (x,x+1)
2417 // and not (x+1,x).
2418 unsigned FrameIdxReg1 = RPI.FrameIdx;
2419 unsigned FrameIdxReg2 = RPI.FrameIdx + 1;
2420 if (NeedsWinCFI && RPI.isPaired()) {
2421 std::swap(Reg1, Reg2);
2422 std::swap(FrameIdxReg1, FrameIdxReg2);
2423 }
2424
2425 if (RPI.isPaired() && RPI.isScalable()) {
2426 [[maybe_unused]] const AArch64Subtarget &Subtarget =
2427 MF.getSubtarget<AArch64Subtarget>();
2428 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
2429 unsigned PnReg = AFI->getPredicateRegForFillSpill();
2430 assert((PnReg != 0 && enableMultiVectorSpillFill(Subtarget, MF)) &&
2431 "Expects SVE2.1 or SME2 target and a predicate register");
2432#ifdef EXPENSIVE_CHECKS
2433 auto IsPPR = [](const RegPairInfo &c) {
2434 return c.Reg1 == RegPairInfo::PPR;
2435 };
2436 auto PPRBegin = std::find_if(RegPairs.begin(), RegPairs.end(), IsPPR);
2437 auto IsZPR = [](const RegPairInfo &c) {
2438 return c.Type == RegPairInfo::ZPR;
2439 };
2440 auto ZPRBegin = std::find_if(RegPairs.begin(), RegPairs.end(), IsZPR);
2441 assert(!(PPRBegin < ZPRBegin) &&
2442 "Expected callee save predicate to be handled first");
2443#endif
2444 if (!PTrueCreated) {
2445 PTrueCreated = true;
2446 BuildMI(MBB, MI, DL, TII.get(AArch64::PTRUE_C_B), PnReg)
2448 }
2449 MachineInstrBuilder MIB = BuildMI(MBB, MI, DL, TII.get(StrOpc));
2450 if (!MRI.isReserved(Reg1))
2451 MBB.addLiveIn(Reg1);
2452 if (!MRI.isReserved(Reg2))
2453 MBB.addLiveIn(Reg2);
2454 MIB.addReg(/*PairRegs*/ AArch64::Z0_Z1 + (RPI.Reg1 - AArch64::Z0));
2456 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
2457 MachineMemOperand::MOStore, Size, Alignment));
2458 MIB.addReg(PnReg);
2459 MIB.addReg(AArch64::SP)
2460 .addImm(RPI.Offset / 2) // [sp, #imm*2*vscale],
2461 // where 2*vscale is implicit
2464 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
2465 MachineMemOperand::MOStore, Size, Alignment));
2466 if (NeedsWinCFI)
2468 } else { // The case where a pair of ZRegs is not present.
2469 MachineInstrBuilder MIB = BuildMI(MBB, MI, DL, TII.get(StrOpc));
2470 if (!MRI.isReserved(Reg1))
2471 MBB.addLiveIn(Reg1);
2472 if (RPI.isPaired()) {
2473 if (!MRI.isReserved(Reg2))
2474 MBB.addLiveIn(Reg2);
2475 MIB.addReg(Reg2, getPrologueDeath(MF, Reg2));
2477 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
2478 MachineMemOperand::MOStore, Size, Alignment));
2479 }
2480 MIB.addReg(Reg1, getPrologueDeath(MF, Reg1))
2481 .addReg(AArch64::SP)
2482 .addImm(RPI.Offset) // [sp, #offset*vscale],
2483 // where factor*vscale is implicit
2486 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
2487 MachineMemOperand::MOStore, Size, Alignment));
2488 if (NeedsWinCFI)
2490 }
2491 // Update the StackIDs of the SVE stack slots.
2492 MachineFrameInfo &MFI = MF.getFrameInfo();
2493 if (RPI.Type == RegPairInfo::ZPR || RPI.Type == RegPairInfo::PPR) {
2494 MFI.setStackID(FrameIdxReg1, TargetStackID::ScalableVector);
2495 if (RPI.isPaired())
2496 MFI.setStackID(FrameIdxReg2, TargetStackID::ScalableVector);
2497 }
2498 }
2499 return true;
2500}
2501
2502bool AArch64FrameLowering::restoreCalleeSavedRegisters(
2503 MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI,
2504 MutableArrayRef<CalleeSavedInfo> CSI, const TargetRegisterInfo *TRI) const {
2505 MachineFunction &MF = *MBB.getParent();
2506 const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();
2507 DebugLoc DL;
2508 SmallVector<RegPairInfo, 8> RegPairs;
2509 bool NeedsWinCFI = needsWinCFI(MF);
2510
2511 if (MBBI != MBB.end())
2512 DL = MBBI->getDebugLoc();
2513
2514 computeCalleeSaveRegisterPairs(*this, MF, CSI, TRI, RegPairs, hasFP(MF));
2515 if (homogeneousPrologEpilog(MF, &MBB)) {
2516 auto MIB = BuildMI(MBB, MBBI, DL, TII.get(AArch64::HOM_Epilog))
2517 .setMIFlag(MachineInstr::FrameDestroy);
2518 for (auto &RPI : RegPairs) {
2519 MIB.addReg(RPI.Reg1, RegState::Define);
2520 MIB.addReg(RPI.Reg2, RegState::Define);
2521 }
2522 return true;
2523 }
2524
2525 // For performance reasons, restore SVE registers in increasing order.
2526 auto IsPPR = [](const RegPairInfo &c) { return c.Type == RegPairInfo::PPR; };
2527 auto PPRBegin = llvm::find_if(RegPairs, IsPPR);
2528 auto PPREnd = std::find_if_not(PPRBegin, RegPairs.end(), IsPPR);
2529 std::reverse(PPRBegin, PPREnd);
2530 auto IsZPR = [](const RegPairInfo &c) { return c.Type == RegPairInfo::ZPR; };
2531 auto ZPRBegin = llvm::find_if(RegPairs, IsZPR);
2532 auto ZPREnd = std::find_if_not(ZPRBegin, RegPairs.end(), IsZPR);
2533 std::reverse(ZPRBegin, ZPREnd);
2534
2535 bool PTrueCreated = false;
2536 for (const RegPairInfo &RPI : RegPairs) {
2537 unsigned Reg1 = RPI.Reg1;
2538 unsigned Reg2 = RPI.Reg2;
2539
2540 // Issue sequence of restores for cs regs. The last restore may be converted
2541 // to a post-increment load later by emitEpilogue if the callee-save stack
2542 // area allocation can't be combined with the local stack area allocation.
2543 // For example:
2544 // ldp fp, lr, [sp, #32] // addImm(+4)
2545 // ldp x20, x19, [sp, #16] // addImm(+2)
2546 // ldp x22, x21, [sp, #0] // addImm(+0)
2547 // Note: see comment in spillCalleeSavedRegisters()
2548 unsigned LdrOpc;
2549 unsigned Size = TRI->getSpillSize(*RPI.RC);
2550 Align Alignment = TRI->getSpillAlign(*RPI.RC);
2551 switch (RPI.Type) {
2552 case RegPairInfo::GPR:
2553 LdrOpc = RPI.isPaired() ? AArch64::LDPXi : AArch64::LDRXui;
2554 break;
2555 case RegPairInfo::FPR64:
2556 LdrOpc = RPI.isPaired() ? AArch64::LDPDi : AArch64::LDRDui;
2557 break;
2558 case RegPairInfo::FPR128:
2559 LdrOpc = RPI.isPaired() ? AArch64::LDPQi : AArch64::LDRQui;
2560 break;
2561 case RegPairInfo::ZPR:
2562 LdrOpc = RPI.isPaired() ? AArch64::LD1B_2Z_IMM : AArch64::LDR_ZXI;
2563 break;
2564 case RegPairInfo::PPR:
2565 LdrOpc = Size == 16 ? AArch64::FILL_PPR_FROM_ZPR_SLOT_PSEUDO
2566 : AArch64::LDR_PXI;
2567 break;
2568 case RegPairInfo::VG:
2569 continue;
2570 }
2571 LLVM_DEBUG(dbgs() << "CSR restore: (" << printReg(Reg1, TRI);
2572 if (RPI.isPaired()) dbgs() << ", " << printReg(Reg2, TRI);
2573 dbgs() << ") -> fi#(" << RPI.FrameIdx;
2574 if (RPI.isPaired()) dbgs() << ", " << RPI.FrameIdx + 1;
2575 dbgs() << ")\n");
2576
2577 // Windows unwind codes require consecutive registers if registers are
2578 // paired. Make the switch here, so that the code below will save (x,x+1)
2579 // and not (x+1,x).
2580 unsigned FrameIdxReg1 = RPI.FrameIdx;
2581 unsigned FrameIdxReg2 = RPI.FrameIdx + 1;
2582 if (NeedsWinCFI && RPI.isPaired()) {
2583 std::swap(Reg1, Reg2);
2584 std::swap(FrameIdxReg1, FrameIdxReg2);
2585 }
2586
2587 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
2588 if (RPI.isPaired() && RPI.isScalable()) {
2589 [[maybe_unused]] const AArch64Subtarget &Subtarget =
2590 MF.getSubtarget<AArch64Subtarget>();
2591 unsigned PnReg = AFI->getPredicateRegForFillSpill();
2592 assert((PnReg != 0 && enableMultiVectorSpillFill(Subtarget, MF)) &&
2593 "Expects SVE2.1 or SME2 target and a predicate register");
2594#ifdef EXPENSIVE_CHECKS
2595 assert(!(PPRBegin < ZPRBegin) &&
2596 "Expected callee save predicate to be handled first");
2597#endif
2598 if (!PTrueCreated) {
2599 PTrueCreated = true;
2600 BuildMI(MBB, MBBI, DL, TII.get(AArch64::PTRUE_C_B), PnReg)
2602 }
2603 MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII.get(LdrOpc));
2604 MIB.addReg(/*PairRegs*/ AArch64::Z0_Z1 + (RPI.Reg1 - AArch64::Z0),
2605 getDefRegState(true));
2607 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
2608 MachineMemOperand::MOLoad, Size, Alignment));
2609 MIB.addReg(PnReg);
2610 MIB.addReg(AArch64::SP)
2611 .addImm(RPI.Offset / 2) // [sp, #imm*2*vscale]
2612 // where 2*vscale is implicit
2615 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
2616 MachineMemOperand::MOLoad, Size, Alignment));
2617 if (NeedsWinCFI)
2619 } else {
2620 MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII.get(LdrOpc));
2621 if (RPI.isPaired()) {
2622 MIB.addReg(Reg2, getDefRegState(true));
2624 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
2625 MachineMemOperand::MOLoad, Size, Alignment));
2626 }
2627 MIB.addReg(Reg1, getDefRegState(true));
2628 MIB.addReg(AArch64::SP)
2629 .addImm(RPI.Offset) // [sp, #offset*vscale]
2630 // where factor*vscale is implicit
2633 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
2634 MachineMemOperand::MOLoad, Size, Alignment));
2635 if (NeedsWinCFI)
2637 }
2638 }
2639 return true;
2640}
2641
2642// Return the FrameID for an MMO.
2643static std::optional<int> getMMOFrameID(MachineMemOperand *MMO,
2644 const MachineFrameInfo &MFI) {
2645 auto *PSV =
2646 dyn_cast_or_null<FixedStackPseudoSourceValue>(MMO->getPseudoValue());
2647 if (PSV)
2648 return std::optional<int>(PSV->getFrameIndex());
2649
2650 if (MMO->getValue()) {
2651 if (auto *Al = dyn_cast<AllocaInst>(getUnderlyingObject(MMO->getValue()))) {
2652 for (int FI = MFI.getObjectIndexBegin(); FI < MFI.getObjectIndexEnd();
2653 FI++)
2654 if (MFI.getObjectAllocation(FI) == Al)
2655 return FI;
2656 }
2657 }
2658
2659 return std::nullopt;
2660}
2661
2662// Return the FrameID for a Load/Store instruction by looking at the first MMO.
2663static std::optional<int> getLdStFrameID(const MachineInstr &MI,
2664 const MachineFrameInfo &MFI) {
2665 if (!MI.mayLoadOrStore() || MI.getNumMemOperands() < 1)
2666 return std::nullopt;
2667
2668 return getMMOFrameID(*MI.memoperands_begin(), MFI);
2669}
2670
2671// Check if a Hazard slot is needed for the current function, and if so create
2672// one for it. The index is stored in AArch64FunctionInfo->StackHazardSlotIndex,
2673// which can be used to determine if any hazard padding is needed.
2674void AArch64FrameLowering::determineStackHazardSlot(
2675 MachineFunction &MF, BitVector &SavedRegs) const {
2676 unsigned StackHazardSize = getStackHazardSize(MF);
2677 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
2678 if (StackHazardSize == 0 || StackHazardSize % 16 != 0 ||
2679 AFI->hasStackHazardSlotIndex())
2680 return;
2681
2682 // Stack hazards are only needed in streaming functions.
2683 SMEAttrs Attrs = AFI->getSMEFnAttrs();
2684 if (!StackHazardInNonStreaming && Attrs.hasNonStreamingInterfaceAndBody())
2685 return;
2686
2687 MachineFrameInfo &MFI = MF.getFrameInfo();
2688
2689 // Add a hazard slot if there are any CSR FPR registers, or are any fp-only
2690 // stack objects.
2691 bool HasFPRCSRs = any_of(SavedRegs.set_bits(), [](unsigned Reg) {
2692 return AArch64::FPR64RegClass.contains(Reg) ||
2693 AArch64::FPR128RegClass.contains(Reg) ||
2694 AArch64::ZPRRegClass.contains(Reg) ||
2695 AArch64::PPRRegClass.contains(Reg);
2696 });
2697 bool HasFPRStackObjects = false;
2698 if (!HasFPRCSRs) {
2699 std::vector<unsigned> FrameObjects(MFI.getObjectIndexEnd());
2700 for (auto &MBB : MF) {
2701 for (auto &MI : MBB) {
2702 std::optional<int> FI = getLdStFrameID(MI, MFI);
2703 if (FI && *FI >= 0 && *FI < (int)FrameObjects.size()) {
2704 if (MFI.getStackID(*FI) == TargetStackID::ScalableVector ||
2705 AArch64InstrInfo::isFpOrNEON(MI))
2706 FrameObjects[*FI] |= 2;
2707 else
2708 FrameObjects[*FI] |= 1;
2709 }
2710 }
2711 }
2712 HasFPRStackObjects =
2713 any_of(FrameObjects, [](unsigned B) { return (B & 3) == 2; });
2714 }
2715
2716 if (HasFPRCSRs || HasFPRStackObjects) {
2717 int ID = MFI.CreateStackObject(StackHazardSize, Align(16), false);
2718 LLVM_DEBUG(dbgs() << "Created Hazard slot at " << ID << " size "
2719 << StackHazardSize << "\n");
2720 AFI->setStackHazardSlotIndex(ID);
2721 }
2722}
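// Note (illustrative, not part of the upstream file): hazard padding can also
// be forced for experiments through the -aarch64-stack-hazard-size option that
// is referenced elsewhere in this file, e.g.
//
//   llc -mtriple=aarch64 -aarch64-stack-hazard-size=1024 test.ll
//
// which places <hazard padding> between the GPR and FPR areas as shown in the
// frame layout diagram at the top of the file.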
2723
2724void AArch64FrameLowering::determineCalleeSaves(MachineFunction &MF,
2725 BitVector &SavedRegs,
2726 RegScavenger *RS) const {
2727 // All calls are tail calls in GHC calling conv, and functions have no
2728 // prologue/epilogue.
2729 if (MF.getFunction().getCallingConv() == CallingConv::GHC)
2730 return;
2731
2732 TargetFrameLowering::determineCalleeSaves(MF, SavedRegs, RS);
2733 const AArch64RegisterInfo *RegInfo = static_cast<const AArch64RegisterInfo *>(
2734 MF.getSubtarget().getRegisterInfo());
2735 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
2736 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
2737 unsigned UnspilledCSGPR = AArch64::NoRegister;
2738 unsigned UnspilledCSGPRPaired = AArch64::NoRegister;
2739
2740 MachineFrameInfo &MFI = MF.getFrameInfo();
2741 const MCPhysReg *CSRegs = MF.getRegInfo().getCalleeSavedRegs();
2742
2743 unsigned BasePointerReg = RegInfo->hasBasePointer(MF)
2744 ? RegInfo->getBaseRegister()
2745 : (unsigned)AArch64::NoRegister;
2746
2747 unsigned ExtraCSSpill = 0;
2748 bool HasUnpairedGPR64 = false;
2749 bool HasPairZReg = false;
2750 BitVector UserReservedRegs = RegInfo->getUserReservedRegs(MF);
2751 BitVector ReservedRegs = RegInfo->getReservedRegs(MF);
2752
2753 // Figure out which callee-saved registers to save/restore.
2754 for (unsigned i = 0; CSRegs[i]; ++i) {
2755 const unsigned Reg = CSRegs[i];
2756
2757 // Add the base pointer register to SavedRegs if it is callee-save.
2758 if (Reg == BasePointerReg)
2759 SavedRegs.set(Reg);
2760
2761 // Don't save manually reserved registers set through +reserve-x#i,
2762 // even for callee-saved registers, as per GCC's behavior.
2763 if (UserReservedRegs[Reg]) {
2764 SavedRegs.reset(Reg);
2765 continue;
2766 }
2767
2768 bool RegUsed = SavedRegs.test(Reg);
2769 unsigned PairedReg = AArch64::NoRegister;
2770 const bool RegIsGPR64 = AArch64::GPR64RegClass.contains(Reg);
2771 if (RegIsGPR64 || AArch64::FPR64RegClass.contains(Reg) ||
2772 AArch64::FPR128RegClass.contains(Reg)) {
2773 // Compensate for odd numbers of GP CSRs.
2774 // For now, all the known cases of odd number of CSRs are of GPRs.
2775 if (HasUnpairedGPR64)
2776 PairedReg = CSRegs[i % 2 == 0 ? i - 1 : i + 1];
2777 else
2778 PairedReg = CSRegs[i ^ 1];
2779 }
2780
2781 // If the function requires all the GP registers to save (SavedRegs),
2782 // and there are an odd number of GP CSRs at the same time (CSRegs),
2783 // PairedReg could be in a different register class from Reg, which would
2784 // lead to a FPR (usually D8) accidentally being marked saved.
2785 if (RegIsGPR64 && !AArch64::GPR64RegClass.contains(PairedReg)) {
2786 PairedReg = AArch64::NoRegister;
2787 HasUnpairedGPR64 = true;
2788 }
2789 assert(PairedReg == AArch64::NoRegister ||
2790 AArch64::GPR64RegClass.contains(Reg, PairedReg) ||
2791 AArch64::FPR64RegClass.contains(Reg, PairedReg) ||
2792 AArch64::FPR128RegClass.contains(Reg, PairedReg));
2793
2794 if (!RegUsed) {
2795 if (AArch64::GPR64RegClass.contains(Reg) && !ReservedRegs[Reg]) {
2796 UnspilledCSGPR = Reg;
2797 UnspilledCSGPRPaired = PairedReg;
2798 }
2799 continue;
2800 }
2801
2802 // Always save P4 when PPR spills are ZPR-sized and a predicate above p8 is
2803 // spilled. If all of p0-p3 are used as return values, p4 must be free
2804 // to reload p8-p15.
2805 if (RegInfo->getSpillSize(AArch64::PPRRegClass) == 16 &&
2806 AArch64::PPR_p8to15RegClass.contains(Reg)) {
2807 SavedRegs.set(AArch64::P4);
2808 }
2809
2810 // MachO's compact unwind format relies on all registers being stored in
2811 // pairs.
2812 // FIXME: the usual format is actually better if unwinding isn't needed.
2813 if (producePairRegisters(MF) && PairedReg != AArch64::NoRegister &&
2814 !SavedRegs.test(PairedReg)) {
2815 SavedRegs.set(PairedReg);
2816 if (AArch64::GPR64RegClass.contains(PairedReg) &&
2817 !ReservedRegs[PairedReg])
2818 ExtraCSSpill = PairedReg;
2819 }
2820 // Check if there is a pair of ZRegs, so it can select PReg for spill/fill
2821 HasPairZReg |= (AArch64::ZPRRegClass.contains(Reg, CSRegs[i ^ 1]) &&
2822 SavedRegs.test(CSRegs[i ^ 1]));
2823 }
2824
2825 if (HasPairZReg && enableMultiVectorSpillFill(Subtarget, MF)) {
2827 // Find a suitable predicate register for the multi-vector spill/fill
2828 // instructions.
2829 unsigned PnReg = findFreePredicateReg(SavedRegs);
2830 if (PnReg != AArch64::NoRegister)
2831 AFI->setPredicateRegForFillSpill(PnReg);
2832 // If no free callee-save has been found assign one.
2833 if (!AFI->getPredicateRegForFillSpill() &&
2834 MF.getFunction().getCallingConv() ==
2835 CallingConv::AArch64_SVE_VectorCall) {
2836 SavedRegs.set(AArch64::P8);
2837 AFI->setPredicateRegForFillSpill(AArch64::PN8);
2838 }
2839
2840 assert(!ReservedRegs[AFI->getPredicateRegForFillSpill()] &&
2841 "Predicate cannot be a reserved register");
2842 }
2843
2844 if (MF.getFunction().getCallingConv() == CallingConv::Win64 &&
2845 !Subtarget.isTargetWindows()) {
2846 // For Windows calling convention on a non-windows OS, where X18 is treated
2847 // as reserved, back up X18 when entering non-windows code (marked with the
2848 // Windows calling convention) and restore when returning regardless of
2849 // whether the individual function uses it - it might call other functions
2850 // that clobber it.
2851 SavedRegs.set(AArch64::X18);
2852 }
2853
2854 // Calculates the callee saved stack size.
2855 unsigned CSStackSize = 0;
2856 unsigned SVECSStackSize = 0;
2857 const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();
2858 for (unsigned Reg : SavedRegs.set_bits()) {
2859 auto *RC = TRI->getMinimalPhysRegClass(Reg);
2860 assert(RC && "expected register class!");
2861 auto SpillSize = TRI->getSpillSize(*RC);
2862 if (AArch64::PPRRegClass.contains(Reg) ||
2863 AArch64::ZPRRegClass.contains(Reg))
2864 SVECSStackSize += SpillSize;
2865 else
2866 CSStackSize += SpillSize;
2867 }
2868
2869 // Save number of saved regs, so we can easily update CSStackSize later to
2870 // account for any additional 64-bit GPR saves. Note: After this point
2871 // only 64-bit GPRs can be added to SavedRegs.
2872 unsigned NumSavedRegs = SavedRegs.count();
2873
2874 // Increase the callee-saved stack size if the function has streaming mode
2875 // changes, as we will need to spill the value of the VG register.
2876 if (requiresSaveVG(MF))
2877 CSStackSize += 8;
2878
2879 // Determine if a Hazard slot should be used, and increase the CSStackSize by
2880 // StackHazardSize if so.
2881 determineStackHazardSlot(MF, SavedRegs);
2882 if (AFI->hasStackHazardSlotIndex())
2883 CSStackSize += getStackHazardSize(MF);
2884
2885 // If we must call __arm_get_current_vg in the prologue preserve the LR.
2886 if (requiresSaveVG(MF) && !Subtarget.hasSVE())
2887 SavedRegs.set(AArch64::LR);
2888
2889 // The frame record needs to be created by saving the appropriate registers
2890 uint64_t EstimatedStackSize = MFI.estimateStackSize(MF);
2891 if (hasFP(MF) ||
2892 windowsRequiresStackProbe(MF, EstimatedStackSize + CSStackSize + 16)) {
2893 SavedRegs.set(AArch64::FP);
2894 SavedRegs.set(AArch64::LR);
2895 }
2896
2897 LLVM_DEBUG({
2898 dbgs() << "*** determineCalleeSaves\nSaved CSRs:";
2899 for (unsigned Reg : SavedRegs.set_bits())
2900 dbgs() << ' ' << printReg(Reg, RegInfo);
2901 dbgs() << "\n";
2902 });
2903
2904 // If any callee-saved registers are used, the frame cannot be eliminated.
2905 int64_t SVEStackSize =
2906 alignTo(SVECSStackSize + estimateSVEStackObjectOffsets(MFI), 16);
2907 bool CanEliminateFrame = (SavedRegs.count() == 0) && !SVEStackSize;
2908
2909 // The CSR spill slots have not been allocated yet, so estimateStackSize
2910 // won't include them.
2911 unsigned EstimatedStackSizeLimit = estimateRSStackSizeLimit(MF);
2912
2913 // We may address some of the stack above the canonical frame address, either
2914 // for our own arguments or during a call. Include that in calculating whether
2915 // we have complicated addressing concerns.
2916 int64_t CalleeStackUsed = 0;
2917 for (int I = MFI.getObjectIndexBegin(); I != 0; ++I) {
2918 int64_t FixedOff = MFI.getObjectOffset(I);
2919 if (FixedOff > CalleeStackUsed)
2920 CalleeStackUsed = FixedOff;
2921 }
2922
2923 // Conservatively always assume BigStack when there are SVE spills.
2924 bool BigStack = SVEStackSize || (EstimatedStackSize + CSStackSize +
2925 CalleeStackUsed) > EstimatedStackSizeLimit;
2926 if (BigStack || !CanEliminateFrame || RegInfo->cannotEliminateFrame(MF))
2927 AFI->setHasStackFrame(true);
2928
2929 // Estimate if we might need to scavenge a register at some point in order
2930 // to materialize a stack offset. If so, either spill one additional
2931 // callee-saved register or reserve a special spill slot to facilitate
2932 // register scavenging. If we already spilled an extra callee-saved register
2933 // above to keep the number of spills even, we don't need to do anything else
2934 // here.
2935 if (BigStack) {
2936 if (!ExtraCSSpill && UnspilledCSGPR != AArch64::NoRegister) {
2937 LLVM_DEBUG(dbgs() << "Spilling " << printReg(UnspilledCSGPR, RegInfo)
2938 << " to get a scratch register.\n");
2939 SavedRegs.set(UnspilledCSGPR);
2940 ExtraCSSpill = UnspilledCSGPR;
2941
2942 // MachO's compact unwind format relies on all registers being stored in
2943 // pairs, so if we need to spill one extra for BigStack, then we need to
2944 // store the pair.
2945 if (producePairRegisters(MF)) {
2946 if (UnspilledCSGPRPaired == AArch64::NoRegister) {
2947 // Failed to make a pair for compact unwind format, revert spilling.
2948 if (produceCompactUnwindFrame(*this, MF)) {
2949 SavedRegs.reset(UnspilledCSGPR);
2950 ExtraCSSpill = AArch64::NoRegister;
2951 }
2952 } else
2953 SavedRegs.set(UnspilledCSGPRPaired);
2954 }
2955 }
2956
2957 // If we didn't find an extra callee-saved register to spill, create
2958 // an emergency spill slot.
2959 if (!ExtraCSSpill || MF.getRegInfo().isPhysRegUsed(ExtraCSSpill)) {
2960 const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();
2961 const TargetRegisterClass &RC = AArch64::GPR64RegClass;
2962 unsigned Size = TRI->getSpillSize(RC);
2963 Align Alignment = TRI->getSpillAlign(RC);
2964 int FI = MFI.CreateSpillStackObject(Size, Alignment);
2965 RS->addScavengingFrameIndex(FI);
2966 LLVM_DEBUG(dbgs() << "No available CS registers, allocated fi#" << FI
2967 << " as the emergency spill slot.\n");
2968 }
2969 }
2970
2971 // Adding the size of additional 64bit GPR saves.
2972 CSStackSize += 8 * (SavedRegs.count() - NumSavedRegs);
2973
2974 // A Swift asynchronous context extends the frame record with a pointer
2975 // directly before FP.
2976 if (hasFP(MF) && AFI->hasSwiftAsyncContext())
2977 CSStackSize += 8;
2978
2979 uint64_t AlignedCSStackSize = alignTo(CSStackSize, 16);
2980 LLVM_DEBUG(dbgs() << "Estimated stack frame size: "
2981 << EstimatedStackSize + AlignedCSStackSize << " bytes.\n");
2982
2983 assert((!MFI.isCalleeSavedInfoValid() ||
2984 AFI->getCalleeSavedStackSize() == AlignedCSStackSize) &&
2985 "Should not invalidate callee saved info");
2986
2987 // Round up to register pair alignment to avoid additional SP adjustment
2988 // instructions.
2989 AFI->setCalleeSavedStackSize(AlignedCSStackSize);
2990 AFI->setCalleeSaveStackHasFreeSpace(AlignedCSStackSize != CSStackSize);
2991 AFI->setSVECalleeSavedStackSize(alignTo(SVECSStackSize, 16));
2992}
2993
2994bool AArch64FrameLowering::assignCalleeSavedSpillSlots(
2995 MachineFunction &MF, const TargetRegisterInfo *RegInfo,
2996 std::vector<CalleeSavedInfo> &CSI, unsigned &MinCSFrameIndex,
2997 unsigned &MaxCSFrameIndex) const {
2998 bool NeedsWinCFI = needsWinCFI(MF);
2999 unsigned StackHazardSize = getStackHazardSize(MF);
3000 // To match the canonical windows frame layout, reverse the list of
3001 // callee saved registers to get them laid out by PrologEpilogInserter
3002 // in the right order. (PrologEpilogInserter allocates stack objects top
3003 // down. Windows canonical prologs store higher numbered registers at
3004 // the top, thus have the CSI array start from the highest registers.)
3005 if (NeedsWinCFI)
3006 std::reverse(CSI.begin(), CSI.end());
3007
3008 if (CSI.empty())
3009 return true; // Early exit if no callee saved registers are modified!
3010
3011 // Now that we know which registers need to be saved and restored, allocate
3012 // stack slots for them.
3013 MachineFrameInfo &MFI = MF.getFrameInfo();
3014 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
3015
3016 bool UsesWinAAPCS = isTargetWindows(MF);
3017 if (UsesWinAAPCS && hasFP(MF) && AFI->hasSwiftAsyncContext()) {
3018 int FrameIdx = MFI.CreateStackObject(8, Align(16), true);
3019 AFI->setSwiftAsyncContextFrameIdx(FrameIdx);
3020 if ((unsigned)FrameIdx < MinCSFrameIndex)
3021 MinCSFrameIndex = FrameIdx;
3022 if ((unsigned)FrameIdx > MaxCSFrameIndex)
3023 MaxCSFrameIndex = FrameIdx;
3024 }
3025
3026 // Insert VG into the list of CSRs, immediately before LR if saved.
3027 if (requiresSaveVG(MF)) {
3028 CalleeSavedInfo VGInfo(AArch64::VG);
3029 auto It =
3030 find_if(CSI, [](auto &Info) { return Info.getReg() == AArch64::LR; });
3031 if (It != CSI.end())
3032 CSI.insert(It, VGInfo);
3033 else
3034 CSI.push_back(VGInfo);
3035 }
3036
3037 Register LastReg = 0;
3038 int HazardSlotIndex = std::numeric_limits<int>::max();
3039 for (auto &CS : CSI) {
3040 MCRegister Reg = CS.getReg();
3041 const TargetRegisterClass *RC = RegInfo->getMinimalPhysRegClass(Reg);
3042
3043 // Create a hazard slot as we switch between GPR and FPR CSRs.
3044 if (AFI->hasStackHazardSlotIndex() &&
3045 (!LastReg || !AArch64InstrInfo::isFpOrNEON(LastReg)) &&
3046 AArch64InstrInfo::isFpOrNEON(Reg)) {
3047 assert(HazardSlotIndex == std::numeric_limits<int>::max() &&
3048 "Unexpected register order for hazard slot");
3049 HazardSlotIndex = MFI.CreateStackObject(StackHazardSize, Align(8), true);
3050 LLVM_DEBUG(dbgs() << "Created CSR Hazard at slot " << HazardSlotIndex
3051 << "\n");
3052 AFI->setStackHazardCSRSlotIndex(HazardSlotIndex);
3053 if ((unsigned)HazardSlotIndex < MinCSFrameIndex)
3054 MinCSFrameIndex = HazardSlotIndex;
3055 if ((unsigned)HazardSlotIndex > MaxCSFrameIndex)
3056 MaxCSFrameIndex = HazardSlotIndex;
3057 }
3058
3059 unsigned Size = RegInfo->getSpillSize(*RC);
3060 Align Alignment(RegInfo->getSpillAlign(*RC));
3061 int FrameIdx = MFI.CreateStackObject(Size, Alignment, true);
3062 CS.setFrameIdx(FrameIdx);
3063
3064 if ((unsigned)FrameIdx < MinCSFrameIndex)
3065 MinCSFrameIndex = FrameIdx;
3066 if ((unsigned)FrameIdx > MaxCSFrameIndex)
3067 MaxCSFrameIndex = FrameIdx;
3068
3069 // Grab 8 bytes below FP for the extended asynchronous frame info.
3070 if (hasFP(MF) && AFI->hasSwiftAsyncContext() && !UsesWinAAPCS &&
3071 Reg == AArch64::FP) {
3072 FrameIdx = MFI.CreateStackObject(8, Alignment, true);
3073 AFI->setSwiftAsyncContextFrameIdx(FrameIdx);
3074 if ((unsigned)FrameIdx < MinCSFrameIndex)
3075 MinCSFrameIndex = FrameIdx;
3076 if ((unsigned)FrameIdx > MaxCSFrameIndex)
3077 MaxCSFrameIndex = FrameIdx;
3078 }
3079 LastReg = Reg;
3080 }
3081
3082 // Add hazard slot in the case where no FPR CSRs are present.
3083 if (AFI->hasStackHazardSlotIndex() &&
3084 HazardSlotIndex == std::numeric_limits<int>::max()) {
3085 HazardSlotIndex = MFI.CreateStackObject(StackHazardSize, Align(8), true);
3086 LLVM_DEBUG(dbgs() << "Created CSR Hazard at slot " << HazardSlotIndex
3087 << "\n");
3088 AFI->setStackHazardCSRSlotIndex(HazardSlotIndex);
3089 if ((unsigned)HazardSlotIndex < MinCSFrameIndex)
3090 MinCSFrameIndex = HazardSlotIndex;
3091 if ((unsigned)HazardSlotIndex > MaxCSFrameIndex)
3092 MaxCSFrameIndex = HazardSlotIndex;
3093 }
3094
3095 return true;
3096}
3097
3098bool AArch64FrameLowering::enableStackSlotScavenging(
3099 const MachineFunction &MF) const {
3100 const AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
3101 // If the function has streaming-mode changes, don't scavenge a
3102 // spillslot in the callee-save area, as that might require an
3103 // 'addvl' in the streaming-mode-changing call-sequence when the
3104 // function doesn't use a FP.
3105 if (AFI->hasStreamingModeChanges() && !hasFP(MF))
3106 return false;
3107 // Don't allow register scavenging with hazard slots, in case it moves objects
3108 // into the wrong place.
3109 if (AFI->hasStackHazardSlotIndex())
3110 return false;
3111 return AFI->hasCalleeSaveStackFreeSpace();
3112}
3113
3114/// returns true if there are any SVE callee saves.
3115static bool getSVECalleeSaveSlotRange(const MachineFrameInfo &MFI,
3116 int &Min, int &Max) {
3117 Min = std::numeric_limits<int>::max();
3118 Max = std::numeric_limits<int>::min();
3119
3120 if (!MFI.isCalleeSavedInfoValid())
3121 return false;
3122
3123 const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();
3124 for (auto &CS : CSI) {
3125 if (AArch64::ZPRRegClass.contains(CS.getReg()) ||
3126 AArch64::PPRRegClass.contains(CS.getReg())) {
3127 assert((Max == std::numeric_limits<int>::min() ||
3128 Max + 1 == CS.getFrameIdx()) &&
3129 "SVE CalleeSaves are not consecutive");
3130
3131 Min = std::min(Min, CS.getFrameIdx());
3132 Max = std::max(Max, CS.getFrameIdx());
3133 }
3134 }
3135 return Min != std::numeric_limits<int>::max();
3136}
3137
3138// Process all the SVE stack objects and determine offsets for each
3139// object. If AssignOffsets is true, the offsets get assigned.
3140// Fills in the first and last callee-saved frame indices into
3141// Min/MaxCSFrameIndex, respectively.
3142// Returns the size of the stack.
3143static int64_t determineSVEStackObjectOffsets(MachineFrameInfo &MFI,
3144 int &MinCSFrameIndex,
3145 int &MaxCSFrameIndex,
3146 bool AssignOffsets) {
3147#ifndef NDEBUG
3148 // First process all fixed stack objects.
3149 for (int I = MFI.getObjectIndexBegin(); I != 0; ++I)
3150 assert(MFI.getStackID(I) != TargetStackID::ScalableVector &&
3151 "SVE vectors should never be passed on the stack by value, only by "
3152 "reference.");
3153#endif
3154
3155 auto Assign = [&MFI](int FI, int64_t Offset) {
3156 LLVM_DEBUG(dbgs() << "alloc FI(" << FI << ") at SP[" << Offset << "]\n");
3157 MFI.setObjectOffset(FI, Offset);
3158 };
3159
3160 int64_t Offset = 0;
3161
3162 // Then process all callee saved slots.
3163 if (getSVECalleeSaveSlotRange(MFI, MinCSFrameIndex, MaxCSFrameIndex)) {
3164 // Assign offsets to the callee save slots.
3165 for (int I = MinCSFrameIndex; I <= MaxCSFrameIndex; ++I) {
3166 Offset += MFI.getObjectSize(I);
3167 Offset = alignTo(Offset, MFI.getObjectAlign(I));
3168 if (AssignOffsets)
3169 Assign(I, -Offset);
3170 }
3171 }
3172
3173 // Ensure that the callee-save area is aligned to 16 bytes.
3174 Offset = alignTo(Offset, Align(16U));
3175
3176 // Create a buffer of SVE objects to allocate and sort it.
3177 SmallVector<int, 8> ObjectsToAllocate;
3178 // If we have a stack protector, and we've previously decided that we have SVE
3179 // objects on the stack and thus need it to go in the SVE stack area, then it
3180 // needs to go first.
3181 int StackProtectorFI = -1;
3182 if (MFI.hasStackProtectorIndex()) {
3183 StackProtectorFI = MFI.getStackProtectorIndex();
3184 if (MFI.getStackID(StackProtectorFI) == TargetStackID::ScalableVector)
3185 ObjectsToAllocate.push_back(StackProtectorFI);
3186 }
3187 for (int I = 0, E = MFI.getObjectIndexEnd(); I != E; ++I) {
3188 unsigned StackID = MFI.getStackID(I);
3189 if (StackID != TargetStackID::ScalableVector)
3190 continue;
3191 if (I == StackProtectorFI)
3192 continue;
3193 if (MaxCSFrameIndex >= I && I >= MinCSFrameIndex)
3194 continue;
3195 if (MFI.isDeadObjectIndex(I))
3196 continue;
3197
3198 ObjectsToAllocate.push_back(I);
3199 }
3200
3201 // Allocate all SVE locals and spills
3202 for (unsigned FI : ObjectsToAllocate) {
3203 Align Alignment = MFI.getObjectAlign(FI);
3204 // FIXME: Given that the length of SVE vectors is not necessarily a power of
3205 // two, we'd need to align every object dynamically at runtime if the
3206 // alignment is larger than 16. This is not yet supported.
3207 if (Alignment > Align(16))
3208 report_fatal_error(
3209 "Alignment of scalable vectors > 16 bytes is not yet supported");
3210
3211 Offset = alignTo(Offset + MFI.getObjectSize(FI), Alignment);
3212 if (AssignOffsets)
3213 Assign(FI, -Offset);
3214 }
3215
3216 return Offset;
3217}
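// Illustrative example (hypothetical sizes, not taken from this file): with
// two 16-byte SVE callee-save slots and one 32-byte SVE local, the callee
// saves are assigned offsets -16 and -32, the callee-save area is padded to a
// 16-byte multiple (still 32), and the local lands at -64, so the function's
// scalable stack size comes out as 64 bytes (scaled by vscale at runtime).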
3218
3219int64_t AArch64FrameLowering::estimateSVEStackObjectOffsets(
3220 MachineFrameInfo &MFI) const {
3221 int MinCSFrameIndex, MaxCSFrameIndex;
3222 return determineSVEStackObjectOffsets(MFI, MinCSFrameIndex, MaxCSFrameIndex, false);
3223}
3224
3225int64_t AArch64FrameLowering::assignSVEStackObjectOffsets(
3226 MachineFrameInfo &MFI, int &MinCSFrameIndex, int &MaxCSFrameIndex) const {
3227 return determineSVEStackObjectOffsets(MFI, MinCSFrameIndex, MaxCSFrameIndex,
3228 true);
3229}
3230
3231/// Attempts to scavenge a register from \p ScavengeableRegs given the used
3232/// registers in \p UsedRegs.
3233static Register tryScavengeRegister(LiveRegUnits const &UsedRegs,
3234 BitVector const &ScavengeableRegs,
3235 Register PreferredReg) {
3236 if (PreferredReg != AArch64::NoRegister && UsedRegs.available(PreferredReg))
3237 return PreferredReg;
3238 for (auto Reg : ScavengeableRegs.set_bits()) {
3239 if (UsedRegs.available(Reg))
3240 return Reg;
3241 }
3242 return AArch64::NoRegister;
3243}
3244
3245/// Propagates frame-setup/destroy flags from \p SourceMI to all instructions in
3246/// \p MachineInstrs.
3247static void propagateFrameFlags(MachineInstr &SourceMI,
3248 ArrayRef<MachineInstr *> MachineInstrs) {
3249 for (MachineInstr *MI : MachineInstrs) {
3250 if (SourceMI.getFlag(MachineInstr::FrameSetup))
3251 MI->setFlag(MachineInstr::FrameSetup);
3252 if (SourceMI.getFlag(MachineInstr::FrameDestroy))
3253 MI->setFlag(MachineInstr::FrameDestroy);
3254 }
3255}
3256
3257/// RAII helper class for scavenging or spilling a register. On construction
3258/// attempts to find a free register of class \p RC (given \p UsedRegs and \p
3259 /// AllocatableRegs); if no register can be found, spills \p SpillCandidate to
3260 /// \p MaybeSpillFI to free a register. The freed register is returned via the
3261 /// \p FreeReg output parameter. On destruction, if there is a spill, its previous
3262 /// value is reloaded. The spilling and scavenging is only valid at the
3263 /// insertion point \p MBBI; this class should _not_ be used in places that
3264/// create or manipulate basic blocks, moving the expected insertion point.
3265struct ScopedScavengeOrSpill {
3268
3269 ScopedScavengeOrSpill(MachineFunction &MF, MachineBasicBlock &MBB,
3270 MachineBasicBlock::iterator MBBI,
3271 Register SpillCandidate, const TargetRegisterClass &RC,
3272 LiveRegUnits const &UsedRegs,
3273 BitVector const &AllocatableRegs,
3274 std::optional<int> *MaybeSpillFI,
3275 Register PreferredReg = AArch64::NoRegister)
3276 : MBB(MBB), MBBI(MBBI), RC(RC), TII(static_cast<const AArch64InstrInfo &>(
3277 *MF.getSubtarget().getInstrInfo())),
3278 TRI(*MF.getSubtarget().getRegisterInfo()) {
3279 FreeReg = tryScavengeRegister(UsedRegs, AllocatableRegs, PreferredReg);
3280 if (FreeReg != AArch64::NoRegister)
3281 return;
3282 assert(MaybeSpillFI && "Expected emergency spill slot FI information "
3283 "(attempted to spill in prologue/epilogue?)");
3284 if (!MaybeSpillFI->has_value()) {
3285 MachineFrameInfo &MFI = MF.getFrameInfo();
3286 *MaybeSpillFI = MFI.CreateSpillStackObject(TRI.getSpillSize(RC),
3287 TRI.getSpillAlign(RC));
3288 }
3289 FreeReg = SpillCandidate;
3290 SpillFI = MaybeSpillFI->value();
3291 TII.storeRegToStackSlot(MBB, MBBI, FreeReg, false, *SpillFI, &RC, &TRI,
3292 Register());
3293 }
3294
3295 bool hasSpilled() const { return SpillFI.has_value(); }
3296
3297 /// Returns the free register (found from scavenging or spilling a register).
3298 Register freeRegister() const { return FreeReg; }
3299
3300 Register operator*() const { return freeRegister(); }
3301
3302 ~ScopedScavengeOrSpill() {
3303 if (hasSpilled())
3304 TII.loadRegFromStackSlot(MBB, MBBI, FreeReg, *SpillFI, &RC, &TRI,
3305 Register());
3306 }
3307
3308private:
3309 MachineBasicBlock &MBB;
3310 MachineBasicBlock::iterator MBBI;
3311 const TargetRegisterClass &RC;
3312 const AArch64InstrInfo &TII;
3313 const TargetRegisterInfo &TRI;
3314 Register FreeReg = AArch64::NoRegister;
3315 std::optional<int> SpillFI;
3316};
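// Typical use, as in the PPR<->ZPR spill expansions below: construct a
// ScopedScavengeOrSpill for the required register class at the expansion
// point, read the scavenged (or spilled) register via operator*, and rely on
// the destructor to reload any spilled value after the expanded sequence.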
3317
3318/// Emergency stack slots for expanding SPILL_PPR_TO_ZPR_SLOT_PSEUDO and
3319/// FILL_PPR_FROM_ZPR_SLOT_PSEUDO.
3320struct EmergencyStackSlots {
3321 std::optional<int> ZPRSpillFI;
3322 std::optional<int> PPRSpillFI;
3323 std::optional<int> GPRSpillFI;
3324};
3325
3326/// Registers available for scavenging (ZPR, PPR3b, GPR).
3327struct ScavengeableRegs {
3328 BitVector ZPRRegs;
3329 BitVector PPR3bRegs;
3330 BitVector GPRRegs;
3331};
3332
3333static bool isInPrologueOrEpilogue(const MachineInstr &MI) {
3334 return MI.getFlag(MachineInstr::FrameSetup) ||
3335 MI.getFlag(MachineInstr::FrameDestroy);
3336}
3337
3338/// Expands:
3339/// ```
3340/// SPILL_PPR_TO_ZPR_SLOT_PSEUDO $p0, %stack.0, 0
3341/// ```
3342/// To:
3343/// ```
3344/// $z0 = CPY_ZPzI_B $p0, 1, 0
3345/// STR_ZXI $z0, $stack.0, 0
3346/// ```
3347/// While ensuring a ZPR ($z0 in this example) is free for the predicate (
3348/// spilling if necessary).
3349static void expandSpillPPRToZPRSlotPseudo(MachineBasicBlock &MBB,
3350 MachineInstr &MI,
3351 const TargetRegisterInfo &TRI,
3352 LiveRegUnits const &UsedRegs,
3353 ScavengeableRegs const &SR,
3354 EmergencyStackSlots &SpillSlots) {
3355 MachineFunction &MF = *MBB.getParent();
3356 auto *TII =
3357 static_cast<const AArch64InstrInfo *>(MF.getSubtarget().getInstrInfo());
3358
3359 ScopedScavengeOrSpill ZPredReg(
3360 MF, MBB, MI, AArch64::Z0, AArch64::ZPRRegClass, UsedRegs, SR.ZPRRegs,
3361 isInPrologueOrEpilogue(MI) ? nullptr : &SpillSlots.ZPRSpillFI);
3362
3363 SmallVector<MachineInstr *, 2> MachineInstrs;
3364 const DebugLoc &DL = MI.getDebugLoc();
3365 MachineInstrs.push_back(BuildMI(MBB, MI, DL, TII->get(AArch64::CPY_ZPzI_B))
3366 .addReg(*ZPredReg, RegState::Define)
3367 .add(MI.getOperand(0))
3368 .addImm(1)
3369 .addImm(0)
3370 .getInstr());
3371 MachineInstrs.push_back(BuildMI(MBB, MI, DL, TII->get(AArch64::STR_ZXI))
3372 .addReg(*ZPredReg)
3373 .add(MI.getOperand(1))
3374 .addImm(MI.getOperand(2).getImm())
3375 .setMemRefs(MI.memoperands())
3376 .getInstr());
3377 propagateFrameFlags(MI, MachineInstrs);
3378}
3379
3380/// Expands:
3381/// ```
3382/// $p0 = FILL_PPR_FROM_ZPR_SLOT_PSEUDO %stack.0, 0
3383/// ```
3384/// To:
3385/// ```
3386/// $z0 = LDR_ZXI %stack.0, 0
3387/// $p0 = PTRUE_B 31, implicit $vg
3388/// $p0 = CMPNE_PPzZI_B $p0, $z0, 0, implicit-def $nzcv, implicit-def $nzcv
3389/// ```
3390/// While ensuring a ZPR ($z0 in this example) is free for the predicate (
3391/// spilling if necessary). If the status flags are in use at the point of
3392/// expansion they are preserved (by moving them to/from a GPR). This may cause
3393/// an additional spill if no GPR is free at the expansion point.
3394static bool expandFillPPRFromZPRSlotPseudo(
3395 MachineBasicBlock &MBB, MachineInstr &MI, const TargetRegisterInfo &TRI,
3396 LiveRegUnits const &UsedRegs, ScavengeableRegs const &SR,
3397 MachineInstr *&LastPTrue, EmergencyStackSlots &SpillSlots) {
3398 MachineFunction &MF = *MBB.getParent();
3399 auto *TII =
3400 static_cast<const AArch64InstrInfo *>(MF.getSubtarget().getInstrInfo());
3401
3402 ScopedScavengeOrSpill ZPredReg(
3403 MF, MBB, MI, AArch64::Z0, AArch64::ZPRRegClass, UsedRegs, SR.ZPRRegs,
3404 isInPrologueOrEpilogue(MI) ? nullptr : &SpillSlots.ZPRSpillFI);
3405
3406 ScopedScavengeOrSpill PredReg(
3407 MF, MBB, MI, AArch64::P0, AArch64::PPR_3bRegClass, UsedRegs, SR.PPR3bRegs,
3408 isInPrologueOrEpilogue(MI) ? nullptr : &SpillSlots.PPRSpillFI,
3409 /*PreferredReg=*/
3410 LastPTrue ? LastPTrue->getOperand(0).getReg() : AArch64::NoRegister);
3411
3412 // Elide NZCV spills if we know it is not used.
3413 bool IsNZCVUsed = !UsedRegs.available(AArch64::NZCV);
3414 std::optional<ScopedScavengeOrSpill> NZCVSaveReg;
3415 if (IsNZCVUsed)
3416 NZCVSaveReg.emplace(
3417 MF, MBB, MI, AArch64::X0, AArch64::GPR64RegClass, UsedRegs, SR.GPRRegs,
3418 isInPrologueOrEpilogue(MI) ? nullptr : &SpillSlots.GPRSpillFI);
3419 SmallVector<MachineInstr *, 4> MachineInstrs;
3420 const DebugLoc &DL = MI.getDebugLoc();
3421 MachineInstrs.push_back(BuildMI(MBB, MI, DL, TII->get(AArch64::LDR_ZXI))
3422 .addReg(*ZPredReg, RegState::Define)
3423 .add(MI.getOperand(1))
3424 .addImm(MI.getOperand(2).getImm())
3425 .setMemRefs(MI.memoperands())
3426 .getInstr());
3427 if (IsNZCVUsed)
3428 MachineInstrs.push_back(
3429 BuildMI(MBB, MI, DL, TII->get(AArch64::MRS))
3430 .addReg(NZCVSaveReg->freeRegister(), RegState::Define)
3431 .addImm(AArch64SysReg::NZCV)
3432 .addReg(AArch64::NZCV, RegState::Implicit)
3433 .getInstr());
3434
3435 // Reuse previous ptrue if we know it has not been clobbered.
3436 if (LastPTrue) {
3437 assert(*PredReg == LastPTrue->getOperand(0).getReg());
3438 LastPTrue->moveBefore(&MI);
3439 } else {
3440 LastPTrue = BuildMI(MBB, MI, DL, TII->get(AArch64::PTRUE_B))
3441 .addReg(*PredReg, RegState::Define)
3442 .addImm(31);
3443 }
3444 MachineInstrs.push_back(LastPTrue);
3445 MachineInstrs.push_back(
3446 BuildMI(MBB, MI, DL, TII->get(AArch64::CMPNE_PPzZI_B))
3447 .addReg(MI.getOperand(0).getReg(), RegState::Define)
3448 .addReg(*PredReg)
3449 .addReg(*ZPredReg)
3450 .addImm(0)
3451 .addReg(AArch64::NZCV, RegState::ImplicitDefine)
3452 .getInstr());
3453 if (IsNZCVUsed)
3454 MachineInstrs.push_back(BuildMI(MBB, MI, DL, TII->get(AArch64::MSR))
3455 .addImm(AArch64SysReg::NZCV)
3456 .addReg(NZCVSaveReg->freeRegister())
3457 .addReg(AArch64::NZCV, RegState::ImplicitDefine)
3458 .getInstr());
3459
3460 propagateFrameFlags(MI, MachineInstrs);
3461 return PredReg.hasSpilled();
3462}
3463
3464/// Expands all FILL_PPR_FROM_ZPR_SLOT_PSEUDO and SPILL_PPR_TO_ZPR_SLOT_PSEUDO
3465/// operations within the MachineBasicBlock \p MBB.
3466static bool expandSMEPPRToZPRSpillPseudos(MachineBasicBlock &MBB,
3467 const TargetRegisterInfo &TRI,
3468 ScavengeableRegs const &SR,
3469 EmergencyStackSlots &SpillSlots) {
3470 LiveRegUnits UsedRegs(TRI);
3471 UsedRegs.addLiveOuts(MBB);
3472 bool HasPPRSpills = false;
3473 MachineInstr *LastPTrue = nullptr;
3474 for (MachineInstr &MI : make_early_inc_range(reverse(MBB))) {
3475 UsedRegs.stepBackward(MI);
3476 switch (MI.getOpcode()) {
3477 case AArch64::FILL_PPR_FROM_ZPR_SLOT_PSEUDO:
3478 if (LastPTrue &&
3479 MI.definesRegister(LastPTrue->getOperand(0).getReg(), &TRI))
3480 LastPTrue = nullptr;
3481 HasPPRSpills |= expandFillPPRFromZPRSlotPseudo(MBB, MI, TRI, UsedRegs, SR,
3482 LastPTrue, SpillSlots);
3483 MI.eraseFromParent();
3484 break;
3485 case AArch64::SPILL_PPR_TO_ZPR_SLOT_PSEUDO:
3486 expandSpillPPRToZPRSlotPseudo(MBB, MI, TRI, UsedRegs, SR, SpillSlots);
3487 MI.eraseFromParent();
3488 [[fallthrough]];
3489 default:
3490 LastPTrue = nullptr;
3491 break;
3492 }
3493 }
3494
3495 return HasPPRSpills;
3496}
3497
3498void AArch64FrameLowering::processFunctionBeforeFrameFinalized(
3499 MachineFunction &MF, RegScavenger *RS) const {
3500
3501 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
3502 const TargetSubtargetInfo &TSI = MF.getSubtarget();
3503 const TargetRegisterInfo &TRI = *TSI.getRegisterInfo();
3504
3505 // If predicate spills are 16 bytes, we may need to expand
3506 // SPILL_PPR_TO_ZPR_SLOT_PSEUDO/FILL_PPR_FROM_ZPR_SLOT_PSEUDO.
3507 if (AFI->hasStackFrame() && TRI.getSpillSize(AArch64::PPRRegClass) == 16) {
3508 auto ComputeScavengeableRegisters = [&](unsigned RegClassID) {
3509 BitVector Regs = TRI.getAllocatableSet(MF, TRI.getRegClass(RegClassID));
3510 assert(Regs.count() > 0 && "Expected scavengeable registers");
3511 return Regs;
3512 };
3513
3514 ScavengeableRegs SR{};
3515 SR.ZPRRegs = ComputeScavengeableRegisters(AArch64::ZPRRegClassID);
3516 // Only p0-7 are possible as the second operand of cmpne (needed for fills).
3517 SR.PPR3bRegs = ComputeScavengeableRegisters(AArch64::PPR_3bRegClassID);
3518 SR.GPRRegs = ComputeScavengeableRegisters(AArch64::GPR64RegClassID);
3519
3520 EmergencyStackSlots SpillSlots;
3521 for (MachineBasicBlock &MBB : MF) {
3522 // In the case we had to spill a predicate (in the range p0-p7) to reload
3523 // a predicate (>= p8), additional spill/fill pseudos will be created.
3524 // These need an additional expansion pass. Note: There will only be at
3525 // most two expansion passes, as spilling/filling a predicate in the range
3526 // p0-p7 never requires spilling another predicate.
3527 for (int Pass = 0; Pass < 2; Pass++) {
3528 bool HasPPRSpills =
3529 expandSMEPPRToZPRSpillPseudos(MBB, TRI, SR, SpillSlots);
3530 assert((Pass == 0 || !HasPPRSpills) && "Did not expect PPR spills");
3531 if (!HasPPRSpills)
3532 break;
3533 }
3534 }
3535 }
3536
3537 MachineFrameInfo &MFI = MF.getFrameInfo();
3538
3540 "Upwards growing stack unsupported");
3541
3542 int MinCSFrameIndex, MaxCSFrameIndex;
3543 int64_t SVEStackSize =
3544 assignSVEStackObjectOffsets(MFI, MinCSFrameIndex, MaxCSFrameIndex);
3545
3546 AFI->setStackSizeSVE(alignTo(SVEStackSize, 16U));
3547 AFI->setMinMaxSVECSFrameIndex(MinCSFrameIndex, MaxCSFrameIndex);
3548
3549 // If this function isn't doing Win64-style C++ EH, we don't need to do
3550 // anything.
3551 if (!MF.hasEHFunclets())
3552 return;
3553
3554 // Win64 C++ EH needs to allocate space for the catch objects in the fixed
3555 // object area right next to the UnwindHelp object.
3556 WinEHFuncInfo &EHInfo = *MF.getWinEHFuncInfo();
3557 int64_t CurrentOffset =
3559 for (WinEHTryBlockMapEntry &TBME : EHInfo.TryBlockMap) {
3560 for (WinEHHandlerType &H : TBME.HandlerArray) {
3561 int FrameIndex = H.CatchObj.FrameIndex;
3562 if ((FrameIndex != INT_MAX) && MFI.getObjectOffset(FrameIndex) == 0) {
3563 CurrentOffset =
3564 alignTo(CurrentOffset, MFI.getObjectAlign(FrameIndex).value());
3565 CurrentOffset += MFI.getObjectSize(FrameIndex);
3566 MFI.setObjectOffset(FrameIndex, -CurrentOffset);
3567 }
3568 }
3569 }
3570
3571 // Create an UnwindHelp object.
3572 // The UnwindHelp object is allocated at the start of the fixed object area
3573 int64_t UnwindHelpOffset = alignTo(CurrentOffset + 8, Align(16));
3574 assert(UnwindHelpOffset == getFixedObjectSize(MF, AFI, /*IsWin64*/ true,
3575 /*IsFunclet*/ false) &&
3576 "UnwindHelpOffset must be at the start of the fixed object area");
3577 int UnwindHelpFI = MFI.CreateFixedObject(/*Size*/ 8, -UnwindHelpOffset,
3578 /*IsImmutable=*/false);
3579 EHInfo.UnwindHelpFrameIdx = UnwindHelpFI;
3580
3581 MachineBasicBlock &MBB = MF.front();
3582 auto MBBI = MBB.begin();
3583 while (MBBI != MBB.end() && MBBI->getFlag(MachineInstr::FrameSetup))
3584 ++MBBI;
3585
3586 // We need to store -2 into the UnwindHelp object at the start of the
3587 // function.
3588 DebugLoc DL;
3589 RS->enterBasicBlockEnd(MBB);
3590 RS->backward(MBBI);
3591 Register DstReg = RS->FindUnusedReg(&AArch64::GPR64commonRegClass);
3592 assert(DstReg && "There must be a free register after frame setup");
3593 const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();
3594 BuildMI(MBB, MBBI, DL, TII.get(AArch64::MOVi64imm), DstReg).addImm(-2);
3595 BuildMI(MBB, MBBI, DL, TII.get(AArch64::STURXi))
3596 .addReg(DstReg, getKillRegState(true))
3597 .addFrameIndex(UnwindHelpFI)
3598 .addImm(0);
3599}
3600
3601namespace {
3602struct TagStoreInstr {
3603 MachineInstr *MI;
3604 int64_t Offset, Size;
3605 explicit TagStoreInstr(MachineInstr *MI, int64_t Offset, int64_t Size)
3606 : MI(MI), Offset(Offset), Size(Size) {}
3607};
3608
3609class TagStoreEdit {
3610 MachineFunction *MF;
3611 MachineBasicBlock *MBB;
3612 MachineRegisterInfo *MRI;
3613 // Tag store instructions that are being replaced.
3614 SmallVector<TagStoreInstr, 8> TagStores;
3615 // Combined memref arguments of the above instructions.
3616 SmallVector<MachineMemOperand *, 8> CombinedMemRefs;
3617
3618 // Replace allocation tags in [FrameReg + FrameRegOffset, FrameReg +
3619 // FrameRegOffset + Size) with the address tag of SP.
3620 Register FrameReg;
3621 StackOffset FrameRegOffset;
3622 int64_t Size;
3623 // If not std::nullopt, move FrameReg to (FrameReg + FrameRegUpdate) at the
3624 // end.
3625 std::optional<int64_t> FrameRegUpdate;
3626 // MIFlags for any FrameReg updating instructions.
3627 unsigned FrameRegUpdateFlags;
3628
3629 // Use zeroing instruction variants.
3630 bool ZeroData;
3631 DebugLoc DL;
3632
3633 void emitUnrolled(MachineBasicBlock::iterator InsertI);
3634 void emitLoop(MachineBasicBlock::iterator InsertI);
3635
3636public:
3637 TagStoreEdit(MachineBasicBlock *MBB, bool ZeroData)
3638 : MBB(MBB), ZeroData(ZeroData) {
3639 MF = MBB->getParent();
3640 MRI = &MF->getRegInfo();
3641 }
3642 // Add an instruction to be replaced. Instructions must be added in
3643 // ascending order of Offset and must be adjacent.
3644 void addInstruction(TagStoreInstr I) {
3645 assert((TagStores.empty() ||
3646 TagStores.back().Offset + TagStores.back().Size == I.Offset) &&
3647 "Non-adjacent tag store instructions.");
3648 TagStores.push_back(I);
3649 }
3650 void clear() { TagStores.clear(); }
3651 // Emit equivalent code at the given location, and erase the current set of
3652 // instructions. May skip if the replacement is not profitable. May invalidate
3653 // the input iterator and replace it with a valid one.
3654 void emitCode(MachineBasicBlock::iterator &InsertI,
3655 const AArch64FrameLowering *TFI, bool TryMergeSPUpdate);
3656};
3657
3658void TagStoreEdit::emitUnrolled(MachineBasicBlock::iterator InsertI) {
3659 const AArch64InstrInfo *TII =
3660 MF->getSubtarget<AArch64Subtarget>().getInstrInfo();
3661
3662 const int64_t kMinOffset = -256 * 16;
3663 const int64_t kMaxOffset = 255 * 16;
3664
3665 Register BaseReg = FrameReg;
3666 int64_t BaseRegOffsetBytes = FrameRegOffset.getFixed();
3667 if (BaseRegOffsetBytes < kMinOffset ||
3668 BaseRegOffsetBytes + (Size - Size % 32) > kMaxOffset ||
3669 // BaseReg can be FP, which is not necessarily aligned to 16 bytes. In
3670 // that case, BaseRegOffsetBytes will not be aligned to 16 bytes, which
3671 // is required for the offset of ST2G.
3672 BaseRegOffsetBytes % 16 != 0) {
3673 Register ScratchReg = MRI->createVirtualRegister(&AArch64::GPR64RegClass);
3674 emitFrameOffset(*MBB, InsertI, DL, ScratchReg, BaseReg,
3675 StackOffset::getFixed(BaseRegOffsetBytes), TII);
3676 BaseReg = ScratchReg;
3677 BaseRegOffsetBytes = 0;
3678 }
3679
3680 MachineInstr *LastI = nullptr;
3681 while (Size) {
3682 int64_t InstrSize = (Size > 16) ? 32 : 16;
3683 unsigned Opcode =
3684 InstrSize == 16
3685 ? (ZeroData ? AArch64::STZGi : AArch64::STGi)
3686 : (ZeroData ? AArch64::STZ2Gi : AArch64::ST2Gi);
3687 assert(BaseRegOffsetBytes % 16 == 0);
3688 MachineInstr *I = BuildMI(*MBB, InsertI, DL, TII->get(Opcode))
3689 .addReg(AArch64::SP)
3690 .addReg(BaseReg)
3691 .addImm(BaseRegOffsetBytes / 16)
3692 .setMemRefs(CombinedMemRefs);
3693 // A store to [BaseReg, #0] should go last for an opportunity to fold the
3694 // final SP adjustment in the epilogue.
3695 if (BaseRegOffsetBytes == 0)
3696 LastI = I;
3697 BaseRegOffsetBytes += InstrSize;
3698 Size -= InstrSize;
3699 }
3700
3701 if (LastI)
3702 MBB->splice(InsertI, MBB, LastI);
3703}
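// Illustrative case (hypothetical numbers): a 48-byte region at offset 0 is
// covered by one ST2G for bytes [0, 32) and one STG for bytes [32, 48); the
// store at offset #0 is then spliced to the end of the sequence so a
// following SP adjustment can be folded into it.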
3704
3705void TagStoreEdit::emitLoop(MachineBasicBlock::iterator InsertI) {
3706 const AArch64InstrInfo *TII =
3707 MF->getSubtarget<AArch64Subtarget>().getInstrInfo();
3708
3709 Register BaseReg = FrameRegUpdate
3710 ? FrameReg
3711 : MRI->createVirtualRegister(&AArch64::GPR64RegClass);
3712 Register SizeReg = MRI->createVirtualRegister(&AArch64::GPR64RegClass);
3713
3714 emitFrameOffset(*MBB, InsertI, DL, BaseReg, FrameReg, FrameRegOffset, TII);
3715
3716 int64_t LoopSize = Size;
3717 // If the loop size is not a multiple of 32, split off one 16-byte store at
3718 // the end to fold BaseReg update into.
3719 if (FrameRegUpdate && *FrameRegUpdate)
3720 LoopSize -= LoopSize % 32;
3721 MachineInstr *LoopI = BuildMI(*MBB, InsertI, DL,
3722 TII->get(ZeroData ? AArch64::STZGloop_wback
3723 : AArch64::STGloop_wback))
3724 .addDef(SizeReg)
3725 .addDef(BaseReg)
3726 .addImm(LoopSize)
3727 .addReg(BaseReg)
3728 .setMemRefs(CombinedMemRefs);
3729 if (FrameRegUpdate)
3730 LoopI->setFlags(FrameRegUpdateFlags);
3731
3732 int64_t ExtraBaseRegUpdate =
3733 FrameRegUpdate ? (*FrameRegUpdate - FrameRegOffset.getFixed() - Size) : 0;
3734 LLVM_DEBUG(dbgs() << "TagStoreEdit::emitLoop: LoopSize=" << LoopSize
3735 << ", Size=" << Size
3736 << ", ExtraBaseRegUpdate=" << ExtraBaseRegUpdate
3737 << ", FrameRegUpdate=" << FrameRegUpdate
3738 << ", FrameRegOffset.getFixed()="
3739 << FrameRegOffset.getFixed() << "\n");
3740 if (LoopSize < Size) {
3741 assert(FrameRegUpdate);
3742 assert(Size - LoopSize == 16);
3743 // Tag 16 more bytes at BaseReg and update BaseReg.
3744 int64_t STGOffset = ExtraBaseRegUpdate + 16;
3745 assert(STGOffset % 16 == 0 && STGOffset >= -4096 && STGOffset <= 4080 &&
3746 "STG immediate out of range");
3747 BuildMI(*MBB, InsertI, DL,
3748 TII->get(ZeroData ? AArch64::STZGPostIndex : AArch64::STGPostIndex))
3749 .addDef(BaseReg)
3750 .addReg(BaseReg)
3751 .addReg(BaseReg)
3752 .addImm(STGOffset / 16)
3753 .setMemRefs(CombinedMemRefs)
3754 .setMIFlags(FrameRegUpdateFlags);
3755 } else if (ExtraBaseRegUpdate) {
3756 // Update BaseReg.
3757 int64_t AddSubOffset = std::abs(ExtraBaseRegUpdate);
3758 assert(AddSubOffset <= 4095 && "ADD/SUB immediate out of range");
3759 BuildMI(
3760 *MBB, InsertI, DL,
3761 TII->get(ExtraBaseRegUpdate > 0 ? AArch64::ADDXri : AArch64::SUBXri))
3762 .addDef(BaseReg)
3763 .addReg(BaseReg)
3764 .addImm(AddSubOffset)
3765 .addImm(0)
3766 .setMIFlags(FrameRegUpdateFlags);
3767 }
3768}
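// Illustrative case (hypothetical numbers): Size = 80 with a fixed frame
// offset of 0 and FrameRegUpdate = 96 gives LoopSize = 64, so the loop tags
// 64 bytes, and a post-indexed STG tags the remaining 16 bytes while
// advancing BaseReg by 16 + (96 - 80) = 32.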
3769
3770// Check if *II is a register update that can be merged into the STGloop that
3771 // ends at (Reg + Size). On success, *TotalOffset is set to the full update
3772 // amount; the adjustment still needed after the loop ends is *TotalOffset - Size.
3773bool canMergeRegUpdate(MachineBasicBlock::iterator II, unsigned Reg,
3774 int64_t Size, int64_t *TotalOffset) {
3775 MachineInstr &MI = *II;
3776 if ((MI.getOpcode() == AArch64::ADDXri ||
3777 MI.getOpcode() == AArch64::SUBXri) &&
3778 MI.getOperand(0).getReg() == Reg && MI.getOperand(1).getReg() == Reg) {
3779 unsigned Shift = AArch64_AM::getShiftValue(MI.getOperand(3).getImm());
3780 int64_t Offset = MI.getOperand(2).getImm() << Shift;
3781 if (MI.getOpcode() == AArch64::SUBXri)
3782 Offset = -Offset;
3783 int64_t PostOffset = Offset - Size;
3784 // TagStoreEdit::emitLoop might emit either an ADD/SUB after the loop, or
3785 // an STGPostIndex which does the last 16 bytes of tag write. Which one is
3786 // chosen depends on the alignment of the loop size, but the difference
3787 // between the valid ranges for the two instructions is small, so we
3788 // conservatively assume that it could be either case here.
3789 //
3790 // Max offset of STGPostIndex, minus the 16 byte tag write folded into that
3791 // instruction.
3792 const int64_t kMaxOffset = 4080 - 16;
3793 // Max offset of SUBXri.
3794 const int64_t kMinOffset = -4095;
3795 if (PostOffset <= kMaxOffset && PostOffset >= kMinOffset &&
3796 PostOffset % 16 == 0) {
3797 *TotalOffset = Offset;
3798 return true;
3799 }
3800 }
3801 return false;
3802}
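// Illustrative case (hypothetical numbers): folding "ADD x9, x9, #48" into a
// loop that tags Size = 32 bytes leaves PostOffset = 16, which is 16-byte
// aligned and within [-4095, 4064], so the update is merged with
// *TotalOffset = 48.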
3803
3804void mergeMemRefs(const SmallVectorImpl<TagStoreInstr> &TSE,
3805 SmallVectorImpl<MachineMemOperand *> &MemRefs) {
3806 MemRefs.clear();
3807 for (auto &TS : TSE) {
3808 MachineInstr *MI = TS.MI;
3809 // An instruction without memory operands may access anything. Be
3810 // conservative and return an empty list.
3811 if (MI->memoperands_empty()) {
3812 MemRefs.clear();
3813 return;
3814 }
3815 MemRefs.append(MI->memoperands_begin(), MI->memoperands_end());
3816 }
3817}
3818
3819void TagStoreEdit::emitCode(MachineBasicBlock::iterator &InsertI,
3820 const AArch64FrameLowering *TFI,
3821 bool TryMergeSPUpdate) {
3822 if (TagStores.empty())
3823 return;
3824 TagStoreInstr &FirstTagStore = TagStores[0];
3825 TagStoreInstr &LastTagStore = TagStores[TagStores.size() - 1];
3826 Size = LastTagStore.Offset - FirstTagStore.Offset + LastTagStore.Size;
3827 DL = TagStores[0].MI->getDebugLoc();
3828
3829 Register Reg;
3830 FrameRegOffset = TFI->resolveFrameOffsetReference(
3831 *MF, FirstTagStore.Offset, false /*isFixed*/, false /*isSVE*/, Reg,
3832 /*PreferFP=*/false, /*ForSimm=*/true);
3833 FrameReg = Reg;
3834 FrameRegUpdate = std::nullopt;
3835
3836 mergeMemRefs(TagStores, CombinedMemRefs);
3837
3838 LLVM_DEBUG({
3839 dbgs() << "Replacing adjacent STG instructions:\n";
3840 for (const auto &Instr : TagStores) {
3841 dbgs() << " " << *Instr.MI;
3842 }
3843 });
3844
3845 // Size threshold where a loop becomes shorter than a linear sequence of
3846 // tagging instructions.
3847 const int kSetTagLoopThreshold = 176;
3848 if (Size < kSetTagLoopThreshold) {
3849 if (TagStores.size() < 2)
3850 return;
3851 emitUnrolled(InsertI);
3852 } else {
3853 MachineInstr *UpdateInstr = nullptr;
3854 int64_t TotalOffset = 0;
3855 if (TryMergeSPUpdate) {
3856 // See if we can merge base register update into the STGloop.
3857 // This is done in AArch64LoadStoreOptimizer for "normal" stores,
3858 // but STGloop is way too unusual for that, and also it only
3859 // realistically happens in function epilogue. Also, STGloop is expanded
3860 // before that pass.
3861 if (InsertI != MBB->end() &&
3862 canMergeRegUpdate(InsertI, FrameReg, FrameRegOffset.getFixed() + Size,
3863 &TotalOffset)) {
3864 UpdateInstr = &*InsertI++;
3865 LLVM_DEBUG(dbgs() << "Folding SP update into loop:\n "
3866 << *UpdateInstr);
3867 }
3868 }
3869
3870 if (!UpdateInstr && TagStores.size() < 2)
3871 return;
3872
3873 if (UpdateInstr) {
3874 FrameRegUpdate = TotalOffset;
3875 FrameRegUpdateFlags = UpdateInstr->getFlags();
3876 }
3877 emitLoop(InsertI);
3878 if (UpdateInstr)
3879 UpdateInstr->eraseFromParent();
3880 }
3881
3882 for (auto &TS : TagStores)
3883 TS.MI->eraseFromParent();
3884}
3885
3886bool isMergeableStackTaggingInstruction(MachineInstr &MI, int64_t &Offset,
3887 int64_t &Size, bool &ZeroData) {
3888 MachineFunction &MF = *MI.getParent()->getParent();
3889 const MachineFrameInfo &MFI = MF.getFrameInfo();
3890
3891 unsigned Opcode = MI.getOpcode();
3892 ZeroData = (Opcode == AArch64::STZGloop || Opcode == AArch64::STZGi ||
3893 Opcode == AArch64::STZ2Gi);
3894
3895 if (Opcode == AArch64::STGloop || Opcode == AArch64::STZGloop) {
3896 if (!MI.getOperand(0).isDead() || !MI.getOperand(1).isDead())
3897 return false;
3898 if (!MI.getOperand(2).isImm() || !MI.getOperand(3).isFI())
3899 return false;
3900 Offset = MFI.getObjectOffset(MI.getOperand(3).getIndex());
3901 Size = MI.getOperand(2).getImm();
3902 return true;
3903 }
3904
3905 if (Opcode == AArch64::STGi || Opcode == AArch64::STZGi)
3906 Size = 16;
3907 else if (Opcode == AArch64::ST2Gi || Opcode == AArch64::STZ2Gi)
3908 Size = 32;
3909 else
3910 return false;
3911
3912 if (MI.getOperand(0).getReg() != AArch64::SP || !MI.getOperand(1).isFI())
3913 return false;
3914
3915 Offset = MFI.getObjectOffset(MI.getOperand(1).getIndex()) +
3916 16 * MI.getOperand(2).getImm();
3917 return true;
3918}
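// For example (hypothetical operands), "STGi $sp, %stack.1, 2" is reported as
// a 16-byte tag store at ObjectOffset(%stack.1) + 32, whereas STGloop and
// STZGloop take their byte count directly from operand 2.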
3919
3920// Detect a run of memory tagging instructions for adjacent stack frame slots,
3921// and replace them with a shorter instruction sequence:
3922// * replace STG + STG with ST2G
3923// * replace STGloop + STGloop with STGloop
3924// This code needs to run when stack slot offsets are already known, but before
3925// FrameIndex operands in STG instructions are eliminated.
3926MachineBasicBlock::iterator tryMergeAdjacentSTG(MachineBasicBlock::iterator II,
3927 const AArch64FrameLowering *TFI,
3928 RegScavenger *RS) {
3929 bool FirstZeroData;
3930 int64_t Size, Offset;
3931 MachineInstr &MI = *II;
3932 MachineBasicBlock *MBB = MI.getParent();
3933 MachineBasicBlock::iterator NextI = ++II;
3934 if (&MI == &MBB->instr_back())
3935 return II;
3936 if (!isMergeableStackTaggingInstruction(MI, Offset, Size, FirstZeroData))
3937 return II;
3938
3939 SmallVector<TagStoreInstr, 8> Instrs;
3940 Instrs.emplace_back(&MI, Offset, Size);
3941
3942 constexpr int kScanLimit = 10;
3943 int Count = 0;
3944 for (MachineBasicBlock::iterator E = MBB->end();
3945 NextI != E && Count < kScanLimit; ++NextI) {
3946 MachineInstr &MI = *NextI;
3947 bool ZeroData;
3948 int64_t Size, Offset;
3949 // Collect instructions that update memory tags with a FrameIndex operand
3950 // and (when applicable) constant size, and whose output registers are dead
3951 // (the latter is almost always the case in practice). Since these
3952 // instructions effectively have no inputs or outputs, we are free to skip
3953 // any non-aliasing instructions in between without tracking used registers.
3954 if (isMergeableStackTaggingInstruction(MI, Offset, Size, ZeroData)) {
3955 if (ZeroData != FirstZeroData)
3956 break;
3957 Instrs.emplace_back(&MI, Offset, Size);
3958 continue;
3959 }
3960
3961 // Only count non-transient, non-tagging instructions toward the scan
3962 // limit.
3963 if (!MI.isTransient())
3964 ++Count;
3965
3966 // Just in case, stop before the epilogue code starts.
3967 if (MI.getFlag(MachineInstr::FrameSetup) ||
3968 MI.getFlag(MachineInstr::FrameDestroy))
3969 break;
3970
3971 // Reject anything that may alias the collected instructions.
3972 if (MI.mayLoadOrStore() || MI.hasUnmodeledSideEffects() || MI.isCall())
3973 break;
3974 }
3975
3976 // New code will be inserted after the last tagging instruction we've found.
3977 MachineBasicBlock::iterator InsertI = Instrs.back().MI;
3978
3979 // All the gathered stack tag instructions are merged and placed after the
3980 // last tag store in the list. Before inserting, check whether the NZCV flag
3981 // is live at that point; if it is, bail out, since any STG loops emitted by
3982 // the merge could clobber it.
3983
3984 // FIXME: Bailing out here is conservative: the liveness check is performed
3985 // even when the merged sequence would contain no STG loops, in which case
3986 // NZCV is never clobbered and the check is unnecessary.
3987 LivePhysRegs LiveRegs(*(MBB->getParent()->getSubtarget().getRegisterInfo()));
3988 LiveRegs.addLiveOuts(*MBB);
3989 for (auto I = MBB->rbegin();; ++I) {
3990 MachineInstr &MI = *I;
3991 if (MI == InsertI)
3992 break;
3993 LiveRegs.stepBackward(*I);
3994 }
3995 InsertI++;
3996 if (LiveRegs.contains(AArch64::NZCV))
3997 return InsertI;
3998
3999 llvm::stable_sort(Instrs,
4000 [](const TagStoreInstr &Left, const TagStoreInstr &Right) {
4001 return Left.Offset < Right.Offset;
4002 });
4003
4004 // Make sure that we don't have any overlapping stores.
4005 int64_t CurOffset = Instrs[0].Offset;
4006 for (auto &Instr : Instrs) {
4007 if (CurOffset > Instr.Offset)
4008 return NextI;
4009 CurOffset = Instr.Offset + Instr.Size;
4010 }
4011
4012 // Find contiguous runs of tagged memory and emit shorter instruction
4013 // sequences for them when possible.
4014 TagStoreEdit TSE(MBB, FirstZeroData);
4015 std::optional<int64_t> EndOffset;
4016 for (auto &Instr : Instrs) {
4017 if (EndOffset && *EndOffset != Instr.Offset) {
4018 // Found a gap.
4019 TSE.emitCode(InsertI, TFI, /*TryMergeSPUpdate = */ false);
4020 TSE.clear();
4021 }
4022
4023 TSE.addInstruction(Instr);
4024 EndOffset = Instr.Offset + Instr.Size;
4025 }
4026
4027 const MachineFunction *MF = MBB->getParent();
4028 // Multiple FP/SP updates in a loop cannot be described by CFI instructions.
4029 TSE.emitCode(
4030 InsertI, TFI, /*TryMergeSPUpdate = */
4031 !MF->getInfo<AArch64FunctionInfo>()->needsAsyncDwarfUnwindInfo(*MF));
4032
4033 return InsertI;
4034}
4035} // namespace
4036
4037void AArch64FrameLowering::processFunctionBeforeFrameIndicesReplaced(
4038 MachineFunction &MF, RegScavenger *RS = nullptr) const {
4039 for (auto &BB : MF)
4040 for (MachineBasicBlock::iterator II = BB.begin(); II != BB.end();) {
4041 if (StackTaggingMergeSetTag)
4042 II = tryMergeAdjacentSTG(II, this, RS);
4043 }
4044
4045 // By the time this method is called, most of the prologue/epilogue code is
4046 // already emitted, whether its location was affected by the shrink-wrapping
4047 // optimization or not.
4048 if (!MF.getFunction().hasFnAttribute(Attribute::Naked) &&
4049 shouldSignReturnAddressEverywhere(MF))
4051}
4052
4053/// For Win64 AArch64 EH, the offset to the Unwind object is from the SP
4054/// before the update. This is easily retrieved as it is exactly the offset
4055/// that is set in processFunctionBeforeFrameFinalized.
4056StackOffset AArch64FrameLowering::getFrameIndexReferencePreferSP(
4057 const MachineFunction &MF, int FI, Register &FrameReg,
4058 bool IgnoreSPUpdates) const {
4059 const MachineFrameInfo &MFI = MF.getFrameInfo();
4060 if (IgnoreSPUpdates) {
4061 LLVM_DEBUG(dbgs() << "Offset from the SP for " << FI << " is "
4062 << MFI.getObjectOffset(FI) << "\n");
4063 FrameReg = AArch64::SP;
4064 return StackOffset::getFixed(MFI.getObjectOffset(FI));
4065 }
4066
4067 // Go to common code if we cannot provide sp + offset.
4068 if (MFI.hasVarSizedObjects() ||
4069 MF.getInfo<AArch64FunctionInfo>()->getStackSizeSVE() ||
4070 MF.getSubtarget().getRegisterInfo()->hasStackRealignment(MF))
4071 return getFrameIndexReference(MF, FI, FrameReg);
4072
4073 FrameReg = AArch64::SP;
4074 return getStackOffset(MF, MFI.getObjectOffset(FI));
4075}
4076
4077/// The parent frame offset (aka dispFrame) is only used on X86_64 to retrieve
4078/// the parent's frame pointer
4079unsigned AArch64FrameLowering::getWinEHParentFrameOffset(
4080 const MachineFunction &MF) const {
4081 return 0;
4082}
4083
4084/// Funclets only need to account for space for the callee saved registers,
4085/// as the locals are accounted for in the parent's stack frame.
4086unsigned AArch64FrameLowering::getWinEHFuncletFrameSize(
4087 const MachineFunction &MF) const {
4088 // This is the size of the pushed CSRs.
4089 unsigned CSSize =
4090 MF.getInfo<AArch64FunctionInfo>()->getCalleeSavedStackSize();
4091 // This is the amount of stack a funclet needs to allocate.
4092 return alignTo(CSSize + MF.getFrameInfo().getMaxCallFrameSize(),
4093 getStackAlign());
4094}
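// For example (hypothetical sizes), 64 bytes of pushed CSRs plus a 40-byte
// maximum call frame rounds up to 112 bytes under the 16-byte stack alignment.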
4095
4096namespace {
4097struct FrameObject {
4098 bool IsValid = false;
4099 // Index of the object in MFI.
4100 int ObjectIndex = 0;
4101 // Group ID this object belongs to.
4102 int GroupIndex = -1;
4103 // This object should be placed first (closest to SP).
4104 bool ObjectFirst = false;
4105 // This object's group (which always contains the object with
4106 // ObjectFirst==true) should be placed first.
4107 bool GroupFirst = false;
4108
4109 // Used to distinguish between FP and GPR accesses. The values are decided so
4110 // that they sort FPR < Hazard < GPR and they can be or'd together.
4111 unsigned Accesses = 0;
4112 enum { AccessFPR = 1, AccessHazard = 2, AccessGPR = 4 };
4113};
4114
4115class GroupBuilder {
4116 SmallVector<int, 8> CurrentMembers;
4117 int NextGroupIndex = 0;
4118 std::vector<FrameObject> &Objects;
4119
4120public:
4121 GroupBuilder(std::vector<FrameObject> &Objects) : Objects(Objects) {}
4122 void AddMember(int Index) { CurrentMembers.push_back(Index); }
4123 void EndCurrentGroup() {
4124 if (CurrentMembers.size() > 1) {
4125 // Create a new group with the current member list. This might remove them
4126 // from their pre-existing groups. That's OK, dealing with overlapping
4127 // groups is too hard and unlikely to make a difference.
4128 LLVM_DEBUG(dbgs() << "group:");
4129 for (int Index : CurrentMembers) {
4130 Objects[Index].GroupIndex = NextGroupIndex;
4131 LLVM_DEBUG(dbgs() << " " << Index);
4132 }
4133 LLVM_DEBUG(dbgs() << "\n");
4134 NextGroupIndex++;
4135 }
4136 CurrentMembers.clear();
4137 }
4138};
4139
4140bool FrameObjectCompare(const FrameObject &A, const FrameObject &B) {
4141 // Objects at a lower index are closer to FP; objects at a higher index are
4142 // closer to SP.
4143 //
4144 // For consistency in our comparison, all invalid objects are placed
4145 // at the end. This also allows us to stop walking when we hit the
4146 // first invalid item after it's all sorted.
4147 //
4148 // If we want to include a stack hazard region, order FPR accesses < the
4149 // hazard object < GPRs accesses in order to create a separation between the
4150 // two. For the Accesses field 1 = FPR, 2 = Hazard Object, 4 = GPR.
4151 //
4152 // Otherwise the "first" object goes first (closest to SP), followed by the
4153 // members of the "first" group.
4154 //
4155 // The rest are sorted by the group index to keep the groups together.
4156 // Higher numbered groups are more likely to be around longer (i.e. untagged
4157 // in the function epilogue and not at some earlier point). Place them closer
4158 // to SP.
4159 //
4160 // If all else equal, sort by the object index to keep the objects in the
4161 // original order.
4162 return std::make_tuple(!A.IsValid, A.Accesses, A.ObjectFirst, A.GroupFirst,
4163 A.GroupIndex, A.ObjectIndex) <
4164 std::make_tuple(!B.IsValid, B.Accesses, B.ObjectFirst, B.GroupFirst,
4165 B.GroupIndex, B.ObjectIndex);
4166}
4167} // namespace
4168
4169void AArch64FrameLowering::orderFrameObjects(
4170 const MachineFunction &MF, SmallVectorImpl<int> &ObjectsToAllocate) const {
4171 if (!OrderFrameObjects || ObjectsToAllocate.empty())
4172 return;
4173
4174 const AArch64FunctionInfo &AFI = *MF.getInfo<AArch64FunctionInfo>();
4175 const MachineFrameInfo &MFI = MF.getFrameInfo();
4176 std::vector<FrameObject> FrameObjects(MFI.getObjectIndexEnd());
4177 for (auto &Obj : ObjectsToAllocate) {
4178 FrameObjects[Obj].IsValid = true;
4179 FrameObjects[Obj].ObjectIndex = Obj;
4180 }
4181
4182 // Identify FPR vs GPR slots for hazards, and stack slots that are tagged at
4183 // the same time.
4184 GroupBuilder GB(FrameObjects);
4185 for (auto &MBB : MF) {
4186 for (auto &MI : MBB) {
4187 if (MI.isDebugInstr())
4188 continue;
4189
4190 if (AFI.hasStackHazardSlotIndex()) {
4191 std::optional<int> FI = getLdStFrameID(MI, MFI);
4192 if (FI && *FI >= 0 && *FI < (int)FrameObjects.size()) {
4193 if (MFI.getStackID(*FI) == TargetStackID::ScalableVector ||
4194 AArch64InstrInfo::isFpOrNEON(MI))
4195 FrameObjects[*FI].Accesses |= FrameObject::AccessFPR;
4196 else
4197 FrameObjects[*FI].Accesses |= FrameObject::AccessGPR;
4198 }
4199 }
4200
4201 int OpIndex;
4202 switch (MI.getOpcode()) {
4203 case AArch64::STGloop:
4204 case AArch64::STZGloop:
4205 OpIndex = 3;
4206 break;
4207 case AArch64::STGi:
4208 case AArch64::STZGi:
4209 case AArch64::ST2Gi:
4210 case AArch64::STZ2Gi:
4211 OpIndex = 1;
4212 break;
4213 default:
4214 OpIndex = -1;
4215 }
4216
4217 int TaggedFI = -1;
4218 if (OpIndex >= 0) {
4219 const MachineOperand &MO = MI.getOperand(OpIndex);
4220 if (MO.isFI()) {
4221 int FI = MO.getIndex();
4222 if (FI >= 0 && FI < MFI.getObjectIndexEnd() &&
4223 FrameObjects[FI].IsValid)
4224 TaggedFI = FI;
4225 }
4226 }
4227
4228 // If this is a stack tagging instruction for a slot that is not part of a
4229 // group yet, either start a new group or add it to the current one.
4230 if (TaggedFI >= 0)
4231 GB.AddMember(TaggedFI);
4232 else
4233 GB.EndCurrentGroup();
4234 }
4235 // Groups should never span multiple basic blocks.
4236 GB.EndCurrentGroup();
4237 }
4238
4239 if (AFI.hasStackHazardSlotIndex()) {
4240 FrameObjects[AFI.getStackHazardSlotIndex()].Accesses =
4241 FrameObject::AccessHazard;
4242 // If a stack object is unknown or both GPR and FPR, sort it into GPR.
4243 for (auto &Obj : FrameObjects)
4244 if (!Obj.Accesses ||
4245 Obj.Accesses == (FrameObject::AccessGPR | FrameObject::AccessFPR))
4246 Obj.Accesses = FrameObject::AccessGPR;
4247 }
4248
4249 // If the function's tagged base pointer is pinned to a stack slot, we want to
4250 // put that slot first when possible. This will likely place it at SP + 0,
4251 // and save one instruction when generating the base pointer because IRG does
4252 // not allow an immediate offset.
4253 std::optional<int> TBPI = AFI.getTaggedBasePointerIndex();
4254 if (TBPI) {
4255 FrameObjects[*TBPI].ObjectFirst = true;
4256 FrameObjects[*TBPI].GroupFirst = true;
4257 int FirstGroupIndex = FrameObjects[*TBPI].GroupIndex;
4258 if (FirstGroupIndex >= 0)
4259 for (FrameObject &Object : FrameObjects)
4260 if (Object.GroupIndex == FirstGroupIndex)
4261 Object.GroupFirst = true;
4262 }
4263
4264 llvm::stable_sort(FrameObjects, FrameObjectCompare);
4265
4266 int i = 0;
4267 for (auto &Obj : FrameObjects) {
4268 // All invalid items are sorted at the end, so it's safe to stop.
4269 if (!Obj.IsValid)
4270 break;
4271 ObjectsToAllocate[i++] = Obj.ObjectIndex;
4272 }
4273
4274 LLVM_DEBUG({
4275 dbgs() << "Final frame order:\n";
4276 for (auto &Obj : FrameObjects) {
4277 if (!Obj.IsValid)
4278 break;
4279 dbgs() << " " << Obj.ObjectIndex << ": group " << Obj.GroupIndex;
4280 if (Obj.ObjectFirst)
4281 dbgs() << ", first";
4282 if (Obj.GroupFirst)
4283 dbgs() << ", group-first";
4284 dbgs() << "\n";
4285 }
4286 });
4287}
4288
4289/// Emit a loop to decrement SP until it is equal to TargetReg, with probes at
4290/// least every ProbeSize bytes. Returns an iterator of the first instruction
4291/// after the loop. The difference between SP and TargetReg must be an exact
4292/// multiple of ProbeSize.
4293MachineBasicBlock::iterator
4294AArch64FrameLowering::inlineStackProbeLoopExactMultiple(
4295 MachineBasicBlock::iterator MBBI, int64_t ProbeSize,
4296 Register TargetReg) const {
4297 MachineBasicBlock &MBB = *MBBI->getParent();
4298 MachineFunction &MF = *MBB.getParent();
4299 const AArch64InstrInfo *TII =
4300 MF.getSubtarget<AArch64Subtarget>().getInstrInfo();
4301 DebugLoc DL = MBB.findDebugLoc(MBBI);
4302
4303 MachineFunction::iterator MBBInsertPoint = std::next(MBB.getIterator());
4304 MachineBasicBlock *LoopMBB = MF.CreateMachineBasicBlock(MBB.getBasicBlock());
4305 MF.insert(MBBInsertPoint, LoopMBB);
4306 MachineBasicBlock *ExitMBB = MF.CreateMachineBasicBlock(MBB.getBasicBlock());
4307 MF.insert(MBBInsertPoint, ExitMBB);
4308
4309 // SUB SP, SP, #ProbeSize (or equivalent if ProbeSize is not encodable
4310 // in SUB).
4311 emitFrameOffset(*LoopMBB, LoopMBB->end(), DL, AArch64::SP, AArch64::SP,
4312 StackOffset::getFixed(-ProbeSize), TII,
4313 MachineInstr::FrameSetup);
4314 // STR XZR, [SP]
4315 BuildMI(*LoopMBB, LoopMBB->end(), DL, TII->get(AArch64::STRXui))
4316 .addReg(AArch64::XZR)
4317 .addReg(AArch64::SP)
4318 .addImm(0)
4319 .setMIFlags(MachineInstr::FrameSetup);
4320 // CMP SP, TargetReg
4321 BuildMI(*LoopMBB, LoopMBB->end(), DL, TII->get(AArch64::SUBSXrx64),
4322 AArch64::XZR)
4323 .addReg(AArch64::SP)
4324 .addReg(TargetReg)
4325 .addImm(AArch64_AM::getArithExtendImm(AArch64_AM::UXTX, 0))
4326 .setMIFlags(MachineInstr::FrameSetup);
4327 // B.CC Loop
4328 BuildMI(*LoopMBB, LoopMBB->end(), DL, TII->get(AArch64::Bcc))
4329 .addImm(AArch64CC::NE)
4330 .addMBB(LoopMBB)
4331 .setMIFlags(MachineInstr::FrameSetup);
4332
4333 LoopMBB->addSuccessor(ExitMBB);
4334 LoopMBB->addSuccessor(LoopMBB);
4335 // Synthesize the exit MBB.
4336 ExitMBB->splice(ExitMBB->end(), &MBB, MBBI, MBB.end());
4337 ExitMBB->transferSuccessorsAndUpdatePHIs(&MBB);
4338 MBB.addSuccessor(LoopMBB);
4339 // Update liveins.
4340 fullyRecomputeLiveIns({ExitMBB, LoopMBB});
4341
4342 return ExitMBB->begin();
4343}
4344
4345void AArch64FrameLowering::inlineStackProbeFixed(
4346 MachineBasicBlock::iterator MBBI, Register ScratchReg, int64_t FrameSize,
4347 StackOffset CFAOffset) const {
4348 MachineBasicBlock *MBB = MBBI->getParent();
4349 MachineFunction &MF = *MBB->getParent();
4350 const AArch64InstrInfo *TII =
4351 MF.getSubtarget<AArch64Subtarget>().getInstrInfo();
4352 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
4353 bool EmitAsyncCFI = AFI->needsAsyncDwarfUnwindInfo(MF);
4354 bool HasFP = hasFP(MF);
4355
4356 DebugLoc DL;
4357 int64_t ProbeSize = MF.getInfo<AArch64FunctionInfo>()->getStackProbeSize();
4358 int64_t NumBlocks = FrameSize / ProbeSize;
4359 int64_t ResidualSize = FrameSize % ProbeSize;
4360
4361 LLVM_DEBUG(dbgs() << "Stack probing: total " << FrameSize << " bytes, "
4362 << NumBlocks << " blocks of " << ProbeSize
4363 << " bytes, plus " << ResidualSize << " bytes\n");
4364
4365 // Decrement SP by NumBlocks * ProbeSize bytes, with either an unrolled
4366 // sequence or an ordinary loop.
4367 if (NumBlocks <= AArch64::StackProbeMaxLoopUnroll) {
4368 for (int i = 0; i < NumBlocks; ++i) {
4369 // SUB SP, SP, #ProbeSize (or equivalent if ProbeSize is not
4370 // encodable in a SUB).
4371 emitFrameOffset(*MBB, MBBI, DL, AArch64::SP, AArch64::SP,
4372 StackOffset::getFixed(-ProbeSize), TII,
4373 MachineInstr::FrameSetup, false, false, nullptr,
4374 EmitAsyncCFI && !HasFP, CFAOffset);
4375 CFAOffset += StackOffset::getFixed(ProbeSize);
4376 // STR XZR, [SP]
4377 BuildMI(*MBB, MBBI, DL, TII->get(AArch64::STRXui))
4378 .addReg(AArch64::XZR)
4379 .addReg(AArch64::SP)
4380 .addImm(0)
4381 .setMIFlags(MachineInstr::FrameSetup);
4382 }
4383 } else if (NumBlocks != 0) {
4384 // SUB ScratchReg, SP, #FrameSize (or equivalent if FrameSize is not
4385 // encodable in ADD). ScratchReg may temporarily become the CFA register.
4386 emitFrameOffset(*MBB, MBBI, DL, ScratchReg, AArch64::SP,
4387 StackOffset::getFixed(-ProbeSize * NumBlocks), TII,
4388 MachineInstr::FrameSetup, false, false, nullptr,
4389 EmitAsyncCFI && !HasFP, CFAOffset);
4390 CFAOffset += StackOffset::getFixed(ProbeSize * NumBlocks);
4391 MBBI = inlineStackProbeLoopExactMultiple(MBBI, ProbeSize, ScratchReg);
4392 MBB = MBBI->getParent();
4393 if (EmitAsyncCFI && !HasFP) {
4394 // Set the CFA register back to SP.
4395 CFIInstBuilder(*MBB, MBBI, MachineInstr::FrameSetup)
4396 .buildDefCFARegister(AArch64::SP);
4397 }
4398 }
4399
4400 if (ResidualSize != 0) {
4401 // SUB SP, SP, #ResidualSize (or equivalent if ResidualSize is not encodable
4402 // in SUB).
4403 emitFrameOffset(*MBB, MBBI, DL, AArch64::SP, AArch64::SP,
4404 StackOffset::getFixed(-ResidualSize), TII,
4405 MachineInstr::FrameSetup, false, false, nullptr,
4406 EmitAsyncCFI && !HasFP, CFAOffset);
4407 if (ResidualSize > AArch64::StackProbeMaxUnprobedStack) {
4408 // STR XZR, [SP]
4409 BuildMI(*MBB, MBBI, DL, TII->get(AArch64::STRXui))
4410 .addReg(AArch64::XZR)
4411 .addReg(AArch64::SP)
4412 .addImm(0)
4413 .setMIFlags(MachineInstr::FrameSetup);
4414 }
4415 }
4416}
4417
4418void AArch64FrameLowering::inlineStackProbe(MachineFunction &MF,
4419 MachineBasicBlock &MBB) const {
4420 // Get the instructions that need to be replaced. We emit at most two of
4421 // these. Remember them in order to avoid complications coming from the need
4422 // to traverse the block while potentially creating more blocks.
4423 SmallVector<MachineInstr *, 4> ToReplace;
4424 for (MachineInstr &MI : MBB)
4425 if (MI.getOpcode() == AArch64::PROBED_STACKALLOC ||
4426 MI.getOpcode() == AArch64::PROBED_STACKALLOC_VAR)
4427 ToReplace.push_back(&MI);
4428
4429 for (MachineInstr *MI : ToReplace) {
4430 if (MI->getOpcode() == AArch64::PROBED_STACKALLOC) {
4431 Register ScratchReg = MI->getOperand(0).getReg();
4432 int64_t FrameSize = MI->getOperand(1).getImm();
4433 StackOffset CFAOffset = StackOffset::get(MI->getOperand(2).getImm(),
4434 MI->getOperand(3).getImm());
4435 inlineStackProbeFixed(MI->getIterator(), ScratchReg, FrameSize,
4436 CFAOffset);
4437 } else {
4438 assert(MI->getOpcode() == AArch64::PROBED_STACKALLOC_VAR &&
4439 "Stack probe pseudo-instruction expected");
4440 const AArch64InstrInfo *TII =
4441 MI->getMF()->getSubtarget<AArch64Subtarget>().getInstrInfo();
4442 Register TargetReg = MI->getOperand(0).getReg();
4443 (void)TII->probedStackAlloc(MI->getIterator(), TargetReg, true);
4444 }
4445 MI->eraseFromParent();
4446 }
4447}
4448
4449struct StackAccess {
4450 enum AccessType {
4451 NotAccessed = 0, // Stack object not accessed by load/store instructions.
4452 GPR = 1 << 0, // A general purpose register.
4453 PPR = 1 << 1, // A predicate register.
4454 FPR = 1 << 2, // A floating point/Neon/SVE register.
4455 };
4456
4457 int Idx;
4458 StackOffset Offset;
4459 int64_t Size;
4460 unsigned AccessTypes;
4461
4462 StackAccess() : Idx(0), Offset(), Size(0), AccessTypes(NotAccessed) {}
4463
4464 bool operator<(const StackAccess &Rhs) const {
4465 return std::make_tuple(start(), Idx) <
4466 std::make_tuple(Rhs.start(), Rhs.Idx);
4467 }
4468
4469 bool isCPU() const {
4470 // Predicate register load and store instructions execute on the CPU.
4471 return AccessTypes & (AccessType::GPR | AccessType::PPR);
4472 }
4473 bool isSME() const { return AccessTypes & AccessType::FPR; }
4474 bool isMixed() const { return isCPU() && isSME(); }
4475
4476 int64_t start() const { return Offset.getFixed() + Offset.getScalable(); }
4477 int64_t end() const { return start() + Size; }
4478
4479 std::string getTypeString() const {
4480 switch (AccessTypes) {
4481 case AccessType::FPR:
4482 return "FPR";
4483 case AccessType::PPR:
4484 return "PPR";
4485 case AccessType::GPR:
4486 return "GPR";
4488 return "NA";
4489 default:
4490 return "Mixed";
4491 }
4492 }
4493
4494 void print(raw_ostream &OS) const {
4495 OS << getTypeString() << " stack object at [SP"
4496 << (Offset.getFixed() < 0 ? "" : "+") << Offset.getFixed();
4497 if (Offset.getScalable())
4498 OS << (Offset.getScalable() < 0 ? "" : "+") << Offset.getScalable()
4499 << " * vscale";
4500 OS << "]";
4501 }
4502};
4503
4504static inline raw_ostream &operator<<(raw_ostream &OS, const StackAccess &SA) {
4505 SA.print(OS);
4506 return OS;
4507}
4508
4509void AArch64FrameLowering::emitRemarks(
4510 const MachineFunction &MF, MachineOptimizationRemarkEmitter *ORE) const {
4511
4512 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
4514 return;
4515
4516 unsigned StackHazardSize = getStackHazardSize(MF);
4517 const uint64_t HazardSize =
4518 (StackHazardSize) ? StackHazardSize : StackHazardRemarkSize;
4519
4520 if (HazardSize == 0)
4521 return;
4522
4523 const MachineFrameInfo &MFI = MF.getFrameInfo();
4524 // Bail if function has no stack objects.
4525 if (!MFI.hasStackObjects())
4526 return;
4527
4528 std::vector<StackAccess> StackAccesses(MFI.getNumObjects());
4529
4530 size_t NumFPLdSt = 0;
4531 size_t NumNonFPLdSt = 0;
4532
4533 // Collect stack accesses via Load/Store instructions.
4534 for (const MachineBasicBlock &MBB : MF) {
4535 for (const MachineInstr &MI : MBB) {
4536 if (!MI.mayLoadOrStore() || MI.getNumMemOperands() < 1)
4537 continue;
4538 for (MachineMemOperand *MMO : MI.memoperands()) {
4539 std::optional<int> FI = getMMOFrameID(MMO, MFI);
4540 if (FI && !MFI.isDeadObjectIndex(*FI)) {
4541 int FrameIdx = *FI;
4542
4543 size_t ArrIdx = FrameIdx + MFI.getNumFixedObjects();
4544 if (StackAccesses[ArrIdx].AccessTypes == StackAccess::NotAccessed) {
4545 StackAccesses[ArrIdx].Idx = FrameIdx;
4546 StackAccesses[ArrIdx].Offset =
4547 getFrameIndexReferenceFromSP(MF, FrameIdx);
4548 StackAccesses[ArrIdx].Size = MFI.getObjectSize(FrameIdx);
4549 }
4550
4551 unsigned RegTy = StackAccess::AccessType::GPR;
4552 if (MFI.getStackID(FrameIdx) == TargetStackID::ScalableVector) {
4553 // SPILL_PPR_TO_ZPR_SLOT_PSEUDO and FILL_PPR_FROM_ZPR_SLOT_PSEUDO
4554 // spill/fill the predicate as a data vector (so they count as FPR accesses).
4555 if (MI.getOpcode() != AArch64::SPILL_PPR_TO_ZPR_SLOT_PSEUDO &&
4556 MI.getOpcode() != AArch64::FILL_PPR_FROM_ZPR_SLOT_PSEUDO &&
4557 AArch64::PPRRegClass.contains(MI.getOperand(0).getReg())) {
4558 RegTy = StackAccess::PPR;
4559 } else
4560 RegTy = StackAccess::FPR;
4561 } else if (AArch64InstrInfo::isFpOrNEON(MI)) {
4562 RegTy = StackAccess::FPR;
4563 }
4564
4565 StackAccesses[ArrIdx].AccessTypes |= RegTy;
4566
4567 if (RegTy == StackAccess::FPR)
4568 ++NumFPLdSt;
4569 else
4570 ++NumNonFPLdSt;
4571 }
4572 }
4573 }
4574 }
4575
4576 if (NumFPLdSt == 0 || NumNonFPLdSt == 0)
4577 return;
4578
4579 llvm::sort(StackAccesses);
4580 llvm::erase_if(StackAccesses, [](const StackAccess &S) {
4581 return S.AccessTypes == StackAccess::NotAccessed;
4582 });
4583
4584 SmallVector<const StackAccess *> MixedObjects;
4585 SmallVector<std::pair<const StackAccess *, const StackAccess *>> HazardPairs;
4586
4587 if (StackAccesses.front().isMixed())
4588 MixedObjects.push_back(&StackAccesses.front());
4589
4590 for (auto It = StackAccesses.begin(), End = std::prev(StackAccesses.end());
4591 It != End; ++It) {
4592 const auto &First = *It;
4593 const auto &Second = *(It + 1);
4594
4595 if (Second.isMixed())
4596 MixedObjects.push_back(&Second);
4597
4598 if ((First.isSME() && Second.isCPU()) ||
4599 (First.isCPU() && Second.isSME())) {
4600 uint64_t Distance = static_cast<uint64_t>(Second.start() - First.end());
4601 if (Distance < HazardSize)
4602 HazardPairs.emplace_back(&First, &Second);
4603 }
4604 }
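 // For example (hypothetical layout), an FPR-accessed slot ending at SP+16 and
 // a GPR-accessed slot starting at SP+24 are only 8 bytes apart, so with any
 // hazard size larger than 8 they are reported as a hazard pair.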
4605
4606 auto EmitRemark = [&](llvm::StringRef Str) {
4607 ORE->emit([&]() {
4608 auto R = MachineOptimizationRemarkAnalysis(
4609 "sme", "StackHazard", MF.getFunction().getSubprogram(), &MF.front());
4610 return R << formatv("stack hazard in '{0}': ", MF.getName()).str() << Str;
4611 });
4612 };
4613
4614 for (const auto &P : HazardPairs)
4615 EmitRemark(formatv("{0} is too close to {1}", *P.first, *P.second).str());
4616
4617 for (const auto *Obj : MixedObjects)
4618 EmitRemark(
4619 formatv("{0} accessed by both GP and FP instructions", *Obj).str());
4620}
unsigned const MachineRegisterInfo * MRI
static void getLiveRegsForEntryMBB(LivePhysRegs &LiveRegs, const MachineBasicBlock &MBB)
static Register tryScavengeRegister(LiveRegUnits const &UsedRegs, BitVector const &ScavengeableRegs, Register PreferredReg)
Attempts to scavenge a register from ScavengeableRegs given the used registers in UsedRegs.
static const unsigned DefaultSafeSPDisplacement
This is the biggest offset to the stack pointer we can encode in aarch64 instructions (without using ...
static bool isInPrologueOrEpilogue(const MachineInstr &MI)
static bool produceCompactUnwindFrame(const AArch64FrameLowering &, MachineFunction &MF)
static bool expandFillPPRFromZPRSlotPseudo(MachineBasicBlock &MBB, MachineInstr &MI, const TargetRegisterInfo &TRI, LiveRegUnits const &UsedRegs, ScavengeableRegs const &SR, MachineInstr *&LastPTrue, EmergencyStackSlots &SpillSlots)
Expands:
static cl::opt< bool > StackTaggingMergeSetTag("stack-tagging-merge-settag", cl::desc("merge settag instruction in function epilog"), cl::init(true), cl::Hidden)
bool enableMultiVectorSpillFill(const AArch64Subtarget &Subtarget, MachineFunction &MF)
static std::optional< int > getLdStFrameID(const MachineInstr &MI, const MachineFrameInfo &MFI)
static cl::opt< bool > StackHazardInNonStreaming("aarch64-stack-hazard-in-non-streaming", cl::init(false), cl::Hidden)
static bool matchLibcall(const TargetLowering &TLI, const MachineOperand &MO, RTLIB::Libcall LC)
void computeCalleeSaveRegisterPairs(const AArch64FrameLowering &AFL, MachineFunction &MF, ArrayRef< CalleeSavedInfo > CSI, const TargetRegisterInfo *TRI, SmallVectorImpl< RegPairInfo > &RegPairs, bool NeedsFrameRecord)
static int64_t determineSVEStackObjectOffsets(MachineFrameInfo &MFI, int &MinCSFrameIndex, int &MaxCSFrameIndex, bool AssignOffsets)
static cl::opt< bool > OrderFrameObjects("aarch64-order-frame-objects", cl::desc("sort stack allocations"), cl::init(true), cl::Hidden)
static bool invalidateWindowsRegisterPairing(unsigned Reg1, unsigned Reg2, bool NeedsWinCFI, bool IsFirst, const TargetRegisterInfo *TRI)
static void fixupSEHOpcode(MachineBasicBlock::iterator MBBI, unsigned LocalStackSize)
static cl::opt< bool > DisableMultiVectorSpillFill("aarch64-disable-multivector-spill-fill", cl::desc("Disable use of LD/ST pairs for SME2 or SVE2p1"), cl::init(false), cl::Hidden)
static cl::opt< bool > EnableRedZone("aarch64-redzone", cl::desc("enable use of redzone on AArch64"), cl::init(false), cl::Hidden)
static MachineBasicBlock::iterator InsertSEH(MachineBasicBlock::iterator MBBI, const TargetInstrInfo &TII, MachineInstr::MIFlag Flag)
cl::opt< bool > EnableHomogeneousPrologEpilog("homogeneous-prolog-epilog", cl::Hidden, cl::desc("Emit homogeneous prologue and epilogue for the size " "optimization (default = off)"))
static bool expandSMEPPRToZPRSpillPseudos(MachineBasicBlock &MBB, const TargetRegisterInfo &TRI, ScavengeableRegs const &SR, EmergencyStackSlots &SpillSlots)
Expands all FILL_PPR_FROM_ZPR_SLOT_PSEUDO and SPILL_PPR_TO_ZPR_SLOT_PSEUDO operations within the Mach...
static bool isLikelyToHaveSVEStack(const AArch64FrameLowering &AFL, const MachineFunction &MF)
static bool invalidateRegisterPairing(unsigned Reg1, unsigned Reg2, bool UsesWinAAPCS, bool NeedsWinCFI, bool NeedsFrameRecord, bool IsFirst, const TargetRegisterInfo *TRI)
Returns true if Reg1 and Reg2 cannot be paired using a ldp/stp instruction.
unsigned findFreePredicateReg(BitVector &SavedRegs)
static unsigned getPrologueDeath(MachineFunction &MF, unsigned Reg)
static void expandSpillPPRToZPRSlotPseudo(MachineBasicBlock &MBB, MachineInstr &MI, const TargetRegisterInfo &TRI, LiveRegUnits const &UsedRegs, ScavengeableRegs const &SR, EmergencyStackSlots &SpillSlots)
Expands:
static bool isTargetWindows(const MachineFunction &MF)
static int64_t upperBound(StackOffset Size)
static unsigned estimateRSStackSizeLimit(MachineFunction &MF)
Look at each instruction that references stack frames and return the stack size limit beyond which so...
static bool getSVECalleeSaveSlotRange(const MachineFrameInfo &MFI, int &Min, int &Max)
returns true if there are any SVE callee saves.
static cl::opt< unsigned > StackHazardRemarkSize("aarch64-stack-hazard-remark-size", cl::init(0), cl::Hidden)
static MCRegister getRegisterOrZero(MCRegister Reg, bool HasSVE)
static unsigned getStackHazardSize(const MachineFunction &MF)
static void propagateFrameFlags(MachineInstr &SourceMI, ArrayRef< MachineInstr * > MachineInstrs)
Propagates frame-setup/destroy flags from SourceMI to all instructions in MachineInstrs.
static std::optional< int > getMMOFrameID(MachineMemOperand *MMO, const MachineFrameInfo &MFI)
assert(UImm &&(UImm !=~static_cast< T >(0)) &&"Invalid immediate!")
This file contains the declaration of the AArch64PrologueEmitter and AArch64EpilogueEmitter classes,...
aarch64 promote const
static const int kSetTagLoopThreshold
MachineBasicBlock & MBB
MachineBasicBlock MachineBasicBlock::iterator DebugLoc DL
MachineBasicBlock MachineBasicBlock::iterator MBBI
This file contains the simple types necessary to represent the attributes associated with functions a...
#define CASE(ATTRNAME, AANAME,...)
static GCRegistry::Add< ErlangGC > A("erlang", "erlang-compatible garbage collector")
static GCRegistry::Add< CoreCLRGC > E("coreclr", "CoreCLR-compatible GC")
static GCRegistry::Add< OcamlGC > B("ocaml", "ocaml 3.10-compatible GC")
DXIL Forward Handle Accesses
const HexagonInstrInfo * TII
IRTranslator LLVM IR MI
static std::string getTypeString(Type *T)
Definition LLParser.cpp:67
This file implements the LivePhysRegs utility for tracking liveness of physical registers.
#define F(x, y, z)
Definition MD5.cpp:55
#define I(x, y, z)
Definition MD5.cpp:58
#define H(x, y, z)
Definition MD5.cpp:57
Register Reg
Register const TargetRegisterInfo * TRI
Promote Memory to Register
Definition Mem2Reg.cpp:110
uint64_t IntrinsicInst * II
#define P(N)
This file declares the machine register scavenger class.
unsigned OpIndex
static bool contains(SmallPtrSetImpl< ConstantExpr * > &Cache, ConstantExpr *Expr, Constant *C)
Definition Value.cpp:480
This file defines the make_scope_exit function, which executes user-defined cleanup logic at scope ex...
This file defines the SmallVector class.
#define LLVM_DEBUG(...)
Definition Debug.h:114
void processFunctionBeforeFrameIndicesReplaced(MachineFunction &MF, RegScavenger *RS) const override
processFunctionBeforeFrameIndicesReplaced - This method is called immediately before MO_FrameIndex op...
MachineBasicBlock::iterator eliminateCallFramePseudoInstr(MachineFunction &MF, MachineBasicBlock &MBB, MachineBasicBlock::iterator I) const override
This method is called during prolog/epilog code insertion to eliminate call frame setup and destroy p...
bool canUseAsPrologue(const MachineBasicBlock &MBB) const override
Check whether or not the given MBB can be used as a prologue for the target.
bool enableStackSlotScavenging(const MachineFunction &MF) const override
Returns true if the stack slot holes in the fixed and callee-save stack area should be used when allo...
bool spillCalleeSavedRegisters(MachineBasicBlock &MBB, MachineBasicBlock::iterator MI, ArrayRef< CalleeSavedInfo > CSI, const TargetRegisterInfo *TRI) const override
spillCalleeSavedRegisters - Issues instruction(s) to spill all callee saved registers and returns tru...
bool restoreCalleeSavedRegisters(MachineBasicBlock &MBB, MachineBasicBlock::iterator MI, MutableArrayRef< CalleeSavedInfo > CSI, const TargetRegisterInfo *TRI) const override
restoreCalleeSavedRegisters - Issues instruction(s) to restore all callee saved registers and returns...
bool enableFullCFIFixup(const MachineFunction &MF) const override
enableFullCFIFixup - Returns true if we may need to fix the unwind information such that it is accura...
StackOffset getFrameIndexReferenceFromSP(const MachineFunction &MF, int FI) const override
getFrameIndexReferenceFromSP - This method returns the offset from the stack pointer to the slot of t...
bool enableCFIFixup(const MachineFunction &MF) const override
Returns true if we may need to fix the unwind information for the function.
StackOffset getNonLocalFrameIndexReference(const MachineFunction &MF, int FI) const override
getNonLocalFrameIndexReference - This method returns the offset used to reference a frame index locat...
TargetStackID::Value getStackIDForScalableVectors() const override
Returns the StackID that scalable vectors should be associated with.
bool hasFPImpl(const MachineFunction &MF) const override
hasFPImpl - Return true if the specified function should have a dedicated frame pointer register.
void emitPrologue(MachineFunction &MF, MachineBasicBlock &MBB) const override
emitProlog/emitEpilog - These methods insert prolog and epilog code into the function.
void resetCFIToInitialState(MachineBasicBlock &MBB) const override
Emit CFI instructions that recreate the state of the unwind information upon function entry.
bool hasReservedCallFrame(const MachineFunction &MF) const override
hasReservedCallFrame - Under normal circumstances, when a frame pointer is not required,...
bool canUseRedZone(const MachineFunction &MF) const
Can this function use the red zone for local allocations.
bool needsWinCFI(const MachineFunction &MF) const
bool isFPReserved(const MachineFunction &MF) const
Should the Frame Pointer be reserved for the current function?
void processFunctionBeforeFrameFinalized(MachineFunction &MF, RegScavenger *RS) const override
processFunctionBeforeFrameFinalized - This method is called immediately before the specified function...
int getSEHFrameIndexOffset(const MachineFunction &MF, int FI) const
unsigned getWinEHFuncletFrameSize(const MachineFunction &MF) const
Funclets only need to account for space for the callee saved registers, as the locals are accounted f...
void orderFrameObjects(const MachineFunction &MF, SmallVectorImpl< int > &ObjectsToAllocate) const override
Order the symbols in the local stack frame.
void emitEpilogue(MachineFunction &MF, MachineBasicBlock &MBB) const override
void determineCalleeSaves(MachineFunction &MF, BitVector &SavedRegs, RegScavenger *RS) const override
This method determines which of the registers reported by TargetRegisterInfo::getCalleeSavedRegs() sh...
StackOffset getSVEStackSize(const MachineFunction &MF) const
Returns the size of the entire SVE stack frame (callee saves + spills).
StackOffset getFrameIndexReference(const MachineFunction &MF, int FI, Register &FrameReg) const override
getFrameIndexReference - Provide a base+offset reference to an FI slot for debug info.
StackOffset resolveFrameOffsetReference(const MachineFunction &MF, int64_t ObjectOffset, bool isFixed, bool isSVE, Register &FrameReg, bool PreferFP, bool ForSimm) const
bool assignCalleeSavedSpillSlots(MachineFunction &MF, const TargetRegisterInfo *TRI, std::vector< CalleeSavedInfo > &CSI, unsigned &MinCSFrameIndex, unsigned &MaxCSFrameIndex) const override
assignCalleeSavedSpillSlots - Allows target to override spill slot assignment logic.
StackOffset getFrameIndexReferencePreferSP(const MachineFunction &MF, int FI, Register &FrameReg, bool IgnoreSPUpdates) const override
For Win64 AArch64 EH, the offset to the Unwind object is from the SP before the update.
StackOffset resolveFrameIndexReference(const MachineFunction &MF, int FI, Register &FrameReg, bool PreferFP, bool ForSimm) const
unsigned getWinEHParentFrameOffset(const MachineFunction &MF) const override
The parent frame offset (aka dispFrame) is only used on X86_64 to retrieve the parent's frame pointer...
bool requiresSaveVG(const MachineFunction &MF) const
void emitPacRetPlusLeafHardening(MachineFunction &MF) const
Harden the entire function with pac-ret.
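A minimal usage sketch (not part of this file; the helper name frameIndexAddress is hypothetical) showing how the overrides listed above are reached through the generic TargetFrameLowering interface:

    #include "llvm/CodeGen/MachineFunction.h"
    #include "llvm/CodeGen/Register.h"
    #include "llvm/CodeGen/TargetFrameLowering.h"
    #include "llvm/CodeGen/TargetSubtargetInfo.h"

    // Compute the base register and offset used to address frame index FI for
    // debug info; on AArch64 this dispatches to the overrides listed above.
    static llvm::StackOffset frameIndexAddress(const llvm::MachineFunction &MF,
                                               int FI, llvm::Register &FrameReg) {
      const llvm::TargetFrameLowering *TFI = MF.getSubtarget().getFrameLowering();
      return TFI->getFrameIndexReference(MF, FI, FrameReg);
    }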
AArch64FunctionInfo - This class is derived from MachineFunctionInfo and contains private AArch64-spe...
unsigned getCalleeSavedStackSize(const MachineFrameInfo &MFI) const
void setCalleeSaveBaseToFrameRecordOffset(int Offset)
bool shouldSignReturnAddress(const MachineFunction &MF) const
std::optional< int > getTaggedBasePointerIndex() const
bool needsDwarfUnwindInfo(const MachineFunction &MF) const
bool needsAsyncDwarfUnwindInfo(const MachineFunction &MF) const
void setMinMaxSVECSFrameIndex(int Min, int Max)
static bool isTailCallReturnInst(const MachineInstr &MI)
Returns true if MI is one of the TCRETURN* instructions.
static bool isFpOrNEON(Register Reg)
Returns whether the physical register is FP or NEON.
const AArch64RegisterInfo * getRegisterInfo() const override
bool isNeonAvailable() const
Returns true if the target has NEON and the function at runtime is known to have NEON enabled (e....
const AArch64InstrInfo * getInstrInfo() const override
const AArch64TargetLowering * getTargetLowering() const override
bool isSVEorStreamingSVEAvailable() const
Returns true if the target has access to either the full range of SVE instructions,...
bool isStreaming() const
Returns true if the function has a streaming body.
bool hasInlineStackProbe(const MachineFunction &MF) const override
True if stack clash protection is enabled for this function.
unsigned getRedZoneSize(const Function &F) const
ArrayRef - Represent a constant reference to an array (0 or more elements consecutively in memory),...
Definition ArrayRef.h:41
size_t size() const
size - Get the array size.
Definition ArrayRef.h:147
bool empty() const
empty - Check if the array is empty.
Definition ArrayRef.h:142
bool test(unsigned Idx) const
Definition BitVector.h:461
BitVector & reset()
Definition BitVector.h:392
size_type count() const
count - Returns the number of bits which are set.
Definition BitVector.h:162
BitVector & set()
Definition BitVector.h:351
iterator_range< const_set_bits_iterator > set_bits() const
Definition BitVector.h:140
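A tiny illustration (register numbers are arbitrary, not from this file) of the BitVector operations listed above, in the style of code that marks callee-saved registers:

    #include "llvm/ADT/BitVector.h"

    static unsigned countMarkedRegs() {
      llvm::BitVector SavedRegs(64);
      SavedRegs.set(29);                      // e.g. frame pointer
      SavedRegs.set(30);                      // e.g. link register
      unsigned N = 0;
      for (unsigned Reg : SavedRegs.set_bits()) {
        (void)Reg;                            // visits 29, then 30
        ++N;
      }
      return N;                               // == SavedRegs.count() == 2
    }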
Helper class for creating CFI instructions and inserting them into MIR.
The CalleeSavedInfo class tracks the information needed to locate where a callee saved register is in t...
A debug info location.
Definition DebugLoc.h:124
bool hasOptSize() const
Optimize this function for size (-Os) or minimum size (-Oz).
Definition Function.h:706
bool hasMinSize() const
Optimize this function for minimum size (-Oz).
Definition Function.h:703
CallingConv::ID getCallingConv() const
getCallingConv()/setCallingConv(CC) - These methods get and set the calling convention of this functio...
Definition Function.h:270
AttributeList getAttributes() const
Return the attribute list for this Function.
Definition Function.h:352
bool isVarArg() const
isVarArg - Return true if this function takes a variable number of arguments.
Definition Function.h:227
bool hasFnAttribute(Attribute::AttrKind Kind) const
Return true if the function has the attribute.
Definition Function.cpp:727
A set of physical registers with utility functions to track liveness when walking backward/forward th...
A set of register units used to track register liveness.
bool available(MCRegister Reg) const
Returns true if no part of physical register Reg is live.
LLVM_ABI void stepBackward(const MachineInstr &MI)
Updates liveness when stepping backwards over the instruction MI.
LLVM_ABI void addLiveOuts(const MachineBasicBlock &MBB)
Adds registers living out of block MBB.
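A minimal sketch (helper name and caller-provided setup are assumed) of the backward liveness-tracking pattern these LiveRegUnits members support:

    #include "llvm/ADT/STLExtras.h"
    #include "llvm/CodeGen/LiveRegUnits.h"
    #include "llvm/CodeGen/MachineBasicBlock.h"
    #include "llvm/CodeGen/TargetRegisterInfo.h"

    // Returns true if no part of Reg is live at the top of MBB.
    static bool isRegFreeAtBlockTop(const llvm::MachineBasicBlock &MBB,
                                    const llvm::TargetRegisterInfo &TRI,
                                    llvm::MCRegister Reg) {
      llvm::LiveRegUnits Used(TRI);
      Used.addLiveOuts(MBB);                  // seed with registers live out of MBB
      for (const llvm::MachineInstr &MI : llvm::reverse(MBB))
        Used.stepBackward(MI);                // step over each instruction, bottom-up
      return Used.available(Reg);
    }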
bool usesWindowsCFI() const
Definition MCAsmInfo.h:652
Wrapper class representing physical registers. Should be passed by value.
Definition MCRegister.h:33
LLVM_ABI void transferSuccessorsAndUpdatePHIs(MachineBasicBlock *FromMBB)
Transfers all the successors, as in transferSuccessors, and updates PHI operands in the successor bloc...
LLVM_ABI iterator getFirstTerminator()
Returns an iterator to the first terminator instruction of this basic block.
LLVM_ABI void addSuccessor(MachineBasicBlock *Succ, BranchProbability Prob=BranchProbability::getUnknown())
Add Succ as a successor of this MachineBasicBlock.
const MachineFunction * getParent() const
Return the MachineFunction containing this basic block.
LLVM_ABI instr_iterator erase(instr_iterator I)
Remove an instruction from the instruction list and delete it.
reverse_iterator rbegin()
void splice(iterator Where, MachineBasicBlock *Other, iterator From)
Take an instruction from MBB 'Other' at the position From, and insert it into this MBB right before '...
MachineInstrBundleIterator< MachineInstr > iterator
The MachineFrameInfo class represents an abstract stack frame until prolog/epilog code is inserted.
LLVM_ABI int CreateFixedObject(uint64_t Size, int64_t SPOffset, bool IsImmutable, bool isAliased=false)
Create a new object at a fixed location on the stack.
bool hasVarSizedObjects() const
This method may be called any time after instruction selection is complete to determine if the stack ...
const AllocaInst * getObjectAllocation(int ObjectIdx) const
Return the underlying Alloca of the specified stack object if it exists.
LLVM_ABI int CreateStackObject(uint64_t Size, Align Alignment, bool isSpillSlot, const AllocaInst *Alloca=nullptr, uint8_t ID=0)
Create a new statically sized stack object, returning a nonnegative identifier to represent it.
bool hasCalls() const
Return true if the current function has any function calls.
bool isFrameAddressTaken() const
This method may be called any time after instruction selection is complete to determine if there is a...
Align getMaxAlign() const
Return the alignment in bytes that this function must be aligned to, which is greater than the defaul...
void setObjectOffset(int ObjectIdx, int64_t SPOffset)
Set the stack frame offset of the specified object.
uint64_t getMaxCallFrameSize() const
Return the maximum size of a call frame that must be allocated for an outgoing function call.
bool hasPatchPoint() const
This method may be called any time after instruction selection is complete to determine if there is a...
int getStackProtectorIndex() const
Return the index for the stack protector object.
LLVM_ABI int CreateSpillStackObject(uint64_t Size, Align Alignment)
Create a new statically sized stack object that represents a spill slot, returning a nonnegative iden...
LLVM_ABI uint64_t estimateStackSize(const MachineFunction &MF) const
Estimate and return the size of the stack frame.
void setStackID(int ObjectIdx, uint8_t ID)
bool isCalleeSavedInfoValid() const
Has the callee saved info been calculated yet?
Align getObjectAlign(int ObjectIdx) const
Return the alignment of the specified stack object.
int64_t getObjectSize(int ObjectIdx) const
Return the size of the specified object.
bool isMaxCallFrameSizeComputed() const
bool hasStackMap() const
This method may be called any time after instruction selection is complete to determine if there is a...
const std::vector< CalleeSavedInfo > & getCalleeSavedInfo() const
Returns a reference to the callee saved info vector for the current function.
unsigned getNumObjects() const
Return the number of objects.
int getObjectIndexEnd() const
Return one past the maximum frame object index.
bool hasStackProtectorIndex() const
bool hasStackObjects() const
Return true if there are any stack objects in this function.
uint8_t getStackID(int ObjectIdx) const
unsigned getNumFixedObjects() const
Return the number of fixed objects.
int64_t getObjectOffset(int ObjectIdx) const
Return the assigned stack offset of the specified object from the incoming stack pointer.
int getObjectIndexBegin() const
Return the minimum frame object index.
void setObjectAlignment(int ObjectIdx, Align Alignment)
setObjectAlignment - Change the alignment of the specified stack object.
bool isDeadObjectIndex(int ObjectIdx) const
Returns true if the specified index corresponds to a dead object.
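A hedged sketch of the MachineFrameInfo calls above: create a spill slot and tag it with a scalable stack ID (the 16-byte size/alignment and the choice of ID are illustrative only):

    #include "llvm/CodeGen/MachineFrameInfo.h"
    #include "llvm/CodeGen/TargetFrameLowering.h" // TargetStackID

    // Create a 16-byte, 16-byte-aligned spill slot and mark it as holding
    // scalable-vector data so it is allocated in the SVE area of the frame.
    static int createScalableSpillSlot(llvm::MachineFrameInfo &MFI) {
      int FI = MFI.CreateSpillStackObject(16, llvm::Align(16));
      MFI.setStackID(FI, llvm::TargetStackID::ScalableVector);
      return FI;
    }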
const WinEHFuncInfo * getWinEHFuncInfo() const
getWinEHFuncInfo - Return information about how the current function uses Windows exception handling.
const TargetSubtargetInfo & getSubtarget() const
getSubtarget - Return the subtarget for which this machine code is being compiled.
MachineMemOperand * getMachineMemOperand(MachinePointerInfo PtrInfo, MachineMemOperand::Flags f, LLT MemTy, Align base_alignment, const AAMDNodes &AAInfo=AAMDNodes(), const MDNode *Ranges=nullptr, SyncScope::ID SSID=SyncScope::System, AtomicOrdering Ordering=AtomicOrdering::NotAtomic, AtomicOrdering FailureOrdering=AtomicOrdering::NotAtomic)
getMachineMemOperand - Allocate a new MachineMemOperand.
MachineFrameInfo & getFrameInfo()
getFrameInfo - Return the frame info object for the current function.
MachineRegisterInfo & getRegInfo()
getRegInfo - Return information about the registers currently in use.
Function & getFunction()
Return the LLVM function that this machine code represents.
BasicBlockListType::iterator iterator
Ty * getInfo()
getInfo - Keep track of various per-function pieces of information for backends that would like to do...
const MachineBasicBlock & front() const
MachineBasicBlock * CreateMachineBasicBlock(const BasicBlock *BB=nullptr, std::optional< UniqueBBID > BBID=std::nullopt)
CreateMachineBasicBlock - Allocate a new MachineBasicBlock.
void insert(iterator MBBI, MachineBasicBlock *MBB)
const TargetMachine & getTarget() const
getTarget - Return the target machine this machine code is compiled with
const MachineInstrBuilder & setMemRefs(ArrayRef< MachineMemOperand * > MMOs) const
const MachineInstrBuilder & addExternalSymbol(const char *FnName, unsigned TargetFlags=0) const
const MachineInstrBuilder & setMIFlag(MachineInstr::MIFlag Flag) const
const MachineInstrBuilder & addImm(int64_t Val) const
Add a new immediate operand.
const MachineInstrBuilder & add(const MachineOperand &MO) const
const MachineInstrBuilder & addFrameIndex(int Idx) const
const MachineInstrBuilder & addRegMask(const uint32_t *Mask) const
const MachineInstrBuilder & addReg(Register RegNo, unsigned flags=0, unsigned SubReg=0) const
Add a new virtual register operand.
const MachineInstrBuilder & addMBB(MachineBasicBlock *MBB, unsigned TargetFlags=0) const
const MachineInstrBuilder & setMIFlags(unsigned Flags) const
const MachineInstrBuilder & addMemOperand(MachineMemOperand *MMO) const
MachineInstr * getInstr() const
If conversion operators fail, use this method to get the MachineInstr explicitly.
const MachineInstrBuilder & addDef(Register RegNo, unsigned Flags=0, unsigned SubReg=0) const
Add a virtual register definition operand.
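A minimal sketch of the MachineInstrBuilder chaining interface above; the opcode is passed in rather than hard-coding an AArch64 instruction, and the operand shape assumes a simple reg = op(reg, imm) instruction:

    #include "llvm/CodeGen/MachineInstrBuilder.h"
    #include "llvm/CodeGen/TargetInstrInfo.h"

    // Insert "Dst = Opcode Src, Imm" before MBBI and mark it as frame-setup code.
    static void emitRegImmOp(llvm::MachineBasicBlock &MBB,
                             llvm::MachineBasicBlock::iterator MBBI,
                             const llvm::DebugLoc &DL,
                             const llvm::TargetInstrInfo &TII, unsigned Opcode,
                             llvm::Register Dst, llvm::Register Src, int64_t Imm) {
      llvm::BuildMI(MBB, MBBI, DL, TII.get(Opcode), Dst)
          .addReg(Src)
          .addImm(Imm)
          .setMIFlag(llvm::MachineInstr::FrameSetup);
    }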
Representation of each machine instruction.
void setFlags(unsigned flags)
bool getFlag(MIFlag Flag) const
Return whether an MI flag is set.
LLVM_ABI void eraseFromParent()
Unlink 'this' from the containing basic block and delete it.
const MachineOperand & getOperand(unsigned i) const
uint32_t getFlags() const
Return the MI flags bitvector.
LLVM_ABI void moveBefore(MachineInstr *MovePos)
Move the instruction before MovePos.
A description of a memory reference used in the backend.
const PseudoSourceValue * getPseudoValue() const
@ MOLoad
The memory access reads data.
@ MOStore
The memory access writes data.
const Value * getValue() const
Return the base address of the memory access.
MachineOperand class - Representation of each machine instruction operand.
void setImm(int64_t immVal)
int64_t getImm() const
bool isSymbol() const
isSymbol - Tests if this is a MO_ExternalSymbol operand.
const char * getSymbolName() const
Register getReg() const
getReg - Returns the register number.
bool isFI() const
isFI - Tests if this is a MO_FrameIndex operand.
LLVM_ABI void emit(DiagnosticInfoOptimizationBase &OptDiag)
Emit an optimization remark.
MachineRegisterInfo - Keep track of information for virtual and physical registers,...
LLVM_ABI Register createVirtualRegister(const TargetRegisterClass *RegClass, StringRef Name="")
createVirtualRegister - Create and return a new virtual register in the function with the specified r...
LLVM_ABI bool isLiveIn(Register Reg) const
LLVM_ABI const MCPhysReg * getCalleeSavedRegs() const
Returns list of callee saved registers.
LLVM_ABI bool isPhysRegUsed(MCRegister PhysReg, bool SkipRegMaskTest=false) const
Return true if the specified register is modified or read in this function.
MutableArrayRef - Represent a mutable reference to an array (0 or more elements consecutively in memo...
Definition ArrayRef.h:303
Pass interface - Implemented by all 'passes'.
Definition Pass.h:99
void enterBasicBlockEnd(MachineBasicBlock &MBB)
Start tracking liveness from the end of basic block MBB.
Register FindUnusedReg(const TargetRegisterClass *RC) const
Find an unused register of the specified register class.
void backward()
Update internal register state and move MBB iterator backwards.
void addScavengingFrameIndex(int FI)
Add a scavenging frame index.
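A hedged sketch of the register-scavenging entries above (the helper name is hypothetical; real callers also handle the case where nothing is free, typically via a spill slot registered with addScavengingFrameIndex):

    #include "llvm/CodeGen/RegisterScavenging.h"

    // Find a register of class RC that is unused at the end of MBB; returns a
    // null Register if every candidate is live.
    static llvm::Register findScratchAtBlockEnd(llvm::MachineBasicBlock &MBB,
                                                const llvm::TargetRegisterClass *RC) {
      llvm::RegScavenger RS;
      RS.enterBasicBlockEnd(MBB);   // start tracking liveness from the block end
      return RS.FindUnusedReg(RC);
    }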
Wrapper class representing virtual and physical registers.
Definition Register.h:19
SMEAttrs is a utility class to parse the SME ACLE attributes on functions.
bool hasStreamingInterface() const
bool hasNonStreamingInterfaceAndBody() const
bool hasStreamingBody() const
bool insert(const value_type &X)
Insert a new element into the SetVector.
Definition SetVector.h:168
A SetVector that performs no allocations if smaller than a certain size.
Definition SetVector.h:356
This class consists of common code factored out of the SmallVector class to reduce code duplication b...
reference emplace_back(ArgTypes &&... Args)
void append(ItTy in_start, ItTy in_end)
Add the specified range to the end of the SmallVector.
void push_back(const T &Elt)
This is a 'vector' (really, a variable-sized array), optimized for the case when the array is small.
StackOffset holds a fixed and a scalable offset in bytes.
Definition TypeSize.h:31
int64_t getFixed() const
Returns the fixed component of the stack.
Definition TypeSize.h:47
int64_t getScalable() const
Returns the scalable component of the stack.
Definition TypeSize.h:50
static StackOffset get(int64_t Fixed, int64_t Scalable)
Definition TypeSize.h:42
static StackOffset getScalable(int64_t Scalable)
Definition TypeSize.h:41
static StackOffset getFixed(int64_t Fixed)
Definition TypeSize.h:40
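StackOffset keeps the fixed and scalable byte counts separate so SVE frame sizes can be reasoned about symbolically; a tiny illustration (values are arbitrary):

    #include "llvm/Support/TypeSize.h"
    #include <cassert>

    static llvm::StackOffset combinedOffset() {
      llvm::StackOffset Fixed = llvm::StackOffset::getFixed(16);       // 16 bytes
      llvm::StackOffset Scalable = llvm::StackOffset::getScalable(32); // 32 * vscale bytes
      llvm::StackOffset Both = Fixed + Scalable;  // components add independently
      assert(Both.getFixed() == 16 && Both.getScalable() == 32);
      return llvm::StackOffset::get(Both.getFixed(), Both.getScalable());
    }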
StringRef - Represent a constant reference to a string, i.e.
Definition StringRef.h:55
bool hasFP(const MachineFunction &MF) const
hasFP - Return true if the specified function should have a dedicated frame pointer register.
virtual void determineCalleeSaves(MachineFunction &MF, BitVector &SavedRegs, RegScavenger *RS=nullptr) const
This method determines which of the registers reported by TargetRegisterInfo::getCalleeSavedRegs() sh...
int getOffsetOfLocalArea() const
getOffsetOfLocalArea - This method returns the offset of the local area from the stack pointer on ent...
Align getStackAlign() const
getStackAlignment - This method returns the number of bytes to which the stack pointer must be aligne...
StackDirection getStackGrowthDirection() const
getStackGrowthDirection - Return the direction the stack grows
virtual bool enableCFIFixup(const MachineFunction &MF) const
Returns true if we may need to fix the unwind information for the function.
TargetInstrInfo - Interface to description of machine instruction set.
CallingConv::ID getLibcallCallingConv(RTLIB::Libcall Call) const
Get the CallingConv that should be used for the specified libcall.
const char * getLibcallName(RTLIB::Libcall Call) const
Get the libcall routine name for the specified libcall.
This class defines information used to lower LLVM code to legal SelectionDAG operators that the targe...
Primary interface to the complete machine description for the target machine.
TargetOptions Options
const MCAsmInfo * getMCAsmInfo() const
Return target specific asm information.
LLVM_ABI bool DisableFramePointerElim(const MachineFunction &MF) const
DisableFramePointerElim - This returns true if frame pointer elimination optimization should be disab...
TargetRegisterInfo base class - We assume that the target defines a static array of TargetRegisterDes...
bool hasStackRealignment(const MachineFunction &MF) const
True if stack realignment is required and still possible.
TargetSubtargetInfo - Generic base class for all target subtargets.
virtual const TargetInstrInfo * getInstrInfo() const
virtual const TargetRegisterInfo * getRegisterInfo() const =0
Return the target's register information.
virtual const TargetLowering * getTargetLowering() const
Triple - Helper class for working with autoconf configuration names.
Definition Triple.h:47
static constexpr TypeSize getFixed(ScalarTy ExactSize)
Definition TypeSize.h:343
constexpr ScalarTy getFixedValue() const
Definition TypeSize.h:200
This class implements an extremely fast bulk output stream that can only output to a stream.
Definition raw_ostream.h:53
#define llvm_unreachable(msg)
Marks that the current location is not supposed to be reachable.
static unsigned getShiftValue(unsigned Imm)
getShiftValue - Extract the shift value.
static unsigned getArithExtendImm(AArch64_AM::ShiftExtendType ET, unsigned Imm)
getArithExtendImm - Encode the extend type and shift amount for an arithmetic instruction: imm: 3-bit...
static uint64_t encodeLogicalImmediate(uint64_t imm, unsigned regSize)
encodeLogicalImmediate - Return the encoded immediate value for a logical immediate instruction of th...
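A small hedged example of the logical-immediate helper above; AArch64AddressingModes.h is a target-internal header, so this only builds inside the AArch64 backend, and real code must check isLogicalImmediate before encoding (0xFF, a run of eight ones, is encodable):

    #include "MCTargetDesc/AArch64AddressingModes.h"
    #include <cassert>
    #include <cstdint>

    static uint64_t encodedByteMask() {
      assert(llvm::AArch64_AM::isLogicalImmediate(0xFF, 64));
      return llvm::AArch64_AM::encodeLogicalImmediate(0xFF, 64);
    }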
const unsigned StackProbeMaxLoopUnroll
Maximum number of iterations to unroll for a constant size probing loop.
const unsigned StackProbeMaxUnprobedStack
Maximum allowed number of unprobed bytes above SP at an ABI boundary.
constexpr char Align[]
Key for Kernel::Arg::Metadata::mAlign.
constexpr char Attrs[]
Key for Kernel::Metadata::mAttrs.
unsigned ID
LLVM IR allows the use of arbitrary numbers as calling convention identifiers.
Definition CallingConv.h:24
@ AArch64_SVE_VectorCall
Used between AArch64 SVE functions.
@ PreserveMost
Used for runtime calls that preserve most registers.
Definition CallingConv.h:63
@ CXX_FAST_TLS
Used for access functions.
Definition CallingConv.h:72
@ GHC
Used by the Glasgow Haskell Compiler (GHC).
Definition CallingConv.h:50
@ PreserveAll
Used for runtime calls that preserve (almost) all registers.
Definition CallingConv.h:66
@ PreserveNone
Used for runtime calls that preserve no general registers.
Definition CallingConv.h:90
@ Win64
The C convention as implemented on Windows/x86-64 and AArch64.
@ SwiftTail
This follows the Swift calling convention in how arguments are passed but guarantees tail calls will ...
Definition CallingConv.h:87
@ Implicit
Not emitted register (e.g. carry, or temporary result).
@ Define
Register definition.
@ Kill
The last use of a register.
initializer< Ty > init(const Ty &Val)
NodeAddr< InstrNode * > Instr
Definition RDFGraph.h:389
BaseReg
Stack frame base register. Bit 0 of FREInfo.Info.
Definition SFrame.h:77
This is an optimization pass for GlobalISel generic memory operations.
@ Offset
Definition DWP.cpp:477
void stable_sort(R &&Range)
Definition STLExtras.h:2040
MachineInstrBuilder BuildMI(MachineFunction &MF, const MIMetadata &MIMD, const MCInstrDesc &MCID)
Builder interface. Specify how to create the initial instruction itself.
int isAArch64FrameOffsetLegal(const MachineInstr &MI, StackOffset &Offset, bool *OutUseUnscaledOp=nullptr, unsigned *OutUnscaledOp=nullptr, int64_t *EmittableOffset=nullptr)
Check if the Offset is a valid frame offset for MI.
detail::scope_exit< std::decay_t< Callable > > make_scope_exit(Callable &&F)
Definition ScopeExit.h:59
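make_scope_exit is the usual way to keep cleanup on every exit path of a scope; a minimal illustration:

    #include "llvm/ADT/ScopeExit.h"
    #include "llvm/Support/raw_ostream.h"

    static void doWorkWithCleanup(bool EarlyExit) {
      auto Cleanup = llvm::make_scope_exit([] { llvm::errs() << "cleanup ran\n"; });
      if (EarlyExit)
        return;   // the callable still runs here, when Cleanup is destroyed
      // ... more work; the callable also runs on normal scope exit.
    }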
decltype(auto) dyn_cast(const From &Val)
dyn_cast<X> - Return the argument parameter cast to the specified type.
Definition Casting.h:649
iterator_range< early_inc_iterator_impl< detail::IterOfRange< RangeT > > > make_early_inc_range(RangeT &&Range)
Make a range that does early increment to allow mutation of the underlying range without disrupting i...
Definition STLExtras.h:626
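A hedged sketch of make_early_inc_range over a MachineBasicBlock: the iterator is advanced before the loop body runs, so the current instruction can be erased safely (the isDebugInstr predicate is only an example):

    #include "llvm/ADT/STLExtras.h"
    #include "llvm/CodeGen/MachineBasicBlock.h"

    static void eraseDebugInstrs(llvm::MachineBasicBlock &MBB) {
      for (llvm::MachineInstr &MI : llvm::make_early_inc_range(MBB))
        if (MI.isDebugInstr())
          MI.eraseFromParent();   // safe: the loop already holds the next iterator
    }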
@ AArch64FrameOffsetCannotUpdate
Offset cannot apply.
auto dyn_cast_or_null(const Y &Val)
Definition Casting.h:759
bool any_of(R &&range, UnaryPredicate P)
Provide wrappers to std::any_of which take ranges instead of having to pass begin/end explicitly.
Definition STLExtras.h:1714
auto formatv(bool Validate, const char *Fmt, Ts &&...Vals)
auto reverse(ContainerTy &&C)
Definition STLExtras.h:400
void sort(IteratorTy Start, IteratorTy End)
Definition STLExtras.h:1632
LLVM_ABI raw_ostream & dbgs()
dbgs() - This returns a reference to a raw_ostream for debugging messages.
Definition Debug.cpp:207
void emitFrameOffset(MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI, const DebugLoc &DL, unsigned DestReg, unsigned SrcReg, StackOffset Offset, const TargetInstrInfo *TII, MachineInstr::MIFlag=MachineInstr::NoFlags, bool SetNZCV=false, bool NeedsWinCFI=false, bool *HasWinCFI=nullptr, bool EmitCFAOffset=false, StackOffset InitialOffset={}, unsigned FrameReg=AArch64::SP)
emitFrameOffset - Emit instructions as needed to set DestReg to SrcReg plus Offset.
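A hedged usage sketch of the emitFrameOffset declaration above; it is declared in the target-internal AArch64InstrInfo.h, so this only builds inside the AArch64 backend, and the 32-byte allocation is illustrative:

    #include "AArch64InstrInfo.h"

    // Materialize "sp := sp - 32" before MBBI and tag it as frame-setup code.
    static void allocateFixedStack(llvm::MachineBasicBlock &MBB,
                                   llvm::MachineBasicBlock::iterator MBBI,
                                   const llvm::DebugLoc &DL,
                                   const llvm::TargetInstrInfo *TII) {
      llvm::emitFrameOffset(MBB, MBBI, DL, llvm::AArch64::SP, llvm::AArch64::SP,
                            llvm::StackOffset::getFixed(-32), TII,
                            llvm::MachineInstr::FrameSetup);
    }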
LLVM_ABI void report_fatal_error(Error Err, bool gen_crash_diag=true)
Definition Error.cpp:167
FunctionAddr VTableAddr Count
Definition InstrProf.h:139
class LLVM_GSL_OWNER SmallVector
Forward declaration of SmallVector so that calculateSmallVectorDefaultInlinedElements can reference s...
@ Success
The lock was released successfully.
@ First
Helpers to iterate all locations in the MemoryEffectsBase class.
Definition ModRef.h:71
unsigned getDefRegState(bool B)
unsigned getKillRegState(bool B)
uint16_t MCPhysReg
An unsigned integer type large enough to represent all physical registers, but not necessarily virtua...
Definition MCRegister.h:21
uint64_t alignTo(uint64_t Size, Align A)
Returns a multiple of A needed to store Size bytes.
Definition Alignment.h:155
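alignTo with an Align value is how frame and object sizes get rounded up to the stack alignment; a worked example:

    #include "llvm/Support/Alignment.h"
    #include <cstdint>

    static uint64_t roundedFrameSize(uint64_t Bytes) {
      llvm::Align StackAlign(16);
      // e.g. alignTo(40, Align(16)) == 48, the next multiple of 16 holding 40 bytes.
      return llvm::alignTo(Bytes, StackAlign);
    }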
raw_ostream & operator<<(raw_ostream &OS, const APFixedPoint &FX)
auto find_if(R &&Range, UnaryPredicate P)
Provide wrappers to std::find_if which take ranges instead of having to pass begin/end explicitly.
Definition STLExtras.h:1740
void erase_if(Container &C, UnaryPredicate P)
Provide a container algorithm similar to C++ Library Fundamentals v2's erase_if which is equivalent t...
Definition STLExtras.h:2102
LLVM_ABI const Value * getUnderlyingObject(const Value *V, unsigned MaxLookup=MaxLookupSearchDepth)
This method strips off any GEP address adjustments, pointer casts or llvm.threadlocal....
void fullyRecomputeLiveIns(ArrayRef< MachineBasicBlock * > MBBs)
Convenience function for recomputing live-in's for a set of MBBs until the computation converges.
LLVM_ABI Printable printReg(Register Reg, const TargetRegisterInfo *TRI=nullptr, unsigned SubIdx=0, const MachineRegisterInfo *MRI=nullptr)
Prints virtual and physical registers with or without a TRI instance.
void swap(llvm::BitVector &LHS, llvm::BitVector &RHS)
Implement std::swap in terms of BitVector swap.
Definition BitVector.h:853
Emergency stack slots for expanding SPILL_PPR_TO_ZPR_SLOT_PSEUDO and FILL_PPR_FROM_ZPR_SLOT_PSEUDO.
std::optional< int > PPRSpillFI
std::optional< int > GPRSpillFI
std::optional< int > ZPRSpillFI
Registers available for scavenging (ZPR, PPR3b, GPR).
RAII helper class for scavenging or spilling a register.
ScopedScavengeOrSpill(ScopedScavengeOrSpill &&)=delete
ScopedScavengeOrSpill(MachineFunction &MF, MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI, Register SpillCandidate, const TargetRegisterClass &RC, LiveRegUnits const &UsedRegs, BitVector const &AllocatableRegs, std::optional< int > *MaybeSpillFI, Register PreferredReg=AArch64::NoRegister)
Register freeRegister() const
Returns the free register (found from scavenging or spilling a register).
ScopedScavengeOrSpill(const ScopedScavengeOrSpill &)=delete
bool operator<(const StackAccess &Rhs) const
void print(raw_ostream &OS) const
int64_t start() const
std::string getTypeString() const
int64_t end() const
This struct is a compact representation of a valid (non-zero power of two) alignment.
Definition Alignment.h:39
uint64_t value() const
This is a hole in the type system and should not be abused.
Definition Alignment.h:85
Pair of physical register and lane mask.
static LLVM_ABI MachinePointerInfo getFixedStack(MachineFunction &MF, int FI, int64_t Offset=0)
Return a MachinePointerInfo record that refers to the specified FrameIndex.
SmallVector< WinEHTryBlockMapEntry, 4 > TryBlockMap
SmallVector< WinEHHandlerType, 1 > HandlerArray