1//===- AArch64FrameLowering.cpp - AArch64 Frame Lowering -------*- C++ -*-====//
2//
3// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
4// See https://llvm.org/LICENSE.txt for license information.
5// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
6//
7//===----------------------------------------------------------------------===//
8//
9// This file contains the AArch64 implementation of TargetFrameLowering class.
10//
11// On AArch64, stack frames are structured as follows:
12//
13// The stack grows downward.
14//
15// All of the individual frame areas on the frame below are optional, i.e. it's
16// possible to create a function such that a particular area isn't present
17// in the frame.
18//
19// At function entry, the "frame" looks as follows:
20//
21// | | Higher address
22// |-----------------------------------|
23// | |
24// | arguments passed on the stack |
25// | |
26// |-----------------------------------| <- sp
27// | | Lower address
28//
29//
30// After the prologue has run, the frame has the following general structure.
31// Note that this doesn't depict the case where a red-zone is used. Also,
32// technically the last frame area (VLAs) isn't created until the main
33// function body, after the prologue has run. However, it's depicted here
34// for completeness.
35//
36// | | Higher address
37// |-----------------------------------|
38// | |
39// | arguments passed on the stack |
40// | |
41// |-----------------------------------|
42// | |
43// | (Win64 only) varargs from reg |
44// | |
45// |-----------------------------------|
46// | |
47// | (Win64 only) callee-saved SVE reg |
48// | |
49// |-----------------------------------|
50// | |
51// | callee-saved gpr registers | <--.
52// | | | On Darwin platforms these
53// |- - - - - - - - - - - - - - - - - -| | callee saves are swapped,
54// | prev_lr | | (frame record first)
55// | prev_fp | <--'
56// | async context if needed |
57// | (a.k.a. "frame record") |
58// |-----------------------------------| <- fp(=x29)
59// | <hazard padding> |
60// |-----------------------------------|
61// | |
62// | callee-saved fp/simd/SVE regs |
63// | |
64// |-----------------------------------|
65// | |
66// | SVE stack objects |
67// | |
68// |-----------------------------------|
69// |.empty.space.to.make.part.below....|
70// |.aligned.in.case.it.needs.more.than| (size of this area is unknown at
71// |.the.standard.16-byte.alignment....| compile time; if present)
72// |-----------------------------------|
73// | local variables of fixed size |
74// | including spill slots |
75// | <FPR> |
76// | <hazard padding> |
77// | <GPR> |
78// |-----------------------------------| <- bp(not defined by ABI,
79// |.variable-sized.local.variables....| LLVM chooses X19)
80// |.(VLAs)............................| (size of this area is unknown at
81// |...................................| compile time)
82// |-----------------------------------| <- sp
83// | | Lower address
84//
85//
86// To access the data in a frame, a constant offset from one of the pointers
87// (fp, bp, sp) to the data must be computable at compile time. The sizes
88// of the areas with a dotted background cannot be computed at compile time
89// if those areas are present, so all three of fp, bp and sp must be set up
90// in order to access all contents in the frame areas,
91// assuming all of the frame areas are non-empty.
92//
93// For most functions, some of the frame areas are empty. For those functions,
94// it may not be necessary to set up fp or bp:
95// * A base pointer is definitely needed when there are both VLAs and local
96// variables with more-than-default alignment requirements.
97// * A frame pointer is definitely needed when there are local variables with
98// more-than-default alignment requirements.
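//
// For example, a function like the following (C, purely illustrative) would
// need all three pointers:
//
//   void g(int *, int *);
//   void f(int n) {
//     _Alignas(64) int buf[16];  // over-aligned local  -> frame pointer
//     int vla[n];                // variable-sized local -> base pointer
//     g(buf, vla);               // outgoing arguments placed below the VLAs
//   }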
99//
100// For Darwin platforms the frame-record (fp, lr) is stored at the top of the
101// callee-saved area, since the unwind encoding does not allow for encoding
102// this dynamically and existing tools depend on this layout. For other
103// platforms, the frame-record is stored at the bottom of the (gpr) callee-saved
104// area to allow SVE stack objects (allocated directly below the callee-saves,
105// if available) to be accessed directly from the framepointer.
106// The SVE spill/fill instructions have VL-scaled addressing modes such
107// as:
108// ldr z8, [fp, #-7 mul vl]
109// For SVE the size of the vector length (VL) is not known at compile-time, so
110// '#-7 mul vl' is an offset that can only be evaluated at runtime. With this
111// layout, we don't need to add an unscaled offset to the framepointer before
112// accessing the SVE object in the frame.
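//
// For example, with this layout a callee-saved SVE register can be reloaded
// with a single instruction:
//
//   ldr z8, [fp, #-7 mul vl]
//
// whereas if a fixed-size region sat between fp and the SVE area, an extra
// unscaled address computation would be needed first (illustrative only):
//
//   sub x8, fp, #16
//   ldr z8, [x8, #-7 mul vl]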
113//
114// In some cases when a base pointer is not strictly needed, it is generated
115// anyway when offsets from the frame pointer to access local variables become
116// so large that the offset can't be encoded in the immediate fields of loads
117// or stores.
118//
119// Outgoing function arguments must be at the bottom of the stack frame when
120// calling another function. If we do not have variable-sized stack objects, we
121// can allocate a "reserved call frame" area at the bottom of the local
122// variable area, large enough for all outgoing calls. If we do have VLAs, then
123// the stack pointer must be decremented and incremented around each call to
124// make space for the arguments below the VLAs.
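//
// For instance, with VLAs present each call site is bracketed by its own SP
// adjustment (illustrative only):
//
//   sub sp, sp, #32          // make room for outgoing stack arguments
//   str x8, [sp]             // pass an argument on the stack
//   bl  callee
//   add sp, sp, #32          // release the outgoing argument area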
125//
126// FIXME: also explain the redzone concept.
127//
128// About stack hazards: Under some SME contexts, a coprocessor with its own
129// separate cache can be used for FP operations. This can create hazards if the CPU
130// and the SME unit try to access the same area of memory, including if the
131// access is to an area of the stack. To try to alleviate this we attempt to
132// introduce extra padding into the stack frame between FP and GPR accesses,
133// controlled by the aarch64-stack-hazard-size option. Without changing the
134// layout of the stack frame in the diagram above, a stack object of size
135// aarch64-stack-hazard-size is added between GPR and FPR CSRs. Another is added
136// to the stack objects section, and stack objects are sorted so that FPR >
137// Hazard padding slot > GPRs (where possible). Unfortunately some things are
138// not handled well (VLA area, arguments on the stack, objects with both GPR and
139// FPR accesses), but if those are controlled by the user then the entire stack
140// frame becomes GPR at the start/end with FPR in the middle, surrounded by
141// Hazard padding.
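//
// For example, passing the option named above to the backend (e.g.
// "-mllvm -aarch64-stack-hazard-size=1024" when invoking clang) requests
// 1024 bytes of hazard padding at each of the boundaries described above.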
142//
143// An example of the prologue:
144//
145// .globl __foo
146// .align 2
147// __foo:
148// Ltmp0:
149// .cfi_startproc
150// .cfi_personality 155, ___gxx_personality_v0
151// Leh_func_begin:
152// .cfi_lsda 16, Lexception33
153//
154// stp xa,bx, [sp, #-offset]!
155// ...
156// stp x28, x27, [sp, #offset-32]
157// stp fp, lr, [sp, #offset-16]
158// add fp, sp, #offset - 16
159// sub sp, sp, #1360
160//
161// The Stack:
162// +-------------------------------------------+
163// 10000 | ........ | ........ | ........ | ........ |
164// 10004 | ........ | ........ | ........ | ........ |
165// +-------------------------------------------+
166// 10008 | ........ | ........ | ........ | ........ |
167// 1000c | ........ | ........ | ........ | ........ |
168// +===========================================+
169// 10010 | X28 Register |
170// 10014 | X28 Register |
171// +-------------------------------------------+
172// 10018 | X27 Register |
173// 1001c | X27 Register |
174// +===========================================+
175// 10020 | Frame Pointer |
176// 10024 | Frame Pointer |
177// +-------------------------------------------+
178// 10028 | Link Register |
179// 1002c | Link Register |
180// +===========================================+
181// 10030 | ........ | ........ | ........ | ........ |
182// 10034 | ........ | ........ | ........ | ........ |
183// +-------------------------------------------+
184// 10038 | ........ | ........ | ........ | ........ |
185// 1003c | ........ | ........ | ........ | ........ |
186// +-------------------------------------------+
187//
188// [sp] = 10030 :: >>initial value<<
189// sp = 10020 :: stp fp, lr, [sp, #-16]!
190// fp = sp == 10020 :: mov fp, sp
191// [sp] == 10020 :: stp x28, x27, [sp, #-16]!
192// sp == 10010 :: >>final value<<
193//
194// The frame pointer (w29) points to address 10020. If we use an offset of
195// '16' from 'w29', we get the CFI offsets of -8 for w30, -16 for w29, -24
196// for w27, and -32 for w28:
197//
198// Ltmp1:
199// .cfi_def_cfa w29, 16
200// Ltmp2:
201// .cfi_offset w30, -8
202// Ltmp3:
203// .cfi_offset w29, -16
204// Ltmp4:
205// .cfi_offset w27, -24
206// Ltmp5:
207// .cfi_offset w28, -32
208//
209//===----------------------------------------------------------------------===//
210
211#include "AArch64FrameLowering.h"
212#include "AArch64InstrInfo.h"
215#include "AArch64RegisterInfo.h"
216#include "AArch64Subtarget.h"
220#include "llvm/ADT/ScopeExit.h"
221#include "llvm/ADT/SmallVector.h"
239#include "llvm/IR/Attributes.h"
240#include "llvm/IR/CallingConv.h"
241#include "llvm/IR/DataLayout.h"
242#include "llvm/IR/DebugLoc.h"
243#include "llvm/IR/Function.h"
244#include "llvm/MC/MCAsmInfo.h"
245#include "llvm/MC/MCDwarf.h"
247#include "llvm/Support/Debug.h"
254#include <cassert>
255#include <cstdint>
256#include <iterator>
257#include <optional>
258#include <vector>
259
260using namespace llvm;
261
262#define DEBUG_TYPE "frame-info"
263
264static cl::opt<bool> EnableRedZone("aarch64-redzone",
265 cl::desc("enable use of redzone on AArch64"),
266 cl::init(false), cl::Hidden);
267
269 "stack-tagging-merge-settag",
270 cl::desc("merge settag instruction in function epilog"), cl::init(true),
271 cl::Hidden);
272
273static cl::opt<bool> OrderFrameObjects("aarch64-order-frame-objects",
274 cl::desc("sort stack allocations"),
275 cl::init(true), cl::Hidden);
276
278 "homogeneous-prolog-epilog", cl::Hidden,
279 cl::desc("Emit homogeneous prologue and epilogue for the size "
280 "optimization (default = off)"));
281
282// Stack hazard size for analysis remarks. StackHazardSize takes precedence.
284 StackHazardRemarkSize("aarch64-stack-hazard-remark-size", cl::init(0),
285 cl::Hidden);
286// Whether to insert padding into non-streaming functions (for testing).
287static cl::opt<bool>
288 StackHazardInNonStreaming("aarch64-stack-hazard-in-non-streaming",
289 cl::init(false), cl::Hidden);
290
292 "aarch64-disable-multivector-spill-fill",
293 cl::desc("Disable use of LD/ST pairs for SME2 or SVE2p1"), cl::init(false),
294 cl::Hidden);
295
296/// Returns how much of the incoming argument stack area (in bytes) we should
297/// clean up in an epilogue. For the C calling convention this will be 0, for
298/// guaranteed tail call conventions it can be positive (a normal return or a
299/// tail call to a function that uses less stack space for arguments) or
300/// negative (for a tail call to a function that needs more stack space than us
301/// for arguments).
304 MachineBasicBlock::iterator MBBI = MBB.getLastNonDebugInstr();
306 bool IsTailCallReturn = (MBB.end() != MBBI)
308 : false;
309
310 int64_t ArgumentPopSize = 0;
311 if (IsTailCallReturn) {
312 MachineOperand &StackAdjust = MBBI->getOperand(1);
313
314 // For a tail-call in a callee-pops-arguments environment, some or all of
315// the stack may actually be in use for the call's arguments; this is
316 // calculated during LowerCall and consumed here...
317 ArgumentPopSize = StackAdjust.getImm();
318 } else {
319 // ... otherwise the amount to pop is *all* of the argument space,
320 // conveniently stored in the MachineFunctionInfo by
321 // LowerFormalArguments. This will, of course, be zero for the C calling
322 // convention.
323 ArgumentPopSize = AFI->getArgumentStackToRestore();
324 }
325
326 return ArgumentPopSize;
327}
328
330 MachineFunction &MF);
331
332// Conservatively returns true if the function is likely to have SVE vectors
333// on the stack. This function is safe to be called before callee-saves or
334// object offsets have been determined.
336 const MachineFunction &MF) {
337 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
338 if (AFI->isSVECC())
339 return true;
340
341 if (AFI->hasCalculatedStackSizeSVE())
342 return bool(AFL.getSVEStackSize(MF));
343
344 const MachineFrameInfo &MFI = MF.getFrameInfo();
345 for (int FI = MFI.getObjectIndexBegin(); FI < MFI.getObjectIndexEnd(); FI++) {
347 return true;
348 }
349
350 return false;
351}
352
353/// Returns true if a homogeneous prolog or epilog code can be emitted
354/// for the size optimization. If possible, a frame helper call is injected.
355/// When Exit block is given, this check is for epilog.
356bool AArch64FrameLowering::homogeneousPrologEpilog(
357 MachineFunction &MF, MachineBasicBlock *Exit) const {
358 if (!MF.getFunction().hasMinSize())
359 return false;
361 return false;
362 if (EnableRedZone)
363 return false;
364
365 // TODO: Windows is not supported yet.
366 if (needsWinCFI(MF))
367 return false;
368
369 // TODO: SVE is not supported yet.
370 if (isLikelyToHaveSVEStack(*this, MF))
371 return false;
372
373 // Bail on stack adjustment needed on return for simplicity.
374 const MachineFrameInfo &MFI = MF.getFrameInfo();
375 const TargetRegisterInfo *RegInfo = MF.getSubtarget().getRegisterInfo();
376 if (MFI.hasVarSizedObjects() || RegInfo->hasStackRealignment(MF))
377 return false;
378 if (Exit && getArgumentStackToRestore(MF, *Exit))
379 return false;
380
381 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
382 if (AFI->hasSwiftAsyncContext() || AFI->hasStreamingModeChanges())
383 return false;
384
385 // If there is an odd number of GPRs before LR and FP in the CSRs list,
386 // they will not be paired into one RegPairInfo, which is incompatible with
387 // the assumption made by the homogeneous prolog epilog pass.
388 const MCPhysReg *CSRegs = MF.getRegInfo().getCalleeSavedRegs();
389 unsigned NumGPRs = 0;
390 for (unsigned I = 0; CSRegs[I]; ++I) {
391 Register Reg = CSRegs[I];
392 if (Reg == AArch64::LR) {
393 assert(CSRegs[I + 1] == AArch64::FP);
394 if (NumGPRs % 2 != 0)
395 return false;
396 break;
397 }
398 if (AArch64::GPR64RegClass.contains(Reg))
399 ++NumGPRs;
400 }
401
402 return true;
403}
404
405/// Returns true if CSRs should be paired.
406bool AArch64FrameLowering::producePairRegisters(MachineFunction &MF) const {
407 return produceCompactUnwindFrame(*this, MF) || homogeneousPrologEpilog(MF);
408}
409
410/// This is the biggest offset to the stack pointer we can encode in aarch64
411/// instructions (without using a separate calculation and a temp register).
412/// Note that the exceptions here are vector stores/loads, which cannot encode any
413/// displacements (see estimateRSStackSizeLimit(), isAArch64FrameOffsetLegal()).
414static const unsigned DefaultSafeSPDisplacement = 255;
415
416/// Look at each instruction that references stack frames and return the stack
417/// size limit beyond which some of these instructions will require a scratch
418/// register during their expansion later.
420 // FIXME: For now, just conservatively guesstimate based on unscaled indexing
421 // range. We'll end up allocating an unnecessary spill slot a lot, but
422 // realistically that's not a big deal at this stage of the game.
423 for (MachineBasicBlock &MBB : MF) {
424 for (MachineInstr &MI : MBB) {
425 if (MI.isDebugInstr() || MI.isPseudo() ||
426 MI.getOpcode() == AArch64::ADDXri ||
427 MI.getOpcode() == AArch64::ADDSXri)
428 continue;
429
430 for (const MachineOperand &MO : MI.operands()) {
431 if (!MO.isFI())
432 continue;
433
435 if (isAArch64FrameOffsetLegal(MI, Offset, nullptr, nullptr, nullptr) ==
437 return 0;
438 }
439 }
440 }
442}
443
448
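/// Returns the size (in bytes) of the fixed-object area allocated next to the
/// incoming SP: the tail-call reserved stack plus, for Win64 non-funclet
/// frames, the GPR vararg save area, catch objects and the UnwindHelp slot
/// (see the body below).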
449unsigned
450AArch64FrameLowering::getFixedObjectSize(const MachineFunction &MF,
451 const AArch64FunctionInfo *AFI,
452 bool IsWin64, bool IsFunclet) const {
453 assert(AFI->getTailCallReservedStack() % 16 == 0 &&
454 "Tail call reserved stack must be aligned to 16 bytes");
455 if (!IsWin64 || IsFunclet) {
456 return AFI->getTailCallReservedStack();
457 } else {
458 if (AFI->getTailCallReservedStack() != 0 &&
459 !MF.getFunction().getAttributes().hasAttrSomewhere(
460 Attribute::SwiftAsync))
461 report_fatal_error("cannot generate ABI-changing tail call for Win64");
462 unsigned FixedObjectSize = AFI->getTailCallReservedStack();
463
464 // Var args are stored here in the primary function.
465 FixedObjectSize += AFI->getVarArgsGPRSize();
466
467 if (MF.hasEHFunclets()) {
468 // Catch objects are stored here in the primary function.
469 const MachineFrameInfo &MFI = MF.getFrameInfo();
470 const WinEHFuncInfo &EHInfo = *MF.getWinEHFuncInfo();
471 SmallSetVector<int, 8> CatchObjFrameIndices;
472 for (const WinEHTryBlockMapEntry &TBME : EHInfo.TryBlockMap) {
473 for (const WinEHHandlerType &H : TBME.HandlerArray) {
474 int FrameIndex = H.CatchObj.FrameIndex;
475 if ((FrameIndex != INT_MAX) &&
476 CatchObjFrameIndices.insert(FrameIndex)) {
477 FixedObjectSize = alignTo(FixedObjectSize,
478 MFI.getObjectAlign(FrameIndex).value()) +
479 MFI.getObjectSize(FrameIndex);
480 }
481 }
482 }
483 // To support EH funclets we allocate an UnwindHelp object
484 FixedObjectSize += 8;
485 }
486 return alignTo(FixedObjectSize, 16);
487 }
488}
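// A purely hypothetical example of the Win64 path above: with no tail-call
// reserved stack, 32 bytes of GPR varargs, one 8-byte (8-byte aligned) catch
// object and EH funclets present, this computes
//   alignTo(alignTo(0 + 32, 8) + 8 + 8, 16) = 48 bytes.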
489
490/// Returns the size of the entire SVE stack frame (callee-saves + spills).
496
498 if (!EnableRedZone)
499 return false;
500
501 // Don't use the red zone if the function explicitly asks us not to.
502 // This is typically used for kernel code.
503 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
504 const unsigned RedZoneSize =
506 if (!RedZoneSize)
507 return false;
508
509 const MachineFrameInfo &MFI = MF.getFrameInfo();
511 uint64_t NumBytes = AFI->getLocalStackSize();
512
513 // If neither NEON nor SVE is available, a COPY from one Q-reg to
514 // another requires a spill -> reload sequence. We can do that
515 // using a pre-decrementing store/post-decrementing load, but
516 // if we do so, we can't use the Red Zone.
517 bool LowerQRegCopyThroughMem = Subtarget.hasFPARMv8() &&
518 !Subtarget.isNeonAvailable() &&
519 !Subtarget.hasSVE();
520
521 return !(MFI.hasCalls() || hasFP(MF) || NumBytes > RedZoneSize ||
522 getSVEStackSize(MF) || LowerQRegCopyThroughMem);
523}
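// When the red zone is usable (see above), a small leaf function can spill
// below SP without adjusting it at all, e.g. (illustrative only):
//
//   str x0, [sp, #-8]        // store into the red zone below sp
//   ...
//   ldr x0, [sp, #-8]
//   ret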
524
525/// hasFPImpl - Return true if the specified function should have a dedicated
526/// frame pointer register.
528 const MachineFrameInfo &MFI = MF.getFrameInfo();
529 const TargetRegisterInfo *RegInfo = MF.getSubtarget().getRegisterInfo();
531
532 // Win64 EH requires a frame pointer if funclets are present, as the locals
533 // are accessed off the frame pointer in both the parent function and the
534 // funclets.
535 if (MF.hasEHFunclets())
536 return true;
537 // Retain behavior of always omitting the FP for leaf functions when possible.
539 return true;
540 if (MFI.hasVarSizedObjects() || MFI.isFrameAddressTaken() ||
541 MFI.hasStackMap() || MFI.hasPatchPoint() ||
542 RegInfo->hasStackRealignment(MF))
543 return true;
544
545 // If we:
546 //
547 // 1. Have streaming mode changes
548 // OR:
549 // 2. Have a streaming body with SVE stack objects
550 //
551 // Then the value of VG restored when unwinding to this function may not match
552 // the value of VG used to set up the stack.
553 //
554 // This is a problem as the CFA can be described with an expression of the
555 // form: CFA = SP + NumBytes + VG * NumScalableBytes.
556 //
557 // If the value of VG used in that expression does not match the value used to
558 // set up the stack, an incorrect address for the CFA will be computed, and
559 // unwinding will fail.
560 //
561 // We work around this issue by ensuring the frame-pointer can describe the
562 // CFA in either of these cases.
563 if (AFI.needsDwarfUnwindInfo(MF) &&
565 (!AFI.hasCalculatedStackSizeSVE() || AFI.getStackSizeSVE() > 0)))
566 return true;
567 // With large call frames around we may need to use FP to access the scavenging
568 // emergency spill slot.
569 //
570 // Unfortunately some calls to hasFP() like machine verifier ->
571 // getReservedReg() -> hasFP in the middle of global isel are too early
572 // to know the max call frame size. Hopefully conservatively returning "true"
573 // in those cases is fine.
574 // DefaultSafeSPDisplacement is fine as we only emergency spill GP regs.
575 if (!MFI.isMaxCallFrameSizeComputed() ||
577 return true;
578
579 return false;
580}
581
582/// Should the Frame Pointer be reserved for the current function?
584 const TargetMachine &TM = MF.getTarget();
585 const Triple &TT = TM.getTargetTriple();
586
587 // These OSes require that the frame chain be valid, even if the current frame
588 // does not use a frame pointer.
589 if (TT.isOSDarwin() || TT.isOSWindows())
590 return true;
591
592 // If the function has a frame pointer, it is reserved.
593 if (hasFP(MF))
594 return true;
595
596 // Frontend has requested to preserve the frame pointer.
597 if (TM.Options.FramePointerIsReserved(MF))
598 return true;
599
600 return false;
601}
602
603/// hasReservedCallFrame - Under normal circumstances, when a frame pointer is
604/// not required, we reserve argument space for call sites in the function
605/// immediately on entry to the current function. This eliminates the need for
606/// add/sub sp brackets around call sites. Returns true if the call frame is
607/// included as part of the stack frame.
609 const MachineFunction &MF) const {
610 // The stack probing code for the dynamically allocated outgoing arguments
611 // area assumes that the stack is probed at the top - either by the prologue
612 // code, which issues a probe if `hasVarSizedObjects` returns true, or by the
613 // most recent variable-sized object allocation. Changing the condition here
614 // may need to be followed up by changes to the probe issuing logic.
615 return !MF.getFrameInfo().hasVarSizedObjects();
616}
617
621 const AArch64InstrInfo *TII =
622 static_cast<const AArch64InstrInfo *>(MF.getSubtarget().getInstrInfo());
623 const AArch64TargetLowering *TLI =
624 MF.getSubtarget<AArch64Subtarget>().getTargetLowering();
625 [[maybe_unused]] MachineFrameInfo &MFI = MF.getFrameInfo();
626 DebugLoc DL = I->getDebugLoc();
627 unsigned Opc = I->getOpcode();
628 bool IsDestroy = Opc == TII->getCallFrameDestroyOpcode();
629 uint64_t CalleePopAmount = IsDestroy ? I->getOperand(1).getImm() : 0;
630
631 if (!hasReservedCallFrame(MF)) {
632 int64_t Amount = I->getOperand(0).getImm();
633 Amount = alignTo(Amount, getStackAlign());
634 if (!IsDestroy)
635 Amount = -Amount;
636
637 // N.b. if CalleePopAmount is valid but zero (i.e. callee would pop, but it
638 // doesn't have to pop anything), then the first operand will be zero too, so
639 // this adjustment is a no-op.
640 if (CalleePopAmount == 0) {
641 // FIXME: in-function stack adjustment for calls is limited to 24-bits
642 // because there's no guaranteed temporary register available.
643 //
644 // ADD/SUB (immediate) has only LSL #0 and LSL #12 available.
645 // 1) For offset <= 12-bit, we use LSL #0
646 // 2) For 12-bit <= offset <= 24-bit, we use two instructions. One uses
647 // LSL #0, and the other uses LSL #12.
648 //
649 // Most call frames will be allocated at the start of a function so
650 // this is OK, but it is a limitation that needs dealing with.
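      // For example, an adjustment of 0x12345 bytes could be materialized as
      // (illustrative only):
      //   sub sp, sp, #0x12, lsl #12
      //   sub sp, sp, #0x345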
651 assert(Amount > -0xffffff && Amount < 0xffffff && "call frame too large");
652
653 if (TLI->hasInlineStackProbe(MF) &&
655 // When stack probing is enabled, the decrement of SP may need to be
656 // probed. We only need to do this if the call site needs 1024 bytes of
657 // space or more, because a region smaller than that is allowed to be
658 // unprobed at an ABI boundary. We rely on the fact that SP has been
659 // probed exactly at this point, either by the prologue or most recent
660 // dynamic allocation.
662 "non-reserved call frame without var sized objects?");
663 Register ScratchReg =
664 MF.getRegInfo().createVirtualRegister(&AArch64::GPR64RegClass);
665 inlineStackProbeFixed(I, ScratchReg, -Amount, StackOffset::get(0, 0));
666 } else {
667 emitFrameOffset(MBB, I, DL, AArch64::SP, AArch64::SP,
668 StackOffset::getFixed(Amount), TII);
669 }
670 }
671 } else if (CalleePopAmount != 0) {
672 // If the calling convention demands that the callee pops arguments from the
673 // stack, we want to add it back if we have a reserved call frame.
674 assert(CalleePopAmount < 0xffffff && "call frame too large");
675 emitFrameOffset(MBB, I, DL, AArch64::SP, AArch64::SP,
676 StackOffset::getFixed(-(int64_t)CalleePopAmount), TII);
677 }
678 return MBB.erase(I);
679}
680
682 MachineBasicBlock &MBB) const {
683
684 MachineFunction &MF = *MBB.getParent();
685 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
686 const auto &TRI = *Subtarget.getRegisterInfo();
687 const auto &MFI = *MF.getInfo<AArch64FunctionInfo>();
688
689 CFIInstBuilder CFIBuilder(MBB, MBB.begin(), MachineInstr::NoFlags);
690
691 // Reset the CFA to `SP + 0`.
692 CFIBuilder.buildDefCFA(AArch64::SP, 0);
693
694 // Flip the RA sign state.
695 if (MFI.shouldSignReturnAddress(MF))
696 MFI.branchProtectionPAuthLR() ? CFIBuilder.buildNegateRAStateWithPC()
697 : CFIBuilder.buildNegateRAState();
698
699 // Shadow call stack uses X18, reset it.
700 if (MFI.needsShadowCallStackPrologueEpilogue(MF))
701 CFIBuilder.buildSameValue(AArch64::X18);
702
703 // Emit .cfi_same_value for callee-saved registers.
704 const std::vector<CalleeSavedInfo> &CSI =
706 for (const auto &Info : CSI) {
707 MCRegister Reg = Info.getReg();
708 if (!TRI.regNeedsCFI(Reg, Reg))
709 continue;
710 CFIBuilder.buildSameValue(Reg);
711 }
712}
713
716 bool SVE) {
717 MachineFunction &MF = *MBB.getParent();
718 MachineFrameInfo &MFI = MF.getFrameInfo();
719
720 const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();
721 if (CSI.empty())
722 return;
723
724 const TargetSubtargetInfo &STI = MF.getSubtarget();
725 const TargetRegisterInfo &TRI = *STI.getRegisterInfo();
727
728 for (const auto &Info : CSI) {
729 if (SVE !=
730 (MFI.getStackID(Info.getFrameIdx()) == TargetStackID::ScalableVector))
731 continue;
732
733 MCRegister Reg = Info.getReg();
734 if (SVE &&
735 !static_cast<const AArch64RegisterInfo &>(TRI).regNeedsCFI(Reg, Reg))
736 continue;
737
738 CFIBuilder.buildRestore(Info.getReg());
739 }
740}
741
742void AArch64FrameLowering::emitCalleeSavedGPRRestores(
745}
746
747void AArch64FrameLowering::emitCalleeSavedSVERestores(
750}
751
752// Return the maximum possible number of bytes for `Size` due to the
753// architectural limit on the size of an SVE register.
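// For example, each "scalable byte" can be at most 16 real bytes, so
// upperBound(StackOffset::get(/*Fixed=*/16, /*Scalable=*/32)) == 32 * 16 + 16
// == 528.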
754static int64_t upperBound(StackOffset Size) {
755 static const int64_t MAX_BYTES_PER_SCALABLE_BYTE = 16;
756 return Size.getScalable() * MAX_BYTES_PER_SCALABLE_BYTE + Size.getFixed();
757}
758
759void AArch64FrameLowering::allocateStackSpace(
761 int64_t RealignmentPadding, StackOffset AllocSize, bool NeedsWinCFI,
762 bool *HasWinCFI, bool EmitCFI, StackOffset InitialOffset,
763 bool FollowupAllocs) const {
764
765 if (!AllocSize)
766 return;
767
768 DebugLoc DL;
769 MachineFunction &MF = *MBB.getParent();
770 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
771 const TargetInstrInfo &TII = *Subtarget.getInstrInfo();
772 AArch64FunctionInfo &AFI = *MF.getInfo<AArch64FunctionInfo>();
773 const MachineFrameInfo &MFI = MF.getFrameInfo();
774
775 const int64_t MaxAlign = MFI.getMaxAlign().value();
776 const uint64_t AndMask = ~(MaxAlign - 1);
777
778 if (!Subtarget.getTargetLowering()->hasInlineStackProbe(MF)) {
779 Register TargetReg = RealignmentPadding
780 ? findScratchNonCalleeSaveRegister(&MBB)
781 : AArch64::SP;
782 // SUB Xd/SP, SP, AllocSize
783 emitFrameOffset(MBB, MBBI, DL, TargetReg, AArch64::SP, -AllocSize, &TII,
784 MachineInstr::FrameSetup, false, NeedsWinCFI, HasWinCFI,
785 EmitCFI, InitialOffset);
786
787 if (RealignmentPadding) {
788 // AND SP, X9, 0b11111...0000
789 BuildMI(MBB, MBBI, DL, TII.get(AArch64::ANDXri), AArch64::SP)
790 .addReg(TargetReg, RegState::Kill)
793 AFI.setStackRealigned(true);
794
795 // No need for SEH instructions here; if we're realigning the stack,
796 // we've set a frame pointer and already finished the SEH prologue.
797 assert(!NeedsWinCFI);
798 }
799 return;
800 }
801
802 //
803 // Stack probing allocation.
804 //
805
806 // Fixed length allocation. If we don't need to re-align the stack and don't
807 // have SVE objects, we can use a more efficient sequence for stack probing.
808 if (AllocSize.getScalable() == 0 && RealignmentPadding == 0) {
809 Register ScratchReg = findScratchNonCalleeSaveRegister(&MBB);
810 assert(ScratchReg != AArch64::NoRegister);
811 BuildMI(MBB, MBBI, DL, TII.get(AArch64::PROBED_STACKALLOC))
812 .addDef(ScratchReg)
813 .addImm(AllocSize.getFixed())
814 .addImm(InitialOffset.getFixed())
815 .addImm(InitialOffset.getScalable());
816 // The fixed allocation may leave unprobed bytes at the top of the
817 // stack. If we have subsequent allocation (e.g. if we have variable-sized
818 // objects), we need to issue an extra probe, so these allocations start in
819 // a known state.
820 if (FollowupAllocs) {
821 // STR XZR, [SP]
822 BuildMI(MBB, MBBI, DL, TII.get(AArch64::STRXui))
823 .addReg(AArch64::XZR)
824 .addReg(AArch64::SP)
825 .addImm(0)
827 }
828
829 return;
830 }
831
832 // Variable length allocation.
833
834 // If the (unknown) allocation size cannot exceed the probe size, decrement
835 // the stack pointer right away.
836 int64_t ProbeSize = AFI.getStackProbeSize();
837 if (upperBound(AllocSize) + RealignmentPadding <= ProbeSize) {
838 Register ScratchReg = RealignmentPadding
839 ? findScratchNonCalleeSaveRegister(&MBB)
840 : AArch64::SP;
841 assert(ScratchReg != AArch64::NoRegister);
842 // SUB Xd, SP, AllocSize
843 emitFrameOffset(MBB, MBBI, DL, ScratchReg, AArch64::SP, -AllocSize, &TII,
844 MachineInstr::FrameSetup, false, NeedsWinCFI, HasWinCFI,
845 EmitCFI, InitialOffset);
846 if (RealignmentPadding) {
847 // AND SP, Xn, 0b11111...0000
848 BuildMI(MBB, MBBI, DL, TII.get(AArch64::ANDXri), AArch64::SP)
849 .addReg(ScratchReg, RegState::Kill)
852 AFI.setStackRealigned(true);
853 }
854 if (FollowupAllocs || upperBound(AllocSize) + RealignmentPadding >
856 // STR XZR, [SP]
857 BuildMI(MBB, MBBI, DL, TII.get(AArch64::STRXui))
858 .addReg(AArch64::XZR)
859 .addReg(AArch64::SP)
860 .addImm(0)
862 }
863 return;
864 }
865
866 // Emit a variable-length allocation probing loop.
867 // TODO: As an optimisation, the loop can be "unrolled" into a few parts,
868 // each of them guaranteed to adjust the stack by less than the probe size.
869 Register TargetReg = findScratchNonCalleeSaveRegister(&MBB);
870 assert(TargetReg != AArch64::NoRegister);
871 // SUB Xd, SP, AllocSize
872 emitFrameOffset(MBB, MBBI, DL, TargetReg, AArch64::SP, -AllocSize, &TII,
873 MachineInstr::FrameSetup, false, NeedsWinCFI, HasWinCFI,
874 EmitCFI, InitialOffset);
875 if (RealignmentPadding) {
876 // AND Xn, Xn, 0b11111...0000
877 BuildMI(MBB, MBBI, DL, TII.get(AArch64::ANDXri), TargetReg)
878 .addReg(TargetReg, RegState::Kill)
881 }
882
883 BuildMI(MBB, MBBI, DL, TII.get(AArch64::PROBED_STACKALLOC_VAR))
884 .addReg(TargetReg);
885 if (EmitCFI) {
886 // Set the CFA register back to SP.
887 CFIInstBuilder(MBB, MBBI, MachineInstr::FrameSetup)
888 .buildDefCFARegister(AArch64::SP);
889 }
890 if (RealignmentPadding)
891 AFI.setStackRealigned(true);
892}
893
895 switch (Reg.id()) {
896 default:
897 // The called routine is expected to preserve r19-r28;
898 // r29 and r30 are used as the frame pointer and link register, respectively.
899 return 0;
900
901 // GPRs
902#define CASE(n) \
903 case AArch64::W##n: \
904 case AArch64::X##n: \
905 return AArch64::X##n
906 CASE(0);
907 CASE(1);
908 CASE(2);
909 CASE(3);
910 CASE(4);
911 CASE(5);
912 CASE(6);
913 CASE(7);
914 CASE(8);
915 CASE(9);
916 CASE(10);
917 CASE(11);
918 CASE(12);
919 CASE(13);
920 CASE(14);
921 CASE(15);
922 CASE(16);
923 CASE(17);
924 CASE(18);
925#undef CASE
926
927 // FPRs
928#define CASE(n) \
929 case AArch64::B##n: \
930 case AArch64::H##n: \
931 case AArch64::S##n: \
932 case AArch64::D##n: \
933 case AArch64::Q##n: \
934 return HasSVE ? AArch64::Z##n : AArch64::Q##n
935 CASE(0);
936 CASE(1);
937 CASE(2);
938 CASE(3);
939 CASE(4);
940 CASE(5);
941 CASE(6);
942 CASE(7);
943 CASE(8);
944 CASE(9);
945 CASE(10);
946 CASE(11);
947 CASE(12);
948 CASE(13);
949 CASE(14);
950 CASE(15);
951 CASE(16);
952 CASE(17);
953 CASE(18);
954 CASE(19);
955 CASE(20);
956 CASE(21);
957 CASE(22);
958 CASE(23);
959 CASE(24);
960 CASE(25);
961 CASE(26);
962 CASE(27);
963 CASE(28);
964 CASE(29);
965 CASE(30);
966 CASE(31);
967#undef CASE
968 }
969}
970
971void AArch64FrameLowering::emitZeroCallUsedRegs(BitVector RegsToZero,
972 MachineBasicBlock &MBB) const {
973 // Insertion point.
975
976 // Fake a debug loc.
977 DebugLoc DL;
978 if (MBBI != MBB.end())
979 DL = MBBI->getDebugLoc();
980
981 const MachineFunction &MF = *MBB.getParent();
982 const AArch64Subtarget &STI = MF.getSubtarget<AArch64Subtarget>();
983 const AArch64RegisterInfo &TRI = *STI.getRegisterInfo();
984
985 BitVector GPRsToZero(TRI.getNumRegs());
986 BitVector FPRsToZero(TRI.getNumRegs());
987 bool HasSVE = STI.isSVEorStreamingSVEAvailable();
988 for (MCRegister Reg : RegsToZero.set_bits()) {
989 if (TRI.isGeneralPurposeRegister(MF, Reg)) {
990 // For GPRs, we only care to clear out the 64-bit register.
991 if (MCRegister XReg = getRegisterOrZero(Reg, HasSVE))
992 GPRsToZero.set(XReg);
993 } else if (AArch64InstrInfo::isFpOrNEON(Reg)) {
994 // For FPRs,
995 if (MCRegister XReg = getRegisterOrZero(Reg, HasSVE))
996 FPRsToZero.set(XReg);
997 }
998 }
999
1000 const AArch64InstrInfo &TII = *STI.getInstrInfo();
1001
1002 // Zero out GPRs.
1003 for (MCRegister Reg : GPRsToZero.set_bits())
1004 TII.buildClearRegister(Reg, MBB, MBBI, DL);
1005
1006 // Zero out FP/vector registers.
1007 for (MCRegister Reg : FPRsToZero.set_bits())
1008 TII.buildClearRegister(Reg, MBB, MBBI, DL);
1009
1010 if (HasSVE) {
1011 for (MCRegister PReg :
1012 {AArch64::P0, AArch64::P1, AArch64::P2, AArch64::P3, AArch64::P4,
1013 AArch64::P5, AArch64::P6, AArch64::P7, AArch64::P8, AArch64::P9,
1014 AArch64::P10, AArch64::P11, AArch64::P12, AArch64::P13, AArch64::P14,
1015 AArch64::P15}) {
1016 if (RegsToZero[PReg])
1017 BuildMI(MBB, MBBI, DL, TII.get(AArch64::PFALSE), PReg);
1018 }
1019 }
1020}
1021
1022bool AArch64FrameLowering::windowsRequiresStackProbe(
1023 const MachineFunction &MF, uint64_t StackSizeInBytes) const {
1024 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1025 const AArch64FunctionInfo &MFI = *MF.getInfo<AArch64FunctionInfo>();
1026 // TODO: When implementing stack protectors, take that into account
1027 // for the probe threshold.
1028 return Subtarget.isTargetWindows() && MFI.hasStackProbing() &&
1029 StackSizeInBytes >= uint64_t(MFI.getStackProbeSize());
1030}
1031
1033 const MachineBasicBlock &MBB) {
1034 const MachineFunction *MF = MBB.getParent();
1035 LiveRegs.addLiveIns(MBB);
1036 // Mark callee saved registers as used so we will not choose them.
1037 const MCPhysReg *CSRegs = MF->getRegInfo().getCalleeSavedRegs();
1038 for (unsigned i = 0; CSRegs[i]; ++i)
1039 LiveRegs.addReg(CSRegs[i]);
1040}
1041
1043AArch64FrameLowering::findScratchNonCalleeSaveRegister(MachineBasicBlock *MBB,
1044 bool HasCall) const {
1045 MachineFunction *MF = MBB->getParent();
1046
1047 // If MBB is an entry block, use X9 as the scratch register;
1048 // preserve_none functions may be using X9 to pass arguments, so for those
1049 // prefer to pick an available register below.
1050 if (&MF->front() == MBB &&
1052 return AArch64::X9;
1053
1054 const AArch64Subtarget &Subtarget = MF->getSubtarget<AArch64Subtarget>();
1055 const AArch64RegisterInfo &TRI = *Subtarget.getRegisterInfo();
1056 LivePhysRegs LiveRegs(TRI);
1057 getLiveRegsForEntryMBB(LiveRegs, *MBB);
1058 if (HasCall) {
1059 LiveRegs.addReg(AArch64::X16);
1060 LiveRegs.addReg(AArch64::X17);
1061 LiveRegs.addReg(AArch64::X18);
1062 }
1063
1064 // Prefer X9 since it was historically used for the prologue scratch reg.
1065 const MachineRegisterInfo &MRI = MF->getRegInfo();
1066 if (LiveRegs.available(MRI, AArch64::X9))
1067 return AArch64::X9;
1068
1069 for (unsigned Reg : AArch64::GPR64RegClass) {
1070 if (LiveRegs.available(MRI, Reg))
1071 return Reg;
1072 }
1073 return AArch64::NoRegister;
1074}
1075
1077 const MachineBasicBlock &MBB) const {
1078 const MachineFunction *MF = MBB.getParent();
1079 MachineBasicBlock *TmpMBB = const_cast<MachineBasicBlock *>(&MBB);
1080 const AArch64Subtarget &Subtarget = MF->getSubtarget<AArch64Subtarget>();
1081 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
1082 const AArch64TargetLowering *TLI = Subtarget.getTargetLowering();
1084
1085 if (AFI->hasSwiftAsyncContext()) {
1086 const AArch64RegisterInfo &TRI = *Subtarget.getRegisterInfo();
1087 const MachineRegisterInfo &MRI = MF->getRegInfo();
1088 LivePhysRegs LiveRegs(TRI);
1089 getLiveRegsForEntryMBB(LiveRegs, MBB);
1090 // The StoreSwiftAsyncContext clobbers X16 and X17. Make sure they are
1091 // available.
1092 if (!LiveRegs.available(MRI, AArch64::X16) ||
1093 !LiveRegs.available(MRI, AArch64::X17))
1094 return false;
1095 }
1096
1097 // Certain stack probing sequences might clobber flags, so we can't use
1098 // the block as a prologue if the flags register is a live-in.
1100 MBB.isLiveIn(AArch64::NZCV))
1101 return false;
1102
1103 if (RegInfo->hasStackRealignment(*MF) || TLI->hasInlineStackProbe(*MF))
1104 if (findScratchNonCalleeSaveRegister(TmpMBB) == AArch64::NoRegister)
1105 return false;
1106
1107 // May need a scratch register (for the return value) if we require making a
1108 // special call.
1109 if (requiresSaveVG(*MF) ||
1110 windowsRequiresStackProbe(*MF, std::numeric_limits<uint64_t>::max()))
1111 if (findScratchNonCalleeSaveRegister(TmpMBB, true) == AArch64::NoRegister)
1112 return false;
1113
1114 return true;
1115}
1116
1118 const Function &F = MF.getFunction();
1119 return MF.getTarget().getMCAsmInfo()->usesWindowsCFI() &&
1120 F.needsUnwindTableEntry();
1121}
1122
1123bool AArch64FrameLowering::shouldSignReturnAddressEverywhere(
1124 const MachineFunction &MF) const {
1125 // FIXME: With WinCFI, extra care should be taken to place SEH_PACSignLR
1126 // and SEH_EpilogEnd instructions in the correct order.
1128 return false;
1130 bool SignReturnAddressAll = AFI->shouldSignReturnAddress(/*SpillsLR=*/false);
1131 return SignReturnAddressAll;
1132}
1133
1134bool AArch64FrameLowering::shouldCombineCSRLocalStackBump(
1135 MachineFunction &MF, uint64_t StackBumpBytes) const {
1137 const MachineFrameInfo &MFI = MF.getFrameInfo();
1138 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1139 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
1140 if (homogeneousPrologEpilog(MF))
1141 return false;
1142
1143 if (AFI->getLocalStackSize() == 0)
1144 return false;
1145
1146 // For WinCFI, if optimizing for size, prefer to not combine the stack bump
1147 // (to force a stp with predecrement) to match the packed unwind format,
1148 // provided that there actually are any callee saved registers to merge the
1149 // decrement with.
1150 // This is potentially marginally slower, but allows using the packed
1151 // unwind format for functions that both have a local area and callee saved
1152 // registers. Using the packed unwind format notably reduces the size of
1153 // the unwind info.
1154 if (needsWinCFI(MF) && AFI->getCalleeSavedStackSize() > 0 &&
1155 MF.getFunction().hasOptSize())
1156 return false;
1157
1158 // 512 is the maximum immediate for stp/ldp that will be used for
1159 // callee-save save/restores
1160 if (StackBumpBytes >= 512 || windowsRequiresStackProbe(MF, StackBumpBytes))
1161 return false;
1162
1163 if (MFI.hasVarSizedObjects())
1164 return false;
1165
1166 if (RegInfo->hasStackRealignment(MF))
1167 return false;
1168
1169 // This isn't strictly necessary, but it simplifies things a bit since the
1170 // current RedZone handling code assumes the SP is adjusted by the
1171 // callee-save save/restore code.
1172 if (canUseRedZone(MF))
1173 return false;
1174
1175 // When there is an SVE area on the stack, always allocate the
1176 // callee-saves and spills/locals separately.
1177 if (getSVEStackSize(MF))
1178 return false;
1179
1180 return true;
1181}
1182
1183bool AArch64FrameLowering::shouldCombineCSRLocalStackBumpInEpilogue(
1184 MachineBasicBlock &MBB, uint64_t StackBumpBytes) const {
1185 if (!shouldCombineCSRLocalStackBump(*MBB.getParent(), StackBumpBytes))
1186 return false;
1187 if (MBB.empty())
1188 return true;
1189
1190 // Disable combined SP bump if the last instruction is an MTE tag store. It
1191 // is almost always better to merge SP adjustment into those instructions.
1194 while (LastI != Begin) {
1195 --LastI;
1196 if (LastI->isTransient())
1197 continue;
1198 if (!LastI->getFlag(MachineInstr::FrameDestroy))
1199 break;
1200 }
1201 switch (LastI->getOpcode()) {
1202 case AArch64::STGloop:
1203 case AArch64::STZGloop:
1204 case AArch64::STGi:
1205 case AArch64::STZGi:
1206 case AArch64::ST2Gi:
1207 case AArch64::STZ2Gi:
1208 return false;
1209 default:
1210 return true;
1211 }
1212 llvm_unreachable("unreachable");
1213}
1214
1215// Given a load or a store instruction, generate an appropriate unwinding SEH
1216// code on Windows.
1218 const TargetInstrInfo &TII,
1219 MachineInstr::MIFlag Flag) {
1220 unsigned Opc = MBBI->getOpcode();
1221 MachineBasicBlock *MBB = MBBI->getParent();
1222 MachineFunction &MF = *MBB->getParent();
1223 DebugLoc DL = MBBI->getDebugLoc();
1224 unsigned ImmIdx = MBBI->getNumOperands() - 1;
1225 int Imm = MBBI->getOperand(ImmIdx).getImm();
1227 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1228 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
1229
1230 switch (Opc) {
1231 default:
1232 report_fatal_error("No SEH Opcode for this instruction");
1233 case AArch64::STR_ZXI:
1234 case AArch64::LDR_ZXI: {
1235 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1236 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveZReg))
1237 .addImm(Reg0)
1238 .addImm(Imm)
1239 .setMIFlag(Flag);
1240 break;
1241 }
1242 case AArch64::STR_PXI:
1243 case AArch64::LDR_PXI: {
1244 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1245 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SavePReg))
1246 .addImm(Reg0)
1247 .addImm(Imm)
1248 .setMIFlag(Flag);
1249 break;
1250 }
1251 case AArch64::LDPDpost:
1252 Imm = -Imm;
1253 [[fallthrough]];
1254 case AArch64::STPDpre: {
1255 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1256 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(2).getReg());
1257 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFRegP_X))
1258 .addImm(Reg0)
1259 .addImm(Reg1)
1260 .addImm(Imm * 8)
1261 .setMIFlag(Flag);
1262 break;
1263 }
1264 case AArch64::LDPXpost:
1265 Imm = -Imm;
1266 [[fallthrough]];
1267 case AArch64::STPXpre: {
1268 Register Reg0 = MBBI->getOperand(1).getReg();
1269 Register Reg1 = MBBI->getOperand(2).getReg();
1270 if (Reg0 == AArch64::FP && Reg1 == AArch64::LR)
1271 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFPLR_X))
1272 .addImm(Imm * 8)
1273 .setMIFlag(Flag);
1274 else
1275 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveRegP_X))
1276 .addImm(RegInfo->getSEHRegNum(Reg0))
1277 .addImm(RegInfo->getSEHRegNum(Reg1))
1278 .addImm(Imm * 8)
1279 .setMIFlag(Flag);
1280 break;
1281 }
1282 case AArch64::LDRDpost:
1283 Imm = -Imm;
1284 [[fallthrough]];
1285 case AArch64::STRDpre: {
1286 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1287 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFReg_X))
1288 .addImm(Reg)
1289 .addImm(Imm)
1290 .setMIFlag(Flag);
1291 break;
1292 }
1293 case AArch64::LDRXpost:
1294 Imm = -Imm;
1295 [[fallthrough]];
1296 case AArch64::STRXpre: {
1297 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1298 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveReg_X))
1299 .addImm(Reg)
1300 .addImm(Imm)
1301 .setMIFlag(Flag);
1302 break;
1303 }
1304 case AArch64::STPDi:
1305 case AArch64::LDPDi: {
1306 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1307 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1308 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFRegP))
1309 .addImm(Reg0)
1310 .addImm(Reg1)
1311 .addImm(Imm * 8)
1312 .setMIFlag(Flag);
1313 break;
1314 }
1315 case AArch64::STPXi:
1316 case AArch64::LDPXi: {
1317 Register Reg0 = MBBI->getOperand(0).getReg();
1318 Register Reg1 = MBBI->getOperand(1).getReg();
1319 if (Reg0 == AArch64::FP && Reg1 == AArch64::LR)
1320 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFPLR))
1321 .addImm(Imm * 8)
1322 .setMIFlag(Flag);
1323 else
1324 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveRegP))
1325 .addImm(RegInfo->getSEHRegNum(Reg0))
1326 .addImm(RegInfo->getSEHRegNum(Reg1))
1327 .addImm(Imm * 8)
1328 .setMIFlag(Flag);
1329 break;
1330 }
1331 case AArch64::STRXui:
1332 case AArch64::LDRXui: {
1333 int Reg = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1334 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveReg))
1335 .addImm(Reg)
1336 .addImm(Imm * 8)
1337 .setMIFlag(Flag);
1338 break;
1339 }
1340 case AArch64::STRDui:
1341 case AArch64::LDRDui: {
1342 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1343 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFReg))
1344 .addImm(Reg)
1345 .addImm(Imm * 8)
1346 .setMIFlag(Flag);
1347 break;
1348 }
1349 case AArch64::STPQi:
1350 case AArch64::LDPQi: {
1351 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1352 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1353 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveAnyRegQP))
1354 .addImm(Reg0)
1355 .addImm(Reg1)
1356 .addImm(Imm * 16)
1357 .setMIFlag(Flag);
1358 break;
1359 }
1360 case AArch64::LDPQpost:
1361 Imm = -Imm;
1362 [[fallthrough]];
1363 case AArch64::STPQpre: {
1364 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1365 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(2).getReg());
1366 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveAnyRegQPX))
1367 .addImm(Reg0)
1368 .addImm(Reg1)
1369 .addImm(Imm * 16)
1370 .setMIFlag(Flag);
1371 break;
1372 }
1373 }
1374 auto I = MBB->insertAfter(MBBI, MIB);
1375 return I;
1376}
1377
1378// Fix up the SEH opcode associated with the save/restore instruction.
1380 unsigned LocalStackSize) {
1381 MachineOperand *ImmOpnd = nullptr;
1382 unsigned ImmIdx = MBBI->getNumOperands() - 1;
1383 switch (MBBI->getOpcode()) {
1384 default:
1385 llvm_unreachable("Fix the offset in the SEH instruction");
1386 case AArch64::SEH_SaveFPLR:
1387 case AArch64::SEH_SaveRegP:
1388 case AArch64::SEH_SaveReg:
1389 case AArch64::SEH_SaveFRegP:
1390 case AArch64::SEH_SaveFReg:
1391 case AArch64::SEH_SaveAnyRegQP:
1392 case AArch64::SEH_SaveAnyRegQPX:
1393 ImmOpnd = &MBBI->getOperand(ImmIdx);
1394 break;
1395 }
1396 if (ImmOpnd)
1397 ImmOpnd->setImm(ImmOpnd->getImm() + LocalStackSize);
1398}
1399
1400bool AArch64FrameLowering::requiresGetVGCall(const MachineFunction &MF) const {
1401 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
1402 return AFI->hasStreamingModeChanges() &&
1403 !MF.getSubtarget<AArch64Subtarget>().hasSVE();
1404}
1405
1408 if (!AFI->needsDwarfUnwindInfo(MF) || !AFI->hasStreamingModeChanges())
1409 return false;
1410 // For Darwin platforms we don't save VG for non-SVE functions, even if SME
1411 // is enabled with streaming mode changes.
1412 auto &ST = MF.getSubtarget<AArch64Subtarget>();
1413 if (ST.isTargetDarwin())
1414 return ST.hasSVE();
1415 return true;
1416}
1417
1418static bool matchLibcall(const TargetLowering &TLI, const MachineOperand &MO,
1419 RTLIB::Libcall LC) {
1420 return MO.isSymbol() &&
1421 StringRef(TLI.getLibcallName(LC)) == MO.getSymbolName();
1422}
1423
1424bool AArch64FrameLowering::isVGInstruction(MachineBasicBlock::iterator MBBI,
1425 const TargetLowering &TLI) const {
1426 unsigned Opc = MBBI->getOpcode();
1427 if (Opc == AArch64::CNTD_XPiI)
1428 return true;
1429
1430 if (!requiresGetVGCall(*MBBI->getMF()))
1431 return false;
1432
1433 if (Opc == AArch64::BL)
1434 return matchLibcall(TLI, MBBI->getOperand(0), RTLIB::SMEABI_GET_CURRENT_VG);
1435
1436 return Opc == TargetOpcode::COPY;
1437}
1438
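// Convert a callee-save spill/restore at the boundary of the callee-save area
// into a pre/post-indexed form that also performs the SP update, when the
// offset fits. For example (illustrative only), with CSStackSizeInc == -16:
//
//   stp x29, x30, [sp, #0]   =>   stp x29, x30, [sp, #-16]!
//
// Otherwise a separate SP adjustment is emitted and the memory instruction is
// left unchanged.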
1440AArch64FrameLowering::convertCalleeSaveRestoreToSPPrePostIncDec(
1442 const DebugLoc &DL, const TargetInstrInfo *TII, int CSStackSizeInc,
1443 bool NeedsWinCFI, bool *HasWinCFI, bool EmitCFI,
1444 MachineInstr::MIFlag FrameFlag, int CFAOffset) const {
1445 unsigned NewOpc;
1446
1447 // If the function contains streaming mode changes, we expect instructions
1448 // to calculate the value of VG before spilling. Move past these instructions
1449 // if necessary.
1450 MachineFunction &MF = *MBB.getParent();
1451 if (requiresSaveVG(MF)) {
1452 auto &TLI = *MF.getSubtarget().getTargetLowering();
1453 while (isVGInstruction(MBBI, TLI))
1454 ++MBBI;
1455 }
1456
1457 switch (MBBI->getOpcode()) {
1458 default:
1459 llvm_unreachable("Unexpected callee-save save/restore opcode!");
1460 case AArch64::STPXi:
1461 NewOpc = AArch64::STPXpre;
1462 break;
1463 case AArch64::STPDi:
1464 NewOpc = AArch64::STPDpre;
1465 break;
1466 case AArch64::STPQi:
1467 NewOpc = AArch64::STPQpre;
1468 break;
1469 case AArch64::STRXui:
1470 NewOpc = AArch64::STRXpre;
1471 break;
1472 case AArch64::STRDui:
1473 NewOpc = AArch64::STRDpre;
1474 break;
1475 case AArch64::STRQui:
1476 NewOpc = AArch64::STRQpre;
1477 break;
1478 case AArch64::LDPXi:
1479 NewOpc = AArch64::LDPXpost;
1480 break;
1481 case AArch64::LDPDi:
1482 NewOpc = AArch64::LDPDpost;
1483 break;
1484 case AArch64::LDPQi:
1485 NewOpc = AArch64::LDPQpost;
1486 break;
1487 case AArch64::LDRXui:
1488 NewOpc = AArch64::LDRXpost;
1489 break;
1490 case AArch64::LDRDui:
1491 NewOpc = AArch64::LDRDpost;
1492 break;
1493 case AArch64::LDRQui:
1494 NewOpc = AArch64::LDRQpost;
1495 break;
1496 }
1497 TypeSize Scale = TypeSize::getFixed(1), Width = TypeSize::getFixed(0);
1498 int64_t MinOffset, MaxOffset;
1499 bool Success = static_cast<const AArch64InstrInfo *>(TII)->getMemOpInfo(
1500 NewOpc, Scale, Width, MinOffset, MaxOffset);
1501 (void)Success;
1502 assert(Success && "unknown load/store opcode");
1503
1504 // If the first store isn't right where we want SP then we can't fold the
1505 // update in, so create a normal arithmetic instruction instead.
1506 if (MBBI->getOperand(MBBI->getNumOperands() - 1).getImm() != 0 ||
1507 CSStackSizeInc < MinOffset * (int64_t)Scale.getFixedValue() ||
1508 CSStackSizeInc > MaxOffset * (int64_t)Scale.getFixedValue()) {
1509 // If we are destroying the frame, make sure we add the increment after the
1510 // last frame operation.
1511 if (FrameFlag == MachineInstr::FrameDestroy) {
1512 ++MBBI;
1513 // Also skip the SEH instruction, if needed
1514 if (NeedsWinCFI && AArch64InstrInfo::isSEHInstruction(*MBBI))
1515 ++MBBI;
1516 }
1517 emitFrameOffset(MBB, MBBI, DL, AArch64::SP, AArch64::SP,
1518 StackOffset::getFixed(CSStackSizeInc), TII, FrameFlag,
1519 false, NeedsWinCFI, HasWinCFI, EmitCFI,
1520 StackOffset::getFixed(CFAOffset));
1521
1522 return std::prev(MBBI);
1523 }
1524
1525 // Get rid of the SEH code associated with the old instruction.
1526 if (NeedsWinCFI) {
1527 auto SEH = std::next(MBBI);
1528 if (AArch64InstrInfo::isSEHInstruction(*SEH))
1529 SEH->eraseFromParent();
1530 }
1531
1532 MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII->get(NewOpc));
1533 MIB.addReg(AArch64::SP, RegState::Define);
1534
1535 // Copy all operands other than the immediate offset.
1536 unsigned OpndIdx = 0;
1537 for (unsigned OpndEnd = MBBI->getNumOperands() - 1; OpndIdx < OpndEnd;
1538 ++OpndIdx)
1539 MIB.add(MBBI->getOperand(OpndIdx));
1540
1541 assert(MBBI->getOperand(OpndIdx).getImm() == 0 &&
1542 "Unexpected immediate offset in first/last callee-save save/restore "
1543 "instruction!");
1544 assert(MBBI->getOperand(OpndIdx - 1).getReg() == AArch64::SP &&
1545 "Unexpected base register in callee-save save/restore instruction!");
1546 assert(CSStackSizeInc % Scale == 0);
1547 MIB.addImm(CSStackSizeInc / (int)Scale);
1548
1549 MIB.setMIFlags(MBBI->getFlags());
1550 MIB.setMemRefs(MBBI->memoperands());
1551
1552 // Generate a new SEH code that corresponds to the new instruction.
1553 if (NeedsWinCFI) {
1554 *HasWinCFI = true;
1555 InsertSEH(*MIB, *TII, FrameFlag);
1556 }
1557
1558 if (EmitCFI)
1559 CFIInstBuilder(MBB, MBBI, FrameFlag)
1560 .buildDefCFAOffset(CFAOffset - CSStackSizeInc);
1561
1562 return std::prev(MBB.erase(MBBI));
1563}
1564
1565void AArch64FrameLowering::fixupCalleeSaveRestoreStackOffset(
1566 MachineInstr &MI, uint64_t LocalStackSize, bool NeedsWinCFI,
1567 bool *HasWinCFI) const {
1568 if (AArch64InstrInfo::isSEHInstruction(MI))
1569 return;
1570
1571 unsigned Opc = MI.getOpcode();
1572 unsigned Scale;
1573 switch (Opc) {
1574 case AArch64::STPXi:
1575 case AArch64::STRXui:
1576 case AArch64::STPDi:
1577 case AArch64::STRDui:
1578 case AArch64::LDPXi:
1579 case AArch64::LDRXui:
1580 case AArch64::LDPDi:
1581 case AArch64::LDRDui:
1582 Scale = 8;
1583 break;
1584 case AArch64::STPQi:
1585 case AArch64::STRQui:
1586 case AArch64::LDPQi:
1587 case AArch64::LDRQui:
1588 Scale = 16;
1589 break;
1590 default:
1591 llvm_unreachable("Unexpected callee-save save/restore opcode!");
1592 }
1593
1594 unsigned OffsetIdx = MI.getNumExplicitOperands() - 1;
1595 assert(MI.getOperand(OffsetIdx - 1).getReg() == AArch64::SP &&
1596 "Unexpected base register in callee-save save/restore instruction!");
1597 // Last operand is immediate offset that needs fixing.
1598 MachineOperand &OffsetOpnd = MI.getOperand(OffsetIdx);
1599 // All generated opcodes have scaled offsets.
1600 assert(LocalStackSize % Scale == 0);
1601 OffsetOpnd.setImm(OffsetOpnd.getImm() + LocalStackSize / Scale);
1602
1603 if (NeedsWinCFI) {
1604 *HasWinCFI = true;
1605 auto MBBI = std::next(MachineBasicBlock::iterator(MI));
1606 assert(MBBI != MI.getParent()->end() && "Expecting a valid instruction");
1607 assert(AArch64InstrInfo::isSEHInstruction(*MBBI) &&
1608 "Expecting a SEH instruction");
1609 fixupSEHOpcode(MBBI, LocalStackSize);
1610 }
1611}
1612
1613static bool isTargetWindows(const MachineFunction &MF) {
1615}
1616
1617static unsigned getStackHazardSize(const MachineFunction &MF) {
1618 return MF.getSubtarget<AArch64Subtarget>().getStreamingHazardSize();
1619}
1620
1621// Convenience function to determine whether I is an SVE callee save.
1622bool AArch64FrameLowering::isSVECalleeSave(
1624 switch (I->getOpcode()) {
1625 default:
1626 return false;
1627 case AArch64::PTRUE_C_B:
1628 case AArch64::LD1B_2Z_IMM:
1629 case AArch64::ST1B_2Z_IMM:
1630 case AArch64::STR_ZXI:
1631 case AArch64::STR_PXI:
1632 case AArch64::LDR_ZXI:
1633 case AArch64::LDR_PXI:
1634 case AArch64::PTRUE_B:
1635 case AArch64::CPY_ZPzI_B:
1636 case AArch64::CMPNE_PPzZI_B:
1637 return I->getFlag(MachineInstr::FrameSetup) ||
1638 I->getFlag(MachineInstr::FrameDestroy);
1639 case AArch64::SEH_SavePReg:
1640 case AArch64::SEH_SaveZReg:
1641 return true;
1642 }
1643}
1644
1646 MachineFunction &MF,
1649 const DebugLoc &DL, bool NeedsWinCFI) {
1650 // Shadow call stack epilog: ldr x30, [x18, #-8]!
1651 BuildMI(MBB, MBBI, DL, TII.get(AArch64::LDRXpre))
1652 .addReg(AArch64::X18, RegState::Define)
1653 .addReg(AArch64::LR, RegState::Define)
1654 .addReg(AArch64::X18)
1655 .addImm(-8)
1657
1658 if (NeedsWinCFI)
1659 BuildMI(MBB, MBBI, DL, TII.get(AArch64::SEH_Nop))
1661
1664 .buildRestore(AArch64::X18);
1665}
1666
1668 MachineFunction &MF) const {
1669 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1670 const TargetInstrInfo *TII = Subtarget.getInstrInfo();
1671
1672 auto EmitSignRA = [&](MachineBasicBlock &MBB) {
1673 DebugLoc DL; // Set debug location to unknown.
1675
1676 BuildMI(MBB, MBBI, DL, TII->get(AArch64::PAUTH_PROLOGUE))
1678 };
1679
1680 auto EmitAuthRA = [&](MachineBasicBlock &MBB) {
1681 DebugLoc DL;
1682 MachineBasicBlock::iterator MBBI = MBB.getFirstTerminator();
1683 if (MBBI != MBB.end())
1684 DL = MBBI->getDebugLoc();
1685
1686 BuildMI(MBB, MBBI, DL, TII->get(AArch64::PAUTH_EPILOGUE))
1688 };
1689
1690 // This should be in sync with PEIImpl::calculateSaveRestoreBlocks.
1691 EmitSignRA(MF.front());
1692 for (MachineBasicBlock &MBB : MF) {
1693 if (MBB.isEHFuncletEntry())
1694 EmitSignRA(MBB);
1695 if (MBB.isReturnBlock())
1696 EmitAuthRA(MBB);
1697 }
1698}
1699
1701 MachineBasicBlock &MBB) const {
1702 AArch64PrologueEmitter PrologueEmitter(MF, MBB, *this);
1703 PrologueEmitter.emitPrologue();
1704}
1705
1707 switch (MI.getOpcode()) {
1708 default:
1709 return false;
1710 case AArch64::CATCHRET:
1711 case AArch64::CLEANUPRET:
1712 return true;
1713 }
1714}
1715
1717 MachineBasicBlock &MBB) const {
1718 MachineBasicBlock::iterator MBBI = MBB.getLastNonDebugInstr();
1719 MachineFrameInfo &MFI = MF.getFrameInfo();
1721 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1722 const TargetInstrInfo *TII = Subtarget.getInstrInfo();
1723 DebugLoc DL;
1724 bool NeedsWinCFI = needsWinCFI(MF);
1725 bool EmitCFI = AFI->needsAsyncDwarfUnwindInfo(MF);
1726 bool HasWinCFI = false;
1727 bool IsFunclet = false;
1728
1729 if (MBB.end() != MBBI) {
1730 DL = MBBI->getDebugLoc();
1731 IsFunclet = isFuncletReturnInstr(*MBBI);
1732 }
1733
1734 MachineBasicBlock::iterator EpilogStartI = MBB.end();
1735
1736 auto FinishingTouches = make_scope_exit([&]() {
1738 emitShadowCallStackEpilogue(*TII, MF, MBB, MBB.getFirstTerminator(), DL,
1739 NeedsWinCFI);
1740 HasWinCFI |= NeedsWinCFI;
1741 }
1742 if (EmitCFI)
1743 emitCalleeSavedGPRRestores(MBB, MBB.getFirstTerminator());
1744 if (AFI->shouldSignReturnAddress(MF)) {
1745 // If pac-ret+leaf is in effect, PAUTH_EPILOGUE pseudo instructions
1746 // are inserted by emitPacRetPlusLeafHardening().
1747 if (!shouldSignReturnAddressEverywhere(MF)) {
1748 BuildMI(MBB, MBB.getFirstTerminator(), DL,
1749 TII->get(AArch64::PAUTH_EPILOGUE))
1751 }
1752 // AArch64PointerAuth pass will insert SEH_PACSignLR
1753 HasWinCFI |= NeedsWinCFI;
1754 }
1755 if (HasWinCFI) {
1756 BuildMI(MBB, MBB.getFirstTerminator(), DL,
1757 TII->get(AArch64::SEH_EpilogEnd))
1759 if (!MF.hasWinCFI())
1760 MF.setHasWinCFI(true);
1761 }
1762 if (NeedsWinCFI) {
1763 assert(EpilogStartI != MBB.end());
1764 if (!HasWinCFI)
1765 MBB.erase(EpilogStartI);
1766 }
1767 });
1768
1769 int64_t NumBytes = IsFunclet ? getWinEHFuncletFrameSize(MF)
1770 : MFI.getStackSize();
1771
1772 // All calls are tail calls in GHC calling conv, and functions have no
1773 // prologue/epilogue.
1774 if (MF.getFunction().getCallingConv() == CallingConv::GHC)
1775 return;
1776
1777 // How much of the stack used by incoming arguments this function is expected
1778 // to restore in this particular epilogue.
1779 int64_t ArgumentStackToRestore = getArgumentStackToRestore(MF, MBB);
1780 bool IsWin64 = Subtarget.isCallingConvWin64(MF.getFunction().getCallingConv(),
1781 MF.getFunction().isVarArg());
1782 unsigned FixedObject = getFixedObjectSize(MF, AFI, IsWin64, IsFunclet);
1783
1784 int64_t AfterCSRPopSize = ArgumentStackToRestore;
1785 auto PrologueSaveSize = AFI->getCalleeSavedStackSize() + FixedObject;
1786 // We cannot rely on the local stack size set in emitPrologue if the function
1787 // has funclets, as funclets have different local stack size requirements, and
1788 // the current value set in emitPrologue may be that of the containing
1789 // function.
1790 if (MF.hasEHFunclets())
1791 AFI->setLocalStackSize(NumBytes - PrologueSaveSize);
1792 if (homogeneousPrologEpilog(MF, &MBB)) {
1793 assert(!NeedsWinCFI);
1794 auto FirstHomogenousEpilogI = MBB.getFirstTerminator();
1795 if (FirstHomogenousEpilogI != MBB.begin()) {
1796 auto HomogeneousEpilog = std::prev(FirstHomogenousEpilogI);
1797 if (HomogeneousEpilog->getOpcode() == AArch64::HOM_Epilog)
1798 FirstHomogenousEpilogI = HomogeneousEpilog;
1799 }
1800
1801 // Adjust local stack
1802 emitFrameOffset(MBB, FirstHomogenousEpilogI, DL, AArch64::SP, AArch64::SP,
1804 MachineInstr::FrameDestroy, false, NeedsWinCFI, &HasWinCFI);
1805
1806 // SP has been already adjusted while restoring callee save regs.
1807 // We've already bailed out of the case that adjusts SP for arguments.
1808 assert(AfterCSRPopSize == 0);
1809 return;
1810 }
1811
1812 bool FPAfterSVECalleeSaves =
1813 Subtarget.isTargetWindows() && AFI->getSVECalleeSavedStackSize();
1814
1815 bool CombineSPBump = shouldCombineCSRLocalStackBumpInEpilogue(MBB, NumBytes);
1816 // Assume we can't combine the last pop with the sp restore.
1817 bool CombineAfterCSRBump = false;
1818 if (FPAfterSVECalleeSaves) {
1819 AfterCSRPopSize += FixedObject;
1820 } else if (!CombineSPBump && PrologueSaveSize != 0) {
1821 MachineBasicBlock::iterator Pop = std::prev(MBB.getFirstTerminator());
1822 while (Pop->getOpcode() == TargetOpcode::CFI_INSTRUCTION ||
1823 AArch64InstrInfo::isSEHInstruction(*Pop))
1824 Pop = std::prev(Pop);
1825 // Converting the last ldp to a post-index ldp is valid only if the last
1826 // ldp's offset is 0.
1827 const MachineOperand &OffsetOp = Pop->getOperand(Pop->getNumOperands() - 1);
1828 // If the offset is 0 and the AfterCSR pop is not actually trying to
1829 // allocate more stack for arguments (in space that an untimely interrupt
1830 // may clobber), convert it to a post-index ldp.
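// For example, "ldp x29, x30, [sp]" followed by "add sp, sp, #16" can be
// folded into the single post-indexed "ldp x29, x30, [sp], #16".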
1831 if (OffsetOp.getImm() == 0 && AfterCSRPopSize >= 0) {
1832 convertCalleeSaveRestoreToSPPrePostIncDec(
1833 MBB, Pop, DL, TII, PrologueSaveSize, NeedsWinCFI, &HasWinCFI, EmitCFI,
1834 MachineInstr::FrameDestroy, PrologueSaveSize);
1835 } else {
1836 // If not, make sure to emit an add after the last ldp.
1837 // We're doing this by transferring the size to be restored from the
1838 // adjustment *before* the CSR pops to the adjustment *after* the CSR
1839 // pops.
1840 AfterCSRPopSize += PrologueSaveSize;
1841 CombineAfterCSRBump = true;
1842 }
1843 }
1844
1845 // Move past the restores of the callee-saved registers.
1846 // If we plan on combining the sp bump of the local stack size and the callee
1847 // save stack size, we might need to adjust the CSR save and restore offsets.
1848 MachineBasicBlock::iterator FirstGPRRestoreI = MBB.getFirstTerminator();
1849 MachineBasicBlock::iterator Begin = MBB.begin();
1850 while (FirstGPRRestoreI != Begin) {
1851 --FirstGPRRestoreI;
1852 if (!FirstGPRRestoreI->getFlag(MachineInstr::FrameDestroy) ||
1853 (!FPAfterSVECalleeSaves && isSVECalleeSave(FirstGPRRestoreI))) {
1854 ++FirstGPRRestoreI;
1855 break;
1856 } else if (CombineSPBump)
1857 fixupCalleeSaveRestoreStackOffset(
1858 *FirstGPRRestoreI, AFI->getLocalStackSize(), NeedsWinCFI, &HasWinCFI);
1859 }
1860
1861 if (NeedsWinCFI) {
1862 // Note that there are cases where we insert SEH opcodes in the
1863 // epilogue when we had no SEH opcodes in the prologue. For
1864 // example, when there is no stack frame but there are stack
1865 // arguments. Insert the SEH_EpilogStart and remove it later if
1866 // we didn't emit any SEH opcodes, to avoid generating WinCFI for
1867 // functions that don't need it.
1868 BuildMI(MBB, FirstGPRRestoreI, DL, TII->get(AArch64::SEH_EpilogStart))
1870 EpilogStartI = FirstGPRRestoreI;
1871 --EpilogStartI;
1872 }
1873
1874 if (hasFP(MF) && AFI->hasSwiftAsyncContext()) {
1877 // Avoid the reload as it is GOT relative, and instead fall back to the
1878 // hardcoded value below. This allows a mismatch between the OS and
1879 // application without immediately terminating on the difference.
1880 [[fallthrough]];
1882 // We need to reset FP to its untagged state on return. Bit 60 is
1883 // currently used to show the presence of an extended frame.
1884
1885 // BIC x29, x29, #0x1000_0000_0000_0000
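// Note: 0x10fe is the encoded form of the 64-bit logical immediate
// 0xEFFF'FFFF'FFFF'FFFF (all bits set except bit 60), so the ANDXri below
// clears only the extended-frame marker bit.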
1886 BuildMI(MBB, MBB.getFirstTerminator(), DL, TII->get(AArch64::ANDXri),
1887 AArch64::FP)
1888 .addUse(AArch64::FP)
1889 .addImm(0x10fe)
1891 if (NeedsWinCFI) {
1892 BuildMI(MBB, MBBI, DL, TII->get(AArch64::SEH_Nop))
1894 HasWinCFI = true;
1895 }
1896 break;
1897
1899 break;
1900 }
1901 }
1902
1903 const StackOffset &SVEStackSize = getSVEStackSize(MF);
1904
1905 // If there is a single SP update, insert it before the ret and we're done.
1906 if (CombineSPBump) {
1907 assert(!SVEStackSize && "Cannot combine SP bump with SVE");
1908
1909 // When we are about to restore the CSRs, the CFA register is SP again.
1910 if (EmitCFI && hasFP(MF))
1912 .buildDefCFA(AArch64::SP, NumBytes);
1913
1914 emitFrameOffset(MBB, MBB.getFirstTerminator(), DL, AArch64::SP, AArch64::SP,
1915 StackOffset::getFixed(NumBytes + AfterCSRPopSize), TII,
1916 MachineInstr::FrameDestroy, false, NeedsWinCFI, &HasWinCFI,
1917 EmitCFI, StackOffset::getFixed(NumBytes));
1918 return;
1919 }
1920
1921 NumBytes -= PrologueSaveSize;
1922 assert(NumBytes >= 0 && "Negative stack allocation size!?");
1923
1924 // Process the SVE callee-saves to determine what space needs to be
1925 // deallocated.
1926 StackOffset DeallocateBefore = {}, DeallocateAfter = SVEStackSize;
1927 MachineBasicBlock::iterator RestoreBegin = FirstGPRRestoreI,
1928 RestoreEnd = FirstGPRRestoreI;
1929 if (int64_t CalleeSavedSize = AFI->getSVECalleeSavedStackSize()) {
1930 if (FPAfterSVECalleeSaves)
1931 RestoreEnd = MBB.getFirstTerminator();
1932
1933 RestoreBegin = std::prev(RestoreEnd);
1934 while (RestoreBegin != MBB.begin() &&
1935 isSVECalleeSave(std::prev(RestoreBegin)))
1936 --RestoreBegin;
1937
1938 assert(isSVECalleeSave(RestoreBegin) &&
1939 isSVECalleeSave(std::prev(RestoreEnd)) && "Unexpected instruction");
1940
1941 StackOffset CalleeSavedSizeAsOffset =
1942 StackOffset::getScalable(CalleeSavedSize);
1943 DeallocateBefore = SVEStackSize - CalleeSavedSizeAsOffset;
1944 DeallocateAfter = CalleeSavedSizeAsOffset;
1945 }
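// When SVE callee saves are present, DeallocateBefore covers the SVE locals
// (freed before the SVE callee-save restores) and DeallocateAfter covers the
// SVE callee-save area itself (freed after those restores); otherwise the
// whole SVE area is deallocated via DeallocateAfter.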
1946
1947 // Deallocate the SVE area.
1948 if (FPAfterSVECalleeSaves) {
1949 // If the callee-save area is before FP, restoring the FP implicitly
1950 // deallocates non-callee-save SVE allocations. Otherwise, deallocate
1951 // them explicitly.
1952 if (!AFI->isStackRealigned() && !MFI.hasVarSizedObjects()) {
1953 emitFrameOffset(MBB, FirstGPRRestoreI, DL, AArch64::SP, AArch64::SP,
1954 DeallocateBefore, TII, MachineInstr::FrameDestroy, false,
1955 NeedsWinCFI, &HasWinCFI);
1956 }
1957
1958 // Deallocate callee-save non-SVE registers.
1959 emitFrameOffset(MBB, RestoreBegin, DL, AArch64::SP, AArch64::SP,
1961 MachineInstr::FrameDestroy, false, NeedsWinCFI, &HasWinCFI);
1962
1963 // Deallocate fixed objects.
1964 emitFrameOffset(MBB, RestoreEnd, DL, AArch64::SP, AArch64::SP,
1965 StackOffset::getFixed(FixedObject), TII,
1966 MachineInstr::FrameDestroy, false, NeedsWinCFI, &HasWinCFI);
1967
1968 // Deallocate callee-save SVE registers.
1969 emitFrameOffset(MBB, RestoreEnd, DL, AArch64::SP, AArch64::SP,
1970 DeallocateAfter, TII, MachineInstr::FrameDestroy, false,
1971 NeedsWinCFI, &HasWinCFI);
1972 } else if (SVEStackSize) {
1973 int64_t SVECalleeSavedSize = AFI->getSVECalleeSavedStackSize();
1974 // If we have stack realignment or variable-sized objects we must use the
1975 // FP to restore SVE callee saves (as there is an unknown amount of
1976 // data/padding between the SP and SVE CS area).
1977 Register BaseForSVEDealloc =
1978 (AFI->isStackRealigned() || MFI.hasVarSizedObjects()) ? AArch64::FP
1979 : AArch64::SP;
1980 if (SVECalleeSavedSize && BaseForSVEDealloc == AArch64::FP) {
1981 Register CalleeSaveBase = AArch64::FP;
1982 if (int64_t CalleeSaveBaseOffset =
1984 // If we have a non-zero offset to the non-SVE CS base we need to
1985 // compute the base address by subtracting the offset in a temporary
1986 // register first (to avoid briefly deallocating the SVE CS).
1987 CalleeSaveBase = MBB.getParent()->getRegInfo().createVirtualRegister(
1988 &AArch64::GPR64RegClass);
1989 emitFrameOffset(MBB, RestoreBegin, DL, CalleeSaveBase, AArch64::FP,
1990 StackOffset::getFixed(-CalleeSaveBaseOffset), TII,
1992 }
1993 // The code below will deallocate the stack space by moving the
1994 // SP to the start of the SVE callee-save area.
1995 emitFrameOffset(MBB, RestoreBegin, DL, AArch64::SP, CalleeSaveBase,
1996 StackOffset::getScalable(-SVECalleeSavedSize), TII,
1998 } else if (BaseForSVEDealloc == AArch64::SP) {
1999 if (SVECalleeSavedSize) {
2000 // Deallocate the non-SVE locals first before we can deallocate (and
2001 // restore callee saves) from the SVE area.
2003 MBB, RestoreBegin, DL, AArch64::SP, AArch64::SP,
2005 false, NeedsWinCFI, &HasWinCFI, EmitCFI && !hasFP(MF),
2006 SVEStackSize + StackOffset::getFixed(NumBytes + PrologueSaveSize));
2007 NumBytes = 0;
2008 }
2009
2010 emitFrameOffset(MBB, RestoreBegin, DL, AArch64::SP, AArch64::SP,
2011 DeallocateBefore, TII, MachineInstr::FrameDestroy, false,
2012 NeedsWinCFI, &HasWinCFI, EmitCFI && !hasFP(MF),
2013 SVEStackSize +
2014 StackOffset::getFixed(NumBytes + PrologueSaveSize));
2015
2016 emitFrameOffset(MBB, RestoreEnd, DL, AArch64::SP, AArch64::SP,
2017 DeallocateAfter, TII, MachineInstr::FrameDestroy, false,
2018 NeedsWinCFI, &HasWinCFI, EmitCFI && !hasFP(MF),
2019 DeallocateAfter +
2020 StackOffset::getFixed(NumBytes + PrologueSaveSize));
2021 }
2022 if (EmitCFI)
2023 emitCalleeSavedSVERestores(MBB, RestoreEnd);
2024 }
2025
2026 if (!hasFP(MF)) {
2027 bool RedZone = canUseRedZone(MF);
2028 // If this was a redzone leaf function, we don't need to restore the
2029 // stack pointer (but we may need to pop stack args for fastcc).
2030 if (RedZone && AfterCSRPopSize == 0)
2031 return;
2032
2033 // Pop the local variables off the stack. If there are no callee-saved
2034 // registers, it means we are actually positioned at the terminator and can
2035 // combine stack increment for the locals and the stack increment for
2036 // callee-popped arguments into (possibly) a single instruction and be done.
2037 bool NoCalleeSaveRestore = PrologueSaveSize == 0;
2038 int64_t StackRestoreBytes = RedZone ? 0 : NumBytes;
2039 if (NoCalleeSaveRestore)
2040 StackRestoreBytes += AfterCSRPopSize;
2041
2043 MBB, FirstGPRRestoreI, DL, AArch64::SP, AArch64::SP,
2044 StackOffset::getFixed(StackRestoreBytes), TII,
2045 MachineInstr::FrameDestroy, false, NeedsWinCFI, &HasWinCFI, EmitCFI,
2046 StackOffset::getFixed((RedZone ? 0 : NumBytes) + PrologueSaveSize));
2047
2048 // If we were able to combine the local stack pop with the argument pop,
2049 // then we're done.
2050 if (NoCalleeSaveRestore || AfterCSRPopSize == 0) {
2051 return;
2052 }
2053
2054 NumBytes = 0;
2055 }
2056
2057 // Restore the original stack pointer.
2058 // FIXME: Rather than doing the math here, we should instead just use
2059 // non-post-indexed loads for the restores if we aren't actually going to
2060 // be able to save any instructions.
2061 if (!IsFunclet && (MFI.hasVarSizedObjects() || AFI->isStackRealigned())) {
2063 MBB, FirstGPRRestoreI, DL, AArch64::SP, AArch64::FP,
2065 TII, MachineInstr::FrameDestroy, false, NeedsWinCFI, &HasWinCFI);
2066 } else if (NumBytes)
2067 emitFrameOffset(MBB, FirstGPRRestoreI, DL, AArch64::SP, AArch64::SP,
2068 StackOffset::getFixed(NumBytes), TII,
2069 MachineInstr::FrameDestroy, false, NeedsWinCFI, &HasWinCFI);
2070
2071 // When we are about to restore the CSRs, the CFA register is SP again.
2072 if (EmitCFI && hasFP(MF))
2074 .buildDefCFA(AArch64::SP, PrologueSaveSize);
2075
2076 // This must be placed after the callee-save restore code because that code
2077 // assumes the SP is at the same location as it was after the callee-save save
2078 // code in the prologue.
2079 if (AfterCSRPopSize) {
2080 assert(AfterCSRPopSize > 0 && "attempting to reallocate arg stack that an "
2081 "interrupt may have clobbered");
2082
2084 MBB, MBB.getFirstTerminator(), DL, AArch64::SP, AArch64::SP,
2086 false, NeedsWinCFI, &HasWinCFI, EmitCFI,
2087 StackOffset::getFixed(CombineAfterCSRBump ? PrologueSaveSize : 0));
2088 }
2089}
2090
2093 MF.getInfo<AArch64FunctionInfo>()->needsDwarfUnwindInfo(MF);
2094}
2095
2097 return enableCFIFixup(MF) &&
2098 MF.getInfo<AArch64FunctionInfo>()->needsAsyncDwarfUnwindInfo(MF);
2099}
2100
2101/// getFrameIndexReference - Provide a base+offset reference to an FI slot for
2102/// debug info. It's the same as what we use for resolving the code-gen
2103/// references for now. FIXME: This can go wrong when references are
2104/// SP-relative and simple call frames aren't used.
2107 Register &FrameReg) const {
2109 MF, FI, FrameReg,
2110 /*PreferFP=*/
2111 MF.getFunction().hasFnAttribute(Attribute::SanitizeHWAddress) ||
2112 MF.getFunction().hasFnAttribute(Attribute::SanitizeMemTag),
2113 /*ForSimm=*/false);
2114}
2115
2118 int FI) const {
2119 // This function serves to provide a comparable offset from a single reference
2120 // point (the value of SP at function entry) that can be used for analysis,
2121 // e.g. the stack-frame-layout analysis pass. It is not guaranteed to be
2122 // correct for all objects in the presence of VLA-area objects or dynamic
2123 // stack re-alignment.
2124
2125 const auto &MFI = MF.getFrameInfo();
2126
2127 int64_t ObjectOffset = MFI.getObjectOffset(FI);
2128 StackOffset SVEStackSize = getSVEStackSize(MF);
2129
2130 // For VLA-area objects, just emit an offset at the end of the stack frame.
2131 // Whilst not quite correct, these objects do live at the end of the frame and
2132 // so it is more useful for analysis if the offset reflects this.
2133 if (MFI.isVariableSizedObjectIndex(FI)) {
2134 return StackOffset::getFixed(-((int64_t)MFI.getStackSize())) - SVEStackSize;
2135 }
2136
2137 // This is correct in the absence of any SVE stack objects.
2138 if (!SVEStackSize)
2139 return StackOffset::getFixed(ObjectOffset - getOffsetOfLocalArea());
2140
2141 const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
2142 bool FPAfterSVECalleeSaves =
2144 if (MFI.getStackID(FI) == TargetStackID::ScalableVector) {
2145 if (FPAfterSVECalleeSaves &&
2146 -ObjectOffset <= (int64_t)AFI->getSVECalleeSavedStackSize())
2147 return StackOffset::getScalable(ObjectOffset);
2148 return StackOffset::get(-((int64_t)AFI->getCalleeSavedStackSize()),
2149 ObjectOffset);
2150 }
2151
2152 bool IsFixed = MFI.isFixedObjectIndex(FI);
2153 bool IsCSR =
2154 !IsFixed && ObjectOffset >= -((int)AFI->getCalleeSavedStackSize(MFI));
2155
2156 StackOffset ScalableOffset = {};
2157 if (!IsFixed && !IsCSR) {
2158 ScalableOffset = -SVEStackSize;
2159 } else if (FPAfterSVECalleeSaves && IsCSR) {
2160 ScalableOffset =
2162 }
2163
2164 return StackOffset::getFixed(ObjectOffset) + ScalableOffset;
2165}
2166
2172
2173StackOffset AArch64FrameLowering::getFPOffset(const MachineFunction &MF,
2174 int64_t ObjectOffset) const {
2175 const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
2176 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
2177 const Function &F = MF.getFunction();
2178 bool IsWin64 = Subtarget.isCallingConvWin64(F.getCallingConv(), F.isVarArg());
2179 unsigned FixedObject =
2180 getFixedObjectSize(MF, AFI, IsWin64, /*IsFunclet=*/false);
2181 int64_t CalleeSaveSize = AFI->getCalleeSavedStackSize(MF.getFrameInfo());
2182 int64_t FPAdjust =
2183 CalleeSaveSize - AFI->getCalleeSaveBaseToFrameRecordOffset();
2184 return StackOffset::getFixed(ObjectOffset + FixedObject + FPAdjust);
2185}
2186
2187StackOffset AArch64FrameLowering::getStackOffset(const MachineFunction &MF,
2188 int64_t ObjectOffset) const {
2189 const auto &MFI = MF.getFrameInfo();
2190 return StackOffset::getFixed(ObjectOffset + (int64_t)MFI.getStackSize());
2191}
2192
2193// TODO: This function currently does not work for scalable vectors.
2195 int FI) const {
2196 const auto *RegInfo = static_cast<const AArch64RegisterInfo *>(
2197 MF.getSubtarget().getRegisterInfo());
2198 int ObjectOffset = MF.getFrameInfo().getObjectOffset(FI);
2199 return RegInfo->getLocalAddressRegister(MF) == AArch64::FP
2200 ? getFPOffset(MF, ObjectOffset).getFixed()
2201 : getStackOffset(MF, ObjectOffset).getFixed();
2202}
2203
2205 const MachineFunction &MF, int FI, Register &FrameReg, bool PreferFP,
2206 bool ForSimm) const {
2207 const auto &MFI = MF.getFrameInfo();
2208 int64_t ObjectOffset = MFI.getObjectOffset(FI);
2209 bool isFixed = MFI.isFixedObjectIndex(FI);
2210 bool isSVE = MFI.getStackID(FI) == TargetStackID::ScalableVector;
2211 return resolveFrameOffsetReference(MF, ObjectOffset, isFixed, isSVE, FrameReg,
2212 PreferFP, ForSimm);
2213}
2214
2216 const MachineFunction &MF, int64_t ObjectOffset, bool isFixed, bool isSVE,
2217 Register &FrameReg, bool PreferFP, bool ForSimm) const {
2218 const auto &MFI = MF.getFrameInfo();
2219 const auto *RegInfo = static_cast<const AArch64RegisterInfo *>(
2220 MF.getSubtarget().getRegisterInfo());
2221 const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
2222 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
2223
2224 int64_t FPOffset = getFPOffset(MF, ObjectOffset).getFixed();
2225 int64_t Offset = getStackOffset(MF, ObjectOffset).getFixed();
2226 bool isCSR =
2227 !isFixed && ObjectOffset >= -((int)AFI->getCalleeSavedStackSize(MFI));
2228
2229 const StackOffset &SVEStackSize = getSVEStackSize(MF);
2230
2231 // Use frame pointer to reference fixed objects. Use it for locals if
2232 // there are VLAs or a dynamically realigned SP (and thus the SP isn't
2233 // reliable as a base). Make sure useFPForScavengingIndex() does the
2234 // right thing for the emergency spill slot.
2235 bool UseFP = false;
2236 if (AFI->hasStackFrame() && !isSVE) {
2237 // We shouldn't prefer using the FP to access fixed-sized stack objects when
2238 // there are scalable (SVE) objects in between the FP and the fixed-sized
2239 // objects.
2240 PreferFP &= !SVEStackSize;
2241
2242 // Note: Keeping the following as multiple 'if' statements rather than
2243 // merging to a single expression for readability.
2244 //
2245 // Argument access should always use the FP.
2246 if (isFixed) {
2247 UseFP = hasFP(MF);
2248 } else if (isCSR && RegInfo->hasStackRealignment(MF)) {
2249 // References to the CSR area must use FP if we're re-aligning the stack
2250 // since the dynamically-sized alignment padding is between the SP/BP and
2251 // the CSR area.
2252 assert(hasFP(MF) && "Re-aligned stack must have frame pointer");
2253 UseFP = true;
2254 } else if (hasFP(MF) && !RegInfo->hasStackRealignment(MF)) {
2255 // If the FPOffset is negative and we're producing a signed immediate, we
2256 // have to keep in mind that the available offset range for negative
2257 // offsets is smaller than for positive ones. If an offset is available
2258 // via the FP and the SP, use whichever is closest.
2259 bool FPOffsetFits = !ForSimm || FPOffset >= -256;
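// -256 is the lower bound of the signed 9-bit immediate used by the
// unscaled LDUR/STUR addressing forms, the most restrictive case when a
// signed immediate (ForSimm) is required.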
2260 PreferFP |= Offset > -FPOffset && !SVEStackSize;
2261
2262 if (FPOffset >= 0) {
2263 // If the FPOffset is positive, that'll always be best, as the SP/BP
2264 // will be even further away.
2265 UseFP = true;
2266 } else if (MFI.hasVarSizedObjects()) {
2267 // If we have variable sized objects, we can use either FP or BP, as the
2268 // SP offset is unknown. We can use the base pointer if we have one and
2269 // FP is not preferred. If not, we're stuck with using FP.
2270 bool CanUseBP = RegInfo->hasBasePointer(MF);
2271 if (FPOffsetFits && CanUseBP) // Both are ok. Pick the best.
2272 UseFP = PreferFP;
2273 else if (!CanUseBP) // Can't use BP. Forced to use FP.
2274 UseFP = true;
2275 // else we can use BP and FP, but the offset from FP won't fit.
2276 // That will make us scavenge registers which we can probably avoid by
2277 // using BP. If it won't fit for BP either, we'll scavenge anyway.
2278 } else if (MF.hasEHFunclets() && !RegInfo->hasBasePointer(MF)) {
2279 // Funclets access the locals contained in the parent's stack frame
2280 // via the frame pointer, so we have to use the FP in the parent
2281 // function.
2282 (void) Subtarget;
2283 assert(Subtarget.isCallingConvWin64(MF.getFunction().getCallingConv(),
2284 MF.getFunction().isVarArg()) &&
2285 "Funclets should only be present on Win64");
2286 UseFP = true;
2287 } else {
2288 // We have the choice between FP and (SP or BP).
2289 if (FPOffsetFits && PreferFP) // If FP is the best fit, use it.
2290 UseFP = true;
2291 }
2292 }
2293 }
2294
2295 assert(
2296 ((isFixed || isCSR) || !RegInfo->hasStackRealignment(MF) || !UseFP) &&
2297 "In the presence of dynamic stack pointer realignment, "
2298 "non-argument/CSR objects cannot be accessed through the frame pointer");
2299
2300 bool FPAfterSVECalleeSaves =
2302
2303 if (isSVE) {
2304 StackOffset FPOffset =
2306 StackOffset SPOffset =
2307 SVEStackSize +
2308 StackOffset::get(MFI.getStackSize() - AFI->getCalleeSavedStackSize(),
2309 ObjectOffset);
2310 if (FPAfterSVECalleeSaves) {
2312 if (-ObjectOffset <= (int64_t)AFI->getSVECalleeSavedStackSize()) {
2315 }
2316 }
2317 // Always use the FP for SVE spills if available and beneficial.
2318 if (hasFP(MF) && (SPOffset.getFixed() ||
2319 FPOffset.getScalable() < SPOffset.getScalable() ||
2320 RegInfo->hasStackRealignment(MF))) {
2321 FrameReg = RegInfo->getFrameRegister(MF);
2322 return FPOffset;
2323 }
2324
2325 FrameReg = RegInfo->hasBasePointer(MF) ? RegInfo->getBaseRegister()
2326 : (unsigned)AArch64::SP;
2327 return SPOffset;
2328 }
2329
2330 StackOffset ScalableOffset = {};
2331 if (FPAfterSVECalleeSaves) {
2332 // In this stack layout, the FP is in between the callee saves and other
2333 // SVE allocations.
2334 StackOffset SVECalleeSavedStack =
2336 if (UseFP) {
2337 if (isFixed)
2338 ScalableOffset = SVECalleeSavedStack;
2339 else if (!isCSR)
2340 ScalableOffset = SVECalleeSavedStack - SVEStackSize;
2341 } else {
2342 if (isFixed)
2343 ScalableOffset = SVEStackSize;
2344 else if (isCSR)
2345 ScalableOffset = SVEStackSize - SVECalleeSavedStack;
2346 }
2347 } else {
2348 if (UseFP && !(isFixed || isCSR))
2349 ScalableOffset = -SVEStackSize;
2350 if (!UseFP && (isFixed || isCSR))
2351 ScalableOffset = SVEStackSize;
2352 }
2353
2354 if (UseFP) {
2355 FrameReg = RegInfo->getFrameRegister(MF);
2356 return StackOffset::getFixed(FPOffset) + ScalableOffset;
2357 }
2358
2359 // Use the base pointer if we have one.
2360 if (RegInfo->hasBasePointer(MF))
2361 FrameReg = RegInfo->getBaseRegister();
2362 else {
2363 assert(!MFI.hasVarSizedObjects() &&
2364 "Can't use SP when we have var sized objects.");
2365 FrameReg = AArch64::SP;
2366 // If we're using the red zone for this function, the SP won't actually
2367 // be adjusted, so the offsets will be negative. They're also all
2368 // within range of the signed 9-bit immediate instructions.
2369 if (canUseRedZone(MF))
2370 Offset -= AFI->getLocalStackSize();
2371 }
2372
2373 return StackOffset::getFixed(Offset) + ScalableOffset;
2374}
2375
2376static unsigned getPrologueDeath(MachineFunction &MF, unsigned Reg) {
2377 // Do not set a kill flag on values that are also marked as live-in. This
2378 // happens with the @llvm.returnaddress intrinsic and with arguments passed in
2379 // callee saved registers.
2380 // Omitting the kill flags is conservatively correct even if the live-in
2381 // is not used after all.
2382 bool IsLiveIn = MF.getRegInfo().isLiveIn(Reg);
2383 return getKillRegState(!IsLiveIn);
2384}
2385
2387 MachineFunction &MF) {
2388 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
2389 AttributeList Attrs = MF.getFunction().getAttributes();
2391 return Subtarget.isTargetMachO() &&
2392 !(Subtarget.getTargetLowering()->supportSwiftError() &&
2393 Attrs.hasAttrSomewhere(Attribute::SwiftError)) &&
2395 !AFL.requiresSaveVG(MF) && !AFI->isSVECC();
2396}
2397
2398static bool invalidateWindowsRegisterPairing(unsigned Reg1, unsigned Reg2,
2399 bool NeedsWinCFI, bool IsFirst,
2400 const TargetRegisterInfo *TRI) {
2401 // If we are generating register pairs for a Windows function that requires
2402 // EH support, then pair consecutive registers only. There are no unwind
2403 // opcodes for saves/restores of non-consecutive register pairs.
2404 // The unwind opcodes are save_regp, save_regp_x, save_fregp, save_fregp_x,
2405 // save_lrpair.
2406 // https://docs.microsoft.com/en-us/cpp/build/arm64-exception-handling
2407
2408 if (Reg2 == AArch64::FP)
2409 return true;
2410 if (!NeedsWinCFI)
2411 return false;
2412 if (TRI->getEncodingValue(Reg2) == TRI->getEncodingValue(Reg1) + 1)
2413 return false;
2414 // If pairing a GPR with LR, the pair can be described by the save_lrpair
2415 // opcode. If this is the first register pair, it would end up with a
2416 // predecrement, but there's no save_lrpair_x opcode, so we can only do this
2417 // if LR is paired with something else than the first register.
2418 // The save_lrpair opcode requires the first register to be an odd one.
2419 if (Reg1 >= AArch64::X19 && Reg1 <= AArch64::X27 &&
2420 (Reg1 - AArch64::X19) % 2 == 0 && Reg2 == AArch64::LR && !IsFirst)
2421 return false;
2422 return true;
2423}
2424
2425/// Returns true if Reg1 and Reg2 cannot be paired using a ldp/stp instruction.
2426/// WindowsCFI requires that only consecutive registers can be paired.
2427/// LR and FP need to be allocated together when the frame needs to save
2428/// the frame-record. This means any other register pairing with LR is invalid.
2429static bool invalidateRegisterPairing(unsigned Reg1, unsigned Reg2,
2430 bool UsesWinAAPCS, bool NeedsWinCFI,
2431 bool NeedsFrameRecord, bool IsFirst,
2432 const TargetRegisterInfo *TRI) {
2433 if (UsesWinAAPCS)
2434 return invalidateWindowsRegisterPairing(Reg1, Reg2, NeedsWinCFI, IsFirst,
2435 TRI);
2436
2437 // If we need to store the frame record, don't pair any register
2438 // with LR other than FP.
2439 if (NeedsFrameRecord)
2440 return Reg2 == AArch64::LR;
2441
2442 return false;
2443}
2444
2445namespace {
2446
2447struct RegPairInfo {
2448 unsigned Reg1 = AArch64::NoRegister;
2449 unsigned Reg2 = AArch64::NoRegister;
2450 int FrameIdx;
2451 int Offset;
2452 enum RegType { GPR, FPR64, FPR128, PPR, ZPR, VG } Type;
2453 const TargetRegisterClass *RC;
2454
2455 RegPairInfo() = default;
2456
2457 bool isPaired() const { return Reg2 != AArch64::NoRegister; }
2458
2459 bool isScalable() const { return Type == PPR || Type == ZPR; }
2460};
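// Note: Offset is stored in units of the spill size of RC, i.e. it is the
// scaled immediate used directly by the LDP/STP (or SVE fill/spill)
// instructions emitted for this pair.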
2461
2462} // end anonymous namespace
2463
2464unsigned findFreePredicateReg(BitVector &SavedRegs) {
2465 for (unsigned PReg = AArch64::P8; PReg <= AArch64::P15; ++PReg) {
2466 if (SavedRegs.test(PReg)) {
2467 unsigned PNReg = PReg - AArch64::P0 + AArch64::PN0;
2468 return PNReg;
2469 }
2470 }
2471 return AArch64::NoRegister;
2472}
2473
2474 // The multi-vector LD/ST instructions are available only for SME or SVE2p1 targets.
2476 MachineFunction &MF) {
2478 return false;
2479
2480 SMEAttrs FuncAttrs = MF.getInfo<AArch64FunctionInfo>()->getSMEFnAttrs();
2481 bool IsLocallyStreaming =
2482 FuncAttrs.hasStreamingBody() && !FuncAttrs.hasStreamingInterface();
2483
2484 // SME2 instructions can only be used safely when in streaming mode.
2485 // It is not safe to use them when in streaming-compatible or
2486 // locally streaming mode.
2487 return Subtarget.hasSVE2p1() ||
2488 (Subtarget.hasSME2() &&
2489 (!IsLocallyStreaming && Subtarget.isStreaming()));
2490}
2491
2493 MachineFunction &MF,
2495 const TargetRegisterInfo *TRI,
2497 bool NeedsFrameRecord) {
2498
2499 if (CSI.empty())
2500 return;
2501
2502 bool IsWindows = isTargetWindows(MF);
2503 bool NeedsWinCFI = AFL.needsWinCFI(MF);
2505 unsigned StackHazardSize = getStackHazardSize(MF);
2506 MachineFrameInfo &MFI = MF.getFrameInfo();
2508 unsigned Count = CSI.size();
2509 (void)CC;
2510 // MachO's compact unwind format relies on all registers being stored in
2511 // pairs.
2512 assert((!produceCompactUnwindFrame(AFL, MF) ||
2515 (Count & 1) == 0) &&
2516 "Odd number of callee-saved regs to spill!");
2517 int ByteOffset = AFI->getCalleeSavedStackSize();
2518 int StackFillDir = -1;
2519 int RegInc = 1;
2520 unsigned FirstReg = 0;
2521 if (NeedsWinCFI) {
2522 // For WinCFI, fill the stack from the bottom up.
2523 ByteOffset = 0;
2524 StackFillDir = 1;
2525 // As the CSI array is reversed to match PrologEpilogInserter, iterate
2526 // backwards, to pair up registers starting from lower numbered registers.
2527 RegInc = -1;
2528 FirstReg = Count - 1;
2529 }
2530 bool FPAfterSVECalleeSaves = IsWindows && AFI->getSVECalleeSavedStackSize();
2531 int ScalableByteOffset =
2532 FPAfterSVECalleeSaves ? 0 : AFI->getSVECalleeSavedStackSize();
2533 bool NeedGapToAlignStack = AFI->hasCalleeSaveStackFreeSpace();
2534 Register LastReg = 0;
2535
2536 // When iterating backwards, the loop condition relies on unsigned wraparound.
2537 for (unsigned i = FirstReg; i < Count; i += RegInc) {
2538 RegPairInfo RPI;
2539 RPI.Reg1 = CSI[i].getReg();
2540
2541 if (AArch64::GPR64RegClass.contains(RPI.Reg1)) {
2542 RPI.Type = RegPairInfo::GPR;
2543 RPI.RC = &AArch64::GPR64RegClass;
2544 } else if (AArch64::FPR64RegClass.contains(RPI.Reg1)) {
2545 RPI.Type = RegPairInfo::FPR64;
2546 RPI.RC = &AArch64::FPR64RegClass;
2547 } else if (AArch64::FPR128RegClass.contains(RPI.Reg1)) {
2548 RPI.Type = RegPairInfo::FPR128;
2549 RPI.RC = &AArch64::FPR128RegClass;
2550 } else if (AArch64::ZPRRegClass.contains(RPI.Reg1)) {
2551 RPI.Type = RegPairInfo::ZPR;
2552 RPI.RC = &AArch64::ZPRRegClass;
2553 } else if (AArch64::PPRRegClass.contains(RPI.Reg1)) {
2554 RPI.Type = RegPairInfo::PPR;
2555 RPI.RC = &AArch64::PPRRegClass;
2556 } else if (RPI.Reg1 == AArch64::VG) {
2557 RPI.Type = RegPairInfo::VG;
2558 RPI.RC = &AArch64::FIXED_REGSRegClass;
2559 } else {
2560 llvm_unreachable("Unsupported register class.");
2561 }
2562
2563 // Add the stack hazard size as we transition from GPR->FPR CSRs.
2564 if (AFI->hasStackHazardSlotIndex() &&
2565 (!LastReg || !AArch64InstrInfo::isFpOrNEON(LastReg)) &&
2567 ByteOffset += StackFillDir * StackHazardSize;
2568 LastReg = RPI.Reg1;
2569
2570 int Scale = TRI->getSpillSize(*RPI.RC);
2571 // Add the next reg to the pair if it is in the same register class.
2572 if (unsigned(i + RegInc) < Count && !AFI->hasStackHazardSlotIndex()) {
2573 MCRegister NextReg = CSI[i + RegInc].getReg();
2574 bool IsFirst = i == FirstReg;
2575 switch (RPI.Type) {
2576 case RegPairInfo::GPR:
2577 if (AArch64::GPR64RegClass.contains(NextReg) &&
2578 !invalidateRegisterPairing(RPI.Reg1, NextReg, IsWindows,
2579 NeedsWinCFI, NeedsFrameRecord, IsFirst,
2580 TRI))
2581 RPI.Reg2 = NextReg;
2582 break;
2583 case RegPairInfo::FPR64:
2584 if (AArch64::FPR64RegClass.contains(NextReg) &&
2585 !invalidateWindowsRegisterPairing(RPI.Reg1, NextReg, NeedsWinCFI,
2586 IsFirst, TRI))
2587 RPI.Reg2 = NextReg;
2588 break;
2589 case RegPairInfo::FPR128:
2590 if (AArch64::FPR128RegClass.contains(NextReg))
2591 RPI.Reg2 = NextReg;
2592 break;
2593 case RegPairInfo::PPR:
2594 break;
2595 case RegPairInfo::ZPR:
2596 if (AFI->getPredicateRegForFillSpill() != 0 &&
2597 ((RPI.Reg1 - AArch64::Z0) & 1) == 0 && (NextReg == RPI.Reg1 + 1)) {
2598 // Calculate offset of register pair to see if pair instruction can be
2599 // used.
2600 int Offset = (ScalableByteOffset + StackFillDir * 2 * Scale) / Scale;
2601 if ((-16 <= Offset && Offset <= 14) && (Offset % 2 == 0))
2602 RPI.Reg2 = NextReg;
2603 }
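// The paired ST1B/LD1B (2Z) spill/fill forms only accept an even scaled
// offset in the range [-16, 14], hence the range and parity check above.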
2604 break;
2605 case RegPairInfo::VG:
2606 break;
2607 }
2608 }
2609
2610 // GPRs and FPRs are saved in pairs of 64-bit regs. We expect the CSI
2611 // list to come in sorted by frame index so that we can issue the store
2612 // pair instructions directly. Assert if we see anything otherwise.
2613 //
2614 // The order of the registers in the list is controlled by
2615 // getCalleeSavedRegs(), so they will always be in-order, as well.
2616 assert((!RPI.isPaired() ||
2617 (CSI[i].getFrameIdx() + RegInc == CSI[i + RegInc].getFrameIdx())) &&
2618 "Out of order callee saved regs!");
2619
2620 assert((!RPI.isPaired() || !NeedsFrameRecord || RPI.Reg2 != AArch64::FP ||
2621 RPI.Reg1 == AArch64::LR) &&
2622 "FrameRecord must be allocated together with LR");
2623
2624 // Windows AAPCS has FP and LR reversed.
2625 assert((!RPI.isPaired() || !NeedsFrameRecord || RPI.Reg1 != AArch64::FP ||
2626 RPI.Reg2 == AArch64::LR) &&
2627 "FrameRecord must be allocated together with LR");
2628
2629 // MachO's compact unwind format relies on all registers being stored in
2630 // adjacent register pairs.
2631 assert((!produceCompactUnwindFrame(AFL, MF) ||
2634 (RPI.isPaired() &&
2635 ((RPI.Reg1 == AArch64::LR && RPI.Reg2 == AArch64::FP) ||
2636 RPI.Reg1 + 1 == RPI.Reg2))) &&
2637 "Callee-save registers not saved as adjacent register pair!");
2638
2639 RPI.FrameIdx = CSI[i].getFrameIdx();
2640 if (NeedsWinCFI &&
2641 RPI.isPaired()) // RPI.FrameIdx must be the lower index of the pair
2642 RPI.FrameIdx = CSI[i + RegInc].getFrameIdx();
2643
2644 // Realign the scalable offset if necessary. This is relevant when
2645 // spilling predicates on Windows.
2646 if (RPI.isScalable() && ScalableByteOffset % Scale != 0) {
2647 ScalableByteOffset = alignTo(ScalableByteOffset, Scale);
2648 }
2649
2650 int OffsetPre = RPI.isScalable() ? ScalableByteOffset : ByteOffset;
2651 assert(OffsetPre % Scale == 0);
2652
2653 if (RPI.isScalable())
2654 ScalableByteOffset += StackFillDir * (RPI.isPaired() ? 2 * Scale : Scale);
2655 else
2656 ByteOffset += StackFillDir * (RPI.isPaired() ? 2 * Scale : Scale);
2657
2658 // Swift's async context is directly before FP, so allocate an extra
2659 // 8 bytes for it.
2660 if (NeedsFrameRecord && AFI->hasSwiftAsyncContext() &&
2661 ((!IsWindows && RPI.Reg2 == AArch64::FP) ||
2662 (IsWindows && RPI.Reg2 == AArch64::LR)))
2663 ByteOffset += StackFillDir * 8;
2664
2665 // Round up size of non-pair to pair size if we need to pad the
2666 // callee-save area to ensure 16-byte alignment.
2667 if (NeedGapToAlignStack && !NeedsWinCFI && !RPI.isScalable() &&
2668 RPI.Type != RegPairInfo::FPR128 && !RPI.isPaired() &&
2669 ByteOffset % 16 != 0) {
2670 ByteOffset += 8 * StackFillDir;
2671 assert(MFI.getObjectAlign(RPI.FrameIdx) <= Align(16));
2672 // A stack frame with a gap looks like this, bottom up:
2673 // d9, d8. x21, gap, x20, x19.
2674 // Set extra alignment on the x21 object to create the gap above it.
2675 MFI.setObjectAlignment(RPI.FrameIdx, Align(16));
2676 NeedGapToAlignStack = false;
2677 }
2678
2679 int OffsetPost = RPI.isScalable() ? ScalableByteOffset : ByteOffset;
2680 assert(OffsetPost % Scale == 0);
2681 // If filling top down (default), we want the offset after incrementing it.
2682 // If filling bottom up (WinCFI) we need the original offset.
2683 int Offset = NeedsWinCFI ? OffsetPre : OffsetPost;
2684
2685 // The FP, LR pair goes 8 bytes into our expanded 24-byte slot so that the
2686 // Swift context can directly precede FP.
2687 if (NeedsFrameRecord && AFI->hasSwiftAsyncContext() &&
2688 ((!IsWindows && RPI.Reg2 == AArch64::FP) ||
2689 (IsWindows && RPI.Reg2 == AArch64::LR)))
2690 Offset += 8;
2691 RPI.Offset = Offset / Scale;
2692
2693 assert((!RPI.isPaired() ||
2694 (!RPI.isScalable() && RPI.Offset >= -64 && RPI.Offset <= 63) ||
2695 (RPI.isScalable() && RPI.Offset >= -256 && RPI.Offset <= 255)) &&
2696 "Offset out of bounds for LDP/STP immediate");
2697
2698 auto isFrameRecord = [&] {
2699 if (RPI.isPaired())
2700 return IsWindows ? RPI.Reg1 == AArch64::FP && RPI.Reg2 == AArch64::LR
2701 : RPI.Reg1 == AArch64::LR && RPI.Reg2 == AArch64::FP;
2702 // Otherwise, look for the frame record as two unpaired registers. This is
2703 // needed for -aarch64-stack-hazard-size=<val>, which disables register
2704 // pairing (as the padding may be too large for the LDP/STP offset). Note:
2705 // On Windows, this check works out as current reg == FP, next reg == LR,
2706 // and on other platforms current reg == FP, previous reg == LR. This
2707 // works out as the correct pre-increment or post-increment offsets
2708 // respectively.
2709 return i > 0 && RPI.Reg1 == AArch64::FP &&
2710 CSI[i - 1].getReg() == AArch64::LR;
2711 };
2712
2713 // Save the offset to frame record so that the FP register can point to the
2714 // innermost frame record (spilled FP and LR registers).
2715 if (NeedsFrameRecord && isFrameRecord())
2717
2718 RegPairs.push_back(RPI);
2719 if (RPI.isPaired())
2720 i += RegInc;
2721 }
2722 if (NeedsWinCFI) {
2723 // If we need an alignment gap in the stack, align the topmost stack
2724 // object. A stack frame with a gap looks like this, bottom up:
2725 // x19, d8. d9, gap.
2726 // Set extra alignment on the topmost stack object (the first element in
2727 // CSI, which goes top down), to create the gap above it.
2728 if (AFI->hasCalleeSaveStackFreeSpace())
2729 MFI.setObjectAlignment(CSI[0].getFrameIdx(), Align(16));
2730 // We iterated bottom up over the registers; flip RegPairs back to top
2731 // down order.
2732 std::reverse(RegPairs.begin(), RegPairs.end());
2733 }
2734}
2735
2739 MachineFunction &MF = *MBB.getParent();
2740 auto &TLI = *MF.getSubtarget<AArch64Subtarget>().getTargetLowering();
2742 bool NeedsWinCFI = needsWinCFI(MF);
2743 DebugLoc DL;
2745
2746 computeCalleeSaveRegisterPairs(*this, MF, CSI, TRI, RegPairs, hasFP(MF));
2747
2749 // Refresh the reserved regs in case there are any potential changes since the
2750 // last freeze.
2751 MRI.freezeReservedRegs();
2752
2753 if (homogeneousPrologEpilog(MF)) {
2754 auto MIB = BuildMI(MBB, MI, DL, TII.get(AArch64::HOM_Prolog))
2756
2757 for (auto &RPI : RegPairs) {
2758 MIB.addReg(RPI.Reg1);
2759 MIB.addReg(RPI.Reg2);
2760
2761 // Update register live in.
2762 if (!MRI.isReserved(RPI.Reg1))
2763 MBB.addLiveIn(RPI.Reg1);
2764 if (RPI.isPaired() && !MRI.isReserved(RPI.Reg2))
2765 MBB.addLiveIn(RPI.Reg2);
2766 }
2767 return true;
2768 }
2769 bool PTrueCreated = false;
2770 for (const RegPairInfo &RPI : llvm::reverse(RegPairs)) {
2771 unsigned Reg1 = RPI.Reg1;
2772 unsigned Reg2 = RPI.Reg2;
2773 unsigned StrOpc;
2774
2775 // Issue sequence of spills for cs regs. The first spill may be converted
2776 // to a pre-decrement store later by emitPrologue if the callee-save stack
2777 // area allocation can't be combined with the local stack area allocation.
2778 // For example:
2779 // stp x22, x21, [sp, #0] // addImm(+0)
2780 // stp x20, x19, [sp, #16] // addImm(+2)
2781 // stp fp, lr, [sp, #32] // addImm(+4)
2782 // Rationale: This sequence saves uop updates compared to a sequence of
2783 // pre-increment spills like stp xi,xj,[sp,#-16]!
2784 // Note: Similar rationale and sequence for restores in epilog.
2785 unsigned Size = TRI->getSpillSize(*RPI.RC);
2786 Align Alignment = TRI->getSpillAlign(*RPI.RC);
2787 switch (RPI.Type) {
2788 case RegPairInfo::GPR:
2789 StrOpc = RPI.isPaired() ? AArch64::STPXi : AArch64::STRXui;
2790 break;
2791 case RegPairInfo::FPR64:
2792 StrOpc = RPI.isPaired() ? AArch64::STPDi : AArch64::STRDui;
2793 break;
2794 case RegPairInfo::FPR128:
2795 StrOpc = RPI.isPaired() ? AArch64::STPQi : AArch64::STRQui;
2796 break;
2797 case RegPairInfo::ZPR:
2798 StrOpc = RPI.isPaired() ? AArch64::ST1B_2Z_IMM : AArch64::STR_ZXI;
2799 break;
2800 case RegPairInfo::PPR:
2801 StrOpc =
2802 Size == 16 ? AArch64::SPILL_PPR_TO_ZPR_SLOT_PSEUDO : AArch64::STR_PXI;
2803 break;
2804 case RegPairInfo::VG:
2805 StrOpc = AArch64::STRXui;
2806 break;
2807 }
2808
2809 unsigned X0Scratch = AArch64::NoRegister;
2810 auto RestoreX0 = make_scope_exit([&] {
2811 if (X0Scratch != AArch64::NoRegister)
2812 BuildMI(MBB, MI, DL, TII.get(TargetOpcode::COPY), AArch64::X0)
2813 .addReg(X0Scratch)
2815 });
2816
2817 if (Reg1 == AArch64::VG) {
2818 // Find an available register to store value of VG to.
2819 Reg1 = findScratchNonCalleeSaveRegister(&MBB, true);
2820 assert(Reg1 != AArch64::NoRegister);
2821 if (MF.getSubtarget<AArch64Subtarget>().hasSVE()) {
2822 BuildMI(MBB, MI, DL, TII.get(AArch64::CNTD_XPiI), Reg1)
2823 .addImm(31)
2824 .addImm(1)
2826 } else {
2828 if (any_of(MBB.liveins(),
2829 [&STI](const MachineBasicBlock::RegisterMaskPair &LiveIn) {
2830 return STI.getRegisterInfo()->isSuperOrSubRegisterEq(
2831 AArch64::X0, LiveIn.PhysReg);
2832 })) {
2833 X0Scratch = Reg1;
2834 BuildMI(MBB, MI, DL, TII.get(TargetOpcode::COPY), X0Scratch)
2835 .addReg(AArch64::X0)
2837 }
2838
2839 RTLIB::Libcall LC = RTLIB::SMEABI_GET_CURRENT_VG;
2840 const uint32_t *RegMask =
2841 TRI->getCallPreservedMask(MF, TLI.getLibcallCallingConv(LC));
2842 BuildMI(MBB, MI, DL, TII.get(AArch64::BL))
2844 .addRegMask(RegMask)
2845 .addReg(AArch64::X0, RegState::ImplicitDefine)
2847 Reg1 = AArch64::X0;
2848 }
2849 }
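// Note: when SVE is not available, the VG value spilled here is obtained
// above via the SMEABI_GET_CURRENT_VG library call, which returns it in X0;
// if X0 was live-in it is preserved in X0Scratch and restored by the
// scope-exit helper.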
2850
2851 LLVM_DEBUG(dbgs() << "CSR spill: (" << printReg(Reg1, TRI);
2852 if (RPI.isPaired()) dbgs() << ", " << printReg(Reg2, TRI);
2853 dbgs() << ") -> fi#(" << RPI.FrameIdx;
2854 if (RPI.isPaired()) dbgs() << ", " << RPI.FrameIdx + 1;
2855 dbgs() << ")\n");
2856
2857 assert((!NeedsWinCFI || !(Reg1 == AArch64::LR && Reg2 == AArch64::FP)) &&
2858 "Windows unwdinding requires a consecutive (FP,LR) pair");
2859 // Windows unwind codes require consecutive registers if registers are
2860 // paired. Make the switch here, so that the code below will save (x,x+1)
2861 // and not (x+1,x).
2862 unsigned FrameIdxReg1 = RPI.FrameIdx;
2863 unsigned FrameIdxReg2 = RPI.FrameIdx + 1;
2864 if (NeedsWinCFI && RPI.isPaired()) {
2865 std::swap(Reg1, Reg2);
2866 std::swap(FrameIdxReg1, FrameIdxReg2);
2867 }
2868
2869 if (RPI.isPaired() && RPI.isScalable()) {
2870 [[maybe_unused]] const AArch64Subtarget &Subtarget =
2873 unsigned PnReg = AFI->getPredicateRegForFillSpill();
2874 assert((PnReg != 0 && enableMultiVectorSpillFill(Subtarget, MF)) &&
2875 "Expects SVE2.1 or SME2 target and a predicate register");
2876#ifdef EXPENSIVE_CHECKS
2877 auto IsPPR = [](const RegPairInfo &c) {
2878 return c.Reg1 == RegPairInfo::PPR;
2879 };
2880 auto PPRBegin = std::find_if(RegPairs.begin(), RegPairs.end(), IsPPR);
2881 auto IsZPR = [](const RegPairInfo &c) {
2882 return c.Type == RegPairInfo::ZPR;
2883 };
2884 auto ZPRBegin = std::find_if(RegPairs.begin(), RegPairs.end(), IsZPR);
2885 assert(!(PPRBegin < ZPRBegin) &&
2886 "Expected callee save predicate to be handled first");
2887#endif
2888 if (!PTrueCreated) {
2889 PTrueCreated = true;
2890 BuildMI(MBB, MI, DL, TII.get(AArch64::PTRUE_C_B), PnReg)
2892 }
2893 MachineInstrBuilder MIB = BuildMI(MBB, MI, DL, TII.get(StrOpc));
2894 if (!MRI.isReserved(Reg1))
2895 MBB.addLiveIn(Reg1);
2896 if (!MRI.isReserved(Reg2))
2897 MBB.addLiveIn(Reg2);
2898 MIB.addReg(/*PairRegs*/ AArch64::Z0_Z1 + (RPI.Reg1 - AArch64::Z0));
2900 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
2901 MachineMemOperand::MOStore, Size, Alignment));
2902 MIB.addReg(PnReg);
2903 MIB.addReg(AArch64::SP)
2904 .addImm(RPI.Offset / 2) // [sp, #imm*2*vscale],
2905 // where 2*vscale is implicit
2908 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
2909 MachineMemOperand::MOStore, Size, Alignment));
2910 if (NeedsWinCFI)
2912 } else { // The code when the pair of ZReg is not present
2913 MachineInstrBuilder MIB = BuildMI(MBB, MI, DL, TII.get(StrOpc));
2914 if (!MRI.isReserved(Reg1))
2915 MBB.addLiveIn(Reg1);
2916 if (RPI.isPaired()) {
2917 if (!MRI.isReserved(Reg2))
2918 MBB.addLiveIn(Reg2);
2919 MIB.addReg(Reg2, getPrologueDeath(MF, Reg2));
2921 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
2922 MachineMemOperand::MOStore, Size, Alignment));
2923 }
2924 MIB.addReg(Reg1, getPrologueDeath(MF, Reg1))
2925 .addReg(AArch64::SP)
2926 .addImm(RPI.Offset) // [sp, #offset*vscale],
2927 // where factor*vscale is implicit
2930 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
2931 MachineMemOperand::MOStore, Size, Alignment));
2932 if (NeedsWinCFI)
2934 }
2935 // Update the StackIDs of the SVE stack slots.
2936 MachineFrameInfo &MFI = MF.getFrameInfo();
2937 if (RPI.Type == RegPairInfo::ZPR || RPI.Type == RegPairInfo::PPR) {
2938 MFI.setStackID(FrameIdxReg1, TargetStackID::ScalableVector);
2939 if (RPI.isPaired())
2940 MFI.setStackID(FrameIdxReg2, TargetStackID::ScalableVector);
2941 }
2942 }
2943 return true;
2944}
2945
2949 MachineFunction &MF = *MBB.getParent();
2951 DebugLoc DL;
2953 bool NeedsWinCFI = needsWinCFI(MF);
2954
2955 if (MBBI != MBB.end())
2956 DL = MBBI->getDebugLoc();
2957
2958 computeCalleeSaveRegisterPairs(*this, MF, CSI, TRI, RegPairs, hasFP(MF));
2959 if (homogeneousPrologEpilog(MF, &MBB)) {
2960 auto MIB = BuildMI(MBB, MBBI, DL, TII.get(AArch64::HOM_Epilog))
2962 for (auto &RPI : RegPairs) {
2963 MIB.addReg(RPI.Reg1, RegState::Define);
2964 MIB.addReg(RPI.Reg2, RegState::Define);
2965 }
2966 return true;
2967 }
2968
2969 // For performance reasons, restore SVE registers in increasing order.
2970 auto IsPPR = [](const RegPairInfo &c) { return c.Type == RegPairInfo::PPR; };
2971 auto PPRBegin = llvm::find_if(RegPairs, IsPPR);
2972 auto PPREnd = std::find_if_not(PPRBegin, RegPairs.end(), IsPPR);
2973 std::reverse(PPRBegin, PPREnd);
2974 auto IsZPR = [](const RegPairInfo &c) { return c.Type == RegPairInfo::ZPR; };
2975 auto ZPRBegin = llvm::find_if(RegPairs, IsZPR);
2976 auto ZPREnd = std::find_if_not(ZPRBegin, RegPairs.end(), IsZPR);
2977 std::reverse(ZPRBegin, ZPREnd);
2978
2979 bool PTrueCreated = false;
2980 for (const RegPairInfo &RPI : RegPairs) {
2981 unsigned Reg1 = RPI.Reg1;
2982 unsigned Reg2 = RPI.Reg2;
2983
2984 // Issue sequence of restores for cs regs. The last restore may be converted
2985 // to a post-increment load later by emitEpilogue if the callee-save stack
2986 // area allocation can't be combined with the local stack area allocation.
2987 // For example:
2988 // ldp fp, lr, [sp, #32] // addImm(+4)
2989 // ldp x20, x19, [sp, #16] // addImm(+2)
2990 // ldp x22, x21, [sp, #0] // addImm(+0)
2991 // Note: see comment in spillCalleeSavedRegisters()
2992 unsigned LdrOpc;
2993 unsigned Size = TRI->getSpillSize(*RPI.RC);
2994 Align Alignment = TRI->getSpillAlign(*RPI.RC);
2995 switch (RPI.Type) {
2996 case RegPairInfo::GPR:
2997 LdrOpc = RPI.isPaired() ? AArch64::LDPXi : AArch64::LDRXui;
2998 break;
2999 case RegPairInfo::FPR64:
3000 LdrOpc = RPI.isPaired() ? AArch64::LDPDi : AArch64::LDRDui;
3001 break;
3002 case RegPairInfo::FPR128:
3003 LdrOpc = RPI.isPaired() ? AArch64::LDPQi : AArch64::LDRQui;
3004 break;
3005 case RegPairInfo::ZPR:
3006 LdrOpc = RPI.isPaired() ? AArch64::LD1B_2Z_IMM : AArch64::LDR_ZXI;
3007 break;
3008 case RegPairInfo::PPR:
3009 LdrOpc = Size == 16 ? AArch64::FILL_PPR_FROM_ZPR_SLOT_PSEUDO
3010 : AArch64::LDR_PXI;
3011 break;
3012 case RegPairInfo::VG:
3013 continue;
3014 }
3015 LLVM_DEBUG(dbgs() << "CSR restore: (" << printReg(Reg1, TRI);
3016 if (RPI.isPaired()) dbgs() << ", " << printReg(Reg2, TRI);
3017 dbgs() << ") -> fi#(" << RPI.FrameIdx;
3018 if (RPI.isPaired()) dbgs() << ", " << RPI.FrameIdx + 1;
3019 dbgs() << ")\n");
3020
3021 // Windows unwind codes require consecutive registers if registers are
3022 // paired. Make the switch here, so that the code below will save (x,x+1)
3023 // and not (x+1,x).
3024 unsigned FrameIdxReg1 = RPI.FrameIdx;
3025 unsigned FrameIdxReg2 = RPI.FrameIdx + 1;
3026 if (NeedsWinCFI && RPI.isPaired()) {
3027 std::swap(Reg1, Reg2);
3028 std::swap(FrameIdxReg1, FrameIdxReg2);
3029 }
3030
3032 if (RPI.isPaired() && RPI.isScalable()) {
3033 [[maybe_unused]] const AArch64Subtarget &Subtarget =
3035 unsigned PnReg = AFI->getPredicateRegForFillSpill();
3036 assert((PnReg != 0 && enableMultiVectorSpillFill(Subtarget, MF)) &&
3037 "Expects SVE2.1 or SME2 target and a predicate register");
3038#ifdef EXPENSIVE_CHECKS
3039 assert(!(PPRBegin < ZPRBegin) &&
3040 "Expected callee save predicate to be handled first");
3041#endif
3042 if (!PTrueCreated) {
3043 PTrueCreated = true;
3044 BuildMI(MBB, MBBI, DL, TII.get(AArch64::PTRUE_C_B), PnReg)
3046 }
3047 MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII.get(LdrOpc));
3048 MIB.addReg(/*PairRegs*/ AArch64::Z0_Z1 + (RPI.Reg1 - AArch64::Z0),
3049 getDefRegState(true));
3051 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
3052 MachineMemOperand::MOLoad, Size, Alignment));
3053 MIB.addReg(PnReg);
3054 MIB.addReg(AArch64::SP)
3055 .addImm(RPI.Offset / 2) // [sp, #imm*2*vscale]
3056 // where 2*vscale is implicit
3059 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
3060 MachineMemOperand::MOLoad, Size, Alignment));
3061 if (NeedsWinCFI)
3063 } else {
3064 MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII.get(LdrOpc));
3065 if (RPI.isPaired()) {
3066 MIB.addReg(Reg2, getDefRegState(true));
3068 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
3069 MachineMemOperand::MOLoad, Size, Alignment));
3070 }
3071 MIB.addReg(Reg1, getDefRegState(true));
3072 MIB.addReg(AArch64::SP)
3073 .addImm(RPI.Offset) // [sp, #offset*vscale]
3074 // where factor*vscale is implicit
3077 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
3078 MachineMemOperand::MOLoad, Size, Alignment));
3079 if (NeedsWinCFI)
3081 }
3082 }
3083 return true;
3084}
3085
3086// Return the FrameID for a MMO.
3087static std::optional<int> getMMOFrameID(MachineMemOperand *MMO,
3088 const MachineFrameInfo &MFI) {
3089 auto *PSV =
3091 if (PSV)
3092 return std::optional<int>(PSV->getFrameIndex());
3093
3094 if (MMO->getValue()) {
3095 if (auto *Al = dyn_cast<AllocaInst>(getUnderlyingObject(MMO->getValue()))) {
3096 for (int FI = MFI.getObjectIndexBegin(); FI < MFI.getObjectIndexEnd();
3097 FI++)
3098 if (MFI.getObjectAllocation(FI) == Al)
3099 return FI;
3100 }
3101 }
3102
3103 return std::nullopt;
3104}
3105
3106// Return the FrameID for a Load/Store instruction by looking at the first MMO.
3107static std::optional<int> getLdStFrameID(const MachineInstr &MI,
3108 const MachineFrameInfo &MFI) {
3109 if (!MI.mayLoadOrStore() || MI.getNumMemOperands() < 1)
3110 return std::nullopt;
3111
3112 return getMMOFrameID(*MI.memoperands_begin(), MFI);
3113}
3114
3115// Check if a Hazard slot is needed for the current function, and if so create
3116// one for it. The index is stored in AArch64FunctionInfo->StackHazardSlotIndex,
3117// which can be used to determine if any hazard padding is needed.
3118void AArch64FrameLowering::determineStackHazardSlot(
3119 MachineFunction &MF, BitVector &SavedRegs) const {
3120 unsigned StackHazardSize = getStackHazardSize(MF);
3121 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
3122 if (StackHazardSize == 0 || StackHazardSize % 16 != 0 ||
3124 return;
3125
3126 // Stack hazards are only needed in streaming functions.
3127 SMEAttrs Attrs = AFI->getSMEFnAttrs();
3128 if (!StackHazardInNonStreaming && Attrs.hasNonStreamingInterfaceAndBody())
3129 return;
3130
3131 MachineFrameInfo &MFI = MF.getFrameInfo();
3132
3133 // Add a hazard slot if there are any CSR FPR registers, or are any fp-only
3134 // stack objects.
3135 bool HasFPRCSRs = any_of(SavedRegs.set_bits(), [](unsigned Reg) {
3136 return AArch64::FPR64RegClass.contains(Reg) ||
3137 AArch64::FPR128RegClass.contains(Reg) ||
3138 AArch64::ZPRRegClass.contains(Reg) ||
3139 AArch64::PPRRegClass.contains(Reg);
3140 });
3141 bool HasFPRStackObjects = false;
3142 if (!HasFPRCSRs) {
3143 std::vector<unsigned> FrameObjects(MFI.getObjectIndexEnd());
3144 for (auto &MBB : MF) {
3145 for (auto &MI : MBB) {
3146 std::optional<int> FI = getLdStFrameID(MI, MFI);
3147 if (FI && *FI >= 0 && *FI < (int)FrameObjects.size()) {
3148 if (MFI.getStackID(*FI) == TargetStackID::ScalableVector ||
3150 FrameObjects[*FI] |= 2;
3151 else
3152 FrameObjects[*FI] |= 1;
3153 }
3154 }
3155 }
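// In FrameObjects, bit 1 marks a slot that the check above classified as an
// FP/SVE access and bit 0 marks any other access; a slot counts as FPR-only
// when only bit 1 is set ((B & 3) == 2).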
3156 HasFPRStackObjects =
3157 any_of(FrameObjects, [](unsigned B) { return (B & 3) == 2; });
3158 }
3159
3160 if (HasFPRCSRs || HasFPRStackObjects) {
3161 int ID = MFI.CreateStackObject(StackHazardSize, Align(16), false);
3162 LLVM_DEBUG(dbgs() << "Created Hazard slot at " << ID << " size "
3163 << StackHazardSize << "\n");
3165 }
3166}
3167
3169 BitVector &SavedRegs,
3170 RegScavenger *RS) const {
3171 // All calls are tail calls in GHC calling conv, and functions have no
3172 // prologue/epilogue.
3173 if (MF.getFunction().getCallingConv() == CallingConv::GHC)
3174 return;
3175
3177 const AArch64RegisterInfo *RegInfo = static_cast<const AArch64RegisterInfo *>(
3178 MF.getSubtarget().getRegisterInfo());
3179 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
3181 unsigned UnspilledCSGPR = AArch64::NoRegister;
3182 unsigned UnspilledCSGPRPaired = AArch64::NoRegister;
3183
3184 MachineFrameInfo &MFI = MF.getFrameInfo();
3185 const MCPhysReg *CSRegs = MF.getRegInfo().getCalleeSavedRegs();
3186
3187 unsigned BasePointerReg = RegInfo->hasBasePointer(MF)
3188 ? RegInfo->getBaseRegister()
3189 : (unsigned)AArch64::NoRegister;
3190
3191 unsigned ExtraCSSpill = 0;
3192 bool HasUnpairedGPR64 = false;
3193 bool HasPairZReg = false;
3194 BitVector UserReservedRegs = RegInfo->getUserReservedRegs(MF);
3195 BitVector ReservedRegs = RegInfo->getReservedRegs(MF);
3196
3197 // Figure out which callee-saved registers to save/restore.
3198 for (unsigned i = 0; CSRegs[i]; ++i) {
3199 const unsigned Reg = CSRegs[i];
3200
3201 // Add the base pointer register to SavedRegs if it is callee-save.
3202 if (Reg == BasePointerReg)
3203 SavedRegs.set(Reg);
3204
3205 // Don't save manually reserved registers set through +reserve-x#i,
3206 // even for callee-saved registers, as per GCC's behavior.
3207 if (UserReservedRegs[Reg]) {
3208 SavedRegs.reset(Reg);
3209 continue;
3210 }
3211
3212 bool RegUsed = SavedRegs.test(Reg);
3213 unsigned PairedReg = AArch64::NoRegister;
3214 const bool RegIsGPR64 = AArch64::GPR64RegClass.contains(Reg);
3215 if (RegIsGPR64 || AArch64::FPR64RegClass.contains(Reg) ||
3216 AArch64::FPR128RegClass.contains(Reg)) {
3217 // Compensate for odd numbers of GP CSRs.
3218 // For now, all the known cases of odd number of CSRs are of GPRs.
3219 if (HasUnpairedGPR64)
3220 PairedReg = CSRegs[i % 2 == 0 ? i - 1 : i + 1];
3221 else
3222 PairedReg = CSRegs[i ^ 1];
3223 }
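// CSRegs lists pair partners adjacently, so "i ^ 1" picks the other element
// of the natural (even, odd) pair; once an unpaired GPR64 has been seen
// (HasUnpairedGPR64), the pairing is shifted by one element instead.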
3224
3225 // If the function requires all the GP registers to save (SavedRegs),
3226 // and there are an odd number of GP CSRs at the same time (CSRegs),
3227 // PairedReg could be in a different register class from Reg, which would
3228 // lead to a FPR (usually D8) accidentally being marked saved.
3229 if (RegIsGPR64 && !AArch64::GPR64RegClass.contains(PairedReg)) {
3230 PairedReg = AArch64::NoRegister;
3231 HasUnpairedGPR64 = true;
3232 }
3233 assert(PairedReg == AArch64::NoRegister ||
3234 AArch64::GPR64RegClass.contains(Reg, PairedReg) ||
3235 AArch64::FPR64RegClass.contains(Reg, PairedReg) ||
3236 AArch64::FPR128RegClass.contains(Reg, PairedReg));
3237
3238 if (!RegUsed) {
3239 if (AArch64::GPR64RegClass.contains(Reg) && !ReservedRegs[Reg]) {
3240 UnspilledCSGPR = Reg;
3241 UnspilledCSGPRPaired = PairedReg;
3242 }
3243 continue;
3244 }
3245
3246 // Always save P4 when PPR spills are ZPR-sized and a predicate above p8 is
3247 // spilled. If all of p0-p3 are used as return values, p4 must be free
3248 // to reload p8-p15.
3249 if (RegInfo->getSpillSize(AArch64::PPRRegClass) == 16 &&
3250 AArch64::PPR_p8to15RegClass.contains(Reg)) {
3251 SavedRegs.set(AArch64::P4);
3252 }
3253
3254 // MachO's compact unwind format relies on all registers being stored in
3255 // pairs.
3256 // FIXME: the usual format is actually better if unwinding isn't needed.
3257 if (producePairRegisters(MF) && PairedReg != AArch64::NoRegister &&
3258 !SavedRegs.test(PairedReg)) {
3259 SavedRegs.set(PairedReg);
3260 if (AArch64::GPR64RegClass.contains(PairedReg) &&
3261 !ReservedRegs[PairedReg])
3262 ExtraCSSpill = PairedReg;
3263 }
3264 // Check if there is a pair of ZRegs, so that a PReg can be selected for spill/fill.
3265 HasPairZReg |= (AArch64::ZPRRegClass.contains(Reg, CSRegs[i ^ 1]) &&
3266 SavedRegs.test(CSRegs[i ^ 1]));
3267 }
3268
3269 if (HasPairZReg && enableMultiVectorSpillFill(Subtarget, MF)) {
3271 // Find a suitable predicate register for the multi-vector spill/fill
3272 // instructions.
3273 unsigned PnReg = findFreePredicateReg(SavedRegs);
3274 if (PnReg != AArch64::NoRegister)
3275 AFI->setPredicateRegForFillSpill(PnReg);
3276 // If no free callee-save has been found assign one.
3277 if (!AFI->getPredicateRegForFillSpill() &&
3278 MF.getFunction().getCallingConv() ==
3279 CallingConv::AArch64_SVE_VectorCall) {
3280 SavedRegs.set(AArch64::P8);
3281 AFI->setPredicateRegForFillSpill(AArch64::PN8);
3282 }
3283
3284 assert(!ReservedRegs[AFI->getPredicateRegForFillSpill()] &&
3285 "Predicate cannot be a reserved register");
3286 }
3287
3288 if (MF.getFunction().getCallingConv() == CallingConv::Win64 &&
3289 !Subtarget.isTargetWindows()) {
3290 // For Windows calling convention on a non-windows OS, where X18 is treated
3291 // as reserved, back up X18 when entering non-windows code (marked with the
3292 // Windows calling convention) and restore when returning regardless of
3293 // whether the individual function uses it - it might call other functions
3294 // that clobber it.
3295 SavedRegs.set(AArch64::X18);
3296 }
3297
3298 // Calculates the callee saved stack size.
3299 unsigned CSStackSize = 0;
3300 unsigned SVECSStackSize = 0;
3301 const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();
3302 for (unsigned Reg : SavedRegs.set_bits()) {
3303 auto *RC = TRI->getMinimalPhysRegClass(Reg);
3304 assert(RC && "expected register class!");
3305 auto SpillSize = TRI->getSpillSize(*RC);
3306 if (AArch64::PPRRegClass.contains(Reg) ||
3307 AArch64::ZPRRegClass.contains(Reg))
3308 SVECSStackSize += SpillSize;
3309 else
3310 CSStackSize += SpillSize;
3311 }
3312
3313 // Save number of saved regs, so we can easily update CSStackSize later to
3314 // account for any additional 64-bit GPR saves. Note: After this point
3315 // only 64-bit GPRs can be added to SavedRegs.
3316 unsigned NumSavedRegs = SavedRegs.count();
3317
3318 // Increase the callee-saved stack size if the function has streaming mode
3319 // changes, as we will need to spill the value of the VG register.
3320 if (requiresSaveVG(MF))
3321 CSStackSize += 8;
3322
3323 // Determine if a Hazard slot should be used, and increase the CSStackSize by
3324 // StackHazardSize if so.
3325 determineStackHazardSlot(MF, SavedRegs);
3326 if (AFI->hasStackHazardSlotIndex())
3327 CSStackSize += getStackHazardSize(MF);
3328
3329 // If we must call __arm_get_current_vg in the prologue preserve the LR.
3330 if (requiresSaveVG(MF) && !Subtarget.hasSVE())
3331 SavedRegs.set(AArch64::LR);
3332
3333 // The frame record needs to be created by saving the appropriate registers
3334 uint64_t EstimatedStackSize = MFI.estimateStackSize(MF);
3335 if (hasFP(MF) ||
3336 windowsRequiresStackProbe(MF, EstimatedStackSize + CSStackSize + 16)) {
3337 SavedRegs.set(AArch64::FP);
3338 SavedRegs.set(AArch64::LR);
3339 }
3340
3341 LLVM_DEBUG({
3342 dbgs() << "*** determineCalleeSaves\nSaved CSRs:";
3343 for (unsigned Reg : SavedRegs.set_bits())
3344 dbgs() << ' ' << printReg(Reg, RegInfo);
3345 dbgs() << "\n";
3346 });
3347
3348 // If any callee-saved registers are used, the frame cannot be eliminated.
3349 int64_t SVEStackSize =
3350 alignTo(SVECSStackSize + estimateSVEStackObjectOffsets(MFI), 16);
3351 bool CanEliminateFrame = (SavedRegs.count() == 0) && !SVEStackSize;
3352
3353 // The CSR spill slots have not been allocated yet, so estimateStackSize
3354 // won't include them.
3355 unsigned EstimatedStackSizeLimit = estimateRSStackSizeLimit(MF);
3356
3357 // We may address some of the stack above the canonical frame address, either
3358 // for our own arguments or during a call. Include that in calculating whether
3359 // we have complicated addressing concerns.
3360 int64_t CalleeStackUsed = 0;
3361 for (int I = MFI.getObjectIndexBegin(); I != 0; ++I) {
3362 int64_t FixedOff = MFI.getObjectOffset(I);
3363 if (FixedOff > CalleeStackUsed)
3364 CalleeStackUsed = FixedOff;
3365 }
3366
3367 // Conservatively always assume BigStack when there are SVE spills.
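// (SVE frame offsets are scaled by the vector length and may need extra
// address arithmetic, e.g. ADDVL, which can itself require a scratch register.)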
3368 bool BigStack = SVEStackSize || (EstimatedStackSize + CSStackSize +
3369 CalleeStackUsed) > EstimatedStackSizeLimit;
3370 if (BigStack || !CanEliminateFrame || RegInfo->cannotEliminateFrame(MF))
3371 AFI->setHasStackFrame(true);
3372
3373 // Estimate if we might need to scavenge a register at some point in order
3374 // to materialize a stack offset. If so, either spill one additional
3375 // callee-saved register or reserve a special spill slot to facilitate
3376 // register scavenging. If we already spilled an extra callee-saved register
3377 // above to keep the number of spills even, we don't need to do anything else
3378 // here.
3379 if (BigStack) {
3380 if (!ExtraCSSpill && UnspilledCSGPR != AArch64::NoRegister) {
3381 LLVM_DEBUG(dbgs() << "Spilling " << printReg(UnspilledCSGPR, RegInfo)
3382 << " to get a scratch register.\n");
3383 SavedRegs.set(UnspilledCSGPR);
3384 ExtraCSSpill = UnspilledCSGPR;
3385
3386 // MachO's compact unwind format relies on all registers being stored in
3387 // pairs, so if we need to spill one extra for BigStack, then we need to
3388 // store the pair.
3389 if (producePairRegisters(MF)) {
3390 if (UnspilledCSGPRPaired == AArch64::NoRegister) {
3391 // Failed to make a pair for compact unwind format, revert spilling.
3392 if (produceCompactUnwindFrame(*this, MF)) {
3393 SavedRegs.reset(UnspilledCSGPR);
3394 ExtraCSSpill = AArch64::NoRegister;
3395 }
3396 } else
3397 SavedRegs.set(UnspilledCSGPRPaired);
3398 }
3399 }
3400
3401 // If we didn't find an extra callee-saved register to spill, create
3402 // an emergency spill slot.
3403 if (!ExtraCSSpill || MF.getRegInfo().isPhysRegUsed(ExtraCSSpill)) {
3404 const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();
3405 const TargetRegisterClass &RC = AArch64::GPR64RegClass;
3406 unsigned Size = TRI->getSpillSize(RC);
3407 Align Alignment = TRI->getSpillAlign(RC);
3408 int FI = MFI.CreateSpillStackObject(Size, Alignment);
3409 RS->addScavengingFrameIndex(FI);
3410 LLVM_DEBUG(dbgs() << "No available CS registers, allocated fi#" << FI
3411 << " as the emergency spill slot.\n");
3412 }
3413 }
3414
3415 // Add the size of any additional 64-bit GPR saves made above.
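// For example, if the BigStack path above spilled one extra GPR plus its
// pair, SavedRegs.count() grew by two and 16 bytes are added here.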
3416 CSStackSize += 8 * (SavedRegs.count() - NumSavedRegs);
3417
3418 // A Swift asynchronous context extends the frame record with a pointer
3419 // directly before FP.
3420 if (hasFP(MF) && AFI->hasSwiftAsyncContext())
3421 CSStackSize += 8;
3422
3423 uint64_t AlignedCSStackSize = alignTo(CSStackSize, 16);
3424 LLVM_DEBUG(dbgs() << "Estimated stack frame size: "
3425 << EstimatedStackSize + AlignedCSStackSize << " bytes.\n");
3426
3427 assert((!MFI.isCalleeSavedInfoValid() ||
3428 AFI->getCalleeSavedStackSize() == AlignedCSStackSize) &&
3429 "Should not invalidate callee saved info");
3430
3431 // Round up to register pair alignment to avoid additional SP adjustment
3432 // instructions.
3433 AFI->setCalleeSavedStackSize(AlignedCSStackSize);
3434 AFI->setCalleeSaveStackHasFreeSpace(AlignedCSStackSize != CSStackSize);
3435 AFI->setSVECalleeSavedStackSize(alignTo(SVECSStackSize, 16));
3436}
3437
3438bool AArch64FrameLowering::assignCalleeSavedSpillSlots(
3439 MachineFunction &MF, const TargetRegisterInfo *RegInfo,
3440 std::vector<CalleeSavedInfo> &CSI, unsigned &MinCSFrameIndex,
3441 unsigned &MaxCSFrameIndex) const {
3442 bool NeedsWinCFI = needsWinCFI(MF);
3443 unsigned StackHazardSize = getStackHazardSize(MF);
3444 // To match the canonical windows frame layout, reverse the list of
3445 // callee saved registers to get them laid out by PrologEpilogInserter
3446 // in the right order. (PrologEpilogInserter allocates stack objects top
3447 // down. Windows canonical prologs store higher numbered registers at
3448 // the top, thus have the CSI array start from the highest registers.)
3449 if (NeedsWinCFI)
3450 std::reverse(CSI.begin(), CSI.end());
3451
3452 if (CSI.empty())
3453 return true; // Early exit if no callee saved registers are modified!
3454
3455 // Now that we know which registers need to be saved and restored, allocate
3456 // stack slots for them.
3457 MachineFrameInfo &MFI = MF.getFrameInfo();
3458 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
3459
3460 bool UsesWinAAPCS = isTargetWindows(MF);
3461 if (UsesWinAAPCS && hasFP(MF) && AFI->hasSwiftAsyncContext()) {
3462 int FrameIdx = MFI.CreateStackObject(8, Align(16), true);
3463 AFI->setSwiftAsyncContextFrameIdx(FrameIdx);
3464 if ((unsigned)FrameIdx < MinCSFrameIndex)
3465 MinCSFrameIndex = FrameIdx;
3466 if ((unsigned)FrameIdx > MaxCSFrameIndex)
3467 MaxCSFrameIndex = FrameIdx;
3468 }
3469
3470 // Insert VG into the list of CSRs, immediately before LR if saved.
3471 if (requiresSaveVG(MF)) {
3472 CalleeSavedInfo VGInfo(AArch64::VG);
3473 auto It =
3474 find_if(CSI, [](auto &Info) { return Info.getReg() == AArch64::LR; });
3475 if (It != CSI.end())
3476 CSI.insert(It, VGInfo);
3477 else
3478 CSI.push_back(VGInfo);
3479 }
3480
3481 Register LastReg = 0;
3482 int HazardSlotIndex = std::numeric_limits<int>::max();
3483 for (auto &CS : CSI) {
3484 MCRegister Reg = CS.getReg();
3485 const TargetRegisterClass *RC = RegInfo->getMinimalPhysRegClass(Reg);
3486
3487 // Create a hazard slot as we switch between GPR and FPR CSRs.
3488 if (AFI->hasStackHazardSlotIndex() &&
3489 (!LastReg || !AArch64InstrInfo::isFpOrNEON(LastReg)) &&
3490 AArch64InstrInfo::isFpOrNEON(Reg)) {
3491 assert(HazardSlotIndex == std::numeric_limits<int>::max() &&
3492 "Unexpected register order for hazard slot");
3493 HazardSlotIndex = MFI.CreateStackObject(StackHazardSize, Align(8), true);
3494 LLVM_DEBUG(dbgs() << "Created CSR Hazard at slot " << HazardSlotIndex
3495 << "\n");
3496 AFI->setStackHazardCSRSlotIndex(HazardSlotIndex);
3497 if ((unsigned)HazardSlotIndex < MinCSFrameIndex)
3498 MinCSFrameIndex = HazardSlotIndex;
3499 if ((unsigned)HazardSlotIndex > MaxCSFrameIndex)
3500 MaxCSFrameIndex = HazardSlotIndex;
3501 }
3502
3503 unsigned Size = RegInfo->getSpillSize(*RC);
3504 Align Alignment(RegInfo->getSpillAlign(*RC));
3505 int FrameIdx = MFI.CreateStackObject(Size, Alignment, true);
3506 CS.setFrameIdx(FrameIdx);
3507
3508 if ((unsigned)FrameIdx < MinCSFrameIndex)
3509 MinCSFrameIndex = FrameIdx;
3510 if ((unsigned)FrameIdx > MaxCSFrameIndex)
3511 MaxCSFrameIndex = FrameIdx;
3512
3513 // Grab 8 bytes below FP for the extended asynchronous frame info.
3514 if (hasFP(MF) && AFI->hasSwiftAsyncContext() && !UsesWinAAPCS &&
3515 Reg == AArch64::FP) {
3516 FrameIdx = MFI.CreateStackObject(8, Alignment, true);
3517 AFI->setSwiftAsyncContextFrameIdx(FrameIdx);
3518 if ((unsigned)FrameIdx < MinCSFrameIndex)
3519 MinCSFrameIndex = FrameIdx;
3520 if ((unsigned)FrameIdx > MaxCSFrameIndex)
3521 MaxCSFrameIndex = FrameIdx;
3522 }
3523 LastReg = Reg;
3524 }
3525
3526 // Add hazard slot in the case where no FPR CSRs are present.
3527 if (AFI->hasStackHazardSlotIndex() &&
3528 HazardSlotIndex == std::numeric_limits<int>::max()) {
3529 HazardSlotIndex = MFI.CreateStackObject(StackHazardSize, Align(8), true);
3530 LLVM_DEBUG(dbgs() << "Created CSR Hazard at slot " << HazardSlotIndex
3531 << "\n");
3532 AFI->setStackHazardCSRSlotIndex(HazardSlotIndex);
3533 if ((unsigned)HazardSlotIndex < MinCSFrameIndex)
3534 MinCSFrameIndex = HazardSlotIndex;
3535 if ((unsigned)HazardSlotIndex > MaxCSFrameIndex)
3536 MaxCSFrameIndex = HazardSlotIndex;
3537 }
3538
3539 return true;
3540}
3541
3542bool AArch64FrameLowering::enableStackSlotScavenging(
3543 const MachineFunction &MF) const {
3544 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
3545 // If the function has streaming-mode changes, don't scavenge a
3546 // spillslot in the callee-save area, as that might require an
3547 // 'addvl' in the streaming-mode-changing call-sequence when the
3548 // function doesn't use a FP.
3549 if (AFI->hasStreamingModeChanges() && !hasFP(MF))
3550 return false;
3551 // Don't allow register scavenging with hazard slots, in case it moves objects
3552 // into the wrong place.
3553 if (AFI->hasStackHazardSlotIndex())
3554 return false;
3555 return AFI->hasCalleeSaveStackFreeSpace();
3556}
3557
3558/// Returns true if there are any SVE callee saves.
3559static bool getSVECalleeSaveSlotRange(const MachineFrameInfo &MFI,
3560 int &Min, int &Max) {
3561 Min = std::numeric_limits<int>::max();
3562 Max = std::numeric_limits<int>::min();
3563
3564 if (!MFI.isCalleeSavedInfoValid())
3565 return false;
3566
3567 const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();
3568 for (auto &CS : CSI) {
3569 if (AArch64::ZPRRegClass.contains(CS.getReg()) ||
3570 AArch64::PPRRegClass.contains(CS.getReg())) {
3571 assert((Max == std::numeric_limits<int>::min() ||
3572 Max + 1 == CS.getFrameIdx()) &&
3573 "SVE CalleeSaves are not consecutive");
3574
3575 Min = std::min(Min, CS.getFrameIdx());
3576 Max = std::max(Max, CS.getFrameIdx());
3577 }
3578 }
3579 return Min != std::numeric_limits<int>::max();
3580}
3581
3582// Process all the SVE stack objects and determine offsets for each
3583// object. If AssignOffsets is true, the offsets get assigned.
3584// Fills in the first and last callee-saved frame indices into
3585// Min/MaxCSFrameIndex, respectively.
3586// Returns the size of the stack.
3587static int64_t determineSVEStackObjectOffsets(MachineFrameInfo &MFI,
3588 int &MinCSFrameIndex,
3589 int &MaxCSFrameIndex,
3590 bool AssignOffsets) {
3591#ifndef NDEBUG
3592 // First process all fixed stack objects.
3593 for (int I = MFI.getObjectIndexBegin(); I != 0; ++I)
3595 "SVE vectors should never be passed on the stack by value, only by "
3596 "reference.");
3597#endif
3598
3599 auto Assign = [&MFI](int FI, int64_t Offset) {
3600 LLVM_DEBUG(dbgs() << "alloc FI(" << FI << ") at SP[" << Offset << "]\n");
3601 MFI.setObjectOffset(FI, Offset);
3602 };
3603
3604 int64_t Offset = 0;
3605
3606 // Then process all callee saved slots.
3607 if (getSVECalleeSaveSlotRange(MFI, MinCSFrameIndex, MaxCSFrameIndex)) {
3608 // Assign offsets to the callee save slots.
3609 for (int I = MinCSFrameIndex; I <= MaxCSFrameIndex; ++I) {
3610 Offset += MFI.getObjectSize(I);
3611 Offset = alignTo(Offset, MFI.getObjectAlign(I));
3612 if (AssignOffsets)
3613 Assign(I, -Offset);
3614 }
3615 }
3616
3617 // Ensure that the callee-save area is aligned to 16 bytes.
3618 Offset = alignTo(Offset, Align(16U));
3619
3620 // Create a buffer of SVE objects to allocate and sort it.
3621 SmallVector<int, 8> ObjectsToAllocate;
3622 // If we have a stack protector, and we've previously decided that we have SVE
3623 // objects on the stack and thus need it to go in the SVE stack area, then it
3624 // needs to go first.
3625 int StackProtectorFI = -1;
3626 if (MFI.hasStackProtectorIndex()) {
3627 StackProtectorFI = MFI.getStackProtectorIndex();
3628 if (MFI.getStackID(StackProtectorFI) == TargetStackID::ScalableVector)
3629 ObjectsToAllocate.push_back(StackProtectorFI);
3630 }
3631 for (int I = 0, E = MFI.getObjectIndexEnd(); I != E; ++I) {
3632 unsigned StackID = MFI.getStackID(I);
3633 if (StackID != TargetStackID::ScalableVector)
3634 continue;
3635 if (I == StackProtectorFI)
3636 continue;
3637 if (MaxCSFrameIndex >= I && I >= MinCSFrameIndex)
3638 continue;
3639 if (MFI.isDeadObjectIndex(I))
3640 continue;
3641
3642 ObjectsToAllocate.push_back(I);
3643 }
3644
3645 // Allocate all SVE locals and spills
3646 for (unsigned FI : ObjectsToAllocate) {
3647 Align Alignment = MFI.getObjectAlign(FI);
3648 // FIXME: Given that the length of SVE vectors is not necessarily a power of
3649 // two, we'd need to align every object dynamically at runtime if the
3650 // alignment is larger than 16. This is not yet supported.
3651 if (Alignment > Align(16))
3653 "Alignment of scalable vectors > 16 bytes is not yet supported");
3654
3655 Offset = alignTo(Offset + MFI.getObjectSize(FI), Alignment);
3656 if (AssignOffsets)
3657 Assign(FI, -Offset);
3658 }
3659
3660 return Offset;
3661}
3662
3663int64_t AArch64FrameLowering::estimateSVEStackObjectOffsets(
3664 MachineFrameInfo &MFI) const {
3665 int MinCSFrameIndex, MaxCSFrameIndex;
3666 return determineSVEStackObjectOffsets(MFI, MinCSFrameIndex, MaxCSFrameIndex, false);
3667}
3668
3669int64_t AArch64FrameLowering::assignSVEStackObjectOffsets(
3670 MachineFrameInfo &MFI, int &MinCSFrameIndex, int &MaxCSFrameIndex) const {
3671 return determineSVEStackObjectOffsets(MFI, MinCSFrameIndex, MaxCSFrameIndex,
3672 true);
3673}
3674
3675/// Attempts to scavenge a register from \p ScavengeableRegs given the used
3676/// registers in \p UsedRegs.
3677static Register tryScavengeRegister(LiveRegUnits const &UsedRegs,
3678 BitVector const &ScavengeableRegs,
3679 Register PreferredReg) {
3680 if (PreferredReg != AArch64::NoRegister && UsedRegs.available(PreferredReg))
3681 return PreferredReg;
3682 for (auto Reg : ScavengeableRegs.set_bits()) {
3683 if (UsedRegs.available(Reg))
3684 return Reg;
3685 }
3686 return AArch64::NoRegister;
3687}
3688
3689/// Propagates frame-setup/destroy flags from \p SourceMI to all instructions in
3690/// \p MachineInstrs.
3691static void propagateFrameFlags(MachineInstr &SourceMI,
3692 ArrayRef<MachineInstr *> MachineInstrs) {
3693 for (MachineInstr *MI : MachineInstrs) {
3694 if (SourceMI.getFlag(MachineInstr::FrameSetup))
3695 MI->setFlag(MachineInstr::FrameSetup);
3696 if (SourceMI.getFlag(MachineInstr::FrameDestroy))
3697 MI->setFlag(MachineInstr::FrameDestroy);
3698 }
3699}
3700
3701/// RAII helper class for scavenging or spilling a register. On construction
3702 /// attempts to find a free register of class \p RC (given \p UsedRegs and \p
3703 /// AllocatableRegs); if no register can be found, it spills \p SpillCandidate to
3704 /// \p MaybeSpillFI to free a register. The freed register is returned via
3705 /// freeRegister() (or operator*). On destruction, if there is a spill, its
3706 /// previous value is reloaded. The spilling and scavenging are only valid at the
3707/// insertion point \p MBBI, this class should _not_ be used in places that
3708/// create or manipulate basic blocks, moving the expected insertion point.
3709struct ScopedScavengeOrSpill {
3710 ScopedScavengeOrSpill(const ScopedScavengeOrSpill &) = delete;
3711 ScopedScavengeOrSpill(ScopedScavengeOrSpill &&) = delete;
3712
3713 ScopedScavengeOrSpill(MachineFunction &MF, MachineBasicBlock &MBB,
3714 MachineBasicBlock::iterator MBBI,
3715 Register SpillCandidate, const TargetRegisterClass &RC,
3716 LiveRegUnits const &UsedRegs,
3717 BitVector const &AllocatableRegs,
3718 std::optional<int> *MaybeSpillFI,
3719 Register PreferredReg = AArch64::NoRegister)
3720 : MBB(MBB), MBBI(MBBI), RC(RC), TII(static_cast<const AArch64InstrInfo &>(
3721 *MF.getSubtarget().getInstrInfo())),
3722 TRI(*MF.getSubtarget().getRegisterInfo()) {
3723 FreeReg = tryScavengeRegister(UsedRegs, AllocatableRegs, PreferredReg);
3724 if (FreeReg != AArch64::NoRegister)
3725 return;
3726 assert(MaybeSpillFI && "Expected emergency spill slot FI information "
3727 "(attempted to spill in prologue/epilogue?)");
3728 if (!MaybeSpillFI->has_value()) {
3729 MachineFrameInfo &MFI = MF.getFrameInfo();
3730 *MaybeSpillFI = MFI.CreateSpillStackObject(TRI.getSpillSize(RC),
3731 TRI.getSpillAlign(RC));
3732 }
3733 FreeReg = SpillCandidate;
3734 SpillFI = MaybeSpillFI->value();
3735 TII.storeRegToStackSlot(MBB, MBBI, FreeReg, false, *SpillFI, &RC, &TRI,
3736 Register());
3737 }
3738
3739 bool hasSpilled() const { return SpillFI.has_value(); }
3740
3741 /// Returns the free register (found from scavenging or spilling a register).
3742 Register freeRegister() const { return FreeReg; }
3743
3744 Register operator*() const { return freeRegister(); }
3745
3745
3746 ~ScopedScavengeOrSpill() {
3747 if (hasSpilled())
3748 TII.loadRegFromStackSlot(MBB, MBBI, FreeReg, *SpillFI, &RC, &TRI,
3749 Register());
3750 }
3751
3752private:
3753 MachineBasicBlock &MBB;
3754 MachineBasicBlock::iterator MBBI;
3755 const TargetRegisterClass &RC;
3756 const AArch64InstrInfo &TII;
3757 const TargetRegisterInfo &TRI;
3758 Register FreeReg = AArch64::NoRegister;
3759 std::optional<int> SpillFI;
3760};
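// Typical use (see the PPR spill/fill expansions below): construct a
// ScopedScavengeOrSpill for the register class needed, use the scavenged or
// spilled register via operator* while building the replacement instructions,
// and let the destructor reload any spilled value when the object goes out of
// scope.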
3761
3762/// Emergency stack slots for expanding SPILL_PPR_TO_ZPR_SLOT_PSEUDO and
3763/// FILL_PPR_FROM_ZPR_SLOT_PSEUDO.
3764struct EmergencyStackSlots {
3765 std::optional<int> ZPRSpillFI;
3766 std::optional<int> PPRSpillFI;
3767 std::optional<int> GPRSpillFI;
3768};
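// One lazily created emergency slot per register class that may need to be
// spilled: a ZPR for the vector copy, a PPR for the ptrue operand, and a GPR
// for preserving NZCV.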
3769
3770/// Registers available for scavenging (ZPR, PPR3b, GPR).
3771struct ScavengeableRegs {
3772 BitVector ZPRRegs;
3773 BitVector PPR3bRegs;
3774 BitVector GPRRegs;
3775};
3776
3777static bool isInPrologueOrEpilogue(const MachineInstr &MI) {
3778 return MI.getFlag(MachineInstr::FrameSetup) ||
3779 MI.getFlag(MachineInstr::FrameDestroy);
3780}
3781
3782/// Expands:
3783/// ```
3784/// SPILL_PPR_TO_ZPR_SLOT_PSEUDO $p0, %stack.0, 0
3785/// ```
3786/// To:
3787/// ```
3788/// $z0 = CPY_ZPzI_B $p0, 1, 0
3789/// STR_ZXI $z0, $stack.0, 0
3790/// ```
3791/// While ensuring a ZPR ($z0 in this example) is free for the predicate (
3792/// spilling if necessary).
3793static void expandSpillPPRToZPRSlotPseudo(MachineBasicBlock &MBB,
3794 MachineInstr &MI,
3795 const TargetRegisterInfo &TRI,
3796 LiveRegUnits const &UsedRegs,
3797 ScavengeableRegs const &SR,
3798 EmergencyStackSlots &SpillSlots) {
3799 MachineFunction &MF = *MBB.getParent();
3800 auto *TII =
3801 static_cast<const AArch64InstrInfo *>(MF.getSubtarget().getInstrInfo());
3802
3803 ScopedScavengeOrSpill ZPredReg(
3804 MF, MBB, MI, AArch64::Z0, AArch64::ZPRRegClass, UsedRegs, SR.ZPRRegs,
3805 isInPrologueOrEpilogue(MI) ? nullptr : &SpillSlots.ZPRSpillFI);
3806
3807 SmallVector<MachineInstr *, 2> MachineInstrs;
3808 const DebugLoc &DL = MI.getDebugLoc();
3809 MachineInstrs.push_back(BuildMI(MBB, MI, DL, TII->get(AArch64::CPY_ZPzI_B))
3810 .addReg(*ZPredReg, RegState::Define)
3811 .add(MI.getOperand(0))
3812 .addImm(1)
3813 .addImm(0)
3814 .getInstr());
3815 MachineInstrs.push_back(BuildMI(MBB, MI, DL, TII->get(AArch64::STR_ZXI))
3816 .addReg(*ZPredReg)
3817 .add(MI.getOperand(1))
3818 .addImm(MI.getOperand(2).getImm())
3819 .setMemRefs(MI.memoperands())
3820 .getInstr());
3821 propagateFrameFlags(MI, MachineInstrs);
3822}
3823
3824/// Expands:
3825/// ```
3826/// $p0 = FILL_PPR_FROM_ZPR_SLOT_PSEUDO %stack.0, 0
3827/// ```
3828/// To:
3829/// ```
3830/// $z0 = LDR_ZXI %stack.0, 0
3831/// $p0 = PTRUE_B 31, implicit $vg
3832/// $p0 = CMPNE_PPzZI_B $p0, $z0, 0, implicit-def $nzcv
3833/// ```
3834/// While ensuring a ZPR ($z0 in this example) is free for the predicate (
3835/// spilling if necessary). If the status flags are in use at the point of
3836/// expansion they are preserved (by moving them to/from a GPR). This may cause
3837/// an additional spill if no GPR is free at the expansion point.
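/// When NZCV is live at the expansion point, the CMPNE is additionally
/// bracketed by an MRS/MSR pair through a scavenged (or spilled) GPR so the
/// flags are preserved across the expansion.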
3838static bool expandFillPPRFromZPRSlotPseudo(
3839 MachineBasicBlock &MBB, MachineInstr &MI, const TargetRegisterInfo &TRI,
3840 LiveRegUnits const &UsedRegs, ScavengeableRegs const &SR,
3841 MachineInstr *&LastPTrue, EmergencyStackSlots &SpillSlots) {
3842 MachineFunction &MF = *MBB.getParent();
3843 auto *TII =
3844 static_cast<const AArch64InstrInfo *>(MF.getSubtarget().getInstrInfo());
3845
3846 ScopedScavengeOrSpill ZPredReg(
3847 MF, MBB, MI, AArch64::Z0, AArch64::ZPRRegClass, UsedRegs, SR.ZPRRegs,
3848 isInPrologueOrEpilogue(MI) ? nullptr : &SpillSlots.ZPRSpillFI);
3849
3850 ScopedScavengeOrSpill PredReg(
3851 MF, MBB, MI, AArch64::P0, AArch64::PPR_3bRegClass, UsedRegs, SR.PPR3bRegs,
3852 isInPrologueOrEpilogue(MI) ? nullptr : &SpillSlots.PPRSpillFI,
3853 /*PreferredReg=*/
3854 LastPTrue ? LastPTrue->getOperand(0).getReg() : AArch64::NoRegister);
3855
3856 // Elide NZCV spills if we know it is not used.
3857 bool IsNZCVUsed = !UsedRegs.available(AArch64::NZCV);
3858 std::optional<ScopedScavengeOrSpill> NZCVSaveReg;
3859 if (IsNZCVUsed)
3860 NZCVSaveReg.emplace(
3861 MF, MBB, MI, AArch64::X0, AArch64::GPR64RegClass, UsedRegs, SR.GPRRegs,
3862 isInPrologueOrEpilogue(MI) ? nullptr : &SpillSlots.GPRSpillFI);
3863 SmallVector<MachineInstr *, 4> MachineInstrs;
3864 const DebugLoc &DL = MI.getDebugLoc();
3865 MachineInstrs.push_back(BuildMI(MBB, MI, DL, TII->get(AArch64::LDR_ZXI))
3866 .addReg(*ZPredReg, RegState::Define)
3867 .add(MI.getOperand(1))
3868 .addImm(MI.getOperand(2).getImm())
3869 .setMemRefs(MI.memoperands())
3870 .getInstr());
3871 if (IsNZCVUsed)
3872 MachineInstrs.push_back(
3873 BuildMI(MBB, MI, DL, TII->get(AArch64::MRS))
3874 .addReg(NZCVSaveReg->freeRegister(), RegState::Define)
3875 .addImm(AArch64SysReg::NZCV)
3876 .addReg(AArch64::NZCV, RegState::Implicit)
3877 .getInstr());
3878
3879 // Reuse previous ptrue if we know it has not been clobbered.
3880 if (LastPTrue) {
3881 assert(*PredReg == LastPTrue->getOperand(0).getReg());
3882 LastPTrue->moveBefore(&MI);
3883 } else {
3884 LastPTrue = BuildMI(MBB, MI, DL, TII->get(AArch64::PTRUE_B))
3885 .addReg(*PredReg, RegState::Define)
3886 .addImm(31);
3887 }
3888 MachineInstrs.push_back(LastPTrue);
3889 MachineInstrs.push_back(
3890 BuildMI(MBB, MI, DL, TII->get(AArch64::CMPNE_PPzZI_B))
3891 .addReg(MI.getOperand(0).getReg(), RegState::Define)
3892 .addReg(*PredReg)
3893 .addReg(*ZPredReg)
3894 .addImm(0)
3895 .addReg(AArch64::NZCV, RegState::ImplicitDefine)
3896 .getInstr());
3897 if (IsNZCVUsed)
3898 MachineInstrs.push_back(BuildMI(MBB, MI, DL, TII->get(AArch64::MSR))
3899 .addImm(AArch64SysReg::NZCV)
3900 .addReg(NZCVSaveReg->freeRegister())
3901 .addReg(AArch64::NZCV, RegState::ImplicitDefine)
3902 .getInstr());
3903
3904 propagateFrameFlags(MI, MachineInstrs);
3905 return PredReg.hasSpilled();
3906}
3907
3908/// Expands all FILL_PPR_FROM_ZPR_SLOT_PSEUDO and SPILL_PPR_TO_ZPR_SLOT_PSEUDO
3909/// operations within the MachineBasicBlock \p MBB.
3910static bool expandSMEPPRToZPRSpillPseudos(MachineBasicBlock &MBB,
3911 const TargetRegisterInfo &TRI,
3912 ScavengeableRegs const &SR,
3913 EmergencyStackSlots &SpillSlots) {
3914 LiveRegUnits UsedRegs(TRI);
3915 UsedRegs.addLiveOuts(MBB);
3916 bool HasPPRSpills = false;
3917 MachineInstr *LastPTrue = nullptr;
3918 for (MachineInstr &MI : make_early_inc_range(reverse(MBB))) {
3919 UsedRegs.stepBackward(MI);
3920 switch (MI.getOpcode()) {
3921 case AArch64::FILL_PPR_FROM_ZPR_SLOT_PSEUDO:
3922 if (LastPTrue &&
3923 MI.definesRegister(LastPTrue->getOperand(0).getReg(), &TRI))
3924 LastPTrue = nullptr;
3925 HasPPRSpills |= expandFillPPRFromZPRSlotPseudo(MBB, MI, TRI, UsedRegs, SR,
3926 LastPTrue, SpillSlots);
3927 MI.eraseFromParent();
3928 break;
3929 case AArch64::SPILL_PPR_TO_ZPR_SLOT_PSEUDO:
3930 expandSpillPPRToZPRSlotPseudo(MBB, MI, TRI, UsedRegs, SR, SpillSlots);
3931 MI.eraseFromParent();
3932 [[fallthrough]];
3933 default:
3934 LastPTrue = nullptr;
3935 break;
3936 }
3937 }
3938
3939 return HasPPRSpills;
3940}
3941
3942void AArch64FrameLowering::processFunctionBeforeFrameFinalized(
3943 MachineFunction &MF, RegScavenger *RS) const {
3944
3945 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
3946 const TargetSubtargetInfo &TSI = MF.getSubtarget();
3947 const TargetRegisterInfo &TRI = *TSI.getRegisterInfo();
3948
3949 // If predicates spills are 16-bytes we may need to expand
3950 // SPILL_PPR_TO_ZPR_SLOT_PSEUDO/FILL_PPR_FROM_ZPR_SLOT_PSEUDO.
3951 if (AFI->hasStackFrame() && TRI.getSpillSize(AArch64::PPRRegClass) == 16) {
3952 auto ComputeScavengeableRegisters = [&](unsigned RegClassID) {
3953 BitVector Regs = TRI.getAllocatableSet(MF, TRI.getRegClass(RegClassID));
3954 assert(Regs.count() > 0 && "Expected scavengeable registers");
3955 return Regs;
3956 };
3957
3958 ScavengeableRegs SR{};
3959 SR.ZPRRegs = ComputeScavengeableRegisters(AArch64::ZPRRegClassID);
3960 // Only p0-7 are possible as the second operand of cmpne (needed for fills).
3961 SR.PPR3bRegs = ComputeScavengeableRegisters(AArch64::PPR_3bRegClassID);
3962 SR.GPRRegs = ComputeScavengeableRegisters(AArch64::GPR64RegClassID);
3963
3964 EmergencyStackSlots SpillSlots;
3965 for (MachineBasicBlock &MBB : MF) {
3966 // In the case we had to spill a predicate (in the range p0-p7) to reload
3967 // a predicate (>= p8), additional spill/fill pseudos will be created.
3968 // These need an additional expansion pass. Note: There will only be at
3969 // most two expansion passes, as spilling/filling a predicate in the range
3970 // p0-p7 never requires spilling another predicate.
3971 for (int Pass = 0; Pass < 2; Pass++) {
3972 bool HasPPRSpills =
3973 expandSMEPPRToZPRSpillPseudos(MBB, TRI, SR, SpillSlots);
3974 assert((Pass == 0 || !HasPPRSpills) && "Did not expect PPR spills");
3975 if (!HasPPRSpills)
3976 break;
3977 }
3978 }
3979 }
3980
3981 MachineFrameInfo &MFI = MF.getFrameInfo();
3982
3984 "Upwards growing stack unsupported");
3985
3986 int MinCSFrameIndex, MaxCSFrameIndex;
3987 int64_t SVEStackSize =
3988 assignSVEStackObjectOffsets(MFI, MinCSFrameIndex, MaxCSFrameIndex);
3989
3990 AFI->setStackSizeSVE(alignTo(SVEStackSize, 16U));
3991 AFI->setMinMaxSVECSFrameIndex(MinCSFrameIndex, MaxCSFrameIndex);
3992
3993 // If this function isn't doing Win64-style C++ EH, we don't need to do
3994 // anything.
3995 if (!MF.hasEHFunclets())
3996 return;
3997
3998 // Win64 C++ EH needs to allocate space for the catch objects in the fixed
3999 // object area right next to the UnwindHelp object.
4000 WinEHFuncInfo &EHInfo = *MF.getWinEHFuncInfo();
4001 int64_t CurrentOffset =
4002 AFI->getVarArgsGPRSize() + AFI->getTailCallReservedStack();
4003 for (WinEHTryBlockMapEntry &TBME : EHInfo.TryBlockMap) {
4004 for (WinEHHandlerType &H : TBME.HandlerArray) {
4005 int FrameIndex = H.CatchObj.FrameIndex;
4006 if ((FrameIndex != INT_MAX) && MFI.getObjectOffset(FrameIndex) == 0) {
4007 CurrentOffset =
4008 alignTo(CurrentOffset, MFI.getObjectAlign(FrameIndex).value());
4009 CurrentOffset += MFI.getObjectSize(FrameIndex);
4010 MFI.setObjectOffset(FrameIndex, -CurrentOffset);
4011 }
4012 }
4013 }
4014
4015 // Create an UnwindHelp object.
4016 // The UnwindHelp object is allocated at the start of the fixed object area
4017 int64_t UnwindHelpOffset = alignTo(CurrentOffset + 8, Align(16));
4018 assert(UnwindHelpOffset == getFixedObjectSize(MF, AFI, /*IsWin64*/ true,
4019 /*IsFunclet*/ false) &&
4020 "UnwindHelpOffset must be at the start of the fixed object area");
4021 int UnwindHelpFI = MFI.CreateFixedObject(/*Size*/ 8, -UnwindHelpOffset,
4022 /*IsImmutable=*/false);
4023 EHInfo.UnwindHelpFrameIdx = UnwindHelpFI;
4024
4025 MachineBasicBlock &MBB = MF.front();
4026 auto MBBI = MBB.begin();
4027 while (MBBI != MBB.end() && MBBI->getFlag(MachineInstr::FrameSetup))
4028 ++MBBI;
4029
4030 // We need to store -2 into the UnwindHelp object at the start of the
4031 // function.
4032 DebugLoc DL;
4033 RS->enterBasicBlockEnd(MBB);
4034 RS->backward(MBBI);
4035 Register DstReg = RS->FindUnusedReg(&AArch64::GPR64commonRegClass);
4036 assert(DstReg && "There must be a free register after frame setup");
4037 const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();
4038 BuildMI(MBB, MBBI, DL, TII.get(AArch64::MOVi64imm), DstReg).addImm(-2);
4039 BuildMI(MBB, MBBI, DL, TII.get(AArch64::STURXi))
4040 .addReg(DstReg, getKillRegState(true))
4041 .addFrameIndex(UnwindHelpFI)
4042 .addImm(0);
4043}
4044
4045namespace {
4046struct TagStoreInstr {
4047 MachineInstr *MI;
4048 int64_t Offset, Size;
4049 explicit TagStoreInstr(MachineInstr *MI, int64_t Offset, int64_t Size)
4050 : MI(MI), Offset(Offset), Size(Size) {}
4051};
4052
4053class TagStoreEdit {
4054 MachineFunction *MF;
4055 MachineBasicBlock *MBB;
4056 MachineRegisterInfo *MRI;
4057 // Tag store instructions that are being replaced.
4058 SmallVector<TagStoreInstr, 8> TagStores;
4059 // Combined memref arguments of the above instructions.
4060 SmallVector<MachineMemOperand *, 8> CombinedMemRefs;
4061
4062 // Replace allocation tags in [FrameReg + FrameRegOffset, FrameReg +
4063 // FrameRegOffset + Size) with the address tag of SP.
4064 Register FrameReg;
4065 StackOffset FrameRegOffset;
4066 int64_t Size;
4067 // If not std::nullopt, move FrameReg to (FrameReg + FrameRegUpdate) at the
4068 // end.
4069 std::optional<int64_t> FrameRegUpdate;
4070 // MIFlags for any FrameReg updating instructions.
4071 unsigned FrameRegUpdateFlags;
4072
4073 // Use zeroing instruction variants.
4074 bool ZeroData;
4075 DebugLoc DL;
4076
4077 void emitUnrolled(MachineBasicBlock::iterator InsertI);
4078 void emitLoop(MachineBasicBlock::iterator InsertI);
4079
4080public:
4081 TagStoreEdit(MachineBasicBlock *MBB, bool ZeroData)
4082 : MBB(MBB), ZeroData(ZeroData) {
4083 MF = MBB->getParent();
4084 MRI = &MF->getRegInfo();
4085 }
4086 // Add an instruction to be replaced. Instructions must be added in the
4087 // ascending order of Offset, and have to be adjacent.
4088 void addInstruction(TagStoreInstr I) {
4089 assert((TagStores.empty() ||
4090 TagStores.back().Offset + TagStores.back().Size == I.Offset) &&
4091 "Non-adjacent tag store instructions.");
4092 TagStores.push_back(I);
4093 }
4094 void clear() { TagStores.clear(); }
4095 // Emit equivalent code at the given location, and erase the current set of
4096 // instructions. May skip if the replacement is not profitable. May invalidate
4097 // the input iterator and replace it with a valid one.
4098 void emitCode(MachineBasicBlock::iterator &InsertI,
4099 const AArch64FrameLowering *TFI, bool TryMergeSPUpdate);
4100};
4101
4102void TagStoreEdit::emitUnrolled(MachineBasicBlock::iterator InsertI) {
4103 const AArch64InstrInfo *TII =
4104 MF->getSubtarget<AArch64Subtarget>().getInstrInfo();
4105
4106 const int64_t kMinOffset = -256 * 16;
4107 const int64_t kMaxOffset = 255 * 16;
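 // These bounds follow from the signed 9-bit, 16-byte-scaled immediate of
 // STG/ST2G: [-256 * 16, 255 * 16] = [-4096, 4080] bytes.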
4108
4109 Register BaseReg = FrameReg;
4110 int64_t BaseRegOffsetBytes = FrameRegOffset.getFixed();
4111 if (BaseRegOffsetBytes < kMinOffset ||
4112 BaseRegOffsetBytes + (Size - Size % 32) > kMaxOffset ||
4113 // BaseReg can be FP, which is not necessarily aligned to 16-bytes. In
4114 // that case, BaseRegOffsetBytes will not be aligned to 16 bytes, which
4115 // is required for the offset of ST2G.
4116 BaseRegOffsetBytes % 16 != 0) {
4117 Register ScratchReg = MRI->createVirtualRegister(&AArch64::GPR64RegClass);
4118 emitFrameOffset(*MBB, InsertI, DL, ScratchReg, BaseReg,
4119 StackOffset::getFixed(BaseRegOffsetBytes), TII);
4120 BaseReg = ScratchReg;
4121 BaseRegOffsetBytes = 0;
4122 }
4123
4124 MachineInstr *LastI = nullptr;
4125 while (Size) {
4126 int64_t InstrSize = (Size > 16) ? 32 : 16;
4127 unsigned Opcode =
4128 InstrSize == 16
4129 ? (ZeroData ? AArch64::STZGi : AArch64::STGi)
4130 : (ZeroData ? AArch64::STZ2Gi : AArch64::ST2Gi);
4131 assert(BaseRegOffsetBytes % 16 == 0);
4132 MachineInstr *I = BuildMI(*MBB, InsertI, DL, TII->get(Opcode))
4133 .addReg(AArch64::SP)
4134 .addReg(BaseReg)
4135 .addImm(BaseRegOffsetBytes / 16)
4136 .setMemRefs(CombinedMemRefs);
4137 // A store to [BaseReg, #0] should go last for an opportunity to fold the
4138 // final SP adjustment in the epilogue.
4139 if (BaseRegOffsetBytes == 0)
4140 LastI = I;
4141 BaseRegOffsetBytes += InstrSize;
4142 Size -= InstrSize;
4143 }
4144
4145 if (LastI)
4146 MBB->splice(InsertI, MBB, LastI);
4147}
4148
4149void TagStoreEdit::emitLoop(MachineBasicBlock::iterator InsertI) {
4150 const AArch64InstrInfo *TII =
4151 MF->getSubtarget<AArch64Subtarget>().getInstrInfo();
4152
4153 Register BaseReg = FrameRegUpdate
4154 ? FrameReg
4155 : MRI->createVirtualRegister(&AArch64::GPR64RegClass);
4156 Register SizeReg = MRI->createVirtualRegister(&AArch64::GPR64RegClass);
4157
4158 emitFrameOffset(*MBB, InsertI, DL, BaseReg, FrameReg, FrameRegOffset, TII);
4159
4160 int64_t LoopSize = Size;
4161 // If the loop size is not a multiple of 32, split off one 16-byte store at
4162 // the end to fold BaseReg update into.
4163 if (FrameRegUpdate && *FrameRegUpdate)
4164 LoopSize -= LoopSize % 32;
4165 MachineInstr *LoopI = BuildMI(*MBB, InsertI, DL,
4166 TII->get(ZeroData ? AArch64::STZGloop_wback
4167 : AArch64::STGloop_wback))
4168 .addDef(SizeReg)
4169 .addDef(BaseReg)
4170 .addImm(LoopSize)
4171 .addReg(BaseReg)
4172 .setMemRefs(CombinedMemRefs);
4173 if (FrameRegUpdate)
4174 LoopI->setFlags(FrameRegUpdateFlags);
4175
4176 int64_t ExtraBaseRegUpdate =
4177 FrameRegUpdate ? (*FrameRegUpdate - FrameRegOffset.getFixed() - Size) : 0;
4178 LLVM_DEBUG(dbgs() << "TagStoreEdit::emitLoop: LoopSize=" << LoopSize
4179 << ", Size=" << Size
4180 << ", ExtraBaseRegUpdate=" << ExtraBaseRegUpdate
4181 << ", FrameRegUpdate=" << FrameRegUpdate
4182 << ", FrameRegOffset.getFixed()="
4183 << FrameRegOffset.getFixed() << "\n");
4184 if (LoopSize < Size) {
4185 assert(FrameRegUpdate);
4186 assert(Size - LoopSize == 16);
4187 // Tag 16 more bytes at BaseReg and update BaseReg.
4188 int64_t STGOffset = ExtraBaseRegUpdate + 16;
4189 assert(STGOffset % 16 == 0 && STGOffset >= -4096 && STGOffset <= 4080 &&
4190 "STG immediate out of range");
4191 BuildMI(*MBB, InsertI, DL,
4192 TII->get(ZeroData ? AArch64::STZGPostIndex : AArch64::STGPostIndex))
4193 .addDef(BaseReg)
4194 .addReg(BaseReg)
4195 .addReg(BaseReg)
4196 .addImm(STGOffset / 16)
4197 .setMemRefs(CombinedMemRefs)
4198 .setMIFlags(FrameRegUpdateFlags);
4199 } else if (ExtraBaseRegUpdate) {
4200 // Update BaseReg.
4201 int64_t AddSubOffset = std::abs(ExtraBaseRegUpdate);
4202 assert(AddSubOffset <= 4095 && "ADD/SUB immediate out of range");
4203 BuildMI(
4204 *MBB, InsertI, DL,
4205 TII->get(ExtraBaseRegUpdate > 0 ? AArch64::ADDXri : AArch64::SUBXri))
4206 .addDef(BaseReg)
4207 .addReg(BaseReg)
4208 .addImm(AddSubOffset)
4209 .addImm(0)
4210 .setMIFlags(FrameRegUpdateFlags);
4211 }
4212}
4213
4214// Check if *II is a register update that can be merged into the STGloop that
4215// ends at (Reg + Size). *TotalOffset is set to the full adjustment that the
4216// update instruction applies to Reg.
4217bool canMergeRegUpdate(MachineBasicBlock::iterator II, unsigned Reg,
4218 int64_t Size, int64_t *TotalOffset) {
4219 MachineInstr &MI = *II;
4220 if ((MI.getOpcode() == AArch64::ADDXri ||
4221 MI.getOpcode() == AArch64::SUBXri) &&
4222 MI.getOperand(0).getReg() == Reg && MI.getOperand(1).getReg() == Reg) {
4223 unsigned Shift = AArch64_AM::getShiftValue(MI.getOperand(3).getImm());
4224 int64_t Offset = MI.getOperand(2).getImm() << Shift;
4225 if (MI.getOpcode() == AArch64::SUBXri)
4226 Offset = -Offset;
4227 int64_t PostOffset = Offset - Size;
4228 // TagStoreEdit::emitLoop might emit either an ADD/SUB after the loop, or
4229 // an STGPostIndex which does the last 16 bytes of tag write. Which one is
4230 // chosen depends on the alignment of the loop size, but the difference
4231 // between the valid ranges for the two instructions is small, so we
4232 // conservatively assume that it could be either case here.
4233 //
4234 // Max offset of STGPostIndex, minus the 16 byte tag write folded into that
4235 // instruction.
4236 const int64_t kMaxOffset = 4080 - 16;
4237 // Max offset of SUBXri.
4238 const int64_t kMinOffset = -4095;
4239 if (PostOffset <= kMaxOffset && PostOffset >= kMinOffset &&
4240 PostOffset % 16 == 0) {
4241 *TotalOffset = Offset;
4242 return true;
4243 }
4244 }
4245 return false;
4246}
4247
4248void mergeMemRefs(const SmallVectorImpl<TagStoreInstr> &TSE,
4249 SmallVectorImpl<MachineMemOperand *> &MemRefs) {
4250 MemRefs.clear();
4251 for (auto &TS : TSE) {
4252 MachineInstr *MI = TS.MI;
4253 // An instruction without memory operands may access anything. Be
4254 // conservative and return an empty list.
4255 if (MI->memoperands_empty()) {
4256 MemRefs.clear();
4257 return;
4258 }
4259 MemRefs.append(MI->memoperands_begin(), MI->memoperands_end());
4260 }
4261}
4262
4263void TagStoreEdit::emitCode(MachineBasicBlock::iterator &InsertI,
4264 const AArch64FrameLowering *TFI,
4265 bool TryMergeSPUpdate) {
4266 if (TagStores.empty())
4267 return;
4268 TagStoreInstr &FirstTagStore = TagStores[0];
4269 TagStoreInstr &LastTagStore = TagStores[TagStores.size() - 1];
4270 Size = LastTagStore.Offset - FirstTagStore.Offset + LastTagStore.Size;
4271 DL = TagStores[0].MI->getDebugLoc();
4272
4273 Register Reg;
4274 FrameRegOffset = TFI->resolveFrameOffsetReference(
4275 *MF, FirstTagStore.Offset, false /*isFixed*/, false /*isSVE*/, Reg,
4276 /*PreferFP=*/false, /*ForSimm=*/true);
4277 FrameReg = Reg;
4278 FrameRegUpdate = std::nullopt;
4279
4280 mergeMemRefs(TagStores, CombinedMemRefs);
4281
4282 LLVM_DEBUG({
4283 dbgs() << "Replacing adjacent STG instructions:\n";
4284 for (const auto &Instr : TagStores) {
4285 dbgs() << " " << *Instr.MI;
4286 }
4287 });
4288
4289 // Size threshold where a loop becomes shorter than a linear sequence of
4290 // tagging instructions.
4291 const int kSetTagLoopThreshold = 176;
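 // Sizes below 176 bytes (11 granules) unroll to at most five ST2G/STG
 // stores; anything at or above that is emitted as an STGloop instead.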
4292 if (Size < kSetTagLoopThreshold) {
4293 if (TagStores.size() < 2)
4294 return;
4295 emitUnrolled(InsertI);
4296 } else {
4297 MachineInstr *UpdateInstr = nullptr;
4298 int64_t TotalOffset = 0;
4299 if (TryMergeSPUpdate) {
4300 // See if we can merge base register update into the STGloop.
4301 // This is done in AArch64LoadStoreOptimizer for "normal" stores,
4302 // but STGloop is way too unusual for that, and also it only
4303 // realistically happens in function epilogue. Also, STGloop is expanded
4304 // before that pass.
4305 if (InsertI != MBB->end() &&
4306 canMergeRegUpdate(InsertI, FrameReg, FrameRegOffset.getFixed() + Size,
4307 &TotalOffset)) {
4308 UpdateInstr = &*InsertI++;
4309 LLVM_DEBUG(dbgs() << "Folding SP update into loop:\n "
4310 << *UpdateInstr);
4311 }
4312 }
4313
4314 if (!UpdateInstr && TagStores.size() < 2)
4315 return;
4316
4317 if (UpdateInstr) {
4318 FrameRegUpdate = TotalOffset;
4319 FrameRegUpdateFlags = UpdateInstr->getFlags();
4320 }
4321 emitLoop(InsertI);
4322 if (UpdateInstr)
4323 UpdateInstr->eraseFromParent();
4324 }
4325
4326 for (auto &TS : TagStores)
4327 TS.MI->eraseFromParent();
4328}
4329
4330bool isMergeableStackTaggingInstruction(MachineInstr &MI, int64_t &Offset,
4331 int64_t &Size, bool &ZeroData) {
4332 MachineFunction &MF = *MI.getParent()->getParent();
4333 const MachineFrameInfo &MFI = MF.getFrameInfo();
4334
4335 unsigned Opcode = MI.getOpcode();
4336 ZeroData = (Opcode == AArch64::STZGloop || Opcode == AArch64::STZGi ||
4337 Opcode == AArch64::STZ2Gi);
4338
4339 if (Opcode == AArch64::STGloop || Opcode == AArch64::STZGloop) {
4340 if (!MI.getOperand(0).isDead() || !MI.getOperand(1).isDead())
4341 return false;
4342 if (!MI.getOperand(2).isImm() || !MI.getOperand(3).isFI())
4343 return false;
4344 Offset = MFI.getObjectOffset(MI.getOperand(3).getIndex());
4345 Size = MI.getOperand(2).getImm();
4346 return true;
4347 }
4348
4349 if (Opcode == AArch64::STGi || Opcode == AArch64::STZGi)
4350 Size = 16;
4351 else if (Opcode == AArch64::ST2Gi || Opcode == AArch64::STZ2Gi)
4352 Size = 32;
4353 else
4354 return false;
4355
4356 if (MI.getOperand(0).getReg() != AArch64::SP || !MI.getOperand(1).isFI())
4357 return false;
4358
4359 Offset = MFI.getObjectOffset(MI.getOperand(1).getIndex()) +
4360 16 * MI.getOperand(2).getImm();
4361 return true;
4362}
4363
4364// Detect a run of memory tagging instructions for adjacent stack frame slots,
4365// and replace them with a shorter instruction sequence:
4366// * replace STG + STG with ST2G
4367// * replace STGloop + STGloop with STGloop
4368// This code needs to run when stack slot offsets are already known, but before
4369// FrameIndex operands in STG instructions are eliminated.
4370MachineBasicBlock::iterator tryMergeAdjacentSTG(MachineBasicBlock::iterator II,
4371 const AArch64FrameLowering *TFI,
4372 RegScavenger *RS) {
4373 bool FirstZeroData;
4374 int64_t Size, Offset;
4375 MachineInstr &MI = *II;
4376 MachineBasicBlock *MBB = MI.getParent();
4377 MachineBasicBlock::iterator NextI = std::next(II);
4378 if (&MI == &MBB->instr_back())
4379 return II;
4380 if (!isMergeableStackTaggingInstruction(MI, Offset, Size, FirstZeroData))
4381 return II;
4382
4383 SmallVector<TagStoreInstr, 8> Instrs;
4384 Instrs.emplace_back(&MI, Offset, Size);
4385
4386 constexpr int kScanLimit = 10;
4387 int Count = 0;
4388 for (MachineBasicBlock::iterator E = MBB->end();
4389 NextI != E && Count < kScanLimit; ++NextI) {
4390 MachineInstr &MI = *NextI;
4391 bool ZeroData;
4392 int64_t Size, Offset;
4393 // Collect instructions that update memory tags with a FrameIndex operand
4394 // and (when applicable) constant size, and whose output registers are dead
4395 // (the latter is almost always the case in practice). Since these
4396 // instructions effectively have no inputs or outputs, we are free to skip
4397 // any non-aliasing instructions in between without tracking used registers.
4398 if (isMergeableStackTaggingInstruction(MI, Offset, Size, ZeroData)) {
4399 if (ZeroData != FirstZeroData)
4400 break;
4401 Instrs.emplace_back(&MI, Offset, Size);
4402 continue;
4403 }
4404
4405 // Only count non-transient, non-tagging instructions toward the scan
4406 // limit.
4407 if (!MI.isTransient())
4408 ++Count;
4409
4410 // Just in case, stop before the epilogue code starts.
4411 if (MI.getFlag(MachineInstr::FrameSetup) ||
4412 MI.getFlag(MachineInstr::FrameDestroy))
4413 break;
4414
4415 // Reject anything that may alias the collected instructions.
4416 if (MI.mayLoadOrStore() || MI.hasUnmodeledSideEffects() || MI.isCall())
4417 break;
4418 }
4419
4420 // New code will be inserted after the last tagging instruction we've found.
4421 MachineBasicBlock::iterator InsertI = Instrs.back().MI;
4422
4423 // All the gathered stack tag instructions are merged and placed after the
4424 // last tag store in the list. Before inserting, check whether the NZCV
4425 // flag is live at the insertion point; otherwise NZCV may be clobbered if
4426 // any STG loops are present in the merged sequence.
4427
4428 // FIXME: Bailing out of the merge here is conservative: the liveness check
4429 // is performed even when the merged sequence contains no STG loops, where
4430 // it is not needed.
4431 LiveRegUnits LiveRegs(*(MBB->getParent()->getSubtarget().getRegisterInfo()));
4432 LiveRegs.addLiveOuts(*MBB);
4433 for (auto I = MBB->rbegin();; ++I) {
4434 MachineInstr &MI = *I;
4435 if (MI == InsertI)
4436 break;
4437 LiveRegs.stepBackward(*I);
4438 }
4439 InsertI++;
4440 if (LiveRegs.contains(AArch64::NZCV))
4441 return InsertI;
4442
4443 llvm::stable_sort(Instrs,
4444 [](const TagStoreInstr &Left, const TagStoreInstr &Right) {
4445 return Left.Offset < Right.Offset;
4446 });
4447
4448 // Make sure that we don't have any overlapping stores.
4449 int64_t CurOffset = Instrs[0].Offset;
4450 for (auto &Instr : Instrs) {
4451 if (CurOffset > Instr.Offset)
4452 return NextI;
4453 CurOffset = Instr.Offset + Instr.Size;
4454 }
4455
4456 // Find contiguous runs of tagged memory and emit shorter instruction
4457 // sequences for them when possible.
4458 TagStoreEdit TSE(MBB, FirstZeroData);
4459 std::optional<int64_t> EndOffset;
4460 for (auto &Instr : Instrs) {
4461 if (EndOffset && *EndOffset != Instr.Offset) {
4462 // Found a gap.
4463 TSE.emitCode(InsertI, TFI, /*TryMergeSPUpdate = */ false);
4464 TSE.clear();
4465 }
4466
4467 TSE.addInstruction(Instr);
4468 EndOffset = Instr.Offset + Instr.Size;
4469 }
4470
4471 const MachineFunction *MF = MBB->getParent();
4472 // Multiple FP/SP updates in a loop cannot be described by CFI instructions.
4473 TSE.emitCode(
4474 InsertI, TFI, /*TryMergeSPUpdate = */
4475 !MF->getInfo<AArch64FunctionInfo>()->needsAsyncDwarfUnwindInfo(*MF));
4476
4477 return InsertI;
4478}
4479} // namespace
4480
4481void AArch64FrameLowering::processFunctionBeforeFrameIndicesReplaced(
4482 MachineFunction &MF, RegScavenger *RS = nullptr) const {
4483 for (auto &BB : MF)
4484 for (MachineBasicBlock::iterator II = BB.begin(); II != BB.end();) {
4485 if (StackTaggingMergeSetTag)
4486 II = tryMergeAdjacentSTG(II, this, RS);
4487 }
4488
4489 // By the time this method is called, most of the prologue/epilogue code is
4490 // already emitted, whether its location was affected by the shrink-wrapping
4491 // optimization or not.
4492 if (!MF.getFunction().hasFnAttribute(Attribute::Naked) &&
4493 shouldSignReturnAddressEverywhere(MF))
4495}
4496
4497/// For Win64 AArch64 EH, the offset to the Unwind object is from the SP
4498/// before the update. This is easily retrieved as it is exactly the offset
4499/// that is set in processFunctionBeforeFrameFinalized.
4500StackOffset AArch64FrameLowering::getFrameIndexReferencePreferSP(
4501 const MachineFunction &MF, int FI, Register &FrameReg,
4502 bool IgnoreSPUpdates) const {
4503 const MachineFrameInfo &MFI = MF.getFrameInfo();
4504 if (IgnoreSPUpdates) {
4505 LLVM_DEBUG(dbgs() << "Offset from the SP for " << FI << " is "
4506 << MFI.getObjectOffset(FI) << "\n");
4507 FrameReg = AArch64::SP;
4508 return StackOffset::getFixed(MFI.getObjectOffset(FI));
4509 }
4510
4511 // Go to common code if we cannot provide sp + offset.
4512 if (MFI.hasVarSizedObjects() ||
4513 MF.getInfo<AArch64FunctionInfo>()->getStackSizeSVE() ||
4514 MF.getSubtarget().getRegisterInfo()->hasStackRealignment(MF))
4515 return getFrameIndexReference(MF, FI, FrameReg);
4516
4517 FrameReg = AArch64::SP;
4518 return getStackOffset(MF, MFI.getObjectOffset(FI));
4519}
4520
4521/// The parent frame offset (aka dispFrame) is only used on X86_64 to retrieve
4522/// the parent's frame pointer
4523unsigned AArch64FrameLowering::getWinEHParentFrameOffset(
4524 const MachineFunction &MF) const {
4525 return 0;
4526}
4527
4528/// Funclets only need to account for space for the callee saved registers,
4529/// as the locals are accounted for in the parent's stack frame.
4530unsigned AArch64FrameLowering::getWinEHFuncletFrameSize(
4531 const MachineFunction &MF) const {
4532 // This is the size of the pushed CSRs.
4533 unsigned CSSize =
4534 MF.getInfo<AArch64FunctionInfo>()->getCalleeSavedStackSize();
4535 // This is the amount of stack a funclet needs to allocate.
4536 return alignTo(CSSize + MF.getFrameInfo().getMaxCallFrameSize(),
4537 getStackAlign());
4538}
4539
4540namespace {
4541struct FrameObject {
4542 bool IsValid = false;
4543 // Index of the object in MFI.
4544 int ObjectIndex = 0;
4545 // Group ID this object belongs to.
4546 int GroupIndex = -1;
4547 // This object should be placed first (closest to SP).
4548 bool ObjectFirst = false;
4549 // This object's group (which always contains the object with
4550 // ObjectFirst==true) should be placed first.
4551 bool GroupFirst = false;
4552
4553 // Used to distinguish between FP and GPR accesses. The values are decided so
4554 // that they sort FPR < Hazard < GPR and they can be or'd together.
4555 unsigned Accesses = 0;
4556 enum { AccessFPR = 1, AccessHazard = 2, AccessGPR = 4 };
4557};
4558
4559class GroupBuilder {
4560 SmallVector<int, 8> CurrentMembers;
4561 int NextGroupIndex = 0;
4562 std::vector<FrameObject> &Objects;
4563
4564public:
4565 GroupBuilder(std::vector<FrameObject> &Objects) : Objects(Objects) {}
4566 void AddMember(int Index) { CurrentMembers.push_back(Index); }
4567 void EndCurrentGroup() {
4568 if (CurrentMembers.size() > 1) {
4569 // Create a new group with the current member list. This might remove them
4570 // from their pre-existing groups. That's OK, dealing with overlapping
4571 // groups is too hard and unlikely to make a difference.
4572 LLVM_DEBUG(dbgs() << "group:");
4573 for (int Index : CurrentMembers) {
4574 Objects[Index].GroupIndex = NextGroupIndex;
4575 LLVM_DEBUG(dbgs() << " " << Index);
4576 }
4577 LLVM_DEBUG(dbgs() << "\n");
4578 NextGroupIndex++;
4579 }
4580 CurrentMembers.clear();
4581 }
4582};
4583
4584bool FrameObjectCompare(const FrameObject &A, const FrameObject &B) {
4585 // Objects at a lower index are closer to FP; objects at a higher index are
4586 // closer to SP.
4587 //
4588 // For consistency in our comparison, all invalid objects are placed
4589 // at the end. This also allows us to stop walking when we hit the
4590 // first invalid item after it's all sorted.
4591 //
4592 // If we want to include a stack hazard region, order FPR accesses < the
4593 // hazard object < GPRs accesses in order to create a separation between the
4594 // two. For the Accesses field 1 = FPR, 2 = Hazard Object, 4 = GPR.
4595 //
4596 // Otherwise the "first" object goes first (closest to SP), followed by the
4597 // members of the "first" group.
4598 //
4599 // The rest are sorted by the group index to keep the groups together.
4600 // Higher numbered groups are more likely to be around longer (i.e. untagged
4601 // in the function epilogue and not at some earlier point). Place them closer
4602 // to SP.
4603 //
4604 // If all else equal, sort by the object index to keep the objects in the
4605 // original order.
4606 return std::make_tuple(!A.IsValid, A.Accesses, A.ObjectFirst, A.GroupFirst,
4607 A.GroupIndex, A.ObjectIndex) <
4608 std::make_tuple(!B.IsValid, B.Accesses, B.ObjectFirst, B.GroupFirst,
4609 B.GroupIndex, B.ObjectIndex);
4610}
4611} // namespace
4612
4613void AArch64FrameLowering::orderFrameObjects(
4614 const MachineFunction &MF, SmallVectorImpl<int> &ObjectsToAllocate) const {
4615 if (!OrderFrameObjects || ObjectsToAllocate.empty())
4616 return;
4617
4618 const AArch64FunctionInfo &AFI = *MF.getInfo<AArch64FunctionInfo>();
4619 const MachineFrameInfo &MFI = MF.getFrameInfo();
4620 std::vector<FrameObject> FrameObjects(MFI.getObjectIndexEnd());
4621 for (auto &Obj : ObjectsToAllocate) {
4622 FrameObjects[Obj].IsValid = true;
4623 FrameObjects[Obj].ObjectIndex = Obj;
4624 }
4625
4626 // Identify FPR vs GPR slots for hazards, and stack slots that are tagged at
4627 // the same time.
4628 GroupBuilder GB(FrameObjects);
4629 for (auto &MBB : MF) {
4630 for (auto &MI : MBB) {
4631 if (MI.isDebugInstr())
4632 continue;
4633
4634 if (AFI.hasStackHazardSlotIndex()) {
4635 std::optional<int> FI = getLdStFrameID(MI, MFI);
4636 if (FI && *FI >= 0 && *FI < (int)FrameObjects.size()) {
4637 if (MFI.getStackID(*FI) == TargetStackID::ScalableVector ||
4638 AArch64InstrInfo::isFpOrNEON(MI))
4639 FrameObjects[*FI].Accesses |= FrameObject::AccessFPR;
4640 else
4641 FrameObjects[*FI].Accesses |= FrameObject::AccessGPR;
4642 }
4643 }
4644
4645 int OpIndex;
4646 switch (MI.getOpcode()) {
4647 case AArch64::STGloop:
4648 case AArch64::STZGloop:
4649 OpIndex = 3;
4650 break;
4651 case AArch64::STGi:
4652 case AArch64::STZGi:
4653 case AArch64::ST2Gi:
4654 case AArch64::STZ2Gi:
4655 OpIndex = 1;
4656 break;
4657 default:
4658 OpIndex = -1;
4659 }
4660
4661 int TaggedFI = -1;
4662 if (OpIndex >= 0) {
4663 const MachineOperand &MO = MI.getOperand(OpIndex);
4664 if (MO.isFI()) {
4665 int FI = MO.getIndex();
4666 if (FI >= 0 && FI < MFI.getObjectIndexEnd() &&
4667 FrameObjects[FI].IsValid)
4668 TaggedFI = FI;
4669 }
4670 }
4671
4672 // If this is a stack tagging instruction for a slot that is not part of a
4673 // group yet, either start a new group or add it to the current one.
4674 if (TaggedFI >= 0)
4675 GB.AddMember(TaggedFI);
4676 else
4677 GB.EndCurrentGroup();
4678 }
4679 // Groups should never span multiple basic blocks.
4680 GB.EndCurrentGroup();
4681 }
4682
4683 if (AFI.hasStackHazardSlotIndex()) {
4684 FrameObjects[AFI.getStackHazardSlotIndex()].Accesses =
4685 FrameObject::AccessHazard;
4686 // If a stack object is unknown or both GPR and FPR, sort it into GPR.
4687 for (auto &Obj : FrameObjects)
4688 if (!Obj.Accesses ||
4689 Obj.Accesses == (FrameObject::AccessGPR | FrameObject::AccessFPR))
4690 Obj.Accesses = FrameObject::AccessGPR;
4691 }
4692
4693 // If the function's tagged base pointer is pinned to a stack slot, we want to
4694 // put that slot first when possible. This will likely place it at SP + 0,
4695 // and save one instruction when generating the base pointer because IRG does
4696 // not allow an immediate offset.
4697 std::optional<int> TBPI = AFI.getTaggedBasePointerIndex();
4698 if (TBPI) {
4699 FrameObjects[*TBPI].ObjectFirst = true;
4700 FrameObjects[*TBPI].GroupFirst = true;
4701 int FirstGroupIndex = FrameObjects[*TBPI].GroupIndex;
4702 if (FirstGroupIndex >= 0)
4703 for (FrameObject &Object : FrameObjects)
4704 if (Object.GroupIndex == FirstGroupIndex)
4705 Object.GroupFirst = true;
4706 }
4707
4708 llvm::stable_sort(FrameObjects, FrameObjectCompare);
4709
4710 int i = 0;
4711 for (auto &Obj : FrameObjects) {
4712 // All invalid items are sorted at the end, so it's safe to stop.
4713 if (!Obj.IsValid)
4714 break;
4715 ObjectsToAllocate[i++] = Obj.ObjectIndex;
4716 }
4717
4718 LLVM_DEBUG({
4719 dbgs() << "Final frame order:\n";
4720 for (auto &Obj : FrameObjects) {
4721 if (!Obj.IsValid)
4722 break;
4723 dbgs() << " " << Obj.ObjectIndex << ": group " << Obj.GroupIndex;
4724 if (Obj.ObjectFirst)
4725 dbgs() << ", first";
4726 if (Obj.GroupFirst)
4727 dbgs() << ", group-first";
4728 dbgs() << "\n";
4729 }
4730 });
4731}
4732
4733/// Emit a loop to decrement SP until it is equal to TargetReg, with probes at
4734/// least every ProbeSize bytes. Returns an iterator of the first instruction
4735/// after the loop. The difference between SP and TargetReg must be an exact
4736/// multiple of ProbeSize.
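/// With, say, a 4096-byte ProbeSize the emitted loop is roughly:
///   LoopMBB:  sub  sp, sp, #4096
///             str  xzr, [sp]
///             cmp  sp, <TargetReg>
///             b.ne LoopMBB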
4737MachineBasicBlock::iterator
4738AArch64FrameLowering::inlineStackProbeLoopExactMultiple(
4739 MachineBasicBlock::iterator MBBI, int64_t ProbeSize,
4740 Register TargetReg) const {
4741 MachineBasicBlock &MBB = *MBBI->getParent();
4742 MachineFunction &MF = *MBB.getParent();
4743 const AArch64InstrInfo *TII =
4744 MF.getSubtarget<AArch64Subtarget>().getInstrInfo();
4745 DebugLoc DL = MBB.findDebugLoc(MBBI);
4746
4747 MachineFunction::iterator MBBInsertPoint = std::next(MBB.getIterator());
4748 MachineBasicBlock *LoopMBB = MF.CreateMachineBasicBlock(MBB.getBasicBlock());
4749 MF.insert(MBBInsertPoint, LoopMBB);
4750 MachineBasicBlock *ExitMBB = MF.CreateMachineBasicBlock(MBB.getBasicBlock());
4751 MF.insert(MBBInsertPoint, ExitMBB);
4752
4753 // SUB SP, SP, #ProbeSize (or equivalent if ProbeSize is not encodable
4754 // in SUB).
4755 emitFrameOffset(*LoopMBB, LoopMBB->end(), DL, AArch64::SP, AArch64::SP,
4756 StackOffset::getFixed(-ProbeSize), TII,
4757 MachineInstr::FrameSetup);
4758 // STR XZR, [SP]
4759 BuildMI(*LoopMBB, LoopMBB->end(), DL, TII->get(AArch64::STRXui))
4760 .addReg(AArch64::XZR)
4761 .addReg(AArch64::SP)
4762 .addImm(0)
4763 .setMIFlags(MachineInstr::FrameSetup);
4764 // CMP SP, TargetReg
4765 BuildMI(*LoopMBB, LoopMBB->end(), DL, TII->get(AArch64::SUBSXrx64),
4766 AArch64::XZR)
4767 .addReg(AArch64::SP)
4768 .addReg(TargetReg)
4769 .addImm(AArch64_AM::getArithExtendImm(AArch64_AM::UXTX, 0))
4770 .setMIFlags(MachineInstr::FrameSetup);
4771 // B.CC Loop
4772 BuildMI(*LoopMBB, LoopMBB->end(), DL, TII->get(AArch64::Bcc))
4773 .addImm(AArch64CC::NE)
4774 .addMBB(LoopMBB)
4775 .setMIFlags(MachineInstr::FrameSetup);
4776
4777 LoopMBB->addSuccessor(ExitMBB);
4778 LoopMBB->addSuccessor(LoopMBB);
4779 // Synthesize the exit MBB.
4780 ExitMBB->splice(ExitMBB->end(), &MBB, MBBI, MBB.end());
4781 ExitMBB->transferSuccessorsAndUpdatePHIs(&MBB);
4782 MBB.addSuccessor(LoopMBB);
4783 // Update liveins.
4784 fullyRecomputeLiveIns({ExitMBB, LoopMBB});
4785
4786 return ExitMBB->begin();
4787}
4788
4789void AArch64FrameLowering::inlineStackProbeFixed(
4790 MachineBasicBlock::iterator MBBI, Register ScratchReg, int64_t FrameSize,
4791 StackOffset CFAOffset) const {
4792 MachineBasicBlock *MBB = MBBI->getParent();
4793 MachineFunction &MF = *MBB->getParent();
4794 const AArch64InstrInfo *TII =
4795 MF.getSubtarget<AArch64Subtarget>().getInstrInfo();
4796 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
4797 bool EmitAsyncCFI = AFI->needsAsyncDwarfUnwindInfo(MF);
4798 bool HasFP = hasFP(MF);
4799
4800 DebugLoc DL;
4801 int64_t ProbeSize = MF.getInfo<AArch64FunctionInfo>()->getStackProbeSize();
4802 int64_t NumBlocks = FrameSize / ProbeSize;
4803 int64_t ResidualSize = FrameSize % ProbeSize;
4804
4805 LLVM_DEBUG(dbgs() << "Stack probing: total " << FrameSize << " bytes, "
4806 << NumBlocks << " blocks of " << ProbeSize
4807 << " bytes, plus " << ResidualSize << " bytes\n");
4808
4809 // Decrement SP by NumBlocks * ProbeSize bytes, using either an unrolled
4810 // sequence or an ordinary loop.
4811 if (NumBlocks <= AArch64::StackProbeMaxLoopUnroll) {
4812 for (int i = 0; i < NumBlocks; ++i) {
4813 // SUB SP, SP, #ProbeSize (or equivalent if ProbeSize is not
4814 // encodable in a SUB).
4815 emitFrameOffset(*MBB, MBBI, DL, AArch64::SP, AArch64::SP,
4816 StackOffset::getFixed(-ProbeSize), TII,
4817 MachineInstr::FrameSetup, false, false, nullptr,
4818 EmitAsyncCFI && !HasFP, CFAOffset);
4819 CFAOffset += StackOffset::getFixed(ProbeSize);
4820 // STR XZR, [SP]
4821 BuildMI(*MBB, MBBI, DL, TII->get(AArch64::STRXui))
4822 .addReg(AArch64::XZR)
4823 .addReg(AArch64::SP)
4824 .addImm(0)
4825 .setMIFlags(MachineInstr::FrameSetup);
4826 }
4827 } else if (NumBlocks != 0) {
4828 // SUB ScratchReg, SP, #(NumBlocks * ProbeSize) (or equivalent if not
4829 // encodable in SUB). ScratchReg may temporarily become the CFA register.
4830 emitFrameOffset(*MBB, MBBI, DL, ScratchReg, AArch64::SP,
4831 StackOffset::getFixed(-ProbeSize * NumBlocks), TII,
4832 MachineInstr::FrameSetup, false, false, nullptr,
4833 EmitAsyncCFI && !HasFP, CFAOffset);
4834 CFAOffset += StackOffset::getFixed(ProbeSize * NumBlocks);
4835 MBBI = inlineStackProbeLoopExactMultiple(MBBI, ProbeSize, ScratchReg);
4836 MBB = MBBI->getParent();
4837 if (EmitAsyncCFI && !HasFP) {
4838 // Set the CFA register back to SP.
4839 CFIInstBuilder(*MBB, MBBI, MachineInstr::FrameSetup)
4840 .buildDefCFARegister(AArch64::SP);
4841 }
4842 }
4843
4844 if (ResidualSize != 0) {
4845 // SUB SP, SP, #ResidualSize (or equivalent if ResidualSize is not encodable
4846 // in SUB).
4847 emitFrameOffset(*MBB, MBBI, DL, AArch64::SP, AArch64::SP,
4848 StackOffset::getFixed(-ResidualSize), TII,
4849 MachineInstr::FrameSetup, false, false, nullptr,
4850 EmitAsyncCFI && !HasFP, CFAOffset);
4851 if (ResidualSize > AArch64::StackProbeMaxUnprobedStack) {
4852 // STR XZR, [SP]
4853 BuildMI(*MBB, MBBI, DL, TII->get(AArch64::STRXui))
4854 .addReg(AArch64::XZR)
4855 .addReg(AArch64::SP)
4856 .addImm(0)
4857 .setMIFlags(MachineInstr::FrameSetup);
4858 }
4859 }
4860}
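// The block/residual split computed above can be checked in isolation with a
// small standalone sketch (illustrative only; the unroll limit of 4 and the
// 1024-byte unprobed-stack limit are assumed values standing in for
// AArch64::StackProbeMaxLoopUnroll and AArch64::StackProbeMaxUnprobedStack):
#include <cstdint>
#include <cstdio>

struct ProbePlan {
  int64_t NumBlocks;    // full ProbeSize-sized decrements
  int64_t ResidualSize; // final decrement, always < ProbeSize
  bool UnrollBlocks;    // emit the blocks inline rather than as a loop
  bool ProbeResidual;   // residual needs its own trailing STR XZR, [SP]
};

inline ProbePlan planStackProbes(int64_t FrameSize, int64_t ProbeSize) {
  ProbePlan P;
  P.NumBlocks = FrameSize / ProbeSize;
  P.ResidualSize = FrameSize % ProbeSize;
  P.UnrollBlocks = P.NumBlocks <= 4;       // assumed StackProbeMaxLoopUnroll
  P.ProbeResidual = P.ResidualSize > 1024; // assumed StackProbeMaxUnprobedStack
  return P;
}

// For example, a 20000-byte frame with a 4096-byte probe size gives 4 blocks
// of 4096 bytes plus a 3616-byte residual; the residual is itself probed
// because it would leave more than the assumed 1024 unprobed bytes above SP.
int main() {
  ProbePlan P = planStackProbes(20000, 4096);
  std::printf("%lld blocks of 4096, residual %lld, probe residual: %d\n",
              (long long)P.NumBlocks, (long long)P.ResidualSize,
              (int)P.ProbeResidual);
  return 0;
}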
4861
4862void AArch64FrameLowering::inlineStackProbe(MachineFunction &MF,
4863 MachineBasicBlock &MBB) const {
4864 // Get the instructions that need to be replaced. We emit at most two of
4865 // these. Remember them in order to avoid complications coming from the need
4866 // to traverse the block while potentially creating more blocks.
4867 SmallVector<MachineInstr *, 4> ToReplace;
4868 for (MachineInstr &MI : MBB)
4869 if (MI.getOpcode() == AArch64::PROBED_STACKALLOC ||
4870 MI.getOpcode() == AArch64::PROBED_STACKALLOC_VAR)
4871 ToReplace.push_back(&MI);
4872
4873 for (MachineInstr *MI : ToReplace) {
4874 if (MI->getOpcode() == AArch64::PROBED_STACKALLOC) {
4875 Register ScratchReg = MI->getOperand(0).getReg();
4876 int64_t FrameSize = MI->getOperand(1).getImm();
4877 StackOffset CFAOffset = StackOffset::get(MI->getOperand(2).getImm(),
4878 MI->getOperand(3).getImm());
4879 inlineStackProbeFixed(MI->getIterator(), ScratchReg, FrameSize,
4880 CFAOffset);
4881 } else {
4882 assert(MI->getOpcode() == AArch64::PROBED_STACKALLOC_VAR &&
4883 "Stack probe pseudo-instruction expected");
4884 const AArch64InstrInfo *TII =
4885 MI->getMF()->getSubtarget<AArch64Subtarget>().getInstrInfo();
4886 Register TargetReg = MI->getOperand(0).getReg();
4887 (void)TII->probedStackAlloc(MI->getIterator(), TargetReg, true);
4888 }
4889 MI->eraseFromParent();
4890 }
4891}
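// The two-pass structure above (collect the pseudos, then expand them) avoids
// iterating a block that is being split while it is rewritten. A minimal
// standalone sketch of the same pattern (illustrative only; Instr is a
// stand-in, not an LLVM type):
#include <list>
#include <vector>

struct Instr {
  bool IsProbePseudo = false;
  bool Expanded = false;
};

inline void expandProbePseudos(std::list<Instr> &Block) {
  std::vector<Instr *> ToReplace;
  for (Instr &I : Block)
    if (I.IsProbePseudo)
      ToReplace.push_back(&I); // remember first, while iteration is simple
  for (Instr *I : ToReplace)
    I->Expanded = true;        // stand-in for expanding and erasing the pseudo
}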
4892
4893struct StackAccess {
4894 enum AccessType {
4895 NotAccessed = 0, // Stack object not accessed by load/store instructions.
4896 GPR = 1 << 0, // A general purpose register.
4897 PPR = 1 << 1, // A predicate register.
4898 FPR = 1 << 2, // A floating point/Neon/SVE register.
4899 };
4900
4901 int Idx;
4902 StackOffset Offset;
4903 int64_t Size;
4904 unsigned AccessTypes;
4905
4906 StackAccess() : Idx(0), Size(0), AccessTypes(NotAccessed) {}
4907
4908 bool operator<(const StackAccess &Rhs) const {
4909 return std::make_tuple(start(), Idx) <
4910 std::make_tuple(Rhs.start(), Rhs.Idx);
4911 }
4912
4913 bool isCPU() const {
4914 // Predicate register load and store instructions execute on the CPU.
4915 return AccessTypes & (AccessType::GPR | AccessType::PPR);
4916 }
4917 bool isSME() const { return AccessTypes & AccessType::FPR; }
4918 bool isMixed() const { return isCPU() && isSME(); }
4919
4920 int64_t start() const { return Offset.getFixed() + Offset.getScalable(); }
4921 int64_t end() const { return start() + Size; }
4922
4923 std::string getTypeString() const {
4924 switch (AccessTypes) {
4925 case AccessType::FPR:
4926 return "FPR";
4927 case AccessType::PPR:
4928 return "PPR";
4929 case AccessType::GPR:
4930 return "GPR";
4931 case AccessType::NotAccessed:
4932 return "NA";
4933 default:
4934 return "Mixed";
4935 }
4936 }
4937
4938 void print(raw_ostream &OS) const {
4939 OS << getTypeString() << " stack object at [SP"
4940 << (Offset.getFixed() < 0 ? "" : "+") << Offset.getFixed();
4941 if (Offset.getScalable())
4942 OS << (Offset.getScalable() < 0 ? "" : "+") << Offset.getScalable()
4943 << " * vscale";
4944 OS << "]";
4945 }
4946};
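// A quick standalone illustration of the classification above (MiniAccess
// mirrors only the AccessTypes bit-set; the access pattern is an assumed
// example): loads/stores through GPR or PPR registers mark a slot as a "CPU"
// access, FPR/NEON/SVE data accesses mark it as an "SME" access, and a slot
// touched both ways is reported as mixed.
#include <cassert>

struct MiniAccess {
  enum { NotAccessed = 0, GPR = 1 << 0, PPR = 1 << 1, FPR = 1 << 2 };
  unsigned AccessTypes = NotAccessed;
  bool isCPU() const { return AccessTypes & (GPR | PPR); }
  bool isSME() const { return AccessTypes & FPR; }
  bool isMixed() const { return isCPU() && isSME(); }
};

int main() {
  MiniAccess Slot;
  Slot.AccessTypes |= MiniAccess::GPR; // e.g. LDR Xn, [SP, #off]
  Slot.AccessTypes |= MiniAccess::FPR; // e.g. STR Qn, [SP, #off]
  assert(Slot.isMixed());              // emitRemarks below reports such slots
  return 0;
}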
4947
4948static inline raw_ostream &operator<<(raw_ostream &OS, const StackAccess &SA) {
4949 SA.print(OS);
4950 return OS;
4951}
4952
4953void AArch64FrameLowering::emitRemarks(
4954 const MachineFunction &MF, MachineOptimizationRemarkEmitter *ORE) const {
4955
4956 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
4957 if (AFI->getSMEFnAttrs().hasNonStreamingInterfaceAndBody())
4958 return;
4959
4960 unsigned StackHazardSize = getStackHazardSize(MF);
4961 const uint64_t HazardSize =
4962 (StackHazardSize) ? StackHazardSize : StackHazardRemarkSize;
4963
4964 if (HazardSize == 0)
4965 return;
4966
4967 const MachineFrameInfo &MFI = MF.getFrameInfo();
4968 // Bail if function has no stack objects.
4969 if (!MFI.hasStackObjects())
4970 return;
4971
4972 std::vector<StackAccess> StackAccesses(MFI.getNumObjects());
4973
4974 size_t NumFPLdSt = 0;
4975 size_t NumNonFPLdSt = 0;
4976
4977 // Collect stack accesses via Load/Store instructions.
4978 for (const MachineBasicBlock &MBB : MF) {
4979 for (const MachineInstr &MI : MBB) {
4980 if (!MI.mayLoadOrStore() || MI.getNumMemOperands() < 1)
4981 continue;
4982 for (MachineMemOperand *MMO : MI.memoperands()) {
4983 std::optional<int> FI = getMMOFrameID(MMO, MFI);
4984 if (FI && !MFI.isDeadObjectIndex(*FI)) {
4985 int FrameIdx = *FI;
4986
4987 size_t ArrIdx = FrameIdx + MFI.getNumFixedObjects();
4988 if (StackAccesses[ArrIdx].AccessTypes == StackAccess::NotAccessed) {
4989 StackAccesses[ArrIdx].Idx = FrameIdx;
4990 StackAccesses[ArrIdx].Offset =
4991 getFrameIndexReferenceFromSP(MF, FrameIdx);
4992 StackAccesses[ArrIdx].Size = MFI.getObjectSize(FrameIdx);
4993 }
4994
4995 unsigned RegTy = StackAccess::AccessType::GPR;
4996 if (MFI.getStackID(FrameIdx) == TargetStackID::ScalableVector) {
4997 // SPILL_PPR_TO_ZPR_SLOT_PSEUDO and FILL_PPR_FROM_ZPR_SLOT_PSEUDO
4998 // spill/fill the predicate as a data vector (so are an FPR access).
4999 if (MI.getOpcode() != AArch64::SPILL_PPR_TO_ZPR_SLOT_PSEUDO &&
5000 MI.getOpcode() != AArch64::FILL_PPR_FROM_ZPR_SLOT_PSEUDO &&
5001 AArch64::PPRRegClass.contains(MI.getOperand(0).getReg())) {
5002 RegTy = StackAccess::PPR;
5003 } else
5004 RegTy = StackAccess::FPR;
5005 } else if (AArch64InstrInfo::isFpOrNEON(MI)) {
5006 RegTy = StackAccess::FPR;
5007 }
5008
5009 StackAccesses[ArrIdx].AccessTypes |= RegTy;
5010
5011 if (RegTy == StackAccess::FPR)
5012 ++NumFPLdSt;
5013 else
5014 ++NumNonFPLdSt;
5015 }
5016 }
5017 }
5018 }
5019
5020 if (NumFPLdSt == 0 || NumNonFPLdSt == 0)
5021 return;
5022
5023 llvm::sort(StackAccesses);
5024 llvm::erase_if(StackAccesses, [](const StackAccess &S) {
5025 return S.AccessTypes == StackAccess::NotAccessed;
5026 });
5027
5028 SmallVector<const StackAccess *> MixedObjects;
5029 SmallVector<std::pair<const StackAccess *, const StackAccess *>> HazardPairs;
5030
5031 if (StackAccesses.front().isMixed())
5032 MixedObjects.push_back(&StackAccesses.front());
5033
5034 for (auto It = StackAccesses.begin(), End = std::prev(StackAccesses.end());
5035 It != End; ++It) {
5036 const auto &First = *It;
5037 const auto &Second = *(It + 1);
5038
5039 if (Second.isMixed())
5040 MixedObjects.push_back(&Second);
5041
5042 if ((First.isSME() && Second.isCPU()) ||
5043 (First.isCPU() && Second.isSME())) {
5044 uint64_t Distance = static_cast<uint64_t>(Second.start() - First.end());
5045 if (Distance < HazardSize)
5046 HazardPairs.emplace_back(&First, &Second);
5047 }
5048 }
5049
5050 auto EmitRemark = [&](llvm::StringRef Str) {
5051 ORE->emit([&]() {
5052 auto R = MachineOptimizationRemarkAnalysis(
5053 "sme", "StackHazard", MF.getFunction().getSubprogram(), &MF.front());
5054 return R << formatv("stack hazard in '{0}': ", MF.getName()).str() << Str;
5055 });
5056 };
5057
5058 for (const auto &P : HazardPairs)
5059 EmitRemark(formatv("{0} is too close to {1}", *P.first, *P.second).str());
5060
5061 for (const auto *Obj : MixedObjects)
5062 EmitRemark(
5063 formatv("{0} accessed by both GP and FP instructions", *Obj).str());
5064}
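// The adjacency test in emitRemarks can be reproduced with a small standalone
// sketch (illustrative only; the offsets, sizes and the 4096-byte hazard size
// are assumed values, not taken from a real frame): after sorting by start
// offset, a CPU access followed by an SME access (or vice versa) closer than
// HazardSize bytes is reported as a hazard pair.
#include <cstdint>
#include <cstdio>

int main() {
  const uint64_t HazardSize = 4096;
  // First:  GPR spill at [SP+0],  16 bytes -> end()   == 16
  // Second: FPR spill at [SP+32], 16 bytes -> start() == 32
  const int64_t FirstEnd = 16, SecondStart = 32;
  const uint64_t Distance = static_cast<uint64_t>(SecondStart - FirstEnd);
  if (Distance < HazardSize)
    std::printf("GPR stack object is only %llu bytes from an FPR stack object\n",
                (unsigned long long)Distance);
  return 0;
}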