1//===- AArch64FrameLowering.cpp - AArch64 Frame Lowering -------*- C++ -*-====//
2//
3// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
4// See https://llvm.org/LICENSE.txt for license information.
5// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
6//
7//===----------------------------------------------------------------------===//
8//
9// This file contains the AArch64 implementation of TargetFrameLowering class.
10//
11// On AArch64, stack frames are structured as follows:
12//
13// The stack grows downward.
14//
15// All of the individual frame areas on the frame below are optional, i.e. it's
16// possible to create a function so that the particular area isn't present
17// in the frame.
18//
19// At function entry, the "frame" looks as follows:
20//
21// | | Higher address
22// |-----------------------------------|
23// | |
24// | arguments passed on the stack |
25// | |
26// |-----------------------------------| <- sp
27// | | Lower address
28//
29//
30// After the prologue has run, the frame has the following general structure.
31// Note that this doesn't depict the case where a red-zone is used. Also,
32// technically the last frame area (VLAs) doesn't get created until the
33// main function body, after the prologue has run. However, it's depicted here
34// for completeness.
35//
36// | | Higher address
37// |-----------------------------------|
38// | |
39// | arguments passed on the stack |
40// | |
41// |-----------------------------------|
42// | |
43// | (Win64 only) varargs from reg |
44// | |
45// |-----------------------------------|
46// | |
47// | (Win64 only) callee-saved SVE reg |
48// | |
49// |-----------------------------------|
50// | |
51// | callee-saved gpr registers | <--.
52// | | | On Darwin platforms these
53// |- - - - - - - - - - - - - - - - - -| | callee saves are swapped,
54// | prev_lr | | (frame record first)
55// | prev_fp | <--'
56// | async context if needed |
57// | (a.k.a. "frame record") |
58// |-----------------------------------| <- fp(=x29)
59//       Default SVE stack layout                  Split SVE objects
60//   (aarch64-split-sve-objects=false)    (aarch64-split-sve-objects=true)
61// |-----------------------------------| |-----------------------------------|
62// |         <hazard padding>          | |    callee-saved PPR registers     |
63// |-----------------------------------| |-----------------------------------|
64// |                                   | |         PPR stack objects         |
65// |   callee-saved fp/simd/SVE regs   | |-----------------------------------|
66// |                                   | |         <hazard padding>          |
67// |-----------------------------------| |-----------------------------------|
68// |                                   | |  callee-saved ZPR/FPR registers   |
69// |         SVE stack objects         | |-----------------------------------|
70// |                                   | |         ZPR stack objects         |
71// |-----------------------------------| |-----------------------------------|
72//                                       ^ NB: FPR CSRs are promoted to ZPRs
73// |-----------------------------------|
74// |.empty.space.to.make.part.below....|
75// |.aligned.in.case.it.needs.more.than| (size of this area is unknown at
76// |.the.standard.16-byte.alignment....| compile time; if present)
77// |-----------------------------------|
78// | local variables of fixed size |
79// | including spill slots |
80// | <FPR> |
81// | <hazard padding> |
82// | <GPR> |
83// |-----------------------------------| <- bp(not defined by ABI,
84// |.variable-sized.local.variables....| LLVM chooses X19)
85// |.(VLAs)............................| (size of this area is unknown at
86// |...................................| compile time)
87// |-----------------------------------| <- sp
88// | | Lower address
89//
90//
91// To access data in a frame, a constant offset from one of the pointers
92// (fp, bp, sp) to that data must be computable at compile time. The sizes
93// of the areas with a dotted background cannot be computed at compile time
94// if those areas are present, so all three of fp, bp and sp must be set up
95// in order to access all contents of the frame areas, assuming all of the
96// frame areas are non-empty.
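//
// Purely as an illustration (the registers and offsets below are hypothetical,
// not taken from real codegen): an incoming stack argument might be loaded at
// a constant offset from fp,
//   ldr x0, [x29, #16]
// and a fixed-size local at a constant offset from sp (or from bp when VLAs
// are present),
//   ldr x1, [sp, #32]
// whereas anything separated from a given pointer by a dotted (unknown-size)
// area must be addressed through a different base register.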
97//
98// For most functions, some of the frame areas are empty. For those functions,
99// it may not be necessary to set up fp or bp:
100// * A base pointer is definitely needed when there are both VLAs and local
101// variables with more-than-default alignment requirements.
102// * A frame pointer is definitely needed when there are local variables with
103// more-than-default alignment requirements.
104//
105// For Darwin platforms the frame-record (fp, lr) is stored at the top of the
106// callee-saved area, since the unwind encoding does not allow for encoding
107// this dynamically and existing tools depend on this layout. For other
108// platforms, the frame-record is stored at the bottom of the (gpr) callee-saved
109// area to allow SVE stack objects (allocated directly below the callee-saves,
110// if available) to be accessed directly from the framepointer.
111// The SVE spill/fill instructions have VL-scaled addressing modes such
112// as:
113// ldr z8, [fp, #-7 mul vl]
114// For SVE the size of the vector length (VL) is not known at compile-time, so
115// '#-7 mul vl' is an offset that can only be evaluated at runtime. With this
116// layout, we don't need to add an unscaled offset to the framepointer before
117// accessing the SVE object in the frame.
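//
// As a worked example (the vector lengths are illustrative): with a 256-bit
// vector length, VL is 32 bytes, so '#-7 mul vl' resolves at runtime to an
// offset of -7 * 32 = -224 bytes from fp; with a 512-bit vector length the
// same instruction addresses fp - 448.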
118//
119// In some cases when a base pointer is not strictly needed, it is generated
120// anyway when offsets from the frame pointer to access local variables become
121// so large that the offset can't be encoded in the immediate fields of loads
122// or stores.
123//
124// Outgoing function arguments must be at the bottom of the stack frame when
125// calling another function. If we do not have variable-sized stack objects, we
126// can allocate a "reserved call frame" area at the bottom of the local
127// variable area, large enough for all outgoing calls. If we do have VLAs, then
128// the stack pointer must be decremented and incremented around each call to
129// make space for the arguments below the VLAs.
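//
// Purely for illustration (the sizes are made up): with VLAs present, each
// call site allocates and releases its own outgoing-argument area, e.g.
//   sub sp, sp, #32        // make room for the outgoing arguments
//   stp x8, x9, [sp]       // store the arguments passed on the stack
//   bl  callee
//   add sp, sp, #32        // release the outgoing-argument area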
130//
131// FIXME: also explain the redzone concept.
132//
133// About stack hazards: Under some SME contexts, a coprocessor with its own
134// separate cache can be used for FP operations. This can create hazards if the CPU
135// and the SME unit try to access the same area of memory, including if the
136// access is to an area of the stack. To try to alleviate this we attempt to
137// introduce extra padding into the stack frame between FP and GPR accesses,
138// controlled by the aarch64-stack-hazard-size option. Without changing the
139// layout of the stack frame in the diagram above, a stack object of size
140// aarch64-stack-hazard-size is added between GPR and FPR CSRs. Another is added
141// to the stack objects section, and stack objects are sorted so that FPR >
142// Hazard padding slot > GPRs (where possible). Unfortunately some things are
143// not handled well (VLA area, arguments on the stack, objects with both GPR and
144// FPR accesses), but if those are controlled by the user then the entire stack
145// frame becomes GPR at the start/end with FPR in the middle, surrounded by
146// Hazard padding.
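//
// For example (assuming the usual way of passing backend options through
// clang), building with "-mllvm -aarch64-stack-hazard-size=1024" requests
// 1024-byte padding between the GPR and FPR areas described above.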
147//
148// An example of the prologue:
149//
150// .globl __foo
151// .align 2
152// __foo:
153// Ltmp0:
154// .cfi_startproc
155// .cfi_personality 155, ___gxx_personality_v0
156// Leh_func_begin:
157// .cfi_lsda 16, Lexception33
158//
159// stp xa,bx, [sp, -#offset]!
160// ...
161// stp x28, x27, [sp, #offset-32]
162// stp fp, lr, [sp, #offset-16]
163// add fp, sp, #offset - 16
164// sub sp, sp, #1360
165//
166// The Stack:
167// +-------------------------------------------+
168// 10000 | ........ | ........ | ........ | ........ |
169// 10004 | ........ | ........ | ........ | ........ |
170// +-------------------------------------------+
171// 10008 | ........ | ........ | ........ | ........ |
172// 1000c | ........ | ........ | ........ | ........ |
173// +===========================================+
174// 10010 | X28 Register |
175// 10014 | X28 Register |
176// +-------------------------------------------+
177// 10018 | X27 Register |
178// 1001c | X27 Register |
179// +===========================================+
180// 10020 | Frame Pointer |
181// 10024 | Frame Pointer |
182// +-------------------------------------------+
183// 10028 | Link Register |
184// 1002c | Link Register |
185// +===========================================+
186// 10030 | ........ | ........ | ........ | ........ |
187// 10034 | ........ | ........ | ........ | ........ |
188// +-------------------------------------------+
189// 10038 | ........ | ........ | ........ | ........ |
190// 1003c | ........ | ........ | ........ | ........ |
191// +-------------------------------------------+
192//
193// [sp] = 10030 :: >>initial value<<
194// sp = 10020 :: stp fp, lr, [sp, #-16]!
195// fp = sp == 10020 :: mov fp, sp
196// [sp] == 10020 :: stp x28, x27, [sp, #-16]!
197// sp == 10010 :: >>final value<<
198//
199// The frame pointer (w29) points to address 10020. If we use an offset of
200// '16' from 'w29', we get the CFI offsets of -8 for w30, -16 for w29, -24
201// for w27, and -32 for w28:
202//
203// Ltmp1:
204// .cfi_def_cfa w29, 16
205// Ltmp2:
206// .cfi_offset w30, -8
207// Ltmp3:
208// .cfi_offset w29, -16
209// Ltmp4:
210// .cfi_offset w27, -24
211// Ltmp5:
212// .cfi_offset w28, -32
213//
214//===----------------------------------------------------------------------===//
215
216#include "AArch64FrameLowering.h"
217#include "AArch64InstrInfo.h"
220#include "AArch64RegisterInfo.h"
221#include "AArch64Subtarget.h"
225#include "llvm/ADT/ScopeExit.h"
226#include "llvm/ADT/SmallVector.h"
244#include "llvm/IR/Attributes.h"
245#include "llvm/IR/CallingConv.h"
246#include "llvm/IR/DataLayout.h"
247#include "llvm/IR/DebugLoc.h"
248#include "llvm/IR/Function.h"
249#include "llvm/MC/MCAsmInfo.h"
250#include "llvm/MC/MCDwarf.h"
252#include "llvm/Support/Debug.h"
259#include <cassert>
260#include <cstdint>
261#include <iterator>
262#include <optional>
263#include <vector>
264
265using namespace llvm;
266
267#define DEBUG_TYPE "frame-info"
268
269static cl::opt<bool> EnableRedZone("aarch64-redzone",
270 cl::desc("enable use of redzone on AArch64"),
271 cl::init(false), cl::Hidden);
272
274 "stack-tagging-merge-settag",
275 cl::desc("merge settag instruction in function epilog"), cl::init(true),
276 cl::Hidden);
277
278static cl::opt<bool> OrderFrameObjects("aarch64-order-frame-objects",
279 cl::desc("sort stack allocations"),
280 cl::init(true), cl::Hidden);
281
282static cl::opt<bool>
283 SplitSVEObjects("aarch64-split-sve-objects",
284 cl::desc("Split allocation of ZPR & PPR objects"),
285 cl::init(true), cl::Hidden);
286
288 "homogeneous-prolog-epilog", cl::Hidden,
289 cl::desc("Emit homogeneous prologue and epilogue for the size "
290 "optimization (default = off)"));
291
292// Stack hazard size for analysis remarks. StackHazardSize takes precedence.
294 StackHazardRemarkSize("aarch64-stack-hazard-remark-size", cl::init(0),
295 cl::Hidden);
296// Whether to insert padding into non-streaming functions (for testing).
297static cl::opt<bool>
298 StackHazardInNonStreaming("aarch64-stack-hazard-in-non-streaming",
299 cl::init(false), cl::Hidden);
300
302 "aarch64-disable-multivector-spill-fill",
303 cl::desc("Disable use of LD/ST pairs for SME2 or SVE2p1"), cl::init(false),
304 cl::Hidden);
305
306int64_t
307AArch64FrameLowering::getArgumentStackToRestore(MachineFunction &MF,
308 MachineBasicBlock &MBB) const {
309 MachineBasicBlock::iterator MBBI = MBB.getLastNonDebugInstr();
311 bool IsTailCallReturn = (MBB.end() != MBBI)
313 : false;
314
315 int64_t ArgumentPopSize = 0;
316 if (IsTailCallReturn) {
317 MachineOperand &StackAdjust = MBBI->getOperand(1);
318
319 // For a tail-call in a callee-pops-arguments environment, some or all of
320 // the stack may actually be in use for the call's arguments; this is
321 // calculated during LowerCall and consumed here...
322 ArgumentPopSize = StackAdjust.getImm();
323 } else {
324 // ... otherwise the amount to pop is *all* of the argument space,
325 // conveniently stored in the MachineFunctionInfo by
326 // LowerFormalArguments. This will, of course, be zero for the C calling
327 // convention.
328 ArgumentPopSize = AFI->getArgumentStackToRestore();
329 }
330
331 return ArgumentPopSize;
332}
333
335 MachineFunction &MF);
336
337enum class AssignObjectOffsets { No, Yes };
338/// Process all the SVE stack objects and determine the SVE stack size and
339/// offsets for each object. If AssignOffsets is "Yes", the offsets get
340/// assigned (and SVE stack sizes set). Returns the size of the SVE stack.
342 AssignObjectOffsets AssignOffsets);
343
344static unsigned getStackHazardSize(const MachineFunction &MF) {
345 return MF.getSubtarget<AArch64Subtarget>().getStreamingHazardSize();
346}
347
353
356 // With split SVE objects, the hazard padding is added to the PPR region,
357 // which places it between the [GPR, PPR] area and the [ZPR, FPR] area. This
358 // avoids hazards between both GPRs and FPRs and ZPRs and PPRs.
361 : 0,
362 AFI->getStackSizePPR());
363}
364
365// Conservatively, returns true if the function is likely to have SVE vectors
366// on the stack. This function is safe to be called before callee-saves or
367// object offsets have been determined.
369 const MachineFunction &MF) {
370 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
371 if (AFI->isSVECC())
372 return true;
373
374 if (AFI->hasCalculatedStackSizeSVE())
375 return bool(AFL.getSVEStackSize(MF));
376
377 const MachineFrameInfo &MFI = MF.getFrameInfo();
378 for (int FI = MFI.getObjectIndexBegin(); FI < MFI.getObjectIndexEnd(); FI++) {
379 if (MFI.hasScalableStackID(FI))
380 return true;
381 }
382
383 return false;
384}
385
386/// Returns true if homogeneous prolog or epilog code can be emitted
387/// for the size optimization. If possible, a frame helper call is injected.
388/// When an Exit block is given, this check is for the epilog.
389bool AArch64FrameLowering::homogeneousPrologEpilog(
390 MachineFunction &MF, MachineBasicBlock *Exit) const {
391 if (!MF.getFunction().hasMinSize())
392 return false;
394 return false;
395 if (EnableRedZone)
396 return false;
397
398 // TODO: Windows is not supported yet.
399 if (needsWinCFI(MF))
400 return false;
401
402 // TODO: SVE is not supported yet.
403 if (isLikelyToHaveSVEStack(*this, MF))
404 return false;
405
406 // Bail on stack adjustment needed on return for simplicity.
407 const MachineFrameInfo &MFI = MF.getFrameInfo();
408 const TargetRegisterInfo *RegInfo = MF.getSubtarget().getRegisterInfo();
409 if (MFI.hasVarSizedObjects() || RegInfo->hasStackRealignment(MF))
410 return false;
411 if (Exit && getArgumentStackToRestore(MF, *Exit))
412 return false;
413
414 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
416 return false;
417
418 // If there are an odd number of GPRs before LR and FP in the CSRs list,
419 // they will not be paired into one RegPairInfo, which is incompatible with
420 // the assumption made by the homogeneous prolog epilog pass.
421 const MCPhysReg *CSRegs = MF.getRegInfo().getCalleeSavedRegs();
422 unsigned NumGPRs = 0;
423 for (unsigned I = 0; CSRegs[I]; ++I) {
424 Register Reg = CSRegs[I];
425 if (Reg == AArch64::LR) {
426 assert(CSRegs[I + 1] == AArch64::FP);
427 if (NumGPRs % 2 != 0)
428 return false;
429 break;
430 }
431 if (AArch64::GPR64RegClass.contains(Reg))
432 ++NumGPRs;
433 }
434
435 return true;
436}
437
438/// Returns true if CSRs should be paired.
439bool AArch64FrameLowering::producePairRegisters(MachineFunction &MF) const {
440 return produceCompactUnwindFrame(*this, MF) || homogeneousPrologEpilog(MF);
441}
442
443/// This is the biggest offset to the stack pointer we can encode in aarch64
444/// instructions (without using a separate calculation and a temp register).
445/// Note that the exceptions here are vector stores/loads, which cannot encode any
446/// displacements (see estimateRSStackSizeLimit(), isAArch64FrameOffsetLegal()).
447static const unsigned DefaultSafeSPDisplacement = 255;
448
449/// Look at each instruction that references stack frames and return the stack
450/// size limit beyond which some of these instructions will require a scratch
451/// register during their expansion later.
453 // FIXME: For now, just conservatively guesstimate based on unscaled indexing
454 // range. We'll end up allocating an unnecessary spill slot a lot, but
455 // realistically that's not a big deal at this stage of the game.
456 for (MachineBasicBlock &MBB : MF) {
457 for (MachineInstr &MI : MBB) {
458 if (MI.isDebugInstr() || MI.isPseudo() ||
459 MI.getOpcode() == AArch64::ADDXri ||
460 MI.getOpcode() == AArch64::ADDSXri)
461 continue;
462
463 for (const MachineOperand &MO : MI.operands()) {
464 if (!MO.isFI())
465 continue;
466
468 if (isAArch64FrameOffsetLegal(MI, Offset, nullptr, nullptr, nullptr) ==
470 return 0;
471 }
472 }
473 }
475}
476
481
482unsigned
483AArch64FrameLowering::getFixedObjectSize(const MachineFunction &MF,
484 const AArch64FunctionInfo *AFI,
485 bool IsWin64, bool IsFunclet) const {
486 assert(AFI->getTailCallReservedStack() % 16 == 0 &&
487 "Tail call reserved stack must be aligned to 16 bytes");
488 if (!IsWin64 || IsFunclet) {
489 return AFI->getTailCallReservedStack();
490 } else {
491 if (AFI->getTailCallReservedStack() != 0 &&
492 !MF.getFunction().getAttributes().hasAttrSomewhere(
493 Attribute::SwiftAsync))
494 report_fatal_error("cannot generate ABI-changing tail call for Win64");
495 unsigned FixedObjectSize = AFI->getTailCallReservedStack();
496
497 // Var args are stored here in the primary function.
498 FixedObjectSize += AFI->getVarArgsGPRSize();
499
500 if (MF.hasEHFunclets()) {
501 // Catch objects are stored here in the primary function.
502 const MachineFrameInfo &MFI = MF.getFrameInfo();
503 const WinEHFuncInfo &EHInfo = *MF.getWinEHFuncInfo();
504 SmallSetVector<int, 8> CatchObjFrameIndices;
505 for (const WinEHTryBlockMapEntry &TBME : EHInfo.TryBlockMap) {
506 for (const WinEHHandlerType &H : TBME.HandlerArray) {
507 int FrameIndex = H.CatchObj.FrameIndex;
508 if ((FrameIndex != INT_MAX) &&
509 CatchObjFrameIndices.insert(FrameIndex)) {
510 FixedObjectSize = alignTo(FixedObjectSize,
511 MFI.getObjectAlign(FrameIndex).value()) +
512 MFI.getObjectSize(FrameIndex);
513 }
514 }
515 }
516 // To support EH funclets we allocate an UnwindHelp object
517 FixedObjectSize += 8;
518 }
519 return alignTo(FixedObjectSize, 16);
520 }
521}
522
524 if (!EnableRedZone)
525 return false;
526
527 // Don't use the red zone if the function explicitly asks us not to.
528 // This is typically used for kernel code.
529 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
530 const unsigned RedZoneSize =
532 if (!RedZoneSize)
533 return false;
534
535 const MachineFrameInfo &MFI = MF.getFrameInfo();
537 uint64_t NumBytes = AFI->getLocalStackSize();
538
539 // If neither NEON nor SVE is available, a COPY from one Q-reg to
540 // another requires a spill -> reload sequence. We can do that
541 // using a pre-decrementing store/post-decrementing load, but
542 // if we do so, we can't use the Red Zone.
543 bool LowerQRegCopyThroughMem = Subtarget.hasFPARMv8() &&
544 !Subtarget.isNeonAvailable() &&
545 !Subtarget.hasSVE();
546
547 return !(MFI.hasCalls() || hasFP(MF) || NumBytes > RedZoneSize ||
548 AFI->hasSVEStackSize() || LowerQRegCopyThroughMem);
549}
550
551/// hasFPImpl - Return true if the specified function should have a dedicated
552/// frame pointer register.
554 const MachineFrameInfo &MFI = MF.getFrameInfo();
555 const TargetRegisterInfo *RegInfo = MF.getSubtarget().getRegisterInfo();
557
558 // Win64 EH requires a frame pointer if funclets are present, as the locals
559 // are accessed off the frame pointer in both the parent function and the
560 // funclets.
561 if (MF.hasEHFunclets())
562 return true;
563 // Retain behavior of always omitting the FP for leaf functions when possible.
565 return true;
566 if (MFI.hasVarSizedObjects() || MFI.isFrameAddressTaken() ||
567 MFI.hasStackMap() || MFI.hasPatchPoint() ||
568 RegInfo->hasStackRealignment(MF))
569 return true;
570
571 // If we:
572 //
573 // 1. Have streaming mode changes
574 // OR:
575 // 2. Have a streaming body with SVE stack objects
576 //
577 // Then the value of VG restored when unwinding to this function may not match
578 // the value of VG used to set up the stack.
579 //
580 // This is a problem as the CFA can be described with an expression of the
581 // form: CFA = SP + NumBytes + VG * NumScalableBytes.
582 //
583 // If the value of VG used in that expression does not match the value used to
584 // set up the stack, an incorrect address for the CFA will be computed, and
585 // unwinding will fail.
586 //
587 // We work around this issue by ensuring the frame-pointer can describe the
588 // CFA in either of these cases.
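  //
  // As a worked example (all numbers are illustrative): plugging NumBytes = 64
  // and NumScalableBytes = 32 into the expression above with VG = 2 (VG counts
  // 64-bit granules, so this corresponds to a 128-bit vector length) gives
  // CFA = SP + 64 + 2 * 32 = SP + 128. If the unwinder instead restored
  // VG = 4, it would compute SP + 192 and derive the wrong CFA, which is why
  // the frame pointer is used to describe the CFA in these cases.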
589 if (AFI.needsDwarfUnwindInfo(MF) &&
592 return true;
593 // With large callframes around we may need to use FP to access the scavenging
594 // emergency spillslot.
595 //
596 // Unfortunately some calls to hasFP() like machine verifier ->
597 // getReservedReg() -> hasFP in the middle of global isel are too early
598 // to know the max call frame size. Hopefully conservatively returning "true"
599 // in those cases is fine.
600 // DefaultSafeSPDisplacement is fine as we only emergency spill GP regs.
601 if (!MFI.isMaxCallFrameSizeComputed() ||
603 return true;
604
605 return false;
606}
607
608/// Should the Frame Pointer be reserved for the current function?
610 const TargetMachine &TM = MF.getTarget();
611 const Triple &TT = TM.getTargetTriple();
612
613 // These OSes require that the frame chain be valid, even if the current frame does
614 // not use a frame pointer.
615 if (TT.isOSDarwin() || TT.isOSWindows())
616 return true;
617
618 // If the function has a frame pointer, it is reserved.
619 if (hasFP(MF))
620 return true;
621
622 // Frontend has requested to preserve the frame pointer.
623 if (TM.Options.FramePointerIsReserved(MF))
624 return true;
625
626 return false;
627}
628
629/// hasReservedCallFrame - Under normal circumstances, when a frame pointer is
630/// not required, we reserve argument space for call sites in the function
631/// immediately on entry to the current function. This eliminates the need for
632/// add/sub sp brackets around call sites. Returns true if the call frame is
633/// included as part of the stack frame.
635 const MachineFunction &MF) const {
636 // The stack probing code for the dynamically allocated outgoing arguments
637 // area assumes that the stack is probed at the top - either by the prologue
638 // code, which issues a probe if `hasVarSizedObjects` returns true, or by the
639 // most recent variable-sized object allocation. Changing the condition here
640 // may need to be followed up by changes to the probe issuing logic.
641 return !MF.getFrameInfo().hasVarSizedObjects();
642}
643
647 const AArch64InstrInfo *TII =
648 static_cast<const AArch64InstrInfo *>(MF.getSubtarget().getInstrInfo());
649 const AArch64TargetLowering *TLI =
650 MF.getSubtarget<AArch64Subtarget>().getTargetLowering();
651 [[maybe_unused]] MachineFrameInfo &MFI = MF.getFrameInfo();
652 DebugLoc DL = I->getDebugLoc();
653 unsigned Opc = I->getOpcode();
654 bool IsDestroy = Opc == TII->getCallFrameDestroyOpcode();
655 uint64_t CalleePopAmount = IsDestroy ? I->getOperand(1).getImm() : 0;
656
657 if (!hasReservedCallFrame(MF)) {
658 int64_t Amount = I->getOperand(0).getImm();
659 Amount = alignTo(Amount, getStackAlign());
660 if (!IsDestroy)
661 Amount = -Amount;
662
663 // N.b. if CalleePopAmount is valid but zero (i.e. callee would pop, but it
664 // doesn't have to pop anything), then the first operand will be zero too so
665 // this adjustment is a no-op.
666 if (CalleePopAmount == 0) {
667 // FIXME: in-function stack adjustment for calls is limited to 24-bits
668 // because there's no guaranteed temporary register available.
669 //
670 // ADD/SUB (immediate) has only LSL #0 and LSL #12 available.
671 // 1) For offset <= 12-bit, we use LSL #0
672 // 2) For 12-bit <= offset <= 24-bit, we use two instructions. One uses
673 // LSL #0, and the other uses LSL #12.
674 //
675 // Most call frames will be allocated at the start of a function so
676 // this is OK, but it is a limitation that needs dealing with.
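      //
      // Illustrative example (the constant is arbitrary): an adjustment of
      // 0x123456 bytes could be materialized as
      //   sub sp, sp, #0x123, lsl #12
      //   sub sp, sp, #0x456
      // i.e. one instruction covering the upper 12 bits and one covering the
      // lower 12 bits.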
677 assert(Amount > -0xffffff && Amount < 0xffffff && "call frame too large");
678
679 if (TLI->hasInlineStackProbe(MF) &&
681 // When stack probing is enabled, the decrement of SP may need to be
682 // probed. We only need to do this if the call site needs 1024 bytes of
683 // space or more, because a region smaller than that is allowed to be
684 // unprobed at an ABI boundary. We rely on the fact that SP has been
685 // probed exactly at this point, either by the prologue or most recent
686 // dynamic allocation.
688 "non-reserved call frame without var sized objects?");
689 Register ScratchReg =
690 MF.getRegInfo().createVirtualRegister(&AArch64::GPR64RegClass);
691 inlineStackProbeFixed(I, ScratchReg, -Amount, StackOffset::get(0, 0));
692 } else {
693 emitFrameOffset(MBB, I, DL, AArch64::SP, AArch64::SP,
694 StackOffset::getFixed(Amount), TII);
695 }
696 }
697 } else if (CalleePopAmount != 0) {
698 // If the calling convention demands that the callee pops arguments from the
699 // stack, we want to add it back if we have a reserved call frame.
700 assert(CalleePopAmount < 0xffffff && "call frame too large");
701 emitFrameOffset(MBB, I, DL, AArch64::SP, AArch64::SP,
702 StackOffset::getFixed(-(int64_t)CalleePopAmount), TII);
703 }
704 return MBB.erase(I);
705}
706
708 MachineBasicBlock &MBB) const {
709
710 MachineFunction &MF = *MBB.getParent();
711 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
712 const auto &TRI = *Subtarget.getRegisterInfo();
713 const auto &MFI = *MF.getInfo<AArch64FunctionInfo>();
714
715 CFIInstBuilder CFIBuilder(MBB, MBB.begin(), MachineInstr::NoFlags);
716
717 // Reset the CFA to `SP + 0`.
718 CFIBuilder.buildDefCFA(AArch64::SP, 0);
719
720 // Flip the RA sign state.
721 if (MFI.shouldSignReturnAddress(MF))
722 MFI.branchProtectionPAuthLR() ? CFIBuilder.buildNegateRAStateWithPC()
723 : CFIBuilder.buildNegateRAState();
724
725 // Shadow call stack uses X18, reset it.
726 if (MFI.needsShadowCallStackPrologueEpilogue(MF))
727 CFIBuilder.buildSameValue(AArch64::X18);
728
729 // Emit .cfi_same_value for callee-saved registers.
730 const std::vector<CalleeSavedInfo> &CSI =
732 for (const auto &Info : CSI) {
733 MCRegister Reg = Info.getReg();
734 if (!TRI.regNeedsCFI(Reg, Reg))
735 continue;
736 CFIBuilder.buildSameValue(Reg);
737 }
738}
739
741 switch (Reg.id()) {
742 default:
743 // The called routine is expected to preserve r19-r28
744 // r29 and r30 are used as frame pointer and link register resp.
745 return 0;
746
747 // GPRs
748#define CASE(n) \
749 case AArch64::W##n: \
750 case AArch64::X##n: \
751 return AArch64::X##n
752 CASE(0);
753 CASE(1);
754 CASE(2);
755 CASE(3);
756 CASE(4);
757 CASE(5);
758 CASE(6);
759 CASE(7);
760 CASE(8);
761 CASE(9);
762 CASE(10);
763 CASE(11);
764 CASE(12);
765 CASE(13);
766 CASE(14);
767 CASE(15);
768 CASE(16);
769 CASE(17);
770 CASE(18);
771#undef CASE
772
773 // FPRs
774#define CASE(n) \
775 case AArch64::B##n: \
776 case AArch64::H##n: \
777 case AArch64::S##n: \
778 case AArch64::D##n: \
779 case AArch64::Q##n: \
780 return HasSVE ? AArch64::Z##n : AArch64::Q##n
781 CASE(0);
782 CASE(1);
783 CASE(2);
784 CASE(3);
785 CASE(4);
786 CASE(5);
787 CASE(6);
788 CASE(7);
789 CASE(8);
790 CASE(9);
791 CASE(10);
792 CASE(11);
793 CASE(12);
794 CASE(13);
795 CASE(14);
796 CASE(15);
797 CASE(16);
798 CASE(17);
799 CASE(18);
800 CASE(19);
801 CASE(20);
802 CASE(21);
803 CASE(22);
804 CASE(23);
805 CASE(24);
806 CASE(25);
807 CASE(26);
808 CASE(27);
809 CASE(28);
810 CASE(29);
811 CASE(30);
812 CASE(31);
813#undef CASE
814 }
815}
816
817void AArch64FrameLowering::emitZeroCallUsedRegs(BitVector RegsToZero,
818 MachineBasicBlock &MBB) const {
819 // Insertion point.
821
822 // Fake a debug loc.
823 DebugLoc DL;
824 if (MBBI != MBB.end())
825 DL = MBBI->getDebugLoc();
826
827 const MachineFunction &MF = *MBB.getParent();
828 const AArch64Subtarget &STI = MF.getSubtarget<AArch64Subtarget>();
829 const AArch64RegisterInfo &TRI = *STI.getRegisterInfo();
830
831 BitVector GPRsToZero(TRI.getNumRegs());
832 BitVector FPRsToZero(TRI.getNumRegs());
833 bool HasSVE = STI.isSVEorStreamingSVEAvailable();
834 for (MCRegister Reg : RegsToZero.set_bits()) {
835 if (TRI.isGeneralPurposeRegister(MF, Reg)) {
836 // For GPRs, we only care to clear out the 64-bit register.
837 if (MCRegister XReg = getRegisterOrZero(Reg, HasSVE))
838 GPRsToZero.set(XReg);
839 } else if (AArch64InstrInfo::isFpOrNEON(Reg)) {
840 // For FPRs, clear the widest covering register (Z if SVE is available, otherwise Q).
841 if (MCRegister XReg = getRegisterOrZero(Reg, HasSVE))
842 FPRsToZero.set(XReg);
843 }
844 }
845
846 const AArch64InstrInfo &TII = *STI.getInstrInfo();
847
848 // Zero out GPRs.
849 for (MCRegister Reg : GPRsToZero.set_bits())
850 TII.buildClearRegister(Reg, MBB, MBBI, DL);
851
852 // Zero out FP/vector registers.
853 for (MCRegister Reg : FPRsToZero.set_bits())
854 TII.buildClearRegister(Reg, MBB, MBBI, DL);
855
856 if (HasSVE) {
857 for (MCRegister PReg :
858 {AArch64::P0, AArch64::P1, AArch64::P2, AArch64::P3, AArch64::P4,
859 AArch64::P5, AArch64::P6, AArch64::P7, AArch64::P8, AArch64::P9,
860 AArch64::P10, AArch64::P11, AArch64::P12, AArch64::P13, AArch64::P14,
861 AArch64::P15}) {
862 if (RegsToZero[PReg])
863 BuildMI(MBB, MBBI, DL, TII.get(AArch64::PFALSE), PReg);
864 }
865 }
866}
867
868bool AArch64FrameLowering::windowsRequiresStackProbe(
869 const MachineFunction &MF, uint64_t StackSizeInBytes) const {
870 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
871 const AArch64FunctionInfo &MFI = *MF.getInfo<AArch64FunctionInfo>();
872 // TODO: When implementing stack protectors, take that into account
873 // for the probe threshold.
874 return Subtarget.isTargetWindows() && MFI.hasStackProbing() &&
875 StackSizeInBytes >= uint64_t(MFI.getStackProbeSize());
876}
877
879 const MachineBasicBlock &MBB) {
880 const MachineFunction *MF = MBB.getParent();
881 LiveRegs.addLiveIns(MBB);
882 // Mark callee saved registers as used so we will not choose them.
883 const MCPhysReg *CSRegs = MF->getRegInfo().getCalleeSavedRegs();
884 for (unsigned i = 0; CSRegs[i]; ++i)
885 LiveRegs.addReg(CSRegs[i]);
886}
887
889AArch64FrameLowering::findScratchNonCalleeSaveRegister(MachineBasicBlock *MBB,
890 bool HasCall) const {
891 MachineFunction *MF = MBB->getParent();
892
893 // If MBB is an entry block, use X9 as the scratch register. However,
894 // preserve_none functions may be using X9 to pass arguments,
895 // so prefer to pick an available register below.
896 if (&MF->front() == MBB &&
898 return AArch64::X9;
899
900 const AArch64Subtarget &Subtarget = MF->getSubtarget<AArch64Subtarget>();
901 const AArch64RegisterInfo &TRI = *Subtarget.getRegisterInfo();
902 LivePhysRegs LiveRegs(TRI);
903 getLiveRegsForEntryMBB(LiveRegs, *MBB);
904 if (HasCall) {
905 LiveRegs.addReg(AArch64::X16);
906 LiveRegs.addReg(AArch64::X17);
907 LiveRegs.addReg(AArch64::X18);
908 }
909
910 // Prefer X9 since it was historically used for the prologue scratch reg.
911 const MachineRegisterInfo &MRI = MF->getRegInfo();
912 if (LiveRegs.available(MRI, AArch64::X9))
913 return AArch64::X9;
914
915 for (unsigned Reg : AArch64::GPR64RegClass) {
916 if (LiveRegs.available(MRI, Reg))
917 return Reg;
918 }
919 return AArch64::NoRegister;
920}
921
923 const MachineBasicBlock &MBB) const {
924 const MachineFunction *MF = MBB.getParent();
925 MachineBasicBlock *TmpMBB = const_cast<MachineBasicBlock *>(&MBB);
926 const AArch64Subtarget &Subtarget = MF->getSubtarget<AArch64Subtarget>();
927 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
928 const AArch64TargetLowering *TLI = Subtarget.getTargetLowering();
930
931 if (AFI->hasSwiftAsyncContext()) {
932 const AArch64RegisterInfo &TRI = *Subtarget.getRegisterInfo();
933 const MachineRegisterInfo &MRI = MF->getRegInfo();
936 // The StoreSwiftAsyncContext clobbers X16 and X17. Make sure they are
937 // available.
938 if (!LiveRegs.available(MRI, AArch64::X16) ||
939 !LiveRegs.available(MRI, AArch64::X17))
940 return false;
941 }
942
943 // Certain stack probing sequences might clobber flags, in which case we
944 // can't use the block as a prologue if the flags register is a live-in.
946 MBB.isLiveIn(AArch64::NZCV))
947 return false;
948
949 if (RegInfo->hasStackRealignment(*MF) || TLI->hasInlineStackProbe(*MF))
950 if (findScratchNonCalleeSaveRegister(TmpMBB) == AArch64::NoRegister)
951 return false;
952
954 // May need a scratch register (for the return value) if we have to make a
955 // special call.
955 if (requiresSaveVG(*MF) ||
956 windowsRequiresStackProbe(*MF, std::numeric_limits<uint64_t>::max()))
957 if (findScratchNonCalleeSaveRegister(TmpMBB, true) == AArch64::NoRegister)
958 return false;
959
960 return true;
961}
962
964 const Function &F = MF.getFunction();
965 return MF.getTarget().getMCAsmInfo()->usesWindowsCFI() &&
966 F.needsUnwindTableEntry();
967}
968
969bool AArch64FrameLowering::shouldSignReturnAddressEverywhere(
970 const MachineFunction &MF) const {
971 // FIXME: With WinCFI, extra care should be taken to place SEH_PACSignLR
972 // and SEH_EpilogEnd instructions in the correct order.
974 return false;
976 bool SignReturnAddressAll = AFI->shouldSignReturnAddress(/*SpillsLR=*/false);
977 return SignReturnAddressAll;
978}
979
980// Given a load or a store instruction, generate an appropriate unwinding SEH
981// code on Windows.
983AArch64FrameLowering::insertSEH(MachineBasicBlock::iterator MBBI,
984 const TargetInstrInfo &TII,
985 MachineInstr::MIFlag Flag) const {
986 unsigned Opc = MBBI->getOpcode();
987 MachineBasicBlock *MBB = MBBI->getParent();
988 MachineFunction &MF = *MBB->getParent();
989 DebugLoc DL = MBBI->getDebugLoc();
990 unsigned ImmIdx = MBBI->getNumOperands() - 1;
991 int Imm = MBBI->getOperand(ImmIdx).getImm();
993 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
994 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
995
996 switch (Opc) {
997 default:
998 report_fatal_error("No SEH Opcode for this instruction");
999 case AArch64::STR_ZXI:
1000 case AArch64::LDR_ZXI: {
1001 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1002 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveZReg))
1003 .addImm(Reg0)
1004 .addImm(Imm)
1005 .setMIFlag(Flag);
1006 break;
1007 }
1008 case AArch64::STR_PXI:
1009 case AArch64::LDR_PXI: {
1010 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1011 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SavePReg))
1012 .addImm(Reg0)
1013 .addImm(Imm)
1014 .setMIFlag(Flag);
1015 break;
1016 }
1017 case AArch64::LDPDpost:
1018 Imm = -Imm;
1019 [[fallthrough]];
1020 case AArch64::STPDpre: {
1021 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1022 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(2).getReg());
1023 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFRegP_X))
1024 .addImm(Reg0)
1025 .addImm(Reg1)
1026 .addImm(Imm * 8)
1027 .setMIFlag(Flag);
1028 break;
1029 }
1030 case AArch64::LDPXpost:
1031 Imm = -Imm;
1032 [[fallthrough]];
1033 case AArch64::STPXpre: {
1034 Register Reg0 = MBBI->getOperand(1).getReg();
1035 Register Reg1 = MBBI->getOperand(2).getReg();
1036 if (Reg0 == AArch64::FP && Reg1 == AArch64::LR)
1037 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFPLR_X))
1038 .addImm(Imm * 8)
1039 .setMIFlag(Flag);
1040 else
1041 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveRegP_X))
1042 .addImm(RegInfo->getSEHRegNum(Reg0))
1043 .addImm(RegInfo->getSEHRegNum(Reg1))
1044 .addImm(Imm * 8)
1045 .setMIFlag(Flag);
1046 break;
1047 }
1048 case AArch64::LDRDpost:
1049 Imm = -Imm;
1050 [[fallthrough]];
1051 case AArch64::STRDpre: {
1052 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1053 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFReg_X))
1054 .addImm(Reg)
1055 .addImm(Imm)
1056 .setMIFlag(Flag);
1057 break;
1058 }
1059 case AArch64::LDRXpost:
1060 Imm = -Imm;
1061 [[fallthrough]];
1062 case AArch64::STRXpre: {
1063 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1064 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveReg_X))
1065 .addImm(Reg)
1066 .addImm(Imm)
1067 .setMIFlag(Flag);
1068 break;
1069 }
1070 case AArch64::STPDi:
1071 case AArch64::LDPDi: {
1072 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1073 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1074 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFRegP))
1075 .addImm(Reg0)
1076 .addImm(Reg1)
1077 .addImm(Imm * 8)
1078 .setMIFlag(Flag);
1079 break;
1080 }
1081 case AArch64::STPXi:
1082 case AArch64::LDPXi: {
1083 Register Reg0 = MBBI->getOperand(0).getReg();
1084 Register Reg1 = MBBI->getOperand(1).getReg();
1085 if (Reg0 == AArch64::FP && Reg1 == AArch64::LR)
1086 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFPLR))
1087 .addImm(Imm * 8)
1088 .setMIFlag(Flag);
1089 else
1090 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveRegP))
1091 .addImm(RegInfo->getSEHRegNum(Reg0))
1092 .addImm(RegInfo->getSEHRegNum(Reg1))
1093 .addImm(Imm * 8)
1094 .setMIFlag(Flag);
1095 break;
1096 }
1097 case AArch64::STRXui:
1098 case AArch64::LDRXui: {
1099 int Reg = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1100 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveReg))
1101 .addImm(Reg)
1102 .addImm(Imm * 8)
1103 .setMIFlag(Flag);
1104 break;
1105 }
1106 case AArch64::STRDui:
1107 case AArch64::LDRDui: {
1108 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1109 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFReg))
1110 .addImm(Reg)
1111 .addImm(Imm * 8)
1112 .setMIFlag(Flag);
1113 break;
1114 }
1115 case AArch64::STPQi:
1116 case AArch64::LDPQi: {
1117 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1118 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1119 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveAnyRegQP))
1120 .addImm(Reg0)
1121 .addImm(Reg1)
1122 .addImm(Imm * 16)
1123 .setMIFlag(Flag);
1124 break;
1125 }
1126 case AArch64::LDPQpost:
1127 Imm = -Imm;
1128 [[fallthrough]];
1129 case AArch64::STPQpre: {
1130 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1131 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(2).getReg());
1132 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveAnyRegQPX))
1133 .addImm(Reg0)
1134 .addImm(Reg1)
1135 .addImm(Imm * 16)
1136 .setMIFlag(Flag);
1137 break;
1138 }
1139 }
1140 auto I = MBB->insertAfter(MBBI, MIB);
1141 return I;
1142}
1143
1146 if (!AFI->needsDwarfUnwindInfo(MF) || !AFI->hasStreamingModeChanges())
1147 return false;
1148 // For Darwin platforms we don't save VG for non-SVE functions, even if SME
1149 // is enabled with streaming mode changes.
1150 auto &ST = MF.getSubtarget<AArch64Subtarget>();
1151 if (ST.isTargetDarwin())
1152 return ST.hasSVE();
1153 return true;
1154}
1155
1156static bool isTargetWindows(const MachineFunction &MF) {
1158}
1159
1161 MachineFunction &MF) const {
1162 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1163 const TargetInstrInfo *TII = Subtarget.getInstrInfo();
1164
1165 auto EmitSignRA = [&](MachineBasicBlock &MBB) {
1166 DebugLoc DL; // Set debug location to unknown.
1168
1169 BuildMI(MBB, MBBI, DL, TII->get(AArch64::PAUTH_PROLOGUE))
1171 };
1172
1173 auto EmitAuthRA = [&](MachineBasicBlock &MBB) {
1174 DebugLoc DL;
1175 MachineBasicBlock::iterator MBBI = MBB.getFirstTerminator();
1176 if (MBBI != MBB.end())
1177 DL = MBBI->getDebugLoc();
1178
1179 BuildMI(MBB, MBBI, DL, TII->get(AArch64::PAUTH_EPILOGUE))
1181 };
1182
1183 // This should be in sync with PEIImpl::calculateSaveRestoreBlocks.
1184 EmitSignRA(MF.front());
1185 for (MachineBasicBlock &MBB : MF) {
1186 if (MBB.isEHFuncletEntry())
1187 EmitSignRA(MBB);
1188 if (MBB.isReturnBlock())
1189 EmitAuthRA(MBB);
1190 }
1191}
1192
1194 MachineBasicBlock &MBB) const {
1195 AArch64PrologueEmitter PrologueEmitter(MF, MBB, *this);
1196 PrologueEmitter.emitPrologue();
1197}
1198
1200 MachineBasicBlock &MBB) const {
1201 AArch64EpilogueEmitter EpilogueEmitter(MF, MBB, *this);
1202 EpilogueEmitter.emitEpilogue();
1203}
1204
1207 MF.getInfo<AArch64FunctionInfo>()->needsDwarfUnwindInfo(MF);
1208}
1209
1211 return enableCFIFixup(MF) &&
1212 MF.getInfo<AArch64FunctionInfo>()->needsAsyncDwarfUnwindInfo(MF);
1213}
1214
1215/// getFrameIndexReference - Provide a base+offset reference to an FI slot for
1216/// debug info. It's the same as what we use for resolving the code-gen
1217/// references for now. FIXME: This can go wrong when references are
1218/// SP-relative and simple call frames aren't used.
1221 Register &FrameReg) const {
1223 MF, FI, FrameReg,
1224 /*PreferFP=*/
1225 MF.getFunction().hasFnAttribute(Attribute::SanitizeHWAddress) ||
1226 MF.getFunction().hasFnAttribute(Attribute::SanitizeMemTag),
1227 /*ForSimm=*/false);
1228}
1229
1232 int FI) const {
1233 // This function serves to provide a comparable offset from a single reference
1234 // point (the value of SP at function entry) that can be used for analysis,
1235 // e.g. the stack-frame-layout analysis pass. It is not guaranteed to be
1236 // correct for all objects in the presence of VLA-area objects or dynamic
1237 // stack re-alignment.
1238
1239 const auto &MFI = MF.getFrameInfo();
1240
1241 int64_t ObjectOffset = MFI.getObjectOffset(FI);
1242 StackOffset ZPRStackSize = getZPRStackSize(MF);
1243 StackOffset PPRStackSize = getPPRStackSize(MF);
1244 StackOffset SVEStackSize = ZPRStackSize + PPRStackSize;
1245
1246 // For VLA-area objects, just emit an offset at the end of the stack frame.
1247 // Whilst not quite correct, these objects do live at the end of the frame and
1248 // so it is more useful for analysis for the offset to reflect this.
1249 if (MFI.isVariableSizedObjectIndex(FI)) {
1250 return StackOffset::getFixed(-((int64_t)MFI.getStackSize())) - SVEStackSize;
1251 }
1252
1253 // This is correct in the absence of any SVE stack objects.
1254 if (!SVEStackSize)
1255 return StackOffset::getFixed(ObjectOffset - getOffsetOfLocalArea());
1256
1257 const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
1258 bool FPAfterSVECalleeSaves =
1260 if (MFI.hasScalableStackID(FI)) {
1261 if (FPAfterSVECalleeSaves &&
1262 -ObjectOffset <= (int64_t)AFI->getSVECalleeSavedStackSize()) {
1263 assert(!AFI->hasSplitSVEObjects() &&
1264 "split-sve-objects not supported with FPAfterSVECalleeSaves");
1265 return StackOffset::getScalable(ObjectOffset);
1266 }
1267 StackOffset AccessOffset{};
1268 // The scalable vectors are below (lower address) the scalable predicates
1269 // with split SVE objects, so we must subtract the size of the predicates.
1270 if (AFI->hasSplitSVEObjects() &&
1271 MFI.getStackID(FI) == TargetStackID::ScalableVector)
1272 AccessOffset = -PPRStackSize;
1273 return AccessOffset +
1274 StackOffset::get(-((int64_t)AFI->getCalleeSavedStackSize()),
1275 ObjectOffset);
1276 }
1277
1278 bool IsFixed = MFI.isFixedObjectIndex(FI);
1279 bool IsCSR =
1280 !IsFixed && ObjectOffset >= -((int)AFI->getCalleeSavedStackSize(MFI));
1281
1282 StackOffset ScalableOffset = {};
1283 if (!IsFixed && !IsCSR) {
1284 ScalableOffset = -SVEStackSize;
1285 } else if (FPAfterSVECalleeSaves && IsCSR) {
1286 ScalableOffset =
1288 }
1289
1290 return StackOffset::getFixed(ObjectOffset) + ScalableOffset;
1291}
1292
1298
1299StackOffset AArch64FrameLowering::getFPOffset(const MachineFunction &MF,
1300 int64_t ObjectOffset) const {
1301 const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
1302 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1303 const Function &F = MF.getFunction();
1304 bool IsWin64 = Subtarget.isCallingConvWin64(F.getCallingConv(), F.isVarArg());
1305 unsigned FixedObject =
1306 getFixedObjectSize(MF, AFI, IsWin64, /*IsFunclet=*/false);
1307 int64_t CalleeSaveSize = AFI->getCalleeSavedStackSize(MF.getFrameInfo());
1308 int64_t FPAdjust =
1309 CalleeSaveSize - AFI->getCalleeSaveBaseToFrameRecordOffset();
1310 return StackOffset::getFixed(ObjectOffset + FixedObject + FPAdjust);
1311}
1312
1313StackOffset AArch64FrameLowering::getStackOffset(const MachineFunction &MF,
1314 int64_t ObjectOffset) const {
1315 const auto &MFI = MF.getFrameInfo();
1316 return StackOffset::getFixed(ObjectOffset + (int64_t)MFI.getStackSize());
1317}
1318
1319// TODO: This function currently does not work for scalable vectors.
1321 int FI) const {
1322 const auto *RegInfo = static_cast<const AArch64RegisterInfo *>(
1323 MF.getSubtarget().getRegisterInfo());
1324 int ObjectOffset = MF.getFrameInfo().getObjectOffset(FI);
1325 return RegInfo->getLocalAddressRegister(MF) == AArch64::FP
1326 ? getFPOffset(MF, ObjectOffset).getFixed()
1327 : getStackOffset(MF, ObjectOffset).getFixed();
1328}
1329
1331 const MachineFunction &MF, int FI, Register &FrameReg, bool PreferFP,
1332 bool ForSimm) const {
1333 const auto &MFI = MF.getFrameInfo();
1334 int64_t ObjectOffset = MFI.getObjectOffset(FI);
1335 bool isFixed = MFI.isFixedObjectIndex(FI);
1336 auto StackID = static_cast<TargetStackID::Value>(MFI.getStackID(FI));
1337 return resolveFrameOffsetReference(MF, ObjectOffset, isFixed, StackID,
1338 FrameReg, PreferFP, ForSimm);
1339}
1340
1342 const MachineFunction &MF, int64_t ObjectOffset, bool isFixed,
1343 TargetStackID::Value StackID, Register &FrameReg, bool PreferFP,
1344 bool ForSimm) const {
1345 const auto &MFI = MF.getFrameInfo();
1346 const auto *RegInfo = static_cast<const AArch64RegisterInfo *>(
1347 MF.getSubtarget().getRegisterInfo());
1348 const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
1349 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1350
1351 int64_t FPOffset = getFPOffset(MF, ObjectOffset).getFixed();
1352 int64_t Offset = getStackOffset(MF, ObjectOffset).getFixed();
1353 bool isCSR =
1354 !isFixed && ObjectOffset >= -((int)AFI->getCalleeSavedStackSize(MFI));
1355 bool isSVE = MFI.isScalableStackID(StackID);
1356
1357 StackOffset ZPRStackSize = getZPRStackSize(MF);
1358 StackOffset PPRStackSize = getPPRStackSize(MF);
1359 StackOffset SVEStackSize = ZPRStackSize + PPRStackSize;
1360
1361 // Use frame pointer to reference fixed objects. Use it for locals if
1362 // there are VLAs or a dynamically realigned SP (and thus the SP isn't
1363 // reliable as a base). Make sure useFPForScavengingIndex() does the
1364 // right thing for the emergency spill slot.
1365 bool UseFP = false;
1366 if (AFI->hasStackFrame() && !isSVE) {
1367 // We shouldn't prefer using the FP to access fixed-sized stack objects when
1368 // there are scalable (SVE) objects in between the FP and the fixed-sized
1369 // objects.
1370 PreferFP &= !SVEStackSize;
1371
1372 // Note: Keeping the following as multiple 'if' statements rather than
1373 // merging to a single expression for readability.
1374 //
1375 // Argument access should always use the FP.
1376 if (isFixed) {
1377 UseFP = hasFP(MF);
1378 } else if (isCSR && RegInfo->hasStackRealignment(MF)) {
1379 // References to the CSR area must use FP if we're re-aligning the stack
1380 // since the dynamically-sized alignment padding is between the SP/BP and
1381 // the CSR area.
1382 assert(hasFP(MF) && "Re-aligned stack must have frame pointer");
1383 UseFP = true;
1384 } else if (hasFP(MF) && !RegInfo->hasStackRealignment(MF)) {
1385 // If the FPOffset is negative and we're producing a signed immediate, we
1386 // have to keep in mind that the available offset range for negative
1387 // offsets is smaller than for positive ones. If an offset is available
1388 // via the FP and the SP, use whichever is closest.
1389 bool FPOffsetFits = !ForSimm || FPOffset >= -256;
1390 PreferFP |= Offset > -FPOffset && !SVEStackSize;
1391
1392 if (FPOffset >= 0) {
1393 // If the FPOffset is positive, that'll always be best, as the SP/BP
1394 // will be even further away.
1395 UseFP = true;
1396 } else if (MFI.hasVarSizedObjects()) {
1397 // If we have variable sized objects, we can use either FP or BP, as the
1398 // SP offset is unknown. We can use the base pointer if we have one and
1399 // FP is not preferred. If not, we're stuck with using FP.
1400 bool CanUseBP = RegInfo->hasBasePointer(MF);
1401 if (FPOffsetFits && CanUseBP) // Both are ok. Pick the best.
1402 UseFP = PreferFP;
1403 else if (!CanUseBP) // Can't use BP. Forced to use FP.
1404 UseFP = true;
1405 // else we can use BP and FP, but the offset from FP won't fit.
1406 // That will make us scavenge registers which we can probably avoid by
1407 // using BP. If it won't fit for BP either, we'll scavenge anyway.
1408 } else if (MF.hasEHFunclets() && !RegInfo->hasBasePointer(MF)) {
1409 // Funclets access the locals contained in the parent's stack frame
1410 // via the frame pointer, so we have to use the FP in the parent
1411 // function.
1412 (void) Subtarget;
1413 assert(Subtarget.isCallingConvWin64(MF.getFunction().getCallingConv(),
1414 MF.getFunction().isVarArg()) &&
1415 "Funclets should only be present on Win64");
1416 UseFP = true;
1417 } else {
1418 // We have the choice between FP and (SP or BP).
1419 if (FPOffsetFits && PreferFP) // If FP is the best fit, use it.
1420 UseFP = true;
1421 }
1422 }
1423 }
1424
1425 assert(
1426 ((isFixed || isCSR) || !RegInfo->hasStackRealignment(MF) || !UseFP) &&
1427 "In the presence of dynamic stack pointer realignment, "
1428 "non-argument/CSR objects cannot be accessed through the frame pointer");
1429
1430 bool FPAfterSVECalleeSaves =
1432
1433 if (isSVE) {
1434 StackOffset FPOffset = StackOffset::get(
1435 -AFI->getCalleeSaveBaseToFrameRecordOffset(), ObjectOffset);
1436 StackOffset SPOffset =
1437 SVEStackSize +
1438 StackOffset::get(MFI.getStackSize() - AFI->getCalleeSavedStackSize(),
1439 ObjectOffset);
1440
1441 // With split SVE objects the ObjectOffset is relative to the split area
1442 // (i.e. the PPR area or ZPR area respectively).
1443 if (AFI->hasSplitSVEObjects() && StackID == TargetStackID::ScalableVector) {
1444 // If we're accessing an SVE vector with split SVE objects...
1445 // - From the FP we need to move down past the PPR area:
1446 FPOffset -= PPRStackSize;
1447 // - From the SP we only need to move up to the ZPR area:
1448 SPOffset -= PPRStackSize;
1449 // Note: `SPOffset = SVEStackSize + ...`, so `-= PPRStackSize` results in
1450 // `SPOffset = ZPRStackSize + ...`.
1451 }
1452
1453 if (FPAfterSVECalleeSaves) {
1455 if (-ObjectOffset <= (int64_t)AFI->getSVECalleeSavedStackSize()) {
1458 }
1459 }
1460
1461 // Always use the FP for SVE spills if available and beneficial.
1462 if (hasFP(MF) && (SPOffset.getFixed() ||
1463 FPOffset.getScalable() < SPOffset.getScalable() ||
1464 RegInfo->hasStackRealignment(MF))) {
1465 FrameReg = RegInfo->getFrameRegister(MF);
1466 return FPOffset;
1467 }
1468 FrameReg = RegInfo->hasBasePointer(MF) ? RegInfo->getBaseRegister()
1469 : (unsigned)AArch64::SP;
1470
1471 return SPOffset;
1472 }
1473
1474 StackOffset SVEAreaOffset = {};
1475 if (FPAfterSVECalleeSaves) {
1476 // In this stack layout, the FP is in between the callee saves and other
1477 // SVE allocations.
1478 StackOffset SVECalleeSavedStack =
1480 if (UseFP) {
1481 if (isFixed)
1482 SVEAreaOffset = SVECalleeSavedStack;
1483 else if (!isCSR)
1484 SVEAreaOffset = SVECalleeSavedStack - SVEStackSize;
1485 } else {
1486 if (isFixed)
1487 SVEAreaOffset = SVEStackSize;
1488 else if (isCSR)
1489 SVEAreaOffset = SVEStackSize - SVECalleeSavedStack;
1490 }
1491 } else {
1492 if (UseFP && !(isFixed || isCSR))
1493 SVEAreaOffset = -SVEStackSize;
1494 if (!UseFP && (isFixed || isCSR))
1495 SVEAreaOffset = SVEStackSize;
1496 }
1497
1498 if (UseFP) {
1499 FrameReg = RegInfo->getFrameRegister(MF);
1500 return StackOffset::getFixed(FPOffset) + SVEAreaOffset;
1501 }
1502
1503 // Use the base pointer if we have one.
1504 if (RegInfo->hasBasePointer(MF))
1505 FrameReg = RegInfo->getBaseRegister();
1506 else {
1507 assert(!MFI.hasVarSizedObjects() &&
1508 "Can't use SP when we have var sized objects.");
1509 FrameReg = AArch64::SP;
1510 // If we're using the red zone for this function, the SP won't actually
1511 // be adjusted, so the offsets will be negative. They're also all
1512 // within range of the signed 9-bit immediate instructions.
1513 if (canUseRedZone(MF))
1514 Offset -= AFI->getLocalStackSize();
1515 }
1516
1517 return StackOffset::getFixed(Offset) + SVEAreaOffset;
1518}
1519
1520static unsigned getPrologueDeath(MachineFunction &MF, unsigned Reg) {
1521 // Do not set a kill flag on values that are also marked as live-in. This
1522 // happens with the @llvm.returnaddress intrinsic and with arguments passed in
1523 // callee saved registers.
1524 // Omitting the kill flags is conservatively correct even if the live-in
1525 // is not used after all.
1526 bool IsLiveIn = MF.getRegInfo().isLiveIn(Reg);
1527 return getKillRegState(!IsLiveIn);
1528}
1529
1531 MachineFunction &MF) {
1532 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1533 AttributeList Attrs = MF.getFunction().getAttributes();
1535 return Subtarget.isTargetMachO() &&
1536 !(Subtarget.getTargetLowering()->supportSwiftError() &&
1537 Attrs.hasAttrSomewhere(Attribute::SwiftError)) &&
1539 !AFL.requiresSaveVG(MF) && !AFI->isSVECC();
1540}
1541
1542static bool invalidateWindowsRegisterPairing(unsigned Reg1, unsigned Reg2,
1543 bool NeedsWinCFI, bool IsFirst,
1544 const TargetRegisterInfo *TRI) {
1545 // If we are generating register pairs for a Windows function that requires
1546 // EH support, then pair consecutive registers only. There are no unwind
1547 // opcodes for saves/restores of non-consecutive register pairs.
1548 // The unwind opcodes are save_regp, save_regp_x, save_fregp, save_fregp_x,
1549 // save_lrpair.
1550 // https://docs.microsoft.com/en-us/cpp/build/arm64-exception-handling
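  // For instance (hypothetical registers): x19 paired with x20 (consecutive
  // encodings) can be described by save_regp, whereas x19 paired with x22
  // cannot, so such a pair must be rejected when WinCFI is needed.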
1551
1552 if (Reg2 == AArch64::FP)
1553 return true;
1554 if (!NeedsWinCFI)
1555 return false;
1556 if (TRI->getEncodingValue(Reg2) == TRI->getEncodingValue(Reg1) + 1)
1557 return false;
1558 // If pairing a GPR with LR, the pair can be described by the save_lrpair
1559 // opcode. If this is the first register pair, it would end up with a
1560 // predecrement, but there's no save_lrpair_x opcode, so we can only do this
1561 // if LR is paired with something other than the first register.
1562 // The save_lrpair opcode requires the first register to be an odd one.
1563 if (Reg1 >= AArch64::X19 && Reg1 <= AArch64::X27 &&
1564 (Reg1 - AArch64::X19) % 2 == 0 && Reg2 == AArch64::LR && !IsFirst)
1565 return false;
1566 return true;
1567}
1568
1569/// Returns true if Reg1 and Reg2 cannot be paired using a ldp/stp instruction.
1570/// WindowsCFI requires that only consecutive registers can be paired.
1571/// LR and FP need to be allocated together when the frame needs to save
1572/// the frame-record. This means any other register pairing with LR is invalid.
1573static bool invalidateRegisterPairing(unsigned Reg1, unsigned Reg2,
1574 bool UsesWinAAPCS, bool NeedsWinCFI,
1575 bool NeedsFrameRecord, bool IsFirst,
1576 const TargetRegisterInfo *TRI) {
1577 if (UsesWinAAPCS)
1578 return invalidateWindowsRegisterPairing(Reg1, Reg2, NeedsWinCFI, IsFirst,
1579 TRI);
1580
1581 // If we need to store the frame record, don't pair any register
1582 // with LR other than FP.
1583 if (NeedsFrameRecord)
1584 return Reg2 == AArch64::LR;
1585
1586 return false;
1587}
1588
1589namespace {
1590
1591struct RegPairInfo {
1592 unsigned Reg1 = AArch64::NoRegister;
1593 unsigned Reg2 = AArch64::NoRegister;
1594 int FrameIdx;
1595 int Offset;
1596 enum RegType { GPR, FPR64, FPR128, PPR, ZPR, VG } Type;
1597 const TargetRegisterClass *RC;
1598
1599 RegPairInfo() = default;
1600
1601 bool isPaired() const { return Reg2 != AArch64::NoRegister; }
1602
1603 bool isScalable() const { return Type == PPR || Type == ZPR; }
1604};
1605
1606} // end anonymous namespace
1607
1608unsigned findFreePredicateReg(BitVector &SavedRegs) {
1609 for (unsigned PReg = AArch64::P8; PReg <= AArch64::P15; ++PReg) {
1610 if (SavedRegs.test(PReg)) {
1611 unsigned PNReg = PReg - AArch64::P0 + AArch64::PN0;
1612 return PNReg;
1613 }
1614 }
1615 return AArch64::NoRegister;
1616}
1617
1618// The multivector LD/ST instructions are available only for SME or SVE2p1 targets.
1620 MachineFunction &MF) {
1622 return false;
1623
1624 SMEAttrs FuncAttrs = MF.getInfo<AArch64FunctionInfo>()->getSMEFnAttrs();
1625 bool IsLocallyStreaming =
1626 FuncAttrs.hasStreamingBody() && !FuncAttrs.hasStreamingInterface();
1627
1628 // SME2 instructions can only be used safely when in streaming mode.
1629 // It is not safe to use SME2 instructions when in streaming compatible or
1630 // locally streaming mode.
1631 return Subtarget.hasSVE2p1() ||
1632 (Subtarget.hasSME2() &&
1633 (!IsLocallyStreaming && Subtarget.isStreaming()));
1634}
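// For example, a function built for +sve2p1 can always use the multi-vector
// forms, whereas with only +sme2 they are used solely in functions with a
// streaming interface (not streaming-compatible or locally streaming ones).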
1635
1636static void computeCalleeSaveRegisterPairs(const AArch64FrameLowering &AFL,
1637                                           MachineFunction &MF,
1638                                           ArrayRef<CalleeSavedInfo> CSI,
1639                                           const TargetRegisterInfo *TRI,
1640                                           SmallVectorImpl<RegPairInfo> &RegPairs,
1641                                           bool NeedsFrameRecord) {
1642
1643 if (CSI.empty())
1644 return;
1645
1646 bool IsWindows = isTargetWindows(MF);
1647 bool NeedsWinCFI = AFL.needsWinCFI(MF);
1648 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
1649 unsigned StackHazardSize = getStackHazardSize(MF);
1650 MachineFrameInfo &MFI = MF.getFrameInfo();
1651 CallingConv::ID CC = MF.getFunction().getCallingConv();
1652 unsigned Count = CSI.size();
1653 (void)CC;
1654 // MachO's compact unwind format relies on all registers being stored in
1655 // pairs.
1656 assert((!produceCompactUnwindFrame(AFL, MF) ||
1657 CC == CallingConv::PreserveMost || CC == CallingConv::PreserveAll ||
1658 CC == CallingConv::CXX_FAST_TLS || CC == CallingConv::Win64 ||
1659 (Count & 1) == 0) &&
1660 "Odd number of callee-saved regs to spill!");
1661 int ByteOffset = AFI->getCalleeSavedStackSize();
1662 int StackFillDir = -1;
1663 int RegInc = 1;
1664 unsigned FirstReg = 0;
1665 if (NeedsWinCFI) {
1666 // For WinCFI, fill the stack from the bottom up.
1667 ByteOffset = 0;
1668 StackFillDir = 1;
1669 // As the CSI array is reversed to match PrologEpilogInserter, iterate
1670 // backwards, to pair up registers starting from lower numbered registers.
1671 RegInc = -1;
1672 FirstReg = Count - 1;
1673 }
1674
1675 bool FPAfterSVECalleeSaves = IsWindows && AFI->getSVECalleeSavedStackSize();
1676
1677 int ZPRByteOffset = 0;
1678 int PPRByteOffset = 0;
1679 bool SplitPPRs = AFI->hasSplitSVEObjects();
1680 if (SplitPPRs) {
1681 ZPRByteOffset = AFI->getZPRCalleeSavedStackSize();
1682 PPRByteOffset = AFI->getPPRCalleeSavedStackSize();
1683 } else if (!FPAfterSVECalleeSaves) {
1684 ZPRByteOffset =
1685 AFI->getZPRCalleeSavedStackSize() + AFI->getPPRCalleeSavedStackSize();
1686 // Unused: Everything goes in ZPR space.
1687 PPRByteOffset = 0;
1688 }
1689
1690 bool NeedGapToAlignStack = AFI->hasCalleeSaveStackFreeSpace();
1691 Register LastReg = 0;
1692 bool HasCSHazardPadding = AFI->hasStackHazardSlotIndex() && !SplitPPRs;
1693
1694 // When iterating backwards, the loop condition relies on unsigned wraparound.
1695 for (unsigned i = FirstReg; i < Count; i += RegInc) {
1696 RegPairInfo RPI;
1697 RPI.Reg1 = CSI[i].getReg();
1698
1699 if (AArch64::GPR64RegClass.contains(RPI.Reg1)) {
1700 RPI.Type = RegPairInfo::GPR;
1701 RPI.RC = &AArch64::GPR64RegClass;
1702 } else if (AArch64::FPR64RegClass.contains(RPI.Reg1)) {
1703 RPI.Type = RegPairInfo::FPR64;
1704 RPI.RC = &AArch64::FPR64RegClass;
1705 } else if (AArch64::FPR128RegClass.contains(RPI.Reg1)) {
1706 RPI.Type = RegPairInfo::FPR128;
1707 RPI.RC = &AArch64::FPR128RegClass;
1708 } else if (AArch64::ZPRRegClass.contains(RPI.Reg1)) {
1709 RPI.Type = RegPairInfo::ZPR;
1710 RPI.RC = &AArch64::ZPRRegClass;
1711 } else if (AArch64::PPRRegClass.contains(RPI.Reg1)) {
1712 RPI.Type = RegPairInfo::PPR;
1713 RPI.RC = &AArch64::PPRRegClass;
1714 } else if (RPI.Reg1 == AArch64::VG) {
1715 RPI.Type = RegPairInfo::VG;
1716 RPI.RC = &AArch64::FIXED_REGSRegClass;
1717 } else {
1718 llvm_unreachable("Unsupported register class.");
1719 }
1720
1721 int &ScalableByteOffset = RPI.Type == RegPairInfo::PPR && SplitPPRs
1722 ? PPRByteOffset
1723 : ZPRByteOffset;
1724
1725 // Add the stack hazard size as we transition from GPR->FPR CSRs.
1726 if (HasCSHazardPadding &&
1727 (!LastReg || !AArch64InstrInfo::isFpOrNEON(LastReg)) &&
1728 AArch64InstrInfo::isFpOrNEON(RPI.Reg1))
1729 ByteOffset += StackFillDir * StackHazardSize;
1730 LastReg = RPI.Reg1;
1731
1732 int Scale = TRI->getSpillSize(*RPI.RC);
1733 // Add the next reg to the pair if it is in the same register class.
1734 if (unsigned(i + RegInc) < Count && !HasCSHazardPadding) {
1735 MCRegister NextReg = CSI[i + RegInc].getReg();
1736 bool IsFirst = i == FirstReg;
1737 switch (RPI.Type) {
1738 case RegPairInfo::GPR:
1739 if (AArch64::GPR64RegClass.contains(NextReg) &&
1740 !invalidateRegisterPairing(RPI.Reg1, NextReg, IsWindows,
1741 NeedsWinCFI, NeedsFrameRecord, IsFirst,
1742 TRI))
1743 RPI.Reg2 = NextReg;
1744 break;
1745 case RegPairInfo::FPR64:
1746 if (AArch64::FPR64RegClass.contains(NextReg) &&
1747 !invalidateWindowsRegisterPairing(RPI.Reg1, NextReg, NeedsWinCFI,
1748 IsFirst, TRI))
1749 RPI.Reg2 = NextReg;
1750 break;
1751 case RegPairInfo::FPR128:
1752 if (AArch64::FPR128RegClass.contains(NextReg))
1753 RPI.Reg2 = NextReg;
1754 break;
1755 case RegPairInfo::PPR:
1756 break;
1757 case RegPairInfo::ZPR:
1758 if (AFI->getPredicateRegForFillSpill() != 0 &&
1759 ((RPI.Reg1 - AArch64::Z0) & 1) == 0 && (NextReg == RPI.Reg1 + 1)) {
1760 // Calculate offset of register pair to see if pair instruction can be
1761 // used.
1762 int Offset = (ScalableByteOffset + StackFillDir * 2 * Scale) / Scale;
1763 if ((-16 <= Offset && Offset <= 14) && (Offset % 2 == 0))
1764 RPI.Reg2 = NextReg;
1765 }
1766 break;
1767 case RegPairInfo::VG:
1768 break;
1769 }
1770 }
1771
1772 // GPRs and FPRs are saved in pairs of 64-bit regs. We expect the CSI
1773 // list to come in sorted by frame index so that we can issue the store
1774 // pair instructions directly. Assert if we see anything otherwise.
1775 //
1776 // The order of the registers in the list is controlled by
1777 // getCalleeSavedRegs(), so they will always be in-order, as well.
1778 assert((!RPI.isPaired() ||
1779 (CSI[i].getFrameIdx() + RegInc == CSI[i + RegInc].getFrameIdx())) &&
1780 "Out of order callee saved regs!");
1781
1782 assert((!RPI.isPaired() || !NeedsFrameRecord || RPI.Reg2 != AArch64::FP ||
1783 RPI.Reg1 == AArch64::LR) &&
1784 "FrameRecord must be allocated together with LR");
1785
1786 // Windows AAPCS has FP and LR reversed.
1787 assert((!RPI.isPaired() || !NeedsFrameRecord || RPI.Reg1 != AArch64::FP ||
1788 RPI.Reg2 == AArch64::LR) &&
1789 "FrameRecord must be allocated together with LR");
1790
1791 // MachO's compact unwind format relies on all registers being stored in
1792 // adjacent register pairs.
1793 assert((!produceCompactUnwindFrame(AFL, MF) ||
1794 CC == CallingConv::PreserveMost || CC == CallingConv::PreserveAll ||
1795 CC == CallingConv::CXX_FAST_TLS || CC == CallingConv::Win64 ||
1796 (RPI.isPaired() &&
1797 ((RPI.Reg1 == AArch64::LR && RPI.Reg2 == AArch64::FP) ||
1798 RPI.Reg1 + 1 == RPI.Reg2))) &&
1799 "Callee-save registers not saved as adjacent register pair!");
1800
1801 RPI.FrameIdx = CSI[i].getFrameIdx();
1802 if (NeedsWinCFI &&
1803 RPI.isPaired()) // RPI.FrameIdx must be the lower index of the pair
1804 RPI.FrameIdx = CSI[i + RegInc].getFrameIdx();
1805
1806 // Realign the scalable offset if necessary. This is relevant when
1807 // spilling predicates on Windows.
1808 if (RPI.isScalable() && ScalableByteOffset % Scale != 0) {
1809 ScalableByteOffset = alignTo(ScalableByteOffset, Scale);
1810 }
1811
1812 int OffsetPre = RPI.isScalable() ? ScalableByteOffset : ByteOffset;
1813 assert(OffsetPre % Scale == 0);
1814
1815 if (RPI.isScalable())
1816 ScalableByteOffset += StackFillDir * (RPI.isPaired() ? 2 * Scale : Scale);
1817 else
1818 ByteOffset += StackFillDir * (RPI.isPaired() ? 2 * Scale : Scale);
1819
1820 // Swift's async context is directly before FP, so allocate an extra
1821 // 8 bytes for it.
1822 if (NeedsFrameRecord && AFI->hasSwiftAsyncContext() &&
1823 ((!IsWindows && RPI.Reg2 == AArch64::FP) ||
1824 (IsWindows && RPI.Reg2 == AArch64::LR)))
1825 ByteOffset += StackFillDir * 8;
1826
1827 // Round up size of non-pair to pair size if we need to pad the
1828 // callee-save area to ensure 16-byte alignment.
1829 if (NeedGapToAlignStack && !NeedsWinCFI && !RPI.isScalable() &&
1830 RPI.Type != RegPairInfo::FPR128 && !RPI.isPaired() &&
1831 ByteOffset % 16 != 0) {
1832 ByteOffset += 8 * StackFillDir;
1833 assert(MFI.getObjectAlign(RPI.FrameIdx) <= Align(16));
1834 // A stack frame with a gap looks like this, bottom up:
1835 // d9, d8. x21, gap, x20, x19.
1836 // Set extra alignment on the x21 object to create the gap above it.
1837 MFI.setObjectAlignment(RPI.FrameIdx, Align(16));
1838 NeedGapToAlignStack = false;
1839 }
1840
1841 int OffsetPost = RPI.isScalable() ? ScalableByteOffset : ByteOffset;
1842 assert(OffsetPost % Scale == 0);
1843 // If filling top down (default), we want the offset after incrementing it.
1844 // If filling bottom up (WinCFI) we need the original offset.
1845 int Offset = NeedsWinCFI ? OffsetPre : OffsetPost;
1846
1847 // The FP, LR pair goes 8 bytes into our expanded 24-byte slot so that the
1848 // Swift context can directly precede FP.
1849 if (NeedsFrameRecord && AFI->hasSwiftAsyncContext() &&
1850 ((!IsWindows && RPI.Reg2 == AArch64::FP) ||
1851 (IsWindows && RPI.Reg2 == AArch64::LR)))
1852 Offset += 8;
1853 RPI.Offset = Offset / Scale;
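// Note: non-scalable pairs are stored with LDP/STP, whose scaled immediate
// is a signed 7-bit value (-64..63); the scalable forms accept a wider
// signed range, hence the two bounds in the assert below.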
1854
1855 assert((!RPI.isPaired() ||
1856 (!RPI.isScalable() && RPI.Offset >= -64 && RPI.Offset <= 63) ||
1857 (RPI.isScalable() && RPI.Offset >= -256 && RPI.Offset <= 255)) &&
1858 "Offset out of bounds for LDP/STP immediate");
1859
1860 auto isFrameRecord = [&] {
1861 if (RPI.isPaired())
1862 return IsWindows ? RPI.Reg1 == AArch64::FP && RPI.Reg2 == AArch64::LR
1863 : RPI.Reg1 == AArch64::LR && RPI.Reg2 == AArch64::FP;
1864 // Otherwise, look for the frame record as two unpaired registers. This is
1865 // needed for -aarch64-stack-hazard-size=<val>, which disables register
1866 // pairing (as the padding may be too large for the LDP/STP offset). Note:
1867 // On Windows, this check works out as current reg == FP, next reg == LR,
1868 // and on other platforms current reg == FP, previous reg == LR. This
1869 // works out as the correct pre-increment or post-increment offsets
1870 // respectively.
1871 return i > 0 && RPI.Reg1 == AArch64::FP &&
1872 CSI[i - 1].getReg() == AArch64::LR;
1873 };
1874
1875 // Save the offset to frame record so that the FP register can point to the
1876 // innermost frame record (spilled FP and LR registers).
1877 if (NeedsFrameRecord && isFrameRecord())
1878 AFI->setCalleeSaveBaseToFrameRecordOffset(Offset);
1879
1880 RegPairs.push_back(RPI);
1881 if (RPI.isPaired())
1882 i += RegInc;
1883 }
1884 if (NeedsWinCFI) {
1885 // If we need an alignment gap in the stack, align the topmost stack
1886 // object. A stack frame with a gap looks like this, bottom up:
1887 // x19, d8. d9, gap.
1888 // Set extra alignment on the topmost stack object (the first element in
1889 // CSI, which goes top down), to create the gap above it.
1890 if (AFI->hasCalleeSaveStackFreeSpace())
1891 MFI.setObjectAlignment(CSI[0].getFrameIdx(), Align(16));
1892 // We iterated bottom up over the registers; flip RegPairs back to top
1893 // down order.
1894 std::reverse(RegPairs.begin(), RegPairs.end());
1895 }
1896}
1897
1898bool AArch64FrameLowering::spillCalleeSavedRegisters(
1899    MachineBasicBlock &MBB, MachineBasicBlock::iterator MI,
1900    ArrayRef<CalleeSavedInfo> CSI, const TargetRegisterInfo *TRI) const {
1901 MachineFunction &MF = *MBB.getParent();
1902 auto &TLI = *MF.getSubtarget<AArch64Subtarget>().getTargetLowering();
1904 bool NeedsWinCFI = needsWinCFI(MF);
1905 DebugLoc DL;
1906 SmallVector<RegPairInfo, 8> RegPairs;
1907
1908 computeCalleeSaveRegisterPairs(*this, MF, CSI, TRI, RegPairs, hasFP(MF));
1909
1911 // Refresh the reserved regs in case there are any potential changes since the
1912 // last freeze.
1913 MRI.freezeReservedRegs();
1914
1915 if (homogeneousPrologEpilog(MF)) {
1916 auto MIB = BuildMI(MBB, MI, DL, TII.get(AArch64::HOM_Prolog))
1917 .setMIFlag(MachineInstr::FrameSetup);
1918
1919 for (auto &RPI : RegPairs) {
1920 MIB.addReg(RPI.Reg1);
1921 MIB.addReg(RPI.Reg2);
1922
1923 // Update register live in.
1924 if (!MRI.isReserved(RPI.Reg1))
1925 MBB.addLiveIn(RPI.Reg1);
1926 if (RPI.isPaired() && !MRI.isReserved(RPI.Reg2))
1927 MBB.addLiveIn(RPI.Reg2);
1928 }
1929 return true;
1930 }
1931 bool PTrueCreated = false;
1932 for (const RegPairInfo &RPI : llvm::reverse(RegPairs)) {
1933 unsigned Reg1 = RPI.Reg1;
1934 unsigned Reg2 = RPI.Reg2;
1935 unsigned StrOpc;
1936
1937 // Issue sequence of spills for cs regs. The first spill may be converted
1938 // to a pre-decrement store later by emitPrologue if the callee-save stack
1939 // area allocation can't be combined with the local stack area allocation.
1940 // For example:
1941 // stp x22, x21, [sp, #0] // addImm(+0)
1942 // stp x20, x19, [sp, #16] // addImm(+2)
1943 // stp fp, lr, [sp, #32] // addImm(+4)
1944 // Rationale: This sequence saves uop updates compared to a sequence of
1945 // pre-increment spills like stp xi,xj,[sp,#-16]!
1946 // Note: Similar rationale and sequence for restores in epilog.
1947 unsigned Size = TRI->getSpillSize(*RPI.RC);
1948 Align Alignment = TRI->getSpillAlign(*RPI.RC);
1949 switch (RPI.Type) {
1950 case RegPairInfo::GPR:
1951 StrOpc = RPI.isPaired() ? AArch64::STPXi : AArch64::STRXui;
1952 break;
1953 case RegPairInfo::FPR64:
1954 StrOpc = RPI.isPaired() ? AArch64::STPDi : AArch64::STRDui;
1955 break;
1956 case RegPairInfo::FPR128:
1957 StrOpc = RPI.isPaired() ? AArch64::STPQi : AArch64::STRQui;
1958 break;
1959 case RegPairInfo::ZPR:
1960 StrOpc = RPI.isPaired() ? AArch64::ST1B_2Z_IMM : AArch64::STR_ZXI;
1961 break;
1962 case RegPairInfo::PPR:
1963 StrOpc = AArch64::STR_PXI;
1964 break;
1965 case RegPairInfo::VG:
1966 StrOpc = AArch64::STRXui;
1967 break;
1968 }
1969
1970 unsigned X0Scratch = AArch64::NoRegister;
1971 auto RestoreX0 = make_scope_exit([&] {
1972 if (X0Scratch != AArch64::NoRegister)
1973 BuildMI(MBB, MI, DL, TII.get(TargetOpcode::COPY), AArch64::X0)
1974 .addReg(X0Scratch)
1975 .setMIFlag(MachineInstr::FrameSetup);
1976 });
1977
1978 if (Reg1 == AArch64::VG) {
1979 // Find an available register to store value of VG to.
1980 Reg1 = findScratchNonCalleeSaveRegister(&MBB, true);
1981 assert(Reg1 != AArch64::NoRegister);
1982 if (MF.getSubtarget<AArch64Subtarget>().hasSVE()) {
1983 BuildMI(MBB, MI, DL, TII.get(AArch64::CNTD_XPiI), Reg1)
1984 .addImm(31)
1985 .addImm(1)
1986 .setMIFlag(MachineInstr::FrameSetup);
1987 } else {
1988 const AArch64Subtarget &STI = MF.getSubtarget<AArch64Subtarget>();
1989 if (any_of(MBB.liveins(),
1990 [&STI](const MachineBasicBlock::RegisterMaskPair &LiveIn) {
1991 return STI.getRegisterInfo()->isSuperOrSubRegisterEq(
1992 AArch64::X0, LiveIn.PhysReg);
1993 })) {
1994 X0Scratch = Reg1;
1995 BuildMI(MBB, MI, DL, TII.get(TargetOpcode::COPY), X0Scratch)
1996 .addReg(AArch64::X0)
1997 .setMIFlag(MachineInstr::FrameSetup);
1998 }
1999
2000 RTLIB::Libcall LC = RTLIB::SMEABI_GET_CURRENT_VG;
2001 const uint32_t *RegMask =
2002 TRI->getCallPreservedMask(MF, TLI.getLibcallCallingConv(LC));
2003 BuildMI(MBB, MI, DL, TII.get(AArch64::BL))
2004 .addExternalSymbol(TLI.getLibcallName(LC))
2005 .addRegMask(RegMask)
2006 .addReg(AArch64::X0, RegState::ImplicitDefine)
2007 .setMIFlag(MachineInstr::FrameSetup);
2008 Reg1 = AArch64::X0;
2009 }
2010 }
2011
2012 LLVM_DEBUG(dbgs() << "CSR spill: (" << printReg(Reg1, TRI);
2013 if (RPI.isPaired()) dbgs() << ", " << printReg(Reg2, TRI);
2014 dbgs() << ") -> fi#(" << RPI.FrameIdx;
2015 if (RPI.isPaired()) dbgs() << ", " << RPI.FrameIdx + 1;
2016 dbgs() << ")\n");
2017
2018 assert((!NeedsWinCFI || !(Reg1 == AArch64::LR && Reg2 == AArch64::FP)) &&
2019 "Windows unwinding requires a consecutive (FP, LR) pair");
2020 // Windows unwind codes require consecutive registers if registers are
2021 // paired. Make the switch here, so that the code below will save (x,x+1)
2022 // and not (x+1,x).
2023 unsigned FrameIdxReg1 = RPI.FrameIdx;
2024 unsigned FrameIdxReg2 = RPI.FrameIdx + 1;
2025 if (NeedsWinCFI && RPI.isPaired()) {
2026 std::swap(Reg1, Reg2);
2027 std::swap(FrameIdxReg1, FrameIdxReg2);
2028 }
2029
2030 if (RPI.isPaired() && RPI.isScalable()) {
2031 [[maybe_unused]] const AArch64Subtarget &Subtarget =
2032 MF.getSubtarget<AArch64Subtarget>();
2034 unsigned PnReg = AFI->getPredicateRegForFillSpill();
2035 assert((PnReg != 0 && enableMultiVectorSpillFill(Subtarget, MF)) &&
2036 "Expects SVE2.1 or SME2 target and a predicate register");
2037#ifdef EXPENSIVE_CHECKS
2038 auto IsPPR = [](const RegPairInfo &c) {
2039 return c.Type == RegPairInfo::PPR;
2040 };
2041 auto PPRBegin = std::find_if(RegPairs.begin(), RegPairs.end(), IsPPR);
2042 auto IsZPR = [](const RegPairInfo &c) {
2043 return c.Type == RegPairInfo::ZPR;
2044 };
2045 auto ZPRBegin = std::find_if(RegPairs.begin(), RegPairs.end(), IsZPR);
2046 assert(!(PPRBegin < ZPRBegin) &&
2047 "Expected callee save predicate to be handled first");
2048#endif
2049 if (!PTrueCreated) {
2050 PTrueCreated = true;
2051 BuildMI(MBB, MI, DL, TII.get(AArch64::PTRUE_C_B), PnReg)
2052 .setMIFlags(MachineInstr::FrameSetup);
2053 }
2054 MachineInstrBuilder MIB = BuildMI(MBB, MI, DL, TII.get(StrOpc));
2055 if (!MRI.isReserved(Reg1))
2056 MBB.addLiveIn(Reg1);
2057 if (!MRI.isReserved(Reg2))
2058 MBB.addLiveIn(Reg2);
2059 MIB.addReg(/*PairRegs*/ AArch64::Z0_Z1 + (RPI.Reg1 - AArch64::Z0));
2060 MIB.addMemOperand(MF.getMachineMemOperand(
2061 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
2062 MachineMemOperand::MOStore, Size, Alignment));
2063 MIB.addReg(PnReg);
2064 MIB.addReg(AArch64::SP)
2065 .addImm(RPI.Offset / 2) // [sp, #imm*2*vscale],
2066 // where 2*vscale is implicit
2067 .setMIFlag(MachineInstr::FrameSetup);
2068 MIB.addMemOperand(MF.getMachineMemOperand(
2069 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
2070 MachineMemOperand::MOStore, Size, Alignment));
2071 if (NeedsWinCFI)
2072 insertSEH(MIB, TII, MachineInstr::FrameSetup);
2073 } else { // Handle the case where there is no paired ZPR spill.
2074 MachineInstrBuilder MIB = BuildMI(MBB, MI, DL, TII.get(StrOpc));
2075 if (!MRI.isReserved(Reg1))
2076 MBB.addLiveIn(Reg1);
2077 if (RPI.isPaired()) {
2078 if (!MRI.isReserved(Reg2))
2079 MBB.addLiveIn(Reg2);
2080 MIB.addReg(Reg2, getPrologueDeath(MF, Reg2));
2081 MIB.addMemOperand(MF.getMachineMemOperand(
2082 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
2083 MachineMemOperand::MOStore, Size, Alignment));
2084 }
2085 MIB.addReg(Reg1, getPrologueDeath(MF, Reg1))
2086 .addReg(AArch64::SP)
2087 .addImm(RPI.Offset) // [sp, #offset*vscale],
2088 // where factor*vscale is implicit
2089 .setMIFlag(MachineInstr::FrameSetup);
2090 MIB.addMemOperand(MF.getMachineMemOperand(
2091 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
2092 MachineMemOperand::MOStore, Size, Alignment));
2093 if (NeedsWinCFI)
2094 insertSEH(MIB, TII, MachineInstr::FrameSetup);
2095 }
2096 // Update the StackIDs of the SVE stack slots.
2097 MachineFrameInfo &MFI = MF.getFrameInfo();
2098 if (RPI.Type == RegPairInfo::ZPR) {
2099 MFI.setStackID(FrameIdxReg1, TargetStackID::ScalableVector);
2100 if (RPI.isPaired())
2101 MFI.setStackID(FrameIdxReg2, TargetStackID::ScalableVector);
2102 } else if (RPI.Type == RegPairInfo::PPR) {
2103 MFI.setStackID(FrameIdxReg1, TargetStackID::ScalablePredicateVector);
2104 if (RPI.isPaired())
2105 MFI.setStackID(FrameIdxReg2, TargetStackID::ScalablePredicateVector);
2106 }
2107 }
2108 return true;
2109}
2110
2111bool AArch64FrameLowering::restoreCalleeSavedRegisters(
2112    MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI,
2113    MutableArrayRef<CalleeSavedInfo> CSI, const TargetRegisterInfo *TRI) const {
2114 MachineFunction &MF = *MBB.getParent();
2115 SmallVector<RegPairInfo, 8> RegPairs;
2116 DebugLoc DL;
2118 bool NeedsWinCFI = needsWinCFI(MF);
2119
2120 if (MBBI != MBB.end())
2121 DL = MBBI->getDebugLoc();
2122
2123 computeCalleeSaveRegisterPairs(*this, MF, CSI, TRI, RegPairs, hasFP(MF));
2124 if (homogeneousPrologEpilog(MF, &MBB)) {
2125 auto MIB = BuildMI(MBB, MBBI, DL, TII.get(AArch64::HOM_Epilog))
2126 .setMIFlag(MachineInstr::FrameDestroy);
2127 for (auto &RPI : RegPairs) {
2128 MIB.addReg(RPI.Reg1, RegState::Define);
2129 MIB.addReg(RPI.Reg2, RegState::Define);
2130 }
2131 return true;
2132 }
2133
2134 // For performance reasons restore SVE registers in increasing order.
2135 auto IsPPR = [](const RegPairInfo &c) { return c.Type == RegPairInfo::PPR; };
2136 auto PPRBegin = llvm::find_if(RegPairs, IsPPR);
2137 auto PPREnd = std::find_if_not(PPRBegin, RegPairs.end(), IsPPR);
2138 std::reverse(PPRBegin, PPREnd);
2139 auto IsZPR = [](const RegPairInfo &c) { return c.Type == RegPairInfo::ZPR; };
2140 auto ZPRBegin = llvm::find_if(RegPairs, IsZPR);
2141 auto ZPREnd = std::find_if_not(ZPRBegin, RegPairs.end(), IsZPR);
2142 std::reverse(ZPRBegin, ZPREnd);
2143
2144 bool PTrueCreated = false;
2145 for (const RegPairInfo &RPI : RegPairs) {
2146 unsigned Reg1 = RPI.Reg1;
2147 unsigned Reg2 = RPI.Reg2;
2148
2149 // Issue sequence of restores for cs regs. The last restore may be converted
2150 // to a post-increment load later by emitEpilogue if the callee-save stack
2151 // area allocation can't be combined with the local stack area allocation.
2152 // For example:
2153 // ldp fp, lr, [sp, #32] // addImm(+4)
2154 // ldp x20, x19, [sp, #16] // addImm(+2)
2155 // ldp x22, x21, [sp, #0] // addImm(+0)
2156 // Note: see comment in spillCalleeSavedRegisters()
2157 unsigned LdrOpc;
2158 unsigned Size = TRI->getSpillSize(*RPI.RC);
2159 Align Alignment = TRI->getSpillAlign(*RPI.RC);
2160 switch (RPI.Type) {
2161 case RegPairInfo::GPR:
2162 LdrOpc = RPI.isPaired() ? AArch64::LDPXi : AArch64::LDRXui;
2163 break;
2164 case RegPairInfo::FPR64:
2165 LdrOpc = RPI.isPaired() ? AArch64::LDPDi : AArch64::LDRDui;
2166 break;
2167 case RegPairInfo::FPR128:
2168 LdrOpc = RPI.isPaired() ? AArch64::LDPQi : AArch64::LDRQui;
2169 break;
2170 case RegPairInfo::ZPR:
2171 LdrOpc = RPI.isPaired() ? AArch64::LD1B_2Z_IMM : AArch64::LDR_ZXI;
2172 break;
2173 case RegPairInfo::PPR:
2174 LdrOpc = AArch64::LDR_PXI;
2175 break;
2176 case RegPairInfo::VG:
2177 continue;
2178 }
2179 LLVM_DEBUG(dbgs() << "CSR restore: (" << printReg(Reg1, TRI);
2180 if (RPI.isPaired()) dbgs() << ", " << printReg(Reg2, TRI);
2181 dbgs() << ") -> fi#(" << RPI.FrameIdx;
2182 if (RPI.isPaired()) dbgs() << ", " << RPI.FrameIdx + 1;
2183 dbgs() << ")\n");
2184
2185 // Windows unwind codes require consecutive registers if registers are
2186 // paired. Make the switch here, so that the code below will save (x,x+1)
2187 // and not (x+1,x).
2188 unsigned FrameIdxReg1 = RPI.FrameIdx;
2189 unsigned FrameIdxReg2 = RPI.FrameIdx + 1;
2190 if (NeedsWinCFI && RPI.isPaired()) {
2191 std::swap(Reg1, Reg2);
2192 std::swap(FrameIdxReg1, FrameIdxReg2);
2193 }
2194
2196 if (RPI.isPaired() && RPI.isScalable()) {
2197 [[maybe_unused]] const AArch64Subtarget &Subtarget =
2198 MF.getSubtarget<AArch64Subtarget>();
2199 unsigned PnReg = AFI->getPredicateRegForFillSpill();
2200 assert((PnReg != 0 && enableMultiVectorSpillFill(Subtarget, MF)) &&
2201 "Expects SVE2.1 or SME2 target and a predicate register");
2202#ifdef EXPENSIVE_CHECKS
2203 assert(!(PPRBegin < ZPRBegin) &&
2204 "Expected callee save predicate to be handled first");
2205#endif
2206 if (!PTrueCreated) {
2207 PTrueCreated = true;
2208 BuildMI(MBB, MBBI, DL, TII.get(AArch64::PTRUE_C_B), PnReg)
2209 .setMIFlags(MachineInstr::FrameDestroy);
2210 }
2211 MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII.get(LdrOpc));
2212 MIB.addReg(/*PairRegs*/ AArch64::Z0_Z1 + (RPI.Reg1 - AArch64::Z0),
2213 getDefRegState(true));
2214 MIB.addMemOperand(MF.getMachineMemOperand(
2215 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
2216 MachineMemOperand::MOLoad, Size, Alignment));
2217 MIB.addReg(PnReg);
2218 MIB.addReg(AArch64::SP)
2219 .addImm(RPI.Offset / 2) // [sp, #imm*2*vscale]
2220 // where 2*vscale is implicit
2221 .setMIFlag(MachineInstr::FrameDestroy);
2222 MIB.addMemOperand(MF.getMachineMemOperand(
2223 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
2224 MachineMemOperand::MOLoad, Size, Alignment));
2225 if (NeedsWinCFI)
2226 insertSEH(MIB, TII, MachineInstr::FrameDestroy);
2227 } else {
2228 MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII.get(LdrOpc));
2229 if (RPI.isPaired()) {
2230 MIB.addReg(Reg2, getDefRegState(true));
2231 MIB.addMemOperand(MF.getMachineMemOperand(
2232 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
2233 MachineMemOperand::MOLoad, Size, Alignment));
2234 }
2235 MIB.addReg(Reg1, getDefRegState(true));
2236 MIB.addReg(AArch64::SP)
2237 .addImm(RPI.Offset) // [sp, #offset*vscale]
2238 // where factor*vscale is implicit
2239 .setMIFlag(MachineInstr::FrameDestroy);
2240 MIB.addMemOperand(MF.getMachineMemOperand(
2241 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
2242 MachineMemOperand::MOLoad, Size, Alignment));
2243 if (NeedsWinCFI)
2244 insertSEH(MIB, TII, MachineInstr::FrameDestroy);
2245 }
2246 }
2247 return true;
2248}
2249
2250// Return the FrameID for a MMO.
2251static std::optional<int> getMMOFrameID(MachineMemOperand *MMO,
2252 const MachineFrameInfo &MFI) {
2253 auto *PSV =
2254 dyn_cast_or_null<FixedStackPseudoSourceValue>(MMO->getPseudoValue());
2255 if (PSV)
2256 return std::optional<int>(PSV->getFrameIndex());
2257
2258 if (MMO->getValue()) {
2259 if (auto *Al = dyn_cast<AllocaInst>(getUnderlyingObject(MMO->getValue()))) {
2260 for (int FI = MFI.getObjectIndexBegin(); FI < MFI.getObjectIndexEnd();
2261 FI++)
2262 if (MFI.getObjectAllocation(FI) == Al)
2263 return FI;
2264 }
2265 }
2266
2267 return std::nullopt;
2268}
2269
2270// Return the FrameID for a Load/Store instruction by looking at the first MMO.
2271static std::optional<int> getLdStFrameID(const MachineInstr &MI,
2272 const MachineFrameInfo &MFI) {
2273 if (!MI.mayLoadOrStore() || MI.getNumMemOperands() < 1)
2274 return std::nullopt;
2275
2276 return getMMOFrameID(*MI.memoperands_begin(), MFI);
2277}
2278
2279// Returns true if the LDST MachineInstr \p MI is a PPR access.
2280static bool isPPRAccess(const MachineInstr &MI) {
2281 return AArch64::PPRRegClass.contains(MI.getOperand(0).getReg());
2282}
2283
2284// Check if a Hazard slot is needed for the current function, and if so create
2285// one for it. The index is stored in AArch64FunctionInfo->StackHazardSlotIndex,
2286// which can be used to determine if any hazard padding is needed.
2287void AArch64FrameLowering::determineStackHazardSlot(
2288 MachineFunction &MF, BitVector &SavedRegs) const {
2289 unsigned StackHazardSize = getStackHazardSize(MF);
2290 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
2291 if (StackHazardSize == 0 || StackHazardSize % 16 != 0 ||
2292 AFI->hasStackHazardSlotIndex())
2293 return;
2294
2295 // Stack hazards are only needed in streaming functions.
2296 SMEAttrs Attrs = AFI->getSMEFnAttrs();
2297 if (!StackHazardInNonStreaming && Attrs.hasNonStreamingInterfaceAndBody())
2298 return;
2299
2300 MachineFrameInfo &MFI = MF.getFrameInfo();
2301
2302 // Add a hazard slot if there are any CSR FPR registers, or there are any
2303 // FP-only stack objects.
2304 bool HasFPRCSRs = any_of(SavedRegs.set_bits(), [](unsigned Reg) {
2305 return AArch64::FPR64RegClass.contains(Reg) ||
2306 AArch64::FPR128RegClass.contains(Reg) ||
2307 AArch64::ZPRRegClass.contains(Reg);
2308 });
2309 bool HasPPRCSRs = any_of(SavedRegs.set_bits(), [](unsigned Reg) {
2310 return AArch64::PPRRegClass.contains(Reg);
2311 });
2312 bool HasFPRStackObjects = false;
2313 bool HasPPRStackObjects = false;
2314 if (!HasFPRCSRs || SplitSVEObjects) {
2315 enum SlotType : uint8_t {
2316 Unknown = 0,
2317 ZPRorFPR = 1 << 0,
2318 PPR = 1 << 1,
2319 GPR = 1 << 2,
2321 };
2322
2323 // Find stack slots solely used for one kind of register (ZPR, PPR, etc.),
2324 // based on the kinds of accesses used in the function.
2325 SmallVector<SlotType> SlotTypes(MFI.getObjectIndexEnd(), SlotType::Unknown);
2326 for (auto &MBB : MF) {
2327 for (auto &MI : MBB) {
2328 std::optional<int> FI = getLdStFrameID(MI, MFI);
2329 if (!FI || FI < 0 || FI > int(SlotTypes.size()))
2330 continue;
2331 if (MFI.hasScalableStackID(*FI)) {
2332 SlotTypes[*FI] |=
2333 isPPRAccess(MI) ? SlotType::PPR : SlotType::ZPRorFPR;
2334 } else {
2335 SlotTypes[*FI] |= AArch64InstrInfo::isFpOrNEON(MI)
2336 ? SlotType::ZPRorFPR
2337 : SlotType::GPR;
2338 }
2339 }
2340 }
2341
2342 for (int FI = 0; FI < int(SlotTypes.size()); ++FI) {
2343 HasFPRStackObjects |= SlotTypes[FI] == SlotType::ZPRorFPR;
2344 // For SplitSVEObjects remember that this stack slot is a predicate, this
2345 // will be needed later when determining the frame layout.
2346 if (SlotTypes[FI] == SlotType::PPR) {
2347 MFI.setStackID(FI, TargetStackID::ScalablePredicateVector);
2348 HasPPRStackObjects = true;
2349 }
2350 }
2351 }
2352
2353 if (HasFPRCSRs || HasFPRStackObjects) {
2354 int ID = MFI.CreateStackObject(StackHazardSize, Align(16), false);
2355 LLVM_DEBUG(dbgs() << "Created Hazard slot at " << ID << " size "
2356 << StackHazardSize << "\n");
2357 AFI->setStackHazardSlotIndex(ID);
2358 }
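// For example, with -aarch64-stack-hazard-size=1024 and d8 among the callee
// saves, a 1024-byte hazard slot is created here; the padding later ends up
// separating the GPR saves/objects from the FPR/ZPR ones (see the GPR->FPR
// transition handling in computeCalleeSaveRegisterPairs above).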
2359
2360 // Determine if we should use SplitSVEObjects. This should only be used if
2361 // there's a possibility of a stack hazard between PPRs and ZPRs or FPRs.
2362 if (SplitSVEObjects) {
2363 if (!HasPPRCSRs && !HasPPRStackObjects) {
2364 LLVM_DEBUG(
2365 dbgs() << "Not using SplitSVEObjects as no PPRs are on the stack\n");
2366 return;
2367 }
2368
2369 if (!HasFPRCSRs && !HasFPRStackObjects) {
2370 LLVM_DEBUG(
2371 dbgs()
2372 << "Not using SplitSVEObjects as no FPRs or ZPRs are on the stack\n");
2373 return;
2374 }
2375
2376 const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();
2377 if (MFI.hasVarSizedObjects() || TRI->hasStackRealignment(MF)) {
2378 LLVM_DEBUG(dbgs() << "SplitSVEObjects is not supported with variable "
2379 "sized objects or realignment\n");
2380 return;
2381 }
2382
2383 // If another calling convention is explicitly set, FPRs can't be promoted to
2384 // ZPR callee-saves.
2387 MF.getFunction().getCallingConv())) {
2388 LLVM_DEBUG(
2389 dbgs() << "Calling convention is not supported with SplitSVEObjects");
2390 return;
2391 }
2392
2393 [[maybe_unused]] const AArch64Subtarget &Subtarget =
2394 MF.getSubtarget<AArch64Subtarget>();
2395 assert(Subtarget.isSVEorStreamingSVEAvailable() &&
2396 "Expected SVE to be available for PPRs");
2397
2398 // With SplitSVEObjects the CS hazard padding is placed between the
2399 // PPRs and ZPRs. If there are any FPR CSRs there would be a hazard between
2400 // them and the CS GPRs. Avoid this by promoting all FPR CSRs to ZPRs.
2401 BitVector FPRZRegs(SavedRegs.size());
2402 for (size_t Reg = 0, E = SavedRegs.size(); HasFPRCSRs && Reg < E; ++Reg) {
2403 BitVector::reference RegBit = SavedRegs[Reg];
2404 if (!RegBit)
2405 continue;
2406 unsigned SubRegIdx = 0;
2407 if (AArch64::FPR64RegClass.contains(Reg))
2408 SubRegIdx = AArch64::dsub;
2409 else if (AArch64::FPR128RegClass.contains(Reg))
2410 SubRegIdx = AArch64::zsub;
2411 else
2412 continue;
2413 // Clear the bit for the FPR save.
2414 RegBit = false;
2415 // Mark that we should save the corresponding ZPR.
2416 Register ZReg =
2417 TRI->getMatchingSuperReg(Reg, SubRegIdx, &AArch64::ZPRRegClass);
2418 FPRZRegs.set(ZReg);
2419 }
2420 SavedRegs |= FPRZRegs;
2421
2422 AFI->setSplitSVEObjects(true);
2423 LLVM_DEBUG(dbgs() << "SplitSVEObjects enabled!\n");
2424 }
2425}
2426
2427void AArch64FrameLowering::determineCalleeSaves(MachineFunction &MF,
2428 BitVector &SavedRegs,
2429 RegScavenger *RS) const {
2430 // All calls are tail calls in GHC calling conv, and functions have no
2431 // prologue/epilogue.
2432 if (MF.getFunction().getCallingConv() == CallingConv::GHC)
2433 return;
2434
2435 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
2436
2437 TargetFrameLowering::determineCalleeSaves(MF, SavedRegs, RS);
2438 const AArch64RegisterInfo *RegInfo = static_cast<const AArch64RegisterInfo *>(
2439 MF.getSubtarget().getRegisterInfo());
2440 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
2441 unsigned UnspilledCSGPR = AArch64::NoRegister;
2442 unsigned UnspilledCSGPRPaired = AArch64::NoRegister;
2443
2444 MachineFrameInfo &MFI = MF.getFrameInfo();
2445 const MCPhysReg *CSRegs = MF.getRegInfo().getCalleeSavedRegs();
2446
2447 unsigned BasePointerReg = RegInfo->hasBasePointer(MF)
2448 ? RegInfo->getBaseRegister()
2449 : (unsigned)AArch64::NoRegister;
2450
2451 unsigned ExtraCSSpill = 0;
2452 bool HasUnpairedGPR64 = false;
2453 bool HasPairZReg = false;
2454 BitVector UserReservedRegs = RegInfo->getUserReservedRegs(MF);
2455 BitVector ReservedRegs = RegInfo->getReservedRegs(MF);
2456
2457 // Figure out which callee-saved registers to save/restore.
2458 for (unsigned i = 0; CSRegs[i]; ++i) {
2459 const unsigned Reg = CSRegs[i];
2460
2461 // Add the base pointer register to SavedRegs if it is callee-save.
2462 if (Reg == BasePointerReg)
2463 SavedRegs.set(Reg);
2464
2465 // Don't save manually reserved registers set through +reserve-x#i,
2466 // even for callee-saved registers, as per GCC's behavior.
2467 if (UserReservedRegs[Reg]) {
2468 SavedRegs.reset(Reg);
2469 continue;
2470 }
2471
2472 bool RegUsed = SavedRegs.test(Reg);
2473 unsigned PairedReg = AArch64::NoRegister;
2474 const bool RegIsGPR64 = AArch64::GPR64RegClass.contains(Reg);
2475 if (RegIsGPR64 || AArch64::FPR64RegClass.contains(Reg) ||
2476 AArch64::FPR128RegClass.contains(Reg)) {
2477 // Compensate for odd numbers of GP CSRs.
2478 // For now, all the known cases of an odd number of CSRs are GPRs.
2479 if (HasUnpairedGPR64)
2480 PairedReg = CSRegs[i % 2 == 0 ? i - 1 : i + 1];
2481 else
2482 PairedReg = CSRegs[i ^ 1];
2483 }
2484
2485 // If the function requires saving all the GP registers (SavedRegs),
2486 // and there is an odd number of GP CSRs at the same time (CSRegs),
2487 // PairedReg could be in a different register class from Reg, which would
2488 // lead to an FPR (usually D8) accidentally being marked as saved.
2489 if (RegIsGPR64 && !AArch64::GPR64RegClass.contains(PairedReg)) {
2490 PairedReg = AArch64::NoRegister;
2491 HasUnpairedGPR64 = true;
2492 }
2493 assert(PairedReg == AArch64::NoRegister ||
2494 AArch64::GPR64RegClass.contains(Reg, PairedReg) ||
2495 AArch64::FPR64RegClass.contains(Reg, PairedReg) ||
2496 AArch64::FPR128RegClass.contains(Reg, PairedReg));
2497
2498 if (!RegUsed) {
2499 if (AArch64::GPR64RegClass.contains(Reg) && !ReservedRegs[Reg]) {
2500 UnspilledCSGPR = Reg;
2501 UnspilledCSGPRPaired = PairedReg;
2502 }
2503 continue;
2504 }
2505
2506 // MachO's compact unwind format relies on all registers being stored in
2507 // pairs.
2508 // FIXME: the usual format is actually better if unwinding isn't needed.
2509 if (producePairRegisters(MF) && PairedReg != AArch64::NoRegister &&
2510 !SavedRegs.test(PairedReg)) {
2511 SavedRegs.set(PairedReg);
2512 if (AArch64::GPR64RegClass.contains(PairedReg) &&
2513 !ReservedRegs[PairedReg])
2514 ExtraCSSpill = PairedReg;
2515 }
2516 // Check if there is a pair of ZRegs, so it can select PReg for spill/fill
2517 HasPairZReg |= (AArch64::ZPRRegClass.contains(Reg, CSRegs[i ^ 1]) &&
2518 SavedRegs.test(CSRegs[i ^ 1]));
2519 }
2520
2521 if (HasPairZReg && enableMultiVectorSpillFill(Subtarget, MF)) {
2523 // Find a suitable predicate register for the multi-vector spill/fill
2524 // instructions.
2525 unsigned PnReg = findFreePredicateReg(SavedRegs);
2526 if (PnReg != AArch64::NoRegister)
2527 AFI->setPredicateRegForFillSpill(PnReg);
2528 // If no free callee-save has been found assign one.
2529 if (!AFI->getPredicateRegForFillSpill() &&
2530 MF.getFunction().getCallingConv() ==
2531 CallingConv::AArch64_SVE_VectorCall) {
2532 SavedRegs.set(AArch64::P8);
2533 AFI->setPredicateRegForFillSpill(AArch64::PN8);
2534 }
2535
2536 assert(!ReservedRegs[AFI->getPredicateRegForFillSpill()] &&
2537 "Predicate cannot be a reserved register");
2538 }
2539
2540 if (MF.getFunction().getCallingConv() == CallingConv::Win64 &&
2541 !Subtarget.isTargetWindows()) {
2542 // For Windows calling convention on a non-Windows OS, where X18 is treated
2543 // as reserved, back up X18 when entering non-windows code (marked with the
2544 // Windows calling convention) and restore when returning regardless of
2545 // whether the individual function uses it - it might call other functions
2546 // that clobber it.
2547 SavedRegs.set(AArch64::X18);
2548 }
2549
2550 // Determine if a Hazard slot should be used and where it should go.
2551 // If SplitSVEObjects is used, the hazard padding is placed between the PPRs
2552 // and ZPRs. Otherwise, it goes in the callee save area.
2553 determineStackHazardSlot(MF, SavedRegs);
2554
2555 // Calculate the callee-saved stack size.
2556 unsigned CSStackSize = 0;
2557 unsigned ZPRCSStackSize = 0;
2558 unsigned PPRCSStackSize = 0;
2559 const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();
2560 for (unsigned Reg : SavedRegs.set_bits()) {
2561 auto *RC = TRI->getMinimalPhysRegClass(Reg);
2562 assert(RC && "expected register class!");
2563 auto SpillSize = TRI->getSpillSize(*RC);
2564 bool IsZPR = AArch64::ZPRRegClass.contains(Reg);
2565 bool IsPPR = !IsZPR && AArch64::PPRRegClass.contains(Reg);
2566 if (IsZPR)
2567 ZPRCSStackSize += SpillSize;
2568 else if (IsPPR)
2569 PPRCSStackSize += SpillSize;
2570 else
2571 CSStackSize += SpillSize;
2572 }
2573
2574 // Save number of saved regs, so we can easily update CSStackSize later to
2575 // account for any additional 64-bit GPR saves. Note: After this point
2576 // only 64-bit GPRs can be added to SavedRegs.
2577 unsigned NumSavedRegs = SavedRegs.count();
2578
2579 // If we have hazard padding in the CS area add that to the size.
2581 CSStackSize += getStackHazardSize(MF);
2582
2583 // Increase the callee-saved stack size if the function has streaming mode
2584 // changes, as we will need to spill the value of the VG register.
2585 if (requiresSaveVG(MF))
2586 CSStackSize += 8;
2587
2588 // If we must call __arm_get_current_vg in the prologue preserve the LR.
2589 if (requiresSaveVG(MF) && !Subtarget.hasSVE())
2590 SavedRegs.set(AArch64::LR);
2591
2592 // The frame record needs to be created by saving the appropriate registers
2593 uint64_t EstimatedStackSize = MFI.estimateStackSize(MF);
2594 if (hasFP(MF) ||
2595 windowsRequiresStackProbe(MF, EstimatedStackSize + CSStackSize + 16)) {
2596 SavedRegs.set(AArch64::FP);
2597 SavedRegs.set(AArch64::LR);
2598 }
2599
2600 LLVM_DEBUG({
2601 dbgs() << "*** determineCalleeSaves\nSaved CSRs:";
2602 for (unsigned Reg : SavedRegs.set_bits())
2603 dbgs() << ' ' << printReg(Reg, RegInfo);
2604 dbgs() << "\n";
2605 });
2606
2607 // If any callee-saved registers are used, the frame cannot be eliminated.
2608 auto [ZPRLocalStackSize, PPRLocalStackSize] =
2610 uint64_t SVELocals = ZPRLocalStackSize + PPRLocalStackSize;
2611 uint64_t SVEStackSize =
2612 alignTo(ZPRCSStackSize + PPRCSStackSize + SVELocals, 16);
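// For example, two ZPR callee saves (2 x 16 scalable bytes) plus one PPR
// callee save (2 scalable bytes) and no SVE locals give 34 scalable bytes,
// which rounds up to an SVEStackSize of 48 here.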
2613 bool CanEliminateFrame = (SavedRegs.count() == 0) && !SVEStackSize;
2614
2615 // The CSR spill slots have not been allocated yet, so estimateStackSize
2616 // won't include them.
2617 unsigned EstimatedStackSizeLimit = estimateRSStackSizeLimit(MF);
2618
2619 // We may address some of the stack above the canonical frame address, either
2620 // for our own arguments or during a call. Include that in calculating whether
2621 // we have complicated addressing concerns.
2622 int64_t CalleeStackUsed = 0;
2623 for (int I = MFI.getObjectIndexBegin(); I != 0; ++I) {
2624 int64_t FixedOff = MFI.getObjectOffset(I);
2625 if (FixedOff > CalleeStackUsed)
2626 CalleeStackUsed = FixedOff;
2627 }
2628
2629 // Conservatively always assume BigStack when there are SVE spills.
2630 bool BigStack = SVEStackSize || (EstimatedStackSize + CSStackSize +
2631 CalleeStackUsed) > EstimatedStackSizeLimit;
2632 if (BigStack || !CanEliminateFrame || RegInfo->cannotEliminateFrame(MF))
2633 AFI->setHasStackFrame(true);
2634
2635 // Estimate if we might need to scavenge a register at some point in order
2636 // to materialize a stack offset. If so, either spill one additional
2637 // callee-saved register or reserve a special spill slot to facilitate
2638 // register scavenging. If we already spilled an extra callee-saved register
2639 // above to keep the number of spills even, we don't need to do anything else
2640 // here.
2641 if (BigStack) {
2642 if (!ExtraCSSpill && UnspilledCSGPR != AArch64::NoRegister) {
2643 LLVM_DEBUG(dbgs() << "Spilling " << printReg(UnspilledCSGPR, RegInfo)
2644 << " to get a scratch register.\n");
2645 SavedRegs.set(UnspilledCSGPR);
2646 ExtraCSSpill = UnspilledCSGPR;
2647
2648 // MachO's compact unwind format relies on all registers being stored in
2649 // pairs, so if we need to spill one extra for BigStack, then we need to
2650 // store the pair.
2651 if (producePairRegisters(MF)) {
2652 if (UnspilledCSGPRPaired == AArch64::NoRegister) {
2653 // Failed to make a pair for compact unwind format, revert spilling.
2654 if (produceCompactUnwindFrame(*this, MF)) {
2655 SavedRegs.reset(UnspilledCSGPR);
2656 ExtraCSSpill = AArch64::NoRegister;
2657 }
2658 } else
2659 SavedRegs.set(UnspilledCSGPRPaired);
2660 }
2661 }
2662
2663 // If we didn't find an extra callee-saved register to spill, create
2664 // an emergency spill slot.
2665 if (!ExtraCSSpill || MF.getRegInfo().isPhysRegUsed(ExtraCSSpill)) {
2667 const TargetRegisterClass &RC = AArch64::GPR64RegClass;
2668 unsigned Size = TRI->getSpillSize(RC);
2669 Align Alignment = TRI->getSpillAlign(RC);
2670 int FI = MFI.CreateSpillStackObject(Size, Alignment);
2671 RS->addScavengingFrameIndex(FI);
2672 LLVM_DEBUG(dbgs() << "No available CS registers, allocated fi#" << FI
2673 << " as the emergency spill slot.\n");
2674 }
2675 }
2676
2677 // Add the size of additional 64-bit GPR saves.
2678 CSStackSize += 8 * (SavedRegs.count() - NumSavedRegs);
2679
2680 // A Swift asynchronous context extends the frame record with a pointer
2681 // directly before FP.
2682 if (hasFP(MF) && AFI->hasSwiftAsyncContext())
2683 CSStackSize += 8;
2684
2685 uint64_t AlignedCSStackSize = alignTo(CSStackSize, 16);
2686 LLVM_DEBUG(dbgs() << "Estimated stack frame size: "
2687 << EstimatedStackSize + AlignedCSStackSize << " bytes.\n");
2688
2689 assert((!MFI.isCalleeSavedInfoValid() ||
2690 AFI->getCalleeSavedStackSize() == AlignedCSStackSize) &&
2691 "Should not invalidate callee saved info");
2692
2693 // Round up to register pair alignment to avoid additional SP adjustment
2694 // instructions.
2695 AFI->setCalleeSavedStackSize(AlignedCSStackSize);
2696 AFI->setCalleeSaveStackHasFreeSpace(AlignedCSStackSize != CSStackSize);
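// For example, if the only callee saves are three 8-byte GPRs, CSStackSize is
// 24 and AlignedCSStackSize becomes 32; the spare 8 bytes are what
// hasCalleeSaveStackFreeSpace() reports and may later be reused for stack
// slot scavenging (see enableStackSlotScavenging below).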
2697 AFI->setSVECalleeSavedStackSize(ZPRCSStackSize, alignTo(PPRCSStackSize, 16));
2698}
2699
2700bool AArch64FrameLowering::assignCalleeSavedSpillSlots(
2701 MachineFunction &MF, const TargetRegisterInfo *RegInfo,
2702 std::vector<CalleeSavedInfo> &CSI, unsigned &MinCSFrameIndex,
2703 unsigned &MaxCSFrameIndex) const {
2704 bool NeedsWinCFI = needsWinCFI(MF);
2705 unsigned StackHazardSize = getStackHazardSize(MF);
2706 // To match the canonical windows frame layout, reverse the list of
2707 // callee saved registers to get them laid out by PrologEpilogInserter
2708 // in the right order. (PrologEpilogInserter allocates stack objects top
2709 // down. Windows canonical prologs store higher numbered registers at
2710 // the top, thus have the CSI array start from the highest registers.)
2711 if (NeedsWinCFI)
2712 std::reverse(CSI.begin(), CSI.end());
2713
2714 if (CSI.empty())
2715 return true; // Early exit if no callee saved registers are modified!
2716
2717 // Now that we know which registers need to be saved and restored, allocate
2718 // stack slots for them.
2719 MachineFrameInfo &MFI = MF.getFrameInfo();
2720 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
2721
2722 bool UsesWinAAPCS = isTargetWindows(MF);
2723 if (UsesWinAAPCS && hasFP(MF) && AFI->hasSwiftAsyncContext()) {
2724 int FrameIdx = MFI.CreateStackObject(8, Align(16), true);
2725 AFI->setSwiftAsyncContextFrameIdx(FrameIdx);
2726 if ((unsigned)FrameIdx < MinCSFrameIndex)
2727 MinCSFrameIndex = FrameIdx;
2728 if ((unsigned)FrameIdx > MaxCSFrameIndex)
2729 MaxCSFrameIndex = FrameIdx;
2730 }
2731
2732 // Insert VG into the list of CSRs, immediately before LR if saved.
2733 if (requiresSaveVG(MF)) {
2734 CalleeSavedInfo VGInfo(AArch64::VG);
2735 auto It =
2736 find_if(CSI, [](auto &Info) { return Info.getReg() == AArch64::LR; });
2737 if (It != CSI.end())
2738 CSI.insert(It, VGInfo);
2739 else
2740 CSI.push_back(VGInfo);
2741 }
2742
2743 Register LastReg = 0;
2744 int HazardSlotIndex = std::numeric_limits<int>::max();
2745 for (auto &CS : CSI) {
2746 MCRegister Reg = CS.getReg();
2747 const TargetRegisterClass *RC = RegInfo->getMinimalPhysRegClass(Reg);
2748
2749 // Create a hazard slot as we switch between GPR and FPR CSRs.
2750 if (AFI->hasStackHazardSlotIndex() &&
2751 (!LastReg || !AArch64InstrInfo::isFpOrNEON(LastReg)) &&
2752 AArch64InstrInfo::isFpOrNEON(Reg)) {
2753 assert(HazardSlotIndex == std::numeric_limits<int>::max() &&
2754 "Unexpected register order for hazard slot");
2755 HazardSlotIndex = MFI.CreateStackObject(StackHazardSize, Align(8), true);
2756 LLVM_DEBUG(dbgs() << "Created CSR Hazard at slot " << HazardSlotIndex
2757 << "\n");
2758 AFI->setStackHazardCSRSlotIndex(HazardSlotIndex);
2759 if ((unsigned)HazardSlotIndex < MinCSFrameIndex)
2760 MinCSFrameIndex = HazardSlotIndex;
2761 if ((unsigned)HazardSlotIndex > MaxCSFrameIndex)
2762 MaxCSFrameIndex = HazardSlotIndex;
2763 }
2764
2765 unsigned Size = RegInfo->getSpillSize(*RC);
2766 Align Alignment(RegInfo->getSpillAlign(*RC));
2767 int FrameIdx = MFI.CreateStackObject(Size, Alignment, true);
2768 CS.setFrameIdx(FrameIdx);
2769
2770 if ((unsigned)FrameIdx < MinCSFrameIndex)
2771 MinCSFrameIndex = FrameIdx;
2772 if ((unsigned)FrameIdx > MaxCSFrameIndex)
2773 MaxCSFrameIndex = FrameIdx;
2774
2775 // Grab 8 bytes below FP for the extended asynchronous frame info.
2776 if (hasFP(MF) && AFI->hasSwiftAsyncContext() && !UsesWinAAPCS &&
2777 Reg == AArch64::FP) {
2778 FrameIdx = MFI.CreateStackObject(8, Alignment, true);
2779 AFI->setSwiftAsyncContextFrameIdx(FrameIdx);
2780 if ((unsigned)FrameIdx < MinCSFrameIndex)
2781 MinCSFrameIndex = FrameIdx;
2782 if ((unsigned)FrameIdx > MaxCSFrameIndex)
2783 MaxCSFrameIndex = FrameIdx;
2784 }
2785 LastReg = Reg;
2786 }
2787
2788 // Add hazard slot in the case where no FPR CSRs are present.
2789 if (AFI->hasStackHazardSlotIndex() &&
2790 HazardSlotIndex == std::numeric_limits<int>::max()) {
2791 HazardSlotIndex = MFI.CreateStackObject(StackHazardSize, Align(8), true);
2792 LLVM_DEBUG(dbgs() << "Created CSR Hazard at slot " << HazardSlotIndex
2793 << "\n");
2794 AFI->setStackHazardCSRSlotIndex(HazardSlotIndex);
2795 if ((unsigned)HazardSlotIndex < MinCSFrameIndex)
2796 MinCSFrameIndex = HazardSlotIndex;
2797 if ((unsigned)HazardSlotIndex > MaxCSFrameIndex)
2798 MaxCSFrameIndex = HazardSlotIndex;
2799 }
2800
2801 return true;
2802}
2803
2804bool AArch64FrameLowering::enableStackSlotScavenging(
2805 const MachineFunction &MF) const {
2806 const AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
2807 // If the function has streaming-mode changes, don't scavenge a
2808 // spillslot in the callee-save area, as that might require an
2809 // 'addvl' in the streaming-mode-changing call-sequence when the
2810 // function doesn't use a FP.
2811 if (AFI->hasStreamingModeChanges() && !hasFP(MF))
2812 return false;
2813 // Don't allow register salvaging with hazard slots, in case it moves objects
2814 // into the wrong place.
2815 if (AFI->hasStackHazardSlotIndex())
2816 return false;
2817 return AFI->hasCalleeSaveStackFreeSpace();
2818}
2819
2820/// Returns true if there are any SVE callee saves.
2821static bool getSVECalleeSaveSlotRange(const MachineFrameInfo &MFI,
2822 int &Min, int &Max) {
2823 Min = std::numeric_limits<int>::max();
2824 Max = std::numeric_limits<int>::min();
2825
2826 if (!MFI.isCalleeSavedInfoValid())
2827 return false;
2828
2829 const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();
2830 for (auto &CS : CSI) {
2831 if (AArch64::ZPRRegClass.contains(CS.getReg()) ||
2832 AArch64::PPRRegClass.contains(CS.getReg())) {
2833 assert((Max == std::numeric_limits<int>::min() ||
2834 Max + 1 == CS.getFrameIdx()) &&
2835 "SVE CalleeSaves are not consecutive");
2836 Min = std::min(Min, CS.getFrameIdx());
2837 Max = std::max(Max, CS.getFrameIdx());
2838 }
2839 }
2840 return Min != std::numeric_limits<int>::max();
2841}
2842
2844 AssignObjectOffsets AssignOffsets) {
2845 MachineFrameInfo &MFI = MF.getFrameInfo();
2846 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
2847
2848 SVEStackSizes SVEStack{};
2849
2850 // With SplitSVEObjects we maintain separate stack offsets for predicates
2851 // (PPRs) and SVE vectors (ZPRs). When SplitSVEObjects is disabled predicates
2852 // are included in the SVE vector area.
2853 uint64_t &ZPRStackTop = SVEStack.ZPRStackSize;
2854 uint64_t &PPRStackTop =
2855 AFI->hasSplitSVEObjects() ? SVEStack.PPRStackSize : SVEStack.ZPRStackSize;
2856
2857#ifndef NDEBUG
2858 // First process all fixed stack objects.
2859 for (int I = MFI.getObjectIndexBegin(); I != 0; ++I)
2860 assert(!MFI.hasScalableStackID(I) &&
2861 "SVE vectors should never be passed on the stack by value, only by "
2862 "reference.");
2863#endif
2864
2865 auto AllocateObject = [&](int FI) {
2866 uint64_t &StackTop = MFI.getStackID(FI) == TargetStackID::ScalableVector
2867 ? ZPRStackTop
2868 : PPRStackTop;
2869
2870 // FIXME: Given that the length of SVE vectors is not necessarily a power of
2871 // two, we'd need to align every object dynamically at runtime if the
2872 // alignment is larger than 16. This is not yet supported.
2873 Align Alignment = MFI.getObjectAlign(FI);
2874 if (Alignment > Align(16))
2875 report_fatal_error(
2876 "Alignment of scalable vectors > 16 bytes is not yet supported");
2877
2878 StackTop += MFI.getObjectSize(FI);
2879 StackTop = alignTo(StackTop, Alignment);
2880
2881 assert(StackTop < (uint64_t)std::numeric_limits<int64_t>::max() &&
2882 "SVE StackTop far too large?!");
2883
2884 int64_t Offset = -int64_t(StackTop);
2885 if (AssignOffsets == AssignObjectOffsets::Yes)
2886 MFI.setObjectOffset(FI, Offset);
2887
2888 LLVM_DEBUG(dbgs() << "alloc FI(" << FI << ") at SP[" << Offset << "]\n");
2889 };
2890
2891 // Then process all callee saved slots.
2892 int MinCSFrameIndex, MaxCSFrameIndex;
2893 if (getSVECalleeSaveSlotRange(MFI, MinCSFrameIndex, MaxCSFrameIndex)) {
2894 for (int FI = MinCSFrameIndex; FI <= MaxCSFrameIndex; ++FI)
2895 AllocateObject(FI);
2896 }
2897
2898 // Ensure the CS area is 16-byte aligned.
2899 PPRStackTop = alignTo(PPRStackTop, Align(16U));
2900 ZPRStackTop = alignTo(ZPRStackTop, Align(16U));
2901
2902 // Create a buffer of SVE objects to allocate and sort it.
2903 SmallVector<int, 8> ObjectsToAllocate;
2904 // If we have a stack protector, and we've previously decided that we have SVE
2905 // objects on the stack and thus need it to go in the SVE stack area, then it
2906 // needs to go first.
2907 int StackProtectorFI = -1;
2908 if (MFI.hasStackProtectorIndex()) {
2909 StackProtectorFI = MFI.getStackProtectorIndex();
2910 if (MFI.getStackID(StackProtectorFI) == TargetStackID::ScalableVector)
2911 ObjectsToAllocate.push_back(StackProtectorFI);
2912 }
2913
2914 for (int FI = 0, E = MFI.getObjectIndexEnd(); FI != E; ++FI) {
2915 if (FI == StackProtectorFI || MFI.isDeadObjectIndex(FI))
2916 continue;
2917 if (MaxCSFrameIndex >= FI && FI >= MinCSFrameIndex)
2918 continue;
2919
2920 if (MFI.getStackID(FI) != TargetStackID::ScalableVector &&
2921 MFI.getStackID(FI) != TargetStackID::ScalablePredicateVector)
2922 continue;
2923
2924 ObjectsToAllocate.push_back(FI);
2925 }
2926
2927 // Allocate all SVE locals and spills
2928 for (unsigned FI : ObjectsToAllocate)
2929 AllocateObject(FI);
2930
2931 PPRStackTop = alignTo(PPRStackTop, Align(16U));
2932 ZPRStackTop = alignTo(ZPRStackTop, Align(16U));
2933
2934 if (AssignOffsets == AssignObjectOffsets::Yes)
2935 AFI->setStackSizeSVE(SVEStack.ZPRStackSize, SVEStack.PPRStackSize);
2936
2937 return SVEStack;
2938}
2939
2940void AArch64FrameLowering::processFunctionBeforeFrameFinalized(
2941 MachineFunction &MF, RegScavenger *RS) const {
2942 assert(getStackGrowthDirection() == TargetFrameLowering::StackGrowsDown &&
2943 "Upwards growing stack unsupported");
2944
2946
2947 // If this function isn't doing Win64-style C++ EH, we don't need to do
2948 // anything.
2949 if (!MF.hasEHFunclets())
2950 return;
2951
2952 MachineFrameInfo &MFI = MF.getFrameInfo();
2953 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
2954
2955 // Win64 C++ EH needs to allocate space for the catch objects in the fixed
2956 // object area right next to the UnwindHelp object.
2957 WinEHFuncInfo &EHInfo = *MF.getWinEHFuncInfo();
2958 int64_t CurrentOffset =
2960 for (WinEHTryBlockMapEntry &TBME : EHInfo.TryBlockMap) {
2961 for (WinEHHandlerType &H : TBME.HandlerArray) {
2962 int FrameIndex = H.CatchObj.FrameIndex;
2963 if ((FrameIndex != INT_MAX) && MFI.getObjectOffset(FrameIndex) == 0) {
2964 CurrentOffset =
2965 alignTo(CurrentOffset, MFI.getObjectAlign(FrameIndex).value());
2966 CurrentOffset += MFI.getObjectSize(FrameIndex);
2967 MFI.setObjectOffset(FrameIndex, -CurrentOffset);
2968 }
2969 }
2970 }
2971
2972 // Create an UnwindHelp object.
2973 // The UnwindHelp object is allocated at the start of the fixed object area
2974 int64_t UnwindHelpOffset = alignTo(CurrentOffset + 8, Align(16));
2975 assert(UnwindHelpOffset == getFixedObjectSize(MF, AFI, /*IsWin64*/ true,
2976 /*IsFunclet*/ false) &&
2977 "UnwindHelpOffset must be at the start of the fixed object area");
2978 int UnwindHelpFI = MFI.CreateFixedObject(/*Size*/ 8, -UnwindHelpOffset,
2979 /*IsImmutable=*/false);
2980 EHInfo.UnwindHelpFrameIdx = UnwindHelpFI;
2981
2982 MachineBasicBlock &MBB = MF.front();
2983 auto MBBI = MBB.begin();
2984 while (MBBI != MBB.end() && MBBI->getFlag(MachineInstr::FrameSetup))
2985 ++MBBI;
2986
2987 // We need to store -2 into the UnwindHelp object at the start of the
2988 // function.
2989 DebugLoc DL;
2990 RS->enterBasicBlockEnd(MBB);
2991 RS->backward(MBBI);
2992 Register DstReg = RS->FindUnusedReg(&AArch64::GPR64commonRegClass);
2993 assert(DstReg && "There must be a free register after frame setup");
2994 const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();
2995 BuildMI(MBB, MBBI, DL, TII.get(AArch64::MOVi64imm), DstReg).addImm(-2);
2996 BuildMI(MBB, MBBI, DL, TII.get(AArch64::STURXi))
2997 .addReg(DstReg, getKillRegState(true))
2998 .addFrameIndex(UnwindHelpFI)
2999 .addImm(0);
3000}
3001
3002namespace {
3003struct TagStoreInstr {
3004 MachineInstr *MI;
3005 int64_t Offset, Size;
3006 explicit TagStoreInstr(MachineInstr *MI, int64_t Offset, int64_t Size)
3007 : MI(MI), Offset(Offset), Size(Size) {}
3008};
3009
3010class TagStoreEdit {
3011 MachineFunction *MF;
3012 MachineBasicBlock *MBB;
3013 MachineRegisterInfo *MRI;
3014 // Tag store instructions that are being replaced.
3015 SmallVector<TagStoreInstr, 8> TagStores;
3016 // Combined memref arguments of the above instructions.
3017 SmallVector<MachineMemOperand *, 8> CombinedMemRefs;
3018
3019 // Replace allocation tags in [FrameReg + FrameRegOffset, FrameReg +
3020 // FrameRegOffset + Size) with the address tag of SP.
3021 Register FrameReg;
3022 StackOffset FrameRegOffset;
3023 int64_t Size;
3024 // If not std::nullopt, move FrameReg to (FrameReg + FrameRegUpdate) at the
3025 // end.
3026 std::optional<int64_t> FrameRegUpdate;
3027 // MIFlags for any FrameReg updating instructions.
3028 unsigned FrameRegUpdateFlags;
3029
3030 // Use zeroing instruction variants.
3031 bool ZeroData;
3032 DebugLoc DL;
3033
3034 void emitUnrolled(MachineBasicBlock::iterator InsertI);
3035 void emitLoop(MachineBasicBlock::iterator InsertI);
3036
3037public:
3038 TagStoreEdit(MachineBasicBlock *MBB, bool ZeroData)
3039 : MBB(MBB), ZeroData(ZeroData) {
3040 MF = MBB->getParent();
3041 MRI = &MF->getRegInfo();
3042 }
3043 // Add an instruction to be replaced. Instructions must be added in
3044 // ascending order of Offset and must be adjacent.
3045 void addInstruction(TagStoreInstr I) {
3046 assert((TagStores.empty() ||
3047 TagStores.back().Offset + TagStores.back().Size == I.Offset) &&
3048 "Non-adjacent tag store instructions.");
3049 TagStores.push_back(I);
3050 }
3051 void clear() { TagStores.clear(); }
3052 // Emit equivalent code at the given location, and erase the current set of
3053 // instructions. May skip if the replacement is not profitable. May invalidate
3054 // the input iterator and replace it with a valid one.
3055 void emitCode(MachineBasicBlock::iterator &InsertI,
3056 const AArch64FrameLowering *TFI, bool TryMergeSPUpdate);
3057};
3058
3059void TagStoreEdit::emitUnrolled(MachineBasicBlock::iterator InsertI) {
3060 const AArch64InstrInfo *TII =
3061 MF->getSubtarget<AArch64Subtarget>().getInstrInfo();
3062
3063 const int64_t kMinOffset = -256 * 16;
3064 const int64_t kMaxOffset = 255 * 16;
3065
3066 Register BaseReg = FrameReg;
3067 int64_t BaseRegOffsetBytes = FrameRegOffset.getFixed();
3068 if (BaseRegOffsetBytes < kMinOffset ||
3069 BaseRegOffsetBytes + (Size - Size % 32) > kMaxOffset ||
3070 // BaseReg can be FP, which is not necessarily aligned to 16-bytes. In
3071 // that case, BaseRegOffsetBytes will not be aligned to 16 bytes, which
3072 // is required for the offset of ST2G.
3073 BaseRegOffsetBytes % 16 != 0) {
3074 Register ScratchReg = MRI->createVirtualRegister(&AArch64::GPR64RegClass);
3075 emitFrameOffset(*MBB, InsertI, DL, ScratchReg, BaseReg,
3076 StackOffset::getFixed(BaseRegOffsetBytes), TII);
3077 BaseReg = ScratchReg;
3078 BaseRegOffsetBytes = 0;
3079 }
3080
3081 MachineInstr *LastI = nullptr;
3082 while (Size) {
3083 int64_t InstrSize = (Size > 16) ? 32 : 16;
3084 unsigned Opcode =
3085 InstrSize == 16
3086 ? (ZeroData ? AArch64::STZGi : AArch64::STGi)
3087 : (ZeroData ? AArch64::STZ2Gi : AArch64::ST2Gi);
3088 assert(BaseRegOffsetBytes % 16 == 0);
3089 MachineInstr *I = BuildMI(*MBB, InsertI, DL, TII->get(Opcode))
3090 .addReg(AArch64::SP)
3091 .addReg(BaseReg)
3092 .addImm(BaseRegOffsetBytes / 16)
3093 .setMemRefs(CombinedMemRefs);
3094 // A store to [BaseReg, #0] should go last for an opportunity to fold the
3095 // final SP adjustment in the epilogue.
3096 if (BaseRegOffsetBytes == 0)
3097 LastI = I;
3098 BaseRegOffsetBytes += InstrSize;
3099 Size -= InstrSize;
3100 }
3101
3102 if (LastI)
3103 MBB->splice(InsertI, MBB, LastI);
3104}
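// Illustrative expansion: a 48-byte region starting at offset 0 becomes one
// ST2G covering bytes [0, 32) and one STG covering [32, 48); the store at
// offset 0 is then spliced to the end so the final SP adjustment in the
// epilogue has a chance to fold into it.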
3105
3106void TagStoreEdit::emitLoop(MachineBasicBlock::iterator InsertI) {
3107 const AArch64InstrInfo *TII =
3108 MF->getSubtarget<AArch64Subtarget>().getInstrInfo();
3109
3110 Register BaseReg = FrameRegUpdate
3111 ? FrameReg
3112 : MRI->createVirtualRegister(&AArch64::GPR64RegClass);
3113 Register SizeReg = MRI->createVirtualRegister(&AArch64::GPR64RegClass);
3114
3115 emitFrameOffset(*MBB, InsertI, DL, BaseReg, FrameReg, FrameRegOffset, TII);
3116
3117 int64_t LoopSize = Size;
3118 // If the loop size is not a multiple of 32, split off one 16-byte store at
3119 // the end to fold BaseReg update into.
3120 if (FrameRegUpdate && *FrameRegUpdate)
3121 LoopSize -= LoopSize % 32;
3122 MachineInstr *LoopI = BuildMI(*MBB, InsertI, DL,
3123 TII->get(ZeroData ? AArch64::STZGloop_wback
3124 : AArch64::STGloop_wback))
3125 .addDef(SizeReg)
3126 .addDef(BaseReg)
3127 .addImm(LoopSize)
3128 .addReg(BaseReg)
3129 .setMemRefs(CombinedMemRefs);
3130 if (FrameRegUpdate)
3131 LoopI->setFlags(FrameRegUpdateFlags);
3132
3133 int64_t ExtraBaseRegUpdate =
3134 FrameRegUpdate ? (*FrameRegUpdate - FrameRegOffset.getFixed() - Size) : 0;
3135 LLVM_DEBUG(dbgs() << "TagStoreEdit::emitLoop: LoopSize=" << LoopSize
3136 << ", Size=" << Size
3137 << ", ExtraBaseRegUpdate=" << ExtraBaseRegUpdate
3138 << ", FrameRegUpdate=" << FrameRegUpdate
3139 << ", FrameRegOffset.getFixed()="
3140 << FrameRegOffset.getFixed() << "\n");
3141 if (LoopSize < Size) {
3142 assert(FrameRegUpdate);
3143 assert(Size - LoopSize == 16);
3144 // Tag 16 more bytes at BaseReg and update BaseReg.
3145 int64_t STGOffset = ExtraBaseRegUpdate + 16;
3146 assert(STGOffset % 16 == 0 && STGOffset >= -4096 && STGOffset <= 4080 &&
3147 "STG immediate out of range");
3148 BuildMI(*MBB, InsertI, DL,
3149 TII->get(ZeroData ? AArch64::STZGPostIndex : AArch64::STGPostIndex))
3150 .addDef(BaseReg)
3151 .addReg(BaseReg)
3152 .addReg(BaseReg)
3153 .addImm(STGOffset / 16)
3154 .setMemRefs(CombinedMemRefs)
3155 .setMIFlags(FrameRegUpdateFlags);
3156 } else if (ExtraBaseRegUpdate) {
3157 // Update BaseReg.
3158 int64_t AddSubOffset = std::abs(ExtraBaseRegUpdate);
3159 assert(AddSubOffset <= 4095 && "ADD/SUB immediate out of range");
3160 BuildMI(
3161 *MBB, InsertI, DL,
3162 TII->get(ExtraBaseRegUpdate > 0 ? AArch64::ADDXri : AArch64::SUBXri))
3163 .addDef(BaseReg)
3164 .addReg(BaseReg)
3165 .addImm(AddSubOffset)
3166 .addImm(0)
3167 .setMIFlags(FrameRegUpdateFlags);
3168 }
3169}
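// Illustrative expansion: for a 112-byte region with a pending base-register
// update, LoopSize is rounded down to 96, STGloop_wback tags those 96 bytes,
// and the remaining 16 bytes are tagged by an STGPostIndex that also applies
// the base-register update.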
3170
3171// Check if *II is a register update that can be merged into STGloop that ends
3172// at (Reg + Size). *TotalOffset is set to the required adjustment to Reg after
3173// the end of the loop.
3174bool canMergeRegUpdate(MachineBasicBlock::iterator II, unsigned Reg,
3175 int64_t Size, int64_t *TotalOffset) {
3176 MachineInstr &MI = *II;
3177 if ((MI.getOpcode() == AArch64::ADDXri ||
3178 MI.getOpcode() == AArch64::SUBXri) &&
3179 MI.getOperand(0).getReg() == Reg && MI.getOperand(1).getReg() == Reg) {
3180 unsigned Shift = AArch64_AM::getShiftValue(MI.getOperand(3).getImm());
3181 int64_t Offset = MI.getOperand(2).getImm() << Shift;
3182 if (MI.getOpcode() == AArch64::SUBXri)
3183 Offset = -Offset;
3184 int64_t PostOffset = Offset - Size;
3185 // TagStoreEdit::emitLoop might emit either an ADD/SUB after the loop, or
3186 // an STGPostIndex which does the last 16 bytes of tag write. Which one is
3187 // chosen depends on the alignment of the loop size, but the difference
3188 // between the valid ranges for the two instructions is small, so we
3189 // conservatively assume that it could be either case here.
3190 //
3191 // Max offset of STGPostIndex, minus the 16 byte tag write folded into that
3192 // instruction.
3193 const int64_t kMaxOffset = 4080 - 16;
3194 // Max offset of SUBXri.
3195 const int64_t kMinOffset = -4095;
3196 if (PostOffset <= kMaxOffset && PostOffset >= kMinOffset &&
3197 PostOffset % 16 == 0) {
3198 *TotalOffset = Offset;
3199 return true;
3200 }
3201 }
3202 return false;
3203}
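// For example, an epilogue "add sp, sp, #480" directly following a 464-byte
// tagged region gives PostOffset = 480 - 464 = 16, which is 16-byte aligned and
// within [kMinOffset, kMaxOffset] = [-4095, 4064], so the update can be merged;
// emitLoop then materializes it either as the post-increment of a final STG or
// as a standalone ADD after the loop.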
3204
3205 void mergeMemRefs(const SmallVectorImpl<TagStoreInstr> &TSE,
3206 SmallVectorImpl<MachineMemOperand *> &MemRefs) {
3207 MemRefs.clear();
3208 for (auto &TS : TSE) {
3209 MachineInstr *MI = TS.MI;
3210 // An instruction without memory operands may access anything. Be
3211 // conservative and return an empty list.
3212 if (MI->memoperands_empty()) {
3213 MemRefs.clear();
3214 return;
3215 }
3216 MemRefs.append(MI->memoperands_begin(), MI->memoperands_end());
3217 }
3218}
3219
3220void TagStoreEdit::emitCode(MachineBasicBlock::iterator &InsertI,
3221 const AArch64FrameLowering *TFI,
3222 bool TryMergeSPUpdate) {
3223 if (TagStores.empty())
3224 return;
3225 TagStoreInstr &FirstTagStore = TagStores[0];
3226 TagStoreInstr &LastTagStore = TagStores[TagStores.size() - 1];
3227 Size = LastTagStore.Offset - FirstTagStore.Offset + LastTagStore.Size;
3228 DL = TagStores[0].MI->getDebugLoc();
3229
3230 Register Reg;
3231 FrameRegOffset = TFI->resolveFrameOffsetReference(
3232 *MF, FirstTagStore.Offset, false /*isFixed*/,
3233 TargetStackID::Default /*StackID*/, Reg,
3234 /*PreferFP=*/false, /*ForSimm=*/true);
3235 FrameReg = Reg;
3236 FrameRegUpdate = std::nullopt;
3237
3238 mergeMemRefs(TagStores, CombinedMemRefs);
3239
3240 LLVM_DEBUG({
3241 dbgs() << "Replacing adjacent STG instructions:\n";
3242 for (const auto &Instr : TagStores) {
3243 dbgs() << " " << *Instr.MI;
3244 }
3245 });
3246
3247 // Size threshold where a loop becomes shorter than a linear sequence of
3248 // tagging instructions.
3249 const int kSetTagLoopThreshold = 176;
3250 if (Size < kSetTagLoopThreshold) {
3251 if (TagStores.size() < 2)
3252 return;
3253 emitUnrolled(InsertI);
3254 } else {
3255 MachineInstr *UpdateInstr = nullptr;
3256 int64_t TotalOffset = 0;
3257 if (TryMergeSPUpdate) {
3258 // See if we can merge base register update into the STGloop.
3259 // This is done in AArch64LoadStoreOptimizer for "normal" stores,
3260 // but STGloop is way too unusual for that, and also it only
3261 // realistically happens in function epilogue. Also, STGloop is expanded
3262 // before that pass.
3263 if (InsertI != MBB->end() &&
3264 canMergeRegUpdate(InsertI, FrameReg, FrameRegOffset.getFixed() + Size,
3265 &TotalOffset)) {
3266 UpdateInstr = &*InsertI++;
3267 LLVM_DEBUG(dbgs() << "Folding SP update into loop:\n "
3268 << *UpdateInstr);
3269 }
3270 }
3271
3272 if (!UpdateInstr && TagStores.size() < 2)
3273 return;
3274
3275 if (UpdateInstr) {
3276 FrameRegUpdate = TotalOffset;
3277 FrameRegUpdateFlags = UpdateInstr->getFlags();
3278 }
3279 emitLoop(InsertI);
3280 if (UpdateInstr)
3281 UpdateInstr->eraseFromParent();
3282 }
3283
3284 for (auto &TS : TagStores)
3285 TS.MI->eraseFromParent();
3286}
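// Note: 176 bytes corresponds to six unrolled tag stores (five ST2G plus one
// STG), which is roughly the point where the STGloop form (address/size setup
// plus the expanded loop) becomes the shorter sequence.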
3287
3288bool isMergeableStackTaggingInstruction(MachineInstr &MI, int64_t &Offset,
3289 int64_t &Size, bool &ZeroData) {
3290 MachineFunction &MF = *MI.getParent()->getParent();
3291 const MachineFrameInfo &MFI = MF.getFrameInfo();
3292
3293 unsigned Opcode = MI.getOpcode();
3294 ZeroData = (Opcode == AArch64::STZGloop || Opcode == AArch64::STZGi ||
3295 Opcode == AArch64::STZ2Gi);
3296
3297 if (Opcode == AArch64::STGloop || Opcode == AArch64::STZGloop) {
3298 if (!MI.getOperand(0).isDead() || !MI.getOperand(1).isDead())
3299 return false;
3300 if (!MI.getOperand(2).isImm() || !MI.getOperand(3).isFI())
3301 return false;
3302 Offset = MFI.getObjectOffset(MI.getOperand(3).getIndex());
3303 Size = MI.getOperand(2).getImm();
3304 return true;
3305 }
3306
3307 if (Opcode == AArch64::STGi || Opcode == AArch64::STZGi)
3308 Size = 16;
3309 else if (Opcode == AArch64::ST2Gi || Opcode == AArch64::STZ2Gi)
3310 Size = 32;
3311 else
3312 return false;
3313
3314 if (MI.getOperand(0).getReg() != AArch64::SP || !MI.getOperand(1).isFI())
3315 return false;
3316
3317 Offset = MFI.getObjectOffset(MI.getOperand(1).getIndex()) +
3318 16 * MI.getOperand(2).getImm();
3319 return true;
3320}
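// For example, an STGi whose frame-index operand has object offset -64 and whose
// scaled immediate is 2 is reported as Size = 16 and Offset = -64 + 16 * 2 = -32,
// so runs of adjacent tag stores can be found by comparing these byte offsets.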
3321
3322// Detect a run of memory tagging instructions for adjacent stack frame slots,
3323// and replace them with a shorter instruction sequence:
3324// * replace STG + STG with ST2G
3325// * replace STGloop + STGloop with STGloop
3326// This code needs to run when stack slot offsets are already known, but before
3327// FrameIndex operands in STG instructions are eliminated.
3328 MachineBasicBlock::iterator tryMergeAdjacentSTG(MachineBasicBlock::iterator II,
3329 const AArch64FrameLowering *TFI,
3330 RegScavenger *RS) {
3331 bool FirstZeroData;
3332 int64_t Size, Offset;
3333 MachineInstr &MI = *II;
3334 MachineBasicBlock *MBB = MI.getParent();
3335 MachineBasicBlock::iterator NextI = ++II;
3336 if (&MI == &MBB->instr_back())
3337 return II;
3338 if (!isMergeableStackTaggingInstruction(MI, Offset, Size, FirstZeroData))
3339 return II;
3340
3341 SmallVector<TagStoreInstr, 8> Instrs;
3342 Instrs.emplace_back(&MI, Offset, Size);
3343
3344 constexpr int kScanLimit = 10;
3345 int Count = 0;
3346 for (MachineBasicBlock::iterator E = MBB->end();
3347 NextI != E && Count < kScanLimit; ++NextI) {
3348 MachineInstr &MI = *NextI;
3349 bool ZeroData;
3350 int64_t Size, Offset;
3351 // Collect instructions that update memory tags with a FrameIndex operand
3352 // and (when applicable) constant size, and whose output registers are dead
3353 // (the latter is almost always the case in practice). Since these
3354 // instructions effectively have no inputs or outputs, we are free to skip
3355 // any non-aliasing instructions in between without tracking used registers.
3356 if (isMergeableStackTaggingInstruction(MI, Offset, Size, ZeroData)) {
3357 if (ZeroData != FirstZeroData)
3358 break;
3359 Instrs.emplace_back(&MI, Offset, Size);
3360 continue;
3361 }
3362
3363 // Only count non-transient, non-tagging instructions toward the scan
3364 // limit.
3365 if (!MI.isTransient())
3366 ++Count;
3367
3368 // Just in case, stop before the epilogue code starts.
3369 if (MI.getFlag(MachineInstr::FrameSetup) ||
3370 MI.getFlag(MachineInstr::FrameDestroy))
3371 break;
3372
3373 // Reject anything that may alias the collected instructions.
3374 if (MI.mayLoadOrStore() || MI.hasUnmodeledSideEffects() || MI.isCall())
3375 break;
3376 }
3377
3378 // New code will be inserted after the last tagging instruction we've found.
3379 MachineBasicBlock::iterator InsertI = Instrs.back().MI;
3380
3381 // All the gathered stack tag instructions are merged and placed after the
3382 // last tag store in the list. Before inserting there, check whether the
3383 // NZCV flag is live at that point; otherwise an STG loop expansion could
3384 // clobber it.
3385 
3386 // FIXME: Bailing out of the merge here is conservative: the liveness check
3387 // is performed even when the merged sequence contains no STG loops, in
3388 // which case NZCV is never clobbered and the check is unnecessary.
3389 LivePhysRegs LiveRegs(*(MBB->getParent()->getSubtarget().getRegisterInfo()));
3390 LiveRegs.addLiveOuts(*MBB);
3391 for (auto I = MBB->rbegin();; ++I) {
3392 MachineInstr &MI = *I;
3393 if (MI == InsertI)
3394 break;
3395 LiveRegs.stepBackward(*I);
3396 }
3397 InsertI++;
3398 if (LiveRegs.contains(AArch64::NZCV))
3399 return InsertI;
3400
3401 llvm::stable_sort(Instrs,
3402 [](const TagStoreInstr &Left, const TagStoreInstr &Right) {
3403 return Left.Offset < Right.Offset;
3404 });
3405
3406 // Make sure that we don't have any overlapping stores.
3407 int64_t CurOffset = Instrs[0].Offset;
3408 for (auto &Instr : Instrs) {
3409 if (CurOffset > Instr.Offset)
3410 return NextI;
3411 CurOffset = Instr.Offset + Instr.Size;
3412 }
3413
3414 // Find contiguous runs of tagged memory and emit shorter instruction
3415 // sequences for them when possible.
3416 TagStoreEdit TSE(MBB, FirstZeroData);
3417 std::optional<int64_t> EndOffset;
3418 for (auto &Instr : Instrs) {
3419 if (EndOffset && *EndOffset != Instr.Offset) {
3420 // Found a gap.
3421 TSE.emitCode(InsertI, TFI, /*TryMergeSPUpdate = */ false);
3422 TSE.clear();
3423 }
3424
3425 TSE.addInstruction(Instr);
3426 EndOffset = Instr.Offset + Instr.Size;
3427 }
3428
3429 const MachineFunction *MF = MBB->getParent();
3430 // Multiple FP/SP updates in a loop cannot be described by CFI instructions.
3431 TSE.emitCode(
3432 InsertI, TFI, /*TryMergeSPUpdate = */
3433 !MF->getInfo<AArch64FunctionInfo>()->needsAsyncDwarfUnwindInfo(*MF));
3434
3435 return InsertI;
3436}
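// For example, two 16-byte STGi stores covering offsets 0 and 16 of the same
// allocation are collected into a single TagStoreEdit with Size == 32, which
// emitCode (via emitUnrolled) then replaces with one ST2G.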
3437} // namespace
3438
3439 void AArch64FrameLowering::processFunctionBeforeFrameIndicesReplaced(
3440 MachineFunction &MF, RegScavenger *RS = nullptr) const {
3441 for (auto &BB : MF)
3442 for (MachineBasicBlock::iterator II = BB.begin(); II != BB.end();) {
3443 if (StackTaggingMergeSetTag)
3444 II = tryMergeAdjacentSTG(II, this, RS);
3445 }
3446
3447 // By the time this method is called, most of the prologue/epilogue code is
3448 // already emitted, whether its location was affected by the shrink-wrapping
3449 // optimization or not.
3450 if (!MF.getFunction().hasFnAttribute(Attribute::Naked) &&
3451 shouldSignReturnAddressEverywhere(MF))
3452 emitPacRetPlusLeafHardening(MF);
3453}
3454
3455/// For Win64 AArch64 EH, the offset to the Unwind object is from the SP
3456/// before the update. This is easily retrieved as it is exactly the offset
3457/// that is set in processFunctionBeforeFrameFinalized.
3458 StackOffset AArch64FrameLowering::getFrameIndexReferencePreferSP(
3459 const MachineFunction &MF, int FI, Register &FrameReg,
3460 bool IgnoreSPUpdates) const {
3461 const MachineFrameInfo &MFI = MF.getFrameInfo();
3462 if (IgnoreSPUpdates) {
3463 LLVM_DEBUG(dbgs() << "Offset from the SP for " << FI << " is "
3464 << MFI.getObjectOffset(FI) << "\n");
3465 FrameReg = AArch64::SP;
3466 return StackOffset::getFixed(MFI.getObjectOffset(FI));
3467 }
3468
3469 // Go to common code if we cannot provide sp + offset.
3470 if (MFI.hasVarSizedObjects() ||
3471 MF.getInfo<AArch64FunctionInfo>()->hasSVEStackSize() ||
3472 MF.getSubtarget().getRegisterInfo()->hasStackRealignment(MF))
3473 return getFrameIndexReference(MF, FI, FrameReg);
3474
3475 FrameReg = AArch64::SP;
3476 return getStackOffset(MF, MFI.getObjectOffset(FI));
3477}
3478
3479/// The parent frame offset (aka dispFrame) is only used on X86_64 to retrieve
3480/// the parent's frame pointer
3481 unsigned AArch64FrameLowering::getWinEHParentFrameOffset(
3482 const MachineFunction &MF) const {
3483 return 0;
3484}
3485
3486/// Funclets only need to account for space for the callee saved registers,
3487/// as the locals are accounted for in the parent's stack frame.
3488 unsigned AArch64FrameLowering::getWinEHFuncletFrameSize(
3489 const MachineFunction &MF) const {
3490 // This is the size of the pushed CSRs.
3491 unsigned CSSize =
3492 MF.getInfo<AArch64FunctionInfo>()->getCalleeSavedStackSize();
3493 // This is the amount of stack a funclet needs to allocate.
3494 return alignTo(CSSize + MF.getFrameInfo().getMaxCallFrameSize(),
3495 getStackAlign());
3496}
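// For example, with 96 bytes of pushed callee saves and a 40-byte maximum call
// frame, a funclet allocates alignTo(96 + 40, 16) = 144 bytes.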
3497
3498namespace {
3499struct FrameObject {
3500 bool IsValid = false;
3501 // Index of the object in MFI.
3502 int ObjectIndex = 0;
3503 // Group ID this object belongs to.
3504 int GroupIndex = -1;
3505 // This object should be placed first (closest to SP).
3506 bool ObjectFirst = false;
3507 // This object's group (which always contains the object with
3508 // ObjectFirst==true) should be placed first.
3509 bool GroupFirst = false;
3510
3511 // Used to distinguish between FP and GPR accesses. The values are decided so
3512 // that they sort FPR < Hazard < GPR and they can be or'd together.
3513 unsigned Accesses = 0;
3514 enum { AccessFPR = 1, AccessHazard = 2, AccessGPR = 4 };
3515};
3516
3517class GroupBuilder {
3518 SmallVector<int, 8> CurrentMembers;
3519 int NextGroupIndex = 0;
3520 std::vector<FrameObject> &Objects;
3521
3522public:
3523 GroupBuilder(std::vector<FrameObject> &Objects) : Objects(Objects) {}
3524 void AddMember(int Index) { CurrentMembers.push_back(Index); }
3525 void EndCurrentGroup() {
3526 if (CurrentMembers.size() > 1) {
3527 // Create a new group with the current member list. This might remove them
3528 // from their pre-existing groups. That's OK, dealing with overlapping
3529 // groups is too hard and unlikely to make a difference.
3530 LLVM_DEBUG(dbgs() << "group:");
3531 for (int Index : CurrentMembers) {
3532 Objects[Index].GroupIndex = NextGroupIndex;
3533 LLVM_DEBUG(dbgs() << " " << Index);
3534 }
3535 LLVM_DEBUG(dbgs() << "\n");
3536 NextGroupIndex++;
3537 }
3538 CurrentMembers.clear();
3539 }
3540};
3541
3542bool FrameObjectCompare(const FrameObject &A, const FrameObject &B) {
3543 // Objects at a lower index are closer to FP; objects at a higher index are
3544 // closer to SP.
3545 //
3546 // For consistency in our comparison, all invalid objects are placed
3547 // at the end. This also allows us to stop walking when we hit the
3548 // first invalid item after it's all sorted.
3549 //
3550 // If we want to include a stack hazard region, order FPR accesses < the
3551 // hazard object < GPRs accesses in order to create a separation between the
3552 // two. For the Accesses field 1 = FPR, 2 = Hazard Object, 4 = GPR.
3553 //
3554 // Otherwise the "first" object goes first (closest to SP), followed by the
3555 // members of the "first" group.
3556 //
3557 // The rest are sorted by the group index to keep the groups together.
3558 // Higher numbered groups are more likely to be around longer (i.e. untagged
3559 // in the function epilogue and not at some earlier point). Place them closer
3560 // to SP.
3561 //
3562 // If all else equal, sort by the object index to keep the objects in the
3563 // original order.
3564 return std::make_tuple(!A.IsValid, A.Accesses, A.ObjectFirst, A.GroupFirst,
3565 A.GroupIndex, A.ObjectIndex) <
3566 std::make_tuple(!B.IsValid, B.Accesses, B.ObjectFirst, B.GroupFirst,
3567 B.GroupIndex, B.ObjectIndex);
3568}
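// For example, when a stack hazard slot is present, objects accessed only by FPR
// loads/stores (Accesses == 1) sort before the hazard slot (Accesses == 2), which
// sorts before GPR-accessed or unknown objects (Accesses == 4), keeping FPR and
// GPR stack objects separated by the hazard padding.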
3569} // namespace
3570
3571 void AArch64FrameLowering::orderFrameObjects(
3572 const MachineFunction &MF, SmallVectorImpl<int> &ObjectsToAllocate) const {
3573 const AArch64FunctionInfo &AFI = *MF.getInfo<AArch64FunctionInfo>();
3574
3575 if ((!OrderFrameObjects && !AFI.hasSplitSVEObjects()) ||
3576 ObjectsToAllocate.empty())
3577 return;
3578
3579 const MachineFrameInfo &MFI = MF.getFrameInfo();
3580 std::vector<FrameObject> FrameObjects(MFI.getObjectIndexEnd());
3581 for (auto &Obj : ObjectsToAllocate) {
3582 FrameObjects[Obj].IsValid = true;
3583 FrameObjects[Obj].ObjectIndex = Obj;
3584 }
3585
3586 // Identify FPR vs GPR slots for hazards, and stack slots that are tagged at
3587 // the same time.
3588 GroupBuilder GB(FrameObjects);
3589 for (auto &MBB : MF) {
3590 for (auto &MI : MBB) {
3591 if (MI.isDebugInstr())
3592 continue;
3593
3594 if (AFI.hasStackHazardSlotIndex()) {
3595 std::optional<int> FI = getLdStFrameID(MI, MFI);
3596 if (FI && *FI >= 0 && *FI < (int)FrameObjects.size()) {
3597 if (MFI.getStackID(*FI) == TargetStackID::ScalableVector ||
3598 AArch64InstrInfo::isFpOrNEON(MI))
3599 FrameObjects[*FI].Accesses |= FrameObject::AccessFPR;
3600 else
3601 FrameObjects[*FI].Accesses |= FrameObject::AccessGPR;
3602 }
3603 }
3604
3605 int OpIndex;
3606 switch (MI.getOpcode()) {
3607 case AArch64::STGloop:
3608 case AArch64::STZGloop:
3609 OpIndex = 3;
3610 break;
3611 case AArch64::STGi:
3612 case AArch64::STZGi:
3613 case AArch64::ST2Gi:
3614 case AArch64::STZ2Gi:
3615 OpIndex = 1;
3616 break;
3617 default:
3618 OpIndex = -1;
3619 }
3620
3621 int TaggedFI = -1;
3622 if (OpIndex >= 0) {
3623 const MachineOperand &MO = MI.getOperand(OpIndex);
3624 if (MO.isFI()) {
3625 int FI = MO.getIndex();
3626 if (FI >= 0 && FI < MFI.getObjectIndexEnd() &&
3627 FrameObjects[FI].IsValid)
3628 TaggedFI = FI;
3629 }
3630 }
3631
3632 // If this is a stack tagging instruction for a slot that is not part of a
3633 // group yet, either start a new group or add it to the current one.
3634 if (TaggedFI >= 0)
3635 GB.AddMember(TaggedFI);
3636 else
3637 GB.EndCurrentGroup();
3638 }
3639 // Groups should never span multiple basic blocks.
3640 GB.EndCurrentGroup();
3641 }
3642
3643 if (AFI.hasStackHazardSlotIndex()) {
3644 FrameObjects[AFI.getStackHazardSlotIndex()].Accesses =
3645 FrameObject::AccessHazard;
3646 // If a stack object is unknown or both GPR and FPR, sort it into GPR.
3647 for (auto &Obj : FrameObjects)
3648 if (!Obj.Accesses ||
3649 Obj.Accesses == (FrameObject::AccessGPR | FrameObject::AccessFPR))
3650 Obj.Accesses = FrameObject::AccessGPR;
3651 }
3652
3653 // If the function's tagged base pointer is pinned to a stack slot, we want to
3654 // put that slot first when possible. This will likely place it at SP + 0,
3655 // and save one instruction when generating the base pointer because IRG does
3656 // not allow an immediate offset.
3657 std::optional<int> TBPI = AFI.getTaggedBasePointerIndex();
3658 if (TBPI) {
3659 FrameObjects[*TBPI].ObjectFirst = true;
3660 FrameObjects[*TBPI].GroupFirst = true;
3661 int FirstGroupIndex = FrameObjects[*TBPI].GroupIndex;
3662 if (FirstGroupIndex >= 0)
3663 for (FrameObject &Object : FrameObjects)
3664 if (Object.GroupIndex == FirstGroupIndex)
3665 Object.GroupFirst = true;
3666 }
3667
3668 llvm::stable_sort(FrameObjects, FrameObjectCompare);
3669
3670 int i = 0;
3671 for (auto &Obj : FrameObjects) {
3672 // All invalid items are sorted at the end, so it's safe to stop.
3673 if (!Obj.IsValid)
3674 break;
3675 ObjectsToAllocate[i++] = Obj.ObjectIndex;
3676 }
3677
3678 LLVM_DEBUG({
3679 dbgs() << "Final frame order:\n";
3680 for (auto &Obj : FrameObjects) {
3681 if (!Obj.IsValid)
3682 break;
3683 dbgs() << " " << Obj.ObjectIndex << ": group " << Obj.GroupIndex;
3684 if (Obj.ObjectFirst)
3685 dbgs() << ", first";
3686 if (Obj.GroupFirst)
3687 dbgs() << ", group-first";
3688 dbgs() << "\n";
3689 }
3690 });
3691}
3692
3693/// Emit a loop to decrement SP until it is equal to TargetReg, with probes at
3694/// least every ProbeSize bytes. Returns an iterator of the first instruction
3695/// after the loop. The difference between SP and TargetReg must be an exact
3696/// multiple of ProbeSize.
3697 MachineBasicBlock::iterator
3698 AArch64FrameLowering::inlineStackProbeLoopExactMultiple(
3699 MachineBasicBlock::iterator MBBI, int64_t ProbeSize,
3700 Register TargetReg) const {
3701 MachineBasicBlock &MBB = *MBBI->getParent();
3702 MachineFunction &MF = *MBB.getParent();
3703 const AArch64InstrInfo *TII =
3704 MF.getSubtarget<AArch64Subtarget>().getInstrInfo();
3705 DebugLoc DL = MBB.findDebugLoc(MBBI);
3706
3707 MachineFunction::iterator MBBInsertPoint = std::next(MBB.getIterator());
3708 MachineBasicBlock *LoopMBB = MF.CreateMachineBasicBlock(MBB.getBasicBlock());
3709 MF.insert(MBBInsertPoint, LoopMBB);
3710 MachineBasicBlock *ExitMBB = MF.CreateMachineBasicBlock(MBB.getBasicBlock());
3711 MF.insert(MBBInsertPoint, ExitMBB);
3712
3713 // SUB SP, SP, #ProbeSize (or equivalent if ProbeSize is not encodable
3714 // in SUB).
3715 emitFrameOffset(*LoopMBB, LoopMBB->end(), DL, AArch64::SP, AArch64::SP,
3716 StackOffset::getFixed(-ProbeSize), TII,
3717 MachineInstr::FrameSetup);
3718 // STR XZR, [SP]
3719 BuildMI(*LoopMBB, LoopMBB->end(), DL, TII->get(AArch64::STRXui))
3720 .addReg(AArch64::XZR)
3721 .addReg(AArch64::SP)
3722 .addImm(0)
3723 .setMIFlags(MachineInstr::FrameSetup);
3724 // CMP SP, TargetReg
3725 BuildMI(*LoopMBB, LoopMBB->end(), DL, TII->get(AArch64::SUBSXrx64),
3726 AArch64::XZR)
3727 .addReg(AArch64::SP)
3728 .addReg(TargetReg)
3729 .addImm(AArch64_AM::getArithExtendImm(AArch64_AM::UXTX, 0))
3730 .setMIFlags(MachineInstr::FrameSetup);
3731 // B.CC Loop
3732 BuildMI(*LoopMBB, LoopMBB->end(), DL, TII->get(AArch64::Bcc))
3733 .addImm(AArch64CC::NE)
3734 .addMBB(LoopMBB)
3735 .setMIFlags(MachineInstr::FrameSetup);
3736
3737 LoopMBB->addSuccessor(ExitMBB);
3738 LoopMBB->addSuccessor(LoopMBB);
3739 // Synthesize the exit MBB.
3740 ExitMBB->splice(ExitMBB->end(), &MBB, MBBI, MBB.end());
3741 ExitMBB->transferSuccessorsAndUpdatePHIs(&MBB);
3742 MBB.addSuccessor(LoopMBB);
3743 // Update liveins.
3744 fullyRecomputeLiveIns({ExitMBB, LoopMBB});
3745
3746 return ExitMBB->begin();
3747}
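// For example, probing 1 MiB with a 4096-byte probe size executes this loop 256
// times: each iteration drops SP by 4096, stores XZR at [SP] to touch the new
// page, and loops until SP reaches TargetReg.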
3748
3749void AArch64FrameLowering::inlineStackProbeFixed(
3750 MachineBasicBlock::iterator MBBI, Register ScratchReg, int64_t FrameSize,
3751 StackOffset CFAOffset) const {
3752 MachineBasicBlock *MBB = MBBI->getParent();
3753 MachineFunction &MF = *MBB->getParent();
3754 const AArch64InstrInfo *TII =
3755 MF.getSubtarget<AArch64Subtarget>().getInstrInfo();
3756 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
3757 bool EmitAsyncCFI = AFI->needsAsyncDwarfUnwindInfo(MF);
3758 bool HasFP = hasFP(MF);
3759
3760 DebugLoc DL;
3761 int64_t ProbeSize = MF.getInfo<AArch64FunctionInfo>()->getStackProbeSize();
3762 int64_t NumBlocks = FrameSize / ProbeSize;
3763 int64_t ResidualSize = FrameSize % ProbeSize;
3764
3765 LLVM_DEBUG(dbgs() << "Stack probing: total " << FrameSize << " bytes, "
3766 << NumBlocks << " blocks of " << ProbeSize
3767 << " bytes, plus " << ResidualSize << " bytes\n");
3768
3769 // Decrement SP by NumBlock * ProbeSize bytes, with either unrolled or
3770 // ordinary loop.
3771 if (NumBlocks <= AArch64::StackProbeMaxLoopUnroll) {
3772 for (int i = 0; i < NumBlocks; ++i) {
3773 // SUB SP, SP, #ProbeSize (or equivalent if ProbeSize is not
3774 // encodable in a SUB).
3775 emitFrameOffset(*MBB, MBBI, DL, AArch64::SP, AArch64::SP,
3776 StackOffset::getFixed(-ProbeSize), TII,
3777 MachineInstr::FrameSetup, false, false, nullptr,
3778 EmitAsyncCFI && !HasFP, CFAOffset);
3779 CFAOffset += StackOffset::getFixed(ProbeSize);
3780 // STR XZR, [SP]
3781 BuildMI(*MBB, MBBI, DL, TII->get(AArch64::STRXui))
3782 .addReg(AArch64::XZR)
3783 .addReg(AArch64::SP)
3784 .addImm(0)
3785 .setMIFlags(MachineInstr::FrameSetup);
3786 }
3787 } else if (NumBlocks != 0) {
3788 // SUB ScratchReg, SP, #FrameSize (or equivalent if FrameSize is not
3789 // encodable in ADD). ScrathReg may temporarily become the CFA register.
3790 emitFrameOffset(*MBB, MBBI, DL, ScratchReg, AArch64::SP,
3791 StackOffset::getFixed(-ProbeSize * NumBlocks), TII,
3792 MachineInstr::FrameSetup, false, false, nullptr,
3793 EmitAsyncCFI && !HasFP, CFAOffset);
3794 CFAOffset += StackOffset::getFixed(ProbeSize * NumBlocks);
3795 MBBI = inlineStackProbeLoopExactMultiple(MBBI, ProbeSize, ScratchReg);
3796 MBB = MBBI->getParent();
3797 if (EmitAsyncCFI && !HasFP) {
3798 // Set the CFA register back to SP.
3799 CFIInstBuilder(*MBB, MBBI, MachineInstr::FrameSetup)
3800 .buildDefCFARegister(AArch64::SP);
3801 }
3802 }
3803
3804 if (ResidualSize != 0) {
3805 // SUB SP, SP, #ResidualSize (or equivalent if ResidualSize is not encodable
3806 // in SUB).
3807 emitFrameOffset(*MBB, MBBI, DL, AArch64::SP, AArch64::SP,
3808 StackOffset::getFixed(-ResidualSize), TII,
3809 MachineInstr::FrameSetup, false, false, nullptr,
3810 EmitAsyncCFI && !HasFP, CFAOffset);
3811 if (ResidualSize > AArch64::StackProbeMaxUnprobedStack) {
3812 // STR XZR, [SP]
3813 BuildMI(*MBB, MBBI, DL, TII->get(AArch64::STRXui))
3814 .addReg(AArch64::XZR)
3815 .addReg(AArch64::SP)
3816 .addImm(0)
3817 .setMIFlags(MachineInstr::FrameSetup);
3818 }
3819 }
3820}
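// For example, with a 4096-byte probe size a 12000-byte frame is allocated as
// NumBlocks = 2 full blocks plus ResidualSize = 3808 bytes: the two blocks are
// probed individually (or via the loop above when NumBlocks exceeds the unroll
// limit), and the residual allocation gets its own probing store whenever it
// would otherwise leave more than StackProbeMaxUnprobedStack bytes unprobed.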
3821
3822void AArch64FrameLowering::inlineStackProbe(MachineFunction &MF,
3823 MachineBasicBlock &MBB) const {
3824 // Get the instructions that need to be replaced. We emit at most two of
3825 // these. Remember them in order to avoid complications coming from the need
3826 // to traverse the block while potentially creating more blocks.
3827 SmallVector<MachineInstr *, 4> ToReplace;
3828 for (MachineInstr &MI : MBB)
3829 if (MI.getOpcode() == AArch64::PROBED_STACKALLOC ||
3830 MI.getOpcode() == AArch64::PROBED_STACKALLOC_VAR)
3831 ToReplace.push_back(&MI);
3832
3833 for (MachineInstr *MI : ToReplace) {
3834 if (MI->getOpcode() == AArch64::PROBED_STACKALLOC) {
3835 Register ScratchReg = MI->getOperand(0).getReg();
3836 int64_t FrameSize = MI->getOperand(1).getImm();
3837 StackOffset CFAOffset = StackOffset::get(MI->getOperand(2).getImm(),
3838 MI->getOperand(3).getImm());
3839 inlineStackProbeFixed(MI->getIterator(), ScratchReg, FrameSize,
3840 CFAOffset);
3841 } else {
3842 assert(MI->getOpcode() == AArch64::PROBED_STACKALLOC_VAR &&
3843 "Stack probe pseudo-instruction expected");
3844 const AArch64InstrInfo *TII =
3845 MI->getMF()->getSubtarget<AArch64Subtarget>().getInstrInfo();
3846 Register TargetReg = MI->getOperand(0).getReg();
3847 (void)TII->probedStackAlloc(MI->getIterator(), TargetReg, true);
3848 }
3849 MI->eraseFromParent();
3850 }
3851}
3852 
3853 struct StackAccess {
3854 enum AccessType {
3855 NotAccessed = 0, // Stack object not accessed by load/store instructions.
3856 GPR = 1 << 0, // A general purpose register.
3857 PPR = 1 << 1, // A predicate register.
3858 FPR = 1 << 2, // A floating point/Neon/SVE register.
3859 };
3860
3861 int Idx;
3862 StackOffset Offset;
3863 int64_t Size;
3864 unsigned AccessTypes;
3865 
3866 StackAccess() : Idx(0), Size(0), AccessTypes(NotAccessed) {}
3867
3868 bool operator<(const StackAccess &Rhs) const {
3869 return std::make_tuple(start(), Idx) <
3870 std::make_tuple(Rhs.start(), Rhs.Idx);
3871 }
3872
3873 bool isCPU() const {
3874 // Predicate register load and store instructions execute on the CPU.
3875 return AccessTypes & (AccessType::GPR | AccessType::PPR);
3876 }
3877 bool isSME() const { return AccessTypes & AccessType::FPR; }
3878 bool isMixed() const { return isCPU() && isSME(); }
3879
3880 int64_t start() const { return Offset.getFixed() + Offset.getScalable(); }
3881 int64_t end() const { return start() + Size; }
3882
3883 std::string getTypeString() const {
3884 switch (AccessTypes) {
3885 case AccessType::FPR:
3886 return "FPR";
3887 case AccessType::PPR:
3888 return "PPR";
3889 case AccessType::GPR:
3890 return "GPR";
3891 case AccessType::NotAccessed:
3892 return "NA";
3893 default:
3894 return "Mixed";
3895 }
3896 }
3897
3898 void print(raw_ostream &OS) const {
3899 OS << getTypeString() << " stack object at [SP"
3900 << (Offset.getFixed() < 0 ? "" : "+") << Offset.getFixed();
3901 if (Offset.getScalable())
3902 OS << (Offset.getScalable() < 0 ? "" : "+") << Offset.getScalable()
3903 << " * vscale";
3904 OS << "]";
3905 }
3906};
3907
3908static inline raw_ostream &operator<<(raw_ostream &OS, const StackAccess &SA) {
3909 SA.print(OS);
3910 return OS;
3911}
3912
3913void AArch64FrameLowering::emitRemarks(
3914 const MachineFunction &MF, MachineOptimizationRemarkEmitter *ORE) const {
3915
3916 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
3917 if (SMEAttrs(MF.getFunction()).hasNonStreamingInterfaceAndBody())
3918 return;
3919
3920 unsigned StackHazardSize = getStackHazardSize(MF);
3921 const uint64_t HazardSize =
3922 (StackHazardSize) ? StackHazardSize : StackHazardRemarkSize;
3923
3924 if (HazardSize == 0)
3925 return;
3926
3927 const MachineFrameInfo &MFI = MF.getFrameInfo();
3928 // Bail if function has no stack objects.
3929 if (!MFI.hasStackObjects())
3930 return;
3931
3932 std::vector<StackAccess> StackAccesses(MFI.getNumObjects());
3933
3934 size_t NumFPLdSt = 0;
3935 size_t NumNonFPLdSt = 0;
3936
3937 // Collect stack accesses via Load/Store instructions.
3938 for (const MachineBasicBlock &MBB : MF) {
3939 for (const MachineInstr &MI : MBB) {
3940 if (!MI.mayLoadOrStore() || MI.getNumMemOperands() < 1)
3941 continue;
3942 for (MachineMemOperand *MMO : MI.memoperands()) {
3943 std::optional<int> FI = getMMOFrameID(MMO, MFI);
3944 if (FI && !MFI.isDeadObjectIndex(*FI)) {
3945 int FrameIdx = *FI;
3946
3947 size_t ArrIdx = FrameIdx + MFI.getNumFixedObjects();
3948 if (StackAccesses[ArrIdx].AccessTypes == StackAccess::NotAccessed) {
3949 StackAccesses[ArrIdx].Idx = FrameIdx;
3950 StackAccesses[ArrIdx].Offset =
3951 getFrameIndexReferenceFromSP(MF, FrameIdx);
3952 StackAccesses[ArrIdx].Size = MFI.getObjectSize(FrameIdx);
3953 }
3954
3955 unsigned RegTy = StackAccess::AccessType::GPR;
3956 if (MFI.hasScalableStackID(FrameIdx))
3957 RegTy = isPPRAccess(MI) ? StackAccess::PPR : StackAccess::FPR;
3958 else if (AArch64InstrInfo::isFpOrNEON(MI))
3959 RegTy = StackAccess::FPR;
3960
3961 StackAccesses[ArrIdx].AccessTypes |= RegTy;
3962
3963 if (RegTy == StackAccess::FPR)
3964 ++NumFPLdSt;
3965 else
3966 ++NumNonFPLdSt;
3967 }
3968 }
3969 }
3970 }
3971
3972 if (NumFPLdSt == 0 || NumNonFPLdSt == 0)
3973 return;
3974
3975 llvm::sort(StackAccesses);
3976 llvm::erase_if(StackAccesses, [](const StackAccess &S) {
3977 return S.AccessTypes == StackAccess::NotAccessed;
3978 });
3979
3980 SmallVector<const StackAccess *> MixedObjects;
3981 SmallVector<std::pair<const StackAccess *, const StackAccess *>> HazardPairs;
3982 
3983 if (StackAccesses.front().isMixed())
3984 MixedObjects.push_back(&StackAccesses.front());
3985
3986 for (auto It = StackAccesses.begin(), End = std::prev(StackAccesses.end());
3987 It != End; ++It) {
3988 const auto &First = *It;
3989 const auto &Second = *(It + 1);
3990
3991 if (Second.isMixed())
3992 MixedObjects.push_back(&Second);
3993
3994 if ((First.isSME() && Second.isCPU()) ||
3995 (First.isCPU() && Second.isSME())) {
3996 uint64_t Distance = static_cast<uint64_t>(Second.start() - First.end());
3997 if (Distance < HazardSize)
3998 HazardPairs.emplace_back(&First, &Second);
3999 }
4000 }
4001
4002 auto EmitRemark = [&](llvm::StringRef Str) {
4003 ORE->emit([&]() {
4004 auto R = MachineOptimizationRemarkAnalysis(
4005 "sme", "StackHazard", MF.getFunction().getSubprogram(), &MF.front());
4006 return R << formatv("stack hazard in '{0}': ", MF.getName()).str() << Str;
4007 });
4008 };
4009
4010 for (const auto &P : HazardPairs)
4011 EmitRemark(formatv("{0} is too close to {1}", *P.first, *P.second).str());
4012
4013 for (const auto *Obj : MixedObjects)
4014 EmitRemark(
4015 formatv("{0} accessed by both GP and FP instructions", *Obj).str());
4016}
unsigned const MachineRegisterInfo * MRI
static void getLiveRegsForEntryMBB(LivePhysRegs &LiveRegs, const MachineBasicBlock &MBB)
static const unsigned DefaultSafeSPDisplacement
This is the biggest offset to the stack pointer we can encode in aarch64 instructions (without using ...
static bool produceCompactUnwindFrame(const AArch64FrameLowering &, MachineFunction &MF)
static cl::opt< bool > StackTaggingMergeSetTag("stack-tagging-merge-settag", cl::desc("merge settag instruction in function epilog"), cl::init(true), cl::Hidden)
bool enableMultiVectorSpillFill(const AArch64Subtarget &Subtarget, MachineFunction &MF)
static std::optional< int > getLdStFrameID(const MachineInstr &MI, const MachineFrameInfo &MFI)
static cl::opt< bool > SplitSVEObjects("aarch64-split-sve-objects", cl::desc("Split allocation of ZPR & PPR objects"), cl::init(true), cl::Hidden)
static cl::opt< bool > StackHazardInNonStreaming("aarch64-stack-hazard-in-non-streaming", cl::init(false), cl::Hidden)
void computeCalleeSaveRegisterPairs(const AArch64FrameLowering &AFL, MachineFunction &MF, ArrayRef< CalleeSavedInfo > CSI, const TargetRegisterInfo *TRI, SmallVectorImpl< RegPairInfo > &RegPairs, bool NeedsFrameRecord)
static cl::opt< bool > OrderFrameObjects("aarch64-order-frame-objects", cl::desc("sort stack allocations"), cl::init(true), cl::Hidden)
static bool invalidateWindowsRegisterPairing(unsigned Reg1, unsigned Reg2, bool NeedsWinCFI, bool IsFirst, const TargetRegisterInfo *TRI)
static cl::opt< bool > DisableMultiVectorSpillFill("aarch64-disable-multivector-spill-fill", cl::desc("Disable use of LD/ST pairs for SME2 or SVE2p1"), cl::init(false), cl::Hidden)
static cl::opt< bool > EnableRedZone("aarch64-redzone", cl::desc("enable use of redzone on AArch64"), cl::init(false), cl::Hidden)
cl::opt< bool > EnableHomogeneousPrologEpilog("homogeneous-prolog-epilog", cl::Hidden, cl::desc("Emit homogeneous prologue and epilogue for the size " "optimization (default = off)"))
static bool isLikelyToHaveSVEStack(const AArch64FrameLowering &AFL, const MachineFunction &MF)
static bool invalidateRegisterPairing(unsigned Reg1, unsigned Reg2, bool UsesWinAAPCS, bool NeedsWinCFI, bool NeedsFrameRecord, bool IsFirst, const TargetRegisterInfo *TRI)
Returns true if Reg1 and Reg2 cannot be paired using a ldp/stp instruction.
unsigned findFreePredicateReg(BitVector &SavedRegs)
static unsigned getPrologueDeath(MachineFunction &MF, unsigned Reg)
static SVEStackSizes determineSVEStackSizes(MachineFunction &MF, AssignObjectOffsets AssignOffsets)
Process all the SVE stack objects and the SVE stack size and offsets for each object.
static bool isTargetWindows(const MachineFunction &MF)
static unsigned estimateRSStackSizeLimit(MachineFunction &MF)
Look at each instruction that references stack frames and return the stack size limit beyond which so...
static bool getSVECalleeSaveSlotRange(const MachineFrameInfo &MFI, int &Min, int &Max)
returns true if there are any SVE callee saves.
static cl::opt< unsigned > StackHazardRemarkSize("aarch64-stack-hazard-remark-size", cl::init(0), cl::Hidden)
static MCRegister getRegisterOrZero(MCRegister Reg, bool HasSVE)
static unsigned getStackHazardSize(const MachineFunction &MF)
static bool isPPRAccess(const MachineInstr &MI)
static std::optional< int > getMMOFrameID(MachineMemOperand *MMO, const MachineFrameInfo &MFI)
assert(UImm &&(UImm !=~static_cast< T >(0)) &&"Invalid immediate!")
This file contains the declaration of the AArch64PrologueEmitter and AArch64EpilogueEmitter classes,...
static const int kSetTagLoopThreshold
MachineBasicBlock & MBB
MachineBasicBlock MachineBasicBlock::iterator DebugLoc DL
MachineBasicBlock MachineBasicBlock::iterator MBBI
This file contains the simple types necessary to represent the attributes associated with functions a...
#define CASE(ATTRNAME, AANAME,...)
static GCRegistry::Add< ErlangGC > A("erlang", "erlang-compatible garbage collector")
static GCRegistry::Add< CoreCLRGC > E("coreclr", "CoreCLR-compatible GC")
static GCRegistry::Add< OcamlGC > B("ocaml", "ocaml 3.10-compatible GC")
DXIL Forward Handle Accesses
const HexagonInstrInfo * TII
IRTranslator LLVM IR MI
static std::string getTypeString(Type *T)
Definition LLParser.cpp:67
This file implements the LivePhysRegs utility for tracking liveness of physical registers.
#define F(x, y, z)
Definition MD5.cpp:55
#define I(x, y, z)
Definition MD5.cpp:58
#define H(x, y, z)
Definition MD5.cpp:57
Register Reg
Register const TargetRegisterInfo * TRI
Promote Memory to Register
Definition Mem2Reg.cpp:110
uint64_t IntrinsicInst * II
#define P(N)
This file declares the machine register scavenger class.
unsigned OpIndex
static bool contains(SmallPtrSetImpl< ConstantExpr * > &Cache, ConstantExpr *Expr, Constant *C)
Definition Value.cpp:480
This file defines the make_scope_exit function, which executes user-defined cleanup logic at scope ex...
This file defines the SmallVector class.
#define LLVM_DEBUG(...)
Definition Debug.h:114
StackOffset getSVEStackSize(const MachineFunction &MF) const
Returns the size of the entire SVE stackframe (PPRs + ZPRs).
StackOffset getZPRStackSize(const MachineFunction &MF) const
Returns the size of the entire ZPR stackframe (calleesaves + spills).
void processFunctionBeforeFrameIndicesReplaced(MachineFunction &MF, RegScavenger *RS) const override
processFunctionBeforeFrameIndicesReplaced - This method is called immediately before MO_FrameIndex op...
MachineBasicBlock::iterator eliminateCallFramePseudoInstr(MachineFunction &MF, MachineBasicBlock &MBB, MachineBasicBlock::iterator I) const override
This method is called during prolog/epilog code insertion to eliminate call frame setup and destroy p...
bool canUseAsPrologue(const MachineBasicBlock &MBB) const override
Check whether or not the given MBB can be used as a prologue for the target.
bool enableStackSlotScavenging(const MachineFunction &MF) const override
Returns true if the stack slot holes in the fixed and callee-save stack area should be used when allo...
bool spillCalleeSavedRegisters(MachineBasicBlock &MBB, MachineBasicBlock::iterator MI, ArrayRef< CalleeSavedInfo > CSI, const TargetRegisterInfo *TRI) const override
spillCalleeSavedRegisters - Issues instruction(s) to spill all callee saved registers and returns tru...
bool restoreCalleeSavedRegisters(MachineBasicBlock &MBB, MachineBasicBlock::iterator MI, MutableArrayRef< CalleeSavedInfo > CSI, const TargetRegisterInfo *TRI) const override
restoreCalleeSavedRegisters - Issues instruction(s) to restore all callee saved registers and returns...
bool enableFullCFIFixup(const MachineFunction &MF) const override
enableFullCFIFixup - Returns true if we may need to fix the unwind information such that it is accura...
StackOffset getFrameIndexReferenceFromSP(const MachineFunction &MF, int FI) const override
getFrameIndexReferenceFromSP - This method returns the offset from the stack pointer to the slot of t...
bool enableCFIFixup(const MachineFunction &MF) const override
Returns true if we may need to fix the unwind information for the function.
StackOffset getNonLocalFrameIndexReference(const MachineFunction &MF, int FI) const override
getNonLocalFrameIndexReference - This method returns the offset used to reference a frame index locat...
TargetStackID::Value getStackIDForScalableVectors() const override
Returns the StackID that scalable vectors should be associated with.
bool hasFPImpl(const MachineFunction &MF) const override
hasFPImpl - Return true if the specified function should have a dedicated frame pointer register.
void emitPrologue(MachineFunction &MF, MachineBasicBlock &MBB) const override
emitProlog/emitEpilog - These methods insert prolog and epilog code into the function.
void resetCFIToInitialState(MachineBasicBlock &MBB) const override
Emit CFI instructions that recreate the state of the unwind information upon function entry.
bool hasReservedCallFrame(const MachineFunction &MF) const override
hasReservedCallFrame - Under normal circumstances, when a frame pointer is not required,...
StackOffset resolveFrameOffsetReference(const MachineFunction &MF, int64_t ObjectOffset, bool isFixed, TargetStackID::Value StackID, Register &FrameReg, bool PreferFP, bool ForSimm) const
bool canUseRedZone(const MachineFunction &MF) const
Can this function use the red zone for local allocations.
bool needsWinCFI(const MachineFunction &MF) const
bool isFPReserved(const MachineFunction &MF) const
Should the Frame Pointer be reserved for the current function?
void processFunctionBeforeFrameFinalized(MachineFunction &MF, RegScavenger *RS) const override
processFunctionBeforeFrameFinalized - This method is called immediately before the specified function...
int getSEHFrameIndexOffset(const MachineFunction &MF, int FI) const
unsigned getWinEHFuncletFrameSize(const MachineFunction &MF) const
Funclets only need to account for space for the callee saved registers, as the locals are accounted f...
void orderFrameObjects(const MachineFunction &MF, SmallVectorImpl< int > &ObjectsToAllocate) const override
Order the symbols in the local stack frame.
void emitEpilogue(MachineFunction &MF, MachineBasicBlock &MBB) const override
StackOffset getPPRStackSize(const MachineFunction &MF) const
Returns the size of the entire PPR stackframe (calleesaves + spills + hazard padding).
void determineCalleeSaves(MachineFunction &MF, BitVector &SavedRegs, RegScavenger *RS) const override
This method determines which of the registers reported by TargetRegisterInfo::getCalleeSavedRegs() sh...
StackOffset getFrameIndexReference(const MachineFunction &MF, int FI, Register &FrameReg) const override
getFrameIndexReference - Provide a base+offset reference to an FI slot for debug info.
bool assignCalleeSavedSpillSlots(MachineFunction &MF, const TargetRegisterInfo *TRI, std::vector< CalleeSavedInfo > &CSI, unsigned &MinCSFrameIndex, unsigned &MaxCSFrameIndex) const override
assignCalleeSavedSpillSlots - Allows target to override spill slot assignment logic.
StackOffset getFrameIndexReferencePreferSP(const MachineFunction &MF, int FI, Register &FrameReg, bool IgnoreSPUpdates) const override
For Win64 AArch64 EH, the offset to the Unwind object is from the SP before the update.
StackOffset resolveFrameIndexReference(const MachineFunction &MF, int FI, Register &FrameReg, bool PreferFP, bool ForSimm) const
unsigned getWinEHParentFrameOffset(const MachineFunction &MF) const override
The parent frame offset (aka dispFrame) is only used on X86_64 to retrieve the parent's frame pointer...
bool requiresSaveVG(const MachineFunction &MF) const
void emitPacRetPlusLeafHardening(MachineFunction &MF) const
Harden the entire function with pac-ret.
AArch64FunctionInfo - This class is derived from MachineFunctionInfo and contains private AArch64-spe...
unsigned getCalleeSavedStackSize(const MachineFrameInfo &MFI) const
void setCalleeSaveBaseToFrameRecordOffset(int Offset)
bool shouldSignReturnAddress(const MachineFunction &MF) const
void setStackSizeSVE(uint64_t ZPR, uint64_t PPR)
std::optional< int > getTaggedBasePointerIndex() const
bool needsDwarfUnwindInfo(const MachineFunction &MF) const
void setSVECalleeSavedStackSize(unsigned ZPR, unsigned PPR)
bool needsAsyncDwarfUnwindInfo(const MachineFunction &MF) const
static bool isTailCallReturnInst(const MachineInstr &MI)
Returns true if MI is one of the TCRETURN* instructions.
static bool isFpOrNEON(Register Reg)
Returns whether the physical register is FP or NEON.
const AArch64RegisterInfo * getRegisterInfo() const override
bool isNeonAvailable() const
Returns true if the target has NEON and the function at runtime is known to have NEON enabled (e....
const AArch64InstrInfo * getInstrInfo() const override
const AArch64TargetLowering * getTargetLowering() const override
bool isSVEorStreamingSVEAvailable() const
Returns true if the target has access to either the full range of SVE instructions,...
bool isStreaming() const
Returns true if the function has a streaming body.
bool hasInlineStackProbe(const MachineFunction &MF) const override
True if stack clash protection is enabled for this functions.
unsigned getRedZoneSize(const Function &F) const
ArrayRef - Represent a constant reference to an array (0 or more elements consecutively in memory),...
Definition ArrayRef.h:41
size_t size() const
size - Get the array size.
Definition ArrayRef.h:147
bool empty() const
empty - Check if the array is empty.
Definition ArrayRef.h:142
bool test(unsigned Idx) const
Definition BitVector.h:480
BitVector & reset()
Definition BitVector.h:411
size_type count() const
count - Returns the number of bits which are set.
Definition BitVector.h:181
BitVector & set()
Definition BitVector.h:370
iterator_range< const_set_bits_iterator > set_bits() const
Definition BitVector.h:159
size_type size() const
size - Returns the number of bits in this bitvector.
Definition BitVector.h:178
Helper class for creating CFI instructions and inserting them into MIR.
The CalleeSavedInfo class tracks the information need to locate where a callee saved register is in t...
A debug info location.
Definition DebugLoc.h:124
bool hasMinSize() const
Optimize this function for minimum size (-Oz).
Definition Function.h:703
CallingConv::ID getCallingConv() const
getCallingConv()/setCallingConv(CC) - These method get and set the calling convention of this functio...
Definition Function.h:270
AttributeList getAttributes() const
Return the attribute list for this Function.
Definition Function.h:352
bool isVarArg() const
isVarArg - Return true if this function takes a variable number of arguments.
Definition Function.h:227
bool hasFnAttribute(Attribute::AttrKind Kind) const
Return true if the function has the attribute.
Definition Function.cpp:727
A set of physical registers with utility functions to track liveness when walking backward/forward th...
bool usesWindowsCFI() const
Definition MCAsmInfo.h:652
Wrapper class representing physical registers. Should be passed by value.
Definition MCRegister.h:33
LLVM_ABI void transferSuccessorsAndUpdatePHIs(MachineBasicBlock *FromMBB)
Transfers all the successors, as in transferSuccessors, and update PHI operands in the successor bloc...
LLVM_ABI iterator getFirstTerminator()
Returns an iterator to the first terminator instruction of this basic block.
LLVM_ABI void addSuccessor(MachineBasicBlock *Succ, BranchProbability Prob=BranchProbability::getUnknown())
Add Succ as a successor of this MachineBasicBlock.
const MachineFunction * getParent() const
Return the MachineFunction containing this basic block.
reverse_iterator rbegin()
iterator insertAfter(iterator I, MachineInstr *MI)
Insert MI into the instruction list after I.
void splice(iterator Where, MachineBasicBlock *Other, iterator From)
Take an instruction from MBB 'Other' at the position From, and insert it into this MBB right before '...
MachineInstrBundleIterator< MachineInstr > iterator
The MachineFrameInfo class represents an abstract stack frame until prolog/epilog code is inserted.
LLVM_ABI int CreateFixedObject(uint64_t Size, int64_t SPOffset, bool IsImmutable, bool isAliased=false)
Create a new object at a fixed location on the stack.
bool hasVarSizedObjects() const
This method may be called any time after instruction selection is complete to determine if the stack ...
const AllocaInst * getObjectAllocation(int ObjectIdx) const
Return the underlying Alloca of the specified stack object if it exists.
LLVM_ABI int CreateStackObject(uint64_t Size, Align Alignment, bool isSpillSlot, const AllocaInst *Alloca=nullptr, uint8_t ID=0)
Create a new statically sized stack object, returning a nonnegative identifier to represent it.
bool hasCalls() const
Return true if the current function has any function calls.
bool isFrameAddressTaken() const
This method may be called any time after instruction selection is complete to determine if there is a...
void setObjectOffset(int ObjectIdx, int64_t SPOffset)
Set the stack frame offset of the specified object.
uint64_t getMaxCallFrameSize() const
Return the maximum size of a call frame that must be allocated for an outgoing function call.
bool hasPatchPoint() const
This method may be called any time after instruction selection is complete to determine if there is a...
bool hasScalableStackID(int ObjectIdx) const
int getStackProtectorIndex() const
Return the index for the stack protector object.
LLVM_ABI int CreateSpillStackObject(uint64_t Size, Align Alignment)
Create a new statically sized stack object that represents a spill slot, returning a nonnegative iden...
LLVM_ABI uint64_t estimateStackSize(const MachineFunction &MF) const
Estimate and return the size of the stack frame.
void setStackID(int ObjectIdx, uint8_t ID)
bool isCalleeSavedInfoValid() const
Has the callee saved info been calculated yet?
Align getObjectAlign(int ObjectIdx) const
Return the alignment of the specified stack object.
int64_t getObjectSize(int ObjectIdx) const
Return the size of the specified object.
bool isMaxCallFrameSizeComputed() const
bool hasStackMap() const
This method may be called any time after instruction selection is complete to determine if there is a...
const std::vector< CalleeSavedInfo > & getCalleeSavedInfo() const
Returns a reference to call saved info vector for the current function.
unsigned getNumObjects() const
Return the number of objects.
int getObjectIndexEnd() const
Return one past the maximum frame object index.
bool hasStackProtectorIndex() const
bool hasStackObjects() const
Return true if there are any stack objects in this function.
uint8_t getStackID(int ObjectIdx) const
unsigned getNumFixedObjects() const
Return the number of fixed objects.
int64_t getObjectOffset(int ObjectIdx) const
Return the assigned stack offset of the specified object from the incoming stack pointer.
int getObjectIndexBegin() const
Return the minimum frame object index.
void setObjectAlignment(int ObjectIdx, Align Alignment)
setObjectAlignment - Change the alignment of the specified stack object.
bool isDeadObjectIndex(int ObjectIdx) const
Returns true if the specified index corresponds to a dead object.
const WinEHFuncInfo * getWinEHFuncInfo() const
getWinEHFuncInfo - Return information about how the current function uses Windows exception handling.
const TargetSubtargetInfo & getSubtarget() const
getSubtarget - Return the subtarget for which this machine code is being compiled.
MachineMemOperand * getMachineMemOperand(MachinePointerInfo PtrInfo, MachineMemOperand::Flags f, LLT MemTy, Align base_alignment, const AAMDNodes &AAInfo=AAMDNodes(), const MDNode *Ranges=nullptr, SyncScope::ID SSID=SyncScope::System, AtomicOrdering Ordering=AtomicOrdering::NotAtomic, AtomicOrdering FailureOrdering=AtomicOrdering::NotAtomic)
getMachineMemOperand - Allocate a new MachineMemOperand.
MachineFrameInfo & getFrameInfo()
getFrameInfo - Return the frame info object for the current function.
MachineRegisterInfo & getRegInfo()
getRegInfo - Return information about the registers currently in use.
Function & getFunction()
Return the LLVM function that this machine code represents.
BasicBlockListType::iterator iterator
Ty * getInfo()
getInfo - Keep track of various per-function pieces of information for backends that would like to do...
const MachineBasicBlock & front() const
MachineBasicBlock * CreateMachineBasicBlock(const BasicBlock *BB=nullptr, std::optional< UniqueBBID > BBID=std::nullopt)
CreateMachineInstr - Allocate a new MachineInstr.
void insert(iterator MBBI, MachineBasicBlock *MBB)
const TargetMachine & getTarget() const
getTarget - Return the target machine this machine code is compiled with
const MachineInstrBuilder & setMemRefs(ArrayRef< MachineMemOperand * > MMOs) const
const MachineInstrBuilder & addExternalSymbol(const char *FnName, unsigned TargetFlags=0) const
const MachineInstrBuilder & setMIFlag(MachineInstr::MIFlag Flag) const
const MachineInstrBuilder & addImm(int64_t Val) const
Add a new immediate operand.
const MachineInstrBuilder & addFrameIndex(int Idx) const
const MachineInstrBuilder & addRegMask(const uint32_t *Mask) const
const MachineInstrBuilder & addReg(Register RegNo, unsigned flags=0, unsigned SubReg=0) const
Add a new virtual register operand.
const MachineInstrBuilder & addMBB(MachineBasicBlock *MBB, unsigned TargetFlags=0) const
const MachineInstrBuilder & setMIFlags(unsigned Flags) const
const MachineInstrBuilder & addMemOperand(MachineMemOperand *MMO) const
const MachineInstrBuilder & addDef(Register RegNo, unsigned Flags=0, unsigned SubReg=0) const
Add a virtual register definition operand.
Representation of each machine instruction.
void setFlags(unsigned flags)
LLVM_ABI void eraseFromParent()
Unlink 'this' from the containing basic block and delete it.
uint32_t getFlags() const
Return the MI flags bitvector.
A description of a memory reference used in the backend.
const PseudoSourceValue * getPseudoValue() const
@ MOLoad
The memory access reads data.
@ MOStore
The memory access writes data.
const Value * getValue() const
Return the base address of the memory access.
MachineOperand class - Representation of each machine instruction operand.
int64_t getImm() const
bool isFI() const
isFI - Tests if this is a MO_FrameIndex operand.
LLVM_ABI void emit(DiagnosticInfoOptimizationBase &OptDiag)
Emit an optimization remark.
MachineRegisterInfo - Keep track of information for virtual and physical registers,...
LLVM_ABI Register createVirtualRegister(const TargetRegisterClass *RegClass, StringRef Name="")
createVirtualRegister - Create and return a new virtual register in the function with the specified r...
LLVM_ABI bool isLiveIn(Register Reg) const
LLVM_ABI const MCPhysReg * getCalleeSavedRegs() const
Returns list of callee saved registers.
LLVM_ABI bool isPhysRegUsed(MCRegister PhysReg, bool SkipRegMaskTest=false) const
Return true if the specified register is modified or read in this function.
MutableArrayRef - Represent a mutable reference to an array (0 or more elements consecutively in memo...
Definition ArrayRef.h:303
Wrapper class representing virtual and physical registers.
Definition Register.h:19
SMEAttrs is a utility class to parse the SME ACLE attributes on functions.
bool hasStreamingInterface() const
bool hasNonStreamingInterfaceAndBody() const
bool hasStreamingBody() const
bool insert(const value_type &X)
Insert a new element into the SetVector.
Definition SetVector.h:150
A SetVector that performs no allocations if smaller than a certain size.
Definition SetVector.h:338
This class consists of common code factored out of the SmallVector class to reduce code duplication b...
reference emplace_back(ArgTypes &&... Args)
void append(ItTy in_start, ItTy in_end)
Add the specified range to the end of the SmallVector.
void push_back(const T &Elt)
This is a 'vector' (really, a variable-sized array), optimized for the case when the array is small.
StackOffset holds a fixed and a scalable offset in bytes.
Definition TypeSize.h:31
int64_t getFixed() const
Returns the fixed component of the stack.
Definition TypeSize.h:47
int64_t getScalable() const
Returns the scalable component of the stack.
Definition TypeSize.h:50
static StackOffset get(int64_t Fixed, int64_t Scalable)
Definition TypeSize.h:42
static StackOffset getScalable(int64_t Scalable)
Definition TypeSize.h:41
static StackOffset getFixed(int64_t Fixed)
Definition TypeSize.h:40
bool hasFP(const MachineFunction &MF) const
hasFP - Return true if the specified function should have a dedicated frame pointer register.
virtual void determineCalleeSaves(MachineFunction &MF, BitVector &SavedRegs, RegScavenger *RS=nullptr) const
This method determines which of the registers reported by TargetRegisterInfo::getCalleeSavedRegs() sh...
int getOffsetOfLocalArea() const
getOffsetOfLocalArea - This method returns the offset of the local area from the stack pointer on ent...
Align getStackAlign() const
getStackAlignment - This method returns the number of bytes to which the stack pointer must be aligne...
StackDirection getStackGrowthDirection() const
getStackGrowthDirection - Return the direction the stack grows
virtual bool enableCFIFixup(const MachineFunction &MF) const
Returns true if we may need to fix the unwind information for the function.
TargetInstrInfo - Interface to description of machine instruction set.
Primary interface to the complete machine description for the target machine.
TargetOptions Options
const MCAsmInfo * getMCAsmInfo() const
Return target specific asm information.
LLVM_ABI bool DisableFramePointerElim(const MachineFunction &MF) const
DisableFramePointerElim - This returns true if frame pointer elimination optimization should be disab...
TargetRegisterInfo base class - We assume that the target defines a static array of TargetRegisterDes...
bool hasStackRealignment(const MachineFunction &MF) const
True if stack realignment is required and still possible.
virtual const TargetInstrInfo * getInstrInfo() const
virtual const TargetRegisterInfo * getRegisterInfo() const =0
Return the target's register information.
Triple - Helper class for working with autoconf configuration names.
Definition Triple.h:47
This class implements an extremely fast bulk output stream that can only output to a stream.
Definition raw_ostream.h:53
#define llvm_unreachable(msg)
Marks that the current location is not supposed to be reachable.
static unsigned getShiftValue(unsigned Imm)
getShiftValue - Extract the shift value.
static unsigned getArithExtendImm(AArch64_AM::ShiftExtendType ET, unsigned Imm)
getArithExtendImm - Encode the extend type and shift amount for an arithmetic instruction: imm: 3-bit...
const unsigned StackProbeMaxLoopUnroll
Maximum number of iterations to unroll for a constant size probing loop.
const unsigned StackProbeMaxUnprobedStack
Maximum allowed number of unprobed bytes above SP at an ABI boundary.
constexpr char Align[]
Key for Kernel::Arg::Metadata::mAlign.
constexpr char Attrs[]
Key for Kernel::Metadata::mAttrs.
unsigned ID
LLVM IR allows to use arbitrary numbers as calling convention identifiers.
Definition CallingConv.h:24
@ AArch64_SVE_VectorCall
Used between AArch64 SVE functions.
@ PreserveMost
Used for runtime calls that preserves most registers.
Definition CallingConv.h:63
@ CXX_FAST_TLS
Used for access functions.
Definition CallingConv.h:72
@ GHC
Used by the Glasgow Haskell Compiler (GHC).
Definition CallingConv.h:50
@ PreserveAll
Used for runtime calls that preserves (almost) all registers.
Definition CallingConv.h:66
@ Fast
Attempts to make calls as fast as possible (e.g.
Definition CallingConv.h:41
@ PreserveNone
Used for runtime calls that preserves none general registers.
Definition CallingConv.h:90
@ Win64
The C convention as implemented on Windows/x86-64 and AArch64.
@ SwiftTail
This follows the Swift calling convention in how arguments are passed but guarantees tail calls will ...
Definition CallingConv.h:87
@ C
The default llvm calling convention, compatible with C.
Definition CallingConv.h:34
@ Define
Register definition.
initializer< Ty > init(const Ty &Val)
NodeAddr< InstrNode * > Instr
Definition RDFGraph.h:389
BaseReg
Stack frame base register. Bit 0 of FREInfo.Info.
Definition SFrame.h:77
This is an optimization pass for GlobalISel generic memory operations.
@ Offset
Definition DWP.cpp:477
void stable_sort(R &&Range)
Definition STLExtras.h:2038
MachineInstrBuilder BuildMI(MachineFunction &MF, const MIMetadata &MIMD, const MCInstrDesc &MCID)
Builder interface. Specify how to create the initial instruction itself.
int isAArch64FrameOffsetLegal(const MachineInstr &MI, StackOffset &Offset, bool *OutUseUnscaledOp=nullptr, unsigned *OutUnscaledOp=nullptr, int64_t *EmittableOffset=nullptr)
Check if the Offset is a valid frame offset for MI.
detail::scope_exit< std::decay_t< Callable > > make_scope_exit(Callable &&F)
Definition ScopeExit.h:59
decltype(auto) dyn_cast(const From &Val)
dyn_cast<X> - Return the argument parameter cast to the specified type.
Definition Casting.h:644
@ AArch64FrameOffsetCannotUpdate
Offset cannot apply.
auto dyn_cast_or_null(const Y &Val)
Definition Casting.h:754
bool any_of(R &&range, UnaryPredicate P)
Provide wrappers to std::any_of which take ranges instead of having to pass begin/end explicitly.
Definition STLExtras.h:1712
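A short sketch of the range form of any_of; the container and predicate are illustrative:

#include "llvm/ADT/STLExtras.h"
#include <vector>

bool hasNegativeOffset(const std::vector<int> &Offsets) {
  // True if any element satisfies the predicate; no begin()/end() needed.
  return llvm::any_of(Offsets, [](int O) { return O < 0; });
}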
auto formatv(bool Validate, const char *Fmt, Ts &&...Vals)
auto reverse(ContainerTy &&C)
Definition STLExtras.h:408
void sort(IteratorTy Start, IteratorTy End)
Definition STLExtras.h:1624
LLVM_ABI raw_ostream & dbgs()
dbgs() - This returns a reference to a raw_ostream for debugging messages.
Definition Debug.cpp:207
void emitFrameOffset(MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI, const DebugLoc &DL, unsigned DestReg, unsigned SrcReg, StackOffset Offset, const TargetInstrInfo *TII, MachineInstr::MIFlag=MachineInstr::NoFlags, bool SetNZCV=false, bool NeedsWinCFI=false, bool *HasWinCFI=nullptr, bool EmitCFAOffset=false, StackOffset InitialOffset={}, unsigned FrameReg=AArch64::SP)
emitFrameOffset - Emit instructions as needed to set DestReg to SrcReg plus Offset.
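A hedged sketch of a call to emitFrameOffset using the signature above; MBB, MBBI, DL and TII are assumed to be provided by the caller, and the 16-byte adjustment is illustrative:

// "sp = sp - 16", tagged as part of the prologue. The helper splits the
// offset into however many add/sub instructions are needed to encode it.
emitFrameOffset(MBB, MBBI, DL, AArch64::SP, AArch64::SP,
                StackOffset::getFixed(-16), TII, MachineInstr::FrameSetup);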
LLVM_ABI void report_fatal_error(Error Err, bool gen_crash_diag=true)
Definition Error.cpp:167
class LLVM_GSL_OWNER SmallVector
Forward declaration of SmallVector so that calculateSmallVectorDefaultInlinedElements can reference sizeof(SmallVector<T, 0>).
@ LLVM_MARK_AS_BITMASK_ENUM
Definition ModRef.h:37
@ First
Helpers to iterate all locations in the MemoryEffectsBase class.
Definition ModRef.h:71
unsigned getDefRegState(bool B)
unsigned getKillRegState(bool B)
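A sketch of how these helpers feed register-operand flags into a builder; the opcode and registers are illustrative:

// Mark x19 as killed by the store; the helper translates a bool into the
// corresponding RegState bit for addReg(). getDefRegState(true) would
// analogously mark an explicit definition.
BuildMI(MBB, MBBI, DL, TII->get(AArch64::STRXui))
    .addReg(AArch64::X19, getKillRegState(/*IsKill=*/true))
    .addReg(AArch64::SP)
    .addImm(0);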
uint16_t MCPhysReg
An unsigned integer type large enough to represent all physical registers, but not necessarily virtual registers.
Definition MCRegister.h:21
uint64_t alignTo(uint64_t Size, Align A)
Returns a multiple of A needed to store Size bytes.
Definition Alignment.h:144
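A tiny worked example of alignTo; the values are illustrative:

#include "llvm/Support/Alignment.h"
#include <cstdint>

// Rounds 40 up to the next multiple of 16, i.e. 48. A size that is already
// a multiple of the alignment is returned unchanged.
uint64_t Padded = llvm::alignTo(40, llvm::Align(16));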
raw_ostream & operator<<(raw_ostream &OS, const APFixedPoint &FX)
auto find_if(R &&Range, UnaryPredicate P)
Provide wrappers to std::find_if which take ranges instead of having to pass begin/end explicitly.
Definition STLExtras.h:1738
void erase_if(Container &C, UnaryPredicate P)
Provide a container algorithm similar to C++ Library Fundamentals v2's erase_if which is equivalent to: C.erase(remove_if(C.begin(), C.end(), pred), C.end());
Definition STLExtras.h:2100
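A short sketch of erase_if; the element type and predicate are illustrative:

#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallVector.h"

void dropZeroSized(llvm::SmallVector<unsigned, 8> &Sizes) {
  // Removes all zero entries in one pass, equivalent to the erase/remove idiom.
  llvm::erase_if(Sizes, [](unsigned S) { return S == 0; });
}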
bool is_contained(R &&Range, const E &Element)
Returns true if Element is found in Range.
Definition STLExtras.h:1877
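A minimal sketch of is_contained; the function name and parameters are only an example:

#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/STLExtras.h"

bool isCalleeSavedHere(llvm::ArrayRef<unsigned> CSRegs, unsigned Reg) {
  // Linear membership test over any range.
  return llvm::is_contained(CSRegs, Reg);
}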
LLVM_ABI const Value * getUnderlyingObject(const Value *V, unsigned MaxLookup=MaxLookupSearchDepth)
This method strips off any GEP address adjustments, pointer casts or llvm.threadlocal.address from the specified value, returning the original object being addressed.
void fullyRecomputeLiveIns(ArrayRef< MachineBasicBlock * > MBBs)
Convenience function for recomputing live-in's for a set of MBBs until the computation converges.
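A hedged sketch; the two blocks are assumed to have just been modified (for example by inserting prologue or epilogue code):

#include "llvm/CodeGen/LivePhysRegs.h"

// Recompute live-ins for both blocks, iterating until the sets converge.
fullyRecomputeLiveIns({&PrologueMBB, &EpilogueMBB});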
LLVM_ABI Printable printReg(Register Reg, const TargetRegisterInfo *TRI=nullptr, unsigned SubIdx=0, const MachineRegisterInfo *MRI=nullptr)
Prints virtual and physical registers with or without a TRI instance.
void swap(llvm::BitVector &LHS, llvm::BitVector &RHS)
Implement std::swap in terms of BitVector swap.
Definition BitVector.h:869
bool operator<(const StackAccess &Rhs) const
void print(raw_ostream &OS) const
int64_t start() const
std::string getTypeString() const
int64_t end() const
This struct is a compact representation of a valid (non-zero power of two) alignment.
Definition Alignment.h:39
constexpr uint64_t value() const
This is a hole in the type system and should not be abused.
Definition Alignment.h:77
Pair of physical register and lane mask.
static LLVM_ABI MachinePointerInfo getFixedStack(MachineFunction &MF, int FI, int64_t Offset=0)
Return a MachinePointerInfo record that refers to the specified FrameIndex.
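A sketch of how a fixed-stack MachinePointerInfo typically feeds a memory operand; MF, FI, Size and Alignment are assumed to describe an existing frame index:

// Build a store memory operand for frame index FI. The resulting MMO can be
// attached to a spill instruction via MachineInstrBuilder::addMemOperand.
MachineMemOperand *MMO = MF.getMachineMemOperand(
    MachinePointerInfo::getFixedStack(MF, FI), MachineMemOperand::MOStore,
    Size, Alignment);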
SmallVector< WinEHTryBlockMapEntry, 4 > TryBlockMap
SmallVector< WinEHHandlerType, 1 > HandlerArray