LLVM 22.0.0git
X86Disassembler.cpp
1//===-- X86Disassembler.cpp - Disassembler for x86 and x86_64 -------------===//
2//
3// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
4// See https://llvm.org/LICENSE.txt for license information.
5// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
6//
7//===----------------------------------------------------------------------===//
8//
9// This file is part of the X86 Disassembler.
10// It contains code to translate the data produced by the decoder into
11// MCInsts.
12//
13//
14// The X86 disassembler is a table-driven disassembler for the 16-, 32-, and
15// 64-bit X86 instruction sets. The main decode sequence for an assembly
16// instruction in this disassembler is:
17//
18// 1. Read the prefix bytes and determine the attributes of the instruction.
19// These attributes, recorded in enum attributeBits
20// (X86DisassemblerDecoderCommon.h), form a bitmask. The table CONTEXTS_SYM
21// provides a mapping from bitmasks to contexts, which are represented by
22// enum InstructionContext (ibid.).
23//
24// 2. Read the opcode, and determine what kind of opcode it is. The
25// disassembler distinguishes four kinds of opcodes, which are enumerated in
26// OpcodeType (X86DisassemblerDecoderCommon.h): one-byte (0xnn), two-byte
27// (0x0f 0xnn), three-byte-38 (0x0f 0x38 0xnn), or three-byte-3a
28// (0x0f 0x3a 0xnn). Mandatory prefixes are treated as part of the context.
29//
30// 3. Depending on the opcode type, look in one of four ClassDecision structures
31// (X86DisassemblerDecoderCommon.h). Use the opcode class to determine which
32// OpcodeDecision (ibid.) to look the opcode up in. Look up the opcode to get
33// a ModRMDecision (ibid.).
34//
35// 4. Some instructions, such as escape opcodes or extended opcodes, or even
36// instructions that have ModRM*Reg / ModRM*Mem forms in LLVM, need the
37// ModR/M byte to complete decode. The ModRMDecision's type is an entry from
38// ModRMDecisionType (X86DisassemblerDecoderCommon.h) that indicates if the
39// ModR/M byte is required and how to interpret it.
40//
41// 5. After resolving the ModRMDecision, the disassembler has a unique ID
42// of type InstrUID (X86DisassemblerDecoderCommon.h). Looking this ID up in
43// INSTRUCTIONS_SYM yields the name of the instruction and the encodings and
44// meanings of its operands.
45//
46// 6. For each operand, its encoding is an entry from OperandEncoding
47// (X86DisassemblerDecoderCommon.h) and its type is an entry from
48// OperandType (ibid.). The encoding indicates how to read it from the
49// instruction; the type indicates how to interpret the value once it has
50// been read. For example, a register operand could be stored in the R/M
51// field of the ModR/M byte, the REG field of the ModR/M byte, or added to
52// the main opcode. This is orthogonal to its meaning (a GPR or an XMM
53// register, for instance). Given this information, the operands can be
54// extracted and interpreted.
55//
56// 7. As the last step, the disassembler translates the instruction information
57// and operands into a format understandable by the client - in this case, an
58// MCInst for use by the MC infrastructure.
59//
60// The disassembler is broken broadly into two parts: the table emitter that
61// emits the instruction decode tables discussed above during compilation, and
62// the disassembler itself. The table emitter is documented in more detail in
63// utils/TableGen/X86DisassemblerEmitter.h.
64//
65// X86Disassembler.cpp contains the code responsible for step 7, and for
66// invoking the decoder to execute steps 1-6.
67// X86DisassemblerDecoderCommon.h contains the definitions needed by both the
68// table emitter and the disassembler.
69// X86DisassemblerDecoder.h contains the public interface of the decoder,
70// factored out into C for possible use by other projects.
71// X86DisassemblerDecoder.c contains the source code of the decoder, which is
72// responsible for steps 1-6.
73//
74//===----------------------------------------------------------------------===//
75
80#include "llvm-c/Visibility.h"
81#include "llvm/MC/MCContext.h"
83#include "llvm/MC/MCExpr.h"
84#include "llvm/MC/MCInst.h"
85#include "llvm/MC/MCInstrInfo.h"
88#include "llvm/Support/Debug.h"
89#include "llvm/Support/Format.h"
91
92using namespace llvm;
93using namespace llvm::X86Disassembler;
94
95#define DEBUG_TYPE "x86-disassembler"
96
97#define debug(s) LLVM_DEBUG(dbgs() << __LINE__ << ": " << s);
98
99// Specifies whether a ModR/M byte is needed and (if so) which
100// instruction each possible value of the ModR/M byte corresponds to. Once
101// this information is known, we have narrowed down to a single instruction.
106
107// Specifies which set of ModR/M->instruction tables to look at
108// given a particular opcode.
112
113// Specifies which opcode->instruction tables to look at given
114// a particular context (set of attributes). Since there are many possible
115// contexts, the decoder first uses CONTEXTS_SYM to determine which context
116// applies given a specific set of attributes. Hence there are only IC_max
117// entries in this table, rather than 2^(ATTR_max).
121
122#include "X86GenDisassemblerTables.inc"
123
124static InstrUID decode(OpcodeType type, InstructionContext insnContext,
125 uint8_t opcode, uint8_t modRM) {
126 const struct ModRMDecision *dec;
127
128 switch (type) {
129 case ONEBYTE:
130 dec = &ONEBYTE_SYM.opcodeDecisions[insnContext].modRMDecisions[opcode];
131 break;
132 case TWOBYTE:
133 dec = &TWOBYTE_SYM.opcodeDecisions[insnContext].modRMDecisions[opcode];
134 break;
135 case THREEBYTE_38:
136 dec = &THREEBYTE38_SYM.opcodeDecisions[insnContext].modRMDecisions[opcode];
137 break;
138 case THREEBYTE_3A:
139 dec = &THREEBYTE3A_SYM.opcodeDecisions[insnContext].modRMDecisions[opcode];
140 break;
141 case XOP8_MAP:
142 dec = &XOP8_MAP_SYM.opcodeDecisions[insnContext].modRMDecisions[opcode];
143 break;
144 case XOP9_MAP:
145 dec = &XOP9_MAP_SYM.opcodeDecisions[insnContext].modRMDecisions[opcode];
146 break;
147 case XOPA_MAP:
148 dec = &XOPA_MAP_SYM.opcodeDecisions[insnContext].modRMDecisions[opcode];
149 break;
150 case THREEDNOW_MAP:
151 dec =
152 &THREEDNOW_MAP_SYM.opcodeDecisions[insnContext].modRMDecisions[opcode];
153 break;
154 case MAP4:
155 dec = &MAP4_SYM.opcodeDecisions[insnContext].modRMDecisions[opcode];
156 break;
157 case MAP5:
158 dec = &MAP5_SYM.opcodeDecisions[insnContext].modRMDecisions[opcode];
159 break;
160 case MAP6:
161 dec = &MAP6_SYM.opcodeDecisions[insnContext].modRMDecisions[opcode];
162 break;
163 case MAP7:
164 dec = &MAP7_SYM.opcodeDecisions[insnContext].modRMDecisions[opcode];
165 break;
166 }
167
168 switch (dec->modrm_type) {
169 default:
170 llvm_unreachable("Corrupt table! Unknown modrm_type");
171 return 0;
172 case MODRM_ONEENTRY:
173 return modRMTable[dec->instructionIDs];
174 case MODRM_SPLITRM:
175 if (modFromModRM(modRM) == 0x3)
176 return modRMTable[dec->instructionIDs + 1];
177 return modRMTable[dec->instructionIDs];
178 case MODRM_SPLITREG:
179 if (modFromModRM(modRM) == 0x3)
180 return modRMTable[dec->instructionIDs + ((modRM & 0x38) >> 3) + 8];
181 return modRMTable[dec->instructionIDs + ((modRM & 0x38) >> 3)];
182 case MODRM_SPLITMISC:
183 if (modFromModRM(modRM) == 0x3)
184 return modRMTable[dec->instructionIDs + (modRM & 0x3f) + 8];
185 return modRMTable[dec->instructionIDs + ((modRM & 0x38) >> 3)];
186 case MODRM_FULL:
187 return modRMTable[dec->instructionIDs + modRM];
188 }
189}
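// Illustration only (not part of the original file): a minimal sketch of how
// the MODRM_SPLITREG case above turns a ModR/M byte into a table offset. The
// helper names are hypothetical. The sketch assumes the usual ModR/M layout
// (mod in bits 7-6, reg in bits 5-3), which matches the (modRM & 0x38) >> 3
// expression used by decode().

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration; these names are not part of LLVM.
constexpr uint8_t sketchMod(uint8_t modRM) { return (modRM >> 6) & 0x3; }
constexpr uint8_t sketchReg(uint8_t modRM) { return (modRM >> 3) & 0x7; }

// Offset into a 16-entry MODRM_SPLITREG group: entries 0-7 hold the memory
// forms (mod != 3), entries 8-15 hold the register forms (mod == 3).
constexpr unsigned sketchSplitRegOffset(uint8_t modRM) {
  return sketchMod(modRM) == 0x3 ? sketchReg(modRM) + 8u : sketchReg(modRM);
}

static_assert(sketchSplitRegOffset(0x18) == 3, "memory form, reg = 3");
static_assert(sketchSplitRegOffset(0xd8) == 11, "register form, reg = 3");
```

// The offset is added to dec->instructionIDs before indexing modRMTable.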
190
191static bool peek(struct InternalInstruction *insn, uint8_t &byte) {
192 uint64_t offset = insn->readerCursor - insn->startLocation;
193 if (offset >= insn->bytes.size())
194 return true;
195 byte = insn->bytes[offset];
196 return false;
197}
198
199template <typename T> static bool consume(InternalInstruction *insn, T &ptr) {
200 auto r = insn->bytes;
201 uint64_t offset = insn->readerCursor - insn->startLocation;
202 if (offset + sizeof(T) > r.size())
203 return true;
204 ptr = support::endian::read<T>(&r[offset], llvm::endianness::little);
205 insn->readerCursor += sizeof(T);
206 return false;
207}
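// Illustration only: a standalone sketch of the bounds-checked read that
// peek() and consume() perform. SketchReader and sketchConsume are
// hypothetical names; the sketch assumes multi-byte values are stored
// little-endian, as x86 immediates and displacements are. Like the functions
// above, it returns true on failure.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for the byte buffer and cursor kept in
// InternalInstruction.
struct SketchReader {
  std::vector<uint8_t> bytes;
  size_t cursor;
};

// Reads sizeof(T) bytes at the cursor as a little-endian integer.
// Returns true if the buffer is too short (nothing is consumed in that case).
template <typename T> bool sketchConsume(SketchReader &r, T &out) {
  if (r.cursor + sizeof(T) > r.bytes.size())
    return true;
  using U = typename std::make_unsigned<T>::type;
  U value = 0;
  // Assemble byte by byte so the result does not depend on host endianness.
  for (size_t i = 0; i < sizeof(T); ++i)
    value = static_cast<U>(value |
                           (static_cast<U>(r.bytes[r.cursor + i]) << (8 * i)));
  out = static_cast<T>(value);
  r.cursor += sizeof(T);
  return false;
}
```

// Reading into a signed type (int8_t, int16_t, int32_t) yields a sign-extended
// value once it is widened, which is how readDisplacement() uses consume().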
208
209static bool isREX(struct InternalInstruction *insn, uint8_t prefix) {
210 return insn->mode == MODE_64BIT && prefix >= 0x40 && prefix <= 0x4f;
211}
212
213static bool isREX2(struct InternalInstruction *insn, uint8_t prefix) {
214 return insn->mode == MODE_64BIT && prefix == 0xd5;
215}
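// Illustration only: a sketch of how a REX byte is recognised and split into
// its W, R, X and B bits. The names are hypothetical. The bit positions follow
// the simulated-REX expressions later in this file (W << 3, R << 2, X << 1,
// B << 0). Outside 64-bit mode the bytes 0x40-0x4f are ordinary INC/DEC
// opcodes, which is why the mode is part of the check.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical view of the four REX payload bits.
struct SketchREX {
  bool w; // 64-bit operand size
  bool r; // extends ModR/M.reg
  bool x; // extends SIB.index
  bool b; // extends ModR/M.rm or SIB.base
};

constexpr bool sketchIsREX(bool is64BitMode, uint8_t byte) {
  return is64BitMode && byte >= 0x40 && byte <= 0x4f;
}

inline SketchREX sketchSplitREX(uint8_t rex) {
  return {(rex & 0x8) != 0, (rex & 0x4) != 0, (rex & 0x2) != 0,
          (rex & 0x1) != 0};
}
```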
216
217// Consumes all of an instruction's prefix bytes, and marks the
218// instruction as having them. Also sets the instruction's default operand,
219// address, and other relevant data sizes to report operands correctly.
220//
221// insn must not be empty.
222static int readPrefixes(struct InternalInstruction *insn) {
223 bool isPrefix = true;
224 uint8_t byte = 0;
225 uint8_t nextByte;
226
227 LLVM_DEBUG(dbgs() << "readPrefixes()");
228
229 while (isPrefix) {
230 // If we fail reading prefixes, just stop here and let the opcode reader
231 // deal with it.
232 if (consume(insn, byte))
233 break;
234
235 // If the byte is a LOCK/REP/REPNE prefix and not a part of the opcode, then
236 // break and let it be disassembled as a normal "instruction".
237 if (insn->readerCursor - 1 == insn->startLocation && byte == 0xf0) // LOCK
238 break;
239
240 if ((byte == 0xf2 || byte == 0xf3) && !peek(insn, nextByte)) {
241 // If the byte is 0xf2 or 0xf3, and any of the following conditions are
242 // met:
243 // - it is followed by a LOCK (0xf0) prefix
244 // - it is followed by an xchg instruction
245 // then it should be disassembled as a xacquire/xrelease not repne/rep.
246 if (((nextByte == 0xf0) ||
247 ((nextByte & 0xfe) == 0x86 || (nextByte & 0xf8) == 0x90))) {
248 insn->xAcquireRelease = true;
249 if (!(byte == 0xf3 && nextByte == 0x90)) // PAUSE instruction support
250 break;
251 }
252 // Also if the byte is 0xf3, and the following condition is met:
253 // - it is followed by a "mov mem, reg" (opcode 0x88/0x89) or
254 // "mov mem, imm" (opcode 0xc6/0xc7) instructions.
255 // then it should be disassembled as an xrelease not rep.
256 if (byte == 0xf3 && (nextByte == 0x88 || nextByte == 0x89 ||
257 nextByte == 0xc6 || nextByte == 0xc7)) {
258 insn->xAcquireRelease = true;
259 break;
260 }
261 if (isREX(insn, nextByte)) {
262 uint8_t nnextByte;
263 // Go to REX prefix after the current one
264 if (consume(insn, nnextByte))
265 return -1;
266 // We should be able to read next byte after REX prefix
267 if (peek(insn, nnextByte))
268 return -1;
269 --insn->readerCursor;
270 }
271 }
272
273 switch (byte) {
274 case 0xf0: // LOCK
275 insn->hasLockPrefix = true;
276 break;
277 case 0xf2: // REPNE/REPNZ
278 case 0xf3: { // REP or REPE/REPZ
279 uint8_t nextByte;
280 if (peek(insn, nextByte))
281 break;
282 // TODO:
283 // 1. There could be several 0x66
284 // 2. if (nextByte == 0x66) and nextNextByte != 0x0f then
285 // it's not mandatory prefix
286 // 3. if (nextByte >= 0x40 && nextByte <= 0x4f) it's REX and we need
287 // 0x0f exactly after it to be mandatory prefix
288 // 4. if (nextByte == 0xd5) it's REX2 and we need
289 // 0x0f exactly after it to be mandatory prefix
290 if (isREX(insn, nextByte) || isREX2(insn, nextByte) || nextByte == 0x0f ||
291 nextByte == 0x66)
292 // The last 0xf2/0xf3 seen is the mandatory prefix
293 insn->mandatoryPrefix = byte;
294 insn->repeatPrefix = byte;
295 break;
296 }
297 case 0x2e: // CS segment override -OR- Branch not taken
299 break;
300 case 0x36: // SS segment override -OR- Branch taken
302 break;
303 case 0x3e: // DS segment override
305 break;
306 case 0x26: // ES segment override
308 break;
309 case 0x64: // FS segment override
311 break;
312 case 0x65: // GS segment override
314 break;
315 case 0x66: { // Operand-size override
316 uint8_t nextByte;
317 insn->hasOpSize = true;
318 if (peek(insn, nextByte))
319 break;
320 // 0x66 can't overwrite existing mandatory prefix and should be ignored
321 if (!insn->mandatoryPrefix && (nextByte == 0x0f || isREX(insn, nextByte)))
322 insn->mandatoryPrefix = byte;
323 break;
324 }
325 case 0x67: // Address-size override
326 insn->hasAdSize = true;
327 break;
328 default: // Not a prefix byte
329 isPrefix = false;
330 break;
331 }
332
333 if (isREX(insn, byte)) {
334 insn->rexPrefix = byte;
335 isPrefix = true;
336 LLVM_DEBUG(dbgs() << format("Found REX prefix 0x%hhx", byte));
337 } else if (isPrefix) {
338 insn->rexPrefix = 0;
339 }
340
341 if (isPrefix)
342 LLVM_DEBUG(dbgs() << format("Found prefix 0x%hhx", byte));
343 }
344
346
347 if (byte == 0x62) {
348 uint8_t byte1, byte2;
349 if (consume(insn, byte1)) {
350 LLVM_DEBUG(dbgs() << "Couldn't read second byte of EVEX prefix");
351 return -1;
352 }
353
354 if (peek(insn, byte2)) {
355 LLVM_DEBUG(dbgs() << "Couldn't read third byte of EVEX prefix");
356 return -1;
357 }
358
359 if ((insn->mode == MODE_64BIT || (byte1 & 0xc0) == 0xc0)) {
360 insn->vectorExtensionType = TYPE_EVEX;
361 } else {
362 --insn->readerCursor; // unconsume byte1
363 --insn->readerCursor; // unconsume byte
364 }
365
366 if (insn->vectorExtensionType == TYPE_EVEX) {
367 insn->vectorExtensionPrefix[0] = byte;
368 insn->vectorExtensionPrefix[1] = byte1;
369 if (consume(insn, insn->vectorExtensionPrefix[2])) {
370 LLVM_DEBUG(dbgs() << "Couldn't read third byte of EVEX prefix");
371 return -1;
372 }
373 if (consume(insn, insn->vectorExtensionPrefix[3])) {
374 LLVM_DEBUG(dbgs() << "Couldn't read fourth byte of EVEX prefix");
375 return -1;
376 }
377
378 if (insn->mode == MODE_64BIT) {
379 // We simulate the REX prefix for simplicity's sake
380 insn->rexPrefix = 0x40 |
381 (wFromEVEX3of4(insn->vectorExtensionPrefix[2]) << 3) |
382 (rFromEVEX2of4(insn->vectorExtensionPrefix[1]) << 2) |
383 (xFromEVEX2of4(insn->vectorExtensionPrefix[1]) << 1) |
384 (bFromEVEX2of4(insn->vectorExtensionPrefix[1]) << 0);
385
386 // We simulate the REX2 prefix for simplicity's sake
387 insn->rex2ExtensionPrefix[1] =
388 (r2FromEVEX2of4(insn->vectorExtensionPrefix[1]) << 6) |
389 (uFromEVEX3of4(insn->vectorExtensionPrefix[2]) << 5) |
390 (b2FromEVEX2of4(insn->vectorExtensionPrefix[1]) << 4);
391 }
392
393 LLVM_DEBUG(
394 dbgs() << format(
395 "Found EVEX prefix 0x%hhx 0x%hhx 0x%hhx 0x%hhx",
396 insn->vectorExtensionPrefix[0], insn->vectorExtensionPrefix[1],
397 insn->vectorExtensionPrefix[2], insn->vectorExtensionPrefix[3]));
398 }
399 } else if (byte == 0xc4) {
400 uint8_t byte1;
401 if (peek(insn, byte1)) {
402 LLVM_DEBUG(dbgs() << "Couldn't read second byte of VEX");
403 return -1;
404 }
405
406 if (insn->mode == MODE_64BIT || (byte1 & 0xc0) == 0xc0)
407 insn->vectorExtensionType = TYPE_VEX_3B;
408 else
409 --insn->readerCursor;
410
411 if (insn->vectorExtensionType == TYPE_VEX_3B) {
412 insn->vectorExtensionPrefix[0] = byte;
413 consume(insn, insn->vectorExtensionPrefix[1]);
414 consume(insn, insn->vectorExtensionPrefix[2]);
415
416 // We simulate the REX prefix for simplicity's sake
417
418 if (insn->mode == MODE_64BIT)
419 insn->rexPrefix = 0x40 |
420 (wFromVEX3of3(insn->vectorExtensionPrefix[2]) << 3) |
421 (rFromVEX2of3(insn->vectorExtensionPrefix[1]) << 2) |
422 (xFromVEX2of3(insn->vectorExtensionPrefix[1]) << 1) |
423 (bFromVEX2of3(insn->vectorExtensionPrefix[1]) << 0);
424
425 LLVM_DEBUG(dbgs() << format("Found VEX prefix 0x%hhx 0x%hhx 0x%hhx",
426 insn->vectorExtensionPrefix[0],
427 insn->vectorExtensionPrefix[1],
428 insn->vectorExtensionPrefix[2]));
429 }
430 } else if (byte == 0xc5) {
431 uint8_t byte1;
432 if (peek(insn, byte1)) {
433 LLVM_DEBUG(dbgs() << "Couldn't read second byte of VEX");
434 return -1;
435 }
436
437 if (insn->mode == MODE_64BIT || (byte1 & 0xc0) == 0xc0)
438 insn->vectorExtensionType = TYPE_VEX_2B;
439 else
440 --insn->readerCursor;
441
442 if (insn->vectorExtensionType == TYPE_VEX_2B) {
443 insn->vectorExtensionPrefix[0] = byte;
444 consume(insn, insn->vectorExtensionPrefix[1]);
445
446 if (insn->mode == MODE_64BIT)
447 insn->rexPrefix =
448 0x40 | (rFromVEX2of2(insn->vectorExtensionPrefix[1]) << 2);
449
450 switch (ppFromVEX2of2(insn->vectorExtensionPrefix[1])) {
451 default:
452 break;
453 case VEX_PREFIX_66:
454 insn->hasOpSize = true;
455 break;
456 }
457
458 LLVM_DEBUG(dbgs() << format("Found VEX prefix 0x%hhx 0x%hhx",
459 insn->vectorExtensionPrefix[0],
460 insn->vectorExtensionPrefix[1]));
461 }
462 } else if (byte == 0x8f) {
463 uint8_t byte1;
464 if (peek(insn, byte1)) {
465 LLVM_DEBUG(dbgs() << "Couldn't read second byte of XOP");
466 return -1;
467 }
468
469 if ((byte1 & 0x38) != 0x0) // 0 in these 3 bits is a POP instruction.
470 insn->vectorExtensionType = TYPE_XOP;
471 else
472 --insn->readerCursor;
473
474 if (insn->vectorExtensionType == TYPE_XOP) {
475 insn->vectorExtensionPrefix[0] = byte;
476 consume(insn, insn->vectorExtensionPrefix[1]);
477 consume(insn, insn->vectorExtensionPrefix[2]);
478
479 // We simulate the REX prefix for simplicity's sake
480
481 if (insn->mode == MODE_64BIT)
482 insn->rexPrefix = 0x40 |
483 (wFromXOP3of3(insn->vectorExtensionPrefix[2]) << 3) |
484 (rFromXOP2of3(insn->vectorExtensionPrefix[1]) << 2) |
485 (xFromXOP2of3(insn->vectorExtensionPrefix[1]) << 1) |
486 (bFromXOP2of3(insn->vectorExtensionPrefix[1]) << 0);
487
488 switch (ppFromXOP3of3(insn->vectorExtensionPrefix[2])) {
489 default:
490 break;
491 case VEX_PREFIX_66:
492 insn->hasOpSize = true;
493 break;
494 }
495
496 LLVM_DEBUG(dbgs() << format("Found XOP prefix 0x%hhx 0x%hhx 0x%hhx",
497 insn->vectorExtensionPrefix[0],
498 insn->vectorExtensionPrefix[1],
499 insn->vectorExtensionPrefix[2]));
500 }
501 } else if (isREX2(insn, byte)) {
502 uint8_t byte1;
503 if (peek(insn, byte1)) {
504 LLVM_DEBUG(dbgs() << "Couldn't read second byte of REX2");
505 return -1;
506 }
507 insn->rex2ExtensionPrefix[0] = byte;
508 consume(insn, insn->rex2ExtensionPrefix[1]);
509
510 // We simulate the REX prefix for simplicity's sake
511 insn->rexPrefix = 0x40 | (wFromREX2(insn->rex2ExtensionPrefix[1]) << 3) |
512 (rFromREX2(insn->rex2ExtensionPrefix[1]) << 2) |
513 (xFromREX2(insn->rex2ExtensionPrefix[1]) << 1) |
514 (bFromREX2(insn->rex2ExtensionPrefix[1]) << 0);
515 LLVM_DEBUG(dbgs() << format("Found REX2 prefix 0x%hhx 0x%hhx",
516 insn->rex2ExtensionPrefix[0],
517 insn->rex2ExtensionPrefix[1]));
518 } else
519 --insn->readerCursor;
520
521 if (insn->mode == MODE_16BIT) {
522 insn->registerSize = (insn->hasOpSize ? 4 : 2);
523 insn->addressSize = (insn->hasAdSize ? 4 : 2);
524 insn->displacementSize = (insn->hasAdSize ? 4 : 2);
525 insn->immediateSize = (insn->hasOpSize ? 4 : 2);
526 } else if (insn->mode == MODE_32BIT) {
527 insn->registerSize = (insn->hasOpSize ? 2 : 4);
528 insn->addressSize = (insn->hasAdSize ? 2 : 4);
529 insn->displacementSize = (insn->hasAdSize ? 2 : 4);
530 insn->immediateSize = (insn->hasOpSize ? 2 : 4);
531 } else if (insn->mode == MODE_64BIT) {
532 insn->displacementSize = 4;
533 if (insn->rexPrefix && wFromREX(insn->rexPrefix)) {
534 insn->registerSize = 8;
535 insn->addressSize = (insn->hasAdSize ? 4 : 8);
536 insn->immediateSize = 4;
537 insn->hasOpSize = false;
538 } else {
539 insn->registerSize = (insn->hasOpSize ? 2 : 4);
540 insn->addressSize = (insn->hasAdSize ? 4 : 8);
541 insn->immediateSize = (insn->hasOpSize ? 2 : 4);
542 }
543 }
544
545 return 0;
546}
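// Illustration only: the size selection at the end of readPrefixes() written
// as a pure function. SketchMode, SketchSizes and sketchOperandSizes are
// hypothetical names; the values mirror the assignments above. Unlike
// readPrefixes(), the sketch does not clear hasOpSize when REX.W is set.

```cpp
#include <cassert>

enum class SketchMode { Bit16, Bit32, Bit64 };

// All sizes are in bytes.
struct SketchSizes {
  unsigned registerSize;
  unsigned addressSize;
  unsigned displacementSize;
  unsigned immediateSize;
};

inline SketchSizes sketchOperandSizes(SketchMode mode, bool hasOpSize,
                                      bool hasAdSize, bool rexW) {
  switch (mode) {
  case SketchMode::Bit16: // 0x66/0x67 widen to 32 bits
    return {hasOpSize ? 4u : 2u, hasAdSize ? 4u : 2u, hasAdSize ? 4u : 2u,
            hasOpSize ? 4u : 2u};
  case SketchMode::Bit32: // 0x66/0x67 narrow to 16 bits
    return {hasOpSize ? 2u : 4u, hasAdSize ? 2u : 4u, hasAdSize ? 2u : 4u,
            hasOpSize ? 2u : 4u};
  case SketchMode::Bit64:
    if (rexW) // REX.W takes priority over the 0x66 prefix
      return {8u, hasAdSize ? 4u : 8u, 4u, 4u};
    return {hasOpSize ? 2u : 4u, hasAdSize ? 4u : 8u, 4u,
            hasOpSize ? 2u : 4u};
  }
  return {0u, 0u, 0u, 0u}; // not reached for valid modes
}
```

// Note that in 64-bit mode the displacement stays at 4 bytes and the immediate
// never exceeds 4 bytes here, even when the register size is 8.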
547
548// Consumes the SIB byte to determine addressing information.
549static int readSIB(struct InternalInstruction *insn) {
550 SIBBase sibBaseBase = SIB_BASE_NONE;
551 uint8_t index, base;
552
553 LLVM_DEBUG(dbgs() << "readSIB()");
554 switch (insn->addressSize) {
555 case 2:
556 default:
557 llvm_unreachable("SIB-based addressing doesn't work in 16-bit mode");
558 case 4:
559 insn->sibIndexBase = SIB_INDEX_EAX;
560 sibBaseBase = SIB_BASE_EAX;
561 break;
562 case 8:
563 insn->sibIndexBase = SIB_INDEX_RAX;
564 sibBaseBase = SIB_BASE_RAX;
565 break;
566 }
567
568 if (consume(insn, insn->sib))
569 return -1;
570
571 index = indexFromSIB(insn->sib) | (xFromREX(insn->rexPrefix) << 3) |
572 (x2FromREX2(insn->rex2ExtensionPrefix[1]) << 4);
573
574 if (index == 0x4) {
575 insn->sibIndex = SIB_INDEX_NONE;
576 } else {
577 insn->sibIndex = (SIBIndex)(insn->sibIndexBase + index);
578 }
579
580 insn->sibScale = 1 << scaleFromSIB(insn->sib);
581
582 base = baseFromSIB(insn->sib) | (bFromREX(insn->rexPrefix) << 3) |
583 (b2FromREX2(insn->rex2ExtensionPrefix[1]) << 4);
584
585 switch (base) {
586 case 0x5:
587 case 0xd:
588 switch (modFromModRM(insn->modRM)) {
589 case 0x0:
591 insn->sibBase = SIB_BASE_NONE;
592 break;
593 case 0x1:
595 insn->sibBase = (SIBBase)(sibBaseBase + base);
596 break;
597 case 0x2:
599 insn->sibBase = (SIBBase)(sibBaseBase + base);
600 break;
601 default:
602 llvm_unreachable("Cannot have Mod = 0b11 and a SIB byte");
603 }
604 break;
605 default:
606 insn->sibBase = (SIBBase)(sibBaseBase + base);
607 break;
608 }
609
610 return 0;
611}
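// Illustration only: a sketch of the SIB field extraction performed by
// readSIB(). SketchSIB and sketchDecodeSIB are hypothetical names. The sketch
// models only the REX.X and REX.B extension bits (not the REX2 bits) and does
// not handle the "base == 5 with mod == 0" special case that readSIB() covers.

```cpp
#include <cassert>
#include <cstdint>

struct SketchSIB {
  unsigned scale; // 1, 2, 4 or 8
  int index;      // register number, or -1 when no index register is used
  unsigned base;  // register number
};

inline SketchSIB sketchDecodeSIB(uint8_t sib, bool rexX, bool rexB) {
  // Layout: scale in bits 7-6, index in bits 5-3, base in bits 2-0.
  unsigned index = ((sib >> 3) & 0x7u) | (rexX ? 8u : 0u);
  unsigned base = (sib & 0x7u) | (rexB ? 8u : 0u);
  // Index 4 (without REX.X) encodes "no index register".
  return {1u << (sib >> 6), index == 4 ? -1 : static_cast<int>(index), base};
}
```

// As in readSIB(), the "no index" test is applied after the extension bit is
// merged in, so index 4 with REX.X set selects register 12 rather than "none".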
612
613static int readDisplacement(struct InternalInstruction *insn) {
614 int8_t d8;
615 int16_t d16;
616 int32_t d32;
617 LLVM_DEBUG(dbgs() << "readDisplacement()");
618
619 insn->displacementOffset = insn->readerCursor - insn->startLocation;
620 switch (insn->eaDisplacement) {
621 case EA_DISP_NONE:
622 break;
623 case EA_DISP_8:
624 if (consume(insn, d8))
625 return -1;
626 insn->displacement = d8;
627 break;
628 case EA_DISP_16:
629 if (consume(insn, d16))
630 return -1;
631 insn->displacement = d16;
632 break;
633 case EA_DISP_32:
634 if (consume(insn, d32))
635 return -1;
636 insn->displacement = d32;
637 break;
638 }
639
640 return 0;
641}
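// Illustration only: readDisplacement() reads into int8_t, int16_t or int32_t
// so that the stored displacement is sign-extended. The same effect can be
// sketched with explicit arithmetic; sketchSignExtend is a hypothetical helper
// and assumes bits is 8, 16 or 32.

```cpp
#include <cassert>
#include <cstdint>

// Interprets the low `bits` bits of raw as a two's-complement value.
inline int64_t sketchSignExtend(uint32_t raw, unsigned bits) {
  const uint64_t sign = uint64_t{1} << (bits - 1);
  const uint64_t mask = (sign << 1) - 1;
  const uint64_t value = raw & mask;
  // Flipping the sign bit and subtracting it back maps the upper half of the
  // unsigned range onto the negative numbers.
  return static_cast<int64_t>(value ^ sign) - static_cast<int64_t>(sign);
}
```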
642
643// Consumes all addressing information (ModR/M byte, SIB byte, and displacement).
644static int readModRM(struct InternalInstruction *insn) {
645 uint8_t mod, rm, reg;
646 LLVM_DEBUG(dbgs() << "readModRM()");
647
648 if (insn->consumedModRM)
649 return 0;
650
651 if (consume(insn, insn->modRM))
652 return -1;
653 insn->consumedModRM = true;
654
655 mod = modFromModRM(insn->modRM);
656 rm = rmFromModRM(insn->modRM);
657 reg = regFromModRM(insn->modRM);
658
659 // This goes by insn->registerSize to pick the correct register, which messes
660 // up if we're using (say) XMM or 8-bit register operands. That gets fixed in
661 // fixupReg().
662 switch (insn->registerSize) {
663 case 2:
664 insn->regBase = MODRM_REG_AX;
665 insn->eaRegBase = EA_REG_AX;
666 break;
667 case 4:
668 insn->regBase = MODRM_REG_EAX;
669 insn->eaRegBase = EA_REG_EAX;
670 break;
671 case 8:
672 insn->regBase = MODRM_REG_RAX;
673 insn->eaRegBase = EA_REG_RAX;
674 break;
675 }
676
677 reg |= (rFromREX(insn->rexPrefix) << 3) |
678 (r2FromREX2(insn->rex2ExtensionPrefix[1]) << 4);
679 rm |= (bFromREX(insn->rexPrefix) << 3) |
680 (b2FromREX2(insn->rex2ExtensionPrefix[1]) << 4);
681
682 if (insn->vectorExtensionType == TYPE_EVEX && insn->mode == MODE_64BIT)
683 reg |= r2FromEVEX2of4(insn->vectorExtensionPrefix[1]) << 4;
684
685 insn->reg = (Reg)(insn->regBase + reg);
686
687 switch (insn->addressSize) {
688 case 2: {
689 EABase eaBaseBase = EA_BASE_BX_SI;
690
691 switch (mod) {
692 case 0x0:
693 if (rm == 0x6) {
694 insn->eaBase = EA_BASE_NONE;
696 if (readDisplacement(insn))
697 return -1;
698 } else {
699 insn->eaBase = (EABase)(eaBaseBase + rm);
701 }
702 break;
703 case 0x1:
704 insn->eaBase = (EABase)(eaBaseBase + rm);
706 insn->displacementSize = 1;
707 if (readDisplacement(insn))
708 return -1;
709 break;
710 case 0x2:
711 insn->eaBase = (EABase)(eaBaseBase + rm);
713 if (readDisplacement(insn))
714 return -1;
715 break;
716 case 0x3:
717 insn->eaBase = (EABase)(insn->eaRegBase + rm);
718 if (readDisplacement(insn))
719 return -1;
720 break;
721 }
722 break;
723 }
724 case 4:
725 case 8: {
726 EABase eaBaseBase = (insn->addressSize == 4 ? EA_BASE_EAX : EA_BASE_RAX);
727
728 switch (mod) {
729 case 0x0:
730 insn->eaDisplacement = EA_DISP_NONE; // readSIB may override this
731 // In determining whether RIP-relative mode is used (rm=5),
732 // or whether a SIB byte is present (rm=4),
733 // the extension bits (REX.b and EVEX.x) are ignored.
734 switch (rm & 7) {
735 case 0x4: // SIB byte is present
736 insn->eaBase = (insn->addressSize == 4 ? EA_BASE_sib : EA_BASE_sib64);
737 if (readSIB(insn) || readDisplacement(insn))
738 return -1;
739 break;
740 case 0x5: // RIP-relative
741 insn->eaBase = EA_BASE_NONE;
743 if (readDisplacement(insn))
744 return -1;
745 break;
746 default:
747 insn->eaBase = (EABase)(eaBaseBase + rm);
748 break;
749 }
750 break;
751 case 0x1:
752 insn->displacementSize = 1;
753 [[fallthrough]];
754 case 0x2:
755 insn->eaDisplacement = (mod == 0x1 ? EA_DISP_8 : EA_DISP_32);
756 switch (rm & 7) {
757 case 0x4: // SIB byte is present
758 insn->eaBase = EA_BASE_sib;
759 if (readSIB(insn) || readDisplacement(insn))
760 return -1;
761 break;
762 default:
763 insn->eaBase = (EABase)(eaBaseBase + rm);
764 if (readDisplacement(insn))
765 return -1;
766 break;
767 }
768 break;
769 case 0x3:
771 insn->eaBase = (EABase)(insn->eaRegBase + rm);
772 break;
773 }
774 break;
775 }
776 } // switch (insn->addressSize)
777
778 return 0;
779}
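// Illustration only: a sketch of what the 32-/64-bit address-size branch of
// readModRM() decides from the ModR/M byte alone. SketchEA and
// sketchClassifyModRM are hypothetical names. The sketch assumes the standard
// x86 rules (mod = 1 means disp8, mod = 2 means disp32, rm = 5 with mod = 0
// means a bare disp32) and ignores the extra disp32 that a SIB byte with
// base = 5 and mod = 0 introduces.

```cpp
#include <cassert>
#include <cstdint>

struct SketchEA {
  bool isRegister;    // mod == 3: the R/M field names a register
  bool hasSIB;        // a SIB byte follows the ModR/M byte
  unsigned dispBytes; // size of the displacement that follows, in bytes
};

inline SketchEA sketchClassifyModRM(uint8_t modRM) {
  const unsigned mod = (modRM >> 6) & 0x3u;
  const unsigned rm = modRM & 0x7u; // extension bits are ignored on purpose
  if (mod == 3)
    return {true, false, 0u};
  const bool hasSIB = rm == 4;
  unsigned dispBytes = 0;
  if (mod == 0)
    dispBytes = rm == 5 ? 4u : 0u; // disp32 only (RIP-relative in 64-bit mode)
  else if (mod == 1)
    dispBytes = 1;
  else
    dispBytes = 4;
  return {false, hasSIB, dispBytes};
}
```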
780
781#define GENERIC_FIXUP_FUNC(name, base, prefix) \
782 static uint16_t name(struct InternalInstruction *insn, OperandType type, \
783 uint8_t index, uint8_t *valid) { \
784 *valid = 1; \
785 switch (type) { \
786 default: \
787 debug("Unhandled register type"); \
788 *valid = 0; \
789 return 0; \
790 case TYPE_Rv: \
791 return base + index; \
792 case TYPE_R8: \
793 if (insn->rexPrefix && index >= 4 && index <= 7) \
794 return prefix##_SPL + (index - 4); \
795 else \
796 return prefix##_AL + index; \
797 case TYPE_R16: \
798 return prefix##_AX + index; \
799 case TYPE_R32: \
800 return prefix##_EAX + index; \
801 case TYPE_R64: \
802 return prefix##_RAX + index; \
803 case TYPE_ZMM: \
804 return prefix##_ZMM0 + index; \
805 case TYPE_YMM: \
806 return prefix##_YMM0 + index; \
807 case TYPE_XMM: \
808 return prefix##_XMM0 + index; \
809 case TYPE_TMM: \
810 if (index > 7) \
811 *valid = 0; \
812 return prefix##_TMM0 + index; \
813 case TYPE_TMM_PAIR: \
814 if (index > 7) \
815 *valid = 0; \
816 return prefix##_TMM0_TMM1 + (index / 2); \
817 case TYPE_VK: \
818 index &= 0xf; \
819 if (index > 7) \
820 *valid = 0; \
821 return prefix##_K0 + index; \
822 case TYPE_VK_PAIR: \
823 if (index > 7) \
824 *valid = 0; \
825 return prefix##_K0_K1 + (index / 2); \
826 case TYPE_MM64: \
827 return prefix##_MM0 + (index & 0x7); \
828 case TYPE_SEGMENTREG: \
829 if ((index & 7) > 5) \
830 *valid = 0; \
831 return prefix##_ES + (index & 7); \
832 case TYPE_DEBUGREG: \
833 if (index > 15) \
834 *valid = 0; \
835 return prefix##_DR0 + index; \
836 case TYPE_CONTROLREG: \
837 if (index > 15) \
838 *valid = 0; \
839 return prefix##_CR0 + index; \
840 case TYPE_MVSIBX: \
841 return prefix##_XMM0 + index; \
842 case TYPE_MVSIBY: \
843 return prefix##_YMM0 + index; \
844 case TYPE_MVSIBZ: \
845 return prefix##_ZMM0 + index; \
846 } \
847 }
848
849// Consult an operand type to determine the meaning of the reg or R/M field. If
850// the operand is an XMM operand, for example, the register should be XMM0
851// rather than AX, which is how readModRM() would otherwise interpret it.
852//
853// @param insn - The instruction containing the operand.
854// @param type - The operand type.
855// @param index - The existing value of the field as reported by readModRM().
856// @param valid - The address of a uint8_t. The target is set to 1 if the
857// field is valid for the register class; 0 if not.
858// @return - The proper value.
859GENERIC_FIXUP_FUNC(fixupRegValue, insn->regBase, MODRM_REG)
860GENERIC_FIXUP_FUNC(fixupRMValue, insn->eaRegBase, EA_REG)
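// Illustration only: the TYPE_R8 case of GENERIC_FIXUP_FUNC depends on whether
// a REX prefix is present. A sketch using register names instead of enum
// values; sketchR8Name is a hypothetical helper and covers indices 0-7 only.

```cpp
#include <cassert>
#include <cstring>

// Returns the 8-bit register selected by a 3-bit index, or nullptr if the
// index is outside the range this sketch models.
inline const char *sketchR8Name(unsigned index, bool hasREX) {
  static const char *const Legacy[8] = {"al", "cl", "dl", "bl",
                                        "ah", "ch", "dh", "bh"};
  static const char *const WithREX[4] = {"spl", "bpl", "sil", "dil"};
  if (index > 7)
    return nullptr;
  // With any REX prefix, indices 4-7 stop meaning AH/CH/DH/BH.
  if (hasREX && index >= 4)
    return WithREX[index - 4];
  return Legacy[index];
}
```

// This mirrors the "prefix##_SPL + (index - 4)" versus "prefix##_AL + index"
// split in the macro above.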
861
862// Consult an operand specifier to determine which of the fixup*Value functions
863// to use in correcting readModRM()'s interpretation.
864//
865// @param insn - See fixup*Value().
866// @param op - The operand specifier.
867// @return - 0 if fixup was successful; -1 if the register returned was
868// invalid for its class.
869static int fixupReg(struct InternalInstruction *insn,
870 const struct OperandSpecifier *op) {
871 uint8_t valid;
872 LLVM_DEBUG(dbgs() << "fixupReg()");
873
874 switch ((OperandEncoding)op->encoding) {
875 default:
876 debug("Expected a REG or R/M encoding in fixupReg");
877 return -1;
878 case ENCODING_VVVV:
879 insn->vvvv =
880 (Reg)fixupRegValue(insn, (OperandType)op->type, insn->vvvv, &valid);
881 if (!valid)
882 return -1;
883 break;
884 case ENCODING_REG:
885 insn->reg = (Reg)fixupRegValue(insn, (OperandType)op->type,
886 insn->reg - insn->regBase, &valid);
887 if (!valid)
888 return -1;
889 break;
891 if (insn->vectorExtensionType == TYPE_EVEX && insn->mode == MODE_64BIT &&
892 modFromModRM(insn->modRM) == 3) {
893 // EVEX_X can extend the register id to 32 for a non-GPR register that is
894 // encoded in RM.
895 // mode : MODE_64_BIT
896 // Only 8 vector registers are available in 32 bit mode
897 // mod : 3
898 // RM encodes a register
899 switch (op->type) {
900 case TYPE_Rv:
901 case TYPE_R8:
902 case TYPE_R16:
903 case TYPE_R32:
904 case TYPE_R64:
905 break;
906 default:
907 insn->eaBase =
908 (EABase)(insn->eaBase +
909 (xFromEVEX2of4(insn->vectorExtensionPrefix[1]) << 4));
910 break;
911 }
912 }
913 [[fallthrough]];
914 case ENCODING_SIB:
915 if (insn->eaBase >= insn->eaRegBase) {
916 insn->eaBase = (EABase)fixupRMValue(
917 insn, (OperandType)op->type, insn->eaBase - insn->eaRegBase, &valid);
918 if (!valid)
919 return -1;
920 }
921 break;
922 }
923
924 return 0;
925}
926
927// Read the opcode (except the ModR/M byte in the case of extended or escape
928// opcodes).
929static bool readOpcode(struct InternalInstruction *insn) {
930 uint8_t current;
931 LLVM_DEBUG(dbgs() << "readOpcode()");
932
933 insn->opcodeType = ONEBYTE;
934 if (insn->vectorExtensionType == TYPE_EVEX) {
935 switch (mmmFromEVEX2of4(insn->vectorExtensionPrefix[1])) {
936 default:
937 LLVM_DEBUG(
938 dbgs() << format("Unhandled mmm field for instruction (0x%hhx)",
939 mmmFromEVEX2of4(insn->vectorExtensionPrefix[1])));
940 return true;
941 case VEX_LOB_0F:
942 insn->opcodeType = TWOBYTE;
943 return consume(insn, insn->opcode);
944 case VEX_LOB_0F38:
945 insn->opcodeType = THREEBYTE_38;
946 return consume(insn, insn->opcode);
947 case VEX_LOB_0F3A:
948 insn->opcodeType = THREEBYTE_3A;
949 return consume(insn, insn->opcode);
950 case VEX_LOB_MAP4:
951 insn->opcodeType = MAP4;
952 return consume(insn, insn->opcode);
953 case VEX_LOB_MAP5:
954 insn->opcodeType = MAP5;
955 return consume(insn, insn->opcode);
956 case VEX_LOB_MAP6:
957 insn->opcodeType = MAP6;
958 return consume(insn, insn->opcode);
959 case VEX_LOB_MAP7:
960 insn->opcodeType = MAP7;
961 return consume(insn, insn->opcode);
962 }
963 } else if (insn->vectorExtensionType == TYPE_VEX_3B) {
964 switch (mmmmmFromVEX2of3(insn->vectorExtensionPrefix[1])) {
965 default:
966 LLVM_DEBUG(
967 dbgs() << format("Unhandled m-mmmm field for instruction (0x%hhx)",
968 mmmmmFromVEX2of3(insn->vectorExtensionPrefix[1])));
969 return true;
970 case VEX_LOB_0F:
971 insn->opcodeType = TWOBYTE;
972 return consume(insn, insn->opcode);
973 case VEX_LOB_0F38:
974 insn->opcodeType = THREEBYTE_38;
975 return consume(insn, insn->opcode);
976 case VEX_LOB_0F3A:
977 insn->opcodeType = THREEBYTE_3A;
978 return consume(insn, insn->opcode);
979 case VEX_LOB_MAP5:
980 insn->opcodeType = MAP5;
981 return consume(insn, insn->opcode);
982 case VEX_LOB_MAP6:
983 insn->opcodeType = MAP6;
984 return consume(insn, insn->opcode);
985 case VEX_LOB_MAP7:
986 insn->opcodeType = MAP7;
987 return consume(insn, insn->opcode);
988 }
989 } else if (insn->vectorExtensionType == TYPE_VEX_2B) {
990 insn->opcodeType = TWOBYTE;
991 return consume(insn, insn->opcode);
992 } else if (insn->vectorExtensionType == TYPE_XOP) {
993 switch (mmmmmFromXOP2of3(insn->vectorExtensionPrefix[1])) {
994 default:
995 LLVM_DEBUG(
996 dbgs() << format("Unhandled m-mmmm field for instruction (0x%hhx)",
997 mmmmmFromXOP2of3(insn->vectorExtensionPrefix[1])));
998 return true;
999 case XOP_MAP_SELECT_8:
1000 insn->opcodeType = XOP8_MAP;
1001 return consume(insn, insn->opcode);
1002 case XOP_MAP_SELECT_9:
1003 insn->opcodeType = XOP9_MAP;
1004 return consume(insn, insn->opcode);
1005 case XOP_MAP_SELECT_A:
1006 insn->opcodeType = XOPA_MAP;
1007 return consume(insn, insn->opcode);
1008 }
1009 } else if (mFromREX2(insn->rex2ExtensionPrefix[1])) {
1010 // m bit indicates opcode map 1
1011 insn->opcodeType = TWOBYTE;
1012 return consume(insn, insn->opcode);
1013 }
1014
1015 if (consume(insn, current))
1016 return true;
1017
1018 if (current == 0x0f) {
1019 LLVM_DEBUG(
1020 dbgs() << format("Found a two-byte escape prefix (0x%hhx)", current));
1021 if (consume(insn, current))
1022 return true;
1023
1024 if (current == 0x38) {
1025 LLVM_DEBUG(dbgs() << format("Found a three-byte escape prefix (0x%hhx)",
1026 current));
1027 if (consume(insn, current))
1028 return true;
1029
1030 insn->opcodeType = THREEBYTE_38;
1031 } else if (current == 0x3a) {
1032 LLVM_DEBUG(dbgs() << format("Found a three-byte escape prefix (0x%hhx)",
1033 current));
1034 if (consume(insn, current))
1035 return true;
1036
1037 insn->opcodeType = THREEBYTE_3A;
1038 } else if (current == 0x0f) {
1039 LLVM_DEBUG(
1040 dbgs() << format("Found a 3dnow escape prefix (0x%hhx)", current));
1041
1042 // Consume operands before the opcode to comply with the 3DNow encoding
1043 if (readModRM(insn))
1044 return true;
1045
1046 if (consume(insn, current))
1047 return true;
1048
1049 insn->opcodeType = THREEDNOW_MAP;
1050 } else {
1051 LLVM_DEBUG(dbgs() << "Didn't find a three-byte escape prefix");
1052 insn->opcodeType = TWOBYTE;
1053 }
1054 } else if (insn->mandatoryPrefix)
1055 // An opcode with a mandatory prefix must start with an opcode escape.
1056 // If it does not, the prefix is a legacy repeat prefix.
1057 insn->mandatoryPrefix = 0;
1058
1059 // At this point we have consumed the full opcode.
1060 // Anything we consume from here on must be unconsumed.
1061 insn->opcode = current;
1062
1063 return false;
1064}
1065
1066// Determine whether equiv is the 16-bit equivalent of orig (32-bit or 64-bit).
1067static bool is16BitEquivalent(const char *orig, const char *equiv) {
1068 for (int i = 0;; i++) {
1069 if (orig[i] == '\0' && equiv[i] == '\0')
1070 return true;
1071 if (orig[i] == '\0' || equiv[i] == '\0')
1072 return false;
1073 if (orig[i] != equiv[i]) {
1074 if ((orig[i] == 'Q' || orig[i] == 'L') && equiv[i] == 'W')
1075 continue;
1076 if ((orig[i] == '6' || orig[i] == '3') && equiv[i] == '1')
1077 continue;
1078 if ((orig[i] == '4' || orig[i] == '2') && equiv[i] == '6')
1079 continue;
1080 return false;
1081 }
1082 }
1083}
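// A minimal standalone sketch of the name comparison above, for experimenting
// outside the disassembler. The function name is hypothetical and the opcode
// names in the usage note are illustrative inputs; only the comparison rules
// ('Q'/'L' -> 'W', "64"/"32" -> "16") are taken from is16BitEquivalent.

```cpp
// Returns true when `equiv` matches `orig` except that size markers in
// `orig` have been replaced by their 16-bit counterparts.
inline bool is16BitEquivalentSketch(const char *orig, const char *equiv) {
  for (int i = 0;; ++i) {
    // Both names ended at the same position: every character matched.
    if (orig[i] == '\0' && equiv[i] == '\0')
      return true;
    // One name is longer than the other.
    if (orig[i] == '\0' || equiv[i] == '\0')
      return false;
    if (orig[i] != equiv[i]) {
      // Suffix letters: Q (64-bit) or L (32-bit) becomes W (16-bit).
      if ((orig[i] == 'Q' || orig[i] == 'L') && equiv[i] == 'W')
        continue;
      // First digit of "64" or "32" becomes the '1' of "16".
      if ((orig[i] == '6' || orig[i] == '3') && equiv[i] == '1')
        continue;
      // Second digit of "64" or "32" becomes the '6' of "16".
      if ((orig[i] == '4' || orig[i] == '2') && equiv[i] == '6')
        continue;
      return false;
    }
  }
}
```

// With this sketch, ("MOV32rr", "MOV16rr") and ("PUSHL", "PUSHW") compare as
// equivalent, while ("ADD32rr", "SUB16rr") does not. The check is per
// character, so it is a heuristic over names rather than a table lookup.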
1084
1085// Determine whether this instruction is a 64-bit instruction.
1086static bool is64Bit(const char *name) {
1087 for (int i = 0;; ++i) {
1088 if (name[i] == '\0')
1089 return false;
1090 if (name[i] == '6' && name[i + 1] == '4')
1091 return true;
1092 }
1093}
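// The scan above amounts to asking whether the name contains the substring
// "64". A hedged sketch of that reading, using a hypothetical helper name and
// only the C standard library:

```cpp
#include <cstring>

// Returns true when the instruction name contains "64" anywhere.
// This mirrors the character-by-character loop: a '6' immediately
// followed by a '4' at any position counts as a match.
inline bool nameContains64Sketch(const char *name) {
  return std::strstr(name, "64") != nullptr;
}
```

// For example, "MOV64rr" matches, while "MOV32rr" and the empty string do
// not. Note that a name such as "ADD6" is safe in both versions: the
// character after the '6' is the terminating NUL, which is not '4'.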
1094
1095// Determine the ID of an instruction, consuming the ModR/M byte as appropriate
1096// for extended and escape opcodes, and using a supplied attribute mask.
1097static int getInstructionIDWithAttrMask(uint16_t *instructionID,
1098 struct InternalInstruction *insn,
1099 uint16_t attrMask) {
1100 auto insnCtx = InstructionContext(x86DisassemblerContexts[attrMask]);
1101 const ContextDecision *decision;
1102 switch (insn->opcodeType) {
1103 case ONEBYTE:
1104 decision = &ONEBYTE_SYM;
1105 break;
1106 case TWOBYTE:
1107 decision = &TWOBYTE_SYM;
1108 break;
1109 case THREEBYTE_38:
1110 decision = &THREEBYTE38_SYM;
1111 break;
1112 case THREEBYTE_3A:
1113 decision = &THREEBYTE3A_SYM;
1114 break;
1115 case XOP8_MAP:
1116 decision = &XOP8_MAP_SYM;
1117 break;
1118 case XOP9_MAP:
1119 decision = &XOP9_MAP_SYM;
1120 break;
1121 case XOPA_MAP:
1122 decision = &XOPA_MAP_SYM;
1123 break;
1124 case THREEDNOW_MAP:
1125 decision = &THREEDNOW_MAP_SYM;
1126 break;
1127 case MAP4:
1128 decision = &MAP4_SYM;
1129 break;
1130 case MAP5:
1131 decision = &MAP5_SYM;
1132 break;
1133 case MAP6:
1134 decision = &MAP6_SYM;
1135 break;
1136 case MAP7:
1137 decision = &MAP7_SYM;
1138 break;
1139 }
1140
1141 if (decision->opcodeDecisions[insnCtx]
1142 .modRMDecisions[insn->opcode]
1143 .modrm_type != MODRM_ONEENTRY) {
1144 if (readModRM(insn))
1145 return -1;
1146 *instructionID =
1147 decode(insn->opcodeType, insnCtx, insn->opcode, insn->modRM);
1148 } else {
1149 *instructionID = decode(insn->opcodeType, insnCtx, insn->opcode, 0);
1150 }
1151
1152 return 0;
1153}
1154
1155 static bool isCCMPOrCTEST(InternalInstruction *insn) {
1156 if (insn->opcodeType != MAP4)
1157 return false;
1158 if (insn->opcode == 0x83 && regFromModRM(insn->modRM) == 7)
1159 return true;
1160 switch (insn->opcode & 0xfe) {
1161 default:
1162 return false;
1163 case 0x38:
1164 case 0x3a:
1165 case 0x84:
1166 return true;
1167 case 0x80:
1168 return regFromModRM(insn->modRM) == 7;
1169 case 0xf6:
1170 return regFromModRM(insn->modRM) == 0;
1171 }
1172}
1173
1174static bool isNF(InternalInstruction *insn) {
1176 return false;
1177 if (insn->opcodeType == MAP4)
1178 return true;
1179 // The NF instructions below are not in MAP4.
1180 if (insn->opcodeType == THREEBYTE_38 &&
1182 switch (insn->opcode) {
1183 case 0xf2: // ANDN
1184 case 0xf3: // BLSI, BLSR, BLSMSK
1185 case 0xf5: // BZHI
1186 case 0xf7: // BEXTR
1187 return true;
1188 default:
1189 break;
1190 }
1191 }
1192 return false;
1193}
1194
1195// Determine the ID of an instruction, consuming the ModR/M byte as appropriate
1196// for extended and escape opcodes. Determines the attributes and context for
1197// the instruction before doing so.
1198 static int getInstructionID(struct InternalInstruction *insn,
1199 const MCInstrInfo *mii) {
1200 uint16_t attrMask;
1201 uint16_t instructionID;
1202
1203 LLVM_DEBUG(dbgs() << "getID()");
1204
1205 attrMask = ATTR_NONE;
1206
1207 if (insn->mode == MODE_64BIT)
1208 attrMask |= ATTR_64BIT;
1209
1210 if (insn->vectorExtensionType != TYPE_NO_VEX_XOP) {
1211 attrMask |= (insn->vectorExtensionType == TYPE_EVEX) ? ATTR_EVEX : ATTR_VEX;
1212
1213 if (insn->vectorExtensionType == TYPE_EVEX) {
1214 switch (ppFromEVEX3of4(insn->vectorExtensionPrefix[2])) {
1215 case VEX_PREFIX_66:
1216 attrMask |= ATTR_OPSIZE;
1217 break;
1218 case VEX_PREFIX_F3:
1219 attrMask |= ATTR_XS;
1220 break;
1221 case VEX_PREFIX_F2:
1222 attrMask |= ATTR_XD;
1223 break;
1224 }
1225
1227 attrMask |= ATTR_EVEXKZ;
1228 if (isNF(insn) && !readModRM(insn) &&
1229 !isCCMPOrCTEST(insn)) // NF bit is the MSB of aaa.
1230 attrMask |= ATTR_EVEXNF;
1231 // aaa is not used as an opmask in MAP4
1232 else if (aaaFromEVEX4of4(insn->vectorExtensionPrefix[3]) &&
1233 (insn->opcodeType != MAP4))
1234 attrMask |= ATTR_EVEXK;
1235 if (bFromEVEX4of4(insn->vectorExtensionPrefix[3])) {
1236 attrMask |= ATTR_EVEXB;
1237 if (uFromEVEX3of4(insn->vectorExtensionPrefix[2]) && !readModRM(insn) &&
1238 modFromModRM(insn->modRM) == 3)
1239 attrMask |= ATTR_EVEXU;
1240 }
1242 attrMask |= ATTR_VEXL;
1244 attrMask |= ATTR_EVEXL2;
1245 } else if (insn->vectorExtensionType == TYPE_VEX_3B) {
1246 switch (ppFromVEX3of3(insn->vectorExtensionPrefix[2])) {
1247 case VEX_PREFIX_66:
1248 attrMask |= ATTR_OPSIZE;
1249 break;
1250 case VEX_PREFIX_F3:
1251 attrMask |= ATTR_XS;
1252 break;
1253 case VEX_PREFIX_F2:
1254 attrMask |= ATTR_XD;
1255 break;
1256 }
1257
1258 if (lFromVEX3of3(insn->vectorExtensionPrefix[2]))
1259 attrMask |= ATTR_VEXL;
1260 } else if (insn->vectorExtensionType == TYPE_VEX_2B) {
1261 switch (ppFromVEX2of2(insn->vectorExtensionPrefix[1])) {
1262 case VEX_PREFIX_66:
1263 attrMask |= ATTR_OPSIZE;
1264 if (insn->hasAdSize)
1265 attrMask |= ATTR_ADSIZE;
1266 break;
1267 case VEX_PREFIX_F3:
1268 attrMask |= ATTR_XS;
1269 break;
1270 case VEX_PREFIX_F2:
1271 attrMask |= ATTR_XD;
1272 break;
1273 }
1274
1275 if (lFromVEX2of2(insn->vectorExtensionPrefix[1]))
1276 attrMask |= ATTR_VEXL;
1277 } else if (insn->vectorExtensionType == TYPE_XOP) {
1278 switch (ppFromXOP3of3(insn->vectorExtensionPrefix[2])) {
1279 case VEX_PREFIX_66:
1280 attrMask |= ATTR_OPSIZE;
1281 break;
1282 case VEX_PREFIX_F3:
1283 attrMask |= ATTR_XS;
1284 break;
1285 case VEX_PREFIX_F2:
1286 attrMask |= ATTR_XD;
1287 break;
1288 }
1289
1290 if (lFromXOP3of3(insn->vectorExtensionPrefix[2]))
1291 attrMask |= ATTR_VEXL;
1292 } else {
1293 return -1;
1294 }
1295 } else if (!insn->mandatoryPrefix) {
1296 // If there is no mandatory prefix, use the legacy prefixes here.
1297 if (insn->hasOpSize && (insn->mode != MODE_16BIT))
1298 attrMask |= ATTR_OPSIZE;
1299 if (insn->hasAdSize)
1300 attrMask |= ATTR_ADSIZE;
1301 if (insn->opcodeType == ONEBYTE) {
1302 if (insn->repeatPrefix == 0xf3 && (insn->opcode == 0x90))
1303 // Special support for PAUSE
1304 attrMask |= ATTR_XS;
1305 } else {
1306 if (insn->repeatPrefix == 0xf2)
1307 attrMask |= ATTR_XD;
1308 else if (insn->repeatPrefix == 0xf3)
1309 attrMask |= ATTR_XS;
1310 }
1311 } else {
1312 switch (insn->mandatoryPrefix) {
1313 case 0xf2:
1314 attrMask |= ATTR_XD;
1315 break;
1316 case 0xf3:
1317 attrMask |= ATTR_XS;
1318 break;
1319 case 0x66:
1320 if (insn->mode != MODE_16BIT)
1321 attrMask |= ATTR_OPSIZE;
1322 if (insn->hasAdSize)
1323 attrMask |= ATTR_ADSIZE;
1324 break;
1325 case 0x67:
1326 attrMask |= ATTR_ADSIZE;
1327 break;
1328 }
1329 }
1330
1331 if (insn->rexPrefix & 0x08) {
1332 attrMask |= ATTR_REXW;
1333 attrMask &= ~ATTR_ADSIZE;
1334 }
1335
1336 // Absolute jump and pushp/popp need special handling
1337 if (insn->rex2ExtensionPrefix[0] == 0xd5 && insn->opcodeType == ONEBYTE &&
1338 (insn->opcode == 0xA1 || (insn->opcode & 0xf0) == 0x50))
1339 attrMask |= ATTR_REX2;
1340
1341 if (insn->mode == MODE_16BIT) {
1342 // JCXZ/JECXZ need special handling for 16-bit mode because the meaning
1343 // of the AdSize prefix is inverted w.r.t. 32-bit mode.
1344 if (insn->opcodeType == ONEBYTE && insn->opcode == 0xE3)
1345 attrMask ^= ATTR_ADSIZE;
1346 // If we're in 16-bit mode and this is one of the relative jumps and opsize
1347 // prefix isn't present, we need to force the opsize attribute since the
1348 // prefix is inverted relative to 32-bit mode.
1349 if (!insn->hasOpSize && insn->opcodeType == ONEBYTE &&
1350 (insn->opcode == 0xE8 || insn->opcode == 0xE9))
1351 attrMask |= ATTR_OPSIZE;
1352
1353 if (!insn->hasOpSize && insn->opcodeType == TWOBYTE &&
1354 insn->opcode >= 0x80 && insn->opcode <= 0x8F)
1355 attrMask |= ATTR_OPSIZE;
1356 }
1357
1358
1359 if (getInstructionIDWithAttrMask(&instructionID, insn, attrMask))
1360 return -1;
1361
1362 // The following clauses compensate for limitations of the tables.
1363
1364 if (insn->mode != MODE_64BIT &&
1366 // The tables can't distinguish between cases where the W-bit is used to
1367 // select register size and cases where it's a required part of the opcode.
1368 if ((insn->vectorExtensionType == TYPE_EVEX &&
1370 (insn->vectorExtensionType == TYPE_VEX_3B &&
1372 (insn->vectorExtensionType == TYPE_XOP &&
1374
1375 uint16_t instructionIDWithREXW;
1376 if (getInstructionIDWithAttrMask(&instructionIDWithREXW, insn,
1377 attrMask | ATTR_REXW)) {
1378 insn->instructionID = instructionID;
1379 insn->spec = &INSTRUCTIONS_SYM[instructionID];
1380 return 0;
1381 }
1382
1383 auto SpecName = mii->getName(instructionIDWithREXW);
1384 // If this is not a 64-bit instruction, switch the opcode.
1385 if (!is64Bit(SpecName.data())) {
1386 insn->instructionID = instructionIDWithREXW;
1387 insn->spec = &INSTRUCTIONS_SYM[instructionIDWithREXW];
1388 return 0;
1389 }
1390 }
1391 }
1392
1393 // Absolute moves, umonitor, and movdir64b need special handling.
1394 // -For 16-bit mode because the meanings of the AdSize and OpSize prefixes
1395 // are inverted w.r.t. 32-bit mode.
1396 // -For 32-bit mode we need to ensure the ADSIZE prefix is observed in
1397 // any position.
1398 if ((insn->opcodeType == ONEBYTE && ((insn->opcode & 0xFC) == 0xA0)) ||
1399 (insn->opcodeType == TWOBYTE && (insn->opcode == 0xAE)) ||
1400 (insn->opcodeType == THREEBYTE_38 && insn->opcode == 0xF8) ||
1401 (insn->opcodeType == MAP4 && insn->opcode == 0xF8)) {
1402 // Make sure we observed the prefixes in any position.
1403 if (insn->hasAdSize)
1404 attrMask |= ATTR_ADSIZE;
1405 if (insn->hasOpSize)
1406 attrMask |= ATTR_OPSIZE;
1407
1408 // In 16-bit, invert the attributes.
1409 if (insn->mode == MODE_16BIT) {
1410 attrMask ^= ATTR_ADSIZE;
1411
1412 // The OpSize attribute is only valid with the absolute moves.
1413 if (insn->opcodeType == ONEBYTE && ((insn->opcode & 0xFC) == 0xA0))
1414 attrMask ^= ATTR_OPSIZE;
1415 }
1416
1417 if (getInstructionIDWithAttrMask(&instructionID, insn, attrMask))
1418 return -1;
1419
1420 insn->instructionID = instructionID;
1421 insn->spec = &INSTRUCTIONS_SYM[instructionID];
1422 return 0;
1423 }
1424
1425 if ((insn->mode == MODE_16BIT || insn->hasOpSize) &&
1426 !(attrMask & ATTR_OPSIZE)) {
1427 // The instruction tables make no distinction between instructions that
1428 // allow OpSize anywhere (i.e., 16-bit operations) and that need it in a
1429 // particular spot (i.e., many MMX operations). In general we're
1430 // conservative, but in the specific case where OpSize is present but not in
1431 // the right place we check if there's a 16-bit operation.
1432 const struct InstructionSpecifier *spec;
1433 uint16_t instructionIDWithOpsize;
1434 llvm::StringRef specName, specWithOpSizeName;
1435
1436 spec = &INSTRUCTIONS_SYM[instructionID];
1437
1438 if (getInstructionIDWithAttrMask(&instructionIDWithOpsize, insn,
1439 attrMask | ATTR_OPSIZE)) {
1440 // ModRM required with OpSize but not present. Give up and return the
1441 // version without OpSize set.
1442 insn->instructionID = instructionID;
1443 insn->spec = spec;
1444 return 0;
1445 }
1446
1447 specName = mii->getName(instructionID);
1448 specWithOpSizeName = mii->getName(instructionIDWithOpsize);
1449
1450 if (is16BitEquivalent(specName.data(), specWithOpSizeName.data()) &&
1451 (insn->mode == MODE_16BIT) ^ insn->hasOpSize) {
1452 insn->instructionID = instructionIDWithOpsize;
1453 insn->spec = &INSTRUCTIONS_SYM[instructionIDWithOpsize];
1454 } else {
1455 insn->instructionID = instructionID;
1456 insn->spec = spec;
1457 }
1458 return 0;
1459 }
1460
1461 if (insn->opcodeType == ONEBYTE && insn->opcode == 0x90 &&
1462 insn->rexPrefix & 0x01) {
1463 // NOOP shouldn't decode as NOOP if REX.b is set. Instead it should decode
1464 // as XCHG %r8, %eax.
1465 const struct InstructionSpecifier *spec;
1466 uint16_t instructionIDWithNewOpcode;
1467 const struct InstructionSpecifier *specWithNewOpcode;
1468
1469 spec = &INSTRUCTIONS_SYM[instructionID];
1470
1471 // Borrow opcode from one of the other XCHGar opcodes
1472 insn->opcode = 0x91;
1473
1474 if (getInstructionIDWithAttrMask(&instructionIDWithNewOpcode, insn,
1475 attrMask)) {
1476 insn->opcode = 0x90;
1477
1478 insn->instructionID = instructionID;
1479 insn->spec = spec;
1480 return 0;
1481 }
1482
1483 specWithNewOpcode = &INSTRUCTIONS_SYM[instructionIDWithNewOpcode];
1484
1485 // Change back
1486 insn->opcode = 0x90;
1487
1488 insn->instructionID = instructionIDWithNewOpcode;
1489 insn->spec = specWithNewOpcode;
1490
1491 return 0;
1492 }
1493
1494 insn->instructionID = instructionID;
1495 insn->spec = &INSTRUCTIONS_SYM[insn->instructionID];
1496
1497 return 0;
1498}
1499
1500 // Read an operand from the opcode field of an instruction and interpret it
1501// appropriately given the operand width. Handles AddRegFrm instructions.
1502//
1503// @param insn - the instruction whose opcode field is to be read.
1504// @param size - The width (in bytes) of the register being specified.
1505// 1 means AL and friends, 2 means AX, 4 means EAX, and 8 means
1506// RAX.
1507// @return - 0 on success; nonzero otherwise.
1508 static int readOpcodeRegister(struct InternalInstruction *insn, uint8_t size) {
1509 LLVM_DEBUG(dbgs() << "readOpcodeRegister()");
1510
1511 if (size == 0)
1512 size = insn->registerSize;
1513
1514 auto setOpcodeRegister = [&](unsigned base) {
1515 insn->opcodeRegister =
1516 (Reg)(base + ((bFromREX(insn->rexPrefix) << 3) |
1517 (b2FromREX2(insn->rex2ExtensionPrefix[1]) << 4) |
1518 (insn->opcode & 7)));
1519 };
1520
1521 switch (size) {
1522 case 1:
1523 setOpcodeRegister(MODRM_REG_AL);
1524 if (insn->rexPrefix && insn->opcodeRegister >= MODRM_REG_AL + 0x4 &&
1525 insn->opcodeRegister < MODRM_REG_AL + 0x8) {
1526 insn->opcodeRegister =
1527 (Reg)(MODRM_REG_SPL + (insn->opcodeRegister - MODRM_REG_AL - 4));
1528 }
1529
1530 break;
1531 case 2:
1532 setOpcodeRegister(MODRM_REG_AX);
1533 break;
1534 case 4:
1535 setOpcodeRegister(MODRM_REG_EAX);
1536 break;
1537 case 8:
1538 setOpcodeRegister(MODRM_REG_RAX);
1539 break;
1540 }
1541
1542 return 0;
1543}
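// The register number built by setOpcodeRegister combines three sources: the
// low three bits of the opcode, the REX.B bit (bit 3), and the REX2 B bit
// extracted by b2FromREX2 (bit 4). A minimal sketch of that arithmetic,
// assuming the prefix bits have already been extracted into booleans; the
// helper and parameter names are hypothetical:

```cpp
#include <cstdint>

// Offset of the register from the base of its size class (AL, AX, EAX, RAX).
// rexB supplies bit 3 and rex2B supplies bit 4; the opcode supplies bits 0-2.
inline unsigned opcodeRegisterOffsetSketch(uint8_t opcode, bool rexB,
                                           bool rex2B) {
  return (static_cast<unsigned>(rexB) << 3) |
         (static_cast<unsigned>(rex2B) << 4) | (opcode & 7u);
}
```

// For instance, opcode 0x53 with no prefix bits gives offset 3, with REX.B set
// it gives 11, and opcode 0x57 with both bits set gives 31. The sketch leaves
// out the byte-register special case above, where a REX prefix remaps offsets
// 4 through 7 to the SPL group.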
1544
1545// Consume an immediate operand from an instruction, given the desired operand
1546// size.
1547//
1548// @param insn - The instruction whose operand is to be read.
1549// @param size - The width (in bytes) of the operand.
1550// @return - 0 if the immediate was successfully consumed; nonzero
1551// otherwise.
1552 static int readImmediate(struct InternalInstruction *insn, uint8_t size) {
1553 uint8_t imm8;
1554 uint16_t imm16;
1555 uint32_t imm32;
1556 uint64_t imm64;
1557
1558 LLVM_DEBUG(dbgs() << "readImmediate()");
1559
1560 assert(insn->numImmediatesConsumed < 2 && "Already consumed two immediates");
1561
1562 insn->immediateSize = size;
1563 insn->immediateOffset = insn->readerCursor - insn->startLocation;
1564
1565 switch (size) {
1566 case 1:
1567 if (consume(insn, imm8))
1568 return -1;
1569 insn->immediates[insn->numImmediatesConsumed] = imm8;
1570 break;
1571 case 2:
1572 if (consume(insn, imm16))
1573 return -1;
1574 insn->immediates[insn->numImmediatesConsumed] = imm16;
1575 break;
1576 case 4:
1577 if (consume(insn, imm32))
1578 return -1;
1579 insn->immediates[insn->numImmediatesConsumed] = imm32;
1580 break;
1581 case 8:
1582 if (consume(insn, imm64))
1583 return -1;
1584 insn->immediates[insn->numImmediatesConsumed] = imm64;
1585 break;
1586 default:
1587 llvm_unreachable("invalid size");
1588 }
1589
1590 insn->numImmediatesConsumed++;
1591
1592 return 0;
1593}
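// x86 immediates are stored little-endian, so reading an N-byte immediate
// means assembling N bytes with the least significant byte first. The sketch
// below shows that step in isolation. It does not reproduce consume(); the
// helper name, its parameters, and its true-on-success convention are
// assumptions made for illustration (the calls above treat a nonzero return
// from consume() as failure).

```cpp
#include <cstddef>
#include <cstdint>

// Reads a 1-, 2-, 4-, or 8-byte little-endian value starting at `offset`.
// Returns true and stores the value in `out` on success; returns false for
// an unsupported size or when the read would run past the end of the buffer.
inline bool readLittleEndianSketch(const uint8_t *bytes, size_t length,
                                   size_t offset, unsigned size,
                                   uint64_t &out) {
  if (size != 1 && size != 2 && size != 4 && size != 8)
    return false;
  if (offset > length || size > length - offset)
    return false;
  uint64_t value = 0;
  for (unsigned i = 0; i < size; ++i)
    value |= static_cast<uint64_t>(bytes[offset + i]) << (8 * i);
  out = value;
  return true;
}
```

// Given the bytes B8 78 56 34 12 (a MOV of a 32-bit immediate into EAX), a
// 4-byte read at offset 1 yields 0x12345678, and an 8-byte read at the same
// offset fails because only four bytes remain.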
1594
1595// Consume vvvv from an instruction if it has a VEX prefix.
1596static int readVVVV(struct InternalInstruction *insn) {
1597 LLVM_DEBUG(dbgs() << "readVVVV()");
1598
1599 int vvvv;
1600 if (insn->vectorExtensionType == TYPE_EVEX)
1601 vvvv = (v2FromEVEX4of4(insn->vectorExtensionPrefix[3]) << 4 |
1603 else if (insn->vectorExtensionType == TYPE_VEX_3B)
1604 vvvv = vvvvFromVEX3of3(insn->vectorExtensionPrefix[2]);
1605 else if (insn->vectorExtensionType == TYPE_VEX_2B)
1606 vvvv = vvvvFromVEX2of2(insn->vectorExtensionPrefix[1]);
1607 else if (insn->vectorExtensionType == TYPE_XOP)
1608 vvvv = vvvvFromXOP3of3(insn->vectorExtensionPrefix[2]);
1609 else
1610 return -1;
1611
1612 if (insn->mode != MODE_64BIT)
1613 vvvv &= 0xf; // Can only clear bit 4. Bit 3 must be cleared later.
1614
1615 insn->vvvv = static_cast<Reg>(vvvv);
1616 return 0;
1617}
1618
1619 // Read a mask register from the opcode field of an instruction.
1620//
1621// @param insn - The instruction whose opcode field is to be read.
1622// @return - 0 on success; nonzero otherwise.
1623static int readMaskRegister(struct InternalInstruction *insn) {
1624 LLVM_DEBUG(dbgs() << "readMaskRegister()");
1625
1626 if (insn->vectorExtensionType != TYPE_EVEX)
1627 return -1;
1628
1629 insn->writemask =
1630 static_cast<Reg>(aaaFromEVEX4of4(insn->vectorExtensionPrefix[3]));
1631 return 0;
1632}
1633
1634// Consults the specifier for an instruction and consumes all
1635// operands for that instruction, interpreting them as it goes.
1636static int readOperands(struct InternalInstruction *insn) {
1637 int hasVVVV, needVVVV;
1638 int sawRegImm = 0;
1639
1640 LLVM_DEBUG(dbgs() << "readOperands()");
1641
1642 // If non-zero vvvv specified, make sure one of the operands uses it.
1643 hasVVVV = !readVVVV(insn);
1644 needVVVV = hasVVVV && (insn->vvvv != 0);
1645
1646 for (const auto &Op : x86OperandSets[insn->spec->operands]) {
1647 switch (Op.encoding) {
1648 case ENCODING_NONE:
1649 case ENCODING_SI:
1650 case ENCODING_DI:
1651 break;
1653 // VSIB can use the V2 bit so check only the other bits.
1654 if (needVVVV)
1655 needVVVV = hasVVVV & ((insn->vvvv & 0xf) != 0);
1656 if (readModRM(insn))
1657 return -1;
1658
1659 // Reject if SIB wasn't used.
1660 if (insn->eaBase != EA_BASE_sib && insn->eaBase != EA_BASE_sib64)
1661 return -1;
1662
1663 // If sibIndex was set to SIB_INDEX_NONE, index offset is 4.
1664 if (insn->sibIndex == SIB_INDEX_NONE)
1665 insn->sibIndex = (SIBIndex)(insn->sibIndexBase + 4);
1666
1667 // If EVEX.v2 is set this is one of the 16-31 registers.
1668 if (insn->vectorExtensionType == TYPE_EVEX && insn->mode == MODE_64BIT &&
1670 insn->sibIndex = (SIBIndex)(insn->sibIndex + 16);
1671
1672 // Adjust the index register to the correct size.
1673 switch ((OperandType)Op.type) {
1674 default:
1675 debug("Unhandled VSIB index type");
1676 return -1;
1677 case TYPE_MVSIBX:
1678 insn->sibIndex =
1679 (SIBIndex)(SIB_INDEX_XMM0 + (insn->sibIndex - insn->sibIndexBase));
1680 break;
1681 case TYPE_MVSIBY:
1682 insn->sibIndex =
1683 (SIBIndex)(SIB_INDEX_YMM0 + (insn->sibIndex - insn->sibIndexBase));
1684 break;
1685 case TYPE_MVSIBZ:
1686 insn->sibIndex =
1687 (SIBIndex)(SIB_INDEX_ZMM0 + (insn->sibIndex - insn->sibIndexBase));
1688 break;
1689 }
1690
1691 // Apply the AVX512 compressed displacement scaling factor.
1692 if (Op.encoding != ENCODING_REG && insn->eaDisplacement == EA_DISP_8)
1693 insn->displacement *= 1 << (Op.encoding - ENCODING_VSIB);
1694 break;
1695 case ENCODING_SIB:
1696 // Reject if SIB wasn't used.
1697 if (insn->eaBase != EA_BASE_sib && insn->eaBase != EA_BASE_sib64)
1698 return -1;
1699 if (readModRM(insn))
1700 return -1;
1701 if (fixupReg(insn, &Op))
1702 return -1;
1703 break;
1704 case ENCODING_REG:
1706 if (readModRM(insn))
1707 return -1;
1708 if (fixupReg(insn, &Op))
1709 return -1;
1710 // Apply the AVX512 compressed displacement scaling factor.
1711 if (Op.encoding != ENCODING_REG && insn->eaDisplacement == EA_DISP_8)
1712 insn->displacement *= 1 << (Op.encoding - ENCODING_RM);
1713 break;
1714 case ENCODING_IB:
1715 if (sawRegImm) {
1716 // Saw a register immediate so don't read again and instead split the
1717 // previous immediate. FIXME: This is a hack.
1718 insn->immediates[insn->numImmediatesConsumed] =
1719 insn->immediates[insn->numImmediatesConsumed - 1] & 0xf;
1720 ++insn->numImmediatesConsumed;
1721 break;
1722 }
1723 if (readImmediate(insn, 1))
1724 return -1;
1725 if (Op.type == TYPE_XMM || Op.type == TYPE_YMM)
1726 sawRegImm = 1;
1727 break;
1728 case ENCODING_IW:
1729 if (readImmediate(insn, 2))
1730 return -1;
1731 break;
1732 case ENCODING_ID:
1733 if (readImmediate(insn, 4))
1734 return -1;
1735 break;
1736 case ENCODING_IO:
1737 if (readImmediate(insn, 8))
1738 return -1;
1739 break;
1740 case ENCODING_Iv:
1741 if (readImmediate(insn, insn->immediateSize))
1742 return -1;
1743 break;
1744 case ENCODING_Ia:
1745 if (readImmediate(insn, insn->addressSize))
1746 return -1;
1747 break;
1748 case ENCODING_IRC:
1749 insn->RC = (l2FromEVEX4of4(insn->vectorExtensionPrefix[3]) << 1) |
1751 break;
1752 case ENCODING_RB:
1753 if (readOpcodeRegister(insn, 1))
1754 return -1;
1755 break;
1756 case ENCODING_RW:
1757 if (readOpcodeRegister(insn, 2))
1758 return -1;
1759 break;
1760 case ENCODING_RD:
1761 if (readOpcodeRegister(insn, 4))
1762 return -1;
1763 break;
1764 case ENCODING_RO:
1765 if (readOpcodeRegister(insn, 8))
1766 return -1;
1767 break;
1768 case ENCODING_Rv:
1769 if (readOpcodeRegister(insn, 0))
1770 return -1;
1771 break;
1772 case ENCODING_CF:
1774 needVVVV = false; // oszc shares the same bits with VVVV
1775 break;
1776 case ENCODING_CC:
1777 if (isCCMPOrCTEST(insn))
1778 insn->immediates[2] = scFromEVEX4of4(insn->vectorExtensionPrefix[3]);
1779 else
1780 insn->immediates[1] = insn->opcode & 0xf;
1781 break;
1782 case ENCODING_FP:
1783 break;
1784 case ENCODING_VVVV:
1785 needVVVV = 0; // Mark that we have found a VVVV operand.
1786 if (!hasVVVV)
1787 return -1;
1788 if (insn->mode != MODE_64BIT)
1789 insn->vvvv = static_cast<Reg>(insn->vvvv & 0x7);
1790 if (fixupReg(insn, &Op))
1791 return -1;
1792 break;
1793 case ENCODING_WRITEMASK:
1794 if (readMaskRegister(insn))
1795 return -1;
1796 break;
1797 case ENCODING_DUP:
1798 break;
1799 default:
1800 LLVM_DEBUG(dbgs() << "Encountered an operand with an unknown encoding.");
1801 return -1;
1802 }
1803 }
1804
1805 // If we didn't find ENCODING_VVVV operand, but non-zero vvvv present, fail
1806 if (needVVVV)
1807 return -1;
1808
1809 return 0;
1810}
1811
1812namespace llvm {
1813
1814// Fill-ins to make the compiler happy. These constants are never actually
1815// assigned; they are just filler to make an automatically-generated switch
1816// statement work.
1817namespace X86 {
1818 enum {
1819 BX_SI = 500,
1820 BX_DI = 501,
1821 BP_SI = 502,
1822 BP_DI = 503,
1823 sib = 504,
1824 sib64 = 505
1825 };
1826} // namespace X86
1827
1828} // namespace llvm
1829
1830static bool translateInstruction(MCInst &target,
1831 InternalInstruction &source,
1832 const MCDisassembler *Dis);
1833
1834namespace {
1835
1836 /// Generic disassembler for all X86 platforms. All that each platform class
1837 /// should have to do is supply its own constructor and provide a different
1838 /// disassemblerMode value.
1839class X86GenericDisassembler : public MCDisassembler {
1840 std::unique_ptr<const MCInstrInfo> MII;
1841public:
1842 X86GenericDisassembler(const MCSubtargetInfo &STI, MCContext &Ctx,
1843 std::unique_ptr<const MCInstrInfo> MII);
1844public:
1845 DecodeStatus getInstruction(MCInst &instr, uint64_t &size,
1846 ArrayRef<uint8_t> Bytes, uint64_t Address,
1847 raw_ostream &cStream) const override;
1848
1849private:
1850 DisassemblerMode fMode;
1851};
1852
1853} // namespace
1854
1855X86GenericDisassembler::X86GenericDisassembler(
1856 const MCSubtargetInfo &STI,
1857 MCContext &Ctx,
1858 std::unique_ptr<const MCInstrInfo> MII)
1859 : MCDisassembler(STI, Ctx), MII(std::move(MII)) {
1860 const FeatureBitset &FB = STI.getFeatureBits();
1861 if (FB[X86::Is16Bit]) {
1862 fMode = MODE_16BIT;
1863 return;
1864 } else if (FB[X86::Is32Bit]) {
1865 fMode = MODE_32BIT;
1866 return;
1867 } else if (FB[X86::Is64Bit]) {
1868 fMode = MODE_64BIT;
1869 return;
1870 }
1871
1872 llvm_unreachable("Invalid CPU mode");
1873}
1874
1875MCDisassembler::DecodeStatus X86GenericDisassembler::getInstruction(
1876 MCInst &Instr, uint64_t &Size, ArrayRef<uint8_t> Bytes, uint64_t Address,
1877 raw_ostream &CStream) const {
1878 CommentStream = &CStream;
1879
1880 InternalInstruction Insn;
1881 memset(&Insn, 0, sizeof(InternalInstruction));
1882 Insn.bytes = Bytes;
1883 Insn.startLocation = Address;
1884 Insn.readerCursor = Address;
1885 Insn.mode = fMode;
1886
1887 if (Bytes.empty() || readPrefixes(&Insn) || readOpcode(&Insn) ||
1888 getInstructionID(&Insn, MII.get()) || Insn.instructionID == 0 ||
1889 readOperands(&Insn)) {
1890 Size = Insn.readerCursor - Address;
1891 return Fail;
1892 }
1893
1894 Insn.operands = x86OperandSets[Insn.spec->operands];
1895 Insn.length = Insn.readerCursor - Insn.startLocation;
1896 Size = Insn.length;
1897 if (Size > 15)
1898 LLVM_DEBUG(dbgs() << "Instruction exceeds 15-byte limit");
1899
1900 bool Ret = translateInstruction(Instr, Insn, this);
1901 if (!Ret) {
1902 unsigned Flags = X86::IP_NO_PREFIX;
1903 if (Insn.hasAdSize)
1905 if (!Insn.mandatoryPrefix) {
1906 if (Insn.hasOpSize)
1908 if (Insn.repeatPrefix == 0xf2)
1910 else if (Insn.repeatPrefix == 0xf3 &&
1911 // It should not be 'pause' f3 90
1912 Insn.opcode != 0x90)
1914 if (Insn.hasLockPrefix)
1916 }
1917 Instr.setFlags(Flags);
1918 }
1919 return (!Ret) ? Success : Fail;
1920}
1921
1922//
1923// Private code that translates from struct InternalInstructions to MCInsts.
1924//
1925
1926/// translateRegister - Translates an internal register to the appropriate LLVM
1927/// register, and appends it as an operand to an MCInst.
1928///
1929/// @param mcInst - The MCInst to append to.
1930/// @param reg - The Reg to append.
1931static void translateRegister(MCInst &mcInst, Reg reg) {
1932#define ENTRY(x) X86::x,
1933 static constexpr MCPhysReg llvmRegnums[] = {ALL_REGS};
1934#undef ENTRY
1935
1936 MCPhysReg llvmRegnum = llvmRegnums[reg];
1937 mcInst.addOperand(MCOperand::createReg(llvmRegnum));
1938}
1939
1941 0, // SEG_OVERRIDE_NONE
1942 X86::CS,
1943 X86::SS,
1944 X86::DS,
1945 X86::ES,
1946 X86::FS,
1947 X86::GS
1948};
1949
1950/// translateSrcIndex - Appends a source index operand to an MCInst.
1951///
1952/// @param mcInst - The MCInst to append to.
1953/// @param insn - The internal instruction.
1954static bool translateSrcIndex(MCInst &mcInst, InternalInstruction &insn) {
1955 unsigned baseRegNo;
1956
1957 if (insn.mode == MODE_64BIT)
1958 baseRegNo = insn.hasAdSize ? X86::ESI : X86::RSI;
1959 else if (insn.mode == MODE_32BIT)
1960 baseRegNo = insn.hasAdSize ? X86::SI : X86::ESI;
1961 else {
1962 assert(insn.mode == MODE_16BIT);
1963 baseRegNo = insn.hasAdSize ? X86::ESI : X86::SI;
1964 }
1965 MCOperand baseReg = MCOperand::createReg(baseRegNo);
1966 mcInst.addOperand(baseReg);
1967
1968 MCOperand segmentReg;
1970 mcInst.addOperand(segmentReg);
1971 return false;
1972}
1973
1974/// translateDstIndex - Appends a destination index operand to an MCInst.
1975///
1976/// @param mcInst - The MCInst to append to.
1977/// @param insn - The internal instruction.
1978
1979static bool translateDstIndex(MCInst &mcInst, InternalInstruction &insn) {
1980 unsigned baseRegNo;
1981
1982 if (insn.mode == MODE_64BIT)
1983 baseRegNo = insn.hasAdSize ? X86::EDI : X86::RDI;
1984 else if (insn.mode == MODE_32BIT)
1985 baseRegNo = insn.hasAdSize ? X86::DI : X86::EDI;
1986 else {
1987 assert(insn.mode == MODE_16BIT);
1988 baseRegNo = insn.hasAdSize ? X86::EDI : X86::DI;
1989 }
1990 MCOperand baseReg = MCOperand::createReg(baseRegNo);
1991 mcInst.addOperand(baseReg);
1992 return false;
1993}
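// Both index translators pick the register width from the decoding mode and
// flip it when an address-size prefix is present. A small sketch of that
// selection for the source index, using hypothetical enum and function names
// in place of the X86 register constants:

```cpp
// Stand-ins for the decoder mode and the three widths of the SI register.
enum class ModeSketch { Bits16, Bits32, Bits64 };
enum class SrcIndexSketch { SI, ESI, RSI };

// The address-size prefix selects the alternate width for the current mode:
// 64-bit mode drops to 32 bits, 32-bit mode drops to 16 bits, and 16-bit
// mode rises to 32 bits.
inline SrcIndexSketch selectSrcIndexSketch(ModeSketch mode, bool hasAdSize) {
  switch (mode) {
  case ModeSketch::Bits64:
    return hasAdSize ? SrcIndexSketch::ESI : SrcIndexSketch::RSI;
  case ModeSketch::Bits32:
    return hasAdSize ? SrcIndexSketch::SI : SrcIndexSketch::ESI;
  case ModeSketch::Bits16:
    return hasAdSize ? SrcIndexSketch::ESI : SrcIndexSketch::SI;
  }
  return SrcIndexSketch::SI; // Not reached for valid modes.
}
```

// The destination index follows the same table with DI, EDI, and RDI.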
1994
1995/// translateImmediate - Appends an immediate operand to an MCInst.
1996///
1997/// @param mcInst - The MCInst to append to.
1998/// @param immediate - The immediate value to append.
1999/// @param operand - The operand, as stored in the descriptor table.
2000/// @param insn - The internal instruction.
2001static void translateImmediate(MCInst &mcInst, uint64_t immediate,
2002 const OperandSpecifier &operand,
2003 InternalInstruction &insn,
2004 const MCDisassembler *Dis) {
2005 // Sign-extend the immediate if necessary.
2006
2007 OperandType type = (OperandType)operand.type;
2008
2009 bool isBranch = false;
2010 uint64_t pcrel = 0;
2011 if (type == TYPE_REL) {
2012 isBranch = true;
2013 pcrel = insn.startLocation + insn.length;
2014 switch (operand.encoding) {
2015 default:
2016 break;
2017 case ENCODING_Iv:
2018 switch (insn.displacementSize) {
2019 default:
2020 break;
2021 case 1:
2022 if (immediate & 0x80)
2023 immediate |= ~(0xffull);
2024 break;
2025 case 2:
2026 if (immediate & 0x8000)
2027 immediate |= ~(0xffffull);
2028 break;
2029 case 4:
2030 if (immediate & 0x80000000)
2031 immediate |= ~(0xffffffffull);
2032 break;
2033 case 8:
2034 break;
2035 }
2036 break;
2037 case ENCODING_IB:
2038 if (immediate & 0x80)
2039 immediate |= ~(0xffull);
2040 break;
2041 case ENCODING_IW:
2042 if (immediate & 0x8000)
2043 immediate |= ~(0xffffull);
2044 break;
2045 case ENCODING_ID:
2046 if (immediate & 0x80000000)
2047 immediate |= ~(0xffffffffull);
2048 break;
2049 }
2050 }
2051 // By default sign-extend all X86 immediates based on their encoding.
2052 else if (type == TYPE_IMM) {
2053 switch (operand.encoding) {
2054 default:
2055 break;
2056 case ENCODING_IB:
2057 if (immediate & 0x80)
2058 immediate |= ~(0xffull);
2059 break;
2060 case ENCODING_IW:
2061 if (immediate & 0x8000)
2062 immediate |= ~(0xffffull);
2063 break;
2064 case ENCODING_ID:
2065 if (immediate & 0x80000000)
2066 immediate |= ~(0xffffffffull);
2067 break;
2068 case ENCODING_IO:
2069 break;
2070 }
2071 }
2072
2073 switch (type) {
2074 case TYPE_XMM:
2075 mcInst.addOperand(MCOperand::createReg(X86::XMM0 + (immediate >> 4)));
2076 return;
2077 case TYPE_YMM:
2078 mcInst.addOperand(MCOperand::createReg(X86::YMM0 + (immediate >> 4)));
2079 return;
2080 case TYPE_ZMM:
2081 mcInst.addOperand(MCOperand::createReg(X86::ZMM0 + (immediate >> 4)));
2082 return;
2083 default:
2084 // operand is 64 bits wide. Do nothing.
2085 break;
2086 }
2087
2088 if (!Dis->tryAddingSymbolicOperand(
2089 mcInst, immediate + pcrel, insn.startLocation, isBranch,
2090 insn.immediateOffset, insn.immediateSize, insn.length))
2091 mcInst.addOperand(MCOperand::createImm(immediate));
2092
2093 if (type == TYPE_MOFFS) {
2094 MCOperand segmentReg;
2096 mcInst.addOperand(segmentReg);
2097 }
2098}
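// The sign-extension cases in translateImmediate all follow one pattern: if
// the top bit of the encoded width is set, fill every higher bit with ones.
// The sketch below collects that pattern into one hypothetical helper and adds
// a second hypothetical helper for the branch-target sum (immediate + pcrel,
// where pcrel is the address of the next instruction). Neither helper exists
// in the disassembler; they only restate the arithmetic.

```cpp
#include <cstdint>

// Sign-extends an immediate that was encoded in `sizeInBytes` bytes to 64
// bits. Sizes other than 1, 2, and 4 are returned unchanged, matching the
// "already 64 bits wide" case.
inline uint64_t signExtendImmediateSketch(uint64_t immediate,
                                          unsigned sizeInBytes) {
  switch (sizeInBytes) {
  case 1:
    return (immediate & 0x80ull) ? (immediate | ~0xffull) : immediate;
  case 2:
    return (immediate & 0x8000ull) ? (immediate | ~0xffffull) : immediate;
  case 4:
    return (immediate & 0x80000000ull) ? (immediate | ~0xffffffffull)
                                       : immediate;
  default:
    return immediate;
  }
}

// Target of a relative branch: the displacement is applied to the address of
// the instruction that follows the branch. Unsigned wraparound makes a
// sign-extended negative displacement subtract as expected.
inline uint64_t branchTargetSketch(uint64_t startLocation, uint64_t length,
                                   uint64_t signExtendedImmediate) {
  return startLocation + length + signExtendedImmediate;
}
```

// As an example, a two-byte short jump at 0x1000 with displacement byte 0xFE
// sign-extends to -2 and therefore targets 0x1000, the jump itself.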
2099
2100/// translateRMRegister - Translates a register stored in the R/M field of the
2101/// ModR/M byte to its LLVM equivalent and appends it to an MCInst.
2102/// @param mcInst - The MCInst to append to.
2103/// @param insn - The internal instruction to extract the R/M field
2104/// from.
2105 /// @return - false on success; true otherwise
2106static bool translateRMRegister(MCInst &mcInst,
2107 InternalInstruction &insn) {
2108 if (insn.eaBase == EA_BASE_sib || insn.eaBase == EA_BASE_sib64) {
2109 debug("A R/M register operand may not have a SIB byte");
2110 return true;
2111 }
2112
2113 switch (insn.eaBase) {
2114 default:
2115 debug("Unexpected EA base register");
2116 return true;
2117 case EA_BASE_NONE:
2118 debug("EA_BASE_NONE for ModR/M base");
2119 return true;
2120#define ENTRY(x) case EA_BASE_##x:
2122#undef ENTRY
2123 debug("A R/M register operand may not have a base; "
2124 "the operand must be a register.");
2125 return true;
2126#define ENTRY(x) \
2127 case EA_REG_##x: \
2128 mcInst.addOperand(MCOperand::createReg(X86::x)); break;
2129 ALL_REGS
2130#undef ENTRY
2131 }
2132
2133 return false;
2134}

/// translateRMMemory - Translates a memory operand stored in the Mod and R/M
///   fields of an internal instruction (and possibly its SIB byte) to a memory
///   operand in LLVM's format, and appends it to an MCInst.
///
/// @param mcInst       - The MCInst to append to.
/// @param insn         - The instruction to extract Mod, R/M, and SIB fields
///                       from.
/// @param Dis          - The disassembler, used for symbolic operand lookup.
/// @param ForceSIB     - The instruction must use SIB.
/// @return             - false on success; true otherwise.
static bool translateRMMemory(MCInst &mcInst, InternalInstruction &insn,
                              const MCDisassembler *Dis,
                              bool ForceSIB = false) {
  // Addresses in an MCInst are represented as five operands:
  //   1. basereg       (register)  The R/M base, or (if there is a SIB) the
  //                                SIB base
  //   2. scaleamount   (immediate) 1, or (if there is a SIB) the specified
  //                                scale amount
  //   3. indexreg      (register)  X86::NoRegister, or (if there is a SIB)
  //                                the index (which is multiplied by the
  //                                scale amount)
  //   4. displacement  (immediate) 0, or the displacement if there is one
  //   5. segmentreg    (register)  X86::NoRegister, or the segment register
  //                                selected by a segment override

  MCOperand baseReg;
  MCOperand scaleAmount;
  MCOperand indexReg;
  MCOperand displacement;
  MCOperand segmentReg;
  uint64_t pcrel = 0;

  if (insn.eaBase == EA_BASE_sib || insn.eaBase == EA_BASE_sib64) {
    if (insn.sibBase != SIB_BASE_NONE) {
      switch (insn.sibBase) {
      default:
        debug("Unexpected sibBase");
        return true;
#define ENTRY(x)                                          \
      case SIB_BASE_##x:                                  \
        baseReg = MCOperand::createReg(X86::x); break;
      ALL_SIB_BASES
#undef ENTRY
      }
    } else {
      baseReg = MCOperand::createReg(X86::NoRegister);
    }

    if (insn.sibIndex != SIB_INDEX_NONE) {
      switch (insn.sibIndex) {
      default:
        debug("Unexpected sibIndex");
        return true;
#define ENTRY(x)                                          \
      case SIB_INDEX_##x:                                 \
        indexReg = MCOperand::createReg(X86::x); break;
      EA_BASES_32BIT
      EA_BASES_64BIT
      REGS_XMM
      REGS_YMM
      REGS_ZMM
#undef ENTRY
      }
    } else {
      // Use EIZ/RIZ for a few ambiguous cases where the SIB byte is present,
      // but no index is used and modrm alone should have been enough.
      // -No base register in 32-bit mode. In 64-bit mode this is used to
      //  avoid rip-relative addressing.
      // -Any base register used other than ESP/RSP/R12D/R12. Using these as a
      //  base always requires a SIB byte.
      // -A scale other than 1 is used.
      if (!ForceSIB &&
          (insn.sibScale != 1 ||
           (insn.sibBase == SIB_BASE_NONE && insn.mode != MODE_64BIT) ||
           (insn.sibBase != SIB_BASE_NONE &&
            insn.sibBase != SIB_BASE_ESP && insn.sibBase != SIB_BASE_RSP &&
            insn.sibBase != SIB_BASE_R12D && insn.sibBase != SIB_BASE_R12))) {
        indexReg = MCOperand::createReg(insn.addressSize == 4 ? X86::EIZ :
                                                                X86::RIZ);
      } else
        indexReg = MCOperand::createReg(X86::NoRegister);
    }

    scaleAmount = MCOperand::createImm(insn.sibScale);
  } else {
    switch (insn.eaBase) {
    case EA_BASE_NONE:
      if (insn.eaDisplacement == EA_DISP_NONE) {
        debug("EA_BASE_NONE and EA_DISP_NONE for ModR/M base");
        return true;
      }
      if (insn.mode == MODE_64BIT) {
        pcrel = insn.startLocation + insn.length;
        Dis->tryAddingPcLoadReferenceComment(insn.displacement + pcrel,
                                             insn.startLocation +
                                                 insn.displacementOffset);
        // Section 2.2.1.6
        baseReg = MCOperand::createReg(insn.addressSize == 4 ? X86::EIP :
                                                               X86::RIP);
      }
      else
        baseReg = MCOperand::createReg(X86::NoRegister);

      indexReg = MCOperand::createReg(X86::NoRegister);
      break;
    case EA_BASE_BX_SI:
      baseReg = MCOperand::createReg(X86::BX);
      indexReg = MCOperand::createReg(X86::SI);
      break;
    case EA_BASE_BX_DI:
      baseReg = MCOperand::createReg(X86::BX);
      indexReg = MCOperand::createReg(X86::DI);
      break;
    case EA_BASE_BP_SI:
      baseReg = MCOperand::createReg(X86::BP);
      indexReg = MCOperand::createReg(X86::SI);
      break;
    case EA_BASE_BP_DI:
      baseReg = MCOperand::createReg(X86::BP);
      indexReg = MCOperand::createReg(X86::DI);
      break;
    default:
      indexReg = MCOperand::createReg(X86::NoRegister);
      switch (insn.eaBase) {
      default:
        debug("Unexpected eaBase");
        return true;
        // Here, we will use the fill-ins defined above.  However,
        //   BX_SI, BX_DI, BP_SI, and BP_DI are all handled above and
        //   sib and sib64 were handled in the top-level if, so they're only
        //   placeholders to keep the compiler happy.
#define ENTRY(x)                                        \
      case EA_BASE_##x:                                 \
        baseReg = MCOperand::createReg(X86::x); break;
      ALL_EA_BASES
#undef ENTRY
#define ENTRY(x) case EA_REG_##x:
      ALL_REGS
#undef ENTRY
        debug("A R/M memory operand may not be a register; "
              "the base field must be a base.");
        return true;
      }
    }

    scaleAmount = MCOperand::createImm(1);
  }

  displacement = MCOperand::createImm(insn.displacement);

  segmentReg = MCOperand::createReg(segmentRegnums[insn.segmentOverride]);

  mcInst.addOperand(baseReg);
  mcInst.addOperand(scaleAmount);
  mcInst.addOperand(indexReg);

  const uint8_t dispSize =
      (insn.eaDisplacement == EA_DISP_NONE) ? 0 : insn.displacementSize;

  if (!Dis->tryAddingSymbolicOperand(
          mcInst, insn.displacement + pcrel, insn.startLocation, false,
          insn.displacementOffset, dispSize, insn.length))
    mcInst.addOperand(displacement);
  mcInst.addOperand(segmentReg);
  return false;
}
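
// Illustrative sketch only, not part of X86Disassembler.cpp: how the 64-bit
// EA_BASE_NONE path above arrives at the address it passes for symbol lookup.
// RIP-relative displacements are relative to the end of the instruction, so
// pcrel is startLocation + length. The struct RipRelativeInsn and the function
// ripRelativeTarget are hypothetical stand-ins for the InternalInstruction
// fields involved, and the int32_t displacement type is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical subset of the fields read by the RIP-relative path.
struct RipRelativeInsn {
  uint64_t startLocation; // address of the first byte of the instruction
  uint64_t length;        // total encoded length in bytes
  int32_t displacement;   // signed displacement from the encoding (assumed)
};

// Returns the absolute address referenced by a RIP-relative operand.
uint64_t ripRelativeTarget(const RipRelativeInsn &insn) {
  // The base is the address of the next instruction.
  uint64_t pcrel = insn.startLocation + insn.length;
  // Widen the displacement with its sign before adding; unsigned arithmetic
  // wraps, so negative displacements move the target backwards.
  return pcrel + static_cast<uint64_t>(static_cast<int64_t>(insn.displacement));
}
```

// For example, a 7-byte instruction at 0x1000 with displacement 0x10 would
// reference 0x1017 under these assumptions.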

/// translateRM - Translates an operand stored in the R/M (and possibly SIB)
///   byte of an instruction to LLVM form, and appends it to an MCInst.
///
/// @param mcInst       - The MCInst to append to.
/// @param operand      - The operand, as stored in the descriptor table.
/// @param insn         - The instruction to extract Mod, R/M, and SIB fields
///                       from.
/// @param Dis          - The disassembler, used for symbolic operand lookup.
/// @return             - false on success; true otherwise.
static bool translateRM(MCInst &mcInst, const OperandSpecifier &operand,
                        InternalInstruction &insn, const MCDisassembler *Dis) {
  switch (operand.type) {
  default:
    debug("Unexpected type for a R/M operand");
    return true;
  case TYPE_R8:
  case TYPE_R16:
  case TYPE_R32:
  case TYPE_R64:
  case TYPE_Rv:
  case TYPE_MM64:
  case TYPE_XMM:
  case TYPE_YMM:
  case TYPE_ZMM:
  case TYPE_TMM:
  case TYPE_TMM_PAIR:
  case TYPE_VK_PAIR:
  case TYPE_VK:
  case TYPE_DEBUGREG:
  case TYPE_CONTROLREG:
  case TYPE_BNDR:
    return translateRMRegister(mcInst, insn);
  case TYPE_M:
  case TYPE_MVSIBX:
  case TYPE_MVSIBY:
  case TYPE_MVSIBZ:
    return translateRMMemory(mcInst, insn, Dis);
  case TYPE_MSIB:
    return translateRMMemory(mcInst, insn, Dis, true);
  }
}
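
// Illustrative sketch only, not part of X86Disassembler.cpp: the condition
// translateRMMemory uses to decide whether a SIB byte with no index should be
// shown with an explicit EIZ/RIZ index. TYPE_MSIB operands pass ForceSIB=true,
// which suppresses the pseudo index. The struct SibSummary and the function
// needsIzIndex are hypothetical; the real code tests InternalInstruction
// fields directly.

```cpp
#include <cassert>

// Hypothetical summary of the SIB-related state of a decoded instruction.
struct SibSummary {
  bool hasBase;      // sibBase != SIB_BASE_NONE
  bool baseNeedsSib; // base is ESP, RSP, R12D, or R12
  unsigned scale;    // sibScale: 1, 2, 4, or 8
  bool is64BitMode;  // mode == MODE_64BIT
};

// Returns true when a SIB byte without an index was redundant or ambiguous,
// so an explicit EIZ/RIZ index is needed to preserve the encoding.
bool needsIzIndex(const SibSummary &s, bool forceSIB) {
  if (forceSIB)
    return false; // the instruction requires SIB, so nothing is ambiguous
  if (s.scale != 1)
    return true; // a scale without an index is only expressible via SIB
  if (!s.hasBase)
    return !s.is64BitMode; // in 64-bit mode SIB is how rip-relative is avoided
  return !s.baseNeedsSib; // other bases could have used ModR/M alone
}
```

// Under these assumptions, a 32-bit [eax] encoded through SIB gets EIZ, while
// [esp], which can only be encoded through SIB, does not.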

/// translateFPRegister - Translates a stack position on the FPU stack to its
///   LLVM form, and appends it to an MCInst.
///
/// @param mcInst       - The MCInst to append to.
/// @param stackPos     - The stack position to translate.
static void translateFPRegister(MCInst &mcInst,
                                uint8_t stackPos) {
  mcInst.addOperand(MCOperand::createReg(X86::ST0 + stackPos));
}

/// translateMaskRegister - Translates a 3-bit mask register number to
///   LLVM form, and appends it to an MCInst.
///
/// @param mcInst       - The MCInst to append to.
/// @param maskRegNum   - Number of mask register from 0 to 7.
/// @return             - false on success; true otherwise.
static bool translateMaskRegister(MCInst &mcInst,
                                  uint8_t maskRegNum) {
  if (maskRegNum >= 8) {
    debug("Invalid mask register number");
    return true;
  }

  mcInst.addOperand(MCOperand::createReg(X86::K0 + maskRegNum));
  return false;
}

/// translateOperand - Translates an operand stored in an internal instruction
///   to LLVM's format and appends it to an MCInst.
///
/// @param mcInst       - The MCInst to append to.
/// @param operand      - The operand, as stored in the descriptor table.
/// @param insn         - The internal instruction.
/// @param Dis          - The disassembler, used for symbolic operand lookup.
/// @return             - false on success; true otherwise.
static bool translateOperand(MCInst &mcInst, const OperandSpecifier &operand,
                             InternalInstruction &insn,
                             const MCDisassembler *Dis) {
  switch (operand.encoding) {
  default:
    debug("Unhandled operand encoding during translation");
    return true;
  case ENCODING_REG:
    translateRegister(mcInst, insn.reg);
    return false;
  case ENCODING_WRITEMASK:
    return translateMaskRegister(mcInst, insn.writemask);
  case ENCODING_SIB:
  CASE_ENCODING_RM:
  CASE_ENCODING_VSIB:
    return translateRM(mcInst, operand, insn, Dis);
  case ENCODING_IB:
  case ENCODING_IW:
  case ENCODING_ID:
  case ENCODING_IO:
  case ENCODING_Iv:
  case ENCODING_Ia:
    translateImmediate(mcInst,
                       insn.immediates[insn.numImmediatesTranslated++],
                       operand,
                       insn,
                       Dis);
    return false;
  case ENCODING_IRC:
    mcInst.addOperand(MCOperand::createImm(insn.RC));
    return false;
  case ENCODING_SI:
    return translateSrcIndex(mcInst, insn);
  case ENCODING_DI:
    return translateDstIndex(mcInst, insn);
  case ENCODING_RB:
  case ENCODING_RW:
  case ENCODING_RD:
  case ENCODING_RO:
  case ENCODING_Rv:
    translateRegister(mcInst, insn.opcodeRegister);
    return false;
  case ENCODING_CF:
    mcInst.addOperand(MCOperand::createImm(insn.immediates[1]));
    return false;
  case ENCODING_CC:
    if (isCCMPOrCTEST(&insn))
      mcInst.addOperand(MCOperand::createImm(insn.immediates[2]));
    else
      mcInst.addOperand(MCOperand::createImm(insn.immediates[0]));
    return false;
  case ENCODING_FP:
    translateFPRegister(mcInst, insn.modRM & 7);
    return false;
  case ENCODING_VVVV:
    translateRegister(mcInst, insn.vvvv);
    return false;
  case ENCODING_DUP:
    return translateOperand(mcInst, insn.operands[operand.type - TYPE_DUP0],
                            insn, Dis);
  }
}

/// translateInstruction - Translates an internal instruction and all its
///   operands to an MCInst.
///
/// @param mcInst       - The MCInst to populate with the instruction's data.
/// @param insn         - The internal instruction.
/// @param Dis          - The disassembler, used for symbolic operand lookup.
/// @return             - false on success; true otherwise.
static bool translateInstruction(MCInst &mcInst,
                                 InternalInstruction &insn,
                                 const MCDisassembler *Dis) {
  if (!insn.spec) {
    debug("Instruction has no specification");
    return true;
  }

  mcInst.clear();
  mcInst.setOpcode(insn.instructionID);
  // If prefix decoding determined that the overlapping 0xf2 or 0xf3 prefix
  // byte should be disassembled as xacquire or xrelease, use those opcodes
  // instead of the repne and rep opcodes.
  if (insn.xAcquireRelease) {
    if (mcInst.getOpcode() == X86::REP_PREFIX)
      mcInst.setOpcode(X86::XRELEASE_PREFIX);
    else if (mcInst.getOpcode() == X86::REPNE_PREFIX)
      mcInst.setOpcode(X86::XACQUIRE_PREFIX);
  }

  insn.numImmediatesTranslated = 0;

  for (const auto &Op : insn.operands) {
    if (Op.encoding != ENCODING_NONE) {
      if (translateOperand(mcInst, Op, insn, Dis)) {
        return true;
      }
    }
  }

  return false;
}

static MCDisassembler *createX86Disassembler(const Target &T,
                                             const MCSubtargetInfo &STI,
                                             MCContext &Ctx) {
  std::unique_ptr<const MCInstrInfo> MII(T.createMCInstrInfo());
  return new X86GenericDisassembler(STI, Ctx, std::move(MII));
}