Assembly language. Assembler Commands and Fundamentals
This article covers the basics of assembly language for the Win32 architecture. Assembly language is a symbolic notation for machine code. In any computer, the lowest level is the hardware, which is controlled by instructions in machine language. This is the level at which assembly language works.
Writing a program in assembly language is difficult and time-consuming. To create an efficient algorithm, you need a solid understanding of how the computer works, detailed knowledge of the instructions, and a high degree of care and accuracy. Efficiency is the critical parameter when programming in assembly language.
The main advantage of assembly language is that it lets you create short and fast programs. It is therefore used, as a rule, for specialized tasks: when code must work efficiently with hardware components, or when a program has strict memory or run-time requirements.
Registers are memory cells located directly on the processor chip, alongside the ALU. Their distinguishing feature is access speed, which is much higher than that of the computer's RAM. This kind of memory is also called ultra-fast memory (SRAM).
There are the following types of registers:
- General-purpose registers (GPRs).
- Flags register.
- Instruction pointer.
- Segment registers.
There are 8 general-purpose registers, each 32 bits in size.
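The EAX, ECX, EDX, and EBX registers can be accessed as full 32-bit registers, as 16-bit registers (AX, CX, DX, BX, the low halves), and as 8-bit registers (AH and AL, BH and BL, and so on, the high and low bytes of the 16-bit part).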
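The relationship between a 32-bit register and its parts can be sketched as follows. This is only an illustration, assuming 32-bit MASM syntax and a console program linked against kernel32.lib (for ExitProcess); the constant 12345678h is arbitrary.

```asm
; Sketch for 32-bit MASM (ml.exe); kernel32.lib is needed only for ExitProcess.
.386
.model flat, stdcall
.stack 4096
includelib kernel32.lib
ExitProcess PROTO :DWORD

.code
main PROC
    mov eax, 12345678h   ; EAX = 12345678h
    mov bx, ax           ; BX  = 5678h (low 16 bits of EAX)
    mov cl, ah           ; CL  = 56h   (bits 8..15 of EAX)
    mov dl, al           ; DL  = 78h   (bits 0..7 of EAX)
    mov al, 0FFh         ; only the low byte changes: EAX = 123456FFh
    invoke ExitProcess, 0
main ENDP
END main
```

Writing to AL, AH, or AX changes only that part of EAX; the remaining bits keep their previous value.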
The letter "E" in the register names stands for Extended. The names themselves come from their English descriptions:
- Accumulator register (AX) - for arithmetic operations.
- Counter register (CX) - for shifts and loops.
- Data register (DX) - for arithmetic and input / output operations.
- Base register (BX) - for a pointer to the data.
- Stack Pointer register (SP) - for a pointer to the top of the stack.
- Stack Base Pointer register (BP) - for a pointer to the base of the stack.
- Source Index register (SI) - for a pointer to the source.
- Destination Index register (DI) - for a pointer to the destination.
The specialization of the general-purpose registers is only a convention: they can be used in any operation. However, some instructions work only with particular registers. For example, the loop instructions use ECX to hold the counter value.
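As an illustration of an instruction tied to a particular register, here is a minimal sketch that sums the numbers 5 down to 1 with LOOP, which implicitly uses ECX. It assumes 32-bit MASM syntax and linking against kernel32.lib; the label name sum_loop is made up for the example.

```asm
; Sketch for 32-bit MASM (ml.exe); kernel32.lib is needed only for ExitProcess.
.386
.model flat, stdcall
.stack 4096
includelib kernel32.lib
ExitProcess PROTO :DWORD

.code
main PROC
    xor eax, eax         ; EAX = 0, the running sum
    mov ecx, 5           ; ECX = loop counter
sum_loop:
    add eax, ecx         ; adds 5, 4, 3, 2, 1 in turn
    loop sum_loop        ; ECX = ECX - 1; jump back while ECX is not 0
    ; here EAX = 15 and ECX = 0
    invoke ExitProcess, 0
main ENDP
END main
```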
Flags register. A flag is a single bit that takes the value 0 or 1. Taken together, the flags show the state of the processor (the EFLAGS register is 32 bits wide, although some of its bits are reserved). Examples of flags: Carry Flag (CF), Overflow Flag (OF), Nested Task flag (NT), and many others. Flags are divided into 3 groups: status, control, and system.
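The effect of an arithmetic instruction on the status flags can be sketched like this. The example assumes 32-bit MASM syntax and linking against kernel32.lib; the labels carried and after_check are made up for the illustration.

```asm
; Sketch for 32-bit MASM (ml.exe); kernel32.lib is needed only for ExitProcess.
.386
.model flat, stdcall
.stack 4096
includelib kernel32.lib
ExitProcess PROTO :DWORD

.code
main PROC
    mov al, 0FFh
    add al, 1            ; AL wraps to 0: CF = 1 (unsigned carry), ZF = 1, OF = 0
    jc  carried          ; taken, because CF = 1
    mov bl, 0
    jmp after_check
carried:
    mov bl, 1            ; BL = 1 records that a carry occurred
after_check:
    mov al, 7Fh
    add al, 1            ; AL = 80h: OF = 1 (signed overflow), CF = 0
    invoke ExitProcess, 0
main ENDP
END main
```

CF reports overflow of the unsigned result and OF overflow of the signed result, so the same ADD can set one without the other.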
Instruction pointer (EIP). This register contains the address of the instruction to be executed next, unless a jump or call says otherwise.
Segment registers (CS, DS, SS, ES, FS, GS). They exist because of segmented memory management, which was introduced to increase the amount of memory available to programs. Thanks to them, memory of up to 4 GB could be managed. In the Win32 architecture there is no need for segments, but the registers have been kept and are used differently.
The stack is a memory area set aside for procedures to work with. Its peculiarity is that the data written to it last is read first. In other words, the first entries placed on the stack are retrieved last. You can picture this as a tower of checkers. To get a checker from the bottom or the middle of the tower, you must first remove all those lying on top of it. Accordingly, the last piece put on the tower is the first one removed when the tower is taken apart. This way of organizing memory is dictated by economy: the stack is constantly being cleared, and at any given moment only one procedure is using it.
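The last-in, first-out order can be seen with PUSH and POP. This is a minimal sketch, assuming 32-bit MASM syntax and linking against kernel32.lib; the values 1, 2, 3 are arbitrary.

```asm
; Sketch for 32-bit MASM (ml.exe); kernel32.lib is needed only for ExitProcess.
.386
.model flat, stdcall
.stack 4096
includelib kernel32.lib
ExitProcess PROTO :DWORD

.code
main PROC
    mov eax, 1
    mov ebx, 2
    mov ecx, 3
    push eax             ; stack, top first: 1
    push ebx             ; 2, 1
    push ecx             ; 3, 2, 1
    pop edx              ; EDX = 3 (pushed last, popped first)
    pop esi              ; ESI = 2
    pop edi              ; EDI = 1 (pushed first, popped last)
    invoke ExitProcess, 0
main ENDP
END main
```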
Identifiers, integers, symbols, comments, equivalence
An identifier in assembly language means the same as in any other programming language. It may contain Latin letters, digits, and the characters "_", ".", "?", "@", and "$". Uppercase and lowercase letters are equivalent, and a dot may appear only as the first character of an identifier.
Integers can be written in number systems with bases 2, 8, 10, and 16. Any other notation will be treated by the assembler as an identifier.
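For illustration, the sketch below writes the same value, ten, in each of the four bases. It assumes MASM's radix suffixes (b, o or q, d, h) and linking against kernel32.lib.

```asm
; Sketch for 32-bit MASM (ml.exe); kernel32.lib is needed only for ExitProcess.
.386
.model flat, stdcall
.stack 4096
includelib kernel32.lib
ExitProcess PROTO :DWORD

.code
main PROC
    mov eax, 1010b       ; binary
    mov ebx, 12o         ; octal (12q is also accepted)
    mov ecx, 10          ; decimal (10d is also accepted)
    mov edx, 0Ah         ; hexadecimal; must begin with a digit
    ; EAX, EBX, ECX and EDX now all hold the value ten
    invoke ExitProcess, 0
main ENDP
END main
```

The leading zero in 0Ah matters: written as Ah, the token would be read as a name (here, the AH register) rather than as a number.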
Character data may be enclosed in either apostrophes or quotation marks. If one of these characters has to appear inside the string itself, the rules are as follows:
- in a string enclosed in apostrophes, quotation marks are written once and an apostrophe twice: 'can''t', 'he said "to be or not to be"';
- in a string enclosed in quotation marks, the rule is reversed: quotation marks are doubled and apostrophes are written as is: "couldn't", "My favorite bar is ""Black Cat""".
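The quoting rules above can be tried out with DB string definitions. This is a sketch assuming 32-bit MASM syntax and linking against kernel32.lib; the variable names s1 to s4 are made up, and LENGTHOF is a MASM operator used here only to count the stored bytes.

```asm
; Sketch for 32-bit MASM (ml.exe); kernel32.lib is needed only for ExitProcess.
.386
.model flat, stdcall
.stack 4096
includelib kernel32.lib
ExitProcess PROTO :DWORD

.data
s1 db 'can''t', 0                            ; stored as: can't
s2 db "couldn't", 0                          ; stored as: couldn't
s3 db 'he said "to be or not to be"', 0      ; quotation marks kept as is
s4 db "My favorite bar is ""Black Cat""", 0  ; stored with single quotation marks

.code
main PROC
    mov eax, LENGTHOF s1 ; 6 bytes: c, a, n, ', t and the terminating 0
    invoke ExitProcess, 0
main ENDP
END main
```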
Comments are introduced by a semicolon (";"). A comment may start at the beginning of a line or follow an instruction, and it ends at the end of the line.
The equivalence directive plays the same role as constant expressions in other languages. It is written as follows:
<name> EQU <operand>
As a result, every occurrence of <name> in the program is replaced with <operand>, which may be an integer, an address, a string, or another name. The EQU directive works much like #define in C++.
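A small sketch of EQU in use follows. It assumes 32-bit MASM syntax and linking against kernel32.lib; BUF_SIZE and buffer are hypothetical names chosen for the illustration.

```asm
; Sketch for 32-bit MASM (ml.exe); kernel32.lib is needed only for ExitProcess.
.386
.model flat, stdcall
.stack 4096
includelib kernel32.lib
ExitProcess PROTO :DWORD

BUF_SIZE EQU 16                       ; a named constant, no memory is allocated

.data
buffer db BUF_SIZE dup (0)            ; 16 bytes, all zero

.code
main PROC
    mov ecx, BUF_SIZE                 ; assembled as mov ecx, 16
    mov al, [buffer + BUF_SIZE - 1]   ; last byte of the buffer
    invoke ExitProcess, 0
main ENDP
END main
```

Changing the single EQU line changes both the size of the buffer and every place that refers to it.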
High-level languages (C++, Pascal) are typed: they work with data of a definite type, provide functions for processing it, and so on. Assembly language has nothing of the kind. There are only 5 directives for defining data:
- DB - Byte: allocate 1 byte for a variable.
- DW - Word: allocate 2 bytes.
- DD - Double word: allocate 4 bytes.
- DQ - Quad word: allocate 8 bytes.
- DT - Ten bytes: allocate 10 bytes for a variable.
The letter D means Define.
Any directive can be used to declare any data and arrays. However, for strings, DB is recommended.
Syntax (shown here with DQ; the other directives follow the same pattern):
<name> DQ <operand> [, <operand>]
An operand may be a number, a character or string, or a question mark ("?"), which denotes a variable without an initial value. Consider some examples:
real1 DD 12.34
char  db 'c'
ar2   db '123456', 0        ; array of 7 bytes
num1  db 11001001b          ; binary number
num2  dw 7777o              ; octal number
num3  dd -890d              ; decimal number
num4  dd 0beah              ; hexadecimal number
var1  dd ?                  ; variable without an initial value
ar3   dd 50 dup (0)         ; array of 50 elements initialized to 0
ar4   dq 5 dup (0, 1, 1.25) ; array of 15 elements: 0, 1 and 1.25 repeated 5 times
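To check how much memory such definitions occupy, a sketch like the following can help. It reuses ar2 and ar3 from the examples and relies on the MASM operators TYPE, LENGTHOF and SIZEOF, which are not otherwise covered in this article; 32-bit MASM and linking against kernel32.lib are assumed.

```asm
; Sketch for 32-bit MASM (ml.exe); kernel32.lib is needed only for ExitProcess.
.386
.model flat, stdcall
.stack 4096
includelib kernel32.lib
ExitProcess PROTO :DWORD

.data
ar2 db '123456', 0       ; 7 elements of 1 byte each
ar3 dd 50 dup (0)        ; 50 elements of 4 bytes each

.code
main PROC
    mov eax, TYPE ar3        ; 4   (size of one element in bytes)
    mov ebx, LENGTHOF ar3    ; 50  (number of elements)
    mov ecx, SIZEOF ar3      ; 200 (total size in bytes)
    mov edx, SIZEOF ar2      ; 7
    invoke ExitProcess, 0
main ENDP
END main
```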
An assembler instruction has the following syntax:
<label>: <instruction> <operands> [; comment]
A label must end with a colon and may be placed on a separate line. Labels are used to refer to instructions within a program.
The instruction specifies the operation to be performed. Operations are written as short letter abbreviations, also called mnemonics, to make them easier to read.
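The pieces of this syntax (label, mnemonic, operands, comment) can be seen together in a short sketch that leaves the larger of two numbers in EAX. It assumes 32-bit MASM syntax and linking against kernel32.lib; the values and the label keep_eax are made up for the example.

```asm
; Sketch for 32-bit MASM (ml.exe); kernel32.lib is needed only for ExitProcess.
.386
.model flat, stdcall
.stack 4096
includelib kernel32.lib
ExitProcess PROTO :DWORD

.code
main PROC
    mov eax, 7           ; first number
    mov ebx, 12          ; second number
    cmp eax, ebx         ; compare them (sets the flags)
    jge keep_eax         ; if EAX >= EBX (signed), skip the next instruction
    mov eax, ebx         ; otherwise take the larger value from EBX
keep_eax:                ; a label on a line of its own
    ; here EAX = 12
    invoke ExitProcess, 0
main ENDP
END main
```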
An operand of an instruction can be:
- a register, referred to by its name;
- a constant (an immediate value);
- a memory address.
More about addresses
An address can be specified in several ways:
- As a variable name, which in assembly language is a synonym for an address.
- If the variable is an array, an element is accessed through the variable name and an offset. There are 2 forms: [<name> + <offset>] and <name>[<offset>]. Note that the offset is not an array index but a distance in bytes. The programmer must work out how many bytes to offset to reach the desired element.
- Through a register. To access the memory whose address is held in a register, put the register in square brackets: [ebx], [edi].
- Through an expression. Square brackets may contain a more complex expression that computes the address: [esi + 2 * eax].
In assembly language, a memory reference is written in square brackets. Since a variable name is itself an address, it can be used with or without square brackets.
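The addressing forms described above can be compared side by side. This is a sketch assuming 32-bit MASM syntax and linking against kernel32.lib; the array arr and its contents are made up, and OFFSET is the MASM operator that yields the address of a variable.

```asm
; Sketch for 32-bit MASM (ml.exe); kernel32.lib is needed only for ExitProcess.
.386
.model flat, stdcall
.stack 4096
includelib kernel32.lib
ExitProcess PROTO :DWORD

.data
arr dd 10, 20, 30, 40        ; four 4-byte elements

.code
main PROC
    mov eax, arr             ; variable name alone: EAX = 10
    mov eax, [arr]           ; the same access, written with brackets
    mov eax, [arr + 8]       ; byte offset 8 = element 2 (2 * 4): EAX = 30
    mov eax, arr[8]          ; second form of the same access
    mov ebx, OFFSET arr      ; EBX = address of arr
    mov eax, [ebx]           ; memory at the address in EBX: EAX = 10
    mov esi, 3
    mov eax, [ebx + esi*4]   ; address computed in brackets: EAX = 40
    invoke ExitProcess, 0
main ENDP
END main
```

Because the elements are 4 bytes wide, the element with index n lies at byte offset n * 4 from the start of the array.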
In addition, instruction descriptions use abbreviations: r for a register, m for memory, and i for an immediate operand (a constant). They are combined with the numbers 8, 16, and 32 to indicate the operand size: r8, m16, i32, and so on. For example:
add m8/m16/m32, i8/i16/i32 ; add an immediate operand to a memory cell
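To connect these abbreviations with real code, the sketch below annotates several forms of ADD with the operand kinds they use. It assumes 32-bit MASM syntax and linking against kernel32.lib; the variable total and all values are made up.

```asm
; Sketch for 32-bit MASM (ml.exe); kernel32.lib is needed only for ExitProcess.
.386
.model flat, stdcall
.stack 4096
includelib kernel32.lib
ExitProcess PROTO :DWORD

.data
total dd 100

.code
main PROC
    mov eax, 1
    mov ebx, 2
    add eax, ebx         ; r32, r32: EAX = 3
    add eax, 10          ; r32, immediate: EAX = 13
    add total, eax       ; m32, r32: total = 113
    add total, 7         ; m32, immediate: total = 120
    add al, 2            ; r8, i8: AL = 0Fh, so EAX = 15
    invoke ExitProcess, 0
main ENDP
END main
```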
The MOV (move) instruction
This is the most basic assembler instruction. It copies into a register the value of another register, a memory cell, or a constant. It can also write the value of a register or a constant into a memory cell. Syntax, where <op1> is the destination and <op2> is the source:
MOV <op1>, <op2>
The processor has other data-transfer instructions as well. For example, XCHG exchanges the values of its two operands. From the programmer's point of view, though, they can all be expressed through the basic MOV instruction. Consider some examples:
MOV i, 0      ; write the value 0 to i
MOV ECX, EBX  ; copy EBX to ECX
An operand can be a register or a memory cell. However, although data can be moved from one register to another, it cannot be moved directly between two memory cells. Make sure that the operands are the same size. Note also that MOV does not change the flags.
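These restrictions can be illustrated with a short sketch: copying one variable to another goes through a register, and operand sizes are matched. It assumes 32-bit MASM syntax and linking against kernel32.lib; the names src, dst and count8 are hypothetical.

```asm
; Sketch for 32-bit MASM (ml.exe); kernel32.lib is needed only for ExitProcess.
.386
.model flat, stdcall
.stack 4096
includelib kernel32.lib
ExitProcess PROTO :DWORD

.data
src    dd 1234
dst    dd 0
count8 db 5

.code
main PROC
    mov eax, src         ; memory -> register
    mov dst, eax         ; register -> memory; "mov dst, src" would not assemble
    mov ebx, 10          ; constant -> register
    xchg eax, ebx        ; swap: EAX = 10, EBX = 1234
    mov cl, count8       ; 8-bit variable into an 8-bit register; sizes match
    invoke ExitProcess, 0
main ENDP
END main
```

Writing mov ecx, count8 instead would be rejected by the assembler, because a 32-bit register and an 8-bit variable differ in size.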
Further theoretical study of assembly language can be difficult, so it is worth thinking about the tools used to write programs in it. Here is a short list of popular ones:
- Borland Turbo Assembler (TASM) is one of the most popular tools. It is well suited to development for DOS and poorly suited to Windows.
- Microsoft Macro Assembler (MASM) is a package for assembly-language development on Windows. It is available both separately and as part of Visual Studio. Assembly language and high-level languages are often compatible, in the sense that a high-level language such as C++ can use assembly code directly.
- Netwide Assembler (NASM) is a popular free assembler for Intel architecture.
There are many other tools. Note in particular that there is no single standard for assembler syntax. The two most widely used are AT&T syntax, oriented toward non-Intel processors, and Intel syntax.
Despite its apparent complexity, assembly language is a simple language that is easy to understand. You can therefore safely pick up introductory literature in the style of "assembler for dummies" and learn this wonderful language.