Introduction to assembly macros (FASM 64-bit).
Some important notes that you should read before starting the tutorial:
- The tutorial samples are intended for people who, on the one hand, have at least a basic knowledge of writing assembly programs and, on the other hand, are new to assembly macros. The samples may be seen as starting points, or templates, for your own assembly projects involving macros.
- All code in the tutorial is 64-bit assembly; it will not work on a 32-bit target without significant modifications.
- The samples have been developed and tested on Windows 11. They should run without problems on other 64-bit Windows releases. No effort has been made to guarantee any portability to Linux or macOS.
- The tutorial samples are intended to be used with FASM. There is probably a certain compatibility with NASM; they will probably not work with MASM or TASM, and certainly not with GAS (which uses the AT&T syntax).
- I developed these samples using the SASM IDE (cf. my tutorial 32bit and 64-bit assembly programming made easy with SASM), which automatically creates the C bindings needed for input-output operations. If you use another IDE, you will have to set this up yourself.
The FASM preprocessor.
FASM, like many other assemblers, includes a so-called preprocessor, i.e. a program (or part of a program such as an assembler or compiler) that modifies the source code before it is assembled (resp. compiled). The preprocessor scans the source and replaces some things with others. For example, you can use a name to designate some lines of code that you use several times, and the preprocessor replaces this name with the corresponding code. Another example is the creation of your own, new instructions, which the FASM preprocessor will replace with code using regular x64 assembly instructions. This means that you, the programmer, have to tell the preprocessor what it should preprocess and how. This is done using preprocessor directives. An important point to remember is that the preprocessor has no knowledge of assembly; all it understands are these directives, and any parts of the code not meant for the preprocessor are passed on unchanged.
The first thing that the FASM preprocessor does is remove all comments from the code. So, anything after a semicolon will never be passed to the assembler.
FASM lets you break an instruction into several lines using the line-break character "\" (backslash). The preprocessor removes this character and concatenates the two lines of code into one.
The include directive is used to include assembly code from a file. The preprocessor reads this file and replaces the directive with the code contained in that file.
The preprocessor translates string literals to binary.
The pseudo-instruction equ is in fact a simple preprocessor directive. If your code contains something like
array_length equ 20
the preprocessor will replace each occurrence of "array_length" by "20".
And finally, the preprocessor replaces all macros by the corresponding macro definition.
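The first preprocessing steps described above can be pictured with a small fragment. This is only a sketch: array_length is the constant from the text, whereas the label "table" is made up for the illustration.

```asm
; Sketch only: "table" is a hypothetical label.
array_length equ 20        ; this comment is dropped by the preprocessor

; One instruction written on two physical lines; the backslash joins them.
table dq array_length \
         dup (0)

; What the assembler receives after preprocessing:
;   table dq 20 dup (0)
```

The fragment contains no format or section directives, so FASM would assemble it as a flat binary holding 20 zeroed quadwords.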
Macros without arguments.
Macros, or macroinstructions, are custom instructions, defined by the programmer, and replaced with regular assembly instructions by the preprocessor. Macros are "declared" to the preprocessor by a macro definition, which starts with the preprocessor directive macro. In the simplest case, a macro definition is of the following form:
macro <macro-name> {
<macro-body>
}
and in our assembly program, we'll use <macro-name> to call the macro (what happens in reality is that the preprocessor replaces <macro-name> with the assembly instructions that make up the macro body, and this code will be executed).
As an example, let's write two macros that push resp. pop the 4 data registers RAX, RBX, RCX, and RDX. Here is the code (the macro names are up to you, of course):
macro pushregs {
push rax rbx rcx rdx
}
macro popregs {
pop rdx rcx rbx rax ; reverse order of the pushes
}
As noted before, the preprocessor doesn't know anything about assembly instructions. This means that if we use the name of a regular x64 assembly instruction as the name of a macro, the preprocessor will replace this instruction with the macro definition. This is a kind of "overloading" of the assembly instruction with the macro. As an example, consider the instruction pusha, which pushes the registers AX, CX, DX, BX, SP, BP, SI, and DI onto the stack. When using this instruction in 64-bit assembly, we get an "illegal instruction" error (pusha is not available in 64-bit mode). If we now define the following macro
macro pusha {
push rax rcx rdx rbx rsp rbp rsi rdi
}
and use pusha in our assembly source, the preprocessor will replace it with the macro body, and the code assembles without errors, pushing the 8 (64-bit) registers onto the stack. So, using assembly instructions as macro names is possible, but it is probably not the best practice!
Consider the following program sample, that displays a "Hello World" message after having cleared the screen by printing the "clear-screen" ANSI escape sequence (cf.
my tutorial Text positioning and coloring using ANSI escape sequences on Windows).
format ELF64
section '.data' writeable
ecls db 1Bh, '[2J', 00h
hello db 'Hello World!', 0Dh, 0Ah, 00h
section '.text' executable
public main
extrn printf
main:
mov rbp, rsp
sub rsp, 32
and rsp, -16
mov rcx, ecls
call printf
mov rcx, hello
call printf
mov rsp, rbp
xor rax, rax
ret
How can we implement the clear screen function as a macro?
Our macro will have two particularities: 1. it has to call the C function printf; 2. it has to include the declaration of the ANSI escape sequence (a string literal).
The call to printf need not bother us. The preprocessor is only concerned with the directives that are intended for it, and the assembler will have no problem dealing with the call when assembling the code with which the preprocessor replaces the macro call.
In the program above, the escape sequence has been defined in the .data section. That is how it should be and normally is. However, FASM doesn't have any problem with data being declared in the .text section, and I guess that this is not unusual in macro definition code. The only thing to pay attention to is to make sure that the data can never be interpreted as instructions and is never "executed".
Here is a first version of the cls macro definition (we will see below that this code can result in an error message during assembly):
format ELF64
macro cls {
jmp continue
ecls db 1Bh, '[2J', 00h
continue:
mov rcx, ecls
call printf
}
The code is easy to understand. We declare the ANSI escape sequence within the macro just as we did before in the .data section. This data part (within the code) is skipped by using a jmp instruction.
Here is another way to implement the declaration of some data within an assembly code part (the error during assembly being possible just as before):
format ELF64
macro cls {
call continue
db 1Bh, '[2J', 00h
continue:
pop rcx
call printf
}
To avoid the "execution" of the data, this time we use a call instruction (instead of the jmp instruction used before). Before actually transferring control to the subroutine, this instruction pushes the return address onto the stack. This return address is the address of the first byte following the call instruction. And that is ... the address of our ANSI escape sequence data! Thus, by popping the return address into RCX, we load RCX with the address of the string to display, which is all that is needed to call printf.
I said that these two implementations of the macro can result in an error during assembly. Do you see why? What do you think happens if the macro is called twice? The preprocessor replaces each macro call with the corresponding macro definition, i.e. the assembly instructions coded as macro body. Do you see now? If cls is called twice, the macro body is copied twice. This is true in particular for the two labels "ecls" and "continue". The assembler would find itself with a program where two labels are declared twice; the result would be a "symbol already defined" error!
To avoid this problem, FASM includes the directive local. If a label is defined as being local, its name will be dynamically changed by the preprocessor. In fact, each time a local label is encountered, a suffix of the form ?x, where x is a hexadecimal number, incremented during each replacement, is added to the label name. In the code, presented to the assembler, the label "continue" would thus be named "continue?1", when cls is called for the first time, "continue?2", when it is called for the second time, etc. No more duplicate label declarations, and the program can be assembled without errors, independently of how many times the macro is called.
Here is the code of the sample program macro1.asm that displays a "Hello World" message after having cleared the screen using the
corrected version of the first implementation of the cls macro. The usage of labels, starting with 2 dots (..) is not mandatory, but it
seems kind of good practice to use this prefix when naming labels local to a macro.
format ELF64
macro cls {
local ..ecls, ..code
jmp ..code
..ecls db 1Bh, '[2J', 00h
..code:
mov rcx, ..ecls
call printf
}
section '.data' writeable
hello db 'Hello, World!', 0Dh, 0Ah, 00h
section '.text' executable
public main
extrn printf
main:
mov rbp, rsp
sub rsp, 32
and rsp, -16
cls
mov rcx, hello
call printf
mov rsp, rbp
xor rax, rax
ret
You can download the source code of all program samples of the tutorial from my website. Note that the program macro2.asm, contained in the download archive, is identical to macro1.asm, but using call instead of jmp in the macro definition.
Macros with (simple) arguments.
Like functions and procedures, macros may have one or more arguments. If there are several arguments, they are separated with a comma (,). General form of the definition of a macro with arguments:
macro <macro-name> <argument1>, <argument2>, ... {
<macro-body>
}
Example with one argument: Macro to print an unsigned integer:
macro print_int number {
local ..fmt, ..code
jmp ..code
..fmt db '%u', 0Dh, 0Ah, 00h
..code:
mov rcx, ..fmt
mov rdx, number
call printf
}
This macro can be called with either a 64-bit register, or a 64-bit value located in memory; examples:
print_int rax
print_int [integer] (with, for example: integer dq 20)
Example with 2 arguments: Swapping two 64-bit values (located either in a register or memory):
macro swap val1, val2 {
push val1
push val2
pop val1
pop val2
}
Calling examples:
swap rax, rbx
swap rax, [integer]
swap [integer1], [integer2]
Note: If a macro is called with fewer arguments than specified in its definition, the remaining arguments will have empty values. By placing the asterisk (*) symbol after an argument name, you can mark this argument as required; the preprocessor will not allow it to have an empty value. Optional arguments may be assigned a default value, using the equal (=) sign.
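As an illustration of required and default arguments, here is a minimal sketch. The macro name load_pair and its argument names are invented for this example and are not part of the tutorial samples; use64 is only there so that the fragment can be assembled on its own as a flat binary.

```asm
use64
; Hypothetical macro: "first" is required, "second" defaults to 0 when omitted.
macro load_pair first*, second=0 {
    mov rax, first
    mov rbx, second
}

load_pair 5, 7      ; mov rax, 5 / mov rbx, 7
load_pair 5         ; mov rax, 5 / mov rbx, 0
```

Calling load_pair without any argument would be rejected by the preprocessor, because the first argument is marked as required.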
Is it possible to pass a string literal to a macro? No problem, just do it the same way as with a register or an address. The following macro displays a string,
passed as literal; carriage-return-linefeed and null terminator are added by the macro.
macro print_literal string {
local ..str, ..code
jmp ..code
..str db string
db 0Dh, 0Ah, 00h
..code:
mov rcx, ..str
call printf
}
Calling example:
print_literal "Hello, World!"
We can improve our macro, letting the user decide to add the end-of-line characters, or not. One way would be to add a Boolean argument and depending on this argument printing a carriage-return-linefeed, or not. Another way is to add, or not the end-of-line characters 0Dh and 0Ah to the string (the macro argument). Here, we have a problem however. The string that we use is of the form <literal>, 0Dh, 0Ah, and the two commas would be interpreted as argument separators. The FASM preprocessor uses a special format to specify arguments that include commas: you'll have to enclose the argument using <>.
Here is the new version of the macro:
macro print_literal string {
local ..str, ..code
jmp ..code
..str db string
db 00h
..code:
mov rcx, ..str
call printf
}
And we can call it with, or without the carriage-return-linefeed.
print_literal "Hello, "
print_literal <"World!", 0Dh, 0Ah>
The program sample fact.asm calculates the factorial of a number entered by the user. The display is done using the macros
print_literal and print_int.
format ELF64
macro print_int integer {
local ..fmt, ..code
jmp ..code
..fmt db '%u', 0Dh, 0Ah, 00h
..code:
mov rcx, ..fmt
mov rdx, integer
call printf
}
macro print_literal string {
local ..str, ..code
jmp ..code
..str db string
db 00h
..code:
mov rcx, ..str
call printf
}
section '.data' writeable
frmat db '%u', 00h
number dq ?
save dq ?
section '.text' executable
public main
extrn printf
extrn scanf
main:
mov rbp, rsp
sub rsp, 32
and rsp, -16
print_literal "Please, enter an integer number from 1 to 20? "
mov rcx, frmat
mov rdx, number
call scanf
mov rax, [number]
cmp rax, 1
jl invalid
cmp rax, 20
jg invalid
mov rbx, rax
next:
dec rbx
cmp rbx, 0
je done
mul rbx
jmp next
done:
mov [save], rax
print_literal "The factorial of this number is "
mov rax, [save]
print_int rax
jmp exit
invalid:
print_literal <"Error: Number out of range!", 0Dh, 0Ah>
exit:
mov rsp, rbp
xor rax, rax
ret
I guess that the simplest way to return a string from a macro is to pass, as an argument, a pointer to the memory location where it should be stored. The program sample convert.asm reads a string from the keyboard (using the macro str_read, which also transforms underscores to spaces, a workaround to allow entering strings containing spaces), and transforms it to uppercase (using the macro uppercase) and to lowercase (using the macro lowercase). The original string and the two converted strings are printed out using the macro str_writeln. The program also shows that one macro can call another one.
format ELF64
; Read a string of maximum length (defined by input format)
macro str_read text, string, fmt {
local ..txt, ..fmt, ..code, ..next, ..continue, ..done
jmp ..code
..txt db text, 00h
..fmt db fmt, 00h
..code:
mov rcx, ..txt
call printf
mov rcx, ..fmt
mov rdx, string
call scanf
; Convert '_' to space
mov rdi, string
..next:
mov al, [rdi]
cmp al, 00h
je ..done
cmp al, '_'
jne ..continue
mov byte [rdi], ' '
..continue:
inc rdi
jmp ..next
..done:
}
; Print a string
macro str_write string {
mov rcx, string
call printf
}
; Print a string with CR-LF
macro str_writeln string {
str_write string
newline
}
; Print a new line
macro newline {
local ..str, ..code
jmp ..code
..str db 0Dh, 0Ah, 00h
..code:
mov rcx, ..str
call printf
}
; Convert a string to uppercase
macro uppercase string {
local ..next, ..continue, ..done
mov rsi, string
mov rdi, rsi
..next:
lodsb
cmp al, 00h
je ..done
cmp al, 'a'
jl ..continue
cmp al, 'z'
jg ..continue
sub al, 32
..continue:
stosb
jmp ..next
..done:
}
; Convert a string to lowercase
macro lowercase string {
local ..next, ..continue, ..done
mov rsi, string
mov rdi, rsi
..next:
lodsb
cmp al, 00h
je ..done
cmp al, 'A'
jl ..continue
cmp al, 'Z'
jg ..continue
add al, 32
..continue:
stosb
jmp ..next
..done:
}
; BSS section
section '.bss' writeable
string db 101 dup (?)
; Code section (main program)
section ".text" executable
public main
extrn printf
extrn scanf
main:
mov rbp, rsp;
sub rsp, 32
and rsp, -16
str_read "Please enter a string? ", string, "%100s"
str_writeln string
uppercase string
str_writeln string
lowercase string
str_writeln string
mov rsp, rbp
xor rax, rax
ret
Macros with group arguments.
Group arguments make it possible to specify several values for an argument when calling a macro (and turn the macro into one with a variable number of arguments). In the macro definition, they are enclosed in square brackets. If the macro definition also includes simple arguments, these must precede the group argument(s). When calling a macro with one group argument, all values specified after the simple arguments will be passed to the group argument. General format of the definition of a macro with one group argument:
macro <macro-name> <argument1>, <argument2>, ..., [<group-argument>] {
<macro-body>
}
Macros with a group argument are special. In fact, the macro body is repeated (expanded once) for each value passed to the group argument.
Example:
macro name_list count, [names] {
db count
db names, 00h
}
The macro call name_list 3, "Aly", "Juno", "Yoko" will be transformed by the preprocessor into the following code:
db 3
db "Aly", 00h
db 3
db "Juno", 00h
db 3
db "Yoko", 00h
Macros with a group argument may include the special preprocessor directives forward, reverse, and common. These directives make it possible to subdivide the macro body into blocks, processed one after the other. In a forward block, all block instructions are repeated for each value of the group argument. It's the same for reverse blocks, except that the values are taken in reverse order (starting with the last one). Instructions of a common block, on the other hand, are generated only once.
Let's use blocks to improve our "name_list" macro:
macro name_list count, [names] {
common
db count
forward
db names, 00h
}
The macro call name_list 3, "Aly", "Juno", "Yoko" will now be transformed by the preprocessor as follows:
db 3
db "Aly", 00h
db "Juno", 00h
db "Yoko", 00h
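The reverse directive is not used in the samples of this tutorial. A typical use, sketched here with two hypothetical macros (save_regs and restore_regs are names chosen for this illustration), is to pop registers in the opposite order to that in which they were pushed:

```asm
use64
; Hypothetical macros, for illustration only.
macro save_regs [reg] {
  forward
    push reg            ; one push per value, first to last
}
macro restore_regs [reg] {
  reverse
    pop reg             ; one pop per value, last to first
}

save_regs rax, rbx, rcx       ; push rax / push rbx / push rcx
restore_regs rax, rbx, rcx    ; pop rcx / pop rbx / pop rax
```

Both macros are called with the same register list, and the reverse block takes care of the ordering.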
The program sample macro3.asm shows how to use a macro with a group argument to calculate and print the sum of 2 or more integers.
format ELF64
macro print_sum [numbers] {
common
local ..fmt, ..code
jmp ..code
..fmt db 'Sum is: %Ld', 0Dh, 0Ah, 00h
..code:
mov rcx, ..fmt
xor rdx, rdx
forward
add rdx, numbers
common
call printf
}
section '.text' executable
public main
extrn printf
main:
mov rbp, rsp
sub rsp, 32
and rsp, -16
print_sum 10, 10
print_sum 1, -2, 3, -4
print_sum 1, 2, 3, 4, 5, 6, 7, 8, 9
mov rsp, rbp
xor rax, rax
ret
If you pass several values to a group argument used in a common block, then all values are passed to this argument at once. Example:
macro define_string [strings] {
common
db strings, 00h
}
When calling the macro as define_string "Hello, World!", 0Dh, 0Ah, the preprocessor will generate the code
db "Hello, World!", 0Dh, 0Ah, 00h
A macro definition may include two (or more) group-arguments. The names of the different group-arguments have to be placed between square brackets and separated by a comma. When the macro is called, the values are passed alternately to the different group-arguments. In the case of 2 group-arguments: value 1 to first argument, value 2 to second argument, value 3 to first argument, etc.
The program sample macro4.asm shows how to use a macro with two group arguments to display a list of persons with their profession.
format ELF64
macro persons_list [names, professions] {
common
local ..fmt, ..code
jmp ..code
..fmt db '%s is a %s', 0Dh, 0Ah, 00h
..code:
forward
local ..name, ..prof, ..continue
jmp ..continue
..name db names, 00h
..prof db professions, 00h
..continue:
mov rcx, ..fmt
mov rdx, ..name
mov r8, ..prof
call printf
}
section '.text' executable
public main
extrn printf
main:
mov rbp, rsp
sub rsp, 32
and rsp, -16
persons_list "Aly Baba", "programmer"
persons_list "Aly", "Free Pascal programmer", "Juno", "movie actress", "Yoko", "chemical engineer"
mov rsp, rbp
xor rax, rax
ret
Preprocessor operators.
The concatenation operator (#) is used to concatenate two symbols into one. Example: Generation of a conditional jump depending on macro
argument:
macro jump_if operand1, condition, operand2, label {
cmp operand1, operand2
j#condition label
}
When calling the macro as jump_if ecx, le, eax, next, the preprocessor will generate the code
cmp ecx, eax
jle next
You may also use this operator to concatenate quoted strings into one. We'll see an example further down in the text.
The "string" operator (`) is used to transform a symbol into a quoted string. This is, in particular, useful when passing a quoted string argument from one macro to another.
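Both operators can be seen at work in the following minimal sketch. The macro named_byte and the symbol "counter" are hypothetical names used only for this illustration.

```asm
; Hypothetical macro: defines a byte variable plus a string holding its own name.
macro named_byte name, value {
    name db value
    name#_str db `name, 00h    ; # builds the label, ` turns the symbol into a string
}

named_byte counter, 5
; generates:
;   counter     db 5
;   counter_str db 'counter', 00h
```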
As code example of the preprocessor operators, let's review the sample program convert.asm. This program uses the macro str_read to read a string from the keyboard, one of its arguments being the input format for the C function scanf(), that actually does the reading. Wouldn't it be better (or, at least nicer) if, instead, we could simply use the maximum string length? That's what is implemented in sample program convert2.asm (that I started with as a copy of convert.asm).
As I wanted to keep the name of the string reading macro in the main program, I renamed the original "str_read" macro to "readstring". It's this macro that will call scanf(), just as before, thus, no code changes here. However, the main program will not call this macro directly, but will call the new "str_read" macro, that transforms the maximum string length argument (passed by the main program) into a scanf() input string, as it is expected by "readstring", that it calls to actually do the reading.
Here is the code of the new "str_read" macro:
macro str_read text, string, maxlen {
readstring `text, string, "%"#maxlen#"s"
}
The macro "readstring" (the original "str_read" macro from program sample convert.asm) is defined with 3 arguments: a quoted string (text to display), a symbol (label for the input buffer), and another quoted string (the input format). I said above that the concatenation operator may also be used with quoted strings. Here is the example: we use it to create the input format from 3 quoted strings (the maximum string length being passed as an argument to "str_read"). The macro also shows the usage of the "string" operator. As the text to display has to be passed as a quoted string to the "readstring" macro, we apply this operator to the first argument.
With these changes done, we can read our string, calling the macro as str_read "Please enter a string? ", string, "100". I will not show the code of convert2.asm here. Except for the changes described, it's the same as for convert.asm. The download archive with the tutorial samples includes the entire sample convert2.asm, of course.
Conditional blocks.
There is no preprocessor conditional syntax in FASM. But FASM includes the assembly directive if, which can be used in conjunction with the preprocessor to achieve the same results as preprocessor conditionals. Apart from the fact that this approach uses more time and memory, it is important to be aware that the if statement is evaluated by the assembler, i.e. after the code has been parsed and changed by the preprocessor.
The FASM if directive works the same way as it does in higher programming languages. In its simplest form the condition is formed by a single
block as follows
if <logical-expression>
...
end if
To also specify code that is executed if the logical expression is false, two blocks are used as follows
if <logical-expression>
...
else
...
end if
Further conditions (with supplementary blocks) may be specified using else if <logical-expression>.
Logical expressions are made of one or several comparison expressions (the result of which being a logical value, "true" or "false"). Logical expressions may be combined using the operators & (logical "and"), or | (logical "or"). There is also a logical "not" operator; it is written as ~.
Comparison expressions with numerical values allow the usual operators = (equal), < (less), > (greater), <= (less or equal), >= (greater or equal), and <> (not equal). As in most higher programming languages, a numerical expression (as for example just a variable name) may be used instead of the comparison expression. The evaluation of this expression yields "false" if the result is zero, "true" in all other cases.
Example:
if count & ~ count mod 4
...
end if
This logical expression is true if "count" (defined before, of course) is not zero and if it is divisible by 4 (if count is divisible by 4, count mod 4 = 0, that
corresponds to logical "false", thus we use the ~ (not) operator to get a logical value of "true", if count is divisible by 4).
There are also operators that allow the comparison of values being any chains of symbols. The eq operator compares whether two values are exactly the same (from the point of view of the assembler). The in operator checks whether a given value is a member of the list of values following this operator. The list should be enclosed between < and > characters; its members should be separated with commas.
The eqtype operator checks whether the two compared values have the same structure, and whether the structural elements are of the same type. The distinguished types include numerical expressions, individual quoted strings, floating point numbers, address expressions (expressions enclosed in square brackets or preceded by ptr operator), instruction mnemonics, registers, size operators, jump type and code type operators. And each of the special characters that act as a separator (like comma or colon), is the separate type itself. For example, two values, each one consisting of a register name followed by a comma and a numerical expression, will be regarded as of the same type, no matter what kind of register and how complicated the numerical expression is, except for quoted strings and floating point values, which are special kinds of numerical expressions and are treated as different types. Thus the eax,16 eqtype fs,3+7 condition is true, but eax,16 eqtype eax,1.6 is false.
Finally, the used, defined, and definite operators make it possible to check whether a symbol is used or has been defined; cf. the FASM documentation for details.
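A common use of the used operator is to include a helper routine in the output only if it is actually referenced somewhere in the source. The following is a minimal sketch; the routine name clear_rax is an assumption made for this example.

```asm
use64
; "clear_rax" is a hypothetical helper routine.
    call clear_rax

if used clear_rax          ; true only if clear_rax is referenced somewhere
clear_rax:
    xor rax, rax
    ret
end if
```

If the call is removed, the condition becomes false and the routine is left out of the generated code.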
The "special" operators described above are normally used in if directives within a macro, as we will see in the following examples.
Consider a macro that declares a symbol assigned to either a word, a doubleword, or a string value (depending on the situation). We can code it using
if and eq as follows:
macro declare_item item, value {
if item eq WORD
dw value
else if item eq DWORD
dd value
else if item eq STRING
db value, 0
end if
}
The macro call declare_item STRING, "Hello, World!", for example, would result in the following preprocessor output:
if STRING eq WORD
dw "Hello, World!"
else if STRING eq DWORD
dd "Hello, World!"
else if STRING eq STRING
db "Hello, World!", 0
end if
And the assembler would create the object code corresponding to the instruction
db "Hello, World!", 0
FASM allows standard assembly instructions to be redefined. Consider, for example, extending the instruction mov to accept 3 arguments, the second being copied to the first, and the third to the second. Here is how this could be done:
macro mov operand1*, operand2*, operand3 {
if operand3 eq
mov operand1, operand2
else
mov operand1, operand2
mov operand2, operand3
end if
}
Note the logical expression with no operand following eq; it is true if "operand3" is empty (i.e. is not specified when calling the macro). This allows mov to be used with 2 arguments (as you normally do), but also in its extended form with 3 arguments.
The operator in is a shorter form for several eq operations combined by a logical "or". Consider another
extension of mov, that also allows both operands to be segment registers. Here is how this could be done:
macro mov operand1, operand2 {
if operand1 in <cs, ds, es, fs, gs, ss> & operand2 in <cs, ds, es, fs, gs, ss>
push operand2
pop operand1
else
mov operand1, operand2
end if
}
In the "normal" case, the new mov will work just as before; if both arguments are segment registers, however,
push and pop are used.
And here is an example of a further extension of the mov instruction; it makes it possible to copy a value from one memory location to another memory location:
macro mov operand1, operand2 {
if operand1 operand2 eqtype [0] [0]
push operand2
pop operand1
else
mov operand1, operand2
end if
}
If the macro is, for example called as mov [var], 5, the instruction mov will be used; if it is
called as mov [var1], [var2], push and pop will be used.
Note: A more readable way to write the logical expression would be to use the & operator: if operand1 eqtype [0] & operand2 eqtype [0]
The sample program macros5.asm asks the user for their name and uses this name to display a personal greeting message. The program uses a simplified version of the str_read macro seen before and an extended version of str_write, which can display not only a string identified by the label of the address where it is located, but also a quoted literal. Here is the code:
format ELF64
; Read a string of maximum length (defined by input format)
macro str_read text, string, fmt {
local ..txt, ..fmt, ..code
jmp ..code
..txt db text, 00h
..fmt db fmt, 00h
..code:
mov rcx, ..txt
call printf
mov rcx, ..fmt
mov rdx, string
call scanf
}
; Print a string
macro str_write string {
local ..continue
if string eqtype 'string'
call ..continue
db string, 00h
..continue:
pop rcx
call printf
else if string eqtype ..continue
mov rcx, string
call printf
end if
}
; Print a string with new line
macro str_writeln string {
str_write string
newline
}
; Print a new line
macro newline {
local ..str, ..code
jmp ..code
..str db 0Dh, 0Ah, 00h
..code:
mov rcx, ..str
call printf
}
; BSS section
section '.bss' writeable
uname db 26 dup (?)
; Code section (main program)
section ".text" executable
public main
extrn printf
extrn scanf
main:
mov rbp, rsp;
sub rsp, 32
and rsp, -16
str_read "Please enter your name? ", uname, "%25s"
str_write "Hello, "
str_write uname
str_writeln "! How are you?"
mov rsp, rbp
xor rax, rax
ret
Repeating blocks.
First, as with conditionals, we can use the FASM assembler repeating blocks directives. Second there are also several repeating macroinstructions.
The assembler directive times repeats one instruction a specified number of times. It should be followed by a
numerical expression specifying the number of repeats and the instruction to be repeated. When the special symbol % is used inside the instruction, it is equal to the
number of the current repeat. Example:
times 5 db %
will declare five bytes with values 1, 2, 3, 4, 5.
The assembler directive repeat repeats a block of instructions. It should be followed by a numerical expression specifying the number of repeats. The instructions to be repeated are expected on the following lines, ended with the end repeat directive.
Example:
repeat 8
mov byte [rdi], %
inc rdi
end repeat
This stores the numbers 1 to 8 into successive memory locations, starting with the address contained in RDI.
The assembler directive while repeats a block of instructions as long as the condition specified by the logical
expression following it is true. The block of instructions to be repeated should end with the end while directive. Before each repetition,
the logical expression is evaluated and when its value is false, the assembly is continued starting from the first line after the end while.
Example:
i = 1
while i <= 8
mov byte [rdi+i-1], i
i = i + 1
end while
This stores the same values as the previous example, but without changing RDI.
The sample program macros6.asm fills an array with successive values of bytes using the macro fill_array (usage of the directive
repeat), and then prints the array out using the macro print_array (usage of the directive while).
format ELF64
; Fill array with successive byte values from first to last (at most max elements)
macro fill_array array, max, first, last {
    mov rdi, array
    repeat last - first + 1
        if % > max
            break                   ; never write past the end of the array
        end if
        mov byte [rdi+%-1], first + % - 1
    end repeat
}
; Print array of bytes
macro print_array array, len {
    local ..fmt, ..eol, ..code
    jmp ..code                      ; skip the data embedded in the code
    ..fmt db '%5u ', 00h
    ..eol db 0Dh, 0Ah, 00h
    ..code:
    i = 1                           ; assembly-time variable: the block below is emitted len times
    while i <= len
        mov rcx, ..fmt
        mov rdi, array
        xor rdx, rdx
        mov dl, [rdi+i-1]
        call printf
        i = i + 1
    end while
    mov rcx, ..eol
    call printf
}
; BSS section
section '.bss' writeable
    array db 10 dup (?)
; Code section (main program)
section '.text' executable
public main
extrn printf
extrn scanf
main:
    mov rbp, rsp
    sub rsp, 32                     ; shadow space for the called functions
    and rsp, -16                    ; align the stack on 16 bytes
    fill_array array, 10, 1, 5
    print_array array, 5
    fill_array array, 10, 20, 30
    print_array array, 10
    mov rsp, rbp
    xor rax, rax
    ret
The program output consists of two lines: the first one shows the numbers from 1 to 5, the second one the numbers from 20 to 29. Note that the second fill_array call asks for the numbers from 20 to 30, which is actually 11 numbers and would normally exceed the area reserved for the array. Thanks to the if and break directives, the fill stops after max (here 10) elements, so nothing is written past the end of the array and the program does not abort.
FASM includes the following repeating macroinstructions: rept, irp, irps, and irpv. The first three are described below.
The rept directive makes a given number of duplicates of the block enclosed in curly brackets ({ and }).
The number of duplicates follows the directive. It may be followed by the name of
a counter symbol (counting from 1 by default), optionally followed by the base for the counting, separated from the counter name by a colon (:). Examples:
Copy of given value to 10 successive memory locations:
rept 10 {
mov [rdi], rax
add rdi, 8
}
Repetitive symbol generation:
rept 3 counter {
byte#counter db counter
}
which will generate the following:
byte1 db 1
byte2 db 2
byte3 db 3
Reset to zero of the SSE registers XMM0 to XMM7:
rept 8 n:0 {
pxor xmm#n, xmm#n
}
Note that multiple counters, separated by commas (,) and each with its own base, may be specified.
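As a small sketch of the multiple counters case (the names item, i and j are made up for illustration), the following uses one counter with the default base 1 for the label names and a second counter with base 0 for the values:

```asm
; i counts 1, 2, 3 (default base); j counts 0, 1, 2 (base given after the colon)
rept 3 i, j:0 {
    item#i db j
}
; the preprocessor should turn this into:
;   item1 db 0
;   item2 db 1
;   item3 db 2
```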
The irp directive iterates the single argument through a given list of parameters (values). Example:
irp oddnumber, 1, 3, 5, 7, 9 {
db oddnumber
}
will generate the following:
db 1
db 3
db 5
db 7
db 9
The irps directive iterates the single argument through a given list of symbols (separated by spaces, not by commas). Example:
irps register, rax rbx rcx rdx {
xor register, register
}
will generate the following:
xor rax, rax
xor rbx, rbx
xor rcx, rcx
xor rdx, rdx
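All the blocks above are written outside of any macro. According to the Flat Assembler manual, when a block delimited by curly brackets (rept, irp, irps) is nested inside a macro definition, its brackets have to be escaped with a backslash; otherwise the first } would already end the enclosing macro. A minimal sketch, where store_qwords is a hypothetical macro name used only for illustration:

```asm
; Hypothetical macro: store RAX into count successive quadwords starting at [RDI]
macro store_qwords count {
    rept count \{               ; escaped brackets belong to rept, not to the macro
        mov [rdi], rax
        add rdi, 8
    \}
}

; usage: expands to four mov/add pairs
store_qwords 4
```

An unescaped } at that place leaves the rept block without its closing bracket when the macro is expanded, which is one possible way to run into an assembler error such as Incomplete macro.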
Structures.
Structures are similar to macros (you may even say that they are a special variant of macro); they are used to define data structures
(comparable to C structures and Pascal records). General format of a structure definition:
struc <structure-name> <arguments> {
<structure-body>
}
Let's consider the following structure definition:
struc point x, y {
.x dq x
.y dq y
}
This definition is, however, not enough. In fact, to use a structure, you'll have to create an instance of the structure, using a label as
identifier. Examples:
p1 point 1, 2
p2 point ?, ?
The label (name of the instance) will be attached at the beginning of every item name within the struc macroinstruction that starts
with a dot. For the examples above, the preprocessor will generate the following:
p1.x dq 1
p1.y dq 2
p2.x dq ?
p2.y dq ?
and we can code instructions like mov [p2.x], -2, and mov [p2.y], -1.
If, somewhere inside the definition of a structure, a name consisting of nothing but a single dot is found, it is replaced by the label of the given
instance of the structure. In that case the label is not defined automatically, which allows you to completely customize the definition. Example:
struc db [data] {
common
. db data
.size = $ - .
}
The instruction msg db "Hello!", 0Dh, 0Ah, 00h will be transformed by the preprocessor to
msg db "Hello!", 0Dh, 0Ah, 00h
msg.size = $ - msg
This actually is a redefinition of the pseudo-instruction db that, besides declaring the data label, also
calculates the size of the defined data (note that in this example the data size includes the null terminator, so msg.size is 9).
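To illustrate how such a size constant might be used, here is a minimal sketch of a complete program. It assumes the same setup as the other samples in this tutorial (SASM, printf called with the arguments in RCX and RDX); the label fmt is made up for the example.

```asm
format ELF64
; Redefinition of db (as shown above): also defines <label>.size
struc db [data] {
  common
    . db data
    .size = $ - .
}
section '.data' writeable
    msg db "Hello!", 0Dh, 0Ah, 00h
    fmt db "%u", 0Dh, 0Ah, 00h      ; hypothetical format string for the size
section '.text' executable
public main
extrn printf
main:
    mov rbp, rsp
    sub rsp, 32
    and rsp, -16
    mov rcx, msg                    ; print the string itself
    call printf
    mov rcx, fmt
    mov rdx, msg.size               ; assembly-time constant: 6 characters + 3 bytes = 9
    call printf
    mov rsp, rbp
    xor rax, rax
    ret
```

Because msg.size is a plain numeric constant, it can be used wherever an immediate value is allowed; no run-time string length calculation is involved.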
The sample program structure1.asm defines a rectangle as a structure described by its top-left and bottom-right coordinates. Two instances
of this rectangle are created, one initialized when the instance is created, the other by assigning values to the structure's items. The program then calculates
and displays the area (surface) of both rectangles.
format ELF64
struc rectangle x1, y1, x2, y2 {
    .x1 dw x1
    .y1 dw y1
    .x2 dw x2
    .y2 dw y2
}
; Print a 16-bit unsigned integer followed by a line break
macro print_int integer {
    local ..fmt, ..code
    jmp ..code
    ..fmt db '%u', 0Dh, 0Ah, 00h
    ..code:
    mov rcx, ..fmt
    xor rdx, rdx
    mov dx, integer
    call printf
}
; Print a literal string (without line break)
macro print_literal string {
    local ..str, ..code
    jmp ..code
    ..str db string
    db 00h
    ..code:
    mov rcx, ..str
    call printf
}
section '.data' writeable
    rect1 rectangle 0, 0, 25, 4
    rect2 rectangle ?, ?, ?, ?
section '.text' executable
public main
extrn printf
main:
    mov rbp, rsp
    sub rsp, 32
    and rsp, -16
    print_literal "The surface of rectangle 1 is "
    mov ax, [rect1.x2]
    sub ax, [rect1.x1]              ; AX = width
    mov bx, [rect1.y2]
    sub bx, [rect1.y1]              ; BX = height
    imul bx                         ; DX:AX = width * height
    print_int ax
    mov [rect2.x1], 10              ; operand size (word) is known from the dw declaration
    mov [rect2.y1], 20
    mov [rect2.x2], 60
    mov [rect2.y2], 25
    print_literal "The surface of rectangle 2 is "
    mov ax, [rect2.x2]
    sub ax, [rect2.x1]
    mov bx, [rect2.y2]
    sub bx, [rect2.y1]
    imul bx
    print_int ax
    mov rsp, rbp
    xor rax, rax
    ret
The sample program structure2.asm does the same as structure1.asm, but uses a macro to calculate the rectangles' area. Here, both instances are initialized when they are created.
format ELF64
struc rectangle x1, y1, x2, y2 {
    .x1 dw x1
    .y1 dw y1
    .x2 dw x2
    .y2 dw y2
}
; Print a 16-bit unsigned integer followed by a line break
macro print_int integer {
    local ..fmt, ..code
    jmp ..code
    ..fmt db '%u', 0Dh, 0Ah, 00h
    ..code:
    mov rcx, ..fmt
    xor rdx, rdx
    mov dx, integer
    call printf
}
; Print a literal string (without line break)
macro print_literal string {
    local ..str, ..code
    jmp ..code
    ..str db string
    db 00h
    ..code:
    mov rcx, ..str
    call printf
}
; Calculate (x2 - x1) * (y2 - y1); the result is returned in AX
macro rect_surface x1, y1, x2, y2 {
    mov ax, x2
    sub ax, x1
    mov bx, y2
    sub bx, y1
    imul bx
}
section '.data' writeable
    rect1 rectangle 0, 0, 25, 4
    rect2 rectangle 10, 20, 60, 25
section '.text' executable
public main
extrn printf
main:
    mov rbp, rsp
    sub rsp, 32
    and rsp, -16
    print_literal "The surface of rectangle 1 is "
    rect_surface [rect1.x1], [rect1.y1], [rect1.x2], [rect1.y2]
    print_int ax
    print_literal "The surface of rectangle 2 is "
    rect_surface [rect2.x1], [rect2.y1], [rect2.x2], [rect2.y2]
    print_int ax
    mov rsp, rbp
    xor rax, rax
    ret
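As a possible variant (a sketch that is not part of the original sample), the macro could take the label of the instance itself and build the item names with the # concatenation operator already used with rept earlier. The assumption here is that a plain rect.x2 inside the macro body is read by the preprocessor as one single name, so the parameter rect would not be substituted, whereas rect#.x2 is concatenated after substitution. The name rect_surface_of is made up for illustration.

```asm
; Hypothetical variant of rect_surface: rect is the label of a rectangle instance
macro rect_surface_of rect {
    mov ax, [rect#.x2]          ; rect#.x2 becomes, for example, rect1.x2
    sub ax, [rect#.x1]
    mov bx, [rect#.y2]
    sub bx, [rect#.y1]
    imul bx                     ; DX:AX = width * height
}

; usage, with the instances and macros defined in structure2.asm
rect_surface_of rect1
print_int ax
```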
That's it for my tutorial about assembly macros using FASM 64-bit. As the title indicates, it is an introduction to the subject, no more and no less. FASM macros allow you to do a lot more than described here. Some features that I tried failed, for reasons that remain unclear to me: in particular, using rept (I always got the assembler error message Incomplete macro), and passing a structure as argument to a macro and then using the structure's items within the macro (Undefined symbol error message in this case). Other features have not been discussed because I didn't try them out, for example the usage of the virtual directive, or passing a structure as argument to a procedure. You can find more details about macros in the Flat Assembler Programmer's Manual, available as PDF on several websites. I guess that you can also find some macro-specific information by searching the Internet. Finally, note that there are several macro libraries available for download on development-related websites.