16-bit assembly file input/output (NASM on DOS).
In my tutorial 16-bit assembly programming using NASM, I describe how to build 16-bit real mode and protected mode programs on DOS. The program samples provided there use DOS interrupt 21h to read from the keyboard, to write to the screen, and to terminate the program. This interrupt, called with a function code in register AH (or AX), gives access to the so-called DOS services, i.e. a collection of functions that are part of the DOS operating system and that we can call to perform system-related tasks. In this tutorial, we will use interrupt 21h as a simple way to work with files in assembly language. The program samples have been developed and tested on FreeDOS, using NASM 2.16.0.1 (16-bit protected mode). The tutorial should also apply to MS-DOS and other DOS operating systems. Use the following link to download the source code of the sample programs.
Creating a file (AH = 3ch).
Function code 3ch creates a (new) file. Attention: If a file with the specified name already exists, it is truncated (i.e. its contents are lost) without any warning! The file is opened in read-write mode, even if you set the read-only attribute (this setting is only taken into account when the file is opened later on).
Function arguments:
| | Register | Value | Description |
|---|---|---|---|
| Input | AH | 3ch | DOS function code |
| | CX | (word) | file attribute(s) |
| | DS:DX | (pointer) | file name |
| Output | Flags (CF) | clear | indicates successful call |
| | | set | indicates failed call |
| | AX | (word) | success: file handle |
| | | | failure: error code |
File attributes:
- 0: Normal (no attribute set)
- 1: Read-only
- 2: Hidden
- 4: System
- 32: Archive
Error codes:
- 3: Path not found
- 4: No handle available
- 5: Access denied
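The attribute values are bit flags, so several of them can be combined by adding (OR-ing) them. The following is only a minimal sketch, assuming the same build setup as the program samples of this tutorial; the constant names, the filename SECRET.TXT and the messages are illustrative and not part of the DOS API.

```nasm
; Sketch: create a file that is both hidden and flagged for archiving
ATTR_READONLY equ 01h
ATTR_HIDDEN   equ 02h
ATTR_SYSTEM   equ 04h
ATTR_ARCHIVE  equ 20h               ; bit 5 = 32 decimal

segment code
..start:
    mov ax, data
    mov ds, ax
    mov ax, stack
    mov ss, ax
    mov sp, stacktop
    ; Create the file with the combined attributes
    mov ah, 3ch
    mov cx, ATTR_HIDDEN | ATTR_ARCHIVE   ; 02h + 20h = 22h
    mov dx, fname
    int 21h
    jc  failed
    ; Close the file again (the handle was returned in AX)
    mov bx, ax
    mov ah, 3eh
    int 21h
    mov dx, msgok
    jmp show
failed:
    mov dx, msgerr
show:
    mov ah, 09h
    int 21h
    mov ax, 4c00h
    int 21h

segment data
fname   db 'SECRET.TXT', 00h        ; hypothetical filename
msgok   db 'Hidden file created.', 13, 10, '$'
msgerr  db 'Error when creating file!', 13, 10, '$'

segment stack stack
    resb 256
stacktop:
```

A plain DIR will normally not list the new file, because of the hidden attribute. As with any call to function 3ch, an existing file with the same name is truncated.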
Opening a file (AH = 3dh).
Function code 3dh opens an existing file. A file with the specified name must already exist. The file pointer is set to the start of the file.
Function arguments:
| | Register | Value | Description |
|---|---|---|---|
| Input | AH | 3dh | DOS function code |
| | AL | (byte) | access mode |
| | DS:DX | (pointer) | file name |
| Output | Flags (CF) | clear | indicates successful call |
| | | set | indicates failed call |
| | AX | (word) | success: file handle |
| | | | failure: error code |
Access modes:
- 0: Read
- 1: Write
- 2: Read/write
Error codes:
- 2: File not found
- 3: Path not found
- 4: No handle available
- 5: Access denied
Closing a file (AH = 3eh).
Function code 3eh closes a file (opened before with the "Create file" or "Open file" function).
Function arguments:
| | Register | Value | Description |
|---|---|---|---|
| Input | AH | 3eh | DOS function code |
| | BX | (word) | file handle |
| Output | Flags (CF) | clear | indicates successful call |
| | | set | indicates failed call |
| | AX | (word) | success: (destroyed) |
| | | | failure: error code |
Program sample 1: Creating a new file.
The program sample files1.asm asks the user for filenames (to terminate the program, just hit ENTER). If the file doesn't already exist, a file with the name entered is created. Otherwise, an error message is displayed. Here is the source:
; Main program
segment code
..start:
; Initialization
mov ax, data
mov ds, ax
mov ax, stack
mov ss, ax
mov sp, stacktop
; Enter filename until no input
do_loop:
; Ask for filename
mov dx, askfn
mov ah, 09h
int 21h
; Get filename from keyboard
mov dx, buffer
mov ah, 0ah
int 21h
mov cl, [buffer + 1] ; length of input
cmp cl, 0
je exit ; exit if no filename
; Copy keyboard input
lea esi, [buffer + 2] ; start of input text
lea edi, [fname] ; copy destination
copy_char:
mov bl, [esi]
mov [edi], bl
inc esi
inc edi
dec cl
test cl, cl
jnz copy_char
mov byte [edi], 00h ; add null-terminator
; Try to open the file
mov ah, 3dh
mov al, 0
mov dx, fname
int 21h
jc continue
; File already exists
mov [handle], ax
mov dx, exists
mov ah, 09h
int 21h
call close_file
jmp do_loop
continue:
; Create the new file
mov ah, 3ch
mov cx, 0
mov dx, fname
int 21h
jc file_error
mov [handle], ax
mov dx, success
mov ah, 09h
int 21h
call close_file
jmp do_loop
; File error
file_error:
mov dx, ferror1
mov ah, 09h
int 21h
jmp do_loop
; Terminate the program
exit:
mov dx, newline
mov ah, 09h
int 21h
mov ax, 4c00h
int 21h
; Subroutine: Close the file
close_file:
mov bx, [handle]
mov ah, 3eh
int 21h
jnc return
mov dx, ferror2
mov ah, 09h
int 21h
return:
ret
; Data segment
segment data
maxlen equ 25
askfn db 'Filename? ', '$'
success db 13, 10, 'File successfully created!', 13, 10, '$'
exists db 13, 10, 'File already exists!', 13, 10, '$'
ferror1 db 13, 10, 'Error when creating file!', 13, 10, '$'
ferror2 db 13, 10, 'Error when closing file!', 13, 10, '$'
newline db 13, 10, '$'
buffer db maxlen + 1
resb maxlen + 2
fname resb maxlen + 1 ; room for the filename plus the null-terminator
handle resb 2
; Stack segment
segment stack stack
resb 64
stacktop:
Notes concerning the code:
- Displaying a string on the screen and reading a string from the keyboard are done using interrupt 21h with AH = 09h (display string) and AH = 0ah (read string), respectively; cf. 16-bit assembly programming using NASM for details.
- Maybe I missed something, but I do not know of a DOS function whose only purpose is to check if a file already exists. That's why, in the program above, before creating a new file (interrupt 21h, with AH = 3ch), I call the DOS function "Open file" (interrupt 21h, with AH = 3dh). If the carry flag is not set (indicating that there was no error when opening it), the file obviously already exists. There are alternatives, though: "Get file attributes" (AX = 4300h) and "Find first matching file" (AH = 4eh) both fail if the file is missing, and on DOS 3.0 and later, "Create new file" (AH = 5bh) refuses to overwrite an existing file.
- Note that there is some inconsistency in the way the DOS functions are implemented. Whereas the filename used with the "Create file" and "Open file" functions must be a null-terminated string, strings used with the "Display string" function have to be terminated with a dollar sign ($).
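As an alternative to the "open it and see" test, the existence of a file can also be checked without opening it. The following is a minimal sketch, assuming the "Get file attributes" call (interrupt 21h with AX = 4300h and DS:DX pointing to the null-terminated name), which returns with the carry flag clear and the attributes in CX if the name exists. The filename TEST.TXT, the labels and the messages are only illustrative.

```nasm
; Sketch: test whether a file exists, using "Get file attributes"
segment code
..start:
    mov ax, data
    mov ds, ax
    mov ax, stack
    mov ss, ax
    mov sp, stacktop
    ; Ask DOS for the attributes of the file
    mov ax, 4300h           ; AH = 43h, AL = 00h (get attributes)
    mov dx, fname           ; DS:DX = null-terminated filename
    int 21h
    jc  not_found           ; carry set: AX = error code (2 = file not found)
    mov dx, msgyes          ; carry clear: CX = attributes of the entry
    jmp show
not_found:
    mov dx, msgno
show:
    mov ah, 09h
    int 21h
    mov ax, 4c00h
    int 21h

segment data
fname   db 'TEST.TXT', 00h  ; hypothetical filename
msgyes  db 'File exists.', 13, 10, '$'
msgno   db 'File not found (or path invalid).', 13, 10, '$'

segment stack stack
    resb 256
stacktop:
```

Note that this call also succeeds for a directory name, so a stricter test would additionally check that bit 4 (value 10h, directory) of CX is clear.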
The screenshot on the left shows the build of the program (using my custom script nasm.bat, described in the tutorial mentioned above) and the files created by the build. The screenshot on the right shows an execution of files1.exe.
Little exercises.
Just some lines: Display the error code, when an error occurs during the file creation. A little bit more to do: If the file already exists, ask the user if they want to overwrite it.
Writing to a file (AH = 40h).
Function code 40h writes a given number of bytes to a file (opened before with the "Create file" or "Open file" function). Obviously, a file opened with "Open file" must have been opened in write or read/write mode, and must not have the read-only attribute set.
Function arguments:
| | Register | Value | Description |
|---|---|---|---|
| Input | AH | 40h | DOS function code |
| | BX | (word) | file handle |
| | CX | (word) | number of bytes |
| | DS:DX | (pointer) | data to be written |
| Output | Flags (CF) | clear | indicates successful call |
| | | set | indicates failed call |
| | AX | (word) | success: number of bytes actually written |
| | | | failure: error code |
Data is written beginning at the current file position, and the file position is updated after a successful write. If CX is zero, no data is written, and the file is truncated or extended to the current position. The usual cause for AX < CX on return is a full disk.
Error codes:
- 5: Access denied
- 6: Illegal handle, or file not open
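Because a write can succeed (carry clear) and still store fewer bytes than requested, it is worth comparing AX with the requested count. The following is a minimal sketch, assuming the same build setup as the other samples; the filename SKETCH.TXT, the labels and the messages are illustrative.

```nasm
; Sketch: write a block of text and detect a short write (e.g. disk full)
segment code
..start:
    mov ax, data
    mov ds, ax
    mov ax, stack
    mov ss, ax
    mov sp, stacktop
    ; Create the file (an existing SKETCH.TXT is truncated!)
    mov ah, 3ch
    mov cx, 0
    mov dx, fname
    int 21h
    jc  failed
    mov [handle], ax
    ; Write the whole text with a single call
    mov ah, 40h
    mov bx, [handle]
    mov cx, textlen
    mov dx, text
    int 21h
    jc  write_error         ; carry set: AX = error code
    cmp ax, textlen         ; carry clear: AX = bytes actually written
    jb  short_write
    mov dx, msgok
    jmp close_file
write_error:
    mov dx, msgerr
    jmp close_file
short_write:
    mov dx, msgshort
close_file:
    push dx                 ; keep the message pointer across the call
    mov bx, [handle]
    mov ah, 3eh
    int 21h
    pop dx
    jmp show
failed:
    mov dx, msgerr
show:
    mov ah, 09h
    int 21h
    mov ax, 4c00h
    int 21h

segment data
fname    db 'SKETCH.TXT', 00h       ; hypothetical filename
text     db 'Hello, file!', 13, 10
textlen  equ $ - text               ; length computed by the assembler
msgok    db 'All bytes written.', 13, 10, '$'
msgshort db 'Short write (disk full?)', 13, 10, '$'
msgerr   db 'File error!', 13, 10, '$'
handle   dw 0

segment stack stack
    resb 256
stacktop:
```

Letting the assembler compute textlen avoids counting the bytes by hand, which is an easy source of off-by-one errors when the text changes.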
Program sample 2: Writing to a file.
The program sample files2.asm writes the 3-letter coded protein sequence of the human insulin chain A to the file INSULINE.TXT (my mistake: "insulin" is written without a final "e" in English!). Here is the source:
; Main program
segment code
..start:
; Initialization
mov ax, data
mov ds, ax
mov ax, stack
mov ss, ax
mov sp, stacktop
; Create a new file
mov ah, 3ch
mov cx, 0
mov dx, fname
int 21h
jc file_error1
mov [handle], ax
; Write insuline chain A to the file
mov byte [count], lchaina ; number of amino acids
lea esi, [chaina] ; pointer to the amino acids data
write_aa:
lea edi, [amacid] ; pointer to the copy destination
mov cl, 3 ; 3-letter amino acid codes
; Copy this amino acid
copy_char:
mov bl, [esi]
mov [edi], bl
inc esi
inc edi
dec cl
test cl, cl
jnz copy_char
; Write this amino acid to file
mov ah, 40h
mov bx, [handle]
mov cx, 3
mov dx, amacid
int 21h
jc file_error2
; Continue with next amino acid (unless all are done)
mov cl, [count]
dec cl
mov [count], cl
test cl, cl
jnz write_aa
; Close the file
mov bx, [handle]
mov ah, 3eh
int 21h
; Display success message
mov dx, success
mov ah, 09h
int 21h
jmp exit
; Create file error
file_error1:
mov dx, ferror1
mov ah, 09h
int 21h
jmp exit
; Write file error
file_error2:
mov dx, ferror2
mov ah, 09h
int 21h
jmp exit
; Terminate the program
exit:
mov ax, 4c00h
int 21h
; Data segment
segment data
lchaina equ 21
chaina db 'GLYILEVALGLUGLNCYSCYSTHRSERILE'
db 'CYSSERLEUTYRGLNLEUGLUASNTYRCYS'
db 'ASN'
fname db 'INSULINE.TXT', 00h
success db 'File INSULINE.TXT successfully created!', 13, 10, '$'
ferror1 db 'Error when creating file!', 13, 10, '$'
ferror2 db 'Error when writing to file!', 13, 10, '$'
handle resb 2
amacid resb 3
count resb 1
segment stack stack
resb 64
stacktop:
Little exercise.
Rewrite the program, changing the file output format as follows: 1. separate the amino acid codes by a space; 2. add a line break after each 10 amino acid codes.
Reading from a file (AH = 3fh).
Function code 3fh reads a given number of bytes from a file (opened before with the "Open file" function).
Function arguments:
| | Register | Value | Description |
|---|---|---|---|
| Input | AH | 3fh | DOS function code |
| | BX | (word) | file handle |
| | CX | (word) | number of bytes |
| | DS:DX | (pointer) | buffer that receives the data read |
| Output | Flags (CF) | clear | indicates successful call |
| | | set | indicates failed call |
| | AX | (word) | success: number of bytes actually read |
| | | | failure: error code |
Data is read beginning at the current file position, and the file position is updated after a successful read. If CF is clear and AX = 0, it means that the file pointer was already at the end of the file. The usual cause for AX < CX on return is that only part of a record was read, because the end-of-file was reached.
Error codes:
- 5: Access denied
- 6: Illegal handle, or file not open
Program sample 3: Reading from a file.
The program sample files3.asm reads the insulin sequence from the file created before and displays it, 10 amino acids per line, and using a space as separator between the amino acid codes. The file is read in a loop, 1 amino acid at a time, until end-of-file is reached. Here is the source:
; Main program
segment code
..start:
; Initialization
mov ax, data
mov ds, ax
mov ax, stack
mov ss, ax
mov sp, stacktop
; Open the file
mov ah, 3dh
mov al, 0
mov dx, fname
int 21h
jc file_error1
mov [handle], ax
; Read insuline chain A from the file, and display it
read_sequence:
mov byte [count], 0
; Read one amino acid
read_aa:
mov ah, 3fh
mov bx, [handle]
mov cx, 3
mov dx, amacid
int 21h
jc file_error2
test ax, ax ; end-of-file
jz done
; Display this amino acid
mov ah, 09h
mov dx, amacid
int 21h
; Line break after each 10 amino acids
mov cl, [count]
inc cl
mov [count], cl
cmp cl, 10
jl read_aa
mov dx, newline
mov ah, 09h
int 21h
jmp read_sequence
; Open file error
file_error1:
mov dx, ferror1
mov ah, 09h
int 21h
jmp exit
; Read file error
file_error2:
mov dx, ferror2
mov ah, 09h
int 21h
jmp done
; Close the file
done:
mov bx, [handle]
mov ah, 3eh
int 21h
; Terminate the program
exit:
mov ax, 4c00h
int 21h
; Data segment
segment data
fname db 'INSULINE.TXT', 00h
ferror1 db 'Error when opening file!', 13, 10, '$'
ferror2 db 'Error when reading from file!', 13, 10, '$'
amacid db 'XXX'
db ' '
db '$'
newline db 13, 10, '$'
handle resb 2
count resb 1
; Stack segment
segment stack stack
resb 64
stacktop:
The screenshot shows the output of the program files3.exe.
Little exercise.
Nothing to do with file input/output and just some lines of code: Display the amino acid codes, using first letter uppercase format (other letters being lowercase).
Setting file position (Seek) (AH = 42h).
Function code 42h sets the current position in a file (opened before with the "Open file" function). Subsequent read/write operations will begin at this position.
Function arguments:
| | Register | Value | Description |
|---|---|---|---|
| Input | AH | 42h | DOS function code |
| | AL | (byte) | origin of move (0=begin, 1=current, 2=end) |
| | BX | (word) | file handle |
| | CX:DX | (dword) | signed offset from the origin of move |
| Output | Flags (CF) | clear | indicates successful call |
| | | set | indicates failed call |
| | DX:AX | (dword) | success: new file position in bytes from start of file |
| | AX | (word) | failure: error code |
Setting the file position makes random input/output possible: setting the file position to N * record length lets us read or write record number N of the file (counting from 0). Seeking may be done relative to the beginning of the file, the end of the file, or the current file position. As DX:AX returns the new position measured from the beginning of the file, setting the file position to the end of the file is a way to determine the file size. If the new position is beyond the current end-of-file, the file will be extended by the next write. Setting the file position to the end of the file is a simple way to append records to a file. On my FreeDOS system, after setting the file position beyond the current end-of-file, the next read does not result in an error, but the last record of the file seems to be returned. A likely explanation is that the read returns AX = 0 with the carry flag clear and leaves the buffer untouched, so that the buffer still holds the record read before.
Error codes:
- 1: Invalid function (illegal origin-of-move code in AL)
- 6: Illegal handle, or file not open
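The file size trick mentioned above can be sketched as follows. This is only an illustration, assuming the file INSULINE.TXT from program sample 2 exists and is smaller than 64 KB, so that the high word returned in DX can be ignored; the labels and messages are illustrative.

```nasm
; Sketch: determine the size of a file by seeking to its end
segment code
..start:
    mov ax, data
    mov ds, ax
    mov ax, stack
    mov ss, ax
    mov sp, stacktop
    ; Open the file for reading
    mov ax, 3d00h           ; AH = 3dh, AL = 0 (read)
    mov dx, fname
    int 21h
    jc  open_failed
    mov [handle], ax
    ; Seek to offset 0 from the end of the file
    mov ax, 4202h           ; AH = 42h, AL = 2 (origin = end of file)
    mov bx, [handle]
    xor cx, cx              ; CX:DX = offset 0
    xor dx, dx
    int 21h                 ; success: DX:AX = new position = file size
    jc  seek_failed
    ; Convert AX to decimal text (assumes size < 65536, i.e. DX = 0)
    mov di, digits + 5      ; one past the last digit position
    mov bx, 10
next_digit:
    xor dx, dx
    div bx                  ; AX = quotient, DX = remainder (0..9)
    add dl, '0'
    dec di
    mov [di], dl            ; store the digits from right to left
    test ax, ax
    jnz next_digit
    mov dx, di              ; DS:DX = first significant digit
    mov ah, 09h
    int 21h
    jmp close_file
seek_failed:
    mov dx, msgerr
    mov ah, 09h
    int 21h
close_file:
    mov bx, [handle]
    mov ah, 3eh
    int 21h
    jmp exit
open_failed:
    mov dx, msgerr
    mov ah, 09h
    int 21h
exit:
    mov ax, 4c00h
    int 21h

segment data
fname   db 'INSULINE.TXT', 00h
msgerr  db 'File error!', 13, 10, '$'
digits  db '00000'                  ; room for up to 5 decimal digits
        db ' bytes', 13, 10, '$'
handle  dw 0

segment stack stack
    resb 256
stacktop:
```

For a file holding only chain A (21 codes of 3 bytes each), the size reported should be 63 bytes. Supporting larger files would require a 32-bit division of DX:AX, which is left out here to keep the sketch short.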
Program sample 4: Appending to a file.
The program sample files4.asm uses the file with the insulin chain A sequence from example 2 and appends the chain B sequence. The program then reads the entire file and displays the amino acids of the 2 chains (as one sequence).
; Main program
segment code
..start:
; Initialization
mov ax, data
mov ds, ax
mov ax, stack
mov ss, ax
mov sp, stacktop
; Open file
mov ah, 3dh
mov al, 2 ; open file as read/write
mov dx, fname
int 21h
jc file_error1
mov [handle], ax
; Set file position to end of file
mov ah, 42h
mov al, 2 ; end of file position
mov bx, [handle]
mov cx, 0 ; CX:DX = offset 0 from end of file
mov dx, 0
int 21h
jc file_error2
; Write insuline chain B to file
mov byte [count], lchainb ; number of amino acids
lea esi, [chainb] ; pointer to the amino acids data
write_aa:
lea edi, [amacid] ; pointer to the copy destination
mov cl, 3 ; 3-letter amino acid codes
; Copy this amino acid
copy_char:
mov bl, [esi]
mov [edi], bl
inc esi
inc edi
dec cl
test cl, cl
jnz copy_char
; Write this amino acid to file
mov ah, 40h
mov bx, [handle]
mov cx, 3
mov dx, amacid
int 21h
jc file_error3
; Continue with next amino acid (unless all are done)
mov cl, [count]
dec cl
mov [count], cl
test cl, cl
jnz write_aa
; Set file position to beginning of file
mov ah, 42h
mov al, 0 ; begin of file position
mov bx, [handle]
mov cx, 0 ; CX:DX = offset 0 from begin of file
mov dx, 0
int 21h
jc file_error2
; Read insuline sequence from file, and display it
mov ah, 3fh
mov bx, [handle]
mov cx, 3 * (lchaina + lchainb)
mov dx, chainab
int 21h
jc file_error4
mov dx, chainab
mov ah, 09h
int 21h
jmp done
; Open file error
file_error1:
mov dx, ferror1
mov ah, 09h
int 21h
jmp exit
; Seek file error
file_error2:
mov dx, ferror2
mov ah, 09h
int 21h
jmp done
; Write file error
file_error3:
mov dx, ferror3
mov ah, 09h
int 21h
jmp done
; Read file error
file_error4:
mov dx, ferror4
mov ah, 09h
int 21h
jmp done
; Close the file
done:
mov bx, [handle]
mov ah, 3eh
int 21h
; Terminate the program
exit:
mov ax, 4c00h
int 21h
; Data segment
segment data
lchaina equ 21
lchainb equ 30
chainb db 'PHEVALASNGLNHISLEUCYSGLYSERHIS'
db 'LEUVALGLUALALEUTYRLEUVALCYSGLY'
db 'GLUARGGLYPHEPHETYRTHRPROLYSALA'
fname db 'INSULINE.TXT', 00h
ferror1 db 'Error when opening file!', 13, 10, '$'
ferror2 db 'Error when setting file position!', 13, 10, '$'
ferror3 db 'Error when writing to file!', 13, 10, '$'
ferror4 db 'Error when reading from file!', 13, 10, '$'
handle resb 2
amacid resb 3
count resb 1
chainab resb 3 * (lchaina + lchainb)
db 13, 10, '$'
; Stack segment
segment stack stack
resb 64
stacktop:
The screenshot shows how I made a backup of the original file insuline.txt (containing chain A), then ran the program files4.exe, and finally made a backup of the modified file (containing both chains A and B).
Little exercise.
Modify the display output of the program as follows: Display the 2 chains separately, preceded by a header and using a formatted output with 10 amino acids per line, and a space separating the amino acid codes.
Program sample 5: Random read.
The program sample files5.asm displays the Nth amino acid of the insulin sequence (file with chains A and B, created with the preceding sample program), the number N being entered by the user.
; Main program
segment code
..start:
; Initialization
mov ax, data
mov ds, ax
mov ax, stack
mov ss, ax
mov sp, stacktop
; Open file
mov ah, 3dh
mov al, 0 ; open file as read
mov dx, fname
int 21h
jc file_error1
mov [handle], ax
; Display amino acid at position entered by user
; until user input is empty
loop_aa:
; Ask for position
mov dx, askpos
mov ah, 09h
int 21h
; Get position from keyboard
mov dx, buffer
mov ah, 0ah
int 21h
mov dx, newline
mov ah, 09h
int 21h
mov cl, [poslen]
cmp cl, 0 ; terminate program if no input
je done
; Convert ASCII to positive integer
lea esi, [pospos] ; first digit
lea edi, [pospos + 1] ; second digit
cmp cl, 2
je convert
mov al, [esi] ; make 2-digit number
mov [edi], al
mov byte [esi], '0'
convert:
xor ax, ax
mov al, [esi]
sub al, 30h ; '0' = ASCII 30h
mov bl, 10
imul bl ; first digit * 10
xor bx, bx
mov bl, [edi]
sub bl, 30h ; '0' = ASCII 30h
add ax, bx ; second digit as such
cmp ax, 1 ; position must be >= 1
jl invalid_1
cmp ax, seqlen ; position must be <= sequence length
jg invalid_2
dec ax ; first record = 0
mov bx, 3 ; file position = aa position * aa code length (3)
imul bx
; Set file position
mov cx, 0 ; CX:DX is offset in file
mov dx, ax
mov ah, 42h
mov al, 0 ; begin of file position
mov bx, [handle]
int 21h
jc file_error2
; Read amino acid at this position
mov ah, 3fh
mov bx, [handle]
mov cx, 3 ; 3-letter amino acid code
mov dx, amacid
int 21h
jc file_error3
; Display this amino acid
mov ah, 09h
mov dx, amacid
int 21h
jmp loop_aa
; Invalid position (< 1)
invalid_1:
mov dx, poserr1
mov ah, 09h
int 21h
jmp loop_aa
; Invalid position (> seqlen)
invalid_2:
mov dx, poserr2
mov ah, 09h
int 21h
jmp loop_aa
; Open file error
file_error1:
mov dx, ferror1
mov ah, 09h
int 21h
jmp exit
; Seek file error
file_error2:
mov dx, ferror2
mov ah, 09h
int 21h
jmp done
; Read file error
file_error3:
mov dx, ferror3
mov ah, 09h
int 21h
jmp done
; Close the file
done:
mov bx, [handle]
mov ah, 3eh
int 21h
; Terminate the program
exit:
mov ax, 4c00h
int 21h
; Data segment
segment data
seqlen equ 51
maxlen equ 2
fname db 'INSULINE.TXT', 00h
askpos db 'Amino acid position in insuline sequence? ', '$'
poserr1 db 'Invalid position!', 13, 10, '$'
poserr2 db 'Sequence has only 51 amino acids!', 13, 10, '$'
ferror1 db 'Error when opening file!', 13, 10, '$'
ferror2 db 'Error when setting file position!', 13, 10, '$'
ferror3 db 'Error when reading from file!', 13, 10, '$'
amacid resb 3
newline db 13, 10, '$'
handle resb 2
buffer db maxlen + 1
poslen resb 1
pospos resb maxlen + 1
; Stack segment
segment stack stack
resb 64
stacktop:
The screenshot shows an execution of files5.exe.
Little exercise.
Modify the program as follows: Display the Nth amino acid of either chain A or chain B (the user being asked for the chain and the position of the amino acid within this chain).
Delete a file (unlink) (AH = 41h).
Function code 41h deletes the file with the specified filename.
Function arguments:
| | Register | Value | Description |
|---|---|---|---|
| Input | AH | 41h | DOS function code |
| | DS:DX | (pointer) | filename |
| Output | Flags (CF) | clear | indicates successful call |
| | | set | indicates failed call |
| | AX | (word) | success: AL = drive of deleted file; AH = destroyed |
| | | | failure: error code |
DOS does not erase the file's data (which means that, as long as the data is not overwritten, the file can be recovered with an "undelete" utility); the file becomes inaccessible because its directory entry is marked as deleted and its FAT chain is cleared. Deleting a file which is currently open may lead to filesystem corruption!
Error codes:
- 2: file not found
- 3: path not found
- 5: access denied
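A call to this function could look as follows. This is a minimal sketch, assuming the same build setup as the other samples; the filename OLDFILE.TMP, the labels and the messages are illustrative. The error code returned in AX is used to tell a missing file from other failures.

```nasm
; Sketch: delete a file and report the outcome
segment code
..start:
    mov ax, data
    mov ds, ax
    mov ax, stack
    mov ss, ax
    mov sp, stacktop
    ; Delete the file (it must not be open at this point)
    mov ah, 41h
    mov dx, fname           ; DS:DX = null-terminated filename
    int 21h
    jnc deleted
    cmp ax, 2               ; error code 2 = file not found
    je  missing
    mov dx, msgerr          ; any other error (e.g. 5 = access denied)
    jmp show
missing:
    mov dx, msgmiss
    jmp show
deleted:
    mov dx, msgok
show:
    mov ah, 09h
    int 21h
    mov ax, 4c00h
    int 21h

segment data
fname   db 'OLDFILE.TMP', 00h       ; hypothetical filename
msgok   db 'File deleted.', 13, 10, '$'
msgmiss db 'File not found!', 13, 10, '$'
msgerr  db 'Error when deleting file!', 13, 10, '$'

segment stack stack
    resb 256
stacktop:
```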
Copying and moving a file.
The DOS API doesn't include a function to copy a file. Thus, to do this from an assembly program, you have to read the original file (byte by byte or, more efficiently, block by block, until the end-of-file) and write the new one. In the case of a "move", you then have to delete the original file; within the same drive, the "Rename file" function (AH = 56h) can also be used to move a file to another directory.
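The read/write loop of such a copy can be sketched as follows. This is only an illustration, assuming the same build setup as the other samples; the filenames SOURCE.TXT and COPY.TXT, the block size of 512 bytes, the labels and the messages are illustrative choices.

```nasm
; Sketch: copy SOURCE.TXT to COPY.TXT in blocks of 512 bytes
blocksize equ 512

segment code
..start:
    mov ax, data
    mov ds, ax
    mov ax, stack
    mov ss, ax
    mov sp, stacktop
    ; Open the source file for reading
    mov ax, 3d00h
    mov dx, srcname
    int 21h
    jc  error
    mov [srchandle], ax
    ; Create the destination file (an existing one is truncated!)
    mov ah, 3ch
    xor cx, cx
    mov dx, dstname
    int 21h
    jc  close_source
    mov [dsthandle], ax
copy_block:
    ; Read the next block
    mov ah, 3fh
    mov bx, [srchandle]
    mov cx, blocksize
    mov dx, block
    int 21h
    jc  copy_error
    test ax, ax
    jz  copy_done           ; AX = 0: end of file reached
    mov [count], ax         ; number of bytes actually read
    ; Write exactly the bytes that were read
    mov cx, ax
    mov ah, 40h
    mov bx, [dsthandle]
    mov dx, block
    int 21h
    jc  copy_error
    cmp ax, [count]
    jb  copy_error          ; short write (disk full?)
    jmp copy_block
copy_done:
    mov dx, msgok
    jmp close_both
copy_error:
    mov dx, msgerr
close_both:
    push dx                 ; keep the message pointer across the calls
    mov bx, [dsthandle]
    mov ah, 3eh
    int 21h
    mov bx, [srchandle]
    mov ah, 3eh
    int 21h
    pop dx
    jmp show
close_source:
    mov bx, [srchandle]
    mov ah, 3eh
    int 21h
error:
    mov dx, msgerr
show:
    mov ah, 09h
    int 21h
    mov ax, 4c00h
    int 21h

segment data
srcname   db 'SOURCE.TXT', 00h      ; hypothetical filenames
dstname   db 'COPY.TXT', 00h
msgok     db 'File copied.', 13, 10, '$'
msgerr    db 'Error when copying file!', 13, 10, '$'
srchandle dw 0
dsthandle dw 0
count     dw 0
block     resb blocksize

segment stack stack
    resb 256
stacktop:
```

The last block is usually shorter than 512 bytes, which is why the write uses the count returned by the read instead of the block size. For a "move", a call to function 41h on the source file would follow after both files have been closed successfully.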
Working with text files.
The DOS API doesn't include any particular functions for working with text files. This means that our assembly program has to access them as it would any other file. A text file, created for example with the DOS editor, is an unstructured sequence of bytes; each line of the text file is terminated by an end-of-line marker made of the two bytes 0Dh (carriage return) and 0Ah (line feed), in this order. Reading a text file thus consists of reading it byte by byte (until end-of-file). If the last two bytes that we have read are 0Dh and 0Ah, a complete line has been read.
If we know the length of the file, and it is not too big, we can read it all at once. To extract the individual lines, we then have to scan the memory area where we stored the file content for the end-of-line markers (0Dh followed by 0Ah).
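Scanning a buffer for the end-of-line marker can be sketched as follows. The program reads the beginning of a text file into memory, looks for the first carriage return (0Dh) followed by a line feed (0Ah), and displays the bytes in front of it. This is only an illustration: the filename README.TXT, the buffer size of 1024 bytes and the labels are assumptions, and the line is displayed with function 40h on handle 1 (standard output), so that a dollar sign in the text does not end the output.

```nasm
; Sketch: display the first line of a text file
bufsize equ 1024

segment code
..start:
    mov ax, data
    mov ds, ax
    mov ax, stack
    mov ss, ax
    mov sp, stacktop
    ; Open the text file for reading
    mov ax, 3d00h
    mov dx, fname
    int 21h
    jc  failed
    mov [handle], ax
    ; Read up to bufsize bytes with a single call
    mov ah, 3fh
    mov bx, [handle]
    mov cx, bufsize
    mov dx, buffer
    int 21h
    jc  read_failed
    mov cx, ax              ; CX = number of bytes still to scan
    mov si, buffer          ; SI = scan pointer
    test cx, cx
    jz  close_file          ; empty file: nothing to display
scan:
    cmp byte [si], 0dh      ; carriage return?
    jne next
    cmp cx, 1               ; is there a byte after it?
    je  next
    cmp byte [si + 1], 0ah  ; followed by line feed?
    je  show_line
next:
    inc si
    loop scan               ; no marker found: the whole buffer is the line
show_line:
    mov cx, si
    sub cx, buffer          ; CX = length of the line
    test cx, cx
    jz  close_file          ; first line is empty
    mov ah, 40h
    mov bx, 1               ; handle 1 = standard output
    mov dx, buffer
    int 21h
    mov dx, newline
    mov ah, 09h
    int 21h
    jmp close_file
read_failed:
    mov dx, msgerr
    mov ah, 09h
    int 21h
close_file:
    mov bx, [handle]
    mov ah, 3eh
    int 21h
    jmp exit
failed:
    mov dx, msgerr
    mov ah, 09h
    int 21h
exit:
    mov ax, 4c00h
    int 21h

segment data
fname   db 'README.TXT', 00h        ; hypothetical filename
msgerr  db 'File error!', 13, 10, '$'
newline db 13, 10, '$'
handle  dw 0
buffer  resb bufsize

segment stack stack
    resb 256
stacktop:
```

To process all lines, the scan would simply continue at SI + 2 (the byte after the marker) until the number of bytes read is exhausted.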
When creating a text file from assembly, we have two possibilities:
- Using a fixed line length format (filling the unused positions with spaces). This creates a structured (record-based) file, where the different lines are easily accessible. Using this format, we can read the file line by line. It is also possible to read (and write) a given line, identified by its line number (record number = line number - 1). The disadvantages of this method are obvious: First, the file is bigger (possibly a lot bigger) than the size needed to store the information that it actually contains. Second (unless we add the bytes 0Dh and 0Ah at the end of each record), such files do not display properly in a text editor or other software.
- Using a variable line length format. This creates a regular text file, i.e. a sequence of bytes, the different lines being terminated with the bytes 0Dh and 0Ah (we could use a custom end-of-line marker, but in this case, the file would not display properly in other programs, such as an editor). The size of such files corresponds to the size of the information that they actually contain (plus the end-of-line markers). The disadvantage of this method is that reading a given line using random access is not directly possible: to find a line, the file has to be scanned from the start. Also, we'll have to deal with the end-of-line markers included in the file.