
16-bit assembly file input/output (NASM on DOS).

In my tutorial 16-bit assembly programming using NASM, I describe how to build 16-bit real mode and protected mode programs on DOS. The program samples provided there use DOS interrupt 21h to read from the keyboard, to write to the screen, and to terminate the program. This interrupt, called with a function code in register AH (the high byte of AX), gives access to the so-called DOS Services, i.e. a collection of functions that are part of the DOS operating system and that we can call to perform system-related tasks. In this tutorial, we will use interrupt 21h as a simple way to work with files in assembly language. The program samples have been developed and tested on FreeDOS, using NASM 2.16.01 (16-bit protected mode). The tutorial should also apply to MS-DOS and other DOS operating systems. Use the following link to download the source code of the sample programs.

Creating a file (AH = 3ch).

Function code 3ch creates a (new) file. Attention: If a file with the specified name already exists, it is truncated (i.e. its contents are lost) without any warning! The file is opened in read-write mode, even if you set the read-only attribute (the attribute only takes effect when the file is opened later on).

Function arguments:

         Register    Value      Description
Input    AH          3ch        DOS function code
         CX          (word)     file attribute(s)
         DS:DX       (pointer)  file name
Output   Flags (CF)  clear      indicates successful call
                     set        indicates failed call
         AX          (word)     success: file handle
                                failure: error code

File attributes:

To set several attributes, add the individual values; example: read-only+hidden+system = 1 + 2 + 4 = 7.
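Since each attribute occupies its own bit, "adding" the values is the same as OR-ing them, as long as no attribute is counted twice. The following Python sketch is only a model for illustration, not DOS code; the constant and function names are my own, while the values 01h, 02h, 04h and 20h are the standard DOS attribute bits for read-only, hidden, system and archive.

```python
# Standard DOS file attribute bits (each attribute is a separate bit)
READ_ONLY = 0x01
HIDDEN = 0x02
SYSTEM = 0x04
ARCHIVE = 0x20

NAMES = {READ_ONLY: "read-only", HIDDEN: "hidden", SYSTEM: "system", ARCHIVE: "archive"}


def combine_attributes(*flags: int) -> int:
    """Return the CX value for function 3ch; OR-ing ignores a flag given twice."""
    cx = 0
    for flag in flags:
        cx |= flag
    return cx


def describe(cx: int) -> list:
    """List the attribute names whose bit is set in cx."""
    return [name for bit, name in NAMES.items() if cx & bit]


cx = combine_attributes(READ_ONLY, HIDDEN, SYSTEM)
print(hex(cx), describe(cx))  # 0x7 ['read-only', 'hidden', 'system']
```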

Error codes:

Opening a file (AH = 3dh).

Function code 3dh opens an (existing) file: a file with the specified name must already exist. The file pointer is set to the start of the file.

Function arguments:

         Register    Value      Description
Input    AH          3dh        DOS function code
         AL          (byte)     access mode
         DS:DX       (pointer)  file name
Output   Flags (CF)  clear      indicates successful call
                     set        indicates failed call
         AX          (word)     success: file handle
                                failure: error code

Access modes (bits 0-2 of AL): 0 = read only, 1 = write only, 2 = read/write. The samples below use 0 and 2.
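The AL byte passed to function 3dh can carry more than the access mode: on DOS 3.0 and later, bits 4-6 select a sharing mode, and setting bit 7 makes the handle private (not inherited by child processes). The Python sketch below is a hypothetical helper for illustration, not part of any DOS API; it shows how such a value is composed. The samples in this tutorial only use the plain values 0 (read) and 2 (read/write), i.e. compatibility sharing mode.

```python
# Access modes (bits 0-2 of AL)
READ, WRITE, READ_WRITE = 0, 1, 2
# Sharing modes (bits 4-6 of AL, DOS 3.0 and later); 0 = compatibility mode
COMPAT, DENY_ALL, DENY_WRITE, DENY_READ, DENY_NONE = 0, 1, 2, 3, 4


def open_mode(access: int, sharing: int = COMPAT, private: bool = False) -> int:
    """Compose the AL value for function 3dh (hypothetical helper)."""
    if access not in (READ, WRITE, READ_WRITE):
        raise ValueError("access mode must be 0, 1 or 2")
    if not COMPAT <= sharing <= DENY_NONE:
        raise ValueError("sharing mode must be 0..4")
    al = access | (sharing << 4)
    if private:
        al |= 0x80  # bit 7: handle is not inherited by child processes
    return al


print(open_mode(READ), open_mode(READ_WRITE))  # 0 2
print(hex(open_mode(READ_WRITE, DENY_WRITE)))  # 0x22
```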

Error codes:

Closing a file (AH = 3eh).

Function code 3eh closes a file (previously opened with the "Create file" or "Open file" function).

Function arguments:

         Register    Value      Description
Input    AH          3eh        DOS function code
         BX          (word)     file handle
Output   Flags (CF)  clear      indicates successful call
                     set        indicates failed call
         AX          (word)     success: (destroyed)
                                failure: error code

Program sample 1: Creating a new file.

The program sample files1.asm asks the user for filenames (to terminate the program, just hit ENTER). If the file doesn't already exist, a file with the name entered is created; otherwise, an error message is displayed. Here is the source:

; Main program
segment code
..start:
        ; Initialization
        mov     ax, data
        mov     ds, ax
        mov     ax, stack
        mov     ss, ax
        mov     sp, stacktop
        ; Enter filename until no input
do_loop:
        ; Ask for filename
        mov     dx, askfn
        mov     ah, 09h
        int     21h
        ; Get filename from keyboard
        mov     dx, buffer
        mov     ah, 0ah
        int     21h
        mov     cl, [buffer + 1]       ; length of input
        cmp     cl, 0
        je      exit                   ; exit if no filename
        ; Copy keyboard input
        lea     esi, [buffer + 2]      ; start of input text
        lea     edi, [fname]           ; copy destination
copy_char:
        mov     bl, [esi]
        mov     [edi], bl
        inc     esi
        inc     edi
        dec     cl
        test    cl, cl
        jnz     copy_char
        mov     byte [edi], 00h        ; add null-terminator
        ; Try to open the file
        mov     ah, 3dh
        mov     al, 0
        mov     dx, fname
        int     21h
        jc      continue
        ; File already exists
        mov     [handle], ax
        mov     dx, exists
        mov     ah, 09h
        int     21h
        call    close_file
        jmp     do_loop
continue:
        ; Create the new file
        mov     ah, 3ch
        mov     cx, 0
        mov     dx, fname
        int     21h
        jc      file_error
        mov     [handle], ax
        mov     dx, success
        mov     ah, 09h
        int     21h
        call    close_file
        jmp     do_loop
        ; File error
file_error:
        mov     dx, ferror1
        mov     ah, 09h
        int     21h
        jmp     do_loop
        ; Terminate the program
exit:
        mov     dx, newline
        mov     ah, 09h
        int     21h
        mov     ax, 4c00h
        int     21h

; Subroutine: Close the file
close_file:
        mov     bx, [handle]
        mov     ah, 3eh
        int     21h
        jnc     return
        mov     dx, ferror2
        mov     ah, 09h
        int     21h
return:
        ret

; Data segment
segment data
maxlen  equ     25
askfn   db      'Filename? ', '$'
success db      13, 10, 'File successfully created!', 13, 10, '$'
exists  db      13, 10, 'File already exists!', 13, 10, '$'
ferror1 db      13, 10, 'Error when creating file!', 13, 10, '$'
ferror2 db      13, 10, 'Error when closing file!', 13, 10, '$'
newline db      13, 10, '$'
buffer  db      maxlen + 1
        resb    maxlen + 2
fname   resb    maxlen + 1             ; room for up to maxlen characters + null-terminator
handle  resb    2

; Stack segment
segment stack   stack
        resb    64
stacktop:
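The copy loop in files1.asm relies on the layout of the buffer used by function 0ah (buffered keyboard input): byte 0 holds the maximum number of characters (including the final carriage return), byte 1 receives the number of characters actually typed, and the text itself starts at byte 2, followed by a carriage return (0Dh). The Python model below (the function name is hypothetical) extracts the text and appends the null-terminator that functions 3ch and 3dh expect. Note that the result can be up to maxlen + 1 bytes long, so the destination buffer must be at least that big.

```python
def asciz_from_input_buffer(buf: bytes) -> bytes:
    """Model of the copy loop in files1.asm: return the text typed into a
    function 0ah buffer as a null-terminated (ASCIZ) filename."""
    max_chars, count = buf[0], buf[1]  # byte 0: set by the program, byte 1: set by DOS
    if count >= max_chars:
        raise ValueError("DOS never stores more than max_chars - 1 characters")
    text = buf[2:2 + count]  # typed characters, without the final CR (0Dh)
    return text + b"\x00"  # add the null-terminator


# Simulated buffer after typing TEST.TXT and ENTER, with maxlen = 25 as in files1.asm
maxlen = 25
typed = b"TEST.TXT"
buf = bytes([maxlen + 1, len(typed)]) + typed + b"\r"
print(asciz_from_input_buffer(buf))  # b'TEST.TXT\x00'
```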

Notes concerning the code:

The screenshot on the left shows the build of the program (using my custom script nasm.bat, described in the tutorial mentioned above) and the files created by the build. The screenshot on the right shows an execution of files1.exe.

NASM on FreeDOS: Building a protected mode assembly program
NASM on FreeDOS: Assembly program example - Creating a new file

Little exercises.

Just a few lines: Display the error code when an error occurs during file creation. A bit more work: If the file already exists, ask the user whether they want to overwrite it.

Writing to a file (AH = 40h).

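Function code 40h writes a given number of bytes to a file (previously opened with the "Create file" or "Open file" function). The handle must allow writing: either it was returned by function 3ch (always read-write, even if the read-only attribute was set at creation), or the file was opened with function 3dh in write or read/write mode, which fails for a file that has the read-only attribute.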

Function arguments:

         Register    Value      Description
Input    AH          40h        DOS function code
         BX          (word)     file handle
         CX          (word)     number of bytes
         DS:DX       (pointer)  data to be written
Output   Flags (CF)  clear      indicates successful call
                     set        indicates failed call
         AX          (word)     success: number of bytes actually written
                                failure: error code

Data is written beginning at the current file position, and the file position is updated after a successful write. If CX is zero, no data is written, and the file is truncated or extended to the current position. The usual cause for AX < CX on return is a full disk.
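To make these rules concrete, here is a toy Python model of an open file. It is a sketch under simplifying assumptions: DosFileModel is a hypothetical class, not a DOS or Python API. It models the write at the current position, the position update, and the special case CX = 0. It never reports a partial write; on a real system, however, a full disk shows up as AX < CX with CF clear, so testing CF alone (as sample 2 does below) does not catch it, while comparing AX with CX does.

```python
class DosFileModel:
    """Toy in-memory model of an open DOS file (hypothetical; not a DOS API)."""

    def __init__(self, data: bytes = b""):
        self.data = bytearray(data)
        self.pos = 0  # current file position

    def write(self, buf: bytes) -> int:
        """Model of function 40h with CX = len(buf); returns AX."""
        if len(buf) == 0:
            # CX = 0: truncate or extend the file to the current position
            if self.pos < len(self.data):
                del self.data[self.pos:]
            else:
                self.data.extend(bytes(self.pos - len(self.data)))
            return 0
        if self.pos > len(self.data):
            # Position beyond end-of-file: the write extends the file.
            # Real DOS leaves the gap contents undefined; the model uses zeros.
            self.data.extend(bytes(self.pos - len(self.data)))
        self.data[self.pos:self.pos + len(buf)] = buf
        self.pos += len(buf)  # position is updated after the write
        return len(buf)


f = DosFileModel()
f.write(b"GLYILE")
f.pos = 3  # as if function 42h had been called
f.write(b"")  # CX = 0: truncate at position 3
print(bytes(f.data))  # b'GLY'
```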

Error codes:

Program sample 2: Writing to a file.

The program sample files2.asm writes the 3-letter coded protein sequence of the human insulin chain A to the file INSULINE.TXT (my mistake: "insulin" is written without a final "e" in English!). Here is the source:

; Main program
segment code
..start:
        ; Initialization
        mov     ax, data
        mov     ds, ax
        mov     ax, stack
        mov     ss, ax
        mov     sp, stacktop
        ; Create a new file
        mov     ah, 3ch
        mov     cx, 0
        mov     dx, fname
        int     21h
        jc      file_error1
        mov     [handle], ax
        ; Write insuline chain A to the file
        mov     byte [count], lchaina            ; number of amino acids
        lea     esi, [chaina]                    ; pointer to the amino acids data
write_aa:
        lea     edi, [amacid]                    ; pointer to the copy destination
        mov     cl, 3                            ; 3-letter amino acid codes
        ; Copy this amino acid
copy_char:
        mov     bl, [esi]
        mov     [edi], bl
        inc     esi
        inc     edi
        dec     cl
        test    cl, cl
        jnz     copy_char
        ; Write this amino acid to file
        mov     ah, 40h
        mov     bx, [handle]
        mov     cx, 3
        mov     dx, amacid
        int     21h
        jc      file_error2
        ; Continue with next amino acid (unless all are done)
        mov     cl, [count]
        dec     cl
        mov     [count], cl
        test    cl, cl
        jnz     write_aa
        ; Close the file
        mov     bx, [handle]
        mov     ah, 3eh
        int     21h
        ; Display success message
        mov     dx, success
        mov     ah, 09h
        int     21h
        jmp     exit
        ; Create file error
file_error1:
        mov     dx, ferror1
        mov     ah, 09h
        int     21h
        jmp     exit
        ; Write file error
file_error2:
        mov     dx, ferror2
        mov     ah, 09h
        int     21h
        jmp     exit
        ; Terminate the program
exit:
        mov     ax, 4c00h
        int     21h

; Data segment
segment data
lchaina equ     21
chaina  db      'GLYILEVALGLUGLNCYSCYSTHRSERILE'
        db      'CYSSERLEUTYRGLNLEUGLUASNTYRCYS'
        db      'ASN'
fname   db      'INSULINE.TXT', 00h
success db      'File INSULINE.TXT successfully created!', 13, 10, '$'
ferror1 db      'Error when creating file!', 13, 10, '$'
ferror2 db      'Error when writing to file!', 13, 10, '$'
handle  resb    2
amacid  resb    3
count   resb    1

segment stack   stack
        resb    64
stacktop:

Little exercise.

Rewrite the program, changing the file output format as follows: 1. separate the amino acid codes by a space; 2. add a line break after every 10 amino acid codes.

Reading from a file (AH = 3fh).

Function code 3fh reads a given number of bytes from a file (previously opened with the "Open file" function).

Function arguments:

         Register    Value      Description
Input    AH          3fh        DOS function code
         BX          (word)     file handle
         CX          (word)     number of bytes
         DS:DX       (pointer)  buffer receiving the data read
Output   Flags (CF)  clear      indicates successful call
                     set        indicates failed call
         AX          (word)     success: number of bytes actually read
                                failure: error code

Data is read beginning at the current file position, and the file position is updated after a successful read. If CF is clear and AX = 0, the file pointer was already at the end of the file. The usual cause for AX < CX on return is that the end of the file was reached before CX bytes could be read (i.e. only part of a record was read).
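The end-of-file logic can be mirrored with Python's io.BytesIO, whose read(n) also returns fewer bytes than requested at the end of the data, and an empty result once the end has been reached. The sketch below (read_records is a hypothetical helper) reads fixed-size records until a read returns nothing, which corresponds to CF clear with AX = 0, and treats AX < CX as a truncated last record.

```python
import io


def read_records(f, size: int = 3) -> list:
    """Read fixed-size records until a read returns no data (hypothetical helper)."""
    records = []
    while True:
        chunk = f.read(size)  # like function 3fh with CX = size; len(chunk) = AX
        if len(chunk) == 0:  # CF clear, AX = 0: already at end-of-file
            break
        if len(chunk) < size:  # AX < CX: only part of a record was read
            raise ValueError(f"truncated record at end of file: {chunk!r}")
        records.append(chunk)
    return records


print(read_records(io.BytesIO(b"GLYILEVAL")))  # [b'GLY', b'ILE', b'VAL']
```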

Error codes:

Program sample 3: Reading from a file.

The program sample files3.asm reads the insulin sequence from the file created before and displays it, 10 amino acids per line, and using a space as separator between the amino acid codes. The file is read in a loop, 1 amino acid at a time, until end-of-file is reached. Here is the source:

; Main program
segment code
..start:
        ; Initialization
        mov     ax, data
        mov     ds, ax
        mov     ax, stack
        mov     ss, ax
        mov     sp, stacktop
        ; Open the file
        mov     ah, 3dh
        mov     al, 0
        mov     dx, fname
        int     21h
        jc      file_error1
        mov     [handle], ax
        ; Read insuline chain A from the file, and display it
read_sequence:
        mov     byte [count], 0
        ; Read one amino acid
read_aa:
        mov     ah, 3fh
        mov     bx, [handle]
        mov     cx, 3
        mov     dx, amacid
        int     21h
        jc      file_error2
        test    ax, ax                           ; end-of-file
        jz      done
        ; Display this amino acid
        mov     ah, 09h
        mov     dx, amacid
        int     21h
        ; Line break after each 10 amino acids
        mov     cl, [count]
        inc     cl
        mov     [count], cl
        cmp     cl, 10
        jl      read_aa
        mov     dx, newline
        mov     ah, 09h
        int     21h
        jmp     read_sequence
        ; Open file error
file_error1:
        mov     dx, ferror1
        mov     ah, 09h
        int     21h
        jmp     exit
        ; Read file error
file_error2:
        mov     dx, ferror2
        mov     ah, 09h
        int     21h
        jmp     done
        ; Close the file
done:
        mov     bx, [handle]
        mov     ah, 3eh
        int     21h
        ; Terminate the program
exit:
        mov     ax, 4c00h
        int     21h

; Data segment
segment data
fname   db      'INSULINE.TXT', 00h
ferror1 db      'Error when opening file!', 13, 10, '$'
ferror2 db      'Error when reading from file!', 13, 10, '$'
amacid  db      'XXX'
        db      ' '
        db      '$'
newline db      13, 10, '$'
handle  resb    2
count   resb    1

; Stack segment
segment stack   stack
        resb    64
stacktop:
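To know exactly what files3.exe should print, its display loop can be modeled in Python. This is a sketch for checking the expected output, assuming the file length is a multiple of 3; the function name is hypothetical. Every code is followed by a space, and a carriage return/line feed pair is emitted after every 10 codes, so chain A (21 amino acids) gives two full lines and a third line containing only ASN.

```python
# Chain A, exactly as written to INSULINE.TXT by files2.asm
CHAIN_A = (b"GLYILEVALGLUGLNCYSCYSTHRSERILE"
           b"CYSSERLEUTYRGLNLEUGLUASNTYRCYS"
           b"ASN")


def files3_output(data: bytes, per_line: int = 10) -> str:
    """Reproduce the text that files3.exe displays for the given file contents."""
    out = []
    count = 0
    for i in range(0, len(data), 3):
        out.append(data[i:i + 3].decode("ascii") + " ")  # amacid followed by ' '
        count += 1
        if count == per_line:  # line break after every 10 codes
            out.append("\r\n")
            count = 0
    return "".join(out)


print(files3_output(CHAIN_A))
```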

The screenshot shows the output of the program files3.exe.

NASM on FreeDOS: Assembly program example - Reading from a file

Little exercise.

Nothing to do with file input/output and just some lines of code: Display the amino acid codes, using first letter uppercase format (other letters being lowercase).

Setting file position (Seek) (AH = 42h).

Function code 42h sets the current position in a file (previously opened with the "Open file" function). Subsequent read/write operations begin at this position.

Function arguments:

         Register    Value      Description
Input    AH          42h        DOS function code
         AL          (byte)     origin of move (0=begin, 1=current, 2=end)
         BX          (word)     file handle
         CX:DX       (dword)    signed offset from the origin of move
Output   Flags (CF)  clear      indicates successful call
                     set        indicates failed call
         DX:AX       (dword)    success: new file position in bytes from start of file
         AX                     failure: error code

Setting the file position allows random input/output: with fixed-length records, setting the file position to (N - 1) * record length gives access to the Nth record of the file (the first record starts at offset 0). Seeking may be done relative to the beginning of the file, the end of the file, or the current file position. As DX:AX returns the new position counted from the beginning of the file, setting the file position to the end of the file (AL = 2, offset 0) returns the file size. Setting the file position to the end of the file is also a simple way to append records to a file.

If the new position is beyond the current end-of-file, the file will be extended by the next write. On my FreeDOS system, after setting the file position beyond the current end-of-file, the next read does not result in an error, but apparently returns the last record of the file. Most likely, the read returns with CF clear and AX = 0 (no bytes read), so the buffer simply still contains the data from the previous read; checking AX after the read would confirm this.
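Because the offset is a signed 32-bit value passed in two 16-bit registers, a negative offset (for example, -3 from the end of the file to reach the last 3-byte record) must be written in two's complement across CX and DX. The Python helpers below are an illustrative sketch, and their names are hypothetical: they split an offset into CX:DX, combine the DX:AX result, and compute the offset of the Nth record.

```python
def split_offset(offset: int) -> tuple:
    """Return (CX, DX) for function 42h: a signed 32-bit offset as two 16-bit words."""
    if not -2**31 <= offset < 2**31:
        raise ValueError("offset does not fit in a signed 32-bit value")
    value = offset & 0xFFFFFFFF  # two's complement representation
    return value >> 16, value & 0xFFFF  # CX = high word, DX = low word


def join_position(high: int, low: int) -> int:
    """Combine the DX:AX result of function 42h (DX = high, AX = low word)."""
    return (high << 16) | low


def record_offset(n: int, record_length: int) -> int:
    """Offset of the Nth record, counting records from 1 (first record at 0)."""
    return (n - 1) * record_length


print(split_offset(record_offset(51, 3)))  # (0, 150)
print(split_offset(-3))  # (65535, 65533)
print(join_position(1, 0))  # 65536
```

For files smaller than 64 KB, as in the samples, CX is simply 0 and DX holds the whole offset.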

Error codes:

Program sample 4: Appending to a file.

The program sample files4.asm uses the file with the insulin chain A sequence from example 2 and appends the chain B sequence. The program then reads the entire file and displays the amino acids of the 2 chains (as one sequence).

; Main program
segment code
..start:
        ; Initialization
        mov     ax, data
        mov     ds, ax
        mov     ax, stack
        mov     ss, ax
        mov     sp, stacktop
        ; Open file
        mov     ah, 3dh
        mov     al, 2                            ; open file as read/write
        mov     dx, fname
        int     21h
        jc      file_error1
        mov     [handle], ax
        ; Set file position to end of file
        mov     ah, 42h
        mov     al, 2                            ; end of file position
        mov     bx, [handle]
        mov     cx, 0                            ; CX:DX = offset 0 from end of file
        mov     dx, 0
        int     21h
        jc      file_error2
        ; Write insuline chain B to file
        mov     byte [count], lchainb            ; number of amino acids
        lea     esi, [chainb]                    ; pointer to the amino acids data
write_aa:
        lea     edi, [amacid]                    ; pointer to the copy destination
        mov     cl, 3                            ; 3-letter amino acid codes
        ; Copy this amino acid
copy_char:
        mov     bl, [esi]
        mov     [edi], bl
        inc     esi
        inc     edi
        dec     cl
        test    cl, cl
        jnz     copy_char
        ; Write this amino acid to file
        mov     ah, 40h
        mov     bx, [handle]
        mov     cx, 3
        mov     dx, amacid
        int     21h
        jc      file_error3
        ; Continue with next amino acid (unless all are done)
        mov     cl, [count]
        dec     cl
        mov     [count], cl
        test    cl, cl
        jnz     write_aa
        ; Set file position to beginning of file
        mov     ah, 42h
        mov     al, 0                            ; begin of file position
        mov     bx, [handle]
        mov     cx, 0                            ; CX:DX = offset 0 from begin of file
        mov     dx, 0
        int     21h
        jc      file_error2
        ; Read insuline sequence from file, and display it
        mov     ah, 3fh
        mov     bx, [handle]
        mov     cx, 3 * (lchaina + lchainb)
        mov     dx, chainab
        int     21h
        jc      file_error4
        mov     dx, chainab
        mov     ah, 09h
        int     21h
        jmp     done
        ; Open file error
file_error1:
        mov     dx, ferror1
        mov     ah, 09h
        int     21h
        jmp     exit
        ; Seek file error
file_error2:
        mov     dx, ferror2
        mov     ah, 09h
        int     21h
        jmp     done
        ; Write file error
file_error3:
        mov     dx, ferror3
        mov     ah, 09h
        int     21h
        jmp     done
        ; Read file error
file_error4:
        mov     dx, ferror4
        mov     ah, 09h
        int     21h
        jmp     done
        ; Close the file
done:
        mov     bx, [handle]
        mov     ah, 3eh
        int     21h
        ; Terminate the program
exit:
        mov     ax, 4c00h
        int     21h

; Data segment
segment data
lchaina equ     21
lchainb equ     30
chainb  db      'PHEVALASNGLNHISLEUCYSGLYSERHIS'
        db      'LEUVALGLUALALEUTYRLEUVALCYSGLY'
        db      'GLUARGGLYPHEPHETYRTHRPROLYSTHR' ; B30: threonine in human insulin (alanine in porcine insulin)
fname   db      'INSULINE.TXT', 00h
ferror1 db      'Error when opening file!', 13, 10, '$'
ferror2 db      'Error when setting file position!', 13, 10, '$'
ferror3 db      'Error when writing to file!', 13, 10, '$'
ferror4 db      'Error when reading from file!', 13, 10, '$'
handle  resb    2
amacid  resb    3
count   resb    1
chainab resb    3 * (lchaina + lchainb)
        db      13, 10, '$'

; Stack segment
segment stack   stack
        resb    64
stacktop:

The screenshot shows how I made a backup of the original file INSULINE.TXT (containing chain A), then ran the program files4.exe, and finally made a backup of the modified file (containing both chains A and B).

NASM on FreeDOS: Assembly program example - Appending to a file

Little exercise.

Modify the display output of the program as follows: Display the 2 chains separately, preceded by a header and using a formatted output with 10 amino acids per line, and a space separating the amino acid codes.

Program sample 5: Random read.

The program sample files5.asm displays the Nth amino acid of the insulin sequence (file with chains A and B, created with the preceding sample program), the number N being entered by the user.

; Main program
segment code
..start:
        ; Initialization
        mov     ax, data
        mov     ds, ax
        mov     ax, stack
        mov     ss, ax
        mov     sp, stacktop
        ; Open file
        mov     ah, 3dh
        mov     al, 0                            ; open file as read
        mov     dx, fname
        int     21h
        jc      file_error1
        mov     [handle], ax
; Display amino acid at position entered by user
; until user input is empty
loop_aa:
        ; Ask for position
        mov     dx, askpos
        mov     ah, 09h
        int     21h
        ; Get position from keyboard
        mov     dx, buffer
        mov     ah, 0ah
        int     21h
        mov     dx, newline
        mov     ah, 09h
        int     21h
        mov     cl, [poslen]
        cmp     cl, 0                            ; terminate program if no input
        je      done
        ; Convert ASCII to positive integer
        lea     esi, [pospos]                    ; first digit
        lea     edi, [pospos + 1]                ; second digit
        cmp     cl, 2
        je      convert
        mov     al, [esi]                        ; make 2-digit number
        mov     [edi], al
        mov     byte [esi], '0'
convert:
        xor     ax, ax
        mov     al, [esi]
        sub     al, 30h                          ; '0' = ASCII 30h
        mov     bl, 10
        imul    bl                               ; first digit * 10
        xor     bx, bx
        mov     bl, [edi]
        sub     bl, 30h                          ; '0' = ASCII 30h
        add     ax, bx                           ; second digit as such
        cmp     ax, 1                            ; position must be >= 1
        jl      invalid_1
        cmp     ax, seqlen                       ; position must be <= sequence length
        jg      invalid_2
        dec     ax                               ; first record = 0
        mov     bx, 3                            ; file position = aa position * aa code length (3)
        imul    bx
        ; Set file position
        mov     cx, 0                            ; CX:DX is offset in file
        mov     dx, ax
        mov     ah, 42h
        mov     al, 0                            ; begin of file position
        mov     bx, [handle]
        int     21h
        jc      file_error2
        ; Read amino acid at this position
        mov     ah, 3fh
        mov     bx, [handle]
        mov     cx, 3                            ; 3-letter amino acid code
        mov     dx, amacid
        int     21h
        jc      file_error3
        ; Display this amino acid
        mov     ah, 09h
        mov     dx, amacid
        int     21h
        jmp     loop_aa
        ; Invalid position (< 1)
invalid_1:
        mov     dx, poserr1
        mov     ah, 09h
        int     21h
        jmp     loop_aa
        ; Invalid position (> seqlen)
invalid_2:
        mov     dx, poserr2
        mov     ah, 09h
        int     21h
        jmp     loop_aa
        ; Open file error
file_error1:
        mov     dx, ferror1
        mov     ah, 09h
        int     21h
        jmp     exit
        ; Seek file error
file_error2:
        mov     dx, ferror2
        mov     ah, 09h
        int     21h
        jmp     done
        ; Read file error
file_error3:
        mov     dx, ferror3
        mov     ah, 09h
        int     21h
        jmp     done
        ; Close the file
done:
        mov     bx, [handle]
        mov     ah, 3eh
        int     21h
        ; Terminate the program
exit:
        mov     ax, 4c00h
        int     21h

; Data segment
segment data
seqlen  equ     51
maxlen  equ     2
fname   db      'INSULINE.TXT', 00h
askpos  db      'Amino acid position in insuline sequence? ', '$'
poserr1 db      'Invalid position!', 13, 10, '$'
poserr2 db      'Sequence has only 51 amino acids!', 13, 10, '$'
ferror1 db      'Error when opening file!', 13, 10, '$'
ferror2 db      'Error when setting file position!', 13, 10, '$'
ferror3 db      'Error when reading from file!', 13, 10, '$'
amacid  resb    3
newline db      13, 10, '$'
handle  resb    2
buffer  db      maxlen + 1
poslen  resb    1
pospos  resb    maxlen + 1

; Stack segment
segment stack   stack
        resb    64
stacktop:
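The ASCII-to-integer conversion and the offset computation of files5.asm can be checked with a small Python model (a sketch; the function name is hypothetical). Unlike the assembly code, which only relies on the range check, the model also rejects characters that are not decimal digits.

```python
SEQLEN = 51  # amino acids in chains A and B together
CODE_LEN = 3  # length of a 3-letter amino acid code


def position_to_offset(typed: str) -> int:
    """Model of files5.asm: 1 or 2 typed digits -> file offset of that amino acid."""
    if len(typed) not in (1, 2) or any(c not in "0123456789" for c in typed):
        raise ValueError("expected 1 or 2 decimal digits")
    digits = typed.rjust(2, "0")  # '7' -> '07', like the program's 2-digit padding
    n = (ord(digits[0]) - 0x30) * 10 + (ord(digits[1]) - 0x30)  # '0' = ASCII 30h
    if not 1 <= n <= SEQLEN:
        raise ValueError(f"position must be between 1 and {SEQLEN}")
    return (n - 1) * CODE_LEN  # first record = 0


print(position_to_offset("7"), position_to_offset("51"))  # 18 150
```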

The screenshot shows an execution of files5.exe.

NASM on FreeDOS: Assembly program example - Random read of a file

Little exercise.

Modify the program as follows: Display the Nth amino acid of either chain A or chain B (the user being asked for the chain and for the position of the amino acid within this chain).

Deleting a file (unlink) (AH = 41h).

Function code 41h deletes a file specified by its filename.

Function arguments:

         Register    Value      Description
Input    AH          41h        DOS function code
         DS:DX       (pointer)  filename
Output   Flags (CF)  clear      indicates successful call
                     set        indicates failed call
         AX          (word)     success: AL = drive of deleted file; AH = destroyed
                                failure: error code

DOS does not erase the file's data (which means that, as long as the data has not been overwritten, the file can be recovered with an "undelete" utility). The file merely becomes inaccessible: its directory entry is marked as deleted, and its FAT chain is freed. Deleting a file that is currently open may lead to file system corruption!

Error codes:

Copying and moving a file.

The DOS API doesn't include any function to copy a file. Thus, to do this from an assembly program, you have to read the original file (in blocks, or even byte by byte, until end-of-file) and write the new one. For a "move", you then have to delete the original file. Within the same drive, function 56h ("Rename file") can also move a file to another directory without copying its data.
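The copy loop itself is simple: read a block, stop when the read returns 0 bytes, otherwise write exactly the bytes read. The Python sketch below uses ordinary Python file objects to show the loop; copy_file and the 512-byte block size are my own choices, not DOS requirements. In assembly, each read would be a call to function 3fh and each write a call to function 40h, with the number of bytes read (AX) passed as CX to the write.

```python
import os
import tempfile


def copy_file(src: str, dst: str, block_size: int = 512) -> int:
    """Copy src to dst block by block; returns the number of bytes copied."""
    total = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:  # cf. functions 3dh and 3ch
        while True:
            block = fin.read(block_size)  # cf. function 3fh; len(block) = AX
            if not block:  # AX = 0: end-of-file reached
                break
            written = fout.write(block)  # cf. function 40h with CX = AX of the read
            if written != len(block):  # AX < CX on a write: usually a full disk
                raise OSError("short write")
            total += written
    return total


with tempfile.TemporaryDirectory() as d:
    src, dst = os.path.join(d, "A.TXT"), os.path.join(d, "B.TXT")
    with open(src, "wb") as f:
        f.write(b"x" * 1300)  # 2 full 512-byte blocks + 276 bytes
    copied = copy_file(src, dst)
    with open(dst, "rb") as f:
        same = f.read() == b"x" * 1300
print(copied, same)  # 1300 True
```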

Working with text files.

The DOS API doesn't include any particular functions for working with text files. This means that our assembly program has to access them as it would any other file. A text file created, for example, with the DOS editor is an unstructured sequence of bytes; each line of the text file is terminated by an end-of-line marker made of the two bytes 0Dh (carriage return) and 0Ah (line feed), in this order. Reading a text file thus consists of reading it byte by byte (until end-of-file): if the last two bytes read are 0Dh and 0Ah, the current line has been completely read.

If we know the length of the file and its size is not too big, we can read it all at once. To extract the different lines, we then have to parse the memory area where we stored the file content, looking for the end-of-line markers 0Dh 0Ah.
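Scanning a buffer for end-of-line markers can be sketched as follows in Python (split_dos_lines is a hypothetical name). The explicit byte-by-byte loop mirrors what an assembly program would do; in Python itself, buf.split(b"\r\n") would be the usual shortcut, although it produces an extra empty element when the file ends with CR LF. A last line without an end-of-line marker is kept as it is.

```python
CR, LF = 0x0D, 0x0A


def split_dos_lines(buf: bytes) -> list:
    """Split a whole DOS text file held in memory into lines (hypothetical helper)."""
    lines = []
    start = 0
    i = 0
    while i < len(buf) - 1:
        if buf[i] == CR and buf[i + 1] == LF:  # end-of-line marker 0Dh 0Ah
            lines.append(buf[start:i])
            i += 2
            start = i
        else:
            i += 1
    if start < len(buf):  # last line without end-of-line marker
        lines.append(buf[start:])
    return lines


print(split_dos_lines(b"LINE 1\r\nLINE 2\r\nLAST"))  # [b'LINE 1', b'LINE 2', b'LAST']
```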

When creating a text file from assembly, we have two possibilities:

  1. Using a fixed line length format (filling the unused positions with spaces). This creates a structured (record-based) file, where the different lines are easily accessible. Using this format, we can read the file line by line. It is also possible to read (and write) a given line, identified by its line number (record number = line number - 1). The disadvantages of this method are obvious: First, the file is bigger (possibly much bigger) than the size needed to store the information that it actually contains. Second, unless we add 0Dh 0Ah at the end of each record, such files do not display properly in a text editor or other software.
  2. Using a variable line length format. This creates a regular text file, i.e. a sequence of bytes, the different lines being terminated with 0Dh 0Ah (we could use a custom end-of-line marker, but then the file would not display properly in other programs, such as an editor). The size of such files corresponds to the size of the information that they actually contain (plus the end-of-line markers). The disadvantage of this method is that a line cannot be read with a single read of known length: reading line by line means scanning for the end-of-line markers, and random access to a given line is not possible without first scanning the file. Also, we have to deal with the end-of-line markers included in the file.
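The fixed line length approach (option 1) can be sketched in Python as follows. The record width of 20 characters and the sample lines are arbitrary assumptions. Every record is padded with spaces and terminated with 0Dh 0Ah, so the file still displays properly in an editor, and the offset of a line, which an assembly program would pass to function 42h, is (line number - 1) * record length.

```python
RECORD_WIDTH = 20  # assumed maximum text length of a line
RECORD_LEN = RECORD_WIDTH + 2  # text + 0Dh 0Ah


def make_record(text: str) -> bytes:
    """Pad a line with spaces to the fixed width and add CR LF."""
    if len(text) > RECORD_WIDTH:
        raise ValueError("line too long for the fixed record length")
    return text.ljust(RECORD_WIDTH).encode("ascii") + b"\r\n"


def line_offset(line_number: int) -> int:
    """File offset of a line: record number = line number - 1."""
    return (line_number - 1) * RECORD_LEN


data = b"".join(make_record(t) for t in ["CHAIN A", "CHAIN B", "END"])
off = line_offset(2)
print(len(data), data[off:off + RECORD_LEN].rstrip())  # 66 b'CHAIN B'
```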

