DNABasics Help.

DNABasics is a biology application primarily intended for students and people interested in the basics of molecular genetics: simple statistics and molecular weight of a DNA sequence, transcription of DNA to RNA, and translation of RNA to proteins. Sequences (raw data or FASTA format) may be entered manually, loaded from a file or randomly generated. All sequences may be saved to a file, with the number of characters per line chosen by the user. Translation may be done for the complete sequence (using any of the 3 reading frames), or taking the start and/or stop codon into account. A translation report tells the user which part of the sequence has actually been translated.

Menu: File.

Exit.

Exit the DNABasics application.

Menu: Settings.

Allow lowercase base codes.

Normally, nucleobase codes are uppercase. However, many bioinformatics applications accept lowercase codes, and so does DNABasics if this option is checked (this is the default setting). If you uncheck the option and enter a sequence containing lowercase codes, you will get an "Invalid DNA/RNA sequence" error message.

Allow extended base codes.

With this option unchecked (this is the default setting), the sequence that you enter or load from a file may only contain the 4 standard base codes (A, C, G, T for DNA; A, C, G, U for RNA). If you check the option, all IUB/IUPAC base codes (R = purine, Y = pyrimidine, N = any of the 4 bases, ...) will be accepted. You must also check this option if you want random sequences to contain extended base codes.
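The combined effect of the two options above can be sketched as a small validation routine. This is only an illustration, not the DNABasics source code: the function name is_valid_sequence and its parameters are hypothetical, and treating an empty sequence as invalid is an assumption.

```python
# Standard base codes and the extended IUB/IUPAC codes (shared by DNA and RNA).
STANDARD_DNA = set("ACGT")
STANDARD_RNA = set("ACGU")
EXTENDED = set("RYSWKMBDHVN")


def is_valid_sequence(seq, rna=False, allow_lowercase=True, allow_extended=False):
    """Return True if seq only uses codes permitted by the two settings."""
    if not seq:
        return False  # assumption: an empty sequence is rejected
    if allow_lowercase:
        seq = seq.upper()  # lowercase codes are accepted as their uppercase form
    allowed = set(STANDARD_RNA if rna else STANDARD_DNA)
    if allow_extended:
        allowed |= EXTENDED
    return all(base in allowed for base in seq)


print(is_valid_sequence("acgt"))                         # lowercase allowed
print(is_valid_sequence("acgt", allow_lowercase=False))  # lowercase rejected
print(is_valid_sequence("ACGN", allow_extended=True))    # extended code allowed
```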

Initiate translation at start codon.

With this option unchecked (this is the default setting), the translation of the sequence starts with the first codon of the chosen reading frame (bases 1-3, 2-4 or 3-5 for reading frames 1, 2 and 3 respectively). When the option is checked, translation starts with the first start codon found; if there isn't a start codon, no translation will be done.

Terminate translation at stop codon.

With this option unchecked (this is the default setting), the translation of the sequence continues until the end of the sequence (with possibly the last base or the last 2 bases being ignored). If the option is checked, translation ends with the first stop codon found; if there isn't a stop codon, translation continues until the end of the sequence.

Random sequence properties...

This menu item opens a window where you can configure the randomly generated sequences. Minimum length and Maximum length determine the length interval of the random sequences. Uncertain bases lets you set the approximate percentage of bases that are not known with certainty (i.e. the approximate percentage of extended base codes). If Unknown bases only is checked, all uncertain bases will be "any base" (N). Note that extended base codes must be enabled in order to get random sequences with uncertain bases.
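To make these settings more concrete, here is a minimal sketch of how such a random sequence could be generated. It is not the algorithm actually used by DNABasics; the function random_dna, its parameters and the per-base random draw are assumptions made for illustration.

```python
import random

UNCERTAIN_CODES = "RYSWKMBDHVN"  # IUB/IUPAC codes other than the 4 standard bases


def random_dna(min_len, max_len, uncertain_pct=0.0, unknown_only=False, rng=None):
    """Build a random DNA sequence with roughly uncertain_pct % extended codes."""
    rng = rng or random.Random()
    length = rng.randint(min_len, max_len)  # both bounds are inclusive
    bases = []
    for _ in range(length):
        if rng.random() * 100 < uncertain_pct:
            # Uncertain base: always N if "unknown bases only" is set
            bases.append("N" if unknown_only else rng.choice(UNCERTAIN_CODES))
        else:
            bases.append(rng.choice("ACGT"))
    return "".join(bases)


print(random_dna(20, 40, uncertain_pct=10))
```

Because each base is drawn independently, the share of uncertain bases is only approximately the requested percentage, which matches the "approximate percentage" wording above.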

Sequence display format...

This menu item lets you set the number of characters (base codes or 1-letter amino acid codes) per line. The format set here will be used for sequence display and sequence saving. If, for example, you set a format of 80 characters and you load a DNA sequence from a file formatted with 72 bases per line, the sequence will be displayed with 80 bases per line and if you save it, the output file will also be formatted with 80 characters per line. Note that for 3-letter coded proteins, the format set here means "characters", not "amino acids"; the number of characters may be diminished by 1 or 2 in order not to span a code over two lines. To create a one-line sequence (sequence without end-of-line characters), set the number of characters per line to 0.
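As an illustration of what re-formatting a sequence means, the sketch below removes the line breaks of the loaded text and re-wraps it at the chosen width. The function name reformat_sequence is hypothetical, and the sketch covers base codes and 1-letter amino acid codes only.

```python
def reformat_sequence(text, width):
    """Re-wrap a sequence at `width` characters per line (0 = one single line)."""
    seq = "".join(text.split())  # drop existing line breaks and other whitespace
    if width <= 0:
        return seq
    return "\n".join(seq[i:i + width] for i in range(0, len(seq), width))


loaded = "ACGT\nACGT\nAC"            # file formatted with 4 bases per line
print(reformat_sequence(loaded, 5))  # displayed with 5 bases per line
print(reformat_sequence(loaded, 0))  # one-line sequence
```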

Menu: Tools.

Display base codes table.

Displays a table with all IUB/IUPAC base codes (for both DNA and RNA), together with the standard bases that each code represents.

Display genetic code table.

Displays a table with all possible codons and the corresponding amino acids (in both 1-letter and 3-letter code). Note that the codons are those found in an RNA sequence (in the corresponding DNA, U is replaced by T).

6 reading frames translation...

Transcribes the currently loaded DNA sequence and its reverse complement, then translates the two resulting RNA sequences to proteins for all 3 reading frames. This is equivalent to translating all 6 reading frames of the DNA sequence, considered as double-stranded. The result is saved to a multiple-sequence FASTA file.
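The following sketch shows the idea of a 6 reading frames translation for sequences that contain only the 4 standard bases. It is an illustration under assumptions, not the DNABasics implementation: the function names and the frame labels (+1 to +3, -1 to -3) are hypothetical, and stop codons are shown as underscores.

```python
from itertools import product

# Standard genetic code, codons ordered with bases U, C, A, G; "_" marks a stop codon.
AMINO = "FFLLSSSSYY__CC_WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product("UCAG", repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Complement every base, then reverse the strand."""
    return dna.translate(COMPLEMENT)[::-1]


def translate_frame(rna, frame):
    """Translate every complete codon, starting at offset frame (0, 1 or 2)."""
    return "".join(CODON_TABLE[rna[i:i + 3]] for i in range(frame, len(rna) - 2, 3))


def six_frames(dna):
    """Return the 6 translations, keyed by strand sign and frame number."""
    result = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        rna = seq.replace("T", "U")  # transcription as done by the application
        for frame in range(3):
            result[f"{strand}{frame + 1}"] = translate_frame(rna, frame)
    return result


for name, protein in six_frames("ATGGCCTAA").items():
    print(f">frame {name}")  # hypothetical FASTA header
    print(protein)
```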

Menu: Help.

Biology help.

Opens your default web browser pointed to the Nucleic acids and genetic code primer tutorial on my website www.streetinfo.lu.

Application help.

Opens your default web browser pointed to the DNABasics application help file (this document).

About.

Displays the DNABasics application version, author and date-written.

DNA analysis.

With a DNA sequence loaded in the corresponding input field, you can use the Analyze button to calculate the molecular weight of the sequence and to determine some simple statistical values.

The sequence molecular weight is equal to the sum of the molecular weights of its bases + 17.01 (molecular weight of a hydroxyl group). It is calculated for both single-stranded DNA (the actual sequence) and double-stranded DNA (the sequence plus its reverse complement).
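The calculation can be sketched as follows for a sequence containing only A, C, G and T. The weights of A, G and T are the ones quoted in this help text; the weight of C (289.18) is a commonly used value that is assumed here. That each strand of the double-stranded DNA gets its own + 17.01 term is also an assumption, and the function names are hypothetical.

```python
BASE_WEIGHT = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}  # C is assumed
OH_WEIGHT = 17.01  # term added once per strand
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def single_strand_weight(dna):
    """Sum of the base weights plus 17.01."""
    return sum(BASE_WEIGHT[base] for base in dna) + OH_WEIGHT


def double_strand_weight(dna):
    """Weight of the sequence plus the weight of its reverse complement."""
    other = dna.translate(COMPLEMENT)[::-1]
    return single_strand_weight(dna) + single_strand_weight(other)


print(round(single_strand_weight("ACGT"), 2))  # 1252.81
print(round(double_strand_weight("ACGT"), 2))  # 2505.62
```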

If the sequence contains bases other than A, C, G, T, two molecular weights are displayed: the minimum molecular weight corresponds to the case where every uncertain base is assumed to be the candidate with the lowest molecular weight; the maximum molecular weight corresponds to the case where every uncertain base is assumed to be the candidate with the highest molecular weight. Example: for base code D, corresponding to A (molecular weight: 313.21), G (329.21), or T (304.20), the molecular weight of T will be used to calculate the minimum and the molecular weight of G to calculate the maximum.
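A minimal sketch of this minimum/maximum calculation is given below. It uses the standard IUB/IUPAC meaning of each extended code; the weight of C (289.18) is an assumed value not quoted in this help text, and the function name weight_range is hypothetical.

```python
BASE_WEIGHT = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}  # C is assumed
OH_WEIGHT = 17.01

# Standard bases that each IUB/IUPAC code may stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def weight_range(dna):
    """Return (minimum, maximum) single-strand molecular weight."""
    low = sum(min(BASE_WEIGHT[b] for b in IUPAC[code]) for code in dna)
    high = sum(max(BASE_WEIGHT[b] for b in IUPAC[code]) for code in dna)
    return low + OH_WEIGHT, high + OH_WEIGHT


low, high = weight_range("D")          # D = A, G or T
print(round(low, 2), round(high, 2))   # 321.21 346.22 (T for the minimum, G for the maximum)
```

For a sequence without extended codes, both values are identical.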

The statistical values are simple counts and the corresponding percentages: the base composition (A, C, G, T and others), the purines (A, G, R), pyrimidines (C, T, Y) and unknown bases (N), and finally the GC percentage. If the sequence contains uncertain base codes, the GC percentage is a minimum value: it counts G, C and S, but not base codes such as R, Y or V, which might stand for G or C.
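The counts described above can be sketched like this. The function base_stats and the keys of the returned dictionary are hypothetical names chosen for the example; only the counting rules come from the text.

```python
from collections import Counter


def base_stats(dna):
    """Count purines, pyrimidines, unknown bases and the minimum GC percentage."""
    counts = Counter(dna.upper())
    total = sum(counts.values())
    gc_min = counts["G"] + counts["C"] + counts["S"]  # S is always G or C
    return {
        "length": total,
        "purines": counts["A"] + counts["G"] + counts["R"],
        "pyrimidines": counts["C"] + counts["T"] + counts["Y"],
        "unknown": counts["N"],
        "gc_percent_min": 100 * gc_min / total if total else 0.0,
    }


print(base_stats("GGCCSRNA"))
# {'length': 8, 'purines': 4, 'pyrimidines': 2, 'unknown': 1, 'gc_percent_min': 62.5}
```

In this example R could be a G, so the real GC percentage may be higher than 62.5; this is why the value is reported as a minimum.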

Working with DNA.

You can load a DNA sequence from a file (button Load), let the application generate a random one (button Random), or enter it manually into the DNA input field. All sequences may be either raw data or FASTA. The number of characters per line will be the one defined in the Settings menu (not the one used in the loaded file, for example). Also remember that in order to use the extended base codes, you must explicitly enable them in the Settings menu. You can remove the sequence from the input field using the Clear button.

The DNA sequence may be written to a file using the Save button. Again, the number of characters per line will be the one defined in the Settings menu. This lets you, for example, re-save an existing file with a more convenient formatting. If you save a FASTA sequence as raw data (e.g. as a text file), the FASTA header will simply be dropped. If you try to save raw data as FASTA, a warning message is displayed. You can then choose to save the sequence anyway (the application adds ">" as FASTA header in this case), or to cancel the operation (for example to manually add a more descriptive FASTA header). In all cases, existing files will only be overwritten after user confirmation.
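The difference between saving as raw data and saving as FASTA can be illustrated with the short sketch below. The function names are hypothetical, and the warning message, the file handling and the line formatting done by the application are left out.

```python
def split_fasta(text):
    """Return (header, sequence); header is None for raw data."""
    lines = text.strip().splitlines()
    if lines and lines[0].startswith(">"):
        return lines[0], "".join(lines[1:])
    return None, "".join(lines)


def to_raw(text):
    """Saving as raw data: the FASTA header, if any, is dropped."""
    return split_fasta(text)[1]


def to_fasta(text):
    """Saving as FASTA: raw data gets the minimal header '>'."""
    header, seq = split_fasta(text)
    return f"{header or '>'}\n{seq}"


print(to_raw(">seq1 test\nACGT\nAC"))  # ACGTAC
print(to_fasta("ACGTAC"))              # ">" on the first line, then ACGTAC
```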

Use the button Reverse complement to calculate the reverse complement of the sequence (which corresponds to the second strand of the DNA double helix). If you analyze both of these sequences, you'll note some fundamental properties of DNA sequences:

The transcription of the DNA sequence to RNA is done using the button Transcribe. This simply transforms the entire DNA sequence to RNA (replacing base T by base U). The RNA sequence is displayed in the corresponding input field, using the number of characters per line defined in the Settings menu; the FASTA header is copied without modification.
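The reverse complement and transcription operations described above, together with the inverse of transcription, can be sketched in a few lines. This is an illustration only: the function names are hypothetical, the sequence is assumed to be uppercase, and the complements of the extended codes follow the standard IUB/IUPAC definitions.

```python
# Complement of each DNA base code, including the extended IUB/IUPAC codes.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(dna):
    """Complement every base, then read the result backwards."""
    return dna.translate(COMPLEMENT)[::-1]


def transcribe(dna):
    """DNA to RNA as done by the application: every T becomes U."""
    return dna.replace("T", "U")


def reverse_transcribe(rna):
    """RNA to DNA: every U becomes T."""
    return rna.replace("U", "T")


print(reverse_complement("ATGC"))  # GCAT
print(transcribe("ATGC"))          # AUGC
```

Applying reverse_complement twice gives back the original sequence.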

Working with RNA.

Everything said for DNA concerning loading and saving a sequence and generating random sequences also applies to RNA.

Use the button Reverse transcribe to do a reverse transcription of the RNA sequence to DNA (which corresponds to creating a cDNA sequence). This transforms the entire RNA sequence to DNA (replacing base U by base T). The DNA sequence is displayed in the corresponding input field, using the number of characters per line defined in the Settings menu; the FASTA header is copied without modification.

The translation of an RNA sequence to a protein (or polypeptide, if you prefer) is done using the button Translate. This operation, which demonstrates how the genetic code works, is the central feature of the DNABasics application.

There are essentially two ways to use the RNA translation feature:

  1. Translating the sequence without considering start and stop codons. In this case the sequence is translated from beginning to end. "Beginning" here means starting with the codon at position 1, 2, or 3, depending on the reading frame that you want to consider. "End" here means as long as there is a complete codon to translate, so the last base or the last two bases of the sequence may be ignored. The resulting protein sequence may therefore contain "gaps" (stop codons do not code for any amino acid); they are represented by an underscore (_) in the protein sequence display.
  2. Translating the sequence considering start and/or stop codons. If "considering start codons" is enabled, the translation starts at the first occurrence of the start codon AUG; if no AUG is found, a message is displayed and nothing is translated. If "considering stop codons" is enabled, translation ends when the first stop codon (UAA, UAG, UGA) is found. If there isn't any stop codon, translation continues until the end of the sequence, with (as above) the possibility that 1 base or 2 bases are ignored.
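The two ways of translating can be sketched with one function and two switches. This is not the DNABasics source code: the function translate and its parameters are hypothetical, only the 4 standard bases are handled, and it is assumed that the start codon is searched at any position of the sequence (the reading frame being ignored in that case).

```python
from itertools import product

# Standard genetic code, codons ordered with bases U, C, A, G; "_" marks a stop codon.
AMINO = "FFLLSSSSYY__CC_WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product("UCAG", repeat=3), AMINO)}
STOP_CODONS = {"UAA", "UAG", "UGA"}


def translate(rna, frame=1, at_start=False, to_stop=False):
    """Return (protein, first_base, last_base), positions 1-based, or None."""
    if at_start:
        start = rna.find("AUG")  # assumption: searched at any position
        if start < 0:
            return None          # no start codon: nothing is translated
    else:
        start = frame - 1
    protein = []
    pos = start
    while pos + 3 <= len(rna):   # only complete codons are translated
        codon = rna[pos:pos + 3]
        if to_stop and codon in STOP_CODONS:
            break
        protein.append(CODON_TABLE[codon])
        pos += 3
    return "".join(protein), start + 1, pos


rna = "GGAUGGCCUAAGG"
print(translate(rna))                               # ('GWPK', 1, 12)
print(translate(rna, at_start=True))                # ('MA_', 3, 11)
print(translate(rna, at_start=True, to_stop=True))  # ('MA', 3, 8)
```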

The sequence translation report shows details concerning the translation operation: the position of the first base that has been translated (base 1, 2 or 3 at the beginning of the sequence, or the first base of the first start codon found); the position of the last base that has been translated (end of the sequence, with the possibility that 1 base or 2 bases have been ignored, or the base just before the first stop codon encountered); the number of bases that have been translated; and the number of amino acids of the resulting protein (equal to the number of bases translated divided by 3, minus the number of "gaps" that result from the "translation" of stop codons).
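The values of the report follow from simple arithmetic, as the sketch below shows. The function translation_report and its dictionary keys are hypothetical; the protein is expected in 1-letter code with underscores for gaps.

```python
def translation_report(protein, first_base, last_base):
    """Derive the report values from the protein and the translated range."""
    bases = last_base - first_base + 1  # number of bases translated
    gaps = protein.count("_")           # stop codons shown as underscores
    return {
        "first_base": first_base,
        "last_base": last_base,
        "bases_translated": bases,
        "amino_acids": bases // 3 - gaps,
    }


# Bases 3 to 11 translated to "MA_": 9 bases, 3 codons, 1 gap, 2 amino acids
print(translation_report("MA_", 3, 11))
```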

Working with proteins.

The application does not support protein input; proteins appear only as the result of RNA translation. Thus there are no Load and Random buttons near the protein display field, and this field is read-only. Saving a protein sequence, on the other hand, is of course possible. What has been said for DNA concerning sequence saving also applies to proteins.

The only operation on proteins that you can do with DNABasics is the transformation of 1-letter code to 3-letter code, using the button 3-letter code. The number of characters per line of a 3-letter code protein sequence is the one defined in the Settings menu, rounded down to the nearest multiple of 3. This means that 1 line of DNA/RNA base codes or 1-letter amino acid codes will correspond to about 3 lines of 3-letter amino acid codes (with amino acid codes never spanning 2 lines).
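The conversion and the rounding of the line length can be sketched as follows. The function to_three_letter is a hypothetical name, and writing a gap as three underscores in 3-letter code is an assumption, as this help text does not say how DNABasics displays it.

```python
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "_": "___",  # assumption: a gap is written as three underscores
}


def to_three_letter(protein, width=0):
    """Convert 1-letter codes to 3-letter codes, wrapped at `width` characters."""
    text = "".join(THREE_LETTER[aa] for aa in protein)
    if width <= 0:
        return text                    # 0 means a one-line sequence
    width = max(3, width - width % 3)  # round down to a multiple of 3
    return "\n".join(text[i:i + width] for i in range(0, len(text), width))


print(to_three_letter("MAGK", 8))  # 8 is rounded down to 6: MetAla / GlyLys
```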

Contact the author of DNABasics.

If you have any questions, suggestions for corrections or improvements, or ideas for new features, please don't hesitate to contact me, using the email address given in the navigation pane. And if you really like this application, or want to support me and my "free programs for everyone", please visit streetinfo.lu and sign my guestbook.