Avenger - Assembly Language Syntax v0.1

Avenger is an online multipass assembler that currently supports the MOS 6502 used in Cerberus 2080 and many other 8-bit computers. You can try it out here.

Avenger is written in JavaScript (ES6) and uses the Ace Editor, with UI supported by jQuery.

File Structure

An assembly file is a plain text file with lines that:

Lines may optionally be prefixed with a label, or end with a comment (starting with a semicolon).

Instruction Syntax

6502 instructions are all three letters long, and can be either all lowercase, or all uppercase (for example LDA or lda). They may optionally be followed by an address or operand. The 6502 has twelve addressing modes:

Numbers and Expressions

By default, numbers in Avenger are decimal. Binary numbers are prefixed with either % or 0b. Hexadecimal numbers are prefixed with either $ or 0x. The ASCII value of a single character can be referenced by enclosing it in single quotes, for instance 'a' has a value of 97 (decimal). Special characters are escaped with a backslash (\) character - so '\n' has a value of 10 (decimal) and '\'' or '\\' have the values of 39 and 92 respectively.
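As a rough illustration of the literal forms described above, the Python sketch below converts a single literal to its integer value. The function name parse_number and the ESCAPES table are hypothetical, and only the three escapes named in the text are handled; this is not Avenger's own parser.

```python
# Hypothetical model of the literal forms described above (not Avenger's parser).
# Only the escapes named in the text are included.
ESCAPES = {"n": 10, "'": 39, "\\": 92}


def parse_number(token):
    """Return the integer value of one numeric or character literal."""
    # Character literal: 'a', '\n', '\'' or '\\'
    if len(token) >= 3 and token[0] == "'" and token[-1] == "'":
        body = token[1:-1]
        if body[0] == "\\":
            if len(body) == 2 and body[1] in ESCAPES:
                return ESCAPES[body[1]]
            raise ValueError("unsupported escape: " + token)
        if len(body) == 1:
            return ord(body)
        raise ValueError("character literal must hold one character: " + token)
    # Binary: % or 0b prefix
    if token.startswith("%"):
        return int(token[1:], 2)
    if token.startswith("0b"):
        return int(token[2:], 2)
    # Hexadecimal: $ or 0x prefix
    if token.startswith("$"):
        return int(token[1:], 16)
    if token.startswith("0x"):
        return int(token[2:], 16)
    # Anything else is treated as decimal
    return int(token, 10)


print(parse_number("$FF"))    # 255
print(parse_number("%1010"))  # 10
print(parse_number("'a'"))    # 97
```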

Expressions are evaluated with normal precedence rules. Brackets can be used to override these. Expressions are calculated as 32-bit integers. Note that division is therefore an integer operation with the result being truncated.

Operator     Precedence  Function
||           0           Logical OR
&&           0           Logical AND
|            1           Bitwise OR
^            1           Bitwise Exclusive-OR
&            2           Bitwise AND
== !=        3           Equals, not equals
< <= => >    4           Comparisons
<< >>        5           Bit shift left, right
+ -          6           Add, subtract
* /          8           Multiply, divide
~            8           Bitwise complement
!            8           Logical not
**           10          Raise to power

Logical operators return a value of 0 (false) or 1 (true). The expression parser interprets any non-zero value as true, and zero as false. In addition, the constants true and false can be used.
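The integer and logical rules above can be modelled in a few lines of Python. This is a sketch under stated assumptions, not Avenger's evaluator: the helper names are hypothetical, results are assumed to wrap as signed 32-bit values, and division is assumed to truncate toward zero. The text only says that the result is truncated.

```python
# Hypothetical model of the expression rules above (not Avenger's evaluator).

def to_int32(value):
    """Wrap a value into the signed 32-bit range (signedness is an assumption)."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def int_div(a, b):
    """Integer division, assumed to truncate toward zero."""
    quotient = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        quotient = -quotient
    return to_int32(quotient)


def logical_and(a, b):
    """Any non-zero operand counts as true; the result is 0 or 1."""
    return 1 if (a != 0 and b != 0) else 0


def logical_or(a, b):
    return 1 if (a != 0 or b != 0) else 0


def logical_not(a):
    return 1 if a == 0 else 0


print(int_div(7, 2))       # 3
print(logical_and(5, 0))   # 0
print(logical_or(0, -3))   # 1
```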

Zero Page and Ambiguous Values

Where an expression is used as an operand, there are a number of addressing modes that have ambiguous syntax. For example absolute addressing and zero page addressing both take a single numeric expression as an operand. By default, Avenger will attempt to use zero page addressing in ambiguous instructions, so long as the expression fits within a single byte value.

If the default behaviour is not wanted, it can be overridden. Prefixing an expression with < will force zero page addressing to be used, taking only the lower byte of the expression as the effective address. Alternatively, prefixing an expression with ! will force absolute addressing, extending the expression value to two bytes as appropriate.
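The selection rule for ambiguous operands can be summarised as a small decision function. The Python sketch below is illustrative only: choose_mode and the mode names are hypothetical, and "fits within a single byte" is assumed to mean a value from 0 to 255.

```python
# Hypothetical model of the zero page / absolute choice (not Avenger's code).

def choose_mode(value, prefix=""):
    """Return (mode, operand) for an operand that could be zero page or absolute."""
    if prefix == "<":
        # Forced zero page: keep only the lower byte of the expression.
        return ("zero_page", value & 0xFF)
    if prefix == "!":
        # Forced absolute: the operand is emitted as two bytes.
        return ("absolute", value & 0xFFFF)
    if 0 <= value <= 0xFF:
        # Default: prefer zero page when the value fits in one byte (assumed 0-255).
        return ("zero_page", value)
    return ("absolute", value & 0xFFFF)


print(choose_mode(0x80))         # ('zero_page', 128)
print(choose_mode(0x1234, "<"))  # ('zero_page', 52)
print(choose_mode(0x80, "!"))    # ('absolute', 128)
```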

Byte Values for Immediate Mode

Immediate mode allows single byte values to be loaded into the A, X or Y registers. If the value of an expression does not fit in a byte (that is, it is less than -128 or greater than 255), an error will be raised. For larger values (such as words or address labels), the > and < prefixes allow the high and low bytes to be selected.

For example:
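As a rough model of the rule, the Python sketch below computes the byte that an immediate operand would produce. The function name immediate_byte is hypothetical, and negative values are assumed to be stored in two's complement form.

```python
# Hypothetical model of immediate-mode byte selection (not Avenger's code).

def immediate_byte(value, prefix=""):
    """Return the single byte encoded for an immediate operand."""
    if prefix == "<":
        return value & 0xFF          # low byte of a word or address
    if prefix == ">":
        return (value >> 8) & 0xFF   # high byte of a word or address
    if not -128 <= value <= 255:
        raise ValueError("value %d does not fit in a byte" % value)
    # Negative values are assumed to be encoded as two's complement.
    return value & 0xFF


address = 0x1234
print(hex(immediate_byte(address, "<")))  # 0x34
print(hex(immediate_byte(address, ">")))  # 0x12
```

Splitting an address this way is the usual route for storing a 16-bit pointer one byte at a time.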

Labels

A label is used to represent a value or address. Labels may start with a full-stop (.), an upper or lowercase letter, or an underscore. They may then have any sequence of upper or lowercase letters, digits or underscores. As examples my_label, .local_label and _Label123 are all valid labels. Labels are case sensitive.
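The naming rule above maps directly onto a regular expression. The Python sketch below is one way to express it; the names LABEL_RE, is_valid_label and is_local_label are hypothetical and not part of Avenger.

```python
import re

# First character: full-stop, letter or underscore.
# Remaining characters: letters, digits or underscores.
LABEL_RE = re.compile(r"[._A-Za-z][A-Za-z0-9_]*")


def is_valid_label(name):
    """True if the whole name matches the label rule described above."""
    return LABEL_RE.fullmatch(name) is not None


def is_local_label(name):
    """Local labels are the valid labels that start with a full-stop."""
    return is_valid_label(name) and name.startswith(".")


for candidate in ("my_label", ".local_label", "_Label123", "1abc"):
    print(candidate, is_valid_label(candidate))
```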

Address labels appear by themselves on a line, or at the start of an instruction line. Unless they are prefixed by a full-stop, they are global and cannot be redefined. Between global address labels, local labels can be used. These are prefixed with a full-stop and only need to be unique within their local scope (i.e. between global labels).
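One common way to implement this kind of scoping is to qualify each local label with the most recent global label. The Python sketch below assumes that approach purely for illustration; the source does not say how Avenger stores labels, and build_symbol_table and the sample label names are hypothetical.

```python
# Hypothetical model of global/local label scoping (not Avenger's code).

def build_symbol_table(definitions):
    """definitions: list of (label, address) pairs in source order."""
    symbols = {}
    scope = ""  # name of the most recent global label
    for name, address in definitions:
        if name.startswith("."):
            key = scope + name   # e.g. "start" + ".loop" -> "start.loop"
        else:
            scope = name         # a global label opens a new local scope
            key = name
        if key in symbols:
            raise ValueError("label redefined: " + key)
        symbols[key] = address
    return symbols


table = build_symbol_table([
    ("start", 0x0200),
    (".loop", 0x0202),
    ("next", 0x0210),
    (".loop", 0x0212),  # allowed: a different local scope
])
print(sorted(table))  # ['next', 'next.loop', 'start', 'start.loop']
```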

Optionally, address labels may have a colon (:) after them where they are defined for compatibility with other assemblers. The colon is ignored in this case.

Commands

Avenger has a small command set that is growing as the software is developed. Commands are either all uppercase, or all lowercase letters (mixed case commands are not recognised). These are the commands currently handled by the assembler:

ORG - Set Origin

Before code or data instructions can be assembled, Avenger requires that an origin (start address) is specified. The ORG command takes a single (constant) expression that indicates the address of the first byte of the assembled code. Note that an assembled program need not have only one origin - it is possible to have multiple blocks of code within one program that start at different addresses in the computer's memory. In this case, each code block can be started with a different ORG command.

Example:
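As a rough model of what multiple origins mean for the assembled output, the Python sketch below places two blocks of bytes at different start addresses in a sparse memory map. The function assemble_blocks is hypothetical and the addresses $0200 and $C000 are arbitrary illustrative values; the byte values are the standard 6502 encodings of LDA #$01, RTS and NOP.

```python
# Hypothetical model of ORG (not Avenger's code): each block starts at its
# own origin, and the location counter advances by one per emitted byte.

def assemble_blocks(blocks):
    """blocks: list of (origin, data) pairs, one pair per ORG command."""
    memory = {}
    for origin, data in blocks:
        location = origin
        for value in data:
            memory[location] = value
            location += 1
    return memory


memory = assemble_blocks([
    (0x0200, bytes([0xA9, 0x01, 0x60])),  # LDA #$01 / RTS
    (0xC000, bytes([0xEA])),              # NOP
])
print(sorted(hex(address) for address in memory))
# ['0x200', '0x201', '0x202', '0xc000']
```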

EQU, = - Equate

As well as being used to represent addresses in a program, labels can be used in place of numeric constants to make it easier to understand what the code does. The EQU or = command can be used to set a new label to a constant value.

Example:

.BYTE, DB - Byte data

Our programs often need data. The .BYTE or DB command allows one or more bytes of data to be specified as numeric expressions. Each byte of data is separated by a comma. Note that byte values are taken from the lower 8 bits of each expression - higher bits are ignored.

Example:
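The "lower 8 bits" rule can be shown with a short Python sketch. The function emit_bytes is hypothetical; it takes the already-evaluated expression values and returns the bytes that would be stored.

```python
# Hypothetical model of byte data emission (not Avenger's code).

def emit_bytes(values):
    """Keep only the lower 8 bits of each value; higher bits are ignored."""
    return bytes(value & 0xFF for value in values)


data = emit_bytes([1, 0x41, 0x1FF, 256])
print(list(data))  # [1, 65, 255, 0]
```

Note that 0x1FF and 256 are silently reduced to 255 and 0, so an out-of-range value here does not raise an error the way it does in immediate mode.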

.WORD, DW - Word data

Just as the .BYTE command handles byte data, the .WORD or DW command allows one or more words (16-bit values) to be specified. Words are stored in memory in little-endian format - that is, the lower byte of the word is stored first, followed by the higher byte. The .WORD command will therefore store two bytes for each operand.

Example:
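The little-endian layout can be sketched in Python as follows. The function emit_words is hypothetical, and masking each value to 16 bits is an assumption; the text does not say how larger values are treated.

```python
# Hypothetical model of word data emission (not Avenger's code).

def emit_words(values):
    """Store each value as two bytes: low byte first, then high byte."""
    out = bytearray()
    for value in values:
        word = value & 0xFFFF    # assumption: values are reduced to 16 bits
        out.append(word & 0xFF)  # low byte
        out.append(word >> 8)    # high byte
    return bytes(out)


data = emit_words([0x1234, 0xC000])
print([hex(b) for b in data])  # ['0x34', '0x12', '0x0', '0xc0']
```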