Avenger - Assembly Language Syntax v0.1
Avenger is an online multipass assembler that currently supports the MOS 6502 used in Cerberus 2080 and many other 8-bit computers. You can try it out here.
Avenger is written in JavaScript (ES6), and uses the Ace Editor, with UI supported by jQuery.
File Structure
An assembly file is a plain text file with lines that:
- Are blank, or
- Have a comment, starting with a semicolon, for example `; This is a comment`, or
- Contain a single machine instruction, for example `LDA #$23`, or
- Contain a single assembler command, for example `ORG $202`
Lines may optionally be prefixed with a label, or end with a comment (starting with a semicolon).
Instruction Syntax
6502 instructions are all three letters long, and can be either all lowercase, or all uppercase (for example `LDA` or `lda`). They may optionally be followed by an address or operand. The 6502 has twelve addressing modes:
- `INY` - Implied, no operand supplied
- `LDA #45` - Immediate, with a single byte operand
- `LDA $1234` - Absolute, using a two byte address
- `LDA $1234,X` - Absolute, X indexed
- `LDA $1234,Y` - Absolute, Y indexed
- `LDA $12` - Zero page, using a single byte address
- `LDA $12,X` - Zero page, X indexed
- `LDA $12,Y` - Zero page, Y indexed
- `JMP ($1234)` - Indirect (on the 6502 only `JMP` uses this mode)
- `LDA ($12,X)` - X indexed, indirect
- `LDA ($12),Y` - Indirect, Y indexed
- `BRA $12` - Relative (branches)
Numbers and Expressions
By default, numbers in Avenger are decimal. Binary numbers are prefixed with either `%` or `0b`. Hexadecimal numbers are prefixed with either `$` or `0x`. The ASCII value of a single character can be referenced by enclosing it in single quotes, for instance `'a'` has a value of 97 (decimal). Special characters are escaped with a backslash (`\`) character - so `'\n'` has a value of 10 (decimal) and `'\''` or `'\\'` have the values of 39 and 92 respectively.
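The literal rules above can be modelled in a few lines of Python. This is only an illustrative sketch, not Avenger's own parser (which is written in JavaScript): the function name `parse_literal` and the `ESCAPES` table are hypothetical, and only the three escapes named in the text are handled.

```python
# Hypothetical helper that mirrors the literal rules described above.
# It is NOT Avenger's code; names and error handling are assumptions.
ESCAPES = {"n": 10, "'": 39, "\\": 92}  # only the escapes named in the text


def parse_literal(text: str) -> int:
    """Return the integer value of one numeric or character literal."""
    if text.startswith("%"):
        return int(text[1:], 2)       # binary, % prefix
    if text.startswith("0b"):
        return int(text[2:], 2)       # binary, 0b prefix
    if text.startswith("$"):
        return int(text[1:], 16)      # hexadecimal, $ prefix
    if text.startswith("0x"):
        return int(text[2:], 16)      # hexadecimal, 0x prefix
    if len(text) >= 3 and text[0] == "'" and text[-1] == "'":
        body = text[1:-1]
        if body.startswith("\\"):     # escaped character such as '\n'
            if len(body) != 2 or body[1] not in ESCAPES:
                raise ValueError("unsupported escape: " + text)
            return ESCAPES[body[1]]
        if len(body) != 1:
            raise ValueError("expected a single character: " + text)
        return ord(body)              # plain character, ASCII value
    return int(text, 10)              # default: decimal
```

For example, `parse_literal("$1F")` and `parse_literal("0b11111")` both evaluate to 31 in this sketch.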
Expressions are evaluated with normal precedence rules. Brackets can be used to override these. Expressions are calculated as 32-bit integers. Note that division is therefore an integer operation with the result being truncated.
Operator | Precedence | Function |
---|---|---|
`\|\|` | 0 | Logical OR |
`&&` | 0 | Logical AND |
`\|` | 1 | Bitwise OR |
^ | 1 | Bitwise Exclusive-OR |
& | 2 | Bitwise AND |
== != | 3 | Equals, not equals |
< <= => > | 4 | Comparisons |
<< >> | 5 | Bit shift left, right |
+ - | 6 | Add, subtract |
* / | 8 | Multiply, divide |
~ | 8 | Bitwise complement |
! | 8 | Logical not |
** | 10 | Raise to power |
Logical operators return a value of `0` (false) or `1` (true). The expression parser interprets any non-zero value as true, and zero as false. In addition, the constants `true` and `false` can be used.
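As a rough illustration of the arithmetic rules above, the following Python sketch models 32-bit results, truncating division and 0/1 logical results. The helper names are hypothetical, and two details are assumptions that the text does not state: that values wrap as signed 32-bit integers, and that "truncated" means rounding toward zero.

```python
# Illustrative model of the expression arithmetic; not Avenger's code.
# Assumptions: signed 32-bit wraparound, division rounds toward zero.

def to_int32(value: int) -> int:
    """Wrap an arbitrary Python int to a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def div_trunc(a: int, b: int) -> int:
    """Integer division that discards the fractional part."""
    quotient = abs(a) // abs(b)
    return to_int32(-quotient if (a < 0) != (b < 0) else quotient)


def logical_and(a: int, b: int) -> int:
    """Any non-zero operand counts as true; the result is 0 or 1."""
    return 1 if (a != 0 and b != 0) else 0


def logical_or(a: int, b: int) -> int:
    return 1 if (a != 0 or b != 0) else 0


def logical_not(a: int) -> int:
    return 1 if a == 0 else 0
```

Under these assumptions `div_trunc(7, 2)` gives 3, and `logical_and(5, 2)` gives 1 because both operands are non-zero.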
Zero Page and Ambiguous Values
Where an expression is used as an operand, there are a number of addressing modes that have ambiguous syntax. For example absolute addressing and zero page addressing both take a single numeric expression as an operand. By default, Avenger will attempt to use zero page addressing in ambiguous instructions, so long as the expression fits within a single byte value.
- `LDA $34` - Operand fits within a byte, zero page assumed
- `LDA $2303` - Operand exceeds 8 bits, absolute addressing assumed
If the default behaviour is not wanted, it can be overridden. Prefixing an expression with `<` will force zero page addressing to be used, taking only the lower byte of the expression as the effective address. Alternatively, prefixing an expression with `!` will force absolute addressing, extending the expression value to two bytes as appropriate.
- `LDA !$34` - Force absolute address of $0034
- `LDA <$2303` - Force zero page address of $03 (lowest 8 bits)
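The selection rule and its two overrides can be summarised as a small decision function. This is a sketch under stated assumptions rather than Avenger's implementation: `resolve_operand` is a hypothetical name, "fits within a single byte" is taken to mean 0 to 255, and absolute addresses are masked to 16 bits.

```python
# Sketch of the zero page / absolute choice; not Avenger's code.
# Assumptions: "fits in a byte" means 0..255, addresses are 16 bits wide.

def resolve_operand(value: int, prefix: str = "") -> tuple:
    """Return (addressing mode, effective address) for an operand."""
    if prefix == "<":
        return ("zero page", value & 0xFF)     # keep only the low byte
    if prefix == "!":
        return ("absolute", value & 0xFFFF)    # widen to two bytes
    if prefix:
        raise ValueError("unknown prefix: " + prefix)
    if 0 <= value <= 0xFF:
        return ("zero page", value)            # default when it fits
    return ("absolute", value & 0xFFFF)
```

With this model, `resolve_operand(0x34, "!")` yields `("absolute", 0x0034)` and `resolve_operand(0x2303, "<")` yields `("zero page", 0x03)`, matching the two examples above.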
Byte Values for Immediate Mode
Immediate mode allows single byte values to be loaded into the A, X or Y registers. If the value of an expression does not fit in a byte (it is less than -128, or greater than 255), an error will be raised. For larger values (such as words or address labels), the `>` and `<` prefixes allow the high and low bytes to be selected.
For example:
LDA #<$1234
LDX #>$1234
- Loads the accumulator with $34, the low byte of $1234 and X with $12, the high byte.
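The range check and the byte-selection prefixes can be sketched as follows. The function name `immediate_byte` is hypothetical, and encoding negative values in two's complement (so that -1 becomes $FF) is an assumption; the text only gives the accepted range.

```python
# Sketch of immediate operand handling; not Avenger's code.
# Assumption: negative values are stored in two's complement form.

def immediate_byte(value: int, prefix: str = "") -> int:
    """Return the single byte encoded for an immediate operand."""
    if prefix == "<":
        return value & 0xFF            # low byte of a larger value
    if prefix == ">":
        return (value >> 8) & 0xFF     # high byte of a 16-bit value
    if prefix:
        raise ValueError("unknown prefix: " + prefix)
    if value < -128 or value > 255:
        raise ValueError("immediate value does not fit in a byte")
    return value & 0xFF
```

Here `immediate_byte(0x1234, "<")` gives `0x34` and `immediate_byte(0x1234, ">")` gives `0x12`, while `immediate_byte(0x1234)` raises an error.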
Labels
A label is used to represent a value or address. Labels may start with a full-stop `.`, an upper or lowercase letter, or an underscore. They may then have any sequence of upper or lowercase letters, digits or underscores. As examples `my_label`, `.local_label` and `_Label123` are all valid labels. Labels are case sensitive.
Address labels appear by themselves on a line, or at the start of an instruction line. Unless they are prefixed by a full-stop, they are global and cannot be redefined. Between global address labels, local labels can be used. These are prefixed with a full-stop and only need to be unique within their local scope (i.e. between global labels).
Optionally, address labels may have a colon (`:`) after them where they are defined, for compatibility with other assemblers. The colon is ignored in this case.
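The naming and scoping rules can be illustrated with a short Python sketch. It is not Avenger's symbol table: `LABEL_RE`, `is_valid_label` and `define_labels` are hypothetical names, and storing a local label under the name of the preceding global label is just one assumed way to make locals unique per scope.

```python
import re

# Sketch of label validation and scoping; not Avenger's code.
# First character: full-stop, letter or underscore; then letters,
# digits or underscores. Matching is case sensitive.
LABEL_RE = re.compile(r"[A-Za-z_.][A-Za-z0-9_]*")


def is_valid_label(name: str) -> bool:
    return LABEL_RE.fullmatch(name) is not None


def define_labels(definitions):
    """Take label definitions in source order and return scoped keys.

    A trailing colon is ignored. Local labels (leading full-stop) are
    qualified with the most recent global label (an assumed scheme).
    """
    seen = set()
    scope = ""
    keys = []
    for raw in definitions:
        name = raw[:-1] if raw.endswith(":") else raw
        if not is_valid_label(name):
            raise ValueError("invalid label: " + raw)
        is_local = name.startswith(".")
        key = scope + name if is_local else name
        if key in seen:
            raise ValueError("label already defined: " + raw)
        seen.add(key)
        if not is_local:
            scope = name               # a global label opens a new scope
        keys.append(key)
    return keys
```

In this model `.loop` may be defined once after `start` and again after `next`, but defining it twice under the same global label is rejected.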
Commands
Avenger has a small command set that is growing as the software is developed. Commands are either all uppercase, or all lowercase letters (mixed case commands are not recognised). These are the commands currently handled by the assembler:
ORG - Set Origin
Before code or data instructions can be assembled, Avenger requires that an origin (start address) is specified. The ORG command takes a single (constant) expression that indicates the address of the first byte of the assembled code. Note that an assembled program need not have only one origin - it is possible to have multiple blocks of code within one program that start at different addresses in the computer's memory. In this case, each code block can be started with a different ORG command.
Example:
ORG $202
LDA #0
- start code at address 202 (hexadecimal), the start address of programs in Cerberus.
EQU, = - Equate
As well as being used to represent addresses in a program, labels can be used in place of numeric constants to make it easier to understand what the code does. The `EQU` or `=` command can be used to set a new label to a constant value.
Example:
hi_score EQU 100
LDA #hi_score
- define the label `hi_score` to be equal to 100.
.BYTE, DB - Byte data
Programs often need data. The `.BYTE` or `DB` command allows one or more bytes of data to be specified as numeric expressions. Each byte of data is separated by a comma. Note that byte values are taken from the lower 8 bits of each expression - higher bits are ignored.
Example:
.BYTE $1, 'c', $1234
- Creates the three byte sequence $01, $63, $34
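The "lower 8 bits" rule amounts to masking each value, as this small Python sketch shows. The function name `emit_bytes` is hypothetical and the code is an illustration of the rule, not Avenger's implementation.

```python
# Sketch of .BYTE / DB output; not Avenger's code.

def emit_bytes(values) -> bytes:
    """Keep only the lower 8 bits of each expression value."""
    return bytes(v & 0xFF for v in values)
```

Calling `emit_bytes([0x1, ord("c"), 0x1234])` reproduces the sequence $01, $63, $34 from the example above.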
.WORD, DW - Word data
Just as the `.BYTE` command handles byte data, the `.WORD` or `DW` command allows one or more words (16-bit values) to be specified. Words are stored in memory in little-endian format - that is, the lower byte of the word is stored first, followed by the higher byte. The `.WORD` command will therefore store two bytes for each operand.
Example:
.WORD $1, $1234
- Stores the four byte sequence $01, $00, $34, $12
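Little-endian storage can be sketched in Python as below. The function name `emit_words` is hypothetical, and discarding bits above the low 16 is an assumption, since the text does not say how out-of-range word values are treated.

```python
# Sketch of .WORD / DW output; not Avenger's code.
# Assumption: bits above the low 16 are discarded.

def emit_words(values) -> bytes:
    """Store each value as two bytes, low byte first."""
    out = bytearray()
    for v in values:
        out.append(v & 0xFF)           # low byte comes first
        out.append((v >> 8) & 0xFF)    # then the high byte
    return bytes(out)
```

Calling `emit_words([0x1, 0x1234])` reproduces the sequence $01, $00, $34, $12 from the example above.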