Avenger - Assembly Language Syntax v0.1
File Structure
An assembly file is a plain text file with lines that:
- Are blank, or
- Have a comment, starting with a semicolon, for example
; This is a comment
- Contain a single machine instruction, for example
LDA #$23, or
- Contain a single assembler command, for example ORG.
Lines may optionally be prefixed with a label, or end with a comment (starting with a semicolon).
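As an illustration of the line types listed above, a short file might look like the following. This is only a sketch: the label name start, the addresses and the values are made up for the example, and ORG stands in for "an assembler command".

```asm
; A comment on a line of its own

ORG $0202               ; an assembler command, with a trailing comment
start LDA #$23          ; a label, a machine instruction and a comment
STA $12                 ; a machine instruction
```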
Instruction Syntax
6502 instructions are all three letters long, and can be written either all lowercase or all uppercase (for example LDA or lda). They may optionally be followed by an address or operand. The 6502 has twelve addressing modes:
INY - Implied, no operand supplied
LDA #45 - Immediate, with a single byte operand
LDA $1234 - Absolute, using a two byte address
LDA $1234,X - Absolute, X indexed
LDA $1234,Y - Absolute, Y indexed
LDA $12 - Zero page, using a single byte address
LDA $12,X - Zero page, X indexed
LDA $12,Y - Zero page, Y indexed
LDA ($1234) - Indirect
LDA ($12,X) - X indexed, indirect
LDA ($12),Y - Indirect, Y indexed
BRA $12 - Relative (branches)
Numbers and Expressions
By default, numbers in Avenger are decimal. Binary numbers are prefixed with either
Hexadecimal numbers are prefixed with either $ or 0x. The ASCII value of a single character can be referenced by enclosing it in single quotes, for instance 'a' has a value of 97 (decimal). Special characters are escaped with a backslash (\) character, so '\n' has a value of 10 (decimal), and '\'' and '\\' have the values of 39 and 92 respectively.
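To make the notations concrete, the first four lines below should each load the same value, 97, into the accumulator, and the last loads 10. This is a sketch that uses only the forms described above; the $ hexadecimal prefix is taken from the examples used throughout this document.

```asm
LDA #97         ; decimal
LDA #0x61       ; hexadecimal, 0x prefix
LDA #$61        ; hexadecimal, $ prefix
LDA #'a'        ; ASCII value of the character a
LDA #'\n'       ; escaped character, value 10
```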
Expressions are evaluated with normal precedence rules. Brackets can be used to override these. Expressions are calculated as 32-bit integers. Note that division is therefore an integer operation with the result being truncated.
| Precedence | Operator | Description |
|---|---|---|
| 3 | | Equals, not equals |
| 5 | | Bit shift left, right |
| 10 | | Raise to power |
Logical operators return a value of 0 (false) or 1 (true). The expression parser interprets any non-zero value as true, and zero as false. In addition, the constants true and false can be used.
Zero Page and Ambiguous Values
Where an expression is used as an operand, several addressing modes share the same syntax. For example, absolute addressing and zero page addressing both take a single numeric expression as an operand. By default, Avenger will use zero page addressing in these ambiguous cases, so long as the value of the expression fits within a single byte.
LDA $34 - Operand fits within a byte, zero page assumed
LDA $2303 - Operand exceeds 8 bits, absolute addressing assumed
If the default behaviour is not wanted, it can be overridden. Prefixing an expression with < will force zero page addressing to be used, taking only the lower byte of the expression as the effective address. Alternatively, prefixing an expression with ! will force absolute addressing, extending the expression value to two bytes as appropriate.
LDA !$34 - Force absolute address of $0034
LDA <$2303 - Force zero page address of $03 (lowest 8 bits)
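The effect of the default rule and the two prefixes is easiest to see in the bytes each line should produce. The comments below give the standard 6502 encodings for the chosen addressing mode (opcode A5 for zero page LDA, AD for absolute LDA, with the address stored low byte first); they are expected encodings, not output captured from Avenger.

```asm
LDA $34         ; zero page by default:   A5 34
LDA !$34        ; forced absolute:        AD 34 00
LDA $2303       ; absolute by default:    AD 03 23
LDA <$2303      ; forced zero page:       A5 03
```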
Byte Values for Immediate Mode
Immediate mode allows single byte values to be loaded into the A, X or Y registers. If the value of an expression does not fit in a byte (it is less than -128, or greater than 255), an error will be raised. For larger values (such as words or address labels), the > and < prefixes allow the high and low bytes to be selected.
- Loads the accumulator with $34, the low byte of $1234 and X with $12, the high byte.
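The description above corresponds to a pair of instructions along the following lines. This is a sketch that assumes > is the high byte prefix, with < selecting the low byte. The label buffer and the zero page locations $20 and $21 are hypothetical, chosen only to show the same prefixes applied to a label.

```asm
ORG $0202

buffer EQU $1234    ; hypothetical label, defined only for this example

LDA #<$1234         ; A = $34, the low byte of $1234
LDX #>$1234         ; X = $12, the high byte of $1234

LDA #<buffer        ; the same low byte, taken from the label
STA $20             ; build a two byte pointer in zero page, low byte first
LDA #>buffer        ; the high byte, $12
STA $21
```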
Labels
A label is used to represent a value or address. Labels may start with a full-stop (.), an upper or lowercase letter, or an underscore. They may then have any sequence of upper or lowercase letters, digits or underscores. For example, _Label123 is a valid label. Labels are case sensitive.
Address labels appear by themselves on a line, or at the start of an instruction line. Unless they are prefixed by a full-stop, they are global and cannot be redefined. Between global address labels, local labels can be used. These are prefixed with a full-stop and only need to be unique within their local scope (i.e. between global labels).
Optionally, address labels may have a colon (:) after them where they are defined, for compatibility with other assemblers. The colon is ignored in this case.
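The scoping rules can be sketched with two short routines that each use a local label named .loop. The routine names and the addresses $0300 and $0400 are invented for the example, and it assumes that a branch instruction accepts a label as its target.

```asm
ORG $0202

clear_page:         ; global label (the trailing colon is optional)
LDX #0
LDA #0
.loop               ; local label, scoped to clear_page
STA $0300,X
INX
BNE .loop           ; repeat until X wraps round to zero (256 bytes)
RTS

copy_page           ; a new global label starts a new local scope
LDX #0
.loop               ; .loop can be reused here without clashing
LDA $0300,X
STA $0400,X
INX
BNE .loop
RTS
```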
Commands
Avenger has a small command set that is growing as the software is developed. Commands are written either all uppercase or all lowercase (mixed case commands are not recognised). These are the commands currently handled by the assembler:
ORG - Set Origin
Before code or data instructions can be assembled, Avenger requires that an origin (start address) is specified. The ORG command takes a single (constant) expression that indicates the address of the first byte of the assembled code. Note that an assembled program need not have only one origin - it is possible to have multiple blocks of code within one program that start at different addresses in the computer's memory. In this case, each code block can be started with a different ORG command.
- start code at address 202 (hexadecimal), the start address of programs in Cerberus.
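As a sketch of a program with more than one origin, the following places a short piece of code at the Cerberus start address and a small data block at a second address. The address $0300 and the data values are arbitrary choices for illustration.

```asm
ORG $0202       ; first block: code at the Cerberus program start address
LDA $0300       ; read the first byte of the data block below
RTS

ORG $0300       ; second block: data assembled at a different address
DB 1, 2, 3
```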
EQU, = - Equate
As well as being used to represent addresses in a program, labels can be used in place of numeric constants to make it easier to understand what the code does. The EQU or = command can be used to set a new label to a constant value.
- Define the label hi_score to be equal to 100.
hi_score EQU 100
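Once defined, such a label can stand in for the constant wherever an expression is allowed. The sketch below reuses the hi_score example and adds a hypothetical max_lives label. It assumes the = form is laid out the same way as EQU, with the label on the left and the value on the right.

```asm
ORG $0202

hi_score EQU 100    ; the EQU form
max_lives = 3       ; the = form (layout assumed to mirror EQU)

LDA #hi_score       ; same as LDA #100
LDX #max_lives      ; same as LDX #3
```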
.BYTE, DB - Byte data
Programs often need data. The .BYTE or DB command allows one or more bytes of data to be specified as numeric expressions. Each byte of data is separated by a comma. Note that byte values are taken from the lower 8 bits of each expression; higher bits are ignored.
- Creates the three byte sequence $01, $63, $34
.BYTE $1, 'c', $1234
.WORD, DW - Word data
Just as the .BYTE command handles byte data, the .WORD or DW command allows one or more words (16 bit values) to be specified. Words are stored in memory in little endian format, that is, the lower byte of the word is stored first, followed by the higher byte. The .WORD command will therefore store two bytes for each operand.
- Stores the four byte sequence $01, $00, $34, $12
.WORD $1, $1234
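The two data commands can be combined with labels to build simple tables. In this sketch the labels values and table and the address $0300 are hypothetical; the byte listings in the comments follow from the truncation and little endian rules described above rather than from captured assembler output.

```asm
ORG $0300

values: DB 10, 'A', $1FF    ; bytes 0A 41 FF (only the low 8 bits of $1FF are kept)
table: DW values, $1234     ; bytes 00 03 34 12 (values is at address $0300)
```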