The “direct” addressing quest!

For several reasons (mainly optimization) Peer* and I wanted to include the ability to use direct addressing in “C”.

Background

The 6809 microprocessor can address memory locations in two ways:
(actually there are many more – but these are not of interest right now)

Extended

This is more or less the default addressing mode. Since the address space of the microprocessor is 16 bit, addresses range from 0 – 65535.
In assembler, loading the value of a byte stored at address $d000 is written as:

lda $d000 – the machine code for that is “B6 D0 00”, and the instruction takes 5 cycles to complete.

This is pretty straightforward.

Direct

The 6809 also has an addressing mode called “direct”. This is similar to the “zero page” concept of other processors – only that the “zero page” can sit at any 256 byte page boundary. The “page” that is to be used with direct addressing is set in a special register, the “dp” (direct page) register.

The address for a direct page access is calculated like: (256*dp) + (address%256).
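The formula above can be written as a small “C” helper. This is only a sketch for checking values on a desktop machine; the function name direct_effective_address is made up for this illustration and is not part of Vide or gcc.

```c
#include <stdint.h>

/* Effective address of a direct-mode access:
   DP supplies the high byte, the low byte of the address supplies the rest.
   (Hypothetical helper, only meant to illustrate the formula.) */
static uint16_t direct_effective_address(uint8_t dp, uint16_t address)
{
    return (uint16_t)((256u * dp) + (address % 256u));
}
```

With dp = $d0 and an address whose low byte is $05 this yields $d005.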

Using the above given example you can set the dp register to the “d0” page. The instruction

lda $d000 will then result in machine code like: “96 00”.

(If you program in assembler – the assembler can be told with a pseudo opcode that the direct page is set to a certain value, and it will do the translation for you. In this case the assembler knows dp = d0, applies the above formula and uses the resulting value with the machine code representation of a “direct” lda.)

The resulting instruction has some advantages:

1) it is one byte shorter

2) it is one cycle faster
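To make the size difference concrete, here is a rough sketch of the decision an assembler makes once it has been told the dp value. The function encode_lda and its parameter names are assumptions for this illustration and do not mirror any particular assembler; the opcodes used are those of lda ($96 direct, $B6 extended).

```c
#include <stddef.h>
#include <stdint.h>

enum { LDA_DIRECT = 0x96, LDA_EXTENDED = 0xB6 };

/* Writes the machine code for "lda address" into out and returns the
   number of bytes written. assumed_dp is the value the assembler has
   been told the DP register contains. (Hypothetical helper.) */
static size_t encode_lda(uint8_t assumed_dp, uint16_t address, uint8_t out[3])
{
    uint8_t high = (uint8_t)(address >> 8);
    uint8_t low  = (uint8_t)(address & 0xFF);

    if (high == assumed_dp) {
        /* Address lies in the direct page: opcode plus low byte. */
        out[0] = LDA_DIRECT;
        out[1] = low;
        return 2;
    }
    /* Otherwise fall back to the full 16 bit address. */
    out[0] = LDA_EXTENDED;
    out[1] = high;
    out[2] = low;
    return 3;
}
```

With assumed_dp = $d0 the load from $d000 needs two bytes, with any other dp value it needs three.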

Attention!

If you use direct addressing – ALWAYS make sure the correct value is set in the DP register, otherwise all kinds of awful stuff can happen! (This is also true for any “C” solution!)

Why is that interesting in “C”?

Because in “C” you also want to optimize your programs. And using assembler in the midst of your program is not always a “clean” solution.

 

First step

A tiny little first step: the realization that it is not possible to use direct addressing with the current “C” setup represented in Vide (or the original “C” setup that Peer did).

Second – Peeping

For several reasons I implemented a “search & replace” mechanism in Vide (kindly called Peephole), with which one can change the assembler code generated by gcc. With that it was (and still is – theoretically) possible to define patterns and rules that examine the generated assembler source and replace “extended” code with “direct” code.

This does work, but it always felt “hacky”, and the user can only enable or disable the complete mechanism and thus does not have much control over it.

Third – Commenting

The third try is somewhat related to the second try. It basically boils down to attaching a special comment to a “C” line. The comment is “transported” to the assembler sources, and the sources can again be examined and searched/replaced.

The only trustworthy mechanism to enter comments in “C” that find their way into the generated assembler sources is the “asm” statement. So after some macro magic, a resulting “C” line might look like:

VIA_t1_cnt_hi = 0; asm("; use direct page");

Thus the exact line could easily be identified and only that line could be changed to “direct” addressing.

BUT

The asm statement is on the one hand very powerful, on the other hand it can be awkward. GCC has no inherent ability to know what kind of code an asm statement contains.
For that reason gcc always assumes the worst about it. Assuming the worst in this case means (at one stage) that it assumes each asm statement has a length of 4 bytes (since this is taken as the largest number of bytes a 6809 instruction can have).
Also it seems the “worst” is multiplied by the number of “;” (or “\n”) that occur in the asm statement.

Why is that “interesting”?
Because we try to OPTIMIZE the generated code!

Our first assumption was that this would not change the generated sources (other than inserting a comment). With the above knowledge we now know we were wrong.

The thing is that if there are a couple of these statements in the sources, gcc counts every single one of them as at least 4 bytes long. And the sum of the statements can result in short branches being converted to long branches (which use more cycles and are one to two bytes longer) – and thus can result in a pessimization instead of an optimization.
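The arithmetic behind this can be sketched as follows. This is a simplified model, not gcc's actual logic: it only looks at forward branches, it takes the 4 bytes per asm statement from the description above, and the function names are invented for the illustration. The one hard fact used is that a 6809 short branch carries a signed 8 bit offset, so it reaches at most 127 bytes forward.

```c
#include <stdbool.h>

enum {
    SHORT_BRANCH_MAX_FORWARD = 127, /* signed 8 bit offset */
    ASSUMED_BYTES_PER_ASM    = 4    /* assumption taken from the text above */
};

/* Distance the compiler is assumed to estimate for a forward branch over
   real_code_bytes of code plus asm_statements comment-only asm statements.
   (Hypothetical model.) */
static int estimated_distance(int real_code_bytes, int asm_statements)
{
    return real_code_bytes + asm_statements * ASSUMED_BYTES_PER_ASM;
}

/* True if a forward branch over distance_bytes still fits a short branch. */
static bool fits_short_branch(int distance_bytes)
{
    return distance_bytes <= SHORT_BRANCH_MAX_FORWARD;
}
```

A branch over 120 bytes of real code fits a short branch. Add three comment-only asm statements and the estimate in this model becomes 132 bytes, so a long branch is chosen although the real distance has not changed.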

So – while this, like the second try, was kind of working – it also had some disturbing quirks.

Final try – Success

The final try is all Peer's doing – I often tend to take a solution that is “good enough” – Peer not so…

This solution is very clean for the end user – the implementation is… somewhat “involved”. I’ll try to explain it as simply as I can.

We discovered early on – that gcc does not really use the direct page at all. The only “hint” for using it was found when looking at “soft registers”. After further examination of the subject Peer found out, that gcc has a default “direct” area located at page 0 (as with other similar old processors a “zero page”).

Using that knowledge and examining the abilities of the linker we use…

Peer defined every known Vectrex RAM/ROM address that might at some stage be “direct page” relevant twice.

Example:

The Vectrex VIA chip is accessed using memory locations; the relevant memory locations go from $d000 – $d00f. The “variable” VIA_t1_cnt_hi, for example, is located at address $d005.

Extended definition

The “extended” (normal) version of that location is now (in “C”) defined as:

volatile int VIA_t1_cnt_hi __attribute__((section(".dpd0"), used)); // 0xD005, VIA timer 1 count register hi

(Note: The above notation was introduced in Vide 2.0 RC 15 – before that, all Vectrex variables were handled quite differently.
The above source (along with the other variables) is precompiled and linked to your “C” program. All Vectrex variables (IO, RAM, ROM) by now have their own area and are made known to your own program at link time. Since these are just address definitions and no actual sources are included, these areas can be “overlaid” (by the linker) without actually producing a larger binary.)

In your code you can use VIA_t1_cnt_hi just as you would use any other variable (read/write etc).

 

Direct definition

The “direct” version of that location is now (in “C”) defined as:

volatile int dp_VIA_t1_cnt_hi __attribute__((section("direct"), used)); // 0xD005, VIA timer 1 count register hi

The mechanism to “notify” the compiler of using dp lies in the different variable name, all direct page access variables have the prefix “dp_”.
In your code you can use dp_VIA_t1_cnt_hi just as you would use any other variable (read/write etc).

The difference between the two versions lies in the attribute and the “section” defined there. The dp version lies in section “direct”, which starts at address 0 (zero); the “extended” version lies in section “.dpd0”, which starts at address $d000.

 

GCC “automatically” uses direct addressing for all “things” that lie in the “direct” area – so the generated code for “dp_” variables always uses “direct” addressing. If the user (programmer) DID make sure that the dp register points to the right page, then the “zero page” magically moves to the right place and you can access address $d005 by actually accessing direct address $0005.
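The effect can be tried out with a small host-side model. This is not Vectrex code and not what gcc generates; it is a sketch that imitates a 64 KB memory and a dp register on a desktop machine. The type Machine, the two store functions and the ADDR_ constants are assumptions made up for this illustration; only the two link addresses ($d005 and $0005) come from the description above.

```c
#include <stdint.h>

/* Host-side model: 64 KB of memory plus a DP register. */
typedef struct {
    uint8_t memory[65536];
    uint8_t dp;
} Machine;

/* Link-time addresses of the two definitions (illustrative names). */
enum {
    ADDR_VIA_t1_cnt_hi    = 0xD005, /* section ".dpd0", starts at $d000 */
    ADDR_dp_VIA_t1_cnt_hi = 0x0005  /* section "direct", starts at 0 */
};

/* Extended store: the full 16 bit address is used as is. */
static void store_extended(Machine *m, uint16_t address, uint8_t value)
{
    m->memory[address] = value;
}

/* Direct store: only the low byte of the link address is encoded,
   the high byte comes from the DP register at run time. */
static void store_direct(Machine *m, uint16_t link_address, uint8_t value)
{
    uint16_t effective = (uint16_t)((m->dp << 8) | (link_address & 0xFF));
    m->memory[effective] = value;
}
```

With dp set to $d0, a direct store through the dp_ address lands in the same cell as an extended store to $d005. With dp set to anything else (for example $c8) the very same store lands at $c805, which is the kind of “awful stuff” the warning further up is about.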

Peer did a lot of work and implemented both direct and extended “variable” definitions, in their respective areas, for all RAM/IO/ROM locations needed.
All dp_ variable definitions of all addresses use the direct page (zero). All the different “zero page” definitions are again “overlaid” by the linker. So the user (programmer) never has to worry about these things, apart from setting the dp register correctly.

Excellent work Peer!

 

* Prof. Dr. Peer Johannsen of Pforzheim University, Germany.