Retro Programming: Optimising Assembly Like an 80's Hacker

Sunday, 21 September 2008

Forget about fancy algorithms and data structures. If you want respect as an 80's hacker, follow these simple tips.

Never get caught setting a register to zero without using xor:

Z80 Code

ld a,0           ; bad, 2 bytes / 7 cycles

xor a            ; good, 1 byte / 4 cycles

8088 Code

mov ax,0         ; bad, 3 bytes / 4 cycles

xor ax,ax        ; good, 2 bytes / 3 cycles
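
Why the trick works: any value XORed with itself is zero, whatever the register held before. One caveat is that xor also rewrites the flags, while a plain load leaves them alone. The sketch below is a rough Python model of an 8-bit accumulator, not real emulator code; the helper names xor_a and ld_a_0 are made up for illustration, and only the zero and carry flags are modelled.

```python
def xor_a(a):
    """Model Z80 `xor a`: returns (new A, zero flag, carry flag)."""
    result = (a ^ a) & 0xFF
    # xor always clears carry; zero flag reflects the result
    return result, result == 0, False


def ld_a_0(a, zero_flag, carry_flag):
    """Model Z80 `ld a,0`: A becomes 0, flags pass through untouched."""
    return 0, zero_flag, carry_flag


# Check every possible 8-bit starting value of A.
all_zero = all(xor_a(a) == (0, True, False) for a in range(256))
print(all_zero)

# The load version keeps whatever flags were already set.
print(ld_a_0(5, False, True))
```

So the swap is only safe when the code that follows does not rely on the old flag values.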

Never set two 8-bit registers independently. Code readability is not required:

Z80 Code

ld b,10          ; bad, 4 bytes / 14 cycles
ld c,32

ld bc,10*256+32  ; good, 3 bytes / 10 cycles

8088 Code

mov ch,10        ; bad, 4 bytes / 8 cycles
mov cl,32

mov cx,10*256+32 ; good, 3 bytes / 4 cycles
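
The constant 10*256+32 simply places 10 in the high byte (B or CH) and 32 in the low byte (C or CL). Here is a small Python sketch of that arithmetic, assuming both values fit in 8 bits; pack and unpack are hypothetical helper names, not anything an assembler provides.

```python
def pack(high, low):
    """Combine two 8-bit values into one 16-bit word (high*256 + low)."""
    if not (0 <= high <= 0xFF and 0 <= low <= 0xFF):
        raise ValueError("both halves must fit in 8 bits")
    return (high << 8) | low


def unpack(word):
    """Split a 16-bit word back into its (high, low) bytes."""
    return (word >> 8) & 0xFF, word & 0xFF


bc = pack(10, 32)
print(bc == 10 * 256 + 32)
print(unpack(bc))
```

Most assemblers evaluate the expression at assembly time, so the packed form costs nothing at run time.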

Never compare to zero:

Z80 Code

cp 0             ; bad, 2 bytes / 7 cycles

or a             ; good, 1 byte / 4 cycles

8088 Code

cmp ax,0         ; bad, 3 bytes / 4 cycles

test ax,ax       ; good, 2 bytes / 3 cycles
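
Both forms leave the register unchanged and set the zero flag only when it holds zero, which is all a following conditional jump usually needs. The Python sketch below models just the Z, S and C flags of the Z80 versions, under the assumption that those are the flags the following code tests. The remaining flags (such as N and P/V) are not modelled and do differ between the two instructions. The function names are made up for illustration.

```python
def cp(a, n):
    """Model Z80 `cp n`: flags come from A - n, A itself is unchanged."""
    result = (a - n) & 0xFF
    return {"Z": result == 0, "S": bool(result & 0x80), "C": a < n}


def or_a(a):
    """Model Z80 `or a`: A | A equals A, carry is always cleared."""
    result = (a | a) & 0xFF
    return {"Z": result == 0, "S": bool(result & 0x80), "C": False}


# Compare the modelled flags for every possible 8-bit value of A.
same = all(cp(a, 0) == or_a(a) for a in range(256))
print(same)
```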

Remember, you don't need to worry about code alignment, order of instructions or processor penalties. Follow these simple tips and your super-optimised bubble sort will demand the utmost respect!

34 comments:

Anonymous 26 October 2008 at 02:30
Great Post!

I was sorting my office out the other day and came across my first "useful" program which was written completely in assembler! I'll be putting it on my blog in the near future....! :)

-John.

Anonymous 6 February 2009 at 09:51
I try to tell the "kids" of today about saving cycles and memory constraints; they just don't get it.

Anonymous 13 February 2009 at 03:14
But don't forget: premature optimization is the root of all evil.

Anonymous 13 February 2009 at 03:51
So who said what to anger you into writing this post?

Harold Fowler 13 February 2009 at 04:16
Wow, hacking and phreaking in the 80s was so much fun. Now everything is a federal offense so why bother!

RT
www.anon-tools.us.tc

Anonymous 13 February 2009 at 04:19
Most time such low-level optimization is exaggerated. Rather spend more time on software design.

Anonymous 13 February 2009 at 05:26
To the guys saying "Most time such low-level optimization is exaggerated" or stuff like that, remember that the compiler _is_ using xor instead of mov and stuff like that on platforms where it matters.

Unknown 13 February 2009 at 05:28
Optimizing Z80 and 8088 assembler was how I spent a great deal of my time in the 1980s. Still very useful when you need to program single-chip systems and microcontrollers.

Rich Moore 13 February 2009 at 05:31
Another one is never have a JSR followed by an RTS. Use a JMP instead, saving you lots of cycles on a 6502 (and I'd guess the same with the equivalent 8086 instruction).

Unknown 13 February 2009 at 05:32
This kind of optimization doesn't require additional efforts, you just use these operations instead of the others. That's it!
Try to disassemble any C/Delphi etc. program, you won't EVER see "mov ax,0"!
You'll just get used to reading "xor ax,ax" as "ax=0".

Anonymous 13 February 2009 at 05:34
There are some who belittle this type of optimization, but they are often the same ones who produce huge bloated code that wipes out 1MB L2 caches and slows 2GHz machines to a crawl.

For a project similar in essence to your ideal, google "fbui".

Anonymous 13 February 2009 at 06:04
@most commentators:

Didn't you read the last paragraph... or didn't you get the irony in it?

Unknown 13 February 2009 at 06:11
Well I think most of us got it.
Just, I was replying to the comments, not to the article itself ;)

Anonymous 13 February 2009 at 06:34
What WAS the point of the last paragraph? Making fun of programmers in the eighties for not optimizing for processor features they didn't have? Making fun of programmers now for writing assembly code like it was still the eighties? Neither makes one lick of sense.

Anonymous 13 February 2009 at 06:42
Anonymous (is that your real name?), I think he was poking fun at the ridiculousness of the article itself. I.e. the article contains some cool tricks, but the gains tend to pale into insignificance beside the larger bottlenecks one tends to find.

Anonymous 13 February 2009 at 06:56
Don't we have optimizing compilers for these kinds of things nowadays?

Anonymous 13 February 2009 at 08:19
Optimizing Assembly is so passe...

Anonymous 13 February 2009 at 08:23
I remember having to do that!

Now I can usually trust any decent 'C' compiler to do most of the job and the low cost of high performance silicon to do the rest.

I recently had to code an 8051 emulator, cross platform, but optimised for MIPS. I was surprised at how much performance I gained (~50%) after examining the object code and implementing a few simple tweaks (minimal inline asm, mainly type-forcing).

It's fast enough now, but I keep looking at the code, knowing that I could get another 200% by hand coding in asm.

Also - Some may accuse me of heresy, but MSC (7 or 8) produces *much* better 80x86 code than GCC.

Anonymous 13 February 2009 at 08:33
The hell kind of 80s hacker doesn't code on a 6502?

Anonymous 13 February 2009 at 08:41
6502 -

Apple II - too trendy, too expensive

Commodore 64 - too WalMart, too limited

Plus the Z80 was a much better processor.

Anonymous 13 February 2009 at 09:09
8051 FTW!!!

Anonymous 13 February 2009 at 10:52
8031/8051 and 8048, the worst instruction sets of all time!

Anonymous 13 February 2009 at 10:54
Another 6502 tip - use "zero page" for frequently accessed variables. One fewer clock cycle for the lookup. (50% savings! w00t!)

LDA $1001 ;2 cycle fetch
LDA $01 ;1 cycle fetch

--JT

Anonymous 13 February 2009 at 10:59
well, then 8052 FTW!!!

Anonymous 13 February 2009 at 11:16
Shall we play a game?

Anonymous 13 February 2009 at 11:27
only an 8 bit game then

Anonymous 13 February 2009 at 11:35
If you're a bit interested in compiler's source optimization: http://www.fefe.de/know-your-compiler.pdf
Gist: fast code = important; readable code = more important

Anonymous 13 February 2009 at 11:38
6809 rules. Has a direct page instead of a zero page.

Henning 13 February 2009 at 13:28
Another nice trick was to use shl/shr to multiply/divide by a power of 2.

Ian Duncan 13 February 2009 at 16:39
Sad thing is that I know most of these tricks but I don't have much luck keeping the stack aligned. Anyone have some advice for me?

Anonymous 19 February 2009 at 19:34
I guess a good question would be: why program in assembler? When I first started programming, computers were 16x16 and $10,000 (Model 80), so it can't be for speed (64x since then with a good C compiler running only 2x behind). I think it's for the thrill of direct control (down to the engine room).

Mikkel Alan Stokkebye Christiansen 3 December 2009 at 08:54
LD bc,nn is 10 cycles if I'm not mistaken.
Your timing for the 8088 counts only the cycles used in the core of the CPU, not the actual cycles the instruction takes.
Bytes prefetched      0      1      2      3      4      5      6
MOV AX,0              16:1   12:1   8:1    4:1    4:2    4:3    4:4
XOR AX,AX             11:¾   7:¾    3:¾    3:1¾   3:2¾   3:3¾   3:4¾
MOV CH,10!MOV CL,32   20:1   16:1   12:1   8:1    4:1    4:2    4:3
MOV CX,10*256+32      16:1   12:1   8:1    4:1    4:2    4:3    4:4
CMP AX,0              16:1   12:1   8:1    4:1    4:2    4:3    4:4
TEST AX,AX            11:¾   7:¾    3:¾    3:1¾   3:2¾   3:3¾   3:4¾
The number before the : is the cycle count; the number after it is the number of bytes left in the prefetch queue. ¾ means that it will take 1 more cycle to get the next byte read.

Anonymous 22 April 2011 at 07:07
6809 was the best designed 8-bit processor ever made!

When writing ROM code for the Tandy CoCo, a great optimization was:
PSHS A,B
... rest of subroutine ...
PULS A,B,PC
register restore + RTS in 1 instruction!

Anonymous 18 April 2012 at 12:44
Also, the 6809 is a great Forth processor as it has two stack pointers.