gnupic: Thread: Re: [gnupic] Linker Optimizations

Subject: Re: [gnupic] Linker Optimizations
From: "Scott Dattalo" ####@####.####
Date: 2 Mar 2007 21:14:40 +0000
Message-Id: <61451.71.139.37.215.1172870052.squirrel@ruckus.brouhaha.com>

On Fri, 2007-03-02 at 12:47 -0600, David Barnett wrote:
>  I've always been frustrated about some big optimizations that can only
>  be made manually; the linker doesn't seem to have enough information
>  to optimize them.  HI-TECH will avoid these issues by using
>  "omnicode", meaning all source files are compiled at the same time,
>  and SourceBoost seems to skirt the problem with a custom object file
>  format.  The two that stand out to me are:
>  a) code-flow-related optimizations on page and bank selections
>  b) removal of unused functions and global variables
>

<snip>

David,

I agree with you fully (for whatever that's worth). In my opinion, the
linker should be capable of Bank selection, Simulator assertions for
banks, tail optimization, dead code elimination, peephole optimization,
etc. and much more.

I suggested at one point that the pCode optimizer I started in SDCC should
be ported into gputils. In it's first implementation, it did much of the
stuff we'd want. The pCode optimizer was written before the linker was
used too much. So consequently, it dealt with single C-files at a time.
When linker support became more prevalent, the pCode optimizer had to be
changed quite a bit.

The only issue I have is time...

Scott

Subject: Re: [gnupic] Linker Optimizations
From: "David Barnett" ####@####.####
Date: 2 Mar 2007 21:51:05 +0000
Message-Id: <0b0001c75d14$8667ff00$2001a8c0@barnett2>

----- Original Message ----- 
From: "Scott Dattalo" ####@####.####
To: ####@####.####
Sent: Friday, March 02, 2007 3:14 PM
Subject: Re: [gnupic] Linker Optimizations


> On Fri, 2007-03-02 at 12:47 -0600, David Barnett wrote:
>>  I've always been frustrated about some big optimizations that can only
>>  be made manually; the linker doesn't seem to have enough information
>>  to optimize them.  HI-TECH will avoid these issues by using
>>  "omnicode", meaning all source files are compiled at the same time,
>>  and SourceBoost seems to skirt the problem with a custom object file
>>  format.  The two that stand out to me are:
>>  a) code-flow-related optimizations on page and bank selections
>>  b) removal of unused functions and global variables
>>
>
> <snip>
>
> David,
>
> I agree with you fully (for whatever that's worth). In my opinion, the
> linker should be capable of Bank selection, Simulator assertions for
> banks, tail optimization, dead code elimination, peephole optimization,
> etc. and much more.
Well, I'm glad to hear you agree.  I figured this wouldn't be a priority for 
anyone else.

> I suggested at one point that the pCode optimizer I started in SDCC should
> be ported into gputils. In it's first implementation, it did much of the
> stuff we'd want. The pCode optimizer was written before the linker was
> used too much. So consequently, it dealt with single C-files at a time.
> When linker support became more prevalent, the pCode optimizer had to be
> changed quite a bit.
This sounds very interesting and ambitious.  I'm not quite sure what 
constitutes pCode, though.  For SDCC, that's what gets passed from the front 
end to the back end, right?  I ask because I wasn't aware you could extract 
pCode from pure assembly code.  As I mentioned in the last email, there's 
some flow information that you could extract from C, but I don't think from 
asm, that would be necessary for a lot of the optimizations.

I'd love to see any information you have about that suggestion to port the 
pCode optimizer.  It looks like gputils may be the best place for me to get 
started helping out.  My biggest ambition is to make some radical 
improvements to SDCC, eventually.

> The only issue I have is time...
And nobody blames you there.  You've put a ton of time into gnupic projects 
(Thanks!).  What you need is more of other people's time =).

David

Subject: Re: [gnupic] Linker Optimizations
From: Scott Dattalo ####@####.####
Date: 5 Mar 2007 17:46:23 +0000
Message-Id: <45EC5728.2070100@dattalo.com>

David Barnett wrote:
>   I'm not quite sure what constitutes pCode, though. 

pCode is a term I came up with. C-compilers (and SDCC included) have an 
intermediate code generation stage where the code there is called 
'iCode'. Prior to the PIC backend, the other SDCC ports went straight 
from iCode to assembly mnemonics. Once the assembly mnemonics have been 
generated, then the only optimization available was a rather limited 
peep hole optimizer.

When I added the first PIC port to SDCC, I saw that much further 
PIC-specific optimizations were possible. One approach I could have 
taken was to write a subsequent intermediate code optimizer. There was 
some precedence for this already with port- specific memory and register 
allocation generation stages. However, I saw further potential with an 
architecture specific optimizer.

In a nutshell, the pCode is a one-to-one mapping of an object to a PIC 
instruction. On the PIC backend, SDCC converts the iCode directly into 
pCode. If no further optimizations are applied, then the pCode can be 
converted in text and would be similar to the other ports. However, with 
the pCode representation it is possible (or at least easier) to apply 
optimizations. For example, I rewrote the peephole optimizer for the PIC 
backend to work with wild cards. This made it possible to write 
categories of rules  as opposed to whole bunch of specific, special case 
rules. But in general - each pass through one of the optimization stages 
will convert pCode into potentially better optimized pCode. And since 
the pCode has a one-to-one mapping onto PIC instructions, the pCode 
optimizer is working much like a PIC firmware engineer.

A few other features of the pCode are tail optimizations, live range 
analysis,  and call tree analysis. These features don't make much sense 
living in the compiler. It's best to apply these optimizations in the 
linker. Take the call tree analysis. If the linker performs a call tree 
analysis and determine that certain functions are never called - then no 
code needs to be generated for them. Or if it determines that a certain 
function is only called once, then that function can be inlined. What's 
really cool is that after inlining you can run the code through the 
pCode optimizer again. This subsequent pass may end up removing things 
like parameter passing.

You may wish to compare SDCC's approach with gcc. gcc has an 
intermediate code generation that's represented in a language they call 
RTL - Register Transfer Language. RTL is a Lisp like language. gcc is a 
multi-pass compiler. Each pass applies some specific optimization to the 
RTL. There are ways of writing RTL for specific passes. But the general 
idea is to apply some kind of optimization strategy to the RTL and  run 
through the RTL objects. The RTL objects either do nothing or convert 
existing RTL objects into new ones. SDCC's iCode is somewhat like RTL. 
However, gcc keeps the representation in RTL as long as possible. There 
is not necessarily a one-to-one mapping between an RTL object and a 
single assembly instruction.

If we were to move pCode into gplink, then this is probably the high 
level approach that should be taken.

- define a set of assembler directives that can control the linker.
For example, we'd need to tell the linker things like 'the follow block 
is really data table and not instructions' or 'the following code has 
been hand crafted to run at a very specific rate and should not be 
optimized' or etc...

- Introduce an pCode as a do-nothing intermediate stage.
It'll take time to modify gplink's infrastructure to accommodate 
something like pCode. I suspect there are several built in assumptions 
that need to dealt with. The 'do nothing' pCode provides a type of 
invariance stage where you can resolve these assumptions before 
introducing any of the pCode features.

- Introduce peep hole optimization
Once the code inside the linker is in a pCode format, then the simplest 
thing to try is the peephole optimizer.

It's going to take several man-months to accomplish these high level tasks.

Scott

Subject: Re: [gnupic] Linker Optimizations
From: "David Barnett" ####@####.####
Date: 5 Mar 2007 18:26:34 +0000
Message-Id: <0bde01c75f53$64887fe0$2001a8c0@barnett2>

----- Original Message ----- 
From: "Scott Dattalo" ####@####.####
To: ####@####.####
Sent: Monday, March 05, 2007 11:45 AM
Subject: Re: [gnupic] Linker Optimizations


> David Barnett wrote:
>>   I'm not quite sure what constitutes pCode, though.

<snip>

> A few other features of the pCode are tail optimizations, live range 
> analysis,  and call tree analysis. These features don't make much sense 
> living in the compiler. It's best to apply these optimizations in the 
> linker. Take the call tree analysis. If the linker performs a call tree 
> analysis and determine that certain functions are never called - then no 
> code needs to be generated for them. Or if it determines that a certain 
> function is only called once, then that function can be inlined. What's 
> really cool is that after inlining you can run the code through the pCode 
> optimizer again. This subsequent pass may end up removing things like 
> parameter passing.
Just these optimizations would be worth all the effort.  I'm used to working 
with an 8-level stack and I've had to explicitly inline all kinds of 
single-call functions just so that I could modularize the code.  This 
approach led to strange header file dependencies since the body has to be 
defined at each inline declaration, which in turn led to some severe 
headaches.

<snip>

> If we were to move pCode into gplink, then this is probably the high level 
> approach that should be taken.
>
> - define a set of assembler directives that can control the linker.
> For example, we'd need to tell the linker things like 'the follow block is 
> really data table and not instructions' or 'the following code has been 
> hand crafted to run at a very specific rate and should not be optimized' 
> or etc...
Good point.  I think we'll also want to make sure that most of the 
optimization needs to be explicitly enabled for backwards-compatibility.

<snip>

> It's going to take several man-months to accomplish these high level 
> tasks.
Sounds fun =).  If those function optimizations will work, it'll solve a lot 
of problems for me, so I'll definitely keep it in mind to hack at when I get 
some free time.

David

Subject: Re: [gnupic] Linker Optimizations
From: "David Barnett" ####@####.####
Date: 5 Mar 2007 18:59:21 +0000
Message-Id: <0bee01c75f57$ee967620$2001a8c0@barnett2>

----- Original Message ----- 
From: "David Barnett" ####@####.####
To: ####@####.####
Sent: Friday, March 02, 2007 3:48 PM
Subject: Re: [gnupic] Linker Optimizations
...
> I ask because I wasn't aware you could extract pCode from pure assembly 
> code.  As I mentioned in the last email, there's some flow information 
> that you could extract from C, but I don't think from asm, that would be 
> necessary for a lot of the optimizations.
...
> David

I think these concerns of mine might be significant, so I'll elaborate a 
bit.

For one thing, it's not always a straight-forward task to determine where a 
function begins or ends.  Often the programmer manually optimizes a tail 
call to a goto, and there may be several exit points including computed 
gotos.  That means that the exit points may be determined by the flow of 
execution, but there are all sorts of runtime complications to consider. 
The selected bank will not be known at the entry point(s), nor will any 
other registers (except possibly PCLATH).  The bank selection bits or PC 
registers (or even FSR) might be modified through INDF, and all that could 
lead to computed gotos with targets either internal or external to a 
function.

On the other hand, I guess things could probably just as complicated with C 
source code because, as a trivial example, it could include inline assembly 
code with the same issues.  If we can solve those problems from the 
assembler/linker end, that will cut out all of the concerns I had with 
getting the information from the compiler to the linker.  Still, it seems 
like quite a challenge to me.  Do you think it's manageable?

David

Subject: Re: [gnupic] Linker Optimizations
From: Scott Dattalo ####@####.####
Date: 5 Mar 2007 21:45:49 +0000
Message-Id: <45EC8F3F.7040102@dattalo.com>

David Barnett wrote:
> I think these concerns of mine might be significant, so I'll elaborate 
> a bit.
>
> For one thing, it's not always a straight-forward task to determine 
> where a function begins or ends.  Often the programmer manually 
> optimizes a tail call to a goto, and there may be several exit points 
> including computed gotos.  That means that the exit points may be 
> determined by the flow of execution, but there are all sorts of 
> runtime complications to consider. The selected bank will not be known 
> at the entry point(s), nor will any other registers (except possibly 
> PCLATH).  The bank selection bits or PC registers (or even FSR) might 
> be modified through INDF, and all that could lead to computed gotos 
> with targets either internal or external to a function.

One of the risks you'll face is that subtle source code changes have 
drastic impact on code generation. For example, adding a second call to 
some function may prohibit it from being inlined which in turn causes 
the 8-level stack to overflow. I suppose if you're counting on this type 
of optimization, then perhaps you'll need to ensure your assumptions are 
validated! (For example, you might need to create simulation regression 
tests.)

> On the other hand, I guess things could probably just as complicated 
> with C source code because, as a trivial example, it could include 
> inline assembly code with the same issues.  If we can solve those 
> problems from the assembler/linker end, that will cut out all of the 
> concerns I had with getting the information from the compiler to the 
> linker.  Still, it seems like quite a challenge to me.  Do you think 
> it's manageable?

There's no doubt that adding an optimizer to the linker will be a 
challenging job. As far as being manageable, I guess that depends on who 
is managing! If you're interested, then I'd suggest diving into 
gputils/gputils/gplink/ and try getting a basic understanding on how 
things are structured. In particular, try to understand how program 
memory and register memory are dealt with. See if it's possible to 
annotate the preexisting structures with new information you can use for 
optimization. After that, check out sdcc/sdcc/src/pic/ and attempt to 
get some basic understanding on how the pcode stuff is used.

Scott

Subject: Re: [gnupic] Linker Optimizations
From: Scott Dattalo ####@####.####
Date: 5 Mar 2007 23:54:08 +0000
Message-Id: <45ECAD60.80208@dattalo.com>

Michael Ambrus just posted this on the SID mailing list:

http://www.iecc.com/linker/

This is an online copy of a book "Linkers and Loaders" by John R. 
Levine. The book's web site is

http://linker.iecc.com/

Chapter 11 talks about advanced linker techniques like global optimization.

Scott

Subject: Re: [gnupic] Linker Optimizations
From: Ralph Corderoy ####@####.####
Date: 6 Mar 2007 10:59:49 +0000
Message-Id: <20070306105917.B491614A8B8@blake.inputplus.co.uk>

Hi Scott,

> Michael Ambrus just posted this on the SID mailing list:
> 
> http://www.iecc.com/linker/
> 
> This is an online copy of a book "Linkers and Loaders" by John R. 
> Levine. The book's web site is

It's a good book, I gave a little feedback on a draft IIRC.

Regarding great optimisation, people interested in compilers should look
at LLVM, http://llvm.org/  It's outstanding the optimisations they've
achieved, e.g. a malloc'd linked-list of structs might be turned into
one vector per struct member.  This removes the per-element malloc
overhead, and gives better CPU cache performance.  This isn't a dumb
translation;  it only does it if there's benefits.

Cheers,


Ralph.

Subject: Re: [gnupic] Linker Optimizations
From: David ####@####.####
Date: 9 Mar 2007 23:20:25 +0000
Message-Id: <20070309171717.0a9b0a3d@DEEPTHOUGHT.BARNET.net>

On Mon, 05 Mar 2007 13:44:31 -0800
Scott Dattalo ####@####.#### wrote:

> One of the risks you'll face is that subtle source code changes have 
> drastic impact on code generation. For example, adding a second call
> to some function may prohibit it from being inlined which in turn
> causes the 8-level stack to overflow. I suppose if you're counting on
> this type of optimization, then perhaps you'll need to ensure your
> assumptions are validated! (For example, you might need to create
> simulation regression tests.)
Most of the functions I'm dealing with are fundamentally designed to be
called from only one location.  They're things like a "button machine"
function to be run from a "main loop".  I know there are lots of other
scenarios where small changes have big effects, though.  Hopefully
in time we can make the optimizer smart enough not to get tripped up
with some of the more trivial things.

> There's no doubt that adding an optimizer to the linker will be a 
> challenging job. As far as being manageable, I guess that depends on
> who is managing! If you're interested, then I'd suggest diving into 
> gputils/gputils/gplink/ and try getting a basic understanding on how 
> things are structured. In particular, try to understand how program 
> memory and register memory are dealt with. See if it's possible to 
> annotate the preexisting structures with new information you can use
> for optimization.
FYI, I've been making my way through the headers for gplink and digging
through some of the code.  It's pretty simple and concise for the most
part.  I haven't seen anything dealing with memory or registers yet,
though.

> After that, check out sdcc/sdcc/src/pic/ and
> attempt to get some basic understanding on how the pcode stuff is
> used.
I glanced over that and it made me dizzy =).  I think I'll spend some
more time going through gplink before getting back to that.

BTW, how's your other project coming along?

David

Subject: Re: [gnupic] Linker Optimizations
From: Scott Dattalo ####@####.####
Date: 10 Mar 2007 00:13:43 +0000
Message-Id: <45F1F7F8.8000004@dattalo.com>

David wrote:
> FYI, I've been making my way through the headers for gplink and digging
> through some of the code.  It's pretty simple and concise for the most
> part.  I haven't seen anything dealing with memory or registers yet,
> though.
>   

Craig Franklin writes very clean code. Take a look at libgputils/ and in 
particularly at gpmemory.c

>> After that, check out sdcc/sdcc/src/pic/ and
>> attempt to get some basic understanding on how the pcode stuff is
>> used.
>>     
> I glanced over that and it made me dizzy =).  I think I'll spend some
> more time going through gplink before getting back to that.
>   

Craig Franklin writes cleaner code than I do! :) The pCode structures 
are a lot like C++ classes.

> BTW, how's your other project coming along?
>   

If you're referring to the Graphic Display based on the SSD0323, then I 
can report that I've got all of the interfaces working. Initially there 
was only the '8080' mode, now there's the 6800 mode and the SPI mode. By 
the way, the SSP code Roy Rankin added a year ago works perfectly (as 
far as I can tell). I written a sample .asm file that tests all three 
modes. I haven't implemented all of the high level functionality of the 
SSD0323. But I do support the essential commands like set row and column 
ranges.

This work is for a side contract job. That job will probably eat into my 
gpsim development time (but the pay is better!).

The other project that I haven't made any significant progress on over 
the last few weeks is the multiple processor feature. I've made a few 
really minor tweaks to the symbol table code (I rewrote the symbol table 
code as part of the multiple processor effort).

Scott