mirror of
https://git.FreeBSD.org/src.git
synced 2024-12-28 11:57:28 +00:00
a4cd5630b0
non-i386, non-unix, and generatable files have been trimmed, but can easily be added in later if needed. gcc-2.7.2.1 will follow shortly, it's a very small delta to this and it's handy to have both available for reference for such little cost. The freebsd-specific changes will then be committed, and once the dust has settled, the bmakefiles will be committed to use this code.
449 lines
18 KiB
Plaintext
449 lines
18 KiB
Plaintext
0. Improved efficiency.
|
|
|
|
* Parse and output array initializers an element at a time, freeing
|
|
storage after each, instead of parsing the whole initializer first and
|
|
then outputting. This would reduce memory usage for large
|
|
initializers.
|
|
|
|
* See if the techniques describe in Oct 1991 SIGPLAN Notices
|
|
(Frazer and Hanson) are applicable to GCC.
|
|
|
|
1. Better optimization.
|
|
|
|
* Constants in unused inline functions
|
|
|
|
It would be nice to delay output of string constants so that string
|
|
constants mentioned in unused inline functions are never generated.
|
|
Perhaps this would also take care of string constants in dead code.
|
|
|
|
The difficulty is in finding a clean way for the RTL which refers
|
|
to the constant (currently, only by an assembler symbol name)
|
|
to point to the constant and cause it to be output.
|
|
|
|
* More cse
|
|
|
|
The techniques for doing full global cse are described in the red
|
|
dragon book, or (a different version) in Frederick Chow's thesis from
|
|
Stanford. It is likely to be slow and use a lot of memory, but it
|
|
might be worth offering as an additional option.
|
|
|
|
It is probably possible to extend cse to a few very frequent cases
|
|
without so much expense.
|
|
|
|
For example, it is not very hard to handle cse through if-then
|
|
statements with no else clauses. Here's how to do it. On reaching a
|
|
label, notice that the label's use-count is 1 and that the last
|
|
preceding jump jumps conditionally to this label. Now you know it
|
|
is a simple if-then statement. Remove from the hash table
|
|
all the expressions that were entered since that jump insn
|
|
and you can continue with cse.
|
|
|
|
It is probably not hard to handle cse from the end of a loop
|
|
around to the beginning, and a few loops would be greatly sped
|
|
up by this.
|
|
|
|
* Optimize a sequence of if statements whose conditions are exclusive.
|
|
|
|
It is possible to optimize
|
|
|
|
if (x == 1) ...;
|
|
if (x == 2) ...;
|
|
if (x == 3) ...;
|
|
|
|
into
|
|
|
|
if (x == 1) ...;
|
|
else if (x == 2) ...;
|
|
else if (x == 3) ...;
|
|
|
|
provided that x is not altered by the contents of the if statements.
|
|
|
|
It's not certain whether this is worth doing. Perhaps programmers
|
|
nearly always write the else's themselves, leaving few opportunities
|
|
to improve anything.
|
|
|
|
* Un-cse.
|
|
|
|
Perhaps we should have an un-cse step right after cse, which tries to
|
|
replace a reg with its value if the value can be substituted for the
|
|
reg everywhere, if that looks like an improvement. Which is if the
|
|
reg is used only a few times. Use rtx_cost to determine if the
|
|
change is really an improvement.
|
|
|
|
* Clean up how cse works.
|
|
|
|
The scheme is that each value has just one hash entry. The
|
|
first_same_value and next_same_value chains are no longer needed.
|
|
|
|
For arithmetic, each hash table elt has the following slots:
|
|
|
|
* Operation. This is an rtx code.
|
|
* Mode.
|
|
* Operands 0, 1 and 2. These point to other hash table elements.
|
|
|
|
So, if we want to enter (PLUS:SI (REG:SI 30) (CONST_INT 104)), we
|
|
first enter (CONST_INT 104) and find the entry that (REG:SI 30) now
|
|
points to. Then we put these elts into operands 0 and 1 of a new elt.
|
|
We put PLUS and SI into the new elt.
|
|
|
|
Registers and mem refs would never be entered into the table as such.
|
|
However, the values they contain would be entered. There would be a
|
|
table indexed by regno which points at the hash entry for the value in
|
|
that reg.
|
|
|
|
The hash entry index now plays the role of a qty number.
|
|
We still need qty_first_reg, reg_next_eqv, etc. to record which regs
|
|
share a particular qty.
|
|
|
|
When a reg is used whose contents are unknown, we need to create a
|
|
hash table entry whose contents say "unknown", as a place holder for
|
|
whatever the reg contains. If that reg is added to something, then
|
|
the hash entry for the sum will refer to the "unknown" entry. Use
|
|
UNKNOWN for the rtx code in this entry. This replaces make_new_qty.
|
|
|
|
For a constant, a unique hash entry would be made based on the
|
|
value of the constant.
|
|
|
|
What about MEM? Each time a memory address is referenced, we need a
|
|
qty (a hash table elt) to represent what is in it. (Just as for a
|
|
register.) If this isn't known, create one, just as for a reg whose
|
|
contents are unknown.
|
|
|
|
We need a way to find all mem refs that still contain a certain value.
|
|
Do this with a chain of hash elts (for memory addresses) that point to
|
|
locations that hold the value. The hash elt for the value itself should
|
|
point to the start of the chain. It would be good for the hash elt
|
|
for an address to point to the hash elt for the contents of that address
|
|
(but this ptr can be null if the contents have never been entered).
|
|
|
|
With this data structure, nothing need ever be invalidated except
|
|
the lists of which regs or mems hold a particular value. It is easy
|
|
to see if there is a reg or mem that is equiv to a particular value.
|
|
If the value is constant, it is always explicitly constant.
|
|
|
|
* Support more general tail-recursion among different functions.
|
|
|
|
This might be possible under certain circumstances, such as when
|
|
the argument lists of the functions have the same lengths.
|
|
Perhaps it could be done with a special declaration.
|
|
|
|
You would need to verify in the calling function that it does not
|
|
use the addresses of any local variables and does not use setjmp.
|
|
|
|
* Put short statics vars at low addresses and use short addressing mode?
|
|
|
|
Useful on the 68000/68020 and perhaps on the 32000 series,
|
|
provided one has a linker that works with the feature.
|
|
This is said to make a 15% speedup on the 68000.
|
|
|
|
* Keep global variables in registers.
|
|
|
|
Here is a scheme for doing this. A global variable, or a local variable
|
|
whose address is taken, can be kept in a register for an entire function
|
|
if it does not use non-constant memory addresses and (for globals only)
|
|
does not call other functions. If the entire function does not meet
|
|
this criterion, a loop may.
|
|
|
|
The VAR_DECL for such a variable would have to have two RTL expressions:
|
|
the true home in memory, and the pseudo-register used temporarily.
|
|
It is necessary to emit insns to copy the memory location into the
|
|
pseudo-register at the beginning of the function or loop, and perhaps
|
|
back out at the end. These insns should have REG_EQUIV notes so that,
|
|
if the pseudo-register does not get a hard register, it is spilled into
|
|
the memory location which exists in any case.
|
|
|
|
The easiest way to set up these insns is to modify the routine
|
|
put_var_into_stack so that it does not apply to the entire function
|
|
(sparing any loops which contain nothing dangerous) and to call it at
|
|
the end of the function regardless of where in the function the
|
|
address of a local variable is taken. It would be called
|
|
unconditionally at the end of the function for all relevant global
|
|
variables.
|
|
|
|
For debugger output, the thing to do is to invent a new binding level
|
|
around the appropriate loop and define the variable name as a register
|
|
variable with that scope.
|
|
|
|
* Live-range splitting.
|
|
|
|
Currently a variable is allocated a hard register either for the full
|
|
extent of its use or not at all. Sometimes it would be good to
|
|
allocate a variable a hard register for just part of a function; for
|
|
example, through a particular loop where the variable is mostly used,
|
|
or outside of a particular loop where the variable is not used. (The
|
|
latter is nice because it might let the variable be in a register most
|
|
of the time even though the loop needs all the registers.)
|
|
|
|
It might not be very hard to do this in global.c when a variable
|
|
fails to get a hard register for its entire life span.
|
|
|
|
The first step is to find a loop in which the variable is live, but
|
|
which is not the whole life span or nearly so. It's probably best to
|
|
use a loop in which the variable is heavily used.
|
|
|
|
Then create a new pseudo-register to represent the variable in that loop.
|
|
Substitute this for the old pseudo-register there, and insert move insns
|
|
to copy between the two at the loop entry and all exits. (When several
|
|
such moves are inserted at the same place, some new feature should be
|
|
added to say that none of those registers conflict merely because of
|
|
overlap between the new moves. And the reload pass should reorder them
|
|
so that a store precedes a load, for any given hard register.)
|
|
|
|
After doing this for all the reasonable candidates, run global-alloc
|
|
over again. With luck, one of the two pseudo-registers will be fit
|
|
somewhere. It may even have a much higher priority due to its reduced
|
|
life span.
|
|
|
|
There will be no room in general for the new pseudo-registers in
|
|
basic_block_live_at_start, so there will need to be a second such
|
|
matrix exclusively for the new ones. Various other vectors indexed by
|
|
register number will have to be made bigger, or there will have to be
|
|
secondary extender vectors just for global-alloc.
|
|
|
|
A simple new feature could arrange that both pseudo-registers get the
|
|
same stack slot if they both fail to get hard registers.
|
|
|
|
Other compilers split live ranges when they are not connected, or
|
|
try to split off pieces `at the edge'. I think splitting around loops
|
|
will provide more speedup.
|
|
|
|
Creating a fake binding block and a new like-named variable with
|
|
shorter life span and different address might succeed in describing
|
|
this technique for the debugger.
|
|
|
|
* Detect dead stores into memory?
|
|
|
|
A store into memory is dead if it is followed by another store into
|
|
the same location; and, in between, there is no reference to anything
|
|
that might be that location (including no reference to a variable
|
|
address).
|
|
|
|
* Loop optimization.
|
|
|
|
Strength reduction and iteration variable elimination could be
|
|
smarter. They should know how to decide which iteration variables are
|
|
not worth making explicit because they can be computed as part of an
|
|
address calculation. Based on this information, they should decide
|
|
when it is desirable to eliminate one iteration variable and create
|
|
another in its place.
|
|
|
|
It should be possible to compute what the value of an iteration
|
|
variable will be at the end of the loop, and eliminate the variable
|
|
within the loop by computing that value at the loop end.
|
|
|
|
When a loop has a simple increment that adds 1,
|
|
instead of jumping in after the increment,
|
|
decrement the loop count and jump to the increment.
|
|
This allows aob insns to be used.
|
|
|
|
* Using constraints on values.
|
|
|
|
Many operations could be simplified based on knowledge of the
|
|
minimum and maximum possible values of a register at any particular time.
|
|
These limits could come from the data types in the tree, via rtl generation,
|
|
or they can be deduced from operations that are performed. For example,
|
|
the result of an `and' operation one of whose operands is 7 must be in
|
|
the range 0 to 7. Compare instructions also tell something about the
|
|
possible values of the operand, in the code beyond the test.
|
|
|
|
Value constraints can be used to determine the results of a further
|
|
comparison. They can also indicate that certain `and' operations are
|
|
redundant. Constraints might permit a decrement and branch
|
|
instruction that checks zeroness to be used when the user has
|
|
specified to exit if negative.
|
|
|
|
* Smarter reload pass.
|
|
|
|
The reload pass as currently written can reload values only into registers
|
|
that are reserved for reloading. This means that in order to use a
|
|
register for reloading it must spill everything out of that register.
|
|
|
|
It would be straightforward, though complicated, for reload1.c to keep
|
|
track, during its scan, of which hard registers were available at each
|
|
point in the function, and use for reloading even registers that were
|
|
free only at the point they were needed. This would avoid much spilling
|
|
and make better code.
|
|
|
|
* Change the type of a variable.
|
|
|
|
Sometimes a variable is declared as `int', it is assigned only once
|
|
from a value of type `char', and then it is used only by comparison
|
|
against constants. On many machines, better code would result if
|
|
the variable had type `char'. If the compiler could detect this
|
|
case, it could change the declaration of the variable and change
|
|
all the places that use it.
|
|
|
|
* Better handling for very sparse switches.
|
|
|
|
There may be cases where it would be better to compile a switch
|
|
statement to use a fixed hash table rather than the current
|
|
combination of jump tables and binary search.
|
|
|
|
* Order of subexpressions.
|
|
|
|
It might be possible to make better code by paying attention
|
|
to the order in which to generate code for subexpressions of an expression.
|
|
|
|
* More code motion.
|
|
|
|
Consider hoisting common code up past conditional branches or
|
|
tablejumps.
|
|
|
|
* Trace scheduling.
|
|
|
|
This technique is said to be able to figure out which way a jump
|
|
will usually go, and rearrange the code to make that path the
|
|
faster one.
|
|
|
|
* Distributive law.
|
|
|
|
The C expression *(X + 4 * (Y + C)) compiles better on certain
|
|
machines if rewritten as *(X + 4*C + 4*Y) because of known addressing
|
|
modes. It may be tricky to determine when, and for which machines, to
|
|
use each alternative.
|
|
|
|
Some work has been done on this, in combine.c.
|
|
|
|
* Can optimize by changing if (x) y; else z; into z; if (x) y;
|
|
if z and x do not interfere and z has no effects not undone by y.
|
|
This is desirable if z is faster than jumping.
|
|
|
|
* For a two-insn loop on the 68020, such as
|
|
foo: movb a2@+,a3@+
|
|
jne foo
|
|
it is better to insert dbeq d0,foo before the jne.
|
|
d0 can be a junk register. The challenge is to fit this into
|
|
a portable framework: when can you detect this situation and
|
|
still be able to allocate a junk register?
|
|
|
|
2. Simpler porting.
|
|
|
|
Right now, describing the target machine's instructions is done
|
|
cleanly, but describing its addressing mode is done with several
|
|
ad-hoc macro definitions. Porting would be much easier if there were
|
|
an RTL description for addressing modes like that for instructions.
|
|
Tools analogous to genflags and genrecog would generate macros from
|
|
this description.
|
|
|
|
There would be one pattern in the address-description file for each
|
|
kind of addressing, and this pattern would have:
|
|
|
|
* the RTL expression for the address
|
|
* C code to verify its validity (since that may depend on
|
|
the exact data).
|
|
* C code to print the address in assembler language.
|
|
* C code to convert the address into a valid one, if it is not valid.
|
|
(This would replace LEGITIMIZE_ADDRESS).
|
|
* Register constraints for all indeterminates that appear
|
|
in the RTL expression.
|
|
|
|
3. Other languages.
|
|
|
|
Front ends for Pascal, Fortran, Algol, Cobol, Modula-2 and Ada are
|
|
desirable.
|
|
|
|
Pascal, Modula-2 and Ada require the implementation of functions
|
|
within functions. Some of the mechanisms for this already exist.
|
|
|
|
4. More extensions.
|
|
|
|
* Generated unique labels. Have some way of generating distinct labels
|
|
for use in extended asm statements. I don't know what a good syntax would
|
|
be.
|
|
|
|
* A way of defining a structure containing a union, in which the choice of
|
|
union alternative is controlled by a previous structure component.
|
|
|
|
Here is a possible syntax for this.
|
|
|
|
struct foo {
|
|
enum { INT, DOUBLE } code;
|
|
auto union { case INT: int i; case DOUBLE: double d;} value : code;
|
|
};
|
|
|
|
* Allow constructor expressions as lvalues, like this:
|
|
|
|
(struct foo) {a, b, c} = foo();
|
|
|
|
This would call foo, which returns a structure, and then store the
|
|
several components of the structure into the variables a, b, and c.
|
|
|
|
5. Generalize the machine model.
|
|
|
|
* Some new compiler features may be needed to do a good job on machines
|
|
where static data needs to be addressed using base registers.
|
|
|
|
* Some machines have two stacks in different areas of memory, one used
|
|
for scalars and another for large objects. The compiler does not
|
|
now have a way to understand this.
|
|
|
|
6. Useful warnings.
|
|
|
|
* Warn about statements that are undefined because the order of
|
|
evaluation of increment operators makes a big difference. Here is an
|
|
example:
|
|
|
|
*foo++ = hack (*foo);
|
|
|
|
7. Better documentation of how GCC works and how to port it.
|
|
|
|
Here is an outline proposed by Allan Adler.
|
|
|
|
I. Overview of this document
|
|
II. The machines on which GCC is implemented
|
|
A. Prose description of those characteristics of target machines and
|
|
their operating systems which are pertinent to the implementation
|
|
of GCC.
|
|
i. target machine characteristics
|
|
ii. comparison of this system of machine characteristics with
|
|
other systems of machine specification currently in use
|
|
B. Tables of the characteristics of the target machines on which
|
|
GCC is implemented.
|
|
C. A priori restrictions on the values of characteristics of target
|
|
machines, with special reference to those parts of the source code
|
|
which entail those restrictions
|
|
i. restrictions on individual characteristics
|
|
ii. restrictions involving relations between various characteristics
|
|
D. The use of GCC as a cross-compiler
|
|
i. cross-compilation to existing machines
|
|
ii. cross-compilation to non-existent machines
|
|
E. Assumptions which are made regarding the target machine
|
|
i. assumptions regarding the architecture of the target machine
|
|
ii. assumptions regarding the operating system of the target machine
|
|
iii. assumptions regarding software resident on the target machine
|
|
iv. where in the source code these assumptions are in effect made
|
|
III. A systematic approach to writing the files tm.h and xm.h
|
|
A. Macros which require special care or skill
|
|
B. Examples, with special reference to the underlying reasoning
|
|
IV. A systematic approach to writing the machine description file md
|
|
A. Minimal viable sets of insn descriptions
|
|
B. Examples, with special reference to the underlying reasoning
|
|
V. Uses of the file aux-output.c
|
|
VI. Specification of what constitutes correct performance of an
|
|
implementation of GCC
|
|
A. The components of GCC
|
|
B. The itinerary of a C program through GCC
|
|
C. A system of benchmark programs
|
|
D. What your RTL and assembler should look like with these benchmarks
|
|
E. Fine tuning for speed and size of compiled code
|
|
VII. A systematic procedure for debugging an implementation of GCC
|
|
A. Use of GDB
|
|
i. the macros in the file .gdbinit for GCC
|
|
ii. obstacles to the use of GDB
|
|
a. functions implemented as macros can't be called in GDB
|
|
B. Debugging without GDB
|
|
i. How to turn off the normal operation of GCC and access specific
|
|
parts of GCC
|
|
C. Debugging tools
|
|
D. Debugging the parser
|
|
i. how machine macros and insn definitions affect the parser
|
|
E. Debugging the recognizer
|
|
i. how machine macros and insn definitions affect the recognizer
|
|
|
|
ditto for other components
|
|
|
|
VIII. Data types used by GCC, with special reference to restrictions not
|
|
specified in the formal definition of the data type
|
|
IX. References to the literature for the algorithms used in GCC
|
|
|