26th January 2000
Al Riddoch <ajr@ecs.soton.ac.uk>

Up until now the only binary format supported by ELKS has been the basic
minix format which locates data bss and heap at address 0 in the data
segment, locates the stack at the top of the data segment, and effectively
hard sets the limit on the total memory used by the program at load time,
and wastes this memory most of the time.

Here is the diagram from memory.txt

+------------+--------------------------------------------------+
| data + bss |  heap ---> |                          <--- stack |
+------------+--------------------------------------------------+
^            ^            ^                                     ^
0x0  current->t_enddata  current->t_endbrk                  current->t_endstack

As of now I have implemented a new format which locates the stack below
the data and bss in reserved space at the bottom of the data segment,
and allows the heap to grow up indefinitely providing support for growing
holes is available in the mm code. This format uses already existing
features of the minix header to indicate the amount of reserved space,
and should be possible to add to the compiler tool chain with only minor
changes.

Here is a diagram representing the new format (env is the environment)

   current->t_begstack                       current->t_endseg
            v                                         v
+------------------+------------+---------------------+
|  |<--stack|  env | data + bss |     heap --> |      |
+------------------+------------+---------------------+
^                               ^              ^
0x0                    current->t_enddata  current->t_endbrk

and here is how the new labels apply to the old format

                                           current->t_begstack
                                                    v
+------------+--------------------------------------------+
| data + bss |  heap ----> |            |<--- stack | env |
+------------+--------------------------------------------+
^            ^             ^                              ^
0x0  current->t_enddata  current->t_endbrk       current->t_endseg

A check for what type a binary is can be done by checking if t_begstack
(stack pointer at the start of the process is) is greater than t_enddata.
If it is, this is old format, otherwise it is new format.

The basic minix header format is as follows:-

struct minix_exec_hdr
{
        unsigned long type;
#define MINIX_COMBID    0x04100301L
#define MINIX_SPLITID   0x04200301L     
#define MINIX_S_SPLITID 0x04600301L     
#define MINIX_DLLID     0x04A00301L     
        unsigned long hlen;
        unsigned long tseg;
        unsigned long dseg;
        unsigned long bseg;
        unsigned long unused;
        unsigned long chmem;
        unsigned long unused2; 
};

Header file a.out.h in the Dev86 package describe some optional extra fields
at the end of this structure, which can be described as a supplementary
header as follows:-

struct minix_supl_hdr
{
        long            msh_trsize;     /* text relocation size */
        long            msh_drsize;     /* data relocation size */
        long            msh_tbase;      /* text relocation base */
        long            msh_dbase;      /* data relocation base */
};

Binaries which are loaded into memory in the new format have this supplementary
header which is indicated by the hlen field of the main header which is 0x30
if the supplementary header is present. The msh_dbase field of the 
supplementary field is set to indicate the base address of the data segment
set at compile time. The exec code has been modified to check this field
if the supplementary header is present, and load the initialised
data at this address, initialising the stack to be below this address.

At the time of writing, the compiler did not explicitly support this format,
but binaries could be generated manually as follows.

1. Compile the program as usual specifying a data segment offset on the
   linker stage command line.

   e.g.

   bcc -X-D0x1000 -o prog prog.c

   -D0x1000 is a linker option telling it to use a data segment base address
   of 0x1000. -X is the bcc option to pass the rest to the linker.

2. Using a hex editor insert 4 long words (16 bytes) after the 8 long words
   (32 bytes) of the main header.

   e.g.

   01032004 20000000 b0000000 14000000
   04000000 00000000 18800000 00000000
+  00000000 00000000 00000000 00000000
   e95900c3 302e3134 2e350000 5589e557
   56e8....

3. Modify the second long word of the main header from 0x20 to 0x30 to indicate
   that the supplementary header is present.

4. Modify the fourth long word of the supplementary field so that it is equal
   to the data segment offest. In the example shown in 1 the offset was 0x1000.
   As a long word in little endian form this reads 00100000.

   01032004 20000000 b0000000 14000000
   04000000 00000000 18800000 00000000
-  00000000 00000000 00000000 00000000
+  00000000 00000000 00000000 00100000
   e95900c3 302e3134 2e350000 5589e557
   56e8....

The exec code currently contains debugging output that will report the value
it thinks is the data segment base address as found in the supplementary
header.
