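While studying how gcc compiles and links, I came across something interesting: a program billed as the smallest possible one.
It comes from the book 程序员的自我修养:链接、装载与库, p. 124.
The traditional hello world: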
    #include <stdio.h>

    int main()
    {
        printf("hello world\n");
        return 0;
    }
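Running this code depends on glibc and many other libraries. To print the intermediate steps of compiling and linking, use gcc -static --verbose -fno-builtin hello.c
The hello world below differs in several ways:
1. It does not use the C runtime library.
2. Normally the entry point of a C program is _start inside the library, which then calls main. Here the nomain function in our own C code is the entry point of the whole program.
3. Normally a C program produces many sections (text, data and so on). Here all of them are merged into one section that we name tinytext.
The code of tinyhelloworld.c is as follows: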
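    char *str = "hello world from tinyos\n";

    /* write(fd, buf, len) through int 0x80 */
    void printf()
    {
        asm("movl $24,%%edx\n\t"   /* edx = length of str in bytes */
            "movl %0,%%ecx\n\t"    /* ecx = address of the string */
            "movl $0,%%ebx\n\t"    /* ebx = file descriptor 0 */
            "movl $4,%%eax\n\t"    /* eax = 4, the write system call */
            "int $0x80\n\t"
            ::"r"(str):"edx","ecx","ebx");
    }

    /* exit(42) through int 0x80 */
    void exit()
    {
        asm("movl $42,%ebx\n\t"    /* ebx = exit status */
            "movl $1,%eax\n\t"     /* eax = 1, the exit system call */
            "int $0x80\n\t");
    }

    int nomain()
    {
        printf();
        exit();
    }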
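This source code is from the book. Let's analyze it.
The printf here invokes the Linux write system call directly through int 0x80, and exit does the same with the exit system call.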
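gcc -c -fno-builtin tinyhelloworld.c
This only compiles, without linking.
-fno-builtin tells gcc not to apply its built-in function optimizations, which matters here because printf and exit are redefined.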
ld -static -e nomain -o tinyhelloworld tinyhelloworld.o
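This links with ld:
-static selects static linking
-e nomain changes the program entry point to nomain
A note on the entry point:
It can also be set in a linker script with ENTRY(name). The sources are tried in priority order, from highest to lowest:
1. the -e option on the ld command line
2. ENTRY(name) in the linker script
3. _start, if that symbol is defined
4. the address of the first byte of the text section, if there is one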
readelf -h tinyhelloworld

    ELF Header:
      Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
      Class:                             ELF32
      Data:                              2's complement, little endian
      Version:                           1 (current)
      OS/ABI:                            UNIX - System V
      ABI Version:                       0
      Type:                              EXEC (Executable file)
      Machine:                           Intel 80386
      Version:                           0x1
      Entry point address:               0x80480c4
      Start of program headers:          52 (bytes into file)
      Start of section headers:          472 (bytes into file)
      Flags:                             0x0
      Size of this header:               52 (bytes)
      Size of program headers:           32 (bytes)
      Number of program headers:         3
      Size of section headers:           40 (bytes)
      Number of section headers:         9
      Section header string table index: 6
From the output above, the entry address of the program is 0x80480c4.
objdump -h tinyhelloworld

    tinyhelloworld:     file format elf32-i386

    Sections:
    Idx Name          Size      VMA       LMA       File off  Algn
      0 .text         0000003f  08048094  08048094  00000094  2**2
                      CONTENTS, ALLOC, LOAD, READONLY, CODE
      1 .rodata       00000019  080480d3  080480d3  000000d3  2**0
                      CONTENTS, DATA
      2 .eh_frame     0000007c  080480ec  080480ec  000000ec  2**2
                      CONTENTS, DATA
      3 .data         00000004  08049168  08049168  00000168  2**2
                      CONTENTS, DATA
      4 .comment      0000002a  00000000  00000000  0000016c  2**0
                      CONTENTS, READONLY

The text section starts at 0x08048094.
objdump -S tinyhelloworld

    tinyhelloworld:     file format elf32-i386

    Disassembly of section .text:

    08048094 <printf>:
     8048094:   55                      push   %ebp
     8048095:   89 e5                   mov    %esp,%ebp
     8048097:   53                      push   %ebx
     8048098:   a1 68 91 04 08          mov    0x8049168,%eax
     804809d:   ba 18 00 00 00          mov    $0x18,%edx
     80480a2:   89 c1                   mov    %eax,%ecx
     80480a4:   bb 00 00 00 00          mov    $0x0,%ebx
     80480a9:   b8 04 00 00 00          mov    $0x4,%eax
     80480ae:   cd 80                   int    $0x80
     80480b0:   5b                      pop    %ebx
     80480b1:   5d                      pop    %ebp
     80480b2:   c3                      ret

    080480b3 <exit>:
     80480b3:   55                      push   %ebp
     80480b4:   89 e5                   mov    %esp,%ebp
     80480b6:   bb 2a 00 00 00          mov    $0x2a,%ebx
     80480bb:   b8 01 00 00 00          mov    $0x1,%eax
     80480c0:   cd 80                   int    $0x80
     80480c2:   5d                      pop    %ebp
     80480c3:   c3                      ret

    080480c4 <nomain>:
     80480c4:   55                      push   %ebp
     80480c5:   89 e5                   mov    %esp,%ebp
     80480c7:   e8 c8 ff ff ff          call   8048094 <printf>
     80480cc:   e8 e2 ff ff ff          call   80480b3 <exit>
     80480d1:   5d                      pop    %ebp
     80480d2:   c3                      ret
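The disassembly of the text section confirms it: 0x80480c4 is the address of nomain, so execution does start at nomain even though printf and exit come first in the section.
This layout follows ld's default link order and rules.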
To change the link order, use a linker script.
The lds file is:
cat tinyhelloworld.lds

    ENTRY(nomain)

    SECTIONS
    {
        . = 0x08048000 ;
        tinytext : { *(.text) *(.data) *(.rodata) }
        /DISCARD/ : { *(.comment) *(.eh_frame) }
    }

ld -static -T tinyhelloworld.lds tinyhelloworld.o -o tinyhelloworld
readelf -h tinyhelloworld

    ELF Header:
      Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
      Class:                             ELF32
      Data:                              2's complement, little endian
      Version:                           1 (current)
      OS/ABI:                            UNIX - System V
      ABI Version:                       0
      Type:                              EXEC (Executable file)
      Machine:                           Intel 80386
      Version:                           0x1
      Entry point address:               0x8048030
      Start of program headers:          52 (bytes into file)
      Start of section headers:          4228 (bytes into file)
      Flags:                             0x0
      Size of this header:               52 (bytes)
      Size of program headers:           32 (bytes)
      Number of program headers:         2
      Size of section headers:           40 (bytes)
      Number of section headers:         5
      Section header string table index: 2

The entry address is now 0x8048030, which is the address of nomain.
objdump -S tinyhelloworld

    tinyhelloworld:     file format elf32-i386

    Disassembly of section tinytext:

    08048000 <printf>:
     8048000:   55                      push   %ebp
     8048001:   89 e5                   mov    %esp,%ebp
     8048003:   53                      push   %ebx
     8048004:   a1 40 80 04 08          mov    0x8048040,%eax
     8048009:   ba 18 00 00 00          mov    $0x18,%edx
     804800e:   89 c1                   mov    %eax,%ecx
     8048010:   bb 00 00 00 00          mov    $0x0,%ebx
     8048015:   b8 04 00 00 00          mov    $0x4,%eax
     804801a:   cd 80                   int    $0x80
     804801c:   5b                      pop    %ebx
     804801d:   5d                      pop    %ebp
     804801e:   c3                      ret

    0804801f <exit>:
     804801f:   55                      push   %ebp
     8048020:   89 e5                   mov    %esp,%ebp
     8048022:   bb 2a 00 00 00          mov    $0x2a,%ebx
     8048027:   b8 01 00 00 00          mov    $0x1,%eax
     804802c:   cd 80                   int    $0x80
     804802e:   5d                      pop    %ebp
     804802f:   c3                      ret

    08048030 <nomain>:
     8048030:   55                      push   %ebp
     8048031:   89 e5                   mov    %esp,%ebp
     8048033:   e8 c8 ff ff ff          call   8048000 <printf>
     8048038:   e8 e2 ff ff ff          call   804801f <exit>
     804803d:   5d                      pop    %ebp
     804803e:   c3                      ret
            ...

    08048040 <str>:
     8048040:   44 80 04 08 68 65 6c 6c 6f 20 77 6f 72 6c 64 20    D...hello world 
     8048050:   66 72 6f 6d 20 74 69 6e 79 6f 73 0a 00             from tinyos..
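The disassembly shows that the address of tinytext is the one fixed by the script (0x08048000), and the entry address is the nomain named by ENTRY.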
------------------------------ A study of the ELF file entry address ------------------------------
This example also comes from 程序员的自我修养.
a.c
    extern int shared;
    void swap(int *a, int *b);   /* declared so the call below is not implicit */

    int main()
    {
        int a = 100;
        swap(&a, &shared);
    }
b.c
    int shared = 1;

    void swap(int *a, int *b)
    {
        *a ^= *b ^= *a ^= *b;
    }
gcc -c a.c b.c
This produces a.o and b.o.
objdump -h a.o

    a.o:     file format elf32-i386

    Sections:
    Idx Name             Size      VMA       LMA       File off  Algn
      0 .text            00000027  00000000  00000000  00000034  2**2
                         CONTENTS, RELOC, CODE
      1 .data            00000000  00000000  00000000  0000005c  2**2
                         CONTENTS, DATA
      2 .bss             00000000  00000000  00000000  0000005c  2**2
                         ALLOC
      3 .comment         0000002b  00000000  00000000  0000005c  2**0
                         CONTENTS, READONLY
      4 .note.GNU-stack  00000000  00000000  00000000  00000087  2**0
                         CONTENTS, READONLY
      5 .eh_frame        00000038  00000000  00000000  00000088  2**2
                         CONTENTS, DATA
objdump -h b.o

    b.o:     file format elf32-i386

    Sections:
    Idx Name             Size      VMA       LMA       File off  Algn
      0 .text            0000003a  00000000  00000000  00000034  2**2
                         CONTENTS, CODE
      1 .data            00000004  00000000  00000000  00000070  2**2
                         CONTENTS, DATA
      2 .bss             00000000  00000000  00000000  00000074  2**2
                         ALLOC
      3 .comment         0000002b  00000000  00000000  00000074  2**0
                         CONTENTS, READONLY
      4 .note.GNU-stack  00000000  00000000  00000000  0000009f  2**0
                         CONTENTS, READONLY
      5 .eh_frame        0000003c  00000000  00000000  000000a0  2**2
                         CONTENTS, DATA

Because nothing has been linked yet, VMA and LMA are all 0. The object files only record the size of each section.
ld a.o b.o -e main -o ab
readelf -h ab

    ELF Header:
      Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
      Class:                             ELF32
      Data:                              2's complement, little endian
      Version:                           1 (current)
      OS/ABI:                            UNIX - System V
      ABI Version:                       0
      Type:                              EXEC (Executable file)
      Machine:                           Intel 80386
      Version:                           0x1
      Entry point address:               0x8048094
      Start of program headers:          52 (bytes into file)
      Start of section headers:          444 (bytes into file)
      Flags:                             0x0
      Size of this header:               52 (bytes)
      Size of program headers:           32 (bytes)
      Number of program headers:         3
      Size of section headers:           40 (bytes)
      Number of section headers:         8
      Section header string table index: 5
Here we only care about Entry point address: 0x8048094. This is the actual virtual address at which the program starts, and readelf reads it from the ELF header.
objdump -h ab

    ab:     file format elf32-i386

    Sections:
    Idx Name          Size      VMA       LMA       File off  Algn
      0 .text         00000062  08048094  08048094  00000094  2**2
                      CONTENTS, CODE
      1 .eh_frame     0000005c  080480f8  080480f8  000000f8  2**2
                      CONTENTS, DATA
      2 .data         00000004  08049154  08049154  00000154  2**2
                      CONTENTS, DATA
      3 .comment      0000002a  00000000  00000000  00000158  2**0
                      CONTENTS, READONLY
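objdump shows that the VMA of the text section is exactly this address, so the program starts executing there. But does execution always start at the beginning of the text section? Not necessarily, as discussed earlier.
The entry point can also be set in a linker script with ENTRY(name). The sources are tried in priority order, from highest to lowest:
1. the -e option on the ld command line
2. ENTRY(name) in the linker script
3. _start, if that symbol is defined
4. the address of the first byte of the text section, if there is one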
The first tinyhelloworld link is an example: with -e nomain the entry is 0x80480c4, although its text section starts at 0x08048094.

readelf -h tinyhelloworld

    ELF Header:
      Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
      Class:                             ELF32
      Data:                              2's complement, little endian
      Version:                           1 (current)
      OS/ABI:                            UNIX - System V
      ABI Version:                       0
      Type:                              EXEC (Executable file)
      Machine:                           Intel 80386
      Version:                           0x1
      Entry point address:               0x80480c4
      Start of program headers:          52 (bytes into file)
      Start of section headers:          472 (bytes into file)
      Flags:                             0x0
      Size of this header:               52 (bytes)
      Size of program headers:           32 (bytes)
      Number of program headers:         3
      Size of section headers:           40 (bytes)
      Number of section headers:         9
      Section header string table index: 6