Here is the simplest C code:
// main.c
int main() {
    return 0;
}
After we do:
$ gcc main.c
we get a.out (Assembler Output), and if we open it with an editor, what we see is something like this:
^?ELF^A^A^A^@^@^@^@^@^@^@^@^@^B^@^C^@^A^@^@^@~@~B^D^H4^@^@^@ ... ^@^@^@^D^@^@^@/lib/ld-linux.so.2^@^@^D^@^@^@^P^@^@^@^A^@^@^@ GNU^@^@^@^@^@^B^@^@^@^F^@^@^@ ...... ....
The name a.out refers to one of the older output file formats. But as we can see in the editor, on most Linux systems the file is now written in the ELF (Executable and Linkable Format) format.
Actually, executable code is created by the compiler and the linker, but it's the linker that puts things into the binary format. On Linux, this format is typically ELF; on Windows it's PE (Portable Executable), a modified version of the Unix COFF file format used for executables, object code, and DLLs; and on Mac OS X it's Mach-O.
An ELF file contains metadata and multiple sections of code and data, which will be loaded into memory. The raw a.out cannot be read easily; however, with objdump we can see the machine code that was generated for the main() function.
$ objdump -D a.out | grep -A20 main.:
08048354 <main>:
 8048354:       8d 4c 24 04             lea    0x4(%esp),%ecx
 8048358:       83 e4 f0                and    $0xfffffff0,%esp
 804835b:       ff 71 fc                pushl  0xfffffffc(%ecx)
 804835e:       55                      push   %ebp
 804835f:       89 e5                   mov    %esp,%ebp
 8048361:       51                      push   %ecx
 8048362:       b8 00 00 00 00          mov    $0x0,%eax
 8048367:       59                      pop    %ecx
 8048368:       5d                      pop    %ebp
 8048369:       8d 61 fc                lea    0xfffffffc(%ecx),%esp
 804836c:       c3                      ret
 804836d:       90                      nop
 804836e:       90                      nop
 804836f:       90                      nop

08048370 <__libc_csu_fini>:
 8048370:       55                      push   %ebp
 8048371:       89 e5                   mov    %esp,%ebp
 8048373:       5d                      pop    %ebp
 8048374:       c3                      ret
The instructions in the binary a.out are written in machine language, which the CPU can understand. Compilers are designed to translate C code into machine language for a variety of processor architectures. In our case, the processor belongs to the x86 family (a Xeon):
$ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 23
model name      : Intel(R) Xeon(R) CPU E5405 @ 2.00GHz
stepping        : 6
cpu MHz         : 1995.080
cache size      : 6144 KB
....
Each architecture has a different machine language, so the compiler acts as a middle man, translating C code into machine language for the target architecture.
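If we want to look at just this translation step, gcc can stop after compiling and leave the generated assembly in main.s (the exact assembly depends on the gcc version and the target architecture):

$ gcc -S main.c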
Let's look in more detail at our compiled executable binary for the x86 architecture.
The objdump output here is piped into grep so that only the 20 lines following the match on main.: are shown. In the output, each byte is represented in hexadecimal.
The hexadecimal numbers on the far left, starting with 0x08048354, are memory addresses. We'll look at each machine instruction later, but for now, let's look at the register info using GDB:
$ gcc -g main.c
$ gdb -q ./a.out
Using host libthread_db library "/lib/libthread_db.so.1".
(gdb) break main
Breakpoint 1 at 0x8048362: file main.c, line 3.
(gdb) run
Starting program: /home/a.out

Breakpoint 1, main () at main.c:3
3           return 0;
(gdb) info registers
eax            0xbfec4204       -1075035644
ecx            0xbfec4180       -1075035776
edx            0x1      1
ebx            0x4a571ff4       1247223796
esp            0xbfec4164       0xbfec4164
ebp            0xbfec4168       0xbfec4168
esi            0x4a430ca0       1245908128
edi            0x0      0
eip            0x8048362        0x8048362
eflags         0x282    [ SF IF ]
cs             0x73     115
ss             0x7b     123
ds             0x7b     123
es             0x7b     123
fs             0x0      0
gs             0x33     51
(gdb)
The bits of the machine language instructions must be put into memory, and the layout of that memory is shown in the next section.
The picture above is the memory diagram, in which memory is divided into several regions.
Local variables declared as part of a procedure or function appear on the top part of the diagram, which represents the stack storage or automatic storage.
The stack is also the place where function parameters are stored. In addition, the stack stores the return address of the calling function, and it holds register contents and the return address when an interrupt service routine is called.
Modern architectures typically assign lower memory addresses to the heap than to the stack.
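As a quick (and admittedly system-dependent) check, a small program can print one stack address and one heap address; on a typical Linux/x86 layout like the one above, the heap address is the lower of the two, though the exact values change from run to run:

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int local = 0;                        /* lives on the stack */
    int *heap_ptr = malloc(sizeof(int));  /* lives on the heap  */

    /* typically the heap address prints lower than the stack address */
    printf("stack: %p\n", (void *)&local);
    printf("heap : %p\n", (void *)heap_ptr);

    free(heap_ptr);
    return 0;
}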
We have other regions in the diagram:
The data region stores global and static variables. When we start a C++ program, memory is set aside for our code (code storage or text storage), and the compiled machine code resides in the code segment.
In C, statically-allocated variables without an explicit initializer are initialized to zero or a null pointer. Implementations of C typically represent zero values and null pointer values using a bit pattern consisting solely of zero-valued bits. Hence, the bss (block started by symbol/block storage segment) section typically includes all uninitialized variables declared at the file level (outside of any function) as well as uninitialized local variables declared with the static keyword.
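For example, in a small file like the following (a sketch only; the variable names are made up and the exact placement is implementation-dependent), the variables would typically land in the segments noted in the comments:

/* sketch of where each kind of variable typically ends up */
int counter;            /* uninitialized global            -> bss          */
static int calls;       /* uninitialized file-level static -> bss          */
int limit = 100;        /* initialized global              -> data         */

int main(void)          /* the compiled code itself        -> text segment */
{
    int local = 1;      /* local variable                  -> stack        */
    static int hits;    /* uninitialized static local      -> bss          */
    (void)local; (void)hits; (void)calls;  /* silence unused warnings */
    return 0;
}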
The rest of the computer's memory is available for other uses. It is the free store, also called the heap. Dynamically allocated memory, created using new or malloc(), appears at the bottom of the diagram, which represents the region of memory called the heap.
Heap memory, also known as dynamic memory, is an alternative to local stack memory. Local memory is automatic: it is allocated when a function is called and deallocated when the function exits. Heap memory is different in every way. The programmer explicitly requests the allocation of a memory block of a particular size, and the block stays allocated until the programmer explicitly requests that it be deallocated. Nothing happens automatically. So the programmer has much greater control of memory, but also greater responsibility, since the memory must now be actively managed.
The advantages of heap memory: the programmer controls exactly when a block is allocated and when it is deallocated (its lifetime), and the size of the block can be decided at run time rather than at compile time.
The disadvantages of heap memory: allocation and deallocation must be arranged explicitly in the code, which is extra work, and doing it incorrectly leads to bugs such as memory leaks and dangling pointers.
However, there are many problems that can only be solved with heap memory, so that's the way it has to be. In languages with garbage collectors, such as LISP and Java, the above disadvantages are mostly eliminated: the garbage collector takes over most of the responsibility.
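In C there is no garbage collector, so the lifetime of every heap block is managed by hand. A minimal sketch of what that looks like (the function name make_buffer is made up for illustration):

#include <stdlib.h>

/* A heap block outlives the function that allocated it: the caller
   receives the pointer and is responsible for freeing it later. */
int *make_buffer(void) {
    return malloc(10 * sizeof(int));
}

int main(void) {
    int *buf = make_buffer();   /* explicitly requested                */
    if (buf != NULL) {
        buf[0] = 42;            /* ... use the block ...               */
        free(buf);              /* explicitly released; nothing        */
    }                           /* happens automatically               */
    return 0;
}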
The heap is a large area of memory available for use by the program. The program can request blocks of memory for its use within the heap. In order to allocate a block of some size, the program makes an explicit request by calling the heap allocation function. The allocation function reserves a block of memory of the requested size in the heap and returns a pointer to it.
Suppose a program makes three allocation requests to allocate memory to hold three separate images in the heap, each of which takes 1024 bytes of memory.
Each allocation request reserves a contiguous area of the requested size in the heap and returns a pointer to that new block to the program. Since each block is always referred to by a pointer, the block plays the role of an object that the pointer points to, and the program always manipulates its heap blocks through pointers. The heap block pointers are sometimes known as base address pointers, since by convention they point to the base (lowest address byte) of the block.
In this example, the three blocks have been allocated contiguously starting at the bottom of the heap, and each block is 1024 bytes in size as requested. In reality, the heap manager can allocate the blocks wherever it wants in the heap, so long as the blocks do not overlap and they are at least the requested size. At any particular moment, some areas in the heap have been allocated to the program, and so are in use. Other areas have yet to be committed, and so are free and available to satisfy allocation requests. The heap manager has its own private data structures to record which areas of the heap are committed to which purpose at any moment. The heap manager satisfies each allocation request from the pool of free memory and updates its private data structures to record which areas of the heap are in use.
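In C, the three requests might look like the following sketch (the pointer names img1, img2, and img3 are made up for illustration):

#include <stdlib.h>

int main(void) {
    /* three separate 1024-byte blocks; each pointer holds the base
       address (lowest byte) of its block, wherever the heap manager
       chose to place it */
    char *img1 = malloc(1024);
    char *img2 = malloc(1024);
    char *img3 = malloc(1024);

    if (img1 == NULL || img2 == NULL || img3 == NULL)
        return 1;   /* an allocation request could not be satisfied */

    /* ... the program works on the images only through these pointers ... */
    /* deallocation is shown a little further below */
    return 0;
}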
When the program is finished using a block of memory, it makes an explicit deallocation request to indicate to the heap manager that the program is now finished with that block.
The heap manager updates its private data structures to show that the area of memory occupied by the block is free again and so may be reused to satisfy future allocation requests.
After the deallocation, the pointer continues to point to the now deallocated block. The program must not access the deallocated object. The pointer is there, but it must not be used. Sometimes the code will set the pointer to NULL immediately after the deallocation to make explicit the fact that it is no longer valid.
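Continuing the sketch above, deallocating one of the image blocks and then making the pointer explicitly invalid would look like this:

free(img1);      /* the heap manager marks the 1024-byte block as free again      */
img1 = NULL;     /* the pointer still exists, but now it clearly refers to nothing */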
Let's look at the following example and see how the local variables are set in the stack area:
void f1() {
    int a;
    short b[4];
    double c;

    f2();
    f3();
}
When we call the f1() function, space needs to be set aside for its local variables. The following picture shows the activation record (stack frame):
When the function f1() is called, the stack pointer will be decremented by 20 bytes, which is the combined size of f1()'s local variables: 4 bytes for the int, 8 bytes for the four shorts, and 8 bytes for the double.
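As a rough sanity check (only a sketch: a real compiler also pushes a return address, saved registers, and possibly alignment padding, so the actual frame is usually larger), the 20 bytes come straight from the sizes of the locals:

#include <stdio.h>

int main(void) {
    /* 4 + 4*2 + 8 = 20 bytes on a typical 32-bit x86 target */
    printf("%zu\n", sizeof(int) + sizeof(short[4]) + sizeof(double));
    return 0;
}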
void f2() {
    int x;
    char *y;
    char *z[2];

    f3();
}

void f3() {
    double m[3];
    int n;
}
The following diagram shows the movement of the stack pointer, sp, as f1() is called, f1() calls f2(), f2() calls f3(), f3() exits, and then f2() exits.
After f2() has returned, when f1() calls f3(), the variables of f3() are simply written over the area where the variables of f2() used to be.
This section introduces assembly: not real assembly, but a conceptual assembly, just to learn how the stack is manipulated.
So, in this section, we're not going to use real names of registers such as Accumulator (EAX), Counter (ECX), Data (EDX), Base (EBX), Stack Pointer (ESP), Base Pointer (EBP), Source Index (ESI), and Destination Index (EDI) registers. But sometimes we may use Stack Pointer (ESP) or Instruction Pointer (EIP) registers, if necessary.
All the information in the code segment corresponds to the assembly code that was compiled from our C/C++ code.
We're going to assume our processor has a set of 32 registers, each 4 bytes wide, with very fast access. The registers are electronically connected to the RAM: every register can load from and flush to RAM.
The ALU does basic arithmetic (addition, subtraction, multiplication, division, shifting, and masking). In the picture, we're assuming that the ALU is connected only to the registers, not directly to RAM. That means all meaningful mathematical calculations have to be done in registers: we have to load a value from memory into a register in order to add something to it, and only loads and stores move data between the registers and RAM. So any mathematical operation, such as i++ or i+1, means loading the variables, wherever they are (heap or stack), into registers, doing the math, putting the result into some register, and flushing it back out to where it belongs in memory.
Suppose we have:
i = 17;
j = 20;
j += i;
The statement j += i compiles into assembly code: a recipe that knows how to load i and j into the register set, do the addition, and flush the result back out to the same space that j occupies. What probably happens is that i (17) is loaded into R1 and j (20) is loaded into R2. Because R1 and R2 are connected to the ALU, which can do the addition for us, we can add 17 to 20, synthesize the result in R3, and then flush R3 back out to j. Note that the whole process is composed of 4 steps: load i, load j, add, and store the result back out to j.
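Written in the mock notation that the next example also uses (here R1 is assumed to be a base register, with i and j assumed to sit at offsets 4 and 0 from it; the register numbers differ slightly from the prose above, which is fine since these are only mock registers), the four steps for j += i are:

R2 = M[R1+4];    // step 1: load i (17) from memory into a register
R3 = M[R1];      // step 2: load j (20) from memory into a register
R4 = R3 + R2;    // step 3: the ALU adds the two registers, giving 37
M[R1] = R4;      // step 4: flush the result back out to j's location in memory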
Let's write the code in mock assembly for the following code:
int i;
int j;

i = 10;
j = i + 7;
j++;
This compiles into the following mock assembly:
// i = 10;
M[R1+4] = 10;    // store operation
// j = i + 7;
R2 = M[R1+4];    // load operation
R3 = R2 + 7;     // ALU operation
M[R1] = R3;      // store operation
// j++;
R2 = M[R1];
R2 = R2 + 1;
M[R1] = R2;
R1 is a special dedicated register that stores the base address of the variables. M stands for RAM: the value 10 is stored at a 4-byte offset from the base address, as M[R1+4], and then loaded into R2. After the addition of 7 in the ALU, the result is stored in register R3 and flushed back into the memory at the base address, where j lives.
Let's do an example that doesn't deal only with 4-byte quantities.
int i;
short s1;
short s2;
So, the activation record looks like this:
i = 200;       =>  M[R1+4] = 200;
s1 = i;        =>  R2 = M[R1+4];
                   M[R1+2] =.2 R2;    // ".2" means a two-byte store
s2 = s1 + 1;   =>  R2 =.2 M[R1+2];    // two-byte load of s1
                   R3 = R2 + 1;
                   M[R1] =.2 R3;      // two-byte store into s2