Before Main()

Overview

In this article I will go over what an already compiled binary does before it executes the main() function. main() is typically referred to as the entry point of a program, although, as we will see, it is not the first code to run. You can follow along with the discussion if you have Ubuntu 16 and gdb installed. To make it easier to see what is happening, I have installed the binary debugging symbols described in the Intro article.

Getting Started

Here is the program we will be compiling.

#include <stdio.h>

int foo(int count);

int main(int argc, char**argv) {
    int count = 5;
    int newCount = foo(count);
    foo(newCount);

    return 0;
}

int foo(int count) {
    printf("Count is %d \n", count);
    return count + 1;
}

Let's compile this program with the symbols:

gcc -gdwarf-5 reverseEngineer.c

Gdb Time

I will be using the binary compiled above, with its symbols, in my examples. Before getting started, I want to explain what happens at the OS level before the program itself begins executing.

When you run a program, the shell or GUI calls execve(), which executes the Linux system call execve()...To summarize, it will set up a stack for you, and push onto it argc, argv, and envp. The file descriptors 0, 1, and 2 (stdin, stdout, stderr) are left to whatever the shell set them to. The loader does much work for you setting up your relocations, and as we'll see much later, calling your preinitializers. When everything is ready, control is handed to your program by calling _start(). Source: Link.

Let's go ahead and get started on the command line. First, we need to start gdb and tell it to let backtraces continue past main and past the entry point:

[0][mike@virtual-box-home.] [18:18:37] [~/code/c]
>gdb a.out -q
Reading symbols from a.out...done.
(gdb) set backtrace past-entry
(gdb) set backtrace past-main
(gdb) show backtrace past-entry
Whether backtraces should continue past the entry point of a program is on.
(gdb) show backtrace past-main
Whether backtraces should continue past "main" is on.

Now that we have set the proper flags, we need to figure out where to put our breakpoint. We could use objdump -D a.out and try to work out what is executed first, but I prefer to just set a breakpoint at main, run, and print the backtrace.

(gdb) break main
Breakpoint 1 at 0x400535: file reverseEngineer.c, line 6.
(gdb) run
Starting program: /home/mike/code/c/a.out 

Breakpoint 1, main (argc=1, argv=0x7fffffffe438) at reverseEngineer.c:6
6	    int count = 5;
(gdb) bt
#0  main (argc=1, argv=0x7fffffffe438) at reverseEngineer.c:6
#1  0x00007ffff7a2d830 in __libc_start_main (main=0x400526 <main>, argc=1, argv=0x7fffffffe438, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, 
    stack_end=0x7fffffffe428) at ../csu/libc-start.c:291
#2  0x0000000000400459 in _start ()

Reading the backtrace from the bottom up, we see the first thing to run is _start(), which then calls __libc_start_main(). Let's put breakpoints on these two functions (break _start and break __libc_start_main) and run it again.

(gdb) run
Starting program: /home/mike/code/c/a.out 

Breakpoint 1, main (argc=1, argv=0x7fffffffe438) at reverseEngineer.c:6
6	    int count = 5;
(gdb) run
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /home/mike/code/c/a.out 

Breakpoint 2, 0x0000000000400430 in _start ()

Getting Into It

Now that we are stopped at our breakpoint in _start(), let's check the current stack frame, disassemble _start(), and print out the registers to see what is going on. The frame is at stack level zero, which makes sense: the call stack is last in, first out (LIFO), and the function currently executing is always level 0. rip, the instruction pointer register, points at the first instruction of _start(). There is no saved rip because _start() is the outermost frame: nothing called it, so there is no return address to save. The arglist line shows where gdb would look for the frame's arguments; it is empty here because gdb knows of no arguments for _start().


(gdb) info frame
Stack level 0, frame at 0x0:
 rip = 0x400430 in _start; saved rip = <unavailable>
 Outermost frame: outermost
 Arglist at 0x7fffffffe428, args: 
 Locals at 0x7fffffffe428, Previous frame's sp is 0x7fffffffe438

gdb shows the args as empty, so for fun let's use the x command to examine the memory at the previous frame's sp, 0x7fffffffe438, and try to figure out what is there.

(gdb) x/8xw 0x7fffffffe438
0x7fffffffe438:	0xffffe7e8	0x00007fff	0x00000000	0x00000000
0x7fffffffe448:	0xffffe800	0x00007fff	0xffffe80b	0x00007fff

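As you can see, the first 8 bytes hold the pointer 0x00007fffffffe7e8, followed by 8 bytes of zero and then two more pointers. This matches argv=0x7fffffffe438 from the earlier backtrace: one argument pointer followed by a terminating NULL. Looking at the registers, we can see rcx currently holds 0x7fffffffe448, the same value as the address of the second row of the dump, and rsp holds 0x7fffffffe430, 8 bytes below the region we dumped.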

(gdb) info registers
rax            0x1c	28
rbx            0x0	0
rcx            0x7fffffffe448	140737488348232
rdx            0x7ffff7de7ab0	140737351940784
rsi            0x1	1
rdi            0x7ffff7ffe168	140737354129768
rbp            0x0	0x0
rsp            0x7fffffffe430	0x7fffffffe430
r8             0x7ffff7ffe6f8	140737354131192
r9             0x0	0
r10            0x4f	79
r11            0x7ffff7b95300	140737349505792
r12            0x400430	4195376
r13            0x7fffffffe430	140737488348208
r14            0x0	0
r15            0x0	0
rip            0x400430	0x400430 <_start>
eflags         0x202	[ IF ]
cs             0x33	51
ss             0x2b	43
ds             0x0	0
es             0x0	0
fs             0x0	0
gs             0x0	0

Now looking at our disassembly, we can tell that it is moving some values into registers and getting ready to call __libc_start_main().

(gdb) disas /rm _start
Dump of assembler code for function _start:
=> 0x0000000000400430 <+0>:	31 ed	xor    %ebp,%ebp
   0x0000000000400432 <+2>:	49 89 d1	mov    %rdx,%r9
   0x0000000000400435 <+5>:	5e	pop    %rsi
   0x0000000000400436 <+6>:	48 89 e2	mov    %rsp,%rdx
   0x0000000000400439 <+9>:	48 83 e4 f0	and    $0xfffffffffffffff0,%rsp
   0x000000000040043d <+13>:	50	push   %rax
   0x000000000040043e <+14>:	54	push   %rsp
   0x000000000040043f <+15>:	49 c7 c0 00 06 40 00	mov    $0x400600,%r8
   0x0000000000400446 <+22>:	48 c7 c1 90 05 40 00	mov    $0x400590,%rcx
   0x000000000040044d <+29>:	48 c7 c7 26 05 40 00	mov    $0x400526,%rdi
   0x0000000000400454 <+36>:	e8 b7 ff ff ff	callq  0x400410 <__libc_start_main@plt>
   0x0000000000400459 <+41>:	f4	hlt    
End of assembler dump.

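Reverse engineering that seems pretty tedious to me. The best thing about Linux is that I don't need to step through this line by line, picking apart assembly. I can actually go pull the source code and figure it out that way. (Ubuntu 16.04 ships glibc 2.23, so the 2.26 source fetched here can differ in small details from the code we are debugging.) So let's do that: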

wget http://archive.ubuntu.com/ubuntu/pool/main/g/glibc/glibc_2.26.orig.tar.xz
tar xpvf glibc_2.26.orig.tar.xz

After a few grep searches, look what I came across:

[0][mike@virtual-box-home.] [19:10:28] [~/code/c]
>cat glibc-2.26/sysdeps/x86_64/start.S
/* snipped comments */
#include <sysdep.h>

ENTRY (_start)
	/* Clearing frame pointer is insufficient, use CFI.  */
	cfi_undefined (rip)
	/* Clear the frame pointer.  The ABI suggests this be done, to mark
	   the outermost frame obviously.  */
	xorl %ebp, %ebp

	/* Extract the arguments as encoded on the stack and set up
	   the arguments for __libc_start_main (int (*main) (int, char **, char **),
		   int argc, char *argv,
		   void (*init) (void), void (*fini) (void),
		   void (*rtld_fini) (void), void *stack_end).
	   The arguments are passed via registers and on the stack:
	main:		%rdi
	argc:		%rsi
	argv:		%rdx
	init:		%rcx
	fini:		%r8
	rtld_fini:	%r9
	stack_end:	stack.	*/

	mov %RDX_LP, %R9_LP	/* Address of the shared library termination
				   function.  */
#ifdef __ILP32__
	mov (%rsp), %esi	/* Simulate popping 4-byte argument count.  */
	add $4, %esp
#else
	popq %rsi		/* Pop the argument count.  */
#endif
	/* argv starts just at the current stack top.  */
	mov %RSP_LP, %RDX_LP
	/* Align the stack to a 16 byte boundary to follow the ABI.  */
	and  $~15, %RSP_LP

	/* Push garbage because we push 8 more bytes.  */
	pushq %rax

	/* Provide the highest stack address to the user code (for stacks
	   which grow downwards).  */
	pushq %rsp

#ifdef SHARED
	/* Pass address of our own entry points to .fini and .init.  */
	mov __libc_csu_fini@GOTPCREL(%rip), %R8_LP
	mov __libc_csu_init@GOTPCREL(%rip), %RCX_LP

	mov main@GOTPCREL(%rip), %RDI_LP
#else
	/* Pass address of our own entry points to .fini and .init.  */
	mov $__libc_csu_fini, %R8_LP
	mov $__libc_csu_init, %RCX_LP

	mov $main, %RDI_LP
#endif

	/* Call the user's main function, and exit with its value.
	   But let the libc call main.  Since __libc_start_main in
	   libc.so is called very early, lazy binding isn't relevant
	   here.  Use indirect branch via GOT to avoid extra branch
	   to PLT slot.  In case of static executable, ld in binutils
	   2.26 or above can convert indirect branch into direct
	   branch.  */
	call *__libc_start_main@GOTPCREL(%rip)

	hlt			/* Crash if somehow `exit' does return.	 */
END (_start)

/* Define a symbol for the first piece of initialized data.  */
	.data
	.globl __data_start
__data_start:
	.long 0
	.weak data_start
	data_start = __data_start

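Looking at this, ENTRY (_start) looks pretty interesting: apart from the final call, which goes through the PLT in our binary rather than the GOT, the non-SHARED path lines up instruction for instruction with the disassembly we saw earlier.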

Gone spelunking

Let's continue, using step to get to our next breakpoint:

(gdb) step
Single stepping until exit from function _start,
which has no line number information.

Breakpoint 3, __libc_start_main (main=0x400526 <main>, argc=1, argv=0x7fffffffe438, init=0x400590 <__libc_csu_init>, fini=0x400600 <__libc_csu_fini>, rtld_fini=0x7ffff7de7ab0 <_dl_fini>, 
    stack_end=0x7fffffffe428) at ../csu/libc-start.c:134
134	../csu/libc-start.c: No such file or directory.

Disassemble __libc_start_main()

Dump of assembler code for function __libc_start_main:
=> 0x00007ffff7a2d740 <+0>:	push   %r14
   0x00007ffff7a2d742 <+2>:	push   %r13
   0x00007ffff7a2d744 <+4>:	push   %r12
   *** snipped ***
   0x00007ffff7a2d900 <+448>:	cmp    %r12d,%r14d
   0x00007ffff7a2d903 <+451>:	jne    0x7ffff7a2d8df <__libc_start_main+415>
   0x00007ffff7a2d905 <+453>:	jmpq   0x7ffff7a2d7d6 <__libc_start_main+150>
End of assembler dump.

The disassembly runs to about 150 lines, so again it is easier to just go get the __libc_start_main() source code. The only thing I'm really interested in looking at is the syscall.

Reverse Engineering is hard.

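So again, I didn't bother trying to reverse engineer the entire __libc_start_main(); instead I just found the source code. Note that the file below is the PowerPC-specific wrapper, which is why its signature differs from the arguments gdb printed above. On x86_64 the function we stopped in is the generic one in csu/libc-start.c, which this wrapper includes and calls as generic_start_main. The source code is below: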

[0][mike@virtual-box-home.] [19:47:49] [~/code/c]
>cat glibc-2.26/sysdeps/unix/sysv/linux/powerpc/libc-start.c
/* snipped */
/* The main work is done in the generic function.  */
#define LIBC_START_MAIN generic_start_main
#define LIBC_START_DISABLE_INLINE
#define LIBC_START_MAIN_AUXVEC_ARG
#define MAIN_AUXVEC_ARG
#define INIT_MAIN_ARGS
#include <csu/libc-start.c>

struct startup_info
  {
    void *sda_base;
    int (*main) (int, char **, char **, void *);
    int (*init) (int, char **, char **, void *);
    void (*fini) (void);
  };

int
__libc_start_main (int argc, char **argv,
		   char **ev,
		   ElfW (auxv_t) * auxvec,
		   void (*rtld_fini) (void),
		   struct startup_info *stinfo,
		   char **stack_on_entry)
{
  /* the PPC SVR4 ABI says that the top thing on the stack will
     be a NULL pointer, so if not we assume that we're being called
     as a statically-linked program by Linux...  */
  if (*stack_on_entry != NULL)
    {
      char **temp;
      /* ...in which case, we have argc as the top thing on the
         stack, followed by argv (NULL-terminated), envp (likewise),
         and the auxiliary vector.  */
      /* 32/64-bit agnostic load from stack */
      argc = *(long int *) stack_on_entry;
      argv = stack_on_entry + 1;
      ev = argv + argc + 1;
#ifdef HAVE_AUX_VECTOR
      temp = ev;
      while (*temp != NULL)
	++temp;
      auxvec = (ElfW (auxv_t) *)++ temp;
#endif
      rtld_fini = NULL;
    }

  /* Initialize the __cache_line_size variable from the aux vector.  For the
     static case, we also need _dl_hwcap, _dl_hwcap2 and _dl_platform, so we
     can call __tcb_parse_hwcap_and_convert_at_platform ().  */
  for (ElfW (auxv_t) * av = auxvec; av->a_type != AT_NULL; ++av)
    switch (av->a_type)
      {
      case AT_DCACHEBSIZE:
	__cache_line_size = av->a_un.a_val;
	break;
#ifndef SHARED
      case AT_HWCAP:
	_dl_hwcap = (unsigned long int) av->a_un.a_val;
	break;
      case AT_HWCAP2:
	_dl_hwcap2 = (unsigned long int) av->a_un.a_val;
	break;
      case AT_PLATFORM:
	_dl_platform = (void *) av->a_un.a_val;
	break;
#endif
      }

  /* Initialize hwcap/hwcap2 and platform data so it can be copied to
     the TCB later in __libc_setup_tls (). (static case only).  */
#ifndef SHARED
  __tcb_parse_hwcap_and_convert_at_platform ();
#endif

  return generic_start_main (stinfo->main, argc, argv, auxvec,
			     stinfo->init, stinfo->fini, rtld_fini,
			     stack_on_entry);
}