Applied Pokology - Using maps in GNU poke

 

     _____
 ---'   __\_______
            ______)         Using maps in GNU poke
            __)             
           __)
 ---._______)

                                                          Jose E. Marchesi
                                                          February 24, 2021

Table of Contents
_________________

1. Editing data using variables
2. Maps and map-files
3. Loading maps
4. Multiple perspectives of the same data
5. Auto-map
6. Creating and managing maps on the fly
7. Predefined maps




1 Editing data using variables
==============================

  Editing data with GNU poke mainly involves creating mapped values and
  storing them in Poke variables.  However, this becomes inconvenient
  when poking several files simultaneously, or as the complexity of
  the data increases.

  For example, if we were interested in altering the fields of the
  header in an ELF file, we would map an `Elf64_Ehdr' struct at the
  beginning of the underlying IO space (the file), as in:

  ,----
  | (poke) .file foo.o
  | (poke) load elf
  | (poke) var ehdr = Elf64_Ehdr @ 0#B
  `----


  At this point the variable `ehdr' holds an `Elf64_Ehdr' structure,
  which is mapped.  As such, altering any of the fields of the struct
  will update the corresponding bytes in `foo.o'.  For example:

  ,----
  | (poke) ehdr.e_entry = 0#B
  `----


  A Poke value has three mapping-related attributes: whether it is
  mapped, the offset at which it is mapped in an IO space, and in which
  IO space.  This information is accessible to both the user and Poke
  programs through the following attributes:

  ,----
  | (poke) ehdr'mapped
  | 1
  | (poke) ehdr'offset
  | 0UL#b
  | (poke) ehdr'ios
  | 0
  `----


  That is, `ehdr' is mapped at offset zero in the IO space `#0',
  which corresponds to `foo.o':

  ,----
  | (poke) .info ios
  |   Id   Type   Mode   Size           Name
  | * #0   FILE   rw     0x000004c8#B   ./foo.o
  `----
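

  Since these attributes are also available to Poke programs, a small
  helper can report where a header lives.  What follows is only a
  sketch: the function name `where_is' is made up for this example,
  and it assumes the `elf' pickle is already loaded:

  ,----
  | /* Hypothetical helper, not part of poke: describe the mapping
  |    of an ELF header using its mapping attributes.  */
  | fun where_is = (Elf64_Ehdr hdr) void:
  | {
  |   if (hdr'mapped)
  |     printf ("mapped at %v in IO space #%i32d\n", hdr'offset, hdr'ios);
  |   else
  |     printf ("not mapped\n");
  | }
  `----


  With this defined, calling `where_is (ehdr)' at the prompt should
  describe the mapping of the header shown above.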


  Now that we have the ELF header, we may use it to get access to the
  ELF section header table in the file, which we will reference using
  another variable `shdr':

  ,----
  | (poke) var shdr = Elf64_Shdr[ehdr.e_shnum] @ ehdr.e_shoff
  | (poke) shdr[1]
  | Elf64_Shdr {
  |   sh_name=0x1bU#B,
  |   sh_type=0x1U,
  |   sh_flags=#<ALLOC,EXECINSTR>,
  |   sh_addr=0x0UL#B,
  |   sh_offset=0x40UL#B,
  |   sh_size=0xbUL#B,
  |   sh_link=0x0U,
  |   sh_info=0x0U,
  |   sh_addralign=0x1UL,
  |   sh_entsize=0x0UL#b
  | }
  `----


  Variables are convenient entities to manipulate in Poke.  Let's
  suppose that the file has a lot of sections and we want to do some
  transformation in every section.  It is a time consuming operation,
  and we may forget which sections we have already processed and which
  not. We could create an empty array to hold the sections already
  processed:

  ,----
  | (poke) var processed = Elf64_Shdr[] ()
  `----


  And then, once we have processed some given section, add it to the
  array:

  ,----
  | ... edit shdr[23] ...
  | (poke) processed += [shdr[23]]
  `----


  Note how the array `processed' is not mapped, but the sections
  contained in it are mapped: Poke uses copy by shared value.  So, after
  we spend the day carefully poking our ELF file, we can ask poke, are
  we done with all the sections in the file?

  ,----
  | (poke) shdr'length == processed'length
  | 1
  `----


  Yes, we are.  This can be made as sophisticated as desired.  We could
  easily write a function that saves the contents of `processed' in
  files, so we can continue hacking tomorrow, for example.
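
  As a more modest example of such sophistication, a function can
  list the sections that are still pending.  This is a sketch that
  relies on the point made above, namely that the elements appended to
  `processed' remain mapped and can therefore be compared by offset.
  The name `pending_sections' is invented here:

  ,----
  | /* Hypothetical helper: print the index of every section header in
  |    SECTIONS that has no counterpart, by offset, in SEEN.  */
  | fun pending_sections = (Elf64_Shdr[] sections, Elf64_Shdr[] seen) void:
  | {
  |   var i = 0;
  |   for (s in sections)
  |     {
  |       var found = 0;
  |       for (p in seen)
  |         if (p'offset == s'offset)
  |           found = 1;
  |       if (!found)
  |         printf ("section %i32d still pending\n", i);
  |       i = i + 1;
  |     }
  | }
  `----


  It would be invoked as `pending_sections (shdr, processed)'.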

  We can then conclude that using mapped variables to edit data
  structures stored in IO spaces works well in common and simple cases
  like the above: we make our way mapping here and there, defining
  variables to hold data that interests us, and it is easy to remember
  that the variables `ehdr' and `shdr' are mapped, where they are
  mapped, and that they are mapped in the file `foo.o'.

  However, GNU poke allows editing more than one IO space
  simultaneously.  Let's say we now want to poke the sections of another
  ELF file: `bar.o'.  We would start by opening the file:

  ,----
  | (poke) .file bar.o
  | (poke) .info ios
  |   Id   Type   Mode   Size           Name
  | * #1   FILE   rw     0x000004c8#B   ./bar.o
  |   #0   FILE   rw     0x000004c8#B   ./foo.o
  `----


  Now that `bar.o' is the current IO space, we can map its header.  But
  what variable should we use?  We would rather not redefine `ehdr',
  because that is already holding the header of `foo.o'.  We could adapt
  our naming scheme on the fly:

  ,----
  | (poke) var foo_ehdr = ehdr
  | (poke) var bar_ehdr = Elf64_Ehdr @ 0#B
  `----


  But then we would need to do the same for the other variables too:

  ,----
  | (poke) var foo_shdr = shdr
  | (poke) var bar_shdr = Elf64_Shdr[bar_ehdr.e_shnum] @ bar_ehdr.e_shoff
  `----


  However, we can easily see how this can degenerate quickly: what about
  `processed', for example?  In general, as the number of IO spaces
  being edited increases, it becomes more and more difficult to manage
  our mapped variables, each of which is associated with one IO space.
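
  The map operator does accept an explicit IO space, written before
  the offset and separated from it by a colon, which at least removes
  the dependency on whichever IO space happens to be current.  A
  sketch, assuming `foo.o' and `bar.o' are still the IO spaces `#0'
  and `#1' listed above:

  ,----
  | (poke) var foo_ehdr = Elf64_Ehdr @ 0 : 0#B
  | (poke) var bar_ehdr = Elf64_Ehdr @ 1 : 0#B
  `----


  This makes the mappings unambiguous, but we are still left
  maintaining one set of variables per file by hand.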


2 Maps and map-files
====================

  As we have seen, mapping variables is a very powerful, general and
  flexible means to edit stored binary data in one or more IO spaces.
  However, it is easy to lose track of where the variables are mapped
  and, ideally speaking, we would want to have a means to refer to, say,
  the "ELF header", and get the header as a mapped value regardless of
  what specific file we are editing.  Sort of a "meta variable".  GNU
  poke provides a way to do this: "maps".

  A "map" can be conceived as a sort of "view" that can be applied to a
  given IO space.  Maps have entries, which are values mapped at some
  given offset, under certain conditions.  For example, we have seen
  that an ELF file contains, among other things, a header at the
  beginning of the file and a table of section headers whose size and
  location are determined by the header.  These would be two entries
  of a so-called ELF map.

  poke maps are defined in "map files".  These files use the `.map'
  extension.  A map file `self.map' (for sectioned/simple elf) defining
  the view of an ELF file as a header and a table of section headers
  would look like this:

  ,----
  | /* self.map - map file for a simplified view of an ELF file.  */
  | 
  | load elf;
  | 
  | %%
  | 
  | %entry
  | %name ehdr
  | %type Elf64_Ehdr
  | %offset 0#B
  | 
  | %entry
  | %name shdr
  | %type Elf64_Shdr[(Elf64_Ehdr @ 0#B).e_shnum]
  | %condition (Elf64_Ehdr @ 0#B).e_shnum > 0
  | %offset (Elf64_Ehdr @ 0#B).e_shoff
  `----


  This map file defines a view of an ELF file as a header entry `ehdr'
  and an entry with a table of section headers `shdr'.

  The first section of the file, which spans until the separator line
  containing `%%', is arbitrary Poke code which, as we shall see, gets
  evaluated before the map entries are processed.  This is called the
  map "prologue".  In this case, the prologue contains a comment
  explaining the purpose of the file, and a single statement `load' that
  loads the `elf.pk' pickle, since the entries below use definitions
  like `Elf64_Ehdr' that are defined by that pickle.  The prologue is
  useful to define Poke functions and other entities that are then used
  in the definitions of the entries.

  A separator line containing only `%%' separates the prologue from the
  next section, which is a list of entry definitions.  Each entry
  definition starts with a line `%entry', and has the following
  attributes:

  - A `%name', like `ehdr' and `shdr'.  These names should follow the
    same rules as Poke variables, but as we shall see later, map
    entries are not Poke variables.  This attribute is mandatory.

  - A `%type'.  This can be any Poke expression denoting a type, like
    `int', `Elf64_Ehdr' or `Elf64_Shdr[(Elf64_Ehdr @ 0#B).e_shnum]'.
    This attribute is mandatory.

  - A `%condition' which, if specified, determines whether to include
    the entry in the map.  In the example above, the map will have an
    entry `shdr' only if the ELF file has one or more sections.  Any
    Poke expression evaluating to a boolean can be used as a condition.
    This attribute is optional: entries not having a condition will
    always be included in the map.

  - An `%offset' in the IO space, where the entry will be mapped.  Any
    Poke expression evaluating to an offset can be used as entry offset.
    This attribute is mandatory.
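
  To illustrate how the prologue and the entries fit together, here is
  an untested variation of `self.map' in which the repeated
  `Elf64_Ehdr @ 0#B' expression is factored into a prologue function.
  The helper name `ehdr_at' is ours and not something provided by poke
  or by the `elf' pickle:

  ,----
  | /* self2.map - hypothetical variant of self.map.  */
  | 
  | load elf;
  | 
  | /* Map an ELF header at OFF in the current IO space.  */
  | fun ehdr_at = (offset<uint<64>,B> off) Elf64_Ehdr:
  | {
  |   return Elf64_Ehdr @ off;
  | }
  | 
  | %%
  | 
  | %entry
  | %name ehdr
  | %type Elf64_Ehdr
  | %offset 0#B
  | 
  | %entry
  | %name shdr
  | %type Elf64_Shdr[ehdr_at (0#B).e_shnum]
  | %condition ehdr_at (0#B).e_shnum > 0
  | %offset ehdr_at (0#B).e_shoff
  `----


  The entries are the same as before.  Only the way the header is
  obtained inside the expressions changes.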


3 Loading maps
==============

  So we have written our `self.map', which denotes a view or structure
  of ELF files we are interested in, and which resides in the current
  working directory.  How do we use it?

  The first step is to fire up poke and open some object file.  Let's
  start with `foo.o':

  ,----
  | (poke) .file foo.o
  `----


  Now, we can load the map using the `.map load' dot-command:

  ,----
  | (poke) .map load self
  | [self](poke)
  `----


  The `.map load self' command makes poke look in certain directories
  for a file called `self.map', and load it.  The list of directories
  where poke looks for map files is encoded in the variable
  `map_load_path' as a string containing a possibly empty list of
  directories separated by `:' characters.  Each directory is tried in
  turn.  This variable is initialized with suitable defaults:

  ,----
  | (poke) map_load_path
  | "/home/jemarch/.poke.d:.:/home/jemarch/.local/share/poke:/home/jemarch/gnu/hacks/poke/maps"
  `----


  Once a map is loaded, observe how the prompt changed to contain a
  prefix `[self]'.  This means that the map `self' is loaded for the
  current IO space.  You can choose to not see this information in the
  prompt by setting the `prompt-maps' option either at the prompt or in
  your `.pokerc':

  ,----
  | (poke) .set prompt-maps no
  `----


  By default `prompt-maps' is `yes'.  This prompt aid is intended to
  provide a cursory look at the "views" or maps loaded for the current
  IO space.  If we open another IO space and switch between the two,
  the prompt changes accordingly:

  ,----
  | [self](poke) .mem foo
  | The current IOS is now `*foo*'.
  | (poke) .ios #0
  | The current IOS is now `./foo.o'.
  | [self](poke) 
  `----


  At any time the `.info maps' dot-command can be used to obtain a full
  list of loaded maps, with more information about them:

  ,----
  | (poke) .info maps
  | IOS   Name   Source
  | #0    self   ./self.map
  `----


  In this case, there is a map `self' loaded in the IO space `#0', which
  corresponds to `foo.o'.

  Once we make `foo.o' our current IO space, we can ask poke to show us
  the entries corresponding to this map using another dot-command:

  ,----
  | (poke) .map show self
  | Offset     Entry
  | 0x0UL#B    $self::ehdr
  | 0x208UL#B  $self::shdr
  `----


  This tells us there are two entries for `self' in `foo.o':
  `$self::ehdr' and `$self::shdr'.  Note how map entries use names that
  start with the `$' character, then contain the name of the map and the
  name of the entry we defined in the map file, separated by `::'.

  We can now use these entries at the prompt as if they were regular
  mapped variables:

  ,----
  | [self](poke) $self::ehdr
  | Elf64_Ehdr {
  |   e_ident=struct {
  |     ei_mag=[0x7fUB,0x45UB,0x4cUB,0x46UB],
  |     [...]
  |   },
  |  e_type=0x1UH,
  |  e_machine=0x3eUH,
  |  [...]
  | }
  | [self](poke) $self::shdr'length
  | 11UL
  `----


  It is important to note, however, that map entries like `$foo::bar'
  are *not* part of the Poke language, and are only available when using
  poke interactively.  Poke programs and scripts can't use them.

  Let's now open another ELF file, and load the `self' map in it:

  ,----
  | (poke) .file /usr/local/lib/libpoke.so.0.0.0 
  | (poke) .map load self
  | [self](poke)
  `----


  So now we have two ELF files loaded in poke: `foo.o' and
  `libpoke.so.0.0.0', and in both IO spaces we have the `self' map
  loaded.  We can easily see that the map entries are different
  depending on the current IO space:

  ,----
  | [self](poke) .map show self
  | Offset       Entry
  | 0UL#B        $self::ehdr
  | 3158952UL#B  $self::shdr
  | [self](poke) .ios #0
  | The current IOS is now `./foo.o'.
  | [self](poke) .map show self
  | Offset   Entry
  | 0UL#B    $self::ehdr
  | 520UL#B  $self::shdr
  `----


  `foo.o' is an object file, whereas `libpoke.so.0.0.0' is a DSO:

  ,----
  | [self](poke) .ios #0
  | The current IOS is now `./foo.o'.
  | [self](poke) $self::ehdr.e_type
  | 1UH
  | [self](poke) .ios #2
  | The current IOS is now `/usr/local/lib/libpoke.so.0.0.0'.
  | [self](poke) $self::ehdr.e_type
  | 3UH
  `----


  The interpretation of the map entry `$self::ehdr' is different
  depending on the current IO space.  This makes it possible to refer to
  the "ELF header" of the current file.

  Underneath, poke implements this by defining mapped variables and
  "redirecting" the entry names `$foo::bar' to the right variable
  depending on the IO space that is currently selected.  It hides all
  that complexity from us.


4 Multiple perspectives of the same data
========================================

  It is perfectly possible (and useful!) to load more than one map in
  the same IO space.  It is very natural for a single file, for example,
  to contain data that can be interpreted in several ways, or of
  different nature.

  Let's, for example, open an ELF file again, this time one compiled
  with `-g':

  ,----
  | (poke) .file foo.o
  `----


  We now load our `self' map, to get a view of the file as a collection
  of sections:

  ,----
  | (poke) .map load self
  | [self](poke)
  `----


  And now we load the `dwarf' map that comes with poke, to get a view of
  the file as having debugging information encoded in DWARF:

  ,----
  | [self](poke) .map load dwarf
  | [dwarf,self](poke) 
  `----


  See how the prompt now reflects the fact that the current IO space
  contains DWARF info!  Let's take a look:

  ,----
  | [dwarf,self](poke) .info maps
  | IOS   Name    Source
  | #0    dwarf   /home/jemarch/gnu/hacks/poke/maps/dwarf.map
  | #0    self    ./self.map
  | [dwarf,self](poke) .map show dwarf
  | Offset    Entry
  | 0x5bUL#B  $dwarf::info
  `----


  Now we can access entries from any of the loaded maps, i.e. access the
  file in terms of different perspectives.  As an ELF file:

  ,----
  | [dwarf,self](poke) $self::shdr[1]
  | Elf64_Shdr {
  |   sh_name=0xb5U#B,
  |   sh_type=0x11U,
  |   sh_flags=#<>,
  |   sh_addr=0x0UL#B,
  |   sh_offset=0x40UL#B,
  |   sh_size=0x8UL#B,
  |   sh_link=0x18U,
  |   sh_info=0xfU,
  |   sh_addralign=0x4UL,
  |   sh_entsize=0x4UL#b
  | }
  `----


  And as a file containing DWARF info:

  ,----
  | [dwarf,self](poke) $dwarf::info
  | Dwarf_CU_Header {
  |   unit_length=#<0x0000004eU#B>,
  |   version=0x4UH,
  |   debug_abbrev_offset=#<0x00000000U#B>,
  |   address_size=0x8UB#B
  | }
  `----


  If you are curious about how the DWARF entries are defined, look at
  `maps/dwarf.map' in the poke source distribution, or in your installed
  poke (`.info maps' will tell you the file the map got loaded from.)

  It is possible to unload or remove a map from a given IO space using
  the `.map remove' dot-command.  Say we are done looking at the DWARF
  in `foo.o', and we are no longer interested in it as a file containing
  debugging info.  We can do:

  ,----
  | [dwarf,self](poke) .map remove dwarf
  | [self](poke) 
  `----


  Note how the prompt was updated accordingly: only `self' remains as a
  loaded map on this file.


5 Auto-map
==========

  Certain maps make sense when editing certain types of data.  For
  example, `dwarf.map' is intended to be used with ELF files.  To make
  maps easier to use, poke provides a feature called "auto mapping",
  which is disabled by default.

  You can set auto mapping like this:

  ,----
  | (poke) .set auto-map yes
  `----


  When auto mapping is enabled, poke will look at the value of the
  pre-defined variable `auto_map', which must contain an array of pairs
  of strings, each associating a regular expression with a map name.

  For example, you may want to initialize `auto_map' like this in your
  `.pokerc' file:

  ,----
  | auto_map = [[".*\\.mp3$", "mp3"],
  |             [".*\\.o$", "elf"],
  |             ["a\\.out$", "elf"]];
  `----


  This will make poke load `mp3.map' for every file whose name ends
  with ".mp3", and `elf.map' for files having names like `foo.o' and
  `a.out'.

  Following the usual pokeish philosophy of being as unintrusive as
  possible by default, the default value of `auto_map' is the empty
  array.


6 Creating and managing maps on the fly
=======================================

  As we have seen, we can define our own maps by writing map files
  like `self.map', which contain a prologue and a set of map entries.
  However, sometimes it is useful to create maps "on the fly" while we
  explore some data with poke.

  To make this possible, poke provides a suitable set of dot-commands.
  Let's say we are poking some data, and we want to create a map for it.
  We can do that like this:

  ,----
  | (poke) .map create mymap
  `----


  This creates an empty map named `mymap', with no entries:

  ,----
  | [mymap](poke) .map show mymap
  | Offset   Entry
  `----


  Adding entries is easy.  First, we have to map some variable, and then
  use it as the base for the new entry:

  ,----
  | [mymap](poke) var foo = int[3] @ 0#B
  | [mymap](poke) .map entry add mymap, foo
  | [mymap](poke) .map show mymap
  | Offset   Entry
  | 0x0UL#B  $mymap::foo
  `----


  Note how the entry `$mymap::foo' gets created, associated with the
  current IO space and mapped at the same offset as the variable
  `foo'.

  We can remove entries from existing maps using the `.map entry remove'
  dot-command:

  ,----
  | [mymap](poke) .map entry remove mymap, foo
  | [mymap](poke) .map show mymap
  | Offset   Entry
  | [mymap](poke)
  `----


  We plan to add an additional command to save maps to map files.  The
  idea is that you can create your maps on the fly, save them, and then
  load them back some other day when you are ready to continue poking.
  This is not implemented yet though.


7 Predefined maps
=================

  GNU poke comes with a set of useful pre-written maps, which get
  installed in a system location.  We want to expand this collection, so
  please send us your map files!

Happy poking! :)