Applied Pokology

Using maps in GNU poke

Jose E. Marchesi
February 24, 2021

Table of Contents
_________________

1. Editing data using variables
2. Maps and map-files
3. Loading maps
4. Multiple perspectives of the same data
5. Auto-map
6. Creating and managing maps on the fly
7. Predefined maps


1 Editing data using variables
==============================

Editing data with GNU poke mainly involves creating mapped values and storing them in Poke variables.  However, this may not be that convenient when poking several files simultaneously, and when the complexity of the data increases.

For example, if we were interested in altering the fields of the header in an ELF file, we would map an `Elf64_Ehdr' struct at the beginning of the underlying IO space (the file), like in:

,----
| (poke) .file foo.o
| (poke) load elf
| (poke) var ehdr = Elf64_Ehdr @ 0#B
`----

At this point the variable `ehdr' holds an `Elf64_Ehdr' structure, which is mapped.  As such, altering any of the fields of the struct will update the corresponding bytes in `foo.o'.  For example:

,----
| (poke) ehdr.e_entry = 0#B
`----

A Poke value has three mapping related attributes: whether it is mapped, the offset at which it is mapped in an IO space, and in which IO space.
This information is accessible to both the user and Poke programs using the following attributes:

,----
| (poke) ehdr'mapped
| 1
| (poke) ehdr'offset
| 0UL#b
| (poke) ehdr'ios
| 0
`----

That is, `ehdr' is mapped at offset zero bytes in the IO space `#0', which corresponds to `foo.o':

,----
| (poke) .info ios
|   Id   Type   Mode   Size           Name
| * #0   FILE   rw     0x000004c8#B   ./foo.o
`----

Now that we have the ELF header, we may use it to get access to the ELF section header table in the file, which we will reference using another variable `shdr':

,----
| (poke) var shdr = Elf64_Shdr[ehdr.e_shnum] @ ehdr.e_shoff
| (poke) shdr[1]
| Elf64_Shdr {
|   sh_name=0x1bU#B,
|   sh_type=0x1U,
|   sh_flags=#<ALLOC,EXECINSTR>,
|   sh_addr=0x0UL#B,
|   sh_offset=0x40UL#B,
|   sh_size=0xbUL#B,
|   sh_link=0x0U,
|   sh_info=0x0U,
|   sh_addralign=0x1UL,
|   sh_entsize=0x0UL#b
| }
`----

Variables are convenient entities to manipulate in Poke.  Let's suppose that the file has a lot of sections and we want to do some transformation in every section.  It is a time consuming operation, and we may forget which sections we have already processed and which we have not.  We could create an empty array to hold the sections already processed:

,----
| (poke) var processed = Elf64_Shdr[] ()
`----

And then, once we have processed some given section, add it to the array:

,----
| ... edit shdr[23] ...
| (poke) processed += [shdr[23]]
`----

Note how the array `processed' is not mapped, but the sections contained in it are mapped: Poke uses copy by shared value.

So, after we spend the day carefully poking our ELF file, we can ask poke: are we done with all the sections in the file?

,----
| (poke) shdr'length == processed'length
| 1
`----

Yes, we are.  This can be made as sophisticated as desired.  We could easily write a function that saves the contents of `processed' in files, so we can continue hacking tomorrow, for example.

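The bookkeeping above can also be wrapped in a small helper.  What follows is only a minimal sketch and not something that ships with poke: the function name `is_processed' and its argument names are made up for illustration, and it assumes that every section header it receives is mapped, so that the `'ios' and `'offset' attributes can be queried.

,----
| /* Hypothetical helper: return 1 if the mapped section header S
|    is already recorded in SEEN, 0 otherwise.  Two headers are
|    considered the same if they are mapped at the same offset of
|    the same IO space.  */
| fun is_processed = (Elf64_Shdr[] seen, Elf64_Shdr s) int:
| {
|   for (d in seen)
|     if (d'ios == s'ios && d'offset == s'offset)
|       return 1;
|   return 0;
| }
`----

With such a function defined, one could ask about a particular section before editing it again:

,----
| (poke) is_processed (processed, shdr[23])
`----
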
We can then conclude that using mapped variables to edit data structures stored in IO spaces works well in common and simple cases like the above: we make our way mapping here and there, defining variables to hold data that interests us, and it is easy to remember that the variables `ehdr' and `shdr' are mapped, where they are mapped, and that they are mapped in the file `foo.o'.

However, GNU poke allows editing more than one IO space simultaneously.  Let's say we now want to poke the sections of another ELF file: `bar.o'.  We would start by opening the file:

,----
| (poke) .file bar.o
| (poke) .info ios
|   Id   Type   Mode   Size           Name
| * #1   FILE   rw     0x000004c8#B   ./bar.o
|   #0   FILE   rw     0x000004c8#B   ./foo.o
`----

Now that `bar.o' is the current IO space, we can map its header.  But now, what variable to use?  We would rather not redefine `ehdr', because that is already holding the header of `foo.o'.  We could adapt our naming scheme on the fly:

,----
| (poke) var foo_ehdr = ehdr
| (poke) var bar_ehdr = Elf64_Ehdr @ 0#B
`----

But then we would need to do the same for the other variables too:

,----
| (poke) var foo_shdr = shdr
| (poke) var bar_shdr = Elf64_Shdr[bar_ehdr.e_shnum] @ bar_ehdr.e_shoff
`----

However, we can easily see how this can degenerate quickly: what about `processed', for example?  In general, as the number of IO spaces being edited increases, it becomes more and more difficult to manage our mapped variables, each of which is associated with a particular IO space.


2 Maps and map-files
====================

As we have seen, mapping variables is a very powerful, general and flexible means to edit stored binary data in one or more IO spaces.  However, it is easy to lose track of where the variables are mapped and, ideally speaking, we would want to have a means to refer to, say, the "ELF header", and get the header as a mapped value regardless of what specific file we are editing.  Sort of a "meta variable".  GNU poke provides a way to do this: "maps".

A "map" can be conceived as a sort of "view" that can be applied to a given IO space.  Maps have entries, which are values mapped at some given offset, under certain conditions.  For example, we have seen that an ELF file contains, among other things, a header at the beginning of the file and a table of section headers of a certain size, located at a certain location determined by the header.  These would be two entries of a so-called ELF map.

poke maps are defined in "map files".  These files use the `.map' extension.  A map file `self.map' (for sectioned/simple elf) defining the view of an ELF file as a header and a table of section headers would look like this:

,----
| /* self.map - map file for a simplified view of an ELF file.  */
|
| load elf;
|
| %%
|
| %entry
| %name ehdr
| %type Elf64_Ehdr
| %offset 0#B
|
| %entry
| %name shdr
| %type Elf64_Shdr[(Elf64_Ehdr @ 0#B).e_shnum]
| %condition (Elf64_Ehdr @ 0#B).e_shnum > 0
| %offset (Elf64_Ehdr @ 0#B).e_shoff
`----

This map file defines a view of an ELF file as a header entry `ehdr' and an entry with a table of section headers `shdr'.

The first section of the file, which spans until the separator line containing `%%', is arbitrary Poke code which, as we shall see, gets evaluated before the map entries are processed.  This is called the map "prologue".  In this case, the prologue contains a comment explaining the purpose of the file, and a single statement `load' that loads the `elf.pk' pickle, since the entries below use definitions like `Elf64_Ehdr' that are defined by that pickle.  The prologue is useful to define Poke functions and other entities that are then used in the definitions of the entries.

A separator line containing only `%%' separates the prologue from the next section, which is a list of entry definitions.  Each entry definition starts with a line `%entry', and has the following attributes:

- A `%name', like `ehdr' and `shdr'.
  These names should follow the same rules as Poke variables, but as we shall see later, map entries are not Poke variables.  This attribute is mandatory.

- A `%type'.  This can be any Poke expression denoting a type, like `int', `Elf64_Ehdr' or `Elf64_Shdr[(Elf64_Ehdr @ 0#B).e_shnum]'.  This attribute is mandatory.

- A `%condition', if specified, will determine whether to include the entry in the map.  In the example above, the map will have an entry `shdr' only if the ELF file has one or more sections.  Any Poke expression evaluating to a boolean can be used as a condition.  This attribute is optional: entries not having a condition will always be included in the map.

- An `%offset' in the IO space, where the entry will be mapped.  Any Poke expression evaluating to an offset can be used as the entry offset.  This attribute is mandatory.


3 Loading maps
==============

So we have written our `self.map', which denotes a view or structure of ELF files we are interested in, and which resides in the current working directory.  How do we use it?

The first step is to fire up poke and open some object file.  Let's start with `foo.o':

,----
| (poke) .file foo.o
`----

Now, we can load the map using the `.map load' dot-command:

,----
| (poke) .map load self
| [self](poke)
`----

The `.map load self' command makes poke look in certain directories for a file called `self.map', and load it.  The list of directories where poke looks for map files is encoded in the variable `map_load_path' as a string containing a possibly empty list of directories separated by `:' characters.  Each directory is tried in turn.  This variable is initialized with suitable defaults:

,----
| (poke) map_load_path
| "/home/jemarch/.poke.d:.:/home/jemarch/.local/share/poke:/home/jemarch/gnu/hacks/poke/maps"
`----

Once a map is loaded, observe how the prompt changed to contain a prefix `[self]'.  This means that the map `self' is loaded for the current IO space.
You can choose not to see this information in the prompt by setting the `prompt-maps' option either at the prompt or in your `.pokerc':

,----
| (poke) .set prompt-maps no
`----

By default `prompt-maps' is `yes'.  This prompt aid is intended to provide a cursory look at the "views" or maps loaded for the current IO space.  If we load another IO space and switch to it, the prompt changes accordingly:

,----
| [self](poke) .mem foo
| The current IOS is now `*foo*'.
| (poke) .ios #0
| The current IOS is now `./foo.o'.
| [self](poke)
`----

At any time the `.info maps' dot-command can be used to obtain a full list of loaded maps, with more information about them:

,----
| (poke) .info maps
| IOS   Name   Source
| #0    self   ./self.map
`----

In this case, there is a map `self' loaded in the IO space `#0', which corresponds to `foo.o'.

Once we make `foo.o' our current IO space, we can ask poke to show us the entries corresponding to this map using another dot-command:

,----
| (poke) .map show self
| Offset      Entry
| 0x0UL#B     $self::ehdr
| 0x208UL#B   $self::shdr
`----

This tells us there are two entries for `self' in `foo.o': `$self::ehdr' and `$self::shdr'.  Note how map entries use names that start with the `$' character, then contain the name of the map and the name of the entry we defined in the map file, separated by `::'.

We can now use these entries at the prompt as if they were regular mapped variables:

,----
| [self](poke) $self::ehdr
| Elf64_Ehdr {
|   e_ident=struct {
|     ei_mag=[0x7fUB,0x45UB,0x4cUB,0x46UB],
|     [...]
|   },
|   e_type=0x1UH,
|   e_machine=0x3eUH,
|   [...]
| }
| (poke) $self::shdr'length
| 11UL
`----

It is important to note, however, that map entries like `$foo::bar' are *not* part of the Poke language, and are only available when using poke interactively.  Poke programs and scripts can't use them.

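Since map entries are not available to Poke programs, a script has to perform the equivalent mappings itself.  The following is a minimal sketch of how that could look, assuming the `elf' pickle is available in the load path and that a file `foo.o' exists in the current directory; the variable names are arbitrary and chosen only for this example.

,----
| load elf;
|
| /* Open the file as an IO space.  `open' returns its identifier.  */
| var fd = open ("foo.o");
|
| /* Map explicitly, in that IO space, what the entries $self::ehdr
|    and $self::shdr provide at the prompt.  */
| var ehdr = Elf64_Ehdr @ fd : 0#B;
| var shdr = Elf64_Shdr[ehdr.e_shnum] @ fd : ehdr.e_shoff;
|
| printf "%u64d section headers\n", shdr'length;
|
| close (fd);
`----

Naming the IO space explicitly in each map operation (`@ fd : OFFSET') keeps the script independent of whatever IO space happens to be the current one.
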
Let's now open another ELF file, and load the `self' map in it:

,----
| (poke) .file /usr/local/lib/libpoke.so.0.0.0
| (poke) .map load self
| [self](poke)
`----

So now we have two ELF files loaded in poke: `foo.o' and `libpoke.so.0.0.0', and in both IO spaces we have the `self' map loaded.  We can easily see that the map entries are different depending on the current IO space:

,----
| [self](poke) .map show self
| Offset         Entry
| 0UL#B          $self::ehdr
| 3158952UL#B    $self::shdr
| [self](poke) .ios #0
| The current IOS is now `./foo.o'.
| [self](poke) .map show self
| Offset         Entry
| 0UL#B          $self::ehdr
| 520UL#B        $self::shdr
`----

`foo.o' is an object file, whereas `libpoke.so.0.0.0' is a DSO:

,----
| (poke) .ios #0
| The current IOS is now `./foo.o'.
| [self](poke) $self::ehdr.e_type
| 1UH
| [self](poke) .ios #2
| The current IOS is now `/usr/local/lib/libpoke.so.0.0.0'.
| [self](poke) $self::ehdr.e_type
| 3UH
`----

The interpretation of the map entry `$self::ehdr' is different depending on the current IO space.  This makes it possible to refer to the "ELF header" of the current file.

Underneath, poke implements this by defining mapped variables and "redirecting" the entry names `$foo::bar' to the right variable depending on the IO space that is currently selected.  It hides all that complexity from us.


4 Multiple perspectives of the same data
========================================

It is perfectly possible (and useful!) to load more than one map in the same IO space.  It is very natural for a single file, for example, to contain data that can be interpreted in several ways, or data of different natures.

Let's, for example, open an ELF file again, this time one compiled with `-g':

,----
| (poke) .file foo.o
`----

We now load our `self' map, to get a view of the file as a collection of sections:

,----
| (poke) .map load self
| [self](poke)
`----

And now we load the `dwarf' map that comes with poke, to get a view of the file as having debugging information encoded in DWARF:

,----
| [self](poke) .map load dwarf
| [dwarf,self](poke)
`----

See how the prompt now reflects the fact that the current IO space contains DWARF info!  Let's take a look:

,----
| [dwarf,self](poke) .info maps
| IOS   Name    Source
| #0    dwarf   /home/jemarch/gnu/hacks/poke/maps/dwarf.map
| #0    self    ./self.map
| [dwarf,self](poke) .map show dwarf
| Offset      Entry
| 0x5bUL#B    $dwarf::info
`----

Now we can access entries from any of the loaded maps, i.e. access the file from different perspectives.  As an ELF file:

,----
| [dwarf,self](poke) $self::shdr[1]
| Elf64_Shdr {
|   sh_name=0xb5U#B,
|   sh_type=0x11U,
|   sh_flags=#<>,
|   sh_addr=0x0UL#B,
|   sh_offset=0x40UL#B,
|   sh_size=0x8UL#B,
|   sh_link=0x18U,
|   sh_info=0xfU,
|   sh_addralign=0x4UL,
|   sh_entsize=0x4UL#b
| }
`----

And as a file containing DWARF info:

,----
| [dwarf,self](poke) $dwarf::info
| Dwarf_CU_Header {
|   unit_length=#<0x0000004eU#B>,
|   version=0x4UH,
|   debug_abbrev_offset=#<0x00000000U#B>,
|   address_size=0x8UB#B
| }
`----

If you are curious about how the DWARF entries are defined, look at `maps/dwarf.map' in the poke source distribution, or in your installed poke (`.info maps' will tell you the file the map got loaded from.)

It is possible to unload or remove a map from a given IO space using the `.map remove' dot-command.  Say we are done looking at the DWARF in `foo.o', and we are no longer interested in it as a file containing debugging info.  We can do:

,----
| [dwarf,self](poke) .map remove dwarf
| [self](poke)
`----

Note how the prompt was updated accordingly: only `self' remains as a loaded map on this file.

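Additional perspectives can be written in the same way as `self.map'.  As a sketch only, the following hypothetical map file `sphdr.map' (the file name and the entry name `phdr' are made up for this example, and no such map is claimed to ship with poke) would view an ELF file as a table of program headers.  It assumes the `Elf64_Phdr' type from the `elf' pickle and the `e_phnum' and `e_phoff' fields of the ELF header.

,----
| /* sphdr.map - hypothetical map file: view of an ELF file as a
|    table of program headers.  */
|
| load elf;
|
| %%
|
| %entry
| %name phdr
| %type Elf64_Phdr[(Elf64_Ehdr @ 0#B).e_phnum]
| %condition (Elf64_Ehdr @ 0#B).e_phnum > 0
| %offset (Elf64_Ehdr @ 0#B).e_phoff
`----

Loaded alongside `self' in the same IO space, such a map would give access to an entry `$sphdr::phdr' next to `$self::ehdr' and `$self::shdr'.  Relocatable objects like `foo.o' typically have no program headers, in which case the `%condition' would leave the entry out of the map.
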

5 Auto-map
==========

Certain maps make sense when editing certain types of data.  For example, `dwarf.map' is intended to be used in ELF files.  In order to ease using maps, poke provides a feature called "auto mapping", which is disabled by default.

You can enable auto mapping like this:

,----
| (poke) .set auto-map yes
`----

When auto mapping is enabled, poke will look at the value of the pre-defined variable `auto_map', which must contain an array of pairs of strings, associating a regular expression with a map name.

For example, you may want to initialize `auto_map' like this in your `.pokerc' file:

,----
| auto_map = [[".*\\.mp3$", "mp3"],
|             [".*\\.o$", "elf"],
|             ["a\\.out$", "elf"]];
`----

This will make poke load `mp3.map' for every file whose name ends with ".mp3", and `elf.map' for files having names like `foo.o' and `a.out'.

Following the usual pokeish philosophy of being as unintrusive as possible by default, the default value of `auto_map' is empty.


6 Creating and managing maps on the fly
=======================================

As we have seen, we can define our own maps by writing map files like `self.map', which contain a prologue and a set of map entries.  However, sometimes it is useful to create maps "on the fly" while we explore some data with poke.

To make this possible, poke provides a suitable set of dot-commands.  Let's say we are poking some data, and we want to create a map for it.  We can do that like this:

,----
| (poke) .map create mymap
`----

This creates an empty map named `mymap', with no entries:

,----
| [mymap](poke) .map show mymap
| Offset   Entry
`----

Adding entries is easy.
First, we have to map some variable, and then use it as the base for the new entry:

,----
| [mymap](poke) var foo = int[3] @ 0#B
| [mymap](poke) .map entry add mymap, foo
| [mymap](poke) .map show mymap
| Offset    Entry
| 0x0UL#B   $mymap::foo
`----

Note how the entry `$mymap::foo' gets created, associated with the current IO space and mapped at the same offset as the variable `foo'.

We can remove entries from existing maps using the `.map entry remove' dot-command:

,----
| [mymap](poke) .map entry remove mymap, foo
| [mymap](poke) .map show mymap
| Offset   Entry
| [mymap](poke)
`----

We plan to add a command to save maps to map files.  The idea is that you can create your maps on the fly, save them, and then load them back some other day when you are ready to continue poking.  This is not implemented yet, though.


7 Predefined maps
=================

GNU poke comes with a set of useful pre-written maps, which get installed in a system location.  We want to expand this collection, so please send us your map files!

Happy poking! :)