
Developing an instruction decoder for ARM instructions using ASL - Part 1

  • Writer: Sarang Joshi
  • Oct 9, 2019
  • 3 min read

Updated: Mar 12, 2021

Motivation

In this post, we will demonstrate how to use ASL (Arm Specification Language) and Python to create a basic instruction decoder for ARM instructions. As a graduate researcher, I was trying to solve a larger problem of binary-level verification. Within that, there was a smaller sub-problem: I had to find a way to decode ARM instructions and translate each individual instruction to an alternate representation. The alternate representation was a model developed in PVS (Prototype Verification System), a theorem prover developed at SRI International.



Problem

ARM (A64) instructions have a fixed length of 32 bits. Each instruction is therefore a bitvector of 32 bits, where every bit is either 0 or 1. For example, the vector below represents the ADD (immediate) instruction.

Add (Immediate) Instruction diagram
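
As a small illustration of the bitvector view, the sketch below turns a 32-bit instruction word into a string of 0s and 1s and slices out a bit range. The helper names to_bitvector and bits are made up for this post, and the example word 0x91000420 (which, by my reading of the encoding, is add x0, x1, #1) is only an illustration.

```python
def to_bitvector(word):
    """Return the 32-bit word as a string of '0'/'1', bit 31 first."""
    if not 0 <= word < 2**32:
        raise ValueError("expected a 32-bit unsigned value")
    return format(word, "032b")


def bits(bv, hi, lo):
    """Return bits hi..lo (inclusive) of a bit-31-first bitvector string."""
    return bv[31 - hi : 32 - lo]


bv = to_bitvector(0x91000420)  # assumed example word: add x0, x1, #1
print(bv)                      # 10010001000000000000010000100000
print(bits(bv, 28, 25))        # 1000
```

Storing the bitvector with bit 31 first keeps it visually aligned with the instruction diagram, at the cost of a small index conversion inside bits.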

The goal is to translate these bitvectors back into the instruction and its operands. Doing this by hand is error-prone and tedious, especially when dealing with 300 instructions! So, how can we translate these bitvectors in a reliable manner? This is where ASL (Arm Specification Language) comes to the rescue. ASL is a machine-readable specification language that defines the semantics of an instruction, and Arm publishes these specifications as XML files. In other words, the XML describes both how an instruction is encoded and how it behaves. For example, for the ADD (immediate) instruction above, it specifies which register values are updated, and how, when the instruction is executed.
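
To see why hand-written decoding does not scale, here is a minimal sketch that decodes only the ADD (immediate) form with shifts and masks. The function name decode_add_immediate is hypothetical, the field positions are my reading of the instruction diagram, and the sketch ignores details such as register 31 being the stack pointer for this instruction.

```python
def decode_add_immediate(word):
    """Decode a 32-bit word as ADD (immediate); raise ValueError otherwise."""
    # Bits 30..24 must read 0 0 1 0 0 0 1 (op=0, S=0, then the fixed 10001 pattern)
    if (word >> 24) & 0x7F != 0b0010001:
        raise ValueError("not an ADD (immediate) encoding")
    sf = (word >> 31) & 1        # 1 selects 64-bit registers, 0 selects 32-bit
    sh = (word >> 22) & 1        # 1 means the immediate is shifted left by 12
    imm12 = (word >> 10) & 0xFFF
    rn = (word >> 5) & 0x1F
    rd = word & 0x1F
    reg = "x" if sf else "w"
    imm = imm12 << 12 if sh else imm12
    return f"add {reg}{rd}, {reg}{rn}, #{imm}"


print(decode_add_immediate(0x91000420))  # add x0, x1, #1
```

Every instruction form would need its own version of this function, which is exactly the repetition the machine-readable specification lets us avoid.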



Downloading ASL


ASL can be downloaded from the link below.



Click on Download XML (highlighted)

For our example, we will be downloading the XML files for the Armv8-A (A64) instruction set architecture.


Click on the XML link to download the tar.gz archive, then extract it on your local machine to view the XML files.
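
If you prefer to extract the archive from Python instead of a file manager, a sketch along these lines should work. The function name extract_asl_archive is made up, and the archive file name in the usage comment is an assumption based on the directory name used later in this post; use whatever name your download has.

```python
import tarfile
from pathlib import Path


def extract_asl_archive(archive_path, dest_dir="."):
    """Extract a .tar.gz archive into dest_dir and return its top-level entry names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(dest)
        return sorted({name.split("/")[0] for name in archive.getnames()})


# Usage (hypothetical file name, adjust to your download):
# print(extract_asl_archive("ISA_A64_xml_v87A-2020-09.tar.gz"))
```

Only extract archives that you downloaded from a source you trust, since extractall writes whatever paths the archive contains.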



Viewing the files in your browser


After extracting the files, you can navigate to the directory. On my local machine this is located at

ISA_A64_xml_v87A-2020-09 > xhtml. 

Click on an instruction and you should be able to see its instruction diagram and semantics in your browser window. I like to use this view when I want to understand the specifics of an instruction.


ASL instruction in the browser

Parsing ASL with ElementTree


In order to fully exploit the potential of ASL, we need to use an XML parser. We will be using the xml.etree.ElementTree module from the Python standard library for our example. Let us write a small script that opens the ASL file. This script assumes that the Python script sits in the same folder as the extracted ASL directory.


 import os
 import xml.etree.ElementTree as ET
 
 # Load the encodingindex.xml ASL file from the extracted directory
 ASL_dirname = 'ISA_A64_xml_v87A-2020-09'
 dirpath = os.path.join(os.getcwd(), ASL_dirname)
 filename = 'encodingindex.xml'
 filepath = os.path.join(dirpath, filename)
 # Opening the file here is only a check that the path is correct
 fileptr = open(filepath, "r")
  

Now that we have successfully opened the file, it is time to use the magic of ElementTree for parsing. ET.parse accepts the file path directly:

tree = ET.parse(filepath)
root = tree.getroot()

Add the following function for debugging; it prints the attributes of the root element:

 def displayRoot(root):
     print(root.attrib)
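
When exploring an unfamiliar XML file, it also helps to see which element tags appear near the root. The sketch below counts tags down to a given depth. The helper name summarize is hypothetical, and the toy XML string is made up for illustration; it is not the real layout of encodingindex.xml.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def summarize(root, max_depth=2):
    """Count element tags found within max_depth levels below root (root included)."""
    counts = Counter()

    def walk(element, depth):
        counts[element.tag] += 1
        if depth < max_depth:
            for child in element:
                walk(child, depth + 1)

    walk(root, 0)
    return counts


# Made-up miniature document, only for demonstration
toy = ET.fromstring("<index id='demo'><hierarchy><node/><node/></hierarchy></index>")
print(toy.tag, toy.attrib)
print(summarize(toy))
```

Running summarize on the real root gives a quick overview of the structure before writing any decoding logic against it.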

In order to decode an instruction, we first need to find its top-level encoding. ARM instructions are organized hierarchically, and every instruction belongs to one of the top-level encodings listed in the op0 column of the table below. In other words, bits 25-28 specify which encoding group an instruction belongs to. For example, if bits 28 down to 25 read '100x' (x can be 0 or 1), the instruction belongs to the Data Processing (Immediate) family.


Bits 25-28 of every instruction specify which top-level encoding it belongs to.
ASL top level encodings
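
Matching against a pattern such as '100x' means treating x as a don't-care bit. A minimal sketch of that comparison follows. The names matches, TOP_LEVEL and top_level_group are made up for this post. Only the '100x' row comes from the text above; the other rows are my reading of the Arm table and should be checked against the figure and your copy of the specification.

```python
def matches(pattern, bits):
    """True if bits fits pattern, where 'x' in pattern matches either '0' or '1'."""
    if len(pattern) != len(bits):
        return False
    return all(p == "x" or p == b for p, b in zip(pattern, bits))


# Partial table of op0 patterns (assumption: verify against the Arm documentation)
TOP_LEVEL = {
    "100x": "Data Processing (Immediate)",
    "101x": "Branches, Exception Generating and System instructions",
    "x1x0": "Loads and Stores",
    "x101": "Data Processing (Register)",
    "x111": "Data Processing (Scalar Floating-Point and Advanced SIMD)",
}


def top_level_group(bv):
    """Return the group name for a bit-31-first bitvector string, or None."""
    op0 = bv[3:7]  # bits 28..25 of a bit-31-first string
    for pattern, name in TOP_LEVEL.items():
        if matches(pattern, op0):
            return name
    return None


print(top_level_group("10010001000000000000010000100000"))
```

A plain string equality test would miss patterns that contain x, which is why the comparison is done position by position.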

Our goal is to decode the instruction step by step, starting from the top-level encoding. We find all node elements in the tree and loop through them. For each node, we match bits 25-28 of the instruction against the regdiagram element in the XML. If the bits match, we have decoded the top-level encoding. The code for this is as follows:


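matched_node = None
# bv is the 32-bit instruction as a string of '0'/'1' characters, bit 31 first,
# so bits 28..25 sit at string positions 3..6
bits_to_match = bv[3:7]
for node in root.findall(".//node"):
    # Assumption: the op0 pattern (e.g. '100x') is an attribute of the node's
    # regdiagram child; adapt this lookup to the layout of your XML file
    regdiagram = node.find("regdiagram")
    pattern = regdiagram.get("op0", "") if regdiagram is not None else ""
    # 'x' is a don't-care position that matches either 0 or 1
    if len(pattern) == 4 and all(p in ("x", b) for p, b in zip(pattern, bits_to_match)):
        matched_node = node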

The example above loops through all the nodes. Whenever a match on the op0 field is successful, it assigns the node to matched_node. This node can then be used for further decoding in a recursive fashion. We will explore these details in a second post that decodes an entire instruction. Happy coding!
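
To try the loop without the full download, here is a self-contained sketch that runs it on a tiny in-memory XML document. TOY_XML, its element and attribute layout, and the function name find_top_level are all made up for illustration; the real encodingindex.xml is laid out differently, so treat this as a shape of the idea and not as the actual schema.

```python
import xml.etree.ElementTree as ET

# Made-up miniature document, NOT the real encodingindex.xml layout
TOY_XML = """
<encodingindex>
  <node groupname="dpimm"><regdiagram op0="100x"/></node>
  <node groupname="ldst"><regdiagram op0="x1x0"/></node>
</encodingindex>
"""


def find_top_level(root, bv):
    """Return the last node whose op0 pattern matches bits 28..25 of bv, or None."""
    bits_to_match = bv[3:7]  # bv is a bit-31-first string of '0'/'1'
    matched_node = None
    for node in root.findall(".//node"):
        regdiagram = node.find("regdiagram")
        if regdiagram is None:
            continue
        pattern = regdiagram.get("op0", "")
        same_length = len(pattern) == len(bits_to_match)
        if same_length and all(p in ("x", b) for p, b in zip(pattern, bits_to_match)):
            matched_node = node
    return matched_node


root = ET.fromstring(TOY_XML)
node = find_top_level(root, format(0x91000420, "032b"))
print(node.get("groupname"))  # dpimm
```

The same function could later be called again on the matched node's children, which is the recursive step mentioned above.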


 
 
 
