Developing an instruction decoder for ARM instructions using ASL - Part 1
- Sarang Joshi
- Oct 9, 2019
- 3 min read
Updated: Mar 12, 2021
Motivation
In this post, we will demonstrate how to use ASL (Arm Specification Language) and Python to create a basic decoder for ARM instructions. As a graduate researcher, I was working on the larger problem of binary-level verification. Within that, there was a smaller sub-problem: I had to decode ARM instructions and translate each one into an alternate representation. That representation was a model developed in PVS (Prototype Verification System), a theorem prover developed at SRI International and widely used at NASA.
Problem
ARM instructions in the A64 instruction set have a fixed length of 32 bits. Each instruction is therefore a bitvector of 32 bits, each of which is 0 or 1. For example, the vector below represents the ADD (immediate) instruction.
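To make the bitvector view concrete, here is a minimal Python sketch (it is not the figure from the original post). It assumes the instruction is available as a 32-bit integer. The helper names `to_bitvector` and `bits` are hypothetical, and the word 0x91000420 is the A64 encoding of `ADD X0, X1, #1`, chosen here purely for illustration.

```python
def to_bitvector(word: int) -> str:
    """Render a 32-bit instruction word as a string of '0'/'1', bit 31 first."""
    if not 0 <= word < 2**32:
        raise ValueError("instruction word must fit in 32 bits")
    return format(word, "032b")


def bits(bv: str, hi: int, lo: int) -> str:
    """Return bits hi..lo (inclusive, hi >= lo) of a 32-character bitvector."""
    # Bit 31 lives at string index 0, so bit n lives at index 31 - n.
    return bv[31 - hi : 32 - lo]


bv = to_bitvector(0x91000420)    # illustrative word: ADD X0, X1, #1
print(bv)                        # the full 32-bit vector
print(bits(bv, 28, 25))          # bits 28..25 of the word
print(int(bits(bv, 21, 10), 2))  # the 12-bit immediate field as an integer
```

Keeping bit 31 at string index 0 mirrors the way encoding diagrams are drawn, with the most significant bit on the left, at the cost of a small index conversion inside `bits`.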

The goal is to translate these bitvectors back into the instruction, with its opcode and operands. Doing this by hand is tedious and error-prone, especially when dealing with 300 instructions! So, how can we translate these bitvectors reliably? This is where ASL (Arm Specification Language) comes to the rescue. ASL is a machine-readable specification, distributed in XML files, that defines the semantics of each instruction, that is, how the instruction behaves. For example, for the ADD (immediate) instruction above, it specifies which registers are updated, and how, when the instruction is executed.
More about ASL (https://alastairreid.github.io/specification_languages)
Downloading ASL
ASL can be downloaded from the link below.
Download link for ASL (https://developer.arm.com/architectures/cpu-architecture/a-profile/exploration-tools)

For our example, we will download the XML files for the Armv8-A instruction set architecture.
Click on the XML link to download the tar.gz archive, then extract it on your local machine to view the XML files.
Viewing the files in your browser
After extracting the files, navigate to the extracted directory. On my local machine this is located at
ISA_A64_xml_v87A-2020-09 > xhtml. Click on an instruction and you should see its encoding diagram and semantics in your browser window. I like to use this view when I want to understand the specifics of an instruction.

Parsing ASL with ElementTree
To fully exploit the potential of ASL, we need an XML parser. We will use Python's ElementTree module (xml.etree.ElementTree) for our example. Let us write a small script to read the ASL file. This script assumes that you run it from the folder that contains the extracted ASL directory
import os
import xml.etree.ElementTree as ET

# Build the path to encodingindex.xml inside the extracted ASL directory
ASL_dirname = 'ISA_A64_xml_v87A-2020-09'
dirpath = os.path.join(os.getcwd(), ASL_dirname)
filename = 'encodingindex.xml'
filepath = os.path.join(dirpath, filename)
# Fail early with a clear error if the file is not where we expect it
if not os.path.isfile(filepath):
    raise FileNotFoundError(filepath)
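The handful of ElementTree calls used in this post can be tried on a tiny inline document first. The sketch below uses a made-up XML snippet: the element and attribute names in `SAMPLE` are hypothetical and are not taken from the real encodingindex.xml, whose layout may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature document, only for trying out the API.
SAMPLE = """
<index title="demo">
  <node name="group_a"><node name="leaf_1"/></node>
  <node name="group_b"/>
</index>
"""

root = ET.fromstring(SAMPLE)

# The root element exposes its tag and a dict of its attributes.
print(root.tag, root.attrib)

# "node" matches direct children only; ".//node" matches at any depth.
direct = [n.attrib["name"] for n in root.findall("node")]
nested = [n.attrib["name"] for n in root.findall(".//node")]
print(direct)
print(nested)
```

The difference between the two `findall` paths matters later: a search with `.//node` visits every level of the hierarchy at once, while a search with `node` walks it one level at a time.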
With the file located, it is time to use the magic of ElementTree for parsing
tree = ET.parse(filepath)
root = tree.getroot()
Add the following method for debugging and finding the attributes of the root element
def displayRoot(root):
    print(root.attrib)
In order to decode an instruction, we first need to find its top-level encoding. ARM instructions are organized hierarchically, and every instruction belongs to one of the top-level encodings listed in the op0 column of the table below. In other words, bits 25-28 specify which encoding group an instruction belongs to. For example, if bits 28 down to 25 are '100x' (x can be 0 or 1), the instruction belongs to the Data Processing (Immediate) family.
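Because a pattern such as '100x' contains a don't-care position, a plain string comparison is not enough to test it. A minimal sketch of the comparison, assuming both the instruction bits and the pattern are strings of equal length (the function name `matches_pattern` is hypothetical):

```python
def matches_pattern(bits: str, pattern: str) -> bool:
    """Return True if `bits` fits `pattern`, where 'x' accepts either bit."""
    if len(bits) != len(pattern):
        return False
    return all(p == "x" or p == b for p, b in zip(pattern, bits))


print(matches_pattern("1000", "100x"))  # fixed bits agree, last bit is free
print(matches_pattern("1001", "100x"))  # same, with the free bit set
print(matches_pattern("1010", "100x"))  # third bit differs from the pattern
```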

Our goal is to decode the instruction step by step, starting from the top-level encoding. We find all node elements in the tree and loop through them, matching bits 25-28 of the instruction against the regdiagram element in the XML. If the bits match, we have decoded the top-level encoding. The code for this is as follows
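matched_node = None
# bv is the 32-bit instruction as a string of '0'/'1' characters, bit 31 first,
# so bits 28..25 sit at string positions 3..6 (position = 31 - bit number)
bits_to_match = bv[3:7]
for node in root.findall(".//node"):
    # Assumption: the op0 pattern is an attribute of the node's regdiagram
    # element; adjust this lookup to the layout of your XML release
    regdiagram = node.find("regdiagram")
    pattern = regdiagram.attrib.get("op0", "") if regdiagram is not None else ""
    # An 'x' in the pattern accepts either bit value
    if len(pattern) == 4 and all(p in ("x", b) for p, b in zip(pattern, bits_to_match)):
        matched_node = node
The example above loops through all the nodes. Whenever a match is successful on the op0 field, it assigns the node to matched_node. This node can be used for further decoding in a recursive fashion. We will explore these details in a second post that decodes an entire instruction. Happy coding!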
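As a preview of the recursive idea, here is a rough sketch that runs on a toy hierarchy rather than the real ASL files. Everything in `TOY_INDEX` is an assumption made for illustration: the attribute names `hibit`, `width` and `pattern`, the group names, and the flat way each node carries its own bit pattern. The real encodingindex.xml is organized differently and needs its own lookup code.

```python
import xml.etree.ElementTree as ET

# Toy hierarchy (hypothetical layout): each node says which bits to test.
TOY_INDEX = """
<hierarchy>
  <node name="dpimm" hibit="28" width="4" pattern="100x">
    <node name="addsub_imm" hibit="25" width="3" pattern="010"/>
    <node name="logical_imm" hibit="25" width="3" pattern="100"/>
  </node>
  <node name="ldst" hibit="28" width="4" pattern="x1x0"/>
</hierarchy>
"""


def field(bv: str, hibit: int, width: int) -> str:
    """Extract `width` bits starting at bit `hibit` (bit 31 is index 0)."""
    start = 31 - hibit
    return bv[start : start + width]


def matches(bits: str, pattern: str) -> bool:
    """Compare bits to a pattern in which 'x' accepts either value."""
    return len(bits) == len(pattern) and all(
        p in ("x", b) for p, b in zip(pattern, bits)
    )


def decode(node: ET.Element, bv: str, path: tuple = ()) -> tuple:
    """Descend into the first matching child until no child matches."""
    for child in node.findall("node"):  # direct children only
        bits = field(bv, int(child.attrib["hibit"]), int(child.attrib["width"]))
        if matches(bits, child.attrib["pattern"]):
            return decode(child, bv, path + (child.attrib["name"],))
    return path


toy_root = ET.fromstring(TOY_INDEX)
bv = format(0x91000420, "032b")  # illustrative word: ADD X0, X1, #1
print(decode(toy_root, bv))
```

Each call only looks at the direct children of the current node, so the path that comes back records one decoding decision per level of the hierarchy.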


