RTEMS workflow on an STM32F4 Nucleo board

This post describes the workflow I use to build RTEMS applications for an STM32F429ZI Nucleo board. I am using RTEMS 5 with its included stm32f4 BSP.

Obtaining the sources and building the ARM architecture toolchain

We are using the RTEMS Source Builder (RSB) and version 5 of the RTEMS kernel, so branch 5 needs to be checked out in both the rsb and rtems repositories:

mkdir -p $HOME/dev_rtems/bsps
mkdir -p $HOME/dev_rtems/tools
mkdir -p $HOME/dev_rtems/src
cd $HOME/dev_rtems/src
git clone git://git.rtems.org/rtems-source-builder.git rsb
git clone git://git.rtems.org/rtems.git rtems

# Check out branch 5 of the RTEMS kernel
cd $HOME/dev_rtems/src/rtems
git checkout 5

# Check out branch 5 of the RSB and verify the host environment
cd $HOME/dev_rtems/src/rsb
git checkout 5
./source-builder/sb-check

# Build the toolchain for the ARM architecture
cd $HOME/dev_rtems/src/rsb/rtems
../source-builder/sb-set-builder --prefix=$HOME/dev_rtems/tools/5 5/rtems-arm

Bootstrapping and building the stm32f4 BSP

The stm32f4 BSP is based on the STM32F407VGT6. Our STM32F429ZI has more RAM and flash, and its flash is located at 0x08000000. These changes need to be made in the MEMORY section of the linker configuration file, $HOME/dev_rtems/src/rtems/bsps/arm/stm32f4/start/linkcmds.stm32f4:

RAM_INT : ORIGIN = 0x20000000, LENGTH = 192k
ROM_INT : ORIGIN = 0x08000000, LENGTH = 2048k
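As a quick sanity check of these values, the following C sketch computes where each region ends and whether an address falls inside it. The struct and function names are hypothetical and not part of the RTEMS BSP. In the linker script, 192k means 192 × 1024 = 196608 bytes (0x30000), so RAM_INT spans 0x20000000 to 0x2002FFFF, and the 2048k of ROM_INT (0x200000 bytes) spans 0x08000000 to 0x081FFFFF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of a linker MEMORY region (ORIGIN, LENGTH). */
struct mem_region {
    const char *name;
    uint32_t origin;
    uint32_t length; /* in bytes */
};

/* Values from the modified linkcmds.stm32f4 ("k" means 1024 bytes). */
static const struct mem_region ram_int = { "RAM_INT", 0x20000000u, 192u * 1024u };
static const struct mem_region rom_int = { "ROM_INT", 0x08000000u, 2048u * 1024u };

/* First address past the end of the region. */
static uint32_t region_end(const struct mem_region *r)
{
    assert(r != NULL);
    return r->origin + r->length;
}

/* True if addr lies inside the half-open range [origin, origin + length). */
static bool region_contains(const struct mem_region *r, uint32_t addr)
{
    assert(r != NULL);
    return addr >= r->origin && addr - r->origin < r->length;
}
```

Comparing addr - origin against the length, rather than against origin + length, keeps the containment check correct even for a region that ends exactly at the top of the 32-bit address space.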

Furthermore, we want to use UART2 for console output instead of the default UART3. This needs to be configured in $HOME/dev_rtems/src/rtems/bsps/arm/stm32f4/start/start-config-io.c and $HOME/dev_rtems/src/rtems/c/src/lib/libbsp/arm/stm32f4/configure.ac. Now we are ready to build our BSP:

# Put the toolchain on the PATH and bootstrap the RTEMS sources
export PATH=$HOME/dev_rtems/tools/5/bin:"$PATH"
cd $HOME/dev_rtems/src/rtems
./bootstrap -c && ./rtems-bootstrap

# Build the stm32f4 BSP in a separate build directory
mkdir -p $HOME/dev_rtems/bsps/5
cd $HOME/dev_rtems/bsps/5
$HOME/dev_rtems/src/rtems/configure --prefix=$HOME/dev_rtems/tools/5 --target=arm-rtems5 --enable-rtemsbsp=stm32f4 --enable-posix --disable-networking --enable-tests=samples --enable-maintainer-mode
make all
make install

Compiling an application

We are using the example programs from the rtems-examples repository. It is easy to start with one of the examples as a base for building our own RTEMS application.

cd $HOME/dev_rtems/src
git clone https://git.rtems.org/rtems-examples/ rtems-examples
cd rtems-examples
export RTEMS_MAKEFILE_PATH=$HOME/dev_rtems/bsps/5/arm-rtems5/c/stm32f4/make
make

We flash our Nucleo board with gdb through a Black Magic Probe. The examples run correctly and produce printf output on UART2. Time to start experimenting with RTEMS!

Using Docker to cross-compile embedded software

Are you tired of managing the installation of different cross-development toolchains on the same machine, fixing issues when your compiler does not work after a host OS upgrade or having to deal with the same toolchain being installed in heterogeneous environments?

Docker fixes some of these issues by providing a lightweight virtualization layer that isolates the cross-development toolchains from the host OS, makes it easier for different tools to coexist on the same machine, and simplifies their management and deployment.

We have been facing these problems while developing software for an Infineon XMC4800 microcontroller on a Linux host, and have improved our process by using a Docker cross-compilation container with the following features:

  • Docker container based on Ubuntu 18.04 LTS
  • GNU ARM toolchain
  • Infineon XMC libraries for XMC4800
  • Segger JLink tool for target flashing and debugging
  • Container compiles code from the host invocation directory
  • Use of ccache to speed up subsequent compilations

This is our resulting Dockerfile:

# Root image built from LTS ubuntu in Docker Hub.
FROM ubuntu:18.04

LABEL maintainer="Juan Solano <jsm@jsolano.com>"

# Update this variable to force a refresh of all base images and make
# sure subsequent commands do not use old cache versions.
ENV REFRESHED_AT 2018-11-26

ARG USERNAME="docker"
ARG USERGROUP="dckrgroup"
ARG DEBIAN_FRONTEND=noninteractive
# These can be overridden with a command line option when the image is
# built, e.g. --build-arg UID=$(id -u) --build-arg GID=$(id -g).
ARG UID=1000
ARG GID=1000
ARG GCC_ARM_TOOLCHAIN_VER="gcc-arm-none-eabi-7-2018-q2-update"
ARG GCC_ARM_TOOLCHAIN_URL="https://developer.arm.com/-/media/Files/downloads/gnu-rm/7-2018q2/"$GCC_ARM_TOOLCHAIN_VER-linux.tar.bz2
ARG XMC_LIB_VER="XMC_Peripheral_Library_v2.1.18"
ARG XMC_LIB_URL="http://dave.infineon.com/Libraries/XMCLib/"$XMC_LIB_VER.zip
ARG JLINK_VER="JLink_Linux_V634g_x86_64"

# Set up the compiler path and other container environment variables.
ENV PATH $PATH:/home/$USERNAME/opt/$GCC_ARM_TOOLCHAIN_VER/bin
ENV GCC_ARM_TOOLCHAIN_VER $GCC_ARM_TOOLCHAIN_VER
ENV GCC_COLORS="error=01;31:warning=01;35:note=01;36:caret=01;32:locus=01:quote=01"
ENV USB_SCRIPT="usbdev_allow.sh"
ENV TZ=Europe/Berlin

RUN apt-get update -q \
    && apt-get install --no-install-recommends -y apt-utils \
    && apt-get install --no-install-recommends -y vim make sudo \
       tzdata libncurses5 ca-certificates unzip bzip2 libtool ccache \
       usbutils libusb-1.0-0-dev libusb-dev \
    && rm -rf /var/lib/apt/lists/*

# Set timezone and standard user.
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime \
    && echo $TZ > /etc/timezone \
    && groupadd --gid $GID $USERGROUP \
    && useradd -m -u $UID -g $GID -o -s /bin/bash $USERNAME \
    && echo "root:root" | chpasswd \
    && echo "$USERNAME:$USERNAME" | chpasswd \
    && usermod -a -G 20 $USERNAME \
    && adduser $USERNAME sudo \
    && echo '%sudo ALL=(ALL) NOPASSWD:ALL' >> /etc/sudoers

# Set up a build tools directory.
RUN mkdir -p /home/$USERNAME/opt
WORKDIR /home/$USERNAME/opt
RUN chown $USERNAME /home/$USERNAME/opt

# Install JLink as root, before changing to standard user.
COPY $JLINK_VER.deb /home/$USERNAME/opt
RUN dpkg -i $JLINK_VER.deb \
    && rm $JLINK_VER.deb
COPY $USB_SCRIPT /home/$USERNAME/opt
RUN chmod +x /home/$USERNAME/opt/$USB_SCRIPT

# Further operations as standard user.
USER $USERNAME

# Install the XMC library.
COPY $XMC_LIB_VER.zip /home/$USERNAME/opt
RUN unzip $XMC_LIB_VER.zip \
    && rm $XMC_LIB_VER.zip

# Install the ARM cross-compilation toolchain.
COPY $GCC_ARM_TOOLCHAIN_VER-linux.tar.bz2 /home/$USERNAME/opt
RUN bunzip2 $GCC_ARM_TOOLCHAIN_VER-linux.tar.bz2 \
    && tar xvf $GCC_ARM_TOOLCHAIN_VER-linux.tar \
    && rm $GCC_ARM_TOOLCHAIN_VER-linux.tar

# Required so that ccache files are kept in shared work directory.
RUN cd /usr/lib/ccache \
    && sudo ln -s ../../bin/ccache arm-none-eabi-gcc
ENV PATH /usr/lib/ccache:$PATH

# Create a directory for our project and setup a shared work directory.
RUN mkdir -p /home/$USERNAME/project
WORKDIR /home/$USERNAME/project
VOLUME /home/$USERNAME/project
RUN cd /home/$USERNAME/project \
    && mkdir -p $HOME/.ccache \
    && echo "cache_dir = $HOME/project/.ccache" >> \
       $HOME/.ccache/ccache.conf

Initially we added wget commands to the Dockerfile so that the tools were downloaded during the image build, but we later decided to keep a local copy of our tools to speed up Docker image creation. After creating the Docker image, compiling is just a matter of going to the directory where our source code lives and running our make alias, which can be defined, for example, as:

alias xmcmake='docker run --rm -it --device=/dev/bus/usb --volume=$(pwd):/home/docker/project docker-arm-xmc make'

This starts a container based on the previously created docker-arm-xmc image, gives it access to the JLink USB device, and runs make. When make finishes, the container exits and is removed (--rm). Our compiled binaries are left in the source directory, together with a .ccache directory holding the cached artifacts that will be reused the next time make is invoked.

In subsequent posts, I will delve into additional development steps that can be realized with the help of this container. I hope you find this useful.

ARM unaligned data access and floating point in Linux

I was recently getting Data Aborts on an ARM11 program that makes intensive use of unaligned data accesses. The issue was caused by unaligned floating point accesses, which the Linux kernel does not handle. Some background on the problem follows.

ARM unaligned data access hardware support

ARM 32-bit instructions must always be word aligned. Data accesses do not have this restriction. Prior to the ARMv6 architecture, unaligned load and store accesses were treated as aligned by truncating the data address. Starting with ARMv6, unaligned word and halfword loads and stores are supported: the hardware transparently issues one or more memory accesses to read the required bytes, albeit at the cost of a potentially longer access time.
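To make the difference concrete, the following C sketch models both behaviors on a little-endian byte array: a legacy load that truncates the address to a word boundary, and an ARMv6-style load that fetches exactly the four bytes requested. The function names are hypothetical, and the legacy model is deliberately simplified (see the comments).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assemble a little-endian 32-bit value from four consecutive bytes. */
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Legacy (pre-ARMv6) model: the two low address bits are dropped, so the
 * aligned word containing addr is returned. The byte rotation that real
 * ARMv4/ARMv5 cores apply to an unaligned LDR result is not modelled. */
static uint32_t load32_truncated(const uint8_t *mem, size_t size, size_t addr)
{
    size_t base = addr & ~(size_t)3;

    assert(base + 4 <= size);
    return le32(mem + base);
}

/* ARMv6 model: the four bytes starting at addr are fetched (possibly with
 * several bus accesses) and combined, giving the value the program wanted. */
static uint32_t load32_unaligned(const uint8_t *mem, size_t size, size_t addr)
{
    assert(addr + 4 <= size);
    return le32(mem + addr);
}
```

For the bytes 00 11 22 33 44 55 66 77, a 32-bit load from offset 1 returns 0x33221100 in the truncating model but 0x44332211 in the ARMv6 model.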

Unaligned data access is controlled through the following bits of the CR1 register of the CP15 coprocessor:

  • U bit. Unaligned data access support enabled. This bit must be set to enable unaligned data access support. If it is cleared, we must either provide an unaligned data access handler (like the one provided by the Linux kernel) or compile our software with unaligned data access disabled through the corresponding compiler option.
  • A bit. Alignment fault enabled. When this bit is set, all unaligned data accesses cause a Data Abort exception, irrespective of the value of the U bit. When both the A and U bits are clear, the legacy ARMv5 mode is enabled, in which an unaligned data access is treated as aligned and the data address is truncated.

The default configuration on ARM11 and ARM Cortex-A processors is U=1 and A=0, which allows unaligned halfword and word accesses while keeping a strict word alignment check for other access types. Note that an unaligned multiple word access (e.g. a long long) or a coprocessor data access always signals a Data Abort with the Alignment Fault status code, even when the A bit is clear: doubleword accesses must always be four-byte aligned.
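The rules above can be condensed into a small decision function. This is a simplified model of the rules as stated in this post, not a complete description of the ARM architecture: it covers only halfword, word and doubleword accesses and the U/A combinations discussed here, and the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Possible outcomes of a data access in this simplified model. */
enum access_result {
    ACCESS_OK,        /* aligned, or unaligned but handled by the hardware */
    ACCESS_TRUNCATED, /* legacy ARMv5 behavior: address rounded down */
    ACCESS_FAULT      /* Data Abort with an alignment fault */
};

/* Classify an access of `size` bytes (2, 4 or 8) at `addr`, given the
 * U and A bits of the CP15 control register. */
static enum access_result classify_access(uint32_t addr, unsigned size,
                                          bool u_bit, bool a_bit)
{
    assert(size == 2 || size == 4 || size == 8);
    bool aligned = (addr % size) == 0;

    /* Multiple word (doubleword) accesses must always be 4-byte aligned. */
    if (size == 8 && (addr % 4) != 0)
        return ACCESS_FAULT;
    if (aligned)
        return ACCESS_OK;
    if (a_bit)
        return ACCESS_FAULT;     /* A=1: every unaligned access aborts */
    if (!u_bit)
        return ACCESS_TRUNCATED; /* U=0, A=0: legacy mode */
    return ACCESS_OK;            /* U=1, A=0: the hardware handles it */
}
```

With the default U=1, A=0 settings, for example, a word access at 0x1001 is classified as ACCESS_OK, whereas a doubleword access at 0x1002 is classified as ACCESS_FAULT.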

Our current compiler, gcc 4.6.3, generates code with unaligned loads by default and provides no option to disable unaligned access. Other compilers can generate code with unaligned data access disabled (e.g. CodeSourcery, with the -mno-unaligned-access option).

ARM unaligned data access and the Linux kernel

CONFIG_ALIGNMENT_TRAP is a kernel configuration option that enables software emulation of unaligned load/store instructions. Recent Linux kernels enable this setting by default. In fact, it cannot even be disabled with menuconfig (to make the option visible in menuconfig, its description needs to be updated in arch/arm/Kconfig). On ARMv6 and later, this option does not affect the initialization value of the CR1 register. Instead, it governs the software emulation of unaligned doubleword accesses, while single word accesses are handled directly by the hardware (given our default A/U bit settings). If we disable CONFIG_ALIGNMENT_TRAP, unaligned doubleword accesses result in a bus error and a program crash.

In the default case, with CONFIG_ALIGNMENT_TRAP enabled, an unaligned doubleword access is fixed up by the kernel. This behavior is configurable through the /proc/cpu/alignment virtual file (the kernel needs to be compiled with CONFIG_DEBUG_KERNEL to make it visible). By default, the different types are handled as follows:

  • int (32-bit). Unaligned accesses are handled directly by the hardware with no kernel involvement (/proc/cpu/alignment is not affected).
  • long long (64-bit). ARM processor cores do not support unaligned 64-bit accesses, so they are handled by the Linux kernel (/proc/cpu/alignment shows a DWord increment): the kernel traps the exception and emulates the access.
  • float (IEEE single precision, 32-bit). ARM processor cores do not support unaligned accesses by VFP hardware instructions. See below.
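When testing, it is handy to read these counters programmatically, for example before and after running a suspect code path. The following C sketch (hypothetical helper names) looks up a counter such as User, Skipped or DWord in the text of /proc/cpu/alignment. It assumes each counter is printed on its own line as the name, a colon, whitespace and a decimal number, as in the output shown further down.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Look for a line starting with "<name>:" in text (the contents of
 * /proc/cpu/alignment) and parse the decimal number that follows.
 * Returns true and stores the counter in *out on success. */
static bool alignment_counter(const char *text, const char *name,
                              unsigned long *out)
{
    assert(text != NULL && name != NULL && out != NULL);

    size_t len = strlen(name);
    const char *line = text;

    while (line != NULL && *line != '\0') {
        if (strncmp(line, name, len) == 0 && line[len] == ':') {
            const char *start = line + len + 1;
            char *end;
            unsigned long value = strtoul(start, &end, 10); /* skips tabs */

            if (end == start)
                return false; /* no number after the colon */
            *out = value;
            return true;
        }
        line = strchr(line, '\n'); /* advance to the next line */
        if (line != NULL)
            line++;
    }
    return false;
}

/* Read /proc/cpu/alignment into buf as a NUL-terminated string.
 * Returns false if the file cannot be opened (e.g. on non-ARM kernels). */
static bool read_alignment_file(char *buf, size_t size)
{
    if (buf == NULL || size == 0)
        return false;

    FILE *f = fopen("/proc/cpu/alignment", "r");
    if (f == NULL)
        return false;

    size_t n = fread(buf, 1, size - 1, f);
    buf[n] = '\0';
    fclose(f);
    return true;
}
```

Matching the name together with its colon keeps User from also matching a longer counter name that starts with the same word (the kernel also prints a User faults line). On the target, reading the file into a buffer of, say, 512 bytes and comparing the Skipped counter before and after the suspect code reveals accesses the kernel could not fix up.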

Unaligned floating point accesses

gcc generates code that uses the floating point hardware when -mfloat-abi is set to softfp or hard; the difference is that softfp generates function calls where floating point arguments are passed in integer registers (as with the soft ABI). An unaligned hardware floating point access raises an exception that the Linux kernel does not fix up, so our program crashes. The following code triggers this kind of access:

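#include <stdio.h>

/* Build with hardware floating point, e.g. -mfloat-abi=softfp or
 * -mfloat-abi=hard plus a suitable -mfpu option, so that the float
 * is accessed with VFP instructions. */
int main(int argc, char* argv[])
{
    /* 32-byte aligned buffer, so &buffer[1] is guaranteed to be misaligned. */
    char __attribute__ ((aligned(32))) buffer[8] = { 0 };
    float* fval_p;

    /* Deliberately unaligned float pointer (also undefined behavior in C;
     * used here only to provoke the alignment fault). */
    fval_p = (float*)&buffer[1];
    *fval_p = 0.1234f;

    printf("\nfloat at &buf[1] is %f\n", *fval_p);
    return 0;
}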

This produces a Bus error, with /proc/cpu/alignment showing:

User:            1
..
Skipped:       1
..

This means that the kernel was unable to fix up the Data Abort exception that took place. The problem can be worked around by compiling our software with floating point emulation (-mfloat-abi=soft), which can be performed by the Linux kernel but is normally done more efficiently in user space by the toolchain's software floating point routines (libgcc). The drawback is slower code, which can hurt the performance of software that relies heavily on floating point calculations, such as scientific applications or graphics processing software. The definitive solution to this kind of abort, and the one we should always aim for, is to fix our software so that it always accesses floating point data at 4-byte aligned addresses.
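One way to apply that fix, sketched below under the assumption that the data really has to stay packed in a byte buffer (for example, a received message), is to copy the bytes into a properly aligned local variable with memcpy instead of dereferencing a cast pointer. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Read a float stored at an arbitrary, possibly unaligned, byte offset.
 * memcpy copies the bytes into an aligned local variable, so the compiler
 * must use accesses that are valid for any alignment instead of a VFP load
 * from the unaligned address; it also avoids strict-aliasing violations. */
static float load_float(const unsigned char *buf, size_t offset)
{
    float value;

    assert(buf != NULL);
    memcpy(&value, buf + offset, sizeof value);
    return value;
}

/* Store a float at an arbitrary, possibly unaligned, byte offset. */
static void store_float(unsigned char *buf, size_t offset, float value)
{
    assert(buf != NULL);
    memcpy(buf + offset, &value, sizeof value);
}
```

In the bus error example above, the assignment and the printf argument would become store_float((unsigned char *)buffer, 1, 0.1234f) and load_float((unsigned char *)buffer, 1).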

Embedded Linux and ARM

Linux usage is growing enormously in embedded systems, thanks to its stability, its open source nature, the availability of drivers for a huge number of hardware peripherals and its support for many networking protocols and filesystems. However, Linux has some drawbacks in safety-critical systems, where the code needs to be certified, and in hard real-time systems, where meeting deadlines is critical.

Nowadays, some embedded Linux systems have been deployed following a top-down approach, where little care has been taken to remove unused software. This may have security implications, and it also leads to code bloat and maintenance problems later in the product lifecycle. I recommend following a bottom-up approach, where we precisely control the software installed on our systems. In the long run, this makes maintenance easier and improves security.

Why is ARM the dominant architecture in embedded systems? ARM follows a fabless model, with licensees competing with each other on SoCs that include an ARM core and a number of extensions. This model, together with the efficiency and elegance of its designs, has made ARM number one, especially in power-conscious designs like mobile phones.

It is becoming very easy to port Linux to new hardware devices on x86, MIPS and ARM platforms. This is a list of popular ARM development platforms whose ARM cores contain an MMU and can therefore run standard Linux:

Open source tools in the Spanish embedded systems industry

Although I agree that proprietary software has its own place in the embedded systems industry, there are some open source tools I consider essential when developing embedded software.

Cenatic is the Spanish national reference center for open source technologies. They have published an interesting report on the usage of open source software in the Spanish embedded systems industry. Especially interesting is the comprehensive list of open source tools covering all phases of the development lifecycle, some of which I intend to incorporate into my toolset.

Mentoring SANS SEC401 in Madrid

I will be mentoring a SANS Security 401 class in Madrid, starting on the 6th of October. This class gives an overview of many security topics and provides a great foundation on which to build your security career. I passed the GSEC certification test last year, and preparing for it was quite challenging and interesting, with loads of hands-on exercises and very relevant, up-to-date course material.

Most likely the course will be held at HUB Madrid, hope to see you there!

DNIe development workshop in Valladolid

The technical workshop on the Spanish electronic ID card, DNIe, that took place in Valladolid on the 1st of June provided a good introduction to a subject I have just started to look into. The DNIe support website offers loads of resources for developing applications built around the DNIe smartcard.

I am now looking into OpenSC and DNIe support on Ubuntu Lucid, which seems to be a problem right now: the OpenSC driver supported for use with the DNIe is distributed in binary form and built against a version of OpenSC that lags behind what most modern distributions ship. I will keep you updated.