
Getting Crafty With Graphics With Just Kilobytes of Flash and Almost No RAM

24 Feb 2024 | MIT | 6 min read
Got an embedded or IoT widget with a screen but no real memory or flash space to speak of? Read this.
On the tiny sort of processors the AVR family is known for, you're often left with miserly amounts of SRAM and flash. That can often work, but once you add a screen, things quickly get out of control, because graphics tend to be RAM and flash hungry. Here's a way to make them less so.

Introduction

I'm working on a "Tamagotchi"-like game with some friends, which we plan on implanting into a PC keyboard that has little monochrome OLED screens on it. Specifically, the keyboard is a Boardsource Lulu. We want it to run alongside the existing firmware so that the keyboard remains fully functional. There are two varieties of the Lulu, and we happen to be working with the AVR model, which has 2.5KB of RAM and 32KB of flash memory, about 18KB of which is used by existing code. I'm not as sure about the RAM usage of the existing firmware, but where we're going we won't really need any.

I don't expect you to have one of these keyboards. That would just be mean, as they are pretty expensive, niche interest devices. Instead, I've crafted a PlatformIO** project that targets an ESP32-S3 or other Arduino-compatible device. You'll need something like that, and an SSD1306 screen wired up to it over I2C. On the ESP32-S3, I used pin 16 for SDA and pin 17 for SCL.

** Sorry, folks that are still using the Arduino IDE. I gave up on it for being too limited, and I strongly recommend installing PlatformIO even if you don't use it all the time, as it's much more practical for projects with multiple source files, or that need to support multiple devices.

The code is by default configured to drive a 128x32 "bandaid" form factor screen, but if you have the 128x64 model, change the SSD1306_HEIGHT define in the code to reflect that. You'll also need to generate larger images using the cigen tool, since the current ones are 128x32.

Background

The first thing to keep in mind is that the framebuffer format on these little displays is weird, and that's putting it charitably. It's a monochrome display, so there are 8 pixels per byte, but the pixels are packed vertically rather than horizontally into those bytes. The bytes, however, are arranged traditionally left to right, top to bottom, so pixels (0,0) through (0,7) live in the first byte and (1,0) through (1,7) in the second.

Also, because it's monochrome and write-only, you can't change a single pixel without knowing the other seven in its byte. So you either need to keep a framebuffer in RAM (512 bytes for 128x32, 1024 bytes for 128x64), or you can just stream a framebuffer from flash directly over I2C to the display, which requires no RAM but limits you to static images.
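To make the layout concrete, here's a minimal sketch of the addressing math described above. The helper names (`framebuffer_bytes`, `pixel_byte_index`, `pixel_bit_mask`, `set_pixel`) are made up for illustration and aren't part of the project. It assumes a 128 pixel wide panel, with bit 0 of each byte being the topmost pixel of its 8-row strip.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

constexpr int kWidth = 128; // SSD1306 column count

// Bytes needed for a full frame at one bit per pixel.
constexpr size_t framebuffer_bytes(int height) {
    return static_cast<size_t>(kWidth) * height / 8;
}

// Index of the byte holding pixel (x, y). Rows are grouped into
// 8-pixel-tall strips, with one byte per column in each strip.
constexpr size_t pixel_byte_index(int x, int y) {
    return static_cast<size_t>(y / 8) * kWidth + x;
}

// Bit within that byte. Bit 0 is the topmost row of the strip.
constexpr uint8_t pixel_bit_mask(int y) {
    return static_cast<uint8_t>(1u << (y % 8));
}

// Sets or clears one pixel in a RAM framebuffer. This is the RAM-hungry
// approach, shown only to illustrate the layout.
inline void set_pixel(uint8_t* fb, int height, int x, int y, bool on) {
    assert(x >= 0 && x < kWidth && y >= 0 && y < height);
    if (on) {
        fb[pixel_byte_index(x, y)] |= pixel_bit_mask(y);
    } else {
        fb[pixel_byte_index(x, y)] &= static_cast<uint8_t>(~pixel_bit_mask(y));
    }
}

static_assert(pixel_byte_index(0, 7) == 0, "(0,0)-(0,7) share the first byte");
static_assert(pixel_byte_index(1, 0) == 1, "(1,0) starts the second byte");
static_assert(framebuffer_bytes(32) == 512, "128x32 needs 512 bytes");
static_assert(framebuffer_bytes(64) == 1024, "128x64 needs 1024 bytes");
```

The `static_assert` lines double as a check of the byte numbering and the 512 and 1024 byte figures.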

For our project, we only have 2.5KB of RAM in total, and it's shared with other firmware components, so we're taking the latter approach. We won't need any sort of dynamic rendering, and we don't even have the flash space to store that kind of logic anyway.

To save more flash space, we can compress the images using run length encoding. The generator supports either no compression or 3 styles of RLE, and picks whichever yields the smallest result for each image. Run length encoding is simple, lightweight, and for this type of data, it's typically highly effective.

What we need is an application that will take images and generate RLE compressed uint8_t[] arrays, and then some code to spit those to a display.

The Cigen Application

Enter cigen. This is a little C# command line application that takes a series of images and generates RLE compressed C array content containing a framebuffer for each passed in image.

Terminal
cigen v1.0 Copyright (c) 2024 by honey the codewitch

Usage: cigen {<infile1> [<infileN>]} [/output <outfile>] [/threshold <threshold>]

  <infile>    The input files
  <outfile>   The output file - defaults to <stdout>
  <threshold> The luminosity threshold (0-255, defaults to 127)
- or -
  /help      Displays this screen and exits

It's pretty simple to use. You pass it a series of images, each the same size as your display panel (ours is 128x32 in this demo), an optional /output file, and an optional /threshold value. The threshold is just the luminosity at which the application will consider a pixel "white" instead of "black".

When you run it with the Debug arguments provided with the project, you'll get the following output:

C++
#ifndef OUTPUT_H
#define OUTPUT_H
#include <stdint.h>
#include "progmem.h"
const uint8_t output_frame_1[] PROGMEM = {
    0xff, 131, 0x3f, 0x9f, 0xcf, 0xef,
    0xe7, 0xe7, 0xf3, 0xf3, 0xfb, 0xfb,
    0xf9, 0xf9, 0xf9, 0xf9, 0xf9, 0xf9,
    0xf9, 0xfb, 0xfb, 0xf3, 0xf3, 0xe7,
    0xe7, 0xef, 0xcf, 0x9f, 0x3f, 0x7f,
    0xff, 98, 0x00, 2, 0xff, 6, 0x83,
    0x01, 0x83, 0xc7, 0xff, 7, 0x83,
    0x01, 0x83, 0xc7, 0xff, 7, 0x00, 2,
    0xff, 97, 0xfc, 0xf9, 0xf3, 0xe7,
    0xef, 0xcf, 0xcf, 0x9f, 0x9f, 0xbf,
    0xbf, 0x3f, 0x3f, 0x3f, 0x3f, 0x3f,
    0x3f, 0x3f, 0xbf, 0xbf, 0x9f, 0x9f,
    0xcf, 0xcf, 0xef, 0xe7, 0xf3, 0xf9,
    0xfc, 0xfe, 0xff, 96
};
#define OUTPUT_FRAME_1_COMPRESSION 3

// [Compressed to 16.40625% of original. Len = 84 vs 512]

const uint8_t output_frame_2[] PROGMEM = {
    0xff, 166, 0x3f, 0x3f, 0xbf, 0x9f,
    0x9f, 0xdf, 0xdf, 0xdf, 0xcf, 0xcf,
    0xcf, 0xcf, 0xcf, 0xdf, 0xdf, 0xdf,
    0x9f, 0x9f, 0xbf, 0x3f, 0x3f, 0x7f,
    0xff, 101, 0x03, 0xf1, 0xfc, 0xfe,
    0xfe, 0xff, 3, 0x0f, 0x07, 0x0f,
    0x9f, 0xff, 7, 0x0f, 0x07, 0x0f,
    0x9f, 0xff, 4, 0xfe, 0xfc, 0xf1,
    0x03, 0x0f, 0xff, 97, 0xf8, 0xf3,
    0xf7, 0xe7, 0xcf, 0xcf, 0xdf, 0x9f,
    0x9e, 0xbf, 0xbf, 0xbf, 0x3f, 0x3f,
    0x3f, 0x3f, 0x3f, 0xbf, 0xbf, 0xbe,
    0x9f, 0x9f, 0xdf, 0xcf, 0xcf, 0xe7,
    0xf7, 0xf3, 0xf8, 0xfc, 0xff, 64
};
#define OUTPUT_FRAME_2_COMPRESSION 3

// [Compressed to 16.40625% of original. Len = 84 vs 512]

const uint8_t output_frame_3[] PROGMEM = {
    0xff, 73, 0x7f, 0x3f, 0xbf, 0xbf,
    0x9f, 0x9f, 0x9f, 0x9f, 0x9f, 0x9f,
    0x9f, 0xbf, 0xbf, 0x3f, 0x7f, 0x7f,
    0xff, 105, 0x8f, 0xe7, 0xf3, 0xf9,
    0xfc, 0xfe, 0xfe, 0xff, 2, 0xff, 11,
    0xff, 3, 0xfe, 0xfc, 0xf9, 0xf3,
    0xe7, 0x8f, 0x1f, 0xff, 97, 0x00, 2,
    0xff, 6, 0xe0, 0xc0, 0xe0, 0xf1,
    0xff, 7, 0xe0, 0xc0, 0xe0, 0xf1,
    0xff, 7, 0x00, 2, 0xff, 98, 0xfc,
    0xf9, 0xf3, 0xe7, 0xef, 0xcf, 0xdf,
    0x9f, 0x9f, 0xbf, 0x3f, 0x3f, 0x3f,
    0x3f, 0x3f, 0x3f, 0x3f, 0xbf, 0x9f,
    0x9f, 0xdf, 0xcf, 0xef, 0xe7, 0xf3,
    0xf9, 0xfc, 0xfe, 0xff, 33
};
#define OUTPUT_FRAME_3_COMPRESSION 3

// [Compressed to 17.96875% of original. Len = 92 vs 512]

const uint8_t output_frame_4[] PROGMEM = {
    0xff, 239, 0x7f, 0x7f, 0x7f, 0x7f,
    0xff, 110, 0x0f, 0xe7, 0xe7, 0xf3,
    0xf9, 0xf9, 0xfd, 0x3c, 0x1c, 0x1e,
    0x1e, 0x3e, 0xfe, 0xfe, 0xfe, 0xfe,
    0xfe, 0xfe, 0x3e, 0x1e, 0x1e, 0x1e,
    0x3c, 0xfc, 0xfd, 0xf9, 0xf9, 0xf3,
    0xe7, 0xe7, 0x0f, 0xff, 97, 0xf8,
    0xf3, 0xf3, 0xe7, 0xcf, 0xcf, 0xdf,
    0x9e, 0x9c, 0xbc, 0xbc, 0xbe, 0xbf,
    0x3f, 0x3f, 0x3f, 0x3f, 0x3f, 0xbe,
    0xbc, 0xbc, 0xbc, 0x9e, 0x9f, 0xdf,
    0xcf, 0xcf, 0xe7, 0xf3, 0xf3, 0xf8
};
#define OUTPUT_FRAME_4_COMPRESSION 3

// [Compressed to 14.0625% of original. Len = 72 vs 512]

const uint8_t* output_images[] = {
    output_frame_1,
    output_frame_2,
    output_frame_3,
    output_frame_4
};
const int output_images_compression[] = {
    OUTPUT_FRAME_1_COMPRESSION,
    OUTPUT_FRAME_2_COMPRESSION,
    OUTPUT_FRAME_3_COMPRESSION,
    OUTPUT_FRAME_4_COMPRESSION
};
#endif // OUTPUT_H

This header is geared for QMK, but you can just copy what you need of the code, such as the arrays, into your own program.

The meat of this application's functionality is in the Run() method in Program.cs. In broad strokes, it loads all of the inputs into System.Drawing.Bitmap instances. That's why this is a .NET Framework app: it relies on GDI+, which is "Windows only", although I think Mono will run it on Linux too.

Once it has those bitmaps, it creates a byte array corresponding to each bitmap's dimensions, and packs the bitmap data as monochrome pixels. It does this in the weird format that the SSD1306 uses so we don't have to do any post translation. To convert to monochrome, each pixel has its luminosity computed, and then compared against a Threshold value (typically 127).
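Here's a rough C++ sketch of that threshold and pack step. cigen itself is C#, and this is not its code. The function names are hypothetical, the luminosity formula (integer Rec. 601 weights) is an assumption since the weighting isn't spelled out here, and so are the choices that "white" sets the bit and that a pixel must be strictly above the threshold to count as white.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <vector>

// Hypothetical helper: integer approximation of Rec. 601 luma.
// ASSUMPTION: cigen may weight the channels differently.
inline uint8_t luminosity(uint8_t r, uint8_t g, uint8_t b) {
    return static_cast<uint8_t>((299u * r + 587u * g + 114u * b) / 1000u);
}

// Packs a row-major 8-bit grayscale image into the SSD1306 layout:
// strips of 8 rows, one byte per column, bit 0 at the top of the strip.
// ASSUMPTION: pixels brighter than `threshold` are white (bit set).
std::vector<uint8_t> pack_ssd1306(const std::vector<uint8_t>& gray,
                                  int width, int height,
                                  uint8_t threshold = 127) {
    assert(width > 0 && height > 0 && height % 8 == 0);
    assert(gray.size() == static_cast<size_t>(width) * height);
    std::vector<uint8_t> out(static_cast<size_t>(width) * height / 8, 0);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            if (gray[static_cast<size_t>(y) * width + x] > threshold) {
                out[static_cast<size_t>(y / 8) * width + x] |=
                    static_cast<uint8_t>(1u << (y % 8));
            }
        }
    }
    return out;
}
```

Feeding it a 128x32 image yields the same 512 bytes per frame that the generated header reports as the original length.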

Now, with the monochrome bitmap data in hand, the app tries to compress it using 3 different RLE variants, picking the one that yields the smallest size, or leaving it uncompressed if every variant came out larger than the original. In one variant, both black and white runs are encoded. In another, only white runs. In the last, only black runs. The choice is recorded in each frame's COMPRESSION define. Judging by the decoder in the firmware, 0 means uncompressed, 1 means runs of 0x00 bytes, 2 means runs of 0xFF bytes, and 3 means both.
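The selection logic can be sketched in C++ like this. It isn't cigen's actual code (that's C#), and `rle_encode`, `Encoded` and `choose_smallest` are hypothetical names. The mode numbers follow what the firmware's decoder tests for, and the tie-breaking rule (keep the earlier candidate) is my assumption.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <utility>
#include <vector>

// Encodes `raw` so that only the byte values selected by `mode` are
// written as {value, count} pairs. Everything else is copied literally.
// mode: 0 = raw copy, 1 = 0x00 runs, 2 = 0xFF runs, 3 = both.
std::vector<uint8_t> rle_encode(const std::vector<uint8_t>& raw, int mode) {
    assert(mode >= 0 && mode <= 3);
    std::vector<uint8_t> out;
    size_t i = 0;
    while (i < raw.size()) {
        const uint8_t b = raw[i];
        const bool run_byte = ((mode & 1) && b == 0x00) ||
                              ((mode & 2) && b == 0xFF);
        if (!run_byte) {
            out.push_back(b);
            ++i;
            continue;
        }
        // The count is a single byte, so a run is capped at 255.
        size_t run = 1;
        while (i + run < raw.size() && raw[i + run] == b && run < 255) {
            ++run;
        }
        out.push_back(b);
        out.push_back(static_cast<uint8_t>(run));
        i += run;
    }
    return out;
}

struct Encoded {
    int mode;                   // value for the COMPRESSION define
    std::vector<uint8_t> bytes; // array contents
};

// Tries every variant and keeps the smallest. Mode 0 (uncompressed)
// wins unless a variant is strictly smaller.
Encoded choose_smallest(const std::vector<uint8_t>& raw) {
    Encoded best{0, raw};
    for (int mode = 1; mode <= 3; ++mode) {
        std::vector<uint8_t> candidate = rle_encode(raw, mode);
        if (candidate.size() < best.bytes.size()) {
            best = Encoded{mode, std::move(candidate)};
        }
    }
    return best;
}
```

Note that a lone 0x00 or 0xFF costs two bytes in a mode that encodes it, which is why noisy images can end up larger and fall back to mode 0.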

Once this data is crunched, producing the actual header text is trivial.

The Arduino Prototype Firmware

As I said, I won't force QMK and a Lulu keyboard on you. Instead, we're using an Arduino-compatible dev kit to prototype this, with the same screen attached, but to an ESP32-S3 instead of the Lulu's AVR ATmega32U4. The main thing to bear in mind if working this way is that QMK is C and Arduino is C++, so write code that can be ported to your final environment. You can use some other Arduino board if you don't have an ESP32-S3. I have so many ESP32-S3s lying around that it made sense to use this one. You'll just have to change the board setting in platformio.ini to match your hardware.

This code is barebones. The interest was small size, not happy abstractions. I avoided all but the most utilitarian abstractions because I didn't want to waste flash space on them.

C++
#include <Arduino.h>
#include <Wire.h>
#ifdef ESP32
#define I2C_SDA 16
#define I2C_SCL 17
#endif

#define SSD1306_HEIGHT 32
const uint8_t output_frame_1[] PROGMEM = {
    0xff, 131, 0x3f, 0x9f, 0xcf, 0xef,
    0xe7, 0xe7, 0xf3, 0xf3, 0xfb, 0xfb,
    0xf9, 0xf9, 0xf9, 0xf9, 0xf9, 0xf9,
    0xf9, 0xfb, 0xfb, 0xf3, 0xf3, 0xe7,
    0xe7, 0xef, 0xcf, 0x9f, 0x3f, 0x7f,
    0xff, 98, 0x00, 2, 0xff, 6, 0x83,
    0x01, 0x83, 0xc7, 0xff, 7, 0x83,
    0x01, 0x83, 0xc7, 0xff, 7, 0x00, 2,
    0xff, 97, 0xfc, 0xf9, 0xf3, 0xe7,
    0xef, 0xcf, 0xcf, 0x9f, 0x9f, 0xbf,
    0xbf, 0x3f, 0x3f, 0x3f, 0x3f, 0x3f,
    0x3f, 0x3f, 0xbf, 0xbf, 0x9f, 0x9f,
    0xcf, 0xcf, 0xef, 0xe7, 0xf3, 0xf9,
    0xfc, 0xfe, 0xff, 96
};
#define OUTPUT_FRAME_1_COMPRESSION 3

// [Compressed to 16.40625% of original. Len = 84 vs 512]

const uint8_t output_frame_2[] PROGMEM = {
    0xff, 166, 0x3f, 0x3f, 0xbf, 0x9f,
    0x9f, 0xdf, 0xdf, 0xdf, 0xcf, 0xcf,
    0xcf, 0xcf, 0xcf, 0xdf, 0xdf, 0xdf,
    0x9f, 0x9f, 0xbf, 0x3f, 0x3f, 0x7f,
    0xff, 101, 0x03, 0xf1, 0xfc, 0xfe,
    0xfe, 0xff, 3, 0x0f, 0x07, 0x0f,
    0x9f, 0xff, 7, 0x0f, 0x07, 0x0f,
    0x9f, 0xff, 4, 0xfe, 0xfc, 0xf1,
    0x03, 0x0f, 0xff, 97, 0xf8, 0xf3,
    0xf7, 0xe7, 0xcf, 0xcf, 0xdf, 0x9f,
    0x9e, 0xbf, 0xbf, 0xbf, 0x3f, 0x3f,
    0x3f, 0x3f, 0x3f, 0xbf, 0xbf, 0xbe,
    0x9f, 0x9f, 0xdf, 0xcf, 0xcf, 0xe7,
    0xf7, 0xf3, 0xf8, 0xfc, 0xff, 64
};
#define OUTPUT_FRAME_2_COMPRESSION 3

// [Compressed to 16.40625% of original. Len = 84 vs 512]

const uint8_t output_frame_3[] PROGMEM = {
    0xff, 73, 0x7f, 0x3f, 0xbf, 0xbf,
    0x9f, 0x9f, 0x9f, 0x9f, 0x9f, 0x9f,
    0x9f, 0xbf, 0xbf, 0x3f, 0x7f, 0x7f,
    0xff, 105, 0x8f, 0xe7, 0xf3, 0xf9,
    0xfc, 0xfe, 0xfe, 0xff, 2, 0xff, 11,
    0xff, 3, 0xfe, 0xfc, 0xf9, 0xf3,
    0xe7, 0x8f, 0x1f, 0xff, 97, 0x00, 2,
    0xff, 6, 0xe0, 0xc0, 0xe0, 0xf1,
    0xff, 7, 0xe0, 0xc0, 0xe0, 0xf1,
    0xff, 7, 0x00, 2, 0xff, 98, 0xfc,
    0xf9, 0xf3, 0xe7, 0xef, 0xcf, 0xdf,
    0x9f, 0x9f, 0xbf, 0x3f, 0x3f, 0x3f,
    0x3f, 0x3f, 0x3f, 0x3f, 0xbf, 0x9f,
    0x9f, 0xdf, 0xcf, 0xef, 0xe7, 0xf3,
    0xf9, 0xfc, 0xfe, 0xff, 33
};
#define OUTPUT_FRAME_3_COMPRESSION 3

// [Compressed to 17.96875% of original. Len = 92 vs 512]

const uint8_t output_frame_4[] PROGMEM = {
    0xff, 239, 0x7f, 0x7f, 0x7f, 0x7f,
    0xff, 110, 0x0f, 0xe7, 0xe7, 0xf3,
    0xf9, 0xf9, 0xfd, 0x3c, 0x1c, 0x1e,
    0x1e, 0x3e, 0xfe, 0xfe, 0xfe, 0xfe,
    0xfe, 0xfe, 0x3e, 0x1e, 0x1e, 0x1e,
    0x3c, 0xfc, 0xfd, 0xf9, 0xf9, 0xf3,
    0xe7, 0xe7, 0x0f, 0xff, 97, 0xf8,
    0xf3, 0xf3, 0xe7, 0xcf, 0xcf, 0xdf,
    0x9e, 0x9c, 0xbc, 0xbc, 0xbe, 0xbf,
    0x3f, 0x3f, 0x3f, 0x3f, 0x3f, 0xbe,
    0xbc, 0xbc, 0xbc, 0x9e, 0x9f, 0xdf,
    0xcf, 0xcf, 0xe7, 0xf3, 0xf3, 0xf8
};
#define OUTPUT_FRAME_4_COMPRESSION 3

// [Compressed to 14.0625% of original. Len = 72 vs 512]

const uint8_t* output_images[] = {
    output_frame_1,
    output_frame_2,
    output_frame_3,
    output_frame_4
};
const int output_images_compression[] = {
    OUTPUT_FRAME_1_COMPRESSION,
    OUTPUT_FRAME_2_COMPRESSION,
    OUTPUT_FRAME_3_COMPRESSION,
    OUTPUT_FRAME_4_COMPRESSION
};

#if SSD1306_HEIGHT == 32
const uint8_t ssd1306_init[] PROGMEM = {
    17,
    0xAE, 0,
    0xA8, 1, 0x1F,
    0x20, 1, 0x00,
    0x40, 0,
    0xD3, 1, 0x00,
    0xA1, 0,
    0xC8, 0,
    0xDA, 1, 0x02,
    0x81, 1, 0x7F,
    0xA4, 0,
    0xA6, 0,
    0xD5, 1, 0x80,
    0xD9, 1, 0xc2,
    0xDB, 1, 0x20,
    0x8D, 1, 0x14,
    0x2E, 0,
    0xAF, 0};
#endif
#if SSD1306_HEIGHT == 64
const uint8_t ssd1306_init[] PROGMEM = {
    17,
    0xAE, 0,
    0xA8, 1, 0x3F,
    0x20, 1, 0x00,
    0x40, 0,
    0xD3, 1, 0x00,
    0xA1, 0,
    0xC8, 0,
    0xDA, 1, 0x12,
    0x81, 1, 0x7F,
    0xA4, 0,
    0xA6, 0,
    0xD5, 1, 0x80,
    0xD9, 1, 0xc2,
    0xDB, 1, 0x20,
    0x8D, 1, 0x14,
    0x2E, 0,
    0xAF, 0};
#endif

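// Streams image `index` from flash to the display, expanding RLE runs on the fly
void ssd1306_send_screen(int index)
{
    const uint8_t *data = output_images[index];
    int comp = output_images_compression[index];
    Wire.beginTransmission(0x3C);

    // control byte 0x00 = commands follow; 0x22 sets the page address range
    Wire.write(0x00);
    Wire.write(0x22);
    Wire.write(0x00);
    Wire.write(0xFF);

    // 0x21 sets the column address range to 0..127
    Wire.write(0x00);
    Wire.write(0x21);
    Wire.write(0x00);
    Wire.write(0x7F);

    Wire.endTransmission();

    // room left in the Wire buffer, minus one slot for the 0x40 control byte
    size_t rem = I2C_BUFFER_LENGTH - 1;
    int len = 0; // framebuffer bytes sent so far
    Wire.beginTransmission(0x3C);
    Wire.write(0x40); // control byte 0x40 = display data follows
    while (len < (SSD1306_HEIGHT * 16)) // 128 * height / 8 bytes per frame
    {
        uint8_t b = pgm_read_byte(data++);
        uint8_t count = 1;
        // a repeat count follows only the byte values this mode compresses
        if (((comp == 1 || comp == 3) && b == 0) ||
            ((comp == 2 || comp == 3) && b == 255))
        {
            count = pgm_read_byte(data++);
        }
        while (count--)
        {
            Wire.write(b);
            ++len;
            --rem;
            if (rem == 0)
            {
                // buffer is full: flush it and start a new data transmission
                rem = I2C_BUFFER_LENGTH - 1;
                Wire.endTransmission();
                Wire.beginTransmission(0x3C);
                Wire.write(0x40);
            }
        }
    }
    Wire.endTransmission();
}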
void setup()
{
#ifdef ESP32
    Wire.begin(I2C_SDA, I2C_SCL, 800 * 1000); // custom pins, 800kHz bus clock
#else
    Wire.begin();
#endif
    Serial.begin(115200);
    Wire.beginTransmission(0x3C);
    // the init table is a command count followed by
    // {command, argument count, arguments...} entries
    const uint8_t *init = ssd1306_init;
    uint8_t len = pgm_read_byte(init);
    const uint8_t *p = init + 1;
    while (len--)
    {
        Wire.write(0x00); // control byte: command
        Wire.write(pgm_read_byte(p++));
        uint8_t arglen = pgm_read_byte(p++);
        while (arglen--)
            Wire.write(pgm_read_byte(p++));
    }
    Wire.endTransmission();
}

void loop()
{
    static int index = 0;
    ssd1306_send_screen(index++);
    delay(100);
    if (index == 4)
    {
        index = 0;
    }
}

What's of primary interest here is ssd1306_send_screen(). This routine takes the contents of our images, decompressing them as necessary, and sends them straight to the screen. It doesn't really take any SRAM to operate other than what's used for the stack frame, since we decompress everything straight to the display. As you can see, the decompression is stupid simple: a byte is followed by a repeat count only if it's a value the chosen mode compresses, so a single if() test covers all methods. The more complicated bit is actually making sure we don't overrun the I2C transmission buffer in Arduino. When the buffer fills up, we simply end the transmission and start a new one.
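If you want to sanity check generated arrays on a desktop before flashing, the same decode logic can be sketched without the Wire calls. `rle_decode` is a hypothetical helper, not part of the project. It mirrors the if() test in ssd1306_send_screen() but writes to a vector, and it adds bounds checks the firmware doesn't need.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <vector>

// Expands one cigen-style array into a full framebuffer.
// comp: 0 = raw, 1 = 0x00 runs, 2 = 0xFF runs, 3 = both.
// Stops once `frame_bytes` have been produced or the input runs out.
std::vector<uint8_t> rle_decode(const uint8_t* data, size_t data_len,
                                int comp, size_t frame_bytes) {
    assert(data != nullptr);
    std::vector<uint8_t> out;
    out.reserve(frame_bytes);
    size_t i = 0;
    while (out.size() < frame_bytes && i < data_len) {
        const uint8_t b = data[i++];
        size_t count = 1;
        const bool run_byte = ((comp & 1) && b == 0x00) ||
                              ((comp & 2) && b == 0xFF);
        // A compressible byte is followed by its repeat count. If the
        // input is truncated, the byte is treated as a single literal.
        if (run_byte && i < data_len) {
            count = data[i++];
        }
        for (size_t n = 0; n < count && out.size() < frame_bytes; ++n) {
            out.push_back(b);
        }
    }
    return out;
}
```

Decoding output_frame_1 with comp set to 3 and frame_bytes set to 512 should produce exactly 512 bytes. If it comes up short, the array or the compression define was probably copied incorrectly.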

On the ESP32-S3, this code, including the 4 embedded images, takes 3.5KB of flash according to the build statistics (277765 - 274181 = 3584 bytes), plus 24 bytes of static RAM (18904 - 18880). I arrived at these figures by comparing the build sizes for an empty project versus an empty project with this code.

(empty project)
RAM:   [=         ]   5.8% (used 18880 bytes from 327680 bytes)
Flash: [=         ]   8.2% (used 274181 bytes from 3342336 bytes)

(project with code)
RAM:   [=         ]   5.8% (used 18904 bytes from 327680 bytes)
Flash: [=         ]   8.3% (used 277765 bytes from 3342336 bytes)

History

  • 24th February, 2024 - Initial submission

License

This article, along with any associated source code and files, is licensed under The MIT License


Written By
United States
Just a shiny lil monster. Casts spells in C++. Mostly harmless.
