Click here to Skip to main content
15,879,239 members
Please Sign up or sign in to vote.
0.00/5 (No votes)
See more:
This is complicated so bear with me.

I have a very performance sensitive codepath I'm working on, for low level IO comms with LCD displays, so there's a lot of data I have to move quickly.

I am working with an offering called TFT_eSPI GitHub - Bodmer/TFT_eSPI: Arduino and PlatformIO IDE compatible TFT library optimised for the Raspberry Pi Pico (RP2040), STM32, ESP8266 and ESP32 that supports different driver chips[^] as a reference implementation.

A big problem with it that templates would solve is that you cannot drive multiple displays at the same time with his code, since it's singleton.

It's complicated and I want to port it since we both license MIT. The primary issue is that it is a class backed by all static members and augmented with a ton of preprocessor macros like this:

Forgive the long code, it's important to understand the full scope of the issue

C++
#if defined (TFT_PARALLEL_8_BIT)
  // Create a bit set lookup table for data bus - wastes 1kbyte of RAM but speeds things up dramatically
  // can then use e.g. GPIO.out_w1ts = set_mask(0xFF); to set data bus to 0xFF
  #define CONSTRUCTOR_INIT_TFT_DATA_BUS            \
  for (int32_t c = 0; c<256; c++)                  \
  {                                                \
    xset_mask[c] = 0;                              \
    if ( c & 0x01 ) xset_mask[c] |= (1 << TFT_D0); \
    if ( c & 0x02 ) xset_mask[c] |= (1 << TFT_D1); \
    if ( c & 0x04 ) xset_mask[c] |= (1 << TFT_D2); \
    if ( c & 0x08 ) xset_mask[c] |= (1 << TFT_D3); \
    if ( c & 0x10 ) xset_mask[c] |= (1 << TFT_D4); \
    if ( c & 0x20 ) xset_mask[c] |= (1 << TFT_D5); \
    if ( c & 0x40 ) xset_mask[c] |= (1 << TFT_D6); \
    if ( c & 0x80 ) xset_mask[c] |= (1 << TFT_D7); \
  }                                                \

  // Mask for the 8 data bits to set pin directions
  #define dir_mask ((1 << TFT_D0) | (1 << TFT_D1) | (1 << TFT_D2) | (1 << TFT_D3) | (1 << TFT_D4) | (1 << TFT_D5) | (1 << TFT_D6) | (1 << TFT_D7))

  #if (TFT_WR >= 32)
    // Data bits and the write line are cleared sequentially
    #define clr_mask (dir_mask); WR_L
  #elif (TFT_WR >= 0)
    // Data bits and the write line are cleared to 0 in one step (1.25x faster)
    #define clr_mask (dir_mask | (1 << TFT_WR))
  #else
    #define clr_mask
  #endif

  // A lookup table is used to set the different bit patterns, this uses 1kByte of RAM
  #define set_mask(C) xset_mask[C] // 63fps Sprite rendering test 33% faster, graphicstest only 1.8% faster than shifting in real time

  // Real-time shifting alternative to above to save 1KByte RAM, 47 fps Sprite rendering test
  /*#define set_mask(C) (((C)&0x80)>>7)<<TFT_D7 | (((C)&0x40)>>6)<<TFT_D6 | (((C)&0x20)>>5)<<TFT_D5 | (((C)&0x10)>>4)<<TFT_D4 | \
                        (((C)&0x08)>>3)<<TFT_D3 | (((C)&0x04)>>2)<<TFT_D2 | (((C)&0x02)>>1)<<TFT_D1 | (((C)&0x01)>>0)<<TFT_D0
  //*/

  // Write 8 bits to TFT
  #define tft_Write_8(C)  GPIO.out_w1tc = clr_mask; GPIO.out_w1ts = set_mask((uint8_t)(C)); WR_H

  #if defined (SSD1963_DRIVER)

    // Write 18 bit color to TFT
    #define tft_Write_16(C) GPIO.out_w1tc = clr_mask; GPIO.out_w1ts = set_mask((uint8_t) (((C) & 0xF800)>> 8)); WR_H; \
                            GPIO.out_w1tc = clr_mask; GPIO.out_w1ts = set_mask((uint8_t) (((C) & 0x07E0)>> 3)); WR_H; \
                            GPIO.out_w1tc = clr_mask; GPIO.out_w1ts = set_mask((uint8_t) (((C) & 0x001F)<< 3)); WR_H

    // 18 bit color write with swapped bytes
    #define tft_Write_16S(C) uint16_t Cswap = ((C) >>8 | (C) << 8); tft_Write_16(Cswap)

  #else

    #ifdef PSEUDO_16_BIT
      // One write strobe for both bytes
      #define tft_Write_16(C)  GPIO.out_w1tc = clr_mask; GPIO.out_w1ts = set_mask((uint8_t) ((C) >> 0)); WR_H
      #define tft_Write_16S(C) GPIO.out_w1tc = clr_mask; GPIO.out_w1ts = set_mask((uint8_t) ((C) >> 8)); WR_H
    #else
      // Write 16 bits to TFT
      #define tft_Write_16(C) GPIO.out_w1tc = clr_mask; GPIO.out_w1ts = set_mask((uint8_t) ((C) >> 8)); WR_H; \
                              GPIO.out_w1tc = clr_mask; GPIO.out_w1ts = set_mask((uint8_t) ((C) >> 0)); WR_H

      // 16 bit write with swapped bytes
      #define tft_Write_16S(C) GPIO.out_w1tc = clr_mask; GPIO.out_w1ts = set_mask((uint8_t) ((C) >> 0)); WR_H; \
                               GPIO.out_w1tc = clr_mask; GPIO.out_w1ts = set_mask((uint8_t) ((C) >> 8)); WR_H
    #endif

  #endif

  // Write 32 bits to TFT
  #define tft_Write_32(C) GPIO.out_w1tc = clr_mask; GPIO.out_w1ts = set_mask((uint8_t) ((C) >> 24)); WR_H; \
                          GPIO.out_w1tc = clr_mask; GPIO.out_w1ts = set_mask((uint8_t) ((C) >> 16)); WR_H; \
                          GPIO.out_w1tc = clr_mask; GPIO.out_w1ts = set_mask((uint8_t) ((C) >>  8)); WR_H; \
                          GPIO.out_w1tc = clr_mask; GPIO.out_w1ts = set_mask((uint8_t) ((C) >>  0)); WR_H

  // Write two concatenated 16 bit values to TFT
  #define tft_Write_32C(C,D) GPIO.out_w1tc = clr_mask; GPIO.out_w1ts = set_mask((uint8_t) ((C) >> 8)); WR_H; \
                             GPIO.out_w1tc = clr_mask; GPIO.out_w1ts = set_mask((uint8_t) ((C) >> 0)); WR_H; \
                             GPIO.out_w1tc = clr_mask; GPIO.out_w1ts = set_mask((uint8_t) ((D) >> 8)); WR_H; \
                             GPIO.out_w1tc = clr_mask; GPIO.out_w1ts = set_mask((uint8_t) ((D) >> 0)); WR_H

  // Write 16 bit value twice to TFT - used by drawPixel()
  #define tft_Write_32D(C) GPIO.out_w1tc = clr_mask; GPIO.out_w1ts = set_mask((uint8_t) ((C) >> 8)); WR_H; \
                           GPIO.out_w1tc = clr_mask; GPIO.out_w1ts = set_mask((uint8_t) ((C) >> 0)); WR_H; \
                           GPIO.out_w1tc = clr_mask; GPIO.out_w1ts = set_mask((uint8_t) ((C) >> 8)); WR_H; \
                           GPIO.out_w1tc = clr_mask; GPIO.out_w1ts = set_mask((uint8_t) ((C) >> 0)); WR_H

   // Read pin
  #ifdef TFT_RD
    #if (TFT_RD >= 32)
      #define RD_L GPIO.out1_w1tc.val = (1 << (TFT_RD - 32))
      #define RD_H GPIO.out1_w1ts.val = (1 << (TFT_RD - 32))
    #elif (TFT_RD >= 0)
      #define RD_L GPIO.out_w1tc = (1 << TFT_RD)
      //#define RD_L digitalWrite(TFT_WR, LOW)
      #define RD_H GPIO.out_w1ts = (1 << TFT_RD)
      //#define RD_H digitalWrite(TFT_WR, HIGH)
    #else
      #define RD_L
      #define RD_H
    #endif
  #endif

////////////////////////////////////////////////////////////////////////////////////////
// Macros to write commands/pixel colour data to a SPI ILI948x TFT
////////////////////////////////////////////////////////////////////////////////////////
#elif  defined (SPI_18BIT_DRIVER) // SPI 18 bit colour

  // Write 8 bits to TFT
  #define tft_Write_8(C)   spi.transfer(C)

  // Convert 16 bit colour to 18 bit and write in 3 bytes
  #define tft_Write_16(C)  spi.transfer(((C) & 0xF800)>>8); \
                           spi.transfer(((C) & 0x07E0)>>3); \
                           spi.transfer(((C) & 0x001F)<<3)

  // Future option for transfer without wait
  #define tft_Write_16N(C) tft_Write_16(C)

  // Convert swapped byte 16 bit colour to 18 bit and write in 3 bytes
  #define tft_Write_16S(C) spi.transfer((C) & 0xF8); \
                           spi.transfer(((C) & 0xE000)>>11 | ((C) & 0x07)<<5); \
                           spi.transfer(((C) & 0x1F00)>>5)

  // Write 32 bits to TFT
  #define tft_Write_32(C)  spi.write32(C)

  // Write two concatenated 16 bit values to TFT
  #define tft_Write_32C(C,D) spi.write32((C)<<16 | (D))

  // Write 16 bit value twice to TFT
  #define tft_Write_32D(C)  spi.write32((C)<<16 | (C))

////////////////////////////////////////////////////////////////////////////////////////
// Macros to write commands/pixel colour data to an Raspberry Pi TFT
////////////////////////////////////////////////////////////////////////////////////////
#elif  defined (RPI_DISPLAY_TYPE)

  // ESP32 low level SPI writes for 8, 16 and 32 bit values
  // to avoid the function call overhead
  #define TFT_WRITE_BITS(D, B) \
  WRITE_PERI_REG(SPI_MOSI_DLEN_REG(SPI_PORT), B-1); \
  WRITE_PERI_REG(SPI_W0_REG(SPI_PORT), D); \
  SET_PERI_REG_MASK(SPI_CMD_REG(SPI_PORT), SPI_USR); \
  while (READ_PERI_REG(SPI_CMD_REG(SPI_PORT))&SPI_USR);

  // Write 8 bits
  #define tft_Write_8(C) TFT_WRITE_BITS((C)<<8, 16)

  // Write 16 bits with corrected endianess for 16 bit colours
  #define tft_Write_16(C) TFT_WRITE_BITS((C)<<8 | (C)>>8, 16)

  // Future option for transfer without wait
  #define tft_Write_16N(C) tft_Write_16(C)

  // Write 16 bits
  #define tft_Write_16S(C) TFT_WRITE_BITS(C, 16)

  // Write 32 bits
  #define tft_Write_32(C) TFT_WRITE_BITS(C, 32)

  // Write two address coordinates
  #define tft_Write_32C(C,D)  TFT_WRITE_BITS((C)<<24 | (C), 32); \
                              TFT_WRITE_BITS((D)<<24 | (D), 32)

  // Write same value twice
  #define tft_Write_32D(C) tft_Write_32C(C,C)

////////////////////////////////////////////////////////////////////////////////////////
// Macros for all other SPI displays
////////////////////////////////////////////////////////////////////////////////////////
#else
// Replacement slimmer macros
  #define TFT_WRITE_BITS(D, B) *_spi_mosi_dlen = B-1;    \
                               *_spi_w = D;             \
                               *_spi_cmd = SPI_USR;      \
                        while (*_spi_cmd & SPI_USR);

  // Write 8 bits
  #define tft_Write_8(C) TFT_WRITE_BITS(C, 8)

  // Write 16 bits with corrected endianess for 16 bit colours
  #define tft_Write_16(C) TFT_WRITE_BITS((C)<<8 | (C)>>8, 16)

  // Future option for transfer without wait
  #define tft_Write_16N(C) *_spi_mosi_dlen = 16-1;       \
                           *_spi_w = ((C)<<8 | (C)>>8); \
                           *_spi_cmd = SPI_USR;

  // Write 16 bits
  #define tft_Write_16S(C) TFT_WRITE_BITS(C, 16)

  // Write 32 bits
  #define tft_Write_32(C) TFT_WRITE_BITS(C, 32)

  // Write two address coordinates
  #define tft_Write_32C(C,D)  TFT_WRITE_BITS((uint16_t)((D)<<8 | (D)>>8)<<16 | (uint16_t)((C)<<8 | (C)>>8), 32)

  // Write same value twice
  #define tft_Write_32D(C) TFT_WRITE_BITS((uint16_t)((C)<<8 | (C)>>8)<<16 | (uint16_t)((C)<<8 | (C)>>8), 32)

...
#endif


Essentially I want to templatize this but the problem is pin assignments are currently like:

C++
// the data control pin is GPIO #5
#define TFT_CS 5


And these are used in all the macros.

I want to replace them as template parameters, and resolve the #ifdefs above using C++ ifs/constexpr to generate the equivalent string of machine code on the target platform

Here's the issue though: I need a lot of the above preprocessor augmentations for the actual implementation, but I'd like them in the C++ source file, rather than the header, for what I hope are obvious reasons involving namespace pollution and also to limit the code smell to a CPP implementation file.

However, when I go to make my pin assignments as template arguments I do like:

C++
template<int8_t SpiHost,int8_t PinCS, int8_t PinDC, int8_t PinMiso, int8_t PinMosi> spi_bus {
   constexpr static const int8_t spi_host = SpiHost;
   constexpr static const int8_t pin_cs = PinCS;
   constexpr static const int8_t pin_dc = PinDC;
   constexpr static const int8_t pin_miso = PinMiso;
   constexpr static const int8_t pin_mosi = PinMosi;
};

that's in the header.

I need to put the code for this template in a CPP file such that I can limit the preprocessor macros to that file.

Is there *any* way to access these pin assignments in a cpp file by hook or by crook?

I don't care if it fits the question or not. ANYTHING that will get me those pin assignments in a CPP -- essentially any way I can tie a generic template (not an instantiation of it) to a CPP file at all. Or if not that, any way to hide those preprocessor macros or localize them to a CPP file you can think of, while accessing those pin constexpr values (even if they're transfered somehow - BUT THEY MUST BE compile time - not simply static for performance reasons)

I've thought about using #undef but there are several problems with that kludge, including the fact, that I need different preprocessor macros to support different MCUs so I'd have to put undefs in each processor specific include.

Now that you understand the full scope of the problem, any solution that would get those pins by template in a CPP without instantiating the template would solve my problem, even if it "fakes" it somehow. I'm just at a loss as to how to solve this.

I need ideas. I have to target C++14 or earlier. Specifically C++14 for reasons. But the solution *can* be GCC specific.

What I have tried:

I've tried a bunch of stuff, but nothing compiles.
Posted
Comments
k5054 8-Dec-21 11:33am    
Have you read (and understood) this: Template Instantiation (Using the GNU Compiler Collection (GCC))[^] I'm wondering if extern templates do what you need. If you are using G++-8 or earlier, there's also a -frepo option that might help. You could drill down from here GCC online documentation- GNU Project[^] to get to the closest G++ version to what you're using. All this supposes that the g++ provided by the device manufacturer has not omitted this from their toolchain.
honey the codewitch 8-Dec-21 13:59pm    
I'll look into that, thanks.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900