|
|
Is the processor just that slow, or do you think that profiling your code might uncover things that could be sped up to reach an acceptable level?
|
|
|
|
|
It's difficult to profile because it's cross compiled to run on an IoT device, where I don't have access to profiling.
I *can* do it but it involves setting up a lot of test code in order to use GFX from my PC.
Besides, I know where it's taking the time, and there's not much I can do about it.
The source machine is running at between 160MHz and 240MHz and the operation cannot be readily parallelized without using RAM I don't have.
Algorithmically, I've optimized it about as much as I can. The mixing plan for finding two dither colors is roughly O(log N), which isn't that bad. The problem is I have to do it for 200x200 pixels. And for my color display I have to do it twice per frame, because it's a 3-color display organized in two monochrome planes (one white and one red) and I don't have the RAM to store the frame between rendering the two planes.
On a superscalar PC running at GHz speeds this is no problem, and I *thought* it shouldn't be a problem even for this machine, but I guess I dramatically underestimated the time it takes this algorithm to run.
I actually have two algos. The first one is similar to the one used by Photoshop (but different enough that it avoids patent infringement).
The second one is a much faster, simpler algorithm.
Color dithers are apparently not as easy as I thought.
Real programmers use butterflies
|
|
|
|
|
I knew nothing of dithering, so I just read about it. Interesting. Eventually you'll be a numerical analysis weenie.
It also means that what I was going to ask, namely whether you could cache frequently used results, seems to make no sense.
|
|
|
|
|
Yeah. My current plan is to stick with nearest matching color only for color displays. I'm going to try dithering once again on black and white since those algorithms run much faster.
Edit: Black and white dithering is fast fast fast, so I've added support for it to b&w e-paper displays. I don't do it for monochrome displays yet, and I'm not sure I will, since they can update in real time and dithering interferes with that. But since you can turn it off, maybe I'll add support for it. I only do nearest color matching for color e-ink displays. The color dithering, as I said, was just too expensive.
Real programmers use butterflies
modified 18-Jun-21 1:35am.
|
|
|
|
|
Best to begin without morals.
|
|
|
|
|
Well.. look at it this way, the code you took 1 day to write is almost as good as this PhD code the guy spent a few years perfecting!
|
|
|
|
|
 I guess you do a bit of rounding then? If you do, it may be that it is possible to improve that by a lot of bit-fiddling …
On x64 I've seen significant performance improvement, between 20% and 300%, for the following:
isnan, isinf, signbit, frexp, min, max, trunc, round, clamp and lerp
I have no idea how well this will work out for an ARM CPU, but here is the core of my implementation (sorry about the formatting; paste and encode as HTML doesn't work well for C++ code anymore):
template <typename T>
struct FractionWidth;
template <>
struct FractionWidth<float>
{
static constexpr UInt32 value = 23;
};
template <>
struct FractionWidth<double>
{
static constexpr UInt32 value = 52;
};
template <typename T>
struct ExponentWidth;
template <>
struct ExponentWidth<float>
{
static constexpr UInt32 value = 8;
};
template <>
struct ExponentWidth<double>
{
static constexpr UInt32 value = 11;
};
template <typename T>
struct ExponenBias;
template <>
struct ExponenBias<float>
{
static constexpr UInt32 value = _FBIAS;
};
template <>
struct ExponenBias<double>
{
static constexpr UInt32 value = _DBIAS;
};
template <typename T>
struct InfinityUnsignedValue;
template <>
struct InfinityUnsignedValue<float>
{
static constexpr UInt32 value = 0X7F800000UL;
};
template <>
struct InfinityUnsignedValue<double>
{
static constexpr UInt64 value = 0x7FF0000000000000ULL;
};
template <typename T>
struct NegativeInfinityUnsignedValue;
template <>
struct NegativeInfinityUnsignedValue<float>
{
static constexpr UInt32 value = 0xFF800000UL;
};
template <>
struct NegativeInfinityUnsignedValue<double>
{
static constexpr UInt64 value = 0xFFF0000000000000ULL;
};
template <typename T>
struct QuietNaNUnsignedValue;
template <>
struct QuietNaNUnsignedValue<float>
{
static constexpr UInt32 value = 0XFFC00001UL;
};
template <>
struct QuietNaNUnsignedValue<double>
{
static constexpr UInt64 value = 0x7FF0000000000001ULL;
};
#pragma pack(push,1)
template<typename T>
struct FloatingPoint
{
using ValueType = std::remove_cvref_t<T>;
using UIntType = MakeUnsigned<ValueType>;
static constexpr Int32 FractionWidth = static_cast<Int32>( Internal::FractionWidth<ValueType>::value );
static constexpr Int32 ExponentWidth = static_cast<Int32>( Internal::ExponentWidth<ValueType>::value );
static constexpr Int32 ExponentBias = ( 1 << ( ExponentWidth - 1 ) ) - 1;
static constexpr Int32 MaxExponentValue = ( 1 << ExponentWidth ) - 1;
static constexpr UIntType MaxExponent = static_cast<UIntType>( MaxExponentValue ) << FractionWidth;
static constexpr UIntType MinSubnormal = UIntType( 1 );
static constexpr UIntType MaxSubnormal = ( UIntType( 1 ) << FractionWidth ) - 1;
static constexpr UIntType MinNormal = ( UIntType( 1 ) << FractionWidth );
static constexpr UIntType MaxNormal = ( ( UIntType( MaxExponentValue ) - 1 ) << FractionWidth ) | MaxSubnormal;
static constexpr UIntType FractionMask = FractionMask<ValueType, UIntType>;
static constexpr UIntType ExponentMask = ExponentMask<ValueType, UIntType>;
static constexpr UIntType SignMask = ~( FractionMask | ExponentMask );
static constexpr UIntType InfinityValue = InfinityUnsignedValue<ValueType>::value;
static constexpr UIntType NegativeInfinityValue = NegativeInfinityUnsignedValue<ValueType>::value;
static constexpr UIntType QuietNaNValue = QuietNaNUnsignedValue<ValueType>::value;
static constexpr UIntType ZeroValue = static_cast<UIntType>( 0 );
static constexpr UIntType NegativeZeroValue = SignMask;
UIntType value_;
constexpr FloatingPoint( ) noexcept
: value_( std::bit_cast<UIntType>( static_cast<ValueType>( 0.0 ) ) )
{
}
constexpr explicit FloatingPoint( ValueType value ) noexcept
: value_( std::bit_cast<UIntType>( value ) )
{
}
constexpr explicit FloatingPoint( UIntType value, bool ) noexcept
: value_( value )
{
}
constexpr explicit FloatingPoint( UIntType fraction, Int32 exponent, bool sign) noexcept
: value_( (fraction & FractionMask ) |
(( static_cast<UIntType>( exponent ) << FractionWidth ) & ExponentMask) |
( sign? SignMask : 0 ) )
{
}
constexpr FloatingPoint& operator = ( ValueType value ) noexcept
{
value_ = std::bit_cast<UIntType>( value );
return *this;
}
constexpr bool Sign( ) const noexcept
{
return ( value_ & SignMask ) != 0;
}
constexpr void SetSign( bool value = true ) noexcept
{
if ( value )
{
value_ |= SignMask;
}
else
{
value_ &= ~SignMask;
}
}
constexpr Int32 Exponent( ) const noexcept
{
return static_cast<Int32>( ( value_ & ExponentMask ) >> FractionWidth ) - ExponentBias;
}
private:
constexpr void SetExponent( UIntType value ) noexcept
{
value_ = ( value << FractionWidth ) & ExponentMask;
}
public:
constexpr UIntType Fraction( ) const noexcept
{
return value_ & FractionMask;
}
private:
constexpr void SetFraction( UIntType value ) noexcept
{
value_ = value & FractionMask;
}
public:
constexpr bool IsZero( ) const noexcept
{
return (value_ & ( ExponentMask | FractionMask )) == 0;
}
constexpr bool IsInf( ) const noexcept
{
return ( value_ & FractionMask ) == 0 && ( ( value_ & ExponentMask ) == MaxExponent );
}
constexpr bool IsNaN( ) const noexcept
{
return ( ( value_ & ExponentMask ) == MaxExponent ) && ( ( value_ & FractionMask ) != 0 );
}
constexpr bool IsInfOrNaN( ) const noexcept
{
return ( value_ & ExponentMask ) == MaxExponent;
}
static constexpr ValueType MakeNaN( UIntType value ) noexcept
{
UIntType result;
result = MaxExponent | (value & FractionMask);
return std::bit_cast<ValueType>( result );
}
constexpr ValueType AsFloatingPoint( ) const noexcept
{
return std::bit_cast<ValueType>( value_ );
}
constexpr UIntType AsUnsigned( ) const noexcept
{
return value_;
}
static constexpr FloatingPoint Zero( ) noexcept
{
return FloatingPoint( );
}
static constexpr FloatingPoint NegZero( ) noexcept
{
FloatingPoint result;
result.value_ = SignMask;
return result;
}
static constexpr FloatingPoint Inf( ) noexcept
{
FloatingPoint result;
result.value_ = MaxExponent;
return result;
}
static constexpr FloatingPoint NegInf( ) noexcept
{
FloatingPoint result;
result.value_ = MaxExponent | SignMask;
return result;
}
constexpr ValueType Trunc( ) const noexcept
{
if ( IsInfOrNaN( ) )
{
return std::bit_cast<ValueType>(value_);
}
Int32 exponent = Exponent( );
if ( exponent >= static_cast<Int32>( FractionWidth ) )
{
return std::bit_cast<ValueType>( value_ );
}
if ( exponent <= -1 )
{
return Sign() ? static_cast<ValueType>( -0.0 ) : static_cast<ValueType>( 0.0 );
}
Int32 trimSize = FractionWidth - exponent;
UIntType result = (value_ & (SignMask | ExponentMask)) | (( (value_ & FractionMask) >> trimSize ) << trimSize);
return std::bit_cast<ValueType>( result );
}
constexpr ValueType Ceil( ) const noexcept
{
if ( IsInfOrNaN( ) || IsZero( ) )
{
return std::bit_cast<ValueType>( value_ );
}
Int32 exponent = Exponent( );
if ( exponent >= static_cast<Int32>( FractionWidth ) )
{
return std::bit_cast<ValueType>( value_ );
}
if ( exponent <= -1 )
{
return Sign() ? ValueType( -0.0 ) : ValueType( 1.0 );
}
Int32 trimSize = FractionWidth - exponent;
UIntType result = ( value_ & ( SignMask | ExponentMask ) ) | ( ( ( value_ & FractionMask ) >> trimSize ) << trimSize );
if ( result == value_ )
{
return std::bit_cast<ValueType>( value_ );
}
return Sign( ) ? std::bit_cast<ValueType>( result ) : std::bit_cast<ValueType>( result ) + static_cast<ValueType>( 1.0 );
}
constexpr ValueType Floor( ) const noexcept
{
if ( Sign() )
{
FloatingPoint tmp( value_ & ( ExponentMask | FractionMask ), true );
return -tmp.Ceil( );
}
else
{
return Trunc( );
}
}
constexpr ValueType Round( ) const noexcept
{
if ( IsInfOrNaN( ) || IsZero( ) )
{
return std::bit_cast<ValueType>(value_);
}
int exponent = Exponent( );
if ( exponent >= static_cast<int>( FractionWidth ) )
{
return std::bit_cast<ValueType>( value_ );
}
if ( exponent == -1 )
{
bool isNegative = Sign( );
if ( isNegative )
{
return static_cast<ValueType>( -1.0 );
}
else
{
return static_cast<ValueType>( 1.0 );
}
}
if ( exponent <= -2 )
{
bool isNegative = Sign( );
if ( isNegative )
{
return static_cast<ValueType>( -0.0 );
}
else
{
return static_cast<ValueType>( 0.0 );
}
}
UInt32 trimSize = FractionWidth - exponent;
bool middleBitSet = (value_ & FractionMask) & ( UIntType( 1 ) << ( trimSize - 1 ) );
UIntType result = ( value_ & ( SignMask | ExponentMask ) ) | ( ( ( value_ & FractionMask ) >> trimSize ) << trimSize );
if ( result == value_ )
{
return std::bit_cast<ValueType>( value_ );
}
if ( !middleBitSet )
{
return std::bit_cast<ValueType>( result );
}
else
{
bool isNegative = Sign( );
return isNegative ?
std::bit_cast<ValueType>( result ) - static_cast<ValueType>( 1.0 ) :
std::bit_cast<ValueType>( result ) + static_cast<ValueType>( 1.0 );
}
}
};
#pragma pack(pop)
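For anyone who wants to try the bit-fiddling idea without the surrounding library types, here is a minimal, self-contained sketch of the same truncation trick for `float` only. It is not the code from the post: `trunc_bits` is a made-up name, it assumes IEEE 754 binary32 floats, and it uses `std::memcpy` instead of `std::bit_cast` so it builds before C++20. Whether it actually beats the library `trunc` on a given target would have to be measured.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Truncate toward zero by clearing the fraction bits below the binary point.
// Assumption: float is IEEE 754 binary32 (23 fraction bits, exponent bias 127).
inline float trunc_bits(float x) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "binary32 expected");
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    const std::int32_t exponent =
        static_cast<std::int32_t>((bits >> 23) & 0xFFu) - 127;
    if (exponent >= 23) {
        return x;  // already integral, or Inf/NaN (exponent field all ones)
    }
    if (exponent < 0) {
        bits &= 0x80000000u;  // |x| < 1: keep only the sign, giving +0 or -0
    } else {
        // The low (23 - exponent) fraction bits encode the fractional part.
        const std::uint32_t fractional_bits = 0x007FFFFFu >> exponent;
        bits &= ~fractional_bits;
    }
    float result;
    std::memcpy(&result, &bits, sizeof result);
    return result;
}
```

Calling `trunc_bits(2.75f)` gives `2.0f` and `trunc_bits(-3.5f)` gives `-3.0f`; values with magnitude below 1 collapse to a signed zero.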
Espen Harlinn
Senior Architect - Ulriken Consulting AS
The competent programmer is fully aware of the strictly limited size of his own skull; therefore he approaches the programming task in full humility, and among other things he avoids clever tricks like the plague.Edsger W.Dijkstra
modified 21-Jun-21 19:41pm.
|
|
|
|
|
If it was closer to performing like I need I'd twiddle with optimizations like this, but this sort of improvement isn't going to change the code from taking minutes to taking seconds, and that's what I need.
I've basically abandoned color dithering for this project.
Real programmers use butterflies
|
|
|
|
|
 You made me curious. It looks like this worked. There were a couple quirk texts in your paste I had to eliminate, but the main thing was using Notepad++ to convert it to ANSI before pasting here. Weird - it took a bit of work, like 5 mins puttering...
For my own code, I do a two-step process. First paste as HTML and encode, then copy everything, delete it, and repaste as C++.
template <typename T>
struct FractionWidth;
template <>
struct FractionWidth<float>
{
static constexpr UInt32 value = 23;
};
template <>
struct FractionWidth<double>
{
static constexpr UInt32 value = 52;
};
template <typename T>
struct ExponentWidth;
template <>
struct ExponentWidth<float>
{
static constexpr UInt32 value = 8;
};
template <>
struct ExponentWidth<double>
{
static constexpr UInt32 value = 11;
};
template <typename T>
struct ExponenBias;
template <>
struct ExponenBias<float>
{
static constexpr UInt32 value = _FBIAS;
};
template <>
struct ExponenBias<double>
{
static constexpr UInt32 value = _DBIAS;
};
template <typename T>
struct InfinityUnsignedValue;
template <>
struct InfinityUnsignedValue<float>
{
static constexpr UInt32 value = 0X7F800000UL;
};
template <>
struct InfinityUnsignedValue<double>
{
static constexpr UInt64 value = 0x7FF0000000000000ULL;
};
template <typename T>
struct NegativeInfinityUnsignedValue;
template <>
struct NegativeInfinityUnsignedValue<float>
{
static constexpr UInt32 value = 0xFF800000UL;
};
template <>
struct NegativeInfinityUnsignedValue<double>
{
static constexpr UInt64 value = 0xFFF0000000000000ULL;
};
template <typename T>
struct QuietNaNUnsignedValue;
template <>
struct QuietNaNUnsignedValue<float>
{
static constexpr UInt32 value = 0XFFC00001UL;
};
template <>
struct QuietNaNUnsignedValue<double>
{
static constexpr UInt64 value = 0x7FF0000000000001ULL;
};
#pragma pack(push,1)
template<typename T>
struct FloatingPoint
{
using ValueType = std::remove_cvref_t<T>;
using UIntType = MakeUnsigned<ValueType>;
static constexpr Int32 FractionWidth = static_cast<Int32>( Internal::FractionWidth<ValueType>::value );
static constexpr Int32 ExponentWidth = static_cast<Int32>( Internal::ExponentWidth<ValueType>::value );
static constexpr Int32 ExponentBias = ( 1 << ( ExponentWidth - 1 ) ) - 1;
static constexpr Int32 MaxExponentValue = ( 1 << ExponentWidth ) - 1;
static constexpr UIntType MaxExponent = static_cast<UIntType>( MaxExponentValue ) << FractionWidth;
static constexpr UIntType MinSubnormal = UIntType( 1 );
static constexpr UIntType MaxSubnormal = ( UIntType( 1 ) << FractionWidth ) - 1;
static constexpr UIntType MinNormal = ( UIntType( 1 ) << FractionWidth );
static constexpr UIntType MaxNormal = ( ( UIntType( MaxExponentValue ) - 1 ) << FractionWidth ) | MaxSubnormal;
static constexpr UIntType FractionMask = FractionMask<ValueType, UIntType>;
static constexpr UIntType ExponentMask = ExponentMask<ValueType, UIntType>;
static constexpr UIntType SignMask = ~( FractionMask | ExponentMask );
static constexpr UIntType InfinityValue = InfinityUnsignedValue<ValueType>::value;
static constexpr UIntType NegativeInfinityValue = NegativeInfinityUnsignedValue<ValueType>::value;
static constexpr UIntType QuietNaNValue = QuietNaNUnsignedValue<ValueType>::value;
static constexpr UIntType ZeroValue = static_cast<UIntType>( 0 );
static constexpr UIntType NegativeZeroValue = SignMask;
UIntType value_;
constexpr FloatingPoint( ) noexcept
: value_( std::bit_cast<UIntType>( static_cast<ValueType>( 0.0 ) ) )
{
}
constexpr explicit FloatingPoint( ValueType value ) noexcept
: value_( std::bit_cast<UIntType>( value ) )
{
}
constexpr explicit FloatingPoint( UIntType value, bool ) noexcept
: value_( value )
{
}
constexpr explicit FloatingPoint( UIntType fraction, Int32 exponent, bool sign) noexcept
: value_( (fraction & FractionMask ) |
(( static_cast<UIntType>( exponent ) << FractionWidth ) & ExponentMask) |
( sign? SignMask : 0 ) )
{
}
constexpr FloatingPoint& operator = ( ValueType value ) noexcept
{
value_ = std::bit_cast<UIntType>( value );
return *this;
}
constexpr bool Sign( ) const noexcept
{
return ( value_ & SignMask ) != 0;
}
constexpr void SetSign( bool value = true ) noexcept
{
if ( value )
{
value_ |= SignMask;
}
else
{
value_ &= ~SignMask;
}
}
constexpr Int32 Exponent( ) const noexcept
{
return static_cast<Int32>( ( value_ & ExponentMask ) >> FractionWidth ) - ExponentBias;
}
private:
constexpr void SetExponent( UIntType value ) noexcept
{
value_ = ( value << FractionWidth ) & ExponentMask;
}
public:
constexpr UIntType Fraction( ) const noexcept
{
return value_ & FractionMask;
}
private:
constexpr void SetFraction( UIntType value ) noexcept
{
value_ = value & FractionMask;
}
public:
constexpr bool IsZero( ) const noexcept
{
return (value_ & ( ExponentMask | FractionMask )) == 0;
}
constexpr bool IsInf( ) const noexcept
{
return ( value_ & FractionMask ) == 0 && ( ( value_ & ExponentMask ) == MaxExponent );
}
constexpr bool IsNaN( ) const noexcept
{
return ( ( value_ & ExponentMask ) == MaxExponent ) && ( ( value_ & FractionMask ) != 0 );
}
constexpr bool IsInfOrNaN( ) const noexcept
{
return ( value_ & ExponentMask ) == MaxExponent;
}
static constexpr ValueType MakeNaN( UIntType value ) noexcept
{
UIntType result;
result = MaxExponent | (value & FractionMask);
return std::bit_cast<ValueType>( result );
}
constexpr ValueType AsFloatingPoint( ) const noexcept
{
return std::bit_cast<ValueType>( value_ );
}
constexpr UIntType AsUnsigned( ) const noexcept
{
return value_;
}
static constexpr FloatingPoint Zero( ) noexcept
{
return FloatingPoint( );
}
static constexpr FloatingPoint NegZero( ) noexcept
{
FloatingPoint result;
result.value_ = SignMask;
return result;
}
static constexpr FloatingPoint Inf( ) noexcept
{
FloatingPoint result;
result.value_ = MaxExponent;
return result;
}
static constexpr FloatingPoint NegInf( ) noexcept
{
FloatingPoint result;
result.value_ = MaxExponent | SignMask;
return result;
}
constexpr ValueType Trunc( ) const noexcept
{
if ( IsInfOrNaN( ) )
{
return std::bit_cast<ValueType>(value_);
}
Int32 exponent = Exponent( );
if ( exponent >= static_cast<Int32>( FractionWidth ) )
{
return std::bit_cast<ValueType>( value_ );
}
if ( exponent <= -1 )
{
return Sign() ? static_cast<ValueType>( -0.0 ) : static_cast<ValueType>( 0.0 );
}
Int32 trimSize = FractionWidth - exponent;
UIntType result = (value_ & (SignMask | ExponentMask)) | (( (value_ & FractionMask) >> trimSize ) << trimSize);
return std::bit_cast<ValueType>( result );
}
constexpr ValueType Ceil( ) const noexcept
{
if ( IsInfOrNaN( ) || IsZero( ) )
{
return std::bit_cast<ValueType>( value_ );
}
Int32 exponent = Exponent( );
if ( exponent >= static_cast<Int32>( FractionWidth ) )
{
return std::bit_cast<ValueType>( value_ );
}
if ( exponent <= -1 )
{
return Sign() ? ValueType( -0.0 ) : ValueType( 1.0 );
}
Int32 trimSize = FractionWidth - exponent;
UIntType result = ( value_ & ( SignMask | ExponentMask ) ) | ( ( ( value_ & FractionMask ) >> trimSize ) << trimSize );
if ( result == value_ )
{
return std::bit_cast<ValueType>( value_ );
}
return Sign( ) ? std::bit_cast<ValueType>( result ) : std::bit_cast<ValueType>( result ) + static_cast<ValueType>( 1.0 );
}
constexpr ValueType Floor( ) const noexcept
{
if ( Sign() )
{
FloatingPoint tmp( value_ & ( ExponentMask | FractionMask ), true );
return -tmp.Ceil( );
}
else
{
return Trunc( );
}
}
constexpr ValueType Round( ) const noexcept
{
if ( IsInfOrNaN( ) || IsZero( ) )
{
return std::bit_cast<ValueType>(value_);
}
int exponent = Exponent( );
if ( exponent >= static_cast<int>( FractionWidth ) )
{
return std::bit_cast<ValueType>( value_ );
}
if ( exponent == -1 )
{
bool isNegative = Sign( );
if ( isNegative )
{
return static_cast<ValueType>( -1.0 );
}
else
{
return static_cast<ValueType>( 1.0 );
}
}
if ( exponent <= -2 )
{
bool isNegative = Sign( );
if ( isNegative )
{
return static_cast<ValueType>( -0.0 );
}
else
{
return static_cast<ValueType>( 0.0 );
}
}
UInt32 trimSize = FractionWidth - exponent;
bool middleBitSet = (value_ & FractionMask) & ( UIntType( 1 ) << ( trimSize - 1 ) );
UIntType result = ( value_ & ( SignMask | ExponentMask ) ) | ( ( ( value_ & FractionMask ) >> trimSize ) << trimSize );
if ( result == value_ )
{
return std::bit_cast<ValueType>( value_ );
}
if ( !middleBitSet )
{
return std::bit_cast<ValueType>( result );
}
else
{
bool isNegative = Sign( );
return isNegative ?
std::bit_cast<ValueType>( result ) - static_cast<ValueType>( 1.0 ) :
std::bit_cast<ValueType>( result ) + static_cast<ValueType>( 1.0 );
}
}
};
#pragma pack(pop)
|
|
|
|
|
I am surprised it is that bad. I thought something like the ESP32 at 240MHz would do a Floyd-Steinberg reasonably well, at least keeping up with the 680x0s and 80x86s I used to run Floyd-Steinberg on 25-odd years ago. I know it is a different architecture, but it also has a 200MHz speed advantage. But maybe it was just a lot slower than I recall. Back then we were amazed it displayed something at all. I am quite sure it never took a minute, though; whether it was half a second or 20 seconds, who knows.
I wonder if it is memory access or something slowing it down.
Takes ages to develop these kinds of things though. As soon as it displays something, you lose the next couple of hours looking at it before moving on.
|
|
|
|
|
I can't do Floyd-Steinberg because of the memory requirements.
I do something similar in style to Thomas Knoll's Adobe Photoshop grid dithering method for my "slow" dithering, and an optimized Yliluoma algorithm for my "fast" dithering. Both are far too slow.
Real programmers use butterflies
|
|
|
|
|
Adding: were you doing color dithering? My black and white Bayer dithering is quite fast.
Also if you were doing color dithering there are much faster algos you can use when simply simulating a higher bit depth, but I actually have to do color matching to a palette.
Here's how I have to choose two colors to blend:
template<typename PaletteType>
gfx_result dither_mixing_plan_fast(const PaletteType* palette, typename PaletteType::mapped_pixel_type color, dither_mixing_plan_data_fast* plan) {
gfx_result rr ;
if(nullptr==plan || nullptr==palette) {
return gfx_result::invalid_argument;
}
rgb_pixel<24> rgb888;
rr = convert(color,&rgb888);
if(gfx_result::success!=rr) {
return rr;
}
const unsigned r= rgb888.template channel<channel_name::R>(),
g=rgb888.template channel<channel_name::G>(),
b=rgb888.template channel<channel_name::B>();
*plan = { {0,0}, 0.5 };
double least_penalty = 1e99;
for(unsigned index1 = 0; index1 < 16; ++index1)
for(unsigned index2 = index1; index2 < 16; ++index2)
{
typename PaletteType::mapped_pixel_type mpx1;
rr=palette->map(typename PaletteType::pixel_type(index1),&mpx1);
if(gfx_result::success!=rr) {
return rr;
}
typename PaletteType::mapped_pixel_type mpx2;
rr=palette->map(typename PaletteType::pixel_type(index2),&mpx2); // write to mpx2; passing &mpx1 here left mpx2 uninitialized
if(gfx_result::success!=rr) {
return rr;
}
rr = convert(mpx1,&rgb888);
if(gfx_result::success!=rr) {
return rr;
}
unsigned r1= rgb888.template channel<channel_name::R>(),
g1=rgb888.template channel<channel_name::G>(),
b1=rgb888.template channel<channel_name::B>();
rr = convert(mpx2,&rgb888);
if(gfx_result::success!=rr) {
return rr;
}
unsigned r2= rgb888.template channel<channel_name::R>(),
g2=rgb888.template channel<channel_name::G>(),
b2=rgb888.template channel<channel_name::B>();
int ratio = 32;
if(mpx1.native_value != mpx2.native_value)
{
ratio = ((r2 != r1 ? 299*64 * int(r - r1) / int(r2-r1) : 0)
+ (g2 != g1 ? 587*64 * int(g - g1) / int(g2-g1) : 0)
+ (b1 != b2 ? 114*64 * int(b - b1) / int(b2-b1) : 0))
/ ((r2 != r1 ? 299 : 0)
+ (g2 != g1 ? 587 : 0)
+ (b2 != b1 ? 114 : 0));
if(ratio < 0) ratio = 0; else if(ratio > 63) ratio = 63;
}
unsigned r0 = r1 + ratio * int(r2-r1) / 64;
unsigned g0 = g1 + ratio * int(g2-g1) / 64;
unsigned b0 = b1 + ratio * int(b2-b1) / 64;
double penalty = dither_mixing_error(
r,g,b, r0,g0,b0, r1,g1,b1, r2,g2,b2,
ratio / double(64));
if(penalty < least_penalty)
{
least_penalty = penalty;
plan->colors[0] = index1;
plan->colors[1] = index2;
plan->ratio = ratio / double(64);
}
}
return gfx_result::success;
}
It's not easy. I know it could be faster, but I don't think I can make many algorithmic improvements and that's the sort of improvement I need to achieve orders of magnitude reduction in time requirements - that's what I need right now.
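For a sense of scale, here is a rough count of how many candidate mixes the loop above evaluates per frame. This is only a back-of-the-envelope sketch: the 200x200 size and the two passes per frame come from earlier in the thread, the palette sizes are assumptions, and the helper names are made up.

```cpp
#include <cassert>
#include <cstdint>

// Number of (index1, index2) pairs visited when index2 starts at index1.
constexpr std::uint64_t pair_count(std::uint64_t palette_size) {
    return palette_size * (palette_size + 1) / 2;
}

// Candidate mixes evaluated per frame if every pixel runs the full pair search.
constexpr std::uint64_t evaluations_per_frame(std::uint64_t width,
                                              std::uint64_t height,
                                              std::uint64_t passes,
                                              std::uint64_t palette_size) {
    return width * height * passes * pair_count(palette_size);
}

// With a loop bound of 16: 136 pairs per pixel, about 10.9 million per frame.
static_assert(pair_count(16) == 136, "16 entries give 136 pairs");
static_assert(evaluations_per_frame(200, 200, 2, 16) == 10880000, "200x200, two passes");
// With a 3-entry palette: 6 pairs per pixel, 480,000 per frame.
static_assert(evaluations_per_frame(200, 200, 2, 3) == 480000, "200x200, two passes");
```

Each of those evaluations involves two palette lookups, two conversions, several integer divisions and a floating-point penalty, so the per-pair cost matters as much as the pair count.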
Real programmers use butterflies
|
|
|
|
|
It was color. As far as I recall, we had a 16 color fixed palette, and anything available above that would be "allocated" as images were displayed (I think the Amiga supported 32, 64, 128, 256 - while our Windows support would go directly to 256). We also supported 16/24 bit, but I believe we just did nearest color on 16 bit.
Palette entries were allocated based on one or another algorithm based on distance to nearest "existing" color, the number of pixels using the color, and the number of free palette entries.
But most likely our images were just small enough that we did not encounter memory issues. I recall we did support Amiga 500, but we might have had restrictions on features there. Most systems would have had at least 1MB, and then it is no problem keeping an extra scanline or two in memory for dithering.
|
|
|
|
|
lmoelleb wrote: Palette entries were allocated based on one or another algorithm based on distance to nearest "existing" color, the number of pixels using the color, and the number of free palette entries.
I'm not sure what you mean by nearest existing color, as my algo has to find *two* colors in order to determine what to blend with what. I have a KD tree implementation waiting in the wings for larger palettes, since it sorts in such a way as to speed up distance-based matching, but it's not helpful for, say, 16 colors. I may "pre-expand" the palette, mixing colors beforehand, so a 16 color palette becomes (16*15)/2 colors, and then try throwing that into a kd_tree and see what happens.
But that's my biggest issue: finding the two colors to blend. The rest is fast.
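A minimal sketch of what that "pre-expand" idea could look like, under loud assumptions: the `rgb` and `mixed_entry` types and both function names are made up for illustration (they are not GFX types), every pair is mixed at a fixed 50/50 ratio, distance is plain squared RGB distance, and a linear scan stands in for the kd-tree.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct rgb { std::uint8_t r, g, b; };

struct mixed_entry {
    rgb color;            // 50/50 blend of the two palette entries
    std::uint8_t index1;  // first palette index
    std::uint8_t index2;  // second palette index
};

// Build one entry per distinct pair (i < j): N*(N-1)/2 entries for N colors.
std::vector<mixed_entry> expand_palette(const std::vector<rgb>& palette) {
    assert(palette.size() <= 256);  // indices are stored in 8 bits
    std::vector<mixed_entry> mixes;
    for (std::size_t i = 0; i < palette.size(); ++i) {
        for (std::size_t j = i + 1; j < palette.size(); ++j) {
            const rgb& a = palette[i];
            const rgb& b = palette[j];
            mixed_entry e;
            e.color.r = static_cast<std::uint8_t>((a.r + b.r) / 2);
            e.color.g = static_cast<std::uint8_t>((a.g + b.g) / 2);
            e.color.b = static_cast<std::uint8_t>((a.b + b.b) / 2);
            e.index1 = static_cast<std::uint8_t>(i);
            e.index2 = static_cast<std::uint8_t>(j);
            mixes.push_back(e);
        }
    }
    return mixes;
}

// Linear nearest-neighbour search; a kd-tree would replace this for big tables.
std::size_t nearest_mix(const std::vector<mixed_entry>& mixes, rgb target) {
    assert(!mixes.empty());
    std::size_t best = 0;
    long best_dist = -1;
    for (std::size_t i = 0; i < mixes.size(); ++i) {
        const long dr = long(mixes[i].color.r) - long(target.r);
        const long dg = long(mixes[i].color.g) - long(target.g);
        const long db = long(mixes[i].color.b) - long(target.b);
        const long dist = dr * dr + dg * dg + db * db;
        if (best_dist < 0 || dist < best_dist) {
            best_dist = dist;
            best = i;
        }
    }
    return best;
}
```

The table is built once per palette, so the per-pixel work drops to a single nearest-color lookup. The trade-off is that the mixing ratio is fixed in advance rather than solved per pixel.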
Real programmers use butterflies
|
|
|
|
|
We were mainly (maybe only) loading images with a palette. So we were dithering one palette to another, meaning there was a max of 256 colors in the source image (and often fewer than that). This allowed us to quickly calculate the total count of a given color, and for each color calculate how close it was to existing colors in our palette.
I guess it is pretty useless these days as no one works with palettes (I think we had a jpg decoder for fun, but we were probably sticking to gif and various other formats for images we really needed).
Once the palette was locked in for an image, we could just run the Floyd-Steinberg. It only requires finding the nearest color per pixel as "blending" is done by pushing the error ahead of the calculations (at the cost of one scanline extra memory consumed - though I guess you could stamp it into the bitmap data directly). No problem on a 500KB system with relatively small images.
We had the advantage that low CPU spec systems were typically also running fewer colors (so we only had to search for the nearest color in 16 or 32 target colors), while systems running 256 colors typically also had more CPU power.
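A minimal sketch of the error-pushing idea with a scanline-sized buffer, simplified to 8-bit grayscale in and black/white out so the "nearest color" step is a single threshold. The function name is made up, and this is not the code we ran back then; a palette version would carry one error value per channel and replace the threshold with a nearest-color search.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Dithers an 8-bit grayscale image in place to pure black (0) or white (255).
// Extra memory: two rows of (width + 2) ints for the pushed-ahead error.
void floyd_steinberg_1bit(std::vector<std::uint8_t>& pixels,
                          std::size_t width, std::size_t height) {
    assert(pixels.size() == width * height);
    // One padding slot on each side so edge pixels need no bounds checks.
    std::vector<int> current(width + 2, 0);
    std::vector<int> next(width + 2, 0);
    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            std::uint8_t& p = pixels[y * width + x];
            const int wanted = int(p) + current[x + 1];
            const int chosen = wanted >= 128 ? 255 : 0;  // nearest of two colors
            const int error = wanted - chosen;
            p = static_cast<std::uint8_t>(chosen);
            current[x + 2] += error * 7 / 16;  // right neighbour
            next[x]        += error * 3 / 16;  // below left
            next[x + 1]    += error * 5 / 16;  // below
            next[x + 2]    += error * 1 / 16;  // below right
        }
        current.swap(next);
        std::fill(next.begin(), next.end(), 0);
    }
}
```

For a 200-pixel-wide image and 4-byte ints, the two error rows come to roughly 1.6KB per channel.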
|
|
|
|
|
Actually palettes are very useful these days for e-paper displays, which are either monochrome, or have a *fixed palette* of a handful of colors.
That's primarily why my GFX library supports it.
Real programmers use butterflies
|
|
|
|
|
A N^2-level grey scale dithering for a 2-colour screen (B&W) may be performed with an NxN integer matrix. If N is a power of 2, the most expensive operation would be masking the X, Y ordinates to the range 0 .. N-1, and reading the dither threshold value.
Unless you are dealing with giant pictures, how would the dithering take that long?
(You could even speed the rendering by dithering a partial block of the picture in memory, and then bitblt-ing the block to the screen. The size of the block would depend on the memory available.)
As for colour dithering, assuming that you have an 8-colour screen (on/off for each of R,G,B), perhaps you could use a B&W dithering algorithm on each level - R, G, B, and then combine them. I do not know about the quality of the colours, though...
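A minimal sketch of the matrix approach described above, for N = 4, with the per-channel variant for an 8-colour screen. The names are made up, the matrix is the standard 4x4 Bayer matrix, and the per-channel version only makes sense when the available colours are exactly the on/off combinations of R, G and B; it does not handle an arbitrary palette.

```cpp
#include <cassert>
#include <cstdint>

// Standard 4x4 Bayer threshold matrix, values 0..15.
static const std::uint8_t kBayer4[4][4] = {
    { 0,  8,  2, 10},
    {12,  4, 14,  6},
    { 3, 11,  1,  9},
    {15,  7, 13,  5},
};

// Returns true if the pixel at (x, y) with 8-bit intensity `level` is lit.
// Since N is a power of two, wrapping the coordinates is a single mask.
inline bool ordered_dither(unsigned x, unsigned y, std::uint8_t level) {
    const unsigned threshold = kBayer4[y & 3][x & 3] * 16u + 8u;  // 8..248
    return level > threshold;
}

// Per-channel variant: returns a 3-bit colour (bit 2 = R, bit 1 = G, bit 0 = B).
inline std::uint8_t ordered_dither_rgb(unsigned x, unsigned y,
                                       std::uint8_t r, std::uint8_t g,
                                       std::uint8_t b) {
    return static_cast<std::uint8_t>((ordered_dither(x, y, r) ? 4 : 0) |
                                     (ordered_dither(x, y, g) ? 2 : 0) |
                                     (ordered_dither(x, y, b) ? 1 : 0));
}
```

A mid-grey level of 128 lights 8 of the 16 cells in each 4x4 tile, and no state is carried between pixels, so rows or blocks can be rendered in any order.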
Freedom is the freedom to say that two plus two make four. If that is granted, all else follows.
-- 6079 Smith W.
|
|
|
|
|
Grayscale dithering is not color dithering. Also, it's much faster to dither when you have a palette where the colors are evenly distributed/graduated.
When you're doing color matching to dither between, say, 7 available colors while loading a 24-bit color JPEG, that's a different story.
You can't color dither like you suggest. You have to mix two colors, not just one color and a fixed color like black.
This is what finding the two colors to mix looks like - or at least one of the ways I use:
template<typename PaletteType>
gfx_result dither_mixing_plan_fast(const PaletteType* palette, typename PaletteType::mapped_pixel_type color, dither_mixing_plan_data_fast* plan) {
gfx_result rr ;
if(nullptr==plan || nullptr==palette) {
return gfx_result::invalid_argument;
}
rgb_pixel<24> rgb888;
rr = convert(color,&rgb888);
if(gfx_result::success!=rr) {
return rr;
}
const unsigned r= rgb888.template channel<channel_name::R>(),
g=rgb888.template channel<channel_name::G>(),
b=rgb888.template channel<channel_name::B>();
*plan = { {0,0}, 0.5 };
double least_penalty = 1e99;
for(unsigned index1 = 0; index1 < PaletteType::size; ++index1)
for(unsigned index2 = index1; index2 < PaletteType::size; ++index2)
{
typename PaletteType::mapped_pixel_type mpx1;
rr=palette->map(typename PaletteType::pixel_type(index1),&mpx1);
if(gfx_result::success!=rr) {
return rr;
}
typename PaletteType::mapped_pixel_type mpx2;
rr=palette->map(typename PaletteType::pixel_type(index2),&mpx2); // write to mpx2; passing &mpx1 here left mpx2 uninitialized
if(gfx_result::success!=rr) {
return rr;
}
rr = convert(mpx1,&rgb888);
if(gfx_result::success!=rr) {
return rr;
}
unsigned r1= rgb888.template channel<channel_name::R>(),
g1=rgb888.template channel<channel_name::G>(),
b1=rgb888.template channel<channel_name::B>();
rr = convert(mpx2,&rgb888);
if(gfx_result::success!=rr) {
return rr;
}
unsigned r2= rgb888.template channel<channel_name::R>(),
g2=rgb888.template channel<channel_name::G>(),
b2=rgb888.template channel<channel_name::B>();
int ratio = 32;
if(mpx1.native_value != mpx2.native_value)
{
ratio = ((r2 != r1 ? 299*64 * int(r - r1) / int(r2-r1) : 0)
+ (g2 != g1 ? 587*64 * int(g - g1) / int(g2-g1) : 0)
+ (b1 != b2 ? 114*64 * int(b - b1) / int(b2-b1) : 0))
/ ((r2 != r1 ? 299 : 0)
+ (g2 != g1 ? 587 : 0)
+ (b2 != b1 ? 114 : 0));
if(ratio < 0) ratio = 0; else if(ratio > 63) ratio = 63;
}
unsigned r0 = r1 + ratio * int(r2-r1) / 64;
unsigned g0 = g1 + ratio * int(g2-g1) / 64;
unsigned b0 = b1 + ratio * int(b2-b1) / 64;
double penalty = dither_mixing_error(
r,g,b, r0,g0,b0, r1,g1,b1, r2,g2,b2,
ratio / double(64));
if(penalty < least_penalty)
{
least_penalty = penalty;
plan->colors[0] = index1;
plan->colors[1] = index2;
plan->ratio = ratio / double(64);
}
}
return gfx_result::success;
}
Edit: Thanks! I found a bug in the above code when I pasted it to you, I fixed the bug and now it's fast-ish and usable.
Real programmers use butterflies
modified 18-Jun-21 9:49am.
|
|
|
|
|
Was the word "algorithm" created to honour Al Gore's contribution to computer science?
"I have no idea what I did, but I'm taking full credit for it." - ThisOldTony
"Common sense is so rare these days, it should be classified as a super power" - Random T-shirt
AntiTwitter: @DalekDave is now a follower!
|
|
|
|
|
OriginalGriff wrote: Was the word "algorithm" created to honour Al Gore's contributions to computer science dancing and speling?
FTFY
EDIT: added speling to his accomplishments.
Freedom is the freedom to say that two plus two make four. If that is granted, all else follows.
-- 6079 Smith W.
|
|
|
|
|
Daniel Pfeffer wrote: added speling to his accomplishments. A bit too late. They already misspelled his name when naming those decorative melons "Gourds".
(I can hear you vine after you read that all the way over here)
Ravings en masse^ |
---|
"The difference between genius and stupidity is that genius has its limits." - Albert Einstein | "If you are searching for perfection in others, then you seek disappointment. If you seek perfection in yourself, then you will find failure." - Balboos HaGadol Mar 2010 |
|
|
|
|
|
W∴ Balboos, GHB wrote: I can hear you vine
oy vey-ing, not vining.
Freedom is the freedom to say that two plus two make four. If that is granted, all else follows.
-- 6079 Smith W.
|
|
|
|
|
When a bullfighter is gored, does that mean he sat through one of Al's speeches?
Ravings en masse^ |
---|
"The difference between genius and stupidity is that genius has its limits." - Albert Einstein | "If you are searching for perfection in others, then you seek disappointment. If you seek perfection in yourself, then you will find failure." - Balboos HaGadol Mar 2010 |
|
|
|
|
|
Or is it more effective than the Vatican's method?
|
|
|
|
|