
C++: Simplistic Binary Streams

12 Aug 2018 | CPOL | 4 min read
Simplistic Binary Streams with endian swap support

Introduction

More than a few C++ developers, accustomed to the << and >> operators on text streams, miss them on binary streams. Simplistic Binary Stream is nothing but a barebones wrapper over the STL fstream's read and write functions. Readers may compare it to serialization libraries such as Boost Serialization and MFC Serialization because of the seemingly similar << and >> overloads. Simplistic Binary Stream is not a serialization library: it does not handle versioning, backward/forward compatibility, or endianness correctness, and leaves all of that to the developer. Developers who used Boost Serialization in the past may still remember being bitten when files written by versions 1.42-1.44 were rendered unreadable by newer versions. Using a serialization library is like putting your file format under a third party outside your control. While Simplistic Binary Stream offers none of the conveniences of serialization, it puts the developer in the driver's seat over their file format.

Anyone using the library to read or write a file format is advised to implement another layer above it. In this article, we will look at the usage before looking at the source code. Simplistic Binary Stream comes in two flavors: file streams and memory streams. The file stream encapsulates the STL fstream, while the memory stream uses an STL vector<char> to hold the data in memory. Developers can use the memory stream to parse, in memory, files downloaded from the network.

Simple Examples

The examples of writing and then reading are similar for memory and file streams, except that we flush and close the output file stream before reading the file with the input stream.

C++
#include <iostream>
#include <string>
#include "MiniBinStream.h"

void TestMem()
{
    simple::mem_ostream out;
    out << 23 << 24 << "Hello world!";

    simple::mem_istream in(out.get_internal_vec());
    int num1 = 0, num2 = 0;
    std::string str;
    in >> num1 >> num2 >> str;

    std::cout << num1 << "," << num2 << "," << str << std::endl;
}

void TestFile()
{
    simple::file_ostream out("file.bin", std::ios_base::out | std::ios_base::binary);
    out << 23 << 24 << "Hello world!";
    out.flush();
    out.close();

    simple::file_istream in("file.bin", std::ios_base::in | std::ios_base::binary);
    int num1 = 0, num2 = 0;
    std::string str;
    in >> num1 >> num2 >> str;

    std::cout << num1 << "," << num2 << "," << str << std::endl;
}

The output is the same for both:

23,24,Hello world!
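
To make the resulting byte layout concrete, here is a minimal standalone sketch that mimics what the overloads in the source listing further down do: each int is copied as raw bytes in host byte order, and a string is stored as an int length prefix followed by its characters, with no terminating null. The helpers append_int, append_str, build_example and int_at are hypothetical names for illustration and are not part of MiniBinStream.h.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: append the raw bytes of an int in host byte order.
void append_int(std::vector<char>& buf, int value)
{
    const char* p = reinterpret_cast<const char*>(&value);
    buf.insert(buf.end(), p, p + sizeof(value));
}

// Hypothetical helper: append an int length prefix, then the characters.
void append_str(std::vector<char>& buf, const std::string& s)
{
    append_int(buf, static_cast<int>(s.size()));
    buf.insert(buf.end(), s.begin(), s.end());
}

// Builds the same byte sequence as: out << 23 << 24 << "Hello world!"
std::vector<char> build_example()
{
    std::vector<char> buf;
    append_int(buf, 23);
    append_int(buf, 24);
    append_str(buf, "Hello world!");
    return buf;
}

// Reads the int stored at the given byte offset.
int int_at(const std::vector<char>& buf, std::size_t offset)
{
    int value = 0;
    std::memcpy(&value, &buf[offset], sizeof(value));
    return value;
}
```

With a 4-byte int, the buffer holds 24 bytes: two ints, the length 12, and the 12 characters of the string. Nothing in the data records the types or the byte order, which is why the reader must extract values in exactly the order they were written.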

Overloading the Operators

Say we have a Product structure. We can overload the stream operators for it as shown below:

C++
#include <vector>
#include <string>
#include "MiniBinStream.h"

struct Product
{
    Product() : product_name(""), price(0.0f), qty(0) {}
    Product(const std::string& name, 
            float _price, int _qty) : product_name(name), price(_price), qty(_qty) {}
    std::string product_name;
    float price;
    int qty;
};

simple::mem_istream& operator >> (simple::mem_istream& istm, Product& val)
{
    return istm >> val.product_name >> val.price >> val.qty;
}

simple::file_istream& operator >> (simple::file_istream& istm, Product& val)
{
    return istm >> val.product_name >> val.price >> val.qty;
}

simple::mem_ostream& operator << (simple::mem_ostream& ostm, const Product& val)
{
    return ostm << val.product_name << val.price << val.qty;
}

simple::file_ostream& operator << (simple::file_ostream& ostm, const Product& val)
{
    return ostm << val.product_name << val.price << val.qty;
}
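
The four overloads above share identical bodies. One way to write such a body only once is a function template parameterised on the stream type. The sketch below is self-contained, so it uses two hypothetical stand-in stream classes (toy_mem_ostream and toy_counting_ostream) rather than the library's types, and write_product is likewise a made-up name. With the real streams, the body of such a helper could presumably be the same `<<` chain used above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for two unrelated stream classes (not the library's types).
struct toy_mem_ostream
{
    std::vector<char> bytes;
    void write(const char* p, std::size_t n) { bytes.insert(bytes.end(), p, p + n); }
};

struct toy_counting_ostream
{
    std::size_t count = 0;
    void write(const char*, std::size_t n) { count += n; }
};

struct Product
{
    std::string product_name;
    float price;
    int qty;
};

// One body shared by every stream type that offers write(const char*, size_t).
template<typename Stream>
Stream& write_product(Stream& ostm, const Product& val)
{
    const int len = static_cast<int>(val.product_name.size());
    ostm.write(reinterpret_cast<const char*>(&len), sizeof(len));
    ostm.write(val.product_name.data(), val.product_name.size());
    ostm.write(reinterpret_cast<const char*>(&val.price), sizeof(val.price));
    ostm.write(reinterpret_cast<const char*>(&val.qty), sizeof(val.qty));
    return ostm;
}
```

The compiler instantiates write_product once per stream type, so no common base class or virtual function is needed. The per-stream operator overloads can then be one-line forwarders to the helper.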

Readers will notice that we overload the operators for the memory and file streams with identical code. That is unfortunate, but the two kinds of streams do not derive from a common base class. Even if they did, it would not help, because the read and write functions are member function templates, and a member function template cannot be virtual: templates are instantiated at compile time, whereas virtual dispatch is resolved at runtime, so the two cannot be combined.

If a struct contains only fundamental types and can be packed with no padding between its members, as shown below, then the whole struct can be written or read in one go instead of member by member.

#if defined(__linux__)
#pragma pack(push)
#pragma pack(1)
// Your struct declaration here.
#pragma pack(pop)
#endif

#if defined(_WIN32)
#pragma warning(disable:4103)
#pragma pack(push,1)
// Your struct declaration here.
#pragma pack(pop)
#endif
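
As a minimal sketch of reading and writing a packed struct in one go, the example below copies a whole record into and out of a byte buffer. PackedRecord, write_record and read_record are hypothetical names, and a plain std::vector<char> stands in for the stream. The static_assert encodes the assumption that float and int32_t are 4 bytes each and that the pragma is honoured by the compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

#pragma pack(push, 1)
// Hypothetical record with only fundamental members and no padding.
struct PackedRecord
{
    std::uint8_t kind;
    std::int32_t qty;
    float        price;
};
#pragma pack(pop)

static_assert(sizeof(PackedRecord) == 9, "expected no padding between members");

// Appends the whole struct to the buffer in one copy.
void write_record(std::vector<char>& buf, const PackedRecord& rec)
{
    const char* p = reinterpret_cast<const char*>(&rec);
    buf.insert(buf.end(), p, p + sizeof(rec));
}

// Reads the whole struct back in one copy; returns false if too few bytes remain.
bool read_record(const std::vector<char>& buf, std::size_t offset, PackedRecord& rec)
{
    if (offset > buf.size() || buf.size() - offset < sizeof(rec))
        return false;
    std::memcpy(&rec, &buf[offset], sizeof(rec));
    return true;
}
```

This only works for members that are meaningful as raw bytes. A struct holding a std::string, such as Product, still has to be processed member by member.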

Next, we overload the operators for writing and reading a vector of Product, and add helpers that print it to the console. Rule of thumb: never write a size_t to the stream, because its size depends on the platform (32-bit or 64-bit).

C++
simple::mem_istream& operator >> (simple::mem_istream& istm, std::vector<Product>& vec)
{
    int size=0;
    istm >> size;

    if(size<=0)
        return istm;

    for(int i=0; i<size; ++i)
    {
        Product product;
        istm >> product;
        vec.push_back(product);
    }

    return istm;
}

simple::file_istream& operator >> (simple::file_istream& istm, std::vector<Product>& vec)
{
    int size=0;
    istm >> size;

    if(size<=0)
        return istm;

    for(int i=0; i<size; ++i)
    {
        Product product;
        istm >> product;
        vec.push_back(product);
    }

    return istm;
}

simple::mem_ostream& operator << (simple::mem_ostream& ostm, const std::vector<Product>& vec)
{
    int size = vec.size();
    ostm << size;
    for(size_t i=0; i<vec.size(); ++i)
    {
        ostm << vec[i];
    }

    return ostm;
}

simple::file_ostream& operator << (simple::file_ostream& ostm, const std::vector<Product>& vec)
{
    int size = vec.size();
    ostm << size;
    for(size_t i=0; i<vec.size(); ++i)
    {
        ostm << vec[i];
    }

    return ostm;
}

void print_product(const Product& product)
{
    using namespace std;
    cout << "Product:" << product.product_name << ", Price:" << product.price
         << ", Qty:" << product.qty << endl;
}

void print_products(const std::vector<Product>& vec)
{
    for(size_t i=0; i<vec.size() ; ++i)
        print_product(vec[i]);
}
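
The overloads above store the element count as an int. A stricter take on the size_t rule of thumb is to store the count in a fixed-width type, so that it occupies the same number of bytes on every platform. The sketch below assumes a 32-bit count is enough. write_count and read_count are hypothetical helpers operating on a plain byte buffer, not functions from MiniBinStream.h.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>
#include <stdexcept>
#include <vector>

// Hypothetical helper: store an element count as exactly 4 bytes on every platform.
void write_count(std::vector<char>& buf, std::size_t count)
{
    if (count > std::numeric_limits<std::uint32_t>::max())
        throw std::length_error("count does not fit in 32 bits");
    const std::uint32_t n = static_cast<std::uint32_t>(count);
    const char* p = reinterpret_cast<const char*>(&n);
    buf.insert(buf.end(), p, p + sizeof(n));
}

// Hypothetical helper: read the 4-byte count back and widen it to size_t.
std::size_t read_count(const std::vector<char>& buf, std::size_t offset)
{
    std::uint32_t n = 0;
    if (offset > buf.size() || buf.size() - offset < sizeof(n))
        throw std::runtime_error("Premature end of array!");
    std::memcpy(&n, &buf[offset], sizeof(n));
    return n;
}
```

The explicit range check turns a silent truncation into an error at write time, which is easier to diagnose than a corrupted file discovered when reading.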

We test the overloaded operators for Product using the code below:

C++
void TestMemCustomOperatorsOnVec()
{
    std::vector<Product> vec_src;
    vec_src.push_back(Product("Book", 10.0f, 50));
    vec_src.push_back(Product("Phone", 25.0f, 20));
    vec_src.push_back(Product("Pillow", 8.0f, 10));
    simple::mem_ostream out;
    out << vec_src;

    simple::mem_istream in(out.get_internal_vec());
    std::vector<Product> vec_dest;
    in >> vec_dest;

    print_products(vec_dest);
}

void TestFileCustomOperatorsOnVec()
{
    std::vector<Product> vec_src;
    vec_src.push_back(Product("Book", 10.0f, 50));
    vec_src.push_back(Product("Phone", 25.0f, 20));
    vec_src.push_back(Product("Pillow", 8.0f, 10));
    simple::file_ostream out("file.bin", std::ios_base::out | std::ios_base::binary);
    out << vec_src;
    out.flush();
    out.close();

    simple::file_istream in("file.bin", std::ios_base::in | std::ios_base::binary);
    std::vector<Product> vec_dest;
    in >> vec_dest;

    print_products(vec_dest);
}

The output is as follows:

Product:Book, Price:10, Qty:50
Product:Phone, Price:25, Qty:20
Product:Pillow, Price:8, Qty:10

Source Code

All the source code is in a single header file: just include MiniBinStream.h to use the stream classes. The classes do not use any C++11/14 features and have been tested on VS2008, GCC 4.4 and Clang 3.2. Each class is just a thin wrapper over the fstream (or over a vector<char> for the memory streams), so the code should be self-explanatory.

C++
// The MIT License (MIT)
// Simplistic Binary Streams 0.9
// Copyright (C) 2014, by Wong Shao Voon (shaovoon@yahoo.com)
//
// http://opensource.org/licenses/MIT
//

#ifndef MiniBinStream_H
#define MiniBinStream_H

#include <fstream>
#include <vector>
#include <string>
#include <cstring>
#include <stdexcept>
#include <iostream>

namespace simple
{

class file_istream
{
public:
    file_istream() {}
    file_istream(const char * file, std::ios_base::openmode mode) 
    {
        open(file, mode);
    }
    void open(const char * file, std::ios_base::openmode mode)
    {
        m_istm.open(file, mode);
    }
    void close()
    {
        m_istm.close();
    }
    bool is_open()
    {
        return m_istm.is_open();
    }
    bool eof() const
    {
        return m_istm.eof();
    }
    std::ifstream::pos_type tellg()
    {
        return m_istm.tellg();
    }
    void seekg (std::streampos pos)
    {
        m_istm.seekg(pos);
    }
    void seekg (std::streamoff offset, std::ios_base::seekdir way)
    {
        m_istm.seekg(offset, way);
    }

    template<typename T>
    void read(T& t)
    {
        if(m_istm.read(reinterpret_cast<char*>(&t), sizeof(T)).bad())
        {
            throw std::runtime_error("Read Error!");
        }
    }
    void read(char* p, size_t size)
    {
        if(m_istm.read(p, size).bad())
        {
            throw std::runtime_error("Read Error!");
        }
    }
private:
    std::ifstream m_istm;
};

template<>
void file_istream::read(std::vector<char>& vec)
{
    if(m_istm.read(reinterpret_cast<char*>(&vec[0]), vec.size()).bad())
    {
        throw std::runtime_error("Read Error!");
    }
}

template<typename T>
file_istream& operator >> (file_istream& istm, T& val)
{
    istm.read(val);

    return istm;
}

template<>
file_istream& operator >> (file_istream& istm, std::string& val)
{
    int size = 0;
    istm.read(size);

    if(size<=0)
    {
        val.clear(); // an empty string must not keep its previous value
        return istm;
    }

    std::vector<char> vec((size_t)size);
    istm.read(vec);
    val.assign(&vec[0], (size_t)size);

    return istm;
}

class mem_istream
{
public:
    mem_istream() : m_index(0) {}
    mem_istream(const char * mem, size_t size) 
    {
        open(mem, size);
    }
    mem_istream(const std::vector<char>& vec) 
    {
        m_index = 0;
        m_vec.clear();
        m_vec.reserve(vec.size());
        m_vec.assign(vec.begin(), vec.end());
    }
    void open(const char * mem, size_t size)
    {
        m_index = 0;
        m_vec.clear();
        m_vec.reserve(size);
        m_vec.assign(mem, mem + size);
    }
    void close()
    {
        m_vec.clear();
    }
    bool eof() const
    {
        return m_index >= m_vec.size();
    }
    std::ifstream::pos_type tellg()
    {
        return m_index;
    }
    bool seekg (size_t pos)
    {
        if(pos<m_vec.size())
            m_index = pos;
        else 
            return false;

        return true;
    }
    bool seekg (std::streamoff offset, std::ios_base::seekdir way)
    {
        if(way==std::ios_base::beg && offset < m_vec.size())
            m_index = offset;
        else if(way==std::ios_base::cur && (m_index + offset) < m_vec.size())
            m_index += offset;
        else if(way==std::ios_base::end && (m_vec.size() + offset) < m_vec.size())
            m_index = m_vec.size() + offset;
        else
            return false;

        return true;
    }

    const std::vector<char>& get_internal_vec()
    {
        return m_vec;
    }

    template<typename T>
    void read(T& t)
    {
        if(eof())
            throw std::runtime_error("Premature end of array!");

        if((m_index + sizeof(T)) > m_vec.size())
            throw std::runtime_error("Premature end of array!");

        std::memcpy(reinterpret_cast<void*>(&t), &m_vec[m_index], sizeof(T));

        m_index += sizeof(T);
    }

    void read(char* p, size_t size)
    {
        if(eof())
            throw std::runtime_error("Premature end of array!");

        if((m_index + size) > m_vec.size())
            throw std::runtime_error("Premature end of array!");

        std::memcpy(reinterpret_cast<void*>(p), &m_vec[m_index], size);

        m_index += size;
    }

    void read(std::string& str, const unsigned int size)
    {
        if (eof())
            throw std::runtime_error("Premature end of array!");

        // Check the requested size, not the current length of str.
        if ((m_index + size) > m_vec.size())
            throw std::runtime_error("Premature end of array!");

        str.assign(&m_vec[m_index], size);

        m_index += str.size();
    }

private:
    std::vector<char> m_vec;
    size_t m_index;
};

template<>
void mem_istream::read(std::vector<char>& vec)
{
    if(eof())
        throw std::runtime_error("Premature end of array!");
        
    if((m_index + vec.size()) > m_vec.size())
        throw std::runtime_error("Premature end of array!");

    std::memcpy(reinterpret_cast<void*>(&vec[0]), &m_vec[m_index], vec.size());

    m_index += vec.size();
}

template<typename T>
mem_istream& operator >> (mem_istream& istm, T& val)
{
    istm.read(val);

    return istm;
}

template<>
mem_istream& operator >> (mem_istream& istm, std::string& val)
{
    int size = 0;
    istm.read(size);

    if(size<=0)
    {
        val.clear(); // an empty string must not keep its previous value
        return istm;
    }

    istm.read(val, size);

    return istm;
}

class file_ostream
{
public:
    file_ostream() {}
    file_ostream(const char * file, std::ios_base::openmode mode)
    {
        open(file, mode);
    }
    void open(const char * file, std::ios_base::openmode mode)
    {
        m_ostm.open(file, mode);
    }
    void flush()
    {
        m_ostm.flush();
    }
    void close()
    {
        m_ostm.close();
    }
    bool is_open()
    {
        return m_ostm.is_open();
    }
    template<typename T>
    void write(const T& t)
    {
        m_ostm.write(reinterpret_cast<const char*>(&t), sizeof(T));
    }
    void write(const char* p, size_t size)
    {
        m_ostm.write(p, size);
    }

private:
    std::ofstream m_ostm;

};

template<>
void file_ostream::write(const std::vector<char>& vec)
{
    m_ostm.write(reinterpret_cast<const char*>(&vec[0]), vec.size());
}

template<typename T>
file_ostream& operator << (file_ostream& ostm, const T& val)
{
    ostm.write(val);

    return ostm;
}

template<>
file_ostream& operator << (file_ostream& ostm, const std::string& val)
{
    int size = val.size();
    ostm.write(size);

    if(val.size()<=0)
        return ostm;

    ostm.write(val.c_str(), val.size());

    return ostm;
}

file_ostream& operator << (file_ostream& ostm, const char* val)
{
    int size = std::strlen(val);
    ostm.write(size);

    if(size<=0)
        return ostm;

    ostm.write(val, size);

    return ostm;
}

class mem_ostream
{
public:
    mem_ostream() {}
    void close()
    {
        m_vec.clear();
    }
    const std::vector<char>& get_internal_vec()
    {
        return m_vec;
    }
    template<typename T>
    void write(const T& t)
    {
        std::vector<char> vec(sizeof(T));
        std::memcpy(reinterpret_cast<void*>(&vec[0]), reinterpret_cast<const void*>(&t), sizeof(T));
        write(vec);
    }
    void write(const char* p, size_t size)
    {
        for(size_t i=0; i<size; ++i)
            m_vec.push_back(p[i]);
    }

private:
    std::vector<char> m_vec;
};

template<>
void mem_ostream::write(const std::vector<char>& vec)
{
    m_vec.insert(m_vec.end(), vec.begin(), vec.end());
}

template<typename T>
mem_ostream& operator << (mem_ostream& ostm, const T& val)
{
    ostm.write(val);

    return ostm;
}

template<>
mem_ostream& operator << (mem_ostream& ostm, const std::string& val)
{
    int size = val.size();
    ostm.write(size);

    if(val.size()<=0)
        return ostm;

    ostm.write(val.c_str(), val.size());

    return ostm;
}

mem_ostream& operator << (mem_ostream& ostm, const char* val)
{
    int size = std::strlen(val);
    ostm.write(size);

    if(size<=0)
        return ostm;

    ostm.write(val, size);

    return ostm;
}

} // ns simple

#endif // MiniBinStream_H
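
One detail of the listing is worth pointing out: file_istream::read tests only bad(). When std::istream::read reaches the end of the file before delivering the requested number of bytes, it sets eofbit and failbit but not badbit, so a truncated file does not trigger the exception. The sketch below shows one possible stricter check based on gcount(). read_exact is a hypothetical helper, and std::istringstream stands in for the file stream.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: true only if exactly `size` bytes were read.
bool read_exact(std::istream& in, char* p, std::size_t size)
{
    in.read(p, static_cast<std::streamsize>(size));
    return static_cast<std::size_t>(in.gcount()) == size;
}
```

A caller could throw std::runtime_error when read_exact returns false, which would give the file stream the same "premature end" behaviour that mem_istream already has.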

Version 0.9.5 Breaking Changes

Requires C++11 now. The classes are templates.

C++
template<typename same_endian_type>
class file_istream {...}

template<typename same_endian_type>
class mem_istream  {...}

template<typename same_endian_type>
class ptr_istream  {...}

template<typename same_endian_type>
class file_ostream {...}

template<typename same_endian_type>
class mem_ostream  {...}

How do we pass same_endian_type to the class? Use std::is_same<>, which yields std::true_type when its two arguments are the same type and std::false_type otherwise.

C++
// The 1st parameter is the data endianness and the 2nd is the platform endianness; if they differ, the bytes are swapped.
using same_endian_type = std::is_same<simple::BigEndian, simple::LittleEndian>;
simple::mem_ostream<same_endian_type> out;
out << (int64_t)23 << (int64_t)24 << "Hello world!";

simple::ptr_istream<same_endian_type> in(out.get_internal_vec());
int64_t num1 = 0, num2 = 0;
std::string str;
in >> num1 >> num2 >> str;

cout << num1 << "," << num2 << "," << str << endl;
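
The mechanism behind this is tag dispatch. The following minimal sketch reuses the Endian, BigEndian and LittleEndian definitions from the swap listing later in this article, while needs_swap and needs_swap_for are hypothetical names added only to make the selected overload observable.

```cpp
#include <cassert>
#include <type_traits>

enum class Endian { Big, Little };
using BigEndian = std::integral_constant<Endian, Endian::Big>;
using LittleEndian = std::integral_constant<Endian, Endian::Little>;

// Hypothetical probe: overload resolution picks one of these at compile time.
inline bool needs_swap(std::true_type)  { return false; } // same endian
inline bool needs_swap(std::false_type) { return true;  } // different endian

template<typename DataEndian, typename PlatformEndian>
bool needs_swap_for()
{
    // std::is_same<A, B> derives from std::true_type or std::false_type,
    // so a default-constructed object converts to the matching tag.
    return needs_swap(std::is_same<DataEndian, PlatformEndian>());
}
```

Because the choice is made by overload resolution, no branch on the endianness remains in the generated code.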

If your data and platform always share the same endianness, you can skip the test by specifying std::true_type directly.

C++
simple::mem_ostream<std::true_type> out;
out << (int64_t)23 << (int64_t)24 << "Hello world!";

simple::ptr_istream<std::true_type> in(out.get_internal_vec());
int64_t num1 = 0, num2 = 0;
std::string str;
in >> num1 >> num2 >> str;

cout << num1 << "," << num2 << "," << str << endl;

Advantages of compile-time Check

  • For same_endian_type = true_type, the swap function is an empty function, which is optimised away.
  • For same_endian_type = false_type, the swapping is done without the cost of a prior runtime check.

Disadvantages of compile-time Check

  • Cannot parse files or data whose endianness varies from one input to the next. I believe this scenario is rare.
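
For that rare scenario, the decision has to move to runtime. The sketch below is not part of the library; byte_swap32 and load32 are hypothetical helpers showing what a runtime-checked swap of a 32-bit value could look like, at the cost of one branch per value.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: reverse the four bytes of a 32-bit value.
inline std::uint32_t byte_swap32(std::uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}

// Hypothetical helper: the endianness decision is an ordinary runtime flag,
// for example one derived from a marker in the file header.
inline std::uint32_t load32(std::uint32_t raw, bool same_endian)
{
    return same_endian ? raw : byte_swap32(raw);
}
```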

Swap functions are listed below:

C++
enum class Endian
{
    Big,
    Little
};
using BigEndian = std::integral_constant<Endian, Endian::Big>;
using LittleEndian = std::integral_constant<Endian, Endian::Little>;

// Helpers are declared before their first use so that conforming
// compilers (GCC, Clang) can find them during template name lookup.

// Primary template: sizes without a specialization (e.g. 1 byte) are left untouched.
template<typename T, size_t N>
struct swap_endian
{
    void operator()(T& ui)
    {
    }
};

template<typename T>
void swap_if_integral(T& val, std::false_type)
{
    // T is not integral so do nothing
}

template<typename T>
void swap_if_integral(T& val, std::true_type)
{
    swap_endian<T, sizeof(T)>()(val);
}

template<typename T>
void swap(T& val, std::true_type)
{
    // same endian so do nothing.
}

template<typename T>
void swap(T& val, std::false_type)
{
    std::is_integral<T> is_integral_type;
    swap_if_integral(val, is_integral_type);
}

template<typename T>
struct swap_endian<T, 8>
{
    void operator()(T& ui)
    {
        union EightBytes
        {
            T ui;
            uint8_t arr[8];
        };

        EightBytes fb;
        fb.ui = ui;
        // swap the endian
        std::swap(fb.arr[0], fb.arr[7]);
        std::swap(fb.arr[1], fb.arr[6]);
        std::swap(fb.arr[2], fb.arr[5]);
        std::swap(fb.arr[3], fb.arr[4]);

        ui = fb.ui;
    }
};

template<typename T>
struct swap_endian<T, 4>
{
    void operator()(T& ui)
    {
        union FourBytes
        {
            T ui;
            uint8_t arr[4];
        };

        FourBytes fb;
        fb.ui = ui;
        // swap the endian
        std::swap(fb.arr[0], fb.arr[3]);
        std::swap(fb.arr[1], fb.arr[2]);

        ui = fb.ui;
    }
};

template<typename T>
struct swap_endian<T, 2>
{
    void operator()(T& ui)
    {
        union TwoBytes
        {
            T ui;
            uint8_t arr[2];
        };

        TwoBytes fb;
        fb.ui = ui;
        // swap the endian
        std::swap(fb.arr[0], fb.arr[1]);

        ui = fb.ui;
    }
};
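
The specializations above swap bytes through a union. Reading a union member other than the one last written is formally undefined behaviour in C++, although common compilers support it in practice. As a hedged alternative, and not the library's implementation, the sketch below reverses the bytes through a char buffer with std::memcpy, which also covers every size with a single template. reverse_bytes is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical alternative to swap_endian: reverse the object's bytes through
// a char buffer instead of reading a union member that was not last written.
template<typename T>
void reverse_bytes(T& val)
{
    static_assert(std::is_arithmetic<T>::value, "arithmetic types only");
    unsigned char bytes[sizeof(T)];
    std::memcpy(bytes, &val, sizeof(T));
    std::reverse(bytes, bytes + sizeof(T));
    std::memcpy(&val, bytes, sizeof(T));
}
```

For a 1-byte type the reversal is a no-op, which matches the empty primary template of swap_endian.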

The code is hosted on GitHub.

  • 2016-08-01: Version 0.9.4 Update: Added ptr_istream which shares the same interface as mem_istream except it does not copy the array
  • 2016-08-06: Version 0.9.5 Update: Added Endian Swap
  • 2017-02-16: Version 0.9.6 Using C File APIs, instead of STL file streams
  • 2017-02-16: Version 0.9.7 Added memfile_istream

    Benchmark of 0.9.7(C file API) against 0.9.5(C++ File Stream)

    C++
       # File streams (C++ File stream versus C file API)
    
       old::file_ostream:  359ms
       old::file_istream:  416ms
       new::file_ostream:  216ms
       new::file_istream:  328ms
    new::memfile_ostream:  552ms
    new::memfile_istream:   12ms
    
       # In-memory streams (No change in source code)
    
        new::mem_ostream:  534ms
        new::mem_istream:   16ms
        new::ptr_istream:   15ms
  • 2017-03-07: Version 0.9.8: Fixed GCC and Clang template errors
  • 2017-08-17: Version 0.9.9: Fixed bug of getting previous value when reading empty string
  • 2018-01-23: Version 1.0.0: Fixed buffer overrun bug when reading string (reported by imtrobin)
  • 2018-05-14: Version 1.0.1: Fixed memfile_istream tellg and seekg bug reported by macxfadz, and switched from is_integral to is_arithmetic so that both integer and floating-point types are swapped
  • 2018-08-12: Version 1.0.2: Added overloaded file open functions that take the file path as a wide-character string (only available on Win32)

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Software Developer (Senior)
Singapore Singapore
Shao Voon is from Singapore. His interest lies primarily in computer graphics, software optimization, concurrency, security, and Agile methodologies.

In recent years, he shifted focus to software safety research. His hobby is writing a free C++ DirectX photo slideshow application which can be viewed here.
