
Generate C Code That Partially Simulates A C++ Class

20 Oct 2013, MIT License
A C code generator that uses the make_cpp_class.py input file format

Introduction 

The program make_c_struct.py makes it easier to produce a C header file and implementation file that simulate some aspects of a C++ class. It often produces correct code; however, it cannot always identify the proper header files to include, so the generated code might have to be edited before it compiles successfully. Even so, the program saves a lot of typing.

I wrote this program after writing make_cpp_class.py, because on one particular platform I could not use C++. I wanted to use the same input file I used for make_cpp_class.py, but generate C code instead, so this program accepts the same input file format as make_cpp_class.py.

C++ supports encapsulation, inheritance, and polymorphism. This program simulates C++ encapsulation and, for C++ virtual functions, generates C code that simulates polymorphism. Some C++ features are ignored.

The program is intended for experienced C programmers, because making the necessary changes to the generated code requires understanding the C language. If the input file uses C++ features, it is also necessary to understand C++.

The program takes an input file name, a structure name, and a few optional switch arguments. The input file contains code in a very simple language. Once code is generated using this program, the input file can be discarded.

The body of any method in the input file is copied to the generated code without any changes. In other words, you still must write the code that does the actual work; the program merely eliminates the need to write the C boilerplate code.

This program works, but support for virtual methods was added only recently and has not been tested. Please revisit this article periodically; I will update the code and the history if there are any issues. Apart from that one feature, the program has been tested and works as described below.

The program creates a folder with the same name as the structure name passed on the command line and writes the generated code into that folder.

Program Files

  • make_c_struct.py - The main program file that parses the arguments and calls the code generator.
  • bstruct.py - The main code generation module that parses the input file and creates the generated files.
  • cparse.py - Parses the input file.
  • file_buffer.py - Buffers the input file data; used by cparse.py.
  • cfunction_info.py - Stores method information, including the method name, return type, and argument types.
  • include_file_manager.py - Stores the include file names for some standard types.
  • ctype_and_name_info.py - Stores a name (of either a variable or a method) and a type_info instance.
  • ctype_info.py - Stores the attributes of a type.
  • test_input.txt - A file that can be used as input to the program to demonstrate some of its capabilities. This file is not part of the program.

Using the code

While the input language is described below, looking at the file 'test_input.txt' will make the descriptions much clearer.

The program does relatively little lexical analysis of the data in the input file.  While some sequences will be recognized as erroneous, if the input file is not correct the program might produce garbage.  This is another reason it's important to know C++ if you use this program.


Usage:

Shell
python make_c_struct.py <struct_name> <input_file_name> [-a author_name] [-f] [-c]

The program accepts the following switches:

    -a author_name, --author author_name   The author name.
    -f, --full                             Write full detailed header information.
    -c, --c99comment                       Write C99-style comments.
    -h, --help                             Show help and exit.

The Input File Format

Structure Element Format In The Input File

From here on, a structure element is referred to as a "Data Member".

Data members are first in the file and are declared using the following format:

C++
<Type> <Name> [= initial_value]<;>
The type can be a pointer or a reference type, and it can be preceded by const, volatile, or static. Although 'const volatile' is valid, the two keywords are not allowed together; I left this out only to simplify the parser. If needed, use one keyword and add the other to the generated code.

A sample data member set might be:

C++
short m_age;
int m_count = 7;
Foo_t * m_foo;
static const float m_height = 1.0;
If an initial value is supplied for a data member, then everything between the equal sign and the terminating semicolon is used as the initial value in the generated code.

While the "static" keyword in C limits a file-level variable to its own source file, the "static" keyword in the input file means that the variable will be declared as a global variable.
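As a rough illustration, here is one way the sample line `static const float m_height = 1.0;` might come out in C. This is a sketch that assumes the generator emits a single global variable outside the structure; the names `Foobar_m_height` and `Foobar_height` are hypothetical, not taken from the program's output.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical output for "static const float m_height = 1.0;" with the
   struct name Foobar: one global variable shared by every instance,
   rather than an element stored in each structure. */
const float Foobar_m_height = 1.0f;

typedef struct Foobar
{
    short m_age;
    int m_count;
    /* No m_height element here: it lives in the global above. */
} Foobar;

/* Hypothetical accessor: returns the same value for every instance. */
float Foobar_height(const Foobar * foobar_ptr)
{
    assert(foobar_ptr != NULL);
    (void)foobar_ptr;   /* avoids an unused-parameter warning under NDEBUG */
    return Foobar_m_height;
}
```

The point of the design is memory: a value shared by all instances is stored once, not once per structure.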

Constructors and the Destructor

Constructors are turned into C factory functions with the signature:

C++
<struct_name> * create<struct_name><digit_string>([argument_list])
For the first constructor, digit_string is empty. For the second constructor, digit_string is "1", and it increases by one for each subsequent factory function that is generated.
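To illustrate the numbering, here is a sketch of the two factory functions that the constructors `@()` and `@(int x)`, shown later in this section, might produce for the struct name Foobar. It assumes heap allocation with `malloc` and a hypothetical data member `m_x`; the real generated bodies may differ.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Foobar
{
    int m_x;   /* hypothetical data member */
} Foobar;

/* From the first constructor, "@()": the digit string is empty. */
Foobar * createFoobar(void)
{
    Foobar * foobar_ptr = malloc(sizeof *foobar_ptr);
    if (foobar_ptr != NULL)
    {
        foobar_ptr->m_x = 0;
    }
    return foobar_ptr;
}

/* From the second constructor, "@(int x)": the digit string is "1".
   A third constructor would become createFoobar2, and so on. */
Foobar * createFoobar1(int x)
{
    Foobar * foobar_ptr = malloc(sizeof *foobar_ptr);
    if (foobar_ptr != NULL)
    {
        foobar_ptr->m_x = x;   /* the copied constructor body goes here */
    }
    return foobar_ptr;
}
```

Numbering the factories is needed because C has no function overloading, so each constructor must map to a distinct function name.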


The destructor in the input file generates a function with the signature:

C++
void destroy<struct_name>(<struct_name> * <lower_case_struct_name>_ptr)
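A sketch of the matching `destroyFoobar` function, assuming the structure was allocated with `malloc` and owns one heap buffer (the `m_values` member is hypothetical). The destructor body from the input file would release owned resources before the structure itself is freed.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Foobar
{
    int * m_values;   /* hypothetical owned buffer */
} Foobar;

Foobar * createFoobar(void)
{
    Foobar * foobar_ptr = malloc(sizeof *foobar_ptr);
    if (foobar_ptr != NULL)
    {
        foobar_ptr->m_values = calloc(4, sizeof *foobar_ptr->m_values);
    }
    return foobar_ptr;
}

/* Follows void destroy<struct_name>(<struct_name> * <lower_case_struct_name>_ptr).
   Passing NULL is a no-op, matching the behavior of free(NULL). */
void destroyFoobar(Foobar * foobar_ptr)
{
    if (foobar_ptr == NULL)
    {
        return;
    }
    free(foobar_ptr->m_values);   /* the copied destructor body goes here */
    free(foobar_ptr);             /* then the structure itself is released */
}
```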
After the data member declarations, the constructors, destructor, and methods are listed in the input file.  Headers are written automatically for all methods.

The return type of a method is declared the same way as a data member type, and it additionally allows the keywords 'inline' and 'virtual'. These two keywords cannot be used together, and whichever one is used must come first on the function declaration line.

Because the class name is specified on the command line, the input file uses the character '@' in place of the class name for constructors and the destructor.

Here are some examples that show two constructors and one virtual destructor.

C++
@()
{
}

@(int x)
{
    // Some code here.
}

virtual ~@()
{
} 
Member initialization lists are created for every constructor.  Member initialization lists might need to be edited, but much of the time what is written can be left as-is.

If the class name passed on the command line is 'Foobar' and the data member list is the example shown above, then the first constructor definition shown above, which starts with the '@' character, would produce the following generated code:

C++
Foobar::Foobar()
  : m_age(0)
  , m_count(7)
  , m_foo(NULL)
{
}
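C has no member initialization lists, so the C++ form above has to be expressed differently in a C factory function. One possible equivalent, sketched here as an assumption rather than the program's verified output, assigns a C99 compound literal with designated initializers, which reads much like the initialization list. `Foo_t` gets a placeholder definition, and the static member `m_height` is omitted because it is not stored per instance.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Foo_t { int placeholder; } Foo_t;   /* placeholder type */

typedef struct Foobar
{
    short m_age;
    int m_count;
    Foo_t * m_foo;
} Foobar;

Foobar * createFoobar(void)
{
    Foobar * foobar_ptr = malloc(sizeof *foobar_ptr);
    if (foobar_ptr == NULL)
    {
        return NULL;
    }
    /* Same values as ": m_age(0), m_count(7), m_foo(NULL)". */
    *foobar_ptr = (Foobar){ .m_age = 0, .m_count = 7, .m_foo = NULL };
    /* The constructor body from the input file would follow here. */
    return foobar_ptr;
}
```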

The Copy Constructor function

Putting the keyword "copy:" on an input line automatically generates code for a function that does a shallow copy of the passed structure.

C++
copy:

The created function has the following signature:

C++
<struct_name> * create<struct_name>Copy(const <struct_name> * source_<lower_case_struct_name>_ptr)
The keyword "nocopy:" is ignored.
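A minimal sketch of what `createFoobarCopy` might look like for a structure named Foobar, assuming `malloc` allocation; the `m_name` member is hypothetical. Structure assignment copies every element, which is exactly a shallow copy: pointer members in the copy share the data they point to with the source.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Foobar
{
    int m_count;
    char * m_name;   /* hypothetical pointer member: shared, not duplicated */
} Foobar;

/* Shallow copy: every structure element is copied as-is. */
Foobar * createFoobarCopy(const Foobar * source_foobar_ptr)
{
    assert(source_foobar_ptr != NULL);
    Foobar * foobar_ptr = malloc(sizeof *foobar_ptr);
    if (foobar_ptr != NULL)
    {
        *foobar_ptr = *source_foobar_ptr;   /* structure assignment */
    }
    return foobar_ptr;
}
```

If a deep copy is needed, the generated function has to be edited to duplicate the pointed-to data as well.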

Methods

All methods that are not declared as "static" automatically get an added first argument that is a pointer to an instance of type <struct_name>.
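For example, the accumulate method shown later in this section could become a C function that receives the instance explicitly. This is a sketch with hypothetical names (`Foobar_accumulate`, `Foobar_sum`, `Foobar_version`, and the `m_sum` member); the naming scheme of the real generated code may differ. Because C has no implicit `this` pointer and method bodies are copied unchanged, member accesses in a body presumably must go through the added pointer, as they do here.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Foobar
{
    double m_sum;
} Foobar;

/* Input: double accumulate(double addend) { m_sum += addend; return m_sum; }
   The instance pointer becomes the first argument. */
double Foobar_accumulate(Foobar * foobar_ptr, double addend)
{
    assert(foobar_ptr != NULL);
    foobar_ptr->m_sum += addend;
    return foobar_ptr->m_sum;
}

/* A const method maps naturally to a pointer-to-const instance. */
double Foobar_sum(const Foobar * foobar_ptr)
{
    assert(foobar_ptr != NULL);
    return foobar_ptr->m_sum;
}

/* A static method gets no instance pointer. */
int Foobar_version(void)
{
    return 1;
}
```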

Methods are declared similarly to C functions, but a method can be made const by putting the 'const' keyword at the end. An "= 0" following a method is ignored.

If there are any virtual methods in the input file, then a static function table, similar to a C++ vtable, is created. An alternative approach would be to put one function pointer in the structure for each virtual function; however, if many structure instances are created, this would waste memory. The tradeoff is an extra level of indirection in each virtual function call. The form of the generated code makes it easy to switch to multiple function pointers in the structure.

Here are three method examples that could be specified in the input file:

C++
double accumulate(double addend)
{
    m_sum += addend;
    return m_sum;
}

const Foobar & GetFoobar() const
{
    return m_foobar;
}

virtual int doSomething(int anIntegerToUseForSomething) const
{
    // Need to return something here, but make_c_struct.py won't detect
    // that no value is returned as the function body is merely copied
    // into the method in the generated code.
} 
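The function-table approach described above can be sketched as follows. It is an illustration under assumptions, not the generated code: `FoobarVtable`, `callDoSomething`, `initFoobar`, and the "derived" `Doubler` table are all hypothetical names. Each instance stores only one pointer to a shared, static table, and a virtual call goes through two indirections.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Foobar Foobar;

/* One table per "class", shared by every instance, like a C++ vtable. */
typedef struct FoobarVtable
{
    int (*doSomething)(const Foobar * foobar_ptr, int value);
} FoobarVtable;

struct Foobar
{
    const FoobarVtable * m_vtable;   /* the only per-instance cost */
    int m_scale;
};

/* Base implementation of the virtual method. */
static int Foobar_doSomething(const Foobar * foobar_ptr, int value)
{
    return foobar_ptr->m_scale * value;
}

/* Hypothetical "derived" implementation with its own table. */
static int Doubler_doSomething(const Foobar * foobar_ptr, int value)
{
    return 2 * foobar_ptr->m_scale * value;
}

static const FoobarVtable foobar_vtable = { Foobar_doSomething };
static const FoobarVtable doubler_vtable = { Doubler_doSomething };

void initFoobar(Foobar * foobar_ptr, int scale)
{
    foobar_ptr->m_vtable = &foobar_vtable;
    foobar_ptr->m_scale = scale;
}

void initDoubler(Foobar * foobar_ptr, int scale)
{
    foobar_ptr->m_vtable = &doubler_vtable;
    foobar_ptr->m_scale = scale;
}

/* Virtual call: instance -> table -> function. */
int callDoSomething(const Foobar * foobar_ptr, int value)
{
    assert(foobar_ptr != NULL && foobar_ptr->m_vtable != NULL);
    return foobar_ptr->m_vtable->doSomething(foobar_ptr, value);
}
```

Switching to per-instance function pointers would mean moving the `doSomething` member from `FoobarVtable` into `struct Foobar`, trading memory for one less indirection.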

Property Methods

There is another special keyword to define properties. A data member for a property should not be declared in the data member section mentioned above.

The syntax to generate a property, which uses the "property:" keyword, is:

C++
property: <type> <data_member_name> <property_name>

This definition will create a method named set<property_name> and a method named get<property_name>.

Here is an example property definition.

C++
property: int m_age Age

The property definition above will result in the code generator writing the following data member (structure element) and methods.  The code headers are omitted here for brevity.  Also, the data member m_age would be declared in the data member section of the class header file.

C++
int m_age;

int Age() const
{
    return m_age;
}

void setAge(int the_value)
{
     m_age = the_value;
}
If the data type used in the declaration of a property is not an intrinsic type, then the code generator will generate slightly different code than shown above.  Here is an example:
C++
property: Person_t * m_person Person
And the generated code is:
C++
Person_t * m_person;

Person_t * Person() const
{
    return m_person;
}

void setPerson(const Person_t * the_value)
{
     m_person = the_value;
}
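The property examples above use C++ method syntax. In C, the accessors need the instance pointer as their first argument; a sketch for the m_age property, with the hypothetical function names `Foobar_getAge` and `Foobar_setAge`, might look like this.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Foobar
{
    int m_age;   /* data member created by "property: int m_age Age" */
} Foobar;

/* Getter: a const method, so it takes a pointer to a const instance. */
int Foobar_getAge(const Foobar * foobar_ptr)
{
    assert(foobar_ptr != NULL);
    return foobar_ptr->m_age;
}

/* Setter: modifies the instance through the pointer. */
void Foobar_setAge(Foobar * foobar_ptr, int the_value)
{
    assert(foobar_ptr != NULL);
    foobar_ptr->m_age = the_value;
}
```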

The Method Body

The parser detects the start of a method body by parsing past the method signature to the first open curly bracket character, or '{'.

At that point, ignoring curly brackets that are contained in string literals, each open curly bracket '{' increments a counter that starts at zero, and each close curly bracket '}' decrements it. When the count returns to zero, the method body has ended.

If the brackets are not correct in the input file, the program will produce incorrect code.
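The brace-counting rule can be sketched in C as follows. This illustrates the technique only and is not the program's parser (which is written in Python); `findMethodBodyEnd` is a hypothetical name, and skipping character literals and backslash escapes goes slightly beyond what the text describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Returns the index just past the '}' that closes the first method body
   in text, or -1 if the braces never balance. Braces inside string
   literals ("...") and character literals ('...') are ignored. */
long findMethodBodyEnd(const char * text)
{
    int depth = 0;
    int started = 0;
    char quote = '\0';   /* '"' or '\'' while inside a literal */

    for (size_t i = 0; text[i] != '\0'; ++i)
    {
        char c = text[i];
        if (quote != '\0')
        {
            if (c == '\\' && text[i + 1] != '\0')
            {
                ++i;               /* skip the escaped character */
            }
            else if (c == quote)
            {
                quote = '\0';      /* the literal ends */
            }
            continue;
        }
        if (c == '"' || c == '\'')
        {
            quote = c;             /* a literal begins */
        }
        else if (c == '{')
        {
            ++depth;
            started = 1;
        }
        else if (c == '}' && started)
        {
            --depth;
            if (depth == 0)
            {
                return (long)(i + 1);
            }
        }
    }
    return -1;   /* unbalanced input: the caller should report an error */
}
```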

The Mistakes This Program Makes

As already mentioned, there is minimal lexical analysis of the input file, so garbage in will result in garbage out.

Also, the program assumes that every type name that is neither an intrinsic (built-in) type nor one of the special type names handled by the include-file-manager code needs a header file with the same name as the type followed by the extension ".h", and it generates an include statement for that file. Of course, this often won't be a valid header file name, so it will often be necessary to either delete the include statement or change the header file name in it.
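The header-guessing rule can be sketched like this. The intrinsic and special type tables below are hypothetical stand-ins for whatever include_file_manager.py actually contains, and `headerForType` is an illustrative name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical subset of special types with known standard headers. */
static const struct { const char * type; const char * header; } special_types[] =
{
    { "size_t", "stddef.h" },
    { "FILE", "stdio.h" },
    { "bool", "stdbool.h" },
    { "int32_t", "stdint.h" },
};

static const char * const intrinsic_types[] =
{
    "void", "char", "short", "int", "long", "float", "double", "signed", "unsigned"
};

/* Writes the header name to include for type_name into buffer.
   Returns 0 if no include is needed (intrinsic type), 1 otherwise. */
int headerForType(const char * type_name, char * buffer, size_t buffer_size)
{
    for (size_t i = 0; i < sizeof intrinsic_types / sizeof intrinsic_types[0]; ++i)
    {
        if (strcmp(type_name, intrinsic_types[i]) == 0)
        {
            return 0;
        }
    }
    for (size_t i = 0; i < sizeof special_types / sizeof special_types[0]; ++i)
    {
        if (strcmp(type_name, special_types[i].type) == 0)
        {
            snprintf(buffer, buffer_size, "%s", special_types[i].header);
            return 1;
        }
    }
    /* Fallback rule: <type_name>.h, which often needs editing afterwards. */
    snprintf(buffer, buffer_size, "%s.h", type_name);
    return 1;
}
```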

Also, any data types in the body of the code are not detected, and it might be necessary to include header files in the generated code for these types.

Tips and Tricks

If there is some text that I know the program will not generate, such as a comment block or include statements, and I want it in the class header file, I declare an inline function such as:

C++
inline void dummy()
{
#include "foobar.h"
#include "barfoo.h"
}
Later, I edit the file, move the text where it belongs, and delete the unneeded method.

Also, using '@' for the class name allows generating multiple classes with the same set of methods that are all derived from the same base-class by running the program multiple times passing a different class name and the same base class on the input line.

Points of Interest 

The program was written out of a desire to avoid typing all the redundant information in a header file and an implementation file, along with all the boilerplate code that every class needs.

This program could be better, but it would have taken many months, if not years, to write a proper program that handled all cases with no errors. For me, that would have been slower than just writing this tool and then fixing the small errors in include files.

Still, if thousands of people are going to generate code, it might be worthwhile to create a grammar for the input language, use yacc or ANTLR to create a parser, and then add a code generator. Such a program could parse method bodies and would be more robust.

History

Initial post


License

This article, along with any associated source code and files, is licensed under The MIT License


Written By
Software Developer (Senior)
United States
I'm an electrical engineer, and I have spent most of my career writing software. My background includes Digital Signal Processing, Multimedia programming, Robotics, Text-To-Speech, and Storage products. Most of the code that I've written is in C, C++ and Python. I know Object Oriented Design and I'm a proponent of Design Patterns.

My hobbies include writing software for fun, amateur radio, chess, and performing magic, mostly for charities.
