Inside C++ – Introduction

21 Jul 2014

C++ is a vast language, and it keeps evolving and becoming more expressive. With Microsoft and Apple backing it, it has a lot of potential. In this series we will talk about C++ in general, its different features, and some of its internals. I will use the LLVM/Clang compiler toolchain to dive into the language and explore its features. The best guide I can think of is the book Inside the C++ Object Model. I will follow that book and look at each topic the Clang way.

Note: This is not a C++ beginner tutorial and is not intended to be one. For that, refer to the other books and tutorials available on the net.

In this introduction we will see what LLVM assembly looks like. You can get the Clang installer for Windows from the download links on the llvm.org website.

For a complete description of the LLVM assembly language, refer to the LLVM documentation. Here I will give a brief introduction, and we will discover more of the assembly when and where required. LLVM assembly is a typed assembly language: unlike regular assembly, every value carries a type. It is designed to stay close to high-level languages and to allow better optimization and safety checks. Some things to remember:

  • Comments in LLVM assembly begin with a semicolon (;) and continue to the end of the line.
  • Global identifiers begin with the at (@) character. All function names and global variables must begin with @, as well.
  • Local identifiers in LLVM begin with a percent symbol (%). The regular expression for named identifiers is [%@][-a-zA-Z$._][-a-zA-Z$._0-9]*.
  • LLVM has a strong type system, which is counted among its most important features. An integer type is written iN, where N is the number of bits the integer occupies. You can specify any bit width between 1 and 2^23 - 1.
  • You declare an array type as [<number of elements> x <element type>]. (Vector types use the same form inside angle brackets, for example <4 x i32>.) For the string “Hello World!” this makes the type [13 x i8]: 12 characters of 1 byte each, plus 1 extra byte for the terminating NUL character.
  • You declare a global string constant for the hello-world string as follows: @hello = constant [13 x i8] c"Hello World!\00". Use the constant keyword to declare a constant, followed by the type and the value. The type has already been discussed, so let’s look at the value: you begin with c followed by the entire string in double quotation marks. The c prefix marks the value as a character-array constant, and \00 is a two-digit hexadecimal escape that encodes a single zero byte, the string terminator; it is not a \0 followed by a separate 0. The LLVM Language Reference describes the full constant syntax.
  • LLVM lets you declare and define functions. Instead of going through the entire feature list of an LLVM function, I concentrate on the bare bones. Begin with the define keyword followed by the return type, and then the function name. A simple definition of main that returns a 32-bit integer looks like: define i32 @main() { ; some LLVM assembly code that returns i32 }.
  • Function declarations, like definitions, have a lot of meat to them. Here’s the simplest declaration of the puts function from the C library, a simpler relative of printf: declare i32 @puts(i8*). You begin the declaration with the declare keyword followed by the return type, the function name (with its @ prefix), and the list of parameter types, which may be empty. The declaration must be in the global scope.
  • Each function ends with a return statement. There are two forms of return statement: ret <type> <value> or ret void. For your simple main routine, ret i32 0 suffices.
  • Use call <function return type> <function name> <optional function arguments> to call a function. Note that each function argument must be preceded by its type. A function test that returns an integer of 6 bits and accepts an integer of 36 bits has the syntax: call i6 @test( i36 %arg1 ).
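The hello-world pieces listed above have direct C++ counterparts. The following is a minimal sketch, not the output of any tool: say_hello is a hypothetical name chosen for this example, and the IR that Clang actually emits for it will differ in details such as attributes and name mangling.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// C++ counterpart of: @hello = constant [13 x i8] c"Hello World!\00"
// The array holds 12 visible characters plus the terminating zero byte.
constexpr char hello[] = "Hello World!";
static_assert(sizeof(hello) == 13, "12 characters + 1 terminator");

// C++ counterpart of declaring puts, calling it, and returning an i32.
// say_hello is a hypothetical helper used only for this sketch.
int say_hello() {
    // std::puts returns a non-negative value on success and EOF on failure.
    const int status = std::puts(hello);
    return status >= 0 ? 0 : 1;
}
```

Calling say_hello() from main and compiling with -S -emit-llvm is one way to see the global constant, the declare line for puts and the call instruction side by side.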

Now we know enough to start compiling some examples and playing around. On Windows, create the file test.cpp with the following contents:

class A {
    int _i;
};

int main() {
    A t;
    return 1;
}

Now run clang.exe with the following command to generate the file “test.ll”, which contains the LLVM assembly:

clang.exe -S -emit-llvm test.cpp

The generated code is listed below:

%class.A = type { i32 }

; Function Attrs: nounwind
define i32 @main() #0 {
entry:
%retval = alloca i32, align 4
%t = alloca %class.A, align 4
store i32 0, i32* %retval
ret i32 1
}

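The first line, %class.A = type { i32 }, declares a named structure type. Type names, like local identifiers, start with % and may include a “.”, so the type is named “class.A”. It contains a single 32-bit integer, which corresponds to the member _i. Comments start with “;”. The main function starts with the keyword “define”; inside it, the two alloca instructions reserve stack space for the return-value slot and for the local object t, and ret i32 1 returns 1.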
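The mapping from class A to a structure with one i32 can also be checked from the C++ side. This is a small sketch under the assumption of a mainstream ABI where int is 32 bits wide and no padding is added to a class with a single int member; a_layout_matches_int is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

class A {
    int _i;  // the only data member; it appears as the single i32 in %class.A
};

// All data members share one access level and there are no virtual
// functions or base classes, so A is a standard-layout class.
static_assert(std::is_standard_layout<A>::value, "A is standard-layout");

// Hypothetical helper: true when A occupies exactly the storage of one int.
// The standard does not strictly promise this, but mainstream ABIs provide it.
bool a_layout_matches_int() {
    return sizeof(A) == sizeof(int) && alignof(A) == alignof(int);
}
```

The align 4 on the alloca for %t in the listing matches alignof(A) on such a platform.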

Clang can also dump the C++ AST (Abstract Syntax Tree) so that we can analyze it. This is not always practical for broader use, because the dump grows quickly, but we can study some basic things from small examples. I will not use any includes or libraries in the example code, so as to reduce the complexity of the AST and the IR.

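To generate an AST dump we can give the following command:

clang.exe -fcolor-diagnostics -Xclang -ast-dump test.cpp

The output below comes from a slightly larger test.cpp than the one listed earlier (its source is not shown here): it declares a function pointer typedef FunPtrType and a union U with six members, and its main takes sizeof of several types: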

TranslationUnitDecl 0x270dc0 <<invalid sloc>> <invalid sloc>
|-TypedefDecl 0x2710b0 <<invalid sloc>> <invalid sloc> implicit __builtin_va_list 'char *'
|-TypedefDecl 0x2711b0 <test.cpp:1:1, col:32> col:16 FunPtrType 'void (*)(void)'
|-CXXRecordDecl 0x2711e0 <line:2:1, line:9:1> line:2:7 union U definition
| |-CXXRecordDecl 0x2712b0 <col:1, col:7> col:7 implicit union U
| |-FieldDecl 0x271320 <line:3:1, col:5> col:5 _i 'int'
| |-FieldDecl 0x271360 <line:4:1, col:7> col:7 _f 'float'
| |-FieldDecl 0x2713a0 <line:5:1, col:6> col:6 _c 'char'
| |-FieldDecl 0x2713e0 <line:6:1, col:8> col:8 _d 'double'
| |-FieldDecl 0x271420 <line:7:1, col:7> col:7 _p 'void *'
| |-FieldDecl 0x271470 <line:8:1, col:12> col:12 _fp 'FunPtrType':'void (*)(void)'
| |-CXXConstructorDecl 0x271900 <line:2:7> col:7 implicit used U 'void (void) __attribute__((thiscall))' inline noexcept-unevaluated 0x271900
| | `-CompoundStmt 0x271ad8 <col:7>
| `-CXXConstructorDecl 0x2719e0 <col:7> col:7 implicit U 'void (const union U &) __attribute__((thiscall))' inline noexcept-unevaluated 0x2719e0
|   `-ParmVarDecl 0x271aa0 <col:7> col:7 'const union U &'
`-FunctionDecl 0x2714e0 <line:11:1, line:21:2> line:11:5 main 'int (void)'
  `-CompoundStmt 0x271b50 <col:11, line:21:2>
    |-DeclStmt 0x2715e8 <line:12:2, col:27>
    | `-VarDecl 0x271580 <col:2, col:26> col:6 sizeInt 'int'
    |   `-ImplicitCastExpr 0x2715d8 <col:16, col:26> 'int' <IntegralCast>
    |     `-UnaryExprOrTypeTraitExpr 0x2715c0 <col:16, col:26> 'unsigned int' sizeof 'int'
    |-DeclStmt 0x271678 <line:13:2, col:31>
    | `-VarDecl 0x271610 <col:2, col:30> col:6 sizefloat 'int'
    |   `-ImplicitCastExpr 0x271668 <col:18, col:30> 'int' <IntegralCast>
    |     `-UnaryExprOrTypeTraitExpr 0x271650 <col:18, col:30> 'unsigned int' sizeof 'float'
    |-DeclStmt 0x271700 <line:14:2, col:29>
    | `-VarDecl 0x2716a0 <col:2, col:28> col:6 sizeChar 'int'
    |   `-ImplicitCastExpr 0x2716f0 <col:17, col:28> 'int' <IntegralCast>
    |     `-UnaryExprOrTypeTraitExpr 0x2716d8 <col:17, col:28> 'unsigned int' sizeof 'char'
    |-DeclStmt 0x271788 <line:15:2, col:33>
    | `-VarDecl 0x271720 <col:2, col:32> col:6 sizeDouble 'int'
    |   `-ImplicitCastExpr 0x271778 <col:19, col:32> 'int' <IntegralCast>
    |     `-UnaryExprOrTypeTraitExpr 0x271760 <col:19, col:32> 'unsigned int' sizeof 'double'
    |-DeclStmt 0x271818 <line:16:2, col:27>
    | `-VarDecl 0x2717b0 <col:2, col:26> col:6 sizeV 'int'
    |   `-ImplicitCastExpr 0x271808 <col:14, col:26> 'int' <IntegralCast>
    |     `-UnaryExprOrTypeTraitExpr 0x2717f0 <col:14, col:26> 'unsigned int' sizeof 'void *'
    |-DeclStmt 0x2718a0 <line:17:2, col:33>
    | `-VarDecl 0x271840 <col:2, col:32> col:6 sizeFP 'int'
    |   `-ImplicitCastExpr 0x271890 <col:15, col:32> 'int' <IntegralCast>
    |     `-UnaryExprOrTypeTraitExpr 0x271878 <col:15, col:32> 'unsigned int' sizeof 'FunPtrType':'void (*)(void)'
    |-DeclStmt 0x271b10 <line:19:2, col:5>
    | `-VarDecl 0x2718c0 <col:2, col:4> col:4 u 'union U'
    |   `-CXXConstructExpr 0x271ae8 <col:4> 'union U' 'void (void) __attribute__((thiscall))'
    `-ReturnStmt 0x271b40 <line:20:2, col:9>
      `-IntegerLiteral 0x271b20 <col:9> 'int' 1
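The declarations in this dump can be read back into C++. The sketch below is a reconstruction inferred from the FieldDecl and VarDecl nodes, not the author's original file; largest_member_size, max2 and size_of_int_as_int are hypothetical helpers added for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Matches the TypedefDecl node: FunPtrType 'void (*)(void)'.
typedef void (*FunPtrType)(void);

// Matches the CXXRecordDecl node "union U" and its six FieldDecl children.
union U {
    int _i;
    float _f;
    char _c;
    double _d;
    void* _p;
    FunPtrType _fp;
};

// Hypothetical helper: the larger of two sizes.
constexpr std::size_t max2(std::size_t a, std::size_t b) { return a > b ? a : b; }

// Hypothetical helper: the size of the largest member of U.
constexpr std::size_t largest_member_size() {
    return max2(max2(max2(sizeof(int), sizeof(float)),
                     max2(sizeof(char), sizeof(double))),
                max2(sizeof(void*), sizeof(FunPtrType)));
}

// All members of a union share storage, so U must hold its largest member.
static_assert(sizeof(U) >= largest_member_size(), "U holds its largest member");

// Mirrors "int sizeInt = sizeof(int);" from the dump: sizeof yields an
// unsigned type, which the ImplicitCastExpr <IntegralCast> converts to int.
int size_of_int_as_int() {
    int sizeInt = sizeof(int);
    return sizeInt;
}
```

The 'unsigned int' type on the UnaryExprOrTypeTraitExpr nodes is the size type of the 32-bit target used for the dump; on a 64-bit target the same node would typically show a wider unsigned type.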

C++ is a statically typed language, which allows for better optimization of code. So what do we mean by an object model?

It means the properties of objects, the way they are laid out in memory, and the mechanisms used to achieve this. It also includes the collection of objects or classes through which a program can examine and manipulate specific parts of its world. The object model covers inheritance, the ways objects are passed around, the way they are used to achieve polymorphism, exception throwing and so on. We will look at the C++ object model in detail in the following chapters.
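As a first taste of what the object model decides, the sketch below compares a plain struct with a polymorphic one. Plain, Shape, Circle, describe and polymorphism_costs_space are hypothetical names for this example, and the size difference it checks is an implementation detail of mainstream compilers (a hidden pointer to a virtual table), not something the C++ standard requires.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// A plain aggregate: its storage is just its data member.
struct Plain {
    int value;
};

// A polymorphic class: typical implementations add a hidden pointer
// to a table of virtual functions in every object.
struct Shape {
    virtual ~Shape() = default;
    virtual std::string name() const { return "shape"; }
    int value = 0;
};

struct Circle : Shape {
    std::string name() const override { return "circle"; }
};

// Calls name() through a base-class reference; the function that runs
// is chosen at run time from the dynamic type of the object.
std::string describe(const Shape& s) { return s.name(); }

// True when the virtual functions made Shape larger than its data alone.
bool polymorphism_costs_space() { return sizeof(Shape) > sizeof(Plain); }
```

Compiling a file like this with -S -emit-llvm shows where the extra storage comes from in the generated structure types.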

In the next chapter we will cover structs, classes and objects in C++.


Bibliography

  1. C++ Language Draft used
  2. LLVM
  3. Why llvm IR is better than assembly?
  4. llvm online assembler
  5. llvm introduction – PDF by Nick Sumner
  6. Implementing a language using llvm
  7. Inside the C++ Object Model – Stanley B. Lippman

License

This article, along with any associated source code and files, is licensed under The Mozilla Public License 1.1 (MPL 1.1)


Written By
Architect
India
I like to explore different aspects of technology, try new things, and get delighted. My interests are programming languages and imaging, but it's not hard for me to work on other things as well. Algorithms delight me over a coffee break.

I mostly code in C++, but Java is not so alien to me. I know a few scripting languages as well. Basically, I feel that knowing a programming language is just a matter of getting introduced to it.

For my other articles check my blog on homepage:

http://brainlesslabs.com/

https://github.com/BrainlessLabsInc

http://www.luxrender.net/en_GB/authors_contributors - SMISRA
