
Do You Really Mind What's Inside Your Computer? (An Introduction to High Performance Computing)


Image 1

As a programmer, have you ever thought about what is inside your computer, or what is inside your client's computer? I know the answer is "Yes", because any programmer who intends to deploy a product should consider the target platform to make sure the product runs smoothly. But I also know that many programmers cannot promise that their software makes optimal use of the available hardware resources.

I'm going to remind you of a few things a programmer should know when a program is critical in execution time; in other words, when a program is computationally expensive and must run in real time. More precisely, I'm going to explain how we can get into high performance computing using a few important things that are already inside our computers but may never have been used.

What is High Performance Computing (HPC)?

According to the common web definition, HPC is the practice of solving advanced computation problems using supercomputers and computer clusters. As we know, supercomputers are not within reach of the typical programmer or of many clients. But we can still experience HPC with our personal computers. Isn't it a surprise that we can bring the power of a supercomputer or a computer cluster to our personal computers?

Idea Behind HPC

Image 2

When we need to process big data, or to run a computationally expensive process on data of any size, we can program a solution that executes in parallel. If we have big data, we can divide it into sets of small data and process them in parallel. If we have a computationally expensive process, we can divide it into several sub-processes and execute them in parallel, sharing the data among them. Parallel execution can greatly reduce the execution time of many applications.
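To make the idea concrete, below is a minimal C++ sketch (my own illustration, not from any particular HPC framework) that splits one large array into contiguous chunks and processes them on all available CPU cores. The `expensive` function is a hypothetical stand-in for any costly per-element computation.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for any computationally expensive per-element work.
double expensive(double x) {
    for (int i = 0; i < 1000; ++i)
        x = std::sin(x) + std::cos(x);
    return x;
}

// Each worker thread processes its own contiguous chunk of the big data set.
void processChunk(std::vector<double>& data, std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; ++i)
        data[i] = expensive(data[i]);
}

int main() {
    std::vector<double> data(1000000, 1.0);          // the "big data"
    const unsigned workers =
        std::max(1u, std::thread::hardware_concurrency());
    const std::size_t chunk = data.size() / workers;

    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        std::size_t begin = w * chunk;
        std::size_t end = (w + 1 == workers) ? data.size() : begin + chunk;
        pool.emplace_back(processChunk, std::ref(data), begin, end);
    }
    for (auto& t : pool)
        t.join();                                    // wait for every chunk
}
```

The same chunking pattern applies whether the workers are CPU threads, GPU thread blocks, or machines in a cluster; only the mechanics of launching them change.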

When Do We Need HPC?

People use HPC in many situations. The following are the major cases where it is used.

  1. Having big data to process (from a few gigabytes upward); for example, scientists regularly analyze large data sets in many areas, including meteorology and genomics.
  2. Having a computationally expensive process that must finish within a limited time; for example, real-time object detection in video requires filtering, feature detection and many other calculations that must be performed on every pixel of every frame.
  3. Having a task made up of many non-correlated (independent) processes that must finish within a limited time; for example, a Google search extracts a great deal of information from our keywords, and from the way we enter them, before returning the result.

At a glance, it may look as if HPC is not required for the software applications we use in day-to-day life. But we already use small-scale HPC in modern computer games, and we can benefit from it in many everyday tasks such as video processing, compression, video format conversion, image processing, business data analysis, simulation, 3D image rendering, photo editing, etc.

What Do We Need to Do HPC?

As I described before, typical HPC needs supercomputers or computer clusters. But we can do HPC in several ways without them:

  • Using GPGPU (General-Purpose Computing on Graphics Processing Units)
  • Using co-processor devices
  • Using a multicore CPU
  • Using the SIMD (Single Instruction Multiple Data) instruction set of the CPU (a short example follows this list)
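As a taste of that last item, here is a minimal SIMD sketch in C++, assuming an x86 CPU with SSE support (the function and its loop structure are illustrative only). A single SSE instruction adds four floats at a time instead of one.

```cpp
#include <immintrin.h>

// Add two float arrays element-wise, four elements per instruction.
void addArrays(const float* a, const float* b, float* out, int n) {
    int i = 0;
    for (; i + 4 <= n; i += 4) {                     // 4 lanes per instruction
        __m128 va = _mm_loadu_ps(a + i);             // load 4 floats from a
        __m128 vb = _mm_loadu_ps(b + i);             // load 4 floats from b
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));  // 4 additions at once
    }
    for (; i < n; ++i)                               // scalar tail for leftovers
        out[i] = a[i] + b[i];
}
```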

Besides the hardware mentioned above, we need to train our minds to think in a new way about programming. It is no surprise that we are all used to sequential programming; but it is time to start thinking in terms of parallel programming. It is not easy to write a program that executes as a combination of a large number of parallel processes. Let's look at the difference between the two thinking patterns.

Image 3

The above image shows two ways of calculating the sum of all the elements in an array, and how we can reduce the depth of the process by executing it in parallel. The depth is proportional to the execution time, so reducing the depth decreases the execution time.
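Here is a small C++ sketch (my own, not from the image) of that pairwise tree scheme. It runs the passes sequentially, but every addition within a pass is independent, so each pass could be executed in parallel, giving a depth of about log2(n) passes instead of n - 1 sequential additions.

```cpp
#include <cstddef>
#include <vector>

// Pairwise (tree) summation: each pass halves the number of partial sums.
double treeSum(std::vector<double> v) {
    while (v.size() > 1) {
        std::size_t half = (v.size() + 1) / 2;
        for (std::size_t i = 0; i + half < v.size(); ++i)
            v[i] += v[i + half];   // all adds in this pass are independent
        v.resize(half);            // one "level" of the tree is done
    }
    return v.empty() ? 0.0 : v[0];
}
```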

However, most real-world problems do not fit the simple model above. They may have multiple dimensions and multiple steps, and most of the time those steps must be executed one after another in a strict order. The parallel implementation therefore depends on the nature of the problem.

There are well-known techniques for optimizing parallel implementations of certain problems, and there is much ongoing research into optimizing more general classes of problems.

The main barrier is not knowledge, since there are several places where we can learn about HPC; it is that we need one or more of the hardware features listed above. I will explain those features in the next sections.

GPGPU (General-Purpose Computing on Graphics Processing Unit)

If you are a gamer or a graphics designer, you may already know a lot about modern graphics cards and their capabilities, and this may be the best option if you are concerned about money. Many modern graphics cards can handle not only graphics workloads but also general computing. Popular vendors such as ATI and NVIDIA have introduced their own technologies to bring high performance computing to ordinary PCs.

Image 4

A few programming languages are popular for general computing on the GPGPU. OpenCL is currently the dominant open general-purpose GPU computing language. The dominant proprietary framework is NVIDIA's CUDA, which works only with NVIDIA hardware.
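To give a flavor of what GPGPU code looks like, here is a minimal OpenCL kernel embedded in a C++ source file as a string, the way OpenCL host programs usually carry device code. The kernel and its name are illustrative, and all the host-side setup (choosing a device, building the program, launching the kernel) is omitted.

```cpp
// OpenCL device code is C-like source that the host program compiles at run
// time. Each of the thousands of GPU threads handles one array element.
const char* kernelSource = R"CLC(
__kernel void vectorAdd(__global const float* a,
                        __global const float* b,
                        __global float* out)
{
    int i = get_global_id(0);   // this thread's element index
    out[i] = a[i] + b[i];       // thousands of these run in parallel
}
)CLC";
```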

I use a few applications that can take advantage of NVIDIA CUDA. "Any Video Converter" is a video format converter capable of running its computationally heavy conversion algorithms on the GPGPU. The following image shows the software advertising its support for NVIDIA CUDA technology. I have experienced a good speed gain for X264/H264 (MPEG-4) video conversion even with my NVIDIA GT210.

Image 5

With NVIDIA CUDA technology, the conversion can run a few times faster than it does the conventional way.

CyberLink PowerDirector is a video editing application, and it also has the capability of using NVIDIA CUDA. You can see the NVIDIA CUDA logo alongside PowerDirector.

Image 6

Even Adobe Photoshop can utilize your GPU.

Image 7

Not only these, but almost all professional-level video editing applications can utilize a GPGPU if one is available in your computer. If you need to speed up this kind of software, it might be a better investment to look for a good GPGPU card than for more RAM or a CPU with more cores.

Whenever you use this kind of software, check whether it has these capabilities; if it does, you can get the maximum benefit out of your HPC-capable graphics card.
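If you want to check programmatically, the standard OpenCL API can enumerate the GPGPU-capable devices in a machine. The following is a minimal C++ sketch using only core OpenCL 1.x calls, with fixed-size arrays to keep it short.

```cpp
#include <CL/cl.h>
#include <cstdio>

int main() {
    // Query up to 16 OpenCL platforms (e.g., one per installed GPU driver).
    cl_platform_id platforms[16];
    cl_uint numPlatforms = 0;
    if (clGetPlatformIDs(16, platforms, &numPlatforms) != CL_SUCCESS)
        return 1;                                    // no OpenCL runtime found
    if (numPlatforms > 16) numPlatforms = 16;

    for (cl_uint p = 0; p < numPlatforms; ++p) {
        cl_device_id devices[16];
        cl_uint numDevices = 0;
        if (clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_GPU,
                           16, devices, &numDevices) != CL_SUCCESS)
            continue;                                // no GPU on this platform
        if (numDevices > 16) numDevices = 16;

        for (cl_uint d = 0; d < numDevices; ++d) {
            char name[256] = {0};
            clGetDeviceInfo(devices[d], CL_DEVICE_NAME,
                            sizeof(name), name, nullptr);
            std::printf("GPGPU-capable device: %s\n", name);
        }
    }
    return 0;
}
```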

Co-processor Devices

Normally, our computers already have co-processors built in. But here I am concerned with another kind of co-processor: an additional device that can be attached to the computer. These devices typically contain thousands of cores, which can handle thousands of processes in parallel. They are somewhat expensive compared to GPGPU cards, but with suitable software they can turn a PC into a supercomputer. The AMD Stream Processor (aka ATI FireStream) and NVIDIA Tesla are popular co-processors available to date.

Image 8

At a glance, the image above looks like an NVIDIA graphics card. But if you look carefully, you will not find a video output port; this device is made purely for general processing. A single device may contain more than a thousand cores, and if we need more processing power, we can scale up by combining several devices. The image below shows an array of Tesla cards fitted in a single machine.

Image 9

The secret of these devices is the principle that, for most applications, thousands of lightweight processors are better than a few heavyweight processors. The best real-world analogy I have heard for this principle is travel: if 100 people have to travel 100 km, they can use either 2-3 buses (analogous to costly, heavy-duty processors) or 50 scooters (analogous to low-cost, light-duty processors), and the 50 scooters will get everyone there faster than the buses.

To be continued...

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


