
Android* Application Development and Optimization on the Intel® Atom™ Platform

9 Dec 2013 · CPOL · 16 min read
This paper introduces detailed methods for developing and porting an Android application on the Intel Atom platform, and discusses the best known methods for developing applications using the Android Native Development Kit (NDK) and optimizing performance.

This article is in the Product Showcase section for our sponsors at CodeProject. These articles are intended to provide you with information on products and services that we consider useful and of value to developers.

Visit Intel® Developer Zone for Android

Abstract

This paper introduces detailed methods for developing and porting an Android application on the Intel Atom platform, and discusses the best known methods for developing applications using the Android Native Development Kit (NDK) and optimizing performance. Android developers can use this document as a reference to build high quality applications for the Intel Architecture.

1. Android Application Classification

Android applications can be classified into two types as shown in Figure 1.

  • Dalvik applications, which contain Java* code that uses only the official Android SDK API, plus the necessary resource files such as XML and PNG, all compiled into an APK file.
  • Android NDK applications, which contain Java code and resource files as well as C/C++ source code and sometimes assembly code. All native code is compiled into a dynamic linked library (.so file) and then called from the Java code in the main program through the JNI mechanism.


Figure 1: Two types of Android applications

2. Android Native Development Kit

2.1 Introduction

The Android Native Development Kit (NDK) is a companion tool to the Android SDK. The NDK is a powerful tool for developing Android applications because it:

  • Builds performance-critical portions of your applications in native code. Java source code must be interpreted into machine language by a virtual machine, whereas native code is compiled and optimized into a binary before execution. With proper use of native code, you can build high performance features into your application, such as hardware video encoding and decoding, graphics processing, and arithmetic operations.
  • Reuses legacy native code. C/C++ code can be compiled into a dynamic library that is called from Java code through the JNI mechanism.

2.2 Tools Overview

During the development period, you can use the Intel® Hardware Accelerated Execution Manager (Intel® HAXM) to improve Android emulator performance. HAXM is a hardware-assisted virtualization engine (hypervisor) that uses Intel® Virtualization Technology (Intel® VT) to speed up Android application emulation on a host machine. In combination with the Android x86 emulator images provided by Intel and the official Android SDK Manager, HAXM provides a faster Android emulation experience on Intel VT-enabled systems. For more information about HAXM, visit: http://software.intel.com.

2.3 Installing HAXM

Use the Android SDK Manager to install HAXM (recommended), or install it manually by downloading the installer from Intel’s web site. If you want HAXM to update automatically, install it through the Android SDK Manager as shown in Figure 2. [1]


Figure 2: Install Intel HAXM using Android SDK Manager

You can also download an appropriate installation package from http://www.intel.com/software/android to your host platform, and then follow the step-by-step instructions to install it.

2.3.1 Set up HAXM

The Android x86 system image provided by Intel is required when running HAXM. You can download the system image using Android SDK Manager or manually download it from the Intel® Developer Zone website.

After the system image is installed successfully, the Intel® x86 Android emulator image is executed automatically through the “emulator-x86” binary provided with the Android SDK. The Android emulator is then accelerated by Intel VT, which speeds up your development process.

3. Developing and Porting NDK Applications for Intel Atom Architecture

3.1 Developing NDK Applications for Intel Atom Processor-Based Devices

After successfully installing the NDK, please take a few minutes to read the documents in the <ndk>/docs/ directory, especially OVERVIEW.html and CPU-X86.html, so that you understand the NDK mechanism and how to use it.

NDK application development can be divided into the five steps shown in Figure 3:


Figure 3: NDK Application Development Process

The hello-jni demo is used to illustrate these five steps. You can find the demo in the NDK Root\samples\hello-jni folder [5]. Hello-jni is a simple application included in the NDK that gets a string from a native method in a shared library and uses it in the application UI.

3.1.1. Create native code

Create a new Android project and place your native source code under <project>/jni/. The project content is shown in Figure 4. This demo includes a simple function in native code called Java_com_example_hellojni_HelloJni_stringFromJNI(). As shown in the source code, it returns a simple string from JNI.


Figure 4: Create Native Code
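
For reference, the body of that native function looks roughly like the following. This is a minimal sketch modeled on the hello-jni sample; the exact string and formatting in your copy of the NDK may differ slightly.

#include <jni.h>

/* The function name encodes the Java package, class, and method
   (com.example.hellojni.HelloJni.stringFromJNI) so the JNI runtime can
   locate it when the Java side declares the native method. */
jstring
Java_com_example_hellojni_HelloJni_stringFromJNI(JNIEnv* env, jobject thiz)
{
    /* Build a Java string from a C string and return it to the caller. */
    return (*env)->NewStringUTF(env, "Hello from JNI !");
}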

3.1.2 Create MakeFile ‘Android.mk’

Each NDK module is described in a makefile named Android.mk under the jni/ directory. NDK applications are built for the ARM platform by default; to build for the Intel Atom platform, add APP_ABI := x86 to the application-level makefile (Application.mk) or pass APP_ABI=x86 on the ndk-build command line.


Figure 5: Create MakeFile

3.1.3 Compile native code

Build the native code by running the 'ndk-build' script from the project's directory. The script is located in the top-level NDK directory. The result is shown in Figure 6.


Figure 6: Compiled native code

The build tools automatically copy the stripped shared libraries to the proper location in the application's project directory.

3.1.4 Call native code from Java

Once the shared library is built and deployed, you can call the function from the Java side. The code is shown in Figure 7. A public native method stringFromJNI() is declared in the Java code, and the shared library is loaded with System.loadLibrary().


Figure 7: Call native code from Java

3.1.5 Debug with GDB

If you want to debug the NDK application with GDB, the following conditions must be satisfied:

  • The NDK application is built with 'ndk-build'
  • The NDK application is set to ‘debuggable’ in its AndroidManifest.xml
  • The NDK application runs on Android 2.2 (or higher)
  • Only one target (device or emulator) is running
  • The directory containing adb is added to your PATH

Use the ndk-gdb command to debug the application. You can set breakpoints or step through the code to track how variable values change, as shown in Figure 8.


Figure 8: Debug NDK application with GDB

3.2 Porting Existing NDK Applications to Intel Atom Processor-Based Devices

In this section, it is assumed that you have an Android application for the ARM platform and that you need to port it before deploying it on the Intel Atom platform.

Porting Android applications to the Intel Atom platform is similar to the development process. The steps are shown in Figure 9.


Figure 9: Port Android applications to the Intel Atom platform

3.2.1 Port Dalvik applications

Dalvik applications can run on Intel Atom processor-based devices directly, although the user interface may need to be adjusted for the target device. For a high resolution device, such as a tablet with a 1280×800 or higher resolution, the default memory allocation may not meet the application's requirements and can prevent the application from launching. Increasing the default memory allocation is recommended for high resolution devices.

3.2.2 Port Android NDK applications

Porting NDK applications is a bit more complicated than porting Dalvik applications. All NDK applications can be divided into three types based on the following properties of the native code:

  • Consists of C/C++ code only that is not related to hardware
  • Uses a third-party dynamic linked library
  • Includes assembly code that is highly related to non-IA platforms

Native code that consists only of C/C++ code and is not related to hardware

  1. Recompile the native code so the application runs on the Intel Atom platform.
  2. Open the NDK project, locate the Application.mk file, add APP_ABI := armeabi armeabi-v7a x86, and rebuild the native code with ndk-build.
  3. If the project has no Application.mk file, build it with the ndk-build APP_ABI="armeabi armeabi-v7a x86" command instead.
  4. Package the application again so that it also supports x86 platforms.

If the native code uses a third-party dynamic linked library, the shared library must be recompiled into an x86 version for the Intel Atom platform.

If the native code includes assembly code that is tied to a non-IA platform, that code must be rewritten in IA assembly or in C/C++.
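
One practical way to handle the last case is to keep the platform-specific paths behind preprocessor guards and provide a portable C fallback, so the same source tree builds for both armeabi and x86 ABIs. The sketch below is illustrative only; the function and its contents are hypothetical.

/* Hypothetical example: platform-specific optimizations are selected at
   compile time, with plain C as the fallback for any other ABI. */
void scale_buffer(float* buf, int n, float factor)
{
    int i;
#if defined(__ARM_NEON__)
    /* armeabi-v7a build: a NEON implementation (intrinsics or assembly)
       would go here. */
    for (i = 0; i < n; i++) buf[i] *= factor;
#elif defined(__i386__) || defined(__x86_64__)
    /* x86 build for the Intel Atom platform: the code rewritten in C or
       SSE/SSSE3 intrinsics would go here. */
    for (i = 0; i < n; i++) buf[i] *= factor;
#else
    /* Portable C fallback for any other ABI. */
    for (i = 0; i < n; i++) buf[i] *= factor;
#endif
}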

4. Best Known Methods for Developing Native Code

4.1 Forced Memory Alignment

Because of differences among architectures, platforms, and compilers, the same data structure may have different sizes on different platforms. Without forced memory alignment, this inconsistency can cause loading errors when data written on one platform is read on another. [2]

The following example shows how the size of the same data structure can differ across platforms:

struct TestStruct {
    int mVar1;
    long long mVar2;
    int mVar3;
};

This is a simple struct with three variables: mVar1, mVar2, and mVar3.

mVar1 is an int and occupies 4 bytes.
mVar2 is a long long int and occupies 8 bytes.
mVar3 is also an int and occupies 4 bytes.

How much memory does this structure need on the ARM and Intel Atom platforms?

The size of the structure when compiled for the ARM and Intel Atom platforms with the default compiler switches is shown in Figure 10. ARM automatically aligns the 64-bit member on an 8-byte (double-word) boundary, so the structure occupies 24 bytes, while on x86 it occupies 16 bytes.


Figure 10: Memory allocated by default compile flags

The 8-byte (64-bit) mVar2 results in a different layout for TestStruct because ARM requires 8-byte alignment for 64-bit variables like mVar2. In most cases, this does not cause problems because building for x86 vs. ARM requires a full rebuild.

However, a size mismatch can occur if an application serializes classes or structures. For example, suppose an ARM application writes a TestStruct to a file. If you later load the data from that file on an x86 platform, the structure size in the application no longer matches the size stored in the file. Similar memory alignment issues can occur with network traffic that expects a specific memory layout.

The GCC compiler option “-malign-double” generates the same memory alignment on x86 and ARM.


Figure 11: Memory allocated when -malign-double flags are added
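
If you cannot rebuild every component with identical alignment flags, another option is to make the intended layout explicit in the source and verify it at compile time. The following sketch only illustrates that idea and is not part of the original sample; the fixed-width types, alignment attribute, and size check are assumptions.

#include <stdint.h>

/* Fixed-width types plus an explicit alignment attribute give the structure
   the same layout regardless of the compiler's default rules. */
struct PortableTestStruct {
    int32_t mVar1;
    int64_t mVar2 __attribute__((aligned(8)));  /* matches ARM's 8-byte rule */
    int32_t mVar3;
};

/* Compile-time check: the build fails if the layout ever drifts from the
   24-byte size that previously serialized data expects. */
typedef char portable_test_struct_size_check[(sizeof(struct PortableTestStruct) == 24) ? 1 : -1];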

4.2 Porting NEON* Instructions to SSE [3]

4.2.1 NEON

The ARM NEON* technology is used primarily for multimedia workloads, such as those in smartphone and HDTV applications. According to ARM documentation, this 128-bit SIMD engine, an extension to the ARM Cortex*-A series, delivers at least 3x the performance of the ARMv5 architecture and at least 2x that of its successor, ARMv6. For more information on NEON technology, visit: http://www.arm.com/products/processors/technologies/neon.php.

4.2.2 SSE: The Intel equivalent

SSE is the Streaming SIMD Extensions for Intel Architecture (IA). The Intel Atom processor currently supports SSSE3 (Supplemental Streaming SIMD Extensions 3) and earlier versions, but does not yet support SSE4.x. SSE is a 128-bit engine for packed floating-point data. The execution model began with the MMX technology, and SSE is essentially the newer generation that replaces the need for MMX. For more information, refer to the "Volume 1: Basic Architecture" section of the Intel 64 and IA-32 Architectures Software Developer’s Manuals; the SSE overview in section 5.5 describes the instructions for SSE, SSE2, SSE3, and SSSE3. These instructions move packed, precision-based floating-point values between XMM registers or between XMM registers and memory, and the XMM registers are intended as a replacement for the MMX registers.

4.2.3 NEON to SSE at the assembly level

While using the IA Software Developer Manual as a cross-reference for all the individual SSE(x) mnemonics, also take a look at various SSE assembly-level instructions located at: http://neilkemp.us/src/sse_tutorial/sse_tutorial.html. Use the table of contents to access code samples or peruse the background information.

Similarly, the following manual from ARM provides information about NEON and contains small assembly snippets in section 1.4, “Developing for NEON”: http://infocenter.arm.com/help/topic/com.arm.doc.dht0002a/DHT0002A_introducing_neon.pdf.

Key differences when comparing NEON and SSE assembly code:

  • Endian-ness. Intel supports only little-endian assembly, whereas ARM supports big- or little-endian order (ARM is bi-endian). In the code examples provided, the ARM code is little-endian, the same as Intel. Note: there may be some ARM compiler implications; for example, GCC for ARM provides the –mlittle-endian and –mbig-endian flags. For more information refer to http://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html.
  • Granularity. In the simple assembly examples referenced, compare the ADDPS instruction for SSE (Intel) with VADD.ix for NEON, where x is, for example, 8 or 16. Notice that the latter encodes the granularity of the data to be handled as part of the mnemonic itself.

Note: These differences are not all-inclusive. You may see other differences between NEON and SSE.

4.2.4 NEON to SSE at the C/C++ level

There are many API nuances that can arise when porting C/C++ NEON code to SSE. Keep in mind the assumption here is that inline assembly is not being used, but rather true C/C++ code. NEON also provides native C intrinsics; although these intrinsics are written in C, they cannot be executed on the Intel Atom platform and must be rewritten.
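
As a simple illustration of the kind of rewrite involved (this sketch is not taken from the original sample code), a four-wide float addition that would use the NEON intrinsics vld1q_f32() and vaddq_f32() on ARM can be expressed with SSE intrinsics on IA roughly as follows:

#include <xmmintrin.h>  /* SSE intrinsics; the ARM version would include <arm_neon.h> */

/* Add two arrays of four floats using 128-bit XMM registers. */
void add4(const float* a, const float* b, float* out)
{
    __m128 va = _mm_loadu_ps(a);             /* load 4 packed floats */
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));  /* out[i] = a[i] + b[i] */
}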

5. Optimizing Application Performance

5.1 Performance Tuning

During the coding process, use the following methods to optimize your application performance on the Intel Atom Platform.

5.1.1 Use Inline Functions Instead of Frequently Called Short Functions

Inline functions are best used for small functions such as accessing private data members. Short functions are sensitive to the overhead of function calls. Longer functions spend proportionately less time in the calling/returning sequence and benefit less from inlining. [4]

Inlining a function saves the overhead of:

  • Function calls (including parameter passing and placing the object's address on the stack)
  • Preservation of caller's stack frame
  • New stack-frame setup
  • Return-value communication
  • Old stack-frame restore
  • Return
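
A minimal sketch of the kind of function that benefits from inlining follows; the struct and accessor are hypothetical examples, not code from the sample application.

/* A short accessor like this is a good inlining candidate: the work done in
   its body is smaller than the call/return overhead listed above. */
struct Point { int x; int y; };

static inline int point_get_x(const struct Point* p)
{
    return p->x;  /* expanded in place at each call site, no function call */
}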

5.1.2 Use Float instead of Double

An FPU (floating-point unit) is the part of a computer system specially designed to carry out operations on floating-point numbers, such as addition, subtraction, multiplication, division, and square root. Some systems (particularly older, microcode-based architectures) can also perform various transcendental functions, such as exponential or trigonometric calculations, although current processors handle these with software library routines. In most modern general-purpose computer architectures, one or more FPUs are integrated with the CPU [6].

The Intel Atom processor includes an FPU. In most cases, using float instead of double speeds up the data computation and saves memory bandwidth on Intel Atom processor-based devices.
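
A small illustration of the practice follows; the function is hypothetical. Note the 'f' suffix on constants, which keeps the arithmetic in single precision instead of silently promoting it to double.

/* Single-precision version: half the memory traffic of a double array, and a
   128-bit SSE register holds 4 floats instead of 2 doubles. */
float average_f(const float* data, int n)
{
    float sum = 0.0f;  /* float literal, so the accumulation stays float */
    int i;
    for (i = 0; i < n; i++)
        sum += data[i];
    return sum / (float)n;
}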

5.1.3 Multi-thread coding

Multi-threaded code lets you take advantage of Intel® Hyper-Threading Technology on the Intel Atom processor to increase throughput and improve overall performance. For more information about multi-threading, refer to: http://www.intel.com/content/www/us/en/architecture-and-technology/hyper-threading/hyper-threading-technology.html.
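
As a minimal sketch of the idea (the worker function and data layout below are hypothetical), native code can split work across POSIX threads so that both hardware threads of a Hyper-Threading core stay busy:

#include <pthread.h>

struct WorkSlice { float* data; int begin; int end; };

/* Each thread doubles the values in its own half of the buffer. */
static void* worker(void* arg)
{
    struct WorkSlice* s = (struct WorkSlice*)arg;
    int i;
    for (i = s->begin; i < s->end; i++)
        s->data[i] *= 2.0f;
    return NULL;
}

void process_in_two_threads(float* data, int n)
{
    pthread_t helper;
    struct WorkSlice first  = { data, 0, n / 2 };
    struct WorkSlice second = { data, n / 2, n };

    pthread_create(&helper, NULL, worker, &first);  /* first half on a new thread */
    worker(&second);                                /* second half on this thread */
    pthread_join(helper, NULL);                     /* wait for the helper to finish */
}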

5.2 Building High Performance Applications with Compiler Flags

As you know, the native code in Android applications is built by GCC. But do you know GCC's default target? It is the Pentium® Pro processor: if you do not add any flags when compiling your native code, the generated binary is tuned to run best on a Pentium Pro. Most Android applications, however, run on the Intel Atom platform instead of a Pentium Pro, so adding flags that match your target platform is highly recommended. Because the Intel Atom processor supports SIMD extensions only up to SSSE3 (see section 4.2.2), the following flags are recommended when compiling for the Intel Atom platform:

-march=atom
-mtune=atom
-mssse3
-mfpmath=sse

For more information about compiler parameters, refer to: http://gcc.gnu.org/onlinedocs/gcc/i386-and-x86_002d64-Options.html

6. Conclusion

This document discusses how to develop and optimize Android applications on Intel Atom platforms, as well as how to develop and port NDK applications.

A summary of the key points:

  • Most Android applications can run on the Intel Atom platform directly. NDK applications need their native code recompiled; if the application includes assembly code, that portion of the code must be rewritten.
  • Make full use of IA (Intel Architecture) features to improve your Android application performance.
  • Add platform-specific compile switches to make the GCC build code more effective.

References

  1. http://software.intel.com/en-us/articles/installation-instructions-for-intel-hardware-accelerated-execution-manager-windows/
  2. http://software.intel.com/en-us/blogs/2011/08/18/understanding-x86-vs-arm-memory-alignment-on-android/
  3. http://software.intel.com/en-us/articles/ndk-android-application-porting-methodologies/
  4. http://msdn.microsoft.com/en-us/library/1w2887zk.aspx
  5. http://developer.android.com/sdk/ndk/index.html
  6. http://en.wikipedia.org/wiki/Floating-point_unit

About the Author

Dawei is an application engineer focused on mobile application enabling, including Android application development and optimization for x86 devices and HTML5 web application development. Dawei also has rich experience in mobile application UI and UX design.

Notices

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

UNLESS OTHERWISE AGREED IN WRITING BY INTEL, THE INTEL PRODUCTS ARE NOT DESIGNED NOR INTENDED FOR ANY APPLICATION IN WHICH THE FAILURE OF THE INTEL PRODUCT COULD CREATE A SITUATION WHERE PERSONAL INJURY OR DEATH MAY OCCUR.

Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information.

The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.

Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or go to: http://www.intel.com/design/literature.htm

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations, and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.

Any software source code reprinted in this document is furnished under a software license and may only be used or copied in accordance with the terms of that license.

Intel, Atom, Pentium, and the Intel logo are trademarks of Intel Corporation in the US and/or other countries.

Copyright © 2012 Intel Corporation. All rights reserved.

*Other names and brands may be claimed as the property of others.

**This sample source code is released under the Intel Sample Source Code License Agreement

Other Related Articles

To learn more about Intel tools for the Android developer, visit Intel® Developer Zone for Android.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

