EPUB Viewer for Android with Text to Speech

dteviot

May 14, 2013

CPOL

This is an Android application that shows the basics of building an EPUB file viewer for Android.

Download source - 450.4 KB

Introduction

This application shows the basics of building an EPUB file viewer for Android.

The features of the viewer are:

A "list view" of the .epub files on the SD card.
A "list view" of the book's Table of Contents (ToC), with the ability to select a ToC entry, and jump to it.
Viewing an EPUB's contents.
Ability to set a bookmark and return to the bookmark.
Using Android's Text to Speech API to read the book aloud.

What this article will cover:

A summary of the EPUB file format, and the steps involved in reading it.
How to use the android.sax library to parse an EPUB file.
How to use the Android WebView to display HTML, including:
- Using Android 3.0's shouldInterceptRequest() to fetch the HTML to display.
- WebView differences between Android 2.3 and 3.0.
- Dealing with the WebView's caching.
- Getting the current scroll position of the HTML document.
- Restoring the scroll position when a document is reloaded.
- Adding "Fling" gesture handling to the WebView.
- Formatting URIs.
How to set up a chain of XMLFilters to process an XML document with a SAX parser.
Converting a SAX parser's output back into XML.
Using Parcelable to package data into an intent, for passing to an activity.
Using Android's Text to Speech API

Caution: as this project is intended to show the basics of viewing an EPUB file, I've made a number of simplifying assumptions:

All files are UTF-8; UTF-16 is not handled.
Language is English (only really relevant for Text to Speech)
The EPUB files are well formed. (No error handling in XML parsing.)
Only supporting EPUB 2.0 (and a limited set at that, e.g. minimal SVG support, not all manifest attributes.)

Using the code

If you don't know how to set up Eclipse and the Android SDK, go here for instructions.

Download the project, unzip it, and import it into Eclipse. It requires a minimum of Android 2.3.

EPUB File Format and Parsing an EPUB

Wikipedia provides a good description of the EPUB format, including links to the official documentation. For those of you who don't want to read it, here's the 30-second executive summary. An EPUB file is a ZIP file that contains HTML files and images (often in multiple folders) and a few XML files. The HTML (actually XHTML) files and images are the content of the book. The XML files are metadata, covering things like:

Information on the book itself, e.g. Title, Author(s), Publisher, etc.
Details of the HTML files, e.g. the format of each file, the order to read them in, which file corresponds to each chapter of the book, etc.
The table of contents.

As the EPUB file is a zip file, the first thing we need is a function that allows us to extract the files from a zip file. We can use the java.util.zip.ZipFile class in Android to do most of the work for us.

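import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

import android.util.Log;

public class Book {
    private ZipFile mZip;

    public Book(String fileName) {
        try {
            mZip = new ZipFile(fileName);
        } catch (IOException e) {
            Log.e(Globals.TAG, "Error opening file", e);
        }
    }

    // Returns a stream for the named entry, or null if the zip could not be
    // opened or has no such entry. The caller is responsible for closing it.
    public InputStream fetchFromZip(String fileName) {
        InputStream in = null;
        ZipEntry containerEntry = (mZip == null) ? null : mZip.getEntry(fileName);
        if (containerEntry != null) {
            try {
                in = mZip.getInputStream(containerEntry);
            } catch (IOException e) {
                Log.e(Globals.TAG, "Error reading zip file " + fileName, e);
            }
        }
        return in;
    }
}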
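For experimenting off-device (for example, in a desktop unit test), the same zip access can be sketched with plain JDK classes. This is a minimal sketch, not the project's code: the class name EpubZip and its two helpers are hypothetical, and it assumes entry text is UTF-8, matching the simplifying assumption above. It uses try-with-resources, so unlike the Book class it closes the archive after every call, which is simple but reopens the file each time.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public final class EpubZip {
    /** Names of all entries in the archive, in the order ZipFile reports them. */
    static List<String> listEntries(File epub) throws IOException {
        try (ZipFile zip = new ZipFile(epub)) {
            List<String> names = new ArrayList<>();
            for (ZipEntry entry : Collections.list(zip.entries())) {
                names.add(entry.getName());
            }
            return names;
        }
    }

    /** Reads one entry as UTF-8 text, or returns null if there is no such entry. */
    static String readEntry(File epub, String name) throws IOException {
        try (ZipFile zip = new ZipFile(epub)) {
            ZipEntry entry = zip.getEntry(name);
            if (entry == null) {
                return null;
            }
            try (InputStream in = zip.getInputStream(entry)) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buffer = new byte[4096];
                int n;
                while ((n = in.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
                return new String(out.toByteArray(), StandardCharsets.UTF_8);
            }
        }
    }
}
```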

We will also need to parse the XML files. Android has a number of ways to parse an XML file; IBM's DeveloperWorks has an excellent article on the options. We're going to use a SAX parser. The basic idea of the SAX approach is that you write a ContentHandler and plug it into the SAX pipeline. There is a certain amount of boilerplate involved in setting up a SAX pipeline, but the following helper function will handle it for us.

void parseXmlResource(String fileName, ContentHandler handler) {
    InputStream in = fetchFromZip(fileName);
    if (in != null) {
        try {
            try {
                // obtain XML Reader
                SAXParserFactory parseFactory = SAXParserFactory.newInstance();
                XMLReader reader = parseFactory.newSAXParser().getXMLReader();
                
                // connect reader to content handler
                reader.setContentHandler(handler);

                // process XML
                InputSource source = new InputSource(in);
                source.setEncoding("UTF-8");
                reader.parse(source);
            } finally {
                in.close();
            }
        } catch (ParserConfigurationException e) {
            Log.e(Globals.TAG, "Error setting up to parse XML file ", e);
        } catch (IOException e) {
            Log.e(Globals.TAG, "Error reading XML file ", e);
        } catch (SAXException e) {
            Log.e(Globals.TAG, "Error parsing XML file ", e);
        }
    }
}

The first step in parsing an EPUB file is to read the container.xml file for the location of the .opf file. From the EPUB specification, the container file is always called "container.xml" and must be in the folder "META-INF". A typical example is:

<?xml version="1.0" encoding="UTF-8" ?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>

To get the location of the .opf file, we look through the <rootfile> elements until we find one with a media-type of application/oebps-package+xml. The full-path attribute of this element gives us the name of the .opf file which holds most of the metadata needed to understand the contents of the EPUB file. We need to write a ContentHandler to do this, and write the .opf filename to mOpfFileName, a String variable of the Book class.

The traditional way to create a ContentHandler is to derive a class from DefaultHandler (which implements the ContentHandler interface), overriding the startElement() and/or endElement() functions (and possibly others) as your needs require. The difficulty with this pattern is that these functions are called for every element, regardless of type. Thus, the startElement() implementation needs to know how to parse each type of element it will encounter. Worse, it also has to keep track of where it is in the XML schema, so that it knows which element it is currently working on. With a non-trivial schema, this frequently results in complicated code, as the logic to parse the individual elements gets mixed up with the schema-tracking logic.
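To make that difficulty concrete, here is a minimal sketch of the traditional approach applied to container.xml, using the JDK's SAX classes rather than android.sax. The class name ContainerHandler is hypothetical. Even for this three-level schema, the boolean flags that track position surround the single line that actually extracts full-path. Note the explicit setNamespaceAware(true): a standard JAXP factory is not namespace aware by default, and without it the uri and localName arguments cannot be relied on.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class ContainerHandler extends DefaultHandler {
    static final String NS = "urn:oasis:names:tc:opendocument:xmlns:container";
    static final String OPF_MEDIA_TYPE = "application/oebps-package+xml";

    // Schema tracking: where are we in container/rootfiles/rootfile?
    private boolean inContainer = false;
    private boolean inRootfiles = false;
    String opfFileName;   // the result, or null if no matching rootfile

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (!NS.equals(uri)) return;
        if ("container".equals(localName)) {
            inContainer = true;
        } else if ("rootfiles".equals(localName) && inContainer) {
            inRootfiles = true;
        } else if ("rootfile".equals(localName) && inRootfiles
                && OPF_MEDIA_TYPE.equals(atts.getValue("", "media-type"))) {
            // the only line doing "real" work, buried in the tracking logic
            opfFileName = atts.getValue("", "full-path");
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (!NS.equals(uri)) return;
        if ("container".equals(localName)) {
            inContainer = false;
        } else if ("rootfiles".equals(localName)) {
            inRootfiles = false;
        }
    }

    static String findOpf(String containerXml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);   // so uri/localName are populated
        XMLReader reader = factory.newSAXParser().getXMLReader();
        ContainerHandler handler = new ContainerHandler();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(containerXml)));
        return handler.opfFileName;
    }
}
```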

A little thought suggests a better design for building parsers would separate the two concerns. Something like this:

Describe the relationship of the XML elements
- Start with the root element of the tree.
- List the expected child elements of the root node we have an interest in.
- For each child element, recursively list its child elements we have an interest in.
For each type of element we've defined, specify how to parse it by providing appropriate startElement() and/or endElement() logic.

The good news is that the android.sax package allows us to construct a ContentHandler in exactly this manner.

private static final String XML_NAMESPACE_CONTAINER = "urn:oasis:names:tc:opendocument:xmlns:container";

private String mOpfFileName;

private ContentHandler constructContainerFileParser() {
    // describe the relationship of the elements
    RootElement root = new RootElement(XML_NAMESPACE_CONTAINER,"container");
    Element rootfilesElement = root.getChild(XML_NAMESPACE_CONTAINER,"rootfiles");
    Element rootfileElement = rootfilesElement.getChild(XML_NAMESPACE_CONTAINER, "rootfile");

    // how to parse a rootFileElement 
    rootfileElement.setStartElementListener(new StartElementListener(){
        public void start(Attributes attributes) {
            String mediaType = attributes.getValue("media-type");
            if ((mediaType != null) && mediaType.equals("application/oebps-package+xml")) {
                mOpfFileName = attributes.getValue("full-path"); 
            }
        }
    });
    return root.getContentHandler();
}

As you can see from the above code, to use android.sax you:

Start by creating a RootElement.
Add child Elements to match the XML elements you're interested in. Note that you can leave out any elements you're not interested in.
For each element, use setStartElementListener(), setEndTextElementListener(), and/or setEndElementListener() to add the logic that processes/extracts the wanted portions of the element.
Finally, call getContentHandler() to package it all up into a ContentHandler that can be passed to an XMLReader.

Putting all the pieces together, we can obtain the name of the .opf file with the following code:

parseXmlResource("META-INF/container.xml", constructContainerFileParser());

The .opf file contains a manifest, a spine, and some other things we don't care about, namely the metadata and the guide. A .opf file looks something like this:

<?xml version="1.0"  encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="calibre_id">
    <metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
        <!-- assorted values here, which we don't care about, so I've deleted them -->
    </metadata>
    <manifest>
        <item id="id1" href="title_page.html" media-type="application/xhtml+xml"/>
        <item href="chapter1.html" id="id2.1" media-type="application/xhtml+xml"/>
        <item href="chapter2.html" id="id2.2" media-type="application/xhtml+xml"/>
        <item id="id3.1" href="stylesheet1.css" media-type="text/css"/>
        <item id="id3.2" href="stylesheet2.css" media-type="text/css"/>
        <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
        <item href="content/resources/_cover_.jpg" id="id4.1" media-type="image/jpeg"/>
    </manifest>
    <spine toc="ncx">
        <itemref idref="id1"/>
        <itemref idref="id2.1"/>
        <itemref idref="id2.2"/>
    </spine>
</package>

The manifest element holds a list of all the files contained in the zip file.

A manifest item's href attribute is the name (including path) of the file in the zip. Note that the path may be relative to the .opf file's location. The media-type attribute is the MIME type of the file. The id attribute links to the spine's idref attribute.

The spine provides two things: the reading order for the files in the manifest, and the table of contents file. The reading order is trivial: the items in the spine are in the correct order, so it's just a case of matching the idref attributes in the spine to the id attributes in the manifest. Obtaining the table of contents is almost as easy: the spine's toc attribute matches the id attribute of the manifest entry for the table of contents file. Given this, the content handler to parse a .opf file is:

private static final String XML_NAMESPACE_PACKAGE = "http://www.idpf.org/2007/opf";

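private HashMap<String, String> mManifestIndex = new HashMap<String, String>();      // id -> href
private HashMap<String, String> mManifestMediaTypes = new HashMap<String, String>(); // href -> media-type
private ArrayList<String> mSpine = new ArrayList<String>();                          // idrefs, in reading order
private String mTocName;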

private ContentHandler constructOpfFileParser() {
    // describe the relationship of the elements
    RootElement root = new RootElement(XML_NAMESPACE_PACKAGE, "package");
    Element manifest = root.getChild(XML_NAMESPACE_PACKAGE, "manifest");
    Element manifestItem = manifest.getChild(XML_NAMESPACE_PACKAGE, "item");
    Element spine = root.getChild(XML_NAMESPACE_PACKAGE, "spine");
    Element itemref = spine.getChild(XML_NAMESPACE_PACKAGE, "itemref");
    
    // build up list of files in book from manifest    
    manifestItem.setStartElementListener(new StartElementListener(){
        public void start(Attributes attributes) {
            String href = attributes.getValue("href");
            // href may be a relative path, so need to resolve
            // (FilenameUtils is from Apache Commons IO)
            href = FilenameUtils.concat(FilenameUtils.getPath(mOpfFileName), href);
            mManifestIndex.put(attributes.getValue("id"), href);
            mManifestMediaTypes.put(href, attributes.getValue("media-type"));
        }
    });

    // get name of Table of Contents file from the Spine
    spine.setStartElementListener(new StartElementListener(){
        public void start(Attributes attributes) {
            String toc = attributes.getValue("toc");
            // the manifest (parsed before the spine) maps the toc id to its resolved href
            mTocName = mManifestIndex.get(toc);
        }
    });
    
    // Build "spine", the files in the zip in reading order
    itemref.setStartElementListener(new StartElementListener(){
        public void start(Attributes attributes) {
            mSpine.add(attributes.getValue("idref"));
        }
    });
    return root.getContentHandler();
}
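Once the manifest and spine have been collected, the idref-to-id matching and the relative-path resolution need no XML at all. The following is a minimal sketch with a hypothetical SpineResolver class, taking raw (unresolved) manifest hrefs. It simply joins each href onto the .opf file's folder; unlike FilenameUtils.concat, it does not normalize "../" segments.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public final class SpineResolver {
    /** Joins a manifest href onto the folder holding the .opf file (no "../" normalization). */
    static String resolveHref(String opfPath, String href) {
        int slash = opfPath.lastIndexOf('/');
        return (slash < 0) ? href : opfPath.substring(0, slash + 1) + href;
    }

    /** Maps spine idrefs to zip entry names, preserving spine (reading) order. */
    static List<String> readingOrder(List<String> spineIdrefs,
                                     Map<String, String> manifestIdToHref,
                                     String opfPath) {
        List<String> files = new ArrayList<>();
        for (String idref : spineIdrefs) {
            String href = manifestIdToHref.get(idref);
            if (href != null) {              // skip idrefs with no manifest entry
                files.add(resolveHref(opfPath, href));
            }
        }
        return files;
    }

    /** The spine's toc attribute is a manifest id; returns the .ncx entry name, or null. */
    static String tocFile(String spineTocId, Map<String, String> manifestIdToHref, String opfPath) {
        String href = manifestIdToHref.get(spineTocId);
        return (href == null) ? null : resolveHref(opfPath, href);
    }
}
```

With the example .opf above stored at "OEBPS/content.opf", the spine resolves to OEBPS/title_page.html, OEBPS/chapter1.html, OEBPS/chapter2.html.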

The table of contents file (also known as a .ncx file) contains a hierarchical table of contents, along with assorted metadata. It provides the user of an e-book with a table of contents from which they can select an item and have the book jump to that position. An example of a .ncx file is as follows:

<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1" xml:lang="en">
    <head>
        <meta name="dtb:uid" content="ae60509a-b048-5f93-abd0-5333f347e4c1"/>
        <meta name="dtb:depth" content="3"/>
        <meta name="dtb:totalPageCount" content="0"/>
        <meta name="dtb:maxPageNumber" content="0"/>
    </head>
    <docTitle><text>Tax Guide</text></docTitle>
    <docAuthor><text>IRS</text></docAuthor>
    <navMap>
        <navPoint id="116f4d31-2b73-4fbd-85c4-8d437f6fccc1" playOrder="1">
            <navLabel><text>Volume 1</text></navLabel>
            <content src="volume1.html"/>
            <navPoint id="1563d3d9-33c5-472e-bcf4-587923f3137b" playOrder="2">
                <navLabel><text>Chapter 1</text></navLabel>
                <content src="volume1/chapter001.html"/>
            </navPoint>
            <navPoint id="1563d3d9-33c5-472e-bcf4-587923f3137d" playOrder="3">
                <navLabel><text>Chapter 2</text></navLabel>
                <content src="volume1/chapter002.html"/>
                <navPoint id="1563d3d9-33c5-472e-bcf4-587923f3137c" playOrder="4">
                    <navLabel><text>Section 1</text></navLabel>
                    <content src="volume1/chapter002.html#Section_1"/>
                </navPoint>
            </navPoint>
        </navPoint>
        <navPoint id="1563d3d9-33c5-472e-bcf4-587923f3137a" playOrder="5">
            <navLabel><text>Volume 2</text></navLabel>
            <content src="volume2.html"/>
        </navPoint>
    </navMap>
</ncx>

The major points to note are:

The table of content information is provided by the <navPoint> elements.
Each <navPoint> represents a Table of Contents item to show the user.
The <navPoint> elements are in order.
<navPoint> elements can be (but usually are not) nested.
The name of the item to show to the user is the text inside the <navLabel> element.
The src attribute of the <content> element says where the content is. If the attribute contains a hash '#' character, then the part to the left of the hash is the name of the file in the zip holding the content, and the part to the right of the hash is the position in the file where the content begins.
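Splitting a src value as described in the last point takes only a few lines. A minimal sketch, with a hypothetical NavSrc class (the project itself keeps the raw string in NavPoint's mContent):

```java
public final class NavSrc {
    final String file;      // zip entry holding the content (left of '#')
    final String fragment;  // position within the file (right of '#'), or null if absent

    NavSrc(String src) {
        int hash = src.indexOf('#');
        this.file = (hash < 0) ? src : src.substring(0, hash);
        this.fragment = (hash < 0) ? null : src.substring(hash + 1);
    }
}
```

For example, "volume1/chapter002.html#Section_1" gives the file "volume1/chapter002.html" and the fragment "Section_1".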

A navPoint can be represented by a Plain Old Java Object (POJO):

public class NavPoint {
    private String mNavLabel;
    private String mContent;
    
    public String getNavLabel() { return mNavLabel; }
    public String getContent() { return mContent; }

    public void setNavLabel(String navLabel) { mNavLabel = navLabel; }
    public void setContent(String content) { mContent = content; }
}

Using this class, we can parse the Table of Contents.

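private static final String XML_NAMESPACE_TABLE_OF_CONTENTS = "http://www.daisy.org/z3986/2005/ncx/";

private ArrayList<NavPoint> mNavPoints = new ArrayList<NavPoint>();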
private int mCurrentDepth = 0;
private int mSupportedDepth = 1;

// Used to fetch the last navPoint we're building
public NavPoint getLatestPoint() {
    return mNavPoints.get(mNavPoints.size() - 1);
}

private ContentHandler constructTocFileParser() {
    RootElement root = new RootElement(XML_NAMESPACE_TABLE_OF_CONTENTS, "ncx");
    Element navMap = root.getChild(XML_NAMESPACE_TABLE_OF_CONTENTS, "navMap");
    Element navPoint = navMap.getChild(XML_NAMESPACE_TABLE_OF_CONTENTS, "navPoint");
    AddNavPointToParser(navPoint);
    return root.getContentHandler();
}

// Build up code to parse a ToC NavPoint
private void AddNavPointToParser(final Element navPoint) {
    // describe the relationship of the elements
    Element navLabel = navPoint.getChild(XML_NAMESPACE_TABLE_OF_CONTENTS, "navLabel");
    Element text = navLabel.getChild(XML_NAMESPACE_TABLE_OF_CONTENTS, "text");
    Element content = navPoint.getChild(XML_NAMESPACE_TABLE_OF_CONTENTS, "content");
    
    navPoint.setStartElementListener(new StartElementListener(){
        public void start(Attributes attributes) {
            mNavPoints.add(new NavPoint());
            // extend parser to handle another level of nesting if required
            if (mSupportedDepth ==  ++mCurrentDepth) {
                Element child = navPoint.getChild(XML_NAMESPACE_TABLE_OF_CONTENTS,"navPoint");
                AddNavPointToParser(child);
                ++mSupportedDepth;
            }
        }
    });
    text.setEndTextElementListener(new EndTextElementListener(){
        public void end(String body) {
            getLatestPoint().setNavLabel(body);
        }
    });
    content.setStartElementListener(new StartElementListener(){
        public void start(Attributes attributes) {
            getLatestPoint().setContent(attributes.getValue("src"));
        }
    });
    navPoint.setEndElementListener(new EndElementListener(){
        public void end() {
            --mCurrentDepth;
        }
    });
}

Note that, to cope with arbitrarily nested navPoints, the StartElementListener tracks the nesting level and, as required, adds another level to the ContentHandler while the XML is being parsed.
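For comparison, the same flattening can be sketched as a single DefaultHandler on the desktop JDK. This is not the project's code: NcxFlattener and its Entry class are hypothetical. Like the android.sax version, it relies on each navPoint's navLabel and content appearing before any nested navPoints, so the most recently added entry is the one being filled in. It also records each entry's depth, and it accumulates label text in a StringBuilder because a SAX parser may deliver an element's text across several characters() calls.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class NcxFlattener extends DefaultHandler {
    static final String NS = "http://www.daisy.org/z3986/2005/ncx/";

    /** One table-of-contents entry; depth 1 is a top-level navPoint. */
    static final class Entry {
        final int depth;
        String label = "";
        String src;
        Entry(int depth) { this.depth = depth; }
    }

    final List<Entry> entries = new ArrayList<>();   // in document order
    private int depth = 0;                           // number of currently open navPoints
    private StringBuilder text;                      // non-null while inside a navPoint's <text>

    private Entry latest() { return entries.get(entries.size() - 1); }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (!NS.equals(uri)) return;
        if ("navPoint".equals(localName)) {
            entries.add(new Entry(++depth));
        } else if ("text".equals(localName) && depth > 0) {
            text = new StringBuilder();              // docTitle/docAuthor text (depth 0) is skipped
        } else if ("content".equals(localName) && depth > 0) {
            latest().src = atts.getValue("", "src");
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (text != null) text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (!NS.equals(uri)) return;
        if ("navPoint".equals(localName)) {
            --depth;
        } else if ("text".equals(localName) && text != null) {
            latest().label = text.toString().trim();
            text = null;
        }
    }

    static List<Entry> parse(String ncxXml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        NcxFlattener handler = new NcxFlattener();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(ncxXml)));
        return handler.entries;
    }
}
```

Run on the example .ncx above, this yields Volume 1, Chapter 1, Chapter 2, Section 1, Volume 2 at depths 1, 2, 2, 3, 1.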

Providing a Table of Contents

[Screenshot: the chapters (table of contents) list]

The simplest way to provide a Table of Contents is to use a ListActivity, passing it mNavPoints. The list view shows the NavLabel as the text for each item. When an item is selected, the corresponding content is returned. Passing mNavPoints to the ListActivity is an interesting problem. Probably the easiest way is to make the NavPoint class parcelable, in which case we can add mNavPoints to the intent that launches the ListActivity with a single line of code.

showTocIntent.putParcelableArrayListExtra("CHAPTERS_EXTRA", mNavPoints);

Likewise, extracting mNavPoints from the bundle passed to ListActivity.onCreate() is also a single line of code.

mNavPoints = getIntent().getParcelableArrayListExtra("CHAPTERS_EXTRA");

The steps to make NavPoint parcelable are quite simple, and are given in Google's Android documentation. They are:

Have the class implement the android.os.Parcelable interface.
Add the boilerplate code provided in the Android instructions, replacing MyParcelable in the boilerplate with your class.
Implement writeToParcel() and the private constructor as appropriate for your class.

The rest of the steps involved in creating a ListActivity to display a table of contents from an array of NavPoints are trivial. You can look at this project's file ListChaptersActivity.java. For a more detailed explanation, I recommend this article.

Viewing the Content (on Android 3.0 and above)

The obvious way to view the HTML files in the zip would be to use the android.webkit.WebView class, as its whole purpose is to display HTML files. And the obvious way to use it is to extract the HTML files from the EPUB and send them to the WebView using loadDataWithBaseURL(String baseUrl, String data, String mimeType, String encoding, String historyUrl). Unfortunately, this doesn't work because the HTML files have links to other documents. For example:

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
    <head>
        <title>Cover</title>
        <link href="resources/stylesheet00.css" type="text/css" charset="UTF-8" rel="stylesheet"/>
    </head>
    <body>
        <img src="resources/cover.jpg" alt="cover" style="height: 100%"/>
    </body>
</html>

As you can see, this HTML file has both a stylesheet and an image. So, when WebView tries to show this document, it will try to obtain the stylesheet and JPEG. As these are both buried inside the EPUB's zip file, the WebView is unable to obtain them, and it will fail to show the document. If we're running on Android 3.0 (Honeycomb) or above, we can solve this problem by intercepting the calls the WebView makes to obtain the linked resources. This is done by setting the WebView's WebViewClient to a WebViewClient instance that overrides shouldInterceptRequest() to retrieve the desired file. E.g.:

public class EpubWebView extends WebView {

    public EpubWebView(Context context, AttributeSet attrs) {
        super(context, attrs);
        getSettings().setCacheMode(WebSettings.LOAD_NO_CACHE);
        setWebViewClient(new WebViewClient() {
            @Override
            public WebResourceResponse shouldInterceptRequest(WebView view, String url) {
                // WebView will now call onRequest when it wants a file loaded
                // we implement this function to fetch the file from the EPUB
                return onRequest(url);
            }
        });
    }

    // ... remainder of class omitted
}

However, there is still a small problem. If you look at onRequest() you will note that the requested file is given by a URL. You may also note that loadDataWithBaseURL() takes a URL parameter. Thus, we need a way to convert the file names in the zip file into URLs, and back. We also need to cope with the fact that the zip file may contain folders, and the links in the HTML files may be relative. E.g.:

Assume we wish to show the HTML file "content/title.xhtml". This file references the image "content/resource/cover.jpg". Then the <img> element in the HTML file will look something like <img src="resource/cover.jpg"/>.

Converting the file names into file URIs solves this problem. If "file:///content/title.xhtml" is passed in as the URL, then the WebView will ask for the JPEG with a URL of "file:///content/resource/cover.jpg". Converting file names into URIs is not just a case of tacking "file:///" onto the file name, as many of the characters that can appear in a file name need to be escaped. Fortunately, Android provides APIs to do most of the work.

/*
 * @param url used by WebView
 * @return resourceName used by zip file
 */
private static String url2ResourceName(Uri url) {
    // we only care about the path part of the URL
    String resourceName = url.getPath();
    
    // if path has a '/' prepended, strip it
    if (resourceName.charAt(0) == '/') {
        resourceName = resourceName.substring(1);
    }
    return resourceName;
}

/*
 * @param resourceName used by zip file
 * @return URL used by WebView 
 */
public static Uri resourceName2Url(String resourceName) {
    // pack resourceName into path section of a file URI
    // need to leave '/' chars in path, so WebView is aware
    // of path to current resource, so it can correctly resolve
    // path of any relative URLs in the current resource.
    return new Uri.Builder().scheme("file")
            .authority("")
            .appendEncodedPath(Uri.encode(resourceName, "/"))
            .build();
}
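The same conversion, and the relative resolution it makes possible, can be checked on a desktop JDK with java.net.URI in place of android.net.Uri. This is a sketch with hypothetical names (ResourceUris, resourceNameToUri, uriToResourceName), not the project's code. The five-argument URI constructor percent-encodes characters that are illegal in a path (such as spaces) but leaves '/' alone. The sketch converts back through getPath(), which decodes the escapes, rather than comparing whole URI strings.

```java
import java.net.URI;
import java.net.URISyntaxException;

public final class ResourceUris {
    /** Zip entry name -> file URI, keeping '/' so relative links resolve against it. */
    static URI resourceNameToUri(String resourceName) throws URISyntaxException {
        // An empty (non-null) authority makes the constructor emit "file:///..."
        return new URI("file", "", "/" + resourceName, null, null);
    }

    /** File URI -> zip entry name: the decoded path with its leading '/' stripped. */
    static String uriToResourceName(URI uri) {
        String path = uri.getPath();
        return path.startsWith("/") ? path.substring(1) : path;
    }
}
```

Usage: resolving "resource/cover.jpg" against the URI for "content/title page.xhtml" and converting back yields "content/resource/cover.jpg", which is the zip entry the WebView's request should map to.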

Given all the above, implementing onRequest(), in our Book class, is trivial.

public WebResourceResponse onRequest(String url) {
    String resourceName = url2ResourceName(Uri.parse(url));
    // constructor arguments are (mimeType, encoding, data)
    return new WebResourceResponse(
        mManifestMediaTypes.get(resourceName),
        "UTF-8",
        fetchFromZip(resourceName)
    );
}

At this point, calling loadDataWithBaseURL() is no longer required. Instead, call loadUrl(String url). However, clearCache(false) must be called before calling loadUrl(). This is because WebView caches resource requests, and when the WebView thinks a resource is cached, it will not call shouldInterceptRequest(). This can be a problem, because many EPUB books use the same name for their stylesheet (stylesheet.css) and cover page image (cover.jpg). Thus, if you view one EPUB book and then another, the WebView will use the cached resource from the previous book. It can be quite disorientating for the user to open an EPUB and get the title page of the previous book. The only workaround I've found is to explicitly clear the cache before calling loadUrl(). Setting the WebView's cache mode to LOAD_NO_CACHE does not work.

Viewing the Content on Android 2.3 (and below)

As you've probably guessed by now, Android 2.3 does not have shouldInterceptRequest(). The workaround is to rewrite the XHTML to put any resources inline and then pass the XHTML to the WebView by calling loadDataWithBaseURL(). To use our example XHTML from earlier:

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
    <head>
        <title>Cover</title>
        <link href="resources/stylesheet00.css" type="text/css" charset="UTF-8" rel="stylesheet"/>
    </head>
    <body>
        <img src="resources/cover.jpg" alt="cover" style="height: 100%"/>
    </body>
</html>

This has two sorts of links: a stylesheet and an image. Conceptually, a stylesheet link is easy to remove. The steps are:

Find the stylesheet link(s). In this case, there's just one: <link href="resources/stylesheet00.css" type="text/css" charset="UTF-8" rel="stylesheet"/>
Find the referred stylesheet file ("resources/stylesheet00.css") in the zip.
Fetch the contents of the stylesheet, e.g. .bold {font-weight: bold}
Replace the link in the XHTML with a <style> element holding the stylesheet's contents, e.g. <style>.bold {font-weight: bold}</style>
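The four steps above map naturally onto an XMLFilterImpl stage of the kind discussed later in this article. A minimal desktop-JDK sketch, assuming the stylesheets have already been read from the zip into a map keyed by the href exactly as written in the XHTML (a fuller version would first resolve the href against the document's own path). InlineStylesheetFilter is a hypothetical name, not the project's class, and trace() is only a test harness that prints element and text events.

```java
import java.io.StringReader;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

public class InlineStylesheetFilter extends XMLFilterImpl {
    private final Map<String, String> stylesheets;  // href -> CSS text (stand-in for the zip)
    private boolean swallowLinkEnd = false;

    public InlineStylesheetFilter(Map<String, String> stylesheets) {
        this.stylesheets = stylesheets;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        String css = null;
        if ("link".equals(localName) && "stylesheet".equals(atts.getValue("", "rel"))) {
            css = stylesheets.get(atts.getValue("", "href"));
        }
        if (css == null) {                 // not a stylesheet link we can inline: pass through
            super.startElement(uri, localName, qName, atts);
            return;
        }
        // Replace <link .../> with <style type="text/css">css</style>
        AttributesImpl styleAtts = new AttributesImpl();
        styleAtts.addAttribute("", "type", "type", "CDATA", "text/css");
        super.startElement(uri, "style", "style", styleAtts);
        super.characters(css.toCharArray(), 0, css.length());
        super.endElement(uri, "style", "style");
        swallowLinkEnd = true;             // <link> is empty, so its end event comes next
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (swallowLinkEnd && "link".equals(localName)) {
            swallowLinkEnd = false;
            return;
        }
        super.endElement(uri, localName, qName);
    }

    /** Test harness: runs the filter and returns a compact trace of element/text events. */
    static String trace(String xml, Map<String, String> stylesheets) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        InlineStylesheetFilter filter = new InlineStylesheetFilter(stylesheets);
        filter.setParent(reader);
        StringBuilder sb = new StringBuilder();
        filter.setContentHandler(new DefaultHandler() {
            @Override public void startElement(String u, String l, String q, Attributes a) { sb.append('<').append(l).append('>'); }
            @Override public void endElement(String u, String l, String q) { sb.append("</").append(l).append('>'); }
            @Override public void characters(char[] ch, int s, int n) { sb.append(ch, s, n); }
        });
        filter.parse(new InputSource(new StringReader(xml)));
        return sb.toString();
    }
}
```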

Removing the link in an image element is a similar, but slightly more difficult, process, because we can't just inject the JPEG's raw bytes into the XHTML. Instead, we need to pack the JPEG into a DataURI and put that into the XHTML, which I will get to in a moment.

The final kind of link I've seen in an EPUB is an SVG image. These look something like:

<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" version="1.1" 
         width="100%" height="100%" viewBox="0 0 300 400" preserveAspectRatio="xMidYMid meet">
    <image width="600" height="800" xlink:href="Cover.jpg"/>
</svg>

This is a problem because the Android 2.3 WebView does not have support for SVG elements. The solution I use is to convert these into <img> elements.
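A minimal desktop-JDK sketch of that conversion as a SAX filter: the <svg> wrapper and any other SVG content are dropped, and each SVG <image> inside it is replaced with an XHTML <img> whose src is the image's xlink:href. The class name SvgToImgFilter is hypothetical and the project's own filter may differ; in particular, this sketch ignores the width, height and viewBox sizing. The startTags() method is only a test harness.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

public class SvgToImgFilter extends XMLFilterImpl {
    static final String SVG_NS = "http://www.w3.org/2000/svg";
    static final String XLINK_NS = "http://www.w3.org/1999/xlink";
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    private int svgDepth = 0;   // > 0 while inside an <svg> element

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        if (SVG_NS.equals(uri) && "svg".equals(localName)) {
            ++svgDepth;                                   // swallow the <svg> wrapper
        } else if (svgDepth > 0) {
            if (SVG_NS.equals(uri) && "image".equals(localName)) {
                // emit <img src="..."/> in place of <image xlink:href="..."/>
                AttributesImpl img = new AttributesImpl();
                img.addAttribute("", "src", "src", "CDATA", atts.getValue(XLINK_NS, "href"));
                super.startElement(XHTML_NS, "img", "img", img);
                super.endElement(XHTML_NS, "img", "img");
            }
            // any other SVG content is dropped
        } else {
            super.startElement(uri, localName, qName, atts);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (SVG_NS.equals(uri) && "svg".equals(localName)) {
            --svgDepth;
        } else if (svgDepth == 0) {
            super.endElement(uri, localName, qName);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (svgDepth == 0) {
            super.characters(ch, start, length);
        }
    }

    /** Test harness: records each start tag as "name" or "name src=value". */
    static List<String> startTags(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        SvgToImgFilter filter = new SvgToImgFilter();
        filter.setParent(reader);
        List<String> tags = new ArrayList<>();
        filter.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String u, String l, String q, Attributes a) {
                String src = a.getValue("src");
                tags.add(src == null ? l : l + " src=" + src);
            }
        });
        filter.parse(new InputSource(new StringReader(xml)));
        return tags;
    }
}
```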

DataURI

Wikipedia has an excellent article on DataURIs. In short, the concept is to replace a link to a file with the base64-encoded contents of the file, e.g. replace the image tag from the example above:

<img src="resources/cover.jpg" alt="cover" style="height: 100%"/>

with something like:

<img src="data:image/jpeg;base64,iQU228cmaui98as..." alt="cover" style="height: 100%"/>

The "iQU228cmaui98as..." that I've shown here is just the start of a base64 sequence that usually runs to tens of thousands of characters. You will note that the DataURI includes the mime type of the file. Obtaining this does not present a problem, as the manifest includes this information, which we stored away in the "mManifestMediaTypes" member. So, we can retrieve a file as a DataURI with the following function:

public String fetchDataUri(String fileName) throws IOException {
    StringBuilder sb = new StringBuilder("data:");
    sb.append(mManifestMediaTypes.get(fileName));
    sb.append(";base64,");

    int buflen = 4096;
    byte[] buffer = new byte[buflen];
    int offset = 0;
    int len = 0;

    InputStream in = fetchFromZip(fileName);
    if (in == null) {
        throw new IOException("File not found in EPUB: " + fileName);
    }
    try {
        while (len != -1) {
            len = in.read(buffer, offset, buffer.length - offset);
            if (len != -1) {
                // must process a multiple of 3 bytes, so that padding chars are not emitted
                int total = offset + len;
                offset = total % 3;
                int bytesToProcess = total - offset;
                if (0 < bytesToProcess) {
                    sb.append(Base64.encodeToString(buffer, 0, bytesToProcess, Base64.NO_WRAP));
                }
                // shuffle unused bytes to start of array
                System.arraycopy(buffer, bytesToProcess, buffer, 0, offset);
            } else if (0 < offset) {
                // flush the last 1 or 2 bytes (the only place padding is emitted)
                sb.append(Base64.encodeToString(buffer, 0, offset, Base64.NO_WRAP));
            }
        }
    } finally {
        in.close();
    }
    return sb.toString();
}
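On a desktop JDK (Java 8+), the multiple-of-3 bookkeeping above can be delegated to java.util.Base64.Encoder.wrap(), which returns an OutputStream that encodes as bytes arrive and emits padding only when it is closed. This is a swapped-in technique for off-device use, not the project's code: java.util.Base64 is not available on Android 2.3, so the android.util.Base64 approach above is still needed there. DataUris.toDataUri is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public final class DataUris {
    /** Streams `in` into a "data:<mimeType>;base64,..." string. */
    static String toDataUri(String mimeType, InputStream in) throws IOException {
        ByteArrayOutputStream encoded = new ByteArrayOutputStream();
        // wrap() keeps any leftover 1-2 bytes between writes;
        // close() flushes them and adds the '=' padding.
        try (OutputStream b64 = Base64.getEncoder().wrap(encoded)) {
            byte[] buffer = new byte[4096];
            int len;
            while ((len = in.read(buffer)) != -1) {
                b64.write(buffer, 0, len);
            }
        }
        return "data:" + mimeType + ";base64,"
                + new String(encoded.toByteArray(), StandardCharsets.US_ASCII);
    }
}
```

For example, the four bytes 1, 2, 3, 4 with type image/jpeg give "data:image/jpeg;base64,AQIDBA==".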

Implementing an XML filter pipeline with XMLFilterImpl

So we're now at the stage where we want to take the XHTML file, run a series of conversions (e.g. turn any links into embedded resources) and feed the resulting XHTML into the WebView. How can we do this?

Well, so far we've been doing our processing using a ContentHandler hooked into an XMLReader. But for this XHTML conversion, writing a single ContentHandler to do the job would require a pretty complex ContentHandler. A better solution is to use the XMLFilterImpl class. This class allows us to build an XML processing "pipeline" of simpler processing stages, where the output of each stage is the input of the next. I.e. instead of this:

file -> XMLReader -> ContentHandler -> file

We have this:

file -> XMLReader -> InlineStyleSheetFilter -> SvgFilter -> InlineImageFilter -> XMLWriter -> file

Using XMLFilterImpl is pretty simple. It implements ContentHandler (as well as XMLFilter), so the basic steps are:

Build a ContentHandler much as you normally would, except derive from XMLFilterImpl.
In each ContentHandler function you override, call XMLFilterImpl's version of the function (via super) with the changed parameters.

For example, here's what a filter to convert <img> elements into DataURIs looks like:

public class InlineImageElementFilter extends XMLFilterImpl {

    @Override
    public void startElement(String namespaceURI, String localName, 
            String qualifiedName, Attributes attrs) throws SAXException {
        if (localName.equals("img")) {
            attrs = XmlUtil.replaceSrcAttributeValueWithDataUri(attrs);
        }
        super.startElement(namespaceURI, localName, qualifiedName, attrs);
    }
}

Hooking the filters together in a pipeline is also easy, although it's somewhat non-intuitive.

Start with the XMLReader.
Add filters by calling setParent() for the new filter, passing it the existing filter that will precede it in the pipeline.
Once the pipeline has been created, call parse() on the last filter added to the pipe.

E.g. to implement the pipeline:

XMLReader -> InlineStyleSheetFilter -> RemoveSvgFilter -> InlineImageFilter -> XMLWriter

The code would look like:

//create the filters
XMLReader     reader           = makeReader();
XMLFilterImpl stylesheetFilter = new InlineStyleSheetFilter(uri, getBook());
XMLFilterImpl svgFilter        = new RemoveSvgFilter();
XMLFilterImpl imageFilter      = new InlineImageFilter(uri, getBook());
XMLFilterImpl xmlWriter        = new XmlWriter(uri, getBook());

// build the pipeline
stylesheetFilter.setParent(reader);
svgFilter.setParent(stylesheetFilter);
imageFilter.setParent(svgFilter);
xmlWriter.setParent(imageFilter);

// run pipeline
xmlWriter.parse(source);
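A minimal desktop-JDK sketch of the same wiring, assuming Java 8+: two instances of a hypothetical AttributeRewriteFilter are chained with setParent(), and parse() is called on the last filter. The JDK has no XmlSerializer, so an identity TransformerHandler from javax.xml.transform stands in for the XmlWriter stage; because it is a plain ContentHandler rather than a filter, it is attached with setContentHandler(). The second stage (lower-casing <a href> values) is arbitrary and only there to show chaining, and the data URI is a placeholder.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.function.UnaryOperator;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

public class FilterPipeline {
    /** Rewrites one attribute on one element type; everything else passes through. */
    static class AttributeRewriteFilter extends XMLFilterImpl {
        private final String element;
        private final String attribute;
        private final UnaryOperator<String> rewrite;

        AttributeRewriteFilter(String element, String attribute, UnaryOperator<String> rewrite) {
            this.element = element;
            this.attribute = attribute;
            this.rewrite = rewrite;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            int i = atts.getIndex("", attribute);
            if (element.equals(localName) && i >= 0) {
                AttributesImpl copy = new AttributesImpl(atts);   // the parser's Attributes is read-only
                copy.setValue(i, rewrite.apply(atts.getValue(i)));
                atts = copy;
            }
            super.startElement(uri, localName, qName, atts);
        }
    }

    static String run(String xhtml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();

        // reader -> imageFilter -> linkFilter: each stage's parent is the stage before it
        XMLFilterImpl imageFilter = new AttributeRewriteFilter("img", "src", s -> "data:image/jpeg;base64,...");
        imageFilter.setParent(reader);
        XMLFilterImpl linkFilter = new AttributeRewriteFilter("a", "href", String::toLowerCase);
        linkFilter.setParent(imageFilter);

        // Identity transformer that serializes the SAX events it receives back to XML
        SAXTransformerFactory tf = (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler serializer = tf.newTransformerHandler();
        serializer.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        serializer.setResult(new StreamResult(out));

        linkFilter.setContentHandler(serializer);
        linkFilter.parse(new InputSource(new StringReader(xhtml)));   // parse() on the LAST stage
        return out.toString();
    }
}
```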

Producing XML output from an Android SAX parser

As some of you may be aware, ContentHandlers do not produce XML output, which raises the question: how do we produce it? Android provides the org.xmlpull.v1.XmlSerializer class for producing XML. Unfortunately, this class does not implement ContentHandler, so it can't just be plugged into a SAX pipeline.

However, its member functions are very similar to those of the ContentHandler; in fact, they're almost a one-to-one match. Thus, I introduce to you the XmlSerializerToXmlFilterAdapter, which takes an XmlSerializer and wraps it in an XMLFilterImpl so the XmlSerializer can be plugged into a pipeline. With the error handling stripped out, it looks like this:

public class XmlSerializerToXmlFilterAdapter extends XMLFilterImpl {
    XmlSerializer mSerializer;
 
    public XmlSerializerToXmlFilterAdapter(XmlSerializer serializer) {
        mSerializer = serializer;
    }
    
    @Override
    public void startElement(String namespaceURI, String localName, 
            String qualifiedName, Attributes attrs) throws SAXException {
        super.startElement(namespaceURI, localName, qualifiedName, attrs);
        mSerializer.startTag(namespaceURI, localName);
        for(int i = 0; i < attrs.getLength(); ++i) {
            mSerializer.attribute(attrs.getURI(i), 
                    attrs.getLocalName(i), attrs.getValue(i));    
        }
    }

    @Override
    public void endElement(String namespaceURI, String localName, 
            String qualifiedName) throws SAXException {
        super.endElement(namespaceURI, localName, qualifiedName);
        mSerializer.endTag(namespaceURI, localName);
    }
    
    //...
    // overrides for the other ContentHandler functions.
    // startDocument(), endDocument() etc.
}
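The same adapter idea can be tried on desktop Java, where Android's XmlSerializer isn't available. The sketch below swaps in the JDK's javax.xml.stream.XMLStreamWriter in place of XmlSerializer and shows the kind of error handling the listing above omits: the writer's checked XMLStreamException is wrapped in a SAXException. For brevity it ignores namespaces (it writes local names only, and namespace declarations are dropped), so it is a simplification rather than a drop-in equivalent of the author's class.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

public class StaxWriterFilter extends XMLFilterImpl {
    private final XMLStreamWriter writer;

    public StaxWriterFilter(XMLStreamWriter writer) {
        this.writer = writer;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) throws SAXException {
        super.startElement(uri, localName, qName, atts); // still pass the event downstream
        try {
            writer.writeStartElement(localName); // namespaces ignored in this sketch
            for (int i = 0; i < atts.getLength(); i++) {
                writer.writeAttribute(atts.getLocalName(i), atts.getValue(i));
            }
        } catch (XMLStreamException e) {
            throw new SAXException(e);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        super.endElement(uri, localName, qName);
        try {
            writer.writeEndElement();
        } catch (XMLStreamException e) {
            throw new SAXException(e);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        super.characters(ch, start, length);
        try {
            writer.writeCharacters(ch, start, length); // the writer escapes & and <
        } catch (XMLStreamException e) {
            throw new SAXException(e);
        }
    }

    @Override
    public void endDocument() throws SAXException {
        super.endDocument();
        try {
            writer.flush();
        } catch (XMLStreamException e) {
            throw new SAXException(e);
        }
    }

    // Parses xml and writes it straight back out through the adapter.
    public static String roundTrip(String xml) {
        try {
            StringWriter sink = new StringWriter();
            XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(sink);
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();

            StaxWriterFilter adapter = new StaxWriterFilter(writer);
            adapter.setParent(reader);
            adapter.parse(new InputSource(new StringReader(xml)));
            return sink.toString();
        } catch (ParserConfigurationException | SAXException | IOException | XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because it is itself an XMLFilterImpl, the adapter can sit at the end of a pipeline like the one built earlier, in the position the XmlWriter filter occupies.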

Supporting both 2.3 and 3.0 in one app

From the previous sections, it should be reasonably obvious that there are really only two differences between an Android 2.3 and 3.0 EPUB viewer.

In 3.0, the WebViewClient we create implements a shouldInterceptRequest() handler, which is not available in 2.3.
In 3.0, we can load content into the WebView by calling loadUrl() with the URI of the file to load. In 2.3, we have to fetch the XML and pre-process it, then pass the resulting XML to the WebView via loadDataWithBaseURL().

To produce a single app that works on both 2.3 and 3.0, the following steps are required.

In the AndroidManifest.xml, we set android:minSdkVersion="9" and android:targetSdkVersion="16" (API level 9 is Android 2.3; API level 16 is Android 4.1).
We take our WebView-derived class "EpubWebView" and give it two overridable methods, createWebClient() and loadUri().
We derive two classes from EpubWebView, one for Android 2.3, the other for Android 3.0. In each class we override these methods, using logic that is appropriate for the intended Android version.
We decorate the Android 3.0 class with "@TargetApi(Build.VERSION_CODES.HONEYCOMB)", so we don't get compiler warnings that we're using functions that are not available in 2.3.
In MainActivity's onCreate() we check the version of Android, create an appropriate EpubWebView and set it as the activity's view. E.g.:

@Override
protected void onCreate(Bundle savedInstanceState) {
    super.onCreate(savedInstanceState);
    EpubWebView epubWebView = null;
    // API levels 9 and 10 (GINGERBREAD, GINGERBREAD_MR1) are Android 2.3.x
    if (android.os.Build.VERSION.SDK_INT <= android.os.Build.VERSION_CODES.GINGERBREAD_MR1) {
        epubWebView = new EpubWebView23(this);
    } else {
        epubWebView = new EpubWebView30(this);
    }
    setContentView(epubWebView);
}

Obviously, we could have solved the problem by just using the Android 2.3 code, but that would have cost us the benefits of using the 3.0 features when they were available (such as SVG support).

We could also have solved the problem by having a single EpubWebView class, and inside the createWebClient() and loadUri() functions checking the Android version and executing the appropriate logic. This is not a good idea, as you wind up with OS-dependent code (and tests for the OS version) spread throughout the code. E.g. consider what EpubWebView would look like if there were, say, 10 differences between 2.3 and 3.0 we needed to allow for.
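The shape of this design can be shown without any Android types. In the sketch below, EpubView, EpubView23 and EpubView30 are hypothetical stand-ins for EpubWebView and its two subclasses, and their methods only log what the real code would do. The point it illustrates is that create() is the single place that looks at the version number; callers just call loadUri().

```java
import java.util.ArrayList;
import java.util.List;

public class VersionDispatchSketch {
    // Same value as android.os.Build.VERSION_CODES.GINGERBREAD_MR1 (API level 10).
    public static final int GINGERBREAD_MR1 = 10;

    // Hypothetical stand-in for EpubWebView: OS-specific behaviour lives behind
    // two overridable methods. Here they just record what the real code would do.
    public abstract static class EpubView {
        public final List<String> log = new ArrayList<>();
        protected abstract void createWebClient();
        public abstract void loadUri(String uri);
    }

    public static class EpubView23 extends EpubView {
        @Override protected void createWebClient() { log.add("plain WebViewClient"); }
        @Override public void loadUri(String uri) { log.add("pre-process, then loadDataWithBaseURL: " + uri); }
    }

    public static class EpubView30 extends EpubView {
        @Override protected void createWebClient() { log.add("WebViewClient with shouldInterceptRequest"); }
        @Override public void loadUri(String uri) { log.add("loadUrl: " + uri); }
    }

    // The one place that tests the version; everything else just calls EpubView methods.
    public static EpubView create(int sdkInt) {
        EpubView view = (sdkInt <= GINGERBREAD_MR1) ? new EpubView23() : new EpubView30();
        view.createWebClient();
        return view;
    }
}
```

Adding an eleventh difference then means adding one more overridable method to each subclass, rather than another version check somewhere in a shared class.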

Implementing a bookmark

A bookmark requires three things to be recorded.

The EPUB file being viewed.
The selected HTML file of the EPUB.
What part of the HTML is being viewed. This is needed because the HTML files can be many pages long, often an entire chapter.

Unfortunately, in Android, there's no official API to obtain the text being shown on the screen. But we can do something close. WebView's getContentHeight() function returns an int giving the viewing length of the HTML, and getScrollY() returns an int giving the position, within that length, of the text currently at the top of the screen. Dividing getScrollY() by getContentHeight() gives the ratio of how far through the current HTML the user is. We use a ratio because it allows us to handle font size changes and changes of screen orientation between landscape and portrait.
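The bookmark arithmetic is small enough to show in isolation. The sketch below is plain Java with no Android types. The Bookmark class and the method names are hypothetical, and the int arguments stand for what getScrollY() and getContentHeight() would return on the device.

```java
public class BookmarkSketch {
    // Hypothetical bookmark record holding the three things a bookmark needs.
    public static final class Bookmark {
        public final String epubPath;   // which EPUB file
        public final String href;       // which HTML file inside it
        public final float scrollRatio; // how far through that HTML file, 0.0 to 1.0

        public Bookmark(String epubPath, String href, float scrollRatio) {
            this.epubPath = epubPath;
            this.href = href;
            this.scrollRatio = scrollRatio;
        }
    }

    // getScrollY() / getContentHeight(), guarded against a page that has no height yet.
    public static float toRatio(int scrollY, int contentHeight) {
        if (contentHeight <= 0) {
            return 0f;
        }
        float ratio = (float) scrollY / contentHeight;
        return Math.max(0f, Math.min(1f, ratio)); // defensive clamp to 0..1
    }

    // The inverse: where to scroll once the (possibly different) content height is known.
    public static int toScrollY(float ratio, int contentHeight) {
        return (int) (contentHeight * ratio);
    }
}
```

For example, a bookmark taken at scrollY 1500 of a page whose content height is 6000 is stored as 0.25; if the page is 9000 tall after the user rotates the screen or enlarges the font, restoring it scrolls to 2250.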

To restore a bookmark, we scroll the WebView to the desired position by calling scrollTo() once the HTML is loaded. According to the Android docs, if you add an onPageFinished() handler to the WebViewClient, the handler will be called when an HTML file finishes loading. So the obvious solution is to add an onPageFinished() handler to the WebViewClient and have the handler call scrollTo(). Unfortunately, this doesn't work: the handler is called before the viewing length is calculated, so the scrollTo() call does nothing. We must wait until the viewing length has been calculated, which we can do by using a PictureListener. This is a little complicated, because the PictureListener gets called a lot and is very resource intensive, which is probably why it has been deprecated. The solution is to set the PictureListener only when we need it, i.e. when onPageFinished() is called, and to tear it down when we're done.

Thus, steps for restoring a bookmark are:

Set a flag to indicate we're restoring a bookmark. (This is needed because onPageFinished() is called every time an HTML file is loaded.)
Call loadUrl().
When the page is initially loaded, the OS calls onPageFinished(). In onPageFinished(), if the bookmark flag is set, register a PictureListener.
When the WebView finishes working out the page layout, it calls the PictureListener, which unregisters itself, gets the content height and calls scrollTo() to set the correct position.

The resulting code for handling the PictureListener is simple; the only point of interest is the use of @SuppressWarnings to stop the deprecation warnings. Note that setting an onPageFinished() handler is very similar to setting a shouldInterceptRequest() handler, which has already been shown.

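private boolean mRestoringBookmark; // set before calling loadUrl() when restoring a bookmark
private float mScrollY;              // bookmark position as a fraction (0.0 to 1.0) of getContentHeight()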

@SuppressWarnings("deprecation")
protected void onPageFinished() {
    if (mRestoringBookmark) {
        setPictureListener(mPictureListener);
        mRestoringBookmark = false;
    }
}

@SuppressWarnings("deprecation")
private PictureListener mPictureListener = new PictureListener() {
    @Override
    public void onNewPicture(WebView view, Picture picture) {
        // stop listening 
        setPictureListener(null);
        
        scrollTo(0, (int)(getContentHeight() * mScrollY));
    }
};

Implementing horizontal swipe to move between HTML files

This is almost identical to the method used in my Comic Book Viewer article. To detect flings, we override the WebView's onTouchEvent(), pass the MotionEvents on to a GestureDetector, and override GestureDetector.SimpleOnGestureListener.onFling(), which is called when a fling occurs. The only difference is that the WebView has its own gesture processing logic. To keep that logic working, onTouchEvent() must pass any MotionEvents that our GestureDetector doesn't use on to the WebView. E.g.:

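@Override
public boolean onTouchEvent(MotionEvent event) {
    // Give our GestureDetector the first look; anything it doesn't consume
    // goes to the WebView's own gesture handling (e.g. scrolling).
    return mGestureDetector.onTouchEvent(event) ||
         super.onTouchEvent(event);
}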

The fling logic is trivial. Because we have the HTML files in viewing order (from the spine), we just locate the current HTML file in the spine, find the next (or previous) HTML file, and call WebView's loadUrl() with that HTML file's URI.
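The spine lookup can be sketched as a pure function that an onFling() override could call. The helper name, the velocity threshold and the "mostly horizontal" test are assumptions for illustration; onFling() reports velocities in pixels per second, and a right-to-left swipe has a negative velocityX, which the sketch treats as "next".

```java
import java.util.List;

public class FlingNavigationSketch {
    // Assumed threshold (pixels per second): slower flings are ignored.
    static final float MIN_VELOCITY = 500f;

    // Hypothetical helper for onFling(): returns the href to load, or null to ignore the fling.
    // spine holds the HTML files in reading order.
    public static String hrefForFling(List<String> spine, String current, float velocityX, float velocityY) {
        if (Math.abs(velocityX) < MIN_VELOCITY || Math.abs(velocityX) < Math.abs(velocityY)) {
            return null; // too slow, or mostly vertical: leave it to the WebView's scrolling
        }
        int index = spine.indexOf(current);
        if (index < 0) {
            return null; // current file isn't in the spine
        }
        // Right-to-left swipe (negative velocityX) moves forward.
        int target = velocityX < 0 ? index + 1 : index - 1;
        if (target < 0 || target >= spine.size()) {
            return null; // already at the first or last file
        }
        return spine.get(target);
    }
}
```

In the viewer, a non-null result would be turned into a URI and passed to loadUrl() (or loadUri() on the 2.3 path).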

Using Android Text To Speech (TTS) API

The basic steps for doing text to speech in Android are:

Check if TTS is present. This requires launching an intent and checking the result in onActivityResult(). However, on Android 4.1 and above this check doesn't work. Since the 4.1 specification requires TTS to be present, skip the check there.
Create an instance of android.speech.tts.TextToSpeech, passing it an OnInitListener handler to call when the TextToSpeech is available.
When the handler is called, configure the TTS, e.g. set the language, the reading speed and an OnUtteranceCompletedListener.
You can now call speak(), passing in the text to speak.
Your OnUtteranceCompletedListener will be called when TTS finishes reading the text. At that point, call speak() with the next text to speak.
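The "speak, wait for completion, speak the next piece" loop can be sketched without Android. Here the Speaker interface is a hypothetical stand-in for TextToSpeech.speak(), and the naive sentence splitter is an assumption. Every chunk is given an utterance ID because, as far as I know, the completion listener is only called for utterances that were queued with one (TextToSpeech.Engine.KEY_PARAM_UTTERANCE_ID in the params map). The completion callback may also arrive on a background thread, so the queue's methods are synchronized.

```java
import java.util.ArrayList;
import java.util.List;

public class UtteranceQueueSketch {
    // Hypothetical stand-in for TextToSpeech.speak(), so the sketch runs without Android.
    public interface Speaker {
        void speak(String text, String utteranceId);
    }

    private final List<String> chunks;
    private final Speaker speaker;
    private int next = 0;

    public UtteranceQueueSketch(String text, Speaker speaker) {
        this.chunks = splitIntoSentences(text);
        this.speaker = speaker;
    }

    // Naive splitter (an assumption): break after . ! or ? followed by whitespace.
    static List<String> splitIntoSentences(String text) {
        List<String> result = new ArrayList<>();
        for (String s : text.trim().split("(?<=[.!?])\\s+")) {
            if (!s.isEmpty()) {
                result.add(s);
            }
        }
        return result;
    }

    // Call once TTS has been initialised and configured.
    public synchronized void start() {
        speakNext();
    }

    // Call from the utterance-completed listener.
    public synchronized void onUtteranceCompleted(String utteranceId) {
        speakNext();
    }

    private void speakNext() {
        if (next < chunks.size()) {
            speaker.speak(chunks.get(next), "utterance-" + next);
            next++;
        }
    }
}
```

With "One. Two? Three!" as input, start() speaks "One." and each completion callback then speaks the following sentence until none are left.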

I was planning to write more about TTS, but this article covers the details very well, and I can't think of anything more I can usefully add.

Extracting Text from XHTML

In order to do text to speech, we need the text. Thus, we need to extract the text from the XHTML.

To do this, we use, you guessed it, a ContentHandler. There's not really much to it. The text that is shown to a user is the inner text of the elements, so extracting the text is just collecting the results of the ContentHandler's characters() function. Beyond that, the only issues are:

None of the text in the <head> element is visible to the user, so ignore any text found there.
Add white space after the text in some elements, e.g. <h1>, to prevent the text from adjacent elements running together.

For the fine details, look at the file XhtmlToText.java in the attached source.
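The two rules above can be sketched with a desktop-Java DefaultHandler. This is a minimal illustration, not XhtmlToText.java: the class name and the set of elements that get a line break are assumptions, and it makes no attempt to skip things like <script> or <style> inside <body>.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Set;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class XhtmlTextSketch extends DefaultHandler {
    // Assumed set of elements whose text is followed by a line break.
    private static final Set<String> BREAK_AFTER =
            Set.of("p", "div", "li", "br", "h1", "h2", "h3", "h4", "h5", "h6");

    private final StringBuilder text = new StringBuilder();
    private int headDepth = 0; // > 0 while inside <head>, whose text is never shown

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (localName.equals("head")) {
            headDepth++;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (localName.equals("head")) {
            headDepth--;
        } else if (headDepth == 0 && BREAK_AFTER.contains(localName)) {
            text.append('\n'); // keeps text of adjacent elements from running together
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (headDepth == 0) {
            text.append(ch, start, length); // inner text the user actually sees
        }
    }

    public static String extract(String xhtml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // so localName is populated
            XhtmlTextSketch handler = new XhtmlTextSketch();
            factory.newSAXParser().parse(new InputSource(new StringReader(xhtml)), handler);
            return handler.text.toString();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For "<html><head><title>T</title></head><body><h1>One</h1><p>Two</p></body></html>" this returns "One\nTwo\n": the title is skipped and each block ends with a line break.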

Credits and Thanks

I'd like to thank:

Paul, Ross and Sean at Pharos Systems for their feedback and proofing of early drafts of this document.
Baen Books, for their permission to use excerpts from some of their EPUB books as examples. (Although in the end I wrote my own.)