
"Reader Mode" for Desktop Internet Browsers

3 Feb 2019 | CPOL | 4 min read
For a more customizable "reading mode" or "article view" in your browser when reading news and articles.

Introduction

This article is about a User JavaScript (User JS) script that enables desktop browsers to mimic the reading mode typically seen in ebook reader apps on mobile devices. It tries to display only the main content of a web page and remove all or most extraneous information from it. It is recommended for web pages that display news or articles. It works best on HTML5 pages, but it should perform all right on structurally well-organized pages that use plain old HTML.

Background

User JS scripts are also known as browser scripts and are similar to bookmarklets. They are often called GreaseMonkey scripts, after the Firefox add-on that lets browser users customize their favorite websites by injecting their own JavaScript on the client side.

I developed this script for inclusion in a namesake Android browser app that I developed. The app has special support for User CSS and User JS files. During development, I used a Firefox browser with a GreaseMonkey add-on, as coding and testing it on an Android device emulator is cumbersome. I use this version of the script when reading articles on my desktop or laptop using the Bamboo RSS reader add-on for Firefox.

Using the Code

My Android app requires the JavaScript code to be converted to bookmarklet style. So, much of the code is in that style. For the Firefox (desktop) browser, I added an event handler for the "DOMContentLoaded" event. It runs the book reader mode after a 5-second delay. This gives me enough time to click on links even on home pages of websites where this script is not expected to work very well.
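
For readers unfamiliar with the bookmarklet style mentioned above, here is a minimal sketch of how script source can be turned into a `javascript:` URL. The helper name `toBookmarklet` is hypothetical and is not part of the script or the app; how the app performs its own conversion is not shown in this article.

```javascript
// Hypothetical helper (not part of the original script): wraps script source
// in an immediately invoked function and prefixes the javascript: scheme.
function toBookmarklet(sSource) {
  // The wrapper keeps the script's variables out of the page's global scope
  var sWrapped = "(function(){" + sSource + "})();";
  // encodeURIComponent escapes spaces, braces, newlines and other characters
  // that are not safe inside a URL
  return "javascript:" + encodeURIComponent(sWrapped);
}

var sExample = toBookmarklet("alert('Reader mode');");
console.log(sExample);
```

The resulting string can be saved as the address of a bookmark. Because newlines are encoded rather than removed, line comments in the source keep working.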

Here is the GreaseMonkey JavaScript. If the code gets mangled when it is published, then the raw JavaScript source code text can be copied from its GitHub location.

JavaScript
// ==UserScript==
// @name        BookReaderView
// @namespace   com.vsubhash.js.BookReaderView
// @version     1
// @grant       none
// ==/UserScript==

if (subhash_browser_js == null) {
  var subhash_browser_js = {};
}

subhash_browser_js.book_reader_js = {
	sHtml: "", 
	bTitleFound: false,
	arYucks: [ "-ads", "_ads", "advert", "adcode", "adselect", "addthis", "alsoread", "comment", 
               "discuss", "email", "facebook", "float", "follow", "franchise", "googlead",  
               "hide_", "hidden", "hover", "jump", "lazy", "linkedin", "navig", "notifi", 
               "outbrain", "partner", "popular", "popup", "print", "reddit", "share", "sharing", 
               "short-url", "social", "sponsor", "sprite", "subscribe", "taboola", "trend", 
               "twitter", "url-short", "zipr" ],	
	
	createHeader: function() {
		subhash_browser_js.book_reader_js.sHtml = "<head>\n" +
			"	<meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\" />\n" +
			"	<title>" + document.title + "</title>\n" +
			"	<style>\n" +
			" a { border-bottom: 1px dotted navy; }\n" +
			" body { background-color: rgb(200,200,220); color: black; font-family: sans-serif;\n" +
            "        font-size: 0.5cm; margin: 1em auto; padding: 1em; max-width: 9in; }\n" +
			" code { font-family: monospace; }\n" +
			" h1 { text-align: center; border-bottom: 1px solid black; padding-bottom: 0.2em; }\n" +
			" a h1, a h2, a h3, a h4, a h5, a h6, h1 a, h2 a, h3 a, h4 a, h5 a, h6 a\n" +
            "       { color: black; border-bottom: 1px dotted black; }\n" +
			" pre, figure { margin: 1em auto; padding: 1em;  }\n" +		
			" img { display: block; margin: 1em auto; max-height: 40%; max-width: 40%; }\n" +
			" img[src*='.svg'] { display: none!important; }\n" +
			" figcaption { font-weight: bold; font-size: 0.8em; text-align: center; }\n" +
			" header, footer, aside, nav { display: none; }\n" +
			"	</style>\n" +
			"</head>\n" +
			"<body>\n";
	},	
	
	removeUnwantedTags: function() {
		var arrTagsToHide = [ "aside", "footer", "iframe", "nav", "noscript", "script"];
		for (var i = 0; i < arrTagsToHide.length; i++) {
			var arElementsToHide = document.getElementsByTagName(arrTagsToHide[i]);
			var j = arElementsToHide.length;
			while (j > 0) {
				arElementsToHide[j-1].parentNode.removeChild(arElementsToHide[j-1]);
				--j;
			}
		}
	},	
	
	addNoYuckiesStyle: function() {
		// Build one selector list that matches any class or id containing a "yuck" substring
		var arSelectors = [];
		for (var i = 0; i < subhash_browser_js.book_reader_js.arYucks.length; i++) {
			var sYuck = subhash_browser_js.book_reader_js.arYucks[i];
			arSelectors.push("*[class*=\"" + sYuck + "\"], *[id*=\"" + sYuck + "\"]");
		}
		// Append a style element rather than rewriting body.innerHTML, which
		// would make the browser re-parse the whole page
		var oStyle = document.createElement("style");
		oStyle.textContent = arSelectors.join(", ") + " { display: none!important; }";
		document.body.appendChild(oStyle);
		//console.error(oStyle.textContent);
	},	
	
	parseFiniteElement: function(aoEl) {
		//console.error("Finite tag: " + aoEl.tagName);
		subhash_browser_js.book_reader_js.isTitleTag(aoEl.tagName.toLowerCase());
		if (!subhash_browser_js.book_reader_js.bTitleFound || 
			!subhash_browser_js.book_reader_js.hasNoYuckiness(aoEl)) { return; }
	
		var sElTag = aoEl.tagName.toLowerCase();
		if (!subhash_browser_js.book_reader_js.isUsefulTag(sElTag)) { return; }
		//console.error(sElTag + " outed");
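		if (sElTag == "a" && aoEl.href) {
			// aoEl.href is the resolved absolute URL, so test the raw attribute
			// to detect same-page "#fragment" links
			var sHref = aoEl.getAttribute("href") || "";
			if (sHref.indexOf("#") == 0) {
				subhash_browser_js.book_reader_js.sHtml += aoEl.textContent;
			} else {
				subhash_browser_js.book_reader_js.sHtml += "<a href=\"" + sHref + "\">" + aoEl.textContent + "</a>";
			}
		} else if (sElTag == "abbr") {
			// Append the expansion only when a title attribute is present
			var sAbbrTitle = aoEl.getAttribute("title");
			subhash_browser_js.book_reader_js.sHtml += aoEl.textContent + 
            (sAbbrTitle ? " (" + sAbbrTitle + ") " : "") + "\n";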
		} else if ((sElTag == "b") || (sElTag == "em") || (sElTag == "strong")) {
			subhash_browser_js.book_reader_js.sHtml += "<b>" + aoEl.textContent + "</b>";
		} else if (sElTag == "br") {
			subhash_browser_js.book_reader_js.sHtml += "<br />";
		} else if ((sElTag == "cite") || (sElTag == "i") || (sElTag == "time")) {
			subhash_browser_js.book_reader_js.sHtml += "<i>" + aoEl.textContent + "</i>";
		} else if ((sElTag == "ins") || (sElTag == "kbd") || (sElTag == "mark") || (sElTag == "u")) {
			subhash_browser_js.book_reader_js.sHtml += "<u>" + aoEl.textContent + "</u>";
		} else if (sElTag == "img") {
			subhash_browser_js.book_reader_js.sHtml += "<img src=\"" + 
                                      aoEl.getAttribute("src") + "\" />";
		} else if ((sElTag == "s") || (sElTag == "strike")) {
			subhash_browser_js.book_reader_js.sHtml += "<s>" + aoEl.textContent + "</s>";
		} else if ((sElTag == "code") || (sElTag == "samp") || (sElTag == "var")) {
			subhash_browser_js.book_reader_js.sHtml += "<code>" + aoEl.textContent + "</code>";
		} else if ((sElTag == "sub")) {
			subhash_browser_js.book_reader_js.sHtml += "<sub>" + aoEl.textContent + "</sub>";
		} else if (sElTag == "sup") {
			subhash_browser_js.book_reader_js.sHtml += "<sup>" + aoEl.textContent + "</sup>";
		} else if ((sElTag == "label") || (sElTag == "span") || (sElTag == "wbr")) {
			subhash_browser_js.book_reader_js.sHtml += aoEl.textContent;  // ignore
	
	
		} else if ((sElTag == "h1") || (sElTag == "h2") || (sElTag == "h3") || 
							(sElTag == "h4") || (sElTag == "h5") || (sElTag == "h6") ||
							(sElTag == "figcaption") || (sElTag == "p")) {
			subhash_browser_js.book_reader_js.sHtml += "<" + sElTag + ">" + 
                             aoEl.textContent + "</" + sElTag + ">";
		}
	},
	
	isUsefulTag: function(asTag) {
		var arTags = [ "a", "b", "i", "s", "u", "abbr", "article", "br", "code", 
                       "cite", "em", "figure", "figcaption", "h1", "h2", "h3", "h4", "h5", 
                       "h6", "img", "ins", "kbd", "label", "li", "main", "mark", "ol", 
                       "p", "pre", "samp", "strike", "sub", "sup", "span", 
                       "strong", "time", "ul", "var", "wbr" ]; 
		for (var i = 0; i < arTags.length; i++) {
			if (asTag == arTags[i]) { 
				//console.error(asTag + " is valid");
				return(true);
			}
		}
		//console.error(asTag + " is not valid");
		return(false);
	},	
	
	isTitleTag: function(asTag) {
		if ((!subhash_browser_js.book_reader_js.bTitleFound) && 
            ((asTag == "h1") || (asTag == "h2") || (asTag == "h3"))) {
			subhash_browser_js.book_reader_js.bTitleFound = true;
			//console.error("found");
		}
		return(subhash_browser_js.book_reader_js.bTitleFound);
	},

	hasNoYuckiness: function(aoNode) {
		if (aoNode.className) {
			if (aoNode.className.indexOf) {
			  for (var i = 0; i < subhash_browser_js.book_reader_js.arYucks.length; i++) {
			  	if (aoNode.className.toLowerCase().indexOf(subhash_browser_js.book_reader_js.arYucks[i]) > -1) {
			  		//console.error("Yucky " + aoNode.className);
			  		return(false);
			  	} else {
			  		//console.error("Yucky no find " + subhash_browser_js.book_reader_js.arYucks[i]);
			  	}
			  }
			}
		}
	
		if (aoNode.getAttribute) {
			if (aoNode.getAttribute("id")) {
				if (aoNode.getAttribute("id").indexOf) {
					for (var i = 0; i < subhash_browser_js.book_reader_js.arYucks.length; i++) {
						if (aoNode.getAttribute("id").toLowerCase().indexOf(subhash_browser_js.book_reader_js.arYucks[i]) > -1) {
							//console.error("Yucky " + aoNode.getAttribute("id"));
							return(false);
						}
					}
				}
			}
		}

		return(true);
	},

	parseNode: function(aoNode) {
		var sTag = aoNode.nodeName.toLowerCase();
		//console.error("Node checking " + sTag);
		subhash_browser_js.book_reader_js.isTitleTag(sTag);
		// Remember whether an opening tag was written, so that a closing tag
		// is written only when it has a matching opening tag
		var bTagOpened = false;
		if (subhash_browser_js.book_reader_js.bTitleFound && 
				subhash_browser_js.book_reader_js.isUsefulTag(sTag) && 
				subhash_browser_js.book_reader_js.hasNoYuckiness(aoNode)) { 
			bTagOpened = true;
			if (sTag == "a" && (aoNode.href)) {
				subhash_browser_js.book_reader_js.sHtml += "<" + sTag + 
                                         " href=\"" + aoNode.href  + "\">"; 
			} else {
				subhash_browser_js.book_reader_js.sHtml += "<" + sTag + ">"; 
			}
		}
		for (var i = 0; i < aoNode.childNodes.length; i++) {
			var oNode = aoNode.childNodes[i];
			subhash_browser_js.book_reader_js.isTitleTag(oNode.nodeName.toLowerCase());
			if (oNode.nodeType == Node.ELEMENT_NODE) {
				subhash_browser_js.book_reader_js.parseElement(oNode);
			} else if (oNode.nodeType == Node.TEXT_NODE) {
				if (subhash_browser_js.book_reader_js.bTitleFound) { 
					subhash_browser_js.book_reader_js.sHtml += oNode.nodeValue;
				}
			}
		}
		if (bTagOpened) {
			subhash_browser_js.book_reader_js.sHtml += "</" + sTag + ">";
		}
		//console.error("Html is : " + subhash_browser_js.book_reader_js.sHtml);
	},
	
	parseElement: function(aoEl) {
		if (window.getComputedStyle(aoEl)) {
			if (window.getComputedStyle(aoEl).getPropertyValue("display") == "none") {
				try { console.error("Ignoring hidden element: " + 
                      aoEl.outerHTML.substr(0,300)); } catch (e) {}
		  	return;
			}
		}
		//console.error("Checking element " + aoEl.tagName);
		subhash_browser_js.book_reader_js.isTitleTag(aoEl.tagName.toLowerCase());
		if (aoEl.children.length > 0) {
			subhash_browser_js.book_reader_js.parseNode(aoEl);
		} else if (subhash_browser_js.book_reader_js.bTitleFound) {
			subhash_browser_js.book_reader_js.parseFiniteElement(aoEl);
		}
	},

	changeToReader: function() {
		try {
			subhash_browser_js.book_reader_js.addNoYuckiesStyle(); 
			subhash_browser_js.book_reader_js.removeUnwantedTags();
			subhash_browser_js.book_reader_js.createHeader();
			var	oEl = document.getElementsByTagName("body")[0];
			subhash_browser_js.book_reader_js.parseElement(oEl);
			subhash_browser_js.book_reader_js.sHtml += "</body>\n";
			document.getElementsByTagName("html")[0].innerHTML = subhash_browser_js.book_reader_js.sHtml;
		} catch (e) {
			console.error("Subhash Browser BRV Error: " + e);
		}
	},

	handle_DOMLoaded: function() {
		try {
			window.setTimeout(
				function() {
					subhash_browser_js.book_reader_js.changeToReader();
				}, 
				5*1000);
		} catch (e) {
			console.error("Subhash Browser BRV Error: " + e);
		}
	}
}

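// A user script may be injected after DOMContentLoaded has already fired,
// so check the document state before registering the listener
if (document.readyState == "loading") {
  document.addEventListener
    ("DOMContentLoaded", subhash_browser_js.book_reader_js.handle_DOMLoaded, false);
} else {
  subhash_browser_js.book_reader_js.handle_DOMLoaded();
}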
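
One thing to watch in the listing above: `textContent` and `nodeValue` are appended to `sHtml` as they are. If an article shows markup as text (for example, a code sample containing `<div>`), that text may be parsed as real tags when the string is assigned to `innerHTML`. A possible safeguard, sketched here with the hypothetical helper names `escapeHtml` and `buildParagraph` (neither is in the script), is to escape text before concatenating it.

```javascript
// Hypothetical helper (not in the original script): escapes the characters
// that have special meaning in HTML text and attribute values.
function escapeHtml(sText) {
  return String(sText)
    .replace(/&/g, "&amp;")   // must come first so later entities are not double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Illustration of how a paragraph might be emitted with escaped text,
// in the style of parseFiniteElement
function buildParagraph(sTextContent) {
  return "<p>" + escapeHtml(sTextContent) + "</p>";
}

console.log(buildParagraph("Use <div> & <span> sparingly"));
```

The same helper could be applied to `href` and `src` values before they are placed inside double-quoted attributes.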

Some Samples

Many websites serve HTML5 but they do not use the tags in a contextually meaningful way. This is how a Verge website article looked. It did not have many problems.

Image 1

Some websites put their H1 (title) tag inside the header tag instead of the article or main tag. Their reasoning is, obviously, the article heading should be in the header! Some websites have no use for h-tag hierarchy at all and everything is dumped in a DIV box styled with inline CSS. It is definitely not worth using this JavaScript on those sites.

Here is a randomly chosen CodeProject article page, incidentally written by me.

Image 2

I had to make a change to my original code because CodeProject places the entire article inside a form tag. The original version deletes form tags along with others such as script, footer and aside. Why does CodeProject put its article inside a form tag? Is it an ASP.NET thing?

The popular blogging tool WordPress is much worse. It takes the tags and categories that a blogger adds to a blog post and adds them, prefixed with "tag-" or "category-", to the class attribute of the div containing the post contents. So, if the blogger adds a category such as "social" to a blog post, the class name matches an entry in arYucks and this JavaScript will discard the entire post. Hence, the scope for failure can be high with some website software.
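
The WordPress problem can be reproduced without a browser. The sketch below applies the same substring test as `hasNoYuckiness` to a typical WordPress post class string, then shows one possible mitigation that skips `category-` and `tag-` tokens. The function names, the shortened `arYucks` list and the sample class string are assumptions made for illustration; the mitigation is not part of the published script.

```javascript
// Subset of the arYucks entries from the script, for illustration only
var arYucks = ["popular", "share", "social", "trend"];

// Same substring test that the script applies to class names
function matchesSubstring(sClassName) {
  var sLower = sClassName.toLowerCase();
  return arYucks.some(function (sYuck) {
    return sLower.indexOf(sYuck) > -1;
  });
}

// Hypothetical variant: drop WordPress-style "category-*" and "tag-*"
// tokens before running the substring test
function matchesIgnoringTaxonomy(sClassName) {
  var arTokens = sClassName.toLowerCase().split(/\s+/).filter(function (sToken) {
    return sToken !== "" && !/^(category|tag)-/.test(sToken);
  });
  return matchesSubstring(arTokens.join(" "));
}

// Assumed example of the class attribute on a WordPress post container
var sPostClass = "post-42 post type-post status-publish category-social-media tag-trends";

console.log(matchesSubstring(sPostClass));                // the whole post would be discarded
console.log(matchesIgnoringTaxonomy(sPostClass));         // the post survives
console.log(matchesIgnoringTaxonomy("post-share-bar"));   // a real share widget is still caught
```

The CSS rule generated by `addNoYuckiesStyle` uses the same kind of substring match (`class*=`), so a complete fix would need to change that rule as well.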

Here is an AP News article.

An AP News article in Book Reader mode

It seems rough in some places but that is how the web designers are presenting the content to search engines and accessibility applications. This is another reason why you should think carefully before choosing your content management system or server scripting technology.

Here is how it looks in my Android app.

Image 4

Do note that when you browse from a mobile device, the server might send a different version of the same page that is customized for the device. So, this JavaScript may not work in the same way as on the desktop. However, if the HTML tag organization of the mobile page is content-aware, then there should be no problem.

Points of Interest

The code eliminates an HTML element based on its tag, class name and ID attribute. To make the code customizable, it uses JavaScript arrays in which the tags, classes and IDs can be easily added or removed. The code works initially by discarding all tags until it finds the title. It assumes that the title might be in an H1, H2 or H3 tag. Then it parses the remaining tags, limiting itself to paragraphs, lists, text nodes, images and hyperlinks, and discarding everything else.
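
The "discard until the title is found" idea can be sketched on a flat list of items. This is a simplification under stated assumptions: the real script walks the DOM tree recursively, and the names `filterAfterTitle`, `arTitleTags`, `arKeepTags` and the sample data are hypothetical.

```javascript
var arTitleTags = ["h1", "h2", "h3"];
// Reduced list of tags to keep, for illustration only
var arKeepTags = ["a", "h1", "h2", "h3", "img", "li", "p"];

// Returns the items that appear at or after the first title tag
// and whose tag is in the keep list
function filterAfterTitle(arItems) {
  var bTitleFound = false;
  var arKept = [];
  for (var i = 0; i < arItems.length; i++) {
    var sTag = arItems[i].tag;
    if (!bTitleFound && arTitleTags.indexOf(sTag) > -1) {
      bTitleFound = true;   // everything before this point has been discarded
    }
    if (bTitleFound && arKeepTags.indexOf(sTag) > -1) {
      arKept.push(arItems[i]);
    }
  }
  return arKept;
}

var arPage = [
  { tag: "p", text: "Cookie notice" },
  { tag: "h1", text: "Article title" },
  { tag: "p", text: "First paragraph" },
  { tag: "form", text: "Newsletter sign-up" },
  { tag: "li", text: "A list item" }
];

var arResult = filterAfterTitle(arPage);
console.log(arResult.map(function (oItem) { return oItem.tag; }).join(","));
```

The paragraph before the title is dropped even though `p` is a kept tag, which is why pages that place the title late, or outside an H1 to H3 tag, lose content.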

This exercise made me look into the guts of popular websites and frankly, I am disgusted by the amount of useless junk that popular CMS software and JavaScript frameworks dump into a web page. Social media plugins are a major culprit. They not only increase the weight of the page but also block the loading of the contents. The server farms of the world would consume considerably less energy if there were not so much JavaScript inside web pages.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL).


Written By
Software Developer www.VSubhash.in
India
V. Subhash is an invisible Indian writer, programmer and illustrator. In 2020, he wrote one of the biggest jokebooks of all time and then ended up with over two dozen mostly non-fiction books including Linux Command-Line Tips & Tricks (first paperback to have syntax highlighting in colour), CommonMark Ready Reference (the first book on CommonMark), PC Hardware Explained, Cool Electronic Projects and How To Install Solar. His book Quick Start Guide to FFmpeg was published by Apress/SpringerNature in 2023. He wrote, illustrated, designed and produced all of his books using only open-source software. Subhash has programmed in more than a dozen languages (as varied as assembly, Java and Javascript); published software for desktop (NetCheck), mobile (Subhash Browser & RSS Reader) and web (TweetsToRSS); and designed several websites. As of 2024, he is working on a portable Javascript-free CMS using plain-jane PHP and SQLite. Subhash also occasionally writes for Open Source For You magazine and CodeProject.com.
