Mozilla Firefox CSS Parser Ported to C#

1 Jun 2015, MPL, 6 min read
C# CSS parser with support for all modern CSS features.

Introduction

The Alba.CsCss library enables you to parse a CSS file into a detailed object model. It has been ported to C# from Mozilla Firefox 22 code, originally written in C++, and supports all modern CSS features.

Implemented features:

  • Supports all modern CSS features, including media queries, flexbox properties, etc. Mozilla-specific extensions (properties prefixed with -moz-) are supported too.
  • Supports two compatibility modes for parsing: FullStandards (standards are strictly followed) and Quirks (some incorrect expressions are allowed, for example, length units can be omitted).
  • Property values are parsed into a detailed object model. For example, the shorthand background property is expanded into multiple background-* properties. Among them, background-image is represented by a list of values (corresponding to multiple backgrounds, as supported by CSS3); each value can be a gradient function or a URL; gradients, in turn, contain gradient stops and various properties like size and angle.
  • Error recovery follows standard CSS parsing rules. If an unsupported feature or invalid syntax (often caused by "CSS hacks") is encountered, parsing will skip to the next declaration and continue from there.
  • Detailed error logging: when syntax errors are encountered, warning messages are produced. The messages are logged into TraceSource and are fired as an event.
  • Can be installed as a NuGet package. Debugging symbols and sources are provided by the SymbolSource package.
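
The error recovery rule from the list above can be illustrated with a small stand-alone sketch. This is not the library's parser: DeclarationRecoverySketch and its Parse method are hypothetical names, the validity check is a deliberately crude regular expression, and the split on ';' ignores strings, parentheses and nested blocks, which a real CSS parser has to honor.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, NOT part of Alba.CsCss: illustrates declaration-level error recovery.
public static class DeclarationRecoverySketch
{
    // Assumption: a "valid" declaration is simply "identifier: non-empty value".
    // The real CSS grammar is far richer than this.
    private static readonly Regex Declaration = new Regex(
        @"^\s*(?<name>-?[a-zA-Z_][a-zA-Z0-9_-]*)\s*:\s*(?<value>\S.*?)\s*$",
        RegexOptions.Singleline);

    // Parses the contents of a declaration block (the text between '{' and '}').
    // Invalid declarations are skipped and reported through the warnings list.
    public static List<KeyValuePair<string, string>> Parse(string block, List<string> warnings)
    {
        var result = new List<KeyValuePair<string, string>>();
        // Simplification: every ';' is treated as the end of a declaration.
        foreach (string chunk in block.Split(';'))
        {
            if (chunk.Trim().Length == 0)
                continue; // empty declaration, e.g. a trailing ';'
            Match m = Declaration.Match(chunk);
            if (m.Success)
                result.Add(new KeyValuePair<string, string>(
                    m.Groups["name"].Value, m.Groups["value"].Value));
            else
                warnings.Add("Skipped invalid declaration: " + chunk.Trim());
        }
        return result;
    }
}
```

With the input "color: red; *zoom: 1; margin: 0", the sketch keeps color and margin and records one warning for the "star hack" declaration, instead of abandoning the rest of the block.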

Not implemented features (these features are on the ToDo list):

  • Different encodings. @charset rules are ignored. Users need to convert strings to Unicode before parsing.
  • Modification of a parsed style sheet and serialization back into string.
  • CSS Object Model. Like all "standard" DOM models, CSSOM is cumbersome and does not fit into C# coding standards, so the necessity of it is questionable.
  • Vendor-specific features of other browsers (properties prefixed with -webkit-, -ms-, -o- etc.) are ignored.
  • The library contains very few unit tests and has not been tested thoroughly. Unfortunately, tests from Firefox are written in JS and are too problematic to port to C#.

Background

What I actually needed was a way to extract all url() expressions from CSS files to download all images they link to. However, as you can see, I got a little carried away and ported a lot of code.

There's already a library for parsing CSS files written in C#, ExCSS. However, it stops parsing after some expressions, so it doesn't fit my needs. It is being rewritten to use ANTLR though, so we may yet see it support a full CSS feature set.

When choosing a library to port to C#, I had two options: code from Chrome and code from Firefox. The two major open-source browsers have support for all modern CSS features and are regularly updated as CSS standards evolve. Chrome relies on grammar generators, so porting to C# would require finding a tool with a similar feature set and rewriting the grammar and supplemental code. Firefox, on the other hand, uses a manually written LL(1) parser. I chose the latter, because I had fewer dependencies this way: only plain C++, no third-party tools or grammar languages.

As CSS is constantly evolving, I needed a way to keep the port up to date, without too much hassle, as new versions of the original code are released. Instead of manually rewriting the code, I chose to apply regular expressions to the C++ code inside T4 code generation templates. Because C++ and C# have a lot in common, much of the code required only minor modifications. Some constructs, such as member pointers and functions returning references, required significant modifications; fortunately, there were very few such cases. I was also in luck because Mozilla coding standards have very strict naming rules: arguments, fields, constants, static variables and enumeration members have different prefixes. Class naming rules are either missing or not strictly followed, but that was a minor issue.
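
To give a feel for the regular expression approach, here is a minimal sketch of an ordered list of rewrite rules applied to a C++ fragment. The rules and the CppToCsSketch class are invented for illustration and are not the rules used in the library's T4 templates; a purely textual rewrite like this would also touch string literals and comments.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical illustration, NOT the actual conversion rules from the project.
public static class CppToCsSketch
{
    // Ordered (pattern, replacement) pairs; each rule sees the output of the previous one.
    private static readonly string[][] Rules =
    {
        new[] { @"\bconst\s+(\w+)\s*&\s*", "$1 " }, // "const T& aArg" -> "T aArg"
        new[] { @"\bnullptr\b", "null" },            // null pointer literal
        new[] { @"->", "." },                        // pointer member access
        new[] { @"::", "." },                        // scope resolution
    };

    // Applies every rule in order to the given C++ fragment.
    public static string ToCSharp(string cpp)
    {
        string result = cpp;
        foreach (string[] rule in Rules)
            result = Regex.Replace(result, rule[0], rule[1]);
        return result;
    }
}
```

For example, CppToCsSketch.ToCSharp("mScanner->Next(aToken)") returns "mScanner.Next(aToken)". Strict prefixes such as a for arguments and m for fields are what make rules of this kind reasonably safe to write.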

Obviously, not all code can be converted (or is worth converting, for that matter) using regular expressions. Code which is unlikely to change often (CSS value types, CSS rule types) is written manually, as is all code which has no direct counterpart in C#, such as overloading of operator new to allocate additional memory for "unnamed" fields, or the use of unions.

Whether this method will work in the future to simplify porting of new versions of Firefox code, only time will tell. So far statistics are pretty good. For example, conversion of more than 10,000 lines from nsCSSParser.cpp to CssParser.conv.cs required only 400 lines of regular expressions (4%). Statistics for smaller files are less impressive though.

Using the Code

Parsing starts with creating an instance of the CssLoader class, optionally changing its Compatibility property, then calling its primary method, ParseSheet, which parses a string representing contents of a CSS file into a CssStyleSheet object.

The ParseSheet method accepts three arguments: the CSS string, the sheet URL (for logging purposes) and the base URL (for resolving url() expressions with relative URLs).

C#
CssStyleSheet css = new CssLoader().ParseSheet("h1, h2 { color: #123; }",
    new Uri("http://example.com/sheet.css"), new Uri("http://example.com/"));
Console.WriteLine(css.SheetUri); // http://example.com/sheet.css
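
The role of the base URL can be shown without the library at all. The sketch below uses only System.Uri from the .NET base class library to resolve a relative reference against a base; UrlResolutionSketch is a hypothetical name, and whether Alba.CsCss performs the resolution in exactly this way is an assumption.

```csharp
using System;

// Hypothetical helper, NOT part of Alba.CsCss: shows how a relative url()
// reference combines with a base URL using the standard System.Uri class.
public static class UrlResolutionSketch
{
    // Resolves a reference found in a style sheet against the given base URL.
    // An absolute reference is returned unchanged.
    public static Uri Resolve(Uri baseUri, string reference)
    {
        return new Uri(baseUri, reference);
    }
}
```

For instance, resolving "../img/bg.png" against http://example.com/css/ gives http://example.com/img/bg.png, while an absolute reference such as http://cdn.example.org/a.png is left as it is.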

The CssStyleSheet object exposes a Rules property holding CSS rules of different types: style rule, charset rule, media query rule, keyframes rule, etc. In this case, we are interested in the style rule. It can be obtained either by filtering Rules using the OfType<T>() LINQ method or through the StyleRules property (there are shortcuts like this for every rule type; one of the benefits of code generation). CssStyleRule contains a CssDeclaration with all properties. Data and ImportantData in the CssDeclaration are lists of property-value pairs.

CssValue represents a CSS value, which can be a single value, a list, a list of pairs, etc. Its type can be distinguished by its Unit property. To get a value of a specific type, use the String, Color, List, Uri and other properties.

C#
CssStyleSheet css = new CssLoader().ParseSheet("h1, h2 { color: #123; }",
    new Uri("http://example.com/sheet.css"), new Uri("http://example.com/"));
Console.WriteLine(css.SheetUri); // http://example.com/sheet.css
// Get color property (equivalent code)
Console.WriteLine(css.StyleRules.Single().Declaration
        .Color.Color.R); // 17
Console.WriteLine(css.Rules.OfType<CssStyleRule>().Single().Declaration
        .GetValue(CssProperty.Color).Color.R); // 17
// Get h1 selector
Console.WriteLine(css.StyleRules.Single().SelectorGroups.First().Selectors.Single().Tag);

A more useful example, extracting all URLs:

C#
List<string> uris = new CssLoader().GetUris(source).ToList();

This method relies only on the tokenizer, CssScanner. It will not check whether properties are correct, it will just find all url() expressions.
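
As a rough stand-alone illustration of the "find every url() expression, check nothing else" idea, the following sketch uses a regular expression instead of a tokenizer. It is not how CssScanner works: UrlScanSketch is a hypothetical name, and the pattern does not handle escapes or comments, so it would also report url() expressions inside commented-out CSS.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, NOT part of Alba.CsCss and not equivalent to CssScanner.
public static class UrlScanSketch
{
    // Matches url("..."), url('...') and url(unquoted); the URL lands in group "u".
    private static readonly Regex UrlPattern = new Regex(
        @"url\(\s*(?:""(?<u>[^""]*)""|'(?<u>[^']*)'|(?<u>[^'""\s)]+))\s*\)",
        RegexOptions.IgnoreCase);

    // Returns the raw (unresolved) contents of every url() expression, in source order.
    public static List<string> FindUrls(string css)
    {
        var urls = new List<string>();
        foreach (Match m in UrlPattern.Matches(css))
            urls.Add(m.Groups["u"].Value);
        return urls;
    }
}
```

Like the tokenizer-based method, this approach reports URLs from declarations that a browser would reject, which is the trade-off against the parser-based extraction.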

Alternatively, URLs can be extracted after parsing the file into a style sheet object. This method will most closely match behavior of web browsers: it will skip invalid properties, invalid expressions (like color: url(a) url(b)) etc.

C#
CssStyleSheet css = new CssLoader().ParseSheet(source, sheetUri, baseUri);
// Get rules of CssStyleRule type on all levels (including style rules inside media rules)
List<string> uris = css.AllStyleRules
    // Get property-value pairs, both non-important and important (marked with !important)
    .SelectMany(styleRule => styleRule.Declaration.AllData)
    // A property can be a list of values (background-image, for example, contains a list of URLs)
    .SelectMany(prop => prop.Value.Unit == CssUnit.List ? prop.Value.List : new[] { prop.Value })
    // Filter values of CssUrlValue type
    .Where(val => val.Unit == CssUnit.Url)
    // Get unresolved URLs (you can use Uri property to get resolved URLs)
    .Select(val => val.OriginalUri)
    .ToList();

Class Diagram

[Image 1: class diagram]

The diagram's file name in the sources is Alba.CsCss/Diagrams/CssStyleSheet.cd.

Installation

Note: .NET 3.5 or higher is required to build the project. .NET 4.5 or higher is required to build the complete solution (including tests and supplemental projects).

  • You can install NuGet package directly from Visual Studio (see more detailed instructions on NuGet.org) or using the package manager console:

    PM> Install-Package Alba.CsCss

  • You can build from sources on GitHub. To use the library, you will need to include only Alba.CsCss/Alba.CsCss.csproj in your solution.

Feedback Needed

I'm using only a small part of the library myself. There may be some bugs or missing features (like some values remaining internal). If you find the library useful, please tell me how you're going to use the library and what features you need most. The future of the library depends on your feedback.

History

  • 1.0.0.3 (2013-08-25): First NuGet release.

License

This article, along with any associated source code and files, is licensed under the Mozilla Public License 1.1 (MPL 1.1).


Written By
Software Developer
Russian Federation

C#, JavaScript, PHP developer.