C++ Replacement for getopt

2 Jan 2023, MIT license, 6 min read
Parser for command line options
In this article, you will find a simple C++ replacement for the traditional C getopt parser with most of the flexibility required by the POSIX standard.

Introduction

These days, command-line programs are not as popular as they used to be. However, from time to time, it's easier to write one than to build a GUI-based program. In the world of C, programmers have used the getopt function to parse command line arguments since time immemorial. Technically, getopt is not a C feature but a POSIX one, which is why it is not available if you are using Microsoft Visual C++.

The code shown below is a simple C++ replacement for the traditional C getopt parser with most of the flexibility required by the POSIX standard.

Sample Usage

Let's say we have to parse a command line like this:

testopt -y --param p1 p2 p3 -o 123 arg1 arg2 arg3

Below is the program that can do this:

C++
#include <mlib/options.h>
#include <iostream>

using namespace std;
using mlib::OptParser;

int main(int argc, char** argv)
{
  OptParser opt{
    "h|help \t show help message",
    "y| \t boolean flag",
    "n| \t another boolean flag",
    "p+param parameters \t one or more parameters",
    "o:option value \t optional value",
    "*stuff things \t option with zero or more arguments"
  };

This declares the parser object and defines the allowed options. The exact syntax for option definitions is described in the Defining Options section.

C++
int nonopt;
if (opt.parse (argc, argv, &nonopt) != 0)
{
  cout << "Syntax error. Usage:" <<endl;
  cout << opt.synopsis () << endl << "Where:" << opt.description () << endl;
  exit (1);
}

The opt.parse() function parses the command line arguments. nonopt is set to the index of the first non-option argument. For our sample command line, nonopt will be 8, the index of arg1 on the command line.

C++
  string par;
  if (opt.getopt ("param", par))
  {
    cout << "params:" << par << endl;
  }
}

The opt.getopt() function returns a non-zero value if an option is present on the command line. If the option can have arguments, they are returned as a string (par in our example), separated by a user-defined character (by default, the vertical bar '|').

C++
if (opt.hasopt ('y'))
  cout << "Yes option set" << endl;

The opt.hasopt() function returns true if an option is present on the command line.

Command Line Syntax Understood by OptParser

According to the POSIX standard, a command line has three parts: the command name, options with their arguments, and operands. The POSIX standard describes only short options, made of one character preceded by a hyphen (-); however, it is also common to have long options preceded by two hyphens (--). The arguments that follow the last option and its arguments are called operands. If one of the arguments is --, option processing stops at that point and all remaining arguments are considered operands.

Options can have zero, one, or several arguments. If an option has multiple arguments, the arguments end where the next option starts, at the -- argument, or at the end of the command line.

Short options can be combined behind a single hyphen, provided they don't have arguments (except possibly the last one). For instance, instead of writing:

command -a -b -c

one can write:

command -abc
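
The grouping rule can be sketched with a small helper. This is only an illustration, not OptParser's actual code: the name expand_short_group is made up for this example, and the sketch assumes that none of the grouped options takes an argument.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: split a group such as "-abc" into "-a", "-b", "-c".
// Long options ("--name"), the "--" terminator, a lone "-" and plain
// arguments are returned unchanged.
std::vector<std::string> expand_short_group(const std::string& arg)
{
  const bool is_group = arg.size() > 2 && arg[0] == '-' && arg[1] != '-';
  if (!is_group)
    return {arg};

  std::vector<std::string> out;
  for (std::size_t i = 1; i < arg.size(); ++i)
    out.push_back(std::string("-") + arg[i]);
  return out;
}
```

A real parser also has to decide what to do when the last option in the group accepts arguments; that case is left out here.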

Options can be repeated. In this case, if the option has arguments, all arguments are accumulated. For instance, the following two lines are equivalent:

command -a arg1 arg2 -b
command -a arg1 -b -a arg2
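
To see why the two lines are equivalent, here is a minimal sketch of the accumulation rule. It is not how OptParser is implemented: accumulate_args and OptionArgs are hypothetical names, and the sketch assumes every option accepts any number of arguments, whereas a real parser would consult the option's flag.

```cpp
#include <map>
#include <string>
#include <vector>

using OptionArgs = std::map<std::string, std::vector<std::string>>;

// Hypothetical helper: group every argument under the option that precedes
// it. Processing stops at "--"; arguments seen before the first option are
// ignored.
OptionArgs accumulate_args(const std::vector<std::string>& args)
{
  OptionArgs result;
  std::string current;             // option currently collecting arguments
  for (const std::string& a : args)
  {
    if (a == "--")
      break;                       // everything after "--" is an operand
    if (a.size() > 1 && a[0] == '-')
    {
      current = a;
      result[current];             // register the option even without arguments
    }
    else if (!current.empty())
      result[current].push_back(a);
  }
  return result;
}
```

Both sample command lines produce the same map: "-a" holds arg1 and arg2, and "-b" holds nothing.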

It is customary to describe command line syntax using a synopsis with some type of BNF notation like this:

command -a <arg_a> -b [<arg_b>...] -c|--clong <operand1> <operand2>

where optional arguments are enclosed in square brackets and alternatives are denoted by '|'. Arguments that can repeat one or more times are indicated by '...' (ellipsis).

Defining Options

Each valid option is described by a descriptor string with the following syntax:

[<short>] <flag> [<long>] [<spaces><parameter>] [\t<description>]

where:

  • <short> - a single character that is the short form of the option.
  • <flag> - one character that specifies the number of arguments that can follow the option:
    • '|' - no arguments
    • ':' - one required argument
    • '?' - one optional argument
    • '+' - one or more arguments
    • '*' - zero or more arguments
  • <long> - a string that specifies the long form of the option
  • <parameter> - a string that is used as parameter name in synopsis
  • <description> - a string used for option description.

Parameter name and description are separated by a tab (\t) character. Either the long or the short form of an option can be missing.
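
The descriptor syntax is easy to take apart by hand. The sketch below shows one way to do it; it is not the code used inside OptParser. OptionSpec, trim and parse_descriptor are illustrative names, and the sketch assumes that the long form never contains blanks and that blanks around the parameter name and description are not significant.

```cpp
#include <string>

// Fields of one option descriptor. All names here are illustrative.
struct OptionSpec
{
  char short_form = 0;       // 0 when the option has no short form
  char flag = 0;             // one of | : ? + *
  std::string long_form;     // empty when the option has no long form
  std::string parameter;     // parameter name shown in the synopsis
  std::string description;   // text after the tab
};

// Remove leading and trailing blanks.
static std::string trim(const std::string& s)
{
  const auto first = s.find_first_not_of(' ');
  if (first == std::string::npos)
    return "";
  const auto last = s.find_last_not_of(' ');
  return s.substr(first, last - first + 1);
}

// Parse "[<short>] <flag> [<long>] [<spaces><parameter>] [\t<description>]".
// Returns false if no flag character is found where one is expected.
bool parse_descriptor(const std::string& descr, OptionSpec& spec)
{
  const std::string flags = "|:?+*";
  spec = OptionSpec{};

  // Everything after the first tab is the description.
  const auto tab = descr.find('\t');
  const std::string head = descr.substr(0, tab);
  if (tab != std::string::npos)
    spec.description = trim(descr.substr(tab + 1));

  std::size_t pos = 0;
  if (head.size() > 1 && flags.find(head[1]) != std::string::npos)
    spec.short_form = head[pos++];      // "<short><flag>..."
  if (pos >= head.size() || flags.find(head[pos]) == std::string::npos)
    return false;
  spec.flag = head[pos++];

  // The long form runs up to the first blank; the rest is the parameter name.
  const auto blank = head.find(' ', pos);
  spec.long_form = head.substr(pos, blank == std::string::npos ? blank : blank - pos);
  if (blank != std::string::npos)
    spec.parameter = trim(head.substr(blank));
  return true;
}
```

For the descriptor "p+param parameters \t one or more parameters", this yields short form 'p', flag '+', long form "param", parameter "parameters" and the description text after the tab.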

OptParser API

Some of the functions have been mentioned before.

Constructors

C++
OptParser ()                                          (1)

Default constructor creates a parser object with an empty list of valid options.

C++
OptParser (std::vector<const char*> &list)              (2)

Initializes parser and sets the list of valid options.

C++
OptParser (const char **list)                           (3)

Initializes the parser and sets the list of option descriptors. The argument is an array of C strings terminated by a NULL pointer.

C++
OptParser (std::initializer_list<const char*> list)     (4)

Initializes parser and sets the list of valid options. The initializer list does not need a NULL terminator. See the sample code at the beginning for an example.

Member Functions

C++
void add_option (const char* descr)                         (1)

Adds a new option descriptor to the list of valid options.

C++
void set_options (std::vector <const char*> &list)          (2)

Sets the list of valid options. Any previous options are removed and the new ones are added.

C++
int parse (int argc, const char* const* argv, int* stop=0)  (3)

Parse command line arguments. stop is a pointer to an integer value that, if not null, receives the index in the argv array of the first non-option argument. If there are no non-option arguments, stop == argc.

If successful, the function returns 0. Otherwise, it returns an error code:

  • 1 = Unknown option
  • 2 = Required argument missing
  • 3 = Invalid multiple options string

If an error occurred, the stop argument, if not null, is the index in the argv array of the argument that triggered the error.
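
If you want to print something friendlier than a number, the codes can be mapped to text. The function below is a small sketch and not part of OptParser; its name and the wording of the messages are my own choices for this example.

```cpp
#include <string>

// Map the return value of parse() to a message. The codes are the ones
// listed above; the function name and wording are illustrative.
std::string parse_error_text(int code)
{
  switch (code)
  {
  case 0: return "no error";
  case 1: return "unknown option";
  case 2: return "required argument missing";
  case 3: return "invalid multiple options string";
  default: return "unrecognized error code";
  }
}
```

With a parser object named opt as in the sample, the value returned by opt.parse(argc, argv, &stop) could be passed to this function, and argv[stop] reported as the offending argument, provided stop is less than argc.
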
C++
int getopt (char option, std::string& optarg, char sep='|') const               (4)
int getopt (const std::string& option, std::string& optarg, char sep='|') const (5)
int getopt (char option, std::vector<std::string>& optarg) const                (6)
int getopt (const std::string& option, std::vector<std::string>& optarg) const  (7)

Returns the arguments of a specific option from the command line. The function returns the number of times the option occurs on the command line. For (4) and (5), optarg receives a string containing all the option's arguments separated by the sep character. Forms (6) and (7) return the option's arguments as a vector of strings. The option can be specified using either the short form, in (4) and (6), or the long form, in (5) and (7).
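
The relation between the string forms and the vector forms can be illustrated with two helpers. They are a sketch only and do not come from OptParser; join_args and split_args are hypothetical names. The sketch also shows why you may want a different sep, or the vector forms, when an argument can itself contain the separator.

```cpp
#include <string>
#include <vector>

// Join arguments with a separator, e.g. {"p1","p2","p3"} -> "p1|p2|p3".
std::string join_args(const std::vector<std::string>& args, char sep = '|')
{
  std::string out;
  for (std::size_t i = 0; i < args.size(); ++i)
  {
    if (i > 0)
      out += sep;
    out += args[i];
  }
  return out;
}

// Split a joined string back into its parts.
std::vector<std::string> split_args(const std::string& joined, char sep = '|')
{
  std::vector<std::string> out;
  if (joined.empty())
    return out;
  std::size_t start = 0;
  for (;;)
  {
    const std::size_t end = joined.find(sep, start);
    out.push_back(joined.substr(start, end == std::string::npos ? end : end - start));
    if (end == std::string::npos)
      break;
    start = end + 1;
  }
  return out;
}
```

Splitting only round-trips when no argument contains the separator: joining "a|b" and "c" with '|' and splitting again gives three parts instead of two.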

C++
bool  hasopt (const std::string &option) const          (8)
bool  hasopt (char option) const                        (9)

Checks if an option is present on the command line. Option can be specified using either the short form (9) or the long form (8).

C++
bool  next (std::string& opt, std::string& optarg, char sep='|')  (10)
bool  next (std::string& opt, std::vector<std::string>& optarg)   (11)

Returns the next option on the command line. The parser maintains an internal iterator that is initialized to the first available option when the command line is parsed. On each call to a next function, the iterator advances and the function returns the next option and its arguments. The function returns false if there are no more options.

Form (10) returns all the option's arguments in one string, separated by the sep character. Form (11) returns the arguments as a vector of strings.

C++
const std::string synopsis () const (12)

Generates a nicely formatted syntax string. For the example shown before, the synopsis string is:

appname -h|--help -y -n -p|--param <parameters>... -o|--option <value> --stuff [things ...]

Where appname is the actual name of the executable.
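
The pieces of the synopsis follow directly from the descriptor fields. The sketch below reproduces the fragments shown above for a single option; it is not OptParser's code, synopsis_fragment is a hypothetical name, and the format used for the '?' flag is an assumption because the example has no such option.

```cpp
#include <string>

// Build the synopsis fragment for one option from its descriptor fields.
// short_form is 0 and long_form is empty when that form is missing.
std::string synopsis_fragment(char short_form, char flag,
                              const std::string& long_form,
                              const std::string& parameter)
{
  std::string out;
  if (short_form != 0)
    out += std::string("-") + short_form;
  if (!long_form.empty())
  {
    if (!out.empty())
      out += '|';
    out += "--" + long_form;
  }
  switch (flag)
  {
  case ':': out += " <" + parameter + ">"; break;      // one required argument
  case '+': out += " <" + parameter + ">..."; break;   // one or more arguments
  case '*': out += " [" + parameter + " ...]"; break;  // zero or more arguments
  case '?': out += " [" + parameter + "]"; break;      // assumed format
  default: break;                                      // '|' takes no argument
  }
  return out;
}
```

Joining the fragments of all options with blanks, after the application name, gives a line like the synopsis shown above.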

C++
const std::string description (size_t indent_size=2) const  (13)

Generates a nicely formatted description string. For the example shown before, the description string is:

-h|--help                   show help message
-y                          boolean flag
-n                          another boolean flag
-p|--param <parameters>...  one or more parameters
-o|--option <value>         optional value
--stuff [things ...]        option with zero or more arguments
C++
const std::string& appname () const  (14)

Returns the program name. This is the content of argv[0] with directory path and extension removed.
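
Stripping the path and extension takes only a few lines. The sketch below shows one possible way to do it and is not taken from OptParser; program_name is a hypothetical name, and treating both '/' and '\' as directory separators is an assumption made for portability.

```cpp
#include <string>

// Strip the directory part and the extension from a program path.
// Both '/' and '\\' are treated as directory separators.
std::string program_name(const std::string& argv0)
{
  const auto slash = argv0.find_last_of("/\\");
  std::string name = (slash == std::string::npos) ? argv0 : argv0.substr(slash + 1);
  const auto dot = name.rfind('.');
  if (dot != std::string::npos && dot > 0)
    name.erase(dot);             // drop the extension, keep names like ".profile"
  return name;
}
```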

In Case You Are Wondering

  • How are quoted strings handled? They are not. OptParser relies on the operating system to break arguments on the command line.
  • If arguments are combined, why do you return the number of option occurrences? Because it allows you to do fancier things. For instance, if the application has a '-v' option for verbose mode, you could increase the level of verbosity by repeating the option: '-v' is verbose, '-vv' is more verbose, '-vvv' is incredibly verbose, and so on.
  • Some command line parsers allow the argument value to be separated by an equal sign (like -o=123). Can yours do that? No. First of all, that syntax is outside the POSIX specification. Also, it would open a can of worms about argument quoting.
  • How efficient is OptParser? Well, not particularly efficient. For all its storage needs, OptParser uses strings and vectors of strings. Normally, option parsing is a one time activity and its impact on the execution time is minimal. Efficiency was not a design goal.
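
The verbosity trick from the second answer can be shown without the library. With OptParser, the value returned by getopt for 'v' already gives you the count; the sketch below only illustrates the idea. count_short_option is a hypothetical name, and the sketch assumes that the grouped options take no arguments.

```cpp
#include <string>
#include <vector>

// Count how many times a short option appears, including inside groups
// such as "-vv". Long options are skipped and counting stops at "--".
int count_short_option(const std::vector<std::string>& args, char option)
{
  int count = 0;
  for (const std::string& a : args)
  {
    if (a == "--")
      break;
    if (a.size() < 2 || a[0] != '-' || a[1] == '-')
      continue;                    // not a short option or group
    for (std::size_t i = 1; i < a.size(); ++i)
      if (a[i] == option)
        ++count;
  }
  return count;
}
```

An application could then map a count of 1, 2 or 3 to increasingly detailed logging levels.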

Alternative Solutions

There are other packages that provide a similar functionality. In case you want to look at alternatives, you can check:

I haven't tried all of them, but I'd love to hear other opinions.

History

  • 2nd January, 2023: Initial version

License

This article, along with any associated source code and files, is licensed under The MIT License


Written By
Canada
Mircea is the embodiment of OOP: Old, Opinionated Programmer. With more years of experience than he likes to admit, he is always open to new things, but too bruised to follow any passing fad.

Lately, he hangs around here, hoping that some of the things he learned can be useful to others.

Comments and Discussions

 
Xav83 (19-Jan-23 22:37), Re: My vote of 5:
I am curious then, which one is the best, in your opinion? :) (So that I know the first one to test in my next projects ;) )

Note: you can find here the details about how argh can allow you to handle option values with the "=" sign