# How to build an expressions analyzer, the implementation (I)

In the previous article in this series I reviewed the main issues about the design of the **grammar** to build an **analyzer**, or **parser**, for arithmetic expressions. In this article I will show the first part of its implementation from the BNF **grammar** rules, converting the input expression, in the form of a text string, in a series of objects that we can use to evaluate it and give different values to the **constants** and **variables** that compose it.

If you prefer to start at the beginning, this is the link to the article on the design of the grammar for an expressions analyser.

In this link you can download the source code of the project with the parser I will use as an expample in this article, written in **CSharp** with **Visual Studio** **2013**.

The **parser** that I will implement functions as a **compiler** that can generate the results as an executable and configurable object that will allow us to evaluate the expression and modify the value of its parameters, **constants** and **variables**. The code is in the **Expressions** project, which generates a class library that can be linked as a reference to be used from any application.

## Defining a base class for the analyzer

First, I define an abstract base class from which they will derive all the other elements, so you can access them all using the same interface, the **ExpressionBase** class:

`public abstract class ExpressionBase`

{

protected static int _precision = 10;

protected static List<Variable> _variables = new List<Variable>();

protected static List<Constant> _constants = new List<Constant>();

public ExpressionBase(){}

public static int Precision

{

get { return ExpressionBase._precision; }

set { ExpressionBase._precision = Math.Min(20, Math.Max(1, value)); }

}

public IEnumerable<Variable> Variables

{

get { foreach (Variable v in ExpressionBase._variables)

{ yield return v; } }

}

public IEnumerable<Constant> Constants

{

get { foreach (Constant c in ExpressionBase._constants)

{ yield return c; } }

}

public static void Clear()

{

_variables.Clear();

_constants.Clear();

}

public abstract double Value { get; set; }

public virtual string Parse(string expr)

{

return expr.TrimStart(new char[] { ' ', '\n', '\r', '\t' });

}

}

I declare a static collection of **variables** and another one of **constants**, in order that all of them can be shared by all the expressions, allowing us to work with systems of equations. These collections are accessed through the instance properties **Variables** and **Constants**, through which we can enumerate them using a **foreach** statement. These collections can be emptied using the **Clear** method.

Since all values are of type **double**, also I define a value of the precision to avoid rounding errors using the **Math.Round** function to return the results of the operations.

All derived classes must implement the abstract property **Value**, by means of which you can assign values to **variables** and **constants** and obtain the value of all the objects that make up the expression to perform the arithmetic operations.

Finally, the **Parse** method is the responsible of analyze the part of the expression corresponding to the class. In the base class, all it does is remove the blanks and line breaks until the next evaluable character. This method should return the rest of the input string, once the class has removed the element in which it specializes.

Now we have to decide, between all of the rules of the **grammar**, those which we will choose to define the elements classes. Each element must work with a main rule and all his derived rules. In this case, I have divided the rules into four groups, **numbers**, **constants**, **variables** and **expressions**, which give rise to the corresponding classes.

The classes that make up the **parser** will use two exceptions defined in the **Exceptions** module to indicate a syntax error (**BadSyntaxException**) and to reject an input string if it does not comply with the class rules so that it can be tried another alternative (**NotRecognizedException**).

## The Number class

This class will be in charge of analyzing the numbers. As a reminder, these are the **BNF** **grammar** rules by which the syntax is defined:

`<number>::=<digit><rnumber>|<decimalsep><rdecimal>`

<rnumber>::=<digit><rnumber>|<decimalsep><rdecimal>|<void>

<rdecimal>::=<number><fdecimal>

<fdecimal>::=<number><fdecimal>|<void>

<digit>::={0,1,2,3,4,5,6,7,8,9}

<decimalsep>::={“,”,.}

The rule **<number>** is the one that results in the class, while **<rnumber>**, **<rdecimal>** and **<fdecimal>** are implemented as functions within the class.

In the method **Parse** we begin to analyze from the rule **<number>**, checking if the first character is a number or a decimal point:

`public override string Parse(string expr)`

{

string snum = base.Parse(expr);

if (string.IsNullOrEmpty(snum)) { throw new NotRecognizedException(); }

if (char.IsNumber(snum, 0))

{

_value = Convert.ToDouble(snum[0].ToString());

if (snum.Length > 1)

{

return RNumber(snum.Substring(1));

}

return "";

}

else if ((snum[0] == ',') || (snum[0] == '.'))

{

if (snum.Length > 1)

{

_value = 0;

_decimalpos = 1;

return RDecimal(snum.Substring(1));

}

throw new BadSyntaxException(snum);

}

throw new NotRecognizedException();

}

Depending on if the first character is a number or a decimal point, we will take the decision to follow the rule **<rnumber>** or **<rdecimal>**. The value of the number will be going calculated on the **_value** variable, using the auxiliary **_decimalpos** variable to calculate the decimal part.

In the method **RNumber** we will go adding recursively the digits to the value of the number until find the end of the input or a decimal separator, to continue with the rule **<rdecimal>**:

`private string RNumber(string expr)`

{

if (char.IsNumber(expr, 0))

{

_value = (10 * _value) + Convert.ToDouble(expr[0].ToString());

if (expr.Length > 1)

{

return RNumber(expr.Substring(1));

}

return "";

}

else if ((expr[0] == ',') || (expr[0] == '.'))

{

if (expr.Length > 1)

{

_decimalpos = 1;

return RDecimal(expr.Substring(1));

}

throw new BadSyntaxException(expr);

}

return expr;

}

In the **RDecimal** method simply we check that the next character is a number, to move to analyze the rest of the decimal places using the rule **<fdecimal>**:

`private string RDecimal(string expr)`

{

if (char.IsNumber(expr, 0))

{

_value += Math.Round(Convert.ToDouble(expr[0].ToString()) /

(10 * _decimalpos), ExpressionBase._precision);

_decimalpos++;

if (expr.Length > 1)

{

return FDecimal(expr.Substring(1));

}

return "";

}

throw new BadSyntaxException(expr);

}

Finally, the **FDecimal** method recursively finish the calculation of the value of the decimal part of the number:

`private string FDecimal(string expr)`

{

if (char.IsNumber(expr, 0))

{

_value += Math.Round(Convert.ToDouble(expr[0].ToString()) /

(10 * _decimalpos), Precision);

_decimalpos++;

if (expr.Length > 1)

{

return FDecimal(expr.Substring(1));

}

return "";

}

return expr;

}

## The Variable class

With this class we will implement the variables of the expression, according to the rule of the **grammar**:

`<var>::={x,y,z}`

Since the **variables** are stored in a collection, I have implemented the **IComparable** and **IEquatable** interfaces in order to check whether the collection contains the **variable** and, if not, add it. Two **variables** are equal if they have the same name, which is stored in the **_name** variable. The **IComparable** interface also allows us to order the collection.

This element is very easy to identify, as it simply consists of one of the letters **x**, **y** or **z**. It does not distinguish between uppercase and lowercase:

`public override string Parse(string expr)`

{

string svar = base.Parse(expr);

if (!string.IsNullOrEmpty(svar))

{

if ((char.ToLower(svar[0]) >= 'x') && (char.ToLower(svar[0]) <= 'z'))

{

_name = svar.Substring(0, 1).ToLower();

if (!ExpressionBase._variables.Contains<Variable>(this))

{

ExpressionBase._variables.Add(this);

}

if (svar.Length > 1)

{

return svar.Substring(1);

}

return "";

}

}

throw new NotRecognizedException();

}

## The Constant class

To end this article, we will see how to build the **Constant** class to analyze **constants**, based on this set of rules:

`<const>::=<letter><rconst>`

<rconst>::=<allchar><rconst>|<void>

<letter>::={a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w}

<allchar>::={0,1,2,3,4,5,6,7,8,9,a,b,c,d,e,f,g,h,i,j,

k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z}

Again, the class will implement the main rule, **<const>**, and the secondary rule, **<rconst>** will be implemented in the form of a function. This class also implements the **IComparable** and **IEquatable** interfaces, as well as the Variable class, for the same reasons.

The **Parse** method first checks if the next character is a valid one, and if there are more characters, calls the **RConst** function to build the rest of the name of the constant:

`public override string Parse(string expr)`

{

string scon = base.Parse(expr);

if (!string.IsNullOrEmpty(scon))

{

if ((char.ToLower(scon[0]) >= 'a') && (char.ToLower(scon[0]) <= 'w'))

{

_name = scon.Substring(0, 1).ToLower();

if (scon.Length > 1)

{

return RConst(scon.Substring(1));

}

if (!ExpressionBase._constants.Contains(this))

{

ExpressionBase._constants.Add(this);

}

return "";

}

}

throw new NotRecognizedException();

}

And the **RConst** function continues adding characters to the name while it find letters or numbers:

`private string RConst(string expr)`

{

if (char.IsLetterOrDigit(expr, 0))

{

_name += expr.Substring(0, 1).ToLower();

if (expr.Length > 1)

{

return RConst(expr.Substring(1));

}

expr = "";

}

if (!ExpressionBase._constants.Contains(this))

{

ExpressionBase._constants.Add(this);

}

return expr;

}

In the following article I will show how to implement the expressions themselves. Meanwhile, as in the previous article, I leave you a list of books that can help you delve deeper into the subject: