We use own and third party cookies to improve our services and show related advertising to your preferences by analyzing your browsing habits. If you continue browsing, we consider that you accept their use. Can change the settings or get more information here

Ver sitio en español Go to homepage Contact me
viernes, 17 de junio de 2016

How to build an expressions analyzer, the implementation (I)

In the previous article in this series I reviewed the main issues about the design of the grammar to build an analyzer, or parser, for arithmetic expressions. In this article I will show the first part of its implementation from the BNF grammar rules, converting the input expression, in the form of a text string, in a series of objects that we can use to evaluate it and give different values to the constants and variables that compose it.

If you prefer to start at the beginning, this is the link to the article on the design of the grammar for an expressions analyser.

In this link you can download the source code of the project with the parser I will use as an expample in this article, written in CSharp with Visual Studio 2013.

The parser that I will implement functions as a compiler that can generate the results as an executable and configurable object that will allow us to evaluate the expression and modify the value of its parameters, constants and variables. The code is in the Expressions project, which generates a class library that can be linked as a reference to be used from any application.

Defining a base class for the analyzer

First, I define an abstract base class from which they will derive all the other elements, so you can access them all using the same interface, the ExpressionBase class:

public abstract class ExpressionBase
{
protected static int _precision = 10;
protected static List<Variable> _variables = new List<Variable>();
protected static List<Constant> _constants = new List<Constant>();

public ExpressionBase(){}
public static int Precision
{
get { return ExpressionBase._precision; }
set { ExpressionBase._precision = Math.Min(20, Math.Max(1, value)); }
}
public IEnumerable<Variable> Variables
{
get { foreach (Variable v in ExpressionBase._variables)
{ yield return v; } }
}
public IEnumerable<Constant> Constants
{
get { foreach (Constant c in ExpressionBase._constants)
{ yield return c; } }
}
public static void Clear()
{
_variables.Clear();
_constants.Clear();
}
public abstract double Value { get; set; }
public virtual string Parse(string expr)
{
return expr.TrimStart(new char[] { ' ', '\n', '\r', '\t' });
}
}

I declare a static collection of variables and another one of constants, in order that all of them can be shared by all the expressions, allowing us to work with systems of equations. These collections are accessed through the instance properties Variables and Constants, through which we can enumerate them using a foreach statement. These collections can be emptied using the Clear method.

Since all values are of type double, also I define a value of the precision to avoid rounding errors using the Math.Round function to return the results of the operations.

All derived classes must implement the abstract property Value, by means of which you can assign values to variables and constants and obtain the value of all the objects that make up the expression to perform the arithmetic operations.

Finally, the Parse method is the responsible of analyze the part of the expression corresponding to the class. In the base class, all it does is remove the blanks and line breaks until the next evaluable character. This method should return the rest of the input string, once the class has removed the element in which it specializes.

Now we have to decide, between all of the rules of the grammar, those which we will choose to define the elements classes. Each element must work with a main rule and all his derived rules. In this case, I have divided the rules into four groups, numbers, constants, variables and expressions, which give rise to the corresponding classes.

The classes that make up the parser will use two exceptions defined in the Exceptions module to indicate a syntax error (BadSyntaxException) and to reject an input string if it does not comply with the class rules so that it can be tried another alternative (NotRecognizedException).

The Number class

This class will be in charge of analyzing the numbers. As a reminder, these are the BNF grammar rules by which the syntax is defined:

<number>::=<digit><rnumber>|<decimalsep><rdecimal>
<rnumber>::=<digit><rnumber>|<decimalsep><rdecimal>|<void>
<rdecimal>::=<number><fdecimal>
<fdecimal>::=<number><fdecimal>|<void>
<digit>::={0,1,2,3,4,5,6,7,8,9}
<decimalsep>::={“,”,.}

The rule <number> is the one that results in the class, while <rnumber>, <rdecimal> and <fdecimal> are implemented as functions within the class.

In the method Parse we begin to analyze from the rule <number>, checking if the first character is a number or a decimal point:

public override string Parse(string expr)
{
string snum = base.Parse(expr);
if (string.IsNullOrEmpty(snum)) { throw new NotRecognizedException(); }
if (char.IsNumber(snum, 0))
{
_value = Convert.ToDouble(snum[0].ToString());
if (snum.Length > 1)
{
return RNumber(snum.Substring(1));
}
return "";
}
else if ((snum[0] == ',') || (snum[0] == '.'))
{
if (snum.Length > 1)
{
_value = 0;
_decimalpos = 1;
return RDecimal(snum.Substring(1));
}
throw new BadSyntaxException(snum);
}
throw new NotRecognizedException();
}

Depending on if the first character is a number or a decimal point, we will take the decision to follow the rule <rnumber> or <rdecimal>. The value of the number will be going calculated on the _value variable, using the auxiliary _decimalpos variable to calculate the decimal part.

In the method RNumber we will go adding recursively the digits to the value of the number until find the end of the input or a decimal separator, to continue with the rule <rdecimal>:

private string RNumber(string expr)
{
if (char.IsNumber(expr, 0))
{
_value = (10 * _value) + Convert.ToDouble(expr[0].ToString());
if (expr.Length > 1)
{
return RNumber(expr.Substring(1));
}
return "";
}
else if ((expr[0] == ',') || (expr[0] == '.'))
{
if (expr.Length > 1)
{
_decimalpos = 1;
return RDecimal(expr.Substring(1));
}
throw new BadSyntaxException(expr);
}
return expr;
}

In the RDecimal method simply we check that the next character is a number, to move to analyze the rest of the decimal places using the rule <fdecimal>:

private string RDecimal(string expr)
{
if (char.IsNumber(expr, 0))
{
_value += Math.Round(Convert.ToDouble(expr[0].ToString()) /
(10 * _decimalpos), ExpressionBase._precision);
_decimalpos++;
if (expr.Length > 1)
{
return FDecimal(expr.Substring(1));
}
return "";
}
throw new BadSyntaxException(expr);
}

Finally, the FDecimal method recursively finish the calculation of the value of the decimal part of the number:

private string FDecimal(string expr)
{
if (char.IsNumber(expr, 0))
{
_value += Math.Round(Convert.ToDouble(expr[0].ToString()) /
(10 * _decimalpos), Precision);
_decimalpos++;
if (expr.Length > 1)
{
return FDecimal(expr.Substring(1));
}
return "";
}
return expr;
}

The Variable class

With this class we will implement the variables of the expression, according to the rule of the grammar:

<var>::={x,y,z}

Since the variables are stored in a collection, I have implemented the IComparable and IEquatable interfaces in order to check whether the collection contains the variable and, if not, add it. Two variables are equal if they have the same name, which is stored in the _name variable. The IComparable interface also allows us to order the collection.

This element is very easy to identify, as it simply consists of one of the letters x, y or z. It does not distinguish between uppercase and lowercase:

public override string Parse(string expr)
{
string svar = base.Parse(expr);
if (!string.IsNullOrEmpty(svar))
{
if ((char.ToLower(svar[0]) >= 'x') && (char.ToLower(svar[0]) <= 'z'))
{
_name = svar.Substring(0, 1).ToLower();
if (!ExpressionBase._variables.Contains<Variable>(this))
{
ExpressionBase._variables.Add(this);
}
if (svar.Length > 1)
{
return svar.Substring(1);
}
return "";
}
}
throw new NotRecognizedException();
}

The Constant class

To end this article, we will see how to build the Constant class to analyze constants, based on this set of rules:

<const>::=<letter><rconst>
<rconst>::=<allchar><rconst>|<void>
<letter>::={a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w}
<allchar>::={0,1,2,3,4,5,6,7,8,9,a,b,c,d,e,f,g,h,i,j,
k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z}

Again, the class will implement the main rule, <const>, and the secondary rule, <rconst> will be implemented in the form of a function. This class also implements the IComparable and IEquatable interfaces, as well as the Variable class, for the same reasons.

The Parse method first checks if the next character is a valid one, and if there are more characters, calls the RConst function to build the rest of the name of the constant:

public override string Parse(string expr)
{
string scon = base.Parse(expr);
if (!string.IsNullOrEmpty(scon))
{
if ((char.ToLower(scon[0]) >= 'a') && (char.ToLower(scon[0]) <= 'w'))
{
_name = scon.Substring(0, 1).ToLower();
if (scon.Length > 1)
{
return RConst(scon.Substring(1));
}
if (!ExpressionBase._constants.Contains(this))
{
ExpressionBase._constants.Add(this);
}
return "";
}
}
throw new NotRecognizedException();
}

And the RConst function continues adding characters to the name while it find letters or numbers:

private string RConst(string expr)
{
if (char.IsLetterOrDigit(expr, 0))
{
_name += expr.Substring(0, 1).ToLower();
if (expr.Length > 1)
{
return RConst(expr.Substring(1));
}
expr = "";
}
if (!ExpressionBase._constants.Contains(this))
{
ExpressionBase._constants.Add(this);
}
return expr;
}

In the following article I will show how to implement the expressions themselves. Meanwhile, as in the previous article, I leave you a list of books that can help you delve deeper into the subject:

Share this article: Share in Twitter Share in Facebook Share in Google Plus Share in LinkedIn
Comments (0):
* (Your comment will be published after revision)

E-Mail


Name


Web


Message


CAPTCHA
Change the CAPTCHA codeSpeak the CAPTCHA code