Universal object compiler using BNF rules I
In this series of articles I will present a class library that implements a compiler for any language defined by BNF rules. The compiler generates objects from a user-written class library, which must implement a simple interface so that the compiler can construct and initialize the objects from the source code.
In this link you can download the source code of the BNFUP project, which contains the class library that implements the compiler, the BNF rules editor and three example class libraries whose objects can be compiled from three different languages. It is written in C# using Visual Studio 2015.
Language syntax
First, I will define the format of the BNF rules that I have used to describe each of the languages. This format is itself described using BNF rules:
<<rulelist>> ::= <rule> [<rulelist>]
<rule> ::= <rulename> '::=' <defrule> ';'
<rulename> ::= <ruleid>
| '<<' <identifier> '>>'
<defrule> ::= <defrulech> ['|' <defrule>]
<defrulech> ::= <item> [<defrulech>]
<ruleid> ::= '<' <identifier> '>'
<identifier> ::= {a-zA-Z} [<rid>]
<rid> ::= {a-zA-Z0-9} [<rid>]
<item> ::= <token>
| <ltoken>
| <charset>
| <ruleid>
| '[' <item> ']'
<token> ::= ''' {.} [<rtoken>] '''
| ''' '\'' [<rtoken>] '''
<rtoken> ::= {.} [<rtoken>]
| '\'' [<rtoken>]
<ltoken> ::= <token> [',' <ltoken>]
<charset> ::= '{' <charlist> '}'
<charlist> ::= '\}' [<charlist>]
| <char> [<charlist>]
The root rule, from which the compiler starts interpreting the code, is marked with double angle brackets, << and >>. In this case it is rulelist, a list of rules.
The rule named rule defines the syntax of each rule in the list: the rule name, then the token ::= that separates the name from its definition, and finally the token ; that marks the end of the definition. The tokens belonging to the language must be written between single quotes.
The token | separates alternative definitions of the same rule. If part of a rule is optional, it must be enclosed in square brackets, [ and ].
A rule definition can be built from the following elements: a token; a token list (ltoken), which defines several possible tokens for the same position in the code; a character set, enclosed in braces { and } and written using the character set syntax of the regular expressions of the .NET Framework; and the name of any rule, which may also be the rule being defined.
For example, this would be the grammar for the language described in this article about designing the grammar of an expression analyzer:
<number>::=<sdigit>[<rnumber>];
<rnumber>::=<digit>[<rnumber>]
|<decimalsep><rdecimal>;
<rdecimal>::=<digit>[<rdecimal>];
<sdigit>::={-0-9};
<digit>::={0-9};
<decimalsep>::={,\.};
<var>::={xyz};
<const>::=<letter>[<rconst>];
<rconst>::=<allchar>[<rconst>];
<letter>::={a-w};
<allchar>::={0-9a-z};
<<expr>>::=<expr2>['+','-'<expr>];
<expr2>::=<expr1>['*','/'<expr2>];
<expr1>::=<expr0>['^'<expr1>];
<expr0>::=['-']<element>;
<element>::=<pexpr>
|<number>
|<const>
|<var>;
<pexpr>::='('<expr>')';
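To see how the recursive rules consume the input, the number rules above can be translated by hand into a small recognizer. This is only an illustrative sketch and not how BNFUP works internally: the NumberRecognizer class and its methods are hypothetical names, and it assumes that a character set such as {0-9} behaves like the .NET character class [0-9].

```csharp
using System.Text.RegularExpressions;

// Hand-written recognizer for the <number> rules (illustration only).
public static class NumberRecognizer
{
    // Assumption: a charset {...} is equivalent to the .NET class [...].
    static readonly Regex SDigit = Charset(@"-0-9");
    static readonly Regex Digit = Charset(@"0-9");
    static readonly Regex DecimalSep = Charset(@",\.");

    static Regex Charset(string body)
    {
        return new Regex("^[" + body + "]$");
    }

    // True if the character at pos exists and belongs to the set.
    static bool Is(Regex set, string s, int pos)
    {
        return pos < s.Length && set.IsMatch(s[pos].ToString());
    }

    // <number> ::= <sdigit> [<rnumber>]
    // Returns the position just after the match, or -1 if there is none.
    public static int Number(string s, int pos)
    {
        if (!Is(SDigit, s, pos)) return -1;
        int rest = RNumber(s, pos + 1);
        return rest >= 0 ? rest : pos + 1;   // [<rnumber>] is optional
    }

    // <rnumber> ::= <digit> [<rnumber>] | <decimalsep> <rdecimal>
    static int RNumber(string s, int pos)
    {
        if (Is(Digit, s, pos))
        {
            int rest = RNumber(s, pos + 1);
            return rest >= 0 ? rest : pos + 1;
        }
        if (Is(DecimalSep, s, pos)) return RDecimal(s, pos + 1);
        return -1;
    }

    // <rdecimal> ::= <digit> [<rdecimal>]
    static int RDecimal(string s, int pos)
    {
        if (!Is(Digit, s, pos)) return -1;
        int rest = RDecimal(s, pos + 1);
        return rest >= 0 ? rest : pos + 1;
    }

    // True only if the whole string is a <number>.
    public static bool IsNumber(string s)
    {
        return Number(s, 0) == s.Length;
    }
}
```

With this sketch, NumberRecognizer.IsNumber("-12,5") and NumberRecognizer.IsNumber("3.14") return true, while "1." is rejected because rdecimal requires at least one digit after the decimal separator.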
The BNFUP class library
The BNFUP.dll class library implements the object compiler and defines the interfaces through which the objects of your class library are constructed when the compiler processes a source file written in the language that you have designed for them.
First, your class library must contain a single class that implements the ICompilableObjectFactory interface, defined in the BNFUP.Interfaces namespace. This class must have a constructor without parameters:
public interface ICompilableObjectFactory
{
    ICompilableObject Object { get; }
    void Init();
    ICompilableObject CreateObject(string ctype, IRuleItem item);
}
The Object property returns the final object built by the compiler. Init lets you perform initialization actions at the beginning of the compilation, and CreateObject is called whenever an object must be created. The ctype parameter is a text string with the unique identifier of the type of object to create; the section about the rule editor explains how this identifier is defined. The item parameter contains the element of the language that triggered the creation of the object, although you will not usually need it.
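A factory along these lines might look like the following minimal sketch. Everything here except the interface members is an assumption: SampleFactory and Leaf are hypothetical names, the identifiers "number" and "var" stand for whatever you enter as Compiler Object ID, and the article does not say how Object should be populated or how unknown identifiers should be handled, so the sketch simply remembers the first object created and throws on unknown ones. The interfaces are redeclared as stand-ins so that the code compiles without BNFUP.dll; the Test method of the real ICompilableObject is left out to avoid the Windows Forms dependency.

```csharp
using System;

// Stand-ins for the BNFUP.Interfaces types (reference BNFUP.dll in a real project).
public interface IRuleItem { }   // real members are not described in this article

public interface ICompilableObject
{
    string Text { get; set; }
    bool AddItem(ICompilableObject item);
    ICompilableObject Simplify();
}

public interface ICompilableObjectFactory
{
    ICompilableObject Object { get; }
    void Init();
    ICompilableObject CreateObject(string ctype, IRuleItem item);
}

// Hypothetical object without components.
public class Leaf : ICompilableObject
{
    public Leaf(string kind) { Kind = kind; }
    public string Kind { get; private set; }
    public string Text { get; set; }
    public bool AddItem(ICompilableObject item) { return false; } // accepts no components
    public ICompilableObject Simplify() { return this; }
}

public class SampleFactory : ICompilableObjectFactory
{
    private ICompilableObject _root;

    // Parameterless constructor, as the compiler requires.
    public SampleFactory() { }

    public ICompilableObject Object { get { return _root; } }

    // Reset the state before each compilation.
    public void Init() { _root = null; }

    public ICompilableObject CreateObject(string ctype, IRuleItem item)
    {
        ICompilableObject created;
        switch (ctype)
        {
            case "number":   // hypothetical Compiler Object IDs
            case "var":
                created = new Leaf(ctype);
                break;
            default:
                throw new ArgumentException("Unknown Compiler Object ID: " + ctype);
        }
        // Assumption: the first object created is the one exposed by Object.
        if (_root == null) _root = created;
        return created;
    }
}
```

A switch on ctype keeps the mapping between the identifiers defined in the rule editor and your classes in a single place, which makes it easier to keep the two in step.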
Each of the objects that will be handled by the compiler must implement the ICompilableObject interface, also defined in the BNFUP.Interfaces namespace:
public interface ICompilableObject
{
    string Text { get; set; }
    bool AddItem(ICompilableObject item);
    ICompilableObject Simplify();
    void Test(Form fmain);
}
The compiler uses the Text property to read and write the text of the source code that gave rise to the object. Through the AddItem method, a composite object receives its component objects, such as the arguments and the operator of an arithmetic expression. The Simplify method returns either the object itself, if it does not need simplification, or a simplified version of it. For example, an arithmetic expression consisting only of a number can return the number itself.
Finally, there is a Test method that allows you to implement testing on the object once it is built.
You can also override the object's ToString method so that the compiler built into the rule editor can show debugging information about it.
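As a rough illustration of AddItem, Simplify and ToString working together, here is a sketch of an expression object that collapses to its only component. The class names Leaf, Number, Operator and Expression are hypothetical, the meaning given to the bool returned by AddItem (true when the item is accepted) is an assumption, and ICompilableObject is redeclared as a stand-in without the Test method so that the code compiles without BNFUP.dll or Windows Forms.

```csharp
using System.Collections.Generic;

// Stand-in for BNFUP.Interfaces.ICompilableObject (Test method omitted).
public interface ICompilableObject
{
    string Text { get; set; }
    bool AddItem(ICompilableObject item);
    ICompilableObject Simplify();
}

// Hypothetical base class for objects that have no components.
public abstract class Leaf : ICompilableObject
{
    public string Text { get; set; }
    public bool AddItem(ICompilableObject item) { return false; } // nothing to add to
    public ICompilableObject Simplify() { return this; }          // already simple
    public override string ToString() { return GetType().Name + "(" + Text + ")"; }
}

public class Number : Leaf { }
public class Operator : Leaf { }

// Hypothetical composite: arguments and operators in source order.
public class Expression : ICompilableObject
{
    private readonly List<ICompilableObject> _items = new List<ICompilableObject>();

    public string Text { get; set; }

    // Assumption: returning true means the item was accepted.
    public bool AddItem(ICompilableObject item)
    {
        if (item == null) return false;
        _items.Add(item);
        return true;
    }

    // An expression with a single component is replaced by that component.
    public ICompilableObject Simplify()
    {
        if (_items.Count == 1) return _items[0].Simplify();
        return this;
    }

    // Debugging text listing the components.
    public override string ToString()
    {
        return "Expression(" + string.Join(" ", _items) + ")";
    }
}
```

An Expression that has received only a Number returns that Number from Simplify, while one that has received a Number, an Operator and another Number returns itself.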
The compiler itself is implemented in the RuleTable class, in the BNFUP.Rules namespace. You can construct an object of this type with the static method RuleTable.FromFile, passing it the path of a file generated with the rule editor, which we will see next.
The RuleTable class has a Factory property, to which you must assign an instance of a class that implements ICompilableObjectFactory before the compiler can be used. Then you only need to call the Build method, passing it a TextReader object with the source code to be compiled, and it will return the generated object.
The BNFUPEditor rule editor
To facilitate the definition of language rules, I have implemented an editor in the BNFUPEditor project that will allow you to do it quickly and easily.
Although you can define the language rule by rule from scratch in the editor, with the New option in the File menu, the quickest way to start is to write the rules in a text file and use the Open... option in the File menu to build them from this file. With this option you can also open files with the bnf extension, which are stored in the binary format of the program.
On the left side of the form is the list with the rules. At the top right is the definition of the selected rule as a hierarchical list, and at the bottom right you can edit the properties of the element selected in this list.
The elements with which you can construct a rule are the following:
- Rule: Any simple rule. You can use any rule defined in the language.
- Alternative rules: A list of simple rules, any one of which the compiler may find in the corresponding position of the source code. The rules that compose them do not appear in the list on the right.
- Token: Any of the tokens defined in the language.
- Token List: A list of tokens, one of which must be found in the corresponding position of the source code.
- Character set: A set of characters, one of which must appear in the corresponding position of the source code.
- List of elements: An auxiliary component that allows you to group other components. Its main function is to mark one or more components as optional at any point in the source code.
All components have a Compiler Object ID property, which defines the type of object that must be generated if the element is found in the source code. If this property is left blank, no object will be generated.
All elements, except the rules, have an Optional property that marks them as optional, so that they may or may not appear in the source code. If you want to mark a rule as optional, put it inside a list of elements and mark the list as optional.
The name of a rule is set with the Name property, and it must be unique. The equivalent property for tokens is Token, and Char Set for character sets.
The rules also have a Root property that allows you to mark one of them as the main rule by which the source code begins to be interpreted.
Finally, with the Color property you can highlight any element with a certain color in the list of rules.
To edit the contents of a rule or the elements that compose it, you must select the element in the hierarchical list and display the context menu of options using the right mouse button.
The options that appear in this menu depend on the type of item that you have selected. The New Child... option allows you to add a new component of the appropriate type to composite objects, which are the rules and lists. A dialog box will appear with all the types of elements that can be added.
In this dialog box you can find two types of elements. Those that are already defined appear with their name or content. If you select an empty element of any type, a new element of that type will be created.
If the Insert At control appears, you can indicate the position in which you want to insert the new element.
With the Delete option you can delete the selected item. This will not cause the element to be completely deleted; it is only removed from the element you are editing.
The Cut, Copy and Paste options have the usual use.
The Free Items option deletes the list and moves its components to the level that the list previously occupied.
The Enlist option will create a list of items and put the selected item inside.
The Extract option allows you to extract an item from the list that contains it, passing it to the top level.
Move Up and Move Down allow you to change the position of an item within a list.
The simple rules can be transformed into alternative ones using the option Add alternatives. For compound rules, there is the Simplify option that performs the opposite operation, making them a simple rule provided that they contain only one alternative.
To create new rules or to delete an existing one, there are three buttons on the toolbar of the form: one to create a new simple rule, another to create a new compound rule and another to delete the selected rule. If you remove a rule, it will disappear entirely from the list and from all the other rules in which it appears.
In the next article I will show you how to use the editor to compile and test objects and a few examples.