Discrete Mathematics and Computer Science Question Show that
Discrete Mathematics and Computer Science Question:
Show that EBNF (extended Backus-Naur form) and BNF (Backus-Naur form) can generate the same languages by describing how productions for a grammar in EBNF can be translated into a set of productions for the grammar in BNF.
Solution
Extended Backus-Naur form (EBNF) differs from BNF in several places:
It (usually) does not use brackets to denote nonterminals. We can still tell which symbols are nonterminal by looking at all the left-hand sides (the names defined there are nonterminals, the rest of the symbols are terminal). It is then easy to recover the < > for nonterminals
? symbol can be used to indicate that some part (a symbol or group of symbols enclosed in parentheses) of the production is optional. For example S ::= a?bcde? is a simple EBNF style grammar that can produce strings abcde, abcd, bcde, bcd as taking a and e is optional. The way to capture that in BNF is to have 2 productions for each ?x: one with x and one without. In our example, the BNF equivalent is then:
<S> ::= abcde | bcde | abcd | bcd – this is obtained in 2 steps, first we remove ?a to get
<S> ::= abcd?e | bcd?e and then we remove ?e.
• * symbol is used to indicate that something occurs zero or more times. For example S ::= a*b describes all strings with arbitrary num- ber of a’s in front and ending with b (so the language described by this grammar is {anb|n N}). We can simulate x* in BNF by creating a new nonterminal object, let us call it <Xstar>. Now, production for <Xstar> will be
<Xstar> ::= x<Xstar>| and we will use <Xstar> where x* was in the original grammar.
S ::= a*b becomes then
<S> ::= <Astar>b <Astar> ::= a<Astar>|
+ symbol is used to indicate a symbol occurs one or more times. But notice that x+ is the same as xx* so we can exchange them – and we already know how to transform x*.
Now how to transform a more complicated EBNF expression? We just have to keep transforming the grammar step by step until all the + and * symbols are gone. The result may not be optimal in terms of size (number of rules), but the two grammars will describe the same language.
