好吧在上一章中的Pascal语法并不准确,接下来,我们要修正那些语法错误,并加入更多的Pascal语言特性,本章我们将更接近真正的Pascal解释器,完成形如以下代码的解释:
PROGRAM Part10;
VAR
number : INTEGER;
a, b, c, x : INTEGER;
y : REAL;
BEGIN {Part10}
BEGIN
number := 2;
a := number;
b := 10 * a + 10 * number DIV 4;
c := a - - b
END;
x := 11;
y := 20 / 7 + 3.14;
{ writeln('a = ', a); }
{ writeln('b = ', b); }
{ writeln('c = ', c); }
{ writeln('number = ', number); }
{ writeln('x = ', x); }
{ writeln('y = ', y); }
END. {Part10}
更新文法
我们新增了PRGRAM保留字,一片VAR区块,还有注释特性,更新文法如下:
program : PROGRAM variable SEMI block DOT
block : declarations compound_statement
declarations : VAR (variable_declaration SEMI)+
| empty
variable_declaration : ID (COMMA ID)* COLON type_spec
type_spec : INTEGER | REAL
compound_statement : BEGIN statement_list END
statement_list : statement
| statement SEMI statement_list
statement : compound_statement
| assignment_statement
| empty
assignment_statement : variable ASSIGN expr
empty :
expr : term ((PLUS | MINUS) term)*
term : factor ((MUL | INTEGER_DIV | FLOAT_DIV) factor)*
factor : PLUS factor
| MINUS factor
| INTEGER_CONST
| REAL_CONST
| LPAREN expr RPAREN
| variable
variable: ID
更新词法分析器
根据新的文法,我们新增保留字PROGRAM,VAR等,新增一个skipComments用来跳过注释,把integer函数换成number函数用来同时返回INTEGER类型或者REAL类型,注意Pascal的除法整型和浮点型使用不同的符号,整数除法是"DIV"这个保留字:
#include "Tokenizer/Lexer.h"
#include "Tokenizer/LexerException.h"
#include <ctype.h>
#include <map>
using namespace std;
static const std::map<std::string, TokenPtr> REVERSED_WORDS =
{
{"PROGRAM",make_shared<Token>(TokenType::TT_PROGRAM,"PROGRAM")},
{"VAR",make_shared<Token>(TokenType::TT_VAR,"VAR")},
{"DIV",make_shared<Token>(TokenType::TT_INTEGER_DIV,"DIV")},
{"INTEGER",make_shared<Token>(TokenType::TT_INTEGER,"INTEGER")},
{"REAL",make_shared<Token>(TokenType::TT_REAL,"REAL")},
{"BEGIN",make_shared<Token>(TokenType::TT_BEGIN,"BEGIN")},
{"END",make_shared<Token>(TokenType::TT_END,"END") },
};
void Lexer::advance()
{
currentPos += 1;
if (currentPos >= strlen(text.c_str()))
{
currentChar = '\0';
return;
}
currentChar = text.c_str()[currentPos];
}
void Lexer::skipWhiteSpaces()
{
while (isSpace(currentChar))
{
advance();
}
}
void Lexer::skipComments()
{
while (currentChar != '}')
{
advance();
}
advance();
}
TokenPtr Lexer::number()
{
std::string result = "";
while (isDigit(currentChar))
{
result += currentChar;
advance();
}
if (currentChar == '.')
{
result += currentChar;
advance();
while (isDigit(currentChar))
{
result += currentChar;
advance();
}
return make_shared<Token>(TokenType::TT_REAL_CONST,std::stof(result));
}
return make_shared<Token>(TokenType::TT_INTEGER_CONST,std::stoi(result));
}
bool Lexer::isSpace(char c)
{
return c == ' ' || c == '\t' || c == '\n';
}
bool Lexer::isDigit(char c)
{
return c >= '0' && c <= '9';
}
void Lexer::raiseError()
{
throw LexerException();
}
char Lexer::peek()
{
if (currentPos + 1 >= strlen(text.c_str()))
{
return NULL;
}
return text.c_str()[currentPos + 1];
}
TokenPtr Lexer::id()
{
std::string result = "";
while (isalnum(currentChar))
{
result += currentChar;
advance();
}
if (REVERSED_WORDS.find(result) == REVERSED_WORDS.end())
{
return make_shared<Token>(TokenType::TT_ID,result);
}
else
{
return REVERSED_WORDS.at(result);
}
}
Lexer::Lexer(std::string txt) :text(txt)
{
currentChar = txt.c_str()[0];
currentPos = 0;
}
TokenPtr Lexer::getNextTokenPtr()
{
while (currentChar != '\0' && currentPos < strlen(text.c_str()))
{
if (isSpace(currentChar))
{
skipWhiteSpaces();
}
if (currentChar == '{')
{
advance();
skipComments();
continue;
}
if (currentChar == '\0')
{
return make_shared<Token>(TokenType::TT_EOF,NULL);
}
TokenType tokenType;
TokenValue tokenValue;
if (isDigit(currentChar))
{
return number();
}
if (isalpha(currentChar))
{
return id();
}
switch (currentChar)
{
case ',':
tokenType = TokenType::TT_COMMA;
tokenValue = ",";
break;
case '.':
tokenType = TokenType::TT_DOT;
tokenValue = ".";
break;
case ':':
if (peek() == '=')
{
advance();
tokenType = TokenType::TT_ASSIGN;
tokenValue = ":=";
}
else
{
tokenType = TokenType::TT_COLON;
tokenValue = ":";
}
break;
case ';':
tokenType = TokenType::TT_SEMI;
tokenValue = ";";
break;
case '+':
tokenType = TokenType::TT_PLUS;
tokenValue = "+";
break;
case '-':
tokenType = TokenType::TT_MINUS;
tokenValue = "-";
break;
case '*':
tokenType = TokenType::TT_MULTI;
tokenValue = "*";
break;
case '/':
tokenType = TokenType::TT_FLOAT_DIV;
tokenValue = "/";
break;
case '(':
tokenType = TokenType::TT_LPAREN;
tokenValue = "(";
break;
case ')':
tokenType = TokenType::TT_RPAREN;
tokenValue = ")";
break;
default:
raiseError();
break;
}
advance();
return make_shared<Token>(tokenType, tokenValue);
}
return make_shared<Token>(TokenType::TT_EOF, NULL);
}
更新ASTNode
下一步,根据文法,更新所需的node类,我们新增ProgramNode、BlockNode、TypeNode、VarDeclNode四个node:
class ProgramNode :public AbstractSyntaxTreeNodeBase
{
private:
std::string programName;
BlockNodePtr blockNode;
public:
ProgramNode(std::string pName, BlockNodePtr node);
std::string getProgramName();
BlockNodePtr getBlockNodePtr();
GET_CLASS_NAME(ProgramNode)
};
typedef std::shared_ptr<ProgramNode> ProgramNodePtr;
class BlockNode :public AbstractSyntaxTreeNodeBase
{
private:
std::vector<VarDeclNodePtr> declarations;
CompoundNodePtr compoundNodePtr;
public:
BlockNode(std::vector<VarDeclNodePtr> decls, CompoundNodePtr cNode);
std::vector<VarDeclNodePtr> getDeclarations();
CompoundNodePtr getCompound();
GET_CLASS_NAME(BlockNode)
};
typedef std::shared_ptr<BlockNode> BlockNodePtr;
class TypeNode :public AbstractSyntaxTreeNodeBase
{
private:
TokenPtr tokenPtr;
public:
TypeNode(TokenPtr token);
TokenPtr getTokenPtr();
GET_CLASS_NAME(TypeNode)
};
typedef std::shared_ptr<TypeNode> TypeNodePtr;
class VarDeclNode : public AbstractSyntaxTreeNodeBase
{
private:
VarNodePtr varNodePtr;
TypeNodePtr typeNodePtr;
public:
VarDeclNode(VarNodePtr vNode, TypeNodePtr tNode);
VarNodePtr getVar();
TypeNodePtr getType();
GET_CLASS_NAME(VarDeclNode)
};
typedef std::shared_ptr<VarDeclNode> VarDeclNodePtr;
更新语法分析器
依旧地,对于文法中的每一条规则,语法分析器都需要实现一个对应的函数:
#include "Parser/ASTParser.h"
#include "Parser/ASTException.h"
#include "ASTNodes/UnaryOpNode.h"
#include "ASTNodes/CompoundNode.h"
#include "ASTNodes/VarNode.h"
#include "ASTNodes/AssignNode.h"
#include "ASTNodes/NoOpNode.h"
#include "ASTNodes/BlockNode.h"
#include <memory>
using namespace std;
void ASTParser::raiseError()
{
throw ASTException();
}
void ASTParser::eat(TokenType type)
{
if (currentTokenPtr->getType() != type)
{
raiseError();
return;
}
currentTokenPtr = lexerPtr->getNextTokenPtr();
}
BlockNodePtr ASTParser::block()
{
std::vector<VarDeclNodePtr> declarationNodes = declarations();
CompoundNodePtr compoundStatementNode = compoundStatement();
return std::make_shared<BlockNode>(declarationNodes, compoundStatementNode);
}
std::vector<VarDeclNodePtr> ASTParser::declarations()
{
std::vector<VarDeclNodePtr> result;
if (currentTokenPtr->getType() == TokenType::TT_VAR)
{
eat(TokenType::TT_VAR);
while (currentTokenPtr->getType() == TokenType::TT_ID)
{
auto tmp = variableDeclaration();
result.insert(result.end(), std::make_move_iterator(tmp.begin()), std::make_move_iterator(tmp.end()));
eat(TokenType::TT_SEMI);
}
}
return result;
}
std::vector<VarDeclNodePtr> ASTParser::variableDeclaration()
{
std::vector<VarNodePtr> varNodes;
VarNodePtr node = variable();
varNodes.push_back(node);
while (currentTokenPtr->getType() == TokenType::TT_COMMA)
{
eat(TokenType::TT_COMMA);
VarNodePtr tmp = variable();
varNodes.push_back(tmp);
}
eat(TokenType::TT_COLON);
TypeNodePtr typeNode = typeSpec();
std::vector<VarDeclNodePtr> result;
for (auto varNode : varNodes)
{
result.push_back(make_shared<VarDeclNode>(varNode, typeNode));
}
return result;
}
TypeNodePtr ASTParser::typeSpec()
{
TokenPtr token = currentTokenPtr;
if (currentTokenPtr->getType() == TokenType::TT_INTEGER)
{
eat(TokenType::TT_INTEGER);
return make_shared<TypeNode>(token);
}
if (currentTokenPtr->getType() == TokenType::TT_REAL)
{
eat(TokenType::TT_REAL);
return make_shared<TypeNode>(token);
}
raiseError();
return NULL;
}
ProgramNodePtr ASTParser::program()
{
eat(TokenType::TT_PROGRAM);
VarNodePtr programVar = variable();
string programName = get<string>(programVar->getTokenPtr()->getValue());
eat(TokenType::TT_SEMI);
BlockNodePtr blockNode = block();
ProgramNodePtr programNode = make_shared<ProgramNode>(programName, blockNode);
eat(TokenType::TT_DOT);
return programNode;
}
CompoundNodePtr ASTParser::compoundStatement()
{
eat(TokenType::TT_BEGIN);
CompoundNodePtr root = make_shared<CompoundNode>(statementList());
eat(TokenType::TT_END);
return root;
}
std::vector<ASTNodePtr> ASTParser::statementList()
{
ASTNodePtr node = statement();
std::vector<ASTNodePtr> result = { node };
while (currentTokenPtr->getType() == TokenType::TT_SEMI)
{
eat(TokenType::TT_SEMI);
result.push_back(statement());
}
if (currentTokenPtr->getType() != TokenType::TT_END)
{
raiseError();
}
return result;
}
ASTNodePtr ASTParser::statement()
{
if (currentTokenPtr->getType() == TokenType::TT_BEGIN)
{
return compoundStatement();
}
if (currentTokenPtr->getType() == TokenType::TT_ID)
{
return assignStatement();
}
return empty();
}
ASTNodePtr ASTParser::assignStatement()
{
ASTNodePtr left = variable();
TokenPtr opToken = currentTokenPtr;
eat(TokenType::TT_ASSIGN);
ASTNodePtr right = expr();
return make_shared<AssignNode>(left, opToken, right);
}
VarNodePtr ASTParser::variable()
{
VarNodePtr varNode = make_shared<VarNode>(currentTokenPtr);
eat(TokenType::TT_ID);
return varNode;
}
ASTNodePtr ASTParser::empty()
{
return make_shared<NoOpNode>();
}
ASTParser::ASTParser(std::string txt)
{
lexerPtr = make_shared<Lexer>(txt);
currentTokenPtr = lexerPtr->getNextTokenPtr();
}
ASTNodePtr ASTParser::parse()
{
ProgramNodePtr programNode = program();
if (currentTokenPtr->getType() != TokenType::TT_EOF)
{
raiseError();
}
return programNode;
}
ASTNodePtr ASTParser::expr()
{
ASTNodePtr node = term();
while (currentTokenPtr->getType() == TokenType::TT_PLUS ||
currentTokenPtr->getType() == TokenType::TT_MINUS)
{
TokenPtr tokenPtr = currentTokenPtr;
eat(tokenPtr->getType());
node = make_shared<BinOpNode>(node, tokenPtr, term());
}
return node;
}
ASTNodePtr ASTParser::term()
{
ASTNodePtr node = factory();
while (currentTokenPtr->getType() == TokenType::TT_MULTI ||
currentTokenPtr->getType() == TokenType::TT_INTEGER_DIV ||
currentTokenPtr->getType() == TokenType::TT_FLOAT_DIV)
{
TokenPtr tokenPtr = currentTokenPtr;
eat(currentTokenPtr->getType());
node = make_shared<BinOpNode>(node, tokenPtr, factory());
}
return node;
}
ASTNodePtr ASTParser::factory()
{
TokenPtr tokenPtr = currentTokenPtr;
if (tokenPtr->getType() == TokenType::TT_PLUS ||
tokenPtr->getType() == TokenType::TT_MINUS)
{
eat(tokenPtr->getType());
return make_shared<UnaryOpNode>(tokenPtr, factory());
}
if (tokenPtr->getType() == TokenType::TT_INTEGER_CONST)
{
eat(TokenType::TT_INTEGER_CONST);
return make_shared<NumNode>(tokenPtr);
}
if (tokenPtr->getType() == TokenType::TT_REAL_CONST)
{
eat(TokenType::TT_REAL_CONST);
return make_shared<NumNode>(tokenPtr);
}
if (tokenPtr->getType() == TokenType::TT_LPAREN)
{
eat(TokenType::TT_LPAREN);
ASTNodePtr result = expr();
eat(TokenType::TT_RPAREN);
return result;
}
if (tokenPtr->getType() == TokenType::TT_ID)
{
return variable();
}
raiseError();
return NULL;
}
更新Visitor
最后,在Interpreter中实现这些新增的node对应的visit函数:
#include "NodeVisitor/Interpreter.h"
#include "NodeVisitor/InterpreterException.h"
#include "ASTNodes/UnaryOpNode.h"
using namespace std;
#define INTERPRETER_VISITOR(NODE_NAME) DEFINE_VISITOR(Interpreter,NODE_NAME)
void Interpreter::raiseError(string content)
{
throw InterpreterException(content);
}
INTERPRETER_VISITOR(NumNode)
{
auto numNodePtr = dynamic_pointer_cast<NumNode>(nodePtr);
return numNodePtr->getTokenPtr()->getValue();
}
template<typename LT, typename RT>
TokenValue calculateBinOp(LT left, TokenType opType, RT right)
{
switch (opType)
{
case TokenType::TT_PLUS:
return left + right;
case TokenType::TT_MINUS:
return left - right;
case TokenType::TT_FLOAT_DIV:
return left / right;
case TokenType::TT_INTEGER_DIV:
return (int)(left / right);
case TokenType::TT_MULTI:
return left * right;
default:
throw InterpreterException("Syntax error.");
break;
}
return NULL;
}
INTERPRETER_VISITOR(BinOpNode)
{
auto binOpNodePtr = dynamic_pointer_cast<BinOpNode>(nodePtr);
ASTNodePtr leftNodePtr = binOpNodePtr->left();
ASTNodePtr rightNodePtr = binOpNodePtr->right();
TokenValue leftValue = this->visit(leftNodePtr);
TokenValue rightValue = this->visit(rightNodePtr);
if (std::holds_alternative<int>(leftValue) && std::holds_alternative<int>(rightValue))
{
return calculateBinOp<int, int>(get<int>(leftValue), binOpNodePtr->getOpTokenPtr()->getType(), get<int>(rightValue));
}
else if (std::holds_alternative<int>(leftValue) && std::holds_alternative<float>(rightValue))
{
return calculateBinOp<int, float>(get<int>(leftValue), binOpNodePtr->getOpTokenPtr()->getType(), get<float>(rightValue));
}
else if (std::holds_alternative<float>(leftValue) && std::holds_alternative<int>(rightValue))
{
return calculateBinOp<float, int>(get<float>(leftValue), binOpNodePtr->getOpTokenPtr()->getType(), get<int>(rightValue));
}
else if (std::holds_alternative<float>(leftValue) && std::holds_alternative<float>(rightValue))
{
return calculateBinOp<float, float>(get<float>(leftValue), binOpNodePtr->getOpTokenPtr()->getType(), get<float>(rightValue));
}
raiseError("Syntax error.");
}
INTERPRETER_VISITOR(UnaryOpNode)
{
auto unaryOpNode = dynamic_pointer_cast<UnaryOpNode>(nodePtr);
if (unaryOpNode->getOpTokenPtr()->getType() == TokenType::TT_PLUS)
{
return +get<int>(this->visit(unaryOpNode->getRightNodePtr()));
}
if (unaryOpNode->getOpTokenPtr()->getType() == TokenType::TT_MINUS)
{
return -get<int>(this->visit(unaryOpNode->getRightNodePtr()));
}
raiseError("Syntax error");
return NULL;
}
INTERPRETER_VISITOR(CompoundNode)
{
auto children = dynamic_pointer_cast<CompoundNode>(nodePtr)->getChildren();
for (ASTNodePtr child : children)
{
this->visit(child);
}
return NULL;
}
INTERPRETER_VISITOR(NoOpNode)
{
return NULL;
}
INTERPRETER_VISITOR(AssignNode)
{
auto node = dynamic_pointer_cast<AssignNode>(nodePtr);
auto varNode = dynamic_pointer_cast<VarNode>(node->getLeftPtr());
globalVariables[get<std::string>(varNode->getTokenPtr()->getValue())] = this->visit(node->getRightPtr());
return NULL;
}
INTERPRETER_VISITOR(VarNode)
{
auto varNode = dynamic_pointer_cast<VarNode>(nodePtr);
auto varName = get<string>(varNode->getTokenPtr()->getValue());
if (globalVariables.find(varName) == globalVariables.end())
{
raiseError("Name error");
}
return globalVariables[varName];
}
INTERPRETER_VISITOR(ProgramNode)
{
auto node = dynamic_pointer_cast<ProgramNode>(nodePtr);
return this->visit(node->getBlockNodePtr());
}
INTERPRETER_VISITOR(BlockNode)
{
auto node = dynamic_pointer_cast<BlockNode>(nodePtr);
for (auto decl : node->getDeclarations())
{
this->visit(decl);
}
this->visit(node->getCompound());
return NULL;
}
INTERPRETER_VISITOR(VarDeclNode)
{
return NULL;
}
INTERPRETER_VISITOR(TypeNode)
{
return NULL;
}
Interpreter::Interpreter(std::string text)
{
ADD_TO_VISIT_MAP(NumNode);
ADD_TO_VISIT_MAP(BinOpNode);
ADD_TO_VISIT_MAP(UnaryOpNode);
ADD_TO_VISIT_MAP(CompoundNode);
ADD_TO_VISIT_MAP(NoOpNode);
ADD_TO_VISIT_MAP(AssignNode);
ADD_TO_VISIT_MAP(VarNode);
ADD_TO_VISIT_MAP(ProgramNode);
ADD_TO_VISIT_MAP(BlockNode);
ADD_TO_VISIT_MAP(VarDeclNode);
ADD_TO_VISIT_MAP(TypeNode);
astParserPtr = make_shared<ASTParser>(text);
}
TokenValue Interpreter::interpret()
{
ASTNodePtr rootNode = astParserPtr->parse();
return this->visit(rootNode);
}
这里对visitBinOpNode的实现比较丑陋,因为C++不是个动态类型语言,所以对所有可能性进行了枚举,当然我也可以用两个double类型来临时存储变量,但那样同样会引起TokenValue自动类型推导的问题,同时也对内存不友好,当TokenValue的类型越来越多时这将成为灾难,但至少现在我们只需要4个分支,先凑合用着。
至此,我们成功完成了Pascal语法的扩展,是不是很简单,然后写一段代码测试:
#include <string>
#include "Parser/ASTParser.h"
#include "NodeVisitor/Interpreter.h"
#include <iostream>
int main()
{
std::string text = R"(
PROGRAM part10;
VAR
number : INTEGER;
a,b,c : INTEGER;
d : REAL;
BEGIN {part10}
a:=1;
b:=5;
d:=1.2;
c:= a+b;
c:= a DIV b;
d:= 1.0 / b;
END.
)";
ASTParser parser(text);
Interpreter inter(text);
inter.interpret();
std::cin.get();
return 0;
}
运行以后查看inter的globalVariables,可以看到所有变量都被正确地赋值了,再看看text的内容,我们的解释器越来越像那么回事了。