Preface
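In the previous article we focused on developing the VS Code extension and only touched on syntax checking. That check relied purely on regular expressions and did not cover every case, which left several problems:
- Errors can be missed, so the check has gaps.
- The underlined range cannot follow what the user actually wrote. For example, with class="ab" the underline does not include ab.
- The font-family rule is awkward to implement.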
This time we use an abstract syntax tree (AST) to improve the syntax check and resolve those leftover problems. The tool we use is htmlparser.
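An abstract syntax tree (AST), or simply syntax tree, is a tree representation of the abstract syntactic structure of source code, here specifically source code written in a programming language. The AST is the basis on which interpreters and compilers perform syntax analysis, and it is used widely:
JS: code minification, obfuscation, compilation
CSS: making code compatible with multiple versions
HTML: the implementation of the Virtual DOM in Vue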
htmlparser
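In short, the htmlparser package offers a convenient, concise way to process HTML files. It parses the tags of an HTML page into a tree of nodes. Each node is a plain object carrying fields such as type, name, attribs, and children, so the contents of a tag are easy to reach.
Installation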
npm install htmlparser
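Configuration
// Configuration
let handler = new htmlparser.DefaultHandler(
function (error: any, dom: any) {
if (error) {
// Parsing failed; the error could be logged here
}
},
{ ignoreWhitespace: true } // Ignore whitespace-only text (line breaks and spaces)
);
let parser = new htmlparser.Parser(handler);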
With the configuration in place, we can start parsing HTML.
Parsing HTML
// Check the syntax every time the HTML file changes
documents.onDidChangeContent((change) => {
connection.window.showInformationMessage("validateTextDocument");
validateTextDocument(change.document);
});
let diagnostics: Diagnostic[] = [];
async function validateTextDocument(textDocument: TextDocument): Promise<void> {
// Parse the document and run the EDM checks on the resulting tree
let text = textDocument.getText();
parser.parseComplete(text); // Parse the HTML
diagnostics = [];
testGrammar(handler.dom, textDocument); // handler.dom holds the parsed result
}
function testGrammar(dom: any[], textDocument: TextDocument) {
let domStr = JSON.stringify(dom, null, 2);
logger.info(domStr); // Log the tree with log4js
}
// Printed output
[
{
"raw": "!DOCTYPE html",
"data": "!DOCTYPE html",
"type": "directive",
"name": "!DOCTYPE"
},
{
"raw": "html lang=\"en\"",
"data": "html lang=\"en\"",
"type": "tag",
"name": "html",
"attribs": {
"lang": "en"
},
"children": [
{
"raw": "head",
"data": "head",
"type": "tag",
"name": "head",
"children": [
{
"raw": "meta charset=\"UTF-8\"",
"data": "meta charset=\"UTF-8\"",
"type": "tag",
"name": "meta",
"attribs": {
"charset": "UTF-8"
}
},
{
"raw": "meta http-equiv=\"X-UA-Compatible\" content=\"IE=edge\"",
"data": "meta http-equiv=\"X-UA-Compatible\" content=\"IE=edge\"",
"type": "tag",
"name": "meta",
"attribs": {
"http-equiv": "X-UA-Compatible",
"content": "IE=edge"
}
},
{
"raw": "meta name=\"viewport\" content=\"width=device-width, initial-scale=1.0\"",
"data": "meta name=\"viewport\" content=\"width=device-width, initial-scale=1.0\"",
"type": "tag",
"name": "meta",
"attribs": {
"name": "viewport",
"content": "width=device-width, initial-scale=1.0"
}
},
{
"raw": "title",
"data": "title",
"type": "tag",
"name": "title",
"children": [
{
"raw": "DOCUMENT",
"data": "DOCUMENT",
"type": "text"
}
]
},
{
"raw": "style",
"data": "style",
"type": "style",
"name": "style"
},
{
"raw": "link href=\"\" ",
"data": "link href=\"\"",
"type": "tag",
"name": "link",
"attribs": {
"href": ""
}
}
]
},
{
"raw": "body",
"data": "body",
"type": "tag",
"name": "body",
"children": [
{
"raw": "tr class = \"a1111\" style= \" position : relative; position : absolute; background: #333; margin : 0 auto; background: rgb(red, green, blue);background: url(ddd);\"",
"data": "tr class = \"a1111\" style= \" position : relative; position : absolute; background: #333; margin : 0 auto; background: rgb(red, green, blue);background: url(ddd);\"",
"type": "tag",
"name": "tr",
"attribs": {
"class": "a1111",
"style": " position : relative; position : absolute; background: #333; margin : 0 auto; background: rgb(red, green, blue);background: url(ddd);"
},
"children": [
{
"raw": "td style=\"color:#EF6C00;font-size: 14px!important;\"",
"data": "td style=\"color:#EF6C00;font-size: 14px!important;\"",
"type": "tag",
"name": "td",
"attribs": {
"style": "color:#EF6C00;font-size: 14px!important;"
},
"children": [
{
"raw": " \n ffffddddddd\n ",
"data": " \n ffffddddddd\n ",
"type": "text"
}
]
}
]
},
{
"raw": "tr",
"data": "tr",
"type": "tag",
"name": "tr",
"children": [
{
"raw": "td style=\"border-radius: 0!important;\"",
"data": "td style=\"border-radius: 0!important;\"",
"type": "tag",
"name": "td",
"attribs": {
"style": "border-radius: 0!important;"
},
"children": [
{
"raw": "div",
"data": "div",
"type": "tag",
"name": "div",
"children": [
{
"raw": "1111",
"data": "1111",
"type": "text"
}
]
},
{
"raw": "p",
"data": "p",
"type": "tag",
"name": "p",
"children": [
{
"raw": "222",
"data": "222",
"type": "text"
}
]
}
]
}
]
}
]
}
]
}
]
With the parsed tree in hand, we can easily traverse the content and check the syntax.
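To make the traversal idea concrete, here is a minimal sketch of a depth-first walk over nodes of the shape printed above. The `HtmlNode` interface is inferred from that output and is not an official htmlparser type; `walk`, `tagsWithClass`, and `sampleDom` are hypothetical names used only for illustration.

```typescript
// Node shape inferred from the printed output; an assumption, not a type shipped by htmlparser.
interface HtmlNode {
  type: string; // "tag", "text", "style", "directive", ...
  name?: string;
  raw: string;
  data: string;
  attribs?: Record<string, string>;
  children?: HtmlNode[];
}

// Depth-first walk that calls `visit` on every node together with its depth.
function walk(
  nodes: HtmlNode[],
  visit: (node: HtmlNode, depth: number) => void,
  depth = 0
): void {
  for (const node of nodes) {
    visit(node, depth);
    if (node.children) {
      walk(node.children, visit, depth + 1);
    }
  }
}

// A trimmed-down version of the tree shown in the log output.
const sampleDom: HtmlNode[] = [
  {
    type: "tag",
    name: "body",
    raw: "body",
    data: "body",
    children: [
      {
        type: "tag",
        name: "tr",
        raw: 'tr class="a1111"',
        data: 'tr class="a1111"',
        attribs: { class: "a1111" },
        children: [{ type: "text", raw: "1111", data: "1111" }],
      },
      { type: "style", name: "style", raw: "style", data: "style" },
    ],
  },
];

// Collect the names of all tags that carry a class attribute.
function tagsWithClass(dom: HtmlNode[]): string[] {
  const found: string[] = [];
  walk(dom, (node) => {
    if (node.type === "tag" && node.attribs && "class" in node.attribs) {
      found.push(node.name ?? "");
    }
  });
  return found;
}
```

Keeping the walk separate from the individual checks is one possible design; it lets each rule be a small visitor instead of growing one large recursive function.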
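EDM syntax rules
These are the main rules I have collected:
1. Do not use margin. It fails in most mail clients.
2. Use hexadecimal colors. rgb fails in some mail clients.
3. To make the Outlook client display the specified font, add font-family: 'Microsoft Yahei UI', Verdana,'Segoe UI', -apple-system, BlinkMacSystemFont, Roboto, 'Helvetica Neue', 'WenQuanYi Micro Hei', Arial, sans-serif; on the outermost table tag and on every tag that wraps text.
4. Do not use !important. It makes the style ineffective.
5. Do not set line-height on img, otherwise gaps appear around the image. Set vertical-align: top; in the inline style (in Baidu, gaps also appear inexplicably when this is not set), together with display:block;border:0;
6. Prefer <br> for line breaks rather than <p>.
7. Create blank space with td rather than margin. If you separate content with an empty td, it is best to put an &nbsp; inside it, otherwise it has no effect in Outlook.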
<tr>
<td height="34" valign="top"></td>
</tr>
<tr>
<td width="20"> </td>
</tr>
<tr>
<td width="" valign="top" style="padding-left:55px;">
<p>更多讲师</p>
<p>持续更新中......</p>
</td>
</tr>
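8. When a mail client auto-detects text and changes its color to the browser's default color, put that text inside an a tag and set style on it.
9. Do not use position.
10. Do not apply styles through class. Only inline styles work.
11. Do not set background images.
12. border-radius has no effect.
13. Do not abbreviate colors. Write #ffffff in full instead of #fff. Abbreviated colors cause minor problems in IE quirks mode.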
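Based on these rules, we turn the testGrammar method into a recursive function that checks whether every level of the tree meets the standard.
Take the style tag and the class attribute as examples:
The style tag should not appear at all. When a style tag is found in the tree, we generate a diagnostic, and we still need a regular expression to locate it in the text.
The rules and their messages are defined in regList, and the getDiagnostics method takes the content to match against.
In the previous article the rules were hard-coded, so one extra or missing space could produce a wrong result. This time I improved that by building each rule from the concrete content found in the AST, as the class entry in regList shows.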
function testGrammar(dom: any[], textDocument: TextDocument) {
let domStr = JSON.stringify(dom, null, 2);
logger.info(domStr);
dom.forEach((item: any) => {
if (item.type === "style") getDiagnostics("styleTag", textDocument);
if (item.type === "tag") {
let attribs = item.attribs || null;
if (attribs && "class" in attribs)
getDiagnostics("class", textDocument, attribs.class);
if (item.children && item.name !== "title") {
testGrammar(item.children, textDocument);
}
}
});
}
let regList = (content?: string): any => {
return {
link: {
reg: "<(|/)link.*>",
message:
"<link> is not supported by edm, use Inline CSS Style. \n edm不支持<link>标签,请使用内联样式",
},
class: {
reg: `${
content
? `class(\\s{0,}=\\s{0,}|=)"${content}"`
: `class(\\s{0,}=\\s{0,}|=)`
}`,
message:
"class is not supported by edm, use Inline CSS Style. \n edm不支持class属性,请使用内联样式",
},
}
}
function getDiagnostics(
content: string,
textDocument: TextDocument,
params?: string,
message?: string
) {
let m: RegExpExecArray | null;
let text = textDocument.getText();
let reg = params
? new RegExp(regList(params)[content].reg, "g")
: new RegExp(regList()[content].reg, "g");
while ((m = reg.exec(text))) {
let diagnostic: Diagnostic = {
severity: DiagnosticSeverity.Warning,
range: {
start: textDocument.positionAt(m.index),
end: textDocument.positionAt(m.index + m[0].length),
},
message: message ? message : regList()[content].message,
source: "edmHelper",
};
diagnostic.relatedInformation = [
{
location: {
uri: textDocument.uri,
range: Object.assign({}, diagnostic.range),
},
message: "edm grammar",
},
];
diagnostics.push(diagnostic);
}
connection.sendDiagnostics({ uri: textDocument.uri, diagnostics });
}
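One thing to watch when a pattern is built from AST content: the attribute value is interpolated into the regular expression as-is, so characters such as ( or . in a class name or style value would be read as regex syntax. The sketch below shows one way to guard against that. `escapeRegExp`, `classAttributePattern`, and `findOffsets` are hypothetical helpers, not part of the extension's code, and accepting single quotes is an added assumption.

```typescript
// Escape characters that have a special meaning inside a regular expression.
function escapeRegExp(value: string): string {
  return value.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: builds the pattern for one concrete class attribute,
// tolerating whitespace around "=" and either quote style.
function classAttributePattern(classValue: string): RegExp {
  return new RegExp(`class\\s*=\\s*["']${escapeRegExp(classValue)}["']`, "g");
}

// Returns the [start, end) offsets of every match. In the language server these
// offsets would be turned into positions with textDocument.positionAt().
function findOffsets(text: string, pattern: RegExp): Array<[number, number]> {
  const offsets: Array<[number, number]> = [];
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(text)) !== null) {
    offsets.push([m.index, m.index + m[0].length]);
    // An empty match never advances lastIndex, so step forward manually
    // to avoid looping forever.
    if (m[0].length === 0) {
      pattern.lastIndex++;
    }
  }
  return offsets;
}
```

The empty-match guard matters for rules whose pattern can end up as an empty string: a global regex that matches nothing-length text would otherwise keep returning the same position.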
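Let's look at the result.
This neatly solves the earlier problem of missed errors, and the underlined range is now much more sensible.
Checking the font-family rule
The font-family rule says that the outermost table tag and every tag that wraps text must carry font-family: 'Microsoft Yahei UI', Verdana,'Segoe UI', -apple-system, BlinkMacSystemFont, Roboto, 'Helvetica Neue', 'WenQuanYi Micro Hei', Arial, sans-serif;
Because I start from an EDM template when I build an EDM, here we only check whether tags that wrap text have font-family configured correctly.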
function testGrammar(dom: any[], textDocument: TextDocument) {
let domStr = JSON.stringify(dom, null, 2);
logger.info(domStr);
dom.forEach((item: any) => {
if (item.type === "style") getDiagnostics("styleTag", textDocument);
if (item.type === "tag") {
let attribs = item.attribs || null;
if (attribs && "class" in attribs)
getDiagnostics("class", textDocument, attribs.class);
if (item.children && item.name !== "title") {
item.children.forEach((tag: any) => {
if (tag.type === "text") {
if (attribs && attribs.style) {
let object: any = {};
attribs.style.includes(";") &&
attribs.style.split(";").forEach((style: any) => {
if (style.includes(":")) {
object[style.split(":")[0].trim()] = style.split(":")[1];
}
});
if (
"font-family" in object &&
// Trim first: the value after ":" usually starts with a space
object["font-family"].trim() !==
`'Microsoft Yahei UI', Verdana,'Segoe UI', -apple-system, BlinkMacSystemFont, Roboto, 'Helvetica Neue', 'WenQuanYi Micro Hei', Arial, sans-serif`
) {
getDiagnostics(
"font-family",
textDocument,
object["font-family"]
);
}
if (!("font-family" in object)) {
getDiagnostics(
"content",
textDocument,
`<\\s{0,}${item.raw}\\s{0,}>`,
"font-family must be `font-family: 'Microsoft Yahei UI', Verdana,'Segoe UI', -apple-system, BlinkMacSystemFont, Roboto, 'Helvetica Neue', 'WenQuanYi Micro Hei', Arial, sans-serif;`\n为了让outlook客户端显示指定字体,请在最外层table标签上,以及包裹了文字的标签上,加上font-family: 'Microsoft Yahei UI', Verdana,'Segoe UI', -apple-system, BlinkMacSystemFont, Roboto, 'Helvetica Neue', 'WenQuanYi Micro Hei', Arial, sans-serif"
);
}
} else {
// No style attribute at all, so no font-family either
getDiagnostics(
"content",
textDocument,
`<\\s{0,}${item.raw}\\s{0,}>`,
"font-family must be `font-family: 'Microsoft Yahei UI', Verdana,'Segoe UI', -apple-system, BlinkMacSystemFont, Roboto, 'Helvetica Neue', 'WenQuanYi Micro Hei', Arial, sans-serif;`\n为了让outlook客户端显示指定字体,请在最外层table标签上,以及包裹了文字的标签上,加上font-family: 'Microsoft Yahei UI', Verdana,'Segoe UI', -apple-system, BlinkMacSystemFont, Roboto, 'Helvetica Neue', 'WenQuanYi Micro Hei', Arial, sans-serif"
);
}
}
});
testGrammar(item.children, textDocument);
}
}
});
}
The corresponding rules also need to be added to regList:
"font-family": {
reg: `${
content
? `font-family(\\s{0,}:\\s{0,}|:)${content}`
: `font-family(\\s{0,}:\\s{0,}|:|)`
}`,
message:
"font-family must be `font-family: 'Microsoft Yahei UI', Verdana,'Segoe UI', -apple-system, BlinkMacSystemFont, Roboto, 'Helvetica Neue', 'WenQuanYi Micro Hei', Arial, sans-serif;`\n为了让outlook客户端显示指定字体,请在最外层table标签上,以及包裹了文字的标签上,加上font-family: 'Microsoft Yahei UI', Verdana,'Segoe UI', -apple-system, BlinkMacSystemFont, Roboto, 'Helvetica Neue', 'WenQuanYi Micro Hei', Arial, sans-serif",
},
content: {
reg: `${content ? `${content}` : ``}`,
},
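The style parsing in testGrammar splits each declaration on every ":" and compares the raw value, which is sensitive to spacing. As a rough alternative, the sketch below splits on the first ":" only, trims the parts, and ignores spacing around commas when comparing font lists. `parseInlineStyle`, `normalizeFontList`, and `checkFontFamily` are hypothetical names, and treating spacing differences as acceptable is my assumption rather than something the rule states.

```typescript
const REQUIRED_FONT_FAMILY =
  "'Microsoft Yahei UI', Verdana,'Segoe UI', -apple-system, BlinkMacSystemFont, Roboto, 'Helvetica Neue', 'WenQuanYi Micro Hei', Arial, sans-serif";

// Parse an inline style string into a property -> value record.
// Each declaration is split on its first ":" only, so values such as
// url(http://...) keep their colon, and a trailing ";" is optional.
// Limitation: values that contain ";" themselves are not handled.
function parseInlineStyle(style: string): Record<string, string | undefined> {
  const declarations: Record<string, string | undefined> = {};
  for (const declaration of style.split(";")) {
    const colon = declaration.indexOf(":");
    if (colon === -1) continue;
    const property = declaration.slice(0, colon).trim().toLowerCase();
    const value = declaration.slice(colon + 1).trim();
    if (property) declarations[property] = value;
  }
  return declarations;
}

// Normalise a font list so that spacing around commas is ignored.
function normalizeFontList(value: string): string {
  return value
    .split(",")
    .map((font) => font.trim())
    .join(",");
}

type FontCheck = "ok" | "missing" | "wrong";

// Classify the font-family of one style attribute value.
function checkFontFamily(style: string | undefined): FontCheck {
  const value = parseInlineStyle(style ?? "")["font-family"];
  if (value === undefined) return "missing";
  return normalizeFontList(value) === normalizeFontList(REQUIRED_FONT_FAMILY)
    ? "ok"
    : "wrong";
}
```

The three results map onto the two diagnostics in the article: "missing" corresponds to the content rule on the whole tag, and "wrong" to the font-family rule on the declaration.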
Result:
When font-family is not set at all:
When font-family is set but is not the required value:
The other rules are added to testGrammar in the same way.
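As an illustration of how several of the remaining rules could be checked on one style attribute, here is a small sketch. It is not the implementation used in the extension: `checkInlineStyle` and `StyleIssue` are hypothetical, the rule ids simply mirror the keys used in regList, and the regular expressions are my own reading of rules 1, 2, 4, 9, 11, 12, and 13.

```typescript
interface StyleIssue {
  rule: string; // hypothetical rule id, mirroring the keys used in regList
  property: string;
  value: string;
}

// Properties that the rules above forbid outright.
const FORBIDDEN_PROPERTIES = ["position", "margin", "border-radius", "background-image"];

// Check every declaration of one inline style string against the rules.
function checkInlineStyle(style: string): StyleIssue[] {
  const issues: StyleIssue[] = [];
  for (const declaration of style.split(";")) {
    const colon = declaration.indexOf(":");
    if (colon === -1) continue;
    const property = declaration.slice(0, colon).trim().toLowerCase();
    const value = declaration.slice(colon + 1).trim();

    if (FORBIDDEN_PROPERTIES.indexOf(property) !== -1) {
      issues.push({ rule: property, property, value });
    }
    // rgb(...) or rgba(...) instead of a hexadecimal color.
    if (/\brgba?\s*\(/i.test(value)) {
      issues.push({ rule: "rgb", property, value });
    }
    if (/!\s*important/i.test(value)) {
      issues.push({ rule: "!important", property, value });
    }
    // Three-digit hex such as #fff, but not #ffffff.
    if (/#[0-9a-f]{3}(?![0-9a-f])/i.test(value)) {
      issues.push({ rule: "color", property, value });
    }
    // Background image set through the background shorthand.
    if (property === "background" && /url\s*\(/i.test(value)) {
      issues.push({ rule: "background", property, value });
    }
  }
  return issues;
}
```

Returning plain issue objects instead of pushing diagnostics directly keeps the rule logic testable without a language server connection; mapping an issue to a text range would still need a search in the document text.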
Complete server.ts code
import {
createConnection,
TextDocuments,
Diagnostic,
DiagnosticSeverity,
ProposedFeatures,
InitializeParams,
TextDocumentSyncKind,
InitializeResult,
} from "vscode-languageserver/node";
import { TextDocument } from "vscode-languageserver-textdocument";
import { errorList, regList } from "./errorList";
let htmlparser = require("htmlparser");
import { configure, getLogger } from "log4js";
import * as _ from "lodash";
// Create a connection for the server, using Node's IPC as a transport.
// Also include all preview / proposed LSP features.
let connection = createConnection(ProposedFeatures.all);
// Create a simple text document manager.
let documents: TextDocuments<TextDocument> = new TextDocuments(TextDocument);
configure({
appenders: {
edm_helper: {
type: "dateFile",
filename: "/Users/grace/Documents/learn",
pattern: "yyyy-MM-dd-hh.log",
alwaysIncludePattern: true,
},
},
categories: { default: { appenders: ["edm_helper"], level: "debug" } },
});
const logger = getLogger("edm_helper");
let handler = new htmlparser.DefaultHandler(
function (error: any, dom: any) {
if (error) {
}
},
{ ignoreWhitespace: true }
);
let parser = new htmlparser.Parser(handler);
connection.onInitialize((params: InitializeParams) => {
const result: InitializeResult = {
capabilities: {
textDocumentSync: TextDocumentSyncKind.Full,
},
};
return result;
});
connection.onInitialized(() => {
connection.window.showInformationMessage("Hello World! from server side");
});
// The content of a text document has changed. This event is emitted
// when the text document first opened or when its content has changed.
documents.onDidChangeContent((change) => {
connection.window.showInformationMessage("validateTextDocument");
validateTextDocument(change.document);
});
function testGrammar(dom: any[], textDocument: TextDocument) {
let domStr = JSON.stringify(dom, null, 2);
logger.info(domStr);
dom.forEach((item: any) => {
if (item.name === "link") getDiagnostics("link", textDocument);
if (item.name === "div") getDiagnostics("div", textDocument);
if (item.type === "style") getDiagnostics("styleTag", textDocument);
if (item.type === "tag") {
let attribs = item.attribs || null;
if (attribs && "class" in attribs)
getDiagnostics("class", textDocument, attribs.class);
if (attribs && attribs.style) {
attribs.style.includes(";") &&
attribs.style.split(";").forEach((item: any) => {
if (item.includes(":")) {
if (item.split(":")[1].includes("rgb")) {
let str = item
.split(":")[1]
.replace(/\(/, "\\(")
.replace(/\)/, "\\)");
getDiagnostics("rgb", textDocument, str);
}
if (item.split(":")[1].includes("!important"))
getDiagnostics("!important", textDocument);
if (
item.split(":")[1].includes("#") &&
item.split(":")[1].trim().length === 4
)
getDiagnostics("color", textDocument, item.split(":")[1]);
let arr = [
"position",
"margin",
"border-radius",
"background-image",
];
if (_.includes(arr, item.split(":")[0].trim())) {
getDiagnostics(
item.split(":")[0].trim(),
textDocument,
item.split(":")[1]
);
}
if (
item.split(":")[0].trim() === "background" &&
item.split(":")[1].includes("url")
) {
let str = item
.split(":")[1]
.replace(/\(/, "\\(")
.replace(/\)/, "\\)");
getDiagnostics(item.split(":")[0].trim(), textDocument, str);
}
}
});
}
if (item.children && item.name !== "title") {
item.children.forEach((tag: any) => {
if (tag.type === "text") {
if (attribs && attribs.style) {
let object: any = {};
attribs.style.includes(";") &&
attribs.style.split(";").forEach((style: any) => {
if (style.includes(":")) {
object[style.split(":")[0].trim()] = style.split(":")[1];
}
});
if (
"font-family" in object &&
// Trim first: the value after ":" usually starts with a space
object["font-family"].trim() !==
`'Microsoft Yahei UI', Verdana,'Segoe UI', -apple-system, BlinkMacSystemFont, Roboto, 'Helvetica Neue', 'WenQuanYi Micro Hei', Arial, sans-serif`
) {
getDiagnostics(
"font-family",
textDocument,
object["font-family"]
);
}
if (!("font-family" in object)) {
getDiagnostics(
"content",
textDocument,
`<\\s{0,}${item.raw}\\s{0,}>`,
"font-family must be `font-family: 'Microsoft Yahei UI', Verdana,'Segoe UI', -apple-system, BlinkMacSystemFont, Roboto, 'Helvetica Neue', 'WenQuanYi Micro Hei', Arial, sans-serif;`\n为了让outlook客户端显示指定字体,请在最外层table标签上,以及包裹了文字的标签上,加上font-family: 'Microsoft Yahei UI', Verdana,'Segoe UI', -apple-system, BlinkMacSystemFont, Roboto, 'Helvetica Neue', 'WenQuanYi Micro Hei', Arial, sans-serif"
);
}
} else {
// No style attribute at all, so no font-family either
getDiagnostics(
"content",
textDocument,
`<\\s{0,}${item.raw}\\s{0,}>`,
"font-family must be `font-family: 'Microsoft Yahei UI', Verdana,'Segoe UI', -apple-system, BlinkMacSystemFont, Roboto, 'Helvetica Neue', 'WenQuanYi Micro Hei', Arial, sans-serif;`\n为了让outlook客户端显示指定字体,请在最外层table标签上,以及包裹了文字的标签上,加上font-family: 'Microsoft Yahei UI', Verdana,'Segoe UI', -apple-system, BlinkMacSystemFont, Roboto, 'Helvetica Neue', 'WenQuanYi Micro Hei', Arial, sans-serif"
);
}
}
});
testGrammar(item.children, textDocument);
}
}
});
}
function getDiagnostics(
content: string,
textDocument: TextDocument,
params?: string,
message?: string
) {
let m: RegExpExecArray | null;
let text = textDocument.getText();
let reg = params
? new RegExp(regList(params)[content].reg, "g")
: new RegExp(regList()[content].reg, "g");
while ((m = reg.exec(text))) {
let diagnostic: Diagnostic = {
severity: DiagnosticSeverity.Warning,
range: {
start: textDocument.positionAt(m.index),
end: textDocument.positionAt(m.index + m[0].length),
},
message: message ? message : regList()[content].message,
source: "edmHelper",
};
diagnostic.relatedInformation = [
{
location: {
uri: textDocument.uri,
range: Object.assign({}, diagnostic.range),
},
message: "edm grammar",
},
];
diagnostics.push(diagnostic);
}
connection.sendDiagnostics({ uri: textDocument.uri, diagnostics });
}
let diagnostics: Diagnostic[] = [];
async function validateTextDocument(textDocument: TextDocument): Promise<void> {
// Parse the document and run the EDM checks on the resulting tree
let text = textDocument.getText();
parser.parseComplete(text);
diagnostics = [];
testGrammar(handler.dom, textDocument);
}
// Make the text document manager listen on the connection
// for open, change and close text document events
documents.listen(connection);
// Listen on the connection
connection.listen();
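As far as I can tell from the listing, sendDiagnostics is only called from inside getDiagnostics, so a document that no longer triggers any rule never receives an empty list and its old warnings may stay on screen. A possible restructuring is to collect everything first and publish exactly once per validation. The sketch below uses minimal stand-in types so it runs without vscode-languageserver; `SimpleDiagnostic`, `Publish`, `Rule`, `validate`, and `importantRule` are all hypothetical names.

```typescript
// Minimal stand-ins for the LSP types, so the sketch runs on its own.
interface SimpleDiagnostic {
  message: string;
  start: number;
  end: number;
}
type Publish = (params: { uri: string; diagnostics: SimpleDiagnostic[] }) => void;

// Hypothetical rule signature: each rule returns the diagnostics it found.
type Rule = (text: string) => SimpleDiagnostic[];

// Run all rules, then publish the combined list exactly once.
function validate(uri: string, text: string, rules: Rule[], publish: Publish): void {
  const diagnostics: SimpleDiagnostic[] = [];
  for (const rule of rules) {
    diagnostics.push(...rule(text));
  }
  // Publishing also when the list is empty clears stale warnings in the editor.
  publish({ uri, diagnostics });
}

// Example rule: flag every occurrence of !important.
const importantRule: Rule = (text) => {
  const found: SimpleDiagnostic[] = [];
  const pattern = /!important/g;
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(text)) !== null) {
    found.push({
      message: "!important is not supported by edm",
      start: m.index,
      end: m.index + m[0].length,
    });
  }
  return found;
};
```

In the real server, `publish` would be a thin wrapper around connection.sendDiagnostics, and the offsets would be converted with textDocument.positionAt().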
regList (exported from ./errorList)
export let regList = (content?: string): any => {
return {
link: {
reg: "<(|/)link.*>",
message:
"<link> is not supported by edm, use Inline CSS Style. \n edm不支持<link>标签,请使用内联样式",
},
styleTag: {
reg: "<(|/)style.*>",
message:
"<style> is not supported by edm, use Inline CSS Style. \n edm不支持<style>标签,请使用内联样式",
},
div: {
reg: "<(|/)div.*>",
message:
"<div> is not supported by edm,use TABLE not DIV. \n edm不支持<div>标签,请使用table",
},
class: {
reg: `${
content
? `class(\\s{0,}=\\s{0,}|=)"${content}"`
: `class(\\s{0,}=\\s{0,}|=)`
}`,
message:
"class is not supported by edm, use Inline CSS Style. \n edm不支持class属性,请使用内联样式",
},
position: {
reg: `${
content
? `position(\\s{0,}:\\s{0,}|:)${content}`
: `position(\\s{0,}:\\s{0,}|:|)`
}`,
message: "position is not supported by edm \n edm不支持position定位",
},
margin: {
reg: `${
content
? `margin(\\s{0,}:\\s{0,}|:)${content}`
: `margin(\\s{0,}:\\s{0,}|:|)`
}`,
message:
"margin will fail in some mailboxes, use padding. \n margin在部分邮箱会失效,请使用padding",
},
"border-radius": {
reg: `${
content
? `border-radius(\\s{0,}:\\s{0,}|:)${content}`
: `border-radius(\\s{0,}:\\s{0,}|:|)`
}`,
message:
"border-radius is not supported by edm, use Image. \n edm不支持border-radius,请使用图片来实现圆角",
},
"background-image": {
reg: `${
content
? `background-image(\\s{0,}:\\s{0,}|:)${content}`
: `background-image(\\s{0,}:\\s{0,}|:|)`
}`,
message: "background-image is not supported by edm.\n edm不支持背景图片",
},
"font-family": {
reg: `${
content
? `font-family(\\s{0,}:\\s{0,}|:)${content}`
: `font-family(\\s{0,}:\\s{0,}|:|)`
}`,
message:
"font-family must be `font-family: 'Microsoft Yahei UI', Verdana,'Segoe UI', -apple-system, BlinkMacSystemFont, Roboto, 'Helvetica Neue', 'WenQuanYi Micro Hei', Arial, sans-serif;`\n为了让outlook客户端显示指定字体,请在最外层table标签上,以及包裹了文字的标签上,加上font-family: 'Microsoft Yahei UI', Verdana,'Segoe UI', -apple-system, BlinkMacSystemFont, Roboto, 'Helvetica Neue', 'WenQuanYi Micro Hei', Arial, sans-serif",
},
"!important": {
reg: `!important`,
message: "!important is not supported by edm. \n !important会导致样式无效",
},
background: {
reg: `${content ? `background(\\s{0,}:\\s{0,}|:)${content}` : ``}`,
message: "background image is not supported by edm. \n edm不支持背景图片",
},
rgb: {
reg: `${content ? `${content}` : ``}`,
message:
"rgb is not supported by edm, use hexadecimal colors. \n edm不支持rgb,请使用十六进制颜色",
},
color: {
reg: `${content ? `${content}` : ``}`,
message: "do not abbreviate color, such as #fff, write it as #ffffff",
},
content: {
reg: `${content ? `${content}` : ``}`,
},
};
};