Hyperscan (introduction)

Introduction

Hyperscan is a software regular expression matching engine designed with high performance and flexibility in mind. It is implemented as a standalone library that exposes a straightforward C API.

The Hyperscan API itself is composed of two major components: compilation and scanning.

Compilation

These functions take a group of regular expressions, along with identifiers and option flags, and compile them into an immutable database that can be used by the Hyperscan scanning API. This compilation process performs considerable analysis and optimization work in order to build a database that will match the given expressions efficiently.

If a pattern cannot be built into a database for any reason (such as the use of an unsupported expression construct, or the overflowing of a resource limit), an error will be returned by the pattern compiler.
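The compile step described above can be pictured with a small conceptual model. This is only a sketch in Python's standard `re` module, not the Hyperscan API: the names `compile_database`, `CompileError`, and `Entry` are hypothetical, and Python's regex engine stands in for the Hyperscan compiler. It shows the shape of the workflow: a group of expressions with IDs and flags goes in, an immutable database comes out, and one bad expression fails the whole build with an error that identifies it.

```python
import re
from typing import NamedTuple


class CompileError(Exception):
    """Raised when one expression in the group cannot be compiled (hypothetical)."""

    def __init__(self, expr_id: int, message: str):
        super().__init__(f"expression {expr_id}: {message}")
        self.expr_id = expr_id


class Entry(NamedTuple):
    expr_id: int
    regex: re.Pattern


def compile_database(expressions):
    """Compile (pattern, expr_id, flags) triples into an immutable tuple."""
    entries = []
    for pattern, expr_id, flags in expressions:
        try:
            entries.append(Entry(expr_id, re.compile(pattern, flags)))
        except re.error as exc:
            # One expression that cannot be built fails the whole compile.
            raise CompileError(expr_id, str(exc)) from exc
    return tuple(entries)  # a tuple cannot be modified after creation


db = compile_database([
    (r"foo\d+", 1, re.IGNORECASE),
    (r"bar", 2, 0),
])
```

In this model the caller learns which expression failed through `CompileError.expr_id`; the real library reports compile failures through its own error type in the C API.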

Compiled databases can be serialized and relocated, so that they can be stored to disk or moved between hosts. They can also be targeted to particular platform features (for example, the use of Intel® Advanced Vector Extensions 2 (Intel® AVX2) instructions).

Scanning

Once a Hyperscan database has been created, it can be used to scan data in memory. Hyperscan provides several scanning modes, depending on whether the data to be scanned is available as a single contiguous block, whether it is distributed amongst several blocks in memory at the same time, or whether it is to be scanned as a sequence of blocks in a stream.

Matches are delivered to the application via a user-supplied callback function that is called synchronously for each match.
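The callback-driven delivery of matches can be sketched as follows. Again this is a conceptual model built on Python's `re` module rather than the Hyperscan API: the function `scan` and the callback signature `(expr_id, start, end, context)` are assumptions made for illustration. It covers only the simplest case, a single contiguous block, and it reports matches pattern by pattern, which is a simplification.

```python
import re


def scan(db, data, on_match, context=None):
    """Scan one contiguous block, calling on_match synchronously per match.

    Returns True if the whole block was scanned, False if the callback
    asked to stop by returning a truthy value.
    """
    for expr_id, regex in db:
        for m in regex.finditer(data):
            if on_match(expr_id, m.start(), m.end(), context):
                return False
    return True


# A tiny "database": (expression ID, compiled pattern) pairs.
db = ((1, re.compile(r"foo\d+")), (2, re.compile(r"bar")))

hits = []


def record(expr_id, start, end, ctx):
    """Callback: store the match in the caller-supplied context and continue."""
    ctx.append((expr_id, start, end))
    return False


scan(db, "foo12 bar foo7", record, hits)
# hits == [(1, 0, 5), (1, 10, 14), (2, 6, 9)]
```

Because the callback runs inside the scan call, any per-match work it does adds directly to scan time; the `context` argument lets the application pass its own state through without globals.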

For a given database, Hyperscan provides several guarantees:

No memory allocations occur at runtime with the exception of two fixed-size allocations, both of which should be done ahead of time for performance-critical applications:

Scratch space: temporary memory used for internal data at scan time. Structures in scratch space do not persist beyond the end of a single scan call.

Stream state: in streaming mode only, some state space is required to store data that persists between scan calls for each stream. This allows Hyperscan to track matches that span multiple blocks of data.

The sizes of the scratch space and stream state (in streaming mode) required for a given database are fixed and determined at database compile time. This means that the memory requirements of the application are known ahead of time, and these structures can be pre-allocated if required for performance reasons.

Any pattern that has successfully been compiled by the Hyperscan compiler can be scanned against any input. There are no internal resource limits or other limitations at runtime that could cause a scan call to return an error.
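To make the idea of fixed-size stream state concrete, here is a toy model, assuming a single non-empty literal pattern and a Knuth-Morris-Pratt matcher. It is not how Hyperscan is implemented, and the names `build_failure_table`, `Stream`, and `scan_stream` are hypothetical. The point it illustrates is that the per-stream state (two integers here) has a size fixed by the compiled pattern, not by the data, and that it is what lets a match span two blocks.

```python
def build_failure_table(literal: bytes):
    """Compile step: build the KMP failure table once; it is never modified."""
    table = [0] * len(literal)
    k = 0
    for i in range(1, len(literal)):
        while k and literal[i] != literal[k]:
            k = table[k - 1]
        if literal[i] == literal[k]:
            k += 1
        table[i] = k
    return tuple(table)


class Stream:
    """Per-stream state: two integers, however much data is scanned."""

    def __init__(self):
        self.matched = 0  # length of the literal prefix matched so far
        self.offset = 0   # bytes consumed across all blocks of this stream


def scan_stream(literal, table, stream, block, on_match):
    """Scan one block of a stream, resuming from the saved stream state."""
    for byte in block:
        while stream.matched and byte != literal[stream.matched]:
            stream.matched = table[stream.matched - 1]
        if byte == literal[stream.matched]:
            stream.matched += 1
        stream.offset += 1
        if stream.matched == len(literal):
            on_match(stream.offset)  # end offset within the whole stream
            stream.matched = table[stream.matched - 1]


literal = b"needle"
table = build_failure_table(literal)
stream = Stream()
ends = []
for block in (b"hay nee", b"dle hay needle"):
    scan_stream(literal, table, stream, block, ends.append)
# ends == [10, 21]: the first match straddles the block boundary
```

Nothing in `scan_stream` depends on how much data has already been seen, so in this model a scan call cannot fail because of the input, which mirrors the last guarantee above.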
