66 lines
2.4 KiB
Markdown
66 lines
2.4 KiB
Markdown
This directory contains tests for serialized objects from the regex-automata
|
|
crate. Currently, there are only two supported such objects: dense and sparse
|
|
DFAs.
|
|
|
|
The idea behind these tests is to commit some serialized objects and run some
|
|
basic tests by deserializing them and running searches and ensuring they are
|
|
correct. We also make sure these are run under Miri, since deserialization is
|
|
one of the biggest places where undefined behavior might occur in this crate
|
|
(at the time of writing).
|
|
|
|
The main thing we're testing is that the *current* code can still deserialize
|
|
*old* objects correctly. Generally speaking, compatibility extends to semver
|
|
compatible releases of this crate. Beyond that, no promises are made, although
|
|
in practice callers can at least depend on errors occurring. (The serialized
|
|
format always includes a version number, and incompatible changes increment
|
|
that version number such that an error will occur if an unsupported version is
|
|
detected.)
|
|
|
|
To generate the dense DFAs, I used this command:
|
|
|
|
```
|
|
$ regex-cli generate serialize dense regex \
|
|
MULTI_PATTERN_V2 \
|
|
tests/gen/dense/ \
|
|
--rustfmt \
|
|
--safe \
|
|
--starts-for-each-pattern \
|
|
--specialize-start-states \
|
|
--start-kind both \
|
|
--unicode-word-boundary \
|
|
--minimize \
|
|
'\b[a-zA-Z]+\b' \
|
|
'(?m)^\S+$' \
|
|
'(?Rm)^\S+$'
|
|
```
|
|
|
|
And to generate the sparse DFAs, I used this command, which is the same as
|
|
above, but with `s/dense/sparse/g`.
|
|
|
|
```
|
|
$ regex-cli generate serialize sparse regex \
|
|
MULTI_PATTERN_V2 \
|
|
tests/gen/sparse/ \
|
|
--rustfmt \
|
|
--safe \
|
|
--starts-for-each-pattern \
|
|
--specialize-start-states \
|
|
--start-kind both \
|
|
--unicode-word-boundary \
|
|
--minimize \
|
|
'\b[a-zA-Z]+\b' \
|
|
'(?m)^\S+$' \
|
|
'(?Rm)^\S+$'
|
|
```
|
|
|
|
The idea is to try to enable as many of the DFA's options as possible in order
|
|
to test that serialization works for all of them.
|
|
|
|
Arguably we should increase test coverage here, but this is a start. Note
|
|
that in particular, this does not need to test that serialization and
|
|
deserialization correctly round-trips on its own. Indeed, the normal regex test
|
|
suite has a test that does a serialization round trip for every test supported
|
|
by DFAs. So that has very good coverage. What we're interested in testing here
|
|
is our compatibility promise: do DFAs generated with an older revision of the
|
|
code still deserialize correctly?
|