prettyplease::unparse
=====================

[<img alt="github" src="https://img.shields.io/badge/github-dtolnay/prettyplease-8da0cb?style=for-the-badge&labelColor=555555&logo=github" height="20">](https://github.com/dtolnay/prettyplease)
[<img alt="crates.io" src="https://img.shields.io/crates/v/prettyplease.svg?style=for-the-badge&color=fc8d62&logo=rust" height="20">](https://crates.io/crates/prettyplease)
[<img alt="docs.rs" src="https://img.shields.io/badge/docs.rs-prettyplease-66c2a5?style=for-the-badge&labelColor=555555&logo=docs.rs" height="20">](https://docs.rs/prettyplease)
[<img alt="build status" src="https://img.shields.io/github/actions/workflow/status/dtolnay/prettyplease/ci.yml?branch=master&style=for-the-badge" height="20">](https://github.com/dtolnay/prettyplease/actions?query=branch%3Amaster)

A minimal `syn` syntax tree pretty-printer.

<br>

## Overview

This is a pretty-printer to turn a `syn` syntax tree into a `String` of
well-formatted source code. In contrast to rustfmt, this library is intended to
be suitable for arbitrary generated code.

Rustfmt prioritizes high-quality output that is impeccable enough that you'd be
comfortable spending your career staring at its output, but that requires some
heavyweight algorithms, and it has a tendency to bail out on code that is hard
to format (for example [rustfmt#3697], and there are dozens more issues like
it). That's not necessarily a big deal for human-generated code because when
code gets highly nested, the human will naturally be inclined to refactor into
more easily formattable code. But for generated code, having the formatter just
give up leaves it totally unreadable.

[rustfmt#3697]: https://github.com/rust-lang/rustfmt/issues/3697

This library is designed using the simplest possible algorithm and data
structures that can deliver about 95% of the quality of rustfmt-formatted
output. In my experience testing real-world code, approximately 97-98% of output
lines come out identical between rustfmt's formatting and this crate's. The rest
have slightly different linebreak decisions, but still clearly follow the
dominant modern Rust style.

The tradeoffs made by this crate are a good fit for generated code that you will
*not* spend your career staring at, such as the output of `bindgen` or of
`cargo-expand`. In those cases it's more important that the whole thing be
formattable without the formatter giving up than that it be flawless.

<br>

## Feature matrix

Here are a few superficial comparisons of this crate against the AST
pretty-printer built into rustc, and rustfmt. The sections below go into more
detail comparing the output of each of these libraries.

| | prettyplease | rustc | rustfmt |
|:---|:---:|:---:|:---:|
| non-pathological behavior on big or generated code | 💚 | ❌ | ❌ |
| idiomatic modern formatting ("locally indistinguishable from rustfmt") | 💚 | ❌ | 💚 |
| throughput | 60 MB/s | 39 MB/s | 2.8 MB/s |
| number of dependencies | 3 | 72 | 66 |
| compile time including dependencies | 2.4 sec | 23.1 sec | 29.8 sec |
| buildable using a stable Rust compiler | 💚 | ❌ | ❌ |
| published to crates.io | 💚 | ❌ | ❌ |
| extensively configurable output | ❌ | ❌ | 💚 |
| intended to accommodate hand-maintained source code | ❌ | ❌ | 💚 |

<br>

## Comparison to rustfmt

- [input.rs](https://github.com/dtolnay/prettyplease/blob/0.1.0/examples/input.rs)
- [output.prettyplease.rs](https://github.com/dtolnay/prettyplease/blob/0.1.0/examples/output.prettyplease.rs)
- [output.rustfmt.rs](https://github.com/dtolnay/prettyplease/blob/0.1.0/examples/output.rustfmt.rs)

If you weren't told which output file is which, it would be practically
impossible to tell, **except** for line 435 in the rustfmt output, which is
more than 1000 characters long because rustfmt just gave up formatting that
part of the file:

```rust
            match segments[5] {
                0 => write!(f, "::{}", ipv4),
                0xffff => write!(f, "::ffff:{}", ipv4),
                _ => unreachable!(),
            }
        } else { # [derive (Copy , Clone , Default)] struct Span { start : usize , len : usize , } let zeroes = { let mut longest = Span :: default () ; let mut current = Span :: default () ; for (i , & segment) in segments . iter () . enumerate () { if segment == 0 { if current . len == 0 { current . start = i ; } current . len += 1 ; if current . len > longest . len { longest = current ; } } else { current = Span :: default () ; } } longest } ; # [doc = " Write a colon-separated part of the address"] # [inline] fn fmt_subslice (f : & mut fmt :: Formatter < '_ > , chunk : & [u16]) -> fmt :: Result { if let Some ((first , tail)) = chunk . split_first () { write ! (f , "{:x}" , first) ? ; for segment in tail { f . write_char (':') ? ; write ! (f , "{:x}" , segment) ? ; } } Ok (()) } if zeroes . len > 1 { fmt_subslice (f , & segments [.. zeroes . start]) ? ; f . write_str ("::") ? ; fmt_subslice (f , & segments [zeroes . start + zeroes . len ..]) } else { fmt_subslice (f , & segments) } }
    } else {
        const IPV6_BUF_LEN: usize = (4 * 8) + 7;
        let mut buf = [0u8; IPV6_BUF_LEN];
        let mut buf_slice = &mut buf[..];
```

This is a pretty typical manifestation of rustfmt bailing out in generated code:
a chunk of the input ends up on one line. The other manifestation is that you're
working on some code, running rustfmt on save like a conscientious developer,
but after a while you notice it isn't doing anything. You introduce an
intentional formatting issue, like a stray indent or semicolon, and run rustfmt
to check your suspicion. Nope, it doesn't get cleaned up; rustfmt is just not
formatting the part of the file you are working on.
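
Because this kind of bail-out shows up as an abnormally long line, a rough check for it is easy to script. The sketch below is not part of prettyplease or rustfmt. The helper name `suspicious_lines` is hypothetical, and the 100-column threshold is an assumption chosen to match rustfmt's default `max_width`.

```rust
/// Hypothetical helper: returns `(line_number, length)` for every line longer
/// than `max_width` characters. In rustfmt output, such lines often mark a
/// region the formatter gave up on.
fn suspicious_lines(source: &str, max_width: usize) -> Vec<(usize, usize)> {
    source
        .lines()
        .enumerate()
        // Report 1-based line numbers, and count `char`s rather than bytes.
        .map(|(i, line)| (i + 1, line.chars().count()))
        .filter(|&(_, len)| len > max_width)
        .collect()
}
```

A hit is only a hint. Long string literals or deliberately unwrapped lines will also trigger it, and counting `char`s only approximates display width.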

The prettyplease library is designed to have no pathological cases that force it
to bail out; the entire input you give it will get formatted in some "good
enough" form.

Separately, rustfmt can be problematic to integrate into projects. It's written
using rustc's internal syntax tree, so it can't be built by a stable compiler.
Its releases are not regularly published to crates.io, so in Cargo builds you'd
need to depend on it as a git dependency, which also precludes publishing your
crate to crates.io. You can shell out to a `rustfmt` binary, but that'll be
whatever rustfmt version is installed on each developer's system (if any), which
can lead to spurious diffs in checked-in generated code formatted by different
versions. In contrast, prettyplease is designed to be easy to pull in as a
library, and it compiles fast.

<br>

## Comparison to rustc_ast_pretty

- [input.rs](https://github.com/dtolnay/prettyplease/blob/0.1.0/examples/input.rs)
- [output.prettyplease.rs](https://github.com/dtolnay/prettyplease/blob/0.1.0/examples/output.prettyplease.rs)
- [output.rustc.rs](https://github.com/dtolnay/prettyplease/blob/0.1.0/examples/output.rustc.rs)

This is the pretty-printer that gets used when rustc prints source code, such as
`rustc -Zunpretty=expanded`. It is also used by the standard library's
`stringify!` when stringifying an interpolated `macro_rules` AST fragment, such
as an `$:expr`, and transitively by `dbg!` and many macros in the ecosystem.
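
To look at this printer's output yourself, one option is to route an expression through a `$e:expr` fragment before stringifying it. That is the interpolated case described above. The macro name `show_expr` is hypothetical. The exact text produced varies between rustc versions, so the sketch only checks the general shape of the result.

```rust
/// Hypothetical macro: the `$e:expr` matcher turns the input into an
/// interpolated AST fragment, so (per the description above) `stringify!`
/// renders it with rustc's pretty-printer. The exact output depends on the
/// rustc version.
macro_rules! show_expr {
    ($e:expr) => {
        stringify!($e)
    };
}

fn rustc_rendering() -> &'static str {
    // `x` is never resolved: the expression is only parsed, then stringified.
    show_expr!(match x { 1 => "one", _ => "other" })
}
```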

Rustc's formatting is mostly okay, but does not hew closely to the dominant
contemporary style of Rust formatting. Some things wouldn't ever be written on
one line, like this `match` expression, and certainly not with a comma in front
of the closing brace:

```rust
fn eq(&self, other: &IpAddr) -> bool {
    match other { IpAddr::V4(v4) => self == v4, IpAddr::V6(_) => false, }
}
```

Some places use non-multiple-of-4 indentation, which is definitely not the norm:

```rust
pub const fn to_ipv6_mapped(&self) -> Ipv6Addr {
    let [a, b, c, d] = self.octets();
    Ipv6Addr{inner:
                 c::in6_addr{s6_addr:
                                 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0xFF,
                                  0xFF, a, b, c, d],},}
}
```

And although there isn't an egregious example of it in the link because the
input code is pretty tame, in general rustc_ast_pretty has pathological behavior
on generated code. It has a tendency to use excessive horizontal indentation and
rapidly run out of width:

```rust
::std::io::_print(::core::fmt::Arguments::new_v1(&[""],
                                                 &match (&msg,) {
                                                      _args =>
                                                      [::core::fmt::ArgumentV1::new(_args.0,
                                                                                    ::core::fmt::Display::fmt)],
                                                  }));
```

The snippets above are clearly different from modern rustfmt style. In contrast,
prettyplease is designed to have output that is practically indistinguishable
from rustfmt-formatted code.

<br>

## Example

```rust
// [dependencies]
// prettyplease = "0.2"
// syn = { version = "2", default-features = false, features = ["full", "parsing"] }

const INPUT: &str = stringify! {
    use crate::{
        lazy::{Lazy, SyncLazy, SyncOnceCell}, panic,
        sync::{ atomic::{AtomicUsize, Ordering::SeqCst},
            mpsc::channel, Mutex, },
        thread,
    };
    impl<T, U> Into<U> for T where U: From<T> {
        fn into(self) -> U { U::from(self) }
    }
};

fn main() {
    let syntax_tree = syn::parse_file(INPUT).unwrap();
    let formatted = prettyplease::unparse(&syntax_tree);
    print!("{}", formatted);
}
```

<br>

## Algorithm notes

The approach and terminology used in the implementation are derived from [*Derek
C. Oppen, "Pretty Printing" (1979)*][paper], on which rustc_ast_pretty is also
based, and from rustc_ast_pretty's implementation written by Graydon Hoare in
2011 (and modernized over the years by dozens of volunteer maintainers).

[paper]: http://i.stanford.edu/pub/cstr/reports/cs/tr/79/770/CS-TR-79-770.pdf

The paper describes two language-agnostic interacting procedures `Scan()` and
`Print()`. Language-specific code decomposes an input data structure into a
stream of `string` and `break` tokens, and `begin` and `end` tokens for
grouping. Each `begin`–`end` range may be identified as either "consistent
breaking" or "inconsistent breaking". If a group is consistently breaking, then
if the whole contents do not fit on the line, *every* `break` token in the group
will receive a linebreak. This is appropriate, for example, for Rust struct
literals, or arguments of a function call. If a group is inconsistently
breaking, then the `string` tokens in the group are greedily placed on the line
until out of space, and linebroken only at those `break` tokens for which the
next string would not fit. For example, this is appropriate for the contents of
a braced `use` statement in Rust.
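
The difference between the two break styles can be seen on a single flat group. The sketch below is a deliberately simplified model, not the streaming Scan/Print procedure. It handles one group of items separated by `,` plus a break, and it measures widths in bytes, which is adequate for ASCII identifiers. `Breaks` and `layout` are hypothetical names.

```rust
/// How a group's breaks are resolved once the group is too long for one line.
#[derive(Clone, Copy)]
enum Breaks {
    /// All-or-nothing: every break becomes a newline.
    Consistent,
    /// Greedy: a break becomes a newline only if the next item would overflow.
    Inconsistent,
}

/// Lays out one flat group of `items`, separated by `,` plus a break
/// (rendered as a space or a newline), within `width` columns.
fn layout(items: &[&str], width: usize, breaks: Breaks) -> String {
    let flat = items.join(", ");
    if flat.len() <= width {
        // The whole group fits, so neither style takes any break.
        return flat;
    }
    match breaks {
        Breaks::Consistent => items.join(",\n"),
        Breaks::Inconsistent => {
            let mut out = String::new();
            let mut col = 0;
            for (i, item) in items.iter().enumerate() {
                // The text this item contributes, including its comma.
                let piece = if i + 1 == items.len() {
                    item.to_string()
                } else {
                    format!("{item},")
                };
                if i > 0 {
                    // Stay on this line only if a space plus the piece fits.
                    if col + 1 + piece.len() <= width {
                        out.push(' ');
                        col += 1;
                    } else {
                        out.push('\n');
                        col = 0;
                    }
                }
                out.push_str(&piece);
                col += piece.len();
            }
            out
        }
    }
}
```

With `["Lazy", "SyncLazy", "SyncOnceCell", "AtomicUsize"]` and a width of 20, the consistent style puts every item on its own line. The inconsistent style packs `Lazy, SyncLazy,` onto the first line and wraps only where the next item would not fit.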

Scan's job is to efficiently accumulate sizing information about groups and
breaks. For every `begin` token we compute the distance to the matched `end`
token, and for every `break` we compute the distance to the next `break`. The
algorithm uses a ringbuffer to hold tokens whose size is not yet ascertained.
The maximum size of the ringbuffer is bounded by the target line length and does
not grow indefinitely, regardless of deep nesting in the input stream. That's
because once a group is sufficiently big, the precise size can no longer make a
difference to linebreak decisions and we can effectively treat it as "infinity".
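
Here is a sketch of the sizing rules, written as a single non-streaming pass over a fully buffered token vector. It therefore omits the bounded ringbuffer. The stack bookkeeping is modeled on the Scan procedure in Oppen's paper. The `Token` type, the `scan` function, and the `INFINITY` sentinel are this sketch's own names and do not reflect prettyplease's internals.

```rust
/// Tokens in the sense of the paper (names are this sketch's own).
enum Token {
    Text(&'static str),
    /// A possible linebreak that renders as `blank` spaces when not taken.
    Break { blank: usize },
    Begin,
    End,
}

/// Stand-in for "infinity": once a size exceeds the margin, its exact value
/// can no longer affect a linebreak decision.
const INFINITY: isize = isize::MAX;

/// A `Begin` receives the flat width of its group. A `Break` receives its own
/// blank width plus the width up to the next break in the same group (or the
/// group's end). The stream is assumed to be wrapped in a Begin/End pair.
fn scan(tokens: &[Token], margin: isize) -> Vec<isize> {
    let mut size = vec![0isize; tokens.len()];
    // Indices of Begin/Break tokens whose size is still pending.
    let mut stack: Vec<usize> = Vec::new();
    // Flat width of everything scanned so far.
    let mut right_total: isize = 0;
    for (i, token) in tokens.iter().enumerate() {
        match token {
            Token::Text(s) => {
                size[i] = s.len() as isize;
                right_total += s.len() as isize;
            }
            Token::Begin => {
                // Store the start offset negated; adding the end offset later
                // yields the group's width.
                size[i] = -right_total;
                stack.push(i);
            }
            Token::Break { blank } => {
                // A new break closes the span of the previous break in this group.
                if let Some(&top) = stack.last() {
                    if let Token::Break { .. } = tokens[top] {
                        size[top] += right_total;
                        stack.pop();
                    }
                }
                size[i] = -right_total;
                stack.push(i);
                right_total += *blank as isize;
            }
            Token::End => {
                // Close the group's last pending break (if any), then its Begin.
                let mut top = stack.pop().expect("unbalanced End");
                if let Token::Break { .. } = tokens[top] {
                    size[top] += right_total;
                    top = stack.pop().expect("unbalanced End");
                }
                size[top] += right_total;
            }
        }
    }
    size.into_iter()
        .map(|s| if s > margin { INFINITY } else { s })
        .collect()
}
```

In the streaming version, tokens are instead handed to Print as soon as the buffered width exceeds the available space, and the oldest pending entries are assigned infinity. The final saturation step here only imitates that effect.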

Print's job is to use the sizing information to efficiently assign a "broken" or
"not broken" status to every `begin` token. At that point the output is easily
constructed by concatenating `string` tokens and breaking at `break` tokens
contained within a broken group.
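
A matching sketch of Print follows. It takes the sizes as input and tracks the current column rather than a remaining-space counter. For the token stream in the test, the sizes are the ones a Scan-style sizing pass would produce. The sketch uses block-style indentation, meaning a broken group indents relative to the enclosing group's indentation rather than the column where it began. That choice, along with every type and function name here, is an assumption of the sketch and not prettyplease's API.

```rust
enum Token {
    Text(&'static str),
    /// `blank` spaces if not taken; otherwise a newline indented to the
    /// group's indentation plus `offset` (negative to dedent a closer).
    Break { blank: usize, offset: isize },
    /// Opens a group; if broken, its contents are indented by `indent` more
    /// than the enclosing indentation.
    Begin { consistent: bool, indent: usize },
    End,
}

/// The decision recorded for each open group.
enum Frame {
    Fits,
    Broken { outer: usize, consistent: bool },
}

/// `sizes[i]` is the size computed by the sizing pass for `tokens[i]`.
fn print_tokens(tokens: &[Token], sizes: &[isize], margin: usize) -> String {
    let mut out = String::new();
    let mut stack: Vec<Frame> = Vec::new();
    let mut indent = 0usize; // indentation used by breaks in the current group
    let mut col = 0usize; // current output column
    for (token, &size) in tokens.iter().zip(sizes) {
        // Room left on the current line.
        let space = margin as isize - col as isize;
        match token {
            Token::Begin { consistent, indent: extra } => {
                if size > space {
                    // Too big for the rest of the line: the group is broken.
                    stack.push(Frame::Broken { outer: indent, consistent: *consistent });
                    indent += *extra;
                } else {
                    stack.push(Frame::Fits);
                }
            }
            Token::End => {
                if let Some(Frame::Broken { outer, .. }) = stack.pop() {
                    indent = outer;
                }
            }
            Token::Break { blank, offset } => {
                let take = match stack.last() {
                    Some(Frame::Broken { consistent: true, .. }) => true,
                    // Inconsistent: break only if the next chunk won't fit.
                    Some(Frame::Broken { consistent: false, .. }) => size > space,
                    // A fitting group (or the top level) never breaks.
                    _ => false,
                };
                if take {
                    let to = (indent as isize + *offset).max(0) as usize;
                    out.push('\n');
                    out.push_str(&" ".repeat(to));
                    col = to;
                } else {
                    out.push_str(&" ".repeat(*blank));
                    col += *blank;
                }
            }
            Token::Text(s) => {
                out.push_str(s);
                col += s.len();
            }
        }
    }
    out
}
```

For the call `foo(a, b)` below, a width of 20 keeps everything flat. With a width of 6, the consistent group breaks at every break token. The inconsistent group breaks only before `b`, because `b)` still fits after the break.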

Leveraging these primitives (i.e., cleverly placing the all-or-nothing consistent
breaks and greedy inconsistent breaks) to yield rustfmt-compatible formatting
for all of Rust's syntax tree nodes is a fun challenge.

Here is a visualization of some Rust tokens fed into the pretty printing
algorithm. Consistently breaking `begin`–`end` pairs are represented by
`«`⁠`»`, inconsistently breaking ones by `‹`⁠`›`, and `break` by `·`; all
remaining non-whitespace text consists of `string` tokens.

```text
use crate::«{·
‹    lazy::«{·‹Lazy,· SyncLazy,· SyncOnceCell›·}»,·
    panic,·
    sync::«{·
‹        atomic::«{·‹AtomicUsize,· Ordering::SeqCst›·}»,·
        mpsc::channel,· Mutex›,·
    }»,·
    thread›,·
}»;·
«‹«impl<«·T‹›,· U‹›·»>» Into<«·U·»>· for T›·
where·
    U:‹ From<«·T·»>›,·
{·
«    fn into(·«·self·») -> U {·
‹        U::from(«·self·»)›·
»    }·
»}·
```

The algorithm described in the paper is not quite sufficient for producing
well-formatted Rust code that is locally indistinguishable from rustfmt's style.
The reason is that in the paper, the complete non-whitespace contents are
assumed to be independent of linebreak decisions, with Scan and Print being only
in control of the whitespace (spaces and line breaks). In Rust as idiomatically
formatted by rustfmt, that is not the case. Trailing commas are one example; the
punctuation is only known *after* the broken vs non-broken status of the
surrounding group is known:

```rust
let _ = Struct { x: 0, y: true };

let _ = Struct {
    x: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx,
    y: yyyyyyyyyyyyyyyyyyyyyyyyyyyyyy,   //<- trailing comma if the expression wrapped
};
```
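
The trailing-comma rule can be illustrated with a standalone helper. This is a hypothetical sketch, not how prettyplease is structured internally. It measures the literal as if it started at column 0, hard-codes a 4-space indent, and decides the layout with an explicit width check.

```rust
/// Hypothetical helper: formats `Name { field: value, ... }` on one line when
/// it fits in `width` columns; otherwise it puts one field per line, and every
/// field (including the last) gets a trailing comma.
fn struct_literal(name: &str, fields: &[(&str, &str)], width: usize) -> String {
    let parts: Vec<String> = fields.iter().map(|(f, v)| format!("{f}: {v}")).collect();
    let flat = format!("{name} {{ {} }}", parts.join(", "));
    if flat.len() <= width {
        // Flat form: no comma after the last field.
        return flat;
    }
    let mut out = format!("{name} {{\n");
    for part in &parts {
        // Broken form: the comma after the last field now appears too.
        out.push_str(&format!("    {part},\n"));
    }
    out.push('}');
    out
}
```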

The formatting of `match` expressions is another case; we want small arms on the
same line as the pattern, and big arms wrapped in a brace. Whether the braces,
comma, and semicolon are present all depends on whether the arm fits on the
line:

```rust
match total_nanos.checked_add(entry.nanos as u64) {
    Some(n) => tmp = n,   //<- small arm, inline with comma
    None => {
        total_secs = total_secs
            .checked_add(total_nanos / NANOS_PER_SEC as u64)
            .expect("overflow in iter::sum over durations");
    }   //<- big arm, needs brace added, and also semicolon^
}
```
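
The match-arm case can be sketched the same way. This hypothetical helper keeps a short arm inline with a trailing comma. Otherwise it moves the body into a block, where the body becomes a statement ending in `;` and the arm no longer takes a comma. It assumes the body is a single expression statement, and it does not re-wrap a body that is itself too long.

```rust
/// Hypothetical helper: formats one `match` arm at the given indentation.
fn match_arm(pat: &str, body: &str, indent: usize, width: usize) -> String {
    let pad = " ".repeat(indent);
    // Small arm: inline with the pattern, terminated by a comma.
    let inline = format!("{pad}{pat} => {body},");
    if inline.len() <= width {
        return inline;
    }
    // Big arm: brace added, body ends in a semicolon, no comma after `}`.
    format!("{pad}{pat} => {{\n{pad}    {body};\n{pad}}}")
}
```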

The printing algorithm implementation in this crate accommodates all of these
situations with conditional punctuation tokens whose selection can be deferred
and populated after it's known that the group is or is not broken.
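
As a rough illustration of the idea, the sketch below models one consistently breaking group that contains a conditional `IfBroken` punctuation token. The flat width is computed without that token. The group's broken or flat status is then decided, and only after that is the conditional token resolved. `Tok` and `render` are hypothetical names, and this is not prettyplease's actual token representation.

```rust
enum Tok {
    Text(&'static str),
    /// Prints `flat` if the group fits; otherwise a newline plus `indent` spaces.
    Break { flat: &'static str, indent: usize },
    /// Punctuation that appears only when the group is broken (e.g. a trailing comma).
    IfBroken(&'static str),
}

fn render(group: &[Tok], width: usize) -> String {
    // The flat width ignores conditional punctuation, which is absent when flat.
    let flat_width: usize = group
        .iter()
        .map(|t| match t {
            Tok::Text(s) => s.len(),
            Tok::Break { flat, .. } => flat.len(),
            Tok::IfBroken(_) => 0,
        })
        .sum();
    let broken = flat_width > width;
    let mut out = String::new();
    for t in group {
        match t {
            Tok::Text(s) => out.push_str(s),
            Tok::Break { flat, indent } => {
                if broken {
                    out.push('\n');
                    out.push_str(&" ".repeat(*indent));
                } else {
                    out.push_str(flat);
                }
            }
            // Resolved only now, after the broken/flat decision was made.
            Tok::IfBroken(s) => {
                if broken {
                    out.push_str(s);
                }
            }
        }
    }
    out
}
```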

<br>

#### License

<sup>
Licensed under either of <a href="LICENSE-APACHE">Apache License, Version
2.0</a> or <a href="LICENSE-MIT">MIT license</a> at your option.
</sup>

<br>

<sub>
Unless you explicitly state otherwise, any contribution intentionally submitted
for inclusion in this crate by you, as defined in the Apache-2.0 license, shall
be dual licensed as above, without any additional terms or conditions.
</sub>