SCL:V1 Specification
This document defines the only valid grammar for SCL:V1. All engines and SDKs MUST match this exactly: bytes in → AST out → canonical JSON out → hash out.
No changes will ever be made to SCL:V1 after launch. Future revisions belong to SCL:V2.
0. Normative Definitions
0.1 Byte Model
SCL:V1 documents are sequences of bytes.
0.2 Encoding (UTF-8 Only)
- The entire document MUST be valid UTF-8 (RFC 3629).
- Engines MUST NOT apply Unicode normalization (no NFC/NFD/NFKC/NFKD). Bytes are canonical.
- Any invalid UTF-8 sequence MUST raise E001.
- For E001 on invalid UTF-8, byte_offset MUST be the first byte of the invalid UTF-8 sequence.
0.3 Line Endings (LF Only)
- The only permitted line ending is LF (0x0A).
- CR (0x0D) is forbidden anywhere in the document (including inside quoted strings and raw content) and MUST raise E001 at that byte offset.
0.4 Forbidden Bytes (Global)
- Tabs (0x09) are forbidden anywhere in the document (including inside quoted strings and raw content) and MUST raise E001 at that byte offset.
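The three global E001 checks in §0.2 to §0.4 can run in one pass over the raw bytes, before any grammar parsing. The sketch below is a minimal illustration; the name first_global_error is hypothetical. It relies on Python's strict UTF-8 codec (which rejects overlong forms, surrogates, and code points above U+10FFFF, as RFC 3629 requires) and on UnicodeDecodeError.start, the index of the first byte of the invalid sequence.

```python
from typing import List, Optional


def first_global_error(doc: bytes) -> Optional[int]:
    """Return the lowest byte offset violating §0.2 to §0.4 (all E001), or None."""
    candidates: List[int] = []
    try:
        doc.decode("utf-8")  # strict: raises on the first invalid sequence, never normalizes
    except UnicodeDecodeError as exc:
        candidates.append(exc.start)  # first byte of the invalid sequence (§0.2)
    for forbidden in (b"\r", b"\t"):  # CR (§0.3) and tab (§0.4) are banned everywhere
        index = doc.find(forbidden)
        if index != -1:
            candidates.append(index)
    return min(candidates) if candidates else None
```

Taking the minimum over all candidates keeps the result consistent with the first-failure rule in §0.6 when, for example, a tab precedes an invalid byte.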
0.5 Forbidden Control Characters Inside Quoted Strings
Inside any quoted string (handle tags or SCL quoted content):
- Any Unicode code point U+0000 through U+001F OR U+007F is forbidden.
- A literal LF inside a quoted string is forbidden (quoted strings never span physical lines).
- Violation MUST raise E001.
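UTF-8 uses the bytes 0x00 to 0x1F and 0x7F only for the code points U+0000 to U+001F and U+007F themselves (every byte of a multibyte sequence is 0x80 or higher), so the §0.5 check can scan bytes without decoding. A minimal sketch, assuming the caller passes the bytes between the quotes together with the document offset of the first of them; quoted_control_error is a hypothetical name.

```python
from typing import Optional


def quoted_control_error(inner: bytes, base: int) -> Optional[int]:
    """Return the document offset of the first forbidden byte in a quoted string, or None.

    inner: the bytes between the opening and closing quote.
    base: the document offset of inner[0].
    """
    for i, byte in enumerate(inner):
        # U+0000..U+001F (which includes LF, CR and tab) and U+007F are forbidden (§0.5).
        if byte <= 0x1F or byte == 0x7F:
            return base + i
    return None
```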
0.6 Deterministic Error Rule ("First Failure")
- Parsing proceeds left-to-right, top-to-bottom.
- The engine MUST report the first error by byte offset (the lowest offset where parsing cannot continue).
- byte_offset is 0-based, counting from the first byte of the document.
- If multiple rules fail at the same byte offset, precedence is (highest first):
  - E001 (encoding/control/CR/tab)
  - Header/block structural errors (E101/E102/E103/E104/E105)
  - Handle/tag errors (E201/E202)
  - E900 (only if no other code applies)
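If an engine gathers candidate failures from more than one check (for instance, the global byte scan and the grammar walk), the reported error can be chosen by sorting on offset first and precedence tier second. This is a sketch with hypothetical names (PRECEDENCE, select_reported_error). §0.6 does not rank codes within a tier (for example E102 versus E103), so ties inside a tier keep whichever candidate was listed first, which is how Python's min behaves.

```python
from typing import Dict, Iterable, Tuple

# §0.6 tiers: a lower number wins when two failures share a byte offset.
PRECEDENCE: Dict[str, int] = {
    "E001": 0,
    "E101": 1, "E102": 1, "E103": 1, "E104": 1, "E105": 1,
    "E201": 2, "E202": 2,
    "E900": 3,
}


def select_reported_error(failures: Iterable[Tuple[str, int]]) -> Tuple[str, int]:
    """Return the (code, byte_offset) to report: lowest offset, then highest precedence."""
    return min(failures, key=lambda failure: (failure[1], PRECEDENCE[failure[0]]))
```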
(code, byte_offset) internally even if the public API only returns code.
0.7 Token Bytes and Comparisons
Unless explicitly stated otherwise:
- All fixed keywords and punctuation are ASCII bytes and MUST match exactly.
- "No trailing spaces" means no 0x20 bytes before LF/EOF where disallowed.
1. Document Structure
A valid SCL:V1 document MUST be exactly:
- Header line: SCL:V1
- One blank line (exactly \n\n)
- Handles block
- SCL block, immediately following the handles block with no blank line between
- End of document exactly at the final } of the SCL block (no extra bytes)
No other blank lines are permitted anywhere else in the document, except within raw-mode SCL content (§4.2).
A "blank line" is a physical line consisting of exactly LF with no preceding bytes.
Top-level order is fixed: Header → Handles → SCL.
Any violation MUST raise a deterministic error per §0.6 and the error codes table.
2. Header
Valid header line is exactly:
SCL:V1
Rules:
- The only valid version header for SCL:V1 documents is exactly SCL:V1.
- Engines MUST reject any other version identifier with E101.
- Future versions (e.g. SCL:V2) require separate explicit support and are not valid input to V1 engines.
- The first byte of the file MUST be S (0x53).
- No UTF-8 BOM is allowed. If the file begins with 0xEF 0xBB 0xBF, raise E101 at byte offset 0.
- No leading or trailing spaces.
- Followed by exactly one blank line (\n\n) and no additional blank lines.
Invalid header → E101 (byte offset is the first byte where the header cannot match, per §0.6).
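The header rules reduce to a byte-by-byte comparison against the fixed prefix SCL:V1\n\n, which yields the §0.6 offset directly. A sketch; check_header and SclError are hypothetical names, and raising E101 at offset 8 for a second blank line is an assumption drawn from the "no additional blank lines" rule.

```python
class SclError(Exception):
    """Hypothetical error carrying an SCL:V1 error code and byte offset."""

    def __init__(self, code: str, offset: int) -> None:
        super().__init__(f"{code} at byte {offset}")
        self.code = code
        self.offset = offset


HEADER = b"SCL:V1\n\n"  # header line plus exactly one blank line
BOM = b"\xef\xbb\xbf"


def check_header(doc: bytes) -> int:
    """Validate §2 and return the offset where the handles block must begin."""
    if doc.startswith(BOM):
        raise SclError("E101", 0)  # BOM is rejected at offset 0
    for i, expected in enumerate(HEADER):
        if i >= len(doc) or doc[i] != expected:
            raise SclError("E101", i)  # first byte where the header cannot match
    if doc[len(HEADER):len(HEADER) + 1] == b"\n":
        raise SclError("E101", len(HEADER))  # assumed: extra blank line is E101
    return len(HEADER)
```

For example, an SCL:V2 header fails at offset 5, the first byte that differs from SCL:V1.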
3. Handles Block
3.1 Exact Block Delimiters
The handles block MUST be:
- Opening line exactly: handles { followed immediately by LF (\n). No trailing spaces, nothing after {.
- Closing line exactly: } followed by LF (\n).
Missing handles block → E102
EOF before handles closes → E103
3.2 Handles Block Contents
- At least one handle definition is required. An empty handles block (handles {\n}\n) MUST raise E102 at the byte offset of the } that closes the block.
- No nested blocks.
- Inside handles { ... }, every physical line (each ending in LF) MUST be either a handle definition line (§3.3) or the closing delimiter line }\n (§3.1).
- Indentation, if present on handle definition lines, MUST be spaces only (0x20).
- Whitespace-only lines are forbidden.
- Any other non-empty line inside the handles block that is not a valid handle definition line and not }\n MUST raise E102.
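One way to walk the handles block so that errors still surface in byte order is to hand each definition line to the §3.3 parser before reading the next line. The sketch below assumes a caller-supplied parse_line(line, offset) callback, where line excludes its LF. The names read_handles_block and SclError, and the choice of E102 for a blank or whitespace-only line, are assumptions rather than part of the spec. As a simplification, it reports E103 at EOF without first checking an unterminated final line for lower-offset errors.

```python
from typing import Callable


class SclError(Exception):
    """Hypothetical error carrying an SCL:V1 error code and byte offset."""

    def __init__(self, code: str, offset: int) -> None:
        super().__init__(f"{code} at byte {offset}")
        self.code = code
        self.offset = offset


OPENING = b"handles {\n"


def read_handles_block(doc: bytes, pos: int,
                       parse_line: Callable[[bytes, int], None]) -> int:
    """Walk the §3.1/§3.2 structure from pos; return the offset just past the closing line."""
    for i, expected in enumerate(OPENING):
        if pos + i >= len(doc) or doc[pos + i] != expected:
            raise SclError("E102", pos + i)  # missing or malformed opening line
    pos += len(OPENING)
    count = 0
    while True:
        lf = doc.find(b"\n", pos)
        if lf == -1:
            raise SclError("E103", len(doc))  # EOF before the closing line
        line = doc[pos:lf]
        if line == b"}":
            if count == 0:
                raise SclError("E102", pos)  # empty handles block
            return lf + 1
        if line.strip(b" ") == b"":
            raise SclError("E102", pos)  # blank or whitespace-only line (assumed E102)
        parse_line(line, pos)  # §3.3 parsing runs before the next line is read
        count += 1
        pos = lf + 1
```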
3.3 Handle Definition Line Grammar
A handle definition is one physical line ending with LF:
<indent><handle_id>(<tag_list>)\n
<handle_id>
Regex (ASCII only):
^[A-Za-z_][A-Za-z0-9_]*$
Invalid → E201
Exact Punctuation / Spacing Rules
- No spaces between <handle_id> and (.
- No spaces after ( or before ).
- No spaces around commas in tag lists.
- No trailing spaces at end of line.
Deterministic Handle-line Parsing (MANDATORY)
Engines MUST parse each non-empty handles-block line using this fixed procedure:
- Parse <indent> as zero or more 0x20 bytes.
- Parse <handle_id> as bytes matching [A-Za-z_][A-Za-z0-9_]*. Otherwise raise E201.
- The next byte MUST be ( with no intervening spaces. If it is LF or a space, raise E201.
- Parse <tag_list> inside the parentheses using §3.4. Any failure raises E202 (unless E001 applies).
- After the closing ), the next byte MUST be LF. If any other byte occurs, raise E201.
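Steps 1, 2, 3 and 5 of the procedure above map onto a regex match and two single-byte checks. A sketch, assuming each line arrives without its trailing LF (so "the next byte is LF" becomes "the line has ended") and that base is the document offset of the line's first byte. The function names are hypothetical, and treating any non-( byte after the identifier as E201 extends the spec's explicit LF/space case.

```python
import re
from typing import Tuple


class SclError(Exception):
    """Hypothetical error carrying an SCL:V1 error code and byte offset."""

    def __init__(self, code: str, offset: int) -> None:
        super().__init__(f"{code} at byte {offset}")
        self.code = code
        self.offset = offset


HANDLE_ID = re.compile(rb"[A-Za-z_][A-Za-z0-9_]*")  # ASCII-only <handle_id>


def parse_handle_prefix(line: bytes, base: int) -> Tuple[str, int]:
    """Steps 1-3: return (handle_id, index just past '(')."""
    pos = 0
    while pos < len(line) and line[pos] == 0x20:  # step 1: <indent>, spaces only
        pos += 1
    match = HANDLE_ID.match(line, pos)  # step 2: <handle_id>
    if match is None:
        raise SclError("E201", base + pos)
    pos = match.end()
    # step 3: '(' must follow directly; any other byte or end of line is E201 here
    if pos >= len(line) or line[pos] != ord("("):
        raise SclError("E201", base + pos)
    return match.group().decode("ascii"), pos + 1


def check_line_end(line: bytes, pos: int, base: int) -> None:
    """Step 5: after ')' the line must end, so its LF is the next byte."""
    if pos < len(line):
        raise SclError("E201", base + pos)
```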
3.4 Tag List Grammar
Tags appear inside parentheses:
id("tag1","tag2")
Rules:
- Tag list MUST contain one or more tags.
- Empty tag list () is forbidden → E202.
- Tags MUST be double-quoted with " characters.
- Commas MUST separate tags; no trailing comma.
- No spaces anywhere inside the parentheses.
- Each tag does not span lines, contains no forbidden control characters (§0.5), and is valid UTF-8.
Invalid tag syntax → E202. Invalid UTF-8/control/CR/tab → E001.
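A tag-list parser can be a small loop over the line's bytes. This sketch assumes the line arrives without its trailing LF and that pos is the index just past the (; parse_tag_list is a hypothetical name. It reads "no spaces anywhere inside the parentheses" literally, so a space even inside a quoted tag raises E202, and it accepts an empty tag "" because §3.4 does not address that case. Reaching the end of the line inside a tag is reported as E001 at the LF, since a quoted string cannot contain a literal LF (§0.5).

```python
from typing import List, Tuple


class SclError(Exception):
    """Hypothetical error carrying an SCL:V1 error code and byte offset."""

    def __init__(self, code: str, offset: int) -> None:
        super().__init__(f"{code} at byte {offset}")
        self.code = code
        self.offset = offset


QUOTE, COMMA, CLOSE = ord('"'), ord(","), ord(")")


def parse_tag_list(line: bytes, pos: int, base: int) -> Tuple[List[str], int]:
    """Parse §3.4 tags from pos (just past '('); return (tags, index just past ')')."""
    tags: List[str] = []
    while True:
        # each tag must open with '"'; this catches '()', a trailing comma and stray spaces
        if pos >= len(line) or line[pos] != QUOTE:
            raise SclError("E202", base + pos)
        end = pos + 1
        while end < len(line) and line[end] != QUOTE:
            byte = line[end]
            if byte == 0x20:
                raise SclError("E202", base + end)  # literal reading: no spaces inside ( ... )
            if byte <= 0x1F or byte == 0x7F:
                raise SclError("E001", base + end)  # §0.5 control character
            end += 1
        if end >= len(line):
            raise SclError("E001", base + end)  # reached the line's LF inside a quoted string
        tags.append(line[pos + 1:end].decode("utf-8"))  # assumes §0.2 validation already passed
        pos = end + 1
        if pos < len(line) and line[pos] == COMMA:
            pos += 1  # another tag must follow
        elif pos < len(line) and line[pos] == CLOSE:
            return tags, pos + 1
        else:
            raise SclError("E202", base + pos)
```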
4. SCL Block
4.1 Exact Block Delimiters
- Opening line exactly: scl { followed immediately by LF (\n). No trailing spaces.
- The document MUST end immediately after the closing } that terminates the SCL block, with no newline and no trailing bytes.
Missing SCL block → E104
EOF before SCL block closes → E105
4.2 Content Modes
Immediately after scl {\n, content MUST be in exactly one of two modes. Once selected by the first content line, all subsequent lines MUST conform to that mode.
Mode A — Quoted Mode
A quoted-mode body is one or more physical lines, each of the form:
<indent>"<bytes>"\n
- <indent> is zero or more spaces (0x20).
- Each line MUST contain exactly one opening " and one closing " on the same physical line.
- No bytes are permitted after the closing " (the byte after it MUST be LF).
- Bytes inside quotes are taken literally; there is no escape processing in V1.
- No forbidden control characters (§0.5).
Termination: the body ends when the next line is exactly } followed by EOF (no trailing LF).
AST content: strip the outer quotes from each line and join the inner contents with \n between lines.
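Mode A parsing can be sketched as a loop that consumes one quoted line at a time until the remaining bytes are exactly }. Here tail is every byte after scl {\n and base is its document offset; parse_quoted_mode is a hypothetical name, and it assumes mode selection already found an opening quote on the first content line. Using E104 for bytes after a closing quote or after the final }, and E105 for EOF before the terminator, are interpretations of the error table rather than rules stated in §4.2.

```python
from typing import List


class SclError(Exception):
    """Hypothetical error carrying an SCL:V1 error code and byte offset."""

    def __init__(self, code: str, offset: int) -> None:
        super().__init__(f"{code} at byte {offset}")
        self.code = code
        self.offset = offset


def parse_quoted_mode(tail: bytes, base: int) -> str:
    """Parse a Mode A body and return the AST content string."""
    lines: List[str] = []
    pos = 0
    while True:
        if tail[pos:pos + 1] == b"}":
            if pos + 1 == len(tail):
                return "\n".join(lines)  # '}' followed by EOF ends the body
            raise SclError("E104", base + pos + 1)  # bytes after the final '}'
        lf = tail.find(b"\n", pos)
        end = len(tail) if lf == -1 else lf  # end of the current physical line
        while pos < end and tail[pos] == 0x20:  # <indent>: spaces only
            pos += 1
        if pos == end:  # blank line, or EOF while looking for content
            raise SclError("E105" if lf == -1 else "E104", base + pos)
        if tail[pos] != ord('"'):
            raise SclError("E104", base + pos)  # line is neither quoted nor '}'
        close = tail.find(b'"', pos + 1, end)
        stop = end if close == -1 else close
        for i in range(pos + 1, stop):  # scan quoted bytes in order (§0.5)
            if tail[i] <= 0x1F or tail[i] == 0x7F:
                raise SclError("E001", base + i)
        if close == -1:  # string not closed on this line: literal LF, or EOF
            raise SclError("E105" if lf == -1 else "E001", base + end)
        if close + 1 != end:
            raise SclError("E104", base + close + 1)  # bytes after the closing quote
        if lf == -1:
            raise SclError("E105", base + end)  # EOF before the '}' terminator
        lines.append(tail[pos + 1:close].decode("utf-8"))  # outer quotes stripped
        pos = lf + 1
```

Note that the indentation never reaches the content string, which is one reason two differently indented documents can share an AST.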
Mode B — Raw Mode
Raw mode begins when the first content line after scl {\n does NOT begin (after optional spaces) with ".
- Raw content begins on the line after scl {\n.
- It ends when the final bytes match: optional spaces, then }, then EOF.
- If the candidate terminator matches ^[ ]*}[ ]+$, raise E104 at the first trailing space.
- Empty physical lines within raw content are allowed.
- Raw content must be valid UTF-8; otherwise raise E001.
AST content: all raw lines before the terminator, joined with \n.
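Mode selection and Mode B parsing are sketched below under one assumption the spec leaves open: the candidate terminator is the final physical line, so raw content is everything before the last LF. select_mode and parse_raw_mode are hypothetical names, tail is every byte after scl {\n, and reporting E105 at EOF when no terminator matches is likewise an interpretation.

```python
import re


class SclError(Exception):
    """Hypothetical error carrying an SCL:V1 error code and byte offset."""

    def __init__(self, code: str, offset: int) -> None:
        super().__init__(f"{code} at byte {offset}")
        self.code = code
        self.offset = offset


TERMINATOR = re.compile(rb" *\}")  # optional spaces, then '}'
BAD_TERMINATOR = re.compile(rb"( *\})( +)")  # the same, followed by trailing spaces


def select_mode(tail: bytes) -> str:
    """Quoted mode iff the first content line starts, after optional spaces, with '"'."""
    return "quoted" if tail.lstrip(b" ").startswith(b'"') else "raw"


def parse_raw_mode(tail: bytes, base: int) -> str:
    """Parse a Mode B body and return the AST content string."""
    last_lf = tail.rfind(b"\n")
    candidate = tail[last_lf + 1:]  # assumed: the terminator is the final physical line
    if TERMINATOR.fullmatch(candidate):
        raw = tail[:last_lf] if last_lf != -1 else b""
        return raw.decode("utf-8")  # lines before the terminator, still joined by LF
    bad = BAD_TERMINATOR.fullmatch(candidate)
    if bad:
        raise SclError("E104", base + last_lf + 1 + bad.end(1))  # first trailing space
    raise SclError("E105", base + len(tail))  # no valid terminator before EOF
```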
5. AST Canonical Form
Every engine MUST emit exactly this AST object model:
```json
{
  "type":"Document",
  "version":"SCL:V1",
  "handles":[
    {
      "type":"Handle",
      "id":"<id>",
      "tags":["<tag1>","<tag2>"]
    }
  ],
  "scl":{
    "type":"SclBlock",
    "content":"<content-bytes-as-utf8-string>",
    "refs":[],
    "hints":[]
  }
}
```
Invariants:
- refs and hints MUST exist and MUST be empty arrays in V1.
- handles MUST preserve input order.
- tags MUST preserve input order.
- content MUST equal the exact mode-derived content (§4.2).
- No extra fields. No nulls.
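Building the §5 object is mostly a matter of emitting the fixed field set and keeping input order. A minimal sketch, assuming the parser has already produced (handle_id, tags) pairs in document order and the mode-derived content string; build_ast is a hypothetical name.

```python
from typing import Any, Dict, List, Tuple


def build_ast(handles: List[Tuple[str, List[str]]], content: str) -> Dict[str, Any]:
    """Assemble the §5 AST: fixed field set, input order preserved, no nulls."""
    return {
        "type": "Document",
        "version": "SCL:V1",
        "handles": [
            {"type": "Handle", "id": handle_id, "tags": list(tags)}
            for handle_id, tags in handles  # document order, never sorted
        ],
        "scl": {
            "type": "SclBlock",
            "content": content,
            "refs": [],  # always empty in V1
            "hints": [],  # always empty in V1
        },
    }
```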
6. Canonical JSON & Document Hash
The canonical JSON bytes are the sole hash input for SCL:V1.
Engines MUST canonicalize the parsed document (AST canonical form; §5) to canonical JSON bytes. The Document Hash is:
doc_hash = SHA256(canonical_json_bytes)
6.1 Canonical JSON Rules
- Encoding: UTF-8, no BOM.
- Single-line: no whitespace bytes (0x20, 0x09, 0x0A, 0x0D) outside JSON strings.
- Key uniqueness: object keys MUST be unique.
- Key ordering: sorted by binary lexicographic order of UTF-8 byte sequences. A shorter key sorts first if it is a strict prefix of the other.
- Arrays: preserve order.
- String escaping (minimal): only " and \ MUST be escaped. Control characters U+0000–U+001F MUST use \u00XX. No other escapes permitted.
- No trailing commas.
- Schema: only fields defined in §5. No numbers, booleans, or null.
6.2 Document Hash Invariance
The hash input is the canonical JSON byte sequence exactly as emitted (no Unicode normalization, newline translation, or whitespace modification).
Any two SCL inputs that parse to the same AST MUST produce identical canonical JSON bytes and therefore identical doc_hash.
7. Error Codes
| Code | Meaning |
|---|---|
| E001 | Invalid UTF-8, CR, tab, or forbidden control character |
| E101 | Missing or invalid SCL:V1 header |
| E102 | Handles block missing, empty, or contains an invalid line |
| E103 | Unclosed handles block (EOF before closing }) |
| E104 | Missing SCL block, or invalid token where SCL content/terminator expected |
| E105 | Unclosed SCL block (EOF before valid SCL termination) |
| E201 | Invalid handle ID |
| E202 | Invalid tag list or tag token |
| E900 | Internal parser error |
8. Forbidden Syntax
Everything not explicitly allowed by this spec is forbidden, including:
- Comments of any form
- Additional top-level blocks
- Nested blocks
- Alternative keywords/casing
- Extra blank lines (outside raw-mode content as described in §4.2)
- Trailing bytes after the final }
9. Change Control
SCL:V1 is frozen after launch. Any change requires SCL:V2.