core.text — UTF-8 text, Char, formatting, regex
import StdlibStatus from '@site/src/components/StdlibStatus';
<StdlibStatus
status="partial"
detail="63/67 protocol-conformance tests pass on 2026-05-14. §C (Iterator.next dispatch) closed 2026-05-13. Remaining defects span Text.capacity tracking (task #5), KMP find, function-id collision, Char.encode_utf8 receiver-kind, Text.truncate NullPointer."
defects={[
{area: 'text', summary: 'Task #5: Text.capacity() lost cap field through small-string materialisation in with_capacity/try_with_capacity. §A rfind dispatch / §B Char.encode_utf8 receiver-kind / §D function-id collision / §E truncate NullPointer / §F KMP find byte-indexing — see audit §C closed.'},
{area: 'char', summary: '5 defect classes — &mut Char mutation, eq_ignore_ascii_case, from_digit hex, general_category misroute, AnyChar.matches (§E now closes via shared root with text/text §C)'},
{area: 'builder', summary: 'Int.BAnd / Int.BNeq dispatch broken — every push fails'},
{area: 'regex', summary: 'Verum/Rust intrinsic ABI bridge defects — find_all SetIdx NullPointer, Maybe
Status legend. See stdlib status badge system.
Text is the most-used type in the standard library after
Int, so any defect in Text propagates broadly. The conformance suite at core-tests/text/ pins every public API behaviour with @test, every algebraic law with @property or exhaustive sweeps, every cross-stdlib integration with @test, and every active defect with a regression test marked @ignore. When a defect closes, removing the @ignore should immediately turn the test green.
Text is Verum's string type: UTF-8, mutable in-place via &mut, with
SSO (small-string optimisation — up to 23 bytes stored inline,
no allocation). The flat layout {ptr: &unsafe Byte, len: Int, cap: Int}
is 24 bytes; cap == 0 indicates a static / immutable string literal.
| File | What's in it | Status |
|---|---|---|
| text.vr | Text + 100+ method API surface | regression-only |
| char.vr | Char + classification, conversion, CharPattern, GeneralCategory | partial |
| format.vr | Formatter, FormatSpec, Alignment, Sign, DebugStruct/Tuple/List/Map, Write, print/println/eprint/eprintln, dbg, format_display, format_debug | partial |
| regex.vr | Regex, RegexError, 7 intrinsics (is_match, find, find_all, replace, replace_all, split, captures) | regression-only |
| tagged_literals.vr | validate_json / validate_sql / validate_uri runtime validators | regression-only |
| case_fold.vr | fold_char_ascii / fold_byte_ascii / fold_text_ascii / compare_ascii_nocase / equal_ascii_nocase (SQLite NOCASE) | complete |
| builder.vr | TextBuilder (incremental string construction) | regression-only |
| numeric/ | Decimal, BigInt, BigDecimal, Rational, Modular | partial |
Text implements: Clone, Drop, Eq, Ord, Hash, Default,
Length, Debug, Display, AsRef<[Byte]>, Add, Add<&Text>,
AddAssign<Text>, AddAssign<&Text>, From<&str>, From<Char>,
FromStr (and FromStr for every primitive type via Text), FromIterator<Char>,
Extend<Char>, Extend<Text>.
Text
Construction
let s: Text = "hello"; // static literal (cap = 0)
let s2 = Text.new(); // empty
let s3 = Text.with_capacity(64); // pre-allocate; len() == 0
let s4 = Text.try_with_capacity(1024); // fallible counterpart
let s5 = Text.from_utf8(bytes)?; // Result<Text, Utf8Error>
let s5a = Text.from_bytes(bytes)?; // compatibility alias for from_utf8
let s6 = Text.from_utf8_lossy(bytes); // replaces invalid bytes with U+FFFD
let s7 = Text.from_utf16(units)?; // Result<Text, Utf16Error>
let s8 = Text.from_utf16_lossy(units); // U+FFFD on unpaired surrogates
let s9 = Text.from_char('A'); // single-char Text
let sa = Text.from_int(42); // "42"
let sb = Text.from_float(3.14); // "3.14"
let sc = Text.from_bool(true); // "true"
let sd = f"x={x}, y={y + 1}"; // format literal
Length & Capacity
s.len() -> Int // byte length
s.is_empty() -> Bool
s.char_count() -> Int // Unicode-scalar count (iterates)
s.capacity() -> Int // capacity (see "Capacity tracking" below)
s.as_str() -> &Text // borrow as &Text
s.as_bytes() -> &[Byte] // raw UTF-8 byte view
s.as_ptr() -> &unsafe Byte // raw pointer (unsafe APIs)
Capacity tracking
capacity() reports the byte budget the buffer can hold without
reallocating. The semantics differ by underlying representation:
| Representation | capacity() |
|---|---|
| Static literal "..." (cap = 0, immutable) | len() |
| Small-string (NaN-boxed inline, ≤6 bytes) | len() |
| Heap-allocated flat [hdr][len:u64][bytes…] | len() |
| Builder layout {ptr, len, cap} (from with_capacity / try_with_capacity / reserve) | the cap field |
The first three are immutable views — pushing past len() requires
migrating to a builder layout, so the reported capacity equals the
current byte length. Only the builder layout carries a separate cap
field that can exceed len().
Tier-0 caveat (open, task #5): the contract is that the Tier-0 interpreter
materialises Text.with_capacity / try_with_capacity results into a
representation that preserves the cap field. Earlier revisions of the
runtime collapsed them to a small string, so capacity() reported 0. Tests
pin the contract at core-tests/text/text/regression_test.vr::regression_with_capacity_reports_capacity (and siblings).
Indexing (byte- and char-based)
s.byte_at(i) -> Maybe<Byte> // raw byte
s.char_at(byte_idx) -> Maybe<Char> // char starting at byte_idx
s.nth_char(n) -> Maybe<Char> // n-th Unicode scalar
s.byte_index_of_char(n) -> Maybe<Int> // byte offset of n-th char
s.is_char_boundary(idx) -> Bool // safe byte-split point
Text is not character-indexed — s[i] would be ambiguous over UTF-8
and is deliberately not provided.
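Slicing and searching work in byte offsets, so converting a char offset into a byte offset means walking the string. The sketch below shows that conversion in Rust, whose str uses the same UTF-8 byte-offset model. The three functions are hypothetical stand-ins for Text's methods, not Verum code. The substring stand-in returns an Option so it can report out-of-range offsets.

```rust
/// Stand-in for Text.nth_char: the n-th Unicode scalar (0-based), if any.
fn nth_char(s: &str, n: usize) -> Option<char> {
    s.chars().nth(n)
}

/// Stand-in for Text.byte_index_of_char: byte offset where the n-th scalar starts.
fn byte_index_of_char(s: &str, n: usize) -> Option<usize> {
    s.char_indices().nth(n).map(|(i, _)| i)
}

/// Char-range slice [start_char, end_char): convert both ends to byte offsets first.
/// An end equal to the char count maps to the byte length.
fn substring(s: &str, start_char: usize, end_char: usize) -> Option<&str> {
    let to_byte = |n: usize| {
        if n == s.chars().count() {
            Some(s.len())
        } else {
            byte_index_of_char(s, n)
        }
    };
    let (a, b) = (to_byte(start_char)?, to_byte(end_char)?);
    if a <= b { Some(&s[a..b]) } else { None }
}
```

For "héllo" (where 'é' takes two bytes), char 2 starts at byte 3, and byte 2 falls inside 'é'. Slicing there would split a scalar, which is exactly what is_char_boundary guards against.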
Iteration
s.chars() -> Chars // Iterator<Char>
s.bytes() -> ByteIter // Iterator<Byte>
s.char_indices() -> CharIndices // Iterator<(Int, Char)>
s.lines() -> Lines // Iterator<&Text> (split on '\n')
s.matches(pat) -> TextMatches
s.match_indices(pat) -> TextMatchIndices
s.to_chars() -> List<Char> // collect-to-list shortcut
All four iterator types implement Iterator, IntoIterator, and
FusedIterator.
Slicing
s.slice(start_byte, end_byte) -> Text // byte range
s.substring(start_char, end_char) -> Text // char range
s.split_at(mid: Int) -> (Text, Text) // byte split
Predicates
s.starts_with(prefix: &Text) -> Bool
s.ends_with(suffix: &Text) -> Bool
s.contains(needle: &Text) -> Bool
s.contains_any(chars: &List<Char>) -> Bool
s.is_empty() -> Bool
s.is_ascii() -> Bool
s.is_numeric() -> Bool // every char is_numeric
s.is_alphabetic() -> Bool // every char is_alphabetic
s.is_alphanumeric() -> Bool
s.is_whitespace() -> Bool // every char is_whitespace; "" → false
s.is_blank() -> Bool // empty or whitespace-only
s.is_uppercase() -> Bool
s.is_lowercase() -> Bool
Searching
s.find(needle: &Text) -> Maybe<Int> // first byte index (KMP)
s.rfind(needle: &Text) -> Maybe<Int> // last byte index
s.find_char(ch: Char) -> Maybe<Int> // first byte of char
s.index_of(needle: &Text) -> Maybe<Int> // alias for find
s.index_of_any(chars: &List<Char>) -> Maybe<Int>
s.count(needle: &Text) -> Int // non-overlapping count
s.count_matches(pattern: &Text) -> Int // alias of count
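s.find is documented as returning the first byte index using KMP. Below is a textbook Knuth-Morris-Pratt search over UTF-8 bytes, written in Rust as an illustration. It is not Verum's implementation; the status badge lists an open KMP find byte-indexing defect. Because UTF-8 is self-synchronising, a byte-level match of a valid needle inside a valid haystack always begins on a char boundary. Returning Some(0) for an empty needle is an assumption borrowed from Rust's str::find.

```rust
/// Returns the byte offset of the first occurrence of `needle` in `haystack`.
fn kmp_find(haystack: &str, needle: &str) -> Option<usize> {
    let (h, n) = (haystack.as_bytes(), needle.as_bytes());
    if n.is_empty() {
        return Some(0); // assumption: empty needle matches at offset 0
    }
    // failure[i] = length of the longest proper prefix of n[..=i] that is also its suffix.
    let mut failure = vec![0usize; n.len()];
    let mut k = 0;
    for i in 1..n.len() {
        while k > 0 && n[i] != n[k] {
            k = failure[k - 1];
        }
        if n[i] == n[k] {
            k += 1;
        }
        failure[i] = k;
    }
    // Single pass over the haystack; on a mismatch only `k` falls back, never `i`.
    let mut k = 0;
    for (i, &b) in h.iter().enumerate() {
        while k > 0 && b != n[k] {
            k = failure[k - 1];
        }
        if b == n[k] {
            k += 1;
        }
        if k == n.len() {
            return Some(i + 1 - n.len());
        }
    }
    None
}
```

The failure table lets the scan reuse the longest already-matched prefix, so the search runs in O(len(haystack) + len(needle)).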
Splitting
s.split(sep: &Text) -> List<Text>
s.splitn(n: Int, sep: &Text) -> List<Text>
s.rsplit(sep: &Text) -> List<Text>
s.rsplitn(n: Int, sep: &Text) -> List<Text>
s.split_whitespace() -> List<Text>
s.split_ascii_whitespace() -> List<Text>
s.split_inclusive(sep: &Text) -> List<Text> // keep separator
s.split_once(sep: &Text) -> Maybe<(Text, Text)>
s.rsplit_once(sep: &Text) -> Maybe<(Text, Text)>
s.words() -> List<Text> // split_whitespace alias
s.lines() -> Lines
Trimming & Stripping
s.trim() -> Text
s.trim_start() -> Text
s.trim_end() -> Text
s.trim_matches(pattern: &Text) -> Text
s.trim_start_matches(pattern: &Text) -> Text
s.trim_end_matches(pattern: &Text) -> Text
s.strip_prefix(prefix: &Text) -> Maybe<Text> // Some(rest) on match, None otherwise
s.strip_suffix(suffix: &Text) -> Maybe<Text>
s.remove_prefix(prefix: &Text) -> Text // rest on match, self otherwise
s.remove_suffix(suffix: &Text) -> Text
Case Conversion
s.to_uppercase() -> Text // full Unicode
s.to_lowercase() -> Text
s.to_upper() -> Text // alias
s.to_lower() -> Text
s.to_ascii_uppercase() -> Text // faster; ASCII-only
s.to_ascii_lowercase() -> Text
s.capitalize() -> Text // first char uppercase
s.to_title_case() -> Text // first char of every word uppercase
s.swapcase() -> Text // upper↔lower
s.eq_ignore_case(other: &Text) -> Bool
s.make_ascii_uppercase() // in-place, &mut self
s.make_ascii_lowercase()
Replacement
s.replace(pattern: &Text, replacement: &Text) -> Text // all matches
s.replacen(pattern: &Text, replacement: &Text, count: Int) -> Text
s.replace_range(start: Int, end: Int, replacement: &Text) // &mut self
s.remove_matches(pattern: &Text) // &mut self
Mutation (&mut self)
s.push(ch: Char)
s.push_byte(b: Byte) // assumes valid UTF-8 context
s.push_str(other: &Text)
s.insert(idx: Int, ch: Char)
s.insert_str(idx: Int, other: &Text)
s.truncate(new_len: Int)
s.clear()
s.pop() -> Maybe<Char>
s.remove(idx: Int) -> Char
s.retain(predicate: fn(Char) -> Bool)
s.reserve(additional: Int)
s.shrink_to_fit()
s.shrink_to(min_capacity: Int)
Padding & Centering
s.pad_left(width: Int, fill: Char) -> Text // right-align, fill on left
s.pad_right(width: Int, fill: Char) -> Text // left-align, fill on right
s.pad_start(width: Int, fill: Char) -> Text // alias for pad_left
s.pad_end(width: Int, fill: Char) -> Text // alias for pad_right
s.center(width: Int, fill: Char) -> Text
s.zfill(width: Int) -> Text // zero-pad numerically
s.expand_tabs(tab_size: Int) -> Text // '\t' → tab_size spaces
Concat / Join / Repeat / Reverse
s.concat(other: &Text) -> Text
Text.join(parts: &[Text], sep: &Text) -> Text
s.repeat(n: Int) -> Text
s.reverse() -> Text // by Unicode scalar
Parsing & Conversion
s.parse_int() -> Result<Int, ParseError>
s.parse_int_radix(radix: Int) -> Result<Int, ParseError>
s.parse_float() -> Result<Float, ParseError>
s.parse_bool() -> Result<Bool, ParseError>
s.try_to_int() -> Maybe<Int>
s.try_to_float() -> Maybe<Float>
s.to_int() -> Int // panics on invalid
s.to_float() -> Float
s.into_bytes() -> List<Byte> // consumes
s.encode_utf16() -> List<Int>
FromStr is implemented for every primitive: Int, Int8, Int16,
Int32, Int64, Int128, UInt8, UInt16, UInt32, UInt64,
UInt128, ISize, USize, Float, Float32, Float64, Bool,
and Text itself.
Building (incremental)
let mut s = Text.with_capacity(128);
s.push_str(&"hello ");
s.push('w');
s.push_str(&"orld");
s += "!"; // via AddAssign
// Heavy concatenation: prefer TextBuilder (see below)
Pitfalls
- Byte index ≠ char index: s.find("…") -> Maybe<Int> returns a byte offset. Slicing by a char offset requires byte_index_of_char or char_indices().
- Static literals have cap == 0: mutating methods like truncate / clear on a let mut s: Text = "hello" need the null-pointer guard documented in core-tests/text/text/audit.md §E.
- Inserting into or popping from a mutable Text invokes a &mut Char deref; see core-tests/text/char/audit.md §A for the active defect class.
Error types
public type Utf8Error is { valid_up_to: Int }; // implements Display, Debug, Eq
public type Utf16Error is { index: Int }; // implements Display, Debug, Eq
public type ParseError is { message: Text }; // implements Display, Debug, Eq
Char
Char is a Unicode scalar value: any code point in U+0000..=U+10FFFF
except the surrogate range U+D800..=U+DFFF.
ASCII Classification
c.is_ascii() c.is_ascii_alphabetic() c.is_ascii_uppercase()
c.is_ascii_lowercase() c.is_ascii_digit() c.is_ascii_hexdigit()
c.is_ascii_alphanumeric() c.is_ascii_whitespace() c.is_ascii_control()
c.is_ascii_punctuation() c.is_ascii_graphic()
Unicode Classification
c.is_alphabetic() c.is_numeric() c.is_alphanumeric()
c.is_whitespace() c.is_control()
c.is_uppercase() c.is_lowercase()
c.is_digit(radix)
c.is_valid_unicode()
Conversion
c.to_ascii_uppercase() c.to_ascii_lowercase()
c.to_uppercase() c.to_lowercase() // full Unicode
c.make_ascii_uppercase() // &mut self, in-place
c.make_ascii_lowercase()
c.eq_ignore_ascii_case(other: &Char) -> Bool
c.to_digit(radix: Int) -> Maybe<Int>
Char.from_digit(digit: Int, radix: Int) -> Maybe<Char>
UTF-8 / UTF-16 Encoding
c.len_utf8() -> Int // 1, 2, 3, or 4
c.len_utf16() -> Int // 1 or 2
c.encode_utf8(buf: &mut [Byte]) -> Int // returns bytes written
c.encode_utf16(buf: &mut [Int]) -> Int // returns code units written
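The 1-to-4 byte result of encode_utf8 follows directly from the standard UTF-8 bit layout (RFC 3629). The Rust sketch below spells that layout out. It assumes the buffer holds at least 4 bytes and illustrates the encoding only; it is not Verum's code.

```rust
/// Writes the UTF-8 encoding of `c` into `buf` and returns the number of bytes written.
fn encode_utf8(c: char, buf: &mut [u8]) -> usize {
    let cp = c as u32;
    if cp < 0x80 {
        // 0xxxxxxx
        buf[0] = cp as u8;
        1
    } else if cp < 0x800 {
        // 110xxxxx 10xxxxxx
        buf[0] = 0xC0 | (cp >> 6) as u8;
        buf[1] = 0x80 | (cp & 0x3F) as u8;
        2
    } else if cp < 0x10000 {
        // 1110xxxx 10xxxxxx 10xxxxxx
        buf[0] = 0xE0 | (cp >> 12) as u8;
        buf[1] = 0x80 | ((cp >> 6) & 0x3F) as u8;
        buf[2] = 0x80 | (cp & 0x3F) as u8;
        3
    } else {
        // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        buf[0] = 0xF0 | (cp >> 18) as u8;
        buf[1] = 0x80 | ((cp >> 12) & 0x3F) as u8;
        buf[2] = 0x80 | ((cp >> 6) & 0x3F) as u8;
        buf[3] = 0x80 | (cp & 0x3F) as u8;
        4
    }
}
```

The returned count always equals len_utf8 for the same scalar.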
Unicode Categories
c.general_category() -> GeneralCategory
c.escape_debug() -> Text
c.escape_unicode() -> Text
c.escape_default() -> Text
GeneralCategory
30 variants matching the Unicode (UCD) General_Category values:
type GeneralCategory is
| Lu | Ll | Lt | Lm | Lo // Letter
| Mn | Mc | Me // Mark
| Nd | Nl | No // Number
| Pc | Pd | Ps | Pe | Pi | Pf | Po // Punctuation
| Sm | Sc | Sk | So // Symbol
| Zs | Zl | Zp // Separator
| Cc | Cf | Cs | Co | Cn; // Other
Group predicates: is_letter(), is_mark(), is_number(),
is_punctuation(), is_symbol(), is_separator(), is_other().
Implements Debug.
CharPattern — predicate trait
CharPattern is implemented by Char (matches itself), fn(Char) -> Bool
(arbitrary predicate), AnyChar (built from char_any_of(&[c1, c2, ...])),
and CharRange (built from char_range(start, end)).
type AnyChar is { chars: List<Char> };
type CharRange is { start: Char, end: Char };
public fn char_any_of(chars: &[Char]) -> AnyChar;
public fn any_of(chars: &[Char]) -> AnyChar; // alias
public fn char_range(start: Char, end: Char) -> CharRange;
pattern.matches(c: Char) -> Bool
// Used by Text.trim_matches, Text.split, etc.
s.trim_matches(char_any_of(&[' ', '\t', '\n']))
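CharPattern has a single method, so every implementor reduces to a char predicate. Below is a Rust model of the four implementors. The names mirror this page (AnyChar, CharRange, char_any_of, char_range), but the code is illustrative, not Verum source. CharRange bounds are assumed inclusive, which the source does not state. trim_matches is a hypothetical free function that delegates to Rust's str::trim_matches.

```rust
/// One-method protocol: does this pattern accept the character?
trait CharPattern {
    fn matches(&self, c: char) -> bool;
}

/// A Char matches only itself.
impl CharPattern for char {
    fn matches(&self, c: char) -> bool {
        *self == c
    }
}

/// Any predicate closure or fn(char) -> bool is a pattern.
impl<F: Fn(char) -> bool> CharPattern for F {
    fn matches(&self, c: char) -> bool {
        self(c)
    }
}

struct AnyChar {
    chars: Vec<char>,
}

struct CharRange {
    start: char,
    end: char,
}

impl CharPattern for AnyChar {
    fn matches(&self, c: char) -> bool {
        self.chars.contains(&c)
    }
}

impl CharPattern for CharRange {
    // Assumption: both bounds are inclusive.
    fn matches(&self, c: char) -> bool {
        self.start <= c && c <= self.end
    }
}

fn char_any_of(chars: &[char]) -> AnyChar {
    AnyChar { chars: chars.to_vec() }
}

fn char_range(start: char, end: char) -> CharRange {
    CharRange { start, end }
}

/// Hypothetical trim over any CharPattern: strips matching chars from both ends.
fn trim_matches<'a, P: CharPattern>(s: &'a str, p: &P) -> &'a str {
    s.trim_matches(|c: char| p.matches(c))
}
```

Funnelling everything through a predicate means consumers such as trim or split need only one code path, whichever kind of pattern the caller passes.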
Format strings — f"..."
let msg = f"x = {x}, y = {y + 1}";
let deb = f"{value:?}"; // Debug format
let hex = f"{byte:02X}"; // 2-char uppercase hex
let pct = f"{ratio:.2%}"; // 2-decimal percent
let pad = f"{name:>20}"; // right-align width 20
let lz = f"{n:05}"; // zero-padded width 5
Format specifier grammar
[fill][align][sign][#][0][width][.precision][type]
| Field | Values |
|---|---|
| fill | any char (default: space) |
| align | < left, > right, ^ centre |
| sign | + always, - only negative (default), space for leading space |
| # | alternate form (0x for hex, 0b for binary, ...) |
| 0 | zero-pad numerics |
| width | integer; or * to read from next argument |
| precision | integer; controls float digits / string truncation |
| type | ? Debug, x/X hex, o octal, b binary, e/E scientific, % percent |
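Because every field has a fixed position, a spec can be parsed in a single left-to-right pass. The one subtle step is fill: a character counts as fill only when the next character is an alignment marker. The Rust sketch below makes that concrete. Spec and parse_spec are hypothetical names. The zero flag gets its own field because the documented FormatSpec has none, and the * width form is not modelled.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Align { Left, Center, Right }

#[derive(Debug, PartialEq, Clone, Copy)]
enum Sign { Minus, Plus, Space }

/// Hypothetical parsed form of [fill][align][sign][#][0][width][.precision][type].
#[derive(Debug, PartialEq)]
struct Spec {
    fill: char,
    align: Option<Align>, // None: leave the choice to the formatted type
    sign: Sign,
    alternate: bool,
    zero: bool,
    width: Option<usize>,
    precision: Option<usize>,
    ty: Option<char>,
}

fn align_of(c: char) -> Option<Align> {
    match c {
        '<' => Some(Align::Left),
        '^' => Some(Align::Center),
        '>' => Some(Align::Right),
        _ => None,
    }
}

fn parse_spec(s: &str) -> Result<Spec, String> {
    let cs: Vec<char> = s.chars().collect();
    let mut spec = Spec {
        fill: ' ', align: None, sign: Sign::Minus, alternate: false,
        zero: false, width: None, precision: None, ty: None,
    };
    let mut i = 0;
    // Fill is present only when the second char is an alignment marker.
    if cs.len() >= 2 && align_of(cs[1]).is_some() {
        spec.fill = cs[0];
        spec.align = align_of(cs[1]);
        i = 2;
    } else if let Some(a) = cs.first().and_then(|&c| align_of(c)) {
        spec.align = Some(a);
        i = 1;
    }
    match cs.get(i) {
        Some('+') => { spec.sign = Sign::Plus; i += 1; }
        Some('-') => { i += 1; } // explicit form of the default
        Some(' ') => { spec.sign = Sign::Space; i += 1; }
        _ => {}
    }
    if cs.get(i) == Some(&'#') { spec.alternate = true; i += 1; }
    if cs.get(i) == Some(&'0') { spec.zero = true; i += 1; }
    // Reads a run of ASCII digits starting at *i and advances *i past it.
    let digits = |i: &mut usize| -> Option<usize> {
        let start = *i;
        while *i < cs.len() && cs[*i].is_ascii_digit() { *i += 1; }
        cs[start..*i].iter().collect::<String>().parse().ok()
    };
    spec.width = digits(&mut i);
    if cs.get(i) == Some(&'.') {
        i += 1;
        spec.precision = digits(&mut i);
        if spec.precision.is_none() {
            return Err("'.' must be followed by digits".to_string());
        }
    }
    match &cs[i..] {
        [] => {}
        [t] if "?xXobeE%".contains(*t) => spec.ty = Some(*t),
        rest => return Err(format!("unexpected input: {}", rest.iter().collect::<String>())),
    }
    Ok(spec)
}
```

For example, "08b" parses as zero-padded, width 8, binary, and "*<10.2" as fill '*', left-aligned, width 10, precision 2.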
Examples
f"{123:+}" // "+123"
f"{-5:+}" // "-5"
f"{0xFF:#x}" // "0xff"
f"{0xFF:#X}" // "0xFF"
f"{255:08b}" // "11111111"
f"{3.14159:.3}" // "3.142"
f"{0.85:.1%}" // "85.0%"
f"{42:>10}" // "        42"
f"{name:.5}" // truncate to 5 chars
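These specifiers follow the same conventions as Rust's std::fmt, so most of the examples above can be cross-checked with format!. Rust has no % type, so that example is left out. The snippet is a Rust cross-check, not Verum code, and the name argument is a hypothetical input.

```rust
/// Rust renderings of the f-string examples, in the same order.
fn format_examples(name: &str) -> Vec<String> {
    vec![
        format!("{:+}", 123),      // sign always shown
        format!("{:+}", -5),       // negative keeps its minus
        format!("{:#x}", 0xFF),    // alternate form adds the 0x prefix
        format!("{:#X}", 0xFF),    // uppercase digits, lowercase prefix
        format!("{:08b}", 255),    // binary, zero-padded to width 8
        format!("{:.3}", 3.14159), // float rounded to 3 decimals
        format!("{:>10}", 42),     // right-aligned in width 10
        format!("{:.5}", name),    // precision truncates a string
    ]
}
```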
Formatting & Write
Write protocol
Where formatting writes to.
public type Write is protocol {
fn write(&mut self, bytes: &[Byte]) -> Result<Int, WriteError>;
fn write_str(&mut self, s: &Text) -> Result<(), WriteError>; // default impl
fn write_char(&mut self, c: Char) -> Result<(), WriteError>; // default impl
fn flush(&mut self) -> Result<(), WriteError>;
};
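The protocol marks write_str and write_char as having default implementations but does not say how they are built. One common approach, sketched here in Rust as an assumption, derives both from write. write_str loops because write may accept only part of its input, and write_char encodes the scalar into a 4-byte stack buffer first. Reporting a zero-byte write as BufferFull, and the FixedBuf sink, are hypothetical choices made for the example.

```rust
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum WriteErrorKind {
    Io,
    BufferFull,
    Encoding,
}

#[derive(Debug, PartialEq)]
struct WriteError {
    kind: WriteErrorKind,
}

trait Write {
    /// Required: accept some prefix of `bytes` and return how many were taken.
    fn write(&mut self, bytes: &[u8]) -> Result<usize, WriteError>;
    fn flush(&mut self) -> Result<(), WriteError>;

    /// Default: keep calling `write` until every byte is accepted.
    fn write_str(&mut self, s: &str) -> Result<(), WriteError> {
        let mut rest = s.as_bytes();
        while !rest.is_empty() {
            let n = self.write(rest)?;
            if n == 0 {
                // No progress is possible; report it instead of looping forever.
                return Err(WriteError { kind: WriteErrorKind::BufferFull });
            }
            rest = &rest[n..];
        }
        Ok(())
    }

    /// Default: UTF-8-encode the scalar on the stack, then reuse write_str.
    fn write_char(&mut self, c: char) -> Result<(), WriteError> {
        let mut buf = [0u8; 4];
        self.write_str(c.encode_utf8(&mut buf))
    }
}

/// Hypothetical sink that accepts at most `cap` bytes in total.
struct FixedBuf {
    data: Vec<u8>,
    cap: usize,
}

impl Write for FixedBuf {
    fn write(&mut self, bytes: &[u8]) -> Result<usize, WriteError> {
        let n = bytes.len().min(self.cap - self.data.len());
        self.data.extend_from_slice(&bytes[..n]);
        Ok(n)
    }

    fn flush(&mut self) -> Result<(), WriteError> {
        Ok(())
    }
}
```

With this split, an implementor supplies only write and flush, and text-level output works automatically.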
WriteError & WriteErrorKind
public type WriteError is { kind: WriteErrorKind };
public type WriteErrorKind is Io | BufferFull | Encoding;
WriteError.io() -> WriteError
WriteError.buffer_full() -> WriteError
WriteErrorKind implements Eq, Display, Debug. WriteError
implements Eq, Display, Debug.
TextAlignment / Sign
public type TextAlignment is Left | Center | Right; // default Left
public type Sign is Minus | Plus | Space; // default Minus
FormatSpec
public type FormatSpec is {
fill: Char,
align: TextAlignment,
sign: Sign,
width: Maybe<Int>,
precision: Maybe<Int>,
alternate: Bool,
format_type: Maybe<Char>,
};
FormatSpec.default() // (' ', Left, Minus, None, None, false, None)
FormatSpec.new() // alias of default()
spec.with_width(w)
spec.with_precision(p)
spec.with_align(a)
spec.with_fill(c)
Formatter / TextFormatter
The receiver in every Display / Debug impl. Implements Write.
implement Display for Point {
fn fmt(&self, f: &mut Formatter) -> Result<(), FormatError> {
f.write_str(&f"({self.x}, {self.y})")
}
}
implement Debug for Point {
fn fmt_debug(&self, f: &mut Formatter) -> Result<(), FormatError> {
f.debug_struct("Point")
.field("x", &self.x)
.field("y", &self.y)
.finish()
}
}
Debug helpers
f.debug_struct("Name").field("x", &x).field("y", &y).finish()
f.debug_tuple("Name").field(&a).field(&b).finish()
f.debug_list().entries(iter).finish()
f.debug_map().entries(iter).finish()
Print functions (require [IO])
print(s: &Text)
println(s: &Text)
println_empty()
eprint(s: &Text)
eprintln(s: &Text)
format_debug<T: Debug>(value: &T) -> Text // pure
format_display<T: Display>(value: &T) -> Text // pure
dbg<T: Debug>(value: T) -> T // prints to stderr, passes through
FormatResult is Result<(), FormatError> — the canonical return type of
Display.fmt and Debug.fmt_debug.
Regex
public type Regex is { pattern: Text };
public type RegexError is { message: Text };
API
Regex.new(pattern: Text) -> Result<Regex, RegexError>
// today: always Ok (compile is deferred)
r.is_match(text: Text) -> Bool // any match
r.find(text: Text) -> Maybe<Text> // first match
r.find_all(text: Text) -> List<Text> // every match
r.replace(text: Text, repl: Text) -> Text // first match
r.replace_all(text: Text, repl: Text) -> Text // every match
r.split(text: Text) -> List<Text> // split on every match
r.captures(text: Text) -> Maybe<List<Text>> // ordered capture groups,
// index 0 = whole match,
// missing groups = ""
r.as_str() -> Text // recover raw pattern
All seven runtime ops are wired end-to-end through the VBC interpreter and the AOT MLIR lowering path:
| Surface | Intrinsic | Sub-opcode |
|---|---|---|
| is_match | regex_is_match | TensorSubOpcode 0xE2 |
| find_all | regex_find_all | TensorSubOpcode 0xE0 |
| replace_all | regex_replace_all | TensorSubOpcode 0xE1 |
| split | regex_split | TensorSubOpcode 0xE3 |
| find | regex_find | TensorExtSubOpcode 0x0A |
| replace | regex_replace | TensorExtSubOpcode 0x0B |
| captures | regex_captures | TensorExtSubOpcode 0x0C |
The single-match and capture variants live in the extended opcode space
(TensorExtSubOpcode) because the bulk variants had already taken the
regex-dedicated 0xE0..=0xE3 range before the single-match variants landed.
Replacement syntax
replace / replace_all honour the regex crate's
replacement syntax:
- $0: whole match
- $1, $2, …: numbered capture groups
- ${name}: named groups (when the pattern uses (?<name>…))
- $$: literal $
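To make these forms concrete, here is a simplified Rust model of template expansion over an already-extracted list of groups. It is not the regex crate's code; the crate does this itself through Captures::expand. It also deliberately differs from the crate in two ways. The crate reads $1a as a group named "1a", whereas this sketch stops the number at the first non-digit. This sketch also supports names only in the braced ${name} form. expand and its parameters are hypothetical.

```rust
use std::collections::HashMap;

/// Expands `template` against captured groups: groups[0] is the whole match and
/// `names` maps a group name to its index. Missing or unknown groups expand to "",
/// mirroring the empty-string convention for non-participating groups.
fn expand(template: &str, groups: &[&str], names: &HashMap<&str, usize>) -> String {
    let mut out = String::new();
    let mut chars = template.chars().peekable();
    while let Some(c) = chars.next() {
        if c != '$' {
            out.push(c);
            continue;
        }
        match chars.peek().copied() {
            // "$$" is an escaped literal dollar sign.
            Some('$') => {
                chars.next();
                out.push('$');
            }
            // "${name}" or "${1}": read up to (and consume) the closing brace.
            Some('{') => {
                chars.next();
                let name: String = chars.by_ref().take_while(|&ch| ch != '}').collect();
                let idx = name
                    .parse::<usize>()
                    .ok()
                    .or_else(|| names.get(name.as_str()).copied());
                out.push_str(idx.and_then(|i| groups.get(i)).copied().unwrap_or(""));
            }
            // "$1", "$12": a run of digits selects a numbered group.
            Some(d) if d.is_ascii_digit() => {
                let mut num = 0usize;
                while let Some(v) = chars.peek().and_then(|ch| ch.to_digit(10)) {
                    num = num.saturating_mul(10).saturating_add(v as usize);
                    chars.next();
                }
                out.push_str(groups.get(num).copied().unwrap_or(""));
            }
            // A lone '$' is kept literally.
            _ => out.push('$'),
        }
    }
    out
}
```

With the capture example below ("42-foo", "42", "foo"), the template "$2:$1" expands to "foo:42".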
Capture groups
let r = Regex.new("(\\d+)-(\\w+)").unwrap();
match r.captures("id-42-foo extra") {
Some(groups) => {
// groups[0] = "42-foo" (whole match)
// groups[1] = "42" (first group)
// groups[2] = "foo" (second group)
},
None => panic("no match"),
}
Non-participating groups appear as empty strings, so telling a group that
did not participate apart from one that matched the empty string is the
caller's responsibility for now. A per-group Maybe<Text> surface is on the v2 list.
Tagged literals
Compile-time and runtime validators for tagged string literals.
Recognised tags (each compiles to a validate_<tag>(literal_text)
runtime check that panics when the check returns false):
json#"{ \"x\": 1 }" // → JsonValue
sql#"""SELECT * FROM users WHERE id = ${id}""" // → SqlQuery
html#"""<div>${content}</div>""" // → Html (escape-by-default)
url#"https://example.com/search?q=${q}" // → Url (URL-encoded)
yaml#"..." toml#"..." xml#"..."
rx#"..." re#"..." // Regex
Runtime validators
public fn validate_json(s: Text) -> Bool // must start { } / [ ] / " "
public fn validate_sql(s: Text) -> Bool // must start with SELECT/INSERT/UPDATE/DELETE/CREATE/DROP/ALTER/WITH (case-insensitive)
public fn validate_uri(s: Text) -> Bool // must contain ://
These validators are structural shape checks, not full parsers —
they reject obviously-malformed input but do not guarantee
syntactic validity. For full validation, route the value through a
purpose-built parser (json.parse(&t), sql.parse(&t), ...).
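The one-line summaries leave some details open, so the Rust sketch below fixes them with stated assumptions. Surrounding whitespace is ignored. The JSON rule is read as "wrapped in a matching {…}, […], or "…" pair". The SQL rule checks only the leading keyword, so it also accepts SELECTED. These functions restate the summaries hypothetically; they are not the Verum source.

```rust
/// Shape check (assumed reading): after trimming, the text is wrapped in a
/// matching {...}, [...], or "..." pair.
fn validate_json(s: &str) -> bool {
    let t = s.trim();
    [('{', '}'), ('[', ']'), ('"', '"')]
        .iter()
        .any(|&(open, close)| t.len() >= 2 && t.starts_with(open) && t.ends_with(close))
}

/// Shape check: starts with a listed keyword, ignoring ASCII case and leading
/// whitespace. No word-boundary check is made.
fn validate_sql(s: &str) -> bool {
    const KEYWORDS: [&str; 8] =
        ["SELECT", "INSERT", "UPDATE", "DELETE", "CREATE", "DROP", "ALTER", "WITH"];
    let t = s.trim_start().as_bytes();
    KEYWORDS
        .iter()
        .any(|kw| t.len() >= kw.len() && t[..kw.len()].eq_ignore_ascii_case(kw.as_bytes()))
}

/// Shape check: a scheme separator appears somewhere in the text.
fn validate_uri(s: &str) -> bool {
    s.contains("://")
}
```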
Unknown tags become user-defined via @meta_macro — see
metaprogramming.
TextBuilder
Incremental text construction without repeated allocation. Backed by a
single growing Text buffer; amortised O(1) per byte.
let mut b = TextBuilder.new();
b.push(&"hello ");
b.push_char('w');
b.push(&"orld");
b.push_line(&"!"); // appends '\n'
let s = b.build(); // clones the buffer (builder reusable)
let s2 = b.into_text(); // consumes the builder
API
TextBuilder.new() -> TextBuilder
TextBuilder.with_capacity(cap: Int) -> TextBuilder
TextBuilder.default() -> TextBuilder
b.push(text: &Text) // &mut self
b.push_char(ch: Char)
b.push_line(text: &Text) // appends text + '\n'
b.clear()
b.len() -> Int
b.is_empty() -> Bool
b.build() -> Text // clone
b.into_text() -> Text // consume
b.clone() -> TextBuilder
Implements Debug, Display, Clone, Default, Length.
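Amortised O(1) per byte usually comes from geometric growth. Each time the buffer fills, its capacity is multiplied, so the total bytes copied across all reallocations stay proportional to the final length. The Rust model below assumes a doubling policy with an 8-byte minimum; the source does not state TextBuilder's actual growth factor. Builder is a hypothetical name.

```rust
/// Minimal growable text buffer with doubling growth (assumed policy).
struct Builder {
    buf: Vec<u8>,
    cap: usize,      // capacity managed explicitly by this model
    reallocs: usize, // number of times the buffer had to grow
}

impl Builder {
    fn new() -> Self {
        Builder { buf: Vec::new(), cap: 0, reallocs: 0 }
    }

    fn push(&mut self, s: &str) {
        let needed = self.buf.len() + s.len();
        if needed > self.cap {
            // Double (and at least fit `needed`), then copy the old contents once.
            let new_cap = needed.max(self.cap * 2).max(8);
            let mut grown = Vec::with_capacity(new_cap);
            grown.extend_from_slice(&self.buf);
            self.buf = grown;
            self.cap = new_cap;
            self.reallocs += 1;
        }
        self.buf.extend_from_slice(s.as_bytes());
    }

    fn push_char(&mut self, c: char) {
        let mut tmp = [0u8; 4];
        self.push(c.encode_utf8(&mut tmp));
    }

    /// Clones the contents; only valid UTF-8 is ever pushed, so this cannot fail.
    fn build(&self) -> String {
        String::from_utf8(self.buf.clone()).expect("builder holds valid UTF-8")
    }
}
```

Pushing 1000 single bytes triggers only 8 growth steps (capacities 8, 16, …, 1024) rather than 1000.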
Case-folded comparison
ASCII-subset case folding — matches SQLite's documented NOCASE
collation semantics. Non-ASCII bytes pass through unchanged; for full
Unicode case folding (CaseFolding.txt, status = C+S+T) opt in to
the cog.verum.collation-icu package.
mount core.text.case_fold.{
fold_char_ascii, fold_byte_ascii, fold_text_ascii,
compare_ascii_nocase, equal_ascii_nocase,
};
// Folding
let lower: Char = fold_char_ascii('A'.into()); // 'a'
let lb: Byte = fold_byte_ascii(0x41 as Byte); // 0x61
let ls: Text = fold_text_ascii(&"Hello".into()); // "hello"
// Comparison: does not allocate; byte-wise case-insensitive
let a: Text = "Apple";
let b: Text = "APPLE";
let cmp: Ordering = compare_ascii_nocase(&a, &b);
let eq: Bool = equal_ascii_nocase(&a, &b);
| Function | Behaviour |
|---|---|
| fold_char_ascii(c) | A..Z → a..z; identity otherwise |
| fold_byte_ascii(b) | Byte variant; caller pre-verified ASCII (b < 0x80) |
| fold_text_ascii(s) | Whole-string ASCII lowercase; non-ASCII untouched |
| compare_ascii_nocase(a, b) | Ordering: lexicographic over folded bytes |
| equal_ascii_nocase(a, b) | Boolean equality, early-exit |
Used by core.database.sqlite.native.l2_record.collation.NOCASE,
which is the collation the native SQLite port exposes out-of-the-box.
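The comparison functions need not allocate because folding happens one byte at a time inside the comparison loop. The Rust sketch below shows that loop. The function names mirror the table, but the code is illustrative, not the Verum source.

```rust
use std::cmp::Ordering;

/// A..Z -> a..z; every other byte, including non-ASCII, passes through.
fn fold_byte_ascii(b: u8) -> u8 {
    if b.is_ascii_uppercase() { b + 32 } else { b }
}

/// Lexicographic comparison over folded bytes, with no intermediate strings.
fn compare_ascii_nocase(a: &str, b: &str) -> Ordering {
    let (x, y) = (a.as_bytes(), b.as_bytes());
    for (&p, &q) in x.iter().zip(y) {
        match fold_byte_ascii(p).cmp(&fold_byte_ascii(q)) {
            Ordering::Equal => continue,
            other => return other,
        }
    }
    // Equal common prefix: the shorter input sorts first.
    x.len().cmp(&y.len())
}

/// Equality that exits early when the byte lengths differ.
fn equal_ascii_nocase(a: &str, b: &str) -> bool {
    a.len() == b.len() && compare_ascii_nocase(a, b) == Ordering::Equal
}
```

Because only A..Z are folded, "é" and "É" stay distinct, which is the documented NOCASE behaviour for non-ASCII input.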
Numeric text representations
Five sub-modules under core.text.numeric:
| Module | Type | Purpose |
|---|---|---|
| decimal | Decimal { coefficient: Int, scale: Int } | Fixed-precision (scale 0..=18, i64 coefficient). PG NUMERIC codec. Financial workloads. |
| bigint | BigInt { sign: Bool, digits: List<Int> } | Arbitrary-precision signed integer (base 10^9 chunks, little-endian). |
| bigdecimal | BigDecimal { coefficient: BigInt, scale: Int } | Sibling of Decimal with BigInt coefficient. Scale up to 1024. |
| rational | Rational { num: BigInt, den: BigInt } | Exact-rational arithmetic in canonical reduced form (Euclidean GCD). |
| modular | (free fns) | gcd, lcm, ext_gcd, mod_pow, mod_inverse, mod_sqrt, is_probable_prime, crt, crt2 |
Decimal
mount core.text.numeric.decimal.{
Decimal, DecimalError, RoundingMode, MAX_SCALE,
HalfEven, HalfUp, HalfDown, Truncate,
parse_decimal,
};
Decimal.zero() // (0, 0)
Decimal.one() // (1, 0)
Decimal.from_int(n) // (n, 0)
Decimal.from_parts(coef, scale) -> Result<Decimal, DecimalError>
d.is_zero() d.is_negative() d.is_positive()
d.abs() d.neg()
d.add(other) -> Result<Decimal, DecimalError>
d.sub(other) -> Result<Decimal, DecimalError>
d.mul(other) -> Result<Decimal, DecimalError>
d.div(other, precision: Int, mode: RoundingMode) -> Result<Decimal, DecimalError>
parse_decimal(text: &Text) -> Result<Decimal, DecimalError>
MAX_SCALE = 18. RoundingMode is HalfEven | HalfUp | HalfDown | Truncate.
DecimalError is one of:
ParseEmpty | ParseInvalidChar { byte_offset, byte } | ParseInvalidShape { reason } | ScaleOutOfRange { scale } | Overflow { op } | DivByZero.
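Decimal stays exact by doing all arithmetic on integer coefficients. The Rust sketch below shows two of the steps. Addition aligns both operands to the larger scale before adding. HalfEven rounding, one of the documented RoundingMode variants, breaks ties by parity. The function names and the Option-based error handling are hypothetical; the library returns Result<Decimal, DecimalError>, with Overflow as the corresponding error.

```rust
/// Fixed-point decimal: value = coefficient * 10^(-scale), i64 coefficient.
#[derive(Debug, PartialEq, Clone, Copy)]
struct Decimal {
    coefficient: i64,
    scale: u32,
}

/// Exactly rescales `d` up to `scale` (must be >= d.scale); None on overflow.
fn upscale(d: Decimal, scale: u32) -> Option<i64> {
    10i64.checked_pow(scale - d.scale)?.checked_mul(d.coefficient)
}

/// Addition: align both operands to the larger scale, then add coefficients.
fn add(a: Decimal, b: Decimal) -> Option<Decimal> {
    let scale = a.scale.max(b.scale);
    let sum = upscale(a, scale)?.checked_add(upscale(b, scale)?)?;
    Some(Decimal { coefficient: sum, scale })
}

/// Divides n by 10^k (k <= 18), rounding exact ties to the even neighbour.
fn div_pow10_half_even(n: i64, k: u32) -> i64 {
    let p = 10i64.pow(k);
    let (q, r) = (n / p, n % p); // Rust's / and % truncate toward zero
    let twice = 2 * r.abs();
    if twice > p || (twice == p && q % 2 != 0) {
        q + n.signum() // step one unit away from zero
    } else {
        q
    }
}

/// Reduces `d` to `scale` fractional digits using HalfEven.
fn round_half_even(d: Decimal, scale: u32) -> Decimal {
    if scale >= d.scale {
        return d;
    }
    Decimal { coefficient: div_pow10_half_even(d.coefficient, d.scale - scale), scale }
}
```

For example, 1.5 + 2.25 becomes 150 + 225 at scale 2, giving 3.75. Rounding 2.345 to two places gives 2.34, because the tie goes to the even neighbour.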
BigInt
mount core.text.numeric.bigint.{
BigInt, BigIntError, BIGINT_BASE, BIGINT_DIGITS_PER_CHUNK,
parse_bigint,
};
BigInt.zero() // (sign=false, digits=[])
BigInt.one() // (sign=false, digits=[1])
BigInt.from_int(n)
b.is_zero() b.is_negative() b.is_positive()
b.is_even() b.is_odd()
b.abs() b.neg()
b.add(other) b.sub(other) b.mul(other)
b.div_rem(other) -> Result<(BigInt, BigInt), BigIntError>
parse_bigint(text: &Text) -> Result<BigInt, BigIntError>
Constants: BIGINT_BASE = 10^9, BIGINT_DIGITS_PER_CHUNK = 9.
Multiplication is schoolbook O(N×M); division uses Knuth Algorithm D.
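The chunked representation and schoolbook multiplication can be sketched in Rust as follows. The sketch handles magnitudes only (BigInt's sign flag is omitted), assumes plain ASCII-digit input, and uses Vec<u64> chunks in place of List<Int>. It illustrates the technique and is not the library's code; Knuth's division is left out for length.

```rust
/// Base 10^9: nine decimal digits per little-endian chunk.
const BASE: u64 = 1_000_000_000;

/// Drops high zero chunks, so zero is the empty vector.
fn trim(v: &mut Vec<u64>) {
    while v.last() == Some(&0) {
        v.pop();
    }
}

/// Parses a non-negative ASCII decimal string, nine digits per chunk from the right.
fn from_decimal(s: &str) -> Vec<u64> {
    let mut chunks = Vec::new();
    let mut end = s.len();
    while end > 0 {
        let start = end.saturating_sub(9);
        chunks.push(s[start..end].parse::<u64>().expect("ASCII digits only"));
        end = start;
    }
    trim(&mut chunks);
    chunks
}

/// Prints the top chunk as-is and zero-pads every lower chunk to nine digits.
fn to_decimal(v: &[u64]) -> String {
    match v.split_last() {
        None => "0".to_string(),
        Some((top, rest)) => {
            let mut s = top.to_string();
            for chunk in rest.iter().rev() {
                s.push_str(&format!("{:09}", chunk));
            }
            s
        }
    }
}

/// Schoolbook multiplication: every chunk of `a` times every chunk of `b`, O(N*M).
fn mul(a: &[u64], b: &[u64]) -> Vec<u64> {
    let mut out = vec![0u64; a.len() + b.len()];
    for (i, &x) in a.iter().enumerate() {
        let mut carry = 0u64;
        for (j, &y) in b.iter().enumerate() {
            // out < 10^9, x * y < 10^18, carry < 10^9: the sum fits easily in u64.
            let cur = out[i + j] + x * y + carry;
            out[i + j] = cur % BASE;
            carry = cur / BASE;
        }
        out[i + b.len()] += carry;
    }
    trim(&mut out);
    out
}
```

Choosing a power of ten as the base makes decimal printing a matter of zero-padding each chunk, at the cost of slightly less dense storage than a power-of-two base.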
BigDecimal
Same surface as Decimal but with BigInt coefficient. MAX_SCALE_BIG = 1024.
Rational
Exact-rational arithmetic. Always canonical (gcd-reduced). API includes
add / sub / mul / div / inv plus parse_rational.
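Keeping every value canonical means that equal rationals compare equal field by field. The Rust sketch below shows the normalisation. It assumes the usual convention that the denominator is positive (the source says only "gcd-reduced"). i128 stands in for BigInt, so overflow is not handled.

```rust
/// Canonical rational: gcd(num, den) == 1 and den > 0 (assumed sign convention).
#[derive(Debug, PartialEq, Clone, Copy)]
struct Rational {
    num: i128,
    den: i128,
}

/// Euclidean GCD on absolute values; gcd(0, d) == |d|.
fn gcd(a: i128, b: i128) -> i128 {
    let (mut a, mut b) = (a.abs(), b.abs());
    while b != 0 {
        let t = a % b;
        a = b;
        b = t;
    }
    a
}

impl Rational {
    /// Reduces by the gcd and moves any sign onto the numerator; None for den == 0.
    fn new(num: i128, den: i128) -> Option<Rational> {
        if den == 0 {
            return None;
        }
        let g = gcd(num, den);
        let sign = if den < 0 { -1 } else { 1 };
        Some(Rational { num: sign * num / g, den: sign * den / g })
    }

    fn add(self, o: Rational) -> Rational {
        // Both denominators are non-zero, so their product is too.
        Rational::new(self.num * o.den + o.num * self.den, self.den * o.den).unwrap()
    }

    fn mul(self, o: Rational) -> Rational {
        Rational::new(self.num * o.num, self.den * o.den).unwrap()
    }

    /// The reciprocal; None for zero.
    fn inv(self) -> Option<Rational> {
        Rational::new(self.den, self.num)
    }
}
```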
modular
Number-theoretic free functions over BigInt:
gcd(a: &BigInt, b: &BigInt) -> BigInt
lcm(a: &BigInt, b: &BigInt) -> BigInt
ext_gcd(a: &BigInt, b: &BigInt) -> (BigInt, BigInt, BigInt) // (g, x, y) with a*x + b*y = g
mod_pow(base: &BigInt, exp: &BigInt, modulus: &BigInt) -> Result<BigInt, ModularError>
mod_inverse(a: &BigInt, m: &BigInt) -> Result<BigInt, ModularError>
mod_sqrt(a: &BigInt, p: &BigInt) -> Result<BigInt, ModularError>
is_probable_prime(n: &BigInt, witnesses: &List<BigInt>) -> Bool
crt(residues: &List<BigInt>, moduli: &List<BigInt>) -> Result<BigInt, ModularError>
crt2(r1: &BigInt, m1: &BigInt, r2: &BigInt, m2: &BigInt) -> Result<BigInt, ModularError>
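Three of the functions above are short enough to sketch: the recursive extended Euclidean algorithm, square-and-multiply exponentiation, and a modular inverse built on ext_gcd. The Rust code uses fixed-width integers in place of BigInt and returns Option instead of Result<_, ModularError>. mod_pow assumes a modulus below 2^64 so that every product of two residues fits in u128.

```rust
/// Recursive extended Euclid for non-negative a, b:
/// returns (g, x, y) with a*x + b*y == g == gcd(a, b).
fn ext_gcd(a: i128, b: i128) -> (i128, i128, i128) {
    if b == 0 {
        (a, 1, 0)
    } else {
        let (g, x, y) = ext_gcd(b, a % b);
        // From b*x + (a % b)*y == g and a % b == a - (a / b)*b.
        (g, y, x - (a / b) * y)
    }
}

/// Square-and-multiply: O(log exp) multiplications, reducing mod m each time.
/// Assumes m > 0.
fn mod_pow(base: u64, mut exp: u64, m: u64) -> u64 {
    let m = m as u128;
    let mut b = base as u128 % m;
    let mut acc = 1 % m;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = acc * b % m;
        }
        b = b * b % m;
        exp >>= 1;
    }
    acc as u64
}

/// Inverse of a modulo m (m > 1); None when gcd(a, m) != 1.
fn mod_inverse(a: i128, m: i128) -> Option<i128> {
    let (g, x, _) = ext_gcd(a.rem_euclid(m), m);
    if g == 1 { Some(x.rem_euclid(m)) } else { None }
}
```

For example, 3 * 4 = 12 ≡ 1 (mod 11), so the inverse of 3 modulo 11 is 4. By contrast, 6 has no inverse modulo 9 because gcd(6, 9) = 3.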
Implementation notes
- SSO: 23 bytes stored inline. Transitions to the heap silently; no user-visible type change.
- UTF-8 invariants: Text never contains invalid UTF-8. APIs that could introduce invalidity (byte manipulation) are unsafe.
- &Text vs Text: &Text is the borrowed form (cheap to pass); Text is owned. Protocols like AsRef<Text> let most APIs take either.
- Concatenation cost: s + &t allocates a new Text. For heavy building, use TextBuilder.
Example — word-frequency counter
fn word_freq(text: &Text) -> Map<Text, Int> {
let mut freq = Map.new();
for w in text.split_whitespace() {
let k = w.to_ascii_lowercase();
*freq.entry(k).or_insert(0) += 1;
}
freq
}
Conformance & defects
The core-tests/text/ suite pins the contract; each submodule has its
own audit.md cataloguing open defects + drift surfaces.
| Submodule | Tests pass | Audit |
|---|---|---|
| text/text | 121 / 218 (55%) | audit.md |
| text/char | 75 / 86 (87%) | audit.md |
| text/case_fold | 25 / 30 (83%) | audit.md |
| text/builder | 4 / 23 (17%) | audit.md |
| text/format | 39 / 41 (95%) | audit.md |
| text/regex | 8 / 31 (26%) | audit.md |
| text/tagged_literals | 1 / 29 (3.4%) | audit.md |
| text/numeric/decimal | 27 / 45 (60%) | audit.md |
| text/numeric/{bigint, bigdecimal, rational, modular} | partial | subtree audit |
The single highest-leverage closure across this surface was the Iterator.next dispatch defect (text/text §C), a primitive iterator panic that owned ~30 downstream test failures across text, builder, char, and numeric. The status badge at the top of this page records §C as closed on 2026-05-13.
See also
- base: Display, Debug, FromStr, ToString protocols.
- io: Read / Write for streaming text I/O.
- collections: Map<Text, V>, Set<Text>.
- Language → refinement types: e.g. Email is Text { self.matches(rx#"^[^@]+@[^@]+$") }.