Info set
The space of values defined by D3S is partitioned — into atomic and aggregate values; each of these subspaces is further partitioned into types.
The atomic types are int
, string
, symbol
and byte-block
; the aggregate types are list
, set
and map
.
Every D3S value is an element of precisely one of these types.
The wire format defined in Wire format imposes restrictions on the space of values beyond their abstract mathematical constraints. In particular, due to limitations in the wire-format representation, the spaces of integers, strings, symbols, byte-blocks and sets are finite: very large integers and strings have no representation. However, implementations may place lower limits on the space of values they can represent. The following sections specify lower bounds — i.e., 'minimum maxima' — on these limits.
These bounds represent the absolute minimum acceptable bounds; implementations are strongly encouraged to avoid imposing arbitrary limitations.
This is a specification of the D3S info set and encoding, and not of any particular protocol. Cooperating implementations may, and probably will, impose additional restrictions on the values that they are willing to accept; they may also relax restrictions stated here.
Types
The following sections specify the properties and values belonging to the seven types described above.
int: Integers
The type integer consists of an implementation-defined subset of the ring of integers.
Integers shall be compared solely on the basis of their numerical values.
D3S does not distinguish between fixnums
and bignums
: they are all simply integers.
These names come from the Lisp community. A fixnum
('fixed-size number') is an integer which fits into a machine word, while a bignum
may require an unbounded amount of storage. Even so, the concepts apply to many implementations and languages; for example, Python 2 has two integer types: int
is a simple fixnum
type, while long
is a bignum
type — Python steadily eroded the distinctions between these types, and Python 3 no longer acknowledges any difference, having a single int
type for both.
An integer value a
shall compare equal (respectively, not equal, less than, etc.) to an integer value b
if and only if the mathematical value of a
is equal (respectively, not equal, less than, etc.) to that of b
, regardless of the way in which a
and b
happen to be represented within the implementation.
A D3S implementation shall be able to represent every integer with absolute value less than 232768
, if available memory permits.
string: Text strings
A string
is a (finite, possibly empty) sequence of Unicode [Con07] characters; an implementation may impose limitations on the space of strings.
Examples of such limitations are maximum length, or permitted characters.
A D3S implementation shall not attempt to canonicalize strings, for example, to prefer or avoid combining characters or a particular letter case.
Strings containing Unicode surrogate pairs are erroneous. An implementation should fail to construct such strings; a string containing a surrogate pair shall not compare equal to a string not containing a surrogate pair. Beyond this, the semantics of such erroneous strings are undefined.
It is permitted for a string to contain a character with code-point zero — a 'null character'. Such a character shall not be considered a string terminator.
An implementation shall be able to represent all strings containing fewer than 65536
characters drawn from U+0009
, U+000A
, U+000D
and U+0020
up to U+007E
.
The above characters are the ASCII tab, linefeed, carriage return, space and the printable ASCII characters.
symbol: Symbolic names
A symbol
is an object with a textual name.
The name is a string (see string: Text strings); an implementation may impose stricter limitations on the names of symbols than it imposes on strings.
Note that the types symbol and string are still disjoint: a symbol is not a string, though its name is a string.
Two symbols shall compare equal (respectively, not equal, less than, etc.) if and only if their names compare equal (respectively, not equal, less than, etc.).
An implementation shall be able to represent all symbols whose names consist of fewer than 256 characters drawn from U+0030
up to U+0039
, U+0041
up to U+005A
, U+005F
, and U+0061
up to U+007A
.
The above required characters are the ASCII digits, letters and underscore.
Symbols and strings are of different type. This makes it possible to optimize symbol comparison and symbol lookup.
byte-block: Octet strings
A byte-block
is a (finite, possibly empty) sequence of octets, i.e., nonnegative integers with absolute value less than 255.
An implementation shall be able to represent all byte blocks whose length is less than 65536
octets, subject to available memory.
Byte blocks are distinguished from strings. Under certain circumstances it may be necessary to recode text strings, such as for presentation to a user, or for communication with an external system. It is incorrect to recode byte-blocks.
list: Ordered sequences of values
A list
is an ordered (finite, possibly empty) sequence of values, called the elements of the list.
The elements of a list need not be distinct.
Two lists shall compare equal if and only if they have the same number of elements, and their respective elements are equal. The relative ordering of lists is not specified.
The number of elements in a list is called the length of the list.
Each individual element of a list is assigned an integer index based on its position in the list: if the length of a list is n
, then the indices shall be the integers 0
,1
,…
,n-1
in turn.
An implementation shall be able to represent all lists whose length is less than 256, subject to available memory.
set: Unordered collections of distinct atomic values
A set
is an unordered (possibly empty)
collection of distinct atomic values, called the elements of the set.
A set shall not contain an aggregate value; an implementation shall fail to construct a set which is invalid in this regard.
Two sets shall compare equal if and only if every element of one is also an element of the other. The relative ordering of sets is not specified.
An implementation shall be able to represent all sets containing fewer than 256 elements.
Sets are distinguished from lists precisely because sets are not ordered. This feature makes them useful for containing flags, i.e., values whose presence or absence is significant. The wire format defines a canonical representation for sets.
map: Associative arrays
A map
(also known as an associative array, dictionary, or partial function) is an unordered collection of distinct ordered pairs of values.
The individual pairs are called associations; the two elements of an association are the key
and the value
respectively.
The key of an association shall be an atomic value; an implementation shall fail to construct a map containing an association whose key is an aggregate value.
There is no restriction on the type of an association’s value.
A map shall not contain two associations whose keys compare equal.
Two maps shall compare equal if and only if, for every association in one, there is an association in the other for which the keys and values compare equal. The relative order of maps is not specified.
An implementation shall be able to represent all maps containing fewer than 256 associations.
Other restrictions
Circularity
The subvalues of a value are defined as follows.
-
An atomic value has no subvalues.
-
The subvalues of a list or set are the elements of the list or set, together with the subvalues of those elements.
-
The subvalues of a map are the keys and values of the map’s associations, together with the subvalues of those keys and values.
A value is circular if it is equal to one of its subvalues. A value is infinite if it has infinitely many distinct subvalues.
An implementation shall not construct an infinite or circular value.