TNetStrings stands for "tagged netstrings". It is a modification of Dan Bernstein's netstrings specification that allows for the same data structures as JSON, but in a format that meets these requirements:
The grammar for the protocol is simply:
SIZE = [0-9]{1,9}
COLON = ':'
DATA = (.*)
TYPE = ('#' | '}' | ']' | ',' | '!' | '~' | '^')
payload = (SIZE COLON DATA TYPE)+
Each of these elements is defined as:
Each TYPE is used to determine the contents and maps to:
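The tags can be read off the parse function in the reference implementation later in this document. The sketch below collects them in a table and splits a single flat element into its SIZE, DATA and TYPE parts. The names TYPE_TAGS and describe are made up for this illustration, and the helper assumes a well-formed, non-nested element.

```python
# Type tags as handled by parse() in the reference implementation.
# TYPE_TAGS and describe() are illustrative names, not part of the spec.
TYPE_TAGS = {
    "#": "integer",
    "^": "float",
    ",": "string",
    "!": "boolean",
    "~": "null",
    "]": "list",
    "}": "dictionary",
}


def describe(element):
    """Split one well-formed element into (size, data, type name)."""
    size_text, rest = element.split(":", 1)  # SIZE is everything before the first colon
    size = int(size_text)
    data, tag = rest[:size], rest[size]      # DATA is exactly SIZE characters, then TYPE
    return size, data, TYPE_TAGS[tag]


print(describe("5:hello,"))  # (5, 'hello', 'string')
print(describe("0:~"))       # (0, '', 'null')
```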
TNetstrings are all or nothing. Either a tnetstring parses cleanly and a value is returned, or the parser aborts, cleans up any in-process data, and returns nothing. As in the reference implementation below, it's normal to return the remainder of a given buffer for further processing, so a single parsing call can succeed without consuming the whole buffer.
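To make the all-or-nothing rule and the returned remainder concrete, here is a minimal sketch of a parser. It is not the reference implementation: it assumes Python 3 and bytes input, and it signals failure by raising ValueError rather than by assertion. Any error inside a nested element propagates up, so no partial value is ever returned.

```python
def parse(data):
    """Parse one tnetstring from bytes; return (value, remainder) or raise ValueError."""
    size_text, sep, rest = data.partition(b":")
    if not sep or not size_text.isdigit() or len(size_text) > 9:
        raise ValueError("bad size prefix")
    size = int(size_text)
    payload, tag, remain = rest[:size], rest[size:size + 1], rest[size + 1:]
    if len(payload) != size or not tag:
        raise ValueError("truncated element")
    if tag == b",":
        return payload, remain
    if tag == b"#":
        return int(payload), remain
    if tag == b"^":
        return float(payload), remain
    if tag == b"!":
        return payload == b"true", remain
    if tag == b"~":
        if payload:
            raise ValueError("null must have an empty payload")
        return None, remain
    if tag in (b"]", b"}"):
        items = []
        while payload:  # an error in any child aborts the whole parse
            item, payload = parse(payload)
            items.append(item)
        if tag == b"]":
            return items, remain
        if len(items) % 2:
            raise ValueError("unbalanced dictionary")
        keys = items[0::2]
        if not all(isinstance(key, bytes) for key in keys):
            raise ValueError("keys can only be strings")
        return dict(zip(keys, items[1::2])), remain
    raise ValueError("invalid type tag %r" % tag)


# Two elements in one buffer: each call consumes one and hands back the rest.
value, rest = parse(b"5:hello,2:42#")
print(value, rest)   # b'hello' b'2:42#'
print(parse(rest))   # (42, b'')
```

A list with one corrupt child, such as `b"13:5:hello,2:42?]"`, raises ValueError as a whole instead of yielding the child that did parse.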
You are not allowed to implement any of the following features:
These restrictions exist to make the protocol reliable for anyone who uses it and to act as a constraint on the design to keep it simple.
You should be able to work with this simple reference implementation written in Python 2.5 or greater (but not 3.x):
def dump(data):
    if type(data) is long or type(data) is int:
        out = str(data)
        return '%d:%s#' % (len(out), out)
    elif type(data) is float:
        out = '%f' % data
        return '%d:%s^' % (len(out), out)
    elif type(data) is str:
        return '%d:' % len(data) + data + ','
    elif type(data) is dict:
        return dump_dict(data)
    elif type(data) is list:
        return dump_list(data)
    elif data == None:
        return '0:~'
    elif type(data) is bool:
        out = repr(data).lower()
        return '%d:%s!' % (len(out), out)
    else:
        assert False, "Can't serialize stuff that's %s." % type(data)


def parse(data):
    payload, payload_type, remain = parse_payload(data)

    if payload_type == '#':
        value = int(payload)
    elif payload_type == '}':
        value = parse_dict(payload)
    elif payload_type == ']':
        value = parse_list(payload)
    elif payload_type == '!':
        value = payload == 'true'
    elif payload_type == '^':
        value = float(payload)
    elif payload_type == '~':
        assert len(payload) == 0, "Payload must be 0 length for null."
        value = None
    elif payload_type == ',':
        value = payload
    else:
        assert False, "Invalid payload type: %r" % payload_type

    return value, remain


def parse_payload(data):
    assert data, "Invalid data to parse, it's empty."
    length, extra = data.split(':', 1)
    length = int(length)

    payload, extra = extra[:length], extra[length:]
    assert extra, "No payload type: %r, %r" % (payload, extra)
    payload_type, remain = extra[0], extra[1:]

    assert len(payload) == length, "Data is wrong length %d vs %d" % (length, len(payload))
    return payload, payload_type, remain


def parse_list(data):
    if len(data) == 0: return []

    result = []
    value, extra = parse(data)
    result.append(value)

    while extra:
        value, extra = parse(extra)
        result.append(value)

    return result


def parse_pair(data):
    key, extra = parse(data)
    assert extra, "Unbalanced dictionary store."
    value, extra = parse(extra)

    return key, value, extra


def parse_dict(data):
    if len(data) == 0: return {}

    key, value, extra = parse_pair(data)
    assert type(key) is str, "Keys can only be strings."

    result = {key: value}

    while extra:
        key, value, extra = parse_pair(extra)
        result[key] = value

    return result


def dump_dict(data):
    result = []
    for k,v in data.items():
        result.append(dump(str(k)))
        result.append(dump(v))

    payload = ''.join(result)
    return '%d:' % len(payload) + payload + '}'


def dump_list(data):
    result = []
    for i in data:
        result.append(dump(i))

    payload = ''.join(result)
    return '%d:' % len(payload) + payload + ']'
You can download tnetstrings.py directly to review it.
If your implementation does not work with the above Python implementation then it is wrong and is not tnetstrings. It's that simple.
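One way to check another implementation against the reference is a small table of expected encodings. The vectors below were worked out by hand from the reference dump function above. The encoder they are checked against is only a sketch: it assumes Python 3, emits bytes, and treats text strings as UTF-8, none of which the reference (Python 2) does.

```python
def dump(data):
    """Sketch of a Python 3 encoder that mirrors the reference dump()."""
    if data is None:
        out, tag = b"", b"~"
    elif isinstance(data, bool):  # checked before int: bool is a subclass of int
        out, tag = (b"true" if data else b"false"), b"!"
    elif isinstance(data, int):
        out, tag = str(data).encode("ascii"), b"#"
    elif isinstance(data, float):
        out, tag = ("%f" % data).encode("ascii"), b"^"
    elif isinstance(data, bytes):
        out, tag = data, b","
    elif isinstance(data, str):
        out, tag = data.encode("utf-8"), b","  # assumption: text is sent as UTF-8
    elif isinstance(data, list):
        out, tag = b"".join(dump(item) for item in data), b"]"
    elif isinstance(data, dict):
        pairs = []
        for key, value in data.items():
            # Like the reference, non-string keys are stringified.
            pairs.append(dump(key if isinstance(key, bytes) else str(key)))
            pairs.append(dump(value))
        out, tag = b"".join(pairs), b"}"
    else:
        raise TypeError("can't serialize %s" % type(data))
    return str(len(out)).encode("ascii") + b":" + out + tag


# Expected encodings derived by hand from the reference implementation.
VECTORS = [
    (42, b"2:42#"),
    ("hello", b"5:hello,"),
    (None, b"0:~"),
    (True, b"4:true!"),
    (1.5, b"8:1.500000^"),
    ([1, 2], b"8:1:1#1:2#]"),
    ({"a": 1}, b"8:1:a,1:1#}"),
]

for value, expected in VECTORS:
    assert dump(value) == expected, (value, dump(value), expected)
```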
Tnetstrings put the length at the beginning and the type at the end so that you have to read an entire data element and cannot "stream" it. This makes it much easier to handle, since nested data structures need to be loaded into RAM anyway to handle them. It's also unnecessary to allow for streaming, since sockets, files, etc. are already streamable. If you need to send 1000 DVDs, don't try to encode them in one tnetstring payload. Instead, send them as a sequence of tnetstrings as payload chunks, with checks and headers like most other protocols. In other words: if you think you need to dive into a tnetstring data type to "stream", then you need to remove one layer and flatten it instead.
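The "sequence of tnetstrings as payload chunks" idea could look roughly like the sketch below. It assumes Python 3 and bytes. The function names, the tiny chunk size, and the empty string used as an end-of-body marker are all invented for this illustration; the specification does not define any of them, and a real protocol would add its own headers and checks.

```python
def chunk_frames(body, chunk_size=4):
    """Yield body as a flat sequence of tnetstring strings, one per chunk."""
    for start in range(0, len(body), chunk_size):
        chunk = body[start:start + chunk_size]
        yield str(len(chunk)).encode("ascii") + b":" + chunk + b","
    yield b"0:,"  # hypothetical end-of-body marker, not part of the spec


def join_frames(buffer):
    """Reassemble chunks up to the empty marker; return (body, remainder)."""
    parts = []
    while True:
        size_text, _, rest = buffer.partition(b":")
        size = int(size_text)  # raises ValueError if the buffer ran out early
        chunk, tag, buffer = rest[:size], rest[size:size + 1], rest[size + 1:]
        if tag != b"," or len(chunk) != size:
            raise ValueError("malformed chunk")
        if size == 0:
            return b"".join(parts), buffer
        parts.append(chunk)


frames = list(chunk_frames(b"0123456789"))
print(frames)  # [b'4:0123,', b'4:4567,', b'2:89,', b'0:,']
print(join_frames(b"".join(frames)))  # (b'0123456789', b'')
```

Each chunk is a complete top-level tnetstring, so the receiver never has to look inside a partially received element.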
Here's an example to make this concrete. Many protocols have a simple HEADER+BODY design where the HEADER is usually some kind of dict, and the body is a raw binary blob of data. Your first idea might be to create one tnetstring of [{HEADER}, "BODY"], but that'd only work if you expect to limit the request sizes. If the requests can be any size then you actually should do one of two things:
Finally, the general rule is senders should have to completely specify the size of what they send and receivers should be ready to reject it. If you allow arbitrary streaming then your servers will suffer attacks that eat your resources.
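Because the size comes first, a receiver can reject an oversized element before buffering any of its payload. The sketch below shows one way to do that. It assumes Python 3 and a blocking binary stream whose read(n) returns n bytes unless the stream has ended (true for io.BytesIO, not for a raw socket). MAX_SIZE and read_one are hypothetical names, and the limit is an arbitrary example value.

```python
import io

MAX_SIZE = 1024  # hypothetical per-element limit chosen by the receiver


def read_one(stream, max_size=MAX_SIZE):
    """Read one element as (payload, tag), refusing declared sizes over max_size."""
    digits = b""
    while True:
        ch = stream.read(1)
        if ch == b":":
            break
        if not ch.isdigit() or len(digits) >= 9:  # the grammar allows 1 to 9 digits
            raise ValueError("bad size prefix")
        digits += ch
    if not digits:
        raise ValueError("missing size")
    size = int(digits)
    if size > max_size:
        # Rejected before a single payload byte is read or stored.
        raise ValueError("declared size %d exceeds limit %d" % (size, max_size))
    payload = stream.read(size)
    tag = stream.read(1)
    if len(payload) != size or not tag:
        raise ValueError("truncated element")
    return payload, tag


print(read_one(io.BytesIO(b"5:hello,")))  # (b'hello', b',')
```

The returned payload is still raw bytes; decoding it according to the tag is left to a parser like the reference implementation.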
Mon Aug 22 09:25:06 PDT 2011 -- Fixed a bug in the dump if-statement.
Sun Aug 21 00:09:19 PDT 2011 -- Added support for floats and how to do "streaming".