Type-Length-Value Encoding Scheme Practice

May 5, 2021
869 Words
#TLV#Encoding#Protocol#IPC

This article was last updated 3 year(s) 8 month(s) ago, and the content may have evolved or changed since then.

https://p6-tt-ipv6.byteimg.com/origin/pgc-image/54ccb21bca824259bcb865cf9bc9049c.webp

In the previous post, I introduced how to use Elixir to write a rate limit tool. After that, I was planning to integrate it with the DIEM-API. In this way, my API System doesn’t need to depend on Redis. However, DIEM-API is written in Golang, and I have not decided to re-write it in Elixir in the short term. Besides, I wrote a search apifor my blog using Rust recently. Therefore, I need a cross-platform solution to address communication issues between Golang, Elixir and Rust.

Why not use other tools

Currently, there have been some ways to communicate between different processes (e.g. Pipe, Signal, Message Queue). But in fact, they have many limitations. Pipe and Signal require communication processes to be on the same machine. Message Queue and GRPC may be better solutions, but introducing new dependencies will complicate the system further.

TCP is another potential solution, but it is only able to transfer binary data. In other words, there is no simple way to decode data into typed parameters after receiving it from sender. REdis Serialization Protocol (RESP) is a request/response protocol over TCP, with various types of response (e.g. status reply, error reply, integer reply, bulk reply) but only one parameter type (string). However, function calls have only one response type but various types of parameters, which is the opposite to RESP. Therefore, RESP can not be used for inter-process calls.

Another way I tried is using separators between multiple parameters, which also introduces two new problems:

  1. Performance: splitting and casting string has penalty on performance; (benchmarks)
  2. Accuracy: if the parameter contains characters that are the same as the separator, it is not able to distinguish them.

Although these ideas don’t work, they gave me some inspiration. I decided to transfer the length of parameters and the typed parameter together.

This is Type-Length-Value (hereinafter referred to as TLV).

Introduce TLV

TLV is an encoding scheme. It must be based on one communication protocol, like: TCP, UDP or Unix Domain Socket.

TLV

A TLV element should consist of three parts (Figure 1):

  1. Type. It represents raw value’s type before encoding. Usually, we define an enumeration for these types. We use ones-digits to represent basic types, such as 0x1 for String, 0x2 for Integer, 0x3 for Float, etc.. We use tens-digits to represent compound types, such as 0x1[any digit] for List, 0x2[any digit] for Hash. The enumeration can also contain types which are unique to programming language, such as Atom type in Elixir.
  2. Length. It represents raw value’s byte length after encoding. However, it is useless for Integer and Float, because these types have fixed byte length (usually 8 bytes) after encoding.
  3. Value. It represents the raw value’s bytes after encoding. It can also nest in another TLV element when Type is a List or Hash (Figure 2). We can decode this part recursively.

Implementation Details

In this section, I will introduce how I implemented the encoding and decoding of TLV elements.

Type

This part should occupy 1 byte / 8 bits. Lower 4 bits represent the basic type, which can cover 16 basic types. Higher 4 bits represent compound type, which can also cover 16 compound types:

one

  1. If raw value is a basic type, then higher 4 bits should all be 0.
  2. If raw value is a compound type, we should only use higher 4 bits, regardless of lower 4 bits.

Length

This part should occupy 4 bytes / 32 bits. It can represent a value whose length up to 2 ** 32 bytes(about 4 GB).

For Integer and Float, higher 3 bytes are 0s, lowest byte is 8, which means Integer and Float can be represented by 8 bytes.

For String and compound type, we need to calculate the length of value first, and then convert the length to bytes.

Besides, we can use binary.BigEndian.PutUint32 function for Golang, use <<int_value::integer-32>> for Elixir, use .to_be_bytes() for Rust.

Value

This part should occupy at least 8 bytes / 64 bit (except Bool).

For Integer, we can use the above method to convert it to bytes; For Float, we can use math.Float64bits to convert it to int64, then convert int64 to bytes. For Golang, Rust and Elixir, String is a byte array (char array), which can be converted to bytes directly.

If there are nested elements, they must be encoded first, which can be processed recursively (in Elixir) or iteratively (in Rust and Golang). Same as decoding. And we should consider the case where a List element contains different types, which is allowed for dynamically typed languages (Python, Elixir) and some weakly typed languages (Golang can use []interface), although using reflection will cause more overhead.

Appendix

Benchmarks (Golang)

I wrote some bench tests for TLV (encoding, decoding) and Separator + strings (mentioned in the previous section):

goos: darwin
goarch: amd64
pkg: DIEM-API/rpcserver
BenchmarkToString-8      1000000               298 ns/op              32 B/op          3 allocs/op
BenchmarkTLVEncode-8     1000000               130 ns/op              96 B/op          6 allocs/op
BenchmarkStringTo-8      1000000               115 ns/op              48 B/op          1 allocs/op
BenchmarkTLVDecode-8     1000000              14.9 ns/op               0 B/op          0 allocs/op
PASS

You can find the source test codes in this repo.

Implementation Code

I used Elixir, Golang and Rust to implement TLV Encode Scheme, you can check the following repositories to view the source code:

Type-Length-Value Encoding Scheme Practice

https://blog.itswincer.com/posts/tlv-encoding-practice/

Author

Wincer

Updated

May 5, 2021
  1. Mar 4, 2024

    基于 OpenCL 生成 Solana 虚荣地址
  2. Nov 12, 2023

    Quantumult X 的链式代理配置
  3. Mar 14, 2023

    ChatGPT 与 TTS 之间奇妙的反应
  4. Jul 8, 2020

    基于 ETS 的漏斗限流
  5. Apr 22, 2020

    使用 OpenCore 引导黑苹果踩坑记录
  6. Mar 16, 2020

    Kubernetes 初探