This article was last updated 3 year(s) 3 month(s) ago, and the content may have evolved or changed since then.
In the previous post, I introduced how to use Elixir to write a rate limit tool. After that, I was planning to integrate it with the DIEM-API. In this way, my API System doesn’t need to depend on Redis. However, DIEM-API is written in Golang, and I have not decided to re-write it in Elixir in the short term. Besides, I wrote a search apifor my blog using Rust recently. Therefore, I need a cross-platform solution to address communication issues between Golang, Elixir and Rust.
Why not use other tools
Currently, there have been some ways to communicate between different processes (e.g. Pipe, Signal, Message Queue). But in fact, they have many limitations. Pipe and Signal require communication processes to be on the same machine. Message Queue and GRPC may be better solutions, but introducing new dependencies will complicate the system further.
TCP is another potential solution, but it is only able to transfer binary data. In other words, there is no simple way to decode data into typed parameters after receiving it from sender. REdis Serialization Protocol (RESP) is a request/response protocol over TCP, with various types of response (e.g. status reply, error reply, integer reply, bulk reply) but only one parameter type (string). However, function calls have only one response type but various types of parameters, which is the opposite to RESP. Therefore, RESP can not be used for inter-process calls.
Another way I tried is using separators between multiple parameters, which also introduces two new problems:
- Performance: splitting and casting string has penalty on performance; (benchmarks)
- Accuracy: if the parameter contains characters that are the same as the separator, it is not able to distinguish them.
Although these ideas don’t work, they gave me some inspiration. I decided to transfer the length of parameters and the typed parameter together.
This is Type-Length-Value (hereinafter referred to as TLV).
Introduce TLV
TLV is an encoding scheme. It must be based on one communication protocol, like: TCP, UDP or Unix Domain Socket.
A TLV element should consist of three parts (Figure 1):
- Type. It represents raw value’s type before encoding. Usually, we define an enumeration for these types. We use ones-digits to represent basic types, such as
0x1
for String,0x2
for Integer,0x3
for Float, etc.. We use tens-digits to represent compound types, such as0x1[any digit]
for List,0x2[any digit]
for Hash. The enumeration can also contain types which are unique to programming language, such as Atom type in Elixir. - Length. It represents raw value’s byte length after encoding. However, it is useless for Integer and Float, because these types have fixed byte length (usually 8 bytes) after encoding.
- Value. It represents the raw value’s bytes after encoding. It can also nest in another TLV element when Type is a List or Hash (Figure 2). We can decode this part recursively.
Implementation Details
In this section, I will introduce how I implemented the encoding and decoding of TLV elements.
Type
This part should occupy 1 byte / 8 bits. Lower 4 bits represent the basic type, which can cover 16 basic types. Higher 4 bits represent compound type, which can also cover 16 compound types:
- If raw value is a basic type, then higher 4 bits should all be 0.
- If raw value is a compound type, we should only use higher 4 bits, regardless of lower 4 bits.
Length
This part should occupy 4 bytes / 32 bits. It can represent a value whose length up to 2 ** 32 bytes(about 4 GB).
For Integer and Float, higher 3 bytes are 0s, lowest byte is 8, which means Integer and Float can be represented by 8 bytes.
For String and compound type, we need to calculate the length of value first, and then convert the length to bytes.
Besides, we can use
binary.BigEndian.PutUint32
function for Golang, use<<int_value::integer-32>>
for Elixir, use.to_be_bytes()
for Rust.
Value
This part should occupy at least 8 bytes / 64 bit (except Bool).
For Integer, we can use the above method to convert it to bytes; For Float, we can use math.Float64bits to convert it to int64, then convert int64 to bytes. For Golang, Rust and Elixir, String is a byte array (char array), which can be converted to bytes directly.
If there are nested elements, they must be encoded first, which can be processed recursively (in Elixir) or iteratively (in Rust and Golang). Same as decoding. And we should consider the case where a List element contains different types, which is allowed for dynamically typed languages (Python, Elixir) and some weakly typed languages (Golang can use []interface{}), although using reflection will cause more overhead.
Appendix
Benchmarks (Golang)
I wrote some bench tests for TLV (encoding, decoding) and Separator + strings (mentioned in the previous section):
goos: darwin
goarch: amd64
pkg: DIEM-API/rpcserver
BenchmarkToString-8 1000000 298 ns/op 32 B/op 3 allocs/op
BenchmarkTLVEncode-8 1000000 130 ns/op 96 B/op 6 allocs/op
BenchmarkStringTo-8 1000000 115 ns/op 48 B/op 1 allocs/op
BenchmarkTLVDecode-8 1000000 14.9 ns/op 0 B/op 0 allocs/op
PASS
You can find the source test codes in this repo.
Implementation Code
I used Elixir, Golang and Rust to implement TLV Encode Scheme, you can check the following repositories to view the source code:
- DIEM-APIfor Golang implementation;
- search-apifor Rust implementation;
- Ralfor Elixir implementation.