I have recently created json2yaml, a command line tool written in Go. This command reads a JSON file and outputs YAML to the standard output. There are already lots of tools like this, but I wanted a converter that can be embedded in different Go tools. I was also curious whether it is possible to convert JSON to YAML in a streaming fashion, without loading the entire JSON into memory.
You can install it using Homebrew,
brew install itchyny/tap/json2yaml
or build it with Go.
go install github.com/itchyny/json2yaml/cmd/json2yaml@latest
You can convert a JSON file,
$ cat sample.json
{
  "Sample": "JSON"
}
$ json2yaml sample.json
Sample: JSON
or the standard input stream.
$ curl -sSf "https://httpbin.org/json" | json2yaml
slideshow:
  author: Yours Truly
  date: date of publication
  slides:
    - title: Wake up to WonderWidgets!
      type: all
    - items:
        - Why <em>WonderWidgets</em> are great
        - Who <em>buys</em> WonderWidgets
      title: Overview
      type: all
  title: Sample Slide Show
As a library, it provides the json2yaml.Convert function, which accepts an io.Writer and an io.Reader. If you want to convert from the standard input to the standard output, you can simply call json2yaml.Convert(os.Stdout, os.Stdin). The argument order (destination first, then source) is the same as in io.Copy and Transformer. This package relies only on the standard packages, so it does not deepen your dependency tree.
Let’s get into the implementation of json2yaml. Since YAML is a superset of JSON, even just printing out the input could be called a JSON to YAML conversion. But YAML is commonly known as a format in which you can omit string quotes and collection braces, like this:
sample:
  string: Hello, world!
  sequence:
    - 0
    - null
    - boolean: true
    - string: This is a string!
  nested:
    mapping:
      and: |-
        #!/bin/sh
        echo "This is"
        echo "a multi-line"
        echo "string!"
    sequence:
      - - - - Deeply nested sequence!
  empty:
    - mapping: {}
    - sequence: []
How do you implement a formatter for human-friendly YAML output? First, we need to manage the indentation level. Second, we need to track whether we are inside a mapping or a sequence, using a stack. Also, empty mappings and sequences have to be emitted differently, as {} and [].
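The following minimal Go sketch of a token-by-token formatter illustrates the indentation and stack points. It is built on the Token and More methods of encoding/json's Decoder. It is not json2yaml's code. The names convert, scalar and convertString, the two-space indentation, and the deliberately simple quoting rule (plainRe plus a short list of reserved words) are all assumptions made for this example. The indentation is derived from the depth of an explicit stack of open collections. Empty collections are detected with More() and written as {} or [].

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"regexp"
	"strconv"
	"strings"
)

// Hypothetical, deliberately conservative rules for plain (unquoted) strings.
var plainRe = regexp.MustCompile(`^[A-Za-z]([A-Za-z0-9 ,.!_/-]*[A-Za-z0-9.!_/-])?$`)
var reserved = map[string]bool{"true": true, "false": true, "null": true, "yes": true, "no": true, "on": true, "off": true, "y": true, "n": true}

// scalar formats a JSON scalar token as a YAML scalar.
func scalar(tok json.Token) string {
	switch v := tok.(type) {
	case nil:
		return "null"
	case string:
		if plainRe.MatchString(v) && !reserved[strings.ToLower(v)] {
			return v
		}
		return strconv.Quote(v) // the escapes Go emits here are also valid in YAML
	}
	return fmt.Sprint(tok) // json.Number keeps its text; bools print as true/false
}

// convert streams JSON tokens from r to block-style YAML on w, keeping only a stack of open collections.
func convert(w io.Writer, r io.Reader) error {
	dec := json.NewDecoder(r)
	dec.UseNumber() // keep numbers exactly as written
	bw := bufio.NewWriter(w)
	var stack []json.Delim // open collections: '{' is a mapping, '[' a sequence
	inline := false        // the cursor is right after "- ", so skip indentation
	afterKey := false      // "key:" was just written, so a value comes next
	for {
		tok, err := dec.Token()
		if err == io.EOF && len(stack) == 0 {
			return bw.Flush()
		} else if err != nil {
			return err // io.EOF inside an open collection means truncated input
		}
		n := len(stack)
		if tok == json.Delim('}') || tok == json.Delim(']') {
			stack = stack[:n-1] // pop the finished collection
			continue
		}
		if n > 0 && stack[n-1] == '{' && !afterKey { // a mapping key
			if !inline {
				bw.WriteString(strings.Repeat("  ", n-1)) // two spaces per level
			}
			bw.WriteString(scalar(tok) + ":")
			inline, afterKey = false, true
			continue
		}
		sep := "" // tok starts a value: write the prefix its parent requires
		if afterKey {
			sep, afterKey = " ", false
		} else if n > 0 { // the parent is a sequence
			if !inline {
				bw.WriteString(strings.Repeat("  ", n-1))
			}
			bw.WriteString("- ")
			inline = true // a nested collection continues on this line
		}
		d, isDelim := tok.(json.Delim)
		switch {
		case !isDelim: // scalar value
			bw.WriteString(sep + scalar(tok) + "\n")
			inline = false
		case !dec.More(): // empty collection: written in flow style as {} or []
			if _, err := dec.Token(); err != nil {
				return err
			}
			bw.WriteString(sep + map[json.Delim]string{'{': "{}", '[': "[]"}[d] + "\n")
			inline = false
		default: // non-empty collection: push it and keep streaming
			if sep != "" {
				bw.WriteString("\n") // entries after "key:" start on the next line
			}
			stack = append(stack, d)
		}
	}
}

func convertString(s string) (string, error) {
	var sb strings.Builder
	err := convert(&sb, strings.NewReader(s))
	return sb.String(), err
}

func main() {
	out, _ := convertString(`{"a":[1,{"b":[]},[["x"]]]}`)
	fmt.Print(out)
}
```

The arguments follow the same (io.Writer, io.Reader) order as json2yaml.Convert. Unlike the sample output shown earlier, this sketch writes multi-line strings as escaped double-quoted scalars rather than in the |- block style.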
We also need complex rules to decide whether a string must be quoted or can be written in the plain style. For example, strings that start with certain symbols, such as !, #, &, *, [ and {, need to be quoted. x: 0 needs to be quoted to distinguish it from a mapping, but x:0 does not need quotes. In json2yaml, I implemented this with regular expressions that detect the strings that need to be quoted.
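Here is a minimal Go sketch of the regular-expression approach. The three patterns and the helper names needsQuote and formatString are hypothetical. They approximate YAML's block-context rules for plain scalars, including YAML 1.1 words such as yes and on, and they may quote more strings than strictly necessary. They should not be read as json2yaml's actual expressions.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// Hypothetical patterns approximating YAML's plain-scalar rules in block
// context; not json2yaml's expressions, and they err on the side of quoting.
var (
	// starts with an indicator character, with "- ", "? " or ": ", or with a document marker
	leadingRe = regexp.MustCompile(`^(?:[!#&*\[\]{}|>'"%@` + "`" + `,]|[-?:](?:\s|$)|(?:---|\.\.\.)(?:\s|$))`)
	// ": " (mapping) or " #" (comment) inside, a trailing ':', edge whitespace, or control codes
	innerRe = regexp.MustCompile(`:\s|\s#|:$|^\s|\s$|[\x00-\x1f\x7f]`)
	// empty, or would resolve to null, a boolean (YAML 1.1 words included) or a number
	reservedRe = regexp.MustCompile(`(?i)^$|^(?:~|null|true|false|y|n|yes|no|on|off)$|^[-+]?(?:\.?[0-9]|\.(?:inf|nan)$)`)
)

// needsQuote reports whether s cannot safely be written as a plain YAML scalar.
func needsQuote(s string) bool {
	return leadingRe.MatchString(s) || innerRe.MatchString(s) || reservedRe.MatchString(s)
}

// formatString emits s in the plain style when that is safe, double-quoted otherwise.
func formatString(s string) string {
	if needsQuote(s) {
		return strconv.Quote(s) // for valid UTF-8, Go's escapes are also valid YAML escapes
	}
	return s
}

func main() {
	for _, s := range []string{"x:0", "x: 0", "!important", "*star", "yes", "1.5", "Hello, world!"} {
		fmt.Printf("%-15q => %s\n", s, formatString(s))
	}
}
```

Splitting the checks into leading, inner and whole-string patterns keeps each regular expression readable. The cost is that a string may be tested up to three times.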
Since JSON strings are valid YAML strings, we can use the same escaping to emit double-quoted strings in YAML. However, some YAML parsers do not allow noncharacters even in double-quoted strings. This is a bug in those parsers, but json2yaml escapes noncharacters as well as control codes.
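The following minimal Go sketch shows that escaping. The names quoteYAML and isNoncharacter are hypothetical. The exact set of escaped code points (C0 and C1 control codes, DEL, and the 66 Unicode noncharacters) is an assumption based on the description above, not a copy of json2yaml's code.

```go
package main

import (
	"fmt"
	"strings"
)

// isNoncharacter reports whether r is one of the 66 Unicode noncharacters:
// U+FDD0..U+FDEF, plus the last two code points of every plane (U+xxFFFE, U+xxFFFF).
func isNoncharacter(r rune) bool {
	return (0xFDD0 <= r && r <= 0xFDEF) || r&0xFFFE == 0xFFFE
}

// quoteYAML renders s as a YAML double-quoted scalar. It escapes '"', '\\'
// and control codes much as JSON does, and additionally escapes
// noncharacters, which some YAML parsers reject even inside double quotes.
func quoteYAML(s string) string {
	var sb strings.Builder
	sb.WriteByte('"')
	for _, r := range s {
		switch {
		case r == '"' || r == '\\':
			sb.WriteByte('\\')
			sb.WriteRune(r)
		case r == '\n':
			sb.WriteString(`\n`)
		case r == '\t':
			sb.WriteString(`\t`)
		case r < 0x20 || (0x7F <= r && r <= 0x9F) || isNoncharacter(r):
			if r <= 0xFFFF {
				fmt.Fprintf(&sb, `\u%04X`, r) // YAML's \u takes four hex digits
			} else {
				fmt.Fprintf(&sb, `\U%08X`, r) // and \U takes eight, for the other planes
			}
		default:
			sb.WriteRune(r)
		}
	}
	sb.WriteByte('"')
	return sb.String()
}

func main() {
	fmt.Println(quoteYAML("tab\there, bell\a, noncharacters \uFDD0 \uFFFE \U0010FFFF"))
}
```

Code points above U+FFFF are written with YAML's eight-digit \U escape rather than as a JSON-style surrogate pair of \u escapes.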
json2yaml does not load the entire JSON into memory; instead, it converts each JSON token into YAML tokens with indentation. Thus it avoids allocating map[string]any and []any values. This is not a practical example, but it can convert huge sequences.
$ (echo "["; yes "0," | head -n 100000000; echo "0]") | json2yaml
- 0
- 0
- 0
- 0
- 0
- 0
- 0
^C
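The shell pipeline above can be imitated in Go to show why memory stays bounded. In this sketch, repeatReader and zeroArray are hypothetical helpers that play the role of the yes and head pipeline by generating the input on demand. convertArray, also hypothetical and limited to arrays of scalars, writes one "- value" line per element using encoding/json's Decoder.Token. Neither the input text nor a []any is ever materialized.

```go
package main

import (
	"bufio"
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"strconv"
	"strings"
)

// repeatReader yields s exactly n times, generating bytes on demand.
type repeatReader struct {
	s   string
	n   int // copies of s still to produce
	off int // read offset into the current copy
}

func (r *repeatReader) Read(p []byte) (int, error) {
	if r.n == 0 {
		return 0, io.EOF
	}
	written := 0
	for written < len(p) && r.n > 0 {
		c := copy(p[written:], r.s[r.off:])
		written, r.off = written+c, r.off+c
		if r.off == len(r.s) { // finished one copy of s
			r.off, r.n = 0, r.n-1
		}
	}
	return written, nil
}

// zeroArray returns a reader of the JSON text [0,0,...,0] with n+1 elements.
func zeroArray(n int) io.Reader {
	return io.MultiReader(strings.NewReader("["), &repeatReader{s: "0,", n: n}, strings.NewReader("0]"))
}

// convertArray streams a top-level JSON array of scalars to a YAML block
// sequence and returns the element count. Only the current token is held.
func convertArray(w io.Writer, r io.Reader) (int, error) {
	dec := json.NewDecoder(r)
	dec.UseNumber()
	if tok, err := dec.Token(); err != nil {
		return 0, err
	} else if tok != json.Delim('[') {
		return 0, fmt.Errorf("expected a JSON array, got %v", tok)
	}
	bw := bufio.NewWriter(w)
	count := 0
	for dec.More() {
		tok, err := dec.Token()
		if err != nil {
			return count, err
		}
		switch v := tok.(type) {
		case json.Delim:
			return count, errors.New("nested collections are out of scope for this sketch")
		case nil:
			fmt.Fprintln(bw, "- null")
		case string:
			fmt.Fprintln(bw, "-", strconv.Quote(v)) // always quoted, for brevity
		default: // json.Number or bool
			fmt.Fprintln(bw, "-", v)
		}
		count++
	}
	if _, err := dec.Token(); err != nil { // the closing ']'
		return count, err
	}
	return count, bw.Flush()
}

// convertZeros converts [0,...,0] with n+1 elements and returns the YAML text.
func convertZeros(n int) (string, int, error) {
	var sb strings.Builder
	count, err := convertArray(&sb, zeroArray(n))
	return sb.String(), count, err
}

func main() {
	// A million-element array streams through without being materialized.
	count, err := convertArray(io.Discard, zeroArray(1_000_000))
	fmt.Println(count, err)
}
```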
The YAML specification is difficult to understand because of its two different styles: flow style and block style. Also, YAML 1.1 was less strict than 1.2, and there are still YAML parsers for 1.1 (you may know the notorious behavior where y is considered boolean true). I’m not sure why such a complex format has been adopted so widely, but it is actually used as a configuration format, especially in the container ecosystem nowadays.
I hope json2yaml, a command line tool written in Go, helps your development. Thank you!