79

I think indentation is important in YAML.

I tested the following in irb:

> puts({1=>[1,2,3]}.to_yaml)
--- 
1: 
- 1
- 2
- 3
 => nil 

I expected something like this:

> puts({1=>[1,2,3]}.to_yaml)
--- 
1: 
  - 1
  - 2
  - 3
 => nil 

Why isn't there indentation for the array?

I found this at http://www.yaml.org/YAML_for_ruby.html#collections.

The dash in a sequence counts as indentation, so you can add a sequence inside of a mapping without needing spaces as indentation.

2
  • apparently it does not need indentation when mapping a scalar to a sequence.
    – akonsu
    Commented Jun 9, 2013 at 21:54
  • 7
    Both are valid. I agree with you that they should not be. Even The Official YAML Web Site has both... yaml.org
    – nroose
    Commented Mar 20, 2019 at 22:38

3 Answers

84

The short answer is that both are valid because they are unambiguous for the YAML parser. The other answers already pointed this out, but allow me to add some fuel to the discussion.

YAML uses indentation not only for aesthetics or readability; it has a crucial meaning when composing and nesting data structures:

# YAML:         # JSON equivalent:
---             # {
one:            #   "one": {
  two:          #     "two": null,
  three:        #     "three": null
                #   }
                # }
                
---             # {
one:            #   "one": {
  two:          #     "two": {
    three:      #       "three": null
                #     }
                #   }
                # }

As we can see, the simple addition of an indentation level before three changes its nesting level and removes the previous null value assignment we had for two.
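For anyone who wants to check this from Ruby, here is a minimal sketch that loads both documents with the standard yaml library and inspects where three ends up. The variable names flat and nested are only for illustration.

```ruby
require 'yaml'

# "three" at the same indentation as "two": both are keys of "one".
flat = YAML.load(<<~DOC)
  one:
    two:
    three:
DOC

# "three" indented one level further: it becomes a key of "two".
nested = YAML.load(<<~DOC)
  one:
    two:
      three:
DOC

puts flat["one"].key?("three")          # sibling of "two"
puts nested["one"]["two"].key?("three") # child of "two"
puts flat == nested                     # the two documents differ
```

The first two checks should print true and the last one false, since the extra indentation moves three under two.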

This behavior is, however, not consistent when it comes to lists. They tolerate dropping the level of indentation that we would naturally expect (as the OP did) to reflect the nesting level of the items, and they still parse the same way:

# YAML:         # JSON equivalent:
---             #
one:            #
  two:          #
    - foo       # {            
    - bar       #   "one": {   
                #     "two": [ 
                #       "foo", 
                #       "bar"  
                #     ]        
---             #   }          
one:            # }            
  two:          #
  - foo         #
  - bar         #

The second form above is somewhat unexpected and breaks with the idea that the indentation level is connected to the nesting level: the key two and the items of its list are clearly written at the same indentation, yet they sit at different nesting levels.

What is even worse, it doesn't work every time, only when the list is placed immediately under a mapping key. A list nested inside another list can't drop a level of indentation because that would, obviously, move the nested elements up into the parent list:

# YAML:         # JSON equivalent:
---             # {
one:            #   "one": {
  two:          #     "two": [
    -           #       null,
    -           #       [
      -         #         null,
      -         #         null
                #       ]
                #     ]
                #   }
                # }
                #
---             # {           
one:            #   "one": {  
  two:          #     "two": [
    -           #       null, 
    -           #       null, 
    -           #       null, 
    -           #       null  
                #     ]       
                #   }         
                # }         

I know, I know... Don't even start by saying that the example above is a bit extreme and could be considered an edge case. Both are perfectly valid data structures and prove my point. Things get more complicated when mixing objects and nested lists of objects, especially if the objects have a single key. Not only can this lead to errors in the data structure declaration, it also becomes extremely hard to read.
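A minimal Ruby sketch of the list-inside-a-list case, assuming the standard yaml library. It uses scalar items instead of empty ones so the difference is easier to see; the item names a, b and c are made up for illustration.

```ruby
require 'yaml'

# The inner list keeps its extra level of indentation.
with_indent = YAML.load(<<~DOC)
  one:
    two:
      - a
      -
        - b
        - c
DOC

# Same document with the inner list's indentation dropped.
without_indent = YAML.load(<<~DOC)
  one:
    two:
      - a
      -
      - b
      - c
DOC

puts with_indent["one"]["two"].length    # 2 items: "a" and the inner list
puts without_indent["one"]["two"].length # 4 items: "a", null, "b", "c"
```

Unlike a list placed directly under a key, the two documents here do not load to the same structure.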

The following YAML documents are equivalent, in that they parse to identical data:

# YAML:             # JSON equivalent
---                 # 
one:                # {
  two:              #   "one": {
  - three: foo      #     "two": [
  - bar             #       {"three": "foo"},
  - four:           #       "bar",
    - baz           #       {
    five:           #         "four": ["baz"],
    - fizz          #         "five": ["fizz", "buzz"],
    - buzz          #         "six": null
    six:            #       }
  seven:            #     ],
                    #     "seven": null
---                 #   }
one:                # }
  two:              #      
    - three: foo    # 
    - bar           #
    - four:         #
        - baz       #
      five:         #
        - fizz      #
        - buzz      #
      six:          #
  seven:            #   

I don't know about you, but I find the second one much easier to read and follow, especially in a very large document. It's very easy to get lost in the first one, especially once the beginning of a given object declaration has scrolled out of view. There is simply no clear connection between the indentation level and the nesting level.

Keeping the indentation level consistently tied to the nesting level does a lot for readability. Being allowed to drop an indentation level for lists, and only in some positions, is something you have to be very careful about.
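To confirm that the two documents above really do load to the same data, here is a small Ruby check, assuming the standard yaml library. The variable names compact and indented are hypothetical.

```ruby
require 'yaml'

# Lists written at the same indentation as their parent key.
compact = YAML.load(<<~DOC)
  one:
    two:
    - three: foo
    - bar
    - four:
      - baz
      five:
      - fizz
      - buzz
      six:
    seven:
DOC

# Lists indented one level below their parent key.
indented = YAML.load(<<~DOC)
  one:
    two:
      - three: foo
      - bar
      - four:
          - baz
        five:
          - fizz
          - buzz
        six:
    seven:
DOC

puts compact == indented # true if both styles describe the same structure
```

The parser does not care which style is used, so the choice comes down purely to what is easier for a human to follow.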

6
  • 9
    yaml indentation rule is very counter intuitive
    – Alan
    Commented Jun 15, 2022 at 14:21
  • 1
    It would be really helpful if you would add the JSON equivalent for the last example in the same way that you did for all the other examples. But yes I agree YAML sucks.
    – Neutrino
    Commented Jul 14, 2022 at 8:16
  • 1
    Actually, I feel exactly the opposite. Take a look at the last example in your answer, especially after syntax highlighting is applied - the distance between "two" and "three" is double of that between "one" and "two", somehow making "three" feel like two levels down rather than one. It's like mixing 2-space indentation with 4-space. The idea is that the actual word should be indented (or aligned) rather than the dashes themselves. - is two characters wide, so are two spaces.
    – zypA13510
    Commented Jul 24 at 5:58
  • 1
    @zypA13510 "somehow making "three" feel like two levels down rather than one." That's because IT IS two levels down: three is an element of an object that is in two's list. Commented Jul 30 at 10:28
  • 1
    @LukeNelson No, it is not. It doesn't matter if "three" is an object. You can take the "bar" on the next line as an example; it is just one level below "two", yet there are four characters separating them.
    – zypA13510
    Commented Aug 4 at 7:13
36

Both ways are valid, as far as I can tell:

require 'yaml'

YAML.load(%q{--- 
1:
- 1
- 2
- 3
})
# => {1=>[1, 2, 3]}

YAML.load(%q{--- 
1:
  - 1
  - 2
  - 3
})
# => {1=>[1, 2, 3]}

It's not clear why you think there should be spaces before the hyphens. If you think this is a violation of the spec, please explain how.

Why isn't there indentation for the array?

There's no need for indentation before the hyphens, and it's simpler not to add any.

3
  • 68
    while there is no need for spaces I find it to be more readable Commented Apr 5, 2015 at 23:49
  • 3
    quite the opposite, when you have objects like kubernetes specs, the more indentations - the less readable it is, due to extra whitespace\scrolling\wrapping
    – 4c74356b41
    Commented Dec 4, 2020 at 13:06
  • 12
    I dare to disagree. The suppression of an expected level of indentation for lists makes the document much harder to read, mainly for big files such as k8s specs as you mentioned. Keeping the indentation in sync with nesting level is gold. Commented Apr 22, 2022 at 12:24
15

It's so you can do:

1: 
- 2: 3
  4: 5
- 6: 7
  8: 9
- 10
=> {1 => [{2 => 3, 4 => 5}, {6 => 7, 8 => 9}, 10]}

Basically, dashes delimit objects, and indentation denotes the "value" of the key-value pair.

That's the best I can do; I haven't managed to find any of the reasons behind this or that aspect of the syntax.

3
  • 14
    hm but you can do that anyway ... the equivalent (indenting all lines bar the top by 2 spaces) is the same result
    – Nick
    Commented Aug 14, 2018 at 10:57
  • Unfortunately, this is not compatible with the Perl 5.18 (the version I am bound to) built-in YAML parser. Without indentation, I get "YAML Error: Invalid element in map". I am not sure if newer versions of Perl have adapted to this apparently legal syntax. Commented Oct 31, 2019 at 17:20
  • Correction: If I 'use YAML::Syck;' in Perl, I am able to read Ruby's default flavor of YAML. The best thing about standards is that there are so many to choose from :). Commented Oct 31, 2019 at 17:47
