Published January 1, 2023 Updated July 6, 2023
Ruby Data

The Data primitive — not to be confused with the DATA constant — was added to the language in Ruby 3.2.0 and is a minimal, immutable, and non-enumerable value-only class. The Data class is not a Struct but is Struct-like in nature with a limited Object API. There are several unique aspects to this new primitive so we’ll spend the rest of this article exploring more of those details. Buckle up!

Overview

For starters, I’d recommend reading my Struct article before proceeding because knowing how structs work will be helpful when comparing and contrasting with data objects since they are similar. That said, here’s how to construct, initialize, and interact with a data object:

Point = Data.define :x, :y
point = Point.new 1, 2

point.x          # 1
point.y          # 2
point.to_h       # {:x=>1, :y=>2}
point.to_s       # #<data Point x=1, y=2>
point.with y: 5  # #<data Point x=1, y=5>
point.members    # [:x, :y]
point.inspect    # #<data Point x=1, y=2>
point.frozen?    # true

As you can see, the Object API is quite small and definitely smaller than that of a Struct. Additionally, a data object doesn’t include Enumerable so it can’t be iterated over or have attributes accessed via #[] like a Struct can. Despite the limited feature set, Data objects have the following advantages:

  • Great for concurrency when used with Ractors, Fibers, Threads, etc.

  • Great for pattern matching (more on this later).

  • Bypasses the baggage of the keyword_init: true flag as found when using a Struct which makes working with positional or keyword arguments more interchangeable.
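To make the concurrency point a bit more concrete, here is a minimal sketch that asks Ruby whether a data object can be shared across Ractors. It assumes Ruby 3.2 or later and redefines Point so the snippet stands on its own. Keep in mind that a data object is only frozen at the top level, so whether it is shareable depends on the values it holds:

```ruby
Point = Data.define :x, :y

# Integers are immutable so the whole object is shareable.
numeric = Point[1, 2]
Ractor.shareable? numeric  # true

# String.new builds an unfrozen string so this member stays mutable.
textual = Point[String.new("a"), 2]
textual.frozen?            # true
textual.x.frozen?          # false
Ractor.shareable? textual  # false
```

In other words, the immutability you get for free is shallow. If you want deeply immutable data, you'd need to freeze the member values yourself before handing them to the data object.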

There is one major disadvantage: when you override the #initialize method, it only receives keyword arguments. It’s an odd and surprising design choice but will be explained later in this article.

Construction

Constructing a data object is similar to constructing a Struct but, sadly, departs from it in inconsistent ways.

Define

You must use .define to define your data object. Example:

Point = Data.define :x, :y

This is definitely inconsistent with how you’d use .new to construct a Struct. I’ll admit that I like the use of .define more than .new and wish Struct was updated to use .define as well so we had more consistency between these two value objects.

Subclass

Much like a Struct, you can subclass:

class Inspectable < Data
  def inspect = to_h.inspect
end

Point = Inspectable.define :x, :y

As mentioned with subclassing a Struct, you’re much better off composing your objects rather than using complicated inheritance structures. I don’t recommend this approach.

Initialization

Initialization is similar to a Struct except arguments are strictly enforced.

New

As hinted at earlier, you can initialize a data instance via the .new class method using either positional or keyword arguments:

# Positional
point = Point.new 1, 2

# Keyword
point = Point.new x: 1, y: 2

All arguments are required, though. Otherwise, you’ll get an ArgumentError (you can avoid this stipulation by defining defaults which will be explained later):

point = Point.new 1     # missing keyword: :y (ArgumentError)
point = Point.new x: 1  # missing keyword: :y (ArgumentError)

Mixing and matching of arguments isn’t allowed either:

point = Point.new 1, y: 2
# wrong number of arguments (given 2, expected 0) (ArgumentError)
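The same strictness applies when you supply too many arguments. The following is a small sketch, assuming Ruby 3.2 or later, which uses a hypothetical error_for helper (not part of Ruby) to report which error class, if any, each call raises:

```ruby
Point = Data.define :x, :y

# Hypothetical helper: answers the error class raised by the block or nil.
def error_for
  yield
  nil
rescue ArgumentError => error
  error.class
end

error_for { Point.new 1, 2, 3 }           # ArgumentError (too many positionals)
error_for { Point.new x: 1, y: 2, z: 3 }  # ArgumentError (unknown keyword)
error_for { Point.new 1, 2 }              # nil
```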

Anonymous

You can anonymously create a new data instance via single line construction and initialization. Example:

# Positional
point = Data.define(:x, :y).new 1, 2

# Keyword
point = Data.define(:x, :y).new x: 1, y: 2

The problem with anonymous data objects is that they are only useful within the scope in which they are defined, as temporary and short-lived objects. Worse, you must redefine them each time you want to use them. For anything more permanent, you’ll need to define a constant for improved reuse. That said, anonymous data objects can be handy for one-off situations like scripts, specs, or code spikes.
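To illustrate why redefining is a problem, consider this short sketch (assuming Ruby 3.2 or later, with first and second being illustrative names). Each call to .define produces a distinct, nameless class so instances built from separate definitions are never equal even when their values match:

```ruby
first = Data.define :x, :y
second = Data.define :x, :y

first.name                   # nil
first == second              # false
first[1, 2] == second[1, 2]  # false
first[1, 2] == first[1, 2]   # true
```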

Brackets

There is a shorter way to initialize a data object and that’s via square brackets:

# Positional
point = Point[1, 2]

# Keyword
point = Point[x: 1, y: 2]

This is my favorite form of initialization and for two important reasons:

  1. Brackets require three fewer characters to type.

  2. Brackets signify, more clearly, you are working with a struct/data object versus a class which improves readability.

Defaults

You can provide defaults by defining an #initialize method. Example:

Point = Data.define :x, :y do
  def initialize x: 1, y: 2
    super
  end
end

point = Point.new

point.x  # 1
point.y  # 2

Continuing with the above example, this also means you now have the flexibility to use partial arguments when defaults are defined:

Point.new 0     # #<data Point x=0, y=2>
Point.new x: 0  # #<data Point x=0, y=2>

One stipulation — when defining defaults — is all keywords must be present. For instance, the following is syntactically correct but unusable and will throw an error:

Point = Data.define :x, :y do
  def initialize x: 1
    super
  end
end

Point.new 0, 1        # unknown keyword: :y (ArgumentError)
Point.new x: 0, y: 1  # unknown keyword: :y (ArgumentError)

That said, you can fix the above by supplying all keyword arguments with only the defaults you need. Here’s a modification to the above code which allows you to supply a default value for only one of the parameters:

Point = Data.define :x, :y do
  def initialize x: 1, y:
    super
  end
end

Point.new 0, 1        # #<data Point x=0, y=1>
Point.new x: 0, y: 1  # #<data Point x=0, y=1>

Again, the only difference between this code snippet and the earlier code snippet is that I make sure to define all keyword arguments even if I only provide defaults for a subset of them. To emphasize the significance of this further, consider the following invalid code:

Point = Data.define :x, :y do
  def initialize(**) = super
end

# missing keywords: :x, :y (ArgumentError)

The fact that you can only use keyword arguments for the #initialize method is another oddity and departure from what most people are used to when writing Ruby code.

Lastly, once incoming arguments are passed to super, further modification of attributes is impossible because they are immediately frozen and inaccessible. With a Struct you could use #[] to access member values but there is no such method for a Data object.
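Since nothing can change after super is called, any coercion or normalization has to happen on the way in. Here’s a minimal sketch of that idea, assuming Ruby 3.2 or later, where the choice to convert values with Float is purely illustrative:

```ruby
Point = Data.define :x, :y do
  # Coerce before calling super because nothing can be changed afterwards.
  def initialize x:, y:
    super x: Float(x), y: Float(y)
  end
end

point = Point.new "1.5", 2

point.x  # 1.5
point.y  # 2.0
```

Notice the positional arguments still arrive as keywords inside #initialize, which is why the override is written in terms of x: and y: only.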

Values

Unlike a Struct, values must be messaged directly which means you can’t access them via #[] or assign new values. For example, this won’t work:

point[:x]                          # NoMethodError
point.x = 5                        # NoMethodError
point.each { |value| puts value }  # NoMethodError

All you can do is ask for a value (as shown earlier):

point.x  # 1
point.y  # 2
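If you do find yourself needing to iterate or look up an attribute dynamically, one possible workaround is to go through #to_h or #members. This is only a sketch, assuming Ruby 3.2 or later, and isn’t the only way to do this:

```ruby
Point = Data.define :x, :y
point = Point[1, 2]

point.respond_to? :each  # false

# Convert to a Hash first when you need Enumerable behavior.
lines = point.to_h.map { |name, value| "#{name}: #{value}" }
lines  # ["x: 1", "y: 2"]

# Or walk the member names and message each one.
values = point.members.map { |name| point.public_send name }
values  # [1, 2]
```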

With

Unique to Data objects, you can make a shallow copy of your instance (i.e. instance variables are copied but not the objects referenced by them). This makes for a fast way to build altered versions of your data. Consider the following:

Point = Data.define :x, :y
point = Point[1, 2]

point.with x: 2, y: 3  # #<data Point x=2, y=3>
point.with x: 0        # #<data Point x=0, y=2>
point.with y: 0        # #<data Point x=1, y=0>
point.with bogus: "✓"  # unknown keyword: :bogus (ArgumentError)

Notice that you can quickly build a new version of your original data object with the same attributes. You can also mix and match all or some of your attributes. However, attempting to reference an attribute that doesn’t exist will result in an ArgumentError.
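Because the copy is shallow, any attribute you don’t replace still points at the same underlying object. The following sketch, assuming Ruby 3.2 or later and using a hypothetical Label data object, shows what that means in practice:

```ruby
Label = Data.define :text, :tags

original = Label[text: "demo", tags: ["a"]]
copy = original.with text: "changed"

# The untouched attribute is the very same Array in both instances.
copy.tags.equal? original.tags  # true

# So mutating it through the copy is visible through the original.
copy.tags << "b"
original.tags  # ["a", "b"]
original.text  # "demo"
```

If that kind of sharing is a concern, freezing the member values up front is one way to guard against it.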

Equality

Structs and data objects share the same superpower in that they are both value objects by default. Even better, data objects take this one step further since all values are immutable by default. The following illustrates this more clearly:

a = Point[x: 1, y: 2]
b = Point[x: 1, y: 2]

a == b      # true
a === b     # true
a.eql? b    # true
a.equal? b  # false
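Value equality also means data objects work well as Hash keys. The sketch below, assuming Ruby 3.2 or later, shows this along with the subtle difference between #== and #eql? where the latter compares members using #eql? and is therefore stricter about numeric types:

```ruby
Point = Data.define :x, :y

# Equal values produce equal hashes so a new instance finds the same entry.
labels = {Point[1, 2] => "start"}
labels[Point[1, 2]]  # "start"

integers = Point[1, 2]
floats = Point[1.0, 2.0]

integers == floats     # true
integers.eql? floats   # false
labels[floats]         # nil
```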

Pattern Matching

Pattern matching is supported by default and is identical in behavior to Struct pattern matching. One difference worth pointing out is that you can define data without any arguments which isn’t possible with a struct. Example:

module Monads
  Just = Data.define :content
  None = Data.define
end

This is handy when pattern matching:

case Monads::Just[content: "demo"]
  in Monads::Just then "Something"
  in Monads::None then "Nothing"
end

# "Something"

case Monads::None.new
  in Monads::Just then "Something"
  in Monads::None then "Nothing"
end

# "Nothing"
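You can also destructure attributes while matching. Here’s a minimal sketch, assuming Ruby 3.2 or later, which wraps the match in a hypothetical describe method (named for illustration only) and pulls the content out by keyword:

```ruby
module Monads
  Just = Data.define :content
  None = Data.define
end

# Hypothetical helper which binds `content` via the keyword pattern.
def describe monad
  case monad
    in Monads::Just[content:] then "Something: #{content}"
    in Monads::None then "Nothing"
  end
end

describe Monads::Just[content: "demo"]  # "Something: demo"
describe Monads::None.new               # "Nothing"
```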

Benchmarks

In terms of performance, data objects can be faster than structs but it depends on whether you are using positional or keyword arguments. Consider the following YJIT-enabled benchmark:

#!/usr/bin/env ruby
# frozen_string_literal: true

# Save as `benchmark`, then `chmod 755 benchmark`, and run as `./benchmark`.

require "bundler/inline"

gemfile true do
  source "https://rubygems.org"
  gem "benchmark-ips", require: "benchmark/ips"
end

ExampleStruct = Struct.new :to, :from
ExampleData = Data.define :to, :from

Benchmark.ips do |benchmark|
  benchmark.config time: 5, warmup: 2

  benchmark.report "Data (positional)" do
    ExampleData["Mork", "Mindy"]
  end

  benchmark.report "Data (keywords)" do
    ExampleData[to: "Mork", from: "Mindy"]
  end

  benchmark.report "Struct (positional)" do
    ExampleStruct["Mork", "Mindy"]
  end

  benchmark.report "Struct (keywords)" do
    ExampleStruct[to: "Mork", from: "Mindy"]
  end

  benchmark.compare!
end

If you save the above script to a file and run it locally, you’ll get output that looks roughly like this:

Warming up --------------------------------------
   Data (positional)   370.931k i/100ms
     Data (keywords)   381.070k i/100ms
 Struct (positional)   813.698k i/100ms
   Struct (keywords)   365.877k i/100ms
Calculating -------------------------------------
   Data (positional)      4.515M (± 3.1%) i/s -     22.627M in   5.015602s
     Data (keywords)      4.571M (± 3.4%) i/s -     22.864M in   5.007710s
 Struct (positional)     10.574M (± 1.9%) i/s -     52.890M in   5.003647s
   Struct (keywords)      4.446M (± 3.0%) i/s -     22.318M in   5.024381s

Comparison:
 Struct (positional): 10574098.3 i/s
     Data (keywords):  4570799.4 i/s - 2.31x  slower
   Data (positional):  4515415.4 i/s - 2.34x  slower
   Struct (keywords):  4445886.6 i/s - 2.38x  slower

💡 If you’d like more benchmarks, check out my Struct article and/or Benchmarks project for further details.

Avoidances

As emphasized in my Struct article, avoid anonymous inheritance of your data objects. In other words, don’t do the following:

class Point < Data.define(:x, :y)
end

Point.ancestors
# [Point, #<Class:0x000000010da8de88>, Data, Object, Kernel, BasicObject]

In a nutshell, anonymous superclasses (i.e. #<Class:0x000000010da8de88>) are wasteful and inefficient, performance-wise, so definitely refer back to the original Struct article to learn more on why this is a bad practice.
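When you need custom behavior, passing a block to .define is one way to avoid the extra anonymous class altogether. The following is a small sketch, assuming Ruby 3.2 or later, where the distance method is only an illustrative addition:

```ruby
Point = Data.define :x, :y do
  # Illustrative method: distance from the origin.
  def distance = Math.sqrt((x**2) + (y**2))
end

Point.ancestors.take 2  # [Point, Data]
Point[3, 4].distance    # 5.0
```

Notice there is no anonymous class sitting between Point and Data this time.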

Concerns

Before wrapping up, I want to highlight some concerns — briefly touched upon earlier — with additional material you might want to take a deeper look at:

Victor Shepelev, the feature author, has been closing down these issues but has also written about the design choices made. That writing sheds light on how Data bypasses the Struct legacy design baggage, though at the great cost of introducing inconsistency between how Struct and Data initialization works.

Conclusion

Data objects are definitely minimalistic but powerful in that they are only meant to hold a collection of data values which can’t be mutated. I’ve wanted an immutable value object in Ruby for some time but I’m also worried about the design and precedent this new Data object has introduced into the language. That said, if you find yourself reaching for a struct, consider using a data object instead, especially if you only need the immutable encapsulation of raw values.