
The Data primitive (not to be confused with the DATA constant) was added to the language in Ruby 3.2.0 and is a minimal, immutable, and non-enumerable value-only class. The Data class is not a Struct but is Struct-like in nature with a limited Object API. There are several unique aspects to this new primitive so we’ll spend the rest of this article exploring more of those details. Buckle up!
Overview
For starters, I’d recommend reading my Struct article before proceeding because knowing how structs work will be helpful when comparing and contrasting with data objects since they are similar. That said, here’s how to construct, initialize, and interact with a data object:
Point = Data.define :x, :y
point = Point.new 1, 2
point.x # 1
point.y # 2
point.to_h # {:x=>1, :y=>2}
point.to_s # #<data Point x=1, y=2>
point.with y: 5 # #<data Point x=1, y=5>
point.members # [:x, :y]
point.inspect # #<data Point x=1, y=2>
point.frozen? # true
As you can see, the Object API is quite small and definitely smaller than that of a Struct. Additionally, a data object doesn’t include Enumerable so it can’t be iterated over or have attributes accessed via #[] like a Struct can. Despite the limited feature set, Data objects have the following advantages:
- Great for concurrency when used with Ractors, Fibers, Threads, etc.
- Great for pattern matching (more on this later).
- Bypasses the baggage of the keyword_init: true flag as found when using a Struct which makes working with positional or keyword arguments more interchangeable.
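To make the concurrency point more concrete, here is a minimal sketch, assuming Ruby 3.2 or later, which checks whether data instances can be shared between Ractors. Label is a hypothetical definition added purely for illustration. Keep in mind that a data instance is only shallowly frozen, so it is shareable only when the values it holds are shareable too:

```ruby
Point = Data.define :x, :y
Label = Data.define :text    # Hypothetical, for illustration only.

point = Point[1, 2]
label = Label["demo".dup]    # `dup` guarantees an unfrozen String.

Ractor.shareable? point      # true: frozen instance holding immutable integers.
Ractor.shareable? label      # false: the String inside is still mutable.

Ractor.make_shareable label  # Deep freezes the instance and its values.
Ractor.shareable? label      # true
label.text.frozen?           # true
```

In other words, immutability of the instance gets you most of the way there, but mutable values still need to be frozen before crossing a Ractor boundary.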
There is one major disadvantage which has to do with overriding the #initialize method in that it only accepts keyword parameters. It’s an odd and surprising design choice but will be explained later in this article.
Construction
Construction of a data object is like that of a Struct but is, sadly, an inconsistent departure.
Define
You must use .define to define your data object. Example:
Point = Data.define :x, :y
This is definitely inconsistent with how you’d use .new to construct a Struct. I’ll admit that I like the use of .define more than .new and wish Struct were updated to use .define as well so we had more consistency between these two value objects.
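To see the inconsistency side by side, consider this small sketch, assuming Ruby 3.2 or later. StructPoint and DataPoint are hypothetical names used only for the comparison:

```ruby
# A Struct is constructed via `.new`.
StructPoint = Struct.new :x, :y

# A Data class is constructed via `.define`.
DataPoint = Data.define :x, :y

StructPoint.new(1, 2).x     # 1
DataPoint.new(1, 2).x       # 1

# Data itself doesn't respond to `.new`, only classes built by `.define` do.
Data.respond_to? :new       # false
DataPoint.respond_to? :new  # true
```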
Subclass
Much like a Struct, you can subclass:
class Inspectable < Data
  def inspect = to_h.inspect
end
Point = Inspectable.define :x, :y
As mentioned with subclassing a Struct, you’re much better off composing your objects rather than using complicated inheritance structures. I don’t recommend this approach.
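As one possible alternative, here is a sketch which shares behavior by including a module within the .define block rather than inheriting from a Data subclass. The Inspectable module below is a hypothetical stand-in for the class shown earlier:

```ruby
# Hypothetical module, shared via inclusion instead of inheritance.
module Inspectable
  def inspect = to_h.inspect
end

Point = Data.define :x, :y do
  include Inspectable
end

point = Point[1, 2]

point.inspect == point.to_h.inspect  # true
Point.superclass                     # Data
```

The data class still descends directly from Data, and the shared behavior can be mixed into any other object that answers #to_h.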
Initialization
Initialization is similar to a Struct except arguments are strictly enforced.
New
As hinted at earlier, you can initialize a data instance via the .new class method using either positional or keyword arguments:
# Positional
point = Point.new 1, 2
# Keyword
point = Point.new x: 1, y: 2
All arguments are required, though. Otherwise, you’ll get an ArgumentError (you can avoid this stipulation by defining defaults which will be explained later):
point = Point.new 1 # missing keyword: :y (ArgumentError)
point = Point.new x: 1 # missing keyword: :y (ArgumentError)
Mixing and matching of arguments isn’t allowed either:
point = Point.new 1, y: 2
# wrong number of arguments (given 2, expected 0) (ArgumentError)
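If you’d like to experiment with these failure modes without crashing your script, the following sketch rescues the error and answers the message instead. The build method is a hypothetical helper, not part of the Data API, and exact message wording may vary between Ruby versions:

```ruby
Point = Data.define :x, :y

# Hypothetical helper: answers a point, or the error message upon failure.
def build(*positionals, **keywords)
  Point.new(*positionals, **keywords)
rescue ArgumentError => error
  error.message
end

build 1, 2        # #<data Point x=1, y=2>
build x: 1, y: 2  # #<data Point x=1, y=2>
build 1           # Message reports the missing :y keyword.
build 1, y: 2     # Message reports the wrong number of arguments.
build 1, 2, 3     # Message reports the wrong number of arguments.
```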
Anonymous
You can anonymously create a new data instance via single line construction and initialization. Example:
# Positional
point = Data.define(:x, :y).new 1, 2
# Keyword
point = Data.define(:x, :y).new x: 1, y: 2
The problem with anonymous data objects is that they are only useful within the scope in which they are defined, as temporary and short-lived objects. Worse, you must redefine them each time you want to use them. For anything more permanent, you’ll need to define a constant for improved reuse. That said, anonymous data objects can be handy for one-off situations like scripts, specs, or code spikes.
Brackets
There is a shorter way to initialize a data object and that’s via square brackets:
# Positional
point = Point[1, 2]
# Keyword
point = Point[x: 1, y: 2]
This is my favorite form of initialization for two important reasons:
- Brackets require three fewer characters to type.
- Brackets signify, more clearly, you are working with a struct/data object versus a class which improves readability.
Defaults
You can provide defaults by defining an #initialize method. Example:
Point = Data.define :x, :y do
  def initialize x: 1, y: 2
    super
  end
end
point = Point.new
point.x # 1
point.y # 2
Continuing with the above example, this also means you now have the flexibility to use partial arguments when defaults are defined:
Point.new 0 # #<data Point x=0, y=2>
Point.new x: 0 # #<data Point x=0, y=2>
One stipulation — when defining defaults — is all keywords must be present. For instance, the following is syntactically correct but unusable and will throw an error:
Point = Data.define :x, :y do
  def initialize x: 1
    super
  end
end
Point.new 0, 1 # unknown keyword: :y (ArgumentError)
Point.new x: 0, y: 1 # unknown keyword: :y (ArgumentError)
That said, you can fix the above by supplying all keyword arguments with only the defaults you need. Here’s a modification to the above code which allows you to supply a default value for only one of the parameters:
Point = Data.define :x, :y do
  def initialize x: 1, y:
    super
  end
end
Point.new 0, 1 # #<data Point x=0, y=1>
Point.new x: 0, y: 1 # #<data Point x=0, y=1>
Again, the only difference between this code snippet and the earlier code snippet is that I make sure to define all keyword arguments even if I only provide defaults for a subset of them. To emphasize the significance of this further, consider the following invalid code:
Point = Data.define :x, :y do
  def initialize(**) = super
end
# missing keywords: :x, :y (ArgumentError)
The fact that you can only use keyword arguments for the #initialize method is another oddity and departure from what most people are used to when writing Ruby code.
Lastly, once incoming arguments are passed to super, further modification of attributes is impossible because they are immediately frozen and inaccessible. With a Struct you could use #[] to access member values but there is no such method for a Data object.
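Since nothing can be changed once super has been called, any coercion or normalization has to happen before the values are handed to super. Here is a minimal sketch of that idea, where the Float conversion is only an assumption chosen for illustration:

```ruby
Point = Data.define :x, :y do
  # Coerce incoming values first, then pass explicit keywords to super.
  def initialize x:, y: 0
    super x: Float(x), y: Float(y)
  end
end

point = Point.new "1.5", 2

point.x  # 1.5
point.y  # 2.0

# Any attempt to modify the instance afterwards fails.
begin
  point.instance_variable_set :@x, 3
rescue FrozenError => error
  error.class  # FrozenError
end
```

Notice that positional arguments still work because .new converts them to keywords before #initialize is called.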
Values
Unlike a Struct, values must be messaged directly which means you can’t access them via #[] or assign new values. For example, this won’t work:
point[:x] # NoMethodError
point.x = 5 # NoMethodError
point.each { |value| puts value } # NoMethodError
All you can do is ask for a value (as shown earlier):
point.x # 1
point.y # 2
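If you do need dynamic lookup or iteration, one possible workaround is to go through #to_h or #members. This is only a sketch of the idea using the same Point definition as before:

```ruby
Point = Data.define :x, :y
point = Point[1, 2]

point.to_h[:x]        # 1 (hash lookup instead of point[:x]).
point.public_send :y  # 2 (dynamic lookup by member name).

# Iterate over member names and values via the hash.
pairs = point.to_h.map { |name, value| "#{name}=#{value}" }
pairs                 # ["x=1", "y=2"]

# Collect values in member order.
values = point.members.map { |name| point.public_send name }
values                # [1, 2]
```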
With
Unique to Data objects, you can make a shallow copy of your instance (i.e. instance variables are copied but not the objects referenced by them). This makes for a fast way to build altered versions of your data. Consider the following:
Point = Data.define :x, :y
point = Point[1, 2]
point.with x: 2, y: 3 # #<data Point x=2, y=3>
point.with x: 0 # #<data Point x=0, y=2>
point.with y: 0 # #<data Point x=1, y=0>
point.with bogus: "✓" # unknown keyword: :bogus (ArgumentError)
Notice that you can quickly build a new version of your original data object with the same attributes. You can also mix and match all or some of your attributes. However, attempting to reference an attribute that doesn’t exist will result in an ArgumentError.
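To show what shallow means in practice, here is a small sketch in which the copy and the original end up referencing the same array. Route is a hypothetical definition used only for illustration:

```ruby
Route = Data.define :name, :stops  # Hypothetical, for illustration only.

original = Route["North", ["A", "B"]]
copy = original.with name: "South"

copy.name                         # "South"
original.name                     # "North"
copy.stops.equal? original.stops  # true: both reference the same Array.

# The instance is frozen but the Array it references is not.
original.stops << "C"
copy.stops                        # ["A", "B", "C"]
```

If that kind of sharing is a concern, freeze the values you pass in or supply fresh ones when calling #with.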
Equality
Structs and data objects share the same superpower in that they are both value objects by default. Even better, data objects take this one step further since all values are immutable by default. The following illustrates this more clearly:
a = Point[x: 1, y: 2]
b = Point[x: 1, y: 2]
a == b # true
a === b # true
a.eql? b # true
a.equal? b # false
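Value equality also means data objects behave well as hash keys and when removing duplicates. The sketch below additionally highlights the difference between #== and #eql? when member values differ only by type:

```ruby
Point = Data.define :x, :y

a = Point[x: 1, y: 2]
b = Point[x: 1, y: 2]
c = Point[x: 1.0, y: 2.0]

a == c            # true: members are compared with #==.
a.eql? c          # false: members are compared with #eql?.
a.hash == b.hash  # true

lookup = {a => "first"}
lookup[b]         # "first"
[a, b].uniq.size  # 1
```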
Pattern Matching
Pattern matching is supported by default and is identical in behavior to Struct pattern matching. One difference worth pointing out is that you can define data without any arguments which isn’t possible with a struct. Example:
module Monads
  Just = Data.define :content
  None = Data.define
end
This is handy when pattern matching:
case Monads::Just[content: "demo"]
in Monads::Just then "Something"
in Monads::None then "Nothing"
end
# "Something"
case Monads::None.new
in Monads::Just then "Something"
in Monads::None then "Nothing"
end
# "Nothing"
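Beyond matching on the class alone, you can also destructure member values by using hash patterns. The following is a minimal sketch, where describe is a hypothetical helper written for illustration:

```ruby
Point = Data.define :x, :y

# Hypothetical helper, for illustration only.
def describe point
  case point
  in Point[x: 0, y: 0] then "origin"
  in Point[x:, y: 0] then "on the x axis at #{x}"
  in {x:, y:} then "at #{x}, #{y}"
  end
end

describe Point[0, 0]  # "origin"
describe Point[3, 0]  # "on the x axis at 3"
describe Point[1, 2]  # "at 1, 2"
```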
Benchmarks
In terms of performance, data objects can be faster than structs but it depends on whether you are using positional or keyword arguments. Consider the following YJIT-enabled benchmark:
#!/usr/bin/env ruby
# frozen_string_literal: true
# Save as `benchmark`, then `chmod 755 benchmark`, and run as `./benchmark`.
require "bundler/inline"
gemfile true do
  source "https://rubygems.org"
  gem "benchmark-ips", require: "benchmark/ips"
end
MAX = 1_000_000
ExampleStruct = Struct.new :to, :from
ExampleData = Data.define :to, :from
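Benchmark.ips do |benchmark|
  benchmark.config time: 5, warmup: 2
  # Report positional and keyword construction separately for each type.
  benchmark.report "Data (positional)" do
    MAX.times { ExampleData["Mork", "Mindy"] }
  end
  benchmark.report "Data (keywords)" do
    MAX.times { ExampleData[to: "Mork", from: "Mindy"] }
  end
  benchmark.report "Struct (positional)" do
    MAX.times { ExampleStruct["Mork", "Mindy"] }
  end
  benchmark.report "Struct (keywords)" do
    MAX.times { ExampleStruct[to: "Mork", from: "Mindy"] }
  end
  benchmark.compare!
end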
If you save the above script to a file and run it locally, you’ll get output that looks roughly like this:
Warming up --------------------------------------
   Data (positional)   370.931k i/100ms
     Data (keywords)   381.070k i/100ms
 Struct (positional)   813.698k i/100ms
   Struct (keywords)   365.877k i/100ms
Calculating -------------------------------------
   Data (positional)      4.515M (± 3.1%) i/s -     22.627M in   5.015602s
     Data (keywords)      4.571M (± 3.4%) i/s -     22.864M in   5.007710s
 Struct (positional)     10.574M (± 1.9%) i/s -     52.890M in   5.003647s
   Struct (keywords)      4.446M (± 3.0%) i/s -     22.318M in   5.024381s

Comparison:
 Struct (positional): 10574098.3 i/s
     Data (keywords):  4570799.4 i/s - 2.31x slower
   Data (positional):  4515415.4 i/s - 2.34x slower
   Struct (keywords):  4445886.6 i/s - 2.38x slower
💡 If you’d like more benchmarks, check out my Struct article and/or Benchmarks project for further details.
Avoidances
As emphasized in my Struct article, avoid anonymous inheritance of your data objects. In other words, don’t do the following:
class Point < Data.define(:x, :y)
end
Point.ancestors
# [Point, #<Class:0x000000010da8de88>, Data, Object, Kernel, BasicObject]
In a nutshell, anonymous superclasses (i.e. #<Class:0x000000010da8de88>) are wasteful and inefficient, performance-wise, so definitely refer back to the original Struct article to learn more on why this is a bad practice.
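For contrast, here is a short sketch comparing the block form of .define with anonymous inheritance. InheritedPoint and the sum method are hypothetical and exist only to make the comparison visible:

```ruby
# Preferred: add behavior through the block.
Point = Data.define :x, :y do
  def sum = x + y
end

# Avoided: an anonymous class sits between InheritedPoint and Data.
class InheritedPoint < Data.define(:x, :y)  # Hypothetical, for comparison only.
  def sum = x + y
end

Point.superclass                      # Data
InheritedPoint.superclass.name        # nil (anonymous)
InheritedPoint.superclass.superclass  # Data
Point[1, 2].sum                       # 3
```

Both classes behave the same from the outside, but the block form avoids the extra anonymous class in the ancestor chain.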
Concerns
Before wrapping up, I want to highlight some concerns — briefly touched upon earlier — with additional material you might want to take a deeper look at:
- Bug 19280 - Wrong error message about arity of Data::define.new: Logged by Kouhei Yanagita, this discusses issues with parameter arity. It isn’t a huge issue but might be worth knowing about.
- Bug 19278 - Constructing subclasses of Data with positional arguments: Logged by Aaron Patterson regarding the strict use of keyword parameters for initialization, which breaks how we normally understand construction and initialization to work in Ruby.
- Bug 19301 - Fix Data class to report keyrest instead of rest parameters: Logged by me, in which I show how Method#parameters fails to tell the truth when reporting [[:rest]] instead of [[:keyrest]] (as an example) for the #initialize method parameters. This turned out to be a much larger issue with how Method#parameters works for C-based versus Ruby-based implementations.
Victor Shepelev, the feature author, has been closing down these issues but has also written about the design choices made, which sheds light on bypassing the Struct legacy design baggage but at the great cost of introducing inconsistency between how Struct and Data initialization works.
Conclusion
Data objects are definitely minimalistic but powerful in that they are only meant to hold a collection of data values which can’t be mutated. I’ve wanted an immutable value object in Ruby for some time but I’m also worried about the design and precedent this new Data object has introduced into the language. That said, if you find yourself reaching for a struct, consider using a data object instead, especially if you only need the immutable encapsulation of raw values.