Most classes we write just hold data. A User with a name and email. A Point with x and y. But every time, we end up writing the same boring __init__, __repr__, and __eq__ methods. Dataclasses fix this by generating all that boilerplate for us.
The Boilerplate Problem
Here’s a plain class that holds data:
class User:
    def __init__(self, name, email, age):
        self.name = name
        self.email = email
        self.age = age

    def __repr__(self):
        return f"User(name={self.name!r}, email={self.email!r}, age={self.age})"

    def __eq__(self, other):
        return isinstance(other, User) and (self.name, self.email, self.age) == (other.name, other.email, other.age)
That’s about a dozen lines just to store three values, and we’d need even more for __hash__, ordering, etc. With @dataclass, the same class takes 5 lines, and extras such as ordering are one decorator argument away.
@dataclass Decorator
from dataclasses import dataclass
@dataclass
class User:
    name: str
    email: str
    age: int
That’s it. Python auto-generates __init__, __repr__, and __eq__ for us. We just declare the fields with type annotations.
u1 = User("Manish", "manish@example.com", 25)
u2 = User("Manish", "manish@example.com", 25)
print(u1) # User(name='Manish', email='manish@example.com', age=25)
print(u1 == u2) # True — compares by value, not identity
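The decorator also takes arguments for methods it does not generate by default. As a minimal sketch, assuming a hypothetical Version class that is not part of the example above, order=True adds the comparison methods, which compare instances field by field in declaration order:

```python
from dataclasses import dataclass


@dataclass(order=True)
class Version:  # hypothetical class, used only for illustration
    major: int
    minor: int


releases = [Version(3, 12), Version(2, 7), Version(3, 9)]

# order=True generates __lt__, __le__, __gt__ and __ge__, which compare
# instances as (major, minor) tuples
oldest = min(releases)
newest_first = sorted(releases, reverse=True)

print(oldest)           # Version(major=2, minor=7)
print(newest_first[0])  # Version(major=3, minor=12)
```

Without order=True, comparing two instances with < raises TypeError, so sorting has to be requested explicitly.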
Default Values and field()
We can set defaults just like function arguments. But mutable defaults (lists, dicts) need field(default_factory=...) to avoid the shared-mutable-default trap.
from dataclasses import dataclass, field
@dataclass
class Team:
    name: str
    members: list[str] = field(default_factory=list)  # each instance gets its own list
    max_size: int = 10  # simple default is fine
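If we tried members: list = [], the decorator would refuse it: @dataclass raises ValueError for list, dict, and set defaults, because every Team instance would otherwise share the same list. default_factory is called once per instance, so each one gets a fresh list.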
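A quick check of both behaviors, as a sketch: default_factory gives each instance its own list, while a bare list default is rejected with ValueError when the class is created. BrokenTeam is a hypothetical name used only to trigger the error.

```python
from dataclasses import dataclass, field


@dataclass
class Team:
    name: str
    members: list[str] = field(default_factory=list)
    max_size: int = 10


a = Team("red")
b = Team("blue")
a.members.append("Manish")
print(a.members, b.members)  # ['Manish'] []

# A bare mutable default fails at class-definition time.
error = ""
try:
    @dataclass
    class BrokenTeam:  # hypothetical class, for illustration only
        name: str
        members: list = []
except ValueError as exc:
    error = str(exc)

print(error)  # the message points to default_factory
```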
__post_init__ for Computed Fields
Sometimes we need a field that’s derived from other fields. __post_init__ runs right after the auto-generated __init__.
@dataclass
class Rectangle:
    width: float
    height: float
    area: float = field(init=False)  # not in __init__, computed instead

    def __post_init__(self):
        self.area = self.width * self.height

r = Rectangle(4.0, 5.0)
print(r.area)  # 20.0
The init=False tells the dataclass “don’t accept this in the constructor — I’ll set it myself.”
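One thing to keep in mind: __post_init__ runs once, so a field computed there does not follow later changes to the fields it was derived from. The sketch below shows that, alongside a hypothetical LiveRectangle variant (not part of the example above) that uses a property instead.

```python
from dataclasses import dataclass, field


@dataclass
class Rectangle:
    width: float
    height: float
    area: float = field(init=False)

    def __post_init__(self):
        self.area = self.width * self.height


@dataclass
class LiveRectangle:  # hypothetical alternative, for illustration only
    width: float
    height: float

    @property
    def area(self) -> float:
        # recomputed on every access, so it never goes stale
        return self.width * self.height


r = Rectangle(4.0, 5.0)
r.width = 10.0
print(r.area)  # 20.0, because __post_init__ ran only at creation

live = LiveRectangle(4.0, 5.0)
live.width = 10.0
print(live.area)  # 50.0
```

The trade-off is that a property is not a dataclass field, so it does not appear in the generated __repr__ or take part in __eq__.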
frozen=True for Immutability
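Adding frozen=True makes the dataclass immutable: assigning to a field after creation raises FrozenInstanceError. Combined with the default eq=True, it also generates __hash__, so instances are hashable (usable as dict keys or in sets). The freeze is shallow, though: a list stored in a frozen field can still be mutated in place.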
@dataclass(frozen=True)
class Point:
    x: float
    y: float

p = Point(1.0, 2.0)
# p.x = 5.0  # FrozenInstanceError: can't modify
print({p: "origin"})  # works as dict key because it's hashable
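Since a frozen instance can't be edited, the usual way to "change" one is to build a modified copy. A minimal sketch using dataclasses.replace, which also shows the failed assignment being caught:

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Point:
    x: float
    y: float


p = Point(1.0, 2.0)

blocked = False
try:
    p.x = 5.0
except FrozenInstanceError:
    blocked = True  # assignment is rejected

# replace() returns a new instance with the given fields changed
moved = replace(p, x=5.0)
print(p, moved)  # Point(x=1.0, y=2.0) Point(x=5.0, y=2.0)

# Equal field values give equal hashes, so lookups by value work
visited = {p}
print(Point(1.0, 2.0) in visited)  # True
```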
slots=True and kw_only (Python 3.10+)
slots=True generates __slots__ instead of using a per-instance __dict__, which saves memory and prevents attributes that weren’t declared as fields. kw_only=True makes every field a keyword-only argument in the generated __init__.
@dataclass(slots=True, kw_only=True)
class Config:
    host: str
    port: int = 8080
    debug: bool = False

# Config("localhost")  # TypeError: must use keywords
c = Config(host="localhost")  # works
# c.__dict__  # AttributeError: uses slots, no __dict__
Typed NamedTuple
NamedTuple is another way to create simple data-holding classes. The key difference: named tuples are tuples. They’re immutable, ordered, and support indexing.
from typing import NamedTuple
class Point(NamedTuple):
    x: float
    y: float

p = Point(1.0, 2.0)
print(p.x)  # 1.0, access by name
print(p[0])  # 1.0, access by index (it's a tuple!)
# p.x = 5.0  # AttributeError: tuples are immutable
x, y = p  # unpacking works
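Because a NamedTuple really is a tuple, it compares equal to a plain tuple with the same values, and it comes with a few underscore-prefixed helper methods. A short sketch of the ones used most often:

```python
from typing import NamedTuple


class Point(NamedTuple):
    x: float
    y: float


p = Point(1.0, 2.0)

print(p == (1.0, 2.0))  # True, compared as a plain tuple
print(len(p))           # 2

# _replace returns a new tuple; the original is left untouched
moved = p._replace(x=5.0)
print(moved)            # Point(x=5.0, y=2.0)

# _asdict converts the fields to a regular dict
print(p._asdict())      # {'x': 1.0, 'y': 2.0}
```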
@dataclass vs NamedTuple vs Plain Class
Quick Decision Guide
- Need mutable data with nice defaults? @dataclass
- Need an immutable, hashable record that works like a tuple? NamedTuple
- Need complex initialization logic, custom __new__, or the class is more behavior than data? Plain class
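The first two choices in the guide can be contrasted in a few lines. This is only a sketch; Cart and Coord are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import NamedTuple


@dataclass
class Cart:  # hypothetical: mutable data with a default
    owner: str
    items: int = 0


class Coord(NamedTuple):  # hypothetical: immutable, tuple-like record
    lat: float
    lon: float


cart = Cart("Manish")
cart.items += 1  # dataclass fields can be reassigned

try:
    hash(cart)
    cart_hashable = True
except TypeError:
    # a default (eq=True, not frozen) dataclass sets __hash__ to None
    cart_hashable = False

home = Coord(12.97, 77.59)
distances = {home: 0.0}  # NamedTuple instances hash like tuples
lat, lon = home          # and unpack like tuples

print(cart, cart_hashable)  # Cart(owner='Manish', items=1) False
```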
In short, dataclasses and named tuples eliminate the boilerplate of writing __init__, __repr__, and __eq__ by hand. @dataclass is the go-to for most data-holding classes, NamedTuple is great when we want immutable tuple-like records, and plain classes are for when we need full control.