Introduction#
Python has three prominent libraries for defining structured data classes: the standard library’s dataclasses, Pydantic, and attrs. Each has different strengths. Using the wrong one adds unnecessary complexity or misses features you need.
dataclasses: Standard Library Baseline#
```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Order:
    id: int
    user_id: int
    total: float
    status: str = "pending"
    items: list[str] = field(default_factory=list)
    # datetime.utcnow() is deprecated since Python 3.12; use an aware UTC timestamp
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )
    notes: Optional[str] = None

    def is_complete(self) -> bool:
        return self.status == "completed"


# Usage
order = Order(id=1, user_id=42, total=99.99)
print(order)
# Order(id=1, user_id=42, total=99.99, status='pending', ...)
```
Pros: built-in, no dependencies, good IDE support, generates `__init__`, `__repr__`, `__eq__`.
Cons: no validation, no serialization/deserialization, no coercion.
Use when: you need a structured container with no external dependencies and no validation requirements.
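A consequence of the "no validation" point: a plain dataclass will store anything you pass it. If you need light checks without taking on a dependency, the standard `__post_init__` hook is the usual escape hatch. A minimal sketch with a hypothetical `Price` class (not part of the `Order` example above):

```python
from dataclasses import dataclass


@dataclass
class Price:  # hypothetical class for illustration
    amount: float
    currency: str = "USD"

    def __post_init__(self) -> None:
        # dataclasses run no checks of their own; __post_init__ is the
        # standard hook for manual validation after __init__ assigns fields
        if self.amount < 0:
            raise ValueError("amount must be non-negative")


Price(10.0)          # fine
try:
    Price(-1.0)      # rejected by the manual check
except ValueError as exc:
    print(exc)       # amount must be non-negative
```

This scales poorly beyond a few checks, which is exactly the gap Pydantic fills.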
Pydantic: Validation and Serialization#
```python
from datetime import datetime, timezone
from typing import Optional

from pydantic import BaseModel, Field, field_validator, model_validator


class OrderItem(BaseModel):
    product_id: int
    quantity: int = Field(gt=0, description="Must be positive")
    price: float = Field(ge=0)


class Order(BaseModel):
    id: int
    user_id: int
    total: float
    status: str = "pending"
    items: list[OrderItem] = []
    # datetime.utcnow() is deprecated since Python 3.12
    created_at: datetime = Field(
        default_factory=lambda: datetime.now(timezone.utc)
    )
    notes: Optional[str] = None

    @field_validator("status")
    @classmethod
    def validate_status(cls, v: str) -> str:
        valid = {"pending", "processing", "completed", "cancelled"}
        if v not in valid:
            raise ValueError(f"status must be one of {valid}")
        return v

    @model_validator(mode="after")
    def validate_total(self) -> "Order":
        calculated = sum(i.price * i.quantity for i in self.items)
        if self.items and abs(self.total - calculated) > 0.01:
            raise ValueError(
                f"total {self.total} doesn't match items sum {calculated}"
            )
        return self


# Automatic validation and coercion
order = Order(
    id="1",  # coerced to int
    user_id=42,
    total=29.98,
    items=[{"product_id": 1, "quantity": 2, "price": 14.99}],
)

# Serialization
print(order.model_dump())
print(order.model_dump_json())

# Parse from a dict
order2 = Order.model_validate({"id": 2, "user_id": 42, "total": 0.0})

# Parse from a JSON API response
order3 = Order.model_validate_json('{"id": 3, "user_id": 42, "total": 0.0}')
```
Pros: automatic validation, coercion, serialization, JSON schema generation, FastAPI integration.
Cons: validation adds measurable per-instantiation overhead vs plain dataclasses (see Performance Comparison below); validation rules are coupled to the model.
Use when: data comes from external sources (API, database, user input), or you need serialization/JSON schema.
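For the "external sources" case, the practical payoff is structured error reporting: failed validation raises `pydantic.ValidationError`, which carries one entry per failing field rather than a single opaque exception. A minimal sketch, assuming Pydantic v2 and a hypothetical `Signup` model (not part of the `Order` example):

```python
from pydantic import BaseModel, Field, ValidationError


class Signup(BaseModel):  # hypothetical model for illustration
    email: str = Field(min_length=3)
    age: int = Field(ge=18)


try:
    Signup(email="x", age=15)  # both fields fail
except ValidationError as exc:
    # exc.errors() returns a list of dicts, one per failing field
    for err in exc.errors():
        print(err["loc"], err["msg"])
```

A web framework (or your own API layer) can turn `exc.errors()` directly into a 422 response body, which is essentially what FastAPI does.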
attrs: Flexible and Performant#
```python
from typing import Optional

import attrs


@attrs.define
class Order:
    id: int
    user_id: int
    total: float = attrs.field(validator=attrs.validators.ge(0))
    status: str = attrs.field(
        default="pending",
        validator=attrs.validators.in_(
            ["pending", "processing", "completed", "cancelled"]
        ),
    )
    items: list[str] = attrs.Factory(list)
    notes: Optional[str] = None

    # Decorator-style alternative to the inline ge(0) validator above
    @total.validator
    def _validate_total(self, attribute, value):
        if value < 0:
            raise ValueError("total must be non-negative")


# attrs generates __init__, __repr__, __eq__ and uses __slots__ by default
order = Order(id=1, user_id=42, total=99.99)
```
Pros: very fast (slots by default), validators, converters, extensible; you only pay for the validators you declare, with none of Pydantic's parsing machinery on every instantiation.
Cons: no built-in JSON serialization (use cattrs separately), less ecosystem integration than Pydantic.
Use when: performance is critical, you need slots by default, or you want more control over validation without Pydantic’s opinionated approach.
Performance Comparison#
```python
import timeit
from dataclasses import dataclass


@dataclass
class Point:
    x: int
    y: int


# Approximate per-instance costs (hardware-dependent):
#   dataclass:   ~0.5 µs
#   attrs:       ~0.6 µs
#   Pydantic v2: ~2-5 µs (validation included)
#   Pydantic v1: ~10-20 µs
#
# For 1 million object creations that is roughly 500 ms for a
# dataclass vs 2-5 s for Pydantic v2.
elapsed = timeit.timeit(lambda: Point(1, 2), number=1_000_000)
print(f"1M dataclass instantiations: {elapsed:.2f}s")
```
Pydantic v2 (Rust core) is ~5-50x faster than v1 but still slower than plain dataclasses due to validation work.
Decision Guide#
| Requirement | Best Choice |
|---|---|
| No external dependencies | dataclasses |
| Validation from external data | pydantic |
| FastAPI request/response models | pydantic |
| JSON schema generation | pydantic |
| Maximum runtime performance | attrs (slots by default) |
| Complex nested validation | pydantic |
| Internal data containers | dataclasses or attrs |
Mixing Approaches#
```python
# Common pattern: Pydantic for API boundaries, dataclasses for internal models
import itertools
from dataclasses import dataclass

from pydantic import BaseModel

_ids = itertools.count(1)


def generate_id() -> int:
    # Placeholder ID source; a real system would use a database sequence or UUIDs
    return next(_ids)


# Pydantic: validates/parses external API input
class CreateOrderRequest(BaseModel):
    user_id: int
    items: list[dict]
    total: float


# dataclass: internal domain object after validation
@dataclass
class Order:
    id: int
    user_id: int
    total: float
    items: list


def create_order(request: CreateOrderRequest) -> Order:
    # Convert after validation
    return Order(
        id=generate_id(),
        user_id=request.user_id,
        total=request.total,
        items=request.items,
    )
```
Conclusion#
Use dataclasses for simple internal data containers with no validation. Use pydantic when data comes from outside your system and you need validation, coercion, and serialization. Use attrs when you need validators and maximum performance, particularly for high-frequency object creation. All three solve different parts of the same problem; combining them at system boundaries is a valid approach.