serialize-data-formats
Über
Diese Fähigkeit ermöglicht Serialisierung und Deserialisierung über Formate wie JSON, XML, YAML, Protobuf und MessagePack. Sie hilft Entwicklern, das passende Format basierend auf Kriterien wie Leistung, Größe und Interoperabilität für APIs, Speicherung oder Systemkommunikation auszuwählen. Nutzen Sie sie, wenn Sie Datentransfer optimieren, strukturierte Daten persistieren oder zwischen Serialisierungsstandards migrieren müssen.
Schnellinstallation
Claude Code
Empfohlennpx skills add pjt222/agent-almanac -a claude-code/plugin add https://github.com/pjt222/agent-almanacgit clone https://github.com/pjt222/agent-almanac.git ~/.claude/skills/serialize-data-formatsKopieren Sie diesen Befehl und fügen Sie ihn in Claude Code ein, um diese Fähigkeit zu installieren
Dokumentation
Serialize Data Formats
Pick + implement right data serialization format for use case. Correct encoding/decoding + performance awareness.
When Use
- Pick wire format for API comms
- Persist structured data to disk or object storage
- Exchange data between systems in different languages
- Optimize data transfer size or parse speed
- Migrate from one serialization format to another
Inputs
- Required: Data structure to serialize (schema or example)
- Required: Use case (API, storage, streaming, analytics)
- Optional: Performance needs (size, speed, schema enforcement)
- Optional: Target language/runtime constraints
- Optional: Human readability needs
Steps
Step 1: Select Right Format
| Format | Human Readable | Schema | Size | Speed | Best For |
|---|---|---|---|---|---|
| JSON | Yes | Optional (JSON Schema) | Medium | Medium | REST APIs, config, broad interop |
| XML | Yes | XSD, DTD | Large | Slow | Enterprise/legacy, SOAP, documents |
| YAML | Yes | Optional | Medium | Slow | Config files, CI/CD, Kubernetes |
| Protocol Buffers | No | Required (.proto) | Small | Fast | gRPC, microservices, mobile |
| MessagePack | No | None | Small | Fast | Real-time, embedded, Redis |
| Arrow/Parquet | No | Built-in | Very Small | Very Fast | Analytics, columnar queries, data lakes |
Decision tree.
- Need human editing? → YAML (config) or JSON (data)
- Need strict schema + fast RPC? → Protocol Buffers
- Need smallest wire size? → MessagePack or Protobuf
- Need columnar analytics? → Apache Parquet
- Need in-memory interchange? → Apache Arrow
- Legacy enterprise integration? → XML
Got: Format selected with documented rationale matching use case.
If fail: Requirements conflict (human-readable AND fast)? Prioritize primary use case, note trade-off.
Step 2: Implement JSON Serialization
import json
from datetime import datetime, date
from dataclasses import dataclass, asdict
@dataclass
class Measurement:
sensor_id: str
value: float
unit: str
timestamp: datetime
# Custom encoder for non-standard types
class CustomEncoder(json.JSONEncoder):
def default(self, obj):
if isinstance(obj, datetime):
return obj.isoformat()
if isinstance(obj, date):
return obj.isoformat()
if isinstance(obj, bytes):
import base64
return base64.b64encode(obj).decode('ascii')
return super().default(obj)
# Serialize
measurement = Measurement("sensor-01", 23.5, "celsius", datetime.now())
json_str = json.dumps(asdict(measurement), cls=CustomEncoder, indent=2)
# Deserialize
data = json.loads(json_str)
# R: JSON with jsonlite
library(jsonlite)
# Serialize
df <- data.frame(sensor_id = "sensor-01", value = 23.5, unit = "celsius")
json_str <- jsonlite::toJSON(df, auto_unbox = TRUE, pretty = TRUE)
# Deserialize
df_back <- jsonlite::fromJSON(json_str)
Got: Round-trip serialization preserves all data types accurate.
If fail: Type lost (e.g., dates become strings)? Add explicit type conversion in deserialization step.
Step 3: Implement Protocol Buffers
Define schema (.proto file).
syntax = "proto3";
package sensors;
message Measurement {
string sensor_id = 1;
double value = 2;
string unit = 3;
int64 timestamp_ms = 4; // Unix milliseconds
}
message MeasurementBatch {
repeated Measurement measurements = 1;
}
Generate + use.
# Generate Python code
protoc --python_out=. sensors.proto
# Generate Go code
protoc --go_out=. sensors.proto
from sensors_pb2 import Measurement, MeasurementBatch
import time
# Serialize
m = Measurement(
sensor_id="sensor-01",
value=23.5,
unit="celsius",
timestamp_ms=int(time.time() * 1000)
)
binary = m.SerializeToString() # Compact binary
# Deserialize
m2 = Measurement()
m2.ParseFromString(binary)
Got: Binary output 3-10x smaller than equivalent JSON.
If fail: protoc unavailable? Use language-native protobuf library (e.g., betterproto for Python).
Step 4: Implement MessagePack
import msgpack
from datetime import datetime
# Custom packing for datetime
def encode_datetime(obj):
if isinstance(obj, datetime):
return {"__datetime__": True, "s": obj.isoformat()}
return obj
def decode_datetime(obj):
if "__datetime__" in obj:
return datetime.fromisoformat(obj["s"])
return obj
data = {"sensor_id": "sensor-01", "value": 23.5, "ts": datetime.now()}
# Serialize (smaller than JSON, faster than JSON)
packed = msgpack.packb(data, default=encode_datetime)
# Deserialize
unpacked = msgpack.unpackb(packed, object_hook=decode_datetime, raw=False)
Got: MessagePack output 15-30% smaller than JSON for typical payloads.
If fail: Language lacks MessagePack support? Fall back to JSON with compression (gzip).
Step 5: Implement Apache Parquet (Columnar)
import pyarrow as pa
import pyarrow.parquet as pq
import pandas as pd
# Create data
df = pd.DataFrame({
"sensor_id": ["s-01", "s-02", "s-01", "s-03"] * 1000,
"value": [23.5, 18.2, 24.1, 19.8] * 1000,
"unit": ["celsius"] * 4000,
"timestamp": pd.date_range("2025-01-01", periods=4000, freq="min")
})
# Write Parquet (columnar, compressed)
table = pa.Table.from_pandas(df)
pq.write_table(table, "measurements.parquet", compression="snappy")
# Read Parquet (can read specific columns without loading all data)
table_back = pq.read_table("measurements.parquet", columns=["sensor_id", "value"])
df_subset = table_back.to_pandas()
# R: Parquet with arrow
library(arrow)
# Write
df <- data.frame(sensor_id = rep("s-01", 1000), value = rnorm(1000))
arrow::write_parquet(df, "measurements.parquet")
# Read (with column selection — only reads selected columns from disk)
df_back <- arrow::read_parquet("measurements.parquet", col_select = c("value"))
Got: Parquet files 5-20x smaller than CSV for typical tabular data.
If fail: Arrow unavailable? Use fastparquet (Python) or CSV with gzip as fallback.
Step 6: Compare Performance
Run benchmarks for your specific data + use case.
import json, msgpack, time
import pyarrow as pa, pyarrow.parquet as pq
data = [{"id": i, "value": i * 0.1, "label": f"item-{i}"} for i in range(10000)]
# JSON
start = time.perf_counter()
json_bytes = json.dumps(data).encode()
json_time = time.perf_counter() - start
# MessagePack
start = time.perf_counter()
msgpack_bytes = msgpack.packb(data)
msgpack_time = time.perf_counter() - start
print(f"JSON: {len(json_bytes):>8} bytes, {json_time*1000:.1f} ms")
print(f"MsgPack: {len(msgpack_bytes):>8} bytes, {msgpack_time*1000:.1f} ms")
Got: Benchmark results guide format selection for prod use.
If fail: Performance insufficient for any format? Consider compression (zstd, snappy) as orthogonal optimization.
Checks
- Selected format matches use case (documented rationale)
- Round-trip serialization preserves all data types
- Edge cases handled: empty collections, null/None values, Unicode, large numbers
- Performance benchmarked for representative payload sizes
- Error handling for malformed input (graceful failures, not crashes)
- Schema documented (JSON Schema, .proto, or equiv)
Pitfalls
- Floating-point precision: JSON represents all numbers as IEEE 754 doubles. Use string encoding for financial/decimal precision.
- Date/time handling: JSON has no native datetime type. Always document format (ISO 8601) + timezone handling.
- Schema evolution: Adding or removing fields can break consumers. Protobuf handles this well; JSON needs careful versioning.
- Binary data in JSON: Base64 encoding inflates binary data by ~33%. Use binary format for binary-heavy payloads.
- YAML security: YAML parsers may execute arbitrary code via
!!python/objecttags. Always use safe loaders.
See Also
design-serialization-schema— schema design, versioning, evolution strategiesimplement-pharma-serialisation— pharmaceutical serialisation (different domain, same naming)create-quarto-report— data output formatting for reports
GitHub Repository
Verwandte Skills
railway-docs
DokumentationDiese Fähigkeit ruft aktuelle Railway-Dokumentation ab, um Fragen zu Funktionen, Funktionalität oder spezifischen Dokumentations-URLs zu beantworten. Sie stellt sicher, dass Entwickler genaue, aktuelle Informationen direkt aus den offiziellen Quellen von Railway erhalten. Nutzen Sie sie, wenn Nutzer fragen, wie Railway funktioniert oder auf Railway-Dokumentation verweisen.
n8n-code-python
DokumentationDieses Claude Skill bietet fachkundige Anleitung zum Schreiben von Python-Code in n8n-Code-Nodes, insbesondere für die Verwendung der Python-Standardbibliothek und den Umgang mit n8ns spezieller Syntax wie `_input`, `_json` und `_node`. Es hilft Entwicklern, die Grenzen von Python innerhalb von n8n zu verstehen, empfiehlt JavaScript für die meisten Workflows und bietet gleichzeitig Python-Lösungen für spezifische Datenumwandlungsanforderungen.
archon
DokumentationDie Archon-Funktion bietet semantische Suche auf RAG-Basis und Projektmanagement über eine REST-API. Nutzen Sie sie für das Abfragen von Dokumentation, die Verwaltung hierarchischer Projekte/Aufgaben und die Durchführung von Wissenabruf mit Dokumenten-Upload-Fähigkeiten. Priorisieren Sie stets Archon zuerst bei der Suche in externer Dokumentation, bevor Sie andere Quellen verwenden.
n8n-code-javascript
DokumentationDiese Claude-Skill bietet fachkundige Anleitung für das Schreiben von JavaScript-Code in n8n-Code-Nodes. Sie behandelt wesentliche n8n-spezifische Syntax wie `$input`/`$json`-Variablen, HTTP-Helfer und DateTime-Verarbeitung und hilft bei der Fehlerbehebung häufiger Probleme. Nutzen Sie sie bei der Entwicklung von n8n-Workflows, die eine benutzerdefinierte JavaScript-Verarbeitung in Code-Nodes erfordern.
